Wednesday, 16 May 2018

Weather Prediction with Machine Learning in MATLAB

This is the next in the series of my Artificial Intelligence (AI) / Machine Learning (ML) posts. The first covered the use of TensorFlow for Object Detection. The second described how to deploy the trained TensorFlow model on the Google Cloud ML Engine.

In this third post, I focus entirely on MATLAB in order to explore its machine learning capabilities, specifically for prototyping ML models. The topic of deploying the trained models for production is touched upon but not expanded here.

As in my previous posts, I make no apologies for any technical decisions which may be considered sub-optimal by those who know better. This was an AI/ML learning exercise for me, first and foremost.

The Goal

Devise an ML algorithm to forecast the (aviation) weather, in half-hour increments, up to three days into the future, using historical time series of weather data. I was motivated to tackle this specific problem because (i) high-quality aviation weather data is readily available -- so the task of data preparation and cleansing is minimised, enabling the focus to be primarily on the ML algorithms; (ii) I've lamented the loss of the (very useful -- in my opinion) 3-day aviation forecast from the UK MET Office website ever since they removed it a couple of years ago.

The Dataset

For this initial exploration, I used aviation weather data (METARs) for Ronaldsway Airport (EGNS) on the Isle of Man since (i) I am based here and fly my Scottish Aviation Bulldog from here; (ii) being located in Northern Europe, the weather is varied and changeable, so the models (hopefully) have interesting features to be detected during training, thereby exercising the models' forecasting capabilities more than if the weather was uniform and more easily predictable. The modelling techniques presented for this single location can of course be extended to any other location for which analogous data exists (i.e., pretty-much anywhere).

The underlying weather data was obtained from US NOAA/NWS  (as utilised in JustMET, iNavCalc, and ReallySimpleMovingMap). The training set comprised METAR data captured every half hour for EGNS over the 3.5 month period from 30 December 2017 through 13 April 2018. Each half-hourly recorded METAR was persisted (for long term storage e.g., for future analysis) to an Amazon DynamoDB database, as well as to a Microsoft Azure SQL database (for temporary storage and staging). The triggering to capture each successive half-hourly METAR via web-service calls to NOAA was implemented using a Microsoft Azure Scheduler.

The Toolkit

Solution Path

Pre-processing the Raw Data

The first task was to pre-process the raw data, in this case, primarily to correct for data gaps since there was (inevitably) some (unforeseen) down-time over the weeks and months in the automated METAR capture process.

To start out, the data was retrieved (into MATLAB) from the Azure SQL database using the MATLAB Database Toolbox functionality. GOTCHA: the graphical interface bundled with the MATLAB Database Toolbox is rather limited and cannot handle stored procedures. Instead, the command-line functions must be used when retrieving data from databases via stored-procedures.
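As a hedged sketch of what the command-line approach looks like (the data source name, credentials, and stored-procedure name below are placeholders for illustration, not the ones actually used):

```matlab
% Hedged sketch: datasource, credentials, and procedure name are
% placeholders for illustration only
conn = database('AzureMetarDSN','myUser','myPassword');
metarRows = fetch(conn,'EXEC dbo.GetMetarHistory'); % stored-procedure call
close(conn);
```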

Next, the retrieved data was reformatted into a MATLAB Timetable. This proved to be a very convenient format for manipulating and preparing the data. It is an extension of the MATLAB table format, designed specifically to handle time-stamped data, and therefore ideal for handling the multivariate METAR time-series. Note: the MATLAB table format is a relatively recent innovation, and seems to be MATLAB's answer to the DataFrame object from the powerful and popular pandas library available for Python.

The set of 8 variables collected for analysis and forecasting are summarised below (for detailed definitions, see here). The variables pertain to observations made on the ground at the location of the given weather station (airport), distributed via the METAR reports. I have kept the units as per the METARs (rather than converting to S.I.). Each observation (i.e., at each sample time) contains the following set of data:
  • Temperature (units: degrees Celsius)
  • Dewpoint (units: degrees Celsius)
  • Cloudbase (units: feet)
  • Cloudcover (units: oktas, dimensionless, 0 implies clear sky, 8 implies overcast), converted to a numerical value from the raw skycover categorical variable from the METAR (i.e., "CAVOK" -- 0 oktas; "FEW" -- 1.5 oktas; "SCT" -- 3.5 oktas; "BKN" -- 5.5 oktas; "OVC" -- 8 oktas). Note: whenever "CAVOK" was specified, this was taken to set the Cloudcover value to zero and the Cloudbase value to 5000 feet -- even if skies were clear all the way up, since the METAR vertical extent formally ends at 5000 feet above the airport (typically). Making this assumption (hopefully) means erring on the safe side, even if it tampers -- in a sense -- with the natural data.
  • Surface Pressure (units: Hectopascals)
  • Visibility (units: miles)
  • Wind Speed (units: knots)
  • Wind Direction (units: degrees from True North)
Additionally, the date and time of each observation is given (in UTC), from which the local time-of-day can be determined from the known longitude via the MATLAB expression:
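(The expression itself has not survived in this copy of the post. As a hedged sketch, the local solar time-of-day can be approximated from UTC and longitude at 15 degrees of longitude per hour; the EGNS longitude value below is approximate and assumed:)

```matlab
% Hedged sketch: EGNS longitude is approximate and assumed
lonDeg = -4.62;                                   % degrees East (negative = West)
localDatetime = datetimeUTC + hours(lonDeg/15);   % shift UTC by solar offset
localTimeOfDay = hours(timeofday(localDatetime)); % hours since local midnight
```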


Note: as well as local-time-of-day, it would also be worthwhile to include the day-of-year in the analyses (since weather is known to be seasonal). This was not done for now since the entire data set spans only 3.5 months rather than at least one complete year. This means that since the validation set is taken from the 30% tail-end (see later), and the training set is the 70% taken from the start, up to the beginning of the tail-end, there will be no common values for day-of-year in both the training and validation sets, so it is not sensible to include day-of-year for now. However, if/when the collected data set spans sufficient time (1.3 years for the 30:70 split) such that the training and validation sets both contain common values for day-of-year, then it should be included in a future refinement.
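The categorical-to-numerical skycover conversion described in the Cloudcover bullet above can be sketched as follows (a minimal illustration, not the production code):

```matlab
% Hedged sketch of the skycover-to-oktas mapping described above
oktasMap = containers.Map({'CAVOK','FEW','SCT','BKN','OVC'}, ...
                          {0, 1.5, 3.5, 5.5, 8});
cloudcover = oktasMap('BKN');   % gives 5.5 oktas
% Per the assumption above, 'CAVOK' also sets Cloudbase to 5000 feet
```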

Filling the Data Gaps

The MATLAB command for creating the aforementioned timetable structure from individual vectors is as follows:
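(The command itself has not survived in this copy. A hedged sketch, with vector names that are illustrative assumptions based on the variables listed earlier, would be:)

```matlab
% Hedged sketch: vector names are illustrative assumptions
TT = timetable(datetimeUTC, temperature, dewpoint, cloudbase, cloudcover, ...
               pressure, visibility, windspeed, winddirection);
```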


Since all the variables in the timetable are continuous (rather than categorical), it is simple in MATLAB to fill for missing data by interpolation as follows:

% Define the complete time vector, every 30 minutes
newTimes = datetimeUTC(1):minutes(30):datetimeUTC(end);

% Use interpolation for the numerical values (all 8 variables continuous)
TT.Properties.VariableContinuity = {'continuous','continuous','continuous','continuous',...
                                    'continuous','continuous','continuous','continuous'};

% Perform the data filling
TT1 = retime(TT,newTimes);

METAR Data Time Series Plots

These cleaned METAR time series for EGNS are plotted in the graphs below and serve as the source of training and validation data for the upcoming ML models. Each time series is 4,974 data points in length (corresponding to the 3.5 month historical record, sampled each half hour).
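(The plots themselves have not survived in this copy. A sketch of how such per-variable time-series plots can be reproduced from the gap-filled timetable:)

```matlab
% Hedged sketch: plot each cleaned METAR variable against time
vars = TT1.Properties.VariableNames;
for k = 1:numel(vars)
    subplot(numel(vars),1,k);
    plot(TT1.Properties.RowTimes, TT1.(vars{k}));
    ylabel(vars{k});
end
```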

Modelling Phase 1: LSTM models for each variable

Since it is generally known that long short-term memory (LSTM) neural networks are well-suited to the task of building regression models for time series data, they seemed the natural starting point for these investigations, not least since LSTM layers are now available within MATLAB.

A separate LSTM model was therefore built for each of the METAR data variables by following the MATLAB example presented here. Note: it is possible to build an LSTM model for multiple time series taken together, but I felt it would be "asking too much" of the model, so I opted for separate models for each (single variable) time series. It may be worthwhile revisiting this decision in a future attempt at refining the modelling.

When building the models for the METAR data, the various (hyper-)parameters available within the LSTM model for "tweaking" (such as the number of neurons per layer, the number of layers, the number of samples back in time used to fit the model for looking forward in time, etc.) not surprisingly needed to be changed from the default settings and from those settings presented in the MATLAB example, in order to achieve useful results on the METAR data. This is not unreasonable, given that the data sets are so different, and that machine learning is essentially data-driven. By trial-and-error experimentation, the following code snippet captures the set of hyper-parameters which were found to be effective on the METAR variables.

%% Example data setup for LSTM model on the first chunk of data

% Look back 92 hours. Seems suitable for METAR data
numTimeStepsTrain = 184; 

% 3 days maximum forecast look-ahead
numTimeStepsPred = 144;
windowLength = numTimeStepsPred+numTimeStepsTrain;

% data_entire_history is the entire time series; data is the first
% window of it (assignment reconstructed from the original comments)
data = data_entire_history(1:windowLength);

XTrain = data(1:numTimeStepsTrain);

% target for LSTM is one time-step into the future
YTrain = data(2:numTimeStepsTrain+1);

% inputs for testing the LSTM model at all forecast look-aheads
XTest = data(numTimeStepsTrain+1:end-1);

% targets for testing the LSTM model at all forecast look-aheads
YTest = data(numTimeStepsTrain+2:end);

%For a better fit and to prevent the training from diverging,
%standardize the training data to have zero mean and unit variance.
%Standardize the test data using the same parameters as the training

mu = mean(XTrain);
sig = std(XTrain);

XTrain = (XTrain - mu) / sig;
YTrain = (YTrain - mu) / sig;

XTest = (XTest - mu) / sig;

%% Define LSTM Network Architecture
inputSize = 1;
numResponses = 1;
numHiddenUnits = 65; % seems suitable for METAR data

layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits)
    fullyConnectedLayer(numResponses)
    regressionLayer];

maxEpochs = 250; % value assumed; the original setting was not preserved

opts = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',0.005, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',125, ...
    'LearnRateDropFactor',0.2, ...
    'Plots','training-progress');

% Train the LSTM network with the specified training options by
% using trainNetwork.
net = trainNetwork(XTrain,YTrain,layers,opts);

% Forecast Future Time Steps
% To forecast the values of multiple time steps in the future,

% use the predictAndUpdateState function to predict time steps
% one at a time and update the network state at each prediction.
% For each prediction, use the previous prediction as input to
% the function.
% To initialize the network state, first predict on the training 
% data XTrain. Next, make the first prediction using the last
% time step of the training response YTrain(end). Loop over the 
% remaining predictions and input the previous prediction to
% predictAndUpdateState.

net = predictAndUpdateState(net,XTrain);
[net,YPred] = predictAndUpdateState(net,YTrain(end));

numTimeStepsTest = numel(XTest);
for i = 2:numTimeStepsTest
    [net,YPred(1,i)] = predictAndUpdateState(net,YPred(i-1));
end
% Unstandardize the predictions using mu and sig calculated earlier.
YPred = sig*YPred + mu;

% The training progress plot reports the root-mean-square error
% (RMSE) calculated from the standardized data. Calculate the RMSE
% from the unstandardized predictions.
rmse = sqrt(mean((YPred-YTest).^2))

Note: it may be better to tune the hyper-parameters separately for the modelling of each variable; again, not done here, but an idea for future enhancement.

Using the above-mentioned set of hyper-parameters, the graph below shows a typical LSTM training convergence history (in this case, for Temperature). Note: this plot, (optionally) generated by MATLAB interactively during training, is similar to that available via TensorBoard (when training TensorFlow models), but with the added advantage that there is a "Stop Button" on the MATLAB interface that enables the user to stop the training at any time (and capture the network parameters at that time).

The typical forecast results (i.e., from just one arbitrary window of the historical data set) obtained from the LSTM models for each variable are shown in the following plots. In each plot, the (92 hour) training data window is (arbitrarily) chosen to end at "time zero", when the forecast starts; the forecast then extends from half an hour to 72 hours (i.e., 3 days) ahead. The black curve is the training data, the blue curve the forecast results, and the red curve the test values against which the forecast performance can be directly compared. For all METAR variables, the forecasts are seen to be effective only out to a few hours at most, with significant deviations beyond that -- and some variables are worse than others.

Forecast Error Plots

By re-training each LSTM model for each METAR variable at each successive sample point in the test data set, and comparing with the known measurements for each forecast time, it is possible to build up a statistical picture of the average performance of the models over forecasting time. For the 3.5 month historical METAR data set, sampled every half hour, and subtracting two window widths (first and last), this implies training approximately 4600 LSTM models per METAR variable. Note: when it comes to production deployment of the models, the principle of re-training each model at each sample point -- i.e., each time a new weather observation comes in, every half hour in the case of METARs -- means that the model for the given variable is the most up-to-date that it can be at any given time, for use in forecasting forward from that point in time.

By averaging the mean-squared error in the (4600) forecasts of each of the trained models sliding forward in time from the beginning to the end of the entire data set, the expected accuracy of the forecast for each look-ahead forecast time can be assessed. These accuracies (in terms of absolute and relative average mean-square error) are plotted in the Forecast Error Plots below (blue curves, labelled "LSTM alone") for each variable versus look-ahead time (from half an hour to three days). On the absolute error plots, the standard deviation of the underlying observations is also shown (denoted sdev obs). Whenever the error curve is (well) below the sdev obs line, the forecast can be considered better than random; but whenever the error curve is near to or above the sdev obs line, the forecast is no better than random, and should be considered ineffective. Similarly, on the relative error curves, whenever the error is below 50% (as indicated by the line marked 50% error), the forecast may be considered effective, though the lower the better. Above a relative error of 50%, the forecast should be considered ineffective.
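The error aggregation just described can be sketched as follows (errAll and observations are assumed arrays gathered from the ~4600 sliding-window runs; an illustration, not the original analysis code):

```matlab
% Hedged sketch: errAll is assumed [numModels x numLookAheads], holding
% (YPred - YTest) for each re-trained model at each look-ahead step
rmseAbs = sqrt(mean(errAll.^2, 1));           % absolute RMSE per look-ahead
rmseRel = 100 * rmseAbs / std(observations);  % relative error (percent)
usable  = rmseRel < 50;                       % "effective" forecast region
```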

In terms of the range to which the LSTM forecasts are considered useful (in terms of look-ahead period), these values have been extracted from the plots and summarised in the following table.

Variable              Usable forecast range
Temperature           6 hours
Dewpoint              2 hours
Cloudbase             2 hours
Cloud Cover           No usable forecast
Sea-Level Pressure    25 hours
Visibility            2 hours
Windspeed             3 hours
Wind Direction        3 hours

The LSTM forecasts are generally seen to be useful out to a few hours, with some exceptions: the Sea-Level Pressure forecast is good out to (an impressive) 25 hours; but the Cloud Cover forecast is no good at all.

Modelling Phase 2: Using the LSTM model outputs in combination with the other METAR variables to perform regressions

The benefit of the LSTM modelling from Phase 1 above is that the recent history of a given variable is utilised in predicting its future path. This should presumably be better than using just a single snapshot in time (e.g., now) to predict the future. That said, from the results obtained, the accuracy diminishes quite significantly when forecasting out beyond a couple of hours or so. In this next phase, the idea is to utilise the information from the other METAR variables to help improve the forecasts for the given variable (which so far has been based only on histories of itself). This makes sense from the laws-of-physics point of view. For example, the temperature an hour from now will depend not only on the temperature now, but on: the time of day of the measurement and of the desired forecast (since there is a diurnal temperature cycle); the extent of cloud cover; the wind strength (possibly); etc. So it makes intuitive sense to somehow tie these other known measurements and time-factors into the forecasts for a given variable.

The strategy therefore is to re-cast the forecasting task as a neural network multivariate regression problem where the inputs (regressors) comprise: (i) all the measured METAR variables at a given time; (ii) the time of day of those measurements; (iii) the time difference between the measurement time and the time of the forecast looking ahead; and (iv) the estimated value of the variable in question at the time of the forecast looking ahead, obtained from the LSTM model from Phase 1. The output (target) of the regression is the estimate of the value of the variable in question at the time of the forecast looking ahead (for each look-ahead time). For training, all input and output values are known. Moreover, since the LSTM models have been re-defined every half hour (by sliding through the entire data set), a large set (i.e., 664,521) of input/output values is available for training this neural net regression model.
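A single training example for this regression can therefore be sketched as follows (variable names are illustrative assumptions):

```matlab
% Hedged sketch of one (input, target) pair for the regression
% metarsAtT: the 8 METAR measurements at time t
% todAtT: local time-of-day at t; dt: look-ahead period
% lstmEstimate: Phase-1 LSTM forecast of the target at t+dt
xRow = [metarsAtT, todAtT, dt, lstmEstimate];  % regressors
tRow = targetAtTplusDt;                        % known target (during training)
```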

To perform the neural network regression, MATLAB has two options available: the (older) train function, and the newer trainNetwork function (which was used above for the LSTM training). The differences between the two methods are discussed here. I opted to use the newer trainNetwork method since it is focused on Deep Learning and can make use of large data sets running on GPUs. At this point I would like to extend my gratitude to Musab Khawaja at MathWorks, who provided me with sample code (in the snippet below) demonstrating how to adapt the imageInputLayer (normally used for image processing) for use in general-purpose multivariable regression.

As with the LSTM modelling, the hyper-parameters need to be chosen for the training. Again, by trial-and-error, the following common set (within the code snippet below) proved to be suitable for each of the METAR data fits:

% Here, x is the array of appropriate regressor observations, and
% t is the vector of targets

% Create a table with last col the outputs
data = array2table([x' t']);

numVars = size(x,1); % number of predictor (regressor) variables
n = height(data);

% reshape to 4D - first 3D for the 'image', last D for each
% sample
dataArray = reshape(data{:,1:numVars}', [numVars 1 1 n]);
% ...assume first numVars columns are predictors (regressors)

output = data{:,numVars+1}; % assume response is last column

% Split into 70% training and 30% test set
pc = 0.7;
rng('default') % for reproducibility
% Don't shuffle yet, since don't want training sliding
% window to leak into validation set
idx = 1:n;
max_i = ceil(n*pc);
idxTrain = idx(1:max_i);
idxTest = idx(max_i+1:n);

% Now shuffle the training and validation sets independently
idxTrain = idxTrain(randperm(numel(idxTrain)));
idxTest = idxTest(randperm(numel(idxTest)));

% Prepare arrays for regressions
trainingData = dataArray(:, :, :, idxTrain);
trainingOutput = output(idxTrain);

testData = dataArray(:, :, :, idxTest);
testOutput = output(idxTest);
testSet = {testData, testOutput};

% Define network architecture
% (layers after the first two are assumed; not preserved in the
% original snippet)
layers = [...
    imageInputLayer([numVars,1,1]) % Non-image regression!
    fullyConnectedLayer(500) % Seems suitable for METAR data
    reluLayer
    fullyConnectedLayer(1) % single regression output
    regressionLayer];

% Set training options
options = trainingOptions('adam', ...
    'ExecutionEnvironment','gpu', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 1000, ...
    'MiniBatchSize', 10000,...% Seems suitable for METAR data
    'ValidationData', testSet, ...
    'ValidationFrequency', 25, ...
    'ValidationPatience', 5, ...
    'Shuffle', 'every-epoch');

% Train
net = trainNetwork(trainingData, trainingOutput, layers, options);

% Predict for validation plots
predictedTestOutput = predict(net, testData);

A Deep Learning model with the afore-mentioned hyper-parameters was trained for each METAR variable, in turn, selected as the target. As evident from the code snippet, the first 70% of the entire data set was used for training (which amounted to 465,165 data points), the remaining 30% ("tail-end") for validation (which amounted to 199,356 data points).

The errors in the corresponding forecasts when applied to the validation data are displayed as the red curves (labelled "Multi-regression plus LSTM") in the Forecast Error Plots presented earlier. For comparison, the yellow curves (labelled "Multi-regression alone") in the error plots correspond to the multivariate regressions re-trained, but this time excluding the LSTM outputs as regressors.

It can be seen that in all cases, the LSTM models out-perform the Multi-regressions when limiting our attention to those regimes where the forecasts are deemed to be useful, i.e., when the absolute rms error is below the standard deviation of the observations and the relative rms error is below 50%. This came somewhat as a surprise, since intuitively it was felt that the addition of information via the other variables should have been more beneficial than was observed. Perhaps a refined regression analysis, as discussed below, would reveal such. Outside the usable regimes, the Multi-regressions sometimes out-perform the LSTM models, but by then, none of the models are effective (errors too large). It is also interesting to note that the inclusion of the LSTM outputs as inputs to the Multi-regressions generally improves their performance (i.e., red curves lower than yellow curves) -- but not in every case (see, for example, the Cloudbase forecasts, where the yellow curve is lower than the red curve).

Other Things To Try

Some ideas to try next include (not exhaustive):

  • The LSTM models were trained to target the observations just one sample period (half hour) away from the inputs, but then asked to predict out to three days away, with the accuracy dropping off dramatically within the first few samples ahead. Instead, it might be worth training a different LSTM model for each look-ahead period (by down-sampling before training). This would entail having a different LSTM model per look-ahead period, but perhaps the forecast accuracy would be better, particularly further out?
  • Likewise, for the Multi-regression models, all look-ahead periods were included in a single regression model. This means that the accuracy for the short look-ahead periods is penalised by the errors further out (since the stochastic gradient descent optimiser minimises a single number: the rms error across all look-ahead periods). Instead, it might be worth trying training a different Multi-regression model for each look-ahead period. Again, this would entail more models, but the accuracy may be better.
  • The hyper-parameter settings for all models were set via trial-and-error. It might be worth trying a more systematic approach, e.g., by invoking an outer layer of optimisation which uses, for example, genetic algorithms to choose the optimum set of hyper-parameters.
  • Try incorporating additional information in the regressions. For example, weather data for other locations known to correlate with the weather in the given location. Case-in-point: since most of the weather systems on the Isle of Man originate from the Atlantic i.e., to the west, it might be useful to incorporate weather data from Ireland, with suitable lag, to try and improve the model predictions for the Isle of Man. 
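The first two ideas above amount to training one model per look-ahead period; the down-sampling step, for instance, might be sketched as:

```matlab
% Hedged sketch: down-sample so one LSTM step equals the desired
% look-ahead (e.g., 6 hours = 12 half-hour samples)
lookAheadSamples = 12;
dataDown = data_entire_history(1:lookAheadSamples:end);
% ...then train an LSTM on dataDown exactly as in Phase 1
```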

End-to-End Recipe For Weather Forecaster

If the Multi-regression results can be improved by the suggestions above, such that they can compete with the LSTM models over some portion(s) of the look-ahead range, then the following general recipe for an online ML-based weather forecaster can be proposed:

Every few months (or so):

  1. For a given location, gather as long a history of half-hourly METAR data as possible/available, ideally over at least the past year in order to capture seasonal variations
  2. From the data in 1), perform a set of LSTM fits (sliding across the data, one sample at a time) to obtain the estimated quantities for use as inputs (alongside the METAR data) to the multivariate regressions. Note: for the 3.5 month historical data set, this set of fits took approximately two days per METAR variable on a p2.xlarge (GPU-equipped) AWS instance, owing to the many thousands of LSTM training runs required.  
  3. With the data from 1) combined with the estimates from 2), perform Deep Learning multivariate regressions for each target variable. Refer to this trained model as the REGRESSION MODEL for the given target variable. Also perform a regression for the given target variable excluding the LSTM estimates as a regressor. Refer to this trained model as the REDUCED REGRESSION MODEL for the given target variable. Note: for the 3.5 month historical data set, these two fits took approximately one hour per METAR variable on a p2.xlarge (GPU-equipped) AWS instance. Re-create a revised set of Forecast Error Plots from the results of the runs in 2) and 3) (in order to be able to select the best model per forecast look-ahead period, see below).

Every time a new observation is received (half-hourly):

  1. Re-train the LSTM models, one per variable, using the latest measurement as the most recent available. For each variable, refer to this trained model as the LSTM MODEL for the given variable (note: this fit will take a few minutes per METAR variable on a p2.xlarge GPU-equipped AWS instance). 
  2. For each forecast look-ahead period (i.e., half hourly up to three days ahead), use each LSTM MODEL, each REGRESSION MODEL, and each REDUCED REGRESSION MODEL to generate three different forecasts for the given variable (note: these will take only a few seconds per METAR variable on a p2.xlarge GPU-equipped AWS instance). For each forecast look-ahead period, choose the forecast (i.e., from the LSTM MODEL, the REGRESSION MODEL, or the REDUCED REGRESSION MODEL) depending on which gives the lowest rms error for the given forecast look-ahead period, by referring to the updated Forecast Error Plots.
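The per-look-ahead model selection in step 2 can be sketched as follows (the error curves and candidate forecasts are assumed to be 1-by-144 vectors taken from the updated Forecast Error Plots):

```matlab
% Hedged sketch: pick the lowest-error model per look-ahead period
allRmse = [rmseLSTM; rmseReg; rmseRedReg];   % 3 x 144 error curves
allFcst = [fLSTM;    fReg;    fRedReg];      % 3 x 144 candidate forecasts
[~, best] = min(allRmse, [], 1);             % winning model per look-ahead
cols = 1:size(allFcst,2);
finalForecast = allFcst(sub2ind(size(allFcst), best, cols));
```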

Production Deployment Possibilities

The optimum choice of computational and software platform for the Production Deployment of the end-to-end ML-based weather forecaster presented above is not at all clear, requiring a detailed  exploration of the available technical options and trade-offs. However, the following possibilities come to mind, each with its own advantages and disadvantages:

  1. Deploy on a suite of MATLAB-equipped Cloud-based server instances. Has the advantage that the code can be used essentially "as is" (since the MATLAB code is already written via the prototypes presented here). Has the disadvantage with respect to cost that the servers would have to be "always on", and the associated MATLAB licensing costs may become prohibitive.
  2. Use the MATLAB Compiler to package the trained models into deployable libraries which can be installed within (say, Docker) containers which can be instantiated on-demand in the Cloud (and automatically shut down when dormant). Has the advantages that the code is essentially written (it just needs to be run through the MATLAB Compiler); and that by using containers, there is no need to incur the cost of "always on" server instances. There are some open questions, however: can the (half-hourly) re-training of the LSTM models via the trainNetwork function be compiled via the MATLAB Compiler? Can functions deployed from the MATLAB Compiler access GPUs, or must the GPU Coder be used? Can compiled MATLAB software running within containers access GPUs?
  3. Re-write the models in an open-source ML framework such as TensorFlow and deploy on the Google Cloud ML Engine as exemplified here. Has the disadvantage that all the models would have to be rewritten, outside MATLAB.
  4. Any suggestions welcome : )

A Note On Workflow

In the past, I would tend to use MATLAB much like I would use other programming languages, i.e., by creating many functions (subroutines) and calling them from a main program. However, by its very nature, machine learning is much more of a trial-and-error process than the type of analyses I have been used to. It is generally more amenable to the interactive process of defining a set of parameters, running the parameters through a script (e.g., which contains the ML model training commands), viewing the outputs (e.g., in terms of suitable performance metrics), re-assessing the assumptions and tweaking the parameters accordingly, then running the script again, etc., until a satisfactory outcome has been achieved. In fact, this very mode of interaction has resulted in Jupyter Notebooks becoming one of the most widely-used IDEs for developing ML models in Python. Again, MATLAB seems to have its own recently-introduced answer to this: namely, the MATLAB Live Editor.

As such, when starting out on this exploration, and having successfully used Jupyter Notebooks for Python on previous ML projects, I launched into using the MATLAB Live Editor for running the aforementioned interactive ML design scripts. Whilst I found this to be useful in the early prototyping stage for a given model, I reverted back to the tried-and-tested technique of executing scripts (stored in m-files) with embedded local sub-functions (for calling from loops in the given script). I simply found this mode of operation to be more productive. Also, the publish options from the Live Editor seemed to be less flexible and less configurable than for normal m-file scripts.

Conclusions
  • MATLAB is a highly productive platform for prototyping Machine Learning (in particular, Deep Learning) algorithms. The data-wrangling tools are excellent. In my opinion, it is easier to develop the ML models in MATLAB than in Python/TensorFlow, but that could be due to the fact that I have a long experience (decades) with using MATLAB compared with only a few weeks using Python/TensorFlow.
  • Weather forecasting is a hard problem. The Deep Learning approaches developed here show some promise, particularly the LSTM models, but generally only out to a few hours -- and not to the 3 days desired at the outset. Further refinement (perhaps along the lines presented above in Other Things To Try) would hopefully improve the predictive ability of the models.

Saturday, 5 May 2018

Navigation -- New Track Reference Technique

Stuck on the ground due to fog on the Isle of Man today, waiting to do an air-test on one of our Bulldogs (to bed-in its brand new engine), our test pilot Robert Miller (with 21,000 hours on non-airline military and civilian aircraft!) spent the afternoon in Costa's Coffee Castletown explaining the New Track Reference Technique for aerial navigation using map and compass. Here's his write-up.

Wednesday, 21 March 2018

Royal Air Force Centenary

Unfortunately, work commitments take me out of the country from tomorrow onwards, including 1 April 2018, so I am no longer able to participate in the planned flying activities (Bulldog and Chipmunk formation at RAF Henlow) in celebration of the RAF Centenary. So, I grabbed an hour of decent weather this morning on the Isle of Man, and flew my ex-RAF Bulldog TMk1 as a minor personal tribute. See photos.

I have the privilege of having been taught to fly by the Royal Air Force at the Universities of Glasgow and Strathclyde Air Squadron back in the 1980s.

...and yesterday's tragic events at RAF Valley are a stark reminder of the risks taken every day on our behalf...

Tuesday, 2 January 2018

Deploying a TensorFlow Object Detector into Production using Google Cloud ML Engine

This is the follow-on post to my previous post, which described how I trained a Deep Learning AI (using the Google Object Detection API) to detect specific "P" symbols on screenshots of map images (as used by ParkingRadar).

In this post, I describe the final part of the process: namely deploying the trained AI model into "production".

Google Cloud ML Engine

As the title of the post suggests, I opted for the Google Cloud ML Engine for the production infrastructure for the simple reason that I wanted a serverless solution such that I would only be paying on-demand for the required computing resources as I needed them, rather than having to pay for continuously-operating virtual machine(s) (or Docker container(s)) whether I was utilising them or not.

From what I could ascertain at the time I was deciding, Google Cloud ML Engine was the only available solution which provides such on-demand scaling (importantly, effectively reducing my assigned resources -- and costs -- to zero when not in use by me). Since then, AWS SageMaker has come on the scene, but I could not determine from the associated documentation whether the computing resources are similarly auto-scaled (from as low as zero). If anyone knows the answer to this, please advise via the Comments section below.

GOTCHA: one of the important limitations of the Google Cloud ML Engine for online prediction is that it auto-allocates single-core CPU-based nodes (virtual machines) rather than GPUs. This means that prediction is slow -- especially on the (relatively complex) TensorFlow object detector model which I'm using (multiple minutes per prediction!). I suppose this may be the price one has to pay for the on-demand flexibility, but since Google obviously has GPUs and TPUs at their disposal, it would be a welcome improvement if they were to offer these on the Cloud ML Engine. Maybe that will come...

Deploying the TensorFlow Model into Google Cloud ML

Exporting the Trained Model from TensorFlow

The first step is to export the trained model in the appropriate format. Picking up where I left off in the previous post, the export script included with the Google Object Detection API does this, and can be called from the Ubuntu console as follows (the pipeline config and checkpoint paths shown here are placeholders):

python object_detection/export_inference_graph.py
--input_type encoded_image_string_tensor
--pipeline_config_path /path/to/your/pipeline.config
--trained_checkpoint_prefix /path/to/your/model.ckpt-NNNN
--output_directory /risklogical/DeeplearningImages/Outputs/PR_Detector_JustP_RCNN_ForDeploy

where the paths and filenames are obviously substituted with your own. GOTCHA: in the above code snippet, it is important to specify  

--input_type encoded_image_string_tensor

rather than what I used previously, namely

--input_type image_tensor

since specifying encoded_image_string_tensor enables the image data to be presented to the model as a base64-encoded string within a JSON payload via a RESTful web-service (in production), rather than only via Python code (which I used in the previous post for post-training ad hoc testing of the model).
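For reference, the JSON payload that a model exported with encoded_image_string_tensor expects can be sketched in a few lines of Python (the helper function name is my own invention); it is the same structure that the C# Image class later in this post constructs by string concatenation:

```python
import base64
import json

def make_predict_payload(image_bytes):
    """Wrap raw image bytes (e.g. a PNG screenshot) in the JSON structure
    expected by a model exported with --input_type encoded_image_string_tensor."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"instances": [{"b64": b64}]})

# Dummy bytes standing in for a real screenshot PNG:
payload = make_predict_payload(b"\x89PNG\r\n\x1a\n...")
```

Each element of the "instances" list is one image; the "b64" key tells the serving infrastructure to base64-decode the string before feeding it to the graph.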

DOUBLE GOTCHA: ...and this is perhaps the worst of all the gotchas from the entire project. Namely, the Google object detection TensorFlow models, when exported via the Google API command as presented above, are NOT COMPATIBLE with the Google Cloud ML Engine if the command IS NOT RUN VIA TensorFlow VERSION 1.2. If you happen to use a later version of TensorFlow such as TF 1.3 (as I first did, since that was what I had installed on my Ubuntu development machine for training the model) THE MODEL WILL FAIL on the Google Cloud ML Engine. The workaround is to create a Virtual Environment, install TensorFlow Version 1.2 into that Virtual Environment, and run the command as presented above, from within the Virtual Environment. Perhaps the latest version of TensorFlow has eliminated this annoying incompatibility, but I'm not sure. If it has indeed not yet been resolved (does anyone know?), then c'mon Google!

Deploying the Exported Model to Google Cloud ML

Creating a Google Compute Cloud Account

In order to complete the next few steps, I had to create an account on Google Compute Cloud. That is all well-documented and the procedure will not be repeated here. The process was straightforward.

Installing the Google Cloud SDK

This is required in order to interact with the Google Compute Cloud from my Ubuntu model-building/training machine e.g., for copying the exported model across. The SDK and installation instructions can be found here. The process was straightforward.

Copying the Exported Model to Google Cloud Storage Platform

I copied the exported model described earlier up to the cloud by issuing the following command from the Ubuntu console:

gsutil cp -r /risklogical/DeeplearningImages/Outputs/PR_Detector_JustP_RCNN_ForDeploy/saved_model/ gs://parkingradar/trained_models/ 

where the gsutil application is from the Google Cloud SDK. The parameter containing the path to the saved model uses the same path specified when calling the method above (and obviously should be substituted with yours), and the destination on Google Cloud Storage ("gs://...") is where my models are (temporarily) stored in a staging area on the cloud (and obviously should be substituted with yours).

Creating the Model on Google Cloud ML

I then had to create what Google Cloud ML refers to as a 'model' -- but which is really just a container for actual models which are then distinguished by version number -- by issuing the following command from the Ubuntu console:

gcloud ml-engine models create DetectPsymbolOnOSMMap --regions us-central1

where the gcloud application is from the Google Cloud SDK. The name DetectPsymbolOnOSMMap is the (arbitrary) name I gave to my 'model', and the --regions  parameter allows me to specify the location of the infrastructure on the Google Compute Cloud (I selected us-central1).

The next step is the key one for creating the actual runtime model on the Google Cloud ML. I did this by issuing the following command from the Ubuntu console:

gcloud ml-engine versions create v3 --model DetectPsymbolOnOSMMap --origin=gs://parkingradar/trained_models/saved_model --runtime-version=1.2
What this command does is create a new runtime version (v3 in this example, as I had already created v1 and v2 from earlier prototypes) under the model tag name DetectPsymbolOnOSMMap, from the exported TensorFlow model held in the temporary cloud staging area (gs://parkingradar/trained_models/saved_model). GOTCHA: it is essential to specify the parameter --runtime-version=1.2 (for the TensorFlow version) since Google Cloud ML does not support later versions of TensorFlow (see earlier DOUBLE GOTCHA).

At this point I found it helpful to login to the Google Compute Cloud portal (using my Google Compute Cloud access credentials) where I can view my deployed models. Here's what the portal looks like for the model version just deployed:

At this point, the exported TensorFlow model is now available for running on Google Cloud ML. It can be run remotely for test purposes (via the gcloud ml-engine predict command) but I'll not cover that here since my central purpose was to invoke the model from a web-service in order to "hook it up" to the ParkingRadar back-end, so I'll move on to that general topic now.

Running the Exported Model on Google Cloud ML via a C# wrapper

Why C# ?

Since the ParkingRadar back-end stack is written in C#, I opted for C# for developing the wrapper code for calling the model on Google Cloud ML. Although Python was the most suitable choice for training and preparing the Deep Learning model for deployment, in my case C# was the natural choice for this next phase.

This reference provides comprehensive example code for getting it all working -- mostly. I say mostly because it glosses over the issues surrounding authentication via OAUTH2. The authentication aspects turned out to be the most awkward to resolve, so I'll provide some details on how to get them working.

Source-code snippet

Here is the C# code listing containing the essential elements for wrapping the calls to the deployed model on Google Cloud ML (for the specific deployed model and version described above). The code contains all the key components, including (i) a convenient class for formatting the image to be sent; (ii) the code required for authentication via OAUTH2; (iii) the code to make the actual call via RESTful web-service to the appropriate end-point for the model running on Google Cloud ML; (iv) code for interpreting the results returned from the prediction, including the parsing of the bounding boxes and the filtering of results against a specified threshold score. The results are packaged into XML, but this is entirely optional and they can instead be packaged into whatever format you wish. Hopefully the code is self-explanatory. GOTCHA: for reasons unknown to me, specifying the model version caused a JSON parsing failure. The workaround was to leave the version parameter blank in the method call, which forces Google Cloud ML to use the assigned default version for the given model. This default assignment can easily be adjusted via the Google Compute Cloud portal introduced earlier.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Google.Apis.Auth.OAuth2;
using Newtonsoft.Json;
using System.IO;
using System.Xml;

namespace prediction_client
{
    // Convenience class for formatting the image payload for TensorFlow
    class Image
    {
        public String imageBase64String { get; set; }
        public String imageAsJsonForTF { get; set; }

        public Image(string imageBase64String)
        {
            this.imageBase64String = imageBase64String;
            this.imageAsJsonForTF = "{\"instances\": [{\"b64\":\"" + this.imageBase64String + "\"}]}";
        }
    }

    class Prediction
    {
        // For object detection
        public List<Double> detection_classes { get; set; }
        public List<Double> detection_boxes { get; set; }
        public List<Double> detection_scores { get; set; }

        public override string ToString()
        {
            return JsonConvert.SerializeObject(this);
        }
    }

    class PredictClient
    {
        private HttpClient client;

        public PredictClient()
        {
            this.client = new HttpClient();
            client.BaseAddress = new Uri("https://ml.googleapis.com/v1/");
            client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            // Set infinite timeout for long ML runs (default 100 sec)
            client.Timeout = System.Threading.Timeout.InfiniteTimeSpan;
        }

        public async Task<string> Predict<I, O>(String project, String model, string instances, String version = null)
        {
            var version_suffix = version == null ? "" : $"/versions/{version}";
            var model_uri = $"projects/{project}/models/{model}{version_suffix}";
            var predict_uri = $"{model_uri}:predict";

            // Authentication is via a Service Account, which is what should be
            // used here rather than end-user credentials.
            // Get active credential from credentials json file distributed with
            // the app.
            // NOTE: need to use App_Data folder since cannot put files in bin
            // on Azure web-service...
            string credPath = System.Web.Hosting.HostingEnvironment.MapPath(@"~/App_Data/**********-********.json");
            var json = File.ReadAllText(credPath);
            Newtonsoft.Json.Linq.JObject cr = (Newtonsoft.Json.Linq.JObject)JsonConvert.DeserializeObject(json);
            string privateKey = (string)cr.GetValue("private_key");

            // Create an explicit ServiceAccountCredential
            ServiceAccountCredential credential = new ServiceAccountCredential(
                new ServiceAccountCredential.Initializer((string)cr.GetValue("client_email"))
                {
                    Scopes = new[] { "https://www.googleapis.com/auth/cloud-platform" }
                }.FromPrivateKey(privateKey));

            var bearer_token = await credential.GetAccessTokenForRequestAsync().ConfigureAwait(false);
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", bearer_token);

            var content = new StringContent(instances, Encoding.UTF8, "application/json");
            var responseMessage = await client.PostAsync(predict_uri, content);
            var responseBody = await responseMessage.Content.ReadAsStringAsync();
            return responseBody;
        }
    }

    class PredictionCaller
    {
        static PredictClient client = new PredictClient();
        private String project = "************";
        private String model = "DetectPsymbolOnOSMMap";
        private String version = "v3";
        // Only show results with score >= this
        private double thresholdSuccessPercent = 0.95;
        private String imageBase64String;
        public string resultXmlStr = null;

        public PredictionCaller(string project, string model, double thresholdSuccessPercent, string imageBase64String)
        {
            this.project = project;
            this.model = model;
            //this.version = version; // OMIT and force use of DEFAULT version
            this.thresholdSuccessPercent = thresholdSuccessPercent;
            this.imageBase64String = imageBase64String;
        }

        public async Task RunAsync()
        {
            string XMLstr = null;
            string errStr = null;
            try
            {
                Image image = new Image(this.imageBase64String);
                var instances = image.imageAsJsonForTF;

                // Version left blank to force use of the default version for the
                // model, since the version mechanism was not working via JSON ???
                string responseJSON = await client.Predict<String, Prediction>(this.project, this.model, instances).ConfigureAwait(false);

                dynamic response = JsonConvert.DeserializeObject(responseJSON);
                int numberOfDetections = Convert.ToInt32(response.predictions[0].num_detections);

                // Create XML of detection results
                XMLstr = "<PredictionResults Project=\"" + project + "\" Model=\"" + model + "\" Version=\"" + version + "\" SuccessThreshold=\"" + thresholdSuccessPercent.ToString() + "\">";

                for (int i = 0; i < numberOfDetections; i++)
                {
                    try
                    {
                        double score = (double)response.predictions[0].detection_scores[i];
                        double[] box = new double[4];
                        for (int j = 0; j < 4; j++)
                        {
                            box[j] = (double)response.predictions[0].detection_boxes[i][j];
                        }
                        // Boxes are normalised coordinates, ordered [ymin, xmin, ymax, xmax]
                        double box_ymin = box[0];
                        double box_xmin = box[1];
                        double box_ymax = box[2];
                        double box_xmax = box[3];

                        // Just include if score better than threshold
                        if (score >= thresholdSuccessPercent)
                        {
                            XMLstr += "<Prediction Score=\"" + score.ToString() + "\" Xmin=\"" + box_xmin.ToString() + "\" Xmax=\"" + box_xmax.ToString() + "\" Ymin=\"" + box_ymin.ToString() + "\" Ymax=\"" + box_ymax.ToString() + "\"/>";
                        }
                    }
                    catch (Exception E)
                    {
                        errStr += "<Error><![CDATA[" + E.Message + "]]></Error>";
                    }
                }
            }
            catch (Exception E)
            {
                errStr += "<Error><![CDATA[" + E.Message + "]]></Error>";
            }
            if (!string.IsNullOrWhiteSpace(errStr))
            {
                XMLstr += errStr;
            }
            XMLstr += "</PredictionResults>";

            // Safety test that the XML is well-formed
            try
            {
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.LoadXml(XMLstr);
            }
            catch (Exception e)
            {
                XMLstr = "<Error>CLOUD_ML_ENGINE_FAILURE</Error>";
            }
            resultXmlStr = XMLstr;
        }
    }
}

For the ParkingRadar application, I actually built the above code into a RESTful web-service hosted on Microsoft Azure cloud where some of the ParkingRadar back-end code-stack resides. The corresponding WebApi controller code looks like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Web.Http;

namespace FlyRestful.Controllers
{
    public class Parameters
    {
        public string project { get; set; }
        public string model { get; set; }
        public string thresholdSuccessPercent { get; set; }
        public string imageBase64String { get; set; }
    }

    public class GoogleMLController : ApiController
    {
        [Route("***/********")] // route omitted from BLOG post
        public string PerformPrediction([FromBody] Parameters args)
        {
            string result = null;
            try
            {
                string model = args.model;
                string project = args.project;
                string thresholdSuccessPercent = args.thresholdSuccessPercent;
                string imageBase64String = args.imageBase64String;

                prediction_client.PredictionCaller pc = new prediction_client.PredictionCaller(project, model, double.Parse(thresholdSuccessPercent), imageBase64String);
                pc.RunAsync().Wait(); // run the prediction and wait for the result
                result = pc.resultXmlStr;
            }
            catch (Exception E)
            {
                result = E.Message;
            }
            return result;
        }
    }
}
...and below is an example client-side caller of this RESTful web-service (snippet taken from a C# Windows console app). This sample includes (i) code for converting a test '.png' image file into the appropriate format for encoding via JSON for consumption by the aforementioned web-service (and passing on to the TensorFlow model); (ii) calling the predictor and retrieving the prediction results; (iii) converting the returned bounding boxes into latitude/longitude offsets representing the centre-point location of each bounding box (since the centre-point is all ParkingRadar actually cares about!).

static async Task RunViaWebService()
{
    // project, model, thresholdSuccessPercent, testLat, testLon and
    // imageHeightPx are fields defined elsewhere in the console app
    try
    {
        HttpClient client = new HttpClient();
        client.BaseAddress = new Uri("https://*******/***/"); // hidden on BLOG
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        // Set infinite timeout for long ML runs (default 100 sec)
        client.Timeout = System.Threading.Timeout.InfiniteTimeSpan;
        var predict_uri = "*******"; // hidden on BLOG

        Dictionary<string, string> parameters = new Dictionary<string, string>();
        parameters.Add("project", project);
        parameters.Add("model", model);
        parameters.Add("thresholdSuccessPercent", thresholdSuccessPercent.ToString());

        // Load a sample PNG file into bytes, then into Base64 string format
        // for transport via JSON
        string fullFile = @"ExampleRuntimeImages\FromScreenshot.png";
        parameters.Add("imageBase64String", System.Convert.ToBase64String(System.IO.File.ReadAllBytes(fullFile)));

        var jsonString = JsonConvert.SerializeObject(parameters);
        var content = new StringContent(jsonString, Encoding.UTF8, "application/json");
        var responseMessage = await client.PostAsync(predict_uri, content);
        var resultStr = await responseMessage.Content.ReadAsStringAsync();

        // Now create lat-lon of centre point for each bounding box.
        // This data set was created at ZOOM 17 on standard web mercator:
        // map scale 1:4514, 1 pixel = 0.00001 decimal degrees
        // (1.194329 m at the equator); 0.003/256 = 1.1719e-5
        double pixelsToDegrees = 0.000011719;

        // Strip out string delimiters from the returned XML string
        resultStr = resultStr.Remove(0, 1);
        resultStr = resultStr.Remove(resultStr.Length - 1, 1);
        resultStr = resultStr.Replace("\\", "");

        XmlDocument tempDoc = new XmlDocument();
        tempDoc.LoadXml(resultStr);
        XmlNodeList resNodes = tempDoc.SelectNodes("//Prediction");
        if (resNodes != null)
        {
            foreach (XmlNode res in resNodes)
            {
                double Xmin = double.Parse(res.SelectSingleNode("@Xmin").InnerText);
                double Xmax = double.Parse(res.SelectSingleNode("@Xmax").InnerText);
                double Ymin = double.Parse(res.SelectSingleNode("@Ymin").InnerText);
                double Ymax = double.Parse(res.SelectSingleNode("@Ymax").InnerText);
                double lat = testLat + pixelsToDegrees * (0.5 - 0.5 * (Ymin + Ymax)) * imageHeightPx;
                double lon = testLon + pixelsToDegrees * (0.5 * (Xmin + Xmax) - 0.5) * imageHeightPx;
                Console.WriteLine("LAT " + lat.ToString() + ", LON " + lon.ToString());
            }
        }
    }
    catch (Exception E)
    {
        Console.WriteLine(E.Message);
    }
}

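The normalised-box-to-coordinates arithmetic above can be sketched in Python as follows (the constant and function names are my own, and the zoom-17 pixel scale is taken from the comments in the snippet):

```python
# Approximate degrees per pixel at zoom 17 on standard web mercator
# (~0.003/256, i.e. about 1.19 m at the equator)
PIXELS_TO_DEGREES = 0.000011719

def box_centre_latlon(box, centre_lat, centre_lon, image_size_px):
    """Convert a TF detection box (ymin, xmin, ymax, xmax), normalised to
    [0, 1], into the lat/lon of the box centre, given the lat/lon of the
    image centre and the image size in pixels."""
    ymin, xmin, ymax, xmax = box
    # y increases downwards in image coordinates, hence the sign flip for latitude
    lat = centre_lat + PIXELS_TO_DEGREES * (0.5 - 0.5 * (ymin + ymax)) * image_size_px
    lon = centre_lon + PIXELS_TO_DEGREES * (0.5 * (xmin + xmax) - 0.5) * image_size_px
    return lat, lon
```

A box centred on the image simply returns the image-centre coordinates; boxes above or left of centre shift the result north or west respectively.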
With the RESTFul web-service (and suitable client-code) deployed on Azure, the entire project is complete. The goals have been met. The "P" symbol object detector is now live "in production" within the ParkingRadar back-end code-stack, and has been running successfully for some weeks now.

Closing Comments

If you have read this post (and especially the previous post) in its entirety, I expect you will agree that the process for implementing a Deep Learning object-detection model in TensorFlow can reasonably be described as tedious. Moreover, if you have actually implemented a similar model in a similar way, you will know just how tedious it can be. I hope the code snippets provided here may be helpful if you happen to get stuck along the way.

All that said, it is nevertheless quite remarkable, to me at least, that I was able to create a Deep Learning object detector and deploy it in "production" to the (serverless) cloud, all with open-source software, albeit with some bumps in the road. Google should be congratulated on making all that possible.

Do I think the Deep Learning model can be considered in any way "Intelligent" ?

No, I don't. I see it as a powerful computer program which utilises a cascade of nonlinear elements to perform the complex task of pattern recognition. Like all computer programs, it needs to be told precisely what to do -- and in the specific case of these Deep Learning neural nets -- it needs to be told not just once, but thousands of times via the painstakingly prepared training images. Its abilities are also very narrow and brittle. Case in point: because the "P" symbol detector has been trained on images where each "P" symbol is enclosed in a separate, isolated bounding box, it completely fails to recognise "P" symbols which are closer together than the dimension of the bounding box. Put another way, it cannot handle images where the bounding boxes overlap one another. One could imagine trying to create a further set of training images which attempt to cater for all possibilities of such overlaps: but the number of possibilities to cover would be very large, maybe impractically large. By contrast, I could imagine asking a young child to draw a circle round every "P" on the image. I would only have to demonstrate once (or maybe not at all, the description being sufficient), and the child would "get it", and would circle all the "P"s it could find, no matter how close they are to each other. That is the difference. And the difference is huge.

The Future

In the near to mid-term, I aim to (i) investigate other open-source AI frameworks such as AWS SageMaker; (ii) give MATLAB Deep Learning a run for its money; (iii) hope that Google will enhance their Cloud ML offering by providing access to GPUs (or, better still, TPUs).