Wednesday 30 May 2018

Introducing NeuralMET

Updated 6 October 2019 with newer ML models

Updated 28 June 2018: new locations added


An Online Weather Forecaster using Artificial Intelligence Deep Learning Neural Networks built in MATLAB


In this post, I pick up from where I left off in my previous post(s) where I developed some preliminary Machine Learning (ML) models for weather prediction using MATLAB. In this next post, I deploy the weather forecasts into production in order to complete my end-to-end example of using MATLAB for prototyping and deploying Deep Learning neural networks. You will need to read the previous post for context, as I do not repeat any of that here.

Production Deployment via the MATLAB Compiler 


The MATLAB Compiler provides the option to build a self-contained Windows executable which encompasses the entire MATLAB code in a single application. However, for future-proofing / code re-usability, I instead chose to build a library in C# / .NET containing all the neural network prediction code (importing the trained networks via a MATLAB ".mat" data file -- 72 networks in all, covering the 8 variables across the 9 lookahead forecast periods), including the SQL database interaction (for reading the METAR observations and persisting the computed forecasts, every half hour). This was (almost) trivially simple to accomplish using the MATLAB Compiler Graphical-User-Interface. Having built the library (in my case, named NeuralMetLib.dll), I then built a very simple Windows Console application in C# which imported the compiled MATLAB library via the following few lines of code:

using MathWorks.MATLAB.NET.Arrays; 
using NeuralMetLib; 
namespace TestNeuralMetConsole 

class Program 
 { 
  static void Main(string[] args) 
   { 
    NeuralMet nm = new NeuralMet(); 
    MWArray resultMW= nm.testNeuralMET(); 
   } 
  } 


I deployed this Windows Console app on an AWS EC2 t2.small instance type and "wrapped it" behind a Windows Scheduler Task to enable the automatic triggering of forecast updates every half hour -- and that essentially completed the deployment of the production code.

Client Web Application


In order to consume the deployed forecasts and make them available for viewing, I built a simple ASP.NET web app, hosted on Microsoft Azure. You can find it here. This Online Forecaster automatically updates whenever new METAR data is obtained. It also displays the running statistics of the neural network forecasts versus the naïve ("zero-order-hold") forecasts (which can be compared directly with the Revised Error Curves in the previous post).

Conclusions & Next Steps


The MATLAB Compiler proved to be a straightforward and effective means of deploying trained Deep Learning Neural Network into production. Comparing this to my previous experience of deploying neural networks from TensorFlow, I found the MATLAB approach to be considerably easier.

As a next step, I'll use the same approach to deploy the Deep Learning training code (i.e., as well as the prediction code). In that way, the re-training of the neural networks can be fully automated, say, every month or so. I'll also add more locations (i.e., in addition to EGNS) once I capture the necessary METAR data.

I'd also like to try using Containers as opposed to entire Virtual Machine instances, in the interests of optimal use of infrastructure.

GOTCHAS


Dealing with the following "gotchas" was the most time-consuming aspect of the deployment. These aside, the entire process took only a couple of hours to build, test, and deploy. 

GOTCHA: In order for the Compiled MATLAB code to access the (Azure) SQL database, the following steps had to be carried out on the target machine (in my case, a Windows 2016 Server):

Installed JRE 8 from here

Installed Microsoft JDBC Driver 6.4 for SQL Server from here (where I arbitrarily selected 'C:\Program Files\JDBC6' for the installation location) 

Each time I accessed the SQL database from within the MATLAB source-code, I configured the database connection within MATLAB as follows:

javaaddpath('C:\Program Files\JDBC6\sqljdbc_6.4\enu\mssql-jdbc-6.4.0.jre8.jar','-end'); 

conn=database('DATABASE_SERVER_NAME',...
'ACCOUNT_NAME','PASSWORD',...
'com.microsoft.sqlserver.jdbc.SQLServerDriver',...
'jdbc:sqlserver://k00cpfylic.database.windows.net:1433;
                             database=DATABASE_SERVER_NAME'); 

where DATABASE_SERVER_NAME, ACCOUNT_NAME, and PASSWORD are (obviously!) for your own database rather than mine. Also, make sure the SQL Server database firewall is configured to allow connections from the target machine (straightforward in Azure when using a fixed IP address on the target machine). When compiled, the deployed application successfully connects to the SQL database.


DOUBLE GOTCHA: When compiling the MATLAB code via the MATLAB Compiler, there can be situations where multiple functions in MATLAB have the same name but different usage depending on the given Toolbox. In such cases you need to "steer" the Compiler by giving it some direction at compile-time. Specifically, when compiling functions that use the 'predict' function applied to Deep Learning neural nets (of type SeriesNetwork), you need to add the following pragma line to the function which contains the 'predict' function:


 %#function SeriesNetwork %declares which "predict" function to use 

Otherwise, the code will compile without error, but will give the following runtime error: 

Undefined function 'predict' for input arguments of type 'double'

GOTCHA: When building a .NET application which consumes the compiled MATLAB dll, make sure the project settings in Visual Studio are configured to build specifically for the 64bit CPU target (not Any CPU), otherwise an error similar to the following will arise at runtime:

Unhandled Exception: System.TypeInitializationException: The type initializer for 'NeuralMetLib.NeuralMet' threw an exception. ---> System.TypeInitializationExc eption: The type initializer for 'MathWorks.MATLAB.NET.Utility.MWMCR' threw an exception. ---> System.TypeInitializationException: The type initializer for 'Mat hWorks.MATLAB.NET.Arrays.MWArray' threw an exception. ---> System.BadImageFormat Exception: An attempt was made to load a program with an incorrect format. (Exception from HRESULT: 0x8007000B) at MathWorks.MATLAB.NET.Arrays.MWArray.mclmcrInitialize2(Int32 primaryMode) at MathWorks.MATLAB.NET.Arrays.MWArray..cctor() --- End of inner exception stack trace --- at MathWorks.MATLAB.NET.Utility.MWMCR..cctor() --- End of inner exception stack trace --- at NeuralMetLib.NeuralMet..cctor() --- End of inner exception stack trace --- at NeuralMetLib.NeuralMet..ctor() at TestNeuralMetConsole.Program.Main(String[] args) in E:\FlyLogicalSoftware\ VS2017\WinDotnet\NeuralMet\TestNeuralMetConsole\Program.cs:line

Friday 25 May 2018

Weather Prediction Refinements in MATLAB Machine Learning Models

Update 30 May 2018: the models presented here have now been deployed online, as described in the next post.

In this post, I pretty much pick up from where I left off in my previous post where I developed some preliminary Machine Learning (ML) models for weather prediction using MATLAB. In this next post, I explore some further refinements to the models. You will need to read the previous post for context, as I do not repeat any of that here.

Revised Error Curves


All the results of current refinements are presented in the following set of Error Curves which are revised versions of those presented in the previous post. Each of the updates to the curves is described later in this post.


The Previous Models


The results from the previous models are reproduced in the Revised Error Curves exactly as they were before, but with slight label changes summarised as follows:

  • The solid blue curves are the "LSTM alone" curves from before. Now labelled "LSTM", the "(dash, Multivar)" can be ignored for now.
  • The solid red curves are the "Multi-regression plus LSTM" curves from before. Now labelled "Multi-reg plus LSTM", the "(+ single period)" can be ignored for now.
  • The solid orange curves are the "Multi-regression alone" curves from before. Now labelled "Multi-reg", the "(+ single period)" can be ignored for now.
  • The black dotted line labelled "sdev obs" is the same as before 

The New Models 

Zero Order Hold

The first of the new models, the most naïve model of all, represented by the solid black curves labelled "ZOH" (for Zero Order Hold), is described as follows: basically, use the last known observed values as the forecast values for the future. It is about the simplest method of forecasting. As the curves show, it works well in the leftmost portions of the graphs, for short forecast periods (half hour, one hour etc), but unsurprisingly, the performance drops off (i.e., curves rise steeply) as the forecast period increases. That said, the (frankly disappointing) realisation is that this naïve model actually performs better than all the previously-presented models up to approximately 5 hours or so, which probably says more about the poor quality of those previous models. Note: the "dips" in the ZOH error at 24, 48, and 72-hours ahead correspond to the fact that local time of the forecast is exactly the same as the local time of the observation used in the ZOH, so diurnal variation will be effectively nulled,  leading to a stronger correlation.

Multivariable LSTM

The second of the new models represented by the blue dot-dash curves labelled "(dash, Multivar)" (alongside the previous "LSTM" labels) is a variation of the previous LSTM modelling. Recalling from the previous post, when building the LSTM models, I opted to treat each variable individually, and trained a single-variable LSTM model on the single time-history of observations for each variable. My (albeit nothing more than instinctive) reasoning was that it could be expecting "too much from the model" to try and fit all 8 variables together.  However, I did remark back then that it might be worth trying the multi-variable LSTM i.e., by fitting for 8 variables simultaneously in a single LSTM neural network, just in case there were useful internal correlations that might help. MATLAB makes this extension to multiple variables straightforward, and the results are now in. Unfortunately, with the exception of a few sporadic "dips" (i.e., regions of lower errors), the multivariable LSTM model generally under-performs the previous individual LSTM models. Moreover, it almost exclusively under-performs the naïve ZOH models for every variable across almost the entirety of the forecast periods (validating my original instincts).

Multivariable Regressions for Individual Forecast Periods

Recalling from the previous post, when fitting the multivariable regression models (the solid red and orange curves), I opted to fit for all forecast periods simultaneously (mostly to minimise the number of models required). However, as noted back then, the performance of the resulting models was relatively poor at small forecast periods, where the regressions were expected to perform better. I reasoned that the single error being minimised via the back propagation training algorithm was hampering the short forecast periods by being unable to get below the value for the long forecast periods (where the error is always going to be larger). So, my suggested proposal as a future enhancement was to fit a regression for each single forecast period of interest. The results of those regressions are now in (specifically for 0.5, 1, 3, 6, 12, 24, 36, 48, and 72-hour forecast periods) and are represented by the red and orange 'plus signs' in the error curves labelled "(+ single period)" alongside the corresponding "Multi-reg" labels. As described previously, the red (solid and now 'plus signs') correspond to the case where the outputs from the (solid blue) LSTM estimates are used as further input (regressors) in the regressions; the orange (solid and now 'plus signs') correspond to the case where the outputs from the (solid blue) LSTM estimates are not used as further input (regressors) in the regressions. As can be observed from the Revised Error Curves, it turns out that the Multivariable Regressions without LSTM, computed individually for each forecast period of interest, represented by the orange 'plus signs', are generally the best of all models across almost the entire range of forecast periods except for the low periods where ZOH prevails. There is one notable exception, namely Sea-Level Pressure, where ZOH prevails exclusively, indicating that this particular variable is very difficult to predict from previous time-histories.

Revised Recipe for an Online Weather Forecaster


Given the above results, the recipe for an online weather predictor for a given location is simplified, as follows:

  • Capture and persist the METAR briefings every half-hour for the location of interest.
  • Every few months or so, use the above-mentioned METAR time histories to re-train a set of neural network multivariable input regression models, with individual target responses per variable, and one model per forecast period of interest.
  • Every half hour, update the forecasts using the above-mentioned trained regression models with the most recent set of METAR observations as inputs (and desired forecasts as outputs). In those specific cases (combinations of variable and forecast period) where ZOH prevails, use that instead of the neural net.
Over time, one can expect that the predictive capabilities of the Deep Learning networks used in the above-mentioned regressions should improve as the training datasets grow. Moreover, once sufficient data has been gathered to span at least a year, the day-of-year variable (currently omitted, see previous post for the reason why) can be included as a further input (regressor). This will improve the performance by capturing the seasonal effects of the weather (i.e., in addition to the intra-day effects already captured).

Production Deployment

This is the next important step and will be explored in a future post.









Wednesday 16 May 2018

Weather Prediction with Machine Learning in MATLAB


UPDATE 30 May 2018: See latest post for online deployment of the models presented here

UPDATE 25 May 2018: See following post for refinements to the models presented here

This is the next in the series of  my Artificial Intelligence (AI) / Machine Learning (ML) posts . The first covered the use of TensorFlow for Object Detection. The second described how to deploy the trained TensorFlow model on the Google Cloud ML Engine.

In this third post, I focus entirely on MATLAB in order to explore its machine learning capabilities, specifically for prototyping ML models. The topic of deploying the trained models for production is touched upon but not expanded here.

As in my previous posts, I make no apologies for any technical decisions which may be considered sub-optimal by those who know better. This was an AI/ML learning exercise for me, first and foremost.

The Goal

Devise an ML algorithm to forecast the (aviation) weather, in half-hour increments, up to three days into the future, using historical time series of weather data. I was motivated to tackle this specific problem because (i) high-quality aviation weather data is readily available -- so the task of data preparation and cleansing is minimised, enabling the focus to be primarily on the ML algorithms; (ii) I've lamented the loss of the (very useful -- in my opinion) 3-day aviation forecast from the UK MET Office website ever since they removed it a couple of years ago.

The Dataset

For this initial exploration, I used aviation weather data (METARs) for Ronaldsway Airport (EGNS) on the Isle of Man since (i) I am based here and fly my Scottish Aviation Bulldog from here; (ii) being located in Northern Europe, the weather is varied and changeable, so the models (hopefully) have interesting features to be detected during training, thereby exercising the models' forecasting capabilities more than if the weather was uniform and more easily predictable. The modelling techniques presented for this single location can of course be extended to any other location for which analogous data exists (i.e., pretty-much anywhere).

The underlying weather data was obtained from US NOAA/NWS  (as utilised in JustMET, iNavCalc, and ReallySimpleMovingMap). The training set comprised METAR data captured every half hour for EGNS over the 3.5 month period from 30 December 2017 through 13 April 2018. Each half-hourly recorded METAR was persisted (for long term storage e.g., for future analysis) to an Amazon DynamoDB database, as well as to a Microsoft Azure SQL database (for temporary storage and staging). The triggering to capture each successive half-hourly METAR via web-service calls to NOAA was implemented using a Microsoft Azure Scheduler.

The Toolkit


Solution Path

Pre-processing the Raw Data

The first task was to pre-process the raw data, in this case, primarily to correct for data gaps since there was (inevitably) some (unforeseen) down-time over the weeks and months in the automated METAR capture process.

To start out, the data was retrieved (into MATLAB) from the Azure SQL database using the MATLAB Database Toolbox functionality. GOTCHA: the graphical interface bundled with the MATLAB Database Toolbox is rather limited and cannot handle stored procedures. Instead, the command-line functions must be used when retrieving data from databases via stored-procedures.

Next, the retrieved data was reformatted into a MATLAB Timetable. This proved to be a very convenient format for manipulating and preparing the data. It is an extension of the MATLAB table format, designed specifically to handle time-stamped data, and therefore ideal for handling the multivariate METAR time-series. Note: the MATLAB table format is a relatively recent innovation, and seems to be MATLAB's answer to the DataFrame object from the powerful and popular pandas library available for Python.

The set of 8 variables collected for analysis and forecasting are summarised below (for detailed definitions, see here). The variables pertain to observations made on the ground at the location of the given weather station (airport), distributed via the METAR reports. I have kept the units as per the METARs (rather than converting to S.I.). Each observation (i.e., at each sample time) contains the following set of data:
  • Temperature (units: degrees Celsius)
  • Dewpoint (units: degrees Celsius)
  • Cloudbase (units: feet)
  • Cloudcover (units: oktas, dimensionless, 0 implies clear sky, 8 implies overcast), converted to numerical value from the raw skycover categorical variable from the METAR (i.e.,"CAVOK" -- 0 oktas; "FEW" -- 1.5 oktas; "SCT" -- 3.5 oktas; "BKN" -- 5.5 oktas; "OVC" -- 8 oktas). Note: whenever "CAVOK" was specified, this was taken to set the Cloudcover value to zero and the Cloudbase value to 5000 feet -- even if skies were clear all the way up since the METAR vertical extent formally ends at 5000 feet above the airport (typically). Making this assumption (hopefully) means erring on the safe side, even if it tampers -- in a sense --with the natural data. 
  • Surface Pressure (units: Hectopascals)
  • Visibility (units: miles)
  • Wind Speed (units: knots)
  • Wind Direction (units: degrees from True North)
Additionally, the date and time of each observation is given (in UTC), from which the local time-of-day can be determined from the known longitude via the MATLAB expression:

timeofday=mod(timeutc+longitude*12/180,24);

Note: as well as local-time-of-day, it would also be worthwhile to include the day-of-year in the analyses (since weather is known to be seasonal). This was not done for now since the entire data set spans only 3.5 months rather than at least one complete year. This means that since the validation set is taken from the 30% tail-end (see later), and the training set is the 70% taken from the start, up to the beginning of the tail-end, there will be no common values for day-of-year in both the training and validation sets, so it is not sensible to include day-of-year for now. However, if/when the collected data set spans sufficient time (1.3 years for the 30:70 split) such that the training and validation sets both contain common values for day-of-year, then it should be included in a future refinement.

Filling the Data Gaps

The MATLAB command for creating the aforementioned timetable structure from individual vectors is as follows:

TT=timetable(datetimeUTC,temperature,dewpoint,cloudbase,
cloudcover,visibility,sealevelpressure,windspeed,...
winddirection,timeofday);

Since all the variables in the timetable are continuous (rather than categorical), it is simple in MATLAB to fill for missing data by interpolation as follows:

% Define the complete time vector, every 30 minutes
newTimes = [datetimeUTC(1):minutes(30):datetimeUTC(end)];

% Use interpolation for numerical values

TT.Properties.VariableContinuity = {'continuous','continuous','continuous','continuous',...
'continuous','continuous','continuous','continuous',...
'continuous'};

% Perform the data filling
TT1 = retime(TT,newTimes);


METAR Data Time Series Plots

These cleaned METAR time series for EGNS are plotted in the graphs below and serve as the source of training and validation data for the upcoming ML models. Each time series is 4,974 data points in length (corresponding to the 3.5 month historical record, sampled each half hour).






Modelling Phase 1: LSTM models for each variable

Since it is generally known that long short-term (LSTM) neural networks are well-suited to the task of building regression models for time series data, it seemed the natural starting point for these investigations, not least since LSTM layers are now available within MATLAB.

A separate LSTM model was therefore built for each of the METAR data variables by following the MATLAB example presented here. Note: it is possible to build an LSTM model for multiple time series taken together, but I felt it would be "asking too much" of the model, so I opted for separate models for each (single variable) time series. It may be worthwhile revisiting this decision in a future attempt at refining the modelling.

When building the models for the METAR data, the various (hyper-)parameters available within the LSTM model for "tweaking" (such as the number of neurons, per layer the number of layers, the number of samples back in time to be used to fit the model for looking forward in time, etc) not surprisingly needed to be changed from the default settings and from those settings presented in the MATLAB example, in order to achieve useful results on the METAR data. This is not unreasonable, given that the data sets are so different, and that machine learning is essentially data-driven. By trial-and-error experimentation, the following code snippet captures the set of hyper-parameters which were found to be effective on the METAR variables.

%% Example data setup for LSTM model on the first chunk of data

% Look back 92 hours. Seems suitable for METAR data
numTimeStepsTrain = 184; 

% 3 days maximum forecast look-ahead
numTimeStepsPred = 144;
windowLength = numTimeStepsPred+numTimeStepsTrain;


data=data_entire_history(1:windowLength);
% where data_entire_history is entire time series

XTrain = data(1:numTimeStepsTrain);
% where data is the first window of time series

YTrain = data(2:numTimeStepsTrain+1);
% target for LSTM is one time-step into the future

XTest = data(numTimeStepsTrain+1:end-1);
% inputs for testing the LSTM model at all forecast look-aheads

YTest = data(numTimeStepsTrain+2:end);
% targets for testing the LSTM model at all forecast look-aheads

%For a better fit and to prevent the training from diverging,
%standardize the training data to have zero mean and unit variance.
%Standardize the test data using the same parameters as the training
%data.

mu = mean(XTrain);
sig = std(XTrain);

XTrain = (XTrain - mu) / sig;
YTrain = (YTrain - mu) / sig;

XTest = (XTest - mu) / sig;

%% Define LSTM Network Architecture
inputSize = 1;
numResponses = 1;
numHiddenUnits =65; % seems suitable for METAR data

layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits)
    fullyConnectedLayer(numResponses)
    regressionLayer];

maxEpochs=200;
opts = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',0.005, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',125, ...
    'LearnRateDropFactor',0.2, ...
    'Verbose',0);

% Train the LSTM network with the specified training options by

% using trainNetwork.

net = trainNetwork(XTrain,YTrain,layers,opts);

% Forecast Future Time Steps
% To forecast the values of multiple time steps in the future,

% use the predictAndUpdateState function to predict time steps
% one at a time and update the network state at each prediction.
% For each prediction, use the previous prediction as input to
% the function.
% To initialize the network state, first predict on the training 
% data XTrain. Next, make the first prediction using the last
% time step of the training response YTrain(end). Loop over the 
% remaining predictions and input the previous prediction to
% predictAndUpdateState.


net = predictAndUpdateState(net,XTrain);
[net,YPred] = predictAndUpdateState(net,YTrain(end));

numTimeStepsTest = numel(XTest);
for i = 2:numTimeStepsTest
    [net,YPred(1,i)] = predictAndUpdateState(net,YPred(i-1));
end


% Unstandardize the predictions using mu and sig calculated

% earlier.

YPred = sig*YPred + mu;


% The training progress plot reports the root-mean-square error

%(RMSE) calculated from the standardized data. Calculate the RMSE
% from the unstandardized predictions.

rmse = sqrt(mean((YPred-YTest).^2))

Note: it may be better to tweak the hyper-parameters specific to the modeling of each variable, again, not done, here, but an idea for future enhancements.

Using the above-mentioned set of hyper-parameters, the graph below shows a typical LSTM training convergence history (in this case, for Temperature). Note: this plot, (optionally) generated by MATLAB interactively during training, is similar to that available via TensorBoard (when training TensorFlow models), but with the added advantage that there is a "Stop Button" on the MATLAB interface that enables the user to stop the training at any time (and capture the network parameters at that time).



The typical forecast results (i.e., from just one arbitrary window of the historical data set) obtained from the LSTM models for each variable are shown in the following plots. In each plot, the (92 hour) training data window is (arbitrarily) chosen to be before "time zero" when the forecast starts (and extends from half an hour to 72 hours i.e., 3 days). The black curve is the training data, the blue curve the forecast results, and the red curve the test values against which the forecast performance can be directly compared. For all METAR variables, the forecasts are seen to be effective only out to a few hours at most, with significant deviations beyond that -- and some variables are worse than others.




Forecast Error Plots

By re-training each LSTM model for each METAR variable, at each successive sample point in the test data set, and comparing with the known measurements for each forecast time, it is possible to build-up a statistical picture of the average performance of the models over forecasting time. For the 3.5 month historical METAR data set, sampled every half hour, and subtracting two window widths (first and last), this implies training approximately 4600 LSTM models per METAR variable. Note: when it comes to production deployment of the models, the principle of re-training each model at each sample point i.e., each time a new weather observation comes in -- every half hour in the case of METARs, --means that the model for the given variable is the most up-to-date that it can be at any given time, for use in forecasting forward from that point in time. By averaging the mean-squared error in the (4600) forecasts of each of the trained models sliding forward in time from the begin to the end of the entire data set, the expected accuracy of the forecast for each look-ahead forecast time can be assessed. These accuracies (in terms of absolute average mean-square error and relative average mean-square error) are plotted in the Forecast Error Plots below (blue curves, labelled "LSTM alone") for each variable versus look-ahead time (from half an hour to three days). On the absolute error plots, the standard deviation of the underlying observations is also shown (denoted sdev obs). Whenever the error curve is (well) below the sdev obs line, the forecast can be considered better than random. But whenever the error curve is near to or above the sdev obs line, the forecast is no better than random, and should be considered as ineffective. Similarly, on the relative error curves, whenever the error is below 50% (as indicated by the line marked 50% error), the forecast may be considered as effective, though the lower the better. Above a relative error of 50%, the forecast should be considered as ineffective.



In terms of the range to which the LSTM forecasts are considered useful (in terms of look-ahead period), these values have been extracted from the plots and summarised in the following table.

Variable Usable forecast 
Temperature     6 hours
Dewpoint     2 hours
Cloudbase     2 hours
Cloud Cover     No usable forecast
Sea-Level Pressure    25 hours
Visibility    2 hours
Windspeed    3 hours
Wind Direction    3 hours

The LSTM forecasts are generally seen to be useful out to a few hours, with some exceptions: Sea-Level Pressure forecast is good out to (an impressive)  25 hours; but Cloud Cover forecast is no good at all.


Modelling Phase 2: Using the LSTM model outputs in combination with the other METAR variables to perform regressions

The benefit of the LSTM modelling from Phase 1 above is that the recent history of a given variable is utilised in predicting it's future path. This should presumably be better than using just a single snapshot in time (e.g., now) to predict the future. That said, from the results obtained, the accuracy diminishes quite significantly when forecasting out beyond a couple of hours or so. In this next phase, the idea is to utilise the information from the other METAR variables to help improve the forecasts for the given variable (which so far has been based only on histories of itself). This makes sense from the laws-of-physics point of view.  For example, the temperature an hour from now will depend not only on the temperature now, but on: the time of day (since there is a diurnal temperature cycle) of the measurement and of the desired forecast; the extent of cloud cover; the wind strength (possibly), etc., so it makes intuitive sense to somehow tie these other known measurements and time-factors into the forecasts for a given variable.

The strategy therefore is to re-cast the forecasting task as a neural network multivariate regression problem where the inputs (regressors) comprise: (i) all the measured METAR variables at a given time, (ii) the time of day of those measurements, (ii) the time difference between the measurement time and the time of the forecast looking ahead; (iii) and the estimated value of the variable in question at the time of the forecast looking ahead obtained from the LSTM model from Phase 1. The output (target) of the regression is the estimate of the value of the variable in question at the time of the forecast looking ahead (for each look ahead time). For training, all input and output values are known. Moreover, since the LSTM models have been re-defined every half hour (by sliding through the entire data set), a large set (i.e., 664,521) of input/output values is available for training this neural net regression model.

To perform the neural network regression, MATLAB has two options available: the (older) train function; and the newer trainNetwork function (which was used above for the LSTM training). The differences between the two methods are discussed here. I opted to use the newer trainNetwork method since it is focused on Deep Learning and can make use of large data sets running on GPUs. At this point I would like to extend my gratitude to Musab Khawaja at the Mathworks who provided me with sample code (in the snippet below) demonstrating how to adapt the imageInputLayer (normally used for image processing) for use in general-purpose multivariable regression.

As with the LSTM modelling, the hyper-parameters need to be chosen for the training. Again, by trial-and-error, the following common set (within the code snippet below) proved to be suitable for each of the METAR data fits:


% Here, x is the array of appropriate regressor observations, and
% t is the vector of targets

% Create a table with last col the outputs
data=array2table([x' t']);
[sizen,sizem]=size(x);

numVars = sizen;
n = height(data);

% reshape to 4D - first 3D for the 'image', last D for each
% sample
dataArray = reshape(data{:,1:numVars}', [numVars 1 1 n]);

% ...assume first numVars columns are predictors (regressors)

output = data{:,numVars+1}; % assume response is last column


% Split into 70% training and 30% test set
pc = 0.7;
rng('default') % for reproducibility
idx=1:n;
% Don't shuffle yet, since don't want training sliding
% window to leak into validation set
max_i = ceil(n*pc);
idxTrain = idx(1:max_i);
idxTest = idx(max_i+1:n);


% Now shuffle the training and validation sets independently
idxTrain=randsample(idxTrain,length(idxTrain));
idxTest=randsample(idxTest,length(idxTest));

% Prepare arrays for regressions

trainingData = dataArray(:, :, :, idxTrain);
trainingOutput = output(idxTrain);

testData = dataArray(:, :, :, idxTest);
testOutput = output(idxTest);
testSet = {testData, testOutput};


% Define network architecture
layers = [...
    imageInputLayer([numVars,1,1]) % Non-image regression!
       
    fullyConnectedLayer(500) % Seems suitable for METAR data
    reluLayer
   

    fullyConnectedLayer(100)
    reluLayer
   
    fullyConnectedLayer(1)
    regressionLayer
    ];

% Set training options
options = trainingOptions('adam', ...
    'ExecutionEnvironment','gpu', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 1000, ...
    'MiniBatchSize', 10000,...% Seems suitable for METAR data
    'ValidationData', testSet, ...
    'ValidationFrequency', 25, ...
    'ValidationPatience', 5, ... 

    'Shuffle', 'every-epoch', ...
    'Plots','training-progress');


% Train
net = trainNetwork(trainingData, trainingOutput, layers, options);


% Predict for validation plots
y=predict(net,testData);
VALIDATION_INPUTS=reshape(testData,[numVars,length(testData)])';
VALIDATION_OUTPUTS=testOutput;
VALIDATION_y=y;



A Deep Learning model with the afore-mentioned hyper-parameters was trained for each METAR variable, in turn, selected as the target. As evident from the code snippet, the first 70% of the entire data set was used for training (which amounted to 465,165 data points), the remaining 30% ("tail-end") for validation (which amounted to 199,356  data points).

The errors in the corresponding forecasts when applied to the validation data are displayed as the red curves (labelled "Multi-regression plus LSTM") in the Forecast Error Plots presented earlier. For comparison, the yellow curves (labelled "Multi-regression alone") in t
he error plots correspond to the multivariate regressions re-trained but this time excluding the LSTM outputs as regressors. It can be seen that in all cases, the LSTM models out-perform the Multi-regressions when limiting our attention to those regimes where the forecasts are deemed to be useful i.e., when the absolute rms error is below the standard deviation of the observations and when the relative rms error is below 50%. This came somewhat as a surprise, since intuitively it was felt that the addition of information via the other variables should have been more beneficial than was observed. Perhaps a refined regression analysis, as discussed below, would reveal such. Outside the usable regimes, the Multi-regressions sometimes out-perform the LSTM models, but by then, none of the models are effective (errors too large). It is also interesting to note that the inclusion of the LSTM outputs as inputs to the Multi-regressions generally improves their performance (i.e., red curves lower than yellow curves) -- but not in every case (see for example, the Cloudbase forecasts, where the yellow curve is lower than the red curve).


Other Things To Try


Some ideas to try next include (not exhaustive):

  • The LSTM models were trained to target the observations just one sample period (half hour) away from the inputs, but then asked to predict out to three days away, with the accuracy dropping off dramatically within the first few samples ahead. Instead, it might be worth trying training a different LSTM model for each look-ahead period (by down-sampling before training). This would entail having a different LSTM model per look-ahead period, but perhaps the forecast accuracy would be better, particularly further out ?
  • Likewise, for the Multi-regression models, all look-ahead periods were included in a single regression model. This means that the accuracy for the short look-ahead periods is penalised by the errors further out (since the stochastic gradient descent optimiser minimises a single number: the rms error across all look-ahead periods). Instead, it might be worth trying training a different Multi-regression model for each look-ahead period. Again, this would entail more models, but the accuracy may be better.
  • The hyper-parameter settings for all models were set via trial-and-error. It might be worth trying a more systematic approach e.g., by invoking an outer layer of optimisation which uses for example, techniques invoking genetic algorithms, to choose the optimum set of hyper parameters.
  • Try incorporating additional information in the regressions. For example, weather data for other locations known to correlate with the weather in the given location. Case-in-point: since most of the weather systems on the Isle of Man originate from the Atlantic i.e., to the west, it might be useful to incorporate weather data from Ireland, with suitable lag, to try and improve the model predictions for the Isle of Man. 

End-to-End Recipe For Weather Forecaster


If the Multi-regression results can be improved by the suggestions above, such that they can compete with the LSTM models over some portion(s) of the look-ahead range, then the following general recipe for an online ML-based weather forecaster can be proposed:

Every few months (or so):

  1. For a given location, gather as long a history of half-hourly METAR data as possible/available, ideally over at least the past year in order to capture seasonal variations
  2. From the data in 1), perform a set of LSTM fits (sliding across the data, one sample at a time) to obtain the estimated quantities for use as inputs (alongside the METAR data) to the multivariate regressions. Note: for the 3.5 month historical data set, this set of fits took approximately two days per METAR variable on a p2.xlarge (GPU-equipped) AWS instance, owing to the many thousands of LSTM training runs required.  
  3. With the data from 1) combined with the estimates from 2), Perform Deep learning multivariate regressions for each target variable. Refer to this trained model as the REGRESSION MODEL for the given target variable. Also perform a regression for the given target variable excluding the LSTM estimates as a regressor. Refer to this trained model as the REDUCED REGRESSION MODELfor the given target variable. Note: for the 3.5 month historical data set, these two fits took approximately one hour per METAR variable on a p2.xlarge (GPU-equipped) AWS instance. Re-create a revised set of Forecast Error Plots from the results of the runs in 2 & 3 (in order to be able to select the best model per forecast look-ahead period, see below).

Every time a new observation is received (half-hourly):

  1. Re-train the LSTM models, one per variable, using the latest measurement as the most recent available. For each variable, refer to this trained model as the LSTM MODEL for the given variable (note: this fit will take a few minutes per METAR variable on a p2.xlarge GPU-equipped AWS instance). 
  2. For each forecast look-ahead period (i.e., half hourly up to three days ahead), use each LSTM MODEL each REGRESSION MODEL, and each REDUCED REGRESSION MODEL to generate three different forecasts for the given variable (note: these will take only a few seconds per METAR variable on a p2.xlarge GPU-equipped AWS instance). For each forecast look-ahead period, choose the forecast (i.e., from the LSTM MODEL, the REGRESSION MODEL, or the REDUCED REGRESSION MODEL) depending on which gives the lowest rms error for the given forecast look-ahead period, by referring to the updated Forecast Error Plots. 

Production Deployment Possibilities


The optimum choice of computational and software platform for the Production Deployment of the end-to-end ML-based weather forecaster presented above is not at all clear, requiring a detailed  exploration of the available technical options and trade-offs. However, the following possibilities come to mind, each with its own advantages and disadvantages:

  1. Deploy on a suite of MATLAB-equipped Cloud-based server instances. Has the advantage that the code can be used essentially "as is" (since the MATLAB code is already written via the prototypes presented here). Has the disadvantage with respect to cost that the servers would have to be "always on", and the associated MATLAB licensing costs may become prohibitive.
  2. Use the MATLAB compiler to package the trained models into deployable libraries which can be installed within (say, Docker) containers which can be instantiated on-demand in the Cloud (and automatically shut down when dormant). Has the advantages that the code is essentially written (just needs to be run through the MATLAB compiler); and that by using containers, there is no need to incur the cost of "always on" server instances. There some open questions, however: can the (half hourly) re-training of the LSTM models via the trainNetwork function be compiled via the MATLAB Compiler?; can functions deployed from the MATLAB Compiler access GPUs, or must the GPU Coder be used?; can compiled MATLAB software running within containers access GPUs? 
  3. Re-write the models in an open-source ML framework such as TensorFlow and deploy on the Google Cloud ML Engine as exemplified here. Has the disadvantage that all the models would have to be rewritten, outside MATLAB.
  4. Any suggestions welcome : )

A Note On Workflow

In the past, I would tend to use MATLAB much like I would use other functional programming languages i.e., by creating many functions (subroutines) and calling them from a main program. However, by its very nature, machine learning is much more of a trial-and-error process than the type of analyses I have been used to. It is generally more amenable to the interactive process of defining a set of parameters, running the parameters through a script (e.g., which contains the ML model training commands), viewing the outputs (e.g., in terms of suitable performance metrics), re-assessing the assumptions and tweaking the parameters accordingly, then running the script again, etc., until a satisfactory outcome has been achieved.  In fact, this very mode of interaction has resulted in Jupyter Notebooks being one of the most widely-used IDEs for developing ML models in Python. Again, MATLAB seems to have their own recently-introduced answer to this: namely, the MATLAB Live Editor. As such, when starting out on this exploration, and having successfully used Jupyter Notebooks for Python on previous ML projects, I launched into using the MATLAB Live Editor for running the aforementioned interactive ML design scripts. Whilst I found this to be useful in the early prototyping stage for a given model, I reverted back to the tried-and-tested technique of executing scripts (stored in m-files) with embedded local sub-functions (for calling from loops in the given script). I simply found this mode of operation to be more productive. Also, the publish options from Live Edit seemed to be less flexible and less configurable than for normal m-file scripts.

Conclusions

  • MATLAB is a highly productive platform for prototyping Machine Learning (in particular, Deep Learning) algorithms. The data-wrangling tools are excellent. In my opinion, it is easier to develop the ML models in MATLAB than in Python/TensorFlow, but that could be due to the fact that I have a long experience (decades) with using MATLAB compared with only a few weeks using Python/TensorFlow.
  • Weather forecasting is a hard problem. The Deep Learning approaches developed here show some promise, particularly the LSTM models, but generally only out to a few hours -- and not to the 3 days desired at the outset. Further refinement (perhaps along the lines presented above in Other Things To Try) would hopefully improve the predictive ability of the models.

Saturday 5 May 2018

Navigation -- New Track Reference Technique

Stuck on the ground due to fog on the Isle of Man today, waiting to do an air-test on one of our Bulldogs (to bed-in its brand new engine), our test pilot Robert Miller (with 21,000 hours on non-airline military and civilian aircraft!) spent the afternoon in Costa's Coffee Castletown explaining the New Track Reference Technique for aerial navigation using map and compass. Here's his write-up.