Probabilistic Time Series Forecast
In this example, we will consider how to use conditional generative models for probabilistic time series forecast. The generative models help to estimate confidence intervals of the predictions.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
Read data set
We will predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.
# data download
!wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv
# read the data
data = pd.read_csv("airline-passengers.csv")
data.head()
Data visualization
# get a time series
Y = data[['Passengers']].values
plt.figure(figsize=(12, 4))
plt.plot(Y, label='True')
plt.legend(loc='best', fontsize=14)
plt.tight_layout()
plt.show()
Preprocessing
# scaling
from sklearn.preprocessing import StandardScaler
Y = StandardScaler().fit_transform(Y)
plt.figure(figsize=(12, 4))
plt.plot(Y, label='True')
plt.legend(loc='best', fontsize=14)
plt.tight_layout()
plt.show()
Autoregression model for time series forecast
Consider are observations of a time series. Autoegression model (AR) for the time series forecast assumes the following:
where is a predicted value.
In matrix form we will define this model as:
where
K = 10
M = 10
def AR_matrices(Y, K, M):
X_AR = []
Y_AR = []
for i in range(len(Y)):
if i < K-1: continue
if i+M >= len(Y): break
ax_ar = Y[i+1-K:i+1].reshape(-1, )
X_AR.append(ax_ar)
ay_ar = Y[i+M]#[0] # predict only y_{i+M}
# ay_ar = Y[i+1:i+M+1].reshape(-1, ) # predict y_{i+M}, ..., y_{i+2}, y_{i+1}
Y_AR.append(ay_ar)
return np.array(X_AR), np.array(Y_AR)
# prepare X and Y matrices
X_AR, Y_AR = AR_matrices(Y, K, M)
X_AR[:2]
Y_AR[:2]
X_AR.shape, Y_AR.shape
Train and test split
N = 90
X_AR_train, X_AR_test = X_AR[:N], X_AR[N:]
Y_AR_train, Y_AR_test = Y_AR[:N], Y_AR[N:]
Fit probabilistic model
from probaforms.models import RealNVP
# fit nomalizing flow model
model = RealNVP(lr=0.01, n_epochs=100, weight_decay=0.2)
model.fit(Y_AR_train, X_AR_train)
plt.figure(figsize=(12, 4))
plt.plot(model.loss_history)
plt.xlabel('Number of iterations')
plt.ylabel('Loss')
plt.tight_layout()
plt.show()
Single prediction
# sample new objects
Y_pred_test = model.sample(X_AR_test)
plt.figure(figsize=(12, 4))
plt.plot(Y_AR_test[:, -1], label='True', alpha=1.)
plt.plot(Y_pred_test[:, -1], label='Prediction', alpha=1.)
plt.legend(loc='best')
plt.tight_layout()
plt.show()
Multiple predictions
predictions_test = []
for i in range(1000):
Y_pred_test = model.sample(X_AR_test)
# store predictions
predictions_test.append(Y_pred_test[:, -1])
predictions_test = np.array(predictions_test)
plt.figure(figsize=(12, 4))
# mean prediction
plt.plot(predictions_test.mean(axis=0), label='Prediction (mean)', alpha=1., color='0')
# 90% CI
xx = np.arange(len(Y_pred_test))
y_top = np.quantile(predictions_test, q=0.95, axis=0)
y_bot = np.quantile(predictions_test, q=0.05, axis=0)
plt.fill_between(xx, y_bot, y_top, label="90% CI", alpha=0.5, color='C0')
# true
plt.plot(Y_AR_test[:, -1], label='True', alpha=1., color='C1')
plt.legend(loc='best')
plt.tight_layout()
plt.show()