Regression

Generate data set

In this example, we solve a regression task with uncertainty estimation using Conditional Normalizing Flows and compare this approach with a Gaussian Process. Let’s generate a data set with observations sampled from a normal distribution whose mean and standard deviation both depend on the input variable X: $y \sim \mathcal{N}(\mu(X), \sigma(X)^2)$ with $\mu(X) = e^{-X}$ and $\sigma(X) = 0.05\,(X + 0.5)$.

# import of basic libraries
import matplotlib.pyplot as plt
from matplotlib import cm
import pandas as pd
import numpy as np

# function we would like to predict
def func(X):
    return np.exp(-X)

# input variable
X = np.linspace(0, 5, 500).reshape(-1, 1)

# mean values of targets y
mu = func(X)

# standard normal noise and an input-dependent standard deviation
eps = np.random.normal(0, 1, X.shape)
sigma = 0.05 * (X + 0.5)

# target variable we need to predict
y = mu + eps * sigma

plt.figure(figsize=(12, 6))
plt.plot(X, mu, label='Mean true', color='0', linewidth=2)
plt.scatter(X, y, marker='+', label='Observations', linewidth=2)

plt.plot(X, mu+sigma, label=r'$\mu \pm \sigma$ True', color='0', linewidth=2, linestyle='--')
plt.plot(X, mu-sigma, color='0', linewidth=2, linestyle='--')

plt.xlabel("X")
plt.ylabel("y")
plt.grid()
plt.legend()
plt.show()
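
As a quick sanity check of the data-generating process, the sketch below rebuilds the same data set with a fixed random seed and verifies that the standardized residuals $(y - \mu) / \sigma$ behave like draws from a standard normal distribution. The seed value and the variable names `rng`, `z`, `left` and `right` are assumptions introduced for this illustration only.

```python
import numpy as np

# hypothetical seed, chosen only to make the example repeatable
rng = np.random.default_rng(42)

# same inputs, mean and noise scale as in the cell above
X = np.linspace(0, 5, 500).reshape(-1, 1)
mu = np.exp(-X)               # mean of the target
sigma = 0.05 * (X + 0.5)      # noise standard deviation, grows with X
y = mu + sigma * rng.normal(0, 1, X.shape)

# standardized residuals should have mean close to 0 and std close to 1
z = (y - mu) / sigma
print(z.mean(), z.std())

# the raw residuals are more spread out at large X than at small X
left = (y - mu)[X < 1].std()
right = (y - mu)[X > 4].std()
print(left, right)
```

The growing spread from `left` to `right` is the heteroscedasticity that the models below are expected to recover.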

Normalizing flows

Now, we use a Conditional Real NVP to learn the conditional distribution $p(y|X)$ of the observations. Here, X is the condition and y is the target whose distribution we want to learn.

from probaforms.models import RealNVP

# fit normalizing flow model
model = RealNVP(lr=0.01, n_epochs=100)
model.fit(y, X) # (target, condition)

# sample new observations
y_gen = model.sample(X)
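
To give an intuition for what a conditional flow learns, here is a minimal NumPy sketch of a single conditional affine transform, the basic building block that Real NVP stacks in its coupling layers. It is not the `probaforms` implementation: the functions `mu_fn` and `log_sigma_fn` are hypothetical stand-ins for the neural networks that a real model would train, and they are fixed to the true mean and log standard deviation of this data set.

```python
import numpy as np

def mu_fn(x):
    # stand-in for a learned shift network (here: the true mean)
    return np.exp(-x)

def log_sigma_fn(x):
    # stand-in for a learned log-scale network (here: the true log std)
    return np.log(0.05 * (x + 0.5))

def forward(y, x):
    """Map the target y to the latent z given the condition x.

    Returns z and log|dz/dy|, the log-determinant of the Jacobian.
    """
    log_s = log_sigma_fn(x)
    z = (y - mu_fn(x)) * np.exp(-log_s)
    return z, -log_s

def log_prob(y, x):
    """Change of variables: log p(y|x) = log N(z; 0, 1) + log|dz/dy|."""
    z, log_det = forward(y, x)
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))
    return log_base + log_det

def sample(x, rng):
    """Inverse pass: draw z ~ N(0, 1) and compute y = mu(x) + sigma(x) * z."""
    z = rng.normal(size=x.shape)
    return mu_fn(x) + np.exp(log_sigma_fn(x)) * z

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 500).reshape(-1, 1)
y_new = sample(x, rng)

# average conditional log-likelihood of the generated points
print(log_prob(y_new, x).mean())
```

Training a flow amounts to maximizing this log-likelihood over the parameters of the shift and scale networks, and sampling uses the inverse pass.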

The figure below shows that the Normalizing Flow (NF) has successfully learned the distribution $p(y|X)$. We can use the NF to sample new objects from this distribution that are similar to the real observations.

plt.figure(figsize=(12, 6))
plt.plot(X, mu, label='Mean true', color='0', linewidth=2)
plt.scatter(X, y_gen, marker='+', label='Generated with NF', color='C1', linewidth=2)
plt.scatter(X, y, marker='+', label='Observations', linewidth=2)
plt.xlabel("X")
plt.ylabel("y")
plt.grid()
plt.legend()
plt.show()

Now, we repeat the sampling procedure several times and empirically estimate the mean of the target and its standard deviation at each X. We treat the standard deviation as the prediction uncertainty. The figure below demonstrates that the NF estimates both the mean and the standard deviation of the observations well.

y_preds = []

# repeat sampling several times
for i in range(1000):

    # sample with NF
    y_gen = model.sample(X)
    y_preds.append(y_gen)

y_preds = np.array(y_preds)

# estimate the mean of the predictions
mu_pred = y_preds.mean(axis=0).reshape(-1,)

# estimate the standard deviation of the predictions
sigma_pred = y_preds.std(axis=0).reshape(-1,)

plt.figure(figsize=(12, 6))

plt.fill_between(X[:, 0], 
                 y1=mu_pred+sigma_pred, 
                 y2=mu_pred-sigma_pred, color='C1', alpha=0.5, label=r'$\mu \pm \sigma$ by NF')
plt.plot(X, mu_pred, label='Mean by NF', color='C1', linewidth=4)

plt.plot(X, mu, label='Mean true', color='0', linewidth=2)
plt.scatter(X, y, marker='+', label='Observations', color='C0', linewidth=2)

plt.plot(X, mu+sigma, label=r'$\mu \pm \sigma$ True', color='0', linewidth=2, linestyle='--')
plt.plot(X, mu-sigma, color='0', linewidth=2, linestyle='--')

plt.xlabel("X")
plt.ylabel("y")
plt.grid()
plt.legend()
plt.show()
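
The same stack of repeated samples can also yield quantile-based intervals, which do not assume a symmetric predictive distribution. The sketch below is an illustration under stated assumptions: `sample_stub` is a hypothetical stand-in for `model.sample(X)` that draws from the true distribution, so the block runs without a trained model, and the 95% level is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# same synthetic data as in the first section
X = np.linspace(0, 5, 500).reshape(-1, 1)
mu = np.exp(-X)
sigma = 0.05 * (X + 0.5)
y = mu + sigma * rng.normal(size=X.shape)

def sample_stub(X):
    # hypothetical stand-in for model.sample(X): draws from the true p(y|X)
    return np.exp(-X) + 0.05 * (X + 0.5) * rng.normal(size=X.shape)

# stack repeated draws into an array of shape (n_draws, n_points)
y_preds = np.stack([sample_stub(X)[:, 0] for _ in range(1000)])

# empirical 2.5% and 97.5% quantiles at every X
lower, upper = np.percentile(y_preds, [2.5, 97.5], axis=0)

# fraction of observations inside the band; close to 0.95 for a calibrated model
coverage = np.mean((y[:, 0] >= lower) & (y[:, 0] <= upper))
print(coverage)
```

With a fitted flow, replacing `sample_stub` by `model.sample` gives the empirical coverage of the learned distribution on the observations.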

Gaussian Process

Finally, let’s solve the same task with a Gaussian Process (GP) to compare the results. The plot below shows that the GP predicts the mean with good precision. However, the kernel used here models the noise with a WhiteKernel, which has a single constant noise level, so the GP is not able to learn how the standard deviation depends on the input variable X.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import WhiteKernel, RBF, ConstantKernel as C

# define kernel
kernel = C() * RBF() + WhiteKernel()

# fit GP (the target is flattened to 1-D so that the predicted mean and std have the same shape)
gpr = GaussianProcessRegressor(kernel=kernel)
gpr.fit(X, y.ravel())

# make predictions of the means and standard deviations
mu_pred_gp, sigma_pred_gp = gpr.predict(X, return_std=True)

plt.figure(figsize=(12, 6))

plt.fill_between(X[:, 0], 
                 y1=mu_pred_gp+sigma_pred_gp, 
                 y2=mu_pred_gp-sigma_pred_gp, color='C1', alpha=0.5, label=r'$\mu \pm \sigma$ by GP')
plt.plot(X, mu_pred_gp, label='Mean by GP', color='C1', linewidth=4)

plt.plot(X, mu, label='Mean true', color='0', linewidth=2)
plt.scatter(X, y, marker='+', label='Observations', color='C0', linewidth=2)

plt.plot(X, mu+sigma, label=r'$\mu \pm \sigma$ True', color='0', linewidth=2, linestyle='--')
plt.plot(X, mu-sigma, color='0', linewidth=2, linestyle='--')

plt.xlabel("X")
plt.ylabel("y")
plt.grid()
plt.legend()
plt.show()
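
To put a number on the visual comparison, one option is the average Gaussian negative log predictive density (NLPD), which penalizes both a wrong mean and a wrong uncertainty. The sketch below is a hedged illustration rather than part of the original analysis: the helper `gaussian_nlpd` is a name introduced here, and the two predictors are stand-ins (the true input-dependent $\sigma(X)$ versus a single constant standard deviation), not the fitted NF and GP models.

```python
import numpy as np

def gaussian_nlpd(y, mean, std):
    """Average negative log density of y under N(mean, std**2); lower is better."""
    return np.mean(0.5 * np.log(2 * np.pi * std ** 2)
                   + (y - mean) ** 2 / (2 * std ** 2))

rng = np.random.default_rng(0)

# same synthetic data as in the first section, kept 1-D for simplicity
X = np.linspace(0, 5, 500)
mu = np.exp(-X)
sigma = 0.05 * (X + 0.5)
y = mu + sigma * rng.normal(size=X.shape)

# stand-in for an input-dependent predictor: uses the true sigma(X)
nlpd_hetero = gaussian_nlpd(y, mu, sigma)

# stand-in for a constant-noise predictor: one standard deviation for all X
sigma_const = np.full_like(sigma, np.sqrt(np.mean(sigma ** 2)))
nlpd_homo = gaussian_nlpd(y, mu, sigma_const)

print(nlpd_hetero, nlpd_homo)
```

To score the models from this example, pass `mu_pred, sigma_pred` and `mu_pred_gp, sigma_pred_gp` in place of the stand-ins. Note that the Gaussian form is only an approximation for the NF, whose predictive distribution need not be normal.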