Part II: Hierarchical Sampling

In this section, we demonstrate how to estimate the model parameters \(m_0\), \(m_1\), \(m_2\), and \(m_3\) while treating the standard deviation of the data noise as a free parameter to be inferred from the data themselves. If you haven’t already, we recommend reviewing Part I: Known Data Noise, which explains in detail the code needed to define and solve a simple regression problem.
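Concretely, assuming uncorrelated Gaussian noise, the likelihood itself depends on the unknown noise standard deviation \(\sigma\), which is sampled alongside the polynomial coefficients \(\mathbf{m} = (m_0, m_1, m_2, m_3)\):

\[
p(\mathbf{d}_{obs} \mid \mathbf{m}, \sigma) = \frac{1}{(2 \pi \sigma^2)^{N/2}} \exp\!\left[ -\frac{\lVert \mathbf{d}_{obs} - \mathbf{G}\mathbf{m} \rVert^2}{2 \sigma^2} \right],
\]

where \(N\) is the number of data points and \(\mathbf{G}\) is the forward operator defined below.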

Import libraries and define constants

import bayesbay as bb
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(30)
# dimensions and true coefficients
N_DIMS = 4
M0, M1, M2, M3 = 20, -10, -3, 1

# data and noise
N_DATA = 15
DATA_X = np.linspace(-5, 10, N_DATA)
DATA_NOISE_STD = 20

Data predictions and kernel

# forward operator G: Vandermonde matrix with columns [1, x, x^2, x^3]
fwd_operator = np.vander(DATA_X, N_DIMS, increasing=True)
y = fwd_operator @ [M0, M1, M2, M3]                         # noise-free data, d = G m
y_noisy = y + np.random.normal(0, DATA_NOISE_STD, y.shape)  # add Gaussian noise
fig, ax = plt.subplots()
ax.set_title('Data')
ax.plot(DATA_X, y, 'k', lw=2, label='Predicted data (true model)')
ax.plot(DATA_X, y_noisy, 'ro', label='Noisy observations')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.grid()
ax.legend()
plt.show()
[Figure: data predicted by the true model (black line) and noisy observations (red dots).]

Setting up the Bayesian sampling

1. Prior probability

m0 = bb.prior.GaussianPrior(name="m0", mean=20, std=1, perturb_std=0.5)
m1 = bb.prior.UniformPrior(name="m1", vmin=-13, vmax=-7, perturb_std=0.4)
m2 = bb.prior.UniformPrior(name="m2", vmin=-10, vmax=4, perturb_std=0.5)
m3 = bb.prior.GaussianPrior(name="m3", mean=1, std=0.1, perturb_std=0.1)

2. Parameter space and parameterization

param_space = bb.parameterization.ParameterSpace(
    name="my_param_space", 
    n_dimensions=1, 
    parameters=[m0, m1, m2, m3], 
)
parameterization = bb.parameterization.Parameterization(param_space)

3. Forward problem

def fwd_function(state: bb.State) -> np.ndarray:
    m = [state["my_param_space"][f"m{i}"] for i in range(N_DIMS)]
    return np.squeeze(fwd_operator @ m) # dpred = G m

4. Observed data: the Target

Important

In BayesBay, the hyperparameters related to the data noise are set up through the data Target. This is accomplished by specifying minimum and maximum bounds for the noise standard deviation and, optionally, for the noise correlation.

In this instance, we assume that there is no correlation between adjacent data points.

target = bb.Target("my_data", 
                   y_noisy, 
                   std_min=0, 
                   std_max=40, 
                   std_perturb_std=4,
                   noise_is_correlated=False)
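With these settings, the noise standard deviation is assigned a uniform prior over the interval \([0, 40]\), i.e. \(p(\sigma) = 1 / (\sigma_{max} - \sigma_{min}) = 1/40\). This flat density is what is drawn as the 'Prior' band in the results figure further below.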

5. Log Likelihood

log_likelihood = bb.LogLikelihood(targets=target, fwd_functions=fwd_function)

Run the Bayesian sampling

inversion = bb.BayesianInversion(
    log_likelihood=log_likelihood, 
    parameterization=parameterization, 
    n_chains=10
)
inversion.run(
    n_iterations=50_000, 
    burnin_iterations=20_000, 
    save_every=100, 
    verbose=False, 
)
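As a quick sanity check on the amount of output to expect: since models are saved every save_every iterations after the burn-in, the pooled posterior ensemble should contain about 3,000 samples across the 10 chains. A minimal sketch of this bookkeeping, using the run settings above:

# expected number of saved models (rough back-of-the-envelope check)
n_iterations, burnin_iterations, save_every, n_chains = 50_000, 20_000, 100, 10
n_saved_per_chain = (n_iterations - burnin_iterations) // save_every   # 300
print(n_saved_per_chain * n_chains)                                    # 3000 samples in total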

Retrieve the results and plot

In the following block, we plot all the saved data predictions along with their median (left) and a histogram of the noise standard deviation inferred from the observations (right). With a posterior mean of \(\sim\)16.5, the inferred noise standard deviation slightly underestimates the true value of 20 used to generate the synthetic data, but remains reasonably close to it.

results = inversion.get_results()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
y_pred = np.array(results["my_data.dpred"])
ax1.plot(DATA_X, y_pred[0], c='gray', lw=0.3, alpha=0.3, label="Inferred data")
ax1.plot(DATA_X, y_pred[1:].T, c='gray', lw=0.3, alpha=0.3)
ax1.plot(DATA_X, y, c='orange', label='Predicted data (true model)')
ax1.plot(DATA_X, np.median(y_pred, axis=0), c="blue", label='Median inferred samples')
ax1.scatter(DATA_X, y_noisy, c='r', label='Noisy observations', zorder=3)
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.legend()

ax2.axvline(x=DATA_NOISE_STD, color='r', lw=3, alpha=0.3, label='True noise')
pdf, bins, _ = ax2.hist(results['my_data.std'], density=True, bins=15, ec='w', zorder=100, label='Posterior')
ax2.fill_between([target.std_min, target.std_max], 1 / (target.std_max - target.std_min), alpha=0.2, label='Prior')
ax2.set_xlabel('Standard deviation')
ax2.legend()
plt.show()
[Figure: posterior data predictions and their median (left); posterior histogram of the noise standard deviation compared with its uniform prior and the true value (right).]

Finally, we create a figure that displays the inferred polynomial coefficients, with yellow dots indicating the true values. This is achieved using the ArviZ library, which provides useful functionality for plotting inference results.

import arviz as az

fig, axes = plt.subplots(4, 4, figsize=(10, 8))
samples = {f'm{i}': results[f'my_param_space.m{i}'] for i in range(4)}
_ = az.plot_pair(
    samples,
    marginals=True,
    reference_values={'m0': M0, 'm1': M1, 'm2': M2, 'm3': M3},
    reference_values_kwargs={'color': 'yellow',
                             'ms': 10},
    kind='kde',
    kde_kwargs={
        'hdi_probs': [0.3, 0.6, 0.9],  # Plot 30%, 60% and 90% HDI contours
        'contourf_kwargs': {'cmap': 'Blues'},
        },
    ax=axes,
    textsize=10
    )
plt.tight_layout()
[Figure: pair plot of the posterior samples of the polynomial coefficients, with yellow dots marking the true values.]