Part I: Known Data Noise
We assume here that the data noise is known. For a hierarchical example, where the data noise is treated as unknown, please refer to Part II: Hierarchical Sampling of this tutorial.
Import libraries and define constants
import bayesbay as bb
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(30)
# dimensions and true coefficients
N_DIMS = 4
M0, M1, M2, M3 = 20, -10, -3, 1
# data and noise
N_DATA = 15
DATA_X = np.linspace(-5, 10, N_DATA)
DATA_NOISE_STD = 20
Forward operator and noisy observations
In the following code block we define the forward operator — which will be used later in the tutorial to generate data predictions based on the considered model coefficients — and the observed data.
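Written out explicitly, the noise-free data satisfy \(\mathbf{d} = \mathbf{G}\mathbf{m}\), where \(\mathbf{m} = (m_0, m_1, m_2, m_3)^T\) and \(\mathbf{G}\) is the Vandermonde matrix of increasing powers built by np.vander below:
\[
d_i = m_0 + m_1 x_i + m_2 x_i^2 + m_3 x_i^3,
\qquad
\mathbf{G} =
\begin{pmatrix}
1 & x_1 & x_1^2 & x_1^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_N & x_N^2 & x_N^3
\end{pmatrix}.
\]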
fwd_operator = np.vander(DATA_X, N_DIMS, True)
y = fwd_operator @ [M0, M1, M2, M3]
y_noisy = y + np.random.normal(0, DATA_NOISE_STD, y.shape)
fig, ax = plt.subplots()
ax.set_title('Data')
ax.plot(DATA_X, y, 'k', lw=2, label='Predicted data (true model)')
ax.plot(DATA_X, y_noisy, 'ro', label='Noisy observations')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.grid()
ax.legend()
plt.show()

Setting up the Bayesian sampling
To solve the inverse problem of inferring the coefficients \(\mathbf{m}\) from the observations, we need to define:
A Parameterization, encapsulating the unknown parameters and their prior probabilities
A forward function, enabling the prediction of data points \(\mathbf{d}_{pred}\) from the model \(\mathbf{m}\)
What we refer to as a Target, that is, a Python object designed to store all information about a given data set
1. Prior probability
Here, we assume that prior information on the sought parameters is available. For instance, while we may not know the true values of \(m_1\) and \(m_2\), we understand that these values are bounded within specific ranges, namely \(-13 \leq m_1 \leq -7\) and \(-10 \leq m_2 \leq 4\). As demonstrated in the following block, this prior information can be implemented through UniformPrior, that is, a free parameter characterized by a uniform prior probability.
Conversely, we might know that \(m_0\) and \(m_3\) are likely associated with specific values, without, however, being able to provide hard bounds for such values. It is then natural to define the prior probabilities of these parameters using a Gaussian distribution. This is achieved in BayesBay through GaussianPrior. (Different options for the prior of a free parameter can be implemented via CustomPrior.)
Finally, note the argument perturb_std in the block below, taken by both UniformPrior and GaussianPrior. The numerical value assigned to this argument represents the standard deviation of the Gaussian distribution used to propose a new model by perturbing the considered parameter at a given Markov-chain iteration.
m0 = bb.prior.GaussianPrior(name="m0", mean=20, std=1, perturb_std=0.5)
m1 = bb.prior.UniformPrior(name="m1", vmin=-13, vmax=-7, perturb_std=0.4)
m2 = bb.prior.UniformPrior(name="m2", vmin=-10, vmax=4, perturb_std=0.5)
m3 = bb.prior.GaussianPrior(name="m3", mean=1, std=0.1, perturb_std=0.1)
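To make the role of perturb_std concrete, here is a minimal sketch in plain NumPy (purely illustrative; BayesBay handles proposals internally, and the variable names below are not part of its API):
# Illustrative only: at each Markov-chain iteration, a new value for a
# parameter is proposed by adding a zero-mean Gaussian step whose standard
# deviation is the perturb_std assigned to that parameter.
current_m2 = -3.0   # hypothetical current value of m2
proposed_m2 = current_m2 + np.random.normal(0, 0.5)   # 0.5 = perturb_std of m2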
2. Parameter space and parameterization
The free parameters we have defined, regardless of their prior probability, should be used in BayesBay to create what we call a ParameterSpace. Mathematically, ParameterSpace can be thought of as an \(n\)-dimensional vector space. Each point (vector) within this space corresponds to a free parameter in the inference problem, and the position of each point defines the model parameters \(\mathbf{m}_i \in \mathbb{R}^n\), with \(\mathbf{m}_i \subseteq \mathbf{m}\). Importantly, in our implementation \(n\) can be treated as a variable to enable trans-dimensional Bayesian sampling.
From a computer programming perspective, ParameterSpace can be seen as a specialized container that not only groups a number of free parameters but also determines their dimensionality. For instance, in this tutorial's case, ParameterSpace will contain four distinct free parameters (namely, the unknown coefficients \(m_0\), \(m_1\), \(m_2\), and \(m_3\)), and since these are scalars, its dimensionality will remain fixed and equal to one throughout the inversion. Finally, under the hood ParameterSpace also defines the perturbation functions used to propose new model parameters from the current ones (for more information, see the module bayesbay.perturbations).
Having defined one or more instances of ParameterSpace, these should be used to define a Parameterization. Parameterization is a relatively simple object compared to ParameterSpace, and its main purpose is to aggregate all model parameters (from all specified instances of ParameterSpace) to define \(\mathbf{m} = \lbrace \mathbf{m}_1, \mathbf{m}_2, \mathbf{m}_3, \dots \rbrace\).
param_space = bb.parameterization.ParameterSpace(
    name="my_param_space",
    n_dimensions=1,
    parameters=[m0, m1, m2, m3],
)
parameterization = bb.parameterization.Parameterization(param_space)
3. Forward problem
Another fundamental ingredient in BayesBay is the definition of the forward function. This function is required to return the predicted data in the form of a numpy array, which is then used to calculate the likelihood \(p({\bf d}_{obs} \mid {\bf m})\). As illustrated in the code block below, the forward function should be programmed to take in a bayesbay.State, from which all model parameters can be accessed.
def fwd_function(state: bb.State) -> np.ndarray:
    m = [state["my_param_space"][f"m{i}"] for i in range(N_DIMS)]
    return np.squeeze(fwd_operator @ m)  # dpred = G m
4. Observed data: the Target
In BayesBay, the observed data \(\mathbf{d}_{obs}\) should be used to define what we call a data Target, which, as demonstrated in Part II: Hierarchical Sampling, allows for treating the data noise as an unknown. Here, we assume the data noise is known: since the noise is independent and identically distributed with standard deviation \(\sigma\), the data covariance matrix is \(\mathbf{C}_d = \sigma^2 \mathbf{I}\), and its inverse can be passed to the Target simply as the scalar \(1/\sigma^2\).
target = bb.likelihood.Target(
    name="my_data",
    dobs=y_noisy,
    covariance_mat_inv=1/DATA_NOISE_STD**2,
)
5. Log Likelihood
Having defined the forward function and the target, these can be used to initialize an instance of the LogLikelihood class. (Alternatively, such an instance can be initialized through a function that either calculates \(\log(p({\bf d}_{obs} \mid {\bf m}))\) or \(\log \left( \frac{p(\mathbf{d}_{obs} \mid \mathbf{m}')}{p(\mathbf{d}_{obs} \mid \mathbf{m})}\right)\), where \(\mathbf{m}'\) denotes the proposed model.) The LogLikelihood instance represents the last ingredient needed to run the Bayesian sampling.
log_likelihood = bb.likelihood.LogLikelihood(targets=target, fwd_functions=fwd_function)
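For reference, under the Gaussian noise model assumed in this tutorial, such a user-defined function would compute (up to an additive constant) the quantity sketched below. This is only an illustration; how the callable is passed to LogLikelihood is documented in the API reference and not shown here.
def my_log_likelihood(state: bb.State) -> float:
    # Sketch only: Gaussian log-likelihood up to an additive constant,
    # assuming independent noise with standard deviation DATA_NOISE_STD
    residuals = y_noisy - fwd_function(state)
    return -0.5 * np.sum(residuals**2) / DATA_NOISE_STD**2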
Run the Bayesian sampling
We can now use all the defined variables to solve the inference problem. The code provided below is self-explanatory. We sample the posterior using ten Markov chains, each run on a separate CPU for 50,000 iterations. (To modify the number of CPUs employed in the sampling, see the argument parallel_config in bayesbay.BayesianInversion.run().) The first 20,000 iterations are discarded as the burn-in phase. After the burn-in, we save one model every 100 iterations.
Notice the arguments targets and fwd_functions of bayesbay.likelihood.LogLikelihood: rather than accepting only one data target and forward function, as defined earlier, these can accept a list of targets and a list of functions. This flexibility makes it straightforward to carry out joint inversions of multiple datasets, each potentially associated with different forward functions, as in the sketch below.
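For illustration only, a joint inversion could be set up along the following lines; the second target here is hypothetical and, to keep the sketch self-contained, simply reuses the observations and forward function defined above (in practice it would hold a different data set and, possibly, a different forward function):
# Hypothetical sketch of a joint inversion with a second data set
target_b = bb.likelihood.Target(
    name="my_data_b",
    dobs=y_noisy,
    covariance_mat_inv=1/DATA_NOISE_STD**2,
)
joint_log_likelihood = bb.likelihood.LogLikelihood(
    targets=[target, target_b],
    fwd_functions=[fwd_function, fwd_function],
)
In this tutorial, however, we stick to the single target and forward function defined above: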
inversion = bb.BayesianInversion(
    log_likelihood=log_likelihood,
    parameterization=parameterization,
    n_chains=10,
)
inversion.run(
    n_iterations=50_000,
    burnin_iterations=20_000,
    save_every=100,
    verbose=False,
)
for chain in inversion.chains:
    chain.print_statistics()
Chain ID: 0
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5655/50000 (11.31 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5655/50000 (11.31%)
Chain ID: 1
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5393/50000 (10.79 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5393/50000 (10.79%)
Chain ID: 2
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5542/50000 (11.08 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5542/50000 (11.08%)
Chain ID: 3
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5662/50000 (11.32 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5662/50000 (11.32%)
Chain ID: 4
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5580/50000 (11.16 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5580/50000 (11.16%)
Chain ID: 5
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5642/50000 (11.28 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5642/50000 (11.28%)
Chain ID: 6
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5392/50000 (10.78 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5392/50000 (10.78%)
Chain ID: 7
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5626/50000 (11.25 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5626/50000 (11.25%)
Chain ID: 8
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5714/50000 (11.43 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5714/50000 (11.43%)
Chain ID: 9
TEMPERATURE: 1
EXPLORED MODELS: 50000
ACCEPTANCE RATE: 5468/50000 (10.94 %)
PARTIAL ACCEPTANCE RATES:
ParamPerturbation(['my_param_space.m0', 'my_param_space.m1', 'my_param_space.m2', 'my_param_space.m3']): 5468/50000 (10.94%)
Retrieve the results and plot
In the following block, we first retrieve the samples collected during the inference by calling inversion.get_results(). We then plot all the saved predicted data along with their median.
results = inversion.get_results()
fig, ax = plt.subplots()
y_pred = np.array(results["my_data.dpred"])
ax.plot(DATA_X, y_pred[0], c='gray', lw=0.3, alpha=0.3, label="Inferred data")
ax.plot(DATA_X, y_pred[1:].T, c='gray', lw=0.3, alpha=0.3)
ax.plot(DATA_X, y, c='orange', label='Predicted data (true model)')
ax.plot(DATA_X, np.median(y_pred, axis=0), c="blue", label='Median inferred samples')
ax.scatter(DATA_X, y_noisy, c='r', label='Noisy observations', zorder=3)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
plt.show()
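Before turning to the figure of the inferred coefficients, it can be useful to summarise the sampled coefficients numerically. The short sketch below computes the posterior mean and standard deviation of each coefficient and compares them with the true values (the dictionary keys follow the same naming used in the plotting code):
# Posterior summary of each coefficient versus the true value used to
# generate the synthetic data
true_values = [M0, M1, M2, M3]
for i in range(N_DIMS):
    coeff_samples = np.ravel(results[f"my_param_space.m{i}"])
    print(f"m{i}: mean = {coeff_samples.mean():.2f}, "
          f"std = {coeff_samples.std():.2f}, true = {true_values[i]}")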

Finally, we create a figure that displays the inferred polynomial coefficients, with yellow dots indicating the true values. This is achieved using the ArviZ library, which provides useful functionalities for plotting inference results.
import arviz as az
fig, axes = plt.subplots(4, 4, figsize=(10, 8))
samples = {f'm{i}': results[f'my_param_space.m{i}'] for i in range(4)}
_ = az.plot_pair(
    samples,
    marginals=True,
    reference_values={'m0': M0, 'm1': M1, 'm2': M2, 'm3': M3},
    reference_values_kwargs={'color': 'yellow', 'ms': 10},
    kind='kde',
    kde_kwargs={
        'hdi_probs': [0.3, 0.6, 0.9],  # Plot 30%, 60% and 90% HDI contours
        'contourf_kwargs': {'cmap': 'Blues'},
    },
    ax=axes,
    textsize=10,
)
