During my PhD I was lucky to be supervised by a pretty famous Bayesian, Steve Gull, who in turn introduced me to John Skilling, the inventor of Nested Sampling [1]. My PhD went in half a dozen different directions, most of them well explored, and the Bayesian side of radio astronomy was something I leaned into fully with Steve. It didn't make it into my thesis, since it didn't cohere well with the large amount of other work I had already completed, but it was one of the most interesting things I've studied, and I still find enduring uses for it beyond anything that actually went into my thesis!
In this mini-series I'd like to demonstrate the power of Nested Sampling with a few different examples, implementing a fully Bayesian probabilistic and optimisation framework. Those who have done probabilistic modelling before may have encountered terms such as Maximum Likelihood modelling in conjunction with sampling algorithms such as Metropolis–Hastings (itself a member of the more general family of Markov Chain Monte Carlo methods). Whereas a Maximum Likelihood method is pseudo-Bayesian (it discards the evidence), Nested Sampling is a fully Bayesian method with several properties that make it very pleasant to implement in practice on modern parallel hardware. Throughout I have used D. S. Sivia's excellent introduction to the topic in his book "Data Analysis: A Bayesian Tutorial" [2].
Quick Theoretical Introduction
Before we get into Nested Sampling proper, let's start with Bayes' theorem: $$P(\theta | D,M) = \frac{P(D|\theta,M)P(\theta|M)}{P(D|M)} $$ where \(\theta\) is the general set of hyperparameters we will be optimising for, \(M\) is the model pertaining to our likelihood function, and \(D\) is our data. This can be rewritten as $$Ps = \frac{L\pi}{Z}$$ where \(Ps\) is our posterior, \(L\) our likelihood, \(\pi\) our prior, and \(Z\) our evidence.
The most important, and most neglected, term in Bayesian optimisation is known interchangeably as the "Evidence" or the "Marginal Likelihood". Its rigorous definition is $$Z = \iint \cdots \int L(\theta) \pi(\theta)\, d\theta $$ where we integrate the likelihood (weighted by the prior) over the hyperparameters. This is known as marginalisation, and gives rise to the term "Marginal Likelihood". In practice this multidimensional integral is often intractable, which is why many "Bayesians" abandon the evidence, as in maximum likelihood methods; that is not necessarily a bad thing when only a single model is in play. However, when comparing multiple models, say several neural networks of different configurations, comparison is impossible without this term. Nested Sampling treats the evidence as a first-class quantity, and this is what sets it apart from other methods.
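To make the role of the evidence concrete: if we fit two models \(M_1\) and \(M_2\) to the same data, their posterior odds follow directly from their evidences, $$ \frac{P(M_1|D)}{P(M_2|D)} = \frac{Z_1}{Z_2}\,\frac{P(M_1)}{P(M_2)}, $$ so without \(Z_1\) and \(Z_2\) there is no principled way to rank the models against each other.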
With the idea of computing an integral piece by piece using something like the rectangle or trapezium rule in mind, we can look at the problem from 90 degrees and see if we can do something similar. First, let's define the term $$ \bar{L} = \iint \cdots \int_{L(\theta) > \lambda } \pi(\theta)\, d\theta $$ which is the prior volume over which the likelihood is greater than a value \(\lambda\).
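Viewed this way, \(\bar{L}(\lambda)\) is a decreasing function of \(\lambda\) that runs from 1 (the whole prior) down to 0, and the evidence collapses into a one-dimensional integral over this prior volume, $$ Z = \int_0^1 L \, d\bar{L}, $$ where \(L\) is understood as the likelihood value whose contour encloses prior volume \(\bar{L}\). This is the integral we will attack piece by piece, provided we can estimate how \(\bar{L}\) shrinks at each step.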
Nested Sampling estimates this prior mass by defining \(N\) atoms sitting within an \(M\)-dimensional space. Each atom carries properties specific to the model being used, but at its most basic has a set of parameters, \(\theta_i\), and the value of the likelihood at that point. The likelihood is usually stored as a logarithm for numerical precision.
The Nested Sampling algorithm iterates, and at each iteration we sort the \(N\) atoms by their likelihood value and delete the sample with the lowest likelihood. This lowest likelihood marks a 'contour' in the likelihood domain, and we replace the deleted object with a new object drawn from within this likelihood contour. At each step the prior volume \(\bar{L}\) contracts, and we store the deleted object in a set of discarded objects from which we can compute the evidence and the posterior. For each deleted object we store its logarithmic area, which at iterate \(j\) is $$W = \log L + \bar{W}, \qquad \bar{W} = \log\!\big( 1 - \exp(-1/N) \big) - \frac{j}{N}.$$ This is the cornerstone of the entire method; for a full explanation of why we can constrain the prior width at each iteration in this analytic fashion, the interested reader should refer to [1] and [2] (specifically Chapter 9, Section 2.1 of [2]).
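To get a feel for the rate of compression: each iteration shrinks the remaining prior volume by a factor of roughly \(e^{-1/N}\) on average, so after \(j\) iterations $$ \bar{L}_j \approx e^{-j/N}, $$ which with the settings used below (\(N = 100\) atoms and 3000 iterations) corresponds to compressing the prior volume by a factor of roughly \(e^{-30} \approx 10^{-13}\).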
To compute the evidence we use this associated width at each element in the set of discarded samples to define an area \(A = hL\), where \(h = \bar{L}_{j-1} - \bar{L}_{j}\), allowing us to approximate the evidence as $$Z \approx \sum_k A_k. $$ From here the extension to the posterior at each point should be fairly obvious. It is possible to go beyond the rectangle summation I am using here when approximating the integral, but as noted in [2], the uncertainty in \(\bar{W}\) dominates over the error from the integration method.
To get a full introduction to this topic I highly recommend reading [1] and Chapter 9 of [2]. [3] explains one of the more widespread implementations of nested sampling, and has some interesting observations on key points such as selecting a new atom at each iteration.
Implementation in C
Below is what an example nested sampling loop looks like at its crux. I wrote it in C, as I personally find C more comfortable for numerical algorithms like this. It is based on an example, with my modifications, from the book "Data Analysis: A Bayesian Tutorial" by D. S. Sivia and J. Skilling.
You will note I have used stack allocations for our atoms, which is perfectly acceptable for the small examples here, but in larger examples could lead to a stack overflow.
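If \(N\) or MAX were pushed much larger, heap allocation would be the safer choice; below is a minimal sketch (assuming <stdlib.h> is included) of how the two arrays in main() could be swapped for heap allocations:
// Heap-allocated alternative to the fixed-size stack arrays used below.
nestedObject *objects = malloc( N * sizeof( nestedObject ) );
nestedObject *samples = malloc( MAX * sizeof( nestedObject ) );
if( objects == NULL || samples == NULL )
{
    fprintf( stderr, "Allocation failed\n" );
    return 1;
}
// ... run the nested sampling loop exactly as in main() below ...
free( samples );
free( objects );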
// Object within an n-dimensional space
typedef struct nestedObject {
double x[N_DIM]; // Coordinates: a fixed-size array with N_DIM elements.
double log_likelihood;
double logarea;
} nestedObject;
int main(void)
{
nestedObject objects[N];
nestedObject samples[MAX];
double logarea; // ln(width in prior mass)
double logLstar; // ln(Likelihood constraint)
double H = 0.0; // Information H (the Kullback-Leibler divergence of the posterior from the prior)
double logZ = -DBL_MAX; // ln(Evidence Z); Z starts at 0, so ln(Z) starts at -DBL_MAX
double logZnew; // Updated logZ
int i; // Object counter
int copy; // Duplicated object
int worst; // Worst object
int nest; // Nested sampling iteration count
fprintf( stderr, "Initialising priors on objects...");
// Set prior objects
for( i = 0; i < N; ++i )
{
Prior(&objects[i]);
}
fprintf( stderr, "done\n");
#if VERBOSE_DEBUG_MESSAGES
for( i = 0; i < N; ++i )
{
fprintf( stderr, "Prior Object %d: %g %g logL:%g\n", i, objects[i].x[0], objects[i].x[1], objects[i].log_likelihood );
}
#endif
// Outermost interval of prior mass
logarea = log( 1.0 - exp( - 1.0 / N ) );
// START - Nested Sampling Loop
for( nest = 0; nest < MAX; ++nest )
{
worst = 0;
for( i = 1; i < N; ++i )
{
if( objects[i].log_likelihood < objects[worst].log_likelihood ) {
worst = i;
}
}
objects[worst].logarea = logarea + objects[worst].log_likelihood;
// Update Evidence Z and Information H
logZnew = PLUS( logZ, objects[worst].logarea );
H = exp( objects[worst].logarea - logZnew ) * objects[worst].log_likelihood
+ exp( logZ - logZnew ) * ( H + logZ ) - logZnew;
logZ = logZnew;
// Posterior samples (optional, not needed for optimising over the parameter space)
samples[nest] = objects[worst];
// Kill the worst object in favour of a copy of a different survivor
do {
copy = (int)( N * UNIFORM ) % N;
}
while( copy == worst && N > 1 );
logLstar = objects[worst].log_likelihood;
objects[worst] = objects[copy];
// Evolve the copied object within our new constraint.
Explore( &objects[worst], objects, logLstar );
// Shrink the interval
logarea -= 1.0 / N;
}
// END - Nested Sampling Loop
#if VERBOSE_DEBUG_MESSAGES
for( i = 0; i < N; ++i )
{
fprintf( stderr, "Post Nested Sampling Object %d: %g %g logL:%g\n", i, objects[i].x[0], objects[i].x[1], objects[i].log_likelihood );
}
#endif
printf( "# iterations = %d\n", nest );
printf( "Evidence: ln(Z) = %g +- %g\n", logZ, sqrt( H / N ) );
printf( "Information: H = %g nats = %g bits\n", H, H / log( 2. ) );
printf( "Final posterior sample: \n" );
for(int i = 0; i < N_DIM; ++i) printf( "Coord %d:%g\n", i, samples[nest-1].x[i] ); // nest == MAX here, so the last stored sample is samples[nest-1]
printf( "logL:%g +- %g\n", samples[nest-1].log_likelihood, sqrt( H / N ) );
dumpSamples( samples, nest, logZ );
return 0;
}
So far so good, not too much going on. As Sivia points out, the only thing you need to do is define three methods:
- Prior – Sets up the prior for the nested sampling 'atoms' that we use to traverse our space.
- Explore – After one of the atoms has been killed off, this creates a new atom within our new likelihood constraint. I use a very rudimentary genetic algorithm to do this.
- Results – Takes our killed samples and uses them to derive some moments from our posterior distribution. Here we assume a Gaussian distribution, but in Part 2 we will consider a non-Gaussian distribution.
Here are the functions I used for a simple multimodal Gaussian model:
#define MODEL_MEAN ( 20.0 ) // Mean the same in all dimensions
#define MODEL_SIGMA ( 8.0 ) // Standard deviation the same in all dimensions
#define MODE_SHIFT ( 17.6 ) // Used for a second mode in multimodal model.
// Likelihood ("The Model") prototypes.
double _model( nestedObject *object ); // Basic gaussian model
double _multimodal_model( nestedObject *object ); // Bi-model gaussian model
double _lighthouse_model( nestedObject *object ); // The famous light-house model.
#define LIKELIHOOD( object ) ( _multimodal_model( object ) );
#define PRIOR_BOX ( 30.0 )
#define PRIOR_MIN ( MODEL_MEAN - PRIOR_BOX )
#define PRIOR_MAX ( MODEL_MEAN + PRIOR_BOX )
void Prior(nestedObject *object); // Sets up an object within our prior boundaries
// For re-nesting an object after it has been killed.
void Explore(nestedObject *object, // Object we want to replace
nestedObject *objects, // Objects
double logLstar); // The new log likelihood limit.
// Basic gaussian model
double _model(nestedObject *object)
{
double exponent = 0.0;
for( int i = 0; i < N_DIM; ++i )
{
double coord = object->x[i];
exponent -= ( pow( ( coord - MODEL_MEAN ) / MODEL_SIGMA , 2 ) );
}
exponent *= (double)( N_DIM / 2 );
return 1 + log( exp( exponent ) );
}
// Slightly more elaborate multi-model gaussian model
double _multimodal_model(nestedObject *object)
{
double exponent1 = 0.0;
double exponent2 = 0.0;
// Mode 1
for( int i = 0; i < N_DIM; ++i )
{
double coord = object->x[i];
exponent1 -= ( pow( ( coord - MODEL_MEAN ) / MODEL_SIGMA, 2 ) );
}
exponent1 *= (double)( N_DIM / 2 );
// Mode 2
for( int i = 0; i < N_DIM; ++i )
{
double coord = object->x[i];
exponent2 -= ( pow( ( coord - MODEL_MEAN - MODE_SHIFT ) / MODEL_SIGMA, 2 ) );
}
exponent2 *= (double)( N_DIM / 2 );
return 1 + log( exp ( exponent1 ) + 1.8 * exp( exponent2 ) );
}
void Prior(nestedObject *object)
{
for(int i = 0; i < N_DIM; ++i) {
object->x[i] = ( UNIFORM - 0.5 ) * ( 2 * PRIOR_BOX ) + MODEL_MEAN; // Uniform over [PRIOR_MIN, PRIOR_MAX]
}
object->log_likelihood = LIKELIHOOD( object );
}
// For re-nesting an object after it has been killed.
// We use a very crude genetic algorithm to do this by mixing co-ords from two random objects.
void Explore(nestedObject *object, // Object we want to replace
nestedObject *objects, // Objects
double logLstar)
{
double evolvedLogL = -DBL_MAX; // Start below any constraint so the first 'continue' cannot exit the loop early
nestedObject dummy;
do
{
int parents[2] = { 0 };
parents[0] = (int)( N * UNIFORM ) % N;
parents[1] = (int)( N * UNIFORM ) % N;
if( parents[0] == parents[1] ) continue; // We can not reproduce asexually.
for( int i = 0; i < N_DIM; ++i )
{
double mixing_factor = UNIFORM;
double new_gene = mixing_factor * objects[parents[0]].x[i] +
( 1 - mixing_factor ) * objects[parents[1]].x[i];
dummy.x[i] = new_gene;
}
evolvedLogL = LIKELIHOOD( &dummy );
}
while( evolvedLogL < logLstar );
dummy.log_likelihood = evolvedLogL;
dummy.logarea = 0.0;
*object = dummy;
}
void Results(nestedObject *samples,
int nest,
double logZ )
{
double x = 0.0, xx = 0.0;
double y = 0.0, yy = 0.0;
double w;
int i;
for( i = 0; i < nest; ++i )
{
w = exp( samples[i].logarea - logZ );
x += w * samples[i].x[0];
xx += w * samples[i].x[0] * samples[i].x[0];
y += w * samples[i].x[1];
yy += w * samples[i].x[1] * samples[i].x[1];
}
fprintf( stderr, "### Posterior Results ###\n" );
fprintf( stderr, "Mean(x) = %g, Std. Dev.(x) = %g\n", x, sqrt( xx - x * x ) );
fprintf( stderr, "Mean(y) = %g, Std. Dev.(y) = %g\n", y, sqrt( yy - y * y ) );
}
So above I've given a fair jumble of C code (the full source is given below), and now we can see it in action! I am cheating slightly here by using an analytical (and thus very fast and easy) definition of the likelihood, which serves our purposes for this example. In reality the likelihood is the most difficult term to define and will vary dramatically depending on the problem; for example, a sample from the 'likelihood' could come from:
- A closed form analytical representation – like this one here.
- Running a physics-based simulation with the current hyperparameters for the atom you are handling (a sketch of this follows after the list).
- Running a few physics-based simulations and fitting a curve to construct an approximate likelihood. This is known as "Likelihood-Free Inference". (There is an excellent article from Miles Cranmer on this: https://astroautomata.com/blog/simulation-based-inference/)
- Drilling a borehole to work out how much ore or water is at a particular geographic location, trying to find the greatest mass of extractable material.
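To make the simulation case concrete, here is a hypothetical sketch of how a simulation-based likelihood could slot into the same LIKELIHOOD interface used in the code above; run_simulation(), observed_data[], N_DATA and NOISE_SIGMA are placeholders standing in for whatever forward model and measurements your problem provides, not names from the code below.
// Hypothetical sketch only: run_simulation(), observed_data[], N_DATA and
// NOISE_SIGMA are placeholders; they are not defined anywhere in this post.
double _simulation_model( nestedObject *object )
{
    double simulated[N_DATA];
    run_simulation( object->x, simulated ); // Forward model evaluated at the atom's parameters
    double chi_squared = 0.0;
    for( int i = 0; i < N_DATA; ++i )
    {
        double residual = ( simulated[i] - observed_data[i] ) / NOISE_SIGMA;
        chi_squared += residual * residual;
    }
    // Gaussian log likelihood, up to an additive constant
    return -0.5 * chi_squared;
}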
With that in mind, let's move on to seeing this in action. We start with our atoms uniformly distributed throughout our prior space. The choice of prior is important but not critical, as the likelihood contours will quickly come to dominate it; however, with a uniform prior, if there is a large mode outside its bounds you will never find it. The prior is closely associated with the underlying model and will differ on a case-by-case basis.
At each iteration we kill the atom with the worst likelihood, and it becomes a sample in what will build up our posterior distribution. The newly deceased atom forms the next contour in the likelihood, and as we iterate we slowly restrict the prior mass being explored until, by construction, we are confined to the extrema of the prior weighted by the likelihood. At all points the quantity of greatest importance is the evidence, not the likelihood, which merely serves to demarcate the contours at each iteration. In this vein, we can also keep track of the negative entropy, more commonly known as the "Information", contained in our space. This is very similar to the thermodynamic optimisation process known as "Simulated Annealing", where we define an annealing schedule and slowly lower the "temperature" until we've reached our optimal position. I am not certain Nested Sampling can be said to take direct inspiration from these Statistical Physics methods, but it is well known that John Skilling (and Steve Gull) have both worked extensively in Statistical Physics over the years.
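For reference, the "Information" \(H\) accumulated in the sampling loop above is the Kullback–Leibler divergence of the posterior from the prior, $$ H = \int P(\theta|D)\, \log\!\left[ \frac{P(\theta|D)}{\pi(\theta)} \right] d\theta, $$ and Skilling's analysis [1] gives the dominant uncertainty on \(\ln Z\) as roughly \(\sqrt{H/N}\), which is exactly the error bar the code prints at the end of a run.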
Results
Now let us observe how our distribution of posterior points changes as a function of the number of iterations. We run the basic nested sampling algorithm above for 3000 iterations with 100 atoms and watch how the posterior samples evolve, plotting batches of 200 samples in a GIF:

Not bad! We start with our atoms distributed uniformly across our prior space, then slowly reduce the logarithmic prior mass until we converge on the maximum in our multimodal distribution. The wonderful thing about nested sampling is that, due to its atomistic properties, it can map out the posterior distribution of highly non-Gaussian likelihoods with significant degeneracies in the underlying probability space (this regularly arises when training Neural Networks, for instance; see "Compromise-free Bayesian neural networks" by Kamran Javid [4] for more information).
Now let's try pushing our Gaussian modes together to make things harder for the sampler, and see where we end up.

Nested sampling does a good job of finding the maximum of the underlying ground-truth distribution. At every point we have access to the evidence as the atomistic distribution evolves, and by combining the likelihood-weighted prior with the stored samples we can map out the posterior distribution \(P(\theta | D, M)\) by looping through them. Let's dump our posterior samples to disk in a fairly rudimentary binary format:
void dumpSamples(nestedObject *samples,
int nest,
double logZ)
{
FILE *write_pointer;
write_pointer = fopen( "nestedSamples.bin", "wb" );
for(int i = 0; i < nest; ++i)
{
double x = samples[i].x[0];
double y = samples[i].x[1];
double logL = samples[i].log_likelihood;
double logarea = samples[i].logarea;
fwrite( &x, sizeof( x ), 1, write_pointer );
fwrite( &y, sizeof( y ), 1, write_pointer );
fwrite( &logL, sizeof( logL ), 1, write_pointer );
fwrite( &logarea, sizeof( logarea ), 1, write_pointer );
fwrite( &logZ, sizeof( logZ ), 1, write_pointer );
}
fclose( write_pointer );
}
Whilst C is good for numerical algorithms like this, and will likely remain a favourite alongside C++ and Fortran, it is not well served by plotting libraries, so I'll use matplotlib in Python to do the rest:
import numpy
import imageio
import matplotlib.pyplot as plt
N_SAMPLES = 3000
samples = numpy.fromfile( "nestedSamples.bin", dtype=numpy.double )
samples = samples.reshape( N_SAMPLES, 5 )
PRIOR_MIN = -10.0
PRIOR_MAX = 50.0
UNIFORM_PRIOR_CONSTANT = ( 1.0 / ( PRIOR_MAX - PRIOR_MIN ) ** 2.0 )
plt.tricontour( samples[:,0], samples[:,1],samples[:,2] )
plt.title( "Log Likelihood Distribution" )
plt.colorbar()
plt.savefig( "log_likelihood.pdf" )
plt.show()
POSTERIOR_SAMPLES_MIN = 0
POSTERIOR_SAMPLES_MAX = 3000
posterior_samples = samples[POSTERIOR_SAMPLES_MIN:POSTERIOR_SAMPLES_MAX,:]
log_posterior = ( posterior_samples[:,2]
                  + numpy.log( UNIFORM_PRIOR_CONSTANT )
                  - ( posterior_samples[:,3] - posterior_samples[:,4] ) )
plt.tricontour( posterior_samples[:,0], posterior_samples[:,1], log_posterior,
                levels=numpy.arange( -40, 10, 0.2 ) )
plt.title( "Posterior Distribution" )
plt.savefig( "log_posterior.pdf" )
plt.colorbar()
plt.show()
The matplotlib.axes.Axes.tricontour plot interpolates contours from a sparse, unstructured sampling of data, which lets us see how nested sampling maps out our likelihood:

This looks as expected. The final step is to generate our full posterior: since we have access to the evidence at each point, we can calculate a posterior sample at each point using the final evidence value and the prior width at that iteration. Because of the peakedness in log space caused by the density of atoms around the highest mode, the contour algorithm can get a bit jagged, but this suffices for our purposes:

Thus we end up with our final posterior distribution incorporating all our knowledge of the prior, and most importantly the evidence value, which nested sampling allows us to calculate probabilistically at each point in the algorithm. The above code is minimal but shows that this is not an approach requiring enormous and opaque libraries without user serviceable parts.
However, the devil is in the details for nested sampling, particularly in the Explore method defined above. The various generations of Nested Sampling implementations are roughly characterised by the way they generate a new atom after the worst one has been killed off, and this is the primary difference between implementations such as BayeSys, MultiNest, and PolyChord. The latter is the primary commercial package in use today, using slice sampling as its main exploration method, in line with the original intentions of John Skilling. Here I cut a corner and went for a basic genetic algorithm, computing a new point by mating two random survivors, which seems to have worked decently in this fairly trivial case at the cost of extra likelihood evaluations. I have not checked in detail whether this is worse than something such as Metropolis–Hastings or Gibbs sampling, both of which require careful tuning of the covariance matrix used to propose the jump, with Gibbs sampling compounding the problem by iterating through each dimension in turn.
Whilst Gibbs sampling can work well in higher-dimensional spaces, as with Metropolis–Hastings you need a decent understanding of the underlying space (and thus its analytical properties), which may not hold for more open-ended problems. Other probabilistic assumptions, such as Gaussianity, also become less accurate in higher dimensions as higher-order terms become more prevalent.
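For comparison, here is a minimal sketch of a more traditional alternative to my genetic Explore, closer in spirit to the constrained random walk used in [2]: a short walk started from the copied survivor, accepting a trial move only when its likelihood stays above the current contour. The step size and the budget of 20 trial moves are arbitrary illustrative choices, and a real implementation would also keep proposals inside the prior bounds and adapt the step size to the acceptance rate.
// Sketch of a constrained random-walk Explore: perturb the copied survivor and
// accept a trial only when its likelihood exceeds the current contour logLstar.
// If no move is accepted the object keeps the survivor's position, which
// already satisfies the constraint.
void ExploreRandomWalk( nestedObject *object, double logLstar )
{
    const double step = 1.0; // Arbitrary step size, not tuned
    nestedObject trial;
    for( int move = 0; move < 20; ++move )
    {
        for( int i = 0; i < N_DIM; ++i )
        {
            trial.x[i] = object->x[i] + step * ( UNIFORM - 0.5 );
        }
        trial.log_likelihood = LIKELIHOOD( &trial );
        if( trial.log_likelihood > logLstar )
        {
            trial.logarea = 0.0;
            *object = trial; // Accept: the walker remains inside the contour
        }
    }
}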
In short, Nested Sampling provides one of the best numerical methods for doing Bayesian statistics and optimisation, although the complexity lies in defining the exploration within each likelihood contour. That said, it is a fast and rigorous method for working directly with the Bayesian evidence.
Full C Implementation
This is a standalone implementation that should compile under gcc or clang like so:
gcc nested_sampling.c -o nested_sampling.out -lm -D VERBOSE_DEBUG_MESSAGES=1
and the source:
// This is adapted from D.S.Sivia's book:
// Data Analysis: A Bayesian Tutorial
// Chapter 9 - Nested Sampling
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#ifndef VERBOSE_DEBUG_MESSAGES
#define VERBOSE_DEBUG_MESSAGES ( 0 )
#endif
#define N_DIM ( 2 ) // Dimensionality of the likelihood space
#define N ( 100 ) // Number of atoms
#define MAX ( 3000 ) // Max number of iterations
#define UNIFORM ( ( rand() + 0.5 ) / ( RAND_MAX + 1.0 ) ) // Defined on domain (0,1)
#define PLUS(x,y) ( x > y ? x + log( 1 + exp( y - x ) ) : y + log( 1 + exp( x - y ) ) )
// Object within an n-dimensional space
typedef struct nestedObject {
double x[N_DIM]; // Coordinates: a fixed-size array with N_DIM elements.
double log_likelihood;
double logarea;
} nestedObject;
#define MODEL_MEAN ( 20.0 ) // Mean the same in all dimensions
#define MODEL_SIGMA ( 8.0 ) // Standard deviation the same in all dimensions
#define MODE_SHIFT ( 17.6 ) // Used for a second mode in multimodal model.
// Likelihood ("The Model") prototypes.
double _model( nestedObject *object ); // Basic gaussian model
double _multimodal_model( nestedObject *object ); // Bi-model gaussian model
double _lighthouse_model( nestedObject *object ); // The famous light-house model.
#define LIKELIHOOD( object ) ( _multimodal_model( object ) );
#define PRIOR_BOX ( 30.0 )
#define PRIOR_MIN ( MODEL_MEAN - PRIOR_BOX )
#define PRIOR_MAX ( MODEL_MEAN + PRIOR_BOX )
void Prior(nestedObject *object); // Sets up an object within our prior boundaries
// For re-nesting an object after it has been killed.
void Explore(nestedObject *object, // Object we want to replace
nestedObject *objects, // Objects
double logLstar); // The new log likelihood limit.
void dumpSamples(nestedObject *samples,
int nest,
double logZ);
int main(void)
{
nestedObject objects[N];
nestedObject samples[MAX];
double logarea; // ln(width in prior mass)
double logLstar; // ln(Likelihood constraint)
double H = 0.0; // Information H (the Kullback-Leibler divergence of the posterior from the prior)
double logZ = -DBL_MAX; // ln(Evidence Z); Z starts at 0, so ln(Z) starts at -DBL_MAX
double logZnew; // Updated logZ
int i; // Object counter
int copy; // Duplicated object
int worst; // Worst object
int nest; // Nested sampling iteration count
fprintf( stderr, "Initialising priors on objects...");
// Set prior objects
for( i = 0; i < N; ++i )
{
Prior(&objects[i]);
}
fprintf( stderr, "done\n");
#if VERBOSE_DEBUG_MESSAGES
for( i = 0; i < N; ++i )
{
fprintf( stderr, "Prior Object %d: %g %g logL:%g\n", i, objects[i].x[0], objects[i].x[1], objects[i].log_likelihood );
}
#endif
// Outermost interval of prior mass
logarea = log( 1.0 - exp( - 1.0 / N ) );
// START - Nested Sampling Loop
for( nest = 0; nest < MAX; ++nest )
{
worst = 0;
for( i = 1; i < N; ++i )
{
if( objects[i].log_likelihood < objects[worst].log_likelihood ) {
worst = i;
}
}
objects[worst].logarea = logarea + objects[worst].log_likelihood;
// Update Evidence Z and Information H
logZnew = PLUS( logZ, objects[worst].logarea );
H = exp( objects[worst].logarea - logZnew ) * objects[worst].log_likelihood
+ exp( logZ - logZnew ) * ( H + logZ ) - logZnew;
logZ = logZnew;
// Posterior samples (optional, not needed for optimising over the parameter space)
samples[nest] = objects[worst];
// Kill the worst object in favour of a copy of a different survivor
do {
copy = (int)( N * UNIFORM ) % N;
}
while( copy == worst && N > 1 );
logLstar = objects[worst].log_likelihood;
objects[worst] = objects[copy];
// Evolve the copied object within our new constraint.
Explore( &objects[worst], objects, logLstar );
// Shrink the interval
logarea -= 1.0 / N;
}
// END - Nested Sampling Loop
#if VERBOSE_DEBUG_MESSAGES
for( i = 0; i < N; ++i )
{
fprintf( stderr, "Post Nested Sampling Object %d: %g %g logL:%g\n", i, objects[i].x[0], objects[i].x[1], objects[i].log_likelihood );
}
#endif
printf( "# iterations = %d\n", nest );
printf( "Evidence: ln(Z) = %g +- %g\n", logZ, sqrt( H / N ) );
printf( "Information: H = %g nats = %g bits\n", H, H / log( 2. ) );
printf( "Final posterior sample: \n" );
for(int i = 0; i < N_DIM; ++i) printf( "Coord %d:%g\n", i, samples[nest-1].x[i] ); // nest == MAX here, so the last stored sample is samples[nest-1]
printf( "logL:%g +- %g\n", samples[nest-1].log_likelihood, sqrt( H / N ) );
dumpSamples( samples, nest, logZ );
return 0;
}
// Basic gaussian model
double _model(nestedObject *object)
{
double exponent = 0.0;
for( int i = 0; i < N_DIM; ++i )
{
double coord = object->x[i];
exponent -= ( pow( ( coord - MODEL_MEAN ) / MODEL_SIGMA , 2 ) );
}
exponent *= (double)( N_DIM / 2 );
return 1 + log( exp( exponent ) );
}
// Slightly more elaborate multi-model gaussian model
double _multimodal_model(nestedObject *object)
{
double exponent1 = 0.0;
double exponent2 = 0.0;
// Mode 1
for( int i = 0; i < N_DIM; ++i )
{
double coord = object->x[i];
exponent1 -= ( pow( ( coord - MODEL_MEAN ) / MODEL_SIGMA, 2 ) );
}
exponent1 *= (double)( N_DIM / 2 );
// Mode 2
for( int i = 0; i < N_DIM; ++i )
{
double coord = object->x[i];
exponent2 -= ( pow( ( coord - MODEL_MEAN - MODE_SHIFT ) / MODEL_SIGMA, 2 ) );
}
exponent2 *= (double)( N_DIM / 2 );
return 1 + log( exp ( exponent1 ) + 1.8 * exp( exponent2 ) );
}
void Prior(nestedObject *object)
{
for(int i = 0; i < N_DIM; ++i) {
object->x[i] = ( UNIFORM - 0.5 ) * ( 2 * PRIOR_BOX ) + MODEL_MEAN; // Uniform over [PRIOR_MIN, PRIOR_MAX]
}
object->log_likelihood = LIKELIHOOD( object );
}
// For re-nesting an object after it has been killed.
// We use a very crude genetic algorithm to do this by mixing co-ords from two random objects.
void Explore(nestedObject *object, // Object we want to replace
nestedObject *objects, // Objects
double logLstar)
{
double evolvedLogL = -DBL_MAX; // Start below any constraint so the first 'continue' cannot exit the loop early
nestedObject dummy;
do
{
int parents[2] = { 0 };
parents[0] = (int)( N * UNIFORM ) % N;
parents[1] = (int)( N * UNIFORM ) % N;
if( parents[0] == parents[1] ) continue; // We can not reproduce asexually.
for( int i = 0; i < N_DIM; ++i )
{
double mixing_factor = UNIFORM;
double new_gene = mixing_factor * objects[parents[0]].x[i] +
( 1 - mixing_factor ) * objects[parents[1]].x[i];
dummy.x[i] = new_gene;
}
evolvedLogL = LIKELIHOOD( &dummy );
}
while( evolvedLogL < logLstar );
dummy.log_likelihood = evolvedLogL;
dummy.logarea = 0.0;
*object = dummy;
}
void Results(nestedObject *samples,
int nest,
double logZ )
{
double x = 0.0, xx = 0.0;
double y = 0.0, yy = 0.0;
double w;
int i;
for( i = 0; i < nest; ++i )
{
w = exp( samples[i].logarea - logZ );
x += w * samples[i].x[0];
xx += w * samples[i].x[0] * samples[i].x[0];
y += w * samples[i].x[1];
yy += w * samples[i].x[1] * samples[i].x[1];
}
fprintf( stderr, "### Posterior Results ###\n" );
fprintf( stderr, "Mean(x) = %g, Std. Dev.(x) = %g\n", x, sqrt( xx - x * x ) );
fprintf( stderr, "Mean(y) = %g, Std. Dev.(y) = %g\n", y, sqrt( yy - y * y ) );
}
void dumpSamples(nestedObject *samples,
int nest,
double logZ)
{
FILE *write_pointer;
write_pointer = fopen( "nestedSamples.bin", "wb" );
for(int i = 0; i < nest; ++i)
{
double x = samples[i].x[0];
double y = samples[i].x[1];
double logL = samples[i].log_likelihood;
double logarea = samples[i].logarea;
fwrite( &x, sizeof( x ), 1, write_pointer );
fwrite( &y, sizeof( y ), 1, write_pointer );
fwrite( &logL, sizeof( logL ), 1, write_pointer );
fwrite( &logarea, sizeof( logarea ), 1, write_pointer );
fwrite( &logZ, sizeof( logZ ), 1, write_pointer );
}
fclose( write_pointer );
}
References
[1] J. Skilling, ‘Nested sampling for general Bayesian computation’, Bayesian Analysis, vol. 1, no. 4, pp. 833–859, Dec. 2006, doi: 10.1214/06-BA127.
[2] D. Sivia and J. Skilling, Data analysis: a Bayesian tutorial. OUP Oxford, 2006.
[3] W. J. Handley, M. P. Hobson, and A. N. Lasenby, ‘PolyChord: next-generation nested sampling’, Mon. Not. R. Astron. Soc., vol. 453, no. 4, pp. 4385–4399, Nov. 2015, doi: 10.1093/mnras/stv1911.
[4] K. Javid, W. Handley, M. Hobson, and A. Lasenby, ‘Compromise-free Bayesian neural networks’, arXiv:2004.12211 [cs, stat], Jun. 2020, Accessed: Apr 25, 2020. [Online]. Available: http://arxiv.org/abs/2004.12211