Prior and Posterior Distributions

Part of the document Sổ tay bảo trì hệ thống phức tạp (Handbook of Complex System Maintenance), pages 145-148.

Section 6.3 concluded by deriving an equation for updating a prior probability density function g(θ) for an unknown parameter θ, based on some observed data D, to give a posterior probability density function g(θ | D). The term f(D | θ) is proportional to the likelihood function. If the data set D consists of a random sample of observations x₁, x₂, …, xₙ of a continuous random variable X with probability density function f(x | θ), then the likelihood function becomes

$$L(\theta; D) \propto \prod_{i=1}^{n} f(x_i \mid \theta). \qquad (6.14)$$

As the term f(D) does not depend on θ, we can therefore write

$$g(\theta \mid D) \propto L(\theta; D)\, g(\theta) \qquad (6.15)$$

or, in words,

“posterior is proportional to likelihood times prior”.

This is the fundamental rule for Bayesian inference.
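As a minimal numerical sketch of this rule, the following Python fragment updates a prior pointwise on a grid of θ values; the uniform prior, the Bernoulli likelihood and the data set are illustrative assumptions, not taken from the text.

```python
# Grid illustration of "posterior is proportional to likelihood times prior".
# Hypothetical setup: a uniform prior on theta in (0, 1) and three Bernoulli
# trials with one observed failure (data chosen purely for illustration).

def posterior_grid(data, prior, n_points=1000):
    """Return a theta grid and the normalised posterior computed pointwise."""
    thetas = [(i + 0.5) / n_points for i in range(n_points)]
    unnorm = []
    for t in thetas:
        like = 1.0
        for x in data:                       # Bernoulli likelihood contribution
            like *= t if x == 1 else (1.0 - t)
        unnorm.append(like * prior(t))       # likelihood times prior
    norm = sum(unnorm) / n_points            # Riemann-sum normalising constant
    return thetas, [u / norm for u in unnorm]

thetas, post = posterior_grid([1, 0, 0], prior=lambda t: 1.0)
area = sum(post) / len(post)                 # should be 1: a proper density
```

With a uniform prior the posterior shape is just the likelihood θ(1 − θ)², whose mode sits at θ = 1/3.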

Example 6.4 Previously, in Example 6.3, we considered the proportion of car batteries that fail within two years. This involved the use of Bayes’ theorem for the unknown model parameter θ and illustrated how the fundamental rule “posterior is proportional to likelihood times prior” can be applied. In this example, the likelihood function takes the form

$$L(\theta; D) \propto \prod_{i=1}^{3} p(x_i \mid \theta) = \theta (1 - \theta)^2 \qquad (6.16)$$

where the probability mass function p(xᵢ | θ) corresponds to a Bernoulli distribution. Consequently, the posterior probability density function of θ given the data D has the form

$$g(\theta \mid D) \propto L(\theta; D)\, g(\theta) \propto \theta (1 - \theta)^3 \qquad (6.17)$$

for 0 < θ < 1, which agrees with the result we obtained previously. The corresponding prior and posterior probability density functions are graphed for comparison in Figure 6.2. □
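The algebra of Example 6.4 can be checked numerically. Assuming, as Eqs. (6.16) and (6.17) together imply, a prior proportional to 1 − θ (taken here as g(θ) = 2(1 − θ), which integrates to one on (0, 1)), the product of likelihood and prior reproduces the stated posterior shape up to a constant:

```python
# Numerical check of Example 6.4: multiplying the Bernoulli likelihood
# theta*(1-theta)**2 by a prior proportional to (1-theta) -- the prior
# implied by Eqs. (6.16)-(6.17) -- reproduces the stated posterior shape
# theta*(1-theta)**3 up to a constant factor.

def likelihood(theta):
    return theta * (1.0 - theta) ** 2        # one failure in three trials

def prior(theta):
    return 2.0 * (1.0 - theta)               # g(theta) = 2(1 - theta) on (0, 1)

def stated_posterior_shape(theta):
    return theta * (1.0 - theta) ** 3        # right-hand side of Eq. (6.17)

ratios = []
for i in range(1, 100):
    t = i / 100.0
    ratios.append(likelihood(t) * prior(t) / stated_posterior_shape(t))
# Every ratio equals the same constant of proportionality (here 2)
```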

Figure 6.2. Prior and posterior probability density functions for Example 6.4

Having evaluated a posterior distribution using this rule, we can evaluate the posterior mode θ̂ such that

$$g(\hat{\theta} \mid D) \ge g(\theta \mid D) \quad \forall \theta, \qquad (6.18)$$

by solving the equation

$$\frac{d}{d\theta} \left[ L(\theta; D)\, g(\theta) \right] = 0. \qquad (6.19)$$

However, to find the median or mean, and to use this posterior density to make any further inference, we need to determine the constant of proportionality in the fundamental rule above. In standard situations, we can recognise the functional form of L(θ; D) g(θ) and hence quote published work on probability distributions to determine this constant of proportionality and so derive g(θ | D) explicitly. In non-standard situations, we determine this constant of proportionality using numerical quadrature or simulation, both of which we discuss later.
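As a sketch of the quadrature route applied to Example 6.4: the posterior shape θ(1 − θ)³ is a Beta(2, 4) kernel, so the constant of proportionality is known exactly to be 1/B(2, 4) = 20, which midpoint quadrature recovers along with the posterior mean and median.

```python
# A sketch of determining the constant of proportionality numerically,
# using the posterior shape theta*(1-theta)**3 from Example 6.4.

def shape(theta):
    return theta * (1.0 - theta) ** 3

n = 100_000
h = 1.0 / n
# Midpoint-rule quadrature of the unnormalised posterior over (0, 1)
integral = sum(shape((i + 0.5) * h) for i in range(n)) * h
constant = 1.0 / integral                    # constant of proportionality (20)

# With the constant known, the posterior mean and median follow directly
mean = sum(((i + 0.5) * h) * constant * shape((i + 0.5) * h)
           for i in range(n)) * h

cum, median = 0.0, None
for i in range(n):
    cum += constant * shape((i + 0.5) * h) * h
    if cum >= 0.5:                           # first grid point past half mass
        median = (i + 0.5) * h
        break
```

For this Beta(2, 4) posterior the mean is 1/3, the mode is 1/4 and the median lies between them, which the quadrature confirms.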

6.4.1 Reference Priors

There are two main types of prior distribution, which loosely correspond with objective priors and subjective priors. As objective priors strictly do not exist, the first category is generally known as reference priors; these are used when little prior information is available and as a benchmark against which to compare the output from using subjective priors. This offers a default Bayesian analysis that does not depend upon any personal prior knowledge. The simplest reference prior is proposed by the Bayes-Laplace postulate, which simply recommends the use of a uniform or locally uniform prior g(θ) ∝ 1 for all θ in the region of support R_θ.


However, different parameterisations can lead to different inferences with this approach.

To avoid this inconsistency, the standard univariate reference prior that analysts now adopt is the invariant prior of Jeffreys (1998), defined by

$$g(\theta) \propto \sqrt{I(\theta)}, \quad \theta \in R_\theta \qquad (6.20)$$

where

$$I(\theta) = -E_X \left\{ \frac{d^2}{d\theta^2} \log f(x \mid \theta) \right\} \qquad (6.21)$$

is Fisher’s expected information. An extension exists for the case of a parameter vector θ, though we usually assume the components of θ are independent, so g(θ) is just the product of the univariate invariant priors. This invariant prior distribution is occasionally improper, as its integral sometimes diverges. However, this problem is generally unimportant because the corresponding posterior distributions are usually proper. Books on Bayesian methods, such as Bernardo and Smith (2000) and Lee (2004), present tables of invariant prior and posterior distributions for common models.

6.4.2 Subjective Priors

Subjective prior distributions should be used if prior information is available, which is almost always. They represent the best available knowledge about unknown parameters and can be specified using smoothed histograms, relative likelihoods or parametric families. The first two of these are arbitrary and computationally awkward, so we now investigate the last of these. A family of priors C is closed under sampling if

$$g(\theta) \in C \;\Rightarrow\; g(\theta \mid D) \in C, \qquad (6.22)$$

so that the posterior density has the same functional form as the prior density. This property is particularly appealing, as our prior knowledge can be regarded as posterior to some previous information. Again, we tend to suppose that components in multi-parameter problems are independent, so that their joint prior density is the product of corresponding univariate marginal priors.

Such closed priors exist, and are called natural conjugate priors, for sampling distributions f(x | θ) that belong to the exponential family. This family includes the Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma, normal and lognormal models. For a model in the exponential family with scalar parameter θ, we can express the probability density or mass function in the form

$$f(x \mid \theta) = \exp\{ a(x)\, b(\theta) + c(x) + d(\theta) \} \qquad (6.23)$$

and the natural conjugate prior for θ is defined by

$$g(\theta) \propto \exp\{ k_1 b(\theta) + k_2 d(\theta) \} \qquad (6.24)$$

for suitable constants k₁ and k₂. However, any conjugate prior of the form

$$g(\theta) \propto h(\theta) \exp\{ k_1 b(\theta) + k_2 d(\theta) \} \qquad (6.25)$$

is also closed under sampling for models in the exponential family.
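A sketch of this machinery for the Bernoulli model: writing f(x | θ) = exp{x log(θ/(1 − θ)) + log(1 − θ)} gives a(x) = x, b(θ) = log(θ/(1 − θ)), c(x) = 0 and d(θ) = log(1 − θ), so the natural conjugate prior (6.24) is a beta kernel θ^k₁(1 − θ)^(k₂ − k₁), and closure under sampling amounts to updating (k₁, k₂). The prior constants and data below are illustrative assumptions.

```python
import math

# Conjugate prior (6.24) for the Bernoulli model in exponential-family
# form (6.23): b(theta) = log(theta/(1-theta)), d(theta) = log(1-theta).
# The kernel exp{k1*b + k2*d} equals theta**k1 * (1-theta)**(k2-k1).

def conjugate_kernel(theta, k1, k2):
    b = math.log(theta / (1.0 - theta))
    d = math.log(1.0 - theta)
    return math.exp(k1 * b + k2 * d)

def update(k1, k2, data):
    # Each observation x adds a(x) = x to k1 and 1 to k2 (closure under sampling)
    return k1 + sum(data), k2 + len(data)

k1, k2 = 1.0, 2.0                            # hypothetical prior constants
k1_post, k2_post = update(k1, k2, [1, 0, 0]) # hypothetical Bernoulli sample

# Posterior kernel equals prior kernel times likelihood, point by point
theta = 0.3
likelihood = theta ** 1 * (1.0 - theta) ** 2
lhs = conjugate_kernel(theta, k1_post, k2_post)
rhs = conjugate_kernel(theta, k1, k2) * likelihood
```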

Books on Bayesian methods, such as Bernardo and Smith (2000), present tables of the conjugate prior and posterior distributions for common models. However, many applications in reliability and maintenance are not amenable to such simple analyses. For example, the Weibull distribution is not a member of the exponential family. As a result of this, the constant of proportionality in the expression

$$g(\theta \mid D) \propto L(\theta; D)\, g(\theta) \qquad (6.26)$$

cannot always be evaluated algebraically, and analytical approximations or numerical computation are usually required.
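A sketch of this numerical route, assuming for illustration a Weibull model with unit scale and unknown shape β, a gamma-style prior and a small hypothetical data set (none of which come from the text):

```python
import math

# Numerical evaluation of the constant of proportionality for a
# non-conjugate case: Weibull lifetimes with unit scale and unknown
# shape beta, with an assumed gamma(2, 1) prior g(beta) ∝ beta*exp(-beta).

data = [0.8, 1.1, 0.5, 1.6]                  # hypothetical lifetimes

def log_likelihood(beta):
    # Weibull(shape=beta, scale=1): f(x|beta) = beta * x**(beta-1) * exp(-x**beta)
    return sum(math.log(beta) + (beta - 1.0) * math.log(x) - x ** beta
               for x in data)

def prior(beta):
    return beta * math.exp(-beta)            # gamma(2, 1) kernel, an assumption

# Midpoint quadrature of likelihood*prior over a truncated range of beta
lo, hi, n = 1e-6, 20.0, 20_000
h = (hi - lo) / n
grid = [lo + (i + 0.5) * h for i in range(n)]
unnorm = [math.exp(log_likelihood(b)) * prior(b) for b in grid]
constant = 1.0 / (sum(unnorm) * h)           # constant of proportionality
posterior = [constant * u for u in unnorm]

post_mean = sum(b * p for b, p in zip(grid, posterior)) * h
```

Simulation methods such as Markov chain Monte Carlo serve the same purpose when the parameter vector is too large for quadrature grids.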

It is desirable to avoid the inconsistency of using natural conjugate priors when they exist and other forms of subjective prior, such as location-scale forms, when they do not. The following recommendations by Percy (2004) provide a simple, consistent and comprehensive strategy that achieves this for general use:

• Infinite range −∞ < θ < ∞: use a normal prior distribution for θ

• Semi-infinite range 0 < θ < ∞: use a gamma prior distribution for θ

• Finite range 0 < θ < 1: use a beta prior distribution for θ

If necessary, linear transformations of the parameters ensure that these priors are sufficient for modelling all situations. They match with the natural conjugate priors for simple models and extend to deal with more complicated models. Mixtures of these priors can be used if multimodality is present and prior independence can be assumed for multiparameter situations.
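The linear-transformation remark can be sketched as follows: a parameter φ with finite support (a, b) maps to θ = (φ − a)/(b − a) on (0, 1), so a beta prior on θ induces a prior on φ. The endpoints and beta shape values below are illustrative assumptions.

```python
# Linear transformation of a finite-range parameter phi on (a, b) to
# theta on (0, 1), so that a beta prior applies; the change of variables
# introduces only the constant Jacobian 1/(b - a).

def to_unit_interval(phi, a, b):
    return (phi - a) / (b - a)

def beta_kernel(theta, p, q):
    return theta ** (p - 1.0) * (1.0 - theta) ** (q - 1.0)

def prior_on_phi(phi, a, b, p, q):
    return beta_kernel(to_unit_interval(phi, a, b), p, q) / (b - a)

# The transformed prior carries the same total mass as the beta kernel:
# for a Beta(2, 2) kernel that mass is B(2, 2) = 1/6.
a, b, p, q = 2.0, 5.0, 2.0, 2.0              # hypothetical range and shapes
n = 100_000
h = (b - a) / n
mass = sum(prior_on_phi(a + (i + 0.5) * h, a, b, p, q) for i in range(n)) * h
```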
