Parametric matching priors in the multiparameter case

Part of Handbook of Statistics Vol 25 Supp 1 (pages 113–123)

In this section we investigate probability matching associated with posterior parametric statements in the case d > 1.

As in the one-parameter case, in the multiparameter case the O(n^{−1/2}) equivalence of Bayesian and frequentist probability statements holds on account of the first-order equivalence of the Bayesian and frequentist normal approximations. In this section we therefore investigate higher-order probability matching in the case d > 1.

5.1. Matching for an interest parameter

Suppose that θ1 is considered to be a scalar parameter of primary interest and that (θ2, . . . , θd) is regarded as a vector of nuisance parameters. Let zα(Xn) be the upper α-quantile of the marginal posterior distribution of θ1 under π; that is, zα(Xn) satisfies

pr^π{θ1 ⩽ zα(Xn) | Xn} = α.

The prior π(·) is O(n^{−1})-probability matching with respect to θ1 if

(9) pr_θ{θ1 ⩽ zα(Xn)} = α + O(n^{−1})

pointwise or very weakly for every α, 0 < α < 1.
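Definition (9) can be probed numerically. The sketch below is an illustration added here, not an example from the text: it computes the exact frequentist coverage of the posterior α-quantile of a normal mean (with the variance as nuisance parameter) under two priors — the right-Haar prior π ∝ 1/σ, whose marginal posterior for the mean is a t_{n−1} location-scale family and which matches exactly, and a flat prior π ∝ 1, which misses α by O(n^{−1}). The closed-form posterior quantiles are standard conjugate-analysis facts, assumed here rather than taken from the text.

```python
# Exact frequentist coverage of the posterior alpha-quantile of a normal
# mean (variance unknown) under two priors on (mu, sigma):
#   "haar": pi ∝ 1/sigma -> marginal posterior of mu is t_{n-1}; exact match.
#   "flat": pi ∝ 1       -> posterior is a rescaled t_{n-2}; O(1/n) error.
import numpy as np
from scipy import stats

def coverage(n, alpha, prior):
    if prior == "haar":        # pi ∝ 1/sigma
        t_alpha = stats.t.ppf(alpha, df=n - 1)
    elif prior == "flat":      # pi ∝ 1
        t_alpha = np.sqrt((n - 1) / (n - 2)) * stats.t.ppf(alpha, df=n - 2)
    # frequentist coverage pr_theta(theta1 <= z_alpha(Xn)); the pivot
    # sqrt(n)(xbar - mu)/s has a t_{n-1} distribution
    return stats.t.cdf(t_alpha, df=n - 1)

for n in (5, 10, 20, 40):
    print(n, coverage(n, 0.95, "haar"), coverage(n, 0.95, "flat"))
```

The Haar-prior column reproduces 0.95 exactly for every n, while the flat-prior discrepancy shrinks roughly like 1/n.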

The second-order one-parameter result of Welch and Peers (1963) was generalised to the multiparameter case by Peers (1965). Peers showed that π(θ) is a PMP for θ1 if and only if it is a solution of the partial differential equation

(10) D_j{(κ^{11})^{−1/2} κ^{1j} π} = 0,

where D_j = ∂/∂θ_j, κ^{ij}(θ) is the (i, j)th element of {i(θ)}^{−1} and we have used the summation convention. In particular, if θ1 and (θ2, . . . , θd) are orthogonal (cf. Cox and Reid, 1987) then κ^{1j}(θ) = 0, j = 2, . . . , d, and it follows immediately from (10) that

(11) π(θ) ∝ {κ^{11}(θ)}^{−1/2} h(θ2, . . . , θd) = {κ_{11}(θ)}^{1/2} h(θ2, . . . , θd),

where κ_{11}(θ) is the (1, 1)th element of i(θ) (under orthogonality κ^{11} = κ_{11}^{−1}) and the function h is arbitrary, as indicated in Tibshirani (1989) and rigorously proved by Nicolaou (1993) and Datta and J.K. Ghosh (1995a), following earlier work by Stein (1985).

Given the arbitrary function h in (11), there is the opportunity of probability matching to an order higher than O(n^{−1}). Mukerjee and Ghosh (1997), generalising results in Mukerjee and Dey (1993), show that it may be possible to achieve o(n^{−1}) probability matching in (9). The priors that emerge, however, may differ depending on whether posterior quantiles or the posterior distribution function of θ1 are considered.

Datta and J.K. Ghosh (1995a) generalised the differential equation (10) for an arbitrary parametric function. For a smooth parametric function t(θ) of interest a second-order PMP π(θ) satisfies the differential equation

(12) D_j{Λ^{−1} b^j π} = 0,

where

(13) b^j = κ^{jr} D_r t, Λ² = κ^{jr} D_j t D_r t.

The third-order matching results for t(θ), established by Mukerjee and Reid (2001), are more complex. One may also refer to Datta and Mukerjee (2004, Theorem 2.8.1) for this result.
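As a quick consistency check (added here; the algebra is elementary but not spelled out in the text), specialising (12)–(13) to t(θ) = θ1 recovers Peers' equation (10):

```latex
% With t(\theta)=\theta_1 we have D_r t = \delta_{r1}, so by (13)
b^{j} \;=\; \kappa^{jr} D_r t \;=\; \kappa^{j1},
\qquad
\Lambda^{2} \;=\; \kappa^{jr} D_j t \, D_r t \;=\; \kappa^{11},
% and (12) becomes
D_j\bigl\{(\kappa^{11})^{-1/2}\,\kappa^{1j}\,\pi\bigr\} \;=\; 0,
% which is exactly Peers' equation (10).
```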

EXAMPLE 5.1. Consider the location-scale model

f(x; θ) = (1/θ2) f((x − θ1)/θ2), x ∈ R,

where θ1 ∈ R, θ2 > 0, and f(·) is a density with support R. It can be checked that the information matrix is θ2^{−2}Σ, where Σ = (σ_{ij}) is the covariance matrix of U^{j−1} d log f(U)/dU, j = 1, 2, when the density of U is f(u).

Suppose θ1 is the interest parameter. In a class of priors of the form g(θ1)h(θ2), g(θ1) is necessarily a constant for a second-order PMP for θ1. If an orthogonal parameterisation holds, which happens if f(·) is a symmetric density about zero, such as a standard normal or Student's t distribution, then h(θ2) is arbitrary in this case. However, in the absence of an orthogonal parameterisation, h(θ2) must be proportional to θ2^{−1}. It can be checked using (2.5.15) of Datta and Mukerjee (2004) that, in the class of priors under consideration, the prior proportional to θ2^{−1} is the unique third-order PMP with or without parametric orthogonality.

Now suppose θ2 is the interest parameter. In the aforementioned class of priors, g(θ1)/θ2 is a second-order PMP. Under parametric orthogonality, again g(θ1) is arbitrary; otherwise it is a constant. In either case, using (2.5.15) of Datta and Mukerjee (2004), it can be checked that the second-order PMP is also third-order matching for θ2. If PMPs are required when both θ1 and θ2 are interest parameters, then θ2^{−1} is the unique prior (see Section 5.2).

Finally, consider θ1/θ2 as the parametric function of interest. In a class of priors of the form θ2^a (θᵀΣθ)^b, where θ = (θ1, θ2)ᵀ and a, b are constants, it can be checked from (12) that a = −1, with b arbitrary, gives the class of all second-order PMPs. In particular, b = 0 leads to the prior θ2^{−1}, which is second-order matching simultaneously for each of θ1, θ2 and θ1/θ2. While this is an attractive choice, a negative feature of this prior is that in the special case of a normal model it leads to a marginalisation paradox (cf. Dawid et al., 1973; Bernardo, 1979). On the other hand, for the normal model it was checked in Datta and Mukerjee (2004, p. 38) that this prior is the unique third-order PMP for θ1/θ2. Interestingly, b = −1/2 leads to an important prior, namely the reference prior of Bernardo (1979), which avoids this paradox.
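For the normal case the claim a = −1 can be verified symbolically. In the standard normal model d log f(U)/dU = −U, so Σ = diag(1, 2) and the information matrix is θ2^{−2} diag(1, 2). The sketch below (an added illustration; variable names are ours) plugs the prior class θ2^a (θᵀΣθ)^b into the left-hand side of (12) with t(θ) = θ1/θ2 and checks that it vanishes identically at a = −1 for every b, but not at a = 0.

```python
# Symbolic check of Example 5.1 via equation (12), normal case:
# information = diag(1, 2)/theta2^2, interest parameter t = theta1/theta2,
# candidate priors theta2^a * (theta1^2 + 2*theta2^2)^b.
import sympy as sp

th1, th2 = sp.symbols('theta1 theta2', positive=True)
a, b = sp.symbols('a b', real=True)
kap = sp.diag(th2**2, th2**2 / 2)          # inverse information, kappa^{jr}
t = th1 / th2                               # interest parameter t(theta)
Dt = [sp.diff(t, v) for v in (th1, th2)]
bvec = [sum(kap[j, r] * Dt[r] for r in range(2)) for j in range(2)]   # (13)
Lam = sp.sqrt(sum(kap[j, r] * Dt[j] * Dt[r]
                  for j in range(2) for r in range(2)))
pi = th2**a * (th1**2 + 2 * th2**2)**b      # candidate prior class
lhs = sum(sp.diff(bvec[j] * pi / Lam, v)    # left-hand side of (12)
          for j, v in enumerate((th1, th2)))
print(sp.simplify(sp.together(lhs.subs(a, -1))))  # expected: 0 for every b
```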

EXAMPLE 5.2. Consider the balanced one-way random effects model with t treatments and n replications for each treatment. The corresponding mixed linear model is given by

X_{ij} = μ + a_i + e_{ij}, 1 ⩽ i ⩽ n, 1 ⩽ j ⩽ t,

where the parameter μ represents the general mean, each random effect a_i is univariate normal with mean zero and variance λ1, each e_{ij} is univariate normal with mean zero and variance λ2, and the a_i's and the e_{ij}'s are all independent. Here μ ∈ R and λ1, λ2 (> 0) are unknown parameters and t (⩾ 2) is fixed. For 1 ⩽ i ⩽ n, let X_i = (X_{i1}, . . . , X_{it}). Then X_1, . . . , X_n are i.i.d. random vectors with a multivariate normal distribution.

This example has been extensively discussed in the noninformative priors literature (cf. Box and Tiao, 1973). Berger and Bernardo (1992b) have constructed reference priors for (μ, λ1, λ2) under different situations when one of the three parameters is of interest and the remaining two parameters are either clustered into one group or divided into two groups according to their order of importance. Ye (1994) considered the one-to-one reparameterisation (μ, λ2, λ1/λ2) and constructed various reference priors when λ1/λ2 is the parameter of importance. Datta and M. Ghosh (1995a, 1995b) have constructed reference priors as well as matching priors for various parametric functions in this set-up.

Suppose our interest lies in the ratio λ1/λ2. Following Datta and Mukerjee (2004), we reparameterise as

θ1 = λ1/λ2, θ2 = {λ2^{t−1}(tλ1 + λ2)}^{1/(2t)}, θ3 = μ,

where θ1, θ2 > 0 and θ3 ∈ R. It can be checked that the above is an orthogonal parameterisation with κ_{11}(θ) ∝ (1 + tθ1)^{−2}. Hence by (11), second-order matching is achieved if and only if π(θ) = d(θ^{(2)})/(1 + tθ1), where θ^{(2)} = (θ2, θ3) and d(·) is a smooth positive function. In fact, Datta and Mukerjee (2004) further showed that a subclass of this class of priors, given by π(θ) = d(θ3)/{(1 + tθ1)θ2}, characterises the class of all third-order matching priors, where d(θ3) is a smooth positive function. In particular, taking d(θ3) constant, it can be checked that the prior given by {(1 + tθ1)θ2}^{−1} corresponds to {(tλ1 + λ2)λ2}^{−1} in the original parameterisation, which is one of the reference priors derived by Berger and Bernardo (1992b) and Ye (1994). This prior was also recommended by Datta and M. Ghosh (1995a) and Datta (1996) from other considerations.
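The claimed correspondence between {(1 + tθ1)θ2}^{−1} and {(tλ1 + λ2)λ2}^{−1} can be confirmed by a change of variables. The sketch below (added here, with t = 3 fixed to keep the symbolic algebra manageable) multiplies the prior by the Jacobian of (θ1, θ2) with respect to (λ1, λ2) — the μ-coordinate transforms trivially — and checks that the ratio to {(tλ1 + λ2)λ2}^{−1} is a constant.

```python
# Jacobian check of the reparameterisation in Example 5.2, for t = 3.
import sympy as sp

lam1, lam2 = sp.symbols('lambda1 lambda2', positive=True)
t = 3                                         # a fixed number of treatments
th1 = lam1 / lam2
th2 = (lam2**(t - 1) * (t * lam1 + lam2))**sp.Rational(1, 2 * t)
# 2x2 Jacobian of (theta1, theta2) w.r.t. (lambda1, lambda2); its
# determinant is positive on the parameter space, so no absolute value
J = sp.Matrix([th1, th2]).jacobian([lam1, lam2])
prior_lam = J.det() / ((1 + t * th1) * th2)   # transformed prior density
ratio = sp.simplify(prior_lam * (t * lam1 + lam2) * lam2)
print(ratio)   # a positive constant: the two priors are proportional
```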

An extension of this example to the unbalanced case was studied recently by Datta et al. (2002).

5.2. Probability matching priors in group models

We have already encountered group models in our review of exact matching in Section 3. In this section we review various noninformative priors for a scalar interest parameter in group models. The interest parameter is maximal invariant under a suitable group of transformations G, where the remaining parameters are identified with the group element g. We assume that G is a Lie group and that the parameter space Θ has a decomposition such that the space of nuisance parameters is identical with G. It is also assumed that G acts on Θ freely by left multiplication in G. Chang and Eaves (1990) derived reference priors for this model. Datta and J.K. Ghosh (1995b) used this model for a comparative study of various reference priors, including the Berger–Bernardo and Chang–Eaves reference priors. Writing the parameter vector as (ψ1, g), Datta and J.K. Ghosh (1995b) noted that (i) κ^{11}, the first diagonal element of the inverse of the information matrix, is only a function of ψ1, (ii) the Chang and Eaves (1990) reference prior π_CE(ψ1, g) is given by {κ^{11}(ψ1)}^{−1/2} h_r(g), and (iii) the Berger and Bernardo (1992a) reference prior for the group ordering {ψ1, g} is given by {κ^{11}(ψ1)}^{−1/2} h_l(g), where h_r(g) and h_l(g) are the right and the left invariant Haar densities on G. While the left and the right invariant Haar densities are usually different, they are identical if the group G is either commutative or compact. Typically, these reference priors are improper. It follows from Dawid et al. (1973) that π_CE(ψ1, g) will not yield any marginalisation paradox for inference on ψ1. The same is not true for the two-group ordering Berger–Bernardo reference prior. Datta and J.K. Ghosh (1995b) illustrated this point through two examples. With regard to probability matching, this article established that while the Chang–Eaves reference prior is always second-order matching for ψ1, this is not always the case for the other prior based on the left invariant Haar density. However, these authors also noted that often the Berger–Bernardo reference prior based on a one-at-a-time parameter grouping is identical with the Chang–Eaves reference prior.

We illustrate these priors through two examples given below.

EXAMPLE 5.3. Consider the location-scale family of Example 5.1. Let ψ1 = θ1/θ2. Under a group of scale transformations, ψ1 remains invariant. Since the group operation is commutative, both the left and the right invariant Haar densities are equal and given by g^{−1}. Here the group element g is identified with the nuisance parameter θ2. It can be checked that κ^{11} = (σ22 + 2σ12ψ1 + σ11ψ1²)/|Σ|, where Σ and its elements are as defined in Example 5.1. Hence the Berger–Bernardo and Chang–Eaves reference priors are given (in the (ψ1, g)-parameterisation) by g^{−1}(σ22 + 2σ12ψ1 + σ11ψ1²)^{−1/2}. It can be checked that, in the (θ1, θ2)-parameterisation, this prior reduces to the prior corresponding to the choices a = −1 and b = −1/2 given at the end of Example 5.1.
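The stated form of κ^{11} is a delta-method computation: with κ = θ2²Σ^{−1} the inverse information from Example 5.1, the asymptotic variance of the estimator of ψ1 = θ1/θ2 is ∇ψ1ᵀ κ ∇ψ1. The sketch below (an added check; symbols are ours) verifies this symbolically.

```python
# Delta-method check of kappa^{11} for psi1 = theta1/theta2 in the
# location-scale model of Example 5.1.
import sympy as sp

th1, th2 = sp.symbols('theta1 theta2', positive=True)
s11, s22 = sp.symbols('sigma11 sigma22', positive=True)
s12 = sp.Symbol('sigma12', real=True)
Sigma = sp.Matrix([[s11, s12], [s12, s22]])
kappa = th2**2 * Sigma.inv()      # inverse of the information theta2^{-2} Sigma
psi1 = th1 / th2
grad = sp.Matrix([sp.diff(psi1, v) for v in (th1, th2)])
k11 = sp.simplify((grad.T * kappa * grad)[0, 0])     # delta-method variance
target = (s22 + 2 * s12 * psi1 + s11 * psi1**2) / Sigma.det()
print(sp.simplify(k11 - target))  # 0: agrees with the stated formula
```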

EXAMPLE 5.4. Consider a bivariate normal distribution with means μ1, μ2 and dispersion matrix σ²I2. Let the parameter of interest be θ1 = (μ1 − μ2)/σ, and write μ2 = θ2 and σ = θ3. This problem was considered, among others, by Datta and J.K. Ghosh (1995b) and Ghosh and Yang (1996). For the group of transformations H = {g: g = (g2, g3), g2 ∈ R, g3 > 0} acting on the range of X1 by gX1 = g3X1 + g2 1, where 1 is the vector of ones, the induced group of transformations on the parameter space is G = {g} with the transformation defined by gθ = (θ1, g3θ2 + g2, g3θ3). Here θ1 is the maximal invariant parameter. Datta and J.K. Ghosh (1995b) obtained the Chang–Eaves reference prior and the Berger–Bernardo reference prior for the group ordering {θ1, (θ2, θ3)}, given by

π_CE(θ) ∝ (8 + θ1²)^{−1/2} θ3^{−1}, π_BB(θ) ∝ (8 + θ1²)^{−1/2} θ3^{−2}.

The Chang–Eaves prior is a second-order PMP for θ1. These two priors transform to σ^{−1}{8σ² + (μ1 − μ2)²}^{−1/2} and σ^{−2}{8σ² + (μ1 − μ2)²}^{−1/2}, respectively, in the original parameterisation. Datta and Mukerjee (2004) have considered priors having the structure

π(μ1, μ2, σ) = [8 + {(μ1 − μ2)/σ}²]^{−s1} σ^{−s2},

where s1 and s2 are real numbers. They showed that such priors will be second-order matching for (μ1 − μ2)/σ if and only if s2 = 2s1 + 1. Clearly, the Chang–Eaves prior satisfies this condition. Datta and Mukerjee (2004) have further shown that the only third-order PMP in this class is given by s1 = 0 and s2 = 1.

5.3. Probability matching priors and reference priors

Berger and Bernardo (1992a) have given an algorithm for reference priors. Berger (1992) has also introduced the reverse reference prior. The latter prior for the parameter grouping {θ1, θ2}, assuming θ = (θ1, θ2) for simplicity, is the prior that would result from following the reference prior algorithm for the reverse parameter grouping {θ2, θ1}. Following the algorithm with rectangular compacts, the reverse reference prior π_RR(θ1, θ2) is of the form

π_RR(θ1, θ2) = κ_{11}^{1/2}(θ) g(θ2),

where g(θ2) is an appropriate function of θ2. Under parameter orthogonality, the above reverse reference prior has the form of (11) and hence it is a second-order PMP for θ1.

While the above is an interesting result for reverse reference priors, reference priors still play a dominant role in objective Bayesian inference. Datta and M. Ghosh (1995b) have provided sufficient conditions establishing the equivalence of reference and reverse reference priors. A simpler version of that result is described here. Let the Fisher information matrix be a d × d diagonal matrix with the jth diagonal element factored as κ_{jj}(θ) = h_{j1}(θ_j) h_{j2}(θ^{(j)}), where θ^{(j)} = (θ1, . . . , θ_{j−1}, θ_{j+1}, . . . , θ_d) and h_{j1}(·), h_{j2}(·) are two positive functions. Assuming a rectangular sequence of compacts, it can be shown that for the one-at-a-time parameter grouping {θ1, . . . , θd} the reference prior π_R(θ) and the reverse reference prior π_RR(θ) are identical and proportional to

(14) ∏_{j=1}^{d} h_{j1}^{1/2}(θ_j).

It was further shown by Datta (1996) that the above prior is second-order probability matching for each component of θ. The last result is also available in somewhat implicit form in Datta and M. Ghosh (1995b), and in a special case in Sun and Ye (1996).

EXAMPLE 5.5 (Datta and M. Ghosh, 1995b). Consider the inverse Gaussian distribution with pdf

f(x; μ, σ²) = (2πσ²)^{−1/2} x^{−3/2} exp{−(x − μ)²/(2σ²μ²x)} I_{(0,∞)}(x),

where μ (> 0) and σ² (> 0) are both unknown. Here i(μ, σ²) = diag(μ^{−3}σ^{−2}, (2σ⁴)^{−1}). From the result given above it follows that π_R(μ, σ²) ∝ π_RR(μ, σ²) ∝ μ^{−3/2}σ^{−2}, identifying θ1 = μ and θ2 = σ². It further follows that this prior is a second-order PMP for both μ and σ².
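The recipe (14) can be traced explicitly here. The sketch below (added; not part of the original text) splits each diagonal element of i(μ, σ²) into the factor involving its own parameter and the rest, and multiplies the square roots of the own-parameter factors:

```python
# Factorisation recipe (14) applied to the inverse Gaussian information
# matrix diag(mu^-3 sigma^-2, (2 sigma^4)^-1) of Example 5.5.
import sympy as sp

mu, s2 = sp.symbols('mu sigma2', positive=True)   # sigma2 stands for sigma^2
theta = [mu, s2]
info_diag = [mu**-3 / s2, 1 / (2 * s2**2)]        # diagonal of i(mu, sigma^2)

prior = sp.Integer(1)
for j, kjj in enumerate(info_diag):
    # kappa_jj = h_j2(rest) * h_j1(theta_j); keep the own-parameter factor
    h_j2, h_j1 = kjj.as_independent(theta[j])
    prior *= sp.sqrt(h_j1)
print(prior)   # proportional to mu**(-3/2)/sigma2, i.e. mu^{-3/2} sigma^{-2}
```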

5.4. Simultaneous and joint matching priors

Peers (1965) showed that it is not possible in general to find a single prior that is probability matching to O(n^{−1}) for all parameters simultaneously. Datta (1996) extended this discussion to the case of s (⩽ d) real parametric functions of interest, t1(θ), . . . , ts(θ). For each of these parametric functions a differential equation similar to (12) can be solved to get a second-order PMP for that parametric function. It may be possible that one or more priors exist satisfying all the s differential equations. If that happens, following Datta (1996), these priors may be referred to as simultaneous marginal second-order PMPs.

The above simultaneous PMPs should be contrasted with joint PMPs, which are derived via joint consideration of all these parametric functions. While simultaneous marginal PMPs are obtained by matching appropriate posterior and frequentist marginal quantiles, joint PMPs are obtained by matching appropriate posterior and frequentist joint c.d.f.’s.

A prior π(·) is said to be a joint PMP for t1(θ), . . . , ts(θ) if

(15) pr^π{n^{1/2}{t1(θ) − t1(θ̂n)}/d1 ⩽ w1, . . . , n^{1/2}{ts(θ) − ts(θ̂n)}/ds ⩽ ws | Xn}
     = pr_θ{n^{1/2}{t1(θ) − t1(θ̂n)}/d1 ⩽ w1, . . . , n^{1/2}{ts(θ) − ts(θ̂n)}/ds ⩽ ws} + o(n^{−1/2})

for all w1, . . . , ws and all θ. In the above, dk = [{∇tk(θ̂n)}ᵀ Cn^{−1} ∇tk(θ̂n)]^{1/2}, k = 1, . . . , s, and w1, . . . , ws are free from n, θ and Xn. Here Cn is the observed information matrix, a d × d matrix which is positive definite with pr_θ-probability 1 + O(n^{−2}). It is assumed that the d × s gradient matrix corresponding to the s parametric functions is of full column rank for all θ.

In this section we will be concerned with only second-order PMPs. Mukerjee and Ghosh (1997) showed that up to this order marginal PMPs via c.d.f. matching and quantile matching are equivalent. Thus, from the definition of joint matching it is obvious that any joint PMP for a set of parametric functions will also be a simultaneous marginal PMP for those functions.

Datta (1996) investigated joint matching by considering an indirect extension of earlier work of Ghosh and Mukerjee (1993a), in which an importance ordering as used in reference priors (cf. Berger and Bernardo, 1992a) is assumed amongst the components of θ. Ghosh and Mukerjee (1993a) considered PMPs for the entire parameter vector θ by considering a pivotal vector whose ith component can be interpreted as an approximate standardised version of the regression residual of θi on θ1, . . . , θ_{i−1}, i = 1, . . . , d, in the posterior set-up. For this reason Datta (1996) referred to this approach as a regression residual matching approach. On the other hand, Datta (1996) proposed a direct extension of Datta and J.K. Ghosh (1995a) for a set of parametric functions that are of equal importance. The relationship between these two approaches has been explored in Datta (1996).

Define b_k^j and Λ_k as in (13) by replacing t(θ) by t_k(θ), k = 1, . . . , s. Also define, for k, m, u = 1, . . . , s,

ρ_{km} = b_k^j κ_{jl} b_m^l/(Λ_k Λ_m) = κ^{jr} D_j t_k D_r t_m/(Λ_k Λ_m), ζ_{kmu} = b_k^j D_j ρ_{mu}/Λ_k,

where κ_{jl}(θ) denotes the (j, l)th element of i(θ).

Datta (1996) proved that a simultaneous marginal PMP for the parametric functions t1(θ), . . . , ts(θ) is a joint matching prior if and only if

(16) ζ_{kmu} + ζ_{mku} + ζ_{ukm} = 0, k, m, u = 1, . . . , s,

hold.

Note that the conditions (16) depend only on the parametric functions and the model. Thus if these conditions fail, there would be no joint PMP even if a simultaneous marginal PMP exists. In the special case in which interest lies in the entire parameter vector, s = d and tk(θ) = θk. Here, if the Fisher information matrix is a d × d diagonal matrix then condition (16) holds trivially. If further the jth diagonal element factors as κ_{jj}(θ) = h_{j1}(θ_j) h_{j2}(θ^{(j)}), where θ^{(j)} = (θ1, . . . , θ_{j−1}, θ_{j+1}, . . . , θ_d) and h_{j1}(·), h_{j2}(·) are two positive functions, then the unique second-order joint PMP π_JM(θ) is given by the prior

(17) π_JM(θ) ∝ ∏_{j=1}^{d} h_{j1}^{1/2}(θ_j),

which is the same as the reference prior given in (14). In particular, Sun and Ye's (1996) work, which considered a joint PMP for the orthogonal mean and variance parameters in a two-parameter exponential family, follows as a special case of the prior (17).

EXAMPLE 5.6. Datta (1996) considered the example of a p-variate normal with mean μ = (μ1, . . . , μp) and identity matrix as the dispersion matrix. Reparameterise as μ1 = θ1 cos θ2, . . . , μ_{p−1} = θ1 sin θ2 · · · sin θ_{p−1} cos θ_p, μ_p = θ1 sin θ2 · · · sin θ_{p−1} sin θ_p. Here i(θ) = diag(1, θ1², θ1² sin² θ2, . . . , θ1² sin² θ2 · · · sin² θ_{p−1}) and all its diagonal elements have the desired factorisable structure. Hence by (17), π(θ) ∝ 1 is the unique joint PMP for the components of θ.
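For p = 3 the stated diagonal form of i(θ) is easy to confirm symbolically: for a normal mean with identity dispersion, the information in the new parameterisation is JᵀJ, with J the Jacobian of μ with respect to θ. (An added check; p = 3 is chosen for brevity.)

```python
# Information matrix of the polar reparameterisation in Example 5.6, p = 3.
import sympy as sp

th1, th2, th3 = sp.symbols('theta1 theta2 theta3', positive=True)
mu = sp.Matrix([th1 * sp.cos(th2),
                th1 * sp.sin(th2) * sp.cos(th3),
                th1 * sp.sin(th2) * sp.sin(th3)])
J = mu.jacobian([th1, th2, th3])
info = sp.simplify(J.T * J)   # information for N_p(mu, I) under theta is J^T J
print(info)                    # = diag(1, theta1**2, theta1**2*sin(theta2)**2)
```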

EXAMPLE 5.7. We continue Example 5.2 with a different notation. Consider the mixed linear model

X_{ij} = θ1 + a_i + e_{ij}, 1 ⩽ i ⩽ n, 1 ⩽ j ⩽ t,

where the parameter θ1 represents the general mean, each random effect a_i is univariate normal with mean zero and variance θ2, each e_{ij} is univariate normal with mean zero and variance θ3, and the a_i's and the e_{ij}'s are all independent. Here θ1 ∈ R and θ2, θ3 (> 0). Let s = 3, t1(θ) = θ1, t2(θ) = θ2/θ3 and t3(θ) = θ3. It is shown by Datta (1996) that the elements ρ_{ij} are all free from θ, and hence the condition (16) is automatically satisfied. Datta (1996) showed that π(θ) ∝ {θ3(θ3 + tθ2)}^{−1} is the unique joint PMP for the parametric functions given above.

5.5. Matching priors via Bartlett corrections

Inversion of likelihood ratio acceptance regions is a standard approach for constructing reasonably optimal confidence sets. Under suitable regularity conditions, the likelihood ratio statistic is Bartlett correctable. The error incurred in approximating the distribution of a Bartlett-corrected likelihood ratio statistic for the entire parameter vector by the chi-square distribution with d degrees of freedom is O(n^{−2}), whereas the corresponding error for the uncorrected likelihood ratio statistic is O(n^{−1}). Approximate confidence sets using a Bartlett-corrected likelihood ratio statistic and chi-square quantiles will have coverage probability accurate to fourth order.

In a pioneering article Bickel and Ghosh (1990) noted that the posterior distribution of the likelihood ratio statistic is also Bartlett correctable, and that the posterior distribution of the posterior Bartlett-corrected likelihood ratio statistic agrees with an appropriate chi-square distribution up to O(n^{−2}). From this, via the shrinkage argument mentioned in Section 2, they provided a derivation of the frequentist Bartlett correction.

It follows from the above discussion that for any smooth prior one can construct approximate credible sets for θ using chi-square quantiles and posterior Bartlett-corrected likelihood ratio statistics with an O(n^{−2}) error in approximation. Ghosh and Mukerjee (1991) utilised the existence of a posterior Bartlett correction to the likelihood ratio statistic to derive the frequentist Bartlett correction to the same and characterised priors for which these two corrections are identical up to o(1). An important implication of this characterisation result is that for all such priors the resulting credible sets based on the posterior Bartlett-corrected likelihood ratio statistic will also have frequentist coverage accurate to O(n^{−2}), and hence these priors are fourth-order PMPs.
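The effect of a Bartlett correction is easy to see in a toy model (an added illustration, not an example from the text). For an exponential sample with mean θ, the likelihood ratio statistic for θ = θ0 is W = 2n{log(θ0/X̄) + X̄/θ0 − 1}, and E[W] = 2n(log n − ψ(n)) exactly, with ψ the digamma function; this equals 1 + 1/(6n) + O(n^{−2}), so dividing W by this factor restores the χ²₁ mean of 1:

```python
# Bartlett correction of the likelihood ratio statistic, exponential model.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
n, theta0, reps = 5, 1.0, 200_000
xbar = rng.exponential(theta0, size=(reps, n)).mean(axis=1)
W = 2 * n * (np.log(theta0 / xbar) + xbar / theta0 - 1)   # LR statistic
bartlett = 2 * n * (np.log(n) - digamma(n))   # exact E[W]; ~ 1 + 1/(6n)
print(W.mean(), bartlett)                     # both ~ 1.033 for n = 5
print((W / bartlett).mean())                  # ~ 1 after correction
```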

For 1 ⩽ j, r, s ⩽ d, let us define V_{jr,s} = E_θ{D_j D_r log f(X; θ) D_s log f(X; θ)} and V_{jrs} = E_θ{D_j D_r D_s log f(X; θ)}. Ghosh and Mukerjee (1991) characterised the class of priors for which the posterior and the frequentist Bartlett corrections agree to o(1).

Any such prior π(·) is given by a solution to the differential equation

(18) D_i D_j{π(θ)κ^{ij}} − D_i{π(θ)κ^{ir}κ^{js}(2V_{rs,j} + V_{jrs})} = 0.

Ghosh and Mukerjee (1992b) generalised the above result in the presence of a nuisance parameter. They considered the case of a scalar interest parameter in the presence of a scalar orthogonal nuisance parameter. DiCiccio and Stern (1993) have considered a very general adjusted likelihood for a vector interest parameter where the nuisance parameter is also vector-valued. They have shown that the posterior distribution of the resulting likelihood ratio statistic also admits a posterior Bartlett correction. They subsequently utilised this fact in DiCiccio and Stern (1994), as in Ghosh and Mukerjee (1991), to characterise priors for which the posterior and the frequentist Bartlett corrections agree to o(1). Such priors are obtained as solutions to a differential equation similar to the one given by (18) above. As a particular example, DiCiccio and Stern (1994) considered PMPs based on HPD regions of a vector of interest parameters in the presence of nuisance parameters (see the next subsection).

5.6. Matching priors for highest posterior density regions

Highest posterior density regions are very popular in Bayesian inference as they are defined for multi-dimensional interest parameters with or without a nuisance parameter, which can also be multi-dimensional. These regions have the smallest volume for a given credible level. If such regions also have frequentist validity, they will be desirable in the frequentist set-up as well. Nonsubjective priors that lend frequentist validity to HPD regions are known in the literature as HPD matching priors. Complete characterisations of such priors are well studied in the literature. A brief account of this literature is provided below.

A prior π(·) is HPD matching for θ if and only if it satisfies the partial differential equation

(19) D_u{π(θ)V_{jrs}κ^{jr}κ^{su}} − D_j D_r{π(θ)κ^{jr}} = 0.

Ghosh and Mukerjee (1993b) reported this result in a different but equivalent form. Prior to these authors, Peers (1968) and Severini (1991) explored HPD matching priors for scalar θ models. Substantial simplification of Eq. (19) arises for scalar parameter

