In this section, results concerning several random variables are presented in a more rigorous fashion than in Section 2.5. For those desiring such a presentation, this section may be covered in addition to, or in place of, Section 2.5.
We have said that observing a value of a random variable is like sampling a value from a population. In some cases, the items in the population may each have several random variables associated with them. For example, imagine choosing a student at random from a list of all the students registered at a university and measuring that student’s height and weight. Each individual in the population of students corresponds to two random variables, height and weight. If we also determined the student’s age, each individual would correspond to three random variables. In principle, any number of random variables may be associated with each item in a population.
When two or more random variables are associated with each item in a population, the random variables are said to be jointly distributed. If all the random variables are discrete, they are said to be jointly discrete. If all the random variables are continuous, they are said to be jointly continuous. We will study these two cases separately.
Jointly Discrete Random Variables
Example 2.49 (in Section 2.5) discussed the lengths and widths of rectangular plastic covers for a CD tray that is installed in a personal computer. Measurements are rounded to the nearest millimeter. Let X denote the measured length and Y the measured width.
The possible values of X are 129, 130, and 131, and the possible values for Y are 15 and 16. Both X and Y are discrete, so X and Y are jointly discrete. There are six possible values for the ordered pair (X, Y): (129, 15), (129, 16), (130, 15), (130, 16), (131, 15), and (131, 16). Assume that the probabilities of each of these ordered pairs are as given in the following table.
x      y      P(X = x and Y = y)
129    15     0.12
129    16     0.08
130    15     0.42
130    16     0.28
131    15     0.06
131    16     0.04
The joint probability mass function is the function p(x, y) = P(X = x and Y = y).
So, for example, we have p(129, 15) = 0.12 and p(130, 16) = 0.28.
Sometimes we are given a joint probability mass function of two random variables, but we are interested in only one of them. For example, we might be interested in the probability mass function of X, the length of the CD cover, but not interested in the width Y. We can obtain the probability mass function of either one of the variables X or Y separately by summing the appropriate values of the joint probability mass function.
Examples 2.52 and 2.53 illustrate the method.
Example 2.52 Find the probability that a CD cover has a length of 129 mm.
Solution
It is clear from the previous table that 12% of the CD covers in the population have a length of 129 and a width of 15, and 8% have a length of 129 and a width of 16.
Therefore 20% of the items in the population have a length of 129. The probability that a CD cover has a length of 129 mm is 0.20. In symbols, we have
P(X = 129) = P(X = 129 and Y = 15) + P(X = 129 and Y = 16)
           = 0.12 + 0.08
           = 0.20
Example 2.53 Find the probability that a CD cover has a width of 16 mm.
Solution
We need to find P(Y = 16). We can find this quantity by summing the probabilities of all pairs (x, y) for which y = 16. We obtain

P(Y = 16) = P(X = 129 and Y = 16) + P(X = 130 and Y = 16) + P(X = 131 and Y = 16)
          = 0.08 + 0.28 + 0.04
          = 0.40
Examples 2.52 and 2.53 show that we can find the probability mass function of X (or Y) by summing the joint probability mass function over all values of Y (or X).
Table 2.3 presents the joint probability mass function of X and Y. The probability mass function of X appears in the rightmost column and is obtained by summing along the rows. The probability mass function of Y appears in the bottom row and is obtained by summing down the columns. Note that the probability mass functions of X and of Y appear in the margins of the table. For this reason they are often referred to as marginal probability mass functions.
TABLE 2.3 Joint and marginal probability mass functions for the length X and width Y of a CD cover

                 y
x          15      16      pX(x)
129        0.12    0.08    0.20
130        0.42    0.28    0.70
131        0.06    0.04    0.10
pY(y)      0.60    0.40
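The marginal totals in Table 2.3 can be reproduced with a short Python sketch (illustrative code, not part of the original text; the variable names are our own):

```python
# Joint pmf of (X, Y) from Table 2.3, entered as a dictionary.
joint_pmf = {
    (129, 15): 0.12, (129, 16): 0.08,
    (130, 15): 0.42, (130, 16): 0.28,
    (131, 15): 0.06, (131, 16): 0.04,
}

# Marginal of X: sum over the values of Y (summing along the rows).
# Marginal of Y: sum over the values of X (summing down the columns).
p_X, p_Y = {}, {}
for (x, y), p in joint_pmf.items():
    p_X[x] = p_X.get(x, 0.0) + p
    p_Y[y] = p_Y.get(y, 0.0) + p

print({x: round(p, 2) for x, p in p_X.items()})  # {129: 0.2, 130: 0.7, 131: 0.1}
print({y: round(p, 2) for y, p in p_Y.items()})  # {15: 0.6, 16: 0.4}
```

The six joint probabilities also sum to 1, as they must for a probability mass function.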
Finally, if we sum the joint probability mass function over all possible values of x and y, we obtain the probability that X and Y take values somewhere within their possible ranges, and this probability is equal to 1.
Summary
If X and Y are jointly discrete random variables:
■ The joint probability mass function of X and Y is the function
p(x, y) = P(X = x and Y = y)
■ The marginal probability mass functions of X and of Y can be obtained from the joint probability mass function as follows:
pX(x) = P(X = x) = Σy p(x, y)        pY(y) = P(Y = y) = Σx p(x, y)
where the sums are taken over all the possible values of Y and of X, respectively.
■ The joint probability mass function has the property that
Σx Σy p(x, y) = 1
where the sum is taken over all the possible values of X and Y.
Jointly Continuous Random Variables
We have seen that if X is a continuous random variable, its probabilities are found by integrating its probability density function. We say that the random variables X and Y are jointly continuous if their probabilities are found by integrating a function of two variables, called the joint probability density function of X and Y. To find the probability that X and Y take values in any region, we integrate the joint probability density function over that region. Example 2.54 shows how.
Example 2.54 Assume that for a certain type of washer, both the thickness and the hole diameter vary from item to item. Let X denote the thickness in millimeters and let Y denote the hole diameter in millimeters, for a randomly chosen washer. Assume that the joint probability density function of X and Y is given by

f(x, y) = { (1/6)(x + y)   if 1 ≤ x ≤ 2 and 4 ≤ y ≤ 5
          { 0              otherwise
Find the probability that a randomly chosen washer has a thickness between 1.0 and 1.5 mm, and a hole diameter between 4.5 and 5 mm.
Solution
We need to find P(1 ≤ X ≤ 1.5 and 4.5 ≤ Y ≤ 5). The large rectangle in the figure indicates the region where the joint density is positive. The shaded rectangle indicates the region where 1 ≤ x ≤ 1.5 and 4.5 ≤ y ≤ 5, over which the joint density is to be integrated.

[Figure: the rectangle 1 ≤ x ≤ 2, 4 ≤ y ≤ 5, where the joint density is positive, with the subrectangle 1 ≤ x ≤ 1.5, 4.5 ≤ y ≤ 5 shaded.]
We integrate the joint probability density function over the indicated region:
P(1 ≤ X ≤ 1.5 and 4.5 ≤ Y ≤ 5) = ∫_{1}^{1.5} ∫_{4.5}^{5} (1/6)(x + y) dy dx
                               = ∫_{1}^{1.5} [xy/6 + y²/12]_{y=4.5}^{y=5} dx
                               = ∫_{1}^{1.5} (x/12 + 19/48) dx
                               = 1/4
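The double integral in Example 2.54 can be verified numerically. The following Python sketch (our own illustration, not part of the text) approximates it with a midpoint-rule grid:

```python
# Midpoint-rule approximation of the integral of f(x, y) = (x + y)/6
# over the rectangle 1 <= x <= 1.5, 4.5 <= y <= 5.
def f(x, y):
    return (x + y) / 6.0  # joint density on its positive region

n = 200                      # subdivisions per axis
dx = (1.5 - 1.0) / n
dy = (5.0 - 4.5) / n
total = 0.0
for i in range(n):
    x = 1.0 + (i + 0.5) * dx       # midpoint in x
    for j in range(n):
        y = 4.5 + (j + 0.5) * dy   # midpoint in y
        total += f(x, y) * dx * dy

print(round(total, 6))  # 0.25
```

Because the integrand is linear in x and y, the midpoint rule is exact here up to floating-point rounding.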
Note that if a joint probability density function is integrated over the entire plane, that is, if the limits are −∞ to ∞ for both x and y, we obtain the probability that both X and Y take values between −∞ and ∞, which is equal to 1.
Summary
If X and Y are jointly continuous random variables, with joint probability density function f(x, y), and a < b, c < d, then

P(a ≤ X ≤ b and c ≤ Y ≤ d) = ∫_{a}^{b} ∫_{c}^{d} f(x, y) dy dx

The joint probability density function has the following properties:

f(x, y) ≥ 0 for all x and y

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dy dx = 1
We have seen that if X and Y are jointly discrete, the probability mass function of either variable may be found by summing the joint probability mass function over all the values of the other variable. When computed this way, the probability mass function is called the marginal probability mass function. By analogy, if X and Y are jointly continuous, the probability density function of either variable may be found by integrating the joint probability density function with respect to the other variable. When computed this way, the probability density function is called the marginal probability density function. Example 2.55 illustrates the idea.
Example 2.55 Refer to Example 2.54. Find the marginal probability density function of the thickness X of a washer. Find the marginal probability density function of the hole diameter Y of a washer.
Solution
Denote the marginal probability density function of X by fX(x), and the marginal probability density function of Y by fY(y). Then

fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{4}^{5} (1/6)(x + y) dy = (1/6)(x + 9/2)    for 1 ≤ x ≤ 2

and

fY(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{1}^{2} (1/6)(x + y) dx = (1/6)(y + 3/2)    for 4 ≤ y ≤ 5
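Each marginal density should integrate to 1 over its range. A short Python sketch (our own check, using a simple midpoint rule) confirms this for the two marginals just computed:

```python
def f_X(x):
    return (x + 4.5) / 6.0  # marginal density of X, 1 <= x <= 2

def f_Y(y):
    return (y + 1.5) / 6.0  # marginal density of Y, 4 <= y <= 5

def integrate(g, a, b, n=1000):
    # midpoint rule on [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(round(integrate(f_X, 1, 2), 6))  # 1.0
print(round(integrate(f_Y, 4, 5), 6))  # 1.0
```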
Summary
If X and Y are jointly continuous with joint probability density function f(x, y), then the marginal probability density functions of X and of Y are given, respectively, by

fX(x) = ∫_{−∞}^{∞} f(x, y) dy        fY(y) = ∫_{−∞}^{∞} f(x, y) dx
Example 2.56 The article “Performance Comparison of Two Location Based Routing Protocols for Ad Hoc Networks” (T. Camp, J. Boleng, et al., Proceedings of the Twenty-First Annual Joint Conference of IEEE Computer and Communications Societies, 2002:1678–1687) describes a model for the movement of a mobile computer. Assume that a mobile computer moves within the region A bounded by the x axis, the line x = 1, and the line y = x in such a way that if (X, Y) denotes the position of the computer at a given time, the joint density of X and Y is given by

f(x, y) = { 8xy   (x, y) ∈ A
          { 0     (x, y) ∉ A

Find P(X > 0.5 and Y < 0.5).
Solution
The region A is the triangle shown in Figure 2.15, with the region X > 0.5 and Y < 0.5 shaded in. To find P(X > 0.5 and Y < 0.5), we integrate the joint density over the shaded region.
P(X > 0.5 and Y < 0.5) = ∫_{0.5}^{1} ∫_{0}^{0.5} 8xy dy dx
                       = ∫_{0.5}^{1} [4xy²]_{y=0}^{y=0.5} dx
                       = ∫_{0.5}^{1} x dx
                       = 0.375
FIGURE 2.15 The triangle represents the region where the joint density of X and Y is positive. By integrating the joint density over the shaded square, we find the probability that the point (X, Y) lies in the shaded square.
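The probability in Example 2.56 can also be checked numerically. In the shaded square 0.5 < x < 1, 0 < y < 0.5, every point satisfies y < x, so the density there is simply 8xy. The following Python sketch (our own illustration) approximates the integral with a midpoint rule:

```python
# Midpoint-rule approximation of the integral of 8xy over the shaded square.
n = 200
dx = dy = 0.5 / n
total = 0.0
for i in range(n):
    x = 0.5 + (i + 0.5) * dx
    for j in range(n):
        y = (j + 0.5) * dy
        # every point here satisfies y < 0.5 <= x, so it lies inside region A
        total += 8 * x * y * dx * dy

print(round(total, 6))  # 0.375
```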
Example 2.57 Refer to Example 2.56. Find the marginal densities of X and of Y.
Solution
To compute fX(x), the marginal density of X, we fix x and integrate the joint density along the vertical line through x, as shown in Figure 2.16. The integration is with respect to y, and the limits of integration are y = 0 to y = x.
fX(x) = ∫_{0}^{x} 8xy dy = [4xy²]_{y=0}^{y=x} = 4x³    for 0 < x < 1
FIGURE 2.16 The marginal density fX(x) is computed by integrating the joint density along the vertical line through x.
To compute fY(y), the marginal density of Y, we fix y and integrate the joint density along the horizontal line through y, as shown in Figure 2.17. The integration is with respect to x, and the limits of integration are x = y to x = 1.
fY(y) = ∫_{y}^{1} 8xy dx = [4x²y]_{x=y}^{x=1} = 4y − 4y³    for 0 < y < 1
FIGURE 2.17 The marginal density fY(y) is computed by integrating the joint density along the horizontal line through y.
More than Two Random Variables
The ideas of joint probability mass functions and joint probability density functions extend easily to more than two random variables. We present the definitions here.
Definition
■ If the random variables X1, …, Xn are jointly discrete, the joint probability mass function is
p(x1, …, xn) = P(X1 = x1, …, Xn = xn)
■ If the random variables X1, …, Xn are jointly continuous, they have a joint probability density function f(x1, …, xn), where
P(a1 ≤ X1 ≤ b1, …, an ≤ Xn ≤ bn) = ∫_{an}^{bn} ··· ∫_{a1}^{b1} f(x1, …, xn) dx1 ··· dxn
for any constants a1 ≤ b1, …, an ≤ bn.
Means of Functions of Random Variables
Sometimes we are given a random variable X and we need to work with a function of X. If X is a random variable, and h(X) is a function of X, then h(X) is a random variable as well. If we wish to compute the mean of h(X), it can be done by using the probability mass function or probability density function of X. It is not necessary to know the probability mass function or probability density function of h(X).
Let X be a random variable, and let h(X) be a function of X. Then
■ If X is discrete with probability mass function p(x), the mean of h(X) is given by
μh(X) = Σx h(x)p(x)    (2.59)
where the sum is taken over all the possible values of X.
■ If X is continuous with probability density function f(x), the mean of h(X) is given by
μh(X) = ∫_{−∞}^{∞} h(x)f(x) dx    (2.60)
Note that if we substitute h(X) = (X − μX)² in either Equation (2.59) or (2.60), the right-hand side of the equation becomes an expression for the variance of X. It follows that σ²X = μ_(X−μX)². We can obtain another expression for the variance of X by substituting h(X) = X² and subtracting μ²X from both sides of the equation. We conclude that σ²X = μ_(X²) − μ²X.
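The two variance expressions can be compared on a small discrete distribution. The pmf below is our own made-up illustration, not from the text:

```python
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical pmf for a discrete X

mu = sum(x * p for x, p in pmf.items())                       # mean of X
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())      # E[(X - mu)^2]
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2   # E[X^2] - mu^2

print(round(var_def, 6), round(var_alt, 6))  # 0.49 0.49
```

Both formulas give the same variance, as the identity above asserts.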
Example 2.58 An internal combustion engine contains several cylinders bored into the engine block. Let X represent the bore diameter of a cylinder, in millimeters. Assume that the probability density function of X is
f(x) = { 10   80.5 < x < 80.6
       { 0    otherwise
Let A = πX²/4 represent the area of the bore. Find the mean of A.
Solution
μA = ∫_{−∞}^{∞} (πx²/4) f(x) dx
   = ∫_{80.5}^{80.6} (πx²/4)(10) dx
   = 5096

The mean area is 5096 mm².
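The integral in Example 2.58 evaluates in closed form to (10π/12)(80.6³ − 80.5³); a one-line Python check (our own sketch):

```python
import math

# mu_A = integral from 80.5 to 80.6 of (pi * x^2 / 4) * 10 dx
#      = (10 * pi / 12) * (80.6^3 - 80.5^3)
mu_A = (10 * math.pi / 12) * (80.6 ** 3 - 80.5 ** 3)
print(round(mu_A))  # 5096
```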
If h(X) = aX + b is a linear function of X, then the mean μ_aX+b and the variance σ²_aX+b can be expressed in terms of μX and σ²X. These results were presented in Equations (2.44) through (2.46) in Section 2.5; we repeat them here.
If X is a random variable, and a and b are constants, then

μ_aX+b = aμX + b    (2.61)
σ²_aX+b = a²σ²X    (2.62)
σ_aX+b = |a|σX    (2.63)
Proofs of these results are presented at the end of this section.
If X and Y are jointly distributed random variables, and h(X, Y) is a function of X and Y, then the mean of h(X, Y) can be computed from the joint probability mass function or joint probability density function of X and Y.
If X and Y are jointly distributed random variables, and h(X, Y) is a function of X and Y, then
■ If X and Y are jointly discrete with joint probability mass function p(x, y),
μh(X,Y) = Σx Σy h(x, y)p(x, y)    (2.64)
where the sum is taken over all the possible values of X and Y.
■ If X and Y are jointly continuous with joint probability density function f(x, y),
μh(X,Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y)f(x, y) dx dy    (2.65)
Example 2.59 The displacement of a piston in an internal combustion engine is defined to be the volume that the top of the piston moves through from the top to the bottom of its stroke. Let X represent the diameter of the cylinder bore, in millimeters, and let Y represent the length of the piston stroke in millimeters. The displacement is given by D = πX²Y/4. Assume X and Y are jointly distributed with joint probability density function

f(x, y) = { 100   80.5 < x < 80.6 and 65.1 < y < 65.2
          { 0     otherwise

Find the mean of D.
Solution
μD = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (πx²y/4) f(x, y) dx dy
   = ∫_{65.1}^{65.2} ∫_{80.5}^{80.6} (πx²y/4)(100) dx dy
   = 331,998

The mean displacement is 331,998 mm³, or approximately 332 mL.
Conditional Distributions
If X and Y are jointly distributed random variables, then knowing the value of X may change probabilities regarding the random variable Y. For example, let X represent the height in inches and Y represent the weight in pounds of a randomly chosen college student. Let's say that we are interested in the probability P(Y ≥ 200). If we know the joint density of X and Y, we can determine this probability by computing the marginal density of Y. Now let's say that we learn that the student's height is X = 78. Clearly, this knowledge changes the probability that Y ≥ 200. To compute this new probability, the idea of a conditional distribution is needed.
We will first discuss the case where X and Y are jointly discrete. Let x be any value for which P(X = x) > 0. Then the conditional probability that Y = y given X = x is P(Y = y | X = x). We will express this conditional probability in terms of the joint and marginal probability mass functions. Let p(x, y) denote the joint probability mass function of X and Y, and let pX(x) denote the marginal probability mass function of X. Then the conditional probability is
P(Y = y | X = x) = P(X = x and Y = y) / P(X = x) = p(x, y) / pX(x)
The conditional probability mass function of Y given X = x is the conditional probability P(Y = y | X = x), considered as a function of y and x.
Definition
Let X and Y be jointly discrete random variables, with joint probability mass function p(x, y). Let pX(x) denote the marginal probability mass function of X and let x be any number for which pX(x) > 0.
The conditional probability mass function of Y given X = x is
pY|X(y|x) = p(x, y) / pX(x)    (2.66)
Note that for any particular values of x and y, the value of pY|X(y|x) is just the conditional probability P(Y = y | X = x).
Example 2.60 Table 2.3 presents the joint probability mass function of the length X and width Y of a CD cover. Compute the conditional probability mass function pY|X(y|130).
Solution
The possible values for Y are y = 15 and y = 16. From Table 2.3, P(Y = 15 and X = 130) = 0.42, and P(X = 130) = 0.70. Therefore,

pY|X(15|130) = P(Y = 15 | X = 130)
             = P(Y = 15 and X = 130) / P(X = 130)
             = 0.42 / 0.70
             = 0.60
The value of pY|X(16|130) can be computed with a similar calculation. Alternatively, note that pY|X(16|130) = 1 − pY|X(15|130), since y = 15 and y = 16 are the only two possible values for Y. Therefore pY|X(16|130) = 0.40. The conditional probability mass function of Y given X = 130 is therefore pY|X(15|130) = 0.60, pY|X(16|130) = 0.40, and pY|X(y|130) = 0 for any value of y other than 15 or 16.
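The conditional pmf of Example 2.60 can be computed directly from Table 2.3 with a Python sketch (our own illustration):

```python
# Joint pmf of (X, Y) from Table 2.3.
joint_pmf = {
    (129, 15): 0.12, (129, 16): 0.08,
    (130, 15): 0.42, (130, 16): 0.28,
    (131, 15): 0.06, (131, 16): 0.04,
}

# p_X(130) is the row sum for x = 130.
p_X_130 = sum(p for (x, y), p in joint_pmf.items() if x == 130)

# Conditional pmf of Y given X = 130: joint pmf divided by the marginal.
cond = {y: round(joint_pmf[(130, y)] / p_X_130, 2) for y in (15, 16)}
print(cond)  # {15: 0.6, 16: 0.4}
```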
The analog to the conditional probability mass function for jointly continuous random variables is the conditional probability density function. The definition of the conditional probability density function is just like that of the conditional probability mass function, with mass functions replaced by density functions.
Definition
Let X and Y be jointly continuous random variables, with joint probability density function f(x, y). Let fX(x) denote the marginal probability density function of X and let x be any number for which fX(x) > 0.
The conditional probability density function of Y given X = x is
fY|X(y|x) = f(x, y) / fX(x)    (2.67)
Example 2.61 (Continuing Example 2.54.) The joint probability density function of the thickness X and hole diameter Y (both in millimeters) of a randomly chosen washer is f(x, y) = (1/6)(x + y) for 1 ≤ x ≤ 2 and 4 ≤ y ≤ 5. Find the conditional probability density function of Y given X = 1.2. Find the probability that the hole diameter is less than or equal to 4.8 mm given that the thickness is 1.2 mm.
Solution
In Example 2.55 we computed the marginal probability density functions

fX(x) = (1/6)(x + 4.5)   for 1 ≤ x ≤ 2
fY(y) = (1/6)(y + 1.5)   for 4 ≤ y ≤ 5

The conditional probability density function of Y given X = 1.2 is

fY|X(y|1.2) = f(1.2, y) / fX(1.2)
            = { (1/6)(1.2 + y) / [(1/6)(1.2 + 4.5)]   if 4 ≤ y ≤ 5
              { 0                                     otherwise
            = { (1.2 + y)/5.7   if 4 ≤ y ≤ 5
              { 0               otherwise
The probability that the hole diameter is less than or equal to 4.8 mm given that the thickness is 1.2 mm is P(Y ≤ 4.8 | X = 1.2). This is found by integrating fY|X(y|1.2) over the region y ≤ 4.8:

P(Y ≤ 4.8 | X = 1.2) = ∫_{−∞}^{4.8} fY|X(y|1.2) dy
                     = ∫_{4}^{4.8} (1.2 + y)/5.7 dy
                     = 0.786
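The last integral has the antiderivative (1.2y + y²/2)/5.7; a short Python check (our own sketch) evaluates it at the limits:

```python
def F(y):
    # antiderivative of the conditional density (1.2 + y)/5.7
    return (1.2 * y + y ** 2 / 2) / 5.7

prob = F(4.8) - F(4.0)
print(round(prob, 3))  # 0.786
```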
Conditional Expectation
Expectation is another term for mean. A conditional expectation is an expectation, or mean, calculated using a conditional probability mass function or conditional probability density function. The conditional expectation of Y given X = x is denoted E(Y | X = x) or μY|X=x. We illustrate with Examples 2.62 through 2.64.
Example 2.62 Table 2.3 presents the joint probability mass function of the length X and width Y of a CD cover. Compute the conditional expectation E(Y | X = 130).
Solution
We computed the conditional probability mass function pY|X(y|130) in Example 2.60. The conditional expectation E(Y | X = 130) is calculated using the definition of the mean of a discrete random variable and the conditional probability mass function. Specifically,

E(Y | X = 130) = Σy y pY|X(y|130)
               = 15pY|X(15|130) + 16pY|X(16|130)
               = 15(0.60) + 16(0.40)
               = 15.4
Example 2.63 Refer to Example 2.61. Find the conditional expectation of Y given that X = 1.2.
Solution
Since X and Y are jointly continuous, we use the definition of the mean of a continuous random variable to compute the conditional expectation.
E(Y | X = 1.2) = ∫_{−∞}^{∞} y fY|X(y|1.2) dy
               = ∫_{4}^{5} y(1.2 + y)/5.7 dy
               = 4.5146
Example 2.64 Refer to Example 2.61. Find the value μY (which can be called the unconditional mean of Y). Does it differ from E(Y | X = 1.2)?
Solution
The value μY is calculated using the marginal probability density function of Y. Thus

μY = ∫_{−∞}^{∞} y fY(y) dy
   = ∫_{4}^{5} y(1/6)(y + 1.5) dy
   = 4.5139
The conditional expectation in this case differs slightly from the unconditional expectation.
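The two expectations can be compared numerically with a Python sketch (our own check, using a midpoint rule):

```python
def integrate(g, a, b, n=2000):
    # midpoint rule on [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# E(Y | X = 1.2): mean of the conditional density (1.2 + y)/5.7 on [4, 5]
cond_mean = integrate(lambda y: y * (1.2 + y) / 5.7, 4, 5)
# mu_Y: mean of the marginal density (y + 1.5)/6 on [4, 5]
uncond_mean = integrate(lambda y: y * (y + 1.5) / 6, 4, 5)

print(round(cond_mean, 4), round(uncond_mean, 4))  # 4.5146 4.5139
```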
Independent Random Variables
The notion of independence for random variables is very much like the notion of independence for events. Two random variables are independent if knowledge regarding one of them does not affect the probabilities of the other. We present here a definition of independence of random variables in terms of their joint probability mass or joint probability density function. A different but logically equivalent definition was presented in Section 2.5.
Definition
Two random variables X and Y are independent, provided that
■ If X and Y are jointly discrete, the joint probability mass function is equal to the product of the marginals:
p(x, y) = pX(x)pY(y)
■ If X and Y are jointly continuous, the joint probability density function is equal to the product of the marginals:
f(x, y) = fX(x)fY(y)
Random variables X1, …, Xn are independent, provided that
■ If X1, …, Xn are jointly discrete, the joint probability mass function is equal to the product of the marginals:
p(x1, …, xn) = pX1(x1) ··· pXn(xn)
■ If X1, …, Xn are jointly continuous, the joint probability density function is equal to the product of the marginals:
f(x1, …, xn) = fX1(x1) ··· fXn(xn)
Intuitively, when two random variables are independent, knowledge of the value of one of them does not affect the probability distribution of the other. In other words, the conditional distribution of Y given X is the same as the marginal distribution of Y.
If X and Y are independent random variables, then
■ If X and Y are jointly discrete, and x is a value for which pX(x) > 0, then
pY|X(y|x) = pY(y)
■ If X and Y are jointly continuous, and x is a value for which fX(x) > 0, then
fY|X(y|x) = fY(y)
Example 2.65 The joint probability mass function of the length X and width Y of a CD tray cover is given in Table 2.3. Are X and Y independent?
Solution
We must check to see if P(X = x and Y = y) = P(X = x)P(Y = y) for every value of x and y. We begin by checking x = 129, y = 15:

P(X = 129 and Y = 15) = 0.12 = (0.20)(0.60) = P(X = 129)P(Y = 15)

Continuing in this way, we can verify that P(X = x and Y = y) = P(X = x)P(Y = y) for every value of x and y. Therefore X and Y are independent.
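The cell-by-cell check in Example 2.65 can be automated with a Python sketch (our own illustration):

```python
joint_pmf = {
    (129, 15): 0.12, (129, 16): 0.08,
    (130, 15): 0.42, (130, 16): 0.28,
    (131, 15): 0.06, (131, 16): 0.04,
}
p_X = {129: 0.20, 130: 0.70, 131: 0.10}  # marginals from Table 2.3
p_Y = {15: 0.60, 16: 0.40}

# X and Y are independent iff p(x, y) = p_X(x) * p_Y(y) for every cell.
independent = all(
    abs(p - p_X[x] * p_Y[y]) < 1e-9 for (x, y), p in joint_pmf.items()
)
print(independent)  # True
```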
Example 2.66 (Continuing Example 2.54.) The joint probability density function of the thickness X and hole diameter Y of a randomly chosen washer is f(x, y) = (1/6)(x + y) for 1 ≤ x ≤ 2 and 4 ≤ y ≤ 5. Are X and Y independent?
Solution
In Example 2.55 we computed the marginal probability density functions

fX(x) = (1/6)(x + 9/2)        fY(y) = (1/6)(y + 3/2)

Clearly f(x, y) ≠ fX(x)fY(y). Therefore X and Y are not independent.
Covariance
When two random variables are not independent, it is useful to have a measure of the strength of the relationship between them. The population covariance is a measure of a certain type of relationship known as a linear relationship. We will usually drop the term “population,” and refer simply to the covariance.
Definition
Let X and Y be random variables with means μX and μY. The covariance of X and Y is

Cov(X, Y) = μ_(X−μX)(Y−μY)    (2.68)

An alternate formula is

Cov(X, Y) = μ_XY − μXμY    (2.69)
A proof of the equivalence of these two formulas is presented at the end of the section.
It is important to note that the units of Cov(X, Y) are the units of X multiplied by the units of Y.
How does the covariance measure the strength of the linear relationship between X and Y? The covariance is the mean of the product of the deviations (X − μX)(Y − μY). If a Cartesian coordinate system is constructed with the origin at (μX, μY), this product will be positive in the first and third quadrants, and negative in the second and fourth quadrants (see Figure 2.18). It follows that if Cov(X, Y) is strongly positive, then values of (X, Y) in the first and third quadrants will be observed much more often than values in the second and fourth quadrants. In a random sample of points, therefore, larger values of X would tend to be paired with larger values of Y, while smaller values of X would tend to be paired with smaller values of Y (see Figure 2.18a).
Similarly, if Cov(X,Y)is strongly negative, the points in a random sample would be