Mathematical Statistics for Health Researchers

Author
Affiliation

Bongani Ncube

University of the Witwatersrand (School of Public Health)

Published

2 April 2025

Keywords

Expectation, Variance, Moment Generating Functions, Statistical Analysis, Biostatistics

Summary of Random Variables

|  | Discrete Variable | Continuous Variable |
|---|---|---|
| Definition | A random variable is discrete if it can assume at most a finite or countably infinite number of possible values. | A random variable is continuous if it can assume any value in some interval or intervals of real numbers, and the probability that it assumes any specific value is 0. |
| Density Function | A function $f$ is called a density for $X$ if: (1) $f(x) \geq 0$; (2) $\sum_{\text{all } x} f(x) = 1$; (3) $f(x) = P(X = x)$ for $x$ real. | A function $f$ is called a density for $X$ if: (1) $f(x) \geq 0$ for $x$ real; (2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$; (3) $P[a \leq X \leq b] = \int_a^b f(x)\,dx$ for $a$ and $b$ real. |
| Cumulative Distribution Function (for $x$ real) | $F(x) = P[X \leq x]$ | $F(x) = P[X \leq x] = \int_{-\infty}^{x} f(t)\,dt$ |
| $E[H(X)]$ | $\sum_{\text{all } x} H(x) f(x)$ | $\int_{-\infty}^{\infty} H(x) f(x)\,dx$ |
| $\mu = E[X]$ | $\sum_{\text{all } x} x f(x)$ | $\int_{-\infty}^{\infty} x f(x)\,dx$ |
| Ordinary Moments: the $k$th ordinary moment for variable $X$ is defined as $E[X^k]$ | $\sum_{\text{all } x} x^k f(x)$ | $\int_{-\infty}^{\infty} x^k f(x)\,dx$ |
| Moment generating function (mgf): $m_X(t) = E[e^{tX}]$ | $\sum_{\text{all } x} e^{tx} f(x)$ | $\int_{-\infty}^{\infty} e^{tx} f(x)\,dx$ |



Expected Value For Discrete Random Variables

Recall, a random variable is a real-valued function defined over a sample space, usually denoted by X or Y, and X is discrete if the space of X is finite or countably infinite.

Note

If $X$ is a discrete random variable with probability function $p(x)$, then the expected value of $X$, denoted $E(X)$, is $E(X) = \sum_{\text{all } x} x\,p(x)$. The expected value $E(X)$ is also called the mean of $X$, and is often denoted as $\mu_X$, or $\mu$ if the random variable $X$ is understood.

Note

Let $X$ be a discrete random variable with probability function $p(x)$, and suppose $g(X)$ is a real-valued function of $X$. Then the expected value of $g(X)$ is $E(g(X)) = \sum_{\text{all } x} g(x)\,p(x)$.
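
As a quick illustration (my own sketch, not from the text), these sums can be computed directly in R for any finite probability function; the pmf below is a made-up example.

```r
# Hypothetical pmf: X takes values 0, 1, 2, 3 with the probabilities below
x  <- c(0, 1, 2, 3)
px <- c(0.1, 0.2, 0.3, 0.4)

# E(X) = sum over all x of x * p(x)
EX <- sum(x * px)

# E(g(X)) for g(x) = x^2: sum over all x of g(x) * p(x)
EgX <- sum(x^2 * px)

c(EX = EX, E_X_squared = EgX)
```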

Variance

Note

If $X$ is a random variable with expected value $E(X) = \mu$, the variance of $X$, denoted $V(X)$, is $V(X) = E\big((X - \mu)^2\big)$. The variance of $X$ is often denoted $\sigma_X^2$, or $\sigma^2$ if the random variable is understood. Also, $\sqrt{V(X)}$, denoted $\sigma_X$ or $\sigma$, is called the standard deviation of $X$.

Properties of Expected Value

Note

Suppose $X$ is a discrete random variable, $c \in \mathbb{R}$ is a constant, and $g$, $g_1$, and $g_2$ are functions of $X$.

  1. $E(c) = c$.
  2. $E(c\,g(X)) = c\,E(g(X))$.
  3. $E(g_1(X) \pm g_2(X)) = E(g_1(X)) \pm E(g_2(X))$.

Let’s take the time to prove these properties. Each of them essentially follows by properties of summations.

Proof
  1. Given a constant $c$, we can view this constant as a function of $X$, say $f(x) = c$. Then $$E(c) = \sum_{\text{all } x} c\,p(x) = c\sum_{\text{all } x} p(x).$$

Since the sum over all $x$ of $p(x)$ is 1 for any probability model, the result follows.

  2. Here we appeal to the theorem on the expected value of a function of $X$: $$E(c\,g(X)) = \sum_{\text{all } x} c\,g(x)\,p(x) = c\sum_{\text{all } x} g(x)\,p(x) = c\,E(g(X)),$$ where the middle step is arithmetic.

  3. Here we also appeal to that theorem and arithmetic: $$\begin{aligned} E(g_1(X) \pm g_2(X)) &= \sum_{\text{all } x} \big(g_1(x) \pm g_2(x)\big)p(x) \\ &= \sum_{\text{all } x} \big(g_1(x)p(x) \pm g_2(x)p(x)\big) && \text{by arithmetic} \\ &= \sum_{\text{all } x} g_1(x)p(x) \pm \sum_{\text{all } x} g_2(x)p(x) && \text{by arithmetic} \\ &= E(g_1(X)) \pm E(g_2(X)). \end{aligned}$$

Let $X$ be a discrete random variable with probability function $p(x)$ and expected value $E(X) = \mu$. Then $V(X) = E(X^2) - \mu^2$.

Important

By definition, $$\begin{aligned} V(X) &= E\big((X - \mu)^2\big) \\ &= E(X^2 - 2\mu X + \mu^2) && \text{by expanding} \\ &= E(X^2) - E(2\mu X) + E(\mu^2) && \text{by } E(\cdot) \text{ Property 3} \\ &= E(X^2) - 2\mu E(X) + \mu^2 && \text{by } E(\cdot) \text{ Properties 2 and 1} \\ &= E(X^2) - 2\mu^2 + \mu^2 && \text{since } E(X) = \mu, \end{aligned}$$ so $V(X) = E(X^2) - \mu^2$.
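
As a numerical sanity check (a sketch using a made-up pmf, not one from the text), both sides of the shortcut formula can be computed in R and compared:

```r
# Made-up pmf for illustration only
x  <- c(1, 2, 5, 10)
px <- c(0.4, 0.3, 0.2, 0.1)

mu <- sum(x * px)                         # E(X)

v_def   <- sum((x - mu)^2 * px)           # definition: E((X - mu)^2)
v_short <- sum(x^2 * px) - mu^2           # shortcut:   E(X^2) - mu^2

c(definition = v_def, shortcut = v_short) # the two agree
```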

Tchebysheff’s Theorem

Let $X$ be a random variable with mean $E(X) = \mu$ and finite variance $V(X) = \sigma^2 > 0$. Then for any constant $k > 0$, $$P(|X - \mu| < k\sigma) \geq 1 - \frac{1}{k^2}.$$ Equivalently, $$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.$$

Important

We prove Tchebysheff’s inequality in the case of a discrete random variable, and we come back to this theorem after defining continuous random variables.

Let k>0 be given.

Then $V(X) = \sum_{\text{all } x} (x - \mu)^2 p(x)$, by the definition of variance. We can partition the space of $X$ into three disjoint sets, depending on the location of $x$ relative to $\mu \pm k\sigma$:

$$V(X) = \sum_{x \,\leq\, \mu - k\sigma} (x - \mu)^2 p(x) + \sum_{|x - \mu| \,<\, k\sigma} (x - \mu)^2 p(x) + \sum_{x \,\geq\, \mu + k\sigma} (x - \mu)^2 p(x)$$

Each of these three sums is non-negative, and for the first and third sums we can also say that $(x - \mu)^2 \geq k^2\sigma^2$ for all $x$ in the given range, so it follows that $$V(X) \geq \sum_{x \,\leq\, \mu - k\sigma} k^2\sigma^2 p(x) + 0 + \sum_{x \,\geq\, \mu + k\sigma} k^2\sigma^2 p(x).$$ So,

$$\begin{aligned} \sigma^2 &\geq \sum_{x \,\leq\, \mu - k\sigma} k^2\sigma^2 p(x) + 0 + \sum_{x \,\geq\, \mu + k\sigma} k^2\sigma^2 p(x) \\ &= k^2\sigma^2\left(\sum_{x \,\leq\, \mu - k\sigma} p(x) + \sum_{x \,\geq\, \mu + k\sigma} p(x)\right) \\ &= k^2\sigma^2\big(P(X \leq \mu - k\sigma) + P(X \geq \mu + k\sigma)\big) \\ &= k^2\sigma^2\,P(|X - \mu| \geq k\sigma) \end{aligned}$$

Dividing both sides of the inequality by the positive value $k^2\sigma^2$ gives us the result: $P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$.
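
To see the inequality in action, here is a minimal sketch in R (my own illustration, not part of the proof). For a skewed distribution such as the exponential with rate 1 (so $\mu = \sigma = 1$), the exact tail probability $P(|X - \mu| \geq k\sigma)$ always sits below the bound $1/k^2$:

```r
# X ~ Exponential(rate = 1): mu = 1, sigma = 1
mu <- 1; sigma <- 1
k  <- c(1.5, 2, 3)

# Exact two-sided tail probability P(|X - mu| >= k * sigma);
# the lower tail P(X <= mu - k*sigma) is 0 here since mu - k*sigma < 0
p_tail <- pexp(mu - k * sigma) + (1 - pexp(mu + k * sigma))

cbind(k = k, exact = p_tail, tchebysheff_bound = 1 / k^2)
```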

Expected Value for Continuous Random Variables

Note

If $X$ is a continuous random variable with probability density function $f(x)$, then the expected value of $X$, denoted $E(X)$, is $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, provided this integral exists. The expected value $E(X)$ is also called the mean of $X$, and is often denoted as $\mu_X$, or $\mu$ if the random variable $X$ is understood.

The expected value of a function $g(X)$ of $X$ is $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$, provided this integral exists.

The variance of $X$ is $V(X) = E\big((X - \mu_X)^2\big)$, provided this expectation exists.

As in the discrete case, one can show that $V(X) = E(X^2) - E(X)^2$, a working formula that is often easier to use when calculating a variance.

Find $E(X)$ and $V(X)$, where $X$ is the continuous random variable with the density below.

Recall $X$ has density function $f(x) = 3x^2/8$ for $0 \leq x \leq 2$.

Expected Value: $$E(X) = \int_0^2 x \cdot \frac{3x^2}{8}\,dx = \frac{3}{8}\int_0^2 x^3\,dx = \frac{3}{8}\cdot\frac{1}{4}x^4\,\Big|_0^2 = \frac{3}{2}.$$

Variance: We first find $E(X^2)$: $$E(X^2) = \int_0^2 x^2 \cdot \frac{3x^2}{8}\,dx = \frac{3}{8}\int_0^2 x^4\,dx = \frac{3}{8}\cdot\frac{1}{5}x^5\,\Big|_0^2 = \frac{12}{5}.$$

Then, $V(X) = E(X^2) - E(X)^2 = \frac{12}{5} - \left(\frac{3}{2}\right)^2 = 0.15$.
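
These integrals can be double-checked numerically with R's integrate() function (a quick sketch using the density from this example):

```r
# Density from the example: f(x) = 3x^2/8 on [0, 2]
f <- function(x) 3 * x^2 / 8

EX  <- integrate(function(x) x   * f(x), lower = 0, upper = 2)$value  # expect 1.5
EX2 <- integrate(function(x) x^2 * f(x), lower = 0, upper = 2)$value  # expect 2.4

c(EX = EX, VX = EX2 - EX^2)   # VX should be 0.15
```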

The properties of expected value that held for discrete random variables also hold for continuous random variables.

Note

Suppose $X$ is a continuous random variable, $c \in \mathbb{R}$ is a constant, and $g$, $g_1$, and $g_2$ are functions of $X$.

  1. $E(c) = c$.
  2. $E(c\,g(X)) = c\,E(g(X))$.
  3. $E(g_1(X) \pm g_2(X)) = E(g_1(X)) \pm E(g_2(X))$.

These results follow immediately from properties of integration. For instance, to prove property 1 we observe that for a constant $c$, $E(c) = \int_{-\infty}^{\infty} c f(x)\,dx = c\int_{-\infty}^{\infty} f(x)\,dx$, and the integral in the last expression equals 1 by the definition of a valid probability density function.

Let $X$ be a random variable (discrete or continuous) with $E(X) = \mu$ and $V(X) = \sigma^2$, and let $a, b$ be constants. Then

  1. $E(aX + b) = aE(X) + b = a\mu + b$.
  2. $V(aX + b) = a^2 V(X) = a^2\sigma^2$.

Proof.

  1. This result follows immediately from the properties of expected value.

  2. Let $Y = aX + b$. Then part 1 says that $E(Y) = a\mu + b$, so $$V(Y) = E\big((Y - (a\mu + b))^2\big) = E\big(((aX + b) - (a\mu + b))^2\big) = E\big((aX - a\mu)^2\big) = a^2 E\big((X - \mu)^2\big).$$ But $E\big((X - \mu)^2\big) = V(X)$ by the definition of variance, so the result follows.
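
A short Monte Carlo sketch in R (my own check, with an arbitrary choice of distribution and constants) illustrates both identities:

```r
set.seed(1)
a <- 3; b <- -2

# Arbitrary example: X ~ gamma(shape = 2, scale = 2), so mu = 4 and sigma^2 = 8
x <- rgamma(1e5, shape = 2, scale = 2)
y <- a * x + b

c(mean_Y = mean(y), a_mu_plus_b = a * 4 + b)   # both near 10
c(var_Y  = var(y),  a2_sigma_sq = a^2 * 8)     # both near 72
```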

Moments and Moment-Generating Functions For Discrete Random Variables

Moment generating functions (MGFs), probability generating functions (PGFs), and characteristic functions provide a way of representing pdfs/pmfs through functions of a single variable. They are useful in many ways, including the following:

  1. They provide an easy way of calculating the moments of a distribution, which helps in computing the mean and variance of different variables.
  2. They provide powerful tools for addressing certain counting and combinatorial problems.
  3. They provide an easy way of characterizing the distribution of a sum of independent random variables.
  4. They provide a bridge between complex analysis and probability, so that methods from complex analysis can be brought to bear on probability problems.
  5. They provide powerful tools for proving limit theorems such as the law of large numbers and the central limit theorem.

For a random variable $X$ we have already seen that $E(X)$ and $E(X^2)$ provide useful information:

  • $\mu = E(X)$ gives the mean of the distribution.
  • $\sigma^2 = E(X^2) - E(X)^2$ gives the variance of the distribution.
Important

Let $X$ be a random variable, and $k \geq 1$. The $k$th moment of $X$ about the origin is $E(X^k)$. More generally, for any constant $c \in \mathbb{R}$, $E\big((X - c)^k\big)$ is called the $k$th moment of $X$ about $x = c$.

Oftentimes we can encode all the moments of a random variable in a single object called a moment-generating function.

Note

Let $X$ be a discrete random variable with density function $p(x)$. If there is a positive real number $h$ such that for all $t \in (-h, h)$, $E(e^{tX})$ exists and is finite, then the function of $t$ defined by $m(t) = E(e^{tX})$ is called the moment-generating function of $X$.

Suppose $X$ has the density function

| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $p(x)$ | .1 | .2 | .3 | .4 |

Then, for any real number $t$,

$$m(t) = E(e^{tX}) = \sum_{x=0}^{3} e^{tx}p(x) = e^{0}(.1) + e^{t}(.2) + e^{2t}(.3) + e^{3t}(.4) = .1 + .2e^{t} + .3e^{2t} + .4e^{3t},$$

and this sum exists as a finite number for any $-\infty < t < \infty$, so the mgf for $X$ exists.

How does $m(t)$ encode the moments $E(X), E(X^2), E(X^3), \ldots$?

Suppose $X$ is a random variable with moment-generating function $m(t)$ which exists for $t$ in some open interval containing 0. Then the $k$th moment of $X$ equals the $k$th derivative of $m(t)$ evaluated at $t = 0$: $E(X^k) = m^{(k)}(0)$.

Proof. Let’s say $X$ is discrete and $m(t) = \sum_{\text{all } x} e^{tx}p(x)$. Then the derivative of $m(t)$ with respect to $t$ is $m'(t) = \sum_{\text{all } x} x e^{tx}p(x)$, and letting $t = 0$ we have $m'(0) = \sum_{\text{all } x} x e^{0}p(x)$, which equals $E(X)$ since $e^{0} = 1$.

The second derivative of $m(t)$ is $$m''(t) = \frac{d}{dt}\big[m'(t)\big] = \sum_{\text{all } x} x^2 e^{tx}p(x).$$

Evaluating this at $t = 0$ gives $m''(0) = \sum_{\text{all } x} x^2 \cdot 1 \cdot p(x) = E(X^2)$.

Continuing in this manner, for any $k \geq 1$, the $k$th derivative of $m(t)$ is $m^{(k)}(t) = \sum_{\text{all } x} x^k e^{tx}p(x)$, which evaluates to the definition of $E(X^k)$ when $t = 0$.
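
The same fact can be checked numerically. Below is a small sketch (my own check) that differentiates the mgf of the four-point distribution above, $m(t) = .1 + .2e^{t} + .3e^{2t} + .4e^{3t}$, at $t = 0$ with finite differences and compares the results with the moments computed directly from the pmf:

```r
# mgf of the four-point example above
m <- function(t) 0.1 + 0.2 * exp(t) + 0.3 * exp(2 * t) + 0.4 * exp(3 * t)

h  <- 1e-5
m1 <- (m(h) - m(-h)) / (2 * h)          # central difference, approximates m'(0)
m2 <- (m(h) - 2 * m(0) + m(-h)) / h^2   # approximates m''(0)

x  <- 0:3
px <- c(0.1, 0.2, 0.3, 0.4)

c(mgf_deriv1 = m1, EX  = sum(x * px))    # both 2
c(mgf_deriv2 = m2, EX2 = sum(x^2 * px))  # both 5
```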

The mgf for a geometric distribution

If $X$ is geometric with parameter $p$, then $p(x) = (1 - p)^{x-1}p$, for $x = 1, 2, 3, \ldots$, and

$$\begin{aligned} m(t) = E(e^{tX}) &= \sum_{x=1}^{\infty} e^{tx}(1-p)^{x-1}p \\ &= p e^{t}\sum_{x=1}^{\infty} e^{t(x-1)}(1-p)^{x-1} && \text{since } e^{t}\,e^{t(x-1)} = e^{tx} \\ &= p e^{t}\sum_{x=1}^{\infty} \big[e^{t}(1-p)\big]^{x-1} \\ &= p e^{t}\sum_{k=0}^{\infty} \big[e^{t}(1-p)\big]^{k} && \text{where } k = x - 1 \text{ is a change of index} \\ &= p e^{t}\,\frac{1}{1 - e^{t}(1-p)} \end{aligned}$$

The last step is true by the geometric series formula, provided $|e^{t}(1-p)| < 1$. Since $0 \leq |e^{t}(1-p)| = e^{t}(1-p)$, the series converges by the geometric series formula if and only if $e^{t}(1-p) < 1$. Well,

$$e^{t}(1-p) < 1 \iff e^{t} < \frac{1}{1-p} \iff t < \ln\!\left(\frac{1}{1-p}\right).$$

In other words, yes, there exists an interval containing 0 for which m(t) exists for all t in the interval.

The mgf for a Poisson distribution

Find the mgf of a Poisson random variable $X$ with parameter $\lambda$. Since we’re considering a Poisson distribution, our strategy for finding the mgf will be to work our expectation to look like a power series for $e^{\text{junk}}$.

Strategy: Work our series to include $\sum_{x=0}^{\infty}\frac{(\text{junk})^{x}}{x!}$ since this converges to $e^{\text{junk}}$.

$$\begin{aligned} m(t) = E(e^{tX}) &= \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^{x}e^{-\lambda}}{x!} \\ &= e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^{t})^{x}}{x!} && \text{here it is!} \\ &= e^{-\lambda}e^{\lambda e^{t}} && \text{for all } -\infty < t < \infty \\ &= e^{\lambda(e^{t} - 1)}. \end{aligned}$$

Let’s derive our $\mu$ and $\sigma^2$ formulas for a Poisson random variable using the mgf.

The first derivative is $m'(t) = e^{\lambda(e^{t}-1)}\lambda e^{t}$, and $m'(0) = e^{\lambda(1-1)}\lambda e^{0} = \lambda$.

The second derivative is $m''(t) = \big(e^{\lambda(e^{t}-1)}\lambda e^{t}\big)\lambda e^{t} + e^{\lambda(e^{t}-1)}\lambda e^{t}$, so $m''(0) = \lambda^2 + \lambda$.

Now $\mu = m'(0) = \lambda$, check! And $\sigma^2 = m''(0) - [m'(0)]^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda$, check again!
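
Here is a small simulation sketch in R (my own check, with the arbitrary choice $\lambda = 3$) confirming the Poisson mgf and moments numerically:

```r
set.seed(2)
lambda <- 3
x <- rpois(1e5, lambda)

# E(e^{tX}) estimated by simulation vs. the closed-form mgf, at t = 0.4
t <- 0.4
c(simulated_mgf = mean(exp(t * x)),
  formula       = exp(lambda * (exp(t) - 1)))

# Sample mean and variance should both be roughly lambda
c(mean = mean(x), var = var(x), lambda = lambda)
```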

Moments and Moment-Generating Functions For Normal Random Variables

The moment-generating function (mgf) associated with a discrete random variable $X$, should it exist, is given by $m_X(t) = E(e^{tX})$, where the function is defined on some open interval of $t$ values containing 0. The same definition applies to continuous random variables. We have seen that this mgf encodes information about $X$: the $k$th derivative of $m$ evaluated at $t = 0$ gives us the $k$th moment. That is, for $k = 1, 2, 3, \ldots$, $m_X^{(k)}(0) = E(X^k)$.

In fact, it turns out that the mgf gives us all the information about a random variable X, per the following theorem, whose proof is beyond the scope of this course.

Let $m_X(t)$ and $m_Y(t)$ denote the mgfs of random variables $X$ and $Y$, respectively. If both mgfs exist and $m_X(t) = m_Y(t)$ for all values of $t$, then $X$ and $Y$ have the same probability distribution.

Find the mgf for the standard normal random variable $Z \sim N(0, 1)$.

$$\begin{aligned} m_Z(t) = E(e^{tZ}) &= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-z^2/2}e^{tz}\,dz \\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{tz - z^2/2}\,dz \\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{1}{2}(z - t)^2 + \frac{1}{2}t^2}\,dz && \text{complete the square} \\ &= e^{\frac{1}{2}t^2}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{1}{2}(z - t)^2}\,dz\right] \end{aligned}$$

The bracketed portion of this last expression equals 1, for all $t$, since it is the integral of the density function of a $N(t, 1)$ distribution, so $m_Z(t) = e^{\frac{1}{2}t^2}$, for all $-\infty < t < \infty$.

More generally, for $X \sim N(\mu, \sigma)$, one can show its mgf is

$$m(t) = e^{\mu t + \frac{\sigma^2}{2}t^2}.$$
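
A Monte Carlo sketch in R (my own check, with arbitrary values of $t$, $\mu$, and $\sigma$) supports both formulas: the simulated value of $E(e^{tZ})$ is close to $e^{t^2/2}$, and similarly for a general normal random variable.

```r
set.seed(3)
t <- 0.7

# Standard normal: E(e^{tZ}) should be close to exp(t^2 / 2)
z <- rnorm(1e6)
c(simulated = mean(exp(t * z)), formula = exp(t^2 / 2))

# General normal with mu = 2, sigma = 0.5: compare with exp(mu*t + sigma^2 * t^2 / 2)
x <- rnorm(1e6, mean = 2, sd = 0.5)
c(simulated = mean(exp(t * x)), formula = exp(2 * t + 0.5^2 * t^2 / 2))
```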

We now return to the proof of an earlier theorem, which we restate as the following lemma.

If $X$ is $N(\mu, \sigma)$ and $Z = \frac{X - \mu}{\sigma}$, then $Z$ is $N(0, 1)$.

Note

Let $X$ be $N(\mu, \sigma)$, and $Z = \frac{X - \mu}{\sigma}$. Then the mgf for $Z$ is

$$m_Z(t) = E\big[e^{tZ}\big] = E\Big[e^{t\left(\frac{X - \mu}{\sigma}\right)}\Big] = E\big[e^{Xt/\sigma - \mu t/\sigma}\big] = E\big[e^{Xt/\sigma}e^{-\mu t/\sigma}\big] = e^{-\mu t/\sigma}E\big[e^{Xt/\sigma}\big] = e^{-\mu t/\sigma}m_X(t/\sigma).$$ This last step follows because $E\big[e^{Xt/\sigma}\big]$ is the mgf of $X$ evaluated at $t/\sigma$. Then,

$$m_Z(t) = e^{-\mu t/\sigma}e^{\mu(t/\sigma) + \frac{\sigma^2}{2}(t/\sigma)^2} = e^{t^2/2}.$$

But hey! This mgf is the mgf for $N(0, 1)$, so by the uniqueness theorem above, since $Z = (X - \mu)/\sigma$ and $N(0, 1)$ have the same mgf, they have the same probability distribution.

If $Z$ is $N(0, 1)$ then $Z^2$ is $\chi^2(1)$.

The proof of this lemma is left for now.

Note

Let $X_1, X_2, \ldots, X_n$ be independent random variables with mgfs $m_1(t), m_2(t), \ldots, m_n(t)$, respectively. If $S_n = X_1 + X_2 + \cdots + X_n$, then $m_{S_n}(t) = m_1(t)\,m_2(t)\cdots m_n(t)$.

Sketch of Proof:

$$m_{S_n}(t) = E\big[e^{tS_n}\big] = E\big[e^{t(X_1 + X_2 + \cdots + X_n)}\big] = E\big[e^{tX_1}e^{tX_2}\cdots e^{tX_n}\big] = E\big[e^{tX_1}\big]E\big[e^{tX_2}\big]\cdots E\big[e^{tX_n}\big] = m_{X_1}(t)\,m_{X_2}(t)\cdots m_{X_n}(t)$$

Note

Let $X_1, X_2, \ldots, X_n$ be independent random variables from a common distribution with mgf $m(t)$ and distribution function $F(x)$. If $S_n = X_1 + X_2 + \cdots + X_n$, then $m_{S_n}(t) = m_1(t)\,m_2(t)\cdots m_n(t) = [m(t)]^n$.

That the $E[\,\cdot\,]$ distributes through the product in the fourth equality of the sketch above follows since the $X_i$ are assumed to be independent.

Note

Let $X_1, X_2, \ldots, X_n$ be independent normal random variables with $X_i \sim N(\mu_i, \sigma_i)$, and let $a_1, a_2, \ldots, a_n$ be constants. If $U = \sum_{i=1}^{n} a_i X_i$, then $U$ is normally distributed with $$\mu = \sum_{i=1}^{n} a_i\mu_i \quad\text{and}\quad \sigma^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2.$$

Note

Since $X_i$ is $N(\mu_i, \sigma_i)$, $X_i$ has mgf $m_{X_i}(t) = e^{\mu_i t + \sigma_i^2 t^2/2}$. For a constant $a_i$, the random variable $a_iX_i$ has mgf $m_{a_iX_i}(t) = E\big(e^{a_iX_i t}\big) = m_{X_i}(a_i t) = e^{\mu_i a_i t + a_i^2\sigma_i^2 t^2/2}$. Then by the theorem on sums of independent random variables and properties of exponents, for $U = \sum_{i=1}^{n} a_iX_i$, $$m_{U}(t) = \prod_{i=1}^{n} m_{a_iX_i}(t) = \prod_{i=1}^{n} e^{\mu_i a_i t + a_i^2\sigma_i^2 t^2/2} = e^{t\sum a_i\mu_i + \frac{t^2}{2}\sum a_i^2\sigma_i^2}.$$

But hey! This is the mgf for a normal distribution with mean $\sum a_i\mu_i$ and variance $\sum a_i^2\sigma_i^2$, so we have proved the result.

Let $X_1, X_2, \ldots, X_n$ be independent normal random variables with $X_i \sim N(\mu_i, \sigma_i)$, and $Z_i = \frac{X_i - \mu_i}{\sigma_i}$ for $i = 1, \ldots, n$. Then $U = \sum_{i=1}^{n} Z_i^2$ is $\chi^2(n)$.

Note

Suppose the number of customers arriving at a particular checkout counter in an hour follows a Poisson distribution. Let $X_1$ record the time until the first arrival, $X_2$ the time between the 1st and 2nd arrivals, and so on, up to $X_n$, the time between the $(n-1)$st and $n$th arrivals. Then it turns out the $X_i$ are independent, and each is an exponential random variable with density $f_{X_i}(x_i) = \frac{1}{\theta}e^{-x_i/\theta}$, for $x_i > 0$ (and 0 else). Find the density function for the waiting time $U$ until the $n$th customer arrives.

Well $U = X_1 + X_2 + \cdots + X_n$, so by the theorem on sums of independent random variables, $m_U(t) = m_1(t)\cdots m_n(t) = (1 - \theta t)^{-n}$. But, hey! This is the mgf for a gamma$(\alpha = n, \beta = \theta)$ random variable, so by the uniqueness theorem, $U$ is gamma$(n, \theta)$. So $f_U(u) = \frac{1}{(n-1)!\,\theta^{n}}u^{n-1}e^{-u/\theta}$, for $u > 0$ (and 0 else).
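
A simulation sketch in R (my own illustration, with the arbitrary choices $n = 5$ and $\theta = 2$) supports this: sums of $n$ independent exponential inter-arrival times behave like a gamma$(n, \theta)$ random variable.

```r
set.seed(4)
n <- 5; theta <- 2

# 100,000 replicates of U = X1 + ... + Xn, each Xi ~ Exponential(mean = theta)
u <- rowSums(matrix(rexp(n * 1e5, rate = 1 / theta), ncol = n))

# Compare simulated quantiles of U with gamma(shape = n, scale = theta) quantiles
probs <- c(0.25, 0.50, 0.75, 0.95)
rbind(simulated = quantile(u, probs),
      gamma     = qgamma(probs, shape = n, scale = theta))
```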

Note

If $Y_1$ is $N(10, .5)$ and $Y_2$ is $N(4, .2)$, and $U = 100 + 7Y_1 + 3Y_2$, how is $U$ distributed, and what value marks the 90th percentile for $U$?

The theorem on linear combinations of independent normal random variables says that $U$ is normal with $E(U) = 100 + 7\cdot 10 + 3\cdot 4 = 182$ and $V(U) = 0 + 7^2(.5)^2 + 3^2(.2)^2 = 12.61$, so $\sigma_U = \sqrt{12.61} \approx 3.55$.

The 90th percentile can be found in R with the qnorm() function:

qnorm(.9,mean=182,sd=3.55)
[1] 186.5495
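
A short simulation (my own check of the same numbers) agrees with the theoretical mean, standard deviation, and 90th percentile:

```r
set.seed(5)
y1 <- rnorm(1e5, mean = 10, sd = 0.5)
y2 <- rnorm(1e5, mean = 4,  sd = 0.2)
u  <- 100 + 7 * y1 + 3 * y2

c(mean = mean(u), sd = sd(u), p90 = quantile(u, 0.9))  # roughly 182, 3.55, 186.5
```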
MGF for a Uniform Distribution

Find the moment-generating function for $X \sim U(\theta_1, \theta_2)$.

$$m_X(t) = E(e^{tX}) = \int_{\theta_1}^{\theta_2} e^{tx}\,\frac{1}{\theta_2 - \theta_1}\,dx = \frac{1}{\theta_2 - \theta_1}\cdot\frac{1}{t}e^{tx}\,\Big|_{\theta_1}^{\theta_2} = \frac{e^{t\theta_2} - e^{t\theta_1}}{t(\theta_2 - \theta_1)},$$ for $t \neq 0$ (with $m_X(0) = 1$).

MGF for a Gamma Distribution

Find the moment-generating function for $X \sim \text{gamma}(\alpha, \beta)$ and compute $E(X)$ and $V(X)$.

$$\begin{aligned} m_X(t) = E(e^{tX}) &= \int_0^{\infty} e^{tx}\,\frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,x^{\alpha - 1}e^{-x/\beta}\,dx \\ &= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\int_0^{\infty} x^{\alpha - 1}e^{-x(1/\beta - t)}\,dx \\ &= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\left(\frac{1}{1/\beta - t}\right)^{\alpha}\Gamma(\alpha)\int_0^{\infty}\frac{x^{\alpha - 1}e^{-x(1/\beta - t)}}{\left(\frac{1}{1/\beta - t}\right)^{\alpha}\Gamma(\alpha)}\,dx \\ &= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\left(\frac{1}{1/\beta - t}\right)^{\alpha}\Gamma(\alpha) \end{aligned}$$

The last integral above evaluates to 1 because the integrand is the pdf of a gamma distribution with shape $\alpha$ and scale $(1/\beta - t)^{-1}$ (for $t < 1/\beta$)! After simplifying we obtain $m_X(t) = (1 - \beta t)^{-\alpha}$.

With the mgf for a gamma random variable in hand, we can now derive its mean and variance.

$m_X'(t) = -\alpha(1 - \beta t)^{-\alpha - 1}(-\beta) = \alpha\beta(1 - \beta t)^{-\alpha - 1}$, so $E(X) = m_X'(0) = \alpha\beta$. Turning to the second derivative, $m_X''(t) = (-\alpha - 1)\,\alpha\beta(1 - \beta t)^{-\alpha - 2}(-\beta) = \alpha(\alpha + 1)\beta^2(1 - \beta t)^{-\alpha - 2}$, so $E(X^2) = m_X''(0) = \alpha(\alpha + 1)\beta^2$. Thus, $V(X) = E(X^2) - E(X)^2 = \alpha(\alpha + 1)\beta^2 - (\alpha\beta)^2 = \alpha\beta^2$.
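
As a sketch (my own check, with the arbitrary choices $\alpha = 3$ and $\beta = 2$), differentiating $m_X(t) = (1 - \beta t)^{-\alpha}$ numerically at $t = 0$ reproduces $\alpha\beta$ and $\alpha\beta^2$:

```r
alpha <- 3; beta <- 2
m <- function(t) (1 - beta * t)^(-alpha)

h  <- 1e-5
m1 <- (m(h) - m(-h)) / (2 * h)          # approximates m'(0)  = alpha * beta
m2 <- (m(h) - 2 * m(0) + m(-h)) / h^2   # approximates m''(0) = alpha * (alpha + 1) * beta^2

c(EX = m1,        alpha_beta    = alpha * beta)     # both near 6
c(VX = m2 - m1^2, alpha_beta_sq = alpha * beta^2)   # both near 12
```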

Moment generating function

Moment generating function properties:

  1. $\dfrac{d^{k}\,m_X(t)}{dt^{k}}\Big|_{t=0} = E\big[X^{k}\big]$
  2. $\mu = E[X] = m_X'(0)$
  3. $E\big[X^{2}\big] = m_X''(0)$

mgf Theorems

Let $X_1, X_2, \ldots, X_n, Y$ be random variables with moment-generating functions $m_{X_1}(t), m_{X_2}(t), \ldots, m_{X_n}(t), m_Y(t)$.

  1. If $m_{X_1}(t) = m_{X_2}(t)$ for all $t$ in some open interval about 0, then $X_1$ and $X_2$ have the same distribution.
  2. If $Y = \alpha + \beta X_1$, then $m_Y(t) = e^{\alpha t}\,m_{X_1}(\beta t)$.
  3. If $X_1, X_2, \ldots, X_n$ are independent and $Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_n X_n$ (where $\alpha_0, \ldots, \alpha_n$ are real numbers), then $m_Y(t) = e^{\alpha_0 t}\,m_{X_1}(\alpha_1 t)\,m_{X_2}(\alpha_2 t)\cdots m_{X_n}(\alpha_n t)$.
  4. Suppose $X_1, X_2, \ldots, X_n$ are independent normal random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$. If $Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_n X_n$ (where $\alpha_0, \ldots, \alpha_n$ are real numbers), then $Y$ is normally distributed with mean $\mu_Y = \alpha_0 + \alpha_1\mu_1 + \alpha_2\mu_2 + \cdots + \alpha_n\mu_n$ and variance $\sigma_Y^2 = \alpha_1^2\sigma_1^2 + \alpha_2^2\sigma_2^2 + \cdots + \alpha_n^2\sigma_n^2$.

Moment

| Moment | Uncentered | Centered |
|---|---|---|
| 1st | $E(X) = \mu = \text{Mean}(X)$ |  |
| 2nd | $E(X^2)$ | $E\big((X - \mu)^2\big) = \text{Var}(X) = \sigma^2$ |
| 3rd | $E(X^3)$ | $E\big((X - \mu)^3\big)$ |
| 4th | $E(X^4)$ | $E\big((X - \mu)^4\big)$ |

$\text{Skewness}(X) = E\big((X - \mu)^3\big)/\sigma^3$

$\text{Kurtosis}(X) = E\big((X - \mu)^4\big)/\sigma^4$
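
These standardized moments can be estimated from data. A minimal sketch in R (using simulated gamma data as a stand-in, since no dataset is specified here):

```r
set.seed(6)
x <- rgamma(1e5, shape = 2, scale = 1)   # right-skewed example data

m <- mean(x)
s <- sd(x)

skewness <- mean((x - m)^3) / s^3        # sample analogue of E((X - mu)^3) / sigma^3
kurtosis <- mean((x - m)^4) / s^4        # sample analogue of E((X - mu)^4) / sigma^4

c(skewness = skewness, kurtosis = kurtosis)
# For gamma(shape = 2) the theoretical values are 2/sqrt(2) ~ 1.41 and 3 + 6/2 = 6
```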

Variate Transformations

Transformations and Expectations

Distributions of Functions of a Random Variable

If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $g(X)$, is also a random variable. If we set $Y = g(X)$, then for any set $A$,

$$P(Y \in A) = P(g(X) \in A)$$

Formally, if we write $y = g(x)$, the function $g(x)$ defines a mapping from the original sample space of $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, the sample space of the random variable $Y$. That is,

$$g(x): \mathcal{X} \to \mathcal{Y}$$

We associate with $g$ an inverse mapping, denoted by $g^{-1}$,

$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}$$

If the random variable $Y$ is now defined by $Y = g(X)$, we can write for any set $A \subseteq \mathcal{Y}$,

$$P(Y \in A) = P(g(X) \in A) = P\big(\{x \in \mathcal{X} : g(x) \in A\}\big) = P\big(X \in g^{-1}(A)\big)$$

If $Y$ is a discrete random variable, the pmf for $Y$ is

$$f_Y(y) = P(Y = y) = \sum_{x \in g^{-1}(y)} P(X = x) = \sum_{x \in g^{-1}(y)} f_X(x), \quad\text{for } y \in \mathcal{Y}$$

It’s easiest to deal with functions $g(x)$ that are monotone, that is, those that are either increasing or decreasing. If the transformation $x \mapsto g(x)$ is monotone, then it is one-to-one and onto from $\mathcal{X}$ to $\mathcal{Y}$.

Theorem 2.1.3

Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let $\mathcal{X} = \{x : f_X(x) > 0\}$ and $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$.

  • If $g$ is an increasing function on $\mathcal{X}$, $F_Y(y) = F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.
  • If $g$ is a decreasing function on $\mathcal{X}$ and $X$ is a continuous random variable, $F_Y(y) = 1 - F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.

Theorem 2.1.5

Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Let $\mathcal{X} = \{x : f_X(x) > 0\}$ and $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$. Suppose that $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by

$$f_Y(y) = \begin{cases} f_X\big(g^{-1}(y)\big)\left|\dfrac{d}{dy}g^{-1}(y)\right| & y \in \mathcal{Y} \\ 0 & \text{otherwise} \end{cases}$$
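
As a concrete sketch of this change-of-variable formula (my own example, not from the text), take $X \sim N(0, 1)$ and the increasing map $Y = g(X) = e^{X}$, so $g^{-1}(y) = \log y$ and $\left|\frac{d}{dy}g^{-1}(y)\right| = 1/y$. The resulting density matches R's built-in lognormal density:

```r
# Apply Theorem 2.1.5 to Y = exp(X) with X ~ N(0, 1)
f_Y <- function(y) dnorm(log(y)) * (1 / y)   # f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|

y <- c(0.5, 1, 2, 5)
rbind(transformation_formula = f_Y(y),
      dlnorm_check           = dlnorm(y))    # built-in lognormal(0, 1) density
```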