Multivariate Statistics for Data Science Notes
Multivariate Methods
Let $\mathbf{y} = (y_1, \ldots, y_p)'$ be a random vector with mean vector
$$E(\mathbf{y}) = \boldsymbol{\mu} = (\mu_1, \ldots, \mu_p)'$$
where $\mu_i = E(y_i)$. Let the variance-covariance matrix be
$$\operatorname{var}(\mathbf{y}) = \boldsymbol{\Sigma} = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})']$$
in which the $(i,j)$-th element is $\sigma_{ij} = \operatorname{cov}(y_i, y_j)$, with $\sigma_{ii} = \operatorname{var}(y_i)$.
Properties of Covariance Matrices
- Symmetric: $\boldsymbol{\Sigma}' = \boldsymbol{\Sigma}$
- Non-negative definite: $\mathbf{a}'\boldsymbol{\Sigma}\mathbf{a} \ge 0$ for any $\mathbf{a} \in \mathbb{R}^p$, which is equivalent to all eigenvalues of $\boldsymbol{\Sigma}$ satisfying $\lambda_i \ge 0$
- $|\boldsymbol{\Sigma}|$ (generalized variance): the bigger this number is, the more variation there is
- $\operatorname{tr}(\boldsymbol{\Sigma}) = \sum_{i=1}^p \sigma_{ii}$: sum of the variances (total variance)

Note: $\boldsymbol{\Sigma}$ is typically required to be positive definite, which means all eigenvalues are positive and $\boldsymbol{\Sigma}$ has an inverse $\boldsymbol{\Sigma}^{-1}$ such that $\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{-1} = \mathbf{I}_p$.
Correlation Matrices
$$\mathbf{R} = (\rho_{ij}), \qquad \rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}}$$
Alternatively,
$$\mathbf{R} = \mathbf{D}^{-1/2}\,\boldsymbol{\Sigma}\,\mathbf{D}^{-1/2}$$
where $\mathbf{D} = \operatorname{diag}(\sigma_{11}, \ldots, \sigma_{pp})$, and
$$\boldsymbol{\Sigma} = \mathbf{D}^{1/2}\,\mathbf{R}\,\mathbf{D}^{1/2}$$
Equalities
Let $\mathbf{x}$ and $\mathbf{y}$ be random vectors with means $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ and variance-covariance matrices $\boldsymbol{\Sigma}_x$ and $\boldsymbol{\Sigma}_y$. Let $\mathbf{A}$ and $\mathbf{B}$ be matrices of constants and $\mathbf{c}$ and $\mathbf{d}$ be vectors of constants. Then

- $E(\mathbf{A}\mathbf{x} + \mathbf{c}) = \mathbf{A}\boldsymbol{\mu}_x + \mathbf{c}$
- $\operatorname{var}(\mathbf{A}\mathbf{x} + \mathbf{c}) = \mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}'$
- $\operatorname{cov}(\mathbf{A}\mathbf{x}, \mathbf{B}\mathbf{y}) = \mathbf{A}\operatorname{cov}(\mathbf{x}, \mathbf{y})\mathbf{B}'$
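These identities can be checked numerically. The sketch below (all parameter values are illustrative, not from the notes) computes $\mathbf{A}\boldsymbol{\mu} + \mathbf{c}$ and $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}'$ and confirms them against a large Monte Carlo sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# A fixed matrix A and constant vector c
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
c = np.array([0.0, 3.0])

# Theoretical mean and covariance of z = Ax + c
mean_theory = A @ mu + c          # A mu + c
cov_theory = A @ Sigma @ A.T      # A Sigma A'

# Monte Carlo check with a large sample
x = rng.multivariate_normal(mu, Sigma, size=200_000)
z = x @ A.T + c                   # apply z = Ax + c row-wise
mean_mc = z.mean(axis=0)
cov_mc = np.cov(z, rowvar=False)
```

With 200,000 draws the empirical mean and covariance of $z$ agree with the theoretical values to two decimal places.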
Multivariate Normal Distribution
Let $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with density
$$f(\mathbf{y}) = (2\pi)^{-p/2}\,|\boldsymbol{\Sigma}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu})\right\}$$
Properties of MVN
- Let $\mathbf{A}$ be a fixed $q \times p$ matrix. Then $\mathbf{A}\mathbf{y} \sim N_q(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$, and all rows of $\mathbf{A}$ must be linearly independent to guarantee that $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}'$ is non-singular.
- Let $\mathbf{G}$ be a matrix such that $\boldsymbol{\Sigma}^{-1} = \mathbf{G}\mathbf{G}'$. Then $\mathbf{G}'\mathbf{y} \sim N_p(\mathbf{G}'\boldsymbol{\mu}, \mathbf{I})$ and $\mathbf{G}'(\mathbf{y} - \boldsymbol{\mu}) \sim N_p(\mathbf{0}, \mathbf{I})$.
- Any fixed linear combination of $y_1, \ldots, y_p$ (say $\mathbf{a}'\mathbf{y}$) follows $N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$.
- Define a partition $\mathbf{y} = (\mathbf{y}_1', \mathbf{y}_2')'$, where $\mathbf{y}_1$ is $p_1 \times 1$ and $\mathbf{y}_2$ is $p_2 \times 1$, with
$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}$$
Then
  - The marginal distributions of $\mathbf{y}_1$ and $\mathbf{y}_2$ are $N_{p_1}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$ and $N_{p_2}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$.
  - Individual components $y_i$ are all normally distributed: $y_i \sim N(\mu_i, \sigma_{ii})$.
  - The conditional distribution of $\mathbf{y}_1$ given $\mathbf{y}_2$ is normal:
$$\mathbf{y}_1 \mid \mathbf{y}_2 \sim N_{p_1}\!\left(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{y}_2 - \boldsymbol{\mu}_2),\; \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\right)$$
    - In this formula, we see that if we know (have info about) $\mathbf{y}_2$, we can re-weight $\mathbf{y}_1$'s mean, and the variance is reduced because we know more about $\mathbf{y}_1$ from knowing $\mathbf{y}_2$.
    - The coefficient $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$ is analogous to a regression coefficient. And $\mathbf{y}_1$ and $\mathbf{y}_2$ are independently distributed only if $\boldsymbol{\Sigma}_{12} = \mathbf{0}$.
- If $\mathbf{y} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\boldsymbol{\Sigma}$ is positive definite, then
$$(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}) \sim \chi^2_p$$
- If $\mathbf{y}_1, \ldots, \mathbf{y}_k$ are independent random vectors with $\mathbf{y}_i \sim N_p(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$, then for fixed matrices $\mathbf{A}_1, \ldots, \mathbf{A}_k$,
$$\sum_{i=1}^k \mathbf{A}_i\mathbf{y}_i \sim N\!\left(\sum_{i=1}^k \mathbf{A}_i\boldsymbol{\mu}_i,\; \sum_{i=1}^k \mathbf{A}_i\boldsymbol{\Sigma}_i\mathbf{A}_i'\right)$$
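The chi-square property of the squared Mahalanobis distance can be illustrated by simulation. This is a minimal sketch (the covariance matrix below is an arbitrary positive definite example, not from the notes): under MVN, $(\mathbf{y}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})$ should have mean $p$ and variance $2p$, the moments of $\chi^2_p$:

```python
import numpy as np

rng = np.random.default_rng(1)

p = 3
mu = np.zeros(p)
# An arbitrary positive definite covariance matrix (illustrative)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)

y = rng.multivariate_normal(mu, Sigma, size=50_000)
Sigma_inv = np.linalg.inv(Sigma)

# Squared Mahalanobis distances (y - mu)' Sigma^{-1} (y - mu), row by row
d2 = np.einsum("ij,jk,ik->i", y - mu, Sigma_inv, y - mu)

# Under MVN these follow chi-square with p degrees of freedom:
mean_d2 = d2.mean()   # should be near E[chi2_p] = p = 3
var_d2 = d2.var()     # should be near var[chi2_p] = 2p = 6
```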
Multiple Regression
The conditional distribution of $Y$ given $\mathbf{x}$ follows a univariate normal distribution with
$$E(Y \mid \mathbf{x}) = \mu_Y + \boldsymbol{\sigma}_{xY}'\boldsymbol{\Sigma}_x^{-1}(\mathbf{x} - \boldsymbol{\mu}_x), \qquad \operatorname{var}(Y \mid \mathbf{x}) = \sigma_Y^2 - \boldsymbol{\sigma}_{xY}'\boldsymbol{\Sigma}_x^{-1}\boldsymbol{\sigma}_{xY}$$
where $\boldsymbol{\sigma}_{xY} = \operatorname{cov}(\mathbf{x}, Y)$, $\boldsymbol{\Sigma}_x = \operatorname{var}(\mathbf{x})$, and the vector of regression coefficients is $\boldsymbol{\beta} = \boldsymbol{\Sigma}_x^{-1}\boldsymbol{\sigma}_{xY}$.
Samples from Multivariate Normal Populations
A random sample of size $n$: $\mathbf{y}_1, \ldots, \mathbf{y}_n \overset{iid}{\sim} N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.

Since $\mathbf{y}_1, \ldots, \mathbf{y}_n$ are iid, their sample mean is
$$\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^n \mathbf{y}_i \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)$$
that is, $\bar{\mathbf{y}}$ is an unbiased estimator of $\boldsymbol{\mu}$.

The sample variance-covariance matrix is
$$\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})'$$
- $\mathbf{S}$ is symmetric, an unbiased estimator of $\boldsymbol{\Sigma}$, and has $p(p+1)/2$ distinct random variables.
- $(n-1)\mathbf{S} \sim W_p(n-1, \boldsymbol{\Sigma})$, a Wishart distribution with $n-1$ degrees of freedom and expectation $(n-1)\boldsymbol{\Sigma}$. The Wishart distribution is a multivariate extension of the Chi-squared distribution.
- $\bar{\mathbf{y}}$ and $\mathbf{S}$ are independent and are sufficient statistics. (All of the info in the data about $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ is contained in $\bar{\mathbf{y}}$ and $\mathbf{S}$, regardless of sample size.)
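A quick sketch of computing $\bar{\mathbf{y}}$ and $\mathbf{S}$ with numpy (the data are simulated for illustration). Note that `np.cov` with `rowvar=False` uses the same $n-1$ divisor as $\mathbf{S}$, while the MLE covariance (next section) divides by $n$:

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.0, 5.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
n = 20

y = rng.multivariate_normal(mu, Sigma, size=n)

ybar = y.mean(axis=0)                    # sample mean vector
centered = y - ybar
S = centered.T @ centered / (n - 1)      # unbiased sample covariance S

# np.cov with rowvar=False uses the same (n - 1) divisor
S_np = np.cov(y, rowvar=False)

# The MLE divides by n instead and is therefore biased downward
Sigma_mle = centered.T @ centered / n
```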
Large Sample Properties
$\bar{\mathbf{y}}$ is a consistent estimator for $\boldsymbol{\mu}$, and $\mathbf{S}$ is a consistent estimator for $\boldsymbol{\Sigma}$.

Multivariate Central Limit Theorem: similar to the univariate case,
$$\sqrt{n}(\bar{\mathbf{y}} - \boldsymbol{\mu}) \overset{d}{\to} N_p(\mathbf{0}, \boldsymbol{\Sigma})$$
when $n$ is large relative to $p$, which is equivalent to $\bar{\mathbf{y}}$ being approximately $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)$.

Wald's Theorem:
$$n(\bar{\mathbf{y}} - \boldsymbol{\mu})'\mathbf{S}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}) \;\text{is approximately}\; \chi^2_p$$
when $n$ is large relative to $p$.
Maximum Likelihood Estimation for MVN
Suppose $\mathbf{y}_1, \ldots, \mathbf{y}_n \overset{iid}{\sim} N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
Then, the MLEs are
$$\hat{\boldsymbol{\mu}} = \bar{\mathbf{y}}, \qquad \hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S} = \frac{1}{n}\sum_{i=1}^n (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})'$$
obtained using derivatives of the log of the likelihood function with respect to $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.
Properties of MLEs
- Invariance: If $\hat{\theta}$ is the MLE of $\theta$, then the MLE of $h(\theta)$ is $h(\hat{\theta})$ for any function $h(\cdot)$.
- Consistency: MLEs are consistent estimators, but they are usually biased.
- Efficiency: MLEs are efficient estimators (no other estimator has a smaller variance for large samples).
- Asymptotic normality: Suppose that $\hat{\theta}_n$ is the MLE for $\theta$ based upon $n$ independent observations. Then $\hat{\theta}_n$ is approximately $N(\theta, \mathbf{H}^{-1})$, where $\mathbf{H}$ is the Fisher Information Matrix, which contains the expected values of the second partial derivatives of the log-likelihood function. The $(i,j)$-th element of $\mathbf{H}$ is
$$-E\left[\frac{\partial^2 \ln L(\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right]$$
We can estimate $\mathbf{H}$ by finding the form determined above and evaluating it at $\hat{\theta}$.
- Likelihood ratio testing: for some null hypothesis $H_0$, we can form a likelihood ratio test. The statistic is
$$\Lambda = \frac{\max_{H_0} L(\boldsymbol{\theta} \mid \mathbf{Y})}{\max L(\boldsymbol{\theta} \mid \mathbf{Y})}$$
For large $n$, $-2\ln\Lambda$ is approximately $\chi^2_v$, where $v$ is the number of parameters in the unrestricted space minus the number of parameters under $H_0$.
Test of Multivariate Normality
- Check univariate normality for each trait ($X_i$) separately; see [Normality Assessment].
- If any univariate trait is not normal, then the joint distribution is not normal (see again [m]): if a joint multivariate distribution is normal, then each marginal distribution has to be normal.
- However, marginal normality of all traits does not imply joint MVN.
- Hence, we can easily rule out multivariate normality, but it is not easy to prove it.
Mardia’s tests for multivariate normality
Multivariate skewness is
$$\beta_{1,p} = E\left[(\mathbf{y}_i - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y}_j - \boldsymbol{\mu})\right]^3$$
where $\mathbf{y}_i$ and $\mathbf{y}_j$ are independent, but have the same distribution (note: $\beta$ here is not a regression coefficient). Multivariate kurtosis is defined as
$$\beta_{2,p} = E\left[(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu})\right]^2$$
For the MVN distribution, we have $\beta_{1,p} = 0$ and $\beta_{2,p} = p(p+2)$.

For a sample of size $n$, we can estimate
$$\hat{\beta}_{1,p} = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n g_{ij}^3$$
- where $g_{ij} = (\mathbf{y}_i - \bar{\mathbf{y}})'\mathbf{S}^{-1}(\mathbf{y}_j - \bar{\mathbf{y}})$. Note: $g_{ii} = d_i^2$, where $d_i$ is the Mahalanobis distance.
$$\hat{\beta}_{2,p} = \frac{1}{n}\sum_{i=1}^n d_i^4$$
[@MARDIA_1970] shows that for large $n$,
$$\kappa_1 = \frac{n\,\hat{\beta}_{1,p}}{6} \;\text{is approximately}\; \chi^2_{p(p+1)(p+2)/6}, \qquad \kappa_2 = \frac{\hat{\beta}_{2,p} - p(p+2)}{\sqrt{8p(p+2)/n}} \;\text{is approximately}\; N(0,1)$$
Hence, we can use $\kappa_1$ and $\kappa_2$ to test the null hypothesis of MVN. When the data are non-normal, normal-theory tests on the mean are sensitive to $\beta_{1,p}$, while tests on the covariance are sensitive to $\beta_{2,p}$.
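The estimators above translate directly into code. This is a sketch of Mardia's large-sample tests (the function name `mardia` is mine; following common practice, the $n$-divisor covariance is used inside the statistics, which is an assumption about the exact convention in the notes):

```python
import numpy as np
from scipy import stats

def mardia(y):
    """Mardia's multivariate skewness and kurtosis tests (large-sample forms)."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    centered = y - y.mean(axis=0)
    # Mardia's statistics are usually computed with the n-divisor (MLE) covariance
    Sigma_hat = centered.T @ centered / n
    G = centered @ np.linalg.inv(Sigma_hat) @ centered.T   # matrix of g_ij
    b1p = (G ** 3).sum() / n ** 2            # sample multivariate skewness
    b2p = (np.diag(G) ** 2).sum() / n        # sample multivariate kurtosis
    # Large-sample null distributions
    df = p * (p + 1) * (p + 2) / 6
    skew_stat = n * b1p / 6                  # ~ chi-square(df) under MVN
    kurt_stat = (b2p - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)  # ~ N(0,1)
    p_skew = stats.chi2.sf(skew_stat, df)
    p_kurt = 2 * stats.norm.sf(abs(kurt_stat))
    return b1p, b2p, p_skew, p_kurt

# For truly MVN data, b1p should be near 0 and b2p near p(p+2) = 8
rng = np.random.default_rng(3)
y = rng.multivariate_normal([0, 0], [[1, 0.4], [0.4, 1]], size=1000)
b1p, b2p, p_skew, p_kurt = mardia(y)
```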
Alternatively, use the Doornik-Hansen test for multivariate normality [@doornik2008].
Chi-square Q-Q plot
Let $\mathbf{y}_1, \ldots, \mathbf{y}_n$ be a random sample from $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Then
$$d_i^2 = (\mathbf{y}_i - \bar{\mathbf{y}})'\mathbf{S}^{-1}(\mathbf{y}_i - \bar{\mathbf{y}}), \quad i = 1, \ldots, n$$
are approximately iid $\chi^2_p$. Thus, plot the ordered $d_i^2$ values against the quantiles of the $\chi^2_p$ distribution. When normality holds, the plot should approximately resemble a straight line passing through the origin at a 45-degree angle. It requires a large sample size (i.e., it is sensitive to sample size): even if we generate data from a MVN, the tail of the Chi-square Q-Q plot can still be out of line.
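A sketch of computing the Q-Q plot coordinates (the plotting positions $(i - 0.5)/n$ are one common convention, an assumption here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

mu = np.zeros(3)
Sigma = np.eye(3)
n = 500
y = rng.multivariate_normal(mu, Sigma, size=n)

ybar = y.mean(axis=0)
S = np.cov(y, rowvar=False)
S_inv = np.linalg.inv(S)
centered = y - ybar

# Squared Mahalanobis distances, sorted in ascending order
d2 = np.sort(np.einsum("ij,jk,ik->i", centered, S_inv, centered))

# Theoretical chi-square(p) quantiles at plotting positions (i - 0.5)/n
p = y.shape[1]
probs = (np.arange(1, n + 1) - 0.5) / n
q = stats.chi2.ppf(probs, df=p)

# Under MVN the points (q, d2) lie close to the 45-degree line;
# their correlation should be near 1
corr = np.corrcoef(q, d2)[0, 1]
```

Plotting `d2` against `q` (e.g. with matplotlib) gives the chi-square Q-Q plot described above.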
If the data are not normal, we can:
- ignore it
- use nonparametric methods
- use models based upon an approximate distribution (e.g., GLMM)
- try performing a transformation
Mean Vector Inference
In the univariate normal distribution, we test $H_0: \mu = \mu_0$ using
$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$$
under the null hypothesis, and reject the null if $|t| > t_{1-\alpha/2,\,n-1}$. Equivalently,
$$t^2 = \frac{n(\bar{y} - \mu_0)^2}{s^2} = n(\bar{y} - \mu_0)(s^2)^{-1}(\bar{y} - \mu_0) \sim F_{1,\,n-1}$$
Natural Multivariate Generalization
Define Hotelling's $T^2$:
$$T^2 = n(\bar{\mathbf{y}} - \boldsymbol{\mu}_0)'\mathbf{S}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}_0)$$
which can be viewed as a generalized distance between $\bar{\mathbf{y}}$ and $\boldsymbol{\mu}_0$.

Under the assumption of normality,
$$F = \frac{n-p}{(n-1)p}T^2 \sim F_{p,\,n-p}$$
and we reject the null hypothesis when
$$T^2 > \frac{(n-1)p}{n-p}F_{1-\alpha,\,p,\,n-p}$$

The $T^2$ test is invariant to changes in measurement units.
- If $\mathbf{z} = \mathbf{C}\mathbf{y} + \mathbf{d}$, where $\mathbf{C}$ (non-singular) and $\mathbf{d}$ do not depend on $\mathbf{y}$, then the $T^2$ statistic computed from the $\mathbf{z}$'s equals the one computed from the $\mathbf{y}$'s.

The $T^2$ test can be derived as a likelihood ratio test of $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$.
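The one-sample test above can be sketched in a few lines (the function name `hotelling_one_sample` and the simulated data are mine):

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(y, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    ybar = y.mean(axis=0)
    S = np.cov(y, rowvar=False)
    diff = ybar - mu0
    T2 = n * diff @ np.linalg.solve(S, diff)   # n (ybar - mu0)' S^{-1} (ybar - mu0)
    # (n - p) / ((n - 1) p) * T^2 ~ F(p, n - p) under H0
    F = (n - p) / ((n - 1) * p) * T2
    pval = stats.f.sf(F, p, n - p)
    return T2, F, pval

rng = np.random.default_rng(5)
y = rng.multivariate_normal([0.0, 0.0], [[1, 0.2], [0.2, 1]], size=40)

# True null: mu0 equals the population mean
T2_null, F_null, p_null = hotelling_one_sample(y, np.array([0.0, 0.0]))
# False null: mu0 far from the population mean, so p should be tiny
T2_alt, F_alt, p_alt = hotelling_one_sample(y, np.array([3.0, 3.0]))
```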
Confidence Intervals
Confidence Region
An "exact" $100(1-\alpha)\%$ confidence region for $\boldsymbol{\mu}$ consists of just the mean vectors that are not rejected by the $T^2$ test when $\bar{\mathbf{y}}$ is observed.

In case you have 2 parameters, the confidence region is an ellipse; with more parameters, it is a "hyper-ellipsoid".

This region consists of all $\boldsymbol{\mu}$ such that
$$n(\bar{\mathbf{y}} - \boldsymbol{\mu})'\mathbf{S}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}) \le \frac{(n-1)p}{n-p}F_{1-\alpha,\,p,\,n-p}$$

Even though the confidence region better assesses the joint knowledge concerning plausible values of $\boldsymbol{\mu}$, in practice we usually also want interval statements about the individual component means.
Simultaneous Confidence Statements
- Intervals based on a rectangular confidence region, obtained by projecting the previous region onto the coordinate axes:
$$\bar{y}_i \pm \sqrt{\frac{(n-1)p}{n-p}F_{1-\alpha,\,p,\,n-p}}\,\sqrt{\frac{s_{ii}}{n}}$$
for all $i = 1, \ldots, p$. The implied rectangular confidence region is conservative; it has at least $100(1-\alpha)\%$ coverage.

Generally, the simultaneous $T^2$ intervals work for any arbitrary linear combination $\mathbf{a}'\boldsymbol{\mu}$:
$$\mathbf{a}'\bar{\mathbf{y}} \pm \sqrt{\frac{(n-1)p}{n-p}F_{1-\alpha,\,p,\,n-p}}\,\sqrt{\frac{\mathbf{a}'\mathbf{S}\mathbf{a}}{n}}$$
which is a projection onto the axis in the direction of $\mathbf{a}$. These intervals have the property that the probability that at least one such interval does not contain the appropriate $\mathbf{a}'\boldsymbol{\mu}$ is no more than $\alpha$. These types of intervals can be used for "data snooping" (like [Scheffe]).
One at a time
- One-at-a-time confidence intervals:
$$\bar{y}_i \pm t_{1-\alpha/2,\,n-1}\sqrt{\frac{s_{ii}}{n}}$$
- Each of these intervals has a probability of $1-\alpha$ of covering the appropriate $\mu_i$, but they ignore the covariance structure of the $p$ variables.
- If we only care about $k$ simultaneous intervals, we can use the "one at a time" method with the [Bonferroni] correction (replace $\alpha$ with $\alpha/k$). This method gets more conservative as the number of intervals $k$ increases.
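The three interval types differ only in their critical value, which the following sketch makes explicit (the data and parameter values are illustrative); the ordering $T^2$-simultaneous > Bonferroni > one-at-a-time holds here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, alpha = 50, 3, 0.05
y = rng.multivariate_normal([1.0, 2.0, 3.0], np.eye(3), size=n)

ybar = y.mean(axis=0)
se = np.sqrt(np.diag(np.cov(y, rowvar=False)) / n)  # standard errors sqrt(s_ii / n)

# Critical values for the three interval types
c_t2 = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))  # T^2 simultaneous
c_bon = stats.t.ppf(1 - alpha / (2 * p), n - 1)                            # Bonferroni, k = p
c_one = stats.t.ppf(1 - alpha / 2, n - 1)                                  # one at a time

t2_intervals = np.column_stack([ybar - c_t2 * se, ybar + c_t2 * se])
bon_intervals = np.column_stack([ybar - c_bon * se, ybar + c_bon * se])
one_intervals = np.column_stack([ybar - c_one * se, ybar + c_one * se])
```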
General Hypothesis Testing
One-sample Tests
$$H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$$
where $\mathbf{C}$ is a $c \times p$ matrix of rank $c$, where $c \le p$.

We can test this hypothesis using the following statistic:
$$T^2 = n(\mathbf{C}\bar{\mathbf{y}})'(\mathbf{C}\mathbf{S}\mathbf{C}')^{-1}(\mathbf{C}\bar{\mathbf{y}})$$
where, under $H_0$,
$$F = \frac{n-c}{(n-1)c}T^2 \sim F_{c,\,n-c}$$

Example: to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$ (all means equal), we can use successive differences. Equivalently, $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$ with
$$\mathbf{C} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & -1 \end{pmatrix}$$
a total of $p-1$ differences; number of rows $= c = p - 1$.

Equivalently, we can also compare all of the other means to the first mean. Then, we test $H_0: \mu_1 - \mu_2 = \mu_1 - \mu_3 = \cdots = \mu_1 - \mu_p = 0$ with
$$\mathbf{C} = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}$$
The value of $T^2$ is the same for any full-rank $\mathbf{C}$ that expresses the same hypothesis.

This is often used for repeated measures designs, where each subject receives each treatment once over successive periods of time (all treatments are administered to each unit).
Example: Let $\boldsymbol{\mu} = (\mu_1, \mu_2, \mu_3, \mu_4)'$ be the mean responses at 4 time points. Let $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. Equivalently, $H_0: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}$, where $\mathbf{C}$ is a successive-difference contrast matrix as above.

We can test orthogonal polynomials for 4 equally spaced time points. To test, for example, the null hypothesis that the quadratic and cubic effects are jointly equal to 0, we would define
$$\mathbf{C} = \begin{pmatrix} 1 & -1 & -1 & 1 \\ -1 & 3 & -3 & 1 \end{pmatrix}$$
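The contrast test above can be sketched as follows (the function name `contrast_test` and the simulated data are mine; it uses the successive-difference matrix to test equality of four means):

```python
import numpy as np
from scipy import stats

def contrast_test(y, C):
    """Test H0: C mu = 0 with T^2 = n (C ybar)' (C S C')^{-1} (C ybar)."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    c = C.shape[0]                      # rank of C (rows assumed independent)
    ybar = y.mean(axis=0)
    S = np.cov(y, rowvar=False)
    m = C @ ybar
    T2 = n * m @ np.linalg.solve(C @ S @ C.T, m)
    F = (n - c) / ((n - 1) * c) * T2    # ~ F(c, n - c) under H0
    return T2, stats.f.sf(F, c, n - c)

# Successive-difference contrasts test equality of all four means
C_diff = np.array([[1, -1, 0, 0],
                   [0, 1, -1, 0],
                   [0, 0, 1, -1]], dtype=float)

rng = np.random.default_rng(7)
# Equal means at all four time points: H0 is true
y_eq = rng.multivariate_normal(np.full(4, 2.0), np.eye(4), size=60)
# A strong linear time trend: H0 is false
y_tr = rng.multivariate_normal([0.0, 1.0, 2.0, 3.0], np.eye(4), size=60)

T2_eq, p_eq = contrast_test(y_eq, C_diff)
T2_tr, p_tr = contrast_test(y_tr, C_diff)
```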
Two-Sample Tests
Consider the analogous two sample multivariate tests.
Example: we have data on two independent random samples, one sample from each of two populations:
$$\mathbf{y}_{1i} \sim N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}),\; i = 1, \ldots, n_1, \qquad \mathbf{y}_{2i} \sim N_p(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}),\; i = 1, \ldots, n_2$$

We assume:
- normality
- equal variance-covariance matrices ($\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}$)
- independent random samples

We can summarize our data using the sufficient statistics $\bar{\mathbf{y}}_1, \mathbf{S}_1$ and $\bar{\mathbf{y}}_2, \mathbf{S}_2$.

Since we assume that $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}$, we test
$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{vs.} \quad H_a: \text{at least one element of the mean vectors is different}$$

We use
$$\mathbf{S}_p = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2}$$
to estimate $\boldsymbol{\Sigma}$, and $\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2$ to estimate $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$. Note: because we assume the two populations are independent, there is no covariance between them, so
$$\operatorname{var}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\boldsymbol{\Sigma}$$

Reject $H_0$ if
$$T^2 = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\mathbf{S}_p\right]^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) > \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{1-\alpha,\,p,\,n_1+n_2-p-1}$$
or equivalently, if
$$F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p}T^2 > F_{1-\alpha,\,p,\,n_1+n_2-p-1}$$

A $100(1-\alpha)\%$ confidence region for $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$ consists of the vectors not rejected by the $T^2$ test. The simultaneous confidence intervals for all linear combinations of $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$ are
$$\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{1-\alpha,\,p,\,n_1+n_2-p-1}}\,\sqrt{\mathbf{a}'\mathbf{S}_p\mathbf{a}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Bonferroni intervals, for $k$ combinations, replace the first square-root factor with $t_{1-\alpha/(2k),\,n_1+n_2-2}$.
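The pooled two-sample test can be sketched like this (the function name `hotelling_two_sample` and the simulated data are mine):

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(y1, y2):
    """Two-sample Hotelling T^2 test of H0: mu1 = mu2 (pooled covariance)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n1, p = y1.shape
    n2 = y2.shape[0]
    S1, S2 = np.cov(y1, rowvar=False), np.cov(y2, rowvar=False)
    Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance
    diff = y1.mean(axis=0) - y2.mean(axis=0)
    T2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * Sp, diff)
    df2 = n1 + n2 - p - 1
    F = df2 / ((n1 + n2 - 2) * p) * T2                      # ~ F(p, df2) under H0
    return T2, stats.f.sf(F, p, df2)

rng = np.random.default_rng(8)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
# Populations differ by 1.5 in the first coordinate, so H0 should be rejected
y1 = rng.multivariate_normal([0.0, 0.0], Sigma, size=30)
y2 = rng.multivariate_normal([1.5, 0.0], Sigma, size=35)

T2, pval = hotelling_two_sample(y1, y2)
```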
Model Assumptions
If the model assumptions are not met:
Unequal Covariance Matrices
- If $n_1 = n_2$ (large samples), there is little effect on the Type I error rate and power of the two-sample test.
- If $n_1 > n_2$ and the eigenvalues of $\boldsymbol{\Sigma}_1\boldsymbol{\Sigma}_2^{-1}$ are less than 1, the Type I error level is inflated.
- If $n_1 > n_2$ and some eigenvalues of $\boldsymbol{\Sigma}_1\boldsymbol{\Sigma}_2^{-1}$ are greater than 1, the Type I error rate is too small, leading to a reduction in power.
Sample Not Normal
- The Type I error level of the two-sample $T^2$ test isn't much affected by moderate departures from normality if the two populations being sampled have similar distributions.
- The one-sample $T^2$ test is much more sensitive to lack of normality, especially when the distribution is skewed. Intuitively, a single sample's distribution shows the skewness directly, but the distribution of the difference between two similar distributions will not be as sensitive.

Solutions:
- Transform to make the data more normal.
- For large samples, use the $\chi^2$ (Wald) test
$$(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\left(\frac{\mathbf{S}_1}{n_1} + \frac{\mathbf{S}_2}{n_2}\right)^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) \;\text{is approximately}\; \chi^2_p$$
in which the populations don't need to be normal, nor the sample sizes equal, nor the variance-covariance matrices equal.
Equal Covariance Matrices Tests
With independent random samples from $k$ populations of $p$-dimensional vectors, we compute the sample covariance matrix for each, $\mathbf{S}_i$, based on sample size $n_i$.

Assume
$$H_0: \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \cdots = \boldsymbol{\Sigma}_k$$
with the pooled estimate
$$\mathbf{S} = \frac{\sum_{i=1}^k (n_i - 1)\mathbf{S}_i}{N - k}, \qquad N = \sum_{i=1}^k n_i$$
Bartlett’s Test
(a modification of the likelihood ratio test). Define
$$M = (N - k)\ln|\mathbf{S}| - \sum_{i=1}^k (n_i - 1)\ln|\mathbf{S}_i|$$
and
$$c_1 = \frac{2p^2 + 3p - 1}{6(p+1)(k-1)}\left(\sum_{i=1}^k \frac{1}{n_i - 1} - \frac{1}{N - k}\right)$$

Reject $H_0$ when
$$(1 - c_1)M > \chi^2_{1-\alpha,\; p(p+1)(k-1)/2}$$

If not all samples are from normal populations, $(1 - c_1)M$ has a distribution which is often shifted to the right of the nominal $\chi^2$ distribution, which means $H_0$ is often rejected even when it is true (the Type I error level is inflated). Hence, it is better to test individual normality first, or multivariate normality, before you do Bartlett's test.
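A sketch of the test (the function name `box_m` is mine; this is the chi-square approximation described above, with simulated data for illustration):

```python
import numpy as np
from scipy import stats

def box_m(samples):
    """Bartlett/Box M test of H0: Sigma_1 = ... = Sigma_k (chi-square approximation)."""
    k = len(samples)
    p = samples[0].shape[1]
    ns = np.array([s.shape[0] for s in samples])
    N = ns.sum()
    covs = [np.cov(s, rowvar=False) for s in samples]
    Sp = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)   # pooled covariance
    M = (N - k) * np.log(np.linalg.det(Sp)) - sum(
        (n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))
    # Scaling correction c_1
    c1 = ((2 * p ** 2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))
          * (np.sum(1 / (ns - 1)) - 1 / (N - k)))
    X2 = (1 - c1) * M
    df = p * (p + 1) * (k - 1) / 2
    return X2, stats.chi2.sf(X2, df)

rng = np.random.default_rng(9)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
# Three samples with a common covariance: H0 is true
same = [rng.multivariate_normal([0, 0], Sigma, size=40) for _ in range(3)]
# Replace one sample with a much larger covariance: H0 is false
diff = same[:2] + [rng.multivariate_normal([0, 0], 5 * Sigma, size=40)]

X2_same, p_same = box_m(same)
X2_diff, p_diff = box_m(diff)
```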
Two-Sample Repeated Measurements
Define $\mathbf{y}_{hi} = (y_{hi1}, \ldots, y_{hiT})'$ to be the observations from the $i$-th subject in the $h$-th group for times 1 through $T$. Assume that $\mathbf{y}_{1i},\, i = 1, \ldots, n_1$ are iid $N_T(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ and that $\mathbf{y}_{2i},\, i = 1, \ldots, n_2$ are iid $N_T(\boldsymbol{\mu}_2, \boldsymbol{\Sigma})$.

Consider $H_0: \mathbf{C}\boldsymbol{\mu}_1 = \mathbf{C}\boldsymbol{\mu}_2$, where $\mathbf{C}$ is a $c \times T$ matrix of rank $c$, where $c \le T$. The test statistic has the form
$$T^2 = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{C}'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\mathbf{C}\mathbf{S}_p\mathbf{C}'\right]^{-1}\mathbf{C}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$$
where $\mathbf{S}_p$ is the pooled sample covariance matrix, and
$$F = \frac{n_1 + n_2 - c - 1}{(n_1 + n_2 - 2)c}T^2 \sim F_{c,\; n_1+n_2-c-1}$$
when $H_0$ is true.

If the null hypothesis is that the mean vectors themselves are equal, $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, the null hypothesis matrix term is then $\mathbf{C} = \mathbf{I}_T$.