
Analysis of Variance in R and Stata for Public Health Researchers

Biostatistics for Health Researchers

Authors

Bongani Ncube

University of the Witwatersrand (Biostatistician)

Published

16 March 2025

Limitations of conducting multiple t-tests

Why can’t t-tests be used to compare each mean with every other mean (conducting multiple t-tests) when there are more than two means?

  • Time-consuming: Analysis becomes increasingly time-consuming as the number of means increases
  • Inflates the overall significance level: Increases the likelihood of finding a significant result by chance alone, leading to a high risk of false positives
  • Loss of power: Makes detecting true differences between means harder, especially with small sample sizes
  • Interpretation challenges: Numerous pairwise comparisons become difficult to interpret and to synthesise into coherent findings

These limitations can be effectively addressed by using ANOVA to compare means across multiple groups.
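The inflation of the significance level can be made concrete: for \(m\) independent comparisons each run at level \(\alpha\), the probability of at least one false positive is \(1-(1-\alpha)^m\). A quick R sketch (the five-group figure below is illustrative, not from this document's data):

```r
# Family-wise error rate: the probability of at least one false positive
# across m independent comparisons, each run at significance level alpha
alpha <- 0.05
m     <- choose(5, 2)          # e.g. 5 group means -> 10 pairwise t-tests
fwer  <- 1 - (1 - alpha)^m
round(fwer, 3)                 # roughly 0.40, far above the nominal 0.05
```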

Assumptions

  • Independence of observations: Ensure that the observations within each group are independent

  • Normality: Check whether the DV is approximately normally distributed for each category of the IDV

    • Create histograms or Q-Q plots for each group
    • Shapiro-Wilk test of normality (best for small samples)
      • Null hypothesis (H0): the data are normally distributed
      • Alternative hypothesis (H1): the data are not normally distributed
      • p-value < 0.05: reject H0
      • p-value >= 0.05: fail to reject H0; the data are consistent with normality
    • D’Agostino-Pearson test (Stata’s sktest; best for large samples), with the same hypotheses as above
    • The ANOVA test can be unreliable if observations in one or more groups come from a highly non-normal distribution

  • Homogeneity of variances: Check whether the variances of the groups are approximately equal (more important than the normality check)

    • Create boxplots for each group
    • Levene’s test
    • Bartlett’s test (part of the standard output of the oneway command in Stata)
      • Null hypothesis (H0): the variances across all groups are equal (homoscedasticity)
      • Alternative hypothesis (H1): at least one group has a variance that is significantly different (heteroscedasticity)

  • If these assumptions do not hold, consider using the non-parametric alternative (Kruskal-Wallis test) or transforming the response variable
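As a sketch of these checks in R, on simulated data (no dataset has been introduced yet, so the groups and values below are invented for illustration):

```r
set.seed(42)
# Simulated data: a continuous DV measured in three independent groups
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 20),
  y     = c(rnorm(20, 10, 2), rnorm(20, 12, 2), rnorm(20, 11, 2))
)

# Normality within each group (Shapiro-Wilk, suited to small samples)
by(df$y, df$group, shapiro.test)

# Homogeneity of variances (Bartlett's test, the same test reported by
# Stata's oneway command); car::leveneTest(y ~ group, data = df) is the
# Levene alternative if the car package is installed
bartlett.test(y ~ group, data = df)
```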

Analysis Steps

  1. Conduct exploratory data analysis, graphically and numerically (e.g. mean, SD, min, max for each group)
  2. Verify that the assumptions of ANOVA are met
  3. Formulate hypotheses
  4. Calculate test statistic
  5. Determine critical value
  6. Decision making
  7. Post-hoc analysis (if needed)
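A minimal R sketch of this workflow, again on simulated data (group names and effect sizes are invented):

```r
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  y     = c(rnorm(10, 10), rnorm(10, 13), rnorm(10, 10))
)

# Step 1: numerical EDA per group (mean, SD, min, max)
aggregate(y ~ group, data = df,
          FUN = function(v) c(mean = mean(v), sd = sd(v),
                              min = min(v), max = max(v)))

# Steps 3-6: fit the one-way ANOVA; compare the F test's p-value to 0.05
fit <- aov(y ~ group, data = df)
summary(fit)
```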

Theory behind Analysis of Variance (ANOVA)

ANOVA uses the same underlying machinery as linear regression, but it views the model from a slightly different angle than traditional linear regression does. This makes it especially useful with qualitative predictor variables and designed experiments.


Experimental Design

  • Factor: explanatory or predictor variable to be studied in an investigation
  • Treatment (or Factor Level): “value” of a factor applied to the experimental unit
  • Experimental Unit: person, animal, piece of material, etc. that is subjected to treatment(s) and provides a response
  • Single Factor Experiment: one explanatory variable considered
  • Multifactor Experiment: more than one explanatory variable
  • Classification Factor: A factor that is not under the control of the experimenter (observational data)
  • Experimental Factor: assigned by the experimenter


Basics of experimental design:

  • Choices that a statistician has to make:

    • set of treatments
    • set of experimental units
    • treatment assignment (selection bias)
    • measurement (measurement bias, blind experiments)
  • Advancements in experimental design:

    1. Factorial Experiments:
      consider multiple factors at the same time (interaction)

    2. Replication: repetition of experiment

      • assess mean squared error
      • control over precision of experiment (power)
    3. Randomization

      • Before R.A. Fisher (1900s), treatments were assigned systematically or subjectively
      • randomization: assign treatments to experimental units at random, which averages out systematic effects that cannot be controlled by the investigator
    4. Local control: Blocking or Stratification

      • Reduce experimental errors and increase power by placing restrictions on the randomization of treatments to experimental units.

Randomization may also eliminate correlations due to time and space.

Completely Randomized Design (CRD)

Treatment factor A with \(a\ge2\) treatment levels. Experimental units are randomly assigned to each treatment. The number of experimental units in each group can be

  • equal (balanced): n
  • unequal (unbalanced): \(n_i\) for the i-th group (i = 1,…,a).

The total sample size is \(N=\sum_{i=1}^{a}n_i\)

The number of possible assignments of units to treatments is \(k=\frac{N!}{n_1!n_2!\cdots n_a!}\)

Each has probability 1/k of being selected. Each experimental unit is measured with a response \(Y_{ij}\), in which j denotes unit and i denotes treatment.
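For example, with \(N=15\) units split into three balanced groups of \(n_i=5\) (the design of the mouse experiment later in this document), R gives:

```r
# Number of distinct ways to assign N units to groups of sizes n1, ..., na:
# the multinomial coefficient k = N! / (n1! n2! ... na!)
n <- c(5, 5, 5)
k <- factorial(sum(n)) / prod(factorial(n))
k   # 756756 possible assignments, each selected with probability 1/k
```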

| Treatment   | 1                  | 2                  | …   | a                  |
|-------------|--------------------|--------------------|-----|--------------------|
| Observations| \(Y_{11}\)         | \(Y_{21}\)         | …   | \(Y_{a1}\)         |
|             | \(Y_{12}\)         | \(Y_{22}\)         | …   | \(Y_{a2}\)         |
|             | ⋮                  | ⋮                  |     | ⋮                  |
| Sample Mean | \(\bar{Y_{1.}}\)   | \(\bar{Y_{2.}}\)   | …   | \(\bar{Y_{a.}}\)   |
| Sample SD   | \(s_1\)            | \(s_2\)            | …   | \(s_a\)            |

where \(\bar{Y_{i.}}=\frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}\)

\(s_i^2=\frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y_{i.}})^2\)

And the grand mean is \(\bar{Y_{..}}=\frac{1}{N}\sum_{i}\sum_{j}Y_{ij}\)

Single Factor Fixed Effects Model

Also known as the Single Factor (One-Way) ANOVA or ANOVA Type I model.

Partitioning the Variance

The total variability of the \(Y_{ij}\) observation can be measured as the deviation of \(Y_{ij}\) around the overall mean \(\bar{Y_{..}}\): \(Y_{ij} - \bar{Y_{..}}\)

This can be rewritten as: \[ \begin{split} Y_{ij} - \bar{Y_{..}}&=Y_{ij} - \bar{Y_{..}} + \bar{Y_{i.}} - \bar{Y_{i.}} \\ &= (\bar{Y_{i.}}-\bar{Y_{..}})+(Y_{ij}-\bar{Y_{i.}}) \end{split} \] where

  • the first term is the between treatment differences (i.e., the deviation of the treatment mean from the overall mean)
  • the second term is within treatment differences (i.e., the deviation of the observation around its treatment mean)


\[ \begin{split} \sum_{i}\sum_{j}(Y_{ij} - \bar{Y_{..}})^2 &= \sum_{i}n_i(\bar{Y_{i.}}-\bar{Y_{..}})^2+\sum_{i}\sum_{j}(Y_{ij}-\bar{Y_{i.}})^2 \\ SSTO &= SSTR + SSE \\ total~SS &= treatment~SS + error~SS \\ (N-1)~d.f. &= (a-1)~d.f. + (N - a) ~ d.f. \end{split} \]

we lose a d.f. for the total corrected SSTO because of the estimation of the mean (\(\sum_{i}\sum_{j}(Y_{ij} - \bar{Y_{..}})=0\))
And, for the SSTR \(\sum_{i}n_i(\bar{Y_{i.}}-\bar{Y_{..}})=0\)

Accordingly, \(MSTR= \frac{SSTR}{a-1}\) and \(MSE=\frac{SSE}{N-a}\)
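The partition and the mean squares can be checked numerically in R; the data below are simulated purely to verify the identity:

```r
set.seed(7)
y     <- c(rnorm(4, 5), rnorm(6, 7), rnorm(5, 6))   # unbalanced groups
group <- factor(rep(1:3, times = c(4, 6, 5)))

grand <- mean(y)                     # grand mean, Y-bar..
means <- tapply(y, group, mean)      # treatment means, Y-bar i.
n_i   <- tapply(y, group, length)

SSTO <- sum((y - grand)^2)
SSTR <- sum(n_i * (means - grand)^2)
SSE  <- sum((y - means[group])^2)

stopifnot(isTRUE(all.equal(SSTO, SSTR + SSE)))   # the identity holds

MSTR   <- SSTR / (nlevels(group) - 1)
MSE    <- SSE  / (length(y) - nlevels(group))
F_stat <- MSTR / MSE
```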

ANOVA Table

| Source of Variation       | SS                                                    | df      | MS                   |
|---------------------------|-------------------------------------------------------|---------|----------------------|
| Between treatments        | \(SSTR=\sum_{i}n_i (\bar{Y_{i.}}-\bar{Y_{..}})^2\)    | \(a-1\) | \(MSTR=SSTR/(a-1)\)  |
| Error (within treatments) | \(SSE=\sum_{i}\sum_{j}(Y_{ij}-\bar{Y_{i.}})^2\)       | \(N-a\) | \(MSE=SSE/(N-a)\)    |
| Total (corrected)         | \(SSTO=\sum_{i}\sum_{j}(Y_{ij} - \bar{Y_{..}})^2\)    | \(N-1\) |                      |

Linear Model Explanation of ANOVA

Summary of One-Way ANOVA

Use the one-way ANOVA test to compare the mean response of a continuous dependent variable among the levels of a factor variable (if you only have two levels, use the independent-samples t-test). The observations must be independent, meaning they cannot influence each other (violations include the same participant appearing in different groups, or participants interacting with each other to produce the observed outcome).

The ANOVA method decomposes the deviation of observation \(Y_{ij}\) around the overall mean \(\bar{Y}_{..}\) into two parts: the deviation of the observations around their treatment means, \(SSE\), and the deviation of the treatment means around the overall mean, \(SSTR\). The ratio of their mean squares, \(F = \frac{MSTR}{MSE} = \frac{SSTR/(k-1)}{SSE/(N-k)}\), follows an F-distribution with \(k-1\) numerator degrees of freedom and \(N-k\) denominator degrees of freedom. The more of the observed variance captured by the treatments, the larger \(F\) is, and the less likely that the null hypothesis, \(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\), is true.

Compare the F-statistic to the F-distribution with \(k-1\) numerator degrees of freedom and \(N-k\) denominator degrees of freedom

The F test does not indicate which populations cause the rejection of \(H_0\). For this, use one of the post-hoc tests: Tukey, Fisher’s Least Significant Difference (LSD), Bonferroni, Scheffe, or Dunnett.
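In R, Tukey's test is available via TukeyHSD() on an aov fit; a sketch on simulated data (values are invented):

```r
set.seed(3)
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 12),
  y     = c(rnorm(12, 10), rnorm(12, 14), rnorm(12, 10))
)

fit <- aov(y ~ group, data = df)
summary(fit)    # overall F test first

# Pairwise comparisons with family-wise error control:
# adjusted p-values and confidence intervals for B-A, C-A, C-B
TukeyHSD(fit)

# A Bonferroni alternative:
# pairwise.t.test(df$y, df$group, p.adjust.method = "bonferroni")
```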

ANOVA returns reliable results if the following conditions hold:

    1. there should be no significant outliers within the factor levels;
    2. the response variable should be approximately normally distributed within each factor level; and
    3. the response variable variances within the factor levels should be equal.

Here is an example where you might use an ANOVA test.

Example

An experiment was performed to test the effects of drugs A and B on the lymphocyte count in mice by comparing A, B and a placebo (inactive substance) C. Fifteen mice were randomly assigned into three groups of five mice each, and the groups were given A, B and placebo C, respectively. The lymphocyte counts (in hundreds per cubic mm of blood) are summarised below:

Long Format

Descriptive analysis

Start with a column plot with 95% confidence intervals. It is unclear whether the effects of each drug are significantly different from each other.

Conducting the one-way ANOVA

Checking Conditions

The ANOVA test applies when the dependent variable is continuous, the independent variable is categorical, and the observations are independent within groups. Independent means the observations should be from a random sample, or from an experiment using random assignment. Each group’s size should be less than 10% of its population size. The groups must also be independent of each other (non-paired, and non-repeated measures). Additionally, there are three conditions related to the data distribution.

  1. No outliers. There should be no significant outliers in the groups. Outliers exert a large influence on the mean and standard deviation. Test with a box plot. If it fails this condition, you might be able to drop the outliers or transform the data. Otherwise you’ll need to switch to a non-parametric test.
  2. Normality. Each group’s values should be nearly normally distributed (“nearly” because ANOVA is considered robust to the normality assumption). This condition is especially important with small sample sizes. Test with the Q-Q plots or the Shapiro-Wilk test for normality. If the data is very non-normal, you might be able to transform your response variable, or use a nonparametric test such as Kruskal-Wallis.
  3. Equal Variances. The group variances should be roughly equal. This condition is especially important when sample sizes differ. The IQR of the box plot is a good way to visually assess this condition. A rule of thumb is that the largest sample standard deviation should be less than twice the smallest. More formal homogeneity of variance tests include the Bartlett test and the Levene test. If the variances are very different, use a Games-Howell post hoc test for multiple comparisons instead of the Tukey post hoc test.

Outliers

Assess outliers with a box plot. The whiskers extend no further than 1.5 × IQR from the upper and lower hinges. Any observations beyond the whiskers are outliers and are plotted individually.

There were no outliers in the data, as assessed by inspection of a boxplot.

The data did pass the test, but what would you do if it had not? There are generally three reasons for outliers: data entry errors, measurement errors, and genuinely unusual values. If the problem is data entry, fix it! If it is a measurement error, throw it out. If it is genuine, you have some options.

  • Kruskal-Wallis H test. It is a non-parametric test. Be careful here, because it is not quite testing the same \(H_0\).
  • Transform the dependent variable. Don’t do this unless the data is also non-normal. It also has the downside of making interpretation more difficult.
  • Leave it in if it doesn’t affect the conclusion (compared to taking it out).
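The Kruskal-Wallis alternative is a single call in R; a sketch on simulated skewed data:

```r
set.seed(9)
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  y     = c(rexp(10, 1), rexp(10, 0.5), rexp(10, 1))  # skewed responses
)

# Rank-based test of whether the groups come from the same distribution;
# note that H0 concerns the distributions, not the means as in ANOVA
kruskal.test(y ~ group, data = df)
```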

Normality

Quantile Quantile Plots

Shapiro Wilk test

And the Shapiro-Wilk test fails to reject the normality null hypothesis.

If the data passes the test, report

count was normally distributed, as assessed by inspection of a Q-Q plot.

or

count was normally distributed, as assessed by Shapiro-Wilk’s test (p > .05).

Had the data not been normally distributed, you would have three options:

  1. transform the dependent variable;
  2. use a non-parametric test such as Kruskal-Wallis; or
  3. carry on regardless.

Transformations will generally only work when the distribution of scores in all groups has the same shape. They also have the drawback of making the data less interpretable.

You can also choose to carry on regardless. ANOVA is considered “robust” to normality violations.

Equal Variances

The equality of sample variances condition is less critical when sample sizes are similar among the groups. One rule of thumb is that no group’s standard deviation should be more than double that of any other.
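The rule of thumb is easy to check in R (simulated data for illustration):

```r
set.seed(5)
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 15),
  y     = c(rnorm(15, 10, 1.0), rnorm(15, 10, 1.5), rnorm(15, 10, 1.2))
)

# Rule of thumb: largest group SD should be less than twice the smallest
sds <- tapply(df$y, df$group, sd)
sds
max(sds) / min(sds)   # compare this ratio against 2
```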

Post Hoc

Handling Non-Constant Variance

The statistical tests for the model conditions (e.g. Bartlett’s test for homogeneity) are often too sensitive. ANOVA is robust to small violations of the conditions. However, heterogeneity is a common problem in ANOVA. Transforming the response variable can often remove the heterogeneity. Finding the correct transformation can be challenging, but the Box-Cox procedure can help. The MASS::boxcox() function calculates the profile log-likelihoods for a power transformation of the response variable \(Y^\lambda\).

| \(\lambda\) | \(Y^\lambda\) | Transformation      |
|-------------|---------------|---------------------|
| 2           | \(Y^2\)       | Square              |
| 1           | \(Y^1\)       | (no transformation) |
| 0.5         | \(Y^{0.5}\)   | Square root         |
| 0           | \(\ln(Y)\)    | Log                 |
| -0.5        | \(Y^{-0.5}\)  | Inverse square root |
| -1          | \(Y^{-1}\)    | Inverse             |

The Box-Cox procedure does not recommend any particular transformation of the data in this case.
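As a sketch of the procedure (on simulated right-skewed data, where the selected \(\lambda\) should land near 0, i.e. a log transformation):

```r
library(MASS)

set.seed(11)
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 20),
  y     = exp(rnorm(60, mean = rep(c(1, 1.5, 2), each = 20), sd = 0.4))
)

fit <- aov(y ~ group, data = df)

# Profile log-likelihood of the power transformation Y^lambda;
# the curve peaks at the recommended lambda (95% CI marked on the plot)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1))
(lambda_hat <- bc$x[which.max(bc$y)])   # likely near 0 here, i.e. log(Y)
```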