# Analysis of variance (ANOVA)

Russian: Дисперсионный анализ (ANOVA)

## Overview

ANOVA stands for Analysis of Variance. It is a general technique for testing the hypothesis that the means of two or more groups are equal, under the assumption that the sampled populations are normally distributed[1]. The total variation present in a data set is partitioned into several components, each associated with a specific source of variation. ANOVA compares the variation between the data samples with the variation within each sample. If the between-sample variation is much larger than the within-sample variation, this is evidence that the population means are not all equal. If the two are approximately the same size, there is no significant difference between the sample means.

Sir Ronald Fisher introduced the term variance and proposed its formal analysis in 1918. In 1925 he presented the full subdivision of the total sum of squares in his famous book Statistical Methods for Research Workers. In his honor, the statistic used in ANOVA is called the F statistic.

## Hypotheses and ANOVA F-test

ANOVA tests the null hypothesis that all population (or treatment) means are equal against the alternative that at least one pair of means differs. The independent-samples t test examines the null hypothesis that two population means are equal. If more than two means are compared, repeated use of the independent-samples t test leads to a higher Type I error rate (the experiment-wise α level) than the α level set for each individual t test. A better approach is to consider all means in one null hypothesis, that is, to examine its plausibility with a single statistical test (the F-test). This gives better control over the probability of falsely declaring significant differences among means.

ANOVA measures two sources of variation in the data and compares their relative sizes:

• variation between groups (MSG): for each data value, we look at the difference between its group mean and the overall mean;
• variation within groups (MSE): for each data value, we look at the difference between that value and the mean of its group.
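Written out, the two quantities can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the cited sources: $k$ is the number of groups, $n_i$ the size of group $i$, $N$ the total number of observations, $x_{ij}$ the $j$-th value in group $i$, $\bar{x}_i$ the mean of group $i$, and $\bar{x}$ the overall mean.

$SSG = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2, \qquad MSG = \dfrac{SSG}{k-1}$

$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \qquad MSE = \dfrac{SSE}{N-k}$

Each sum of squares is divided by its degrees of freedom, so both mean squares are on the scale of a variance and can be compared directly.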

The ANOVA F-statistic is the ratio of the between-group variation to the within-group variation:

$F = \dfrac{MSG}{MSE}$

A large F (larger than the critical value) is evidence against the null hypothesis, since it indicates that there is more difference between groups than within groups[2].
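As a minimal sketch of the ratio above, the following Python snippet computes MSG, MSE and F for a one-way layout using only the standard library. The function name `one_way_anova` and the toy data are hypothetical, chosen for illustration; the degrees of freedom used (number of groups minus one, and total observations minus number of groups) are the usual ones for a one-way design.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (MSG, MSE, F) for a list of groups, each a list of numbers."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: group mean vs. overall mean,
    # counted once for every data value in the group.
    ssg = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group sum of squares: each value vs. its own group mean.
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g)

    msg = ssg / (k - 1)
    mse = sse / (n_total - k)
    return msg, mse, msg / mse


# Hypothetical toy data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
msg, mse, f_stat = one_way_anova(groups)
print(f"MSG = {msg:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
```

For this toy data the group means are 2, 3 and 6, so MSG is about 13, MSE is 1 and F is about 13. To decide whether such a value is large, it would be compared with the critical value of the F distribution with 2 and 6 degrees of freedom at the chosen α level.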

## Types of ANOVAs

One-way ANOVA examines one factor at a time and tests for differences among the levels of that factor. The null hypothesis is that there is no difference in the population means of the different levels of factor A (the only factor). The alternative hypothesis is that the means are not all the same. Two-way ANOVA is a statistical procedure in which two factors are used to explain variability in the response variable. Each factor is set at a fixed number of levels. For the two-way ANOVA, the possible null hypotheses are:

1. There is no difference in the means across the levels of factor A.
2. There is no difference in the means across the levels of factor B. The alternative hypothesis for cases 1 and 2 is: the means are not all equal.
3. There is no interaction between factors A and B. The alternative hypothesis for case 3 is: there is an interaction between A and B.
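The three hypotheses above each get their own F-statistic. The sketch below shows one way to compute them, assuming a balanced fixed-effects design in which every combination of levels has the same number of replicates (more than one). The function name `two_way_anova`, the nested-list layout `cells[i][j]` and the toy data are hypothetical choices for illustration.

```python
from statistics import mean


def two_way_anova(cells):
    """F-statistics for a balanced two-way layout with replication.

    cells[i][j] is the list of observations for level i of factor A and
    level j of factor B. Assumes every cell holds the same number r > 1
    of observations (balanced design).
    """
    a = len(cells)          # levels of factor A
    b = len(cells[0])       # levels of factor B
    r = len(cells[0][0])    # replicates per cell

    grand = mean([x for row in cells for cell in row for x in cell])
    mean_a = [mean([x for cell in row for x in cell]) for row in cells]
    mean_b = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_mean = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for the main effects, the interaction and the error.
    ss_a = b * r * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * r * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = r * sum((cell_mean[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell_mean[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])

    mse = ss_e / (a * b * (r - 1))
    return {
        "A": (ss_a / (a - 1)) / mse,
        "B": (ss_b / (b - 1)) / mse,
        "AxB": (ss_ab / ((a - 1) * (b - 1))) / mse,
    }


# Hypothetical toy data: 2 levels of A, 2 levels of B, 2 replicates per cell.
cells = [[[1, 3], [3, 5]],
         [[5, 7], [11, 13]]]
f_values = two_way_anova(cells)
print(f_values)
```

With this toy data the sums of squares are 72 (A), 32 (B), 8 (interaction) and 8 (error), giving F-statistics of 36, 16 and 4. Each would be compared with the critical value of the F distribution for its own degrees of freedom.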

The n-way ANOVA examines n factors at a time and tests for differences among the levels of each factor as well as for interaction effects.

## Assumptions of the ANOVA

The ANOVA technique is associated with three statistical assumptions:

1. Observations are randomly and independently selected from their respective populations.
2. The population distributions are normal.
3. These normal populations have identical variances.

ANOVA is reasonably robust to moderate departures from normality and to unequal variances. Problems occur when heterogeneity of variances is combined with unequal sample sizes. It is therefore worthwhile to design an experiment in which the samples drawn from the populations are equal in size.
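Before running an ANOVA, the combination described above can be inspected informally. The sketch below reports the sample sizes, the sample variances and the ratio of the largest to the smallest variance. The function name `variance_summary` and the toy data are hypothetical, and no particular cutoff for the ratio is assumed here: it is a descriptive aid, not a formal test of equal variances.

```python
from statistics import variance


def variance_summary(groups):
    """Summarize group sizes and sample variances for a list of groups.

    Assumes every group has at least two observations and a non-zero
    sample variance, so that the ratio below is defined.
    """
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]   # sample variance (n - 1 denominator)
    return {
        "sizes": sizes,
        "variances": variances,
        "balanced": len(set(sizes)) == 1,       # True if all groups have equal size
        "variance_ratio": max(variances) / min(variances),
    }


# Hypothetical toy data with unequal group sizes and unequal spread.
groups = [[1, 2, 3], [2, 4, 6], [5, 6, 7, 8]]
summary = variance_summary(groups)
print(summary)
```

A design that is unbalanced and also shows a large variance ratio is the situation the paragraph above warns about.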

## References

1. Graham Upton. A Dictionary of Statistics (Oxford Paperback Reference). Oxford University Press, USA; 2nd edition (October 2, 2008). ISBN-10: 0199541450
2. Ellen R. Girden. ANOVA: Repeated Measures (Quantitative Applications in the Social Sciences). SAGE Publications, Inc; 1st edition (November 26, 1991). ISBN-10: 0803942575

## Useful Online Resources

Statsoft Electronic Statistics Textbook