Analysis of variance (ANOVA)

One way analysis of variance

Two way analysis of variance

Two way replicate analysis of variance

Multiple comparisons in ANOVA

Fully nested random analysis of variance

Latin square analysis of variance

Crossover tests

Kruskal-Wallis test

Friedman test

Equality (homogeneity) of variance

Agreement of continuous measurements

Download a free 10 day StatsDirect trial

Analysis of variance (ANOVA)

·One way (one factor, fixed effects)

·Two way (two factors, randomized blocks)

·Two way with repeated observations (two factors, randomized block)

·Fully nested (hierarchical factors)

·Latin square (one primary and two secondary factors)

·Crossover (two factors, fixed effects, treatment crossover)

·Kruskal-Wallis (nonparametric one way)

·Friedman (nonparametric two way)

·Homogeneity of variance (examine the ANOVA assumption of equal variance)

·Shapiro-Wilk W (examine the ANOVA assumption of normality)

·Agreement (examine agreement of two or more samples)

Menu location: Analysis_Analysis of Variance.

Basics

ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA. The different types of ANOVA reflect the different experimental designs and situations for which they have been developed.

Excellent accounts of ANOVA are given by Armitage & Berry (1994) and Kleinbaum et al. (1998). Nonparametric alternatives to ANOVA are discussed by Conover (1999) and Hollander and Wolfe (1999).

ANOVA and regression

ANOVA can be treated as a special case of general linear regression where independent/predictor variables are the nominal categories or factors. Each value that can be taken by a factor is referred to as a level. k different levels (e.g. three different types of diet in a study of diet on weight gain) are coded not as a single column (e.g. of diet 1 to 3) but as k-1 dummy variables. The dependent/outcome variable in the regression consists of the study observations.

General linear regression can be used in this way to build more complex ANOVA models than those described in this section; this is best done under expert statistical guidance.
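The dummy variable coding described above can be sketched in a few lines of Python. This is only an illustration: the helper name `dummy_code` and the choice of the lowest level as the reference category are assumptions, not part of StatsDirect.

```python
# Hypothetical helper (not a StatsDirect function): code a k-level factor as
# k-1 dummy (indicator) variables, using the lowest level as the reference.
def dummy_code(values):
    levels = sorted(set(values))      # the k distinct levels
    others = levels[1:]               # every level except the reference
    # One 0/1 column per non-reference level; the reference is all zeros.
    return [[1 if v == level else 0 for level in others] for v in values]

diet = [1, 2, 3, 2, 1, 3]             # three diets, so k = 3 and k-1 = 2 columns
for d, row in zip(diet, dummy_code(diet)):
    print(d, row)
```

Each observation on diet 1 is coded as two zeros, so the regression coefficients for the two dummy columns estimate the differences of diets 2 and 3 from diet 1.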

Fixed vs. random effects

A fixed factor has only the levels used in the analysis (e.g. sex, age, blood group). A random factor has many possible levels and some are used in the analysis (e.g. time periods, subjects, observers). Some factors that are usually treated as fixed may also be treated as random if the study is looking at them as part of a larger group (e.g. treatments, locations, tests).

Most general statistical texts arrange data for ANOVA into tables where columns represent fixed factors and the one and two way analyses described are fixed factor methods.

Multiple comparisons

ANOVA gives an overall test for the difference between the means of k groups. StatsDirect enables you to compare all k(k-1)/2 possible pairs of means using methods that are designed to control the inflated risk of type I error that would arise if you used two sample methods such as the t test for these comparisons. The multiple comparison/contrast methods offered by StatsDirect are Tukey(-Kramer), Scheffé, Newman-Keuls, Dunnett and Bonferroni (Armitage and Berry, 1994; Wallenstein, 1980; Liddell, 1983; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). See multiple comparisons for more information.

Further methods

There are many possible ANOVA designs. StatsDirect covers the common designs in its ANOVA section and provides general tools (see general linear regression and dummy variables) for building more complex designs.

Other software such as SAS and Genstat provide further specific ANOVA designs. For example, balanced incomplete block design:

- with complete missing blocks you should consider a balanced incomplete block design provided the number of missing blocks does not exceed the number of treatments.

		Treatments
		1	2	3	4
Blocks	A	x	x	x
	B	x	x		x
	C	x		x	x
	D		x	x	x

Complex ANOVA should not be attempted without expert statistical guidance. Beware situations where over complex analysis is used in order to compensate for poor experimental design. There is no substitute for good experimental design.


One way analysis of variance

Menu location: Analysis_Analysis of Variance_One Way.

This function compares the sample means for k groups. There is an overall test for k means, multiple comparison methods for pairs of means and tests for the equality of the variances of the groups.

Consider four groups of data that represent one experiment performed on four occasions with ten different subjects each time. You could explore the consistency of the experimental conditions or the inherent error of the experiment by using one way analysis of variance (ANOVA); however, agreement analysis might be more appropriate. One way ANOVA is more appropriate for finding statistical evidence of inconsistency or difference across the means of the four groups.

One way ANOVA assumes that each group comes from an approximately normal distribution and that the variability within the groups is roughly constant. The factors are arranged so that experiments are columns and subjects are rows; this is how you must enter your data in the StatsDirect workbook. The overall F test is fairly robust to small deviations from these assumptions but you could use the Kruskal-Wallis test as an alternative to one way ANOVA if there was any doubt.

Numerically, one way ANOVA is a generalisation of the two sample t test. The F statistic compares the variability between the groups to the variability within the groups:
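Written with the symbols defined in the next paragraph, and assuming the usual computational form for k groups (the layout is a reconstruction from those definitions, not a copy of the StatsDirect formula), the quantities are:

```latex
F = \frac{MST}{MSE}, \qquad
MST = \frac{\sum_{i=1}^{k} T_i^2 / n_i \;-\; G^2 / n}{k - 1}, \qquad
MSE = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2 \;-\; \sum_{i=1}^{k} T_i^2 / n_i}{n - k}
```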

- where F is the variance ratio for the overall test, MST is the mean square due to treatments/groups (between groups), MSE is the mean square due to error (within groups, residual mean square), Yij is an observation, Ti is a group total, G is the grand total of all observations, ni is the number in group i and n is the total number of observations.

Assumptions:

·random samples

·normally distributed observations in each population

·equal variance of observations in each population

- the homogeneity of variance option (marked as "Equality of variance tests (Levene, Bartlett)" in the ANOVA results window) can be used to test the variance assumption. The Shapiro-Wilk test can be used to look for evidence of non-normality. The most commonly unacceptable deviation from the assumptions is inequality of variance when the groups are of unequal sizes.

A significant overall test indicates a difference between the population means for the groups as a whole; you may then go on to make multiple comparisons between the groups but this "dredging" should be avoided if possible.

If the groups in this example had been a series of treatments/exposures to which subjects were randomly allocated then a two way randomized block design ANOVA should have been used.

Example

From Armitage and Berry (1994, p. 214).

Test workbook (ANOVA worksheet: Expt 1, Expt 2, Expt 3, Expt 4).

The following data represent the numbers of worms isolated from the GI tracts of four groups of rats in a trial of carbon tetrachloride as an anthelminthic. These four groups were the control (untreated) groups.

Expt 1	Expt 2	Expt 3	Expt 4
279	378	172	381
338	275	335	346
334	412	335	340
198	265	282	471
303	286	250	318

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select One Way from the Analysis of Variance section of the analysis menu. Select the columns marked "Expt 1", "Expt 2", "Expt 3" and "Expt 4" in one action when prompted for data.

For this example:

One way analysis of variance

Variables: Expt 1, Expt 2, Expt 3, Expt 4

Source of Variation	Sum Squares	DF	Mean Square
Between Groups	27234.2	3	9078.066667
Within Groups	63953.6	16	3997.1
Corrected Total	91187.8	19

F (variance ratio) = 2.271163 P = .1195

The null hypothesis that there is no difference in mean worm counts across the four groups is not rejected. If we had rejected this null hypothesis then we would have had to take a close look at the experimental conditions to make sure that all control groups were exposed to the same conditions.
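The sums of squares and variance ratio in the table above can be checked by hand. The following is a minimal sketch in plain Python, assuming the usual computational formulas for one way ANOVA; the variable names are illustrative and the P value is not computed.

```python
# Worm counts for the four control groups (Armitage and Berry, 1994).
groups = {
    "Expt 1": [279, 338, 334, 198, 303],
    "Expt 2": [378, 275, 412, 265, 286],
    "Expt 3": [172, 335, 335, 282, 250],
    "Expt 4": [381, 346, 340, 471, 318],
}

k = len(groups)                                   # number of groups
n = sum(len(g) for g in groups.values())          # total observations
grand_total = sum(sum(g) for g in groups.values())
correction = grand_total ** 2 / n                 # G squared over n

ss_total = sum(y * y for g in groups.values() for y in g) - correction
ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)                 # MST
ms_within = ss_within / (n - k)                   # MSE
f_ratio = ms_between / ms_within

print(round(ss_between, 1), round(ss_within, 1), round(f_ratio, 6))
```

The degrees of freedom used here, k-1 = 3 and n-k = 16, match the DF column of the table.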


Technical validation

The American National Institute of Standards and Technology provide Statistical Reference Datasets for testing statistical software (McCullough and Wilson, 1999; http://www.nist.gov/itl/div898/strd). The results below for the SiResits data set are given to 12 decimal places; StatsDirect provides 15 decimal places of accuracy internally.

One way analysis of variance

Variables: Instrument 1, Instrument 2, Instrument 3, Instrument 4, Instrument 5

Source of Variation	Sum Squares	DF	Mean Square
Between Groups	0.0511462616	4	0.0127865654
Within Groups	0.21663656	20	0.010831828
Corrected Total	0.2677828216	24

F (variance ratio) = 1.180462374402 P = .3494


Two way analysis of variance

Menu location: Analysis_Analysis of Variance_Two Way.

This function calculates ANOVA for a two way randomized block experiment. There are overall tests for differences between treatment means and between block means. Multiple comparison methods are provided for pairs of treatment means.

Consider data classified by two factors such that each level of one factor can be combined with all levels of the other factor:

		Treatment (i, 1 to k)
		1	2	3	k
Block (j, 1 to b)	1	Yij
	2
	3
	.
	b

In the example below there is a study of different treatments on clotting times. Response/outcome variable Y is the observed clotting time for blood samples. Blocks are individuals who donated a blood sample. Treatments are different methods by which portions of each of the blood samples are processed.

Unlike one way ANOVA, the F tests for two way ANOVA are the same whether either or both of the block and treatment factors are considered fixed or random:
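Using the symbols defined in the next paragraph, and assuming the standard additive randomized block model with k treatments and b blocks (a reconstruction from those definitions rather than the StatsDirect formula itself), the mean squares and variance ratios are:

```latex
\begin{aligned}
MST &= \frac{b \sum_{i=1}^{k} (\bar{Y}_{i.} - \bar{Y}_{..})^2}{k - 1}, \qquad
MSB = \frac{k \sum_{j=1}^{b} (\bar{Y}_{.j} - \bar{Y}_{..})^2}{b - 1}, \\
MSE &= \frac{\sum_{i=1}^{k}\sum_{j=1}^{b} (Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2}{(k - 1)(b - 1)}, \\
F_{treatments} &= \frac{MST}{MSE}, \qquad F_{blocks} = \frac{MSB}{MSE}
\end{aligned}
```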

- where F is the variance ratio for tests of equality of treatment and block means, MST is the mean square due to treatments/groups (between groups), MSB is the mean square due to blocks (between blocks), MSE is the mean square due to error (within groups, residual mean square), Yij is an observation, Y bar i. is a treatment group mean, Y bar .j is a block mean and Y bar .. is the grand mean of all observations.

If you wish to use a two way ANOVA but your data are clearly non-normal then you should consider using the Friedman test, a nonparametric alternative.

Please note that many statistical software packages and texts present multiple comparison methods for treatment group means only in the context of one way ANOVA. StatsDirect extends this to two way ANOVA by using the treatment group mean square from two way ANOVA for multiple comparisons. Treatment effects must be fixed for this use of multiple comparisons to be valid. See Hsu (1996) for further discussion.

Example

From Armitage and Berry (1994, p. 241).

Test workbook (ANOVA worksheet: Treatment 1, Treatment 2, Treatment 3, Treatment 4).

The following data represent clotting times (mins) of plasma from eight subjects treated in four different ways. Plasma from each of the eight subjects (blocks) was allocated at random to the four treatments.

Treatment 1	Treatment 2	Treatment 3	Treatment 4
8.4	9.4	9.8	12.2
12.8	15.2	12.9	14.4
9.6	9.1	11.2	9.8
9.8	8.8	9.9	12.0
8.4	8.2	8.5	8.5
8.6	9.9	9.8	10.9
8.9	9.0	9.2	10.4
7.9	8.1	8.2	10.0

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Two Way from the Analysis of Variance section of the analysis menu. Select the columns marked "Treatment 1", "Treatment 2", "Treatment 3" and "Treatment 4" in one action when prompted for data.

For this example:

Two way randomized block analysis of variance

Variables: Treatment 1, Treatment 2, Treatment 3, Treatment 4

Source of Variation	Sum Squares	DF	Mean Square
Between blocks (rows)	78.98875	7	11.284107
Between treatments (columns)	13.01625	3	4.33875
Residual (error)	13.77375	21	0.655893
Corrected total	105.77875	31

F (VR between blocks) = 17.204193 P < .0001

F (VR between treatments) = 6.615029 P = .0025

Here we can see that there was a statistically highly significant difference between mean clotting times across the treatment groups. The difference between subjects is of no particular interest here.
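The table above can be reproduced from the raw clotting times. This is a minimal sketch in plain Python, assuming the additive randomized block model with one observation per cell; the variable names are illustrative and P values are not computed.

```python
# Clotting times (mins): rows are subjects (blocks), columns are treatments 1 to 4.
data = [
    [8.4, 9.4, 9.8, 12.2],
    [12.8, 15.2, 12.9, 14.4],
    [9.6, 9.1, 11.2, 9.8],
    [9.8, 8.8, 9.9, 12.0],
    [8.4, 8.2, 8.5, 8.5],
    [8.6, 9.9, 9.8, 10.9],
    [8.9, 9.0, 9.2, 10.4],
    [7.9, 8.1, 8.2, 10.0],
]

b, k = len(data), len(data[0])                    # blocks, treatments
grand_mean = sum(map(sum, data)) / (b * k)
block_means = [sum(row) / k for row in data]
treat_means = [sum(col) / b for col in zip(*data)]

ss_blocks = k * sum((m - grand_mean) ** 2 for m in block_means)
ss_treat = b * sum((m - grand_mean) ** 2 for m in treat_means)
ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
ss_error = ss_total - ss_blocks - ss_treat        # residual

ms_error = ss_error / ((b - 1) * (k - 1))
f_blocks = (ss_blocks / (b - 1)) / ms_error
f_treat = (ss_treat / (k - 1)) / ms_error

print(round(ss_blocks, 5), round(ss_treat, 5), round(ss_error, 5))
print(round(f_blocks, 4), round(f_treat, 4))
```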


Two way replicate analysis of variance

Menu location: Analysis_Analysis of Variance_Replicate Two Way.

This function calculates ANOVA for a two way randomized block experiment with repeated observations for each treatment/block cell. There are overall tests for differences between treatment means, between block means and block/treatment interaction. Multiple comparison methods are provided for pairs of treatment means.

Consider data classified by two factors such that each level of one factor can be combined with all levels of the other factor:

For repeated Y observations 1 to r:

		Treatment (i, 1 to k)
		1	2	3	k
Block (j, 1 to b)	1	Yijr
	2
	3
	.
	b

In the example below there is a study of different treatments on clotting times. Response/outcome variable Y is the clotting time for blood samples, measured r times. Blocks are individuals who donated a blood sample. Treatments are different methods by which portions of each of the blood samples are processed.

The simple two way randomized block design assumes that the block (row) and treatment (column) effects are additive. This means that, apart from experimental error, the difference in effect between any two blocks is the same for all treatments and vice versa. If these effects are not additive then there is a row-column interaction that should be investigated by repeating the observations for each block.

StatsDirect compensates for missing observations in the replicates (repeat observations) by estimating them as the mean of the replicates present and by reducing the degrees of freedom; you should avoid this situation if possible.

Data preparation

Enter each set of replicates as a separate two way table of treatment columns and block rows.

Example

From Armitage and Berry (1994, p. 243).

Test workbook (ANOVA worksheet: T1 (rep 1), T2 (rep 1), T3 (rep 1), T1 (rep 2), T2 (rep 2), T3 (rep 2), T1 (rep 3), T2 (rep 3), T3 (rep 3)).

The following data represent clotting times (mins) from three subjects treated in three different ways. The plasma samples were allocated randomly to the treatments and the analysis was repeated three times for each sample.

Treatment	A	B	C
Subject 1	9.8	9.9	11.3
	10.1	9.5	10.7
	9.8	10.0	10.7

Subject 2	9.2	9.1	10.3
	8.6	9.1	10.7
	9.2	9.4	10.2

Subject 3	8.4	8.6	9.8
	7.9	8.0	10.1
	8.0	8.0	10.1

To analyse these data in StatsDirect you must first prepare them in nine workbook columns:

r = repeat/replicate observation

T = treatment

T1 (r 1)	T2 (r 1)	T3 (r 1)	T1 (r 2)	T2 (r 2)	T3 (r 2)	T1 (r 3)	T2 (r 3)	T3 (r 3)
9.8	9.9	11.3	10.1	9.5	10.7	9.8	10.0	10.7
9.2	9.1	10.3	8.6	9.1	10.7	9.2	9.4	10.2
8.4	8.6	9.8	7.9	8.0	10.1	8.0	8.0	10.1

Alternatively, open the test workbook using the file open function of the file menu. Then select Replicate Two Way from the analysis of variance section of the analysis menu. Enter the number of repeats as three and select the columns marked "T1 (rep 1)" etc. when prompted for the subject (row) by treatment (column) data for each repeat.

For this example:

Two way randomized block analysis of variance with repeated observations

Variables: (T1 (rep 1), T2 (rep 1), T3 (rep 1)) (T1 (rep 2), T2 (rep 2), T3 (rep 2)) (T1 (rep 3), T2 (rep 3), T3 (rep 3))

Source of Variation	Sum Squares	DF	Mean Square
Blocks (rows)	9.26	2	4.63
Treatments (columns)	11.78	2	5.89
Interaction	0.74	4	0.185
Residual (error)	1.32	18	0.073333
Corrected total	23.1	26

F (VR blocks) = 63.136364 P < .0001

F (VR treatments) = 80.318182 P < .0001

F (VR interaction) = 2.522727 P = .0771

Here we see a statistically highly significant difference between mean clotting times across the treatment groups. If the F statistic for interaction had been significant then there would have been little point in drawing conclusions about independent block and treatment effects from the other F statistics.
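The partition of the total sum of squares into blocks, treatments, interaction and residual can be sketched as follows. This is an illustration in plain Python that assumes a complete design with no missing replicates; the nested list layout and variable names are assumptions made for the example.

```python
# Clotting times: reps[r][subject][treatment], three repeats of a 3 x 3 layout.
reps = [
    [[9.8, 9.9, 11.3], [9.2, 9.1, 10.3], [8.4, 8.6, 9.8]],
    [[10.1, 9.5, 10.7], [8.6, 9.1, 10.7], [7.9, 8.0, 10.1]],
    [[9.8, 10.0, 10.7], [9.2, 9.4, 10.2], [8.0, 8.0, 10.1]],
]
r, b, k = len(reps), len(reps[0]), len(reps[0][0])
values = [y for rep in reps for row in rep for y in row]
correction = sum(values) ** 2 / (r * b * k)

# Totals over the replicates for each cell, block (row) and treatment (column).
cell = [[sum(rep[j][i] for rep in reps) for i in range(k)] for j in range(b)]
block_tot = [sum(row) for row in cell]
treat_tot = [sum(col) for col in zip(*cell)]

ss_total = sum(y * y for y in values) - correction
ss_blocks = sum(t * t for t in block_tot) / (k * r) - correction
ss_treat = sum(t * t for t in treat_tot) / (b * r) - correction
ss_cells = sum(c * c for row in cell for c in row) / r - correction
ss_inter = ss_cells - ss_blocks - ss_treat        # block by treatment interaction
ss_error = ss_total - ss_cells                    # variation within cells

ms_error = ss_error / (b * k * (r - 1))
f_blocks = (ss_blocks / (b - 1)) / ms_error
f_treat = (ss_treat / (k - 1)) / ms_error
f_inter = (ss_inter / ((b - 1) * (k - 1))) / ms_error

print(round(ss_blocks, 2), round(ss_treat, 2), round(ss_inter, 2), round(ss_error, 2))
```

All three variance ratios here use the residual (within cell) mean square as the denominator, which is what the F values in the output above correspond to.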


Multiple comparisons in ANOVA

StatsDirect provides functions for multiple comparison (simultaneous inference), specifically all pairwise comparisons and all comparisons with a control. For k groups there are k(k-1)/2 possible pairwise comparisons.

Tukey (Tukey-Kramer if unequal group sizes), Scheffé, Bonferroni and Newman-Keuls methods are provided for all pairwise comparisons (Armitage and Berry, 1994; Wallenstein, 1980; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). Dunnett's method is used for multiple comparisons with a control group (Hsu, 1996).

For k groups, ANOVA can be used to look for a difference across k group means as a whole. If there is a statistically significant difference across k means then a multiple comparison method can be used to look for specific differences between pairs of groups. The reason that two sample methods should not be used to make multiple pairwise comparisons is that they are not designed for repeat testing in a "data dredging" manner.

If 20 repeat pairwise tests are made then you cannot accept the conventional 1 in 20 chance of being wrong as a cut-off level for statistical inference, i.e. there is a higher risk of type I error. A simple solution to this problem is to reduce the cut-off for statistical significance with increasing numbers of contrasts made; Bonferroni's method does just this with multiple t tests. More sophisticated methods, such as Tukey(-Kramer), consider the statistical distributions associated with systematic repeated testing; both Tukey(-Kramer) and Newman-Keuls methods are based upon the Studentized range statistic. Scheffé's method gives a very conservative/cautious weighting against the risk of type I error and is therefore less powerful for the detection of true differences. The most acceptable general method for all pairwise comparisons is Tukey(-Kramer), the P values for which are exact with balanced designs (Hsu, 1996).
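The arithmetic behind the Bonferroni cut-off can be illustrated with a short Python sketch. The group labels are hypothetical, and the familywise error figure assumes independent tests purely for illustration; pairwise comparisons on the same groups are not independent, so the figure is a rough guide only.

```python
from itertools import combinations

alpha = 0.05                          # conventional cut-off for a single test
groups = ["A", "B", "C", "D"]         # hypothetical group labels, k = 4
pairs = list(combinations(groups, 2))
m = len(pairs)                        # k(k-1)/2 pairwise comparisons

# Chance of at least one false positive across m tests, assuming (for
# illustration only) that the tests are independent.
familywise = 1 - (1 - alpha) ** m

# Bonferroni: divide the overall cut-off by the number of comparisons.
bonferroni_cutoff = alpha / m

print(m, round(familywise, 3), round(bonferroni_cutoff, 4))
```

With four groups there are six pairwise comparisons, so each comparison is judged against 0.05/6 rather than 0.05.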

The outputs from the different multiple contrast methods are displayed in decreasing order based upon the absolute value of the difference between the means of the two groups compared for each contrast. The word "stop" is shown next to the first non-significant P value; this indicates that you should not consider further contrasts if you are making a simultaneous analysis (similar to the Shaffer-Holm method).

The following is a decision tree for selecting a multiple contrast method:

·pairwise

	·equal group sizes: Tukey

	·unequal group sizes: Tukey-Kramer or Scheffé

·not pairwise

	·with a control: Dunnett

	·planned: Bonferroni

	·not planned: Scheffé

Note that Bonferroni and Scheffé methods are completely general; they can be used for unplanned (a posteriori) or planned (a priori) multiple comparisons.

This is a controversial area in statistics and you would be wise to seek the advice of a statistician at the design stage of your study. In general you should design experiments so that you can avoid having to "dredge" groups of data for differences; decide which contrasts you are interested in at the outset. Note that multiple independent comparisons (e.g. multiple t or Mann-Whitney tests) may be justified if you identify the comparisons as valid at the design stage of your investigation.

Other statistical software may refer to LSD (least significant difference) methods; please note that the Bonferroni technique described above is an LSD method.


Fully nested random analysis of variance

Menu location: Analysis_Analysis of Variance_Fully Nested.

This function calculates ANOVA for a fully nested random (hierarchical or split-plot) study design. One level of subgrouping is supported and subgroups may be of unequal sizes. Corrected treatment and subgroup means are given.

You should seek expert statistical guidance before using this method.

If each treatment/exposure group in a study contains treatment/exposure sub-groups then data for nested analysis of variance may be set out as follows:

Hospital 1	Hospital 2
ward 1	ward 2	ward 3	ward 1	ward 2	ward 3
x	x	x	x	x	x	<--- patients
x	x	x	x	x	x
x	x	x	x	x	x
x	x		x	x	x
x		x		x
x				x

The effects (treatments and their subgroups) in this type of study are often random but the same basic calculations are used for models with fixed effects or for mixed models. The variance ratios given are based on a fixed effects model but you can use the mean square results to calculate any other variance ratio of interest. A good account is given by Snedecor and Cochran (1989).

·For a fixed effects model use the "F (VR between groups)" statistic.

·For a random effects model use the "F (using group/subgroup msqr)" statistic.

Technical Validation

ANOVA for a three factor fully random nested (split-plot) model is calculated as follows (Snedecor and Cochran, 1989):
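Using the symbols defined in the next paragraph, one standard way to write these sums of squares is given below. This is a reconstruction from the definitions under the usual computational form for a nested design, and is offered as an assumption rather than as the exact StatsDirect formula:

```latex
\begin{aligned}
SS_{total} &= \sum_{i=1}^{g}\sum_{j=1}^{s_i}\sum_{k=1}^{n_{ij}} X_{ijk}^2
  - \frac{\left(\sum_{i=1}^{g}\sum_{j=1}^{s_i}\sum_{k=1}^{n_{ij}} X_{ijk}\right)^2}{N} \\
SS_{groups} &= \sum_{i=1}^{g} \frac{\left(\sum_{j=1}^{s_i}\sum_{k=1}^{n_{ij}} X_{ijk}\right)^2}{\sum_{j=1}^{s_i} n_{ij}}
  - \frac{\left(\sum_{i=1}^{g}\sum_{j=1}^{s_i}\sum_{k=1}^{n_{ij}} X_{ijk}\right)^2}{N} \\
SS_{subgroups\,(group\ i)} &= \sum_{j=1}^{s_i} \frac{\left(\sum_{k=1}^{n_{ij}} X_{ijk}\right)^2}{n_{ij}}
  - \frac{\left(\sum_{j=1}^{s_i}\sum_{k=1}^{n_{ij}} X_{ijk}\right)^2}{\sum_{j=1}^{s_i} n_{ij}}
\end{aligned}
```

Summing the subgroup term over all g groups gives the "between subgroups within groups" sum of squares, and the residual is the total minus the group and subgroup terms.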

- where Xijk is the kth observation from the jth subgroup of the ith group, g is the number of groups, SStotal is the total sum of squares, SSgroups is the sum of squares due to the group factor, SSsubgroups (group i) is the sum of squares due to the subgroup factor of group i, si is the number of subgroups in the ith group, nij is the number of observations in the jth subgroup of the ith group and N is the total number of observations.

Hocking (1985) describes potential instabilities of this calculation; you should therefore seek expert statistical guidance before using it.

Example

From Snedecor & Cochran (1989).

Test workbook (ANOVA worksheet: P1L1, P1L2, P1L3, P2L1, P2L2, P2L3, P3L1, P3L2, P3L3, P4L1, P4L2, P4L3).

The following data represent calcium measurements from the leaves of turnip greens. The groups represent 4 plants and the subgroups represent 3 leaves taken from each plant. 2 samples were taken from each leaf for calcium measurement.

	Plant 1
leaf 1	leaf 2	leaf 3
x	x	x	<--- sample
x	x	x

To analyse these data in StatsDirect you must first enter them in the workbook using a separate column for each subgroup:

P = plant

L = leaf

P1L1	P1L2	P1L3	P2L1	P2L2	P2L3	P3L1	P3L2	P3L3	P4L1	P4L2	P4L3
3.28	3.52	2.88	2.46	1.87	2.19	2.77	3.74	2.55	3.78	4.07	3.31
3.09	3.48	2.80	2.44	1.92	2.19	2.66	3.44	2.55	3.87	4.12	3.31

Alternatively, open the test workbook using the file open function of the file menu. Then select Fully Nested from the analysis of variance section of the analysis menu. Enter the number of groups as four and then select the four sets of subgroups marked "P1L1" (i.e. Plant 1 Leaf 1) etc. Each subgroup should be selected by a single selection action.

For this example:

Fully nested/hierarchical random analysis of variance

Variables: (P1L1, P1L2, P1L3) (P2L1, P2L2, P2L3) (P3L1, P3L2, P3L3) (P4L1, P4L2, P4L3)

Source of Variation	Sum Squares	DF	Mean Square
Between Groups	7.560346	3	2.520115
Between Subgroups within Groups	2.6302	8	0.328775
Residual	0.07985	12	0.006654
Total	10.270396	23

F (VR between groups) = 378.727406 P < .0001

F (using group/subgroup msqr) = 7.665167 P = .0097

F (VR between subgroups within groups) = 49.408892 P < .0001

The "F (VR between groups)" statistic assumes a fixed effects model. For this example, which assumes random effects, use the "F (using group/subgroup msqr)" statistic; this treats the residual sum of squares as the samples sum of squares. For the null hypothesis of zero group variance, consider 2.5201/0.3288 (= 7.66 on an F(3,8) distribution) instead of 2.5201/0.0067 (= 379 on an F(3,12) distribution) because the point of randomization has been re-defined. The "F (VR between subgroups within groups)" statistic clearly rejects the null hypothesis of zero subgroup-in-group variance.

The analysis shows that the plants contribute most to the overall variability and the leaves also have a statistically significant contribution. The samples from each leaf, as reflected by the residual sum of squares, contribute relatively little to the overall variability. Sampling further plants or leaves is, therefore, more important than taking multiple samples per leaf.
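The three variance ratios discussed above differ only in which mean square is used as the denominator. The following plain Python sketch reproduces them from the turnip data; the nested list layout and variable names are assumptions made for the example, and P values are not computed.

```python
# Calcium measurements: plants[p][l] holds the two samples from leaf l of plant p.
plants = [
    [[3.28, 3.09], [3.52, 3.48], [2.88, 2.80]],
    [[2.46, 2.44], [1.87, 1.92], [2.19, 2.19]],
    [[2.77, 2.66], [3.74, 3.44], [2.55, 2.55]],
    [[3.78, 3.87], [4.07, 4.12], [3.31, 3.31]],
]
values = [x for plant in plants for leaf in plant for x in leaf]
N = len(values)
correction = sum(values) ** 2 / N

ss_total = sum(x * x for x in values) - correction
ss_groups = sum(
    sum(map(sum, plant)) ** 2 / sum(map(len, plant)) for plant in plants
) - correction
leaf_terms = sum(sum(leaf) ** 2 / len(leaf) for plant in plants for leaf in plant)
ss_subgroups = leaf_terms - correction - ss_groups   # leaves within plants
ss_residual = ss_total - ss_groups - ss_subgroups    # samples within leaves

df_groups = len(plants) - 1
df_subgroups = sum(len(plant) - 1 for plant in plants)
df_residual = N - sum(len(plant) for plant in plants)

ms_groups = ss_groups / df_groups
ms_subgroups = ss_subgroups / df_subgroups
ms_residual = ss_residual / df_residual

f_fixed = ms_groups / ms_residual          # "F (VR between groups)"
f_random = ms_groups / ms_subgroups        # "F (using group/subgroup msqr)"
f_subgroups = ms_subgroups / ms_residual   # "F (VR between subgroups within groups)"

print(round(f_fixed, 2), round(f_random, 4), round(f_subgroups, 3))
```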


Latin square analysis of variance

Menu location: Analysis_Analysis of Variance_Latin.

This function calculates ANOVA for a special three factor design known as Latin squares.

The Latin square design applies when there are repeated exposures/treatments and two other factors. This design avoids the excessive numbers required for full three way ANOVA.

An example of a Latin square design is the response of 5 different rats (factor 1) to 5 different treatments (repeated blocks A to E) when housed in 5 different types of cage (factor 2):

		Rat
		1	2	3	4	5
	1	A	E	C	D	A
	2	E	B	A	B	C
Cage	3	C	D	E	D	D
	4	D	C	B	C	B
	5	B	A	D	A	E

This special sort of balancing means that the systematic variation between rows, or similarity between columns, does not affect the comparison of treatments.

The Latin square is probably underused in most fields of research because textbook examples tend to be restricted to agriculture, the area which spawned most original work on ANOVA. Agricultural examples often reflect geographical designs where rows and columns are literally two dimensions of a grid in a field. Rows and columns can be any two sources of variation in an experiment. In this sense a Latin square is a generalisation of a randomized block design with two different blocking systems. Armitage and Berry (1994) discuss medical applications of this method. Further details are given by Cochran and Cox (1957).

The variance ratio test statistics given by StatsDirect for this design are valid only when an additive model applies (Armitage and Berry, 1994).

Automatic column comparisons are not given here. If you want to make linear contrasts between row or column means then you can use the residual mean square of the Latin square as the variance estimate. This estimate is not reliable if the additive model does not apply.

Technical Validation

The Latin square ANOVA for three factors without interaction is calculated as follows (Armitage and Berry, 1994; Cochran and Cox, 1957):
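Using the symbols defined in the next paragraph, and assuming the usual computational form for an a by a Latin square (a reconstruction from those definitions, not a copy of the StatsDirect formula), the sums of squares are:

```latex
\begin{aligned}
SS_{total} &= \sum_{i=1}^{a}\sum_{j=1}^{a} X_{ijk}^2 - \frac{G^2}{a^2} \\
SS_{rows} &= \frac{\sum_{i=1}^{a} R_i^2}{a} - \frac{G^2}{a^2} \\
SS_{columns} &= \frac{\sum_{j=1}^{a} C_j^2}{a} - \frac{G^2}{a^2} \\
SS_{treatments} &= \frac{\sum_{k=1}^{a} T_k^2}{a} - \frac{G^2}{a^2}
\end{aligned}
```

The residual sum of squares is the total minus the row, column and treatment terms, on (a-1)(a-2) degrees of freedom.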

- where Xijk is the observation from the ith row of the jth column with the kth treatment, G is the grand total of all observations, Ri is the total for the ith row, Cj is the total for the jth column, Tk is the total for the kth treatment, SStotal is the total sum of squares, SSrows is the sum of squares due to the rows, SScolumns is the sum of squares due to the columns, SStreatments is the sum of squares due to the treatments and a is the number of rows, columns or treatments.

Example

From Armitage and Berry (1994, p. 236).

Test workbook (ANOVA worksheet: Observations; Rabbit; Position; Order).

Armitage quotes a paper which reported an experiment that had been designed as a Latin square. The skins of rabbits' backs were inoculated with a diffusing factor in six separate sites. Six rabbits were therefore used and the order in which the sites were inoculated was done six different ways. The outcome measured was area of blister (cm²). The overall objective was to see whether or not the order of administration affected this outcome. The experimental design and data are represented in the Latin square below.

		Rabbit
		1	2	3	4	5	6
	a	iii	v	iv	i	vi	ii
		7.9	8.7	7.4	7.4	7.1	8.2

	b	iv	ii	vi	v	iii	i
		6.1	8.2	7.7	7.1	8.1	5.9

	c	i	iii	v	vi	ii	iv
		7.5	8.1	6	6.4	6.2	7.5
Position
	d	vi	i	iii	ii	iv	v
		6.9	8.5	6.8	7.7	8.5	8.5

	e	ii	iv	i	iii	v	vi
		6.7	9.9	7.3	6.4	6.4	7.3

	f	v	vi	ii	iv	i	iii
		7.3	8.3	7.3	5.8	6.4	7.7

To analyse these data in StatsDirect you must first enter them in the workbook with a separate column for the observations, the column classifier (factor 1), the row classifier (factor 2) and the treatment/Latin/randomised classifier (factor 3):

Observation	Rabbit	Position	Order
7.9	1	1	3
6.1	1	2	4
7.5	1	3	1
6.9	1	4	6
6.7	1	5	2
7.3	1	6	5
8.7	2	1	5
8.2	2	2	2
8.1	2	3	3
8.5	2	4	1
9.9	2	5	4
8.3	2	6	6
7.4	3	1	4
7.7	3	2	6
6	3	3	5
6.8	3	4	3
7.3	3	5	1
7.3	3	6	2
7.4	4	1	1
7.1	4	2	5
6.4	4	3	6
7.7	4	4	2
6.4	4	5	3
5.8	4	6	4
7.1	5	1	6
8.1	5	2	3
6.2	5	3	2
8.5	5	4	4
6.4	5	5	5
6.4	5	6	1
8.2	6	1	2
5.9	6	2	1
7.5	6	3	4
8.5	6	4	5
7.3	6	5	6
7.7	6	6	3

Alternatively, open the test workbook using the file open function of the file menu. Then select Latin square from the analysis of variance section of the analysis menu. First select the observations data, then the column, row and treatment classifiers respectively.

For this example:

Latin square test

Factors: Rabbit, Position, Order.

Source of Variation	Sum Squares	DF	Mean Square
Rows	3.833333	5	0.766667
Columns	12.833333	5	2.566667
Treatments	0.563333	5	0.112667
Residual	13.13	20	0.6565
Total	30.36	35

F (rows) = 1.167809, P = .3592

F (columns) = 3.909622, P = .0124

F (treatments) = 0.171617, P = .9701

Here we see that the order of administration does not have a statistically significant effect on blistering but that inter-rabbit variation was significant.
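The table above can be checked from the workbook data with a short Python sketch. It assumes the additive Latin square model; the two nested lists simply restate the Observation and Order columns grouped by rabbit, with positions 1 to 6 in order, and the variable names are illustrative.

```python
# area[rabbit][position]: blister area (cm squared); rabbits are the columns
# of the square and positions are the rows.
area = [
    [7.9, 6.1, 7.5, 6.9, 6.7, 7.3],
    [8.7, 8.2, 8.1, 8.5, 9.9, 8.3],
    [7.4, 7.7, 6.0, 6.8, 7.3, 7.3],
    [7.4, 7.1, 6.4, 7.7, 6.4, 5.8],
    [7.1, 8.1, 6.2, 8.5, 6.4, 6.4],
    [8.2, 5.9, 7.5, 8.5, 7.3, 7.7],
]
# order[rabbit][position]: order of inoculation (the treatment), 1 to 6.
order = [
    [3, 4, 1, 6, 2, 5],
    [5, 2, 3, 1, 4, 6],
    [4, 6, 5, 3, 1, 2],
    [1, 5, 6, 2, 3, 4],
    [6, 3, 2, 4, 5, 1],
    [2, 1, 4, 5, 6, 3],
]
a = len(area)
correction = sum(map(sum, area)) ** 2 / a ** 2    # G squared over a squared

def ss_from_totals(totals):
    """Sum of squares for a factor from its a level totals."""
    return sum(t * t for t in totals) / a - correction

column_tot = [sum(r) for r in area]               # one total per rabbit
row_tot = [sum(p) for p in zip(*area)]            # one total per position
treat_tot = [0.0] * a
for obs, ords in zip(area, order):
    for y, o in zip(obs, ords):
        treat_tot[o - 1] += y

ss_total = sum(y * y for r in area for y in r) - correction
ss_rows = ss_from_totals(row_tot)
ss_columns = ss_from_totals(column_tot)
ss_treat = ss_from_totals(treat_tot)
ss_resid = ss_total - ss_rows - ss_columns - ss_treat

ms_resid = ss_resid / ((a - 1) * (a - 2))
f_rows = (ss_rows / (a - 1)) / ms_resid
f_columns = (ss_columns / (a - 1)) / ms_resid
f_treat = (ss_treat / (a - 1)) / ms_resid

print(round(f_rows, 4), round(f_columns, 4), round(f_treat, 4))
```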


Crossover tests

Menu location: Analysis_Analysis of Variance_Crossover.

This function calculates a number of test statistics for simple crossover trials.

If a group of subjects is exposed to two different treatments A and B then a crossover trial would involve half of the subjects being exposed to A then B and the other half to B then A. A washout period is allowed between the two exposures and the subjects are randomly allocated to one of the two orders of exposure. The periods when the groups are exposed to the treatments are known as period 1 and period 2. This function evaluates treatment effects, period effects and treatment-period interaction. For further information please refer to Armitage and Berry (1994).

Please note that the treatment-period interaction statistic is included for interest only; two-stage procedures are not now recommended for crossover trials (Senn, 1993).

Technical Validation

Statistics for the analysis of crossover trials, with optional baseline run-in observations, are calculated as follows (Armitage and Berry, 1994; Senn, 1993):

- where m is the number of observations in the first group (say drug first); n is the number of observations in the second group (say placebo first); XDi is an observation from the drug treated arm in the first group; XPi is an observation from the placebo arm in the first group; XDj is an observation from the drug treated arm in the second group; XPj is an observation from the placebo arm in the second group; trelative is the test statistic, distributed as Student t on n+m-1 degrees of freedom, for the relative effectiveness of drug vs. placebo; ttp is the test statistic, distributed as Student t on n+m-2 degrees of freedom, for the treatment-period interaction; and ttreatment and tperiod are the test statistics, distributed as Student t on n+m-2 degrees of freedom, for the treatment and period effect sizes respectively (null hypothesis = 0). Any baseline observations are subtracted from the relevant observations before the above are calculated.

Example

From Armitage and Berry (1994, p. 247).

Test workbook (ANOVA worksheet: Drug 1, Placebo 1, Drug 2, Placebo 2).

The following data represent the number of dry nights out of 14 in two groups of bedwetters. The first group were treated with drug X and then a placebo and the second group were treated with the placebo then drug X. An acceptable washout period was allowed between these two treatments.

Drug 1	Placebo 1	Drug 2	Placebo 2
8	5	11	12
14	10	8	6
8	0	9	13
9	7	8	8
11	6	9	8
3	5	8	4
6	0	14	8
0	0	4	2
13	12	13	8
10	2	7	9
7	5	10	7
13	13	6	7
8	10
7	7
9	0
10	6
2	2

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Crossover from the Analysis of Variance section of the analysis menu. Select the column labelled "Drug 1" when asked for drug 1, then "Placebo 1" for placebo 1. Click on the cancel button when you are asked for baseline levels. Repeat this process for drug 2 and placebo 2.

For this example:

Crossover tests

	Period 1	Period 2	Difference
Group 1	8.117647	5.294118	2.823529
Group 2	7.666667	8.916667	-1.25

Test for relative effectiveness of drug / placebo:

combined diff = 2.172414, SE = 0.61602

t = 3.526533, DF = 28, P = .0015

Test for treatment effect:

diff 1 - diff 2 = 4.073529, SE = 1.2372

effect magnitude = 2.036765, 95% CI = 0.767502 to 3.306027

t = 3.292539, DF = 27, P = .0028

Test for period effect:

diff 1 + diff 2 = 1.573529, SE = 1.2372

t = 1.271847, DF = 27, P = .2143

Test for treatment / period interaction:

sum 1 - sum 2 = -3.171569, SE = 2.440281

t = -1.299673, DF = 27, P = .2047

The absence of a statistically significant period effect or treatment-period interaction permits the use of the statistically highly significant statistic for effect of drug vs. placebo. With 95% confidence we can say that the true population value for the magnitude of the treatment effect lies somewhere between 0.77 and 3.31 extra dry nights each fortnight.
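The t statistics in this example can be checked with a short script. The following is a minimal Python sketch rather than StatsDirect's own code. It assumes that the relative effectiveness test is a paired t test on drug minus placebo differences across all subjects, and that the treatment, period and interaction tests are pooled two-sample t tests on within-subject differences (period 1 minus period 2) and within-subject totals. The names `pooled_se` and `crossover_tests` are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

# Dry nights out of 14, copied from the example table.
drug_1 = [8, 14, 8, 9, 11, 3, 6, 0, 13, 10, 7, 13, 8, 7, 9, 10, 2]
placebo_1 = [5, 10, 0, 7, 6, 5, 0, 0, 12, 2, 5, 13, 10, 7, 0, 6, 2]
drug_2 = [11, 8, 9, 8, 9, 8, 14, 4, 13, 7, 10, 6]
placebo_2 = [12, 6, 13, 8, 8, 4, 8, 2, 8, 9, 7, 7]


def pooled_se(a, b):
    """Standard error of mean(a) - mean(b) using a pooled variance."""
    m, n = len(a), len(b)
    pooled_var = ((m - 1) * variance(a) + (n - 1) * variance(b)) / (m + n - 2)
    return sqrt(pooled_var * (1 / m + 1 / n))


def crossover_tests(d1, p1, d2, p2):
    """Group 1 took the drug first (d1 then p1); group 2 took placebo first."""
    # Period 1 minus period 2 for each subject.
    diff_1 = [d - p for d, p in zip(d1, p1)]
    diff_2 = [p - d for d, p in zip(d2, p2)]
    # Period 1 plus period 2 for each subject.
    sum_1 = [d + p for d, p in zip(d1, p1)]
    sum_2 = [d + p for d, p in zip(d2, p2)]

    # Relative effectiveness: paired t on drug minus placebo, all subjects.
    paired = diff_1 + [-x for x in diff_2]
    t_relative = mean(paired) / sqrt(variance(paired) / len(paired))

    se_diff = pooled_se(diff_1, diff_2)
    treatment = mean(diff_1) - mean(diff_2)
    period = mean(diff_1) + mean(diff_2)
    interaction = mean(sum_1) - mean(sum_2)

    return {
        "t_relative": t_relative,            # n + m - 1 degrees of freedom
        "effect": treatment / 2,             # treatment effect magnitude
        "t_treatment": treatment / se_diff,  # n + m - 2 degrees of freedom
        "t_period": period / se_diff,
        "t_interaction": interaction / pooled_se(sum_1, sum_2),
    }


results = crossover_tests(drug_1, placebo_1, drug_2, placebo_2)
for name, value in results.items():
    print(f"{name} = {value:.6f}")
```

P values and the confidence interval are omitted because they need a Student t distribution function, which the Python standard library does not provide.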

Kruskal-Wallis test

Menu location: Analysis_Analysis of Variance_Kruskal-Wallis.

This is a method for comparing several independent random samples and can be used as a non-parametric alternative to the one way ANOVA.

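The Kruskal-Wallis test statistic for k samples, each of size n_i, is:

$$T = \frac{1}{S^2}\left[\sum_{i=1}^{k}\frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4}\right]$$

- where N is the total number of observations (the sum of all n_i) and R_i is the sum of the ranks (from all samples pooled) for the ith sample and:

$$S^2 = \frac{1}{N-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{n_i} R_{ij}^2 - \frac{N(N+1)^2}{4}\right]$$

- where R_ij is the pooled rank of the jth observation in the ith sample.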

The null hypothesis of the test is that all k distribution functions are equal. The alternative hypothesis is that at least one of the populations tends to yield larger values than at least one of the other populations.

Assumptions:

·random samples from populations

·independence within each sample

·mutual independence among samples

·measurement scale is at least ordinal

·either the k population distribution functions are identical, or else some of the populations tend to yield larger values than other populations

If the test is significant, you can make multiple comparisons between the samples. You may choose the level of significance for these comparisons (default is α = 0.05). All pairwise comparisons are made and the probability of each presumed "non-difference" is indicated (Conover, 1999; Critchlow and Fligner, 1991; Hollander and Wolfe, 1999). Two alternative methods are used to make all possible pairwise comparisons between groups; these are Dwass-Steel-Critchlow-Fligner and Conover-Inman. In most situations, you should use the Dwass-Steel-Critchlow-Fligner result.

By the Dwass-Steel-Critchlow-Fligner procedure, a contrast is considered significant if the following inequality is satisfied:

- where q is a quantile from the normal range distribution for k groups, n_i is the size of the ith group, n_j is the size of the jth group, t_b is the number of ties at rank b and W_ij is the sum of the ranks for the ith group where observations for both groups have been ranked together. The values either side of the greater than sign are displayed in parentheses in StatsDirect results.

The Conover-Inman procedure is simply Fisher's least significant difference method performed on ranks. A contrast is considered significant if the following inequality is satisfied:

- where t is a quantile from the Student t distribution on N-k degrees of freedom. The values either side of the greater than sign are displayed in parentheses in StatsDirect results.

An alternative to Kruskal-Wallis is to perform a one way ANOVA on the ranks of the observations.

StatsDirect also gives you a homogeneity of variance test option with Kruskal-Wallis; this is marked as "Equality of variance (squared ranks)". Please refer to homogeneity of variance for more details.

Technical Validation

The test statistic is an extension of the Mann-Whitney test and is calculated as above. In the presence of tied ranks the test statistic is given in adjusted and unadjusted forms (opinion varies concerning the handling of ties). The test statistic follows approximately a chi-square distribution with k-1 degrees of freedom; P values are derived from this. For small samples you may wish to refer to tables of the Kruskal-Wallis test statistic but the chi-square approximation is highly satisfactory in most cases (Conover, 1999).

Example

From Conover (1999, p. 291).

Test workbook (ANOVA worksheet: Method 1, Method 2, Method 3, Method 4).

The following data represent corn yields per acre from four different fields where different farming methods were used.

Method 1	Method 2	Method 3	Method 4
83	91	101	78
91	90	100	82
94	81	91	81
89	83	93	77
89	84	96	79
96	83	95	81
91	88	94	80
92	91		81
90	89
	84

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Kruskal-Wallis from the Non-parametric section of the analysis menu. Then select the columns marked "Method 1", "Method 2", "Method 3" and "Method 4" in one selection action.

For this example:

Adjusted for ties: T = 25.62883 P < 0.0001

All pairwise comparisons (Dwass-Steel-Critchlow-Fligner)

Method 1 and Method 2, P = 0.1529

Method 1 and Method 3, P = 0.0782

Method 1 and Method 4, P = 0.0029

Method 2 and Method 3, P = 0.0048

Method 2 and Method 4, P = 0.0044

Method 3 and Method 4, P = 0.0063

All pairwise comparisons (Conover-Inman)

Method 1 and Method 2, P = 0.0078

Method 1 and Method 3, P = 0.0044

Method 1 and Method 4, P < 0.0001

Method 2 and Method 3, P < 0.0001

Method 2 and Method 4, P = 0.0001

Method 3 and Method 4, P < 0.0001

From the overall T we see a statistically highly significant tendency for at least one group to give higher values than at least one of the others. Subsequent contrasts show a significant separation of all groups with the Conover-Inman method and all but method 1 vs. methods 2 and 3 with the Dwass-Steel-Critchlow-Fligner method. In most situations, it is best to use only the Dwass-Steel-Critchlow-Fligner result.
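The overall statistic for this example can be recomputed from the raw yields. The sketch below is a plain Python illustration, not StatsDirect's implementation. It assumes the tie-adjusted statistic takes Conover's form, T = (sum of R_i^2/n_i - N(N+1)^2/4) / S^2, with S^2 computed from the squared pooled midranks, and that the unadjusted statistic is the usual 12/(N(N+1)) form. The helper names `midranks` and `kruskal_wallis` are hypothetical.

```python
def midranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def kruskal_wallis(samples):
    """Return (tie-adjusted T, unadjusted T) for a list of samples."""
    pooled = [x for sample in samples for x in sample]
    ranks = midranks(pooled)
    n_total = len(pooled)

    # Sum over samples of (rank sum)^2 / sample size.
    weighted = 0.0
    offset = 0
    for sample in samples:
        rank_sum = sum(ranks[offset:offset + len(sample)])
        weighted += rank_sum ** 2 / len(sample)
        offset += len(sample)

    centre = n_total * (n_total + 1) ** 2 / 4
    s_squared = (sum(r * r for r in ranks) - centre) / (n_total - 1)
    adjusted = (weighted - centre) / s_squared
    unadjusted = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return adjusted, unadjusted


# Corn yields copied from the example table.
method_1 = [83, 91, 94, 89, 89, 96, 91, 92, 90]
method_2 = [91, 90, 81, 83, 84, 83, 88, 91, 89, 84]
method_3 = [101, 100, 91, 93, 96, 95, 94]
method_4 = [78, 82, 81, 77, 79, 81, 80, 81]

adjusted_t, unadjusted_t = kruskal_wallis([method_1, method_2, method_3, method_4])
print(f"Adjusted for ties: T = {adjusted_t:.5f}")
print(f"Unadjusted: T = {unadjusted_t:.5f}")
```

The adjusted value agrees with the T reported above to about four decimal places. The P value is not computed here because it needs a chi-square distribution function on k-1 degrees of freedom.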

Friedman test

Menu location: Analysis_Analysis of Variance_Friedman.

This method compares several related samples and can be used as a non-parametric alternative to the two way ANOVA.

The power of this method is low with small samples but it is the best method for non-parametric two way analysis of variance with sample sizes above five.

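The Iman-Davenport T2 variant of the Friedman test statistic is:

$$T_2 = \frac{(b-1)\,T_1}{b(k-1) - T_1}$$

- where there are k treatments and b blocks and T1 is:

$$T_1 = \frac{(k-1)\sum_{j=1}^{k}\left(R_j - \frac{b(k+1)}{2}\right)^2}{A_1 - C_1}$$

- where Rj is the sum of the ranks (from pooled observations) for all blocks in one treatment and A1 and C1 are:

$$A_1 = \sum_{i=1}^{b}\sum_{j=1}^{k} R(X_{ij})^2, \qquad C_1 = \frac{bk(k+1)^2}{4}$$

- where R(X_ij) is the rank given to treatment j in block i.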

Assumptions:

·results in one block don't affect results in other blocks

·observations in a block are ranked by a criterion of interest

The null hypothesis of the test is that the treatments have identical effects. The alternative hypothesis is that at least one of the treatments tends to yield larger values than at least one of the other treatments.

When the test is significant StatsDirect allows you to make multiple comparisons between the individual samples. These comparisons are performed automatically for all possible contrasts and you are informed of the statistical significance of each contrast. A contrast is considered significant if the following inequality is satisfied:

- where t is a quantile from the Student t distribution on (b-1)(k-1) degrees of freedom. This method is a nonparametric equivalent to Fisher's least significant difference method (Conover, 1999).

An alternative to the Friedman test is to perform two way ANOVA on ranks; this is how the T2 statistic was derived.

Cochran's Q, Kendall's W and Quade

Kendall's W coefficient of concordance test (also attributed to Wallis and Babington-Smith independently) gives the same numerical answers as Friedman's test.

Quade proposed a slightly different method for testing the same hypotheses as described above for Friedman's method. Friedman's test generally performs better than Quade's test and should be used instead.

Cochran's Q test can be performed using this Friedman test function by entering dichotomous data coded as in the example below (Conover, 1999):

	Sportsman
Game	1	2	3
1	1	1	1
2	1	1	1
3	0	1	0
4	1	1	0
5	0	0	0
6	1	1	1
7	1	1	1
8	1	1	0
9	0	0	1
10	0	1	0
11	1	1	1
12	1	1	1

- here Conover (1999) describes how three people ran their own separate scoring systems for predicting the outcomes of basketball games; the table above shows 1 if they predicted the outcome correctly and 0 if not for 12 games.
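The link between Cochran's Q and the Friedman statistic can be illustrated numerically. The sketch below is an assumption-laden illustration, not StatsDirect code: it computes Cochran's Q directly from the 0/1 table using the textbook formula, then computes the tie-corrected Friedman statistic (T1 in Conover's notation) on midranks assigned within each game, and shows that the two coincide for these data. The function names are hypothetical.

```python
# Rows are games 1..12; columns are sportsmen 1..3 (1 = correct prediction).
games = [
    (1, 1, 1), (1, 1, 1), (0, 1, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1),
    (1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1), (1, 1, 1),
]


def cochran_q(rows):
    """Cochran's Q from a blocks-by-treatments table of 0/1 outcomes."""
    k = len(rows[0])
    column_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    grand_total = sum(row_totals)
    numerator = k * (k - 1) * sum((c - grand_total / k) ** 2 for c in column_totals)
    denominator = sum(r * (k - r) for r in row_totals)
    return numerator / denominator


def block_midranks(row):
    """Midranks of one block: tied values share the mean of their ranks."""
    ranks = []
    for value in row:
        smaller = sum(1 for other in row if other < value)
        equal = sum(1 for other in row if other == value)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def friedman_t1(rows):
    """Tie-corrected Friedman statistic on within-block midranks."""
    b, k = len(rows), len(rows[0])
    ranked = [block_midranks(row) for row in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    a1 = sum(x * x for r in ranked for x in r)
    c1 = b * k * (k + 1) ** 2 / 4
    return (k - 1) * sum((rj - b * (k + 1) / 2) ** 2 for rj in rank_sums) / (a1 - c1)


print(f"Cochran's Q = {cochran_q(games):.4f}")
print(f"Friedman T1 = {friedman_t1(games):.4f}")
```

Both lines print 2.8000 for this table, which is why dichotomous data can be passed through a tie-aware Friedman routine to obtain Cochran's Q.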

Technical Validation

The overall test statistic is T2 calculated as above (Iman and Davenport, 1980). T2 is approximately distributed as an F random variable with k-1 numerator and (b-1)(k-1) denominator degrees of freedom; this is how the P value is derived. Older literature and some software use an alternative statistic that is tested against a chi-square distribution; the method used in StatsDirect is more accurate (Conover, 1999).

Example

From Conover (1999, p. 372).

Test workbook (ANOVA worksheet: Grass 1, Grass 2, Grass 3, Grass 4).

The following data represent the rank preferences of twelve home owners for four different types of grass planted in their gardens for a trial period. They considered defined criteria before ranking each grass between 1 (best) and 4 (worst).

Grass 1	Grass 2	Grass 3	Grass 4
4	3	2	1
4	2	3	1
3	1.5	1.5	4
3	1	2	4
4	2	1	3
2	2	2	4
1	3	2	4
2	4	1	3
3.5	1	2	3.5
4	1	3	2
4	2	3	1
3.5	1	2	3.5

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Friedman from the Non-parametric section of the analysis menu. Then select the columns marked "Grass 1", "Grass 2", "Grass 3" and "Grass 4" in one selection action.

For this example:

T2 = 3.192198 P = 0.0362

All pairwise comparisons (Conover)

Grass 1 vs. Grass 2, P = 0.0149

Grass 1 vs. Grass 3, P = 0.0226

Grass 1 vs. Grass 4, P = 0.4834

Grass 2 vs. Grass 3, P = 0.8604

Grass 2 vs. Grass 4, P = 0.0717

Grass 3 vs. Grass 4, P = 0.1017

From the overall test statistic we can conclude that there is a statistically significant tendency for at least one group to yield higher values than at least one of the other groups. Considering the raw data and the contrast results we see that grasses 2 and 3 are significantly preferred above grass 1 but that there is little to choose between 2 and 3.
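The T2 value for the grass example can be reproduced from the rank table. The following Python sketch assumes Conover's tie-corrected form of the Friedman statistic, T1 = (k-1) * sum((R_j - b(k+1)/2)^2) / (A1 - C1), with A1 the sum of all squared ranks and C1 = bk(k+1)^2/4, followed by the Iman-Davenport transformation T2 = (b-1)T1 / (b(k-1) - T1). It is an illustration only, and `friedman_t2` is a hypothetical name.

```python
# Rank preferences copied from the example table: one row per home owner,
# one column per grass type (the data are already ranks within each row).
grass_ranks = [
    (4, 3, 2, 1), (4, 2, 3, 1), (3, 1.5, 1.5, 4), (3, 1, 2, 4),
    (4, 2, 1, 3), (2, 2, 2, 4), (1, 3, 2, 4), (2, 4, 1, 3),
    (3.5, 1, 2, 3.5), (4, 1, 3, 2), (4, 2, 3, 1), (3.5, 1, 2, 3.5),
]


def friedman_t2(ranks):
    """Return (T1, T2) from a blocks-by-treatments table of within-block ranks."""
    b, k = len(ranks), len(ranks[0])
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    a1 = sum(r * r for row in ranks for r in row)  # sum of squared ranks
    c1 = b * k * (k + 1) ** 2 / 4                   # correction term
    t1 = (k - 1) * sum((rj - b * (k + 1) / 2) ** 2 for rj in rank_sums) / (a1 - c1)
    t2 = (b - 1) * t1 / (b * (k - 1) - t1)
    return t1, t2


t1, t2 = friedman_t2(grass_ranks)
print(f"T1 = {t1:.6f}, T2 = {t2:.6f}")
```

T2 is then referred to an F distribution with k-1 and (b-1)(k-1) degrees of freedom, here 3 and 33; that step is left out because it needs an F distribution function.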

Equality (homogeneity) of variance

Menu location: Analysis_Analysis of Variance (_Oneway, _Kruskal-Wallis).

StatsDirect provides parametric (Bartlett and Levene) and nonparametric (squared ranks) tests for equality/homogeneity of variance.

Most commonly used statistical hypothesis tests, such as t tests, compare means or other measures of location. Some studies need to compare variability also. Equality of variance tests can be used on their own for this purpose but they are often used alongside other methods (e.g. analysis of variance) to support assumptions made about variance. For this reason, StatsDirect presents equality of variance tests with analysis of variance.

Bartlett and Levene (parametric) tests

Two samples

Use the F test to compare the variances of two random samples from a normal distribution. Note that the F test is quite sensitive to departures from normality; if you have any doubt then please use the non-parametric equivalent described below.

More than two samples (Levene)

StatsDirect gives Levene's test as an option with One Way Analysis of Variance.

The W50 definition of the Levene test statistic (Brown and Forsythe, 1974) is used; this is essentially a one way analysis of variance on the absolute (unsigned) values of the deviations of observations from their group medians.

Levene's test assumes only that your data form random samples from continuous distributions. If you are in any doubt about this, use the squared ranks test presented below; it is generally more robust.
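The W50 description above translates almost directly into code. The sketch below is a minimal Python illustration, under the assumption that W50 is the ordinary one way ANOVA F ratio computed on absolute deviations from each group's median. It is not StatsDirect's implementation, and `levene_w50` is a hypothetical name.

```python
from statistics import mean, median


def levene_w50(groups):
    """F ratio from a one way ANOVA on |x - group median| (Brown-Forsythe W50)."""
    # Replace each observation by its absolute deviation from the group median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total

    # Between-groups and within-groups sums of squares of the deviations.
    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((z - mean(d)) ** 2 for d in deviations for z in d)

    # Compare the result with an F distribution on k-1 and N-k degrees of freedom.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Usage with the corn yield samples from the Kruskal-Wallis example.
corn = [
    [83, 91, 94, 89, 89, 96, 91, 92, 90],
    [91, 90, 81, 83, 84, 83, 88, 91, 89, 84],
    [101, 100, 91, 93, 96, 95, 94],
    [78, 82, 81, 77, 79, 81, 80, 81],
]
print(f"W50 = {levene_w50(corn):.4f}")
```

Using medians rather than means as the centre is what distinguishes W50 from Levene's original statistic, and it is the reason the test tolerates skewed data better.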

More than two samples (Bartlett)

StatsDirect gives Bartlett's test as an option with One Way Analysis of Variance.

Bartlett's test assesses equality of the variances of more than two samples from a normal distribution (Armitage and Berry, 1994).

Please note that Bartlett's test is not reliable with moderate departures from normality; use Levene's test as an alternative routinely. Bartlett's test is included here solely for the purpose of continuity with textbooks.

Squared Ranks (nonparametric) test

StatsDirect gives the squared ranks test as an option in the Kruskal-Wallis test.

The squared ranks test can be used to assess equality of variance across two or more independent, random samples which have been measured using a scale that is at least interval (Conover, 1999).

When you analyse more than two samples with the squared ranks test, StatsDirect performs an automatic comparison of all possible pair-wise contrasts as described by Conover (1999).

Assumptions of the squared ranks test:

·random samples

·independence within samples

·mutual independence between samples

·measurement scale is at least interval

Example

From Conover (1999, p. 305).

Test workbook (Nonparametric worksheet: Machine X, Machine Y).

The following data represent the weight of cereal in boxes filled by two different machines X and Y.

Machine X	Machine Y
10.8	10.8
11.1	10.5
10.4	11.0
10.1	10.9
11.3	10.8
	10.7
	10.8

To analyse these data in StatsDirect you must first prepare them in two workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Kruskal-Wallis from the Nonparametric section of the analysis menu. Then select the columns marked "Machine X" and "Machine Y" in one selection action. Ignore the Kruskal-Wallis test result and select the squared ranks variance equality test option then click on the calculate button.

For this example:

Squared ranks approximate equality of variance test (2 sample)

z = 2.327331

Two tailed P = .0199

One tailed P = .01

Here we reject the null hypothesis that the samples are from identical distributions (except for possibly different means) and we infer a statistically significant difference between the variances.
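The z statistic for this example can be recomputed by hand or with a few lines of code. The sketch below assumes Conover's two-sample squared ranks procedure with the large-sample normal approximation: absolute deviations from each sample mean are ranked together, the ranks are squared, and the sum of squared ranks for the first sample is standardised. It is an illustration rather than StatsDirect's code, and the function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean

# Cereal weights from the example (Machine Y's final value taken as 10.8).
machine_x = [10.8, 11.1, 10.4, 10.1, 11.3]
machine_y = [10.8, 10.5, 11.0, 10.9, 10.8, 10.7, 10.8]


def midranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    ranks = []
    for value in values:
        smaller = sum(1 for other in values if other < value)
        equal = sum(1 for other in values if other == value)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def squared_ranks_two_sample(x, y):
    """Return (z, two tailed P) for the two-sample squared ranks test."""
    n, m = len(x), len(y)
    total = n + m
    # Absolute deviations from each sample's own mean, ranked together.
    deviations = [abs(v - mean(x)) for v in x] + [abs(v - mean(y)) for v in y]
    squared = [r * r for r in midranks(deviations)]

    t_stat = sum(squared[:n])            # sum of squared ranks, first sample
    mean_squared = sum(squared) / total  # average squared rank
    sum_fourth = sum(s * s for s in squared)
    variance = (n * m / (total * (total - 1)) * sum_fourth
                - n * m / (total - 1) * mean_squared ** 2)

    z = (t_stat - n * mean_squared) / sqrt(variance)
    p_two_tailed = erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_two_tailed


z, p = squared_ranks_two_sample(machine_x, machine_y)
print(f"z = {z:.6f}, two tailed P = {p:.4f}")
```

The normal approximation is used here for simplicity; Conover provides exact tables for small samples with no ties.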

Agreement of continuous measurements

Menu location: Analysis_Analysis of Variance_Agreement.

The function calculates one way random effects intra-class correlation coefficient, estimated within-subjects standard deviation and a repeatability coefficient (Bland and Altman 1996a and 1996b, McGraw and Wong, 1996).

Intra-class correlation coefficient is calculated as:

$$\mathrm{ICC} = \frac{m\,SSB - SST}{(m-1)\,SST}$$

- where m is the number of observations per subject, SSB is the sum of squares between subjects and SST is the total sum of squares (as per one way ANOVA above).

Within-subjects standard deviation is estimated as the square root of the residual mean square from one way ANOVA.

The repeatability coefficient is calculated as:

- where m is the number of observations per subject, z is a quantile from the standard normal distribution (usually taken as the 5% two tailed quantile of 1.96) and ξ_w is the estimated within-subjects standard deviation (calculated as above).

Intra-subject standard deviation is plotted against intra-subject means and Kendall's rank correlation is used to assess the interdependence of these two variables.

An agreement plot is constructed by plotting the maximum differences from each possible intra-subject contrast against intra-subject means and the overall mean is marked as a line on this plot.

A Q-Q plot is given; here the sums of the differences between intra-subject observations and their means are ordered and plotted against an equal order of chi-square quantiles.

Agreement analysis is best carried out under expert statistical guidance.

Example

From Bland and Altman (1996a).

Test workbook (Agreement worksheet: 1st, 2nd, 3rd, 4th).

Four peak flow measurements were repeated for twenty children:

1st	2nd	3rd	4th
190	220	200	200
220	200	240	230
260	260	240	280
210	300	280	265
270	265	280	270
280	280	270	275
260	280	280	300
275	275	275	305
280	290	300	290
320	290	300	290
300	300	310	300
270	250	330	370
320	330	330	330
335	320	335	375
350	320	340	365
360	320	350	345
330	340	380	390
335	385	360	370
400	420	425	420
430	460	480	470

To analyse these data using StatsDirect you must first enter them into a workbook or open the test workbook. Then select Agreement from the Analysis of Variance section of the Analysis menu.

Agreement

Variables: 1st, 2nd, 3rd, 4th

Intra-class correlation coefficient (one way random effects) = 0.882276

Estimated within-subjects standard deviation = 21.459749

For within-subjects sd vs. mean, Kendall's tau b = 0.164457 two sided P = .3296

Repeatability (for alpha = 0.05) = 59.482297
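The three summary figures above can be recomputed from a one way ANOVA decomposition of the peak flow table. The sketch below makes two assumptions that should be read as such: the intra-class correlation is taken as (m*SSB - SST) / ((m-1)*SST), and the repeatability coefficient is taken as sqrt(2) * z * (within-subjects standard deviation), following Bland and Altman's definition. The function name `agreement` is hypothetical and this is not StatsDirect's code.

```python
from math import sqrt
from statistics import NormalDist, mean

# Peak flow readings copied from the example table: one row per child.
peak_flow = [
    (190, 220, 200, 200), (220, 200, 240, 230), (260, 260, 240, 280),
    (210, 300, 280, 265), (270, 265, 280, 270), (280, 280, 270, 275),
    (260, 280, 280, 300), (275, 275, 275, 305), (280, 290, 300, 290),
    (320, 290, 300, 290), (300, 300, 310, 300), (270, 250, 330, 370),
    (320, 330, 330, 330), (335, 320, 335, 375), (350, 320, 340, 365),
    (360, 320, 350, 345), (330, 340, 380, 390), (335, 385, 360, 370),
    (400, 420, 425, 420), (430, 460, 480, 470),
]


def agreement(rows, alpha=0.05):
    """Return (ICC, within-subjects SD, repeatability) for equal-sized rows."""
    n_subjects, m = len(rows), len(rows[0])
    grand_mean = mean([x for row in rows for x in row])

    # One way ANOVA sums of squares with subjects as the grouping factor.
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_between = m * sum((mean(row) - grand_mean) ** 2 for row in rows)
    ss_within = ss_total - ss_between

    icc = (m * ss_between - ss_total) / ((m - 1) * ss_total)
    # Residual mean square on n_subjects * (m - 1) degrees of freedom.
    sd_within = sqrt(ss_within / (n_subjects * (m - 1)))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    repeatability = sqrt(2) * z * sd_within  # assumed Bland-Altman definition
    return icc, sd_within, repeatability


icc, sd_within, repeatability = agreement(peak_flow)
print(f"ICC = {icc:.6f}")
print(f"Within-subjects SD = {sd_within:.6f}")
print(f"Repeatability = {repeatability:.6f}")
```

Kendall's tau b and the plots are not reproduced here; they would need the per-subject standard deviations and means computed from the same rows.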
