Content

Nonparametric Methods

Non-parametric methods

Mann-Whitney U test

Wilcoxon signed ranks test

Kendall's rank correlation

Spearman's rank correlation

Non-parametric linear regression

Cuzick's test for trend

Smirnov two sample test

Quantile confidence interval

Chi-square goodness of fit test

Kruskal-Wallis test

Friedman test

Homogeneity of variance

Ranking

Normal scores

Sorting

Pairwise data manipulation

ROC curve analysis

Gini coefficient of inequality

Diversity of classes

Download a free 10 day StatsDirect trial

Non-parametric methods

·Mann-Whitney U (Wilcoxon's rank sum) test (compare two independent samples)

·Wilcoxon's signed ranks test (compare a pair of samples)

·Spearman's rank correlation (relate two variables from a sample)

·Kendall's rank correlation (relate two variables from a sample)

·Non-parametric linear regression (straight line relationship for two variables from a sample)

·Cuzick's test for trend (detect trend across several samples)

·Two sample Smirnov test (compare distributions of two samples)

·Quantile confidence interval (e.g. median and its 95% confidence interval)

·Chi-square goodness of fit test (compare observed and expected counts)

·Kruskal-Wallis (compare several independent samples)

·Friedman (compare several samples with row by row relationship)

·Homogeneity of variance (test the similarity of data spread for several samples)

·Ranking (save the ranks of a column of numbers)

·Sorting (sort a column of numbers)

·Normal scores (save the normal scores of a column of numbers)

·Pairwise data manipulation (various treatments of all possible pairs of two variables)

·ROC curve analysis (analyse areas under receiver operating characteristic curves)

·Gini coefficient (bootstrap confidence intervals for a coefficient of inequality)

·Diversity indices (examine diversity in a list of counts)

Menu location: Analysis_Non-parametric.

This section provides various rank-based hypothesis tests and descriptive functions which do not assume that your data are from normal distributions.

Rank-based methods:

·assume that your data have an underlying continuous distribution.

·assume that for groups being compared, their parent distributions are similar in all characteristics other than location.

·are usually less sensitive than parametric methods.

·are often more robust than parametric methods when their assumptions are properly met.

·are preferred by some statisticians, while others prefer to use parametric methods on transformed data.

·can run into problems when there are many ties (data with the same value).

·are more powerful when they take into account the magnitude of the difference between categories (e.g. Wilcoxon signed ranks test) than when they do not (e.g. sign test).

The numerical methods used in rank-based calculations have progressed in recent years. StatsDirect utilises modern developments, including some calculations of exact probability in the presence of tied data. An excellent account of non-parametric methods is given by Conover (1999).


Mann-Whitney U test

Menu location: Analysis_Non-parametric_Mann-Whitney.

This is a method for the comparison of two independent random samples (x and y):

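The Mann-Whitney U statistic is defined as:

$$U = n_1 n_2 + \frac{n_2(n_2+1)}{2} - \sum_{i=n_1+1}^{n_1+n_2} R_i$$

- where samples of size n1 and n2 are pooled and Ri are the ranks; the sum is taken over the ranks of the second sample.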

U can be resolved as the number of times observations in one sample precede observations in the other sample in the ranking.

Wilcoxon rank sum, Kendall's S and the Mann-Whitney U test are exactly equivalent tests. In the presence of ties the Mann-Whitney test is also equivalent to a chi-square test for trend.

In most circumstances a two sided test is required; here the alternative hypothesis is that x values tend to be distributed differently to y values. For a lower side test the alternative hypothesis is that x values tend to be smaller than y values. For an upper side test the alternative hypothesis is that x values tend to be larger than y values.
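The two views of U described above (a rank sum and a count of precedences) can be checked against each other with a short sketch. This is illustrative only and is not StatsDirect's implementation: the function names and the small data set are hypothetical, ties are given average ranks, and a tied (x, y) pair is counted as half a precedence, which is a common convention assumed here.

```python
def midrank(sorted_vals, v):
    """1-based rank of v in sorted_vals, averaging the ranks of tied values."""
    return sorted_vals.index(v) + 1 + (sorted_vals.count(v) - 1) / 2


def u_from_ranks(x, y):
    """U for sample x: its pooled rank sum minus n1(n1+1)/2."""
    pooled = sorted(list(x) + list(y))
    rank_sum_x = sum(midrank(pooled, v) for v in x)
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def u_from_counts(x, y):
    """U for sample x: number of (x, y) pairs in which y precedes x (ties count 1/2)."""
    return sum(1.0 if yi < xi else 0.5 if yi == xi else 0.0
               for xi in x for yi in y)


# Hypothetical data, chosen only to include one tie between the samples.
x = [1.1, 2.5, 3.0, 4.2]
y = [2.5, 3.7, 5.0]

u_x = u_from_ranks(x, y)
u_y = u_from_ranks(y, x)
print(u_x, u_from_counts(x, y), u_x + u_y)
```

The two statistics for x and y always sum to n1 × n2, so either one determines the other.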

Assumptions of the Mann-Whitney test:

·random samples from populations

·independence within samples and mutual independence between samples

·measurement scale is at least ordinal

A confidence interval for the difference between two measures of location is provided with the sample medians. The assumptions of this method are slightly different from the assumptions of the Mann-Whitney test:

·random samples from populations

·independence within samples and mutual independence between samples

·two population distribution functions are identical apart from a possible difference in location parameters

Technical Validation

StatsDirect uses the sampling distribution of U to give exact probabilities. These calculations may take an appreciable time to complete when many data are tied.

Confidence intervals are constructed for the difference between the means or medians (any measure of location in fact). The level of confidence used will be as close as is theoretically possible to the one you specify. StatsDirect approaches the selected confidence level from the conservative side.

When samples are large (either sample > 80 or both samples > 30) a normal approximation is used for the hypothesis test and for the confidence interval. Note that StatsDirect uses more accurate P value calculations than some other statistical software; therefore, you may notice a difference in results (Conover, 1999; Dineen and Blakesley, 1973; Harding, 1983; Neumann, 1988).

Example

From Conover (1999, p. 218).

Test workbook (Nonparametric worksheet: Farm Boys, Town Boys).

The following data represent fitness scores from two groups of boys of the same age, those from homes in the town and those from farm homes.

Farm Boys	Town Boys
14.8	12.7
7.3	14.2
5.6	12.6
6.3	2.1
9.0	17.7
4.2	11.8
10.6	16.9
12.5	7.9
12.9	16.0
16.1	10.6
11.4	5.6
2.7	5.6
	7.6
	11.3
	8.3
	6.7
	3.6
	1.0
	2.4
	6.4
	9.1
	6.7
	18.6
	3.2
	6.2
	6.1
	15.3
	10.6
	1.8
	5.9
	9.9
	10.6
	14.8
	5.0
	2.6
	4.0

To analyse these data in StatsDirect you must first enter them in two separate workbook columns. Alternatively, open the test workbook using the file open function of the file menu. Then select the Mann-Whitney from the Non-parametric section of the analysis menu. Select the columns marked "Farm Boys" and "Town Boys" when prompted for data.

For this example:

estimated median difference = 0.8

two sided P = 0.529

95.1% confidence interval for difference between population means or medians = -2.3 to 4.4

Here we have assumed that these groups are independent and that they represent at least hypothetical random samples of the sub-populations they represent. In this analysis, we are clearly unable to reject the null hypothesis that one group does NOT tend to yield different fitness scores to the other. This lack of statistical evidence of a difference is reflected in the confidence interval for the difference between population means, in that the interval spans zero. Note that the quoted 95.1% confidence interval is as close as you can get to 95% because of the very nature of the mathematics involved in non-parametric methods like this.


Wilcoxon's signed ranks test

Menu location: Analysis_Non-parametric_Wilcoxon Signed Ranks.

This is a method for the comparison of a pair of samples.

The Wilcoxon signed ranks test statistic T+ is the sum of the ranks of the positive, non-zero differences (Di) between a pair of samples.

Assumptions of tests on T+:

·distribution of each Di is symmetrical

·all Di are mutually independent

·all Di have the same mean

·measurement scale of Di is at least interval

In most situations you should use a two sided test. A two sided test is based upon the null hypothesis that the common median of the differences is zero. The approximate alternative hypothesis in this case is that the differences tend not to be zero. For a lower side test the approximate alternative hypothesis is that differences tend to be less than zero. For an upper side test the approximate alternative hypothesis is that differences tend to be greater than zero.

A confidence interval is constructed for the difference between the population medians. In sample terms this is called the confidence interval for the median or mean difference. It is also known as the Hodges-Lehmann estimate of shift. The assumptions of this method are:

·distribution of each Di is symmetrical

·all Di are mutually independent

·all Di have the same median

·measurement scale of Di is at least interval

Technical Validation

Exact permutational probability associated with the test statistic is calculated for sample sizes of less than 50. A normal approximation is used with sample sizes of 50 or more. Note that StatsDirect uses more accurate methods than some other statistical software for calculating probabilities associated with this statistic; therefore, you may notice a difference in results. Confidence limits are calculated using critical values for k with sample sizes up to 30 or by calculating K* for samples with more than 30 observations (Conover, 1999; Neumann, 1988).

Example

From Conover (1999).

Test workbook (Nonparametric worksheet: First Born, Second Born).

The following data represent aggressivity scores for 12 pairs of monozygotic twins.

First Born	Second Born
86	88
71	77
77	76
68	64
91	96
72	72
77	65
91	90
70	65
71	80
88	81
87	72

To analyse these data in StatsDirect you must first enter them into two columns in the workbook. Alternatively, open the test workbook using the file open function of the file menu. Then select the Wilcoxon Signed Ranks from the Non-parametric methods section of the analysis menu. Select the columns marked "First Born" and "Second Born" when prompted for data.

For this example:

two sided P = 0.52

median difference = 1.5

95.8% confidence interval for the difference between population medians = -2.5 to 6.5

Assuming that the paired differences come from a symmetrical distribution then these results show that one group did not tend to yield different results to the other group which was paired with it, i.e. there was no statistically significant difference between the aggressivity scores of the first born as compared with the second born twin. The extent of this lack of difference is shown well by the confidence interval which clearly encompasses zero. Note that the quoted 95.8% confidence interval is as close as you can get to 95% because of the very nature of the mathematics involved in non-parametric methods like this.
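As an illustration of how T+ is formed, the following sketch computes it for the twin data above. It assumes the differences are taken as first born minus second born, drops zero differences, and gives tied absolute differences their average rank; the function name is hypothetical and the code does not attempt the exact P value or the confidence interval.

```python
first_born = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
second_born = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]


def signed_rank_t_plus(a, b):
    """Return (T+, number of non-zero differences) for paired samples a and b."""
    diffs = [ai - bi for ai, bi in zip(a, b) if ai != bi]  # zero differences dropped
    abs_sorted = sorted(abs(d) for d in diffs)

    def midrank(v):
        # 1-based rank of v among the absolute differences, averaged over ties
        return abs_sorted.index(v) + 1 + (abs_sorted.count(v) - 1) / 2

    t_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
    return t_plus, len(diffs)


t_plus, n_used = signed_rank_t_plus(first_born, second_born)
t_minus = n_used * (n_used + 1) / 2 - t_plus  # ranks of negative differences
print(t_plus, t_minus, n_used)
```

One pair (72, 72) has a zero difference, so 11 differences contribute to the ranking.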


Kendall's rank correlation

Menu location: Analysis_Non-parametric_Kendall Rank Correlation.

Kendall's rank correlation provides a distribution free test of independence and a measure of the strength of dependence between two variables.

Spearman's rank correlation is satisfactory for testing a null hypothesis of independence between two variables but it is difficult to interpret when the null hypothesis is rejected. Kendall's rank correlation improves upon this by reflecting the strength of the dependence between the variables being compared.

Consider two samples, x and y, each of size n. The total number of possible pairings of one observation (xi, yi) with another (xj, yj) is n(n-1)/2. A pairing is concordant if xi and xj are ordered in the same way as yi and yj; it is discordant if they are ordered in opposite ways. S is the difference between the number of concordant (ordered in the same way, nc) and discordant (ordered differently, nd) pairs.

Tau (τ) is related to S by:

$$\tau = \frac{2S}{n(n-1)}, \qquad S = n_c - n_d$$

If there are tied (same value) observations then τb is used:

$$\tau_b = \frac{2S}{\sqrt{\left[n(n-1) - \sum_i t_i(t_i-1)\right]\left[n(n-1) - \sum_i u_i(u_i-1)\right]}}$$

- where ti is the number of observations tied at a particular rank of x and ui is the number tied at a rank of y.

In the presence of ties the statistic τb is given as a variant of τ adjusted for ties (Kendall, 1970). When there are no ties τb = τ. An approximate confidence interval is given for τb or τ. Please note that the confidence interval does not correspond exactly to the P values of the tests because slightly different assumptions are made (Samra and Randles, 1988).

The gamma coefficient is given as a measure of association that is highly resistant to tied data (Goodman and Kruskal, 1963):

$$\gamma = \frac{n_c - n_d}{n_c + n_d}$$

Tests for Kendall's test statistic being zero are calculated in exact form when there are no tied data, and in approximate form through a normalised statistic with and without a continuity correction (Kendall's score reduced by 1).

Technical Validation

An asymptotically distribution-free confidence interval is constructed for τb or τ using the variant of the method of Samra and Randles (1988) described by Hollander and Wolfe (1999).

In the presence of ties, the normalised statistic is calculated using the extended variance formula given by Hollander and Wolfe (1999). In the absence of ties, the probability of null S (and thus τ) is evaluated using a recurrence formula when n < 9 and an Edgeworth series expansion when n ≥ 9 (Best and Gipps, 1974). In the presence of ties you are guided to make inferences from the normal approximation (Kendall and Gibbons, 1990; Conover, 1999; Hollander and Wolfe, 1999). Note that StatsDirect uses more accurate methods for calculating the P values associated with τ than some other statistical software; therefore, there may be differences in results.

Example

From Armitage and Berry (1994, p. 466).

Test workbook (Nonparametric worksheet: Career, Psychology).

The following data represent a tutor's ranking of ten clinical psychology students as to their suitability for their career and their knowledge of psychology:

Career	Psychology
4	5
10	8
3	6
1	2
9	10
2	3
6	9
7	4
8	7
5	1

To analyse these data in StatsDirect you must first enter them into two columns in the workbook. Alternatively, open the test workbook using the file open function of the file menu. Then select Kendall Rank Correlation from the Non-parametric section of the analysis menu. Select the columns marked "Career" and "Psychology" when prompted for data.

For this example:

Kendall's tau = 0.5111

Approximate 95% CI = 0.1352 to 0.8870

Upper side (H1 concordance) P = .0233

Lower side (H1 discordance) P = .9767

Two sided (H1 dependence) P = .0466

From these results we reject the null hypothesis of mutual independence between the career suitability and psychology knowledge rankings for the students. With a two sided test we are considering the possibility of concordance or discordance (akin to positive or negative correlation). A one sided test would have been restricted to either discordance or concordance; this would be an unusual assumption. In our example we can conclude that there is a statistically significant lack of independence between career suitability and psychology knowledge rankings of the students by the tutor. The tutor tended to rank students with apparently greater knowledge as more suitable to their career than those with apparently less knowledge and vice versa.
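The tau quoted for this example can be reproduced by counting concordant and discordant pairs directly. The sketch below is a plain O(n²) count with a hypothetical function name; it covers only the untied case shown here and does not compute P values or the confidence interval.

```python
career = [4, 10, 3, 1, 9, 2, 6, 7, 8, 5]
psychology = [5, 8, 6, 2, 10, 3, 9, 4, 7, 1]


def kendall_counts(x, y):
    """Count concordant (nc) and discordant (nd) pairs of observations."""
    nc = nd = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:      # ordered the same way on x and y
                nc += 1
            elif product < 0:    # ordered in opposite ways
                nd += 1
            # product == 0 means a tie on x or y; it counts as neither
    return nc, nd


nc, nd = kendall_counts(career, psychology)
n = len(career)
s = nc - nd
tau = 2 * s / (n * (n - 1))
gamma = (nc - nd) / (nc + nd)
print(nc, nd, s, round(tau, 4), round(gamma, 4))
```

With no ties every one of the 45 pairs is either concordant or discordant, so gamma equals tau for these data.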


Spearman's rank correlation

Menu location: Analysis_Non-parametric_Spearman Rank Correlation.

Spearman's rank correlation provides a distribution free test of independence between two variables. It is, however, insensitive to some types of dependence. Kendall's rank correlation gives a better measure of correlation and is also a better two sided test for independence.

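Spearman's rank correlation coefficient (ρ) is calculated as:

$$\rho = \frac{\sum_{i=1}^{n} R(x_i)R(y_i) - n\left(\frac{n+1}{2}\right)^2}{\sqrt{\left(\sum_{i=1}^{n} R(x_i)^2 - n\left(\frac{n+1}{2}\right)^2\right)\left(\sum_{i=1}^{n} R(y_i)^2 - n\left(\frac{n+1}{2}\right)^2\right)}}$$

- where R(x) and R(y) are the ranks of a pair of variables (x and y) each containing n observations.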

Technical Validation

ρ is calculated as Pearson's r based on ranks and average ranks using the above formula. The probability associated with ρ is evaluated using an exact permutational method when n ≤ 10 and an Edgeworth series approximation when n > 10 (Best and Roberts, 1975). The exact probability calculation employs a corrected version of the Best and Roberts (1975) algorithm. A confidence interval for rho is constructed using Fisher's z transformation (Conover, 1999; Gardner and Altman, 1989; Hollander and Wolfe, 1973). Note that StatsDirect uses more accurate definitions of ρ and the probabilities associated with it than some other statistical software; therefore, there may be differences in results.

Example

From Armitage and Berry (1994, p. 466).

Test workbook (Nonparametric worksheet: Career, Psychology).

The following data represent a tutor's ranking of ten clinical psychology students as to their suitability for their career and their knowledge of psychology:

Career	Psychology
4	5
10	8
3	6
1	2
9	10
2	3
6	9
7	4
8	7
5	1

To analyse these data in StatsDirect you must first enter them into two columns in the workbook. Alternatively, open the test workbook using the file open function of the file menu. Then select Spearman Rank Correlation from the Non-parametric section of the analysis menu. Select the columns marked "Career" and "Psychology" when prompted for data.

For this example:

Spearman's rank correlation coefficient (Rho) = 0.684848

95% CI for rho (Fisher's z transformed) = 0.097085 to 0.918443

Upper side (H1 positive correlation) P = .0156

Lower side (H1 negative correlation) P = .9844

Two sided (H1 any correlation) P = .0311

From these results we reject the null hypothesis of mutual independence between the tutor's ranking of students' suitability for their career and their knowledge of psychology. With a two sided test we are considering the possibility of a positive or a negative correlation, i.e. we can't be sure of this direction at the outset. A one sided test would have been restricted to correlation in one direction only, i.e. large values of one group associated with large values of the other (positive correlation) or large values of one group associated with small values of the other (negative correlation). In our example we can conclude that there is a statistically significant lack of independence between career suitability and psychology knowledge rankings of the students by the tutor. The tutor tended to rank students with apparently greater knowledge as more suitable to their career than those with apparently less knowledge and vice versa.
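The coefficient and its Fisher z interval for this example can be sketched as follows. The data are already untied ranks, so Pearson's r is applied to them directly. The standard error 1/sqrt(n - 3) on the z scale is the usual textbook choice and is an assumption here, as is the helper name `pearson`; the exact P values are not attempted.

```python
import math
from statistics import NormalDist

career = [4, 10, 3, 1, 9, 2, 6, 7, 8, 5]
psychology = [5, 8, 6, 2, 10, 3, 9, 4, 7, 1]


def pearson(a, b):
    """Pearson's product moment correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


n = len(career)
rho = pearson(career, psychology)           # the values are already ranks

# Fisher's z transformation with an assumed standard error of 1/sqrt(n - 3)
z = math.atanh(rho)
half_width = NormalDist().inv_cdf(0.975) / math.sqrt(n - 3)
lower, upper = math.tanh(z - half_width), math.tanh(z + half_width)
print(round(rho, 6), round(lower, 4), round(upper, 4))
```

For data that are not already ranks, each column would first be replaced by its ranks (average ranks for ties) before calling `pearson`.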


Non-parametric linear regression

Menu location: Analysis_Non-parametric_Non-parametric Linear Regression.

This is a distribution free method for investigating a linear relationship between two variables Y (dependent, outcome) and X (predictor, independent).

The slope b of the regression (Y=bX+a) is calculated as the median of the gradients from all possible pairwise contrasts of your data. A confidence interval based upon Kendall's τ is constructed for the slope.

Non-parametric linear regression is much less sensitive to extreme observations (outliers) than is simple linear regression based upon the least squares method. If your data contain extreme observations which may be erroneous but you do not have sufficient reason to exclude them from the analysis then non-parametric linear regression may be appropriate.

Assumptions:

·The sample is random (X can be non-random provided that Ys are independent with identical conditional distributions).

·The regression of Y on X is linear (this implies an interval measurement scale for both X and Y).

This function also provides you with an approximate two sided Kendall's rank correlation test for independence between the variables.

Technical Validation

Note that the two sided confidence interval for the slope is the inversion of the two sided Kendall's test. The approximate two sided P value for Kendall's τ or τb is given but the exact quantile from Kendall's distribution is used to construct the confidence interval; therefore, there may be slight disagreement between the P value and confidence interval. If there are many ties then this situation is compounded (Conover, 1999).
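A minimal sketch of the median-of-pairwise-slopes idea is given below. The data are hypothetical (a perfect line of slope 2 with one wild final value), pairs with equal X are skipped because their gradient is undefined, and the intercept is taken as median(Y) minus b times median(X), which is one common convention assumed here rather than something this page states. The confidence interval is not attempted.

```python
from itertools import combinations
from statistics import median


def median_slope_line(x, y):
    """Return (b, a): the median pairwise slope and a median-based intercept."""
    slopes = [(yj - yi) / (xj - xi)
              for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
              if xj != xi]                    # skip pairs tied on x
    b = median(slopes)
    a = median(y) - b * median(x)             # assumed intercept convention
    return b, a


# Hypothetical data: y = 2x except for an outlier in the last observation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]

b, a = median_slope_line(x, y)
print(b, a)
```

Six of the ten pairwise slopes are exactly 2, so the outlier leaves the fitted line unchanged, which illustrates the resistance to extreme observations.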

Example

From Conover (1999, p. 338).

Test workbook (Nonparametric worksheet: GPA, GMAT).

The following data represent GPA and GMAT test scores for 12 graduates:

GPA	GMAT
4.0	710
4.0	610
3.9	640
3.8	580
3.7	545
3.6	560
3.5	610
3.5	530
3.5	560
3.3	540
3.2	570
3.2	560

To analyse these data in StatsDirect you must first enter them into two columns in the workbook. Alternatively, open the test workbook using the file open function of the file menu. Then select Non-parametric Linear Regression from the Non-parametric section of the analysis menu. Select the columns marked "GPA" and "GMAT" when prompted for Y and X variables respectively.

For this example:

GPA vs. GMAT

Observations per sample = 12

Median slope (95% CI) = 0.003485 (0 to 0.0075)

Y-intercept = 1.581061

Kendall's rank correlation coefficient tau b = 0.439039

Two sided (on continuity corrected z) P = .0678

If you plot GPA against GMAT scores using the scatter plot function in the graphics menu, you will see that there is a reasonably straight line relationship between GPA and GMAT. Here we can infer with 95% confidence that the true population value of the slope of a linear regression line for these two variables lies between 0 and 0.008. The regression equation is estimated at Y = 1.5811 + 0.0035X.

From the two sided Kendall's rank correlation test, we cannot reject the null hypothesis of mutual independence between the pairs of results for the twelve graduates. Note that the zero lower confidence limit is a marginal result and we may have rejected the null hypothesis had we used a different method for testing independence.


Cuzick's test for trend

Menu location: Analysis_Non-parametric_Cuzick Trend Test.

This function provides a Wilcoxon-type test for trend across a group of three or more independent random samples.

Assumptions:

·data must be at least ordinal

·groups must be selected in a meaningful order i.e. ordered

If you do not choose to enter your own group scores then scores are allocated uniformly (1 ... n) in order of selection of the n groups.

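The test statistic is calculated as follows:

$$T = \sum_{i} l_i R_i, \qquad L = \sum_{i} l_i n_i$$

$$E(T) = \frac{(N+1)L}{2}, \qquad \operatorname{var}(T) = \frac{N+1}{12}\left(N \sum_{i} l_i^2 n_i - L^2\right)$$

$$z = \frac{T - E(T)}{\sqrt{\operatorname{var}(T)}}$$

- where Ri is the sum of the pooled ranks for the ith group, li is the score for the ith group, L is the weighted sum of all group scores, ni is the sample size for the ith group and N is the total number of observations. For the null hypothesis of no trend across the groups T will have mean E(T) and variance var(T), and the null hypothesis is tested using the normalised test statistic z.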
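The calculation can be sketched in a few lines. This version follows the untied formulae of Cuzick (1985) and makes no correction for ties, so it should not be expected to reproduce a tie-corrected P value; the function name and the tiny data set are hypothetical, and default scores 1, 2, 3, ... are assumed when none are supplied.

```python
import math
from statistics import NormalDist


def cuzick_trend(groups, scores=None):
    """Return (T, E(T), var(T), z, upper one sided P) without tie correction."""
    scores = scores or list(range(1, len(groups) + 1))
    pooled = sorted(v for g in groups for v in g)
    total_n = len(pooled)

    def midrank(v):
        # 1-based pooled rank, averaged over tied values
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    t = sum(l * sum(midrank(v) for v in g) for l, g in zip(scores, groups))
    weighted = sum(l * len(g) for l, g in zip(scores, groups))       # L
    expected = (total_n + 1) * weighted / 2
    variance = (total_n + 1) / 12 * (
        total_n * sum(l * l * len(g) for l, g in zip(scores, groups))
        - weighted ** 2)
    z = (t - expected) / math.sqrt(variance)
    p_upper = 1 - NormalDist().cdf(z)   # H1: values increase across the groups
    return t, expected, variance, z, p_upper


# Hypothetical, perfectly ordered data in three groups.
t, expected, variance, z, p_upper = cuzick_trend([[1, 2], [3, 4], [5, 6]])
print(t, expected, variance, round(z, 3), round(p_upper, 4))
```

The order in which the groups are passed determines the default scores, which mirrors the need to select the columns in the order across which trend is sought.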

Technical Validation

A logistic distribution is assumed for errors. Probabilities for z are derived from the standard normal distribution. Please note that this test is more powerful than the application of the Wilcoxon rank-sum / Mann-Whitney test between more than two groups of data (Cuzick, 1985).

Example

From Cuzick (1985).

Test workbook (Nonparametric worksheet: CMT 64, CMT 167, CMT 170, CMT 175, CMT 181).

Mice were inoculated with cell lines, CMT 64 to 181, which had been selected for their increasing metastatic potential. The numbers of lung metastases found in each mouse after inoculation are quoted below:

CMT 64	CMT 167	CMT 170	CMT 175	CMT 181
0	0	2	0	2
0	0	3	3	4
1	5	6	5	6
1	7	9	6	6
2	8	10	10	6
2	11	11	19	7
4	13	11	56	18
9	23	12	100	39
	25	21	132	60
	97

To analyse these data in StatsDirect you must first enter them in five workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Cuzick's Trend Test from the Non-parametric section of the analysis menu. Select the columns marked "CMT 64", "CMT 167", "CMT 170", "CMT 175" and "CMT 181" when prompted for data. Click on "No" when you are prompted about group scores; this does not apply to most analyses provided you select the variables in the order you are studying them. With automatic group scoring you must be careful to select the variables in the order across which you want to look for trend.

For this example:

one sided p (corrected for ties) = 0.0172

With these data we are interested in a trend in one direction only; therefore, we can use a one sided test for trend. We have shown a statistically significant trend for increasing number of metastases across these malignant cell lines in this order.


Two sample Smirnov test

Menu location: Analysis_Non-parametric_Smirnov Two Sample.

This function compares the distribution functions of the parent populations of two samples.

If you have two independent samples which may have been drawn from different populations then you might consider looking for differences between them using a t test or Mann-Whitney test. Mann-Whitney and t tests are sensitive to differences between two means or medians but do not detect other differences such as variance. The Smirnov test (a two sample version of the Kolmogorov test) detects a wider range of differences between two distributions.

The test statistic for the two sided test is the largest vertical distance between the empirical distribution functions. In other words, if you plot the sorted values of sample x against the sorted values of sample y as a series of increasing steps then the test statistic is the maximum vertical gap between the two plots.

The test statistics for the one sided tests are the largest vertical distance of one distribution function above the other and vice versa.

The alternative hypothesis for the two sided test is that the distribution functions for x and y are different for at least one observation. The alternative hypotheses for the one sided tests are a) the distribution function for x is greater than that for y for at least one observation and b) the distribution function for x is less than that for y for at least one observation.

The two sample Smirnov method tests the null hypothesis that the distribution functions of the populations from which your samples have been drawn are identical.

Assumptions:

·samples are random

·two samples are mutually independent

·measurement scale is at least ordinal

·for exact test, random variables are assumed to be continuous

Technical Validation

P values for the test statistics are calculated by permutation of the exact distribution whenever possible (Conover, 1999; Nikiforov, 1994; Kim and Jennrich, 1973).

Example

From Conover (1999).

Test workbook (Nonparametric worksheet: Xi, Yi).

Xi	Yi
7.6	5.2
8.4	5.7
8.6	5.9
8.7	6.5
9.3	6.8
9.9	8.2
10.1	9.1
10.6	9.8
11.2	10.8
	11.3
	11.5
	12.3
	12.5
	13.4
	14.6

To analyse these data in StatsDirect you must first enter them into two workbook columns and label them appropriately. Alternatively, open the test workbook using the file open function of the file menu. Then select the Smirnov Two Sample test from the Non-parametric section of the analysis menu. Select the columns marked "Xi" and "Yi" when prompted for data.

For this example:

Two sided test:

D = 0.4

P = .2653

One sided test (suspecting Xi shifted left of Yi):

D = 0.4

P = .1326

One sided test (suspecting Xi shifted right of Yi):

D = 0.333333

P = .2432

Thus we cannot reject the null hypothesis that the two populations from which our samples were drawn have the same distribution function.

If we were interested in a one sided test then we would need good reason for expecting one group to yield values above (distribution shifted to the right of) or below (distribution shifted to the left of) the other group. For these data neither of the one sided tests reached significance.
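The three D statistics quoted for this example can be reproduced by evaluating both empirical distribution functions at every observed value. The sketch below uses hypothetical helper names and computes the distances only; the exact P values are not attempted.

```python
xi = [7.6, 8.4, 8.6, 8.7, 9.3, 9.9, 10.1, 10.6, 11.2]
yi = [5.2, 5.7, 5.9, 6.5, 6.8, 8.2, 9.1, 9.8, 10.8,
      11.3, 11.5, 12.3, 12.5, 13.4, 14.6]


def ecdf(sample, t):
    """Empirical distribution function: proportion of the sample <= t."""
    return sum(v <= t for v in sample) / len(sample)


def smirnov_distances(x, y):
    """Return (D, D_x_above_y, D_y_above_x) over all observed values."""
    points = sorted(set(x) | set(y))
    d_x_above = max(ecdf(x, t) - ecdf(y, t) for t in points)  # x shifted left of y
    d_y_above = max(ecdf(y, t) - ecdf(x, t) for t in points)  # x shifted right of y
    return max(d_x_above, d_y_above), d_x_above, d_y_above


d, d_x_above, d_y_above = smirnov_distances(xi, yi)
print(round(d, 6), round(d_x_above, 6), round(d_y_above, 6))
```

The largest gap with Xi above Yi occurs at 11.2, where all nine Xi values but only nine of the fifteen Yi values have been passed.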


Quantile confidence interval

Menu location: Analysis_Non-parametric_Quantile Confidence Interval.

This function provides a confidence interval for any quantile or centile.

As with all non-parametric confidence intervals, the exact confidence level is not always attainable but the level which is exact to the interval constructed is displayed (Conover, 1999; Gardner and Altman, 1989).

Assumptions:

·random sample

·measurement scale is at least ordinal

A presentation of medians and their confidence intervals is often more meaningful than the time honoured (abused) tradition of presenting means and standard deviations. Researchers sometimes quote means and their confidence intervals in situations where a median with confidence interval would be more appropriate (e.g. when outliers have a biasing effect on the mean but there is insufficient evidence to exclude them from the analysis). A box and whisker plot is a useful accompaniment to this function.

Technical Validation

For sample sizes greater than 200 an approximation to the binomial distribution is used; otherwise the critical values of the binomial distribution used in this calculation are found by an exact method (Conover, 1999). If the conservative option is not selected or the sample size is greater than 200 then for a c*100% confidence interval the binomial quantiles closest to a cumulative probability of (1-c)/2 and 1-(1-c)/2 are used. If the conservative option is selected and the sample size is not greater than 200 then for a c*100% confidence interval the binomial quantiles closest to and less than or equal to a cumulative probability of (1-c)/2, and closest to and greater than or equal to a cumulative probability of 1-(1-c)/2 are used. Note that the conservative interval calculates each side, not just the overall interval, on a conservative basis.

Example

From Conover (1999, p.145).

Test workbook (Nonparametric worksheet: Tubes).

The following represent times to failure in hours for a set of pentode radio valves.

Tubes
46.9
47.2
49.1
56.5
56.8
59.2
59.9
63.2
63.3
63.4
63.7
64.1
67.1
67.7
73.3
78.5

To analyse these data in StatsDirect you must first enter them into a workbook column and label it appropriately. Alternatively, open the test workbook using the file open function of the file menu. Then select Quantile Confidence Interval from the Non-parametric section of the analysis menu. Select the column marked "Tubes" when prompted for data. Choose 90% as the confidence level. Then enter 0.75 to specify that the quantile you want is the upper quartile or 75th percentile.

For this example:

upper quartile = 66.35

Approximate 90% CI (non-conservative) = 63.3 to 73.3

exact confidence level = 90.94%

Approximate 90% CI (conservative) = 63.3 to 78.5

exact confidence level = 96.28%

We may conclude with 91% confidence that the population value of the upper quartile lies between 63.3 and 73.3 hours.
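The exact confidence levels quoted above follow from the binomial distribution of the number of observations falling below the quantile. In the sketch below, the interval from the r-th to the s-th ordered observation is given the level P(B ≤ s-1) - P(B ≤ r-1) with B binomial(n, p), which is the standard result for continuous data and is assumed here. The positions r = 9 with s = 15 or 16 were read off the quoted intervals; the search that chooses them is not reproduced, and the function names are hypothetical.

```python
from math import comb

tubes = sorted([46.9, 47.2, 49.1, 56.5, 56.8, 59.2, 59.9, 63.2,
                63.3, 63.4, 63.7, 64.1, 67.1, 67.7, 73.3, 78.5])


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def interval_level(n, p, r, s):
    """Exact confidence that the p quantile lies between order statistics r and s."""
    return binom_cdf(s - 1, n, p) - binom_cdf(r - 1, n, p)


n, p = len(tubes), 0.75
r, s = 9, 15                                   # non-conservative interval
level = interval_level(n, p, r, s)
level_conservative = interval_level(n, p, 9, 16)
print(tubes[r - 1], tubes[s - 1], round(100 * level, 2))
print(tubes[8], tubes[15], round(100 * level_conservative, 2))
```

Because only a finite set of order statistic pairs is available, the attainable levels jump between values such as 90.94% and 96.28% rather than hitting 90% exactly.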


Chi-square goodness of fit test

Menu location: Analysis_Non-parametric_Chi-Square Goodness of Fit.

This function enables you to compare the distribution of classes of observations with an expected distribution.

Your data must consist of a random sample of independent observations, the expected distribution of which is specified (Armitage and Berry, 1994; Conover, 1999).

Pearson's chi-square goodness of fit test statistic is:

$$T = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j}$$

- where Oj are observed counts, Ej are corresponding expected counts and c is the number of classes for which counts/frequencies are being analysed.

The test statistic is distributed approximately as a chi-square random variable with c-1 degrees of freedom. The test has relatively low power (chance of detecting a real effect) with all but large numbers or big deviations from the null hypothesis (all classes contain observations that could have been in those classes by chance).

The handling of small expected frequencies is controversial. Koehler and Larntz (1980) assert that the chi-square approximation is adequate provided all of the following are true:

·total of observed counts (N) ≥ 10

·number of classes (c) ≥ 3

·all expected values ≥ 0.25

Some statistical software offers exact methods for dealing with small frequencies but these methods are not appropriate for all expected distributions, hence they can be specious. You can try reducing the number of classes but expert statistical guidance is advisable for this (Conover, 1999).

Example

Suppose we suspected an unusual distribution of blood groups in patients undergoing one type of surgical procedure. We know that the expected distribution for the population served by the hospital which performs this surgery is 44% group O, 45% group A, 8% group B and 3% group AB. We can take a random sample of routine pre-operative blood grouping results and compare these with the expected distribution.

Results for 187 consecutive patients:

Blood group	Observed frequency
O	67
A	83
B	29
AB	8

To analyse these data using StatsDirect you must first enter the observed frequencies into the workbook. You can enter the grouped frequencies, as above, or the individual observations (187 rows coded 1 to 4 in this case). If you enter individual observations, StatsDirect collects them into groups/bins/classes of frequencies which you can inspect before proceeding with the analysis. The next step is to enter the expected frequencies; this is done directly on screen after you have selected the observed frequencies and chosen Chi-square Goodness of Fit from the Non-parametric section of the analysis menu. For this example you can enter the expected proportions; the expected frequencies will be calculated and displayed automatically. You can also alter the number of degrees of freedom but this is intended for expert statistical use, thus you would normally accept the default value of number of categories minus one. The results for our example are:

N = 187

Value	Observed frequency	Expected frequency
1	67	82.28
2	83	84.15
3	29	14.96
4	8	5.61

Chi-square = 17.0481 df = 3

P = .0007

Here we may report a statistically highly significant difference between the distribution of blood groups from patients undergoing this surgical procedure and that which would be expected from a random sample of the general population.
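The arithmetic behind the blood group example can be reproduced in a few lines. This is a minimal sketch of Pearson's statistic (the sum over classes of squared observed-minus-expected differences divided by the expected count), not StatsDirect's own code; the variable names are illustrative. The P value is left out because it needs a chi-square distribution function from a statistics library.

```python
observed = [67, 83, 29, 8]                 # O, A, B, AB counts from the example
expected_prop = [0.44, 0.45, 0.08, 0.03]   # population proportions from the example

n = sum(observed)
expected = [p * n for p in expected_prop]  # expected frequencies

# Pearson's statistic: sum over classes of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # number of classes minus one

print(f"N = {n}, chi-square = {chi_square:.4f}, df = {df}")
```

Most of the statistic comes from group B (29 observed against 14.96 expected), which is worth reporting alongside the overall test.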


Kruskal-Wallis test

Menu location: Analysis_Analysis of Variance_Kruskal-Wallis.

This is a method for comparing several independent random samples and can be used as a non-parametric alternative to the one way ANOVA.

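The Kruskal-Wallis test statistic for k samples, each of size n_i, is:

$$T = \frac{1}{S^2}\left[\sum_{i=1}^{k}\frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4}\right]$$

- where N is the total number (all n_i) and R_i is the sum of the ranks (from all samples pooled) for the ith sample and:

$$S^2 = \frac{1}{N-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{n_i} R(X_{ij})^2 - \frac{N(N+1)^2}{4}\right]$$

Here R(X_ij) is the rank, in the pooled sample, of the jth observation in the ith sample.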

The null hypothesis of the test is that all k distribution functions are equal. The alternative hypothesis is that at least one of the populations tends to yield larger values than at least one of the other populations.

Assumptions:

·random samples from populations

·independence within each sample

·mutual independence among samples

·measurement scale is at least ordinal

·either k population distribution functions are identical, or else some of the populations tend to yield larger values than other populations

If the test is significant, you can make multiple comparisons between the samples. You may choose the level of significance for these comparisons (default is α = 0.05). All pairwise comparisons are made and the probability of each presumed "non-difference" is indicated (Conover, 1999; Critchlow and Fligner, 1991; Hollander and Wolfe, 1999). Two alternative methods are used to make all possible pairwise comparisons between groups; these are Dwass-Steel-Critchlow-Fligner and Conover-Inman. In most situations, you should use the Dwass-Steel-Critchlow-Fligner result.

By the Dwass-Steel-Critchlow-Fligner procedure, a contrast is considered significant if the following inequality is satisfied:

- where q is a quantile from the normal range distribution for k groups, ni is the size of the ith group, nj is the size of the jth group, tb is the number of ties at rank b and Wij is the sum of the ranks for the ith group where observations for both groups have been ranked together. The values either side of the greater than sign are displayed in parentheses in StatsDirect results.

The Conover-Inman procedure is simply Fisher's least significant difference method performed on ranks. A contrast is considered significant if the following inequality is satisfied:

- where t is a quantile from the Student t distribution on N-k degrees of freedom. The values either side of the greater than sign are displayed in parentheses in StatsDirect results.

An alternative to Kruskal-Wallis is to perform a one way ANOVA on the ranks of the observations.

StatsDirect also gives you a homogeneity of variance test option with Kruskal-Wallis; this is marked as "Equality of variance (squared ranks)". Please refer to homogeneity of variance for more details.

Technical Validation

The test statistic is an extension of the Mann-Whitney test and is calculated as above. In the presence of tied ranks the test statistic is given in adjusted and unadjusted forms (opinion varies concerning the handling of ties). The test statistic follows approximately a chi-square distribution with k-1 degrees of freedom; P values are derived from this. For small samples you may wish to refer to tables of the Kruskal-Wallis test statistic but the chi-square approximation is highly satisfactory in most cases (Conover, 1999).

Example

From Conover (1999, p.291).

Test workbook (ANOVA worksheet: Method 1, Method 2, Method 3, Method 4).

The following data represent corn yields per acre from four different fields where different farming methods were used.

Method 1	Method 2	Method 3	Method 4
83	91	101	78
91	90	100	82
94	81	91	81
89	83	93	77
89	84	96	79
96	83	95	81
91	88	94	80
92	91		81
90	89
	84

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Kruskal-Wallis from the Non-parametric section of the analysis menu. Then select the columns marked "Method 1", "Method 2", "Method 3" and "Method 4" in one selection action.

For this example:

Adjusted for ties: T = 25.62883 P < 0.0001

All pairwise comparisons (Dwass-Steel-Critchlow-Fligner)

Method 1 and Method 2, P = 0.1529

Method 1 and Method 3, P = 0.0782

Method 1 and Method 4, P = 0.0029

Method 2 and Method 3, P = 0.0048

Method 2 and Method 4, P = 0.0044

Method 3 and Method 4, P = 0.0063

All pairwise comparisons (Conover-Inman)

Method 1 and Method 2, P = 0.0078

Method 1 and Method 3, P = 0.0044

Method 1 and Method 4, P < 0.0001

Method 2 and Method 3, P < 0.0001

Method 2 and Method 4, P = 0.0001

Method 3 and Method 4, P < 0.0001

From the overall T we see a statistically highly significant tendency for at least one group to give higher values than at least one of the others. Subsequent contrasts show a significant separation of all groups with the Conover-Inman method and all but method 1 vs. methods 2 and 3 with the Dwass-Steel-Critchlow-Fligner method. In most situations, it is best to use only the Dwass-Steel-Critchlow-Fligner result.
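The tie-adjusted T for the corn yield data can be recomputed from pooled midranks. The sketch below assumes the form of the statistic given by Conover (1999): the weighted sum of squared rank sums, centred by N(N+1)²/4 and divided by the variance S² of the pooled ranks. The function names are illustrative and this is not StatsDirect's own code.

```python
def midranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks

def kruskal_wallis(samples):
    """Tie-adjusted Kruskal-Wallis T (assumed Conover form) for a list of samples."""
    pooled = [x for sample in samples for x in sample]
    ranks = midranks(pooled)
    n_total = len(pooled)
    correction = n_total * (n_total + 1) ** 2 / 4
    s_squared = (sum(r * r for r in ranks) - correction) / (n_total - 1)
    between, start = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])
        between += rank_sum ** 2 / len(sample)
        start += len(sample)
    return (between - correction) / s_squared

methods = [
    [83, 91, 94, 89, 89, 96, 91, 92, 90],       # Method 1
    [91, 90, 81, 83, 84, 83, 88, 91, 89, 84],   # Method 2
    [101, 100, 91, 93, 96, 95, 94],             # Method 3
    [78, 82, 81, 77, 79, 81, 80, 81],           # Method 4
]
t_statistic = kruskal_wallis(methods)
print(f"T = {t_statistic:.5f}, df = {len(methods) - 1}")
```

The P value would then come from a chi-square distribution with k-1 = 3 degrees of freedom, which requires a statistics library.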


Friedman test

Menu location: Analysis_Analysis of Variance_Friedman.

This method compares several related samples and can be used as a non-parametric alternative to the two way ANOVA.

The power of this method is low with small samples but it is the best method for non-parametric two way analysis of variance with sample sizes above five.

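The Iman-Davenport T2 variant of the Friedman test statistic is:

$$T_2 = \frac{(b-1)\,T_1}{b(k-1) - T_1}$$

- where there are k treatments and b blocks and T1 is:

$$T_1 = \frac{(k-1)\left[\sum_{j=1}^{k} R_j^2 - b\,C_1\right]}{A_1 - C_1}$$

- where Rj is the sum of the ranks (from pooled observations) for all blocks in one treatment and A1 and C1 are:

$$A_1 = \sum_{i=1}^{b}\sum_{j=1}^{k} R(X_{ij})^2 \qquad C_1 = \frac{b\,k\,(k+1)^2}{4}$$

Here R(X_ij) is the rank given to treatment j within block i.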

Assumptions:

·results in one block don't affect results in other blocks

·observations in a block are ranked by a criterion of interest

The null hypothesis of the test is that the treatments have identical effects. The alternative hypothesis is that at least one of the treatments tends to yield larger values than at least one of the other treatments.

When the test is significant StatsDirect allows you to make multiple comparisons between the individual samples. These comparisons are performed automatically for all possible contrasts and you are informed of the statistical significance of each contrast. A contrast is considered significant if the following inequality is satisfied:

$$|R_i - R_j| > t_{1-\alpha/2}\sqrt{\frac{2\left(b\,A_1 - \sum_{m=1}^{k} R_m^2\right)}{(b-1)(k-1)}}$$

- where t is a quantile from the Student t distribution on (b-1)(k-1) degrees of freedom. This method is a nonparametric equivalent to Fisher's least significant difference method (Conover, 1999).

An alternative to the Friedman test is to perform two way ANOVA on ranks; this is how the T2 statistic was derived.

Cochran's Q, Kendall's W and Quade

Kendall's W coefficient of concordance test (also attributed to Wallis and Babington-Smith independently) gives the same numerical answers as Friedman's test.

Quade proposed a slightly different method for testing the same hypotheses as described above for Friedman's method. Friedman's test generally performs better than Quade's test and should be used instead.

Cochran's Q test can be performed using this Friedman test function by entering dichotomous data coded as in the example below (Conover, 1999):

	Sportsman
Game	1	2	3
1	1	1	1
2	1	1	1
3	0	1	0
4	1	1	0
5	0	0	0
6	1	1	1
7	1	1	1
8	1	1	0
9	0	0	1
10	0	1	0
11	1	1	1
12	1	1	1

- here Conover (1999) describes how three people ran their own separate scoring systems for predicting the outcomes of basketball games; the table above shows 1 if they predicted the outcome correctly and 0 if not for 12 games.

Technical Validation

The overall test statistic is T2 calculated as above (Iman and Davenport, 1980). T2 is approximately distributed as an F random variable with k-1 numerator and (b-1)(k-1) denominator degrees of freedom; this is how the P value is derived. Older literature and some software use an alternative statistic that is tested against a chi-square distribution; the method used in StatsDirect is more accurate (Conover, 1999).

Example

From Conover (1999, p.372).

Test workbook (ANOVA worksheet: Grass 1, Grass 2, Grass 3, Grass 4).

The following data represent the rank preferences of twelve home owners for four different types of grass planted in their gardens for a trial period. They considered defined criteria before ranking each grass between 1 (best) and 4 (worst).

Grass 1	Grass 2	Grass 3	Grass 4
4	3	2	1
4	2	3	1
3	1.5	1.5	4
3	1	2	4
4	2	1	3
2	2	2	4
1	3	2	4
2	4	1	3
3.5	1	2	3.5
4	1	3	2
4	2	3	1
3.5	1	2	3.5

To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Friedman from the Non-parametric section of the analysis menu. Then select the columns marked "Grass 1", "Grass 2", "Grass 3" and "Grass 4" in one selection action.

For this example:

T2 = 3.192198 P = 0.0362

All pairwise comparisons (Conover)

Grass 1 vs. Grass 2, P = 0.0149

Grass 1 vs. Grass 3, P = 0.0226

Grass 1 vs. Grass 4, P = 0.4834

Grass 2 vs. Grass 3, P = 0.8604

Grass 2 vs. Grass 4, P = 0.0717

Grass 3 vs. Grass 4, P = 0.1017

From the overall test statistic we can conclude that there is a statistically significant tendency for at least one group to yield higher values than at least one of the other groups. Considering the raw data and the contrast results we see that grasses 2 and 3 are significantly preferred above grass 1 but that there is little to choose between 2 and 3.
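The T2 value for the grass data can be recomputed from the within-block ranks. This sketch assumes the Conover (1999) definitions: A1 is the sum of all squared ranks, C1 = bk(k+1)²/4, T1 = (k-1)(ΣRj² - bC1)/(A1 - C1) and T2 = (b-1)T1/(b(k-1) - T1). The function name is illustrative, and the P value is omitted because it needs an F distribution function.

```python
def friedman_t2(blocks):
    """Rank sums, T1 and Iman-Davenport T2 from within-block ranks (one row per block)."""
    b = len(blocks)        # number of blocks (rows)
    k = len(blocks[0])     # number of treatments (columns)
    rank_sums = [sum(row[j] for row in blocks) for j in range(k)]
    a1 = sum(r * r for row in blocks for r in row)   # sum of all squared ranks
    c1 = b * k * (k + 1) ** 2 / 4
    t1 = (k - 1) * (sum(r * r for r in rank_sums) - b * c1) / (a1 - c1)
    t2 = (b - 1) * t1 / (b * (k - 1) - t1)
    return rank_sums, t1, t2

# Rank preferences of the twelve home owners (Grass 1 to Grass 4).
grass = [
    [4, 3, 2, 1], [4, 2, 3, 1], [3, 1.5, 1.5, 4], [3, 1, 2, 4],
    [4, 2, 1, 3], [2, 2, 2, 4], [1, 3, 2, 4], [2, 4, 1, 3],
    [3.5, 1, 2, 3.5], [4, 1, 3, 2], [4, 2, 3, 1], [3.5, 1, 2, 3.5],
]
rank_sums, t1, t2 = friedman_t2(grass)
print(rank_sums, f"T1 = {t1:.4f}", f"T2 = {t2:.6f}")
```

The rank sums (38, 23.5, 24.5 and 34) show the pattern behind the contrasts: grasses 2 and 3 have the lowest sums, and rank 1 was the best.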


Equality (homogeneity) of variance

Menu location: Analysis_Analysis of Variance(_Oneway, _Kruskal-Wallis).

StatsDirect provides parametric (Bartlett and Levene) and nonparametric (squared ranks) tests for equality/homogeneity of variance.

Most commonly used statistical hypothesis tests, such as t tests, compare means or other measures of location. Some studies need to compare variability also. Equality of variance tests can be used on their own for this purpose but they are often used alongside other methods (e.g. analysis of variance) to support assumptions made about variance. For this reason, StatsDirect presents equality of variance tests with analysis of variance.

Bartlett and Levene (parametric) tests

Two samples

Use the F test to compare the variances of two random samples from a normal distribution. Note that the F test is quite sensitive to departures from normality; if you have any doubt then please use the non-parametric equivalent described below.

More than two samples (Levene)

StatsDirect gives Levene's test as an option with One Way Analysis of Variance.

The W50 definition of the Levene test statistic (Brown and Forsythe, 1974) is used; this is essentially a one way analysis of variance on the absolute (unsigned) values of the deviations of observations from their group medians.

Levene's test assumes only that your data form random samples from continuous distributions. If you are in any doubt about this, use the squared ranks test presented below; it is generally more robust.
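The description of W50 as a one way analysis of variance on absolute deviations from group medians can be sketched as follows. The data are hypothetical, chosen only so that the arithmetic is easy to follow by hand, and the function name is illustrative; the resulting F ratio would be referred to an F distribution with k-1 and N-k degrees of freedom.

```python
from statistics import median

def levene_w50(groups):
    """W50 statistic: one way ANOVA F ratio computed on |x - group median|."""
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    group_means = [sum(d) / len(d) for d in deviations]
    ss_between = sum(len(d) * (m - grand_mean) ** 2
                     for d, m in zip(deviations, group_means))
    ss_within = sum((v - m) ** 2
                    for d, m in zip(deviations, group_means) for v in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data: the second group is twice as spread out as the first.
w = levene_w50([[1, 2, 3], [2, 4, 6]])
print(f"W50 = {w:.4f}")
```

Here the absolute deviations are 1, 0, 1 and 2, 0, 2, giving a between-groups mean square of 2/3 and a within-groups mean square of 5/6.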

More than two samples (Bartlett)

StatsDirect gives Bartlett's test as an option with One Way Analysis of Variance.

Bartlett's test assesses equality of the variances of more than two samples from a normal distribution (Armitage and Berry, 1994).

Please note that Bartlett's test is not reliable with moderate departures from normality; use Levene's test as an alternative routinely. Bartlett's test is included here solely for the purpose of continuity with textbooks.

Squared Ranks (nonparametric) test

StatsDirect gives the squared ranks test as an option in the Kruskal-Wallis test.

The squared ranks test can be used to assess equality of variance across two or more independent, random samples which have been measured using a scale that is at least interval (Conover, 1999).

When you analyse more than two samples with the squared ranks test, StatsDirect performs an automatic comparison of all possible pair-wise contrasts as described by Conover (1999).

Assumptions of the squared ranks test:

·random samples

·independence within samples

·mutual independence between samples

·measurement scale is at least interval

Example

From Conover (1999, p.305).

Test workbook (Nonparametric worksheet: Machine X, Machine Y).

The following data represent the weight of cereal in boxes filled by two different machines X and Y.

Machine X	Machine Y
10.8	10.8
11.1	10.5
10.4	11.0
10.1	10.9
11.3	10.8
	10.7
	10.8

To analyse these data in StatsDirect you must first prepare them in two workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Kruskal-Wallis from the Nonparametric section of the analysis menu. Then select the columns marked "Machine X" and "Machine Y" in one selection action. Ignore the Kruskal-Wallis test result and select the squared ranks variance equality test option then click on the calculate button.

For this example:

Squared ranks approximate equality of variance test (2 sample)

z = 2.327331

Two tailed P = .0199

One tailed P = .01

Here we reject the null hypothesis that the samples are from identical distributions (except for possibly different means) and we infer a statistically significant difference between the variances.
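The two sample z statistic can be reproduced by ranking the absolute deviations from each sample mean and working with the squared ranks. This is a sketch of the large sample form described by Conover (1999), with illustrative function names. It takes the final Machine Y value as 10.8, which is the value that reproduces the reported z.

```python
from math import sqrt

def midranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def squared_ranks_z(x, y):
    """Two sample squared ranks statistic (normal approximation)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    deviations = [abs(v - mean_x) for v in x] + [abs(v - mean_y) for v in y]
    squared = [r * r for r in midranks(deviations)]
    n, m = len(x), len(y)
    total = n + m
    t = sum(squared[:n])                    # sum of squared ranks for sample x
    mean_squared = sum(squared) / total
    sum_fourth = sum(s * s for s in squared)
    variance = (n * m / (total * (total - 1))) * sum_fourth \
        - (n * m / (total - 1)) * mean_squared ** 2
    return (t - n * mean_squared) / sqrt(variance)

machine_x = [10.8, 11.1, 10.4, 10.1, 11.3]
machine_y = [10.8, 10.5, 11.0, 10.9, 10.8, 10.7, 10.8]  # last value assumed to be 10.8
z = squared_ranks_z(machine_x, machine_y)
print(f"z = {z:.6f}")
```

A positive z here means that Machine X has the larger squared ranks, i.e. the more variable fill weights.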


Ranking

Menu location: Data_Rank.

This function ranks the workbook data you select and saves the rankings into a new workbook column labelled Rank: Name where Name is the column label of the original data. You can calculate a correction factor for ties in the ranking; five formulae are offered for tie correction:

1. Σ((t^3 - t)/12)

2. Σ(t * (t-1)/2)

3. Σ(t * (t-1) * (2t+5))

4. Σ(t * (t-1) * (t-2))

5. Σ(t * (t-1) * (t+1))

Here t is the number of data tied at each tie and upper case sigma (Σ) is the summation across these ties.

The use of tie corrections is a complex subject upon which learned statisticians sometimes disagree.

Example

Test workbook (Nonparametric worksheet: First Born).

Ranking the following aggressivity scores for a sample of firstborn twins gives:

First Born ----->	Rank: First Born
86	8
71	3.5
77	6.5
68	1
91	11.5
72	5
77	6.5
91	11.5
70	2
71	3.5
88	10
87	9
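The ranks in this table are midranks: tied scores share the average of the positions they occupy. A minimal sketch of that ranking follows, together with the first tie correction read as Σ(t³ - t)/12. That reading is an assumption, and the function names are illustrative.

```python
from collections import Counter

def midranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of (t^3 - t) / 12 over groups of tied values (t = size of each tie)."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12

first_born = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
ranks = midranks(first_born)
print(ranks)
print(tie_correction(first_born))
```

The scores 71, 77 and 91 each occur twice, so each pair shares a midrank (3.5, 6.5 and 11.5) and contributes (8 - 2)/12 = 0.5 to the correction.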


Normal scores

Menu location: Data_Normal Scores.

This function saves the normal scores of workbook data you select into a new workbook column marked Nml Score: Name, where Name is the column label of the original data.

Three different methods for calculating normal scores are provided:

1. van der Waerden's method (Conover, 1999):

$$s = \Phi\left(\frac{r}{n+1}\right)$$

- where s is the normal score for an observation, r is the rank for that observation, n is the sample size and Φ(p) is the pth quantile from the standard normal distribution.

2. Blom's method (Altman, 1991):

$$s = \Phi\left(\frac{r - 3/8}{n + 1/4}\right)$$

- where s is the normal score for an observation, r is the rank for that observation, n is the sample size and Φ(p) is the pth quantile from the standard normal distribution.

3. Expected normal order scores (David, 1981; Royston, 1982; Harter, 1961)

- where s is the normal score for an observation, r is the rank for that observation, n is the sample size, φ(p) is the standard normal density for p and Φ(p) is the pth quantile from the standard normal distribution. The solution is found by numerical integration. Calculation of expected normal order scores is not practical for very large samples; n of 2500 is the maximum permitted in StatsDirect.

Example

Test workbook (Nonparametric worksheet: First Born).

Scoring the following aggressivity scores for a sample of firstborn twins using the first method above gives:

First Born ----->	Nml Score(vdW): First Born
86	0.2934
71	-0.6151
77	0
68	-1.4261
91	1.1984
72	-0.2933
77	0
91	1.1984
70	-1.0201
71	-0.6151
88	0.7363
87	0.5024
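The van der Waerden scores in this table follow from the midranks and the standard normal quantile function. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library for the quantile. The Blom variant is included for comparison using the usual (r - 3/8)/(n + 1/4) form, and the helper names are illustrative.

```python
from statistics import NormalDist

def midranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def van_der_waerden(values):
    """Normal quantile of r / (n + 1) for each observation's midrank r."""
    n = len(values)
    return [NormalDist().inv_cdf(r / (n + 1)) for r in midranks(values)]

def blom(values):
    """Normal quantile of (r - 3/8) / (n + 1/4) for each observation's midrank r."""
    n = len(values)
    return [NormalDist().inv_cdf((r - 0.375) / (n + 0.25)) for r in midranks(values)]

first_born = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
vdw_scores = van_der_waerden(first_born)
print([round(s, 4) for s in vdw_scores])
```

The two scores of 77 share midrank 6.5, so r/(n+1) = 0.5 and their normal score is 0, as in the table.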

Sort data to new column

Menu location: Data_Sort_Create Sorted Columns.

StatsDirect enables you to create sorted copies of workbook columns or to manipulate data within a selected data range of a workbook. This section deals with creating new sorted workbook columns based on existing ones. See also workbook sort.

This function sorts workbook data you select and saves the sorted data into a new workbook column marked Sort: Name, where Name is the column label of the original data. Sorting may be ascending or descending. The sort may also be tied to other data, i.e. the data in column b may be sorted in the order of sorting the data in column a. Tied sorting can be repeated for any number of columns.

Example 1

Test workbook (Nonparametric worksheet: First Born).

Sorting the following aggressivity scores for a sample of firstborn twins in ascending order gives:

First Born ----->	Sort: First Born
86	68
71	70
77	71
68	71
91	72
72	77
77	77
91	86
70	87
71	88
88	91
87	91

Example 2

Test workbook (Nonparametric worksheet: First Born, Second Born).

Sorting the following aggressivity scores for a sample of second born twins by the ascending order of the scores for firstborn twins gives:

First Born	Second Born ----->	Sr~Second Born~First Born
86	88	64
71	77	65
77	76	80
68	64	77
91	96	72
72	72	76
77	65	65
91	90	88
70	65	72
71	80	81
88	81	96
87	72	90
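Both examples can be sketched with a plain sort and an index sort. This is an illustration of the idea rather than StatsDirect's sorting routine. Python's sort is stable, so rows with equal first born scores keep their original order, which need not match the order shown in the listing above for the two rows scored 71.

```python
first_born = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
second_born = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]

# Example 1: a sorted copy of one column.
sorted_first = sorted(first_born)

# Example 2: reorder one column by the ascending order of another.
order = sorted(range(len(first_born)), key=lambda i: first_born[i])
second_by_first = [second_born[i] for i in order]

print(sorted_first)
print(second_by_first)
```

The same `order` list can be reused to reorder any number of further columns, which is the sense in which a sort is tied to other data.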

Pairwise

·Pairwise differences

·Pairwise means

·Pairwise slopes

Menu location: Data_Pairwise.

These functions provide transformations based on pairwise calculations either between columns or within a column; they often provide intermediate steps in non-parametric methods.


ROC curve analysis

Menu location: Graphics_ROC.

This plots a Receiver Operating Characteristic (ROC) curve from two sets of raw data.

ROC plots were first used to define detection cut-off points for radar equipment with different operators. These plots can be used in a similar way to define cut-off points for diagnostic tests, for example the level of prostate specific antigen in a blood sample indicating a diagnosis of prostatic carcinoma. Defining cut-off levels for diagnostic tests is a difficult process which should combine ethical and practical considerations with numerical evidence. It is wise to involve a statistician in studies of new diagnostic tests (Altman, 1991).

StatsDirect requires two columns of data for each ROC plot, one with test results in cases where the condition tested for is known to be present and another for test results in known negative cases. Sensitivity (probability of +ve test when disease is present) is then plotted against 1-specificity (probability of +ve test when disease is absent). See diagnostic test for more information.

When you have a number of ROC curves to compare, the area under the curve is usually the best discriminator (Metz, 1978).

StatsDirect calculates the area under the ROC curve directly by an extended trapezoidal rule (Press et al., 1992) and by a non-parametric method analogous to the Wilcoxon/Mann-Whitney test (Hanley and McNeil, 1982). A confidence interval is constructed using DeLong's variance estimate (DeLong et al., 1988).
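The two area estimates can be sketched on a small hypothetical data set. The Wilcoxon/Mann-Whitney form counts the proportion of (present, absent) pairs in which the "present" result is higher, with ties counted as one half; the trapezoidal form joins the empirical ROC points with straight lines. The function names and data are illustrative, and the sketch assumes that higher test values indicate the condition.

```python
def auc_mann_whitney(present, absent):
    """Proportion of (present, absent) pairs ranked correctly; ties count half."""
    wins = sum((p > a) + 0.5 * (p == a) for p in present for a in absent)
    return wins / (len(present) * len(absent))

def auc_trapezoidal(present, absent):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    cutoffs = sorted(set(present) | set(absent), reverse=True)
    points = [(0.0, 0.0)]  # (1 - specificity, sensitivity)
    for c in cutoffs:
        sensitivity = sum(p >= c for p in present) / len(present)
        false_positive_rate = sum(a >= c for a in absent) / len(absent)
        points.append((false_positive_rate, sensitivity))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical test results; 4 appears in both groups to show tie handling.
present = [3, 4, 5]
absent = [1, 2, 4]
print(auc_mann_whitney(present, absent), auc_trapezoidal(present, absent))
```

Both functions return 5/6 for these data, which illustrates why the trapezoidal and Wilcoxon estimates of the area usually agree.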

Example

From Aziz et al. (1996).

Test workbook (SDI (no pregnancy), SDI (pregnancy)).

The following are Sperm Deformity Index (SDI) values from semen samples of men in an infertility study. They are divided into a "condition" present group defined as those whose partners achieved pregnancy and "condition" absent where there was no pregnancy.

SDI (pregnancy)

165 140 154 139 134 154 120 133 150 146 140 114 128 131 116 128 122 129 145 117 140 149 116 147 125 149 129 157 144 123 107 129 152 164 134 120 148 151 149 138 159 169 137 151 141 145 135 135 153 125 159 148 142 130 111 140 136 142 139 137 187 154 151 149 148 157 159 143 124 141 114 136 110 129 145 132 125 149 146 138 151 147 154 147 158 156 156 128 151 138 193 131 127 129 120 159 147 159 156 143 149 160 126 136 150 136 151 140 145 140 134 140 138 144 140 140

SDI (no pregnancy)

159 136 149 156 191 169 194 182 163 152 145 176 122 141 172 162 165 184 239 178 178 164 185 154 164 140 207 214 165 183 218 142 161 168 181 162 166 150 205 163 166 176

To analyse these data using StatsDirect you must first enter them into two columns in a workbook. Enter the number of plots as 1. Then select ROC from the graphics menu and select the appropriate columns for condition present and absent from the workbook. Leave the weighting option as 1 and leave the cut-off calculator as checked. You are then presented with the cut-off calculator; try pressing the up and down arrow keys to display diagnostic test statistics for different cut-offs. Then press "Reset" and "Ok". The ROC plot is then drawn with the optimised cut-off point marked. The plot should look like a stepped curve convex to the top left hand corner; if it is upside down then you have probably selected "condition present" and "condition absent" the wrong way around.

For this example:

The optimised cut-off for equally important sensitivity and specificity was calculated at 160 with these data. A cut-off of 161 was gained with sensitivity weighted twice as important as specificity. After a similar analysis of a larger study, > 160 was subsequently chosen as the SDI level for selecting patients for a type of infertility treatment.

ROC Analysis

Data set: SDI(+ve), SDI(-ve)

Area under ROC curve by extended trapezoidal rule = 0.875411

Wilcoxon estimate of area under ROC curve = 0.875411

DeLong standard error = 0.034862: 95% CI = 0.807082 to 0.943739

Optimum cut-off point selected = 160.064

Table at cut-off:	a	b
	30	5

	c	d
	12	111

sensitivity (95% CI) = 0.714286 (0.554161 to 0.842809)

specificity (95% CI) = 0.956897 (0.902275 to 0.985858)


Gini coefficient of inequality

Menu location: Analysis_Non-parametric_Gini Coefficient of Inequality

This method calculates the Gini coefficient (G) of inequality with bootstrap confidence intervals. A Lorenz plot is produced when a single variable is specified for analysis; otherwise the summary statistics alone are displayed for a group of variables.

The Gini coefficient was developed by the Italian statistician Corrado Gini (Gini, 1912) as a summary measure of income inequality in society. It is usually associated with the plot of wealth concentration introduced a few years earlier by Max Lorenz (Lorenz, 1905). Since these measures were introduced, they have been applied to topics other than income and wealth, but mostly within Economics (Cowell, 1995, 2000; Jenkins, 1991; Sen, 1973).

G is a measure of inequality, defined as the mean of absolute differences between all pairs of individuals for some measure. The minimum value is 0 when all measurements are equal and the theoretical maximum is 1 for an infinitely large set of observations where all measurements but one have a value of 0, which is the ultimate inequality (Stuart and Ord, 1994).

When G is based on the Lorenz curve of income distribution, it can be interpreted as the expected income gap between two individuals randomly selected from the population (Sen, 1973).

The classical definition of G appears in the notation of the theory of relative mean difference:

$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|}{2 n^2 \bar{x}}$$

- where x is an observed value, n is the number of values observed and x bar is the mean value.

If the x values are first placed in ascending order, such that each x has rank i, then some of the comparisons above can be avoided and computation is quicker:

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n^2 \bar{x}}$$

- where x is an observed value, n is the number of values observed and i is the rank of values in ascending order.

Note that only positive non-zero values are used.

The small sample variance properties of G are not known, and large sample approximations to the variance of G are poor (Mills and Zandvakili, 1997; Glasser, 1962; Dixon et al., 1987); therefore confidence intervals are calculated via bootstrap re-sampling methods (Efron and Tibshirani, 1997).

StatsDirect calculates two types of bootstrap confidence intervals: percentile and bias-corrected (Mills and Zandvakili, 1997; Dixon et al., 1987; Efron and Tibshirani, 1997). The bias-corrected intervals are most appropriate for most applications.

In order for G to be an unbiased estimate of the true population value, it should be multiplied by n/(n-1) (Dixon, 1987; Mills and Zandvakili, 1997). This corrected form of G does not appear in most literature, but there are few situations when it is not the most appropriate form to use.

In the context of measuring inequalities in health, Brown (1994) presents a Gini-style index, seemingly calculated from two variables instead of one. The two variables comprise distinct indicators of health (y, e.g. infant deaths) and population (x, live births) for n groups sorted by a composite measure of health and population (e.g. infant mortality rate).

Gb based on two variables (e.g. infant deaths and live births) will be very similar to G calculated from a composite measure (e.g. infant mortality rate). In most situations it is more natural to think of inequality of the composite measure. Another reason not to use Gb is that its statistical characteristics are not well studied.

StatsDirect does not provide a separate function to handle distinct health and population variables when calculating Gini coefficients; instead you should use the single composite health/population measure.

The Pan American Health Organisation (2001) gave the following illustration:

Country	GNP per capita	infant mortality rate (IMR)	live births	infant deaths
Bolivia	2860	59	250	14750
Peru	4410	43	621	26703
Ecuador	4730	39	308	12012
Colombia	6720	24	889	21336
Venezuela	8130	22	568	12496

Positive non-zero observations = 5

Bootstrap re-samples = 2000

Bias = 0.057218

Brown's Gb = 0.1904

Gini coefficient = 0.19893

Percentile 95% CI = 0.023645 to 0.219277

Bias-corrected 95% CI = 0.151456 to 0.241304

Unbiased estimator of population Gini coefficient = 0.248663

Percentile 95% CI = 0.029557 to 0.274096

Bias-corrected 95% CI = 0.18932 to 0.30163

This example uses too few groups for reliable inference from G.
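The point estimates for the infant mortality rates can be reproduced directly. The sketch below assumes the usual relative mean difference form of G (the sum of absolute differences over all ordered pairs, divided by 2n² times the mean) and its rank-based shortcut, then applies the n/(n-1) correction. The function names are illustrative and the bootstrap intervals are not attempted.

```python
def gini_pairwise(values):
    """G from all ordered pairs: sum |xi - xj| / (2 * n^2 * mean)."""
    x = [v for v in values if v > 0]          # only positive non-zero values
    n, total = len(x), sum(x)
    return sum(abs(a - b) for a in x for b in x) / (2 * n * total)

def gini_sorted(values):
    """Same G from ascending ranks: sum (2i - n - 1) * x_i / (n^2 * mean)."""
    x = sorted(v for v in values if v > 0)
    n, total = len(x), sum(x)
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * total)             # n * total equals n^2 * mean

imr = [59, 43, 39, 24, 22]   # infant mortality rates from the table above
g = gini_sorted(imr)
g_unbiased = g * len(imr) / (len(imr) - 1)
print(f"G = {g:.5f}, unbiased = {g_unbiased:.6f}")
```

The rank-based form needs one pass over the sorted data rather than n² comparisons, which matters when the calculation is repeated for each of 2000 bootstrap re-samples.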

Technical notes

The percentile confidence interval is defined as:

- where g* is a Gini coefficient estimated from a bootstrap sample and α is (100-confidence level)/100.

The bias-corrected confidence interval is defined as:

- where g* is a Gini coefficient estimated from a bootstrap sample, G is the observed Gini coefficient, α is (100-confidence level)/100, Φ is the standard normal distribution and k is the number of re-samples in the bootstrap.


Diversity of classes

Menu location: Analysis_Parametric_Diversity of Classes

This function calculates measures of diversity and an estimate of the number of classes in the population given a list of counts of observations in each class from a sample of the population.

Most of the statistical theory used here originates from work in economics (Gini, 1912) and information science (Shannon, 1948), and has been developed further in ecology and genetics. Ecological applications usually involve studies of biodiversity; therefore the classes are species or other taxa (plural of taxon, a group used by a taxonomist). In genetics the classes could be alleles (any of two or more alternative forms of a gene occupying the same chromosomal locus). These principles have been applied to other areas of study such as microbiology (Hunter and Gaston, 1988; Grundmann et al., 2001), and potentially to many more, such as community development.

This area of study is fraught with potential confusion over terms used to describe concepts. Diversity (or heterogeneity) includes both richness (the number of classes) and evenness (the distribution of individuals among classes). The most useful descriptions of diversity, therefore, present both measures of richness and evenness.

Two commonly used measures are Simpson's index Ds and Shannon's index H'. There are many more indices and none is best for all applications (Hurlbert, 1971; Smith, 2002; Kempton, 2002; Brower et al., 1998; Krebs, 1989; Mouillot and Leprêtre, 1999). Common weaknesses of some of these indices are dependence upon a model of class abundance that you don't know in advance, variation with sample size, poor discriminatory ability for specific applications, or poor theoretical justification.

Simpson's index Ds (equal to one minus Simpson's original measure of dominance, λ, later proposed by Hurlbert as PIE, the probability of interspecific encounter) is the most meaningful measure of evenness. Ds is the probability that two randomly sampled individuals are from two different classes. This is equivalent to the genetic calculation of heterozygosity, H, being the probability that two alleles are not identical by descent. It follows that 1-Ds, or dominance λ, is the probability that two randomly sampled individuals are from the same class.

$$D_s = 1 - \frac{\sum_{i=1}^{s} n_i (n_i - 1)}{N(N-1)}$$

- where s is the number of classes observed, n_i is the number observed from the ith class and N is the total number of individuals observed in the sample. Note that Hurlbert (1971) gives a different form of this equation and that the one above is better because it reduces rounding error by reducing the amount of intermediate division.

The variance for Ds can be estimated as:

- the second formula above gives better variance estimates for small samples than does the first (Simpson, 1949; Brower, 1998).

Shannon's index of diversity H' is derived from information theory, originally in the context of information in telephone systems (Shannon, 1948). It combines both evenness and richness in a single measure. H' has no intuitive interpretation in terms of probability and is sensitive to sample size. H' was once thought to be a measure of entropy, but this is no longer supported (Hurlbert, 1971; Goodman, 1975). H' can lead to confounded comparisons where the investigator cannot infer whether or not differences in H' are due to differences in richness, diversity or just sampling differences. StatsDirect calculates H' solely for consistency because it has been used widely in the past.

$$H' = -\sum_{i=1}^{s} \frac{n_i}{N} \ln\frac{n_i}{N}$$

- where s is the number of classes observed, n_i is the number observed from the ith class and N is the total number of individuals observed in the sample. Note that some authors use different bases for the logarithms, giving differently scaled results, but it makes no difference which is used provided you are consistent. If you want to convert the natural log results of StatsDirect to log (base 10) results then simply multiply H' by 0.4343.
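A short sketch of both indices on hypothetical class counts follows. It assumes the without-replacement form of Simpson's index, 1 - Σn(n-1)/(N(N-1)), and the natural log form of Shannon's index, -Σ(n/N)ln(n/N). The function names and counts are illustrative, and no variance or confidence interval is attempted.

```python
from math import log

def simpson_ds(counts):
    """Probability that two individuals drawn without replacement differ in class."""
    n_total = sum(counts)
    same_class = sum(n * (n - 1) for n in counts)
    return 1 - same_class / (n_total * (n_total - 1))

def shannon_h(counts):
    """Shannon's H' using natural logarithms; empty classes contribute nothing."""
    n_total = sum(counts)
    return -sum((n / n_total) * log(n / n_total) for n in counts if n > 0)

counts = [5, 3, 2]          # hypothetical counts for three classes
ds = simpson_ds(counts)
h = shannon_h(counts)
h_log10 = h / log(10)       # same as multiplying by about 0.4343

print(f"Ds = {ds:.4f}, H' = {h:.4f}, H' (log10) = {h_log10:.4f}")
```

For these counts Ds is 1 - 28/90, i.e. about 69% of randomly drawn pairs would come from different classes, which is the probability interpretation that H' lacks.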

The variance for H' can be estimated as:

- the second formula above gives better variance estimates for small samples than does the first (Shannon, 1948; Nayak, 1985; Pardo et al., 1997). Note that there is an error in the second formula in Brower et al. (1998).

The large sample variance estimates above are used to calculate confidence intervals for Ds and H'. These asymptotic estimates of variance do not perform well with small samples, which can be compensated for by using the small sample adjustments shown above. A better approach is to use bootstrap confidence intervals in order to get as much information as possible out of your sample. StatsDirect calculates two types of bootstrap confidence intervals for diversity indices: the bootstrap refinement of the normal asymptotic interval (Mills and Zandvakili, 1997; Dixon et al., 1987; Efron and Tibshirani, 1997):

- where g is either the Simpson or Shannon statistic calculated from the observed sample, k is the number of bootstrap resamples, g* is the statistic of interest calculated from a bootstrap sample, seb is the bootstrap estimate of standard error and t is a quantile of the Student t distribution.
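One reading that fits the symbols just described, and that is consistent with the bootstrap intervals reported in the worked example of this section, is g ± t × seb, where seb is the standard deviation of the k bootstrap statistics. The sketch below assumes that reading; the divisor k - 1 for seb and the choice of t quantile are assumptions, and the function name is hypothetical. The t quantile is supplied by the caller so that the example needs only the standard library.

```python
import statistics


def bootstrap_normal_interval(g, g_star, t_quantile):
    """Interval g +/- t * seb from a list of bootstrap statistics.

    g          : statistic from the observed sample
    g_star     : statistics from the k bootstrap resamples
    t_quantile : Student t quantile chosen by the caller (assumption:
                 k - 1 degrees of freedom, e.g. about 1.961 for k = 2000
                 and a two-sided 95% interval)
    """
    if len(g_star) < 2:
        raise ValueError("need at least two bootstrap statistics")
    se_b = statistics.stdev(g_star)       # assumed divisor: k - 1
    bias = statistics.mean(g_star) - g    # reported, not applied here
    half_width = t_quantile * se_b
    return g - half_width, g + half_width, se_b, bias


# Tiny synthetic illustration (not real bootstrap output).
lower, upper, se_b, bias = bootstrap_normal_interval(
    1.0, [0.9, 1.0, 1.1], t_quantile=2.0
)
print(lower, upper, se_b, bias)
```

With the values from the example (Ds = 0.899352, bootstrap standard error 0.011455 and a t quantile near 1.961), this arithmetic gives roughly 0.8769 to 0.9218, close to the reported interval.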

…and the symmetrized bootstrap-t interval (Vives et al., 2002; Hall, 1988):

- where G is the estimated bootstrap distribution of the absolute value of the studentized sample diversity index.

The resampling scheme used for the bootstrap intervals above is the allocation of one observation to each of s classes followed by allocation at random of the remaining N-s observations to the s classes. This scheme keeps a constant number of classes in each bootstrap sample.
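The scheme can be sketched as follows. The text does not state the probabilities used for the random allocation; this sketch assumes they are proportional to the observed class counts, which is an assumption and may differ from what StatsDirect does. The function name `resample_counts` is hypothetical.

```python
import random


def resample_counts(counts, rng=None):
    """Draw one bootstrap sample of class counts with all s classes kept.

    Step 1: give one observation to each of the s classes.
    Step 2: allocate the remaining N - s observations at random.
    ASSUMPTION: allocation probabilities are proportional to the
    observed counts; the source text does not specify them.
    """
    if any(n < 1 for n in counts):
        raise ValueError("every observed class needs a count of at least 1")
    if rng is None:
        rng = random.Random()
    s = len(counts)
    n_total = sum(counts)
    new_counts = [1] * s
    for idx in rng.choices(range(s), weights=counts, k=n_total - s):
        new_counts[idx] += 1
    return new_counts


# Counts from the Grundmann et al. (2001) example in this section.
rapd_counts = [30, 13, 9, 8, 7, 7, 7, 6, 6, 5, 2, 2, 2] + [1] * 13
sample = resample_counts(rapd_counts, rng=random.Random(1))
print(len(sample), sum(sample), min(sample))
```

Each resample keeps the same N and the same s, so richness stays constant across bootstrap samples while the evenness varies.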

Vives et al. (2002) showed that percentile methods (including BCa) do not perform well for Shannon's index; they advise using a symmetrized bootstrap-t confidence interval.

StatsDirect also extrapolates the richness (number of classes) in your sample in order to give an estimate of the number of classes in the population. There are different approaches to this extrapolation; a well-founded method that does not assume a model of class abundance is that of Chao (1984) as discussed by Colwell and Coddington (1994).

- where S is the estimate of the total number of classes in the population, s is the number of classes observed in the sample, a is the number of classes with exactly one individual (singletons), b is the number of classes with exactly two individuals (doubletons), and where 1 is substituted for a or b if there are no singletons or no doubletons respectively.
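A short sketch of this extrapolation is given below. It assumes the commonly cited form of Chao's estimator, S = s + a² / (2b), which reproduces the estimate of 54 in the worked example of this section. The function name `chao_richness` is hypothetical.

```python
def chao_richness(counts):
    """Estimate total richness from class counts.

    Assumed form: S = s + a**2 / (2 * b), with 1 substituted for a or b
    when there are no singletons or no doubletons.
    """
    observed = [n for n in counts if n > 0]
    s = len(observed)                       # classes seen in the sample
    a = sum(1 for n in observed if n == 1)  # singletons
    b = sum(1 for n in observed if n == 2)  # doubletons
    a = a if a > 0 else 1                   # substitution rule from the text
    b = b if b > 0 else 1
    return s + (a * a) / (2 * b)


# Counts from the Grundmann et al. (2001) example in this section:
# s = 26, a = 13 singletons, b = 3 doubletons.
rapd_counts = [30, 13, 9, 8, 7, 7, 7, 6, 6, 5, 2, 2, 2] + [1] * 13
estimate = chao_richness(rapd_counts)
print(round(estimate))  # 54
```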

Example

Test workbook (Nonparametric worksheet: Community (RAPD)).

Consider the following counts of numbers of types of Staphylococcus aureus strains found in hospital samples (Grundmann et al., 2001).

CR1	30	CR8	6
CR2	13	CR9	6
CR3	9	CR10	5
CR4	8	CR11	2
CR5	7	CR12	2
CR6	7	CR13	2
CR7	7	CR14-26	1 each

To test these data for diversity using StatsDirect you must first prepare them in a workbook column. Alternatively, open the test workbook using the file open function of the file menu. Then select the Diversity item from the non-parametric methods section of the analysis menu. Select the column marked "Community (RAPD)" when prompted for data.

For this example:

Analysis for Community (RAPD):

Total number of counts = 117

Number of classes observed (richness) = 26

Estimated total number of classes = 54

Standard error (large sample) = 16.196498

Normal (large sample) 95% CI = 22 to 86

Simpson Ds (Hurlburt PIE) = 0.899352, (dominance λ = 0.100648, ds = 9.935578)

Standard error (large sample) = 0.016826

Standard error (small sample) = 0.017173

Normal (large sample) 95% CI = 0.866373 to 0.932331

Re-samples = 2000, bias = -0.022018, standard error (bootstrap) = 0.011455

Normal (bootstrap) 95% CI = 0.876886 to 0.921817

Shannon H' (base e) = 2.656515

Standard error (large sample) = 0.095839

Standard error (small sample) = 0.10049

Normal (large sample) 95% CI = 2.468673 to 2.844356

Re-samples = 2000, bias = -0.16601, standard error (bootstrap) = 0.065119

Normal (bootstrap) 95% CI = 2.528806 to 2.784223

