Medical Statistics e-textbook: Basic Descriptive Statistics
Source: Southern Medical University Quality Courses Network. Updated: 2013/9/13

Content

Book Basic Descriptive Statistics

Page Quick univariate summary

Page Univariate summary

Page Central tendency

Page Variance

Page Standard deviation

Page Standard error

Page Skewness

Page Frequencies

Page Crosstabs

Copyright © 1990-2006 StatsDirect Limited, all rights reserved

Download a free 10 day StatsDirect trial

Quick univariate summary

Menu location: Analysis_Descriptive_Quick Summary.

This function provides rapid access to descriptive statistics for a worksheet column of data.

Shortcut: click the right mouse button when the mouse cursor is over the column of data you want to describe and you will be given summary statistics for that column, provided the Edit_Options menu item is set to "Column summary".

The statistics calculated here are a subset of those available through the Analysis_Descriptive_Descriptive Report menu function. If you want to calculate summary statistics for more than one column at a time then you must use the Analysis_Descriptive_Descriptive Report menu function.

For definitions of the statistics calculated, please see descriptive report.

Univariate summary

Menu locations:

Analysis_Descriptive_Univariate Summary;

Analysis_Descriptive_Weighted Univariate Summary.

This function provides measures of location and dispersion that describe the data in a worksheet column. You are given the number, arithmetic mean, sum, variance, standard deviation, standard error of the arithmetic mean, variance coefficient (coefficient of variation), confidence interval for the arithmetic mean, geometric mean, coefficient of skewness, coefficient of kurtosis, maximum, upper quartile, median, lower quartile, minimum and range for each selected variable. You can also choose to calculate an additional quantile, which is appended to the results listed above. Incalculable results are displayed as missing data using an asterisk (*).

If you select more than one column of data to describe then you are given an option to save the results to worksheet columns. Saved columns of results represent the statistics (mean, median, etc.), and their rows represent the variables/columns you selected to describe.

Confidence limits (boundaries of the confidence interval) are given for the arithmetic mean. Please see quantile confidence interval for confidence intervals for the median and other measures of location.

Some related topics:

·central tendency

·variance, standard deviation and spread

·skewness

·normal distribution

·quantiles

·quantile confidence intervals

·histogram

Please refer to one of the general textbooks listed in the reference section for discussion of the application and relative merits of individual descriptive statistics.

Definitions

Valid data and missing data:

For each worksheet column that you select, the number of valid data is the number of cells that can be interpreted as numbers; the remaining cells, which cannot be interpreted as numbers, are counted as missing (e.g. empty cell, asterisk or text label). The sample size used in the calculations below is the number of valid data.

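Sum, mean, variance, standard deviation, standard error and variance coefficient:

$$\text{sum} = \sum_{i=1}^{n} x_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad s = \sqrt{s^2} \qquad sem = \frac{s}{\sqrt{n}}$$

$$\text{upper, lower CL} = \bar{x} \pm t_{\alpha,\,n-1}\, sem \qquad vc = \frac{s}{\bar{x}}$$

- where Σ is the summation for all observations (x_i) in a sample, x bar is the sample (arithmetic) mean, n is the sample size, s² is the sample variance, s is the sample standard deviation, sem is the standard error of the sample mean, upper and lower CL are the confidence limits of the confidence interval for the mean, t_{α, n-1} is the (100*α)% two-tailed quantile from the Student t distribution with n-1 degrees of freedom, and vc is the variance coefficient.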

Skewness and kurtosis:

- where Σ is the summation for all observations (x_i) in a sample, x bar is the sample mean and n is the sample size. Note that there are other definitions of these coefficients used by some other statistical software. StatsDirect uses the standard definitions for which critical values are published in standard statistical tables (Pearson and Hartley, 1970; Stuart and Ord, 1994).
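As a reading aid, one common moment-based form that uses exactly the symbols defined above is sketched here. It is an assumption about the intended definitions, not a quotation of the StatsDirect formulae. On this form a normal distribution has a kurtosis of 3, which is in line with the value of about 3.26 reported in the example further down.

```latex
\[
\text{skewness} =
  \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}
       {\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{3/2}}
\qquad
\text{kurtosis} =
  \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}
       {\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{2}}
\]
```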

Geometric mean:

The geometric mean is a useful measure of central tendency for samples that are log-normally distributed (i.e. the logarithms of the observations are from an approximately normal distribution). The geometric mean is not calculated for samples that contain negative values.

$$gm = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right)$$

- where Σ is the summation for all observations (x_i) in a sample, ln is the natural (base e) logarithm, exp is the exponent (anti-logarithm for base e), gm is the sample geometric mean and n is the sample size.
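A minimal Python sketch of this calculation follows. The function name `geometric_mean` is hypothetical, and returning `None` for samples with zero or negative values is an assumption made here to mimic an incalculable result; the text above only mentions negative values.

```python
import math


def geometric_mean(values):
    """Return exp(mean of ln(x)), or None when the result is incalculable."""
    # Assumption: zero is rejected as well as negatives, because ln(0) is undefined.
    if not values or any(v <= 0 for v in values):
        return None
    return math.exp(sum(math.log(v) for v in values) / len(values))


sample = [1.0, 10.0, 100.0]
gm = geometric_mean(sample)
print(gm)  # close to 10, the cube root of 1 * 10 * 100
```

The arithmetic mean of the same sample is 37, which shows how much less the geometric mean is pulled by a long right tail.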

Weights:

If weights are selected then the weights that you supply are first normalised so that they sum to the total number of observations n:

$$w_i = \frac{n\, v_i}{\sum_{j=1}^{n} v_j}$$

- where v_i is a user-supplied weight and w_i is the normalised weight.
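The normalisation step can be sketched in a few lines of Python. The function name `normalise_weights` and the rejection of a non-positive total are illustrative assumptions, not part of StatsDirect.

```python
def normalise_weights(raw_weights):
    """Rescale user-supplied weights so that they sum to the sample size n."""
    n = len(raw_weights)
    total = sum(raw_weights)
    if total <= 0:
        # Assumption: a non-positive total cannot be normalised meaningfully.
        raise ValueError("weights must have a positive sum")
    return [n * v / total for v in raw_weights]


raw = [1.0, 2.0, 5.0]
weights = normalise_weights(raw)
print(weights)       # [0.375, 0.75, 1.875]
print(sum(weights))  # 3.0, the number of observations
```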

The following formulae replace the mean, variance and moments calculations defined above when weights are used:

Median, quartiles and range:

For samples that are not from an approximately normal distribution, for example when data are censored to remove very large and/or very small values, the following nonparametric statistics should be used in place of the arithmetic mean, its variance and the other parametric measures above.

Median (50th centile, quantile 0.5), lower quartile (25th centile, quantile 0.25) and upper quartile (75th centile, quantile 0.75) are defined generally as quantiles:

Two different quantile definitions (Weisberg, 1992; Gleason, 1997; Stuart and Ord, 1994) are used in the summary statistics: the first allows for weights and the second is the conventional quantile that is also used in the quantile confidence interval function:

Type 1

- where p is a proportion, Q is the pth quantile (e.g. median is Q(0.5)), u is an observation from a sample after it has been ordered from smallest to largest value, n is the sample size, w is a weight normalised so that it sums to n and

Type 2

- where p is a proportion, Q is the pth quantile (e.g. median is Q(0.5)), fix is the integer part of a real number, h is the fractional part of order statistic i, u is an observation from a sample after it has been ordered from smallest to largest value and n is the sample size.

Technical validation

The computational methods used in StatsDirect univariate summary statistics, including this function, provide 15 decimal places of precision. This is tested against known standards such as the reference data set used in the example below.

Example

Test workbook (Parametric worksheet: Michelson).

The data are 100 measurements of the speed (millions of meters per second) of light in air recorded by Michelson in 1879 (Dorsey, 1944). The American National Institute of Standards and Technology uses these data as part of the Statistical Reference Datasets for testing statistical software (McCullough and Wilson, 1999; http://www.nist.gov/itl/div898/strd).

Open the test workbook and select the "Michelson" column. Choose descriptive report from the descriptive section of the analysis menu and click on OK when you see a list of descriptive statistics options.

Results from StatsDirect (with decimal places in Analysis_Options set to 12 and centile type 2 selected):

Descriptive statistics

Variables                 Michelson
Valid data                100
Missing data              0
Sum                       29985.24
Mean                      299.8524
Variance                  0.006242666667
Standard deviation        0.079010547819
Variance coefficient      0.000263498134
Standard error of mean    0.007901054782
Upper 95% CL of mean      299.868077406834
Lower 95% CL of mean      299.836722593166
Geometric mean            299.852389694496
Skewness                  -0.01825961396
Kurtosis                  3.263530532311
Maximum                   300.07
Upper quartile            299.895
Median                    299.85
Lower quartile            299.805
Minimum                   299.62
Range                     0.45
Centile 95                299.98
Centile 5                 299.73
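The reported figures can be cross-checked against each other without the raw data. The sketch below is only an internal consistency check in Python: the variable names are made up here, the numbers are copied from the results above, and the tolerances are assumptions chosen to absorb the rounding of the printed values.

```python
import math

# Values copied from the results listed above.
n = 100
total = 29985.24
mean = 299.8524
variance = 0.006242666667
sd = 0.079010547819
vc = 0.000263498134
sem = 0.007901054782
upper_cl = 299.868077406834
lower_cl = 299.836722593166
maximum, minimum, value_range = 300.07, 299.62, 0.45

checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-12),
    "sd = sqrt(variance)": math.isclose(math.sqrt(variance), sd, rel_tol=1e-9),
    "sem = sd / sqrt(n)": math.isclose(sd / math.sqrt(n), sem, rel_tol=1e-9),
    "vc = sd / mean": math.isclose(sd / mean, vc, rel_tol=1e-8),
    "CI centred on mean": math.isclose((upper_cl + lower_cl) / 2, mean, rel_tol=1e-12),
    "range = max - min": math.isclose(maximum - minimum, value_range, abs_tol=1e-9),
}
for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'MISMATCH'}")

# Multiplier implied by the confidence limits: (upper CL - mean) / sem.
implied_t = (upper_cl - mean) / sem
print(round(implied_t, 4))
```

The implied multiplier comes out at about 1.984, which is what a two-tailed 95% Student t quantile with 99 degrees of freedom should be.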

Central tendency

The three common measures of central tendency of a distribution are the arithmetic mean, the median and the mode. Think of a distribution in terms of a histogram with many bars; a large sample from a normal distribution would describe a bell-shaped curve that is symmetrical. In a perfectly symmetrical, non-skewed distribution the mean, median and mode are equal. As distributions become more skewed the difference between these measures of central tendency gets larger.

The mode is the most commonly occurring value in a distribution, population or sample.

The mean (arithmetic mean) is the average (sum of observations / number of observations) in a distribution, sample or population. The mean is more sensitive to outliers than the median or mode.

The median is the middle value in a sorted distribution, sample or population. When there is an even number of observations the median is the mean of the two central values.
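A short Python sketch using the standard library `statistics` module illustrates these three measures. The two samples are invented for illustration: one is symmetrical, the other replaces the largest value with an outlier.

```python
import statistics

symmetric = [1, 2, 3, 3, 3, 4, 5]
skewed = [1, 2, 3, 3, 3, 4, 40]  # same sample with one extreme value


def summarise(values):
    """Return the mean, median and mode of a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


sym_summary = summarise(symmetric)
skew_summary = summarise(skewed)
print(sym_summary)   # mean, median and mode all equal 3
print(skew_summary)  # mean rises to 8; median and mode stay at 3

# With an even number of observations the median is the mean of the two central values.
even_median = statistics.median([1, 2, 3, 4])
print(even_median)   # 2.5
```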

Variance, standard deviation and spread

The standard deviation (SD) is the most commonly used measure of the spread of values in a distribution. SD is calculated as the square root of the variance (the average squared deviation from the mean).

Variance in a population is:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

[x is a value from the population, μ is the mean of all x, n is the number of x in the population, Σ is the summation]

Variance is usually estimated from a sample drawn from a population. The unbiased estimate of population variance calculated from a sample is:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

[x is an observation from the sample, x-bar is the sample mean, n (sample size) - 1 is degrees of freedom, Σ is the summation]
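The difference between the two divisors can be seen in a small Python sketch. The helper names and the eight-value sample are illustrative assumptions; the standard library `statistics.pvariance` and `statistics.variance` functions are used as a cross-check.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean divided by n."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var)   # 4.0, i.e. 32 / 8
print(samp_var)  # 32 / 7, slightly larger because of the n - 1 divisor

# Cross-check against the standard library implementations.
lib_pop_var = statistics.pvariance(data)
lib_samp_var = statistics.variance(data)
```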

The spread of a distribution is also referred to as dispersion and variability. All three terms mean the extent to which values in a distribution differ from one another.

SD is the best measure of spread of an approximately normal distribution. This is not the case when there are extreme values in a distribution or when the distribution is skewed; in these situations interquartile range or semi-interquartile range are preferred measures of spread. Interquartile range is the difference between the 25th and 75th centiles. Semi-interquartile range is half of the difference between the 25th and 75th centiles. For any symmetrical (not skewed) distribution, half of its values will lie within one semi-interquartile range either side of the median, i.e. in the interquartile range. When distributions are approximately normal, SD is a better measure of spread because it is less susceptible to sampling fluctuation than (semi-)interquartile range.
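The contrast between SD and interquartile range in the presence of an extreme value can be sketched in Python. The samples and the helper name `spread` are invented for illustration, and the quartiles come from `statistics.quantiles` with its default method, which is not necessarily the quantile definition StatsDirect uses.

```python
import statistics


def spread(values):
    """Return SD, interquartile range and semi-interquartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {"sd": statistics.stdev(values), "iqr": iqr, "semi_iqr": iqr / 2}


clean = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 70]  # largest value replaced by an extreme one

clean_spread = spread(clean)
outlier_spread = spread(with_outlier)
print(clean_spread)    # iqr 4.0, semi_iqr 2.0
print(outlier_spread)  # iqr still 4.0, while sd is many times larger
```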

If a variable y is a linear (y = a + bx) transformation of x then the variance of y is b² times the variance of x and the standard deviation of y is |b| times the standard deviation of x.
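This scaling rule is easy to confirm numerically. The sketch below uses an invented sample and arbitrary constants a and b; a negative b is chosen to show that the standard deviation scales with the absolute value of b.

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a, b = 10.0, -3.0  # arbitrary illustrative constants
y = [a + b * xi for xi in x]

var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

print(var_y / var_x)  # close to b squared, i.e. 9
print(sd_y / sd_x)    # close to |b|, i.e. 3; the shift a has no effect
```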

The standard error of the mean is the expected value of the standard deviation of means of several samples; this is estimated from a single sample as:

$$sem = \frac{s}{\sqrt{n}}$$

[s is the standard deviation of the sample, n is the sample size]
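A minimal Python sketch of the single-sample estimate, assuming the usual form s divided by the square root of n. The function name `standard_error` and the sample are illustrative.

```python
import math
import statistics


def standard_error(values):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


sample = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error(sample)
print(sem)  # sqrt(32/7) / sqrt(8), about 0.756
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.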

See descriptive statistics.

Skewness

Skewness describes the asymmetry of a distribution. A skewed distribution therefore has one tail longer than the other.

A positively skewed distribution has a longer tail to the right.

A negatively skewed distribution has a longer tail to the left.

A distribution with no skew (e.g. a normal distribution) is symmetrical.

In a perfectly symmetrical, non-skewed distribution the mean, median and mode are equal. As distributions become more skewed the difference between these measures of central tendency gets larger.

Positively skewed distributions are more common than negatively skewed ones.

A coefficient of skewness for a sample is calculated by StatsDirect as:

- where x_i is a sample observation, x bar is the sample mean and n is the sample size.
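The sign behaviour of a skewness coefficient can be illustrated with a short Python sketch. It assumes the common moment-based form (third central moment divided by the second central moment raised to the power 3/2), which uses only the symbols named above; this is an assumption, not a confirmed copy of the StatsDirect formula. The function name and samples are invented.

```python
def moment_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (assumed form), or None if all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return None  # no spread, so the coefficient is incalculable
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 10]
left_tailed = [-v for v in right_tailed]  # mirror image of the same sample
symmetric = [1, 2, 3, 4, 5]

skew_right = moment_skewness(right_tailed)
skew_left = moment_skewness(left_tailed)
skew_sym = moment_skewness(symmetric)
print(skew_right, skew_left, skew_sym)  # positive, negative, zero
```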

Skewed distributions can sometimes be "normalized" by transformation.

See descriptive statistics.

Frequencies

Menu location: Analysis_Frequencies.

This function gives the actual and relative values for frequency and cumulative frequency of observations in the samples you select. If you want the cumulative frequencies to represent order then sort the data before using this function.

Example

The following represent responses to an element of a questionnaire that used a Likert scale:

response
3
3
4
1
1
2
5
3

In order to analyse these data in StatsDirect, first enter them into a workbook column. Then select this column and choose the frequencies option of the analysis menu.

For this example:

N = 8

Value   Frequency   Relative %   Cumulative   Cumulative relative %
1       2           25           2            25
2       1           12.5         3            37.5
3       3           37.5         6            75
4       1           12.5         7            87.5
5       1           12.5         8            100
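The table above can be reproduced with a few lines of Python. This is a sketch using `collections.Counter`, not the StatsDirect implementation; the function name `frequency_table` is hypothetical.

```python
from collections import Counter

responses = [3, 3, 4, 1, 1, 2, 5, 3]


def frequency_table(values):
    """Return rows of (value, frequency, relative %, cumulative, cumulative relative %)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append(
            (value, counts[value], 100 * counts[value] / n, cumulative, 100 * cumulative / n)
        )
    return rows


table = frequency_table(responses)
for value, freq, rel, cum, cum_rel in table:
    print(f"{value}\t{freq}\t{rel}\t{cum}\t{cum_rel}")
```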

Crosstabs

Menu location: Analysis_Crosstabs.

This is a two- or three-way cross tabulation function. If you have two columns of numbers that correspond to different classifications of the same individuals then you can use this function to give a two-way frequency table for the cross classification. This can be stratified by a third classification variable.

For two-way crosstabs, StatsDirect offers a range of analyses appropriate to the dimensions of the contingency table. For more information see chi-square tests and exact tests.

For three-way crosstabs, StatsDirect offers either odds ratio (for case-control studies) or relative risk (for cohort studies) meta-analyses for 2 by 2 by k tables, and generalised Cochran-Mantel-Haenszel tests for r by c by k tables.

Example

A database of test scores contains two fields of interest, sex (M=1, F=0) and grade of skin reaction to an antigen (none = 0, weak + = 1, strong + = 2). Here is a list of those fields for 10 patients:

Sex   Reaction
0     0
1     1
1     2
0     2
1     2
0     1
0     0
0     1
1     2
1     0

In order to get a cross tabulation of these from StatsDirect you should enter these data in two workbook columns. Then choose crosstabs from the analysis menu.

For this example:

             Reaction
             0     1     2
Sex   0      2     2     1
      1      1     1     3
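The same cross tabulation can be sketched in Python by counting (sex, reaction) pairs. The two lists hold the 10 patients in the order given above; the function name `crosstab` is hypothetical.

```python
from collections import Counter

# The 10 patients listed above, in order.
sex = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
reaction = [0, 1, 2, 2, 2, 1, 0, 1, 2, 0]


def crosstab(rows, cols):
    """Return a two-way frequency table as a list of rows, with levels sorted."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = crosstab(sex, reaction)
print(table)  # [[2, 2, 1], [1, 1, 3]]
```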

We could then proceed to an r by c (2 by 3) contingency table analysis to look for association between sex and reaction to this antigen:

Contingency table analysis

                   Reaction 0   Reaction 1   Reaction 2   Total
Sex 0   Observed   2            2            1            5
        % of row   40%          40%          20%
        % of col   66.67%       66.67%       25%          50%
Sex 1   Observed   1            1            3            5
        % of row   20%          20%          60%
        % of col   33.33%       33.33%       75%          50%
Total              3            3            4            10
% of n             30%          30%          40%

TOTAL number of cells = 6
WARNING: 6 out of 6 cells have EXPECTATION < 5

NOMINAL INDEPENDENCE
Chi-square = 1.666667, DF = 2, P = 0.4346
G-square = 1.726092, DF = 2, P = 0.4219
Fisher-Freeman-Halton exact P = 0.5714

ANOVA
Chi-square for equality of mean column scores = 1.5
DF = 2, P = 0.4724

LINEAR TREND
Sample correlation (r) = 0.361158
Chi-square for linear trend (M²) = 1.173913
DF = 1, P = 0.2786

NOMINAL ASSOCIATION
Phi = 0.408248
Pearson's contingency = 0.377964
Cramér's V = 0.408248

ORDINAL
Goodman-Kruskal gamma = 0.555556
Approximate test of gamma = 0: SE = 0.384107, P = 0.1481, 95% CI = -0.197281 to 1.308392
Approximate test of independence: SE = 0.437445, P = 0.2041, 95% CI = -0.301821 to 1.412932
Kendall tau-b = 0.348155
Approximate test of tau-b = 0: SE = 0.275596, P = 0.2065, 95% CI = -0.192002 to 0.888313
Approximate test of independence: SE = 0.274138, P = 0.2041, 95% CI = -0.189145 to 0.885455
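Several of the nominal statistics in this output can be reproduced from the observed counts with textbook formulas. The Python sketch below is not StatsDirect's code: the variable names are invented, and the P values rely on the fact that, for exactly 2 degrees of freedom, the chi-square upper tail probability is exp(-x/2).

```python
import math

observed = [[2, 2, 1], [1, 1, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
cells = [(o, e) for o_row, e_row in zip(observed, expected) for o, e in zip(o_row, e_row)]

chi_square = sum((o - e) ** 2 / e for o, e in cells)
g_square = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form that is valid only because df == 2 for this 2 by 3 table.
p_chi = math.exp(-chi_square / 2)
p_g = math.exp(-g_square / 2)

phi = math.sqrt(chi_square / n)
contingency = math.sqrt(chi_square / (chi_square + n))
cramers_v = math.sqrt(chi_square / (n * min(len(row_totals) - 1, len(col_totals) - 1)))

print(f"Chi-square = {chi_square:.6f}, DF = {df}, P = {p_chi:.4f}")
print(f"G-square = {g_square:.6f}, DF = {df}, P = {p_g:.4f}")
print(f"Phi = {phi:.6f}, contingency = {contingency:.6f}, Cramer's V = {cramers_v:.6f}")
```

Every expected count here is 1.5 or 2, which is why the output warns that all 6 cells have an expectation below 5; in that situation the exact P value is usually the safer one to report.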
