Medical Statistics - Electronic Textbook: Agreement
Source: Southern Medical University quality course website. Updated: 2013/9/13

 Content

 Agreement

 Agreement analyses

 Continuous (intra-class correlation etc.)

 Categorical (kappa etc.)

 Reliability and reducibility (Cronbach alpha etc.)

Agreement analyses

Agreement of continuous measurements

Kappa and Maxwell

Principal components analysis and alpha reliability coefficient

Agreement analyses

Menu location: Analysis_Agreement

These methods look at the agreement of a set of measurements across a sample of individuals. Your data should be prepared so that each row represents data from an individual. Some literature lists these methods under 'reliability testing'.

· CONTINUOUS (intra-class correlation etc.)

· CATEGORICAL (kappa etc.)

· REDUCIBILITY (principal components analysis and Cronbach alpha)

 


Agreement of continuous measurements

 

Menu location: Analysis_Analysis of Variance_Agreement.

 

The function calculates one way random effects intra-class correlation coefficient, estimated within-subjects standard deviation and a repeatability coefficient (Bland and Altman 1996a and 1996b, McGraw and Wong, 1996).

 

The intra-class correlation coefficient is calculated as:

- where m is the number of observations per subject, SSB is the sum of squares between subjects and SST is the total sum of squares (as per one way ANOVA above).
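The formula itself appears as an image in the source page and is not reproduced here. For reference, the one way random effects estimator can be written in the standard mean-squares form (e.g. McGraw and Wong, 1996); this is a reconstruction consistent with the definitions above, not a copy of the original image:

ICC = \frac{MS_B - MS_W}{MS_B + (m - 1)\,MS_W}

- where MS_B = SSB/(n - 1) and MS_W = (SST - SSB)/(n(m - 1)) for n subjects with m observations each.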

 

The within-subjects standard deviation is estimated as the square root of the residual mean square from one way ANOVA.

 

The repeatability coefficient is calculated as:

- where m is the number of observations per subject, z is a quantile from the standard normal distribution (usually taken as the 5% two tailed quantile of 1.96) and sw is the estimated within-subjects standard deviation (calculated as above).
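Again the formula image is not reproduced in this page. The usual Bland and Altman form, which matches the worked example below (1.96 × √2 × 21.459749 ≈ 59.48), is:

repeatability = z\,\sqrt{2}\,s_w \approx 2.77\,s_w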

 

The intra-subject standard deviation is plotted against intra-subject means, and Kendall's rank correlation is used to assess the interdependence of these two variables.

 

An agreement plot is constructed by plotting the maximum differences from each possible intra-subject contrast against intra-subject means, and the overall mean is marked as a line on this plot.

 

A Q-Q plot is given; here the sums of the differences between intra-subject observations and their means are ordered and plotted against an equal number of chi-square quantiles.

 

Agreement analysis is best carried out under expert statistical guidance.

 

Example

From Bland and Altman (1996a).

Test workbook (Agreement worksheet: 1st, 2nd, 3rd, 4th).

 

Four peak flow measurements were repeated for twenty children:

1st   2nd   3rd   4th
190   220   200   200
220   200   240   230
260   260   240   280
210   300   280   265
270   265   280   270
280   280   270   275
260   280   280   300
275   275   275   305
280   290   300   290
320   290   300   290
300   300   310   300
270   250   330   370
320   330   330   330
335   320   335   375
350   320   340   365
360   320   350   345
330   340   380   390
335   385   360   370
400   420   425   420
430   460   480   470

 

To analyse these data using StatsDirect you must first enter them into a workbook or open the test workbook. Then select Agreement from the Analysis of Variance section of the Analysis menu.

 

Agreement

 

Variables: 1st, 2nd, 3rd, 4th

 

Intra-class correlation coefficient (one way random effects) = 0.882276

 

Estimated within-subjects standard deviation = 21.459749

 

For within-subjects sd vs. mean, Kendall's tau b = 0.164457, two sided P = 0.3296

 

Repeatability (for alpha = 0.05) = 59.482297
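As an illustration only (StatsDirect's own code is not shown in this page), the same quantities can be computed from the data table above with a short numpy sketch. The estimators follow the one way ANOVA description given earlier, so the results should agree with the output above:

import numpy as np

pefr = np.array([
    [190, 220, 200, 200], [220, 200, 240, 230], [260, 260, 240, 280],
    [210, 300, 280, 265], [270, 265, 280, 270], [280, 280, 270, 275],
    [260, 280, 280, 300], [275, 275, 275, 305], [280, 290, 300, 290],
    [320, 290, 300, 290], [300, 300, 310, 300], [270, 250, 330, 370],
    [320, 330, 330, 330], [335, 320, 335, 375], [350, 320, 340, 365],
    [360, 320, 350, 345], [330, 340, 380, 390], [335, 385, 360, 370],
    [400, 420, 425, 420], [430, 460, 480, 470],
], dtype=float)

n, m = pefr.shape                                    # 20 subjects, 4 observations each
subject_means = pefr.mean(axis=1)
grand_mean = pefr.mean()

ssb = m * ((subject_means - grand_mean) ** 2).sum()  # between-subjects sum of squares
ssw = ((pefr - subject_means[:, None]) ** 2).sum()   # within-subjects (residual) sum of squares
msb = ssb / (n - 1)
msw = ssw / (n * (m - 1))

icc = (msb - msw) / (msb + (m - 1) * msw)            # one way random effects ICC
sw = np.sqrt(msw)                                    # within-subjects standard deviation
repeatability = 1.96 * np.sqrt(2) * sw

print(icc, sw, repeatability)                        # expected: about 0.88, 21.46, 59.48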

 

 

 

 


Kappa and Maxwell

 

Menu location: Analysis_Miscellaneous_Kappa & Maxwell.

 

Agreement Analysis

For the case of two raters, this function gives Cohen's kappa (weighted and unweighted) and Scott's pi as measures of inter-rater agreement for two raters' categorical assessments (Fleiss, 1981; Altman, 1991; Scott, 1955). For three or more raters, this function gives extensions of the Cohen kappa method, due to Fleiss and Cuzick (1979) in the case of two possible responses per rater, and Fleiss, Nee and Landis (1979) in the general case of three or more responses per rater.

 

If you have only two categories then Scott's pi is the statistic of choice (with a confidence interval constructed by the Donner-Eliasziw (1992) method) for inter-rater agreement (Zwick, 1988).

 

Weighted kappa partly compensates for a problem with unweighted kappa, namely that it is not adjusted for the degree of disagreement. Disagreement is weighted in decreasing priority from the top left (origin) of the table. StatsDirect uses the following definitions for weight (1 is the default):

 

1. w(ij)=1-abs(i-j)/(g-1)

2. w(ij)=1-[(i-j)/(g-1)]²

3. User defined (this is only available via workbook data entry)

 

g = categories

w = weight

i = category for one observer (from 1 to g)

j = category for the other observer (from 1 to g)

 

In broad terms a kappa below 0.2 indicates poor agreement and a kappa above 0.8 indicates very good agreement beyond chance.

 

Guide (Landis and Koch, 1977):

Kappa              Strength of agreement
< 0.2              Poor
> 0.2 to ≤ 0.4     Fair
> 0.4 to ≤ 0.6     Moderate
> 0.6 to ≤ 0.8     Good
> 0.8 to ≤ 1       Very good

 

N.B. You cannot reliably compare kappa values from different studies because kappa is sensitive to the prevalence of different categories, i.e. if one category is observed more commonly in one study than another then kappa may indicate a difference in inter-rater agreement which is not due to the raters.

 

Agreement analysis with more than two raters is a complex and controversial subject; see Fleiss (1981, p. 225).

 

Disagreement Analysis

StatsDirect uses the methods of Maxwell (1970) to test for differences between the ratings of the two raters (or k nominal responses with paired observations).

 

Maxwell's chi-square statistic tests for overall disagreement between the two raters. The general McNemar statistic tests for asymmetry in the distribution of subjects about which the raters disagree, i.e. disagreement more over some categories of response than others.
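For reference (these formulas are not shown in the source page): the generalised McNemar symmetry statistic, usually attributed to Bowker, is

\chi^2 = \sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}}

with k(k-1)/2 degrees of freedom, which matches the df = 10 reported for the five-category example below. Maxwell's marginal homogeneity statistic compares the two sets of marginal totals and has k - 1 degrees of freedom (df = 4 in the example below).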

 

Data preparation

You may present your data for the two-rater methods as a fourfold table in the interactive screen data entry menu option. Otherwise, you may present your data as responses/ratings in columns and rows in a worksheet, where the columns represent raters and the rows represent subjects rated. If you have more than two raters then you must present your data in the worksheet column (rater), row (subject) format. Missing data can be used where raters did not rate all subjects.

 

Technical validation

All formulae for kappa statistics and their tests are as per Fleiss (1981):

For two raters (m=2) and two categories (k=2):

- where n is the number of subjects rated, w is the weight for agreement or disagreement, po is the observed proportion of agreement, pe is the expected proportion of agreement, pij is the fraction of ratings i by the first rater and j by the second rater, and so is the standard error for testing that the kappa statistic equals zero.
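The formula images are not reproduced in this page. The standard Cohen forms, consistent with the definitions above, are:

\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \sum_i p_{ii}, \qquad p_e = \sum_i p_{i\cdot}\, p_{\cdot i}

\kappa_w = \frac{p_{o(w)} - p_{e(w)}}{1 - p_{e(w)}}, \qquad p_{o(w)} = \sum_{i,j} w_{ij}\, p_{ij}, \qquad p_{e(w)} = \sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}

The standard error formulas used for the z tests are not reconstructed here.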

 

For three or more raters (m>2) and two categories (k=2):

- where xi is the number of positive ratings out of mi raters for subject i of n subjects, and so is the standard error for testing that the kappa statistic equals zero.

 

For three or more raters and categories (m>2, k>2):

- where soj is the standard error for testing kappa equal to zero for each rating category separately, and so bar is the standard error for testing kappa equal to zero for the overall kappa across the k categories. Kappa hat is calculated as for the m>2, k=2 method shown above.

 

Example

From Altman (1991).

 

Altman quotes the results of Brostoff et al. in a comparison not of two human observers but of two different methods of assessment. These methods are RAST (radioallergosorbent test) and MAST (multi-RAST) for testing the sera of individuals for specifically reactive IgE in the diagnosis of allergies. Five categories of result were recorded using each method:

 

 

 

                        RAST
                        Negative   Weak   Moderate   High   Very high
MAST     Negative           86       3       14        0         2
         Weak               26       0       10        4         0
         Moderate           20       2       22        4         1
         High               11       1       37       16        14
         Very high           3       0       15       24        48

 

To analyse these data in StatsDirect you may select kappa from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Enter the above frequencies as directed on the screen and select the default method for weighting.

 

For this example:

 

General agreement over all categories (2 raters)

 

Cohen's kappa (unweighted)

Observed agreement = 47.38%
Expected agreement = 22.78%
Kappa = 0.318628 (se = 0.026776)
95% confidence interval = 0.266147 to 0.371109
z (for k = 0) = 11.899574
P < 0.0001

 

Cohen's kappa (weighted by 1 - Abs(i-j)/(g-1))

Observed agreement = 80.51%
Expected agreement = 55.81%
Kappa = 0.558953 (se = 0.038019)
95% confidence interval for kappa = 0.484438 to 0.633469
z (for kw = 0) = 14.701958
P < 0.0001

 

Scott's pi

Observed agreement = 47.38%
Expected agreement = 24.07%
Pi = 0.30701

 

Disagreement over any category and asymmetry of disagreement (2 raters)

Marginal homogeneity (Maxwell) chi-square = 73.013451, df = 4, P < 0.0001
Symmetry (generalised McNemar) chi-square = 79.076091, df = 10, P < 0.0001
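The kappa and pi point estimates above can be reproduced from the frequency table with a short numpy sketch (an illustration only; the standard errors, confidence intervals and the Maxwell/McNemar statistics are not included):

import numpy as np

# MAST (rows) versus RAST (columns) frequencies from the table above
counts = np.array([
    [86,  3, 14,  0,  2],
    [26,  0, 10,  4,  0],
    [20,  2, 22,  4,  1],
    [11,  1, 37, 16, 14],
    [ 3,  0, 15, 24, 48],
], dtype=float)

p = counts / counts.sum()
row, col = p.sum(axis=1), p.sum(axis=0)
g = counts.shape[0]

po = np.trace(p)                                  # observed agreement
pe = (row * col).sum()                            # chance-expected agreement
kappa = (po - pe) / (1 - pe)                      # unweighted Cohen's kappa, about 0.3186

i, j = np.indices(p.shape)
w = 1 - np.abs(i - j) / (g - 1)                   # default weights w(ij) = 1 - |i-j|/(g-1)
po_w = (w * p).sum()
pe_w = (w * np.outer(row, col)).sum()
kappa_w = (po_w - pe_w) / (1 - pe_w)              # weighted kappa, about 0.5590

pe_scott = (((row + col) / 2) ** 2).sum()         # Scott's chance agreement uses mean margins
pi = (po - pe_scott) / (1 - pe_scott)             # Scott's pi, about 0.3070

print(kappa, kappa_w, pi)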

 

Note that for calculation of standard errors for the kappa statistics, StatsDirect uses a more accurate method than that which is quoted in most textbooks (e.g. Altman, 1990).

 

The statistically highly significant z tests indicate that we should reject the null hypothesis that the ratings are independent (i.e. kappa = 0) and accept the alternative that agreement is better than one would expect by chance. Do not put too much emphasis on the kappa statistic test; it makes a lot of assumptions and falls into error with small numbers.

 

The statistically highly significant Maxwell test statistic above indicates that the raters disagree significantly in at least one category. The generalised McNemar statistic indicates that the disagreement is not spread evenly.

 

See also: confidence intervals, P values.

 


Principal components analysis and alpha reliability coefficient

 

Menu locations:

Analysis_Regression & Correlation_Principal Components

Analysis_Agreement_Reliability & Reducibility

 

This function provides principal components analysis (PCA), based upon correlation or covariance, and Cronbach's coefficient alpha for scale reliability.

 

See questionnaire design for more information on how to use these methods in designing questionnaires or other study methods with multiple elements.

 

Principal components analysis is most often used as a data reduction technique for selecting a subset of "highly predictive" variables from a larger group of variables. For example, in order to select a sample of questions from a thirty-question questionnaire you could use this method to find a subset that gave the "best overall summary" of the questionnaire (Johnson and Wichern, 1998; Armitage and Berry, 1994; Everitt and Dunn, 1991; Krzanowski, 1988).

 

There are problems with this approach, and principal components analysis is often wrongly applied and badly interpreted. Please consult a statistician before using this method.

 

PCA does not assume any particular distribution of your original data but it is very sensitive to variance differences between variables. These differences might lead you to the wrong conclusions. For example, you might be selecting variables on the basis of sampling differences and not their "real" contributions to the group. Armitage and Berry (1994) give an example of visual analogue scale results to which principal components analysis was applied after the data had been transformed to angles as a way of stabilising variances.

 

Another problem area with this method is the aim for an orthogonal or uncorrelated subset of variables. Consider the questionnaire problem again: it is fair to say that a pair of highly correlated questions are serving much the same purpose, thus one of them should be dropped. The component dropped is most often the one that has the lower correlation with the overall score. It is not reasonable, however, to seek optimal non-correlation in the selected subset of questions. There may be many "real world" reasons why particular questions should remain in your final questionnaire. It is almost impossible to design a questionnaire where all of the questions have the same importance to every subject studied. For these reasons you should cast a net of questions that cover what you are trying to measure as a whole. This sort of design requires strong knowledge of what you are studying combined with strong appreciation of the limitations of the statistical methods used.

 

Everitt and Dunn (1991) outline PCA and other multivariate methods. McDowell and Newell (1996) and Streiner & Norman (1995) offer practical guidance on the design and analysis of questionnaires.

 

Factor analysis vs. principal components

Factor analysis (FA) is a child of PCA, and the results of PCA are often wrongly labelled as FA. A factor is simply another word for a component. In short, PCA begins with observations and looks for components, i.e. working from data toward a hypothetical model, whereas FA works the other way around. Technically, FA is PCA with some rotation of axes. There are different types of rotations, e.g. varimax (axes are kept orthogonal/perpendicular during rotations) and oblique Procrustean (axes are allowed to form oblique patterns during rotations), and there is disagreement over which to use and how to implement them. Unsurprisingly, FA is misused a lot. There is usually a better analytical route that avoids FA; you should seek the advice of a statistician if you are considering it.

 

Data preparation

To prepare data for principal components analysis in StatsDirect you must first enter them in the workbook. Use a separate column for each variable (component) and make sure that each row corresponds to the observations from one subject. Missing data values in a row will cause that row/subject to be dropped from the analysis. You have the option of investigating either correlation or covariance matrices; most often you will need the correlation matrix. As discussed above, it might be appropriate to transform your data before applying this method.

 

For the example of 0 to 7 scores from a questionnaire you would enter your data in the workbook in the following format. You might want to transform these data first (Armitage and Berry, 1994).

 

 

            Question 1   Question 2   Question 3   Question 4   Question 5
subject 1        5            7            4            1            5
subject 2        3            3            2            2            6
subject 3        2            2            4            3            7
subject 4        0            0            5            4            2

 

Internal consistency and deletion of individual components

Cronbach's alpha is a useful statistic for investigating the internal consistency of a questionnaire. If each variable selected for PCA represents test scores from an element of a questionnaire, StatsDirect gives the overall alpha and the alpha that would be obtained if each element in turn were dropped. If you are using weights then you should use the weighted scores. You should not enter the overall test score as this is assumed to be the sum of the elements you have specified. For most purposes alpha should be above 0.8 to support reasonable internal consistency. If the deletion of an element causes a considerable increase in alpha then you should consider dropping that element from the test. StatsDirect highlights increases of more than 0.1, but this must be considered along with the "real world" relevance of that element to your test. A standardised version of alpha is calculated by standardising all items in the scale so that their mean is 0 and variance is 1 before the summation part of the calculation is done (Streiner and Norman, 1995; McDowell and Newell, 1996; Cronbach, 1951). You should use standardised alpha if there are substantial differences in the variances of the elements of your test/questionnaire.
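For reference, the usual definitions (Cronbach, 1951), for k items with item variances \sigma_i^2, total-score variance \sigma_T^2 and mean inter-item correlation \bar{r}, are:

\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_T^2}\right), \qquad \alpha_{\mathrm{standardised}} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}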

 

Technical Validation

Singular value decomposition (SVD) is used to calculate the variance contribution of each component of a correlation or covariance matrix (Krzanowski, 1988; Chan, 1982):

 

The SVD of an n by m matrix X is USV' = X. U and V are orthogonal matrices, i.e. V'V = VV' = I, where V' is the transpose of V. U is an n by m matrix whose columns are the left singular vectors, and V is an m by m matrix whose columns are the right singular vectors. S is a diagonal matrix with non-negative entries in non-increasing order. If X is a mean-centred, n by m matrix where n > m and rank r = m (i.e. full rank), then the first r columns of V are the first r principal components of X. The positive eigenvalues of X'X or XX' are the squares of the diagonal entries of S. The coefficients or latent vectors are contained in V.

 

Principal component scores are derived from U and S: the score matrix is US, and the rank-reduced reconstruction Y of X built from the leading components is the one that minimises trace{(X-Y)(X-Y)'}. For a correlation matrix, the principal component score is calculated for the standardized variable, i.e. the original datum minus the mean of the variable then divided by its standard deviation.
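A minimal numpy sketch of the SVD route described above, for a correlation-matrix PCA (an illustration of the method, not StatsDirect's internal code; rows are subjects, columns are variables):

import numpy as np

def pca_correlation(x):
    # standardise each column (mean 0, sd 1) so that Z'Z/(n-1) is the correlation matrix
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s ** 2 / (x.shape[0] - 1)   # eigenvalues of the correlation matrix
    proportion = eigenvalues / eigenvalues.sum()
    scores = u * s                            # principal component scores (columns of US)
    loadings = vt.T                           # latent vectors / coefficients (columns of V)
    return eigenvalues, proportion, scores, loadings

For a correlation matrix the eigenvalues sum to the number of variables, which is why the four eigenvalues in the example below sum to 4.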

 

Scale reversal is detected by assessing the correlation between the input variables and the scores for the first principal component.
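A possible implementation of that check (illustrative only) is to correlate each input variable with the first-component scores and flag negative correlations for reversal:

import numpy as np

def reversal_flags(x, first_pc_scores):
    # True where a variable correlates negatively with the first principal
    # component scores, i.e. where its scale appears to run the "wrong" way
    return np.array([np.corrcoef(x[:, k], first_pc_scores)[0, 1] < 0
                     for k in range(x.shape[1])])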

 

A lower confidence limit for Cronbach's alpha is calculated using the sampling theory of Kristoff (1963) and Feldt (1965):

- where F is the F quantile for a 100(1-p)% confidence limit, k is the number of variables and n is the number of observations per variable.

 

Example

Test workbook (Agreement worksheet: Question 1, Question 2, Question 3, and Question 4).

 

Principal components (correlation)

 

Sign was reversed for: Question 3; Question 4

 

Component   Eigenvalue (SVD)   Proportion   Cumulative
1           1.92556            48.14%       48.14%
2           1.305682           32.64%       80.78%
3           0.653959           16.35%       97.13%
4           0.114799           2.87%        100%

 

With raw variables:

 

Scale reliability alpha = 0.54955 (95% lower confidence limit = 0.370886)

 

Variable dropped   Alpha      Change
Question 1         0.525396   -0.024155
Question 2         0.608566   0.059015
Question 3         0.411591   -0.13796
Question 4         0.348084   -0.201466

 

With standardized variables:

 

Scale reliability alpha = 0.572704 (95% lower confidence limit = 0.403223)

 

Variable dropped   Alpha      Change
Question 1         0.569121   -0.003584
Question 2         0.645305   0.072601
Question 3         0.398328   -0.174376
Question 4         0.328003   -0.244701

 

You can see from the results above that questions 3 and 4 seemed to have scales going in opposite directions to the other two questions, so they were reversed before the final analysis. Dropping question 2 improves the internal consistency of the overall set of questions, but this does not bring the standardised alpha coefficient to the conventionally acceptable level of 0.8 and above. It may be necessary to rethink this questionnaire.

 
