Beyond kappa: A review of interrater agreement measures

M Banerjee, M Capozzoli… - Canadian journal of …, 1999 - Wiley Online Library
In 1960, Cohen introduced the kappa coefficient to measure chance‐corrected nominal
scale agreement between two raters. Since then, numerous extensions and generalizations …
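
Cohen's coefficient is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the two raters' marginal distributions. A minimal Python sketch of that computation (the function name and counts are illustrative, not taken from the article):

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's (1960) kappa from a square confusion matrix of two raters' labels."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_o = np.trace(c) / n                                # observed agreement
    p_e = (c.sum(axis=0) / n) @ (c.sum(axis=1) / n)      # chance agreement from the marginals
    return (p_o - p_e) / (1.0 - p_e)

# Two raters classifying 100 items into two categories
print(cohens_kappa([[45, 5], [10, 40]]))  # ~0.70
```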

Interrater Agreement Measures: Comments on Kappaₙ, Cohen's Kappa, Scott's π, and Aickin's α

LM Hsu, R Field - Understanding Statistics, 2003 - Taylor & Francis
The Cohen (1960) kappa interrater agreement coefficient has been criticized for penalizing
raters (e.g., diagnosticians) for their a priori agreement about the base rates of categories (e.g., …
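
The criticism is that κ's chance term depends on the raters' marginal base rates: with heavily skewed categories, p_e is large, so even very high observed agreement produces a modest kappa. A small hypothetical illustration, reusing the cohens_kappa sketch above (the counts are invented; both tables show 96% observed agreement):

```python
print(cohens_kappa([[45, 2], [2, 51]]))  # balanced base rates -> ~0.92
print(cohens_kappa([[94, 2], [2, 2]]))   # one rare category   -> ~0.48, same observed agreement
```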

Computing inter‐rater reliability and its variance in the presence of high agreement

KL Gwet - British Journal of Mathematical and Statistical …, 2008 - Wiley Online Library
Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological
testing to compute the extent of agreement between raters on nominally scaled data. It is a …
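
Gwet's proposed alternative (the AC1 statistic) keeps the chance-corrected form (p_o − p_e)/(1 − p_e) but computes the chance term as p_e = Σ_q π_q(1 − π_q)/(Q − 1), where π_q is the average of the two raters' marginal proportions for category q. A sketch under that definition (the function name and counts are illustrative):

```python
import numpy as np

def gwet_ac1(confusion):
    """Gwet's AC1 for two raters from a Q x Q confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    n, q = c.sum(), c.shape[0]
    p_o = np.trace(c) / n
    pi = (c.sum(axis=0) + c.sum(axis=1)) / (2 * n)   # average marginal proportion per category
    p_e = (pi * (1 - pi)).sum() / (q - 1)
    return (p_o - p_e) / (1.0 - p_e)

# The skewed table from above: AC1 stays high (~0.96) where kappa drops to ~0.48
print(gwet_ac1([[94, 2], [2, 2]]))
```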

Inequalities between multi-rater kappas

MJ Warrens - Advances in data analysis and classification, 2010 - Springer
The paper presents inequalities between four descriptive statistics that have been used to
measure the nominal agreement between two or more raters. Each of the four statistics is a …

Sample size determinations for the two rater kappa statistic

VF Flack, AA Afifi, PA Lachenbruch, HJA Schouten - Psychometrika, 1988 - Springer
This paper gives a method for determining a sample size that will achieve a prespecified
bound on confidence interval width for the interrater agreement measure, κ. The same …
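
The underlying reasoning: if the large-sample variance of the estimate can be written as var(κ̂) ≈ V/n for anticipated values of the agreement parameters, then a confidence interval with half-width w at level 1 − α needs roughly n ≥ (z_{1−α/2})² · V / w². A rough sketch using the crude approximation V ≈ p_o(1 − p_o)/(1 − p_e)², which treats the chance term as known; the paper's procedure is more exact:

```python
from math import ceil
from statistics import NormalDist

def kappa_sample_size(p_o, p_e, half_width, alpha=0.05):
    """Rough n so that the kappa confidence interval half-width is about `half_width`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    v = p_o * (1 - p_o) / (1 - p_e) ** 2      # crude approximation to n * var(kappa-hat)
    return ceil(z ** 2 * v / half_width ** 2)

print(kappa_sample_size(p_o=0.85, p_e=0.50, half_width=0.10))  # ~196 subjects
```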

Kappa statistic is not satisfactory for assessing the extent of agreement between raters

K Gwet - Statistical methods for inter-rater reliability assessment, 2002 - agreestat.com
Evaluating the extent of agreement between two raters, or among several raters, is common in the
social, behavioral and medical sciences. The objective of this paper is to provide a detailed …

The measurement of interrater agreement

JL Fleiss, B Levin, MC Paik - Statistical methods for rates and …, 1981 - Citeseer
The statistical methods described in the preceding chapter for controlling for error are
applicable only when the rates of misclassification are known from external sources or are …

How robust are multirater interrater reliability indices to changes in frequency distribution?

D Quarfoot, RA Levine - The American Statistician, 2016 - Taylor & Francis
Interrater reliability studies are used in a diverse set of fields. Often, these investigations
involve three or more raters and thus require the use of indices such as Fleiss's kappa …
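
Fleiss's kappa covers the case of m raters per subject: with n_ij raters assigning subject i to category j, per-subject agreement is P_i = (Σ_j n_ij² − m) / (m(m − 1)), chance agreement is P̄_e = Σ_j p_j² with p_j the overall proportion of assignments to category j, and κ = (P̄ − P̄_e)/(1 − P̄_e). A sketch under those definitions (the example counts are illustrative):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss's kappa; counts[i, j] = number of raters placing subject i in category j."""
    c = np.asarray(counts, dtype=float)
    n_subjects, m = c.shape[0], c[0].sum()            # assumes the same number of raters per subject
    p_j = c.sum(axis=0) / (n_subjects * m)            # overall category proportions
    p_i = ((c ** 2).sum(axis=1) - m) / (m * (m - 1))  # per-subject agreement
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()
    return (p_bar - p_e) / (1.0 - p_e)

# 4 subjects, 3 raters each, 2 categories
print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]))  # ~0.33
```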

A generalized kappa coefficient

JS Uebersax - Educational and Psychological Measurement, 1982 - journals.sagepub.com
Previously proposed methods for calculating the kappa measure of nominal rating
agreement among multiple raters are not applicable in many situations. This paper presents …

Five ways to look at Cohen's kappa

MJ Warrens - Journal of Psychology & Psychotherapy, 2015 - research.rug.nl
The kappa statistic is commonly used for quantifying inter-rater agreement on a nominal
scale. In this review article we discuss five interpretations of this popular coefficient. Kappa is …