Differences in Inter‐Rater Reliability and Accuracy for a Treatment Adherence Scale

SM Wu, U Whiteside, C Neighbors - Cognitive Behaviour Therapy, 2007 - Taylor & Francis
Inter‐rater reliability and accuracy are measures of rater performance. Inter‐rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter‐rater reliability and accuracy among a group of raters using a treatment adherence scale, and to assess factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert‐generated criterion ratings and between raters using the intraclass correlation, ICC(2,1). Inter‐rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist being rated significantly affected both inter‐rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors, as captured in the criterion ratings, correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter‐rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) influenced both inter‐rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
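For reference, the ICC(2,1) model named above is the two-way random-effects, single-measures (absolute agreement) intraclass correlation of Shrout and Fleiss (1979). A standard statement of that formula, supplied here for orientation rather than taken from the article, is:

```latex
% ICC(2,1): two-way random-effects, single-measures, absolute agreement
% MS_R = mean square for targets (the rated sessions)
% MS_C = mean square for raters, MS_E = residual mean square
% k = number of raters, n = number of targets
\mathrm{ICC}(2,1) =
  \frac{MS_R - MS_E}
       {MS_R + (k-1)\,MS_E + \frac{k\,(MS_C - MS_E)}{n}}
```

In the study's design, inter‐rater reliability is the ICC computed between the paired raters, and accuracy is presumably the analogous ICC computed between each rater's scores and the expert‐generated criterion ratings.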