Recently, a colleague of mine asked for some advice on how to compute interrater reliability for a coding task, and I discovered that there aren’t many resources online written in an easy-to-understand format – most either 1) go in depth about formulas and computation or 2) go in depth about SPSS without giving many specific reasons for why you’d make several important decisions.

If you think my writing about statistics is clear below, consider my student-centered, practical and concise Step-by-Step Introduction to Statistics for Business for your undergraduate classes, available now from SAGE.

Social scientists of all sorts will appreciate the ordinary, approachable language and practical value – each chapter starts with and discusses a young small business owner facing a problem solvable with statistics, a problem solved by the end of the chapter with the statistical kung-fu gained.

This means ICC(3) will also always be larger than ICC(1) and typically larger than ICC(2), and is represented in SPSS as “Two-Way Mixed” because 1) it models both an effect of rater and of ratee (i.e.

two effects) and 2) assumes a random effect of ratee but a fixed effect of rater (i.e. After you’ve determined which kind of ICC you need, there is a second decision to be made: are you interested in the reliability of a single rater, or of their mean?

In the social sciences, we often have research participants complete surveys, in which case you don’t need ICCs – you would more typically use coefficient alpha.

But when you have research participants provide something about themselves from which you need to extract data, your measurement becomes what you get from that extraction.

Or in other words, while a particular rater might rate Ratee 1 high and Ratee 2 low, it should all even out across many raters.

Like ICC(1), it assumes a random effects model for raters, but it explicitly models this effect – you can sort of think of it like “controlling for rater effects” when producing an estimate of reliability.

For example, in our Facebook study, we want to know both.

First, we might ask “what is the reliability of our ratings?

So I am taking a stab at providing a comprehensive but easier-to-understand resource.