Measurement, Validity and Reliability

elinwaring

5

As researchers, we have to move from the idea of a variable to something that we can actually measure in our research. Measurement is, essentially, how we turn a concept into something that can be studied empirically. Different researchers may think about this differently depending on their theoretical and methodological perspectives, but it is an issue for everyone.

The simplest formulations of this focus on two basic concepts: reliability and validity. This sounds simple, but it is actually complex, because we have to ask: reliable in what way, and valid in what way?

Many images depict reliability and validity using some variation of the following target metaphor:

Target metaphor for reliability and validity (Source: https://statsthinking21.github.io/statsthinking21-core-site/working-with-data.html)

In this metaphor, reliability refers to “consistency” and validity refers to being “centered around the middle.” The middle of the target represents the correct measurement, and valid measures evenly surround it. A valid measure is often described as one that “measures what you think you are measuring.” Of course, there are different theoretical ideas about whether variables can ever be directly measured, but in this picture the argument is that valid measures are unbiased (hence evenly distributed around the true value). This does not mean that any individual measurement is close to the true value, only that the measurements are accurate on average. In contrast, reliable measures are consistent, meaning that if you take them over and over the results will be close to each other. However, reliable measures may be biased, meaning that they are not centered around the true value.
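The target metaphor can be sketched as a small simulation. This is only an illustration under stated assumptions: the true value of 50, the bias and spread settings, and the function names `simulate_measure` and `summarize` are all hypothetical and do not come from the readings. Here "bias" stands in for (lack of) validity and "spread" for (lack of) reliability.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical "bullseye": the quantity we are trying to measure


def simulate_measure(bias, spread, n=1000, seed=0):
    """Draw n repeated measurements with a systematic bias and random spread."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [TRUE_VALUE + bias + rng.gauss(0, spread) for _ in range(n)]


def summarize(measurements):
    """Return (bias, spread): mean offset from the true value and standard deviation."""
    bias = statistics.mean(measurements) - TRUE_VALUE
    spread = statistics.stdev(measurements)
    return bias, spread


# Valid but not reliable: centered on the true value, widely scattered.
valid_not_reliable = simulate_measure(bias=0.0, spread=10.0, seed=1)
# Reliable but not valid: tightly clustered, but off-center.
reliable_not_valid = simulate_measure(bias=8.0, spread=1.0, seed=2)

for name, data in [("valid, not reliable", valid_not_reliable),
                   ("reliable, not valid", reliable_not_valid)]:
    bias, spread = summarize(data)
    print(f"{name}: bias = {bias:.2f}, spread = {spread:.2f}")
```

In this sketch the first measure should show a bias near zero with a large spread, and the second a small spread with a bias near 8, which mirrors the two "imperfect" targets in the image.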

This is a simple take, but it’s important, because you can see that there is often a trade-off between the two ideals. Of course, we would like measures that have both high reliability and high validity, but often, not just in measurement but also in methodology and statistics, we have to make decisions about how to balance the two. For example, looking at B and C above, the C points are all inside the target, while the B points include a number that missed the target entirely. So we might prefer a reliable measure that is fairly valid over a valid measure that is only fairly reliable.
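One way to make this trade-off concrete is to ask how far a typical single measurement lands from the true value, which combines bias and inconsistency into one number (the root mean squared error). The sketch below is an assumption-laden illustration, not a method from the readings: the true value, the amount of bias, the spreads, and the helper name `rmse` are all made up for the example.

```python
import math
import random

TRUE_VALUE = 50.0  # hypothetical true value (the center of the target)


def rmse(measurements, true_value=TRUE_VALUE):
    """Root mean squared error: typical distance of a measurement from the true value."""
    squared_errors = [(m - true_value) ** 2 for m in measurements]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rng = random.Random(42)  # seeded so the example is reproducible

# Unbiased but inconsistent: right on average, but individual values scatter widely.
unbiased_but_noisy = [TRUE_VALUE + rng.gauss(0, 10.0) for _ in range(1000)]
# Consistent but slightly biased: every value is a little off, in the same direction.
consistent_but_biased = [TRUE_VALUE + 3.0 + rng.gauss(0, 1.0) for _ in range(1000)]

print(f"unbiased but noisy:    RMSE = {rmse(unbiased_but_noisy):.2f}")
print(f"consistent but biased: RMSE = {rmse(consistent_but_biased):.2f}")
```

With these made-up settings, the slightly biased but consistent measure ends up closer to the truth on a typical draw. With a larger bias the comparison would flip, which is why this is a judgment call rather than a rule.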

That’s a good starting point, but this terminology is going to come up again in research design when we talk about threats to internal validity and threats to external validity. This kind of validity is about whether our conclusions are accurate, not the individual measures. Internal validity is about whether we draw the correct conclusions within our study (or whether we get fooled in some way by one or more of the threats). External validity is about whether the conclusions we draw from our study can be applied to other settings. Of course, we want studies with both internal and external validity, but when you look at many research designs you see that this is not always the case. For example, experimental designs are considered to have high internal validity, but their external validity is sometimes suspect because lab conditions or limited samples make generalization questionable.

It sometimes seems that reliability gets much less discussion than validity. However, the “reproducibility crisis” and the “replication crisis” that have become major topics of discussion in many social sciences show that there is real concern about the ideal of consistent findings.

Another way that people have thought about this is that qualitative research focuses more on internal validity, on really understanding a very specific setting or the understandings a person has. On the other hand, large-scale quantitative research, with its emphasis on rigorous sampling methods and its use of carefully designed and validated measures, is more focused on reliability. But as you can see from the previous paragraphs, it really is not that simple.

Read

Make sure you explore the resources on the measurement memo assignment.

The chapter on Measurement here https://conjointly.com/kb/measurement-in-research/

Skim chapters 4-6 of https://lhbikos.github.io/ReC_Psychometrics/rxy.html. Don’t worry about the code (unless you want to), but try to get a sense of the vocabulary.

Chapter 5 here covers operationalization and conceptualization in a political science context https://ipsrm.com/wp-content/uploads/2021/12/SP22-IPSRM-02.pdf

Sections 1.1-1.3 cover internal and external validity and threats thereto https://stats.libretexts.org/Courses/Kansas_State_University/EDCEP_917%3A_Experimental_Design_(Yang)/01%3A_Introduction_to_Research_Designs/1.01%3A_Research_Designs

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The Weirdest People in the World? RatSWD Working Papers, Article 139. https://ideas.repec.org//p/rsw/rswwps/rswwps139.html

Sections 2.3-2.6 of Poldrack, R. A. (n.d.). Chapter 2 Working with data | Statistical Thinking for the 21st Century. Retrieved September 26, 2023, from https://statsthinking21.github.io/statsthinking21-core-site/working-with-data.html#what-makes-a-good-measurement

Read

Szymanski, D. M., & Bissonette, D. (2020). Perceptions of the LGBTQ College Campus Climate Scale: Development and Psychometric Evaluation. Journal of Homosexuality, 67(10), 1412–1428. https://doi.org/10.1080/00918369.2019.1591788

and

Watch this interview with Szymanski

https://spu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0f9696ab-df9a-452b-8ccd-aee101271054

Read

Lyons, L., Brasof, M., & Baron, C. (2020). Measuring Mechanisms of Student Voice: Development and Validation of Student Leadership Capacity Building Scales. AERA Open, 6(1), 2332858420902066. https://doi.org/10.1177/2332858420902066
