Module 5: Relationships in Categorical Data with Intro to Probability

Two-Way Tables (4 of 5)

Learning Outcomes

  • Analyze and compare risks using conditional probabilities.

When we calculate the probability of a negative outcome like a heart attack, we often refer to the probability as a risk. For example, we talk about the probability of winning the lottery but the risk of getting struck by lightning. Whenever you see the word risk, keep in mind it’s just another word for probability.

Example

Risk and the Physicians’ Health Study

Researchers in the Physicians’ Health Study (1989) designed a randomized clinical trial to determine whether aspirin reduces the risk of heart attack. They randomly assigned 22,071 healthy male physicians to one of two groups. One group took a low dose of aspirin (325 mg every other day). The other group took a placebo. This was a double-blind experiment: neither the participants nor the people evaluating them knew who was receiving aspirin. Here are the final results.

| | Heart Attack | No Heart Attack | Row Totals |
|---|---|---|---|
| Aspirin | 139 | 10,898 | 11,037 |
| Placebo | 239 | 10,795 | 11,034 |
| Column Totals | 378 | 21,693 | 22,071 |

Note that the categorical variables in this case are:

  • Explanatory variable: Treatment (aspirin or placebo)
  • Response variable: Medical outcome (heart attack or no heart attack)

Question:
Does aspirin lower the risk of having a heart attack?

To answer this question, we compare two conditional probabilities:

  • The probability of a heart attack given that aspirin was taken every other day.
  • The probability of a heart attack given that a placebo was taken every other day.

From the table we have

  • P(heart attack | aspirin) = 139 / 11,037 ≈ 0.013
  • P(heart attack | placebo) = 239 / 11,034 ≈ 0.022
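The two conditional probabilities can be checked with a few lines of Python. This is a minimal sketch rather than part of the study materials; the dictionary `counts` and the function name `conditional_risk` are illustrative names chosen here.

```python
# Counts from the Physicians' Health Study table above.
counts = {
    "aspirin": {"heart_attack": 139, "no_heart_attack": 10_898},
    "placebo": {"heart_attack": 239, "no_heart_attack": 10_795},
}


def conditional_risk(table, group, outcome="heart_attack"):
    """Return P(outcome | group): the outcome count divided by the group's row total."""
    row = table[group]
    return row[outcome] / sum(row.values())


risk_aspirin = conditional_risk(counts, "aspirin")
risk_placebo = conditional_risk(counts, "placebo")

print(f"P(heart attack | aspirin) = {risk_aspirin:.3f}")  # 0.013
print(f"P(heart attack | placebo) = {risk_placebo:.3f}")  # 0.022
```

Each risk uses only one row of the table, because the condition (aspirin or placebo) restricts attention to that treatment group.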

The result shows that taking aspirin reduced the risk from 0.022 to 0.013.

We often compare two risks by calculating the percentage change. We calculate the difference (how much the risk changed) and divide by the risk for the placebo group.

Here is the calculation:

[latex]\frac{0.013-0.022}{0.022} = \frac{-0.009}{0.022} \approx -0.41[/latex]

The negative sign indicates a decrease. Therefore, we conclude that taking aspirin results in a 41% reduction in risk relative to the placebo group.
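The calculation above starts from risks that were already rounded to three decimal places. As a rough sketch of how much that rounding matters, the following Python snippet computes the percentage change twice: once from the rounded risks and once directly from the table counts. The function name `percentage_change` is a hypothetical helper introduced for illustration.

```python
def percentage_change(new_risk, reference_risk):
    """Return (new_risk - reference_risk) / reference_risk as a proportion."""
    return (new_risk - reference_risk) / reference_risk


# Using the risks rounded to three decimal places, as in the text.
change_rounded = percentage_change(0.013, 0.022)

# Using the unrounded risks computed directly from the table counts.
change_unrounded = percentage_change(139 / 11_037, 239 / 11_034)

print(f"rounded inputs:   {change_rounded:.1%}")
print(f"unrounded inputs: {change_unrounded:.1%}")
```

With unrounded inputs the reduction comes out about one percentage point larger (roughly 42% instead of 41%), so when precision matters it is safer to round only the final answer.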

As reported in the New England Journal of Medicine, “This trial of aspirin for the primary prevention of cardiovascular disease demonstrates a conclusive reduction in the risk of myocardial infarction (heart attack).” (Source: “Final Report on the Aspirin Component of the Ongoing Physicians’ Health Study,” New England Journal of Medicine 321(3):129–35, 1989.)

Comment

In the preceding example, we compared the difference in risk (how much the risk changed) to the risk for the placebo (nontreatment) group:

[latex]\text{percentage reduction of risk} = \frac{\text{new treatment risk} - \text{placebo risk}}{\text{placebo risk}}[/latex]

In general, we are interested in determining how much a new treatment reduces the risk compared to a reference risk. The reference may be nontreatment (e.g., use of a placebo), or it could be an existing treatment that we hope to improve on. So we have:

[latex]\text{percentage reduction of risk} = \frac{\text{new treatment risk} - \text{reference risk}}{\text{reference risk}}[/latex]

The following table is used for the next Try It activity.

| | Nonfatal | Fatal | Row Totals |
|---|---|---|---|
| Seat Belt | 412,368 | 510 | 412,878 |
| No Seat Belt | 162,527 | 1,601 | 164,128 |
| Column Totals | 574,895 | 2,111 | 577,006 |
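One way to apply the percentage-change formula to this table is sketched below in Python. The names `outcomes`, `risk`, and `percentage_change` are hypothetical, and treating the No Seat Belt row as the reference group is an assumption made for this sketch, not something stated in the table.

```python
# Counts from the seat belt table above.
outcomes = {
    "seat_belt": {"nonfatal": 412_368, "fatal": 510},
    "no_seat_belt": {"nonfatal": 162_527, "fatal": 1_601},
}


def risk(row, outcome):
    """Conditional probability of `outcome` within one row of a two-way table."""
    return row[outcome] / sum(row.values())


def percentage_change(new_risk, reference_risk):
    """Change in risk relative to the reference risk, as a proportion."""
    if reference_risk <= 0:
        raise ValueError("reference risk must be positive")
    return (new_risk - reference_risk) / reference_risk


# Assumption for this sketch: not wearing a seat belt is the reference group.
risk_belt = risk(outcomes["seat_belt"], "fatal")
risk_no_belt = risk(outcomes["no_seat_belt"], "fatal")
change = percentage_change(risk_belt, risk_no_belt)

print(f"P(fatal | seat belt)    = {risk_belt:.5f}")
print(f"P(fatal | no seat belt) = {risk_no_belt:.5f}")
print(f"percentage change       = {change:.1%}")
```

A negative percentage change here would mean that the risk of a fatal outcome is lower in the seat belt group than in the reference group.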

Try It

Let’s summarize our work with probability. We defined three kinds of probabilities related to a two-way table.

  • A marginal probability is the probability of a categorical variable taking on a particular value without regard to the other categorical variable. For example, P(Health Sciences) is the probability that a student is enrolled in the Health Sciences program. In calculating the probability, we use overall student data contained in the margins of the table. We do not take into account the other categorical variable: gender.
  • A conditional probability is the probability of a categorical variable taking on a particular value given the condition that the other categorical variable has some particular value. For example, P(Health Sciences given female) is the probability that a student is enrolled in Health Sciences given that we know the student is female. In calculating the probability, we use only a subset of the data. The subset used is determined by the given condition: if our condition relates to female students, then we consider only the information in the table pertaining to females.
  • A joint probability is the probability that the two categorical variables each take on a specific value. For example: P(male and Info Tech) is the probability that a student is both a male and in the Info Tech program. In calculating this probability, we divide the count in one inner cell of the table by the overall total count (in the lower right corner).
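The three definitions differ only in which counts go in the numerator and denominator. The program and gender table used in the examples above is not reproduced on this page, so the following Python sketch illustrates the same three calculations with the Physicians’ Health Study counts instead. The variable names are illustrative choices.

```python
# Physicians' Health Study counts from earlier on this page.
table = {
    ("aspirin", "heart_attack"): 139,
    ("aspirin", "no_heart_attack"): 10_898,
    ("placebo", "heart_attack"): 239,
    ("placebo", "no_heart_attack"): 10_795,
}

total = sum(table.values())  # overall total (lower right corner of the table)

# Marginal: uses a column total and the overall total, ignoring treatment.
heart_attack_total = sum(
    n for (_, outcome), n in table.items() if outcome == "heart_attack"
)
p_marginal = heart_attack_total / total

# Conditional: restricts attention to the aspirin row only.
aspirin_total = sum(
    n for (treatment, _), n in table.items() if treatment == "aspirin"
)
p_conditional = table[("aspirin", "heart_attack")] / aspirin_total

# Joint: one inner cell divided by the overall total.
p_joint = table[("aspirin", "heart_attack")] / total

print(f"P(heart attack)             = {p_marginal:.4f}")
print(f"P(heart attack | aspirin)   = {p_conditional:.4f}")
print(f"P(aspirin and heart attack) = {p_joint:.4f}")
```

Note that the conditional and joint probabilities share the same numerator (the inner cell count of 139); only the denominator changes.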

When we calculate the probability of a negative outcome like a heart attack, we often refer to the probability as a risk. We compare risk by calculating the percentage change:

[latex]\text{percentage reduction of risk} = \frac{\text{new treatment risk} - \text{reference risk}}{\text{reference risk}}[/latex]

 

Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
