Statistics & Probability

This post contains resources for the talk "Using Statistics in Mathematics Classes" given by Jason Vitosh (Falls City High School, Falls City, NE) and myself at the Midwest Regional Noyce Conference on Thursday, October 2 from 2:15 pm - 3:00 pm.



Click on the link below to access the presentation file containing resources, images, and links.

10-2-14 Stats is not Math












My wife is a pharmacist at a hospital. She and her co-workers often provide me with some really interesting math problems. Clinical pharmacists often need mathematics to effectively work with complex patient cases. Modeling drug interactions becomes tricky as the number of prescriptions for a particular patient increases. As the number of prescriptions increases, the amount of time a pharmacist must spend doing drug interaction research also increases. Time is money, so herein lies the problem.

One of my wife’s co-workers asked her to ask me, “What is the number of possible permutations of seven days?” After looking at a handwritten note and talking to the pharmacist directly, here’s a more thorough description of the problem.

Coumadin is a drug used to treat issues associated with blood clots.

Risks come with any anti-clotting drug. If the patient has a car accident, for example, bleeding risk increases dramatically and can have dire consequences. The benefits associated with Coumadin usage must be weighed against the bleeding risk. Computing the correct treatment scheme – a schedule and selection of doses – is important to the patient's safety. Doctors and pharmacists determine dosage based on a target value: an INR target of 2.5 with a target range of 2.0 – 3.0 (a confidence interval!!). If the INR goes above 4.0, there is no greater therapeutic benefit to the dose, and patient bleeding risk increases beyond any benefit.

Dosages for this particular drug vary dependent on many factors. However, for the sake of this problem, the pharmacist in question wishes to investigate how much time it will take to write an Excel spreadsheet to determine the different possible treatment schemes. We will assume the simple case: either a patient takes a dose (pill) on a particular day, or a patient does not.

Back to the original question: why does the pharmacist ask for possible "permutations" of the days of the week? Because these would correspond to patient dosage schedules. For example, if the patient takes a dose three days a week, they might take that dose on Monday – Tuesday – Wednesday and not take a pill the rest of the days of the week. Or they might take a dose Tuesday – Thursday – Saturday. These possibilities do not correspond to permutations, however, despite the wording of the original question. What we really need to consider are combinations. Permutation means order matters, so we would treat Tuesday – Thursday – Saturday and Thursday – Tuesday – Saturday as different events, when in reality they would be the same treatment schedule in a given week.

If the patient takes a dosage of Coumadin three days a week, then the possible number of treatment schedules would be 7 choose 3:

C(7, 3) = 7! / (3! · 4!) = 35

There are 35 different possible ways to choose 3 days out of the 7 days in a week.
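A one-line check in Python confirms the count:

```python
from math import comb

# Ways to choose 3 dose days from the 7 days of the week
print(comb(7, 3))  # 35
```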

How would we count the number of possible treatment schedules assuming a patient either takes a dose or does not take a dose each day? We need to consider all the ways to select different groupings of 7 days. Enter Pascal's Triangle & binomial coefficients.

                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      1   5  10  10   5   1
    1   6  15  20  15   6   1
  1   7  21  35  35  21   7   1

The bottom row above corresponds to each case. 7 choose 0 would correspond to a patient not taking any Coumadin. There's only one way for that to happen: the patient takes no doses. This would be the trivial case. We aren't concerned with the patients not taking any doses. The sum of the values in the bottom row – 1, 7, 21, 35, 35, 21, 7, and 1 – is 128.

Written a little more formally, we have

C(7,1) + C(7,2) + C(7,3) + C(7,4) + C(7,5) + C(7,6) + C(7,7) = 2^7 - 1 = 127

This value, 127, is 128 - 1, which is 2^7 - 1, where the 2 corresponds to the number of daily outcomes, either dose or no dose. The 7 corresponds to the number of days in a week, and the subtracted 1 corresponds to the trivial case where, of the 7 days, the patient takes doses on 0 days.
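We can verify both the sum and the closed form with a quick script:

```python
from math import comb

# Nontrivial schedules: C(7,1) + C(7,2) + ... + C(7,7)
total = sum(comb(7, k) for k in range(1, 8))
print(total)  # 127

# Closed form: 2 daily outcomes (dose / no dose) over 7 days,
# minus the 1 trivial all-"no dose" schedule
print(2**7 - 1)  # 127
```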

(Tangential side note: This value, 127, is the difference of a power of 2 and 1. I am immediately reminded of the original Legend of Zelda game on the Nintendo Entertainment System. The maximum number of rupees – currency – a player could obtain is 255, which is 256 – 1, and seems to be related to binary storage limitations of the game.)








This was the answer the pharmacist needed to communicate the number of different outcomes his Excel spreadsheet would need to consider. Some of the treatment schemes might be impractical, so instead of considering 127 different possibilities, he will argue they should boil the cases down to 10 or 12 common dosage schemes.

But for me, the math doesn’t stop there.

What if a person is taking a drug with a more elaborate dosing scheme? I wanted to put a structure on the next case up in complexity, the case where a patient might take one of two different doses on any given day. Here is the structure I used when reasoning through this case initially.

Case 2: Two different dosages on any given day
A = 5 mg dose of a drug (an arbitrary concentration)
B = 2.5 mg dose of a drug

1 = Monday
2 = Tuesday
3 = Wednesday
4 = Thursday
5 = Friday
6 = Saturday
7 = Sunday

A patient might miss a dose. Let C = no dose. Or, they might be instructed to take nothing on a particular day. Either way, the patient takes no dose. The table below describes all the cases for a week.


To count all the possible outcomes efficiently, note we have 3 independent choices each day. (We will treat each day's choice as independent, though this may or may not be practically true.) The total number of possible treatment schedules, then, would be

3^7 = 2,187

But we would also throw out the trivial case (no treatments on any day) by subtracting 1. So our total number of possible treatment schedules would be 2,186.

I then wondered if I could write a function to count the number of treatment schedules for any possible number of different dosages:

f(d, t) = (d + 1)^t - 1

where
d = the number of different dosages a patient may take
t = the number of days in the timeframe of reference (7 above for days in a week; we could change this to 30 for days in a given month)

The d + 1 counts each day's outcomes – each of the d dosages, or no dose – and the subtracted 1 removes the trivial schedule with no doses at all.

The number of cases grows quickly as the number of different dosages increases.
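The growth is easy to see with a short function (a sketch in Python; the function name is my own):

```python
def treatment_schedules(d: int, t: int) -> int:
    """Count nontrivial treatment schedules.

    d = number of different dosages a patient may take each day
    t = number of days in the timeframe of reference

    Each day offers d + 1 outcomes (one of the d dosages, or no
    dose); subtracting 1 removes the trivial all-"no dose" case.
    """
    return (d + 1) ** t - 1

# Sanity checks against the cases worked above
print(treatment_schedules(1, 7))  # 127
print(treatment_schedules(2, 7))  # 2186

# Growth over a week as the number of dosages increases
for d in range(1, 6):
    print(d, treatment_schedules(d, 7))
```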







This problem is a great example of mathematics used in an authentic setting. I can extend this problem to exponential decay and dosing with antibiotics. Kids need to know there's a reason a person needs to take an entire course of antibiotics, even if they are feeling better midway through the treatment course.

I will pose this question to my students when we start our unit on counting theory this coming school year. I need to spend more time this summer finding authentic applications locally.

Type I and Type II error are concepts my students have struggled with year after year. This school year, I decided to do something different in AP Stats.

KNEB news story: "Rabies Outbreak Soars in Goshen County"
(this particular county is very close to where we live in western Nebraska)

The news story above was posted the night before I was slated to introduce Type I and Type II error for the first time. The text of the news story caught my eye because something, literally, didn't add up.

...since February 7th of this year, out of 19 dead skunks collected, 16 tested positive for rabies. They also had 1 red fox test positive for the disease... Mills will be retiring next month and has been at the lab since 1984 and has never seen, even in an outbreak situation, this high of a percentage rate of positive samples at 89.5%...

It is interesting that the reported percentage, 89.5%, does not match the fraction 16/19 (which is approximately 84.2%). The 89.5% figure is a computational error. The person computing it included the rabid red fox in the numerator, despite the fact the red fox is a different species, and did not increase the denominator by one. The 89.5% comes from the fraction 17/19, which is obviously not valid. (We had a nice class discussion over whether this was a computational error or a purposeful error meant to sensationalize the news story.)
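A quick arithmetic check confirms where the 89.5% came from:

```python
# The skunk figure alone vs. the figure with the red fox added to
# the numerator (but not the denominator)
print(round(16 / 19 * 100, 1))  # 84.2
print(round(17 / 19 * 100, 1))  # 89.5
```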

When I first saw this story, I decided to scrap what I had planned for the day in AP Stats and made this worksheet in the planning period I had right before class.

APS Rabies Outbreak 3-7-14

I spent about 45 minutes trolling the internet - in particular, the USDA and CDC websites - trying to come up with count data for how many animals are tested for rabies in Wyoming in a given year. My searches took me here and here. I found data from 2010: 32 wild animals tested positive for rabies in Wyoming that year - 12 were bats and 20 were skunks - along with 2 cattle. I could not locate more recent data or 2010 data on the total number of animals tested, which posed a problem, because we are really interested in the total number of animals tested to inform us on what proportion of rabies cases we expect. I wanted to reverse engineer the actual percentage of rabid skunks we expect in Wyoming... how high does the percentage of positive tests have to be before we release such a warning to the public? 50%? 60%? 75%?

[side note: I learned a great deal about how rabies spreads and which animals are typically affected. See the map below.]









U.S. map showing which animal is the most frequent rabies carrier by region [source: CDC]

I asked the students to use their TI-84 calculator and, through trial and error, to determine the population proportion value p (the actual proportion of rabid skunks) that would lead to NOT rejecting the null hypothesis for each alpha level. Here's a slide with the work a student wrote on the board:



I had students use trial and error, with the TI-84 one-proportion z-test, to determine what the decision would be (whether we would reject or fail to reject the null hypothesis) for different values of the assumed population proportion p.

For example, with an alpha level of .05 and assuming the population proportion of all skunks that have rabies is 50%, we would reject the null hypothesis (that the population proportion is 50%) and conclude in favor of the alternative - that the proportion of skunks with rabies is likely higher than 50% - since we expect to see a sample this extreme (16 out of 19 skunks rabid) only about 14 times out of every 10,000 samples due to chance alone.

Through trial and error, my students found we would not reject the null hypothesis given a sample of 16 out of 19 skunks being rabid if the actual proportion of rabid skunks was 66.37% or higher. 2/3 of all skunks being rabid is pretty scary to think about, but if we are trying to determine why the USDA released a rabies warning, this is useful information for us.
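The same trial and error can be scripted (a sketch using a one-sided one-proportion z-test, as on the TI-84; the 0.0001 step size is my choice, so a finer search lands a hair above the students' value):

```python
from math import sqrt
from statistics import NormalDist

def p_value(successes: int, n: int, p0: float) -> float:
    """Upper-tail p-value for a one-proportion z-test,
    as in the TI-84's 1-PropZTest with Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)

# 16 of 19 skunks rabid, null p0 = 0.50: reject at alpha = .05
print(p_value(16, 19, 0.50))  # ≈ 0.0014, the "14 in 10,000"

# Scripted trial and error: step p0 up until we fail to reject
p0 = 0.50
while p_value(16, 19, p0) < 0.05:
    p0 += 0.0001
print(round(p0, 4))  # ≈ 0.664, right where the students landed
```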

I had students formulate the idea of a Type I error (mistakenly rejecting a true null hypothesis) and a Type II error (mistakenly failing to reject a false null hypothesis) in the context of this problem through discussion.

Type I error would mean concluding the proportion of skunks with rabies is greater than the assumed proportion when in fact the assumed proportion is true. This would mean potentially raising a false alarm, or releasing an outbreak alert when one is not needed.

Type II error would mean concluding the proportion of skunks with rabies is the assumed proportion when in fact the actual proportion of skunks with rabies is higher. This would mean not alerting the public to a potential outbreak when in fact an outbreak is actually going on.

Students collectively agreed Type II error is more serious in this setting. Through this activity, the students discovered the decision they should make about the alpha level: accept a larger chance of a Type I error in exchange for a smaller chance of a Type II error.

My students breezed through the exam covering Type I and Type II error for the first time in my career. 🙂

When I first introduce experimental design to students in AP Stats, we consider the diagram below.

A fun exercise in stats class is to locate stories in the media and comment on how the story's author interprets findings. Here's one such example from my morning Internet surfing.

Why Being Tired Can Make You Thinner and Healthier
Source: Yahoo News, Shine section

The following passage is taken from the article:

The urge to take care of our bodies when we're sleepy could be biological. “We proposed that people are more motivated to engage in healthful behavior when they are depleted and perceive their safety to be at stake," wrote study authors Monika Lisjak, assistant professor of marketing at Erasmus University in the Netherlands and Angela Y. Lee, professor of marketing at Northwestern University.

In the study, researchers asked subjects to read about the dangers of kidney disease and early detection, those with a family history included. Afterward, those who were feeling exhausted expressed a higher likelihood of being tested than their energized counterparts. In another study, subjects were asked to complete a survey on health and fitness, either before or after hitting the gym. After the survey, everyone was told to choose a gift of either sunblock or moisturizer. Those who had worked out were more likely to select the skin-saving sunblock.

Questions for students:

  • Does the treatment suggest participants will get 'thinner' when tired?
  • What is the author's motivation behind the wording in the article's title?
  • What is the author's definition of 'healthier'?
  • Could we design further research to confirm or disconfirm the author's claims?
  • What implications do the results have on our daily lives?
  • Is the source trustworthy? How would we make this decision?
  • What would we need to know about the sampling method to make an informed decision about the results of this study?
  • What would we need to know about the population of interest to make an informed decision about the results of this study?


The following problem is what I use each year I introduce the notion of z-scores and converting position values on a Normal distribution with mean μ and standard deviation σ [ N(μ, σ) ] to scores on the standard Normal distribution with mean 0 and standard deviation 1 [ N(0, 1) ].

We use YMS The Practice of Statistics, 3rd edition, in the AP Stats class I teach. Standardized scores make their first appearance in Chapter 2. We cover this content in early September, a time when many college-bound students are busy filling out college applications, preparing resumes, and requesting letters of reference. Since our school is in the Midwest, virtually all students are familiar with the ACT. Few know about the SAT; in particular, few know the maximum possible score on sections of the SAT. This activity also leads to a nice thought experiment, where the students must put themselves in the shoes of scholarship committee members making decisions that affect students' lives.

Here's what the example above looks like worked out on the Promethean board:

This decision is pretty simple to make. The student with an ACT math score of 33 has a relative performance far more impressive than the student with the SAT math score of 705. I start with this example because the values are fairly clean and the decision is easy to make.

But what happens when the computations reveal values that do not yield an 'easy' decision? Here's the example I use to immediately follow the ACT vs SAT issue.

I like this example because it requires the students to reflect on the choice they will make about units of length. Should we convert the feet & inches measurements to decimal feet? Or to inches? Many students choose to use inches. I show students the problem and put three minutes on a countdown timer.

Then I circulate the room as students work through the problem. I listen carefully for the discussion, for the argument of which student would be the better scholarship candidate. I randomly select a student to go to the front of the room to show the work they did and to explain their thinking. An example of some student work is below.

Listening to the students argue about this decision is fascinating. Some will insist that because the female has a z-score that is a whole unit higher (5.11 versus 4.109), the female deserves the scholarship.

Others will argue that because the normalcdf command on the TI-84 yields the same value to four decimal places, it does not matter which candidate we choose; both are equally good. (The four-decimal claim comes from our default rounding convention: to the nearest ten-thousandth when not otherwise specified.)

Another school of thought amongst the students is that because the procedure does not yield a clear result, further analysis is needed, such as academic performance, financial need, or an assessment of each athlete's moral character. These factors of consideration are student-centric.

I challenge students to think from the perspective of the athletic team or the institution. Perhaps the conference is loaded with strong female athletes, so we need a strong female athlete to be competitive. Perhaps we need to choose the athlete whose family could provide more financial support in the event we have to split the scholarship value later. Many of these institution-centric considerations do not occur to the students naturally.

Using high quality problems like this one provides another hidden instructional benefit. I always have a conceptual hook on which to hang the process of standardizing scores. If I ask "Do you remember the process for standardizing scores on the Normal curve?" and get little to no positive responses, I can always quickly follow with, "Think back to the scholarship problem, where you had to compare two different candidates to see which candidate was better." This cuts down on the time I have to spend reteaching and allows us to be more efficient during class time.

As I look to summer when I have additional time to better my practice, this is one of the first problems I will look to film when I test the waters of 'flipping' the classroom.

A middle school teacher emailed me a really interesting problem earlier today:

I have a probability question for you that I hope you can answer. What is the probability of rolling 5 dice and getting 66553 in 3 attempts? It's not for any class or anything, a friend of mine wanted to know. (Apparently this is some bar room game of chance).

The situation reminded me of the game "Liar's Dice" featured in the movie Pirates of the Caribbean: Dead Man's Chest.

In the game, each player has five dice in a cup. I don't know whether Liar's Dice is the game the teacher references, but that's the first thing that came to mind. On to the problem...

Each of the five dice can show one of the values from {1, 2, 3, 4, 5, 6}. We can think about all five dice being different colors. That makes the outcome 66553 different from 65653. Thinking this way will help us count all possible outcomes.
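Before reading the full write-up, we can check the counting by brute force (a sketch assuming "getting 66553" means rolling those five values in any order, and "in 3 attempts" means at least once):

```python
from itertools import product
from fractions import Fraction

# Treat the five dice as distinguishable (different colors) and
# enumerate all 6^5 = 7776 equally likely outcomes
target = sorted((6, 6, 5, 5, 3))
hits = sum(1 for roll in product(range(1, 7), repeat=5)
           if sorted(roll) == target)
print(hits)  # 30 arrangements of the multiset {6, 6, 5, 5, 3}

p_single = Fraction(hits, 6**5)   # 30/7776 = 5/1296
p_three = 1 - (1 - p_single)**3   # at least once in 3 attempts
print(p_single, float(p_three))
```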

I have a PDF of the solution I wrote up to this problem below. I plan to use this problem the next time I review independent events, dependent events, and mutually exclusive events in stats class.

Ways to Win the Dice Game Solution



See the PDF for the continuation of the solution I wrote. Please feel free to comment below.

I have a confession to make. When I first started teaching AP Statistics in 2005, I had no idea why a Normal probability plot (an example is shown to the left) was important... or what it told us about data. I was busy trying to stay a day ahead of students that first year. I never really sat down with several textbooks to compare definitions and examples as I probably should have. Simply put, when students asked, I told them the canned answer: "The more linear the plot is, the more 'Normal' the data is." We'd use the calculator to make the plot, look at it, and move on.

Let's take a closer look at why we study a Normal probability plot in AP Statistics. I will do some borrowing from various discussion board posts of the past on the AP Stats forum and will add some commentary as we go.

First, consider the method we use to compute a z-score; that is, a positional score for Normally distributed data that indicates the number of standard deviation units above or below the mean a particular data point lies. For example, if z = -1.2, then the data point is 1.2 standard deviations below the mean. It makes sense that a standardized score [ z = (x - μ)/σ ] depends on two things: the data value's physical distance from the mean *and* that distance tempered by a measure of spread, specifically the standard deviation. Let's isolate x in this equation to see what happens.

z = (x - μ)/σ
zσ = x - μ
x = μ + zσ
The algebra above is commonly used in problems where we are asked to find a score which corresponds to a particular percentile rank. For example, if the mean score of the ACT is 18, and the standard deviation is 6, then what composite score puts a student in the 70th percentile of all test takers that day? A score slightly north of 21, as shown below.

invNorm(.70) = .5244005101
x = 18 + (.5244005101)(6) ≈ 21.15

The invNorm command above finds the z-score corresponding to a cumulative area of .70 under the standard Normal curve, which has mean 0 and standard deviation 1. We see a z-score of .5244005101, according to the TI-84, gives the position for a data point in the 70th percentile. We can then reverse engineer the score needed to fall into this percentile.
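The same computation can be scripted; Python's NormalDist.inv_cdf plays the role of the TI-84's invNorm:

```python
from statistics import NormalDist

# invNorm(.70): the z-score with cumulative area .70 to its left
z = NormalDist().inv_cdf(0.70)
print(round(z, 4))  # 0.5244

# Reverse engineer the ACT composite score: x = mu + z*sigma
mu, sigma = 18, 6
print(round(mu + z * sigma, 2))  # 21.15
```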

In the world outside school, it's usually not likely we know the actual value of σ, the population standard deviation, or μ, the actual population mean. As unbiased estimators of these unknown values, we use x̄, the sample mean, in place of μ, and we use s, the sample standard deviation, in place of σ. Then the value of x looks like x = x̄ + zs. Technically, once we make the substitutions, we would really be using a t-distribution of some flavor to model the data. On the other hand, in the example below, since we can get data on every qualified point guard in the NBA as of right now, we can directly compute the mean and standard deviation for the entire population, making this substitution unnecessary in this case. However, students need to be aware of the need for t-procedures.

To show an example of a Normal probability plot, I pulled NBA data from ESPN regarding point guard performance thus far in the 2013-14 regular season. Let's take a look at the top 26 (since there's a tie for 25th place) point guards in the NBA with respect to average points scored per game, the gray column labeled "PTS."


Let's enter the data from the table above in the TI-84.


Next, let's construct the Normal probability plot for the data.






So... what exactly does this plot represent? And what makes this representation so important? The x-values obviously correspond to the average points per game value for each point guard. What about the y-coordinate of each point on the graph? The y-coordinate corresponds to the z-score related to that particular x-value. In the screen shot above, Kemba Walker, the point guard with 18.6 points per game, has a z-score of approximately .7039. If the data followed exactly a Normal curve, then all the points on the above graph would lie exactly on a straight line. By looking at the z-score for each data point using this display, we can get a quick insight into whether the data are Normally distributed. Let's look at a boxplot for the same data:



We can see, in the plot above, the data for these 26 point guards have no outliers, but there appears to be some skewness. Computing (Max - Q3) = 4.4 and (Q1 - Min) = 10.8 - 9.3 = 1.5, and noting 4.4 > 1.5, demonstrates this skewness numerically. This argument doesn't take a lot of calculator kung fu, but we do have to perform an extra computation or two. Looking back at the Normal probability plot, we could use the image to notice the skewness immediately. Suppose we graphed the line given by the original z-score equation [ z = (x - μ)/σ ] on the same axes as the Normal probability plot. Take a look!


We only used 26 data points, so the data is a sample of the population of NBA point guards. Again, if the data were perfectly Normal, all the blue points would be living directly on the red line. We can use our knowledge of linear equations to see clearly what's going on here.

The red line representing the 'perfectly' Normal data has slope 1/4.271785124 - the reciprocal of the standard deviation. Let's find an equivalent value that's slightly more user friendly:

1/4.271785124 ≈ .2340941716

Expressed this way, notice we can say for every additional unit increase in x, the average points scored per game, we expect to see a z-score increase of .2340941716. Much like when we consider residuals while doing linear regression, when points deviate noticeably from the expected red line, they are surprising from the "Normal curve's point of view." The curvature at the left end of the Normal probability plot immediately indicates the skewness of the data. You can find more examples of this on your favorite search engine by searching for "Normal probability plot skewness." If we know how to visually recognize this pattern, we can immediately spot skewness of data using a Normal probability plot.

This connection between the Normal distribution and why its z-scores are linear has a pretty good explanation on the Wikipedia entry for "Standard score."
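To make the construction concrete, here is a sketch of how the y-coordinates of a Normal probability plot can be computed. The numbers below are made up (not the NBA data), and the (i + 0.5)/n plotting-position rule is one common convention; the TI-84's exact rule may differ slightly:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical points-per-game values (not the actual NBA data)
data = sorted([12.1, 13.4, 14.0, 14.9, 16.2, 18.6, 23.5])
n = len(data)

# y-coordinate for each sorted value: the z-score a perfectly
# Normal sample of size n would show at that rank, using the
# (i + 0.5)/n plotting-position convention
expected_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

for x, z in zip(data, expected_z):
    print(f"{x:5.1f}  {z:+.3f}")

# If the data were exactly Normal, these points would fall on the
# line z = (x - xbar)/s, whose slope is the reciprocal of s
xbar, s = mean(data), stdev(data)
print(f"line: z = (x - {xbar:.2f}) / {s:.2f}")
```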

People commonly use statistics like a drunk uses a lamppost:
for support, rather than for illumination.   - Mark Twain

Your tone of voice matters. Read the title of this post again... only this time, be really enthusiastic when your internal voice reads "Rules."

In my experience as a high school statistics teacher, probability is a topic that can prove vexing to even the 'best' students. The purpose of this post is for me to collect my thoughts on some things I am noticing in both my non-AP Statistics classes and in my AP Statistics classes. This may look like a random hodge-podge of my thoughts on teaching probability concepts... and it is. Empirically I have noticed a few things I think contribute to some of the misconceptions.

  • When we introduce the Binomial setting, we talk about two possible outcomes. We often classify these outcomes as 'success' or 'failure.' However, if the context of the problem is mortality rates attributed to malaria, we wouldn't really consider death a 'successful' outcome. We would include a death as an outcome of interest if we were considering the proportion of deaths in the probability space attributed to malaria. This contributes to misunderstanding complementary events, too. I encourage students to think in terms of counting. What should we count? What should we not count? If we want to know the number of county residents that do NOT have red hair, what would be easier to count? Those without red hair? Or count those with red hair and take those away from the county population?
  • Student experience with Venn diagrams varies wildly. Some are skilled with drawing, labeling, and interpreting Venn diagrams; others not. My students often can draw a circle representing outcome A, and a circle representing outcome B, but they often forget to draw a boundary, like a rectangle, around the two circles to represent the probability space. Failing to draw some kind of boundary around the two circles causes students to forget to consider the case where neither A nor B occur. If the boundary is there, and either A or B are guaranteed to happen, then the student could simply write a "0" in the space outside A and outside B. [see above picture; no boundary given]. Students often struggle with labeling the probabilities corresponding to the interiors of each circle because they forget to address the outcome where both A and B occur simultaneously.
  • Many students ask about "probability formulas." Think about how complicated Bayes Theorem first appears to students. Personally, when I work problems of this type, I make a tree diagram and label the outcomes. Somewhere along the way, students convince themselves there is always a formula for a probability problem. I think this reliance on formula is even more amplified than the same formula reliance we see in algebra and geometry classes.
  • Students struggle with when to add probabilities and when to multiply probabilities. Some texts address these as the "General Addition Rule" or the "General Multiplication Rule." It's really interesting to watch students try to resolve cognitive dissonance - that is, when the student perceives their current understanding is disconfirmed by what they observe. One way I try to clarify this difference with students is to contrast disjoint outcomes with dependent outcomes. We go back to drawing Venn diagrams.
  • Think about dice problems <ahem> I mean, random number cube problems. Suppose we are rolling two fair dice. We want to know the theoretical probability we roll a sum of five. My suspicion about a problem like this is the majority of students do not write out all 36 outcomes. Some might argue there is no need to write out all 36 outcomes because one can simply consider (1,4), (2,3), (3,2) and (4,1) to arrive at a neat and tidy 4/36... but I think this is a source of struggle for students that have a poor understanding of probability and counting. Many things might contribute to this avoidance of writing out all the possibilities. I imagine conserving class time and covering content might be reasons a teacher would avoid writing out all the cases... but the sample space provides the opportunity for rich discussion of many possible questions, not just the initial question.
  • Students carry preconceived notions of equations and inequality statements. Try asking your high school students to write out a statement like
    1/2 + x = (2/7)x + (6/7) = (-4/3)x + (5/3)
    and ask students what they would do first to solve for x [the image of the graph of these three linear equations is below; I fabricated this example by picking a simple linear equation and crafting two additional lines where all three lines intersect at a common point]. Students have a similar lack of understanding of inequality statements. For example, suppose we have some random variable that is Normally distributed with mean 100 and standard deviation 15. Then the reason P(X < 85) and P(Z < -1) are equivalent is because we performed the same algebraic transformation to each quantity we originally compared. This transformation standardizes the original values and places them on a distribution with mean 0 and standard deviation 1. What I want my students to notice is when we perform the same algebraic transformations on the 'sides' of an inequality statement, the truth or falsehood of the statement remains unchanged.
  • Students also have trouble for similar reasons with statements where a random variable takes on a value between two endpoints. Suppose with the above distribution we consider P(70 < X < 85) and P(-2 < Z < -1) and why these two statements are equivalent.
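The dice bullet above is easy to check by writing out the whole sample space in code, and the standardizing claim reduces to arithmetic:

```python
from itertools import product

# All 36 equally likely outcomes for two distinguishable fair dice
sample_space = list(product(range(1, 7), repeat=2))
print(len(sample_space))  # 36

# Outcomes with a sum of five: (1,4), (2,3), (3,2), (4,1)
favorable = [roll for roll in sample_space if sum(roll) == 5]
print(len(favorable))  # 4

# Standardizing preserves inequalities: with X ~ N(100, 15),
# P(X < 85) = P(Z < (85 - 100)/15) = P(Z < -1)
print((85 - 100) / 15)  # -1.0
```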

A different perspective on the probability matter comes from one of my favorite books, Mind Hacks: Tips & Tools for Using Your Brain. The following is taken from pages 294 and 295. I encourage you to check out some of the practical knowledge in this book. Stafford and Webb offer terrific explanations for some of the rational and irrational things the human brain does.

Our ability to think about probabilities evolved to keep us safe from rare events that would be pretty serious if they did happen (like getting eaten) and to help u