
Prob........................, Lecture notes on Probability

Type: Lecture notes

2023

Shared on 15/03/2023

Uploaded by adrian-castro-castroalves-tv


Random variables and their distributions

In this chapter, we introduce random variables, an incredibly useful concept that simplifies notation and expands our ability to quantify uncertainty and summarize the results of experiments. Random variables are essential throughout the rest of this book, and throughout statistics, so it is crucial to think through what they mean, both intuitively and mathematically.

3.1 Random variables

To see why our current notation can quickly become unwieldy, consider again the gambler’s ruin problem from Chapter 2. In this problem, we may be very interested in how much wealth each gambler has at any particular time. So we could make up notation like letting Ajk be the event that gambler A has exactly j dollars after k rounds, and similarly defining an event Bjk for gambler B, for all j and k.

This is already too complicated. Furthermore, we may also be interested in other quantities, such as the difference in their wealths (gambler A’s minus gambler B’s) after k rounds, or the duration of the game (the number of rounds until one player is bankrupt). Expressing the event “the duration of the game is r rounds” in terms of the Ajk and Bjk would involve a long, awkward string of unions and intersections. And then what if we want to express gambler A’s wealth as the equivalent amount in euros rather than dollars? We can multiply a number in dollars by a currency exchange rate, but we can’t multiply an event by an exchange rate.

Instead of having convoluted notation that obscures how the quantities of interest are related, wouldn’t it be nice if we could say something like the following?

Let Xk be the wealth of gambler A after k rounds. Then Yk = N − Xk is the wealth of gambler B after k rounds (where N is the fixed total wealth); Xk −Yk = 2Xk −N is the difference in wealths after k rounds; ckXk is the wealth of gambler A in euros after k rounds, where ck is the euros per dollar exchange rate after k rounds; and the duration is R = min{n : Xn = 0 or Yn = 0}.

The notion of a random variable will allow us to do exactly this! It needs to be introduced carefully though, to make it both conceptually and technically correct. Sometimes a definition of “random variable” is given that is a barely paraphrased


104 Introduction to Probability

version of “a random variable is a variable that takes on random values”, but such a feeble attempt at a definition fails to say where the randomness comes from. Nor does it help us to derive properties of random variables: we’re familiar with working with algebraic equations like x^2 + y^2 = 1, but what are the valid mathematical operations if x and y are random variables? To make the notion of random variable precise, we define it as a function mapping the sample space to the real line. (See the math appendix for review of some concepts about functions.)

FIGURE 3.1. A random variable maps the sample space into the real line. The r.v. X depicted here is defined on a sample space with 6 elements, and has possible values 0, 1, and 4. The randomness comes from choosing a random pebble according to the probability function P for the sample space.

Definition 3.1.1 (Random variable). Given an experiment with sample space S, a random variable (r.v.) is a function from the sample space S to the real numbers R. It is common, but not required, to denote random variables by capital letters.

Thus, a random variable X assigns a numerical value X(s) to each possible outcome s of the experiment. The randomness comes from the fact that we have a random experiment (with probabilities described by the probability function P); the mapping itself is deterministic, as illustrated in Figure 3.1. The same r.v. is shown in a simpler way in the left panel of Figure 3.2, in which we inscribe the values inside the pebbles.
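This definition can be made concrete with a small Python sketch (not from the text; the sample space and names are illustrative): the r.v. is a fixed, deterministic function, and only the choice of outcome s is random.

```python
import random

# Two fair coin tosses: the sample space and probability function P.
sample_space = ["HH", "HT", "TH", "TT"]   # outcomes s
P = {s: 1/4 for s in sample_space}        # each outcome equally likely

def X(s):
    """Number of Heads: the mapping s -> X(s) is fixed in advance."""
    return s.count("H")

# Before the experiment, X is just a function; after the experiment is
# performed and s is realized, X crystallizes into the number X(s).
s = random.choice(sample_space)           # perform the experiment
value = X(s)                              # observed value of X
```

The point of the sketch is that `X` itself contains no randomness at all; running the experiment twice may give different values only because s differs.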

This definition is abstract but fundamental; one of the most important skills to develop when studying probability and statistics is the ability to go back and forth between abstract ideas and concrete examples. Relatedly, it is important to work on recognizing the essential pattern or structure of a problem and how it connects


defined on the same sample space: the pebbles or outcomes are the same, but the real numbers assigned to the outcomes are different.

FIGURE 3.2. Two random variables defined on the same sample space.

As we’ve mentioned earlier, the source of the randomness in a random variable is the experiment itself, in which a sample outcome s ∈ S is chosen according to a probability function P. Before we perform the experiment, the outcome s has not yet been realized, so we don’t know the value of X, though we could calculate the probability that X will take on a given value or range of values. After we perform the experiment and the outcome s has been realized, the random variable crystallizes into the numerical value X(s).

Random variables provide numerical summaries of the experiment in question. This is very handy because the sample space of an experiment is often incredibly complicated or high-dimensional, and the outcomes s ∈ S may be non-numeric. For example, the experiment may be to collect a random sample of people in a certain city and ask them various questions, which may have numeric (e.g., age or height) or non-numeric (e.g., political party or favorite movie) answers. The fact that r.v.s take on numerical values is a very convenient simplification compared to having to work with the full complexity of S at all times.

3.2 Distributions and probability mass functions

There are two main types of random variables used in practice: discrete r.v.s and continuous r.v.s. In this chapter and the next, our focus is on discrete r.v.s. Con- tinuous r.v.s are introduced in Chapter 5.

Definition 3.2.1 (Discrete random variable). A random variable X is said to be discrete if there is a finite list of values a1, a2, ..., an or an infinite list of values a1, a2, ... such that P(X = aj for some j) = 1. If X is a discrete r.v., then the finite or countably infinite set of values x such that P(X = x) > 0 is called the support of X.

Most commonly in applications, the support of a discrete r.v. is a set of integers. In contrast, a continuous r.v. can take on any real value in an interval (possibly even the entire real line); such r.v.s are defined more precisely in Chapter 5. It is also possible to have an r.v. that is a hybrid of discrete and continuous, such as by flipping a coin and then generating a discrete r.v. if the coin lands Heads and generating a continuous r.v. if the coin lands Tails. But the starting point for understanding such r.v.s is to understand discrete and continuous r.v.s.

Given a random variable, we would like to be able to describe its behavior using the language of probability. For example, we might want to answer questions about the probability that the r.v. will fall into a given range: if L is the lifetime earnings of a randomly chosen U.S. college graduate, what is the probability that L exceeds a million dollars? If M is the number of major earthquakes in California in the next five years, what is the probability that M equals 0?

The distribution of a random variable provides the answers to these questions; it specifies the probabilities of all events associated with the r.v., such as the probability of it equaling 3 and the probability of it being at least 110. We will see that there are several equivalent ways to express the distribution of an r.v. For a discrete r.v., the most natural way to do so is with a probability mass function, which we now define.

Definition 3.2.2 (Probability mass function). The probability mass function (PMF) of a discrete r.v. X is the function pX given by pX (x) = P (X = x). Note that this is positive if x is in the support of X, and 0 otherwise.

⚠ 3.2.3. In writing P(X = x), we are using X = x to denote an event, consisting of all outcomes s to which X assigns the number x. This event is also written as {X = x}; formally, {X = x} is defined as {s ∈ S : X(s) = x}, but writing {X = x} is shorter and more intuitive. Going back to Example 3.1.2, if X is the number of Heads in two fair coin tosses, then {X = 1} consists of the sample outcomes HT and TH, which are the two outcomes to which X assigns the number 1. Since {HT, TH} is a subset of the sample space, it is an event. So it makes sense to talk about P(X = 1), or more generally, P(X = x). If {X = x} were anything other than an event, it would make no sense to calculate its probability! It does not make sense to write “P(X)”; we can only take the probability of an event, not of an r.v.
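A minimal sketch (names illustrative, not from the text) of how {X = x} is literally a set of outcomes whose probabilities we sum:

```python
# {X = x} is an event: the set of outcomes s with X(s) = x. Its probability
# is the sum of P(s) over that set, for the two-coin-toss example.
sample_space = ["HH", "HT", "TH", "TT"]
P = {s: 1/4 for s in sample_space}

def X(s):
    return s.count("H")      # number of Heads

def prob_X_equals(x):
    event = {s for s in sample_space if X(s) == x}   # {s in S : X(s) = x}
    return sum(P[s] for s in event)

# {X = 1} = {HT, TH}, so prob_X_equals(1) should be 2/4.
```

If {X = x} were not a set of outcomes, the sum in `prob_X_equals` would not even be defined, which mirrors the warning above.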

Let’s look at a few examples of PMFs.

Example 3.2.4 (Coin tosses continued). In this example we’ll find the PMFs of all the random variables in Example 3.1.2, the example with two fair coin tosses. Here are the r.v.s we defined, along with their PMFs:

  • X, the number of Heads. Since X equals 0 if TT occurs, 1 if HT or TH occurs,


Example 3.2.5 (Sum of die rolls). We roll two fair 6-sided dice. Let T = X + Y be the total of the two rolls, where X and Y are the individual rolls. The sample space of this experiment has 36 equally likely outcomes:

S = {(1, 1), (1, 2), ..., (6, 5), (6, 6)}.

For example, 7 of the 36 outcomes s are shown in the table below, along with the corresponding values of X, Y, and T. After the experiment is performed, we observe values for X and Y , and then the observed value of T is the sum of those values.

s        X   Y   X + Y
(1, 2)   1   2   3
(1, 6)   1   6   7
(2, 5)   2   5   7
(3, 1)   3   1   4
(4, 3)   4   3   7
(5, 4)   5   4   9
(6, 6)   6   6   12

Since the dice are fair, the PMF of X is

P(X = j) = 1/6

for j = 1, 2, ..., 6 (and P(X = j) = 0 otherwise); we say that X has a Discrete Uniform distribution on 1, 2, ..., 6. Similarly, Y is also Discrete Uniform on 1, 2, ..., 6.

Note that Y has the same distribution as X but is not the same random variable as X. In fact, we have P(X = Y) = 6/36 = 1/6.

Two more r.v.s in this experiment with the same distribution as X are 7 − X and 7 − Y. To see this, we can use the fact that for a standard die, 7 − X is the value on the bottom if X is the value on the top. If the top value is equally likely to be any of the numbers 1, 2, ..., 6, then so is the bottom value. Note that even though 7 − X has the same distribution as X, it is never equal to X in a run of the experiment!

Let’s now find the PMF of T. By the naive definition of probability,

P(T = 2) = P(T = 12) = 1/36,
P(T = 3) = P(T = 11) = 2/36,
P(T = 4) = P(T = 10) = 3/36,
P(T = 5) = P(T = 9) = 4/36,
P(T = 6) = P(T = 8) = 5/36,
P(T = 7) = 6/36.

For all other values of t, P(T = t) = 0. We can see directly that the support of T is {2, 3, ..., 12} just by looking at the possible totals for two dice, but as a check, note that P(T = 2) + P(T = 3) + · · · + P(T = 12) = 1, which shows that all possibilities have been accounted for. The symmetry property of T that appears above, P(T = t) = P(T = 14 − t), makes sense since each outcome {X = x, Y = y} which makes T = t has a corresponding outcome {X = 7 − x, Y = 7 − y} of the same probability which makes T = 14 − t.
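The PMF of T can be reproduced by brute-force enumeration of the 36 equally likely outcomes; a small Python sketch (illustrative, not from the text):

```python
from itertools import product
from fractions import Fraction

# PMF of T = X + Y for two fair dice, by direct enumeration of all 36
# equally likely outcomes (the naive definition of probability).
pmf_T = {}
for x, y in product(range(1, 7), repeat=2):
    t = x + y
    pmf_T[t] = pmf_T.get(t, Fraction(0)) + Fraction(1, 36)

# pmf_T now maps each total t in {2, ..., 12} to P(T = t), and the
# symmetry P(T = t) == P(T = 14 - t) can be checked directly.
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 is not clouded by floating-point rounding.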

FIGURE 3.4. PMF of the sum of two die rolls.

The PMF of T is plotted in Figure 3.4; it has a triangular shape, and the symmetry noted above is very visible. □

Example 3.2.6 (Children in a U.S. household). Suppose we choose a household in the United States at random. Let X be the number of children in the chosen household. Since X can only take on integer values, it is a discrete r.v. The probability that X takes on the value x is proportional to the number of households in the United States with x children. Using data from the 2010 General Social Survey [23], we can approximate the proportion of households with 0 children, 1 child, 2 children, etc., and hence approximate the PMF of X, which is plotted in Figure 3.5. □

We will now state the properties of a valid PMF.

Theorem 3.2.7 (Valid PMFs). Let X be a discrete r.v. with support x1, x2, ... (assume these values are distinct and, for notational simplicity, that the support is countably infinite; the analogous results hold if the support is finite). The PMF pX of X must satisfy the following two criteria:

  • Nonnegative: pX(x) > 0 if x = xj for some j, and pX(x) = 0 otherwise;
  • Sums to 1: ∑_{j=1}^∞ pX(xj) = 1.
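These two criteria translate directly into a validity check; a minimal sketch (the function name and dict representation are my own, assuming a PMF given as a finite `{value: probability}` mapping over the support):

```python
def is_valid_pmf(pmf, tol=1e-9):
    """Check Theorem 3.2.7's two criteria for a PMF given as {value: prob}.

    Values with probability 0 are simply omitted from the dict, so the
    'and 0 otherwise' part of the nonnegativity criterion is implicit.
    """
    nonnegative = all(p >= 0 for p in pmf.values())
    sums_to_one = abs(sum(pmf.values()) - 1) <= tol
    return nonnegative and sums_to_one
```

For example, the number-of-Heads PMF `{0: 0.25, 1: 0.5, 2: 0.25}` passes, while a mapping whose probabilities sum to 0.9 fails.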


3.3 Bernoulli and Binomial

Some distributions are so ubiquitous in probability and statistics that they have their own names. We will introduce these named distributions throughout the book, starting with a very simple but useful case: an r.v. that can take on only two possible values, 0 and 1.

Definition 3.3.1 (Bernoulli distribution). An r.v. X is said to have the Bernoulli distribution with parameter p if P (X = 1) = p and P (X = 0) = 1 − p, where 0 < p < 1. We write this as X ∼ Bern(p). The symbol ∼ is read “is distributed as”.

Any r.v. whose possible values are 0 and 1 has a Bern(p) distribution, with p the probability of the r.v. equaling 1. This number p in Bern(p) is called the parameter of the distribution; it determines which specific Bernoulli distribution we have. Thus there is not just one Bernoulli distribution, but rather a family of Bernoulli distributions, indexed by p. For example, if X ∼ Bern(1/3), it would be correct but incomplete to say “X is Bernoulli”; to fully specify the distribution of X, we should both say its name (Bernoulli) and its parameter value (1/3), which is the point of the notation X ∼ Bern(1/3).

Any event has a Bernoulli r.v. that is naturally associated with it, equal to 1 if the event happens and 0 otherwise. This is called the indicator random variable of the event; we will see that such r.v.s are extremely useful.

Definition 3.3.2 (Indicator random variable). The indicator random variable of an event A is the r.v. which equals 1 if A occurs and 0 otherwise. We will denote the indicator r.v. of A by IA or I(A). Note that IA ∼ Bern(p) with p = P (A).

We often imagine Bernoulli r.v.s using coin tosses, but this is just convenient language for discussing the following general story.

Story 3.3.3 (Bernoulli trial). An experiment that can result in either a “success” or a “failure” (but not both) is called a Bernoulli trial. A Bernoulli random variable can be thought of as the indicator of success in a Bernoulli trial: it equals 1 if success occurs and 0 if failure occurs in the trial. □

Because of this story, the parameter p is often called the success probability of the Bern(p) distribution. Once we start thinking about Bernoulli trials, it’s hard not to start thinking about what happens when we have more than one trial.

Story 3.3.4 (Binomial distribution). Suppose that n independent Bernoulli trials are performed, each with the same success probability p. Let X be the number of successes. The distribution of X is called the Binomial distribution with parameters n and p. We write X ∼ Bin(n, p) to mean that X has the Binomial distribution with parameters n and p, where n is a positive integer and 0 < p < 1. □

Notice that we define the Binomial distribution not by its PMF, but by a story about the type of experiment that could give rise to a random variable with a Binomial distribution. The most famous distributions in statistics all have stories which explain why they are so often used as models for data, or as the building blocks for more complicated distributions.

Thinking about the named distributions first and foremost in terms of their stories has many benefits. It facilitates pattern recognition, allowing us to see when two problems are essentially identical in structure; it often leads to cleaner solutions that avoid PMF calculations altogether; and it helps us understand how the named distributions are connected to one another. Here it is clear that Bern(p) is the same distribution as Bin(1, p): the Bernoulli is a special case of the Binomial.

Using the story definition of the Binomial, let’s find its PMF.

Theorem 3.3.5 (Binomial PMF). If X ∼ Bin(n, p), then the PMF of X is

P(X = k) = (n choose k) p^k (1 − p)^(n−k)

for k = 0, 1, ..., n (and P(X = k) = 0 otherwise).
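This formula transcribes directly into code (a sketch, not part of the text; Python's `math.comb` supplies the binomial coefficient):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), transcribing Theorem 3.3.5."""
    if k < 0 or k > n:
        return 0.0                      # outside the support
    return comb(n, k) * p**k * (1 - p)**(n - k)

# By the binomial theorem, summing over k = 0, ..., n gives 1.
```

For instance, `binomial_pmf(1, 2, 0.5)` recovers P(X = 1) = 1/2 for the two-fair-coin-toss example.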

⚠ 3.3.6. To save writing, it is often left implicit that a PMF is zero wherever it is not specified to be nonzero, but in any case it is important to understand what the support of a random variable is, and good practice to check that PMFs are valid. If two discrete r.v.s have the same PMF, then they also must have the same support. So we sometimes refer to the support of a discrete distribution; this is the support of any r.v. with that distribution.

Proof. An experiment consisting of n independent Bernoulli trials produces a sequence of successes and failures. The probability of any specific sequence of k successes and n − k failures is p^k (1 − p)^(n−k). There are (n choose k) such sequences, since we just need to select where the successes are. Therefore, letting X be the number of successes,

P(X = k) = (n choose k) p^k (1 − p)^(n−k)

for k = 0, 1, ..., n, and P(X = k) = 0 otherwise. This is a valid PMF because it is nonnegative and it sums to 1 by the binomial theorem. □

Figure 3.6 shows plots of the Binomial PMF for various values of n and p. Note that the PMF of the Bin(10, 1/2) distribution is symmetric about 5, but when the success probability is not 1/2, the PMF is skewed. For a fixed number of trials n, X tends to be larger when the success probability is high and lower when the success probability is low, as we would expect from the story of the Binomial distribution. Also recall that in any PMF plot, the sum of the heights of the vertical bars must be 1.

We’ve used Story 3.3.4 to find the Bin(n, p) PMF. The story also gives us a straightforward proof of the fact that if X is Binomial, then n − X is also Binomial.


Theorem 3.3.7. Let X ∼ Bin(n, p), and q = 1 − p (we often use q to denote the failure probability of a Bernoulli trial). Then n − X ∼ Bin(n, q).

Proof. Using the story of the Binomial, interpret X as the number of successes in n independent Bernoulli trials. Then n − X is the number of failures in those trials. Interchanging the roles of success and failure, we have n − X ∼ Bin(n, q). Alternatively, we can check that n − X has the Bin(n, q) PMF. Let Y = n − X. The PMF of Y is

P(Y = k) = P(X = n − k) = (n choose n−k) p^(n−k) q^k = (n choose k) q^k p^(n−k),

for k = 0, 1, ..., n. □
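As a numerical sanity check (not from the text; the values n = 7, p = 0.3 are illustrative), the PMF identity at the heart of this proof can be verified term by term:

```python
from math import comb, isclose

# Check of Theorem 3.3.7: if X ~ Bin(n, p) and q = 1 - p, then for each k,
# P(X = n - k) equals the Bin(n, q) PMF at k.
n, p = 7, 0.3
q = 1 - p
for k in range(n + 1):
    lhs = comb(n, n - k) * p**(n - k) * q**k   # P(X = n - k)
    rhs = comb(n, k) * q**k * p**(n - k)       # Bin(n, q) PMF at k
    assert isclose(lhs, rhs)
```

The equality holds because (n choose n−k) = (n choose k), which is exactly the symmetry the algebraic proof exploits.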

Corollary 3.3.8. Let X ∼ Bin(n, p) with p = 1/2 and n even. Then the distribution of X is symmetric about n/2, in the sense that P(X = n/2 + j) = P(X = n/2 − j) for all nonnegative integers j.

Proof. By Theorem 3.3.7, n − X is also Bin(n, 1/2), so

P(X = k) = P(n − X = k) = P(X = n − k)

for all nonnegative integers k. Letting k = n/2 + j, the desired result follows. This explains why the Bin(10, 1/2) PMF is symmetric about 5 in Figure 3.6. □

Example 3.3.9 (Coin tosses continued). Going back to Example 3.1.2, we now know that X ∼ Bin(2, 1/2), Y ∼ Bin(2, 1/2), and I ∼ Bern(1/2). Consistent with Theorem 3.3.7, X and Y = 2 − X have the same distribution, and consistent with Corollary 3.3.8, the distribution of X (and of Y) is symmetric about 1. □

3.4 Hypergeometric

If we have an urn filled with w white and b black balls, then drawing n balls out of the urn with replacement yields a Bin(n, w/(w + b)) distribution for the number of white balls obtained in n trials, since the draws are independent Bernoulli trials, each with probability w/(w+b) of success. If we instead sample without replacement, as illustrated in Figure 3.7, then the number of white balls follows a Hypergeometric distribution.

Story 3.4.1 (Hypergeometric distribution). Consider an urn with w white balls and b black balls. We draw n balls out of the urn at random without replacement, such that all (w+b choose n) samples are equally likely. Let X be the number of white balls in the sample. Then X is said to have the Hypergeometric distribution with parameters w, b, and n; we denote this by X ∼ HGeom(w, b, n). □


FIGURE 3.7. Hypergeometric story. An urn contains w = 6 white balls and b = 4 black balls. We sample n = 5 without replacement. The number X of white balls in the sample is Hypergeometric; here we observe X = 3.

As with the Binomial distribution, we can obtain the PMF of the Hypergeometric distribution from the story.

Theorem 3.4.2 (Hypergeometric PMF). If X ∼ HGeom(w, b, n), then the PMF of X is

P(X = k) = (w choose k)(b choose n−k) / (w+b choose n)

for integers k satisfying 0 ≤ k ≤ w and 0 ≤ n − k ≤ b, and P(X = k) = 0 otherwise.

Proof. To get P(X = k), we first count the number of possible ways to draw exactly k white balls and n − k black balls from the urn (without distinguishing between different orderings for getting the same set of balls). If k > w or n − k > b, then the draw is impossible. Otherwise, there are (w choose k)(b choose n−k) ways to draw k white and n − k black balls by the multiplication rule, and there are (w+b choose n) total ways to draw n balls. Since all samples are equally likely, the naive definition of probability gives

P(X = k) = (w choose k)(b choose n−k) / (w+b choose n)

for integers k satisfying 0 ≤ k ≤ w and 0 ≤ n − k ≤ b. This PMF is valid because the numerator, summed over all k, equals (w+b choose n) by Vandermonde’s identity (Example 1.5.3), so the PMF sums to 1. □
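The Hypergeometric PMF is also a one-liner in code; a sketch (function name mine, not from the text) using exact rational arithmetic, with Vandermonde's identity guaranteeing that the probabilities sum to 1:

```python
from math import comb
from fractions import Fraction

def hypergeom_pmf(k, w, b, n):
    """P(X = k) for X ~ HGeom(w, b, n), transcribing Theorem 3.4.2."""
    if k < 0 or k > w or n - k < 0 or n - k > b:
        return Fraction(0)              # draw is impossible
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

# For Figure 3.7's urn (w = 6, b = 4, n = 5), summing over k gives 1.
```

Trying the Figure 3.7 numbers, `hypergeom_pmf(3, 6, 4, 5)` is (6 choose 3)(4 choose 2)/(10 choose 5).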

The Hypergeometric distribution comes up in many scenarios which, on the surface, have little in common with white and black balls in an urn. The essential structure of the Hypergeometric story is that items in a population are classified using two sets of tags: in the urn story, each ball is either white or black (this is the first set of tags), and each ball is either sampled or not sampled (this is the second set of tags). Furthermore, at least one of these sets of tags is assigned completely at random (in the urn story, the balls are sampled randomly, with all sets of the correct size equally likely). Then X ∼ HGeom(w, b, n) represents the number of twice-tagged items: in the urn story, balls that are both white and sampled.


the second set of tags. Both X and Y count the number of white sampled balls, so they have the same distribution.

Alternatively, we can check algebraically that X and Y have the same PMF:

P(X = k) = (w choose k)(b choose n−k) / (w+b choose n) = w! b! n! (w + b − n)! / (k! (w + b)! (w − k)! (n − k)! (b − n + k)!),

P(Y = k) = (n choose k)(w+b−n choose w−k) / (w+b choose w) = w! b! n! (w + b − n)! / (k! (w + b)! (w − k)! (n − k)! (b − n + k)!).

We prefer the story proof because it is less tedious and more memorable. □

⚠ 3.4.6 (Binomial vs. Hypergeometric). The Binomial and Hypergeometric distributions are often confused. Both are discrete distributions taking on integer values between 0 and n for some n, and both can be interpreted as the number of successes in n Bernoulli trials (for the Hypergeometric, each tagged elk in the recaptured sample can be considered a success and each untagged elk a failure). However, a crucial part of the Binomial story is that the Bernoulli trials involved are independent. The Bernoulli trials in the Hypergeometric story are dependent, since the sampling is done without replacement: knowing that one elk in our sample is tagged decreases the probability that the second elk will also be tagged.
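This dependence can be made concrete with a quick exact calculation (the urn sizes echo Figure 3.7 and are illustrative only):

```python
from fractions import Fraction

# Sampling without replacement from an urn with w = 6 white, b = 4 black:
# the conditional probability that the second draw is white, given that
# the first was white, is smaller than the unconditional probability.
w, b = 6, 4
p_first_white = Fraction(w, w + b)                       # 6/10
p_second_white_given_first = Fraction(w - 1, w + b - 1)  # 5/9 < 6/10

# With replacement, both probabilities would equal 6/10 and the draws
# would be independent Bernoulli trials, giving the Binomial instead.
```

The strict inequality 5/9 < 6/10 is exactly the "knowing one success decreases the probability of the next" phenomenon described above.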

3.5 Discrete Uniform

A very simple story, closely connected to the naive definition of probability, describes picking a random number from some finite set of possibilities.

Story 3.5.1 (Discrete Uniform distribution). Let C be a finite, nonempty set of numbers. Choose one of these numbers uniformly at random (i.e., all values in C are equally likely). Call the chosen number X. Then X is said to have the Discrete Uniform distribution with parameter C; we denote this by X ∼ DUnif(C). □

The PMF of X ∼ DUnif(C) is

P(X = x) = 1/|C|

for x ∈ C (and 0 otherwise), since a PMF must sum to 1. As with questions based on the naive definition of probability, questions based on a Discrete Uniform distribution reduce to counting problems. Specifically, for X ∼ DUnif(C) and any A ⊆ C, we have

P(X ∈ A) = |A| / |C|.
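Since Discrete Uniform questions reduce to counting, the formula is a two-line function; a sketch (function name mine, not from the text):

```python
from fractions import Fraction

def dunif_prob(A, C):
    """P(X in A) = |A ∩ C| / |C| for X ~ DUnif(C)."""
    C = set(C)
    return Fraction(len(set(A) & C), len(C))

# e.g. for C = {1, ..., 100} and A = {80, ..., 100}, this gives 21/100.
```

Intersecting A with C first means the function also handles the general P(X ∈ A) for sets A not contained in C, where only the part inside C counts.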


Example 3.5.2 (Random slips of paper). There are 100 slips of paper in a hat, each of which has one of the numbers 1, 2, ..., 100 written on it, with no number appearing more than once. Five of the slips are drawn, one at a time.

First consider random sampling with replacement (with equal probabilities).

(a) What is the distribution of how many of the drawn slips have a value of at least 80 written on them?

(b) What is the distribution of the value of the jth draw (for 1 ≤ j ≤ 5)?

(c) What is the probability that the number 100 is drawn at least once?

Now consider random sampling without replacement (with all sets of five slips equally likely to be chosen).

(d) What is the distribution of how many of the drawn slips have a value of at least 80 written on them?

(e) What is the distribution of the value of the jth draw (for 1 ≤ j ≤ 5)?

(f) What is the probability that the number 100 is drawn in the sample?

Solution:

(a) By the story of the Binomial, the distribution is Bin(5, 0.21).

(b) Let Xj be the value of the jth draw. By symmetry, Xj ∼ DUnif(1, 2, ..., 100). There aren’t certain slips that love being chosen on the jth draw and others that avoid being chosen then; all are equally likely.

(c) Taking complements,

P(Xj = 100 for at least one j) = 1 − P(X1 ≠ 100, ..., X5 ≠ 100).

By the naive definition of probability, this is

1 − (99/100)^5 ≈ 0.049.

This solution just uses new notation for concepts from Chapter 1. It is useful to have this new notation since it is compact and flexible. In the above calculation, it is important to see why

P(X1 ≠ 100, ..., X5 ≠ 100) = P(X1 ≠ 100) · · · P(X5 ≠ 100).

This follows from the naive definition in this case, but a more general way to think about such statements is through independence of r.v.s, a concept discussed in detail in Section 3.8.
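The arithmetic in part (c) is a one-liner to confirm (a sketch, not from the text):

```python
# Part (c) of Example 3.5.2: with replacement, the five draws are
# independent, so P(100 is never drawn) = (99/100)^5, and taking the
# complement gives the probability that 100 is drawn at least once.
p = 1 - (99/100)**5
```

Evaluating `p` and rounding to three decimal places recovers the 0.049 stated above.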

(d) By the story of the Hypergeometric, the distribution is HGeom(21, 79 , 5).

(e) Let Yj be the value of the jth draw. By symmetry, Yj ∼ DUnif(1, 2, ..., 100).


Definition 3.6.1. The cumulative distribution function (CDF) of an r.v. X is the function FX given by FX(x) = P(X ≤ x). When there is no risk of ambiguity, we sometimes drop the subscript and just write F (or some other letter) for a CDF.

The next example demonstrates that for discrete r.v.s, we can freely convert between CDF and PMF.

Example 3.6.2. Let X ∼ Bin(4, 1/2). Figure 3.8 shows the PMF and CDF of X.

FIGURE 3.8. Bin(4, 1/2) PMF and CDF. The height of the vertical bar P(X = 2) in the PMF is also the height of the jump in the CDF at 2.

  • From PMF to CDF: To find P(X ≤ 1.5), which is the CDF evaluated at 1.5, we sum the PMF over all values of the support that are less than or equal to 1.5:

P(X ≤ 1.5) = P(X = 0) + P(X = 1) = 1/16 + 4/16 = 5/16.

Similarly, the value of the CDF at an arbitrary point x is the sum of the heights of the vertical bars of the PMF at values less than or equal to x.

  • From CDF to PMF: The CDF of a discrete r.v. consists of jumps and flat regions. The height of a jump in the CDF at x is equal to the value of the PMF at x. For example, in Figure 3.8, the height of the jump in the CDF at 2 is the same as the height of the corresponding vertical bar in the PMF; this is indicated in the figure with curly braces. The flat regions of the CDF correspond to values outside the support of X, so the PMF is equal to 0 in those regions. □

Valid CDFs satisfy the following criteria.
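Both directions of the conversion fit in a few lines; a sketch for the Bin(4, 1/2) example (names illustrative, not from the text):

```python
from math import comb

# PMF of Bin(4, 1/2): P(X = k) = C(4, k) / 16 for k = 0, ..., 4.
pmf = {k: comb(4, k) / 16 for k in range(5)}

def cdf(x):
    """From PMF to CDF: sum the PMF over support values <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

# From CDF to PMF: the jump at a support value k recovers P(X = k),
# e.g. cdf(2) - cdf(1.5) equals pmf[2] since the CDF is flat on (1, 2).
```

Evaluating `cdf(1.5)` reproduces the 5/16 computed above, and `cdf(4)` is 1, as it must be at the top of the support.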

Theorem 3.6.3 (Valid CDFs). Any CDF F has the following properties.

  • Increasing: If x1 ≤ x2, then F(x1) ≤ F(x2).


  • Right-continuous: As in Figure 3.8, the CDF is continuous except possibly for having some jumps. Wherever there is a jump, the CDF is continuous from the right. That is, for any a, we have

F(a) = lim_{x→a⁺} F(x).

  • Convergence to 0 and 1 in the limits:

lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

Proof. The above criteria are true for all CDFs, but for simplicity we will only prove it for the case where F is the CDF of a discrete r.v. X whose possible values are 0, 1, 2, .... As an example of how to visualize the criteria, consider Figure 3.8: the CDF shown there is increasing (with some flat regions), continuous from the right (it is continuous except at jumps, and each jump has an open dot at the bottom and a closed dot at the top), and it converges to 0 as x → −∞ and to 1 as x → ∞ (in this example, it reaches 0 and 1; in some examples, one or both of these values may be approached but never reached).

The first criterion is true since the event {X ≤ x1} is a subset of the event {X ≤ x2}, so P(X ≤ x1) ≤ P(X ≤ x2).

For the second criterion, note that

P(X ≤ x) = P(X ≤ ⌊x⌋),

where ⌊x⌋ is the greatest integer less than or equal to x. For example, P(X ≤ 4.9) = P(X ≤ 4) since X is integer-valued. So F(a + b) = F(a) for any b > 0 that is small enough so that a + b < ⌊a⌋ + 1; e.g., for a = 4.9, this holds for 0 < b < 0.1. This implies F(a) = lim_{x→a⁺} F(x) (in fact, it’s much stronger, since it says F(x) equals F(a) when x is close enough to a and on the right).

For the third criterion, we have F(x) = 0 for x < 0, and

lim_{x→∞} F(x) = lim_{x→∞} P(X ≤ ⌊x⌋) = lim_{x→∞} ∑_{n=0}^{⌊x⌋} P(X = n) = ∑_{n=0}^{∞} P(X = n) = 1. □

The converse is true too: we will show in Chapter 5 that given any function F meeting these criteria, we can construct a random variable whose CDF is F.

To recap, we have now seen three equivalent ways of expressing the distribution of a random variable. Two of these are the PMF and the CDF: we know these two functions contain the same information, since we can always figure out the CDF from the PMF and vice versa. Generally the PMF is easier to work with for discrete r.v.s, since evaluating the CDF requires a summation.