Geometric distribution

Geometric
[Plots of the probability mass function and the cumulative distribution function omitted.]

In the summary below, the first expression in each row refers to the distribution of the number of trials $X$ (support $\{1, 2, 3, \dots\}$), the second to the number of failures $Y = X - 1$ (support $\{0, 1, 2, \dots\}$).

Parameters: $0 < p \leq 1$ success probability (real); $0 < p \leq 1$ success probability (real)
Support: $k$ trials where $k \in \{1, 2, 3, \dots\}$; $k$ failures where $k \in \{0, 1, 2, \dots\}$
PMF: $(1-p)^{k-1}p$; $(1-p)^{k}p$
CDF: $1-(1-p)^{\lfloor x\rfloor}$ for $x \geq 1$, $0$ for $x < 1$; $1-(1-p)^{\lfloor x\rfloor+1}$ for $x \geq 0$, $0$ for $x < 0$
Mean: $\dfrac{1}{p}$; $\dfrac{1-p}{p}$
Median: $\left\lceil \dfrac{-1}{\log_2(1-p)} \right\rceil$ (not unique if $-1/\log_2(1-p)$ is an integer); $\left\lceil \dfrac{-1}{\log_2(1-p)} \right\rceil - 1$ (not unique if $-1/\log_2(1-p)$ is an integer)
Mode: $1$; $0$
Variance: $\dfrac{1-p}{p^2}$ (both)
Skewness: $\dfrac{2-p}{\sqrt{1-p}}$ (both)
Excess kurtosis: $6 + \dfrac{p^2}{1-p}$ (both)
Entropy: $\dfrac{-(1-p)\log(1-p) - p\log p}{p}$ (both)
MGF: $\dfrac{p e^t}{1-(1-p)e^t}$ for $t < -\ln(1-p)$; $\dfrac{p}{1-(1-p)e^t}$ for $t < -\ln(1-p)$
CF: $\dfrac{p e^{it}}{1-(1-p)e^{it}}$; $\dfrac{p}{1-(1-p)e^{it}}$
PGF: $\dfrac{pz}{1-(1-p)z}$; $\dfrac{p}{1-(1-p)z}$

In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:

  • The probability distribution of the number $X$ of Bernoulli trials needed to get one success, supported on the set $\{1, 2, 3, \dots\}$;
  • The probability distribution of the number $Y = X - 1$ of failures before the first success, supported on the set $\{0, 1, 2, \dots\}$.

Which of these is called the geometric distribution is a matter of convention and convenience.

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (the distribution of $X$); however, to avoid ambiguity, it is considered wise to indicate which is intended by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$. If the probability of success on each trial is $p$, then the probability that the $k$-th trial is the first success is

$\Pr(X = k) = (1-p)^{k-1}p$

for $k = 1, 2, 3, 4, \dots$

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

$\Pr(Y = k) = \Pr(X = k + 1) = (1-p)^{k}p$

for $k = 0, 1, 2, 3, \dots$

In either case, the sequence of probabilities is a geometric sequence.

The geometric distribution is denoted by Geo(p), where $0 < p \leq 1$.[1]

Definition

The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function is

$\Pr(X = k) = (1-p)^{k-1}p,$

where $k = 1, 2, 3, \dots$ is the number of trials and $p$ is the probability of success in each trial.[2]

Alternatively, some texts define the distribution over the number of failures before the first success (support $\{0, 1, 2, \dots\}$), altering the probability mass function into:[2]

$\Pr(Y = k) = (1-p)^{k}p.$

An example of a geometric distribution comes from rolling a six-sided die until a "1" appears. Each roll is independent, with a $1/6$ chance of success. The number of rolls needed follows a geometric distribution with $p = 1/6$.
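A minimal sketch (in Python, not part of the original article) evaluating this probability mass function for the die example:

```python
# PMF of the number of rolls needed to see the first "1" on a fair
# six-sided die, i.e. Geo(p) with p = 1/6 supported on {1, 2, 3, ...}.
p = 1 / 6

def geom_pmf(k, p):
    """Pr(X = k) = (1 - p)**(k - 1) * p for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

for k in range(1, 6):
    print(f"Pr(first '1' on roll {k}) = {geom_pmf(k, p):.4f}")
```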

Properties

Moments and cumulants

The expected value for the number of independent trials to get the first success, and the variance of a geometrically distributed random variable X, are:

$\operatorname{E}(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^2}.$

Similarly, the expected value and variance of the geometrically distributed random variable Y = X − 1 (see the definition of the distribution above) are:

$\operatorname{E}(Y) = \frac{1-p}{p}, \qquad \operatorname{var}(Y) = \frac{1-p}{p^2}.$

Proof

Expected value of X

Consider the expected value $\operatorname{E}(X)$ of X as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability $p$, or we fail with probability $1-p$. If we fail, the remaining mean number of trials until a success is identical to the original mean; this follows from the fact that all trials are independent. From this we get the formula:

$\operatorname{E}(X) = p \cdot 1 + (1-p)\,\bigl(1 + \operatorname{E}(X)\bigr),$

which, if solved for $\operatorname{E}(X)$, gives:

$\operatorname{E}(X) = \frac{1}{p}.$
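As a quick empirical check (my own sketch, not from the article; the parameter values are arbitrary), one can simulate the trial count and compare the sample mean with 1/p:

```python
import random

def trials_until_success(p):
    """Simulate Bernoulli(p) trials; return the index of the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

p = 0.25
n = 100_000
sample_mean = sum(trials_until_success(p) for _ in range(n)) / n
print(sample_mean, "vs. theoretical 1/p =", 1 / p)  # both close to 4
```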

Expected value of Y

That the expected value of Y as above is (1 − p)/p can be trivially seen from $\operatorname{E}(Y) = \operatorname{E}(X) - 1 = \frac{1}{p} - 1 = \frac{1-p}{p}$, which follows from the linearity of expectation, or can be shown in the following way:

$\operatorname{E}(Y) = \sum_{k=0}^{\infty} k\,(1-p)^k p = p(1-p) \sum_{k=0}^{\infty} k\,(1-p)^{k-1} = p(1-p)\left[-\frac{d}{dp}\!\left(\sum_{k=0}^{\infty}(1-p)^k\right)\right] = p(1-p)\left[-\frac{d}{dp}\,\frac{1}{p}\right] = p(1-p)\,\frac{1}{p^2} = \frac{1-p}{p}.$

The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.

Let μ = (1 − p)/p be the expected value of Y. Then the cumulants $\kappa_n$ of the probability distribution of Y satisfy the recursion

$\kappa_{n+1} = \mu(\mu + 1)\,\frac{d\kappa_n}{d\mu}.$
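Applying the recursion once recovers the variance given above (a short check, not part of the original text):

$\kappa_2 = \mu(\mu+1)\,\frac{d\kappa_1}{d\mu} = \mu(\mu+1) = \frac{1-p}{p}\cdot\frac{1}{p} = \frac{1-p}{p^2},$

since $\kappa_1 = \mu$ and $\mu + 1 = 1/p$.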

Expected value examples

E3) A patient is waiting for a suitable matching kidney donor for a transplant. If the probability that a randomly selected donor is a suitable match is p = 0.1, what is the expected number of donors who will be tested before a matching donor is found?

With p = 0.1, the mean number of failures before the first success is E(Y) = (1 − p)/p = (1 − 0.1)/0.1 = 9.

For the alternative formulation, where X is the number of trials up to and including the first success, the expected value is E(X) = 1/p = 1/0.1 = 10.

For example 1 above, with p = 0.6, the mean number of failures before the first success is E(Y) = (1 − p)/p = (1 − 0.6)/0.6 ≈ 0.67.

Higher-order moments

The moments for the number of failures before the first success are given by

$\operatorname{E}(Y^n) = \sum_{k=0}^{\infty} (1-p)^k p \cdot k^n = p \operatorname{Li}_{-n}(1-p) \quad (\text{for } n \neq 0),$

where $\operatorname{Li}_{-n}(x)$ is the polylogarithm function.
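As a numeric sanity check (my own sketch; it relies on the standard closed form $\operatorname{Li}_{-2}(z) = z(1+z)/(1-z)^3$), the n = 2 case can be verified by direct summation:

```python
# Compare a truncated series for E[Y^2] with p * Li_{-2}(1 - p),
# using the closed form Li_{-2}(z) = z * (1 + z) / (1 - z)**3.
p = 0.3
q = 1 - p

series = sum(k**2 * q**k * p for k in range(2000))  # E[Y^2] by direct summation
closed_form = p * q * (1 + q) / (1 - q) ** 3        # p * Li_{-2}(q)

print(series, closed_form)  # both ≈ 13.22
```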

General properties

  • The geometric distribution supported on {0, 1, 2, 3, ...} is the only memoryless discrete distribution. Note that the geometric distribution supported on {1, 2, ...} is not memoryless.
  • Among all discrete probability distributions supported on {1, 2, 3, ... } with given expected value μ, the geometric distribution X with parameter p = 1/μ is the one with the largest entropy.[3]
  • The geometric distribution of the number Y of failures before the first success is infinitely divisible, i.e., for any positive integer n, there exist independent identically distributed random variables Y1, ..., Yn whose sum has the same distribution that Y has. These will not be geometrically distributed unless n = 1; they follow a negative binomial distribution.
  • The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not identically distributed) random variables.[citation needed] For example, the hundreds digit D has this probability distribution:

$\Pr(D = d) = \frac{q^{100d}\,(1 - q^{100})}{1 - q^{1000}}, \qquad d = 0, 1, \dots, 9,$

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.
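A quick simulation (my own sketch, using the inversion sampler described under Related distributions below; p = 0.002 is an arbitrary small value chosen so that Y regularly reaches the hundreds) makes the hundreds-digit formula concrete:

```python
import math
import random

p = 0.002
q = 1 - p

def sample_failures():
    """Y ~ Geo(p) on {0, 1, 2, ...} via inversion: floor(ln U / ln q)."""
    u = 1.0 - random.random()  # uniform in (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(q))

n = 200_000
counts = [0] * 10
for _ in range(n):
    counts[(sample_failures() // 100) % 10] += 1

for d in range(10):
    exact = q ** (100 * d) * (1 - q ** 100) / (1 - q ** 1000)
    print(d, round(counts[d] / n, 4), round(exact, 4))
```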

Related distributions

  • The geometric distribution Y is a special case of the negative binomial distribution, with r = 1. More generally, if Y1, ..., Yr are independent geometrically distributed variables with parameter p, then the sum

$Z = \sum_{m=1}^{r} Y_m$

follows a negative binomial distribution with parameters r and p.[5]
  • The geometric distribution is a special case of discrete compound Poisson distribution.
  • If Y1, ..., Yr are independent geometrically distributed variables (with possibly different success parameters pm), then their minimum

$W = \min_{m \in \{1, \dots, r\}} Y_m$

is also geometrically distributed, with parameter $p = 1 - \prod_{m=1}^{r}(1 - p_m)$.[6]
  • Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable Xk has a Poisson distribution with expected value $r^k/k$. Then

$\sum_{k=1}^{\infty} k\,X_k$

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value r/(1 − r).[citation needed]
  • The exponential distribution is the continuous analogue of the geometric distribution. If X is an exponentially distributed random variable with parameter λ, then

$Y = \lfloor X \rfloor,$

where $\lfloor \cdot \rfloor$ is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter $p = 1 - e^{-\lambda}$ (thus $\lambda = -\ln(1-p)$[7]) and taking values in the set {0, 1, 2, ...}. This can be used to generate geometrically distributed pseudorandom numbers by first generating exponentially distributed pseudorandom numbers from a uniform pseudorandom number generator: then $\lfloor \ln(U) / \ln(1-p) \rfloor$ is geometrically distributed with parameter $p$, if $U$ is uniformly distributed in [0, 1] (see the code sketch after this list).
  • If p = 1/n and X is geometrically distributed with parameter p, then the distribution of X/n approaches an exponential distribution with expected value 1 as n → ∞, since

$\Pr(X/n > a) = \Pr(X > na) = (1 - 1/n)^{\lfloor na \rfloor} \to e^{-a} \quad \text{as } n \to \infty.$

More generally, if p = λ/n, where λ is a parameter, then as n → ∞ the distribution of X/n approaches an exponential distribution with rate λ:

$\lim_{n \to \infty} \Pr(X/n > a) = \lim_{n \to \infty} (1 - \lambda/n)^{\lfloor na \rfloor} = e^{-\lambda a},$

therefore the distribution function of X/n converges to $1 - e^{-\lambda a}$, which is that of an exponential random variable.
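The pseudorandom-number recipe mentioned in the exponential-distribution bullet above can be written out as follows (a minimal sketch; the parameter value is arbitrary):

```python
import math
import random

def geometric_failures(p):
    """Geo(p) on {0, 1, 2, ...} via floor(ln(U) / ln(1 - p)), U uniform."""
    u = 1.0 - random.random()  # uniform in (0, 1], so log(u) is always defined
    return math.floor(math.log(u) / math.log(1.0 - p))

p = 0.3
samples = [geometric_failures(p) for _ in range(100_000)]
print(sum(samples) / len(samples), "vs. theoretical (1-p)/p =", (1 - p) / p)
```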

Statistical inference

Parameter estimation

For both variants of the geometric distribution, the parameter p can be estimated by equating the expected value with the sample mean. This is the method of moments, which in this case happens to yield maximum likelihood estimates of p.[8][9]

Specifically, for the first variant let k = k1, ..., kn be a sample where ki ≥ 1 for i = 1, ..., n. Then p can be estimated as

$\hat{p} = \left(\frac{1}{n}\sum_{i=1}^{n} k_i\right)^{-1} = \frac{n}{\sum_{i=1}^{n} k_i}.$
In Bayesian inference, the Beta distribution is the conjugate prior distribution for the parameter p. If this parameter is given a Beta(α, β) prior, then the posterior distribution is

$p \sim \mathrm{Beta}\!\left(\alpha + n,\ \beta + \sum_{i=1}^{n}(k_i - 1)\right).$

The posterior mean E[p] approaches the maximum likelihood estimate $\hat{p}$ as α and β approach zero.

In the alternative case, let k1, ..., kn be a sample where ki ≥ 0 for i = 1, ..., n. Then p can be estimated as

$\hat{p} = \left(1 + \frac{1}{n}\sum_{i=1}^{n} k_i\right)^{-1} = \frac{n}{n + \sum_{i=1}^{n} k_i}.$
The posterior distribution of p given a Beta(α, β) prior is[10]

$p \sim \mathrm{Beta}\!\left(\alpha + n,\ \beta + \sum_{i=1}^{n} k_i\right).$

Again the posterior mean E[p] approaches the maximum likelihood estimate $\hat{p}$ as α and β approach zero.
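These two conjugate updates can be sketched in a few lines (hypothetical data; my own code, not from the article):

```python
# Beta(a, b) prior -> posterior, for both parameterizations.
a, b = 1.0, 1.0  # Beta(1, 1), i.e. a uniform prior on p

trials = [3, 1, 5, 2, 2]    # first variant: k_i >= 1 trials per experiment
a1, b1 = a + len(trials), b + sum(k - 1 for k in trials)

failures = [2, 0, 4, 1, 1]  # second variant: k_i >= 0 failures per experiment
a2, b2 = a + len(failures), b + sum(failures)

print("posterior means:", a1 / (a1 + b1), a2 / (a2 + b2))
```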

For either estimate of $\hat{p}$ using maximum likelihood, the bias is equal to

$b \equiv \operatorname{E}\!\bigl[\hat{p}_\mathrm{mle} - p\bigr] = \frac{p\,(1-p)}{n},$

which yields the bias-corrected maximum likelihood estimator

$\hat{p}^{\,*}_\mathrm{mle} = \hat{p}_\mathrm{mle} - \hat{b}.$
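A sketch of the two estimators and the bias correction, with made-up data (my own code, not from the article):

```python
def p_hat_trials(ks):
    """ML estimate when k_i count trials (k_i >= 1): n / sum(k_i)."""
    return len(ks) / sum(ks)

def p_hat_failures(ks):
    """ML estimate when k_i count failures (k_i >= 0): n / (n + sum(k_i))."""
    return len(ks) / (len(ks) + sum(ks))

def bias_corrected(p_hat, n):
    """Subtract the estimated bias p(1 - p)/n from the ML estimate."""
    return p_hat - p_hat * (1 - p_hat) / n

ks = [3, 1, 5, 2, 2]
p_hat = p_hat_trials(ks)
print(p_hat, bias_corrected(p_hat, len(ks)))
```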

Computational methods

In the programming language R, the function dgeom(k, prob) calculates the probability of k failures before a success with a success probability prob for each trial.

In Microsoft Excel, the function NEGBINOMDIST(number_f, number_s, probability_s) calculates the probability of number_f failures before the number_s-th success, with success probability probability_s for each trial. Setting number_s to 1 gives the geometric distribution.[11]
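For Python users, SciPy (assuming it is available) exposes the same distribution through scipy.stats.geom; note that it counts trials, with support {1, 2, ...}, so the number of failures before the first success is k − 1:

```python
from scipy.stats import geom

p = 1 / 6
print(geom.pmf(3, p))   # Pr(first success on trial 3)
print(geom.pmf(4, p))   # equals Pr(exactly 3 failures before the first success)
print(geom.cdf(10, p))  # Pr(at most 10 trials are needed)
```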

See also

References

  1. Dekking, Michel; et al. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. pp. 48–50, 61–62, 152. ISBN 9781852338961. OCLC 262680588.
  2. Nagel, Werner; Steyer, Rolf (2017). Probability and Conditional Expectation: Fundamentals for the Empirical Sciences. Wiley Series in Probability and Statistics (1st ed.). Wiley. pp. 260–261. doi:10.1002/9781119243496. ISBN 978-1-119-24352-6.
  3. Park, Sung Y.; Bera, Anil K. (June 2009). "Maximum entropy autoregressive conditional heteroskedasticity model". Journal of Econometrics. 150 (2): 219–230. doi:10.1016/j.jeconom.2008.12.014.
  4. Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets (Corresp.)". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.
  5. Pitman, Jim (1993). Probability. Springer. p. 372.
  6. Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl:2060/19940028569. S2CID 1505801.
  7. "Wolfram-Alpha: Computational Knowledge Engine". www.wolframalpha.com.
  8. Casella, George; Berger, Roger L. (2002). Statistical Inference (2nd ed.). pp. 312–315. ISBN 0-534-24312-6.
  9. "MLE Examples: Exponential and Geometric Distributions Old Kiwi - Rhea". www.projectrhea.org. Retrieved 2019-11-17.
  10. "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.
  11. "3.5 Geometric Probability Distribution using Excel Spreadsheet". Statistics LibreTexts. 2021-07-24. Retrieved 2023-10-20.

External links