The Binomial Distribution
  probability density = C(n,x) p^x (1-p)^(n-x)


Use the binomial probability when the outcome of each experiment is success or failure, the experiments are independent of one another, and the probability of success doesn't change from trial to trial. A coin toss is a binomial experiment.

Example: let success be a flip of heads. In two coin tosses (trials),

    0 heads .... pdf( N=2, p=.5, X=0 ) = .25
    1 head ..... pdf( N=2, p=.5, X=1 ) = .50
    2 heads .... pdf( N=2, p=.5, X=2 ) = .25

Example: you flip a coin twenty times and get more than fourteen heads. Is it biased?

cdf( N=20, p=.5, X=14 ) = 0.9793053 is the probability of no more than fourteen heads, so the probability of fifteen or more is 1 - 0.9793053 = 0.0206947, slightly more than two per cent. The event is rare, but not unheard of. You would say the result is significant at the 95% confidence level but not at 99%. But ask for a new coin anyway.

Example: using a random sample, a polling organization asks 50 voters if they favor Candidate A for reelection. Given that 55% of the city's voters favor Candidate A, the binomial density pdf( N=50, p=.55, X=33 ) gives the probability that exactly 33 people from the sample will favor her.
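These examples can be checked with a short sketch using Python's standard library (the function names here are illustrative, not part of any statistics package):

```python
from math import comb

def binom_pdf(n, p, x):
    # P(exactly x successes in n independent trials, each with success probability p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, p, x):
    # P(no more than x successes)
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

# Two coin tosses: 0, 1, or 2 heads
print(binom_pdf(2, 0.5, 0), binom_pdf(2, 0.5, 1), binom_pdf(2, 0.5, 2))  # 0.25 0.5 0.25

# Twenty tosses: probability of fifteen or more heads
print(1 - binom_cdf(20, 0.5, 14))  # about 0.0207

# Poll example: exactly 33 of 50 sampled voters favor Candidate A when p = .55
print(binom_pdf(50, 0.55, 33))
```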



The Beta Distribution
  probability density = x^(a-1) (1-x)^(b-1) / B(a,b)


The beta probability is useful for studying a variable such as a percentage or probability which may only take on values within a restricted range; in the standard form above, the interval [0, 1].

If a < 1 and b < 1, the graph is U-shaped, tending to infinity at the limits. If a > 1 and b > 1, the graph is bell-shaped. For a = b = 1, we get the uniform density as a special case. You might want to experiment with a > 1 and b < 1.

B(a,b) in the formula above is defined as Γ(a) Γ(b) / Γ(a+b). The mean of the beta distribution is a/(a+b), and its variance is ab / ( (a+b)^2 (a+b+1) ).
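As a sketch (function names are mine), the density, mean, and variance follow directly from the standard library's gamma function:

```python
from math import gamma

def beta_pdf(a, b, x):
    # B(a, b) = gamma(a) * gamma(b) / gamma(a + b)
    B = gamma(a) * gamma(b) / gamma(a + b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

# a = b = 1 is the uniform special case: the density is 1 on (0, 1)
print(beta_pdf(1, 1, 0.3))  # 1.0

# mean and variance for a = 2, b = 5
a, b = 2, 5
print(a / (a + b), a * b / ((a + b)**2 * (a + b + 1)))
```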



The Cauchy Distribution
  probability density = 1 / ( π b (1 + (x-a)^2 / b^2) )

The Cauchy distribution, also called the Lorentzian, is the distribution of the quotient of two independent standard normal random variables.  Its tails decrease only polynomially, so they are asymptotically far heavier than those of any corresponding normal distribution.

It is interesting for a number of theoretical reasons.  Although its median is zero by symmetry, the expectation, variance, higher moments, and moment generating function do not exist.

For independent identically distributed Cauchy variables X1, ..., Xn, the average (X1 + ... + Xn) /n  has, surprisingly, the same density as the Xj.
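A small sketch (names illustrative) of the density and its closed-form cdf; it shows that b is the half-width of the middle 50% of the distribution:

```python
from math import pi, atan

def cauchy_pdf(a, b, x):
    return 1.0 / (pi * b * (1 + ((x - a) / b)**2))

def cauchy_cdf(a, b, x):
    # closed form: 1/2 + arctan((x-a)/b) / pi
    return 0.5 + atan((x - a) / b) / pi

print(cauchy_pdf(0, 1, 0))                        # peak height 1/pi
print(cauchy_cdf(0, 1, -1), cauchy_cdf(0, 1, 1))  # quartiles: 0.25, 0.75
```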



The Chi-Square Distribution
  probability density = x^(n/2-1) e^(-x/2) / ( 2^(n/2) Γ(n/2) )

If Z is a standard normal random variable, then the distribution of Z2 is chi-square with one degree of freedom.

If U1 , U2 , ..., Un are independent chi-square variables with one degree of freedom, then the distribution of U1 + U2 + ... + Un is chi-square with n degrees of freedom.

The distribution is used in chi-square tests, which allow you to compare the differences between observed and expected counts.
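A sketch of the density and of the goodness-of-fit statistic it is compared against (the die counts below are made up for illustration):

```python
from math import gamma, exp

def chi2_pdf(n, x):
    # density with n degrees of freedom
    return x**(n / 2 - 1) * exp(-x / 2) / (2**(n / 2) * gamma(n / 2))

# chi-square statistic: sum of (observed - expected)^2 / expected
observed = [18, 22, 16, 25, 20, 19]   # hypothetical counts from 120 die rolls
expected = [20] * 6
stat = sum((o - e)**2 / e for o, e in zip(observed, expected))
print(stat)  # 2.5, to be compared with chi-square on 5 degrees of freedom
```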



The Exponential Distribution
  probability density = λ e^(-λx)

The exponential distribution, also known as the waiting-time distribution, describes the amount of time or distance between occurrences of random events (such as the time between major earthquakes or between no-hitters pitched in major league baseball).

Use this distribution in connection with estimating the length of material life, or the length of time a process might take.
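A sketch of the density and cumulative probability, including the memoryless property that makes the exponential suitable for lifetime models (parameter values are arbitrary):

```python
from math import exp

def exp_pdf(lam, x):
    return lam * exp(-lam * x)

def exp_cdf(lam, x):
    # probability the waiting time is at most x
    return 1 - exp(-lam * x)

# memoryless: having already waited s, the chance of waiting t more is unchanged
lam, s, t = 0.5, 2.0, 3.0
p_cond = (1 - exp_cdf(lam, s + t)) / (1 - exp_cdf(lam, s))  # P(X > s+t | X > s)
print(p_cond, 1 - exp_cdf(lam, t))  # equal
```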



The F-Distribution
  density = (m/n)^(m/2) x^(m/2-1) (1 + mx/n)^(-(m+n)/2) / B(m/2, n/2)

F stands for Sir Ronald Fisher, the English geneticist and statistician. The distribution is used in the analysis of variance; it describes the ratio of two independent chi-square random variables, each divided by its number of degrees of freedom.
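A sketch of the density written directly from the formula above, with B evaluated through the gamma function; the crude sum checks that it integrates to about 1:

```python
from math import gamma

def f_pdf(m, n, x):
    # B(m/2, n/2) via the gamma function
    B = gamma(m / 2) * gamma(n / 2) / gamma((m + n) / 2)
    return (m / n)**(m / 2) * x**(m / 2 - 1) * (1 + m * x / n)**(-(m + n) / 2) / B

# rough Riemann sum of the F(4, 10) density over (0, 50]
area = sum(f_pdf(4, 10, i * 0.01) * 0.01 for i in range(1, 5001))
print(area)  # close to 1
```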



The Gamma Distribution
  probability density = λ^a x^(a-1) e^(-λx) / Γ(a)

Gamma densities provide a fairly flexible class for modeling nonnegative random variables.  The chi-square distribution is an instance of the gamma, and when alpha = 1, the gamma distribution reduces to an exponential distribution.

The gamma distribution can be used to describe the waiting time until alpha events have occurred in a Poisson process with rate lambda.  Contrast it with the exponential distribution, which describes the waiting time until the first event.

Verify that the formula is the one you expected to use.  Certain texts define the distribution as Gamma( x; a, b), where b is the reciprocal of Lambda.
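A sketch using the rate (lambda) parameterization given above; the check shows alpha = 1 collapsing to the exponential density:

```python
from math import gamma, exp

def gamma_pdf(a, lam, x):
    # rate parameterization; texts using Gamma(x; a, b) take b = 1/lam
    return lam**a * x**(a - 1) * exp(-lam * x) / gamma(a)

lam, x = 0.5, 3.0
print(gamma_pdf(1, lam, x), lam * exp(-lam * x))  # identical
```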



The Geometric Distribution
  probability density = p (1-p)^(x-1)

The geometric distribution describes a set of Bernoulli trials with a fixed Bernoulli probability p. It is the probability of N, the trial number of the first success.

The number of trials is unknown. The experiment continues until the first success occurs. In contrast with the binomial distribution, N is the random variable.

Example: let success in a trial be defined as rolling a six with a die. The probability of success on the first trial is one-sixth.

The probability that the first six arrives on the second trial is the chance of one failure followed by one success: pdf( p=1/6, N=2 ) = 5/6 × 1/6 = 5/36.
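The die example as a short sketch (names illustrative):

```python
def geom_pdf(p, n):
    # probability the first success lands on trial n
    return p * (1 - p)**(n - 1)

print(geom_pdf(1/6, 1))  # 1/6: a six on the first roll
print(geom_pdf(1/6, 2))  # 5/36: one miss, then a six

# the trial number of the first success is certain to be *some* n
print(sum(geom_pdf(1/6, n) for n in range(1, 500)))  # effectively 1
```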



The Gumbel Distribution
  probability density = exp( -(x-a)/b - e^(-(x-a)/b) ) / b

The Gumbel distribution, a special case of the Fisher-Tippett distribution, is particularly convenient for extreme-value analysis, and it may be used as an alternative to the normal distribution for skewed empirical data.

The Gumbel distribution is used in flood frequency analysis, to determine the probability that a given flow will occur within a given time interval.
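A sketch of the density and its especially simple cumulative form exp( -e^(-(x-a)/b) ), the quantity flood-frequency work actually uses (the flood parameters below are made up):

```python
from math import exp

def gumbel_pdf(a, b, x):
    z = (x - a) / b
    return exp(-z - exp(-z)) / b

def gumbel_cdf(a, b, x):
    # probability of not exceeding x
    return exp(-exp(-(x - a) / b))

# at the location parameter a the cdf is 1/e
print(gumbel_cdf(0, 1, 0))  # about 0.3679

# probability a yearly maximum exceeds x = 14 (illustrative a, b)
print(1 - gumbel_cdf(10, 2, 14))
```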



The Hypergeometric Distribution
  probability density = C(T,x) C(N-T, n-x) / C(N,n)

The distribution describes random sampling without replacement when the composition of the population as a whole is known. The major practical use of the hypergeometric distribution is in survey sampling, although textbook examples mostly concern urns filled with black and red balls.

Example: five cards are drawn from a deck of 52 playing cards. The probability that exactly one of the five cards drawn is an ace (there are only four aces in the deck) is pdf( N=52, n=5, T=4, X=1 ).

The probability that no more than one of the cards is an ace is cdf( N=52, n=5, T=4, X=1 ).

X must be at least the larger of 0 and (n - N + T), and at most the smaller of n and T.
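The card example as a sketch with the standard library's comb (names illustrative):

```python
from math import comb

def hyper_pdf(N, n, T, x):
    # x successes in a sample of n drawn without replacement
    # from a population of N containing T successes
    return comb(T, x) * comb(N - T, n - x) / comb(N, n)

# five-card draw from a 52-card deck containing four aces
p_one = hyper_pdf(52, 5, 4, 1)                  # exactly one ace
p_at_most_one = hyper_pdf(52, 5, 4, 0) + p_one  # no more than one ace
print(p_one, p_at_most_one)  # about 0.299 and 0.958
```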



The Lognormal Distribution
  probability density = exp( -½ (log x - µ)^2 / σ^2 ) / ( σ x √(2π) )

A random variable X is said to follow a lognormal distribution if the random variable Y = log( X ) is normally distributed.

The distribution of particle sizes in crushing, when there have been repeated impacts, is often skewed, with a slowly decreasing right tail. The lognormal is sometimes used to fit such a distribution.
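A sketch of the density and cdf (the latter through the error function); the check confirms that the median is e^mu:

```python
from math import exp, log, pi, sqrt, erf

def lognorm_pdf(mu, sigma, x):
    return exp(-0.5 * (log(x) - mu)**2 / sigma**2) / (sigma * x * sqrt(2 * pi))

def lognorm_cdf(mu, sigma, x):
    # the normal cdf evaluated at log(x)
    return 0.5 * (1 + erf((log(x) - mu) / (sigma * sqrt(2))))

mu, sigma = 1.0, 0.5
print(lognorm_cdf(mu, sigma, exp(mu)))  # 0.5: the median is e^mu
```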



The Negative Binomial Distribution
  probability density = C(r+x-1, x) p^r (1-p)^x

The negative binomial distribution describes the number of additional trials N, given a success rate p, needed to achieve r successes. Note that the total number of trials is r+N.

Example: a door-to-door encyclopedia salesperson is required to make five in-home visits each day. Suppose he has a 30% chance of being invited into any given home. If he selects, ahead of time, the addresses of 30 households upon which to call, what is the probability that A) he requires fewer than eight addresses to make five calls? B) he requires twenty-five or more addresses to make five calls?

Conventions: the statistics textbooks are evenly divided between this random variable N, the number of trials in excess of r, and the variable X, the total number of trials. The conversion formula is X = N + r.
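The salesperson example as a sketch over N, the number of failures (names illustrative): "fewer than eight addresses" means at most two failures before the fifth success, and "twenty-five or more addresses" means at least twenty failures.

```python
from math import comb

def negbin_pdf(r, p, x):
    # probability of exactly x failures before the r-th success
    return comb(r + x - 1, x) * p**r * (1 - p)**x

r, p = 5, 0.3
p_a = sum(negbin_pdf(r, p, x) for x in range(3))       # fewer than 8 addresses
p_b = 1 - sum(negbin_pdf(r, p, x) for x in range(20))  # 25 or more addresses
print(p_a, p_b)
```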



The Normal Distribution
  probability density = exp( -½ (x - µ)^2 / σ^2 ) / ( σ √(2π) )

The graph of this distribution is the bell-shaped curve called the normal, or Gaussian, probability curve.  The graph can illustrate the idea of a percentile: one standard deviation above the mean corresponds to about the 84th percentile.

Many sets of measurements have been found to have this frequency distribution.  For example, let xi be the number of 6's cast in the N respective runs of n tosses of a die and assume N to be moderately large.  Let yj be the weights, correct to the nearest 1/100 g, of N lima beans chosen haphazardly from a 100-kg bag of lima beans.  Let zk be the barometric pressures recorded to the nearest 1/1000 cm by N students in succession, reading the same barometer.  It will be observed that the x's, y's, and z's have an amazingly similar frequency pattern.

There is a proof that, as a sample becomes large, the distribution of its mean (from a population with finite variance) approximates the normal distribution. It is known as the Central Limit Theorem.
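A sketch of the density and cdf via the error function; the check reproduces the claim that one deviation above the mean sits near the 84th percentile:

```python
from math import exp, pi, sqrt, erf

def norm_pdf(mu, sigma, x):
    return exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))

def norm_cdf(mu, sigma, x):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(norm_cdf(0, 1, 1))  # about 0.8413: the 84th percentile
```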



The Poisson Distribution
  probability density = λ^x e^(-λ) / x!

Poisson probability is the probability that X number of events will occur in a given space or over a given time period.

Example: on average, Company ABC receives 30 customer service phone calls per hour. What is the probability that Company ABC will receive exactly 35 calls in one hour?

The probability that Company ABC will receive 35 or fewer calls in one hour is cdf( lambda=30, X=35 ).
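The call-center example as a sketch (names illustrative):

```python
from math import exp, factorial

def poisson_pdf(lam, x):
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(lam, x):
    return sum(poisson_pdf(lam, k) for k in range(x + 1))

print(poisson_pdf(30, 35))  # exactly 35 calls in an hour averaging 30
print(poisson_cdf(30, 35))  # 35 or fewer calls
```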



The Rayleigh Distribution
  probability density = x exp( -½ x^2 / b^2 ) / b^2

The Rayleigh distribution was developed to describe the scattering of radiation.

It has theoretical importance in the transformation of a standard bivariate normal distribution to polar coordinates, which can be used to construct an algorithm for generating standard normal random variables.
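A sketch of that polar-coordinate connection: inverting the Rayleigh cdf gives the radius, and pairing it with a uniform angle yields standard normal variables (this is the Box-Muller construction; the sample size and seed are arbitrary):

```python
from math import exp, sqrt, log, cos, sin, pi
import random

def rayleigh_cdf(b, x):
    # probability the radius is at most x
    return 1 - exp(-0.5 * x**2 / b**2)

rng = random.Random(0)

def standard_normal_pair():
    r = sqrt(-2 * log(1 - rng.random()))  # Rayleigh(b=1) radius by cdf inversion
    t = 2 * pi * rng.random()             # uniform angle
    return r * cos(t), r * sin(t)

sample = [standard_normal_pair()[0] for _ in range(20000)]
print(sum(sample) / len(sample))  # near 0, as a standard normal mean should be
```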



The Student T Distribution
  density = Γ((n+1)/2) (1 + x^2/n)^(-(n+1)/2) / ( Γ(n/2) √(nπ) )

The Student's t-distribution measures the significance of a difference of means of small samples when the two distributions are thought to have the same variance, but possibly different means.

It is the distribution of a random variable X = u / √(v² / Df), where u and v² are independent random variables, u has a normal distribution with mean 0 and a standard deviation of 1, and v² has a chi-square distribution with Df degrees of freedom.
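A sketch of the density; the checks show the one-degree-of-freedom case collapsing to the Cauchy peak 1/pi and the large-Df case approaching the standard normal:

```python
from math import gamma, sqrt, pi

def t_pdf(df, x):
    return (gamma((df + 1) / 2) * (1 + x**2 / df)**(-(df + 1) / 2)
            / (gamma(df / 2) * sqrt(df * pi)))

print(t_pdf(1, 0))                      # 1/pi: the Cauchy special case
print(t_pdf(200, 0), 1 / sqrt(2 * pi))  # nearly equal for large Df
```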



The Weibull Distribution
  probability density = a b x^(b-1) exp( -a x^b )


The Weibull distribution is used to calculate the mean time to failure
of a device.  Industrial applications of survival analysis often involve testing components to destruction after subjecting them to a stress which is assumed to speed up the aging process.  The name of this technique is accelerated life testing.

If beta = 1, the Weibull reduces to the exponential distribution with lambda = alpha.
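A sketch using the alpha, beta parameterization above; the check confirms that when beta = 1 the density coincides with an exponential whose rate equals alpha:

```python
from math import exp

def weibull_pdf(a, b, x):
    return a * b * x**(b - 1) * exp(-a * x**b)

def weibull_cdf(a, b, x):
    # probability of failure by time x
    return 1 - exp(-a * x**b)

a, x = 0.5, 2.0
print(weibull_pdf(a, 1, x), a * exp(-a * x))  # identical when beta = 1
```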



AUTHOR:  John Bohr
Last updated: 21 April 1998