# Math: How to Find the Mean of a Probability Distribution

*I hold both a bachelor's and a master's degree in applied mathematics.*

## What Is a Probability Distribution?

In many situations, multiple outcomes are possible. Each outcome has a probability of occurring, and the assignment of probabilities to all outcomes is called the probability distribution. The probabilities of all possible outcomes must add up to 1, or 100%.

A probability distribution can be discrete or continuous. In a discrete probability distribution, there are only countably many possible outcomes. In a continuous probability distribution, uncountably many outcomes are possible. An example of a discrete distribution is rolling a die: there are only six possible outcomes. The number of people in line at an entrance is also discrete. Although the line could in theory be any length, the possibilities are countable and therefore discrete. Examples of continuous outcomes are time, weight, and length, as long as you take the exact amount and do not round the outcome. Then there are uncountably many options. Even the weights between 0 and 1 kg alone form an uncountably infinite set. If you round every weight to one decimal, the outcome becomes discrete.

**Examples of Common Probability Distributions**

The most natural probability distribution is the uniform distribution. If the outcomes of an event are uniformly distributed, then every outcome is equally likely—for example, rolling a die. Then all outcomes 1, 2, 3, 4, 5 and 6 are equally likely and happen with a probability of 1/6. This is an example of a discrete uniform distribution.

**Uniform Distribution**

The uniform distribution can also be continuous. Then the probability of any single exact outcome is 0, since there are uncountably many possible outcomes. Therefore, it is more useful to look at the probability that the outcome lies between some values. For example, when X is uniformly distributed between 0 and 1, then P(X < 0.5) = 1/2, and also P(0.25 < X < 0.75) = 1/2, since all intervals of the same length are equally likely. In the discrete case, the probability that X is equal to x, or more formally P(X = x), can be calculated as P(X = x) = 1/n, where n is the total number of possible outcomes.

### Bernoulli Distribution

Another well-known distribution is the Bernoulli distribution. In the Bernoulli distribution, there are only two possible outcomes: success and no success. The probability of success is p and therefore the probability of no success is 1-p. Success is denoted by 1, no success by 0. The classic example is a coin toss where heads is success and tails is no success, or vice versa. Then p = 0.5. Another example could be rolling a six with a die. Then p = 1/6. So P(X = 1) = p and P(X = 0) = 1-p.

### Binomial Distribution

The binomial distribution looks at repeated, independent Bernoulli trials. It gives the probability that in n tries you get x successes and n-x failures. This distribution has two parameters: the number of tries n and the success probability p. The probability of exactly x successes is P(X = x) = (n ncr x) p^{x}(1-p)^{n-x}, where n ncr x is the binomial coefficient.
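As a rough illustration, the binomial formula can be evaluated with Python's standard library, where `math.comb` plays the role of the binomial coefficient. The function name `binomial_pmf` and the die example are assumptions made for this sketch.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) when X counts successes in n tries with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Probability of rolling exactly two sixes in ten rolls of a fair die.
print(binomial_pmf(2, 10, 1 / 6))

# The probabilities over all possible counts 0..n should add up to 1.
print(sum(binomial_pmf(x, 10, 1 / 6) for x in range(11)))
```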

### Geometric Distribution

The geometric distribution looks at the number of tries needed to get the first success in a Bernoulli setting, counting the successful try itself. Examples are the number of rolls until a six is rolled or the number of weeks until you win the lottery. Here P(X = x) = p*(1-p)^{x-1} for x = 1, 2, 3, and so on.

### Poisson Distribution

The Poisson distribution counts the number of events that happen in a fixed time interval, for example the number of customers that come to the supermarket in a day. It has one parameter, which is usually called lambda. Lambda is the intensity of arrivals, so on average lambda customers arrive per interval. The probability that there are x arrivals is then P(X = x) = lambda^{x}/x! * e^{-lambda}.
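The Poisson formula can be sketched in a few lines of Python. The name `poisson_pmf` and the value lambda = 3 are illustrative assumptions, not part of the original example.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with intensity lam."""
    return lam**x / factorial(x) * exp(-lam)


lam = 3.0  # assumed: three customers per interval on average
for x in range(6):
    print(x, poisson_pmf(x, lam))
```

For small x this direct formula is fine. For very large x the factorial grows quickly, so a production implementation would typically work with logarithms instead.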

### Exponential Distribution

The exponential distribution is a well-known continuous distribution. It is closely related to the Poisson distribution, as it is the time between two arrivals in a Poisson process. Here P(X = x) = 0, and therefore it is more useful to look at the probability density function f(x) = lambda*e^{-lambda*x}. This is the derivative of the cumulative distribution function, which represents P(X < x).
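The relation between f(x) and P(X < x) can be checked numerically. For the exponential distribution, P(X < x) = 1 - e^{-lambda*x} for x >= 0. The sketch below compares a finite-difference slope of that function with f(x). The function names, lambda = 2 and the point x = 0.7 are assumptions for illustration.

```python
from math import exp


def exponential_pdf(x: float, lam: float) -> float:
    """f(x) = lam * e^(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x: float, lam: float) -> float:
    """P(X < x) = 1 - e^(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0


lam = 2.0          # assumed arrival intensity
x, h = 0.7, 1e-6   # evaluation point and small step for the difference quotient

# Central difference approximation of the derivative of P(X < x).
slope = (exponential_cdf(x + h, lam) - exponential_cdf(x - h, lam)) / (2 * h)
print(slope, exponential_pdf(x, lam))  # the two values should nearly agree
```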

There are many more probability distributions, but these are the ones that come up the most in practice.

## How to Find the Mean of a Probability Distribution

The mean of a probability distribution is its long-run average. By the law of large numbers, if you keep taking samples from a probability distribution, the average of your samples will converge to the mean of the distribution. The mean is also called the expected value or the expectation of the random variable X. When X is discrete, the expectation E[X] can be calculated as follows:

E[X] = sum_{all possible x} x*P(X = x)

For a random variable that takes the values 0, 1, 2, and so on, this sum runs from x = 0 to infinity.
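In code, this definition is a weighted sum over the outcomes. The sketch below stores a discrete distribution as a dictionary mapping each outcome to its probability. The helper `expectation` and the two-coin example are assumptions made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expectation(pmf: dict) -> Fraction:
    """Sum of x * P(X = x) over all outcomes x in the distribution."""
    return sum(x * p for x, p in pmf.items())


# Number of heads in two fair coin tosses: 0, 1 or 2.
heads_in_two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expectation(heads_in_two_tosses))  # prints 1
```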

**Uniform Distribution**

Let X be uniformly distributed over a finite set of outcomes. Then the expected value is the sum of all outcomes, divided by the number of possible outcomes. For the die example we saw that P(X = x) = 1/6 for all possible outcomes. Then E[X] = (1+2+3+4+5+6)/6 = 3.5. Here you see that the expected value does not need to be a possible outcome. If you keep rolling a die, the average of the numbers you roll will approach 3.5, but you will of course never actually roll 3.5.
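The die example can be reproduced both exactly and by simulation. The following is a minimal sketch. The seed and the number of rolls are arbitrary choices, and the simulated average will only be close to 3.5, not equal to it.

```python
import random
from fractions import Fraction

outcomes = range(1, 7)

# Exact mean: each outcome x has probability 1/6.
exact_mean = sum(Fraction(x, 6) for x in outcomes)
print(exact_mean)  # prints 7/2

# Simulated mean: average of many rolls of a fair die.
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)
print(sample_mean)
```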

**Bernoulli Distribution**

The expectation of the Bernoulli distribution is p. There are only two possible outcomes, 0 and 1, so:

E[X] = 0*P(X=0) + 1*P(X=1) = p

**Binomial Distribution**

For the binomial distribution, we must solve a more difficult sum:

E[X] = sum_{x from 0 to n} x*(n ncr x)*p^{x}*(1-p)^{n-x}

This sum is equal to n*p. The exact calculation of this sum goes beyond the scope of this article.
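Although the algebra is skipped here, the result is easy to check numerically. The sketch below evaluates the sum directly and compares it with n*p. The function name `binomial_mean` and the chosen values of n and p are assumptions for illustration.

```python
from math import comb


def binomial_mean(n: int, p: float) -> float:
    """Evaluate sum of x * (n ncr x) * p^x * (1-p)^(n-x) for x = 0..n."""
    return sum(x * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1))


n, p = 10, 1 / 6  # ten die rolls, success means rolling a six
print(binomial_mean(n, p), n * p)  # both values should agree up to rounding
```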

**Geometric Distribution**

For the geometric distribution the expected value is calculated using the definition. Although the sum is pretty difficult to calculate, the result is very simple:

E[X] = sum_{x from 1 to infinity} x*p*(1-p)^{x-1} = 1/p

This is also very intuitive. If something happens with probability p, you expect to need 1/p tries to get a success. For example, on average you need six tries to roll a six with a die. Sometimes it will be more, sometimes it will be less, but the mean is six.
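The infinite sum can be approximated by cutting it off after many terms, since the terms shrink quickly. This is a sketch under that assumption. The name `expected_tries` and the cutoff of 1000 terms are illustrative choices.

```python
def expected_tries(p: float, max_tries: int = 1000) -> float:
    """Truncated sum of x * p * (1-p)^(x-1) for x = 1..max_tries."""
    return sum(x * p * (1 - p) ** (x - 1) for x in range(1, max_tries + 1))


print(expected_tries(1 / 6))  # approximately 6 for rolling a six
```

The cutoff works well here because (1-p)^{x-1} becomes negligible long before x reaches 1000. For a very small p, a larger cutoff would be needed.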

**Poisson Distribution**

The expectation of the Poisson distribution is lambda, since lambda is defined as the arrival intensity. If we apply the definition of the mean we indeed get this:

E[X] = sum_{x from 0 to infinity} x*lambda^{x}/x! * e^{-lambda} = lambda*e^{-lambda} * sum_{x from 1 to infinity} lambda^{x-1}/(x-1)! = lambda*e^{-lambda}*e^{lambda} = lambda
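The same result can be checked numerically by truncating the sum. In this sketch, the name `poisson_mean`, the cutoff at x = 100 and the value lambda = 4 are assumptions. The cutoff is only adequate for moderate values of lambda.

```python
from math import exp, factorial


def poisson_mean(lam: float, max_x: int = 100) -> float:
    """Truncated sum of x * lam^x / x! * e^(-lam) for x = 0..max_x."""
    return sum(x * lam**x / factorial(x) * exp(-lam) for x in range(max_x + 1))


print(poisson_mean(4.0))  # approximately 4
```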

**Exponential Distribution**

The exponential distribution is continuous and therefore it is impossible to take the sum over all possible outcomes. Also P(X = x) = 0 for all x. Instead we use the integral and the probability density function f(x). Then:

E[X] = integral_{-infty to infty} x*f(x) dx

The exponential distribution is only defined for x greater than or equal to zero, since a negative time between arrivals is impossible. This means the lower bound of the integral will be 0 instead of minus infinity.

E[X] = integral_{0 to infty} x*lambda*e^{-lambda*x} dx

To solve this integral one needs integration by parts, which gives E[X] = 1/lambda.
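For readers who want to see the steps, the calculation can be written out as follows. It takes u = x and dv = lambda*e^{-lambda*x} dx, so that v = -e^{-lambda*x}. The boundary term vanishes because x*e^{-lambda*x} goes to 0 as x goes to infinity when lambda > 0.

```latex
\begin{aligned}
E[X] &= \int_0^\infty x \, \lambda e^{-\lambda x} \, dx \\
     &= \Big[ -x e^{-\lambda x} \Big]_0^\infty + \int_0^\infty e^{-\lambda x} \, dx \\
     &= 0 + \Big[ -\tfrac{1}{\lambda} e^{-\lambda x} \Big]_0^\infty \\
     &= \frac{1}{\lambda}
\end{aligned}
```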

This is also very intuitive, since lambda is the intensity of arrivals, that is, the average number of arrivals in one time unit. So the average time until the next arrival is indeed 1/lambda.
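This intuition can be tested with a small simulation using `random.expovariate` from Python's standard library, which draws exponentially distributed values for a given rate. The rate lambda = 2, the seed and the sample size are arbitrary assumptions for this sketch.

```python
import random

lam = 2.0               # assumed intensity: two arrivals per time unit on average
rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw many waiting times between arrivals and average them.
gaps = [rng.expovariate(lam) for _ in range(100_000)]
mean_gap = sum(gaps) / len(gaps)

print(mean_gap, 1 / lam)  # the sample mean should be near 1/lambda
```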

Again, there are many more probability distributions, and each has its own expectation. The recipe, however, will always be the same. If the distribution is discrete, use the sum and P(X = x). If it is continuous, use the integral and the probability density function.

## Properties of the Expected Value

The expectation of the sum of two random variables is the sum of the expectations:

E[X+Y] = E[X] + E[Y]

Also, multiplying by a constant a inside the expectation is the same as multiplying outside:

E[aX] = aE[X]
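Both properties can be verified exactly for two fair dice by enumerating all 36 equally likely pairs. This is a sketch for illustration. The scalar a = 3 is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
pairs = list(product(die, die))  # all 36 equally likely (x, y) pairs
prob = Fraction(1, 36)

e_x = sum(Fraction(x, 6) for x in die)               # E[X] for one die
e_sum = sum((x + y) * prob for x, y in pairs)        # E[X + Y]
e_scaled = sum(3 * x * Fraction(1, 6) for x in die)  # E[3X]

print(e_sum == e_x + e_x)   # True
print(e_scaled == 3 * e_x)  # True
```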

However, the expectation of the product of two random variables is not equal to the product of the expectations, so:

E[X*Y] ≠ E[X]*E[Y] in general

When X and Y are independent, the two are equal. Independence is sufficient but not necessary: equality holds exactly when X and Y are uncorrelated.
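A die example shows the difference. In the sketch below, the independent case uses two separate fair dice, while the dependent case sets Y equal to X, so that X*Y = X^2. Both setups are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
e_x = sum(Fraction(x, 6) for x in die)  # E[X] = 7/2

# Independent case: X and Y are two separate fair dice.
e_xy_independent = sum(Fraction(x * y, 36) for x, y in product(die, die))

# Dependent case: Y is the same roll as X, so X*Y = X^2.
e_xy_dependent = sum(Fraction(x * x, 6) for x in die)

print(e_xy_independent, e_x * e_x)  # prints 49/4 49/4
print(e_xy_dependent)               # prints 91/6
```

In the dependent case, 91/6 is larger than 49/4, so the product rule fails.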

## The Variance

Another important measure for probability distributions is the variance. It quantifies the spread of the outcomes. Distributions with a low variance have outcomes that are concentrated close to the mean. If the variance is high, then the outcomes are spread out much more. If you want to know more about the variance and how to compute it I suggest reading my article about the variance.
