Distributions


Definition


In statistics, when we talk about distributions we usually mean
probability distributions.

 

Definition (informal): A distribution is a function that shows the possible values for a variable and how often they occur.

 

Definition (Wikipedia): In probability theory and statistics, a probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment.
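
To make the definitions concrete, here is a minimal sketch that builds the distribution of the sum of two fair six-sided dice. The dice example is an illustration chosen for this note and is not part of the lesson. It lists every possible value and the probability with which it occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice
# (the dice setup is an assumed example).
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]

# Count how often each sum occurs, then convert counts to probabilities.
counts = Counter(outcomes)
distribution = {
    total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())
}

for total, prob in distribution.items():
    print(f"sum = {total:2d}  probability = {prob}")
```

The dictionary is the distribution: a rule mapping each possible value to its probability. The probabilities add up to 1.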

 















Examples:
Normal distribution, Student’s t-distribution, Poisson distribution, Uniform distribution, Binomial distribution



Graphical representations


It is a common mistake to believe that the distribution is the graph. In fact, the distribution is the ‘rule’ that determines how values are positioned in relation to each other.

 

Very often, we use a graph to visualize the data. Since different distributions have a particular graphical representation, statisticians like to plot them.







The Normal Distribution

The Normal distribution is also known as the Gaussian distribution or the bell curve. It is one of the most commonly used distributions for the following reasons:

• It approximates a wide variety of random variables

• Distributions of sample means with large enough sample sizes can be approximated by a Normal distribution

• Its computable statistics are elegant and easy to work with

• It is heavily used in regression analysis

• It has a good track record
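
The point about sample means can be checked with a small simulation. The sketch below is only an illustration: the Uniform(0, 1) population, the sample size, the number of samples, and the seed are all assumed values, not taken from the lesson. It draws many samples from a population that is not bell-shaped and examines how their means are distributed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50     # observations per sample (assumed value)
NUM_SAMPLES = 5000   # number of sample means collected (assumed value)

# Each sample comes from Uniform(0, 1), which is flat rather than bell-shaped.
sample_means = [
    statistics.fmean(random.random() for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)

# Theory for Uniform(0, 1): mean 0.5, standard deviation sqrt(1/12).
# The standard deviation of the sample mean shrinks by sqrt(SAMPLE_SIZE).
expected_std = (1 / 12) ** 0.5 / SAMPLE_SIZE ** 0.5

# For a Normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_std = sum(
    abs(m - mean_of_means) <= std_of_means for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.4f}")
print(f"std of sample means:  {std_of_means:.4f} (theory: {expected_std:.4f})")
print(f"share within one std: {within_one_std:.3f}")
```

If the sample means are approximately Normal, the share within one standard deviation should come out close to 0.68, even though the underlying population is uniform.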





Examples:

• Biology. Many biological measures are approximately normally distributed, such as: height; length of arms, legs, nails; blood pressure; thickness of tree bark, etc.

• IQ tests

• Stock market information


Controlling for the standard deviation





Keeping the standard deviation constant, the graph of a normal distribution with:


• a smaller mean would look the same, but be shifted to the left (in gray)

• a larger mean would look the same, but be shifted to the right (in red)
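
As a rough numerical check of this behavior, the sketch below uses `statistics.NormalDist` from the Python standard library. The specific means and the shared standard deviation are assumed values picked for illustration. It compares the location and height of the peak for three curves.

```python
from statistics import NormalDist

SIGMA = 1.0  # shared standard deviation (assumed value)

# Three Normal distributions that differ only in their mean (assumed values).
left = NormalDist(mu=-2.0, sigma=SIGMA)
middle = NormalDist(mu=0.0, sigma=SIGMA)
right = NormalDist(mu=2.0, sigma=SIGMA)

for name, dist in [("smaller mean", left), ("reference", middle), ("larger mean", right)]:
    # The density of a Normal distribution peaks at its mean.
    peak_height = dist.pdf(dist.mean)
    print(f"{name:12s} peak at x = {dist.mean:+.1f}, height = {peak_height:.4f}")
```

The peak height is the same in all three cases. Only the position along the horizontal axis changes.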



Controlling for the mean




Keeping the mean constant, a normal distribution with:

• a smaller standard deviation would be situated in the same spot, but have a higher peak and thinner tails (in red)

• a larger standard deviation would be situated in the same spot, but have a lower peak and fatter tails (in gray)
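
The effect of the standard deviation can also be seen numerically. This is a minimal sketch with assumed parameter values (mean 0, standard deviations 0.5 and 2.0). It compares the peak height and the probability of landing far above the mean.

```python
from statistics import NormalDist

MU = 0.0  # shared mean (assumed value)

narrow = NormalDist(mu=MU, sigma=0.5)  # smaller standard deviation
wide = NormalDist(mu=MU, sigma=2.0)    # larger standard deviation

for name, dist in [("smaller std", narrow), ("larger std", wide)]:
    peak_height = dist.pdf(MU)
    # Probability of a value more than 3 units above the mean: a tail measure.
    upper_tail = 1 - dist.cdf(MU + 3.0)
    print(f"{name:11s} peak height = {peak_height:.4f}, P(X > 3) = {upper_tail:.2e}")
```

The distribution with the smaller standard deviation has the taller peak and the far smaller tail probability, matching the description above.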






 The Standard Normal Distribution





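The Standard Normal distribution is a particular case of the Normal distribution. It has a mean of 0 and a standard deviation of 1. Every Normal distribution can be ‘standardized’ using the standardization formula:

z = (x − μ) / σ

where x is an observation, μ is the mean, and σ is the standard deviation of the original distribution.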



A variable following the Standard Normal distribution is denoted with the letter z.

Why standardize?
Standardization allows us to:

• compare different normally distributed datasets

• detect normality


• detect outliers

• create confidence intervals

• test hypotheses

• perform regression analysis
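
Two of these uses, comparing datasets and detecting outliers, can be sketched in a few lines. Everything concrete here is hypothetical: the exam parameters, the measurements, the assumed process mean and standard deviation, and the |z| > 3 cutoff (a common rule of thumb, not something the lesson prescribes).

```python
def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma


# Comparing scores from two differently scaled exams (hypothetical parameters).
z_exam_a = z_score(85, mu=70, sigma=10)     # (85 - 70) / 10
z_exam_b = z_score(620, mu=500, sigma=100)  # (620 - 500) / 100
better = "exam A" if z_exam_a > z_exam_b else "exam B"
print(f"z on exam A = {z_exam_a}, z on exam B = {z_exam_b}; stronger result: {better}")

# Flagging outliers, assuming the process is known to follow N(10, 0.2^2).
measurements = [9.8, 10.1, 10.0, 9.9, 14.2, 10.2]
CUTOFF = 3.0  # assumed threshold on |z|
outliers = [x for x in measurements if abs(z_score(x, mu=10.0, sigma=0.2)) > CUTOFF]
print("outliers:", outliers)
```

Because both exam scores are expressed in standard deviations from their own mean, they can be compared directly even though the raw scales differ.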

                   




Rationale of the formula for standardization:

We want to transform a random variable from N(μ, σ²) to N(0, 1).
Subtracting the mean from all observations causes a transformation from N(μ, σ²) to N(0, σ²), moving the center of the graph to the origin.
Subsequently, dividing all observations by the standard deviation causes a transformation from N(0, σ²) to N(0, 1), standardizing the peak and the tails of the graph.
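
The two steps can be followed on simulated data. This is a minimal sketch: the values μ = 50 and σ = 8, the sample count, and the seed are assumptions made for illustration. It applies the subtraction and the division separately and reports the mean and standard deviation after each step.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA = 50.0, 8.0  # assumed parameters of the original distribution
observations = [random.gauss(MU, SIGMA) for _ in range(10_000)]

# Step 1: subtract the mean. The center moves to 0, the spread is unchanged.
centered = [x - MU for x in observations]

# Step 2: divide by the standard deviation. The spread becomes 1.
standardized = [c / SIGMA for c in centered]

for name, data in [
    ("original", observations),
    ("centered", centered),
    ("standardized", standardized),
]:
    print(f"{name:12s} mean = {statistics.fmean(data):7.3f}, "
          f"std = {statistics.stdev(data):6.3f}")
```

The sample statistics will not be exactly 0 and 1 because the data are random, but they should be close for a sample of this size.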