Definition
In statistics, when we talk about distributions, we usually mean probability distributions.
Definition (informal): A distribution is a function that shows the possible values for a variable and how often they occur.
Definition (Wikipedia): In probability theory and statistics, a probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment.
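To make the "function" part of the definition concrete, here is a minimal sketch using a fair six-sided die as an assumed example experiment. The helper `die_pmf` is hypothetical and only for illustration: it is the rule that assigns a probability to each outcome, and the simulated rolls show how observed relative frequencies settle near those probabilities.

```python
import random
from collections import Counter
from fractions import Fraction

def die_pmf(outcome: int) -> Fraction:
    """Hypothetical example: probability of rolling `outcome` with a fair six-sided die."""
    return Fraction(1, 6) if outcome in range(1, 7) else Fraction(0)

# The probabilities of all possible outcomes sum to 1.
total = sum(die_pmf(k) for k in range(1, 7))

# Simulated rolls: relative frequencies approach the probabilities as the number of rolls grows.
random.seed(42)
rolls = [random.randint(1, 6) for _ in range(60_000)]
counts = Counter(rolls)
rel_freq = {k: counts[k] / len(rolls) for k in range(1, 7)}

print(total)
print(rel_freq)  # each value should be close to 1/6
```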
Examples:
Normal distribution, Student’s t distribution, Poisson distribution, Uniform distribution, Binomial distribution
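Each of the examples above is a function from values to probabilities (or probability densities). A minimal sketch using only the Python standard library is shown below; the helper names are hypothetical, the normal distribution comes from `statistics.NormalDist`, and the parameter values are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def binomial_pmf(k: int, n: int, p: float) -> float:
    # P(X = k): k successes in n independent trials with success probability p
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    # P(X = k) for a count with average rate lam
    return lam**k * math.exp(-lam) / math.factorial(k)

def uniform_pdf(x: float, a: float, b: float) -> float:
    # Constant density on [a, b], zero elsewhere
    return 1 / (b - a) if a <= x <= b else 0.0

def student_t_pdf(t: float, nu: float) -> float:
    # Density of Student's t distribution with nu degrees of freedom
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

normal = NormalDist(mu=0, sigma=1)  # standard normal from the standard library

print(binomial_pmf(2, 4, 0.5))   # 0.375
print(poisson_pmf(0, 2.0))       # e^-2, about 0.1353
print(uniform_pdf(0.3, 0, 1))    # 1.0
print(normal.pdf(0))             # 1/sqrt(2*pi), about 0.3989
print(student_t_pdf(0, 1))       # 1/pi, about 0.3183
```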
Graphical representations
It is a common mistake to believe that the distribution is the graph. In fact, the distribution is the ‘rule’ that determines how values are positioned in relation to each other.
Very often, we use a graph to visualize the data. Since each distribution has a characteristic graphical representation, statisticians like to plot them.
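The separation between the rule and its picture can be illustrated with a minimal sketch: values are generated by a rule (here, assumed to be the standard normal via `random.gauss`), and only afterwards is a graph drawn from them. The `text_histogram` helper is hypothetical and renders a rough text histogram in place of a real plot.

```python
import random
from collections import Counter

def text_histogram(samples: list[float], bin_width: float = 1.0, scale: int = 50) -> list[str]:
    """Render samples as rows of '#', one row per bin; the tallest bin gets `scale` characters."""
    bins = Counter(int(x // bin_width) for x in samples)
    peak = max(bins.values())
    rows = []
    for b in range(min(bins), max(bins) + 1):
        bar = "#" * round(scale * bins[b] / peak)
        rows.append(f"{b * bin_width:6.1f} | {bar}")
    return rows

random.seed(0)
samples = [random.gauss(0, 1) for _ in range(10_000)]  # values produced by the normal "rule"
for row in text_histogram(samples, bin_width=0.5):
    print(row)
```

The same rule, sampled again, gives a slightly different picture each time, which is one way to see that the graph is a view of the distribution rather than the distribution itself.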
The Normal distribution is also known as the Gaussian distribution or the bell curve. It is one of the most common distributions for the following reasons:
• It approximates a wide variety of random variables
• Distributions of sample means with large enough sample sizes can be approximated by a normal distribution
• Its statistics are elegant and convenient to compute
• It is heavily used in regression analysis
• It has a good track record
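The second point above (sample means tend toward a normal distribution) can be checked by simulation. This is a minimal sketch under assumed settings: samples of size 30 drawn from a uniform(0, 1) distribution, repeated 5,000 times. The helper `sample_means` is hypothetical. Uniform(0, 1) has mean 1/2 and variance 1/12, so the means should cluster around 0.5 with standard deviation close to sqrt(1/12 / 30).

```python
import random
import statistics

def sample_means(n: int, trials: int, rng: random.Random) -> list[float]:
    """Means of `trials` samples, each of size n, drawn from a uniform(0, 1) distribution."""
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]

rng = random.Random(1)
means = sample_means(n=30, trials=5_000, rng=rng)

# Expected standard deviation of the sample mean (the standard error).
se = (1 / 12 / 30) ** 0.5
print(statistics.fmean(means), statistics.stdev(means), se)

# For a normal distribution, about 68% of values lie within one standard deviation of the mean.
share = sum(abs(m - 0.5) < se for m in means) / len(means)
print(share)
```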
Examples:
• Biology. Most biological measures are approximately normally distributed, such as: height; length of arms, legs, and nails; blood pressure; thickness of tree bark, etc.
• IQ tests
• Stock market information
Controlling for the standard deviation
Keeping the standard deviation constant, the graph of a normal distribution with:
• a smaller mean would have the same shape, but be situated to the left (in gray)
• a larger mean would have the same shape, but be situated to the right (in red)
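A minimal sketch of this behaviour with `statistics.NormalDist`, assuming a fixed standard deviation of 1.5 and means of -3, 0 and 3 (all values chosen only for illustration). Shifting the mean slides the whole curve horizontally without changing its shape or peak height.

```python
from statistics import NormalDist

sigma = 1.5  # assumed fixed standard deviation
base = NormalDist(mu=0, sigma=sigma)
left = NormalDist(mu=-3, sigma=sigma)   # smaller mean
right = NormalDist(mu=3, sigma=sigma)   # larger mean

# Same shape: each curve equals the base curve shifted by the difference in means.
for x in (-2.0, 0.0, 1.0):
    assert abs(left.pdf(x - 3) - base.pdf(x)) < 1e-12
    assert abs(right.pdf(x + 3) - base.pdf(x)) < 1e-12

# Same peak height, located at each distribution's own mean.
print(base.pdf(0), left.pdf(-3), right.pdf(3))
```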
Controlling for the mean
Keeping the mean constant, a normal distribution with:
• a smaller standard deviation would be situated in the same spot, but have a higher peak and thinner tails (in red)
• a larger standard deviation would be situated in the same spot, but have a lower peak and fatter tails (in gray)
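A minimal sketch of the effect of the standard deviation, assuming a fixed mean of 10 and standard deviations of 1 and 3 (illustrative values only). The peak height of a normal density is 1 / (σ√(2π)), so it drops as σ grows, while the probability of landing far from the mean rises. The helper `tail_prob` is hypothetical.

```python
import math
from statistics import NormalDist

mu = 10  # assumed fixed mean
narrow = NormalDist(mu, 1)   # smaller standard deviation
wide = NormalDist(mu, 3)     # larger standard deviation

# Peak height at the mean: 1 / (sigma * sqrt(2 * pi)).
print(narrow.pdf(mu), wide.pdf(mu))

def tail_prob(dist: NormalDist, distance: float) -> float:
    """Probability of falling more than `distance` away from the mean (both tails)."""
    return 2 * dist.cdf(dist.mean - distance)

# Mass more than 4 units from the mean: tiny for the narrow curve, substantial for the wide one.
print(tail_prob(narrow, 4), tail_prob(wide, 4))
```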
The Standard Normal distribution is a particular case of the Normal distribution. It has a mean of 0 and a standard deviation of 1.
Every Normal distribution can be ‘standardized’ using the standardization formula:
$$ z = \frac{x - \mu}{\sigma} $$
A variable following the Standard Normal distribution is denoted with the letter z.
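Applied to data, the formula z = (x − μ) / σ can be sketched as below. This is an illustration under stated assumptions: the mean and standard deviation are estimated from the data itself, the population standard deviation (`statistics.pstdev`) is used rather than the sample one (`statistics.stdev`), and the heights are made-up values. The `standardize` helper is hypothetical.

```python
import statistics

def standardize(data: list[float]) -> list[float]:
    """Return z-scores (x - mean) / sd, using the population standard deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]

heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical sample
z = standardize(heights_cm)
print(z)  # mean 0 and standard deviation 1 after standardizing
```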
Why standardize?
Standardization allows us to:
• compare different normally distributed datasets
• detect normality
• detect outliers
• create confidence intervals
• test hypotheses
• perform regression analysis
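Two of the uses listed above, comparing datasets and detecting outliers, can be sketched with z-scores. All scores, means, standard deviations and readings below are hypothetical, and the outlier threshold of |z| > 3 is a common rule of thumb rather than a fixed standard. The helpers `z_score` and `z_outliers` are illustrative names.

```python
import statistics

def z_score(x: float, mu: float, sigma: float) -> float:
    return (x - mu) / sigma

# Comparing datasets: hypothetical results on two tests with different scales.
score_a = z_score(85, mu=70, sigma=10)   # 1.5 standard deviations above test A's mean
score_b = z_score(30, mu=20, sigma=5)    # 2.0 standard deviations above test B's mean
print(score_a, score_b)  # 1.5 2.0, so the test B result is relatively stronger

# Detecting outliers: flag values whose |z| exceeds a threshold.
def z_outliers(data: list[float], threshold: float = 3.0) -> list[float]:
    mu, sigma = statistics.fmean(data), statistics.pstdev(data)
    return [x for x in data if abs(z_score(x, mu, sigma)) > threshold]

readings = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 50, 52, 48, 95]  # hypothetical data
print(z_outliers(readings))  # [95]
```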
Rationale of the formula for standardization:
We want to transform a random variable from $N(\mu, \sigma^2)$ to $N(0, 1)$.
Subtracting the mean from all observations would cause a transformation from $N(\mu, \sigma^2)$ to $N(0, \sigma^2)$, moving the graph to the origin.
Subsequently, dividing all observations by the standard deviation would cause a transformation from $N(0, \sigma^2)$ to $N(0, 1)$, standardizing the peak and the tails of the graph.
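The two steps can be traced on simulated data. This minimal sketch assumes observations drawn from a normal distribution with μ = 100 and σ = 15 (arbitrary values) and uses the sample's own mean and population standard deviation as estimates. After step 1 the mean is 0 and the spread is unchanged; after step 2 the standard deviation is 1.

```python
import random
import statistics

random.seed(7)
mu, sigma = 100.0, 15.0  # assumed parameters of N(mu, sigma^2)
x = [random.gauss(mu, sigma) for _ in range(1_000)]

m = statistics.fmean(x)   # estimated mean
s = statistics.pstdev(x)  # estimated standard deviation

# Step 1: subtract the mean, which centers the data at 0 and leaves the spread unchanged.
centered = [xi - m for xi in x]

# Step 2: divide by the standard deviation, which rescales the spread to 1.
z = [c / s for c in centered]

print(statistics.fmean(centered), statistics.pstdev(centered), s)
print(statistics.fmean(z), statistics.pstdev(z))
```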