Distributions

Lesson 9/26 | Study Time: 1 Min

Course: Statistics for Data Science and Analytics.

Definition

In statistics, when we talk about distributions we usually mean
probability distributions.

Definition (informal): A distribution is a function that shows the possible values for a variable and how often they occur.

Definition (Wikipedia): In probability theory and statistics, a probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment.
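As a small illustration of the definitions above, the sketch below writes the distribution of a fair six-sided die as an explicit rule that maps each outcome to its probability. The die and the names `die_pmf` and `probability` are illustrative assumptions, not part of the lesson.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
# The distribution is the rule that assigns a probability to each outcome.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def probability(outcome):
    """Return P(X = outcome); outcomes that cannot occur get probability 0."""
    return die_pmf.get(outcome, Fraction(0))

# The probabilities of all possible outcomes add up to 1.
total = sum(die_pmf.values())

print(probability(3))  # 1/6
print(probability(7))  # 0
print(total)           # 1
```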

Examples:
Normal distribution, Student’s T distribution, Poisson distribution, Uniform distribution, Binomial distribution

Graphical representations

It is a common mistake to believe that the distribution is the graph. In fact, the distribution is the ‘rule’ that determines how values are positioned in relation to each other.

Very often, we use a graph to visualize the data. Since each distribution has its own characteristic graphical representation, statisticians like to plot them.

The Normal Distribution

The Normal distribution is also known as the Gaussian distribution or the bell curve. It is one of the most commonly used distributions for the following reasons:

• It approximates a wide variety of random variables

• Distributions of sample means with large enough sample sizes can be approximated by a normal distribution

• All computable statistics are elegant

• It is heavily used in regression analysis

• It has a good track record
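The point about sample means can be checked with a short simulation. The sketch below is only an illustration under assumed settings (a Uniform(0, 1) source, samples of size 50, 5000 repetitions, a fixed seed); none of these choices come from the lesson.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1), which is flat rather than bell-shaped."""
    return statistics.fmean(random.random() for _ in range(n))

n = 50                                        # size of each sample (assumed "large enough")
means = [sample_mean(n) for _ in range(5000)]

# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12),
# so the sample means should have standard deviation sqrt(1/12) / sqrt(n).
expected_sd = (1 / 12) ** 0.5 / n ** 0.5

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - 0.5) <= expected_sd for m in means) / len(means)

print(round(statistics.fmean(means), 3))   # close to 0.5
print(round(statistics.stdev(means), 3))   # close to expected_sd
print(round(expected_sd, 3))
print(round(within_one_sd, 2))             # close to 0.68
```

Even though the individual draws are uniform, the sample means cluster around 0.5 in a way that matches what a normal distribution predicts.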

Examples:

• Biology. Many biological measures are approximately normally distributed, such as: height; length of arms, legs, nails; blood pressure; thickness of tree bark, etc.

• IQ tests

• Stock market information

Controlling for the standard deviation

Keeping the standard deviation constant, the graph of a normal distribution with:

• a smaller mean would look the same, but be situated to the left (the gray curve in the original figure)

• a larger mean would look the same, but be situated to the right (the red curve in the original figure)
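The effect of changing only the mean can be verified numerically. This is a minimal sketch using Python's standard `statistics.NormalDist`; the specific means (-3 and 4) and the standard deviation (2) are made-up values for illustration.

```python
from math import isclose
from statistics import NormalDist

sigma = 2.0                               # same standard deviation for both curves
left = NormalDist(mu=-3.0, sigma=sigma)   # smaller mean
right = NormalDist(mu=4.0, sigma=sigma)   # larger mean

# Both curves reach the same peak height, each at its own mean.
same_peak = isclose(left.pdf(left.mean), right.pdf(right.mean))
print(same_peak)   # True

# The density at "mean + offset" is the same for both curves:
# identical shape, different location on the horizontal axis.
for offset in (-2.0, 0.5, 3.0):
    print(offset,
          round(left.pdf(left.mean + offset), 6),
          round(right.pdf(right.mean + offset), 6))
```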

Controlling for the mean

Keeping the mean constant, a normal distribution with:

• a smaller standard deviation would be situated in the same spot, but have a higher peak and thinner tails (the red curve in the original figure)

• a larger standard deviation would be situated in the same spot, but have a lower peak and fatter tails (the gray curve in the original figure)
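To make the peak and tail behaviour concrete, the sketch below compares two normal distributions with the same mean and different standard deviations. The values 1 and 3 and the cut-off of 4 units are assumptions chosen for illustration. It relies on the fact that the peak height of a normal density is 1 / (σ√(2π)).

```python
from math import pi, sqrt
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=1.0)   # smaller standard deviation
wide = NormalDist(mu=0.0, sigma=3.0)     # larger standard deviation

for dist in (narrow, wide):
    peak = dist.pdf(dist.mean)                  # height of the curve at the mean
    formula = 1 / (dist.stdev * sqrt(2 * pi))   # closed-form peak height
    tail = 2 * dist.cdf(-4.0)                   # probability of being more than 4 units from the mean
    print(dist.stdev, round(peak, 4), round(formula, 4), tail)
```

The distribution with the smaller standard deviation has the higher peak and puts far less probability in the tails.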

The Standard Normal Distribution

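The Standard Normal distribution is a particular case of the Normal distribution. It has a mean of 0 and a standard deviation of 1. Every Normal distribution can be ‘standardized’ using the standardization formula:

z = (x − μ) / σ

where x is an observation, μ is the mean and σ is the standard deviation of the original distribution.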

A variable following the Standard Normal distribution is denoted with the letter z.
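Standardization subtracts the mean from each observation and divides the result by the standard deviation. A minimal sketch of that calculation follows; the data values and the helper name `standardize` are made up for illustration, and the use of the population standard deviation (`pstdev`) is an assumption of this example.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z = (x - mean) / standard deviation for every x in values."""
    mu = fmean(values)
    sigma = pstdev(values)   # population standard deviation (assumption of this sketch)
    return [(x - mu) / sigma for x in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up observations
z = standardize(data)

print(z)           # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(fmean(z))    # 0.0
print(pstdev(z))   # 1.0
```

After the transformation the values have a mean of 0 and a standard deviation of 1, whatever the original scale was.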

Why standardize?
Standardization allows us to:

• compare different
normally distributed
datasets

• detect normality

• detect outliers

• create confidence intervals

• test hypotheses

• perform regression analysis
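Two of the uses listed above, comparing datasets and detecting outliers, can be illustrated with z-scores. The exam figures below are hypothetical, and the threshold of 3 standard deviations is a common rule of thumb assumed here, not something stated in the lesson.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam results, each assumed to be roughly normally distributed.
math_z = z_score(82, mean=70, sd=8)
physics_z = z_score(75, mean=60, sd=6)

# The lower raw score (75) is the stronger result relative to its own exam.
print(math_z, physics_z)   # 1.5 2.5

# Rule of thumb (an assumption): |z| > 3 marks a possible outlier.
is_outlier = abs(z_score(98, mean=70, sd=8)) > 3
print(is_outlier)          # True, since z = 3.5
```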

Rationale of the formula for standardization:

We want to transform a random variable from N(μ, σ²) to N(0, 1).
Subtracting the mean from all observations would cause a transformation from N(μ, σ²) to N(0, σ²), moving the graph to the origin.
Subsequently, dividing all observations by the standard deviation would cause a transformation from N(0, σ²) to N(0, 1), standardizing the peak and the tails of the graph.
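The two steps of the rationale can be followed on simulated data. In this sketch the parameters (mean 170, standard deviation 10), the sample size and the seed are assumptions chosen for illustration. Because the data are random draws, the printed values are only approximately equal to the theoretical ones.

```python
import random
from statistics import fmean, pstdev

random.seed(1)                       # fixed seed so the run is repeatable
mu, sigma = 170.0, 10.0              # assumed parameters of the original variable
x = [random.gauss(mu, sigma) for _ in range(10_000)]

centered = [v - mu for v in x]                  # step 1: N(mu, sigma^2) -> N(0, sigma^2)
standardized = [c / sigma for c in centered]    # step 2: N(0, sigma^2) -> N(0, 1)

for name, values in (("original", x),
                     ("centered", centered),
                     ("standardized", standardized)):
    # Print the sample mean and standard deviation after each step.
    print(name, round(fmean(values), 2), round(pstdev(values), 2))
```

Subtracting the mean changes only the location (the mean moves to about 0), and dividing by the standard deviation changes only the spread (it becomes about 1).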


Xaviour Aluku

Product Designer

