As you have just seen, running regression in Excel is easy because all calculations are performed automatically. The interpretation of the results is a bit trickier because you need to know what is behind each number. Below you will find a breakdown of the four major parts of the regression analysis output.
Regression analysis output: Summary Output
This part tells you how well the calculated linear regression equation fits your source data.
Here's what each piece of information means:
Multiple R. It is the Correlation Coefficient that measures the strength of a linear relationship
between two variables. The correlation coefficient can be any value between -1 and 1, and its
absolute value indicates the relationship strength. The larger the absolute value, the stronger the
relationship:
- 1 means a strong positive relationship
- -1 means a strong negative relationship
- 0 means no relationship at all
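If you'd like to check this number outside Excel, below is a minimal sketch in Python. Only the first data pair (82 mm of rainfall, 15 umbrellas) comes from our example; the remaining values are made up for illustration, so the printed result will not match our output exactly.

```python
import numpy as np

# Illustrative data: only the first pair (82 mm, 15 umbrellas) comes from this
# example; the rest are made-up values for demonstration purposes.
rainfall = np.array([82, 93, 56, 123, 145, 65])   # x: average monthly rainfall, mm
sales = np.array([15, 25, 11, 34, 40, 16])        # y: umbrellas sold

# Multiple R is simply the Pearson correlation coefficient between x and y
multiple_r = np.corrcoef(rainfall, sales)[0, 1]
print(f"Multiple R: {multiple_r:.3f}")
```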
R Square. It is the Coefficient of Determination, which is used as an indicator of the goodness of fit. It shows how many points fall on the regression line. The R² value is calculated from the total sum of squares; more precisely, it is the sum of the squared deviations of the original data from the mean.
In our example, R² is 0.91 (rounded to 2 digits), which is fairly good. It means that 91% of our values fit the regression analysis model. In other words, 91% of the dependent variables (y-values) are explained by the independent variables (x-values). Generally, an R Squared of 95% or more is considered a good fit.
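For the curious, here is a small Python sketch that derives R² from the sums of squares described above. The data arrays are illustrative placeholders rather than the full dataset behind our example.

```python
import numpy as np

rainfall = np.array([82, 93, 56, 123, 145, 65])   # hypothetical x-values (mm)
sales = np.array([15, 25, 11, 34, 40, 16])        # hypothetical y-values

# Fit a straight line and compute the predicted y-values
slope, intercept = np.polyfit(rainfall, sales, 1)
predicted = slope * rainfall + intercept

ss_total = np.sum((sales - sales.mean()) ** 2)    # squared deviations from the mean
ss_residual = np.sum((sales - predicted) ** 2)    # squared deviations from the line
r_squared = 1 - ss_residual / ss_total

print(f"R Square: {r_squared:.2f}")
```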
Adjusted R Square. It is the R Square adjusted for the number of independent variables in the model. You will want to use this value instead of R Square for multiple regression analysis.
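The adjustment itself is a one-line formula. Here is a sketch of it in Python, assuming R², the number of observations n and the number of independent variables k are already known (the observation count below is just an example value).

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R Square: R Square penalized for each extra independent variable."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# For example, with R Square = 0.91, a hypothetical 24 observations
# and 1 independent variable:
print(round(adjusted_r_squared(0.91, 24, 1), 3))   # ≈ 0.906
```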
Standard Error. It is another goodness-of-fit measure that shows the precision of your regression analysis - the smaller the number, the more certain you can be about your regression equation. While R² represents the percentage of the dependent variable's variance that is explained by the model, Standard Error is an absolute measure that shows the average distance that the data points fall from the regression line.
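If you want to verify this value by hand, the standard error of the regression is the square root of the Residual SS divided by the residual degrees of freedom. Here is a minimal Python sketch with made-up inputs:

```python
from math import sqrt

def regression_standard_error(ss_residual, n, k):
    """Standard error of the regression: sqrt(Residual SS / residual degrees of freedom),
    i.e. the typical distance of the data points from the fitted line, in units of y."""
    return sqrt(ss_residual / (n - k - 1))

# Hypothetical inputs: Residual SS = 135, 24 observations, 1 independent variable
print(round(regression_standard_error(135.0, 24, 1), 3))
```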
Observations. It is simply the number of observations in your model.
Regression analysis output: ANOVA
The second part of the output is Analysis of Variance (ANOVA):
Basically, it splits the sum of squares into individual components that give information about the
levels of variability within your regression model:
- df is the number of degrees of freedom associated with the sources of variance.
- SS is the sum of squares. The smaller the Residual SS compared with the Total SS, the
better your model fits the data.
- MS is the mean square.
- F is the F statistic, or F-test for the null hypothesis. It is used to test the overall
significance of the model.
- Significance F is the P-value of F.
The ANOVA part is rarely used for a simple linear regression analysis in Excel, but you should
definitely have a close look at the last component. The Significance F value gives an idea of
how reliable (statistically significant) your results are. If Significance F is less than 0.05 (5%),
your model is OK. If it is greater than 0.05, you'd probably be better off choosing another independent variable.
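If you'd like to see how the ANOVA columns relate to each other, the Python sketch below walks through the arithmetic for a simple linear regression. The sums of squares and observation count are hypothetical placeholders, and scipy is assumed to be available for the F-distribution P-value.

```python
from scipy import stats

# Hypothetical values, for illustration only (not the exact numbers from this article):
n = 24                  # number of observations
k = 1                   # number of independent variables (simple regression)
ss_total = 1500.0       # Total SS: squared deviations of y from its mean
ss_residual = 135.0     # Residual SS: squared deviations of y from the fitted line

ss_regression = ss_total - ss_residual          # SS explained by the model
df_regression, df_residual = k, n - k - 1       # degrees of freedom

ms_regression = ss_regression / df_regression   # MS = SS / df
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual            # F = MS regression / MS residual
significance_f = stats.f.sf(f_stat, df_regression, df_residual)  # P-value of F

print(f"F = {f_stat:.1f}, Significance F = {significance_f:.2e}")
```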
Regression analysis output: coefficients
This section provides specific information about the components of your analysis:
The most useful component in this section is Coefficients. It enables you to build a linear regression equation in Excel:
y = bx + a
For our data set, where y is the number of umbrellas sold and x is the average monthly rainfall, our linear regression formula goes as follows:
Y = Rainfall Coefficient * x + Intercept
Equipped with the a and b values rounded to three decimal places, it turns into:
Y = 0.45*x - 19.074
For example, with the average monthly rainfall equal to 82 mm, the umbrella sales would be
approximately 17.8:
0.45*82 - 19.074 ≈ 17.8
In a similar manner, you can find out how many umbrellas are going to be sold with any other
monthly rainfall (x variable) you specify.
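As a quick illustration, the same prediction can be wrapped in a tiny Python helper built from the coefficients above (the function name is ours, purely for this example):

```python
def predict_umbrella_sales(rainfall_mm):
    """Predicted umbrellas sold for a given average monthly rainfall (mm),
    using the fitted equation y = 0.45 * x - 19.074 from the output above."""
    return 0.45 * rainfall_mm - 19.074

print(round(predict_umbrella_sales(82), 1))   # 17.8, matching the calculation above
```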
Regression analysis output: residuals
If you compare the estimated and actual number of sold umbrellas corresponding to the monthly
rainfall of 82 mm, you will see that these numbers are slightly different:
- Estimated: 17.8 (calculated above)
- Actual: 15 (row 2 of the source data)
Why the difference? Because independent variables are never perfect predictors of the dependent variable. The residuals can help you understand how far away the actual values are from the predicted values:
For the first data point (rainfall of 82 mm), the residual is approximately -2.8. So, if we add this number to the predicted value, we get the actual value: 17.8 - 2.8 = 15.
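To make the arithmetic explicit, here is a quick check of that residual in Python, using the numbers from our example:

```python
predicted = 0.45 * 82 - 19.074    # estimated sales, ≈ 17.8
actual = 15                       # observed sales for 82 mm of rainfall (row 2)
residual = actual - predicted     # ≈ -2.8

print(round(residual, 1))               # -2.8
print(round(predicted + residual, 1))   # 15.0 - adding the residual recovers the actual value
```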