I don’t think there is a person who doesn’t take a step backwards when they first hear they need to perform a regression, especially the dreaded Multiple Linear Regression (a regression with more than one explanatory variable).

Let me help everyone who is frightened by this. It looks scary, and the formulas look like some secret spy code. The good news is that most of us have done a similar analysis; we just didn’t write the regression model out.

Simply put, a regression puts the “variable of interest” (the dependent variable) on one side of the equation and the variables that we believe contribute to or explain it (the independent variables) on the other side.

That explanation sounds scary even to me, as simplified as it is.

The example used in many textbooks is the salary analysis.

Salary = years of experience + years of education + average salary in a field

When it is put into the standard form, we shorten the variable names and add some requisite pieces to the equation that I’ll explain in a minute.

Variables

S – Salary (the dependent variable from dataset)

E – years of Experience (Independent variable from dataset)

D – years of Education (Independent variable from dataset)

F – average Salary in Field (Independent variable from dataset)

α – the Y-axis intercept

β – a variable’s coefficient. There is one for each independent variable, so subscripts are assigned to them: β1, β2, β3…

μ – error term component. This is effectively a stand-in for all the possible variables we do not know of or do not have data for.

S = α + β1E + β2D + β3F + μ
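
To make the equation less abstract, here is a minimal sketch that simply plugs numbers into it. The coefficient values and the person described are invented for illustration; they are not estimates from any real dataset.

```python
# Hypothetical coefficients, invented for illustration only.
alpha = 20_000.0   # intercept
beta1 = 1_500.0    # per year of experience (E)
beta2 = 2_000.0    # per year of education (D)
beta3 = 0.25       # per dollar of average salary in the field (F)


def predicted_salary(E, D, F):
    """Return alpha + beta1*E + beta2*D + beta3*F (the model without the error term)."""
    return alpha + beta1 * E + beta2 * D + beta3 * F


# One made-up person: 10 years of experience, 16 years of education,
# working in a field whose average salary is 60,000.
prediction = predicted_salary(E=10, D=16, F=60_000)

# A made-up observed salary. The gap between it and the prediction is the
# part the three variables did not account for, which is the role μ plays.
actual = 85_000.0
residual = actual - prediction

print(prediction)  # 82000.0
print(residual)    # 3000.0
```

Here the three variables account for 82,000 of the 85,000 salary, and the remaining 3,000 lands in the error term.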

The three parts you can’t do without: α, β, μ.

α – the Y-axis intercept, that is, the value of S when every independent variable is zero.

β – the coefficient of a variable. It is tempting to read it as a simple “this variable contributes 45% of the variable of interest”, but that IS NOT what a coefficient does in a regression. A coefficient tells us how much the variable of interest is expected to change when that variable increases by one unit and the other variables are held constant.
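
A small sketch of what a coefficient does in practice: change one variable by one unit, leave the others alone, and look at how the prediction moves. The coefficient values below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical coefficients, invented for illustration only.
alpha, beta1, beta2, beta3 = 20_000.0, 1_500.0, 2_000.0, 0.25


def predicted_salary(E, D, F):
    """Evaluate the model S = alpha + beta1*E + beta2*D + beta3*F."""
    return alpha + beta1 * E + beta2 * D + beta3 * F


base = predicted_salary(E=10, D=16, F=60_000)
one_more_year = predicted_salary(E=11, D=16, F=60_000)  # only E changes

change = one_more_year - base
print(change)  # 1500.0, which is exactly beta1
```

The prediction moves by β1 dollars, not by some percentage of the salary, which is the distinction the definition above is making.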

μ – error term component, sometimes written as U (unknown), e or ε (error).

The goal of a regression is to test how well the variables we hypothesize explain the dependent variable of interest. It would be great if our hypothesized equation (the model) explained 100% of the variable of interest. Well, let me tell you, that won’t happen. The μ error term is there to absorb what our variables leave unexplained, but the variables themselves will never get us all the way there.
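
One common way to put a number on “how well do the variables explain the dependent variable” is R-squared, the share of the variation in S that the model accounts for. The sketch below is only an illustration under stated assumptions: the dataset is synthetic, generated from made-up coefficients plus random noise, salaries are expressed in thousands, and the fit is done with ordinary least squares written out by hand using the standard library. In practice a statistics package would do this step for you.

```python
import random

random.seed(0)

# Synthetic data: made-up "true" coefficients plus random noise.
# S and F are in thousands of dollars to keep the numbers on similar scales.
rows = []
for _ in range(200):
    E = random.uniform(0, 30)     # years of experience
    D = random.uniform(10, 22)    # years of education
    F = random.uniform(40, 120)   # average salary in field
    noise = random.gauss(0, 5)    # stands in for everything we did not measure
    S = 10 + 1.5 * E + 2.0 * D + 0.5 * F + noise
    rows.append((E, D, F, S))

X = [[1.0, E, D, F] for E, D, F, _ in rows]  # the leading 1.0 is the intercept column
y = [row[3] for row in rows]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Ordinary least squares via the normal equations: (X'X) coef = X'y
k = 4
XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
alpha_hat, b1_hat, b2_hat, b3_hat = solve(XtX, Xty)

# R-squared = 1 - (unexplained variation / total variation)
predictions = [alpha_hat + b1_hat * E + b2_hat * D + b3_hat * F for E, D, F, _ in rows]
mean_y = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("estimated coefficients:", alpha_hat, b1_hat, b2_hat, b3_hat)
print("R-squared:", r_squared)
```

Because noise was deliberately added to the data, R-squared comes out below 1 even though the model uses exactly the variables that generated S. That is the “we won’t get all the way there” point in miniature.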

It is not uncommon to find that one of the independent variables does not explain the dependent variable at all, at which point a new model omitting that variable is in order, unless we want to support the hypothesis that the variable has nothing to do with the dependent variable.

If we go back to the model above and find that D (years of education) does not explain salary, we may be inclined to re-specify the model without variable D. Personally, I would re-examine the dataset for issues first, because logically we probably all believe years of education directly affects salary. Why else would we have studied stats?