Regression Analysis, Everything Converges to Average.
If we define machine learning as a method of learning from data, predicting, and classifying,
the most famous machine learning algorithm would be regression analysis,
which was developed in the field of statistics.
In particular, regression analysis was the most frequently
used forecasting technique in academic papers
and administrative work before the advent of big data.
This was because it could express the relationship between independent variables
and a dependent variable as a formula, with relatively little data.
For a more elaborate example, we can predict the relationship
between the apartment price,
which is the dependent variable, and several independent variables
(like space, access, school district, or convenience facilities)
with the method of multiple regression analysis.
We use this method because we assume that data tends to converge
to a specific value, the so-called 'regression to the mean'.
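The apartment example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two features (floor area and minutes to a station), the data, and the helper names `solve_linear_system` and `fit_multiple_regression` are all made up for this sketch, and the prices are generated from a known formula so the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest absolute pivot for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(rows, targets):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    # Prepend a constant 1.0 so the first coefficient acts as the intercept.
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical apartments: (floor area in m^2, minutes to the nearest station).
features = [(60, 10), (85, 5), (100, 15), (120, 8), (75, 20)]
# Made-up prices generated exactly as 50 + 3 * area - 2 * minutes.
prices = [50 + 3 * area - 2 * minutes for area, minutes in features]

intercept, per_m2, per_minute = fit_multiple_regression(features, prices)
print(intercept, per_m2, per_minute)
```

Because the made-up prices follow the formula exactly, the fitted values should come out close to 50, 3, and -2. Real apartment data would be noisy, and the fit would only approximate the underlying relationship.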
Who first thought of this 'regression analysis'?
It was Francis Galton, who was interested in genetics.
Among his 340 papers, there is a study that observed the relationship
between parents' and children's heights.
Galton averaged the father's and mother's heights, adjusting for the difference
between them, and called the result 'midparent height'.
Then, he drew a graph that grouped the data by midparent height.
The title of this paper is
'Regression towards Mediocrity in Hereditary Stature'.
One interesting fact is that he discovered it not from a regression equation,
but rather by drawing a precise line through the plotted values.
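The effect Galton saw can be reproduced with a small simulation. This is a sketch with invented numbers, not Galton's data: the population mean, the spreads, and the model (a shared inherited part plus independent variation) are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

MEAN_HEIGHT = 170.0  # hypothetical population mean, in cm
N_FAMILIES = 5000

midparents, children = [], []
for _ in range(N_FAMILIES):
    inherited = random.gauss(0, 5)  # part shared by parents and child
    # Each side adds its own independent variation on top of the shared part.
    midparents.append(MEAN_HEIGHT + inherited + random.gauss(0, 5))
    children.append(MEAN_HEIGHT + inherited + random.gauss(0, 5))


def mean(values):
    return sum(values) / len(values)


# Slope of the least squares line of child height on midparent height.
mp_bar, ch_bar = mean(midparents), mean(children)
slope = (
    sum((p - mp_bar) * (c - ch_bar) for p, c in zip(midparents, children))
    / sum((p - mp_bar) ** 2 for p in midparents)
)

# Families whose midparent height is well above the population mean.
tall = [(p, c) for p, c in zip(midparents, children) if p > MEAN_HEIGHT + 10]
tall_parent_mean = mean([p for p, _ in tall])
tall_child_mean = mean([c for _, c in tall])

print(slope, tall_parent_mean, tall_child_mean)
```

Under these assumptions the slope is below 1, and the children of unusually tall parents are on average taller than the population but shorter than their parents. That pull back toward the average is what the word 'regression' originally described.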
There are some other famous people known for the regression equation.
Their contribution is the method of least squares.
Let's listen to what these two people have to say.
Historically, Gauss, who is said to have thought of it earlier, is given the credit.
The least squares method is a procedure that finds the line (or curve)
that minimizes the sum of the squared distances between the line and each data point.
Through this method of least squares,
we are able to find the regression equation in regression analysis.
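For a single independent variable, the least squares line has a simple closed form. The sketch below uses made-up data points and hypothetical helper names (`least_squares_line`, `sum_squared_residuals`) to show the idea.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data points lying roughly along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line gives a larger total, for example a slightly steeper one.
steeper = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, steeper)
```

For these points the fitted slope is about 1.99 and the intercept about 0.05. Nudging either value away from the fit increases the sum of squared residuals, which is exactly the quantity the method minimizes.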
But did this least squares method really come first?
Its basis, the Bayes algorithm, however, has a definite source.
It comes from Thomas Bayes.
He was a British Presbyterian minister and a mathematician in the 1700s.