The ‘Hello World’ of data analysis, IRIS data
When learning to program, we almost always start by printing “Hello World” with the print function.
The phrase comes from the famous book The C Programming Language by Brian Wilson Kernighan and Dennis Ritchie.
When entering the field of data analysis, you will also see a common starting point.
On the UCI website, which has been providing data sets for machine learning for years,
the iris data is among the most popular ones.
Most machine learning courses start with this iris data, which consists of 150 rows and 5 columns.
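As a quick check of those numbers, here is a minimal sketch that loads the copy of the iris data bundled with scikit-learn. It assumes scikit-learn is installed; the UCI file itself is not downloaded. The bundled copy stores the four measurements and the species label separately, so the fifth column is counted by hand.

```python
# Assumption: scikit-learn is installed; it ships a bundled copy of the iris data.
from sklearn.datasets import load_iris

iris = load_iris()

# iris.data holds the four measurement columns; the species label lives in iris.target.
n_rows, n_features = iris.data.shape
n_columns = n_features + 1  # four measurements plus the species column

print(n_rows, n_columns)
print(iris.feature_names)
print([str(name) for name in iris.target_names])
```
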
How could this simple data set become the ‘Hello World’ of data analysis?
The British statistician and biologist Ronald Fisher introduced this iris data set in his 1936 paper
"The use of multiple measurements in taxonomic problems", as a worked example for multivariate statistics.
He is indeed a historical figure who coined many of the terms we learn in statistics.
Coming back to our story: the person who actually collected the iris data was Edgar Anderson,
an American botanist who came to study with Fisher.
Presumably, it was too much for Fisher, already a senior figure in statistics,
to do the hard work of observing flowers out in the sun. So Anderson, who had come to do research on a fellowship,
put the data set together using his expertise in morphology.
It might seem simple, but this data records the length and width of the petals and sepals,
which makes a morphological classification of iris flowers possible.
The flowers in the data set belong to 3 species: Setosa, Versicolour, and Virginica. Each is distinguished by the length and width of its petals and sepals.
They all look like the same kind of flower, but the measurements differ by species.
It is a cleanly labeled data set, which makes it very useful for practicing analysis.
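To see how the measurements separate the species, here is a minimal sketch using only the Python standard library. It works on six rows taken from the data set (two per species) in the comma-separated layout of the UCI file, so the averages only illustrate the idea and are not statistics for the full 150 rows. The helper name mean_petal_length is made up for this example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Six sample rows only (two per species), laid out like the UCI file:
# sepal length, sepal width, petal length, petal width (all in cm), species.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def mean_petal_length(text):
    """Return the average petal length per species for CSV text in the layout above."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        groups[row[4]].append(float(row[2]))  # column 2 is petal length
    return {species: mean(values) for species, values in groups.items()}


for species, value in mean_petal_length(SAMPLE).items():
    print(f"{species}: {value:.2f} cm")
```

Even in this tiny sample the petal length alone puts the three species in a clear order, which hints at why the full data set is so convenient for a first classification exercise.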