Wes McKinney, the Creator of Pandas: the Masterpiece of Data Analysis in Python
Today, I’d like to introduce Wes McKinney. He is an open source software developer, mainly focusing on data analysis tools.
He is especially interested in:
1. Improving user productivity
2. Increasing performance and efficiency
3. Increasing data interoperability
Born in 1985, he was a top video game speedrunner from 1998 to 2001 and graduated from MIT
with a degree in Theoretical Mathematics in late 2006. Until then, Wes McKinney did not know the first thing about Python.
After graduating from university, he joined the Front Quant Research team at AQR Capital Management
in 2007. There, many PhDs were processing and analyzing data using SQL,
Excel (spreadsheets), and MATLAB. He found these tools somewhat frustrating to use.
Wes McKinney kept looking for a better, more succinct way. Meanwhile, one of his coworkers heard about his frustration
and wrote several scripts in Python. The code resembled the way algorithms had been presented
in his university courses. Wes McKinney was instantly drawn to it.
In 2008, Wes McKinney finally entered the world of Python. This is how he discovered SciPy.
However, he was still working at the financial company back then, and struggling with SQL and Excel took up 40% of his working hours.
Reading the SciPy paper, he realized that Python offered ways to handle missing data and null values,
and that plenty of open source tools could replace SAS, the statistical software in common use.
The problem was that Python did not have the statistical tools he needed.
He compares this situation, in which Python could not be used for lack of statistical tools, to the classic dilemma:
“Which comes first: the chicken or the egg?”
Here, Wes McKinney cast himself as the chicken.
Then, he discovered the module he was looking for in a package written by Jonathan Taylor,
a professor of Statistics at Stanford University.
After a month of hard work porting the code to Python,
the initial version of Pandas was released to the world.
Around this time, he left the well-paying financial company to start a doctoral program in Statistics at Duke University.
Soon, however, he took a leave of absence from school after being impressed by John Hunter, the creator of matplotlib.
He then continued to develop Pandas.
Pandas has kept improving ever since, and it now supports a range of statistical computations
in Python as well as extending legacy systems.
Since 2015, he has been working on the Apache Arrow project, in partnership with RStudio, as director of Ursa Labs,
a nonprofit development group focusing on data science tools for Python and R.