How to Implement Data Science for Your Business Part 3: Finally it's Time to Talk about Machine Learning!

Note: This is part 3 of a 4-part series. For an overview of the series, check out Part 1. For more on how to prepare your data for analysis, read Part 2.

Hey, did you hear about Machine Learning? You know, that thing Silicon Valley companies are using to build self-driving cars, predict a shopper's next purchase, and alert you to traffic congestion before it even happens?

Oh, you did hear about it? That's why you're reading a Data Science blog in the first place, you say? And you're kind of annoyed that this is my fourth blog post and I'm just now talking about it?

Well great news! It's finally time to talk about machine learning!


Wait a minute... actually we need to talk about statistical analysis first. Sorry, but I promise you it's worth it!

In the broadest terms possible, statistical analysis simply means using statistics to get non-obvious insight from your data. This definition includes machine learning of course, but it also includes summary statistics like averages, minimums, maximums and sums. Group-bys, where you aggregate one column based on the values of another column (think pivot tables), are also a form of statistical analysis.
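To make that concrete, here is a minimal sketch of summary statistics and a group-by using the pandas library. The order table, its column names, and its values are all made up for illustration; your own data will look different.

```python
import pandas as pd

# Hypothetical order data; column names and values are invented for this example.
orders = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "amount": [100.0, 300.0, 50.0, 150.0, 400.0],
})

# Summary statistics over a single column: average, minimum, maximum and sum.
summary = orders["amount"].agg(["mean", "min", "max", "sum"])

# Group-by: aggregate one column (amount) based on the values of
# another column (region), much like a pivot table would.
by_region = orders.groupby("region")["amount"].agg(["mean", "sum"])

print(summary)
print(by_region)
```

A few lines like these are often all it takes to get a first feel for a dataset.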

Summary statistics and group-bys are pretty simple and can be done almost as soon as you have data. For that reason, this type of analysis is typically a part of a process called exploratory data analysis (EDA) - i.e. the type of analysis that is done when first exploring a dataset. EDA is extremely valuable, and any Data Scientist worth their salt will do a lot of EDA before getting into machine learning. EDA can even lead to some really good insight on its own, without having to get into ML.

But most of the time in a Data Science project, exploratory data analysis acts as a lead-in to machine learning. Yes, we're finally talking about ML! Whereas summary statistics are simple calculations, ML uses more complex algorithms and typically involves several variables in your data at once. It can find the hidden relationships between different features of your dataset and it can account for how the interactions between different variables affect these relationships. Machine learning can be smart enough to handle unusual values, and it can be powerful enough to deliver real insight from a limited amount of information.

If you didn't already buy into the hype surrounding ML, hopefully you do now. But hopefully you also understand how it fits under the larger umbrella of statistical analysis. A well-executed analysis process that includes both ML and EDA usually leads to significantly better insight than machine learning by itself.

Take, for example, a real estate company trying to predict house prices. They know from experience that people prefer to move in the summer months, so they suspect house prices will be different in the summer than in the winter. To quantify this, a Data Scientist could look at average house prices grouped by month and see that, on average, houses sell for a few thousand dollars more in the summer months than in the winter months. This is a classic example of EDA.
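Here is a rough sketch of what that month-by-month comparison might look like in pandas. The sale dates and prices are invented, and treating June through August as summer and December through February as winter is an assumption for this example, not something every market would agree on.

```python
import pandas as pd

# Hypothetical house sales; dates and prices are made up for illustration.
sales = pd.DataFrame({
    "sale_date": pd.to_datetime([
        "2023-01-15", "2023-02-10", "2023-06-05",
        "2023-07-20", "2023-12-01", "2023-06-25",
    ]),
    "price": [300_000, 302_000, 305_000, 307_000, 301_000, 303_000],
})

# Pull the month out of each sale date, then average price per month.
sales["month"] = sales["sale_date"].dt.month
avg_by_month = sales.groupby("month")["price"].mean()

# Assumed season definitions (Northern Hemisphere).
summer_avg = sales.loc[sales["month"].isin([6, 7, 8]), "price"].mean()
winter_avg = sales.loc[sales["month"].isin([12, 1, 2]), "price"].mean()
seasonal_gap = summer_avg - winter_avg

print(avg_by_month)
print(f"Summer minus winter average: {seasonal_gap:.0f}")
```

With this toy data the summer average comes out 4,000 higher than the winter average, which is the kind of quick finding EDA is good at surfacing.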

Much more goes into house pricing than month of sale, however, and it would be foolish to price every house sold in June according to the average June house price. Smaller houses, for instance, would be overpriced while larger houses would be underpriced. So now the Data Scientist can train a machine learning model to predict house price based on both square footage and month of sale. This model probably won't be perfect, but it'll consider multiple pieces of relevant information and weight them appropriately to arrive at the most accurate price it can.
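As a minimal sketch of that idea, the snippet below fits a plain linear regression (about the simplest model that could be called ML) with NumPy's least-squares solver. Everything here is an assumption for illustration: the training data is synthetic and perfectly linear, month of sale is simplified to a summer/not-summer flag, and the helper name predict_price is hypothetical. Real sales data would be noisy and the fit would not be exact.

```python
import numpy as np

# Synthetic training data, invented for illustration only.
sqft = np.array([1000, 1500, 2000, 2500, 1200, 1800], dtype=float)
is_summer = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # 1 = sold in summer
# Prices generated from a made-up rule so the example is easy to check:
# base 50,000 + 150 per square foot + 4,000 summer premium.
price = 50_000 + 150 * sqft + 4_000 * is_summer

# Design matrix: intercept column, square footage, summer flag.
X = np.column_stack([np.ones_like(sqft), sqft, is_summer])

# "Training" here means finding the weights that best fit the data.
weights, *_ = np.linalg.lstsq(X, price, rcond=None)

def predict_price(square_feet, summer):
    """Predict a sale price from size and season (hypothetical helper)."""
    features = np.array([1.0, float(square_feet), float(summer)])
    return float(features @ weights)

summer_estimate = predict_price(1600, summer=True)
winter_estimate = predict_price(1600, summer=False)
print(round(summer_estimate), round(winter_estimate))
```

The point is that the model weighs both inputs together: the same 1,600 square foot house gets a different estimate in summer than in winter, and a larger house gets a higher estimate in either season.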

That's how ML usually works. It requires some subject matter expertise and a decent understanding of the data before you can get into it. But once you know what you're doing, you have the power to get really precise insight. In the above example, that precise insight is an exact predicted house price. But it can be just about anything under the sun that we know how to quantify.

If you understand what ML can do and where it fits into the larger picture of statistical analysis, you'll begin to see countless ways it can be practically used in your organization. Hopefully you'll also begin to see just how close you actually are to using ML to improve the way your business is run. With just a little bit of data expertise (perhaps as provided by your favorite boutique Data Science consulting firm…) you can quickly get there and really see the impacts of machine learning and Data Science on your business!