How to Implement Data Science for Your Business Part 2: Getting Your Data Right

Note: This is part 2 of a 4-part series. Part 1 can be found here.

I was recently talking shop with a good friend of mine who is a Data Scientist at an investment bank. As we compared our approaches to our work, we found ourselves talking less about the latest advancements in AI or cool Tableau visualizations, and more about working with and cleaning data. In fact, to put a cap on the conversation, my friend offered this take: “Most companies spend too much time trying to use AI and Machine Learning, and they would be much better off simply getting their data to the point where they can automatically calculate averages on their data.”

Not to speak ill of AI or Machine Learning, but I wholeheartedly agree with my friend, and I suspect most other Data Scientists do as well. The point isn’t that averages are more useful than Machine Learning; the point is that having data that is easy to work with is the single most crucial aspect of using Data Science to make a strategic impact.

Put another way, trying to get insight from your data when it isn’t clean or well-maintained is like trying to build a house from some trees you just cut down. It can certainly be done, but it’s a whole lot easier once those trees have been turned into 2x4s.

So what does this actually mean for a small business trying to implement Data Science? In most cases it means gathering your data sources, storing them in a permanent and consistent way, and cleaning up any problems in your data so it can be used for analysis.

For example, suppose you are running a multi-media digital marketing campaign. You are using one platform to send emails while also running Facebook ads. You pull daily reports from both your email platform and Facebook to track the progress of your campaign, then you save that data in separate tabs of an Excel workbook. Once the data is in Excel you may have to reformat dates so they are consistent, but then you have data that is comparable between your two platforms so you can analyze your campaigns.
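The date cleanup described above can also be done in a few lines of code rather than by hand. The following is a minimal sketch in Python, not a definitive implementation: the sample rows, the `reach` column, and the two date formats (month/day/year for the email platform, year-month-day for Facebook) are all assumptions made for illustration, and your actual exports may look different.

```python
from datetime import date, datetime

# Hypothetical daily report rows; column names and date formats are assumptions.
email_rows = [
    {"date": "03/01/2024", "reach": 1200},
    {"date": "03/02/2024", "reach": 950},
]
facebook_rows = [
    {"date": "2024-03-01", "reach": 800},
    {"date": "2024-03-02", "reach": 1500},
]


def normalize(rows, date_format, platform):
    """Parse each row's date string into a date object and tag the platform."""
    return [
        {
            "platform": platform,
            "date": datetime.strptime(row["date"], date_format).date(),
            "reach": row["reach"],
        }
        for row in rows
    ]


# Both platforms now share one date representation and one list.
combined = (
    normalize(email_rows, "%m/%d/%Y", "email")
    + normalize(facebook_rows, "%Y-%m-%d", "facebook")
)
combined.sort(key=lambda r: (r["date"], r["platform"]))

for row in combined:
    print(row["date"].isoformat(), row["platform"], row["reach"])
```

Once every row carries a real date value instead of a platform-specific string, the two sources can be sorted, filtered, and compared side by side.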

With the data in place, maybe you notice that Facebook is much more effective during the weekend, while your emails have better reach on weekdays. So you spend less on Facebook during the week and less on emails during the weekend, saving you money without having a major effect on your overall customer reach.
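A weekday-versus-weekend comparison like this one really is just an average over clean data. Here is a rough sketch of how it might look in Python; the records and their reach numbers are invented for illustration, and `average_reach_by_day_type` is a hypothetical helper name rather than part of any library.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical cleaned records: (platform, day, reach). Values are made up.
records = [
    ("facebook", date(2024, 3, 1), 800),   # Friday
    ("facebook", date(2024, 3, 2), 1500),  # Saturday
    ("facebook", date(2024, 3, 3), 1700),  # Sunday
    ("facebook", date(2024, 3, 4), 700),   # Monday
    ("email", date(2024, 3, 1), 1200),
    ("email", date(2024, 3, 2), 950),
    ("email", date(2024, 3, 3), 850),
    ("email", date(2024, 3, 4), 1300),
]


def average_reach_by_day_type(rows):
    """Average reach per (platform, 'weekday' or 'weekend') pair."""
    groups = defaultdict(list)
    for platform, day, reach in rows:
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        day_type = "weekend" if day.weekday() >= 5 else "weekday"
        groups[(platform, day_type)].append(reach)
    return {key: mean(values) for key, values in groups.items()}


averages = average_reach_by_day_type(records)
for (platform, day_type), value in sorted(averages.items()):
    print(f"{platform:9} {day_type:8} {value}")
```

With these made-up numbers, Facebook averages higher reach on weekends and email averages higher reach on weekdays, which is the kind of pattern that would justify shifting spend.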

Of course, manually copying data into an Excel workbook isn’t great practice. So now you may want to use some of those savings to invest a little more in your Data Science infrastructure. A Data Scientist could help you take things to the next level, writing automated scripts that pull that data from APIs, format the data appropriately, then write everything into an SQL database. As you run more campaigns and advertise over more platforms, you can easily scale that infrastructure to track everything that is going on in your campaigns.
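To make that pull, format, and store workflow concrete, here is a small sketch in Python using the built-in `sqlite3` module. It rests on several assumptions: `fetch_daily_report` is a hypothetical stand-in that returns hard-coded sample rows instead of calling a real platform API, the date formats and the `daily_reach` table layout are invented for illustration, and an in-memory SQLite database stands in for whatever SQL database you would actually use.

```python
import sqlite3
from datetime import datetime

# Assumed date format used by each platform's report.
DATE_FORMATS = {"email": "%m/%d/%Y", "facebook": "%Y-%m-%d"}


def fetch_daily_report(platform):
    """Hypothetical stand-in for a real API call; returns sample rows."""
    sample = {
        "email": [{"date": "03/01/2024", "reach": 1200}],
        "facebook": [{"date": "2024-03-01", "reach": 800}],
    }
    return sample[platform]


def load_reports(conn, platforms):
    """Pull each platform's report, normalize the dates, and store the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_reach (
               platform TEXT NOT NULL,
               day TEXT NOT NULL,
               reach INTEGER NOT NULL,
               PRIMARY KEY (platform, day)
           )"""
    )
    for platform in platforms:
        for row in fetch_daily_report(platform):
            day = datetime.strptime(row["date"], DATE_FORMATS[platform]).date()
            # The primary key plus OR REPLACE keeps one row per platform and day.
            conn.execute(
                "INSERT OR REPLACE INTO daily_reach (platform, day, reach) "
                "VALUES (?, ?, ?)",
                (platform, day.isoformat(), row["reach"]),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_reports(conn, ["email", "facebook"])
load_reports(conn, ["email", "facebook"])  # a second run does not duplicate rows

rows = conn.execute(
    "SELECT platform, day, reach FROM daily_reach ORDER BY platform"
).fetchall()
print(rows)
```

Keying the table on platform and day means the script can run on a schedule without piling up duplicates, and adding a new platform is mostly a matter of adding another fetch function and date format.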

With this little bit of investment, suddenly you have lots of data from lots of platforms, all formatted correctly and stored in one place. The possibilities of what you can do with this data are endless. You can build interactive dashboards using tools like Tableau to track your performance in real time. You can train detailed predictive models to predict your sales more accurately. You can even take simple averages of different metrics!

Wait, I ended on the least exciting one there…

But it doesn’t matter, because once your data is set up you’re in a position to extract whatever insight you want from your data. And that’s exciting enough!

Odds are your organization has some variety of data it should be tracking. If that data is already gathered, stored, cleaned, and ready for analysis, then great! Stay tuned for more insight in the coming weeks on what to do with that data.

If you still need to get that data to the point where it is usable and at your fingertips, however, then go ahead and reach out and we’ll help you figure out how to get it there. It’s a lot simpler than you think, and once you have that data the possibilities are endless...

Including calculating averages!