Your company needs data science consulting before Big Data tools
“Big Data” and “Data Science” are today’s buzzwords. Capitalism has responded predictably to a business media frenzy: the world is flooded with Big Data products, and businesses have enthusiastically invested in them for years now. Many companies are trying to modernize their data platform and enable their employees to monetize their valuable data, but most businesses are not seeing the benefits.
A white paper and survey from Kapow Software states: “Big Data projects are taking far too long, costing too much and not delivering on anticipated ROI because it’s really difficult to pinpoint and surgically extract critical insights without hiring expensive consultants or data scientists in short demand.” Only 10% of Kapow’s survey respondents agreed that their Big Data solutions are effective at getting important data into the hands of their employees, and only 10% agreed that their vendors provide effective guidance.
Many are misinformed about how science produces value, and there’s a major misconception that data infrastructure is the key investment that they need to make. Some are realising that they don’t have the data scientists to use data infrastructure effectively; they have tools, but a deficit of ideas and of the right kind of talent to use them. Product companies benefit from this misconception and contribute actively to it: it’s easier to sell a software license than to solve a real problem, and easy to believe that your employees can already solve problems but just lack the tools that are being sold to you.
If you think you’re in this situation, these questions might help:
- Where have you seen concrete examples of your employees creating innovative data products at the laboratory stage that you just couldn’t implement for lack of technology?
- Have your employees created accurate predictive analytics solutions that just didn’t scale well enough to reach production?
- Has the strategy of “build it and they will come” been successful at your company?
Software consultants, including ThoughtWorks, have had great success over the past two decades convincing businesses of the value of Agile methodology and lean approaches. Build or buy what you need only when you know you need it next. Build things iteratively with feedback loops in place. If someone tells you that you’ll see benefits only after building some enormous system, run away, and quickly!
As my colleague Ken Collier argues in his book Agile Analytics, data infrastructure seems to have survived the big-upfront-investment extinction that happened in the rest of the software industry. While the Big Data product champions might claim that their products enable agility, the truth is that they are massive, inflexible systems suited only for scaling up systems whose idea development is nearly completed. Their tools solve an important problem. That problem is just not the problem that most companies really have.
Put Small before Big
There is no point in building a scalable data environment when you have not already proven some ideas at a smaller scale. Putting a Hadoop ecosystem in place or buying an expensive database before actually requiring it is not only wasteful, but may not even enable the kind of scaling that you’ll need. What if your bottleneck is network bandwidth rather than I/O or processing? This is like building a 6-lane highway system before you have finished inventing the car.
Experienced data scientists know how to reduce data (extracting the small amount of valuable information from a larger data set). This may include sampling, variable selection, compression or choosing a more appropriate algorithm. Never in my 20 years of experience have I required more data than can fit on my personal computer in order to learn enough to develop a useful algorithm. Working with more data than this actually hinders the development of initial ideas and delays the development of a minimum viable product. If you put small before big, then by the time you need to run on larger amounts of data, or all the data, you know exactly what kind of scalable technology is going to be needed and how to make an intelligent, timely investment if necessary.
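As one illustration of the sampling idea above, here is a minimal sketch of reservoir sampling, a standard way to draw a uniform random sample from a data set too large to load at once. It is only one of many possible reduction techniques, not a description of any particular project; the function name `reservoir_sample`, the sample size and the stand-in data stream are all assumptions made for the example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream.

    The stream is read once and never held in memory in full, so the
    source can be far larger than the machine doing the sampling.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by picking
            # a slot in [0, i] and replacing only if it falls inside
            # the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


# Hypothetical usage: a range stands in for a large file or table scan.
big_stream = range(100_000)
small = reservoir_sample(big_stream, k=1_000)
print(len(small))
```

In practice the stream would be lines of a log file or rows from a database cursor, and the resulting sample is small enough to explore on a laptop before any decision about scalable infrastructure is made.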
Have the right kind of data science talent
If your company really does have the right kind of data science talent, congratulations! You’re in the minority. Most companies will answer a question about their data science talent in one of two ways. Some will admit that they probably do not have the right kind of talent but do have software developers or less experienced data analysts who can play the part until they can put together a data science team. They’re struggling to hire that data science team, and in the meantime, they’ve hired a product company to set up a Big Data platform that currently adds little value.
Other companies will simply answer: Well, I hope we have the right kind of talent! They may have people on the payroll whose job title is “data scientist” or have an entire analytics team, but while these people are doing constructive data work, they may not be capable of creating and using advanced data science algorithms, and they may be under a lot of stress from working well beyond their areas of expertise.
If you don’t have it, rent it.
The solution is the same: hire an experienced data science consultant. If you’re unsure about the capabilities of your data science team, this is the quickest way to find out, and you can do this without conflict between existing data science teams and the data science consultant. Like a management consultant, a good data science consultant can work with your existing teams and help them be more effective at delivering data science applications. Data science consultants can help with the high-level strategies around using data to improve your business model, and some consulting organizations, including our own, can provide the entire delivery team and help with hiring people to support it once the delivery is complete.
Why not just hire a full time data scientist?
Many companies haven’t even considered hiring data scientists as consultants and simply wish to hire them as permanent employees. While I’m strongly in favor of doing this as well, the reality is that data scientists are very difficult to hire. They are in great demand and can afford to be very choosy about where they work.
I used to work as a cosmologist at NASA. When I tell people that, people seem surprised. “That sounds like so much fun. Why would someone stop doing that and go work in the business world as a consultant?” I answer, “This is also a lot of fun”. The moral is that data scientists can do lots of fun things. If they are going to work at your company, it had better be a fun place offering them plenty of technical challenges, power to make changes and ability to stay focused on crucial ideas rather than minutiae. Outside of the hottest tech startups, data scientists are more likely to find these conditions offered to them in consulting roles rather than as permanent employees. It’s my prediction that the best data scientists will migrate into the consulting industry and this is how most companies will access these rare skills in the future.
The most common objection from business people to hiring data scientists as consultants is that they believe that knowledge of their company and its data is crucial. They can’t imagine a consultant learning enough quickly to be productive in weeks and, even if they do, they don’t want that knowledge walking out the door in six months. The last objection is understandable but, then again, do you not worry about your other talented employees suddenly leaving their jobs? At least with a consultant you have plenty of notice and usually you get to decide when they leave. I’m certain that if you have a good data scientist, they get job inquiries daily, and if you’re not worried about them leaving, you should be.
The first objection is also understandable. A data scientist is indeed more productive the better they know the data. Data science consultants, however, can make up for this in several ways. One is that they have experience with many clients. What they lack in knowledge of your company, they make up for in knowing more about what is going on outside your company. They can bring in fresh ideas, ask the uncomfortable questions that no one else will ask and look at your business model for creative improvements. One of the most crucial tasks any consultant can perform is to observe the social and organizational environment of the business and give an objective opinion of where problems lie.
The last objection we hear is that consulting isn’t free. They’ll say, “Of course consultants suggest hiring them to build something rather than buying some off-the-shelf product!” To that, I’ll just ask you to refer to what I said earlier about data scientists having plenty of fun opportunities. Reinventing the wheel doesn’t qualify as fun. I don’t know any data scientist who wants to work on something that is available off the shelf somewhere else. Data science problems that are interesting and worth pursuing are also generally unique enough to require custom software. Nobody writes general-purpose software to handle your company’s unique problems and, if they did, what would that say about your company’s competitive advantage?
We recommend that companies hold off on investing in big technology initiatives until they have spoken with a data science consultant about the bigger data strategy, preferably a consultant who isn’t selling products. Consider your commitment to Agile methodologies, and apply the same strategy to your data initiatives. If you are trying to build a data science practice, a data science consultant can help you steer that venture and get started on the right foot. Making the most of your unique information should be your core competency, so invest in it wisely.