Has the term "big data" become meaningless? Thanks to the buzz, it's been slapped onto a vast range of products by overzealous marketers. And this can make it hard for users to know the true definition and value of big data.
In his new book Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, Tom Davenport, professor at Babson College and research fellow at the MIT Center for Digital Business, attempts to cut through the hype and give readers a concrete sense of what big data is truly useful for today. In this Q&A, he talks about the three types of big data projects, gives advice on how to recruit and retain data scientists and provides his perspective on where big data is headed in the coming years.
You feel that the term "big data" is problematic. Why?
Tom Davenport: It's an umbrella term and a lot fits beneath the umbrella. Most people agree that big data is large in volume -- I say too big to fit on a single server -- too fast moving to easily segregate into a data warehouse, and too unstructured to fit into a relational database, which I think is the most important aspect. The "big" term is the least important. People feel like they have to have petabytes of data to qualify, but other attributes matter more and are more difficult to deal with.
I think we should just refer to "all data" because it's the combination of these new unstructured types with the previously structured [data] that is powerful. But I don't think we'll be rid of the term anytime soon.
You listed three ways that big data can provide value to organizations today. What are these scenarios?
Davenport: The least interesting to me is doing the same stuff more cheaply using big data technologies, specifically Hadoop running on commodity servers. That's powerful, but it's never quite as interesting when you just do the same thing and it costs a little less.
Then there are two types of decision improvements: faster decisions and better decisions. So we can process data a lot more quickly, and also use new types of data to make better decisions. Some of it's semistructured or not structured, [but] we can combine [data] from our previous models with external data and make better decisions [with] more predictive models: what a customer's likely to do or what our performance is likely to be.
The one I'm most excited about is doing entirely new things, and that generally involves new products and services based on data and analytics. Most business analytics were used for internal purposes, and this opens up the game considerably.
Data scientists are a hot -- and scarce -- commodity. What do you think are the critical skills of a data scientist?
Davenport: At the core is combining computational capabilities with analytics capabilities. We had quantitative analysts and computer science types before, but [big data] people are at the intersection, because the data takes some work. One of the sad facts is that 80-90% of data scientists' labor is getting the data in shape for analysis. At some point we'll have tools that will let us do those things more easily, but for now, we don't. Then I think any person who will [act] as a consultant to internal managers needs strong communication skills and the ability to inspire trust. The best quantitative analysts say they spend half their time figuring out how to explain [concepts] in terms that nonquantitative people will understand. If you create products and services based on big data, then you'll also need a product development orientation.
People would have to be superhuman to have all these skills, so the majority of organizations are going to have teams that pull them together. That can be tricky because people with one skill don't often have a lot of respect for people with other skills, so you've got to make sure that team [members] respect each other.
And even when companies manage to recruit people with these skills, the work isn't over -- the focus then switches to retention. How can HR managers ensure these valuable employees don't stray?
Davenport: You can use analytics to try to identify who's at risk of leaving -- many companies have done that with all types of employees. Also, data scientists, like everybody else, care about money and stock options and so on, and you can get those things in startup companies. So you need to match the entrepreneurial rewards that companies are giving in places like Silicon Valley and Boston for data scientists. But I also think feeling their work is being used and their ideas applied is critical. When I talked to data scientists at startups, that was a big part of the appeal.
Data scientists [and] IT people in general care a lot about learning and growth, so give them as many opportunities for learning as you can. That will also help.
How should decision-making processes adapt to the big data age?
Davenport: Decision-making processes are hard to change -- there's politics associated with them in many organizations. But the data in big data environments is continuously flowing, so you need a more continuous approach to decision making as opposed to doing some analysis and making a decision that'll last for months or years. A continuous decision-making, adjusting [and] reviewing process will be important.
Eventually, we're going to see a lot more automated decision making just because humans can't grasp all of the data and variables, and so we're going to turn more and more over to computers and software. We'll have to figure out new ways that humans will oversee this whole process.
What do you think lies ahead for the future of big data?
Davenport: I concluded the book with the idea of Analytics 3.0. We've had traditional analytics for several decades, [and] that worked well, but it was slow and didn't have the kind of impact we wanted. Then we had companies -- particularly in online businesses -- doing great things with big data, but they didn't have any traditional IT environment or infrastructure. Now, I think the most sophisticated companies are saying "We need to combine big and small data, have people who can deal with both, do both new products and services and [revamp] internal decisions quickly and at scale." So I think this 3.0 [era] is likely to be what happens for the next five years or so.
4.0 -- who knows? One could argue that the things we're seeing in financial services now are maybe what all industries will be like at some point, where more and more decisions are made in an automated fashion, [and] humans only have a dim view of how they're being made.
How do you recommend that companies get started with using big data?
Davenport: If you would like to encourage a big data initiative in your company, you need to address management awareness and understanding first. You could work with consultants or academics, or just embark on an education initiative.
The software is easily available and the hardware is cheaper and cheaper all the time. The only thing that isn't widely available is the talent. So if you're a company that hasn't done anything yet, you might have a hard time hiring those people, and you might think about developing skills in your existing IT or quantitatively oriented people. Have them take a course in Hadoop and start to understand some of the issues associated with whatever kind of data you're likely to use -- textual, speech, video, human genome.
Then you just have to decide on a first project, and that's always a combination of something relatively small and doable and something important enough to impress people and make a difference in your business.