Over the past several years, the interest in “Big Data” has socialized the value of data to businesses. There are a host of companies developing platforms and tools to help companies utilize their data. While there may be “Big Data” skeptics out there, it’s impossible for IT executives to ignore the issue. This is all to the good, as it means that businesses are starting to take a serious look at the opportunity to use data to drive decision making, improve operational efficiencies and increase revenue. But companies that haven’t addressed these issues before find it difficult to know where to start. Having worked with and led data analysis teams for many years, I want to share some experiences about what works (and what doesn’t).
The Problem with Business Intelligence
The traditional approach in IT organizations to business intelligence is to define a problem, fund a project to create a data warehouse, and then analyze the data. With the pace of business today, this approach is no longer viable. It simply takes too long. The traditional approach often depends on business case approval and funding cycles, which automatically add time and overhead. Worse, it also depends on precisely understanding the problem and the benefits of a solution (which are needed for the business case) – all before looking at any data. The problem with this approach is that you often need to work with the data before you can fully understand what questions to ask and what the data is telling you. A simple analysis can often tell you whether a direction is likely to be fruitful or not. It’s better to figure this out quickly, instead of after months of working through the business case approval process.
Learning to Be Nimble
Recognizing this, companies have learned how to be more nimble. They may create a flexible team of analysts with a broad mandate to go after problems in a specific domain, such as operational efficiency, marketing insight, or brand enhancement. It’s likely that this team will set up a data integration platform, and will then go after the data they need. While this is the right approach, ironically, it often still takes too long. Why? One reason is that it takes time to get all of the data in one place. In my experience, up to 90% of wall clock time can be spent gaining access to the necessary data, even when that data is owned by someone in your business, while only 10% of the time is spent actually analyzing it. Why does it take so long to get access to the data?
The analytics team often finds a number of roadblocks in their way. They need to find out what data exists, who “owns” it, and get their cooperation. This can take months. If the data doesn’t exist, isn’t being collected, or needs to be enhanced, it can take even longer. Moreover, in a large company, there may be multiple groups trying to solve related problems, who aren’t even aware of each other’s efforts. This results in duplication of effort, which is not only slow, it’s costly.
Creating a Big Data Program
There are many aspects of a comprehensive program that uses data to drive business value. There are issues related to tools, governance, expertise, and culture – each of which is a topic in itself. In this section, I’ll touch on three elements of a Big Data program that cut across these areas.
Recognize that Data is a Corporate Asset
First, leadership needs to recognize that every bit of data generated by the business is valuable: it’s a corporate asset. You need to manage it, just as you manage other assets. Achieving this often requires a culture change. In the past, the goal was to get a product or service out quickly. Collecting data about the service to produce meaningful metrics was of secondary importance. Perhaps, if budget was tight, you might have faced the question: do we really need to collect that data? Instead, the default should be “what can I measure that might be useful?” and “what can I measure that will allow me to make things better over time?” Then, measure it and collect the data. You can’t improve what you can’t measure.
Bring Data Together Proactively
Second, it’s important to recognize that analysts can’t analyze data they don’t have. Instead of waiting for a business case, proactively bring all of the data about your service and your customers together in a data integration platform. I’ve recently been hearing this platform called a “data lake” or “data ocean,” which gives the right impression. This promiscuous attitude to data integration has become much more feasible with low-cost distributed storage, as compared with the large engineered data warehouses of the past. A broad range of tools is emerging that can be used to analyze data brought into the data lake. Streaming analysis tools may be needed when it is important to generate results in real time. Data search/exploration tools allow interactive exploration of both unstructured and structured data. It’s valuable to create a meta-data repository about the data in the lake, so analysts know what is there, and can find the data they need.
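To make the meta-data repository idea concrete, here is a minimal sketch in Python. All of the names, fields, and example datasets are hypothetical; a real catalog would be a persistent service tracking schemas, lineage, and access policies, but the core idea is the same: register what exists, who owns it, and where it lives, so analysts can find it.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    # Hypothetical fields; a real catalog would track schema, lineage, etc.
    name: str
    owner: str
    location: str          # path or URI within the data lake
    description: str
    tags: set = field(default_factory=set)

class MetadataRepository:
    """A toy in-memory catalog so analysts can discover what data exists."""

    def __init__(self):
        self._records = {}

    def register(self, record: DatasetRecord):
        self._records[record.name] = record

    def search(self, tag: str):
        # Return every dataset carrying the given tag
        return [r for r in self._records.values() if tag in r.tags]

repo = MetadataRepository()
repo.register(DatasetRecord(
    name="web_clickstream",
    owner="marketing-analytics",
    location="s3://data-lake/raw/clickstream/",
    description="Raw page-view events from the public site",
    tags={"marketing", "raw"},
))
repo.register(DatasetRecord(
    name="customer_master",
    owner="crm-team",
    location="s3://data-lake/curated/customers/",
    description="Deduplicated customer records",
    tags={"customer", "curated"},
))

print([r.name for r in repo.search("marketing")])  # → ['web_clickstream']
```

Even a catalog this simple answers the two questions that consume so much of an analyst’s time: what data exists, and who owns it.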
Establish an Open Data Culture
Finally, cultivate an “open data” philosophy, and facilitate sharing of data, code, results, and approaches among your teams. Clearly, you need to worry about information security, so that sensitive customer and business information is not vulnerable to attacks. But sharing within the business should be the default, not the exception. For example, you could mandate that all internally developed analysis and visualization software be stored in an internal GitHub repository. Data analysts should also be encouraged to publish derived data or results via APIs that are documented in the meta-data repository. In this way, when one analyst figures out how to join two data sets that lack a common key, all of the other analysts in your business can benefit from that learning.
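As an illustration of that last point, here is one common way (among many) to join two data sets that lack a common key: derive one by normalizing an attribute they both carry. The data sets, field names, and normalization rule below are all hypothetical, but once a trick like this is written down and shared, other teams can reuse it instead of rediscovering it.

```python
# Two hypothetical data sets with no shared ID. The derived join key here
# is a normalized email address: lowercased, whitespace-stripped, and with
# gmail-style "+tag" suffixes removed from the local part.

def normalized_email(raw: str) -> str:
    """Normalize an email address into a usable join key."""
    local, _, domain = raw.strip().lower().partition("@")
    local = local.split("+", 1)[0]      # drop "+promo"-style suffixes
    return f"{local}@{domain}"

crm_rows = [
    {"email": " Jane.Doe+promo@Example.com ", "segment": "enterprise"},
    {"email": "bob@widgets.io", "segment": "smb"},
]
support_rows = [
    {"contact": "jane.doe@example.com", "tickets": 3},
]

# Join the two data sets on the derived key
crm_by_key = {normalized_email(r["email"]): r for r in crm_rows}
joined = [
    {**crm_by_key[key], **s}
    for s in support_rows
    if (key := normalized_email(s["contact"])) in crm_by_key
]
print(joined)
```

The normalization function, not the join itself, is the reusable asset; publishing it (and documenting it in the meta-data repository) is what turns one analyst’s workaround into shared infrastructure.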
By institutionalizing a culture that values data, it’s much more likely that the data that is needed will be collected in the first place. By creating a data lake, meta-data repository and data exploration tools, you can eliminate or dramatically reduce the time spent getting access to data. And, by encouraging sharing of data, code and results, new projects can readily build on the success of older ones.
This is not the traditional business intelligence approach. It’s not about knowing what you want to do, in detail, up front. The data will teach you what you need to know, if you create the culture and framework that allow analysts to unlock its value.