If the hype and the deluge of headlines, articles, and advanced analytics and reporting material are anything to go by, BIG DATA (BD) is the next big thing. At times you may even wonder what we have been doing in the name of analytics and insight generation thus far. So is it worth all the hype, or is it just a bunch of data and business intelligence companies that like to use two magic words, BIG DATA and “Hadoop”? And is there a solution ready for enterprise-level needs?
So what is BIG DATA, one more time?
To begin with, BIG DATA is NOT just “BIG”. The name is a misnomer because it implies that size is all that matters. Put simply, it is big, fast and diverse data that comes from varying sources and channels (offline and online) and cannot be processed or analyzed using traditional processes, databases or even data warehouses. It is also a methodology and approach (not just a technology solution) to collect, store and analyze that volume, velocity and variety of data and convert it into business-critical, actionable insights that help organizations get ahead of the competition. A quick view of the key characteristics, the 3 Vs:
- Volume – A shift from managing terabytes to managing petabytes, exabytes and zettabytes of data. Facebook and Twitter alone generate approximately 20 terabytes of data each day.
- Variety – A complex combination of raw, structured, semi-structured and unstructured data: Web pages, log files, indexes, social media, emails, documents, and sensor data from active and passive systems, all driven by the explosion of sensors, smart devices, communication and social collaboration.
- Velocity – The speed at which data flows in and must be turned into near-real-time analysis and actionable insights. This calls for the capability to process data in motion, not just data at rest.
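To make the "data in motion" idea concrete, here is a minimal Python sketch of a rolling count over a stream of events. It is only an illustration: the function name `rolling_event_rate`, the 60-second window and the assumption that events arrive as ascending timestamps in seconds are all hypothetical, not part of any particular product.

```python
from collections import deque


def rolling_event_rate(events, window_seconds=60):
    """Yield (timestamp, events seen in the trailing window) as each event arrives.

    Assumes `events` is an iterable of timestamps in seconds, in ascending order.
    """
    window = deque()
    for timestamp in events:
        window.append(timestamp)
        # Evict events that have fallen out of the trailing window.
        while window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)
```

Because the function is a generator, each result is available as soon as its event arrives, without waiting for the full data set to be stored first. That is the difference from analyzing data at rest.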
What is it trying to solve?
For me, it is not so much about solving a problem as about tapping an opportunity that has been around for a while but has never been exploited. It is an attempt to provide businesses with insights, hidden behaviors and patterns they didn’t know they didn’t know. If executed successfully, organizations could benefit by:
- Applying predictive models and scoring against fast-moving data and complex event streams for smarter decisions in real time
- Turning massive amounts of data from online customer behavior and social media activity into valuable and timely business insight
- Becoming a proactive organization by using big data analytics to speed recognition and resolution of problems in customer experiences, supply chains, and business processes
- Addressing new challenges posed by streaming data, social media data, content, events and so on
How is BIG DATA (BD) different from a conventional Data Warehouse (DW)?
There are fundamental differences, such as:
- Variety – A DW is best suited to analyzing structured data, whereas BD addresses the “variety” challenge
- Processing – Data in a DW is usually cleansed, enriched and modeled before being stored, which gives it a higher value per byte. Data in BD does not go through the same quality controls and checks because of the obvious cost, and is typically stored in its native format.
- Shelf Life – Data in a DW can have a much longer shelf life than data in BD
In a well-laid-out enterprise solution, a BIG DATA solution could push the “reduced” output of a “MapReduce” program permanently into a DW. In other words, BIG DATA will never replace a DW but will complement it.
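As a rough illustration of that hand-off, the sketch below runs a MapReduce-style job in plain Python over raw events kept in their native format, then loads the reduced rows into a warehouse table. Everything here is an assumption made for the example: the event layout, the `daily_sales` table, the function names, and the use of in-process Python and SQLite as stand-ins for a real Hadoop cluster and a real DW.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw events in native format: "timestamp,customer_id,channel,amount".
RAW_EVENTS = [
    "2013-01-05T10:00:00,c1,web,20.0",
    "2013-01-05T10:05:00,c2,store,35.5",
    "2013-01-05T11:00:00,c1,web,4.5",
    "malformed line",
    "2013-01-06T09:30:00,c3,web,10.0",
]


def map_event(line):
    """Map step: emit ((day, channel), amount); skip lines that do not parse."""
    parts = line.split(",")
    if len(parts) != 4:
        return
    timestamp, _customer, channel, amount = parts
    try:
        yield (timestamp[:10], channel), float(amount)
    except ValueError:
        return


def shuffle(pairs):
    """Shuffle step: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: collapse one group into a single summary row."""
    day, channel = key
    return day, channel, len(values), sum(values)


def run_job(lines):
    """Run map, shuffle and reduce, returning sorted summary rows."""
    pairs = (pair for line in lines for pair in map_event(line))
    return sorted(reduce_group(k, v) for k, v in shuffle(pairs).items())


def load_into_warehouse(rows, conn):
    """Store the reduced rows in a (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(day TEXT, channel TEXT, orders INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Calling `load_into_warehouse(run_job(RAW_EVENTS), sqlite3.connect(":memory:"))` leaves three summary rows in the table. The raw, uncleansed events stay on the big data side, and only the small, modeled result reaches the warehouse.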
It would be premature either to write it off or to treat it as a universal solution to all measurement, analytics and insight needs, but it has enough momentum to deserve attention, investment and a well-planned, well-architected execution. It is definitely a step in the right direction for any business, and an initiative that is here to stay.