Imagining the Possibilities of Big Data (OPINION)
The rapid advancement of information technology continues to significantly impact the way we conduct both our business and personal lives. These technological changes, driven by digitization, have introduced new terminology such as Big Data, the Internet of Things (IoT)/Internet of Everything (IoE), data grids and in-memory databases. Few people have not participated in this new paradigm, made possible by smartphones, sensors, software and the Internet.
The term Big Data is often traced to University of Pennsylvania economist Francis X. Diebold, who used it in a paper presented at a conference in 2000. However, the most commonly used definition of Big Data comes from a 2001 research note by Doug Laney, then an analyst with the Meta Group, in which he described what is now called the Big Data phenomenon in terms of data volume, velocity and variety. These "three Vs" have become the generally accepted defining dimensions of Big Data. Others have added further Vs, such as veracity, which describes the quality of the data.
The amount of data created over the next decade will be astronomical. The EMC Digital Universe study, published in 2014, estimates that the digital universe is doubling in size every two years and will grow tenfold between 2013 and 2020, from 4.4 trillion gigabytes to 44 trillion gigabytes.
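As a rough consistency check on these two figures (our own arithmetic, not part of the study), a tenfold increase over the seven years from 2013 to 2020 implies a doubling time T of about two years, in line with the study's estimate:

$$
\frac{44 \times 10^{12}\ \text{GB}}{4.4 \times 10^{12}\ \text{GB}} = 10 = 2^{7/T}
\quad\Longrightarrow\quad
T = \frac{7}{\log_2 10} \approx \frac{7}{3.32} \approx 2.1\ \text{years}
$$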
For perspective, the average household in 2013 created enough data to fill 65 iPhones per year. In 2020, this will grow to 318 iPhones per year.
IoT/IoE describes the ability to transfer data over the Internet without requiring human-to-human or human-to-computer interaction. Wearable technology, such as the Fitbit, is an example of the IoT. The number of connected devices is projected to reach 50 billion by 2020.
Because of the increasing volume, variety and velocity of data, new technologies have evolved to facilitate managing data and finding value in it. Hadoop is an open source framework for the distributed storage and processing of both structured and unstructured data on commodity hardware; MapReduce, one of its core components, is the programming model Hadoop uses to process large datasets in parallel across a cluster. This computing environment runs on inexpensive commodity hardware and relies on redundancy to recover from the failures that are expected at that scale. Large repositories that store raw data this way have been referred to as data lakes. Flight Efficiency Services, General Electric's data lake, analyzes sensor data from jet engines. It has reduced engine analysis time from weeks to days and now to minutes, resulting in a 10-fold savings in maintenance.
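To make the MapReduce model more concrete, here is a minimal single-process Python sketch of the classic word-count example. It only imitates the map, shuffle and reduce phases that Hadoop distributes across a cluster; it does not use Hadoop's actual API (Hadoop's native interfaces are written in Java), and the function names and sample sensor log lines are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records: list[str]) -> dict[str, int]:
    # On a real cluster, each record (or data block) would be mapped on a different node.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical engine sensor log lines.
    logs = ["engine temp high", "engine temp normal", "fuel flow normal"]
    print(word_count(logs))
    # {'engine': 2, 'temp': 2, 'high': 1, 'normal': 2, 'fuel': 1, 'flow': 1}
```

In Hadoop itself, the framework performs the shuffle and runs many map and reduce tasks in parallel on different machines, rerunning a task elsewhere if the commodity node it was running on fails.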
Big Data is about analytics, and the synergies among Big Data, new technologies and the IoT/IoE are having an enormous effect. Consider the health insurance and health care industries. Insurance companies contract with organizations to provide insurance for their employees, so it is in the insurer's best interest to identify ways to reduce health care costs and remain competitive. Insurers can use Big Data to segment, or cluster, individuals into groups for targeted intervention. They also can use predictive analytics to understand the factors that can lead to health events such as heart attacks.
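As an illustration of segmenting individuals into groups, the following Python sketch implements k-means clustering (Lloyd's algorithm), one common clustering technique; the article does not say which methods insurers actually use. The function name `kmeans` and the member records, given as hypothetical (age, annual claims in thousands of dollars) pairs, are invented for illustration.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> list[int]:
    """Assign each point to one of k clusters using Lloyd's algorithm.

    Centroids are seeded with the first k points so results are reproducible;
    production tools typically use smarter seeding such as k-means++.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical members: (age, annual claims in $1,000s).
    members = [(25, 1.2), (31, 0.8), (58, 9.5), (62, 11.0), (29, 1.0), (60, 10.2)]
    print(kmeans(members, k=2))  # [0, 0, 1, 1, 0, 1]
```

Because the features are left unscaled here, age dominates the distance calculation; a real analysis would typically standardize features first and rely on a tested library implementation rather than this sketch.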
Big Data is having a significant effect on the retail sector as well. Many shoppers now scan items with their smartphones while in a store so they can compare prices with those at nearby stores. And in the new sharing economy, firms such as Uber and Airbnb, which would not have been possible only a few years ago, depend on the Internet and associated apps that use location data (think Big Data variety). Driverless cars, which rely on sensor data, may be the norm within a few years.
The iPhone, which brought the smartphone to the mass market, became available in 2007. No one at the time could have imagined its impact today. Fast forward 10 years: the pace of change is only going to increase, making it impossible to conceive of the impact Big Data will have on business in 2025.
David E. Douglas is a professor in the department of information systems at the University of Arkansas, where he holds the Sam M. Walton College Professorship in Information Systems. Douglas also serves as co-director of the university's Institute for Advanced Data Analytics. He can be reached at 479-575-6114.