BIG DATA IMPACT ANALYSIS

ABSTRACT
The amount of data generated by users and organizations has been increasing rapidly, which has
given rise to a new field of study called Big Data. Over the past few decades, the capacity of the
world to exchange and generate information has increased from 0.3 exabytes in 1986 (20%
digitized) to 65 exabytes in 2007 (99.9% digitized) (Manyika et al., 2011). This rapid growth has
made it necessary for organizations to capture trillions of bytes of data about events recorded
through social media and through sensors in electronic devices such as smartphones and laptops.
These large datasets, which are captured, processed, stored and analyzed, are here to stay because
they provide useful information about the environment in which organizations operate. This
research paper discusses the investments of organizations in big data, the return on investment and
overall benefits, and the challenges and privacy concerns of big data.

INTRODUCTION
Over the past several years, various research teams have attempted to understand data
growth and generation. Although they have arrived at different numbers, all of them agree that the
amount of data captured is growing exponentially. The McKinsey Global Institute (MGI) estimated
that in 2010 organizations around the globe stored more than 7 exabytes of new data on disk
drives, and consumers stored more than 6 exabytes of new data on personal devices such as
laptops, personal computers and smartphones. Researchers also suggest that in the future we will
generate so much data that it will be physically impossible to store every piece of it on such
devices. The main factors behind this explosion of data are the growth of traditional transactional
databases, the rapid increase in unstructured multimedia data, the growth of the internet of things
and the growing popularity of social media (Manyika et al., 2011, p. 21). As a result, a new field
called Big Data has emerged.
In 2014, big data was listed in Gartner's Top 10 Strategic Technology Trends for 2013 and
Top 10 Critical Tech Trends for the Next Five Years. It has started a revolution in the field of
information technology that has led many organizations to change the ways in which they have
operated for many years. Big data does not have a well-established definition; however, it can be
explained with the help of the 3Vs. Volume represents the size of the dataset, which generally
requires very large storage capacities ranging from terabytes to exabytes. Velocity refers to the
speed at which data is captured and communicated within the system. Variety refers to the
heterogeneous nature of the datasets (structured or unstructured) (Russom, 2011; Edosio, 2014).
According to Gartner (2012), big data are high-volume, high-velocity and/or high-variety
information assets that require new forms of processing to enable enhanced decision making,
insight discovery and process optimization.
With the world moving towards the internet of things (IoT), the way data is collected today is
highly dynamic. Data is collected through sensors in household devices, the internet and other
networks, scientific experiments, and medical and other healthcare instruments, which has led to a
rapid increase in the amount of data collected by organizations. Traditional data analysis
techniques and algorithms are inadequate for analyzing and visualizing data at this scale, which
calls for specialized data mining techniques, storage devices, analysis algorithms and
visualization tools.

INVESTMENTS IN BIG DATA


According to the 2014 IDG Enterprise survey, more than half of the organizations that
participated were either implementing or likely to implement a big data project, a 5% increase
compared to 2012 (see Figure 1).
Figure 1 IDG Enterprise Survey

Research conducted by GE and Accenture makes it clear that organizations are making big
investments in big data. Around 73% of the surveyed organizations have already invested more
than 20% of their information technology budget in big data analytics, and many top-level
executives expect their investments to increase in the following year (see Figure 2).

RETURN ON INVESTMENT
As organizations make big bets on big data analytics, some of the early investors are
already reaping good profits. Wikibon's 2011 case study compared the return on investment of two
analytical environments: a high-speed data warehouse environment with ETL and data
provisioning processes, and big data technology running on massively parallel processing (MPP)
hardware. The following figure shows that the MPP environment was more favorable in terms of
cumulative cash flow, net present value and rate of return. However, big data analytics takes
longer to break even than the traditional approach (Davenport, 2013).

Figure 3 Benefits of Big Data
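
The measures named in this comparison (cumulative cash flow, net present value and break-even
point) can be computed with a few lines of code. The following is a minimal sketch only: the two
cash-flow series, the 10% discount rate and the function names are hypothetical assumptions made
for illustration and are not figures from the Wikibon study.

```python
def npv(rate, cash_flows):
    """Net present value, where cash_flows[t] occurs at the end of year t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def break_even_period(cash_flows):
    """Index of the first year in which cumulative cash flow is non-negative, else None."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


# Hypothetical yearly cash flows in $ millions; year 0 is the upfront investment.
project_a = [-10.0, 6.0, 6.0, 6.0]
project_b = [-15.0, 3.0, 8.0, 25.0]

DISCOUNT_RATE = 0.10  # assumed cost of capital, chosen only for illustration

for name, flows in (("A", project_a), ("B", project_b)):
    print(
        f"Project {name}: cumulative = {sum(flows):.1f}, "
        f"NPV = {npv(DISCOUNT_RATE, flows):.2f}, "
        f"break-even year = {break_even_period(flows)}"
    )
```

With these assumed figures, project B has the higher net present value but breaks even one year
later than project A, which shows how an investment can be more favorable overall and still take
longer to pay back.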

BENEFITS OF BIG DATA


Figure 4 presents the benefits of big data analysis to business, based on a survey of 115
leading organizations reported by Wielki (2013). According to a recent MGI report, big data
benefits organizations in several ways. First, it creates transparency: making big data easily
accessible throughout the organization helps decision makers make timely decisions and eases
the flow of information between departments. Second, it encourages organizations to innovate
and experiment: by analyzing transactional data, organizations can set up controlled experiments,
which help analysts find the root causes of problems and thus improve the overall efficiency and
performance of business operations. Third, it enables customer segmentation: organizations can
cluster and segment their customers to a very detailed level based on their preferences, which
helps them tailor marketing campaigns to customers' needs rather than spending millions of
dollars on untargeted marketing efforts.
Figure 4 Wielki Big Survey
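
As an illustration of the clustering behind customer segmentation, the sketch below applies a
plain k-means procedure to a handful of customers described by two features. The customer
records, the choice of features (monthly visits and average basket value), the starting centroids
and the function name are all hypothetical; a real segmentation would use far more data and
would normally scale the features first.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignments, centroids


# Hypothetical customers: (visits per month, average basket value in dollars).
customers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]
initial = [customers[0], customers[3]]  # assumed starting centroids

labels, centers = kmeans(customers, initial)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

In this toy data the first three customers (infrequent, low-spending) fall into one segment and
the last three (frequent, high-spending) into the other, so each group could receive a different
campaign.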

One of the key drivers of big data is its capacity for cost reduction. For example, the
ORION (On-Road Integrated Optimization and Navigation) project at UPS is a real-time big data
research project. Data is captured from telematics sensors in the company's 46,000 vehicles,
which relay truck data such as speed, direction, braking and drivetrain performance in real time.
The data is used to design UPS drivers' routes and to monitor daily performance. In 2011 this
project saved UPS over 8.4 million gallons of fuel by cutting over 85 million miles off routes.
Given these cost savings, UPS is now planning to invest more in big data so that it can use data
and analytics to monitor its 2,000 aircraft (Davenport, 2013). The high price of relational database
management systems and the capital required to build their infrastructure are major reasons for
the increase in investments in big data technologies. Relational database licensing is said to be
very expensive in most cases, ranging up to $50,000 per CPU core (Lopez & Antonio). The price
varies from vendor to vendor, but it is considerably higher than the cost of licensing and support
for big data, which is about $4,000 per node per year (Bantleman, 2012).
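
To put the two licensing figures side by side, the short calculation below applies them to one
deployment. It is a rough sketch under stated assumptions: the cluster size, the number of cores
per node and the three-year horizon are hypothetical, the relational figure is the quoted upper
bound treated as a one-time license fee, and hardware, staffing and relational support costs are
ignored.

```python
# Figures quoted in the text.
RDBMS_LICENSE_PER_CORE = 50_000      # upper bound, USD per CPU core
BIG_DATA_COST_PER_NODE_YEAR = 4_000  # USD per node per year, licensing and support

# Hypothetical deployment, chosen only for illustration.
NODES = 10
CORES_PER_NODE = 8
YEARS = 3

# Assumption: the relational license is paid once for every core in the cluster.
rdbms_cost = RDBMS_LICENSE_PER_CORE * CORES_PER_NODE * NODES

# The big data figure is charged per node for each year of operation.
big_data_cost = BIG_DATA_COST_PER_NODE_YEAR * NODES * YEARS

print(f"Relational licensing (upper bound): ${rdbms_cost:,}")
print(f"Big data licensing and support over {YEARS} years: ${big_data_cost:,}")
print(f"Ratio: {rdbms_cost / big_data_cost:.1f}x")
```

Under these assumptions the relational licenses cost $4,000,000 against $120,000 for the big
data platform. The ratio depends entirely on the assumed core count and time horizon, so it
should be read as an order-of-magnitude illustration rather than a benchmark.
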
Big data also helps organizations reduce processing time. One good example is the
merchandise pricing optimization software used at Macy's, which cuts the processing time of
complex analytical calculations that would otherwise take days on traditional systems to hours or
even seconds. The department store chain has been able to reduce the time taken to optimize
pricing for its 73 million items for sale from over 27 hours to just over one hour. This reduction
has helped Macy's retail stores adapt to changes in consumer demand and change the prices of
items more easily. In addition, Macy's is able to interact with its customers in real time using big
data analytics. The analytics platform uses Hadoop distributed processing and storage along with
parallel processing and in-memory applications. This has enabled IT infrastructure costs to be cut
by 70% (Davenport, 2013).
Big data analytics also helps organizations build data-driven features into their products.
For example, LinkedIn offers various big data based features on its website, such as
recommendations, People You May Know, suggested job openings and groups you may like.
These features make LinkedIn a favorite professional networking website in the corporate world.
Similarly, Netflix uses big data methodologies such as clustering and neural networks to
recommend television series and movies to users based on their viewing history. This feature
gives Netflix a competitive advantage in retaining the customers subscribed to its services.
The field of data analytics was originally developed to support organizations in making
business decisions. Big data has made it easier to analyze unstructured data. For example, United
Healthcare previously analyzed only structured data, but with the help of big data it now uses
unstructured data captured from customer calls. United Healthcare converts these voice calls into
text and applies big data analytics to the text in order to understand customers' attitudes.
According to a Bain & Company study, many early adopters of big data analytics have
gained significant advantages over their competitors. The study examined more than 400
organizations and found that the early adopters have been outperforming their counterparts by
wide margins. They are twice as likely to perform well financially, five times as likely to make
decisions much faster than their peers, three times as likely to execute decisions as planned and
twice as likely to use data analytics during decision making (Bain & Company, 2013); see the
figure below.

CHALLENGES IN BIG DATA
Like any new initiative, big data carries considerable risks and challenges. A Capgemini
(2012) study of the major impediments to big data (see figure below) found that the biggest
challenge is organizational silos. The data in an organization is segregated and stored in separate
systems used by different departments, such as marketing and sales, finance and accounting,
research and development, and human resources, and it is not shared among them. The second
challenge is the shortage of skilled people to perform these analytical operations, as big data
analytics is a newly emerging field.

MGI listed several challenges that must be addressed in order to capture the full potential of big
data:
Data policy: As this is a new field, data policy issues such as privacy, security, intellectual
property and even liability are becoming highly significant.
Technology and integration: Combining existing legacy systems and data storage methodologies
with big data technology often causes problems during the data integration process. This forces
organizations to introduce new technologies and migrate all existing data into the new systems, a
project that requires months of planning and implementation. Many organizations even lack the
facilities to capture incoming data, so implementing big data solutions requires overhauling the
existing IT infrastructure and updating all software applications. Access to data, especially in
unstructured formats, will require integration with third-party vendors, and there are no efficient
markets for sharing and moving this data, which leads to additional data integration issues.
Talent and organizational change: One of the major challenges for big data is the shortage of
analysts with the skills to work with this new technology. Existing employees may need
additional training to use big data analytics, and organizational change is generally not welcomed
with open arms.
Industry structure: The structure of an industry also plays an important role in determining the
ease of data capture. For example, the public sector faces little competitive pressure and will
therefore be slow to leverage the full potential of big data, whereas private organizations and
hospitals, in order to attract more customers, will leverage the full benefit of big data analytics
(Manyika et al., 2011).

BIG DATA PRIVACY CONCERNS


One of the major challenges of big data is the privacy of the data generated by users. User
profiles are created and shared widely, which leads to information security issues. Traditionally,
organizations have used methods such as encryption, data sharding and anonymization to protect
the identities of users. However, data scientists have repeatedly been able to take isolated datasets
and identify information about individual users, which is then used for profiling consumers. Paul
Ohm observed that such re-identification is a privacy concern because it undermines users' faith
in anonymization. In 2006, a Harvard-based research group started gathering the profiles of 1,700
college-based Facebook users to study how their interests and friendships changed over time
(Lewis et al., 2008). These supposedly anonymous data were released to the world, allowing
other researchers to explore and analyze them. What other researchers quickly discovered was
that it was possible to de-anonymize parts of the dataset, compromising the privacy of students,
none of whom were aware their data were being collected (Zimmer, 2008). This case made
headlines and raised a series of questions about big data practices on social media and about what
may and may not be used from social networking websites. Privacy is a very sensitive topic
because the effect of any damage done is very hard to gauge. It is not feasible to get consent from
every single person using Facebook and Twitter, yet it is also not ethical for organizations to
analyze any data simply because it is freely available (Ess, 2002). Hence organizations must
implement strict policies governing who gets access to users' data and how it is analyzed. As big
data is still a growing field, we need good standards that organizations can follow in order to use
consumer data ethically.
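
The general mechanism behind re-identification can be sketched in a few lines: a released
dataset with names removed is joined to a second, public dataset on attributes the two have in
common. The example below is a toy illustration only. Every record, name and attribute in it is
invented, and it is not a reconstruction of the method used in the Facebook case described above.

```python
# "Anonymized" release: names removed, quasi-identifiers kept. All records are invented.
released = [
    {"zip": "02138", "birth_year": 1987, "gender": "F", "interest": "jazz"},
    {"zip": "02138", "birth_year": 1988, "gender": "M", "interest": "chess"},
    {"zip": "02139", "birth_year": 1987, "gender": "F", "interest": "hiking"},
]

# Separate public directory that lists names alongside the same attributes.
directory = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1987, "gender": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1988, "gender": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_year": 1987, "gender": "F"},
    {"name": "Dana Example", "zip": "02139", "birth_year": 1987, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(released, directory, keys=QUASI_IDENTIFIERS):
    """Return {name: released_record} for records that match exactly one directory entry."""
    matches = {}
    for record in released:
        candidates = [
            person for person in directory
            if all(person[k] == record[k] for k in keys)
        ]
        if len(candidates) == 1:  # a unique match ties the record to one person
            matches[candidates[0]["name"]] = record
    return matches


linked = reidentify(released, directory)
for name, record in linked.items():
    print(name, "->", record["interest"])
```

Two of the three released records match exactly one person in the directory and are therefore
re-identified, while the third is protected only because two people happen to share the same
attributes. This is why removing names alone does not guarantee anonymity.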

CONCLUSION
Over the past decade, investment in big data has been increasing rapidly, and so has the
return on investment for organizations. Big data is comparatively cheaper and more efficient;
however, there are many challenges associated with using it. One of the major challenges is that
we should be mindful of the ways in which users' information is used. The fact that information
is available does not mean that organizations may exploit it. In this era of big data, users have no
control over what happens with their data and who uses it for what purpose. Hence I suggest that
the use of consumers' data be regulated with strong rules and regulations. Action must be taken
against organizations that exploit consumers' privacy, and the protection of privacy must also be
included in organizations' codes of ethics in order to ensure proper implementation.

REFERENCES
Bantleman, J. (2012). The Big Cost of Big Data. Forbes.
http://www.forbes.com/sites/ciocentral/2012/04/16/the-big-cost-of-big-data/ (accessed on May 4, 2015).
Davenport, T. H., & Dyché, J. (2013). Big data in big companies. May 2013.
Ess, C. (2002). Ethical decision-making and Internet research: recommendations from the
AoIR ethics working committee. Association of Internet Researchers. [Online] Available
at: http://aoir.org/reports/ethics.pdf (12 September 2011).
Gartner [2012]. IT Key Metrics Data 2012: Key Infrastructure Measures: Storage
Analysis: Current Year, Jamie K. Guevara, Linda Hall, Eric Steggman.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A. & Christakis, N. (2008). Tastes, ties,
and time: a new social network dataset using Facebook.com. Social Networks, vol. 30, no. 4,
pp. 330-342.
Lopez & Antonio. The Modern Data Warehouse: How Big Data Impacts Analytics
Architecture. Business Intelligence Journal, Vol. 19, No. 3.
Manyika, J. et al., 2011. Big data: The next frontier for innovation, competition, and
productivity. Washington: McKinsey Global Institute.
Ohm, supra note 7, at 1704
Pearson, T. & Wegener, R. (2013). Big Data: The organizational challenge. Bain &
Company.
Russom, P., 2011. Big Data Analytics. The Data Warehousing Institute, 4(1), pp.1-36.
Wielki, J. (2013). An analysis of the opportunities and challenges connected with Big
Data. Zarządzanie i Finanse, 11(3, cz. 1), 54-69.
Wielki, J. (2013). Implementation of the Big Data concept in organizations: possibilities,
impediments and challenges. In FedCSIS, 2013.
Zimmer, M. (2008). More on the Anonymity of the Facebook dataset: it's Harvard
College. MichaelZimmer.org Blog. [Online] Available at:
http://www.michaelzimmer.org/2008/01/03/more-on-the-anonymity-of-the-facebook-dataset-its-harvard-college/
(20 June 2011).