MANAGEMENT INFORMATION SYSTEMS

Big Data
Mid Term Report
Rishi Kumar Paswan
Rohan Samria
Siddharth Malani
T R Chakravarthy
Venkat Singhal

The term Big Data is used everywhere these days, from professional magazines and news
articles to YouTube videos, tweets and blog discussions. The term was coined by Roger
Magoulas of O'Reilly Media in 2005. It refers to a wide range of large data sets that are
almost impossible to manage and process using traditional data management tools, because of
both their size and their complexity. Big Data can be seen in business and finance, where huge
amounts of banking, stock exchange, and online and onsite purchasing data flow through
computerized systems every day and are captured and stored for inventory monitoring and for
the analysis of market and customer behavior. It can also be seen in the life sciences, where
large data sets such as clinical data, genome sequences and patient data are analyzed and used
to advance breakthroughs in science. Other areas of research where Big Data is of central
importance include oceanography, astronomy and engineering. The leap in computational and
storage power enables the collection, storage and analysis of these Big Data sets.
According to the McKinsey Global Institute's big data report of June 2011, 5 billion
mobile phones were in use in 2010, 30 billion pieces of content were shared on Facebook every
month, global data generated was projected to grow 40% per year versus only 5% growth in
global IT spending, and the US Library of Congress had collected 235 terabytes of data by
April 2011. These figures barely scratch the surface of the amount of data generated every
minute today.

What is Big Data?


Big Data is a general term used to describe the massive amount of structured and
unstructured data that is being generated at an exponentially growing rate. This data comes
from various sources, such as PCs, mobile devices and machine sensors. It comes from
everywhere: digital pictures and videos posted online, posts on social media sites, sensors
used to gather climate information, and purchase transaction records, to name a few.

Is Big Data a Volume or a Technology?


According to IBM, we create 2.5 quintillion bytes of data every day, and 90% of all data
has been created in the last two years alone. However, Big Data is not only about the volume
of data being produced. As used in organizations, the term may also refer to the technology,
including tools (e.g. Hadoop), processes and storage facilities, that an organization requires
to make use of large amounts of data.

3 Vs of Big Data:
Big Data spans three dimensions:
Volume: Big Data implies a large volume of data. Many factors contribute to this, as
mentioned earlier, such as data shared on social media and call records. The volume is so
large that, even though storage is becoming cheaper, it remains a challenge to determine the
relevance of unstructured data and to store and use such a huge amount.
Velocity: Data no longer moves as slowly as it used to. It is shifting from batch to
streaming data (near real time, and increasingly real time).
Variety: Because the sources differ, so do the types of data. Today data is available in
different formats, both structured and unstructured. It can be text, or it can take other
forms such as video, audio and email. It ranges from financial transactions and call records
to weather data recorded by machine sensors.
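
The shift from batch to streaming described under Velocity can be illustrated with a
minimal Python sketch. The sensor readings and both function names are hypothetical and
chosen only for illustration. The batch function waits for the complete data set, while the
streaming function updates its result as each value arrives.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Batch style: wait until all readings are stored, then compute once."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the mean as each reading arrives."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental update, no history kept
        yield mean


sensor_readings = [20.0, 22.0, 24.0, 26.0]  # made-up sensor values
print(batch_mean(sensor_readings))
print(list(streaming_mean(sensor_readings)))
```

The streaming version never holds the full history in memory, which is one reason
incremental computation suits data that arrives continuously.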

Conventional Methods of Handling Data


In the early days of computing, the traditional way of handling data offered a lot of
convenience to business processes, along with the benefits of storing data electronically. The
conventional approach consisted of custom-built data processes and computer information
systems tailored to a specific business function. For example, an accounting department would
have its own information system tailored to its needs, while another department, such as
manufacturing, would have an entirely separate system for its needs.
Initially these separate systems were simple to set up and allowed individual departments
to do things faster with less work. However, separate database systems for each business
function led to conflicts of interest within the company. Departments felt a great deal of
ownership of the data that they collected, processed and managed, which hindered
company-wide collaboration and data sharing and led to data redundancy and a high rate of
inconsistent data.
An architecture was clearly needed to draw together the various strands of information
system activity within the company. In the 1980s, IBM researchers Barry Devlin and Paul Murphy
developed the Business Data Warehouse, which stores data from different data sources in one
place. In 1990, Bill Inmon coined the term Data Warehouse: a single, complete and consistent
store of data obtained from a variety of different sources and made available to end users in
a way they can understand and use in a business context. The process of transforming data
from the data warehouse into information and making it available to users in time to make a
difference is called Data Warehousing.
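The following is the Data Warehouse architecture:

[Figure: Data Warehouse architecture. Several sources feed an integration layer, which loads
the data warehouse and its metadata; clients access the warehouse through a query and
analysis layer.]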
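
The flow from separate sources through integration into a single store can be sketched in
a few lines of Python. This is only an illustration under stated assumptions: the two
departmental sources, the `integrate` function and the `customer_facts` table are all
hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical departmental sources with inconsistent formats.
accounting_rows = [("C001", "acme corp ", 1200.0), ("C002", "Globex", 800.0)]
manufacturing_rows = [{"cust": "c001", "name": "ACME Corp", "units": 30}]


def integrate(accounting, manufacturing):
    """Transform both sources into one consistent record per customer."""
    merged = {}
    for cust_id, name, revenue in accounting:
        key = cust_id.strip().upper()
        merged[key] = {"name": name.strip().title(), "revenue": revenue, "units": 0}
    for row in manufacturing:
        key = row["cust"].strip().upper()
        record = merged.setdefault(
            key, {"name": row["name"].strip().title(), "revenue": 0.0, "units": 0}
        )
        record["units"] += row["units"]
    return [(k, v["name"], v["revenue"], v["units"]) for k, v in sorted(merged.items())]


# Load the integrated records into the (in-memory) warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_facts "
    "(cust_id TEXT PRIMARY KEY, name TEXT, revenue REAL, units INTEGER)"
)
warehouse.executemany(
    "INSERT INTO customer_facts VALUES (?, ?, ?, ?)",
    integrate(accounting_rows, manufacturing_rows),
)

# A client query against the single, consistent store.
rows = warehouse.execute(
    "SELECT cust_id, name, revenue, units FROM customer_facts ORDER BY cust_id"
).fetchall()
print(rows)
```

The integration step is where redundant and inconsistent departmental records (here, two
spellings of the same customer) are reconciled before loading.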

The process of extracting information from a company's various databases and reorganizing
it for purposes other than those the databases were originally intended for is called Data
Mining. Online analytical processing (OLAP) refers to end-user activities, such as DSS
modeling using spreadsheets and graphics, that are done online. The term OLAP was coined by
E. F. Codd in 1993 and is generally treated as synonymous with decision support, business
intelligence and executive information systems. The objective of OLAP is to analyze complex
relationships and look for patterns, trends and exceptions.
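
A typical OLAP operation is the roll-up, which aggregates facts along chosen dimensions
so that patterns and exceptions stand out. The following is a minimal sketch in plain
Python. The sales figures, the dimension names and the `roll_up` helper are hypothetical and
do not represent any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, quarter, product, amount).
sales = [
    ("North", "Q1", "Widgets", 100),
    ("North", "Q2", "Widgets", 150),
    ("South", "Q1", "Widgets", 80),
    ("South", "Q1", "Gadgets", 40),
    ("South", "Q2", "Gadgets", 60),
]

DIMENSIONS = ("region", "quarter", "product")


def roll_up(facts, by):
    """Sum the amount over every dimension that is not listed in `by`."""
    indexes = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in indexes)
        totals[key] += fact[-1]
    return dict(totals)


by_region = roll_up(sales, by=("region",))
by_region_quarter = roll_up(sales, by=("region", "quarter"))
print(by_region)
print(by_region_quarter)
```

Moving from `by_region` to `by_region_quarter` corresponds to drilling down from a
summary to a finer level of detail.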

One ultimate use of the data gathered and processed is business intelligence. Business
intelligence is the ability to make fact-based decisions from reliable and integrated data.
Disadvantages of Data Warehouse:

Data Warehouses are not the optimal environment for unstructured data.

Because data must be extracted, transformed and loaded into the warehouse, there is an
element of latency in data warehouse data.

Over their life, data warehouses can have high maintenance costs.

Data warehouses can get outdated relatively quickly. There is a cost of delivering
suboptimal information to the organization.

There is often a fine line between data warehouses and operational systems. Duplicate,
expensive functionality may be developed in the data warehouse that in retrospect should
have been developed in the operational systems and vice versa.

The need to overcome the disadvantages of data warehousing, together with the explosion
of data, led to Big Data. Big Data is a popular term used to describe the exponential growth
and availability of data, both structured and unstructured. Big Data has become as important
to business and society as the Internet. The hopeful vision of organizations for Big Data is
to take data from any source, harness the relevant data and analyze it to find answers that
enable 1) cost reductions, 2) time reductions, 3) new product development and optimized
offerings, and 4) smarter business decision making.

What's Unique about Big Data?


Information has always been considered a gold mine for improving the business
capabilities of companies, but the uniqueness of Big Data lies in its size and structure. Big
Data is also special because it represents both significant information, which can open new
doors, and the way this information is analyzed to help open those doors. The analysis goes
hand in hand with the information, so in this sense "Big Data" represents both a noun ("the
data") and a verb ("combing the data to find value").

How can we make sense of Big Data?


Big Data can yield insights that are not immediately visible or are difficult to find
using traditional methods. With the appropriate tools, we can find hidden patterns and trends
that are not visible to the naked eye. It requires new technologies and skills to analyze the
flow of material and draw conclusions.
Apache Hadoop is one such technology, and it is the software most commonly associated
with Big Data. Apache calls it "a framework that allows for the distributed processing of
large data sets across clusters of computers using simple programming models." Just as Big
Data can be both a noun and a verb, Hadoop involves something that is and something that
does: specifically, data storage and data processing. Both of these occur in a distributed
fashion to improve efficiency and results.
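
One "simple programming model" commonly associated with Hadoop is MapReduce. The sketch
below imitates its three phases (map, shuffle, reduce) with a word count in plain Python. It
does not use Hadoop's actual API; the function names and the two input strings are
hypothetical, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a block of data held on a different node.
splits = ["big data is big", "data is everywhere"]
mapped = chain.from_iterable(map_phase(split) for split in splits)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each split is mapped independently, the map work could in principle be spread
across many machines, which is the idea behind distributed processing.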
Hadoop is open source, and there are variants produced by many different vendors such as
Cloudera, Hortonworks, MapR and Amazon. There are also other products such as HPCC and
cloud-based services such as Google BigQuery.
Skills are brought to the table by data scientists, who obtain business value from a
plethora of information by analyzing it for meaning and trends. This requires mathematical and
statistical expertise as well as creative, communicative, problem-solving and business skills,
making it a very complex but incredibly valuable role. New fields have developed to train for
this expanding career path, and there is a wealth of advice for those aspiring to enter the
Big Data industry, which was expected to see a 500 percent job increase from January 2012 to
January 2014, according to Indeed.com.

Big Data in Business


The day is not far off when all the data generated will be Big Data. Data, both
structured and unstructured, will be collected, processed and analyzed in real time to open a
new world of unique insights and unexpected relations, and possibly to help predict our
future. Big Data shows promise to corporate IT leaders, who can strategize, execute and
improve their resources with its help. There are risks involved, however, and hence many
organizations are hesitant to fully explore the advantages of Big Data and its analysis.
Big Data Difference:
Let's consider a real-life example. Apple pie used to be America's favorite pie. Judging
by the sales of 30 cm pies, apple pie dominated the market. Then supermarkets introduced
smaller, 11 cm pies, and apple pie fell to fourth or fifth preference. The explanation became
evident when large chunks of consumer data were analyzed. Apple pie was everyone's common
second favorite, so it was the 30 cm pie a whole family could agree on buying. But when the
smaller 11 cm pies came onto the market, people could each buy their own first preference.
The availability of Big Data made this analysis possible; without adequate data it would not
have been.
Consider another example in which a telecommunication service provider would like to
reduce its risk of customer attrition. The company can analyze millions of call data records
to identify those customers who make or receive calls from multiple phone numbers. The
company can then direct its promotional offers towards these individuals to keep them happy
and reduce attrition. It is such hidden insights that demonstrate the potential use of Big
Data in business decision making.
Business decisions on promotion, pricing and investment, and indeed any business
decision, can be assisted by insights from Big Data sources. For example, Wal-Mart developed
its Polaris search engine to help online shoppers quickly find the products they were
searching for. This search capability used data from Wal-Mart's 45 million monthly online
shoppers, in addition to product and category popularity scores obtained from social media
streams. Using it, Wal-Mart achieved a 10-15% rise in customers completing their online
purchases.
Hidden Insights
Organizations today can discover new opportunities by mining large quantities of data and
analyzing them to explore avenues they were previously unaware of. For example, former Google
employees started a company called The Climate Corp, which offered crop insurance in parts of
the world that were previously unexplored. They use weather and soil data from 500,000
(5 lakh) locations around the world to predict climate risks for specific crops in specific
locations. This empowers them to outperform other companies that lack the same level of
specific data to help Asian and African farmers.
Automate Business Process
Because new technology can analyze Big Data in real time, analysis can be turned into a
process that makes decisions automatically. McDonald's replaced the color cards and calipers
used to check bun size and color with high-speed image analytics that examine thousands of
buns a minute for size, color and various other properties. This gave them an accurate and
fast response tool to limit wastage and produce uniform buns.
Big Data + Mobile = New Business Processes
With companies constantly improving their data usage capability, it is natural for them
to put it into action on the most accessible technology that people use: mobile phones.
Mobility enables immediate action from the customer, and hence the impact of Big Data is
accentuated. It also empowers employees to access the data they need and real-time insights
on the go, and it helps in collecting data from the field in real time, enriching the
company's knowledge base. Consider a delivery company that has to send its trucks into the
field. It can improve its delivery process by using efficient, smarter routing tools that
anticipate traffic conditions along certain routes, or that choose a new route in response to
information about an accident that has just occurred or to input from the driver.
More than Technology
Acquiring the latest technology and gathering huge quantities of data are necessary but
not sufficient conditions for an organization to use Big Data effectively and efficiently to
its advantage. Leveraging the power of Big Data requires innovation, an analytical mindset,
cultural change and new skills. Gartner, an IT research company, has predicted that around 30
percent of businesses around the world will use the information they hold as a currency to
barter amongst themselves, or even sell it at a reasonable price. It has also predicted that
more than 4 million (40 lakh) jobs will be created throughout the world and that only
one-third of them will be filled. There will be a dearth of data scientists with sufficient
business knowledge to ask relevant questions and to analyze and structure data to get the
required answers.
Big Data in Small Businesses
Big Data is not only for big businesses with deep pockets. Small businesses can also use
the massive online information repository to make data-driven decisions. They just need to
understand where to look for relevant information. IBM's Watson Analytics is an advanced data
analytics tool that is easily accessible to SMEs. It automates the process of data mining and
analysis, and it provides suites for data access, data refinement and data warehousing to
help prepare data in a presentable manner to guide decision making. Google Analytics is an
online web-traffic-monitoring tool that uses a number of traffic sources and metrics to
provide data about website visitors. It can analyze social media traffic, with the help of
which SMEs can change their online marketing techniques. There are many other Big Data
analysis tools for small businesses, such as InsightSquared, Canopy Labs, TranzLogic and
Qualtrics. Small businesses can look for trends in customers' behavior on, before and after
certain occasions and adjust the prices of their products or services accordingly. Big Data
can also help in providing personalized services by giving businesses better insights into
customers' likes and dislikes, their behavior patterns, and so on.
The primary purpose of using Big Data for analysis is to improve decision making, and
its primary focus is to solve business problems. However, it can also be used for the benefit
of mankind, for example to recover from disasters or even to predict them. One of the major
pitfalls of Big Data is the assumption that more data = better analysis, which is not always
true. A better assumption would be clean data = better analysis. Big Data tools such as
Hadoop do not clean the data; they only give you access to large chunks of it. Another
assumption that people make when analyzing Big Data is correlation = causation. However, no
data analysis can tell you for sure the cause of something. It can only tell you the nature
of the correlation between two or more variables. In the end it's the human intellect that
determines causation. Hence we can conclude that using Big Data properly can work wonders for
a business, but an improper understanding of it may be misleading.
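
The difference between correlation and causation can be made concrete with a small
worked example. The sketch below computes the Pearson correlation coefficient in plain
Python. The two data series are made up for illustration and are assumed to be driven by a
third factor (temperature), so a strong correlation between them says nothing about one
causing the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily figures; both series are assumed to be driven by temperature.
ice_cream_sales = [50, 61, 69, 80]
sunburn_cases = [5, 11, 14, 20]

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))
```

The coefficient comes out close to 1, yet selling ice cream does not cause sunburn.
Deciding what actually drives both series is left to the analyst.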

References:
1. Big Data basic concepts and benefits explained: By Scott Matteson in Big Data
Analytics, September 25, 2013
2. https://srinivasansundararajan.sys-con.com/node/1968472/
3. http://www.1keydata.com/datawarehousing/datawarehouse.html
4. http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
5. https://www.youtube.com/watch?v=vfhOxEzoqqA
6. http://tdwi.org/portals/data-warehousing.aspx
7. Big Data is better data: Ted Talk by Kenneth Cukier, September 2014.
8. What Big Data means for business: By Doug Laney, Hung LeHong, Anne Lapkin
research analysts at Gartner
9. 4 Ways Big Data Will Transform Business: By Sashi Reddi, General Manager of CSC
10. 6 Big Data Solutions for Small Businesses: By Sara Angeles, September 18, 2014
