Big Data
Module ID: 10500
Length: 1 hour
12/13/16
2015 IBM Corporation
Disclaimer
Copyright IBM Corporation 2015. All rights reserved.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES
ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE
INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT
PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM
SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE
RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS
PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR
REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND
CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR
SOFTWARE.
IBM, the IBM logo, ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked
on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S.
registered or common law trademarks owned by IBM at the time this information was published. Such trademarks
may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available
on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Module Information
After completing this module, you should be able to:
Understand what big data is
Understand some use cases
Understand what's included in IBM's Big Data platform
Understand the Open Data Platform initiative
Understand IBM's Open Platform with Apache Hadoop
Understand IBM BigInsights v4 components and packaging structure
BIG DATA
Big Data technologies describe a new generation of technologies and architectures, designed
to economically extract value from very large volumes of a wide variety of data, by enabling
high velocity capture, discovery and/or analysis.
Source: Matt Eastwood, IDC
Volume, Velocity, Variety
5 TB per flight
300,000 tweets per minute
2012: 2.8 zettabytes
2020: 40 zettabytes
80% of the world's data is unstructured
Other figures on the slide (labels lost in extraction): 1 in 3; 1 in 2; 83%; 60%
Business Users: Determine what question to ask
IT: Structures the data to answer that question
IT: Delivers a platform to enable creative discovery
Business: Explores what questions could be asked
Brand sentiment
Product strategy
Maximum asset utilization
Operations Analysis
Analyze a variety of machine data for improved business results
Security/Intelligence Extension
Lower risk, detect fraud and monitor cyber security in real time
Benefits
40 times improvement in analysis performance
15-25% performance increase in customer email campaigns
Analysis time reduced from hours to seconds
Benefits
Reduce time required to identify placement of turbine from weeks to hours
Reduces IT footprint and costs, and decreases energy consumption by 40% while increasing computational power
Incorporate 2.5 PB of structured and semi-structured information flows. Data volume expected to grow to 6 PB
Differentiates
Solution components
Software: IBM BigInsights
Increases satisfaction
Hardware: IBM System x
Provides insights to customers across all industries, helping companies make faster and better decisions
IBM's approach
Integrate and Manage the full variety, velocity and volume of Big Data
Analytic Applications: BI / Reporting; Exploration / Visualization; Industry App; Predictive Analytics; Content Analytics
Application Development
Systems Management
Accelerators
Hadoop System
Stream Computing
Data Warehouse
Speed time to value with analytic and application accelerators
Analyze streaming data and large data bursts for real-time insights
Deliver deep insight with advanced in-database analytics and operational analytics
Analytics and Reporting Zone
Warehousing Zone
BI & Reporting
Connectors
Enterprise Warehouse
Predictive Analytics
Hadoop
MapReduce
Hive/HBase
Col Stores
Data Marts
Visualization & Discovery
Documents in a variety of formats
Hadoop System
IBM advantage
IBM BigInsights Analyst
Industry Standard SQL (Big SQL)
Spreadsheet-style tool (Big Sheets)
Machine Learning on Big R
Big R (R support)
Big SQL
Big Sheets
(non production):
IBM BigInsights Enterprise Management
POSIX Distributed Filesystem
Multi-workload, Multi-tenant Scheduling
...
The entire industry is enabled to create big data offerings using this reference implementation
Apache: Complementary to the great work happening today
Apache creates source artifacts inside projects; the foundation will create a reference implementation of the fully integrated platform
MapReduce
Spark
Hive
HCatalog
Pig
YARN
Ambari
HBase
Flume
Sqoop
Solr/Lucene
ODP (future)
Persona
IBM Value
Need
Business Analyst
Data Scientist
Customer Insight
Large financial services company analyzed 4 billion tweets and identified 110 million client profiles that matched with at least 90 percent precision
Administrator
Manage workloads and schedule jobs to ensure performance
Secure environment to reduce risk
Performance
4x improvement in running MapReduce jobs (STAC report)
BigInsights Analyst Module: Leverage existing skills to find and visualize data across all sources, including Hadoop
Big SQL: HBase support, High Availability
Big Sheets: Geospatial support & Big SQL Integration
IBM Open Platform with Apache Hadoop: Free (with optional paid support) product use
Support for long-running applications within YARN for enhanced reliability & security
Heterogeneous storage in HDFS for in-memory, SSD in addition to HDD
Optimize applications based on data access speed required
Hadoop
HDFS/MapReduce/YARN*
Ambari*
Avro
Flume
HBase
Hive
Knox
OpenJDK
Oozie
Pig
Parquet
Sqoop
Snappy
Solr
Slider
Spark
ZooKeeper
Data formats
IBM BigInsights v4
IBM BigInsights for Apache Hadoop
BigInsights Quick Start Edition
BigInsights Analyst Module
BigInsights Data Scientist Module
BigInsights Enterprise Management Module
Elite Support for IBM Open Platform with Apache Hadoop
IBM Open Platform with Apache Hadoop
Governance Catalog
BigInsights
Quick Start
Pricing Terms
Free
Support
provided
Usage
License
Pricing Model
Access via
Non-production, five-node cap
Free
Community
Free
ibm.com/hadoop
Community
Elite Support for IBM Open Platform with Apache Hadoop
IBM Open Platform with Apache Hadoop
BigInsights Analyst Module
BigInsights Data Scientist Module
BigInsights Enterprise Management Module
BigInsights for Apache Hadoop
Yearly
Free
Subscription
Perpetual or Monthly License
Only
Summary
In this module, you learned about:
Big Data
IBM Open Platform with Apache Hadoop
IBM BigInsights 4.0 (IBM BigInsights for Apache Hadoop)
Open Data Platform (ODP) initiative
Questions?
askdata@ca.ibm.com