Operational data stores built for $300,000 in Hadoop versus $4,000,000 using relational databases
Trading warehouse built for $200,000 in Hadoop versus $4,000,000 with a database appliance
Analyzing risk data in 3 hours versus 3 months
Pricing calculations performed in 20 minutes versus 48 hours
Behavioral analytics in 20 minutes versus 72 hours
Modeling automation from 150 models per year to 15,000 models per year
Retail Banking
Screen New Account Applications for Risk of Default
Business Challenge
Every day, large retail banks receive thousands of applications for new checking and savings accounts. Bankers who accept these applications consult third-party risk scoring services before opening an account. They can (and do) override do-not-open recommendations for applicants with poor banking histories. Many of these high-risk accounts are overdrawn and charged off due to mismanagement or fraud, costing banks millions of dollars in losses. Some of this cost is passed on to other customers who manage their accounts responsibly.
Solution
Apache Hadoop can store and analyze multiple data streams and help bank managers control new account risk in their branches. They can match banker decisions with the risk information presented at the time of decision. This allows them to manage risk better by sanctioning individuals, updating policies, and identifying patterns of fraud. Over time, the accumulated data informs algorithms that may detect subtle, high-risk behavior patterns unseen by the bank's risk analysts.
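The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not any bank's actual pipeline: the record layout, the field names (`banker_id`, `recommendation`, `opened`, `charged_off`) and the sample rows are all hypothetical. It pairs each banker decision with the recommendation shown at decision time and counts, per banker, how often a do-not-open recommendation was overridden and how many of those accounts later charged off.

```python
from collections import defaultdict

# Hypothetical records: one row per application, joining the banker's decision
# with the third-party recommendation presented at the time of decision.
decisions = [
    {"banker_id": "B1", "recommendation": "do-not-open", "opened": True,  "charged_off": True},
    {"banker_id": "B1", "recommendation": "do-not-open", "opened": True,  "charged_off": False},
    {"banker_id": "B1", "recommendation": "open",        "opened": True,  "charged_off": False},
    {"banker_id": "B2", "recommendation": "do-not-open", "opened": False, "charged_off": False},
    {"banker_id": "B2", "recommendation": "open",        "opened": True,  "charged_off": False},
]


def override_summary(rows):
    """Count, per banker, overrides of do-not-open advice and resulting charge-offs."""
    summary = defaultdict(lambda: {"overrides": 0, "override_charge_offs": 0})
    for row in rows:
        # An override is an account opened against a do-not-open recommendation.
        is_override = row["recommendation"] == "do-not-open" and row["opened"]
        if is_override:
            stats = summary[row["banker_id"]]
            stats["overrides"] += 1
            if row["charged_off"]:
                stats["override_charge_offs"] += 1
    return dict(summary)


summary = override_summary(decisions)
for banker, stats in sorted(summary.items()):
    print(banker, stats)
```

Bankers who never overrode a recommendation do not appear in the summary, so the output lists only the cases a branch manager would want to review.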
Value Realized
Improved risk management allows the bank to lower its provisions for bad debts and write-offs.
Business Challenge
Traditional auto insurance attempts to differentiate and reward safe drivers for their historical driving records: the accidents and traffic infractions that have (or have not) already happened. That raises the question of whether a particular driver with a good driving record has merely been lucky or is in fact a prudent driver. A property and casualty insurance company with $17B in revenue and 28,000 employees illustrates the challenge.
Newer usage-based insurance (also called Pay as You Drive, or PAYD) attempts to align premiums with empirical risk, based on how policyholders actually drive. Safer drivers pay less, because the insurance company actually knows how they drive. Because policyholders know this, PAYD insurance promotes a virtuous cycle that improves overall safety and reduces moral hazard among drivers who take more risk on the road because they know that they're covered.
Advances in GPS and telemetry technologies have reduced the cost of capturing the driving data used to price PAYD policies, but the data streaming from vehicles grows very quickly, and it needs to be stored for analysis. The growing volume, velocity and variety of incoming data taxed the insurer's existing systems and processes.
The insurer was storing its PAYD data on an RDBMS platform, but storage costs were too high, so the company retained only 25% of the available data. Processing that subset of data took one working week. Risk analysis was too slow.
Solution
After adopting Hadoop, the company retains 100% of policyholders' PAYD geo-location data and processes that quadrupled data stream in three days or less. More data and faster processing enable the insurer to price risks better. It can retain low-risk drivers who might have churned because other insurers were offering better rates. And it can re-price high-risk drivers so that they become sustainably profitable for the insurer.
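The retention and processing figures above imply a throughput gain that is easy to work out. The short Python calculation below is a sketch under one stated assumption: "one working week" is taken to be five days, which the source does not specify. Because the new processing time is "three days or less", the result is a lower bound.

```python
# Figures from the case study; the five-day working week is an assumption.
WORKING_WEEK_DAYS = 5                 # assumed length of "one working week"
before_share, before_days = 0.25, WORKING_WEEK_DAYS
after_share, after_days = 1.00, 3     # "three days or less", so 3 is an upper bound


def throughput(share_of_data, days):
    """Share of the available data stream processed per day."""
    return share_of_data / days


# Ratio of new to old throughput: four times the data in three-fifths of the time.
speedup = throughput(after_share, after_days) / throughput(before_share, before_days)
print(f"Throughput improved by at least {speedup:.1f}x")
```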
Value Realized
The insurer is able to acquire certain segments (the prudent drivers) with more affordable rates, and to serve those segments more profitably, since risky drivers are re-priced based on their behavior.
Business Challenge
A major provider of property, casualty, life and mortgage insurance with $65B in revenue, 60,000 employees and operations in 100 countries illustrates the potential of Hadoop for claims processing.
The company already had systems in place for analyzing structured data at scale. Less-structured data, such as claims notes or social media content, was analyzed on a claim-by-claim basis, but that approach did not scale easily. Combining all textual or social data with all structured data was not economically viable, yet it had the potential to add valuable information to claims analysis.
Solution
Apache Hadoop changed that. It is a schema-on-read architecture that permits ingest of a much wider range of data types. Data puddles that were previously scattered about are now unified in a data lake, giving the clearer, more holistic picture needed to process a claim. This deep data reservoir can still be analyzed using existing business intelligence tools and employee skills, thanks to close integration between Hadoop and existing BI assets such as SAS, Tableau and QlikView.
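To make the schema-on-read idea concrete, here is a minimal Python sketch. It is an illustration only and does not use any Hadoop API: the `raw_store` list stands in for files landed in the data lake, and the claim fields (`claim_id`, `amount`, `notes`, `text`) are hypothetical. Records are stored as they arrive, and a schema is applied only when they are read.

```python
import json

# Raw records are stored exactly as they arrive; no schema is enforced on write.
raw_store = [
    '{"claim_id": "C-1", "amount": "1200.50", "notes": "rear bumper damage"}',
    '{"claim_id": "C-2", "amount": 300, "source": "social", "text": "car is fine lol"}',
    '{"claim_id": "C-3"}',
]


def read_claims(lines):
    """Apply a (hypothetical) claims schema at read time, tolerating missing fields."""
    for line in lines:
        record = json.loads(line)
        yield {
            "claim_id": str(record["claim_id"]),
            # Amounts may arrive as strings or numbers; normalize to float on read.
            "amount": float(record.get("amount", 0.0)),
            # Free text may live under different keys depending on the source.
            "free_text": record.get("notes") or record.get("text") or "",
        }


claims = list(read_claims(raw_store))
for claim in claims:
    print(claim)
```

A different analysis could read the same raw records with a different schema, which is the property that lets structured and less-structured claims data sit side by side.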
Value Realized
Hadoop allows the insurer to blend and correlate data from various sources using a variety of processing engines and analytical applications. This is crucial when combating claims fraud, because what appears to be a legitimate claim in one system can be quickly exposed as fraud when additional structured and unstructured data from different sources are brought to bear.
hundreds of thousands of market participants. This ticker plant collects and processes massive data streams, displaying prices for traders and feeding computerized trading systems fast enough to capture opportunities in seconds. This is useful for making real-time decisions. Years of historical market data are also available for long-term analysis of market trends.
The provider was ingesting 50 GB of server log data from 10,000 feeds daily. Four times daily, this data was pushed into DB2. Applications queried the data 35,000 times per second. Of those queries, 70% were for data less than one year old and 30% were for data more than one year old.
The specific challenge was twofold. First, the existing architecture was only able to hold 10 years of trading data. Second, the growing volume of data was degrading performance, with a risk of missing an SLA of 12 milliseconds.
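The query and ingest figures above can be turned into rough per-tier numbers. The Python sketch below only restates the source figures; the 365-day year and decimal terabytes are assumptions made for the conversion. It suggests that even the data older than one year is queried thousands of times per second, so it still needs a low-latency store.

```python
QUERIES_PER_SECOND = 35_000
RECENT_PERCENT = 70   # queries for data less than one year old
OLDER_PERCENT = 30    # queries for data more than one year old

# Integer arithmetic keeps the split exact.
recent_qps = QUERIES_PER_SECOND * RECENT_PERCENT // 100
older_qps = QUERIES_PER_SECOND * OLDER_PERCENT // 100

DAILY_INGEST_GB = 50
# Assumes ingestion on all 365 days and decimal units (1 TB = 1000 GB).
yearly_ingest_tb = DAILY_INGEST_GB * 365 / 1000

print(f"Recent-data queries: {recent_qps} per second")
print(f"Older-data queries:  {older_qps} per second")
print(f"Server log ingest:   {yearly_ingest_tb} TB per year")
```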
Solution
The provider re-architected its ticker plant with Hadoop as its cornerstone. ETL offloading to Hadoop provided affordable long-term data retention, and serving queries from Apache HBase provided ultra-low latency that meets the rigorous SLA requirements.
Value Realized
The market data provider realized a more than tenfold improvement in price-performance for this particular area of its business.
Trade Surveillance and Compliance Analysis
Business Challenge
An investment services firm with $16B in assets and 4,000 financial advisors serving millions of individual clients processes fifteen million transactions and three hundred thousand trades every day.
The specific challenge was twofold. First, the existing architecture was only able to hold a limited period of data online, which meant that analyses of historical data were only possible after a cumbersome restore of the data from archive. Second, and more importantly, each day's trading data was not available for risk analysis until after the close of business. This created an unacceptable window of time in which the firm was exposed to risks from rogue trading, with no timely way to intervene while improper trading was happening. Intraday risk analysis was very limited.
Solution
Hadoop accelerates a firm's speed-to-analytics and also extends its data retention timeline. A shared data repository across multiple lines of business provides more visibility into all intraday trading activities. The trading risk group accesses this shared data lake to process more position, execution and balance data. They can do this analysis on data from the current workday, and the data remains available for at least five years, much longer than before.
Moreover, Hadoop enables ingest of data from recent acquisitions despite disparate data definitions and infrastructures.
Value Realized
The data lake accelerates time-to-insight and extends retention. Operational data is available to risk analysts while markets are still open, enabling them to reduce the risk of that day's trading activities.
Mining Data Assets with an Enterprise-Wide Data Lake
Business Challenge
A leading global investment services company with $1.5 trillion in assets under management, $14 billion in revenue and 50,000 employees wanted to capitalize on its disparate data assets, which were largely unavailable across the organization. Its existing enterprise data warehouse solutions were appropriate for some data workloads but too expensive for others, such as server logs.
Financial log data is difficult to aggregate and analyze at scale. The high cost of legacy technology means typical retention periods are short, which hampers analysis of prices and performance over longer periods. This matters for analyses such as lifetime customer value and cost to serve.
Solution
The company deployed a multi-tenant Hadoop cluster to merge data across groups. Server log data, for instance, was merged with structured data to uncover trends across assets, traders and customers. Sensitive data was accessed via Accumulo, part of the Hortonworks distribution of Apache Hadoop, which enforces read permissions on individual data cells.
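The idea of read permissions on individual cells can be illustrated with a small, self-contained Python model. This is not Accumulo's API and makes a loud simplification: each cell here carries a plain set of labels that must all be held by the reader, whereas a real deployment would express visibility with the store's own expression syntax. The `Cell` type, the labels and the sample data are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Cell(NamedTuple):
    row: str
    column: str
    value: str
    required_labels: FrozenSet[str]  # reader must hold ALL of these (AND-only simplification)


cells = [
    Cell("cust-1", "name", "A. Smith", frozenset({"pii"})),
    Cell("cust-1", "balance", "10500", frozenset({"finance"})),
    Cell("cust-1", "tax_id", "SAMPLE-ONLY", frozenset({"pii", "compliance"})),
    Cell("cust-1", "segment", "retail", frozenset()),  # no labels: visible to everyone
]


def scan(all_cells, authorizations):
    """Return only the cells whose required labels are covered by the reader's authorizations."""
    auths = frozenset(authorizations)
    return [cell for cell in all_cells if cell.required_labels <= auths]


analyst_columns = [cell.column for cell in scan(cells, {"finance"})]
compliance_columns = [cell.column for cell in scan(cells, {"pii", "compliance"})]
print("finance analyst sees:   ", analyst_columns)
print("compliance officer sees:", compliance_columns)
```

Two readers scanning the same row get different columns back, which is what lets one shared cluster serve groups with different clearances.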
Value Realized
The company started mining its data assets with this enterprise-wide data lake and expects more than a hundred use cases to emerge over time. The project has already more than covered its cost, since Hadoop right-sizes enterprise data warehouses. Moreover, Hadoop is already delivering better insights into customer acquisition costs and longer-term patterns in various corners of the financial market.
Business Challenge
One of the largest US financial institutions was looking for a way to monetize aggregate consumer finance data. Banks possess massive amounts of operational, transactional and balance data that hold information about macro-economic trends. This information can be valuable for investors, advertisers and merchants.
The specific technical challenge in generating revenue from this data was twofold. First, data protection regulations and policies require that the privacy of bank customers be strictly protected. This requires valuable banking data to be aggregated and served in a way that does not contain personally identifiable information. Second, the data of interest lived in isolated legacy silos controlled by different lines of business.
Solution
The financial institution turned to Hadoop as a common cross-company data lake for data from different lines of business, covering mortgages, consumer checking, personal credit, wholesale transactions and treasury banking. A single point of security and privacy enforcement allows the bank to operationalize security and privacy measures such as de-identification, masking, encryption, user authentication and access control.
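Two of the measures named above, de-identification and aggregation without personally identifiable information, can be sketched in Python using only the standard library. Everything specific here is an assumption for illustration: the key, the field names (`customer_id`, `zip3`, `category`, `amount`) and the minimum group size are hypothetical, and a keyed hash is only one of several ways to pseudonymize an identifier.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"example-key-rotate-me"  # hypothetical; a real key would come from a key store
MIN_GROUP_SIZE = 2                     # hypothetical threshold for suppressing small groups


def pseudonymize(customer_id: str) -> str:
    """Replace a customer identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


transactions = [
    {"customer_id": "cust-1", "zip3": "941", "category": "grocery", "amount": 82.10},
    {"customer_id": "cust-2", "zip3": "941", "category": "grocery", "amount": 40.00},
    {"customer_id": "cust-3", "zip3": "100", "category": "travel", "amount": 900.00},
]


def aggregate(rows):
    """Total spend by (zip3, category), dropping groups with too few distinct customers."""
    totals = defaultdict(float)
    customers = defaultdict(set)
    for row in rows:
        key = (row["zip3"], row["category"])
        totals[key] += row["amount"]
        customers[key].add(pseudonymize(row["customer_id"]))
    # Groups backed by a single customer are suppressed so they cannot be singled out.
    return {k: round(totals[k], 2) for k in totals if len(customers[k]) >= MIN_GROUP_SIZE}


report = aggregate(transactions)
print(report)
```

The published report contains only group totals; the travel group is withheld because it is backed by a single customer.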
Value Realized
Both internal bank executives and consumers in the secondary market derive value from the data. Mortgage bankers, consumer bankers, the credit card group and treasury bankers have access to the same cross-sell data. As a result, the bank is able not only to improve operational decision making but also to generate revenue from the sale of actionable intelligence to investors, advertisers and merchants.
* * *
Any financial services business cares about minimizing risk and maximizing opportunity. Banks weigh the risk of opening accounts and extending credit against the opportunity to hold deposits. Insurance companies balance the risk of claims outpacing premiums. Investment companies pursue long-term portfolio appreciation knowing that some securities will lose value. Storing and processing all data in Hadoop provides better insight into the optimal balance of risk and opportunity.
With Hadoop, financial services firms can build a competitive advantage by improving their risk management, reducing fraud, driving customer upsell and improving investment decisions.
These financial services examples show what enterprises across industries are discovering: Hadoop brings both superior economics compared to legacy analytics, data warehousing and storage alternatives, and exciting new capabilities. These capabilities provide deeper and more actionable insights to drive revenue up and costs down.