
Big Data Security

Top 5 Security Risks and Recommendations


Key Insights of Big Data Architecture
Top 5 Big Data Security Risks
Top 5 Recommendations

Big Data Architecture

Key Insights

Distributed Architecture & Auto Tiering
Real Time, Streaming and Continuous Computation
Adhoc Queries
Parallel and Powerful Computation Language
Move the Code, Not the Data
Non Relational Data
Variety of Input Sources

Distributed Architecture
(Hadoop as example)

Data Partition, Replication and Distribution
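As a rough illustration of partition, replication and distribution, the sketch below splits a file into fixed-size blocks and assigns each block to several distinct nodes. The function name, the round-robin placement and the sample values (128 MB blocks, 3 replicas, four nodes) are assumptions made for this example; real HDFS placement is rack-aware and more involved.

```python
def place_blocks(file_size_mb, block_size_mb, nodes, replication=3):
    """Split a file into blocks and pick `replication` distinct nodes per block.

    Hypothetical helper: simple round-robin placement, not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Start at a rotating offset and take the next `replication` nodes.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


layout = place_blocks(file_size_mb=512, block_size_mb=128,
                      nodes=["n1", "n2", "n3", "n4"])
for block_id, replicas in layout.items():
    print(block_id, replicas)
```

Because every block lives on several nodes, the loss of one node does not lose data, and computation can be scheduled on whichever replica is closest.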

Move the Code


Real Time, Streaming and Continuous Computation
Integration Patterns

Variety of Input Sources Real time

Adhoc Queries

Parallel & Powerful Programming Framework

Example: 16 TB of data in 128 MB chunks is roughly 131,000 maps (82,000 maps corresponds to about 10 TB)
Java vs SQL / PLSQL
Frameworks: MapReduce, Storm Topology (Spouts & Bolts)
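The map count in an example like this is essentially input size divided by chunk size. The sketch below checks that arithmetic, assuming one map task per chunk and binary units (1 TB = 1024^4 bytes); the function name is made up for illustration.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def estimate_map_tasks(input_bytes, chunk_bytes=128 * MB):
    """Estimate map tasks, assuming exactly one map per chunk."""
    return -(-input_bytes // chunk_bytes)  # ceiling division


print(estimate_map_tasks(10 * TB))  # 81920
print(estimate_map_tasks(16 * TB))  # 131072
```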

Big Data Architecture

No Single Silver Bullet

Hadoop is already unsuitable for many Big Data problems

Real-time analytics
o Cloudscale, Storm

Graph computation
o Giraph and Pregel (examples of graph computation include Shortest Paths, Degrees of Separation, etc.)

Low latency queries

o Dremel
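To make the graph computation example concrete, degrees of separation can be computed with a breadth-first search. The sketch below runs on a single machine over a small made-up graph; systems such as Giraph and Pregel distribute this kind of traversal across many workers using a vertex-centric model, which is not shown here.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the number of hops on the shortest path, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical friendship graph as an adjacency list.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cid"],
    "cid": ["bob", "dee"],
    "dee": ["cid"],
    "eve": [],
}
print(degrees_of_separation(friends, "ann", "dee"))  # 3
```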

Top 5 Unique Security Risks

Insecure Computation
End Point Input Validation and Filtering
Granular Access Control
Insecure Data Storage and Communication
Privacy Preserving Data Mining and Analytics

Insecure Computation

Untrusted Computation program

Sensitive Info

Health Data

Information Leak
Data Corruption
DoS

Input Validation and Filtering

Input Validation
o How can we trust data?
o What kind of data is untrusted?
o What are the untrusted data sources?

Data Filtering
o Filter rogue or malicious data
o GBs or TBs of continuous data
o Signature-based data filtering has limitations
o How to filter the behavioral aspect of data?
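The two questions above, whether a record can be trusted and whether its behavior looks suspicious, can be sketched as a streaming filter. Everything named here (the required fields, the source allow-list and the per-user event threshold) is a hypothetical choice for the example, and a simple count threshold is only a crude stand-in for real behavioral filtering.

```python
from collections import Counter

REQUIRED_FIELDS = {"source", "user_id", "value"}  # hypothetical schema
TRUSTED_SOURCES = {"sensor-a", "sensor-b"}        # hypothetical allow-list
MAX_EVENTS_PER_USER = 3                           # hypothetical threshold


def validate(record):
    """Structural validation: required fields, known source, numeric value."""
    return (
        isinstance(record, dict)
        and REQUIRED_FIELDS <= record.keys()
        and isinstance(record["source"], str)
        and record["source"] in TRUSTED_SOURCES
        and isinstance(record["value"], (int, float))
        and not isinstance(record["value"], bool)
    )


def filter_stream(records):
    """Yield valid records, dropping users that exceed the event threshold."""
    seen = Counter()
    for record in records:
        if not validate(record):
            continue  # malformed or from an untrusted source
        seen[record["user_id"]] += 1
        if seen[record["user_id"]] > MAX_EVENTS_PER_USER:
            continue  # behavior-based rejection: too many events from one user
        yield record


events = (
    [{"source": "sensor-a", "user_id": "u1", "value": 1.0}] * 5
    + [{"source": "unknown", "user_id": "u2", "value": 2}]
    + [{"source": "sensor-b", "user_id": "u3"}]
)
print(len(list(filter_stream(events))))  # 3
```

Writing the filter as a generator keeps memory use flat apart from the per-user counter, which matters when the input is a continuous stream rather than a finite batch.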

Granular Access Controls

Designed for performance, with no security in mind
Security in Big Data is still ongoing research
Table, row or cell level access control is missing
Adhoc queries pose additional challenges
Access control is disabled by default
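As a sketch of what cell-level access control could look like, the example below attaches a security label to every cell and returns only the cells a role is cleared to see, denying by default. The labels, roles and row layout are all invented for illustration and do not describe any particular product's API.

```python
# Hypothetical mapping of roles to the cell labels they may read.
ROLE_LABELS = {
    "analyst": {"public"},
    "doctor": {"public", "medical"},
}


def read_row(row, role):
    """Return only the cells whose label is granted to `role`.

    `row` maps column name -> (value, label). Unknown roles see nothing.
    """
    allowed = ROLE_LABELS.get(role, set())  # deny by default
    return {
        column: value
        for column, (value, label) in row.items()
        if label in allowed
    }


patient = {
    "name": ("Ann", "public"),
    "diagnosis": ("flu", "medical"),
    "salary": (50000, "finance"),
}
print(read_row(patient, "analyst"))  # {'name': 'Ann'}
print(read_row(patient, "doctor"))   # {'name': 'Ann', 'diagnosis': 'flu'}
```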

Insecure Data Storage

Data is spread across various nodes; authentication, authorization and encryption are challenging
Autotiering moves cold data to a less secure medium
o What if cold data is sensitive?
Encryption of real-time data can have performance impacts
Secure communication among nodes, middleware and end users is disabled by default
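One way to address the auto-tiering concern is a policy check that runs before any data set is moved. The sketch below blocks a move to a less secure tier when the data is sensitive and not encrypted; the tier names, their security ranking and the `DataSet` fields are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical tiers, ranked by how well protected the medium is.
TIER_SECURITY = {"hot-ssd": 3, "warm-disk": 2, "cold-archive": 1}


@dataclass
class DataSet:
    name: str
    sensitive: bool
    encrypted: bool
    tier: str


def may_move(dataset, target_tier):
    """Allow the move unless it sends unencrypted sensitive data to a weaker tier."""
    downgrade = TIER_SECURITY[target_tier] < TIER_SECURITY[dataset.tier]
    return not (downgrade and dataset.sensitive and not dataset.encrypted)


logs = DataSet("clickstream", sensitive=False, encrypted=False, tier="hot-ssd")
health = DataSet("health-records", sensitive=True, encrypted=False, tier="hot-ssd")
print(may_move(logs, "cold-archive"))    # True
print(may_move(health, "cold-archive"))  # False
```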

Privacy Preserving Data Mining and Analytics

Monetization of Big Data generally involves data mining and analytics
Sharing of results involves multiple challenges
o Invasion of privacy
o Invasive marketing
o Unintentional disclosure of information
o AOL's release of anonymized search logs: users could easily be identified
o Netflix faced a similar problem
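Re-identification incidents like these happen when combinations of seemingly harmless attributes single out one person. One well-known check, which the slides do not name, is k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The sketch below tests that property on made-up records; the field names are hypothetical, and passing this check alone does not guarantee privacy.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical release: zip code and age act as quasi-identifiers.
release = [
    {"zip": "560001", "age": 34, "query": "flu symptoms"},
    {"zip": "560001", "age": 34, "query": "cheap flights"},
    {"zip": "560002", "age": 61, "query": "knee surgery"},
]
print(is_k_anonymous(release, ["zip", "age"], k=2))  # False
print(is_k_anonymous(release[:2], ["zip", "age"], k=2))  # True
```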


Top 5 Recommendations

Secure your Computation Code

Implement access control, code signing and dynamic analysis of computational code
Have a strategy to protect data in case of untrusted code
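A minimal sketch of the code-signing idea: the cluster refuses to run a job unless its code carries a valid signature. For brevity this uses an HMAC with a shared secret from the Python standard library, which is a simplification; production code signing normally relies on asymmetric signatures so that worker nodes never hold the signing key. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac


def sign_code(code: bytes, key: bytes) -> str:
    """Produce a hex signature for the submitted job code."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def is_trusted(code: bytes, signature: str, key: bytes) -> bool:
    """Verify the signature using a constant-time comparison."""
    return hmac.compare_digest(sign_code(code, key), signature)


key = b"demo-key-do-not-use"  # placeholder secret for the example only
job = b"def mapper(record): return record['word'], 1"
signature = sign_code(job, key)

print(is_trusted(job, signature, key))                     # True
print(is_trusted(job + b" # tampered", signature, key))    # False
```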

Implement Comprehensive Input Validation and Filtering

Implement validation and filtering of input data, from internal or external sources
Evaluate the input validation and filtering of your Big Data solution

Top 5 Recommendations

Implement Granular Access Control

Review the role and privilege matrix
Review permissions to execute adhoc queries
Enable access control

Secure your Data Storage and Computation

Sensitive data should be segregated
Enable data encryption for sensitive data
Audit administrative access on data nodes
API security

Top 5 Recommendations

Review and Implement Privacy Preserving Data Mining and Analytics

Analytics data should not disclose sensitive information
Get the Big Data audited

Thank You

About iViZ