
BIG DATA TESTING

WHAT IS BIG DATA?

• Big data is a term for collections of data sets so large and complex that they become difficult to process using traditional data processing applications.

[Figure: Big Data. Labels: Activities, Content Volume, Normal Processing Capabilities]
“Big Data” Sources

• Social networking sites such as Facebook, LinkedIn and Twitter

• Mobile device data such as text messages, call data and app data

• Internet transactions such as e-commerce purchases and banking activity

• Network device and sensor data, such as weather forecasting and temperature readings


Traditional Data Processing

Need for RDBMS

• Very quick response times

• Enables relationships between data elements to be defined and managed

• A single database can be used for all applications

Limitations of traditional approach

• Data processing takes too long as the volume of data increases

• Not Scalable

[Figure: OLTP & OLAP. Labels: Business Strategy, Business Processes, Master Data, Transactions, OLTP, Operations, Information, OLAP, Business Data Warehouse, Data Mining, Analytics]

Big Data Characteristics

5 Vs

• Volume

• Velocity

• Variety

• Value

• Veracity

HADOOP

• Apache Hadoop is a framework that allows distributed processing of large datasets across clusters of commodity computers using a simple programming model.

• It is an architecture that can scale with the volume, variety and speed requirements of big data by distributing the workload across many commodity servers that process the data in parallel.

Goals of HDFS:

• Fast recovery from hardware failures

• Streaming access to data

• Accommodation of large data sets

• Portability

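
The "simple programming model" referred to for Hadoop is commonly MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in plain Python on a single machine, using word counting as the example. It is only an illustration of the idea, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real cluster would run the map and reduce steps in parallel on many servers.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data testing"]))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across servers, which is what lets the model scale with data volume.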
Phases in Big Data Testing

[Figure: data pipeline. Labels: Data Source (RDBMS, MongoDB, social media data, etc.), Source HADOOP, ETL Process, Target Data Warehouse, BI]

Test Entry Points

• Data staging validation

• MapReduce validation

• Output validation

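
As a rough illustration of the first entry point, data staging validation is often approached by checking that the records loaded into the staging area match the records pulled from the source. The sketch below is one possible way to do that under simple assumptions: records are small Python dictionaries held in memory, and the names `record_digest` and `validate_staging` are hypothetical, not part of any tool mentioned in these slides.

```python
import hashlib
from collections import Counter


def record_digest(record):
    """Hash one record in a way that ignores the order of its fields."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_staging(source_records, staged_records):
    """Compare source and staged records by count and by content digest."""
    source_digests = Counter(record_digest(r) for r in source_records)
    staged_digests = Counter(record_digest(r) for r in staged_records)
    missing = source_digests - staged_digests      # in source, not staged
    unexpected = staged_digests - source_digests   # staged, not in source
    return {
        "source_count": len(source_records),
        "staged_count": len(staged_records),
        "missing": sum(missing.values()),
        "unexpected": sum(unexpected.values()),
        "passed": not missing and not unexpected,
    }


if __name__ == "__main__":
    source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    staged = [{"name": "a", "id": 1}]
    print(validate_staging(source, staged))
```

The same comparison could in principle be reused for output validation, by treating the processed output as the "source" and the data loaded into the target warehouse as the "staged" side.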
Tools Used in Big Data Scenarios

• HDFS – For data storage

• Pig & Hive / MapReduce – For processing and transforming data

• Sqoop – For bulk transfer of data between RDBMS and HDFS

• Kafka – For real-time data streaming


Automation Tools & Challenges in Big Data Testing

Automation tools:

• TestingWhiz – Helps verify structured and unstructured data sets and schemas at different sources such as Hive, MapReduce, Sqoop and Pig

• QuerySurge – Helps with end-to-end testing

Challenges in big data testing:

• Large datasets and possible latency

• Automation tools may not be well equipped to handle unexpected challenges

• Performance testing
