
Software Testing

Group Members

Ubaid ur Rehman (066)


Umer Tariq (115)
Umer Ashiq (090)
Rao Inam Bari (033)
Asad Muhammad Rana (25)

Challenges in Quality of Big Data Testing


Testing Big data is one of the biggest challenges faced by organizations, because of a lack of
knowledge about what to test and how much data to test. Organizations have been facing
challenges in defining test strategies for structured and unstructured data validation,
setting up an optimal test environment, working with non-relational databases and
performing non-functional testing. These challenges result in poor data quality in
production, delayed implementation and increased cost. Functional and non-functional
testing are required, along with strong test data and test environment management, to
ensure that the data from varied sources is processed error-free and is of good enough
quality to support analysis.
Big data implementation deals with writing complex Pig and Hive programs and running
these jobs using the Hadoop MapReduce framework on huge volumes of data across different
nodes. Hadoop is a framework that allows for the distributed processing of large data sets
across clusters of computers. Hadoop uses MapReduce, where the application is divided
into many small fragments of work, each of which may be executed or re-executed on any
node in the cluster.
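As a concrete illustration of how a job is divided into map and reduce fragments, the sketch below shows a minimal word-count job written for Hadoop Streaming, which lets plain Python scripts act as the mapper and reducer. The script name and the way the role is selected are assumptions for this example, not part of the report above.

#!/usr/bin/env python3
# wordcount_streaming.py -- minimal word-count sketch for Hadoop Streaming.
# Hadoop runs many copies of the mapper, one per input split, so the work
# is divided into small fragments that can run (or re-run) on any node.
import sys

def run_mapper():
    # Emit one "word<TAB>1" pair for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def run_reducer():
    # Hadoop sorts mapper output by key, so identical words arrive together.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

if __name__ == "__main__":
    # The role is chosen by a command-line argument: "map" or "reduce".
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    run_mapper() if role == "map" else run_reducer()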
Big data processing involves three steps. The first step extracts data from different source
systems and loads it into HDFS: web data is pulled by crawl jobs, transactional data by tools
such as Sqoop, and the extracted data is split into multiple files as it is loaded into HDFS.
Once this step is completed, the second step performs MapReduce operations, processing the
input files and applying map and reduce operations to produce the desired output. The last
step extracts the output generated in the second step from HDFS and loads it into downstream
systems, which can be an enterprise data warehouse (EDW) for generating analytical reports
or any of the transactional systems for further processing.
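These three steps can be tied together with a simple driver script. The sketch below is a hedged example rather than a production pipeline: it assumes Sqoop and Hadoop Streaming are installed, and the JDBC URL, table name, HDFS paths and jar location are all hypothetical.

# pipeline_sketch.py -- orchestrates the three Big data processing steps.
import subprocess

def run(cmd):
    # Fail fast if any stage returns a non-zero exit code.
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 1: load transactional data into HDFS with Sqoop
# (connection string, table and target directory are hypothetical; credentials omitted).
run(["sqoop", "import",
     "--connect", "jdbc:mysql://dbhost/sales",
     "--table", "orders",
     "--target-dir", "/data/raw/orders"])

# Step 2: process the input files with a Hadoop Streaming MapReduce job
# (the streaming jar path and the mapper/reducer script names are hypothetical).
run(["hadoop", "jar", "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar",
     "-input", "/data/raw/orders",
     "-output", "/data/processed/orders",
     "-mapper", "python3 wordcount_streaming.py map",
     "-reducer", "python3 wordcount_streaming.py reduce",
     "-file", "wordcount_streaming.py"])

# Step 3: pull the processed output back out of HDFS; a bulk load into the
# enterprise data warehouse or another downstream system would follow here.
run(["hadoop", "fs", "-get", "/data/processed/orders", "/tmp/orders_out"])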
Testing should be performed at each of the three phases of Big data processing to ensure
that data is processed without any errors. Functional testing includes:
(i) validation of pre-Hadoop processing,
(ii) validation of the Hadoop MapReduce data output, and
(iii) validation of data extract and load into the EDW (a reconciliation sketch follows this list).
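A common functional check across all three phases is record-count reconciliation. The sketch below is a minimal, assumption-laden example: in practice the counts would be gathered from the source database, HDFS and the EDW, and the numbers in the demo are made up.

# reconcile_counts.py -- sketch of record-count checks for the three phases.
def reconcile(source_count, hdfs_count, mr_output_count, edw_count):
    """Return human-readable discrepancies between the processing phases."""
    issues = []
    # (i) Pre-Hadoop validation: every extracted source record should land in HDFS.
    if source_count != hdfs_count:
        issues.append(f"pre-Hadoop: source has {source_count} records, HDFS has {hdfs_count}")
    # (ii) MapReduce validation: here we only assert the job produced output;
    # real suites compare the output against business rules, because
    # aggregation legitimately changes record counts.
    if mr_output_count == 0:
        issues.append("MapReduce: job produced no output records")
    # (iii) Extract-and-load validation: everything the job produced should reach the EDW.
    if mr_output_count != edw_count:
        issues.append(f"EDW load: MapReduce wrote {mr_output_count} records, EDW holds {edw_count}")
    return issues

if __name__ == "__main__":
    # Hypothetical counts gathered from the source system, HDFS and the EDW.
    for problem in reconcile(1_000_000, 1_000_000, 42_000, 41_997):
        print("FAIL:", problem)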
Huge Volume and Heterogeneity
Testing a huge volume of data is the biggest challenge in itself. A decade ago, a data pool of 10
million records was considered gigantic. Today, businesses have to store petabytes or exabytes of
data, extracted from various online and offline sources, to conduct their daily business. Testers
are required to audit such voluminous data to ensure that it is fit for business purposes.
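Because reading every record of a petabyte-scale store is rarely practical, one hedged approach is to audit a uniform random sample and apply fitness-for-purpose rules to it. The sketch below uses standard reservoir sampling; the simulated stream and the three-field rule are assumptions made purely for illustration.

# sample_audit.py -- sketch of a sampling-based data audit.
import random

def reservoir_sample(records, k, seed=7):
    """Draw a uniform random sample of k records from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample

def fit_for_purpose(record):
    # Hypothetical rule: exactly three non-empty, comma-separated fields.
    fields = record.split(",")
    return len(fields) == 3 and all(f.strip() for f in fields)

if __name__ == "__main__":
    # Simulated stream standing in for a huge HDFS file.
    stream = (f"id{i},customer{i % 100},{i * 10}" for i in range(1_000_000))
    sample = reservoir_sample(stream, 1000)
    failures = [r for r in sample if not fit_for_purpose(r)]
    print(f"Audited {len(sample)} sampled records; {len(failures)} failed the rules")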
Understanding the Data
For the Big Data testing strategy to be effective, testers need to continuously monitor and validate the
4Vs (basic characteristics) of data: Volume, Variety, Velocity and Value. Understanding the data and
its impact on the business is the real challenge faced by any Big Data tester.
Dealing with Sentiments and Emotions
In a big-data system, unstructured data drawn from sources such as tweets, text documents and
social media posts supplements the data feed.
