Umer Tariq (115), Umer Ashiq (090), Rao Inam Bari (033), Asad Muhammad Rana (25)
Challenges in Quality of Big Data Testing
Testing Big Data is one of the biggest challenges faced by organizations because of a lack of knowledge about what to test and how much data to test. Organizations have been facing challenges in defining test strategies for structured and unstructured data validation, setting up an optimal test environment, working with non-relational databases, and performing non-functional testing. These challenges result in poor data quality in production, delayed implementation, and increased cost. Functional and non-functional testing are required, along with strong test data and test environment management, to ensure that data from varied sources is processed error-free and is of good enough quality for analysis.

A Big Data implementation deals with writing complex Pig and Hive programs and running these jobs using the Hadoop MapReduce framework on huge volumes of data across different nodes. Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers. Hadoop uses MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. Processing takes place in three steps. The first step, loading data into HDFS, involves extracting data from different source systems and loading it into HDFS; data is extracted using crawl jobs for web data and tools such as Sqoop for transactional data, and is then loaded into HDFS by splitting it into multiple files. Once this step is completed, the second step, performing MapReduce operations, involves processing the input files and applying map and reduce operations to get the desired output. The last step, extracting the output results from HDFS, involves extracting the output generated in the second step and loading it into downstream systems, which can be an enterprise data warehouse (EDW) for generating analytical reports or any of the transactional systems for further processing.

Testing should be performed at each of the three phases of Big Data processing to ensure that data is processed without errors. Functional testing includes (i) validation of pre-Hadoop processing, (ii) validation of Hadoop MapReduce process data output, and (iii) validation of data extract and load into the EDW.
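Parts of checks (i) and (iii) can be automated with simple reconciliation scripts. The following is a minimal sketch in Python, assuming an HDFS client on the PATH, line-delimited files in HDFS, and a DB-API connection to the EDW; the function names, paths, and table names are illustrative assumptions, not details from any specific tool.

# reconcile_counts.py -- minimal sketch of record-count reconciliation for
# functional testing phases (i) pre-Hadoop processing and (iii) extract/load
# into the EDW. Paths, table names, and the connection object are assumptions.
import subprocess

def hdfs_record_count(hdfs_dir):
    """Count records under an HDFS directory by streaming the files through wc -l."""
    result = subprocess.run(
        f"hdfs dfs -cat {hdfs_dir}/* | wc -l",
        shell=True, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

def edw_row_count(connection, table):
    """Row count of the target warehouse table via any DB-API 2.0 connection."""
    cursor = connection.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table}")  # table name assumed trusted
    return cursor.fetchone()[0]

def reconcile(source_count, input_dir, output_dir, edw_connection, edw_table):
    """Phase (i): source extract vs. data landed in HDFS.
    Phase (iii): MapReduce output in HDFS vs. rows loaded into the EDW.
    Phase (ii) checks (business rules on the MapReduce output) are job-specific
    and are not covered by simple counts."""
    return {
        "pre_hadoop_ok": source_count == hdfs_record_count(input_dir),
        "edw_load_ok": hdfs_record_count(output_dir) == edw_row_count(edw_connection, edw_table),
    }

In practice, such count checks are usually complemented with field-level comparisons and checksums on sampled records.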
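Check (ii) presupposes an understanding of how the MapReduce step divides the work. As an illustration only, the sketch below shows a Hadoop Streaming job in Python that counts records per source system; the delimiter, field position, and file names are assumptions.

# ---- mapper.py ---- minimal Hadoop Streaming mapper (sketch). Assumes
# tab-delimited input whose first field names the source system.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0]:
        # emit <source, 1>; the framework shuffles and sorts these pairs by key
        print(f"{fields[0]}\t1")

# ---- reducer.py ---- minimal Hadoop Streaming reducer (sketch). Hadoop
# delivers the mapper output sorted by key, so all pairs for one source
# arrive together and can be summed with groupby.
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin if line.strip())
for source, group in groupby(pairs, key=lambda kv: kv[0]):
    print(f"{source}\t{sum(int(count) for _, count in group)}")

A job of this shape is typically submitted with the hadoop-streaming jar (for example, hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs input> -output <hdfs output>), and the framework runs many mapper fragments in parallel, re-executing any fragment whose node fails.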
Huge Volume and Heterogeneity
Testing a huge volume of data is the biggest challenge in itself. A decade ago, a data pool of 10 million records was considered gigantic. Today, businesses have to store petabytes or exabytes of data, extracted from various online and offline sources, to conduct their daily business. Testers are required to audit such voluminous data to ensure that it is fit for business purposes.

Understanding the Data
For the Big Data testing strategy to be effective, testers need to continuously monitor and validate the 4Vs (the basic characteristics) of data: Volume, Variety, Velocity and Value. Understanding the data and its impact on the business is the real challenge faced by any Big Data tester.

Dealing with Sentiments and Emotions
In a big-data system, unstructured data drawn from sources such as tweets, text documents and social media posts supplements a data feed