Introduction to Big Data Testing

Topics covered: an introduction to Big Data testing, how Big Data is handled, SQL databases vs. NoSQL databases, and testing structured and unstructured data.

What is Big Data?

Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.

Examples of Big Data sources include stock exchanges, social media sites, and jet engines.

Data formats in Big Data

1. Structured Data

This refers to data that is highly organized. It can be easily stored in any relational database, which also means that it can be easily retrieved or searched using simple queries.

Ex: Data in table format (columns and rows).
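
As a minimal sketch of what "retrieved using simple queries" looks like in practice, the following uses Python's built-in sqlite3 module with an in-memory database. The table name products and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; the "products" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, "keyboard", 25.0), (2, "monitor", 140.0), (3, "mouse", 12.5)],
)

# A simple query retrieves exactly the rows that match a condition.
rows = conn.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY name", (30.0,)
).fetchall()
cheap_names = [row[0] for row in rows]
print(cheap_names)
conn.close()
```

Because every row follows the same column layout, the filter and the sort can be expressed in a single declarative statement.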

2. Semi Structured Data

Semi-structured data is not rigidly organized in a format that allows it to be easily accessed and searched.

Semi-structured data is not usually stored in a relational database. It can contain tags and other metadata to implement a hierarchy and order.

Examples of Semi-Structured Data

CSV, XML and JavaScript Object Notation (JSON)
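
To illustrate how tags and metadata express hierarchy and order, here is a small sketch that reads the same record from JSON and from XML using only the Python standard library. The record, field names, and tag names are made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records; the field and tag names are made up for illustration.
json_text = '{"order": {"id": 7, "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]}}'
xml_text = "<order id='7'><item sku='A1' qty='2'/><item sku='B4' qty='1'/></order>"

# JSON: keys act as tags, and nesting expresses the hierarchy.
order = json.loads(json_text)["order"]
json_qty = sum(item["qty"] for item in order["items"])

# XML: elements and attributes carry the same hierarchy.
root = ET.fromstring(xml_text)
xml_qty = sum(int(item.get("qty")) for item in root.findall("item"))

print(json_qty, xml_qty)
```

Note that the XML attribute values arrive as strings and must be converted explicitly, which is typical of the looser typing in semi-structured formats.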

3. Unstructured Data

Unstructured data does not have any predefined format: it does not follow a structured data model and is not organized into a predefined structure.

Examples of Unstructured Data

Images, videos, Word documents, presentations, MP3 files, etc.

Examples and Usage Of Big Data

E-commerce applications
Amazon, Flipkart and other e-commerce sites have millions of visitors each day and hundreds of thousands of products. They use Big Data to store information regarding products, customers and purchases.

Social Media applications
Social media sites such as Facebook, Twitter and Instagram generate huge amounts of data in the form of pictures, videos, likes, posts, comments, etc.

Healthcare applications

Stock Market applications

Two types of Databases for storing the data

Relational or SQL Databases: MS Access, MS SQL Server, Oracle, MySQL, Sybase, DB2, DB/400, etc.

NoSQL Databases: MongoDB, CouchDB, Couchbase, Cassandra, HBase, Redis, etc.

What is Big Data Testing?

Big Data testing is the testing of a big data application to ensure that all of its functionalities work as expected.

The general approach to testing a Big Data application involves the following stages.

1. Data Ingestion

The first step in deploying a big data solution is data ingestion, i.e. the extraction of data from various sources.

The data source may be a CRM like Salesforce, an Enterprise Resource Planning system like SAP, an RDBMS like MySQL, or other sources such as log files, documents and social media feeds.
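
One common check at this stage is to reconcile what was extracted against what arrived. The sketch below compares record counts and keys between a source extract and the ingested copy. The reconcile function, the id key, and the sample records are hypothetical stand-ins; a real test would read both sides from the actual systems.

```python
def reconcile(source_rows, ingested_rows, key="id"):
    """Compare two record sets by count and by key; return a summary dict."""
    source_keys = {row[key] for row in source_rows}
    ingested_keys = {row[key] for row in ingested_rows}
    return {
        "source_count": len(source_rows),
        "ingested_count": len(ingested_rows),
        "missing": sorted(source_keys - ingested_keys),     # lost during ingestion
        "unexpected": sorted(ingested_keys - source_keys),  # not present in the source
    }

# Hypothetical sample data standing in for a source extract and the ingested copy.
source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
ingested = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]

report = reconcile(source, ingested)
print(report)
```

Comparing keys as well as counts matters because equal counts alone can hide a dropped record that is offset by a duplicated or stray one.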

2. Data Storage

After data ingestion, the next step is to store the extracted data. The data can be stored either in HDFS (Hadoop Distributed File System) or in a NoSQL database.

3. Data Processing

The final step in deploying a big data solution is data processing. The data is processed through one of the processing frameworks like Spark, MapReduce, Pig, etc.
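
Validating the output of a processing step usually means running the job on a small, known input and comparing the result with an independently computed expectation. The sketch below does this for a word count written in plain Python. It only imitates the map and reduce phases and does not use the Spark, MapReduce, or Pig APIs; the function names and input lines are hypothetical.

```python
from collections import Counter, defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the counts per word, as a word-count reducer would."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input; in a real test this would be a small, known sample.
lines = ["big data testing", "Big Data tools", "testing tools"]

actual = reduce_phase(map_phase(lines))
# Expected result computed independently, by a different route.
expected = dict(Counter(" ".join(lines).lower().split()))

assert actual == expected, "processed output does not match the expected result"
print(actual)
```

Computing the expected value by a different route than the code under test keeps the check from simply repeating the same bug.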

Big Data Testing Strategy

There are various types of testing in Big Data projects, such as database testing, functional testing, infrastructure testing, and performance testing.

In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supporting components.

Subsets of Big Data Testing

Data Ingestion Testing

Data Storage Testing

Data Processing Testing

Data Migration Testing

Big Data Tools For Data Analysis

Xplenty
Apache Hadoop
CDH (Cloudera Distribution for Hadoop)
Apache Cassandra
Knime
Datawrapper
MongoDB
Etc.

Challenges in Testing Big Data:

1. The volume of the data is a major challenge for testing.
2. Test environments and automation must be developed for different platforms.
3. No single tool can perform end-to-end testing.
4. A high degree of scripting is required for designing test cases.
5. Automated Big Data testing procedures are predefined and not suited to unexpected errors.
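
As one illustration of the scripting that the volume challenge calls for, the sketch below compares two files by hashing them in fixed-size chunks, so neither file has to fit in memory. The file names, the payload, and the file_digest helper are hypothetical; the temporary files stand in for a source file and its copy in the target system.

```python
import hashlib
import os
import tempfile

def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical stand-ins for a source file and its copy in the target system.
payload = b"id,name\n1,a\n2,b\n" * 1000
with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "source.csv")
    copy_path = os.path.join(tmp, "copy.csv")
    for path in (source_path, copy_path):
        with open(path, "wb") as handle:
            handle.write(payload)
    digests_match = file_digest(source_path) == file_digest(copy_path)

print(digests_match)
```

A matching digest shows the two files are byte-for-byte identical, but it says nothing about whether the content itself is correct, so it complements rather than replaces record-level checks.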

