
Enter the Exciting World of Big Data

In Partnership with

Programming for Big Data and Analytics


Develop Real-World Big Data and Analytics Applications
Earn High Value Certificates | Get 365 Days Placement Assistance

PROGRAM MENTORS

Asokan Pichai
Senior Vice President, TalentSprint

S. Anand
Chief Data Scientist, Gramener

Prabhu Ramachandran
Professor, IIT Bombay

What is Big Data?

Big Data consists of an exploding world of structured and unstructured data
sets that are beyond the ability of conventional databases and programming
tools to process and analyze. Big Data is characterized by far greater Volume,
Variety and Velocity than traditional data.

Big Data and Analytics refers to the process of collecting, organizing,
transforming, analyzing, and presenting large sets of data to discover useful
patterns, correlations and insights.

Why Learn Big Data and Analytics?

We live in a digital era, defined by information abundance and growing
complexity. Every interaction with businesses is now digitized, waiting for a
clever algorithm to analyze it. The potential of analytics to increase
efficiency or forecast future probabilities is tremendous. Businesses and
governments are taking advantage of these new data-focused tools and
techniques to improve organizational efficiency and gain a competitive
advantage.

With all this comes the demand for new Big Data talent. Many companies
are now looking for IT professionals with Big Data programming skills who
are able to identify, collect, analyze, interpret and transform data to drive
value and innovation for the organization. Programming for Big Data and
Analytics is a highly innovative program designed to create IT professionals
with the skills required to manage and analyze Big Data.

Why Join this Particular Program?

Big Data professionals earn twice as much as other IT professionals

Co-designed by Big Data experts at TalentSprint and Gramener

5 real-time case studies and 1 project from Gramener

Joint certificate from TalentSprint and Gramener

MOOC certificates on Big Data from top US universities

365 days placement assistance after the program

Program Details
Duration: 300-hour program in a 3-month full-time format or a 6-month
part-time format

Who Should Take this Course?

Professionals returning from a career break

Professionals looking for high-growth careers

Engineering graduates / students looking for a differentiated career

Program Outline
Setting up the Tool Chain
Understand why Python is the toolbox of choice for data science and how each
tool is used in the data science workflow. Set up a functional Python-based
environment on your machine and in the cloud for your use.
Version control: git, GitHub
IPython, notebook
Editors and IDEs

Core Language and Libraries
Use the standard library and development tools setup to write, execute, and
troubleshoot Python programs for regular data processing tasks.
Data Types and objects
Control Flow
Reading and writing data
Modules and Namespaces
Advanced constructs: OOP, FP

Data Collection and Cleansing
Learn to load data from common sources, such as structured text files, web
pages and SQL databases. Use some standard tools to clean and prepare data for
analysis.
Reading and Writing CSV files
Reading and Writing SQL
lxml, Beautiful Soup and requests

Get a MOOC Certificate!

Background Statistics
Brush up on the core concepts of statistics and learn how to carry out
statistical computations using Python.
Mean, Median, Mode, ANOVA
Overview of R
scipy.stats

Data Exploration and Analysis
Learn how to perform statistical analysis using specialized tools and packages
to explore data and extract meaningful insights.
scipy
Using pandas

Data Transformation
Gain in-depth experience in using pandas for data manipulation and
transformation.
Data munging with pandas
Short case studies
Normalization and outlier removal

Machine Learning
Learn about machine learning algorithms and how to use the scikit-learn
library.
Survey of Algorithms
scikit-learn library
Regression

Parallel Computing with Spark
Learn how to manipulate data sets using parallel processing with Apache Spark.
Get an edX certificate from the University of California, Berkeley

Distributed Computing and Hadoop
Learn how to deploy and extract data from Hadoop.
subprocess and multiprocessing
Hadoop installation/deployment
Data extraction from Hadoop
Using Pig/Hive
Get a MOOC Certificate!

Data Visualization
Learn how to present information and results of analysis graphically for best
assimilation.
matplotlib in depth
survey of other tools

Text Processing
Learn about patterns and NLTK libraries and how to use them for Text Processing
tasks.

Course Project
Work on a real time project from Gramener's extensive library of live case
studies and big data sets!

TalentSprint Awards

FICCI LeapVault Skills Champion Roll of Honour 2012

Best Non-Corporate Performer 2012

Best Performing Partner Award 2013

TalentSprint Professional Affiliations

About TalentSprint
TalentSprint is a leader in professional skill development and integrated talent management for the
Information Technology and Banking sectors. Funded by NSDC and Nexus Ventures, TalentSprint
has embarked on an ambitious mission to skill ONE MILLION young people by 2020. The company
partners with more than 250 employers and 150 colleges, and has skilled 50,000 young job-seekers
since its inception. Our trainees are regularly recruited by major multinational IT firms, including
Accenture, Genpact, Deloitte, Capgemini, Virtusa, ADP, Wells Fargo, Tech Mahindra, Cognizant, CSC,
Value Labs, HSBC, Cyient, Broadridge, Renault Nissan, CA Technologies, Polaris, and Invesco, to name
a few. TalentSprint conducts industry-linked skill programs through multiple channels, including skill
centers, online, and college campuses.

About Gramener
Gramener is a Data Visualization and Analytics company. Its proprietary platform, Gramex
Visualization, handles large-scale data via programmatic analysis and visualizes it in real-time. The
company helps clients unlock hidden insights from data and uses cutting-edge visualizations to
develop foresight for critical business decisions. Gramener works with Global Fortune 100
customers in a quick, non-intrusive manner to condense large amounts of data from heterogeneous
sources and convert these findings into intuitive visual representations. Gramener customers are
spread across various domains, including the Telecom, Manufacturing, Financial, Pharmaceuticals,
Media, Utilities, Airlines, Retail, Education and Government sectors. The company has offices in
Hyderabad, Bangalore, and Coimbatore. For more information, please visit www.gramener.com.

Corporate Office
TalentSprint Pvt Ltd
Block A, IIIT Campus, Gachibowli, Hyderabad - 500 032
Hyderabad | Bangalore | Chennai | Coimbatore | Visakhapatnam
www.talentsprint.com
Email: sourcing@talentsprint.com

CIN: U80902TG2008PTC062284