Data Science using Python

A comprehensive, job-oriented
training program crafted by experts

Disclaimer: This material is protected under the Copyright Act. © AnalytixLabs, 2011-2018. Unauthorized use and/or duplication of this material or any part of it,
including data, in any form without explicit written permission from AnalytixLabs is strictly prohibited. Any violation of this copyright will attract legal action.
About AnalytixLabs
AnalytixLabs is a capability building and training solutions firm led by McKinsey, IIM, ISB and IIT alumni with deep industry experience
and a flair for coaching. We focus on helping our clients develop skills in basic and advanced analytics so that they emerge as
“Industry Ready” professionals with better career opportunities. AnalytixLabs has also been featured among the top institutes by prestigious
publications like Analytics India Magazine and Higher Education Review, since 2013.

Faculty
• Seasoned analytics professionals
• Together we have 30+ years of experience with prestigious firms, like McKinsey, KPMG, Deloitte and AOL
• Regular sessions by industry experts

Content
• World class course structure
• Surpasses industry requirements
• Cater to standard certifications
• High quality course material and real life case studies

Approach
• 80-20 focus on practical & theory
• Personal attention and individual counselling
• Industry best practices

Bottom line
• Job-oriented training
• Lucrative job prospects in high growth domain
• Support for relevant certifications and diplomas
• Career counseling and planning
• Value for money with high return on investment

Global Data science and Big Data skill gap

McKinsey Global Institute estimates a shortage of nearly 1.7 million big data professionals by 2018. This includes a
shortage of 140,000 to 190,000 workers with deep technical and analytical expertise, and a shortage of 1.5
million managers and analysts equipped to work with and use big data outputs.

Candidates trained by us are working in leading companies across industries…

Program Objective

The Data Science using Python program aims to give students an international, wide-spectrum qualification for job readiness
and seamless absorption into Big Data job roles.

The program exposes students and professionals to the role of Big Data Analysts, who have:

• Ability to translate a business problem into an analytics problem
• Understanding of storage, retrieval and mining of data
• Outcome-oriented, global, industry-specific expertise in critical data analytics and data management skills
• Hands-on practical skills in exploratory, predictive and prescriptive analysis using Python
• Knowledge of applying analytics in various domains, such as e-commerce, retail, telecom and BFSI
• Skills to leverage analytics to drive smart business decisions

Crafted by a team of experts, the program maintains a balance between theoretical concepts and practical applications

Data Science using Python is a comprehensive program with the following
modules, weekly assignments and case studies

Module 1
• Python Foundation – 21 hours + Practice exercises
• Basic data handling, data manipulation and visualization

Module 2
• Business Analytics – 27 hours + Practice exercises
• Data preparation for advanced analytics and predictive modeling

Module 3
• Machine Learning – 24 hours + Practice exercises
• Supervised & Unsupervised learning (ANN, SVM, KNN) and Text Mining

Data Science using Python - Python Foundation (1/4)
Total Duration: 21 hours live training + Practice

Introduction to Data Science with Python
• What is analytics & Data Science?
• Common terms in analytics
• Analytics vs. Data warehousing, OLAP, MIS Reporting
• Relevance in industry and need of the hour
• Types of problems and business objectives in various industries
• How leading companies are harnessing the power of analytics
• Critical success drivers
• Overview of analytics tools & their popularity
• Analytics methodology & problem solving framework
• List of steps in analytics projects
• Identify the most appropriate solution design for the given problem statement
• Project plan for analytics project & key milestones based on effort estimates
• Build resource plan for analytics project
• Why Python for data science?

Python Essentials (Core)
• Overview of Python - Starting with Python
• Introduction to installation of Python
• Introduction to Python editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
• Understand Jupyter notebook & customize settings
• Concept of packages/libraries - Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
• Installing & loading packages & namespaces
• Data types & data objects/structures (strings, tuples, lists, dictionaries)
• List and dictionary comprehensions
• Variable & value labels – Date & time values
• Basic operations - mathematical - string - date
• Reading and writing data
• Simple plotting
• Control flow & conditional statements
• Debugging & code profiling
• How to create classes and modules and how to call them

Scientific distributions used in Python for Data Science
• NumPy, SciPy, pandas, scikit-learn, statsmodels, NLTK, etc.

Accessing/Importing and Exporting Data using Python modules
• Importing data from various sources (CSV, txt, Excel, Access, etc.)
• Database input (connecting to a database)
• Viewing data objects - subsetting, methods
• Exporting data to various formats
• Important Python modules: Pandas, BeautifulSoup

Data Manipulation – cleansing – Munging using Python modules
• Cleansing data with Python
• Data manipulation steps (sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, data type conversions, renaming, formatting, etc.)
• Data manipulation tools (operators, functions, packages, control structures, loops, arrays, etc.)
• Python built-in functions (text, numeric, date, utility functions)
• Python user defined functions
• Stripping out extraneous information
• Normalizing data
• Formatting data
• Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime, etc.)

Data Analysis – Visualization using Python
• Introduction to exploratory data analysis
• Descriptive statistics, frequency tables and summarization
• Univariate analysis (distribution of data & graphical analysis)
• Bivariate analysis (cross tabs, distributions & relationships, graphical analysis)
• Creating graphs (bar, pie, line chart, histogram, boxplot, scatter, density, etc.)
• Important packages for exploratory analysis (NumPy arrays, Matplotlib, seaborn, Pandas, scipy.stats, etc.)

Data Science using Python - Business Analytics (2/4)
Total Duration: 27 hours live training + Practice

Introduction to Statistics
• Basic statistics - Measures of central tendency and variance
• Building blocks - Probability distributions - Normal distribution - Central Limit Theorem
• Inferential statistics - Sampling - Concept of hypothesis testing
• Statistical methods - Z/t-tests (one sample, independent, paired), ANOVA, correlations and chi-square
• Important modules for statistical methods: NumPy, SciPy, Pandas

Introduction to Predictive Modeling
• Concept of a model in analytics and how it is used
• Common terminology used in analytics & modeling process
• Popular modeling algorithms
• Types of business problems - Mapping of techniques
• Different phases of predictive modeling

Data Exploration for modeling
• Need for structured exploratory data analysis
• EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
• Identify missing data
• Identify outliers
• Visualize the data trends and patterns

Data Preparation
• Need for data preparation
• Consolidation/Aggregation - Outlier treatment - Flat liners - Missing values - Dummy creation - Variable reduction
• Variable reduction techniques - Factor analysis & PCA

Segmentation: Solving segmentation problems
• Introduction to segmentation
• Types of segmentation (Subjective vs. Objective, Heuristic vs. Statistical)
• Heuristic segmentation techniques (Value based, RFM segmentation and Life stage segmentation)
• Behavioral segmentation techniques (K-Means cluster analysis)
• Cluster evaluation and profiling - Identify cluster characteristics
• Interpretation of results - Implementation on new data

Linear Regression: Solving regression problems
• Introduction - Applications
• Assumptions of linear regression
• Building a linear regression model
• Understanding standard metrics (variable significance, R-square/Adjusted R-square, global hypothesis, etc.)
• Assess the overall effectiveness of the model
• Validation of models (Re-running vs. Scoring)
• Standard business outputs (decile analysis, error distribution (histogram), model equation, drivers, etc.)
• Interpretation of results - Business validation - Implementation on new data

Logistic Regression: Solving classification problems
• Introduction - Applications
• Linear Regression vs. Logistic Regression vs. Generalized Linear Models
• Building a logistic regression model (binary logistic model)
• Understanding standard model metrics (concordance, variable significance, Hosmer-Lemeshow test, Gini, KS, misclassification, ROC curve, etc.)
• Validation of logistic regression models (Re-running vs. Scoring)
• Standard business outputs (decile analysis, ROC curve, probability cut-offs, lift charts, model equation, drivers or variable importance, etc.)
• Interpretation of results - Business validation - Implementation on new data

Time Series Forecasting: Solving forecasting problems
• Introduction - Applications
• Time series components (trend, seasonality, cyclicity and level) and decomposition
• Classification of techniques (pattern based - pattern less)
• Basic techniques - Averages, smoothing, etc.
• Advanced techniques - AR models, ARIMA, etc.
• Understanding forecasting accuracy - MAPE, MAD, MSE, etc.

Data Science using Python - Machine Learning (3/4)
Total Duration: 24 hours live training + Practice

Machine Learning - Predictive Modeling – Basics
• Introduction to Machine Learning & Predictive Modeling
• Types of business problems - Mapping of techniques - Regression vs. classification vs. segmentation vs. forecasting
• Major classes of learning algorithms - Supervised vs. Unsupervised Learning
• Different phases of predictive modeling (data pre-processing, sampling, model building, validation)
• Overfitting (bias-variance trade-off) & performance metrics
• Feature engineering & dimension reduction
• Concept of optimization & cost function
• Overview of gradient descent algorithm
• Overview of cross validation (bootstrapping, K-fold validation, etc.)
• Model performance metrics (R-square, Adjusted R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)

Unsupervised Learning: Segmentation
• What is segmentation & role of ML in segmentation?
• Concept of distance and related math background
• K-Means Clustering
• Expectation Maximization
• Hierarchical Clustering
• Spectral Clustering, DBSCAN
• Principal Component Analysis (PCA)

Supervised Learning: Decision Trees
• Decision Trees - Introduction - Applications
• Types of decision tree algorithms
• Construction of decision trees through simplified examples; choosing the "best" attribute at each non-leaf node; entropy; information gain, Gini index, chi-square, regression trees
• Generalizing decision trees; information content and gain ratio; dealing with numerical variables; other measures of randomness
• Pruning a decision tree; cost as a consideration; unwrapping trees as rules
• Decision Trees - Validation
• Overfitting - Best practices to avoid it

Supervised Learning: Ensemble Learning
• Concept of ensembling
• Manual ensembling vs. automated ensembling
• Methods of ensembling (stacking, mixture of experts)
• Bagging (logic, practical applications)
• Random forest (logic, practical applications)
• Boosting (logic, practical applications)
• AdaBoost
• Gradient Boosting Machines (GBM)
• XGBoost

Supervised Learning: Artificial Neural Networks (ANN)
• Motivation for neural networks and their applications
• Perceptron and single layer neural network, and hand calculations
• Learning in a multi-layered neural net: back propagation and conjugate gradient techniques
• Neural networks for regression
• Neural networks for classification
• Interpretation of outputs and fine-tuning the models with hyperparameters
• Validating ANN models

Supervised Learning: Support Vector Machines
• Motivation for Support Vector Machines & applications
• Support vector regression
• Support vector classifier (linear & non-linear)
• Mathematical intuition (kernel methods revisited, quadratic optimization and soft constraints)
• Interpretation of outputs and fine-tuning the models with hyperparameters
• Validating SVM models

Data Science using Python - Machine Learning (4/4)
Total Duration: 24 hours live training + Practice

Supervised Learning: KNN
• What is KNN & its applications?
• KNN for missing value treatment
• KNN for solving regression problems
• KNN for solving classification problems
• Validating KNN model
• Model fine-tuning with hyperparameters

Supervised Learning: Naïve Bayes
• Concept of conditional probability
• Bayes' theorem and its applications
• Naïve Bayes for classification
• Applications of Naïve Bayes in classification

Text Mining & Analytics
• Taming big text; unstructured vs. semi-structured data; fundamentals of information retrieval; properties of words; creating Term-Document (TxD) matrices; similarity measures; low-level processes (sentence splitting, tokenization, part-of-speech tagging, stemming, chunking)
• Finding patterns in text: text mining, text as a graph
• Natural Language Processing (NLP)
• Text Analytics – Sentiment Analysis using R
• Text Analytics – Word cloud analysis using R
• Text Analytics - Segmentation using K-Means/Hierarchical Clustering
• Text Analytics - Classification (Spam/Not spam)
• Applications of social media analytics
• Metrics (measures, actions) in social media analytics
• Examples & actionable insights using social media analytics

• Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)

• Fine-tuning the models using hyperparameters, grid search, pipelines, etc.

Project - Consolidate Learnings:
Applying different algorithms to solve the business problems and benchmark the results

Course completion and career assistance

Course completion & Certification criteria

• You shall be awarded an AnalytixLabs certificate only after the submission and evaluation of mandatory course project work. These will be provided as a part of the training.

• There is no pass/fail for these assignments and projects. Our objective is to ensure that trainees get strong hands-on experience so that they are well prepared for job interviews as well as for performance at their jobs.

• In case the assignments and projects are not up to the mark, trainees are welcome to take help and support for improvement.

• While a weekly schedule is shared with trainees for regular assignments, candidates get 3 months after course completion to submit their final assignment and projects.

What is included in career assistance?

• After successful course completion, candidates can seek assistance from AnalytixLabs for profile building. A team of seasoned professionals will help you based on your overall education background and work experience. This will be followed by interview preparation along with mock interviews (if required).

• Job referrals are based on the requirements we get from various organizations, HR consultants and a large pool of AnalytixLabs’ ex-students working in various companies.

• No one can truthfully provide a job guarantee, particularly for good quality job profiles in Analytics. However, most of our students do get multiple interview calls and good career options based on the skills they learn during training. For this there will be continuous support from our side for as long as required.

Time and investment
Full interactive online training: 72 hours live training + Practice (~120 hours),
INR 30,000 + 18% GST / $1200 (foreign nationals) including taxes

Data Science using Python (self-paced): ~72 hours + Practice, INR 25,000 + 18% GST / $900 (foreign nationals)

Timing: 6 hours per weekend live training (Saturday & Sunday 3 hours each) + Practice

Training mode: Fully interactive live online class


(In addition to the above, you will also get access to the recordings for future reference and self study)

Components: Learning Management System access for courseware like class recordings, study material and
industry-relevant project work

Certification: Participants will be awarded a certificate on successful completion of the stipulated requirements
including an evaluation
We provide training in both ‘fully interactive live online’ and classroom* mode

• Fully interactive live online class with personal attention
• Access to quality training and 24x7 practice sessions available at the comfort of your place
• Saves commuting time and resources in today’s chaotic world
• Ensures best use of time and resources
• Delivered lectures are recorded and can be replayed by individuals as per their needs
• Studies prove that online education beats the conventional classroom
• One of the strongest global trends in education, both in developing and developed countries

*Classroom mode is available only at the Gurgaon and Bangalore centers


Contact Us

Visit us on: http://www.analytixlabs.in/

For course registration, please visit: http://www.analytixlabs.co.in/course-registration/

For more information, please contact us: http://www.analytixlabs.co.in/contact-us/


Or email: info@analytixlabs.co.in
Call us, we would love to speak with you: (+91) 9555219007

Join us on:
Twitter - http://twitter.com/#!/AnalytixLabs
Facebook - http://www.facebook.com/analytixlabs
LinkedIn - http://www.linkedin.com/in/analytixlabs
Blog - http://www.analytixlabs.co.in/category/blog/
Visit Us

Gurgaon Address:
GF 382, Sector 29,
Adjoining IFFCO Chowk Metro Station (Gate 2),
Next to Vasan Eye Care Hospital,
Gurgaon, Haryana 122001,
India

Bengaluru Address:
Bldg 41, First floor,
14th Main Road, Near BDA complex,
Sector 7, HSR Layout
Bengaluru - 560102
Landmark: Max store
