
Analytix Labs Data Science Course


Data Science & Machine Learning with Python

A comprehensive, job-oriented training program crafted by experts

Disclaimer: This material is protected under the Copyright Act. AnalytixLabs ©, 2011-2019. Unauthorized use and/or duplication of this material or any part of it, including data, in any form without explicit written permission from AnalytixLabs is strictly prohibited. Any violation of this copyright will attract legal action.
About AnalytixLabs
AnalytixLabs is a capability building and training solutions firm led by McKinsey, IIM, ISB and IIT alumni with deep industry experience and a flair for coaching. We focus on helping our clients develop skills in basic and advanced analytics so that they emerge as “Industry Ready” professionals with better career opportunities. Since 2013, AnalytixLabs has also been featured among the top institutes by prestigious publications such as Analytics India Magazine and Higher Education Review.

Faculty
• Seasoned analytics professionals
• Together we have 30+ years of experience with prestigious firms like McKinsey, KPMG, Deloitte and AOL
• Regular sessions by industry experts

Content
• World class course structure
• Surpasses industry requirements
• Caters to standard certifications
• High quality course material and real life case studies
• Industry best practices

Approach
• 80-20 focus on practical & theory
• Personal attention and individual counselling

Bottom line
• Job-oriented training
• Lucrative job prospects in a high growth domain
• Support for relevant certifications and diplomas
• Career counseling and planning
• Value for money with high return on investment
Global Data science and Big Data skill gap
McKinsey Global Institute estimates a shortage of nearly 1.7 million big data professionals by 2018. This includes a shortage of 140,000 to 190,000 workers with deep technical and analytical expertise, and a shortage of 1.5 million managers and analysts equipped to work with and use big data outputs.
Candidates trained by us are working in leading companies across
industries…
Program Objective

The Data Science using Python program aims to give students an international, wide-spectrum qualification for job-readiness and seamless absorption into Big Data job roles.

The program exposes students and professionals to the role of a Big Data Analyst, who has:

• The ability to translate a business problem into an analytics problem
• An understanding of storage, retrieval and mining of data
• Outcome-oriented, global, industry-specific expertise in critical data analytics and data management skills
• Hands-on practical skills in exploratory, prescriptive and predictive analysis using Python
• Experience applying analytics in various domains, such as e-commerce, retail, telecom and BFSI
• The skills to leverage analytics to drive smart business decisions

Crafted by a team of experts, the program maintains a balance between theoretical concepts and practical applications.
Data Science using Python is a comprehensive program with the following modules, weekly assignments and case studies

Module 1
• Python Foundation – 30 hours + Practice exercises
• Basic data handling, data manipulation and visualization

Module 2
• Machine Learning – 39 hours + Practice exercises
• Supervised & Unsupervised learning (ANN, SVM, KNN)

Module 3
• Text Mining – 9 hours + Practice exercises
• Detailed Text Mining (NLP/NLG)
Data Science using Python-Python Foundation (1/2)
Total Duration: 30 hours live training + Practice
Introduction to Data Science with R and Python
• What is analytics?
• Analytics vs. Data warehousing, OLAP, MIS Reporting
• Relevance in industry and need of the hour
• Types of problems and business objectives in various industries
• How are leading companies harnessing the power of analytics?
• Critical success drivers
• Future of analytics and critical requirements
• Different phases of a typical analytics project

Python Essentials (Core)
• Overview of Python - Starting with Python
• Why Python for data science?
• Anaconda vs. Python
• Introduction to installation of Python
• Introduction to Python Editors & IDEs (Jupyter/IPython)
• Understand Jupyter notebook & Customize Settings
• Concept of Packages - Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
• Installing & loading Packages & Name Spaces
• Data Types & Data objects/structures (Strings, Tuples, Lists, Dictionaries)
• List and Dictionary Comprehensions
• Variable & Value Labels – Date & Time Values
• Basic Operations - Mathematical - String – Date
• Control flow & conditional statements
• Debugging & Code profiling
• Python Built-in Functions (Text, numeric, date, utility functions)
• User defined functions – Lambda functions
• Concept of apply functions
• Python – Objects – OOP concepts
• How to create classes and modules?
• How to call classes and modules?
• Concept of pipelines in Python

Operations with NumPy (Numerical Python)
• What is NumPy?
• Overview of functions & methods in NumPy
• Data structures in NumPy
• Creating arrays and initializing
• Reading arrays from files
• Special initializing functions
• Slicing and indexing
• Reshaping arrays
• NumPy Maths
• Combining arrays
• Basic algebraic operations using NumPy arrays
• Solving linear equations
• Matrix inversions
• Calculating eigenvectors

Overview of Pandas
• What is pandas, its functions & methods
• Pandas Data Structures (Series & Data Frames)
• Creating Data Structures (Data import – reading into pandas)

Accessing/Importing and Exporting Data using Python modules
• Importing Data from various sources (CSV, TXT, Excel, etc.)
• Database Input (Connecting to database)
• Viewing Data objects - subsetting, methods
• Exporting Data to various formats
• Understanding of data
• Important Python modules: Pandas, NumPy
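As an illustration of the NumPy linear algebra topics in this outline (solving linear equations, matrix inversion and eigenvectors), here is a minimal sketch. The 2x2 system and the variable names are hypothetical and are not taken from the course material.

```python
import numpy as np

# Hypothetical 2x2 system chosen for illustration:
#   2x + 1y = 5
#   1x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Solve A @ x = b directly (usually preferred over forming the inverse).
x = np.linalg.solve(A, b)

# Matrix inversion, shown for completeness.
A_inv = np.linalg.inv(A)

# Eigenvalues and eigenvectors; column i of `vecs` pairs with vals[i].
vals, vecs = np.linalg.eig(A)

print("solution:", x)
print("inverse:\n", A_inv)
print("eigenvalues:", vals)
```

Multiplying `A` by `A_inv` should give the identity matrix up to floating-point rounding, which is a quick sanity check on the inversion.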
Data Science using Python-Python Foundation (2/2)
Total Duration: 30 hours live training + Practice
Cleansing Data with Python
• Understand the data
• Subsetting, filtering or slicing data
  • Using [] brackets
  • Using indexing or referring with column names/rows
  • Using functions
• Dropping rows & columns
• Mutation of table (Adding/deleting columns)
• Binning data (Binning numerical variables into categorical variables using cut() and qcut() functions)
• Renaming columns or rows
• Sorting
  • By data/values, index
  • By one column or multiple columns
  • Ascending or Descending
• Type conversions
• Setting index
• Handling duplicates
• Handling missing values – detect, filter, replace
• Handling outliers
• Creating dummies from categorical data (using get_dummies())
• Applying functions to all the variables in a data frame (broadcasting)
• Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays etc.)
• Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime etc.)

Data Analysis – Visualization using Python
• Exploratory data analysis
• Descriptive statistics, Frequency Tables and summarization
• Univariate Analysis (Distribution of data & Graphical Analysis)
• Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
• Creating different graphs using multiple Python packages (Bar/pie/line chart/histogram/stack chart/boxplot/scatter/density etc.)
• Important Packages for Visualization (graphical analysis) – Pandas, Matplotlib, Seaborn, Bokeh etc.

Basic statistics & implementation of stats methods in Python
• Basic Statistics - Measures of Central Tendencies and Variance
• What is a probability distribution?
• Important distributions (discrete & continuous distributions)
• Deep dive into normal distributions and properties
• Concept of sampling & types of sampling
• Concept of standard error and central limit theorem
• Inferential Statistics - Concept of Hypothesis Testing
• Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlation and Chi-square
• Important modules for statistical methods: NumPy, SciPy, Pandas
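The data cleansing outline names the pandas functions cut(), qcut() and get_dummies(). A minimal sketch of how they behave is given below. The toy data frame and its column names are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [22, 35, 47, 58, 63, 29],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Pune"],
})

# cut(): bins with fixed edges chosen by the analyst.
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                        labels=["young", "middle", "senior"])

# qcut(): quantile-based bins holding roughly equal numbers of rows.
df["age_half"] = pd.qcut(df["age"], q=2, labels=["lower", "upper"])

# get_dummies(): one indicator column per category.
dummies = pd.get_dummies(df["city"], prefix="city")

print(df)
print(dummies.columns.tolist())
```

The difference to notice is that cut() fixes the bin edges while qcut() fixes the share of rows per bin, so the two can assign the same value to differently sized groups.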
Data Science using Python –Machine Learning (1/3)
Total Duration: 39 hours live training + Practice
Introduction to Machine Learning & AI
• Introduction to Predictive Modelling
• Types of Business problems - Mapping of Techniques
• Relevance in industry and need of the hour
• Difference between jargon terms, i.e. data science, data analysis, data analytics, data mining
• What is Machine Learning?
• What is the goal of Machine Learning?
• Applications of ML (Marketing, Risk, Operations, etc.)
• Key components of ML
• Overall process of executing an ML project (Data Pre-processing, Sampling, Model Building, Validation)
• Common mistakes made in ML projects and how to overcome them
• Different terms to know for ML

ML Concepts – Learning algorithms
• Major Classes of Learning Algorithms - Supervised vs. Unsupervised vs. Semi-supervised vs. Reinforcement Learning
• Important considerations, like fitment of techniques
• Concept of Overfitting and Underfitting (Bias-Variance Trade-off) & Performance Metrics
• Concept of optimization - Gradient descent algorithm
• Concept of feature engineering
• Regularization (LASSO, LARS, Elastic net and Ridge regression)
• Types of Cross validation (Train & Test, K-Fold validation etc.)
• Cost & optimization functions

Supervised Learning – Regression problems using Linear Regression
• Introduction - Applications
• Assumptions of Linear Regression
• Building Linear Regression Model
• Important steps in Model building
• Need for Data preparation
• Data Audit Report and its importance
• Consolidation/Aggregation - Outlier treatment - Flat Liners - Missing values - Dummy creation - Variable Reduction
• Variable Reduction Techniques - Factor & PCA Analysis
• Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
• Validation of Models (Re-running vs. Scoring)
• Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
• Interpretation of Results - Business Validation - Implementation on new data

Supervised Learning: Classification Problems using Logistic Regression
• Introduction - Applications
• Linear Regression vs. Logistic Regression vs. Generalized Linear Models
• Building Logistic Regression Model
• Important steps in model building
• Understanding standard model metrics (Concordance, Variable significance, Gini, KS, Misclassification, etc.)
• Validation of Logistic Regression Models (Re-running vs. Scoring)
• Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers, etc.)
• Interpretation of Results - Business Validation
• Implementation on new data and Tracking the model

Supervised Learning: Classification & Regression Problems using Decision Trees
• Overview of Decision Trees
• Types of decision trees (Regression Trees, Classification Trees, Oblique Decision Trees)
• Types of decision tree algorithms (CART vs. CHAID vs. C5.0 etc.)
• Concept of objective segmentation
• How to use decision trees to solve regression, classification & segmentation problems
• Rule Based Knowledge: Logic of Rules, Evaluating Rules, Rule Induction and Association Rules
• Construction of Decision Trees through Simplified Examples; Choosing the "Best" attribute at each Non-Leaf node
• Splitting criteria: Entropy, Information Gain, Gini Index, Chi Square, ANOVA
• Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with Numerical Variables; other Measures of Randomness
• Pruning decision trees
• Cost as a consideration
• Fine tuning the model using tuning parameters
• Model validation
• Overfitting - Best Practices to avoid
• Implementation of Solution
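The decision tree outline lists entropy, information gain and the Gini index as splitting criteria. The sketch below shows one common way to compute them in plain Python. The function names and the eight-row example split are hypothetical, and tree libraries may differ in details.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical split of 8 labelled rows into two child nodes.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3

print(entropy(parent), gini(parent), information_gain(parent, [left, right]))
```

A candidate split with a higher information gain (or a larger drop in Gini impurity) separates the classes better, which is the basis for choosing the "best" attribute at each non-leaf node.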
Data Science using Python –Machine Learning (2/3)
Total Duration: 39 hours live training + Practice
Supervised Learning: Classification & Regression Problems using Ensemble Learning
• What is the concept of Ensemble learning (Stacking, Mixture of Experts)?
• Types of ensemble models (homogeneous, heterogeneous)
• Logic, Practical Applications
• Ensemble learning techniques
  • Bagging
  • Random Forest
  • Boosting
  • AdaBoost
  • Gradient Boost
  • XGBoost
• Fine tuning the model using tuning parameters

Supervised Learning: Classification & Regression Problems using KNN
• What is the concept of Instance based learning?
• What is KNN?
• KNN method for regression & classification
• KNN method for missing value imputation
• Computation of Distance Matrix
• The Optimum K value
• Model Building, Validation & Evaluation of Model
• Advantages & Disadvantages of KNN Models
• Applications of KNN in collaborative filtering, digit recognition

Supervised Learning: Classification & Regression Problems using Bayesian Techniques
• Fundamentals of Probability; Conditional and Marginal Probability; Bayes Theorem and its Applications
• Probabilities - The Prior and Posterior Probabilities
• Bayesian Belief nets, MAP, Naïve Rule and Naïve Bayes
• Naïve Bayes for classification - Data Processing - Discretization of Features
• Applications of Naïve Bayes in Text Mining, Spam Engines and Classifications
• Model Building, Validation and Evaluation of model
• Pros/Cons of Naïve Bayes Models

Supervised Learning: Regression & Classification problems using Support Vector Machines
• What are Support Vector Machines?
• Understanding SVM
• Concepts of Linearly separable vs. non-separable data
• Mathematical Intuition (Kernel Methods Revisited, Quadratic Optimization and Soft Constraints)
• Train/Test/Tune the Model using SVM
• Applications and Interpretation

Unsupervised Learning: Segmentation problems using Cluster analysis
• Introduction to Segmentation
• Types of Segmentation (Subjective vs. Objective, Heuristic vs. Statistical)
• Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
• Concept of Distance and related math background
• Segmentation Techniques
• K-Means/K-Medians Clustering
• Density Based clustering (DBSCAN)
• Identifying number of segments (Pseudo F-value, Silhouette score, elbow method etc.)
• Cluster evaluation and profiling
• Identifying the characteristics of segmentation
• Interpretation of results - Implementation on new data
• Overview of other unsupervised learning techniques (Factor analysis, Hidden Markov models, Gaussian mixture models etc.)
Data Science using Python –Machine Learning (3/3)
Total Duration: 39 hours live training + Practice
Supervised Learning: Forecasting problems using Time Series Analysis

Forecasting overview
• What is forecasting?
• Applications of forecasting

Basics of Time Series
• Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
• Types of Seasonality (Hourly, daily, weekly, monthly, quarterly etc.)
• Classification of Techniques (Pattern based - Pattern less)
• Important terminology: lag, lead, stationarity, stationarity tests, autocorrelation & white noise, ACF & PACF plots, autoregression, differencing
• Classification of Time Series Techniques (Univariate & Multivariate)

Stationary Time Series Methods
• Moving Averages
• Weighted moving averages
• Exponential Smoothing
• Comparison between MA & ES

Trend Based Time Series
• Linear Regression
• Double exponential smoothing (Holt’s Method)
• Comparison between Regression & DS

Seasonal Time Series
• Decomposition - CMA Method

Advanced Techniques
• Box Jenkins Methodology
• AR, MA, ARMA Models
• ARIMA/SARIMA
• ARIMAX, SARIMAX

Evaluation of Forecasting
• Understanding Forecasting Accuracy
• Goodness Metrics: MSE, MAPE, RMSE, MAD

Supervised Learning: Regression & Classification problems using Neural Networks
• Motivation for Neural Networks and their Applications
• Understand Neural Networks
• Structure of Networks
• Perceptron and Single Layer Neural Network, and Hand Calculations
• Learning in a Multi-Layered Neural Net: Back Propagation and Conjugate Gradient Techniques
• The ANN Model
• Types of Activation functions
• Train/Test/Tune the ANN Model
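To make the comparison between moving averages (MA) and exponential smoothing (ES) concrete, here is a minimal plain-Python sketch together with two of the listed accuracy metrics (RMSE and MAPE). The demand series, the window of 3 and alpha of 0.5 are hypothetical values chosen only for illustration.

```python
from math import sqrt

def moving_average(series, window):
    """Simple moving average; the first value averages series[0:window]."""
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]

def exponential_smoothing(series, alpha):
    """Single exponential smoothing: s[t] = alpha*y[t] + (1 - alpha)*s[t-1]."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; assumes no zero actuals."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly demand figures.
demand = [10, 12, 14, 16, 18]
ma = moving_average(demand, window=3)
es = exponential_smoothing(demand, alpha=0.5)

# Use each smoothed value as the one-step-ahead forecast for the next period.
es_rmse = rmse(demand[1:], es[:-1])
print(ma, es, es_rmse)
```

MA weights the last `window` observations equally, while ES weights all past observations with geometrically decaying weights. On a trending series like this one both lag behind the data, which is the motivation for trend-based methods such as Holt's.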
Data Science using Python –Text Mining NLP/NLG
Total Duration: 9 hours live training + Practice
Introduction to Text Mining
• Unstructured vs. Semi-structured Data
• Text Mining - characteristics, trends
• Domain presentation - discussion of various areas and their applications
• Programming languages designed for Text Mining analysis
• Data Scientist - a profession consisting mainly of working with Text Mining
• Social Media – Characteristics of Social Media
• Applications of Social Media Analytics
• Examples & Actionable Insights using Social Media Analytics

Text Processing using Base Python & Pandas, Regular Expressions
• Text processing using string functions & methods
• Understanding regular expressions
• Identifying patterns in the text using regular expressions

Text Processing with specialized modules like NLTK, sklearn etc.
• Getting Started with NLTK
• Introduction to NLP & NLTK
• Introduction to NLTK Modules (corpus, tokenize, stem, collocations, tag, classify, cluster, tbl, chunk, parse, ccg, sem, inference, metrics, app, chat, toolbox etc.)

Initial data processing and simple statistical tools
• Fundamentals of information retrieval
• Reading data from a file folder, from a text file, from the Internet & Web scraping, Data Parsing
• Cleaning and normalization of data
• Sentence Tokenize and Word Tokenize, removing insignificant words (“stop words”), removing special symbols, removing bullet points and digits, changing letters to lowercase, stemming/lemmatisation/chunking
• Creating Term-Document matrix
• Finding associations
• Removing rare terms (Sparse terms)
• Measurement of similarity between documents and terms
• Visualization of term significance in the form of word clouds
• Tagging text with parts of speech
• Word Sense Disambiguation

Advanced data processing and visualization
• Sentiment analysis
  • Vocabulary approach, based on Bayesian probability methods
• Named entity recognition (NER)
• Methods of data visualization
  • Word length counts plot
  • Word frequency plots
  • Word clouds
  • Correlation plots
  • Letter frequency plot
  • Heat map
• Grouping texts using different methods
  • Data-centric methods
  • K-means
• Classification Models (spam detection, topic modelling)
  • K Nearest Neighbours
  • SVM (Linear Support Vector Classifier)
  • Naive Bayes
  • Decision tree
• Semantic similarity between texts
• Language Models and n-grams - Statistical Models of Unseen Data (Smoothing)

Final Projects
• Sentiment Analysis (Classification, weighted score etc.)
• Word cloud analysis (Examples)
• Segmentation using K-Means/Hierarchical Clustering (Grouping the similar words)
• Classification (Spam/Not spam)
• Topic Modeling (LDA, LSA, Louvain etc.)
• Text Summarization
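The cleaning steps named in this outline (lowercasing, tokenizing, removing stop words, digits and special symbols, then building a term-document matrix) can be sketched with the standard library alone. The stop-word list, the function names and the two-document corpus below are hypothetical. In practice a library such as NLTK or scikit-learn would normally do this work.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real projects would use a fuller one.
STOP_WORDS = {"the", "is", "a", "and", "of", "to", "this"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term i in doc j."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for c in counts] for term in vocab]
    return vocab, rows

# Hypothetical two-document corpus.
docs = ["This offer is FREE, claim the free prize!",
        "The meeting is moved to 3 pm."]
vocab, tdm = term_document_matrix(docs)

for term, row in zip(vocab, tdm):
    print(term, row)
```

Each row of the matrix is one term and each column one document, so rare (sparse) terms show up as rows that are almost entirely zero.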
Course completion and career assistance
Course completion & Certification criteria

• You shall be awarded an AnalytixLabs certificate only after the submission and evaluation of the mandatory course project work. The projects will be provided as a part of the training.

• There is no pass/fail for these assignments and projects. Our objective is to ensure that trainees get strong hands-on experience so that they are well prepared for job interviews and for performing well at their jobs.

• In case the assignments and projects are not up to the mark, trainees are welcome to take help and support for improvement.

• While a weekly schedule is shared with trainees for regular assignments, candidates get 3 months after course completion to submit their final assignment and projects.

What is included in career assistance?

• After successful course completion, candidates can seek assistance from AnalytixLabs for profile building. A team of seasoned professionals will help you based on your overall education background and work experience. This will be followed by interview preparation along with mock interviews (if required).

• Job referrals are based on the requirements we get from various organizations, HR consultants and the large pool of AnalytixLabs’ ex-students working in various companies.

• No one can truthfully provide a job guarantee, particularly for good quality job profiles in Analytics. However, most of our students do get multiple interview calls and good career options based on the skills they learn during training. For this there will be continuous support from our side for as long as required.
Time and investment
Full interactive online training: 78 hours live training + Practice (~120 hours),
INR 32,000 + 18% GST / $1200 (foreign nationals) including taxes

Data Science using Python (self-paced): ~78 hours + Practice, INR 27,000 + 18% GST / $900 (foreign nationals)

Timing: 6 hours per weekend live training (Saturday & Sunday 3 hours each) + Practice

Training mode: Fully interactive live online class /Classroom (In Gurgaon and Bangalore center only)
(In addition to the above, you will also get access to the recordings for future reference and self study)

Components: Learning Management System access for courseware like class recordings, study material, Industry-relevant project work
Suggested combo courses – with great offers!
Advance Big Data Science: This is our advanced Big Data training, where attendees gain practical skills not only in Hadoop in detail, but also in advanced analytics concepts through Hadoop-Spark, Cloud Computing and Python with Machine Learning.
Courses included: Certified Big Data Expert + Data Science using Python
Training: 148 hours live training + Practice
INR 48,000 (regular price INR 57,000) + 18% GST / $1450 (foreign nationals), with ~16% Combo discount

Machine Learning and Deep Learning Specialization: This is our advanced data science training, covering Python data handling, Machine Learning with Python, and Artificial Intelligence and Deep Learning with Python.
Courses included: Data Science using Python + AI and Deep Learning with Python
Training: 110 hours live training + Practice
INR 48,000 (regular price INR 57,000) + 18% GST / $1650 (foreign nationals), with ~16% Combo discount

Truly a Data Science Expert: Learn the latest Data Science skills with Python, Hadoop-Spark and Cloud Computing, and also learn Deep Learning concepts with Python to move into a true Data Scientist role!
Courses included: Certified Big Data Expert + Data Science with Python + AI & Deep Learning (Python)
Training: 110 hours live training + 66 hours video + Practice
INR 60,000 (regular price INR 82,000) + 18% GST / $1850 (foreign nationals), with ~27% Combo discount
We provide training in both ‘fully interactive live online’ and classroom* mode

• Fully interactive live online class with personal attention
• Access to quality training and 24x7 practice sessions available at the comfort of your place
• Saves commuting time and resources in today’s chaotic world
• Ensures best use of time and resources
• Delivered lectures are recorded and can be replayed by individuals as per their needs
• Studies prove that online education beats the conventional classroom
• One of the strongest global trends in education, both in developing and developed countries

*Classroom only available at Gurgaon and Bangalore center


Contact Us

Visit us on: http://www.analytixlabs.in/

For course registration, please visit: http://www.analytixlabs.co.in/course-registration/

For more information, please contact us: http://www.analytixlabs.co.in/contact-us/

Or email: [email protected]
Call us, we would love to speak with you: (+91) 9555219007

Join us on:
Twitter - http://twitter.com/#!/AnalytixLabs
Facebook - http://www.facebook.com/analytixlabs
LinkedIn - http://www.linkedin.com/in/analytixlabs
Blog - http://www.analytixlabs.co.in/category/blog/
Visit Us

Gurgaon Address:
GF 382, Sector 29,
Adjoining IFFCO Chowk Metro Station (Gate 2),
Next to Vasan Eye Care Hospital,
Gurgaon, Haryana 122001,
India

Bengaluru Address:
Bldg 41, First floor,
14th Main Road, Near BDA complex,
Sector 7, HSR Layout,
Bengaluru - 560102
Landmark: Max store
