Python for Data Science
Sankalp Gabbita
Graduate Student, Data Science and Business Analytics
UNC Charlotte
How is Data used?
• Analytics: the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions (Davenport and Harris 2007).
Data → Analytical Tools → Actionable Knowledge
Unicorn data scientist?
(Figure: Collecting, Cleaning, Exploring, Transforming, Modelling, Evaluating, Inference)
Agenda
• Anaconda and Spyder
• Review of NumPy and pandas: basic data munging
• Using Matplotlib to make visualizations
• Regression concepts
• Regression application (scikit-learn)
• Clustering concepts
• Clustering application (k-means clustering using scikit-learn)
Spyder: Scientific Python Development Environment
• Spyder is an interactive development environment (IDE) for Python with advanced editing, live testing, and a numerical computing environment.
• The Anaconda distribution that ships Spyder also bundles the popular Python libraries NumPy for linear algebra, Matplotlib for interactive 2D/3D graphs, pandas for dataset manipulation, and scikit-learn for machine learning.
• Run code line by line
• Interactively edit and re-run scripts
• Type code directly in the IPython console
• Spyder is available through Anaconda: https://www.continuum.io/downloads
NumPy: Numerical Computing
• N-dimensional array objects, similar to MATLAB arrays
• Used for linear algebra, Fourier transforms, and random number generation
• Capable of matrix operations, string operations, and binary (bitwise) operations
• Easy to install and import with a single line:
  import numpy as np
• The line above imports the NumPy package under the alias np, e.g., np.array([(2, 3), (4, 5)]) creates a 2×2 array.
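A short sketch of the capabilities listed above (array creation, linear algebra, Fourier transforms, random numbers, bitwise operations), reusing the slide's 2×2 example array. The right-hand side vector, the signal, and the seed are arbitrary illustrative values, not taken from the slides.

```python
import numpy as np

# The slide's example: a 2x2 array built from a list of tuples
a = np.array([(2, 3), (4, 5)])

# Linear algebra
a_squared = a @ a                          # matrix multiplication
det = np.linalg.det(a)                     # 2*5 - 3*4 = -2 (up to floating-point rounding)
x = np.linalg.solve(a, np.array([1, 1]))   # solves a @ x = [1, 1]

# Fourier transform of a short, made-up signal
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# Random numbers from a seeded generator (reproducible runs)
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)

# Element-wise arithmetic and bitwise ("binary") operations
doubled = a * 2
bits = np.bitwise_and(a, 1)                # 1 where the entry is odd, 0 otherwise
```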
pandas: DataFrames
• Provides an efficient DataFrame object for data manipulation with integrated indexing
• Reads input data in many formats: CSV, Excel, SQL databases
• Handles messy and missing data
• Slicing, dicing, and indexing of large datasets
• Very useful for cleaning data before applying any algorithm
• Can be imported with a single line:
  import pandas as pd
• e.g., pd.read_table('<file path on local machine>') reads a delimited text file (tab-separated by default)
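A minimal sketch of loading, cleaning, and slicing with pandas. To stay self-contained it reads a small made-up CSV from an in-memory string via io.StringIO rather than a local file; the column names and values are hypothetical, and filling missing ages with the column mean is just one illustrative cleaning choice.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on the local machine
csv_text = """name,age,city
Ann,28,Charlotte
Bob,,Raleigh
Cy,35,Charlotte
"""
df = pd.read_csv(io.StringIO(csv_text))

# Missing data: count the gaps, then fill them with the column mean
missing = df["age"].isna().sum()
df["age"] = df["age"].fillna(df["age"].mean())

# Slicing, dicing, and indexing
charlotte = df[df["city"] == "Charlotte"]        # boolean filtering on a column
first_two = df.iloc[:2]                          # position-based slicing
mean_age_by_city = df.groupby("city")["age"].mean()
```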
Matplotlib: Visualization
• Python 2D plotting library for generating publication-quality figures
• Generates line plots, histograms, bar charts, scatterplots, etc.
• Works directly with NumPy ndarrays
• Full control of font styles, line properties, axes properties, etc.
• Easy to install and import with a single line:
  import matplotlib
• The pyplot module (import matplotlib.pyplot as plt) provides a simple plotting interface and works well with IPython
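A minimal sketch producing two of the plot types listed above from NumPy arrays, using pyplot's figure/axes interface. It selects the non-interactive Agg backend so it runs without a display and saves to a hypothetical file name, example_plots.png; in Spyder or IPython you would typically call plt.show() instead. The data are random and purely illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: y depends linearly on x plus noise
rng = np.random.default_rng(seed=1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram of a single variable
ax1.hist(x, bins=20, color="steelblue")
ax1.set_title("Histogram of x")

# Scatterplot of two variables, with axis labels
ax2.scatter(x, y, s=10, alpha=0.6)
ax2.set_xlabel("x")
ax2.set_ylabel("y")
ax2.set_title("Scatterplot")

fig.tight_layout()
fig.savefig("example_plots.png", dpi=100)
```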
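Regression
One dependent variable Y; independent variables X1, X2, X3, ..., Xk:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$$
• The β coefficients in multiple regression are estimated by least squares, i.e., by minimizing the sum of squared residuals.
• The sizes of the coefficients are not good indicators of the importance of the X variables, because each coefficient depends on the units of its variable and on the correlations among the Xs.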
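A sketch of estimating the β's by least squares on synthetic data, once with np.linalg.lstsq on a design matrix that includes a column of ones for β0, and once with scikit-learn's LinearRegression; both should give the same estimates. The "true" coefficients (4.0, 1.5, -2.0, 0.5) and the noise level are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: Y = 4.0 + 1.5*X1 - 2.0*X2 + 0.5*X3 + noise (made-up values)
rng = np.random.default_rng(seed=42)
n = 500
X = rng.normal(size=(n, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(scale=0.3, size=n)

# Least squares with NumPy: prepend a column of ones so beta_0 is estimated too
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The same model with scikit-learn (the intercept is fitted separately)
model = LinearRegression().fit(X, y)
print("NumPy beta_hat:", beta_hat)
print("sklearn intercept, coef:", model.intercept_, model.coef_)
print("R^2:", model.score(X, y))
```

Note that the estimates land close to the made-up true values only because these synthetic Xs share a common scale; on real data, rescaling a variable (e.g., dollars to thousands of dollars) rescales its coefficient without changing its importance.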
Simple Linear Regression Model
Key Assumptions for Linear Regression
• Linearity
  • The expected value of the dependent variable is a linear combination of the independent variables
• Homoscedasticity
  • Constant variance of the errors
• Normality
  • The errors are normally distributed
• Independence of errors
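One way to check the error assumptions is to examine the residuals of a fitted model. The sketch below uses synthetic data, SciPy's Shapiro-Wilk test for normality, and the Durbin-Watson statistic (computed directly from its formula) for independence. The homoscedasticity check, a correlation between absolute residuals and fitted values, is only a rough heuristic chosen for brevity, not a formal test; residual plots are usually the first thing to look at.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data that satisfies the assumptions by construction
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=200)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Normality of errors: Shapiro-Wilk (a small p-value is evidence against normality)
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Independence of errors: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity (rough heuristic): |residuals| should not trend with the fitted values
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, Durbin-Watson = {dw:.2f}, spread corr = {spread_corr:.2f}")
```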
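Logistic Regression
Binary target: linear regression does not work well because its predictions are unbounded (they can fall below 0 or above 1), while a probability must lie between 0 and 1. Logistic regression instead models the log odds of the outcome as a linear function of the independent variables, where p = P(Y = 1):
$$\log\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$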
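A sketch contrasting the two models on a synthetic binary target generated from a logistic model (the coefficients 0.5 and 1.2 are arbitrary). Linear regression extrapolates outside [0, 1], while LogisticRegression.predict_proba stays inside it, and decision_function returns the fitted log odds. Note that scikit-learn's LogisticRegression applies L2 regularization by default, so its coefficients are slightly shrunk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic binary target whose log odds are linear in x: log(p / (1 - p)) = 0.5 + 1.2 x
rng = np.random.default_rng(seed=3)
x = rng.uniform(-6, 6, size=(300, 1))
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x[:, 0])))
y = rng.binomial(1, p)  # 0/1 outcomes

lin = LinearRegression().fit(x, y)
logit = LogisticRegression().fit(x, y)

x_new = np.array([[-20.0], [0.0], [20.0]])
lin_pred = lin.predict(x_new)                  # unbounded: can go below 0 or above 1
prob_pred = logit.predict_proba(x_new)[:, 1]   # estimated P(Y = 1), always within [0, 1]
log_odds = logit.decision_function(x_new)      # fitted beta_0 + beta_1 * x
```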
Key Assumptions for Logistic Regression
• Linearity
  • Linearity between the independent variables and the log odds
• Homoscedasticity: not required
• Normality of errors: not required
  • Highly skewed independent variables can still be problematic
• Independence of errors: required
Clustering
• Cluster analysis is the generic name for a wide variety of procedures that can be used to create a classification of entities/objects.
• It has also been referred to as Q analysis, typology construction, classification analysis, unsupervised pattern recognition, and numerical taxonomy.
• A deck of 52 cards can be grouped as:
  • 26 red and 26 black cards
  • 13 each of Spades, Hearts, Diamonds, and Clubs
  • 4 each of Aces, Kings, Queens, and Jacks
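The card example shows that the same objects admit several valid groupings, depending on which attribute defines similarity. A small sketch that builds the deck and counts each of the three groupings (the rank labels are an illustrative encoding):

```python
from collections import Counter
from itertools import product

SUIT_COLOR = {"Spades": "black", "Clubs": "black", "Hearts": "red", "Diamonds": "red"}
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

# Build the 52-card deck as (rank, suit) pairs
deck = [(rank, suit) for suit, rank in product(SUIT_COLOR, RANKS)]

# The same 52 objects grouped three different ways
by_color = Counter(SUIT_COLOR[suit] for _, suit in deck)   # 2 groups of 26
by_suit = Counter(suit for _, suit in deck)                # 4 groups of 13
by_rank = Counter(rank for rank, _ in deck)                # 13 groups of 4
```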
A Geometrical View of an Ideal Pattern
(Figure: cases plotted by Importance of Price vs. Importance of Quality)
Reality
(Figure: cases plotted by Importance of Price vs. Importance of Quality)
How to group them?
(Figure: three panels, each plotting Importance of Price vs. Importance of Quality)
Similarity and Distance
• To identify natural groups, we must first define a measure of similarity (proximity) between objects/entities.
• Assume the variables (axes in space) are numeric.
• Then, if two things are similar, they should be close to each other in that space; that is, the distance between them should be small.
• But if two things are dissimilar, they should be well separated from each other in that space; that is, the distance between them should be large.
• A collection of similar things would therefore likely result in more cohesive (homogeneous) groups than a collection of dissimilar things.
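Euclidean distance is one common way to turn "closeness in space" into a number; the slides do not name a specific metric. The sketch computes all pairwise Euclidean distances with NumPy broadcasting for three hypothetical customers scored on importance of price and importance of quality. When variables are on different scales, they are usually standardized first so that no single variable dominates the distance.

```python
import numpy as np

# Three hypothetical customers: (importance of price, importance of quality)
points = np.array([
    [1.0, 9.0],
    [1.5, 8.5],
    [9.0, 2.0],
])


def euclidean_distance_matrix(X):
    """Return D with D[i, j] = Euclidean distance between rows i and j of X."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, d) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))


D = euclidean_distance_matrix(points)
# Customers 0 and 1 are close (similar); customer 2 is far from both (dissimilar)
```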
K-Means Clustering
1. Select k initial cluster centers.
2. Assign each case to its closest center.
3. Update each cluster center to the mean of the cases assigned to it.
4. Re-assign cases to their closest updated centers.
5. Repeat steps 3 and 4 until convergence (the assignments no longer change).
(Figure: points A through K plotted on Dimension 1 vs. Dimension 2, repeated across the k-means iterations)
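A from-scratch sketch of the five steps above (Lloyd's algorithm), followed by scikit-learn's KMeans, which the agenda lists for the clustering application. The initialization (k randomly chosen cases), the blob data, and the seeds are assumptions for illustration. KMeans by default uses the k-means++ initialization and runs n_init restarts, which makes it less likely than a single random start to stop at a poor local optimum.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) following steps 1-5 on the slide."""
    rng = np.random.default_rng(seed)
    # Step 1: select k distinct cases as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every case to its closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        # Step 5: converged once no case changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of the cases assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster ends up empty
                centers[j] = members.mean(axis=0)
    return labels, centers


# Three well-separated, made-up groups of 30 points each
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in [(0, 0), (6, 0), (3, 6)]])

labels, centers = kmeans(X, k=3)

# The same task with scikit-learn
sk = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sk_labels = sk.labels_
```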
Thank You