Lect1

Uploaded by

shashwat

Python for Data Science

CSE, PEC
About the course

◻ What is data science?

◻ Data all around

◻ Why Python?
Data All Around
• Lots of data is being collected and warehoused
  • Web data, e-commerce
  • Financial transactions, bank/credit transactions
  • Online trading and purchasing
  • Social networks
Limitations of the File-Based Approach

• Separated and Isolated Data

• Duplication of data
• Data Dependence
• Difficulty in representing data from the user’s view
• Data Inflexibility
• Incompatible file formats
Big Data
[Figures: Hype cycle 2014; Hype cycle 2022]
What is Data Science?
• An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data
  • Mathematics
  • Statistics
  • Machine learning and artificial intelligence
  • Specialized programming
• Data science principles apply to all data – big and small
What is Data Science?
• Investigate and analyze a large amount of data to help decision makers
  • Science, engineering, economics, politics, finance, and education
• Computer Science
  • Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics
  • Mathematical modeling
• Statistics
  • Statistical and stochastic modeling, probability
What is data science?
Data science produces insights. Machine learning produces predictions.
Why Python?
• Python libraries
  • Data manipulation and pre-processing
  • Data summary
  • Visualization
  • ML libraries
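The pre-processing and summary steps listed above can be sketched in a few lines. This is only an illustration that uses Python's standard library; the order counts are made-up sample data, and in practice libraries such as pandas, NumPy, and Matplotlib are commonly used for the same steps.

```python
import statistics

# Hypothetical sample: daily order counts, with one missing reading.
raw_orders = [12, 15, None, 11, 20, 18, 25, 9]

# Pre-processing: drop the missing reading before summarising.
orders = [value for value in raw_orders if value is not None]

# Data summary: a few descriptive statistics.
summary = {
    "count": len(orders),
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "min": min(orders),
    "max": max(orders),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```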
Applications
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
DATA SCIENCE APPLICATION EXAMPLES
Types of Data We Have
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  • Social Network, Semantic Web (RDF), …
• Streaming Data
  • You can afford to scan the data only once
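The single-scan constraint on streaming data can be illustrated with a running mean and variance that never stores past values. The sketch below uses Welford's method, which is one common choice and not something the slides prescribe; the `sensor_stream` generator and its readings are hypothetical.

```python
class RunningStats:
    """Mean and sample variance computed in a single pass (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; not defined for fewer than two values.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def sensor_stream():
    # Hypothetical stream: each reading is seen once and then discarded.
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

Only three numbers are kept in memory regardless of how long the stream runs, which is the point of a one-pass algorithm.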
Roles
What To Do With These Data?
• Aggregation and Statistics
  • Data warehousing and OLAP
• Indexing, Searching, and Querying
  • Keyword-based search
  • Pattern matching (XML/RDF)
• Knowledge discovery
  • Data Mining
  • Statistical Modeling
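Two of the operations above, aggregation and keyword-based search, can be sketched on a tiny in-memory dataset. The purchase records and the `search` helper are hypothetical and serve only as illustration; a real system would rely on a database or search engine.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer, product, amount).
purchases = [
    ("alice", "laptop", 900.0),
    ("bob", "mouse", 25.0),
    ("alice", "mouse", 25.0),
    ("carol", "laptop bag", 40.0),
]

# Aggregation: total spend per customer.
totals = defaultdict(float)
for customer, _product, amount in purchases:
    totals[customer] += amount

# Indexing: an inverted index from each word to the records containing it.
index = defaultdict(set)
for position, (_customer, product, _amount) in enumerate(purchases):
    for word in product.lower().split():
        index[word].add(position)


def search(keyword):
    """Return the records whose product name contains the keyword."""
    positions = index.get(keyword.lower(), set())
    return [purchases[i] for i in sorted(positions)]


print(dict(totals))
print(search("laptop"))
```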
What are we going to do with data?

• Descriptive analysis and visualization

• Supervised learning (in particular, regression and classification)

• Unsupervised learning (in particular, clustering and dimensionality reduction)
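As a small illustration of supervised learning, the sketch below fits a straight line by ordinary least squares and uses it to predict an unseen value. The data (study hours and exam scores) is invented and deliberately lies exactly on a line; the `fit_line` helper is hypothetical, and libraries such as scikit-learn are the usual tool for real work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled examples: hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_line(hours, scores)

# Prediction for an input the model has not seen.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

The labelled pairs are what make this supervised: the model learns the mapping from inputs to known outputs. Clustering, by contrast, would receive only the inputs.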
Data Acquisition
• Exploring and defining the methods of obtaining data
  • What data is needed to achieve the goal?
  • How much data is needed?
  • Where and how can this data be found?
  • What legal and privacy concerns should be considered?
Data sources
• Web scraping
• Secondary data
  • GitHub
  • Kaggle
  • KDnuggets
  • UCI Machine Learning Repository
  • US Government’s Open Data
  • FiveThirtyEight
  • Amazon Web Services
  • BuzzFeed
  • Data is Plural
  • Harvard HCI
• Application Programming Interface (API)
  • HTTP request/response cycle
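The HTTP request/response cycle behind an API call can be sketched without touching the network by running a throwaway server on the local machine. Everything here is an assumption made for illustration: the `DatasetHandler` class, the `/datasets/demo` path, and the JSON payload do not correspond to any real service.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class DatasetHandler(BaseHTTPRequestHandler):
    """Hypothetical API: answers every GET with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "rows": [1, 2, 3]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DatasetHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Request: the client sends a method, a path, and headers.
connection = HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/datasets/demo", headers={"Accept": "application/json"})

# Response: the server replies with a status code, headers, and a body.
response = connection.getresponse()
status = response.status
content_type = response.getheader("Content-Type")
payload = json.loads(response.read().decode("utf-8"))

connection.close()
server.shutdown()
server.server_close()

print(status, content_type, payload)
```

Against a real API the client half stays the same in spirit: send a request, check the status code, then parse the body.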
Types of Secondary data
• Administrative and Monitoring Data
• Geospatial Data
  • Traditional satellites, micro- and nano-satellites, and unmanned aerial vehicles (UAVs, e.g. drones)
• Remote Sensing
  • Sensors and the Internet of Things (IoT)
• Telecom Data
  • Call detail records, social media data
• Crowd-sourced Data
  • Mobile apps
Cleaned vs. raw data
• Already cleaned, filtered, and ready to use
Data Cleaning
• Ensuring Valid Analysis
  • Outliers, missing values, typos, erroneous survey codes, illogical values, duplicates, etc.
• Making the Dataset Usable and Understandable
  • Code and document the dataset to make it as self-explanatory as possible
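A minimal cleaning pass over a few of the problems named above (duplicates, inconsistent text, missing values, illogical values) might look like the sketch below. The survey rows and the plausible age range of 0 to 120 are assumptions chosen for illustration, not rules from the course.

```python
# Hypothetical survey rows: (respondent_id, city, age).
raw_rows = [
    (1, "Chandigarh", 34),
    (2, "chandigarh ", 29),
    (2, "chandigarh ", 29),  # exact duplicate
    (3, "Delhi", None),      # missing value
    (4, "Delhi", 31),
    (5, "Mumbai", 30),
    (6, "Mumbai", 290),      # illogical value, possibly a typo
]

# 1. Drop exact duplicates while keeping the original order.
deduplicated = list(dict.fromkeys(raw_rows))

# 2. Normalise text so that "chandigarh " and "Chandigarh" compare equal.
normalised = [(rid, city.strip().title(), age) for rid, city, age in deduplicated]

# 3. Set aside rows with a missing age instead of silently dropping them.
missing_age = [row for row in normalised if row[2] is None]

# 4. Flag ages outside an assumed plausible range for manual review.
illogical_age = [
    row for row in normalised if row[2] is not None and not 0 <= row[2] <= 120
]

clean_rows = [
    row for row in normalised if row[2] is not None and 0 <= row[2] <= 120
]

print(clean_rows)
print(missing_age, illogical_age)
```

Keeping the rejected rows in their own lists documents what was removed and why, which helps keep the dataset self-explanatory.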
Data issues
• ID Variables
  • Each record should be uniquely and fully identifiable
• Illogical Values
• Typos
• Survey Codes and Missing Values
  • Codes like "Do not know" or "Decline to answer"
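Two of these checks can be sketched directly: verifying that an ID variable is unique, and recoding survey codes as missing values while keeping the reason. The numeric codes -88 and -99 are assumed here for illustration; the actual codes depend on each survey's codebook.

```python
from collections import Counter

# Hypothetical codebook: numeric codes that stand for a non-answer.
SURVEY_CODES = {-88: "Do not know", -99: "Decline to answer"}

# Hypothetical responses keyed by a household ID.
responses = [
    {"id": "H001", "income": 42000},
    {"id": "H002", "income": -88},
    {"id": "H003", "income": -99},
    {"id": "H002", "income": 51000},  # repeated ID
]

# ID check: every ID should appear exactly once.
id_counts = Counter(row["id"] for row in responses)
duplicate_ids = sorted(key for key, count in id_counts.items() if count > 1)


def recode(value):
    """Turn a survey code into a missing value plus the reason it is missing."""
    if value in SURVEY_CODES:
        return None, SURVEY_CODES[value]
    return value, None


recoded = []
for row in responses:
    income, reason = recode(row["income"])
    recoded.append({"id": row["id"], "income": income, "missing_reason": reason})

print(duplicate_ids)
print(recoded)
```

Recoding matters because a code such as -88 left in place would be treated as a real income and distort any average.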
