
INTRODUCTION TO DATA SCIENCE

This document provides an introduction to Data Science, covering its definition, importance, applications, challenges, tools, and components. It emphasizes the interdisciplinary nature of Data Science, its reliance on statistics and computer science, and its role in extracting insights from large data sets. Additionally, it discusses the quality and categorization of data, highlighting the significance of structured and unstructured data in the field.

Uploaded by

shahom2221.lic

INTRODUCTION TO DATA SCIENCE

(Unit-1)

Directorate of Online Education, SRMIST-2022


Objective
On completion of this session, all learners will be able to:
• Define and explain the term Data Science
• Understand why Data Science is needed
• Describe its applications, challenges, tools, and components
• Describe data in terms of quality, nature, and categorization
• Describe data handling, its types, the Data Science process, and job roles

03/01/2025 2 Directorate of Online Education,SRMIST-2022


Introduction to Data Science
• The term Data Science emerged from the evolution of mathematical statistics, data analysis, and big data.
• Data Science is the area of study that involves extracting insights from vast amounts of data using scientific methods, algorithms, and processes.


Introduction to Data Science
• Data Science helps you discover hidden patterns in raw data.
• Data Science is an interdisciplinary field that allows you to extract knowledge from structured or unstructured data.


Introduction to Data Science
• Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data, helping decision makers in areas such as science, engineering, economics, politics, finance, and education.


Introduction to Data Science
• Computer Science
○ Pattern recognition
○ Visualization
○ Data warehousing, high-performance computing, databases (DB), AI
• Mathematics
○ Mathematical modeling


Introduction to Data Science
• Statistics
○ Statistical modeling
○ Stochastic modeling
○ Probability




WHY DATA SCIENCE?


Why Data Science? - Merits
• Data is the oil of today's world.
• With the right tools, technologies, and algorithms, we can turn data into a distinct business advantage.
• Data Science can help you detect fraud using advanced machine learning algorithms.


Why Data Science? - Merits
• It helps you prevent significant monetary losses.
• It allows you to build intelligent capabilities into machines.
• You can perform sentiment analysis to gauge customer brand loyalty.


Why Data Science? - Merits
• It enables you to make better and faster decisions.
• It helps you recommend the right product to the right customer to grow your business.


Data Science
• Lots of data is being collected and warehoused:
○ Scientific experiments
○ Internet of Things
○ Web data


Data Science
○ E-commerce
○ Financial transactions, bank/credit transactions
○ Online trading and purchasing
○ Social networks
○ ...and many more!


Data Science
• Big Data are data sets so large or so complex that traditional methods of storing, accessing, and analyzing them break down or are too expensive.
• However, there is a lot of potential value hidden in this data, so organizations are eager to harness it to drive innovation and competitive advantage.

Big Data
• Big Data technologies and approaches are used to extract value from data-rich environments in ways that traditional analytics tools and methods cannot.




APPLICATIONS OF DATA SCIENCE


Applications of Data Science
• Searching the Internet
• Online price comparison
• Gaming industries
• Image/Speech Recognition
• Recommendation System


Searching the Internet
• Search engines use Data Science to return a specific result within a fraction of a second.
• Search engines: Google, Yahoo, Bing, Ask, AOL, etc.


Online price comparison
• Websites such as PriceRunner, Junglee, and Shopzilla use Data Science mechanisms.
• Data is fetched from the relevant websites using APIs.


Gaming industries
• Games are now designed using machine learning algorithms that improve or upgrade themselves as the player moves up to a higher level.
• In motion gaming, too, your opponent (the computer) analyzes your previous moves and shapes its game accordingly.


Image / Speech Recognition
• Speech recognition systems such as Siri, Google Assistant, and Alexa rely on Data Science techniques.
• Facebook recognizes your friends when you upload a photo with them, with the help of Data Science.


Recommendation System
• Friend suggestions on Facebook
• Video suggestions on YouTube and Instagram
• Product suggestions from Amazon, Flipkart, etc.




CHALLENGES OF DATA SCIENCE


Challenges of Data Science
• Accurate analysis requires a wide variety of information and data
• The available data science talent pool is inadequate
• Management does not provide financial support for a data science team


Challenges of Data Science
• Data is unavailable or difficult to access
• Business decision-makers do not use data science results effectively
• Explaining data science to others is difficult


Challenges of Data Science
• Privacy issues
• A very small organization may be unable to support a Data Science team
• Lack of significant domain expertise


TOOLS FOR DATA SCIENCE




Tools for Data Science
● Data Analysis
○ R
○ Spark
○ Python
○ SAS

● Data Warehousing
○ Hadoop
○ SQL
○ Hive



Tools for Data Science
● Data Visualization
○ R
○ Tableau
○ Raw

● Machine Learning
○ Spark
○ Azure ML studio
○ Mahout



COMPONENTS OF DATA SCIENCE


Components of Data Science
• Statistics
• Visualization
• Machine Learning
• Deep Learning


Deep Learning:
• Deep Learning is a newer area of machine learning research in which the algorithm selects the analysis model to follow.




Machine Learning vs Data Science
• Deep Learning is a newer area of machine learning research in which the algorithm selects the analysis model to follow.


Final thoughts
• Big Data has given rise to Data Science
• Data Science is rooted in solid foundations of mathematics and statistics, computer science, and domain knowledge
• Not everything involving data or science is Data Science!


DATA



Data
• A collection of information in the form of numerical figures is called data.
• Example: marks of 5 students: 88, 98, 56, 76, 74


Raw Data
• When information is collected and presented in no particular order, it is called 'raw data'.


Range
• It is the difference between the highest and lowest values in the data collection.
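As a worked example, the range of the five student marks listed earlier (88, 98, 56, 76, 74) can be computed directly. This is only a small sketch; the variable names are illustrative.

```python
# Marks of 5 students, taken from the "Data" example
marks = [88, 98, 56, 76, 74]

# Range = highest value - lowest value
data_range = max(marks) - min(marks)
print(data_range)  # 98 - 56 = 42
```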


Representation of data
• For easy understanding and meaningful comparison, the 'raw data' is represented in
• 'tabular form',
• 'pictorial form',
• 'graphical form', and
• many other forms




Pictorial form
• A cricketer scores the following runs in 10 matches



QUALITY OF DATA



Quality of Data
• The value of almost anything is directly proportional to its level of quality: higher quality means higher value.
• The general areas of data quality include:
• Accuracy
• Completeness
• Update status
• Relevance
• Consistency (across sources)
• Reliability
• Appropriateness
• Accessibility

Data quality
• The quality of data can be affected by the way it is entered, stored, and managed.
• Addressing data quality requires routine, regular review and evaluation of the data, and ongoing processes termed profiling and scrubbing.
• "Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data."
• "Data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database."

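The two definitions above can be illustrated with a minimal sketch. The records, the valid mark range of 0 to 100, and the helper names `profile` and `scrub` are all assumptions made for this example, not part of any particular tool.

```python
from collections import Counter

# Hypothetical raw records: (student, mark). None and out-of-range values
# stand in for missing or corrupt entries.
records = [("Asha", 88), ("Ravi", None), ("Meena", 98), ("John", -5), ("Asha", 88)]

def profile(rows):
    """Profiling: collect simple statistics about the 'mark' column."""
    marks = [m for _, m in rows]
    valid = [m for m in marks if m is not None]
    return {
        "rows": len(rows),
        "missing": marks.count(None),
        "min": min(valid),
        "max": max(valid),
        "duplicates": sum(c - 1 for c in Counter(rows).values()),
    }

def scrub(rows, low=0, high=100):
    """Scrubbing: drop missing or out-of-range marks and exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        _, mark = row
        if mark is None or not (low <= mark <= high) or row in seen:
            continue
        seen.add(row)
        clean.append(row)
    return clean

report = profile(records)
clean_records = scrub(records)
print(report)
print(clean_records)
```

Profiling only reports what is in the data (here it would flag one missing mark, a suspicious minimum of -5, and one duplicate row); scrubbing is the step that acts on those findings.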
Types of Data



NATURE OF DATA



Nature of Data - Two Types of Data
• Quantitative Data – values that answer questions about the quantity or amount (with units) of what is being measured.
• Examples: income ($), height (inches), weight (pounds)
• Categorical Data (qualitative data) – values that can be separated into different categories, often distinguished by some nonnumeric characteristic.
• Examples: sex, race, ethnicity, zip codes
• Note: zip codes are categorical even though they are written with digits, because they label areas rather than measure a quantity.
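One way to see why zip codes are treated as categorical is to ask which operations make sense on each kind of data. The sketch below uses made-up sample values: averaging heights is meaningful, while for zip codes only counting how often each label occurs is.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample values, for illustration only
heights_in = [64, 70, 68, 72]                     # quantitative: inches
zip_codes = ["10001", "94105", "10001", "60601"]  # categorical: labels made of digits

avg_height = mean(heights_in)    # meaningful: an average height in inches
zip_counts = Counter(zip_codes)  # meaningful: frequency of each label
most_common_zip = zip_counts.most_common(1)[0][0]

print(avg_height, most_common_zip)
```

Storing the zip codes as strings rather than integers is a deliberate choice here: it prevents accidental arithmetic and keeps leading zeros intact.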


Categorical vs. Quantitative - You Decide!
• Length of a song
• Responses in an opinion poll
• Telephone number
• Income of college graduates
• Gender (male/female) of college graduates


Levels of Measurement
• Nominal – characterized by data that consist of names, labels, or categories only
• The data cannot be arranged in an ordering scheme (such as high to low)
• Example: survey responses of yes, no, and undecided


Levels of Measurement
• Ordinal – data can be arranged in some order, but the differences between the data values either cannot be determined or are meaningless
• Examples:
• grade letters (A, B, C, D, F)
• movie ratings (1, 2, 3, 4, 5) – you can compute the difference between two ratings, but it is not meaningful, because a difference of 1 or 2 cannot be compared to other similar differences.
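A short sketch of what ordinal data does support: sorting and picking a middle value work, because they rely only on order. The grade list is hypothetical, and `GRADE_ORDER` is simply an explicit ranking assumed for this example.

```python
from statistics import median_low

# Explicit rank order for the ordinal scale, from worst to best
GRADE_ORDER = ["F", "D", "C", "B", "A"]
rank = {grade: i for i, grade in enumerate(GRADE_ORDER)}

grades = ["B", "A", "C", "B", "F"]  # hypothetical class results

# Ordering is meaningful for ordinal data
sorted_grades = sorted(grades, key=rank.get)

# So is the median, because it depends only on position in the order
median_grade = GRADE_ORDER[median_low([rank[g] for g in grades])]

print(sorted_grades, median_grade)
```

An average of the rank numbers is deliberately not computed: it would treat the gap between A and B as equal to the gap between D and F, which the ordinal level does not guarantee.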


Levels of Measurement (continued)
• Interval – similar to the ordinal level, but the difference between any two data values is meaningful.
• However, there is no natural zero starting point (where none of the quantity is present).
• Example: temperatures in degrees Fahrenheit (0 °F may look like a starting point, but it is an arbitrary point on the scale, not the absence of heat)
• Ratio – similar to the interval level, but has a natural zero starting point (where zero indicates none of the quantity is present)
• Differences and ratios are meaningful
• Example: weights of adult humans, prices of jeans

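The difference between the interval and ratio levels can be checked with a little arithmetic. In this sketch the temperatures and weights are made-up values; converting Fahrenheit to kelvins (a scale with a true zero) shows why "80 °F is twice as hot as 40 °F" is not a valid statement.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit (interval scale) to kelvins (ratio scale)."""
    return (f - 32) * 5 / 9 + 273.15

t1_f, t2_f = 40.0, 80.0

difference_f = t2_f - t1_f  # meaningful on an interval scale: 40 degrees
naive_ratio = t2_f / t1_f   # 2.0, but 80 F is not "twice as hot" as 40 F
true_ratio = fahrenheit_to_kelvin(t2_f) / fahrenheit_to_kelvin(t1_f)  # about 1.08

weight_ratio = 180 / 90     # ratio scale: 180 lb really is twice 90 lb

print(difference_f, naive_ratio, round(true_ratio, 2), weight_ratio)
```
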
Levels of Measurement – YOU DECIDE!
• Body temperature in degrees Fahrenheit of a swimmer
• Collection of phone numbers
• Heart rate (beats per minute) of an athlete


CATEGORIZATION OF DATA


Categorization of Data
• The process of categorization helps us gain an understanding of the data source.
• Industry commonly categorizes big data into two groups, structured and unstructured, but the categorizing doesn't stop there.


Structured Data vs Unstructured Data
• Structured data includes subcategories such as
1. created,
2. provoked,
3. transactional,
4. compiled, and
5. experimental,
• while unstructured data includes subcategories such as
1. captured
2. user-generated


1. Created data
• This is data created for a purpose, such as focus group surveys or asking website users to establish an account on the site (rather than allowing anonymous access).


2. Provoked data
• This is data received after some form of provoking, such as giving someone the opportunity to express a personal view on a topic; for example, customers filling out product review forms.


3. Transactional data
• This is data described as database transactions; for example, the record of a sales transaction.


4. Compiled data
• This is data described as information collected (or compiled) on a particular topic, such as credit scores.


5. Experimental data
• This is data created when someone experiments with data and/or sources of data to explore potential new insights.
• For example, combining or relating sales transactions to marketing and promotional information to determine a (potential) correlation.


Unstructured Data
1. Captured data
• This is data created passively by a person's behavior, such as when you enter a search term on Google; perhaps the creepiest data of all!


2. User-generated data
• This is data generated every second by individuals, for example on Twitter, Facebook, YouTube, and so on.
• Compared with captured data, this is data you willingly create or put out there.


Final words on Data
• To sum up, big data comes with no common or expected format, and the time required to impose a structure on the data has proven to be no longer worth it.


DATA HANDLING



Data Handling
• Data handling refers to the process of gathering, recording, and presenting information in a way that is helpful to others, for instance in graphs or charts.
• Data handling is also sometimes known as statistics, and you will often come across it in the study of both Maths and Science.


This includes
• Collecting data using a planned methodology.
• Recording data with precision and accuracy.
• Analysing data to draw conclusions.
• Sharing data in a way which is useful to others.


Examples of Data Handling
• Taking stock of the remaining items in an inventory.
• Creating a tally chart of what colour eyes classmates have.
• Drawing a pie chart to show how many boys and girls there are in a class.
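The tally chart example can be sketched in a few lines. The list of eye colours is invented for illustration, and `tally_chart` is a hypothetical helper that draws one text row per category.

```python
from collections import Counter

# Hypothetical observations of classmates' eye colours
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

tally = Counter(eye_colours)

def tally_chart(counts):
    """Return one text line per category, e.g. 'brown | ||| (3)'."""
    return [f"{label:<6}| {'|' * n} ({n})" for label, n in counts.most_common()]

for line in tally_chart(tally):
    print(line)
```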


Examples of Data Handling
• Making a bar chart to show different people's favourite colours.
• Finding the mean, mode and median of a data set.
• The National Census
• Voter Polls
• Online Marketing Surveys

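Finding the mean, mode and median of a data set can be sketched with Python's standard `statistics` module. The data set below reuses the five marks from the earlier "Data" example and adds one repeated value (an assumption made here so that the mode is well defined).

```python
from statistics import mean, median, mode

# The five marks from the "Data" example plus one hypothetical repeat (76)
marks = [88, 98, 56, 76, 74, 76]

average = mean(marks)      # sum of the values divided by their count
middle = median(marks)     # middle of the sorted values
most_common = mode(marks)  # most frequently occurring value

print(average, middle, most_common)
```
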
Types of Data Handling
• Bar Graph (Vertical and Horizontal)
• Pictograph
• Line Graph
• Pie Chart
• Scatter Plot
• Box Plot, etc.

DATA SCIENCE PROCESS




Discovery
• Involves acquiring data from all the identified internal and external sources
• This helps you answer the business question


Preparation
• Data can have many inconsistencies that need to be cleaned, such as
• missing values
• blank columns
• an incorrect data format
• You need to process, explore, and condition data before modelling
• The cleaner your data, the better your predictions
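The three kinds of inconsistency listed above can be handled in a small cleaning pass. This is a minimal sketch on invented rows: the column names, the assumption that non-ISO dates use DD/MM/YYYY, and the choice to drop rows with missing values (rather than fill them in) are all illustrative.

```python
from datetime import datetime

# Hypothetical raw rows: a missing value, an all-blank column, mixed date formats
raw_rows = [
    {"date": "2022-01-05", "amount": "120.50", "notes": ""},
    {"date": "05/02/2022", "amount": "",       "notes": ""},
    {"date": "2022-03-09", "amount": "80",     "notes": ""},
]

def parse_date(text):
    """Accept ISO dates or DD/MM/YYYY (an assumed convention); return ISO or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(rows):
    # Drop columns that are blank in every row
    keep = [c for c in rows[0] if any(r[c].strip() for r in rows)]
    cleaned = []
    for r in rows:
        row = {c: r[c].strip() for c in keep}
        row["date"] = parse_date(row["date"])                            # fix format
        row["amount"] = float(row["amount"]) if row["amount"] else None  # missing -> None
        cleaned.append(row)
    # Drop rows that still have a missing value
    return [r for r in cleaned if None not in r.values()]

clean_rows = clean(raw_rows)
print(clean_rows)
```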


Model Planning
• In this stage, you need to determine the method and technique for identifying the relationships between input variables
• Planning for a model is performed by using different statistical formulas and visualization tools
• SQL Analysis Services, R, and SAS/ACCESS are some of the tools used for this purpose
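One common statistical formula for examining the relationship between two variables is the Pearson correlation coefficient. The sketch below computes it by hand on invented data; the variable names `ad_spend` and `sales` are assumptions for illustration, and the slides do not prescribe this particular measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical input variables
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 45, 52]

r = pearson(ad_spend, sales)
print(round(r, 3))  # close to 1: a strong positive linear relationship
```

A value near +1 or -1 suggests a strong linear relationship worth modelling; a value near 0 suggests the variables are not linearly related.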


Model Building
• In this step, the actual model building process starts
• Here, the data scientist splits the dataset into training and testing sets
• Techniques like association, classification, and clustering are applied to the training data set
• The model, once prepared, is tested against the "testing" dataset
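The train/test workflow described above can be sketched end to end with a very simple classifier. Everything here is an assumption for illustration: the tiny labelled data set, the 25% test fraction, and the nearest-centroid classifier, which stands in for whichever classification technique a real project would choose.

```python
import random

# Hypothetical labelled data: (feature_1, feature_2), label
data = [((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.1, 0.9), "low"), ((0.9, 1.1), "low"),
        ((4.0, 4.2), "high"), ((3.8, 4.1), "high"), ((4.2, 3.9), "high"), ((4.1, 4.0), "high")]

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and testing sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_centroids(train):
    """Nearest-centroid classifier: average the feature vectors of each class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    x, y = point
    return min(centroids,
               key=lambda lab: (centroids[lab][0] - x) ** 2 + (centroids[lab][1] - y) ** 2)

train, test = split(data)                 # distribute data for training and testing
model = fit_centroids(train)              # build the model on the training set only
accuracy = sum(predict(model, p) == label for p, label in test) / len(test)
print(accuracy)
```

The key point is that the model never sees the testing rows while it is being built, so the accuracy measured on them is an honest estimate of how it handles new data.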


Operationalize
• You deliver the final baselined model with reports, code, and technical documents in this stage
• The model is deployed into a real-time production environment after thorough testing


Communicate Results
• In this stage, the key findings are communicated to all stakeholders
• This helps you decide if the project results are a success or a failure based on the inputs from the model.


JOB ROLES


Job Roles – Data Science
• Data Scientist
• Data Engineer
• Data Analyst
• Statistician
• Data Architect
• Data Admin
• Business Analyst
• Data/Analytics Manager


Break for Questions


Summary
• In this session we have learnt and understood the challenges of Data Science


Thank you
