
BCA Lecture I

The document provides an overview of data science, its processes, and its significance in deriving insights from big data. It discusses the types and characteristics of big data, applications across various industries, and the challenges faced in managing it. Additionally, it explains the relationship between data science, machine learning, and artificial intelligence, detailing the skills required for data scientists and the steps involved in the data science process.

Uploaded by namrata.valecha
Copyright © All Rights Reserved

Data Science:

Introduction &
Process Overview
Important Terms
• Data Science: Data science is about optimizing processes and resources. It produces insights and actionable, data-informed conclusions or predictions that you can use to understand and improve business, investments, health, and lifestyle.
• Big data: Big data is a term used to describe collections of data that are huge in volume and still growing exponentially over time.
• Example: Spotify, an on-demand music platform, uses big data analytics: it collects data from all its users around the globe, then uses the analyzed data to give informed music recommendations and suggestions to every individual user.
Types of Big data
1. Structured data: Any data that can be stored, accessed, and processed in a fixed format is termed structured data.
2. Unstructured data: Data whose form is unknown, which cannot be stored in an RDBMS, and which cannot be analyzed until it is transformed into a structured format, is called unstructured data. This type poses multiple processing challenges when deriving value from it. Text files and multimedia content such as images, audio, and video are examples of unstructured data.
3. Semi-structured data: Semi-structured data does not conform to a fixed relational schema, but it carries tags or markers that give it some organizational structure; XML and JSON files are common examples.
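The three types can be illustrated with a short Python sketch; the records, field names, and values below are invented for illustration:

```python
# Sketch: the three types of big data, shown with toy records.
import json

# Structured: fixed format, fits a relational table (rows x named columns).
structured_row = {"user_id": 101, "name": "Asha", "age": 23, "city": "Pune"}

# Semi-structured: self-describing tags but no rigid schema --
# note the nested field that a fixed table could not hold directly.
semi_structured = json.loads(
    '{"user_id": 102, "name": "Ravi", "contacts": {"email": "ravi@example.com"}}'
)

# Unstructured: free text (or images/audio/video) with no inherent schema;
# it must be transformed before a model or an SQL query can use it.
unstructured = "Ravi posted: loving the new playlist, great recommendations!"

print(structured_row["city"])                # direct column access
print(semi_structured["contacts"]["email"])  # path varies record to record
print(len(unstructured.split()))             # only crude features without NLP
```

The point of the sketch: the structured record supports direct, uniform access; the semi-structured one needs per-record navigation; the unstructured text yields nothing useful until it is transformed.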
Characteristics of Big data
1. Volume: The name "big data" itself refers to an enormous size. The size of the data plays a crucial role in determining the value that can be extracted from it.
2. Variety: Variety refers to the heterogeneous sources and nature of data, both structured and unstructured. Data arrives in the form of emails, photos, videos, monitoring-device output, PDFs, and audio. This variety of unstructured data poses certain issues for storing, mining, and analyzing the data.
3. Velocity: Velocity refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential of the data. Big data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, and mobile devices. The flow of data is massive and continuous.
Applications of Big data
1. Smarter healthcare: Making use of petabytes of patient data, an organization can extract meaningful information and build applications that predict a patient's deteriorating condition in advance.
2. Search quality: Every time we extract information from Google, we simultaneously generate data for it. Google stores this data and uses it to improve its search quality.
3. Manufacturing: Analyzing big data in the manufacturing industry can reduce component defects, improve product quality, increase efficiency, and save time and money.
4. Telecom: The telecom sector collects information, analyzes it, and provides solutions to different problems. By using big data applications, telecom companies have been able to significantly reduce data packet loss, which occurs when networks are overloaded, thus providing a seamless connection to their customers.
Challenges of Big data
1. Data quality: The problem here is the 4th V, i.e. variability. The data is often messy, inconsistent, and incomplete. Dirty data can cost companies approximately $600 billion.
2. Security: Since the data is huge in size, keeping it secure is another challenge. This includes user authentication, restricting access per user, recording data-access histories, proper use of data encryption, etc.
3. Storage: The more data an organization has, the more complex the problem of managing it becomes. This calls for a storage system that can easily scale up or down on demand.
Significance of Data Science
1. The principal purpose of data science is to find patterns within data. It uses various statistical techniques to analyse the data and draw insights from it.
2. A data scientist must scrutinize the data thoroughly in order to:
• Make predictions from the data.
• Assist companies in making smarter business decisions.
• Data science churns raw data into meaningful insights; this is why industries need data science.
Data Science, Machine Learning and Artificial
Intelligence

1. Data Science Produces Insights


• Data science is distinguished from the other two fields because its goal
is an especially human one: to gain insight and understanding.
• The main distinction is that in data science there is always a human in
the loop: someone is understanding the insight, seeing the figure, or
benefitting from the conclusion.
• This definition of data science thus emphasizes:
• Statistical inference
• Data visualization
• Experiment design
• Domain knowledge
• Communication
Data Science, Machine Learning and Artificial
Intelligence

2. Machine Learning Produces Predictions


• We can think of machine learning as the field of prediction: of
“Given instance X with particular features, predict Y about it.”
• These predictions could be about the future, but they also could
be about qualities that aren’t immediately obvious to a
computer.
• Almost all Kaggle competitions qualify as machine learning
problems: they offer some training data, and then see if
competitors can make accurate predictions about new
examples.
• There’s plenty of overlap between data science and machine
learning. For example, logistic regression can be used to draw
insights about relationships and to make predictions.
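As a sketch of that overlap, the snippet below fits a single logistic regression and uses it in both roles. It assumes scikit-learn and NumPy are installed; the features, labels, and data are synthetic and invented for illustration:

```python
# One logistic regression used two ways: insight (data science)
# and prediction (machine learning). Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two hypothetical features, e.g. hours_listened and account_age;
# label: did the user renew? Renewal is driven mostly by feature 0.
X = rng.normal(size=(200, 2))
y = (2.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Data-science use: inspect coefficients for *insight* --
# which feature relates most strongly to renewal?
print("coefficients:", model.coef_[0])

# Machine-learning use: *predict* the label for a new, unseen instance.
new_user = np.array([[1.5, -0.3]])
print("predicted class:", model.predict(new_user)[0])
```

The same fitted object answers both questions: the coefficients describe a relationship, while `predict` scores a new instance.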
Data Science, Machine Learning and Artificial
Intelligence

3. Artificial Intelligence Produces Actions


• Artificial intelligence is by far the oldest and the most widely recognized of
these three designations, and as a result it’s the most challenging to define.
• One common thread in definitions of AI is that an autonomous agent
executes or recommends actions. Some systems I think should be described
as AI include:

• Game-playing algorithms (Deep Blue, AlphaGo)


• Robotics and control theory (motion planning, walking a bipedal robot)
• Optimization (Google Maps choosing a route)
• Natural language processing (bots). By “bots” here I mean systems meant to interpret natural language and then respond in kind. This can be distinguished from text mining, where the goal is to extract insights (data science), and from text classification, where the goal is to categorize documents (machine learning).
• Reinforcement learning
Case Study: How Would the Three
Be Used Together?

• Suppose we were building a self-driving car, and were working on the specific problem of stopping at stop signs. We would need skills drawn from all three of these fields.
• ML: The car has to recognize a stop sign using its cameras. We construct a dataset of millions of photos of streetside objects, and train an algorithm to predict which of them contain stop signs.
• AI: Once our car can recognize stop signs, it needs to decide when
to take the action of applying the brakes. It’s dangerous to apply
them too early or too late, and we need it to handle varying road
conditions (for example, to recognize on a slippery road that it is not
slowing down quickly enough), which is a problem of control theory.
Case Study: How Would the Three
Be Used Together?

• Data science: In street tests we find that the car’s performance isn’t good enough, with some false negatives in which it drives right by a stop sign.
• After analyzing the street test data, we gain the insight that
the rate of false negatives depends on the time of day: it is
more likely to miss a stop sign before sunrise or after sunset.
• We realize that most of our training data included only
objects in full daylight, so we construct a better dataset
including nighttime images and go back to the machine
learning step.
Skill set needed for Data Scientist

• Domain Expertise
• Mathematics
• Computer Science
• Collaborative Skills
• Business Awareness
• Soft Skills
Data Science process
1. Setting the research goal
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modelling
6. Presentation and automation
Setting the research goal
• First prepare a project charter.
• This charter contains information such as what you’re
going to research, how the company benefits from that,
what data and resources you need, a timetable, and
deliverables.
• The project charter contains the details about which data
you need and where you can find it.
Retrieving data

• The second step is to collect data.


• In this step you ensure that you can use the data in your
program, which means checking the existence of, quality,
and access to the data.
• Data can also be delivered by third-party companies and
takes many forms ranging from Excel spreadsheets to
different types of databases.
Data Preparation
• Data collection is an error-prone process; in this phase you enhance the quality of the data and prepare it for use in subsequent steps.
• This phase consists of three sub-phases:
• Data cleansing removes false values from a data source and inconsistencies across data sources.
• Data integration enriches data sources by combining information from multiple data sources.
• Data transformation ensures that the data is in a suitable format for use in your models.
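The three sub-phases above can be sketched with pandas (assumed installed); the tables, column names, and values are invented for illustration:

```python
# Sketch of the data-preparation sub-phases: cleansing, integration,
# transformation. All data here is made up.
import pandas as pd

# Two raw sources with quality problems (duplicate, missing, negative).
sales = pd.DataFrame({"cust_id": [1, 2, 2, 3],
                      "amount": [100.0, None, 250.0, -50.0]})
customers = pd.DataFrame({"cust_id": [1, 2, 3],
                          "region": ["north", "south", "east"]})

# 1. Cleansing: drop duplicate customers, remove impossible (negative)
#    amounts, and fill the remaining missing amount with the median.
sales = sales.drop_duplicates(subset="cust_id")
sales = sales[sales["amount"].isna() | (sales["amount"] >= 0)]
sales["amount"] = sales["amount"].fillna(sales["amount"].median())

# 2. Integration: enrich sales with region info from a second source.
merged = sales.merge(customers, on="cust_id", how="left")

# 3. Transformation: put the data in a model-friendly format
#    (one-hot encode the categorical region column).
prepared = pd.get_dummies(merged, columns=["region"])
print(prepared)
```

Each step is deliberately simple; in practice the cleansing rules come from the domain, not from a fixed recipe.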
Data Exploration
• Data exploration is concerned with building a deeper
understanding of your data.
• Here we try to understand how variables interact with
each other, the distribution of the data, and whether
there are outliers.
• To achieve this you mainly use descriptive statistics,
visual techniques, and simple modelling.
• This step often goes by the abbreviation EDA, for
Exploratory Data Analysis.
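A minimal EDA pass along these lines, assuming pandas is installed and using a small invented dataset:

```python
# Minimal EDA: distributions, variable interaction, and outliers.
import pandas as pd

df = pd.DataFrame({"age":    [23, 25, 31, 35, 29, 27, 95],   # 95 looks suspect
                   "income": [30, 32, 45, 52, 41, 38, 40]})

# Distribution of each variable: count, mean, std, quartiles.
print(df.describe())

# How do the variables interact? A simple Pearson correlation.
print(df["age"].corr(df["income"]))

# Flag outliers: values more than 2 standard deviations from the mean.
z = (df["age"] - df["age"].mean()) / df["age"].std()
outliers = df[z.abs() > 2]
print(outliers)
```

Descriptive statistics, a correlation check, and a z-score rule are the "simple modelling" end of EDA; visual techniques (histograms, scatter plots) would normally accompany them.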
Data Modelling
• In this phase you use models, domain knowledge, and
insights about the data you found in the previous steps to
answer the research question.
• You select a technique from the fields of statistics,
machine learning, operations research, and so on.
• Building a model is an iterative process that involves
selecting the variables for the model, executing the
model, and model diagnostics.
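The iterative select-execute-diagnose loop might be sketched as follows, assuming scikit-learn is installed; the two candidate techniques and the cross-validation diagnostic are illustrative choices, not the only ones:

```python
# Sketch of iterative modelling: try candidate techniques, run a
# diagnostic (cross-validated accuracy), and keep the best performer.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# A built-in toy dataset stands in for the research question's data.
X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # 5-fold cross-validation as a simple model diagnostic.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")

best = max(scores, key=scores.get)
print("selected model:", best)
```

In a real project the loop also revisits variable selection and may send you back to data preparation, as the stop-sign case study illustrated.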
Presentation and automation

• Finally, you present the results to your business.
• These results can take many forms, ranging from presentations to research reports.
• Business stakeholders may want to use the insights you gained in another project, or to enable an operational process to use the outcome of your model.