BCA Lecture I
Introduction &
Process Overview
Important Terms
• Data Science: Data Science supports the optimization of
processes and resources. It produces actionable insights and
data-informed conclusions or predictions that you can use to
understand and improve business, investments, health and
lifestyle.
• Big data: Big Data is a term used to describe a collection of
data that is huge in volume and still growing exponentially with
time.
• Example: Spotify, an on-demand music platform, collects data from
all its users around the globe and applies Big Data Analytics to it
to give informed music recommendations and suggestions to every
individual user.
Types of Big data
1. Structured data: Any data that can be stored,
accessed and processed in a fixed format is
termed structured data.
2. Unstructured data: Data that has no known
form, cannot be stored in an RDBMS, and cannot be
analyzed unless it is transformed into a structured
format is called unstructured data. This type poses
multiple challenges for processing it and
deriving value from it. Text files and multimedia
content such as images, audio, and video are examples of
unstructured data.
3. Semi Structured data: Semi-structured data can
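As a rough illustration of the three types listed above, the sketch below holds one tiny sample of each. All values are made up for illustration, and the choice of CSV for structured data and JSON for semi-structured data is an assumption, not something the lecture specifies.

```python
import csv
import io
import json

# Structured: every row follows the same fixed set of columns (hypothetical data).
structured_csv = "user_id,track,plays\n1,Song A,12\n2,Song B,7\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: keys describe the values, but records need not share fields.
semi_structured = json.loads(
    '[{"user_id": 1, "device": "phone"},'
    ' {"user_id": 2, "tags": ["rock", "live"]}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Loved the new album, though track three felt too long."


def field_names(records):
    """Return the sorted set of keys used across a list of dict records."""
    return sorted({key for record in records for key in record})


print(field_names(rows))             # same columns on every row
print(field_names(semi_structured))  # fields differ from record to record
print(len(unstructured.split()), "words, but no fields to query")
```

The structured rows can be queried by column straight away, while the free text would first need some transformation (for example, keyword extraction) before it could be analyzed in the same way.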
Characteristics of Big data
1. Volume: The name Big Data itself refers to a size which is
enormous. The size of the data plays a crucial role in determining
the value that can be derived from it.
2. Variety: Variety refers to heterogeneous sources and the nature
of data, both structured and unstructured. Data comes in the form of
emails, photos, videos, monitoring-device readings, PDFs, and audio.
This variety of unstructured data poses certain issues for storing,
mining and analyzing data.
3. Velocity: The term velocity refers to the speed of generation of
data. How fast the data is generated and processed to meet
demand determines the real potential in the data.
Big data velocity deals with the speed at which data flows in
from sources like business processes, application logs, networks,
social media sites, sensors, mobile devices, etc. The flow of
data is massive and continuous.
Applications of Big data
1. Smarter Healthcare: Making use of petabytes of patient data, an
organization can extract meaningful information and then build applications that
can predict a patient’s deteriorating condition in advance.
2. Search Quality: Every time we search for information on Google, we are
simultaneously generating data for it. Google stores this data and uses it to
improve its search quality.
3. Manufacturing: Analyzing big data in the manufacturing industry can reduce
component defects, improve product quality, increase efficiency, and save
time and money.
4. Telecom: The telecom sector collects information, analyzes it and provides solutions
to different problems. By using Big Data applications, telecom companies have
been able to significantly reduce data packet loss, which occurs when networks
are overloaded, and thus provide a seamless connection to their customers.
Challenges of Big data
1. Data Quality: The problem here is the 4th V, i.e. Veracity.
The data is often very messy, inconsistent and incomplete.
Dirty data is estimated to cost companies approx. $600 billion.
2. Security: Since the data is huge in size, keeping it secure is
another challenge. It includes user authentication, restricting
access per user, recording data access histories, proper
use of data encryption, etc.
3. Storage: The more data an organization has, the more
complex the problems of managing it can become. It needs a
storage system which can easily scale up or down on demand.
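To make the data quality challenge above concrete, here is a minimal sketch of a check for messy, inconsistent and incomplete records. The records, the field names and the 0 to 120 age range are hypothetical choices for illustration; a real project would define its own rules.

```python
# Hypothetical records; None or "" marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "pune "},
    {"id": 3, "age": -5, "city": ""},
    {"id": 1, "age": 34, "city": "Pune"},  # repeated id
]


def quality_report(rows):
    """Count missing values, out-of-range ages, and repeated ids."""
    missing = sum(1 for r in rows for v in r.values() if v is None or v == "")
    invalid_age = sum(
        1 for r in rows
        if isinstance(r["age"], int) and not 0 <= r["age"] <= 120
    )
    seen = set()
    duplicate_ids = 0
    for r in rows:
        if r["id"] in seen:
            duplicate_ids += 1
        seen.add(r["id"])
    return {
        "missing": missing,
        "invalid_age": invalid_age,
        "duplicate_ids": duplicate_ids,
    }


print(quality_report(records))
```

A report like this only counts problems; deciding whether to drop, correct or keep the affected records is a separate step.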
Significance of Data Science
1. The principal purpose of Data Science is to find patterns
within data. It uses various statistical techniques to
analyze and draw insights from the data.
2. A Data Scientist must scrutinize the data thoroughly.
• Make predictions from the data.
• Assist companies in making smarter business decisions.
• Data Science turns raw data into meaningful insights.
• Therefore, industries need Data Science.
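As one small example of finding a pattern and making a prediction from data, the sketch below fits a straight line by ordinary least squares. The lecture does not name a specific technique, so the choice of linear regression and the spend and sales figures are assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical data: monthly advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(spend, sales)

# Use the fitted pattern to predict sales for an unseen spend value.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

The fitted slope is the pattern (how sales change with spend), and the last line turns that pattern into a prediction a business could act on.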
Data Science, Machine Learning and Artificial
Intelligence
[Figure labels: Domain Expertise, Mathematics, Computer Science,
Collaborative Skills, Business Awareness, Soft skills]
Data Science process
1. Setting the research goal
2. Retrieving data
3. Data Preparation
4. Data Exploration
5. Data Modelling
6. Presentation and automation
Setting the research goal
• First prepare a project charter.
• This charter contains information such as what you’re
going to research, how the company benefits from that,
what data and resources you need, a timetable, and
deliverables.
• The project charter contains the details about which data
you need and where you can find it.
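The contents of a project charter described above can be sketched as a simple record. The class name, field names and example values below are hypothetical; they only mirror the items the slide lists (research goal, company benefit, data and resources, timetable, deliverables).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProjectCharter:
    """Hypothetical container for the charter items listed on the slide."""
    research_goal: str
    company_benefit: str
    data_sources: Dict[str, str]  # which data you need -> where to find it
    resources: List[str]
    timetable: Dict[str, str]     # milestone -> target date
    deliverables: List[str]

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Example values are invented for illustration only.
charter = ProjectCharter(
    research_goal="Understand why customers cancel subscriptions",
    company_benefit="Fewer cancellations and better retention offers",
    data_sources={"subscription history": "internal billing database"},
    resources=["one analyst", "database access"],
    timetable={"first report": "end of month 1"},
    deliverables=[],
)

print(charter.missing_sections())
```

Checking for empty sections before starting work is one way to confirm the charter is complete.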
Retrieving data