Unit - I

Uploaded by

Akshay Dwivedi
Copyright
© © All Rights Reserved
Unit - I

Data Analytics (KIT 601)

SWASTI SINGHAL
Department of CSIT
KIET, Ghaziabad

DA KIT 601 By Swasti Singhal


07/24/2024
SYLLABUS
• Introduction to Data Analytics: Sources and nature of data, classification of data (structured, semi-structured,
unstructured), characteristics of data, introduction to Big Data platform, need of data analytics, evolution of
analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications
of data analytics.
• Data Analytics Lifecycle: Need, key roles for successful analytic projects, various phases of data analytics
lifecycle – discovery, data preparation, model planning, model building, communicating results,
operationalization.
Learning Objectives
CO 1 Discuss various concepts of data analytics pipeline

• Define data and its importance
• Define data analytics and its types
Source of Data
What is Big data
CLASSIFICATION OF DATA
• Data classification is broadly defined as the process of organizing
data by relevant categories so that it may be used and protected more
efficiently. On a basic level, the classification process makes data
easier to locate and retrieve. Data classification is of particular
importance when it comes to risk management, compliance, and data
security.
• Big Data is characterized by huge volume, high velocity, and a wide
variety of data. It comes in 3 types: structured data, semi-structured
data, and unstructured data.
Structured data
• Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository, typically
a database. It covers all data that can be stored in an SQL database
in tables with rows and columns. Such data has relational keys and can
easily be mapped into pre-designed fields. Today, structured data is
the most commonly processed kind of data and the simplest to manage.
Example: Relational data.
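A minimal sketch of structured data, assuming Python's built-in sqlite3 module and a hypothetical "student" table (the table, column names, and rows are illustrative, not from the slides). It shows rows and columns with a key, mapped into pre-designed fields and queried with SQL:

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Pre-designed fields: every row must fit this fixed schema.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO student (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Asha", "CSIT"), (2, "Ravi", "ECE"), (3, "Meena", "CSIT")],
)

# Because the structure is known in advance, elements are directly addressable.
csit_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE dept = ? ORDER BY id", ("CSIT",)
    )
]
print(csit_names)
conn.close()
```

The fixed schema is what makes this kind of data easy to locate and retrieve: the query names a column and a value, and no parsing of the content is needed.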
Unstructured data
• Unstructured data is data that is not organized in a pre-defined
manner or does not have a pre-defined data model, so it is not a good
fit for a mainstream relational database. Alternative platforms exist
for storing and managing unstructured data. It is increasingly
prevalent in IT systems and is used by organizations in a variety of
business intelligence and analytics applications. Example: Word, PDF,
Text, Media logs.
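A minimal sketch of working with unstructured data, assuming a hypothetical free-text customer note (the text is illustrative). Because there is no pre-defined data model, any structure, here simple word counts, has to be derived by the analysis itself:

```python
import re
from collections import Counter

# Hypothetical free-text note: no rows, columns, or tags to query.
note = (
    "Customer called on Monday. The delivery was late and the box "
    "was damaged. Customer wants a refund."
)

# Impose a simple structure: lowercase word tokens and their frequencies.
words = re.findall(r"[a-z]+", note.lower())
counts = Counter(words)

print(counts["customer"], counts["refund"])
```

Word counting is only one of many possible ways to extract structure from text; it is used here because it needs nothing beyond the standard library.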
Semi-Structured data
• Semi-structured data is information that does not reside in a
relational database but has some organizational properties that make
it easier to analyze. With some processing, it can be stored in a
relational database (this can be very hard for some kinds of
semi-structured data), but semi-structured formats exist to save
space. Example: XML data.
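A minimal sketch of semi-structured data, assuming Python's built-in xml.etree.ElementTree module and a hypothetical XML snippet (element names and values are illustrative). The tags give the data some organization, but records need not share the same fields, so flattening them into table-like rows requires a processing step:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the second student has no <email> element.
xml_text = """<students>
  <student id="1"><name>Asha</name><email>asha@example.com</email></student>
  <student id="2"><name>Ravi</name></student>
</students>"""

root = ET.fromstring(xml_text)

# Flatten each record into a row-like dict; missing fields become None.
records = []
for student in root.findall("student"):
    records.append(
        {
            "id": student.get("id"),
            "name": student.findtext("name"),
            "email": student.findtext("email"),
        }
    )

print(records)
```

The missing email shows why storing such data in a relational table takes extra work: the table needs a column for every field that might appear, with empty values where a record omits it.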
CHARACTERISTICS OF DATA
• Volume
• Variety
• Veracity
• Value
• Velocity
• Visualization: Using charts and graphs to visualize large amounts of
complex data is much more effective in conveying meaning than
spreadsheets and reports.
• Variability: Variability is different from variety. A coffee shop may
offer 6 different blends of coffee, but if you get the same blend every
day and it tastes different every day, that is variability. The same is
true of data: if its meaning is constantly changing, that can have a
huge impact on your data.
How Much Data
CERN's Large Hadron Collider
TYPES OF DATA
VARIETIES OF BIG DATA COLLECTED
What is Big data
• "Big data exceeds the reach of commonly used hardware
environments and software tools to capture, manage, and process it
within a tolerable elapsed time for its user population." - Merv Adrian
• "'Big data' refers to datasets whose size is beyond the ability of
typical database software tools to capture, store, manage, and
analyze." - McKinsey Global Institute
Summary
• Definition of Data Analytics
• Data Analytics vs Data Mining
• Definitions of big data
• Classification of big data
• Characteristics of big data
• Applications of Big data
TRADITIONAL ANALYTICS VS BIG DATA ANALYTICS
360-degree view
