DA(Unit-1)

DA ppt ipu

Uploaded by

Aditya

DATA ANALYTICS

Unit-1
Sources and Nature of Data
• Data in data analytics comes from various sources and can be
categorized based on its nature and origin.
1. Structured Data
• Definition: Data that is organized and formatted in a specific way,
often in tables with rows and columns.
• Sources: Relational databases, spreadsheets, CSV files.
• Examples: SQL databases, Excel spreadsheets.
2. Unstructured Data
• Definition: Data that lacks a predefined data model or is not
organized in a structured manner.
• Sources: Text documents, images, videos, social media posts.
• Examples: Text files, PDFs, images, videos.
3. Semi-Structured Data
• Definition: Data that is not fully structured but contains some level of
organization, often in the form of tags or elements.
• Sources: XML, JSON, log files.
• Examples: JSON files, XML files, log files.
4. Time-Series Data
• Definition: Data points recorded in time order, typically at regular
intervals.
• Sources: Sensor data, financial market data, weather data.
• Examples: Stock prices, temperature records, IoT sensor data.
5. Geospatial Data
• Definition: Data that includes information about the geographic
location of objects or events.
• Sources: GPS data, maps, satellite imagery.
• Examples: Location tracking data, maps, satellite images.
6. Big Data
• Definition: Extremely large and complex datasets that cannot be
easily managed or processed using traditional data processing tools.
• Sources: Social media, sensors, Internet of Things (IoT) devices.
• Examples: Large-scale social media data, sensor data from smart
cities, IoT-generated data.
7. Transactional Data
• Definition: Data generated as a result of transactions or interactions.
• Sources: E-commerce transactions, financial transactions.
• Examples: Purchase records, banking transactions.
8. Web and Social Media Data
• Definition: Data collected from websites and social media platforms.
• Sources: Web scraping, social media APIs.
• Examples: Tweets, Facebook posts, web pages.
9. Machine-Generated Data
• Definition: Data generated by machines or devices without human
intervention.
• Sources: Sensor data, logs, machine-generated reports.
• Examples: Sensor readings, system logs.
10. Human-Generated Data
• Definition: Data created and input by human users.
• Sources: Surveys, feedback forms, manual data entry.
• Examples: Survey responses, user reviews.
11. Publicly Available Data
• Definition: Data that is accessible to the public.
• Sources: Open data repositories, government datasets.
• Examples: Census data, public health records.
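To make the first three categories concrete, here is a minimal Python sketch that records one hypothetical purchase as structured CSV, semi-structured JSON, and unstructured free text, using only the standard library. The field names and values are invented for illustration.

```python
import csv
import io
import json

# Structured (hypothetical data): a CSV row whose columns are fixed by the header.
csv_text = "order_id,customer,product,amount\n1001,Asha,Laptop,55000\n"
row = next(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured (hypothetical data): self-describing keys plus nesting.
json_text = ('{"order_id": 1001, "customer": {"name": "Asha"}, '
             '"items": [{"product": "Laptop", "amount": 55000}]}')
doc = json.loads(json_text)

# Unstructured (hypothetical data): the same facts, but nothing is labelled.
note = "Asha bought a laptop for 55000 (order 1001)."

print(row["customer"], row["amount"])                       # address columns by name
print(doc["customer"]["name"], doc["items"][0]["amount"])   # follow a path of keys
print("laptop" in note.lower())                             # only crude text search
```

The `csv` module returns every field as a string (`"55000"`), so column types must come from a schema applied by the reader or the database, whereas the JSON parser keeps `55000` as a number.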
Structured Data
• Structured data in data analytics refers to information that is highly
organized and formatted in a way that is easily understandable by
both machines and humans.
• This type of data follows a specific schema or data model, typically
arranged in rows and columns within a relational database or a similar
tabular format.
• Structured data is foundational in data analytics, providing a reliable
and efficient way to store and analyze information.
• It is particularly well-suited for scenarios where data relationships and
integrity are critical, such as in business applications and traditional
relational database systems.
Key characteristics of structured data
Format and Organization:
• Tabular Structure: Structured data is often organized in tables with
rows and columns, where each row represents a record or entry, and
each column represents a specific attribute or field.
• Fixed Schema: The data follows a predefined and fixed schema,
meaning the types and structure of data are well-defined in advance.
Data Types:
• Homogeneous Data Types: Within a column, data types are usually
consistent. For example, a column might contain only numerical
values, dates, or text.
• Well-Defined Data Formats: Each column has a specific data format,
such as integers, floating-point numbers, dates, or strings.
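A fixed schema with one type per column can be sketched as a mapping from column name to a converter function. This minimal Python example assumes a hypothetical employee CSV; `SCHEMA` and `load` are illustrative names, not part of any library. Rows whose values do not fit the declared types are reported instead of loaded.

```python
import csv
import io
from datetime import date

# Hypothetical fixed schema: every column has exactly one declared type.
SCHEMA = {"emp_id": int, "name": str, "salary": float, "joined": date.fromisoformat}

def load(text):
    """Parse CSV text, converting each field to its declared type.
    Rows that do not fit the schema are collected with their line number."""
    good, bad = [], []
    for line_no, raw in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            good.append({col: conv(raw[col]) for col, conv in SCHEMA.items()})
        except (ValueError, KeyError, TypeError) as exc:
            bad.append((line_no, str(exc)))
    return good, bad

data = """emp_id,name,salary,joined
1,Ravi,52000.50,2021-04-01
2,Meera,not-a-number,2022-07-15
"""
rows, errors = load(data)
print(rows)    # the first data row, with int, str, float and date values
print(errors)  # the second data row, rejected because salary is not a number
```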
Examples of Structured Data:
• Relational Databases: Most commonly associated with structured
data, relational databases such as MySQL, PostgreSQL, or Microsoft
SQL Server store data in tables with defined relationships.
• Spreadsheets: Excel sheets or CSV files are examples of structured
data where information is organized into rows and columns.
• Tables in HTML: Data presented in tables on web pages follows a
structured format.
Querying and Analysis:
• SQL Queries: Structured Query Language (SQL) is commonly used to
query and manipulate structured data. SQL allows users to retrieve,
update, and analyze data stored in relational databases.
• Aggregation and Join Operations: Techniques like aggregations and
joins are frequently used to derive meaningful insights by combining
and summarizing structured data.
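The join and aggregation ideas above can be tried with Python's built-in `sqlite3` module and an in-memory database. The tables and rows are hypothetical; the SQL is largely standard and should also run on MySQL, PostgreSQL, or SQL Server with little or no change.

```python
import sqlite3

# In-memory database with two related tables (hypothetical example data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     amount REAL NOT NULL);
INSERT INTO customers VALUES (1, 'Asha', 'Delhi'), (2, 'Ravi', 'Mumbai');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 400.0);
""")

# JOIN links each order to its customer; GROUP BY with COUNT/SUM
# summarizes the joined rows per city.
query = """
SELECT c.city, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.city
ORDER BY total DESC
"""
for city, n_orders, total in conn.execute(query):
    print(city, n_orders, total)
```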
Scalability and Efficiency:
• Efficient Storage: Because every record follows the same schema,
values can be stored compactly with fixed types, and relational
databases are designed to handle large volumes of structured
information.
• Indexing: Indexing is often applied to columns, making data retrieval
faster and more efficient.
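The effect of an index can be observed with SQLite's `EXPLAIN QUERY PLAN`. This sketch, assuming a hypothetical `orders` table filled with synthetic rows, compares the plan for the same lookup before and after `CREATE INDEX`; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
# Synthetic rows (hypothetical): 1000 orders spread over 100 customers.
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(i % 100, float(i)) for i in range(1000)])

def plan(sql):
    """Return the 'detail' text (last column) of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT amount FROM orders WHERE customer_id = 42"
before = plan(q)  # expected: a full table SCAN
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(q)   # expected: a SEARCH that uses idx_orders_customer
print(before)
print(after)
```

Without the index every row must be read; with it, SQLite can jump directly to the rows for one customer.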
Use Cases:
• Business Applications: Structured data is commonly used in business
applications, such as customer relationship management (CRM)
systems, enterprise resource planning (ERP) systems, and financial
databases.
• Reporting and Business Intelligence: Structured data is well-suited for
generating reports and conducting business intelligence analyses due
to its organized and predictable nature.
Challenges:
• Rigidity: The fixed schema can be a limitation when dealing with
evolving or unanticipated data structures.
• Limited Representation: Tables suit flat, regular records but struggle
with deeply nested or highly variable structures and with unstructured
content such as free text or images.
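The rigidity of a fixed schema shows up as soon as a record carries an attribute the table was not designed for. In this minimal `sqlite3` sketch (the `products` table and `color` column are hypothetical), the insert is rejected until the schema itself is altered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table designed without a 'color' attribute.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

# A record with an unanticipated attribute is rejected by the fixed schema...
rejected = None
try:
    conn.execute("INSERT INTO products (id, name, color) VALUES (1, 'Pen', 'blue')")
except sqlite3.OperationalError as exc:
    rejected = str(exc)
print("rejected:", rejected)

# ...until the schema is changed explicitly.
conn.execute("ALTER TABLE products ADD COLUMN color TEXT")
conn.execute("INSERT INTO products (id, name, color) VALUES (1, 'Pen', 'blue')")
print(conn.execute("SELECT * FROM products").fetchall())
```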
Semi-Structured Data
• Semi-structured data in data analytics refers to information that does
not conform to the structure of traditional relational databases, yet
exhibits some level of organization.
• Unlike structured data, which is organized into fixed tables with
predefined schemas, semi-structured data allows for more flexibility
in terms of data representation.
• Semi-structured data strikes a balance between the rigidity of
structured data and the flexibility of unstructured data, making it
suitable for scenarios where the structure of the data is not fixed but
still requires some level of organization and representation.
Key characteristics of semi-structured data
Flexible Structure:
• No Fixed Schema: Semi-structured data does not adhere to a rigid,
predefined schema. It allows for variations in the structure of the
data, making it more adaptable to changing requirements.
• Self-Describing: Semi-structured data often includes metadata or tags
that describe the structure and meaning of the data.
Data Formats:
• Common Formats: Semi-structured data is often represented in
formats that provide some level of organization but do not enforce a
strict schema.
• Examples: JSON (JavaScript Object Notation), XML (eXtensible
Markup Language), YAML (YAML Ain't Markup Language).
Hierarchy and Nesting:
• Nested Structures: Semi-structured data can have nested or
hierarchical structures, allowing for the representation of complex
relationships between entities.
• Example: In JSON, objects can contain arrays or other objects,
creating a hierarchical structure.
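Navigating nested, self-describing records can be sketched with Python's `json` module. The two records below are hypothetical: the second carries a `phones` array the first lacks, so optional keys are read with `.get()` instead of being assumed to exist.

```python
import json

# Two hypothetical records from the same feed; no schema forces them to match.
raw = """[
  {"id": 1, "name": "Asha", "address": {"city": "Delhi", "pin": "110001"}},
  {"id": 2, "name": "Ravi", "address": {"city": "Mumbai"},
   "phones": [{"type": "mobile", "number": "555-0101"}]}
]"""
people = json.loads(raw)

for person in people:
    city = person["address"]["city"]          # walk the hierarchy key by key
    pin = person["address"].get("pin")         # optional key: None if absent
    n_phones = len(person.get("phones", []))   # optional nested array
    print(person["id"], city, pin, n_phones)
```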
Use of Tags or Labels:
• Key-Value Pairs: Semi-structured data often uses key-value pairs to
represent information, providing a way to label and organize data
elements.
• Tags and Attributes: XML uses tags and attributes to label and
organize data, allowing for a more flexible structure.
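XML tags and attributes can be read with Python's standard `xml.etree.ElementTree`. The catalogue below is hypothetical. ElementTree implements only a limited subset of XPath (used here for an attribute predicate); full XPath 1.0 requires a third-party library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: tags name the elements, attributes label them.
xml_text = """<catalog>
  <book id="b1" lang="en"><title>Intro to SQL</title><price currency="INR">450</price></book>
  <book id="b2" lang="hi"><title>Data Basics</title></book>
</catalog>"""
root = ET.fromstring(xml_text)

for book in root.findall("book"):
    price = book.find("price")  # None when the optional element is missing
    amount = price.text if price is not None else "n/a"
    currency = price.get("currency", "") if price is not None else ""
    print(book.get("id"), book.get("lang"), book.findtext("title"), amount, currency)

# ElementTree's XPath subset supports predicates on attributes.
english = [b.findtext("title") for b in root.findall(".//book[@lang='en']")]
print(english)
```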
Query and Analysis:
• Query Languages: While semi-structured data can be queried using
traditional SQL in some cases, it is also common to use specialized
query languages or tools designed for the specific format, such as
XPath for XML or JSONPath for JSON.
• Schema-on-Read: Unlike structured data with a schema-on-write
approach, semi-structured data often employs a schema-on-read
approach, where the schema is applied when the data is queried.
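Schema-on-read can be sketched as a reader that keeps raw JSON lines untouched and applies a column list and types only when a query runs. The log lines, the `read_with_schema` helper, and the chosen schema are hypothetical illustrations.

```python
import json

# Raw events stored exactly as they arrived (hypothetical log lines);
# nothing was validated when they were written.
raw_lines = [
    '{"ts": "2024-01-05T10:00:00", "user": "u1", "event": "click", "ms": 120}',
    '{"ts": "2024-01-05T10:00:03", "user": "u2", "event": "view"}',
    '{"ts": "2024-01-05T10:00:07", "user": "u1", "event": "click", "ms": "95", "extra": true}',
]

def read_with_schema(lines, schema):
    """Schema-on-read: project each raw record onto the requested columns,
    converting types and filling missing values with None."""
    for line in lines:
        record = json.loads(line)
        yield {col: (conv(record[col]) if record.get(col) is not None else None)
               for col, conv in schema.items()}

# The schema is chosen at query time, not when the data was stored.
schema = {"user": str, "event": str, "ms": int}
rows = list(read_with_schema(raw_lines, schema))
clicks = [r["ms"] for r in rows if r["event"] == "click"]
print(rows)
print(sum(clicks) / len(clicks))  # average click latency in ms
```

Extra fields (such as `extra`) are simply ignored, and a different query could apply a different schema to the same raw lines.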
Examples of Semi-Structured Data:
• JSON: JSON is widely used for representing semi-structured data. It
allows for nested structures and is commonly used in web
development and data interchange.
• XML: XML provides a hierarchical structure using tags and attributes,
making it suitable for representing semi-structured data with complex
relationships.
• YAML: YAML is a human-readable data serialization format that is
often used for configuration files and data exchange, offering a more
concise syntax compared to XML or JSON.
Use Cases:
• Web Development: Semi-structured data formats like JSON are
commonly used in web development for data exchange between the
server and the client.
• Configuration Files: YAML is often used for configuration files due to
its human-readable and concise syntax.
• Data Interchange: Semi-structured data is suitable for scenarios
where the structure of the data is not fully known in advance or may
evolve over time.
Challenges:
• Interoperability: Different semi-structured data formats may require
different parsing and processing techniques, leading to
interoperability challenges.
• Complexity: While semi-structured data allows for flexibility, it can
also introduce complexity in terms of understanding and managing
the data structure.
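The interoperability point can be illustrated by loading the same hypothetical customer record from JSON and from XML. Each format needs its own parser and its own mapping code before the results are comparable; `from_json` and `from_xml` are illustrative helpers.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in two semi-structured formats.
as_json = '{"id": 7, "name": "Asha", "tags": ["vip", "north"]}'
as_xml = '<customer id="7"><name>Asha</name><tag>vip</tag><tag>north</tag></customer>'

def from_json(text):
    obj = json.loads(text)  # JSON has native numbers and arrays
    return {"id": obj["id"], "name": obj["name"], "tags": obj["tags"]}

def from_xml(text):
    root = ET.fromstring(text)
    # XML has no native number or list types: the id arrives as a string
    # attribute and the list is a run of repeated <tag> elements.
    return {"id": int(root.get("id")),
            "name": root.findtext("name"),
            "tags": [t.text for t in root.findall("tag")]}

print(from_json(as_json) == from_xml(as_xml))
```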
Unstructured Data
• Unstructured data in data analytics refers to information that lacks a
predefined data model or a specific organizational structure.
• Unlike structured data, which is organized into tables with well-defined
columns and rows, unstructured data does not follow a rigid format.
• Unstructured data is often characterized by its diverse and free-form nature,
making it challenging to analyze using traditional database management and
analysis tools.
• Unstructured data is a valuable source of information, and advancements in
technologies like natural language processing, computer vision, and machine
learning have enabled organizations to derive meaningful insights from this
type of data.
• As data analytics continues to evolve, the ability to effectively analyze and
derive insights from unstructured data becomes increasingly important for
decision-making and gaining a comprehensive understanding of complex
information.
Key characteristics of unstructured data
Lack of Formal Structure:
• No Predefined Schema: Unstructured data does not adhere to a
predefined schema or data model. It may include a wide variety of
data types, and the relationships between data elements are not
explicitly defined.
• Varied Formats: Unstructured data can be in the form of text, images,
audio, video, social media posts, emails, and more.
Diverse Content:
• Textual Content: This includes documents, articles, emails, and any
other textual information that is not organized in a tabular structure.
• Media Files: Images, audio, and video files fall under unstructured
data, as they lack a structured format for easy analysis.
• Social Media Feeds: Data from social media platforms, such as
tweets, comments, and posts, is unstructured and often contains
informal language.
Complex Relationships:
• Implicit Relationships: Relationships between data elements in
unstructured data are often implicit and may require advanced
analytics techniques to uncover.
• Context-Dependent: Understanding the context and relationships
within unstructured data may involve natural language processing
(NLP) and machine learning.
Analysis Challenges:
• Text Mining and NLP: Analyzing unstructured textual data requires
techniques such as text mining and natural language processing to
extract meaningful insights.
• Image and Video Analysis: Processing and analyzing unstructured
data in the form of images or videos involve computer vision
techniques.
• Speech Recognition: Transcribing and analyzing unstructured data in
the form of spoken words (audio) may require speech recognition
algorithms.
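The first step of most text-mining pipelines (tokenize, drop stop words, count terms) can be sketched with the standard library alone. The stop-word list and review text are hypothetical; real NLP toolkits provide much larger lists plus stemming or lemmatization.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "and", "was", "it", "to", "of", "but"}

def top_terms(text, k=3):
    """Lowercase, split into word tokens, drop stop words, count terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS).most_common(k)

review = ("The battery is great and the screen is great, "
          "but the battery charger was slow to arrive.")
print(top_terms(review))
```

Turning free text into term counts like this gives unstructured data a tabular shape that ordinary analysis tools can work with.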
Examples of Unstructured Data:
• Text Documents: Word documents, PDFs, and other text-based files
without a defined structure.
• Multimedia Files: Images, videos, and audio recordings that lack a
structured format.
• Social Media Posts: Data from platforms like Twitter, Facebook, and
Instagram, which often include unstructured text and multimedia
content.
• Emails: Email messages and attachments that may contain a mix of
structured and unstructured content.
Use Cases:
• Sentiment Analysis: Analyzing social media posts or customer reviews
to determine sentiment towards a product or service.
• Image Recognition: Identifying objects, people, or scenes within
images using computer vision techniques.
• Speech-to-Text Conversion: Converting spoken words in audio files to
text for analysis.
• Document Classification: Categorizing unstructured textual
documents based on their content.
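Sentiment analysis in its simplest form counts words from positive and negative word lists. This is a toy lexicon-based scorer with hypothetical word lists, not a production method; trained models are normally preferred because plain word counting ignores negation ("not good") and context.

```python
import re

# Hypothetical mini-lexicon; real lexicons hold thousands of scored entries.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "slow", "broken", "hate"}

def sentiment(text):
    """Score = (#positive words - #negative words); label by its sign."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score

reviews = [
    "Great phone, I love the camera.",
    "Delivery was slow and the box was broken.",
    "It arrived on Tuesday.",
]
for r in reviews:
    print(sentiment(r), r)
```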
Storage Challenges:
• Large Volumes: Unstructured data can be voluminous, and storing
and managing it efficiently may require scalable storage solutions.
• Data Variety: The diverse nature of unstructured data may pose
challenges in terms of data storage and retrieval.
Data Integration:
• Integration Challenges: Integrating unstructured data with structured
data sources can be challenging due to the differences in data formats
and structures.
• Data Lakes: Unstructured data is often stored in data lakes, providing
a flexible and scalable repository for various data types.
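A common integration pattern is to extract a key from unstructured text and join it to a structured table. In this sketch the product table, the emails, and the SKU pattern (`P` followed by three digits) are all hypothetical assumptions.

```python
import re

# Structured side: a product table keyed by SKU (hypothetical data).
products = {"P100": {"name": "Laptop", "price": 55000},
            "P200": {"name": "Mouse", "price": 700}}

# Unstructured side: free-text support emails (hypothetical data).
emails = [
    "Hi, my P100 will not boot after the update.",
    "Ordered P200 last week, still not delivered. Also asking about P100.",
    "Please call me back.",
]

# Step 1: extract a joinable key from the text (an assumed SKU pattern).
SKU = re.compile(r"\bP\d{3}\b")
mentions = [(i, sku) for i, body in enumerate(emails) for sku in SKU.findall(body)]

# Step 2: join extracted keys to the structured table; unknown SKUs are skipped.
joined = [(i, sku, products[sku]["name"]) for i, sku in mentions if sku in products]
print(joined)
```

The extraction step is where most of the effort usually lies; once a reliable key exists, the join itself is an ordinary structured-data operation.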