
ITECH WORLD AKTU



Subject Name: Data Analytics (DA)
Subject Code: BCS052

Unit 1:

Syllabus:

1. Introduction to Data Analytics:

• Sources and nature of data
• Classification of data (structured, semi-structured, unstructured)
• Characteristics of data
• Introduction to Big Data platform
• Need of data analytics
• Evolution of analytic scalability
• Analytic process and tools
• Analysis vs reporting
• Modern data analytic tools
• Applications of data analytics

2. Data Analytics Lifecycle:

• Need for data analytics lifecycle


• Key roles for successful analytic projects
• Various phases of data analytics lifecycle:
(a) Discovery
(b) Data preparation
(c) Model planning
(d) Model building
(e) Communicating results
(f) Operationalization

0.1 Introduction to Data Analytics


0.1.1 Definition of Data Analytics
Data Analytics is the process of examining raw data to uncover trends, patterns, and
insights that can assist in informed decision-making. It involves the use of statistical
and computational techniques to process and interpret data.


Key Points:

• Objective: Transform data into actionable insights.

• Methods: Involves data cleaning, processing, and analysis.

• Outcome: Generates insights for strategic decisions in various domains like busi-
ness, healthcare, and technology.

• Tools: Includes Python, R, Excel, and specialized tools like Tableau, Power BI.

Example: A retail store uses data analytics to identify customer buying patterns
and optimize inventory management, ensuring popular products are always in stock.
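
To make this concrete, here is a minimal Python sketch of the retail example, assuming a hypothetical sales.csv file with product, quantity, and date columns; it simply ranks products by units sold.

```python
# Minimal sketch of the retail example above. Assumes a hypothetical
# sales.csv with columns: product, quantity, date.
import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])

# Total units sold per product reveals high-demand items to keep in stock.
top_products = (sales.groupby("product")["quantity"]
                     .sum()
                     .sort_values(ascending=False)
                     .head(10))
print(top_products)
```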

0.1.2 Sources and Nature of Data


Data originates from various sources, primarily categorized as social, machine-generated,
and transactional data. Below is a detailed explanation of these sources:

1. Social Data:

• User-Generated Content: Posts, likes, and comments on platforms like
Facebook, Twitter, and Instagram.
• Reviews and Ratings: Feedback on platforms such as Amazon and Yelp
that reflect customer opinions.
• Social Network Analysis: Connections and interactions between users that
reveal behavioral patterns.
• Trending Topics: Real-time topics gaining popularity, aiding in sentiment
and trend analysis.

2. Machine-Generated Data:

• Sensors and IoT Devices: Data from devices like thermostats, smart-
watches, and industrial sensors.
• Log Data: Records of system activities, such as server logs and application
usage.
• GPS Data: Location information generated by devices like smartphones and
vehicles.
• Telemetry Data: Remote data transmitted from devices, such as satellites
and drones.

3. Transactional Data:

• Sales Data: Information about products sold, quantities, and revenues.


• Banking Transactions: Records of deposits, withdrawals, and payments.
• E-Commerce Transactions: Online purchases, customer behavior, and cart
abandonment rates.
• Invoices and Receipts: Structured records of financial exchanges between
businesses or customers.


Example:
• A social media platform like Twitter generates vast amounts of social data from
tweets, hashtags, and mentions.

• Machine-generated data from GPS in delivery trucks helps optimize routes and
reduce costs.

• A retail store’s transactional data tracks customer purchases and identifies high-
demand products.

0.1.3 Classification of Data


Data can be classified into three main categories: structured, semi-structured, and un-
structured. Below is a detailed explanation of each type:

• Structured Data: Data that is organized in a tabular format with rows and
columns. It follows a fixed schema, making it easy to query and analyze.

– Examples: Excel sheets, relational databases (e.g., SQL).


– Common Tools: SQL, Microsoft Excel.

• Semi-Structured Data: Data that does not have a rigid structure but contains
tags or markers to separate elements. It lies between structured and unstructured
data.

– Examples: JSON files, XML files.


– Common Tools: NoSQL databases, tools like MongoDB.

• Unstructured Data: Data without a predefined format or organization. It
requires advanced tools and techniques for analysis.

– Examples: Images, videos, audio files, and text documents.


– Common Tools: Machine Learning models, Hadoop, Spark.

Example: Email metadata (e.g., sender, recipient, timestamp) is semi-structured,
while the email body is unstructured.

Comparison Table:

| Aspect | Structured Data | Semi-Structured Data | Unstructured Data |
|---|---|---|---|
| Definition | Organized in rows and columns with a fixed schema. | Contains elements with tags or markers but lacks strict structure. | Lacks any predefined format or schema. |
| Examples | SQL databases, Excel sheets. | JSON, XML, NoSQL databases. | Images, videos, audio files, text documents. |
| Storage | Stored in relational databases. | Stored in NoSQL databases or files. | Stored in data lakes or object storage. |
| Ease of Analysis | Easy to query and analyze using traditional tools. | Moderate difficulty due to partial structure. | Requires advanced techniques and tools for analysis. |
| Schema Dependency | Follows a predefined and fixed schema. | Partially structured with flexible schema. | Does not follow any schema. |
| Data Size | Typically smaller than other types. | Moderate; often larger than structured data. | Usually the largest due to diverse formats. |
| Processing Tools | SQL, Excel, and BI tools. | MongoDB, NoSQL, and custom parsers. | Hadoop, Spark, and AI/ML tools. |

Table 1: Comparison of Structured, Semi-Structured, and Unstructured Data
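
The distinction can also be illustrated with a short Python sketch using only the standard library; the sample records below are invented for illustration.

```python
# Illustration of the three data classes; all records here are made up.
import csv, json, io

# Structured: fixed rows and columns with a known schema.
table = io.StringIO("id,name,price\n1,pen,2.5\n2,book,9.0\n")
rows = list(csv.DictReader(table))

# Semi-structured: tagged fields, but record shapes may vary.
doc = json.loads('{"id": 3, "name": "lamp", "specs": {"watts": 60}}')

# Unstructured: free text with no schema; needs ML/NLP rather than queries.
review = "Great lamp, but shipping took two weeks."

print(rows[0]["name"], doc["specs"]["watts"], len(review.split()))
```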

0.1.4 Characteristics of Data


The key characteristics of data, often referred to as the 4Vs, include:

• Volume: Refers to the sheer amount of data generated. Modern data systems
must handle terabytes or even petabytes of data.

– Example: A social media platform like Facebook generates billions of user
interactions daily.

• Velocity: Refers to the speed at which data is generated and processed. Real-time
data processing is crucial for timely insights.

– Example: Stock market systems process millions of trades per second to
provide real-time updates.

• Variety: Refers to the different types and formats of data, including structured,
semi-structured, and unstructured data.

– Example: A company might analyze customer reviews (text), social media
posts (images/videos), and sales transactions (structured data).

• Veracity: Refers to the quality and reliability of the data. High veracity ensures
data accuracy, consistency, and trustworthiness.

– Example: Data from unreliable sources or with missing values can lead to
incorrect insights.

Real-Life Scenario: Social media platforms like Twitter deal with high Volume
(millions of tweets daily), high Velocity (real-time updates), high Variety (text, images,
videos), and mixed Veracity (authentic and fake information).
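
As a concrete illustration of Veracity, the following is a minimal pandas sketch of basic data-quality checks; the file name and columns are hypothetical.

```python
# Minimal veracity (data-quality) checks. Assumes a hypothetical
# reviews.csv with columns: user_id, rating, text.
import pandas as pd

df = pd.read_csv("reviews.csv")

# Missing values per column point to unreliable fields.
print(df.isna().sum())

# Duplicate rows often indicate ingestion errors.
print("duplicates:", df.duplicated().sum())

# Out-of-range ratings fail a simple consistency check.
print("bad ratings:", ((df["rating"] < 1) | (df["rating"] > 5)).sum())
```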

0.1.5 Introduction to Big Data Platform


Big Data platforms are specialized frameworks and technologies designed to handle the
processing, storage, and analysis of massive datasets that traditional systems cannot
efficiently manage. These platforms enable businesses and organizations to derive
meaningful insights from large-scale and diverse data.
Key Features of Big Data Platforms:

• Scalability: Ability to handle growing volumes of data efficiently.

• Distributed Computing: Processing data across multiple machines to improve
performance.

• Fault Tolerance: Ensuring reliability even in the event of hardware failures.

• High Performance: Providing fast data access and processing speeds.

Common Tools in Big Data Platforms:

• Hadoop:

– A distributed computing framework that processes and stores large datasets
using the MapReduce programming model.
– Components include:
∗ HDFS (Hadoop Distributed File System): For distributed storage.
∗ YARN: For resource management and job scheduling.
– Example: A telecom company uses Hadoop to analyze call records for iden-
tifying network issues.

• Spark:

– A fast and flexible in-memory processing framework for Big Data.


– Offers support for a wide range of workloads such as batch processing, real-
time streaming, machine learning, and graph computation.
– Compatible with Hadoop for storage and cluster management.
– Example: A financial institution uses Spark for fraud detection by analyzing
transaction data in real time (a minimal sketch follows this list).

• NoSQL Databases:

– Designed to handle unstructured and semi-structured data at scale.


– Types of NoSQL databases:
∗ Document-based (e.g., MongoDB).
∗ Key-Value stores (e.g., Redis).
∗ Columnar databases (e.g., Cassandra).
∗ Graph databases (e.g., Neo4j).
– Example: An e-commerce platform uses MongoDB to store customer profiles,
product details, and purchase history.
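
As a rough illustration of the Spark example above, here is a minimal PySpark sketch that flags unusually large transactions; the file name, columns, and the 10x threshold are assumptions made for the example, not a prescribed fraud-detection method.

```python
# Minimal PySpark sketch: flag transactions far above an account's average.
# Assumes a hypothetical transactions.csv with columns: account_id, amount.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fraud-sketch").getOrCreate()

tx = spark.read.csv("transactions.csv", header=True, inferSchema=True)

# Per-account average spend.
stats = tx.groupBy("account_id").agg(F.avg("amount").alias("avg_amount"))

# A transaction 10x above the account's average is flagged for review.
flagged = (tx.join(stats, "account_id")
             .where(F.col("amount") > 10 * F.col("avg_amount")))

flagged.show()
spark.stop()
```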

Applications of Big Data Platforms:

• Personalized marketing by analyzing customer preferences.


• Real-time analytics for monitoring industrial equipment using IoT sensors.

• Enhancing healthcare diagnostics by analyzing patient records and medical images.

• Predictive maintenance in manufacturing by identifying patterns in machine
performance data.

Example in Action: Hadoop processes petabytes of clickstream data from a large
online retailer to optimize website navigation and improve the user experience.

0.1.6 Need for Data Analytics Lifecycle


What is the Data Analytics Lifecycle?
The Data Analytics Lifecycle refers to a series of stages or steps that guide the process
of analyzing data from initial collection to final insights and decision-making. It is a
structured framework designed to ensure systematic execution of analytics projects, which
helps in producing accurate and actionable results. The lifecycle consists of multiple
phases, each with specific tasks, and is essential for managing complex data projects.
The key stages of the Data Analytics Lifecycle typically include:

• Discovery: Understanding the project objectives and data requirements.

• Data Preparation: Collecting, cleaning, and transforming data into usable for-
mats.

• Model Planning: Identifying suitable analytical techniques and models.

• Model Building: Developing models to extract insights.

• Communicating Results: Presenting insights and findings to stakeholders.

• Operationalization: Implementing the model or results into a business process.


Need for Data Analytics Lifecycle


A structured approach to managing data analytics projects is crucial for several rea-
sons. The following points highlight the importance of adopting the Data Analytics
Lifecycle:

• Ensures Systematic Approach: The lifecycle provides a systematic framework
for managing projects. It ensures that every step is accounted for, avoiding
randomness in execution and ensuring that tasks are completed in the correct order.

• Minimizes Errors: By following a predefined process, the risk of errors is reduced.
Each stage builds upon the previous one, ensuring accuracy and reliability in data
processing and analysis.

• Optimizes Resource Usage: The lifecycle ensures efficient use of resources, such
as time, tools, and personnel. By organizing tasks in a structured way, projects are
completed more efficiently, avoiding wasted effort and resources.


• Increases Efficiency: With a clear workflow in place, tasks are completed in a
more streamlined manner, making the entire process more efficient. The structured
approach ensures that insights can be derived quickly and accurately.
• Improves Communication: Clear milestones and stages help teams stay aligned
and facilitate communication about the progress of the project. This clarity is
especially useful when different teams or departments are involved.
• Better Decision-Making: The lifecycle ensures that all steps are thoroughly exe-
cuted, leading to high-quality insights. This improves decision-making by providing
businesses with reliable and actionable data.
• Scalable: The lifecycle framework is adaptable to projects of different sizes. Whether
it’s a small-scale analysis or a large, complex dataset, the process can scale accord-
ing to the project requirements.

0.1.7 Key Roles in Analytics Projects


In data analytics projects, various roles contribute to the successful execution and delivery
of insights. Each role plays a vital part in the project lifecycle, ensuring that the right
data is collected, processed, analyzed, and interpreted for decision-making. The key roles
typically include:

• Data Scientist:
– A data scientist is responsible for analyzing and interpreting complex data to
extract meaningful insights.
– They design and build models to forecast trends, make predictions, and identify
patterns within data.
– Data scientists use machine learning algorithms, statistical models, and ad-
vanced analytics techniques to solve business problems.
– Example: A data scientist develops a predictive model to forecast customer
churn based on historical data and trends.
• Data Engineer:
– A data engineer is responsible for designing, constructing, and maintaining the
systems and infrastructure that collect, store, and process data.
– They ensure that data pipelines are efficient, scalable, and capable of handling
large volumes of data.
– Data engineers work closely with data scientists to ensure the availability of
clean and well-structured data for analysis.
– Example: A data engineer designs and implements a data pipeline that ex-
tracts real-time transactional data from an e-commerce platform and stores it
in a data warehouse.
• Business Analyst:
– A business analyst bridges the gap between the technical team (data scientists
and engineers) and business stakeholders.


– They are responsible for understanding the business problem and translating
it into actionable data-driven solutions.
– Business analysts also interpret the results of data analysis and communicate
them in a way that is understandable for non-technical stakeholders.
– Example: A business analyst analyzes customer feedback data and interprets
the results to help the marketing team refine their targeting strategy.

• Project Manager:

– A project manager oversees the overall execution of an analytics project,
ensuring that it stays on track and is completed within scope, time, and budget.
– They coordinate between teams, manage resources, and resolve any issues that
may arise during the project.
– Project managers also ensure that the project delivers business value and meets
stakeholder expectations.
– Example: A project manager ensures that the data engineering team delivers
clean data on time, while also coordinating with the data scientists to make
sure the model development phase proceeds smoothly.

0.1.8 Phases of Data Analytics Lifecycle


1. Discovery: Understanding business needs and data requirements.

2. Data Preparation: Cleaning and transforming data.

3. Model Planning: Designing algorithms and techniques.

4. Model Building: Implementing models using tools like Python or R.

5. Communicating Results: Presenting insights using visualization tools like Tableau.

6. Operationalization: Deploying the model into production.

Example: A retail company builds a model to predict customer churn and integrates it
into their CRM system.
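
A minimal sketch of the Model Building and Communicating Results phases using scikit-learn follows; churn.csv, its columns, and the choice of logistic regression are assumptions made for illustration, not a prescribed method.

```python
# Minimal churn-model sketch (Model Building). Assumes a hypothetical
# churn.csv with numeric feature columns and a 0/1 "churned" label.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("churn.csv")
X, y = df.drop(columns=["churned"]), df["churned"]

# Hold out a test set to estimate performance on unseen customers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Communicating Results: report a simple, stakeholder-friendly metric.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```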
