DA Notes

Data analytics notes

Uploaded by

Aisha Emad
Copyright
© All Rights Reserved

Data Analytics Notes

Types of Data Analytics
 Descriptive Analytics: Focuses on summarizing past data to answer "what
happened?". It uses statistical measures like mean, median, and mode and
creates visual reports, dashboards, and simple data queries. This type of
analytics helps businesses understand the current status and performance
indicators.
 Exploratory/Diagnostic Analytics: Aims to answer "why did it happen?" by
investigating data relationships, root causes, and patterns. It often involves
correlations, business dashboards, and analysis models to identify causal
factors behind trends or outcomes.
 Predictive Analytics: Looks at historical data patterns to forecast future
events, answering "what is likely to happen?" It applies techniques like
regression analysis, time series forecasting, machine learning, and deep
learning, often using tools like R and Python. Predictive analytics helps
anticipate risks, identify opportunities, and optimize processes.
 Prescriptive Analytics: Provides recommendations on the best actions to
take based on predictive models, addressing "what should we do?". It uses
optimization models, simulation techniques, and sensitivity analysis to
guide decision-making and improve performance. This type of analytics is
beneficial for tasks like pricing strategies, investment planning, and process
optimization.
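
As a rough illustration of how the predictive and prescriptive types above fit together, the sketch below fits a straight-line demand model to a few price/sales observations (predictive) and then picks the candidate price with the highest predicted revenue (prescriptive). The data, the function names, and the candidate price grid are all invented for illustration and are not taken from these notes; real projects would typically use a statistics or machine learning library instead of hand-written least squares.

```python
# Hypothetical toy data: (price, units sold). Values are invented for illustration.
history = [(8.0, 120), (9.0, 110), (10.0, 100), (11.0, 90), (12.0, 80)]

def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_units(price, intercept, slope):
    """Predictive step: expected units sold at a given price (never below zero)."""
    return max(0.0, intercept + slope * price)

def best_price(candidates, intercept, slope):
    """Prescriptive step: the candidate price with the highest predicted revenue."""
    return max(candidates, key=lambda p: p * predict_units(p, intercept, slope))

intercept, slope = fit_line(history)
candidates = [6.0, 8.0, 10.0, 12.0, 14.0]  # assumed set of prices the business would consider
recommended = best_price(candidates, intercept, slope)

print(f"fitted model: units = {intercept:.1f} + ({slope:.1f}) * price")
print("recommended price:", recommended)
```

The split into `predict_units` and `best_price` mirrors the two questions: the first answers "what is likely to happen?", the second "what should we do?".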

Evolution of Data Analytics
 Data Analytics 1.0 (Business Intelligence Era): Initial stage focused on
structured data reporting through business intelligence (BI) tools like
Power BI. This era primarily involved data collection and reporting on
company metrics.
 Data Analytics 2.0 (Big Data Era): With the rise of big data and NoSQL
databases, this phase introduced tools like Hadoop, AI, and machine
learning for handling vast and varied data types. The emphasis was on
processing large data volumes to extract insights.
 Data Analytics 3.0 (Data Science and Decision-Making Era): Leveraging
data science and tools like ML, DL, Python, and R, this era focuses on using
data analysis for innovation and sustainable decision-making. It helps
companies stay competitive by developing data-driven products, optimizing
services, and supporting R&D.

Data Analytics Process
 Data Collection and Cleaning: Involves gathering data from different
sources (IoT devices, applications, social media, etc.) and storing it in a
centralized system like a cloud database. Cleaning ensures the data is
reliable and free of errors.
 Data Mining: Sorting, processing, and labeling data with metadata to
enable data scientists to identify trends and focus on insights rather than
manual tasks. Machine learning algorithms help automate this step.
 Descriptive and Exploratory Analysis: Summarizes "what is happening?"
and explores "why is it happening?". This stage uses descriptive statistics,
visualizations, and dashboard tools to gain initial insights.
 Predictive and Prescriptive Analysis: Uses trends and historical data to
predict and advise on future actions, supporting decision-makers with
actionable insights.
 Visualization and Reporting: Visualization tools like Power BI, Tableau, and
dashboards simplify complex datasets, enabling managers to understand
alternatives quickly and make informed decisions.

Applications of Data Analytics
Business Functions:
 Marketing: Marketing analytics assess the success of campaigns, customer
engagement, and ROI. AI and machine learning help optimize campaigns,
segment customers, and guide strategic marketing efforts.
 Human Resources (HR): HR analytics reveal insights on talent acquisition,
employee behavior, and retention. Tools allow HR leaders to optimize
recruitment processes, analyze employee decisions, and predict outcomes.
 Sales: Sales analytics identify key factors influencing customer purchases,
like price, seasonality, or availability, allowing teams to improve sales cycles
and forecasts.
 Finance: Financial analytics enhance budget planning, optimize cost
management, and increase profit margins by analyzing spending patterns,
using predictive modeling, and generating machine learning insights.
Industries:
 Transportation: Helps analyze traffic patterns and network congestion.
 Logistics and Delivery: Optimizes shipping routes and delivery times and
tracks shipments.
 Web Services: Enhances search engine algorithms, providing more relevant
search results.
 Manufacturing: Improves operational efficiency through predictive
maintenance, budgeting, and trend analysis.
 Security: Proactively addresses cybersecurity by analyzing and identifying
potential threats.
 Education: Supports student learning and engagement by analyzing
educational data and outcomes.
 Healthcare: Utilizes analytics to provide faster diagnosis, treatment options,
and personalized care by examining patient data in real time.
Benefits of Data Analytics
 Competitive Advantage: Data analytics helps companies understand
industry trends, strategize against competitors, and grow in a changing
environment.
 Efficient Use of Data: Companies collect extensive data; analytics helps
them discern what data is valuable and how best to use it.
 Customer Relationship Building: By analyzing customer behavior,
companies can create personalized experiences, improving loyalty and
satisfaction.
 Strategic Planning and Forecasting: Analytics allows companies to adapt to
trends, optimize processes, and make data-driven decisions.
Future of Data Analytics
 Growing Demand: The data analytics field is expected to expand
significantly, with the market projected to grow by 30% annually, reaching a
value of $77.6 billion.
 Job Opportunities: Data analysts and data scientists are in high demand
across various industries. Roles will likely continue to grow as businesses
across all sectors increasingly rely on data-driven insights.
 Importance Across Sectors: Beyond IT, data analytics is becoming essential
in finance, healthcare, media, entertainment, and mobility, leading to
employment growth.
Data Analytics Tools
 Common Tools: Includes SQL, Excel, R, Python, Tableau, and Power BI.
These tools handle data collection, analysis, visualization, and reporting.
 Specialized Tools:
o Spreadsheets (Excel): Often used for data entry, pivot tables, and
data visualization.
o OLAP (Online Analytical Processing): Used for multidimensional
analysis in databases.
o Statistical and Quantitative Tools: Enable complex data analysis and
decision-making, such as decision trees, TOPSIS, and Bayesian
networks.
o Business Rule Engines (BRE): Help automate business rules and data
handling based on specific criteria.
o Simulation Tools: Model various scenarios to predict outcomes using
mathematical functions.
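
To make the simulation item above more concrete, here is a minimal Monte Carlo sketch: it draws many random "months" from assumed distributions and summarizes the outcomes. Every number and distribution (demand, margin, fixed costs) is a made-up assumption, and the function names are hypothetical; dedicated simulation tools offer far more than this.

```python
import random

def simulate_monthly_profit(rng):
    """One simulated month; all distributions and figures are invented assumptions."""
    units = rng.gauss(1000, 150)          # uncertain demand (mean 1000, std dev 150)
    unit_margin = rng.uniform(4.0, 6.0)   # uncertain margin per unit
    fixed_costs = 3500.0
    return max(units, 0.0) * unit_margin - fixed_costs

def run_simulation(trials=10_000, seed=42):
    """Repeat the scenario many times and summarize the outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    profits = [simulate_monthly_profit(rng) for _ in range(trials)]
    mean_profit = sum(profits) / trials
    loss_probability = sum(p < 0 for p in profits) / trials
    return mean_profit, loss_probability

mean_profit, loss_probability = run_simulation()
print(f"average simulated profit: {mean_profit:.0f}")
print(f"share of simulated months with a loss: {loss_probability:.1%}")
```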
9. Data Cleaning
 Importance: Cleaning is crucial to ensure reliable results. A common saying,
"Garbage in, garbage out," highlights that clean data leads to meaningful
outcomes, whereas unclean data leads to unreliable results.
 Methods:
o Removing Duplicates: Filters out repeat records.
o Deleting Irrelevant Columns: Omits unnecessary data fields to
streamline analysis.
o Handling Missing Data: Missing values can be either imputed based
on other values or excluded from analysis.
o Outlier Management: Identifies and addresses data points that are
extreme or inconsistent with other values.
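
One common way to flag candidate outliers is the interquartile-range (IQR) rule, sketched below. The notes do not prescribe a specific method, so the rule, the multiplier of 1.5, the function name, and the sample scores are all assumptions chosen for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

scores = [72, 75, 78, 80, 81, 83, 85, 88, 90, 12]  # hypothetical test scores
print("flagged as possible outliers:", iqr_outliers(scores))
```

Flagging is only the first half of outlier management; whether a flagged value is an error to remove or a genuine extreme to keep still needs a human decision.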
10. Data Analyst Role
 Responsibilities:
o Collect and clean data, identify trends, and create reports, charts,
and dashboards.
o Communicate insights through data storytelling and visualization.
o Collaborate with stakeholders to turn data insights into actionable
business strategies.
 Required Skills:
o Mathematical proficiency, statistical and programming skills (in
languages like SQL, R, Python), problem-solving, analytical thinking,
and effective communication.
 Tools and Software: Data analysts commonly use tools such as Excel, SQL,
Tableau, Power BI, Python, and R to gather, analyze, and present data.
Data Analysis:
 Process: Exploring, cleaning, transforming, and reporting data.
 Goal: Extract useful insights, suggest conclusions, and support
decision-making.
Tools for Data Analysis:
 OpenRefine, Tableau, KNIME, Google Fusion Tables (discontinued in 2019), NodeXL.
Data Analytics:
 Focus: Using data visualization and statistical models for insights and better
decision-making.
 Defined as: Transforming data into actionable insights within an
organizational context.
Tools for Data Analytics:
 SAS, R, Python (with libraries), Tableau, Apache Spark, MS Excel.
Business Intelligence (BI) vs. Data Analytics
 Focus:
o BI: Explains past performance using consistent metrics to guide planning.
o Data Analytics: Provides insights, predictions, and prescriptions based on
statistical and data transformation methods.
 Purpose:
o BI: Helps in decision-making by analyzing business operations and identifying
areas for improvement.
o Data Analytics: Transforms raw data into usable formats, supports
decision-making, and applies predictive analytics.
 Techniques:
o BI: Uses descriptive analytics to review past data.
o Data Analytics: Involves advanced modeling, cleaning, and predictive
techniques.
 Visualization Tools:
o BI: Dashboards for summarizing and presenting data insights.
o Data Analytics: Focus on creating new insights and visualizing results
dynamically.
 Typical Progression:
o Companies often implement BI first to understand their business, then advance to
Data Analytics for deeper insights and actionable recommendations.
Data Analytics vs. Data Science
 Definition:
o Data Analytics: Analyzes data to extract meaningful insights aligned with
business objectives, solving specific questions or problems.
o Data Science: Explores raw data to uncover insights, often answering open-ended
questions using advanced algorithms, statistical models, and programming.
 Focus:
o Data Analytics: Focuses on visualization and decision-making for defined
problems.
o Data Science: Focuses on exploring and modeling raw data to create predictive or
prescriptive solutions.
 Scope:
o Data Analytics: A subset of data science, addressing specific data-related
questions.
o Data Science: Broader, including data analytics, machine learning, data mining,
and other disciplines.
 Methods:
o Data Analytics: Relies on statistical methods to interpret structured data.
o Data Science: Involves coding, machine learning, and advanced statistical
techniques.
 Objective:
o Data Analytics: Solves well-defined business problems.
o Data Science: Creates new methodologies and models to derive insights from raw
data.
 Commonality:
o Both extract insights to support business decisions, but data science is a broader,
more technical field encompassing analytics.
Business Analytics vs. Data Analytics
 Definition:
o Business Analytics: Uses strategies and technologies to analyze industry-specific
data and guide decision-making for business growth.
o Data Analytics: Transforms raw or unstructured data into meaningful formats for
insights, conclusions, and predictions.
 Focus:
o Business Analytics: Prescribes solutions and plans specific to a business based on
metrics.
o Data Analytics: Explains data patterns and visualizes results using statistical
methods.
 Scope:
o Business Analytics: Measures past performance and aligns metrics with business
planning.
o Data Analytics: Explores and models data to discover new insights.
 Relationship:
o Integration: Companies often combine both; data analytics results are tailored for
use in business analytics decisions.
Data Analyst vs. Data Scientist
 Purpose:
o Data Analyst: Answers existing business questions through data analysis and
visualization.
o Data Scientist: Creates questions, builds models, and predicts future trends using
advanced techniques.
 Focus:
o Data Analyst: Works on data preparation, exploratory analysis, and descriptive
analytics.
o Data Scientist: Develops statistical models, machine learning algorithms, and
prescriptive analytics.
 Skills:
o Data Analyst: Proficient in data visualization tools, statistical analysis, and
database management.
o Data Scientist: Skilled in Python, R, Hadoop, machine learning, and software
development.
 Responsibilities:
o Data Analyst: Prepares reports and visualizations for decision-making.
o Data Scientist: Designs systems and models to automate and optimize operations.
 Complexity:
o Data Analyst: Focuses on simpler, analysis-level insights.
o Data Scientist: Tackles complex problems and builds data models.
 Tools:
o Data Analyst: Uses tools like Excel, Tableau, and SQL for visualization and
analysis.
o Data Scientist: Employs programming languages (Python, R) and platforms for
advanced analytics.
 Outcome:
o Data Analyst: Delivers actionable insights for stakeholders.
o Data Scientist: Creates predictive and prescriptive solutions for long-term
strategies.

Dirty Data: Any data that requires cleaning or preparation before analysis. It
includes:

1. Missing Data:
o Example: Missing values in variables essential for analysis, like customer ages
when analyzing purchasing behavior.
2. Duplicate Data:
o Example: Multiple identical records due to merging data from different sources.
3. Inconsistent or Incorrect Data:
o Example: Structural errors, typos, or inconsistent naming, such as mixed labels
like "Pass/Fail" and "G/B" in the same dataset.
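
The three kinds of dirty data listed above can be counted before any cleaning starts. The following sketch audits a small in-memory dataset; the records, the column names, and the `audit` helper are hypothetical, and `None` is assumed to mark a missing value.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"customer": "Ana", "age": 34, "result": "Pass"},
    {"customer": "Ben", "age": None, "result": "G"},
    {"customer": "Ana", "age": 34, "result": "Pass"},   # duplicate of the first row
    {"customer": "Cleo", "age": 29, "result": "fail"},  # inconsistent label spelling
]

def audit(rows, label_column="result"):
    """Count missing values per column, exact duplicate rows, and label variants."""
    missing = Counter(col for row in rows for col, val in row.items() if val is None)
    row_counts = Counter(
        tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows
    )
    duplicates = sum(count - 1 for count in row_counts.values())
    labels = Counter(
        row[label_column] for row in rows if row[label_column] is not None
    )
    return dict(missing), duplicates, dict(labels)

missing, duplicates, labels = audit(records)
print("missing values per column:", missing)
print("duplicate rows:", duplicates)
print("label variants:", labels)
```

Seeing several spellings in the label counts (here "Pass", "G", and "fail") is a quick signal that the column needs to be standardized.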

"Garbage In, Garbage Out" (GIGO): Incorrect data leads to incorrect results.

Foundation for Analysis: Clean data ensures meaningful, reliable, and long-lasting
analysis, similar to a strong foundation for a house.

Cost of Dirty Data: Poor data practices can lead to significant long-term expenses.

Dirty data such as duplicate data, missing data, and de

Goal: Properly cleaned data is essential for extracting accurate and actionable
insights.

Simplified Data Cleaning Methods:


1. Remove Duplicates:
o Filter and eliminate repeated data, often introduced during collection.
2. Delete Irrelevant Columns:
o Remove non-essential data (e.g., IDs or birthdates) that don’t contribute to the
analysis.
3. Handle Missing Data:
o Options:
 Delete rows with missing values.
 Impute missing values based on other data.
 Mark as "0" or "missing."
o Choose the method carefully, as it impacts analysis.
4. Remove Outliers:
o Identify and decide whether to keep or remove values that deviate significantly
(e.g., test scores far below/above the norm).
5. Correct Inconsistencies:
o Resolve issues like typos or irregular naming conventions using manual methods
(e.g., "Find and Replace" in Excel) or filters.
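
Steps 1, 2, 3, and 5 above might look roughly like this in Python. This is a sketch under stated assumptions: the rows and column names are invented, median imputation is just one of the listed options for missing data, and the mapping of "G"/"B" to "Pass"/"Fail" is a guess at what those labels mean.

```python
import statistics

# Hypothetical raw rows; None marks a missing score.
raw = [
    {"id": 1, "name": "Ana", "score": 80, "result": "Pass"},
    {"id": 1, "name": "Ana", "score": 80, "result": "Pass"},  # exact duplicate
    {"id": 2, "name": "Ben", "score": None, "result": "G"},
    {"id": 3, "name": "Cleo", "score": 40, "result": "B"},
    {"id": 4, "name": "Dev", "score": 90, "result": "pass"},
]

# Assumed meaning of the label variants (G = good/pass, B = bad/fail).
LABEL_MAP = {"pass": "Pass", "g": "Pass", "fail": "Fail", "b": "Fail"}

def clean(rows, drop_columns=("id",), numeric_column="score", label_column="result"):
    # 1. Remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input rows stay untouched
    # 2. Delete columns that do not contribute to the analysis.
    for row in unique:
        for col in drop_columns:
            row.pop(col, None)
    # 3. Impute missing numeric values with the median of the known values.
    known = [row[numeric_column] for row in unique if row[numeric_column] is not None]
    median = statistics.median(known)
    for row in unique:
        if row[numeric_column] is None:
            row[numeric_column] = median
    # 5. Correct inconsistent labels by mapping every variant to one spelling.
    for row in unique:
        label = row[label_column]
        if label is not None:
            row[label_column] = LABEL_MAP.get(label.strip().lower(), label)
    return unique

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Duplicates are removed before the ID column is dropped so that two different people who happen to share a name and score are not merged by accident.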

Simplified Steps to Cleanse a Dataset in Excel:

1. Remove Duplicates:
o Select all data.
o Create a table (Insert → Table).
o Go to Data → Remove Duplicates to delete duplicate rows.
2. Handle Missing Data:
o Remove Blank Rows:
 Select all data and sort columns (A → Z or Z → A).
 Locate and delete blank rows.
o Find and Remove Blank Cells:
 Select a specific column (e.g., column F).
 Apply a filter (Data → Filter).
 In the filter dropdown, uncheck "Select All" and check only "Blanks."
 Delete rows with blank cells, then clear the filter.

Frequency Distribution: Shows how often a particular value occurs in a dataset.
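
For example, a frequency distribution can be built by counting each distinct value. The grades below are made up for illustration.

```python
from collections import Counter

grades = ["B", "A", "C", "B", "B", "A", "D"]  # hypothetical grades

frequency = Counter(grades)  # maps each value to how often it occurs
for grade, count in sorted(frequency.items()):
    print(f"{grade}: {count} ({count / len(grades):.0%})")
```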

Measures of Central Tendency: Include the mean, median, and mode, which describe the
center or typical value of a dataset.

Measures of Variability: Include range, standard deviation, and variance, which describe the
spread or variability in the dataset.
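
The measures of central tendency and variability above can be computed with Python's standard `statistics` module, as in this small sketch. The sample values are invented, and the population formulas (`pvariance`, `pstdev`) are used here as an assumption; `statistics.variance` and `statistics.stdev` give the sample versions, which divide by n - 1 instead of n.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Variability
value_range = max(values) - min(values)
variance = statistics.pvariance(values)  # population variance
std_dev = statistics.pstdev(values)      # population standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```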

A pivot table summarizes large amounts of data by grouping it in meaningful ways (e.g., by sum
or average).
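
The grouping a pivot table performs can be imitated in a few lines of Python. This sketch groups made-up sales rows by region and product and reports the sum and average per group; the data and the `pivot` function name are assumptions for illustration, not how Excel implements pivot tables.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
sales = [
    ("North", "Widget", 100),
    ("North", "Gadget", 150),
    ("South", "Widget", 200),
    ("South", "Widget", 50),
    ("North", "Widget", 300),
]

def pivot(rows):
    """Group amounts by (region, product) and return the sum and average per group."""
    groups = defaultdict(list)
    for region, product, amount in rows:
        groups[(region, product)].append(amount)
    return {
        key: {"sum": sum(vals), "average": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

summary = pivot(sales)
for (region, product), stats in sorted(summary.items()):
    print(region, product, "sum:", stats["sum"], "average:", stats["average"])
```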
