A47E1-DA-R20 Nov 2023 scheme and Key solutions
Structured Data:
Organization: Follows a predefined model or schema, typically stored in databases
with a clear format.
Format: Arranged in rows and columns, facilitating easy search, access, and analysis.
Usability: Well-suited for traditional database systems and easier to process with
standard querying and analysis methods.
Examples: Excel sheets, relational database tables, delimited text files such as CSV, etc.
Unstructured Data:
Organization: Lacks a predefined structure or fixed format; it comes in various forms
such as free text, images, videos, or social media posts.
Format: Not easily organized, making it more challenging to search, access, or analyze
directly.
Usability: Requires advanced tools and techniques like natural language processing
or machine learning for analysis and interpretation.
Examples: Text documents, images, audio, social media feeds, etc.
The distinction between structured and unstructured data lies in their organization
and accessibility. While structured data is orderly and easily processable,
unstructured data is more diverse and requires specialized methods for analysis and
interpretation.
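The contrast can be sketched in Python. This is a minimal illustration with made-up data: the structured records can be filtered directly by a named field, while the free-text review must first be processed (here with a deliberately naive whitespace tokenizer) before anything can be counted. The field names and values are hypothetical.

```python
import csv
import io
from collections import Counter

# Structured data (hypothetical): every record has the same named fields,
# so rows can be filtered directly by column value.
structured_text = """customer_id,region,amount
1,North,250
2,South,400
3,North,150
"""
rows = list(csv.DictReader(io.StringIO(structured_text)))
north_total = sum(int(r["amount"]) for r in rows if r["region"] == "North")

# Unstructured data (hypothetical): free text has no schema, so it must be
# tokenized (naively: split on spaces, strip punctuation, lowercase) first.
review = "Great service, great prices. Delivery was slow but service was great."
tokens = [w.strip(".,").lower() for w in review.split()]
word_counts = Counter(tokens)

print(north_total)                 # 400
print(word_counts.most_common(1))  # [('great', 3)]
```

Real text analysis would use proper NLP tooling; the naive tokenizer is only meant to show that an extra processing step is needed before unstructured data can be queried.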
3 a. List and explain Business Problems solved by Data Analytics 5
Data analytics helps solve various business problems. Some of the applications are
mentioned below:
Market Segmentation: Analyzing customer data to identify distinct segments,
enabling targeted marketing strategies tailored to specific consumer groups.
Predictive Analytics: Forecasting trends, customer behavior, and market demands to
anticipate future needs and make proactive decisions.
Customer Retention: Utilizing data to understand customer preferences,
satisfaction, and churn patterns to implement strategies for retaining customers.
Operational Efficiency: Optimizing processes, supply chain management, and
resource allocation based on data insights to enhance efficiency and reduce costs.
Risk Management: Using data to assess and mitigate risks by analyzing patterns,
anomalies, and trends, enhancing decision-making for risk reduction.
Product Development: Leveraging data to understand consumer feedback,
preferences, and market gaps, aiding in the development of new products or
improving existing ones.
Fraud Detection: Employing analytics to identify irregularities, unusual patterns, or
anomalies in financial or transactional data, thus mitigating fraud risks.
Optimizing Marketing Campaigns: Analyzing campaign performance, customer
response, and engagement metrics to refine marketing strategies for better
outcomes.
Personalization: Creating personalized experiences for customers by using data to
tailor services, products, or interactions based on individual preferences.
Supply Chain Optimization: Utilizing data to improve inventory management,
logistics, and demand forecasting for a streamlined and cost-effective supply chain.
Data analytics plays a pivotal role in addressing these business challenges by
providing insights, patterns, and trends that support informed decision-making and
strategy formulation.
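As one concrete illustration of the fraud-detection use case, the sketch below flags unusual transaction amounts with the modified z-score (based on the median and the median absolute deviation, MAD). This simple rule is swapped in here for illustration only; real fraud systems typically combine many features and models. The transaction values and the 3.5 cutoff (a common rule of thumb) are assumptions.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # Most values are identical: treat any differing value as unusual.
        return [x for x in amounts if x != med]
    return [x for x in amounts if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical transaction amounts for one customer.
transactions = [120, 95, 110, 105, 98, 102, 2500]
print(flag_anomalies(transactions))  # [2500]
```

Median and MAD are used instead of mean and standard deviation because, in a small sample, one extreme value inflates the standard deviation enough to mask itself: with 7 values, no point can have an ordinary z-score above about 2.45 (using the population standard deviation), so a cutoff of 3 would never fire.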
3 b. Explain the need for Data Analytics. 5
B.V.Raju Institute of Technology
Vishnupur, Narsapur, MedakDist – 502313 (UGC-Autonomous)
Computer Science and Engineering
The need for data analytics arises from the increasing volume, complexity, and
importance of data in various domains. Here are some key reasons why data
analytics is crucial:
Covariance:
Definition: Covariance measures the degree to which two random variables vary
together.
Scale: It can take any real value: positive, negative, or zero. Its sign indicates the
direction of the linear relationship, while its magnitude depends on the units of the
variables.
Interpretation: The magnitude of covariance doesn't have a standardized scale,
making it challenging to interpret. A positive covariance indicates a direct
relationship, while a negative covariance indicates an inverse relationship.
Correlation:
Definition: Correlation is a standardized measure of the strength and direction of the
linear relationship between two variables.
Scale: Ranging between -1 and 1, correlation provides a more interpretable and
standardized measure compared to covariance. A correlation of +1 signifies a perfect
positive linear relationship, -1 indicates a perfect negative linear relationship, and 0
implies no linear relationship.
Interpretation: Correlation allows for better comparison between different pairs of
variables as it's standardized, making it easier to interpret the strength and direction
of the relationship.
In summary, covariance measures the direction and magnitude of the relationship
between variables but lacks standardization, while correlation provides a
standardized measure, making it easier to interpret and compare relationships across
different pairs of variables.
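The difference in scale can be checked numerically. The sketch below computes the sample covariance (dividing by n - 1) and the Pearson correlation directly from their definitions on made-up data; rescaling y by 100 multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math


def covariance(x, y):
    """Sample covariance: sum((xi - mean_x) * (yi - mean_y)) / (n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
y_scaled = [v * 100 for v in y]  # same relationship, different units

print(covariance(x, y), correlation(x, y))                # 5.0 1.0
print(covariance(x, y_scaled), correlation(x, y_scaled))  # 500.0 1.0
```

The choice of n - 1 or n as the divisor changes the covariance value, but it cancels out in the correlation, which is one reason correlation is easier to compare across datasets.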
5 b. Examine Hypothesis Testing using ANOVA 5
ANOVA (Analysis of Variance) is a statistical method used to compare the means of
three or more groups to determine if there are statistically significant differences
among them. It's a hypothesis testing technique that evaluates whether the means
of several groups are equal or if at least one of the group means differs significantly
from the others.
Here are the key steps involved in hypothesis testing using ANOVA:
1. Formulate Hypotheses:
Null Hypothesis (H₀): Assumes that all group means are equal.
Alternative Hypothesis (H₁): Suggests that at least one group mean is different from
the others.
2. Collect Data:
Obtain data from multiple groups or treatments.
3. Calculate Variability:
ANOVA assesses both the variation within each group and the variation between
groups.
It computes the F-statistic as the ratio of the between-group variance (mean square
between) to the within-group variance (mean square within).
4. Determine Significance:
Use the F-statistic, with k - 1 and N - k degrees of freedom (k groups, N observations
in total), to calculate the p-value.
If the p-value is below a predetermined significance level (commonly 0.05), the null
hypothesis is rejected, indicating that at least one group mean differs significantly
from the others.
If the p-value is at or above the significance level, the null hypothesis is not
rejected: the data provide no significant evidence that the group means differ.
5. Post-hoc Tests (if necessary):
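The computation in steps 3 and 4 above can be sketched as a minimal one-way ANOVA in plain Python. The three groups are hypothetical data; the function returns the F-statistic with its degrees of freedom and stops short of the p-value, which requires the F distribution.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: distance of each group mean from the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical scores under three treatments.
a, b, c = [4, 5, 6], [6, 7, 8], [9, 10, 11]
f_stat, df1, df2 = one_way_anova_f(a, b, c)
print(round(f_stat, 4), df1, df2)  # 19.0 2 6
```

With SciPy installed, `scipy.stats.f.sf(f_stat, df1, df2)` gives the p-value for this F-statistic, and `scipy.stats.f_oneway(a, b, c)` runs the whole test in one call.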
At each leaf node, assign the majority class label of the instances falling into that
node.
Decision Tree for Play Tennis Example:
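Assuming an ID3-style tree, the attribute tested at each node is the one with the highest information gain, and each leaf takes the majority class. The sketch below assumes the classic 14-row Play Tennis dataset (9 Yes, 5 No), with rows grouped by Outlook for readability since only the counts matter, and computes the gain for the Outlook attribute plus the majority label for one branch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy of the parent minus the size-weighted entropy of each child."""
    n = len(labels)
    children = {}
    for value, label in zip(attribute_values, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children.values())
    return entropy(labels) - weighted


def majority_label(labels):
    """Label assigned at a leaf: the most frequent class among its instances."""
    return Counter(labels).most_common(1)[0][0]


# Outlook and PlayTennis columns of the classic dataset (assumed counts).
outlook = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rain"] * 5
play = (["Yes"] * 2 + ["No"] * 3      # Sunny: 2 Yes, 3 No
        + ["Yes"] * 4                 # Overcast: 4 Yes
        + ["Yes"] * 3 + ["No"] * 2)   # Rain: 3 Yes, 2 No

print(round(entropy(play), 3))                    # 0.94
print(round(information_gain(play, outlook), 3))  # 0.247
print(majority_label(play[:5]))                   # No (Sunny branch)
```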
print(data_csv)
Using read_table():
import pandas as pd
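The fragments above refer to loading data with pandas. A minimal sketch, assuming small in-memory samples wrapped in `io.StringIO` in place of real files and hypothetical column names: `read_csv()` defaults to a comma separator, while `read_table()` defaults to a tab separator.

```python
import io

import pandas as pd

# Hypothetical in-memory samples standing in for files on disk.
csv_text = "name,score\nA,90\nB,85\n"
tsv_text = "name\tscore\nA\t90\nB\t85\n"

# read_csv() assumes comma-separated values by default.
data_csv = pd.read_csv(io.StringIO(csv_text))

# read_table() assumes tab-separated values by default (sep="\t").
data_table = pd.read_table(io.StringIO(tsv_text))

# read_table() can also parse comma-separated text if the delimiter is given.
data_table_csv = pd.read_table(io.StringIO(csv_text), sep=",")

print(data_csv)
print(data_csv.equals(data_table), data_csv.equals(data_table_csv))  # True True
```

With real files, the `io.StringIO(...)` argument is replaced by the file path.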
1. Launch Tableau:
Open Tableau Desktop or Tableau Server/Online and start a new workbook
or project.
2. Select Data Source:
Click on the "Connect" option in Tableau. A list of available data connection
options will be displayed.
3. Choose Data Connection Type:
Select the appropriate data connection type based on your data source.
Options include files (Excel, CSV), databases (MySQL, SQL Server), cloud
services (Google BigQuery, Amazon Redshift), and more.
4. Connect to Data:
Once the connection type is selected, you'll be prompted to provide specific
details based on the chosen source (e.g., file path, server name,
credentials).
5. Select Tables or Files:
For database connections, select the tables or views you want to work with.
For file-based connections, navigate to and select the file(s) containing the
data.
6. Data Preparation:
After connecting, Tableau might offer options for data preparation. Here,
you can perform initial data cleansing, filtering, or aggregation before
starting your analysis.
7. Data Refresh (Optional):
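Live connections query the data source directly, so visualizations reflect
the current data whenever they are refreshed.
For extracts (snapshots of the data saved by Tableau), refresh the extract
manually, or schedule refreshes when the workbook is published to Tableau
Server or Tableau Online.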
8. Start Visualization:
Once the data is connected and prepared, drag and drop fields from the
data pane to the canvas to create visualizations like charts, graphs, or
dashboards.
9. Explore and Analyze:
Utilize Tableau's features to explore the data, create calculated fields, apply
filters, and generate insights through interactive visualizations.
10. Save and Share:
Save the workbook or project with the connected data. If using Tableau
Server or Tableau Online, publish the workbook to share it with others in
your organization.
These steps may vary slightly based on the specific data source or the version of
Tableau you're using. Tableau provides a user-friendly interface for connecting to
various data sources and seamlessly creating visualizations for data analysis and
storytelling.
11 a. Mention the details of the Interface of Tableau 5
Tableau offers a user-friendly interface designed to facilitate data exploration,
visualization creation, and analysis. Here are the key components of the Tableau
interface:
1. Data Pane:
Displays the data source(s) connected to Tableau. It shows tables,
dimensions, measures, and calculated fields available for use in
visualizations.
2. Shelves:
Columns, Rows, Pages, and Filters shelves: These shelves are used to drag
and drop fields from the Data Pane to define the structure of visualizations.
3. Canvas:
The main area where visualizations (charts, graphs, maps) are created by
dragging fields onto Rows, Columns, and other shelves. It displays the live
preview of the created visualization.
4. Marks Card (Marks Pane):
Allows users to control the appearance and properties of the marks in a
visualization (e.g., color, size, label).
5. Show Me Pane:
Provides a visual guide with suggestions for different types of visualizations
based on the fields selected. It helps users quickly generate suitable chart
types.
6. Toolbar:
Contains various tools and options to interact with and customize the
visualization. It includes options for data connectivity, saving, formatting,
undo/redo, and sharing.
7. Worksheet/ Dashboard Tabs:
Tabs for different worksheets or dashboards within a workbook. Users can
switch between different sheets or dashboards to work on multiple
visualizations.
8. Navigation Pane:
Allows navigation between different worksheets, dashboards, stories, and
data sources within a project.
9. Status Bar:
Displays information about the current view, such as the number of marks,
rows, and columns shown, along with descriptions of menu items.
10. Server/Online Features:
Additional features and options appear for Tableau Server or Tableau
Online users, such as publishing, permissions, collaboration, and access to
shared content.
Tableau's interface aims to be intuitive, offering a drag-and-drop environment that
allows users to explore data, create insightful visualizations, and tell stories with
data effectively. The interface elements are organized to facilitate a seamless
workflow from data connection to visualization and sharing.
11 b. Give an outline of the Top charts used in Tableau. 5
Tableau offers a wide range of charts to visualize data effectively. Here's an
outline of some of the top charts frequently used in Tableau:
1. Bar Chart:
Displays categorical data using horizontal or vertical bars.
Suitable for comparing categories or showing trends over time.
2. Line Chart:
Shows trends or changes in data over continuous intervals (e.g., time
series).
Useful for visualizing trends, fluctuations, or correlations.
3. Pie Chart:
Represents parts of a whole by dividing a circle into sectors.
Effective for displaying proportions or percentages of categorical
data.
4. Scatter Plot:
Represents individual data points on a two-dimensional plane.
Helpful for visualizing relationships or correlations between two
numerical variables.
5. Area Chart:
Similar to a line chart but fills the area below the line.
Emphasizes the magnitude of change over time; stacked area charts show
how parts contribute to a cumulative total.
6. Histogram:
Displays the distribution of numerical data by dividing it into bins or
intervals.