A47E1-DA-R20 Nov 2023 scheme and Key solutions

B.V. Raju Institute of Technology
Vishnupur, Narsapur, Medak Dist – 502313 (UGC-Autonomous)
Computer Science and Engineering

ANSWER KEY and SCHEME


IV B.Tech I Semester Regular/Supplementary Examinations, November 2023
DATA ANALYTICS
(Computer Science and Engineering)
Subject: DATA ANALYTICS
Code: A47E1
Regulation: R20
Q.no Question Marks
PART-A
1 a. What is Semi-Structured Data? 2
Semi-structured data is information that doesn't conform to the structure of
traditional databases but contains some organizational properties, often
characterized by flexible schemas or tags, allowing for varying structures within the
same dataset.
1 b. Outline the role of probability distribution in statistics. 2
Probability distributions in statistics define the possible outcomes of a random
variable, providing a structured framework to understand and predict the likelihood
of various events occurring within a given context.
1 c. Tell the use of data visualization. 2
Data visualization facilitates the exploration, communication, and interpretation of
complex data by presenting it in visual formats, aiding in understanding patterns,
relationships, and insights for informed decision-making.
1 d. Define Classification. 2
Classification is a machine learning technique that categorizes data into distinct
classes or categories based on patterns and features, enabling predictive analysis and
decision-making.
1 e. Explain the importance of Spline. 2
Splines, in data analysis and curve fitting, are crucial for creating smooth, continuous
curves from scattered data points, aiding in interpolation, smoothing, and
approximating complex relationships within datasets.
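For illustration, a minimal SciPy sketch (sample data values assumed) that fits a smooth cubic spline through scattered points:

import numpy as np
from scipy.interpolate import CubicSpline

# Scattered sample points (values assumed for illustration)
x = np.array([0, 1, 2, 3, 4])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

cs = CubicSpline(x, y)  # fit a smooth, continuous cubic spline through the points
x_fine = np.linspace(0, 4, 50)
y_smooth = cs(x_fine)   # evaluate the curve between the original samples (interpolation)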
PART-B
2 a. Explain about the Characteristics of Data. 5
The characteristics of data are typically described along several dimensions known as
the "Vs of Big Data," which encompass:
Volume: The sheer size of data, often the scale or quantity of information generated
or stored.
Velocity: The speed at which data is generated, collected, or processed, emphasizing
the rapid influx and analysis of data in real-time.
Variety: The diversity and range of data types, including structured, unstructured, or
semi-structured data.
Veracity: The accuracy and trustworthiness of data, ensuring its reliability for analysis
and decision-making.
Value: The significance or usefulness of the data in achieving insights, innovation, or
practical outcomes for businesses or applications.
These "Vs" help define the key characteristics and challenges involved in managing,
analyzing, and deriving value from diverse and extensive datasets.
2 b. Compare Structured and Unstructured data. 5
Structured data and unstructured data differ significantly in terms of their
organization, format, and usability:

Structured Data:
Organization: Follows a predefined model or schema, typically stored in databases
with a clear format.
Format: Arranged in rows and columns, facilitating easy search, access, and analysis.
Usability: Well-suited for traditional database systems and easier to process with
standard querying and analysis methods.
Examples: Excel sheets, relational databases, organized text documents, etc.
Unstructured Data:
Organization: Lacks a predefined structure or fixed format, often stored in various
sources like text, images, videos, or social media.
Format: Not easily organized, making it more challenging to search, access, or analyze
directly.
Usability: Requires advanced tools and techniques like natural language processing
or machine learning for analysis and interpretation.
Examples: Text documents, images, audio, social media feeds, etc.
The distinction between structured and unstructured data lies in their organization
and accessibility. While structured data is orderly and easily processable,
unstructured data is more diverse and requires specialized methods for analysis and
interpretation.
3 a. List and explain Business Problems solved by Data Analytics 5
Data analytics helps solve a variety of business problems; some key applications are
outlined below:
Market Segmentation: Analyzing customer data to identify distinct segments,
enabling targeted marketing strategies tailored to specific consumer groups.
Predictive Analytics: Forecasting trends, customer behavior, and market demands to
anticipate future needs and make proactive decisions.
Customer Retention: Utilizing data to understand customer preferences,
satisfaction, and churn patterns to implement strategies for retaining customers.
Operational Efficiency: Optimizing processes, supply chain management, and
resource allocation based on data insights to enhance efficiency and reduce costs.
Risk Management: Using data to assess and mitigate risks by analyzing patterns,
anomalies, and trends, enhancing decision-making for risk reduction.
Product Development: Leveraging data to understand consumer feedback,
preferences, and market gaps, aiding in the development of new products or
improving existing ones.
Fraud Detection: Employing analytics to identify irregularities, unusual patterns, or
anomalies in financial or transactional data, thus mitigating fraud risks.
Optimizing Marketing Campaigns: Analyzing campaign performance, customer
response, and engagement metrics to refine marketing strategies for better
outcomes.
Personalization: Creating personalized experiences for customers by using data to
tailor services, products, or interactions based on individual preferences.
Supply Chain Optimization: Utilizing data to improve inventory management,
logistics, and demand forecasting for a streamlined and cost-effective supply chain.
Data analytics plays a pivotal role in addressing these business challenges by
providing insights, patterns, and trends that support informed decision-making and
strategy formulation.
3 b. Explain the need of Data Analytics. 5

The need for data analytics arises from the increasing volume, complexity, and
importance of data in various domains. Here are some key reasons why data
analytics is crucial:

1. Informed Decision-Making: Data analytics provides valuable insights, trends, and
patterns, empowering organizations to make informed and strategic decisions
based on evidence rather than intuition.
2. Competitive Advantage: In a data-driven world, organizations that harness the
power of analytics gain a competitive edge by identifying market trends,
understanding customer behavior, and optimizing operations.
3. Business Growth: Analytics helps identify new opportunities for growth by
uncovering market gaps, optimizing processes, and enabling innovation in products
and services.
4. Customer Understanding: Analyzing customer data allows businesses to
understand customer preferences, behavior, and expectations, facilitating
personalized and targeted marketing strategies.
5. Risk Management: Data analytics assists in identifying and mitigating risks by
detecting anomalies, patterns, or early indicators that might impact business
operations or finances.
6. Operational Efficiency: Optimization of processes, resource allocation, and
supply chain management based on data insights results in improved operational
efficiency and cost reduction.
7. Fraud Detection and Security: Analyzing patterns in data helps detect and
prevent fraudulent activities, enhancing security measures and safeguarding
sensitive information.
8. Healthcare and Research: In fields like healthcare, data analytics aids in medical
research, personalized medicine, and patient care optimization by analyzing vast
amounts of clinical and biological data.
9. Improved Forecasting: Analytics enables accurate forecasting in areas such as
demand, sales, and financial performance, helping organizations plan and allocate
resources effectively.
10. Enhanced Customer Experience: By analyzing customer interactions and
feedback, businesses can enhance the overall customer experience by tailoring
products, services, and support to meet customer expectations.
In essence, data analytics transforms raw data into actionable insights, driving
efficiency, innovation, and competitiveness across various industries and sectors.
4 a. Discuss about Discrete and Uniform Distributions with example. 5
Discrete Distribution: A discrete distribution describes the probability of distinct,
separate outcomes in a sample space. Each outcome has an associated probability,
and the probabilities sum up to 1. Examples of discrete distributions include the
binomial distribution and the Poisson distribution.
Uniform Distribution: A uniform distribution is a type of probability distribution
where each possible outcome has an equal probability of occurring. In other words,
all events are equally likely. The probability mass function for a discrete uniform
distribution is given by P(X = x) = 1/n, where 'n' is the number of possible outcomes.
Example: Let's consider a fair six-sided die as an example of both a discrete and a
uniform distribution.
Discrete Distribution:
Sample Space (Outcomes): {1, 2, 3, 4, 5, 6}

Probability of Each Outcome: P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = 1/6
Probability Distribution: Discrete, as each outcome is separate and distinct.
Uniform Distribution:
Probability of Each Outcome: P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X =
6) = 1/6
Probability Distribution: Uniform, as each outcome has an equal probability of 1/6.
In this example, the probability of rolling any specific number on a fair six-sided die is
the same for each outcome (uniform distribution), and each outcome is distinct and
separate (discrete distribution).
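As a quick sketch (seed and number of rolls assumed), the equal 1/6 probabilities can be checked by simulation with NumPy:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=60000)  # simulate 60,000 fair die rolls
for face in range(1, 7):
    # each relative frequency should be close to 1/6 ≈ 0.1667
    print(face, round((rolls == face).mean(), 4))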

4 b. Illustrate the Binomial Distribution. 5
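The binomial distribution models the number of successes in n independent trials, each
with success probability p. Its probability mass function is
P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, 1, ..., n, with mean np and variance
np(1 - p). For example, the number of heads in 10 tosses of a fair coin follows a
Binomial(10, 0.5) distribution. A minimal Python sketch (n = 10 and p = 0.5 assumed)
that computes the PMF:

from math import comb

n, p = 10, 0.5
for k in range(n + 1):
    pmf = comb(n, k) * p**k * (1 - p)**(n - k)  # C(n, k) p^k (1-p)^(n-k)
    print(f"P(X = {k}) = {pmf:.4f}")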

5 a. Compare Correlation and Covariance. 5


Correlation and covariance both measure relationships between variables, but they
differ in scale and interpretation:
Covariance:

Definition: Covariance measures the degree to which two random variables vary
together.
Scale: It can take any value, positive, negative, or zero, indicating the direction
(positive or negative) and magnitude of the linear relationship between variables.
Interpretation: The magnitude of covariance doesn't have a standardized scale,
making it challenging to interpret. A positive covariance indicates a direct
relationship, while a negative covariance indicates an inverse relationship.
Correlation:
Definition: Correlation is a standardized measure of the strength and direction of the
linear relationship between two variables.
Scale: Ranging between -1 and 1, correlation provides a more interpretable and
standardized measure compared to covariance. A correlation of +1 signifies a perfect
positive linear relationship, -1 indicates a perfect negative linear relationship, and 0
implies no linear relationship.
Interpretation: Correlation allows for better comparison between different pairs of
variables as it's standardized, making it easier to interpret the strength and direction
of the relationship.
In summary, covariance measures the direction and magnitude of the relationship
between variables but lacks standardization, while correlation provides a
standardized measure, making it easier to interpret and compare relationships across
different pairs of variables.
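As a brief sketch (example data assumed), both measures can be computed with NumPy:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

print(np.cov(x, y)[0, 1])       # covariance: scale depends on the units of x and y
print(np.corrcoef(x, y)[0, 1])  # correlation: standardized to the range [-1, 1]

Rescaling x or y changes the covariance but leaves the correlation unchanged, which is
why correlation is comparable across different pairs of variables.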
5 b. Examine the Hypothesis Testing using ANOVA 5
ANOVA (Analysis of Variance) is a statistical method used to compare the means of
three or more groups to determine if there are statistically significant differences
among them. It's a hypothesis testing technique that evaluates whether the means
of several groups are equal or if at least one of the group means differs significantly
from the others.
Here are the key steps involved in hypothesis testing using ANOVA:
1. Formulate Hypotheses:
Null Hypothesis (H₀): Assumes that all group means are equal.
Alternative Hypothesis (H₁): Suggests that at least one group mean is different from
the others.
2. Collect Data:
Obtain data from multiple groups or treatments.
3. Calculate Variability:
ANOVA assesses both the variation within each group and the variation between
groups.
It computes the F-statistic by comparing the ratio of the between-group variance to
the within-group variance.
4. Determine Significance:
Use the F-statistic to calculate the p-value.
If the p-value is below a predetermined significance level (commonly 0.05), the null
hypothesis is rejected, indicating that there are significant differences among at least
one pair of group means.
If the p-value is higher than the significance level, the null hypothesis is retained,
suggesting no significant differences among group means.
5. Post-hoc Tests (if necessary):
If ANOVA indicates significant differences among group means, further pairwise
comparisons (e.g., Tukey's HSD or Bonferroni test) can be conducted to identify which
specific groups differ from each other.
6. Interpretation:
If the null hypothesis is rejected, it indicates that there is enough evidence to
conclude that at least one group mean is significantly different from the others. The
nature of this difference and which specific groups differ can be explored through
post-hoc analyses.
ANOVA is applicable in various fields, such as experimental research, social sciences,
and manufacturing, to compare means across multiple groups efficiently and
ascertain if these differences are statistically significant.
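A minimal sketch of these steps (group scores assumed) using SciPy's one-way ANOVA:

from scipy import stats

group_a = [85, 86, 88, 75, 78]
group_b = [80, 83, 85, 79, 81]
group_c = [92, 94, 89, 91, 90]

# F-statistic: ratio of between-group variance to within-group variance
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: at least one group mean differs significantly")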

6 a. Explain Decision Tree algorithm with an example. 5


The Decision Tree algorithm is a supervised machine learning technique used for both
classification and regression tasks. It creates a tree-like structure where each internal
node represents a decision based on a feature, each branch represents an outcome
of that decision, and each leaf node represents a class label or a numerical value.
Let's illustrate the Decision Tree algorithm with a classification example:
Example: Predicting Play Tennis
Consider a dataset that predicts whether to play tennis based on weather conditions.
Outlook Temperature Humidity Windy Play Tennis
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rain Mild High False Yes
Rain Cool Normal False Yes
Rain Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rain Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rain Mild High True No
Building the Decision Tree:
Choose the Root Node:
Select the feature that best splits the data. (E.g., Outlook)
Split the Data:
Split the dataset based on the chosen feature.
Create Subtrees:
For each branch (e.g., Sunny, Overcast, Rain), continue recursively splitting the data
based on the next best feature until a stopping criterion is met (e.g., maximum depth
reached, no further improvement in information gain, etc.).
Assign Class Labels:

At each leaf node, assign the majority class label of the instances falling into that
node.
Decision Tree for Play Tennis Example:
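For this dataset, the classic result places Outlook at the root: Overcast leads directly
to Yes, the Sunny branch splits on Humidity (High → No, Normal → Yes), and the Rain
branch splits on Windy (True → No, False → Yes). A minimal scikit-learn sketch (one-hot
encoding assumed) that learns a comparable tree:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The Play Tennis dataset from the table above
data = pd.DataFrame({
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
                    'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy': [False, True, False, False, False, True, True, False, False, False,
              True, True, False, True],
    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes',
             'Yes', 'Yes', 'Yes', 'No'],
})

X = pd.get_dummies(data.drop(columns='Play'))  # one-hot encode categorical features
clf = DecisionTreeClassifier(criterion='entropy').fit(X, data['Play'])
print(export_text(clf, feature_names=list(X.columns)))  # text rendering of the tree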

6 b. Explain about Naïve Bayes Classifier. 5
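The Naïve Bayes classifier is a probabilistic supervised learning technique based on
Bayes' theorem, with the "naïve" assumption that features are conditionally independent
given the class: P(C | x1, ..., xn) ∝ P(C) × P(x1 | C) × ... × P(xn | C). The classifier
computes this posterior for every class and predicts the class with the highest value.
Despite its simplifying assumption, it is fast, handles high-dimensional data well, and
is widely used for text classification and spam filtering. A minimal scikit-learn sketch
(the Iris dataset assumed as example data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB().fit(X_train, y_train)  # estimate a Gaussian P(x_i | C) per feature and class
print("Accuracy:", model.score(X_test, y_test))  # predictions pick the highest-posterior class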


7 a. Discuss about Random Forest algorithm with an example. 5


The Random Forest algorithm is an ensemble learning method used for both
classification and regression tasks. It operates by constructing multiple decision trees
during training and outputs the class that is the mode of the classes (classification) or
mean prediction (regression) of the individual trees.
Key Concepts:
Ensemble Learning:
Random Forest is an ensemble method that combines multiple individual models
(decision trees) to make more accurate and robust predictions.
Bagging (Bootstrap Aggregating):
It creates multiple subsets of the original dataset by random sampling with
replacement (bootstrap samples).
Each subset is used to train a separate decision tree.
Random Feature Selection:
At each node of the tree, a random subset of features is considered for splitting,
improving diversity among trees.
How Random Forest Works:
Bootstrapping:
Randomly select subsets of the original dataset (with replacement) to create multiple
training sets.
Build Decision Trees:
For each subset, construct a decision tree.
At each node, consider only a random subset of features for splitting.
Voting (Classification) or Averaging (Regression):
For classification, each tree "votes" for the class, and the class with the most votes
becomes the final prediction.
For regression, predictions from each tree are averaged to obtain the final prediction.
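A minimal scikit-learn sketch (the Iris dataset assumed as example data):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each trained on a bootstrap sample with random feature subsets
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=42)
forest.fit(X_train, y_train)
print("Accuracy:", forest.score(X_test, y_test))  # majority vote across the trees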

7 b. Illustrate the K-Means Clustering with an example. 5


K-Means is an unsupervised machine learning algorithm used for clustering data into
K distinct clusters. Let's illustrate K-Means clustering with a simple example:
Example: Customer Segmentation
Consider a dataset of customers based on their annual income and spending score.

Customer ID   Annual Income (k$)   Spending Score (1-100)
1             15                   39
2             15                   81
3             16                   6
4             16                   77
5             17                   40
...           ...                  ...
Steps for K-Means Clustering:
Choose the Number of Clusters (K):
Based on the data or domain knowledge, determine the optimal number of clusters
to create.
Initialize Cluster Centroids:
Randomly select K data points as initial centroids.
Assign Data Points to Nearest Centroids:
Calculate the distance (e.g., Euclidean distance) between each data point and each
centroid.
Assign each data point to the nearest centroid, forming K clusters.
Update Centroids:
Recalculate the centroids for each cluster by taking the mean of all data points
assigned to that cluster.
Repeat Steps 3 and 4:
Iterate the assignment and centroid update steps until convergence (when centroids
no longer change significantly or after a certain number of iterations).
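A minimal scikit-learn sketch using the five customers shown above (K = 2 assumed for
this small sample):

import numpy as np
from sklearn.cluster import KMeans

# (annual income, spending score) for the five customers above
X = np.array([[15, 39], [15, 81], [16, 6], [16, 77], [17, 40]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Labels:   ", kmeans.labels_)           # cluster assignment per customer
print("Centroids:", kmeans.cluster_centers_)  # mean of the points in each cluster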

8 a. Examine the read_csv() and read_table() functions in pandas with an example. 5


Both read_csv() and read_table() functions in the Pandas library are used to read
tabular data into a Pandas DataFrame. The primary difference lies in the default
delimiter they use for parsing the data.
read_csv() is specifically designed for comma-separated values (CSV) files but can also
handle other delimiters using the sep parameter.
read_table() is a more general function that can read tabular data where columns are
separated by a delimiter, which is often a tab (\t), but it can be specified using the
sep parameter as well.
Here's an example illustrating both functions:
Suppose we have a file named "data.txt" with the following content separated by
tabs:
Name Age City
Alice 25 New York
Bob 30 San Francisco
Charlie 35 Seattle
Using read_csv():
import pandas as pd

# Using read_csv with a tab delimiter
data_csv = pd.read_csv('data.txt', sep='\t')
print("Data Read Using read_csv:")
print(data_csv)

Using read_table():
import pandas as pd

# Using read_table with the default tab delimiter
data_table = pd.read_table('data.txt')
print("Data Read Using read_table:")
print(data_table)
Both functions will read the content of the "data.txt" file into a DataFrame. In this
case, since the data is separated by tabs, both functions will produce the same
output.
These functions are versatile and offer various parameters for handling different file
formats, headers, missing values, and more. The choice between read_csv() and
read_table() depends mainly on the data file's delimiter and the user's preference.

8 b. Explain about the Bar chart with an example. 5


A bar chart is a graphical representation used to display categorical data with
rectangular bars. The length or height of each bar is proportional to the values they
represent. It's an effective way to compare categories or show changes over time for
discrete categories.
Let's create a simple example of a bar chart using Python's Matplotlib library to
visualize the sales data for different products:

import matplotlib.pyplot as plt

# Data for products and their sales
products = ['Product A', 'Product B', 'Product C', 'Product D']
sales = [350, 420, 300, 500]

# Create a bar chart
plt.figure(figsize=(8, 6))  # Set the figure size (optional)
plt.bar(products, sales, color='skyblue')

# Add labels and title
plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Sales of Different Products')

# Show the plot
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()       # Adjust layout to prevent clipping of labels
plt.show()

9 a. Define Boxplot and explain it with an example. 5


A boxplot, also known as a box-and-whisker plot, is a graphical representation that
displays the distribution of a dataset along with its key statistical properties. It
provides a visual summary of the central tendency, variability, and potential outliers
within the data.
Here are the components of a boxplot:
Median (Q2): The middle value of the dataset, dividing it into two equal halves.
Quartiles (Q1 and Q3): The first quartile (Q1) represents the 25th percentile, and
the third quartile (Q3) represents the 75th percentile of the data.
Interquartile Range (IQR): The range between Q1 and Q3, indicating the spread of
the middle 50% of the data.
Whiskers: Lines extending from the box that represent the range of the data,
excluding outliers.
Outliers: Data points that fall outside the whiskers' range, considered potential
anomalies or extreme values.
Example of Creating a Boxplot using Python (Matplotlib):
Suppose we have a dataset representing the scores of students in a test:

import matplotlib.pyplot as plt
import numpy as np

# Generate random scores data
np.random.seed(42)
scores = np.random.normal(70, 10, 100)  # Mean=70, Std. Deviation=10, 100 samples

# Create a boxplot
plt.figure(figsize=(6, 5))  # Set figure size (optional)
plt.boxplot(scores, vert=False, patch_artist=True,
            boxprops=dict(facecolor='lightblue'))

# Add labels and title
plt.xlabel('Scores')
plt.title('Distribution of Test Scores')

# Show the plot
plt.tight_layout()
plt.show()

9 b. Construct a python program to plot the Pie Chart. 5


import matplotlib.pyplot as plt

# Data for the pie chart
labels = ['A', 'B', 'C', 'D']
sizes = [25, 30, 20, 25]  # Sizes or proportions for each category

# Create a pie chart
plt.figure(figsize=(8, 6))  # Set the figure size (optional)
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140,
        colors=['skyblue', 'lightgreen', 'lightcoral', 'lightsalmon'])

# Add title
plt.title('Distribution of Categories')

# Show the plot
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle
plt.show()

10 a. Explain the architecture of Tableau, with a brief outline. 5


Tableau's architecture involves various components that work together to create a
data visualization environment. Here's a brief outline:
Data Sources:
Tableau can connect to various data sources, including databases, files, cloud
services, and live data connections.
Tableau Desktop:
The primary tool for creating visualizations. It allows users to connect to data sources,
create dashboards, and design visualizations through a drag-and-drop interface.
Tableau Server:
It is a centralized server-based platform where Tableau workbooks and visualizations
are published and shared across an organization. Users can access these resources
via a web browser or Tableau Desktop.
Tableau Online:
A cloud-based version of Tableau Server, allowing users to publish and share
visualizations on the cloud.
Tableau Prep:
A data preparation tool that helps in cleaning, transforming, and shaping data before
visualization, making it easier to work with various data sources.
Data Engine:
Tableau's in-memory data engine enables fast querying and analysis of large
datasets. It stores aggregated data for quicker access during visualizations.
Gateway and Load Balancer:
These components manage the communication between Tableau clients and the
server, ensuring secure and efficient data transfer.
Clients (Web and Mobile):
Users access Tableau visualizations through web browsers, Tableau Desktop, or
mobile devices, allowing them to interact with and explore visualizations.
Metadata Repository:
Stores metadata related to users, permissions, data connections, and workbooks,
facilitating governance and security.
Extensions and APIs:
Tableau supports extensions and APIs that allow integration with other applications,
custom visualizations, and extending functionality.
Tableau's architecture is designed to provide a user-friendly and interactive
environment for data exploration, analysis, and sharing insights across an
organization. It emphasizes ease of use, scalability, and flexibility in handling diverse
data sources and creating impactful visualizations.
10 b. Give the steps involved in connecting data in Tableau 5
Connecting data in Tableau involves several steps:

1. Launch Tableau:
 Open Tableau Desktop or Tableau Server/Online and start a new workbook
or project.
2. Select Data Source:
 Click on the "Connect" option in Tableau. A list of available data connection
options will be displayed.
3. Choose Data Connection Type:
 Select the appropriate data connection type based on your data source.
Options include files (Excel, CSV), databases (MySQL, SQL Server), cloud
services (Google BigQuery, Amazon Redshift), and more.
4. Connect to Data:
 Once the connection type is selected, you'll be prompted to provide specific
details based on the chosen source (e.g., file path, server name,
credentials).
5. Select Tables or Files:
 For database connections, select the tables or views you want to work with.
 For file-based connections, navigate to and select the file(s) containing the
data.

6. Data Preparation:
 After connecting, Tableau might offer options for data preparation. Here,
you can perform initial data cleansing, filtering, or aggregation before
starting your analysis.
7. Data Refresh (Optional):
 For live connections, you might have options to set up data refresh
schedules to keep the data up-to-date.
 For extracts (in-memory data), choose how frequently you want to refresh
the extract.
8. Start Visualization:
 Once the data is connected and prepared, drag and drop fields from the
data pane to the canvas to create visualizations like charts, graphs, or
dashboards.
9. Explore and Analyze:
 Utilize Tableau's features to explore the data, create calculated fields, apply
filters, and generate insights through interactive visualizations.
10. Save and Share:
 Save the workbook or project with the connected data. If using Tableau
Server or Tableau Online, publish the workbook to share it with others in
your organization.
These steps may vary slightly based on the specific data source or the version of
Tableau you're using. Tableau provides a user-friendly interface for connecting to
various data sources and seamlessly creating visualizations for data analysis and
storytelling.
11 a. Mention the details of Interface of Tableau 5
Tableau offers a user-friendly interface designed to facilitate data exploration,
visualization creation, and analysis. Here are the key components of the Tableau
interface:
1. Data Pane:
 Displays the data source(s) connected to Tableau. It shows tables,
dimensions, measures, and calculated fields available for use in
visualizations.
2. Shelves:
 Columns, Rows, Pages, and Filters shelves: These shelves are used to drag
and drop fields from the Data Pane to define the structure of visualizations.
3. Canvas:
 The main area where visualizations (charts, graphs, maps) are created by
dragging fields onto Rows, Columns, and other shelves. It displays the live
preview of the created visualization.
4. Marks Pane:
 Allows users to control the appearance and properties of the marks in a
visualization (e.g., color, size, label).
5. Show Me Pane:
 Provides a visual guide with suggestions for different types of visualizations
based on the fields selected. It helps users quickly generate suitable chart
types.
6. Toolbar:

 Contains various tools and options to interact with and customize the
visualization. It includes options for data connectivity, saving, formatting,
undo/redo, and sharing.
7. Worksheet/ Dashboard Tabs:
 Tabs for different worksheets or dashboards within a workbook. Users can
switch between different sheets or dashboards to work on multiple
visualizations.
8. Navigation Pane:
 Allows navigation between different worksheets, dashboards, stories, and
data sources within a project.
9. Status Bar:
 Displays information about the data connection status, data refresh, and
other updates related to the workbook or project.
10. Server/Online Features:
 Additional features and options appear for Tableau Server or Tableau
Online users, such as publishing, permissions, collaboration, and access to
shared content.
Tableau's interface aims to be intuitive, offering a drag-and-drop environment that
allows users to explore data, create insightful visualizations, and tell stories with
data effectively. The interface elements are organized to facilitate a seamless
workflow from data connection to visualization and sharing.
11 b. Give an outline about Top charts used in Tableau. 5
Tableau offers a wide range of charts to visualize data effectively. Here's an
outline of some of the top charts frequently used in Tableau:
1. Bar Chart:
 Displays categorical data using horizontal or vertical bars.
 Suitable for comparing categories or showing trends over time.
2. Line Chart:
 Shows trends or changes in data over continuous intervals (e.g., time
series).
 Useful for visualizing trends, fluctuations, or correlations.
3. Pie Chart:
 Represents parts of a whole by dividing a circle into sectors.
 Effective for displaying proportions or percentages of categorical
data.
4. Scatter Plot:
 Represents individual data points on a two-dimensional plane.
 Helpful for visualizing relationships or correlations between two
numerical variables.
5. Area Chart:
 Similar to a line chart but fills the area below the line.
 Shows cumulative totals or proportions over time.
6. Histogram:
 Displays the distribution of numerical data by dividing it into bins or
intervals.
 Useful for understanding the frequency or distribution of continuous data.
7. Heat Map:
 Represents data values with colors in a matrix format.
 Ideal for displaying data density or patterns across two categorical
dimensions.
8. Tree Map:
 Visualizes hierarchical data using nested rectangles, where the size of
each rectangle represents a quantitative value.
 Useful for displaying part-to-whole relationships in a hierarchical
structure.
9. Box-and-Whisker Plot (Boxplot):
 Shows the distribution of numerical data through quartiles (median,
quartiles, outliers).
 Helps in understanding the spread and central tendency of the data.
10. Gantt Chart:
 Represents project schedules or timelines, displaying tasks, durations,
and progress bars.
 Useful for project management and scheduling purposes.
11. Bullet Graph:
 Combines a bar chart with additional markers to compare actual and
target values.
 Efficiently represents progress towards a goal or performance metrics.
These charts are just a glimpse of the diverse range of visualization options
available in Tableau. Choosing the right chart type depends on the data
characteristics, the story you want to convey, and the insights you aim to
extract from the data. Tableau's intuitive interface and extensive chart options
empower users to create impactful and insightful visualizations tailored to
their data needs.
