A47E1-DA-R20 Nov 2023 scheme and Key solutions

Uploaded by NEMANI SRINITYA

B.V. Raju Institute of Technology
Vishnupur, Narsapur, Medak Dist – 502313 (UGC-Autonomous)
Computer Science and Engineering

ANSWER KEY and SCHEME

IV B.Tech I Semester Regular/Supplementary Examinations, November 2023
DATA ANALYTICS
(Computer Science and Engineering)
Subject: DATA ANALYTICS
Code: A47E1
Regulation: R20
Q.no Question Marks
PART-A
1 a. What is Semi-Structured Data? 2
Semi-structured data is information that doesn't conform to the structure of
traditional databases but contains some organizational properties, often
characterized by flexible schemas or tags, allowing for varying structures within the
same dataset.
1 b. Outline the role of probability distribution in statistics. 2
Probability distributions in statistics define the possible outcomes of a random
variable, providing a structured framework to understand and predict the likelihood
of various events occurring within a given context.
1 c. Tell the use of data visualization. 2
Data visualization facilitates the exploration, communication, and interpretation of
complex data by presenting it in visual formats, aiding in understanding patterns,
relationships, and insights for informed decision-making.
1 d. Define Classification. 2
Classification is a machine learning technique that categorizes data into distinct
classes or categories based on patterns and features, enabling predictive analysis and
decision-making.
1 e. Explain the importance of Spline. 2
Splines, in data analysis and curve fitting, are crucial for creating smooth, continuous
curves from scattered data points, aiding in interpolation, smoothing, and
approximating complex relationships within datasets.
PART-B
2 a. Explain about the Characteristics of Data. 5
The characteristics of data are commonly described through the "Vs of Big Data,"
which encompass:
Volume: The sheer size of data, often the scale or quantity of information generated
or stored.
Velocity: The speed at which data is generated, collected, or processed, emphasizing
the rapid influx and analysis of data in real-time.
Variety: The diversity and range of data types, including structured, unstructured, or
semi-structured data.
Veracity: The accuracy and trustworthiness of data, ensuring its reliability for analysis
and decision-making.
Value: The significance or usefulness of the data in achieving insights, innovation, or
practical outcomes for businesses or applications.
These "Vs" help define the key characteristics and challenges involved in managing,
analyzing, and deriving value from diverse and extensive datasets.
2 b. Compare Structured and Unstructured data. 5
Structured data and unstructured data differ significantly in terms of their
organization, format, and usability:
Structured Data:
Organization: Follows a predefined model or schema, typically stored in databases
with a clear format.
Format: Arranged in rows and columns, facilitating easy search, access, and analysis.
Usability: Well-suited for traditional database systems and easier to process with
standard querying and analysis methods.
Examples: Excel sheets, relational databases, organized text documents, etc.
Unstructured Data:
Organization: Lacks a predefined structure or fixed format, often stored in various
sources like text, images, videos, or social media.
Format: Not easily organized, making it more challenging to search, access, or analyze
directly.
Usability: Requires advanced tools and techniques like natural language processing
or machine learning for analysis and interpretation.
Examples: Text documents, images, audio, social media feeds, etc.
The distinction between structured and unstructured data lies in their organization
and accessibility. While structured data is orderly and easily processable,
unstructured data is more diverse and requires specialized methods for analysis and
interpretation.
3 a. List and explain Business Problems solved by Data Analytics 5
Data analytics helps solve a range of business problems. Some common applications
are listed below:
Market Segmentation: Analyzing customer data to identify distinct segments,
enabling targeted marketing strategies tailored to specific consumer groups.
Predictive Analytics: Forecasting trends, customer behavior, and market demands to
anticipate future needs and make proactive decisions.
Customer Retention: Utilizing data to understand customer preferences,
satisfaction, and churn patterns to implement strategies for retaining customers.
Operational Efficiency: Optimizing processes, supply chain management, and
resource allocation based on data insights to enhance efficiency and reduce costs.
Risk Management: Using data to assess and mitigate risks by analyzing patterns,
anomalies, and trends, enhancing decision-making for risk reduction.
Product Development: Leveraging data to understand consumer feedback,
preferences, and market gaps, aiding in the development of new products or
improving existing ones.
Fraud Detection: Employing analytics to identify irregularities, unusual patterns, or
anomalies in financial or transactional data, thus mitigating fraud risks.
Optimizing Marketing Campaigns: Analyzing campaign performance, customer
response, and engagement metrics to refine marketing strategies for better
outcomes.
Personalization: Creating personalized experiences for customers by using data to
tailor services, products, or interactions based on individual preferences.
Supply Chain Optimization: Utilizing data to improve inventory management,
logistics, and demand forecasting for a streamlined and cost-effective supply chain.
Data analytics plays a pivotal role in addressing these business challenges by
providing insights, patterns, and trends that support informed decision-making and
strategy formulation.
3 b. Explain the need of Data Analytics. 5
The need for data analytics arises from the increasing volume, complexity, and
importance of data in various domains. Here are some key reasons why data
analytics is crucial:

1. Informed Decision-Making: Data analytics provides valuable insights, trends, and
patterns, empowering organizations to make informed and strategic decisions
based on evidence rather than intuition.
2. Competitive Advantage: In a data-driven world, organizations that harness the
power of analytics gain a competitive edge by identifying market trends,
understanding customer behavior, and optimizing operations.
3. Business Growth: Analytics helps identify new opportunities for growth by
uncovering market gaps, optimizing processes, and enabling innovation in products
and services.
4. Customer Understanding: Analyzing customer data allows businesses to
understand customer preferences, behavior, and expectations, facilitating
personalized and targeted marketing strategies.
5. Risk Management: Data analytics assists in identifying and mitigating risks by
detecting anomalies, patterns, or early indicators that might impact business
operations or finances.
6. Operational Efficiency: Optimization of processes, resource allocation, and
supply chain management based on data insights results in improved operational
efficiency and cost reduction.
7. Fraud Detection and Security: Analyzing patterns in data helps detect and
prevent fraudulent activities, enhancing security measures and safeguarding
sensitive information.
8. Healthcare and Research: In fields like healthcare, data analytics aids in medical
research, personalized medicine, and patient care optimization by analyzing vast
amounts of clinical and biological data.
9. Improved Forecasting: Analytics enables accurate forecasting in areas such as
demand, sales, and financial performance, helping organizations plan and allocate
resources effectively.
10. Enhanced Customer Experience: By analyzing customer interactions and
feedback, businesses can enhance the overall customer experience by tailoring
products, services, and support to meet customer expectations.
In essence, data analytics transforms raw data into actionable insights, driving
efficiency, innovation, and competitiveness across various industries and sectors.
4 a. Discuss about Discrete and Uniform Distributions with example. 5
Discrete Distribution: A discrete distribution describes the probability of distinct,
separate outcomes in a sample space. Each outcome has an associated probability,
and the probabilities sum up to 1. Examples of discrete distributions include the
binomial distribution and the Poisson distribution.
Uniform Distribution: A uniform distribution is a type of probability distribution
where each possible outcome has an equal probability of occurring. In other words,
all events are equally likely. The probability mass function for a discrete uniform
distribution is given by P(X = x) = 1/n, where 'n' is the number of possible outcomes.
Example: Let's consider a fair six-sided die as an example of both a discrete and a
uniform distribution.
Discrete Distribution:
Sample Space (Outcomes): {1, 2, 3, 4, 5, 6}
Probability of Each Outcome: P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = 1/6
Probability Distribution: Discrete, as each outcome is separate and distinct.
Uniform Distribution:
Probability of Each Outcome: P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X =
6) = 1/6
Probability Distribution: Uniform, as each outcome has an equal probability of 1/6.
In this example, the probability of rolling any specific number on a fair six-sided die is
the same for each outcome (uniform distribution), and each outcome is distinct and
separate (discrete distribution).
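The die example can be checked with a short Python sketch. It uses exact fractions to build the probability mass function P(X = x) = 1/n and confirms that the probabilities sum to 1. The expected value and the "even number" event are added here purely for illustration and are not part of the original answer.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (taken from the example above).
outcomes = [1, 2, 3, 4, 5, 6]
n = len(outcomes)

# Discrete uniform PMF: P(X = x) = 1/n for every outcome x.
pmf = {x: Fraction(1, n) for x in outcomes}

# The probabilities of any valid distribution must sum to 1.
total = sum(pmf.values())

# Expected value E[X] = sum of x * P(X = x) (illustrative addition).
expected = sum(x * p for x, p in pmf.items())

# Probability of an event, here "rolling an even number" (illustrative addition).
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)

print("Sum of probabilities:", total)
print("Expected value:", expected)
print("P(even):", p_even)
```

Using Fraction instead of floating-point numbers keeps the check exact, so the sum is exactly 1 rather than approximately 1.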

4 b. Illustrate the Binomial Distribution. 5

5 a. Compare Correlation and Covariance. 5

Correlation and covariance both measure relationships between variables, but they
differ in scale and interpretation:
Covariance:
Definition: Covariance measures the degree to which two random variables vary
together.
Scale: It can take any value, positive, negative, or zero, indicating the direction
(positive or negative) and magnitude of the linear relationship between variables.
Interpretation: The magnitude of covariance doesn't have a standardized scale,
making it challenging to interpret. A positive covariance indicates a direct
relationship, while a negative covariance indicates an inverse relationship.
Correlation:
Definition: Correlation is a standardized measure of the strength and direction of the
linear relationship between two variables.
Scale: Ranging between -1 and 1, correlation provides a more interpretable and
standardized measure compared to covariance. A correlation of +1 signifies a perfect
positive linear relationship, -1 indicates a perfect negative linear relationship, and 0
implies no linear relationship.
Interpretation: Correlation allows for better comparison between different pairs of
variables as it's standardized, making it easier to interpret the strength and direction
of the relationship.
In summary, covariance measures the direction and magnitude of the relationship
between variables but lacks standardization, while correlation provides a
standardized measure, making it easier to interpret and compare relationships across
different pairs of variables.
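The difference in scale can be seen in a small Python sketch. The data (hours studied and exam scores) are hypothetical, and the helper names covariance and correlation are written here for illustration, assuming the sample (n - 1) form of covariance and the Pearson correlation coefficient. Changing the unit of one variable from hours to minutes changes the covariance but leaves the correlation unchanged.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data, invented for this illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

# The same study time expressed in minutes instead of hours.
minutes = [h * 60 for h in hours]

cov_hours = covariance(hours, scores)
cov_minutes = covariance(minutes, scores)
r_hours = correlation(hours, scores)
r_minutes = correlation(minutes, scores)

print("Covariance (hours):  ", cov_hours)
print("Covariance (minutes):", cov_minutes)
print("Correlation (hours):  ", round(r_hours, 4))
print("Correlation (minutes):", round(r_minutes, 4))
```

The covariance grows by a factor of 60 with the change of unit, while the correlation stays the same, which is why correlation is the easier quantity to compare across pairs of variables.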
5 b. Examine the Hypothesis Testing using ANOVA 5
ANOVA (Analysis of Variance) is a statistical method used to compare the means of
three or more groups to determine if there are statistically significant differences
among them. It's a hypothesis testing technique that evaluates whether the means
of several groups are equal or if at least one of the group means differs significantly
from the others.
Here are the key steps involved in hypothesis testing using ANOVA:
1. Formulate Hypotheses:
Null Hypothesis (H₀): Assumes that all group means are equal.
Alternative Hypothesis (H₁): Suggests that at least one group mean is different from
the others.
2. Collect Data:
Obtain data from multiple groups or treatments.
3. Calculate Variability:
ANOVA assesses both the variation within each group and the variation between
groups.
It computes the F-statistic as the ratio of the between-group variance to
the within-group variance.
4. Determine Significance:
Use the F-statistic to calculate the p-value.
If the p-value is below a predetermined significance level (commonly 0.05), the null
hypothesis is rejected, indicating that at least one group mean differs significantly
from the others.
If the p-value is higher than the significance level, the null hypothesis is retained,
suggesting no significant differences among group means.
5. Post-hoc Tests (if necessary):
If ANOVA indicates significant differences among group means, further pairwise
comparisons (e.g., Tukey's HSD or Bonferroni test) can be conducted to identify which
specific groups differ from each other.
6. Interpretation:
If the null hypothesis is rejected, it indicates that there is enough evidence to
conclude that at least one group mean is significantly different from the others. The
nature of this difference and which specific groups differ can be explored through
post-hoc analyses.
ANOVA is applicable in various fields, such as experimental research, social sciences,
and manufacturing, to compare means across multiple groups efficiently and
ascertain if these differences are statistically significant.
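The variability calculation in step 3 can be sketched in Python for a one-way ANOVA. The three groups below are hypothetical values chosen so the arithmetic is easy to follow. The sketch stops at the F-statistic and compares it with a tabulated critical value instead of computing a p-value, since the p-value needs the F-distribution (for example from a statistics library), which is outside this plain-Python illustration.

```python
# Hypothetical measurements for three groups, invented for this illustration.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [8, 9, 10],
}

def mean(values):
    return sum(values) / len(values)

all_values = [v for vals in groups.values() for v in vals]
grand_mean = mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                 for vals in groups.values())

# Within-group sum of squares: spread of observations around their group mean.
ss_within = sum((v - mean(vals)) ** 2
                for vals in groups.values() for v in vals)

# Mean squares use k - 1 and N - k degrees of freedom.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_stat = ms_between / ms_within

# Critical value for alpha = 0.05 with (2, 6) degrees of freedom,
# read from a standard F table (an assumption of this sketch).
f_critical = 5.14
reject_null = f_stat > f_critical

print("F-statistic:", f_stat)
print("Reject H0 at the 0.05 level:", reject_null)
```

Here the group means are 5, 7, and 9, so the between-group mean square is 12 and the within-group mean square is 1, giving F = 12. Because this exceeds the tabulated critical value, the null hypothesis of equal means would be rejected for this made-up data.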

6 a. Explain Decision Tree algorithm with an example. 5

The Decision Tree algorithm is a supervised machine learning technique used for both
classification and regression tasks. It creates a tree-like structure where each internal
node represents a decision based on a feature, each branch represents an outcome
of that decision, and each leaf node represents a class label or a numerical value.
Let's illustrate the Decision Tree algorithm with a classification example:
Example: Predicting Play Tennis
Consider a dataset that predicts whether to play tennis based on weather conditions.
Outlook   Temperature  Humidity  Windy  Play Tennis
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rain      Mild         High      False  Yes
Rain      Cool         Normal    False  Yes
Rain      Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rain      Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rain      Mild         High      True   No
Building the Decision Tree:
Choose the Root Node:
Select the feature that best splits the data. (E.g., Outlook)
Split the Data:
Split the dataset based on the chosen feature.
Create Subtrees:
For each branch (e.g., Sunny, Overcast, Rain), continue recursively splitting the data
based on the next best feature until a stopping criterion is met (e.g., maximum depth
reached, no further improvement in information gain, etc.).
Assign Class Labels:
At each leaf node, assign the majority class label of the instances falling into that
node.
Decision Tree for Play Tennis Example:
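As a minimal sketch, assuming entropy-based information gain (as in the ID3 approach) is the splitting criterion, the choice of root node for the table above can be checked in Python. The helper names entropy and information_gain are written for this illustration and do not come from any library.

```python
from collections import Counter
from math import log2

columns = ["Outlook", "Temperature", "Humidity", "Windy", "Play Tennis"]
rows = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rain", "Mild", "High", False, "Yes"),
    ("Rain", "Cool", "Normal", False, "Yes"),
    ("Rain", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rain", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rain", "Mild", "High", True, "No"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(data, feature_index):
    """Entropy of the labels minus the weighted entropy after splitting."""
    labels = [row[-1] for row in data]
    partitions = {}
    for row in data:
        partitions.setdefault(row[feature_index], []).append(row[-1])
    remainder = sum(len(part) / len(data) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Information gain of each candidate feature for the root node.
gains = {columns[i]: information_gain(rows, i) for i in range(4)}
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Root node:", root)
```

Under this criterion Outlook has the highest gain, which agrees with the root chosen in the steps above. The Overcast branch contains only "Yes" rows, so it becomes a leaf immediately, while the Sunny and Rain branches would be split further.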

6 b. Explain about Naïve Bayes Classifier. 5

7 a. Discuss about Random Forest algorithm with an example. 5

The Random Forest algorithm is an ensemble learning method used for both
classification and regression tasks. It operates by constructing multiple decision trees
during training and outputs the class that is the mode of the classes (classification) or
mean prediction (regression) of the individual trees.
Key Concepts:
Ensemble Learning:
Random Forest is an ensemble method that combines multiple individual models
(decision trees) to make more accurate and robust predictions.
Bagging (Bootstrap Aggregating):
It creates multiple subsets of the original dataset by random sampling with
replacement (bootstrap samples).
Each subset is used to train a separate decision tree.
Random Feature Selection:
At each node of the tree, a random subset of features is considered for splitting,
improving diversity among trees.
How Random Forest Works:
Bootstrapping:
Randomly select subsets of the original dataset (with replacement) to create multiple
training sets.
Build Decision Trees:
For each subset, construct a decision tree.
At each node, consider only a random subset of features for splitting.
Voting (Classification) or Averaging (Regression):
For classification, each tree "votes" for the class, and the class with the most votes
becomes the final prediction.
For regression, predictions from each tree are averaged to obtain the final prediction.
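The three steps above can be sketched in plain Python. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a depth-1 tree (a decision stump), so the random feature subset is drawn once per tree because there is only one node. The data set, the function names, and the parameter defaults are all hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Depth-1 tree: best (feature, threshold) split among the given features."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        lab = majority(y)
        return (0, float("inf"), lab, lab)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrapping: sample rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Random feature selection for this tree's single node.
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Classification by majority vote over all trees."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)

# Hypothetical two-feature data: class 0 has small values, class 1 large ones.
X = [(1, 2), (2, 1), (3, 3), (2, 2), (1, 3),
     (7, 8), (8, 7), (9, 9), (8, 8), (7, 9)]
y = [0] * 5 + [1] * 5

forest = fit_forest(X, y)
pred_low = predict(forest, (1, 1))
pred_high = predict(forest, (9, 9))
print("Prediction for (1, 1):", pred_low)
print("Prediction for (9, 9):", pred_high)
```

For regression, the same structure would store a numeric value in each leaf and replace the majority vote in predict with the mean of the tree outputs.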

7 b. Illustrate the K-Means Clustering with an example. 5

K-Means is an unsupervised machine learning algorithm used for clustering data into
K distinct clusters. Let's illustrate K-Means clustering with a simple example:
Example: Customer Segmentation
Consider a dataset of customers based on their annual income and spending score.
Customer ID  Annual Income (k$)  Spending Score (1-100)
1            15                  39
2            15                  81
3            16                  6
4            16                  77
5            17                  40
...          ...                 ...
Steps for K-Means Clustering:
1. Choose the Number of Clusters (K):
Based on the data or domain knowledge, determine the optimal number of clusters
to create.
2. Initialize Cluster Centroids:
Randomly select K data points as initial centroids.
3. Assign Data Points to Nearest Centroids:
Calculate the distance (e.g., Euclidean distance) between each data point and each
centroid.
Assign each data point to the nearest centroid, forming K clusters.
4. Update Centroids:
Recalculate the centroids for each cluster by taking the mean of all data points
assigned to that cluster.
5. Repeat Steps 3 and 4:
Iterate the assignment and centroid update steps until convergence (when centroids
no longer change significantly or after a certain number of iterations).
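The assignment and update steps can be sketched in plain Python using the five customers listed in the table. The function name kmeans and its parameters are hypothetical. K = 2 and the fixed initial centroids (customers 1 and 2) are assumptions made so that the run is reproducible; when no initial centroids are passed, the sketch falls back to random selection as described in the steps.

```python
import random
from math import dist

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain K-Means with Euclidean distance. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: given centroids, or K randomly chosen data points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# (annual income in k$, spending score) for customers 1 to 5 from the table.
customers = [(15, 39), (15, 81), (16, 6), (16, 77), (17, 40)]

# Assumption: K = 2, starting from customers 1 and 2 as initial centroids.
centroids, labels = kmeans(customers, k=2, init=[customers[0], customers[1]])

print("Cluster labels:", labels)
print("Centroids:", centroids)
```

With this starting point the customers with high spending scores (2 and 4) form one cluster and the remaining three form the other. A different initialization can converge to a different grouping, which is why K-Means is often run several times with different starting centroids.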

8 a. Examine the read_csv() and read_table() functions in pandas with an example. 5

Both read_csv() and read_table() functions in the Pandas library are used to read
tabular data into a Pandas DataFrame. The primary difference lies in the default
delimiter they use for parsing the data.
read_csv() is designed for comma-separated values (CSV) files. Its default delimiter
is a comma, but it can handle other delimiters through the sep parameter.
read_table() is a more general function for delimited tabular data. Its default
delimiter is a tab (\t), and it can likewise be changed through the sep parameter.
Here's an example illustrating both functions:
Suppose we have a file named "data.txt" with the following content separated by
tabs:
Name Age City
Alice 25 New York
Bob 30 San Francisco
Charlie 35 Seattle
Using read_csv():
import pandas as pd

# Using read_csv with an explicit tab delimiter
data_csv = pd.read_csv('data.txt', sep='\t')
print("Data Read Using read_csv:")
print(data_csv)

Using read_table():
import pandas as pd

# Using read_table with tab delimiter

data_table = pd.read_table('data.txt')

print("Data Read Using read_table:")

print(data_table)
Both functions will read the content of the "data.txt" file into a DataFrame. In this
case, since the data is separated by tabs, both functions will produce the same
output.
These functions are versatile and offer various parameters to handle different file
formats, headers, handling missing values, and more. The choice between read_csv()
and read_table() depends mainly on the specific data file's delimiter and the user's
preference.

8 b. Explain about the Bar chart with an example. 5

A bar chart is a graphical representation used to display categorical data with
rectangular bars. The length or height of each bar is proportional to the value it
represents. It's an effective way to compare categories or show changes over time for
discrete categories.
Let's create a simple example of a bar chart using Python's Matplotlib library to
visualize the sales data for different products:

import matplotlib.pyplot as plt

# Data for products and their sales

products = ['Product A', 'Product B', 'Product C', 'Product D']
sales = [350, 420, 300, 500]

# Create a bar chart

plt.figure(figsize=(8, 6)) # Set the figure size (optional)
plt.bar(products, sales, color='skyblue')

# Add labels and title

plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Sales of Different Products')

# Tidy the layout and show the plot

plt.xticks(rotation=45) # Rotate x-axis labels for better readability
plt.tight_layout() # Adjust layout to prevent clipping of labels
plt.show()
9 a. Define Boxplot and explain it with an example. 5

A boxplot, also known as a box-and-whisker plot, is a graphical representation that
displays the distribution of a dataset along with its key statistical properties. It
provides a visual summary of the central tendency, variability, and potential outliers
within the data.
Here are the components of a boxplot:
Median (Q2): The middle value of the dataset, dividing it into two equal halves.
Quartiles (Q1 and Q3): The first quartile (Q1) represents the 25th percentile, and
the third quartile (Q3) represents the 75th percentile of the data.
Interquartile Range (IQR): The range between Q1 and Q3, indicating the spread of
the middle 50% of the data.
Whiskers: Lines extending from the box that represent the range of the data,
excluding outliers.
Outliers: Data points that fall outside the whiskers' range, considered potential
anomalies or extreme values.
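The components listed above can be computed by hand in a short Python sketch. The scores are hypothetical, and the helper percentile is written for this illustration using linear interpolation between neighbouring values. The 1.5 x IQR rule for the whiskers is an assumption of this sketch (it is a common convention and Matplotlib's default), since the answer above does not state how the whisker range is defined.

```python
def percentile(sorted_data, p):
    """Percentile by linear interpolation between neighbours (p in [0, 1])."""
    pos = p * (len(sorted_data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (pos - lo) * (sorted_data[hi] - sorted_data[lo])

# Hypothetical test scores, invented for this illustration.
scores = sorted([52, 60, 61, 65, 68, 70, 72, 75, 78, 80, 99])

q1 = percentile(scores, 0.25)      # first quartile (25th percentile)
median = percentile(scores, 0.50)  # second quartile (Q2)
q3 = percentile(scores, 0.75)      # third quartile (75th percentile)
iqr = q3 - q1                      # spread of the middle 50% of the data

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme data points that are still inside the fences.
inside = [s for s in scores if lower_fence <= s <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
```

For this data the box spans 63.0 to 76.5 with the median at 70.0, the whiskers end at 52 and 80, and the score of 99 is flagged as an outlier.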
Example of Creating a Boxplot using Python (Matplotlib):
Suppose we have a dataset representing the scores of students in a test:

import matplotlib.pyplot as plt

import numpy as np

# Generate random scores data

np.random.seed(42)
scores = np.random.normal(70, 10, 100) # Mean=70, Std. Deviation=10, 100 samples
# Create a boxplot
plt.figure(figsize=(6, 5)) # Set figure size (optional)
plt.boxplot(scores, vert=False, patch_artist=True,
boxprops=dict(facecolor='lightblue'))
# Add labels and title
plt.xlabel('Scores')
plt.title('Distribution of Test Scores')
# Show the plot
plt.tight_layout()
plt.show()
9 b. Construct a python program to plot the Pie Chart. 5

import matplotlib.pyplot as plt
# Data for the pie chart
labels = ['A', 'B', 'C', 'D']
sizes = [25, 30, 20, 25] # Sizes or proportions for each category
# Create a pie chart
plt.figure(figsize=(8, 6)) # Set the figure size (optional)
plt.pie(sizes, labels=labels, autopct='%1.1f%%',
        startangle=140, colors=['skyblue', 'lightgreen',
                                'lightcoral', 'lightsalmon'])
# Add title
plt.title('Distribution of Categories')
# Show the plot
plt.axis('equal') # Equal aspect ratio ensures that the pie is drawn as a circle.
plt.show()

10 a. Explain the architecture of Tableau, with a brief outline. 5

Tableau's architecture involves various components that work together to create a
data visualization environment. Here's a brief outline:
Data Sources:
Tableau can connect to various data sources, including databases, files, cloud
services, and live data connections.
Tableau Desktop:
The primary tool for creating visualizations. It allows users to connect to data sources,
create dashboards, and design visualizations through a drag-and-drop interface.
Tableau Server:
It is a centralized server-based platform where Tableau workbooks and visualizations
are published and shared across an organization. Users can access these resources
via a web browser or Tableau Desktop.
Tableau Online:
A cloud-based version of Tableau Server, allowing users to publish and share
visualizations on the cloud.
Tableau Prep:
A data preparation tool that helps in cleaning, transforming, and shaping data before
visualization, making it easier to work with various data sources.
Data Engine:
Tableau's in-memory data engine enables fast querying and analysis of large
datasets. It stores aggregated data for quicker access during visualizations.
Gateway and Load Balancer:
These components manage the communication between Tableau clients and the
server, ensuring secure and efficient data transfer.
Clients (Web and Mobile):
Users access Tableau visualizations through web browsers, Tableau Desktop, or
mobile devices, allowing them to interact with and explore visualizations.
Metadata Repository:
Stores metadata related to users, permissions, data connections, and workbooks,
facilitating governance and security.
Extensions and APIs:
Tableau supports extensions and APIs that allow integration with other applications,
custom visualizations, and extending functionality.
Tableau's architecture is designed to provide a user-friendly and interactive
environment for data exploration, analysis, and sharing insights across an
organization. It emphasizes ease of use, scalability, and flexibility in handling diverse
data sources and creating impactful visualizations.
10 b. Give the steps involved in connecting data in Tableau 5
Connecting data in Tableau involves several steps:

1. Launch Tableau:
- Open Tableau Desktop or Tableau Server/Online and start a new workbook
or project.
2. Select Data Source:
- Click on the "Connect" option in Tableau. A list of available data connection
options will be displayed.
3. Choose Data Connection Type:
- Select the appropriate data connection type based on your data source.
Options include files (Excel, CSV), databases (MySQL, SQL Server), cloud
services (Google BigQuery, Amazon Redshift), and more.
4. Connect to Data:
- Once the connection type is selected, you'll be prompted to provide specific
details based on the chosen source (e.g., file path, server name,
credentials).
5. Select Tables or Files:
- For database connections, select the tables or views you want to work with.
- For file-based connections, navigate to and select the file(s) containing the
data.
6. Data Preparation:
- After connecting, Tableau might offer options for data preparation. Here,
you can perform initial data cleansing, filtering, or aggregation before
starting your analysis.
7. Data Refresh (Optional):
- For live connections, you might have options to set up data refresh
schedules to keep the data up-to-date.
- For extracts (in-memory data), choose how frequently you want to refresh
the extract.
8. Start Visualization:
- Once the data is connected and prepared, drag and drop fields from the
data pane to the canvas to create visualizations like charts, graphs, or
dashboards.
9. Explore and Analyze:
- Utilize Tableau's features to explore the data, create calculated fields, apply
filters, and generate insights through interactive visualizations.
10. Save and Share:
- Save the workbook or project with the connected data. If using Tableau
Server or Tableau Online, publish the workbook to share it with others in
your organization.
These steps may vary slightly based on the specific data source or the version of
Tableau you're using. Tableau provides a user-friendly interface for connecting to
various data sources and seamlessly creating visualizations for data analysis and
storytelling.
11 a. Mention the details of Interface of Tableau 5
Tableau offers a user-friendly interface designed to facilitate data exploration,
visualization creation, and analysis. Here are the key components of the Tableau
interface:
1. Data Pane:
- Displays the data source(s) connected to Tableau. It shows tables,
dimensions, measures, and calculated fields available for use in
visualizations.
2. Shelves:
- Columns, Rows, Pages, and Filters shelves: These shelves are used to drag
and drop fields from the Data Pane to define the structure of visualizations.
3. Canvas:
- The main area where visualizations (charts, graphs, maps) are created by
dragging fields onto Rows, Columns, and other shelves. It displays the live
preview of the created visualization.
4. Marks Pane:
- Allows users to control the appearance and properties of the marks in a
visualization (e.g., color, size, label).
5. Show Me Pane:
- Provides a visual guide with suggestions for different types of visualizations
based on the fields selected. It helps users quickly generate suitable chart
types.
6. Toolbar:
- Contains various tools and options to interact with and customize the
visualization. It includes options for data connectivity, saving, formatting,
undo/redo, and sharing.
7. Worksheet/Dashboard Tabs:
- Tabs for different worksheets or dashboards within a workbook. Users can
switch between different sheets or dashboards to work on multiple
visualizations.
8. Navigation Pane:
- Allows navigation between different worksheets, dashboards, stories, and
data sources within a project.
9. Status Bar:
- Displays information about the data connection status, data refresh, and
other updates related to the workbook or project.
10. Server/Online Features:
- Additional features and options appear for Tableau Server or Tableau
Online users, such as publishing, permissions, collaboration, and access to
shared content.
Tableau's interface aims to be intuitive, offering a drag-and-drop environment that
allows users to explore data, create insightful visualizations, and tell stories with
data effectively. The interface elements are organized to facilitate a seamless
workflow from data connection to visualization and sharing.
11 b. Give an outline about Top charts used in Tableau. 5
Tableau offers a wide range of charts to visualize data effectively. Here's an
outline of some of the top charts frequently used in Tableau:
1. Bar Chart:
- Displays categorical data using horizontal or vertical bars.
- Suitable for comparing categories or showing trends over time.
2. Line Chart:
- Shows trends or changes in data over continuous intervals (e.g., time
series).
- Useful for visualizing trends, fluctuations, or correlations.
3. Pie Chart:
- Represents parts of a whole by dividing a circle into sectors.
- Effective for displaying proportions or percentages of categorical
data.
4. Scatter Plot:
- Represents individual data points on a two-dimensional plane.
- Helpful for visualizing relationships or correlations between two
numerical variables.
5. Area Chart:
- Similar to a line chart but fills the area below the line.
- Shows cumulative totals or proportions over time.
6. Histogram:
- Displays the distribution of numerical data by dividing it into bins or
intervals.
• Useful for understanding the frequency or distribution of continuous
data.
7. Heat Map:
• Represents data values with colors in a matrix format.
• Ideal for displaying data density or patterns across two categorical
dimensions.
8. Tree Map:
• Visualizes hierarchical data using nested rectangles, where the size of
each rectangle represents a quantitative value.
• Useful for displaying part-to-whole relationships in a hierarchical
structure.
9. Box-and-Whisker Plot (Boxplot):
• Shows the distribution of numerical data through its median and
quartiles, with whiskers showing the range and outliers plotted as
individual points.
• Helps in understanding the spread and central tendency of the data.
10. Gantt Chart:
• Represents project schedules or timelines, displaying tasks, durations,
and progress bars.
• Useful for project management and scheduling purposes.
11. Bullet Graph:
• Combines a bar chart with additional markers to compare actual and
target values.
• Efficiently represents progress towards a goal or performance metrics.
These charts are just a glimpse of the diverse range of visualization options
available in Tableau. Choosing the right chart type depends on the data
characteristics, the story you want to convey, and the insights you aim to
extract from the data. Tableau's intuitive interface and extensive chart options
empower users to create impactful and insightful visualizations tailored to
their data needs.
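
The histogram and box plot described in the list above both rest on simple summaries of the data. The following Python sketch is not Tableau code; it only illustrates the numbers such charts are built from. The data values, the bin width, and the function names are hypothetical, and the 1.5 x IQR outlier rule is the common textbook convention, assumed here for illustration.

```python
import statistics

# Hypothetical order values, used only for illustration
values = [35, 42, 47, 51, 55, 58, 60, 62, 65, 68, 71, 75, 79, 84, 120]


def histogram_counts(data, bin_width, start):
    """Count how many values fall in each bin [lo, lo + bin_width)."""
    counts = {}
    for v in data:
        # Lower edge of the bin that contains v
        lo = start + ((v - start) // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


def quartiles(data):
    """Return (Q1, median, Q3) using the inclusive quantile method."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, median, q3


def outliers(data):
    """Values beyond 1.5 * IQR from the quartiles (assumed convention)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < low or v > high]


bins = histogram_counts(values, bin_width=20, start=20)
q1, median, q3 = quartiles(values)
flagged = outliers(values)

print("Histogram bin counts:", bins)
print("Q1, median, Q3:", q1, median, q3)
print("Outliers:", flagged)
```

Each key in the histogram result is the lower edge of a bin and each value is the height of the corresponding bar; the quartiles give the box of a box plot, and the flagged values are the points that would be drawn individually beyond the whiskers.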
