DEV Lab Record
List of Experiments
Staff Incharge
Ex.No: 01
Installation of the Data Analysis and Visualization Tool: Power BI
AIM
ALGORITHM
PROGRAM
1. Download and Install Power BI Desktop
1. Download Power BI Desktop:
o Go to the Power BI Desktop download page.
o Click on “Download Free” to start the download.
2. Install Power BI Desktop:
o Locate the downloaded installer file (usually in your
Downloads folder).
o Double-click the file to start the installation process.
o Follow the on-screen instructions to complete the installation.
You may need to accept the license agreement and choose an
installation location.
3. Launch Power BI Desktop:
o After installation, open Power BI Desktop from the Start menu
or desktop shortcut.
2. Initial Configuration
1. Configure Initial Settings:
o When Power BI Desktop opens for the first time, you may be
prompted to sign in. You can sign in with a Microsoft account,
but this is optional for local use.
o Choose any initial settings based on your preferences or leave
them as defaults.
3. Import Data
1. Load Data:
o Click on the “Home” tab in the ribbon.
o Click “Get Data” to open the data source options.
o Choose the type of data source (e.g., Excel, CSV, SQL Server)
and follow the prompts to connect to your data source.
4. Create Visualizations
1. Add Visualizations:
o Once your data is loaded and cleaned, you can start creating
visualizations.
o In the “Report” view, drag and drop fields from your data onto
the report canvas.
o Choose from various visualization types (e.g., bar charts, line
charts, pie charts) from the “Visualizations” pane.
2. Customize Visualizations:
o Click on a visualization to configure its properties (e.g., axis
titles, colors).
o Use the “Format” pane to adjust visual settings.
3. Arrange Visualizations:
o Arrange and resize visualizations on the canvas to create a
meaningful report layout.
2. Share Reports:
o After publishing, you can share the report with others by
providing them access through the Power BI Service.
o Use sharing options available in Power BI Service to control
who can view or interact with the report.
RESULT
AIM
ALGORITHM
PROGRAM
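import pandas as pd
import matplotlib.pyplot as plt
# df is assumed to be a DataFrame of e-mails with a 'Sender' column
# (the loading step is missing from this record)
# Top 10 senders
top_senders = df['Sender'].value_counts().head(10)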
# Plotting the top senders
plt.figure(figsize=(10, 6))
top_senders.plot(kind='bar', color='skyblue')
plt.title('Top 10 Senders')
plt.xlabel('Sender')
plt.ylabel('Number of Emails')
plt.show()
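The program above relies on a DataFrame named df that already holds the e-mail data. The counting step can be tried on its own with a small made-up DataFrame; the addresses below are hypothetical and are not taken from the lab dataset.

```python
import pandas as pd

# Hypothetical sample data standing in for the real e-mail dataset.
df = pd.DataFrame({
    'Sender': ['alice@example.com', 'bob@example.com', 'alice@example.com',
               'carol@example.com', 'alice@example.com', 'bob@example.com'],
})

# value_counts() sorts by frequency in descending order,
# so head(n) returns the n most frequent senders.
top_senders = df['Sender'].value_counts().head(2)
print(top_senders)
```

The result is a Series indexed by sender, which is why it can be passed directly to .plot(kind='bar').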
OUTPUT
RESULT
AIM
ALGORITHM
PROGRAM
1. NumPy Arrays
pip install numpy
import numpy as np
# 1D Array
arr_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr_1d)
# 2D Array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr_2d)
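# Identity matrix (size assumed; the original definition is missing from this record)
identity = np.eye(3)
print("Identity Matrix:\n", identity)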
# Element-wise operations
arr_add = arr_1d + 10
arr_mult = arr_2d * 2
# Matrix operations
matrix_mult = np.dot(arr_2d, np.array([[1, 0], [0, 1], [1, 0]]))
print("Matrix Multiplication:\n", matrix_mult)
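The difference between element-wise and matrix operations is easiest to see from the resulting shapes. The sketch below reuses the arrays from the program above; the names doubled, other and product are introduced here only for illustration.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
other = np.array([[1, 0], [0, 1], [1, 0]])   # shape (3, 2)

# Element-wise: the scalar is broadcast to every element, shape is unchanged.
doubled = arr_2d * 2

# Matrix product: (2, 3) @ (3, 2) gives (2, 2).
# For 2-D arrays the @ operator is equivalent to np.dot.
product = arr_2d @ other

print(doubled.shape, product.shape)
print(product)
```

The inner dimensions (3 and 3) must match for the matrix product, whereas element-wise operations require equal or broadcastable shapes.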
2. Pandas DataFrames
pip install pandas
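import pandas as pd
# Sample data (assumed; the original definition is missing from this record)
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)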
print("DataFrame:\n", df)
# Basic statistics
print("Statistics:\n", df.describe(include='all'))
# Filtering rows
filtered_df = df[df['Age'] > 28]
print("Filtered DataFrame:\n", filtered_df)
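Row filtering extends naturally to several conditions. The following is a minimal sketch on hypothetical records; only the 'Age' column name comes from the program above, while 'Name', 'City' and all values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical records used only to demonstrate filtering.
df = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chitra', 'Dev'],
    'Age': [25, 31, 29, 42],
    'City': ['Chennai', 'Delhi', 'Chennai', 'Mumbai'],
})

# Each condition yields a boolean Series. Combine them with & (and) or | (or),
# and wrap each condition in parentheses.
mask = (df['Age'] > 28) & (df['City'] == 'Chennai')

# .loc selects the matching rows and, optionally, a subset of columns.
filtered = df.loc[mask, ['Name', 'Age']]
print(filtered)
```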
3. Basic Plots Using Matplotlib
pip install matplotlib
import matplotlib.pyplot as plt
# Line Plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Sine Wave', color='blue', linestyle='-')
plt.title('Line Plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.grid(True)
plt.show()
# Scatter Plot
x = np.random.rand(50)
y = np.random.rand(50)
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='red', alpha=0.5)
plt.title('Scatter Plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.grid(True)
plt.show()
# Histogram
data = np.random.randn(1000)
plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, edgecolor='black')
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
# Bar Plot
categories = ['A', 'B', 'C']
values = [10, 20, 15]
plt.figure(figsize=(10, 6))
plt.bar(categories, values, color='green')
plt.title('Bar Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.grid(True)
plt.show()
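The plots above each open in their own figure. As a possible variation, several plots can share one figure and be saved to a file instead of being shown. This sketch assumes a non-interactive backend ('Agg'); the output file name is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One figure holding two side-by-side axes.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(12, 5))

ax_line.plot(x, np.sin(x), color='blue', label='Sine Wave')
ax_line.set_title('Line Plot')
ax_line.legend()

ax_bar.bar(['A', 'B', 'C'], [10, 20, 15], color='green')
ax_bar.set_title('Bar Plot')

fig.tight_layout()
fig.savefig('basic_plots.png')  # hypothetical output file name
```

Working with explicit Axes objects (ax_line, ax_bar) rather than the plt.* functions makes it clear which plot each call modifies.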
OUTPUT
RESULT
AIM
ALGORITHM
PROGRAM
Step 1: Data Loading and Initial Inspection
# Load a sample dataset (e.g., the 'mtcars' dataset)
data(mtcars)
# Load ggplot2 for plotting (install once with install.packages("ggplot2"))
library(ggplot2)
# Scatter plot of Horsepower vs Miles Per Gallon
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = 'blue') +
  labs(title = "Horsepower vs Miles Per Gallon",
       x = "Horsepower", y = "Miles Per Gallon")
OUTPUT
RESULT
AIM
ALGORITHM
PROGRAM
pip install pandas numpy matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
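# Daily date range (dates assumed; the original definition is missing from this record)
date_rng = pd.date_range(start='2020-01-01', end='2020-12-31', freq='D')
data = pd.DataFrame(date_rng, columns=['date'])
data['value'] = (np.sin(np.linspace(0, 10, len(date_rng)))
                 + np.random.normal(0, 0.5, len(date_rng)))
data.set_index('date', inplace=True)  # dates as index, needed for data.index.month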
print(data.head())
plt.figure(figsize=(12, 6))
plt.plot(data.index, data['value'], label='Value')
plt.title('Time Series Plot')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()
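# Rolling statistics (window size assumed; the original computation is missing from this record)
data['rolling_mean'] = data['value'].rolling(window=12).mean()
data['rolling_std'] = data['value'].rolling(window=12).std()
plt.figure(figsize=(12, 6))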
plt.plot(data.index, data['value'], label='Value', alpha=0.5)
plt.plot(data.index, data['rolling_mean'], label='Rolling Mean', color='red')
plt.plot(data.index, data['rolling_std'], label='Rolling Std Dev', color='orange')
plt.title('Rolling Statistics')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()
from pandas.plotting import autocorrelation_plot
plt.figure(figsize=(12, 6))
autocorrelation_plot(data['value'])
plt.title('Autocorrelation Plot')
plt.grid(True)
plt.show()
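The rolling statistics and autocorrelation plotted above can also be inspected numerically. The sketch below is self-contained and uses assumed values: the date range, the 7-day window and the random seed are illustrative choices, not taken from the lab record.

```python
import numpy as np
import pandas as pd

# Assumed daily date range and a seeded generator for reproducible noise.
date_rng = pd.date_range(start='2023-01-01', periods=60, freq='D')
rng = np.random.default_rng(0)
values = (np.sin(np.linspace(0, 10, len(date_rng)))
          + rng.normal(0, 0.5, len(date_rng)))
series = pd.Series(values, index=date_rng, name='value')

window = 7  # illustrative window size
rolling_mean = series.rolling(window=window).mean()
rolling_std = series.rolling(window=window).std()

# The first window-1 entries are NaN because the window is not yet full.
print(rolling_mean.isna().sum())

# Lag-1 autocorrelation: correlation of the series with itself shifted by one step.
print(series.autocorr(lag=1))
```

A larger window gives a smoother rolling mean but leaves more leading NaN values and reacts more slowly to changes.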
data['month'] = data.index.month
# Plot by month
plt.figure(figsize=(12, 6))
for month in range(1, 13):
    monthly_data = data[data['month'] == month]
    plt.plot(monthly_data.index, monthly_data['value'], label=f'Month {month}')
plt.title('Seasonal Plot by Month')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()
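Besides plotting each month separately, the same month column can be used to summarise the series. This is a small sketch under assumed data: a one-year daily index with synthetic, steadily increasing values.

```python
import numpy as np
import pandas as pd

# Assumed one-year daily index; the values are synthetic (0, 1, 2, ...).
idx = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
data = pd.DataFrame({'value': np.arange(len(idx), dtype=float)}, index=idx)

# A DatetimeIndex exposes .month (1-12); a plain RangeIndex does not,
# so the dates must be the index (or be accessed through the .dt accessor).
data['month'] = data.index.month

# Average value per calendar month.
monthly_mean = data.groupby('month')['value'].mean()
print(monthly_mean)
```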
OUTPUT
RESULT
Ex.No: 06
Implementation of Data Analysis and Representation on a Map Using Various Map Data Sets
AIM
ALGORITHM
PROGRAM
pip install folium geopandas plotly pandas
import folium
import geopandas as gpd
import pandas as pd
# Load some data for analysis (e.g., population or any other metric)
# For this example, we'll use a simple dataset:
data = pd.DataFrame({
'iso_a3': ['USA', 'CAN', 'MEX'],
'population': [331002651, 37742154, 126190788]
})
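# Country boundaries (assumed source: the Natural Earth sample bundled with
# geopandas < 1.0; with newer versions, pass the path of a downloaded file)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# Attach the population figures to the country shapes
world = world.merge(data, on='iso_a3', how='left')
# Base map (centre and zoom assumed)
m = folium.Map(location=[20, 0], zoom_start=2)
# Add a Choropleth map
folium.Choropleth(
    geo_data=world,
    name="choropleth",
    data=world,
    columns=["iso_a3", "population"],
    key_on="feature.properties.iso_a3",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Population by Country",
).add_to(m)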
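A choropleth matches data rows to map features through a shared key, here the ISO alpha-3 country code. The join itself can be checked with pandas alone before any map is drawn. In this sketch the countries table is a hypothetical stand-in for the attribute table of a boundaries layer; the population figures are those from the program above.

```python
import pandas as pd

# Hypothetical stand-in for the attribute table of a countries layer.
countries = pd.DataFrame({'iso_a3': ['USA', 'CAN', 'MEX', 'BRA']})

data = pd.DataFrame({
    'iso_a3': ['USA', 'CAN', 'MEX'],
    'population': [331002651, 37742154, 126190788],
})

# A left join keeps every country; those without a matching row get NaN.
merged = countries.merge(data, on='iso_a3', how='left')

# List the countries that would have no value on the map.
missing = merged.loc[merged['population'].isna(), 'iso_a3'].tolist()
print(missing)
```

Checking for unmatched keys first helps explain countries that appear without a colour on the final map.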
OUTPUT
RESULT
AIM
ALGORITHM
PROGRAM
pip install geopandas folium matplotlib
plt.title('World Population Map')
plt.show()
RESULT
AIM
ALGORITHM
PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
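import seaborn as sns
# wine_data is assumed to be a DataFrame of the Wine Quality dataset
# (the loading step is missing from this record)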
# Correlation heatmap
plt.figure(figsize=(10, 8))
corr = wine_data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Wine Quality Data')
plt.show()
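A heatmap shows all pairwise correlations at once; to read off which features relate most strongly to one target column, the correlation matrix can be ranked directly. The sketch below uses a synthetic stand-in for wine_data: the column names follow the program, but the values are randomly generated and carry no real-world meaning.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for wine_data (values are random, not real measurements).
rng = np.random.default_rng(42)
alcohol = rng.normal(10.5, 1.0, 200)
demo = pd.DataFrame({
    'alcohol': alcohol,
    'sulphates': rng.normal(0.65, 0.15, 200),
    # 'quality' is constructed to depend on 'alcohol' so a correlation exists.
    'quality': 0.8 * alcohol + rng.normal(0, 0.5, 200),
})

corr = demo.corr()

# Rank the other columns by absolute correlation with 'quality'.
ranking = corr['quality'].drop('quality').abs().sort_values(ascending=False)
print(ranking)
```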
plt.title('Alcohol Content vs Wine Quality')
plt.show()
# 3D Scatter plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(wine_data['sulphates'], wine_data['alcohol'],
wine_data['quality'], c=wine_data['quality'], cmap='viridis')
ax.set_xlabel('Sulphates')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Quality')
plt.show()
OUTPUT
RESULT
Ex.No: 09
Case study on a data set: apply various EDA and visualization techniques and present an analysis report.
AIM
ALGORITHM
PROGRAM
The Titanic dataset is a classic dataset often used to demonstrate various
data analysis techniques. This dataset provides information on the
passengers aboard the Titanic, including whether they survived the
disaster, their age, gender, class, fare, and other details. This case study
will apply various EDA and visualization techniques to uncover insights
from this data and present an analysis report.
Dataset Overview
Dataset: Titanic passenger data.
Features:
o Survived: Survival (0 = No, 1 = Yes)
o Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
o Sex: Sex
o Age: Age in years
o SibSp: Number of siblings/spouses aboard the Titanic
o Parch: Number of parents/children aboard the Titanic
o Fare: Passenger fare
o Embarked: Port of Embarkation (C = Cherbourg; Q =
Queenstown; S = Southampton)
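Step 1: Data Loading and Initial Inspection
# titanic_data is assumed to be a pandas DataFrame holding the Titanic
# passenger data (the loading code is missing from this record)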
# Display the first few rows of the dataset
print(titanic_data.head())
# Summary statistics
print(titanic_data.describe())
Observation:
The dataset has 891 entries and several features, some of which
have missing values (e.g., Age, Cabin, and Embarked).
Step 2: Data Cleaning
# Handle missing values by filling or dropping
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())
titanic_data['Embarked'] = titanic_data['Embarked'].fillna(titanic_data['Embarked'].mode()[0])
# Drop Cabin due to too many missing values
titanic_data = titanic_data.drop(columns=['Cabin'])
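The cleaning step can be verified on a small example before it is applied to the full dataset. The toy frame below is hypothetical; it only borrows the column names from the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same column names as the Titanic data.
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0, 58.0],
    'Embarked': ['S', 'C', None, 'S'],
    'Cabin': [None, 'C85', None, None],
})

# Count missing values per column before cleaning.
print(toy.isna().sum())

# Numeric column: fill with the median of the known values (22, 35, 58).
toy['Age'] = toy['Age'].fillna(toy['Age'].median())
# Categorical column: fill with the most frequent value.
toy['Embarked'] = toy['Embarked'].fillna(toy['Embarked'].mode()[0])
# Mostly empty column: drop it entirely.
toy = toy.drop(columns=['Cabin'])

# Total number of missing values remaining after cleaning.
print(toy.isna().sum().sum())
```

Assigning the result back to the column, rather than calling fillna with inplace=True on a selected column, avoids pandas warnings about modifying a temporary copy.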
Step 3: Univariate Analysis
# Distribution of Age
sns.histplot(titanic_data['Age'], bins=30, kde=True, color='blue')
plt.title('Age Distribution of Passengers')
plt.show()
Observation:
The majority of passengers did not survive.
Most passengers are between 20 and 40 years old.
Step 4: Bivariate Analysis
# Survival rate by class
sns.barplot(x='Pclass', y='Survived', data=titanic_data, palette='viridis')
plt.title('Survival Rate by Passenger Class')
plt.show()
# Correlation heatmap
plt.figure(figsize=(10, 8))
corr = titanic_data.corr(numeric_only=True)  # skip text columns such as Name and Sex
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()
Observation:
The survival rate is highest for females in 1st class.
There is a moderate negative correlation (about -0.34) between Pclass and Survived,
indicating that passengers in higher classes (lower Pclass numbers) were more likely to survive.
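Observations that combine two factors, such as sex and class, can be backed by a grouped survival rate. The following is a sketch on hypothetical toy rows; only the column names match the Titanic dataset, so the resulting rates say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy rows used only to demonstrate the computation.
toy = pd.DataFrame({
    'Sex':      ['female', 'female', 'male', 'male', 'female', 'male'],
    'Pclass':   [1, 3, 1, 3, 1, 3],
    'Survived': [1, 0, 0, 0, 1, 1],
})

# Because Survived is coded 0/1, its mean within a group is
# that group's survival rate.
rates = toy.pivot_table(index='Sex', columns='Pclass',
                        values='Survived', aggfunc='mean')
print(rates)
```

Applied to titanic_data, the same pivot_table call would give one survival rate per combination of Sex and Pclass.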
Insights
1. Class and Survival: Passengers in higher classes had better
survival chances, with 1st class being the safest.
2. Gender and Survival: Females had a much higher survival rate,
especially in the 1st and 2nd classes.
3. Age Factor: Younger passengers had a better survival rate, with
children particularly having a higher chance of survival.
4. Embarkation Point: Passengers who embarked at Cherbourg (C)
had a higher survival rate compared to other embarkation points.
5. Multivariate Interaction: The combination of gender, class, and
age played a crucial role in determining the survival of a passenger.
OUTPUT
RESULT