FDS Record
Aim:
To download, install, and explore the features of the NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages using the pip command.
Basic Tools:
a. Python
b. NumPy
c. SciPy
d. Matplotlib
e. Pandas
f. Statsmodels
g. Seaborn
h. Plotly
i. Bokeh
1. Python:
Python is an easy-to-learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python's elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.
The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python web site, https://www.python.org/, and may be freely distributed.
Installation Commands:
Step 1: Download the Python installer binaries. Open the official Python website in your web browser: https://www.python.org/downloads/
Step 2: Run the executable installer once the download completes.
Step 3: Add Python to the environment variables (tick "Add Python to PATH" in the installer).
Step 4: Verify the Python installation by running py --version at the command prompt.
2. Numpy:
NumPy stands for Numerical Python and it is a core scientific computing library
in Python. It provides efficient multi-dimensional array objects and various operations
to work with these array objects.
pip, the package installer for Python, is needed to install these packages.
Installation Commands:
1. Command Prompt: py -m pip --version
2. Command Prompt: py -m pip install numpy
3. Scipy
SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for Scientific Python. It provides more utility functions for optimization, statistics, and signal processing. Like NumPy, SciPy is open source, so we can use it freely.
Installation Commands:
Command Prompt: py -m pip install scipy
4. Matplotlib:
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. Matplotlib makes easy things easy and hard things possible.
Create publication quality plots.
Make interactive figures that can zoom, pan, update.
Customize visual style and layout.
Export to many file formats.
Embed in JupyterLab and Graphical User Interfaces.
Use a rich array of third-party packages built on Matplotlib.
Installation Commands:
Command Prompt: py -m pip install matplotlib
5. Pandas:
pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.
Installation Commands:
Command Prompt: py -m pip install pandas
6. Jupyter:
The Jupyter Notebook is the original web application for creating and sharing
computational documents. It offers a simple, streamlined, document-centric experience.
Installation Commands:
Command Prompt: py -m pip install jupyter
7. Statsmodels:
Statsmodels is a Python package that allows users to explore data, estimate statistical
models, and perform statistical tests.
Installation Commands:
Command Prompt: py -m pip install statsmodels
8. Seaborn:
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics.
Installation Commands:
Command Prompt: py -m pip install seaborn
9. Plotly:
Plotly is a technical computing company headquartered in Montreal, Quebec, that
develops online data analytics and visualization tools. Plotly provides online graphing,
analytics, and statistics tools for individuals and collaboration, as well as scientific graphing
libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.
Installation Commands:
Command Prompt: py -m pip install plotly
10. Bokeh:
Bokeh is a Python library for creating interactive visualizations for modern web browsers. It
helps you build beautiful graphics, ranging from simple plots to complex dashboards with
streaming datasets.
Installation Commands:
Command Prompt: py -m pip install bokeh
EXERCISE PROGRAM
1. Basic Array Program (NumPy):
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Result:
Thus the NumPy, SciPy, Matplotlib, Statsmodels, and Pandas packages were downloaded, installed, and explored using the pip command, and a basic program was executed successfully.
OUTPUT:
3. Draw a line in a diagram from position (0, 0) to position (6, 250) using Matplotlib.
EX.No:2 Creating and Accessing NumPy Arrays
Date:
Aim :
To write Python programs to create and access NumPy arrays.
Algorithm:
1. Start the Program.
2. Import the NumPy library.
3. Perform operations with NumPy arrays.
4. Display the output.
5. Stop the Program.
NumPy:
NumPy stands for Numerical Python. It is a Python library used for working with arrays. In Python, lists can serve the purpose of arrays, but they are slow to process. The NumPy array is a powerful N-dimensional array object used in linear algebra, Fourier transforms, and random number generation. It provides an array object that is much faster than traditional Python lists.
NUMPY CONCEPTS:
Create NumPy ndarray Object
numpy.zeros()
numpy.ones()
numpy.empty()
numpy.linspace()
numpy.arange()
numpy.array()
Check Number of Dimensions
Dimensions in Arrays
Higher Dimensional Arrays
NumPy Array Indexing
NumPy Array Slicing
Two-Dimensional Arrays
NumPy Array Shape
Reshaping of Arrays
Aggregations
• Mean
• Median
• Mode
• Standard deviation
NumPy Concepts:
Create NumPy ndarray Object
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
numpy.zeros()
• Definition: This function creates a new array of a given shape and type, filled with zeros.
• Syntax: np.zeros(shape, dtype=float)
• Example:
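(a minimal sketch; the shape here is illustrative)
import numpy as np
arr = np.zeros((2, 3))   # 2x3 array filled with zeros
print(arr)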
numpy.ones()
• Definition: This function creates a new array of a given shape and type, filled with ones.
• Syntax: np.ones(shape, dtype=float)
• Example:
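(a minimal sketch with an assumed shape)
import numpy as np
arr = np.ones((2, 3))   # 2x3 array filled with ones
print(arr)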
numpy.linspace()
• Definition: This function returns an array of evenly spaced numbers over a specified range.
• Syntax: np.linspace(start, stop, num=50, endpoint=True)
• Example:
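(a minimal sketch; the endpoints and count are illustrative)
import numpy as np
arr = np.linspace(0, 1, 5)   # five evenly spaced values from 0 to 1
print(arr)   # [0.   0.25 0.5  0.75 1.  ]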
numpy.arange()
• Definition: This function returns an array with values spaced regularly within a given interval.
• Syntax: np.arange([start, ]stop, [step, ])
• Example:
arr = np.arange(0, 10, 2)   # illustrative values
print(arr)   # [0 2 4 6 8]
numpy.array()
• Definition: This function is used to create arrays from lists, tuples, or other sequences.
• Syntax: np.array(object, dtype=None)
• Example:
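(a minimal sketch creating an array from a tuple)
import numpy as np
arr = np.array((1, 2, 3))
print(arr)   # [1 2 3]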
Check Number of Dimensions
• Definition: You can use the .ndim attribute to check the number of dimensions of an array.
• Syntax: array.ndim
• Example:
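(a minimal sketch on a two-dimensional array)
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim)   # 2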
Dimensions in Arrays
• Definition: NumPy arrays can be 1D, 2D, 3D, or more. The number of dimensions is
the rank of the array.
• Example:
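(a minimal sketch of arrays of increasing rank)
import numpy as np
a = np.array([1, 2, 3])              # 1-D
b = np.array([[1, 2], [3, 4]])       # 2-D
c = np.array([[[1, 2], [3, 4]]])     # 3-D
print(a.ndim, b.ndim, c.ndim)        # 1 2 3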
Higher Dimensional Arrays
• Definition: You can create arrays with more than two dimensions, such as 3D or 4D arrays.
• Example:
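(a minimal sketch using the ndmin argument to force extra dimensions)
import numpy as np
arr = np.array([1, 2, 3], ndmin=4)
print(arr)        # [[[[1 2 3]]]]
print(arr.ndim)   # 4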
NumPy Array Indexing
• Definition: NumPy arrays support indexing, similar to Python lists, allowing access to individual elements.
• Syntax: array[index]
• Example:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])   # illustrative 2-D array
print(arr_2d[0, 1])   # element at row 0, column 1 -> 2
print(arr_2d)
NumPy Array Shape
• Definition: The .shape attribute returns a tuple representing the dimensions of the array.
• Syntax: array.shape
• Example:
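(a minimal sketch on a 2x3 array)
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)   # (2, 3)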
Reshaping of Arrays
• Definition: The .reshape() method gives a new shape to an array without changing its data.
• Syntax: array.reshape(new_shape)
• Example:
arr = np.array([1, 2, 3, 4, 5])
reshaped_arr = arr.reshape(5, 1)
print(reshaped_arr)
Aggregations
• Definition: NumPy provides functions for aggregation operations such as sum, mean, and median.
• Example:
arr = np.array([1, 2, 3, 4, 5])
print(arr.sum())       # Sum: 15
print(arr.mean())      # Mean: 3.0
print(np.median(arr))  # Median: 3.0
Joining Arrays
• Definition: You can join multiple arrays into one using np.concatenate().
• Syntax: np.concatenate((arr1, arr2))
• Example:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
joined_arr = np.concatenate((arr1, arr2))
print(joined_arr)   # [1 2 3 4 5 6]
Splitting Arrays
• Definition: You can split an array into multiple sub-arrays using np.split().
• Syntax: np.split(array, sections)
• Example:
split_arr = np.split(joined_arr, 2)   # two equal parts, matching the output below
print(split_arr)   # [array([1, 2, 3]), array([4, 5, 6])]
Creating ufuncs
• Definition: You can turn an ordinary Python function into a universal function (ufunc) with np.frompyfunc().
• Example:
def add_five(x):
    return x + 5
ufunc_add_five = np.frompyfunc(add_five, 1, 1)
print(ufunc_add_five(np.array([1, 2, 3])))   # [6 7 8]
Comparisons
• Definition: You can create masks for filtering arrays based on conditions.
• Example:
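(a sketch reconstructed to be consistent with the output below)
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
mask = (arr > 2) & (arr < 5)
print("Mask for elements > 2 and < 5:", mask)       # [False False True True False]
print("Elements satisfying the mask:", arr[mask])   # [3 4]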
OUTPUT:
Sum of arr: 15
Mean of arr: 3.0
Median of arr: 3.0
Joined Arrays: [1 2 3 4 5 6]
Splitting the joined array into two parts: [array([1, 2, 3]), array([4, 5, 6])]
Mask for elements > 2 and < 5: [False False True True False]
Elements satisfying the mask: [3 4]
EX.No:3 Working with Pandas data frames
Date:
Aim:
To write Python programs to create and access pandas DataFrames.
Algorithm:
1. Start the Program.
2. Import Numpy & Pandas Packages.
3. Create a Dataframe for the list of elements.
4. Load a Dataset from an external source into a pandas dataframe
5. Display the Output.
6. Stop the Program
Pandas Series:
A Pandas Series is a one-dimensional labelled array that can hold data of any type.
Pandas DataFrame:
A Pandas DataFrame is a two-dimensional labelled data structure with columns of potentially different types, similar to a table.
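A brief illustrative sketch (the sample values are assumed):
import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])                # one-dimensional labelled array
df = pd.DataFrame({'Name': ['Asha', 'Ravi'], 'Marks': [85, 90]})  # two-dimensional table
print(s)
print(df)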
• Grouping data by a specific column and applying aggregation functions (e.g., sum, mean), as sketched below.
• Syntax: df.groupby('Column').agg(function)
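A minimal grouping sketch (the column names are hypothetical):
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'A', 'B'], 'Score': [10, 20, 30]})
print(df.groupby('Team').agg('sum'))   # Score totals per team: A -> 30, B -> 30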
Descriptive Statistics:
• Use df.describe() to summarize count, mean, standard deviation, minimum, maximum, and quartiles for numeric columns.
PROGRAM:
import pandas as pd
# Sample data (assumed; the original dictionary was not preserved)
data_dict = {'Name': ['Asha', 'Ravi', 'Kumar'], 'Age': [21, 22, 20]}
df = pd.DataFrame(data_dict)
print(df)
EX.No:4 Descriptive Analytics on the Iris Data Set
Date:
Aim:
To read data from text files, Excel, and the web, and to explore various commands for doing descriptive analytics on the Iris data set.
Descriptive Analysis:
Descriptive analysis, also known as descriptive analytics or descriptive statistics, is the process of using statistical techniques to describe or summarize a set of data.
Data Loading:
• Loads data from an Excel file. Additional comments show how to load data from a
text file or URL.
Basic Information:
• Displays dataset structure, head (first five rows), basic statistics, and any missing
values.
Class Distribution:
• Shows how many samples belong to each species using value_counts().
Distributions:
• Plots the distribution of each numeric feature as a histogram.
Pair Plot:
• A pair plot helps visualize relationships between all feature pairs, with species
differentiated by color.
Correlation Heatmap:
• Visualizes pairwise correlations between the numeric features.
Grouped Statistics:
• Aggregates statistics like mean, standard deviation, minimum, and maximum for each
species group.
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load data from an Excel file
excel_data = pd.read_excel('/mnt/data/IRIS.xls')
# If the dataset were a text file:
# text_data = pd.read_csv('path_to_text_file.txt')
# If the dataset were from a URL:
# url = 'http://example.com/iris.csv'
# web_data = pd.read_csv(url)
print("Dataset Information:")
print(excel_data.info())
print(excel_data.head())
print("\nDescriptive Statistics:")
print(excel_data.describe())
print(excel_data.isnull().sum())
print("\nClass distribution:")
print(excel_data['species'].value_counts())
# Distribution of each numeric feature (seaborn histograms are assumed here,
# since the original plotting calls were not preserved in this listing)
for column in excel_data.select_dtypes(include=np.number).columns:
    plt.figure(figsize=(8, 4))
    sns.histplot(excel_data[column], kde=True)
    plt.title(f'Distribution of {column}')
    plt.show()
# Pair plot with species differentiated by color
sns.pairplot(excel_data, hue='species')
plt.show()
# Correlation heatmap of the numeric features
plt.figure(figsize=(10, 8))
sns.heatmap(excel_data.drop(columns='species').corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
# Grouped statistics (mean, std, min, max) for each species
grouped_stats = excel_data.groupby('species').agg(['mean', 'std', 'min', 'max'])
print(grouped_stats)
Result:
Thus reading data from text files, Excel, and the web, and exploring various commands for doing descriptive analytics on the Iris data set, were completed and executed successfully.
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal_length 150 non-null float64
1 sepal_width 150 non-null float64
2 petal_length 150 non-null float64
3 petal_width 150 non-null float64
4 species 150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
sepal_length sepal_width petal_length petal_width
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333
std 0.828066 0.435866 1.765298 0.762238
min 4.300000 2.000000 1.000000 0.100000
max 7.900000 4.400000 6.900000 2.500000
sepal_length 0
sepal_width 0
petal_length 0
petal_width 0
species 0
dtype: int64
setosa 50
versicolor 50
virginica 50
Name: species, dtype: int64
sepal_length sepal_width ...
mean std min max mean std min max
species
setosa 5.006000 0.352490 4.300 5.800 3.428000 0.379064 2.300 4.400
versicolor 5.936000 0.516171 4.900 7.000 2.770000 0.313798 2.000 3.400
virginica 6.588000 0.635880 4.900 7.900 2.974000 0.322497 2.200 3.800
EX.No:5 UCI and Pima Indians Diabetes data set
Date:
Aim:
To use the diabetes data set from UCI and the Pima Indians Diabetes data set for performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
Algorithm:
1. Start the Program.
2. Import the required packages (pandas, numpy, scikit-learn, statsmodels).
3. Load the UCI Wine Quality and Pima Indians Diabetes data sets.
4. Perform univariate analysis: frequency, mean, median, mode, variance, standard deviation, skewness, and kurtosis.
5. Perform bivariate analysis: linear and logistic regression modelling.
6. Perform multiple regression analysis.
7. Compare the results of the above analyses for the two data sets.
8. Stop the Program.
Data Loading:
• Specified file paths for UCI Wine Quality and Pima Indians Diabetes datasets.
• Used try-except for loading datasets to handle potential FileNotFoundError
gracefully.
Univariate Analysis:
• For both datasets, calculated statistical summaries: mean, median, mode, variance,
standard deviation, skewness, and kurtosis.
• Utilized describe() for overall summary statistics.
• Displayed mean values for each column in the Wine Quality dataset for additional
detail.
Regression Modelling:
• Used LinearRegression to fit a linear model for predicting wine quality based on various features.
• Split data into training and testing sets with train_test_split.
• Calculated predictions and computed Mean Squared Error (MSE) for model
evaluation.
• For both linear and logistic regression, used statsmodels to generate detailed summary
reports.
• Added a constant term to the feature set (intercept) before fitting the model.
• Used OLS for linear regression and Logit for logistic regression to view statistical
significance of features, coefficients, and other model diagnostics.
PROGRAM:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, accuracy_score
import statsmodels.api as sm
# Define file paths
uci_file_path = r'C:\Users\hp\Downloads\Datasets for FDS\wine_quality.csv'  # Adjust the name as necessary
pima_file_path = r'C:\Users\hp\Downloads\Datasets for FDS\diabetes.csv'  # Update with actual file name
# Debugging print statements
print("UCI File Path:", uci_file_path)
print("Pima File Path:", pima_file_path)
# Load the UCI dataset (Wine Quality)
try:
    uci_data = pd.read_csv(uci_file_path, delimiter=';')  # Using the correct delimiter
    print("UCI Data Loaded Successfully.")
except FileNotFoundError as e:
    print("FileNotFoundError:", e)
    exit(1)  # Exit if the file is not found
# Univariate Analysis for the Wine Quality dataset
print("\nWine Quality Dataset Univariate Analysis")
print(uci_data.describe())
print("Mean of each column:\n", uci_data.mean())
# Bivariate analysis: linear regression predicting wine quality
# ('quality' is the target column of the UCI Wine Quality dataset)
X = uci_data.drop('quality', axis=1)
y = uci_data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
# Make predictions
y_pred = linear_model.predict(X_test)
# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"\nMean Squared Error (Wine Quality): {mse:.2f}")
# Load the Pima Indians Diabetes dataset
try:
    pima_data = pd.read_csv(pima_file_path)
    print("Pima Data Loaded Successfully.")
except FileNotFoundError as e:
    print("FileNotFoundError:", e)
    exit(1)  # Exit if the file is not found
# Univariate Analysis for Pima Dataset
pima_desc = pima_data.describe()
pima_mean = pima_data.mean()
pima_median = pima_data.median()
pima_mode = pima_data.mode().iloc[0]
pima_var = pima_data.var()
pima_std = pima_data.std()
pima_skew = pima_data.skew()
pima_kurt = pima_data.kurt()
print("\nPima Indians Diabetes Dataset Univariate Analysis")
print("Mean:", pima_mean)
print("Median:", pima_median)
print("Mode:", pima_mode)
print("Variance:", pima_var)
print("Standard Deviation:", pima_std)
print("Skewness:", pima_skew)
print("Kurtosis:", pima_kurt)
Result :
Thus Python programs for performing univariate, bivariate, and multiple linear regression analysis were written and executed successfully.
OUTPUT:
EX.No:6 Plotting Functions on UCI Data Sets
Date:
Aim:
To apply and explore various plotting functions on UCI data sets for performing the following:
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three-dimensional plotting
1. Normal Curve
Definition:
A normal curve, or Gaussian distribution, is a symmetric, bell-shaped curve that shows the
probability distribution of a continuous random variable. The curve is defined by the mean
(center) and the standard deviation (spread).
Syntax:
import numpy as np
import matplotlib.pyplot as plt
# Example data: a standard normal curve (mean 0, standard deviation 1 assumed)
mu, sigma = 0, 1
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
y = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
# Plotting
plt.plot(x, y)
plt.title("Normal Distribution Curve")
plt.xlabel("Values")
plt.ylabel("Probability Density")
plt.show()
2. Scatter Plot
Definition:
A scatter plot is used to show the relationship between two numerical variables by plotting
data points on an XY axis. Each point represents a pair of values for the two variables.
Syntax:
# Example data
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 18, 16]
# Plotting
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
3. Histogram
Definition:
A histogram is a graphical representation of the distribution of numerical data. It groups data
into bins (ranges) and plots the frequency of each bin as bars, showing the shape of the data
distribution.
Syntax:
# Example data
data = [10, 20, 20, 30, 30, 40, 40, 40, 50, 50, 50, 50]
# Plotting
plt.hist(data, bins=5, color='skyblue', edgecolor='black')
plt.title("Histogram")
plt.xlabel("Value Ranges")
plt.ylabel("Frequency")
plt.show()
4. Contour Plot
Definition:
A contour plot is a two-dimensional representation of a 3D surface, showing lines where a
particular z-value is constant. It is often used to show the density of data points in two
dimensions.
Syntax:
import numpy as np
import matplotlib.pyplot as plt
# Example data: a 2D Gaussian bump evaluated on a grid
X, Y = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
Z = np.exp(-(X**2 + Y**2))
# Plotting
plt.contour(X, Y, Z, levels=10, cmap="viridis")
plt.title("Contour Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
5. Density Plot
Definition:
A density plot represents the distribution of a numerical variable using kernel density
estimation (KDE). It smooths out data points to give an estimated continuous probability
density function.
Syntax:
# Example data
data = np.random.normal(0, 1, 1000)
# Plotting a KDE-smoothed density curve (pandas assumed imported as pd)
pd.Series(data).plot(kind='density')
plt.title("Density Plot")
plt.show()
6. 3D Plot
Definition:
A 3D plot is a graphical representation that shows data points in three dimensions, typically
using x, y, and z coordinates. It can be a line, scatter plot, or surface plot.
Syntax:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Example data
x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)
# Plotting
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
ax.set_title("3D Scatter Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis")
plt.show()
Result :
Thus Python programs for exploring various plots using Matplotlib were executed successfully.
OUTPUT:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load data from a CSV file (sample data is created below if hou_all.csv is unavailable)
# Replace 'hou_all.csv' with your file path
# df = pd.read_csv('C:\\Users\\Admin\\Downloads\\hou_all.csv')
# For demonstration, create a sample DataFrame
np.random.seed(0)
df = pd.DataFrame({
'set': np.random.normal(loc=50, scale=15, size=100),
'value': np.random.normal(loc=30, scale=10, size=100)
})
# 1. Normal Curve Plot
mean = df['set'].mean()
std_dev = df['set'].std()
x = np.linspace(mean - 3*std_dev, mean + 3*std_dev, 100)
y = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev) ** 2)
plt.plot(x, y)
plt.title("Normal Distribution Curve")
plt.xlabel("Set")
plt.ylabel("Probability Density")
plt.show()
# 2. Scatter Plot
plt.scatter(df['set'], df['value'])
plt.title("Scatter Plot")
plt.xlabel("Set")
plt.ylabel("Value")
plt.show()
# 3. Histogram
plt.hist(df['set'], bins=10, color='skyblue', edgecolor='black')
plt.title("Histogram of Set")
plt.xlabel("Set")
plt.ylabel("Frequency")
plt.show()
# 4. Contour Plot
# Creating a 2D grid of values for contour plot
X, Y = np.meshgrid(np.linspace(df['set'].min(), df['set'].max(), 100),
np.linspace(df['value'].min(), df['value'].max(), 100))
Z = np.exp(-((X - mean)**2 + (Y - mean)**2) / (2 * std_dev**2))
plt.contour(X, Y, Z, levels=10, cmap="viridis")
plt.title("Contour Plot")
plt.xlabel("Set")
plt.ylabel("Value")
plt.show()
# 5. Density Plot
df['set'].plot(kind='density')
plt.title("Density Plot of Set")
plt.xlabel("Set")
plt.show()
# 6. 3D Plot
# (the original listing omitted the plotting call; a 3D histogram of the two
# columns is one plausible completion, matching the axis labels used below)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
hist, xedges, yedges = np.histogram2d(df['set'], df['value'], bins=8)
xpos, ypos = np.meshgrid(xedges[:-1], yedges[:-1], indexing='ij')
dx = np.diff(xedges)[0]
dy = np.diff(yedges)[0]
ax.bar3d(xpos.ravel(), ypos.ravel(), 0, dx, dy, hist.ravel())
ax.set_xlabel("Set")
ax.set_ylabel("Value")
ax.set_zlabel("Frequency")
plt.show()
EX.No:7 Visualizing Geographic Data with Basemap
Date:
Aim:
To Visualize Geographic Data with Basemap
Algorithm:
1. Start the Program.
2. Import Basemap Package.
3. Perform Visualize Function of Geographic data.
4. Display the Output.
5. Stop the Program
Base Map
A common type of visualization in data science is that of geographic data. Matplotlib's main tool for this type of visualization is the Basemap toolkit, which is one of several Matplotlib toolkits that live under the mpl_toolkits namespace.
Basemap is a Matplotlib extension used to visualize and create geographical maps in Python.
Installing Basemap:
pip install basemap
Importing Basemap and matplotlib libraries:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
Functions used in Basemap:
Basemap() - To create a base map class.
Physical boundaries and bodies of water can then be drawn with methods such as drawcoastlines(), fillcontinents(), and drawmapboundary().
# Step 2: Create a Basemap instance for the world map (Orthographic projection)
m = Basemap(projection='ortho', lat_0=0, lon_0=0)  # Orthographic projection centered at (0, 0)
# Draw coastlines, continents, and the map boundary
m.drawcoastlines()
m.fillcontinents(color='coral', lake_color='aqua')
m.drawmapboundary(fill_color='aqua')
plt.show()
Result:
Thus geographic data was visualized with Basemap successfully.
EX.No:8 Multidimensional Summarization Using Pivot Tables
Date:
AIM:
Write a Python code to perform multidimensional summarization using a pivot table.
Algorithm:
1. Start the Program.
2. Import the numpy and pandas packages.
3. Create a DataFrame with categorical columns A, B, C and numeric columns D, E.
4. Build pivot tables with different index, column, and aggregation settings.
5. Display the output.
6. Stop the Program.
PROGRAM:
import numpy as np
import pandas as pd
# Sample DataFrame (assumed; modelled on the pandas pivot_table documentation example)
df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'B': ['one', 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
    'C': ['small', 'large', 'large', 'small', 'small', 'large', 'small', 'small', 'large'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
    'E': [2, 4, 5, 5, 6, 6, 8, 9, 9]
})
# Creating a pivot table with the sum of 'D' values, and fill NaN values with 0
table = pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum,
fill_value=0)
print("\nPivot Table with sum of 'D' values and fill_value=0:")
print(table)
# Pivot table with mean of 'D' and 'E' values, grouped by 'A' and 'C'
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], aggfunc={'D': np.mean, 'E':
np.mean})
print("\nPivot Table with mean of 'D' and 'E' values:")
print(table)
# Pivot table with mean of 'D' and multiple aggregations for 'E'
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], aggfunc={'D': np.mean, 'E':
[min, max, np.mean]})
print("\nPivot Table with mean of 'D' and multiple aggregations for 'E':")
print(table)
Result:
Thus multidimensional summarization using pivot tables was performed successfully.
EX.No:9 Frequency Distribution and Descriptive Statistics
Date:
AIM:
Write a Python program for frequency distribution; find averages such as the mean, median, and mode of a given dataset; and find the data type, mean, median, standard deviation, variance, and mean absolute deviation.
Algorithm:
1. Import Libraries:
o Import the necessary libraries such as pandas, numpy, and scipy.stats.
2. Load the Dataset:
o Read the dataset from a CSV file using pandas.read_csv().
3. Generate Frequency Distribution:
o Use value_counts() on specific columns to generate the frequency table.
o Sort the frequency table by index using value_counts().sort_index() for an
organized view.
o Calculate relative frequency by dividing each frequency by the total number of
observations.
o Calculate percentage frequency by multiplying each relative frequency by 100.
4. Calculate Percentiles and Percentile Ranks:
o Use scipy.stats.percentileofscore() to find the percentile rank for a specific
score.
o Use pandas.describe() to find percentiles (like 25th, 50th, 75th) and other
statistics.
5. Create Grouped Frequency Distribution:
o Set bins in value_counts() or use pandas.cut() to create a grouped frequency
distribution.
o For custom intervals, use pandas.interval_range() to set specific range
intervals as required.
6. Calculate Descriptive Statistics:
o Data Type: Use the .dtypes attribute on the DataFrame (or type() on individual values) to determine data types.
o Mean: Apply np.mean() on specific columns to calculate the mean.
o Median: Use np.median() to calculate the median value.
o Mode: Use scipy.stats.mode() to find the mode of specific columns.
7. Calculate Additional Statistics:
o Standard Deviation: Use pandas.std() to calculate the standard deviation.
o Variance: Use pandas.var() to calculate variance.
o Mean Absolute Deviation: Use pandas.mad() (removed in pandas 2.0; equivalently, (s - s.mean()).abs().mean()) for the mean absolute deviation.
8. Output Results:
o Display each result (frequency distribution tables, mean, median, mode,
standard deviation, variance, and other calculated statistics) in an organized
format for analysis.
PROGRAM:
import pandas as pd
from statistics import mode
# Load the dataset ('wnba.csv' is an assumed file name for the WNBA player dataset)
wnba = pd.read_csv('wnba.csv')
freq_dis_height = wnba['Height'].value_counts()
print("Frequency Distribution for Height:\n", freq_dis_height, "\n")
freq_dis_height_sorted = wnba['Height'].value_counts().sort_index(ascending=False)
print("Sorted Frequency Distribution for Height (Descending):\n", freq_dis_height_sorted, "\n")
# Mode example
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)
print("Mode of data1:", mode(data1), "\n")
Result:
Thus the program for frequency distribution and descriptive statistics has been executed successfully.
OUTPUT:
EX.NO:10 Correlation, Scatter Plots, Correlation Coefficient
DATE: Regression
Aim:
Write a Python program to perform correlation and scatter plots, to find the correlation coefficient, and to perform regression.
Algorithm:
1. Start the Program.
2. Import the required packages and load the Iris dataset.
3. Draw a scatter plot of the features.
4. Compute the correlation coefficient between feature pairs.
5. Fit a regression line to the data.
6. Display the output.
7. Stop the Program.
PROGRAM:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the iris dataset and arrange the feature matrix as rows of features
iris = load_iris()
features = iris.data.T
# Scatter plot for the first two features in the iris dataset
plt.scatter(features[0], features[1], alpha=0.2, s=100 * features[3], c=iris.target, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("Scatter Plot of Iris Features")
plt.show()
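The aim also asks for the correlation coefficient and regression; a minimal NumPy sketch (a least-squares line between the first two iris features):
import numpy as np
# Correlation coefficient between sepal length and sepal width
r = np.corrcoef(features[0], features[1])[0, 1]
print("Correlation coefficient:", r)
# Simple linear regression: degree-1 least-squares fit
slope, intercept = np.polyfit(features[0], features[1], 1)
xs = np.sort(features[0])
plt.scatter(features[0], features[1], alpha=0.4)
plt.plot(xs, slope * xs + intercept, color='red')
plt.title("Regression Line")
plt.show()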
RESULT:
Thus the program has been successfully executed.
OUTPUT: