FDS Lab Manual Student Manual
(AUTONOMOUS)
LABORATORY MANUAL
Student Name :
Register Number :
Course Code :
Course Name :
Year /Semester :
Department :
Academic Year :
KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
VISION
"To become an internationally renowned Institution in technical education, research and development,
by transforming the students into competent professionals with leadership skills and ethical values."
MISSION
VISION:
To produce globally competent engineers in the field of Artificial Intelligence and Data Science
with a focus on emerging computing needs of the industry and society.
MISSION:
➢ Enrich the students’ skills, knowledge with interdisciplinary skill sets by cognitive learning
environment and industrial collaboration.
➢ Promote quality and value based education towards emerging computing needs of the industry
and entrepreneurship skills among students.
➢ Provide for students with leadership qualities, ethical and human values to serve the nation and
focus on students’ overall development.
AIM:
To implement the various operations on arrays, vectors and matrices using NumPy library in
Python.
THEORY :
NUMPY LIBRARY
▪ NumPy is a computational library that helps in speeding up Vector Algebra operations that
involve Vectors (Distance between points, Cosine Similarity) and Matrices.
▪ Specifically, it helps in constructing powerful n-dimensional arrays that work smoothly with
distributed and GPU systems.
▪ It is a very handy library and extensively used in the domains of Data Analytics and Machine
Learning.
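As a small illustration of the vector operations mentioned above, the sketch below computes the Euclidean distance and the cosine similarity of two vectors. The vectors p and q are made-up example values, not data from this manual.

```python
import numpy as np

# Two example vectors (values chosen only for illustration)
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 5.0, 6.0])

# Euclidean distance: length of the difference vector
distance = np.linalg.norm(p - q)

# Cosine similarity: dot product divided by the product of the vector lengths
cosine_similarity = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

print(distance)
print(cosine_similarity)
```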
Given are two similar dimensional numpy arrays, get a numpy array output in which every element is
an element-wise sum of the 2 numpy arrays.
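One possible solution is sketched below, assuming two example arrays a and b of the same shape (the values are illustrative only). The + operator on NumPy arrays adds them element by element.

```python
import numpy as np

# Example arrays with the same shape (2 rows, 3 columns)
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

# The + operator adds arrays of the same shape element by element
c = a + b
print(c)
```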
Given a numpy array (matrix), get a numpy array output which is equal to the original matrix
multiplied by a given scalar.
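A minimal sketch of scalar multiplication, assuming an example matrix a and an example scalar k (both chosen only for illustration). Multiplying an array by a scalar multiplies every element by that scalar.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
k = 3  # example scalar

# Every element of a is multiplied by k
b = k * a
print(b)
```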
Given 2 numpy arrays as matrices, output the result of multiplying the 2 matrices (as a numpy array)
# matrix multiplication
import numpy as np
a = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
b = np.array([[2,3,4],
[5,6,7],
[8,9,10]])
c = a@b
print(c)
#5 - Matrix transpose
# matrix transpose
import numpy as np
a = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
b = a.T
print(b)
Convert all the elements of a numpy array from one datatype to another datatype (ex: float to int)
import numpy as np
a = np.array([[1.5, 2.7], [3.2, 4.9]])  # example float values
b = a.astype('int')
print(b)
Stack 2 numpy arrays horizontally, i.e., 2 arrays having the same 1st dimension (number of rows in 2D
arrays)
# Array stacking - Horizontal
import numpy as np
a1 = np.array([[1,2,3],   # first row assumed
[4,5,6]])
a2 = np.array([[7,8,9],
[10,11,12]])
c = np.hstack((a1, a2))
print(c)
# Array stacking - Vertical
import numpy as np
a1 = np.array([[1,2],
[3,4],
[5,6]])
b = np.array([[7,8],
[9,10],
[10,11]])
c = np.vstack((a1, b))
print(c)
#8 - Sequence generation
Generate a sequence of numbers in the form of a numpy array from 0 to 100 with gaps of 2 numbers,
for example: 0, 2, 4 ....
Code
# Sequence generation
import numpy as np
a = np.arange(0, 101, 2)  # 0, 2, 4, ..., 100
print(a)
Output
#9 - Matrix generation with specific value
Output a matrix (numpy array) of dimension 2-by-3 with each and every value equal to 5
import numpy as np
a = np.full((2, 3), 5)  # 2-by-3 array with every value equal to 5
print(a)
RESULT:
Ex.No: 2
MANIPULATING DATA FRAMES AND SERIES USING PANDAS
Date:
AIM:
To implement the basic operations used for data analysis using Pandas in Python.
THEORY:
Pandas
▪ Pandas is a Python Data Analysis Library, dealing primarily with tabular data.
▪ It forms a major Data Analysis Toolbox which is widely used in domains like Data
Mining, Data Warehousing, Machine Learning and General Data Science.
▪ It is an Open Source Library under a liberal BSD license.
▪ It has 2 main data structures:
1. Series: Contains data related to a single variable (can be visualized as a vector) along with
indexing information.
2. DataFrame: Contains tabular data.
Data Frames
▪ A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns.
▪ Features of DataFrame
✓ Potentially columns are of different types
✓ Size – Mutable
✓ Labeled axes (rows and columns)
✓ Can Perform Arithmetic operations on rows and columns
▪ A DataFrame can be created from various inputs such as:
1. Lists
2. dict
3. Series
4. Numpy ndarrays
5. Another DataFrame
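Each of the inputs listed above can be passed to pd.DataFrame(). The sketch below builds one small data frame per input type; all names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# 1. List of lists
df_list = pd.DataFrame([["Alex", 10], ["Bob", 12]], columns=["Name", "Age"])
# 2. Dictionary of lists
df_dict = pd.DataFrame({"Name": ["Alex", "Bob"], "Age": [10, 12]})
# 3. Dictionary of Series (the Series index becomes the row labels)
df_series = pd.DataFrame({"Age": pd.Series([10, 12], index=["Alex", "Bob"])})
# 4. NumPy ndarray
df_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["x", "y"])
# 5. Another DataFrame (copy=True gives an independent copy)
df_copy = pd.DataFrame(df_dict, copy=True)

for frame in (df_list, df_dict, df_series, df_array, df_copy):
    print(frame, end="\n\n")
```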
Exercises
import pandas as pd
df = pd.DataFrame()
print(df)
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print (df)
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)
#3 - Creating data frame from Dictionary of n-D arrays / Lists
All the n-D arrays must be of the same length. If an index is passed, then the length of the index must equal
the length of the arrays.
If no index is passed, then by default the index will be range(n), where n is the array length.
import pandas as pd
data = {'Name':['Tom','Jack','Steve','Ricky'],'Age':[28,34,29,42]}  # illustrative values
df = pd.DataFrame(data)
print (df)
import pandas as pd
data = {'Name':['Tom','Jack','Steve','Ricky'],'Age':[28,34,29,42]}  # illustrative values
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print (df)
Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the
series indexes passed
Code Sample Output
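import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}  # illustrative values
df = pd.DataFrame(d)
print (df)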
#5 - Sorting data frame
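import pandas as pd
data = {'Name':['Tom','Jack','Steve','Ricky'],'Age':[28,34,29,42]}  # illustrative values
df = pd.DataFrame(data)
print (df)
df_sorted = df.sort_values(by='Age')  # sort column chosen for illustration
print (df_sorted)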
Manipulating column includes selection of column, adding a new column and removing an existing
column from the data frame.
Code Sample Output
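#selecting a column
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}  # illustrative values
df = pd.DataFrame(d)
print(df['one'])
#adding a column
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}  # illustrative values
df = pd.DataFrame(d)
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print (df)
#deleting a column
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}  # illustrative values
df = pd.DataFrame(d)
del df['one']
print(df)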
Manipulating a row includes selection of row, adding new row, and removing an existing row from the
data frame.
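#selecting a row
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}  # illustrative values
df = pd.DataFrame(d)
print(df.loc['b'])
#adding a new row
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])  # illustrative values
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
df = pd.concat([df, df2])  # DataFrame.append() was removed in pandas 2.0
print(df)
#removing a row
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])  # illustrative values
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
df = pd.concat([df, df2])
df = df.drop(0)  # drops every row whose label is 0
print(df)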
RESULT:
Ex.No: 3
BASIC PLOTS USING MATPLOTLIB
Date:
AIM:
To implement the different types of plots used for data analysis using Matplotlib.
Theory:
MATPLOTLIB:
▪ Matplotlib is one of the most popular Python packages used for data visualization.
▪ It is a cross-platform library for making 2D plots from data in arrays.
▪ Matplotlib is written in Python and makes use of NumPy, the numerical mathematics
extension of Python.
▪ Matplotlib has a procedural interface named Pylab, which is designed to resemble
MATLAB, a proprietary programming language developed by MathWorks.
▪ Matplotlib along with NumPy can be considered as the open source equivalent of MATLAB.
STEPS
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# y axis values
y = [2,4,1]
plt.plot(x, y)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.show()
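import matplotlib.pyplot as plt
import numpy as np
import math
x = np.arange(0, math.pi*2, 0.05)  # angles in radians; step size assumed
y = np.sin(x)
plt.plot(x,y)
plt.xlabel("angle")
plt.ylabel("sine")
plt.title('sine wave')
plt.show()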
It is also possible to create a plot using categorical variables. Matplotlib allows you to pass categorical
variables directly to many plotting functions
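A minimal sketch of categorical plotting, assuming made-up category names and values: the same list of strings is passed as the x argument to bar(), scatter() and plot().

```python
import matplotlib.pyplot as plt

names = ["group_a", "group_b", "group_c"]  # hypothetical categories
values = [1, 10, 100]                      # hypothetical values

# Three panels side by side, each using the category names on the x axis
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
bars = axes[0].bar(names, values)
axes[1].scatter(names, values)
axes[2].plot(names, values)
fig.suptitle("Categorical plotting")
plt.show()
```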
Code
Sample Output
RESULT:
Ex. No: 4
FREQUENCY DISTRIBUTION IN PYTHON
Date:
AIM:
To write a Python program to create a frequency table and cumulative sums for the given data
set.
THEORY :
FREQUENCY DISTRIBUTION:
To get the frequency table of a column in pandas, the following three methods are used:
value_counts(), crosstab() and groupby() with count().
Cumulative Sum:
The cumulative sum of a column in pandas is computed using the cumsum() function and stored in the
new column named “cumulative_Tax”. axis=0 indicates that the sum is accumulated down the rows of the column.
ALGORITHM:
Step 2: Create the data frame for the given data using the function pd.DataFrame().
Step 3: Calculate the frequency count for the state using the function value_counts().
Step 4: Calculate the cumulative sum using cumsum() for the tax data.
PROGRAM:
import pandas as pd
import numpy as np
data = { 'State':['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas', 'Alaska',
'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
'Sales':[14,24,31,12,13,7,9,31,18,16,18,14],
'Tax':[14,24,31,12,13,7,9,31,18,16,18,14]}
# build the data frame from the dictionary
df1 = pd.DataFrame(data)
print(df1)
#method-1
print(df1.State.value_counts())
#method-2
print(df1['State'].value_counts())
#method-3
my_tab = pd.crosstab(index=df1["State"], columns="count")
print(my_tab)
#method-4
print(df1.groupby(['State'])['Sales'].count())
#cumulative sum
df1['cumulative_Tax']=df1['Tax'].cumsum(axis = 0)
print(df1)
OUTPUT:
RESULT:
Ex.No: 5
AVERAGES IN PYTHON
Date:
AIM:
To implement the different methods of finding the average of a list in python.
Theory:
Given a list of numbers, the task is to find average of that list. Average is the sum of elements
divided by the number of elements.
EXERCISE
#1 - Using sum( ) method
In Python, the average of a list can be computed by simply using the sum() and len() function.
• sum() : Using sum() function we can get the sum of the list.
• len() : len() function is used to get the length or the number of elements in a list .
Code Sample Output
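def Average(lst):
    return sum(lst) / len(lst)
# Driver Code
lst = [15, 9, 55, 41, 35, 20, 62, 49]  # illustrative values
average = Average(lst)
print("Average of the list =", round(average, 2))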
reduce() can replace an explicit loop: combined with a lambda function, it computes the
summation of the list. The len() function is used to get the length, i.e., the number of elements in the list.
Code Sample Output
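# importing reduce()
from functools import reduce
def Average(lst):
    return reduce(lambda a, b: a + b, lst) / len(lst)
# Driver Code
lst = [15, 9, 55, 41, 35, 20, 62, 49]  # illustrative values
average = Average(lst)
print("Average of the list =", round(average, 2))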
The inbuilt function mean() can be used to calculate the mean( average ) of the list.
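# importing mean()
from statistics import mean
def Average(lst):
    return mean(lst)
# Driver Code
lst = [15, 9, 55, 41, 35, 20, 62, 49]  # illustrative values
average = Average(lst)
print("Average of the list =", round(average, 2))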
RESULT:
Ex.No: 6
VARIANCE IN PYTHON
Date:
AIM:
To implement the methods of finding the variance of a sample list in python.
THEORY:
▪ Statistics module provides very powerful tools, which can be used to compute anything
related to Statistics. variance() is one such function.
▪ This function helps to calculate the variance from a sample of data (a sample is a subset of
the population).
variance() function should only be used when variance of a sample needs to be calculated.
▪ There’s another function known as pvariance(), which is used to calculate the variance of an
entire population.
Steps for calculating the variance
Step 1: Find the mean. To find the mean, add up all the scores, then divide the sum by the
number of scores.
Step 2: Find each score's deviation from the mean.
Step 3: Square each deviation from the mean.
Step 4: Find the sum of squares.
Step 5: Divide the sum of squares by n - 1 (sample) or N (population).
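The five steps above can be sketched in plain Python as follows. The list scores is example data, and the results are compared with statistics.variance() and statistics.pvariance() as a check.

```python
import statistics

scores = [1, 2, 3, 4, 5]  # example data

# Step 1: mean
mean = sum(scores) / len(scores)
# Steps 2 and 3: squared deviation of each score from the mean
squared_deviations = [(x - mean) ** 2 for x in scores]
# Step 4: sum of squares
sum_of_squares = sum(squared_deviations)
# Step 5: divide by n - 1 for a sample, or by N for a population
sample_variance = sum_of_squares / (len(scores) - 1)
population_variance = sum_of_squares / len(scores)

print(sample_variance, statistics.variance(scores))
print(population_variance, statistics.pvariance(scores))
```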
EXERCISES:
#1 - variance ( ) method
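import statistics
from fractions import Fraction as fr
sample = [1, 2, 3, 4, 5]
print("Variance of sample set is %s" %(statistics.variance(sample)))
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),fr(5, 6), fr(7, 8))
print("Variance of sample4 is %s" %(statistics.variance(sample4)))
m = statistics.mean(sample)
print("Variance using the precomputed mean is %s" %(statistics.variance(sample, xbar=m)))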
Exercise 7.1. Write a Python program to generate random numbers from normal distribution
AIM:
To generate five random numbers from normal distribution using Numpy array manipulation in python
ALGORITHM:
Step 3: Using the numpy universal function, find the normal distribution values.
PROGRAM:
import numpy as np
import statsmodels.api as sm
x1 = np.random.normal(size=5)
print(x1)
OUTPUT:
RESULT :
Exercise 7.2. Generate a random normal distribution of size 2x3, and also with mean and
standard deviation. To visualize the data use seaborn library.
AIM:
To write a python program to generate a normal distribution for a two-dimensional array and to
use the seaborn library to visualize the curve.
ALGORITHM:
Step 1: Import the libraries numpy, pandas, matplotlib, statsmodels and seaborn.
Step 4: To generate the distribution with a given mean and standard deviation, use the
“np.random.normal(loc=mean, scale=std, size=shape)” function.
PROGRAM:
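import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
# standard normal distribution of size 2x3
x1 = np.random.normal(size=(2,3))
print(x1)
# normal distribution with mean 1 and standard deviation 2
x2=np.random.normal(loc=1,scale=2,size=(2,3))
print(x2)
#visualization:
sns.displot(np.random.normal(size=1000),kind="kde")
plt.show()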
OUTPUT:
RESULT:
Ex.No: 8
CORRELATION BETWEEN VARIABLES
Date:
Write a python program to find the correlation between variables. Also create a heatmap using
seaborn to present their relations.
Sample data:
Temperature: 3, 39, 46, 45, 34, 38, 30, 22, 34, 38, 31, 14, 27, 39
wind speed: 4.3, 3.86, 9.73, 6.86, 7.8, 8.7, 16.46, 3.1, 9.1, 9.76, 17.9, 2.2, 8.23, 11.26
AIM:
To write a python program to find correlation between temperature and wind speed and to plot scatter
plot and heat map of the correlation.
Correlation:
Correlation is a way to determine whether two variables in a dataset are related in any way, and it is
simple to compute with seaborn and pandas. Two features are strongly correlated when the absolute
value of their correlation coefficient is close to 1.
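To make the definition concrete, the sketch below computes the Pearson correlation coefficient for the sample data of this exercise directly from its formula, r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), where dx and dy are the deviations from the means. This is only an illustration of the formula; the variable names are chosen here.

```python
import numpy as np

# Sample data of this exercise
temp = np.array([3, 39, 46, 45, 34, 38, 30, 22, 34, 38, 31, 14, 27, 39])
wind_speed = np.array([4.3, 3.86, 9.73, 6.86, 7.8, 8.7, 16.46,
                       3.1, 9.1, 9.76, 17.9, 2.2, 8.23, 11.26])

# Deviations of each variable from its mean
dx = temp - temp.mean()
dy = wind_speed - wind_speed.mean()

# Pearson r: sum of products of deviations divided by the product of the spreads
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(round(r, 2))
```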
ALGORITHM:
Step 4: Add the linear model fit line to a scatterplot using seaborn’s lmplot( ) method.
Step 5: Calculate the Pearson correlation coefficient values for x and y using the method pearsonr( ),
which is provided by the scipy library.
PROGRAM:
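import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
weather={'temp':[3,39,46,45,34,38,30,22,34,38,31,14,27,39],
'wind_speed':[4.3,3.86,9.73,6.86,7.8,8.7,16.46,3.1,9.1,9.76,17.9,2.2,8.23,11.26]}
weather=pd.DataFrame(weather)
#Scatter plot
ax = sns.scatterplot(x="temp",y="wind_speed", data=weather)
ax.set_title("Temperature Vs Wind_speed")
plt.show()
#Scatter plot with linear model fit line
sns.lmplot(x="temp",y="wind_speed", data=weather)
plt.show()
#Pearson correlation coefficient and p-value
print(stats.pearsonr(weather['temp'],weather['wind_speed']))
#Correlation matrix and heat map
cormat = weather.corr()
cormat = round(cormat,2)
print(cormat)
sns.heatmap(cormat)
plt.show()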
OUTPUT:
RESULT:
Exp No: 9
CALCULATION OF CORRELATION COEFFICIENT
Date:
'points':[25,12,15,14,19,23,15,29],
'assists':[5,7,7,9,12,9,9,4],
'rebound':[11,8,10,6,6,5,9,12]
AIM:
To calculate the correlation coefficient between the points and assists columns using pandas data frames.
ALGORITHM:
import pandas as pd
df=pd.DataFrame({'points':[25,12,15,14,19,23,15,29],
'assists':[5,7,7,9,12,9,9,4],
'rebound':[11,8,10,6,6,5,9,12]})
print(df.head())
coef=df['points'].corr(df['assists'])
print(coef)
OUTPUT:
RESULT:
Ex.No: 10
CREATION OF A LINEAR REGRESSION
Date:
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable
and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
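As an illustration of the equation above, the least-squares values of a and b can be computed directly with NumPy. This is a sketch under the usual ordinary least squares assumption; the data values are examples.

```python
import numpy as np

# Example data (explanatory variable x, dependent variable y)
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# Slope: b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
# Intercept: a = mean_y - b * mean_x
a = y.mean() - b * x.mean()

print("intercept a:", a)
print("slope b:", b)
```

The same line can be obtained with np.polyfit(x, y, 1), which returns the slope followed by the intercept.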
AIM:
ALGORITHM:
Step 1: Import the library and packages such as numpy, Linear regression.
Step 3: Create a regression model and fit it with the given data, using
sklearn.linear_model.LinearRegression.
Step 4: Check the results of model fitting to know whether the model is satisfactory.
PROGRAM:
import numpy as np
from sklearn.linear_model import LinearRegression
x=np.array([5,15,25,35,45,55]).reshape((-1,1))
y=np.array([5,20,14,32,22,38])
print(x)
print(y)
model=LinearRegression()
model.fit(x,y)
model=LinearRegression().fit(x,y)
r_sq=model.score(x,y)
print('coefficient of determination:',r_sq)
print('intercept:',model.intercept_)
print('slope:',model.coef_)
new_model=LinearRegression().fit(x,y.reshape((-1,1)))
print('intercept:',new_model.intercept_)
print('slope:',new_model.coef_)
y_pred=model.predict(x)
print('predicted response:',y_pred,sep='\n')
y_pred=model.intercept_+model.coef_*x
x_new=np.arange(5).reshape((-1,1))
print(x_new)
y_new=model.predict(x_new)
print(y_new)
OUTPUT:
RESULT:
Ex.No: 11
IMPLEMENT DECISION TREE CLASSIFICATION TECHNIQUES
Date:
AIM
To implement a decision tree, which represents a decision situation visually and
shows all those factors within the analysis that are considered relevant to the decision.
ALGORITHM
Step 1: It begins with the original set S as the root node.
Step 2: On each iteration of the algorithm, it iterates through every unused attribute of the set S
and calculates the entropy or information gain of that attribute.
Step 3: It then selects the attribute which has the smallest entropy or largest information gain.
Step 4: The set S is then split by the selected attribute to produce subsets of the data.
Step 5: The algorithm continues to recur on each subset, considering only attributes never
selected before.
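The attribute selection in Steps 2 and 3 can be sketched in Python as follows. The functions entropy() and information_gain() and the toy data set are hypothetical helpers written for this illustration; they are not part of any library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy data set: each row is one observation
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

# Step 2: information gain of every attribute; Step 3: pick the largest
gains = {attr: information_gain(rows, attr, "play") for attr in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy data the attribute outlook separates the classes perfectly, so it would be chosen for the first split.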
PROGRAM
table(birthwt$low)
set.seed(1)
plot(birthwtTree)
text(birthwtTree, pretty = 0)
summary(birthwtTree)
OUTPUT
RESULT
Ex.No: 12
IMPLEMENTATION OF CLUSTERING TECHNIQUES
Date:
AIM:
ALGORITHM:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids.
Step-3: Assign each data point to its closest centroid, which will form the predefined
K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
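The steps above can be sketched with NumPy as shown below. This is a minimal illustration, not a full implementation: the function name k_means, the fixed number of iterations, the random seed and the data points are all assumptions, and the new centroid of each cluster is taken as the mean of its points.

```python
import numpy as np

def k_means(points, k, n_iterations=10, seed=0):
    """Minimal K-means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step-2: pick K distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        # Step-3: assign each point to its closest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step-4: move each centroid to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Hypothetical data: two well-separated groups of 2-D points
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```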
PROGRAM:
RESULT:
Ex.No: 13
IMPLEMENT AN ALGORITHM TO DEMONSTRATE THE SIGNIFICANCE OF GENETIC ALGORITHM
Date:
AIM:
ALGORITHM :
Step 2:Compute the fitness of each candidate solution using the function \( f(x) = x^2 \).
Step 7 :Repeat the process until a stopping criterion is met (e.g., a maximum number of
generations).
PROGRAM :
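import numpy as np
# Parameters (all values except MAX_GENERATIONS are assumed for illustration)
POPULATION_SIZE = 10
CHROMOSOME_LENGTH = 5
MUTATION_RATE = 0.1
MAX_GENERATIONS = 100
# Functions
def create_population():
    # random binary chromosomes, one row per individual
    return np.random.randint(2, size=(POPULATION_SIZE, CHROMOSOME_LENGTH))
def fitness_function(individual):
    # decode the binary chromosome to x and return f(x) = x^2
    x = int("".join(map(str, individual)), 2)
    return x ** 2
def evaluate_population(population):
    return np.array([fitness_function(individual) for individual in population])
def select_parents(population, fitness):
    # fitness-proportional selection (+1 avoids a zero total)
    probabilities = (fitness + 1) / (fitness + 1).sum()
    indices = np.random.choice(len(population), size=2, p=probabilities)
    return population[indices]
def crossover(parent1, parent2):
    # single-point crossover
    point = np.random.randint(1, CHROMOSOME_LENGTH)
    return (np.concatenate([parent1[:point], parent2[point:]]),
            np.concatenate([parent2[:point], parent1[point:]]))
def mutate(individual):
    # flip each bit with probability MUTATION_RATE
    mutation_indices = np.random.rand(CHROMOSOME_LENGTH) < MUTATION_RATE
    individual[mutation_indices] = 1 - individual[mutation_indices]
    return individual
def genetic_algorithm():
    population = create_population()
    for generation in range(MAX_GENERATIONS):
        fitness = evaluate_population(population)
        if generation % 10 == 0:
            print("Generation", generation, "best fitness", fitness.max())
        new_population = []
        for _ in range(POPULATION_SIZE // 2):
            parent1, parent2 = select_parents(population, fitness)
            child1, child2 = crossover(parent1, parent2)
            new_population.append(mutate(child1))
            new_population.append(mutate(child2))
        population = np.array(new_population)
    final_fitness = evaluate_population(population)
    best_individual = population[np.argmax(final_fitness)]
    best_fitness = final_fitness.max()
    best_binary = "".join(map(str, best_individual))
    best_x = int(best_binary, 2)
    print("Best x:", best_x, "fitness:", best_fitness)
genetic_algorithm()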
RESULT: