FDS Lab Manual – Student Manual

KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY

(AUTONOMOUS)

Tholurpatti (Po), Thottiam (Tk), Trichy (Dt)-621 215.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

STUDENT LABORATORY MANUAL

Student Name :

Register Number :

Course Code :

Course Name :

Year /Semester :

Department :

Academic Year :

COLLEGE VISION & MISSION STATEMENT

VISION

"To become an internationally renowned Institution in technical education, research and development,
by transforming the students into competent professionals with leadership skills and ethical values."

MISSION

❖ Providing the Best Resources and Infrastructure.


❖ Creating a Learner-Centric Environment and Continuous Learning.
❖ Promoting Effective Links with Intellectuals and Industries.
❖ Enriching Employability and Entrepreneurial Skills.
❖ Adapting to Changes for Sustainable Development.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

VISION:
To produce globally competent engineers in the field of Artificial Intelligence and Data Science
with a focus on emerging computing needs of the industry and society.

MISSION:
➢ Enrich the students’ skills and knowledge with interdisciplinary skill sets through a cognitive
learning environment and industrial collaboration.
➢ Promote quality and value-based education oriented towards the emerging computing needs of
the industry, and entrepreneurship skills among students.
➢ Provide students with leadership qualities and ethical and human values to serve the nation, and
focus on students’ overall development.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs)


➢ PEO I: Graduates will be Artificial Intelligence professionals with expertise in the fields
of Artificial Intelligence, Big Data Analytics and Data Science.
➢ PEO II: Graduates will develop problem-solving skills and the ability to provide solutions
for real-time problems.
➢ PEO III: Graduates shall have professional ethics, team spirit, life-long learning and
communication skills, and adopt corporate culture, core values and leadership skills.

PROGRAM SPECIFIC OUTCOMES (PSOs)


➢ PSO1: Professional skills: Students shall excel in software development, including Artificial
Intelligence technologies, to solve complex computational tasks with soft skills.
➢ PSO2: Competency: Students shall qualify in State, National and International level
competitive examinations for employment, higher studies and research.
PROGRAM OUTCOMES (POs):

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering


fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of the
information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the
professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for
sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms
of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY
NAMAKKAL- TRICHY MAIN ROAD, THOTTIAM
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
Course: 20AD304L/ FUNDAMENTALS OF DATA SCIENCE LABORATORY
Course Outcome:
S.No   | Description                                                   | PO(1..12) & PSO(1..2) Mapping
C208.1 | Develop relevant programming abilities                        | PO1, PO3, PO6, PSO1, PSO2
C208.2 | Demonstrate knowledge of statistical data analysis techniques | PO1, PO2, PO3, PO4, PSO1, PSO2
C208.3 | Exhibit proficiency to build and assess data-based models     | PO1, PO2, PO4, PSO1, PSO2
C208.4 | Demonstrate skill in data management & processing tasks using Python | PO1, PO2, PO4, PO12, PSO1, PSO2
C208.5 | Apply data science concepts and methods to solve problems in real-world contexts and communicate these solutions effectively | PO1, PO2, PO3, PO4, PO6, PO9, PO11, PO12, PSO1, PSO2

List of Experiments Mapping with COs, POs & PSOs:

S.No | Experiment Name                  | COs      | POs                        | PSOs
1    | Working with NumPy arrays        | CO1      | PO1, PO3, PO6              | PSO1, PSO2
2    | Working with Pandas data frames  | CO1      | PO1, PO3                   | PSO1, PSO2
3    | Basic plots using Matplotlib     | CO2      | PO1, PO2, PO3              | PSO1, PSO2
4    | Frequency distributions          | CO2      | PO1, PO2, PO3, PO4         | PSO1, PSO2
5    | Averages                         | CO3      | PO1, PO2, PO3, PO4         | PSO1, PSO2
6    | Variability                      | CO3      | PO1, PO2, PO3, PO4         | PSO1, PSO2
7    | Normal curves                    | CO4      | PO1, PO2, PO3, PO4, PO8    | PSO1, PSO2
8    | Correlation and scatter plots    | CO4      | PO1, PO2, PO3, PO4, PO8    | PSO1, PSO2
9    | Correlation coefficient          | CO5      | PO1, PO2, PO3, PO4, PO12   | PSO1, PSO2
10   | Regression                       | CO4, CO5 | PO1, PO2, PO3, PO4, PO12   | PSO1, PSO2

Advanced Experiments
11   | Implement decision tree classification techniques | CO5 | PO1, PO2, PO3, PO4, PO8, PO11 | PSO1, PSO2

Design Experiments
12   | Implementation of clustering techniques | CO5 | PO1, PO2, PO3, PO4, PO12 | PSO1, PSO2

Open-ended Experiments
13   | Implement an algorithm to demonstrate the significance of Genetic Algorithm | CO5 | PO1, PO2, PO3, PO4, PO8, PO11 | PSO1, PSO2
CONTENTS

Ex.No | Title (Date, Page No, Mark and Signature to be filled in)

1  | Python program to work with NumPy arrays
2  | Python program to work with Pandas data frames
3  | Python program for basic plots using Matplotlib
4  | Python code for frequency distributions using Pandas
5  | Python code to find averages of rows and columns in Pandas data frames
6  | Program to find the variance function in Python Pandas
7  | Python program to generate normal curves using the Matplotlib library
8  | Python program to find correlation and scatter plots
9  | Python program to find the correlation coefficient using the Pandas and Matplotlib libraries
10 | Python program to apply regression using the Matplotlib library

ADVANCED EXPERIMENTS
11 | Implement decision tree classification techniques

DESIGN EXPERIMENTS
12 | Implementation of clustering techniques

OPEN ENDED EXPERIMENTS
13 | Implement an algorithm to demonstrate the significance of Genetic Algorithm
Ex.No:1
ARRAY PROCESSING USING NUMPY
Date:

AIM:
To implement the various operations on arrays, vectors and matrices using NumPy library in
Python.

THEORY :

NUMPY LIBRARY

▪ NumPy is a computational library that helps speed up vector-algebra operations involving
vectors (distances between points, cosine similarity) and matrices.
▪ Specifically, it helps in constructing powerful n-dimensional arrays that work smoothly with
distributed and GPU systems.
▪ It is a very handy library, extensively used in the domains of Data Analytics and Machine
Learning.

(1) Arrays in NumPy

▪ NumPy’s main object is the homogeneous multidimensional array.


▪ It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive
integers.
▪ In NumPy dimensions are called axes. The number of axes is rank.
▪ NumPy’s array class is called ndarray. It is also known by the alias array.
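The axes/rank terminology above can be inspected directly on an ndarray. A minimal sketch (the array values are illustrative):

```python
import numpy as np

# A 2-D array: rank 2, first axis of length 2, second axis of length 3
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)   # number of axes (rank)
print(a.shape)  # length of each axis
print(a.dtype)  # common element type
```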

(2) Array creation:

▪ There are various ways to create arrays in NumPy.


▪ Create an array from a regular Python list or tuple using the array function. The type of the
resulting array is deduced from the type of the elements in the sequences.
▪ Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize the
necessity of growing arrays, an expensive operation.
For example: np.zeros, np.ones, np.full, np.empty, etc.
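A short sketch of the placeholder constructors mentioned above (the shapes and fill values are illustrative):

```python
import numpy as np

z = np.zeros((2, 3))     # 2x3 array of 0.0
o = np.ones((2, 3))      # 2x3 array of 1.0
f = np.full((2, 3), 7)   # 2x3 array filled with the value 7
e = np.empty((2, 3))     # uninitialized placeholder; contents are arbitrary

print(z)
print(o)
print(f)
```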
#1 - Matrix addition

Given two numpy arrays of the same dimensions, produce a numpy array in which every element is
the element-wise sum of the two input arrays.

Code Sample Output

#Elementwise addition of two numpy arrays


import numpy as np
a = np.array([[1,2,3],
[4,5,6]])
b = np.array([[10,11,12],
[13,14,15]])
c=a+b
print(c)

#2 - Multiplying a Matrix by a scalar.

Given a numpy array (matrix), get a numpy array output which is equal to the original matrix
multiplied by a given scalar.

Code Sample Output

#Multiply a matrix by a scalar


import numpy as np
a = np.array([[1,2,3],
[4,5,6]])
b=3*a
print(b)

#3 - Create an identity Matrix


Create an identity matrix of given dimension

Code Sample Output

#Indentity matrix of 4 X 4 size


import numpy as np
i = np.eye(4)
print(i)
#4 - Matrix Multiplication

Given 2 numpy arrays as matrices, output the result of multiplying the 2 matrices (as a numpy array)

Code Sample Output

# matrix multiplication
import numpy as np
a = np.array([[1,2,3],

[4,5,6],

[7,8,9]])

b = np.array([[2,3,4],

[5,6,7],

[8,9,10]])

c = a@b

print(c)

#5 - Matrix transpose

Print the transpose of a given matrix.

Code Sample Output

# matrix transpose
import numpy as np
a = np.array([[1,2,3],

[4,5,6],

[7,8,9]])
b = a.T

print(b)

#6 - Array datatype conversion

Convert all the elements of a numpy array from one datatype to another datatype (ex: float to int)

Code Sample Output

# Array datatype conversion


import numpy as np
a = np.array([[2.5, 3.8, 1.5],

[4.7, 2.9, 1.56]])

b = a.astype('int')

print(b)

#7 - Stacking of Numpy arrays

Stack 2 numpy arrays horizontally, i.e., 2 arrays having the same 1st dimension (number of rows in 2-D
arrays).

Code Sample Output

# Array stacking - horizontal


import numpy as np
a1 = np.array([[1,2,3],

[4,5,6]])

a2 = np.array([[7,8,9],

[10,11,12]])

c = np.hstack((a1, a2))

print(c)
# Array stacking - vertical
import numpy as np
a1 = np.array([[1,2],
[3,4],
[5,6]])
b = np.array([[7,8],
[9,10],
[10,11]])
c = np.vstack((a1, b))
print(c)

#8 - Sequence generation

Generate a sequence of numbers in the form of a numpy array from 0 to 100 with gaps of 2 numbers,
for example: 0, 2, 4 ....

Code

# Sequence generation
import numpy as np
nums = [x for x in range(0, 101, 2)]  # np.arange(0, 101, 2) is equivalent
a = np.array(nums)
print(a)

Output
#9 - Matrix generation with specific value

Output a matrix (numpy array) of dimension 2-by-3 with each and every value equal to 5

Code Sample Output

# Matrix filled with a constant value

import numpy as np
a = np.full((2, 3), 5)
print(a)

#10 - Sorting an array

Sort the given NumPy array in ascending order.

Code Sample Output

# Sorting a NumPy array

import numpy as np

a = np.array([[1, 4, 2],
[3, 4, 6],
[0, -1, 5]])

# sort as a flattened array
print(np.sort(a, axis=None))

# sort array row-wise
print(np.sort(a, axis=1))

# sort array column-wise
print(np.sort(a, axis=0))

RESULT:
Ex.No:2 MANIPULATING DATA FRAMES AND SERIES
Date: USING PANDAS

AIM:
To implement the basic operations used for data analysis using Pandas in Python.

THEORY:

Pandas

▪ Pandas is a Python data analysis library, dealing primarily with tabular data.
▪ It forms a major part of the data analysis toolbox and is widely used in domains like Data
Mining, Data Warehousing, Machine Learning and general Data Science.
▪ It is an open-source library under a liberal BSD license.
▪ It has mainly 2 forms:
1. Series: contains data related to a single variable (can be visualized as a vector) along with
indexing information.
2. DataFrame: contains tabular data.

Data Frames

▪ A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns.
▪ Features of DataFrame
✓ Potentially columns are of different types
✓ Size – Mutable
✓ Labeled axes (rows and columns)
✓ Can Perform Arithmetic operations on rows and columns

Creating Data Frame using Pandas

A pandas DataFrame can be created using the following constructor

pandas.DataFrame( data, index, columns, dtype, copy)

A pandas DataFrame can be created using various inputs like:

1. Lists
2. dict
3. Series
4. Numpy ndarrays
5. Another DataFrame

Exercises

#1 - Creating an empty data frame

A basic DataFrame, which can be created is an Empty Dataframe.

Code Sample Output

#import the pandas library

import pandas as pd

df = pd.DataFrame()

print(df)

#2 - Creating data frame from a List

The DataFrame can be created using a single list or a list of lists

Code Sample Output

#import the pandas library

import pandas as pd

data = [1,2,3,4,5]

df = pd.DataFrame(data)

print (df)

#import the pandas library

import pandas as pd

data = [['Alex',10],['Bob',12],['Clarke',13]]

df = pd.DataFrame(data,columns=['Name','Age'])

print (df)
#3 - Creating data frame from Dictionary of n-D arrays / Lists

All the n-D arrays must be of same length. If index is passed, then the length of the index should equal
to the length of the arrays.
If no index is passed, then by default, index will be range (n), where n is the array length.

Code Sample Output

#import the pandas library

import pandas as pd

data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}

df = pd.DataFrame(data)

print (df)

#creating a data frame from an array

#import the pandas library

import pandas as pd

data={'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}

df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])

print (df)

#4 - Creating a data frame from a Series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the
series indexes passed
Code Sample Output

import pandas as pd

d = { 'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print (df)
#5 - Sorting data frame

Given a data frame sort by a given column.


Code Sample Output

#import the pandas library

import pandas as pd

data ={ 'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}

df = pd.DataFrame(data)

print (df)

df_sorted = df.sort_values(by='Name')

print("Sorted data frame...")

print(df_sorted)

#6- Manipulating a Data frame Column

Manipulating column includes selection of column, adding a new column and removing an existing
column from the data frame.
Code Sample Output

#selecting a column

import pandas as pd

d = { 'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print(df['one'])

#adding a new column

import pandas as pd

d = { 'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}


df = pd.DataFrame(d)

df['three']=pd.Series([10,20,30],index=['a','b','c'])

print (df)

#deleting an existing column

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),

'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)

print ("Deleting the first column using DEL function:")

del df['one']

print(df)

#7 - Manipulating a Data frame row

Manipulating a row includes selection of row, adding new row, and removing an existing row from the
data frame.

Code Sample Output

#selecting a row

import pandas as pd

d={ 'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),

'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print(df.loc['b'])
#adding a new row

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df, df2])

print(df)

#deleting an existing row

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])

# Drop rows with label 0

df = df.drop(0)

print(df)

RESULT:
Ex.No: 3
BASIC PLOTS USING MATPLOTLIB
Date:

AIM:
To implement the different types of plots used for data analysis using MatplotLib.
Theory:

MATPLOTLIB:

▪ Matplotlib is one of the most popular Python packages used for data visualization.
▪ It is a cross-platform library for making 2D plots from data in arrays.
▪ Matplotlib is written in Python and makes use of NumPy, the numerical mathematics
extension of Python.
▪ Matplotlib has a procedural interface named Pylab, which is designed to resemble
MATLAB, a proprietary programming language developed by MathWorks.
▪ Matplotlib along with NumPy can be considered the open-source equivalent of MATLAB.
STEPS

1. Define the x-axis and corresponding y-axis values as lists.
2. Plot them on a canvas using the plot() function.
3. Name the x-axis and y-axis using the xlabel() and ylabel() functions.
4. Give a title to the plot using the title() function.
5. Finally, view the plot using the show() function.
EXERCISES

#1 - Plotting a simple line

plot is a versatile function, and will take an arbitrary number of arguments.

Code Sample Output

import matplotlib.pyplot as plt


plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()
You may be wondering why the x-axis ranges from 0-3 and the y-axis from 1-4: if you provide a single
list or array to plot, matplotlib assumes it is a sequence of y values and automatically generates the
x values for you, starting from 0.

Code Sample Output

import matplotlib.pyplot as plt


plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.ylabel('some numbers')
plt.show()

#2 - Line Plot with properties

plot is a versatile function, and will take an arbitrary number of arguments.

Code Sample Output

# importing the required module

import matplotlib.pyplot as plt

# x axis values

x = [1,2,3]

# corresponding y axis values

y = [2,4,1]

# plotting the points

plt.plot(x, y)

# naming the x axis

plt.xlabel('x - axis')

# naming the y axis

plt.ylabel('y - axis')

# giving a title to my graph

plt.title('My first graph!')

# function to show the plot


plt.show()
#3 - Line Plot with different styles
The axis function in the example below takes a list of [xmin, xmax, ymin, ymax] and specifies the
viewport of the axes.

Code Sample Output

import matplotlib.pyplot as plt

#ro – red colour circles.


plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.axis([0, 6, 0, 20])
plt.show( )

#plotting multiple lines with different style

import matplotlib.pyplot as plt

import numpy as np

t = np.arange(0., 5., 0.2)


# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()

#4 - Plotting a sine wave

Code Sample Output

from matplotlib import pyplot as plt

import numpy as np

import math

#ndarray object containing angles between 0 and 2π

x = np.arange(0, math.pi*2, 0.05)


y = np.sin(x)

plt.plot(x,y)

plt.xlabel("angle")

plt.ylabel("sine")

plt.title('sine wave')

plt.show()

#5 - Plotting with categorical variables

It is also possible to create a plot using categorical variables. Matplotlib allows you to pass categorical
variables directly to many plotting functions

Code

import matplotlib.pyplot as plt

names = ['group_a', 'group_b', 'group_c']


values = [1, 10, 100]
plt.figure(figsize=(9, 3))
plt.subplot(131)
plt.bar(names, values)
plt.subplot(132)
plt.scatter(names, values)
plt.subplot(133)
plt.plot(names, values)
plt.suptitle('Categorical Plotting')
plt.show()

Sample Output
RESULT:
Ex. No: 4
FREQUENCY DISTRIBUTION IN PYTHON
Date:

AIM:

To write a Python program to create a frequency table and cumulative sums for the given data
set.

THEORY :

FREQUENCY DISTRIBUTION:

To get frequency table of column in pandas, the following three methods are used

• Frequency table of column in pandas python is created using value_counts() function.


• crosstab() function in pandas is used to get the cross table or frequency table. It takes up the
column name as argument counts the frequency of occurrence of its values
• groupby() count function is used to get the frequency count of the dataframe. It takes up the
column name as argument followed by count() function

Cumulative Sum:

The cumulative sum of a column in pandas is computed using the cumsum() function and stored in a
new column, namely “cumulative_Tax”. axis=0 indicates column-wise operation.

ALGORITHM:

Step 1: Import the necessary library such as numpy and pandas.

Step 2: Create the data frame for the given data using the function pd.DataFrame().

Step 3: Calculate the frequency count for the state using the function value_counts().

Step 4: Calculate the cumulative sum using cumsum() for the tax data.

State:[Alaska, California,Texas,North Carolina,California,Texas, Alaska, Texas, North Carolina,


Alaska, California,Texas],
Sales:[14,24,31,12,13,7,9,31,18,16,18,14],Tax:[14,24,31,12,13,7,9,31,18,16,18,14]

PROGRAM:

import pandas as pd

import numpy as np
data = { 'State':['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas', 'Alaska',
'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],

'Sales':[14,24,31,12,13,7,9,31,18,16,18,14],

'Tax':[14,24,31,12,13,7,9,31,18,16,18,14]}

df1 = pd.DataFrame(data, columns=['State', 'Sales', 'Tax'])

print(df1)
#method-1: value_counts() via attribute access
print(df1.State.value_counts())
#method-2: value_counts() via column indexing
print(df1['State'].value_counts())
#method-3: crosstab()
my_tab = pd.crosstab(index=df1["State"], columns="count")
print(my_tab)
#method-4: groupby() followed by count()
print(df1.groupby(['State'])['Sales'].count())
#cumulative sum
df1['cumulative_Tax'] = df1['Tax'].cumsum(axis=0)
print(df1)

OUTPUT:
RESULT:
Ex.No: 5
AVERAGES IN PYTHON
Date:

AIM:
To implement the different methods of finding the average of a list in python.
Theory:
Given a list of numbers, the task is to find the average of that list. The average is the sum of the
elements divided by the number of elements.

EXERCISE
#1 - Using sum( ) method

In Python, the average of a list can be computed simply by using the sum() and len() functions.

• sum(): returns the sum of the elements of the list.
• len(): returns the length, i.e., the number of elements, of a list.
Code Sample Output

# Python program to get average of a list

def Average(lst):

return sum(lst) / len(lst)

# Driver Code

lst = [15, 9, 55, 41, 35, 20, 62, 49]

average = Average(lst)

# Printing average of the list


print(average)

#2 - Using reduce() and lambda() method

The reduce() function can be used to avoid an explicit loop, with a lambda function computing the
summation of the list; the len() function gives the number of elements in the list.
Code Sample Output

# importing reduce()

from functools import reduce

def Average(lst):

return reduce(lambda a, b: a + b, lst) / len(lst)

# Driver Code

lst = [15, 9, 55, 41, 35, 20, 62, 49]

average = Average(lst)

# Printing average of the list

print("Average of the list =", round(average, 2))

#3 - Using mean() method

The inbuilt function mean() can be used to calculate the mean( average ) of the list.

Code Sample Output

# importing mean()

from statistics import mean

def Average(lst):

return mean(lst)

# Driver Code

lst = [15, 9, 55, 41, 35, 20, 62, 49]

average = Average(lst)

# Printing average of the list

print("Average =", round(average, 2))

RESULT:
Ex.No: 6
VARIANCE IN PYTHON
Date:

AIM:
To implement the methods of finding the variance of a sample list in python.
THEORY:

▪ Statistics module provides very powerful tools, which can be used to compute anything
related to Statistics. variance() is one such function.
▪ This function helps to calculate the variance from a sample of data (sample is a subset of
populated data).
variance() function should only be used when variance of a sample needs to be calculated.
▪ There’s another function known as pvariance(), which is used to calculate the variance of an
entire population.
Steps for calculating the variance

Step 1: Find the mean. To find the mean, add up all the scores, then divide by the number of scores.
Step 2: Find each score's deviation from the mean.
Step 3: Square each deviation from the mean.
Step 4: Find the sum of squares.
Step 5: Divide the sum of squares by n – 1 (for a sample) or N (for a population).
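The five steps above can be carried out directly and checked against statistics.variance(). A minimal sketch (the sample data is illustrative):

```python
import statistics

sample = [1, 2, 3, 4, 5]

# Step 1: find the mean
m = sum(sample) / len(sample)
# Steps 2-4: squared deviations from the mean, then their sum
ss = sum((x - m) ** 2 for x in sample)
# Step 5: divide by n - 1 for a sample variance
var = ss / (len(sample) - 1)

print(var)                          # 2.5
print(statistics.variance(sample))  # 2.5
```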
EXERCISES:
#1 - variance ( ) method

Code Sample Output

import statistics

sample = [1, 2, 3, 4, 5]

print("Variance of sample set is % s"

%(statistics.variance(sample)))

#2 - variance ( ) on range of data types


Code Sample Output

from statistics import variance

# importing fractions as parameter values

from fractions import Fraction as fr

sample1 = (1, 2, 5, 4, 8, 9, 12)

sample2 = (-2, -4, -3, -1, -5, -6)

sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),fr(5, 6), fr(7, 8))

sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

print("Variance of Sample1 is % s " %(variance(sample1)))

print("Variance of Sample2 is % s " %(variance(sample2)))

print("Variance of Sample3 is % s " %(variance(sample3)))

print("Variance of Sample4 is % s " %(variance(sample4)))

print("Variance of Sample5 is % s " %(variance(sample5)))

#3 - using the xbar parameter

Code Sample Output

# the use of xbar parameter

import statistics

# creating a sample list

sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)

# calculating the mean of sample set

m = statistics.mean(sample)

# calculating the variance of sample set

print("Variance of Sample: % s"

%(statistics.variance(sample, xbar = m)))


RESULT:
Ex.No: 7
IMPLEMENTATION OF NORMAL CURVES
Date:

Exercise 7.1. Write a Python program to generate random numbers from normal distribution

AIM:

To generate five random numbers from normal distribution using Numpy array manipulation in python

ALGORITHM:

Step 1: Import the libraries numpy, seaborn and matplotlib.

Step 2: Create a random single-dimensional array of size 5.

Step 3: Using the numpy universal function, find the normal distribution values.

PROGRAM:

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

import statsmodels.api as sm

x1 = np.random.normal(size=5)

print(x1)

OUTPUT:

RESULT :
Exercise 7.2. Generate a random normal distribution of size 2x3, and also with mean and
standard deviation. To visualize the data use seaborn library.

AIM:

To write a python program to calculate the normal distribution for a two-dimensional array and to
use the seaborn library for visualization of the curve.

ALGORITHM:

Step 1: Import the libraries numpy, pandas, matplotlib, statsmodels and seaborn.

Step 2: Create a numpy array with size (2,3).

Step 3: Calculate the distribution values with np.random.normal().

Step 4: To calculate the distribution with a mean and standard deviation, use np.random.normal(loc=mean,
scale=std, size=shape).

Step 5: Plot the distribution curve with the sns.displot() function.

PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

import statsmodels.api as sm

x1 = np.random.normal( size = (2,3))

print(x1)

# with mean 1 and standard deviation of 2

x2=np.random.normal(loc=1,scale=2,size=(2,3))

print(x2)

#visualization:

sns.displot(np.random.normal(size=1000),kind="kde")

plt.show()
OUTPUT:

RESULT:
Ex.No: 8
CORRELATION BETWEEN VARIABLES
Date:

Write a python program to find the correlation between variables. Also create a heatmap using
seaborn to present their relations.

Sample data:

Temperature: 3, 39, 46, 45, 34, 38, 30, 22, 34, 38, 31, 14, 27, 39

wind speed: 4.3, 3.86, 9.73, 6.86, 7.8, 8.7, 16.46, 3.1, 9.1, 9.76, 17.9, 2.2, 8.23, 11.26

AIM:

To write a python program to find correlation between temperature and wind speed and to plot scatter
plot and heat map of the correlation.

Correlation:

Correlation is simple with seaborn and pandas. It is a way to determine whether two variables in a
dataset are related in any way, and how strongly they are related.

ALGORITHM:

Step 1: Load the library pandas, scipy and seaborn

Step 2: Create a data frame for the given data.

Step 3: Set the x axis, y axis. And call function sns.scatterplot( ).

Step 4: Add the linear-model fit line to the scatterplot using seaborn’s lmplot( ) method.

Step 5: Calculate the pearson correlation coefficient values for x and y using the method pearsonr( ),
which is provided by scipy library.

PROGRAM:

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

from scipy import stats

weather={'temp':[3,39,46,45,34,38,30,22,34,38,31,14,27,39],
'wind_speed':[4.3,3.86,9.73,6.86,7.8,8.7,16.46,3.1,9.1,9.76,17.9,2.2,8.23,11.26]}

weather=pd.DataFrame(weather)

#Scatter plot

ax = sns.scatterplot(x="temp", y="wind_speed", data=weather)

ax.set_title("Temperature Vs Wind_speed")

#adding best fit line

sns.lmplot(x="temp",y="wind_speed", data=weather)

plt.show()

#finding correlation between temp and wind speed

print(stats.pearsonr(weather['temp'], weather['wind_speed']))

cormat = weather.corr().round(2)

print(cormat)

#correlation matrix to heatmap

sns.heatmap(cormat)

OUTPUT:
RESULT:
Exp No: 9 CALCULATION OF CORRELATION
Date: COEFFICIENT

Calculate correlation coefficient between two columns in a pandas data frames.

'points':[25,12,15,14,19,23,15,29],

'assists':[5,7,7,9,12,9,9,4],

'rebound':[11,8,10,6,6,5,9,12]

AIM:

To calculate the correlation coefficient between the points and assists columns using pandas data frames.

ALGORITHM:

Step 1: Load the pandas library


Step 2: Create a data frame for the given data by pd.DataFrame( ).
Step 3: Calculate the correlation coefficient using the method “.corr”
PROGRAM:

import pandas as pd

df=pd.DataFrame({'points':[25,12,15,14,19,23,15,29],

'assists':[5,7,7,9,12,9,9,4],

'rebound':[11,8,10,6,6,5,9,12]})

df.head()

coef=df['points'].corr(df['assists'])

print(coef)

OUTPUT:

RESULT:
Ex.No: 10
CREATION OF A LINEAR REGRESSION
Date:

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable
and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).

AIM:

To write a python program to create the linear regression.

ALGORITHM:

Step 1: Import the required libraries and packages: numpy and LinearRegression from sklearn.linear_model.

Step 2: Create numpy arrays for the input and output data.

Step 3: Create a regression model with sklearn.linear_model.LinearRegression and fit it to the
given data.

Step 4: Check the results of model fitting to know whether the model is satisfactory.

Step 5: Apply the model for prediction.
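Steps 3-5 can be cross-checked by hand: least squares gives the slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept a = ȳ − b·x̄. A small sketch computing both directly on the same data used in the program:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

# closed-form least-squares slope and intercept
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()

print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

These values should match the intercept_ and coef_ attributes reported by LinearRegression.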

PROGRAM:

import numpy as np

from sklearn.linear_model import LinearRegression

x=np.array([5,15,25,35,45,55]).reshape((-1,1))

y=np.array([5,20,14,32,22,38])

print(x)

print(y)

model=LinearRegression().fit(x,y)

r_sq=model.score(x,y)
print('coefficient of determination:',r_sq)

print('intercept:',model.intercept_)

print('slope:',model.coef_)

new_model=LinearRegression().fit(x,y.reshape((-1,1)))

print('intercept:',new_model.intercept_)

print('slope:',new_model.coef_)

y_pred=model.predict(x)

print('predicted response:',y_pred,sep='\n')

y_pred=model.intercept_+model.coef_*x

print('predicted response:',y_pred, sep='\n')

x_new=np.arange(5).reshape((-1,1))

print(x_new)

y_new=model.predict(x_new)

print(y_new)

OUTPUT:
RESULT:
Ex.No: 11
IMPLEMENT DECISION TREE CLASSIFICATION TECHNIQUES
Date:

AIM
To implement a decision tree that visually represents a decision situation and
shows all the factors in the analysis that are relevant to the decision.

ALGORITHM

Step 1: It begins with the original set S as the root node.

Step 2: On each iteration, the algorithm iterates through every unused attribute of the set
S and calculates the Entropy (H) and Information Gain (IG) of that attribute.

Step 3: It then selects the attribute with the smallest entropy or, equivalently, the largest information gain.

Step 4: The set S is then split by the selected attribute to produce subsets of the data.

Step 5: The algorithm continues to recur on each subset, considering only attributes never
selected before.
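Steps 2-3 rest on entropy and information gain; they can be sketched in Python (a small illustration with hypothetical helper names entropy and information_gain, separate from the R program below):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    """IG = H(parent) - weighted sum of subset entropies after a split."""
    n = len(parent_labels)
    return entropy(parent_labels) - sum(len(s) / n * entropy(s) for s in subsets)

# a perfectly mixed set has entropy 1 bit; a pure set has entropy 0
h_mixed = entropy(['low', 'low', 'high', 'high'])   # 1.0
h_pure = entropy(['low', 'low', 'low', 'low'])      # 0.0

# a split that separates the classes completely gains the full 1 bit
ig = information_gain(['low', 'low', 'high', 'high'],
                      [['low', 'low'], ['high', 'high']])
print(h_mixed, h_pure, ig)
```

ID3-style tree builders pick, at each node, the split with the largest such gain.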

PROGRAM

library(MASS)

library(rpart)

head(birthwt)

hist(birthwt$bwt)

table(birthwt$low)

cols <- c('low', 'race', 'smoke', 'ht', 'ui')

birthwt[cols] <- lapply(birthwt[cols], as.factor)

set.seed(1)

train <- sample(1:nrow(birthwt), 0.75 * nrow(birthwt))

birthwtTree <- rpart(low ~ . - bwt, data = birthwt[train, ], method = 'class')

plot(birthwtTree)
text(birthwtTree, pretty = 0)

summary(birthwtTree)

birthwtPred <- predict(birthwtTree, birthwt[-train, ], type = 'class')

table(birthwtPred, birthwt[-train, ]$low)

OUTPUT

RESULT
Ex.No: 12
IMPLEMENTATION OF CLUSTERING TECHNIQUES
Date:

AIM:

To implement a clustering technique (K-Means).

ALGORITHM:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids.
Step-3: Assign each data point to its closest centroid, forming the predefined
K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat Steps 3 and 4 until the centroids no longer change.

PROGRAM:

from numpy import where
from sklearn.datasets import make_classification
from matplotlib import pyplot

# define dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=4)

# create scatter plot for samples from each class
for class_value in range(2):
    # get row indexes for samples with this class
    row_ix = where(y == class_value)
    # create scatter of these samples
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])

# show the plot
pyplot.show()
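The program above only generates and visualizes the synthetic classes; a minimal sketch of actually clustering the same data with K-Means (scikit-learn's KMeans, following Steps 1-4 of the algorithm):

```python
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans

# same synthetic dataset as in the program above
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

# Steps 1-2: choose K and initial centroids; Steps 3-5: assign points and
# update centroids until they converge (handled internally by fit_predict)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=4)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one centroid per cluster
```

Each entry of labels is the index (0 or 1) of the cluster that point was assigned to; the centroids can be overlaid on the scatter plot to visualize the result.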
OUTPUT:

RESULT:
Ex.No: 13 IMPLEMENT AN ALGORITHM TO DEMONSTRATE THE SIGNIFICANCE OF GENETIC ALGORITHM
Date:

AIM:

To implement an algorithm that demonstrates how a genetic algorithm can evolve a
population of candidate solutions to maximize a given objective function, specifically
f(x) = x^2, where x is the integer encoded by a binary string.
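The objective can be checked by hand before running the full program: a binary string is decoded as an unsigned integer x and scored as x^2. A small sketch (the helper name fitness here is illustrative):

```python
def fitness(binary_string):
    # interpret the bits as an unsigned integer, then square it
    x = int(binary_string, 2)
    return x ** 2

print(fitness('1101'))        # 13 squared = 169
print(fitness('1111111111'))  # 1023 squared = 1046529
```

With 10-bit strings the best achievable individual is '1111111111', so the algorithm should converge toward a fitness of 1046529.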

ALGORITHM :

Step 1: Generate an initial population of candidate solutions (binary strings).

Step 2: Compute the fitness of each candidate solution using the function f(x) = x^2.

Step 3: Select candidate solutions for reproduction based on their fitness.

Step 4: Perform crossover (recombination) to produce new candidate solutions.

Step 5: Apply mutation to introduce variability.

Step 6: Replace the old population with the new one.

Step 7: Repeat the process until a stopping criterion is met (e.g., a maximum number of
generations).

PROGRAM :

import numpy as np

# Parameters
POP_SIZE = 20          # Population size
GENE_LENGTH = 10       # Length of binary strings
MUTATION_RATE = 0.01   # Probability of mutation
MAX_GENERATIONS = 100

# Functions
def create_population():
    return np.random.randint(2, size=(POP_SIZE, GENE_LENGTH))

def fitness_function(individual):
    binary_string = ''.join(str(bit) for bit in individual)
    x = int(binary_string, 2)  # Convert binary string to integer
    return x**2  # The fitness function

def evaluate_population(population):
    return np.array([fitness_function(ind) for ind in population])

def select(population, fitness):
    probabilities = fitness / fitness.sum()
    indices = np.random.choice(range(POP_SIZE), size=POP_SIZE, p=probabilities)
    return population[indices]

def crossover(parent1, parent2):
    point = np.random.randint(1, GENE_LENGTH - 1)
    child1 = np.concatenate((parent1[:point], parent2[point:]))
    child2 = np.concatenate((parent2[:point], parent1[point:]))
    return child1, child2

def mutate(individual):
    mutation_indices = np.random.rand(GENE_LENGTH) < MUTATION_RATE
    individual[mutation_indices] = 1 - individual[mutation_indices]
    return individual

def genetic_algorithm():
    population = create_population()
    for generation in range(MAX_GENERATIONS):
        fitness = evaluate_population(population)
        if generation % 10 == 0:
            print(f"Generation {generation}: Best fitness = {fitness.max()}")
        selected_population = select(population, fitness)
        new_population = []
        for i in range(0, POP_SIZE, 2):
            parent1, parent2 = selected_population[i], selected_population[i + 1]
            child1, child2 = crossover(parent1, parent2)
            new_population.append(mutate(child1))
            new_population.append(mutate(child2))
        population = np.array(new_population)
    final_fitness = evaluate_population(population)
    best_individual = population[np.argmax(final_fitness)]
    best_fitness = final_fitness.max()
    return best_individual, best_fitness

# Run the genetic algorithm
best_individual, best_fitness = genetic_algorithm()

best_binary = ''.join(str(bit) for bit in best_individual)

best_x = int(best_binary, 2)

print(f"Best individual: {best_binary} (x = {best_x})")

print(f"Best fitness: {best_fitness}")


OUTPUT :

RESULT:
