
DWM Practical

The document presents two programs on data cleaning and analysis. The first smooths data by binning, using the bin mean, bin median and bin boundaries. The second calculates measures of central tendency (mean, median, mode and midrange), computes the five-number summary and identifies outliers.

Uploaded by kirtanpatel6365
Copyright © All Rights Reserved

200410116027 TY-IT-2-C

PRACTICAL-1

AIM: Write a program that implements “Data Cleaning”: smoothing by binning using the bin mean, bin median and bin boundaries.

INPUT:

import numpy as np
from sklearn.datasets import load_iris

# Load the Iris data set: 150 samples x 4 feature columns
dataset = load_iris()
a = dataset.data

# Take the column at index 1 (the 2nd column, sepal width) and sort it
b = np.sort(a[:, 1])

# Create 30 equal-frequency bins of 5 values each (150 / 5 = 30)
n_bins = len(b) // 5
bin1 = np.zeros((n_bins, 5))  # smoothing by bin mean
bin2 = np.zeros((n_bins, 5))  # smoothing by bin boundaries
bin3 = np.zeros((n_bins, 5))  # smoothing by bin median

# Bin mean: replace every value in a bin by the mean of that bin
for i in range(0, len(b), 5):
    k = i // 5
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Smoothing by Bin Mean: \n", bin1)

# Bin boundaries: replace each value by the nearer of the bin's minimum (b[i])
# and maximum (b[i+4]); ties go to the maximum
for i in range(0, len(b), 5):
    k = i // 5
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Smoothing by Bin Boundaries: \n", bin2)

# Bin median: each sorted bin holds 5 values, so the median is the middle one
for i in range(0, len(b), 5):
    k = i // 5
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Smoothing by Bin Median: \n", bin3)
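The three loops above can also be written without explicit index arithmetic. The sketch below is one possible vectorized NumPy version, not the program's own method; the function name `smooth_by_bins` and its `depth` parameter (values per bin) are illustrative, and it assumes the number of values is an exact multiple of `depth`, as 150 is of 5.

```python
import numpy as np

def smooth_by_bins(values, depth):
    """Equal-frequency binning smoothing (hypothetical helper).

    Returns (by_mean, by_median, by_boundaries), each with one row per bin.
    Assumes len(values) is an exact multiple of depth.
    """
    # Sort, then reshape so that each row is one bin of `depth` values
    b = np.sort(np.asarray(values, dtype=float)).reshape(-1, depth)

    # Bin mean and bin median, repeated across every position in the bin
    by_mean = np.repeat(b.mean(axis=1, keepdims=True), depth, axis=1)
    by_median = np.repeat(np.median(b, axis=1, keepdims=True), depth, axis=1)

    # Bin boundaries: column vectors holding each bin's min and max
    lo = b[:, :1]
    hi = b[:, -1:]
    # Nearer boundary wins; ties go to the upper boundary, as in the loop version
    by_boundaries = np.where((b - lo) < (hi - b), lo, hi)
    return by_mean, by_median, by_boundaries
```

For the Iris column used above, a call such as `smooth_by_bins(a[:, 1], 5)` should give the same values as `bin1`, `bin3` and `bin2` (up to floating-point rounding in the mean). Reshaping the sorted data into one row per bin is what lets `mean`, `median` and `where` operate on all bins at once.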

OUTPUT:

Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
Partition using the equal-frequency approach:

- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
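This partition, and the mean and boundary smoothing applied to it, can be checked with a small plain-Python sketch. The helper names `equal_frequency_bins`, `smooth_by_mean` and `smooth_by_boundaries` are hypothetical; boundary ties are sent to the upper boundary, matching the comparison used in the program above.

```python
def equal_frequency_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` items."""
    s = sorted(values)
    return [s[i:i + depth] for i in range(0, len(s), depth)]

def smooth_by_mean(bin_values):
    """Replace every value in the bin by the bin mean."""
    m = sum(bin_values) / len(bin_values)
    return [m] * len(bin_values)

def smooth_by_boundaries(bin_values):
    """Replace each value by the nearer bin boundary (min or max); ties go to max."""
    lo, hi = min(bin_values), max(bin_values)
    return [lo if (x - lo) < (hi - x) else hi for x in bin_values]

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
for n, b in enumerate(equal_frequency_bins(prices, 4), start=1):
    print(f"Bin {n}: {b}, means {smooth_by_mean(b)}, boundaries {smooth_by_boundaries(b)}")
```

Bins 2 and 3 have exact means of 22.75 and 29.25; a listing that shows 23 and 29 has rounded them to whole dollars. The boundary rule gives 4, 4, 4, 15 for Bin 1; 21, 21, 25, 25 for Bin 2; and 26, 26, 26, 34 for Bin 3.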
Smoothing by bin means:

- Bin 1: 9, 9, 9, 9


PRACTICAL-2

AIM: Write a program on the “central tendency of data” that calculates the mean, median, mode, midrange and five-number summary.

INPUT:

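import statistics
import matplotlib.pyplot as plt
import numpy as np

data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 53, 70]
print(data)
print("mean of given data %s" % statistics.mean(data))
print("median of given data %s" % statistics.median(data))
# 25 and 35 both occur four times, so the data are bimodal;
# statistics.multimode (Python 3.8+) lists every mode instead of just one
print("mode of given data %s" % statistics.multimode(data))

# Use names that do not shadow the built-in min() and max()
data_min = np.min(data)
print("min value of given data", data_min)

data_max = np.max(data)
print("max value of given data", data_max)

Q1 = np.percentile(data, 25)
print("Q1 is", Q1)

Q2 = np.percentile(data, 50)  # Q2 is the median
print("Q2 is", Q2)

Q3 = np.percentile(data, 75)
print("Q3 is", Q3)

IQR = Q3 - Q1
print("IQR is", IQR)

# Midrange is the average of the smallest and largest values
midrange = (data_max + data_min) / 2
print("midrange is", midrange)

# Five-number summary: minimum, Q1, median, Q3, maximum
print("five number summary:", data_min, Q1, Q2, Q3, data_max)

# Values more than 1.5 * IQR below Q1 or above Q3 are outliers
out = 1.5 * IQR
lb = Q1 - out
ub = Q3 + out
outliers = []
for i in data:
    if i < lb or i > ub:
        outliers.append(i)
print("outliers are", outliers)

# Box plot of the data: the box spans Q1 to Q3, with a line at the median
fig = plt.figure(figsize=(8, 5))
plt.boxplot(data)
plt.show()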
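NumPy is not strictly needed for the five-number summary. The standard-library sketch below is one way to obtain the same quartiles; the helper names `five_number_summary` and `iqr_outliers` are hypothetical. It relies on `statistics.quantiles` (Python 3.8+) with `method="inclusive"`, which interpolates between order statistics the same way as NumPy's default `np.percentile`; textbooks that use a different quartile convention may report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using linear ("inclusive") quartiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def iqr_outliers(data, k=1.5):
    """Return the values below Q1 - k*IQR or above Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 53, 70]
print("five number summary:", five_number_summary(data))
print("midrange:", (min(data) + max(data)) / 2)
print("outliers:", iqr_outliers(data))
```

For this data the sketch gives Q1 = 20.75, median = 25.0 and Q3 = 35.0, so the fences are -0.625 and 56.375, the only outlier is 70, and the midrange is 41.5.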

OUTPUT:

- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:

- Bin 1: 4, 4, 4, 15

SVIT