
Department of Computer Engineering Subject: DSBDAL

----------------------------------------------------------------------------------------------------------------

Group A
Assignment No: 2
----------------------------------------------------------------------------------------------------------------
Contents for Theory:
1. Creation of Dataset using Microsoft Excel.
2. Identification and Handling of Null Values
3. Identification and Handling of Outliers
4. Data Transformation for the purpose of:
a. To change the scale for better understanding
b. To decrease the skewness and convert the distribution into a normal distribution
---------------------------------------------------------------------------------------------------------------
Theory:
1. Creation of Dataset using Microsoft Excel.
The dataset is created in CSV format.
● The name of the dataset is StudentsPerformance.
● The features of the dataset are: Math_Score, Reading_Score, Writing_Score, Placement_Score, Club_Join_Date.
● Number of Instances: 30
● The response variable is: Placement_Offer_Count.
● Range of Values:
Math_Score [60, 80], Reading_Score [75, 95], Writing_Score [60, 80], Placement_Score [75, 100], Club_Join_Date [2018, 2021].
● The response variable is the number of placement offers facilitated to a particular student, which largely depends on Placement_Score.
To fill the values in the dataset, the RANDBETWEEN function is used. It returns a random integer between the numbers you specify.
Syntax: RANDBETWEEN(bottom, top), where bottom is the smallest integer and top is the largest integer RANDBETWEEN will return.
For better understanding and visualization, impurities are added to 20% of the values of each variable in the dataset.


The steps to create the dataset are as follows:


Step 1: Open Microsoft Excel and click on Save As. Select Other Formats.

Step 2: Enter the name of the dataset and save it as type CSV (MS-DOS).

Step 3: Enter the names of the features as column headers.


Step 4: Fill the data by using the RANDBETWEEN function. For every feature, fill the data by considering the range specified above. One example is given below:
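For the Math_Score column, entering the following formula in the first data cell generates a random integer in [60, 80]:

=RANDBETWEEN(60, 80)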

Drag the fill handle down for 30 rows to create 30 instances.


Repeat this for the features Reading_Score, Writing_Score, Placement_Score, and Club_Join_Date.


The placement offer count largely depends on the placement score. It is considered that if the placement score is less than 75, 1 offer is facilitated; if it is between 75 and 85, 2 offers are facilitated; and if it is greater than 85, 3 offers are facilitated. A nested IF formula is used for ease of data filling.
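A sketch of such a nested IF formula, assuming Placement_Score is in column D starting at row 2 (the cell reference is an assumption, not from the manual):

=IF(D2<75, 1, IF(D2<=85, 2, 3))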


Step 5: Fill impurities into 20% of the data. The range of Math_Score is [60, 80], so update a few instance values to below 60 or above 80. Repeat this for Writing_Score [60, 80], Placement_Score [75, 100], and Club_Join_Date [2018, 2021].

Step 6: To violate the rule of the response variable, update a few values: for some instances whose placement score is greater than 85, facilitate only 1 offer.

The dataset is created with the given description.

2. Identification and Handling of Null Values


Missing data can occur when no information is provided for one or more items or for a whole unit. Missing data is a very big problem in real-life scenarios. Missing data is also referred to as NA (Not Available) values in pandas. Many datasets simply arrive with missing data, either because it exists and was not collected or because it never existed. For example, some users being surveyed may choose not to share their income, and others may choose not to share their address; in this way, many values in a dataset go missing.
In pandas, missing data is represented by two values:

1. None: None is a Python singleton object that is often used for missing data in
Python code.
2. NaN: NaN (an acronym for Not a Number) is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.

Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To facilitate this convention, there are several useful functions for detecting, removing, and replacing null values in a pandas DataFrame:

● isnull()
● notnull()
● dropna()
● fillna()
● replace()
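A minimal, self-contained sketch of these five functions on a toy frame (the toy frame is illustrative, not the assignment dataset):

import pandas as pd
import numpy as np

toy = pd.DataFrame({"math score": [65, np.nan, 72]})
print(toy.isnull())              # True where the value is NaN
print(toy.notnull())             # True where the value is present
print(toy.fillna(0))             # NaN replaced by 0
print(toy.dropna())              # row containing NaN removed
print(toy.replace(np.nan, -99))  # NaN replaced by -99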
1. Checking for missing values using isnull() and notnull()

● Checking for missing values using isnull()


In order to check null values in a pandas DataFrame, the isnull() function is used. This function returns a DataFrame of Boolean values which are True for NaN values.

Algorithm:
Step 1 : Import pandas and numpy in order to check missing values in Pandas
DataFrame
import pandas as pd
import numpy as np
Step 2: Load the dataset in dataframe object df
df=pd.read_csv("/content/StudentsPerformanceTest1.csv")
Step 3: Display the data frame
df

Step 4: Use isnull() function to check null values in the dataset.


df.isnull()

SNJB’s Late Sau. K B Jain College of Engineering, Chandwad Dist. Nashik, MS


Department of Computer Engineering Subject : DSBDAL

Step 5: Create a Boolean series that is True for NaN values in a specific column, for example math score, and display only the rows where math score is NaN.
series = pd.isnull(df["math score"])
df[series]

● Checking for missing values using notnull()


In order to check null values in a pandas DataFrame, the notnull() function is used. This function returns a DataFrame of Boolean values which are False for NaN values.

Algorithm:
Step 1 : Import pandas and numpy in order to check missing values in Pandas
DataFrame
import pandas as pd
import numpy as np
Step 2: Load the dataset in dataframe object df
df=pd.read_csv("/content/StudentsPerformanceTest1.csv")
Step 3: Display the data frame
df


Step 4: Use notnull() function to check null values in the dataset.


df.notnull()

Step 5: Create a Boolean series that is True where the math score column is not NaN, and display only the rows where math score is present.
series1 = pd.notnull(df["math score"])
df[series1]


Note that there are also categorical values in the dataset; to convert them to numeric form, Label Encoding or One-Hot Encoding is used. LabelEncoder maps each distinct category to an integer.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['gender'] = le.fit_transform(df['gender'])
newdf=df
df

2. Handling missing values using dropna(), fillna(), replace()

In order to fill null values in a dataset, the fillna() and replace() functions are used. These functions replace NaN values with a value of your choosing; dropna() instead removes the rows or columns that contain them.

● For replacing placeholder values such as "Na" and "na" with NaN while loading the file

missing_values = ["Na", "na"]


df = pd.read_csv("StudentsPerformanceTest1.csv", na_values = missing_values)
df

● Filling null values with a single value


Step 1 : Import pandas and numpy in order to check missing values in Pandas
DataFrame
import pandas as pd
import numpy as np
Step 2: Load the dataset in dataframe object df
df=pd.read_csv("/content/StudentsPerformanceTest1.csv")
Step 3: Display the data frame
df
Step 4: Fill missing values with a constant using fillna()
ndf=df
ndf.fillna(0)


Step 5: Fill missing values using the mean, median, or standard deviation of that column.

df['math score'] = df['math score'].fillna(df['math score'].mean())
df['math score'] = df['math score'].fillna(df['math score'].median())
df['math score'] = df['math score'].fillna(df['math score'].std())

To replace missing values in a column with the minimum or maximum value of that column:

df['math score'] = df['math score'].fillna(df['math score'].min())
df['math score'] = df['math score'].fillna(df['math score'].max())

● Filling null values in the dataset in place

To fill null values in the dataset itself, use inplace=True.
m_v=df['math score'].mean()
df['math score'].fillna(value=m_v, inplace=True)
df

● Filling null values using the replace() method

The following line replaces NaN values in the dataframe with the value -99.
ndf.replace(to_replace = np.nan, value = -99)


● Deleting null values using the dropna() method

In order to drop null values from a dataframe, the dropna() function is used. This function drops rows/columns with null values in different ways:
1. Dropping rows with at least 1 null value
2. Dropping rows if all values in that row are missing
3. Dropping columns with at least 1 null value.
4. Dropping Rows with at least 1 null value in CSV file

Algorithm:
Step 1 : Import pandas and numpy in order to check missing values in Pandas
DataFrame
import pandas as pd
import numpy as np
Step 2: Load the dataset in dataframe object df
df=pd.read_csv("/content/StudentsPerformanceTest1.csv")
Step 3: Display the data frame
df
Step 4: To drop rows with at least 1 null value
ndf.dropna()


Step 5: To Drop rows if all values in that row are missing


ndf.dropna(how = 'all')

Step 6: To Drop columns with at least 1 null value.


ndf.dropna(axis = 1)

Step 7: To drop rows with at least 1 null value in the CSV file.

# making a new data frame with dropped NA values
new_data = ndf.dropna(axis = 0, how ='any')
new_data


3. Identification and Handling of Outliers


3.1 Identification of Outliers
One of the most important steps in data preprocessing is detecting and treating outliers, as they can negatively affect statistical analysis and the training process of a machine learning algorithm, resulting in lower accuracy.
○ 1. What are Outliers?
We have all heard of the idiom 'odd one out', which means something unusual in comparison to the others in a group.

Similarly, an outlier is an observation in a given dataset that lies far from the rest of the observations. That means an outlier is vastly larger or smaller than the remaining values in the set.

○ 2. Why do they occur?

An outlier may occur due to variability in the data, or due to experimental error/human error.

They may indicate an experimental error or heavy skewness in the data (heavy-tailed distribution).

○ 3. What do they affect?

In statistics, we have three measures of central tendency, namely Mean, Median, and Mode. They help us describe the data.

The mean is the accurate measure to describe the data when we do not have any outliers present. The median is used if there is an outlier in the dataset. The mode is used if there is an outlier AND about half or more of the data is the same.

The mean is the measure of central tendency most affected by outliers, which in turn impacts the standard deviation.

■ Example:
Consider a small dataset, sample = [15, 101, 18, 7, 13, 16, 11, 21, 5, 15, 10, 9]. By looking at it, one can quickly say '101' is an outlier, much larger than the other values.


Fig.: Computation of mean and median with and without the outlier

From the above calculations, we can clearly say the Mean is more affected than the
Median.
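A quick check with numpy confirms this (the numbers follow directly from the sample above):

import numpy as np
sample = [15, 101, 18, 7, 13, 16, 11, 21, 5, 15, 10, 9]
print(np.mean(sample), np.median(sample))                        # about 20.08 and 14.0
without_outlier = [x for x in sample if x != 101]
print(np.mean(without_outlier), np.median(without_outlier))      # about 12.73 and 13.0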
○ 4. Detecting Outliers
If our dataset is small, we can detect the outlier by just looking at the dataset. But what if we have a huge dataset? How do we identify the outliers then? We need to use visualization and mathematical techniques.

Below are some of the techniques for detecting outliers:

● Boxplots
● Scatterplots
● Z-score
● Interquartile Range (IQR)

4.1 Detecting outliers using Boxplot:

A boxplot captures the summary of the data effectively and efficiently with only a simple box and whiskers. It summarizes sample data using the 25th, 50th, and 75th percentiles. One can get insights (quartiles, median, and outliers) into the dataset by just looking at its boxplot.
Algorithm:
Step 1 : Import pandas and numpy libraries
import pandas as pd
import numpy as np
Step 2: Load the dataset in dataframe object df
df=pd.read_csv("/content/demo.csv")
Step 3: Display the data frame
df


Step 4: Select the columns for the boxplot and draw the boxplot.

col = ['math score', 'reading score', 'writing score', 'placement score']
df.boxplot(col)

Step 5: We can now print the outliers for each column with reference to the box plot.
print(np.where(df['math score']>90))
print(np.where(df['reading score']<25))
print(np.where(df['writing score']<30))

4.2 Detecting outliers using Scatterplot:


A scatter plot is used when you have paired numerical data, when your dependent variable has multiple values for each value of the independent variable, or when trying to determine the relationship between two variables. In the process of utilizing the scatter plot, one can also use it for outlier detection.
To plot the scatter plot, one requires two variables that are somehow related to each other. So here the Placement Score and Placement Offer Count features are used.
Algorithm:
Step 1 : Import pandas , numpy and matplotlib libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Step 2: Load the dataset in dataframe object df


df=pd.read_csv("/content/demo.csv")
Step 3: Display the data frame
df
Step 4: Draw the scatter plot with placement score and placement offer count
fig, ax = plt.subplots(figsize = (18,10))
ax.scatter(df['placement score'], df['placement offer count'])
plt.show()
Labels for the axes can be assigned (optional):
ax.set_xlabel('Placement score')
ax.set_ylabel('Placement offer count')


Step 5: We can now print the outliers with reference to the scatter plot.
print(np.where((df['placement score']<50) & (df['placement offer count']>1)))
print(np.where((df['placement score']>85) & (df['placement offer count']<3)))

4.3 Detecting outliers using Z-Score:

The Z-score is also called a standard score. This value helps to understand how far a data point is from the mean. After setting up a threshold value, one can use the z-scores of data points to define the outliers.

z = (data_point - mean) / standard_deviation
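The same score can be computed directly from the formula; a sketch, assuming df['math score'] is numeric (note that scipy's stats.zscore uses the population standard deviation, i.e. ddof=0, while pandas defaults to ddof=1):

import numpy as np
z_manual = np.abs((df['math score'] - df['math score'].mean()) / df['math score'].std(ddof=0))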

Algorithm:
Step 1 : Import numpy and stats from scipy libraries
import numpy as np
from scipy import stats

Step 2: Calculate the Z-score for the math score column

z = np.abs(stats.zscore(df['math score']))
Step 3: Print the Z-score values. This prints the z-score of each data item in the column
print(z)


Step 4: Now, to define an outlier, a threshold value is chosen. Data points whose z-score exceeds the threshold are treated as outliers (3 is the commonly used convention).

threshold = 3
Step 5: Display the sample outliers
sample_outliers = np.where(z > threshold)
sample_outliers

4.4 Detecting outliers using the Interquartile Range (IQR):

The interquartile range approach to finding outliers is the most commonly used and most trusted approach in the research field.
IQR = Quartile3 - Quartile1
To define the outliers, a base value is set above and below the dataset's normal range, namely the upper and lower bounds, using 1.5 * IQR:

upper = Q3 + 1.5*IQR
lower = Q1 - 1.5*IQR

In the above formula, by convention, the IQR is scaled up by half (1.5*IQR = IQR + 0.5*IQR).

Algorithm:
Step 1 : Import numpy library
import numpy as np

Step 2: Sort the reading score feature and store it in sorted_rscore.

sorted_rscore = sorted(df['reading score'])
Step 3: Print sorted_rscore
sorted_rscore
Step 4: Calculate and print Quartile 1 and Quartile 3
q1 = np.percentile(sorted_rscore, 25)


q3 = np.percentile(sorted_rscore, 75)
print(q1,q3)

Step 5: Calculate the value of IQR (Interquartile Range)

IQR = q3-q1
Step 6: Calculate and print the upper and lower bounds that define the outlier base values.
lwr_bound = q1-(1.5*IQR)
upr_bound = q3+(1.5*IQR)
print(lwr_bound, upr_bound)

Step 7: Print the outliers

r_outliers = []
for i in sorted_rscore:
    if i < lwr_bound or i > upr_bound:
        r_outliers.append(i)
print(r_outliers)
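Equivalently, the outliers can be collected with a Boolean mask instead of an explicit loop (a sketch using the same bounds):

r_series = df['reading score']
r_outliers = r_series[(r_series < lwr_bound) | (r_series > upr_bound)].tolist()
print(r_outliers)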

3.2 Handling of Outliers:

To remove an outlier, one must remove the corresponding entry from the dataset using its exact position, because all of the above detection methods ultimately produce the list of data items that satisfy the outlier definition of the method used.

Below are some of the methods of treating the outliers:

● Trimming/removing the outlier
● Quantile-based flooring and capping
● Mean/Median imputation

● Trimming/removing the outlier:

In this technique, we remove the outliers from the dataset, although it is not a good practice to follow.

new_df = df.copy()  # work on a copy so the original dataframe is preserved
for i in sample_outliers:
    new_df.drop(i, inplace=True)
new_df

Here sample_outliers holds the indices of the detected outliers; in this sample run, the instances with index 0, 12, 16 and 17 are deleted.

● Quantile-based flooring and capping:

In this technique, the outlier is capped at the 90th percentile value at the upper end, or floored at the 10th percentile value at the lower end.
df=pd.read_csv("/demo.csv")
df_stud=df
ninetieth_percentile = np.percentile(df_stud['math score'], 90)
b = np.where(df_stud['math score']>ninetieth_percentile, ninetieth_percentile, df_stud['math score'])
print("New array:",b)


df_stud.insert(1,"m score",b,True)
df_stud
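Flooring at the lower end works symmetrically; a sketch, assuming the 10th percentile as the floor:

tenth_percentile = np.percentile(df_stud['math score'], 10)
c = np.where(df_stud['math score'] < tenth_percentile, tenth_percentile, df_stud['math score'])
print("Floored array:", c)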

● Mean/Median imputation:
As the mean value is highly influenced by the outliers, it is advised to replace the
outliers with the median value.
1. Plot the box plot for reading score
col = ['reading score']
df.boxplot(col)

2. Outliers are seen in the box plot.

3. Calculate the median of reading score using sorted_rscore
median=np.median(sorted_rscore)
median
4. Replace the upper-bound outliers with the median value
refined_df = df.copy()  # copy so the original dataframe is preserved

refined_df['reading score'] = np.where(refined_df['reading score'] > upr_bound, median, refined_df['reading score'])
5. Display refined_df

6. Replace the lower-bound outliers with the median value

refined_df['reading score'] = np.where(refined_df['reading score'] < lwr_bound, median, refined_df['reading score'])
7. Display refined_df

8. Draw the box plot for refined_df

col = ['reading score']
refined_df.boxplot(col)


4. Data Transformation

Data transformation is the process of converting raw data into a format or structure that is more suitable for model building and for data discovery in general. The process of data transformation can also be referred to as extract/transform/load (ETL). The extraction phase involves identifying and pulling data from the various source systems that create data and then moving the data to a single repository. Next, the raw data is cleansed, if needed. It is then transformed into a target format that can be fed into operational systems or into a data warehouse, a data lake, or another repository for use in business intelligence and analytics applications. The data are transformed in ways that are ideal for mining. Data transformation involves the following steps:
● Smoothing: It is a process used to remove noise from the dataset using some algorithms. It allows for highlighting important features present in the dataset and helps in predicting patterns.
● Aggregation: Data collection or aggregation is the method of storing and presenting data in a summary format. The data may be obtained from multiple data sources to integrate these data sources into a data analysis description. This is a crucial step, since the accuracy of data analysis insights is highly dependent on the quantity and quality of the data used.
● Generalization: It converts low-level data attributes to high-level data attributes using a concept hierarchy. For example, Age initially in numerical form (22, 25) is converted into a categorical value (young, old).
● Normalization: Data normalization involves converting all data variables into a given range. Some of the techniques used for accomplishing normalization (see the sketch after this list) are:
○ Min-max normalization: This transforms the original data linearly.
○ Z-score normalization: In z-score normalization (or zero-mean normalization), the values of an attribute (A) are normalized based on the mean of A and its standard deviation.
○ Normalization by decimal scaling: It normalizes the values of an attribute by changing the position of their decimal points.
● Attribute or feature construction:
○ New attributes constructed from the given ones: New attributes are created and applied to assist the mining process from the given set of attributes. This simplifies the original data and makes mining more efficient.
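A minimal sketch of the three normalization techniques on a single numeric column (the column values here are illustrative, not the assignment dataset):

import numpy as np
import pandas as pd

x = pd.Series([60, 75, 80, 95])
# Min-max normalization: linear rescaling to [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())
# Z-score normalization: zero mean, unit standard deviation
x_z = (x - x.mean()) / x.std()
# Decimal scaling: divide by 10^j, choosing j so the values fall below 1 in magnitude
j = int(np.ceil(np.log10(x.abs().max())))
x_dec = x / (10 ** j)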
In this assignment, the purpose of the transformation should be one of the following reasons:

a. To change the scale for better understanding (attribute or feature construction)

Here the Club_Join_Date is transformed to Duration.
Algorithm:
Step 1 : Import pandas and numpy libraries
import pandas as pd
import numpy as np
Step 2: Load the dataset in dataframe object df
df=pd.read_csv("/content/demo.csv")
Step 3: Display the data frame
df


Step 4: Change the scale of the joining year to a duration.
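A minimal sketch of this transformation, assuming 2022 as the reference year and that Club_Join_Date stores the joining year as an integer (the column name follows the dataset description; the reference year is an assumption):

df['Duration'] = 2022 - df['Club_Join_Date']
df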

b. To decrease the skewness and convert the distribution into a normal distribution (normalization by decimal scaling)

Data skewness: It is asymmetry in a statistical distribution, in which the curve appears distorted or skewed either to the left or to the right. Skewness can be quantified to define the extent to which a distribution differs from a normal distribution.
Normal distribution: In a normal distribution, the graph appears as a classical, symmetrical "bell-shaped curve." The mean, or average, and the mode, or maximum point on the curve, are equal.

Positively Skewed Distribution

SNJB’s Late Sau. K B Jain College of Engineering, Chandwad Dist. Nashik, MS


Department of Computer Engineering Subject : DSBDAL

A positively skewed distribution means that the extreme data results are larger. This skews the data in that it brings the mean (average) up. The mean will be larger than the median in a positively skewed distribution.
A negatively skewed distribution means the opposite: the extreme data results are smaller. This means that the mean is brought down, and the median is larger than the mean in a negatively skewed distribution.

Reducing skewness: A data transformation may be used to reduce skewness. A distribution that is symmetric or nearly so is often easier to handle and interpret than a skewed distribution. The logarithm, x to log10(x), ln(x), or log2(x), is a strong transformation with a major effect on distribution shape. It is commonly used for reducing right skewness and is often appropriate for measured variables. It cannot be applied to zero or negative values.

Algorithm:
Step 1: Detect outliers using the Z-score for the math score variable and remove them.
Step 2: Observe the histogram for the math score variable.
import matplotlib.pyplot as plt
new_df['math score'].plot(kind = 'hist')
Step 3: Convert the variable to a logarithm at scale 10.
df['log_math'] = np.log10(df['math score'])


Step 4: Observe the histogram for the log-transformed variable.

df['log_math'].plot(kind = 'hist')

It is observed that the skewness is reduced to some extent.
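The change can also be quantified; pandas provides a sample-skewness estimate, where a value closer to 0 indicates a more symmetric distribution:

print(df['math score'].skew(), df['log_math'].skew())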


Conclusion: In this way we have explored the functions of the Python pandas library for identifying and handling null values and outliers. Data transformation techniques were explored with the purpose of creating a new variable and reducing the skewness of the dataset.
Assignment Questions:
1. Explain the methods to detect outliers.
2. Explain data transformation methods.
3. Write the algorithm to display the statistics of null values present in the dataset.
4. Write an algorithm to replace the outlier values with the mean of the variable.
