Practical No - 1

Uploaded by

Deep Tayade


Practical No-1

Date of Conduction: Date of Checking:

Data Wrangling, I
Perform the following operations using Python on any open source dataset (e.g., data.csv)
1. Import all the required Python Libraries.
2. Locate an open source dataset on the web (e.g. https://www.kaggle.com). Provide a clear
description of the data and its source (i.e., URL of the web site).
3. Load the Dataset into pandas data frame.
4. Data Preprocessing: check for missing values in the data using the pandas isnull() and describe()
functions to get some initial statistics. Provide variable descriptions, types of variables, etc.
Check the dimensions of the data frame.
5. Data Formatting and Data Normalization: Summarize the types of variables by checking the
data types (i.e., character, numeric, integer, factor, and logical) of the variables in the data set.
If variables are not in the correct data type, apply proper type conversions.
6. Turn categorical variables into quantitative variables in Python. In addition to the codes and
outputs, explain every operation that you do in the above steps and explain everything that you
do to import/read/scrape the data set.

Python Code:

# 1. Import all the required Python Libraries.

import pandas as pd
import numpy as np

# 2. Locate an open source data from the web.

# In this example, I'll use the Iris dataset available at the UCI Machine Learning Repository.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# 3. Load the Dataset into pandas data frame.

column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
iris_df = pd.read_csv(url, names=column_names)
# Display the first few rows of the dataset to verify the import.
print("First few rows of the Iris dataset:")
print(iris_df.head())

# 4. Data Preprocessing:
# info() lists the non-null count per column (a quick missing-value check);
# describe() gives summary statistics for the numeric columns.
print("\nInformation about the dataset:")
print(iris_df.info())

print("\nDescriptive statistics of the dataset:")

print(iris_df.describe())

# Variable Descriptions:
# - Sepal Length, Sepal Width, Petal Length, Petal Width: Numeric variables.
# - Class: Categorical variable representing the species of iris flowers.

# Check the dimensions of the data frame.

print("\nDimensions of the dataset (rows, columns):", iris_df.shape)

# 5. Data Formatting and Normalization:

# Summarize the types of variables by checking data types.
print("\nData Types of Variables:")
print(iris_df.dtypes)

# Ensure that numeric variables are in the correct data type.

# In this case, they are already in the correct data types (float64).

# 6. Turn categorical variables into quantitative variables.

# The 'class' variable is categorical; we can use one-hot encoding to convert it to quantitative.
iris_df = pd.get_dummies(iris_df, columns=['class'], drop_first=True)
# Display the updated dataframe.
print("\nUpdated DataFrame after one-hot encoding:")
print(iris_df.head())
Explanation:

• The code starts by importing necessary libraries, including Pandas for data
manipulation and NumPy for numerical operations.
• The dataset URL is specified, and the read_csv function from Pandas is used to load the
dataset into a Pandas DataFrame.
• The info() function reports the non-null count and data type of each column (150
non-null in every column, so no values are missing), and describe() gives summary
statistics for the numeric columns.
• Variable descriptions are provided, and the dimensions of the DataFrame are printed.
• The data types of variables are displayed using dtypes.
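• The 'class' variable is categorical, so one-hot encoding is applied using
pd.get_dummies() to convert it into indicator columns. Because drop_first=True, the
first category (Iris-setosa) is dropped and serves as the baseline: a row in which both
remaining indicators are False is Iris-setosa.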
• The updated DataFrame is displayed.
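
The task statement also asks for an isnull() check and for type conversions where a column has the wrong type. The Iris data needs neither, so the following is only a minimal sketch on a hypothetical toy frame (toy_df and its values are assumptions made up for illustration, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: sepal_length is stored as text with one bad entry,
# and sepal_width has one genuinely missing value.
toy_df = pd.DataFrame({
    "sepal_length": ["5.1", "4.9", "n/a", "4.6"],
    "sepal_width": [3.5, np.nan, 3.2, 3.1],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

# Convert text to numbers; errors="coerce" turns unparseable entries into NaN.
toy_df["sepal_length"] = pd.to_numeric(toy_df["sepal_length"], errors="coerce")

# Store the label column as a pandas categorical instead of generic object.
toy_df["class"] = toy_df["class"].astype("category")

# isnull() gives a boolean frame; summing it counts missing cells per column.
missing_per_column = toy_df.isnull().sum()

print(toy_df.dtypes)
print(missing_per_column)
```

Running the conversion before the isnull() count matters here: the "n/a" string only shows up as a missing value once it has been coerced to NaN.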

Output:

"C:\Users\Ram Kumar Solanki\PycharmProjects\pythonProject\venv\Scripts\python.exe"

"C:\Users\Ram Kumar Solanki\PycharmProjects\MBA_BFS\main.py"
First few rows of the Iris dataset:
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

Information about the dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   class         150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
None

Descriptive statistics of the dataset:

       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Dimensions of the dataset (rows, columns): (150, 5)

Data Types of Variables:

sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
class            object
dtype: object

Updated DataFrame after one-hot encoding:

   sepal_length  sepal_width  ...  class_Iris-versicolor  class_Iris-virginica
0           5.1          3.5  ...                  False                 False
1           4.9          3.0  ...                  False                 False
2           4.7          3.2  ...                  False                 False
3           4.6          3.1  ...                  False                 False
4           5.0          3.6  ...                  False                 False

[5 rows x 6 columns]

Process finished with exit code 0
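
The final output has six columns and boolean indicators because get_dummies() was called with drop_first=True and its default dtype. As a hedged sketch of two variations, the toy frame below (toy_df is a made-up three-row example, one row per species) shows how to get 0/1 integer indicators via the dtype argument, and how to produce a single integer code per class instead of several indicator columns:

```python
import pandas as pd

# Hypothetical three-row frame, one row per Iris species.
toy_df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# One-hot encoding with 0/1 integers instead of True/False.
# drop_first=True removes the first category (Iris-setosa) as the baseline.
encoded = pd.get_dummies(toy_df, columns=["class"], drop_first=True, dtype=int)
print(encoded)

# Alternative: one integer code per class (categories are sorted alphabetically).
codes = toy_df["class"].astype("category").cat.codes
print(codes.tolist())
```

Integer codes are compact but imply an ordering among the species that does not exist, so indicator columns are usually the safer choice for models that treat inputs as numeric magnitudes.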

Date:
Name & Signature of Instructor

Dr. Ram Kumar Solanki
