practical-7

The document outlines a practical assignment for a Python Programming Lab focused on data manipulation using the Pandas library. Students are required to develop an application that performs various operations on a car dataset, including data cleaning and analysis. It also provides information on hardware and software requirements, theoretical background on Pandas, and expected learning outcomes.

Uploaded by

abhaykatre70
Python Programming Lab (N-SECCS401P)

S. B. JAIN INSTITUTE OF TECHNOLOGY, MANAGEMENT & RESEARCH, NAGPUR.

Practical No. 7
Aim: Develop an application which can read data from a given dataset.
The dataset contains information regarding cars manufactured by
various companies. Ensure your application performs the following operations:
● Print the first and last five rows.
● Clean the dataset and update the CSV file.
● Find the name of the company with the most expensive car.
● Print the details of all Toyota cars.
● Count the total number of cars per company.

Name of Student:
Roll No.:
Semester/Year:
Academic Session:
Date of Performance:
Date of Submission:

Department of Computer Science & Engineering, S.B.J.I.T.M.R., Nagpur



AIM: Develop an application which can read data from a given dataset. The dataset contains
information regarding cars manufactured by various companies. Ensure your application performs
the following operations:
● Print the first and last five rows.
● Clean the dataset and update the CSV file.
● Find the name of the company with the most expensive car.
● Print the details of all Toyota cars.
● Count the total number of cars per company.

OBJECTIVE/EXPECTED LEARNING OUTCOME:


● To understand the pandas library.
● To understand DataFrames.
● To understand operations on DataFrames.

HARDWARE AND SOFTWARE REQUIREMENT:


Hardware Requirement
● Processor : Dual Core
● RAM : 1GB
● Hard Disk Drive : > 80 GB

Software Requirement
● Operating System – Windows 2007 and Ubuntu
● Packages used – Python 3, NumPy, Pandas, Django
● IDE – Visual Studio, PyCharm
● Editors – Text editor, Sublime Text
● Online platforms – Jupyter, Google Colab

THEORY:
Pandas is an open-source Python package that is widely used for data science, data
analysis, and machine learning tasks. It is built on top of another package named NumPy, which
provides support for multi-dimensional arrays. As one of the most popular data-wrangling
packages, Pandas works well with many other data science modules inside the Python
ecosystem. It is not part of the Python standard library, but it is bundled with many
data-science-oriented distributions, such as ActiveState’s ActivePython, and can otherwise be
installed separately (for example, with pip).
Pandas simplifies many of the time-consuming, repetitive tasks associated
with working with data, including:
⮚ Data cleansing
⮚ Data fill
⮚ Data normalization
⮚ Merges and joins
⮚ Data visualization
⮚ Statistical analysis
⮚ Data inspection
⮚ Loading and saving data
⮚ And much more
These capabilities are why many data scientists regard Pandas as one of the
leading tools available for data analysis and manipulation.
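A few of the tasks listed above can be sketched as follows. This is a minimal illustration, not part of the original practical: the `readings` table, its column names, and the choice of mean imputation and min-max scaling are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical readings with one missing value (None becomes NaN).
readings = pd.DataFrame({
    "sensor": ["a", "b", "c", "d"],
    "value": [10.0, None, 30.0, 20.0],
})

# Data fill: replace the missing value with the mean of the column.
filled = readings.fillna({"value": readings["value"].mean()})

# Data normalization: min-max scaling of the column to the range [0, 1].
low = filled["value"].min()
high = filled["value"].max()
filled["scaled"] = (filled["value"] - low) / (high - low)

# Data inspection and basic statistical analysis.
print(filled)
print(filled["value"].describe())
```

Mean imputation is only one way to fill gaps; dropping the incomplete rows with `dropna()` is an equally common choice when the missing values cannot be estimated sensibly.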

Features of pandas:
• A fast and efficient DataFrame object for data manipulation with integrated indexing.
• Tools for reading and writing data between in-memory data structures and different formats:
CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format.
• Intelligent data alignment and integrated handling of missing data: gain automatic
label-based alignment in computations and easily manipulate messy data into an orderly form.
• Flexible reshaping and pivoting of data sets.
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
• Size mutability: columns can be inserted into and deleted from data structures.
• Aggregating or transforming data with a powerful group-by engine that allows
split-apply-combine operations on data sets.
• High-performance merging and joining of data sets.
• Hierarchical axis indexing provides an intuitive way of working with high-dimensional data
in a lower-dimensional data structure.
• Time-series functionality: date range generation and frequency conversion, moving window
statistics, date shifting and lagging. You can even create domain-specific time offsets and
join time series without losing data.
• Highly optimized for performance, with critical code paths written in Cython or C.
• Python with pandas is in use in a wide variety of academic and commercial domains,
including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and
more.
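Three of the features above (split-apply-combine, joining, and time-series windows) can be sketched in a few lines. The tables, column names, and values below are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sales table and a lookup table sharing the "region" column.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [3, 5, 7, 1],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Asha", "Ravi"],
})

# Split-apply-combine: split by region, sum the units, combine into one table.
totals = sales.groupby("region", as_index=False)["units"].sum()

# Merge/join: attach the manager to each regional total.
report = totals.merge(managers, on="region", how="left")
print(report)

# Time series: daily date range, moving-window mean, and a one-day lag.
days = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0], index=days)
rolling_mean = daily.rolling(window=2).mean()
lagged = daily.shift(1)
print(rolling_mean)
print(lagged)
```

The first entry of both `rolling_mean` and `lagged` is NaN, because there is no earlier observation to average with or to shift in.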

Working of pandas:
After pandas has been installed on the system, you need to import the library. This
module is generally imported as:
import pandas as pd
Here, pd is an alias for pandas. Importing the library under an alias is not required;
it simply reduces the amount of code you have to write each time a method or
property is called.
Pandas provides two main data structures for manipulating data. They are:
• Series
• DataFrame

1. Series -
A Pandas Series is a one-dimensional labelled array capable of holding data of any type (integer,
string, float, Python objects, etc.). The axis labels are collectively called the index. A Series
can be thought of as a single column in an Excel sheet. Labels need not be unique but must be of
a hashable type. The object supports both integer- and label-based indexing and provides a host
of methods for performing operations involving the index.
Example
import pandas as pd
import numpy as np

# Create an empty Series; the dtype is given explicitly because the
# default dtype of an empty Series differs between pandas versions
ser = pd.Series(dtype="float64")
print(ser)

# Create a Series from a NumPy array of characters
data = np.array(['p', 'y', 't', 'h', 'o', 'n'])
ser = pd.Series(data)
print(ser)
Output
Series([], dtype: float64)
0    p
1    y
2    t
3    h
4    o
5    n
dtype: object
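The example above uses the default integer index. To illustrate the integer- and label-based indexing mentioned in the description, here is a small sketch with a custom index; the labels and prices are hypothetical values chosen for the example.

```python
import pandas as pd

# Hypothetical prices labelled by company name instead of 0..n-1.
prices = pd.Series([5348, 41315, 13950], index=["toyota", "bmw", "audi"])

by_label = prices.loc["bmw"]          # label-based lookup
by_position = prices.iloc[0]          # position-based (integer) lookup
top_label = prices.idxmax()           # index label of the largest value
expensive = prices[prices > 10000]    # boolean filtering keeps the labels

print(by_label, by_position, top_label)
print(expensive)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup explicit, which avoids ambiguity when the index labels are themselves integers.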

2. DataFrame -
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labelled axes (rows and columns); that is, data is aligned in a tabular fashion
in rows and columns. A DataFrame consists of three principal components: the data, the rows,
and the columns.
Example
import pandas as pd
# Calling DataFrame constructor
df = pd.DataFrame()
print(df)
# list of strings
lst = ['Python', 'programming', 'lab']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
Output
Empty DataFrame
Columns: []
Index: []
             0
0       Python
1  programming
2          lab
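Besides a plain list, a DataFrame can be built from other Python containers. The sketch below shows two common constructions and the size mutability described above; the column names and values are hypothetical.

```python
import pandas as pd

# From a dict of columns: each key becomes a column name.
df = pd.DataFrame({"company": ["toyota", "bmw"], "price": [5348, 41315]})

# From a list of row dicts: each dict becomes one row.
extra = pd.DataFrame([{"company": "audi", "price": 13950}])

# Size mutability: append rows, add a derived column, then drop it again.
cars = pd.concat([df, extra], ignore_index=True)
cars["price_thousands"] = cars["price"] / 1000
with_extra_column = cars.shape
cars = cars.drop(columns=["price_thousands"])

print(cars)
print(cars.dtypes)   # columns may hold different types (heterogeneous data)
print(cars.shape)
```

`ignore_index=True` renumbers the rows after concatenation, so the combined table has a clean 0..n-1 index rather than repeated labels.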

ALGORITHM:

FLOWCHART:


CODE:
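One possible solution to the operations listed in the Aim can be sketched as follows. The dataset itself is not included in this document, so everything about it here is an assumption: the file name `cars.csv`, the columns `company`, `body-style`, and `price`, the sample rows, and the use of "?" and "n.a" as missing-value markers are hypothetical stand-ins. "Cleaning" is interpreted as dropping rows without a price and removing duplicate rows.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for the lab's CSV file; the real
# file name, column names, and missing-value markers may differ.
SAMPLE_CSV = """company,body-style,price
toyota,hatchback,5348
toyota,sedan,?
bmw,sedan,41315
audi,sedan,13950
mercedes-benz,hardtop,45400
toyota,wagon,6938
audi,wagon,n.a
"""


def clean_cars(df):
    """Drop rows without a price and remove exact duplicate rows."""
    return df.dropna(subset=["price"]).drop_duplicates().reset_index(drop=True)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "cars.csv"  # hypothetical file name
    csv_path.write_text(SAMPLE_CSV)

    # Read the dataset, treating "?" and "n.a" as missing values.
    cars = pd.read_csv(csv_path, na_values=["?", "n.a"])

    # 1. Print the first and last five rows.
    print(cars.head())
    print(cars.tail())

    # 2. Clean the dataset and update the CSV file.
    cleaned = clean_cars(cars)
    cleaned.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

# 3. Company of the single most expensive car.
top_company = reloaded.loc[reloaded["price"].idxmax(), "company"]
print(top_company)

# 4. Details of all Toyota cars.
toyota_cars = reloaded[reloaded["company"] == "toyota"]
print(toyota_cars)

# 5. Total number of cars per company.
cars_per_company = reloaded["company"].value_counts()
print(cars_per_company)
```

With a real dataset, the temporary directory would be replaced by the path to the provided CSV file, and the cleaning rule should be adapted to whatever defects that file actually contains.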

INPUT & OUTPUT (With Different Test Cases):


CONCLUSION:

DISCUSSION AND VIVA VOCE:


Q. 1) What are the features of the pandas library?
Q. 2) Mention the different types of data structures in pandas.
Q. 3) Describe the different ways a DataFrame can be created in pandas.
Q. 4) Define a Series in the pandas library.
Q. 5) What kinds of tasks can be performed with pandas?

REFERENCE:
1. https://www.geeksforgeeks.org/introduction-to-pandas-in-python/
2. https://www.youtube.com/watch?v=UB3DE5Bgfx4
3. https://www.youtube.com/watch?v=PfVxFV1ZPnk
