Practical No. 7
Aim: Develop an application which can read data from a given dataset.
The dataset contains information regarding cars manufactured by
various companies. Ensure your application performs the following operations:
● Print the first and last five rows.
● Clean the dataset and update the CSV file.
● Find the name of the company with the most expensive car.
● Print the details of all Toyota cars.
● Count the total number of cars per company.
Name of Student:
Roll No.:
Semester/Year:
Academic Session:
Date of Performance:
Date of Submission:
Software Requirement
● Operating System – Windows 7 and Ubuntu
● Packages used – Python 3, NumPy, Pandas, Django
● IDE – Visual Studio, PyCharm
● Editors – Text editor, Sublime Text
● Online platform – Jupyter, Google Colab
THEORY:
Pandas is an open-source Python package that is widely used for data science, data
analysis, and machine learning tasks. It is built on top of another package named NumPy, which
provides support for multi-dimensional arrays. As one of the most popular data wrangling
packages, Pandas works well with many other data science modules in the Python
ecosystem. It is not part of the Python standard library, so it usually has to be installed
separately (for example with pip), although some vendor distributions such as ActiveState's
ActivePython include it.
Pandas makes it simple to do many of the time consuming, repetitive tasks associated
with working with data, including:
⮚ Data cleansing
⮚ Data fill
⮚ Data normalization
⮚ Merges and joins
⮚ Data visualization
⮚ Statistical analysis
⮚ Data inspection
⮚ Loading and saving data
⮚ And much more
This breadth is why many data scientists regard Pandas as one of the best data analysis
and manipulation tools available.
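A few of the tasks listed above (cleansing, fill, normalization, and inspection) can be sketched on a small table. This is only an illustration: the sample values and the column names `company` and `price` are assumptions and are not taken from the practical's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
cars = pd.DataFrame({
    "company": ["audi", "bmw", "bmw", "toyota", "toyota"],
    "price": [13950.0, 16430.0, 16430.0, np.nan, 5348.0],
})

# Data cleansing: drop exact duplicate rows (the second bmw row).
cars = cars.drop_duplicates().reset_index(drop=True)

# Data fill: replace the missing price with the mean of the known prices.
cars["price"] = cars["price"].fillna(cars["price"].mean())

# Data normalization: min-max scale price into the range [0, 1].
price_min, price_max = cars["price"].min(), cars["price"].max()
cars["price_scaled"] = (cars["price"] - price_min) / (price_max - price_min)

# Data inspection: summary statistics for the numeric columns.
print(cars.describe())
```

Filling with the mean is just one possible choice; dropping the incomplete row or filling with the median may suit a given dataset better.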
Features of pandas:
• A fast and efficient DataFrame object for data manipulation with integrated indexing.
• Tools for reading and writing data between in-memory data structures and different formats:
CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format.
• Intelligent data alignment and integrated handling of missing data: gain automatic label-
based alignment in computations and easily manipulate messy data into an orderly form.
• Flexible reshaping and pivoting of data sets.
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
• Columns can be inserted and deleted from data structures for size mutability.
• Aggregating or transforming data with a powerful group-by engine that allows split-apply-
combine operations on data sets.
• High performance merging and joining of data sets.
• Hierarchical axis indexing provides an intuitive way of working with high-dimensional data
in a lower-dimensional data structure.
• Time series functionality: date range generation and frequency conversion, moving window
statistics, date shifting and lagging, plus domain-specific time offsets and joins of time
series without losing data.
• Highly optimized for performance, with critical code paths written in Cython or C.
• Python with pandas is in use in a wide variety of academic and commercial domains,
including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and
more.
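Two of the features listed above, the group-by engine and merging, can be illustrated with a short sketch. The tables `cars` and `origins`, their column names, and their values are made up for this example.

```python
import pandas as pd

# Hypothetical sample data; names and values are made up for illustration.
cars = pd.DataFrame({
    "company": ["audi", "audi", "bmw", "toyota", "toyota", "toyota"],
    "price": [13950, 17450, 16430, 5348, 6338, 6488],
})
origins = pd.DataFrame({
    "company": ["audi", "bmw", "toyota"],
    "country": ["Germany", "Germany", "Japan"],
})

# Split-apply-combine: split rows by company, apply aggregations,
# and combine the results into one summary table.
summary = cars.groupby("company")["price"].agg(["count", "mean", "max"])
print(summary)

# Merge/join: attach the country of each company, similar to a SQL left join.
merged = cars.merge(origins, on="company", how="left")
print(merged.head())
```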
Working of pandas:
After Pandas has been installed on the system, you need to import the library. This
module is generally imported as:
import pandas as pd
Here, pd is an alias for Pandas. It is not necessary to import the library using the
alias; it just reduces the amount of code written every time a method or property is
called.
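As a small illustration of the point about the alias, the sketch below imports the module under both names and checks that they refer to the same object. The variable names are chosen only for this example.

```python
import pandas            # full module name
import pandas as pd      # the same module under the conventional alias

# Both names refer to the same module object.
same_module = pd is pandas
print(same_module)

# So these two calls are equivalent; the alias only saves typing.
s1 = pandas.Series([1, 2, 3])
s2 = pd.Series([1, 2, 3])
print(s1.equals(s2))
```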
Pandas provides two main data structures for manipulating data. They are:
• Series
• DataFrame
1. Series -
A Pandas Series is a one-dimensional labelled array capable of holding data of any type (integer,
string, float, Python objects, etc.). The axis labels are collectively called the index. A Series
can be thought of as a single column in an Excel sheet. Labels need not be unique but must be of
a hashable type. The object supports both integer and label-based indexing and provides a host of
methods for performing operations involving the index.
Example
import pandas as pd
import numpy as np
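# Creating an empty Series (an explicit dtype avoids a version-dependent default)
ser = pd.Series(dtype="float64")
print(ser)
# Creating a Series from a simple NumPy array
data = np.array(['p', 'y', 't', 'h', 'o', 'n'])
ser = pd.Series(data)
print(ser)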
Output
Series([], dtype: float64)
0    p
1    y
2    t
3    h
4    o
5    n
dtype: object
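The Series description also mentions non-unique labels and both integer and label-based indexing. A minimal sketch of those points follows; the labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical prices labelled by company; "toyota" appears twice on purpose.
prices = pd.Series(
    [13950, 16430, 5348, 6338],
    index=["audi", "bmw", "toyota", "toyota"],
)

# Label-based indexing with .loc.
print(prices.loc["audi"])

# A repeated label returns every matching entry as a Series.
print(prices.loc["toyota"])

# Integer position-based indexing with .iloc.
print(prices.iloc[0])

# An index-aware method: the label of the largest value.
print(prices.idxmax())
```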
2. DataFrame -
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labelled axes (rows and columns); that is, data is aligned in a tabular fashion
in rows and columns. A DataFrame consists of three principal components: the data, the rows,
and the columns.
Example
import pandas as pd
# Calling DataFrame constructor
df = pd.DataFrame()
print(df)
# list of strings
lst = ['Python', 'programming', 'lab']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
Output
Empty DataFrame
Columns: []
Index: []
             0
0       Python
1  programming
2          lab
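The three components of a DataFrame and its size mutability can also be shown directly. The sketch below uses a hypothetical table built from a dict of columns; the row labels "a", "b", "c" and the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data built from a dict of columns.
df = pd.DataFrame(
    {"company": ["audi", "bmw", "toyota"], "price": [13950, 16430, 5348]},
    index=["a", "b", "c"],
)

# The three components: row labels, column labels, and the underlying data.
print(df.index.tolist())
print(df.columns.tolist())
print(df.to_numpy())

# Size mutability: add a column, then delete it again.
df["price_thousands"] = df["price"] / 1000
print(df.shape)
df = df.drop(columns="price_thousands")
print(df.shape)
```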
ALGORITHM:
FLOWCHART:
CODE:
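One possible starting point for the five operations listed in the aim is sketched below. It is not the official solution: the inline sample data, the column names `company`, `body-style` and `price`, and the "?" and "n.a" missing-value markers are all assumptions standing in for the real dataset, and the file is written to a temporary path so that the sketch is self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the practical's dataset: the column names and the
# "?" / "n.a" missing-value markers are assumptions, not taken from the real file.
raw_csv = """company,body-style,price
audi,sedan,13950
audi,wagon,?
bmw,sedan,16430
bmw,sedan,36880
toyota,hatchback,5348
toyota,hatchback,6338
toyota,wagon,n.a
volvo,sedan,12940
"""
csv_path = os.path.join(tempfile.mkdtemp(), "cars.csv")
with open(csv_path, "w") as f:
    f.write(raw_csv)

# Read the data, treating the assumed markers as missing values.
df = pd.read_csv(csv_path, na_values=["?", "n.a"])

# 1. First and last five rows.
print(df.head())
print(df.tail())

# 2. Clean: drop rows without a price, then write the result back to the CSV.
df = df.dropna(subset=["price"]).reset_index(drop=True)
df.to_csv(csv_path, index=False)

# 3. Company of the most expensive car.
top_company = df.loc[df["price"].idxmax(), "company"]
print(top_company)

# 4. All Toyota rows.
toyota = df[df["company"] == "toyota"]
print(toyota)

# 5. Number of cars per company.
counts = df["company"].value_counts()
print(counts)
```

Dropping incomplete rows is only one way to clean the data; filling the missing prices with `fillna` is an alternative when the rows should be kept.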
CONCLUSION:
REFERENCE:
1. https://www.geeksforgeeks.org/introduction-to-pandas-in-python/
2. https://www.youtube.com/watch?v=UB3DE5Bgfx4
3. https://www.youtube.com/watch?v=PfVxFV1ZPnk