CSL-410-L16

Uploaded by

rpschauhan2003
Copyright
© All Rights Reserved

Program: B.Tech (CSE), IV Semester, II Year

CSL-410: Data Science using Python


Unit No. 2
Pandas: I/O Tools

Lecture No. 16

Dr. Sanjay Jain


Associate Professor, CSA/SOET
Outline
• Introduction
• Read and write CSV files
• Read and write Excel files
• Examples
• References
Student Effective Learning Outcomes (SELO)
01: Ability to understand subject-related concepts clearly, along with
contemporary issues.
02: Ability to use up-to-date tools, techniques, and skills for effective
domain-specific practice.
03: Understanding of the available tools and products, and the ability to use
them effectively.
Introduction
• The Pandas I/O API is a set of top-level reader functions, accessed like
pd.read_csv(), that generally return a Pandas object.
• The two workhorse functions for reading text files (flat files) are
read_csv() and read_table(). Both use the same parsing code to convert
tabular data into a DataFrame object; they differ in the default separator
(a comma for read_csv(), a tab for read_table()). Commonly used parameters
of read_csv() include:
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
names=None, index_col=None, usecols=None, ...)
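
As a minimal sketch of the shared parser, the snippet below reads the same text with both functions. The in-memory string csv_text is a hypothetical stand-in for a file on disk, and sep=',' is passed to read_table() because its default separator is a tab.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for a CSV file on disk.
csv_text = "S.No,Name,Age\n1,Tom,28\n2,Lee,32\n"

# read_csv() uses a comma as its default separator.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table() defaults to a tab separator, so the comma is passed explicitly.
df_table = pd.read_table(io.StringIO(csv_text), sep=",")

# Both calls go through the same parser and yield the same DataFrame.
print(df_csv.equals(df_table))
```

Wrapping the text in io.StringIO works because filepath_or_buffer accepts any file-like object as well as a path.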

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• read_csv() reads data from a CSV file and creates a DataFrame object.
import pandas as pd
df = pd.read_csv("temp.csv")
print(df)
• Output:
   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900

<SELO: 1> <Reference No.: R1,R4>


read_csv()
custom index
• The index_col argument specifies a column in the CSV file to use as the
row index instead of the default integer index.
import pandas as pd
df = pd.read_csv("temp.csv", index_col=['S.No'])
print(df)
• Output:
        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
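
One consequence of a custom index is that rows can be looked up by their S.No value. A minimal sketch, assuming inline data that mirrors temp.csv (the csv_text string is illustrative, not part of the original slides):

```python
import io

import pandas as pd

# Inline copy of the sample data, assumed to mirror temp.csv.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="S.No")

# S.No is now the index, not a regular column.
print(df.index.name)

# .loc selects by index label, here the S.No value 3.
print(df.loc[3, "Name"])
```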

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• dtype: the data type of one or more columns can be passed as a dict.
import numpy as np
import pandas as pd
df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
print(df.dtypes)
• Output:
S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object
• Note: By default, the dtype of the Salary column is int64, but the result
shows float64 because we explicitly cast the type.
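
read_csv() also has a separate converters argument, which is not the same as dtype: dtype casts a whole column to a type, while converters calls a function on each raw string value of a column. A minimal sketch, assuming inline data in place of temp.csv and a hypothetical helper named to_thousands:

```python
import io

import pandas as pd

# Inline copy of the sample data, assumed to mirror temp.csv.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""


def to_thousands(value):
    """Hypothetical helper: convert a raw salary string to thousands."""
    return float(value) / 1000


df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Age": "float64"},            # cast the Age column to float64
    converters={"Salary": to_thousands}  # transform each raw Salary string
)

print(df[["Age", "Salary"]])
```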
<SELO: 1> <Reference No.: R1,R4>
read_csv()
• header_names: Specify the column names using the names argument.
import pandas as pd
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
print(df)

• Output:
      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32   HongKong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• header_names: Observe that the custom names have been applied, but the
header row from the file has not been eliminated; it now appears as the
first row of data. We use the header argument to remove it: header=0
tells read_csv() that row 0 of the file is the header, which the custom
names then replace. If the header is in a row other than the first, pass
that row number to header; the preceding rows are skipped.
import pandas as pd
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
print(df)
• Output:
   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
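
The effect of header=0 can be checked by comparing row counts with and without it. A minimal sketch on inline data assumed to mirror temp.csv:

```python
import io

import pandas as pd

# Inline copy of the sample data, assumed to mirror temp.csv.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""
new_names = ["a", "b", "c", "d", "e"]

# Without header=0 the file's header line is kept as an ordinary data row.
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

# With header=0 the file's header line is consumed and replaced by new_names.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

print(len(kept), len(replaced))
```

Keeping the header line as data also forces every column to hold strings, so numeric columns such as c are only parsed as integers in the second call.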

<SELO: 1> <Reference No.: R1,R4>


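read_csv()
• Skiprows: skiprows skips the specified number of rows from the start of
the file. Here the header row and the first data row are skipped, so the
next row is taken as the header.
import pandas as pd
df = pd.read_csv("temp.csv", skiprows=2)
print(df)
• Output:
   2     Lee  32   HongKong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900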
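
skiprows also accepts a list of 0-based line numbers, which makes it possible to skip data rows while keeping the header. A minimal sketch on inline data assumed to mirror temp.csv:

```python
import io

import pandas as pd

# Inline copy of the sample data, assumed to mirror temp.csv.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Line 0 is the header; lines 1 and 2 are the first two data rows.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

print(df)
```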

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• head(): returns the first n rows of a DataFrame (5 by default).
• Example:
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
print(tips.head())
• Output:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
<SELO: 1> <Reference No.: R1,R4>
read_csv()
• Column Selection: pass a list of column names to select a subset of columns.
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
print(tips[['total_bill', 'tip', 'smoker', 'time']].head(5))
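• Output:
   total_bill   tip smoker    time
0       16.99  1.01     No  Dinner
1       10.34  1.66     No  Dinner
2       21.01  3.50     No  Dinner
3       23.68  3.31     No  Dinner
4       24.59  3.61     No  Dinner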

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• Filtering: DataFrames can be filtered in multiple ways; the most
intuitive is Boolean indexing.
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
print(tips[tips['time'] == 'Dinner'].head(5))
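• Output:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4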

<SELO: 1> <Reference No.: R1,R4>
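
Boolean indexing works by building a True/False mask with one entry per row, and several conditions can be combined with & (and) or | (or). A minimal sketch, using a small inline frame built from the first five rows of the tips data so that it runs without network access:

```python
import pandas as pd

# First five rows of the tips data, reduced to the columns used below.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Each comparison yields a Boolean Series; parentheses are required around
# each condition because & binds more tightly than == and >.
mask = (tips["sex"] == "Male") & (tips["tip"] > 3)

# Indexing with the mask keeps only the rows where it is True.
result = tips[mask]
print(result)
```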


read_csv()
• Group By: groupby() splits the records into groups that share a value,
and size() returns the number of records in each group. For instance,
the number of tips left by each sex:
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
print(tips.groupby('sex').size())
• Output:
sex
Female     87
Male      157
dtype: int64
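
The same grouping can feed other aggregations besides a count. A minimal sketch on a small inline frame built from the first five rows of the tips data (so the numbers differ from the full-dataset output above):

```python
import pandas as pd

# First five rows of the tips data, reduced to the columns used below.
tips = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

grouped = tips.groupby("sex")

# size() counts the rows in each group.
counts = grouped.size()

# Selecting a column and calling mean() averages it within each group.
mean_tip = grouped["tip"].mean()

print(counts)
print(mean_tip)
```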

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• head(): Top N rows
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
tips = tips[['smoker', 'day', 'time']].head(5)
print(tips)
• Output:
  smoker  day    time
0     No  Sun  Dinner
1     No  Sun  Dinner
2     No  Sun  Dinner
3     No  Sun  Dinner
4     No  Sun  Dinner

<SELO: 1> <Reference No.: R1,R4>


read_csv()
• tail(): Bottom N rows
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
tips = tips[['smoker', 'day', 'time']].tail(5)
print(tips)
• Output:

<SELO: 1> <Reference No.: R1,R4>


Writing CSV Files with to_csv()
• Writing a CSV file with Pandas takes one more step than reading one,
but it is still simple: first create a Pandas DataFrame, then write that
DataFrame to a CSV file with its to_csv() method.
• Example:
import pandas as pd
city = pd.DataFrame([['Sacramento', 'California'], ['Miami', 'Florida']],
                    columns=['City', 'State'])
city.to_csv('city.csv')
• In the above example, we created a DataFrame named city and then wrote
it to a file named "city.csv" using the to_csv() method.

<SELO: 1> <Reference No.: R1,R4>
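
By default to_csv() also writes the row index as an unnamed first column. A minimal sketch of the difference that index=False makes; calling to_csv() without a path returns the CSV text as a string, which is used here to avoid touching the disk:

```python
import io

import pandas as pd

city = pd.DataFrame([["Sacramento", "California"], ["Miami", "Florida"]],
                    columns=["City", "State"])

# With no path argument, to_csv() returns the CSV text instead of writing it.
with_index = city.to_csv()
without_index = city.to_csv(index=False)

# Compare the header lines of the two variants.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the index-free text back recovers the original columns.
reloaded = pd.read_csv(io.StringIO(without_index))
print(list(reloaded.columns))
```

If the index is written, reading the file back yields an extra column unless index_col=0 is passed to read_csv().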


to_excel()
• The to_excel() method stores the data as an Excel file.
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips = pd.read_csv(url)
tips.to_excel("tips.xlsx", sheet_name="customer", index=False)

• In the example here, the sheet is named customer instead of the default
Sheet1. Setting index=False means the row index labels are not saved in
the spreadsheet.

<SELO: 1> <Reference No.: R1,R4>


read_excel()
• read_excel() will reload the data into a DataFrame:
import pandas as pd
tips = pd.read_excel("tips.xlsx", sheet_name="customer")
print(tips.head())
• Output:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
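
Reading and writing .xlsx files relies on an optional Excel engine that is installed separately from Pandas. A minimal round-trip sketch, assuming the openpyxl engine and using an in-memory buffer and a three-row inline frame instead of tips.xlsx; the ImportError branch covers the case where openpyxl is not available:

```python
import io

import pandas as pd

# Small inline frame standing in for the tips data.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
})

try:
    # Write the sheet to an in-memory buffer rather than a file on disk.
    buffer = io.BytesIO()
    tips.to_excel(buffer, sheet_name="customer", index=False, engine="openpyxl")

    # Rewind the buffer and read the same sheet back.
    buffer.seek(0)
    reloaded = pd.read_excel(buffer, sheet_name="customer", engine="openpyxl")
    print(reloaded)
except ImportError:
    # Pandas raises ImportError when the optional engine is missing.
    reloaded = None
    print("openpyxl is not installed; skipping the Excel round trip")
```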

<SELO: 1> <Reference No.: R1,R4>


Learning Outcomes

Students have learned and understood the following:

• Introduction
• Read and write CSV files
• Read and write Excel files
• Examples
References

1. Data Science with Python by Aaron England, Mohamed Noordeen Alaudeen,
and Rohan Chopra. Packt Publishing, July 2019.
2. https://intellipaat.com/blog/what-is-data-science/
3. https://onlinecourses.nptel.ac.in/noc20_cs36/
Thank you
