
011b IS3A38 PD - Ch2b

The document discusses various methods for exporting and importing Pandas DataFrames and Series. These include pickle, CSV, Excel, and Feather formats. It provides code examples for reading and writing data in each of these formats.


(PD) CH. 02
Dr. Norita Ahmad
School of Business Administration
American University of Sharjah
Spring 2024

ISA 383 PYTHON FOR BUSINESS

(PD) CH02b: Pandas

1. Exporting and Importing data


✤ Pickle
✤ CSV
✤ Excel
✤ Feather format

2 Dr Norita Ahmad :: ISA 383
The DataFrame

Exporting Data

๏ It is common practice to export or save data sets while processing them.
๏ Data sets are saved either as final, cleaned versions of the data or at intermediate steps.
๏ Both of these outputs can be used for analysis or as input to another part of the data processing pipeline.

Pickle
๏ Pickle is Python’s way of serializing and saving data in a binary format.
๏ If the object is an intermediate step in a set of calculations, or if the data will not be used outside of Python, then saving it as a pickle is efficient both for Python and in terms of disk storage space.
๏ However, this approach means that people who do not use Python will not be able to read the data.

The pickle files are saved with an extension of .p, .pkl, or .pickle.
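A minimal sketch of writing pickles, assuming a small hypothetical DataFrame; the file names and the temporary folder are illustrative only. Both Series and DataFrame provide a to_pickle method:

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data; any Series or DataFrame can be pickled
people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})

# Write into a temporary folder so the example does not clutter the working directory
out_dir = tempfile.mkdtemp()
series_path = os.path.join(out_dir, "ages.pickle")  # hypothetical file name
frame_path = os.path.join(out_dir, "people.pkl")    # hypothetical file name

people["age"].to_pickle(series_path)  # a single column (Series)
people.to_pickle(frame_path)          # the whole DataFrame

print(sorted(os.listdir(out_dir)))  # ['ages.pickle', 'people.pkl']
```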

Reading pickle data: Series

๏ To read in pickle data, we can use the pd.read_pickle function.
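A sketch of the Series round trip, assuming a hypothetical Series that is pickled first so the example runs on its own. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# Hypothetical Series, pickled first so the example is self-contained
ages = pd.Series([36, 85], index=["Ada", "Grace"], name="age")
path = os.path.join(tempfile.mkdtemp(), "ages.pickle")
ages.to_pickle(path)

# pd.read_pickle returns the same kind of object that was saved
ages_loaded = pd.read_pickle(path)
print(type(ages_loaded).__name__)  # Series
print(ages_loaded.equals(ages))    # True: values, index and dtype survive
```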



Reading pickle data: DataFrame

๏ To read in pickle data, we can use the pd.read_pickle function.
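The same function reads a pickled DataFrame. This sketch, again with hypothetical data and file names, also shows that to_pickle and pd.read_pickle infer gzip compression from a .gz extension (their default is compression="infer"):

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrame, pickled first so the example is self-contained
people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})
out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, "people.pkl")
gzip_path = os.path.join(out_dir, "people.pkl.gz")

people.to_pickle(plain_path)
people.to_pickle(gzip_path)  # compression="infer" picks gzip from the .gz extension

people_loaded = pd.read_pickle(plain_path)
people_from_gzip = pd.read_pickle(gzip_path)  # decompression is inferred the same way
print(people_loaded.equals(people), people_from_gzip.equals(people))  # True True
```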



CSV

๏ Comma-separated values (CSV) files are the most flexible data storage type. In each row, the column values are separated by commas.
➡ The comma is not the only type of delimiter, however. Some files are delimited by a tab (TSV) or even a semicolon.
➡ The main reason CSVs are a preferred data format when collaborating and sharing data is that almost any program can open them. They can even be opened in a text editor.
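A minimal sketch of the delimiter point, assuming a small hypothetical DataFrame: the sep parameter of to_csv and pd.read_csv selects the delimiter, and writing to a string (no file path) keeps the example in memory.

```python
import io

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})

# With no path, to_csv returns the text instead of writing a file
tsv_text = people.to_csv(sep="\t", index=False)       # tab-delimited (TSV)
semicolon_text = people.to_csv(sep=";", index=False)  # semicolon-delimited
print(semicolon_text)

# Read with the same delimiter that was used to write
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")
from_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
print(from_tsv.equals(people), from_semicolon.equals(people))  # True True
```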

Saving a CSV file

๏ The Series and DataFrame have a to_csv method to write a CSV file.
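A sketch of writing both kinds of object to disk, assuming hypothetical data and file names in a temporary folder:

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})
out_dir = tempfile.mkdtemp()
people_path = os.path.join(out_dir, "people.csv")  # hypothetical file name
ages_path = os.path.join(out_dir, "ages.csv")      # hypothetical file name

people.to_csv(people_path)       # DataFrame.to_csv
people["age"].to_csv(ages_path)  # Series.to_csv works the same way

# A CSV file is plain text, so it can be read like any text file
with open(people_path) as f:
    print(f.read())
```

The printed file starts with `,name,age`, followed by rows such as `0,Ada,36`: the leading 0 and 1 are the row labels discussed next.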

Removing Row numbers from output

๏ If you open the CSV or TSV file created, you will notice that the first “column” looks like the row number of the DataFrame. It is actually the DataFrame’s index (its row labels), which to_csv writes by default.


๏ Set index=False when writing the file to exclude the additional “column” that looks like the row number.
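A sketch comparing the two outputs in memory, assuming a hypothetical DataFrame. Reading the default output back shows why the extra column is a nuisance, and index_col=0 is one way to turn it back into the index:

```python
import io

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})

with_index = people.to_csv()                # index=True is the default
without_index = people.to_csv(index=False)  # drop the row labels
print(with_index.splitlines()[0])     # ,name,age
print(without_index.splitlines()[0])  # name,age

# Reading the first version back turns the unnamed index into an extra column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())     # ['Unnamed: 0', 'name', 'age']
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())  # ['name', 'age']

# Alternatively, index_col=0 uses the first column as the index again
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
```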

Excel

๏ Probably the most commonly used data type (or the second most commonly used, after CSVs).
➡ Excel has a bad reputation within the data science community.
๏ In current pandas versions, the Series also has a to_excel method. If you have a Series that needs to be exported to an Excel file, another option is to convert the Series into a one-column DataFrame (for example, with to_frame) and export that.
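A sketch of the one-column-DataFrame route, assuming a hypothetical Series. Writing .xlsx files needs an Excel engine such as openpyxl, so in this sketch the export step is simply skipped when openpyxl is not installed:

```python
import importlib.util
import os
import tempfile

import pandas as pd

ages = pd.Series([36, 85], index=["Ada", "Grace"], name="age")

# to_frame turns the Series into a one-column DataFrame; the Series name becomes the column label
ages_df = ages.to_frame()
print(ages_df.columns.tolist())  # ['age']

# Export only if the openpyxl engine is available
xlsx_path = None
if importlib.util.find_spec("openpyxl") is not None:
    xlsx_path = os.path.join(tempfile.mkdtemp(), "ages.xlsx")  # hypothetical file name
    ages_df.to_excel(xlsx_path, sheet_name="ages")
```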

Excel: Series

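You might get a warning that “the xlwt engine will be removed in a future version of pandas.” The xlwt engine, which writes the legacy .xls format, was removed in pandas 2.0, so save to .xlsx (written with openpyxl) instead.

If you get an error importing xlwt and/or openpyxl, both can be installed with pip:
$ pip install xlwt
$ pip install openpyxl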



Excel: DataFrame

๏ You can simply use the to_excel method to export a DataFrame to an Excel file.
๏ There are several ways to further fine-tune the output of the Excel file.
➡ Refer to the DataFrame.to_excel documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
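One common fine-tuning pattern, sketched under the same openpyxl assumption: pd.ExcelWriter lets several DataFrames share one workbook, while sheet_name and index control where each one lands and whether row labels are written. The sheet and file names are hypothetical.

```python
import importlib.util
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})

xlsx_path = None
if importlib.util.find_spec("openpyxl") is not None:
    xlsx_path = os.path.join(tempfile.mkdtemp(), "report.xlsx")  # hypothetical file name
    # One workbook, two sheets
    with pd.ExcelWriter(xlsx_path, engine="openpyxl") as writer:
        people.to_excel(writer, sheet_name="people", index=False)  # no row-number column
        people.describe().to_excel(writer, sheet_name="summary")   # keep the statistic labels
```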

Read Excel
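A sketch of reading a workbook back with pd.read_excel, assuming openpyxl is installed (skipped otherwise) and a hypothetical file that the example writes first. The sheet_name parameter selects one sheet by name or position, or every sheet when set to None:

```python
import importlib.util
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})

people_back = None
all_sheets = None
if importlib.util.find_spec("openpyxl") is not None:
    xlsx_path = os.path.join(tempfile.mkdtemp(), "people.xlsx")  # hypothetical file name
    people.to_excel(xlsx_path, sheet_name="people", index=False)

    # sheet_name picks one sheet by name (a position such as 0 also works)
    people_back = pd.read_excel(xlsx_path, sheet_name="people")
    # sheet_name=None reads every sheet into a dict keyed by sheet name
    all_sheets = pd.read_excel(xlsx_path, sheet_name=None)
    print(people_back)
```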



Feather format to interface with R

๏ The format called “feather” is used to save a binary object that can also be loaded into the R language. The main benefit of this approach is that it is faster than writing and reading a CSV file between the languages.
๏ The general rule of thumb for using this data format is to use it only as an intermediate data format, and not to use the feather format for long-term storage.
➡ That is, use it in your code only to pass data into R; do not use it to save a final version of your data.
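A minimal sketch of the Python side, assuming the pyarrow package (which pandas uses for feather I/O) is installed; the step is skipped otherwise. The DataFrame keeps its default integer index. On the R side, the arrow package's read_feather function is one way to load the file; that call appears only as a comment and is not run here.

```python
import importlib.util
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})  # default integer index

people_back = None
if importlib.util.find_spec("pyarrow") is not None:  # feather I/O in pandas goes through pyarrow
    feather_path = os.path.join(tempfile.mkdtemp(), "people.feather")  # hypothetical file name
    people.to_feather(feather_path)
    # R side (not run here), assuming the arrow package: arrow::read_feather("people.feather")
    people_back = pd.read_feather(feather_path)
    print(people_back)
```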

Other data output types

Export Method   Description

to_clipboard    Save data into the system clipboard for pasting
to_dense        Convert data into a regular “dense” DataFrame (removed in pandas 1.0)
to_dict         Convert data into a Python dict
to_gbq          Convert data into a Google BigQuery table
to_hdf          Save data into a hierarchical data format (HDF)
to_msgpack      Save data into a portable JSON-like binary format (removed in pandas 1.0)
to_html         Convert data into an HTML table
to_json         Convert data into a JSON string
to_latex        Convert data into a LaTeX tabular environment
to_records      Convert data into a record array
to_string       Show DataFrame as a string for stdout
to_sparse       Convert data into a SparseDataFrame (removed in pandas 1.0)
to_sql          Save data into a SQL database
to_stata        Convert data into a Stata .dta file
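A short sketch of three methods from the table that need no extra packages: to_dict, to_json, and to_sql with Python's built-in sqlite3 module. The in-memory database and the table name "people" are hypothetical choices for illustration.

```python
import json
import sqlite3

import pandas as pd

people = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 85]})

# One dict per row
records = people.to_dict(orient="records")

# JSON text; orient="records" gives one JSON object per row
as_json = people.to_json(orient="records")
print(as_json)  # [{"name":"Ada","age":36},{"name":"Grace","age":85}]
parsed = json.loads(as_json)

# to_sql accepts a sqlite3 connection directly; other databases typically go through SQLAlchemy
con = sqlite3.connect(":memory:")
people.to_sql("people", con, index=False)
older = pd.read_sql_query("SELECT name FROM people WHERE age > 50", con)
con.close()
print(older)
```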
