011b IS3A38 PD - Ch2b
Dr. Norita Ahmad
School of Business Administration
American University of Sharjah
Spring 2024
2 Dr Norita Ahmad :: ISA 383
The DataFrame
Exporting Data
The pickle files are saved with an extension of .p, .pkl, or .pickle.
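As a minimal sketch of saving and reloading a pickle file, the example below writes a small DataFrame to a .pkl file and reads it back. The DataFrame contents and the file name scores.pkl are made up for illustration, and a temporary directory is used only so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data, not taken from the course material.
df = pd.DataFrame({"name": ["Aisha", "Omar"], "score": [91, 84]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.pkl"   # any of .p, .pkl, .pickle would do
    df.to_pickle(path)                # write the binary pickle file
    restored = pd.read_pickle(path)   # load it back into a DataFrame

# The reloaded object should match the original exactly.
print(restored.equals(df))
```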
๏ Comma-separated values (CSV) are the most flexible data storage type. For
each row, the column information is separated with a comma.
➡ The comma is not the only type of delimiter, however. Some files are
delimited by a tab (TSV) or even a semicolon.
➡ The main reason CSVs are a preferred data format when collaborating
and sharing data is that almost any program can open this kind of
data structure. It can even be opened in a text editor.
๏ The Series and DataFrame have a to_csv method to write a CSV file.
๏ If you open the CSV or TSV file created, you will notice that the first
“column” looks like the row number of the DataFrame.
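The points above can be sketched with to_csv. The example data is hypothetical. Calling to_csv without a file path returns the text instead of writing a file, which makes the extra first "column" (the index) easy to see. Passing index=False leaves it out, and sep="\t" switches the delimiter to a tab.

```python
import pandas as pd

# Hypothetical example data, not taken from the course material.
df = pd.DataFrame({"name": ["Aisha", "Omar"], "score": [91, 84]})

# Default output: the row index is written as an unnamed first column.
csv_with_index = df.to_csv()

# index=False drops that first column from the output.
csv_text = df.to_csv(index=False)

# sep="\t" produces tab-separated values (TSV) instead of CSV.
tsv_text = df.to_csv(sep="\t", index=False)

print(csv_with_index)
print(csv_text)
print(tsv_text)
```

To write a file rather than return text, pass a path as the first argument, for example df.to_csv("scores.csv", index=False), where scores.csv is a placeholder name.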
๏ Excel is probably the most commonly used data type (or the second most
commonly used, after CSVs).
➡ Excel has a bad reputation within the data science community.
๏ Recent versions of pandas provide a to_excel method on the Series, but
older versions did not. If you have a Series that needs to be exported to an
Excel file, one option that works in either case is to convert the Series
into a one-column DataFrame.
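A minimal sketch of the conversion described above: to_frame turns a Series into a one-column DataFrame, which can then be exported. The Series values and the file name scores.xlsx are made up for illustration. Writing .xlsx files needs an optional Excel engine such as openpyxl, so the write is guarded in case it is not installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example Series; its name becomes the column name.
scores = pd.Series([90, 85, 77], name="score")

# Convert the Series into a one-column DataFrame.
scores_df = scores.to_frame()
print(scores_df.shape)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.xlsx"
    try:
        scores_df.to_excel(path)
        series_written = path.exists()
    except ImportError:
        # No Excel engine (e.g. openpyxl) is installed in this environment.
        series_written = False

print(series_written)
```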
Excel: Series
๏ You can simply use the to_excel method to export a DataFrame to an Excel file.
๏ There are several ways to further fine-tune the output of the Excel file.
➡ Refer to the DataFrame to Excel documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
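As one illustration of fine-tuning the output, the sketch below names the worksheet with sheet_name and leaves out the row index with index=False. The data, the sheet name, and the file name are hypothetical. An Excel engine such as openpyxl is assumed to be installed; if it is not, the guarded block simply skips the write.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data, not taken from the course material.
df = pd.DataFrame({"name": ["Aisha", "Omar"], "score": [91, 84]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.xlsx"
    try:
        # sheet_name sets the worksheet name; index=False omits row numbers.
        df.to_excel(path, sheet_name="scores", index=False)
        frame_written = path.exists()
    except ImportError:
        # No Excel engine (e.g. openpyxl) is installed in this environment.
        frame_written = False

print(frame_written)
```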
๏ The format called “feather” is used to save a binary object that can also be
loaded into the R language. The main benefit of this approach is that it is
faster than writing and reading a CSV file between the languages.
๏ The general rule of thumb for using this data format is to use it only as an
intermediate data format, and to not use the feather format for long-term
storage.
➡ That is, use it in your code only to pass data into R; do not use it to
save a final version of your data.
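A minimal sketch of the feather round trip on the Python side, using to_feather and read_feather. The data and the file name are hypothetical. pandas relies on the optional pyarrow package for this format, so the calls are guarded in case it is missing. On the R side, the file would typically be read with a feather reader such as the one in the arrow package.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data; to_feather expects a default integer index.
df = pd.DataFrame({"name": ["Aisha", "Omar"], "score": [91, 84]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.feather"
    try:
        df.to_feather(path)                   # write the binary feather file
        feather_df = pd.read_feather(path)    # read it back to check it
    except ImportError:
        # pyarrow is not installed in this environment.
        feather_df = None

print(feather_df)
```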