DOC for refactored compression (GH14576) + BUG: bz2-compressed URL with C engine (GH14874) #14880

Closed
wants to merge 10 commits into from
DOC: Improve what's new
Reference corresponding issues in What's New.

Change code example to use string formatting for improved modularity.

Add what's new id
dhimmel committed Dec 15, 2016
commit 09dcbff6b3dc83df748b623786d4ef66fd78062c
23 changes: 16 additions & 7 deletions doc/source/whatsnew/v0.20.0.txt
@@ -64,15 +64,24 @@ Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now refere

df.groupby(['second', 'A']).sum()

Reading dataframes from URLs, in :func:`read_csv` or :func:`read_table`, now
supports additional compression methods (`xz`, `bz2`, `zip`). Previously, only
`gzip` compression was supported. By default, compression of URLs and paths are
now both inferred using their file extensions.
.. _whatsnew_0200.enhancements.compressed_urls:

.. ipython:: python
Better support for compressed URLs in ``read_csv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compression code was refactored (:issue:`12688`). As a result, reading
dataframes from URLs in :func:`read_csv` or :func:`read_table` now supports
Contributor

If any other issues were closed by this, please list them as well.

Contributor Author

Rechecked... they're all already listed.

additional compression methods: ``xz``, ``bz2``, and ``zip`` (:issue:`14570`).
Previously, only ``gzip`` compression was supported. By default, compression of
URLs and paths are now both inferred using their file extensions. Additionally,
Contributor

The compression code

paths are now inferred using (remove both)

Additionally, support for bz2 compression in the Python 2 C engine improved.

Contributor Author

Addressed comments 1 and 3 in e1b5d42. @jreback, I didn't change:

By default, compression of URLs and paths are now both inferred using their file extensions.

Previously, compression of paths was by default inferred from their extension, but not URLs. Now both are inferred by their extension. Am I missing something?
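To make the behavior under discussion concrete, here is a minimal sketch of extension-based inference applied uniformly to local paths and URLs. The mapping and the helper name `infer_compression_from_extension` are hypothetical, written only for illustration; they are not taken from the pandas source.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping for illustration only; not copied from pandas internals.
EXTENSION_TO_COMPRESSION = {
    '.gz': 'gzip',
    '.bz2': 'bz2',
    '.zip': 'zip',
    '.xz': 'xz',
}


def infer_compression_from_extension(path_or_url):
    """Return a compression name based on the file extension, or None."""
    # Keep only the path component so a query string or fragment on a URL
    # does not hide the extension.
    path = urlparse(path_or_url).path
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_TO_COMPRESSION.get(extension)


# The same rule is applied to a URL and to a local path.
print(infer_compression_from_extension('https://example.org/data/salaries.csv.bz2'))
print(infer_compression_from_extension('data/salaries.csv'))
```

The point of the sketch is only that one rule covers both kinds of input, which is what "both inferred" in the sentence above refers to.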

bz2 support for the python 2 c-engine improved (:issue:`14874`).

   url = ('https://github.com/pandas-dev/pandas/raw/master/' +
          'pandas/io/tests/parser/data/salaries.csv.bz2')
.. ipython:: python

   url = 'https://github.com/{repo}/raw/{branch}/{path}'.format(
       repo = 'pandas-dev/pandas',
       branch = 'master',
       path = 'pandas/io/tests/parser/data/salaries.csv.bz2',
   )
   df = pd.read_table(url, compression='infer')  # default, infer compression
   df = pd.read_table(url, compression='bz2')  # explicitly specify compression
   df.head(2)
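For anyone who wants to try the two calls without network access, the following is a rough local sketch. It assumes pandas is installed; the sample rows and the file name `sample.tsv.bz2` are made up for illustration and are not the contents of `salaries.csv.bz2`.

```python
import bz2
import os
import tempfile

import pandas as pd

# Made-up tab-separated sample data (assumption; not the real salaries file).
rows = "name\tsalary\nalice\t100\nbob\t200\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.tsv.bz2')

    # Write the sample as a bz2-compressed text file.
    with bz2.open(path, 'wt') as handle:
        handle.write(rows)

    # Compression is inferred from the ".bz2" extension.
    df_inferred = pd.read_table(path, compression='infer')
    # Compression is stated explicitly.
    df_explicit = pd.read_table(path, compression='bz2')

# Both calls should yield the same frame.
print(df_inferred.equals(df_explicit))
```

This only exercises a local path; the URL case shown in the diff additionally depends on the remote file being reachable.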