min_itemsize not working on MultiIndex columns for Series, with format="table" #11412

Closed
toobaz opened this issue Oct 22, 2015 · 3 comments
Labels
IO HDF5 read_hdf, HDFStore
Comments

@toobaz
Member

toobaz commented Oct 22, 2015

If I do

import pandas as pd

# Two-level index (A, B) with a single data column C
ddf = pd.DataFrame([['a', 'b', 1],
                    ['a', 'b', 2]],
                   columns=['A', 'B', 'C']).set_index(['A', 'B'])

and then

ddf['C'].to_hdf('/tmp/store.hdf', 'test',
          format="table",
          min_itemsize={'A' : 3})

I get the following:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-46-66f05c11146d> in <module>()
      1 ddf['C'].to_hdf('/tmp/store.hdf', 'test',
      2           format="table",
----> 3           min_itemsize={'A' : 3})

/usr/lib/python2.7/dist-packages/pandas/core/generic.pyc in to_hdf(self, path_or_buf, key, **kwargs)
    936 
    937         from pandas.io import pytables
--> 938         return pytables.to_hdf(path_or_buf, key, self, **kwargs)
    939 
    940     def to_msgpack(self, path_or_buf=None, **kwargs):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in to_hdf(path_or_buf, key, value, mode, complevel, complib, append, **kwargs)
    268         with HDFStore(path_or_buf, mode=mode, complevel=complevel,
    269                        complib=complib) as store:
--> 270             f(store)
    271     else:
    272         f(path_or_buf)

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in <lambda>(store)
    263         f = lambda store: store.append(key, value, **kwargs)
    264     else:
--> 265         f = lambda store: store.put(key, value, **kwargs)
    266 
    267     if isinstance(path_or_buf, string_types):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in put(self, key, value, format, append, **kwargs)
    825             format = get_option("io.hdf.default_format") or 'fixed'
    826         kwargs = self._validate_format(format, kwargs)
--> 827         self._write_to_group(key, value, append=append, **kwargs)
    828 
    829     def remove(self, key, where=None, start=None, stop=None):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1263 
   1264         # write the object
-> 1265         s.write(obj=value, append=append, complib=complib, **kwargs)
   1266 
   1267         if s.is_table and index:

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, **kwargs)
   4104         cols.append(name)
   4105         obj.columns = cols
-> 4106         return super(AppendableMultiSeriesTable, self).write(obj=obj, **kwargs)
   4107 
   4108 

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, data_columns, **kwargs)
   4071             obj.columns = [name]
   4072         return super(AppendableSeriesTable, self).write(
-> 4073             obj=obj, data_columns=obj.columns, **kwargs)
   4074 
   4075     def read(self, columns=None, **kwargs):

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3769         self.create_axes(axes=axes, obj=obj, validate=append,
   3770                          min_itemsize=min_itemsize,
-> 3771                          **kwargs)
   3772 
   3773         for a in self.axes:

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3371             axis, axis_labels = self.non_index_axes[0]
   3372             data_columns = self.validate_data_columns(
-> 3373                 data_columns, min_itemsize)
   3374             if len(data_columns):
   3375                 mgr = block_obj.reindex_axis(

/usr/lib/python2.7/dist-packages/pandas/io/pytables.pyc in validate_data_columns(self, data_columns, min_itemsize)
   3247 
   3248             existing_data_columns = set(data_columns)
-> 3249             data_columns.extend([
   3250                 k for k in min_itemsize.keys()
   3251                 if k != 'values' and k not in existing_data_columns

AttributeError: 'Index' object has no attribute 'extend'

Everything works, however, if I don't specify format="table", or if I don't specify min_itemsize, or if I save the data as a DataFrame (ddf[['C']]) rather than as a Series.

Tested with up-to-date pandas from git and PyTables 3.2.2-1.
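
Reading the traceback bottom-up, the Series writer appears to pass obj.columns (an Index) as data_columns, and validate_data_columns then calls .extend on it, which lists provide but Index does not. The following is a minimal pure-Python sketch of that failure mode, not the pandas source: the function name validate_data_columns_sketch is hypothetical, and a tuple stands in for the Index because it likewise has no extend method.

```python
def validate_data_columns_sketch(data_columns, min_itemsize):
    """Simplified stand-in for the logic in the failing frame (not pandas code)."""
    existing = set(data_columns)
    # Same pattern as the traceback: assumes data_columns is a mutable list.
    data_columns.extend(
        k for k in min_itemsize if k != "values" and k not in existing
    )
    return data_columns


# With a list, extra min_itemsize keys are appended as data columns.
ok = validate_data_columns_sketch(["C"], {"A": 3})

# A tuple stands in for pandas' Index here: neither has an .extend method.
try:
    validate_data_columns_sketch(("A", "B", "C"), {"A": 3})
    error = None
except AttributeError as exc:
    error = str(exc)
```

This would also be consistent with the DataFrame path working: there, data_columns presumably reaches the same code as a plain list or None rather than as an Index.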

@jreback
Contributor

jreback commented Oct 22, 2015

Duplicate of #11364.

It's a bug; specify 'index' as the key to make it work.

@jreback jreback closed this as completed Oct 22, 2015
@jreback jreback added the IO HDF5 read_hdf, HDFStore label Oct 22, 2015
@toobaz
Member Author

toobaz commented Oct 22, 2015

Sorry for the dupe (and for the ridiculous bug title).

But that said,

ddf['C'].to_hdf('/tmp/store.hdf', 'test',
          format="table",
          min_itemsize={'index' : 3})

still gives exactly the same error.

@toobaz toobaz changed the title from 'min_itemsize' to 'min_itemsize not working on MultiIndex columns for Series, with format="table"' Oct 22, 2015
@jreback
Contributor

jreback commented Oct 22, 2015

You can post that as an example in the other issue then. It's the same/related.

toobaz added a commit to toobaz/pandas that referenced this issue Nov 24, 2016
toobaz added a commit to toobaz/pandas that referenced this issue Dec 5, 2016
@jreback jreback added this to the 0.19.2 milestone Dec 5, 2016
jreback pushed a commit that referenced this issue Dec 5, 2016
closes #11412

Author: Pietro Battiston <[email protected]>

Closes #14728 from toobaz/minitemsizefix and squashes the following commits:

e25cd1f [Pietro Battiston] Whatsnew
b9bb88f [Pietro Battiston] Tests for previous commit
6406ee8 [Pietro Battiston] BUG: Ensure min_itemsize is always a list
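
The commit title suggests the fix coerces an argument to a list before it is extended. One defensive way to do that is sketched below; this is a guess at the general shape of such a fix, not the merged patch, and the function name validate_data_columns_fixed is hypothetical.

```python
def validate_data_columns_fixed(data_columns, min_itemsize):
    """Hypothetical defensive variant: copy to a list before extending."""
    # list() accepts any iterable (list, tuple, pandas Index) and returns a
    # fresh list, so the caller's object is never mutated.
    columns = list(data_columns)
    if isinstance(min_itemsize, dict):
        existing = set(columns)
        # 'values' is skipped, mirroring the filter visible in the traceback.
        columns.extend(
            k for k in min_itemsize if k != "values" and k not in existing
        )
    return columns


# An immutable sequence no longer raises AttributeError.
result = validate_data_columns_fixed(("C",), {"A": 3, "values": 10})

# The input list is left untouched because only the copy is extended.
original = ["C"]
extended = validate_data_columns_fixed(original, {"A": 3})
```

Copying first also avoids a side effect on the caller's list, at the cost of one small allocation per write.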
jorisvandenbossche pushed a commit that referenced this issue Dec 15, 2016
(cherry picked from commit 53bf1b2)