Styler extremely slow #19917
Comments
Is it Note that styled = final.style.apply(highlighter)._compute() |
Good point. Function scanning the df and writing new values The Styler by itself The Styler and openpyxl writer |
If you want, you can break that down into how much time is spent in pandas' For the |
As stated above, unless I misunderstand, calling the function independently of the styler using Both calls have the same amount of iteration over the original dataframe, the only difference being the latter creates a |
Could you do some line profiling then? Or post a reproducible example? |
@N2ITN find any time to profile this? |
Closing, let us know if you're able to profile things. |
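I'm experiencing the same issue and did some profiling. I have a very small DataFrame: In [4]: df.shape
Out[4]: (78, 4) Just adding a text-align: left property takes almost 1 second:
In [5]: %%time
...: s = df.style.set_properties(**{'text-align': 'left'})
...: s.render();
...:
CPU times: user 844 ms, sys: 68 ms, total: 912 ms
Wall time: 833 ms
I ran the line profiler and pasted the output here (https://pastebin.com/raw/ShfAcVGx). As suspected most time is spent in _update_ctx(). @TomAugspurger Could you please reopen the issue? I'm happy to provide more info, just let me know what is helpful! |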
Profiling would be helpful.
…On Sat, Mar 30, 2019 at 5:52 AM Simon Gurcke ***@***.***> wrote:
I'm experiencing the same issue and did some profiling.
I have a very small DataFrame:
In [4]: df.shape
Out[4]: (78, 4)
Just adding a text-align: left property takes almost 1 second:
In [5]: %%time
...: s = df.style.set_properties(**{'text-align': 'left'})
...: s.render();
...:
CPU times: user 844 ms, sys: 68 ms, total: 912 ms
Wall time: 833 ms
I ran the line profiler and pasted the output here
<https://pastebin.com/raw/ShfAcVGx>. As suspected most time is spent in
_update_ctx().
@TomAugspurger <https://github.com/TomAugspurger> Could you please reopen
the issue? I'm happy to provide more info, just let me know what is helpful!
|
Ah, I missed that you linked to some. Does set_properties update every cell
in the table? It'd be nice to avoid the
calls to get_indexer in those cases.
```
484       312    1351736.0   4332.5     86.9      i = self.index.get_indexer([row_label])[0]
485       312     170846.0    547.6     11.0      j = self.columns.get_indexer([col_label])[0]
```
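The 312 hits in the profile match 78 rows × 4 columns, i.e. one pair of `get_indexer` calls per cell. A minimal sketch of what avoiding those calls could look like, assuming a hypothetical frame of the same shape; the helper names `positions_per_cell` and `positions_batched` are made up for illustration and are not part of pandas or of the Styler internals:

```python
import pandas as pd

# Hypothetical frame matching the shape reported in the thread (78 rows, 4 columns).
df = pd.DataFrame(
    [[r * 4 + c for c in range(4)] for r in range(78)],
    columns=list("abcd"),
)


def positions_per_cell(frame):
    """Look up positions with one get_indexer call pair per cell (the profiled pattern)."""
    out = []
    for row_label in frame.index:
        for col_label in frame.columns:
            i = int(frame.index.get_indexer([row_label])[0])
            j = int(frame.columns.get_indexer([col_label])[0])
            out.append((i, j))
    return out


def positions_batched(frame):
    """Look up all positions with two get_indexer calls in total."""
    rows = frame.index.get_indexer(frame.index).tolist()
    cols = frame.columns.get_indexer(frame.columns).tolist()
    return [(i, j) for i in rows for j in cols]


# Both approaches yield the same (row, column) positions.
assert positions_per_cell(df) == positions_batched(df)
```

Timing the two helpers (for example with `%timeit` in IPython) should show how much of the cost comes from the repeated lookups; the actual numbers depend on the pandas version and the machine.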
|
Yes, it seems that way. Unless you specify the subset parameter. |
I could imagine an alternative updater for special cases like updating every value.
Alternatively we could change the per cell dict to a chain map and apply the table properties to a dict shared by each cell, if that is indeed the issue.
… On Mar 30, 2019, at 06:53, Simon Gurcke ***@***.***> wrote:
Yes, it seems that way. Unless you specify the subset parameter.
|
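The chain-map suggestion above can be sketched with the standard library's `collections.ChainMap`. This only illustrates the idea and is not how pandas stores its styling context: the names `table_props`, `ctx`, and `css` are hypothetical.

```python
from collections import ChainMap

# Properties that apply to every cell live in one shared dict.
table_props = {"text-align": "left"}

# Each cell gets a ChainMap: its own (initially empty) dict first, the shared dict second.
n_rows, n_cols = 78, 4
ctx = {
    (i, j): ChainMap({}, table_props)
    for i in range(n_rows)
    for j in range(n_cols)
}

# A table-wide update touches one dict instead of every cell.
table_props["font-size"] = "12px"

# A per-cell override is written to that cell's own dict only.
ctx[(0, 0)]["text-align"] = "right"


def css(cell):
    """Render a cell's merged properties as a CSS declaration string."""
    return "; ".join(f"{key}: {value}" for key, value in sorted(ctx[cell].items()))


print(css((0, 0)))  # font-size: 12px; text-align: right
print(css((1, 1)))  # font-size: 12px; text-align: left
```

The design trade-off is that lookups go through two dicts per cell, while table-wide updates become a single dict write regardless of table size.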
Hi there, I am still relatively new to this toolset (I'm loving pandas overall!), but I seem to be having the same issue with slow Stylers as other users in this thread... Just checking if there are any new developments?
In my medium-small df (6K rows by 10 cols), it takes almost 4 minutes to process a function that assigns css font-size using df.style.applymap(my_css_function), while df.applymap(my_css_function) is practically instantaneous. I can process particular slices faster, of course, but it's awkward that I can't re-use the result: it has to get re-computed every time and prints the entire structure with no 'head' method, etc.
For my purposes, the style idea is very relevant, but I'd prefer to define rules that control how whatever I'm looking at in my jupyter notebooks is rendered all the time, as opposed to working most of the time with one structure with limited readability and creating a separate structure (Styler) every time I want to see it more clearly/richly. I don't know how reasonable a request that is, but unless I'm missing something or it gets a lot faster, the usefulness will be fairly low.
Thanks for your time!
Edit: Just found the "Limitations" section near the bottom of the documentation, it says: "No large repr, and performance isn’t great; this is intended for summary DataFrames". So maybe it's working as intended, with formatting control and basic visualizations as a bonus as opposed to core functionality. I can support that. But maybe this "Limitations" section could be moved close to the top of the documentation page so people don't get the wrong idea? |
This issue is still open.
It sounds like using a custom CSS file may better suit your needs. Styler is primarily for CSS rules where the value depends on the data.
|
Here's a temporary solution until it gets fixed:
Original code in current version goes like this: pandas/pandas/io/formats/style.py Lines 551 to 569 in 3c959fc
There were two reasons that it was slow:
I could not figure out:
In my app, rendering went from 20 seconds down to less than 2 seconds. |
I made a pull request addressing this issue -- see #34863
|
- experimental, 10% further improvement by eliminating get_indexer call see pandas-dev#19917
Code Sample
Problem description
Here I have some conditional highlighting on a df with 18k rows. The issue is that despite preceding complex operations on the df (such as conditional merges and df.apply by row) taking ~300ms at the most, the Styler.apply part takes over two minutes. I realize this feature is in development but I am wondering if there is a way to make it faster or if this is a known issue.
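Since the original code sample is not shown here, the following is only a sketch of how a report like this could be narrowed down: time the highlighting function on its own, then time the Styler step separately. The data and the `highlighter` rule are hypothetical stand-ins, not the reporter's code.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 18k-row frame described in the report.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(18_000, 5)), columns=list("abcde"))


def highlighter(frame):
    """Hypothetical rule: return a same-shaped frame of CSS strings, yellow where negative."""
    css = np.where(frame.to_numpy() < 0, "background-color: yellow", "")
    return pd.DataFrame(css, index=frame.index, columns=frame.columns)


# Time the styling function by itself, independent of the Styler.
start = time.perf_counter()
styles = highlighter(df)
elapsed = time.perf_counter() - start
print(f"highlighter alone: {elapsed:.4f} s")

# With jinja2 installed, the same function can be handed to the Styler, e.g.
# df.style.apply(highlighter, axis=None), and that step timed separately.
```

If the function alone is fast and the Styler step is slow, the overhead lies in the Styler rather than in the highlighting logic, which is what the profiling elsewhere in this thread points to.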