PERF: Removed the GIL from parts of the TextReader class #11272
if (not have_real_test_parallel):
    raise NotImplementedError
# Using the values
self.data = '0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n 0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n 0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n 0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n 0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n '
I would just use random data here, e.g. DataFrame(np.random.randn(10000,10))
(or whatever makes this run in a reasonable time).
Also, can you add a test that uses object dtype and another that has a datetime parse in the index_col?
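A minimal sketch of the random-data suggestion, assuming NumPy and pandas are importable. The frame size, the seed, and the in-memory buffer are illustrative choices, not what the benchmark in this PR ended up using.

```python
import io

import numpy as np
import pandas as pd

# Illustrative size and seed; pick whatever gives a reasonable benchmark time.
rng = np.random.RandomState(1234)
df = pd.DataFrame(rng.randn(1000, 10))

# Write the frame as header-less CSV text, matching read_csv(..., header=None).
csv_text = df.to_csv(header=False, index=False)

# Read it back to confirm the generated data parses into the same shape.
roundtrip = pd.read_csv(io.StringIO(csv_text), sep=',', header=None)
```

Generating the data at setup time keeps long numeric literals out of the benchmark source and makes the row count easy to tune.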
some windows cythoning errors. odd they don't show up for you, what platform are you testing on?
I fixed a few issues with the build.
builds, couple of errors on windows
This was an issue with python3 and not limited to windows.
def time_nogil_read_csv(self):
    @test_parallel(num_threads=2)
    def run(arr):
        read_csv('__test__.csv', sep=',', header=None, float_precision=None)
is there a cleanup function on these? (e.g. to remove the created csvs)
@qwhelan ?
There isn't. I had a look at a few tests and they don't seem to clean up either... see test.h5
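For reference, a cleanup hook could look roughly like the sketch below. It assumes an asv-style benchmark class with `setup` and `teardown` methods. The class name, the temporary-file approach, and the stand-in timed body are all hypothetical and not taken from this PR.

```python
import os
import tempfile


class CsvBenchmark:
    """Hypothetical benchmark class using asv-style setup/teardown names."""

    def setup(self):
        # Create the CSV in a temporary location instead of the working directory.
        fd, self.path = tempfile.mkstemp(suffix='.csv')
        with os.fdopen(fd, 'w') as handle:
            handle.write('0.1,0.2,0.3\n0.4,0.5,0.6\n')

    def time_read(self):
        # Stand-in for the timed body; a real benchmark would call read_csv here.
        with open(self.path) as handle:
            return handle.read()

    def teardown(self):
        # Remove the file so repeated runs leave nothing behind.
        if os.path.exists(self.path):
            os.remove(self.path)


bench = CsvBenchmark()
bench.setup()
content = bench.time_read()
bench.teardown()
```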
Where do I need to add information in the performance section? Is it in the what's new file in the documentation?
whatsnew/v0.17.1 (Performance section)
@jdeschenes looks good. can you
can you update according to comments
What's the status on this @jdeschenes? I'd like to include this work in a talk happening tomorrow. It'd be awesome to be able to say that this was in master rather than in a branch.
you can say slated for 0.17.1 :)
I will get the final changes tomorrow.
a5ff16a to 9a4f845
@jreback, the changes have been implemented. Let me know if there is anything else that needs to be done.
@@ -1452,6 +1449,18 @@ cdef _to_fw_string(parser_t *parser, int col, int line_start,
     result = np.empty(line_end - line_start, dtype='|S%d' % width)
     data = <char*> result.data

     with nogil:
         _to_fw_string_internal(parser, col, line_start, line_end, width, data)
call this _to_fw_string_nogil instead
@jdeschenes thanks, just some small comments. ping when pushed. pls also post a short benchmark in the top of the PR (you can just run before/after in ipython via timeit if you want), mainly for posterity.
The GIL was released around the tokenizer functions and the conversion function (_string_convert excluded).
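To see the effect of releasing the GIL, the parsing call has to run from several threads at once. The sketch below is a standard-library-only thread harness, loosely in the spirit of the `test_parallel` decorator used in the benchmark. The name `run_in_threads` and the stand-in workload are hypothetical, and this is not pandas' implementation.

```python
import threading
from functools import wraps


def run_in_threads(num_threads=2):
    """Hypothetical decorator: call the wrapped function once per thread."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for thread in threads:
                thread.start()
            # Wait for every thread so the caller can time the whole batch.
            for thread in threads:
                thread.join()
        return wrapper
    return decorator


calls = []


@run_in_threads(num_threads=2)
def parse(text):
    # Stand-in workload; the benchmark in this PR calls read_csv here instead.
    calls.append([line.split(',') for line in text.splitlines()])


parse('0.1,0.2\n0.3,0.4')
```

Pure-Python work like this stand-in still holds the GIL, so any speedup from threading would only appear when the wrapped call releases it, as the modified parser code is meant to do.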
@jreback: Added the benchmarks.
looks good ping when green
Ping
thanks @jdeschenes and @mrocklin for the pings!
Merge pull request pandas-dev#11272 from jdeschenes/nogil_csv
The GIL was removed around the tokenizer functions and the conversion function (_string_convert excluded).
Benchmark:
Data Generation:
Benchmark Code:
Before:
After: