The real root of this problem is that dateutil's parser, dateutil.parser.parse(), is very liberal about turning odd values into datetimes. So I think the two best strategies are:
1) Stop using dateutil for datetime parsing, because it is too liberal: define stricter rules for what is acceptable as a datetime and have pandas parse only strings that satisfy them.
or
2) Add additional filtering before dateutil so that it hopefully behaves more sanely.
This change is an attempt at (2). The benefits of (2) are that it is much less of a backward-compatibility (BC) break and that it is hopefully more user-friendly (though this isn't a given; the current dateutil behavior can produce surprising results, like this bug). I think (1) has a lot of merit as well, but my guess is that (2) is more in the spirit of pandas.
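For comparison, strategy (1) might look something like the following minimal sketch: instead of filtering input in front of a liberal parser, accept only an explicit whitelist of formats and reject everything else. The format list and the name strict_parse are hypothetical and chosen only for illustration; they are not a proposed pandas API.

```python
from datetime import datetime

# Hypothetical whitelist for strategy (1); the real set of accepted layouts
# would still have to be decided, so this list is only an illustration.
ALLOWED_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%m/%d/%Y",
)


def strict_parse(value: str) -> datetime:
    """Parse value only if it matches one of ALLOWED_FORMATS; otherwise raise."""
    text = value.strip()
    for fmt in ALLOWED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this layout; try the next one
    # Nothing matched: reject instead of guessing the way dateutil would.
    raise ValueError(f"not an accepted datetime format: {value!r}")
```

With this approach, "1" or "a" raise ValueError instead of becoming datetimes, but so would many strings dateutil accepts today (for example "Aug 19, 2013"), which is why (1) is the bigger BC break.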
So this change mostly moves the dateutil pre-filtering into a new module, pandas.utils.datetime_parsing, and calls it before handing strings to dateutil. Performance should not be a concern: dateutil.parser.parse() is much slower than these checks, and the checks only run on code paths that would eventually reach dateutil anyway.
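The overall shape of the change can be sketched as a cheap check that runs before the expensive, liberal parser. This is a minimal sketch, assuming the filter and the fallback parser are passed in as callables; parse_with_prefilter is a hypothetical name (the PR names the module but not its functions), and in pandas the fallback would be dateutil.parser.parse().

```python
from datetime import datetime
from typing import Callable


def parse_with_prefilter(
    value: str,
    looks_like_datetime: Callable[[str], bool],
    fallback_parse: Callable[[str], datetime],
) -> datetime:
    """Reject obviously non-datelike strings, then defer to the liberal parser.

    Hypothetical helper: in pandas the fallback would be dateutil.parser.parse;
    here it is injected so the sketch has no third-party dependency.
    """
    if not looks_like_datetime(value):
        # Cheap rejection: the expensive fallback parser is never called.
        raise ValueError(f"string does not look like a datetime: {value!r}")
    # Only strings that pass the filter reach the (slow, liberal) parser.
    return fallback_parse(value)
```

For example, using datetime.fromisoformat as a stand-in fallback, parse_with_prefilter("2013-08-19", lambda s: len(s) > 1, datetime.fromisoformat) returns datetime(2013, 8, 19), while passing "a" raises ValueError without ever calling the fallback.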
I also like that this consolidates the dateutil pre-filtering into one tested location, so if we want to go further down the path of pre-filtering for dateutil, we have a good place for it. Previously, a small amount of dateutil pre-filtering already existed in the Cython code.
…#4601
Currently dateutil will parse almost any string into a datetime. This
change adds a filter in front of dateutil that prevents it from
parsing certain strings that don't look like datetimes:
1) Strings that parse as floats with a value less than 1000
2) Certain special one-character strings (this check already existed;
this change just moves that code)
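The two rules above could be expressed as a single predicate along these lines. This is a sketch under assumptions: the name looks_like_datetime is hypothetical, the set of rejected one-character strings is illustrative (the PR does not list it), and surrounding whitespace is stripped before the checks.

```python
# Illustrative assumption: the PR rejects "certain special one-character
# strings" without listing them, so this particular set is hypothetical.
NOT_DATELIKE_STRINGS = frozenset("aAmMpPtT")


def looks_like_datetime(value: str) -> bool:
    """Return False for strings that should not be handed to dateutil."""
    text = value.strip()  # assumption: ignore surrounding whitespace
    # Rule 2: special one-character strings are never datetimes.
    if text in NOT_DATELIKE_STRINGS:
        return False
    try:
        number = float(text)
    except ValueError:
        # Not numeric at all, e.g. "2013-08-19": let the real parser decide.
        return True
    # Rule 1: numeric strings whose value is below 1000 are rejected.
    if number < 1000:
        return False
    return True
```

Non-numeric strings such as "2013-08-19" still fall through to dateutil, and numeric strings of 1000 or more (for example "20130819") are still allowed, since those can plausibly be years or compact dates.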
Additionally, this filters out datetimes that are out of range for the
datetime64[ns] type. Currently, out-of-range datetimes silently
overflow and end up mapped to an arbitrary time within the bounds of
datetime64[ns].
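The range check can be done on exact integers before anything is converted to datetime64[ns], which stores a signed 64-bit count of nanoseconds since 1970-01-01. A minimal sketch, assuming naive (timezone-unaware) datetimes; the helper names are hypothetical, and wrap_int64 only models a two's-complement wraparound to illustrate how an unchecked value can silently land inside the valid range.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# datetime64[ns] holds an int64 nanosecond count; pandas reserves the minimum
# int64 value for NaT, so the lowest usable value is one above it.
NS_MIN = -(2**63) + 1
NS_MAX = 2**63 - 1


def to_epoch_ns(dt: datetime) -> int:
    """Exact nanoseconds since 1970-01-01 as a Python int (cannot overflow)."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def fits_datetime64_ns(dt: datetime) -> bool:
    """True if dt can be stored as datetime64[ns] without overflowing."""
    return NS_MIN <= to_epoch_ns(dt) <= NS_MAX


def wrap_int64(n: int) -> int:
    """Illustration only: two's-complement wraparound of n into int64 range."""
    return (n + 2**63) % 2**64 - 2**63
```

For example, fits_datetime64_ns(datetime(2262, 4, 11)) is True, while datetime(2263, 1, 1) and datetime(1600, 1, 1) are both rejected; wrap_int64(to_epoch_ns(datetime(3000, 1, 1))) shows how such a value could instead become a valid-looking but meaningless timestamp.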
Moved from #3171.