EA: fillna should accept same type #32414 #43230

Closed
wants to merge 6 commits into from

@Bhavay-2001 Bhavay-2001 commented Aug 26, 2021

A unit test has been added to check the behavior of fillna() with pd.Categorical. This PR addresses GitHub issue #32414.
Thanks

@pep8speaks

pep8speaks commented Aug 26, 2021

Hello @Bhavay192! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 683:5: E304 blank lines found after function decorator

Comment last updated at 2021-11-20 10:33:09 UTC
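E304 is pycodestyle's code for a blank line between a decorator and the function it decorates. As a toy illustration of the rule (a hypothetical helper built on the standard ast module, not pycodestyle's own implementation), the sketch below flags a parametrized test that has a stray blank line after its decorator:

```python
import ast


def find_blank_after_decorator(source):
    """Return names of functions whose last decorator is followed by a blank line.

    Toy illustration of what pycodestyle reports as E304; it is NOT
    pycodestyle's implementation.
    """
    lines = source.splitlines()
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.decorator_list:
            # Since Python 3.8, node.lineno is the line of the `def` keyword and
            # end_lineno is the last line of the (possibly multi-line) decorator.
            last_decorator_end = node.decorator_list[-1].end_lineno
            # 1-indexed lines strictly between decorator end and `def`.
            between = lines[last_decorator_end:node.lineno - 1]
            if any(not line.strip() for line in between):
                offenders.append(node.name)
    return offenders


# A parametrized test with an empty line between the closing ")" and "def".
BAD = (
    "import pytest\n"
    "\n"
    "@pytest.mark.parametrize(\n"
    "    'x',\n"
    "    [1, 2],\n"
    ")\n"
    "\n"
    "def test_x(x):\n"
    "    assert x\n"
)
# The same test with the decorator directly above the function.
GOOD = BAD.replace(")\n\ndef", ")\ndef")

print(find_blank_after_decorator(BAD))   # ['test_x']
print(find_blank_after_decorator(GOOD))  # []
```

The fix for this kind of warning is simply to delete the empty line so the decorator sits directly above the def.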

@Bhavay-2001
Author

I request all the members to please review my PR and let me know if any changes need to be made. Thanks

Member

@phofl phofl left a comment

Your test seems broken, i.e. it is not passing the CI.

Also, you have a linting error. You could install pre-commit to run the checks locally.

See https://pandas.pydata.org/pandas-docs/stable/development/contributing.html for further information.

@alimcmaster1 alimcmaster1 added the Testing label Aug 26, 2021
@Bhavay-2001
Author

hey @phofl, thanks for reviewing. I couldn't understand where my test is failing. It would be really helpful if you could explain in a little more detail, as I'm quite new to open source. I hope you understand. I also checked the pandas documentation but couldn't find anything relevant to my test.

@Bhavay-2001
Author

hey @phofl, thanks for pointing out the failure. I opened the link in your comment. It is quite difficult to interpret where my test is failing, but from the logs it looks like the code is failing with the TypeError that I have raised. Please check my code once more and tell me if you can find my mistake. Also, I ran the test locally on my laptop using Python's unittest module, so there might be some differences with pytest. Please see if you can help me with the error. Thanks

@phofl
Member

phofl commented Aug 28, 2021

Running them with unittest is not helpful, since we use a lot of pytest functionality. Please run them with pytest; then your test will fail locally too.
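One plausible reason a unittest run can look clean here: the standard library's unittest loader only collects unittest.TestCase subclasses, so a module made of plain test functions and plain test classes (the style pandas' test suite uses) yields zero tests, and a pytest.mark.parametrize decorator means nothing to it. A minimal sketch with a made-up, in-memory module:

```python
import types
import unittest

import pytest


def build_pytest_style_module():
    """Create an in-memory module shaped like a pytest test file (no TestCase)."""
    module = types.ModuleType("pytest_style_tests")

    @pytest.mark.parametrize("value, expected", [(1, 2), (2, 3)])
    def test_increment(value, expected):
        assert value + 1 == expected

    class TestGrouped:  # a plain class, not a unittest.TestCase subclass
        def test_something(self):
            assert True

    module.test_increment = test_increment
    module.TestGrouped = TestGrouped
    return module


def count_unittest_cases(module):
    """How many tests the stdlib unittest loader would run from ``module``."""
    return unittest.TestLoader().loadTestsFromModule(module).countTestCases()


print(count_unittest_cases(build_pytest_style_module()))  # 0
```

pytest, by contrast, would collect three items from the same code (two parametrized cases plus the class method), which is why only a pytest run reproduces what CI sees.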

@Bhavay-2001
Author

hey @phofl, thanks for replying. Based on your comment, I ran my test locally with pytest and updated the code to fix what was causing the error. I hope it now passes all the tests and doesn't conflict with the rest of the code. Please review the updated code. Thanks

Member

@phofl phofl left a comment

Please use our PR template in your pull request description to reference the issue and check off the remaining tasks.

Edit: Test is still failing

data = ["A", "B", np.nan, np.nan, "C"]
ser = Series(Categorical(data, categories=["A", "B", "C"]))

# msg = "Element not present in categories. Cannot be filled in series."
Member

Is this relevant? If not please delete

Author

Okay, I will delete the commented-out part.

# ser.fillna("D")

exp = Series(Categorical(expected_output, categories=["A", "B", "C"]))
result = ser.fillna(fill_value)
Member

This does not cover the issue. We need to check with a categorical fill_value too.

Author

I have defined the fill_value in the pytest parametrize section, where I provide a fill_value. Do we need to explicitly define that categorical value here?

Author

Could you please elaborate on this a little more? I find it difficult to understand.

Member

You are not testing everything mentioned in the issue. @mroeschke provided a code snippet there which should be covered here

Author

In [22]: cat = pd.Categorical(["A", "B", None, "A"])
    ...: ser = pd.Series(cat).fillna("B")

In [23]: filled = cat.fillna(ser)

In [24]: cat.fillna(filled)
Out[24]:
['A', 'B', 'B', 'A']
Categories (2, object): ['A', 'B']
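A sketch of how the snippet above could be turned into assertions, which is roughly what the reviewer asks the test to cover. It assumes a pandas version in which Categorical.fillna accepts another Categorical with identical categories (the behavior issue #32414 asks for); the test name is hypothetical, and tm is pandas' internal testing module, as used elsewhere in this thread.

```python
import pandas as pd
import pandas._testing as tm


def test_fillna_categorical_with_same_type():  # hypothetical name
    # GH32414: fill a Categorical using another Categorical of the same dtype
    cat = pd.Categorical(["A", "B", None, "A"])

    # Scalar fill: the missing slot becomes "B"; categories stay ["A", "B"]
    filled = cat.fillna("B")
    expected = pd.Categorical(["A", "B", "B", "A"], categories=["A", "B"])
    tm.assert_categorical_equal(filled, expected)

    # Same-type fill: values are taken position by position where cat is missing
    result = cat.fillna(filled)
    tm.assert_categorical_equal(result, expected)

    # Series path: a Series fill value is aligned on the index before filling
    result_ser = pd.Series(cat).fillna(pd.Series(filled))
    tm.assert_series_equal(result_ser, pd.Series(expected))


test_fillna_categorical_with_same_type()
```

Because both arrays share the same CategoricalDtype, no new categories are needed; a fill value with different categories would be rejected rather than silently added.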

Author

hey @phofl, this was the code snippet provided by @mroeschke, and I have tried to add the same thing. So I'm asking: should I use "B" instead of that fill_value?
I saw other tests too and they did the same thing. Do you want me to explicitly declare "B"?

Author

okay @phofl, I have checked with categorical_fillna too. I will definitely add the updated code tomorrow. I think that will resolve the issue. Thanks

# with pytest.raises(TypeError, match=msg):
# ser.fillna("D")

exp = Series(Categorical(expected_output, categories=["A", "B", "C"]))
Member

Please call it expected.

Author

Okay, I will make the necessary changes.

@Bhavay-2001
Author

hey @phofl, I have updated the pull request. Please review it again.

@phofl
Member

phofl commented Sep 3, 2021

Tests are still failing

@Bhavay-2001
Author

I'm unable to understand why they are still failing. I made all the changes you asked for, made a complete test out of that code snippet, and it's working fine on my machine. I also tried to write the test in the style of the other tests in that file. Still, it's failing.

@MarcoGorelli
Member

and its working fine on my machine

Pretty sure the code, as it's written, won't pass - could you show what you ran and what your output is please? Perhaps you have some commits you haven't pushed yet?

@Bhavay-2001
Author

hey @MarcoGorelli, thanks for replying. I tried running only the function I wrote on my machine, without the class. When I put it in the class and ran it, it showed an error that I'm unable to interpret.
I will attach my code and the output here. Please review it.

Code

Output

@Bhavay-2001
Author

However, if I run the complete test_fillna.py file on my machine, it passes 39 tests and fails only 1. I'm adding the output of that below too.
Output

@MarcoGorelli
Member

MarcoGorelli commented Sep 4, 2021

Try running pytest pandas/tests/series/methods/test_fillna.py -k test_series_fill, that'll reproduce the error you see in CI

(you may need to replace / with \ if you're on Windows)
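The -k option tells pytest to run only the tests whose names match the given expression (here a plain substring), which isolates one test from the rest of the file. A self-contained sketch that writes a throwaway test file with two made-up tests and runs pytest on it in a subprocess; pathlib takes care of the / versus \ difference on Windows:

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Two made-up tests; only one of them matches the -k expression used below.
TEST_FILE = textwrap.dedent(
    """
    def test_series_fill():
        assert 1 + 1 == 2

    def test_something_else():
        assert True
    """
)


def run_pytest_k(expression):
    """Run pytest with ``-k expression`` on a temporary test file.

    Returns (exit code, captured stdout). Exit code 0 means the selected
    tests passed; pytest uses 5 when nothing matched the expression.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test_sketch.py"
        path.write_text(TEST_FILE)
        proc = subprocess.run(
            [sys.executable, "-m", "pytest", "-q", "-p", "no:cacheprovider",
             str(path), "-k", expression],
            capture_output=True,
            text=True,
            cwd=tmp,
        )
    return proc.returncode, proc.stdout


code, out = run_pytest_k("test_series_fill")
print(code)                      # 0
print(out.strip().splitlines()[-1])  # summary line, e.g. "1 passed, 1 deselected in ..."
```

In a pandas checkout the same idea is just the command from the comment above, run from the repository root inside the development environment.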

@jreback jreback changed the title from "TST Providing unit test to snippet GH32414" to "EA: fillna should accept same type #32414" Sep 4, 2021
@jreback jreback added the ExtensionArray and Missing-data labels Sep 4, 2021
],
)

def test_series_fill(fill_value, expected_output):
Contributor

can you rename to tests_fillna_categorical

Author

Yes, I will rename it.

exp_ser = Series(exp)
result = ser.fillna(fill_value)
filled = cat.fillna(fill_value)
tm.assert_almost_equal(result, exp_ser)
Contributor

can you use
tm.assert_series_equal(result_ser, exp_ser)

and
tm.assert_categorical_equal(result_cat, exp_cat)
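The two helpers suggested here each compare one kind of object strictly: tm.assert_series_equal checks values, index, name and dtype (for a categorical Series that includes its categories), and tm.assert_categorical_equal checks two Categorical arrays, categories included. A short sketch using the reviewer's variable names, plus a hypothetical helper showing that an expectation with an extra category is rejected even though the values look the same:

```python
import numpy as np
import pandas as pd
import pandas._testing as tm

categories = ["A", "B", "C"]
cat = pd.Categorical(["A", "B", np.nan, np.nan, "C"], categories=categories)

result_cat = cat.fillna("B")
result_ser = pd.Series(cat).fillna("B")

exp_cat = pd.Categorical(["A", "B", "B", "B", "C"], categories=categories)
exp_ser = pd.Series(exp_cat)

# Both pass: same values, same categories, same category order.
tm.assert_series_equal(result_ser, exp_ser)
tm.assert_categorical_equal(result_cat, exp_cat)


def categories_mismatch_detected():
    """True if assert_categorical_equal rejects an expectation with an extra category."""
    wrong = pd.Categorical(["A", "B", "B", "B", "C"], categories=["A", "B", "C", "D"])
    try:
        tm.assert_categorical_equal(result_cat, wrong)
    except AssertionError:
        return True
    return False


print(categories_mismatch_detected())  # True
```

Naming the pairs result_ser/exp_ser and result_cat/exp_cat also makes it obvious which assertion covers the Series path and which covers the array path.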

Contributor

And rename things a bit; this is very hard to read.

Author

Yes, I will surely rename it. Thanks for pointing that out.

@Bhavay-2001
Author

Hey @MarcoGorelli, I tested the complete test_fillna.py on my machine and it shows no errors. However, on running the command you suggested, it gives just an import error in a separate file, not in test_fillna.py. What should I do next?

@MarcoGorelli
Member

can you paste your command and output please?

@Bhavay-2001
Author

Yes, my testing_fillna.py is the same as test_fillna.py, and I ran it. It shows the following:
image

@MarcoGorelli
Member

However, on running the above command that you told, it gives just an import error in a separate file and not in test_fillna.py

Can you paste the command and its output?

@Bhavay-2001
Author

Output

@github-actions
Contributor

github-actions bot commented Oct 5, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 5, 2021
@mroeschke
Member

Appears this PR has been dormant for a while and is still failing in the CI so closing. If interested in continuing, please merge master, address related comments and we can reopen.

@mroeschke mroeschke closed this Oct 31, 2021
@Bhavay-2001
Author

Hey, I was a bit busy, so I couldn't contribute to it. I'm thinking of working on this PR now.

@Bhavay-2001
Author

Hey, I believe my function works fine; the problem comes from other functions. I tested my function after commenting out the error-causing functions, and all the tests passed. Please reopen the PR so that I can discuss this further. Thanks

@MarcoGorelli
Member

sure, reopened

@MarcoGorelli MarcoGorelli reopened this Nov 12, 2021
@Bhavay-2001
Author

Hey @MarcoGorelli, may I show you my code? The problem is that the test fails because of other code in the file; if I run my code alone in a separate file, it works fine. So, may I?

@MarcoGorelli
Member

feel free to paste your code (copy and paste it rather than showing a screenshot)

@Bhavay-2001
Author

Hey @MarcoGorelli, sorry for the late reply. I will paste my code sample here so that you can have a look. Please give me some time.

@Bhavay-2001
Author

Bhavay-2001 commented Nov 19, 2021

def test_fillna_categorical(self, fill_value, expected_output):
        # GH32414
        data = ["A", "B", np.nan, np.nan, "C"]
        cat = Categorical(data, categories=["A", "B", "C"])
        ser = Series(cat)
        exp_cat = Categorical(expected_output, categories=["A", "B", "C"])
        exp_ser = Series(exp_cat)
        result_ser = ser.fillna(fill_value)
        filled = cat.fillna(fill_value)
        tm.assert_almost_equal(result_ser, exp_ser)
        tm.assert_almost_equal(filled, exp_cat)

@Bhavay-2001
Author

@MarcoGorelli, this is my code sample. Please review it so that I can merge it. Thanks

@MarcoGorelli
Member

Please show your whole file - I don't believe that that one passes because self would be undefined.
Please also show what you ran, and the output
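A test function takes self only when it is defined inside a test class; pasted on its own, the snippet above has self as a plain argument, which pytest would try to resolve as a fixture and fail. A self-contained sketch of the snippet as a class method with the parametrize decorator restored directly above the def (so E304 is not triggered) and the assertions the reviewer suggested. The class name and the parameter values are hypothetical, since the real ones are not shown in this thread.

```python
import numpy as np
import pytest

import pandas._testing as tm
from pandas import Categorical, Series


class TestFillnaCategoricalSketch:  # hypothetical class name
    @pytest.mark.parametrize(
        "fill_value, expected_output",
        [
            # hypothetical cases: fill both missing slots with an existing category
            ("A", ["A", "B", "A", "A", "C"]),
            ("B", ["A", "B", "B", "B", "C"]),
            ("C", ["A", "B", "C", "C", "C"]),
        ],
    )
    def test_fillna_categorical(self, fill_value, expected_output):
        # GH32414
        data = ["A", "B", np.nan, np.nan, "C"]
        cat = Categorical(data, categories=["A", "B", "C"])
        ser = Series(cat)

        exp_cat = Categorical(expected_output, categories=["A", "B", "C"])
        exp_ser = Series(exp_cat)

        result_ser = ser.fillna(fill_value)
        result_cat = cat.fillna(fill_value)

        # Strict comparisons: values, dtype and categories must all match.
        tm.assert_series_equal(result_ser, exp_ser)
        tm.assert_categorical_equal(result_cat, exp_cat)
```

Run through pytest inside a pandas development environment, this collects three test items, one per parameter tuple.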

@Bhavay-2001
Author

Hey @MarcoGorelli, I ran the testing_datatype.py file separately, and here was the result.

@Bhavay-2001
Author

Output
This was after I commented out all the error-causing functions. Thanks

@Bhavay-2001
Author

Could anyone please review my code and see what is causing the problem? I really can't find the mistake that keeps it from passing all the tests. Any help will be appreciated. Thanks

@MarcoGorelli
Member

  1. show your whole testing_datatype.py file
  2. you need to run the test in your clone of pandas, within your pandas-dev virtual environment

@Bhavay-2001
Author

Hey @MarcoGorelli, how can I show you my whole testing_datatype.py file? Should I paste the code here? And how can I run the tests in my pandas-dev virtual environment?

@MarcoGorelli
Member

MarcoGorelli commented Nov 20, 2021

Trim the file down so it only has this test - try to isolate the issue

Regarding pytest, you just need to:

  1. activate your virtual environment - if you use conda, that's conda activate pandas-dev
  2. run the test, like you've done

see https://pandas.pydata.org/pandas-docs/stable/development/contributing_codebase.html?highlight=pytest#test-driven-development-code-writing

In general, I'd suggest you read through the contributing guide; it sounds like there might be a bit more preparation needed before we can pick this PR up: https://pandas.pydata.org/pandas-docs/stable/development/index.html

@jreback
Contributor

jreback commented Jan 16, 2022

closing as stale

@jreback jreback closed this Jan 16, 2022
Labels
ExtensionArray: Extending pandas with custom dtypes or arrays.
Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Stale
Testing: pandas testing functions or related to the test suite
Development

Successfully merging this pull request may close these issues.

EA: fillna should accept same type
7 participants