[BUG]: DataFrame.join inconsistent behavior, accepts overlapping columns provided suffixes is specified #13659

dragonator4 · 2016-07-14T21:43:42Z

Here is sample code to reproduce the error:

In [1]: import numpy as np
        import pandas as pd
        df1 = pd.DataFrame(np.random.rand(5,2))
        df2 = pd.DataFrame(np.random.rand(5,2))

In [2]: df2.join(df1)
Out[2]: ---------------------------------------------------------------------------
        ValueError: columns overlap but no suffix specified: RangeIndex(start=0, stop=2, step=1)

In [3]: df2.join(df1, lsuffix='_x', rsuffix='_x')
Out[3]:     0_x         1_x         0_x         1_x
        0   0.904888    0.491802    0.509346    0.367847
        1   0.282420    0.092652    0.672786    0.358450
        2   0.339018    0.318990    0.359977    0.640366
        3   0.775293    0.767872    0.820965    0.018728
        4   0.543648    0.412799    0.650457    0.712789

So ultimately one does get a merged DataFrame with overlapping column names. Then why raise an error in the first place?

Note: I am using the latest versions of pandas, Python, and NumPy.


sinhrks · 2016-07-15T14:04:03Z

I think this is meant as a safeguard. People don't want an Index with duplicates because it's confusing and not very performant. One idea is to change the default behavior to add suffixes by some rule.

jorisvandenbossche · 2016-07-15T14:17:45Z

@dragonator4 In any case, when passing such suffixes, it is the deliberate choice of the user to have duplicate names, so I think that justifies the difference in behaviour.

BTW, if you just want to join dataframes on the index without worrying about column names (because, e.g., in your example the column names go from int to string), you can also use concat.
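For readers following along, the concat suggestion might look roughly like the sketch below. The seeded random data and the `keys` labels (`"df2"`, `"df1"`) are illustrative assumptions, not part of the original report.

```python
import numpy as np
import pandas as pd

# Illustrative data: two frames with the same default integer column names.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((5, 2)))
df2 = pd.DataFrame(rng.random((5, 2)))

# Align on the index and place the frames side by side.
# Column names are kept unchanged, so duplicates are allowed here.
side_by_side = pd.concat([df2, df1], axis=1)

# Optionally label each source frame; this yields MultiIndex columns
# instead of duplicate names ("df2" and "df1" are arbitrary labels).
labeled = pd.concat([df2, df1], axis=1, keys=["df2", "df1"])
```

Without `keys`, the result still carries duplicate column names, so this sidesteps the error rather than the ambiguity.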

dragonator4 · 2016-07-15T16:32:56Z

@jorisvandenbossche I stumbled across this when I was trying to do a proper join where I cared about my indexes. It completely flummoxed me for all of 5 minutes, then I realised that I passed rsuffix = '_x' instead of '_y'. This is an issue because had I not checked my output, I could have run into some serious trouble.

Perhaps it should raise an error when suffixes are not passed and there are duplicate columns, as it does. But then perhaps it should warn if the passed suffixes also cause duplicate column names. That way you cater to the deliberate choice of the user as you put it.
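As a stopgap on the user side, the proposed warning could be approximated with a small check on the joined result. This is only a sketch: `warn_on_duplicate_columns` is a hypothetical helper, not a pandas API, and it does not change what `DataFrame.join` itself does.

```python
import warnings

import pandas as pd


def warn_on_duplicate_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: warn if `frame` has repeated column names."""
    # Index.duplicated() marks the second and later occurrences of a label.
    dupes = frame.columns[frame.columns.duplicated()].unique()
    if len(dupes) > 0:
        warnings.warn(
            f"duplicate column names in result: {list(dupes)}",
            UserWarning,
            stacklevel=2,
        )
    # Return the frame unchanged so the check can be chained.
    return frame
```

It could be applied to a join result, for example `warn_on_duplicate_columns(df2.join(df1, lsuffix='_x', rsuffix='_x'))`, which leaves the deliberate-duplicates case possible while making it visible.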

jeswcollins · 2019-05-16T22:33:37Z

I found the error message confusing. columns has a specific meaning as a Pandas DataFrame parameter, but it also has a broader meaning. In the broadest sense of the word "column", we might interpret it to include the specific Pandas term index as a column. Indeed, the index appears in a columnar format in stdout.

So I expected the index "column" to overlap in my two dataframes. That's what I was trying to join on!

I'm not sure how to rephrase the error message, but either a clearer error message or joining with default suffixes, as in df.merge(df, right_index=True, left_index=True), seems preferable.
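For comparison, here is a minimal sketch of the merge behaviour mentioned above. The frames and their column names (`a`, `b`) are made up for illustration; the point is that `merge` applies its default suffixes to overlapping columns instead of raising.

```python
import pandas as pd

# Illustrative frames with the same column names and the same index.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# Index-on-index merge; overlapping columns get the default
# suffixes ("_x", "_y") rather than triggering an error.
merged = left.merge(right, left_index=True, right_index=True)

print(list(merged.columns))  # ['a_x', 'b_x', 'a_y', 'b_y']
```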

sinhrks added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jul 15, 2016

mroeschke added Enhancement Error Reporting Incorrect or improved errors from pandas labels May 1, 2021

nikaltipar mentioned this issue May 7, 2025

BUG: Duplicate columns allowed on merge if originating from separate dataframes #61402
