Pandas Dataframe: Union and Concat Examples

Last updated:
Table of Contents

SQL Union all behaviour

The default behaviour for pandas.concat is not to remove duplicates!

Use ignore_index=True to make sure sure the index gets reset in the new dataframe.

import pandas as pd

df1 = pd.DataFrame({
    'name':['john','mary'],
    'age':[24,45]
})

df2 = pd.DataFrame({
    'name':['mary','john'],
    'age':[45,89]
})

pd.concat([
    df1,df2
],ignore_index=True)

dataframe-1 Dataframe 1 dataframe-2 Dataframe 2
concatenated-dataframe Union of Dataframe 1 and 2:
Note that the index was reset and
the duplicate row was NOT removed

SQL Union behaviour

In SQL, the union keyword implies that duplicates are removed:

import pandas as pd

df1 = pd.DataFrame({
    'name':['john','mary'],
    'age':[24,45]
})

df2 = pd.DataFrame({
    'name':['mary','john'],
    'age':[45,89]
})

pd.concat([
    df1,df2
],ignore_index=True).drop_duplicates().reset_index(drop=True)

dataframe-1 Dataframe 1 dataframe-2 Dataframe 2
concatenated-dataframe Union of Dataframe 1 and 2:
No duplicates now

Concatenate side-by-side

Just pass axis=1

import pandas as pd

df1 = pd.DataFrame({
    'name':['john','mary'],
    'age':[24,45]
})

df2 = pd.DataFrame({
    'name':['mary','john'],
    'age':[45,89]
})

pd.concat([
    df1,df2
],axis=1)

dataframe-1 Dataframe 1 dataframe-2 Dataframe 2
concatenated-dataframe Concatenation of Dataframe 1 and 2:
Pandas will not warn you if you try
to concatenate two dataframes that have
columns with the same name!


References

Dialogue & Discussion