
Pandas DataFrame drop_duplicates() Method



Example

Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
  "name": ["Sally", "Mary", "John", "Mary"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
}

df = pd.DataFrame(data)

# The last row ("Mary", 40, False) repeats the second row, so it is dropped
newdf = df.drop_duplicates()

print(newdf)

Definition and Usage

The drop_duplicates() method removes duplicate rows.

Use the subset parameter to specify which columns should be compared when looking for duplicates. By default, all columns are compared, so two rows count as duplicates only if they match in every column.
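As a small sketch of how subset changes the result, the example below uses made-up data in which no two rows match in every column, but two rows share the same name. The variable names are illustrative only.

```python
import pandas as pd

# Illustrative data: the two "Mary" rows differ in age and qualified
df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 45],
    "qualified": [True, False, False, True],
})

# All columns are compared, so no row counts as a duplicate
all_columns = df.drop_duplicates()

# Only "name" is compared, so the second "Mary" row is removed
by_name = df.drop_duplicates(subset=["name"])

print(len(all_columns))        # 4
print(by_name.index.tolist())  # [0, 1, 2]
```

Passing a single label as a string, such as subset="name", should behave the same way as the one-element list used here.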


Syntax

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Parameters

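All parameters are optional and are best passed as keyword arguments. In recent pandas versions, every parameter except subset must be passed by keyword.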

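Parameter     Value                    Description
subset        column label(s)          Optional. A string, or a list of strings, naming the columns to compare when identifying duplicates. By default, all columns are compared.
keep          'first', 'last', False   Optional, default 'first'. Specifies which row of each set of duplicates to keep: 'first' keeps the first occurrence, 'last' keeps the last. If False, all rows that have a duplicate are dropped.
inplace       True, False              Optional, default False. If True, the duplicates are removed from the current DataFrame. If False, a copy with the duplicates removed is returned.
ignore_index  True, False              Optional, default False. If True, the rows of the result are relabeled 0, 1, 2, ..., n - 1. If False, the original index labels are kept.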
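The effect of keep and ignore_index is easiest to see on the index labels of the result. The sketch below reuses the data from the example at the top of this page, where the rows at index 1 and index 3 are identical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False],
})

first = df.drop_duplicates(keep="first")  # keeps the row at index 1
last = df.drop_duplicates(keep="last")    # keeps the row at index 3
neither = df.drop_duplicates(keep=False)  # drops both "Mary" rows

# Same rows as `last`, but relabeled from 0
renumbered = df.drop_duplicates(keep="last", ignore_index=True)

print(first.index.tolist())       # [0, 1, 2]
print(last.index.tolist())        # [0, 2, 3]
print(neither.index.tolist())     # [0, 2]
print(renumbered.index.tolist())  # [0, 1, 2]
```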

Return Value

A DataFrame with the result, or None if the inplace parameter is set to True.
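A short sketch of the two return behaviors, using a small made-up DataFrame. Note that assigning the result of a call with inplace=True gives None, which is a common source of bugs.

```python
import pandas as pd

# Illustrative data: the first two rows are identical
df = pd.DataFrame({"name": ["Mary", "Mary", "John"], "age": [40, 40, 30]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged
deduplicated = df.drop_duplicates()
print(len(df), len(deduplicated))  # 3 2

# inplace=True: df itself is modified and the method returns None
result = df.drop_duplicates(inplace=True)
print(result)   # None
print(len(df))  # 2
```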

