How To Find Duplicates in a Python List

Finding and removing duplicates from a Python list are common tasks, because a list accepts repeated values without complaint. Checking whether a list contains duplicates is usually the first step.

Fortunately, it is relatively easy to check for duplicates in Python. Once you spot them, you can take any of several actions:

  • List only the duplicate values.
  • Remove the duplicate values and create a new list without any duplicates.
  • Change the current list by removing only the duplicates, essentially deduplicating the existing list.
  • Just evaluate the list and report whether it contains duplicates.
  • Count the duplicates in the list.

But before we delve deeper into these tasks, it is better to quickly understand what Lists are and why duplicates can exist in Python lists.

I also want you to know about the Set data type in the Python programming language. Once you know their unique points and differences, you will better appreciate the methods used to identify and remove duplicates from a Python list.

And if you need help with Python, join our Python Programming Fundamentals Training Course.

What is a List in Python?

A list in Python is like an array. It is a collection of objects stored in a single variable. A list is changeable: you can add or remove elements. A list can be sorted too, but by default it simply keeps its elements in the order you added them.

A Python list can also contain duplicates, and it can hold elements of different data types. This way, you can store integers, floating point numbers (positive or negative), strings, and even boolean values in one list.

Python lists can also contain other lists and can grow as large as memory allows. Checking whether a value is in a list (x in mylist) scans the list element by element, so that check gets slower as the list grows. For that reason, some of the methods below are better suited to small lists, and others to large lists.

You define a list by enclosing the elements in square brackets. Each element is separated by commas within the list.
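
As a small illustration of the points above (the variable name mixed is made up for this example), a list literal can mix data types, hold repeated values, and be changed after it is created:

```python
# a list can mix data types and hold the same value more than once
mixed = [5, "apple", 3.14, True, 5, "apple"]

print(len(mixed))      # 6 elements, duplicates included
print(mixed[1])        # elements are accessed by index

mixed.append(-2)       # lists are changeable: add an element at the end
mixed.remove("apple")  # remove() drops only the first matching element
print(mixed)
```

Note that remove() leaves the second "apple" in place, which is exactly how duplicates tend to linger in a list.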

What is a Set in Python?

A Set is another data type available in Python. Like a list, a Set can store multiple items. But a Set differs from a Python list in that a Set cannot contain duplicates.

You can define a Set with curly braces, as compared to a list, which is defined by using square brackets. Note that empty braces {} create a dictionary, so an empty Set is written as set().

A Set in Python is not ordered or indexed. You cannot access an element by its position, and the order in which the elements come back when you loop over or print a Set is not guaranteed.

Once you have created a Set in Python, you can add or remove elements, but you cannot modify an element in place. Every element must also be an immutable (hashable) value, such as a number or a string.
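
A short sketch of this behaviour, using a made-up set named colors:

```python
# duplicates collapse as soon as the set is built
colors = {"red", "green", "red", "blue"}
print(len(colors))  # 3

# sets have no positions, so indexing raises a TypeError
try:
    colors[0]
except TypeError as err:
    print("cannot index a set:", err)

# elements can be added or removed after creation
colors.add("yellow")
colors.discard("green")

# sorted() returns a new list with a predictable order
print(sorted(colors))  # ['blue', 'red', 'yellow']
```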

Now that you have a basic understanding of the list and Set data types in Python, we will explore the identification and removal of duplicates in Python Lists.

Multiple Ways To Check if duplicates exist in a Python List

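  • Compare the length of the List with the length of a Set built from it; if they differ, the list contains duplicates.
  • Loop over the elements and keep track of the ones already seen: if an element was seen before, it is a duplicate; if not, append it.
  • Use list.count() to check how many times each element occurs.

We will be using Python 3 as the language. So as long as you have any version of the Python 3 interpreter, you are good to go.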

Method 1: Using the length of a list to identify if it contains duplicate elements.

Let’s write the Python program to check this.

# this input list contains duplicates
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # 5 & 6 are duplicate numbers.

# find the length of the list
print(len(mylist))
8

# create a set from the list
myset = set(mylist)

# find the length of the Python set variable myset
print(len(myset))
6

As you can see, the length of the mylist variable is 8, and the myset length is 6.

Here’s the final Python program – the full code can be copied and pasted into a Python program and used to check if identical items exist in a list or not.

# this input list contains duplicates
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # 5 & 6 are duplicate numbers.

# find the length of the list
print(len(mylist))

# create a set from the list
myset = set(mylist)

# find the length of the Python set variable myset
print(len(myset))

# compare the length and print if the list contains duplicates
if len(mylist) != len(myset):
    print("duplicates found in the list")
else:
    print("No duplicates found in the list")

Output:

8
6
duplicates found in the list

Alternatively, we can create a function that will check if duplicate items exist, and will return a True or a False to alert us of duplicates.

Here is the complete function to check if duplicates exist in a Python list:

def is_duplicate(anylist):
    # reject anything that is not a list
    if not isinstance(anylist, list):
        raise TypeError("Passed parameter is not a list")
    # a set drops repeats, so a shorter set means the list has duplicates
    return len(anylist) != len(set(anylist))

mylist = [5, 3, 5, 2, 1, 6, 6, 4] # you can see some repeated number in the list.
if is_duplicate(mylist):
    print("duplicates found in list")
else:
    print("no duplicates found in list")

The output of this Python code is:

duplicates found in list
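
The length comparison always builds a complete set, even when the very first two elements already match. A possible variant, sketched here with a hypothetical helper named has_duplicates, stops at the first repeat it meets. It assumes the elements are hashable.

```python
def has_duplicates(items):
    """Return True as soon as a repeated element is found."""
    seen = set()  # values met so far
    for item in items:
        if item in seen:
            return True  # first repeat found, no need to look further
        seen.add(item)
    return False

print(has_duplicates([5, 3, 5, 2, 1, 6, 6, 4]))  # True
print(has_duplicates([1, 2, 3]))                 # False
```

For a list with no repeats this does the same amount of work as the length comparison; the saving only shows up when a duplicate appears early.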

Method 2: Listing Duplicates in a List & Listing Unique Values – Sorted

In this method, we build two lists: one to hold the repeated values, and one to hold the unique values. A few lines of code are enough to do this in Python.

# the given list contains duplicates
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # the original list of integers with duplicates

newlist = [] # empty list to hold unique elements from the list
duplist = [] # empty list to hold the duplicate elements from the list
for i in mylist:
    if i not in newlist:
        newlist.append(i)
    else:
        duplist.append(i) # i was already seen, so record this repeat (a value occurring 3 times is recorded twice)

# The next step is to print the duplicate entries, and the unique entries
print("List of duplicates", duplist)
print("Unique Item List", newlist) # prints the final list of unique items

Output:

List of duplicates [5, 6]
Unique Item List [5, 3, 2, 1, 6, 4]

And if you want to sort the list items after removing the duplicates, you can use the built-in sort() method of the list.

# sorting the list
newlist.sort() # sort() reorders the list in place, in ascending order
print("The sorted list", newlist) # this prints the sorted list

Output:

The sorted list [1, 2, 3, 4, 5, 6]
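
The check "i not in newlist" scans the list each time, which can become slow on long inputs. One possible variation, sketched below with a hypothetical function named split_duplicates, tracks the seen values in a set while producing the same two lists. It assumes the elements are hashable.

```python
def split_duplicates(items):
    """Return (unique values in first-seen order, repeated occurrences)."""
    seen = set()      # fast membership checks
    unique = []       # first occurrence of every value
    repeats = []      # every later occurrence
    for item in items:
        if item in seen:
            repeats.append(item)
        else:
            seen.add(item)
            unique.append(item)
    return unique, repeats

unique, repeats = split_duplicates([5, 3, 5, 2, 1, 6, 6, 4])
print("List of duplicates", repeats)  # List of duplicates [5, 6]
print("Unique Item List", unique)     # Unique Item List [5, 3, 2, 1, 6, 4]
```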

Method 3: Listing only Duplicate values with the Count Method

This method iterates over each element of the list and checks whether its count is greater than 1. If so, that element is added to a set. If you remember, a set cannot contain any duplicates by design, so each repeated value ends up in the set only once. Because list.count() scans the whole list on every call, this approach is best kept for small lists.

# the mylist variable represents a duplicate list.
mylist = [5, 3, 5, 2, 1, 6, 6, 4] # the original input list with repeated elements.

dup = {x for x in mylist if mylist.count(x) > 1}
print(dup)

# To count the number of list elements that were duplicated, you can run
print(len(dup))

Output:

{5, 6}
2

Keep in mind that the listed duplicate values might have existed once, or eve
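
If you also want to know how many times each duplicated value occurs, one option is collections.Counter from the standard library. The sketch below is an alternative to the count() approach, not a replacement prescribed by this article, and the name dup_counts is made up for the example.

```python
from collections import Counter

mylist = [5, 3, 5, 2, 1, 6, 6, 4]

# Counter tallies every element in a single pass over the list
counts = Counter(mylist)

# keep only the values that occur more than once
dup_counts = {value: n for value, n in counts.items() if n > 1}
print(dup_counts)       # {5: 2, 6: 2}
print(len(dup_counts))  # 2 distinct values are duplicated
```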

The fastest way to Remove Duplicates From Python Lists

One of the fastest ways to remove duplicates is to create a set from the list variable, which takes just a single Python statement. Because it touches each element only once, it is well suited to large lists. Keep in mind that the result is a set, so the original order of the elements is not preserved.

Here’s the final code in Python – probably the best way…

# this list contains the duplicate numbers 5 & 6
mylist = [5, 3, 5, 2, 1, 6, 6, 4]
myunique = set(mylist) # builds a set, which drops the duplicates
print(myunique)

Output:

{1, 2, 3, 4, 5, 6}
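
When the original order matters, a commonly used alternative is dict.fromkeys(), since dictionary keys are unique and keep their insertion order (guaranteed from Python 3.7 onward). The sketch below also shows one way to deduplicate the existing list in place; the name ordered_unique is made up for the example.

```python
mylist = [5, 3, 5, 2, 1, 6, 6, 4]

# dictionary keys are unique and remember the order they were added in
ordered_unique = list(dict.fromkeys(mylist))
print(ordered_unique)  # [5, 3, 2, 1, 6, 4]

# slice assignment replaces the contents of the same list object
mylist[:] = dict.fromkeys(mylist)
print(mylist)          # [5, 3, 2, 1, 6, 4]
```

Like set(), this assumes the elements are hashable.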

How to Avoid Duplicates in a Python List

The first thing you must consider is – Why am I using a list in Python?

Because it can collect duplicates. If you are clear that duplicates should not exist in whatever you are collecting or storing, then do not use a list. A better way is to use a Set, which is built to reject duplicates. It is worth exploring sets a bit more to understand them better. They can be a real time-saver and are the more efficient choice for this job.

If you do not care about the order, then just using set(mylist) will do the job of removing any duplicates. I use this even in the worst-case scenario, where the entire incoming list is a dirty list full of duplicate elements.

Alternatively, if you really must use a list because of the things you can do with a list data type, then do a simple check before you add any element.

For example, a list can be sorted in place and keeps its elements in a fixed order, while a Set cannot. That matters whenever the order of the elements carries meaning.

So before you add any new element in a list, just do a quick check for the existence of the value. If the element exists, then don’t store it. Simple!
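
A minimal sketch of that check, using a hypothetical helper named add_unique:

```python
def add_unique(items, value):
    """Append value to items only if it is not already there."""
    if value in items:
        return False  # already stored, nothing added
    items.append(value)
    return True

basket = []
for value in [5, 3, 5, 2]:
    add_unique(basket, value)

print(basket)  # [5, 3, 2]
```

The membership test scans the list, so for very long lists it may be worth keeping a set of seen values alongside the list.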

The methods discussed above work on lists of strings, integers, floating point numbers, and other hashable objects. The set-based methods need hashable elements, while the loop in Method 2 only needs elements that can be compared for equality.

Hopefully these different ways to find duplicates, list them, and finally remove them altogether from any Python list will come in handy in your own programs.
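
One caveat worth sketching: set() needs hashable elements, so it fails on a list whose elements are themselves lists. In that case a plain membership loop, as in Method 2, still works. The variable names below are made up for the example.

```python
rows = [[1, 2], [3, 4], [1, 2]]

# inner lists are unhashable, so building a set raises a TypeError
try:
    set(rows)
except TypeError as err:
    print("set() failed:", err)

# comparing with "in" only needs equality, not hashing
unique_rows = []
for row in rows:
    if row not in unique_rows:
        unique_rows.append(row)

print(unique_rows)  # [[1, 2], [3, 4]]
```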

Beginner in Python?

Join the Beginner’s Python Programming Course.

Or Join the Python For Data Analysis training course if you know the basics and want to join the more advanced level of Python Data Analysis techniques.

 

vinai@brandrich.com