Python - Group records by Kth column in List

Last Updated : 18 Apr, 2023

Sometimes, while working with Python lists, we need to group records on the basis of a certain parameter. One such parameter is the Kth element of each tuple. Let's discuss certain ways in which this task can be performed.

Method #1 : Using loop + defaultdict()

In this method, we loop over the list and use a defaultdict to collect the tuples into separate lists, keyed by the value in the Kth column. The groups appear in the order in which each key is first seen.

Python3

# Python3 code to demonstrate 
# Group records by Kth column in List
# using loop + defaultdict()
from collections import defaultdict

# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

# printing original list
print("The original list is : " + str(test_list))

# Initializing K 
K = 0

# Group records by Kth column in List
# using loop + defaultdict()
temp = defaultdict(list)
for ele in test_list:
    temp[ele[K]].append(ele)
res = list(temp.values())

# printing result 
print("The list after grouping : " + str(res))

Output :

The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('is', 2), ('is', 4)], [('best', 5)]]

Time Complexity: O(n), where n is the number of elements in the list “test_list”, since each tuple is visited once and a dictionary insertion takes O(1) time on average.
Auxiliary Space: O(n) where n is the number of elements in the list “test_list”.
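
The loop above can be wrapped in a small helper so that the same logic works for any column index. The following is a minimal sketch; the function name group_by_column and the second data set scores are illustrative assumptions and are not part of the original example.

```python
from collections import defaultdict


def group_by_column(records, k):
    """Group records by their k-th element, keeping first-seen key order."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[k]].append(record)
    return list(buckets.values())


sample = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
by_name = group_by_column(sample, 0)

# Hypothetical second data set, grouped on column 1 instead of column 0
scores = [('a', 10), ('b', 20), ('c', 10)]
by_score = group_by_column(scores, 1)

print(by_name)
print(by_score)
```

Passing the column index as a parameter avoids relying on a global K and makes the helper easy to reuse.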

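Method #2 : Using itemgetter() + groupby() + list comprehension

This task can also be performed by combining the above functions. In this, itemgetter() is used to select the Kth column, sorted() places equal keys next to each other, groupby() groups them, and a list comprehension compiles the result. Because the list is sorted first, the groups come out in sorted key order rather than in order of first appearance.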

Python3

# Python3 code to demonstrate 
# Group records by Kth column in List
# using itemgetter() + list comprehension + groupby()
from operator import itemgetter
from itertools import groupby

# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

# printing original list
print("The original list is : " + str(test_list))

# Initializing K 
K = 0

# Group records by Kth column in List
# using itemgetter() + list comprehension + groupby()
temp = itemgetter(K)
res = [list(val) for key, val in groupby(sorted(test_list, key=temp), temp)]

# printing result 
print("The list after grouping : " + str(res))

Output :

The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('best', 5)], [('is', 2), ('is', 4)]]

The time complexity of the code is O(nlogn), where n is the length of the input list.
The space complexity of the code is O(n), where n is the length of the input list.
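
The sorting step matters because groupby() only merges consecutive elements that share a key. The short sketch below illustrates this by running groupby() on the same data with and without sorting; the variable names are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter

records = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
key_func = itemgetter(0)

# Without sorting, every run of equal adjacent keys becomes its own group.
unsorted_groups = [list(g) for _, g in groupby(records, key_func)]

# After sorting by the same key, equal keys are adjacent and end up together.
sorted_groups = [list(g) for _, g in groupby(sorted(records, key=key_func), key_func)]

print(len(unsorted_groups))  # 5: no two neighbouring records share a key
print(len(sorted_groups))    # 3: one group per distinct key
```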

Method #3 : Using numpy

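One more approach to perform the grouping of records based on the Kth column in a list is using the numpy library. Note that np.array() converts the mixed tuples into an array of strings, so the grouped records come back as lists of strings, and np.unique() returns the keys in sorted order.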

Here's how it can be done:

Python3

import numpy as np

# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

# printing original list
print("The original list is : " + str(test_list))

# Initializing K
K = 0

# Group records by Kth column in List using numpy
arr = np.array(test_list)
keys, indices, inverse = np.unique(arr[:, K], return_index=True, return_inverse=True)
res = [arr[np.where(inverse == i)].tolist() for i in range(len(keys))]

# printing result
print("The list after grouping : " + str(res))
#This code is contributed by Edula Vinay Kumar Reddy

Output:

The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[['Gfg', '1'], ['Gfg', '3']], [['best', '5']], [['is', '2'], ['is', '4']]]

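Time Complexity: O(N log N) for np.unique(), plus O(N * U) for building one mask per key, where U is the number of distinct keys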
Space Complexity: O(N)

Method #4: Using a list comprehension with enumerate() function and a set:

Initialize the list of tuples and print it.
Initialize a variable K with a value of 0.
Create a list result that holds one empty list per unique key. The unique keys are found by collecting the key of every tuple and passing them through a set to remove duplicates.
Loop over each tuple in the input list using the enumerate() function, which also yields the index i.
Find the position of the current tuple's key in the list of unique keys, and append the tuple to the list at that position in result.

Python3

# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

# printing original list
print("The original list is : " + str(test_list))

# Initializing K
K = 0

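# Group records by Kth column in List
# using enumerate() function and a set

# Collect the unique keys once; a set is unordered,
# so the order of the groups may vary between runs
keys = list(set(x[K] for x in test_list))
result = [[] for _ in keys]
for i, record in enumerate(test_list):
    # i is the record's position; only the key is needed to pick the group
    result[keys.index(record[K])].append(record)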

# printing result
print("The list after grouping : " + str(result))

Output (the order of the groups may vary between runs, because sets are unordered)

The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping : [[('Gfg', 1), ('Gfg', 3)], [('is', 2), ('is', 4)], [('best', 5)]]

Time Complexity: O(n^2) in the worst case, where n is the length of the input list test_list. For each of the n records, the index() method performs a linear search over the list of unique keys, which can itself contain up to n entries.

Space Complexity: O(n), as a new list is created of length n.
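
If a stable group order and constant-time lookups are wanted, the set can be swapped for a dictionary that maps each key to its position. The sketch below is one possible variation and not the method described above; it assumes Python 3.7 or later, where dictionaries preserve insertion order, and the names records and position are illustrative.

```python
records = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
K = 0

# dict.fromkeys() keeps one entry per key, in first-seen order;
# enumerate() then assigns each distinct key its group position.
position = {key: i for i, key in enumerate(dict.fromkeys(r[K] for r in records))}

result = [[] for _ in position]
for record in records:
    # Dictionary lookup replaces the linear search done by list.index()
    result[position[record[K]]].append(record)

print(result)
```

With this change the whole grouping runs in O(n) time on average, and the groups always appear in order of first appearance.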

Method #5: Using setdefault on a dictionary

Step-by-step algorithm:

Initialize the list of tuples containing records.
Initialize the column number K for grouping by that column.
Initialize an empty dictionary called groups.
Iterate over each record in test_list.
a. Get the value of the Kth column in the current record.
b. If the key for this value does not exist in the groups dictionary, create a new empty list as the value for that key.
c. Append the current record to the list for the corresponding key in the groups dictionary.
Convert the dictionary of groups to a list of lists containing the grouped records.
Print the resulting list of lists.

Python3

# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

# printing original list
print("The original list is : " + str(test_list))

# Initializing K
K = 0

# Initializing an empty dictionary to hold the groups
groups = {}

# Append each record to the list stored under its Kth element;
# setdefault() creates the list the first time a key is seen
for x in test_list:
    groups.setdefault(x[K], []).append(x)
groups = list(groups.values())

# printing result
print("The list after grouping: " + str(groups))

Output

The original list is : [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping: [[('Gfg', 1), ('Gfg', 3)], [('is', 2), ('is', 4)], [('best', 5)]]

Time Complexity: O(n), where n is the number of records in the input list. This is because the algorithm iterates over each record in the input list once.
Auxiliary Space: O(n), where n is the number of records in the input list.
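
When the keys themselves are needed later, the dictionary can be returned as it is instead of being flattened into a list of lists. The following is a small sketch of that idea; the function name group_to_dict is a hypothetical helper and is not part of the original code.

```python
def group_to_dict(records, k):
    """Return a dict mapping each k-th column value to its records."""
    groups = {}
    for record in records:
        # setdefault() returns the existing list, or stores and returns a new one
        groups.setdefault(record[k], []).append(record)
    return groups


records = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
groups = group_to_dict(records, 0)

print(groups['is'])   # all records whose 0th column is 'is'
print(list(groups))   # keys in first-seen order
```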

Method #6: Using itertools.groupby()

Step-by-step approach:

We first import the groupby() function from the itertools module.
We initialize the original list test_list and print it.
We initialize the value of K to the index of the column we want to group by (in this case, the 0th column).
We use the sort() method to sort the list in place based on the values in the Kth column. This step is required because groupby() only groups consecutive elements that have equal keys.
We use a list comprehension and the groupby() function to group the records based on the Kth column. The groupby() function splits the sorted list into groups based on the key value returned by the lambda function (lambda x: x[K]), which is the Kth element of each tuple. We then convert each group into a list and collect these lists in res.
Finally, we print the resulting list.

Below is the implementation of the above approach:

Python3

from itertools import groupby

# Initializing list
test_list = [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]

# printing original list
print("The original list is: " + str(test_list))

# Initializing K
K = 0

# Group records by Kth column in List
# using itertools.groupby()
test_list.sort(key=lambda x: x[K])
res = [list(group) for key, group in groupby(test_list, lambda x: x[K])]

# printing result
print("The list after grouping: " + str(res))

Output

The original list is: [('Gfg', 1), ('is', 2), ('Gfg', 3), ('is', 4), ('best', 5)]
The list after grouping: [[('Gfg', 1), ('Gfg', 3)], [('best', 5)], [('is', 2), ('is', 4)]]

Time complexity: O(n log n), where n is the length of the list.
Auxiliary space: O(n), where n is the length of the list.
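
Since sort() reorders test_list itself, a variation that leaves the input untouched may be preferable in some programs. The sketch below uses sorted() to work on a copy and keeps the group keys in a dictionary; the names kth, ordered and grouped are illustrative assumptions, and the input order is shuffled on purpose to show that the sort is stable.

```python
from itertools import groupby

records = [('is', 4), ('Gfg', 3), ('is', 2), ('best', 5), ('Gfg', 1)]
K = 0


def kth(record):
    """Key function: the value in the K-th column."""
    return record[K]


# sorted() returns a new list, so records keeps its original order.
ordered = sorted(records, key=kth)

# The sort is stable, so records with equal keys keep their relative order.
grouped = {key: list(group) for key, group in groupby(ordered, kth)}

print(grouped)
print(records[0])  # still ('is', 4)
```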