Python - N sized substrings with K distinct characters
Last Updated :
26 Apr, 2023
Given a String, the task is to write a Python program to extract strings of N size and have K distinct characters.
Examples:
Input : test_str = 'geeksforgeeksforgeeks', N = 3, K = 2
Output : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Explanation : 3 lengths have 2 unique characters are extracted.
Input : test_str = 'geeksforgeeksforgeeks', N = 4, K = 2
Output : []
Explanation : No strings with 4 length and just have 2 elements.
Method #1 : Using slicing + set() + loop
In this we perform task of getting N chunks using slicing, set() is used to check for unique elements. Loop is used to iterate for all the possible chunks.
Python3
# Python3 code to demonstrate working of
# N sized substrings with K distinct characters
# Using slicing + set() + loop
# initializing string
test_str = 'geeksforgeeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# initializing N
N = 3
# initializing K
K = 2
res = []
for idx in range(0, len(test_str) - N + 1):
# getting unique elements off sliced string
if (len(set(test_str[idx: idx + N])) == K):
res.append(test_str[idx: idx + N])
# printing result
print("Extracted Strings : " + str(res))
OutputThe original string is : geeksforgeeksforgeeks
Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Time Complexity: O(n*n)
Auxiliary Space: O(n)
Method #2 : Using list comprehension + len() + set() + slicing
Similar to above method, only difference being list comprehension is used instead of loop, just to provide shorthand to solve this task.
Python3
# Python3 code to demonstrate working of
# N sized substrings with K distinct characters
# Using list comprehension + len() + set() + slicing
# initializing string
test_str = 'geeksforgeeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# initializing N
N = 3
# initializing K
K = 2
# list comprehension used to slice
res = [test_str[idx: idx + N]
for idx in range(0, len(test_str) - N + 1)
if len(set(test_str[idx: idx + N])) == K]
# printing result
print("Extracted Strings : " + str(res))
OutputThe original string is : geeksforgeeksforgeeks
Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
The time and space complexity for all the methods are the same:
Time Complexity: O(n)
Space Complexity: O(n)
Approach 3 : Using Counter and Sliding Window
This approach uses the Counter method from the collection module, which allows us to count the frequency of elements in an iterable. We maintain a sliding window with the length of N and move it to the right, updating the count of characters in the window as we move. If the count of unique characters in the window becomes less than or equal to K, we append the window to the result list.
Python3
# Python3 code to demonstrate working of
# N sized substrings with K distinct characters
# Using Counter and Sliding Window
from collections import Counter
# initializing string
test_str = 'geeksforgeeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# initializing N
N = 3
# initializing K
K = 2
res = []
# Sliding window of length N
for i in range(len(test_str) - N + 1):
window = test_str[i:i+N]
c = Counter(window)
# if count of unique characters in the window <= K
if len(c) <= K:
res.append(window)
# printing result
print("Extracted Strings : " + str(res))
OutputThe original string is : geeksforgeeksforgeeks
Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Time Complexity: O(n)
Auxiliary Space: O(n)
Approach 4: Using sliding window and hash map
- Initialize a dictionary char_count to store the count of characters in the current window.
- Initialize two pointers start and end to represent the left and right ends of the window. Also, initialize a variable count to keep track of the number of distinct characters in the window.
- Traverse the string from left to right using the end pointer end and do the following:
a. If the count of the character at the end index is 0 in the dictionary, increment the count variable.
b. Increment the count of the character at the end index in the dictionary.
c. If the length of the current window is greater than N, move the start pointer start to the right and do the following:
i. Decrement the count of the character at the start index in the dictionary.
ii. If the count of the character at the start index becomes 0 in the dictionary, decrement the count variable.
d. If the length of the current window is equal to N and the count variable is equal to K, add the current substring to the result list. - Return the result list.
Python3
def n_size_substrings_with_k_distinct_characters(s, N, K):
res = []
char_count = {}
start = 0
count = 0
for end in range(len(s)):
if s[end] not in char_count or char_count[s[end]] == 0:
count += 1
char_count[s[end]] = char_count.get(s[end], 0) + 1
if end - start + 1 > N:
char_count[s[start]] -= 1
if char_count[s[start]] == 0:
count -= 1
start += 1
if end - start + 1 == N and count == K:
res.append(s[start:end+1])
return res
# example usage
test_str = 'geeksforgeeksforgeeks'
print(n_size_substrings_with_k_distinct_characters(test_str, 3, 2))
Output['gee', 'eek', 'gee', 'eek', 'gee', 'eek']
Time complexity: O(n), where n is the length of the string.
Auxiliary space: O(k), where k is the number of distinct
Similar Reads
String with Most Unique Characters - Python
The task of finding the string with the most unique characters in a list involves determining which string contains the highest number of distinct characters. This is done by evaluating each string's distinct elements and comparing their counts. The string with the greatest count of unique character
4 min read
Python - Characters which Occur in More than K Strings
Sometimes, while working with Python, we have a problem in which we compute how many characters are present in string. But sometimes, we can have a problem in which we need to get all characters that occur in atleast K Strings. Let's discuss certain ways in which this task can be performed. Method #
4 min read
Python - Groups Strings on Kth character
Sometimes, while working with Python Strings, we can have a problem in which we need to perform Grouping of Python Strings on the basis of its Kth character. This kind of problem can come in day-day programming. Let's discuss certain ways in which this task can be performed. Method #1: Using loop Th
4 min read
Python | Extract K sized strings
Sometimes, while working with huge amount of data, we can have a problem in which we need to extract just specific sized strings. This kind of problem can occur during validation cases across many domains. Let's discuss certain ways to handle this in Python strings list. Method #1 : Using list compr
5 min read
Split string on Kth Occurrence of Character - Python
The task is to write Python program to split a given string into two parts at the KáµÊ° occurrence of a specified character. If the character occurs fewer than K times return the entire string as the first part and an empty string as the second part. For example, in the string "a,b,c,d,e,f", splitting
3 min read
Create List of Substrings from List of Strings in Python
In Python, when we work with lists of words or phrases, we often need to break them into smaller pieces, called substrings. A substring is a contiguous sequence of characters within a string. Creating a new list of substrings from a list of strings can be a common task in various applications. In th
3 min read
Python | Extract characters except of K string
Sometimes, while working with Python strings, we can have a problem in which we require to extract all the elements of string except those which present in a substring. This is quite common problem and has application in many domains including those of day-day and competitive programming. Lets discu
6 min read
Python | Find all possible substrings after deleting k characters
Given a string and an Integer k, write a Python program to find all possible substrings of the given string after deleting k characters. Examples: Input : geeks, k = 1 Output : {'gees', 'eeks', 'geks', 'geek'} Input : dog, k = 1 Output : {'do', 'dg', 'og'} Approach #1 : Naive Approach This is the re
5 min read
Get Last N characters of a string - Python
We are given a string and our task is to extract the last N characters from it. For example, if we have a string s = "geeks" and n = 2, then the output will be "ks". Let's explore the most efficient methods to achieve this in Python.Using String Slicing String slicing is the fastest and most straigh
2 min read
Python | Categorize the given list by string size
Sometimes, we have a use case in which we need to perform the grouping of strings by various factors, like first letter or any other factor. These type of problems are typical to database queries and hence can occur in web development while programming. This article focuses on one such grouping by s
6 min read