How to find the longest non-repeating substring in a string with python

The problem

Say you want to find out the length of the longest substring in a string without any repeating characters.

As an example. Let's say the string is as following

abcabccb

The output of the above would be 3 since abc is the longest substring with no repeating characters.

The Solution

To solve this problem, we'll use a sliding window.

What is a sliding window?

This is a concept that can be used to solve a lot of problems pertaining to lists or iterables like strings. Consider there is a window superimposed over the first character of our string:

[a]b c a b c c b

We'll also maintain a variable which will store the value of the maximum substring with no repeating characters that we have found until now.

max_length=1

We are going to slide this window from the right side if the next character is different from the current character, which it is. So the window looks like this:

[a b]c a b c c b

max_length=2

The same goes for the next character:

[a b c] a b c c b

max_length=3

So far, we've been able to identify a substring that is 3 characters long and has no repeating strings. However, the next character already exists in our current substring.

What we need to do now is to slide the left end of our window such that the new character (a in this case) is out of our window. We will match characters in our substring with the new one until we find the matching one. This will be the new left end of our window. We'll also add the new character to our substring by sliding the right end by one character. We will also save the maximum length 3

a [b c a] b c c b

max_length=3

Since the next character also exists in our substring, we'll repeat the process from above. The maximum length stays the same. This will give us the following:

a b [c a b] c c b

max_length=3

Again, the c on the right side of the window already exists in our string. Again, the maximum length stays the same as before which was 3 We'll slide the window again.

a b c [a b c] c b

max_length=3

The next character is a c. This character exists in our window and it is the last character inside it. We'll keep matching all of our characters inside the window until we find a c which is at the end of the window. After all the necessary operations, this is what our window looks like:

a b c a b c [c] b

max_length=3

The next character is added to the substring and this is our final output:

a b c a b c [c b]

max_length=3

So the final output of the given string is 3

How do we implement this in python?

Code

Let's look at some code:

def length_of_longest_substring(s):
    my_set = set()
    start = 0
    maximum = 0

    for end, value in enumerate(s):
        if s[end] in my_set:
            while s[start] != s[end]:
                my_set.remove(s[start])
                start += 1
            my_set.remove(s[start])
            start += 1
        my_set.add(s[end])
        maximum = max(maximum, len(my_set)) 
    return maximum

Explanation

We start off by creating a variable called my_set that will store the characters of the sliding_window. This variable is a set.

We previously talked left and right ends of this window. The variables start and end correspond to these respectively. These variables correspond to the indexes of the characters of our string. Remember, a string in python is an iterable and therefore can be looped over.

That is exactly what we do. We loop over the given string using the for loop above.

The if condition inside the for loop checks if the character is inside our window (the my_set) or not. If it's not, then it adds the variable to the window. It then computes the maximum length of the substring by comparing length of my_set with the previous maximum length stored in the variable maximum. It then moves on to the next character.

If the new character is found inside the loop, this means that the left end of the window needs to be slided such that the character is no longer inside the loop.

This is done by using the while loop inside the if condition:

while s[start] != s[end]:
    my_set.remove(s[start])
    start += 1

We keep comparing the value of s[start] with the value of s[end]. It is important to note here that the value of s[end] does not change as it is the character we found inside the window. Instead we slide the left side by incrementing the value of the index start until we find the character that matches. This way the old matching character is removed from the left side of the window and the new matching character is added to the right side of it.

We keep doing this until we have parsed through the entire string. At the end of this whole operation, we are left with the maximum value of the longest substring with no repeating characters.

The sliding window can be used to solve quite a few other problems and we'll do a dedicated post on it soon.

Previous
Previous

How to extend the default User Model of Django

Next
Next

Iterables in Python