Longest Substring with At Most K Distinct Characters

Rework later. Don't write code when you travel.

Question (LC.340)

Given a string, find the longest string that contains at most K distinct characters.

Example

I: s = "eceba", k = 2
O: "ece"
I: s = "eceba", k = 3
O: "eceb"

Analysis

The question asks for a longest substring with a constraint. Typically, this is a substring/window problem.

Brute Force Approach

From the start of every string, have a hashset, stops when size exceeds k. Store an int to keep track of the longest substring under this constraint.

Here is the pseudo-code:

func longestSubKD(str, k)
    if s == null || len(s) == 0
        return 0

    int maxLenght = 0
    Set<Character> charSet = new HashSet

    for i from 0 to len(s)
        charSet.clear()
        for j from i to len(s) and charSet.size() <= k // comment
            charSet.add(str.charAt(j))
        end for
        maxLength = Math.max(maxLength, j-i) 
        // since the for loop breaks after the condition becomes false
        // (one step beyond the constraint), j-i+1 the +1 is omitted
    end for

    return maxLength
end func

The pseudo-code looks good but there's one logical error.

How For Loop Works

int i = 0;
int j = i;

for (i = 0; i < 5 && (j - 1) <= 3; i++) {
    j = i;
}
System.out.println("j: " + j + " i: " + i);

for (j = 0; j - 1 <= 3; j++) {
}
System.out.println("the other j: " + j);

What is the output for i, j, and the other j? They should all output 5 right? Noooo ʕº̫͡ºʔ In fact, i = 5, j = 4!, the other j = 5. How so? Let's examine how a for loop actually works

Init: i = 0, j = i
Condition: i < 5 && (j - 1) <= 3
Conditional Code: j = i
Increment: i++

             --F-> end loop
             |
Init ----> Condition --T-> ConCode ----> Increment
             /                               |
             |                               /
             --------------------------------

You see we get to increment i = 5, then the condition becomes false and the loop ends. We never get to the conditional code in this case.

Brute Force Code

public int lengthOfLongestSubstringKDistinct(String str, int k) {
    if (k <= 0 || str == null || str.length() == 0) {
        return 0;
    }

    int maxLength = 0;
    Set<Character> charSet = new HashSet<>();

    int i, j;
    for (i = 0; i < str.length(); i++) {
        charSet.clear();
        for (j = i; j < str.length(); j++) {
            charSet.add(str.charAt(j));
            if (charSet.size() > k) {
                break;
            }
        }
        maxLength = Math.max(maxLength, j-i);
    }

    return maxLength;
}

The time complexity is O(n^2). A relatively long string will result in Time Limit Exceeded.

Sliding Window Approach

Observe the following:

I: s = "eceba", k = 3
S  e c e b a
L1 i---->j
L2   i---->j
L3     i-->j
L4       i->j

Moving j back to i is not necessary. Just extend the window as we go.

Need a map for frequency of each character... Set is not enough

Sliding Window Code

TBC

Time Complexity

Only slide the window one way so O(n)

Follow Up

Data stream

results matching ""

    No results matching ""