Longest Substring with At Most K Distinct Characters
Rework later. Don't write code when you travel.
Question (LC.340)
Given a string, find the longest string that contains at most K distinct characters.
Example
I: s = "eceba", k = 2
O: "ece"
I: s = "eceba", k = 3
O: "eceb"
Analysis
The question asks for a longest substring with a constraint. Typically, this is a substring/window problem.
Brute Force Approach
From the start of every string, have a hashset, stops when size exceeds k. Store an int to keep track of the longest substring under this constraint.
Here is the pseudo-code:
func longestSubKD(str, k)
if s == null || len(s) == 0
return 0
int maxLenght = 0
Set<Character> charSet = new HashSet
for i from 0 to len(s)
charSet.clear()
for j from i to len(s) and charSet.size() <= k // comment
charSet.add(str.charAt(j))
end for
maxLength = Math.max(maxLength, j-i)
// since the for loop breaks after the condition becomes false
// (one step beyond the constraint), j-i+1 the +1 is omitted
end for
return maxLength
end func
The pseudo-code looks good but there's one logical error.
How For Loop Works
int i = 0;
int j = i;
for (i = 0; i < 5 && (j - 1) <= 3; i++) {
j = i;
}
System.out.println("j: " + j + " i: " + i);
for (j = 0; j - 1 <= 3; j++) {
}
System.out.println("the other j: " + j);
What is the output for i
, j
, and the other j
? They should all output 5
right? Noooo ʕº̫͡ºʔ
In fact, i = 5, j = 4!, the other j = 5
. How so? Let's examine how a for loop actually works
Init: i = 0, j = i
Condition: i < 5 && (j - 1) <= 3
Conditional Code: j = i
Increment: i++
--F-> end loop
|
Init ----> Condition --T-> ConCode ----> Increment
/ |
| /
--------------------------------
You see we get to increment i = 5
, then the condition becomes false
and the loop ends. We never get to the conditional code in this case.
Brute Force Code
public int lengthOfLongestSubstringKDistinct(String str, int k) {
if (k <= 0 || str == null || str.length() == 0) {
return 0;
}
int maxLength = 0;
Set<Character> charSet = new HashSet<>();
int i, j;
for (i = 0; i < str.length(); i++) {
charSet.clear();
for (j = i; j < str.length(); j++) {
charSet.add(str.charAt(j));
if (charSet.size() > k) {
break;
}
}
maxLength = Math.max(maxLength, j-i);
}
return maxLength;
}
The time complexity is O(n^2)
. A relatively long string will result in Time Limit Exceeded.
Sliding Window Approach
Observe the following:
I: s = "eceba", k = 3
S e c e b a
L1 i---->j
L2 i---->j
L3 i-->j
L4 i->j
Moving j
back to i
is not necessary. Just extend the window as we go.
Need a map for frequency of each character... Set is not enough
Sliding Window Code
TBC
Time Complexity
Only slide the window one way so O(n)
Follow Up
Data stream