Optimal Partition of String
Problem
You are given a string s of lowercase English letters. Split it into one or more non-overlapping substrings that together cover the whole string, with the rule that no letter may appear twice inside the same substring. Return the minimum number of substrings such a split can have.
s = "abacaba"4def partition_string(s):
seen = set()
parts = 1
for ch in s:
if ch in seen:
parts += 1 # cut: start a new piece
seen = set()
seen.add(ch) # current char joins the piece
return parts
function partitionString(s) {
let seen = new Set();
let parts = 1;
for (const ch of s) {
if (seen.has(ch)) {
parts++; // cut: start a new piece
seen = new Set();
}
seen.add(ch); // current char joins the piece
}
return parts;
}
int partitionString(String s) {
Set<Character> seen = new HashSet<>();
int parts = 1;
for (char ch : s.toCharArray()) {
if (seen.contains(ch)) {
parts++; // cut: start a new piece
seen.clear();
}
seen.add(ch); // current char joins the piece
}
return parts;
}
int partitionString(string s) {
unordered_set<char> seen;
int parts = 1;
for (char ch : s) {
if (seen.count(ch)) {
parts++; // cut: start a new piece
seen.clear();
}
seen.insert(ch); // current char joins the piece
}
return parts;
}
Explanation
The key insight is to cut as late as possible. Scan the string left to right and keep extending the current piece. Only when the next character already occurs in the piece are we forced to cut — so that is exactly when we cut, and the repeated character becomes the first letter of a brand-new piece.
Why is the lazy cut optimal? Look at any valid partition: its first piece must end at or before the position where the greedy piece ends, because the greedy piece is the longest valid prefix. So the greedy strategy never finishes a piece earlier than necessary, which means it can never need more pieces than any other valid partition. By induction on the remaining suffix, the greedy count is the minimum.
Implementation is a one-pass loop with a hash set of the letters in the current piece. For each character: if it is already in the set, increment the piece counter and clear the set (the cut); then insert the character. The counter starts at 1 because a non-empty string always has at least one piece.
Walk through s = "abacaba": piece 1 takes a, b; the next a repeats, so cut — piece 2 takes a, c; the next a repeats again — piece 3 takes a, b; the final a repeats once more — piece 4 is just a. Total: 4 pieces, matching the expected answer.
Each character is examined once with O(1) set operations, so the time is O(n). The set never holds more than 26 lowercase letters, so the extra space is O(1).