Number of Matching Subsequences
Problem
Given string s and an array of words, return how many of those words are subsequences of s.
s = "abcde", words = ["a","bb","acd","ace"]3from collections import defaultdict
def num_matching_subseq(s, words):
waiting = defaultdict(list)
for w in words:
waiting[w[0]].append((w, 0))
matched = 0
for c in s:
bucket = waiting[c]
waiting[c] = []
for w, i in bucket:
j = i + 1
if j == len(w):
matched += 1
else:
waiting[w[j]].append((w, j))
return matched
function numMatchingSubseq(s, words) {
const waiting = {};
for (let i = 0; i < 26; i++) waiting[String.fromCharCode(97 + i)] = [];
for (const w of words) waiting[w[0]].push([w, 0]);
let matched = 0;
for (const c of s) {
const bucket = waiting[c];
waiting[c] = [];
for (const [w, i] of bucket) {
const j = i + 1;
if (j === w.length) matched++;
else waiting[w[j]].push([w, j]);
}
}
return matched;
}
class Solution {
public int numMatchingSubseq(String s, String[] words) {
List<int[]>[] waiting = new ArrayList[26];
for (int i = 0; i < 26; i++) waiting[i] = new ArrayList<>();
for (int k = 0; k < words.length; k++) waiting[words[k].charAt(0) - 'a'].add(new int[]{k, 0});
int matched = 0;
for (char c : s.toCharArray()) {
int idx = c - 'a';
List<int[]> bucket = waiting[idx];
waiting[idx] = new ArrayList<>();
for (int[] p : bucket) {
int j = p[1] + 1;
if (j == words[p[0]].length()) matched++;
else waiting[words[p[0]].charAt(j) - 'a'].add(new int[]{p[0], j});
}
}
return matched;
}
}
class Solution {
public:
int numMatchingSubseq(string s, vector<string>& words) {
vector<vector<pair<int, int>>> waiting(26);
for (int k = 0; k < (int)words.size(); k++) waiting[words[k][0] - 'a'].push_back({k, 0});
int matched = 0;
for (char c : s) {
auto bucket = move(waiting[c - 'a']);
waiting[c - 'a'].clear();
for (auto& p : bucket) {
int j = p.second + 1;
if (j == (int)words[p.first].size()) matched++;
else waiting[words[p.first][j] - 'a'].push_back({p.first, j});
}
}
return matched;
}
};
Explanation
Checking each word against s separately would re-scan s many times. The clever idea is to scan s just once and let all the words make progress together, by grouping words by the next letter each one is waiting for.
Set up a waiting map of buckets. Each word starts in the bucket for its first letter, tracked as a pair (word, index) meaning "this word needs its character at index next." For example "acd" begins in bucket 'a' at position 0.
Now walk through s character by character. For the current letter c, take everything in bucket c and clear it. Each word there just matched its needed letter, so advance its index by one. If the word has reached its end, matched increases; otherwise re-file it into the bucket of its new next letter.
Example: s = "abcde", words ["a","bb","acd","ace"]. Scanning a finishes "a" and moves "acd", "ace" to bucket c. Later c, d, e finish "acd" and "ace". "bb" never finds its second b, so the answer is 3.
Each character of s and each letter of each word is handled once, so the work is O(|s| + total word length).