Number of Matching Subsequences

medium hash map two pointers string trie

Problem

Given string s and an array of words, return how many of those words are subsequences of s.

Inputs = "abcde", words = ["a","bb","acd","ace"]

Output3

"a", "acd", "ace" are all subsequences of "abcde"; "bb" is not.

words (comma-separated)

from collections import defaultdict

def num_matching_subseq(s, words):
    waiting = defaultdict(list)
    for w in words:
        waiting[w[0]].append((w, 0))
    matched = 0
    for c in s:
        bucket = waiting[c]
        waiting[c] = []
        for w, i in bucket:
            j = i + 1
            if j == len(w):
                matched += 1
            else:
                waiting[w[j]].append((w, j))
    return matched

function numMatchingSubseq(s, words) {
  const waiting = {};
  for (let i = 0; i < 26; i++) waiting[String.fromCharCode(97 + i)] = [];
  for (const w of words) waiting[w[0]].push([w, 0]);
  let matched = 0;
  for (const c of s) {
    const bucket = waiting[c];
    waiting[c] = [];
    for (const [w, i] of bucket) {
      const j = i + 1;
      if (j === w.length) matched++;
      else waiting[w[j]].push([w, j]);
    }
  }
  return matched;
}

class Solution {
    public int numMatchingSubseq(String s, String[] words) {
        List<int[]>[] waiting = new ArrayList[26];
        for (int i = 0; i < 26; i++) waiting[i] = new ArrayList<>();
        for (int k = 0; k < words.length; k++) waiting[words[k].charAt(0) - 'a'].add(new int[]{k, 0});
        int matched = 0;
        for (char c : s.toCharArray()) {
            int idx = c - 'a';
            List<int[]> bucket = waiting[idx];
            waiting[idx] = new ArrayList<>();
            for (int[] p : bucket) {
                int j = p[1] + 1;
                if (j == words[p[0]].length()) matched++;
                else waiting[words[p[0]].charAt(j) - 'a'].add(new int[]{p[0], j});
            }
        }
        return matched;
    }
}

class Solution {
public:
    int numMatchingSubseq(string s, vector<string>& words) {
        vector<vector<pair<int, int>>> waiting(26);
        for (int k = 0; k < (int)words.size(); k++) waiting[words[k][0] - 'a'].push_back({k, 0});
        int matched = 0;
        for (char c : s) {
            auto bucket = move(waiting[c - 'a']);
            waiting[c - 'a'].clear();
            for (auto& p : bucket) {
                int j = p.second + 1;
                if (j == (int)words[p.first].size()) matched++;
                else waiting[words[p.first][j] - 'a'].push_back({p.first, j});
            }
        }
        return matched;
    }
};

Explanation

Checking each word against s separately would re-scan s many times. The clever idea is to scan s just once and let all the words make progress together, by grouping words by the next letter each one is waiting for.

Set up a waiting map of buckets. Each word starts in the bucket for its first letter, tracked as a pair (word, index) meaning "this word needs its character at index next." For example "acd" begins in bucket 'a' at position 0.

Now walk through s character by character. For the current letter c, take everything in bucket c and clear it. Each word there just matched its needed letter, so advance its index by one. If the word has reached its end, matched increases; otherwise re-file it into the bucket of its new next letter.

Example: s = "abcde", words ["a","bb","acd","ace"]. Scanning a finishes "a" and moves "acd", "ace" to bucket c. Later c, d, e finish "acd" and "ace". "bb" never finds its second b, so the answer is 3.

Each character of s and each letter of each word is handled once, so the work is O(|s| + total word length).

Time: O(|s| + Σ|wᵢ|) Space: O(Σ|wᵢ|)