Distinct Subsequences II
Problem
Given a string s, return the number of distinct non-empty subsequences of s. Since the answer may be very large, return it modulo 10^9 + 7. A subsequence is formed by deleting some (possibly zero) characters without changing the relative order of the remaining characters.
s = "abc"7def distinct_subseq_ii(s):
MOD = 10**9 + 7
end = {} # end[c] = # distinct subseq ending in c
for c in s:
total = sum(end.values()) % MOD
end[c] = (total + 1) % MOD
return sum(end.values()) % MOD
function distinctSubseqII(s) {
const MOD = 1000000007n;
const end = {};
for (const c of s) {
let total = 0n;
for (const k in end) total = (total + end[k]) % MOD;
end[c] = (total + 1n) % MOD;
}
let ans = 0n;
for (const k in end) ans = (ans + end[k]) % MOD;
return Number(ans);
}
class Solution {
public int distinctSubseqII(String s) {
int MOD = 1_000_000_007;
long[] end = new long[26];
for (char c : s.toCharArray()) {
long total = 0;
for (long e : end) total = (total + e) % MOD;
end[c - 'a'] = (total + 1) % MOD;
}
long ans = 0;
for (long e : end) ans = (ans + e) % MOD;
return (int) ans;
}
}
int distinctSubseqII(string s) {
const int MOD = 1e9 + 7;
vector<long long> end(26, 0);
for (char c : s) {
long long total = 0;
for (long long e : end) total = (total + e) % MOD;
end[c - 'a'] = (total + 1) % MOD;
}
long long ans = 0;
for (long long e : end) ans = (ans + e) % MOD;
return (int) ans;
}
Explanation
Counting distinct subsequences is tricky because the same subsequence can be formed many ways, so we must avoid double counting. The neat fix is to group subsequences by their last character and keep one count per ending letter in a map end.
When we read a new character c, every subsequence we already have can be extended by c, and there is also the brand-new single-character subsequence "c". So the new number of subsequences ending in c is total + 1, where total = sum of all current end counts.
The magic is that we overwrite end[c] instead of adding to it. Any subsequence that previously ended in c is also reachable by extending with this new c, so overwriting automatically discards the duplicates and keeps only distinct ones.
Example: s = "abc". After 'a': end['a']=1. After 'b': total is 1, so end['b']=2 ("b","ab"). After 'c': total is 3, so end['c']=4 ("c","ac","bc","abc"). Summing gives 1+2+4 = 7 distinct subsequences.
The final answer is the sum of all end counts. Each character does a small fixed amount of work (the alphabet is size 26), so it runs in linear time, all taken modulo 10^9 + 7.