Minimum Genetic Mutation
Problem
A gene string can be represented by an 8-character long string, with choices from "A", "C", "G", "T". Suppose we need to investigate about a mutation (mutation from "start" to "end"), where ONE mutation is defined as ONE single character changed in the gene string. For example, "AACCGGTT" -> "AACCGGTA" is 1 mutation.
Each gene is a vertex; edges connect genes that differ by one letter and both lie in bank ∪ {start}. BFS from start finds the shortest chain in O(N · 8 · 4) = O(N) time.
start = "AACCGGTT", end = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]2from collections import deque
def min_mutations(start, end, bank):
bank = set(bank)
if end not in bank: return -1
seen = {start}
q = deque([(start, 0)])
while q:
s, d = q.popleft()
if s == end: return d
for i in range(len(s)):
for ch in "ACGT":
if ch == s[i]: continue
t = s[:i] + ch + s[i+1:]
if t in bank and t not in seen:
seen.add(t); q.append((t, d + 1))
return -1
function minMutations(start, end, bank) {
const set = new Set(bank);
if (!set.has(end)) return -1;
const seen = new Set([start]);
const q = [[start, 0]];
while (q.length) {
const [s, d] = q.shift();
if (s === end) return d;
for (let i = 0; i < s.length; i++) {
for (const ch of "ACGT") {
if (ch === s[i]) continue;
const t = s.slice(0, i) + ch + s.slice(i + 1);
if (set.has(t) && !seen.has(t)) { seen.add(t); q.push([t, d + 1]); }
}
}
}
return -1;
}
int minMutations(String start, String end, String[] bank) {
Set<String> set = new HashSet<>(Arrays.asList(bank));
if (!set.contains(end)) return -1;
Set<String> seen = new HashSet<>(); seen.add(start);
Deque<Object[]> q = new ArrayDeque<>();
q.add(new Object[]{start, 0});
char[] alpha = {'A','C','G','T'};
while (!q.isEmpty()) {
Object[] cur = q.poll();
String s = (String) cur[0]; int d = (int) cur[1];
if (s.equals(end)) return d;
char[] arr = s.toCharArray();
for (int i = 0; i < arr.length; i++) {
char orig = arr[i];
for (char ch : alpha) {
if (ch == orig) continue;
arr[i] = ch;
String t = new String(arr);
if (set.contains(t) && seen.add(t)) q.add(new Object[]{t, d + 1});
}
arr[i] = orig;
}
}
return -1;
}
int minMutations(string start, string end, vector<string>& bank) {
unordered_set<string> set(bank.begin(), bank.end());
if (!set.count(end)) return -1;
unordered_set<string> seen{start};
queue<pair<string,int>> q; q.push({start, 0});
string alpha = "ACGT";
while (!q.empty()) {
auto [s, d] = q.front(); q.pop();
if (s == end) return d;
for (int i = 0; i < (int)s.size(); i++) {
char orig = s[i];
for (char ch : alpha) {
if (ch == orig) continue;
s[i] = ch;
if (set.count(s) && !seen.count(s)) { seen.insert(s); q.push({s, d + 1}); }
}
s[i] = orig;
}
}
return -1;
}
Explanation
We want the fewest single-letter changes to turn start into end, where every intermediate gene must be a valid one in the bank. Since we want the shortest chain, this is a perfect fit for BFS.
Each gene is a node, and two genes are neighbors if they differ in exactly one letter and both are valid. BFS explores level by level, so the first time we reach end, the level number is the minimum number of mutations.
From the current gene we generate every neighbor: for each of the 8 positions we try replacing the letter with A, C, G, or T. If the result is in bank and not yet seen, we enqueue it with distance d + 1. The seen set stops us from revisiting genes.
If the queue empties without reaching end (or end is not even in the bank), we return -1.
Example: AACCGGTT → AACCGGTA → AAACGGTA. Each step changes one letter and lands on a bank gene, so the answer is 2.