Encode and Decode Strings
Problem
Design an algorithm to encode a list of strings to a single string, and another algorithm to decode that single string back to the original list. The encoded form must be unambiguous for any input — strings may contain any character, including delimiters.
Use a length-prefix protocol: each string s is encoded as len(s) + "#" + s. To decode, repeatedly read digits until the next '#', read that many characters, then continue.
["lint", "code", "love", "you"]"4#lint4#code4#love3#you"["lint", "code", "love", "you"]def encode(strs):
out = []
for s in strs:
out.append(f"{len(s)}#{s}")
return "".join(out)
def decode(encoded):
res, i = [], 0
while i < len(encoded):
j = i
while encoded[j] != "#":
j += 1
length = int(encoded[i:j])
res.append(encoded[j+1:j+1+length])
i = j + 1 + length
return res
function encode(strs) {
let out = "";
for (const s of strs) out += s.length + "#" + s;
return out;
}
function decode(encoded) {
const res = [];
let i = 0;
while (i < encoded.length) {
let j = i;
while (encoded[j] !== "#") j++;
const length = parseInt(encoded.slice(i, j), 10);
res.push(encoded.slice(j + 1, j + 1 + length));
i = j + 1 + length;
}
return res;
}
class Codec {
public String encode(List<String> strs) {
StringBuilder sb = new StringBuilder();
for (String s : strs) sb.append(s.length()).append('#').append(s);
return sb.toString();
}
public List<String> decode(String encoded) {
List<String> res = new ArrayList<>();
int i = 0;
while (i < encoded.length()) {
int j = i;
while (encoded.charAt(j) != '#') j++;
int length = Integer.parseInt(encoded.substring(i, j));
res.add(encoded.substring(j + 1, j + 1 + length));
i = j + 1 + length;
}
return res;
}
}
string encode(vector<string>& strs) {
string out;
for (auto& s : strs) out += to_string(s.size()) + "#" + s;
return out;
}
vector<string> decode(string encoded) {
vector<string> res;
int i = 0;
while (i < (int)encoded.size()) {
int j = i;
while (encoded[j] != '#') j++;
int length = stoi(encoded.substr(i, j - i));
res.push_back(encoded.substr(j + 1, length));
i = j + 1 + length;
}
return res;
}
Explanation
We need to squash a list of strings into one string and later pull them back out — even when the strings themselves contain commas, spaces, or any other character. A plain separator like "," would be ambiguous, so we use a length-prefix trick instead.
To encode, each string s is written as len(s) + "#" + s. The number tells the decoder exactly how many characters to read, so the # only ever marks the end of that number — the contents of s can be anything.
To decode, we sit at position i, scan forward to the next # to read the length, then grab that many characters as the next string and jump i past them. We repeat until we reach the end.
Example: ["lint", "code", "love", "you"] encodes to "4#lint4#code4#love3#you". Decoding reads 4, takes lint, reads 4, takes code, and so on — recovering the original list exactly.
Because every character is written once and read once, both directions run in time proportional to the total number of characters.