Remove Sub-Folders from the Filesystem
Problem
Given a list of folder paths, remove every folder that is a sub-folder of another folder in the list. A path x is a sub-folder of y if x starts with y followed by a /. Return the remaining folders in any order.
folder = ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]["/a", "/c/d", "/c/f"]def remove_subfolders(folder):
folder.sort()
ans = []
for f in folder:
if not ans or not f.startswith(ans[-1] + "/"):
ans.append(f)
return ans
function removeSubfolders(folder) {
folder.sort();
const ans = [];
for (const f of folder) {
if (ans.length === 0 || !f.startsWith(ans[ans.length - 1] + "/")) {
ans.push(f);
}
}
return ans;
}
class Solution {
public List<String> removeSubfolders(String[] folder) {
Arrays.sort(folder);
List<String> ans = new ArrayList<>();
for (String f : folder) {
if (ans.isEmpty() || !f.startsWith(ans.get(ans.size() - 1) + "/")) {
ans.add(f);
}
}
return ans;
}
}
vector<string> removeSubfolders(vector<string>& folder) {
sort(folder.begin(), folder.end());
vector<string> ans;
for (auto& f : folder) {
if (ans.empty() || f.compare(0, ans.back().size() + 1, ans.back() + "/") != 0) {
ans.push_back(f);
}
}
return ans;
}
Explanation
A folder is a sub-folder of another only if its path starts with that other path plus a /. The big idea here is to sort the paths first, because once sorted, every folder lands immediately before all of its own sub-folders.
After sorting, we walk through the paths and build an answer list. For each folder f, we compare it only with the last folder we kept. If f starts with lastKept + "/", it is nested inside that parent, so we drop it. Otherwise it is a brand-new top-level folder and we add it.
Why comparing with just the last kept entry is enough: sorting guarantees a parent appears right before its children, and any kept folder shadows everything underneath it. So a single most-recent parent is all we ever need to check against.
Example: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"] (already sorted). We keep /a; /a/b starts with /a/, drop it. Keep /c/d; /c/d/e starts with /c/d/, drop it. /c/f does not start with /c/d/, so keep it. Result: ["/a", "/c/d", "/c/f"].
The cost is dominated by the sort; the keep-or-drop scan is then a single linear pass.