Concatenated Words

The problem is asking to find all the words in the given list that can be formed by concatenating at least two other words from the list.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        words_set = set(words)
        memo = {}

        def canForm(word):
            if word in memo:
                return memo[word]

            for i in range(1, len(word)):
                prefix = word[:i]
                suffix = word[i:]

                if prefix in words_set and suffix in words_set:
                    memo[word] = True
                    return True

                if prefix in words_set and canForm(suffix):
                    memo[word] = True
                    return True

                if suffix in words_set and canForm(prefix):
                    memo[word] = True
                    return True

            memo[word] = False
            return False

        concatenated_words = []
        for word in words:
            words_set.remove(word)
            if canForm(word):
                concatenated_words.append(word)
            words_set.add(word)

        return concatenated_words

Explanation:

  • We define a recursive function canForm to check if a word can be formed by concatenating other words from the list.
  • Inside this function, we split the word into two parts (prefix and suffix) and check if both parts are in the list, or if one part is in the list and the other part can be formed by recursion.
  • We use memoization (memo) to store already computed results for efficiency.
  • Finally, we iterate through the original list and apply canForm to find all concatenated words.

This solution considers all possibilities and avoids redundant computations, achieving a time complexity that’s closer to ( O(n \cdot m^2) ), where ( n ) is the number of words and ( m ) is the average length of a word.

Identifying Problem Isomorphism

“Concatenated Words” requires you to find all the words in a given list that are formed by concatenating one or more words from the same list.

An easier problem with a similar concept is “Word Break”. In this problem, you’re given a string and a dictionary of words. The objective is to determine if the string can be segmented into a space-separated sequence of one or more dictionary words.

A more complex problem is “Palindrome Pairs”. In this problem, you are given a list of unique words and you need to find all pairs of distinct indices (i, j) such that the concatenation of the two words forms a palindrome. This problem shares the concept of concatenation from “Concatenated Words”, but adds another layer of complexity by requiring the concatenated word to form a palindrome.

The reasoning behind this selection is that all three problems involve the idea of concatenating strings, but they vary in complexity:

  1. “Word Break” checks if a given string can be broken into valid words from a dictionary.
  2. “Concatenated Words” builds on this concept by requiring to find all words in a list that can be formed by concatenating other words from the same list.
  3. “Palindrome Pairs” further escalates the complexity by concatenating two words to form a palindrome.

So, in order of increasing complexity, the problems are:

  1. “Word Break”
  2. “Concatenated Words”
  3. “Palindrome Pairs”

This mapping is approximate as the problems aren’t isomorphic, but they share the core principle of string concatenation.

Here are the problems to prepare you to tackle the “Concatenated Words” problem:

  1. “Word Break” (LeetCode Problem #139): This problem is a simpler version of “Concatenated Words” where you have to find if a given string can be broken down into words present in a dictionary.

  2. “Word Break II” (LeetCode Problem #140): A slightly more complex version of the previous problem, where you not only have to find if the string can be broken down but also return all possible sentences.

  3. “Longest Word in Dictionary” (LeetCode Problem #720): This problem asks you to find the longest word that can be built one character at a time, which is useful for understanding how to work with words made up of other words.

  4. “Longest Word in Dictionary through Deleting” (LeetCode Problem #524): Here, you need to find the longest string in the dictionary that can be formed by deleting some characters of the given string.

  5. “Implement Trie (Prefix Tree)” (LeetCode Problem #208): Trie is a very important data structure when it comes to handling word-related problems, understanding this can be very useful.

  6. “Add and Search Word - Data structure design” (LeetCode Problem #211): This problem provides practice for using trie data structures with a twist.

  7. “Replace Words” (LeetCode Problem #648): This problem is about replacing roots in a sentence, which involves finding words within other words.

  8. “Palindrome Pairs” (LeetCode Problem #336): This problem requires identifying pairs of words that can form a palindrome when concatenated. While more complex than some other problems, it provides practice for working with concatenated words.

  9. “Short Encoding of Words” (LeetCode Problem #820): This problem involves finding a short encoding of words where shorter words can be found at the end of longer words. The concepts here are related to those in “Concatenated Words”.

  10. “Prefix and Suffix Search” (LeetCode Problem #745): This problem also involves working with prefixes and suffixes of words, similar to the “Concatenated Words” problem.

  11. Subsets (LeetCode 78): This problem can help you understand the basics of depth-first search, which can be very useful for solving the Concatenated Words problem.

  12. Longest Increasing Subsequence (LeetCode 300): This problem can help build your dynamic programming skills.

  13. Palindrome Partitioning (LeetCode 131): Although not directly related, this problem also combines depth-first search and dynamic programming.

  14. Longest Common Subsequence (LeetCode 1143): This problem is a classic dynamic programming problem that can help build your understanding of the concept.

  15. Distinct Subsequences (LeetCode 115): This problem can help you learn how to count the number of different subsequences in a string using dynamic programming.

  16. Word Ladder (LeetCode 127): This problem is similar to Concatenated Words but involves finding the shortest transformation sequence from one word to another.

  17. Edit Distance (LeetCode 72): This problem involves finding the minimum number of operations required to convert one word to another, and can help build your string manipulation and dynamic programming skills.

The goal is to get familiar with dealing with string manipulation, and recursion as they are key elements to solving the “Concatenated Words” problem.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.is_end = False

class Solution:
    def __init__(self):
        self.root = TrieNode()
        
    def insert(self, root, key):
        curr = root
        for i in range(len(key)):
            idx = ord(key[i]) - ord('a')
            if not curr.children[idx]:
                curr.children[idx] = TrieNode()
            curr = curr.children[idx]
        curr.is_end = True

    def dfs(self, root, key, index, count):
        if index >= len(key):
            return count > 1
        curr = root
        for i in range(index, len(key)):
            p = ord(key[i]) - ord('a')
            if not curr.children[p]:
                return False
            curr = curr.children[p]
            if curr.is_end:
                if self.dfs(root, key, i+1, count+1):
                    return True
        return False

    def findAllConcatenatedWordsInADict(self, words):
        for i in range(len(words)):
            self.insert(self.root, words[i])
        ans = []
        for i in range(len(words)):
            if self.dfs(self.root, words[i], 0, 0):
                ans.append(words[i])
        return ans

Problem Classification

The problem asks to find all the words that can be formed by concatenating other words. This is a Word Concatenation Problem.

Clarification Questions

What are the clarification questions we can ask about this problem?

Language Agnostic Coding Drills

To solve this problem, the following coding concepts are used:

  1. Data Structures - Trie (Prefix Tree): Trie, also called digital tree and sometimes radix tree or prefix tree, is a kind of search tree—an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings.

  2. TrieNode Creation: The creation of nodes in a Trie involves initializing an array for the 26 English alphabets (a to z) and a flag to indicate the end of a word.

  3. Insertion into Trie: The operation to insert a word into the Trie involves traversing each character of the word, checking if the corresponding child node exists in the Trie and creating a new one if it does not exist.

  4. Depth-First Search (DFS) on Trie: DFS is a tree traversal algorithm that is used to traverse all the nodes of a tree. In this case, DFS is used to traverse the Trie and find out whether a word can be formed by the concatenation of two or more words in the Trie.

  5. Checking the end of a word in Trie: A flag is used in the TrieNode to check whether the current character is the end of a word. This helps in finding whether a word exists in the Trie.

  6. Recursive Functions: Recursive functions are functions that call themselves during their execution. In this problem, the DFS operation on Trie is a recursive function.

The problem-solving approach for this problem is as follows:

  1. First, all the words are inserted into the Trie.

  2. Then, for each word in the list of words, a DFS is performed on the Trie starting from the root.

  3. During the DFS, if the end of a word is reached in the Trie, a recursive call to DFS is made with the index of the next character and the count of words incremented by 1. If the DFS returns true, then the current word is added to the answer.

  4. If no word ending is found during the DFS, then the DFS returns false.

  5. Finally, the list of all words that are formed by the concatenation of two or more words in the Trie is returned.

Targeted Drills in Python

Let’s get to the coding drills in Python.

Drill 1 - Creating a TrieNode:

In Python, the TrieNode data structure can be created using a class. This data structure has a list of its children (for the 26 alphabets) and a flag to indicate the end of a word.

1
2
3
4
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.is_end = False

Drill 2 - Inserting a word into Trie:

This function takes a root node and a word as parameters, and inserts the word into the Trie.

1
2
3
4
5
6
7
8
def insert(self, root, key):
    curr = root
    for i in range(len(key)):
        idx = ord(key[i]) - ord('a')
        if not curr.children[idx]:
            curr.children[idx] = TrieNode()
        curr = curr.children[idx]
    curr.is_end = True

Drill 3 - Performing DFS on Trie:

This function takes a root node, a word, an index and a count as parameters, and performs a depth-first search on the Trie to find out whether the word can be formed by the concatenation of two or more words in the Trie.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def dfs(self, root, key, index, count):
    if index >= len(key):
        return count > 1
    curr = root
    for i in range(index, len(key)):
        p = ord(key[i]) - ord('a')
        if not curr.children[p]:
            return False
        curr = curr.children[p]
        if curr.is_end:
            if self.dfs(root, key, i+1, count+1):
                return True
    return False

Drill 4 - Finding all concatenated words in a dictionary:

This function takes a list of words as a parameter, inserts all the words into a Trie, and then performs a DFS on the Trie for each word to find out whether the word can be formed by the concatenation of two or more words in the Trie.

1
2
3
4
5
6
7
8
def findAllConcatenatedWordsInADict(self, words):
    for i in range(len(words)):
        self.insert(self.root, words[i])
    ans = []
    for i in range(len(words)):
        if self.dfs(self.root, words[i], 0, 0):
            ans.append(words[i])
    return ans

Remember that the above drills are meant to be run within a class, and they depend on each other. In an actual coding problem, they would not be separate drills but parts of the same solution.