Just found out that there is a popular data structure to deal with substring kind of problems, called the "Suffix Tree". (constrast with "Prefix tree", which is used in longest prefix kind of problems)
Suffix tree can be constructed with the simple O($$n^2$$) algo (insert the "n" suffixes of a string with n chars into the tree creating a child wherever necessary) or with more advanced algorithms that can do it in O(n) (http://en.wikipedia.org/wiki/Ukkonen%27s_algorithm). The point is, once we have a suffix tree, we can execute several of these algorithms in linear time.
Ref: http://www.csc.liv.ac.uk/~leszek/COMP526/week5/week5.pdf
Suffix tree can be constructed with the simple O($$n^2$$) algo (insert the "n" suffixes of a string with n chars into the tree creating a child wherever necessary) or with more advanced algorithms that can do it in O(n) (http://en.wikipedia.org/wiki/Ukkonen%27s_algorithm). The point is, once we have a suffix tree, we can execute several of these algorithms in linear time.
- Longest repeating substring: the node which is farthest away from the tree but which has degree > 1.
- Longest substring which is also a palindrome: to do this, we take te original string lets say abcac and concatenate it with itself reversed i.e. abcaccacba and run "longest repeating substring"
- Longest common substring: where we concatenate 2 strings first e.g. abcd and bcd so as to get abcdbcd and now search for the node deepest down which has degree > 1.
- Shortest substring that occurs only once: to answer this, we find the node with the shortest length that takes us to a leaf node.
Ref: http://www.csc.liv.ac.uk/~leszek/COMP526/week5/week5.pdf
No comments:
Post a Comment