Hi Ceki,

Just wondering about the SUB-STRING patterns, such as "*/if/then/*" and "*/if/else/*".

I'm currently working on adding a permuterm tree to the concurrent-trees[1] project. It will support fast retrieval from the tree for wildcard queries like: X, X*, *X, *X*, X*Y. These patterns would be fully accelerated.
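To make the idea concrete, here is a rough sketch of how a permuterm index answers those queries. It is only an illustration, not the concurrent-trees API: a sorted map stands in for the tree, and the '$' end marker and the names PermutermSketch, add, toPrefix and search are assumptions made up for this example. Every rotation of key + '$' is indexed, and a query is rotated so that its wildcard lands at the end, which turns it into a prefix lookup.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only; names and the '$' marker are assumptions.
class PermutermSketch {
    // End marker appended to each key; assumed never to occur inside a key.
    private static final char END = '$';

    // Maps each rotation of key + END back to the original key(s).
    private final TreeMap<String, Set<String>> rotations = new TreeMap<>();

    void add(String key) {
        String marked = key + END;
        for (int i = 0; i < marked.length(); i++) {
            String rotation = marked.substring(i) + marked.substring(0, i);
            rotations.computeIfAbsent(rotation, r -> new TreeSet<>()).add(key);
        }
    }

    // Rotates a wildcard query (X*, *X, *X* or X*Y) into the prefix to look up.
    static String toPrefix(String query) {
        if (query.length() > 1 && query.startsWith("*") && query.endsWith("*")) {
            // *X* : any rotation that starts with X
            return query.substring(1, query.length() - 1);
        }
        int star = query.indexOf('*');
        // X*Y -> Y$X ; X* -> $X ; *X -> X$
        return query.substring(star + 1) + END + query.substring(0, star);
    }

    Set<String> search(String query) {
        Set<String> result = new TreeSet<>();
        if (query.indexOf('*') < 0) {
            // No wildcard: exact lookup of the rotation "X$".
            Set<String> exact = rotations.get(query + END);
            if (exact != null) {
                result.addAll(exact);
            }
            return result;
        }
        String prefix = toPrefix(query);
        // Prefix scan: all rotations in the range [prefix, prefix + '\uffff').
        for (Set<String> keys
                : rotations.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            result.addAll(keys);
        }
        return result;
    }
}
```

With keys such as "a/if/then/b" and "a/if/else/b" indexed, search("*/if/then/*") becomes a prefix scan for "/if/then/" and returns only the first key.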

It looks like the *X* variant might fit your sub-string patterns.

Partial acceleration is also possible for additional patterns such as X*Y*Z, *X*Y*Z, *X*Y*Z* and so on, with any number of terms. That involves using the first and last terms to find a candidate set in the tree, and then filtering the candidates on the fly against the terms in the middle.
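The on-the-fly filtering step could look roughly like the sketch below. It assumes the candidate keys have already been fetched from the tree; the names WildcardFilterSketch, matches and filter are hypothetical and not part of concurrent-trees. Each candidate is checked against the whole pattern by scanning for the middle terms left to right, in order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Illustrative sketch only; class and method names are assumptions.
class WildcardFilterSketch {

    // Returns true if key matches pattern, where '*' matches any run of characters.
    static boolean matches(String pattern, String key) {
        // Limit -1 keeps empty leading/trailing terms, e.g. "*X*" -> ["", "X", ""].
        String[] terms = pattern.split("\\*", -1);
        if (terms.length == 1) {
            return key.equals(pattern); // no wildcard at all
        }
        String first = terms[0];
        String last = terms[terms.length - 1];
        if (key.length() < first.length() + last.length()
                || !key.startsWith(first) || !key.endsWith(last)) {
            return false;
        }
        // Middle terms must occur in order between the first and last terms.
        int from = first.length();
        int limit = key.length() - last.length();
        for (int i = 1; i < terms.length - 1; i++) {
            int at = key.indexOf(terms[i], from);
            if (at < 0 || at + terms[i].length() > limit) {
                return false;
            }
            from = at + terms[i].length();
        }
        return true;
    }

    // Keeps only the candidates that satisfy the full pattern.
    static List<String> filter(Collection<String> candidates, String pattern) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (matches(pattern, candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

Taking the leftmost occurrence of each middle term is enough here, because matching a term as early as possible leaves the most room for the terms that follow.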

Would that help? I'm about 80% of the way to completion.

[1] https://code.google.com/p/concurrent-trees/

Niall