
I've done some work on a trie-based rule store for improving the performance of Joran. The good news is that the initial version of the trie performs 5x better than the list-based rule store. Moreover, the performance of the trie could be significantly improved by using a less brain-dead indexing scheme for child nodes. As to the bad news, Joran's performance is impacted mostly by the XML parser and the application of actions. Computations done by the existing rule-store probably account for less than 20% of the overall cost of invoking Joran. You can get the code at [1]. My trie implementation should be easy to understand for those familiar with tries. [1] https://github.com/ceki/trie2 On 6/18/2013 1:18 PM, ceki wrote:
Hello all,
I am currently working on improving the performance of Joran rule store. As described in [1], Joran interprets XML files by applying rules while traversing the XML tree. A rule consists of a pattern and an action.
While traversing the tree of XML elements, Joran asks its RuleStore for the Action applicable for the current element.
Let assume that the RuleStore consists of just two rules
rule1: "configuration" -> ConfigurationAction rule2: "configuration/statusListener" -> StatusListenerAction
here is a short config file:
<configuration> <statusListener class="..." /> <logger .../> </configuration>
for the <configuration> element, the rule 1 matches and ConfigurationAction is applicable; for the <statusListener> element nested within <configuration> rule 2 matches and StatusListenerAction is applicable; for the <logger> element there is no matching rule.
Joran's RuleStore understands 3 types of patterns:
1) EXACT patterns, such as "configuration" and "configuration/statusListener"
2) PREFIX patterns, such as configuration/appender/sift/*
3) SUFFIX patterns, such as "*/param", "*/if", "*if/then"
4) SUB-STRING patterns, such as "*/if/then/*" and "*/if/else/*"
Currently there are 27 rules using exact patterns, one rule using a prefix pattern, four rules using suffix patterns and two rules using sub-string patterns. The rules are defined in [2] and [3].
The current implementation of the RuleStore [4] is rather naive. It iterates over the various patterns to see if there is a match.
Placing the various patterns as keys in a trie seems like a good way of improving performance. BTW, Joran instances are not shared so that support for concurrent access is not required.
Given that there are only a few rules, I was initially thinking of using trie nodes with 256 links. Assuming an average key length of 20 and 30 rules, that would use about 600x256 node pointers bytes or 1.2MB of memory (assuming 8byte pointers). However, since many of the patterns share the same prefixes, it makes sense to compress the trie, as done in the Patricia trie.
Anyway, for exact and prefix patterns, the trie data structure seems highly appropriate. For suffix patterns, a reverse trie could do the job. For sub-strings patterns, I guess looping over the patterns would be acceptable (there are only two sub-string patterns).
I should note that when the traversing the XML tree, often enough none of the rules are applicable and Joran falls back on implicit rules. Thus, in practice, much of the configuration work is done by implicit actions which are triggered when the search for an explicit rule fails.
Enough said.
Your comments, questions and ideas for improvement are most welcome,
[1] http://logback.qos.ch/manual/onJoran.html [2] http://tinyurl.com/JoranConfiguratorBase [3] http://tinyurl.com/JoranConfigurator [4] http://tinyurl.com/SimpleRuleStore