
Hello all, I am currently working on improving the performance of Joran rule store. As described in [1], Joran interprets XML files by applying rules while traversing the XML tree. A rule consists of a pattern and an action. While traversing the tree of XML elements, Joran asks its RuleStore for the Action applicable for the current element. Let assume that the RuleStore consists of just two rules rule1: "configuration" -> ConfigurationAction rule2: "configuration/statusListener" -> StatusListenerAction here is a short config file: <configuration> <statusListener class="..." /> <logger .../> </configuration> for the <configuration> element, the rule 1 matches and ConfigurationAction is applicable; for the <statusListener> element nested within <configuration> rule 2 matches and StatusListenerAction is applicable; for the <logger> element there is no matching rule. Joran's RuleStore understands 3 types of patterns: 1) EXACT patterns, such as "configuration" and "configuration/statusListener" 2) PREFIX patterns, such as configuration/appender/sift/* 3) SUFFIX patterns, such as "*/param", "*/if", "*if/then" 4) SUB-STRING patterns, such as "*/if/then/*" and "*/if/else/*" Currently there are 27 rules using exact patterns, one rule using a prefix pattern, four rules using suffix patterns and two rules using sub-string patterns. The rules are defined in [2] and [3]. The current implementation of the RuleStore [4] is rather naive. It iterates over the various patterns to see if there is a match. Placing the various patterns as keys in a trie seems like a good way of improving performance. BTW, Joran instances are not shared so that support for concurrent access is not required. Given that there are only a few rules, I was initially thinking of using trie nodes with 256 links. Assuming an average key length of 20 and 30 rules, that would use about 600x256 node pointers bytes or 1.2MB of memory (assuming 8byte pointers). However, since many of the patterns share the same prefixes, it makes sense to compress the trie, as done in the Patricia trie. Anyway, for exact and prefix patterns, the trie data structure seems highly appropriate. For suffix patterns, a reverse trie could do the job. For sub-strings patterns, I guess looping over the patterns would be acceptable (there are only two sub-string patterns). I should note that when the traversing the XML tree, often enough none of the rules are applicable and Joran falls back on implicit rules. Thus, in practice, much of the configuration work is done by implicit actions which are triggered when the search for an explicit rule fails. Enough said. Your comments, questions and ideas for improvement are most welcome, [1] http://logback.qos.ch/manual/onJoran.html [2] http://tinyurl.com/JoranConfiguratorBase [3] http://tinyurl.com/JoranConfigurator [4] http://tinyurl.com/SimpleRuleStore -- Ceki 65% of statistics are made up on the spot