
On Tue, Jun 19, 2012 at 2:37 PM, ceki <ceki@qos.ch> wrote:
Hi Tony,
Great to see logback-decoder making progress. I see that given a pattern you are able to produce a regex. I am also glad to see that you are much more savvy at writing regular expressions than I am.
I piggy-backed the PatternLayoutBase class to reuse its converter logic for converting the layout patterns into regular expressions. I don't think this is a very clean way of doing it, but I went with it for now. Originally, I was thinking we pass the input stream (read from a file) to the converters, allowing them to parse something from the stream and advance the stream position, but I wasn't sure how to get the parsed items back to the caller or how well the regex matching would work if only one regex pattern were given at a time.
Have you thought about how to capture fields so as to fill in LoggingEvent/AccessEvent fields? At this stage of the code, there is no grouping in these regular expressions so it is not clear how they could be used to capture field data. Anyway, do you already have an idea how to go further or should we come up with something together?
I was thinking we use regex capture groups to capture the fields. I just haven't added them to the regex patterns yet, as I need to figure out exactly what fields to look for. Perhaps you have a better way to capture the fields. I don't have a complete design thought out yet, and I'd like to collaborate on that. My initial thoughts were:
1. Determine the logback log-file pattern (e.g., "#logback.class-pattern: %d{HH:mm:ss} %msg%n") by reading it from the file or from a command-line parameter.
2. For each pattern element, convert the pattern to a named regular-expression capture group, where the name is the pattern itself (e.g., "(?<%d{HH:mm:ss}>\\d{2}:\\d{2}:\\d{2}) ((?s).+)(\\n)"). Compile the regular expression into a Pattern object for better performance during iterative matching. NOTE: Named capture groups require Java 7 or a 3rd-party library.
3. Match each line of the file with the regex pattern. Collect all matches, and parse the capture groups into a proxy class for LoggingEvent/AccessEvent. The proxy class is used for serialization annotations (e.g., JsonSerialize [1]).
4. Use the appropriate serializer (based on the format specified on the command line) to process the proxy events, thereby outputting them to a file or stdout.
My logic above relies on effective regular expressions, which I'm still validating in my unit tests. I hope to make better progress, especially with you coming aboard.
[1] http://sghill-dev.blogspot.com/2012/04/how-do-i-write-jackson-json-serialize...
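To make steps 2 and 3 concrete, here is a minimal sketch of matching one log line with named capture groups for the example pattern "%d{HH:mm:ss} %msg%n". Everything in it is an assumption for illustration, not the actual logback-decoder code: the class name DecoderSketch, the decode method, and the group names are hypothetical. One caveat: java.util.regex only accepts group names made of letters and digits that start with a letter, so the sketch uses "date" and "msg" as stand-ins rather than the raw pattern text as the name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of logback-decoder.
class DecoderSketch {
    // Regex for the layout "%d{HH:mm:ss} %msg%n", compiled once for reuse.
    // "date" and "msg" are assumed group names standing in for
    // "%d{HH:mm:ss}" and "%msg" (the line terminator is stripped by the reader).
    static final Pattern LINE = Pattern.compile(
            "(?<date>\\d{2}:\\d{2}:\\d{2}) (?<msg>.+)");

    // Returns the captured fields in layout order, or null if the line
    // does not match the layout.
    static Map<String, String> decode(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("date", m.group("date"));
        fields.put("msg", m.group("msg"));
        return fields;
    }

    public static void main(String[] args) {
        // prints {date=14:37:05, msg=Server started}
        System.out.println(decode("14:37:05 Server started"));
    }
}
```

The returned map could then be copied into the proxy event class from step 3. A real decoder would also need a table mapping each conversion word to its group name and regex fragment, which this sketch hard-codes.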