I piggybacked on the PatternLayoutBase class to reuse its converter logic for turning the layout patterns into regular expressions. I don't think this is a very clean way of doing it, but I went with it for now. Originally, I was thinking we would pass the input stream (read from a file) to the converters, letting each one parse something from the stream and advance the stream position, but I wasn't sure how to get the parsed items back to the caller, or how well the regex matching would work if only one regex pattern were applied at a time.
I was thinking we could use regex capture groups to capture the fields. I just haven't added them to the regex patterns yet, as I still need to figure out exactly which fields to look for. Perhaps you have a better way to capture the fields.
I don't have a complete design thought out yet, and I'd like to collaborate on that. My initial thoughts were:
1. Determine the logback log-file pattern (e.g., "#logback.class-pattern: %d{HH:mm:ss} %msg%n") by reading it from the file or from a command-line parameter.
2. For each pattern element, convert the pattern to a named regular-expression capture group, where the name is the pattern itself (e.g., "(?<%d{HH:mm:ss}>\\d{2}:\\d{2}:\\d{2}) ((?s).+)(\\n)"). Compile the regular expression into a Pattern object for better performance during iterative matching.
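NOTE: Named capture groups require Java 7 or a third-party library. Java group names may contain only ASCII letters and digits (starting with a letter), so the pattern text would have to be mapped to a sanitized name.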
3. Match each line of the file with the regex pattern. Collect all matches, and parse the capture groups into a proxy class for LoggingEvent/AccessEvent. The proxy class is used for serialization annotations (e.g., JsonSerialize [1]).
4. Use the appropriate serializer (based on the format specified on the command line) to process the proxy events, writing them to a file or stdout.
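To make steps 1 through 3 concrete, here is a minimal Java sketch under some loud assumptions. It does not use the logback converters at all. The class name PatternRegexSketch, the token table, the EventProxy class, and the group names "date" and "msg" are all hypothetical and exist only for illustration. The group names are plain identifiers rather than the pattern text, because java.util.regex accepts only letters and digits in a group name. Only the three tokens from the example pattern are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternRegexSketch {
    // Header prefix from the example in step 1.
    static final String HEADER = "#logback.class-pattern: ";

    // Hypothetical table: layout token -> regex fragment with a named group.
    static final Map<String, String> TOKENS = new LinkedHashMap<String, String>();
    static {
        TOKENS.put("%d{HH:mm:ss}", "(?<date>\\d{2}:\\d{2}:\\d{2})");
        TOKENS.put("%msg", "(?<msg>.+)");
        TOKENS.put("%n", ""); // lines are matched without their terminator
    }

    // Hypothetical stand-in for the LoggingEvent proxy of step 3.
    static final class EventProxy {
        final String date;
        final String message;

        EventProxy(String date, String message) {
            this.date = date;
            this.message = message;
        }
    }

    // Step 2: translate a layout pattern into one compiled regex.
    static Pattern toRegex(String layout) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        outer:
        while (i < layout.length()) {
            for (Map.Entry<String, String> e : TOKENS.entrySet()) {
                if (layout.startsWith(e.getKey(), i)) {
                    sb.append(e.getValue());
                    i += e.getKey().length();
                    continue outer;
                }
            }
            // Anything that is not a known token is matched literally.
            sb.append(Pattern.quote(String.valueOf(layout.charAt(i))));
            i++;
        }
        return Pattern.compile(sb.toString());
    }

    // Steps 1 and 3: read the pattern from the first line, then match the rest.
    // Assumes the layout contains both %d{HH:mm:ss} and %msg; otherwise
    // Matcher.group(String) throws IllegalArgumentException.
    static List<EventProxy> parse(List<String> lines) {
        if (lines.isEmpty() || !lines.get(0).startsWith(HEADER)) {
            throw new IllegalArgumentException("missing pattern header");
        }
        Pattern pattern = toRegex(lines.get(0).substring(HEADER.length()));
        List<EventProxy> events = new ArrayList<EventProxy>();
        for (String line : lines.subList(1, lines.size())) {
            Matcher m = pattern.matcher(line);
            if (m.matches()) { // non-matching lines are skipped in this sketch
                events.add(new EventProxy(m.group("date"), m.group("msg")));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<EventProxy> events = parse(Arrays.asList(
                "#logback.class-pattern: %d{HH:mm:ss} %msg%n",
                "12:34:56 Hello world",
                "not a log line"));
        for (EventProxy e : events) {
            System.out.println(e.date + " | " + e.message);
        }
    }
}
```

Matching line by line like this would not handle multi-line messages (stack traces, for instance), which is one of the things the regex approach still needs to prove itself on.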
My logic above relies on effective regular expressions, which I'm still validating in my unit tests. I hope to make better progress, especially with you coming aboard.