What is the most efficient way - preferably platform agnostic - to submit events from "the outside"?

As mentioned in the thread about LoggingEvent refactoring, I think that the underlying problem that should be solved first is how to transport logging events in and out of a JVM as efficiently as possible. Efficient means, at least to me:
* Little overhead - both network-bandwidth-wise and CPU-wise
* Large amounts of events can be sent fast - the host program is not tied up
* Large amounts of events can be received fast - perhaps a bundle concept?
* Supports Unicode (no 8-bit folding)

The current approach for the socket server is a simple accept-and-fork system where the incoming data is serialized log events. Pro: the data stream is rather compact -> network friendly. Contra: log events can only originate on a JVM with a compatible version of logback.

Jörn, perhaps you'd like to do a more complete writeup on what should be anticipated? Larger data blocks may be acceptable if the data is lightly compressed. My personal preference, before actual experiments, would be an XML format in a dialect supported by a third party, e.g. XMLDecoder/Encoder or an Apache project, as it is important to delegate the performance work to another team. It would also be nice to bring ZeroConf into it for the discovery process. The default behaviour of network loggers in Java has not been established yet. -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

Thorbjoern Ravn Andersen wrote:
Jörn, perhaps you'd like to do a more complete writeup on what should be anticipated?
I'll try to write a comparison of the current SocketAppender behavior vs. my multiplexer behavior but I can't do that right now, maybe this evening... Is that what you meant? Or rather some more info on the Serializer/Deserializer concept? Joern.

Joern Huxhorn skrev:
Thorbjoern Ravn Andersen wrote:
Jörn, perhaps you'd like to do a more complete writeup on what should be anticipated?
I'll try to write a comparison of the current SocketAppender behavior vs. my multiplexer behavior but I can't do that right now, maybe this evening... Is that what you meant? Or rather some more info on the Serializer/Deserializer concept?
I was thinking of some of the Oopses you have encountered in production environments and adapted to, which could be beneficial in the official logback implementation. -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On 25.02.2009, at 15:33, Thorbjoern Ravn Andersen wrote:
Joern Huxhorn skrev:
Thorbjoern Ravn Andersen wrote:
Jörn, perhaps you'd like to do a more complete writeup on what should be anticipated?
I'll try to write a comparison of the current SocketAppender behavior vs. my multiplexer behavior but I can't do that right now, maybe this evening... Is that what you meant? Or rather some more info on the Serializer/Deserializer concept?
I was thinking of some of the Oopses you have encountered in production environments and adapted to, which could be beneficial in the official logback implementation.
Hi guys, I tried to provide some explanations and background information about the way my multiplex appender is working. http://apps.sourceforge.net/trac/lilith/wiki/MultiplexAppenderBackground Additionally, I collected some pitfalls of other appenders. http://apps.sourceforge.net/trac/lilith/wiki/AppenderPitfalls I'm not sure if I succeeded in explaining the complexity of my appender but I tried my best, for sure ;) Joern.

Joern Huxhorn skrev:
I tried to provide some explanations and background information about the way my multiplex appender is working.
Could you say why you've named it multiplex appender? I may have missed it if you said it already. Perhaps because it can send to multiple receivers?
http://apps.sourceforge.net/trac/lilith/wiki/MultiplexAppenderBackground
Additionally, I collected some pitfalls of other appenders.
http://apps.sourceforge.net/trac/lilith/wiki/AppenderPitfalls
I'm not sure if I succeeded in explaining the complexity of my appender but I tried my best, for sure ;)
If I understand your writeup correctly there are three basic issues you are dealing with:
* detecting if a receiver goes away
* flattening events at the proper time to ensure that the content is right when deserialized
* buffering events to ensure the application can run at full speed independently of the speed of the receivers

The first one - should it be enough to detect a broken TCP/IP connection? I am still pondering on a language agnostic receiver. The reason for the XML being uninteresting was because it was much more verbose than the plain serialised byte object? Would a sufficiently terse xml-dialect be interesting? I was thinking of having one-character node names and attribute names (and moving the namespace outside the fragments). -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

Hi Thorbjoern. On 28.02.2009, at 10:50, Thorbjoern Ravn Andersen wrote:
Joern Huxhorn skrev:
I tried to provide some explanations and background information about the way my multiplex appender is working.
Could you say why you've named it multiplex appender? I may have missed it if you said it already. Perhaps because it can send to multiple receivers?
Oh yes, that's the reason. Funny that I failed to mention it ... :)
http://apps.sourceforge.net/trac/lilith/wiki/MultiplexAppenderBackground
Additionally, I collected some pitfalls of other appenders.
http://apps.sourceforge.net/trac/lilith/wiki/AppenderPitfalls
I'm not sure if I succeeded in explaining the complexity of my appender but I tried my best, for sure ;)
If I understand your writeup correctly there are three basic issues you are dealing with:
* detecting if a receiver goes away
It's only a problem if a receiver leaves in a dirty way, e.g. unplugging the network cable or putting the computer to sleep, instead of closing the stream/app first. If the stream is properly closed, the appender receives an IOException when it's trying to use the stream the next time. Otherwise, the appender will hang while writing.
* flattening events at the proper time to ensure that the content is right when deserialized
* buffering events to ensure the application can run at full speed independently of the speed of the receivers
Yes, exactly. We are routinely logging certain classes to > 10 receivers - while in production!!
The first one, should it be enough to detect a broken TCP/IP connection?
I've found no other way besides the TimeoutOutputStream that I'm using now. Any other idea?
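A minimal sketch of how such a timeout-guarded stream could work - the class name and details below are assumptions for illustration, not the actual Lilith TimeoutOutputStream: a watchdog closes the underlying stream if a write blocks too long, turning the silent hang into an IOException.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch: if a write blocks longer than timeoutMillis
    // (e.g. the peer was unplugged without closing the connection), the
    // watchdog closes the underlying stream, which makes the blocked
    // write fail with an IOException instead of hanging forever.
    public class TimeoutGuardedOutputStream extends OutputStream {
        private final OutputStream delegate;
        private final long timeoutMillis;
        private final ScheduledExecutorService watchdog =
                Executors.newSingleThreadScheduledExecutor();

        public TimeoutGuardedOutputStream(OutputStream delegate, long timeoutMillis) {
            this.delegate = delegate;
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            ScheduledFuture<?> killer = watchdog.schedule(() -> {
                try { delegate.close(); } catch (IOException ignored) { }
            }, timeoutMillis, TimeUnit.MILLISECONDS);
            try {
                delegate.write(b, off, len);
            } finally {
                killer.cancel(false); // write returned in time, call off the watchdog
            }
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] { (byte) b }, 0, 1);
        }

        @Override
        public void close() throws IOException {
            watchdog.shutdownNow();
            delegate.close();
        }
    }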
I am still pondering on a language agnostic receiver. The reason for the XML being uninteresting was because it was much more verbose than the plain serialised byte object?
I wouldn't call it uninteresting ;) It's just more expensive to create such events so I'd only use it if I have to. The main objectives of my XML schema are:
- lossless conversion: it must be possible to create the exact same event from the XML that was used to create it in the first place. This isn't the case with the log4j XML because it flattens the NDC. Besides that, if used in conjunction with Logback, it destroys the messagePattern and arguments of the original event. I plan to use those for translation of log messages.
- readable by a human being: otherwise there's no real point in using XML. I could just define a binary format, right?
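The point about messagePattern and arguments can be illustrated with slf4j's own helper (MessageFormatter.arrayFormat, slf4j 1.6+ API): the rendered message can always be reproduced from pattern plus arguments, but not the other way around, so flattening is lossy.

    import org.slf4j.helpers.MessageFormatter;

    public class LosslessMessageDemo {
        public static void main(String[] args) {
            String pattern = "User {} logged in from {}"; // kept by a lossless format
            Object[] arguments = { "alice", "10.0.0.1" };
            // the flattened message can be derived from pattern + arguments at any
            // time, e.g. after translating the pattern into another language
            String rendered = MessageFormatter.arrayFormat(pattern, arguments).getMessage();
            System.out.println(rendered); // User alice logged in from 10.0.0.1
        }
    }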
Would a sufficiently terse xml-dialect be interesting? I was thinking of having one-character node names and attribute names? (and moving the namespace outside the fragments).
I'm not sure about that. Thinking about a binary format would probably be more worthwhile... Joern.

Joern Huxhorn skrev:
I've found no other way besides the TimeoutOutputStream that I'm using now. Any other idea?
Not as such. TCP/IP is very robust, especially for dealing with these situations, which is exactly what you are trying to avoid (so normally you would use UDP in such a situation, like Quake and other multiplayer games do).
I am still pondering on a language agnostic receiver. The reason for the XML being uninteresting was because it was much more verbose than the plain serialised byte object?
I wouldn't call it uninteresting ;) It's just more expensive to create such events so I'd only use it if I have to.
Are they? How come? Perhaps if using XMLEncoder instead of rolling your own :)
otherwise there's no real point in using XML. I could just define a binary format, right?
The usual reason for using XML instead of a binary format is that you get the parser for free, so you can go language agnostic easily.
Would a sufficiently terse xml-dialect be interesting? I was thinking of having one-character node names and attribute names? (and moving the namespace outside the fragments).
I'm not sure about that. Thinking about a binary format would probably be more worthwhile...
Binary formats are rather painful to extend at a later date. Just see how hard it is even with help from the serialization modules.
You may remember that I am in a situation where our production servers are inaccessible and where I want our logs to be both humanly readable as well as reprocessable. I would be very interested in defining a very terse XML dialect for this purpose, as Ceki has demonstrated earlier that my needs cannot be fulfilled by the log4j dtd. I see a good need for a production-strength "xml receiver -> slf4j events" added to the slf4j-ext package (or so) - would it license-wise be possible to adapt your work into this? -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On 28.02.2009, at 21:44, Thorbjoern Ravn Andersen wrote:
I am still pondering on a language agnostic receiver. The reason for the XML being uninteresting was because it was much more verbose than the plain serialised byte object?
I wouldn't call it uninteresting ;) It's just more expensive to create such events so I'd only use it if I have to. Are they? How come? Perhaps if using XMLEncoder instead of rolling your own :)
I've done some more benchmarking concerning purely serialization/deserialization without any disk I/O, i.e. using just byte[]. http://apps.sourceforge.net/trac/lilith/ticket/28 - see the last table at the bottom. That's actually quite interesting... I didn't expect that using my own XML Serializer + compression would actually result in smaller data size than using java serialization + compression. Even uncompressed, my Serializer doesn't produce many more bytes than java serialization... I didn't expect that... However, creation and handling of XML is still much slower than pure java serialization - which doesn't surprise me at all. My own implementation (using StAX) is a lot faster than the generic java.beans.XML one, though.
Would a sufficiently terse xml-dialect be interesting? I was thinking of having one-character node names and attribute names? (and moving the namespace outside the fragments).
I'm not sure about that. Thinking about a binary format would probably be more worthwhile... Binary formats are rather painful to extend at a later date. Just see how hard it is even with help from the serialization modules.
The main problem with java serialization is IMHO that it's not possible to *somehow* load older versions of a class. If you change a class and change its serialVersionUID then you have no chance to load any previously serialized objects. No chance at all. That's quite evil... To do something like this one would need to reimplement the class, with a different name, and implement conversion from old to new.
You may remember that I am in a situation where our production servers are inaccessible and where I want our logs to be both humanly readable as well as reprocessable. I would be very interested in defining a very terse XML dialect for this purpose, as Ceki has demonstrated earlier that my needs cannot be fulfilled by the log4j dtd.
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable.
I see a good need for a production-strength "xml receiver -> slf4j events" added to the slf4j-ext package (or so) - would it license-wise be possible to adapt your work into this?
My code is licensed under LGPL v3 so this shouldn't be a problem. Since I'm the only developer I could grant any license, anyway. I'm not sure what you mean with "slf4j events", though. You mean "logback" instead of "slf4j", right?

Joern Huxhorn skrev:
On 28.02.2009, at 21:44, Thorbjoern Ravn Andersen wrote:
I am still pondering on a language agnostic receiver. The reason for the XML being uninteresting was because it was much more verbose than the plain serialised byte object?
I wouldn't call it uninteresting ;) It's just more expensive to create such events so I'd only use it if I have to. Are they? How come? Perhaps if using XMLEncoder instead of rolling your own :)
I've done some more benchmarking concerning purely serialization/deserialization without any disk I/O, i.e. using just byte[].
http://apps.sourceforge.net/trac/lilith/ticket/28 the last table at the bottom.
That's actually quite interesting... I didn't expect that using my own XML Serializer + compression would actually result in smaller data size than using java serialization + compression. Even uncompressed, my Serializer doesn't produce many more bytes than java serialization... I didn't expect that...
However, creation and handling of XML is still much slower than pure java serialization - which doesn't surprise me at all. My own implementation (using StAX) is a lot faster than the generic java.beans.XML one, though.
I have seen your data and it is interesting. If you want compression you need something lighter than the default setting of the gzip compressor, as even the lightest setting gives reasonable results for repetitive data without using much time (a sketch follows below). Why use StAX - you need a SAX parser or something more sophisticated? I am unfamiliar with the project.
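A sketch of what "lighter" compression could look like with the JDK's own classes. GZIPOutputStream hard-codes the default level, so this drops down to DeflaterOutputStream configured for BEST_SPEED:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    public class LightCompression {
        // compress with the fastest (lightest) deflate setting; for repetitive
        // logging data this usually still shrinks well while costing little CPU
        public static byte[] compress(byte[] data) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DeflaterOutputStream out =
                    new DeflaterOutputStream(buf, new Deflater(Deflater.BEST_SPEED));
            out.write(data);
            out.finish();
            return buf.toByteArray();
        }
    }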
Would a sufficiently terse xml-dialect be interesting? I was thinking of having one-character node names and attribute names? (and moving the namespace outside the fragments).
I'm not sure about that. Thinking about a binary format would probably be more worthwhile... Binary formats are rather painful to extend at a later date. Just see how hard it is even with help from the serialization modules.
The main problem with java serialization is IMHO that it's not possible to *somehow* load older versions of a class. If you change a class and change its serialVersionUID then you have no chance to load any previously serialized objects. No chance at all. That's quite evil... To do something like this one would need to reimplement the class, with a different name, and implement conversion from old to new.
Since the serialization mechanism in java relies on the exact ordering and type of each field in order to generate and interpret the byte stream without any additional information in the byte stream but the raw values, it makes sense to me that it will break if the class signature changes. It would not be hard to use an interface internally and let multiple versions of the class to be serialized implement it. It is not evil, it is just a trade-off :)
You may remember that I am in a situation where our production servers are inaccessible and where I want our logs to be both humanly readable as well as reprocessable. I would be very interested in defining a very terse XML dialect for this purpose, as Ceki has demonstrated earlier that my needs cannot be fulfilled by the log4j dtd.
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable.
I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one character tag and attribute names if the data in the fields carry meaning to the human eye :)
I see a good need for a production-strength "xml receiver -> slf4j events" added to the slf4j-ext package (or so) - would it license-wise be possible to adapt your work into this?
My code is licensed under LGPL v3 so this shouldn't be a problem. Since I'm the only developer I could grant any license, anyway. I'm not sure what you mean with "slf4j events", though. You mean "logback" instead of "slf4j", right?
No, in the above I actually mean slf4j events. A receiver which accepts incoming events and throws them straight into a slf4j log.debug(....) statement (or whatever level was used) - which to me would be a generic way to glue two separate platforms together.
-- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On 01.03.2009, at 18:39, Thorbjoern Ravn Andersen wrote:
Joern Huxhorn skrev:
On 28.02.2009, at 21:44, Thorbjoern Ravn Andersen wrote:
I am still pondering on a language agnostic receiver. The reason for the XML being uninteresting was because it was much more verbose than the plain serialised byte object?
I wouldn't call it uninteresting ;) It's just more expensive to create such events so I'd only use it if I have to. Are they? How come? Perhaps if using XMLEncoder instead of rolling your own :)
I've done some more benchmarking concerning purely serialization/deserialization without any disk I/O, i.e. using just byte[].
http://apps.sourceforge.net/trac/lilith/ticket/28 the last table at the bottom.
That's actually quite interesting... I didn't expect that using my own XML Serializer + compression would actually result in smaller data size than using java serialization + compression. Even uncompressed, my Serializer doesn't produce many more bytes than java serialization... I didn't expect that...
However, creation and handling of XML is still much slower than pure java serialization - which doesn't surprise me at all. My own implementation (using StAX) is a lot faster than the generic java.beans.XML one, though.
I have seen your data and it is interesting. If you want compression you need something lighter than the default setting of the gzip compressor, as even the lightest setting gives reasonable results for repetitive data without using much time.
Why use StAX - you need a SAX parser or something more sophisticated? I am unfamiliar with the project.
Well, I just like the API. See http://www.developer.com/xml/article.php/3397691 for a comparison. While I don't use the skip-feature right now, I like to have that option. I dislike everything that's using a DOM because it has to read the complete XML and keep it in memory... all in all, it's just a matter of personal preference... ;)
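For the unfamiliar: StAX is a pull-style streaming XML API (javax.xml.stream). A small sketch of writing an event with it, here using the one-character element and attribute names Thorbjoern floated - the dialect itself is purely hypothetical:

    import java.io.StringWriter;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamWriter;

    public class TerseEventWriter {
        public static String write(long timestamp, String level, String message)
                throws XMLStreamException {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("e");                        // e = event (hypothetical dialect)
            w.writeAttribute("t", Long.toString(timestamp)); // t = timestamp
            w.writeAttribute("l", level);                    // l = level
            w.writeCharacters(message);
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
            return out.toString(); // e.g. <e t="1235942400000" l="DEBUG">hello</e>
        }
    }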
Would a sufficiently terse xml-dialect be interesting? I was thinking of having one-character node names and attribute names? (and moving the namespace outside the fragments).
I'm not sure about that. Thinking about a binary format would probably be more worthwhile... Binary formats are rather painful to extend at a later date. Just see how hard it is even with help from the serialization modules.
The main problem with java serialization is IMHO that it's not possible to *somehow* load older versions of a class. If you change a class and change its serialVersionUID then you have no chance to load any previously serialized objects. No chance at all. That's quite evil... To do something like this one would need to reimplement the class, with a different name, and implement conversion from old to new.
Since the serialization mechanism in java relies on the exact ordering and type of each field in order to generate and interpret the byte stream without any additional information in the byte stream but the raw values, it makes sense to me that it will break if the class signature changes. It would not be hard to use an interface internally and let multiple versions of the class to be serialized implement it. It is not evil, it is just a trade-off :)
Yes, I understand the reasons for the situation but it doesn't make it easy to read serialized data of previous versions of the class. The easiest way to do this would probably be the use of versioned classes, e.g. LoggingEventVO0916. So if an incompatible change would occur in, let's say, 0.9.18, a new LoggingEventVO0918 would be created, without any relationship to the previous one. That would have to stay exactly the same. The deserializer could then do checks using instanceof and convert accordingly (i.e. convert "old" to "new"). This would prohibit the implementation of an interface, though, because that would mean that the old class would need to be changed to support the new data. While this would be possible using empty stubs I'm not sure how elegant that would be :p I'm not sure about the elegance of this whole approach, either. I'm just brainstorming a bit... One *big* downside would be that the value objects, as well as all contained value objects, would have to have such a version in their names :p

Ceki, are you still with us? What's your plan to support the deserialization of previous versions of the VO class?

As a general comment, it has proven to be really helpful to use a general interface for serialization. http://apps.sourceforge.net/trac/sulky/browser/trunk/sulky-generics/src/main... That way, the whole logic of transforming a given object to an arbitrary byte array is entirely detached from the class itself. It would also remove the need for LoggingEvent to be aware of persisting at all. The responsibility is entirely in the Serializer/Deserializer implementation, i.e. there's no need for getLoggerContextVO or similar methods.
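The generic interface idea boils down to something like the following sketch (the shapes are assumed from the description, not copied from sulky-generics):

    // The event class knows nothing about persistence; the entire
    // object-to-bytes logic lives behind these two interfaces, so
    // serialization strategies can be swapped without touching the event.
    interface Serializer<T> {
        byte[] serialize(T object);
    }

    interface Deserializer<T> {
        T deserialize(byte[] bytes);
    }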
You may remember that I am in a situation where our production servers are inaccessible and where I want our logs to be both humanly readable as well as reprocessable. I would be very interested in defining a very terse XML dialect for this purpose, as Ceki has demonstrated earlier that my needs cannot be fulfilled by the log4j dtd.
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable. I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one character tag and attribute names if the data in the fields carry meaning to the human eye :)
Well, I'm using yyyy-MM-dd'T'HH:mm:ss.SSSZ as the primary time stamp format and then apply some magic to change it into a valid xml timestamp, i.e. I add a colon between hh and mm of the timezone. It made me sigh quite a bit when I realized that SimpleDateFormat did not support this...
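The workaround amounts to splicing a colon into SimpleDateFormat's RFC 822 zone offset, roughly like this sketch (JDK 7 later added the X pattern letter, which handles ISO 8601 offsets directly):

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class XsDateTimeFormatter {
        // SimpleDateFormat's Z emits "+0100", but xs:dateTime needs "+01:00",
        // so a colon is inserted between the zone's hours and minutes.
        public static String format(Date date) {
            String s = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").format(date);
            return s.substring(0, s.length() - 2) + ":" + s.substring(s.length() - 2);
        }

        public static void main(String[] args) {
            System.out.println(format(new Date())); // e.g. 2009-02-28T21:44:00.000+01:00
        }
    }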
I see a good need for a production-strength "xml receiver -> slf4j events" added to the slf4j-ext package (or so) - would it license-wise be possible to adapt your work into this?
My code is licensed under LGPL v3 so this shouldn't be a problem. Since I'm the only developer I could grant any license, anyway. I'm not sure what you mean with "slf4j events", though. You mean "logback" instead of "slf4j", right? No, in the above I actually mean slf4j events. A receiver which accepts incoming events and throws them straight into a slf4j log.debug(....) statement (or whatever level was used) - which to me would be a generic way to glue two separate platforms together.
But slf4j doesn't really define the events, does it? Joern.

Joern Huxhorn skrev:
Why use StAX - you need a SAX parser or something more sophisticated? I am unfamiliar with the project.
Well, I just like the API. See http://www.developer.com/xml/article.php/3397691 for a comparison. While I don't use the skip-feature right now, I like to have that option. I dislike everything that's using a DOM because it has to read the complete XML and keep it in memory... all in all, it's just a matter of personal preference... ;)
Fine with me. DOMs don't scale well, and one of the tricks to speed up XSLTs is to figure out when stuff can be done (before reading all the data in), which is a science in itself.
Since the serialization mechanism in java relies on the exact ordering and type of each field in order to generate and interpret the byte stream without any additional information in the byte stream but the raw values, it makes sense to me that it will break if the class signature changes. It would not be hard to use an interface internally and let multiple versions of the class to be serialized implement it. It is not evil, it is just a trade-off :)
Yes, I understand the reasons for the situation but it doesn't make it easy to read serialized data of previous versions of the class.
Then you need customized serializer and deserializer code, and a version tag in the stream.
That way, the whole logic of transforming a given object to an arbitrary byte array is entirely detached from the class itself. It would also remove the need for LoggingEvent to be aware of persisting at all. The responsibility is entirely in the Serializer/Deserializer implementation, i.e. there's no need for getLoggerContextVO or similar methods.
It is still a pain point. But the version number in the data stream is no worse than the serialID in the stream anyway :)
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable. I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one character tag and attribute names if the data in the fields carry meaning to the human eye :)
Well, I'm using yyyy-MM-dd'T'HH:mm:ss.SSSZ as the primary time stamp format and then apply some magic to change it into a valid xml timestamp, i.e. I add a colon between hh and mm of the timezone. It made me sigh quite a bit when I realized that SimpleDateFormat did not support this...
A valid xml timestamp? According to what dialect? SOAP? Frankly I think that these date objects are expensive to reconstruct - perhaps you should look into this if performance is an issue to you? I believe the date object just serializes the long datestamp, which is much cheaper to deserialize than to deconstruct a string into pieces and build a date/calendar object from it. Note you also need time zone information for this, so it most likely goes through a Calendar.
I see a good need for a production-strength "xml receiver -> slf4j events" added to the slf4j-ext package (or so) - would it license-wise be possible to adapt your work into this?
My code is licensed under LGPL v3 so this shouldn't be a problem. Since I'm the only developer I could grant any license, anyway. I'm not sure what you mean with "slf4j events", though. You mean "logback" instead of "slf4j", right? No, in the above I actually mean slf4j events. A receiver which accepts incoming events and throws them straight into a slf4j log.debug(....) statement (or whatever level was used) - which to me would be a generic way to glue two separate platforms together.
But slf4j doesn't really define the events, does it?
You have a point there. It doesn't. It is the logging backend that does :)
Sigh. I just hoped for a low entry solution here. Back to the brainstorming tank. -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On 01.03.2009, at 22:18, Thorbjoern Ravn Andersen wrote:
Joern Huxhorn skrev:
Why use StAX - you need a SAX parser or something more sophisticated? I am unfamiliar with the project.
Well, I just like the API. See http://www.developer.com/xml/article.php/3397691 for a comparison. While I don't use the skip-feature right now, I like to have that option. I dislike everything that's using a DOM because it has to read the complete XML and keep it in memory... all in all, it's just a matter of personal preference... ;)
Fine with me. DOMs don't scale well, and one of the tricks to speed up XSLTs is to figure out when stuff can be done (before reading all the data in), which is a science in itself.
It's also impossible to read invalid XML if you are using DOM. I'd like to be able to read an XML file which is only missing the last closing tag, for example.
Since the serialization mechanism in java relies on the exact ordering and type of each field in order to generate and interpret the byte stream without any additional information in the byte stream but the raw values, it makes sense to me that it will break if the class signature change. It would not be hard to use an interface internally and let multiple versions of the class to serialize implement it. It is not an evil, it is just a trade off :)
Yes, I understand the reasons for the situation but it doesn't make it easy to read serialized data of previous versions of the class. Then you need customized serializer and deserializer code, and a version tag in the stream.
That way, the whole logic of transforming a given object to an arbitrary byte array is entirely detached from the class itself. It would also remove the need for LoggingEvent to be aware of persisting at all. The responsibility is entirely in the Serializer/Deserializer implementation, i.e. there's no need for getLoggerContextVO or similar methods.
It is still a pain point. But the version number in the data stream is no worse than the serialID in the stream anyway :)
Well, a stream could contain both LoggingEvent0916 and LoggingEvent0918 and the Deserializer would produce the correct output using instanceof and some defined conversion. The point is that the old class would not be lost, as is the case if just the serialVersionUID is changed. It could therefore still be deserialized.
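A minimal sketch of that idea - the class names follow the LoggingEvent0916/0918 example above, while the fields and conversion are invented for illustration:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;
    import java.io.StreamCorruptedException;

    class LoggingEvent0916 implements Serializable {
        private static final long serialVersionUID = 1L;
        String message;
    }

    class LoggingEvent0918 implements Serializable {
        private static final long serialVersionUID = 1L;
        String messagePattern; // the incompatible change introduced in "0.9.18"
    }

    public class VersionedDeserializer {
        // old instances are upgraded on the fly; the old class itself is
        // never changed again, so old streams always stay readable
        public LoggingEvent0918 deserialize(byte[] bytes)
                throws IOException, ClassNotFoundException {
            Object o = new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
            if (o instanceof LoggingEvent0918) {
                return (LoggingEvent0918) o; // current version, nothing to do
            }
            if (o instanceof LoggingEvent0916) { // old version: convert old -> new
                LoggingEvent0918 ev = new LoggingEvent0918();
                ev.messagePattern = ((LoggingEvent0916) o).message;
                return ev;
            }
            throw new StreamCorruptedException("unexpected type: " + o.getClass().getName());
        }
    }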
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable. I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one character tag and attribute names if the data in the fields carry meaning to the human eye :)
Well, I'm using yyyy-MM-dd'T'HH:mm:ss.SSSZ as the primary time stamp format and then apply some magic to change it into a valid xml timestamp, i.e. I add a colon between hh and mm of the timezone. It made me sigh quite a bit when I realized that SimpleDateFormat did not support this...
A valid xml timestamp? According to what dialect? SOAP? Frankly I think that these date objects are expensive to reconstruct - perhaps you should look into this if performance is an issue to you? I believe the date object just serializes the long datestamp which is much cheaper to deserialize than to deconstruct a string into pieces and build a date/calendar object from it. Note you also need timestamp information for this so it most likely goes through a Calendar.
Point taken, I wasn't really thinking about that when I was implementing it, I primarily wanted to do it *right* ;) It's an xs:dateTime (http://www.w3.org/TR/xmlschema-2/#dateTime) which seemed reasonable to me. xs:dateTime expects a colon between timezone hh and mm while SimpleDateFormat only supports hhmm :p I implemented your suggestion and it's - unsurprisingly - deserializing quite a bit faster, now. By default, I write both timeStamp (formatted) and timeStampMillis but I only evaluate timeStampMillis if it's available and a long, otherwise I evaluate timeStamp as before (sketched below). Now I just need to update the schema... and I really need to implement an EventWrapper Serializer that's using my xml so I can use it as the Lilith file format...! Performance and size are absolutely acceptable. I moved the benchmarks to http://apps.sourceforge.net/trac/lilith/wiki/SerializationPerformance so I don't have to change the closed bug all the time :p
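The reading rule described here amounts to a simple fallback (attribute handling and names are assumed for illustration):

    import java.text.ParseException;
    import java.text.SimpleDateFormat;

    public class TimestampResolver {
        // prefer the cheap numeric timeStampMillis; fall back to parsing the
        // formatted timeStamp only when the millis value is absent or invalid
        public static long resolve(String timeStampMillis, String timeStamp)
                throws ParseException {
            if (timeStampMillis != null) {
                try {
                    return Long.parseLong(timeStampMillis);
                } catch (NumberFormatException ignored) {
                    // not a long - use the formatted representation instead
                }
            }
            // strip the xs:dateTime colon from the zone ("+01:00" -> "+0100")
            String rfc822 = timeStamp.replaceFirst("([+-]\\d\\d):(\\d\\d)$", "$1$2");
            return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parse(rfc822).getTime();
        }
    }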
I see a good need for a production-strength "xml receiver -> slf4j events" added to the slf4j-ext package (or so) - would it license-wise be possible to adapt your work into this?
My code is licensed under LGPL v3 so this shouldn't be a problem. Since I'm the only developer I could grant any license, anyway. I'm not sure what you mean with "slf4j events", though. You mean "logback" instead of "slf4j", right? No, in the above I actually mean slf4j events. A receiver which accepts incoming events and throws them straight into a slf4j log.debug(....) statement (or whatever level was used) - which to me would be a generic way to glue two separate platforms together.
But slf4j doesn't really define the events, does it? You have a point there. It doesn't. It is the logging backend that does :)
Sigh. I just hoped for a low entry solution here. Back to the brainstorming tank.
While such a bridge would certainly have it's use case, quite some information would be lost while tunneling the events if they are tunneled using the logger interface. Thread, code location etc... Joern.

Joern Huxhorn skrev:
Well, a stream could contain both LoggingEvent0916 and LoggingEvent0918 and the Deserializer would produce the correct output using instanceof and some defined conversion. The point is that the old class would not be lost, as is the case if just the serialVersionUID is changed. It could therefore still be deserialized.
After thinking it over I believe that this is the exact reason why XMLEncoder/XMLDecoder was introduced in the first place - the ability to read in old classes.
I don't think that it is possible to get it to work _well_ but I'm always happy for a good discussion.
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable. I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one character tag and attribute names if the data in the fields carry meaning to the human eye :)
Well, I'm using yyyy-MM-dd'T'HH:mm:ss.SSSZ as the primary time stamp format and then apply some magic to change it into a valid xml timestamp, i.e. I add a colon between hh and mm of the timezone. It made me sigh quite a bit when I realized that SimpleDateFormat did not support this...
A valid xml timestamp? According to what dialect? SOAP? Frankly I think that these date objects are expensive to reconstruct - perhaps you should look into this if performance is an issue to you? I believe the date object just serializes the long datestamp which is much cheaper to deserialize than to deconstruct a string into pieces and build a date/calendar object from it. Note you also need timestamp information for this so it most likely goes through a Calendar.
Point taken, I wasn't really thinking about that when I was implementing it, I primarily wanted to do it *right* ;)
It's an xs:dateTime (http://www.w3.org/TR/xmlschema-2/#dateTime) which seemed reasonable to me. xs:dateTime expects a colon between timezone hh and mm while SimpleDateFormat only supports hhmm :p
I implemented your suggestion and it's - unsurprisingly - deserializing quite a bit faster, now.
Hehe, if I read your tables correctly the lilithXmlUncompressedDeserializer went from 1.974544 sec to 0.497055 sec, which is four times faster, so "quite a bit" is quite a bit of an understatement :)
Also the value is very close to the 0.493445 value for serializationUncompressedDeserializer so it appears that the two approaches have about the same runtime performance.
Now I just need to update the schema... and I really need to implement an EventWrapper Serializer that's using my xml so I can use it as the Lilith file format...! Performance and size are absolutely acceptable.
Now I just need to bully you into shortening all tags and attribute names into one character and skipping any namespace prefixes :) Then we agree O:-)
I moved the benchmarks to http://apps.sourceforge.net/trac/lilith/wiki/SerializationPerformance so I don't have to change the closed bug all the time :p
You _could_ also just reopen the bug :)
-- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

I think I've lost the point of this discussion somewhere. The subject says something about submitting events remotely yet this discussion seems to be totally about serialization. If it is really about something like a "service" to submit events then I would suggest looking at Spring remoting and some of the protocols it supports - such as Hessian or Burlap. I would argue that a discussion about how best to serialize an object is pointless without having first decided on what the service API is. For example, are you presuming that one system will log to an Appender that will forward to a server that will turn around and log the event again? Or perhaps an Appender would just forward the event to an Appender on the remote system? Or, using Spring Remoting, one could imagine that the local Appender is just a client stub generated by Spring forwarding to the "real" Appender somewhere else. Ralph On Mar 1, 2009, at 4:23 PM, Thorbjoern Ravn Andersen wrote:
Joern Huxhorn skrev:
Well, a stream could contain both LoggingEvent0916 and LoggingEvent0918 and the Deserializer would produce the correct output using instanceof and some defined conversion. The point is that the old class would not be lost, as is the case if just the serialVersionUID is changed. It could therefore still be deserialized. After thinking it over I believe that this is the exact reason why XMLEncoder/XMLDecoder was introduced in the first place - the ability to read in old classes.
I don't think that it is possible to get it to work _well_ but I'm always happy for a good discussion.
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable. I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one character tag and attribute names if the data in the fields carry meaning to the human eye :)
Well, I'm using yyyy-MM-dd'T'HH:mm:ss.SSSZ as the primary time stamp format and then apply some magic to change it into a valid xml timestamp, i.e. I add a colon between hh and mm of the timezone. It made me sigh quite a bit when I realized that SimpleDateFormat did not support this...
A valid xml timestamp? According to what dialect? SOAP? Frankly I think that these date objects are expensive to reconstruct - perhaps you should look into this if performance is an issue to you? I believe the date object just serializes the long datestamp which is much cheaper to deserialize than to deconstruct a string into pieces and build a date/calendar object from it. Note you also need timestamp information for this so it most likely goes through a Calendar.
Point taken, I wasn't really thinking about that when I was implementing it, I primarily wanted to do it *right* ;)
It's an xs:dateTime (http://www.w3.org/TR/xmlschema-2/#dateTime) which seemed reasonable to me. xs:dateTime expects a colon between timezone hh and mm while SimpleDateFormat only supports hhmm :p
I implemented your suggestion and it's - unsurprisingly - deserializing quite a bit faster, now. Hehe, if I read your tables correctly the lilithXmlUncompressedDeserializer went from 1.974544 sec to 0.497055 sec, which is four times faster so "quite a bit" is quite a bit of an understatement :)
Also the value is very close to the 0.493445 value for serializationUncompressedDeserializer so it appears that the two approaches have about the same runtime performance.
Now I just need to update the schema... and I really need to implement an EventWrapper Serializer that's using my xml so I can use it as the Lilith file format...! Performance and size are absolutely acceptable.
Now I just need to bully you into shortening all tags and attribute names into one character and skipping any namespace prefixes :) Then we agree O:-)
I moved the benchmarks to http://apps.sourceforge.net/trac/lilith/wiki/SerializationPerformance so I don't have to change the closed bug all the time :p You _could_ also just reopen the bug :)
-- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

Ralph Goers skrev:
I think I've lost the point of this discussion somewhere. The subject says something about submitting events remotely yet this discussion seems to be totally about serialization. If it is really about something like a "service" to submit events then I would suggest looking at Spring remoting and some of the protocols it supports - such as Hessian or Burlap. I would argue that a discussion about how best to serialize an object is pointless without having first decided on what the service API is. For example, are you presuming that one system will log to an Appender that will forward to a server that will turn around and log the event again? Or perhaps an Appender would just forward the event to an Appender on the remote system? Or, using Spring Remoting, one could imagine that the local Appender is just a client stub generated by Spring forwarding to the "real" Appender somewhere else.
I think the reason was that I asked Jörn to share his experiences with the appenders in Lilith and that I could not understand his conclusion :)
What I am trying to get to is a simple way to "magically" transport a logging event from one instance of logback to another, where it would be processed with filters etc. as any other event originated on the instance itself. The platform agnosticism implies that Java serialization is not trivial to use, hence the discussion with Jörn... Would Spring Remoting imply that Java is required? -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On Mar 1, 2009, at 11:38 PM, Thorbjoern Ravn Andersen wrote:
Ralph Goers skrev:
I think I've lost the point of this discussion somewhere. The subject says something about submitting events remotely yet this discussion seems to be totally about serialization. If it is really about something like a "service" to submit events then I would suggest looking at Spring remoting and some of the protocols it supports - such as Hessian or Burlap. I would argue that a discussion about how best to serialize an object is pointless without having first decided on what the service API is. For example, are you presuming that one system will log to an Appender that will forward to a server that will turn around and log the event again? Or perhaps an Appender would just forward the event to an Appender on the remote system? Or, using Spring Remoting, one could imagine that the local Appender is just a client stub generated by Spring forwarding to the "real" Appender somewhere else. I think the reason was that I asked Jörn to share his experiences with the appenders in Lilith and that I could not understand his conclusion :)
What I am trying to get to is a simple way to "magically" transport a logging event from one instance of logback to another, where it would be processed with filters etc. as any other event originated on the instance itself. The platform agnosticism implies that Java serialization is not trivial to use, hence the discussion with Jörn...
Would Spring Remoting imply that Java is required?
Not necessarily. It depends on the transport protocol you use. For example, both Hessian (http://hessian.caucho.com/) and Burlap (http://hessian.caucho.com/doc/burlap.xtp) are protocols that could be used from any language. If you look at Hessian you will see that Caucho even provides implementations for many languages. Spring remoting leverages the Caucho Java classes under the covers. See http://static.springframework.org/spring/docs/2.5.x/reference/remoting.html. Ralph
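To make the Hessian option concrete, client usage looks roughly like this - the RemoteEventService interface and the URL are invented for illustration; only HessianProxyFactory is actual Caucho API:

    import com.caucho.hessian.client.HessianProxyFactory;

    public class HessianClientSketch {
        // hypothetical service API - what "the service" could look like
        public interface RemoteEventService {
            void append(String loggerName, String level, String message);
        }

        public static void main(String[] args) throws Exception {
            HessianProxyFactory factory = new HessianProxyFactory();
            // Hessian builds a client stub for the interface; the server side
            // would expose the same interface as a Hessian servlet
            RemoteEventService service = (RemoteEventService) factory.create(
                    RemoteEventService.class, "http://localhost:8080/events");
            service.append("com.example.Foo", "DEBUG", "hello from a remote client");
        }
    }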

Hello, Yesterday I created a logging.proto file for serializing a Logback LoggingEvent. Still have to do the benchmarks, but based on [1] and [2] I expect it to be faster than java serialization. [1] http://www.eishay.com/2008/11/serialization-protobuf-vs-thrift-vs.html [2] http://www.eishay.com/2008/11/protobuf-with-option-optimize-for-speed.html WDYT ? Maarten On Mon, Mar 2, 2009 at 8:58 AM, Ralph Goers <rgoers@apache.org> wrote:
On Mar 1, 2009, at 11:38 PM, Thorbjoern Ravn Andersen wrote:
Ralph Goers skrev:
I think I've lost the point of this discussion somewhere. The subject says something about submitting events remotely yet this discussion seems to be totally about serialization. If it is really about something like a "service" to submit events then I would suggest looking at Spring remoting and some of the protocols it supports - such as Hessian or Burlap. I would argue that a discussion about how best to serialize an object is pointless without having first decided on what the service API is. For example, are you presuming that one system will log to an Appender that will forward to a server that will turn around and log the event again? Or perhaps an Appender would just forward the event to an Appender on the remote system? Or, using Spring Remoting, one could imagine that the local Appender is just a client stub generated by Spring forwarding to the "real" Appender somewhere else.
I think the reason was that I asked Jörn to share his experiences with the appenders in Lilith and that I could not understand his conclusion :)
What I am trying to get to is a simple way to "magically" transport a logging event from one instance of logback to another, where it would be processed with filters etc. as any other event originated on the instance itself. The platform agnosticism implies that Java serialization is not trivial to use, hence the discussion with Jörn...
Would Spring Remoting imply that Java is required?
Not necessarily. It depends on the transport protocol you use. For example, both Hessian (http://hessian.caucho.com/) and Burlap (http://hessian.caucho.com/doc/burlap.xtp) are protocols that could be used from any language. If you look at Hessian you will see that Caucho even provides implementations for many languages. Spring remoting leverages the Caucho Java classes under the covers. See http://static.springframework.org/spring/docs/2.5.x/reference/remoting.html.
Ralph

Maarten Bosteels wrote:
Hello,
Yesterday, I created a logging.proto file for serializing a Logback LoggingEvent. Still have to do the benchmarks, but based on [1] and [2] I expect it to be faster than java serialization.
[1] http://www.eishay.com/2008/11/serialization-protobuf-vs-thrift-vs.html [2] http://www.eishay.com/2008/11/protobuf-with-option-optimize-for-speed.html
WDYT ?
Maarten
That looks very interesting, especially [2]! I didn't know about Protobuf and Thrift. Care to share the proto file? Joern.

On Mon, Mar 2, 2009 at 10:34 AM, Joern Huxhorn <jhuxhorn@googlemail.com>wrote:
Maarten Bosteels wrote:
Hello,
Yesterday, I created a logging.proto file for serializing a Logback LoggingEvent. Still have to do the benchmarks, but based on [1] and [2] I expect it to be faster than java serialization.
[1] http://www.eishay.com/2008/11/serialization-protobuf-vs-thrift-vs.html [2]
http://www.eishay.com/2008/11/protobuf-with-option-optimize-for-speed.html
WDYT ?
Maarten
That looks very interesting, especially [2]! I didn't know about Protobuf and Thrift. Care to share the proto file?
Sure, will send it to the list tonight (haven't got access to it right now). As I wrote earlier, I think protobuf (and Thrift) have some very cool features:
* you can update messages without breaking old code [a]
* faster than XML
* smaller than XML
* binary, but with a convenient human-readable representation for debugging and editing
* language-neutral, platform-neutral

[a] http://code.google.com/apis/protocolbuffers/docs/proto.html#updating

Maarten
Joern.

Hello all,

Several fairly broad topics have been mentioned in this thread. One is communication strategies which can deal with arbitrary network or host failures. Another is the RPC mechanism, yet another is the question of data encoding, which preferably would be easy to use, language-agnostic, resistant to changes in the data (long-term version stability), fast, small and human-readable. Those who still believe in Santa would like to have all these properties at the same time. Each of these topics is very broad on its own. I think that focusing on a single topic at a time will improve the chances of getting a concrete result from the discussion.

Regarding plain old java serialization, I think it is more robust than many people think. As long as the serialVersionUID of the writer (let's call it W) and the reader (call it R) are the same, java serialization can handle the addition of a field in W missing in R, as well as missing fields in W added in R. In many respects java serialization is more robust than XML, especially if you turn on DTD validation in XML. I found the object serialization examples in [1] and especially [2] very useful.

[1] http://java.sun.com/j2se/1.4.2/docs/guide/serialization/examples/index.html
[2] http://java.sun.com/j2se/1.4.2/docs/guide/serialization/examples/evolveseria...

Logback takes advantage of the optimized way java serialization deals with object references. Many references contained in a LoggingEvent (now LoggingEventVO) are transferred by reference instead of by value. I was under the impression that serialization of LoggingEvents was already very fast. However, if we can significantly improve on the existing, so much the better.

As there are certainly pros and cons for each serialization approach, instead of the debate eventually degenerating into a religious argument, we are likely to be better served by basing comparisons on the same logging event data collection, which in the compression world is called a "corpus". At the present time, we do not have a logging event corpus. Just as importantly, logback currently lacks a format for storing the said corpus. Given that this corpus will serve as a yardstick for a long time, and performance is not an issue, a human-readable text format such as XML seems like a reasonable choice.

Is anyone interested in providing a corpus? Is anyone interested in writing a LoggingEventVO to XML and XML to LoggingEvent converter(s)?

To answer some of Joern's questions, I would like to note that logback-classic, and in particular ILoggingEvent, does not contain any java serialization-specific methods. Serialization is now an appender issue not handled by ILoggingEvent. Regarding old events, I would very much like to come up with a strategy for supporting "old" versions of serialized logging events. For example, by having versioned serializer/deserializers, e.g. LoggingEvent0916 and LoggingEvent0918. It's a highly technical issue which would need to be decided by actual implementation.

Having said that, defining a corpus seems to me to be the most pressing issue at this time. Once we settle on a corpus, we can more objectively debate the merits of such and such serialization strategy. More below.

Maarten Bosteels wrote:
Sure, will send it to the list tonight (haven't got access to it right now).
As I wrote earlier, I think protobuf (and Thrift) have some very cool features:
* you can update messages without breaking old code [a]
* faster than XML
* smaller than XML
* binary, but with a convenient human-readable representation for debugging and editing
* language-neutral, platform-neutral
It looks too good to be true. Does Santa exist after all?
[a] http://code.google.com/apis/protocolbuffers/docs/proto.html#updating
Maarten
-- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch
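The compatibility Ceki describes can be demonstrated in a few lines - class and field names here are invented; the key point is the fixed serialVersionUID:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    class Event implements Serializable {
        private static final long serialVersionUID = 1L; // keep this stable across versions
        String message;
        String addedLater; // a field the writer may not know about; streams written
                           // before it existed still deserialize and leave it null
        Event(String message) { this.message = message; }
    }

    public class EvolutionDemo {
        public static void main(String[] args) throws Exception {
            // round-trip through java serialization; with an unchanged
            // serialVersionUID the same works across class versions
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new ObjectOutputStream(buf).writeObject(new Event("hello"));
            Event back = (Event) new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray())).readObject();
            System.out.println(back.message + " / " + back.addedLater);
        }
    }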

The only response I really can have to this is that it is focusing on the wrong thing. If the desire is to allow Logback to accept remote LogEvents then what should be identified first is the service that would be used. Once you have that you will determine just how you want to implement the service. For that matter, it might be decided to accept multiple ways to connect. This is the only place serialization should come into the picture - simply as a byproduct of what you choose to expose. And in many cases users of the service shouldn't or wouldn't even know what serialization technique is being used. They just call the client API, passing in their language's equivalent of the LogEvent. Ralph On Mar 2, 2009, at 5:21 AM, Ceki Gulcu wrote:
Hello all,
Several fairly broad topics have been mentioned in this thread. One is communication strategies which can deal with arbitrary network or host failures. Another is the RPC mechanism, yet another is the question of data encoding, which preferably would be easy to use, language-agnostic, resistant to changes in the data (long-term version stability), fast, small and human-readable. Those who still believe in Santa would like to have all these properties at the same time.
Each of these topics is very broad on its own. I think that focusing on a single topic at a time will improve the chances of getting a concrete result from the discussion.
Regarding plain old java serialization, I think it is more robust than many people think. As long as the serialVersionUID of the writer (let's call it W) and the reader (call it R) are the same, java serialization can handle the addition of a field in W missing in R, as well as missing fields in W added in R. In many respects java serialization is more robust than XML, especially if you turn on DTD validation in XML.
I found the object serialization examples in [1] and especially [2] very useful.
[1] http://java.sun.com/j2se/1.4.2/docs/guide/serialization/examples/index.html [2] http://java.sun.com/j2se/1.4.2/docs/guide/serialization/examples/evolveseria...
Logback takes advantage of the optimized way java serialization deals with object references. Many references contained in a LoggingEvent (now LoggingEventVO) are transferred by reference instead of by value. I was under the impression that serialization of LoggingEvents was already very fast. However, if we can significantly improve on the existing, so much the better.
As there are certainly pros and cons for each serialization approach, instead of the debate eventually degenerating into a religious argument, we are likely to be better served by basing comparisons on the same logging event data collection, which in the compression world is called a "corpus". At the present time, we do not have a logging event corpus. Just as importantly, logback currently lacks a format for storing the said corpus. Given that this corpus will serve as a yardstick for a long time, and performance is not an issue, a human-readable text format such as XML seems like a reasonable choice.
Is anyone interested in providing a corpus?
Is anyone interested in writing a LoggingEventVO to XML and XML to LoggingEvent converter(s)?
To answer some of Joern's questions, I would like to note that logback-classic, and in particular ILoggingEvent, does not contain any java serialization-specific methods. Serialization is now an appender issue not handled by ILoggingEvent.
Regarding old events, I would very much like to come up with a strategy for supporting "old" versions of serialized logging events. For example, by having versioned serializer/deserializers, e.g. LoggingEvent0916 and LoggingEvent0918. It's a highly technical issue which would need to be decided by actual implementation.
Having said that, defining a corpus seems to me to be the most pressing issue at this time.
Once we settle on a corpus, we can more objectively debate the merits of such and such serialization strategy.
More below.
Maarten Bosteels wrote:
Sure, will send it to the list tonight (haven't got access to it right now). As I wrote earlier, I think protobuf (and Thrift) have some very cool features:
* you can update messages without breaking old code [a]
* faster than XML
* smaller than XML
* binary, but with a convenient human-readable representation for debugging and editing
* language-neutral, platform-neutral
It looks too good to be true. Does Santa exist after all?
[a] http://code.google.com/apis/protocolbuffers/docs/proto.html#updating Maarten
-- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch

Hello Ralph, In a different context I would have completely agreed with you. In this case however, the service is already known. It's always assumed to be a variant of "send this logging event there," is it not? Ralph Goers wrote:
The only response I really can have to this is that it is focusing on the wrong thing. If the desire is to allow Logback to accept remote LogEvents, then what should be identified first is the service that would be used. Once you have that, you will determine just how you want to implement the service. For that matter, it might be decided to accept multiple ways to connect. This is the only place serialization should come into the picture - simply as a byproduct of what you choose to expose. And in many cases users of the service shouldn't or wouldn't even know what serialization technique is being used. They just call the client API, passing in their language's equivalent of the LogEvent.
Ralph -- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch

Exactly. We are just trying to find a less fragile approach than serialization for the events sent using SocketAppender (or my multiplex appenders). Ideally, it should be platform- and programming-language-neutral and shouldn't perform much worse than java serialization. protobuf seems to be an ideal candidate from what I've read so far. I'll definitely give it a try. While this is just a subset of "send this logging event there", I had the impression that the question of transport was already solved, more or less. There's little alternative to a message-based approach (an int containing the size of the data, followed by the data bytes) if Java serialization is not used. Any kind of RMI will *definitely* be slower than this approach, and we should evaluate it as an additional possibility after implementing the high-speed one. Joern. Ceki Gulcu wrote:
Hello Ralph,
In a different context I would have completely agreed with you. In this case however, the service is already known. It's always assumed to be a variant of "send this logging event there," is it not?
Ralph Goers wrote:
The only response I really can have to this is that it is focusing on the wrong thing. If the desire is to allow Logback to accept remote LogEvents, then what should be identified first is the service that would be used. Once you have that, you will determine just how you want to implement the service. For that matter, it might be decided to accept multiple ways to connect. This is the only place serialization should come into the picture - simply as a byproduct of what you choose to expose. And in many cases users of the service shouldn't or wouldn't even know what serialization technique is being used. They just call the client API, passing in their language's equivalent of the LogEvent.
Ralph
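A minimal sketch of the message-based framing Joern describes above (an int length prefix followed by the payload bytes); the class and method names are illustrative:

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Length-prefixed framing: [int length][length bytes of payload].
    class EventFraming {

        // Writer side: prefix each encoded event with its size.
        static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
        }

        // Reader side: read back exactly one framed event.
        static byte[] readFrame(DataInputStream in) throws IOException {
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            return payload;
        }
    }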

On Mar 2, 2009, at 8:33 AM, Joern Huxhorn wrote:
Exactly. We are just trying to find a less fragile approach than serialization for the events sent using SocketAppender (or my multiplex appenders). Ideally, it should be platform- and programming-language-neutral and shouldn't perform much worse than java serialization. protobuf seems to be an ideal candidate from what I've read so far. I'll definitely give it a try.
This is a completely different problem than what the subject line implies. Are you presupposing that the decision to use the SocketAppender to get a LogEvent into a remote Logback has already been made? Even so, what service in the remote system would the Appender call? AFAIK there is nothing in Logback today to handle that. The way I read the subject line is that "something" wants to act as a Logback remote client and feed events into a system where Logback is running. The client might be an appender or something completely different.
While this is just a subset of "send this logging event there", I had the impression that the question of transport was already solved, more or less. There's little alternative to a message-based approach (an int containing the size of the data, followed by the data bytes) if Java serialization is not used.
Here you seem to be making the classic confusion between messaging and RPC. The only real difference is that RPC contains information about what service to invoke. In actuality many "messaging" systems do the exact same thing. They just embed the name of the service in the message and use their own proprietary logic to figure out what to do with the message based on the data received instead of just using something standard.
Any kind of RMI will *definitely* be slower than this approach and we should evaluate it as an additional possibility after implementing the high-speed one.
Who said anything about RMI? Ralph

Ralph, You are right to note that the subject line is more general than the issue of data encoding, a.k.a. data binding. Joern and I seem to be focused on the data binding part of the general problem because, as Joern observes, any generic RPC-type solution is bound to be slower than a simple TCP socket approach, which implies zero protocol overhead. The socket appender approach might not be the most user-friendly or reliable, but it is bound to be the fastest. The only remaining issue then (having fixed the transport mechanism as a basic TCP socket) is data binding. What is the essence of your objection? Do you think we are wasting our time or is there something else? :-) Ralph Goers wrote:
On Mar 2, 2009, at 8:33 AM, Joern Huxhorn wrote:
Exactly. We are just trying to find a less fragile approach than serialization for the events sent using SocketAppender (or my multiplex appenders). Ideally, it should be platform- and programming-language-neutral and shouldn't perform much worse than java serialization. protobuf seems to be an ideal candidate from what I've read so far. I'll definitely give it a try.
This is a completely different problem than what the subject line implies. Are you presupposing that the decision to use the SocketAppender to get a LogEvent into a remote Logback has already been made? Even so, what service in the remote system would the Appender call? AFAIK there is nothing in Logback today to handle that.
The way I read the subject line is that "something" wants to act as a Logback remote client and feed events into a system where Logback is running. The client might be an appender or something completely different.
While this is just a subset of "send this logging event there", I had the impression that the question of transport was already solved, more or less. There's little alternative to a message-based approach (an int containing the size of the data, followed by the data bytes) if Java serialization is not used.
Here you seem to be making the classic confusion between messaging and RPC. The only real difference is that RPC contains information about what service to invoke. In actuality many "messaging" systems do the exact same thing. They just embed the name of the service in the message and use their own proprietary logic to figure out what to do with the message based on the data received instead of just using something standard.
Any kind of RMI will *definitely* be slower than this approach and we should evaluate it as an additional possibility after implementing the high-speed one.
Who said anything about RMI?
Ralph
-- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch

On Mar 2, 2009, at 11:51 AM, Ceki Gulcu wrote:
Ralph,
You are right to note that the subject line is more general than the issue of data encoding, a.k.a. data binding. Joern and I seem to be focused on the data binding part of the general problem because, as Joern observes, any generic RPC-type solution is bound to be slower than a simple TCP socket approach, which implies zero protocol overhead. The socket appender approach might not be the most user-friendly or reliable, but it is bound to be the fastest. The only remaining issue then (having fixed the transport mechanism as a basic TCP socket) is data binding.
What is the essence of your objection? Do you think we are wasting our time or is there something else? :-)
I'm simply trying to understand what problem you are solving. I also think your assumption regarding the socket approach being the fastest is correct only under the simplest of circumstances. Finally, while the SocketAppender definitely clears up the client side, I haven't seen any discussion about what it is connecting to except that it is a Logback running remotely. Frankly, that is the part I find most important and why I'm making any kind of issue out of this. If you are simply looking to improve the existing SocketAppender without providing the other end of the connection, then I apologize and you can ignore everything I've said.

Let's assume you use a SocketAppender and connect to something listening for some kind of serialized LogEvent. What if the client wants to dictate to the server whether it should get control back immediately after the event is received or wait until the event is processed? What if I want to communicate the locale or timezone of the client to the server? I'm pulling this off the top of my head, but hopefully you get the idea that by specifying the service interface you get more flexibility for enhancement down the road. And the cost of serializing a method call where one of the parameters is the LogEvent probably isn't even measurable vs. the cost of serializing the event itself.

So my point is simply that if I was doing this, I would have the SocketAppender use the service interface to establish the connection with whatever parameters are desired, and then use the service to invoke the method to send the event. Doing this also allows you to create different appenders that use the same service, one of which might be on top of Spring remoting so that any of its remoting implementations could be used. On the server side they would call the same implementation class as what the SocketAppender ends up interacting with. Does that clear it up? Ralph

Hi Ralph, Ralph Goers wrote:
I'm simply trying to understand what problem you are solving. I also think your assumption regarding the socket approach being the fastest is correct only under the simplest of circumstances. Finally, while the SocketAppender definitely clears up the client side, I haven't seen any discussion about what it is connecting to except that it is a Logback running remotely.

A remotely running Logback was just one of Thorbjoern's ideas. The main point of *this* thread - at least as I understood it ;) - is to find the "best" solution to encode and decode Logback events.
And "best" is defined as "the most efficient way - preferably platform agnostic".
Frankly, that is the part I find most important and why I'm making any kind of issue out of this. If you are simply looking to improve the existing SocketAppender without providing the other end of the connection, then I apologize and you can ignore everything I've said.

I, for one, will implement both sides in Lilith (the SocketReceiver) and the related multiplex appenders. I'll also try to implement anything that is done in Logback. I'm quite interested in this topic because I'll also use the best solution as my native file format. It looks like it will be protobuf, but I haven't finished the serializer/deserializer yet. The generated API is very nice, btw.

Let's assume you use a SocketAppender and connect to something listening for some kind of serialized LogEvent. What if the client wants to dictate to the server whether it should get control back immediately after the event is received or wait until the event is processed?

I think that isn't the task of the client (the logged application) at all. Logging should simply log; everything else is, more or less, a view or usage of the event and should be handled on the receiver side.

What if I want to communicate the locale or timezone of the client to the server?

I've lost you there. You want to communicate the locale or timezone of the client (the logged application) to the server (the receiver of the events)? I know it's just an example, but I want to make sure I understood it correctly.

I'm pulling this off the top of my head, but hopefully you get the idea that by specifying the service interface you get more flexibility for enhancement down the road.

There *is* no service interface. Any type of service in addition to simply receiving events would create a delay.
Joern.
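To illustrate the generated-API style Joern praises, a hedged sketch of typical protobuf builder usage; the LoggingEventProto message type and its fields are stand-ins, not Lilith's actual schema:

    import com.google.protobuf.InvalidProtocolBufferException;

    // Hypothetical use of a protobuf-generated message class: build,
    // serialize to the compact wire format, and parse back.
    class ProtobufRoundTrip {

        static byte[] encode() {
            LoggingEventProto event = LoggingEventProto.newBuilder()
                    .setTimeStamp(System.currentTimeMillis())
                    .setLoggerName("com.example.Foo")
                    .setMessage("Hello, world")
                    .build();
            return event.toByteArray();
        }

        static LoggingEventProto decode(byte[] bytes)
                throws InvalidProtocolBufferException {
            return LoggingEventProto.parseFrom(bytes);
        }
    }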

Joern Huxhorn skrev:
And "best" is defined as "the most efficient way - preferably platform agnostic".
Ralph is probably hinting that "efficient" is not necessarily "fastest". I think the core question here is whether anything other than the raw events needs to be communicated. (Perhaps exchanging timestamps to allow the receiver to normalize, or some handshake to ensure that both ends are trusted?) Personally I'd like robustness and flexibility. Basically my scenario is to have a production server, attach a receiver, grab some log data, and detach again - without notifying or restarting the production server, and without influencing the application being logged. An analogue would be a Quake 1 multiplayer scenario where players can enter and leave at will, instead of all having to be ready at the start of the game, which is what Doom and DukeNukem did. An application restart is rather inefficient :)
Frankly, that is the part I find most important and why I'm making any kind of issue out of this. If you are simply looking to improve the existing SocketAppender without providing the other end of the connection, then I apologize and you can ignore everything I've said.
I, for one, will implement both sides in Lilith (the SocketReceiver) and the related multiplex appenders. I'll also try to implement anything that is done in Logback. I'm quite interested in this topic because I'll also use the best solution as my native file format. It looks like it will be protobuf, but I haven't finished the serializer/deserializer yet. The generated API is very nice, btw.
Looking forward to hearing about it. I gather that protobuf is open source (I haven't done anything but skim the webpage). -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On Mar 2, 2009, at 7:36 AM, Ceki Gulcu wrote:
Hello Ralph,
In a different context I would have completely agreed with you. In this case however, the service is already known. It's always assumed to be a variant of "send this logging event there," is it not?
Using what API? (The key here is "assumed to be a variant".) You can't use SLF4J to send a LogEvent. Is there some internal Logback API that would be exposed as a service? Is it part of a class that has other services you wouldn't want to or can't expose? Once you have agreement on what that interface is, then it makes sense to discuss how to transport it. Ralph

The only response I really can have to this is that it is focusing on the wrong thing. If the desire is to allow Logback to accept remote LogEvents, then what should be identified first is the service that would be used. Once you have that, you will determine just how you want to implement the service. For that matter, it might be decided to accept multiple ways to connect. This is the only place serialization should come into the picture - simply as a byproduct of what you choose to expose. And in many cases users of the service shouldn't or wouldn't even know what serialization technique is being used. They just call the client API, passing in their language's equivalent of the LogEvent. Ralph On Mar 2, 2009, at 5:21 AM, Ceki Gulcu wrote:
Hello all,
Several fairly broad topics have been mentioned in this thread. One is communication strategies which can deal with arbitrary network or host failures. Another is the RPC mechanism, and yet another is the question of data encoding, which preferably would be easy to use, language-agnostic, resistant to changes in the data (long-term version stability), fast, small, and human-readable. Those who still believe in Santa would like to have all these properties at the same time.
Each of these topics is very broad on its own. I think that focusing on a single topic at a time will improve the chances of getting a concrete result from the discussion.
Regarding plain old java serialization, I think it is more robust than many people think. As long as the serialVersionUID of the writer (let's call it W) and the reader (call it R) are the same, java serialization can handle fields added in W but missing in R, as well as fields missing in W but added in R. In many respects java serialization is more robust than XML, especially if you turn on DTD validation in XML.
I found the object serialization examples in [1] and especially [2] very useful.
[1] http://java.sun.com/j2se/1.4.2/docs/guide/serialization/examples/index.html [2] http://java.sun.com/j2se/1.4.2/docs/guide/serialization/examples/evolveseria...
Logback takes advantage of the optimized way java serialization deals with object references. Many references contained in a LoggingEvent (now LoggingEventVO) are transferred by reference instead of by value. I was under the impression that serialization of LoggingEvents was already very fast. However, if we can significantly improve on the existing approach, so much the better.
As there are certainly pros and cons for each serialization approach, instead of letting the debate degenerate into a religious argument, we are likely to be better served by basing comparisons on the same collection of logging event data, which in the compression world is called a "corpus". At the present time, we do not have a logging event corpus. Just as importantly, logback currently lacks a format for storing said corpus. Given that this corpus will serve as a yardstick for a long time, and performance is not an issue here, a human-readable text format such as XML seems like a reasonable choice.
Is anyone interested in providing a corpus?
Is anyone interested in writing a LoggingEventVO to XML and XML to LoggingEvent converter(s)?
To answer some of Joern's questions, I would like to note that logback-classic, and in particular ILoggingEvent, does not contain any java serialization-specific methods. Serialization is now an appender issue not handled by ILoggingEvent.
Regarding old events, I would very much like to come up with a strategy for supporting "old" versions of serialized logging events, for example by having versioned serializers/deserializers, e.g. LoggingEvent0916 and LoggingEvent0918. It's a highly technical issue which would need to be decided by actual implementation.
Having said that, defining a corpus seems to me as being the most pressing issue at this time.
Once we settle on a corpus, we can more objectively debate the merits of such and such serialization strategy.
More below.
Maarten Bosteels wrote:
Sure, will send it to the list tonight (haven't got access to it right now). As I wrote earlier, I think protobuf (and Thrift) have some very cool features:
* you can update messages without breaking old code [a]
* faster than XML
* smaller than XML
* binary, but with a convenient human-readable representation for debugging and editing
* language-neutral, platform-neutral
It looks too good to be true. Does Santa exist after all?
[a] http://code.google.com/apis/protocolbuffers/docs/proto.html#updating Maarten
-- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch

Ceki Gulcu skrev:
As there are certainly pros and cons for each serialization approach, instead of letting the debate degenerate into a religious argument, we are likely to be better served by basing comparisons on the same collection of logging event data, which in the compression world is called a "corpus". At the present time, we do not have a logging event corpus. Just as importantly, logback currently lacks a format for storing said corpus. Given that this corpus will serve as a yardstick for a long time, and performance is not an issue here, a human-readable text format such as XML seems like a reasonable choice.
Is anyone interested in providing a corpus?
....
Having said that, defining a corpus seems to me as being the most pressing issue at this time.
Once we settle on a corpus, we can more objectively debate the merits of such and such serialization strategy.
If I understand you correctly, you basically say there is a need for a standardized set of event data. After thinking this over, it might be better to have code generating the events instead of them being stored statically on disk. This is to avoid setting any API in stone except the slf4j interface, which by now should be settled. (What if the internal representation of a stack trace is changed or similar? Just happened, might happen again :) ) A test suite might then build all the events for a given test in memory and then do the actual testing (as would have been done anyway if read from XML).

That said, what would be reasonable test suites?
* A million events with almost no text?
* A million events with very large texts (using the full unicode set?)
* Lots of exceptions?
* Large MDC's?

What do those with experience in large data sets say? -- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

On Tue, Mar 3, 2009 at 12:18 PM, Thorbjoern Ravn Andersen <ravn@runjva.com> wrote:
Ceki Gulcu skrev:
As there are certainly pros and cons for each serialization approach, instead of letting the debate degenerate into a religious argument, we are likely to be better served by basing comparisons on the same collection of logging event data, which in the compression world is called a "corpus". At the present time, we do not have a logging event corpus. Just as importantly, logback currently lacks a format for storing said corpus. Given that this corpus will serve as a yardstick for a long time, and performance is not an issue here, a human-readable text format such as XML seems like a reasonable choice.
Is anyone interested in providing a corpus?
....
Having said that, defining a corpus seems to me as being the most pressing issue at this time.
Once we settle on a corpus, we can more objectively debate the merits of such and such serialization strategy.
If I understand you correctly you basically say there is a need for a standardized set of event data.
After thinking this over it might be better to have code generating the events instead of them being stored statically on disk. This is to avoid setting any API in stone except the slf4j interface which by now should be settled.
I was thinking the same thing: having some code that generates a pre-defined set of logging events. Side note: I don't think we should be looking for THE wire-format for logback events, because different projects will have different needs. For example, I don't care whether the wire-format is human-readable or not. And for me it isn't absolutely necessary that the receiver can re-construct an *exact* LoggingEvent from the payload. Maarten
(What if the internal representation of a stack trace is changed or similar? Just happened, might happen again :) )
A test suite might then build all the events for a given test in memory and then do the actual testing (as would have been done anyway if read from XML)
That said. What would be reasonable test suites?
* A million events with almost no text?
* A million events with very large texts (using full unicode set?)
* Lots of exceptions?
* Large MDC's?
What do those with experience in large data sets say?
-- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"

Thorbjoern Ravn Andersen wrote:
If I understand you correctly you basically say there is a need for a standardized set of event data.
Yes.
After thinking this over it might be better to have code generating the events instead of them being stored statically on disk. This is to avoid setting any API in stone except the slf4j interface which by now should be settled.
The generator might have some logback dependencies - we are talking about logback logging events, after all.
(What if the internal representation of a stack trace is changed or similar? Just happened, might happen again :) )
If we assume that the corpus is generated, then we don't need to worry about the internal representation of the logging event changing. The generator will take care of adapting to the new format. It is likely that the corpus will need to be present in memory, so it should not be so huge as to exceed the heap size of the JVM. A sample size of approximately 100'000 events is imho sufficient. We would then measure the time it takes to encode and then decode the corpus, and the storage size of the encoded results. I assume that the generator will be based on pseudo-random number generators. We would need to settle on several parameters, such as message length, the number of messages with parameters, the frequency of exceptions, etc. There is no test suite as such. There is just a standard set of logging events, called the corpus, that various encoding strategies are benchmarked against. -- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch
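A minimal sketch of such a seeded generator; the class name and parameters below are hypothetical, and a real corpus would of course consist of logging events rather than bare message strings:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Hypothetical corpus generator: a fixed seed makes the "random"
    // corpus reproducible across runs and machines.
    class CorpusGenerator {
        private final Random random = new Random(42);

        List<String> generateMessages(int count) {
            List<String> messages = new ArrayList<String>(count);
            for (int i = 0; i < count; i++) {
                int length = 10 + random.nextInt(90); // message-length parameter
                StringBuilder sb = new StringBuilder(length);
                for (int j = 0; j < length; j++) {
                    sb.append((char) ('a' + random.nextInt(26)));
                }
                messages.add(sb.toString());
            }
            return messages;
        }
    }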

Hey guys, I've implemented a protobuf Serializer/Deserializer for Lilith and have done some more benchmarks: http://apps.sourceforge.net/trac/lilith/wiki/SerializationPerformance

protobuf is really, really fast and creates the smallest data of all tested mechanisms!! Uncompressed protobuf is only slightly larger than compressed java serialization! I'll definitely use it for both my appender and the file format in the next Lilith version...

The benchmark code can be found here: http://apps.sourceforge.net/trac/lilith/browser/trunk/lilith/src/test/java/d...

It's using the same LoggingEvents all the time, so it has a corpus as suggested below, right? The only problematic bit is that it's using Lilith events and not Logback events, but they should be comparable, I think. Feel free to bash me if you know a better way to benchmark this :)

The proto file can be found here: http://apps.sourceforge.net/trac/lilith/browser/trunk/lilith-data/logging-pr... and all protobuf related code is here: http://apps.sourceforge.net/trac/lilith/browser/trunk/lilith-data/logging-pr...

I've also added streamingSerializationWrite and streamingSerializationRead, which mimic the way Logback's SocketAppender is currently serializing. This method has the downside that it's not possible to send separate events to multiple recipients - which I do with my multiplexers - without serializing multiple times. Joern.
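For comparison, a sketch of the streaming style referred to above: one ObjectOutputStream per connection, reset periodically so its back-reference cache does not grow without bound. The class name and reset frequency are assumptions, not the actual logback constants:

    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.io.Serializable;

    // Illustrative streaming writer in the style of a serializing
    // socket appender.
    class StreamingEventWriter {
        private final ObjectOutputStream oos;
        private int counter;

        StreamingEventWriter(OutputStream raw) throws IOException {
            oos = new ObjectOutputStream(raw);
        }

        void write(Serializable event) throws IOException {
            oos.writeObject(event);
            if (++counter >= 70) { // reset frequency is an assumption
                counter = 0;
                oos.reset(); // clear the shared back-reference state
            }
            oos.flush();
        }
    }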

Hello Joern, Thank you for this post. Looking at PerformanceTest, and in particular the createCallStack() method, it appears that the stack traces used in the generated logging events will have exactly 5 elements. If caller data collection is enabled in a production system, the computed caller data could easily have 20 or more elements. Given its sheer size, caller data may overwhelm all other elements in a LoggingEvent. Even in the case where caller data has only 5 elements, it probably accounts for two thirds, or at least half, of the size of a serialized logging event. By any yardstick, caller data is a very significant factor when benchmarking logging event serialization. Joern Huxhorn wrote:
Hey guys, I've implemented a protobuf Serializer/Deserializer for Lilith and have done some more benchmarks:
http://apps.sourceforge.net/trac/lilith/wiki/SerializationPerformance
protobuf is really, really fast and creates the smallest data of all tested mechanisms!! Uncompressed protobuf is only slightly larger than compressed java serialization! I'll definitely use it for both my appender and the file format in the next Lilith version...
The benchmark code can be found here: http://apps.sourceforge.net/trac/lilith/browser/trunk/lilith/src/test/java/d...
It's using the same LoggingEvents all the time so it has a corpus as suggested below, right? The only problematic bit is that it's using Lilith events and not Logback events but they should be comparable, I think. Feel free to bash me if you know a better way to benchmark this :)
The proto file can be found here: http://apps.sourceforge.net/trac/lilith/browser/trunk/lilith-data/logging-pr...
and all protobuf related code is here: http://apps.sourceforge.net/trac/lilith/browser/trunk/lilith-data/logging-pr...
I've also added streamingSerializationWrite and streamingSerializationRead, which mimic the way Logback's SocketAppender is currently serializing. This method has the downside that it's not possible to send separate events to multiple recipients - which I do with my multiplexers - without serializing multiple times.
Joern.
-- Ceki Gülcü Logback: The reliable, generic, fast and flexible logging framework for Java. http://logback.qos.ch
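Along the lines of Ceki's remark above, the benchmark's synthetic call stack could be made configurable; a hypothetical variant of the createCallStack() helper:

    // Hypothetical variant of the benchmark's createCallStack() helper:
    // it generates a configurable number of frames instead of a fixed 5,
    // since production caller data often has 20 or more elements.
    class CallStacks {
        static StackTraceElement[] createCallStack(int depth) {
            StackTraceElement[] stack = new StackTraceElement[depth];
            for (int i = 0; i < depth; i++) {
                stack[i] = new StackTraceElement(
                        "com.example.generated.Class" + i, // declaring class
                        "method" + i,                      // method name
                        "Class" + i + ".java",             // file name
                        40 + i);                           // line number
            }
            return stack;
        }
    }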

I think I've lost the point of this discussion somewhere. The subject says something about submitting events remotely, yet this discussion seems to be totally about serialization. If it is really about something like a "service" to submit events, then I would suggest looking at Spring remoting and some of the protocols it supports - such as Hessian or Burlap. I would argue that a discussion about how best to serialize an object is pointless without having first decided on what the service API is. For example, are you presuming that one system will log to an Appender that will forward to a server that will turn around and log the event again? Or perhaps an Appender would just forward the event to an Appender on the remote system? Or, using Spring Remoting, one could imagine that the local Appender is just a client stub generated by Spring, forwarding to the "real" Appender somewhere else. Ralph On Mar 1, 2009, at 4:23 PM, Thorbjoern Ravn Andersen wrote:
Joern Huxhorn skrev:
Well, a stream could contain both LoggingEvent0916 and LoggingEvent0918, and the Deserializer would produce the correct output using instanceof and some defined conversion. The point is that the old class would not be lost, as is the case if just the serialVersionUID is changed. It could therefore still be deserialized. After thinking it over, I believe that this is the exact reason why XMLEncoder/XMLDecoder was introduced in the first place - the ability to read in old classes.
I don't think that it is possible to get this to work _well_, but I'm always happy for a good discussion.
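For reference, the XMLEncoder/XMLDecoder mechanism discussed above is the java.beans one; a minimal round-trip sketch (the helper class is illustrative):

    import java.beans.XMLDecoder;
    import java.beans.XMLEncoder;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;

    // Round-trip through java.beans.XMLEncoder/XMLDecoder, which records
    // bean properties by name and therefore tolerates class evolution
    // better than binary serialization does.
    class XmlBeanRoundTrip {

        static byte[] write(Object bean) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            XMLEncoder encoder = new XMLEncoder(bytes);
            encoder.writeObject(bean);
            encoder.close();
            return bytes.toByteArray();
        }

        static Object read(byte[] data) {
            XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(data));
            Object bean = decoder.readObject();
            decoder.close();
            return bean;
        }
    }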
Have you had the time to check my xml format? While it's not exactly terse ;) it's definitely human-readable.

I think I have understood what the problem with my use of the "humanly readable" term is. It is not the actual XML with names and tags, but the data carried, that I am talking about. Timestamps are fine but are not really readable to a human.
I can easily live with one-character tag and attribute names if the data in the fields carries meaning to the human eye :)
Well, I'm using yyyy-MM-dd'T'HH:mm:ss.SSSZ as the primary time stamp format and then apply some magic to change it into a valid xml timestamp, i.e. I add a colon between the hh and mm of the timezone. It made me sigh quite a bit when I realized that SimpleDateFormat did not support this...
A valid xml timestamp? According to what dialect? SOAP? Frankly, I think that these date objects are expensive to reconstruct - perhaps you should look into this if performance is an issue for you? I believe the date object just serializes the long datestamp, which is much cheaper to deserialize than deconstructing a string into pieces and building a date/calendar object from it. Note you also need timezone information for this, so it most likely goes through a Calendar.
Point taken, I wasn't really thinking about that when I was implementing it, I primarily wanted to do it *right* ;)
It's an xs:dateTime (http://www.w3.org/TR/xmlschema-2/#dateTime), which seemed reasonable to me. xs:dateTime expects a colon between the timezone hh and mm, while SimpleDateFormat only supports hhmm :p
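A small sketch of the colon fix-up Joern describes - SimpleDateFormat's "Z" emits "+0100", while xs:dateTime requires "+01:00" (the helper name is illustrative):

    import java.text.SimpleDateFormat;
    import java.util.Date;

    class XsDateTime {
        // Format a Date as xs:dateTime by inserting the colon that
        // SimpleDateFormat's "Z" pattern letter leaves out.
        static String format(Date date) {
            SimpleDateFormat sdf =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
            String raw = sdf.format(date);           // ...+0100
            int cut = raw.length() - 2;
            return raw.substring(0, cut) + ":" + raw.substring(cut); // ...+01:00
        }
    }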
I implemented your suggestion and it's - unsurprisingly - deserializing quite a bit faster now.

Hehe, if I read your tables correctly, the lilithXmlUncompressedDeserializer went from 1.974544 sec to 0.497055 sec, which is four times faster, so "quite a bit" is quite a bit of an understatement :)
Also, the value is very close to the 0.493445 value for serializationUncompressedDeserializer, so it appears that the two approaches have about the same runtime performance.
Now I just need to update the schema... and I really need to implement an EventWrapper Serializer that's using my xml so I can use it as the Lilith file format...! Performance and size are absolutely acceptable.
Now I just need to bully you into shortening all tags and attribute names into one character and skip any namespace prefixes :) Then we agree O:-)
I moved the benchmarks to http://apps.sourceforge.net/trac/lilith/wiki/SerializationPerformance so I don't have to change the closed bug all the time :p

You _could_ also just reopen the bug :)
-- Thorbjørn Ravn Andersen "...plus... Tubular Bells!"