Log event delivery guarantees.

I searched the list and only found one relevant message:
http://markmail.org/message/tdkr745eqf3vxqme

From Michael Reinhold, Mar 19, 2014 5:45:23 pm:
> The method described for a disk-cached AsyncAppender of course makes perfect sense. I think it would make a good alternative to the current AsyncAppender for some scenarios and would be a solid addition to the current Logback appenders.

I am going to implement log collection into ElasticSearch and I would like to achieve a high level of delivery reliability: not in the sense of latency, but in the sense of completeness. Business processes operate on the collected data, and nothing should be lost if the remote log collector is down or my application crashes or restarts.

Local disk is the safest place to write, due to its low latency, persistence and availability (except for rare cases of FS corruption or a disk-full error). Network services definitely have downtime.

=========================
Is there a persistence wrapper for log events that guarantees delivery at a later time if the remote service is unavailable, something like AsyncAppender but for any Appender?

The docs at https://logback.qos.ch/manual/appenders.html don't show any possibility of reporting an error:

  public interface Appender<E> extends LifeCycle, ContextAware, FilterAttachable {
    public String getName();
    public void setName(String name);
    void doAppend(E event);
  }

but in the sources I see:

  void doAppend(E event) throws LogbackException;

Potentially an appender could signal that the service is unavailable via a LogbackException subclass, collect data into a memory buffer up to some limit, and then fall back to disk. It would retry sending events to the server on some defined schedule until the RecipientDownException is gone. On application startup, such a PersistentAppender should check for serialized unpublished events and try to send them again.

As I see in https://logback.qos.ch/apidocs/ch/qos/logback/classic/spi/ILoggingEvent.html, the persister should store:
* timestamp
* level
* msg
* marker (not sure the full hierarchy is needed, as a marker is just a string)
* MDC
* StackTraceElement[] (this information can be useful but requires a lot of CPU)

Some get* methods are strange, like getArgumentArray(); I am not sure that information is useful.

======================
An alternative approach is to write the log to a file and parse it later, as is done with Filebeat (https://www.elastic.co/products/beats/filebeat), which complements ElasticSearch. The problem with such a solution is the necessity to invent or follow a file format; formatting the log with MDC and Markers can be tricky.

======================
A direct writer to ElasticSearch, https://github.com/internetitem/logback-elasticsearch-appender, buffers messages only in memory: if ElasticSearch isn't available and the application is restarted, everything is lost.
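The PersistentAppender idea sketched in the message above (buffer in memory up to a limit, spill to disk, replay serialized events on recovery or startup) could look roughly like this. All names here (`PersistentSpool`, the file naming, the `Consumer<String>` sender) are hypothetical, and a real Logback integration would subclass `AppenderBase` instead; this is a minimal stdlib-only sketch of the buffering logic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Sketch of the proposed PersistentAppender idea: buffer events in
 * memory up to a limit, spill to a local spool directory when the
 * limit is hit, and replay everything once delivery is possible.
 * All names here are hypothetical, not Logback API.
 */
public class PersistentSpool {
    private final Path spoolDir;
    private final int memoryLimit;
    private final Deque<String> memoryBuffer = new ArrayDeque<>();
    private final AtomicLong seq = new AtomicLong();

    public PersistentSpool(Path spoolDir, int memoryLimit) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
        this.memoryLimit = memoryLimit;
    }

    /** Called while the recipient is down: keep the serialized event locally. */
    public void buffer(String serializedEvent) throws IOException {
        if (memoryBuffer.size() < memoryLimit) {
            memoryBuffer.addLast(serializedEvent);
        } else {
            // Memory limit reached: fall back to disk, one file per event.
            Path f = spoolDir.resolve("event-" + seq.getAndIncrement());
            Files.write(f, serializedEvent.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Called on startup or when the recipient recovers: replay everything. */
    public void replay(Consumer<String> sender) throws IOException {
        while (!memoryBuffer.isEmpty()) {
            sender.accept(memoryBuffer.removeFirst());
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir)) {
            for (Path f : files) {
                sender.accept(new String(Files.readAllBytes(f), StandardCharsets.UTF_8));
                Files.delete(f); // drop the copy only after the sender took it
            }
        }
    }
}
```

A real appender would also have to decide what "the sender took it" means: with a synchronous HTTP bulk call the delete-after-accept step above gives at-least-once delivery, which is usually what log pipelines settle for.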

I created the FlumeAppender for Log4j with just this purpose in mind. The FlumeAppender writes the log event to local disk and then returns control to the application; at that point eventual delivery is guaranteed. A background thread reads the events that have been written to disk and forwards them on to another Apache Flume node. When that node confirms it has accepted them, the event is deleted from local disk. The FlumeAppender has the ability to fail over to alternate Flume nodes; if none are available, the events simply stay on disk until it is full.

Ralph
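The delivery loop described above (durable write first, background forwarding, delete only on confirmation, failover across nodes) can be sketched with plain Java. The `Node` interface is a hypothetical stand-in for a Flume RPC client, and the real implementation lives in log4j-flume-ng; this only illustrates the ordering of the steps.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of the delivery loop described above: write each event to
 * local disk first, then let a background pass forward spooled events
 * to the first reachable node, deleting a file only after the node
 * confirms acceptance. Node is a hypothetical stand-in for a Flume
 * RPC client; the real implementation is in log4j-flume-ng.
 */
public class DiskBackedForwarder {
    /** Returns true once the remote node has confirmed acceptance. */
    public interface Node {
        boolean send(String event);
    }

    private final Path spoolDir;
    private final List<Node> nodes; // primary first, then failover nodes
    private final AtomicLong seq = new AtomicLong();

    public DiskBackedForwarder(Path spoolDir, List<Node> nodes) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
        this.nodes = nodes;
    }

    /** Append path: durable write, then control returns to the caller. */
    public void append(String event) throws IOException {
        Path f = spoolDir.resolve("evt-" + seq.getAndIncrement());
        Files.write(f, event.getBytes(StandardCharsets.UTF_8));
    }

    /** One pass of the background thread; returns how many events were delivered. */
    public int forwardOnce() throws IOException {
        int delivered = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir)) {
            for (Path f : files) {
                String event = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                for (Node node : nodes) {  // fail over across nodes in order
                    if (node.send(event)) {
                        Files.delete(f);   // confirmed, safe to drop the copy
                        delivered++;
                        break;             // unconfirmed events stay on disk
                    }
                }
            }
        }
        return delivered;
    }
}
```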

I found https://github.com/gilt/logback-flume-appender but haven't looked at the code. Maybe it will meet your needs.

Ralph

On Fri, Jun 2, 2017 at 9:14 PM, Ralph Goers <rgoers@apache.org> wrote:
> I found https://github.com/gilt/logback-flume-appender but haven't looked at the code. Maybe it will meet your needs.
Sorry, that project has only networking code. There is the https://github.com/internetitem/logback-elasticsearch-appender project, which writes from Logback to Elasticsearch directly (buffering and waiting for Elasticsearch to come back up). But that project stores events only in memory...

On Fri, Jun 2, 2017 at 9:02 PM, Ralph Goers <rgoers@apache.org> wrote:
> I created the FlumeAppender for Log4j with just this purpose in mind. The FlumeAppender will write the log event to local disk and then returns control to the application. At this point eventual delivery is guaranteed.

Can you share how you serialize a log event? The https://logback.qos.ch/apidocs/ch/qos/logback/classic/spi/ILoggingEvent.html interface has simple fields along with complex ones, like:
* MDC (which I will definitely use)
* StackTraceElement[]

Is that some CSV format? How do you handle control characters and newlines? My end destination works with the JSON format, so I could settle on that serialization.

> A background thread reads the events that have been written to disk and forwards them on to another Apache Flume node. When that node confirms it has accepted them the event is deleted from local disk. The FlumeAppender has the ability to fail over to alternate Flume nodes. If none are available the events will simply stay on disk until it is full.

Originally I thought about a complex solution that sends asynchronously over the network unless the remote host is down or the event buffer is full because of load. In the latter case it would write to disk and later try to deliver the saved data. On application shutdown, saving to disk can be much faster than trying to deliver logging events to an external server.

What I wonder is how you manage the saved events. In a single file or several? How do you discover files for processing? How do you split logging events? How do you keep pointers to not-yet-processed events?
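One way to answer the serialization question for a JSON destination is to render each event as a single escaped JSON line, so that control characters and embedded newlines can never break a one-event-per-line file format. This is a hand-rolled sketch over the fields listed earlier in the thread (timestamp, level, message, MDC, stack trace), not the format the FlumeAppender actually uses.

```java
import java.util.Map;

/**
 * Render the interesting ILoggingEvent fields as one JSON line,
 * escaping control characters and newlines so one event is always
 * exactly one line. A sketch, not FlumeAppender's actual format.
 */
public class JsonEventLine {
    /** Minimal JSON string escaping (quotes, backslash, control chars). */
    static String esc(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    static String render(long timestamp, String level, String message,
                         Map<String, String> mdc, StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"timestamp\":").append(timestamp)
          .append(",\"level\":\"").append(esc(level))
          .append("\",\"message\":\"").append(esc(message)).append("\"");
        sb.append(",\"mdc\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : mdc.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(esc(e.getKey())).append("\":\"")
              .append(esc(e.getValue())).append("\"");
            first = false;
        }
        sb.append("},\"stack\":[");
        for (int i = 0; i < stack.length; i++) {
            if (i > 0) sb.append(",");
            sb.append("\"").append(esc(stack[i].toString())).append("\"");
        }
        return sb.append("]}").toString();
    }
}
```

In practice a JSON library would do the escaping, but the point stands either way: with one escaped JSON object per line, splitting events on replay is just reading lines.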

I would suggest you look at http://flume.apache.org/FlumeUserGuide.html and specifically http://flume.apache.org/FlumeUserGuide.html#file-channel. Flume has an embedded agent (http://flume.apache.org/FlumeDeveloperGuide.html#embedded-agent) that you can use that handles all of this for you.

The Log4j FlumeAppender does this, but it also supports a lighter-weight version that you could copy as well. It uses Berkeley DB and writes the event to it before passing the event to another thread, where it uses the Flume RPC client to send the event off. You could copy just the Berkeley DB logic and tie it to another delivery mechanism if you want. The code for that is at https://github.com/apache/logging-log4j2/blob/master/log4j-flume-ng/src/main/java/org/apache/logging/log4j/flume/appender/FlumePersistentManager.java and the main portion of the appender is at https://github.com/apache/logging-log4j2/blob/master/log4j-flume-ng/src/main/java/org/apache/logging/log4j/flume/appender/FlumeAppender.java so you can see how they tie together.

The FlumeAppender supports any valid Layout, so the answer to how the stack trace is passed would be "it depends".

Ralph

On Fri, Jun 2, 2017 at 10:53 PM, Ralph Goers <rgoers@apache.org> wrote:
> It uses Berkeley DB and writes the event to it before passing the event to another thread where it uses the Flume RPC client to send the event off. You could copy just the Berkeley DB logic and tie it to another delivery mechanism if you want.
Thanks for the suggestion!

I haven't profiled the latency of writing a single row to a DB myself; numbers from the Internet are about 1 ms. I expect that with an embedded DB the latency would be much smaller.

According to http://www.oracle.com/technetwork/database/berkeleydb/overview/index-093405.... Berkeley DB adds 1 MiB of overhead to the distribution, which is acceptable. I don't know about the runtime overhead.

An embedded DB provides a reliable solution for data consistency and keeps the data structured. But I am not sure the DB size stays low if you add and delete data over a period of time, and I don't want to maintain this DB. With a file-rotation solution you can write to a new file while processing the old one (and just delete it at the end).
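The file-rotation alternative mentioned above (write to a new file, process the old one, delete it at the end) avoids any database maintenance, since finished files are simply removed. A minimal sketch, with invented names and a one-event-per-line format:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Append events to an active spool file, rotate it out once it grows
 * past a size limit, and let a consumer drain and delete rotated
 * files. One event per line; all names here are made up.
 */
public class RotatingSpool {
    private final Path dir;
    private final long rotateBytes;
    private final AtomicLong seq = new AtomicLong();

    public RotatingSpool(Path dir, long rotateBytes) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.rotateBytes = rotateBytes;
    }

    private Path active() { return dir.resolve("spool.active"); }

    public void append(String eventLine) throws IOException {
        Files.write(active(), (eventLine + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        if (Files.size(active()) >= rotateBytes) {
            // Rotate: zero-padded sequence keeps lexicographic order = age order.
            Files.move(active(), dir.resolve(
                    String.format("spool.%020d.done", seq.getAndIncrement())));
        }
    }

    /** Drain rotated files oldest-first; delete each one after processing. */
    public void drain(Consumer<String> consumer) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "spool.*.done")) {
            files.forEach(done::add);
        }
        Collections.sort(done);
        for (Path f : done) {
            for (String line : Files.readAllLines(f, StandardCharsets.UTF_8)) {
                consumer.accept(line);
            }
            Files.delete(f); // nothing to compact or vacuum, unlike a DB
        }
    }
}
```

The trade-off versus an embedded DB is exactly the one discussed in the thread: no storage engine to maintain and disk usage is self-evident, but crash-safety around the rotate step and partial last lines has to be handled by hand.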

As I recall, the architecture of Berkeley DB is such that it uses fixed-size files, writes to them, and swaps the old files out as they empty. Basically, you just use it and forget about it. I have had one user complain that it didn't gracefully handle the case where the disk was full, but simply adding decent OS monitoring can avoid that. In my testing I found that the performance of Berkeley DB wasn't quite as good as Flume's file channel, but Flume has a lot of dependencies and Berkeley DB has very few. Both seem to be pretty reliable.

Ralph

On Fri, Jun 2, 2017 at 8:01 PM, Oleksandr Gavenko <gavenkoa@gmail.com> wrote:
> Alternative approach is to write log and later parse it. Like this done with Filebeat https://www.elastic.co/products/beats/filebeat that complement to ElasticSearch.
> The problem with such solution is necessity to invent or follow file format. Formatting log with MDC and Markers that can be tricky.
What I forgot to add is that parsing stack traces with Filebeat is no pleasure. It is also tricky to manage file rotation; from https://discuss.elastic.co/t/will-filebeat-handle-properly-the-underlying-log-file-rotation/56119/4:

> while filebeat uses file metadata like inode and device id in order to track files, it still requires the full path in order to open files. The prospectors can only find files by path and do check inode/device id against known set of inodes.
> If a file is simply renamed or closed by the logger, but still open by filebeat, filebeat will process until end (will not close file handle yet). But if filebeat gets restarted in between, it needs to find the renamed files again. In order to find renamed files, the glob pattern must match rotated files too. So to say, it's better use pattern mysqld.log*.

That is not a reliable solution, in my view.
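For completeness, the two Filebeat problems above (rotated files escaping the glob, and multi-line stack traces split into separate events) are usually handled in the prospector configuration: a glob that also matches rotated files, and a multiline pattern that folds continuation lines into the preceding event. A sketch for a hypothetical /var/log/myapp path, using the Filebeat 5.x option names; verify the keys against the Filebeat reference for your version.

```yaml
filebeat.prospectors:
  - input_type: log
    # The glob must match rotated files too, per the quoted advice.
    paths:
      - /var/log/myapp/app.log*
    # Fold Java stack-trace continuation lines ("  at ...", "Caused by:")
    # into the preceding log line, so one exception stays one event.
    multiline.pattern: '^[[:space:]]+(at|\.{3})|^Caused by:'
    multiline.negate: false
    multiline.match: after
```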

If you absolutely want to be sure that logs are recorded, to the extent that a failure to log should mean the transaction is retried, then Logback is not the right choice of API. If the above doesn't apply, then ignore the rest of this reply.

The Logback API was designed with this property in mind: no I/O error will ever propagate to the application. And the Unix API philosophy is that any I/O call can fail and should be retried. Sure, you can get local daemons that will successfully replicate your logs from disk to another machine, but there are plenty of things that can go wrong locally that you should consider.
Participants (3):
- David Roussel
- Oleksandr Gavenko
- Ralph Goers