Hi Gena,
Thanks for the detailed response - I'll try to be as thorough in return!
I agree completely with you that a timeout of 0 (meaning wait for the worker thread to join, regardless of the time it takes) makes a lot of sense for local appenders, such as the
RollingFileAppender example you gave. On the other hand, when the appender has a high latency, perhaps because the appender is sending the log events to a remote system (database, cloud service, syslog daemon, etc), it may not be feasible for the JVM to wait indefinitely for log events to flush. For instance, sending log events to Loggly (http://www.loggly.com) can have latencies between 200 and 600ms. Depending on the queue depth, this could take a very long time to flush completely. For some environments, it *may* be preferable to drop log events over taking multiple minutes to flush the log queue. For other environments, the log events may be important enough to stall the JVM shutdown indefinitely. For this reason, we determined that it made the most sense to make the join call timeout configurable, using the previously hardcoded value as a default in order to keep the semantics the same. This way, the application developer can determine what is correct for the application by specifying an appropriate timeout value.
Of course any events posted to a logger after calling LoggerContext.stop will be ignored - Logback only makes guarantees about events submitted to loggers that have been started. Since LoggerContext.stop stops all loggers and appenders, any subsequent log messages will not be appended to the target, which makes perfect sense. The key aspect here is that LoggerContext.stop should not be called unless no more log events are expected. In some cases, the application may be able to call it right before System.exit, but as discussed in other scenarios it may need to be triggered via a shutdown hook.
I mis-understood your solution - I got the impression that you were suggesting that the user code would have to change the Logback mode.
The shutdown hook scenario is something that I also run into. The scenarios that you describe are completely valid! Any time an application has background threads (user or daemon) or shutdown hooks that may log events, there is the potential for lost log events due to timing / race conditions.
As you said, using a timeout delay for the shutdown hook clearly is non-optimal for some scenarios (though it may be acceptable for others). The upside to a delayed cleanup shutdown hook is that the user can profile their application and determine a reasonable value for the delay parameter. Obviously this is still not failsafe, but with appropriate understanding of the work that the shutdown hook is attempting to do and the maximum amount of time that the task should take, a delay cleanup hook *could* work within the requirements of the application.
Obviously this does not solve all scenarios for all applications. As a result, the ShutdownHookAction that I am adding to Joran will operate in a manner similar to the AppenderAction in that it will utilize a class parameter that defines which ShutdownHook implementation should be instantiated. For instance, if the application can reasonably expect that the other shutdown hooks will complete in 500 ms, a DelayedCleanupHook could be used with a delay twice the expected runtime of the other hooks:
<shutdownHook class="ch.qos.logback.classic.spi.DelayingCleanupHook">
<delay>1000</delay>
</shutdownHook>
If the application controlled all of the other shutdown hooks (no third-party shutdown hooks are utilized or need to log events), then a custom ShutdownHook could be written that looks for some condition set by the other shutdown hooks:
<shutdownHook class="com.foo.package.CustomCleanupHook">
</shutdownHook>
where the CustomCleanupHook checks against some globally accessible state (say a boolean value) to determine if it is safe to stop the LoggerContext.
Because of this flexibility and extensibility, I don't believe that the ShutdownHookAction would be useless in your scenario. Depending on your application needs and specific scenario, you should be able to define a ShutdownHook implementation that meets your requirements. Generally speaking, it is difficult to define a "universally perfect" solution because every application has different requirements and acceptable tolerances.
Perhaps with more information about your specific scenario, we could come up with a ShutdownHook design that covers your needs adequately and include that in the pull request. There are other ShutdownHooks that I am interested in developing for inclusion such as:
- idle time based shutdown hook - when all Loggers have been idle (no new events appender for a configurable time period), then trigger LoggerContext.stop. This would require some level of Logback internal statistics in order to implement.
- thread count shutdown hook - for applications where the number of active threads is deterministic, it may be acceptable to have the LoggerContext stop once all non-essential threads have stopped. Threads can be queried via the ThreadMXBean API, though the number and function of the JVM internal threads may vary by implementation, making this design difficult
- synchronized variable state shutdown hook - if other shutdown hooks & threads can set a variable indicating their state, perhaps a semaphore or counting lock indicating the number of open threads, which will stop the LoggerContext when all of the locks are released. This would require co-operation of the other threads and shutdown hooks - limited ability to support Third-Party threads or hooks...
I hope this helps to explain how the ShutdownHook design is supposed to actually work. I think that it should provide a foundation that will support most scenarios. That being said, for some use-cases it may be necessary to design additional flexibility into Logback - for instance, my described idle time based shutdown hook would require some way of tracking logger use statistics which could be a sizable modification.
Thank you again for your input! I look forward to hearing your thoughts on the described framework.
Regards,
Mike Reinhold