[JIRA] Created: (LBCLASSIC-158) Interleaved messages from multiple threads

Interleaved messages from multiple threads ------------------------------------------ Key: LBCLASSIC-158 URL: http://jira.qos.ch/browse/LBCLASSIC-158 Project: logback-classic Issue Type: Bug Components: appender Affects Versions: 0.917 Environment: Mac OS X 10.5.8, Eclipse 20090920-1017, Java SE 6, Mac Pro with 2 processors with 4 cores each Reporter: Michael Olson Assignee: Logback dev list Priority: Critical Note: I'm using pastebin so that the formatting is not messed with by JIRA and so that I can refer to line numbers. My apologies if that makes it more difficult to examine my files. I've just moved to logback from log4j (which I was using only with the default configuration) using the slf4j wrappers. I'm using a configuration which is a slightly changed version of one found in the documentation; the configuration can be found here: http://logback.pastebin.com/m1a6f2875 These two log file excerpts are from that configuration: http://logback.pastebin.com/m5e08cb57 - note lines 3 and 4 http://logback.pastebin.com/m7e575564 - note lines 2, 3, and 4 Then this configuration, which only changes the class levels, is for the next log (I wanted to be precise): http://logback.pastebin.com/f6195004e This log file is from this morning: http://logback.pastebin.com/f42bdb83f Note lines 2 and 3. You'll notice, I'm sure, that these are all from the end of my log files; I have not observed this behavior elsewhere in my logs. The observed problems have been during my shutdown procedure. Specifically, the class PostItemSubmitter, with 4 threads, is waiting on input from CalaisSubmitter (it is trying to retrieve an object from a LinkedBlockingQueue with a get() operation). In these particular logs, CalaisSubmitter has run into fatal errors (over SLA, whoops) and has put no data in the queue, so all four threads are currently blocked on their input queue. After all of the CalaisSubmitter threads are shutdown, the main thread tries to shut down the PostItemSubmitter threads. It does this by calling a function in the thread's instance which sets a boolean value to false that controls the continued execution of the main program loop. After it has set those variables, the main thread will check the size of the queue; if it is 0, which it is in this case, then it will interrupt any threads that have not stopped after one second. It will do this immediately, so all four threads will receive Thread.interrupt() calls in sequence. When those threads are interrupted, they throw an InterruptedException. The catch statement looks like this: catch (InterruptedException e) { if (running) logger.error("Interrupted getting item from queue"); else logger.info("Shutdown aborted queue operation"); } In at least two of these three logs, it seems clear that the problem lies in how the interrupted threads are using the static logger defined for their class, which looks like this: private static Logger logger = LoggerFactory.getLogger(PostItemSubmitter.class); In the other log, the interleaving comes with the CalaisSubmitter, but you can see at least one aborted queue operation, so the reason could be similar (that class is also interrupted on an empty queue and catches those interrupts in the same way). Let me know if you need any more information to help diagnose the problem. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira

[ http://jira.qos.ch/browse/LBCLASSIC-158?page=com.atlassian.jira.plugin.syste... ] Ceki Gulcu commented on LBCLASSIC-158: -------------------------------------- Thank you for this report. Thread.interrupt() will interrupt ongoing IO operations which could explain the behavior you are observing in which case, the issue is not really logback related. Could you please submit a test case illustrating this bug?
Interleaved messages from multiple threads ------------------------------------------
Key: LBCLASSIC-158 URL: http://jira.qos.ch/browse/LBCLASSIC-158 Project: logback-classic Issue Type: Bug Components: appender Affects Versions: 0.917 Environment: Mac OS X 10.5.8, Eclipse 20090920-1017, Java SE 6, Mac Pro with 2 processors with 4 cores each Reporter: Michael Olson Assignee: Logback dev list Priority: Critical
Note: I'm using pastebin so that the formatting is not messed with by JIRA and so that I can refer to line numbers. My apologies if that makes it more difficult to examine my files. I've just moved to logback from log4j (which I was using only with the default configuration) using the slf4j wrappers. I'm using a configuration which is a slightly changed version of one found in the documentation; the configuration can be found here: http://logback.pastebin.com/m1a6f2875 These two log file excerpts are from that configuration: http://logback.pastebin.com/m5e08cb57 - note lines 3 and 4 http://logback.pastebin.com/m7e575564 - note lines 2, 3, and 4 Then this configuration, which only changes the class levels, is for the next log (I wanted to be precise): http://logback.pastebin.com/f6195004e This log file is from this morning: http://logback.pastebin.com/f42bdb83f Note lines 2 and 3. You'll notice, I'm sure, that these are all from the end of my log files; I have not observed this behavior elsewhere in my logs. The observed problems have been during my shutdown procedure. Specifically, the class PostItemSubmitter, with 4 threads, is waiting on input from CalaisSubmitter (it is trying to retrieve an object from a LinkedBlockingQueue with a get() operation). In these particular logs, CalaisSubmitter has run into fatal errors (over SLA, whoops) and has put no data in the queue, so all four threads are currently blocked on their input queue. After all of the CalaisSubmitter threads are shutdown, the main thread tries to shut down the PostItemSubmitter threads. It does this by calling a function in the thread's instance which sets a boolean value to false that controls the continued execution of the main program loop. After it has set those variables, the main thread will check the size of the queue; if it is 0, which it is in this case, then it will interrupt any threads that have not stopped after one second. It will do this immediately, so all four threads will receive Thread.interrupt() calls in sequence. When those threads are interrupted, they throw an InterruptedException. The catch statement looks like this: catch (InterruptedException e) { if (running) logger.error("Interrupted getting item from queue"); else logger.info("Shutdown aborted queue operation"); } In at least two of these three logs, it seems clear that the problem lies in how the interrupted threads are using the static logger defined for their class, which looks like this: private static Logger logger = LoggerFactory.getLogger(PostItemSubmitter.class); In the other log, the interleaving comes with the CalaisSubmitter, but you can see at least one aborted queue operation, so the reason could be similar (that class is also interrupted on an empty queue and catches those interrupts in the same way). Let me know if you need any more information to help diagnose the problem.
-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira

[ http://jira.qos.ch/browse/LBCLASSIC-158?page=com.atlassian.jira.plugin.syste... ] Michael Olson commented on LBCLASSIC-158: ----------------------------------------- I created a test case based around the idea that the only problem was Thread.interrupt() (which makes sense to me). To that end, I created these two Java files: InterruptTest - http://logback.pastebin.com/f34a492f0 BlockerThread - http://logback.pastebin.com/f5914c9d5 The main function of InterruptTest creates 10 BlockerThreads which then wait on the queue, catch an InterruptedException, and write to the logger. I was, however, unable to reproduce the interleaving using this simple test; this is the output of the program on my computer (I did many iterations with different variations, but only posted this one log): http://logback.pastebin.com/f55ad287c You'll notice that there is no interleaving. You'll also notice that I'm not using the configuration from my other post. I then added the logback configuration folder to my classpath and reran the test; I only needed one iteration: http://logback.pastebin.com/f2726cae2 Instant interleaving on both the console and in the RollingFileAppender; check lines 47, 50, and 51. I am very confident that there is a difference between the default configuration and the one I am using now that creates this problem, although, as you mentioned, the fault probably lies with Thread.interrupt(). However, Thread.interrupt() itself is wholly insufficient to cause interleaving. It could be that the mere presence of multiple appenders is what creates the issue; I'm not sure.
Interleaved messages from multiple threads ------------------------------------------
Key: LBCLASSIC-158 URL: http://jira.qos.ch/browse/LBCLASSIC-158 Project: logback-classic Issue Type: Bug Components: appender Affects Versions: 0.917 Environment: Mac OS X 10.5.8, Eclipse 20090920-1017, Java SE 6, Mac Pro with 2 processors with 4 cores each Reporter: Michael Olson Assignee: Logback dev list Priority: Critical
Note: I'm using pastebin so that the formatting is not messed with by JIRA and so that I can refer to line numbers. My apologies if that makes it more difficult to examine my files. I've just moved to logback from log4j (which I was using only with the default configuration) using the slf4j wrappers. I'm using a configuration which is a slightly changed version of one found in the documentation; the configuration can be found here: http://logback.pastebin.com/m1a6f2875 These two log file excerpts are from that configuration: http://logback.pastebin.com/m5e08cb57 - note lines 3 and 4 http://logback.pastebin.com/m7e575564 - note lines 2, 3, and 4 Then this configuration, which only changes the class levels, is for the next log (I wanted to be precise): http://logback.pastebin.com/f6195004e This log file is from this morning: http://logback.pastebin.com/f42bdb83f Note lines 2 and 3. You'll notice, I'm sure, that these are all from the end of my log files; I have not observed this behavior elsewhere in my logs. The observed problems have been during my shutdown procedure. Specifically, the class PostItemSubmitter, with 4 threads, is waiting on input from CalaisSubmitter (it is trying to retrieve an object from a LinkedBlockingQueue with a get() operation). In these particular logs, CalaisSubmitter has run into fatal errors (over SLA, whoops) and has put no data in the queue, so all four threads are currently blocked on their input queue. After all of the CalaisSubmitter threads are shutdown, the main thread tries to shut down the PostItemSubmitter threads. It does this by calling a function in the thread's instance which sets a boolean value to false that controls the continued execution of the main program loop. After it has set those variables, the main thread will check the size of the queue; if it is 0, which it is in this case, then it will interrupt any threads that have not stopped after one second. It will do this immediately, so all four threads will receive Thread.interrupt() calls in sequence. When those threads are interrupted, they throw an InterruptedException. The catch statement looks like this: catch (InterruptedException e) { if (running) logger.error("Interrupted getting item from queue"); else logger.info("Shutdown aborted queue operation"); } In at least two of these three logs, it seems clear that the problem lies in how the interrupted threads are using the static logger defined for their class, which looks like this: private static Logger logger = LoggerFactory.getLogger(PostItemSubmitter.class); In the other log, the interleaving comes with the CalaisSubmitter, but you can see at least one aborted queue operation, so the reason could be similar (that class is also interrupted on an empty queue and catches those interrupts in the same way). Let me know if you need any more information to help diagnose the problem.
-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira

[ http://jira.qos.ch/browse/LBCLASSIC-158?page=com.atlassian.jira.plugin.syste... ] Michael Olson commented on LBCLASSIC-158: ----------------------------------------- Update: I was going through my logs this morning to see what kinds of errors were generated over the weekend. I just found the first instance in my RollingFileAppender log of interleaved messages before the shutdown sequence is initiated; here is the log excerpt on pastebin: http://logback.pastebin.com/f41292fea And here it is again just included, since pastebin has been acting funny: 2009-10-03 06:09:06,464 [main] DEBUG Eventweb - IQ: 2 OQ: 0 2009-10-03 06:09:09,786 [Calais-02] DEBUG CalaisSubmitter - Caught ; nested exception is: java.net.ConnectException: Connection timed out while submitting content of size 2,094; retry 10 2009-10-03 06:09:09,7862009-10-03 06:09:09,7862009-10-03 06:09:09,786 [ [ [Calais-02Calais-01Calais-03]]] WARN CalaisSubmitter - Too many NON_FATAL; backing off system for a minute (last: ; nested exception is: java.net.ConnectException: Connection timed out on size: 2,094) 2009-10-03 06:09:09,7862009-10-03 06:09:09,7862009-10-03 06:09:09,786 [ [ [Calais-02Calais-01Calais-03]]] DEBUG CalaisSubmitter - Caught ; nested exception is: java.net.ConnectException: Connection timed out while submitting content of size 613; retry 6 2009-10-03 06:09:09,7862009-10-03 06:09:09,7862009-10-03 06:09:09,786 [ [ [Calais-02Calais-01Calais-03]]] DEBUG CalaisSubmitter - Caught ; nested exception is: java.net.ConnectException: Connection timed out while submitting content of size 1,071; retry 10 2009-10-03 06:09:09,786 [Calais-01] WARN CalaisSubmitter - Too many NON_FATAL; backing off system for a minute (last: ; nested exception is: java.net.ConnectException: Connection timed out on size: 1,071) 2009-10-03 06:09:09,786 [Calais-05] DEBUG CalaisSubmitter - Caught ; nested exception is: java.net.ConnectException: Connection timed out while submitting content of size 213; retry 10 2009-10-03 06:09:09,786 [Calais-05] WARN CalaisSubmitter - Too many NON_FATAL; backing off system for a minute (last: ; nested exception is: java.net.ConnectException: Connection timed out on size: 213) 2009-10-03 06:09:16,465 [main] DEBUG Eventweb - IQ: 2 OQ: 0 This makes it clear that Thread.interrupt() is not actually the problem, as that function is never called until the shutdown sequence.
Interleaved messages from multiple threads ------------------------------------------
Key: LBCLASSIC-158 URL: http://jira.qos.ch/browse/LBCLASSIC-158 Project: logback-classic Issue Type: Bug Components: appender Affects Versions: 0.917 Environment: Mac OS X 10.5.8, Eclipse 20090920-1017, Java SE 6, Mac Pro with 2 processors with 4 cores each Reporter: Michael Olson Assignee: Logback dev list Priority: Critical
Note: I'm using pastebin so that the formatting is not messed with by JIRA and so that I can refer to line numbers. My apologies if that makes it more difficult to examine my files. I've just moved to logback from log4j (which I was using only with the default configuration) using the slf4j wrappers. I'm using a configuration which is a slightly changed version of one found in the documentation; the configuration can be found here: http://logback.pastebin.com/m1a6f2875 These two log file excerpts are from that configuration: http://logback.pastebin.com/m5e08cb57 - note lines 3 and 4 http://logback.pastebin.com/m7e575564 - note lines 2, 3, and 4 Then this configuration, which only changes the class levels, is for the next log (I wanted to be precise): http://logback.pastebin.com/f6195004e This log file is from this morning: http://logback.pastebin.com/f42bdb83f Note lines 2 and 3. You'll notice, I'm sure, that these are all from the end of my log files; I have not observed this behavior elsewhere in my logs. The observed problems have been during my shutdown procedure. Specifically, the class PostItemSubmitter, with 4 threads, is waiting on input from CalaisSubmitter (it is trying to retrieve an object from a LinkedBlockingQueue with a get() operation). In these particular logs, CalaisSubmitter has run into fatal errors (over SLA, whoops) and has put no data in the queue, so all four threads are currently blocked on their input queue. After all of the CalaisSubmitter threads are shutdown, the main thread tries to shut down the PostItemSubmitter threads. It does this by calling a function in the thread's instance which sets a boolean value to false that controls the continued execution of the main program loop. After it has set those variables, the main thread will check the size of the queue; if it is 0, which it is in this case, then it will interrupt any threads that have not stopped after one second. It will do this immediately, so all four threads will receive Thread.interrupt() calls in sequence. When those threads are interrupted, they throw an InterruptedException. The catch statement looks like this: catch (InterruptedException e) { if (running) logger.error("Interrupted getting item from queue"); else logger.info("Shutdown aborted queue operation"); } In at least two of these three logs, it seems clear that the problem lies in how the interrupted threads are using the static logger defined for their class, which looks like this: private static Logger logger = LoggerFactory.getLogger(PostItemSubmitter.class); In the other log, the interleaving comes with the CalaisSubmitter, but you can see at least one aborted queue operation, so the reason could be similar (that class is also interrupted on an empty queue and catches those interrupts in the same way). Let me know if you need any more information to help diagnose the problem.
-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://jira.qos.ch/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
participants (2)
-
Ceki Gulcu (JIRA)
-
Michael Olson (JIRA)