
This issue is still creeping me out. First of all, I'm perfectly fine with your workaround. Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one... Cheers, Joern.

On Thu, 29 Sep 2011 17:44:29 +0200, Joern Huxhorn wrote:
This issue is still creeping me out.
No reason for that.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Did you read the last comment I added to the issue after it was closed? -h

On 30/09/2011 4:29 PM, Holger Hoffstaette wrote:
On Thu, 29 Sep 2011 17:44:29 +0200, Joern Huxhorn wrote:
This issue is still creeping me out.
No reason for that.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Did you read the last comment I added to the issue after it was closed?
Yes, but I did not understand the point you were trying to make. You wrote: "If you google for the exception practically all occurrences show _buggy application code_ as the cause, " indeed, but then you wrote: "which explains why this happens only with Zookeeper (no kidding)." Huh? You then wrote: "The simplest explanation can be found here: http://jira.porticoproject.org/browse/PORT-168" In PORT-168, Tim Pokorny talks about "unlocking the readlock when [he] should have locking"; that is not the situation we are in. Perhaps you are saying that there is a problem in AppenderAttachableImpl?
-h
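For context, the failure mode described in PORT-168 is easy to demonstrate in isolation. The following is a minimal standalone sketch, not code from logback or Portico, showing that releasing a read lock the current thread never acquired is rejected, at least on the Java 6 JDKs discussed in this thread:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class UnmatchedUnlockDemo {
        public static void main(String[] args) {
            ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
            // This thread never called readLock().lock(), so the release below is
            // unmatched and fails with an IllegalMonitorStateException
            // ("attempt to unlock read lock, not locked by current thread").
            rwLock.readLock().unlock();
        }
    }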

On Fri, 30 Sep 2011 17:59:56 +0200, Ceki Gülcü wrote:
Yes, but I did not understand the point you were trying to make.
You wrote:
"If you google for the exception practically all occurrences show _buggy application code_ as the cause, "
indeed, but then you write:
"which explains why this happens only with Zookeeper (no kidding)."
The point being that ZK has "issues" with its own Thread management.
"The simplest explanation can be found here: http://jira.porticoproject.org/browse/PORT-168"
In PORT-168 Tim Pokorny talks about "unlocking the readlock when [he] should have locking", that is not the situation we are in. Perhaps you are saying that there is a problem in AppenderAttachableImpl?
What I was trying to say is that the bug is much more likely in ZK and one of its racy/buggy codepaths that merely exposes this symptom. They originally hardcoded everything against log4j and therefore probably just didn't see this happen, maybe because log4j is oversynchronized everywhere and thus hiding the root cause. I also looked through 0.9.30's AppenderAttachableImpl and it looks fine. Using a COWList is still probably better anyway, since it reduces code and makes the most common paths simpler, so that's good. However, it also hides the buggy ZK behaviour again, but that is hardly a logback bug. :) cheers Holger
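A rough sketch of the COWList approach mentioned above. This is a simplified, hypothetical stand-in for logback's AppenderAttachableImpl, with a minimal Appender interface assumed rather than logback's real one: because iteration over a CopyOnWriteArrayList works on an immutable snapshot, the append loop needs no read lock and no unlock can ever be unbalanced.

    import java.util.concurrent.CopyOnWriteArrayList;

    // Minimal stand-in for logback's Appender interface (assumption for this sketch).
    interface Appender<E> {
        void doAppend(E event);
    }

    class CowAppenderAttachable<E> {
        private final CopyOnWriteArrayList<Appender<E>> appenderList =
                new CopyOnWriteArrayList<Appender<E>>();

        public void addAppender(Appender<E> newAppender) {
            appenderList.addIfAbsent(newAppender);  // writers copy the backing array
        }

        public int appendLoopOnAppenders(E event) {
            int size = 0;
            // Iteration uses the snapshot taken when the iterator was created, so
            // concurrent add/remove calls cannot disturb it and no lock is needed.
            for (Appender<E> appender : appenderList) {
                appender.doAppend(event);
                size++;
            }
            return size;
        }
    }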

On 30/09/2011 6:27 PM, Holger Hoffstaette wrote:
What I was trying to say is that the bug is much more likely in ZK and one of its racy/buggy codepaths that merely exposes this symptom. They originally hardcoded everything against log4j and therefore probably just didn't see this happen, maybe because log4j is oversynchronized everywhere and thus hiding the root cause.
I don't see what a caller of logback, in this case ZooKeeper, could do to cause the readLock to be unlocked before it is locked. Actually, how could you trigger the observed behavior even if you wanted to do so on purpose? Do you have any ideas? -- http://twitter.com/ceki

On Fri, 30 Sep 2011 18:36:54 +0200, Ceki Gülcü wrote:
I don't see what a caller of logback, in this case ZooKeeper, could do to cause the readLock to be unlocked before it is locked. Actually, how could you trigger the observed behavior even if you wanted to do so on purpose? Do you have any ideas?
The most common cause seems to be a swallowed inner exception. Several other findings seem to support this:
http://java.net/jira/browse/GLASSFISH-1881
http://lists.terracotta.org/pipermail/ehcache-list/2011-July/thread.html
http://cs.oswego.edu/pipermail/concurrency-interest/2006-June/002738.html
See last one for a good example. Curiously, it seems that this problem reared its head some time ago: http://qos.ch/pipermail/logback-dev/2009-February/003620.html though I think that was more about safe lock usage, which seems to be addressed now. Maybe somehow there _is_ a silent exception happening between Logger.appendLoopOnAppenders() and AAI, since it also appears on the JIRA's stack trace? I'm not familiar enough with logback's internals to make more educated guesses right now. The only thing I can immediately think of would be some weird recursive logging from a threaded appender, with bits of a StackOverflowError (maybe causing more logging..) thrown in. -h
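The "swallowed inner exception" pattern behind most of the reports linked above looks roughly like the following sketch. It is deliberately not logback's code (AppenderAttachableImpl acquires the lock before entering its try block); here the acquisition itself fails, the exception is swallowed, and the finally block then releases a lock that was never held:

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class SwallowedExceptionDemo {
        public static void main(String[] args) {
            Lock readLock = new ReentrantReadWriteLock().readLock();
            Thread.currentThread().interrupt();   // simulate a pending interrupt
            try {
                readLock.lockInterruptibly();     // throws InterruptedException; lock never acquired
                // ... guarded work would go here ...
            } catch (InterruptedException swallowed) {
                // the real failure is silently discarded here
            } finally {
                readLock.unlock();                // IllegalMonitorStateException: nothing to release
            }
        }
    }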

On 30.09.2011, at 19:09, Holger Hoffstaette wrote:
On Fri, 30 Sep 2011 18:36:54 +0200, Ceki Gülcü wrote:
I don't see what a caller of logback, in this case ZooKeeper, could do to cause the readLock to be unlocked before it is locked. Actually, how could you trigger the observed behavior even if you wanted to do so on purpose? Do you have any ideas?
The most common cause seems to be a swallowed inner exception. Several other findings seem to support this:
http://java.net/jira/browse/GLASSFISH-1881
They had a bug in their codebase that they subsequently fixed.
http://lists.terracotta.org/pipermail/ehcache-list/2011-July/thread.html
They extended ReentrantReadWriteLock in a rather complex way, and I'm pretty sure that they simply made a mistake somewhere. See https://fisheye2.atlassian.com/browse/ehcache/tags/ehcache-core-2.4.2/src/ma... I don't want to examine 909 lines of code to find the real reason for their issue.
http://cs.oswego.edu/pipermail/concurrency-interest/2006-June/002738.html
See last one for a good example.
I can see the reason for that exception perfectly well. They just misused the locks.
Curiously it seems that this problem reared its head some time ago: http://qos.ch/pipermail/logback-dev/2009-February/003620.html though I think that was more about safe lock usage, which seems to be addressed now.
Maybe somehow there _is_ a silent exception happening between Logger.appendLoopOnAppenders() and AAI, since it also appears on the JIRA's stack trace? I'm not familiar enough with logback's internals to make more educated guesses right now.
The only thing I can immediately think of would be some weird recursive logging from a threaded appender, with bits of a StackOverflowError (maybe causing more logging..) thrown in.
-h
I checked all the links above and they are all doing way more complex stuff. The method in question is very, very simple:

    public int appendLoopOnAppenders(E e) {
      int size = 0;
      r.lock();
      try {
        for (Appender<E> appender : appenderList) {
          appender.doAppend(e);
          size++;
        }
      } finally {
        r.unlock();
      }
      return size;
    }

There is simply *no* way this exception should be able to happen. r is a private final variable so, assuming no one is accessing that variable maliciously via reflection, locking is guaranteed to be symmetric. Usage of variable r in the rest of the class is also correct.

I'm just baffled by this issue. And I hate being baffled by something that is very obviously correct. Yes, it's easy to make a mistake, but there is just no apparent problem in the previous implementation of AppenderAttachableImpl. I triple-checked it. This leaves only two options: either I fundamentally misunderstand something about Java concurrency (which I would be very eager to rectify!), or there is a serious bug in the implementation of (at least the unfair) ReentrantReadWriteLock and/or in some Java VMs (which makes me highly nervous).

Joern.
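A minimal sketch of the point above: since r.lock() runs before the try block, even a doAppend() call that throws leaves lock and unlock perfectly balanced. The RuntimeException here is a hypothetical stand-in for a failing appender, not anything observed in the real test:

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class BalancedLockDemo {
        public static void main(String[] args) {
            Lock r = new ReentrantReadWriteLock().readLock();
            try {
                r.lock();              // acquired before the guarded try, as in AppenderAttachableImpl
                try {
                    throw new RuntimeException("appender failure");  // stand-in for doAppend() throwing
                } finally {
                    r.unlock();        // matches the successful lock() above; never an IMSE
                }
            } catch (RuntimeException expected) {
                System.out.println("lock stayed balanced despite: " + expected.getMessage());
            }
        }
    }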

Addendum.. On Fri, 30 Sep 2011 18:27:19 +0200, Holger Hoffstaette wrote:
What I was trying to say is that the bug is much more likely in ZK and one of its racy/buggy codepaths that merely exposes this symptom. They
This is further backed by the fact that this happens during a test case (one of many?) during ZKServer shutdown. If they have multiple threads shutting down the server, zombie threads hanging around in the background, and maybe shutting down logging (?) or doing other crazy things, then it's easy to see how this could result in bungled lock acquisition. -h

On 30/09/2011 6:39 PM, Holger Hoffstaette wrote:
Addendum..
On Fri, 30 Sep 2011 18:27:19 +0200, Holger Hoffstaette wrote:
What I was trying to say is that the bug is much more likely in ZK and one of its racy/buggy codepaths that merely exposes this symptom. They
This is further backed by the fact that this happens during a test case (one of many?) during ZKServer shutdown. If they have multiple threads shutting down the server, zombie threads hanging around in the background, and maybe shutting down logging (?) or doing other crazy things, then it's easy to see how this could result in bungled lock acquisition.
On branch lbcore224, I modified AppenderAttachableImpl so that only addAppender and appendLoopOnAppenders could be invoked. See [1]. IllegalMonitorStateException was still being thrown. Given commit [1], how could you trigger an IllegalMonitorStateException? It seems quite impossible, although the ReentrantReadWriteLock code is quite complicated, so there is ample room for bugs. [1] https://github.com/ceki/logback/commit/379c40573e208b2c5bf52c7e
-h -- http://twitter.com/ceki

On 29/09/2011 5:44 PM, Joern Huxhorn wrote:
This issue is still creeping me out.
First of all, I'm perfectly fine with your workaround.
OK, good to hear. Thanks.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Although I do not have the time to work on this myself, I support your initiative.
Joern.

On 30.09.2011, at 18:02, Ceki Gülcü wrote:
On 29/09/2011 5:44 PM, Joern Huxhorn wrote:
This issue is still creeping me out.
First of all, I'm perfectly fine with your workaround.
OK, good to hear. Thanks.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Although I do not have the time to work on this myself, I support your initiative.
Hint taken. ;) My only problem is that I don't have any system that's able to reproduce this bug (only Mac or Linux), and I'd rather not submit this already very obscure issue without being sure that the attached example/application is really reproducing the problem. Joern.

On 03/10/2011 7:16 PM, Joern Huxhorn wrote:
On 30.09.2011, at 18:02, Ceki Gülcü wrote:
On 29/09/2011 5:44 PM, Joern Huxhorn wrote:
This issue is still creeping me out.
First of all, I'm perfectly fine with your workaround.
OK, good to hear. Thanks.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Although I do not have the time to work on this myself, I support your initiative.
Hint taken. ;)
My only problem is that I don't have any system that's able to reproduce this bug (only Mac or Linux), and I'd rather not submit this already very obscure issue without being sure that the attached example/application is really reproducing the problem.
On the lbcore224 branch, before commit eef137b0c6ca92cd4 [1], running on 64-bit Windows 7 with an AMD Phenom II X4 965 processor, I am able to reproduce the bug 6 or 7 times out of every 10 runs. So it's pretty reproducible, at least on my machine. The fact that the IllegalMonitorStateException is thrown within the ZooKeeperServer.shutdown method could be relevant. I was also wondering whether the IllegalMonitorStateException could be caused by Thread.interrupt calls. [1] https://github.com/ceki/logback/commit/eef137b0c6ca92cd4
Joern.
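Regarding the Thread.interrupt question: a plain ReadLock.lock() does not respond to interruption (unlike lockInterruptibly()), so an interrupt by itself should not leave the lock unacquired and the following unlock unbalanced. A quick standalone sketch, unrelated to the test project:

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class InterruptVsReadLock {
        public static void main(String[] args) {
            Lock readLock = new ReentrantReadWriteLock().readLock();
            Thread.currentThread().interrupt();   // pending interrupt before locking
            readLock.lock();                      // plain lock() ignores interruption and still acquires
            try {
                // guarded work
            } finally {
                readLock.unlock();                // balanced release, no exception
            }
            // the interrupt status is preserved rather than consumed by lock()
            System.out.println("still interrupted: " + Thread.interrupted());
        }
    }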

Could you please give https://github.com/huxi/lbcore224 a try? Just clone the repository and call gradlew.bat - probably multiple times. This will execute the test. As I said, I can't reproduce the issue myself. Everything works fine for me. I hope I thought about everything... Cheers, Joern. On 03.10.2011, at 19:41, Ceki Gülcü wrote:
On 03/10/2011 7:16 PM, Joern Huxhorn wrote:
On 30.09.2011, at 18:02, Ceki Gülcü wrote:
On 29/09/2011 5:44 PM, Joern Huxhorn wrote:
This issue is still creeping me out.
First of all, I'm perfectly fine with your workaround.
OK, good to hear. Thanks.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Although I do not have the time to work on this myself, I support your initiative.
Hint taken. ;)
My only problem is that I don't have any system that's able to reproduce this bug (only Mac or Linux), and I'd rather not submit this already very obscure issue without being sure that the attached example/application is really reproducing the problem.
On the lbcore224 branch, before commit eef137b0c6ca92cd4 [1], running on 64-bit Windows 7 with an AMD Phenom II X4 965 processor, I am able to reproduce the bug 6 or 7 times out of every 10 runs. So it's pretty reproducible, at least on my machine.
The fact that the IllegalMonitorStateException is thrown within the ZooKeeperServer.shutdown method could be relevant. I was also wondering whether the IllegalMonitorStateException could be caused by Thread.interrupt calls.
[1] https://github.com/ceki/logback/commit/eef137b0c6ca92cd4
Joern.
-- http://twitter.com/ceki

Hi Joern, I checked out and ran https://github.com/huxi/lbcore224 but could not reproduce the problem. The only difference I could find between your project and the tests I ran in my lbcore224 branch is the logback config file [1]. You do not have any config file. After adding it under src/test/resources, logback picked it up. So my tests and yours look identical, but the latter do not result in an IllegalMonitorStateException. Weird getting weirder... [1] https://github.com/ceki/logback/blob/lbcore224/logback-classic/src/test/reso... On 04/10/2011 12:31 AM, Joern Huxhorn wrote:
Could you please give https://github.com/huxi/lbcore224 a try?
Just clone the repository and call gradlew.bat - probably multiple times. This will execute the test.
As I said, I can't reproduce the issue myself. Everything works fine for me.
I hope I thought about everything...
Cheers, Joern.
On 03.10.2011, at 19:41, Ceki Gülcü wrote:
On 03/10/2011 7:16 PM, Joern Huxhorn wrote:
On 30.09.2011, at 18:02, Ceki Gülcü wrote:
On 29/09/2011 5:44 PM, Joern Huxhorn wrote:
This issue is still creeping me out.
First of all, I'm perfectly fine with your workaround.
OK, good to hear. Thanks.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Although I do not have the time to work on this myself, I support your initiative.
Hint taken. ;)
My only problem is that I don't have any system that's able to reproduce this bug (only Mac or Linux), and I'd rather not submit this already very obscure issue without being sure that the attached example/application is really reproducing the problem.
On the lbcore224 branch, before commit eef137b0c6ca92cd4 [1], running on 64-bit Windows 7 with an AMD Phenom II X4 965 processor, I am able to reproduce the bug 6 or 7 times out of every 10 runs. So it's pretty reproducible, at least on my machine.
The fact that the IllegalMonitorStateException is thrown within the ZooKeeperServer.shutdown method could be relevant. I was also wondering whether the IllegalMonitorStateException could be caused by Thread.interrupt calls.
[1] https://github.com/ceki/logback/commit/eef137b0c6ca92cd4
Joern.
-- http://twitter.com/ceki

oO Ok, added the logback-test.xml (but changed file location to build/lbcore224.log). Could you please take a look at the output of your log file? Mine says ~[zookeeper-3.4.0.jar:3.4.0-1178579] since I built against SVN rev 1178579. You could replace the two files in lib with your versions and check with them. The only other apparent difference is that I use a TemporaryFolder @Rule instead of creating and deleting the directories manually. I also use JUnit 4.10. And Gradle instead of Maven. Other than that..... I have not the faintest idea. The other two dependencies are just slf4j-1.6.2 and logback-0.9.30 (see build.gradle). I assumed César was also referring to the released 0.9.30. Thanks for having a look at it & sorry for bothering you. ;) Joern. On 04.10.2011, at 08:09, Ceki Gülcü wrote:
Hi Joern,
I checked out and ran https://github.com/huxi/lbcore224 but could not reproduce the problem. The only difference I could find between your project and the tests I ran in my lbcore224 branch is the logback config file [1]. You do not have any config file. After adding it under src/test/resources, logback picked it up. So my tests and yours look identical, but the latter do not result in an IllegalMonitorStateException.
Weird getting weirder...
[1] https://github.com/ceki/logback/blob/lbcore224/logback-classic/src/test/reso...
On 04/10/2011 12:31 AM, Joern Huxhorn wrote:
Could you please give https://github.com/huxi/lbcore224 a try?
Just clone the repository and call gradlew.bat - probably multiple times. This will execute the test.
As I said, I can't reproduce the issue myself. Everything works fine for me.
I hope I thought about everything...
Cheers, Joern.
On 03.10.2011, at 19:41, Ceki Gülcü wrote:
On 03/10/2011 7:16 PM, Joern Huxhorn wrote:
On 30.09.2011, at 18:02, Ceki Gülcü wrote:
On 29/09/2011 5:44 PM, Joern Huxhorn wrote:
This issue is still creeping me out.
First of all, I'm perfectly fine with your workaround.
OK, good to hear. Thanks.
Nevertheless, we should probably create an app that produces the IllegalMonitorStateException and deliver it to the Java developers. If this is a Java bug - and I have no other explanation - then it's a really critical one...
Although I do not have the time to work on this myself, I support your initiative.
Hint taken. ;)
My only problem is that I don't have any system that's able to reproduce this bug (only Mac or Linux), and I'd rather not submit this already very obscure issue without being sure that the attached example/application is really reproducing the problem.
On the lbcore224 branch, before commit eef137b0c6ca92cd4 [1], running on 64-bit Windows 7 with an AMD Phenom II X4 965 processor, I am able to reproduce the bug 6 or 7 times out of every 10 runs. So it's pretty reproducible, at least on my machine.
The fact that the IllegalMonitorStateException is thrown within the ZooKeeperServer.shutdown method could be relevant. I was also wondering whether the IllegalMonitorStateException could be caused by Thread.interrupt calls.
[1] https://github.com/ceki/logback/commit/eef137b0c6ca92cd4
Joern.
-- http://twitter.com/ceki
-- http://twitter.com/ceki

When run with Gradle, the test finishes without exception. When I run the lbcore224 project under IntelliJ, an IllegalMonitorStateException is thrown. I am using logback version 0.9.30 and JUnit 4.8.2. I'll try with the same version of JUnit a little later. On 04.10.2011 11:29, Joern Huxhorn wrote:
oO
Ok, added the logback-test.xml (but changed file location to build/lbcore224.log)
Could you please take a look at the output of your log file? Mine says ~[zookeeper-3.4.0.jar:3.4.0-1178579] since I built against SVN rev 1178579. You could replace the two files in lib with your versions and check with them.
The only other apparent difference is that I use a TemporaryFolder @Rule instead of creating and deleting the directories manually. I also use JUnit 4.10. And Gradle instead of Maven. Other than that..... I have not the faintest idea. The other two dependencies are just slf4j-1.6.2 and logback-0.9.30 (see build.gradle). I assumed César was also referring to the released 0.9.30.
Thanks for having a look at it & sorry for bothering you. ;)
Joern.

The version of JUnit is not significant here. What seems significant is the build environment, that is, Gradle vs. IntelliJ. On 04.10.2011 13:37, Ceki Gulcu wrote:
When run with Gradle, the test finishes without exception. When I run the lbcore224 project under IntelliJ, an IllegalMonitorStateException is thrown. I am using logback version 0.9.30 and JUnit 4.8.2. I'll try with the same version of JUnit a little later.

Did the original test in your branch fail using Maven or did it also only fail from IDEA? This could hint at some attached debugger weirdness or something like that. Or the execution in IDEA is simply a little bit slower than on the console, thus triggering the problem... I need to get a Windows machine up and running. :( On 04.10.2011, at 14:35, Ceki Gulcu wrote:
The version of JUnit is not significant here. What seems significant is the build environment, that is, Gradle vs. IntelliJ.
On 04.10.2011 13:37, Ceki Gulcu wrote:
When run with Gradle, the test finishes without exception. When I run the lbcore224 project under IntelliJ, an IllegalMonitorStateException is thrown. I am using logback version 0.9.30 and JUnit 4.8.2. I'll try with the same version of JUnit a little later.

On 04.10.2011 14:46, Joern Huxhorn wrote:
Did the original test in your branch fail using Maven or did it also only fail from IDEA?
I was not sure if I could reproduce the IllegalMonitorStateException from Maven. I just resurrected the WithZookeperTest and ran it with Maven and was able to reproduce the IMSE. BTW, there was no logback configuration file, so the default config was used (ConsoleAppender). Maybe IDEA and Maven launch the tests differently than Gradle... ? -- Ceki

I updated my project at https://github.com/huxi/lbcore224 and added a pom.xml. I'm able to reproduce the issue on Mac OS X 10.6 with 1.6.0_u26 if I execute "mvn clean install". I'm also able to reproduce it running the test in IDEA. Last but not least, I submitted http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7099601 but it hasn't been processed yet. Cheers, Joern. On 04.10.2011, at 16:05, Ceki Gulcu wrote:
On 04.10.2011 14:46, Joern Huxhorn wrote:
Did the original test in your branch fail using Maven or did it also only fail from IDEA?
I was not sure if I could reproduce the IllegalMonitorStateException from Maven. I just resurrected the WithZookeperTest and ran it with Maven and was able to reproduce the IMSE. BTW, there was no logback configuration file, so the default config was used (ConsoleAppender).
Maybe IDEA and Maven launch the tests differently than Gradle... ? -- Ceki

Bug 7099601 is not yet accessible. Anyway, I hope the Sun folks don't throw it out because it may seem to lack credibility: its reproduction requires logback+zookeeper, and one needs to carefully read the pre-fix version of AppenderAttachableImpl to be convinced that we are not hallucinating. Syzygy is the word that comes to my mind. I am keeping my fingers crossed. On 11.10.2011 17:10, Joern Huxhorn wrote:
I updated my project at https://github.com/huxi/lbcore224 and added a pom.xml. I'm able to reproduce the issue on Mac OS X 10.6 with 1.6.0_u26 if I execute "mvn clean install". I'm also able to reproduce it running the test in IDEA. Last but not least, I submitted http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7099601 but it hasn't been processed yet.
Cheers, Joern.
On 04.10.2011, at 16:05, Ceki Gulcu wrote:
On 04.10.2011 14:46, Joern Huxhorn wrote:
Did the original test in your branch fail using Maven or did it also only fail from IDEA?
I was not sure if I could reproduce the IllegalMonitorStateException from Maven. I just resurrected the WithZookeperTest and ran it with Maven and was able to reproduce the IMSE. BTW, there was no logback configuration file, so the default config was used (ConsoleAppender).
Maybe IDEA and Maven launch the tests differently than Gradle... ? -- Ceki
participants (5):
- ceki
- Ceki Gulcu
- Ceki Gülcü
- Holger Hoffstaette
- Joern Huxhorn