
Hi Alex,

Thank you for this feedback. Your proposal to use tryLock instead of lock seems highly appropriate to me. I've assigned LBCORE-168 to myself to ensure that it gets fixed soon.

Cheers,
-- Ceki

On 08.04.2011 15:32, Alex Vb wrote:
I created a bug report about it a while back: http://jira.qos.ch/browse/LBCORE-168

I am not a Windows expert by any stretch of the imagination, but in a Windows cluster you can define a single owner for a shared SAN. This means that if one server goes down, the other automatically takes over ownership of the drive, which should be relatively transparent to other systems. However, there is a delay of roughly 10 seconds in the switch, and during those 10 seconds something can go awfully wrong with the file locks in prudent mode, it seems. Basically, the JVM reports the log file as locked, and because lock() is a blocking call, the thread trying to acquire the lock hangs indefinitely.

The custom appender we have currently written calls tryLock() multiple times with a delay between attempts, and after a configurable number of failures it dies gracefully. It is hard to write a test case for this for obvious reasons, but I was able (at least back when the issue was filed) to recreate the situation repeatedly.
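
For illustration, the retry approach described above (tryLock() with a delay, giving up after a configurable number of failures) might look roughly like the sketch below. This is not the actual custom appender or the eventual logback fix; the class name RetryingFileLock, the method tryLockWithRetry and its parameters maxAttempts and delayMillis are hypothetical names chosen for this example. Only FileChannel.tryLock() and the related java.nio types are standard JDK API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical helper, not part of logback.
public class RetryingFileLock {

    /**
     * Tries to acquire an exclusive lock on the channel without blocking
     * indefinitely. Returns the lock, or null if all attempts failed.
     */
    public static FileLock tryLockWithRetry(FileChannel channel,
                                            int maxAttempts,
                                            long delayMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Non-blocking: returns null if another process holds the lock.
                FileLock lock = channel.tryLock();
                if (lock != null) {
                    return lock;
                }
            } catch (OverlappingFileLockException e) {
                // The lock is already held inside this JVM; count it as a failed attempt.
            }
            if (attempt < maxAttempts) {
                // Wait before retrying, e.g. to ride out a cluster failover.
                Thread.sleep(delayMillis);
            }
        }
        return null; // caller decides how to fail gracefully
    }
}
```

A caller would treat a null result as the signal to stop writing and report an error, rather than hanging the logging thread the way a blocking lock() call can.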