I think it's reproducible when setting the level to DEBUG again.
I looked a bit further and here are my findings:
- Thread-21 has a lock on Object (id=202). This must be the lock object within Logback ConsoleAppender because all other threads are waiting on this one (as is NioProcessor-2 for example).
- Thread-21 is waiting for Window (id=485) but not because of a lock but because of an Object.wait() call on Window.
- Karaf must have replaced System.out with a ConsoleOutputStream that wraps a ChannelOutputStream from SSHD.
- ChannelOutputStream.flush() is called recursively in Thread-21 because of the console logging.
- The formatting above is poor. In thread NioProcessor-2 it should read "waiting for Object (id=202) which is owned by [Thread-21] which is waiting on Window (id=485)";
The situation must be as follows: Thread-21 is waiting for some other thread to free space in the Window object (waitForSpace). This is implemented using Object.wait/notify(). However, no other thread will ever call Window.notify because they are all stuck in ConsoleAppender while doing some log output because the ConsoleAppender lock is held by Thread-21 which is sleeping.
I think the issue is a combination of SSHD + Karaf + Logback. Logback is configured to write debug info to System.out Karaf hijacks System.out to properly route output for SSH console sessions established via SSHD. SSHD uses a wait/notify pattern when flushing content from the server to clients via an asynchronous threading model. However, the flushing ("notifying") thread is stuck because of Logback OutputStreamAppender synchronization behavior which only allows one thread at a time to write to any output stream.
So, maybe a possible solution is to disable the synchronization for ConsoleAppender only?
I looked a bit further and here are my findings:
- Thread-21 has a lock on Object (id=202). This must be the lock object within Logback ConsoleAppender because all other threads are waiting on this one (as is NioProcessor-2 for example).
- Thread-21 is waiting for Window (id=485) but not because of a lock but because of an Object.wait() call on Window.
- Karaf must have replaced System.out with a ConsoleOutputStream that wraps a ChannelOutputStream from SSHD.
- ChannelOutputStream.flush() is called recursively in Thread-21 because of the console logging.
- The formatting above is poor. In thread NioProcessor-2 it should read "waiting for Object (id=202) which is owned by [Thread-21] which is waiting on Window (id=485)";
The situation must be as follows: Thread-21 is waiting for some other thread to free space in the Window object (waitForSpace). This is implemented using Object.wait/notify(). However, no other thread will ever call Window.notify because they are all stuck in ConsoleAppender while doing some log output because the ConsoleAppender lock is held by Thread-21 which is sleeping.
I think the issue is a combination of SSHD + Karaf + Logback. Logback is configured to write debug info to System.out Karaf hijacks System.out to properly route output for SSH console sessions established via SSHD. SSHD uses a wait/notify pattern when flushing content from the server to clients via an asynchronous threading model. However, the flushing ("notifying") thread is stuck because of Logback OutputStreamAppender synchronization behavior which only allows one thread at a time to write to any output stream.
So, maybe a possible solution is to disable the synchronization for ConsoleAppender only?