tomee+activemq: JMS reading sometimes hangs with a full queue

Emmanuel Touzery-2
Hello,

     we have a TomEE+ 7.0.3 installation with ActiveMQ, using KahaDB as
persistent message storage. We have an activemq.xml, which we plugged in
through:

BrokerXmlConfig = xbean:file:/opt/app/tomee/conf/activemq.xml

     in the tomee.xml. The ActiveMQ broker runs within TomEE:

ServerUrl       =  tcp://127.0.0.1:61616
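
     For reference, the resource declaration in tomee.xml looks roughly
like this (a sketch only; the resource id is just an example):

<Resource id="MyJmsResourceAdapter" type="ActiveMQResourceAdapter">
    <!-- broker config and connection URL as quoted above -->
    BrokerXmlConfig = xbean:file:/opt/app/tomee/conf/activemq.xml
    ServerUrl       = tcp://127.0.0.1:61616
</Resource>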

     We have a prefetch of 2000:

<transportConnector name="nio"
uri="nio://0.0.0.0:61616?jms.prefetchPolicy.all=2000"/>

     We use mKahaDB. We disabled flow control.
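
     For illustration, the flow-control and mKahaDB parts of activemq.xml
look roughly like this (a sketch only, not our exact configuration; the
directory and filtering are examples):

<destinationPolicy>
  <policyMap>
    <policyEntries>
      <!-- producer flow control disabled for all queues -->
      <policyEntry queue=">" producerFlowControl="false"/>
    </policyEntries>
  </policyMap>
</destinationPolicy>

<persistenceAdapter>
  <!-- one KahaDB store per destination -->
  <mKahaDB directory="${activemq.data}/mkahadb">
    <filteredPersistenceAdapters>
      <filteredKahaDB perDestination="true">
        <persistenceAdapter>
          <kahaDB/>
        </persistenceAdapter>
      </filteredKahaDB>
    </filteredPersistenceAdapters>
  </mKahaDB>
</persistenceAdapter>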

     So that everything would work, we had to add a couple of JARs to
the TomEE lib folder:

activemq-spring-5.14.3.jar
spring-beans-3.2.9.RELEASE.jar
spring-context-3.2.9.RELEASE.jar
spring-core-3.2.9.RELEASE.jar
spring-expression-3.2.9.RELEASE.jar
spring-web-3.2.9.RELEASE.jar
xbean-spring-3.9.jar

     We are "reading" from JMS through message-driven beans,
implementing MessageListener and with @MessageDriven annotations.
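
     A minimal sketch of what one of these consumers looks like (class
and queue names here are illustrative, not our real ones):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "incomingQueue")
})
public class IncomingMessageBean implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // parse the JSON payload and persist it to SQL via Hibernate
    }
}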

     The application is pretty simple: it receives data over HTTP as
JSON and stores it in SQL (through Hibernate).

     Everything works fine as long as the traffic is normal. However,
when there is a surge of incoming traffic, the JMS consumers sometimes
stop getting called and the queue only grows. The issue is not fixed
until TomEE is restarted, and we have then seen it re-appear maybe 40
minutes later. After a while, the server clears the queue and everything
is fine again.

     We took a jstack thread dump of the application when it's in that
"hung" state:
https://www.dropbox.com/s/p8wy7uz6inzsmlj/jstack.txt?dl=0

     What's interesting is that writes fall quite fast, and in steps; in
general not all at once, but not gradually either:
https://www.dropbox.com/s/nhm5s2zc7r9mk9z/graph_writes.png?dl=0

     After a restart things are fine again immediately.

     We're not sure what the cause is. From what we can tell from the
thread dump, the consumers are idle; they just don't get notified that
work is available. The server is certainly aware there are items in the
queue: we monitor the queue through JMX and the queue size keeps growing
during these episodes. We don't see anything out of the ordinary in the
logs. We looked at the thread IDs of the consumers just before the
issue, and it doesn't look like the consumers deadlock one after the
other; it seems like a bunch of them are still called in the last minute
before the dropoff. Also, during a blackout the JDBC pool usage is at 0
according to our JMX monitoring, so it doesn't seem to be a deadlocked
JDBC connection.
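
     The JMX check we run is essentially reading the queue MBean
attributes; roughly like the following sketch (the broker name, queue
name and JMX URL here are illustrative, not our exact setup):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class QueueDepthCheck {
    public static void main(String[] args) throws Exception {
        // assumed JMX port; ours is configured elsewhere
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName queue = new ObjectName(
                    "org.apache.activemq:type=Broker,brokerName=localhost,"
                    + "destinationType=Queue,destinationName=incomingQueue");
            // QueueSize keeps growing during these episodes
            System.out.println("QueueSize     = " + conn.getAttribute(queue, "QueueSize"));
            System.out.println("ConsumerCount = " + conn.getAttribute(queue, "ConsumerCount"));
            System.out.println("InFlightCount = " + conn.getAttribute(queue, "InFlightCount"));
        } finally {
            connector.close();
        }
    }
}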

     We did notice the following ActiveMQ warnings in the log file, but
the timestamps don't match any particular events and, from what we found
out, they don't seem to be particularly worrying or likely to be related
to the issue:

WARNING [ActiveMQ Journal Checkpoint Worker]
org.apache.activemq.store.kahadb.MessageDatabase.getNextLocationForAckForward
Failed to load next journal location: null

WARNING [ActiveMQ NIO Worker 6]
org.apache.activemq.broker.TransportConnection.serviceTransportException
Transport Connection to: tcp://127.0.0.1:37024 failed: java.io.EOFException

     Do you have any suggestions for fixing this issue (which we sadly
can't reproduce at will... and it only happens pretty rarely)? Should we
ask on the ActiveMQ mailing list instead?

     Regards,

emmanuel


Re: tomee+activemq: JMS reading sometimes hangs with a full queue

exabrial12
Hello Emmanuel, can we see the full log of the tomee startup, the full
tomee.xml, the full activemq.xml, and any other differences between your
runtime and the downloaded runtime? thanks!




Re: tomee+activemq: JMS reading sometimes hangs with a full queue

Emmanuel Touzery-2
Hello,

     sure! Regarding the runtime, we also have the PostgreSQL driver in
lib; I think that's all. Hibernate, for instance, is in our WAR.

     here's the activemq.xml =>
https://www.dropbox.com/s/ukttiupouyiv779/activemq.xml?dl=0

     tomee.xml =>
https://www.dropbox.com/s/452s6d6vknp59a4/tomee.xml?dl=0

     startup log =>
https://www.dropbox.com/s/52uqfatlisodayg/catalina.2018-10-21.log?dl=0

     Regards,

Emmanuel



Re: tomee+activemq: JMS reading sometimes hangs with a full queue

Emmanuel Touzery-2
Hello,

     no one has any suggestions?

     Regards,

emmanuel



Re: tomee+activemq: JMS reading sometimes hangs with a full queue

Romain Manni-Bucau
Hello Emmanuel

It can be a lot of things: a network breakdown behind a proxy (so AMQ
does not see it in some cases and a restart recreates the connection), some
backpressure (exponential), some disk issue, etc.

It can be interesting to check your config for healthchecks and batch sizes,
and to dump the threads in the server and client when it hangs. Also, testing
with another backend than KahaDB can be interesting, depending on your
workload.


Re: tomee+activemq: JMS reading sometimes hangs with a full queue

Emmanuel Touzery-2
Hello,

     thank you for the answer!

     In this case TomEE and AMQ are in the same process, on the same
machine, communicating through 127.0.0.1, so the network between AMQ and
TomEE shouldn't be an issue.
     In our case, writing to JMS keeps working, but the consumers don't
get notified. I'm not sure whether there are two separate communication
channels for that?

     I'm not sure what you mean by backpressure, but we did disable flow
control (which should only affect writes, though, not notifying
consumers) -- were you referring to something like that?
     We also don't know about a disk issue -- the persistent queue keeps
filling up on disk, and I see no exceptions in the logs.

     When you talk about batch size, do you mean acknowledgement
optimization? ("ActiveMQ can acknowledge receipt of messages back to the
broker in batches (to improve performance). The batch size is 65% of the
prefetch limit for the Consumer.") This sounds like it could be
related... If acknowledgement breaks down, AMQ would wait on the
consumers to complete, while the consumers actually did complete and are
waiting for new messages. I already had the idea to check the JMX
"InFlightMessages" info during such an incident to confirm whether AMQ
thinks the consumers are busy. But even if it turns out it does, that
doesn't really help me short-term.
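
     For illustration, the batching described in that quote appears to be
ActiveMQ's optimizeAcknowledge optimization, which can be toggled on the
connection URL; a sketch only, we have not tried this:

ServerUrl = tcp://127.0.0.1:61616?jms.optimizeAcknowledge=false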

     In this case client = server (we get messages over HTTP, write them
to a queue on the ActiveMQ which runs in the same process as TomEE, and
consume them from the same TomEE instance), so the thread dump I took
covers both client & server.

     A backend other than KahaDB could be interesting, but there's a lot
of traffic, and validating configuration changes for this server is
expensive. I'm not sure that's really workable, especially since the
chances of this fixing the issue are not that high.

     Regards,

Emmanuel



Re: tomee+activemq: JMS reading sometimes hangs with a full queue

Romain Manni-Bucau
On Wed, 24 Oct 2018 at 09:23, Emmanuel Touzery <
[hidden email]> wrote:

> Hello,
>
>      thank you for the answer!
>
>      In this case TOMEE and AMQ are in the same process, on the same
> machine, communicating through 127.0.0.1 so network between AMQ and
> TOMEE shouldn't be an issue.
>

You are a client of yourself? I was thinking more about client <-> server;

typically, if you have

sender ------ some proxy ------- message driven bean

then you can get this kind of "fake state" at the network layer


>      In our case, writing to JMS keeps working, but consumers don't get
> notified. I'm not sure if there are two separate communication channels
> for that?
>

Normally it is the same, but depending on the queue state it can block.

Did you check your DLQ too?


>
>      I'm not sure what you mean by backpressure, but we did disable flow
> control (which should only affect writes though, not notifying
> consumers) -- were you referring to something like that?
>

Yep


>      Also don't know about a disk issue -- the persistent queue keeps
> filling up on disk, and I see no exceptions in the logs.
>

Maybe next time take a quick peek at the disk state and check whether the
partition is full; it can also be good to activate AMQ debug logs in such
cases (if you can).
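
For instance, something along these lines in TomEE's conf/logging.properties
(assuming the default JULI logging setup; handler levels may need raising too):

org.apache.activemq.level = FINE
java.util.logging.ConsoleHandler.level = FINE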


>
>      When you talk about batch size, do you mean acknowledge
> optimization? ("ActiveMQ can acknowledge receipt of messages back to the
> broker in batches (to improve performance). The batch size is 65% of the
> prefetch limit for the Consumer"). This sounds like it could be
> related.. If acknowldge breaks down, AMQ would wait on consumers to
> complete, while the consumers did complete and are waiting for new
> messages. I already had the idea to check on the JMX "InFlightMessages"
> info during such an incident to confirm whether AMQ thinks that
> consumers are busy. But even if it turns out it does, that doesn't
> really help me, short-term.
>

Yep, here you can test with a batch size of 1 (it will be "slow", but each
message is considered on its own).
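
For instance, on the connection URL (assuming "batch size of 1" maps to the
prefetch here; adapt to whichever batching turns out to be involved):

ServerUrl = tcp://127.0.0.1:61616?jms.prefetchPolicy.all=1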


>
>      In this case client=server (we get messages over HTTP, write them
> in a queue on the activeMQ which runs in the same process as TOMEE, and
> consume them from the same TOMEE instance) so the thread dump I did
> covers both client & server.
>

That answers the first question, so if you don't have any dynamic networking
(as you can have with Docker and the like), it is likely not the issue.


>
>      Another backend than kahadb could be interesting, but there's a lot
> of traffic and validation of configuration changes for the server is
> expensive. I'm not sure that's really workable, especially since the
> chances of this fixing the issue are not that high.
>

Well, first you need to have an environment reproducing the issue, then you
can iterate to identify it:

(no particular order)

0. DLQ state (ensure it is monitored)
1. network (if relevant)
2. backend
3. (potentially) transport
4. AMQ version

...

