Monday, 25 October 2010

Timeouts to the subscription database when starting NServiceBus

When using durable subscription storage (using a database), when starting NServiceBus you may see the following errors logged:

NHibernate.Util.ADOExceptionReporter {log4net:HostName=MyHost} - NHibernate.ADOException: While preparing SELECT this_.SubscriberEndpoint as y0_ FROM [Subscription] this_ WHERE this_.MessageType in (@p0) an error occurred ---> System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

NHibernate.Util.ADOExceptionReporter {log4net:HostName=MyHost} - While preparing SELECT this_.SubscriberEndpoint as y0_ FROM [Subscription] this_ WHERE this_.MessageType in (@p0) an error occurred

NHibernate.Util.ADOExceptionReporter {log4net:HostName=MyHost} - System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

NHibernate.Util.ADOExceptionReporter {log4net:HostName=MyHost} - Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

NServiceBus.Unicast.UnicastBus {log4net:HostName=MyHost} - MyHostFailed handling message.
System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

NServiceBus.Unicast.Transport.Msmq.MsmqTransport {log4net:HostName=MyHost} - Failed raising 'transport message received' event for message with ID=00000000-0000-0000-0000-000000000000\000000
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> NHibernate.ADOException: While preparing SELECT this_.SubscriberEndpoint as y0_ FROM [Subscription] this_ WHERE this_.MessageType in (@p0) an error occurred ---> System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.

NServiceBus.Unicast.Transport.Msmq.MsmqTransport {log4net:HostName=MyHost} - Message has failed the maximum number of times allowed, ID=00000000-0000-0000-0000-000000000000\000000.

The problem here is to do with having a lot of logical messages in a single physical message and that physical message is the first one that is received by NServiceBus after starting up.

In NServiceBus, you can send multiple message instances at once. You can do this by calling “IBus.Publish(IMessage[])”. These multiple message instances are wrapped up into a single physical message. There does not appear to be any hard limit on the number of logical messages that can be put into a single physical message. You can, therefore, send a lot of logical messages in a single physical message.

When NServiceBus starts up and receives the first physical message, if you are using durable subscription storage (using a database), NServiceBus will look up the subscriptions from the subscriptions database (so that it knows where to send the message). Once the subscriptions list is initialised, it is stored in memory so there are no subsequent database look-ups. Note that new subscriptions are received via messages, so the subscription list in memory will be kept up to date as subscriptions are added (when a new subscription is received it is added to the in-memory list and written to the subscription database).

NServiceBus will perform the subscription database look-up for every logical message in the first physical message it receives, opening a connection to the subscription database for each logical message. It appears that NServiceBus will not close the connection to the subscription database until it has finished processing the message. All logical messages must be processed within the same operation when they are contained in a single physical message. Should you have more logical messages than the maximum database connection pool size, the operation will not complete (it will timeout).

For example, let’s say your connection string for the subscription database allows 100 connections in the connection pool but you have 150 logical messages in the first physical message that is received. As the message is the first received, NServiceBus needs to look up the subscriptions. So NServiceBus try to will open 150 connections to the subscription database, it will successfully open 100. When processing the 101st logical message, it will not be able to establish a connection because there are 100 open connections and that is the maximum allowed (the first 100 connections will not be closed until the whole operation on the physical message is completed). The operation to establish the 101st connection will timeout. The whole operation on the physical message will be considered to have failed, any transactions will be rolled back and the physical message will be re-tried the configured number of times (with each re-try failing for the same reason). Eventually the physical message will be sent to the error queue and the next physical message will be processed.

Should a subsequent physical message be processed successfully (i.e. one with less logical messages than allowed database connections), you can re-try your physical message with a large number of logical messages and it will also be processed successfully (because the subscriptions list will already be initialised in memory at that point).

The short-term work around is to have less logical messages than you have allowed database connections (for that first message). The default connection pool size is 100, so that is the limit. I am looking for a long term solution, I think this is a bug in NServiceBus.


About Me