Microsoft Exchange 2016 – Version Store Out of Memory Error

Microsoft Exchange Microsoft Windows

The Microsoft Exchange 2016 ESE database engine manages the database engine and database files that make up the Mailbox Databases on the Microsoft Exchange server; it reads and writes data to and from the database and manages the database files through background management tasks. The Exchange databases are based on the JET database engine and are fully transaction based and fully obeys ACID principles of database design.

The issue occurred a couple of years ago, but I thought i’d write it up, because in theory it could continue to affect people now who end up with the same situation.

Issue and Symptoms

Users on the affected mailbox will see slow and/or sporadic email access via IMAP, OWA or RPC over HTTPS connections, some inbound or outbound mail delivery will be rejected or delayed. The Exchange server hosting the affected database will also be logging the following events within the event log. 

You can filter on these pertinent log IDs although others might be relevant: 4012,630,1046,1002,1051,4013,9042,4009

Some examples are given below:

Log Name:      Application
Source:        MSExchangeIS
Date:          01/05/2020 13:14:02
Event ID:      1034
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      exchangeservername.domain.com
Description:
A Microsoft Exchange Server Information Store task failed with a non-fatal database exception. Exception text is (Microsoft.Exchange.Server.Storage.Common.NonFatalDatabaseException: JetTableOperator.Delete ---> Microsoft.Isam.Esent.Interop.EsentVersionStoreOutOfMemoryException: Version store out of memory (cleanup already attempted)
   at Microsoft.Isam.Esent.Interop.Api.JetDelete(JET_SESID sesid, JET_TABLEID tableid)
   at Microsoft.Exchange.Server.Storage.PhysicalAccessJet.JetTableOperator.Delete()
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.Server.Storage.PhysicalAccessJet.JetTableOperator.Delete()
   at Microsoft.Exchange.Server.Storage.PhysicalAccessJet.JetDeleteOperator.ExecuteScalar()
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.DoMaintenanceDelete(Context context, MaintenanceRecordData maintenanceRecordData)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenance(Context context, MaintenanceRecordData maintenanceRecordData)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenanceChunk(Context context, Queue`1 maintenanceRecords, Int32 numberOfRecordsToApply, Boolean commitTransaction)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenanceToIndexNoLock(Context context, Boolean canRepopulate, Boolean canPulseTransaction, Int64 itemCountEstimate)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenanceToIndex(Context context, Boolean canRepopulate, Boolean canPulseTransaction, Int64 itemCountEstimate)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.UpdateIndexImpl(Context context, Object populationCallback, Int64 itemCountEstimate, Boolean updateLastReferenceDate, Boolean& indexRepopulated, Boolean& indexRepopulatedFromAnotherIndex)
   at Microsoft.Exchange.Server.Storage.LogicalDataModel.SearchFolder.BaseViewOperatorDefinition(Context context, Mailbox mailbox, IList`1 columnsToFetch, SearchCriteria criteria, Nullable`1 associated)
   at Microsoft.Exchange.Server.Storage.LogicalDataModel.SearchFolder.<DoSearchPopulation>d__203.MoveNext()
   at Microsoft.Exchange.Protocols.MAPI.MapiViewMessage.<CreateMaterializedRestrictionAndPopulate>d__138.MoveNext()
   at Microsoft.Exchange.Server.Storage.MapiDisp.MailboxTaskQueue.RunTaskStep(MailboxTaskContext mailboxTaskContext, QueueItem queueItem, Func`1 shouldTaskContinue)
   at Microsoft.Exchange.Server.Storage.StoreCommonServices.TaskExecutionWrapper`1.<>c__DisplayClass2_0`1.<WrapExecute>b__0(TaskExecutionDiagnosticsProxy diagnosticContext, TTaskContext taskContext, Func`1 shouldCallbackContinue)).
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="MSExchangeIS" />
    <EventID Qualifiers="49156">1034</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2020-05-01T12:14:02.820404100Z" />
    <EventRecordID>9660961</EventRecordID>
    <Channel>Application</Channel>
    <Computer>exchangeservername.domain.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>Microsoft.Exchange.Server.Storage.Common.NonFatalDatabaseException: JetTableOperator.Delete ---> Microsoft.Isam.Esent.Interop.EsentVersionStoreOutOfMemoryException: Version store out of memory (cleanup already attempted)
   at Microsoft.Isam.Esent.Interop.Api.JetDelete(JET_SESID sesid, JET_TABLEID tableid)
   at Microsoft.Exchange.Server.Storage.PhysicalAccessJet.JetTableOperator.Delete()
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.Server.Storage.PhysicalAccessJet.JetTableOperator.Delete()
   at Microsoft.Exchange.Server.Storage.PhysicalAccessJet.JetDeleteOperator.ExecuteScalar()
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.DoMaintenanceDelete(Context context, MaintenanceRecordData maintenanceRecordData)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenance(Context context, MaintenanceRecordData maintenanceRecordData)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenanceChunk(Context context, Queue`1 maintenanceRecords, Int32 numberOfRecordsToApply, Boolean commitTransaction)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenanceToIndexNoLock(Context context, Boolean canRepopulate, Boolean canPulseTransaction, Int64 itemCountEstimate)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.ApplyMaintenanceToIndex(Context context, Boolean canRepopulate, Boolean canPulseTransaction, Int64 itemCountEstimate)
   at Microsoft.Exchange.Server.Storage.LazyIndexing.LogicalIndex.UpdateIndexImpl(Context context, Object populationCallback, Int64 itemCountEstimate, Boolean updateLastReferenceDate, Boolean& indexRepopulated, Boolean& indexRepopulatedFromAnotherIndex)
   at Microsoft.Exchange.Server.Storage.LogicalDataModel.SearchFolder.BaseViewOperatorDefinition(Context context, Mailbox mailbox, IList`1 columnsToFetch, SearchCriteria criteria, Nullable`1 associated)
   at Microsoft.Exchange.Server.Storage.LogicalDataModel.SearchFolder.<DoSearchPopulation>d__203.MoveNext()
   at Microsoft.Exchange.Protocols.MAPI.MapiViewMessage.<CreateMaterializedRestrictionAndPopulate>d__138.MoveNext()
   at Microsoft.Exchange.Server.Storage.MapiDisp.MailboxTaskQueue.RunTaskStep(MailboxTaskContext mailboxTaskContext, QueueItem queueItem, Func`1 shouldTaskContinue)
   at Microsoft.Exchange.Server.Storage.StoreCommonServices.TaskExecutionWrapper`1.<>c__DisplayClass2_0`1.<WrapExecute>b__0(TaskExecutionDiagnosticsProxy diagnosticContext, TTaskContext taskContext, Func`1 shouldCallbackContinue)</Data>
    <Binary>5B444941475F4354585D00001E010000FF1100000000000000021001000080CA00100000000030FF001000000000309F0010000000001AE810108F0000001A8810109B000000CCE850205D0B6D46574D333600000000CCE850205D0B6D46574D33370000000080A18030A8D0E999A1E4114E9746DB6B738E665E0000000080891010987E0000CCE850205D0B6D46574D333800000000CCE850205D0B6D46574D33390000000080A18030A8D0E999A1E4114E9746DB6B738E665E0000000080891010987E0000C88F4010D3FBFFFFCCDA001000000000B0DD001000000000B0AD001000000000B0ED001000000000B08D001000000000B095001000000000B0E5001000000000C8F440108009000080A18030A8D0E999A1E4114E9746DB6B738E665E0000000080891010987E0000</Binary>
  </EventData>
</Event>

This event log entry was one of the most interesting:

Log Name:      Application
Source:        ESE
Date:          01/05/2020 13:16:12
Event ID:      630
Task Category: Transaction Manager
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      exchangeserver.domain.com
Description:
Information Store - 1601A-6 (32408,D,0,15.01.1466.011) 1601A-6: The version store for this instance (0) has a long-running transaction that exceeds the maximum transaction size.
Current version store size for this instance: 819Mb
Maximum transaction size for this instance: 818Mb
Maximum version store size for this instance: 1023Mb
Long-running transaction:
    SessionId: 0x000002BE805E36C0
    Session-context: 0x00000000
    Session-context ThreadId: 0x0000000000000FB0
    Cleanup: 1
    Session-trace:
48869@13:16:12
 
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESE" />
    <EventID Qualifiers="0">630</EventID>
    <Level>2</Level>
    <Task>14</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2020-05-01T12:16:12.561266900Z" />
    <EventRecordID>9661071</EventRecordID>
    <Channel>Application</Channel>
    <Computer>exchangeserver.domain.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>Information Store - 1601A-6</Data>
    <Data>32408,D,0,15.01.1466.011</Data>
    <Data>1601A-6: </Data>
    <Data>0</Data>
    <Data>819</Data>
    <Data>818</Data>
    <Data>1023</Data>
    <Data>0x000002BE805E36C0</Data>
    <Data>0x00000000</Data>
    <Data>0x0000000000000FB0</Data>
    <Data>1</Data>
    <Data>48869@13:16:12
</Data>
  </EventData>
</Event>

Diagnosis

An Exchange database contains an area that is used to store running tasks known as the “Version Store”, the Version Store is can be thought of as a buffer space for transactions that are currently running on the database, a “in-flight” transaction goes into this space and is expunged when it is complete; the space within the Version Store in finite and has been enlarged by the Exchange developers over the various iterations to avoid and minimise the situations when this space can become full and therefore cause issues.

In this particular situation the Version Store has become over loaded/full by a transaction that is too big or is long running for some reason. There are two ways you can get a long-running or too large transaction in the Version Store space that causes issues these are:

  1. The server is so overloaded that transactions are being requested, but the server CPU, RAM and IO is not sufficient to process it fast enough so the Version Store becomes full of transactions or long-running transactions that can be executed fast enough. Note this is fairly rare, and  is also very obvious you’ll see a server with impacted performance across all databases not just one, you’ll also see a very high CPU, RAM usage, high IO and long disk queues.
  2. The second is more common and is caused by either database corruption or a database item (e.g. an item in a user’s mailbox) that is hanging up database operations in some way, it then causes a long-running transaction that then exceeds the database Version Store space causing database accessibility issues, and in this case generates vast numbers of transaction logs. 

We were experiencing issue 2; the database contained about 6 user mailboxes, all other users on the server were unaffected while the issue was taking place, only users on the affected mailbox were seeing access and delivery issues.

Additionally the database was about 100GB in size, but was generating 10GB+ of transaction logs an hour. Transaction logs store changes to the database for later verification, and can very roughly used to identify the rate of growth in data in the database (although they store all changes, updates and deletions as well as additions). So if you have a 100GB database and you receive 10GB of email over the course of a week, the database would end up being 110GB and the transaction logs would consist of 10GB of 1MB log files; very roughly, this ignores that people would be reading/filing email or deleting mail etc as well.

In this situation we were seeing 100GB database with 10GB+ of transaction logs being created every hour, this was not normal, the database was not growing at a comparative rate, no external database operations (e.g. user scripts) were performing operations on the database at the time that could have been moving data around. From this it was concluded that the transactions being unable to get through the version store were then having to be written down to transaction logs as they backed up due to whatever database anomaly was causing long running processes, of course this was just generating more and more transaction logs because changes to the database need to be stored somewhere if they cannot be action-ed to ensure consistency of the database.

Resolution

In this case it was the number of items within a user’s mailbox which was the apparent cause of the issue, therefore purging said emails is required to bring them within the operating envelope of the Exchange server database again. For example if the user’s mailbox contained millions of emails from wtgc-nagios@sanger.ac.uk as an example the following commands could be used.

Firstly to get an estimate on the number of items:

Search-Mailbox USERNAME -SearchQuery {from:USERNAME@domain.com} -EstimateResultOnly
 
Search-Mailbox USERNAME -SearchQuery {from:USERNAME@domain.com} -SearchDumpster:$false -EstimateResultOnly
 
Search-Mailbox USERNAME -SearchQuery {from:USERNAME@domain.com} -SearchDumpsterOnly -EstimateResultOnly

And finally then to purge the items with:

Search-Mailbox USERNAME -SearchQuery {from:USERNAME@domain.com} -DeleteContent

There were so few other articles or posts about this, but these were the few I found at the time.

Leave a Reply

Your email address will not be published.