Who moved my millions of morsels of cheese?

7 Aug 2018 by Jason Jacobo

Part 3: A Deeper Impact

Familiar with the cliché “a stitch in time saves nine”? It means that putting off a known problem only creates more work when you eventually decide to address it. In earlier parts of this series, we introduced an issue that the makers of messaging solutions saw and ignored. As the cliché goes, this procrastination substantially compounded their problem (explosive email usage vs. the cost to maintain it), and it also affected every other system used to mitigate the symptoms of that problem (third-party email archives, PST file usage). With every workaround that ignored the root cause, the problem of email archiving was further compounded and, as a knock-on effect, migrating this data became more complicated too.

Feeling lost? Need a refresher course in the history of email archiving? Read Part 1 and Part 2 of this series.

In part 3 of this blog series, we will talk about the impact that archiving systems had on the messaging ecosystem, and the effects this had on migration solutions used to move the data stored in these systems to a new target. We will explore why this is a problem, and the features or techniques these migration solutions have employed to try to address those issues.

Typically, organizations chose centralized archiving solutions for the following reasons (or a combination of both):

In many cases, archiving solutions ran like this for decades. During this time, messaging solutions developed to better accommodate the demands placed on them. As a result, their makers began to encourage organizations to ‘bring their data home’, back into Exchange or Exchange Online. The native methods for returning this data to its original birthplace lacked accountability, error tracking, and ease of operation. Commonly, customers would try to migrate with the tools available to them and get only 3% of the way through in the time allocated for the entire project, with no ability to report on progress. Exhausted, messaging teams began to seek a solution that better understood their needs. It was this industry-wide frustration that gave birth to the market for modern email archive migration solutions.

Migration solutions were not immune to the issues native tools experienced. When moving message data, it is important to retain data integrity and to log all key details from source to target. Certain critical details must be tracked for every item in the source: where the message came from, where it ended up, evidence that the source and target messages are the same, and when the move took place. This record, commonly referred to as the ‘chain of custody’, tracked every migrated message in a database used by the migration solution. As more and more data was migrated, the number of rows in the solution’s database tables grew large enough to degrade performance, and sometimes reporting, for the largest customers.
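To make the idea concrete, here is a minimal sketch of what one chain-of-custody row might hold. It is an illustration only, not how any particular migration product works: the `custody` table, its column names, the `archive://` and `exchange://` paths, and the use of SHA-256 as the evidence of sameness are all assumptions made for this example.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def fingerprint(message_bytes: bytes) -> str:
    """Return a SHA-256 hex digest used as evidence that two copies match."""
    return hashlib.sha256(message_bytes).hexdigest()


def record_custody(conn, source_path, target_path, source_bytes, target_bytes):
    """Insert one chain-of-custody row and report whether the copies match."""
    source_hash = fingerprint(source_bytes)
    target_hash = fingerprint(target_bytes)
    conn.execute(
        "INSERT INTO custody VALUES (?, ?, ?, ?, ?)",
        (source_path, target_path, source_hash, target_hash,
         datetime.now(timezone.utc).isoformat()),
    )
    return source_hash == target_hash


# Hypothetical schema: where from, where to, evidence of sameness, and when.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE custody ("
    "source_path TEXT, target_path TEXT, "
    "source_hash TEXT, target_hash TEXT, migrated_at TEXT)"
)

body = b"Subject: Q3 forecast\r\n\r\nNumbers attached."
verified = record_custody(
    conn,
    "archive://vault1/item/42",          # made-up source location
    "exchange://mailbox/jdoe/item/42",   # made-up target location
    body,
    body,
)
row_count = conn.execute("SELECT COUNT(*) FROM custody").fetchone()[0]
```

One row per message is what makes the record trustworthy, and it is also why the table grows in lockstep with the number of items migrated.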

The approaches to mitigating this problem were historically similar too: most solutions just stood up additional instances, and some solutions (like  ) approached the problem with a multi-database design to try to ‘containerize’ the data and make it manageable. Neither approach proved ideal. Small and mid-sized migrations experienced no issues and proceeded seamlessly. In the largest migrations, where the data and chain of custody are frequently most critical, the tables used to track each message would begin to degrade performance, not just in reporting against this data, but in several areas of the product’s functionality.
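As a rough illustration of the multi-database idea, the sketch below routes each mailbox's tracking rows to one of several databases using a stable hash. Everything here is assumed for the example: the number of databases, the `items` table, the `database_for` and `track_item` helpers, and hashing on the mailbox name. Real products may partition on entirely different criteria.

```python
import hashlib
import sqlite3

NUM_DATABASES = 4  # hypothetical number of tracking databases


def database_for(mailbox: str, num_databases: int = NUM_DATABASES) -> int:
    """Map a mailbox to a database index with a stable (non-random) hash."""
    digest = hashlib.sha256(mailbox.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_databases


# One in-memory database per "container", each with its own tracking table.
databases = [sqlite3.connect(":memory:") for _ in range(NUM_DATABASES)]
for db in databases:
    db.execute("CREATE TABLE items (mailbox TEXT, item_id INTEGER)")


def track_item(mailbox: str, item_id: int) -> None:
    """Write the tracking row to whichever database owns this mailbox."""
    db = databases[database_for(mailbox)]
    db.execute("INSERT INTO items VALUES (?, ?)", (mailbox, item_id))


# Spread 1,000 items from 50 made-up mailboxes across the databases.
for item_id in range(1000):
    track_item(f"user{item_id % 50}@example.com", item_id)

rows_per_database = [
    db.execute("SELECT COUNT(*) FROM items").fetchone()[0] for db in databases
]
```

Each table ends up smaller than a single combined table would be, but the total still grows with the migration, so every database eventually faces the same table-size pressure.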

Indices were frequently added to tables to try to cope with the continued growth, but unlike most applications built on relational databases, migration solutions spend much of their time writing data into the database rather than just reading it. Additional indices frequently increased database load times, additional instances increased management effort, and segmented data eventually ran into the same issue as its source: demand produced more data than the solutions were designed to accommodate. It became clear that, given the sheer number of items being migrated, the weakest dependency was the size of the tables where those items are tracked and operated against.
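The write-side cost of indices is easy to observe in a small experiment. The sketch below uses SQLite purely because it ships with Python; the table layout, column names, and row count are assumptions for the example. It loads the same rows into a table with no indices and into one with four. Timings vary by machine, so treat the numbers as indicative only.

```python
import sqlite3
import time

COLUMNS = ["source_id", "target_id", "checksum", "migrated_at"]


def load_rows(index_count: int, row_count: int = 50_000) -> float:
    """Bulk-insert rows into a table with `index_count` indices; return seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items "
        "(source_id TEXT, target_id TEXT, checksum TEXT, migrated_at TEXT)"
    )
    # Every index created here must be updated on every insert below.
    for column in COLUMNS[:index_count]:
        conn.execute(f"CREATE INDEX idx_{column} ON items ({column})")

    rows = [
        (f"s{i}", f"t{i}", f"c{i}", f"2018-08-07T00:00:{i % 60:02d}")
        for i in range(row_count)
    ]
    start = time.perf_counter()
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


no_index_seconds = load_rows(index_count=0)
four_index_seconds = load_rows(index_count=4)
print(f"no indices:   {no_index_seconds:.3f}s")
print(f"four indices: {four_index_seconds:.3f}s")
```

The indexed load is usually the slower of the two, because each insert also has to update every index. For a workload dominated by writes, that cost is paid on every tracked item.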

For years, the migration industry just dealt with these challenges and the consequent performance degradation. In the next blog in this series, we will show what Quadrotech is doing about this problem and how it results in an improved experience and accountability for all enterprise migrations.