
Archive Migration: Why Extraction Speed Is Only Half the Story

Oct 10, 2014 by Orlaith Palmer


Currently, many of the marketing messages in the legacy archive migration space focus heavily on extraction speed (e.g. “We can extract 5TB a day from Enterprise Vault” or “We can extract (index) 330TB of compressed data in 45 days”). Fast extraction of legacy data is undoubtedly important, and provided you can justify the forensic and compliance risks of the chosen approach, it doesn’t matter whether the vendor reads the data through direct storage access (relying on reverse-engineered proprietary formats) or through the API.

But extraction speed is not the only factor in measuring migration speed. For me, migration means the item is extracted from the source and also ingested into the target. With almost 10 years of legacy archive migrations behind me, my experience is that the bottleneck in almost every legacy migration is the target system, which has to process the data as well as index and store it. Ask yourself: what is the benefit of extracting data faster than you can ingest it into the target system?
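To put rough numbers on that question, here is a minimal sketch. It assumes a simple two-stage model in which end-to-end speed is capped by the slower stage. The 5 TB/day extraction rate echoes the marketing claim quoted above; the 1 TB/day ingestion rate and the 50 TB archive size are made-up figures for illustration only.

```python
def effective_migration_rate(extract_tb_per_day: float, ingest_tb_per_day: float) -> float:
    """End-to-end rate is capped by the slower of the two stages."""
    return min(extract_tb_per_day, ingest_tb_per_day)


def days_to_migrate(total_tb: float, extract_tb_per_day: float, ingest_tb_per_day: float) -> float:
    """Days until the last item has been ingested into the target."""
    return total_tb / effective_migration_rate(extract_tb_per_day, ingest_tb_per_day)


# Hypothetical figures: 50 TB archive, target ingests 1 TB/day.
print(days_to_migrate(50, 5, 1))   # 50.0
print(days_to_migrate(50, 10, 1))  # 50.0 (doubling extraction speed changes nothing)
```

In this model, only a faster target shortens the project; any extraction capacity beyond the ingestion rate sits idle.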

When it comes to judging the overall speed of a migration, the different approaches advocated by vendors in this space must be carefully assessed. Some need to build a normalized model of the data, which requires gathering or extracting all of the data before the model can be built. This not only means provisioning a lot of temporary disk space, it also means you can’t start ingesting the first item until all data has been extracted and normalized. Other approaches can start the migration as soon as the migration tool is aware of the first item that needs to be migrated.
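The difference between the two approaches can be sketched with the same kind of back-of-the-envelope model. Everything here is an assumption for illustration: the names `staged` and `streaming`, the rates in TB/day, and the simplification that normalization and project setup take no time. It is not a description of any particular vendor's tool.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    days_to_first_item: float  # earliest day ingestion into the target can start
    total_days: float          # day the last item has been ingested
    staging_tb: float          # temporary disk needed between the two stages


def staged(total_tb: float, extract_rate: float, ingest_rate: float) -> Estimate:
    """Extract (and normalize) everything first, then ingest."""
    extract_days = total_tb / extract_rate
    ingest_days = total_tb / ingest_rate
    return Estimate(extract_days, extract_days + ingest_days, total_tb)


def streaming(total_tb: float, extract_rate: float, ingest_rate: float) -> Estimate:
    """Ingest each item as soon as it has been extracted.

    Assumes extraction is throttled to the ingestion rate, so no large
    staging area builds up.
    """
    return Estimate(0.0, total_tb / min(extract_rate, ingest_rate), 0.0)


# Hypothetical figures: 50 TB archive, 5 TB/day extraction, 1 TB/day ingestion.
print(staged(50, 5, 1))
print(streaming(50, 5, 1))
```

With these made-up figures, the staged approach cannot ingest anything before day 10, finishes on day 60 and needs 50 TB of staging space, while the streaming approach starts immediately and finishes on day 50.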


If I were a customer about to migrate my legacy archive, I would ask providers the following questions:


What is the time from Project Kickoff to the first item migrated? (“Time to first item”)

(“Migrated” means, of course, that the item has not only been extracted but also ingested into the target system.)


What is the vendor’s estimate as to when the whole migration will be completed?

That is, from kickoff to de-provisioning the migration tool once all data has been migrated to the target.


What is the project planning time, especially if you run a selective migration based on content and intend to leave data behind?

Who sets up those rules and filters, and how can you confirm that everything required has reached the target? Consider that you might want to cross-check with your legal department!


What implementation time and resources does the solution require?

How much hardware is needed to implement the migration solution and how complex is it to integrate? Are there any deployment options other than deploying hardware or VMs?


Is a Proof of Concept (POC) in your live production systems easily doable?

How much time, money and effort is involved in the migration of 25 test archives in production?


How does the vendor ingest data into the target system?

(e.g. scripted PST Import vs. usage of API)

What are the average ingestion speeds and how reliable is the approach?


How much day-to-day operation does the migration require, and how are errors handled?

Are extraction/ingestion retries possible? Manually? Automatically? For the failed items only, or just for a whole archive?



How transparent is the solution to the end user?

Do users always have access to their data, even during the migration? Are “filtered” items available to users for a grace period?
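On the retry question in the list above, the distinction between retrying failed items and retrying a whole archive can be made concrete with a short sketch. This is only an illustration of automatic, item-level retries; `migrate_archive`, `migrate_item` and `max_attempts` are hypothetical names and do not correspond to any real migration product's API.

```python
from typing import Callable, Iterable, List


def migrate_archive(
    item_ids: Iterable[str],
    migrate_item: Callable[[str], None],  # hypothetical: extracts AND ingests one item
    max_attempts: int = 3,
) -> List[str]:
    """Migrate items one by one, retrying only the items that fail.

    Returns the IDs that still failed after max_attempts, so they can be
    reported or retried manually without re-running the whole archive.
    """
    still_failed: List[str] = []
    for item_id in item_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                migrate_item(item_id)
                break  # item migrated, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    still_failed.append(item_id)
    return still_failed
```

A tool that can only retry at archive level would have to repeat every item in the archive after a single failure, which matters a great deal when the target system is already the bottleneck.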