
Performance Testing Microsoft’s new PST Import Service Vs. QUADROtech Ingestion Methodologies

17 Jun 2015 by Peter Kozak

Microsoft recently released the “Office 365 Import Service” to allow Office 365 tenants to import PST data into Exchange Online primary and archive mailboxes. The PSTs can be shipped to a Microsoft datacenter on SATA drives or uploaded over the network. In either case, the PSTs are first ingested into Azure Data Services (blob storage) and thereafter can be imported into mailboxes.

It’s important to understand that this service only covers the ingestion of PST files into Office 365 mailboxes. The eradication of PSTs requires much more than ingestion capabilities. How is the data discovered and how are PSTs collected in a centralized location in a reliable way? What about exclusive locks if the PST file is used by Outlook when the collection process runs? What about password-protected files and duplicate data? How does it handle corrupted files and/or corrupted items? Microsoft does not address these challenges as it expects the PSTs provided for processing to be ready for ingestion.

And although Microsoft suggests that the Office 365 Import Service is a way for customers to move off legacy archiving systems such as Symantec Enterprise Vault and EMC SourceOne, it’s obvious that many other challenges exist after the data is exported to PSTs from these repositories.

With some vendors in the legacy archive migration space already making claims such as “we can now move 100 TB of data within 30 days using Microsoft’s new import service”, one would expect PST imports to be very fast. Microsoft itself is more cautious about publishing performance data. On the TechNet page describing the service (https://technet.microsoft.com/en-us/library/dn948519%28v=exchg.150%29.aspx), the answer to the “How long will it take?” question adopts the diplomatic stance of “It will take the time to ship the disks, plus several hours per TB of data to copy the data. Precise data will be available soon.” Most probably Microsoft is referring here to the pure copy process from the disks to the Azure blob staging area, not to the import into Office 365 itself.

We wanted to see how the new service performs in a real-life scenario in comparison to the three ingestion providers available in QUADROtech’s ArchiveShuttle: the Advanced Ingestion Protocol (AIP), EWS Batch and EWS Single (the latter two depend on Exchange Web Services). To answer the question, we conducted the performance tests described in this document by ingesting the same set of PSTs with Microsoft’s PST Import Service and with ArchiveShuttle. All tests were performed using the same Office 365 tenant (based in the European region), uploading/ingesting from our Swiss datacenters over a symmetric gigabit internet connection.

If we look at how Microsoft moves information into user mailboxes from the PSTs imported by its new service, the first surprise is that no new revolutionary approach is being employed. Instead, Microsoft leverages the “New-MailboxImportRequest” cmdlet (which is run under the control of the Mailbox Replication Service) as the underlying method to import a PST into a mailbox. This cmdlet has been available since Exchange 2010 and is present in Exchange 2010, Exchange 2013, and Exchange Online.
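For reference, the on-premises form of the cmdlet looks like the sketch below (the UNC path and mailbox name are placeholders); in Exchange Online, the Import Service issues the equivalent request on your behalf once the PST has been staged in Azure.

```powershell
# Sketch of the underlying cmdlet in its on-premises form (placeholder path and mailbox).
# The PST must sit on a UNC share that the Exchange servers can read.
New-MailboxImportRequest -Mailbox "PSTTest1" -FilePath "\\FILESERVER\PSTs\PSTTest1.pst"

# Progress of an import request can be followed with the matching statistics cmdlet.
Get-MailboxImportRequest | Get-MailboxImportRequestStatistics
```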

Using the PST import service is a two-step process. The collected PST data, which is stored on-premises, is first transferred via disk shipping or network upload to Azure and staged in Azure Data Services. Once the data is safely held in Azure, the tenant administrator can create an import job to instruct Exchange Online to import the data into user mailboxes. A mapping file is used to map the PSTs to target mailboxes.

[Diagram: PST data transport into Office 365 via network upload or disk shipping]

In our tests, ArchiveShuttle ran in the QUADROtech cloud. One small VM (4 CPU, 6 GB RAM) acted as a bridgehead to Office 365.

[Diagram: ArchiveShuttle ingestion setup against Office 365]

Results

As shown in Table 1, the Office 365 Import Service job ran for over 6 hours, resulting in an average ingestion speed of 6.44 items per second or 3.08 GB per hour. Surprisingly, 478 items (0.33%) were not imported, and no reason was reported in the error log. It was impossible to determine which items were left out or how to re-ingest them. In comparison, the AIP ingestion finished in a little more than one hour, resulting in a throughput of 36 items per second or 17.27 GB per hour. 4 items failed during the ingestion (throttling/back-off errors), but were imported on a second attempt triggered automatically by ArchiveShuttle. The ingestion rate via Exchange Web Services was approximately the same as that of the Office 365 Import Service, resulting in a throughput of 6.26 items per second or 3.00 GB per hour, with an error rate of 12 items (0.0084%).

[Table 1: Ingestion results for the Office 365 Import Service, AIP and EWS]

[Chart: Ingestion speed to Office 365, items per second and volume per hour]

[Chart: Migration duration and error rate]

Conclusion

Providing the ability to ship PST data to Microsoft instead of ingesting it over the network is a great advantage. But getting the data transmitted to Microsoft is only half of the story. The ingestion process is slow, but the biggest issue that we encountered was the number of items that were not imported and not marked as failed. If you were to ingest 10,000,000 items and experienced the same error rate, over 33,000 items would be discarded without any indication of why.

In addition, the need to create a CSV-format mapping file to connect PSTs to target mailboxes is a rather basic mechanism and will be a painful experience if you need to tackle several hundred or even thousands of PST files. However, Microsoft might well address these issues based on initial customer feedback.

The most important point is that AIP was six times faster at processing PST data than Microsoft’s Import Service and all items were processed without problems.

Side note: ArchiveShuttle also has the ability to ship PST data via disk transfer. We have several customers using one small Azure VM (A3) to host the ArchiveShuttle Office 365 ingestion module. Data can then be transferred to Microsoft via the Azure Import/Export Service on standard 3.5” SATA disks (http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/13/announcing-microsoft-azure-import-export-service-ga.aspx). This gives the same bandwidth-saving advantage as the PST Import Service, whilst utilizing the fast and reliable ingestion approach of AIP.

Detailed Test Setup

Preparation

In an Office 365 tenant, 12 new users were created (PSTTest1 – PSTTest12) as targets for the PST Ingestion.

[Screenshot: the PSTTest1–PSTTest12 user accounts in Office 365]

12 PST files, each of them containing 11,842 items (Size per PST = 1.65GB), were prepared on disk to be ingested. The total data processed during the test was 18.9 GB made up of 142,104 items. We consider this volume to be a reasonable test that is representative of the kind of workload that customers will need to cope with.

[Screenshot: the 12 PST files prepared for ingestion]

Importing/Ingesting via Microsoft PST Import Service

The Office 365 Import Service uses a two-step process when network transfer is used to upload the PST files. First, the PSTs are uploaded to Azure Data Services. The second step is to create an import job in the Office 365 admin console.

First, “AzCopy” is used to upload the files to the Azure blob; this is done from the command line, along the lines of the sketch below.
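A sketch of the kind of command involved, assuming the AzCopy syntax current at the time; the storage account, container, SAS token and paths are placeholders supplied when the import job is prepared:

```
AzCopy.exe /Source:"D:\PSTExport" /Dest:"https://examplestorage.blob.core.windows.net/ingestiondata" /DestSAS:"<SAS token>" /V:"D:\Logs\AzCopyUpload.log" /Y
```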

[Screenshot: AzCopy upload running from the command line]

Thanks to our fast internet connection, the upload of the 19 GB completed in just 4 minutes and 4 seconds.

The second step is to create an import job in the Office 365 admin console and provide the import service with a mapping file; a sketch of such a mapping file follows.
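As an illustration only (not the exact template), a mapping file for the 12 test mailboxes could be generated like this; the column names reflect the Import Service template documented at the time and the tenant domain is a placeholder, so verify both against the current documentation before use.

```powershell
# Sketch only: build a minimal mapping file for the 12 PSTTest mailboxes.
1..12 | ForEach-Object {
    [pscustomobject]@{
        Workload         = "Exchange"
        FilePath         = ""                                   # subfolder inside the ingestion container, if any
        Name             = "PSTTest$_.pst"                      # PST file name as uploaded
        Mailbox          = "PSTTest$_@example.onmicrosoft.com"  # target mailbox (placeholder domain)
        IsArchive        = "FALSE"                              # TRUE would target the archive mailbox
        TargetRootFolder = "/"                                  # import into the mailbox root
    }
} | Export-Csv -Path .\PstImportMapping.csv -NoTypeInformation
```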

[Screenshots: creating the import job and supplying the mapping file]

As soon as the mapping file is provided, the import job appears in the Office 365 Administration Center and the ingestion starts. As you can see in the screenshot below, the job was created at 10:23:53 AM local time.

[Screenshot: the import job listed in the Office 365 Administration Center]

Microsoft’s PST Import started to process all 12 PST files in parallel after a short initial “scan” phase.

The job finished at 14:27:41 UTC (16:27:41 local time, CEST).

[Screenshot: the completed import job]

Interestingly, not all items from all PST files were imported. There were no errors or obvious skipped items, but as shown in the screenshot below, PST1 and 2 had fewer items imported than all the others.

Since an “error report” can be downloaded for each PST file, we expected a failed-items report containing the Entry IDs of each message that hadn’t been imported.

[Screenshot: imported item counts per PST]

Surprisingly, the error report had just the following message:

“6/12/2015 12:28:33 PM: The long running job has been temporarily postponed. It will be re-tried again when resources become available.”

By 6/15/2015, nothing had changed.

Unfortunately, there was no way to see exactly which items were affected: no errors were recorded and no items were marked as skipped.


ArchiveShuttle Migration

QUADROtech’s ArchiveShuttle (AS) has the ability to use multiple ingestion providers (AIP, EWS Batch and EWS Single) in a defined priority, as sketched below. By default, all ingestions are first attempted with AIP. Should AIP fail for whatever reason, AS falls back to EWS Batch, and should this fail as well, EWS Single is used.
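Purely as an illustration of that fallback chain (this is not ArchiveShuttle code; Ingest-Item is a hypothetical helper):

```powershell
# Illustrative sketch of the provider fallback order described above.
$providers = @("AIP", "EWS Batch", "EWS Single")           # default priority order
foreach ($provider in $providers) {
    try {
        Ingest-Item -Item $item -Provider $provider        # hypothetical ingestion call
        break                                              # stop at the first provider that succeeds
    } catch {
        Write-Warning "$provider failed for item $($item.Id); falling back to the next provider"
    }
}
```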

For this test, we set up AS to run AIP as the only ingestion provider for the first test run and EWS Single for the second run.

Ingestion via AIP

AIP works very differently from traditional Office 365 ingestion mechanisms. Instead of establishing a connection to Office 365 for every single item and transmitting an XML-based stream of MAPI properties, AIP creates batches of 75 items (max. 18 MB) and transmits a binary blob of those items (note: the restrictions explained in the EWS section below do NOT apply to AIP). Next, it establishes a connection to Office 365 and streams the whole binary blob at once. The batching rule is sketched below.
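As an illustration only of that batching rule (not QUADROtech’s actual code), items can be grouped until either the 75-item or the 18 MB limit is reached; the SizeInBytes property on the input objects is an assumption made for the sketch.

```powershell
# Illustrative sketch: group items into batches of at most 75 items and at most 18 MB.
$maxItems = 75
$maxBytes = 18MB
$batches  = @()
$current  = @()
$size     = 0

foreach ($item in $itemsToIngest) {
    $limitReached = ($current.Count -ge $maxItems) -or (($size + $item.SizeInBytes) -gt $maxBytes)
    if ($current.Count -gt 0 -and $limitReached) {
        $batches += ,$current          # close the current batch and start a new one
        $current  = @()
        $size     = 0
    }
    $current += $item
    $size    += $item.SizeInBytes
}
if ($current.Count -gt 0) { $batches += ,$current }   # flush the final batch
```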

This results not only in less connection overhead and less data to transfer, but also in a more effective handling of the transmitted data by Office 365 (in technical terms, the connection is handled by a Client Access Server, or CAS). Office 365 recognizes the binary stream as an Exchange Internal Storage Stream and forwards the BLOB to the target mailbox server, which then stores the items in the target mailboxes without any need to parse XML and manipulate data. Office 365 throttles workload in many different places, including inbound client traffic; less throttling is applied to data transferred via AIP because it appears as a large single transaction rather than a set of transactions. AIP also takes a multithreaded approach, allowing the use of multiple streams per mailbox and multiple mailboxes in parallel.

[Screenshot: a snippet from the ArchiveShuttle Office 365 Ingestion Module using AIP]

Ingestion via Exchange Web Services (EWS)

There are two different approaches to ingesting data via EWS. The first is single-item EWS, where one item is transferred per connection. Additionally, EWS (like AIP) allows the batching of items; however, there are certain restrictions on how the batches are built. For example, only items without attachments can be batched, all items have to be ingested into the same mailbox folder, and the items have to be below a certain size. As our test set of 12 PSTs was extracted from a journal archive, every item contains an attachment (envelope journaling stores the real message as an attachment), so EWS Batch could not be tested with this data set. Within the restrictions mentioned above, EWS Batch achieves, in our experience, approximately double the speed of the EWS Single method.

EWS requires the application to create a new item in the mailbox folder and then set the properties on that item, as sketched below. Unfortunately, not all properties can be preserved; the creation date of the item, for example, will change to the time of the migration, and this cannot be prevented. Some other properties which are read-only can also not be set via EWS. This means the ingested item is no longer the “original” item, which could leave some organizations with concerns over chain-of-custody preservation, an important aspect of proving item immutability for legal purposes.
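A minimal sketch of that create-then-set pattern, using the EWS Managed API from PowerShell; the DLL path, URL and credentials are placeholders, and this is not ArchiveShuttle’s implementation:

```powershell
# Sketch only: create a single item in a mailbox via EWS (placeholder path, URL and credentials).
Add-Type -Path "C:\Program Files\Microsoft\Exchange\Web Services\2.2\Microsoft.Exchange.WebServices.dll"

$version = [Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2013_SP1
$service = New-Object -TypeName Microsoft.Exchange.WebServices.Data.ExchangeService -ArgumentList $version
$service.Credentials = New-Object -TypeName Microsoft.Exchange.WebServices.Data.WebCredentials -ArgumentList "admin@example.onmicrosoft.com", "password"
$service.Url = "https://outlook.office365.com/EWS/Exchange.asmx"

# Create the new item and set the writable properties. Read-only properties such as the
# original creation date cannot be set, so the stored item reflects the migration time.
$message = New-Object -TypeName Microsoft.Exchange.WebServices.Data.EmailMessage -ArgumentList $service
$message.Subject = "Migrated item"
$message.Body    = "Item body"

$inbox = New-Object -TypeName Microsoft.Exchange.WebServices.Data.FolderId -ArgumentList ([Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::Inbox)
$message.Save($inbox)
```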

In this test, we ran 240 ingestion threads (20 per Mailbox) in parallel.

12 items were not ingested (1 per PST file). For whatever reason, EWS seemed unable to handle those items, even when they were re-processed. However, these items were successfully ingested when we switched to AIP.

Get in touch with us here to see how AIP can provide you with a faster, safer migration approach.
