Our Foundations (and why QUADROtech technology core trumps the rest)

May 5, 2016 by Romulo Melillo

When migrating email and document archives to the cloud, the volume of data that actually has to be transmitted from a company’s network to a cloud datacenter (that is, the data ‘on the wire’) is significantly greater than the data volume reported by the archiving system. For example, 1 TB of archive data in a repository is stored compressed and single-instanced. To migrate it to Office 365, the data is moved to the Office 365 datacenter via an API that is multi-instance and uncompressed. This creates 3 or 4 TB of network traffic on the wire.

So what causes this problem? Put simply, the underlying MAPI interface (Microsoft’s Messaging Application Programming Interface, designed in the early 1990s) allows ingested data to be associated with only one mailbox or personal archive at a time. If a given 100 KB email is sent to 5 recipients, that email needs to be ingested 6 times (5 recipients plus the sender: 600 KB). The situation is further complicated by the need either to encode the items for transport to the API, which adds at least another third, or to let MAPI wrap the low-level structures.
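The arithmetic above can be sketched in a few lines of Python. This is a rough illustration, not a measurement: the function name `wire_bytes` and its parameters are invented for this example, and the one-third encoding overhead is simply the lower bound quoted above.

```python
KB = 1_000  # decimal kilobytes, to match the round figures in the text

def wire_bytes(item_bytes: int, recipients: int, encoding_overhead: float = 1 / 3) -> int:
    """Estimate the bytes sent when one item is ingested once per mailbox.

    encoding_overhead is an assumed fraction added by encoding the item
    for transport (the text says "at least another third").
    """
    copies = recipients + 1  # one copy per recipient, plus the sender
    return round(item_bytes * copies * (1 + encoding_overhead))

# A 100 KB email sent to 5 recipients, before any encoding overhead.
print(wire_bytes(100 * KB, 5, encoding_overhead=0.0))  # 600000
# The same email once the transport encoding is included.
print(wire_bytes(100 * KB, 5))                         # 800000
```

Under these assumptions a single 100 KB item turns into roughly 800 KB on the wire, before taking into account that the archive held it compressed in the first place.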

Incredibly, migration providers still use old-style MAPI technology to move data around. MAPI-based implementations limit other tools’ ingest capabilities to something like 5 TB/day when information is moved into Office 365 (with caveats around ‘per server’ or ‘per task’). The fact is that MAPI was never designed for migration. It was designed to enable clients (like Microsoft Outlook) to interact with Exchange. An interface intended for client use is optimized for the user experience: it will happily open messages that are corrupt or missing hidden properties, silently removing the damaged properties without the application even noticing. More on that in my next blog. Microsoft have long since rewritten Exchange’s internals to stream data in and out of mailboxes using alternative methods and protocols rather than MAPI (which is still heavily used by Outlook).

Whilst performance testing two years ago, we hit a wall: MAPI couldn’t be pushed to deliver the kind of transmission speeds we wanted. We couldn’t optimize the protocol, and we had to pull crazy tricks like using hundreds of concurrent processes to get any sort of reasonable performance. So, we took inspiration from a famous computer scientist: Alan Kay. During a 1982 conference, he said “People who are really serious about software should make their own hardware.” The same principle holds true here: people who are really serious about migration should design their own foundation libraries. So we threw away MAPI and started from the ground up.

Microsoft provide a great set of open specifications covering all of the protocols and formats used with Office 365, so whilst we were entering uncharted waters by doing this ourselves, we had reference material to make sure everything we were doing was fully supported. The intelligence in writing the foundation libraries isn’t just in interpreting the specifications and producing efficient logic to implement them; it’s in ensuring performance and a sensible overall architecture. With 900+ items/sec being processed by this library in a single process, smart memory management, threading, and logic are essential. Just imagine an inefficient implementation loading 1000 items into memory per second at 200 KB each: that is 200 MB of payload every second, and copying each item around ten times along the way turns it into 2 GB of memory movement per second, which is disastrous for performance.
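To put numbers on that last point, here is a back-of-the-envelope sketch. Note that 1000 items per second at 200 KB each is 200 MB of payload per second; reaching 2 GB per second needs each item to be copied about ten times on its way through the pipeline. The `copies_per_item` parameter is an assumption made for this illustration, not a figure from our library.

```python
KB = 1_000
MB = 1_000_000

def memory_moved_per_second(items_per_sec: int, item_bytes: int, copies_per_item: int = 1) -> int:
    """Bytes moved through memory each second.

    copies_per_item is an assumed count of how many times the pipeline
    copies an item between buffers (parse, convert, encode, send, ...).
    """
    return items_per_sec * item_bytes * copies_per_item

# 1000 items/sec at 200 KB each, each item touched once.
print(memory_moved_per_second(1000, 200 * KB) // MB, "MB/s")      # 200 MB/s
# The same load when every item is copied ten times along the way.
print(memory_moved_per_second(1000, 200 * KB, 10) // MB, "MB/s")  # 2000 MB/s
```

The payload is fixed by the workload; the multiplier is the part an implementation controls, which is why avoiding unnecessary copies matters so much at this item rate.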

So what does having our own foundation library allow?

  • Significantly increased performance: we ignore MAPI and communicate directly with Exchange/Office 365. No abstraction layers, no black box foundation layers. That gives us significant performance benefits (over 3x MAPI performance)!
  • Ease of Integration: Our foundation library implements the MSG, EML, PST and AIP formats and all the dependencies around them, including all conversions between them. Pulling a PST and converting it to AIP format for ingestion into Office 365 takes us 3 lines of code: great for testing!
  • Massively reduced bandwidth: MAPI attempts to compress remote operations (ROPs) on the wire when sending data to Office 365. The problem is that its implementation was designed back when messages were small. So… we rewrote it. We now beat MAPI’s own on-the-wire compression by a further 40%.
  • No ‘workarounds’: We don’t need hacks or workarounds to get MAPI working for us at high performance levels: no crazy number of processes, no “multiple tasks or servers”. We are not stuck with processes falling over due to MAPI memory corruption, either, which is a notorious problem for those who try to ‘wrap MAPI’.
  • Corruption Detection: MAPI is notoriously good at presenting MSG files to users, to the point where if a property is partly or wholly missing, it’ll show the message anyway. The same goes for when MAPI opens a message: it makes a best-effort attempt and does not report the corrupt properties; it simply removes them… so how would your migration provider ever know?! More on this in a future blog.
  • Full support for .NET Core: with no requirement for MAPI, the Win32 API or anything else that ties them to the Windows operating system, our foundation libraries are fully cross-platform. More on that in my next blog too.
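To make the corruption detection point concrete, here is a minimal sketch of the idea: report missing or empty properties instead of silently dropping them. Everything in it is hypothetical and for illustration only. The property names, the `REQUIRED_PROPERTIES` list and the `check_item` function are not taken from our library or from any Microsoft specification.

```python
from dataclasses import dataclass, field

# Hypothetical list of properties a migration tool might insist on.
REQUIRED_PROPERTIES = ("message_class", "subject", "body", "sender")

@dataclass
class IntegrityReport:
    missing: list = field(default_factory=list)  # property absent entirely
    empty: list = field(default_factory=list)    # property present but has no value

    @property
    def ok(self) -> bool:
        return not self.missing and not self.empty

def check_item(properties: dict) -> IntegrityReport:
    """Record every problem found rather than quietly repairing the item."""
    report = IntegrityReport()
    for name in REQUIRED_PROPERTIES:
        if name not in properties:
            report.missing.append(name)
        elif properties[name] in (None, "", b""):
            report.empty.append(name)
    return report

item = {"message_class": "IPM.Note", "subject": "", "body": "Hello"}
report = check_item(item)
print(report.ok, report.missing, report.empty)  # False ['sender'] ['subject']
```

A client-oriented interface would display this item without complaint; a migration tool wants the report so the item can be flagged before it is moved.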

Our advice to other migration providers? If you want significant performance, true item integrity, and to really start reducing customer bandwidth, it’s time to build your own foundations too. Without them, you simply can’t guarantee a high-performing, efficient, safe migration.