Email migration to Office 365: Part 4 – Managing data volume

Feb 16, 2017 by Emma Robinson

This post is part of a series looking at the key considerations when migrating your email to Office 365.

We started the series by exploring the reasons why migrating to Office 365 could be the right decision for your organisation. This was followed by a post detailing the common mistakes made during an Office 365 migration, and subsequently a post addressing a range of technical considerations you should be aware of before starting the migration process.

Now it’s time to take a long hard look at your data, especially the volume which needs to be moved across into your new system. This post will explore the potential issues surrounding the management of your data volume.

Data volume and Office 365

We fully appreciate that the process of migrating data is one filled with questions, and some of the most important ones concern the amount of data to be migrated. As tempting as it is to migrate everything, this isn’t typically advisable. While storage in Office 365 might be far more affordable and flexible than on-premises, moving all your data could become an obstacle to productivity and mobility. Office 365 should enable you to streamline operations, reduce infrastructure and storage costs, and make it easier for your employees to be productive. If you still need to sift through large amounts of old, irrelevant, and perhaps even corrupt data, then you’re less likely to reap the benefits that your new environment offers.

It is important to minimise the data volume that gets transferred over the WAN into Office 365; otherwise, you may encounter network issues, IT performance problems, and a negative impact on business continuity. There are a number of different ways to ease this pressure and reduce the impact on your network.

Microsoft offers two options for getting data ready for an Office 365 import:

  • Network upload – this involves uploading data files over the network to a temporary storage location in the Microsoft cloud, with the Office 365 import service then utilised to bring the data into your Office 365 system.
  • Drive shipping – the process of copying data files to a BitLocker-encrypted hard drive and then physically shipping the drive to Microsoft. Microsoft data centre staff then upload the data to a temporary storage location in the Microsoft cloud for subsequent use with the import service.

In email and archive migration projects, the data payloads being transferred can be very significant. Transferring large email ecosystems and data archives into Office 365 over the network can take several months for large enterprises, so it’s vital you use the most time- and cost-effective method possible.
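To get a feel for why large transfers can run into weeks or months, a back-of-the-envelope calculation helps. The sketch below is only a rough planning aid: the function name and the `usable_fraction` parameter (the share of the link you can realistically dedicate to the migration) are our own assumptions, and it ignores throttling, retries and protocol overhead.

```python
def estimate_upload_days(data_tb: float, link_mbps: float,
                         usable_fraction: float = 0.5) -> float:
    """Rough number of days needed to push data_tb terabytes over a WAN link.

    data_tb         -- data volume in decimal terabytes (1 TB = 10**12 bytes)
    link_mbps       -- nominal link speed in megabits per second
    usable_fraction -- assumed share of the link available to the migration
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("expected data_tb >= 0, link_mbps > 0, 0 < usable_fraction <= 1")

    total_bits = data_tb * 1e12 * 8                    # terabytes -> bits
    effective_bps = link_mbps * 1e6 * usable_fraction  # bits per second actually usable
    seconds = total_bits / effective_bps
    return seconds / 86_400                            # seconds -> days


# Example: 10 TB over a 100 Mbps link, half of which is available to the migration
print(f"{estimate_upload_days(10, 100, 0.5):.1f} days")  # about 18.5 days
```

Halving the data volume halves the estimate, which is why reducing what you migrate is usually the first lever to pull.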

Of the two options highlighted, drive shipping is the more expensive (at $2k per 1TB), but in certain cases, it may be the only viable solution. Where third-party, purpose-built software is used to optimise the payload and ingestion process, however, the network upload option remains preferable.

Besides having a negative impact on your own network, large data migrations are also affected by throttling restrictions. Throttling is the intentional slowing of a service or system by its provider, and it is typically seen in the context of bandwidth throttling (the slowing down of Internet service by an Internet Service Provider).

Importantly – and interestingly – the Exchange Online service “includes bandwidth throttling to help manage server access. The throttling components of Exchange Online are especially important, given that network resources in the data centers are optimized for the broad set of customers that use the service.”

Office 365 deliberately slows migration uploads by employing:

  • User throttling – affecting migration from non-Microsoft platforms such as IBM Lotus Domino and Novell GroupWise
  • Resource-based throttling – to manage incidents affecting critical services
  • Migration-service throttling – in our case, this is the most relevant, as migration-service throttling may, for example, restrict the number of mailboxes that can be migrated simultaneously during simple Exchange migrations (by default, a maximum of three mailboxes can be migrated at any one time)
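To see how a concurrency limit stretches a project, it can help to picture the mailboxes being worked through in groups. The sketch below is a planning illustration only: `migration_waves` is a hypothetical helper of our own, not part of any Microsoft tool, and the real migration service schedules its own work rather than waiting for a whole group to finish.

```python
from typing import Iterator, List, Sequence


def migration_waves(mailboxes: Sequence[str],
                    max_concurrent: int = 3) -> Iterator[List[str]]:
    """Yield successive groups of mailboxes, at most max_concurrent per group.

    The default of 3 mirrors the limit quoted above for simple Exchange
    migrations; treat it as an assumption to check against your own tenant.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    for start in range(0, len(mailboxes), max_concurrent):
        yield list(mailboxes[start:start + max_concurrent])


# Example with made-up mailbox names
for number, wave in enumerate(migration_waves(["anna", "ben", "cara", "dev", "eli"]), 1):
    print(number, wave)
```

With a limit of three, 300 mailboxes means at least 100 such groups, so the size of each mailbox matters as much as the number of mailboxes.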

You can find a more technical description of throttling and its limits here, but it’s important to remember that the speed of upload is not just affected by throttling – intermittent and unreliable connectivity can be a problem (particularly from overseas locations), causing issues where uploads are constantly being halted and resumed.
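One common way to cope with connections that drop mid-transfer is to send data in chunks and retry only the chunk that failed, waiting a little longer after each failure. The sketch below illustrates that general idea under our own assumptions: `send_chunk` is a hypothetical callback standing in for whatever upload tool you use, and nothing here reflects how the Office 365 import service itself behaves.

```python
import time
from typing import Callable


def upload_with_resume(data: bytes,
                       send_chunk: Callable[[int, bytes], None],
                       chunk_size: int = 4 * 1024 * 1024,
                       max_retries: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Send data chunk by chunk, resuming from the last confirmed offset.

    send_chunk(offset, chunk) is a hypothetical uploader that raises
    ConnectionError when the link drops. Returns the total number of retries.
    """
    offset = 0
    total_retries = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        attempt = 0
        while True:
            try:
                send_chunk(offset, chunk)
                break                                  # chunk confirmed, move on
            except ConnectionError:
                attempt += 1
                total_retries += 1
                if attempt > max_retries:
                    raise                              # give up on this chunk
                sleep(base_delay * 2 ** (attempt - 1)) # exponential backoff
        offset += len(chunk)                           # only advance after success
    return total_retries
```

Because the offset only advances after a chunk succeeds, an interruption costs one chunk of re-sent data rather than the whole file.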

Throughout the whole migration process, time is key: the longer the data upload takes, the more opportunities there are for something to go wrong unexpectedly and interrupt the process. For example, there could be crashes during import, or unforeseen bottlenecks constricting the flow of data. These could delay the project considerably and consequently put business continuity at risk, especially if you’re aiming for a ‘cutover’ migration (one where you plan to ‘switch on’ Office 365 at the same time as turning off the old environment).

The faster the migration can be completed, the fewer the risks. This brings us back perfectly to the need to reduce data volume before we even begin the actual migration process.

How do I minimise the data volume I need to migrate to Office 365?

As part of the preparation and discovery process, you’re likely to be able to identify data that simply does not need to go into Office 365. This data most often ranges from duplicate items and items so old they have no practical use, right through to corrupt and therefore unusable files.

For this reason, a data clean-up process should be at the start of every migration. It is a crucial activity, enabling you to streamline your new environment and optimise storage going forward.
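As a simple illustration of what a clean-up pass might look like, the sketch below drops items older than a cut-off date and skips exact duplicates by comparing content hashes. The `Item` structure and the `select_for_migration` function are hypothetical names of our own; a real clean-up would work against your mail or archive platform, would need a way to detect corrupt items, and must respect your legal and compliance retention rules.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical stand-in for an email or archive item."""
    item_id: str
    received: datetime
    body: bytes


def select_for_migration(items: Iterable[Item], cutoff: datetime) -> List[Item]:
    """Keep items received on or after cutoff, skipping exact duplicate bodies."""
    seen_digests = set()
    keep: List[Item] = []
    for item in items:
        if item.received < cutoff:
            continue                          # too old to be worth migrating
        digest = hashlib.sha256(item.body).hexdigest()
        if digest in seen_digests:
            continue                          # identical content already kept
        seen_digests.add(digest)
        keep.append(item)
    return keep
```

Comparing the size of the input with the size of the result gives you an early indication of how much transfer volume the clean-up could save.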

Office 365 is a practical solution for consolidating compliance and making management easier, especially as the service already has a range of robust features and tools to help you centralise your IT needs. The platform provides large mailbox quotas (100 GB by default), as well as archive storage, plus extensive eDiscovery and legal hold tools, all of which can replace the need for on-premises archives.

The cost and complications of maintaining archive vendor solutions alongside Office 365 are among the reasons for migrating everything into one compliant, manageable destination. That said, many organisations choose to implement a ‘hybrid’ environment, whereby part of the email ecosystem (such as the archive) remains on-premises. This configuration is better suited to certain organisations’ needs or requirements (some have issues configuring line-of-business (LOB) systems, others are restricted because certain types of data cannot be stored in the cloud). It can also help minimise end user disruption during a migration, and take some of the pressure or urgency out of the transition for administrators.

Cleaning up your data, where possible, before migrating it over the WAN to Office 365 will reduce the overall transfer data volume, not to mention the additional efficiencies achieved by consolidating and rationalising legacy systems. Ultimately though, the decisions affecting how much is retained and how much is archived (and where) depend on your individual circumstances as an organisation, and on whether you are able to streamline the data you bring into this new system.

You’ve cleaned up your data, now what?

Having minimised the source data volume as much as possible, we can then consider the actual payload to be transferred. The term ‘payload’ refers to the actual data within a file (discarding the headers and similar components, used for transporting). There are various solutions available for speeding up ingestion (the importing and processing of data) into Office 365, and for reducing the actual payloads being transferred from each location. The type of solution really depends on your needs, so make sure you explore your options if you’re concerned about bandwidth or network disruptions.