The PST FlightDeck Process – Our Approach to PST Discovery
Step 1 – Discovering all the PSTs
The difficulties surrounding the proliferation of PST files in the modern enterprise were a dominant topic of conversation at Microsoft Ignite 2015. As PST files spread across your environment, they can cause a massive amount of duplicated information to be stored on your hardware, and they threaten user email access if they become corrupted. Today, migrating PST files to Office 365, or even eliminating them entirely, is a wise choice for anyone looking to preserve user access and gain greater control over the day-to-day management of email data.
First, you need a method to locate all the PSTs. You can take an agent-less approach from a central server, but this requires remotely connecting to every machine in the domain (or potentially the whole forest) to look for PSTs. Click here to learn more about the pros and cons of agent vs agent-less approaches to PST Discovery.
Once an agent, such as the PST FlightDeck Agent, has been deployed to company computers and laptops using any software distribution system (e.g. SCCM, Altiris, etc.), it scans the configured paths for PST files and any associated metadata. It also scans the user’s Outlook profiles to determine which PST files are currently attached to Outlook and which are not. The information about the discovered PST files (Owner, PST Name, PST Path, Creation Date, Last Modified Date, Size, Attached or not, etc.) is transmitted via an HTTP/HTTPS web service back to the PST FlightDeck Server, where it can be used to plan the migration.
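To make the scanning step concrete, here is a minimal Python sketch of what walking the configured paths and collecting per-file metadata could look like. It is an illustration only, not the PST FlightDeck Agent's actual implementation: the function name `discover_psts` and the record fields are assumptions, and owner detection, creation dates, and the Outlook profile check are left out.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def discover_psts(roots):
    """Walk each root directory and return one metadata record per PST file."""
    records = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Match the extension case-insensitively (Archive.PST, mail.pst, ...).
                if not name.lower().endswith(".pst"):
                    continue
                path = Path(dirpath) / name
                try:
                    info = path.stat()
                except OSError:
                    # File vanished or is unreadable; skip it rather than abort the scan.
                    continue
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                records.append({
                    "name": name,
                    "path": str(path),
                    "size_bytes": info.st_size,
                    "modified_utc": modified.isoformat(),
                })
    return records
```

In a real deployment the resulting records would be posted to the server's web service; the list returned here simply stands in for that payload.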
It’s important to understand that the PST FlightDeck Agent runs under the security context of the user, so no administrative account with access to fileservers and user workstations is required. Typically, local drives and the user’s home share are scanned with the user agent, while group shares are scanned from a central location using the fileserver scanner. This approach ensures maximum security while simplifying the owner detection process.
The Discovery Phase is independent of the migration phases that follow and is typically done upfront to gather the data needed for detailed planning of the migration (e.g. how many PST files there are, where they are, and their size, age, and distribution).
Step 2 – Centralizing the Files
Once you’ve identified the files and gathered information on them, such as PST location and size, you can centralize them to stage them for processing and ingestion into the new target. Once a user is enabled for migration, the FlightDeck Agent deploys several registry keys that change how PST files behave for that user. The user is still able to attach and read PST files, but can’t add new data to any PST file. The Agent then creates a snapshot of the attached PST files and queues them for upload to the PST FlightDeck Server. Non-attached PST files don’t require a snapshot, as Outlook holds no exclusive lock on them. Centralization uses QUADROtech’s ACT (Advanced Centralization Technology), which relies on Microsoft’s BITS (Background Intelligent Transfer Service) in the background for the underlying transfer.
This allows interrupted transfers (for example, when a user’s laptop is undocked from the company network) to resume from the last successfully transmitted byte as soon as a new connection is available. Furthermore, bandwidth can be controlled through the BITS GPOs in Active Directory. It’s important to note that BITS doesn’t require any additional firewall ports to be open beyond HTTP/HTTPS. FlightDeck can also limit the number of files and the total size of uploads to help manage bandwidth constraints.
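As a rough illustration of capping uploads by file count and total size, the Python sketch below picks the next batch from a queue under both limits. The function `select_upload_batch`, its parameters, and the skip-and-continue policy are assumptions made for this example; they do not describe FlightDeck's actual scheduler.

```python
def select_upload_batch(queue, max_files, max_total_bytes):
    """Pick queued (name, size) entries in order while staying under both limits."""
    batch, total = [], 0
    for name, size in queue:
        if len(batch) >= max_files:
            break  # file-count limit reached
        if total + size > max_total_bytes:
            # Too large for the remaining byte budget; a smaller file
            # later in the queue may still fit, so keep looking.
            continue
        batch.append(name)
        total += size
    return batch, total
```

Files that are skipped stay in the queue and are picked up by a later batch once budget is available.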
If bandwidth is an issue, a local PST FlightDeck web service can be installed at the remote location on a small VM, or even a laptop, to collect the PST files over the LAN. The collected files can then be shipped on physical media (e.g. an encrypted USB drive) to the central datacenter.
Step 3 – Processing PST Files
Once the PST files arrive at the central datacenter, they are prepared for ingestion into the target system. As PST FlightDeck is workflow based, the process can be modified as required. A typical workflow handles the following steps, each with its corresponding FlightDeck module.
Password Removal Module
In our experience, 5-8% of PSTs are typically password protected. The Password Removal Module checks each received PST file to see if it is password protected and, if so, strips the password from the PST. In effect, the module tells the PST file that it is no longer password protected.
Most customers like to record a “last reliable” backup of the PST files in the central location before further processing or ingesting the data. These backups are typically kept for a limited amount of time after ingestion to the target system. The backup is written to a configurable Windows share, and it is assumed that the company will use its established backup methodology to back up this share. After a successful backup, the PST files will be deleted from the backup share.
Pre-flight Checking Module
PST files may either be corrupted or contain corrupted items, which can prevent a successful ingestion. The Pre-flight Check determines whether a PST file is corrupted and, if so, tries to repair it. If a PST file can be opened successfully, it is checked to see whether it is in the older ANSI format (used before Outlook 2003). If so, it is converted to a Unicode PST file. The next step of the process is to check each item within the PST file for corruption.
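For readers curious how an ANSI PST can be told apart from a Unicode one, the Python sketch below reads the version field from the file header. It assumes the layout described in Microsoft's published [MS-PST] file format specification (a `!BDN` magic at offset 0 and a 16-bit `wVer` at offset 10, with 14 or 15 meaning ANSI and 23 or higher meaning Unicode); please verify those values against the specification before relying on them. The function name `pst_format` is ours, and this is not how the Pre-flight Checking Module is necessarily implemented.

```python
import struct

PST_MAGIC = b"!BDN"  # dwMagic at offset 0, per our reading of [MS-PST]


def pst_format(header: bytes) -> str:
    """Classify a PST by the wVer header field: 'ansi', 'unicode', or 'unknown'."""
    if len(header) < 12 or header[:4] != PST_MAGIC:
        return "unknown"
    # wVer is a little-endian unsigned 16-bit integer at byte offset 10.
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if w_ver in (14, 15):
        return "ansi"
    if w_ver >= 23:
        return "unicode"
    return "unknown"
```

To check a file on disk, read its first 12 bytes in binary mode and pass them to `pst_format`.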
Content can be filtered in this module based on the available metadata. For example, if the legal retention policy of a customer is 7 years, data older than 7 years can be filtered out to reduce the amount of data required for ingestion. Furthermore, different message types such as contacts or tasks (based on the Exchange message class) can be filtered out if they are unwanted in the target environment.
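The filtering rules described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: items are represented as dictionaries with hypothetical `date` and `message_class` keys, and `keep_item` is not a FlightDeck API. The message classes used (`IPM.Note`, `IPM.Contact`, `IPM.Task`) are the standard Exchange class names.

```python
from datetime import datetime, timedelta, timezone


def keep_item(item, now, max_age_days, excluded_classes):
    """Return True if the item passes both the age filter and the class filter."""
    if now - item["date"] > timedelta(days=max_age_days):
        return False  # older than the retention window
    # Message classes are hierarchical (e.g. IPM.Contact.Custom),
    # so an excluded class also excludes its subclasses.
    cls = item["message_class"].upper()
    for excluded in excluded_classes:
        ex = excluded.upper()
        if cls == ex or cls.startswith(ex + "."):
            return False
    return True
```

A 7-year retention policy would translate to roughly `max_age_days=7 * 365`, with `excluded_classes=["IPM.Contact", "IPM.Task"]` to drop contacts and tasks.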
In our experience of over 10 years running large PST migration projects, one of the major issues that can occur is that end users have interacted with backup copies of their PST files. These backup copies might differ from one another by only small deltas. Ingesting all the PST files would result in duplicated folder structures and duplicated data in the user’s mailbox. The deduplication module takes all PST files for a single user into consideration and calculates a fingerprint for each item. If two items have the same fingerprint, the same message is present in more than one PST file, and the older copy can be removed. Typically, this step can reduce data volume and required ingestion time by up to 33%.
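Fingerprint-based deduplication can be sketched as follows. The fields that go into the fingerprint, the use of SHA-256, and the `pst_modified` key used to decide which copy is older are all assumptions made for this example; the actual fingerprint used by the deduplication module is not described here.

```python
import hashlib


def fingerprint(item):
    """Hash the fields that identify a message (field choice is illustrative)."""
    parts = [item["subject"], item["sender"], item["sent"], item["body"]]
    # Join with a separator that is unlikely to occur in the fields themselves.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def deduplicate(items):
    """Keep one item per fingerprint, preferring the copy from the newest PST."""
    newest = {}
    for item in items:
        key = fingerprint(item)
        current = newest.get(key)
        if current is None or item["pst_modified"] > current["pst_modified"]:
            newest[key] = item
    return list(newest.values())
```

Because all PST files for a single user are considered together, `items` would hold the items from every PST belonging to that user.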
Step 4 – PST File Ingestion
Recently, Microsoft announced its own solution to PST ingestion, but this has been met with many questions about its limitations, especially with regard to enterprise-level projects. It’s important to remember that ingestion is just one factor in a complicated procedure. For more information on this, you can find links to the earlier parts of our blog series below.
When ingesting into Exchange or Office 365, you can choose whether data is ingested into the primary or the archive mailbox. Remember that ingesting into the primary mailbox causes the data to resync back to the client, as the content is cached in the OST file on the client. QUADROtech uses its unique Advanced Ingestion Protocol (AIP) instead of traditional MAPI or EWS approaches.
AIP converts a batch of messages from a PST file into Exchange’s internal storage format in one pass and streams the resulting binary BLOB to the CAS server over an HTTP/S connection. This puts a lighter load on the CAS servers or array, as they no longer need to perform the conversion themselves. The CAS forwards the stream immediately to the Mailbox server to be stored in the Exchange database. The typical speed benefit is 400-500% more throughput than traditional EWS, with less load on Exchange. In Office 365 ingestions especially, it results in far less throttling by Microsoft.
Typically, ingestion writes to a configurable subfolder (e.g. “Legacy-PST-Data”) in the root of the user’s mailbox, keeping the folder structure underneath this folder intact.
Alongside the speed benefit of AIP comes heightened safety for the migrated items. Approaches such as EWS work by requiring the application to create a new item in the mailbox folder and then set the properties on that item. Unfortunately, not all properties can be preserved. The date the item was created, for example, changes to the time of the migration, which can’t be prevented. Some other properties are read-only and also cannot be set via EWS. This means the ingested item is no longer the “original” item, which could leave some organizations with concerns over Chain of Custody preservation.
Step 5 – Cleanup and Management
FlightDeck undertakes a cleanup process to manage items potentially still waiting in the upload area, and ensure that the required files (and the items within them) have all been transferred successfully.
A PST cleanup can be performed either pre- or post-ingestion. We advocate a post-ingestion cleanup to save resources, because fewer write IOPS are required on the FlightDeck Server(s) when the PST files have no need to be extended. In our experience, writing to or extending the PST files is best avoided in order to ensure the best possible levels of performance, stability and corruption avoidance.
Let’s look at this process in the context of a typical PST migration project. Say, for example, you have 100 items in a single PST file. You may have put filters into place, which remove items of a certain size, or perhaps some of them have been identified as duplicates during the project. Either way, this can leave you with fewer than 100 items in your newly migrated PST file.
This means that a number of items for that PST file are still residing in the centralized BITS upload location. When configured to do so, the cleanup module moves these items to a specific directory and attempts re-ingestion before ascertaining whether it is safe to remove them from the upload location, where you want to maximize space for items waiting to be ingested. This safeguards you against deleting items that have failed to ingest. Any problem items are easily visible, and appropriate rules can be applied to items that need further processing.
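The bookkeeping behind the 100-item example can be illustrated with a short Python sketch. It assumes each item carries an identifier and that the sets of ingested, filtered, and duplicate items are known; the function `reconcile` and these inputs are hypothetical and are not part of FlightDeck's interface.

```python
def reconcile(source_ids, ingested, filtered, duplicates):
    """Return the IDs that were neither ingested nor intentionally excluded."""
    accounted = set(ingested) | set(filtered) | set(duplicates)
    # Anything left over failed somewhere and should be retried, not deleted.
    return sorted(set(source_ids) - accounted)
```

With 100 source items, of which 10 were filtered out, 5 were duplicates, and 83 were ingested, `reconcile` would return the 2 remaining IDs. Only when it returns an empty list would it be safe to remove the file from the upload location.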
High levels of speed and efficiency in a PST migration, just as in an archive or mailbox migration, come from an end-to-end process and can’t rely on any one capability alone. Our approach to the cleanup phase serves to maximize both data safety and migration resources during your project, a goal toward which every other part of the process has also been continually refined.