The PST FlightDeck Process – PST File Processing
Last week we looked at how discovered PST files are centralised so that they can be prepared for ingestion into a new target system. Today we look at the processing that is applied to the files before they are ingested.
Once the PST files arrive at the central datacenter, they are prepared for ingestion into the target system. As PST FlightDeck is workflow based, the process can be modified as required. A typical workflow covers the following steps, each handled by the corresponding FlightDeck module.
Password Removal Module
Based on our experience, 5-8% of PSTs are typically password protected. The Password Removal Module checks each received PST file to see if it is password protected and, if so, strips the password from the PST. In practice, the module simply marks the PST file as no longer password protected.
Most customers like to record a “last reliable” backup of the PST files in the central location before the data is processed or ingested any further. These backups are typically kept for a limited amount of time after ingestion into the target system. The backup is written to a configurable Windows share, and it is assumed that the company will use its established backup methodology to back up this share. After a successful backup, the PST files are deleted from the backup share.
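The backup step described above can be pictured with a minimal Python sketch. It is not FlightDeck's implementation: the function names `backup_pst` and `purge_confirmed`, and the idea that the company's backup tool reports back a list of confirmed file names, are assumptions made purely for illustration.

```python
import shutil
from pathlib import Path


def backup_pst(pst_path: Path, backup_share: Path) -> Path:
    """Copy a PST file to the backup share, keeping its timestamps."""
    backup_share.mkdir(parents=True, exist_ok=True)
    target = backup_share / pst_path.name
    # Note: files with the same name would overwrite each other here;
    # a real workflow would need a unique naming scheme per user.
    shutil.copy2(pst_path, target)
    return target


def purge_confirmed(backup_share: Path, confirmed_names) -> list:
    """Delete copies that the (hypothetical) backup job has confirmed."""
    removed = []
    for name in confirmed_names:
        candidate = backup_share / name
        if candidate.is_file():
            candidate.unlink()
            removed.append(name)
    return removed
```

On Windows, `backup_share` could be a UNC path such as `Path(r"\\server\share\pst-backup")`, which is a placeholder and not a real location.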
Pre-flight Checking Module
PST files may either be corrupted or contain corrupted items, which can prevent a successful ingestion. The Pre-flight Check determines if a PST file is corrupted and, if so, tries to repair it. If a PST file can be opened successfully, it is checked to see if it is in the old ANSI format (used before Outlook 2003). If so, it is converted to a Unicode PST file. The next step of the process is to check each item within the PST file for corruption.
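To illustrate the ANSI versus Unicode check, here is a small Python sketch that inspects only the file header. It assumes the header layout of the published Outlook Personal Folders (.pst) file format: the magic bytes `!BDN` at offset 0, the client signature `SM` at offset 8, and a 16-bit version field at offset 10 that is 14 or 15 for ANSI files and 23 or higher for Unicode files. The function names are hypothetical, and this is not how FlightDeck itself performs the check.

```python
import struct

PST_MAGIC = b"!BDN"   # dwMagic, offset 0
PST_CLIENT = b"SM"    # wMagicClient, offset 8


def pst_format(header: bytes) -> str:
    """Classify a PST header as 'ansi', 'unicode' or 'invalid'."""
    if len(header) < 12:
        return "invalid"
    if header[0:4] != PST_MAGIC or header[8:10] != PST_CLIENT:
        return "invalid"
    # wVer: unsigned 16-bit little-endian value at offset 10
    (version,) = struct.unpack_from("<H", header, 10)
    if version in (14, 15):
        return "ansi"
    if version >= 23:
        return "unicode"
    return "invalid"


def pst_file_format(path: str) -> str:
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return pst_format(handle.read(12))
```

A file reported as `ansi` would be a candidate for conversion. A file reported as `invalid` may be damaged or not a PST at all, and a valid header says nothing about corruption deeper in the file.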
Content can be filtered in this module based on the available metadata. For example, if a customer's legal retention policy is 7 years, data older than 7 years can be filtered out to reduce the amount of data to be ingested. Furthermore, message types such as contacts or tasks (identified by their Exchange message class) can be filtered out if they are unwanted in the target environment.
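The two filter rules from the example above can be sketched in a few lines of Python. The `Item` record, the `keep_item` function and its default values are hypothetical. `IPM.Contact` and `IPM.Task` are the standard message classes for contacts and tasks, and the sketch assumes that custom subclasses such as `IPM.Task.Custom` should be filtered as well.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    message_class: str   # e.g. "IPM.Note" for a mail message
    received: datetime


def years_before(moment: datetime, years: int) -> datetime:
    """Return the same calendar date `years` years earlier."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:  # 29 February with a non-leap target year
        return moment.replace(year=moment.year - years, day=28)


def keep_item(item, now, retention_years=7,
              excluded_classes=("IPM.Contact", "IPM.Task")):
    """Return True if the item should be ingested."""
    if item.received < years_before(now, retention_years):
        return False  # older than the retention period
    cls = item.message_class.lower()
    for excluded in excluded_classes:
        prefix = excluded.lower()
        # Match the class itself and its subclasses, but not
        # unrelated classes that merely share a prefix.
        if cls == prefix or cls.startswith(prefix + "."):
            return False
    return True
```

Matching on the dot boundary keeps a class such as `IPM.TaskRequest` from being dropped just because its name begins with `IPM.Task`.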
In more than 10 years of running large PST migration projects, one of the major issues we have seen is end users working with backup copies of their PST files. These backup copies often differ from one another only by small deltas. Ingesting all the PST files would result in duplicated folder structures and duplicated data in the user's mailbox. The deduplication module takes all PST files for a single user into consideration and calculates a fingerprint for each item. If two items have the same fingerprint, the same message is present in more than one PST file, and the older copy can be removed. This step can reduce data volume and required ingestion time by up to 33%.
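Fingerprint-based deduplication can be sketched as follows. The source does not say which fields FlightDeck hashes or how it decides which copy is older, so everything here is an assumption for illustration: the `MailItem` record, the choice of SHA-256 over sender, subject, sent time and body, and the rule that the copy from the most recently modified PST is the one kept.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MailItem:
    pst_name: str
    pst_modified: datetime   # last-modified time of the containing PST
    sender: str
    subject: str
    sent: datetime
    body: str


def fingerprint(item: MailItem) -> str:
    """Hash the fields that identify a message, ignoring its location."""
    parts = (item.sender.strip().lower(), item.subject,
             item.sent.isoformat(), item.body)
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length prefix so that field boundaries cannot be confused.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def deduplicate(items):
    """Keep one copy per fingerprint: the one from the newest PST."""
    newest = {}
    for item in items:
        key = fingerprint(item)
        kept = newest.get(key)
        if kept is None or item.pst_modified > kept.pst_modified:
            newest[key] = item
    return list(newest.values())
```

The function is meant to be called once per user, with the items of all of that user's PST files passed in together, since duplicates only need to be detected within one mailbox.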
Next week we move to the ingestion phase of the PST FlightDeck process.