A first look at the new Import Export Mailbox API in the Microsoft Graph Part 2 - Export Items
This is part 2 of my series looking at the new Import and Export API in the Microsoft Graph. In part 1 I went through getting started with the API, folder operations and synchronization, so I’d recommend reading that first if you haven’t already. In this post I’m going to look at the actual data export side.
Data formats
There are a number of different data and file formats around email (or email messaging), eg MIME, S/MIME, iCal, VCF etc. In the Exchange world there are also MSG, PST, OST, TNEF and EDB; the first three (MSG, PST, OST) are actually Outlook file formats and the last (EDB) is the Exchange database file format. For individual Exchange items, MSG files are the most common portable file format used to transfer (or save) a single item with full fidelity, or to provide a full fidelity export of an item for external auditing etc. Because MSG is an Outlook file format, the files are easy to deal with no matter what the situation. The other place the Compound File Binary format (which MSG is built on) gets used is in RMS, where the email being encrypted (or protected) becomes an encrypted compound binary file (CDFV2).
The data format that the Import and Export endpoint uses isn’t any of the ones I mentioned above. The format used is the Fast Transfer Stream (FTS), which is part of Exchange’s Bulk Data Transfer protocol. This format is essentially a serialized stream of MAPI properties that represents an Exchange store item (including its attachments). The FTS stream used by the Graph appears to be the same format as that used by the Export and Upload item operations in EWS, so it doesn’t exactly follow the documented spec in the Exchange protocol docs but rather an undocumented variant that was used in EWS. (In the EWS docs it was never documented as FTS, just as an opaque stream or a suffusion of yellow).
So, to answer a couple of quick questions that people always ask about export and import with any Exchange API:
Can I export to or import from an MSG file using this API? No
Can I export to or import from a PST file using this API? No
These are both Outlook file formats, so if you want to export to or import from them, use Outlook. Or for PSTs, for import use https://learn.microsoft.com/en-us/purview/importing-pst-files-to-office-365 and for export use https://learn.microsoft.com/en-us/purview/ediscovery-export-content . But the point here is that there is nothing in these server-side APIs that will help with the conversion from FTS.
Interoperability with EWS
The stream produced and expected by Graph appears to be the same as the one used in the EWS Export and Upload Item operations (this is not documented, and in fact the documentation for the Graph ExportItems endpoint just defines it as a Fast Transfer Stream, which is not 100% correct). From the testing that I did, I was able to export a message using EWS (using the EWSEditor) and then import that into Graph. I could also take an export I did using the Graph ExportItems endpoint and import that item into a folder using the EWSEditor. I didn’t test it against an OnPrem Exchange server; I would expect it to work the same way, but mileage may vary.
I think the same goes for supportability between OnPrem and Online Exchange. If you’re going to go EWS - Graph (or vice versa), it works for now but isn’t supported, and they reserve the right to break it in the future. If you’re migrating a current EWS application or script, this does provide an easy migration path. Redemption and Outlook classic are also something you may be able to use to do format conversions to the Outlook file formats I mentioned above.
If you want to know about parsing and building your own FTS streams, have a read of this Stack Overflow question https://stackoverflow.com/questions/73440480/fast-transfer-stream-parser-ews-export-items . If you’re interested in looking at the contents of an FTS stream, then try a hex editor; https://hexed.it/ is one I use, and it basically lets you see the serialized MAPI properties within the stream.
Exporting Items
Enumerating
To export items you first need to enumerate the Ids of the items you want to export. In part 1 we talked about the Import Export API being the first endpoint that allows you to enumerate any item in a mailbox regardless of its type and location (eg being in the Archive mailbox). To enumerate items, use the following endpoint:
/admin/exchange/mailboxes/{mailboxId}/folders/{mailboxFolderId}/items
This will return an optimized item query of whatever folder you’re trying to enumerate items from, and it returns items of the microsoft.graph.mailboxItem type.
This type has very few strongly typed properties, so what you get back for each item is quite sparse.
The two most obvious missing properties are Subject (for debugging) and receivedDateTime (if you just want to filter on new messages that have arrived). You can add those properties back in by using extended property definitions to specify which extra properties you want returned. So in my enumeration sample I use a query like the following.
https://graph.microsoft.com/beta/admin/exchange/mailboxes/$MailboxId/folders/$MailFolderId/items?$expand=singleValueExtendedProperties($filter=(id eq 'String 0x0037') or (id eq 'SystemTime 0x0E06'))
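As a rough sketch of how that query can be put together and read back in PowerShell (the function names here are my own for illustration and are not part of the script or the Graph SDK), the first helper builds the same URL from a list of extended property ids and the second pulls one property value out of a returned item. The second helper compares the property tag numerically so that 0x37 and 0x0037 are treated as the same, in case the service formats the id differently from the request; that is a precaution, not documented behaviour.

```powershell
# Hypothetical helper: builds the enumeration URL shown above.
# Single-quoted strings are used for the $expand/$filter parts so that
# PowerShell does not try to expand them as variables.
function Get-MailboxItemsUri {
    param(
        [Parameter(Mandatory)] [string] $MailboxId,
        [Parameter(Mandatory)] [string] $MailFolderId,
        # Extended property ids in the "<Type> <Tag>" form used by Graph
        [string[]] $PropertyIds = @('String 0x0037', 'SystemTime 0x0E06')
    )
    $clauses = $PropertyIds | ForEach-Object { "(id eq '$_')" }
    $expand  = 'singleValueExtendedProperties($filter=' + ($clauses -join ' or ') + ')'
    $base    = "https://graph.microsoft.com/beta/admin/exchange/mailboxes/$MailboxId/folders/$MailFolderId/items"
    return $base + '?$expand=' + $expand
}

# Hypothetical helper: reads one extended property value from an item
# (works with the hashtables or objects produced from the JSON response).
function Get-ExtendedPropertyValue {
    param($Item, [string] $PropertyId)
    $wantType, $wantTag = $PropertyId -split ' ', 2
    foreach ($prop in $Item.singleValueExtendedProperties) {
        $type, $tag = $prop.id -split ' ', 2
        # Compare the tag as a number so leading zeros and casing do not matter
        if ($type -eq $wantType -and
            [Convert]::ToInt32($tag, 16) -eq [Convert]::ToInt32($wantTag, 16)) {
            return $prop.value
        }
    }
    return $null
}

# Example usage (requires a connected Graph session):
# $uri   = Get-MailboxItemsUri -MailboxId $MailboxId -MailFolderId inbox
# $page  = Invoke-MgGraphRequest -Method GET -Uri $uri
# $page.value | ForEach-Object { Get-ExtendedPropertyValue -Item $_ -PropertyId 'String 0x0037' }
```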
ItemId’s
Like FolderIds, the same ItemIds (RestId) are used across the regular Graph Mail endpoint and the new Import Export endpoints. ImmutableIds, which I posted about recently, are also available. In terms of migration or synchronization, ImmutableIds offer some great advantages that aren’t available in EWS or MAPI. You have one ImmutableId that doesn’t change as the item moves around the mailbox, which is a critical piece of logic you need for synchronizing data.
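On the regular Graph Mail endpoints, ImmutableIds are requested with the Prefer: IdType="ImmutableId" request header. A minimal sketch of building that header follows; the function name is made up for illustration, and I’m assuming the Import Export endpoints honour the header in the same way as the Mail endpoints do.

```powershell
# Hypothetical helper: returns the request headers for a Graph call,
# optionally asking the service to return ImmutableIds instead of RestIds.
function New-GraphRequestHeaders {
    param([switch] $UseImmutableId)
    $headers = @{ 'Accept' = 'application/json' }
    if ($UseImmutableId) {
        # Standard Graph preference header for immutable identifiers
        $headers['Prefer'] = 'IdType="ImmutableId"'
    }
    return $headers
}

# Example usage (requires a connected Graph session):
# Invoke-MgGraphRequest -Method GET -Uri $uri -Headers (New-GraphRequestHeaders -UseImmutableId)
```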
Filtering
Filtering works the same as it does on other Graph endpoints, with the exception that, because there are only a few strongly typed properties, most of the filters you may want to use will need to be expressed as extended property filters.
For example, to find email from a particular email domain (because the filter uses contains, you still need to verify the results to eliminate false positives):
https://graph.microsoft.com/beta/admin/exchange/mailboxes/MBX:73…/folders/inbox/items?$filter=singleValueExtendedProperties/Any(ep: ep/id eq 'String 0x5D01' and contains(ep/value, '@adomain.com'))&$expand=singleValueExtendedProperties($filter=id eq 'String 0x5D01')
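A small sketch of both halves of that approach: one helper that builds the contains filter for a domain, and one that does the client-side check afterwards to weed out false positives (for example an address at adomain.com.example.org would match the contains test). Both function names are hypothetical and not part of the script.

```powershell
# Hypothetical helper: builds the extended property filter used above
# for a given sender domain (0x5D01 is the sender SMTP address property).
function New-SenderDomainFilter {
    param(
        [Parameter(Mandatory)] [string] $Domain,
        [string] $PropertyId = 'String 0x5D01'
    )
    $needle = '@' + $Domain.TrimStart('@')
    # Single quotes inside an OData string literal are escaped by doubling them
    $needle = $needle.Replace("'", "''")
    return "singleValueExtendedProperties/Any(ep: ep/id eq '$PropertyId' and contains(ep/value, '$needle'))"
}

# Hypothetical helper: the client-side verification step. Returns $true only
# when the address actually ends with @domain.
function Test-SenderInDomain {
    param([string] $Address, [string] $Domain)
    if ([string]::IsNullOrEmpty($Address)) { return $false }
    $suffix = '@' + $Domain.TrimStart('@')
    return $Address.Trim().EndsWith($suffix, [System.StringComparison]::OrdinalIgnoreCase)
}
```

The filter string is what goes after $filter= in the URL, and the second function is run over the sender address returned in the expanded extended property.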
Batching and Performance
The Export items endpoint accepts an array of up to 20 items in each export request. You can then batch export requests using the regular batch endpoint in Graph in batches of up to 20, so you get a 20 x 20 multiplier effect and can submit one batch request that will export 400 items. The way the batch then gets executed by the Graph is that it takes each of the requests in the batch and dispatches them to Exchange concurrently, up to a limit of 4 concurrent connections at a time. So in this case you’ll have up to 4 connections, each working through a request that exports 20 items. Once one export is complete, it moves on to the next export request in the batch.
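To make the 20 x 20 arithmetic concrete, here is a minimal sketch that turns a flat list of item Ids into JSON batch payloads: each payload holds up to 20 exportItems requests, and each request holds up to 20 Ids. The function name is hypothetical, and the exportItems URL and the itemIds body property follow my reading of the beta documentation, so check them against the current docs before relying on them.

```powershell
# Hypothetical helper: emits one hashtable per $batch payload (up to 400 items each).
function New-ExportBatchBodies {
    param(
        [Parameter(Mandatory)] [string] $MailboxId,
        [Parameter(Mandatory)] [string[]] $ItemIds,
        [int] $ItemsPerExport = 20,     # limit per exportItems request
        [int] $RequestsPerBatch = 20    # limit per Graph JSON batch
    )
    $itemsPerBatch = $ItemsPerExport * $RequestsPerBatch
    for ($b = 0; $b -lt $ItemIds.Count; $b += $itemsPerBatch) {
        $batchEnd = [Math]::Min($b + $itemsPerBatch, $ItemIds.Count)
        $requests = [System.Collections.Generic.List[object]]::new()
        for ($i = $b; $i -lt $batchEnd; $i += $ItemsPerExport) {
            $last = [Math]::Min($i + $ItemsPerExport, $batchEnd) - 1
            $requests.Add(@{
                id      = [string]($requests.Count + 1)   # ids are unique within a batch
                method  = 'POST'
                url     = "/admin/exchange/mailboxes/$MailboxId/exportItems"   # assumed path
                headers = @{ 'Content-Type' = 'application/json' }
                body    = @{ itemIds = @($ItemIds[$i..$last]) }                # assumed property name
            })
        }
        # One payload for the /$batch endpoint
        @{ requests = $requests.ToArray() }
    }
}

# Example usage (requires a connected Graph session):
# foreach ($body in (New-ExportBatchBodies -MailboxId $MailboxId -ItemIds $ids)) {
#     Invoke-MgGraphRequest -Method POST -Uri 'https://graph.microsoft.com/beta/$batch' `
#         -Body ($body | ConvertTo-Json -Depth 6)
# }
```

With 450 Ids this produces two payloads: the first with 20 requests of 20 Ids, and the second with 3 requests holding the remaining 50.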
The upshot of all this is that, because the requests are all server to server, once the Graph has accepted your batch request it’s extremely fast. My quick benchmarking on my personal mailbox, which has a lot of small and large items, gave the following results:
20000 items, around 6GB in size with an average email size of around 300KB, took 20 minutes with a single-threaded PowerShell script over a 50Mb/s link.
Then another run of 60000 items, which totalled around 14.7GB, took 43 minutes.
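For comparison, those two runs can be turned into rates with a few lines of arithmetic. This is only a sketch with a made-up function name; it assumes decimal units (1GB = 1000MB) and that the quoted sizes are the item sizes rather than the bytes that actually went over the wire.

```powershell
# Hypothetical helper: converts an export run into items/minute and data rates.
function Get-ExportThroughput {
    param([int] $ItemCount, [double] $SizeGB, [double] $Minutes)
    $seconds   = $Minutes * 60
    $megabytes = $SizeGB * 1000          # decimal units assumed
    [pscustomobject]@{
        ItemsPerMinute = [Math]::Round($ItemCount / $Minutes, 1)
        MBPerSecond    = [Math]::Round($megabytes / $seconds, 2)
        MbitPerSecond  = [Math]::Round($megabytes * 8 / $seconds, 1)
        AvgItemKB      = [Math]::Round($megabytes * 1000 / $ItemCount, 0)
    }
}

Get-ExportThroughput -ItemCount 20000 -SizeGB 6    -Minutes 20
Get-ExportThroughput -ItemCount 60000 -SizeGB 14.7 -Minutes 43
```

Under those assumptions the first run works out to 1000 items per minute at about 40Mb/s, and the second to roughly 1395 items per minute at about 45.6Mb/s with an average item size of 245KB.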
I was pretty surprised by these figures and the fact that I didn’t hit any throttling. The service is in preview, and I’m sceptical as to whether you would see those figures carry over into production. The basis of throttling from a service provider’s point of view is to conserve resources, so while awesome export performance is great from a client/developer point of view, if it starts hurting the server end too much you would expect the performance to be lowered.
You could approach it differently by not batching the export requests and instead running each exportItems request on its own thread. If you have large items this could potentially be better, but one of the challenges of mailbox data is having lots of really small items. These take a lot more time to export/import because of the overhead and latency associated with each item, so the 400 item server batch seems to go some way towards addressing that issue.
The 3 body problem (or 4 concurrent thread one).
I talked a little bit about the 4 concurrent thread issue in the last post, and if you’re really looking to optimize an export app you’re going to hit the wall on this one. If you’re running a mailbox export for whatever reason, your app is probably going to want to take advantage of the 20 x 20 server batching that is on offer (because it appears to work really well), meaning you’re basically tying up all 4 concurrent connections that your application can have. If you were allowed 5 concurrent threads, that would give you some room to optimize by allowing another connection to read ahead etc.
Is it Audited
The ability to essentially download a Mailbox worth of data in a reasonably small amount of time means that auditing should be happening somewhere (in theory). At the time of writing I can’t find anywhere that is happening but I’ll keep looking and post back what I find.
Examples
As with the first post, I’ve been updating my test PowerShell script that uses the PowerShell Graph SDK. The script is available on GitHub at https://github.com/gscales/Powershell-Scripts/blob/master/Graph101/GraphSDK/Import-ExportMod.ps1
The first thing you need is the Mailbox Id to use, which can be obtained via
$Mailbox = Invoke-GetMailboxSettings -Upn gscales@datarumble.com
Enumerate 50000 items in the Inbox
Invoke-ListMailboxFolderItems -MailboxId $mailbox.primaryMailboxId -MailFolderId inbox -ItemCount 50000
Enumerate the first 5000 mail Items that have arrived in the last 2 weeks
$ItemsToExport = Invoke-ListMailboxFolderItems -MailboxId $mailbox.primaryMailboxId -MailFolderId inbox -ItemCount 5000 -Filter "singleValueExtendedProperties/Any(ep: ep/id eq 'SystemTime 0xE06' and cast(ep/value, Edm.DateTimeOffset) ge 2025-01-24T00:00:00Z)"
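The date in that filter is hard-coded, so for a rolling two-week window you could build the filter string from the current date instead. A minimal sketch, with a hypothetical function name, that produces the same filter shape as above (it uses the 0x0E06 form of the property id from the earlier enumeration query):

```powershell
# Hypothetical helper: builds the received-date filter for a given cut-off.
function New-ReceivedSinceFilter {
    param(
        [Parameter(Mandatory)] [datetime] $Since,
        [string] $PropertyId = 'SystemTime 0x0E06'
    )
    # Format as a UTC ISO 8601 timestamp, independent of the local culture
    $stamp = $Since.ToUniversalTime().ToString(
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        [System.Globalization.CultureInfo]::InvariantCulture)
    return "singleValueExtendedProperties/Any(ep: ep/id eq '$PropertyId' and cast(ep/value, Edm.DateTimeOffset) ge $stamp)"
}

# Example usage with the script's own cmdlet:
# $filter = New-ReceivedSinceFilter -Since (Get-Date).AddDays(-14)
# $ItemsToExport = Invoke-ListMailboxFolderItems -MailboxId $mailbox.primaryMailboxId -MailFolderId inbox -ItemCount 5000 -Filter $filter
```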
To Export those items contained in the $ItemsToExport collection
Invoke-BatchExportItems -Items $ItemsToExport -MailboxId $mailbox.primaryMailboxId -ExportPath C:\temp\TestExp2\ -Verbose
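If you are calling the endpoint yourself rather than using the script, the exported stream has to be written to disk. The sketch below is not how Invoke-BatchExportItems does it; it is a hypothetical helper that assumes each entry in the exportItems response carries itemId, a base64 data value and an optional error, which is my reading of the beta documentation. The .fts extension and the hashed file name are my own choices, made because item Ids are long.

```powershell
# Hypothetical helper: writes each exported item in one exportItems response
# to its own file and returns the paths that were written.
function Save-ExportedItems {
    param(
        [Parameter(Mandatory)] $ExportResponse,   # parsed JSON of one exportItems call
        [Parameter(Mandatory)] [string] $ExportPath
    )
    if (-not (Test-Path -LiteralPath $ExportPath)) {
        New-Item -ItemType Directory -Path $ExportPath -Force | Out-Null
    }
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        foreach ($entry in $ExportResponse.value) {
            if ($entry.error -or -not $entry.data) {
                Write-Warning "Item $($entry.itemId) was not exported"
                continue
            }
            # The stream is assumed to be returned base64 encoded
            $bytes = [Convert]::FromBase64String($entry.data)
            # Hash the item Id to get a short, file-system-safe name
            $idBytes = [System.Text.Encoding]::UTF8.GetBytes([string]$entry.itemId)
            $name = [BitConverter]::ToString($sha.ComputeHash($idBytes)).Replace('-', '')
            $file = Join-Path $ExportPath "$name.fts"
            [System.IO.File]::WriteAllBytes($file, $bytes)
            $file
        }
    }
    finally {
        $sha.Dispose()
    }
}
```

Because the file name is a hash, you would also want to keep a mapping from the item Id to the file name if you plan to import the items again later.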
Outliers and edge cases
If you’re new to exporting email via these APIs, a few things to watch out for are reference (or cloudy) attachments, as they aren’t included in the export stream. Encrypted messages, whether RMS or any other type of encryption, may mean that what you have exported is worthless once it’s removed from the source.
Conclusion
I think the biggest surprise for me was the performance of the export. When I first read the API spec and saw the limit of 20 items per export request I was pretty sceptical (in EWS, batch sizes of around 50 are the sweet spot). But after testing and comparing it against EWS, for me it’s equivalent on the speed of export, especially on more latent links. The things that do flash warning signs for me are that it is in preview, and that you can get great performance from EWS on an OnPrem server if you switch throttling off (beware the throttling reaper).
In the next post I’ll take a look at importing items.
Thanks for this excellent article series, it was very helpful in getting a grasp on this new API. I also could not find audit logs recording the use of the API, interested in whether you were able to turn up any. Hopefully by the time it leaves beta that will be addressed. In terms of detecting this activity in the interim, the MicrosoftGraphActivityLogs (enable via an Entra ID diagnostic setting) could be used to find requests to the exportItems or $Batch endpoints with the MailboxItem.ImportExport or MailboxItem.ImportExport.All role or scope. You could probably also factor in the ResponseSize to identify when a non-trivial number of messages was exported.