First of all, Google Takeout conveniently exports all your mail in standard mbox format as one huge file. Interestingly, this file also contains your chat transcripts, disguised as RFC822-formatted messages, but with the Message-Id field. They can be distinguished by the presence of the "Chat" label in the X-GM-Labels header field. I chose to discard them.
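A minimal sketch of how the chat transcripts could be filtered out while reading the export, using Python's standard mailbox module. The helper names (labels_of, is_chat, mail_only) are made up for illustration, and the code assumes the label header holds a plain comma-separated list. It also checks X-Gmail-Labels next to X-GM-Labels, on the assumption that some exports spell the header that way.

```python
import mailbox
from email.message import Message

# Header names to inspect. The text above names X-GM-Labels; checking
# X-Gmail-Labels as well is an assumption about how exports may spell it.
LABEL_HEADERS = ("X-GM-Labels", "X-Gmail-Labels")


def labels_of(msg: Message) -> list[str]:
    """Return the labels of one message, assuming a comma-separated header."""
    labels = []
    for name in LABEL_HEADERS:
        for value in msg.get_all(name, []):
            labels.extend(p.strip() for p in str(value).split(",") if p.strip())
    return labels


def is_chat(msg: Message) -> bool:
    """True when the message carries the "Chat" label."""
    return "Chat" in labels_of(msg)


def mail_only(path: str):
    """Yield every message in the mbox file at `path` that is not a chat."""
    for msg in mailbox.mbox(path, create=False):
        if not is_chat(msg):
            yield msg
```

Iterating over mail_only("export.mbox") (a hypothetical file name) would then yield only real mail, one message at a time, without loading the whole file into memory.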
Speaking of labels: IMAP has folders, while Gmail has labels. Gmail's IMAP interface maps labels to folders. I use IMAP's STORE command with Gmail's X-GM-LABELS IMAP extension to assign multiple labels to a single message.
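Here is a rough sketch of what such a STORE call might look like with Python's imaplib. It is not the code I actually ran: the helper names are illustrative, the connection is assumed to be logged in with a folder already selected, and the labels are assumed to be user-defined ASCII names (non-ASCII labels would need IMAP's modified UTF-7 encoding, which is not handled here).

```python
import imaplib


def quote(value: str) -> str:
    """Wrap a value as an IMAP quoted string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def label_list(labels) -> str:
    """Build the parenthesized list that follows +X-GM-LABELS."""
    return "(" + " ".join(quote(label) for label in labels) + ")"


def add_labels(conn: imaplib.IMAP4, uid: int, labels) -> bool:
    """Add several Gmail labels to the message with the given UID."""
    # Sends: UID STORE <uid> +X-GM-LABELS ("label1" "label2" ...)
    typ, _ = conn.uid("STORE", str(uid), "+X-GM-LABELS", label_list(labels))
    return typ == "OK"
```

The "+" prefix adds labels to whatever the message already has, so a single call can attach all labels of a message without touching existing ones.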
The standard IMAP protocol (without the UIDPLUS extension) does not give you the unique ID of a message you just appended. I need this unique ID to assign labels, so after each successful append I immediately look the message up by its Message-Id header field. At this point, I hit two bugs in Gmail's IMAP implementation:
Bug #1: When the Message-Id contains a '%' character, the IMAP SEARCH command fails to find the message. This is a screenshot of such a message present in my mailbox, taken from the Gmail web interface:
But the following IMAP command fails to locate it:
UID SEARCH HEADER Message-ID "<D7DA993B.B30DCemail@example.com>"
Some people have observed the same problem with the "!" character and speculated that Gmail splits the Message-Id into "words" before indexing. The proposed workaround simulates this split, searches on the conjunction of several parts of the subject, and then verifies that the message found has the correct Message-Id by fetching it. Instead, I chose to implement a more lightweight solution using Gmail's X-GM-RAW extension to IMAP's SEARCH command, which allows searching using Google search syntax. In particular, for the example used above, one can use an X-GM-RAW search in place of the HEADER search.
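The append-then-look-up step, with an X-GM-RAW fallback, could be sketched roughly like this with Python's imaplib. This is an illustration under assumptions rather than my exact code: the function names are invented, the folder name is assumed to need no quoting (for example INBOX), and the fallback assumes Gmail's rfc822msgid: search operator is given the Message-Id without its angle brackets.

```python
import email
import imaplib


def quote(value: str) -> str:
    """Wrap a value as an IMAP quoted string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def parse_uids(data) -> list[int]:
    """Turn an imaplib SEARCH payload such as [b'3 7 12'] into integers."""
    if not data or not data[0]:
        return []
    return [int(token) for token in data[0].split()]


def raw_query(message_id: str) -> str:
    """Build a Gmail search expression for one Message-Id.

    Assumption: rfc822msgid: takes the id without angle brackets.
    """
    return "rfc822msgid:" + message_id.strip().strip("<>")


def find_uid(conn: imaplib.IMAP4, message_id: str):
    """Return the UID of the message in the selected folder, or None."""
    # Standard search first; this is the one that fails for ids with '%'.
    typ, data = conn.uid("SEARCH", "HEADER", "Message-ID", quote(message_id))
    uids = parse_uids(data) if typ == "OK" else []
    if not uids:
        # Fall back to Gmail's X-GM-RAW extension.
        typ, data = conn.uid("SEARCH", "X-GM-RAW", quote(raw_query(message_id)))
        uids = parse_uids(data) if typ == "OK" else []
    return uids[-1] if uids else None


def append_and_find(conn: imaplib.IMAP4, folder: str, raw_message: bytes):
    """Append one raw message to `folder` and return its UID, or None."""
    message_id = email.message_from_bytes(raw_message)["Message-ID"]
    typ, _ = conn.append(folder, None, None, raw_message)
    if typ != "OK" or message_id is None:
        return None
    conn.select(folder)  # SEARCH only works on a selected folder
    return find_uid(conn, message_id.strip())
```

Trying the plain HEADER search first keeps the common case on standard IMAP, and the Gmail-specific query is only issued when that search comes back empty.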