Showing posts with label haskell. Show all posts
Showing posts with label haskell. Show all posts

Friday, March 12, 2021

Importing mail from Google Takeout into another Gmail account

I had a Google Takeout export of my old University Google account, which I wanted to import into my personal Gmail. Unfortunately, Google does not provide an easy way to do this. I decided to use this as an opportunity to refresh my Haskell language skills, and in particular, to familiarize myself with the Pipes library. The idea was to process an arbitrary size mailbox in fixed memory without explicitly relying on Haskell's laziness. I will not bother you with details of my Haskell implementation. You can take a look at the code yourself, but take it with a grain of salt: I am not a practicing Haskell programmer. For me, this language is more of a lingua franca used in the academic functional programming community. I want to tell you about a few quirks and bugs of the Gmail IMAP interface I've discovered.

First of all, Google Takeout conveniently exports all your mail in standard mbox format as one huge file. Interestingly, this file also contains your chat transcripts, disguised as RFC822-formatted messages, but with the Message-Id field. They could be distinguished by the presence of the "Chat" label in the X-GM-Labels header field. I chose to discard them.

Speaking of labels: IMAP has folders while Gmail has labels. Gmail IMAP interface maps labels to folders. I use IMAP's STORE command and Gmail X-GM-LABELS IMAP extension to assign multiple labels to a single message.

Standard IMAP protocol (without UIDPLUS extension) does not give you a unique ID of a message you just appended. I need this unique ID to assign labels. So after successful message addition, I immediately look it up by Message-Id header field. At this point, I hit two bugs in Gmail IMAP implementation:

Bug #1

When Message-Id contains '%' character, the IMAP SEARCH command fails to find it. This a screenshot of this message present in my mailbox taken from the Gmail web interface:



But the following IMAP command fails to locate it:
UID SEARCH HEADER Message-ID "<D7DA993B.B30DC%psztxa@exmail.nottingham.ac.uk>"
Some people observed the same problem with "!" character and speculated that Gmail split message-id into "words" before indexing. The workaround proposed simulates this split and performs a search on the conjunction of several parts of the subject and then verifies that the found message has the correct Message-Id by fetching it. Instead, I chose to implement a more lightweight solution using X-GM-RAW Gmail's extension to IMAP's SEARCH command, which allows searching using google search syntax. In particular, for the example used above, one can use 
 
"Rfc822msgid:<D7DA993B.B30DC%psztxa@exmail.nottingham.ac.uk>" 
 
search query, which successfully finds the message with "%" in Message-Id.

Bug #2

Another Message-Id-related bug I've stumbled upon is more surprising. It turns out, when indexing messages, Gmail strips square brackets from Message-Ids! If you look at the screenshot from the web interface, you can see that Message-Id shown in the header does not match the actual field value below! To search for such messages, I had to strip square brackets in the search string.


Conclusion

This exercise is not a production-ready script, and some other problems may occur during the import, but I was able to successfully import my mailbox with about 32K messages. You are welcome to hack my script for your own needs. I've submitted a pull request for HaskellNet library with my bugfixes and changes to support Gmail IMAP extensions.
 
P.S. Both bugs were reported to Google: #183687621 and #183677218.




Wednesday, March 8, 2017

Writer comonad in Haskell

Writer monad is readily available in Haskell. But what about Writer comonad? Let us start with the basics. Comonad class is defined as:

Comonad w where
    extract :: w a -> a
    duplicate :: w a -> w (w a)
    extend :: (w a -> b) -> w a -> w b

The minimal implementation requires you to implement 'extract' and 'extend.'  Additionally, it must satisfy the following comonad laws:

1
2
3
extend extract      = id
extract . extend f  = f
extend f . extend g = extend (f . extend g)

The 'extract' is pretty straightforward -- it allows safely extract value from the monad:

extract x = fst $ runWriter x

Our first intuition for 'extend' would be something like this:

extend f wa = return (f wa)

Which is incorrect, as it losses the state accumulated so far and replaces it with 'mempty' and thus do not satisfy the 1st law above (extend extract = id).

Without further ado, the writer comonad definition below correctly preserves the state and meets all three laws:

instance (Monoid w) => Comonad (Writer w) where
    extract x = fst $ runWriter x
    extend f wa = (tell $ execWriter wa) >> return (f wa)

Links:

  1. Haskell comonad library
  2. Category Theoretical view of comonads
  3. Another implementation of Writer and Reader comonads in Haskell

Saturday, February 6, 2016

State Monad in Coq Tutorial

In this post we will review usage of StateMonad in Coq using ExtLib on an adapted example from  a Haskell State Monad tutorial.

We will implement a simple game function which takes a string from dictionary {a,b,c} and produces a score number. The game could have two modes: on and off. It consumes an input string character by character and updates the mode and the score at each step. Initially the mode is off. The character 'c' toggles the on/off mode flag. In off mode, consuming characters does not affect the score. In on mode, each occurrence of the character 'a' increases the score by 1, and each occurrence of the character 'b' decreases it by 1. So for example, the input sequence "ab" will result in a score of 0, "ca" a score of 1, "cabca" also a 0, and "abcaaacbbcabbab" will result in a score of 2.

Let us look the implementation of this game in Coq using State Monad:

1 Require Import Coq.ZArith.ZArith_base Coq.Strings.String.
2 Require Import ExtLib.Data.Monads.StateMonad ExtLib.Structures.Monads.
3 
4 Section StateGame.
5   Import MonadNotation.
6   Local Open Scope Z_scope.
7   Local Open Scope char_scope.
8   Local Open Scope monad_scope.
9 
10   Definition GameValue : Type := Z.
11   Definition GameState : Type := (prod bool Z).
12 
13   Variable m : Type → Type.
14   Context {Monad_m: Monad m}.
15   Context {State_m: MonadState GameState m}.
16   
17   Fixpoint playGame (s: string): m GameValue :=
18     match s with
19     |  EmptyString =>
20        v <- get ;;
21          let '(on, score) := v in ret score
22     |  String x xs =>
23        v <- get ;;
24          let '(on, score) := v in
25          match x, on with
26          | "a", true =>  put (on, score + 1)
27          | "b", true => put (on, score - 1)
28          | "c", _ =>   put (negb on, score)
29          |  _, _  =>    put (on, score)
30          end ;; playGame xs
31     end.
32 
33   Definition startState: GameState := (false, 0).
34 End StateGame.
35 
36 Definition main : GameValue :=
37   (evalState  
38            (playGame (state GameState) "abcaaacbbcabbab") startState).

I will omit description of the Coq nuts and bolts of Require, Import, and Open Scope instructions in lines 1-8.

In lines 10 and 11, we declare type aliases for our game score value as a signed integer (Z), and game state as a product of the boolean game mode and and an integer game score.

Variable 'm' in line 13 identifies our Monad by it's type constructor. Line 14 states that 'm' must be an instance of type class Monad. If we look at ExtLib's definition of this type class we will see that it just postulates the presence of the standard Monad bind and return (called ret here) functions:

Class Monad (m : Type → Type) : Type := { 
    ret : ∀ {t}, t → m t ;
    bind : ∀ {t u}, m t → (t → m u) → m u
}.

The next class, mentioned in line 15 brings additional get and put definitions, specific for State Monad:

Class MonadState (T : Type) (m : Type → Type) : Type := {
    get : m T ;
    put : T → m unit
}.

The actual game function definition begins on line 17. It is a recursive function consuming an input string and returning a monadic value as a result. The string is an inductive type and we match its two constructors on lines 19 and 22, respectively. If the string is empty, we use syntax x <- c1 ;; c2 which is a Coq's rendering of well a known do-notation. Informally: we extract a score element from the current state tuple and return it as a monadic value using ret. While in Haskell, we can write a much terser expression (_, score) <- get, in Coq we have to use an intermediate value 'v' followed by destructive let, which unfortunately does not allow us to use a wildcard for the first element (mode), which we do not need anyway.

In the case of the second constructor on line 22 we again use do-notation and destructing let to break down the current state into on and score components and then analyze their combinations via match. Again, we use slightly different syntax from Haskell's version and, instead of matching on x with guards, we match on both x and on. Depending on their combination, we update the score and on/off flag according to our game rules using the state monad's put function. Then, we proceed with the next recursive iteration chained by c1 ;; c2 notation which is sometimes called then operator (>>) in Haskell, and used to to chain monadic computations where c2 does not depend on the outcome of c1. This could be expanded into our familiar do-notation of:  _ <- c1 ;; c2.

On line 33, we define our initial game state (game off, score=0) as startState.

On line 36, we define our 'main' function, which runs a game for a specific string and returns an integer score. It is a bit difficult to parse, so for clarity let us split it into several definitions, specifying return types and making all arguments explicit via '@':

Definition m : Type → Type := state GameState.
Definition pg :  state GameState GameValue := @playGame m _  _ "abcaaacbbcabbab".
Definition main : GameValue := @evalState GameState GameValue pg startState.

The first definition 'm' uses the state record type to create our State Monad's constructor. It is defined as:

state: Type → Type → Type
Record state (S t : Type) : Type := mkState { runState : S → t * S }

We partially specialize state with GameState, which gives us a new type constructor (Type → Type), which instantiates a type which holds a function with a type GameState -> t * GameState.

Th next definition 'pg' holds a monadic value returned as a result of running our game. You may notice that its type is state GameState GameValue which is the same as m GameValue, the type of  playGame. Please note that playGame was defined inside a section. Consequently, when seen outside one, it is additionally universally quantified with all section variables. In our case playGame now takes 4 arguments. The first is a type constructor for the monad, the second is the type class instance Monad m that carries the monad functions, the third is the type class instance MonadState GameState m that carries the monad state functions, and the fourth is the string input to the function. We explicitly provide 'm' but use wildcards for type class parameters, letting the type class resolution mechanism find instances for parameters marked as _ . Coq will look for a type class instance of Monad (state GameState) and MonadState (state GameState). Both of these are defined as Global Instances in the ExtLib.Data.Monads.StateMonad library so type class resolution finds the definitions. Thus, after class resolution, the 'pg' definition will expand to:

(@playGame (state GameState) (Monad_state GameState)
     (MonadState_state GameState) "abcaaacbbcabbab"

Finally in the 'main' definition, we use the evalState function to get our result. It is defined as:

Definition evalState {t} (c : state t) (s : S) : t :=
    fst (runState c s).

The semantics are pretty straightforward; we apply State Monad to the initial state (run state transition function), and project out the first element of the returned tuple (the score).

To test our implementation we can use Compute main instruction to get the expected result (2).

P.S. I would like to thank Gregory Malecha for the explanations, comments, and suggestions he provided.