Monday, November 18, 2013

MessagePack

MessagePack is an "efficient binary serialization format" designed to be faster and smaller than JSON, with support in many programming languages.

Less than 150 lines of code!

What is that, you ask? Why, that's the number of lines of code it took to implement a MessagePack encoder and decoder in Factor.

Reading

For decoding, our strategy will be to create a read-msgpack word that can operate on the current input-stream (allowing reuse for MessagePack objects read from files, strings, or the network).

DEFER: read-msgpack

Aside from support for basic data types such as integers, floating-point numbers, and strings, we also need to support arrays of objects, maps of key/value pairs, and so-called "extended" object types:

: read-array ( n -- obj )
    [ read-msgpack ] replicate ;

: read-map ( n -- obj )
    2 * read-array 2 group >hashtable ;

: read-ext ( n -- obj )
    [ 1 read signed-be> ] dip read 2array ;
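To see what read-map does with the flat sequence that read-array hands back, the pairing step can be tried on its own in the listener. This is only a sketch, and the sample keys and values are made up for illustration:

```factor
USING: grouping hashtables prettyprint ;

! A flat key, value, key, value sequence, standing in for what
! read-array would return for a two-entry map (made-up data).
{ "a" 1 "b" 2 }
2 group     ! pair up neighbours: "a" with 1, "b" with 2
>hashtable  ! treat each pair as a key/value entry
.
```

The order in which the hashtable prints its entries is not something to rely on.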

We need a way to represent a "nil" (or "null") object, since f already stands for the boolean false:

SINGLETON: +msgpack-nil+

And, of course, an error to indicate when a requested format is not supported:

ERROR: unknown-format n ;

With those definitions done, we can build a word to read a single MessagePack object from a stream:

: read-msgpack ( -- obj )
    read1 {
        { [ dup 0xc0 = ] [ drop +msgpack-nil+ ] }
        { [ dup 0xc2 = ] [ drop f ] }
        { [ dup 0xc3 = ] [ drop t ] }
        { [ dup 0x00 0x7f between? ] [ ] }
        { [ dup 0xe0 mask? ] [ 1array signed-be> ] }
        { [ dup 0xcc = ] [ drop read1 ] }
        { [ dup 0xcd = ] [ drop 2 read be> ] }
        { [ dup 0xce = ] [ drop 4 read be> ] }
        { [ dup 0xcf = ] [ drop 8 read be> ] }
        { [ dup 0xd0 = ] [ drop 1 read signed-be> ] }
        { [ dup 0xd1 = ] [ drop 2 read signed-be> ] }
        { [ dup 0xd2 = ] [ drop 4 read signed-be> ] }
        { [ dup 0xd3 = ] [ drop 8 read signed-be> ] }
        { [ dup 0xca = ] [ drop 4 read be> bits>float ] }
        { [ dup 0xcb = ] [ drop 8 read be> bits>double ] }
        { [ dup 0xe0 mask 0xa0 = ] [ 0x1f mask read "" like ] }
        { [ dup 0xd9 = ] [ drop read1 read "" like ] }
        { [ dup 0xda = ] [ drop 2 read be> read "" like ] }
        { [ dup 0xdb = ] [ drop 4 read be> read "" like ] }
        { [ dup 0xc4 = ] [ drop read1 read B{ } like ] }
        { [ dup 0xc5 = ] [ drop 2 read be> read B{ } like ] }
        { [ dup 0xc6 = ] [ drop 4 read be> read B{ } like ] }
        { [ dup 0xf0 mask 0x90 = ] [ 0x0f mask read-array ] }
        { [ dup 0xdc = ] [ drop 2 read be> read-array ] }
        { [ dup 0xdd = ] [ drop 4 read be> read-array ] }
        { [ dup 0xf0 mask 0x80 = ] [ 0x0f mask read-map ] }
        { [ dup 0xde = ] [ drop 2 read be> read-map ] }
        { [ dup 0xdf = ] [ drop 4 read be> read-map ] }
        { [ dup 0xd4 = ] [ drop 1 read-ext ] }
        { [ dup 0xd5 = ] [ drop 2 read-ext ] }
        { [ dup 0xd6 = ] [ drop 4 read-ext ] }
        { [ dup 0xd7 = ] [ drop 8 read-ext ] }
        { [ dup 0xd8 = ] [ drop 16 read-ext ] }
        { [ dup 0xc7 = ] [ drop read1 read-ext ] }
        { [ dup 0xc8 = ] [ drop 2 read be> read-ext ] }
        { [ dup 0xc9 = ] [ drop 4 read be> read-ext ] }
        [ unknown-format ]
    } cond ;
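As a quick check, we can feed read-msgpack a few hand-written bytes. This sketch assumes the words above are loaded from a vocabulary named msgpack, and it uses a binary byte reader as the input stream; the sample bytes are made up for illustration:

```factor
USING: io.encodings.binary io.streams.byte-array msgpack
prettyprint ;

! 0x93 is a fixarray header for three elements; 0x01, 0x02 and
! 0x03 are positive fixints, each stored in a single byte.
B{ 0x93 0x01 0x02 0x03 } binary [ read-msgpack ] with-byte-reader .
! prints { 1 2 3 }
```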

Pretty simple!

Writing

For encoding, our strategy will be to define a generic write-msgpack word that dispatches on the type of object being encoded and operates on the current output-stream (allowing reuse for MessagePack objects written to files, strings, or the network).

GENERIC: write-msgpack ( obj -- )

And, of course, an error to indicate when a requested object type isn't supported:

ERROR: cannot-convert obj ;

Writing the "nil" (or "null") object and the boolean values true and false:

M: +msgpack-nil+ write-msgpack drop 0xc0 write1 ;

M: f write-msgpack drop 0xc2 write1 ;

M: t write-msgpack drop 0xc3 write1 ;

Support for integers and floating point numbers:

M: integer write-msgpack
    dup 0 >= [
        {
            { [ dup 0x7f <= ] [ write1 ] }
            { [ dup 0xff <= ] [ 0xcc write1 1 >be write ] }
            { [ dup 0xffff <= ] [ 0xcd write1 2 >be write ] }
            { [ dup 0xffffffff <= ] [ 0xce write1 4 >be write ] }
            { [ dup 0xffffffffffffffff <= ] 
                [ 0xcf write1 8 >be write ] }
            [ cannot-convert ]
        } cond
    ] [
        {
            { [ dup -0x20 >= ] [ 1 >be write ] }
            { [ dup -0x80 >= ] [ 0xd0 write1 1 >be write ] }
            { [ dup -0x8000 >= ] [ 0xd1 write1 2 >be write ] }
            { [ dup -0x80000000 >= ] [ 0xd2 write1 4 >be write ] }
            { [ dup -0x8000000000000000 >= ] 
                [ 0xd3 write1 8 >be write ] }
            [ cannot-convert ]
        } cond
    ] if ;

M: float write-msgpack
    0xcb write1 double>bits 8 >be write ;
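To see which branch a given number takes, we can capture the output bytes with a binary byte writer. This is a sketch that assumes the words above live in a vocabulary named msgpack; the sample numbers are arbitrary, and the bytes in the comments were worked out by hand from the definitions above:

```factor
USING: io.encodings.binary io.streams.byte-array msgpack
prettyprint ;

! Small non-negative integers are written as a single byte: 0x01.
binary [ 1 write-msgpack ] with-byte-writer .

! 300 needs two bytes, so it gets the 0xcd marker: 0xcd 0x01 0x2c.
binary [ 300 write-msgpack ] with-byte-writer .

! -100 is too small for a single-byte fixint but fits in one
! signed byte, so it gets the 0xd0 marker: 0xd0 0x9c.
binary [ -100 write-msgpack ] with-byte-writer .

! Floats are always written as 64-bit doubles: 0xcb plus 8 bytes.
binary [ 1.5 write-msgpack ] with-byte-writer .
```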

Support for strings and byte-arrays:

M: string write-msgpack
    dup length {
        { [ dup 0x1f <= ] [ 0xa0 bitor write1 ] }
        { [ dup 0xff <= ] [ 0xd9 write1 write1 ] }
        { [ dup 0xffff <= ] [ 0xda write1 2 >be write ] }
        { [ dup 0xffffffff <= ] [ 0xdb write1 4 >be write ] }
        [ cannot-convert ]
    } cond write ;

M: byte-array write-msgpack
    dup length {
        { [ dup 0xff <= ] [ 0xc4 write1 write1 ] }
        { [ dup 0xffff <= ] [ 0xc5 write1 2 >be write ] }
        { [ dup 0xffffffff <= ] [ 0xc6 write1 4 >be write ] }
        [ cannot-convert ]
    } cond write ;
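A short sketch of what these two methods emit, again assuming a vocabulary named msgpack. The output is collected with a string writer and converted to a byte-array only to make the bytes easy to read; the sample values are made up:

```factor
USING: byte-arrays io.streams.string msgpack prettyprint ;

! A short string: 0xa0 combined with the length 2, then the
! characters themselves: 0xa2 0x68 0x69.
[ "hi" write-msgpack ] with-string-writer >byte-array .

! A byte-array: the 0xc4 marker, a one-byte length, then the
! bytes: 0xc4 0x03 0x01 0x02 0x03.
[ B{ 1 2 3 } write-msgpack ] with-string-writer >byte-array .
```

Note that the string method uses the number of characters as the length, so this sketch sticks to ASCII.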

Support for arrays of MessagePack objects:

: write-array-header ( n -- )
    {
        { [ dup 0xf <= ] [ 0x90 bitor write1 ] }
        { [ dup 0xffff <= ] [ 0xdc write1 2 >be write ] }
        { [ dup 0xffffffff <= ] [ 0xdd write1 4 >be write ] }
        [ cannot-convert ]
    } cond ;

M: sequence write-msgpack
    dup length write-array-header [ write-msgpack ] each ;

Support for maps of key/value pairs:

: write-map-header ( n -- )
    {
        { [ dup 0xf <= ] [ 0x80 bitor write1 ] }
        { [ dup 0xffff <= ] [ 0xde write1 2 >be write ] }
        { [ dup 0xffffffff <= ] [ 0xdf write1 4 >be write ] }
        [ cannot-convert ]
    } cond ;

M: assoc write-msgpack
    dup assoc-size write-map-header
    [ [ write-msgpack ] bi@ ] assoc-each ;
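Containers are just a header followed by their encoded elements, which a small sketch can show. It assumes a vocabulary named msgpack and uses a single-entry hashtable so that entry order does not matter; the values are made up:

```factor
USING: byte-arrays io.streams.string msgpack prettyprint ;

! fixarray header 0x92, then the two elements: 0x92 0x01 0x02.
[ { 1 2 } write-msgpack ] with-string-writer >byte-array .

! fixmap header 0x81, then the key "a" (0xa1 0x61) and the
! value 1 (0x01): 0x81 0xa1 0x61 0x01.
[ H{ { "a" 1 } } write-msgpack ] with-string-writer >byte-array .
```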

Convenience

To conveniently convert into and out of the MessagePack format, we can make words to read from and write to strings:

: msgpack> ( string -- obj )
    [ read-msgpack ] with-string-reader ;

: >msgpack ( obj -- string )
    [ write-msgpack ] with-string-writer ;
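A round trip through both words makes a handy smoke test. This sketch assumes the definitions above are loaded from a vocabulary named msgpack; the sample array is arbitrary:

```factor
USING: msgpack prettyprint ;

! Encode an array holding an integer, a string and a float,
! then decode the result again.
{ 1 "two" 3.0 } >msgpack msgpack> .
! prints { 1 "two" 3.0 }
```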

Not too hard, was it!

The code for this (including some documentation and tests) is available in the development version of Factor.
