Re: Factor: March 2011

Wednesday, March 30, 2011

ASCII Art

Yesterday, Raymond Hettinger, an active contributor to Python, tweeted a fun one-liner that works in Python 3.2. I thought I would show how to translate it into Factor.

Fun #python3.2 one-liner: print('\n'.join('*'*(c//2000) for i,c in sorted(Counter(map(sum, product(range(6), repeat=8))).items())))less than a minute ago via webraymondh
raymondh

The original Python "one-liner", including required imports looks like this:

from collections import Counter
from itertools import product

print('\n'.join('*'*(c//2000) for i,c in
    sorted(Counter(map(sum, product(range(6), repeat=8))).items())))

To translate this code into a concatenative language, we are going to work from the inside and move out. In this case, starting with the sum of the cartesian product of eight sequences of the numbers 0 through 5:

map(sum, product(range(6), repeat=8))

vs.

8 [ 6 iota ] replicate [ sum ] product-map

Next, we see that it counts each element, and produces a sorted list of items:

sorted(Counter(...).items())

vs.

histogram >alist sort-keys

And finally, prints a line of stars for each element:

print "\n".join('*'*(c//2000) for i,c in ...)

vs.

values [ 2000 /i CHAR: * <string> print ] each

Putting it all together, it makes this nice "ASCII Art" visualization:

( scratchpad ) USING: assocs io math math.statistics sequences
               sequences.product sorting strings ;

( scratchpad ) 8 [ 6 iota ] replicate [ sum ] product-map
               histogram >alist sort-keys values
               [ 2000 /i CHAR: * <string> print ] each







*
***
*****
********
************
******************
*************************
********************************
*****************************************
*************************************************
********************************************************
**************************************************************
******************************************************************
*******************************************************************
******************************************************************
**************************************************************
********************************************************
*************************************************
*****************************************
********************************
*************************
******************
************
********
*****
***
*

Monday, March 28, 2011

Unique Hash

Recently, I stumbled onto a blog post from 2009 which discussed a way to generate random-looking strings from a series of unique numeric id's, using PHP. The author uses a base-62 encoding ([0-9][A-Z][a-z]) to convert a number to a unique string using a series of prime numbers to reduce the potential of collisions. Below, I show how it might look in Factor.

First, we implement a simple base-62 encoder:

CONSTANT: CHARS
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

: base62 ( n -- string )
    [ dup 0 > ] [ 62 /mod CHARS nth ] "" produce-as reverse nip ;

Next, we define a series of prime numbers (which should be kept "secret" to make the algorithm hard to predict) chosen to be close to the next highest prime from the golden mean of the number of possible permutations for each string length:

CONSTANT: PRIMES
{ 1 41 2377 147299 9132313 566201239 35104476161 2176477521929 }

Finally, we can implement the "unique hash" function:

:: udihash ( n chars -- string )
    chars PRIMES nth n * 62 chars ^ mod base62
    chars CHAR: 0 pad-head ;

Try it out and see that it produces the same random-looking results as the original author. For example, a 5-character hash of the numbers 1 through 10:

( scratchpad ) 10 [1,b] [ 5 udihash print ] each
cJio3
EdRc6
qxAQ9
TGtEC
5ac2F
huKqI
KE3eL
wXmSO
YrVGR
BBE4U

You can find a discussion on StackOverflow, a similar approach used in the SilverStripe CMS to shorten URLs, and lots of search results probably inspiring (or inspired by) the original blog post.

If you're curious how to select your prime numbers, you can use math.primes to create your own list:

( scratchpad ) CONSTANT: golden-ratio 1.618033988749894848 ( scratchpad ) 5 [1,b] [ 62 swap ^ golden-ratio /i next-prime ] map . { 41 2377 147299 9132313 566201239 }

The code for this is on my Github.

Friday, March 25, 2011

Sum

Today's programming challenge is to implement the "old Unix Sys V R4" sum command:

"The original sum calculated a checksum as the sum of the bytes in the file, modulo 2¹⁶−1, as well as the number of 512-byte blocks the file occupied on disk. Called with no arguments, sum read standard input and wrote the checksum and file blocks to standard output; called with one or more filename arguments, sum read each file and wrote for each a line containing the checksum, file blocks, and filename."

First, some imports:

USING: command-line formatting io io.encodings.binary io.files
kernel math math.functions namespaces sequences ;

Short Version

A quick file-based version might look like this:

: sum-file. ( path -- )
    [
        binary file-contents
        [ sum 65535 mod ] [ length 512 / ceiling ] bi
    ] [ "%d %d %s\n" printf ] bi ;

You can try it out:

( scratchpad ) "/usr/share/dict/words" sum-file.
19278 4858 /usr/share/dict/words

The main drawbacks to this version are: loading the entire file into memory (which might be a problem for big files), not printing an error if the file is not found, and not supporting standard input.

Full Version

A more complete version might begin by implementing a function that reads from a stream, computing the checksum and the number of 512-byte blocks:

: sum-stream ( -- checksum blocks )
    0 0 [ 65536 read-partial dup ] [
        [ sum nip + ] [ length + nip ] 3bi
    ] while drop [ 65535 mod ] [ 512 / ceiling ] bi* ;

The output should look like CHECKSUM BLOCKS FILENAME:

: sum-stream. ( path -- )
    [ sum-stream ] dip "%d %d %s\n" printf ;

We can generate output for a particular file (printing FILENAME: not found if the file does not exist):

: sum-file. ( path -- )
    dup exists? [
        dup binary [ sum-stream. ] with-file-reader
    ] [ "%s: not found\n" printf ] if ;

And, to prepare a version of sum that we can deploy as a binary and run from the command line, we build a simple MAIN: word:

: run-sum ( -- )
    command-line get [ "" sum-stream. ] [
        [ sum-file. ] each
    ] if-empty ;

MAIN: run-sum

The code for this is on my Github.

Wednesday, March 23, 2011

Look and Say

There was a programming challenge a week ago that asked for solutions that create the Look and Say sequence. Below is an implementation in Factor.

The “Look and Say” sequence, Sloane number A005150, begins 1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211, …. Each term is constructed from its predecessor by stating the frequency and number of each group of like digits. For instance, the term after 1211 is “one 1, one 2, and two 1s”, or 111221.

We can use the splitting.monotonic vocabulary to split a number, recombine it based on the length of intermediate sequences of digits, and produce the next value in a "Look and Say" sequence:

USING: formatting kernel math.parser sequences splitting.monotonic ;

: look-and-say ( n -- n' )
    number>string [ = ] monotonic-split [
        [ length ] [ first ] bi "%d%c" sprintf
    ] map concat string>number ;

You can try it out in the Listener to see the first 10 numbers in the sequence:

( scratchpad ) 1 10 [ dup . look-and-say ] times
1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211

Monday, March 21, 2011

Names of Gaddafi

A question was posted yesterday on StackOverflow asking how to create a regular expression to search for the many names of Libyan leader, Muammar al-Gaddafi. I thought it would be fun to explore this problem using Factor.

Someone helpfully posted an image that demonstrates some of the variations of his full name:

List of Names

One approach would be to list out all his possible names, and then check to see if a given string is in the list:

CONSTANT: names {
    "Gadaffi"
    "Gadafi"
    "Gadafy"
    "Gaddafi"
    "Gaddafy"
    "Gaddhafi"
    "Gadhafi"
    "Gathafi"
    "Ghadaffi"
    "Ghadafi"
    "Ghaddafi"
    "Ghaddafy"
    "Gheddafi"
    "Kadaffi"
    "Kadafi"
    "Kad'afi"
    "Kaddafi"
    "Kadhafi"
    "Kazzafi"
    "Khadaffy"
    "Khadafy"
    "Khaddafi"
    "Qadafi"
    "Qaddafi"
    "Qadhafi"
    "Qadhaafi"
    "Qadhdhafi"
    "Qadthafi"
    "Qathafi"
    "Quathafi"
    "Qudhafi"
}

: gaddafi? ( string -- ? )
    names member? ;

Regular Expressions

If we wanted to speed it up, we could build a regular expression from all of the names (using literals to build the regular expression at compile time):

USE: regexp

: gaddafi? ( string -- ? )
    $[ names "|" join <regexp> ] matches? ;

Note: the first time you call this method it needs to compile the regular expression and is a bit slow. Subsequent calls are much faster.

One problem with that, is that it doesn't take into account that sometimes he is called "Al Qaddafi" or "el-Gaddafi". We could create our own case-insensitive pattern that attempts to capture all the possible variations of his name:

CONSTANT: names-pattern
R/ ((al|el)[-\s]?)?(Kh?|Gh?|Qu?)[aeu](d['dt]?|t|zz|dhd)h?aa?ff?[iy]/i

: gaddafi? ( string -- ? )
    names-pattern matches? ;

An advantage of using regular expressions is that it is easy to take a piece of text and normalize all the mentions of his name:

: normalize-gaddafi ( string -- string' )
    names-pattern "Gaddafi" re-replace ;

Soundex

An interesting idea was proposed on the Hacker News discussion to use Soundex to match names. That might look something like this:

USE: soundex

: gaddafi? ( string -- ? )
    soundex { "G310" "K310" "Q310" "Q331" } member? ;

Some disadvantages of this is that it doesn't capture names with prefix "Al" or "El", and misses some names (e.g., "Kazzafi" has a soundex value of "K210", but that would produce a false match against a name like "KOSOFF").

This problem is made even harder if you want to include all the possible variations of his first name (e.g., Muammar, Moamar, Mo'ammar, Mu'amar, Moamma, etc.), and include text that is "FIRSTNAME LASTNAME" or "LASTNAME, FIRSTNAME".

Sunday, March 20, 2011

Typed Netstrings

A few hours ago, Zed Shaw tweeted about an experiment creating typed netstrings:

An experiment in tagging netstrings with the types of their data as an alternative to JSON: http://codepad.org/xct0E5acless than a minute ago via webzedshaw
zedshaw

I thought that an implementation in Factor would contrast nicely with Zed's Python version. The basic idea is to support four kinds of data types:

Text
Numbers
Lists
Dictionaries (e.g., maps or associations)

First, some imports:

USING: arrays combinators formatting hashtables kernel
math.parser sequences splitting ;

An encoded payload value looks something like "{LENGTH}:{PAYLOAD}{TYPE}". We can write a simple word to parse a string that looks like that into its parts:

: parse-payload ( data -- remain payload payload-type )
    ":" split1 swap string>number cut unclip swapd ;

We can build a simple parser for these "typed netstrings" (deferring for the moment, how we parse lists and dictionaries):

DEFER: parse-dict
DEFER: parse-list

: parse-tnetstring ( data -- remain value )
    parse-payload {
        { CHAR: # [ string>number ] }
        { CHAR: " [ ] }
        { CHAR: } [ parse-dict ] }
        { CHAR: ] [ parse-list ] }
        [ "Invalid payload type: %c" sprintf throw ]
    } case ;

Parsing lists is just repeatedly parsing values until the remainder is exhausted:

: parse-list ( data -- value )
    [ { } ] [
        [ dup empty? not ] [ parse-tnetstring ] produce nip
    ] if-empty ;

Parsing dictionaries is only a little more involved. We need a way to parse successive key/value pairs, checking some simple error conditions:

: parse-pair ( data -- extra value key )
    parse-tnetstring [
        dup [ "Unbalanced dictionary store" throw ] unless
        parse-tnetstring
        dup [ "Invalid value, null not allowed" throw ] unless
    ] dip ;

Then we can build the dictionary, repeatedly parsing key/value pairs:

: parse-dict ( data -- value )
    [ H{ } ] [
        [ dup empty? not ] [ parse-pair swap 2array ] produce
        nip >hashtable
    ] if-empty ;

And, to make the interface easy to use, we wrap our parse-tnetstring word, checking that there was no un-parsed remainder:

: tnetstring ( data -- value )
    parse-tnetstring swap [
        "Had trailing junk: %s" sprintf throw
    ] unless-empty ;

We can show that it works on one of Zed's more complex examples:

( scratchpad ) "34:5:hello\"22:11:12345678901#4:this\"]}" tnetstring .
H{ { "hello" { 12345678901 "this" } } }

The code (and some tests) for this is on Github.

Update: I added support for booleans and implemented writer words to match the reader words above. Everything is on my Github.

Friday, March 18, 2011

Square?

A recent code golf requested a way to determine if 4 points form a square. Factor may not be the shortest answer, but I thought I would contribute it anyway.

We require four unique points to be provided. We then use the distance formula to compute the distances between all pairs of points. For a square, there should be just two lengths (the side and the diagonal) not counting zero (the distance from a point to itself).

USING: kernel math math.combinatorics math.vectors sequences sets ;

: square? ( seq -- ? )
    members [ length 4 = ] [
        2 [ first2 v- [ sq ] map-sum ] map-combinations
        { 0 } diff length 2 =
    ] bi and ;

We can write some unit tests to make sure it works.

USE: tools.test

[ t ] [
    {
        { { 0 0 } { 0 1 } { 1 1 } { 1 0 } }   ! standard square
        { { 0 0 } { 2 1 } { 3 -1 } { 1 -2 } } ! non-axis-aligned square
        { { 0 0 } { 1 1 } { 0 1 } { 1 0 } }   ! different order
        { { 0 0 } { 0 4 } { 2 2 } { -2 2 } }  ! rotated square
    } [ square? ] all?
] unit-test

[ f ] [
    {
        { { 0 0 } { 0 2 } { 3 2 } { 3 0 } }   ! rectangle
        { { 0 0 } { 3 4 } { 8 4 } { 5 0 } }   ! rhombus
        { { 0 0 } { 0 0 } { 1 1 } { 0 0 } }   ! only 2 distinct points
        { { 0 0 } { 0 0 } { 1 0 } { 0 1 } }   ! only 3 distinct points
    } [ square? ] any?
] unit-test

Since it's code golf (fewest characters possible), how might you make it shorter?

Wednesday, March 16, 2011

Google Translate

After having some fun building a Factor wrapper for the Google Charts API, I decided to implement the Google Translate API. We will be using version 2, which returns JSON over HTTP.

The Google Translate API requires the use of an API key, which you can get from the Google APIs console.

USING: assocs google http.client io json.reader kernel locals
namespaces sequences urls urls.secure ;

Translate

First, we define a global symbol to store the API key:

SYMBOL: google-api-key

Next, we make a simple word to create translation URLs:

:: translate-url ( text source target -- url )
    URL" https://www.googleapis.com/language/translate/v2"
        google-api-key get-global "key" set-query-param
        source "source" set-query-param
        target "target" set-query-param
        text "q" set-query-param ;

We can translate a string of text from a source language into a target language:

: translate ( text source target -- text' )
    translate-url http-get nip json>
    { "data" "translations" } [ swap at ] each
    first "translatedText" swap at ;

Once you've set your API key, you can try it out:

( scratchpad ) "Hello world!" "en" "es" translate print
¡Hola, mundo!

Google Translate supports many languages, which makes it pretty useful.

Translation Party

If you haven't seen the Translation Party website, you should check it out. Basically, it translates a phrase from English into Japanese and back again until it reaches a stable equilibrium. Some of the results are pretty funny. We are going to build this in Factor, but supporting any source and target language.

:: translation-party ( text source target -- )
    text dup print [
        dup source target translate dup print
        target source translate dup print
        swap dupd = not
    ] loop drop ;

For example, we can translate the phrase "Type anything here and you'll get funny" and you get:

( scratchpad ) "Type anything here and you'll get funny"
               "en" "ja" translation-party
Type anything here and you'll get funny
ここに何も入力すると、面白い取得します
Here you enter anything gets interesting
ここでは、何が面白いの入力
Here is something interesting input
ここで何か面白いものが入力される
Something interesting is entered here
何かが興味深いここで入力されている
Are entered here is something interesting
何かが興味深いここで入力されていますが
Has been entered here is something interesting
ここで入力されている何か面白いです
What is interesting is entered here
興味深いのは、ここに入力されている
Interestingly, that is entered here
興味深いことに、ここに入力されている
Interestingly, that is entered here

The code for this is on my Github.

Friday, March 11, 2011

Google Charts

Recently, I implemented a wrapper for the Google Chart API in Factor. The Visual REPL provides a really great way to do exploratory programming, such as my previous example of integrating Wolfram|Alpha with Factor.

Below are some various examples of using my google.charts vocabulary:

Pie Charts

At the end of 2008, Slava Pestov posted an article analyzing the usage of shuffle words in the core library. He created a usage-histogram word that counts the number of usages of a sequence of words:

: usage-histogram ( words -- assoc )
    [ [ name>> ] [ usage length ] bi ] { } map>assoc ;

We can then produce a pie chart showing the relative usage of various shuffle words:

Bar Charts

It might be fun to use a dictionary to count the number of words that start with each letter. We can use the dictionary available on most Unix-like systems to make a histogram and display the results as a bar chart:

Sparklines

Edward Tufte coined the term sparkline and described them as "data-intense, design-simple, word-sized graphics". They are often useful for quick visualizations of large data sets, for example a random sequence of numbers:

QR Codes

Recently, Fred Alger posted some source code which uses QR Codes and the MECARD format to share address book information with mobile users. We can convert arbitrary text into a QR Code for display:

Formulas

We can use Tex to plot formulas:

Scatter

We can make scatter plots, for example 100 random x,y points:

The code is available on my Github.

Sunday, March 6, 2011

Fast Now

Sometimes profiling an application will show hotspots in the oddest places. For some types of network applications that process huge volumes of events, calls to gettimeofday() can become a bottleneck. If each event needs to have a timestamp generated, this could mean thousands of system calls in a very short time, all producing essentially the same value. If the "actual" time is not so important, performance could be gained by relaxing the requirement that all events received in a select() loop have the same timestamp.

One way to do this is to cache the timestamp after waking up, and use that timestamp for all events received within the I/O loop. Unfortunately, the I/O loop can take a long time to process - resulting in timestamps that diverge from "actual" time by small or large (and unpredictable) amounts. Perhaps a better way would be to cache the result of gettimeofday() with a resolution of one millisecond.

fast-now

In Factor, gettimeofday() is called by the now word from the calendar vocabulary. Let's try and build a fast-now word that adds caching:

USING: calendar kernel math namespaces system ;

Our cache resolution is one millisecond (or 1,000,000 nanoseconds):

CONSTANT: cache-duration 1000000

We will be keeping the cached value and the expiration time:

SYMBOL: cache-value
SYMBOL: cache-until

Given the current time in nanoseconds, we can check to see if the cached value has expired:

: cache-expired? ( nanos -- ? )
    cache-until get-global 0 or > ;

If it has, we can reset the cache expiration:

: reset-cache ( nanos -- )
    cache-duration + cache-until set-global ;

And update the cached value (the result of calling now):

: update-cache ( nanos -- timestamp )
    reset-cache now [ cache-value set-global ] keep ;

Building the fast-now word is as easy as:

Get the current monotonically increasing nano-count.
Check if the cache has expired.
Update the cache if expired, otherwise return the cached value.

: fast-now ( -- timestamp )
    nano-count dup cache-expired? [ update-cache ] [
        drop cache-value get-global
    ] if ;

Note: we use a monotonic timer because it is a much faster operation than calling gettimeofday(). Another fast way would be to use the instruction counter, however we would need to estimate cpu speed to know how many instructions the cached value should survive for.

Try it

We can try this out, and see how much faster it is:

( scratchpad ) [ 1000 [ now drop ] times ] benchmark .
19706313

( scratchpad ) [ 1000 [ fast-now drop ] times ] benchmark .
356574

So, 1,000 calls to now took 19.706 milliseconds, and only 0.356 milliseconds for fast-now (for a speedup of 55x)! In the case of a single call, both now and fast-now take roughly the same time. Not a bad improvement, right?

The code (and some tests) for this is on my Github.