Thursday, April 24, 2014

Instant-runoff Voting

Recently, I had a conversation with a friend in Australia who told me about the voting system used in most of their elections: instant-runoff voting.

Instead of voting for a single candidate, you rank the candidates in order of preference. These rankings are then used to choose a winner:

  1. Count each voter's most preferred candidate.
  2. If a candidate has more than 50% of those votes, they win.
  3. Otherwise, eliminate the candidate with the fewest votes and count again.
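The elimination loop above can be sketched in Python (the ballots below are made up for illustration; each ballot lists candidates from most to least preferred, and, like the Factor code that follows, the sketch assumes every ballot ranks every candidate):

```python
from collections import Counter

def instant_runoff(ballots):
    """Repeatedly count first choices, eliminating the weakest
    candidate, until someone has a strict majority."""
    ballots = [list(b) for b in ballots]
    while True:
        counts = Counter(ballot[0] for ballot in ballots)
        top, top_votes = counts.most_common(1)[0]
        if top_votes * 2 > len(ballots):  # strict majority
            return top
        loser = min(counts, key=counts.get)
        ballots = [[c for c in b if c != loser] for b in ballots]

print(instant_runoff([
    ["A", "B", "C"], ["A", "B", "C"],
    ["B", "C", "A"], ["C", "B", "A"], ["C", "B", "A"],
]))  # B is eliminated first, its vote transfers to C, so C wins
```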

Let's implement a voting system like this in Factor.

Assuming voters provide an ordered list of candidates, we can count everyone's top candidate:

: count-votes ( votes -- total )
    [ first ] histogram-by sort-values ;

A candidate wins the election if they have a simple majority (more than 50%) of the votes:

: choose-winner ( votes total -- winner/f )
    last first2 rot length 2/ > [ drop f ] unless ;

If the candidate with the most votes did not achieve a majority, we remove all votes for the candidate with the fewest votes:

: remove-loser ( votes total -- newvotes )
    first first swap [ remove ] with map ;

The full implementation of our instant-runoff voting system:

: instant-runoff ( votes -- winner )
    dup count-votes 2dup choose-winner
    [ 2nip ] [ remove-loser instant-runoff ] if* ;

One improvement we could make would be to support ballots that do not rank every candidate (an assumption the code above makes).

The code for this is on my GitHub.

Tuesday, April 22, 2014

Checksum Improvements

Just a quick update: some checksum improvements have been contributed to Factor.

Some new checksums have been implemented, and the performance of some existing ones has been improved:

  • checksums.md5 is a lot faster (benchmark is 0.080 vs 0.583 seconds)
  • checksums.sha is a bit faster (benchmark is 0.418 vs 0.686 seconds)

You can find these changes (and more!) in the development version of Factor.

Monday, April 21, 2014

Scraping Re: Factor

For today's post, I thought it would be fun to build a little interface to my blog, Re: Factor. In addition, you can use this to easily scrape any Blogger website.

Scraping

A simple way to make URLs relative to my blog's domain:

: re-factor-url ( str -- url )
    "http://re-factor.blogspot.com/" prepend ;

Using that, we can build a URL that returns all of the posts as JSON:

: posts-url ( -- url )
    "feeds/posts/default?alt=json&max-results=200" re-factor-url ;

Note: we limit the results to 200 posts. Since this is my 190th post, that will work for a little while longer, but the limit might need to be bumped up in the future. :-)

Retrieving all of the posts is easy using our HTTP client and parsing the response. Since my posts don't change that frequently, we will memoize the result for convenience.

MEMO: all-posts ( -- posts )
    posts-url http-get nip json> { "feed" "entry" } [ of ] each ;
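The same feed query can be sketched in Python for comparison. The URL matches the Factor code above, but the miniature sample feed is made up so the example runs offline:

```python
import json

FEED_URL = ("http://re-factor.blogspot.com/"
            "feeds/posts/default?alt=json&max-results=200")

def entries(feed):
    """Drill into the Blogger JSON feed, mirroring the Factor
    idiom { "feed" "entry" } [ of ] each."""
    for key in ("feed", "entry"):
        feed = feed[key]
    return feed

# A made-up miniature feed with the same shape as Blogger's:
sample = json.loads('{"feed": {"entry": [{"title": {"$t": "Hello"}}]}}')
print([e["title"]["$t"] for e in entries(sample)])  # ['Hello']
```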

Displaying

A simple way to display a list of posts is to show each post's title, linked to its URL (allowing us to right-click and open posts in the listener).

CONSTANT: post-style H{ { foreground COLOR: blue } }

: posts. ( -- )
    all-posts [
        [ "title" of "$t" of ] [ "link" of ] bi
        over '[ "title" of _ = ] find nip "href" of
        >url post-style [ write-object ] with-style nl
    ] each ;

For individual posts, we will use the html.parser.printer vocabulary to parse the HTML content and display it as text. The conversion to text right now is not perfect, but works okay for most things.

We print the title of the post and a dashed line underneath:

: post-title. ( post -- )
    { "title" "$t" } [ of ] each
    [ print ] [ length CHAR: - <string> print ] bi nl ;

We print the content by rendering the HTML into a string of text, then cleaning up extra whitespace and HTML escapes (using the new html.entities vocabulary), and wrapping the paragraphs.

: post-content. ( post -- )
    { "content" "$t" } [ of ] each
    parse-html html-text html-unescape string-lines [
        [ blank? not ] cut-when
        [ write ] [ 70 wrap-string print ] bi*
    ] each ;
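The unescape-and-wrap step can be approximated with Python's standard library, where html.unescape and textwrap stand in for Factor's html.entities and wrapping words (a sketch of the idea, not a port):

```python
import html
import textwrap

def clean_paragraph(line, width=70):
    """Unescape HTML entities, split off the leading whitespace
    (to preserve indentation), and wrap the rest of the line."""
    text = html.unescape(line)
    stripped = text.lstrip()
    indent = text[: len(text) - len(stripped)]
    return indent + textwrap.fill(stripped, width)

print(clean_paragraph("  Factor &amp; friends"))  # "  Factor & friends"
```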

Putting those together, to display a post:

: post. ( n -- )
    all-posts nth [ post-title. ] [ post-content. ] bi ;

The code for this is on my GitHub.

Saturday, April 5, 2014

Speedtest

Many people are familiar with Speedtest.net, which is used to test a network connection, displaying download speed, upload speed, and server latency. Implemented as a Flash-based interface, it can be used from a web browser to verify that your internet provider is giving you what you pay for.

You might not be aware that the speedtest-cli project provides a way to check internet speed from the command line in a similar manner.

I thought it might be fun to implement an interface to Speedtest.net using Factor:

Closest Servers

Speedtest provides a list of available servers all over the world that can be used for testing, returned as XML. After parsing the XML document, we use a utility word to extract each server's attributes into an assoc:

: attr-map ( tag -- attrs )
    attrs>> [ [ main>> ] dip ] H{ } assoc-map-as ;

: speedtest-servers ( -- servers )
    "http://www.speedtest.net/speedtest-servers.php"
    http-get nip string>xml
    "server" deep-tags-named [ attr-map ] map ;

Calculating the geographical distance between two points, specified by latitude and longitude:

: radians ( degrees -- radians ) pi * 180 /f ; inline

:: geo-distance ( lat1 lon1 lat2 lon2 -- distance )
    6371 :> radius ! km
    lat2 lat1 - radians :> dlat
    lon2 lon1 - radians :> dlon
    dlat 2 / sin sq dlon 2 / sin sq
    lat1 radians cos lat2 radians cos * * + :> a
    a sqrt 1 a - sqrt fatan2 2 * :> c
    radius c * ;
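The geo-distance word is the haversine formula; the same computation, written in Python for comparison:

```python
from math import radians, sin, cos, sqrt, atan2

def geo_distance(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers."""
    radius = 6371  # mean Earth radius, km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return radius * c

print(round(geo_distance(0, 0, 0, 1), 2))  # 111.19 km per degree at the equator
```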

This lets us find the closest server to a given geographic location:

: lat/lon ( assoc -- lat lon )
    [ "lat" of ] [ "lon" of ] bi [ string>number ] bi@ ;

: server-distance ( server lat lon -- server )
    '[ lat/lon _ _ geo-distance "distance" ] keep
    [ set-at ] keep ;

: closest-servers-to ( lat lon -- servers )
    [ speedtest-servers ] 2dip '[ _ _ server-distance ] map
    [ "distance" of ] sort-with ;

The available Speedtest configuration provides our latitude and longitude, allowing us to sort the server list by geographic distance:

TUPLE: config client times download upload ;

C: <config> config

: speedtest-config ( -- config )
    "http://www.speedtest.net/speedtest-config.php"
    http-get nip string>xml {
        [ "client" deep-tag-named attr-map ]
        [ "times" deep-tag-named attr-map ]
        [ "download" deep-tag-named attr-map ]
        [ "upload" deep-tag-named attr-map ]
    } cleave <config> ;

: closest-servers ( -- servers )
    speedtest-config client>> lat/lon closest-servers-to ;

Best Server

We can calculate latency by downloading a small latency.txt file and timing how long it takes:

: (server-latency) ( server -- ms )
    "url" of >url URL" latency.txt" derive-url
    [ http-get nip "test=test\n" = ] benchmark 1,000,000 /f
    3,600,000 ? ;
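A Python sketch of the same idea, with a stand-in fetch function instead of a real HTTP request; the 3,600,000 ms fallback (one hour) pushes servers that fail the content check to the back when we later sort by latency:

```python
import time

def measure_latency(fetch, expected="test=test\n"):
    """Time a download; if the content check fails, return a huge
    latency, as the Factor code's `3,600,000 ?` does."""
    start = time.perf_counter()
    ok = fetch() == expected
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms if ok else 3_600_000

print(measure_latency(lambda: "test=test\n") < 3_600_000)  # True
print(measure_latency(lambda: "garbage"))  # 3600000
```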

After calculating latency, we save it for later use:

: server-latency ( server -- server )
    [ (server-latency) "latency" ] keep [ set-at ] keep ;

The "best" server that we will use for testing is the one with the lowest latency, checking the five closest servers to our location:

: best-server ( -- server )
    closest-servers 5 short head
    [ server-latency ] parallel-map
    [ "latency" of ] sort-with first ;

Upload Speed

To calculate upload speed, we upload documents of several sizes (with the content filled with zeroes) and time how long it takes:

: upload-data ( size -- data )
    9 - CHAR: 0 <string> "content1=" prepend ;

: (upload-speed) ( server -- Mbps )
    "url" of >url { 250,000 500,000 } [
        [
            upload-data [ swap http-put 2drop ] keep length
        ] with map-sum
    ] benchmark 1,000,000,000 /f / 8 * 1,000,000 / ;
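Since benchmark reports elapsed time in nanoseconds, the chain `1,000,000,000 /f / 8 * 1,000,000 /` converts a byte count and a duration into megabits per second. Spelled out in Python:

```python
def mbps(total_bytes, elapsed_ns):
    """Bytes over nanoseconds -> megabits per second, matching
    the Factor `1,000,000,000 /f / 8 * 1,000,000 /` pipeline."""
    seconds = elapsed_ns / 1_000_000_000
    return total_bytes / seconds * 8 / 1_000_000

print(mbps(1_250_000, 1_000_000_000))  # 10.0 Mbit/s
```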

After calculating upload speed, we save it for later use:

: upload-speed ( server -- server )
    [ (upload-speed) "upload" ] keep [ set-at ] keep ;

Download Speed

To calculate download speed, we download several files with varying sizes in parallel and time how long it takes:

: download-urls ( server -- urls )
    "url" of >url { 350 500 750 1000 }
    [ dup "random%sx%s.jpg" sprintf >url derive-url ] with map ;

: (download-speed) ( server -- Mbps )
    download-urls 4 swap <array> [
        [ [ http-get nip length ] map-sum ] parallel-map sum
    ] benchmark 1,000,000,000 /f / 8 * 1,000,000 / ;

After calculating download speed, we save it for later use:

: download-speed ( server -- server )
    [ (download-speed) "download" ] keep [ set-at ] keep ;

Text Results

With all of that in place, we can write a word to run a Speedtest, printing the results as text:

: run-speedtest ( -- server )
    "Selecting best server based on ping..." print flush
    best-server dup {
        [ "sponsor" of ]
        [ "name" of ]
        [ "distance" of ]
        [ "latency" of ]
    } cleave "Hosted by %s (%s) [%0.2f km]: %s ms\n" printf
    "Testing download speed" print flush download-speed
    dup "download" of "Download: %0.2f Mbit/s\n" printf
    "Testing upload speed" print flush upload-speed
    dup "upload" of "Upload: %0.2f Mbit/s\n" printf ;

Graphic Results

It would be nice if we could show the results graphically and, as it turns out, it's not too hard. We just have to upload the results to speedtest.net the same way their Flash application does, and then display the image that is created for us.

: make-result ( server -- result )
    [
        {
            [ "download" of 1,000 * >integer "download" ,, ]
            [ "latency" of >integer "ping" ,, ]
            [ "upload" of 1,000 * >integer "upload" ,, ]
            [ drop "" "promo" ,, ]
            [ drop "pingselect" "startmode" ,, ]
            [ "id" of "recommendedserverid" ,, ]
            [ drop "1" "accuracy" ,, ]
            [ "id" of "serverid" ,, ]
            [
                [ "latency" of ]
                [ "upload" of 1,000 * ]
                [ "download" of 1,000 * ] tri
                "%d-%d-%d-297aae72" sprintf md5 checksum-bytes
                hex-string "hash" ,,
            ]
        } cleave
    ] { } make ;
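The "hash" field is an MD5 digest of the measurements joined with a fixed salt, using the same format string as the Factor code; in Python:

```python
import hashlib

def result_hash(latency_ms, upload_kbps, download_kbps):
    """MD5 over "ping-upload-download-297aae72", as submitted in
    the result form data."""
    raw = "%d-%d-%d-297aae72" % (latency_ms, upload_kbps, download_kbps)
    return hashlib.md5(raw.encode()).hexdigest()

print(result_hash(25, 1000, 5000))  # 32 lowercase hex digits
```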

: submit-result ( server -- result-id )
    make-result "http://www.speedtest.net/api/api.php"
    <post-request> [
        [
            "http://c.speedtest.net/flash/speedtest.swf"
            "referer"
        ] dip header>> set-at
    ] keep http-request nip query>assoc "resultid" of ;

Speedtest

Putting this all together, we can run the Speedtest, submit the results, then display the test results as an image.

: speedtest ( -- )
    run-speedtest submit-result "Share results: " write
    "http://www.speedtest.net/result/%s.png" sprintf
    [ dup >url write-object nl ] [ http-image. ] bi ;

Some things that I would like to improve:

  • The Speedtest configuration actually specifies the details of download and upload sizes, the amount of parallelism, and the duration of the test; we should use it.
  • The http-get word needs an overall timeout so we can scale between very slow and very fast connection speeds.
  • The Speedtest graphical result images are "retina" when viewed in the web browser, but are not when downloaded from Factor or wget.
  • Factor needs an easier way to create a queue of work that is processed by several worker threads; for convenience, I just used one of the concurrent combinators.

The code for this is on my GitHub.