Sunday, May 1, 2011

Reddit Stats

I thought it would be fun to track the development of the Factor programming language using Reddit. Of course, to do this, I wanted to use Factor.

A few months ago, I implemented a Reddit "Top" program. We can use that code as a basis for showing, using Reddit scores, how the activity (or popularity?) of a couple of Factor blogs has changed over the last few years:


Stories

We start by creating a story tuple that will hold all of the properties that Reddit returns for each posted link (including the score -- up votes minus down votes -- which we will be using).

TUPLE: story author clicked created created_utc domain downs
hidden id is_self levenshtein likes media media_embed name
num_comments over_18 permalink saved score selftext
selftext_html subreddit subreddit_id thumbnail title ups url ;

And a word to parse a "data" result into a story object:

: parse-story ( assoc -- obj )
    "data" swap at \ story from-slots ;

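As a rough illustration of the lookup that parse-story performs, the following sketch pulls a nested "data" assoc out of a hand-written hashtable. The literal values are made up for the example and are not real Reddit output.

```factor
USING: assocs prettyprint ;

! A made-up stand-in for one item of a Reddit response.
H{ { "kind" "t3" } { "data" H{ { "score" 42 } } } }
"data" swap at      ! the inner assoc that parse-story hands to from-slots
"score" swap at .   ! prints 42
```
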
Paging

The Reddit API provides results as a series of "pages" (until all results are exhausted). This is pretty similar to how the website works, so it should be fairly easy to understand the mechanics. We will define a page object that holds the current URL, the results from the current page, and links to the pages before and after the current page:

TUPLE: page url results before after ;

We can then build a word to fetch the page (as a JSON response), parse it, and construct a page object:

: json-page ( url -- page )
    >url dup http-get nip json> "data" swap at {
        [ "children" swap at [ parse-story ] map ]
        [ "before" swap at [ f ] when-json-null ]
        [ "after" swap at [ f ] when-json-null ]
    } cleave \ page boa ;

We can also write an easy "paging" mechanism that, given a page of results, returns the next page:

: next-page ( page -- page' )
    [ url>> ] [ after>> "after" set-query-param ] bi json-page ;

And, using the make vocabulary, we can collect the results from every page into a single sequence:

: all-pages ( page -- results )
    [
        [ [ results>> , ] [ dup after>> ] bi ]
        [ next-page ] while drop
    ] { } make concat ;
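
The same make-and-while pattern can be tried without any network access. This is only a sketch: a countdown stands in for the chain of pages, and a two-element array stands in for each page's results. Unlike all-pages, it calls , in the loop body rather than in the predicate, but the shape is the same.

```factor
USING: arrays kernel make math prettyprint sequences ;

! Each pass adds one small array (a pretend "page" of results)
! to the sequence being built, then moves on to the next counter.
[
    3 [ dup 0 > ] [ dup dup 2array , 1 - ] while drop
] { } make concat .
! prints { 3 3 2 2 1 1 }
```

Without the final concat, the result would be a sequence of pages, { { 3 3 } { 2 2 } { 1 1 } }, rather than a flat sequence of results.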

Domain Stats

We can retrieve all the results and produce a map from the year in which links were posted to the sum of the scores for that year's links:

: domain-stats ( domain -- stats )
    "http://api.reddit.com/domain/%s" sprintf json-page all-pages
    [ created>> 1000 * millis>timestamp year>> ] group-by
    [ [ score>> ] map-sum ] assoc-map ;
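
The grouping key above depends on the created slot holding seconds since the Unix epoch, which is why it is multiplied by 1000 before being passed to millis>timestamp. A small sketch of that conversion, using a hand-picked value (midnight UTC on May 1, 2011) rather than a real story:

```factor
USING: accessors calendar math prettyprint ;

! 1304208000 seconds after the epoch is 2011-05-01 00:00:00 UTC.
1304208000 1000 * millis>timestamp year>> .
! prints 2011
```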

Finally, a word to display a chart of the scores across a given list of domains:

: domains. ( domains -- )
    H{ } clone [
        '[ domain-stats [ swap _ at+ ] assoc-each ] each
    ] keep >alist sort-keys <bar> 160 >>height chart. ;
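
The merging step in domains. relies on at+, which adds a number to the value stored under a key and treats a missing key as zero. A minimal sketch with made-up year and score pairs, accumulated into one hashtable in the same way:

```factor
USING: assocs kernel prettyprint sequences sorting ;

! Made-up { year score } pairs, as if taken from two domains.
H{ } clone
{ { 2009 10 } { 2010 5 } { 2009 7 } }
[ first2 swap pick at+ ] each
>alist sort-keys .
! prints { { 2009 17 } { 2010 5 } }
```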

The code for this is available on my GitHub.
