You folks create a lot of events on GitHub. Trying to mine and report on that data is definitely a big data problem. Ilya Grigorik wants to help the community get a handle on the GitHub firehose with GitHub Archive. With an API call you can get a range of GitHub’s public timeline data for all seventeen event types:
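require 'open-uri'
require 'zlib'
require 'yajl'

# Fetch one hour of the public timeline as gzipped JSON.
# URI.open is used because plain open() no longer fetches URLs on Ruby 3.0+.
gz = URI.open('http://data.githubarchive.org/2012-03-11-12.json.gz')
js = Zlib::GzipReader.new(gz).read

# Yajl yields each event to the block as it is parsed
Yajl::Parser.parse(js) do |event|
  print event
end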
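Printing raw events gets noisy fast, so here is a minimal sketch of one thing you might do with them: tally events by type. It is not part of GitHub Archive itself. The `count_event_types` helper is a hypothetical name, the sample events are made up for illustration, and the sketch assumes one JSON object per line with a `type` key on each event. It uses the standard library's `json` so it runs without extra gems.

```ruby
require 'json'

# Hypothetical helper: tally events by their "type" field.
# Assumes `text` holds one JSON object per line (an assumption about the
# archive layout) and that every event carries a "type" key.
def count_event_types(text)
  counts = Hash.new(0)
  text.each_line do |line|
    line = line.strip
    next if line.empty?          # skip blank lines

    event = JSON.parse(line)
    counts[event['type']] += 1
  end
  counts
end

# Made-up sample data standing in for a decompressed archive file
sample = <<~EVENTS
  {"type":"PushEvent","repository":{"name":"a"}}
  {"type":"WatchEvent","repository":{"name":"b"}}
  {"type":"PushEvent","repository":{"name":"c"}}
EVENTS

count_event_types(sample).each do |type, count|
  puts "#{type}: #{count}"
end
```

To run it against real data, pass in the decompressed string from the archive request instead of `sample`, provided that file really is laid out one event per line.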
The source is on GitHub. Be sure to listen to Ilya on episode #55 talking about Goliath, EventMachine, and SPDY.