Ruby Asynchronous HTTP and EM::Iterator

I recently had a need to issue a number of long-running HTTP gets that were taking about 20 minutes to complete when performed serially, one after the other.  In spite of the concurrency limits imposed by the Global Interpreter Lock, Ruby is well-equipped to handle long-running I/O.  

Using the methods documented below I was able to squash that 20 minutes down to less than 5 minutes.  I probably would have been able to do better if the server was able to handle the full load of all requests concurrent.

Part of the reason I am writing this is because of the additional amount of tire-kicking I needed to do to get this working even with the the documentation that comes with em-http-request.  (Thank you for posting your lovely gem... Your documentation is close but not quite there for me.)  

Simple Example - Single Request

Because I prefer a number of examples with increasing complexity when I am tackling a new library, that's what you'll get here. This first example implements only a single HTTP request but establishes that we have figured out enough to determine that it is working for a single get.

It produces output that looks like this:

$ ruby bin/em-http/em-http-000.rb 
D, [2016-07-24T17:12:39.386698 #42654] DEBUG -- : [] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [SUBMITTED] [runtime=0.197928]
D, [2016-07-24T17:12:42.238997 #42654] DEBUG -- : [] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [CALLBACK 200] [runtime=3.050257]
{"SERVER"=>"nginx", "DATE"=>"Sun, 24 Jul 2016 21:12:39 GMT", "CONTENT_TYPE"=>"application/zip", "CONTENT_LENGTH"=>"5242880", "LAST_MODIFIED"=>"Mon, 02 Jun 2008 15:30:42 GMT", "CONNECTION"=>"close", "ETAG"=>"\"48441222-500000\"", "ACCESS_CONTROL_ALLOW_ORIGIN"=>"*", "ACCEPT_RANGES"=>"bytes"}

So far, so good. We download from a URL and it is successful with a status code of 200.

Next Example: Crude Async Concurrency

The next example ensures that we have a solid enough understanding to deal with two concurrent requests. One thing we have to handle is ensuring that EM.stop is only called after all of the jobs have completed. So in this example, we add a request queue collection and a method call to EM.stop when all jobs are done.

It's getting a bit more complex but still quite grokkable and we can see that both requests get submitted at the same time, but the larger file takes longer to download.

$ ruby bin/em-http/em-http-010-async.rb 
D, [2016-07-24T16:47:21.628277 #42542] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [SUBMITTED] [runtime=0.18803]
D, [2016-07-24T16:47:21.631735 #42542] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [SUBMITTED] [runtime=0.191508]
D, [2016-07-24T16:47:26.295605 #42542] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [CALLBACK/ERRBACK 200] [runtime=4.855387]
D, [2016-07-24T16:47:26.295678 #42542] DEBUG -- : [stop_when_all_finished] [states=[:finished, :body]]
D, [2016-07-24T16:47:30.669914 #42542] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [CALLBACK/ERRBACK 200] [runtime=9.229693]
D, [2016-07-24T16:47:30.669999 #42542] DEBUG -- : [stop_when_all_finished] [states=[:finished, :finished]]

Final Example: Async Concurrency with Concurrency Limits

If we have about 20 requests, we probably don't want them all going to a server and blowing it up. EventMachine provides an elegant mechanism to apply a concurrency constraint so that we can limit the number of active requests. In this final example, I issue six requests and I apply a concurrency limit of 2.

The code itself doesn't look very different. But you can watch the log and see that two jobs are submitted initially and then subsuquent jobs are added only as a job completes.

$ ruby bin/em-http/em-http-020-async-with-em-iterator.rb 
D, [2016-07-24T16:55:37.620212 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [SUBMITTED] [runtime=0.174998]
D, [2016-07-24T16:55:37.624177 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [SUBMITTED] [runtime=0.178991]
D, [2016-07-24T16:55:42.003892 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [CALLBACK/ERRBACK 200] [runtime=4.5587]
D, [2016-07-24T16:55:42.003984 #42569] DEBUG -- : [stop_when_all_finished] [states=[:finished, :body]]
D, [2016-07-24T16:55:42.006724 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [SUBMITTED] [runtime=4.561536]
D, [2016-07-24T16:55:45.318885 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [CALLBACK/ERRBACK 200] [runtime=7.873685]
D, [2016-07-24T16:55:45.319011 #42569] DEBUG -- : [stop_when_all_finished] [states=[:finished, :finished, :body]]
D, [2016-07-24T16:55:45.321175 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [SUBMITTED] [runtime=7.875988]
D, [2016-07-24T16:55:46.646499 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [CALLBACK/ERRBACK 200] [runtime=9.201303]
D, [2016-07-24T16:55:46.646617 #42569] DEBUG -- : [stop_when_all_finished] [states=[:finished, :finished, :finished, :body]]
D, [2016-07-24T16:55:46.651357 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [SUBMITTED] [runtime=9.206166]
D, [2016-07-24T16:55:50.993760 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/5MB.zip] [CALLBACK/ERRBACK 200] [runtime=13.548572]
D, [2016-07-24T16:55:50.993836 #42569] DEBUG -- : [stop_when_all_finished] [states=[:finished, :finished, :finished, :body, :finished]]
D, [2016-07-24T16:55:50.995672 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [SUBMITTED] [runtime=13.550481]
D, [2016-07-24T16:55:52.958938 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [CALLBACK/ERRBACK 200] [runtime=15.51375]
D, [2016-07-24T16:55:52.959023 #42569] DEBUG -- : [stop_when_all_finished] [states=[:finished, :finished, :finished, :finished, :finished, :body]]
D, [2016-07-24T16:55:59.708914 #42569] DEBUG -- : [new_http_request] [url=http://ipv4.download.thinkbroadband.com/10MB.zip] [CALLBACK/ERRBACK 200] [runtime=22.263717]
D, [2016-07-24T16:55:59.709023 #42569] DEBUG -- : [stop_when_all_finished] [states=[:finished, :finished, :finished, :finished, :finished, :finished]]

Conclusion

Thus ends my tutorial on using em-http-request and em-iterator to handle a number of long-running downloads with concurrency. A lot of the lines I have above are devoted to logging and comments. The code is actually quite concise, I think.

If this was helpful to you, share it along.


Travis Deployment to Rubygems.org

After working on it rather un-seriously for a number of months, I decided to finally learn how to publish a ruby gem today. You can thank my workmate, Craig for this.

And since I am lazy, I also made sure to have Travis CI do it automatically for any releases I tag on Github. Here is the result:

We have a success folks! So I thought I'd document the process to help others trying to do something similar.

How to Get Your Gems Flowing to RubyGems.org from Travis CI

Step 1 - Setup an account on Rubygems.org

You will need an account, and, more

Step 2 - Setup your Rubygems.org api key

This tip comes from the Make Your Own Gem guide at rubygems.org.

To setup your RubyGems API key, do the following (and be sure to substitute your rubygems username where you see the ${USERNAME} field).

curl -u ${USERNAME} https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials; chmod 0600 ~/.gem/credentials

Step 3 - Use the Jeweler Gem

We use the jeweler gem to handle versioning and structure creation/update of the .gemspec manifest so that we don't have to do these things by hand. The jeweler gem adds tasks to rake and I found it really easy to get setup and customized for my gem.

The readme on the github pretty much had everything I needed.

The section labeled "Customizing Your Gem" is especially important since it offers a modifiable section of code you can drop in your rake files:

require 'jeweler'
Jeweler::Tasks.new do |gem|
  # gem is a Gem::Specification... see http://guides.rubygems.org/specification-reference/ for more options
  gem.name = "whatwhatwhat"
  gem.summary = %Q{TODO: one-line summary of your gem}
  gem.description = %Q{TODO: longer description of your gem}
  gem.email = "josh@technicalpickles.com"
  gem.homepage = "http://github.com/technicalpickles/whatwhatwhat"
  gem.authors = ["Joshua Nichols"]
end
Jeweler::RubygemsDotOrgTasks.new

Here is the jeweler customization for expect-behaviors.

Once jeweler is setup, we now have a rake driven flow to build the gemspec file so that you can bump a version by doing:

  • rake version:bump:patch
  • rake gemspec

You will have to decide for yourself how you feel about rake release. I plan to use Travis CI to deploy so the release part is less important to me.

Also important: it was really useful for me to build the gem locally because it turns out the gem builder really didn't like the way I wrote some of my dependencies. To do this, run gem build ${GEMSPEC_FILE}.

Step 4 - Get Travis CI up and running

I'm assuming that if you're reading this, you're already familiar with the basics of Travis CI. But in case you need help to get going, I would refer you to their docs:
https://docs.travis-ci.com/user/getting-started/

You want to be up and running with tests before you move on the to next step. If it helps to see an example of how little it takes to configure Travis, you can see my .travis.yml file here. The important sections are covered on lines 1-9 for basic testing. Creating the deploy section is covered in the next step.

Step 5 - Install the Travis Gem and setup rubygems

  • gem install travis
  • travis setup rubygems

Dead simple. That's why we love Ruby.

You will be prompted for everything else. Here is what I saw:

$ travis setup rubygems 
Gem name: |expect-behaviors| 
Release only tagged commits? |yes| 
Release only from francisluong/expect-behaviors? |yes| 
Encrypt API key? |yes|

Step 6 - Travis will attempt to deploy when you draft a new release on Github.

But make sure to bump the revision in the commit too.

Here's what I think my flow will be:

  • As part of my branch/pull-request, I will include a rake version:bump:patch (or :major or :minor as appropriate)
$ rake version:bump:patch  
Current version: 0.1.2  
Updated version: 0.1.3