Saturday, 18 August 2012

Itching a scratch: Thread safe LRU Cache in Ruby

Background:
I'm a bit of a newbie to the Ruby world. Actually I've started learning Ruby only few months ago as a kind of hobby. In the meantime I'm very impressed by elegance, simplicity and productivity of Ruby.
Given that I'm a hard-core Java developer by day, I was very interested to find out how Ruby handles concurrency. After being a bit dissapointed by existence of GIL (http://ablogaboutcode.com/2012/02/06/the-ruby-global-interpreter-lock/) in MRI, I started exploring alternative Ruby implementations - as it seems both JRuby and RBX seem to be much more concurrency friendly.

Fast forward to today:
I've been implementing a kind of a storage system in Ruby. In order to improve performance of the storage system, I've been caching data in a LRU cache. LRU cache is not really a new idea, so I thought there must be plenty of implementations floating around. Indeed, there are quite a few open source LRUCache implementations - just to name a few:

https://github.com/kindkid/lrucache/tree/master/lib
http://cvs.savannah.gnu.org/viewvc/ruby-cache/?root=pupa
https://github.com/ahoward/lru_cache
https://github.com/jmettraux/rufus-lru
https://github.com/dcarney/lru_cache

Most of the features (such as cache timeouts) offered by many of them were not important for my task. Astonishingly, almost none of those seem to care about concurrency of the access to cache. After bit of googling, I've even managed to find an implementation that does cover concurrency:

https://github.com/shadabahmed/lru-cache

At the first glance it did look fine, but after going through the source code I've spotted a few places with possible race conditions, especially when run in JRuby: none of the read operations on a Hash seem to be synchronized with a Mutex - this can go wrong badly if JRuby's Hash does not handle concurrent access internally. To prove my theory I've written following RSpec, that demonstrates the problem:
require 'lru'

module LRUCache
  describe Cache do

    describe :get do

      it "should be thread-safe" do

        cache=Cache.new 2
        threads=(0..200).map do
          Thread.new do
            (0..1000).each do |value|              
              cache.put(rand(10),rand(2000))
              cache.get(rand(10))
            end
          end
        end

        threads.each{|thread| thread.join}

      end

    end

  end
end

After running the Spec using JRuby 1.6.7.2, following happened:

$ rspec
F

Failures:

  1) LRUCache::Cache get should be thread-safe
     Failure/Error: cache.put(rand(10),rand(2000))
     NoMethodError:
       undefined method `key' for nil:NilClass
     # /home/milic/git/lru-cache/lru.rb:19:in `put'
     # ./spec/lrucache_spec.rb:17:in `LRUCache'
     # ./spec/lrucache_spec.rb:16:in `LRUCache'

Finished in 6.4 seconds
1 example, 1 failure

Failed examples:

rspec ./spec/lrucache_spec.rb:11 # LRUCache::Cache get should be thread-safe

This provided me with enough motivation to implement my own LRU Cache that will behave properly under heavy load and behave well when multiple threads are trying to produce value for the same key. The result of that can be found here: https://github.com/draganm/threadsafe-lru

The basic principle is simple: internal state of the cache is protected by one Mutex. Values are held on thread-safe Future objects (Node class in the source code) that synchronize creation of values (evaluating code blocks provided to the get method). This design should allow minimal contention of threads accessing different values (minimal duration of locking of internal state Mutex) and synchronization of threads accessing the same value so that the value is produced only once.