IdentityCache: Improving Performance One Cached Model at a Time

A month ago, Shopify was at BigRubyConf, where we mentioned an internal library we use for caching ActiveRecord models called IdentityCache. We're pleased to say that the library has been extracted from the Shopify code base and open sourced!
At Shopify, our core application has been database-performance bound for much of our platform’s history. That means the most straightforward way of making Shopify more performant and resilient is to move work out of the database layer.
For many applications, achieving a very high cache hit ratio is a matter of storing full cached response bodies, versioning them based on the associated records in the database, always serving the most current version, and relying on the cache’s LRU algorithm for expiration.
That technique, called a “generational page cache”, is well proven and very reliable. However, part of Shopify’s value proposition is that store owners can heavily customize the look and feel of their shops; in fact, we offer a full-fledged templating language.
As a side effect, full-page static caching is not as effective for us as it would be on most other web platforms, because we have no deterministic way of knowing which database rows we’ll need to fetch on every page render.
The key metric driving the creation of IdentityCache was our master database’s queries per second, so the goal was to keep as many read operations as possible from reaching the database. IdentityCache does this by moving that workload to Memcached instead.
The inability of a full-page cache to take load away from the database becomes even more evident during write-heavy (and thus page-cache-expiring) events like Cyber Monday and flash sales. On top of that, the traffic on our web app servers typically doubles each year, and we invested heavily in building out IdentityCache to help absorb this growth. For instance, during our last pre-IdentityCache sales peak in 2012, we saw 130,000 requests per minute generating 21,000 queries per second; by comparison, the latest flash sale, in April 2013, generated 203,000 requests per minute with only 14,500 queries per second.
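Taking the peak figures above at face value (and assuming both request counts are per minute), a quick back-of-the-envelope calculation shows how many database queries each web request cost on average. This is rough arithmetic only; it ignores any difference in traffic mix between the two events.

```ruby
# Average database queries per web request, from the peak figures quoted
# above. Assumes both request counts are per minute.
def queries_per_request(requests_per_minute, queries_per_second)
  queries_per_minute = queries_per_second * 60.0
  queries_per_minute / requests_per_minute
end

before = queries_per_request(130_000, 21_000) # 2012 peak, pre-IdentityCache
after  = queries_per_request(203_000, 14_500) # April 2013 flash sale

puts format("2012: %.1f queries/request", before) # prints "2012: 9.7 queries/request"
puts format("2013: %.1f queries/request", after)  # prints "2013: 4.3 queries/request"
```

By this estimate, the per-request query cost dropped by more than half while traffic grew.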

What Exactly is IdentityCache?

IdentityCache is a read-through cache for ActiveRecord models. When reading records from the cache, IdentityCache will try to fetch the requested object from Memcached. If the cache entry doesn't exist, IdentityCache will load the object from the database and store it in Memcached, so the cached copy is available for subsequent reads without any more trips to the database. This behaviour is key during events that expire the cache often, because each expired entry costs only one database read before it is served from Memcached again.
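To make the read-through behaviour concrete, here is a minimal plain-Ruby sketch. It is not IdentityCache's code: a Hash stands in for Memcached and a block stands in for the database query (both hypothetical), and the class counts how often the "database" is actually hit.

```ruby
# Minimal read-through cache sketch. The Hash is a hypothetical stand-in
# for Memcached; the loader block is a hypothetical stand-in for a query.
class ReadThroughCache
  attr_reader :db_hits

  def initialize(&loader)
    @store = {}      # stand-in for Memcached
    @loader = loader # stand-in for the database query
    @db_hits = 0
  end

  def fetch(key)
    return @store[key] if @store.key?(key) # cache hit: no database trip

    @db_hits += 1
    value = @loader.call(key) # cache miss: load from the database...
    @store[key] = value       # ...and populate the cache for later reads
    value
  end

  def delete(key)
    @store.delete(key)
  end
end

products = { 1 => "Red T-Shirt", 2 => "Blue Mug" } # hypothetical rows
cache = ReadThroughCache.new { |id| products.fetch(id) }

3.times { cache.fetch(1) }
puts cache.db_hits # prints 1
```

Three reads of the same record cost a single database query; after a delete, the next read repopulates the entry.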
Expiration is explicit and does not rely on Memcached's LRU. It is also automatic: as objects change in the database, IdentityCache expires them from the cache by issuing a Memcached delete command from after_commit hooks. This works because, given a row in the database, we can always calculate its cache key from the current table schema and the row’s id. There is no need for the user to ever call delete themselves; it was a conscious decision to take expiration away from day-to-day developer concerns.
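The reason expiration can be fully automatic is that the key is a pure function of the schema and the id. The sketch below illustrates that idea under stated assumptions: the key layout, the MD5 schema digest and the `FakeRecord` class are all hypothetical, and the real library's key format may differ.

```ruby
require "digest"

# Hypothetical key layout: model name, a digest of the current columns,
# and the row id. Any process can recompute it with no extra state.
def cache_key(model_name, columns, id)
  schema_digest = Digest::MD5.hexdigest(columns.sort.join(","))[0, 8]
  "#{model_name}:#{schema_digest}:#{id}"
end

# Hypothetical stand-in for a model; a Hash plays the role of Memcached.
class FakeRecord
  COLUMNS = %w[id title handle shop_id].freeze

  def initialize(id, store)
    @id = id
    @store = store
  end

  # In Rails this would run from an after_commit callback; here we call it
  # by hand. It issues a delete, and the next read repopulates the entry.
  def after_commit
    @store.delete(cache_key("Product", COLUMNS, @id))
  end
end

store = {}
key = cache_key("Product", FakeRecord::COLUMNS, 42)
store[key] = "cached product 42"
FakeRecord.new(42, store).after_commit
puts store.key?(key) # prints false
```

Because the schema is part of the key, adding a column also changes every key, so entries written under the old schema are simply never read again.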
Automatic expiration has been a huge help as the characteristics of our application and of Rails have changed. One great example is how Ruby on Rails changed which actions fire after_commit hooks: in Rails 3.2, for instance, touch does not fire after_commit. Instead of adding explicit expiration calls and thinking through all the possible ramifications every time, we added an after_touch hook to IdentityCache itself.
Aside from the default key, built from the schema and the row id, IdentityCache uses developer-defined indexes to access your models. Those indexes simply consist of keys that can be created deterministically from other row fields and the current schema. Declaring an index will also add a helper method to fetch your cached models using said index.
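One plausible way to lay out such an index, sketched here as an assumption rather than a description of IdentityCache's internals: the index entry, keyed by other column values, stores only the row id, and the record itself is then read through its primary, id-based key. The key formats and helper names below are hypothetical; Hashes stand in for Memcached and the table.

```ruby
# Hypothetical secondary-index sketch: index key -> id, id key -> record.
store = {} # stand-in for Memcached
rows = { 7 => { id: 7, handle: "red-shirt", title: "Red T-Shirt" } } # stand-in for the table

def index_key(model, fields, values)
  "#{model}:index:#{fields.join('/')}:#{values.join('/')}"
end

def primary_key(model, id)
  "#{model}:#{id}"
end

def fetch_by_index(store, rows, model, fields, values)
  ikey = index_key(model, fields, values)
  id = store.fetch(ikey) do
    # Index miss: look the id up in the "database" and cache it (a nil
    # result is cached too, so repeated misses stay cheap).
    row = rows.values.find { |r| fields.map { |f| r[f] } == values }
    store[ikey] = row && row[:id]
  end
  return nil if id.nil?

  pkey = primary_key(model, id)
  store.fetch(pkey) { store[pkey] = rows.fetch(id) }
end

product = fetch_by_index(store, rows, "Product", [:handle], ["red-shirt"])
puts product[:title] # prints Red T-Shirt
```

Storing only the id in the index entry means a record changing only invalidates its primary entry, while the index entry stays valid as long as the indexed column values do not change.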
IdentityCache is opt-in, meaning developers need to explicitly specify what should be indexed and explicitly ask for data from the cache. It is important that developers don’t have to guess whether calling a method will bring a cached entry or not. 
We think this is a good thing. Having caching hook in automatically is nice in its simplest form. However, IdentityCache wasn't built for simple applications; it was built for large, complicated applications where you want, and need, to know what's going on.

Down to the Numbers

If that wasn’t good enough, here are some numbers from Shopify itself.
This is an example of when we introduced IdentityCache to one of the objects that is heavily hit on the shop storefronts. As you can see, we cut out thousands of calls to the database when accessing this model. This was huge, since the database is one of the most heavily contended components of Shopify.
This example shows similar results once IdentityCache was introduced. We cut what was approaching 50K calls per minute (and growing steadily) to almost nothing, since the subscription was now embedded with the Shop object. Another huge win from IdentityCache.

Specifying Indexes

Once you include IdentityCache in your model, you automatically get a fetch method added to your model class. fetch behaves like find, plus the read-through cache behaviour.
You can also add other indexes to your models so that you can load them using a different key. Here are a few examples:
class Product < ActiveRecord::Base
  include IdentityCache
end

Product.fetch(id)

class Product < ActiveRecord::Base
  include IdentityCache
  cache_index :handle
end

Product.fetch_by_handle(handle)

We’ve tried to make IdentityCache as simple as possible to add to your models. For each cache index you add, you end up with a fetch_by_* method on the model that fetches those objects from the cache.
You can also specify cache indexes that span multiple fields, for example:
class Product < ActiveRecord::Base
  include IdentityCache
  cache_index :shop_id, :id
end

Product.fetch_by_shop_id_and_id(shop_id, id)
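As an illustration of how a class-level macro can produce finder names like fetch_by_handle and fetch_by_shop_id_and_id, here is a small self-contained sketch. It is not IdentityCache's implementation: the `TinyIndexCache` module, its in-process Hash cache and the in-memory `rows` table are all hypothetical, and there is no Memcached or ActiveRecord involved.

```ruby
# Hypothetical macro that generates fetch_by_<field>_and_<field> finders.
module TinyIndexCache
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def cache_index(*fields)
      method_name = "fetch_by_#{fields.join('_and_')}"
      define_singleton_method(method_name) do |*values|
        unless values.size == fields.size
          raise ArgumentError, "expected #{fields.size} values, got #{values.size}"
        end

        # The key is built deterministically from the class name and values.
        key = [name, *fields, *values].join(":")
        index_cache[key] ||= rows.find { |row| fields.map { |f| row[f] } == values }
      end
    end

    def index_cache
      @index_cache ||= {} # stand-in for Memcached
    end
  end
end

class Product
  include TinyIndexCache

  def self.rows # hypothetical stand-in for the products table
    [{ id: 1, shop_id: 10, handle: "mug" }, { id: 2, shop_id: 11, handle: "shirt" }]
  end

  cache_index :handle
  cache_index :shop_id, :id
end

p Product.fetch_by_handle("mug")
p Product.fetch_by_shop_id_and_id(11, 2)
```

Generating the methods at declaration time keeps the "am I reading cached data?" question answerable from the method name alone.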

Caching Associations

One of the great things about IdentityCache is that you can cache has_one, has_many and belongs_to associations as well as single objects. This really sets IdentityCache apart from similar libraries.
This is a simple example of caching associations with IdentityCache:
class Product < ActiveRecord::Base
  include IdentityCache
  has_many :images
  cache_has_many :images
end

@product = Product.fetch(id)
@images = @product.fetch_images
What happens here is that the product is fetched from Memcached, or from the database on a cache miss. We then look for the images in the cache, again falling back to the database on a miss. This also works for has_one and belongs_to associations, using IdentityCache's cache_has_one and cache_belongs_to, respectively.
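The cost of the non-embedded approach is easiest to see by counting cache round trips. In this hypothetical sketch, the product's cache entry holds only image ids, so the images need a second trip; we assume the images are read with a single multi-get rather than one request each. A Hash with a counter stands in for Memcached.

```ruby
# Hypothetical non-embedded has_many: parent entry stores child ids only.
class CountingStore
  attr_reader :round_trips

  def initialize
    @data = {} # stand-in for Memcached
    @round_trips = 0
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @round_trips += 1
    @data[key]
  end

  # One round trip for many keys, like a memcached multi-get.
  def read_multi(keys)
    @round_trips += 1
    keys.map { |k| [k, @data[k]] }.to_h
  end
end

store = CountingStore.new
store.write("Product:1", { id: 1, title: "Mug", image_ids: [5, 6] })
store.write("Image:5", { id: 5, url: "mug-front.png" })
store.write("Image:6", { id: 6, url: "mug-back.png" })

product = store.read("Product:1")                                   # round trip 1
image_keys = product[:image_ids].map { |id| "Image:#{id}" }
images = store.read_multi(image_keys).values                        # round trip 2

puts images.map { |img| img[:url] }.inspect # prints ["mug-front.png", "mug-back.png"]
puts store.round_trips                      # prints 2
```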
What if we always want to load the images though, do we always need to make the two requests to the cache? 

Embedding Associations

With IdentityCache we can also embed associations in the parent object, so that when you load the parent, its associations are cached alongside it and loaded on a cache hit. This avoids making multiple Memcached calls to load all the cached data. To enable this, you simply need to add the ':embed => true' option. Here's a little example:
class Product < ActiveRecord::Base
  include IdentityCache
  has_many :images
  cache_has_many :images, :embed => true
end

@product = Product.fetch(id)
@images = @product.fetch_images
The main difference between this example and the previous one is that the '@product.fetch_images' call won't hit Memcached a second time; the data was already loaded when we fetched the product from Memcached.
The tradeoffs of using embed are: first, your entries in Memcached will be larger, since they store data for the model and all of its embedded associations; second, the whole cache entry will expire when any of the cached models changes.
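Both sides of that tradeoff show up in a small sketch. This is a hypothetical plain-Ruby model, not IdentityCache's code: the images are stored inside the product's entry, so one read returns everything, but a change to any image has to delete the parent's entry. Hashes stand in for Memcached and the database tables.

```ruby
# Hypothetical embedded has_many: children live inside the parent's entry.
class EmbeddedProductCache
  attr_reader :reads

  def initialize(db)
    @db = db    # stand-in for the products and images tables
    @store = {} # stand-in for Memcached
    @reads = 0
  end

  def fetch_product(id)
    @reads += 1 # one cache round trip per product fetch
    @store["Product:#{id}"] ||= begin
      product = @db[:products].fetch(id)
      images = @db[:images].select { |img| img[:product_id] == id }
      product.merge(images: images) # larger entry: images embedded in parent
    end
  end

  # after_commit-style hook for an image: expire the whole parent entry.
  def image_changed(image)
    @store.delete("Product:#{image[:product_id]}")
  end

  def cached?(id)
    @store.key?("Product:#{id}")
  end
end

db = {
  products: { 1 => { id: 1, title: "Mug" } },
  images: [
    { id: 5, product_id: 1, url: "mug-front.png" },
    { id: 6, product_id: 1, url: "mug-back.png" }
  ]
}
cache = EmbeddedProductCache.new(db)
product = cache.fetch_product(1)
puts product[:images].size # prints 2
puts cache.reads           # prints 1
cache.image_changed(db[:images].first)
puts cache.cached?(1)      # prints false
```

Embedding pays off for associations that are almost always read with the parent and change rarely; for frequently updated children, the parent-wide expiration can cost more than the saved round trip.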
There are a number of other options and ways to use IdentityCache, which are highlighted on the GitHub page; I highly encourage anyone interested to take a look at those examples for more details. Please check it out for yourself and let us know what you think!


  • Brad Robertson
    April 11 2013, 11:43AM

    Does this work with the newer dalli gem? I didn’t see any reference to it in the codebase

  • Boris Barroso
    April 11 2013, 12:26PM

    How would this work in a multitenant, multischema (PostgreSQL) setup? I can have a product with id = 1 in schema1 and also a product with id = 1 in schema2, but those two products are different.

  • @Shopify Camilo Lopez
    April 11 2013, 01:39PM

    Brad, IdentityCache uses an underlying ActiveSupport::Cache::Store; if yours is configured to use dalli, IDC will use it.

  • @Shopify Camilo Lopez
    April 11 2013, 01:47PM


    IdentityCache assumes ids to be unique, so multischema situations are not something it supports out of the box.

    Potentially, the schema name/id could be added to the cache keys and the database finders. However, that is not something on our roadmap.

  • Vlad
    April 13 2013, 12:51AM

    Why did you choose the fetch predicate?
    Isn’t the caching always the desirable behaviour?
    Isn’t including the module already a clear signal that `find` and relations would be cached?

  • Millisami
    April 15 2013, 12:45PM

    How to test it?

  • @Shopify Camilo Lopez
    April 15 2013, 12:53PM


    Cache is almost always the desirable behaviour; however, under some circumstances (slave replication lag, for instance) you need to be certain the data you are getting is authoritative. That is why fetch_ exists: to give developers a clear signal, “hey, you are working with cached data”.

  • Linh Chau
    April 17 2013, 11:19PM

    This idea is very similar to this:

    Except that my gem is framework-agnostic, doesn’t care about what kind of cache software you use, as long as the cache is configurable through Rails.

    It also doesn’t require keys to be defined in advance. Developers can fetch by whatever attributes they want, and those attributes automatically become keys.

  • Tobias
    May 15 2013, 01:13PM

    FYI, that the link to the BigRuby-Talk

  • Mark
    June 20 2013, 11:43AM

    From looking at the code, I had a question: why is the cache disabled if there’s an open transaction? E.g.:

    def should_cache? # :nodoc:
      !readonly && ActiveRecord::Base.connection.open_transactions == 0
    end

    Isn’t every web request in Rails wrapped in a TX?

  • @Shopify Camilo Lopez
    August 01 2013, 08:36PM


    No, not every request is wrapped in a transaction by default, and that would be an odd design decision.

  • Paul
    October 22 2013, 11:11AM

    This reminds me of EHCache, one of the things that made J2EE development tolerable, back in the day.

    I was getting stale cache reads in my test env, so I monkey-patched my identity_cache.

    I’ve modified the following method, defined in build_normalized_has_many_cache:
    def #{options[:population_method_name]}
      @#{options[:ids_variable_name]} = #{options[:ids_name]}
      @#{options[:records_variable_name]} = nil

    I added the last line. Now, I can call populate_#{association_name}_cache to ensure that subsequent calls to fetch_#{association_name} won’t return a stale value. I’m sure there’s a better way, but so far I haven’t found it.

    This is not as crucial in the development env, since the model instances are destroyed at the end of the request.

    I do wonder why you create @#{options[:records_variable_name]} in the first place, when it seems you could use association_cache. What am I missing?
