Shopify open-sources Sarama, a client for Kafka 0.8 written in Go

Shopify has been hard at work scaling its data pipeline for quite some time now, and it had gotten to the point that plain old log files just wouldn’t cut it. We wanted to do more and more with our data, but ran into problems at every turn:

  • Batch processing of logs required log rotation, which introduced unacceptable latency into other parts of the pipeline.
  • Traditional log aggregation tools like Flume didn’t provide the features, reliability, or performance that we were looking for.
  • Fan-out configuration was promising to become unmanageable. We wanted anyone at Shopify to be able to use and experiment with this data, but the configuration to get our logs to that many endpoints was mind-bogglingly complex.

At the end of the day what we were looking for was a blazing-fast, scalable and reliable publish-subscribe messaging system, and LinkedIn’s Kafka project fit the bill perfectly. Open-sourced just a short while ago with the help of the Apache Foundation, Kafka’s upcoming 0.8 release (which LinkedIn has already deployed internally) provided exactly what we were looking for. With Kafka we would be able to easily aggregate, process (for internal dashboards), and store (into Hadoop) the millions of events that happen on our systems each day.

As always, there was a hitch. Shopify’s system language of choice is Go, a concurrency-friendly, safe and scalable language from Google. Kafka only provides clients for Java and Scala, which would have required deploying a heavy JVM instance to all of our servers. In this case, compromising one way or the other was not an option; the lure of being able to do Kafka message processing from Go was too strong, so we just wrote it ourselves! The result is Sarama, a Kafka 0.8 client for Go.

Sarama is a fully-functional MIT-licensed client library capable of producing and consuming messages from Kafka brokers. It has already been deployed to production on Shopify servers, and has processed millions of messages. 

Go check out Sarama today and let us know what you think:

Sarama Github Repository

Sarama Godoc Documentation

Continue reading

"Variant Barcode" and "Image Alt Text" Now in Product Export

The Export Products feature in your Shop's Administration page will now include the variant barcode for each of your products' variants and the alt text for each of your products' images.

The Variant Barcode column has been added between the Variant Taxable and Image Src columns on each line of the product export file:

The Image Alt Text column has been added to the end of each line of the product export file:

This change was put into production on July 18, 2013 at 12:00 EDT. If you have any custom processing of the Products CSV file that Shopify generates, please ensure that you update it to handle these new fields.
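Because the Variant Barcode column is inserted mid-row rather than appended, any import code that indexes CSV columns by position will silently read shifted fields. Reading by header name sidesteps this; here's a minimal Ruby sketch (the column subset and helper name are illustrative, not part of Shopify's export tooling):

```ruby
require 'csv'

# Pull each variant's barcode out of a product export by header name,
# so newly inserted columns can't shift the fields you care about.
def variant_barcodes(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| row["Variant Barcode"] }
end
```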

Continue reading

New Feature - Product Export Includes Published

The Export Products feature in your Shop's Administration page will now include whether each product is currently Published or not.

The Published column has been added to the end of each line of the Product export file:

This change will be put into production on April 18, 2013 at 12:00 EDT. If you have any custom processing of the Products CSV file that Shopify generates, please ensure that you have updated it to handle this new field.


Continue reading

Spring Into a New Job

Do you want to be a Shopify Guru? Our support team is now running 24/7, so we are looking for Gurus to work evenings, weekends, and overnights, helping our merchants around the clock.

While we’re not huge fans of traditional interviews here at Shopify, we are HUGE fans of parties. So instead of inviting you in for a ho-hum one-on-one, we’re hosting a big bash for potential Shopify Gurus on April 18th from 7-10pm. Specific location is TBA, but it will be in Ottawa, Canada.

You’ll get to hang out with fun people who are interested in working at Shopify, chat about the potential job, have a few laughs – and a few drinks! You can meet the people who might become your colleagues, and get a great feel for the job, the space, and Shopify’s culture.

But wait! Before you jump on the party bus, you should probably know exactly what a Shopify Guru is:

Shopify Guru
[shaw-puh-fahy goo-roo]
1. A rare, interesting character who gets a kick out of helping Shopify’s customers get their stores up and running.  A guru is comfortable on the phone and can type like a maniac. In the wild, gurus are often spotted laughing at or telling a great joke, as they have a naturally keen sense of humour.

Space is limited, so this party is invite-only. If you’re interested in attending, please enter your info in the provided fields at the bottom of one of our Guru job postings: weekends and evenings, or overnights.

Mention in your application that you want to attend our Guru party on April 18th, and we’ll be in touch!


Continue reading

IdentityCache: Improving Performance one Cached Model at a Time

A month ago Shopify was at BigRubyConf where we mentioned an internal library we use for caching ActiveRecord models called IdentityCache. We're pleased to say that the library has been extracted out of the Shopify code base and has been open sourced!
At Shopify, our core application has been database performance bound for much of our platform’s history. That means that the most straightforward way of making Shopify more performant and resilient is to move work out of the database layer. 
For many applications, achieving a very high cache hit ratio is a matter of storing full cached response bodies, versioning them based on the associated records in the database, always serving the most current version, and relying on the cache’s LRU algorithm for expiration.
That technique, called a “generational page cache”, is well proven and very reliable. However, part of Shopify’s value proposition is that store owners can heavily customize the look and feel of their shops; in fact, we offer a full-fledged templating language, Liquid.
As a side effect, full page static caching is not as effective as it would be in most other web platforms, because we do not have a deterministic way of knowing what database rows we’ll need to fetch on every page render. 
The key metric driving the creation of IdentityCache was our master database’s queries per second, and thus the goal was to reduce read operations reaching the database as much as possible. IdentityCache does this by moving the read workload to memcached instead.
The inability of a full page cache to take load away from the database becomes even more evident during write-heavy (and thus page-cache-expiring) events like Cyber Monday and flash sales. On top of that, the traffic on our web app servers typically doubles each year, and we invested heavily in building out IdentityCache to help absorb this growth. For instance, during the last pre-IdentityCache sales peak in 2012, we saw 130,000 requests per minute generating 21,000 queries per second; in comparison, the latest flash sale, in April 2013, generated 203,000 requests per minute with only 14,500 queries per second.

What Exactly is IdentityCache?

IdentityCache is a read-through cache for ActiveRecord models. When reading records from the cache, IdentityCache will try to fetch the requested object from memcached. If the cache entry doesn't exist, IdentityCache will load the object from the database and store it in memcached; the cached copy is then available for subsequent reads, avoiding any more trips to the database. This behaviour is key during events that expire the cache often.
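The read-through flow just described can be sketched in a few lines of Ruby; here the CACHE hash stands in for a memcached client, and the method names are illustrative rather than IdentityCache's actual internals:

```ruby
# Stand-in for a memcached client: lookups on string keys.
CACHE = {}

# Read-through fetch: try the cache first, fall back to the database,
# and populate the cache on a miss so later reads skip the database.
def fetch_record(table, id)
  key = "#{table}:blob:#{id}"
  cached = CACHE[key]
  return cached unless cached.nil?

  record = load_from_database(table, id) # cache miss: hit the database
  CACHE[key] = record                    # populate for subsequent reads
  record
end

# Illustrative database loader; a real app would run a SQL query here.
def load_from_database(table, id)
  { table: table, id: id }
end
```

Because the cache is populated on the first miss, frequently read objects quickly become cheap cache hits, which is exactly what you want while expiry events are churning the cache.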
Expiration is explicit and does not rely on memcached's LRU. It is also automatic: objects are expired from the cache by issuing a memcached delete command as they change in the database, via after_commit hooks. This works because, given a row in the database, we can always calculate its cache key based on the current table schema and the row’s id. There is no need for the user to ever call delete themselves; it was a conscious decision to take expiration away from day-to-day developer concerns.
This has been a huge help as the characteristics of our application and Rails have changed. One great example of this is how Ruby on Rails changed what actions would fire after_commit hooks. For instance, in Rails 3.2, touch will not fire an after_commit. Instead of having to add expires, and think about all the possible ramifications every time, we added the after_touch hook into IdentityCache itself.
Aside from the default key, built from the schema and the row id, IdentityCache uses developer defined indexes to access your models. Those indexes simply consist of keys that can be created deterministically from other row fields and the current schema. Declaring an index will also add a helper method to fetch your cached models using said index.
IdentityCache is opt-in, meaning developers need to explicitly specify what should be indexed and explicitly ask for data from the cache. It is important that developers don’t have to guess whether calling a method will bring a cached entry or not. 
We think this is a good thing. Having caching hook in automatically is nice in its simplest form. However, IdentityCache wasn't built for simple applications; it was built for large, complicated applications where you want, and need, to know what's going on.

Down to the Numbers

If that wasn’t good enough, here are some numbers from Shopify itself.
This is an example of what happened when we introduced IdentityCache to one of the objects that is heavily hit on the shop storefronts. As you can see, we cut out thousands of calls to the database when accessing this model. This was huge, since the database is one of the most heavily contended components of Shopify.
This example shows similar results once IdentityCache was introduced. We eliminated what was approaching 50K calls per minute (and growing steadily), reducing it to almost nothing, since the subscription is now embedded in the Shop object. Another huge win from IdentityCache.

Specifying Indexes

Once you include IdentityCache in your model, you automatically get a fetch method added to your model class. fetch behaves like find, plus the read-through cache behaviour.
You can also add other indexes to your models so that you can load them using a different key. Here are a few examples:
class Product < ActiveRecord::Base
  include IdentityCache
end

Product.fetch(id)

class Product < ActiveRecord::Base
  include IdentityCache
  cache_index :handle
end

Product.fetch_by_handle(handle)

We’ve tried to make IdentityCache as simple as possible to add to your models. For each cache index you add, you end up with a fetch_* method on the model to fetch those objects from the cache.
You can also specify cache indexes that look at multiple fields. The code to do this would be as follows:
class Product < ActiveRecord::Base
  include IdentityCache
  cache_index :shop_id, :id
end

Product.fetch_by_shop_id_and_id(shop_id, id)

Caching Associations

One of the great things about IdentityCache is that you can cache has_one, has_many and belongs_to associations as well as single objects. This really sets IdentityCache apart from similar libraries.
This is a simple example of caching associations with IdentityCache:
class Product < ActiveRecord::Base
  include IdentityCache
  has_many :images
  cache_has_many :images
end

@product = Product.fetch(id)
@images = @product.fetch_images
What happens here is that the product is fetched from either memcached or, on a cache miss, the database. We then look for the images in the cache, again falling back to the database on a miss. This also works for has_one and belongs_to associations, via the cache_has_one and cache_belongs_to IdentityCache methods, respectively.
What if we always want to load the images though, do we always need to make the two requests to the cache? 

Embedding Associations

With IdentityCache we can also embed the associations with the parent object, so that when you load the parent the associations are also cached and loaded on a cache hit. This avoids making multiple memcached calls to load all the cached data. To enable this you simply need to add the :embed => true option. Here's a little example:
class Product < ActiveRecord::Base
  include IdentityCache
  has_many :images
  cache_has_many :images, :embed => true
end

@product = Product.fetch(id)
@images = @product.fetch_images
The main difference between this example and the previous one is that the '@product.fetch_images' call won't hit memcached a second time; the data is already loaded when we fetch the product from memcached.
There are two tradeoffs to using embed: first, your entries in memcached will be larger, as they have to store data for the model and its embedded associations; second, the whole cache entry will expire on a change to any of the cached models.
There are a number of other options and different ways you can use IdentityCache, which are highlighted on the GitHub page. I highly encourage anyone interested to take a look at those examples for more details. Please check it out for yourself and let us know what you think!

Continue reading

API Announcement: Shopify's growing, and so are our integers!

As Shopify and our merchants have grown, the sheer amount of data we deal with has grown as well. With that in mind, we've been forced to transition our ID columns (and ID references) from 32-bit to 64-bit integers to ensure that we can continue to issue unique IDs.

MySQL, Ruby on Rails, and most other technologies default ID columns to INT(11). While we're very excited that we're hitting this limitation (problems caused by massive growth are the best problems to have), it is going to require Shopify App and integration developers to make the same transition to ensure compatibility with Shopify in the future.

In short: anywhere you're dealing with an ID from Shopify (whether that's shop_id, order_id, or any other ID), you need to be prepared to deal with integers larger than 32-bit datatypes can hold.

If you're a Rails developer, this should help.
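To make the boundary concrete: MySQL's signed INT tops out at 2,147,483,647, so any Shopify ID above that overflows a default ID column. A small Ruby sketch of a guard you might run over incoming IDs (the helper name is illustrative):

```ruby
# Largest value a signed 32-bit (MySQL INT) column can hold.
INT32_MAX = 2**31 - 1  # 2_147_483_647

# True if a Shopify ID would overflow a 32-bit signed ID column and
# therefore needs a 64-bit (BIGINT) column instead.
def needs_bigint?(id)
  id > INT32_MAX
end
```

In a Rails schema, the fix is a migration that widens the column, e.g. change_column :orders, :shopify_order_id, :integer, :limit => 8, which maps to BIGINT on MySQL; the table and column names here are placeholders.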

This change is happening on May 10th, 2013 – at this point, your apps and integrations need to be switched over to continue working!

Continue reading

What Does Your Webserver Do When a User Hits Refresh?

Your web application is likely rendering requests when the requesting client has already disconnected. Eric Wong helped us devise a patch for the Unicorn webserver that will test the client connection before calling the application, effectively dropping disconnected requests before wasting app server rendering time.

The Flash Sale

A common traffic pattern we see at Shopify is the flash sale, where a product is discounted heavily or only available for a very short period of time. Our customers' flash sales can cause traffic spikes an order of magnitude above our typical traffic rate.

This blog post highlights one of the problems dealing with these traffic surges that we solved during our preparation for the holiday shopping season.

In a flash sale scenario, with our app servers under high load, response time grows.  As our response time increases, customers attempting to buy items will hit refresh in frustration.  This was causing a snowball effect that would contribute to reduced availability.

Connection Queues 

Each of our application servers runs Nginx in front of many Unicorn workers running our Rails application.  When Nginx receives a request, it opens a queued connection on the shared socket used to communicate with Unicorn.  The Unicorn workers work off requests in the order they're placed on the socket’s connection backlog.

The worker process looks something like:

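In place of the missing diagram, the worker's job can be sketched as three steps; this is a simplified Ruby illustration of one iteration, not Unicorn's actual code:

```ruby
require 'socket'

# Simplified sketch of one iteration of a Unicorn-style worker:
#   1. accept the next queued connection from the shared listen socket
#   2. run the application to render a response (the slow part)
#   3. write the response back to the client
def handle_next_request(listen_socket, app)
  client = listen_socket.accept     # 1. pull a connection off the backlog
  request_line = client.gets        # read the request line
  response = app.call(request_line) # 2. render the request
  client.write(response)            # 3. send the response back
  client.close
end
```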
The second step takes the vast majority of the time spent processing a request.  Under load, the queue of pending requests sitting on the UNIX socket from Nginx grows until it reaches maximum capacity (SOMAXCONN).  When the queue reaches capacity, Nginx will immediately return a 502 to incoming requests, as it has nowhere to queue the connection.

Pending Requests

While the app worker is busy rendering a request, the pending requests in the socket backlog represent users waiting for a result.  If a user hits refresh, their browser closes the current connection and their new connection enters the end of the queue (or Nginx returns a 502 if the queue is full).  So what happens when the application server gets to the user's original request in the queue?

Nginx and HTTP 499

The HTTP 499 response code is not part of the HTTP standard.  Nginx logs this response code when a user disconnects before the application returned a result.  Check your logs - an abundance of 499s is a good indication that your application is too slow or over capacity, as people are disconnecting instead of waiting for a response.  Your Nginx logs will always have some 499s due to clients disconnecting before even a quick request finishes.

HTTP 200 vs HTTP 499 Responses During a Flash Sale

When Nginx logs an HTTP 499 it also closes the downstream connection to the application, but it is up to the application to detect the closed connection before wasting time rendering a page for a client who already disconnected.

Detecting Closed Sockets

With the asynchronous nature of sockets, detecting a closed connection isn't straightforward.  Your options are:

  • Call select() on the socket.  If a connection is closed, it will return as "data available" but a subsequent read() call will fail.
  • Attempt to write to the socket.

Unfortunately, it is typical for web applications to find out the client socket is closed only after spending the time and resources rendering the page, when they attempt to write the response.  This is what our Rails application was doing.  The net effect was that every time a user pressed refresh, we would render that page, even if the user had already disconnected.  This caused a snowball effect until eventually our app workers were doing little but rendering pages and throwing them away, and our service was effectively down.

What we wanted to do was test the connection before calling the application, so we could filter out closed sockets and avoid wasting time.  The first detection option above is not great: select() requires a timeout, and generally select() with even the shortest timeout takes a fraction of a millisecond to complete.  So we went with the second solution: write something to the socket to test it before calling the application.  This is typically the best way to deal with resources anyway: just attempt to use them, and there will be an error if something is in the way.  Unicorn was already acting that way, just not until after wasting time rendering the page.

Just write an 'H'

Thankfully all HTTP responses start with "HTTP/1.1", so (rather cheekily) our patch to Unicorn writes this string to test the connection before calling the application.  If writing to the socket fails, Unicorn moves on to process the next request and only a trivial amount of time is spent dealing with the closed connection.
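The essence of the check can be sketched in a few lines of Ruby; this is a simplified illustration of the idea, not Unicorn's actual implementation:

```ruby
require 'socket'

# Probe the client socket by writing the first bytes every HTTP
# response starts with. If the client has already disconnected,
# the write raises and we can skip rendering entirely.
def client_still_connected?(socket)
  socket.write("HTTP/1.1 ")
  true
rescue Errno::EPIPE, Errno::ECONNRESET, IOError
  false
end
```

The probe costs one small write on the happy path (the client was going to receive those bytes anyway), while a dead connection is detected before any application time is spent.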

Eric Wong merged this change into Unicorn master and soon after released Unicorn v4.5.0.  To use this feature you must add 'check_client_connection true' to your Unicorn configuration.


Continue reading

Shopify for Designers Workshops 2013

In 2012 we went on the road and delivered five "Shopify for Designers" workshops in the UK and USA. Not only were they a lot of fun, but the feedback was very positive. If you missed out last year, we have good news.

Over the last couple of months we have been hard at work putting together a full global programme of workshops and meet ups for 2013. We'll be visiting cities in the USA, Canada and the UK. Exciting times.

Continue reading

Introducing the Super Debugger: A Wireless, Real-Time Debugger for iOS Apps

LLDB is the current state of the art for iOS debugging, but it's clunky and cumbersome and doesn't work well with objects. It really doesn't feel very different from gdb: it's a solid tool, but it requires breakpoints, and although you can use it with Objective-C apps, it's not really built for them. Dealing with objects is awkward, and it's hard to see your changes.

This is where Super Debugger comes in. It's a new tool for rapidly exploring the objects in your iOS app, whether it's running on an iPhone, iPad, or the iOS Simulator, and it's available today on GitHub. Check the included readme to see what it can do in detail.

Today we're going to run through a demonstration of an included app called Debug Me.

  1. Clone the superdb repository locally to your Mac and change into the directory.

    git clone
    cd superdb
  2. Open the included workspace file, SuperDebug.xcworkspace, select the Debug Me target and Build and Run it for your iOS device or the Simulator. Make sure the device is on the same wifi network as your Mac.

  3. Go back to Xcode and change to the Super Debug target. This is the Mac app that you'll use to talk to your iOS app. Build and Run this app.

  4. In Super Debug, you'll see a window with a list of running, debuggable apps. Find Debug Me in the list (hint: it's probably the only one!) and double click it. This will open up the shell view where you can send messages to the objects in your app, all without setting a single break point.

  5. Now let's follow the instructions shown to us by the Debug Me app.

  6. In the Mac app, issue the command .self (note the leading dot). This updates the self pointer, which will execute a Block in the App delegate that returns whatever we want to be pointed to by the variable self. In this case (and in most cases), we want self to point to the current view controller. For Debug Me, that means it points to our instance of DBMEViewController after we issue this command.

  7. Now that our pointer is set up, we can send a message to that pointer. Type self redView layer setMasksToBounds:YES. This sends a chain of messages in F-Script syntax. In Objective-C, it would look like [[[self redView] layer] setMasksToBounds:YES]. Here we omit the square brackets because of our syntax.

    We do use parentheses sometimes, when passing the result of a message send would otherwise be ambiguous. For example, [view setBackgroundColor:[UIColor purpleColor]] in Objective-C would be view setBackgroundColor:(UIColor purpleColor) in our syntax.

  8. The previous step has no visible result, so let's make a change. Type self redView layer setCornerRadius:15 and see the red view get nice rounded corners!

  9. Now for the impressive part. Move your mouse over the number 15 and see it highlight. Now click and drag left or right, and see the view's corner radius update in real time. Awesome, huh?

That should be enough to give you a taste of this brand new debugger. Interact with your objects in real time. Iterate instantly. No more Build, Compile, Wait. It's now Run, Test, Change. Fork the project on GitHub and get started today.

Continue reading

New Feature - Multiple Tracking Number Support in Fulfillments

Sometimes when you place an order through a fulfillment service such as Amazon, your order may be fulfilled from several fulfillment centers.  Unfortunately, this information was not provided to the customer, which could lead to some confusion.

We've now added support for multiple tracking numbers in a single fulfillment, which you can start using in your shipment confirmation templates right away:

The tracking details for these items are as follows:
{% for tracking_number in fulfillment.tracking_numbers %}
  {{ tracking_number }}
{% endfor %}

For shipping status on these items:
{% for tracking_url in fulfillment.tracking_urls %}
  {{ tracking_url }}
{% endfor %}

Continue reading

New Feature - Orders Export Includes Payment Reference

The Export Orders feature in your Shop's Administration page will now include the Payment Reference used to cross-reference transactions to the gateway.

The Payment Reference column has been added to the end of each line of the Order export file:

This change will be put into production on December 19, 2012 at 12:00 EST. If you have any custom processing of the Orders CSV file that Shopify generates, please ensure that you have updated it to handle this new field.

Continue reading