Building and Testing Resilient Ruby on Rails Applications

Black Friday and Cyber Monday are the biggest days of the year at Shopify by every metric. As the Infrastructure team started preparing for the upcoming seasonal traffic in the late summer of 2014, we were confident that we could cope, and we made resiliency our top priority. A resilient system is one that keeps functioning when one or more of its components is unavailable or unacceptably slow. If not carefully monitored, applications quickly become intertwined with their external services, and minor dependencies turn into single points of failure.

For example, the only part of Shopify that relies on the session store is user sign-in - if the session store is unavailable, customers can still purchase products as guests. Any other behaviour would be an unfortunate coupling of components. This post is an overview of the tools and techniques we used to make Shopify more resilient in preparation for the holiday season.

Continue reading

Tuning Ruby's Global Method Cache

I was recently profiling a production Shopify application server using perf and noticed a fair amount of time being spent in a particular function, st_lookup, which is used by Ruby’s MRI implementation for hash table lookups.
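
A profile like this can be captured with perf's sampling mode; here's a minimal sketch, assuming you know the application server's PID (the PID and sampling frequency are illustrative):

    # Sample on-CPU stacks of a running Ruby process for 30 seconds
    perf record -F 99 -p 12345 -- sleep 30

    # Summarize which functions the samples landed in
    perf report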

Hash tables are used all over MRI, and not just for the Hash object; global variables, instance variables, classes, and the garbage collector all use MRI’s internal hash table implementation, st_table. Unfortunately, what this profile did not show was the callers of st_lookup. Is this some application code that has gone wild? Is this an inefficiency in the VM?

perf is a great sampling profiler for Linux — it’s low overhead and can safely be used in production. However, up until a few years ago, in order to use the call-graph feature of perf you had to recompile an application with -fno-omit-frame-pointer to get usable stack traces. This gives the compiler one less register to work with, but I believe the trade-off is worth it in most cases. As of Linux 3.7, perf now supports recording call graphs even when code is compiled without -fno-omit-frame-pointer. This works by recording a snapshot of stack memory on each sample. When analyzing the profile later with perf report, the stack data from each sample is combined with DWARF debugging information in order to build a call graph. This increases the amount of data included in the profile, but is a reasonable compromise when compared with having to recompile everything with -fno-omit-frame-pointer.

Now, when running perf and collecting call graphs, I was able to see the callers of st_lookup.
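
The recording step looks roughly like this, assuming a kernel new enough for DWARF-based unwinding (again, the PID is illustrative):

    # Record call graphs by snapshotting stack memory on each sample,
    # then unwind offline using DWARF debug info
    perf record --call-graph dwarf -F 99 -p 12345 -- sleep 30
    perf report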

From this profile, we can see that a large percentage of time is being spent in rb_method_entry_get_without_cache. At first, I suspected this was being caused by global method cache invalidation or clearing. After using SystemTap on the method__cache__clear probe and seeing a relatively low count, it was clear that this was not the case. At this point, I started digging into the MRI source, trying to understand what exactly happens when a method is called and how method caching actually works. Looking through the MRI source shows that effectively the only place rb_method_entry_get_without_cache is called is via rb_method_entry.

The idea here is that the global method cache will be checked first (via the GLOBAL_METHOD_CACHE macro), and if the method is found, the method entry will be returned from the cache. If the method is not in the cache, a more expensive method lookup needs to be performed. This involves walking the class hierarchy to find the requested method, which can cause multiple invocations of st_lookup to look up the method from a class method table.

The GLOBAL_METHOD_CACHE macro performs a basic hash on the provided key to locate the entry in the global method cache. Looking at this code, I was surprised to see that the global method cache only had room for 2048 entries. Shopify is a large Rails application with millions of lines of Ruby (when you include gems), and defines hundreds of thousands of methods. A 2048-entry global method cache is nowhere near adequate for an application of our size.
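
To get a feel for the mismatch, you can make a rough count of the methods defined in a running app from a Rails console; this sketch undercounts (lazily loaded code adds more later), but for an app of this size it already lands far beyond 2048:

    # Rough count of methods defined across every loaded class and module
    ObjectSpace.each_object(Module).reduce(0) do |count, mod|
      count + mod.instance_methods(false).size + mod.methods(false).size
    end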

Analyzing method cache hit rates

One of the first things I wanted to do was determine the existing method cache hit-to-miss ratio. One way of doing this would be to add tracing code to our MRI to log this data. This would require us to deploy a custom Ruby build to our app servers, which is not an ideal solution. There is a better way: ftrace user probes.

ftrace is a lightweight tracing framework that is built into the Linux kernel. One of the best features of ftrace is the ability to create dynamic user-mode event probes (uprobes). By adding a probe, ftrace will fire an event whenever a specified function is called in a process. Better yet, you can even fire an event on the execution of a specific instruction. This was all I needed to collect method cache hit-to-miss numbers.

To configure a uprobe, you need to specify the path to the binary and the address of the instruction you want to trace. How do you figure out the address of the instruction? Using objdump.
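
The lookup itself is a one-liner; a sketch, with an illustrative path to the ruby binary:

    # Dump the symbol tables and find rb_method_entry's address
    objdump -tT /usr/bin/ruby | grep rb_method_entry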

The first field objdump prints is the address of the function. Using this, I can now create a probe.
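
Creating the probe goes through the tracefs uprobe interface; a sketch, where the binary path and the offset (which would come from the objdump output above) are illustrative:

    # Register a uprobe named rb_method_entry on that instruction address
    echo 'p:rb_method_entry /usr/bin/ruby:0x1aa0c0' >> /sys/kernel/debug/tracing/uprobe_events

    # Enable the newly created event
    echo 1 > /sys/kernel/debug/tracing/events/uprobes/rb_method_entry/enable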

Now when I run Ruby, I’ll get an event every time rb_method_entry is called.
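
A quick way to check how often the probe fired is to count matching lines in the trace buffer; something like this:

    # Run a trivial program while the probe is enabled
    ruby -e 'puts "hello world"'

    # Count the rb_method_entry events recorded in the trace buffer
    grep -c rb_method_entry /sys/kernel/debug/tracing/trace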

Counting these events tells me that running Ruby with a simple “hello world” script caused rb_method_entry to be called 4441 times.

This is really cool! But what I originally wanted to figure out was the global method cache hit-to-miss ratio. By disassembling rb_method_entry, we can figure out which code paths are executed in the hit and miss cases, which we can use to set the appropriate probes.
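
The disassembly can be pulled straight out of the binary; one way to sketch it (the path is illustrative):

    # Disassemble the ruby binary and show the body of rb_method_entry
    objdump -d /usr/bin/ruby | grep -A 40 '<rb_method_entry>:'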

If you look at the C implementation of the function, you can see how it maps to the generated assembly. The relevant bits here are:

  • 0x00000000001aa16a is the address of the jump to rb_method_entry_get_without_cache. This is the path we take on a cache miss.
  • 0x00000000001aa194 is the address of the return instruction that we’ll execute on a cache hit. Because cache misses perform an unconditional jump to rb_method_entry_get_without_cache, we can infer that this instruction will only ever be executed on a hit.

With this information, we can set up our two probes, one for global method cache misses and one for hits.
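
A sketch of registering both probes, using the two addresses identified above (the event names are ones I've made up here, and the offsets will differ between builds):

    # Cache miss: the jump to rb_method_entry_get_without_cache
    echo 'p:method_cache_miss /usr/bin/ruby:0x1aa16a' >> /sys/kernel/debug/tracing/uprobe_events

    # Cache hit: the return instruction only reached when the cache lookup succeeds
    echo 'p:method_cache_hit /usr/bin/ruby:0x1aa194' >> /sys/kernel/debug/tracing/uprobe_events

    # Enable both events
    echo 1 > /sys/kernel/debug/tracing/events/uprobes/method_cache_miss/enable
    echo 1 > /sys/kernel/debug/tracing/events/uprobes/method_cache_hit/enable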

Running the simple “hello world” script again with both probes enabled gave us exactly what we wanted to see: 727 method cache misses and 3714 hits, a hit ratio of about 84%. Now that we have a way of measuring the hit rate, let’s see what it looks like in production!

Rather than have to execute these commands every time I wanted to take a sample, I wrote a small shell script to do it for me.

The script takes the path of the ruby binary you want to profile and the number of seconds to collect data. After that time has elapsed, it displays the number of cache hits and misses, along with the hit rate.
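
A minimal sketch of such a script, reusing the probe setup from the examples above (the offsets are illustrative and would need to come from your own binary):

    #!/bin/bash
    # Usage: trace_rb_method_cache.sh /path/to/ruby seconds
    RUBY_BIN=$1
    DURATION=$2
    TRACING=/sys/kernel/debug/tracing

    # Register the hit and miss probes
    echo "p:method_cache_miss ${RUBY_BIN}:0x1aa16a" >> $TRACING/uprobe_events
    echo "p:method_cache_hit ${RUBY_BIN}:0x1aa194" >> $TRACING/uprobe_events
    echo 1 > $TRACING/events/uprobes/method_cache_miss/enable
    echo 1 > $TRACING/events/uprobes/method_cache_hit/enable

    # Clear the trace buffer and collect events for the requested duration
    echo > $TRACING/trace
    sleep "$DURATION"

    # Tally the results
    HITS=$(grep -c method_cache_hit $TRACING/trace)
    MISSES=$(grep -c method_cache_miss $TRACING/trace)

    # Tear the probes back down
    echo 0 > $TRACING/events/uprobes/method_cache_hit/enable
    echo 0 > $TRACING/events/uprobes/method_cache_miss/enable
    echo > $TRACING/uprobe_events

    echo "Hits:     $HITS"
    echo "Misses:   $MISSES"
    echo "Hit rate: $(echo "scale=2; 100 * $HITS / ($HITS + $MISSES)" | bc)%"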

So, how does this look in production? With the default method cache size of 2048 entries, we get about a 90% hit rate. This is pretty good, but according to the results I saw in perf, it could definitely be better.

Introducing RUBY_GLOBAL_METHOD_CACHE_SIZE

The global method cache size is configured as a compile-time #define, GLOBAL_METHOD_CACHE_SIZE. I thought it would be useful to be able to configure the method cache size at runtime, like you can do with garbage collector parameters (RUBY_GC_* environment variables). To that end, I added a new environment variable, RUBY_GLOBAL_METHOD_CACHE_SIZE, that can be used to configure the global method cache size at process start time. This code has been committed upstream and is available in Ruby 2.2.
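
Using it is just a matter of setting the variable before booting the app; for example (the server command is illustrative, and 131072 corresponds to the 128K row in the table below):

    # Boot with a 128K-entry global method cache (requires Ruby 2.2+)
    export RUBY_GLOBAL_METHOD_CACHE_SIZE=131072
    bin/rails server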

Now that I had a way to dynamically configure the method cache size, I could run some tests to see how Shopify performed using various cache sizes. I ran the trace_rb_method_cache.sh script on one of our production servers for 60 seconds with a few different cache sizes and collected the results:

Cache Size      Hits         Misses      Hit Rate
2K (Default)    10,227,750   1,114,933   90.17%
16K             11,697,582     516,446   95.77%
32K             13,596,388     425,919   96.96%
64K             14,263,747     324,277   97.78%
128K            13,590,568     289,948   97.91%
256K            11,532,080     275,162   97.67%
512K            12,403,522     256,329   97.98%

From these numbers, it looks like the hit rate begins to level off once the cache reaches about 64K entries.

Now, if we compare perf profiles before and after tuning, with the 2K method cache and then with a 128K method cache, we can see some respectable results.

This change gives us a cycle savings of about 3%. Not bad for changing one configuration value!

Summary

Using the system-level profiling tools, perf and ftrace, I was able to find a performance issue that would have never been visible with Ruby-level tooling. These tools are available on any modern Linux distribution, and I encourage you to experiment with them on your own application to see what kind of benefits can be had!

Resources

perf wiki

ftrace

ftrace user probes

Continue reading

Docker at Shopify: How we built containers that power over 100,000 online shops

This is the second in a series of blog posts describing our evolution of Shopify toward a  Docker-powered, containerized data center. This instalment will focus on the creation of the container used in our production environment when you visit a Shopify storefront.

Read the first post in this series here.

Why containerize?

Before we dive into the mechanics of building containers, let's discuss motivation. Containers have the potential to do for the data center what consoles did for gaming. In the early days of PC gaming, each game typically required some video or sound driver massaging before you got to play. Gaming consoles, however, offered a different experience:

  • predictable: cartridges were self-contained fun, always ready to run, with no downloads or updates.
  • fast: cartridges used read-only memory for lightning fast speeds.
  • easy: cartridges were robust and largely child-proof - they were quite literally plug-and-play.

Predictable, fast, and easy are all good things at scale. Docker containers provide the building blocks to make our data centers easier to run and more adaptable by placing applications into self-contained, ready-to-run units much like cartridges did for console games.

Continue reading

Rebuilding the Shopify Admin: Improving Developer Productivity by Deleting 28,000 lines of JavaScript

This September, we quietly launched a new version of the Shopify admin. Unlike the launch of the previous major iteration of our admin, this version did not include a major overhaul of the visual design and, for the most part, would have gone largely unnoticed by the user.

Why would we rebuild our admin without providing any noticeable differences to our users? At Shopify, we strongly believe that any decision should be open to question at any time. In late 2012, we started to question whether our framework was still working for us. This post will discuss the problems in the previous version of our admin, and how we decided that it was time to switch frameworks.

Continue reading

Building an Internal Cloud with Docker and CoreOS

This is the first in a series of posts about adding containers to our server farm to make it easier to scale and manage, and to keep pace with our business.

The key ingredients are:

  • Docker: container technology for making applications portable and predictable
  • CoreOS: provides a minimal operating system, systemd for orchestration, and Docker to run containers

Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1700 cores and 6 TB RAM.

Continue reading

Kafka Producer Pipeline for Ruby on Rails

In the early fall our infrastructure team was considering Kafka, a highly available message bus. We were looking to solve several infrastructure problems that had come up around that time.

  • We were looking for a reliable way to collect event data and send it to our data warehouse.

  • We were considering a more service-oriented architecture, and needed a standardized way of message passing between the components.

  • We were starting to evaluate containerization of Shopify, and were searching for a way to get logs out of containers.

We were intrigued by Kafka due to its highly available design. However, Kafka runs on the JVM, and its primary user, LinkedIn, runs a full JVM stack. Shopify is mainly Ruby on Rails and Go, so we had to figure out how to integrate Kafka into our infrastructure.

Continue reading

Help the Shopify Dev Team Raise Money for Charity!

A recent phenomenon has taken the tech world by storm: Dogecoin. Though goofy and grammatically unique, Dogecoin has proven to be an incredible force for good in the world through initiatives like The Dogecoin Foundation.

For Shopify Hackdays, then, the development team at Shopify took it upon themselves to make a gentlepeople's wager against the Business Development and Talent Acquisition teams at Shopify: that the Dev team could raise more money in Dogecoin than the so-called hustlers could by starting a Shopify business. Nothing like a good old-fashioned competition to raise some money for charity.

With all this said, Hackers vs Hustlers 2014 has started, and we could use your help getting all the doge possible in the hands of our charity Doge wallet! The hackers at Shopify have got every server we can find mining doge: the whole hadoop cluster, every beefy box with GPUs, a bunch of mac minis, and even the Raspberry Pis which power our office dashboards. We're mining lots, but maybe not enough to overtake the hustlers by the end. We'd like your help!

The trick is, the rules strictly prohibit donations of any sort, so we can't just ask for doge directly. We can, however, just so happen to leave these mining pool credentials lying around, and it is definitely ok with us if anyone out there wanted, out of the goodness of their own heart, to contribute to our mining efforts.

Pool URL: stratum+tcp://pool.teamdoge.com:3333

Worker Username: DataEng.TechBlog

Worker Password: iRAKDHJksM77Mf

All proceeds from both the hustlers' Shopify store and the hackers' mining efforts will be donated to the CompuCorps TECHYOUTH program, which gives children in low-income families the opportunity to learn technology skills and eventually get jobs in the technology field!

Doge Donations (which won't count for the competition, but will still go to CompuCorps) can be sent to this Dogecoin address: DM6xAdYmjMZd8eBNqZbse9cbGDRGb1ivfP.

Much thanks, many wow, very generous.

Continue reading

Building a Rack middleware

I'm Chris Saunders, one of Shopify's developers. I like to keep journal entries about the problems I run into while working on the various codebases within the company.

Recently we ran into an issue with authentication in one of our applications, and as a result I ended up learning a bit about Rack middleware. I feel the experience was worth sharing with the world at large, so here is a rough transcription of my entry. Enjoy!


I'm looking at invalid form submissions for users who were trying to log in via their Shopify stores. The issue was actually at a middleware level, since we were passing invalid data off to OmniAuth which would then choke because it was dealing with invalid URIs.

The bug in particular was that we were generating the shop URL based on the data the user submitted. Normally we'd expect something like mystore.myshopify.com or simply mystore, but of course forms can be confusing, and people put stuff in there like http://mystore.myshopify.com or, even worse, my store. We'd build up a URL and end up passing something like https://http::/mystore.myshopify.com.myshopify.com, causing an exception to be raised.

Another caveat is that we aren't even able to sanitize the input before passing it off to OmniAuth unless we add more code to the lambda that we pass into the setup initializer.

Adding more code to an initializer is definitely less than optimal, so we figured that we could implement this in a better way: adding a middleware to run before OmniAuth such that we could attempt to recover the bad form data, or simply kill the request before we get too deep.

We took a bit of time to learn about how Rack middlewares work, and looked to the OmniAuth code for inspiration since it provides a lot of pluggability and is what I'd call a good example of how to build out easily extendable code.

We decided that our middleware would be initialized with a series of routes to run a set of sanitization strategies on. Based on how OmniAuth works, I gleaned that the arguments after config.use MyMiddleWare would be passed into the middleware during the initialization phase - perfect! We whiteboarded a solution along those lines.
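
For illustration, the wiring we had in mind looks roughly like this; the middleware, strategy, and route names are made up, not the real implementation:

    # config/application.rb: everything after the middleware class is handed
    # to its #initialize when the stack is built
    config.middleware.insert_before OmniAuth::Builder, ShopDomainSanitizer,
      "/auth/shopify", [WhitespaceSanitizer.new]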

Now that we had a goal, we just had to implement it. We started off by building out the strategies, since they were extremely easy to test. The interface we decided upon was the following.
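
A sketch of that interface: a strategy is just an object that responds to call, takes the Rack::Request, and edits it in place (the class name is illustrative):

    # Every sanitization strategy follows this shape: mutate the request
    # rather than returning a new one
    class SanitizationStrategy
      def call(request)
        raise NotImplementedError
      end
    end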

We decided that the actions would be destructive, so instead of creating a new Rack::Request at the end of our strategies call, we'd change values on the object directly. It simplifies things a little bit but we need to be aware that order of operations might set some of our keys to nil and we'd have to anticipate that.

The simplest of sanitizers we'd need is one that cleans up our whitespace. Because we are building these for .myshopify.com domains we know the convention they follow: dashes are used as separators between words if the shop was created with spaces. For example, if I signed up with my super awesome store when creating a shop, that would be converted into my-super-awesome-store. So if a user accidentally put in my super awesome store we can totally recover that!
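
A sketch of that sanitizer, assuming the shop name arrives in a "shop" parameter (the parameter and class names are illustrative):

    class WhitespaceSanitizer
      def call(request)
        shop = request.params["shop"]
        return if shop.nil?
        # "my super awesome store" becomes "my-super-awesome-store"
        request.update_param("shop", shop.strip.gsub(/\s+/, "-"))
      end
    end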

Now that we have a sanitization strategy written up, let's work on our actual middleware implementation.

According to the Rack spec, all we really need to do is ensure that we return the expected result: an array consisting of a response code, a hash of headers, and an iterable that represents the content body. An example of the most basic Rack response follows.
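
That's just the canonical three-element array from the Rack spec:

    # status, headers, body
    [200, { "Content-Type" => "text/plain" }, ["Hello, world!"]]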

Per the Rack spec, a middleware is always initialized with the next Rack app as its first argument, followed by whatever else you give it. So let's get to the actual implementation.
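
Here's a sketch of a middleware along those lines; the class name, route handling, and parameters are illustrative rather than the code we actually shipped:

    require "rack"

    class ShopDomainSanitizer
      # app is the next Rack app in the chain; everything else comes from the
      # arguments given at configuration time
      def initialize(app, path, strategies = [])
        @app = app
        @path = path
        @strategies = strategies
      end

      def call(env)
        request = Rack::Request.new(env)
        # Run each strategy destructively against matching requests before
        # OmniAuth ever sees them
        @strategies.each { |strategy| strategy.call(request) } if request.path == @path
        @app.call(env)
      end
    end

Mounted in front of OmniAuth, as in the earlier wiring sketch, this gives us a place to recover bad shop domains before they ever turn into invalid URIs.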

That's pretty much it! We've written up a really simple middleware that takes care of cleaning up bad user input, which isn't necessarily a bad thing. People make mistakes, and we should try as much as possible to react to this data in a way that isn't jarring to the users of our software.

You can check out our implementation on Github and install it via RubyGems. Happy hacking!

Continue reading

Shopify open-sources Sarama, a client for Kafka 0.8 written in Go

Shopify has been hard at work scaling its data pipeline for quite some time now, and it had gotten to the point that plain old log files just wouldn’t cut it. We wanted to do more and more with our data, but ran into problems at every turn:

  • Batch processing of logs required log rotation, which introduced unacceptable latency into other parts of the pipeline.
  • Traditional log aggregation tools like Flume didn’t provide the features, reliability, or performance that we were looking for.
  • Fan-out configuration threatened to become unmanageable. We wanted anyone at Shopify to be able to use and experiment with this data, but the configuration to get our logs to that many endpoints was mind-bogglingly complex.

At the end of the day what we were looking for was a blazing-fast, scalable and reliable publish-subscribe messaging system, and LinkedIn’s Kafka project fit the bill perfectly. Kafka was open-sourced just a short while ago with the help of the Apache Foundation, and its upcoming 0.8 release (which LinkedIn has already deployed internally) provided exactly what we were looking for. With Kafka we would be able to easily aggregate, process (for internal dashboards), and store (into Hadoop) the millions of events that happen on our systems each day.

As always, there was a hitch. Shopify’s system language of choice is Go, a concurrency-friendly, safe and scalable language from Google. Kafka only provides clients for Java and Scala, which would have required deploying a heavy JVM instance to all of our servers. In this case, compromising one way or the other was not an option; the lure of being able to do Kafka message processing from Go was too strong, so we just wrote it ourselves! The result is Sarama, a Kafka 0.8 client for Go.

Sarama is a fully-functional MIT-licensed client library capable of producing and consuming messages from Kafka brokers. It has already been deployed to production on Shopify servers, and has processed millions of messages. 

Go check out Sarama today and let us know what you think:

Sarama Github Repository

Sarama Godoc Documentation

Continue reading

"Variant Barcode" and "Image Alt Text" Now in Product Export

The Export Products feature in your Shop's Administration page will now include the variant barcode for each of your products' variants and the alt text for each of your products' images.

The Variant Barcode column has been added between the Variant Taxable and Image Src columns on each line of the product export file.

The Image Alt Text column has been added to the end of each line of the product export file.

This change was put into production on July 18, 2013 at 12:00 EDT. If you have any custom processing of the Products CSV file that Shopify generates, please ensure that you update it to handle these new fields.

Continue reading

New Feature - Product Export Includes Published

The Export Products feature in your Shop's Administration page will now include whether each product is currently Published or not.

The Published column has been added to the end of each line of the Product export file.

This change will be put into production on April 18, 2013 at 12:00 EDT. If you have any custom processing of the Products CSV file that Shopify generates, please ensure that you have updated it to handle this new field.

 

Continue reading