Docker at Shopify: How we built containers that power over 100,000 online shops


This is the second in a series of blog posts describing our evolution of Shopify toward a Docker-powered, containerized data center. This instalment will focus on the creation of the container used in our production environment when you visit a Shopify storefront.

Read the first post in this series here.

Why containerize?

Before we dive into the mechanics of building containers, let's discuss motivation. Containers have the potential to do for the datacenter what consoles did for gaming. In the early days of PC gaming, each game typically required video or sound driver massaging before you got to play. Gaming consoles, however, offered a different experience:

  • predictable: cartridges were self-contained fun: always ready-to-run, with no downloads or updates.
  • fast: cartridges used read-only memory for lightning fast speeds.
  • easy: cartridges were robust and largely child-proof - they were quite literally plug-and-play.

Predictable, fast, and easy are all good things at scale. Docker containers provide the building blocks to make our data centers easier to run and more adaptable by placing applications into self-contained, ready-to-run units much like cartridges did for console games.

Continue reading article ›

Rebuilding the Shopify Admin: Improving Developer Productivity by Deleting 28,000 lines of JavaScript


This September, we quietly launched a new version of the Shopify admin. Unlike the launch of the previous major iteration of our admin, this version did not include a major overhaul of the visual design, and for the most part, would have gone largely unnoticed by the user.

Why would we rebuild our admin without providing any noticeable differences to our users? At Shopify, we strongly believe that any decision should be open to question at any time. In late 2012, we started to question whether our framework was still working for us. This post will discuss the problems in the previous version of our admin, and how we decided that it was time to switch frameworks.

Continue reading article ›

Building an Internal Cloud with Docker and CoreOS


This is the first in a series of posts about adding containers to our server farm to make it easier to scale, manage, and keep pace with our business.  

The key ingredients are:

  • Docker: container technology for making applications portable and predictable
  • CoreOS: provides a minimal operating system, systemd for orchestration, and Docker to run containers

Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1700 cores and 6 TB RAM.

Continue reading article ›

Kafka Producer Pipeline for Ruby on Rails


In the early fall our infrastructure team was considering Kafka, a highly available message bus. We were looking to solve several infrastructure problems that had come up around that time.

  • We were looking for a reliable way to collect event data and send it to our data warehouse.

  • We were considering a more service-oriented architecture, and needed a standardized way of message passing between the components.

  • We were starting to evaluate containerization of Shopify, and were searching for a way to get logs out of containers.

We were intrigued by Kafka due to its highly available design. However, Kafka runs on the JVM, and its primary user, LinkedIn, runs a full JVM stack. Shopify is mainly Ruby on Rails and Go, so we had to figure out how to integrate Kafka into our infrastructure.

Continue reading article ›

Help the Shopify Dev Team Raise Money for Charity!

A recent phenomenon has taken the tech world by storm: Dogecoin. Though goofy and grammatically unique, Dogecoin has proven to be an incredible force for good in the world through initiatives like The Dogecoin Foundation.

For Shopify Hackdays, then, the development team at Shopify took it upon themselves to make a gentlepeople's wager against the Business Development and Talent Acquisition teams at Shopify: that the Dev team could raise more money in Dogecoin than the so-called hustlers could by starting a Shopify business. Nothing like a good old-fashioned competition to raise some money for charity.

With all this said, Hackers vs Hustlers 2014 has started, and we could use your help getting all the doge possible into the hands of our charity Doge wallet! The hackers at Shopify have got every server we can find mining doge: the whole Hadoop cluster, every beefy box with GPUs, a bunch of Mac minis, and even the Raspberry Pis which power our office dashboards. We're mining lots, but maybe not enough to overtake the hustlers by the end. We'd like your help!

The trick is, the rules strictly prohibit donations of any sort, so we can't just ask for doge directly. We can, however, just so happen to leave these mining pool credentials lying around, and it is definitely OK with us if anyone out there wanted, out of the goodness of their own heart, to contribute to our mining efforts.

Pool URL: stratum+tcp://

Worker Username: DataEng.TechBlog

Worker Password: iRAKDHJksM77Mf

All proceeds from both the hustlers' Shopify store and the hackers' mining efforts will be donated to the CompuCorps TECHYOUTH program, which gives children in low-income families the opportunity to learn technology skills and eventually get jobs in the technology field!

Doge Donations (which won't count for the competition, but will still go to CompuCorps) can be sent to this Dogecoin address: DM6xAdYmjMZd8eBNqZbse9cbGDRGb1ivfP.

Much thanks, many wow, very generous.

Continue reading article ›

Building a Rack middleware

I'm Chris Saunders, one of Shopify's developers. I like to keep journal entries about the problems I run into while working on the various codebases within the company.

Recently we ran into an issue with authentication in one of our applications, and as a result I ended up learning a bit about Rack middleware. I feel that the experience was worth sharing with the world at large, so here is a rough transcription of my entry. Enjoy!

I'm looking at invalid form submissions for users who were trying to log in via their Shopify stores. The issue was actually at a middleware level, since we were passing invalid data off to OmniAuth which would then choke because it was dealing with invalid URIs.

The bug in particular was we were generating the shop URL based on the data that the user was submitting. Normally we'd be expecting something like or simply mystore, but of course forms can be confusing and people put stuff in there like or even worse my store. We'd build up a URL and end up passing something like https://http::/ and cause an exception to get raised.

Another caveat is that we aren't able to even sanitize the input before passing it off to OmniAuth, unless we were to add more code to the lambda that we pass into the setup initializer.

Adding more code to an initializer is definitely less than optimal, so we figured that we could implement this in a better way: adding a middleware to run before OmniAuth such that we could attempt to recover the bad form data, or simply kill the request before we get too deep.

We took a bit of time to learn about how Rack middlewares work, and looked to the OmniAuth code for inspiration since it provides a lot of pluggability and is what I'd call a good example of how to build out easily extendable code.

We decided that our middleware would be initialized with a series of routes to run a bunch of sanitization strategies on. Based on how OmniAuth works, I gleaned that the arguments after config.use MyMiddleWare would be passed into the middleware during the initialization phase - perfect! We whiteboarded a solution that would work as follows:

Now that we had a goal we just had to implement it. We started off by building out the strategies since that was extremely easy to test. The interface we decided upon was the following:
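As a rough sketch (not the original listing), a strategy could be any object that responds to `call` and receives a mutable hash of request parameters plus the key to clean. The names `DowncaseStrategy` and `CLEAR_IF_BLANK`, and the choice of a plain Hash rather than a request object, are assumptions made for illustration:

```ruby
# Hypothetical interface: a strategy is any object that responds to
# call(params, key) and mutates the params hash in place.
class DowncaseStrategy
  def call(params, key)
    value = params[key]
    return if value.nil?          # an earlier strategy may have cleared the key
    params[key] = value.downcase
  end
end

# A lambda satisfies the same interface without needing a class.
CLEAR_IF_BLANK = lambda do |params, key|
  params[key] = nil if params[key].to_s.strip.empty?
end

params = { "shop" => "MyStore" }
[DowncaseStrategy.new, CLEAR_IF_BLANK].each { |s| s.call(params, "shop") }
puts params["shop"]
```

Keeping the interface this small means each strategy can be unit tested with nothing more than a hash.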

We decided that the actions would be destructive, so instead of creating a new Rack::Request at the end of our strategies call, we'd change values on the object directly. It simplifies things a little bit but we need to be aware that order of operations might set some of our keys to nil and we'd have to anticipate that.

The simplest of sanitizers we'd need is one that cleans up our whitespace. Because we are building these for domains we know the convention they follow: dashes are used as separators between words if the shop was created with spaces. For example, if I signed up with my super awesome store when creating a shop, that would be converted into my-super-awesome-store. So if a user accidentally put in my super awesome store we can totally recover that!
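A minimal sketch of such a whitespace sanitizer, assuming a strategy receives a mutable params hash and the key to clean. The class name `WhitespaceSanitizer` and the `"shop"` key are hypothetical, not necessarily what the released gem uses:

```ruby
# Hypothetical strategy: trims the value and replaces each run of
# whitespace with a single dash, mutating the params hash in place.
class WhitespaceSanitizer
  def call(params, key)
    value = params[key]
    return unless value.is_a?(String)   # key may be missing or already nil
    params[key] = value.strip.gsub(/\s+/, "-")
  end
end

params = { "shop" => "  my super awesome store " }
WhitespaceSanitizer.new.call(params, "shop")
puts params["shop"]  # my-super-awesome-store
```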

Now that we have a sanitization strategy written up, let's work on our actual middleware implementation.

According to the Rack spec, all we really need to do is return the expected result: an array of three things, namely a response code, a hash of headers, and an iterable that represents the content body. An example of the most basic Rack response is:
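For illustration, here is a sketch of about the smallest Rack application possible. The constant name `HelloApp` and the response text are made up; the three-element return value is the part that matters:

```ruby
# A Rack app is any object that responds to call(env) and returns
# [status, headers, body], where body responds to each.
HelloApp = lambda do |env|
  [200, { "content-type" => "text/plain" }, ["Hello, Rack!"]]
end

# Calling it directly needs no server at all.
status, headers, body = HelloApp.call({})
puts status
body.each { |chunk| puts chunk }
```

In a `config.ru` file, `run HelloApp` would serve this response for every request.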

Per Rack convention, a middleware's initializer always receives the Rack app as its first argument, followed by whatever other arguments were supplied. So let's get to the actual implementation:
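The following is a simplified sketch rather than the released implementation. It assumes the form arrives as a URL-encoded request body, that strategies respond to `call(params, key)`, and that a malformed body should be rejected with a 400. The class name `ParamSanitizer` and its keyword arguments are hypothetical:

```ruby
require "stringio"
require "uri"

# Hypothetical middleware: cleans one form field on selected routes
# before the request reaches the wrapped app (for example OmniAuth).
class ParamSanitizer
  # app:        the next Rack app in the stack (always the first argument)
  # routes:     request paths to sanitize, e.g. ["/login"]
  # key:        the form field to clean, e.g. "shop"
  # strategies: objects responding to call(params, key)
  def initialize(app, routes:, key:, strategies:)
    @app = app
    @routes = routes
    @key = key
    @strategies = strategies
  end

  def call(env)
    if @routes.include?(env["PATH_INFO"])
      begin
        sanitize!(env)
      rescue ArgumentError
        # The body could not be parsed as form data: stop here.
        return [400, { "content-type" => "text/plain" }, ["Bad Request"]]
      end
    end
    @app.call(env)
  end

  private

  # Decode the form body, run every strategy over it, then put the
  # re-encoded body back so downstream apps read the cleaned data.
  def sanitize!(env)
    input = env["rack.input"]
    return if input.nil?
    params = URI.decode_www_form(input.read).to_h
    @strategies.each { |strategy| strategy.call(params, @key) }
    body = URI.encode_www_form(params)
    env["rack.input"] = StringIO.new(body)
    env["CONTENT_LENGTH"] = body.bytesize.to_s
  end
end
```

Because the arguments after the class name are forwarded to `initialize`, the routes and strategies can be supplied at the point where the middleware is registered. Note that converting the decoded pairs with `to_h` keeps only the last value for a repeated field, which is acceptable for a single login field but would need more care for general forms.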

That's pretty much it! We've written up a really simple middleware that takes care of cleaning up some bad user input, which isn't necessarily a bad thing. People make mistakes and we should try as much as possible to react to this data in a way that isn't jarring to the users of our software.

You can check out our implementation on GitHub and install it via RubyGems. Happy hacking!

Continue reading article ›

Shopify open-sources Sarama, a client for Kafka 0.8 written in Go

Shopify has been hard at work scaling its data pipeline for quite some time now, and it had gotten to the point that plain old log files just wouldn’t cut it. We wanted to do more and more with our data, but ran into problems at every turn:

  • Batch processing of logs required log rotation, which introduced unacceptable latency into other parts of the pipeline.
  • Traditional log aggregation tools like Flume didn’t provide the features, reliability, or performance that we were looking for.
  • Fan-out configuration was promising to become unmanageable. We wanted anyone at Shopify to be able to use and experiment with this data, but the configuration to get our logs to that many endpoints was mind-bogglingly complex.

At the end of the day what we were looking for was a blazing-fast, scalable and reliable publish-subscribe messaging system, and LinkedIn’s Kafka project fit the bill perfectly. Open-sourced just a short while ago with the help of the Apache Foundation, Kafka’s upcoming 0.8 release (which LinkedIn has already deployed internally) provided exactly what we were looking for. With Kafka we would be able to easily aggregate, process (for internal dashboards), and store (into Hadoop) the millions of events that happen on our systems each day.

As always, there was a hitch. Shopify’s system language of choice is Go, a concurrency-friendly, safe and scalable language from Google. Kafka only provides clients for Java and Scala, which would have required deploying a heavy JVM instance to all of our servers. In this case, compromising one way or the other was not an option; the lure of being able to do Kafka message processing from Go was too strong, so we just wrote it ourselves! The result is Sarama, a Kafka 0.8 client for Go.

Sarama is a fully-functional MIT-licensed client library capable of producing and consuming messages from Kafka brokers. It has already been deployed to production on Shopify servers, and has processed millions of messages. 

Go check out Sarama today and let us know what you think:

Sarama GitHub Repository

Sarama Godoc Documentation

Continue reading article ›

"Variant Barcode" and "Image Alt Text" Now in Product Export


The Export Products feature in your Shop's Administration page will now include the variant barcode for each of your products' variants and the alt text for each of your products' images.

The Variant Barcode column has been added between the Variant Taxable and Image Src columns on each line of the product export file:

The Image Alt Text column has been added to the end of each line of the product export file:

This change was put into production on July 18, 2013 at 12:00 EDT. If you have any custom processing of the Products CSV file that Shopify generates, please ensure that you update it to handle these new fields.
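One way to keep custom processing robust against added columns is to read fields by header name instead of by position. The sketch below uses Ruby's standard CSV library; the sample row is made up, and the `Handle` column and the helper name `barcodes_by_handle` are assumptions for illustration:

```ruby
require "csv"

# Made-up sample standing in for a product export file.
SAMPLE_EXPORT = <<~CSV
  Handle,Variant Taxable,Variant Barcode,Image Src,Image Alt Text
  blue-shirt,true,0123456789012,https://example.com/blue.png,Front of the blue shirt
CSV

# Look fields up by header name so that columns inserted in the middle
# or appended at the end do not shift the values being read.
def barcodes_by_handle(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, out|
    out[row["Handle"]] = row["Variant Barcode"]
  end
end

p barcodes_by_handle(SAMPLE_EXPORT)
```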

Continue reading article ›

New Feature - Product Export Includes Published

The Export Products feature in your Shop's Administration page will now include whether each product is currently Published or not.

The Published column has been added to the end of each line of the Product export file:

This change will be put into production on April 18, 2013 at 12:00 EDT. If you have any custom processing of the Products CSV file that Shopify generates, please ensure that you have updated it to handle this new field.


Continue reading article ›

Spring Into a New Job

Do you want to be a Shopify Guru? Our support team is now running 24/7, so we are looking for Gurus to work Evenings, Weekends and Overnights and help our merchants around the clock.

While we’re not huge fans of traditional interviews here at Shopify, we are HUGE fans of parties. So instead of inviting you in for a ho-hum one-on-one, we’re hosting a big bash for potential Shopify Gurus on April 18th from 7-10pm. Specific location is TBA, but it will be in Ottawa, Canada.

You’ll get to hang out with fun people who are interested in working at Shopify, chat about the potential job, have a few laughs – and a few drinks! You can meet the people who might become your colleagues, and get a great feel for the job, the space, and Shopify’s culture.

But wait! Before you jump on the party bus, you should probably know exactly what a Shopify Guru is:

Shopify Guru
[shaw-puh-fahy goo-roo]
1. A rare, interesting character who gets a kick out of helping Shopify’s customers get their stores up and running.  A guru is comfortable on the phone and can type like a maniac. In the wild, gurus are often spotted laughing at or telling a great joke, as they have a naturally keen sense of humour.

Space is limited, so this party is invite-only. If you’re interested in attending, please enter your info in the provided fields at the bottom of one of our Guru job postings: weekend and evenings and overnight.

Mention in your application that you want to attend our Guru party on April 18th, and we’ll be in touch!


Continue reading article ›
