Sysadmin

So You Think You Can Ops?

Or: How does one stand out from amongst a crowd of operations candidates?

This question, in one form or another, has been posed to or around me numerous times over the past few months. It is fresh in my mind because, as it happens, Shopify is presently looking to hire a few good operations engineers. Last week, I answered just such a question on Quora. I am reproducing an edited and expanded version of my answer here, hopefully to the benefit of a few highly-qualified candidates (hint, hint).

Be passionate. This sounds wishy-washy, but it almost always shines through. Spend time setting up your own systems and networks. Keep up to date with new technologies and, most importantly, use them. It may not count for as much with larger corporations, but your own personal (by which I mean non-professional) experiences with relevant technologies are just as useful and infinitely more telling; they show that you take initiative and enjoy the type of work you would be doing.

Have a presence online. When we hire developers, we look for GitHub repos, open-source contributions and personal projects. It’s not quite as easy for admins, but it’s not impossible. Some easy ways to establish a presence include participating on Server Fault and Quora, blogging about new technologies and expressing your opinion. Always be mindful of what you publish online; if it’s out there, prospective employers will find it.

Write a web app. It doesn’t have to be anything big. Hell, make your own blogging software if you want to. What it is does not matter; what matters is that you do it. Team up with a developer if you can. You can literally create relevant experience, and you will learn a ton about operations and devops in the process. I believe this is the greatest thing candidates with limited experience can do to help themselves.

Be a good person. Yes, your primary responsibilities will be to work on servers, but don’t think there is no human interaction; you will spend a lot of time interacting with developers in addition to members of your own team. This is especially true when working for a web services company. So make sure to show prospective employers and colleagues that you are someone who they would enjoy working with—friendly, courteous and happy to help.

Automate everything. This goes beyond cron jobs and a few bash scripts. If you ever want to take vacation or get something approaching a good night’s sleep, it would behoove you to know how to automate provisioning servers, deploying applications and backing up your database. We’re big advocates of Chef, but there are a lot of shops using Puppet out there.

To the cloud! And I’m not talking about downloading Microsoft Photo Fuse. Amazon recently introduced a free usage tier for new users of Amazon Web Services, so there’s no excuse not to familiarize yourself with EC2 and S3, two of the more popular cloud services out there. Bonus points will be awarded for having used the APIs.

Be on the cutting edge. Anybody can set up Linux, a web server and MySQL; if you want to work with the latest and greatest technologies, it would make sense to know a thing or two about those technologies. Here at Shopify, we do not shy away from new technologies, and we expect the same of our operations staff. If I see practical redis experience on your resume, you’ll probably be getting a call from me.

And of course, all the general advice applies, i.e. learn about your prospective employers, show an interest by asking questions during interviews, find out about your interviewer if possible, be thorough with your correspondence, etc. It’s not hard to spell-check, but it says a lot about you when you don’t.

I hope this helps someone out there in their quest to be the next operations superstar. Why not at Shopify? We’re looking to hire a few good Operations Engineers.


Session hijacking protection

There’s been a lot of talk in the past few weeks about “Firesheep”, a new program that lets users hijack other users’ accounts on many different websites. But there’s no need to worry about your Shopify account — we’ve taken steps to ensure your account can’t be hijacked and your data is safe.

Firesheep is a Firefox plugin (a program that integrates right into the Firefox browser) that makes it easy to perform HTTP session cookie hijacks when using an insecure connection on an untrusted network. This kind of attack is nothing new, but Firesheep makes it dead simple and shows how prevalent it is.

The attack consists of stealing cookie data over an untrusted network and using that data to log in to other people’s user accounts. Many websites that you use daily, including Shopify, are susceptible to this kind of attack.

Naturally we reacted to this by taking measures to ensure that this can’t happen to our users. All of your Shopify admin data is now fully secure, encrypted, and protected from Firesheep attacks.

Technical Details

The only way to ensure that cookie data, or any data sent over HTTP for that matter, is not being spied upon is end-to-end encryption. Currently the solution for this is SSL.

Last week we made the switch to all SSL in the Shopify admin area. This has been applied to all URLs and all subscription plans. This means that any request made to Shopify will be forced to use SSL for secure encryption.

But this is not quite enough to ensure that cookie data is not hijacked. By default HTTP cookies are sent over secured, as well as unsecured, connections. Without taking the extra step to secure the HTTP cookie as well, your session is still vulnerable.
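To make that extra step concrete, here is a minimal sketch in plain Ruby of what "securing the cookie" means at the HTTP level. The method name, cookie name and value are made up for illustration, and a real Rails app would set this through its session store options rather than building the header by hand.

```ruby
# Build a Set-Cookie header value by hand (illustration only).
# session_cookie_header and the cookie name/value are hypothetical.
def session_cookie_header(name, value, secure:)
  attributes = ["#{name}=#{value}", "path=/", "HttpOnly"]
  # The Secure attribute tells the browser to send this cookie back
  # only over HTTPS, never over a plain HTTP connection.
  attributes << "Secure" if secure
  attributes.join("; ")
end

puts session_cookie_header("_example_session", "abc123", secure: false)
puts session_cookie_header("_example_session", "abc123", secure: true)
```

Without the Secure attribute, the browser will happily attach the cookie to any plain HTTP request to the same host, which is exactly the traffic a tool like Firesheep listens for.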

The Problem

In Shopify’s case we weren’t able to use SSL for all traffic on the site. There are two main areas to Shopify, the shop frontend and the shop backend. In the backend is where a shop’s employees manage product data, fulfill orders, etc. In the frontend is where products are viewed, carts are filled, and checkout happens. All traffic in the backend happens under one domain, *.myshopify.com, with individual accounts having unique subdomains. One wildcard SSL cert allows us to protect the entire backend.

We can’t apply the same strategy to the shop frontends because we allow our merchants to use custom domains for their shops. So there are literally thousands of different domain names pointing at the Shopify servers, each of which would require an SSL cert. An unsecure frontend is not too worrisome since there is no sensitive data being passed around, just information about what’s stored in the cart.

However, this meant that we would need two different session cookies, one for use in the backend to be sent on encrypted connections only, and one for use in the frontend to be sent unencrypted.

Using two different session stores based on routes isn’t something that Ruby on Rails supports out of the box. You set one session store for your application that gets inserted into the middleware chain and handles sessions for your application.

The Solution

So we came up with a MultiSessionStore that delegates to multiple session stores based on the PATH_INFO. Shopify still has only one session store handling all of its sessions, but if the request comes in under the /admin path we’ll use the secure cookie, and if it comes in under another path we’ll use the unsecured cookie.

Here is our implementation in its entirety: https://gist.github.com/704099
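The gist above is the real thing. Purely as an illustration of the idea, and not a copy of that code, here is a rough sketch of path-based delegation in plain Ruby. FakeSessionStore, PathSessionRouter and the cookie keys are all hypothetical names; the stores are toy Rack-style objects that only record which cookie settings they would use.

```ruby
# Toy stand-in for a session store: a Rack-style callable that records
# which cookie settings it would apply. Hypothetical, not from the gist.
class FakeSessionStore
  def initialize(key:, secure:)
    @key = key
    @secure = secure
  end

  def call(env)
    env.merge("session.key" => @key, "session.secure" => @secure)
  end
end

# Picks a store by matching PATH_INFO against path prefixes and falls
# back to a default store when nothing matches.
class PathSessionRouter
  def initialize(stores, default:)
    @stores = stores # e.g. { "/admin" => store }
    @default = default
  end

  def call(env)
    store_for(env["PATH_INFO"].to_s).call(env)
  end

  private

  def store_for(path)
    match = @stores.find do |prefix, _store|
      path == prefix || path.start_with?("#{prefix}/")
    end
    match ? match.last : @default
  end
end

router = PathSessionRouter.new(
  { "/admin" => FakeSessionStore.new(key: "_secure_session", secure: true) },
  default: FakeSessionStore.new(key: "_session", secure: false)
)

p router.call("PATH_INFO" => "/admin/products")
p router.call("PATH_INFO" => "/cart")
```

Matching on "/admin" followed by a slash (or the exact path) keeps a frontend path such as "/administrator" from accidentally picking up the secure store.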

This last step, the secured cookie, ensures that session cookie data is never available for hijacking.


Shopify's path to Rails 3

The TL;DR version

Shopify recently upgraded to Rails 3!

We saw minor improvements in overall response times but what we’re most happy with is the new API – it means we get to write cleaner code and get features out faster.

However, this upgrade wasn’t trivial – as one of the largest and oldest Rails apps around, the adventure involved jumping through a few hoops. Here’s what we did and what you might consider if you’ve got an established Rails app that you’re thinking of upgrading.

First, some numbers

The first svn check-in to Shopify was on the release date of Rails 0.5. That was in July of 2004, six years ago, which according to @tobi is “roughly 65 years in internet time”.

At that time Shopify had only two active developers. Today it has eleven full time devs working on it.

The Shopify codebase has over 300 files in the app/models directory, over 130 controllers, and almost 100 gem dependencies.
$ find app/models/ -type f | wc -l
     327
$ find app/controllers/ -type f | wc -l
     131
$ bundle show | wc  -l
      95

Over the past 6 years Shopify has been under constant development, amassing nearly 12000 commits. This makes Shopify one of the oldest, most active Rails projects in existence.

Our process

There are many Rails 3 upgrade guides out there, but we didn’t try to follow any of them. We focused on doing as much as we could ahead of time to prepare for Rails 3, and then giving one big final push when 3.0 final was released.

When upgrading a large app to a major release like this we found there are some things you can do to prepare yourself, but at a certain point you’ve just got to bite the bullet and make the final push to get things working.

Bundler

Shopify had been using Bundler in production for 9 months before making the move to Rails 3. Like most, we weren’t convinced of its utility at first, but as the code got more stable we saw how much it helped with deployments and managing development environments. We think Bundler was absolutely the right choice for managing dependencies.

It was pretty painless to use Bundler with Rails 2.3.x; the Bundler documentation has everything that is needed. We’d definitely recommend doing this step ahead of time as it removes one more obstacle in the Rails 3 migration.

XSS

This was a big one. Some more numbers: Shopify has about 100 helper modules and 130 views. The task of updating all of our views/helpers for the new ‘safe by default’ XSS behaviour was a separate migration all its own. This too, we completed a few months before the release of 3.0.

There was no secret way to go about this, just the obvious back-breaking way. Here’s the basic process I followed:

  1. Run the functional tests. Fix any issues that show up there.
  2. Boot up Shopify in my development environment and click around, fixing any issues I see there.
  3. Manually scan through all of the modules in app/helpers, looking for anything suspicious.
  4. Deploy the code to our staging server. Have the team try it out and report any errors to a shared Google spreadsheet (great for collaborative editing).
  5. Code review.
  6. Deploy the code to production and hope that no issues slipped through.

N.B. When new issues come in, do your best to use ack (or some other project search tool) to find any instances of that issue in other views/helpers and correct those as well.
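For anyone who hasn’t hit the ‘safe by default’ change yet, here is a toy model of the idea in plain Ruby. It is not how Rails implements it (Rails 3 uses html_safe and a safe buffer); TrustedHtml and render_value are hypothetical names used only to show why every helper that builds HTML needed a look.

```ruby
require "erb"

# Marker subclass: a string the author has explicitly declared safe.
# Hypothetical stand-in for what html_safe does in Rails 3.
class TrustedHtml < String; end

# Escape everything on output unless it was explicitly marked as trusted.
def render_value(value)
  return value if value.is_a?(TrustedHtml)

  ERB::Util.html_escape(value.to_s)
end

user_input = "<script>alert(1)</script>"
helper_output = TrustedHtml.new("<strong>Sale</strong>")

puts render_value(user_input)    # markup is escaped
puts render_value(helper_output) # markup passes through untouched
```

A helper that concatenates tags into an ordinary string lands in the first branch, so its markup shows up escaped on the page until the helper is updated to mark its output as safe. That is the kind of issue the steps above were hunting for.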

The rest

After getting Bundler and XSS out of the way, the rest of the migration was done as one large chunk. Some of the work in upgrading to Rails 3 was actually going on in parallel to the XSS work.

The first commit to our rails3 branch was made back in February when the first Rails 3 beta was released. At that point we didn’t know how much work it would be to get Shopify running on Rails 3. We were excited about the launch of the beta and the prospect of getting Shopify using it soon.

After a few days of work we ran into some major blockers that were keeping the app from functioning. Work was abandoned on the rails3 branch for 5 months while the 3.0 release became more stable. When the first release candidate came out in July we resurrected the rails3 branch.

From then (mid-July) until mid-October the rails3 branch saw pretty constant action, never going more than a few days without a commit. There were lulls during the XSS migration and whenever devs took on other projects alongside the migration. We remained mindful of the fact that 3.0 final wasn’t yet released and didn’t want to put our changes into production until we had the confidence of that final release.

Since this whole process took several months there was a lot of activity going on in the master branch at the same time. The only advice to offer is merge early and merge often.

When the final release came out we once again underestimated how much work would be involved in getting Shopify the rest of the way on to Rails 3. The day that it was released @tobi put something like the following into our Campfire room “Let’s get Shopify running on Rails 3! Any devs who want to help join the Meeting Room [campfire room].” It was another few weeks before all was finished.

Major stumbling blocks

Routes

Shopify also has lots of routes.

$ rake routes | wc -l
     846

At the beginning of the upgrade process we used the routes rake task that comes with the rails_upgrade plugin but we were still plagued with missing routes throughout the upgrade.

Although our routes file tripled in size, the increase was worth it because the new routing API is much nicer to work with.

The old
map.namespace :admin do |admin|
  admin.resources :products, :collection => { :inventory => :get,
    :count => :get },  
    :member => { :duplicate => :post, 
      :sort => :post,
      :reorganize => :any,
      :update_published_status => :post } do |products|        
    products.resources :variants, :controller => "product_variants", :collection => { :reorder => :post, :set => :post, :count => :get }
  end
end
The new
namespace :admin do
  resources :products do
    collection do
      get :count
      get :inventory
    end 

    member do
      post :sort
      post :duplicate
      post :update_published_status
      match :reorganize
    end 

    resources :variants, :controller => 'product_variants' do
      collection do
        get :count
        post :set
        post :reorder
      end 
    end 
  end
end

Libraries

Like everyone else we were tripped up by libraries in need of upgrades for Rails 3 compliance. There was a lot less of this than you’d expect because Shopify implements so much of what it needs internally. Lots of code in Rails core began in Shopify’s code base.

There were updates required to the plugins that Shopify maintains. Otherwise, when we found issues with libraries we were happy to discover that other maintainers were diligent and had already pushed fixes for Rails 3 compatibility; it was just a matter of updating the library versions we were tracking.

helper :all

helper(:all) was a configuration option in Rails 2.x. You could add it to a controller and that controller would have access to all helper modules defined in your application. In 2.x this was part of the default Rails template, but it could be removed for users who didn’t want it.

In Rails 3.0 this has been moved into ActionController::Base and it can no longer be turned off. This can create very weird behaviour like the following: https://gist.github.com/517669

This was causing issues for us since a lot of our helpers define methods with the same name. We ended up submitting a patch to Rails that let us continue to use routes with the default naming scheme. The fix is to use the clear_helpers method in your ApplicationController:
class ApplicationController < ActionController::Base
  # Drop the helpers that Rails 3.0 includes by default
  clear_helpers
  # ...
end
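To see why same-named helper methods are a problem once every helper is mixed in, here is a small plain-Ruby sketch. The module and class names are hypothetical and this is ordinary Ruby module lookup, not the contents of the gist above; the assumption is simply that each helper module ends up included into the same view context.

```ruby
# Two hypothetical helper modules that both define #title.
module ProductsHelper
  def title
    "Products"
  end
end

module OrdersHelper
  def title
    "Orders"
  end
end

# Stand-in for a view context that has every helper mixed in.
class ViewWithAllHelpers
  include ProductsHelper
  include OrdersHelper
end

# Ruby resolves #title to the module that was included last, so the
# ProductsHelper version is silently shadowed.
puts ViewWithAllHelpers.new.title
```

With only the relevant helper included, each controller’s views get the title they expect, which is what turning the automatic include off buys you.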

Documentation

External services

Shopify integrates with a myriad of external services. Payment gateways through ActiveMerchant, fulfillment services through ActiveFulfillment, shipping providers through ActiveShipping, product search engines, Google Analytics, Google Checkout, the list goes on.

Ensuring that these integrations continued working was very important for us and we would have had issues had we not thoroughly tested them. Don’t overlook this step.

Looking ahead

Towards the end of the upgrade we (jokingly) asked ourselves if it was really worthwhile to upgrade to Rails 3. After all, we were doing just fine with Rails 2.x, and upgrading to 3.0 was not trivial.

To give you an idea of how much code was changed, here’s the diffstat from Github:

But we soon came to realize that there are a lot of exciting things coming in future releases in the 3.x series and this is the way forward. We’re really excited about getting to use stuff like Arel 2.0, Automatic Flushing, Identity Map, and lots of other goodies.

The Rails project and its surrounding ecosystem are moving ahead quickly. By staying on top of it, we can provide the best tools for our developers and the best experience for our customers.


Outage Report

Last night an outage occurred with Shopify’s asset server that was not detected by our monitoring setup. Our monitoring normally notifies 3 staff members via SMS within minutes of any technical issues. Unfortunately this issue went undetected and therefore none of our admins were notified. This led to the extended outage of Shopify assets like images and stylesheets.

Following is a detailed post-mortem from our system administrator Alex, for the technically inclined:

What happened: yesterday we briefly switched to S3 for asset hosting. At this time, two additional changes were made: The asset proxy’s hostname was changed (from an EC2-provided default) and monitoring was disabled (because S3 returns 403 Access Denied instead of our usual Shopify Asset not found page, which Pingdom interprets as a fail). We ended up rolling the S3 change back. I reverted the asset proxy changes quickly as I was on a bench on the street while walking home, but I did not revert the hostname or monitoring changes. At some point last night the log rotation script refreshed Squid, which freaked out because it could not resolve its own hostname for some reason, which triggered the downtime.

We are very sorry about this and we are in the process of tightening up our monitoring and escalation setup to ensure that a problem like this cannot go undetected again.


Issues Resolved

At approximately 7:45 Eastern time on Sunday, March 22nd, the myshopify.com server cluster experienced a Distributed Denial of Service (DDoS) attack, causing our main firewall to become extremely slow. This slowness prevented our other backup firewalls from taking over. This attack resulted in the entirety of Shopify.com becoming unavailable. We were able to force data to the other firewalls but they too were immediately overrun by the DDoS. It was not until we called on the admins of our data center to help us resolve this issue that we learned it was a DDoS on Shopify.

As of 10:35 EST Shopify.com is back up and running. We sincerely apologize for this downtime, and that this type of attack was able to take place.

Update #2: Related to the first problems, many people started seeing the following error around 2:00 EST: Liquid error: s3.amazonaws.com temporarily unavailable. In many cases this led to the admin being unavailable or the storefront not rendering correctly. This issue was resolved as of 2:28 EST.

As you can imagine this has been an interesting day for us. We are taking steps to prevent it from ever being able to happen again and are running a full analysis of the events. A truckload of new server hardware is already en route.

Tobias Lütke
CEO, Founder


Shopify DNS Service Fully Restored

I’m happy to announce that DNS service for the shopify.com, myshopify.com and jadedpixel.com domains has been fully migrated to our new DNS hosting provider, www.easydns.com.

EasyDNS operates a redundant, geographically-distributed DNS server network, with specific measures in place to mitigate DDoS (distributed denial of service) attacks like the attack that caused the outage with our former DNS provider.

What this means for you is that you can count on your Shopify stores being available on a 24/7/365 basis as you expect (and deserve) them to be.

Thank you for weathering this bump on the ‘net with us, and we look forward to providing you with dependable ecommerce services for many years to come.
