Split Testing for Shopify: How to Run Valid Tests

Split testing is a way of comparing two versions of a web page, email, push notification, app, or other product to determine which is more successful.

by Shopify Staff • 11 Jan 2023 • 10 min read

When your business thrives, you might go months or years without updating your website, storefront, or advertising content—the impulse to not change a winning team is real. However, when sales start ticking down, you may want to revamp your messaging to boost your brand.

Some tweaks are better than others. Marketing decisions based on gut feelings or personal preference can produce positive results, but they’re no sure bet. One of the best ways to experiment with different messages is to run split tests—trying two different versions of an ad or web page copy to see which version nets better results.

Successful split testing can boost your bottom line and lead to long-term business growth. Here’s how to get started.

What is split testing?

Split testing, or A/B testing, compares two versions of a webpage or app to determine which drives more conversions. To implement it, create two content variations and analyze user engagement and sales data. This method helps optimize marketing strategies for improved results.

For example, you can divide a sample group of users into two segments and show each group a different version of an ad campaign. You would then measure the performance of each version to choose which to use for broader implementation.

Split testing can help optimize your business messaging by showing you which design elements, wording, or other factors are most effective in driving conversions. The standard split testing methodology starts with identifying a message to test, including:

Advertising copy. When launching a marketing campaign, you can draft two versions of an ad and run a split test to see which test version gets more clicks.
Landing pages. A landing page is the first web page visitors see when they arrive on your website or ecommerce store. Most landing pages direct visitors to take action, such as making a purchase or filling out a form. You can A/B test two landing pages to see which version converts more customers.
Page elements. Rather than split test two entirely different web pages, you can change individual components of a page or ecommerce store. For instance, you can try two versions of an on-screen button: one that says Checkout and another that says Buy Now. Then, using quantitative data such as link clicks or sales conversions, determine which leads to more sales.
Email subject lines. You can run split tests to try different subject lines if you send out marketing newsletters, as many small businesses do. Then, measure the open rate and click-through rates to determine which subject line performs better.
Push notifications. If your business has an app, split test the push notifications you send to users through the app, or run a split test to see whether you should send out push notifications in the first place. If you notice an increase in conversions based on the notifications or on the different wording of each notification, they’re likely effective. If many users opt out of receiving notifications, it may be a sign messages are not relevant or valuable to them.
Call to action. Calls to action (CTAs) are the buttons or links in marketing emails. These are essential conversion rate optimization elements because they encourage recipients to take a specific action—such as making a purchase, joining a loyalty program, or signing up for a newsletter.

Why should you run a split test?

You can learn much from split testing, provided you collect data from your experiments and objectively analyze it.

With split testing, where you test versions of an ad or web page, you gather empirical data about which message performs better with your audience and tailor your marketing efforts accordingly. Without split testing, you may just rely on gut instinct or the subjective opinions of close friends as you prepare for a big ad spend—not empirical data.

Some ad platforms come with free split testing tools. For example, Google Ads offers split testing tools through Google Analytics at no extra charge. However, these dedicated tools aren’t required. For instance, if you’re sending out a marketing email through HubSpot or Shopify Messaging, you can send one message to half your list and a different message to the other half. Your split test results might then show you which email subject line leads to higher open rates.

How to design and run a split test

Choose the element you want to test
Determine your success metrics
Create your A and B versions for testing
Determine your sample size
Determine a minimum detectable effect
Establish a threshold for statistical significance
Launch your campaign
Analyze the results
Apply your findings

To run a successful split test, you need a clear goal (What are you trying to find out?), an adequate sample size, and an objective way to analyze the data collected. Here’s how you might run your first split test as an ecommerce entrepreneur:

1. Choose the element you want to test

With split testing, you change one variable at a time. Otherwise, you can’t be sure which change caused the difference in results. Decide what you want to test. It might be a call to action in an email, the text of a Google ad, or a promotion on your website’s landing page.

2. Determine your success metrics

What’s your goal? Choose the right metrics based on your business goals and the specific goals of your split test. You might want more visitors or social media followers, a higher conversion rate, increased engagement, or more sales.

3. Create your A and B versions for testing

Next, you want to create two versions of the element you want to test—an ad, web page, or email. Make sure you’re only changing one variable at a time. In other words, the two versions should be as similar as possible, except for the element you’re testing. You can run multiple tests to experiment with additional variables down the line.

4. Determine your sample size

Specify the percentage of your audience that receives each version. If you have a fixed audience (like an email list), you create a sample size by sending one message to half the recipients and a different message to the other half. If you have an open-ended audience (like the number of people visiting your website), you may base your sample size on the number of days you run a particular site version. Regardless, you want both versions to reach the same number of people or run for the same number of days.
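For a fixed audience such as an email list, the 50/50 split can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any email tool: the `split_audience` function, the seed value, and the example addresses are all hypothetical. Shuffling before cutting the list matters, because splitting an unshuffled list (for instance, one sorted by signup date) could put your oldest customers in one group and your newest in the other.

```python
import random

def split_audience(recipients, seed=42):
    """Randomly split recipients into two groups of (nearly) equal size.

    The seed is an arbitrary, hypothetical value; fixing it makes the
    split reproducible so you can look up later who received which version.
    """
    shuffled = list(recipients)            # copy, so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # shuffle with a private, seeded generator
    midpoint = len(shuffled) // 2
    # If the list has an odd length, group B receives the one extra recipient.
    return shuffled[:midpoint], shuffled[midpoint:]

# Hypothetical email list of 1,000 subscribers.
emails = [f"customer{i}@example.com" for i in range(1000)]
group_a, group_b = split_audience(emails)
print(len(group_a), len(group_b))  # 500 500
```

Group A would then receive version A of the message and group B version B.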

5. Determine a minimum detectable effect

You must ensure that the data you’re getting isn’t based on chance. To do so, establish the minimum detectable effect (MDE) and the statistical significance for your data set. In plain English, the minimum detectable effect is the amount of change you’re looking for. For instance, you might run a split test on an email campaign where you’re looking for 20% more entries on a signup form. If you only see a 5% shift, or perhaps equal traffic from both versions of the email, you haven’t reached your MDE.
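To make the arithmetic concrete, here is a small Python sketch that compares an observed change against an MDE. It assumes the MDE is expressed as a relative lift over the original version (20% more signups than version A), which is one common reading of the example above. The function names and the conversion rates are hypothetical.

```python
def relative_lift(control_rate, variant_rate):
    """Relative change of the variant over the control (0.20 means 20% more)."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (variant_rate - control_rate) / control_rate

def meets_mde(control_rate, variant_rate, mde=0.20):
    """True if the observed relative lift is at least the chosen MDE."""
    return relative_lift(control_rate, variant_rate) >= mde

# Hypothetical results: 4.0% of version A recipients signed up, 4.2% of version B.
lift = relative_lift(0.040, 0.042)
print(f"Observed lift: {lift:.0%}")   # about 5%, short of a 20% MDE
print(meets_mde(0.040, 0.042))        # False
```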

6. Establish a threshold for statistical significance

Statistical significance is a trickier topic, and one that most split testing tools, like Google Analytics, can handle for you. In basic terms, a result is statistically significant when the difference you observe is unlikely to have occurred by chance alone, and that depends heavily on sample size. If your sample is tiny, random chance may play an outsized role in your results.

For instance, if you A/B test two sales promotions and only 10 people see each advertisement, you may get some “false positives.” This might be because some of your positive respondents were looking to spend money no matter what, and the actual promotion did not affect their decision. When you broaden your sample size, you get a more representative portrait of the public.

As Peep Laja, founder of CXL, puts it: “Statistical significance does not equal validity—it’s not a stopping rule. … In most cases, you’ll want to run your tests two, three, or four weeks, depending on how fast you can get the needed sample.”
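If you are curious what a testing tool does behind the scenes, one common approach is a two-proportion z-test. The sketch below is an illustration of that general technique using only Python's standard library; it is not the method of any particular tool, and the function name and the example numbers are hypothetical. It also relies on a normal approximation that is itself unreliable for samples as small as 10, which is one more reason to collect a larger sample.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of conversions for each version
    n_a, n_b:       number of people who saw each version
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the conversion rate if both versions performed identically.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same conversion rates (30% vs. 50%), very different sample sizes.
_, p_small = two_proportion_z_test(3, 10, 5, 10)
_, p_large = two_proportion_z_test(300, 1000, 500, 1000)
print(p_small < 0.05)  # False: with 10 people per version, chance can explain the gap
print(p_large < 0.05)  # True: with 1,000 people per version, it is very unlikely to be chance
```

A threshold of 0.05 is a widely used convention, not a rule. As the quote above notes, crossing it does not by itself mean the test should stop.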

7. Launch your campaign

With your split test set up, you’re now ready to launch your campaign, which means showing two versions to equal numbers of people. Sometimes, you can run a split test on your own, such as sending half the members of your email list one message and the other half a different message.

However, in most cases, your job is much easier if you use a split testing tool. The Shopify App Store has several split testing tools. (Most are free or low-cost.) You can also opt for tools like Google Optimize, Optimizely, and VWO. Let your campaign run for an extended period, usually weeks, so you collect enough data to reach statistical significance.

8. Analyze the results

Once your split test has run for a sufficient amount of time or been exposed to enough people, it’s time to analyze the results to see which version performed better. Go back to the success metric you started with and track how it’s changed. See if the changes you observe meet your minimum detectable effect, if you set one.

Remember that the results of a single split test may be inconclusive, so you may want to run additional tests to validate your findings.
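One way to tell a clear winner from an inconclusive result is to put a confidence interval around the difference in conversion rates. The Python sketch below uses a simple 95% interval (the Wald interval), which is an assumption chosen for illustration; testing tools may use other methods. The function names and visitor counts are hypothetical. If the interval includes zero, the data cannot rule out that the two versions perform the same.

```python
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Approximate 95% confidence interval for (rate of B) minus (rate of A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se

def verdict(conv_a, n_a, conv_b, n_b):
    """Return 'A', 'B', or 'inconclusive' based on the interval."""
    low, high = diff_confidence_interval(conv_a, n_a, conv_b, n_b)
    if low > 0:
        return "B"             # the whole interval favors version B
    if high < 0:
        return "A"             # the whole interval favors version A
    return "inconclusive"      # the interval includes zero

# Hypothetical results with 5,000 visitors per version.
print(verdict(200, 5000, 260, 5000))  # B
print(verdict(200, 5000, 210, 5000))  # inconclusive
```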

9. Apply your findings

If your split test is successful, you should have a clear sense of which version worked best, and you can use this messaging to make changes to your marketing strategy to improve your results. However, bear in mind that split testing doesn’t always produce concrete results and may not be suitable for some businesses.

Split testing vs. multivariate testing

Multivariate testing is a close cousin of split testing. The difference is that a split test, or A/B test, isolates a single variable and tests two versions of it, whereas a multivariate test changes multiple variables simultaneously. In other words, multivariate testing compares multiple combinations of changes to a digital asset (a web page, app, or email) at the same time.

What are the similarities?

Both split testing and multivariate testing involve changing elements of a marketing message, such as a website, email, ecommerce page, or sales promotion. Both help determine which version resonates most with viewers and leads to more favorable customer behavior.

What are the differences?

Split testing, or A/B testing, only changes one element at a time to observe how this single change impacts customer behavior. For instance, a split test may try pitching a sale in two ways: “Buy One, Get One 50% Off” or “Save 25% When You Buy Two.” These pitches represent the same sale, but the phrasing may yield different results.

Multivariate testing changes multiple elements to compare various versions within the same test. You might use two versions of the sales language while also changing other aspects. For instance, you might try two versions of a web page announcing the sale (blue background vs. orange background) along with two different sale expiration dates (expires tomorrow versus expires at the end of the month). You can mix and match these elements, then use software tools to determine the winning combination.
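To see how quickly the combinations add up, the example above can be enumerated in Python. This is only a sketch of the bookkeeping: the `factors` dictionary and the `build_variants` helper are hypothetical names, and the values come from the sale example in this section.

```python
from itertools import product

# Each key is one element being tested; each list holds its versions.
factors = {
    "sale_copy": ["Buy One, Get One 50% Off", "Save 25% When You Buy Two"],
    "background": ["blue", "orange"],
    "expiry": ["expires tomorrow", "expires at the end of the month"],
}

def build_variants(factors):
    """Return every combination of versions as a list of dictionaries."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

variants = build_variants(factors)
print(len(variants))  # 8, because 2 x 2 x 2 = 8 combinations
for variant in variants:
    print(variant)
```

Three elements with two versions each already produce eight page variants, and each one needs enough visitors to produce a reliable result. That is why multivariate tests generally call for more traffic than a simple A/B test.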

Split Testing FAQ

How does split testing work?

Split testing works by isolating one element of a marketing message, such as ad copy, a call to action, or a landing page layout, and creating two versions of it. Each version is shown to an equal number of people over a fixed period, then results are compared against a chosen metric, like clicks or conversions, to see which one performs better.

Why is split testing important?

Split testing matters because it replaces guesswork with objective, empirical data about how customers actually respond to a message. Businesses can use these results to identify the most effective version of an ad, page, or email and make decisions based on measured behavior rather than gut instinct or personal opinion.

What are the limitations of split testing?

Split testing only produces useful results when the data is statistically significant, meaning it reflects real audience behavior rather than chance. A test run on too few subjects or for too short a time may return misleading results, so a longer runtime and a larger audience generally lead to more reliable conclusions.

How long should a split test run?

A split test usually needs to run for two to four weeks, depending on how quickly enough data can be collected to reach statistical significance. As CXL founder Peep Laja explains, "Statistical significance does not equal validity—it's not a stopping rule," so test length should be based on sample size rather than a fixed calendar deadline.

What are some split testing best practices?

Reliable split tests follow three core practices: isolate only one variable at a time, reach enough people to rule out random chance, and let the test run long enough, usually several weeks, to gather a representative sample. Skipping any of these steps increases the risk of drawing conclusions from unreliable data.

What sample size do I need for a split test?

There's no fixed sample size that works for every split test, since it depends on baseline conversion rate, traffic volume, and the size of the change being measured. A useful approach is to show both versions to the same number of visitors, or run them for the same number of days, then check whether the minimum detectable effect has been reached before ending the test.
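For a rough estimate before launching, a standard textbook formula for comparing two proportions can be sketched in Python. This is an approximation under stated assumptions, not the calculation used by any specific tool: it assumes a two-sided test at a 5% significance level with 80% power (both common defaults), and an MDE expressed as a relative lift. The function name and the example rates are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate number of visitors needed for EACH version.

    baseline_rate: current conversion rate (0.04 means 4%)
    relative_mde:  smallest relative lift worth detecting (0.20 means 20%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("conversion rates must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Hypothetical store: 4% baseline conversion rate, looking for a 20% relative lift.
print(sample_size_per_variant(0.04, 0.20))
# A smaller lift is harder to detect, so it needs many more visitors.
print(sample_size_per_variant(0.04, 0.05))
```

With these example inputs the estimate comes out on the order of ten thousand visitors per version, which shows why low-traffic stores often need several weeks to finish a test.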
