Introduction to Statistical Significance for Ecommerce


You’re thinking of changing your ecommerce pricing strategy based on a hunch: if you charge less for each product, your sales volume will rise enough to boost your earnings. But you would like some data to support that hunch, so you decide to survey some of your customers, and many say they would buy more at lower prices.

Now you’ve run into the following problem: How likely is it that these responses from a sample of your customers apply to all your customers? That likelihood is what statistical significance measures, and it’s information that can help a business make informed decisions.

What is statistical significance?

Statistical significance refers to when an apparent pattern in a dataset is unlikely to have occurred by chance. If you’re analyzing trends in your sales data, for instance, and you notice that your product is unusually popular with a certain demographic, you might go on to demonstrate that your finding is statistically significant to make sure it isn’t the result of a sampling error.

Statistical significance wouldn’t be needed if you had all the data, which is almost impossible to gather. For instance, it would be too costly and take too much time to ask the 170 million registered US voters how they planned to vote in a presidential election. However, a random sample of 1,000 voters should give a meaningful idea of how the rest of the population actually will vote. 

Statistical significance in ecommerce

Ecommerce companies can use significance testing when weighing decisions such as the introduction of new products or services, marketing strategies, or changes in operating processes.

For example, statistical significance testing could help an online company determine whether flex time for some employees instead of a fixed workweek improves productivity.

It decides to test the idea by randomly picking 100 of its salespeople to go on flex time (the independent variable), telling them they can work the days and hours that suit them best. Meanwhile, another 100 randomly selected workers, known as the control group, stay with a fixed Monday to Friday schedule.

After one month, the company analyzes the output of the 100 flex-time salespeople to see if there was any change in completed sales. If per-person sales (the dependent variable) by that group exceeded the production of the control group, the company could use statistical significance testing to determine the probability that the flex-time schedule was behind the increase in sales.
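With made-up summary numbers for the two groups, that comparison can be sketched as a two-sample test in Python using only the standard library. The means, standard deviation, and group size below are hypothetical, and the normal curve stands in for the exact t-distribution:

```python
from math import sqrt, erf

# Hypothetical one-month results: completed sales per salesperson.
n = 100                                   # people in each group
mean_flex, mean_control = 52.0, 48.0      # average sales per person
sd = 8.0                                  # standard deviation in each group

# Standard error of the difference between the two group means
se = sqrt(sd**2 / n + sd**2 / n)
t = (mean_flex - mean_control) / se       # test statistic

# Two-sided p-value; with 100 people per group, the normal curve is a
# close stand-in for the t-distribution.
p_value = 2 * (1 - 0.5 * (1 + erf(abs(t) / sqrt(2))))
print(f"t = {t:.2f}, p = {p_value:.4f}")
```

A p-value below the chosen threshold (say, 0.05) would suggest the flex-time schedule, rather than chance, was behind the higher sales.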

What is hypothesis testing?

Hypothesis testing is about determining if there is a relationship between two variables, called the dependent variable and independent variable. It begins with a hypothesis, which is the theory that the dependent variable is affected by the independent variable. For example, a researcher for an online seller of athletic wear hypothesizes that an increase in social media advertising (the independent variable) will generate more immediate purchases from clicks on those ads (the dependent variable).

The goal is to find out if a proposed explanation is acceptable, and with what level of confidence it can be accepted. A confidence level of 95% often is used as the threshold for establishing statistical significance. Scientific sample testing may set more stringent confidence levels, such as 99%, while studies in other fields such as business and economics may accept lower confidence levels.

A test that fails to reach statistical significance means the researcher cannot reject the opposing hypothesis, called the null hypothesis: that mere coincidence or chance was behind the performance of the dependent variable.

7 steps to establish statistical significance

  1. Determine what will be tested
  2. State your hypothesis
  3. Decide on a significance level, or p-value
  4. Pick the type and size of the sample
  5. Collect the data
  6. Calculate the results
  7. Is the significance strong?

Establishing statistical significance requires the following seven steps.

1. Determine what will be tested

For instance, your online athletic-wear retail business wants to try a new advertising campaign on social media such as Facebook and Instagram, to see if it generates more product sales, compared with the advertisements now in use. You plan to run the new and current ads and then track the online behavior of shoppers.

2. State your hypothesis

This is your educated guess about what should happen. For instance, the hypothesis might be that the new ad will influence more people to make a purchase after viewing it.

3. Decide on a significance level, or p-value

This probability of error is called the p-value. If you set a p-value threshold of 0.05, or 5%, this means the probability of seeing your result by chance alone, if the null hypothesis is true (that there’s no relationship between clicks on the new ad and purchases of your product), should not exceed 5%. Said another way, you want to be 95% confident that the two variables have a statistically significant relationship.

A less stringent test, for instance, might set a p-value of 0.1, or 10%, and a corresponding confidence level of 90%.

4. Pick the type and size of the sample

The online athletic-wear retailer sets up a three-day test of the existing online ad and the new ad, tracking a random sample of 500 online shoppers.

5. Collect the data

In this example, your athletic-wear retail business would track the number of views of each ad that converted into online product purchases.

6. Calculate the results

The analysis then turns to various statistical equations for things such as standard error and standard deviation, which are ways to measure or estimate the variability of sample data from a mean or median value. Various online calculators can crunch the numbers (manual calculation is time-consuming and complicated).
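As a rough sketch of what those calculators compute, here is a two-proportion z-test in Python using only the standard library. The conversion counts are hypothetical stand-ins for the ad-test results:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical results: 40 purchases from 500 views of the current ad,
# 65 purchases from 500 views of the new ad.
z, p = two_proportion_z_test(40, 500, 65, 500)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Statistically significant at the 95% confidence level")
```

Libraries such as SciPy offer ready-made versions of these tests, but the arithmetic above is all that’s involved for two conversion rates.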

7. Is the significance strong?

Decide if the significance level is strong enough to support a decision. In the example above, if the results are statistically significant at the 95% confidence level, the retailer can be reasonably confident that the new ad, not chance, drove the additional purchases, and should adopt the new social-media ad campaign.

Limitations of statistical significance testing

Statistical significance testing isn’t exact and is subject to two important caveats.

First, the sample population should reflect the general population and be randomly selected; otherwise, it’s what researchers call a biased sample. For example, a survey of 1,000 likely voters in New York or California about a presidential election might be biased because Democratic voters heavily outnumber Republicans in those states, and the survey result couldn’t be accurately extrapolated to the rest of the country.

Second, a statistical test determines the probability, but not the certainty, of a relationship between the variables. A 95% confidence level means there is still a 5% probability the result is wrong. The researcher might incorrectly find statistical significance when there is none; such results are known as false positives. Alternatively, if a researcher finds no significant relationship when there is one, then that is a false negative.

Statistical significance FAQ

What is the difference between statistical significance and practical significance?

Statistical significance shows the probability of a relationship between variables, but not the magnitude of the relationship. Practical significance is about whether the independent variable’s effect on the dependent variable is big enough, in a real-world sense, to prompt a change, such as undertaking a new online ad campaign or implementing a new premium-pricing strategy.

What are Type I (false positive) and Type II (false negative) errors in statistical significance testing?

Researchers should be aware that statistical significance tests carry the risk of getting it wrong. One of two basic types of errors could occur. A Type I error is known as a false positive: the test finds statistical significance when, in fact, there isn’t any. In this case, the researcher should not have rejected the null hypothesis. A Type II error is a false negative: the test finds no statistically significant relationship between the variables when there actually is one.

Can statistical significance be applied to all types of data?

Almost all types of data can be analyzed for statistical significance, whether political, scientific, medical, economic, or financial. The most common forms of data analyzed in significance testing are binary data, such as simple yes-no responses to polls, and discrete or count data, such as the number of clicks on an online advertisement. A third form, known as continuous data, is about a range of values rather than specific binary or count values. For example, a study of consumer spending at supermarkets might track it against the length of time shoppers are in the stores.

How does sample size affect statistical significance?

Sample size is a critical factor in obtaining a reliable result from significance testing. Generally, a larger sample improves the odds of determining a statistically significant result. A smaller sample has a higher risk of producing false-positive or false-negative results. Samples, however, can be too large. Tracking a larger sample takes more time and money, and the reliability may not improve.
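The effect of sample size can be illustrated by running the same conversion lift through a two-proportion z-test at several sample sizes. The 8% and 10% rates below are made-up numbers for illustration:

```python
from math import sqrt, erf

def p_value_for(rate_a, rate_b, n):
    """Two-sided p-value for the same conversion lift at sample size n per group."""
    p_pool = (rate_a + rate_b) / 2                 # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (2 / n))     # standard error
    z = (rate_b - rate_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# An 8% -> 10% conversion lift, tested at three sample sizes:
for n in (200, 1000, 5000):
    print(f"n = {n:5d} per group: p = {p_value_for(0.08, 0.10, n):.4f}")
```

The identical lift fails the 0.05 threshold at 200 shoppers per group but passes it comfortably at 5,000, which is why underpowered tests produce so many false negatives.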