Summary: 25 large-scale experiments covering over 2 million households, run in co-operation with a large retailer, a stockbroker, and Yahoo!, show that it is extremely difficult to identify advertising ROI. Why? Because household spending varies enormously: there is massive cross-sectional variation in spending across households, large variation in the same household's spending over time, and large variation in purchase timing (sometimes a household buys twice in two weeks, sometimes only once a year or not at all). This variation makes it almost impossible to distinguish advertising effects from random noise, even when one has individual household data that links ad exposure to purchasing.
The market context
Here is some detail on the market and advertising context. I have simplified and explained the spending figures from the original study to make them clearer for readers of this post. The context is a retailer that can target ads to specific households and, furthermore, knows the exact amount of money those households spend with it. The study also uses a stockbroking firm that advertises to consumers, with roughly the same sort of figures as below.
Typical scenario: An advertiser is going to hit households with approximately 35 display ads over a 2-week campaign period. The advertised product has a gross margin of 50%. The cost of the advertising is 14 cents per household for the campaign (based on a price of $4 per thousand ad impressions, i.e. 0.4 cents per ad, times 35 ads).
Average sales per household are $7 in the time period, but the standard deviation (i.e. the variation in sales across households) is $75. In other words, households range from $0 to hundreds of dollars in purchases over the expected duration of the campaign.
Next, the advertising ROI goal is 25%. Spending 14 cents per household and earning a 25% ROI means the goal is to generate 14 x 1.25 = 17.5 cents of gross profit per exposed household. In turn, that means we need to generate an extra 35 cents of sales revenue per household, on average: with a 50% gross margin, 35 cents of sales yields 17.5 cents of gross profit, which is in turn 25% more than the 14 cents we spend hitting each household with ads.
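For readers who like to see the arithmetic laid out, here is a minimal Python sketch of the cost and break-even calculation above (using this post's simplified figures, not the paper's exact numbers):

```python
# Cost and ROI arithmetic for the simplified campaign scenario above.
cpm = 4.00             # price per thousand ad impressions, in dollars
ads_per_household = 35
gross_margin = 0.50
target_roi = 0.25

cost_per_household = (cpm / 1000) * ads_per_household    # $0.14
required_profit = cost_per_household * (1 + target_roi)  # $0.175 gross profit
required_revenue = required_profit / gross_margin        # $0.35 extra sales

print(f"Ad cost per household:        ${cost_per_household:.3f}")
print(f"Required gross profit:        ${required_profit:.3f}")
print(f"Required incremental revenue: ${required_revenue:.3f}")
```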
The advertiser then selects a control group, which will not see the advertising, and a treatment group, which will. And remember, the control group is not quiet or stable during the campaign: an awful lot of unexposed households will buy the product anyway.
So that is what the researchers did in 25 different experiments, with variations on these basic figures: select control and treatment groups, and hit the treatment households with ads. To verify whether a campaign reached its 25% ROI, though, they had to detect an average difference of 35 cents or more per household in sales between the treatment and control groups, when average sales per household are $7 and the standard deviation across households is $75. That is far too small a difference to detect reliably, given the massive variation in household spending.
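To see just how demanding this is, here is a rough sample-size calculation in Python (a sketch using a standard two-sample normal approximation, with an assumed 5% significance level and 80% power as conventional choices; this is illustrative, not the paper's exact setup):

```python
from scipy.stats import norm

sigma = 75.0   # standard deviation of household sales ($)
delta = 0.35   # incremental sales per household we must detect ($)
alpha = 0.05   # two-sided significance level (assumed convention)
power = 0.80   # target power (assumed convention)

# Required sample size per group for a two-sample comparison of means:
# n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_per_group = 2 * (z * sigma / delta) ** 2
print(f"Households needed per group: {n_per_group:,.0f}")
# ~721,000 per group, i.e. roughly 1.4 million households in total
```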
Conclusion
Lewis and Rao concluded that even with treatment groups of 200,000 consumers, these real-world experiments were severely underpowered: they could not reasonably verify whether the campaigns reached the ROI target, or even whether they had any effect at all. You might think the problem was that the target ROI was quite small: what if it were 50%, not 25%? The answer is it would not make much difference. Following the same arithmetic as above, it would mean finding a sales difference of 42 cents per household rather than 35, still against an average sales level of $7 and a standard deviation of $75. The advertising effect on sales would remain trivially small compared to the natural variation that occurs.
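A quick sanity check of that claim, under the same assumed normal approximation as in the earlier sketch: with 200,000 households per group, the chance of detecting either effect falls far below the conventional 80%:

```python
from scipy.stats import norm

def approx_power(delta, sigma=75.0, n_per_group=200_000, alpha=0.05):
    """Approximate power of a two-sample z-test for a difference in means."""
    se = sigma * (2 / n_per_group) ** 0.5   # standard error of the difference
    return float(norm.sf(norm.ppf(1 - alpha / 2) - delta / se))

print(f"Power to detect $0.35 (25% ROI target): {approx_power(0.35):.0%}")  # ~31%
print(f"Power to detect $0.42 (50% ROI target): {approx_power(0.42):.0%}")  # ~42%
```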
Management take-out
So the take-out is, it’s actually very difficult to measure advertising ROI – even with carefully controlled experiments – due to massive variation in baseline spending levels across households. Or as the researchers concluded themselves, “We find that even when ad delivery and consumer purchases can be measured at the individual level, linked across purchasing domains, and randomized to ensure exogenous exposure, forming reliable estimates on the returns to advertising is exceedingly difficult, even with millions of observations”.
Based on this very large study, managers should be very cautious about vendors who promise to calculate their advertising ROI, or who guarantee a certain ROI on advertising.
The full study (lots of maths if you like that sort of thing) is at http://justinmrao.com/lewis_rao_nearimpossibility.pdf
Thanks so much, John, for this insightful summary of Lewis and Rao (2013), which reviewers have brought up in my research for the last 5 years. For senior / brand managers allocating advertising resources (e.g. over regions, products or media), I'd like to point out that this is an individual-consumer-level analysis of digital advertising, and that a key reason for the uncertainty in determining the exact ROI is the selection effect, i.e. consumers' self-exposure to the site that hosts the ads (see their quote from page 3 below).
For brand executives, here is a typical scenario:
1. I increase my ad spending for my marketing modeling services on your blog by 100%, and managers who read your blog buy 20% more from me than the control group.
2. Lewis and Rao (2013) now argue that we can't tell how much of this is incremental for any given manager, because only managers who are already heavily interested in great market science read your blog (the selection effect).
3. I don’t care, because (a) I only want to know whether the ads gave me incremental business overall, not for any given manager, and (b) the selection effect only matters to me if those buying and blog-reading managers would have come and bought from me without my ad on your blog. If that is the case, my ROI is indeed -100% and I should stop advertising on your blog. This is exactly what Blake, Nosko and Tadelis (2013) find for eBay’s paid search advertising. If that is not the case to a large extent (which we can quantify beyond reasonable doubt with aggregate modeling), I should continue advertising on your blog as long as the marginal ROI exceeds that of alternative advertising options. Indeed, we show that paid search does not have great returns for well-known brands offering search goods (such as eBay), but very high returns for lesser-known brands selling products needed urgently, such as refrigerators or office furniture (http://eprints.lancs.ac.uk/78859/1/Impact_of_Brand_Familiarity_on_Online_and_Offline_Media_Synergy.pdf).
So I would argue that their study is very relevant when determining a precise effect/ROI for one ad targeted at an individual consumer while controlling for selection effects (it’s basically impossible), but not so much for executives allocating resources across regions, products and/or media (e.g. TV vs digital) based on advertising ROI for aggregates of consumers.
As a marketer, judge for yourself whether you would be happy with their scenario on page 3: ‘while the true causal effect should be relatively small, selection effects are expected to be quite large. Consider a simple example: if an ad costs 0.5 cents per delivery, each viewer sees one ad, and the marginal profit per conversion is $30, then only 1 in 6,000 people need to be “converted” by the ad to break even. Suppose the targeted individual has a 10% higher baseline purchase probability (indeed this is a very weak form of targeting); then the selection effect is 600 times larger than the causal effect of the ad.’
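As a quick check, the arithmetic in that quoted example does work out (a sketch; I read the ‘10% higher baseline purchase probability’ as a 10-percentage-point gap):

```python
cost_per_ad = 0.005           # $0.005: half a cent per delivered ad
profit_per_conversion = 30.0  # marginal profit per conversion ($)

breakeven_rate = cost_per_ad / profit_per_conversion  # conversion rate needed to break even
selection_gap = 0.10          # assumed: targeted users' baseline purchase prob. is 10 pts higher

print(f"Break-even conversion rate: 1 in {1 / breakeven_rate:,.0f}")                   # 1 in 6,000
print(f"Selection effect vs. causal effect: {selection_gap / breakeven_rate:,.0f}x")   # 600x
```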
Mean = $7, SD = $75? Those statistics are close to meaningless here: if the distribution were normal, roughly 68% of households would have sales between -$68 and $82. A prime example of where the median and IQR, which have much more intuitive meanings, beat the mean and SD!
Now that I’ve read the paper a bit more, I’d like to echo Koen’s points. The paper is irrelevant academic hogwash (“a continuous concave function of per user expenditure …”) written by people with an axe to grind, and it focuses on person-level effects, which no manager with more than a few dozen customers cares about. Managers report aggregates to their superiors!
Even worse, they use classical statistics to work out the sample sizes needed to detect minimum effect sizes. Those calculations rest on several assumptions, not least that the mean and SD are sufficient statistics for the distribution of the variable being measured. But plainly they are not, so every aspect of the analysis that rests on that assumption is flawed.
Sadly, even though this paper was co-written by some dude from Google (a company known for hiring very intelligent people indeed), it seems plain that it’s utterly irrelevant to managers.
Even after saying all that, I still think it’s worthwhile to question our fundamental assumptions. I think every manager knows (a) there’s no magic formula for advertising success; and (b) it’s really, really hard to measure advertising ROI in any but the most crude fashion. But just because something is hard doesn’t imply that we should give up.
Thanks John & Koen,
John