Friday, April 09, 2010

Selection Bias In Economic Data

Let's talk about selection bias and how it applies to recent economic data.

Selection bias arises when the sample behind a statistic isn't representative of the whole population it is supposed to describe. Yesterday, retail "same store sales" (SSS) numbers came out, up 9% year over year, blowing away expectations. Now, there is an inherent selection bias embedded in the SSS numbers - they only count data from stores which have been open for at least a year. That brings along a second embedded bias, survivorship bias: the data from stores which went out of business isn't counted. This morning I was thinking to myself, "Why on Earth would they want to use such a clearly flawed metric? Why not just use total retail sales?" Well, there are good reasons: most of the time, during normal, stable or growing economic periods, the same store sales numbers probably provide a much smoother, more accurate depiction of the economic situation. They don't get screwed up by volatile data from new stores - grand openings, and store count changes that result in apples-to-oranges comparisons. The goal is to get a consistent picture of the sales trends for each chain.

For an example of how flawed gross numbers can be, we need only look at yesterday's data on Las Vegas Strip revenue, which rose nearly 33% from the same period a year earlier. This comparison is pretty bizarre, since 2010 included data for the grand opening of one of the biggest projects in Vegas's history - MGM's CityCenter staple casino, Aria. Of course 2010 will be higher than 2009 - it's an apples to oranges comparison. To clarify, these Vegas numbers are NOT a "same store sales" metric - they are gross numbers, so a new casino will result in an increase in the numbers, all other factors held constant.

In times of store contraction, however, we get a similar problem with SSS, as a result of the survivorship bias.  As MISH points out, 31 retailers filed for bankruptcy in 2009.  The existing retailers also closed some stores.  This has the substitution effect of potentially increasing sales at the remaining stores, even if overall sales decrease.  MISH summarizes, "Supposedly retail sales are up 4 months in a row. They aren't. Same store sales may be, but that is a different matter."

In other words, imagine if KidDynamitesWorld sells widgets.  I have 5 stores, each doing $200MM in revenue, for a total of $1B.  If, for the next sampling period, I have to close 2 of my stores due to the recession, and my remaining 3 stores now do $300MM in revenues each, my total sales are down 10% (from $1B to $900MM) but same store sales are up a whopping 50% (from $200MM to $300MM).  We don't count the stores that go out of business - we have survivorship bias.  The same store sales numbers, in this example,  offer a very flawed look through the window of my company's true health.
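To make the arithmetic concrete, here's a minimal Python sketch of that hypothetical chain. The store names and the helper functions (pct_change, total_sales_change, same_store_sales_change) are made up for illustration, and it simply treats any store present in both periods as "comparable" - real-world definitions of a comparable store can be more involved.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100.0 / old


def total_sales_change(before, after):
    """Change in gross sales across ALL stores, including ones that closed."""
    return pct_change(sum(before.values()), sum(after.values()))


def same_store_sales_change(before, after):
    """Change in sales counting only stores that exist in both periods."""
    survivors = before.keys() & after.keys()
    old = sum(before[s] for s in survivors)
    new = sum(after[s] for s in survivors)
    return pct_change(old, new)


# Hypothetical revenue per store, in $MM (numbers from the example above).
before = {"store_1": 200, "store_2": 200, "store_3": 200,
          "store_4": 200, "store_5": 200}
after = {"store_1": 300, "store_2": 300, "store_3": 300}  # stores 4 and 5 closed

total = total_sales_change(before, after)
sss = same_store_sales_change(before, after)

print(f"total sales:      {total:+.1f}%")  # -10.0%
print(f"same store sales: {sss:+.1f}%")    # +50.0%
```

The closed stores drop out of the same store calculation entirely, which is exactly where the survivorship bias comes from.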

The Census Bureau's advance monthly sales for retail and food services for February, the most recent period available, show that sales were up 3.9% from February of 2009. So we can see that the situation does seem to be "improving," but I think it's essential to look at the potential flaws and exaggerations that can result from selection bias in the data, such as the same store sales data.

Of course, there's another simple bias in both the Vegas Strip Revenue numbers, and the Retail Same Store Sales numbers - calendar bias.  The Vegas numbers received a boost because Chinese New Year fell in February this year, while it was in January last year (note: the article I linked to above erroneously says it was in March last year).  Similarly, the SSS data received a boost this year because Easter fell at the beginning of April, which resulted in most Easter spending being captured in the March data that was just reported, while last year Easter spending was largely captured in the April data.

When looking at economic data, it's essential to always be cognizant of biases in the data which can result in "apples to oranges" comparisons.


disclosure: no position in retailers, although I'm looking for a point to short XRT.  I am short MGM and LVS equity. In fact, I shorted more MGM on the 10% rally reacting to this data, which I think was vastly misinterpreted.


scharfy said...

Really good post.

It's the interpretation of the raw data that separates the men from the boys.

EconomicDisconnect said...

you are obviously blinded by your extreme bias and are fighting the tape all the way. Just admit things are getting truly amazing out there and get behind the government backed new world. Even though I wrote a book about how bailouts were terrible, I was sharp enough to deploy all my cash at the March bottom and buy all the worst stuff to make a killing, which like I said in my book, was a huge moral hazard that must be stopped. Please learn the true way to profits, which will be out in my next book.
Barry Ritholtz
Sarcasm on HIGH

Kid Dynamite said...

GYC - I don't think that's Barry's message. I think his point was that although all of it may seem insane, it's still tradable - from the BULL side, not the bear side. I am absolutely fighting the tape, because i refuse to be the last guy holding the grenade when it explodes - and because of my "prudence" i'm paying... it sucks. i feel like a smart idiot. or a stupid smart guy. i don't know.

i don't know if you read this recent thread of his:

but i corrected someone in the comments who said "Retail sales were up 9%"... they retorted that actual retail sales were up more than 11%... and i looked up the data, where i found something confirming that:

so the total numbers are actually LARGER than the same store numbers - which implies what - that there were store OPENINGS during the period? i simply CANNOT understand that. it's AMAZING. job cuts, cost cutting, great recession... yet chains are OPENING NEW STORES ????


EconomicDisconnect said...

working on a post on this right now. The whole series of posts there today just seemed a little weird to me. I understand the premise; crazy times = tradeable but seems part of the problem to point out all the hazards of bailouts/backstops then go out and join hands with the people making it happen to make a buck.

Mike K said...


For whatever reason, a bad methodology, SSS, has become a gold standard for evaluating the trend in retail sales.

I see at least 3 ways to make the "tool" better: 1) look at total sales, although this is probably not feasible in real time; 2) normalize the SSS by a sales weighted percentage of change in store number (to use your example of going from 5 to 3 stores, your SSS would be reported as 60% of the reported SSS); 3) use state sales tax receipts, although not all states (go DE & NH) have a tax.

Do most states report monthly sales tax revenue? If the retail sector were really getting better, you'd expect states to be reporting higher revenues compared to prior projections.