Let's talk about selection bias and how it applies to recent economic data.
Selection bias is what you get when the sample you measure isn't representative of the whole population you care about. Yesterday, retail "same store sales" (SSS) numbers came out, up 9% year over year, blowing away expectations. Now, there is an inherent selection bias embedded in the SSS numbers - they only count data from stores which have been open for at least a year. That means we have another embedded bias, survivorship bias: the data from stores which went out of business isn't counted. This morning I was thinking to myself, "Why on Earth would they want to use such a clearly flawed metric? Why not just use total retail sales?" Well, there are good reasons: most of the time, during normal, stable or growing economic periods, the same store sales numbers probably provide a much smoother, more accurate depiction of the economic situation. They don't get screwed up by volatile data from new stores - grand openings, and store count changes that result in apples-to-oranges comparisons. The goal is to get a consistent picture of the sales trends for each chain.
For an example of how flawed gross numbers can be, we need only look at yesterday's data on Las Vegas Strip revenue, which rose nearly 33% from the same period a year earlier. This comparison is pretty bizarre, since 2010 included data for the grand opening of one of the biggest projects in Vegas's history - MGM's CityCenter staple casino, Aria. Of course 2010 will be higher than 2009 - it's an apples-to-oranges comparison. To clarify, these Vegas numbers are NOT a "same store sales" metric - they are gross numbers, so a new casino will result in an increase in the numbers, all other factors held constant.
In times of store contraction, however, we get a similar problem with SSS, as a result of the survivorship bias. As MISH points out, 31 retailers filed for bankruptcy in 2009. The existing retailers also closed some stores. This has the substitution effect of potentially increasing sales at the remaining stores, even if overall sales decrease. MISH summarizes, "Supposedly retail sales are up 4 months in a row. They aren't. Same store sales may be, but that is a different matter."
In other words, imagine that KidDynamitesWorld sells widgets. I have 5 stores, each doing $200MM in revenue, for a total of $1B. If, for the next sampling period, I have to close 2 of my stores due to the recession, and my remaining 3 stores now do $300MM in revenue each, my total sales are down 10% (from $1B to $900MM) but same store sales are up a whopping 50% (from $200MM to $300MM per store). We don't count the stores that go out of business - we have survivorship bias. The same store sales numbers, in this example, offer a very flawed window into my company's true health.
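For anyone who wants to check the arithmetic, here's a quick Python sketch of the widget example. The store names and the two helper functions are made up purely for illustration, and the same-store calculation is a simplification (compare only stores present in both periods), not the exact methodology any retailer or data provider uses.

```python
# Hypothetical KidDynamitesWorld example: sales in $MM, keyed by store.
# Store names and helper functions are illustrative assumptions only.

def total_growth(prev, curr):
    """Percent change in total sales, counting every store in each period."""
    prev_total = sum(prev.values())
    curr_total = sum(curr.values())
    return 100 * (curr_total - prev_total) / prev_total

def same_store_growth(prev, curr):
    """Percent change counting only stores that appear in both periods."""
    survivors = prev.keys() & curr.keys()
    prev_sales = sum(prev[s] for s in survivors)
    curr_sales = sum(curr[s] for s in survivors)
    return 100 * (curr_sales - prev_sales) / prev_sales

# Period 1: five stores doing $200MM each ($1B total).
prev_period = {f"store_{i}": 200 for i in range(1, 6)}
# Period 2: stores 4 and 5 have closed; the three survivors do $300MM each.
curr_period = {f"store_{i}": 300 for i in range(1, 4)}

total_pct = total_growth(prev_period, curr_period)
sss_pct = same_store_growth(prev_period, curr_period)

print(f"Total sales change:      {total_pct:+.1f}%")
print(f"Same store sales change: {sss_pct:+.1f}%")
```

The closed stores simply drop out of the same-store comparison, which is exactly where the survivorship bias comes from: total sales fall 10% while the same-store figure rises 50%.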
The Census Bureau's advance monthly sales for retail and food services for February, the most recent period available, show that sales were up 3.9% from February of 2009. So we can see that the situation does seem to be "improving," but I think it's essential to look at the potential flaws and exaggerations that can result from selection bias in data like the same store sales numbers.
Of course, there's another simple bias in both the Vegas Strip Revenue numbers, and the Retail Same Store Sales numbers - calendar bias. The Vegas numbers received a boost because Chinese New Year fell in February this year, while it was in January last year (note: the article I linked to above erroneously says it was in March last year). Similarly, the SSS data received a boost this year because Easter fell at the beginning of April, which resulted in most Easter spending being captured in the March data that was just reported, while last year Easter spending was largely captured in the April data.
When looking at economic data, it's essential to always be cognizant of biases in the data which can result in "apples to oranges" comparisons.
disclosure: no position in retailers, although I'm looking for a point to short XRT. I am short MGM and LVS equity. In fact, I shorted more MGM on the 10% rally reacting to this data, which I think was vastly misinterpreted.