Should data be treated as a public good?

Richard’s Real Estate and Urban Economics Blog:

In April 2010, authorities in Israel began publishing information about house transactions online, and in October 2010 they launched a “user-friendly web site.” (Details may be found in the paper.) The paper measures the change in price dispersion before and after the information became publicly available and finds that dispersion fell by at least about 17 percent. The authors take pains to make sure the result isn’t driven by some shock that happened at the same time as the release of the information. For example, they show that price dispersion fell less in neighborhoods with well-educated people. This could reflect either (1) that well-educated people were better informed about housing markets to begin with, and so got less benefit from the new information, or (2) that a greater share of the residuals in well-educated neighborhoods comes from unmeasured house characteristics. In either event, the result is consistent with the idea that the information shock is what contributed to the decline in measured price dispersion.

So more information really does seem to produce a more efficient housing market. The policy implication may be that data, in general, should be a public good. Data meet half of Musgrave’s definition of a public good: they are non-rival (one person’s use of a data-set does not detract from another person’s use). The other half, non-excludability, does not hold, since data are excludable (services such as CoreLogic show this to be true). Even so, their creation produces a classic fixed-cost, marginal-cost problem. The fixed cost of producing a good dataset is very large; once it is created, the marginal cost of providing the data to users is very low. This suggests that the efficient price of data should be very low.

Currently, data services have something like natural monopolies, with long, downward-sloping average cost curves. Theory says that this means they set prices such that marginal revenue equals marginal cost, instead of setting price equal to marginal cost. The resulting price is above marginal cost, so fewer users buy the data than would be efficient. All this implies that data are underprovided. Danny and Roni’s work shows that this under-provision has meaningful consequences for the broader economy.
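
To make the pricing argument concrete, here is a minimal sketch in Python. It is not from the paper or the original post: it assumes a linear demand curve for a dataset, and every name and number in it (the `DataMarket` class, the demand intercept and slope, the marginal and fixed costs) is hypothetical and chosen only for illustration. It compares the monopoly outcome (marginal revenue equals marginal cost) with the efficient outcome (price equals marginal cost).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMarket:
    """Toy market for a dataset with linear inverse demand P(Q) = a - b*Q.

    All parameters are hypothetical; none come from the paper.
    """
    a: float           # demand intercept (highest willingness to pay)
    b: float           # demand slope
    c: float           # marginal cost of serving one more user (low)
    fixed_cost: float  # cost of building the dataset (high)

    def price(self, q: float) -> float:
        """Price at which q users are willing to buy."""
        return self.a - self.b * q

    def profit(self, q: float) -> float:
        """Provider profit when serving q users."""
        return (self.price(q) - self.c) * q - self.fixed_cost

    def monopoly_quantity(self) -> float:
        """Quantity where marginal revenue (a - 2*b*Q) equals marginal cost."""
        return (self.a - self.c) / (2 * self.b)

    def efficient_quantity(self) -> float:
        """Quantity where price equals marginal cost."""
        return (self.a - self.c) / self.b

    def deadweight_loss(self) -> float:
        """Surplus lost because the monopolist serves too few users."""
        q_m = self.monopoly_quantity()
        q_e = self.efficient_quantity()
        return 0.5 * (self.price(q_m) - self.c) * (q_e - q_m)


if __name__ == "__main__":
    # Made-up numbers: large fixed cost, tiny marginal cost.
    market = DataMarket(a=100.0, b=1.0, c=2.0, fixed_cost=1000.0)
    q_m = market.monopoly_quantity()
    q_e = market.efficient_quantity()
    print("monopoly:  quantity", q_m, "price", market.price(q_m),
          "profit", market.profit(q_m))
    print("efficient: quantity", q_e, "price", market.price(q_e),
          "profit", market.profit(q_e))
    print("deadweight loss:", market.deadweight_loss())
```

With these made-up numbers, the monopolist serves half as many users as would be efficient, at a price far above marginal cost. Pricing at marginal cost serves everyone who values the data at more than the cost of delivering it, but the provider then loses the entire fixed cost. That gap is one way to see why someone other than a private seller might have to pay for producing the data.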