Granular economic outcomes and internet penetration

Our project also shows that IP activity data can be used to predict local economic activity as well as differences in sectoral productivity. This application reveals that highly granular IP activity data, once aggregated, can be used to predict the outcomes of very complex human behaviour and interactions. Our approach relates to a small but growing body of literature that uses other passively collected data to measure local economic activity [26, 27, 28, 29], and to a recent study that uses an estimate of aggregate IP allocation at the subnational level to study digital ethnic favouritism [22].

We use data from 411 large regions in middle- and high-income countries for the years 2006-2012. The regions are defined by the OECD and normally correspond to the first subnational level (e.g. U.S. states or EU NUTS2 regions). A simple comparison of economic activity and internet penetration across regions is likely to be confounded by other factors that drive economic and internet activity simultaneously (e.g. technological development, culture, geography). Instead, we apply a fixed effects estimator that exploits the time-series dimension of our data and compares changes in economic and internet activity within each region over time.
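The logic of the within-region comparison described above can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the function names, the data-generating process, and the parameter values are hypothetical and are not taken from the study's data or code. The sketch only shows why a pooled comparison can be badly confounded by time-invariant regional characteristics while a within (fixed effects) estimator is not.

```python
import numpy as np

def pooled_estimate(y, x):
    """OLS slope of y on x, ignoring the panel structure."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))

def within_estimate(y, x, region):
    """OLS slope after subtracting each region's own mean (within estimator)."""
    _, idx = np.unique(region, return_inverse=True)
    counts = np.bincount(idx)
    # Region means, broadcast back to every observation of that region.
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    x_dm = x - (np.bincount(idx, weights=x) / counts)[idx]
    return float(x_dm @ y_dm / (x_dm @ x_dm))

# Synthetic panel (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n_regions, n_years = 411, 7
true_beta = 0.08
region = np.repeat(np.arange(n_regions), n_years)

# Unobserved, time-invariant regional factor that drives both variables.
region_effect = rng.normal(0.0, 1.0, n_regions)
x = 0.8 * region_effect[region] + rng.normal(0.0, 0.5, n_regions * n_years)
y = true_beta * x + region_effect[region] + rng.normal(0.0, 0.05, n_regions * n_years)

beta_pooled = pooled_estimate(y, x)          # inflated by the regional confounder
beta_within = within_estimate(y, x, region)  # close to true_beta

print(f"pooled: {beta_pooled:.3f}, within: {beta_within:.3f}")
```

In this toy setting the pooled slope is far above the true coefficient because the regional factor raises both variables, whereas demeaning by region removes that factor entirely. The sketch omits the country-year effects and region-specific trends used in the actual estimation.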

Our measure of economic output is regional Gross Domestic Product (GDP) per capita (in logs) in a given year; our measure of sectoral productivity is Gross Value Added (GVA) per worker in a given year; and our measure of internet activity is regional IP per capita in a given year. Our estimation approach accounts for time-invariant differences in economic development and productivity across regions, for shocks that are common to all regions in a country and year, and for region-specific linear trends.
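One way to write down a specification consistent with this description is sketched below. The notation is assumed for illustration and is not taken from the study: $r$ indexes regions, $c(r)$ the country of region $r$, and $t$ years; $\alpha_r$ is a region fixed effect, $\delta_{c(r),t}$ a country-year fixed effect, and $\gamma_r t$ a region-specific linear trend. Entering IP per capita in logs is likewise an assumption, suggested by the percentage interpretation of the coefficient.

```latex
\ln\left(\mathrm{GDPpc}_{rt}\right)
  = \beta \,\ln\left(\mathrm{IPpc}_{rt}\right)
  + \alpha_r
  + \delta_{c(r),t}
  + \gamma_r \, t
  + \varepsilon_{rt}
```

Under this sketch, the sectoral regressions would replace the left-hand side with GVA per worker in the sector of interest.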

We find a positive correlation between GDP pc and IP pc (Figure 4(a)). Without accounting for region-specific differences, country-year-specific differences and region-specific time trends, the simple correlation coefficient is 0.38 (see Table E, S1). Once we include those covariates, the coefficient decreases to 0.08, suggesting that a 10% increase in IP pc is associated with a 0.8% increase in GDP pc at the regional level. However, as Figure 4(b) makes clear, increased internet activity is not associated with uniformly positive outcomes across all economic sectors within a region. Broadly speaking, we find that service sectors amenable to digital competition through outsourcing (e.g. publishing, news, film production, administrative support, education) have suffered with increasing local IP concentration, whilst location-constrained sectors (e.g. wholesale, retail, real-estate, repairs, hairdressing, mining, transportation, accommodation) have prospered from higher internet concentrations, presumably due to lowered consumer search costs and/or logistic and process efficiency gains (see Table F, S1). It is important to note that the estimated associations with regional GDP and sectoral GVA are only correlations and do not allow for a causal interpretation.
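The percentage reading of the 0.08 coefficient can be checked with a short calculation. The sketch below assumes a log-log relationship (both GDP pc and IP pc in logs), which the text implies but does not state for IP pc; the function name is hypothetical. It compares the usual linear approximation with the exact change implied by the coefficient.

```python
import math

def implied_pct_change(elasticity, pct_increase):
    """Exact % change in the outcome implied by a log-log coefficient
    for a given % increase in the regressor."""
    log_ratio = elasticity * math.log1p(pct_increase / 100.0)
    return math.expm1(log_ratio) * 100.0

coefficient = 0.08   # coefficient reported in the text
ip_increase = 10.0   # a 10% increase in IP pc

approx = coefficient * ip_increase                   # linear approximation
exact = implied_pct_change(coefficient, ip_increase)

print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

The two figures differ only slightly because the increase considered is modest; for large changes in IP pc the linear approximation would overstate the associated change more noticeably.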

To our knowledge, the present study is the first of its kind to apply over a trillion online/offline activity observations of the entire internet to the study of human behaviour. The data's high level of spatial and temporal granularity, paired with the passive way in which it is collected, makes IP data uniquely suited to analysing a wide spectrum of human behaviour and social interactions. As such, our work not only expands the data and methodological space of the quantitative social data sciences but also provides a first glimpse of the potential of global internet activity to change profoundly the way research in this realm is conducted.