3/08/2010

Statistics for a changing world: Google Public Data Explorer in Labs


Last year, we released a public data search feature that enables people to quickly find useful statistics in search. More recently, we expanded this service to include information from the World Bank, such as population data for every region in the world. More and more public agencies, non-profits and other organizations are looking for ways to open up their data and expand global access to this kind of information. We want to help keep that momentum going, so today we're sharing a snapshot of some of the most popular public data search topics on Google. We're also launching the Google Public Data Explorer, an experimental visualization tool in Google Labs.

Popular public data topics on Google
We know people want to be able to find reliable data and statistics on a variety of subjects. But what kind of statistics are they looking for most? To help us better prioritize which data sets to include in our public data search feature, we've analyzed anonymous search logs to find patterns in the kinds of searches people are doing, similar to the patterns you can find on Google Trends and Insights for Search. Some public data providers have asked us to share what we've learned, so we decided to put together an approximate list of the 80 most popular data and statistics search topics.

You can read the complete list at this link (PDF), but here are the top 20 to get you started:

1. School comparisons
2. Unemployment
3. Population
4. Sales tax
5. Salaries
6. Exchange rates
7. Crime statistics
8. Health statistics (health conditions)
9. Disaster statistics
10. Gross Domestic Product (GDP)
11. Last names
12. Poverty
13. Oil price
14. Minimum wage
15. Consumer price index, inflation
16. Mortality
17. Cost of living
18. Election results
19. First names
20. Accidents, traffic violations

You'll notice some interesting entries in the list. For example, we were surprised by how many people search for data about popular first and last names. Perhaps people are trying to decide what to name a new baby boy or girl? Whatever the reason, the list shows that people are interested in a wide range of statistical information.

To build the list, we looked at the aggregation of billions of queries people typed into Google search, using data from multiple sources, including Insights for Search, Google Trends and internal data tools, similar to what we do for our annual Zeitgeist. We combined search terms into groups, filtering out spam and repeats, to prepare a list reflecting the most popular public data topics. As a statistician, I should note that the data covers only one week's worth of searches in the U.S., so there could be seasonal and other confounding factors (perhaps there was an election that week). In addition, preparing a study like this requires a fair amount of manual grouping of similar queries into topics, which is fairly subjective and prone to human error. While the list is imperfect, we still think it is helpful to consider.
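For readers who want a feel for what "combining search terms into groups" can look like, here is a minimal sketch in Python. It is not the pipeline we used: the keyword map, the spam markers, the session identifiers and the sample queries are all hypothetical, made up purely for illustration, and the real grouping involved a good deal of manual judgment that a keyword lookup cannot capture.

```python
from collections import Counter

# Hypothetical keyword-to-topic map (an assumption for this sketch only).
TOPIC_KEYWORDS = {
    "unemployment": "Unemployment",
    "jobless": "Unemployment",
    "population": "Population",
    "sales tax": "Sales tax",
}

# Hypothetical spam markers, also invented for illustration.
SPAM_MARKERS = ("free download", "click here")

# Made-up sample log of (anonymous session id, query) pairs.
SAMPLE_LOG = [
    ("s1", "Unemployment rate california"),
    ("s1", "unemployment  rate California"),  # repeat within one session
    ("s2", "jobless claims"),
    ("s3", "population of brazil"),
    ("s4", "free download population data click here"),  # spam
    ("s5", "sales tax texas"),
    ("s6", "weather today"),  # matches no topic
]


def normalize(query):
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())


def topic_for(query):
    """Return the first topic whose keyword appears in the query, else None."""
    for keyword, topic in TOPIC_KEYWORDS.items():
        if keyword in query:
            return topic
    return None


def rank_topics(log):
    """Return (topic, count) pairs, most popular first.

    Spam queries are skipped, and each (session, normalized query)
    pair is counted at most once so that repeats do not inflate a topic.
    """
    seen = set()
    counts = Counter()
    for session_id, raw_query in log:
        query = normalize(raw_query)
        if any(marker in query for marker in SPAM_MARKERS):
            continue
        if (session_id, query) in seen:
            continue
        seen.add((session_id, query))
        topic = topic_for(query)
        if topic is not None:
            counts[topic] += 1
    return counts.most_common()


if __name__ == "__main__":
    for topic, count in rank_topics(SAMPLE_LOG):
        print(topic, count)
```

Even in this toy version, the subjectivity mentioned above is easy to see: whether "jobless" belongs with "unemployment" is a choice someone has to make, and a different choice would change the ranking.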

The Public Data Explorer
As you can see, people are interested in a wide variety of data and statistics, but this information is only useful if it's easy to access, understand and communicate. That's why today we're also releasing the Google Public Data Explorer in Labs, a new experimental product designed to help people comprehend data and statistics through rich visualizations. With the Data Explorer, you can mash up data using line graphs, bar graphs, maps and bubble charts. The visualizations are dynamic, so you can watch them move over time, change topics, highlight different entries and change the scale. Once you have a chart ready, you can easily share it with friends or even embed it on your own website or blog. We've embedded the following chart using the new feature as an example: