
Data tutorials

Please note that due to the retirement of the Commerce Data Service, these data tutorials are no longer actively maintained and are presented as-is for public use. Over time, the links and code in the tutorials may break or become outdated. When possible, broken links have been updated to an archive.org snapshot.

With tens of thousands of datasets ranging from satellite imagery to material standards to demographic surveys, the U.S. Department of Commerce has long been in the business of Open Data. Through the Commerce Data Usability Project, go on a series of guided tours through the Commerce data lake and learn how you can leverage this free and open data to unlock the possible.

Impact of Spending on Traffic Congestion (link is external)  by Deloitte and Datawheel (June 2016)

Description:  Everyone hates traffic, and for most of us, it seems to be getting worse with every passing year. But how does traffic congestion in our home town actually compare to other cities across the country, and what is the government doing about it? These are complicated questions, but we can begin to answer them by comparing data on commuting times with data on public spending on congestion relief projects. The resulting mashup can enable us to see the extent to which government spending through formula grants and related programs from the Department of Transportation aligns with high-congestion regions. Data on commute times is collected by the Census Bureau's American Community Survey each year, and we'll access these data via Data USA's convenient API. Spending data is provided by the Treasury Department on USASpending.gov. In this tutorial, we'll show you how to build a simple web visualization which mashes up these two datasets and allows users to start to get answers to questions about congestion mitigation spending.
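The tutorial builds its mashup in JavaScript; the underlying join can be sketched in a few lines of Python. All figures below are invented placeholders, not real Data USA or USASpending.gov values:

```python
# Relate mean commute time to congestion-relief spending by metro area.
# Both dictionaries stand in for API responses from Data USA and
# USASpending.gov; the numbers are illustrative only.
commute_minutes = {"Los Angeles": 31.1, "Chicago": 30.8, "Dallas": 27.4}
relief_spending = {"Los Angeles": 120e6, "Chicago": 95e6, "Dallas": 60e6}

def spending_per_commute_minute(spending, commute):
    """Dollars of congestion-relief spending per minute of mean commute."""
    return {msa: spending[msa] / commute[msa]
            for msa in spending if msa in commute}

ratios = spending_per_commute_minute(relief_spending, commute_minutes)
```

Sorting `ratios` then answers the tutorial's core question: which high-congestion regions receive the most (or least) relief spending relative to their commute burden.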

Authors:  Peter Viechnicki (Deloitte Services LP (link is external) ), Zach Whitman (Deloitte Advisory LP (link is external) ), Jonathan Speiser and Dave Landry (Datawheel (link is external) )

Tools:  JavaScript, Python, D3.js, D3plus, leaflet 

Data:  Data USA (link is external) USASpending.gov

Tough Crowd? A Deep Dive into Business Dynamics (link is external)  by DataScience Inc (September 2016)

Description:  Every year, thousands of entrepreneurs launch startups, aiming to make it big. This journey and the perils of failure have been interrogated from many angles, from the risky decision to start the next iconic business to the demands of running your own startup. But while startup survival has been written about at length, how do survival rates actually shake out when we look at empirical evidence? As it turns out, the U.S. Census Bureau collects data on business dynamics that can be used for survival analysis of firms and jobs. In this tutorial, we build a series of functions in Python to better understand business survival across the United States. By comparing survival rates in various Metropolitan Statistical Areas (MSAs), we find that some regions fare far better in business survival than others.
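The core calculation is a survival curve: the fraction of a founding cohort of firms still alive at each age. A minimal Python sketch, using made-up counts rather than the Census Business Dynamics API:

```python
# firms_by_age[msa] = number of surviving firms from one founding cohort
# after 0, 1, 2, ... years. The counts are invented for illustration.
firms_by_age = {
    "Austin":  [1000, 780, 640, 545, 480],
    "Detroit": [1000, 720, 560, 455, 390],
}

def survival_curve(counts):
    """Fraction of the founding cohort still alive at each age."""
    base = counts[0]
    return [n / base for n in counts]

curves = {msa: survival_curve(c) for msa, c in firms_by_age.items()}
```

Comparing `curves` across MSAs is the sketch version of the tutorial's cross-region comparison.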

Authors:  Guo Liu and Dave Goodsmith (DataScience Inc (link is external) )

Tools:  Python, Plotly 

Data:  Census Business Dynamics API 

Examining the Government-Constituent Relationship. With data. And trees. (link is external) by Columbia University (July 2016)

Description:  In 2012, Hurricane Sandy inflicted tens of billions of dollars in losses, destroying or damaging thousands of homes and vehicles and bringing the Northeast US to a grinding halt. A lesser-cited impact was the storm's toll on urban forests. Sandy knocked down or seriously damaged over 20,000 street trees in New York City (NYC) alone. In the hours and days that followed, thousands of New Yorkers contacted local government via the City of New York's 311 system asking for tree service to remove downed trees from cars, buildings, and sidewalks. Devastating storm events are natural experiments: well-defined, independent, observable events. Immediately following a storm, it is highly likely that tree damage reported to 311 was generated by that event. To explore this paradigm, in 2014, Jonathan Auerbach, from Columbia University's Statistics Department, and Christopher Eshleman, recently graduated from Columbia's School of International and Public Affairs, set out to analyze the underlying demand patterns generated by a series of storm events that had recently hit New York City. They used a combination of data from the City of New York's open data portal and the US Census Bureau, and worked closely with the City's Parks Department and NYC311. With these data they produced a series of statistical analyses that helped identify communities in the City with a higher propensity to report storm damage. Through the Commerce Data Usability Project and in collaboration with the Commerce Data Service, we have presented Auerbach and Eshleman's work as a two-part R tutorial on extracting insights from spatial data using recent developments in Bayesian statistics. Part one focuses on the fundamentals of processing geospatial data. Part two, soon to be released, focuses on the statistical analysis that enables data-driven policy and operations.
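The tutorial itself is written in R; as a rough Python sketch of its first pre-processing step, reporting propensity can be compared across tracts by normalizing raw 311 counts by population. Tract IDs and counts below are invented:

```python
# Normalize raw 311 tree-damage reports by tract population so that tracts
# of different sizes can be compared. All values are illustrative.
reports = {"tract-A": 42, "tract-B": 7, "tract-C": 19}
population = {"tract-A": 8200, "tract-B": 3100, "tract-C": 5600}

def reports_per_1000(reports, population):
    """311 reports per 1,000 residents, skipping tracts with no population."""
    return {tract: 1000 * reports[tract] / population[tract]
            for tract in reports if population.get(tract)}

rates = reports_per_1000(reports, population)
```

The Bayesian modeling in part two then asks which of these rate differences reflect real differences in reporting propensity rather than noise.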

Authors:  Jonathan Auerbach (Columbia University Dept of Statistics), Christopher Eshleman (Columbia University SIPA)

Tools:  R

Data:  NYC 311 (Pre-2010) (link is external), NYC 311 (Post-2010) (link is external), Census Tract Shapefile (downloads .zip), NYC Parks Shapefile (downloads .zip) (link is external)

Census Schooling Data with PowerBI, Part 1 (link is external) by Microsoft (April 2016)

Description:  Data analysis using open government data has taken center stage for everything from community organizing to starting or growing a business to crafting good public policy. Many government entities - federal, state, and local - have made open data a cornerstone of their transparency initiatives. But opening data to the public is just the first step to driving knowledge and insights; people need easy-to-use tools to help them discover, combine, analyze, and visualize that data. This three-part tutorial covers how to use the free version of the Power BI tool to ask questions of open federal and local data sources.

Author:  Adam Hecktman (Microsoft Chicago (link is external) )

Tools:  PowerBI

Data:  2010 Census of Governments 

Exploring the landscape of American Innovation (link is external) by the U.S. Patent and Trademark Office (April 2016)

Description:  Among the vast set of data that the USPTO provides, the Patent Technology Monitoring Team (PTMT) produces aggregations of the underlying patent data along multiple dimensions and partitions. These aggregations answer a host of questions, including questions about trends in technology and innovation in any given sector. As part of the Commerce Data Usability Project and in collaboration with the Commerce Data Service, the USPTO created a tutorial introducing Tableau as a way to explore USPTO data. In essence, Tableau makes it easy to develop visualizations that quickly surface changes in the data. A series of visualizations powered by Tableau and USPTO data, for example, reveals new insights into how innovation varies across the United States, state by state: in California, more patents are granted for technology innovations, while in Alaska, more are granted for oil drilling.

Author:  Christopher Berman (USPTO)

Tools:  Python, Tableau

Data:  Patent Trademark Monitoring Team Data Aggregations (PTMT) 

Living Data for Web Maps, Part 2: Animating the atmosphere (link is external) by Mapbox

Description:  Building upon the first tutorial (link is external) in a two-part series, Mapbox demonstrates how to animate Precipitable Water (PWAT) data from NOAA's global weather forecasts. This tutorial brings the data to life, showing the fluidity of an atmospheric variable that is crucial for understanding the environment. In collaboration with the Commerce Data Service, Mapbox has created this second installment in a two-part tutorial that guides you through processing and visualizing weather data.
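At its core, animating forecast rasters means generating smooth in-between frames from successive model outputs. The tutorial uses GDAL-based tooling; a minimal pure-Python sketch of the interpolation idea, with each raster as a small 2-D grid of invented PWAT values:

```python
# Two successive PWAT forecast rasters (2x2 grids, invented values).
frame_a = [[10.0, 12.0], [14.0, 16.0]]
frame_b = [[20.0, 22.0], [24.0, 26.0]]

def lerp_frames(a, b, t):
    """Linearly interpolate two rasters at fraction t in [0, 1]."""
    return [[(1 - t) * av + t * bv for av, bv in zip(ra, rb)]
            for ra, rb in zip(a, b)]

# Rendering, say, 10 interpolated frames between each forecast step
# yields the fluid animation the tutorial describes.
midframe = lerp_frames(frame_a, frame_b, 0.5)
```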

Authors:  Damon Burgett and Charlie Lloyd (Mapbox (link is external) )

Tools:  bash, GDAL (link is external), rasterio (link is external), gribdoctor (link is external), Mapbox Studio (link is external)

Data:  NOAA National Operational Model Archive and Distribution System (NOMADS) 

This house(ing market) is on fire! Using American Community Survey and Zillow data to explore housing affordability for protective service workers (link is external) by Zillow (March 2016)

Description:  When it comes to buying a house, there is a lot more to consider than just the sticker price. Wages and salaries vary widely across the country, including within specific occupations, as does the share of household income local residents are accustomed to dedicating to housing costs each month. At one extreme, some places with low housing costs might appear very affordable, but incomes might also be much lower than elsewhere; at the other extreme, some places that appear extremely expensive based on prices alone might be more manageable as a result of relatively high wages and salaries. By combining Zillow data on home values, rents, and historical housing-cost burdens with data on incomes by occupation from the United States Census Bureau's American Community Survey, we can compare housing affordability for specific types of workers in different communities across the country. In this example, we estimate the share of a household's income that goes to a monthly mortgage payment on the median home across the country's metro areas.
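The affordability estimate hinges on the standard fixed-rate mortgage payment formula. The tutorial is in R; here is a Python sketch with illustrative rate and down-payment assumptions, not Zillow's actual model parameters:

```python
def monthly_payment(principal, annual_rate, years=30):
    """Standard amortized payment: P * r / (1 - (1 + r)^-n)."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)

def share_of_income(home_value, annual_income, annual_rate=0.04,
                    down_payment=0.20):
    """Share of monthly income going to the mortgage payment."""
    payment = monthly_payment(home_value * (1 - down_payment), annual_rate)
    return payment / (annual_income / 12)

# e.g., the median home in a metro vs. a protective-service salary
# (both figures invented for illustration)
share = share_of_income(250_000, 55_000)
```

Computing `share` for each metro area, with occupation-specific incomes from the ACS, reproduces the comparison the tutorial makes.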

Author:  Aaron Terrazas (Zillow (link is external) )

Tools:  R

Data:  U.S. Census Bureau's American Community Survey

Environmental Data for Decision Making: Using open data for business investment decisions (link is external) by Earth Genome

Description:  Open environmental data can be viewed as infrastructure for society: the data is available to the public and can be meaningfully used to understand nature-based solutions to environmental issues. Earth Genome, an environmental nonprofit, has built a Green Infrastructure Support Tool (GIST) on top of open environmental data to help businesses evaluate investment decisions involving "green" infrastructure. Among the data that drives this tool is NOAA's topography data - a fundamental element in their wetlands-restoration model for informing industrial investment decisions that support not only the bottom line for manufacturing but also the environment. This data is critical for everything from hydrology and risk management to construction planning and telecommunications. In this tutorial, Earth Genome and the Commerce Data Service demonstrate how to manipulate and visualize topography data.
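A basic manipulation of topography data is deriving slope from an elevation grid. As a rough Python sketch (the tutorial works in R), using a tiny invented elevation matrix and an assumed cell size:

```python
# A 3x3 elevation grid in meters (invented values) with 30 m cells,
# standing in for a tile of a digital elevation model.
elevation = [[10.0, 12.0, 15.0],
             [11.0, 13.0, 17.0],
             [12.0, 15.0, 20.0]]

def eastward_slope(grid, cell_size=30.0):
    """Rise over run between each cell and its eastern neighbor."""
    return [[(row[j + 1] - row[j]) / cell_size
             for j in range(len(row) - 1)]
            for row in grid]

slopes = eastward_slope(elevation)
```

Slope surfaces like this feed directly into the hydrology and siting questions the GIST tool is built to answer.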

Authors:  Dan Hammer (Earth Genome (link is external) ), Jeff Chen (U.S. Department of Commerce)

Tools:  R, Plotly

Data:  NOAA GLOBE Digital Elevation Model (NOMADS) 

Weather Data for Web Maps, Part 1: Using Mapbox and Open Source tools to map atmospheric water (link is external) by Mapbox

Description:  Precipitable Water (PWAT) helps meteorologists understand the available moisture - fuel for storms - in the atmosphere. When visualized, the complex swirling and eddying patterns bring atmospheric processes to life, and are a beautiful liquid analog to the more esoteric variable they describe. Combining this data with reference information - coastlines, political borders, and terrain - helps to paint a clearer picture of Earth-surface and atmospheric interactions on our planet. In collaboration with the Commerce Data Service, Mapbox has created this first installment in a two-part tutorial that will guide you through processing and visualizing weather data.
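A key processing step when putting model output on a web map is rescaling raw physical values to the 0-255 range of an image band. A minimal sketch of that normalization, with assumed PWAT bounds in kg/m^2 (not the tutorial's actual code):

```python
def to_byte_scale(values, vmin, vmax):
    """Clamp and scale raw values to 0-255 for image encoding."""
    span = vmax - vmin
    return [max(0, min(255, round(255 * (v - vmin) / span)))
            for v in values]

# Scale a row of PWAT samples assuming a 10-60 kg/m^2 display range.
pixels = to_byte_scale([10, 35, 60], vmin=10, vmax=60)
```

The scaled bands can then be styled with a color ramp in Mapbox Studio and overlaid on coastlines and terrain.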

Author:  Damon Burgett (Mapbox (link is external) )

Tools:  bash, GDAL (link is external), rasterio (link is external), gribdoctor (link is external), Mapbox Studio (link is external)

Data:  NOAA National Operational Model Archive and Distribution System (NOMADS) 

Is there enough fire in your firewall? Using NIST's National Vulnerability Database to understand cybersecurity (link is external) (archive snapshot)

Description:  That's the question Americans ask themselves daily. What is a good tradeoff between convenience and security when everything we use seems to require constant updates? How often are the vulnerabilities being discovered, exploited, or patched in the software that powers our lives? How much are we putting ourselves at risk when we ignore our updates? This tutorial uses cybersecurity data from the National Institute of Standards and Technology (NIST) to uncover vulnerability patterns across the tech world.
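The kind of aggregation the tutorial performs over NVD records can be sketched in a few lines of Python: group vulnerabilities by year and summarize their severity scores. The records below are invented stand-ins, not real CVE entries:

```python
from collections import defaultdict

# Simplified vulnerability records: publication year and CVSS base score.
# Values are illustrative, not drawn from the actual NVD feed.
cves = [
    {"year": 2015, "cvss": 7.5},
    {"year": 2015, "cvss": 4.3},
    {"year": 2016, "cvss": 9.8},
]

def severity_by_year(records):
    """Map year -> (vulnerability count, mean CVSS score)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["year"]].append(r["cvss"])
    return {y: (len(s), sum(s) / len(s)) for y, s in buckets.items()}

stats = severity_by_year(cves)
```

Plotting counts and mean severity over time, as the tutorial does with D3.js and dygraphs, shows how quickly vulnerabilities accumulate across the tech world.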

Tools:  Python, D3.js, Dimple.js, dygraphs, Gephi 

Data:  NIST's National Vulnerability Database (NVD)

Acknowledgements: This work was made possible with the technical review conducted by NIST (NVD).

Data Science with Nighttime Lights, Part 1: Processing satellite imagery for demographic analysis  (link is external)

Description:  Every satellite pass is an opportunity to improve our understanding of society in near real time. The imagery captures different bandwidths of light, enabling a wide range of scientific and operational capabilities. In particular, nighttime satellite data can be transformed into approximations of economic activity as well as demographic patterns. In this first part of a multi-part series, we illustrate how to process monthly satellite imagery composites from NOAA and correlate them with population data from the U.S. Census Bureau.
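The correlation step at the heart of the tutorial (done there in R) can be sketched in plain Python: compare mean nighttime radiance per county against county population. The values below are invented for illustration:

```python
# Mean DNB radiance per county and county population (invented values).
radiance = [12.1, 45.3, 8.7, 60.2, 25.0]
population = [50_000, 210_000, 30_000, 340_000, 120_000]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(radiance, population)
```

A high `r` across counties is what licenses using nighttime lights as a proxy for demographic and economic patterns.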

Authors: Jeff Chen and Star Ying (Commerce Data Service)

Tools:  R, Plotly 

Data:  VIIRS Day/Night Band (DNB) Cloud Free Monthly Composites (VIIRS DNB), US Census Bureau County Shapefiles (TIGER) 

Acknowledgements:  This work was inspired and developed in collaboration with NOAA's National Centers for Environmental Information (NCEI).

Focusing resources through survey data: Using the American Community Survey to help nonprofit missions (link is external) (February 2016)

Description:  Providing an unprecedented glimpse of societal well-being, the American Community Survey (ACS) is one of the highest-value datasets produced by the U.S. Census Bureau. The dataset enables socioeconomic research and informs urban planning efforts, among other uses, and it can also be applied to help nonprofits focus their outreach when providing public services. This tutorial illustrates how ACS data can be used to identify areas that may benefit from services.
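The targeting idea reduces to ranking tracts by a need indicator computed from ACS-style estimates. A minimal Python sketch with invented tract names and counts (the indicator here, share of households below a poverty threshold, is one plausible choice, not necessarily the tutorial's):

```python
# Invented ACS-style estimates per tract.
tracts = {
    "Tract A": {"households": 1200, "below_poverty": 310},
    "Tract B": {"households": 900,  "below_poverty": 95},
    "Tract C": {"households": 1500, "below_poverty": 520},
}

def rank_by_need(data):
    """Sort tracts by share of households below the threshold, highest first."""
    shares = {t: v["below_poverty"] / v["households"]
              for t, v in data.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_by_need(tracts)  # highest-need tract first
```

A nonprofit could use a ranking like this to decide where outreach is likely to matter most.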

Author:  Star Ying (Commerce Data Service)

Tools:  Python, D3.js, Leaflet.js, Leaflet-omnivore.js, TopoJSON.js, GeoJSON-VT.js and Dimple.js 

Data:  ACS estimates (Tract and Block Group Level) (downloads .tar.gz), ACS Geography Files (downloads .zip), Census tract shapefiles

Goodness gracious, great balls of hail: Using NOAA's severe weather data for risk analysis (link is external) (archive snapshot)

Description:  For some Americans, hail is a real weather phenomenon that threatens the physical well-being of people and property. But how often does it happen? Where do these events normally occur? As it turns out, the National Oceanic and Atmospheric Administration (NOAA) collects massive amounts of data on precipitation using radar stations across the country. 
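The "how often and where" questions amount to event-frequency aggregations over SWDI-style records. A rough Python sketch (the tutorial itself uses R), with invented event records:

```python
from collections import Counter

# Simplified severe-weather event records: county and year of each hail
# report. All records are invented for illustration.
events = [
    {"county": "Tarrant, TX", "year": 2013},
    {"county": "Tarrant, TX", "year": 2013},
    {"county": "Tarrant, TX", "year": 2014},
    {"county": "Larimer, CO", "year": 2014},
]

def events_per_year(records):
    """Average events per observed year, by county."""
    by_county = Counter(r["county"] for r in records)
    years = {r["year"] for r in records}
    return {county: n / len(years) for county, n in by_county.items()}

freq = events_per_year(events)
```

Joining frequencies like these to county shapefiles is what produces the risk maps the tutorial builds with Google Charts and Leaflet.js.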

Authors:  Stephen Del Greco (NOAA/NESDIS/NCEI), Steve Ansari (NOAA/NESDIS/NCEI), Mark Phillips (UNC Asheville/NEMAC), Ed Kearns (NOAA/NESDIS/NCEI)

Tools:  R with Google Charts and Leaflet.js integration

Data:  NOAA Severe Weather Data Inventory (SWDI), US Census Bureau County Shapefiles (TIGER) 

Acknowledgements:  This work was inspired and developed in collaboration with NOAA's National Centers for Environmental Information (NCEI).