The nuts and bolts of the Census graphing ‘widget’
A detailed explanation of the code
A word of caution
Unlike the other elements in this module, this tutorial is designed for more advanced Web programmers. Specifically, it assumes:
- You are familiar with HTML
- You are familiar with PHP
- Or you have access to someone on your staff or in your community who has those skills
Okay, with full recognition that this will take you pretty deep in the weeds of Web programming, this tutorial shows you the basics of building a Web site that scrapes select demographic data about a user-defined ZIP code from the Census Web site.
If you came to this page before reading the Web scraping overview, you probably should go back. You might also want to check out the section on data mining.
Credit for both the tutorial and the coding goes to Josh Williams, who recently received his master's degree in interactive journalism from the School of Communication at American University.
For this example, we will be collecting and using demographic data for an Arlington, Va., ZIP code.
The widget we built is based on PHP. Many computer languages facilitate Web scraping, and several languages have special libraries or packages to make it even easier. To reach a very broad audience, we are going to use PHP, a free Web development language available on almost all standard shared Web hosts.
The widget has three main features: It permits users to enter a ZIP code, grabs the data from American FactFinder and creates and displays either a bar graph or pie chart of the information. The bar graph compares the ZIP code's characteristics to national averages. Both graphs include a text data table.
The full source code of the sample applications also is available. You are free to use it either as a stand-alone application or as the basis of your own Census-based Web application. If you do decide to use the code or incorporate it on your site, we would appreciate credit and a link back to this page.
For the second of the two graphing examples, we will use GD, an optional graphics library for PHP that may not be installed with PHP on all Web hosts, but which can easily be downloaded and installed.
While meant to be simple, with code examples written more for clarity than efficiency, this tutorial assumes a basic familiarity with Web development using PHP and HTML.
The tutorial is broken into three parts:
- The Basics – Getting the data automatically.
- Graphing The Data – Building a bar graph.
- Advanced Graphing – Creating pie charts with dynamic images and the GD library.
The basics
In an ideal world, building Web applications with data from the Web would be as easy as pulling data from an XML or RSS file. However, to pull race-related data from the Census Web site, we’re going to have to scrape it.
The first step in this exercise is to determine the location of the desired data. For basic ZIP code-level demographic data at the American FactFinder, the URL is: http://factfinder.census.gov/servlet/SAFFFacts?_event=Search&_zip=22203.
The part we are concerned about in the URL is the _zip variable at the end of the query string. Changing “22203” to another valid five-digit ZIP code brings up that area's data.
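Because the ZIP code ends up in a URL and will eventually come from a user, it is worth checking it first. The following is a minimal sketch of that idea, not part of the original widget; the function name buildFactFinderUrl is hypothetical, and the only rule it assumes is "exactly five digits."

```php
<?php
// Hypothetical helper (not part of the original widget): validate the
// user-supplied ZIP code, then build the FactFinder URL from the tutorial.
function buildFactFinderUrl($zip) {
    $zip = trim($zip);
    // Accept exactly five digits; reject anything else.
    if (!preg_match('/^[0-9]{5}$/', $zip)) {
        return false;
    }
    return "http://factfinder.census.gov/servlet/SAFFFacts?_event=Search&_zip=" . $zip;
}

echo buildFactFinderUrl("22203"), "\n";
var_dump(buildFactFinderUrl("2220a")); // bool(false)
```

Returning false for bad input lets the calling page show an error message instead of requesting a page that will not contain the data.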
Now that we know where the demographic information is, we need to parse out only the information we want. Let’s start with the white population for 22203, the first one listed on the Web page.
On the page it looks like:
The corresponding HTML is:

Take note of the location of each of the numbers we want (12,587, 68.0, 75.1). Notice that they are all directly below a line with unique “headers” values of the table data cells (<td>). One simple way for our application to grab the statistics about the white population in 22203 is to loop through each line of the HTML document, search for the unique “headers” value above the numbers we want, strip out the HTML on the next line (which leaves only the desired value on that line) and ignore everything else on the page.
Here is the PHP code to accomplish that:
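<?php
//ZIP code we want demographic data for
$zip = "22203";
//the URL of the data
$url = "http://factfinder.census.gov/servlet/SAFFFacts?_event=Search&_zip=" . $zip;
//put Census page HTML in an array, one line per element
$lines = file($url);
//stop if the page could not be fetched
if ($lines === false) {
    die("Could not retrieve the Census page.");
}
//default value in case the pattern is never found
$totalWhite = "";
//loop through HTML and search for unique table values
foreach ($lines as $line_num => $line) {
    //white population (denoted by the R9 R10 C2 headers)
    if (preg_match('/<td headers="R9 R10 C2" align="RIGHT">/', $line)) {
        //the value is on the next line; strip any HTML and whitespace
        $totalWhite = trim(strip_tags($lines[$line_num + 1]));
    }
}
//print number of white people in 22203
echo($totalWhite);
?>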
Explaining the PHP
The first few lines simply define the location of the Census data and put the source of that page into an array, with one line of HTML per array element. The “$zip” variable can be changed to any valid five-digit ZIP code.
The next bit gets interesting. The “foreach” loop works through each element of the array (each element being a line of HTML) while “preg_match” looks for the unique headers value. Once we’ve found the matching pattern, we take the next line of HTML, which is the next element in the array ($lines[$line_num + 1]), and assign it to a variable for later use.
If you got lost in the details of how the code works, the important part to remember is that the assignment to “$totalWhite” stores the value 12,587 by taking the line of HTML after the one that matched our unique headers value and stripping it of HTML. If we wanted to know the percent white for the ZIP code, we could simply grab the value three lines down from our match, like so: “$percentWhite = trim($lines[$line_num + 3]);”.
To grab other statistics from the page, simply count the lines from the match found above or add a new conditional and “preg_match” search. Once all the data is stored in variables, printing it in tabular form is a snap.
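One way to avoid repeating the search for every statistic is to wrap it in a small function. The sketch below is an illustration only: the function name valueAfterHeaders is hypothetical, it swaps “preg_match” for a plain “strpos” substring search, and the sample lines (including the “R9 R10 C3” headers value) are made up to stand in for the real page.

```php
<?php
// Hypothetical helper: find the first line containing a given "headers"
// value and return the cleaned-up text $offset lines below it.
function valueAfterHeaders(array $lines, $headers, $offset = 1) {
    $needle = 'headers="' . $headers . '"';
    foreach ($lines as $line_num => $line) {
        if (strpos($line, $needle) !== false && isset($lines[$line_num + $offset])) {
            return trim(strip_tags($lines[$line_num + $offset]));
        }
    }
    return null; // nothing matched
}

// Made-up sample lines standing in for the real page (illustration only).
$lines = array(
    '<td headers="R9 R10 C2" align="RIGHT">',
    '12,587</td>',
    '<td headers="R9 R10 C3" align="RIGHT">',
    '68.0</td>',
);

$totalWhite   = valueAfterHeaders($lines, "R9 R10 C2");    // one line down
$percentWhite = valueAfterHeaders($lines, "R9 R10 C2", 3); // three lines down

// Print the collected values as a one-row table.
echo "<table>\n";
echo "<tr><th>White</th><td>$totalWhite</td><td>$percentWhite%</td></tr>\n";
echo "</table>\n";
```

On the real page, $lines would come from “file($url)” as in the listing above, and the offsets would need to be checked against the actual HTML.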
Try the application. View the source.
Graphing the data
Now that we know how to automatically grab Census demographic data for any ZIP code, we can represent that data graphically. A bar chart is a simple way to show the relationship between population percentages, both locally and nationally, for much of the demographic data on the American FactFinder.
Follow the instructions in section one to grab Census data and store them in variables for later use. This tutorial will stick with race data for ZIP code 22203.
We are going to make our chart by dynamically sizing static images of different colors in order to represent the two series of data (the ZIP code and U.S. total). The “height” and “width” attributes of the image tag size an image, even if the values of the attributes do not match the true pixel size of the image. Using this feature, we can create very small images and stretch them to appear to be separate, individually-sized bars.
We take two images, each one pixel high by one pixel wide, and stretch several instances of them to produce the following result:
Here is a sample of the relevant HTML (you may have to change the paths to match those on your server):
<table width="508" height="250" border="0" cellpadding="0" cellspacing="0">
...
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_green.png" width="32" height="7" /></td>
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_greenW.png" width="32" height="5" /></td>
...
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_green.png" width="32" height="28" /></td>
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_greenW.png" width="32" height="15" /></td>
...
</table>
Notice we used a table to contain the dynamic images. Tables can have an image as a background, which we will use as our scale.
Here is the same table with a background image we created to show scale:
Now all we have to do is make the bars the correct size relative to the values they represent. Our table is 250 pixels high, so a value of “100%” would need to be 250 pixels high. A value of “50%” would need to be 125 pixels high, et cetera.
The equation to determine bar heights: image height = (race percent / 100) * 250.
We can express this in a simple PHP function:
function calculateBarGraph($v){
    $v = ceil(($v/100)*250);
    return $v;
}
$v is the percent value we want to turn into an image height. We round every result up to the next whole pixel with “ceil()” so that any bar with a nonzero value has at least one pixel visible.
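To see how the function feeds the table cells, here is a rough sketch that turns a percentage into one cell of the chart. The helper name barCell is hypothetical, and which image stands for the ZIP code and which for the U.S. total is an assumption; 68.0 and 75.1 are the two percentages quoted earlier, which we take to be the ZIP code and national figures.

```php
<?php
// Repeated from the tutorial so this sketch runs on its own.
function calculateBarGraph($v) {
    return ceil(($v / 100) * 250);
}

// Hypothetical helper: build one table cell holding a stretched bar image.
// valign="bottom" keeps the bar sitting on the baseline of the chart.
function barCell($imagePath, $percent) {
    $height = (int) calculateBarGraph($percent);
    return '<td width="32" valign="bottom"><img src="'
        . htmlspecialchars($imagePath)
        . '" width="32" height="' . $height . '" /></td>';
}

// Assumed pairing of images to data series (illustration only).
echo barCell("/reporting/widgets/images/chart_green.png", 68.0), "\n";
echo barCell("/reporting/widgets/images/chart_greenW.png", 75.1), "\n";
```

With a 250-pixel scale, 68.0 percent becomes a 170-pixel bar, and 75.1 percent becomes 188 pixels (187.75 rounded up).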
Try the application. View the source.
Advanced graphing
The bar chart technique in the previous section, which uses small images stretched to appear various sizes, can be used in many situations. There are times, however, when stretching images into bars will not suffice. Line graphs and pie graphs, for example, are valuable tools that require more horsepower than has been introduced thus far. Fortunately, there is an optional library called GD that can be compiled into PHP. This library facilitates the creation of dynamic images in various formats. Your system administrator can tell you if GD is installed with your version of PHP.
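If you would rather check for GD yourself, a short script along these lines should do it. This is a minimal sketch that relies only on PHP's built-in “extension_loaded()” and “gd_info()” functions; the wording of the messages is ours.

```php
<?php
// Report whether the GD extension is loaded in this PHP installation.
$gdLoaded = extension_loaded('gd');

if ($gdLoaded) {
    // gd_info() returns an array describing the installed GD library.
    $info = gd_info();
    echo "GD is available: " . $info['GD Version'] . "\n";
} else {
    echo "GD is not available; ask your host to enable it.\n";
}
```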
Building on the first two sections, we are going to create dynamic pie charts with GD and libchart, another open source library written in pure PHP that only needs to be uploaded to your Web server to work.
Fortunately, libchart makes pie charts very easy by handling all of the low-level GD details. Simply upload the “libchart” folder from the download to the project folder you want to use for your chart and add one line at the top of your PHP page: include "libchart/libchart.php";
From here, we take the demographic data that we collected in the first step of the tutorial and feed the variables to libchart as shown in this PHP code snippet:
//name of image to create
$dynamicPieChart = "cache/" . $zip . "_pie.png";
//only create if it doesn't exist already
if(!file_exists($dynamicPieChart)) {
$pie_chart = new PieChart(510, 250);
$pie_chart->setLogo("images/blankLogo.png");
$pie_chart->addPoint(new Point("White - $percentWhite", $percentWhite));
$pie_chart->addPoint(new Point("Black / African American - $percentBlack", $percentBlack));
$pie_chart->addPoint(new Point("American Indian - $percentAmericanIndian", $percentAmericanIndian));
$pie_chart->addPoint(new Point("Asian - $percentAsian", $percentAsian));
$pie_chart->addPoint(new Point("Native Hawaiian - $percentHawaiian", $percentHawaiian));
$pie_chart->addPoint(new Point("Some Other Race - $percentOther", $percentOther));
$pie_chart->addPoint(new Point("Two Or More Races - $percent2More", $percent2More));
$pie_chart->setTitle("Race Breakdown for ZIP Code ". $zip);
$pie_chart->render($dynamicPieChart);
}
Notice the $dynamicPieChart variable. That is the location on the Web server where we are going to physically create a static pie chart. We chose to create a “cache” folder on the Web server for our images. Regardless of the location, PHP needs to have write access to the folder. If you have problems writing to the folder, contact your system administrator.
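A quick way to find out whether PHP can write to the folder is to test it before rendering. The following is a sketch under stated assumptions: the helper name ensureWritableDir is hypothetical, and the example points at a folder inside the system temp directory (rather than the tutorial's “cache” folder) only so that it can run anywhere.

```php
<?php
// Hypothetical helper: make sure the cache folder exists and is writable
// before asking libchart to render an image into it.
function ensureWritableDir($dir) {
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false; // could not create the folder
    }
    return is_writable($dir);
}

// The tutorial uses "cache"; a temp-directory path is used here only so
// the sketch can run on any machine.
$cacheDir = sys_get_temp_dir() . "/census_cache_example";

if (ensureWritableDir($cacheDir)) {
    echo "Cache folder is ready: $cacheDir\n";
} else {
    echo "PHP cannot write to $cacheDir\n";
}
```

If the check fails on your server, the folder's permissions or ownership probably need to be changed by whoever administers the host.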
The rest of the code creates a new instance of the pie chart class of the libchart library and feeds it the relevant data. The only other thing needed is a little HTML to place the image on a page.
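That HTML can be as small as a single image tag. Here is one possible sketch; the helper name chartImageTag and the alt text are our own, while the 510 by 250 size and the file path mirror the values used in the snippet above.

```php
<?php
// Hypothetical helper: build the <img> tag for a rendered chart file.
function chartImageTag($path, $zip) {
    return '<img src="' . htmlspecialchars($path)
        . '" width="510" height="250" alt="'
        . htmlspecialchars("Race breakdown for ZIP code " . $zip) . '" />';
}

$zip = "22203";
$dynamicPieChart = "cache/" . $zip . "_pie.png";

echo chartImageTag($dynamicPieChart, $zip), "\n";
```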
The result is this:
Try the application. View the source.