The nuts and bolts of the Census graphing ‘widget’
A detailed explanation of the code
A word of caution
Unlike the other elements in this module, this tutorial is designed for more advanced Web programmers. Specifically, it assumes:
Okay, with full recognition that this will take you pretty deep into the weeds of Web programming, this tutorial shows you the basics of building a Web site that scrapes select demographic data about a user-defined ZIP code from the Census Web site. If you came to this page before reading the Web scraping overview, you probably should go back. You might also want to check out the section on data mining.
Credit for both the tutorial and the coding goes to Josh Williams, who recently received his master's degree in interactive journalism from the School of Communication at American University.
For this example, we will be collecting and using demographic data for an Arlington, Va., ZIP code. The widget we built is based on PHP. Many computer languages facilitate Web scraping, and several have special libraries or packages to make it even easier. To reach a very broad audience, we are going to use PHP, a free Web development language available on almost all standard shared Web hosts.
The widget has three main features: It permits users to enter a ZIP code, grabs the data from American FactFinder, and creates and displays either a bar graph or a pie chart of the information. The bar graph compares the ZIP code's characteristics to national averages. Both graphs include a text data table.
The full source code of the sample applications also is available. You are free to use it either as a stand-alone application or as the basis of your own Census-based Web application. If you do decide to use the code or incorporate it on your site, we would appreciate credit and a link back to this page.
For the second of the two graphing examples, we will use GD, an optional graphics library for PHP that may not be installed with PHP on all Web hosts, but which can easily be downloaded and installed. While meant to be simple, with code examples written more for clarity than efficiency, this tutorial assumes a basic familiarity with Web development in PHP and HTML.
The tutorial is broken into three parts:
The basics
In an ideal world, building Web applications with data from the Web would be as easy as pulling data from an XML or RSS file. However, to pull race-related data from the Census Web site, we’re going to have to scrape it.
The first step in this exercise is to determine the location of the desired data. For basic ZIP code-level demographic data at American FactFinder, the URL is: http://factfinder.census.gov/servlet/SAFFFacts?_event=Search&_zip=22203. The part of the URL we are concerned with is the _zip variable at the end of the query string. Changing “22203” to another valid five-digit ZIP code brings up that area's data.
Now that we know where the demographic information is, we need to parse out only the information we want. Let’s start with the white population for 22203, the first one listed on the Web page.
On the page it looks like:
The corresponding HTML is:
Take note of the location of each of the numbers we want (12,587, 68.0, 75.1). Notice that each one sits directly below a line containing a table data cell (<td>) with a unique “headers” value. One simple way for our application to grab the statistics about the white population in 22203 is to loop through each line of the HTML document, search for the unique “headers” value above the number we want, strip out the HTML on the next line (which leaves only the desired value on that line) and ignore everything else on the page. Here is the PHP code to accomplish that:
<?php
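As a rough illustration of the two steps described so far (building the URL from a ZIP code, then scanning line by line for a “headers” value), here is a minimal sketch. The function names censusUrl() and valueBelowHeader(), the sample markup and its “headers” values are all made up for illustration; the markup on the real page will differ, and this is not the widget's original code.

```php
<?php
// Hypothetical helper: builds the FactFinder URL for a five-digit ZIP code.
// Returns null if the input is not exactly five digits.
function censusUrl($zip)
{
    if (!preg_match('/^\d{5}\z/', (string) $zip)) {
        return null;
    }
    $query = http_build_query(array('_event' => 'Search', '_zip' => $zip), '', '&');
    return 'http://factfinder.census.gov/servlet/SAFFFacts?' . $query;
}

// Hypothetical helper: finds the first line containing the given "headers"
// value and returns the text of the NEXT line with its HTML stripped out.
// Returns null if the value is not found.
function valueBelowHeader(array $lines, $headersValue)
{
    $needle = 'headers="' . $headersValue . '"';
    foreach ($lines as $i => $line) {
        if (strpos($line, $needle) !== false && isset($lines[$i + 1])) {
            return trim(strip_tags($lines[$i + 1]));
        }
    }
    return null;
}

// Made-up markup standing in for the real page (assumption, not Census HTML).
$sampleHtml = <<<HTML
<tr>
<td headers="race_white number">
12,587</td>
<td headers="race_white percent">
68.0</td>
</tr>
HTML;

// On a live page the lines could come from file(censusUrl($zip)), provided
// the host allows PHP to open URLs; here we split the sample instead.
$lines = explode("\n", $sampleHtml);
$whiteCount = valueBelowHeader($lines, 'race_white number');
$percentWhite = valueBelowHeader($lines, 'race_white percent');

echo censusUrl('22203'), "\n";
echo $whiteCount, " / ", $percentWhite, "\n";
```

Validating the ZIP code before it goes into the URL keeps stray user input out of the request; the same check is useful anywhere the ZIP code is reused later.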
Here is a sample of the relevant HTML (you may have to change the paths to match those on your server):
<table width="508" height="250" border="0" cellpadding="0" cellspacing="0">
...
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_green.png" width="32" height="7" /></td>
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_greenW.png" width="32" height="5" /></td>
...
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_green.png" width="32" height="28" /></td>
<td width="32" valign="bottom"><img src="/reporting/widgets/images/chart_greenW.png" width="32" height="15" /></td>
...
</table>
Notice we used a table to contain the dynamic images. Tables can have an image as a background, which we will use as our scale.
Here is the same table with a background image we created to show scale:
Now all we have to do is make the bars the correct size relative to the values they represent. Our table is 250 pixels high, so a value of “100%” would need to be 250 pixels high. A value of “50%” would need to be 125 pixels high, et cetera.
The equation to determine bar heights: image height = (race percent / 100) * 250.
We can express this in a simple PHP function:
function calculateBarGraph($v){
    $v = ceil(($v/100)*250);
    return $v;
}
$v is the percent value we want to turn into an image height. We round every value up to the next whole pixel with “ceil()” so that any nonzero value produces a bar at least one pixel high.
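To see how the function feeds the HTML, here is a small sketch that turns a percentage into an image tag. The helper name barImage() and the file name in the example call are assumptions for illustration only; the 32-pixel width matches the sample HTML above.

```php
<?php
// Same function as in the tutorial, repeated so the sketch runs on its own.
function calculateBarGraph($v){
    $v = ceil(($v/100)*250);
    return $v;
}

// Hypothetical helper: builds the <img> tag for one bar.
// ceil() returns a float, so cast to int for a clean height attribute.
function barImage($percent, $imagePath)
{
    $height = (int) calculateBarGraph($percent);
    return '<img src="' . $imagePath . '" width="32" height="' . $height . '" />';
}

// 50 percent of a 250-pixel table is 125 pixels.
echo barImage(50, 'chart_green.png'), "\n";
// A tiny value such as 0.1 percent still rounds up to 1 pixel.
echo barImage(0.1, 'chart_green.png'), "\n";
```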
Try the application. View the source.
The bar chart technique in the previous section, which stretches small images to various sizes, can be used in many situations. There are times, however, when stretching images into bars will not suffice. Line graphs and pie charts, for example, are valuable tools that require more horsepower than has been introduced thus far. Fortunately, there is an optional library called GD that can be compiled into PHP. This library facilitates the creation of dynamic images in various formats. Your system administrator can tell you if GD is installed with your version of PHP.
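If you would rather check for GD yourself, PHP can report it directly. The sketch below uses the built-in extension_loaded() and gd_info() functions; the wrapper name gdStatus() is just an illustrative choice.

```php
<?php
// Hypothetical helper: reports whether the GD extension is loaded.
function gdStatus()
{
    if (!extension_loaded('gd')) {
        return 'GD is not loaded in this PHP installation.';
    }
    // gd_info() returns an array of details about the installed GD library.
    $info = gd_info();
    return 'GD is available (version ' . $info['GD Version'] . ').';
}

echo gdStatus(), "\n";
```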
Building on the first two sections, we are going to create dynamic pie charts with GD and libchart, another open source library written in pure PHP that only needs to be uploaded to your Web server to work.
Fortunately, libchart makes pie charts very easy by handling all of the low-level GD details. Simply upload the “libchart” folder from the download to the project folder you want to use for the chart, then add one line at the top of your PHP page:
include "libchart/libchart.php";
From here, we take the demographic data that we collected in the first step of the tutorial and feed the variables to libchart, as shown in this PHP code snippet:
//name of image to create
$dynamicPieChart = "cache/" . $zip . "_pie.png";
//only create if it doesn't exist already
if(!file_exists($dynamicPieChart)) {
    $pie_chart = new PieChart(510, 250);
    $pie_chart->setLogo("images/blankLogo.png");
    $pie_chart->addPoint(new Point("White - $percentWhite", $percentWhite));
    $pie_chart->addPoint(new Point("Black / African American - $percentBlack", $percentBlack));
    $pie_chart->addPoint(new Point("American Indian - $percentAmericanIndian", $percentAmericanIndian));
    $pie_chart->addPoint(new Point("Asian - $percentAsian", $percentAsian));
    $pie_chart->addPoint(new Point("Native Hawaiian - $percentHawaiian", $percentHawaiian));
    $pie_chart->addPoint(new Point("Some Other Race - $percentOther", $percentOther));
    $pie_chart->addPoint(new Point("Two Or More Races - $percent2More", $percent2More));
    $pie_chart->setTitle("Race Breakdown for ZIP Code ". $zip);
    $pie_chart->render($dynamicPieChart);
}
Notice the $dynamicPieChart variable. That is the location on the Web server where we are going to physically create a static pie chart. We chose to create a “cache” folder on the Web server for our images. Regardless of the location, PHP needs to have write access to the folder. If you have problems writing to the folder, contact your system administrator.
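Because the ZIP code typed by the user ends up in a file name, it can help to validate it and confirm the folder is writable before rendering. The following is a minimal sketch under those assumptions; the helper name pieChartCachePath() is hypothetical and is not part of libchart or of the original widget.

```php
<?php
// Hypothetical helper: returns the cache path for a ZIP code's pie chart,
// creating the cache folder if it does not exist. Returns null on a problem.
function pieChartCachePath($zip, $cacheDir = 'cache')
{
    // Accept exactly five digits so user input cannot alter the file path.
    if (!preg_match('/^\d{5}\z/', (string) $zip)) {
        return null;
    }
    // Create the folder if it is missing.
    if (!is_dir($cacheDir) && !mkdir($cacheDir, 0755, true)) {
        return null;
    }
    // PHP must be able to write to the folder for the chart to be saved.
    if (!is_writable($cacheDir)) {
        return null;
    }
    return $cacheDir . '/' . $zip . '_pie.png';
}

// Example call using a temporary folder rather than a real Web directory.
$exampleDir = sys_get_temp_dir() . '/census_cache_example';
$examplePath = pieChartCachePath('22203', $exampleDir);
echo ($examplePath === null ? 'Cache folder is not usable' : $examplePath), "\n";
```

A null result is a signal to fix folder permissions (or reject the input) before calling render().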
The rest of the code creates a new instance of libchart's pie chart class and feeds it the relevant data. The only other thing needed is a little HTML to place the image on a page.
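That HTML can be as simple as a single image tag. Here is one possible sketch; the helper name pieChartImgTag() and the alt text are assumptions for illustration, while the 510 by 250 dimensions match the PieChart size used in the snippet above.

```php
<?php
// Hypothetical helper: builds the <img> tag for a rendered pie chart.
function pieChartImgTag($chartPath, $zip)
{
    // Escape both values before placing them inside HTML attributes.
    $src = htmlspecialchars($chartPath, ENT_QUOTES);
    $alt = htmlspecialchars('Race breakdown for ZIP code ' . $zip, ENT_QUOTES);
    return '<img src="' . $src . '" width="510" height="250" alt="' . $alt . '" />';
}

echo pieChartImgTag('cache/22203_pie.png', '22203'), "\n";
```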
The result is this:
Try the application. View the source.