pbogden.com

science & tech in the public interest

2016 single-family loans

Phase 1 -- Early design concept

Product requirements were developed over a casual cup of coffee. The client -- a major financial services company -- wanted to demonstrate that they support the entire housing market. They also wanted something easily understandable and visually appealing to a wide audience, including my son (a 10-year-old boy), who helps test the prototypes.

To start the prototyping process, we needed data. The first prototype involved 26,000 multi-family loans, each with a date and ZIP code. This was similar to data that we would get later under a non-disclosure agreement. And since it was readily available, it got us going quickly.

In this initial prototype, a yellow dot appears when a loan closes, leaving behind a small red dot on the map. Red dots accumulate over time to reveal a larger number of loans in densely populated areas.

Client feedback was positive. The only concern was that the graphic looked a bit too much like a video game (my son likes video games). So the design requirement was modified to: a bit more conservative but still appealing to my son.

Phase 2 -- Data preparation

Data for the final graphic includes one year's worth of single-family loans (about 2 million). Even though each loan came with only a ZIP code and a date, there were concerns about including too much detail. Since a savvy programmer can extract the data from web-based graphics, we aggregated loans by county. And as part of the analysis, we created a standard bubble plot.
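The county aggregation can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the ZIP-to-county lookup, the function name, and the sample records are all hypothetical, and a real crosswalk would come from an external source that isn't named here.

```python
from collections import Counter

# Hypothetical ZIP-to-county lookup keyed by ZIP code, with county FIPS
# codes as values. Assumed for illustration only.
ZIP_TO_COUNTY = {
    "02139": "25017",
    "02138": "25017",
    "10001": "36061",
}


def loans_per_county(loans, zip_to_county):
    """Count loans per county so ZIP codes and dates never reach the browser."""
    counts = Counter()
    for zip_code, _closing_date in loans:
        county = zip_to_county.get(zip_code)
        if county is not None:  # skip ZIP codes missing from the lookup
            counts[county] += 1
    return dict(counts)


# Made-up sample records: (ZIP code, closing date)
sample_loans = [
    ("02139", "2016-01-15"),
    ("02138", "2016-02-03"),
    ("10001", "2016-02-20"),
    ("99999", "2016-03-01"),  # not in the lookup, so it is skipped
]

county_counts = loans_per_county(sample_loans, ZIP_TO_COUNTY)
print(county_counts)
```

Only the per-county totals would be shipped with the graphic, so the individual loan records can't be recovered from the page.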

The pattern in the data is visually indistinguishable from the distribution of the U.S. population, which is precisely the story that the client wanted to convey. So this static graphic -- originally created just to check our calculations -- appears in their annual report. Our ultimate goal was to develop a dynamic graphic for their website.

Phase 3 -- Prototyping

During prototyping, a question arose about blank areas in the map. These unpopulated areas are most obvious in the Rockies. To make the geographic distinction clear, we integrated the NASA Blue Marble -- a cloudless satellite composite. Client feedback was positive (my son liked it too).

We then iterated on several versions of the bubble plot that showed variation over time. One version adapted functionality from the earthquake demo. It showed time variation with a translucent gray slider that could be moved back and forth. In the end, the client preferred this fully automated version for their website.

Phase 4 -- Predictive Analytics

The graphics above used client data to visualize a "story" -- nationwide support for the entire housing market. In the next phase, we developed predictive skill by integrating new data to provide additional context. The idea was to compensate for the fact that ZIP codes (and counties) vary greatly in size, and those variations don't necessarily reflect per capita homeownership. With open data from the IRS, we estimated the number of households in each ZIP code from the number of tax returns filed there. We then defined "loan ratio" as the number of loans per household. Variations in loan ratio should reflect socioeconomic factors that impact housing, as distinct from the number of people living in geographic regions of variable size.
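As a rough sketch of the loan-ratio calculation described above, with tax returns standing in for households: the function name, dictionary layout, and numbers below are made up for illustration and are not taken from the client or IRS data.

```python
def loan_ratio(loans_by_zip, returns_by_zip):
    """Return loans per household for each ZIP code.

    Households are approximated by the number of tax returns, as in the
    text. ZIP codes with zero returns are left out to avoid dividing by zero.
    """
    ratios = {}
    for zip_code, n_returns in returns_by_zip.items():
        if n_returns > 0:
            ratios[zip_code] = loans_by_zip.get(zip_code, 0) / n_returns
    return ratios


# Made-up counts for three placeholder ZIP codes
loans_by_zip = {"A": 50, "B": 5}
returns_by_zip = {"A": 1000, "B": 20, "C": 0}

ratios = loan_ratio(loans_by_zip, returns_by_zip)
for zip_code, ratio in sorted(ratios.items()):
    print(zip_code, round(ratio, 3))
```

In this toy example the ZIP code with far fewer loans ends up with the higher ratio, which is the kind of contrast that raw loan counts hide.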

Loan Ratio (loans/household) -- opens in a new tab

This calculation turns out to be sensitive to some arbitrary factors. The user can investigate sensitivities by adjusting the variable parameters in the interactive graphic. In the high-sensitivity case -- ZIP codes with relatively few households -- small adjustments create large changes in the map. A relatively consistent pattern emerges in the low-sensitivity case.
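The sketch below illustrates why small ZIP codes are so sensitive. It assumes, purely for illustration, that the adjustable parameters are a minimum household count and a loan-ratio cutoff; the actual parameters in the interactive graphic aren't listed here, and all names and numbers are hypothetical.

```python
def flag_high_ratio(loans, households, min_households, ratio_cutoff):
    """Return the ZIP codes whose loan ratio meets the cutoff.

    ZIP codes with fewer than min_households households are ignored.
    Both parameters are assumptions made for this sketch.
    """
    flagged = set()
    for zip_code, n_households in households.items():
        if n_households <= 0 or n_households < min_households:
            continue
        if loans.get(zip_code, 0) / n_households >= ratio_cutoff:
            flagged.add(zip_code)
    return flagged


# Made-up data: one tiny ZIP code, one mid-sized, one large
households = {"small": 10, "mid": 400, "large": 5000}
loans = {"small": 3, "mid": 20, "large": 600}

# High-sensitivity case: tiny ZIP codes are allowed in
high_sensitivity = flag_high_ratio(loans, households, 1, 0.1)

# Low-sensitivity case: require at least 100 households
low_sensitivity = flag_high_ratio(loans, households, 100, 0.1)

# Effect of a single extra loan on each ZIP code's ratio
change_small = (loans["small"] + 1) / households["small"] - loans["small"] / households["small"]
change_large = (loans["large"] + 1) / households["large"] - loans["large"] / households["large"]

print(sorted(high_sensitivity), sorted(low_sensitivity))
print(change_small, change_large)
```

With few households in the denominator, a single loan moves the ratio by a large amount, so the set of flagged ZIP codes shifts with small parameter changes. Raising the household minimum removes those cases and leaves the more stable pattern.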

The interactive graphic quickly filters through tens of thousands of ZIP codes. And the low-sensitivity case flags areas with a reliably high loan ratio. Each one has an interesting socioeconomic rationale that's easy to infer by googling its ZIP code (obtained by running your mouse or finger over the graphic). Some are associated with oil boomtowns that could be susceptible to high foreclosure rates if oil prices drop too low. Others are associated with new retirement communities, or coastal resort communities that are undergoing rapid development. Predictive capabilities would continue to emerge by extending this type of spatial analysis over multiple years.