Crime in Context

Organisation: The Marshall Project (United States)

Publication Date: 04/10/2017

Size of team/newsroom: small


We started with a seemingly simple question: “Is crime in America rising or falling?” The answer is not nearly as clear as politicians would have us believe, because of how the FBI handles crime data from the country’s more than 18,000 police agencies. The system relies on voluntary reporting, and as a result local reports are frequently inconsistent. The problem is compounded by the fact that the bureau can take more than a year to crunch the numbers, so the national data lags far behind the current state of crime.

We recognized that in order to truly understand the picture of violent crime in America today (homicide, robbery, rape and aggravated assault) we had to understand historical patterns as well. The Marshall Project collected decades of data and employed sophisticated statistical analysis techniques. Rather than relying on partial-year data and comparing one year to the previous, we analyzed more than 40 years of violent crime data from 68 of the nation’s largest police jurisdictions. We used a statistical method called LOESS (local) regression to smooth often erratic year-to-year crime figures and to weight the resulting averages over a 10-year stretch. This technique allowed us to see long-term trends and crime patterns.

We then looked at how crime has evolved in similar ways in various cities. To do that, we used Ward’s algorithm to perform a hierarchical cluster analysis, grouping cities with similar crime trends together.

Our analysis of the years 1975 through 2015 found that violent crime in these jurisdictions rose 2.2 percent last year, while nationally violent crime rose 3 percent. Through the LOESS regression and its weighted averages, we could see that, despite a recent slight uptick in violence, crime remained very close to its lowest point since the 1970s.
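The smoothing idea above can be sketched briefly. The project did this in R; the closest Python equivalent is `lowess` from statsmodels. The yearly rates below are invented for illustration, not the project's data:

```python
# Sketch of LOESS smoothing over erratic yearly crime rates.
# The numbers here are synthetic; the project used R's loess() on real UCR data.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

years = np.arange(1975, 2016)
rng = np.random.default_rng(0)
# A hump-shaped long-term trend (crime cresting around 1991) plus yearly noise
trend = 600 + 400 * np.exp(-((years - 1991) ** 2) / 150)
rates = trend + rng.normal(0, 40, size=years.size)

# frac sets the local window: roughly 10 years out of 41 observations
smoothed = lowess(rates, years, frac=10 / len(years), return_sorted=True)
# smoothed[:, 0] holds the years, smoothed[:, 1] the locally weighted averages
```

The smoothed series changes far less from year to year than the raw one, which is exactly what makes long-term trends visible.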
In each of the dozens of cities we examined, we could see how crime had crested in the 1980s and 1990s and then fallen precipitously in the new millennium. The resulting story combined sharp analysis with engaging visualizations to demonstrate how crime data can easily be manipulated to make a cherry-picked point. If you want to make the case that crime is rising dramatically, pick a small sample of cities and look at a single category for a short period of time. But when you look at the data as criminologists do, and as we did, you can see that any rhetoric claiming the country is gripped by a crime wave is largely exaggerated.

Finally, we wanted to help readers see and understand the trends in the crime data we had collected. To that end, we created an interactive chart that allowed readers to explore any of the four violent crime categories, by whole numbers or by rates, in any of our sample cities for any time period.

What makes this project innovative? What was its impact?

Our project was innovative because we overcame many difficulties associated with the data we were analyzing. To start, the historic UCR data was spread across 40 different data files, most of them hundreds of columns wide and fixed-width (not easily delimited by comma or tab), in roughly seven different formats. Tom Meagher created an R script to download and compile the data for each of those formats and then concatenate them into one big database of historic reports, from which we could slice out just the jurisdictions we were focusing on in our analysis.

For every department, we calculated the rate of crime in each category, and for all violent crime, per 100,000 residents in the jurisdiction, based on the FBI’s estimated population for that year. For the few departments that did not report to the FBI for 2015, we used the 2014 estimated population. We also calculated a weighted, 10-year average for our crime figures using a LOESS regression, to smooth out year-to-year hiccups while giving more emphasis to recent years. We then conducted a hierarchical cluster analysis to group cities with similar crime trends together, using Ward’s algorithm (“ward.D”) as implemented in the statistical analysis language R.

Crime in Context’s use of statistical analysis, computer-assisted reporting and data visualizations integrating 40 years of crime data gave our readers a unique and much-needed true picture of crime trends in America, which had been sorely lacking from the national conversation to that point.
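The clustering step can be sketched as follows. The project used `hclust` with "ward.D" in R; SciPy's `linkage` offers a "ward" method (which corresponds to R's "ward.D2" rather than "ward.D", so results can differ slightly). The city names and rate series below are invented for illustration:

```python
# Sketch of grouping cities by the shape of their crime trajectories
# with Ward hierarchical clustering. All data here is hypothetical.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

cities = ["City A", "City B", "City C", "City D"]
# Rows: one smoothed violent-crime-rate series per city (per 100,000)
series = np.array([
    [800, 900, 700, 400],   # rose, then fell sharply
    [820, 910, 690, 390],   # similar trajectory to City A
    [300, 320, 340, 360],   # slow, steady rise
    [310, 330, 350, 370],   # similar trajectory to City C
], dtype=float)

Z = linkage(series, method="ward")               # Ward linkage on Euclidean distances
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
# Cities with similar trajectories receive the same cluster label
```

Cutting the dendrogram at two clusters puts the two "rose then fell" cities in one group and the two "steady rise" cities in the other.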

Technologies used for this project:

Most of the data compiling, cleaning and analysis was done using R. We employed Python’s pandas library for a small bit of munging, translating the SPSS data layouts provided in some years by the NACJD for the fixed-width files into an array that R could use. We used R’s ggplot2 library to create all of the static charts and the D3 JavaScript library to create the interactive charts.
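The fixed-width parsing step can be sketched with pandas. The column layout below is a made-up miniature; the real offsets came from the NACJD's SPSS layout files:

```python
# Sketch of parsing a fixed-width UCR-style file with pandas.read_fwf.
# The column widths, names and rows are hypothetical stand-ins for the
# hundreds-of-columns layouts described in the SPSS files.
import io
import pandas as pd

widths = [4, 7, 6]                    # field widths derived from a layout file
names = ["year", "agency", "murder"]

raw = io.StringIO(
    "1975NYPD      123\n"
    "1976NYPD      130\n"
)
df = pd.read_fwf(raw, widths=widths, names=names)
```

With the layout translated into widths like these, each yearly file can be read into a uniform table and concatenated with the others.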

