Portfolio: Helena Bengtsson

Organisation: The Guardian (United Kingdom)

Publication Date: 04/12/2016


Size of team/newsroom: large


Helena Bengtsson is a data journalist and editor of the Guardian’s Data team. Her work combines journalistic prowess with an eye attuned to the interpretation of data-led stories, treating data analysis as a tool for finding stories rather than as the story itself.

What makes this project innovative? What was its impact?

For ‘The tycoons and world leaders who built secret UK property empires’, Bengtsson partnered with David Pegg and Holly Watt to investigate a structured registry of about 200,000 companies. The Guardian compared this list of companies with data from the Land Registry covering all properties owned by companies outside the UK. That database contained about 90,000 properties owned by about 30,000 companies. Given the inconsistent quality of the records, she used a similarity match to join the two lists, querying the database for companies whose names were 90% similar. Reporters used the resulting list to mine the unstructured data and other sources, find out more about the companies and expose the people behind them, a key component of the overarching ‘Panama Papers’ investigation.

For the ‘Unaffordable country’ map, Bengtsson, along with Will Franklin and Apple Chan-Fardel, carefully cleaned, analysed and refined the Price Paid Data (PPD) database from the Land Registry to collect 19.6 million records covering 20 years’ worth of addresses, types of home (including age of residence) and prices.

For the story ‘Most UK police forces have disproportionate number of white officers’, Bengtsson and Kevin Rawlinson acquired data via FOI from each police force, breaking down recruitment by application success and ethnicity. They established the ethnic breakdown of the population by mapping 2011 census data to the corresponding police jurisdictions. By comparing this against the ethnic breakdown of applicants to each force, they were able to identify which forces did not represent the makeup of their district at the application stage, and which forces under-recruited ethnic minorities.

Bengtsson’s background in computer science and decades of experience make her one of the best data journalists in the world and an invaluable data journalism coach in the newsroom.
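
The similarity match used to join the leaked company registry to the Land Registry owners can be illustrated with a minimal Python sketch. It is not the Guardian's actual query: the entry states only that a database was queried for names that were 90% similar, so the scoring method (the standard library's `difflib.SequenceMatcher`), the `normalise` step, the "at least 0.9" reading of the threshold and the sample company names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case a company name and replace punctuation with spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def fuzzy_join(left, right, threshold=0.9):
    """Pair each name in `left` with every name in `right` scoring >= threshold."""
    matches = []
    for a in left:
        for b in right:
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a, b, round(score, 3)))
    return matches


# Hypothetical sample names, invented for illustration only.
leaked_companies = ["Blue Harbour Holdings Ltd.", "Northwind Estates S.A."]
land_registry_owners = [
    "BLUE HARBOUR HOLDINGS LTD",
    "Northwind Estate SA",
    "Acme Trading Inc",
]

for pair in fuzzy_join(leaked_companies, land_registry_owners):
    print(pair)
```

Comparing every name against every other name is only practical for small samples. With lists of roughly 200,000 and 30,000 companies, a real pipeline would probably narrow the candidates first, for example by comparing only names that share a first word, or would rely on a database's own similarity functions.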

Technologies used for this project:

Excel, OpenOffice, SQL, and myriad other data collection and analysis tools.