Organisation: Bayerischer Rundfunk/BR Data (Germany)

Publication Date: 10/27/2016


Thousands of publicly owned companies run trains and provide financial or infrastructure services all over Europe. Yet information about these companies and their network of public-private partnerships is often hidden in PDFs published by government bodies, city councils or other public sources, which makes it difficult for investigative journalists to search for them in a single place. At the same time, there are well-known cases of state-owned or partially state-owned companies being involved in scandals and shady business.

State-O aims to shed light on state-owned enterprises by offering information in a simple searchable database. The goal is to build a database of state-owned enterprises all over Europe, to be used mainly by investigative journalists. In its first version, State-O uses information on companies controlled directly or indirectly by the German federal government, published by the Federal Ministry of Finance in PDF format. Companies directly owned by the Austrian federal government are also part of the dataset scraped so far. Sources from other European national governments, regional governments and city councils will be added in the future. Planned features include CSV export of the data, network visualization of the data, uploading your own data, and building a community around State-O to gather data about public companies.

Technologies used for this project:

The dataset is currently imported from a CSV file which contains an entry for each state and each enterprise in the universe of state-owned enterprises. In the case of Germany, the tables in a PDF from the Ministry of Finance were extracted with Tabula and split into direct and indirect ownerships. Poppler/pdftohtml unfortunately did not give better results. Since parts of the PDF are messy, a Python script merges entries that were split over several lines in the original document. Establishing a final version of the dataset therefore still requires some data tidying and cleaning. An R script sketches a way to merge direct and indirect ownerships. The database and website (so far available only on a local machine) run on Node.js, Express.js, awesomplete.js and MongoDB.
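The line-merging step could look roughly like the minimal Python sketch below. It is not the project's own script. The column layout (company name in the first column, ownership share in the second), the function name `merge_split_rows` and the sample rows are all hypothetical. The rule that a row with an empty share cell continues the previous entry is an assumed heuristic for how Tabula output of a wrapped table line might look.

```python
import csv
import io

# Hypothetical layout of a Tabula CSV export: column 0 holds the company
# name, column 1 the ownership share. Adjust KEY_COL for the real file.
KEY_COL = 1


def merge_split_rows(rows, key_col=KEY_COL):
    """Merge rows that were split over several lines back into one record.

    Assumed heuristic (not the project's actual rule): a row whose key
    column is empty continues the previous record, so its non-empty cells
    are appended to the matching cells of that record.
    """
    merged = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        # Pad short rows so every record has at least key_col + 1 cells.
        cells += [""] * (key_col + 1 - len(cells))
        if merged and not cells[key_col]:
            previous = merged[-1]
            previous += [""] * (len(cells) - len(previous))
            for i, cell in enumerate(cells):
                if cell:
                    previous[i] = f"{previous[i]} {cell}".strip()
        else:
            merged.append(cells)
    return merged


# Hypothetical sample: the second company name wraps onto a second line.
SAMPLE = """name,share_percent
Example Rail AG,100
Example Infrastructure,51
Services GmbH,
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # keep the header row out of the merge
records = merge_split_rows(reader)
for name, share in records:
    print(f"{name}: {share} %")
```

This heuristic assumes the share appears on the first line of a wrapped entry. If the extracted table put the share on the last line instead, the rule would have to attach the preceding name fragments to the following row, which is one reason manual tidying is still needed.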