Organisation: (Cuba)

Publication Date: 04/08/2017

Size of team/newsroom: small

Description: The project is the first and only site that does data journalism in Cuba. Its main objective is to develop and encourage transparency and free access to data in Cuba, and the use of that data for the good of society. To that end, and with Cuba as the protagonist of its work, it has covered different facets of Cuban society from the perspective of data journalism. We have published pieces on privacy, the way Cubans are named, baseball, hurricanes, gender issues, and other topics. In these articles, graphs, and visualizations we identify data sources and stories that other outlets can pick up. The team is very small: it currently consists of four people, three journalists and a data analyst who also handles all the technical work. All members of the project work voluntarily and receive no compensation for their research. We have also received contributions from people who want to work with us and share our approach. The project is also a space for continuous learning and experimentation, which is why it shares its experience in venues such as the Institute of Journalism and the Faculty of Communication of the University of Havana. There we share our methods, try to incorporate new ones, and work on building new stories collectively. Another challenge facing our team is access to data. In Cuba there is no culture of open data, nor legislation that protects access to information. That is why another contribution of the project is identifying databases or data repositories that others can use, or building databases from information scattered across different places. Along these lines, it also aims to promote access to data as a public policy in the country.

What makes this project innovative? What was its impact? The project is the only outlet in Cuba committed to data journalism. It is also one of the few that consistently use interactive graphics to tell stories. Its work also combines more traditional genres, publishing either a single piece or a set of articles that complement each other as part of a common story or topic. Because we have few resources for technological infrastructure, we decided to use GitHub as the project's main infrastructure, and we publish our stories through version control with git. In this way we keep the full development history of the project, and all the source code we have developed is available to the community so that our ideas and developments can be reused elsewhere. Thus, in just over six months, the project has told stories that were read by many people, drawn the attention of various Cuban media to the use of data in storytelling, earned the trust of the country's journalism schools, which rely on our team to teach our approach to journalism professionals and students, and gained a distinctive identity and prestige for the way we tell and visualize stories.

Technologies used for this project: The project bases its infrastructure on GitHub, so it keeps the full history of its development and the code is open to the community. Publishing a story therefore means doing a commit and push to our GitHub repository. Our project has used databases and data repositories in different formats, such as SQLite, Excel, CSV, and PDF. For data analysis we use Python as our main programming language, along with libraries that help process the data, such as NLTK for natural language processing, NetworkX for complex network analysis, and Matplotlib for plotting, among others. JSON is the format we chose for our data, and we use it in most of the interactive graphs we publish. We also use various Linux tools: bash for some scripting to extract images from PDFs, html2text and pdftotext to extract text from HTML pages and PDF documents, and wget and Scrapy to scrape HTML pages, extract information, and build our databases. We always code our site from scratch (mainly because it gives us the freedom to change anything we want for any article), using HTML5, CSS3 (Bootstrap), and JavaScript. The main JavaScript libraries we use are jQuery, d3.js, c3.js, d3plus, list.js, TimelineJS, and jVectorMap.
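As a rough illustration of the CSV-to-JSON step described above, the following is a minimal sketch, not the team's actual code. It assumes a hypothetical CSV with a `name` column, counts how often each name appears, and serializes the result as JSON records of the kind a d3.js or c3.js chart could load. The sample data, the column names, and the functions `count_names` and `to_chart_json` are all invented for this example.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample standing in for a real CSV source;
# the names and columns are invented for illustration.
SAMPLE_CSV = """name,province
Yusimi,La Habana
Jose,Holguin
Yusimi,Camaguey
Maria,La Habana
Jose,La Habana
Yusimi,Holguin
"""


def count_names(csv_text):
    """Count how often each value in the 'name' column appears."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["name"].strip() for row in reader)


def to_chart_json(counts, top=10):
    """Serialize the most frequent names as a JSON list of records."""
    records = [{"name": name, "count": n} for name, n in counts.most_common(top)]
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Print the JSON that a front-end chart could consume.
    print(to_chart_json(count_names(SAMPLE_CSV)))
```

Under this setup the JSON output could be saved as a file, committed, and pushed alongside the article that visualizes it, which matches the publish-by-commit workflow described above.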