Organisation: GauchaZH / Zero Hora (Brazil)

Publication Date: 04/08/2018


BotStalker offers tools for journalists and researchers to understand how bots behave on Twitter during the 2018 election campaign in Brazil. It will monitor in real time the behavior of the bots associated with each candidate and trigger alerts for viral content that has the potential to be fake news. These alerts will enable newsrooms to fact-check rumors and misinformation quickly, minimizing their impact. The dashboard will show in real time the number of posts, the most discussed topics, the most shared links and the most retweeted posts by the bot armies of each political group. It will also allow users to create reports for a chosen period, built from daily data, refinement of the information collected and measurement of the impact and influence of robots on the virtual campaign. BotStalker will export all the data in .csv format so that the target audience can carry out other cross-analyses and studies of the bots.

Technologies used for this project:

Tool Feeder: We will build and train machine learning models able to predict whether a tweet is fake or true. Potential learning algorithms: recurrent neural network (LSTM), boosted decision tree, naive Bayes. Python (TensorFlow) and R are the recommended technologies.

Monitoring Tool: It will be built in two parts. First part: a job fired every 10 seconds that accesses the Twitter API and performs an analysis. The data will persist in a non-relational database (MongoDB or Elasticsearch). Second part: an API that reads the data from the non-relational database and delivers it in JSON format to the visualization panel.

Data analysis:
- List of current trending topics: gensim library and Latent Semantic Indexing (LSI) model.
- Alerts for viral tweets and viral links: the fake news detection model trained in the "Tool Feeder", combined with an anomaly detector (built with a Gaussian curve).
- Links, domains, absolute numbers and computation of the activity level of each candidate's bot armies: implemented with the standard Python library.

Display panel: Built with AngularJS and the D3 library. It queries the API built in the "Monitoring Tool".

Report:
- Creation of armies: creation of clusters (K-Means) based on the most important terms of each profile. For this, a TF-IDF (term frequency / inverse document frequency) representation can be computed, from which the most relevant terms will be extracted and to which the clustering algorithm will be applied. Python and scikit-learn are suitable for implementation.
- List of trending topics and association of topics with armies: we will use the same module as the monitoring tool.
- Processing for extracting metrics (the most shared links, domains and absolute numbers): will be built with the standard Python library using the Mongo database.
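As an illustration only, two of the pieces described above (counting the most shared domains with the standard Python library, and a Gaussian anomaly check for viral activity) could be sketched as follows. The function names, the sample data and the threshold of 3 standard deviations are assumptions made for this sketch and do not come from the project. The sketch also leaves out the fake news model that the project plans to combine with the anomaly detector.

```python
from collections import Counter
from statistics import mean, stdev
from urllib.parse import urlparse


def top_domains(links, n=3):
    """Count how often each domain appears in a list of shared links."""
    domains = [urlparse(link).netloc.lower() for link in links]
    return Counter(d for d in domains if d).most_common(n)


def is_viral(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` standard
    deviations above the mean of `history`, treating past activity as
    roughly Gaussian. The threshold value is an assumption."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase stands out
    return (current - mu) / sigma > threshold


# Hypothetical sample data, for illustration only.
links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://news.example.org/story",
]
retweets_per_window = [12, 15, 11, 14, 13, 12, 16]

print(top_domains(links))
print(is_viral(retweets_per_window, 14))   # ordinary activity
print(is_viral(retweets_per_window, 90))   # sudden spike
```

In a setup like the one described, `history` would presumably hold the retweet counts from earlier monitoring windows, and a tweet or link would raise an alert only when both this check and the trained classifier flag it.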

