Key topics to predict conflict: information and civil war

What do newspaper headlines say right before a war breaks out? What are the words that define a conflict? Hannes Mueller, researcher at the Institute for Economic Analysis (CSIC) in Barcelona, and Christopher Rauh, currently at the University of Cambridge, set out to answer these questions in a new project to predict conflict. They have developed an algorithm that compiles information from the most important newspapers and evaluates which topics recur right before a conflict or a civil war begins.

The algorithm collects news from the main newspapers to identify which issues recur before a conflict breaks out. Image: PixaBay.

The team uses an algorithm based on “unsupervised learning”: the researchers provide the algorithm with data (in this study, press articles) and it looks for patterns automatically. In this project, those patterns are topics that summarise the text, based on the words the system finds and analyses. The algorithm finds patterns associated with themes such as the economy, politics, war or conflict, although “sometimes the themes are less clear”, according to Mueller. All of these patterns (or topics) are then used to make predictions.
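The article does not say which unsupervised method the team uses. Purely as an illustration, the sketch below uses latent Dirichlet allocation (LDA), a widely used topic model, fitted with a collapsed Gibbs sampler in plain Python. It takes each article as a list of words and returns, for every article, the share of each topic, which is the kind of summary described above. The function names `lda_gibbs` and `top_words`, the toy articles and the hyperparameters (`alpha`, `beta`, number of topics) are hypothetical choices for this example, not the researchers' settings.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative LDA topic model fitted by collapsed Gibbs sampling.

    docs: list of articles, each a list of word tokens.
    Returns (doc_topic, topic_words): per-article topic shares (each row sums
    to 1) and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]               # article -> topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic -> word counts
    nk = [0] * n_topics                                # tokens assigned to each topic
    z = []                                             # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token from the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...then resample its topic from the conditional distribution.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, nkw


def top_words(topic_words, n=3):
    """Most frequent words per topic: a rough label for each 'theme'."""
    return [sorted(c, key=c.get, reverse=True)[:n] for c in topic_words]


if __name__ == "__main__":
    # Hypothetical toy "articles", already tokenised.
    articles = [
        "economy inflation bank growth economy".split(),
        "court judge trial justice court".split(),
        "rebels army attack violence rebels".split(),
        "bank economy growth inflation".split(),
        "army rebels violence attack".split(),
    ]
    shares, topics = lda_gibbs(articles, n_topics=3)
    print(top_words(topics))
```

On a real corpus the topics would be far noisier, which is consistent with Mueller's remark that some themes are less clear.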

As Mueller points out, while certain topics become more prominent right before a civil war breaks out, others become less common. For instance, the study found that a drop in news articles about the economy or the justice system may signal a higher risk of conflict. However, this relationship is “a correlation, and not the cause”, Mueller cautions. In other words, these indicators are useful for making predictions, but they do not reveal what causes a conflict.

Project innovation

Machine learning algorithms and prediction tools are both widely used in the social sciences. What is original about the IAE-CSIC project is that it combines the two to predict conflict: the research group first uses unsupervised learning to find the patterns (topics) in the text, and then uses supervised learning to make the prediction.
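The second, supervised stage can be sketched as a classifier that maps each country-year's topic shares to a probability of conflict. The article does not name the supervised model the team uses; logistic regression trained by plain gradient descent is used below only as a simple stand-in, and `fit_logistic`, `predict_proba` and the toy data are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of conflict for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss.

    X: list of feature vectors (e.g. topic shares for one country-year).
    y: list of 0/1 labels (1 = conflict).
    """
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical topic shares per country-year: [economy, justice, violence]
    X = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
    y = [0, 0, 1, 1]
    w, b = fit_logistic(X, y)
    print(predict_proba(w, b, [0.15, 0.1, 0.75]))
```

Because the weights are learned from data, a model like this can pick up patterns such as a fall in economy or justice coverage preceding conflict, while, as Mueller stresses, saying nothing about causes.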

The research uses machine learning technologies, both supervised and unsupervised, to make the prediction

The researchers compile the news for each year and evaluate the tool by checking its forecasts against what happens the following year: for example, data from 2001 is used to predict conflict in 2002. This test confirms that the algorithm can predict conflict.
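The year-ahead test described above amounts to a walk-forward evaluation: features built from one year's news are paired with whether conflict occurs the next year, and the model is only ever trained on outcomes that would already have been known at forecast time. The sketch below shows that bookkeeping under those assumptions; the panel layout and the function names `year_ahead_rows` and `walk_forward_splits` are hypothetical, not taken from the project.

```python
def year_ahead_rows(panel):
    """Pair each country-year's features with the NEXT year's outcome.

    panel: dict mapping (country, year) -> (features, conflict_flag).
    Returns tuples (country, year, features_in_year, conflict_next_year).
    """
    rows = []
    for (country, year), (features, _) in sorted(panel.items()):
        following = panel.get((country, year + 1))
        if following is not None:  # the last observed year has no outcome yet
            rows.append((country, year, features, following[1]))
    return rows


def walk_forward_splits(rows, test_years):
    """For each test year t, train only on outcomes already observed by t."""
    for t in test_years:
        train = [r for r in rows if r[1] + 1 <= t]  # outcome year is t or earlier
        test = [r for r in rows if r[1] == t]       # features from t, predict t + 1
        yield t, train, test


if __name__ == "__main__":
    # Hypothetical one-country panel with placeholder features.
    panel = {("A", y): ([0.1 * (y - 2000)], int(y == 2003)) for y in range(2000, 2004)}
    rows = year_ahead_rows(panel)
    for t, train, test in walk_forward_splits(rows, [2001, 2002]):
        print(t, len(train), len(test))
```

Keeping the training window strictly before the forecast year stops the model from seeing the outcome it is asked to predict, which is what makes this kind of test informative.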

According to Mueller, the truly arduous part of the work was downloading, click after click, the 3.8 million articles the algorithm evaluated. A few research assistants helped with this tedious task, downloading the articles and checking their content to make sure it was processed correctly. Of the 3.8 million articles, 3.1 million come from BBC Monitoring, a service widely used during the Cold War to obtain information from local media across the globe.

Risk of violence in Spain according to the data. Image: IAE-CSIC.

The sample covers news from the last two decades, a period in which the impact of “fake news”, now on the rise, was more limited. Nevertheless, the results from the last few years indicate that the algorithm has not been affected by fake news, since “propaganda has always been a part of the news”, says the researcher.

Factors that determine armed conflict

Hannes Mueller identifies political exclusion as one of the most important factors in the outbreak of a conflict. “Always, always. You can see it in every conflict: there are people who feel excluded from the political process”, Mueller explains. He also points to economic and climatic causes (such as the drought in Syria) as factors that might lead to conflict. Previous research projects at IAE-CSIC also identify ethnic polarisation and economic inequality as elements that might affect these processes.

Political exclusion is one of the most important factors in the outbreak of a conflict

The aim of the study, however, is not to examine the factors that might influence or determine an armed conflict. “That is a different part of the research”, says Mueller. Although his research does include this focus, “it is unrelated to the prediction”.

The tendency to repeat: the conflict trap

The researcher explains that some countries, such as Afghanistan, Somalia or the Democratic Republic of the Congo, fall into what is known as a “conflict trap”: they repeatedly face civil wars or internal and external tensions. Nevertheless, he notes, this trap need not be permanent. “Leaving this conflict trap is possible, but it takes some time: a decade, more or less. If peace lasts a decade, [it is considered that the country] has overcome this phase of conflict”, he points out.
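Mueller's rule of thumb can be written as a simple check on a country's conflict history. This is a minimal sketch assuming the input is a list of years in which the country experienced conflict and that “a decade” means exactly ten years of peace; the function name `in_conflict_trap` and the `peace_window` parameter are hypothetical.

```python
def in_conflict_trap(conflict_years, current_year, peace_window=10):
    """Return True if the country is still considered to be in the conflict trap.

    conflict_years: years in which the country experienced conflict.
    peace_window: years of peace needed to leave the trap. Mueller speaks of
    "a decade, more or less"; the exact cut-off of 10 is an assumption.
    """
    past = [y for y in conflict_years if y <= current_year]
    if not past:
        return False  # no conflict history, so no trap
    return current_year - max(past) < peace_window


if __name__ == "__main__":
    print(in_conflict_trap([1999, 2004], 2010))  # True: only six years of peace
    print(in_conflict_trap([1999, 2004], 2014))  # False: ten years of peace
```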

The real challenge, however, is not to predict conflict in countries caught in this trap, but in countries where conflict is uncommon and the initial probability of it breaking out is very low. In these cases, one in ten of the algorithm's conflict predictions turns out to be a real conflict.

A country is considered to have overcome the conflict trap once peace has lasted for a decade

Getting one in ten predictions right might seem like a low rate, but predicting conflict is a difficult and complex task, and several factors complicate it. “The first factor is the complexity of societies”, Mueller explains: societies are made up of human actors, who add unpredictability to the study. Secondly, predictions and social systems influence each other. “If you make predictions in social systems, these systems might react to those predictions”, the researcher notes. Lastly, the amount of data is crucial for a good prediction: the larger the dataset, the better the algorithm learns and, consequently, the better the prediction.
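The “one in ten” figure is a precision: of all the cases the algorithm flags as likely conflicts, the fraction that actually become conflicts. Whether that is good depends on the base rate, that is, how often conflict breaks out in those countries anyway. The article does not give the base rate, so the 1% used below is a purely hypothetical number chosen for illustration, as are the function names `precision` and `lift`.

```python
def precision(true_alarms, false_alarms):
    """Share of conflict predictions that turn out to be real conflicts."""
    return true_alarms / (true_alarms + false_alarms)


def lift(prec, base_rate):
    """How many times more likely conflict is after an alarm than the base rate suggests."""
    return prec / base_rate


if __name__ == "__main__":
    p = precision(10, 90)  # one real conflict in every ten alarms
    print(p)               # 0.1
    # Hypothetical base rate: conflict breaks out in 1% of low-risk country-years.
    print(lift(p, 0.01))
```

Under that hypothetical base rate, an alarm would make conflict roughly ten times more likely than in an unflagged case, which is why a modest-looking hit rate can still be useful when the events being predicted are rare.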

Despite these challenges, the researcher asserts that in 2017 the algorithm provided data pointing to a risk of conflict in Yemen, a forecast that events have borne out. “In 2017 we had [data for] Yemen. Yemen was one [of the countries where we predicted conflict] and look at what is happening”, says Mueller.

Future applications of the model

Hannes Mueller, researcher on the project, at the Instituto de Análisis Económico. Image: Sabela Rey Cao.

Mueller hopes to establish a collaboration with the Banco de España to expand the project, which is still in its academic phase, and to develop a political risk index. This collaboration would expand the database and lead to more accurate predictions worldwide. “If we have time and resources, we are going to focus on Latin America”, the researcher explains.

Mueller says he would rather work with public bodies, such as the Banco de España (Spain's central bank), to prove that the algorithm works and to make the database public. This way, the team can avoid the private interests of investors who have taken an interest in the project, which could “worsen the problem”.

Paula Talero Álvarez and Sabela Rey Cao - Delegación del CSIC en Cataluña