Physica A – COVID-19 in Italy and Extreme Data Mining

Paolo Massimo Buscema1,2,*, Francesca Della Torre1, Marco Breda1, Giulia Massini1, Enzo Grossi1

1Semeion Research Center of Sciences of Communication, via Sersale, 117, 00128 Rome, Italy
2University of Colorado at Denver, Dept. Mathematical and Statistical Sciences, Denver, CO, USA
*Corresponding author. Email address:


In this article we show the potential of an evolutionary algorithm called Topological Weighted Centroid (TWC), which can extract new and relevant information from extremely small and information-poor datasets. In a world dominated by the concept of big (fat?) data, we want to show that it is possible, by necessity or by choice, to work profitably even on small data. Because of this property, the algorithm can yield important information even in the early stages of an epidemic, when the data are still too few to support conventional statistical analysis.

To test this claim, we addressed one of the most pressing issues of the moment: the COVID-19 epidemic. We selected the cases recorded in Italy, which appears to play a central role in this epidemic because of its high number of recorded infections. Using this innovative artificial intelligence algorithm, we tried to analyze the evolution of the phenomenon and to predict its next steps from a dataset that contained only the geospatial coordinates (longitude and latitude) of the first recorded cases.

We collected the coordinates of the places where at least one case of contagion had been officially diagnosed up to February 26th, 2020, and then analyzed: the outbreak point and the related heat map (TWC alpha); the probability distribution of the contagion on February 26th (TWC beta); the possible spread of the phenomenon in the immediate future and then in the step after that, the "future of the future" (TWC gamma and TWC theta); and how this transition occurred in terms of paths and mutual influence (Theta paths and Markov Machine). Finally, we drew up a heat map of the possible situation towards the end of the epidemic in terms of the infectiousness of the areas. The analyses with TWC confirm the assumptions made at the beginning.