Artificial intelligence to tackle global challenges

Many people involved in artificial intelligence and neural networks still remember what happened in 2012. That year, the ImageNet competition was won decisively by AlexNet, a deep neural network developed by Alex Krizhevsky, then a student at the University of Toronto, together with Ilya Sutskever, later a co-founder of OpenAI, under the supervision of Geoffrey Hinton, winner of the 2024 Nobel Prize in Physics.
“This will change everything,” Barbara Caputo – now head of the VandaL lab at Politecnico di Torino – told her colleagues at the Swiss Federal Institute of Technology in Lausanne (EPFL). “It was not easy to accept that the statistical approaches to machine learning I had worked on throughout my career would never measure up, but the fact was undeniable.”
AlexNet marked the beginning of a revolution in artificial intelligence, made possible by the availability of sufficiently powerful hardware, namely graphics processing units (GPUs), and of a vast number of freely available images, mostly shared by unwitting internet users on the social media platforms of the time.
This technology has since been applied in many fields: medical image analysis for diagnosing conditions such as skin and breast cancer, self-driving cars, and facial recognition. In some cases it has had a revolutionary impact; in others, its effects are still unfolding.
To learn to recognize dogs, a neural network needs a large collection of images: many pictures featuring dogs in various positions and settings, as well as images that contain no dogs at all. Building such a database is relatively straightforward, because the labeling can be done by non-experts, which keeps annotation costs low.
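To make this concrete, here is a minimal sketch of how such a labeled collection is typically loaded for training, assuming a hypothetical folder layout in which dog and non-dog pictures sit in separate directories (the paths and sizes are illustrative, not from the original work):

```python
# Minimal sketch of a binary image-classification dataset (hypothetical
# folder layout: data/dog/*.jpg and data/no_dog/*.jpg).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # bring every photo to a fixed size
    transforms.ToTensor(),          # convert to a tensor with values in [0, 1]
])

# ImageFolder derives the label (0 or 1) from the sub-folder name:
# exactly the kind of annotation that non-experts can provide.
dataset = datasets.ImageFolder("data", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for images, labels in loader:
    pass  # here the batches would be fed to a neural network
```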
The process is not as simple for satellite images used to determine crop types or the effects of a fire. And if you want to train a deep neural network to recognize the scene in front of a robot, things get even more complicated: a robot sees the world from a first-person perspective, while the images available online are almost always taken from a third-person perspective. The same applies to drones, which observe the surrounding space from varying heights above the ground and at varying angles.
Applications in these fields could significantly impact the future of humanity. Addressing climate change requires continuous monitoring of our planet, and satellites can provide this capability, provided we know how to interpret the images they collect. Robots that can work closely with humans and learn to perform a variety of tasks in real time could offer essential support to an increasingly elderly and ailing population. They could also replace humans in responding to natural disasters such as earthquakes, floods, and fires.
At the same time, if we do not want artificial intelligence to exacerbate some of the very problems it could help tackle, it must become energy-sustainable.
Many researchers at Politecnico di Torino are devoted to these challenges. Barbara Caputo and her group focus on vision for robotics and ‘frugal’ neural networks; Enrico Magli and his group work on artificial intelligence for satellite image processing.
Here are their stories.
OGR HPC4AI research infrastructure
Department of Control and Computer Engineering (DAUIN)
VandaL lab
Monitoring our planet’s health

For ten years, the Copernicus program has been observing Earth with artificial eyes. It does so through the Sentinel satellite missions, which collect various types of data on our planet, from vegetation and forests to the atmosphere and oceans. This data is essential for implementing measures to mitigate climate change, assessing its effects, and monitoring the pledges made by the countries involved in this challenge. It can be used to identify land use, the health of vegetation, the extent of the scars left by fires, and the time needed for them to heal. Copernicus can also play a role in agriculture, particularly in optimizing the use of resources.
Processing the data collected by satellite sensors as they orbit is essential for obtaining meaningful information. Images are among the most complex data to process, yet they are also the richest in information. Over the past decade, machine learning algorithms based on deep neural networks have become the primary tool in the field of satellite image processing.
This is precisely the focus of Professor Enrico Magli, who coordinates the Image Processing Lab at Politecnico di Torino. “I have worked extensively on data collected by the Sentinel satellites, across all their generations, and I am now collaborating with the European Space Agency (ESA) on the next generation of Sentinel.”
Magli and his colleagues have successfully tackled the challenge of merging low-resolution images to create a high-resolution version. The PROBA-V satellite mission, managed by the European Space Agency as part of the Copernicus program, collects multispectral images at frequencies that are particularly suitable for monitoring vegetation, forests, and inland water bodies.
“Applications in agriculture, for instance, require a resolution of at least two meters, while the sensors on board Sentinel-2 reach a maximum of ten meters, and those on PROBA-V 100 meters,” explains Magli.
In 2019, the European Space Agency launched a competition to find the best algorithm for taking a series of low-resolution images captured by the PROBA-V satellite during successive orbital passes over the same area and generating a version with three times the resolution. This means that for every pixel in the low-resolution image, there need to be nine pixels (three by three) in the high-resolution one.
The DeepSUM algorithm, developed by Magli's group, won the competition, and in 2021 the group published a further improved version called PIUNet.
“The innovative element in DeepSUM was the way we aligned the low-resolution images before combining them,” explains Magli. At each orbital pass, the satellite photographs the same portion of territory, but from a slightly different angle, and this introduces distortions. “The improvement we achieved with PIUNet was constraining the network architecture to yield results that do not depend on the order of the provided low-resolution images,” a property known as permutation invariance.
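PIUNet's actual architecture is described in the group's publications; purely as an illustration of the principle, the toy sketch below shows how fusing frames with a symmetric operation such as the mean makes the result independent of their order:

```python
# Toy illustration of permutation invariance (not PIUNet itself): fusing
# co-registered low-resolution frames with a symmetric operation (the mean)
# gives a result that cannot depend on the order of the frames.
import torch

def fuse(frames: torch.Tensor) -> torch.Tensor:
    # frames: (N, H, W) stack of low-resolution acquisitions of the same area
    return frames.mean(dim=0)  # the mean is invariant to any reordering of N

frames = torch.rand(9, 128, 128)
shuffled = frames[torch.randperm(9)]
assert torch.allclose(fuse(frames), fuse(shuffled))  # same output either way
```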
The Image Processing Lab has also worked on processing hyperspectral images collected by the EnMAP and PRISMA satellites.
While multispectral sensors collect images corresponding to around ten wavelengths of the electromagnetic spectrum, hyperspectral sensors collect hundreds of them.
“This allows us to perform more sophisticated tasks, such as overcoming the problem of cloud cover,” continues Magli. Clouds cover about 40% of images of land and 60% of images of the oceans. “Clouds can easily be confused with other things, such as ice or light reflected off very bright surfaces. Only with access to a very detailed frequency spectrum is it possible to tell whether they are actually clouds,” explains Magli. “Each material, depending on its chemical structure, leaves a different spectral signature that we can measure with hyperspectral sensors.”
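One standard way to exploit such signatures, shown here only as an illustration and not necessarily the method used by the lab, is the spectral angle mapper, which labels a pixel with the reference material whose spectrum it most closely matches:

```python
# Illustrative spectral-angle comparison between a pixel's spectrum and
# reference material signatures (the Spectral Angle Mapper, a standard
# technique; the signatures below are randomly generated stand-ins).
import numpy as np

def spectral_angle(pixel: np.ndarray, reference: np.ndarray) -> float:
    cos = pixel @ reference / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # smaller angle = closer match

bands = 200  # hundreds of bands, as with hyperspectral sensors like EnMAP/PRISMA
rng = np.random.default_rng(0)
references = {"cloud": rng.random(bands), "ice": rng.random(bands)}
pixel = references["cloud"] + 0.01 * rng.random(bands)  # a nearly cloud-like pixel

best = min(references, key=lambda name: spectral_angle(pixel, references[name]))
print(best)  # -> "cloud": the pixel's signature is closest to the cloud reference
```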
“Currently, all processing occurs on the ground,” Magli states. “Given the size and complexity of the neural networks we utilize, it would be unfeasible to train and run them aboard satellites.”
The long-term goal of the collaboration between the Image Processing Lab and ESA for the next generation of Sentinel missions is to develop machine learning systems for on-board processing.
This represents the final frontier in this area of research, as it requires operating under stringent computing constraints: there is no access to GPUs, the graphics processing units crucial for training deep neural networks with large numbers of parameters, and energy consumption is also limited.


Low-resolution image obtained via the PROBA-V satellite (Valsesia and Magli 2021)
The same PROBA-V image reconstructed in high resolution by the DeepSUM algorithm (Valsesia and Magli 2021)
This is one of the challenges that Magli will address thanks to funding from the European Research Council (ERC), the European Union body that supports frontier research. The project that has received ERC support is called IntelliSwarm and aims to develop deep learning models for processing images collected by a swarm of satellites orbiting the Earth. “The models will be trained on Earth but executed in a distributed manner, with each satellite in the swarm doing its part.”
To achieve this result, Magli and his colleagues will have to start from scratch. First, they will need to build the dataset on which to train the models. “There are no public datasets of images of Earth collected by satellite swarms. ESA and NASA are only now beginning their first experiments,” he explains. An aircraft equipped with a hyperspectral camera and a LIDAR sensor, which measures the height of objects on the ground, will allow Magli's team to acquire a set of high-resolution images. The researchers will then develop algorithms that simulate the images a swarm of low-orbit satellites would collect of the same scenes seen from the aircraft, by lowering their spatial and spectral resolution. These images will be used to train deep learning models to recreate the original aircraft images and to model the scene in 3D using the LIDAR data. “The image dataset will be made public and available to the scientific community.”
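As a rough illustration of this kind of degradation pipeline (the actual simulation will model optics, noise, and spectral response far more carefully), one might blur a high-resolution band and resample it on a coarser grid:

```python
# Simplified sketch of deriving a low-resolution satellite view from one
# high-resolution airborne band: blur (approximating the optics) followed
# by resampling on a coarser pixel grid. A real simulation would also model
# spectral response, sensor noise, and viewing geometry.
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_low_res(high_res: np.ndarray, factor: int = 4) -> np.ndarray:
    blurred = gaussian_filter(high_res, sigma=factor / 2)  # optical blur stand-in
    return blurred[::factor, ::factor]                     # coarser sampling

high_res = np.random.rand(1024, 1024)  # stand-in for one airborne image band
low_res = simulate_low_res(high_res)
print(low_res.shape)  # (256, 256): sixteen times fewer pixels
```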
Once trained, the models will be “unpacked” so that each satellite in the swarm can execute its part using only its own images. An iterative process will ensure that the results are nearly identical to those achieved through centralized computation.
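The article does not detail the iterative scheme, but a classic building block for this kind of distributed agreement is consensus averaging, sketched below under the assumption that each satellite can exchange its current estimate with two neighbors in a ring:

```python
# Toy consensus averaging: each satellite repeatedly replaces its local
# estimate with the average of its own and its two ring neighbors' values,
# converging to the result a centralized computation would produce.
import numpy as np

rng = np.random.default_rng(1)
estimates = rng.random(5)        # one local estimate per satellite in the swarm
centralized = estimates.mean()   # what a single ground-station pass would give

for _ in range(100):             # iterative exchange between neighbors
    left, right = np.roll(estimates, 1), np.roll(estimates, -1)
    estimates = (estimates + left + right) / 3

print(np.allclose(estimates, centralized))  # True: nearly identical results
```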
However, there is a hidden challenge within the challenge. “We are developing a compact machine learning system where the weights of the network connections can only take on three discrete values, rather than representing real numbers. If we run these lightweight networks on graphics cards designed for traditional neural networks, we won't gain any time or energy efficiency. Therefore, we will need to create custom hardware,” explains Magli.
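Three discrete values suggest a ternary quantization scheme. A minimal sketch of one common recipe, assumed here and not necessarily the one the lab will adopt, maps each real-valued weight to one of {-w, 0, +w}:

```python
# Minimal sketch of ternary weight quantization: every real-valued weight is
# mapped to one of three values {-w, 0, +w}. The thresholding heuristic below
# is one common recipe, not necessarily the lab's.
import numpy as np

def ternarize(weights: np.ndarray) -> np.ndarray:
    threshold = 0.7 * np.abs(weights).mean()  # small weights are zeroed out
    mask = np.abs(weights) > threshold
    scale = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return np.sign(weights) * mask * scale    # values in {-scale, 0, +scale}

w = np.random.randn(4, 4)
print(np.unique(ternarize(w)))  # at most three distinct values remain
```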
This is the first step towards an approach that sees an ever-increasing part of the computing shifting from ground stations to satellites. “Today, we transfer all the images collected by satellites to ground stations, where we perform the necessary calculations to extract the information we need, but both transmission and computing are extremely expensive.”
Onboard execution will allow only useful information to be sent to the ground, such as the health status of a certain cultivated area. The long-term vision is to move the training phase to the satellite as well. “This would result in continuous learning, with the model being updated as the satellite collects new images. It's a bit like what happens with humans: they learn to recognize certain scenes but continue to look, thus updating their ability to interpret reality.”
Artificial intelligence for robotics

Magli is not the only researcher at Politecnico di Torino addressing the challenges posed by the size and computational demands of modern machine learning algorithms, particularly those that use deep neural networks. These challenges are significant not only for the usability of these systems but also for their energy consumption and the resulting environmental impact.
Data centers' electricity consumption per capita by region, baseline scenario, 2020-2030 (source: IEA)
Researchers at the VandaL laboratory, coordinated by Professor Barbara Caputo, have developed algorithms that generate the most computationally efficient neural network architecture for the specific hardware on which the network will be trained.
“When designing a machine learning algorithm, if you ask yourself ‘what hardware is this running on?’, you optimize the algorithm for the problem you want to solve, and the computer itself,” explains Caputo.
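A toy version of such hardware-aware selection, under the assumption that candidate architectures can be benchmarked directly on the target device, could look like this (the candidates, accuracies, and latency budget are made-up placeholders):

```python
# Toy hardware-aware selection: benchmark each candidate on the target
# device and keep the most accurate one that fits the latency budget.
# Candidates, accuracies, and the budget are made-up placeholders.
import time

def measure_latency(run, repeats: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(repeats):
        run()  # one forward pass of the candidate on this hardware
    return (time.perf_counter() - start) / repeats

candidates = {
    "small":  {"run": lambda: sum(range(10_000)),    "accuracy": 0.88},
    "medium": {"run": lambda: sum(range(100_000)),   "accuracy": 0.91},
    "large":  {"run": lambda: sum(range(1_000_000)), "accuracy": 0.93},
}
budget_seconds = 0.002  # what this particular device can afford per inference

feasible = {name: c for name, c in candidates.items()
            if measure_latency(c["run"]) <= budget_seconds}
best = max(feasible, key=lambda n: feasible[n]["accuracy"]) if feasible else None
print(best)  # the most accurate architecture that respects the hardware budget
```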
This work began with the RoboExNovo project funded by the European Research Council in 2015, which aimed to make web knowledge usable by robots.
Robotic vision is one of the areas in which VandaL researchers are most active.
“My initial training was in the theoretical physics of spin glasses, but I soon realized that evaluating a model purely on its mathematical plausibility was too restrictive for me,” explains Caputo. “I needed to work on mathematical models that could be immediately tested against data.” So, after a PhD in computer science at KTH in Stockholm, Caputo focused part of her research on robotic vision.
“It is undeniable that deep neural networks have made a much less significant impact on the field of robotics compared to their influence in artificial vision, also known as computer vision,” comments Caputo.
There are several reasons for this.
“The web is full of images, but they are almost always collected from a third-person perspective, not that of the robot,” she explains. Together with other members of the VandaL lab, Caputo has developed strategies to bridge this gap.
“We can imagine that something will change in the near future when images and videos recorded by wearable devices, such as GoPro cameras or augmented reality glasses, are shared en masse on the web. But it's hard to say before we see it happen,” concludes Caputo.
Caputo is also head of the ELLIS unit at Politecnico di Torino. ELLIS is a network of European researchers working in the field of machine learning and intelligent systems, organized into 15 thematic areas. It was founded in 2018, inspired by organizations such as the Canadian Institute for Advanced Research (CIFAR) and the European Molecular Biology Laboratory (EMBL). “Politecnico di Torino was one of the very first Italian ELLIS units, and we are very proud of this because it places us on the European map as a center of excellence and has allowed us to recruit young researchers from all over the world,” says Caputo.
Researchers at VandaL are actively engaged in various areas within the field of computer vision.
Carlo Masone develops algorithms that analyze images of places, focusing on visual place recognition for images captured from the ground or from satellites. The key question is: given an image, can we determine where in the world it was taken, using a reference database?
Answering this question is useful for applications in both the civil and security sectors. Together with his team, Masone has addressed two issues that arise in tasks of this nature.
On one hand, there is the challenge of quickly accessing and searching a vast database of images that depict locations around the world, captured from various angles and under different weather and lighting conditions. How much storage space is needed for these images, and how quickly can we access this archive? To address this issue, Masone and his team have developed a method for representing information in a highly compact and efficient manner, allowing for swift access and quick comparisons within this extensive database.
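To make the trade-off concrete, the sketch below, an assumed setup rather than Masone's actual method, reduces each place image to a compact vector and matches a query by nearest-neighbor search over the whole database in a single pass:

```python
# Minimal sketch of place recognition with compact descriptors: each image is
# reduced to a short unit-length vector, and a query is matched against the
# whole database with one matrix product. Random vectors stand in for the
# descriptors a trained network would produce.
import numpy as np

rng = np.random.default_rng(42)
dim = 128  # assumed descriptor size; compactness is what makes search fast
database = rng.random((100_000, dim)).astype(np.float32)
database /= np.linalg.norm(database, axis=1, keepdims=True)

query = database[12345] + 0.01 * rng.random(dim).astype(np.float32)
query /= np.linalg.norm(query)  # same place, photographed slightly differently

scores = database @ query        # cosine similarity to every stored place
print(int(scores.argmax()))      # -> 12345: the matching database entry
```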
Another issue is the scalability of geolocation algorithms. Masone has created an algorithm that can, in principle, process images taken anywhere in the world. He developed the model using data from Italy and demonstrated its scalability: given an image from anywhere on the globe, the algorithm can determine where it was taken in a reasonable amount of time.
Tatiana Tommasi, another researcher at VandaL, is internationally recognized for her work on learning in extreme conditions, such as from scarce or uncurated data. One of her best-known contributions is puzzle-based learning.
The core idea is to learn to recognize, say, an image of a dog by dividing it into many small pieces, like a puzzle, and then reassembling the complete image by fitting the pieces together. When we solve puzzles we look for local patterns, so by teaching an algorithm to solve puzzles we are teaching it to find those patterns and thus to interpret the image's content.
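A minimal sketch of the data side of this idea, illustrative only, splits an image into a grid of tiles and keeps the shuffling order as the label a network would be trained to predict:

```python
# Toy version of the puzzle pretext task's data side: split an image into a
# 3x3 grid of tiles, shuffle them, and keep the shuffling order as the label.
# A network trained to predict that order must learn local visual patterns.
import numpy as np

def make_puzzle(image: np.ndarray, grid: int = 3):
    h, w = image.shape[0] // grid, image.shape[1] // grid
    tiles = [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
             for i in range(grid) for j in range(grid)]
    order = np.random.permutation(len(tiles))  # the "label" to be predicted
    return [tiles[k] for k in order], order

image = np.random.rand(96, 96)  # stand-in for a photo of a dog
shuffled_tiles, permutation = make_puzzle(image)
print(permutation)  # e.g. [4 0 7 ...]: the target for the pretext network
```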
This approach addresses a very important issue in machine learning. The performance of machine learning systems depends heavily on the quality of the data used to train them. Image recognition has been successful because large, well-curated databases have become available. However, this work is expensive and not always feasible, and Tommasi's research shows that even extremely raw and uncurated data can be used for learning.
One intriguing aspect of Tommasi's work is its inspiration from cognitive psychology. When teaching children to sort objects by size, we often utilize an unrelated attribute: color. In cognitive psychology, this is referred to as an auxiliary task. This educational approach is effective because it leverages one of the first characteristics children can recognize — colors — and uses it to help them learn a new attribute: size.
Our challenge is to develop systems and models that are adaptable, sustainable, robust, reliable, and lightweight.
- Tatiana Tommasi -
Tatiana Tommasi, researcher