Data forms the backbone of DIDYMOS-XR’s work in creating and updating accurate digital twins. It matters most when deep learning algorithms are used for tasks such as object detection or segmentation. While these algorithms can efficiently automate tedious tasks, training and validating them requires a large amount of annotated data, which can be costly to acquire. Acquiring accurate data is therefore crucial to DIDYMOS-XR’s mission of developing robust 3D scene reconstruction methods, which is why Ficosa was appointed at the start of the project to gather various types of data for city-related use cases. This blog post presents reflections on the data acquisition and generation challenges tackled by DIDYMOS-XR.
Challenges in data acquisition

The DIDYMOS-XR algorithms used to reconstruct digital twins leverage various data modalities, such as images, point clouds and odometry. Acquiring such data requires a specific experimental setup. For the city use cases, Ficosa used a demo car equipped with various sensors, including a surround-view camera system, 360° LiDAR and dGPS (innovations presented in a previous article). Using this vehicle and the data processing pipeline developed during the project, we were able to conduct several recording campaigns to capture relevant real-world data for developing and validating algorithms.
However, this approach has various challenges and disadvantages.
- Cost: In addition to the cost of the recording car itself, organising and conducting recording campaigns takes time, depending on the size of the area to be covered and the number of shots required. Furthermore, the recorded data is ‘raw’ and may require post-processing to be usable. Annotating the data for AI model training is costly and time-consuming, even when using mitigation techniques.
- Lack of diversity: Data collected from a single setup or location may not capture the full range of scenarios and environments encountered in real applications, which could limit the generalisability of the algorithms. This can be mitigated by increasing the number of recording campaigns or setups, but that in turn increases the cost described in the previous point.
- Anonymisation: Using publicly recorded data requires further precautions to be taken to protect privacy and ensure compliance with GDPR. This requires additional processing steps and can limit the data’s usability for certain applications.
Synthetic data generation
To overcome some of the challenges associated with traditional data acquisition, the DIDYMOS-XR project takes advantage of synthetic data generation. Using rFpro, an automotive simulation platform, we simulated a city-like environment and a demo car with comparable sensors. We generated several datasets by varying the experimental conditions, such as the trajectory, weather conditions and the behaviour of other road users. This approach allows us to generate data in extreme conditions that would be difficult to obtain in the real world. Furthermore, generating data allows us to automatically compute all the necessary annotations (e.g. object bounding boxes or classes). Finally, synthetic data avoids the anonymisation burden of real recordings, since no personal data are involved in the generation process.
However, using synthetic data also presents challenges, one of the most significant being ensuring realism. If the gap between the generated and real data is too large (in terms of image appearance, scenarios, movement, sensor output, etc.), it becomes difficult to generalise the developed algorithms to real-world applications. This can result in solutions that perform well on synthetic data but not in real conditions.
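As a rough illustration of how varying the experimental conditions multiplies the number of datasets, the sketch below enumerates every combination of trajectory, weather and road-user behaviour. It is a minimal Python example under stated assumptions: the `Scenario` class, the `build_scenarios` helper and all condition values are hypothetical names chosen for illustration, and nothing here reflects the rFpro interface or the actual DIDYMOS-XR pipeline.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """One simulated recording run (all fields are illustrative assumptions)."""
    trajectory: str
    weather: str
    traffic: str


def build_scenarios(trajectories, weathers, traffic_levels):
    """Return one Scenario per combination of the given condition values."""
    return [
        Scenario(trajectory, weather, traffic)
        for trajectory, weather, traffic in product(
            trajectories, weathers, traffic_levels
        )
    ]


# Hypothetical condition values; a real campaign would define its own.
scenarios = build_scenarios(
    trajectories=["loop_a", "loop_b"],
    weathers=["clear", "rain", "fog"],
    traffic_levels=["light", "dense"],
)

print(len(scenarios))  # 2 * 3 * 2 = 12 combinations
```

Listing the combinations explicitly makes it easy to see which conditions a set of datasets covers, and to pick a subset when simulating every combination would be too expensive.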

Conclusion
As data is central to the development of solutions for digital twin reconstruction, Ficosa was appointed to provide datasets for all DIDYMOS-XR partners, particularly for the city use cases in Vilanova i la Geltrú. To achieve this, we leverage both real-world and synthetic data, allowing us to take advantage of the benefits of both approaches while mitigating their drawbacks. By researching and utilising different data sources and modalities, DIDYMOS-XR aims to ensure the construction of accurate and reliable digital twins.
Author: Dr Romain Guesdon, Ficosa