Data Processing for Digital Twins: Applications and Future Directions

Introduction

Data and application management play a key role in the development and deployment of digital twin applications. Building and updating realistic digital twins requires a large and steady stream of data. This data comes in many forms (image/video, LiDAR, odometry, object/scene information, etc.), and a similar variety can be found in the XR applications we are developing in DIDYMOS-XR (which include virtual tourism, city maintenance, city planning, and industrial use cases). Hosting this volume of data and range of applications requires suitable infrastructure and data processing pipelines.

Data processing in DIDYMOS-XR for the city use cases

To create accurate digital twins from the data we record, Ficosa employs a multi-stage data processing pipeline that orchestrates ETL (Extract, Transform, Load) processes to clean, transform, and integrate the data. The pipeline starts with the extraction of raw sensor data: LiDAR, SVS camera feeds, and localization information, in the case of the Ficosa recording car used in DIDYMOS-XR. Because this data comes from different sensors, it is synchronized using clock synchronization and timestamps. To automate and monitor each step of the pipeline, and to ensure that dependencies between tasks are respected, we use the workflow orchestration tool Apache Airflow.

The transformed data is then processed through a series of Docker containers (lightweight, isolated software environments) running various scripts. Most of the tasks performed at this stage fall into two categories. First, the data is run through computer vision algorithms for tasks such as video anonymization and RGB point cloud creation; these tasks can be executed on dedicated instances when Graphics Processing Unit (GPU) acceleration is required. Second, the data is processed and formatted to meet the expected output specifications, which includes creating the folder structure, annotation and log files, and converting the recorded data to the appropriate file formats.

Finally, the processed data is stored on a server to be shared with the rest of the DIDYMOS-XR consortium. In an actual application, this data would be fed into various tools for purposes such as digital twin creation or visualization. This comprehensive approach ensures that the data is accurately processed and readily available for further use in the development of digital twins. Although the DIDYMOS-XR project leverages other sources of data, such as static cameras or recordings from the industrial use cases, the described pipeline remains suitable for these scenarios.
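The exact synchronization logic used in the pipeline is not detailed here, but the timestamp-based alignment step can be sketched in a few lines of Python. The function below matches each timestamp of a reference stream (e.g. a LiDAR sweep) to the nearest timestamp in another stream (e.g. a camera feed), discarding matches outside a tolerance window. The function name, sensor rates, and tolerance value are illustrative assumptions, not taken from the actual pipeline.

```python
from bisect import bisect_left


def nearest_timestamp_match(reference_ts, other_ts, tolerance=0.05):
    """For each timestamp in the reference stream, return the index of
    the closest timestamp in the other (sorted) stream, or None if no
    sample falls within `tolerance` seconds."""
    matches = []
    for t in reference_ts:
        # Locate the insertion point, then compare its two neighbours.
        i = bisect_left(other_ts, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        best = min(candidates, key=lambda j: abs(other_ts[j] - t), default=None)
        if best is not None and abs(other_ts[best] - t) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches


# Hypothetical streams: LiDAR at 10 Hz, camera at ~30 Hz with clock jitter.
lidar = [0.0, 0.1, 0.2, 0.3]
camera = [0.001, 0.034, 0.066, 0.099, 0.133, 0.166,
          0.201, 0.234, 0.266, 0.299]
print(nearest_timestamp_match(lidar, camera, tolerance=0.02))  # [0, 3, 6, 9]
```

In a production pipeline this matching would typically run per sensor pair inside one Airflow task, with the tolerance chosen from the slowest sensor's sampling period.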

Amazon Web Services (AWS)

With our data processing pipeline in place, we are now exploring how Amazon Web Services (AWS) could further enhance our workflow. AWS is a major player in this domain, offering a range of tools that could improve the efficiency, scalability, and integration capabilities of our pipeline. To explore the latest innovations and solutions, Alfonso Fornell López from Ficosa attended the AWS Summit Madrid 2024 on June 5th. Attending this congress was a chance to explore how Generative Artificial Intelligence (AI) can drive and improve process development, and to learn from other companies’ implementations of AI and security methodologies so that new practices can be integrated into the project. It was also an opportunity to review the progress made so far and ensure that our data management and security, AWS architecture, database content management, and pipeline development align with the most advanced methodologies presented at the congress. Sharing the latest knowledge and practices allows us to identify new services and technologies that could be incorporated into the development workflow.

The congress was divided into several areas, including exhibition booths, demonstration zones, and conference rooms. Several sessions presented topics relevant to building digital twin-based solutions. One key topic was the development and enhancement of applications. For example, the use of generative AI to accelerate application development was discussed, along with methods for building and scaling these applications while ensuring security and privacy. The next generation of web applications was also highlighted, showcasing how cloud services can enhance software-as-a-service (SaaS) development by incorporating advanced functionalities. Another important topic was data management and optimization. Strategies for improving productivity included optimizing search capabilities within knowledge repositories, using natural language processing to refine search efficiency. The benefits of converting legacy database code to modern programming languages were also detailed, emphasizing improved data management and task automation that increase operational flexibility. The final topic covered was privacy and security strategies, focusing on critical areas such as identity verification, privacy, and network security. Since our goal in DIDYMOS-XR is to build novel digital twins and XR applications that require processing large amounts of data, these topics are crucial for the efficient development and deployment of our solutions while guaranteeing data security.

Relevance for the project

This summit provided an opportunity to discuss with stakeholders the various technical solutions that align with our project, with data security and privacy among the main topics. Implementing a Zero Trust architecture could enhance the security of our data infrastructure, ensuring protection against unauthorized access and compliance with data protection regulations. Another critical issue in the creation of digital twins is the handling of large datasets. Several AI-based tools were presented that enable more efficient platforms capable of managing large data volumes and real-time data analysis. These would allow us to improve the processing and analysis of real and synthetic datasets, optimizing sensor fusion and precision map generation through services based on low-latency queries and the synchronization of data from multiple sources, such as cameras, LiDARs, and GPS, thereby enhancing the responsiveness and accuracy of our XR applications. Finally, process optimization and code reengineering strategies, in particular improving search in knowledge repositories and transitioning from legacy database code to a modern programming language, will modernize our operations and increase our competitiveness and efficiency.

Conclusion

Integrating efficient data processing techniques and tools is crucial for the success of the DIDYMOS-XR project. We have implemented an automated pipeline to process, prepare, and share the recorded data from Ficosa’s car. Our next goal is to enhance the efficiency, scalability, and security of this pipeline, leveraging third-party services like AWS and other innovative technologies. Generative AI and other AI-based tools present promising opportunities to improve data analysis and application development. By continuously monitoring new developments, we can incorporate beneficial enhancements into our workflow. Ultimately, these efforts will contribute to creating accurate and reliable digital twins and XR applications.

Authors: Alfonso Fornell López and Romain Guesdon, Ficosa