Introduction
The objective of DIDYMOS-XR is to advance the technologies needed to create digital twins and to develop 3D capturing systems for objects and scenes. This involves keeping digital twins of factories or cities up to date by automatically analyzing sensor data and refreshing the digital models. In recent years, the pursuit of high fidelity and immersion has made volumetric video content increasingly popular, and it is an ideal technique for the needs of the project. Because volumetric video captures a full three-dimensional space, such as a dynamic scene, viewers can explore it from any angle, resulting in a novel, immersive experience that closely mimics reality.
Volumetric Video in DIDYMOS-XR
Volumetric video technology is integral to the DIDYMOS-XR project, and our team at i2CAT is continually researching new approaches to advance the state of the art, with a particular focus on novel solutions for compression and rendering. As noted, a key innovation within the project is twin synchronization and update, which aims to develop an advanced pipeline of volumetric video compression and transmission methods. This will improve the quality of digital twin updates in offline, real-time, and near-real-time scenarios. Because volumetric video is highly detailed and interactive, its sizeable data needs require advanced compression and rendering solutions.
Representing Volumetric Video
Point clouds are one of the most common volumetric representation formats: they provide detailed information and accurately represent volumetric scenes, but they have several shortcomings. In raw format, point clouds require a huge amount of memory for storage or bandwidth for transmission. Higher quality demands higher resolution, which implies larger point clouds and even greater memory and bandwidth demands, and the problem becomes harder still under real-time constraints. Efficient mechanisms to compress, deliver, and render point clouds are therefore imperative to keep memory and bandwidth requirements manageable.
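To make these demands concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a common raw layout of three 32-bit float coordinates plus three 8-bit colour channels per point (15 bytes); actual capture formats vary and often carry additional attributes.

```python
# Back-of-the-envelope figures for raw point cloud data.
# Assumed layout: xyz as float32 (12 bytes) + RGB as uint8 (3 bytes) per point;
# real capture formats often carry extra attributes such as normals.
BYTES_PER_POINT = 3 * 4 + 3  # 15 bytes

def frame_size_mb(num_points: int) -> float:
    """Raw size of one uncompressed point cloud frame, in megabytes."""
    return num_points * BYTES_PER_POINT / 1e6

def stream_rate_gbps(num_points: int, fps: int = 30) -> float:
    """Sustained bitrate needed to stream raw frames, in gigabits per second."""
    return num_points * BYTES_PER_POINT * 8 * fps / 1e9

for n in (100_000, 1_000_000, 12_000_000):
    print(f"{n:>10,} points: {frame_size_mb(n):7.1f} MB/frame, "
          f"{stream_rate_gbps(n):6.2f} Gbps at 30 fps")
```

Under these assumptions, a one-million-point video already needs roughly 3.6 Gbps at 30 frames per second, the same order of magnitude as the figure cited in the next section.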
Compression
Real-time volumetric capture is crucial for numerous applications, particularly VR/XR. In these applications, 3D video is typically captured by RGB-D sensors, which record both colour and depth, and efficiently managing the large data volumes produced is essential for streaming and storage. For instance, point cloud videos with one million points can require up to five gigabits per second, making storage for large scenes such as cities very expensive and real-time streaming nearly impossible. Using advanced algorithms to reduce data size without significant quality loss is therefore a priority.
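As an illustration of the capture step, the sketch below back-projects a single RGB-D frame into a coloured point cloud using the standard pinhole camera model. The intrinsics fx, fy, cx, cy are placeholders, not parameters of the project’s sensors.

```python
import numpy as np

def rgbd_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project one RGB-D frame into a coloured point cloud.

    depth: (H, W) array of metric depth values (0 marks missing pixels).
    rgb:   (H, W, 3) uint8 colour image aligned with the depth map.
    fx, fy, cx, cy: pinhole intrinsics (focal lengths and principal point).
    Returns (N, 3) point coordinates and (N, 3) colours for valid pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Standard pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1), rgb[valid]
```

Each calibrated sensor yields one such per-view cloud; transforming these clouds into a common world frame using each sensor’s extrinsics and merging them produces the single volumetric representation described next.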
Once the images from the sensors are collected, the different RGB-D viewpoints are fused into a single volumetric representation. The data streams from the multiple sensors are then optimized by projecting the volume into a 2D space and streaming it using traditional video compression (e.g., HEVC or H.264). This enables real-time updates to digital twins with minimal bandwidth usage, as illustrated in the figure below, where a point cloud of 12 million voxels is compressed to reduce the amount of data needed to describe it.
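The sketch below shows the projection idea in its simplest form: an orthographic projection of the fused cloud into aligned depth and colour images, which a standard video codec can then compress as ordinary frames. It is a simplified stand-in for the project’s pipeline, whose exact projection scheme is not detailed here.

```python
import numpy as np

def project_to_maps(points, colors, resolution=1024):
    """Orthographically project a point cloud onto a 2D grid, producing a
    depth map and a colour map that a standard video codec (HEVC, H.264)
    can compress as ordinary image frames."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scale = (resolution - 1) / (maxs[:2] - mins[:2])
    px = ((points[:, 0] - mins[0]) * scale[0]).astype(int)
    py = ((points[:, 1] - mins[1]) * scale[1]).astype(int)

    depth_map = np.full((resolution, resolution), np.inf, dtype=np.float32)
    color_map = np.zeros((resolution, resolution, 3), dtype=np.uint8)
    # Simple z-buffer: keep the nearest point per pixel. A production
    # pipeline would vectorize this or run it on the GPU.
    for x, y, z, c in zip(px, py, points[:, 2], colors):
        if z < depth_map[y, x]:
            depth_map[y, x] = z
            color_map[y, x] = c
    depth_map[np.isinf(depth_map)] = 0.0  # mark empty pixels
    return depth_map, color_map

# Successive (depth_map, color_map) pairs form two ordinary image
# sequences that can be piped to an HEVC/H.264 encoder such as ffmpeg.
```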
Rendering
After compression, digital twins need to be rendered, transforming the compressed data into a visually coherent, detailed, and interactive format for XR applications. At i2CAT, as part of the DIDYMOS-XR project, we are exploring several techniques to enhance both compression and rendering. Our approach uses GPU acceleration to handle the intensive computational tasks associated with volumetric rendering and relies on two techniques, described below.
The first takes an approach similar to the V-PCC (Video-based Point Cloud Compression) standard developed by the Moving Picture Experts Group (MPEG): the geometry of the 3D point cloud is packed into 2D images (occupancy and geometry maps), accompanied by corresponding 2D images containing the texture information, and specialized shaders decode these maps directly on the GPU. This approach speeds up rendering and enhances the final output’s visual quality.
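The following NumPy sketch mirrors the per-pixel logic such a shader performs when turning maps back into points; real V-PCC decoding also applies per-patch offsets and projection directions, which are omitted here for brevity.

```python
import numpy as np

def reconstruct_points(occupancy, geometry, texture):
    """Rebuild a point cloud from 2D maps, mirroring per-pixel shader logic.

    occupancy: (H, W) binary map, 1 where a pixel encodes a valid point.
    geometry:  (H, W) map of depth values along the patch projection axis.
    texture:   (H, W, 3) colour map aligned with the geometry map.
    """
    v, u = np.nonzero(occupancy)  # only occupied pixels become points
    z = geometry[v, u].astype(np.float32)
    points = np.stack([u.astype(np.float32), v.astype(np.float32), z], axis=-1)
    colors = texture[v, u]
    return points, colors
```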
To further improve rendering efficiency, the second technique implements Level of Detail (LoD), adjusting the complexity of the volumetric data based on the viewer’s distance and perspective. This dynamic adaptation reduces the rendering load and improves system responsiveness, both key for XR applications. The figure below shows the visual results of applying these LoD techniques to reduce the complexity of a point cloud.
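A common way to realize distance-driven LoD is voxel downsampling with a voxel size that grows with viewer distance; the sketch below illustrates this idea, with purely illustrative scaling constants rather than the project’s actual tuning.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep one representative point per occupied voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx]

def lod_for_distance(points, viewer_distance, base_voxel=0.01):
    """Coarsen the cloud as the viewer moves away: the voxel size grows
    with distance, so distant content is rendered with far fewer points."""
    voxel = base_voxel * max(1.0, viewer_distance)  # illustrative scaling
    return voxel_downsample(points, voxel)
```

For example, a cloud viewed from 1 m keeps near-full density, while the same cloud viewed from 10 m is rendered with roughly ten-times-coarser voxels and a fraction of the points.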
Conclusion
Looking to the future, digital twins have immense potential to revolutionize urban planning, enhance public safety, and streamline industrial processes. Reducing the latency and overhead associated with volumetric data makes digital twins more realistic, better synchronized with the physical world, and more accessible for XR applications across many domains.
As the digital and physical realms become more connected, DIDYMOS-XR will be essential in shaping the next generation of digital interaction. At i2CAT, we are excited to contribute to this evolution of digital twins through our participation in the project.
Author: Stefano Beccaletto, i2CAT