This week, Intel Research showcased new technologies at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Intel Labs, in partnership with Blockade Labs, recently introduced a novel diffusion model called Latent Diffusion Model for 3D (LDM3D). This innovative Artificial Intelligence (AI) model is designed to generate realistic 3D visual content from text prompts.
“This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.”
LDM3D is the industry's first model capable of creating a depth map using the diffusion process, producing vivid, immersive 3D images with a full 360-degree view. Its potential uses span a variety of industries, including gaming, entertainment, architecture, and design, and it is poised to dramatically change the landscape of content creation and digital experiences.
Generative AI technology aims to amplify human creativity and save time. However, most current AI models are limited to generating 2D images, and very few can generate 3D images from text prompts.
“Unlike existing latent stable diffusion models, LDM3D allows users to generate an image and a depth map from a given text prompt using almost the same number of parameters. It provides more accurate relative depth for each pixel in an image than standard post-processing methods for depth estimation, and saves developers significant time in scene development,” said Vasudev Lal, AI/Machine Learning Research Scientist, Intel Labs.
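For developers who want to experiment with this, LDM3D checkpoints were published on the Hugging Face Hub and are exposed through the diffusers library. The sketch below is a minimal example of generating an RGB image and its depth map from one prompt; the StableDiffusionLDM3DPipeline class and the "Intel/ldm3d-4c" checkpoint come from that public release rather than from Intel's announcement, so verify the names against your installed diffusers version.

```python
# Minimal sketch: generate an RGB image and a matching depth map from a
# single text prompt with LDM3D via Hugging Face diffusers. Assumes the
# StableDiffusionLDM3DPipeline class and "Intel/ldm3d-4c" checkpoint
# from the public Hub release.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # fall back to "cpu" (and float32) without a GPU

prompt = "a tropical beach at sunset, palm trees, gentle waves"
output = pipe(prompt)

# One diffusion pass yields both modalities: lists of PIL images for the
# RGB output and the per-pixel depth output.
rgb_image = output.rgb[0]
depth_image = output.depth[0]
rgb_image.save("beach_rgb.png")
depth_image.save("beach_depth.png")
```

Because the image and its depth map come from the same latent, they are aligned pixel-for-pixel, which is what makes the RGBD output directly usable for 3D scene reconstruction.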
360-degree images from text prompts
The potential impact of this research is far-reaching and promises to transform how we interact with digital content. By allowing users to visualize their text descriptions in entirely new ways, LDM3D can turn a description of a tropical beach, a modern skyscraper, or a sci-fi world into a detailed 360-degree panorama.
This ability to capture depth information can greatly enhance realism and immersion, opening up new applications across a wide range of industries, from gaming and entertainment to interior design and real estate listings, as well as virtual museums and immersive virtual reality (VR) experiences.
To build a training dataset for LDM3D, a subset of 10,000 samples from the LAION-400M database, which comprises more than 400 million image-caption pairs, was used. The Dense Prediction Transformer (DPT) large depth estimation model, previously developed at Intel Labs, was then used to annotate the training set: the DPT-Large model provides highly accurate relative depth for each pixel in an image.
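To illustrate that annotation step, the sketch below runs Intel's publicly released DPT-Large model from the transformers library over a single image to produce the kind of per-pixel relative depth map used to label the image-caption pairs. The local filenames are hypothetical, and the exact preprocessing Intel Labs applied to LAION-400M is not described in this article, so treat this as an approximation.

```python
# Minimal sketch: estimate a relative depth map for one image with Intel's
# DPT-Large model via Hugging Face transformers, approximating the dataset
# annotation step described above. "sample.jpg" is a hypothetical input.
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("sample.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth  # (1, H', W') relative depth

# Resize the prediction back to the original image resolution.
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],  # PIL size is (W, H); interpolate expects (H, W)
    mode="bicubic",
    align_corners=False,
).squeeze()

# Normalize to 8-bit so the depth map can be stored next to the RGB image
# as one RGBD training pair.
depth_8bit = ((depth - depth.min()) / (depth.max() - depth.min()) * 255).to(torch.uint8)
Image.fromarray(depth_8bit.cpu().numpy()).save("sample_depth.png")
```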
Source: Intel Labs