Xpeng releases X-Mind world model for autonomous driving
Xpeng has released X-Mind, a framework that embeds a predictive world model in an autonomous driving system so that the vehicle can simulate near-term changes in its environment before committing to an action. Presented this month at the Foundation Model workshop of the Conference on Computer Vision and Pattern Recognition (CVPR) in Denver, X-Mind completes the three-pillar Physical AI research programme that Xpeng has been assembling alongside X-World and X-Foresight.

Overall architecture of X-Mind. The predictive world model is embedded within the large driving model. Recurrent Block Diffusion executes progressive denoising across hierarchical internal layers in a single forward pass to generate a compact abstract sketch. Conditioned on this anticipated physical future, the planner derives the optimal ego-vehicle trajectory. Blue arrows denote training data flow; black arrows illustrate inference.
Conventional autonomous driving systems run a reactive perception-to-action loop, processing immediate visual input without modelling how the surrounding environment will develop. X-Mind instead introduces a visual chain of thought: it runs a spatial-temporal simulation inside the system before any action is generated, so the vehicle can anticipate traffic conditions rather than simply react to them.

Visualisation of the structured abstract sketch. Annotations of this type serve as high-fidelity supervisory signals for training the world model and cover (a) dynamic traffic light states, (b) adaptive navigation intents and (c) velocity compliance profiles. Dense, richly structured annotations are critical for the model to learn complex physical and semantic driving rules.
The framework’s Thought Sketch module compresses 12 projected future frames into 96 tokens using a deep compression autoencoder, retaining road topology, traffic light states and navigation intent while discarding planning-irrelevant texture data. A Recurrent Block Diffusion mechanism then generates future rollouts in a single forward pass, achieving substantially higher image quality than single-step denoising at comparable inference latency.
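The article gives the Thought Sketch token budget but not how the 96 tokens map onto the 12 frames. Purely for illustration, the sketch below assumes an even, temporally ordered split, which works out to 96 / 12 = 8 tokens per future frame. `tokens_per_frame` and `group_by_frame` are hypothetical helpers, not Xpeng code; the real autoencoder may compress frames jointly rather than one by one.

```python
# Hypothetical sketch: lay out a Thought Sketch token sequence by frame.
# Assumption (not stated by Xpeng): the 96 tokens are split evenly across
# the 12 projected future frames, in temporal order.
NUM_FRAMES = 12
NUM_TOKENS = 96


def tokens_per_frame(num_tokens: int = NUM_TOKENS, num_frames: int = NUM_FRAMES) -> int:
    """Token budget per future frame under the even-split assumption."""
    if num_tokens % num_frames != 0:
        raise ValueError("token count does not divide evenly across frames")
    return num_tokens // num_frames


def group_by_frame(tokens: list, num_frames: int = NUM_FRAMES) -> list:
    """Split a flat token list into consecutive per-frame chunks."""
    per_frame = tokens_per_frame(len(tokens), num_frames)
    return [tokens[i * per_frame:(i + 1) * per_frame] for i in range(num_frames)]


sketch = list(range(NUM_TOKENS))  # stand-in token ids 0..95
frames = group_by_frame(sketch)
print(tokens_per_frame(), len(frames), frames[1])
```

Under this assumption, frame 1 holds tokens 8 through 15, and each frame carries only 8 tokens, which is why the compression has to discard texture and keep only planning-relevant structure.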

Overview of Recurrent Block Diffusion. Transformer layers are divided into five blocks. During training, the sketch token features at each block are replaced with linear combinations of noise and ground truth. During inference, the outputs of preceding blocks feed subsequent blocks via Euler integration with a fixed time step, all within a single forward pass of the large language model.
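The training and inference steps in this caption can be illustrated with a toy version on plain lists of numbers. This is a minimal sketch under stated assumptions, not Xpeng's implementation: it assumes a rectified-flow-style interpolation x_t = (1 - t) * noise + t * target, a fixed Euler step of 1/5 so that the five blocks span t from 0 to 1, and it replaces each learned transformer block with a hypothetical `oracle_block` that knows the target. With an exact velocity, five chained Euler steps land on the target, which shows how one pass through the stacked blocks can carry out the whole denoising trajectory.

```python
import random

NUM_BLOCKS = 5          # from the article: transformer layers split into five blocks
DT = 1.0 / NUM_BLOCKS   # assumption: fixed Euler step so the blocks span t in [0, 1]


def corrupt(target, noise, t):
    """Training-time block input: a linear mix of noise and ground truth.

    Assumption: x_t = (1 - t) * noise + t * target, so t = 0 is pure noise.
    """
    return [(1.0 - t) * n + t * g for n, g in zip(noise, target)]


def training_pair(target, noise, k):
    """Hypothetical (input, regression target) pair for block k: the corrupted
    features at t = k * DT and the straight-line velocity target - noise."""
    t = k * DT
    return corrupt(target, noise, t), [g - n for n, g in zip(noise, target)]


def oracle_block(x, t, target):
    """Hypothetical stand-in for a trained block: the exact velocity that moves
    x along a straight line to target by t = 1. A real block would predict
    this from scene context without seeing the target."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def recurrent_block_sample(noise, block_fn):
    """Inference: each block's output feeds the next via one Euler step."""
    x = list(noise)
    for k in range(NUM_BLOCKS):
        t = k * DT
        v = block_fn(x, t)  # block k predicts a velocity at time t
        x = [xi + DT * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
target = [random.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in sketch features
noise = [random.gauss(0.0, 1.0) for _ in range(8)]
out = recurrent_block_sample(noise, lambda x, t: oracle_block(x, t, target))
print(max(abs(a - b) for a, b in zip(out, target)))  # near 0, float rounding only
```

The design point the toy captures is that the number of denoising steps equals the number of blocks, so sampling costs no extra forward passes; a learned block would only approximate the oracle velocity.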
In comparative testing, X-Mind reduced both lateral and longitudinal displacement error relative to conventional vision-language-action (VLA) models, with the gains concentrated in complex long-tail scenarios where safety and traffic compliance matter most. Xpeng describes the inference latency as compatible with resource-constrained, automotive-grade hardware, a deployment threshold that heavier 3D reconstruction approaches have not met.
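The article does not define how lateral and longitudinal displacement error are computed. One common convention, assumed here, projects the position error at each timestep onto the ground-truth heading (longitudinal) and onto its perpendicular (lateral), then averages absolute values over the trajectory. The function names and sample trajectory are hypothetical, not Xpeng's evaluation code.

```python
import math


def lat_lon_error(pred, gt, heading):
    """Split one timestep's position error into (lateral, longitudinal).

    pred, gt: (x, y) positions in metres; heading: ground-truth yaw in radians.
    Longitudinal is along the heading; lateral is positive to its left.
    """
    dx, dy = pred[0] - gt[0], pred[1] - gt[1]
    c, s = math.cos(heading), math.sin(heading)
    lon = dx * c + dy * s   # projection onto the heading direction
    lat = -dx * s + dy * c  # projection onto the left-pointing normal
    return lat, lon


def mean_abs_lat_lon(pred_traj, gt_traj, headings):
    """Mean absolute lateral and longitudinal error over a trajectory."""
    errors = [lat_lon_error(p, g, h) for p, g, h in zip(pred_traj, gt_traj, headings)]
    n = len(errors)
    mean_lat = sum(abs(lat) for lat, _ in errors) / n
    mean_lon = sum(abs(lon) for _, lon in errors) / n
    return mean_lat, mean_lon


gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]     # ego driving along +x
pred = [(1.0, 0.1), (2.2, -0.2), (3.1, 0.3)]  # hypothetical planner output
lat, lon = mean_abs_lat_lon(pred, gt, [0.0, 0.0, 0.0])
print(round(lat, 3), round(lon, 3))  # 0.2 0.1
```

Splitting the error this way separates drifting out of lane (lateral) from braking or accelerating at the wrong time (longitudinal), which a single Euclidean displacement error would merge.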

Qualitative comparison of future bird’s-eye-view (BEV) predictions in daytime and nighttime scenarios. Compared with baseline methods based on single-step generation (middle row), the Recurrent Block Diffusion (RBD) framework proposed in X-Mind (bottom row) yields highly accurate and temporally coherent predictions. Notably, even where dynamic objects are absent from the ground-truth (GT) supervision, the RBD framework still predicts their motion.
X-Mind, X-World and X-Foresight together constitute Xpeng’s Physical AI Foundational Model lineage, covering proactive reasoning, controllable generation and long-horizon forecasting. Xpeng has indicated the architecture is being extended beyond autonomous driving into embodied intelligence applications.
Source: Xpeng
Originally posted on: https://www.automotiveworld.com/news/xpeng-releases-x-mind-world-model-for-autonomous-driving/