Learning to Merge Tokens for Communication-Efficient Collaborative Occupancy Prediction
Communication-efficient multi-agent 3D occupancy prediction with token-based representation and adaptive communication
Overview
This project focuses on communication-efficient collaborative 3D semantic occupancy prediction for autonomous driving.
The goal is to enable multiple agents (vehicles) to collaboratively perceive the environment while operating under strict communication bandwidth constraints.
The work explores how to design compact scene representations and adaptive communication mechanisms to balance perception performance and communication cost.
This project is ongoing and is planned for submission to CVPR 2026.
Problem Setting
In collaborative perception, each agent observes only a partial view of the environment.
To achieve global scene understanding, agents need to exchange information. However:
- communication bandwidth is limited
- redundant information transfer is common
- naive feature sharing is inefficient
This project studies how to select, compress, and exchange only the most informative content across agents.
Key Ideas
The system is built around three core ideas:
1. Tokenized Scene Representation
Instead of dense feature maps, the scene is represented as a set of compact tokens, enabling efficient information exchange.
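The bandwidth argument becomes concrete when you compare the payload of one dense feature map with that of a small token set. All sizes below are hypothetical values chosen for illustration, not this project's configuration: a 200 x 200 x 128 BEV grid, 256 tokens, float16 values, and a 4-byte position index per token.

```python
# Hypothetical sizes for illustration only; not this project's configuration.
H, W, C = 200, 200, 128  # dense BEV feature map: H x W cells with C channels
K = 256                  # number of tokens an agent transmits
BYTES_PER_VALUE = 2      # float16
POS_BYTES = 4            # packed (row, col) index attached to each token

def dense_payload_bytes(h: int, w: int, c: int) -> int:
    """Bytes to send every cell of a dense h x w x c feature map."""
    return h * w * c * BYTES_PER_VALUE

def token_payload_bytes(k: int, c: int) -> int:
    """Bytes to send k tokens, each a c-dim feature plus its position index."""
    return k * (c * BYTES_PER_VALUE + POS_BYTES)

dense = dense_payload_bytes(H, W, C)
tokens = token_payload_bytes(K, C)
print(f"dense:  {dense / 1024:.1f} KB")   # dense:  10000.0 KB
print(f"tokens: {tokens / 1024:.1f} KB")  # tokens: 65.0 KB
print(f"ratio:  {dense / tokens:.0f}x")   # ratio:  154x
```

With these assumed sizes, the token set is roughly two orders of magnitude smaller than the dense map. Actual figures depend on the grid size, channel width, and token count.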
2. Spatio-Temporal Information Reuse
A memory mechanism is introduced to reuse information across:
- time (historical frames)
- agents (collaborative context)
This reduces redundant communication.
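Delta transmission is one simple way to realize temporal reuse. The sender remembers what it last sent for each token slot and retransmits a token only when it has changed noticeably. This sketch assumes that tokens are keyed by a stable id, such as a spatial cell index, and that changes are measured by a cosine-similarity threshold. `TokenMemory`, `select_updates`, and the 0.95 threshold are hypothetical. The project's actual memory mechanism is not described in this document.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

class TokenMemory:
    """Hypothetical sender-side cache of the last token transmitted per key.

    Assumes the receiver keeps an identical cache, so any token that is not
    re-sent can be reused from the receiver's copy.
    """

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.cache: dict[int, list[float]] = {}

    def select_updates(self, tokens: dict[int, list[float]]) -> dict[int, list[float]]:
        """Return only the tokens that are new or have changed enough to resend."""
        updates = {}
        for key, tok in tokens.items():
            prev = self.cache.get(key)
            if prev is None or cosine(prev, tok) < self.threshold:
                updates[key] = tok
                self.cache[key] = tok  # mirror what the receiver now holds
        return updates

mem = TokenMemory(threshold=0.95)
frame0 = {0: [1.0, 0.0], 1: [0.0, 1.0]}
frame1 = {0: [0.99, 0.01], 1: [1.0, 0.0]}  # cell 0 barely changes, cell 1 changes
print(sorted(mem.select_updates(frame0)))  # [0, 1]
print(sorted(mem.select_updates(frame1)))  # [1]
```

Each new token is compared against the last *transmitted* value rather than the previous frame. As a result, slow drift eventually triggers a resend instead of accumulating silently.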
3. Adaptive Communication
Communication is designed to be task-aware and selective, where:
- only relevant information is exchanged
- redundant or low-value content is filtered
This significantly improves communication efficiency.
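Here is a minimal sketch of selective, budget-aware communication. It assumes that each candidate token already carries a task-relevance score and a serialized size in bytes. This document does not specify how the score is produced. `ScoredToken`, `select_under_budget`, `min_score`, and the example values are all hypothetical.

```python
from typing import NamedTuple

class ScoredToken(NamedTuple):
    key: int      # hypothetical token id, e.g. a BEV cell index
    score: float  # hypothetical task-relevance score in [0, 1]
    nbytes: int   # serialized size of the token in bytes

def select_under_budget(
    tokens: list[ScoredToken], budget_bytes: int, min_score: float = 0.1
) -> tuple[list[int], int]:
    """Greedily keep the highest-scoring tokens that fit in the byte budget.

    Tokens scoring below min_score are treated as low-value and never sent.
    Returns the selected keys and the number of bytes they use.
    """
    chosen: list[int] = []
    used = 0
    for tok in sorted(tokens, key=lambda t: t.score, reverse=True):
        if tok.score < min_score:
            break  # remaining tokens score even lower: filter them all
        if used + tok.nbytes <= budget_bytes:
            chosen.append(tok.key)
            used += tok.nbytes
    return chosen, used

candidates = [
    ScoredToken(0, 0.9, 260),
    ScoredToken(1, 0.05, 260),
    ScoredToken(2, 0.6, 260),
    ScoredToken(3, 0.4, 260),
]
print(select_under_budget(candidates, budget_bytes=600))  # ([0, 2], 520)
```

The greedy rule is a heuristic for what is really a knapsack problem. With equal-size tokens it reduces to keeping the top-k scores. With variable sizes it is not guaranteed to be optimal.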
System Overview
The system follows a collaborative perception pipeline:
Perception → Representation → Communication → Fusion → Prediction
Key components include:
- token generation from multi-view inputs
- inter-agent communication mechanism
- feature fusion across agents
- occupancy prediction head
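Given the project title, token merging is one plausible form of cross-agent fusion. The sketch below is a fixed, non-learned heuristic loosely based on the bipartite soft matching used in ToMe (Token Merging, Bolya et al.):

- each received token is matched to its most similar ego token
- the `r` best-matched pairs are averaged
- the remaining received tokens are kept unchanged

It is not this project's learned merging method, and `merge_fuse` and `r` are hypothetical names.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def merge_fuse(ego: list[list[float]], received: list[list[float]],
               r: int) -> list[list[float]]:
    """Merge the r received tokens most similar to an ego token; append the rest."""
    if not ego:
        return [list(t) for t in received]
    fused = [list(t) for t in ego]
    counts = [1] * len(ego)  # how many original tokens each fused token averages

    # Bipartite matching: each received token picks its most similar ego token.
    matches = []
    for j, tok in enumerate(received):
        sims = [cosine(e, tok) for e in ego]
        i = max(range(len(ego)), key=sims.__getitem__)
        matches.append((sims[i], j, i))
    matches.sort(reverse=True)  # most similar pairs first

    merged = set()
    for _, j, i in matches[:r]:
        n = counts[i]
        # Running mean: an ego token absorbing several received tokens
        # ends up as the average of all of them.
        fused[i] = [(n * x + y) / (n + 1) for x, y in zip(fused[i], received[j])]
        counts[i] = n + 1
        merged.add(j)

    # Received tokens that were not merged carry new information: keep them.
    fused.extend(list(t) for j, t in enumerate(received) if j not in merged)
    return fused

ego = [[1.0, 0.0], [0.0, 1.0]]
received = [[0.9, 0.1], [-1.0, 0.0]]
fused = merge_fuse(ego, received, r=1)
print(len(fused))  # 3: first received token merged into ego[0], second appended
```

As in ToMe, matches are computed once against the original ego tokens. The merge budget `r` directly controls how many tokens the fused set loses.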
Experimental Findings
Preliminary experiments show that:
- the system achieves strong perception performance
- communication cost can be reduced to the kilobyte (KB) level
- a favorable trade-off between perception performance and bandwidth is achieved
Research Significance
This project contributes toward:
- communication-efficient multi-agent perception
- scalable autonomous driving systems
- connections to world models and structured scene representations
Note
Details of the method are intentionally omitted because the work is under submission.