OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework

ICLR 2025
The Hong Kong Polytechnic University

Illustration of OccProphet. OccProphet only receives multi-camera video input and produces future occupancies.

Comparison of performance between Cam4DOcc and OccProphet.

Occupancy forecasting results of OccProphet.

Abstract

Predicting variations in complex traffic environments is crucial for the safety of autonomous driving. Recent advances in occupancy forecasting make it possible to forecast the future 3D occupancy state of a driving environment from historical 2D images. However, high computational demands make occupancy forecasting inefficient during both training and inference, hindering its deployment on edge agents. In this paper, we propose a novel framework, OccProphet, that learns occupancy forecasting efficiently and effectively, with significantly lower computational requirements and improved forecasting accuracy. OccProphet comprises three lightweight components: Observer, Forecaster, and Refiner. The Observer extracts spatio-temporal features from 3D multi-frame voxels using the proposed Efficient 4D Aggregation with Tripling-Attention Fusion, while the Forecaster and Refiner conditionally predict and refine future occupancy inferences. Experimental results on the nuScenes, Lyft-Level5, and nuScenes-Occupancy datasets demonstrate that OccProphet is both training- and inference-friendly. Compared with the state-of-the-art Cam4DOcc, OccProphet reduces computational cost by 58% to 78% and runs 2.6× faster. Moreover, it achieves a relative improvement of 4% to 18% in forecasting accuracy.

Overall Framework

Overview of OccProphet. It receives multi-frame images from surround-view cameras as input and outputs future occupancy or occupancy flow. It consists of four key components: the Observer, Forecaster, Refiner, and Predictor. The Observer module aggregates spatio-temporal information. The Forecaster module conditionally generates preliminary representations of future scenarios. These preliminary representations are refined by the Refiner module. Finally, the Predictor module produces the final predictions of future occupancy or occupancy flow.
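
The data flow described above can be sketched as a chain of four functions. This is a minimal illustration only, not the authors' implementation: every function body is a hypothetical placeholder (plain averaging, repetition, and thresholding in NumPy), the function names and tensor shapes are assumptions, and the step that lifts camera images into voxel features is omitted, so the sketch starts from already-voxelized features.

```python
import numpy as np


def observer(voxels: np.ndarray) -> np.ndarray:
    """Aggregate past frames into one spatio-temporal feature volume.

    Placeholder: a plain mean over the time axis, standing in for the
    learned aggregation. Shape (T_past, C, X, Y, Z) -> (C, X, Y, Z).
    """
    return voxels.mean(axis=0)


def forecaster(observed: np.ndarray, n_future: int) -> np.ndarray:
    """Produce preliminary features for each future step.

    Placeholder: repeat the observed features once per future frame.
    Shape (C, X, Y, Z) -> (T_future, C, X, Y, Z).
    """
    return np.repeat(observed[None], n_future, axis=0)


def refiner(preliminary: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Refine the preliminary features using the observed features.

    Placeholder: an equal-weight blend; the shape is unchanged.
    """
    return 0.5 * preliminary + 0.5 * observed[None]


def predictor(refined: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn refined features into a boolean occupancy grid per future frame.

    Placeholder: average over channels, then threshold.
    Shape (T_future, C, X, Y, Z) -> (T_future, X, Y, Z).
    """
    return refined.mean(axis=1) > threshold


def forecast_occupancy(voxels: np.ndarray, n_future: int) -> np.ndarray:
    """Run Observer -> Forecaster -> Refiner -> Predictor in order."""
    observed = observer(voxels)
    preliminary = forecaster(observed, n_future)
    refined = refiner(preliminary, observed)
    return predictor(refined)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    past = rng.random((3, 4, 8, 8, 2))  # 3 past frames, 4 channels, 8x8x2 grid
    future = forecast_occupancy(past, n_future=2)
    print(future.shape, future.dtype)
```

The sketch only shows how the stages hand data to one another: the Forecaster and Refiner both take the Observer's output as a condition, and the Predictor is the only stage that leaves feature space.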

Quantitative Results

Performance on forecasting inflated general movable objects (GMO) on nuScenes and Lyft-Level5, and fine-grained GMO on nuScenes-Occupancy. SPC: SurroundDepth + PCPNet + Cylinder3D.

Performance on forecasting inflated GMO and fine-grained GSO (general static objects).

Qualitative Results

Qualitative results of Cam4DOcc and OccProphet over the next 2 seconds. Black arrows denote the motion trends of moving objects. Red dashed rectangles mark regions where the results of OccProphet are more consistent with the ground truth than those of Cam4DOcc.

BibTeX


@article{chen2025occprophet,
  title={Occprophet: Pushing efficiency frontier of camera-only 4d occupancy forecasting with observer-forecaster-refiner framework},
  author={Chen, Junliang and Xu, Huaiyuan and Wang, Yi and Chau, Lap-Pui},
  journal={arXiv preprint arXiv:2502.15180},
  year={2025}
}