Congratulations to all authors whose hard work and dedication have paid off!
IROS 2023
Learning Deep Sensorimotor Policies for Vision-Based Autonomous Drone Racing
This paper presents a method for learning deep sensorimotor policies for vision-based drone racing using Learning by Cheating. The method achieves robust performance against visual disturbances by learning well-aligned image embeddings through contrastive learning and data augmentation.
Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception capability. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combinations of tasks. We instead explore the construction of a unified model for major image and video recognition tasks in autonomous driving with diverse input and output structures.
Dual Aggregation Transformer for Image Super-Resolution
The Dual Aggregation Transformer (DAT), a new image super-resolution model that aggregates spatial and channel features in a dual manner, achieves state-of-the-art performance.
3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers
We propose a 3D point positional encoding (PE) that uses a depth prior to localize 2D features and unifies the representation of positional encoding for both image features and object queries.
iDisc: Internal Discretization for Monocular Depth Estimation
We propose a monocular depth estimation method that internally represents the scene as a finite set of concepts via a continuous-discrete-continuous bottleneck.
Spatio-Temporal Action Detection Under Large Motion
We propose to enhance actor feature representation under large motion by tracking actors and performing temporal feature aggregation along the respective tracks.
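To make the idea of aggregating features along tracks concrete, here is a minimal sketch. It is not the paper's implementation: the function name `aggregate_along_tracks`, the tuple layout of the input, and the use of simple mean pooling as the aggregation step are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_along_tracks(detections):
    """Average per-frame actor features over each track.

    detections: iterable of (frame_idx, track_id, feature) tuples, where
    feature is a list of floats. This layout is hypothetical.
    Returns a dict mapping track_id to the mean feature of that track.
    Mean pooling is an assumed stand-in for the aggregation step.
    """
    sums = {}
    counts = defaultdict(int)
    for _frame_idx, track_id, feature in detections:
        if track_id not in sums:
            sums[track_id] = [0.0] * len(feature)
        # Accumulate features only with others from the same track.
        sums[track_id] = [s + f for s, f in zip(sums[track_id], feature)]
        counts[track_id] += 1
    return {
        track_id: [s / counts[track_id] for s in vec]
        for track_id, vec in sums.items()
    }
```

Because features are pooled per track rather than per fixed image location, an actor that moves far across the frame still contributes all of its observations to a single representation.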