
Conference papers to be presented by ETH VIS in 2023

Congratulations to all authors whose hard work and dedication have paid off!  

    

IROS 2023

Learning Deep Sensorimotor Policies for Vision-Based Autonomous Drone Racing

This paper presents a method for learning deep sensorimotor policies for vision-based drone racing via Learning by Cheating. The resulting policy is robust to visual disturbances because it learns well-aligned image embeddings through contrastive learning and data augmentation.

https://www.vis.xyz/pub/vision-based-autonomous-drone-racing/
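The contrastive-alignment idea behind the robustness claim can be sketched with a generic InfoNCE objective between clean images and their visually disturbed (augmented) counterparts. This is an illustrative sketch, not the paper's implementation; all names are assumptions.

```python
import numpy as np

def info_nce(clean, augmented, temperature=0.1):
    """InfoNCE loss: embeddings of clean images should match embeddings
    of their own augmented views, and mismatch everyone else's."""
    # L2-normalize so the dot product is cosine similarity
    clean = clean / np.linalg.norm(clean, axis=1, keepdims=True)
    augmented = augmented / np.linalg.norm(augmented, axis=1, keepdims=True)
    logits = clean @ augmented.T / temperature        # (N, N) similarity matrix
    # row-wise log-softmax; positives sit on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

rng = np.random.default_rng(0)
e = rng.normal(size=(4, 8))
# matching pairs yield a strictly lower loss than mismatched (reversed) pairs
assert info_nce(e, e.copy()) < info_nce(e, e[::-1].copy())
```

Training the encoder to minimize such a loss over augmented views is what pulls embeddings of disturbed images toward their clean counterparts.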
 

ICCV 2023

Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception capability. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combination of tasks. We instead explore the construction of a unified model for major image and video recognition tasks in autonomous driving with diverse input and output structures.

https://www.vis.xyz/pub/vtd/
 

Dual Aggregation Transformer for Image Super-Resolution

A new image super-resolution model, the Dual Aggregation Transformer (DAT), aggregates spatial and channel features in a dual manner and achieves state-of-the-art performance.

https://www.vis.xyz/pub/dat/
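The dual aggregation can be illustrated with plain self-attention applied along the two axes of a flattened feature map: once over spatial positions, once over channels. This is a generic sketch of the idea, not the DAT architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    """Self-attention where tokens are spatial positions: x is (HW, C)."""
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))        # (HW, HW)
    return attn @ x

def channel_attention(x):
    """Self-attention where tokens are feature channels: x is (HW, C)."""
    xt = x.T                                             # (C, HW)
    attn = softmax(xt @ xt.T / np.sqrt(xt.shape[1]))     # (C, C)
    return (attn @ xt).T

x = np.random.default_rng(0).normal(size=(16, 8))  # 4x4 map flattened, 8 channels
y = channel_attention(spatial_attention(x))        # aggregate along both axes
assert y.shape == x.shape
```

Alternating (or combining) the two attention directions is what lets a model mix information across locations and across feature channels.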
 

MolGrapher: Graph-based Visual Recognition of Chemical Structures

We propose a graph-based method for the recognition of chemical structure images.

https://www.vis.xyz/pub/molgrapher/
 

3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers

We propose a 3D point positional encoding with a depth prior to localize 2D features, unifying the positional-encoding representation for both image features and object queries.

https://www.vis.xyz/pub/3dppe/
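The core idea of encoding 3D points so that image features and object queries share one positional representation can be sketched with a standard sinusoidal encoding of (x, y, z) coordinates. This is a generic illustration under assumed names; the paper's exact encoding and depth-prior pipeline are not reproduced here.

```python
import numpy as np

def point_positional_encoding(points, num_freqs=4):
    """Sinusoidal encoding of 3D points. Image features (via back-projection
    with a depth estimate) and 3D object queries can both be encoded with
    this same function, giving them a shared positional space."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi       # (F,) frequency bands
    scaled = points[..., None] * freqs                # (N, 3, F)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (N, 3, 2F)
    return enc.reshape(points.shape[0], -1)           # (N, 6F)

pts = np.array([[1.0, 2.0, 5.0], [0.5, -1.0, 10.0]])
pe = point_positional_encoding(pts)
assert pe.shape == (2, 24)
```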
 

R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

We propose a method for dense 3D reconstruction and ego-motion estimation from multi-camera input in dynamic environments.

https://www.vis.xyz/pub/r3d3/
 

CVPR 2023

iDisc: Internal Discretization for Monocular Depth Estimation

We propose a monocular depth estimation method which represents internally the scene as a finite set of concepts via a continuous-discrete-continuous bottleneck.

https://www.vis.xyz/pub/idisc/
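A continuous-discrete-continuous bottleneck can be sketched as soft-assigning pixel features to a small set of concept slots, pooling, and projecting back. The function below is an illustrative stand-in for the idea, not the iDisc model; all names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def internal_discretization(pixels, concepts):
    """Continuous -> discrete -> continuous: pixel features are softly
    assigned to K concept slots, pooled into K vectors, then broadcast
    back to every pixel through the same assignment."""
    assign = softmax(pixels @ concepts.T, axis=1)                  # (HW, K)
    pooled = assign.T @ pixels / (assign.sum(0)[:, None] + 1e-8)   # (K, D)
    return assign @ pooled                                         # (HW, D)

rng = np.random.default_rng(0)
pix = rng.normal(size=(16, 8))     # 4x4 feature map, 8 channels
slots = rng.normal(size=(4, 8))    # 4 internal concept slots
out = internal_discretization(pix, slots)
assert out.shape == pix.shape
```

The small K forces the scene to be summarized by a finite set of concepts before dense predictions are decoded.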
 

Mask-Free Video Instance Segmentation

We remove the need for video and image mask annotations when training highly accurate VIS models.

https://www.vis.xyz/pub/maskfreevis/
 

ICRA 2023

TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.

https://www.vis.xyz/pub/trafficbots/
 

RA-L 2023

Uncertainty-Driven Dense Two-View Structure from Motion

We introduce an uncertainty-driven Dense Two-View SfM pipeline.

https://www.vis.xyz/pub/dtv-sfm/
 

WACV 2023

Dense Prediction with Attentive Feature Aggregation

We propose Attentive Feature Aggregation (AFA) to exploit both spatial and channel information for semantic segmentation and boundary detection.

https://www.vis.xyz/pub/dla-afa/
 

Spatio-Temporal Action Detection Under Large Motion

We propose to enhance actor feature representation under large motion by tracking actors and performing temporal feature aggregation along the respective tracks.

https://www.vis.xyz/pub/action-detection-under-large-motion/
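Temporal feature aggregation along a track can be sketched as similarity-weighted pooling of an actor's per-frame features against a keyframe. The function below is an illustrative stand-in under assumed names, not the paper's model.

```python
import numpy as np

def track_aggregate(track_feats, key_idx):
    """Aggregate an actor's per-frame features (T, D) along its track,
    weighting each frame by similarity to the keyframe feature: a simple
    attention-style pooling."""
    key = track_feats[key_idx]                         # (D,) keyframe feature
    scores = track_feats @ key / np.sqrt(len(key))     # (T,) similarities
    w = np.exp(scores - scores.max())
    w = w / w.sum()                                    # softmax weights
    return w @ track_feats                             # (D,) pooled feature

feats = np.random.default_rng(0).normal(size=(5, 4))   # 5 frames, 4-dim features
agg = track_aggregate(feats, key_idx=2)
assert agg.shape == (4,)
```

Pooling along the actor's own track, rather than a fixed spatial location, is what keeps the representation stable under large motion.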
 

Composite Learning for Robust and Effective Dense Predictions

We find that jointly training a dense prediction task with a self-supervised task can consistently improve the performance of the target task.

https://www.vis.xyz/pub/composite-learning-for-robust-and-effective-dense-predictions/
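The joint-training recipe amounts to optimizing a weighted sum of the target dense-prediction loss and a self-supervised auxiliary loss. The sketch below is a toy illustration; the loss choices, names, and weighting are assumptions, not the paper's configuration.

```python
import numpy as np

def composite_loss(pred, target, recon, image, aux_weight=0.5):
    """Composite objective: supervised dense-prediction loss plus a
    self-supervised pretext loss (here, image reconstruction), combined
    with a fixed auxiliary weight."""
    dense = np.mean((pred - target) ** 2)   # target task, e.g. depth or seg
    ssl = np.mean((recon - image) ** 2)     # self-supervised auxiliary task
    return dense + aux_weight * ssl

# unit errors on both tasks -> 1.0 + 0.5 * 1.0 = 1.5
loss = composite_loss(np.ones((2, 2)), np.zeros((2, 2)),
                      np.zeros((2, 2)), np.ones((2, 2)))
assert loss == 1.5
```

Both terms backpropagate into a shared encoder, which is where the consistent improvement on the target task comes from.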
 
