Conference papers to be presented by ETH VIS in 2023

Congratulations to all authors whose hard work and dedication have paid off!  


IROS 2023

Learning Deep Sensorimotor Policies for Vision-Based Autonomous Drone Racing

This paper presents a method for learning deep sensorimotor policies for vision-based drone racing using Learning by Cheating. The approach achieves robust performance against visual disturbances by learning well-aligned image embeddings through contrastive learning and data augmentation.

https://www.vis.xyz/pub/vision-based-autonomous-drone-racing/
 

ICCV 2023

Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combinations of tasks. We instead explore the construction of a unified model for the major image and video recognition tasks in autonomous driving, with diverse input and output structures.

https://www.vis.xyz/pub/vtd/
 

Dual Aggregation Transformer for Image Super-Resolution

A new image super-resolution model, the Dual Aggregation Transformer (DAT), aggregates spatial and channel features in a dual manner and achieves state-of-the-art performance.

https://www.vis.xyz/pub/dat/
 

MolGrapher: Graph-based Visual Recognition of Chemical Structures

We propose a graph-based method for recognizing chemical structures in images.

https://www.vis.xyz/pub/molgrapher/
 

3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers

We propose a 3D point positional encoding with a depth prior to localize 2D features, unifying the positional-encoding representation for both image features and object queries.

https://www.vis.xyz/pub/3dppe/
 

R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

We propose a method for dense 3D reconstruction and ego-motion estimation from multi-camera input in dynamic environments.

https://www.vis.xyz/pub/r3d3/
 

CVPR 2023

iDisc: Internal Discretization for Monocular Depth Estimation

We propose a monocular depth estimation method that internally represents the scene as a finite set of concepts via a continuous-discrete-continuous bottleneck.

https://www.vis.xyz/pub/idisc/
 

Mask-Free Video Instance Segmentation

We remove the need for video and image mask annotations when training highly accurate VIS models.

https://www.vis.xyz/pub/maskfreevis/
 

ICRA 2023

TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.

https://www.vis.xyz/pub/trafficbots/
 

RA-L 2023

Uncertainty-Driven Dense Two-View Structure from Motion

We introduce an uncertainty-driven dense two-view structure-from-motion pipeline.

https://www.vis.xyz/pub/dtv-sfm/
 

WACV 2023

Dense Prediction with Attentive Feature Aggregation

We propose Attentive Feature Aggregation (AFA) to exploit both spatial and channel information for semantic segmentation and boundary detection.

https://www.vis.xyz/pub/dla-afa/
 

Spatio-Temporal Action Detection Under Large Motion

We propose to enhance actor feature representations under large motion by tracking actors and aggregating temporal features along their respective tracks.

https://www.vis.xyz/pub/action-detection-under-large-motion/
 

Composite Learning for Robust and Effective Dense Predictions

We find that jointly training a dense prediction task with a self-supervised task can consistently improve the performance of the target task.

https://www.vis.xyz/pub/composite-learning-for-robust-and-effective-dense-predictions/