Intelligent Systems


2024


The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks

Spieler, A., Rahaman, N., Martius, G., Schölkopf, B., Levina, A.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

arXiv [BibTex]

Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks

Khajehabdollahi, S., Zeraati, R., Giannakakis, E., Schäfer, T. J., Martius, G., Levina, A.

In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 (inproceedings)

link (url) [BibTex]

Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics

Gumbsch, C., Sajid, N., Martius, G., Butz, M. V.

In The Twelfth International Conference on Learning Representations, ICLR 2024, May 2024 (inproceedings)

link (url) [BibTex]

Multi-View Causal Representation Learning with Partial Observability

Yao, D., Xu, D., Lachapelle, S., Magliacane, S., Taslakian, P., Martius, G., von Kügelgen, J., Locatello, F.

Proceedings of the Twelfth International Conference on Learning Representations (ICLR), May 2024 (conference) Accepted

arXiv [BibTex]

Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models
2024 (misc)

Abstract
Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning (RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl

link (url) [BibTex]

2023


Online Learning under Adversarial Nonlinear Constraints

Kolev, P., Martius, G., Muehlebach, M.

In Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

link (url) [BibTex]

Regularity as Intrinsic Reward for Free Play

Sancaktar, C., Piater, J., Martius, G.

In Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

Website Code link (url) [BibTex]

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Zadaianchuk, A., Seitzer, M., Martius, G.

In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

Abstract
Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.

arXiv Website OpenReview link (url) [BibTex]

Goal-conditioned Offline Planning from Curious Exploration

Bagatella, M., Martius, G.

In Advances in Neural Information Processing Systems 36, December 2023 (inproceedings)

Abstract
Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.

link (url) [BibTex]

Improving Behavioural Cloning with Positive Unlabeled Learning

Wang, Q., McCarthy, R., Bulens, D. C., McGuinness, K., O’Connor, N. E., Sanchez, F. R., Gürtler, N., Widmaier, F., Redmond, S. J.

7th Annual Conference on Robot Learning (CoRL), November 2023 (conference) Accepted

[BibTex]

Backpropagation through Combinatorial Algorithms: Identity with Projection Works

Sahoo, S., Paulus, A., Vlastelica, M., Musil, V., Kuleshov, V., Martius, G.

In Proceedings of the Eleventh International Conference on Learning Representations, May 2023 (inproceedings) Accepted

Abstract
Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined, therefore a meaningful replacement is crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach to exploit the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass and further provide a theoretical justification. Our experiments demonstrate that such a straightforward hyper-parameter-free approach is able to compete with previous more complex methods on numerous experiments such as backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we substitute the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.

OpenReview Arxiv Pdf link (url) [BibTex]

DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems

Schumacher, P., Haeufle, D. F., Büchler, D., Schmitt, S., Martius, G.

In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), May 2023 (inproceedings)

Abstract
Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast amount of muscles. Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance. We conjecture that ineffective exploration in large overactuated action spaces is a key problem. This is supported by our finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems. We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction. By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems, outperforming current approaches in all considered tasks in sample efficiency and robustness.

Arxiv pdf Website link (url) [BibTex]

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Gürtler, N., Blaes, S., Kolev, P., Widmaier, F., Wüthrich, M., Bauer, S., Schölkopf, B., Martius, G.

In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), May 2023 (inproceedings)

Abstract
Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

Website arXiv Code link (url) [BibTex]

Efficient Learning of High Level Plans from Play

Armengol Urpi, N., Bagatella, M., Hilliges, O., Martius, G., Coros, S.

In International Conference on Robotics and Automation, 2023 (inproceedings) Accepted

Abstract
Real-world robotic manipulation tasks remain an elusive challenge, since they involve both fine-grained environment interaction, as well as the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency due to inefficient exploration, and by the complexity of credit assignment over long horizons. In this work, we present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL to achieve long-horizon complex manipulation tasks. We leverage task-agnostic play data to learn a discrete behavioral prior over object-centric primitives, modeling their feasibility given the current context. We then design a high-level goal-conditioned policy which (1) uses primitives as building blocks to scaffold complex long-horizon tasks and (2) leverages the behavioral prior to accelerate learning. We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks and learns policies that can be easily transferred to physical hardware.

Arxiv Website Poster [BibTex]

Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning

Eberhard, O., Hollenstein, J., Pinneri, C., Martius, G.

In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), May 2023 (inproceedings)

Abstract
In off-policy deep reinforcement learning with continuous action spaces, exploration is often implemented by injecting action noise into the action selection process. Popular algorithms based on stochastic policies, such as SAC or MPO, inject white noise by sampling actions from uncorrelated Gaussian distributions. In many tasks, however, white noise does not provide sufficient exploration, and temporally correlated noise is used instead. A common choice is Ornstein-Uhlenbeck (OU) noise, which is closely related to Brownian motion (red noise). Both red noise and white noise belong to the broad family of colored noise. In this work, we perform a comprehensive experimental evaluation on MPO and SAC to explore the effectiveness of other colors of noise as action noise. We find that pink noise, which is halfway between white and red noise, significantly outperforms white noise, OU noise, and other alternatives on a wide range of environments. Thus, we recommend it as the default choice for action noise in continuous control.

link (url) [BibTex]

Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions

Li, C., Blaes, S., Kolev, P., Vlastelica, M., Frey, J., Martius, G.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2023 (inproceedings) Accepted

Abstract
Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. These methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained universal policies are tested on an agile quadruped robot called Solo 8 and present faithful replications of diverse skills encoded in the demonstrations.

Arxiv Videos Project [BibTex]

Bridging the Gap to Real-World Object-Centric Learning

Seitzer, M., Horn, M., Zadaianchuk, A., Zietlow, D., Xiao, T., Simon-Gabriel, C., He, T., Zhang, Z., Schölkopf, B., Brox, T., Locatello, F.

In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), May 2023 (inproceedings)

Abstract
Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.

Code Website link (url) [BibTex]

2022


Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations

(Best Paper Award Finalist)

Li, C., Vlastelica, M., Blaes, S., Frey, J., Grimminger, F., Martius, G.

Proceedings of the 6th Conference on Robot Learning (CoRL), December 2022 (conference) Accepted

Abstract
Learning agile skills is one of the main challenges in robotics. To this end, reinforcement learning approaches have achieved impressive results. These methods require explicit task information in terms of a reward function or an expert that can be queried in simulation to provide a target control output, which limits their applicability. In this work, we propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations for successful skill acquirement where reference or expert demonstrations are not easily accessible. Moreover, we show that by using a Wasserstein GAN formulation and transitions from demonstrations with rough and partial information as input, we are able to extract policies that are robust and capable of imitating demonstrated behaviors. Finally, the obtained skills such as a backflip are tested on an agile quadruped robot called Solo 8 and present faithful replication of hand-held human demonstrations.

Arxiv Videos Project link (url) [BibTex]

Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks

Wochner, I., Schumacher, P., Martius, G., Büchler, D., Schmitt, S., Haeufle, D.

Proceedings of the 6th Conference on Robot Learning (CoRL), 205, pages: 1178-1188, Proceedings of Machine Learning Research, (Editors: Liu, Karen and Kulic, Dana and Ichnowski, Jeff), PMLR, December 2022 (conference)

link (url) [BibTex]

A Sequential Group VAE for Robot Learning of Haptic Representations

Richardson, B. A., Kuchenbecker, K. J., Martius, G.

pages: 1-11, Workshop paper (8 pages) presented at the CoRL Workshop on Aligning Robot Representations with Humans, Auckland, New Zealand, December 2022 (misc)

Abstract
Haptic representation learning is a difficult task in robotics because information can be gathered only by actively exploring the environment over time, and because different actions elicit different object properties. We propose a Sequential Group VAE that leverages object persistence to learn and update latent general representations of multimodal haptic data. As a robot performs sequences of exploratory procedures on an object, the model accumulates data and learns to distinguish between general object properties, such as size and mass, and trial-to-trial variations, such as initial object position. We demonstrate that after very few observations, the general latent representations are sufficiently refined to accurately encode many haptic object properties.

link (url) Project Page [BibTex]

Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation

Sancaktar, C., Blaes, S., Martius, G.

In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022 (inproceedings)

Arxiv Videos Openreview link (url) [BibTex]

Embrace the Gap: VAEs Perform Independent Mechanism Analysis

Reizinger*, P., Gresele*, L., Brady*, J., von Kügelgen, J., Zietlow, D., Schölkopf, B., Martius, G., Brendel, W., Besserve, M.

Advances in Neural Information Processing Systems (NeurIPS 2022), 35, pages: 12040-12057, (Editors: S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh), Curran Associates, Inc., 36th Annual Conference on Neural Information Processing Systems, December 2022, *equal first authorship (conference)

Arxiv PDF link (url) [BibTex]

Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

Gürtler, N., Widmaier, F., Sancaktar, C., Blaes, S., Kolev, P., Bauer, S., Wüthrich, M., Wulfmeier, M., Riedmiller, M., Allshire, A., Wang, Q., McCarthy, R., Kim, H., Baek, J., Kwon, W., Qian, S., Toshimitsu, Y., Michelis, M. Y., Kazemipour, A., Raayatsanati, A., Zheng, H., Cangan, B. G., Schölkopf, B., Martius, G.

Proceedings of the NeurIPS 2022 Competitions Track, 220, pages: 133-150, Proceedings of Machine Learning Research, (Editors: Ciccone, Marco and Stolovitzky, Gustavo and Albrecht, Jacob), PMLR, December 2022 (conference)

link (url) [BibTex]

Self-supervised Reinforcement Learning with Independently Controllable Subgoals

Zadaianchuk, A., Martius, G., Yang, F.

In Proceedings of the 5th Conference on Robot Learning, 164, pages: 384-394, PMLR, 2022 (inproceedings) Accepted

Abstract
To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

Arxiv Openreview Poster link (url) Project Page [BibTex]

A Soft Vision-Based Tactile Sensor for Robotic Fingertip Manipulation

Andrussow, I., Sun, H., Kuchenbecker, K. J., Martius, G.

Workshop paper (1 page) presented at the IROS Workshop on Large-Scale Robotic Skin: Perception, Interaction and Control, Kyoto, Japan, October 2022 (misc)

Abstract
For robots to become fully dexterous, their hardware needs to provide rich sensory feedback. High-resolution haptic sensing similar to the human fingertip can enable robots to execute delicate manipulation tasks like picking up small objects, inserting a key into a lock, or handing a cup of coffee to a human. Many tactile sensors have emerged in recent years; one especially promising direction is vision-based tactile sensors due to their low cost, low wiring complexity and high-resolution sensing capabilities. In this work, we build on previous findings to create a soft fingertip-sized tactile sensor. It can sense normal and shear contact forces all around its 3D surface with an average prediction error of 0.05 N, and it localizes contact on its shell with an average prediction error of 0.5 mm. The software of this sensor uses a data-efficient machine-learning pipeline to run in real time on hardware with low computational power like a Raspberry Pi. It provides a maximum data frame rate of 60 Hz via USB.

link (url) Project Page [BibTex]

Developing hierarchical anticipations via neural network-based event segmentation

(SmartBot Challenge award winner)

Gumbsch, C., Adam, M., Elsner, B., Martius, G., Butz, M. V.

In Proceedings of the IEEE International Conference on Development and Learning (ICDL 2022), pages: 1-8, 2022 IEEE International Conference on Development and Learning (ICDL), September 2022 (inproceedings)

Abstract
Humans can make predictions on various time scales and hierarchical levels. Thereby, the learning of event encodings seems to play a crucial role. In this work we model the development of hierarchical predictions via autonomously learned latent event codes. We present a hierarchical recurrent neural network architecture, whose inductive learning biases foster the development of sparsely changing latent states that compress sensorimotor sequences. A higher level network learns to predict the situations in which the latent states tend to change. Using a simulated robotic manipulator, we demonstrate that the system (i) learns latent states that accurately reflect the event structure of the data, (ii) develops meaningful temporal abstract predictions on the higher level, and (iii) generates goal-anticipatory behavior similar to gaze behavior found in eye-tracking studies with infants. The architecture offers a step towards the autonomous learning of compressed hierarchical encodings of gathered experiences and the exploitation of these encodings to generate adaptive behavior.

link (url) [BibTex]

InvGAN: Invertible GANs

(Best Paper Award)

Ghosh, P., Zietlow, D., Black, M. J., Davis, L. S., Hu, X.

In Pattern Recognition, pages: 3-19, Lecture Notes in Computer Science, 13485, (Editors: Andres, Björn and Bernard, Florian and Cremers, Daniel and Frintrop, Simone and Goldlücke, Bastian and Ihrke, Ivo), Springer, Cham, 44th DAGM German Conference on Pattern Recognition (DAGM GCPR 2022), September 2022 (inproceedings)

Abstract
Generation of photo-realistic images, semantic editing and representation learning are only a few of many applications of high-resolution generative models. Recent progress in GANs has established them as an excellent choice for such tasks. However, since they do not provide an inference model, downstream tasks such as classification cannot be easily applied on real images using the GAN latent space. Despite numerous efforts to train an inference model or design an iterative method to invert a pre-trained generator, previous methods are dataset (e.g. human face images) and architecture (e.g. StyleGAN) specific. These methods are nontrivial to extend to novel datasets or architectures. We propose a general framework that is agnostic to architecture and datasets. Our key insight is that, by training the inference and the generative model together, we allow them to adapt to each other and to converge to a better quality model. Our InvGAN, short for Invertible GAN, successfully embeds real images in the latent space of a high quality generative model. This allows us to perform image inpainting, merging, interpolation and online data augmentation. We demonstrate this with extensive qualitative and quantitative experiments.

pdf DOI [BibTex]

Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers

Zietlow, D., Lohaus, M., Balakrishnan, G., Kleindessner, M., Locatello, F., Schölkopf, B., Russell, C.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 10410-10421, June 2022 (conference)

arXiv link (url) [BibTex]

Orchestrated Value Mapping for Reinforcement Learning

Fatemi, M., Tavakoli, A.

International Conference on Learning Representations, April 2022 (conference)

link (url) [BibTex]

Uncertainty in Equation Learning

Werner, M., Junginger, A., Hennig, P., Martius, G.

In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO), pages: 2298-2305, Association for Computing Machinery, 2022 (inproceedings)

Abstract
Equation learning is a deep learning framework for explainable machine learning in regression settings, with applications in engineering and the natural sciences. Equation learners typically do not capture uncertainty about the model or its predictions, although uncertainty is often highly structured and particularly relevant for these kinds of applications. We show how simple, yet effective, forms of Bayesian deep learning can be used to build structure and explainable uncertainty over a set of found equations. Specifically, we use a mixture of Laplace approximations, where each mixture component captures a different equation structure, and the local Laplace approximations capture parametric uncertainty within one family of equations. We present results on both synthetic and real world examples.

Paper PDF DOI [BibTex]

2021


Hierarchical Reinforcement Learning with Timed Subgoals

Gürtler, N., Büchler, D., Martius, G.

In Advances in Neural Information Processing Systems 34, 26, pages: 21732-21743, (Editors: M. Ranzato and A. Beygelzimer and Y. Dauphin and P. S. Liang and J. Wortman Vaughan), Curran Associates, Inc., Red Hook, NY, 35th Conference on Neural Information Processing Systems (NeurIPS 2021), December 2021 (inproceedings)

video arXiv code link (url) [BibTex]

Planning from Pixels in Environments with Combinatorially Hard Search Spaces

Bagatella, M., Olšák, M., Rolínek, M., Martius, G.

In Advances in Neural Information Processing Systems 34, 30, pages: 24707-24718, Curran Associates, Inc., Red Hook, NY, 35th Conference on Neural Information Processing Systems (NeurIPS 2021), December 2021 (inproceedings)

Abstract
The ability to form complex plans based on raw visual input is a litmus test for current capabilities of artificial intelligence, as it requires a seamless combination of visual processing and abstract algorithmic execution, two traditionally separate areas of computer science. A recent surge of interest in this field brought advances that yield good performance in tasks ranging from arcade games to continuous control; these methods however do not come without significant issues, such as limited generalization capabilities and difficulties when dealing with combinatorially hard planning instances. Our contribution is two-fold: (i) we present a method that learns to represent its environment as a latent graph and leverages state reidentification to reduce the complexity of finding a good policy from exponential to linear; (ii) we introduce a set of lightweight environments with an underlying discrete combinatorial structure in which planning is challenging even for humans. Moreover, we show that our method achieves strong empirical generalization to variations in the environment, even across highly disadvantaged regimes, such as “one-shot” planning, or in an offline RL paradigm which only provides low-quality trajectories.

link (url) [BibTex]



Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Seitzer, M., Schölkopf, B., Martius, G.

In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 34, pages: 22905-22918, (Editors: M. Ranzato and A. Beygelzimer and Y. Dauphin and P. S. Liang and J. Wortman Vaughan), Curran Associates, Inc., Red Hook, NY, 35th Conference on Neural Information Processing Systems, December 2021 (inproceedings)

arXiv PDF Data Code link (url) Project Page [BibTex]



Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

Gumbsch, C., Butz, M. V., Martius, G.

In Advances in Neural Information Processing Systems 34, 21, pages: 17518-17531, (Editors: M. Ranzato and A. Beygelzimer and Y. Dauphin and P. S. Liang and J. Wortman Vaughan), Curran Associates, Inc., Red Hook, NY, 35th Conference on Neural Information Processing Systems (NeurIPS 2021), December 2021 (inproceedings)

Abstract
A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. Accordingly, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sampling efficiency and prediction accuracy as well as behavior in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.

arXiv Openreview link (url) [BibTex]



Risk-Averse Zero-Order Trajectory Optimization

Vlastelica*, M., Blaes*, S., Pinneri, C., Martius, G.

In Conference on Robot Learning, 2021, *Equal Contribution (inproceedings) Accepted

Abstract
We introduce a simple but effective method for managing risk in zero-order trajectory optimization that involves probabilistic safety constraints and balancing of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty of an ensemble of stochastic neural networks. Various experiments indicate that the separation of uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.

OpenReview PDF link (url) Project Page [BibTex]


Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths

Hornakova, A., Kaiser, T., Swoboda, P., Rolínek, M., Rosenhahn, B., Henschel, R.

Proceedings 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), pages: 6310-6320, IEEE, ICCV 2021, October 2021 (conference)

DOI [BibTex]



CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints

Paulus, A., Rolínek, M., Musil, V., Amos, B., Martius, G.

In Proceedings of the 38th International Conference on Machine Learning, 139, pages: 8443-8453, Proceedings of Machine Learning Research, (Editors: Meila, Marina and Zhang, Tong), PMLR, The Thirty-eighth International Conference on Machine Learning (ICML), July 2021 (inproceedings)

Abstract
Bridging logical and algorithmic reasoning with modern machine learning techniques is a fundamental challenge with potentially transformative impact. On the algorithmic side, many NP-hard problems can be expressed as integer programs, in which the constraints play the role of their “combinatorial specification.” In this work, we aim to integrate integer programming solvers into neural network architectures as layers capable of learning both the cost terms and the constraints. The resulting end-to-end trainable architectures jointly extract features from raw data and solve a suitable (learned) combinatorial problem with state-of-the-art integer programming solvers. We demonstrate the potential of such layers with an extensive performance analysis on synthetic data and with a demonstration on a competitive computer vision keypoint matching benchmark.

Arxiv Code Pdf Spotlight video @ ICML 2021 Poster @ ICML 2021 link (url) Project Page [BibTex]



Demystifying Inductive Biases for (Beta-)VAE Based Architectures

Zietlow, D., Rolinek, M., Martius, G.

In Proceedings of the 2021 International Conference on Machine Learning (ICML), The Thirty-eighth International Conference on Machine Learning (ICML), July 2021 (inproceedings)

Abstract
The performance of Beta-Variational-Autoencoders and their variants on learning semantically meaningful, disentangled representations is unparalleled. On the other hand, there are theoretical arguments suggesting the impossibility of unsupervised disentanglement. In this work, we shed light on the inductive bias responsible for the success of VAE-based architectures. We show that in classical datasets the structure of variance, induced by the generating factors, is conveniently aligned with the latent directions fostered by the VAE objective. This builds the pivotal bias on which the disentangling abilities of VAEs rely. By small, elaborate perturbations of existing datasets, we hide the convenient correlation structure that is easily exploited by a variety of architectures. To demonstrate this, we construct modified versions of standard datasets in which (i) the generative factors are perfectly preserved; (ii) each image undergoes a mild transformation causing a small change of variance; (iii) the leading VAE-based disentanglement architectures fail to produce disentangled representations whilst the performance of a non-variational method remains unchanged.

Arxiv PDF Paper @ ICML 2021 (spotlight video) Project Page [BibTex]



Neuro-algorithmic Policies Enable Fast Combinatorial Generalization

Vlastelica, M., Rolinek, M., Martius, G.

In Proceedings of the 2021 International Conference on Machine Learning (ICML), The Thirty-eighth International Conference on Machine Learning (ICML), July 2021 (inproceedings)

Abstract
Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to task variations is still lacking. Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. We show that, for a certain subclass of the MDP framework, this can be alleviated by a neuro-algorithmic policy architecture that embeds a time-dependent shortest path solver in a deep neural network. Trained end-to-end via blackbox-differentiation, this method leads to considerable improvement in generalization capabilities in the low-data regime.

arXiv Spotlight PDF Project Page [BibTex]



The dynamical regime and its importance for evolvability, task performance and generalization

Prosi, J., Khajehabdollahi, S., Giannakakis, E., Martius, G., Levina, A.

In The 2021 Conference on Artificial Life, MIT Press, July 2021 (inproceedings)

PDF link (url) DOI [BibTex]



Self-supervised Visual Reinforcement Learning with Object-centric Representations

Zadaianchuk*, A., Seitzer*, M., Martius, G.

In 9th International Conference on Learning Representations (ICLR 2021), May 2021, *equal contribution (inproceedings)

Abstract
Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a tricky challenge for any autonomous agent. Previous methods have used variational autoencoders to encode a scene into a low-dimensional vector that can be used as a goal for an agent to discover new skills. Nevertheless, in compositional/multi-object environments it is difficult to disentangle all the factors of variation into such a fixed-length representation of the whole scene. We propose to use object-centric representations as a modular and structured observation space, which is learned with a compositional generative world model. We show that the structure in the representations in combination with goal-conditioned attention policies helps the autonomous agent to discover and learn useful skills. These skills can be further combined to address compositional tasks like the manipulation of several different objects.

Arxiv Code Paper @ ICLR 2021 (spotlight video) OpenReview Project Page [BibTex]


Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers

Pinneri*, C., Sawant*, S., Blaes, S., Martius, G.

In 9th International Conference on Learning Representations (ICLR 2021), May 2021, *equal contribution (inproceedings)

Abstract
Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution in real robotic systems. Our method builds upon standard approaches, like guidance cost and dataset aggregation, and introduces a novel adaptive factor which prevents the optimizer from collapsing to the learner's behavior at the beginning of the training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door without reward shaping.

OpenReview Project Page [BibTex]


2020


Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers

Rolínek, M., Swoboda, P., Zietlow, D., Paulus, A., Musil, V., Martius, G.

In Computer Vision – ECCV 2020, 28, pages: 407-424, Lecture Notes in Computer Science, 12373, (Editors: Vedaldi, Andrea and Bischof, Horst and Brox, Thomas and Frahm, Jan-Michael), Springer, Cham, 16th European Conference on Computer Vision (ECCV 2020), August 2020 (inproceedings)

Abstract
Building on recent progress at the intersection of combinatorial optimization and deep learning, we propose an end-to-end trainable architecture for deep graph matching that contains unmodified combinatorial solvers. Using the presence of heavily optimized combinatorial solvers together with some improvements in architecture design, we advance state-of-the-art on deep graph matching benchmarks for keypoint correspondence. In addition, we highlight the conceptual advantages of incorporating solvers into deep learning architectures, such as the possibility of post-processing with a strong multi-graph matching solver or the indifference to changes in the training setting. Finally, we propose two new challenging experimental setups.

Code Arxiv Long Spotlight Short Spotlight pdf DOI Project Page [BibTex]



Fast Non-Parametric Learning to Accelerate Mixed-Integer Programming for Hybrid Model Predictive Control

Zhu, J., Martius, G.

IFAC-PapersOnLine, 21st IFAC World Congress, 53(2):5239-5245, Elsevier, Amsterdam, 21st IFAC World Congress, July 2020 (conference)

arXiv link (url) DOI [BibTex]



Optimizing Rank-based Metrics with Blackbox Differentiation

Rolínek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., Martius, G.

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages: 7620-7630, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, June 2020, Best paper nomination (inproceedings)

Abstract
Rank-based metrics are some of the most widely used criteria for performance evaluation of computer vision models. Despite years of effort, direct optimization for these metrics remains a challenge due to their non-differentiable and non-decomposable nature. We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent. In addition, we address optimization instability and sparsity of the supervision signal that both arise from using rank-based metrics as optimization targets. Resulting losses based on recall and Average Precision are applied to image retrieval and object detection tasks. We obtain performance that is competitive with state-of-the-art on standard image retrieval datasets and consistently improve performance of near state-of-the-art object detectors.

Paper @ CVPR2020 Long Oral Short Oral Arxiv Code Pdf link (url) DOI Project Page [BibTex]



Calibrating a Soft ERT-Based Tactile Sensor with a Multiphysics Model and Sim-to-real Transfer Learning

Lee, H., Park, H., Serhat, G., Sun, H., Kuchenbecker, K. J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1632-1638, IEEE International Conference on Robotics and Automation (ICRA 2020), May 2020 (inproceedings)

Abstract
Tactile sensors based on electrical resistance tomography (ERT) have shown many advantages for implementing a soft and scalable whole-body robotic skin; however, calibration is challenging because pressure reconstruction is an ill-posed inverse problem. This paper introduces a method for calibrating soft ERT-based tactile sensors using sim-to-real transfer learning with a finite element multiphysics model. The model is composed of three simple models that together map contact pressure distributions to voltage measurements. We optimized the model parameters to reduce the gap between the simulation and reality. As a preliminary study, we discretized the sensing points into a 6 by 6 grid and synthesized single- and two-point contact datasets from the multiphysics model. We obtained another single-point dataset using the real sensor with the same contact location and force used in the simulation. Our new deep neural network architecture uses a de-noising network to capture the simulation-to-real gap and a reconstruction network to estimate contact force from voltage measurements. The proposed approach achieved an 82% hit rate for localization and a force estimation error of 0.51 N in single-contact tests, and a 78.5% hit rate for localization and a force estimation error of 5.0 N in two-point contact tests. We believe this new calibration method has the potential to improve the sensing performance of ERT-based tactile sensors.

DOI Project Page [BibTex]



Differentiation of Blackbox Combinatorial Solvers

Vlastelica*, M., Paulus*, A., Musil, V., Martius, G., Rolínek, M.

In International Conference on Learning Representations, ICLR’20, May 2020, *Equal Contribution (inproceedings)

Arxiv Code pdf link (url) Project Page [BibTex]



A Real-Robot Dataset for Assessing Transferability of Learned Dynamics Models

Agudelo-España, D., Zadaianchuk, A., Wenk, P., Garg, A., Akpo, J., Grimminger, F., Viereck, J., Naveau, M., Righetti, L., Martius, G., Krause, A., Schölkopf, B., Bauer, S., Wüthrich, M.

IEEE International Conference on Robotics and Automation (ICRA), pages: 8151-8157, IEEE, 2020 (conference)

Project Page PDF DOI Project Page [BibTex]
