DISCO-DANCE: Learning to Discover Skills through Guidance
NeurIPS 2023

Abstract

In the field of unsupervised skill discovery (USD), a major challenge is limited exploration, primarily due to substantial penalties when skills deviate from their initial trajectories. To enhance exploration, recent methodologies employ auxiliary rewards to maximize the epistemic uncertainty or entropy of states. However, we have identified that the effectiveness of these rewards declines as environmental complexity rises. Therefore, we present a novel USD algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects the guide skill that possesses the highest potential to reach unexplored states, (2) guides other skills to follow the guide skill, and (3) disperses the guided skills to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark. Qualitative visualizations and code of DISCO-DANCE are available at https://mynsng.github.io/discodance/.
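The three steps in the abstract can be illustrated with a minimal sketch. Note this is not the paper's actual objective: the function names, the visitation-count heuristic for step (1), and the distance-based shaping reward for step (2) are all simplified assumptions chosen for illustration.

```python
import numpy as np

def select_guide_skill(visitation_counts):
    # Step (1), illustrative heuristic: treat the skill whose terminal
    # region is least visited as the one with the highest potential to
    # reach unexplored states, and select it as the guide skill.
    return int(np.argmin(visitation_counts))

def guidance_reward(state, guide_trajectory):
    # Step (2), illustrative shaping term: reward a guided skill with the
    # negative distance from its current state to the nearest state on the
    # guide skill's trajectory, pulling it toward the guide's reachable set.
    dists = np.linalg.norm(guide_trajectory - state, axis=1)
    return -float(dists.min())
```

Step (3) would then replace the guidance term with the usual discriminability objective (e.g. a skill-discriminator reward) once the guided skills reach the unexplored region. For example, `select_guide_skill(np.array([5, 2, 9]))` picks skill 1, and a state lying exactly on the guide trajectory receives a guidance reward of 0.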

DISCO-DANCE Overview

Qualitative Results (2D maze)

[Figure: learned skill trajectories on four 2D mazes — Y maze, W maze, S maze, and Bottleneck maze]

Qualitative Results (Ant maze)

[Figure: Ant U-maze and Ant Pi-maze, each showing unsupervised pretraining and goal-reaching finetuning]

Qualitative Results (DMC)

[Figure: Cheetah (Run, Run backward, Flip, Flip backward) and Quadruped (Stand, Walk, Run, Jump), each showing unsupervised pretraining and task-specific finetuning]