DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning

¹Stanford University, ²Brown University
ICML 2026

Diffusion Policies in the Age of Exploration 🔭 ⋆˚࿔

Diffusion policies can learn powerful multimodal decision-making models from offline experience, and several strategies exist for finetuning them on online experience. Because real-world interaction can be expensive, we aim to trade collection quantity for experience quality: better exploration yields more informative experience, so behavior improves with fewer samples. We focus on three central questions in equipping diffusion policies for principled online exploration:

    ๐Ÿ” How can we identify reasonable actions worth exploring amongst a continuous action space?
    โš–๏ธ How can we quantify the exploration interest an agent has for an arbitrary action?
    ๐Ÿค How can we collaborate between agents in a fleet to perform exploration as a group?

We present Diffusion Filtered Exploration via Ensembles (DF-ExpEnse), an exploration technique that improves the quality of online experience collection and thereby the sample efficiency of finetuning. DF-ExpEnse uses the multimodal modeling capability of the diffusion policy to identify a candidate action set that is expressive yet small enough to evaluate tractably. It then scores each candidate with an ensemble of critics and selects the one that best balances execution quality with exploration interest. DF-ExpEnse further enables cross-agent communication so that a fleet can explore collaboratively.

Exploration can come for free! DF-ExpEnse requires no components beyond those commonly found in reinforcement learning finetuning; it simply reuses them during online inference for principled exploration!

DF-ExpEnse Framework

At each timestep, DF-ExpEnse selects an exploratory action to execute in three steps. First, (a) it filters the continuous action space by drawing multiple samples from the diffusion policy. Then, (b) it estimates the exploration interest of each sampled action, in terms of quality and uncertainty, using an ensemble. Lastly, (c) it normalizes exploration interest across the fleet and selects the action with the maximum interest to execute.
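
The three steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the page does not give the scoring formula, so the sketch assumes quality is the ensemble mean Q-value, uncertainty is the ensemble standard deviation, normalization is a z-score pooled over every candidate in the fleet, and a hypothetical weight `beta` trades the two off. The names `toy_policy`, `make_toy_critic`, `fleet_zscore`, and `select_fleet_actions` are all invented for this example, and the toy policy and critics merely stand in for a pretrained diffusion policy and learned critics.

```python
import numpy as np

def toy_policy(obs, rng):
    # Stand-in for one diffusion policy sample (hypothetical, not a real diffusion model).
    return np.clip(obs + 0.5 * rng.normal(size=obs.shape), -1.0, 1.0)

def make_toy_critic(weights):
    # Stand-in for one learned critic in the ensemble (hypothetical linear Q-function).
    def critic(obs, actions):
        return actions @ weights  # (K, A) @ (A,) -> (K,)
    return critic

def fleet_zscore(x, eps=1e-8):
    # Assumed normalization: z-score pooled over all agents and candidates.
    return (x - x.mean()) / (x.std() + eps)

def select_fleet_actions(policy, critics, observations, num_candidates=8, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    candidates, quality, uncertainty = [], [], []
    for obs in observations:
        # (a) Filter the continuous action space: sample K candidates from the policy.
        acts = np.stack([policy(obs, rng) for _ in range(num_candidates)])
        # (b) Score candidates with the ensemble: mean = quality, std = uncertainty.
        q = np.stack([critic(obs, acts) for critic in critics])  # (E, K)
        candidates.append(acts)
        quality.append(q.mean(axis=0))
        uncertainty.append(q.std(axis=0))
    candidates = np.stack(candidates)  # (N, K, A)
    # (c) Normalize each term across the fleet, combine, and take each agent's argmax.
    interest = fleet_zscore(np.stack(quality)) + beta * fleet_zscore(np.stack(uncertainty))
    best = interest.argmax(axis=1)  # (N,)
    return candidates[np.arange(len(best)), best], interest

# Usage: a fleet of 4 agents, 2-D actions, an ensemble of 5 toy critics.
observations = np.zeros((4, 2))
critics = [make_toy_critic(w) for w in np.random.default_rng(1).normal(size=(5, 2))]
actions, interest = select_fleet_actions(toy_policy, critics, observations)
print(actions.shape, interest.shape)  # (4, 2) (4, 8)
```

Quality and uncertainty are normalized separately here because a single shared affine rescaling of the combined score would leave every agent's argmax unchanged; pooling the statistics per term is one way fleet-level information could influence each agent's choice, and it is an assumption of this sketch.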

Experiments

DF-ExpEnse is a general exploration technique, and can be seamlessly integrated with existing strategies that finetune pretrained diffusion policies via reinforcement learning to provide sample-efficiency benefits. We integrate DF-ExpEnse with input noise and residual finetuning, and evaluate on a variety of manipulation and locomotion tasks across Robomimic, Gym, and DexMimicGen.

Fleet Size Ablations

Intuitively, larger fleets offer more opportunities for normalization and collaboration. We find that performance does decrease below a fleet size of 4, which indicates that DF-ExpEnse can leverage larger fleets to improve sample efficiency. Nevertheless, DF-ExpEnse still reliably outperforms vanilla DSRL and Max-Q across all fleet sizes, large and small.

These findings further reinforce DF-ExpEnse as a robust method that can be integrated with standard reinforcement learning finetuning techniques to provide consistent sample efficiency benefits across a variety of available resource settings.

Figure: fleet size comparison.

BibTeX

@inproceedings{
      luo2026dfexpense,
      title={{DF}-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning},
      author={Calvin Luo and Chen Sun and Shuran Song},
      booktitle={Forty-third International Conference on Machine Learning},
      year={2026}
    }