Enhancing Jailbreaking with Universal Multi-Prompts
Jumping to a Better Initial Point
Abstract
In this paper, we propose JUMP, a recipe that further improves current state-of-the-art (SOTA) jailbreaking methods. We run several experiments on BEAST and, ultimately, compare our best setting, JUMP, with AdvPrompter.
1. Introduction
Our main contributions are as follows:
- We propose JUMP, a training-free method for finding universal prompts that achieves results competitive with AdvPrompter when a larger number of queries is allowed at evaluation time.
- We use the universal prompts found by JUMP to improve other baselines.
- We attack API-based models directly and compare the results with those of transfer attacks.
- We find that AdvPrompter easily overfits the training set in low-data settings.
2. Preliminaries
3. Methodology: JUMP