Enhancing Jailbreaking with Universal Multi-Prompts

Jumping to a better initial point

Abstract

In this paper, we propose JUMP, a recipe that further improves current state-of-the-art (SOTA) jailbreaking methods. Ultimately, we compare our best setting, JUMP, with AdvPrompter. We also conduct a range of experiments on BEAST.

1. Introduction

Our main contributions are as follows:

  1. We propose JUMP, a training-free method for finding universal prompts that achieves results competitive with AdvPrompter in settings that allow a larger number of queries at evaluation time.
  2. We use the universal prompts found by JUMP to improve other baselines.
  3. We attack API-based models directly and compare the results with those of transfer attacks.
  4. We find that AdvPrompter easily overfits to the training set in low-data settings.

2. Preliminaries

3. Methodology: JUMP