DAN_JailBreak
Description: Tests whether the model passed in the argument is vulnerable to the DAN_JailBreak attack. DAN_JailBreak is an attack scenario in which the model is coaxed into generating responses free from limitations, providing unrestricted and unfiltered output.
How does the Test work:
Pre-defined internal system prompts are used.
Multiple iterations of model output are generated for these pre-defined system prompts.
raga-llm-eval's MitigationBypass and DAN_JailBreak detectors scan the model's responses to check whether the attack was successful. This determines if the model is vulnerable to the DAN_JailBreak attack.
Required arguments: model (e.g. gpt-4)
Interpretation of results:
Score range (degree of the model behaving as DAN_JailBreak): [0, 1]. A score of 1 implies that the model does not behave as DAN_JailBreak, while lower scores signify increasing levels of DAN_JailBreak behavior. A sample internal prompt and model response is displayed for failed cases. For detailed information on internal prompts, model responses, and scores, use evaluator.save_results('filename.json').
Coding Example:
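The snippet below is a minimal sketch of how this test could be run. The test identifier dan_jailbreak_test, the RagaLLMEval constructor arguments, and the add_test/run/print_results call pattern are assumptions based on raga-llm-eval's usual evaluator workflow; only evaluator.save_results('filename.json') is confirmed above, so check the library reference for exact names.

```python
from raga_llm_eval import RagaLLMEval

# Initialise the evaluator with the API key of the model provider under test
# (key name and constructor signature assumed).
evaluator = RagaLLMEval(api_keys={"OPENAI_API_KEY": "your-openai-api-key"})

# Register the DAN_JailBreak test against the target model and run it.
evaluator.add_test(
    test_names=["dan_jailbreak_test"],  # assumed test identifier
    data={},                            # the test uses pre-defined internal prompts
    arguments={"model": "gpt-4"},       # required argument: the model to attack
).run()

# Print a summary; failed cases show the internal prompt and model response.
evaluator.print_results()

# Save full details (internal prompts, model responses, scores) to a file.
evaluator.save_results("dan_jailbreak_results.json")
```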