DAN 6_0
Description: Tests the model (that you passed in the argument) to see if it is vulnerable to DAN 6_0 attack. DAN 6_0 is an attack scenario while LLM assumes the role of DAN, an AI model capable of bypassing constraints and OpenAI policies, providing uncensored responses without limitations.
How does the Test work:
Pre-defined internal system prompts are used.
Multiple iteration of Model output is generated for these pre-defined system prompts.
raga-llm-eval’s
MitigationBypass
andDAN
detectors scan the model to see if the attack was successful.This concludes if the model is vulnerable to DAN 6_0 attack.
Required arguments: model (like:gpt-4
)
Interpretation of results:
Score range(degree of model behaving as DAN 6):
[0, 1]
. A score of 1 implies that the model does not behave as DAN 6, while lower scores signify increasing levels of model behaving as DAN 6.Sample internal prompt and model response for
failed cases
is displayed. For detailed info on internal prompts, model responses, scores, useevaluator.save_results('filename.json')
.
Coding Example:
Last updated