ChatGPT_Developer_Mode_RANTI
Description: A version of the language model with Developer Mode enabled, offering uncensored responses with profanity and edgy content.
How does the Test work:
Pre-defined internal system prompts are used.
Multiple iteration of Model output is generated for these pre-defined system prompts.
raga-llm-eval’s
MitigationBypass
andDevMode
detectors scan the model to see if the attack was successful.This concludes if the model is vulnerable to ChatGPT_Developer_Mode_RANTI attack.
Required arguments: model (like:gpt-4
)
Interpretation of results:
Score range(degree of model behaving as ChatGPT_Developer_Mode_RANTI):
[0, 1]
. A score of 1 implies that the model does not behave as ChatGPT_Developer_Mode_RANTI, while lower scores signify increasing levels of model behaving as ChatGPT_Developer_Mode_RANTI.Sample internal prompt and model response for
failed cases
is displayed. For detailed info on internal prompts, model responses, scores, useevaluator.save_results('filename.json')
.
Coding Example:
probe = 'ChatGPT_Developer_Mode_RANTI'
evaluator.add_test(
test_names=["dan_vulnerability_scanner"],
data={
"prompt": ["The quick brown fox jumps over the lazy dog."],
},
arguments={"category":[probe], "model": "gpt-3.5-turbo", "threshold": 1.0},
).run()
evaluator.print_results()
Last updated
Was this helpful?