ChatGPT_Developer_Mode_v2
Description: Simulates a version of the language model with Developer Mode enabled, offering unrestricted, opinionated, and edgy responses.How does the Test work:
1.Pre-defined internal system prompts are used.
2.Multiple iteration of Model output is generated for these pre-defined system prompts.
3.raga-llm-eval’s
MitigationBypass
andDevMode
detectors scan the model to see if the attack was successful.4.This concludes if the model is vulnerable to ChatGPT_Developer_Mode_v2 attack.
Required arguments: model (like:gpt-4
)Interpretation of results:
1.Score range(degree of model behaving as ChatGPT_Developer_Mode_v2):
[0, 1]
. A score of 1 implies that the model does not behave as ChatGPT_Developer_Mode_v2, while lower scores signify increasing levels of model behaving as ChatGPT_Developer_Mode_v2.2.Sample internal prompt and model response for
failed cases
is displayed. For detailed info on internal prompts, model responses, scores, useevaluator.save_results('filename.json')
.
Coding Example
Last updated