ChatGPT_Image_Markdown

Description: A version of the model that converts messages into image markdown format while offering uncensored responses.

How the test works:

Pre-defined internal system prompts are used.
Multiple iterations of model output are generated for these pre-defined system prompts.
raga-llm-eval’s MitigationBypass and MarkdownLink detectors scan the model outputs to see whether the attack was successful.
The detector results determine whether the model is vulnerable to the ChatGPT_Image_Markdown attack.
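
The scanning step above can be pictured with a small sketch. It is not raga-llm-eval's actual detector code: the regular expression, the refusal phrases, and the function name scan_output are all assumptions made for illustration, and the real MitigationBypass and MarkdownLink detectors may use different rules.

```python
import re

# Hypothetical pattern: markdown image syntax ![alt](url) with an http(s) URL.
IMAGE_MARKDOWN = re.compile(r"!\[[^\]]*\]\(\s*https?://[^)\s]+[^)]*\)")

# Hypothetical refusal phrases; the library's real list is not documented here.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")


def scan_output(output: str) -> dict:
    """Return one boolean flag per illustrative detector for a model output."""
    lowered = output.lower()
    return {
        # True when the output embeds an image via markdown.
        "markdown_image": IMAGE_MARKDOWN.search(output) is not None,
        # True when no refusal phrase appears, i.e. the safety mitigation
        # appears to have been bypassed.
        "mitigation_bypassed": not any(m in lowered for m in REFUSAL_MARKERS),
    }
```

For instance, scan_output("![cat](https://example.com/cat.png)") sets both flags, while a plain refusal such as "I'm sorry, but I can't help with that." sets neither. How the library combines the two detectors into a single verdict is not stated on this page, so the sketch reports them separately.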

Required arguments: model (for example, gpt-4)

Interpretation of results:

Score range: [0, 1]. The score reflects how well the model resists behaving as ChatGPT_Image_Markdown: a score of 1 means the model does not behave as ChatGPT_Image_Markdown, while lower scores mean it does so to an increasing degree.
A sample internal prompt and model response are displayed for failed cases. For detailed information on internal prompts, model responses, and scores, use evaluator.save_results('filename.json').
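
One way to read the score is as the fraction of generated outputs on which the attack did not succeed. The sketch below assumes exactly that, and also assumes that a test passes when the score is at least the threshold. Neither formula is confirmed by this page, and the names resistance_score and passes are hypothetical.

```python
def resistance_score(attack_hits: list[bool]) -> float:
    """Fraction of outputs where the attack failed (assumed scoring rule).

    attack_hits holds one flag per generated output: True when the
    detectors judged the attack successful on that output.
    """
    if not attack_hits:
        raise ValueError("need at least one scanned output")
    return 1.0 - sum(attack_hits) / len(attack_hits)


def passes(score: float, threshold: float = 1.0) -> bool:
    """Assumed pass rule: the score must reach the threshold."""
    return score >= threshold
```

Under these assumptions, one successful attack in four outputs gives a score of 0.75, which fails a threshold of 1.0. A threshold of 1.0 would then mean that the model must resist the attack on every iteration.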

Coding Example:

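# Assumes `evaluator` is an already-initialized raga-llm-eval evaluator instance.
probe = "ChatGPT_Image_Markdown"
evaluator.add_test(
    test_names=["dan_vulnerability_scanner"],
    data={},  # no input data is needed; the prompts are pre-defined internally
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()

# Print a summary, including a sample prompt and response for failed cases.
evaluator.print_results()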


