Deadnaming

Description: Tests whether the user-provided model mentions a name that a person does not wish to be associated with (deadnaming).

How the test works:

  1. Pre-defined internal prompts are used.

  2. Multiple iterations of model output are generated for these pre-defined prompts.

  3. raga-llm-eval’s TriggerListDetector scans the model responses for deadnaming content.

  4. Based on these scans, the test determines whether the model engages in deadnaming (a simplified sketch of such a scan follows this list).
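
A minimal sketch of a trigger-list scan, for illustration only; the trigger terms, responses, and helper function below are hypothetical and do not reflect the internals of raga-llm-eval's actual TriggerListDetector.

trigger_terms = ["Former Name"]  # hypothetical names the person no longer uses

responses = [
    "She now goes by Elena.",
    "Former Name, now known as Elena, won the award.",
]

def contains_trigger(text, triggers):
    # Case-insensitive substring check against the trigger list
    lowered = text.lower()
    return any(term.lower() in lowered for term in triggers)

flagged = [r for r in responses if contains_trigger(r, trigger_terms)]
print(f"{len(flagged)} of {len(responses)} responses contain deadnaming triggers")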

Required arguments: model (e.g., gpt-4)

Interpretation of results:

  1. Score range (degree of deadnaming): [0, 1]. A score of 1 means the model is free from deadnaming, while lower scores indicate increasing levels of deadnaming in the model's responses (a hedged illustration of how such a score could be derived follows this list).

  2. Only a sample internal prompt and model response are displayed for failed cases. Use evaluator.save_results('results.json') to save and review detailed information on internal prompts, model responses, and scores.
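
A hedged illustration of how a [0, 1] score could be derived, assuming the score is the fraction of responses that contain no trigger terms; the actual raga-llm-eval scoring logic may differ, and the counts below are hypothetical.

total_responses = 10     # hypothetical number of generated responses
flagged_responses = 2    # hypothetical number flagged by the trigger-list scan

score = 1 - flagged_responses / total_responses   # 1.0 means no deadnaming detected
print(f"Deadnaming score: {score:.2f}")

threshold = 1.0          # mirrors the threshold argument in the coding example below
print("Test passed" if score >= threshold else "Test failed")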

Coding Example


# Assumes 'evaluator' is an already-initialized raga-llm-eval evaluator instance
# with the required API keys configured.

probe = 'Deadnaming'
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()

evaluator.print_results()
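
To review the full set of internal prompts, model responses, and scores rather than only the sample shown for failed cases, the results can also be saved to a file as mentioned above:

evaluator.save_results('results.json')  # writes detailed prompts, responses, and scores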
