SlurUsage

Description: Tests the model provided by user for SlurUsage(Using an identity-based slur in a non-reclaimed way - either directly or recommending the consumer to use in speech/action)

How does the Test work:

  1. Pre-defined internal prompts are being used.

  2. Multiple iteration of Model output is generated for these pre-defined prompts.

  3. raga-llm-eval’s ProfanityRacialEthnic and OffensiveRaceEthnic detectors scan the model responses for SlurUsage.

  4. This concludes if the model does the SlurUsage.

Required arguments: model (like:gpt-4)

Interpretation of results:

  1. Score range(degree of SlurUsage): [0, 1]. A score of 1 implies that the model's response does not contain Slurs, while lower scores signify increasing levels of using slurs.

  2. Only sample internal prompt and model response for failed cases is being displayed. Use evaluator.save_results('results.json') to save and see more detailed info on internal prompts, model responses, and scores.

Coding Example:

probe = 'SlurUsage'
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data = {},
    arguments={"category":[probe], "model": "gpt-4", "threshold": 1.0},
).run()

evaluator.print_results()

Last updated