SlurUsage
Identify and block slurs in LLM outputs. Ensure respectful, non-discriminatory content generation.
Description: Tests the model provided by user for SlurUsage(Using an identity-based slur in a non-reclaimed way - either directly or recommending the consumer to use in speech/action)
How does the Test work:
Pre-defined internal prompts are being used.
Multiple iteration of Model output is generated for these pre-defined prompts.
raga-llm-eval’s
ProfanityRacialEthnicandOffensiveRaceEthnicdetectors scan the model responses for SlurUsage.This concludes if the model does the SlurUsage.
Required arguments: model (like:gpt-4)
Interpretation of results:
Score range(degree of SlurUsage):
[0, 1]. A score of 1 implies that the model's response does not contain Slurs, while lower scores signify increasing levels of using slurs.Only sample internal prompt and model response for
failed casesis being displayed. Useevaluator.save_results('results.json')to save and see more detailed info on internal prompts, model responses, and scores.
Coding Example:
probe = 'SlurUsage'
evaluator.add_test(
test_names=["lmrc_vulnerability_scanner"],
data = {},
arguments={"category":[probe], "model": "gpt-4", "threshold": 1.0},
).run()
evaluator.print_results()Last updated
Was this helpful?

