Profanity
Detect and restrict profanity in AI outputs. Enforce brand tone and create safer interactions.
Description: Tests whether the user-provided model produces profanity (including dirty words and ideas) when prompted to do so.
How does the Test work:
1. Pre-defined internal prompts are used.
2. Multiple iterations of model output are generated for these pre-defined prompts.
3. raga-llm-eval’s ProfanityAnimalReferences, ProfanityBodilyFluids, ProfanityMentalDisability, ProfanityPhysicalAttributes, ProfanityPhysicalDisability, ProfanityPolitical, ProfanityReligious, ProfanitySexual, and ProfanitySexualOrientationGender detectors scan the model responses for profanity (a minimal sketch of this scanning step follows the list).
4. This determines whether the model can produce profanity.
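The sketch below is a minimal illustration of how a detector-style scan and score could work; it is not raga-llm-eval's actual detector implementation, and the lexicon contents and scoring rule are assumptions made for illustration only.

# Illustrative only: NOT raga-llm-eval's detector code.
from typing import Dict, List

# Hypothetical category -> term mapping; the real detectors use curated lexicons.
PROFANITY_LEXICONS: Dict[str, List[str]] = {
    "ProfanityAnimalReferences": ["<term-1>", "<term-2>"],
    "ProfanityReligious": ["<term-3>"],
    # ...one list per detector category named above
}

def scan_response(response: str) -> Dict[str, bool]:
    """Flag which profanity categories appear in a single model response."""
    text = response.lower()
    return {
        category: any(term in text for term in terms)
        for category, terms in PROFANITY_LEXICONS.items()
    }

def profanity_score(responses: List[str]) -> float:
    """Fraction of responses that trigger no detector; 1.0 means no profanity found."""
    clean = sum(1 for r in responses if not any(scan_response(r).values()))
    return clean / len(responses) if responses else 1.0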
Required arguments: model (e.g., gpt-4)
Interpretation of results:
1. Score range (degree of profanity): [0, 1]. A score of 1 implies that the model does not produce profanity, while lower scores signify increasing levels of profanity in the model's output.
2. Only a sample internal prompt and model response are displayed for failed cases. Use evaluator.save_results('results.json') to save and view more detailed information on internal prompts, model responses, and scores (see the inspection sketch below).
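The snippet below is a hedged sketch of inspecting a previously saved results file; the JSON schema written by save_results() is an assumption, so check the actual file for its field names.

import json

# Load the file written by evaluator.save_results('results.json').
with open("results.json") as f:
    results = json.load(f)

# Walk whatever structure is present without assuming specific keys.
entries = results if isinstance(results, list) else [results]
for entry in entries:
    print(json.dumps(entry, indent=2))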
Coding Example:
probe = 'Profanity'
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()
evaluator.print_results()
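For context, the sketch below shows one possible end-to-end run. It assumes the package exposes a RagaLLMEval entry point that accepts an api_keys mapping, which is not confirmed by this page; adapt the import and setup to your installation.

# Assumed setup: the RagaLLMEval constructor and api_keys argument are assumptions;
# the add_test/run/print_results/save_results calls mirror the example above.
from raga_llm_eval import RagaLLMEval

evaluator = RagaLLMEval(api_keys={"OPENAI_API_KEY": "<your-key>"})

probe = "Profanity"
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()

evaluator.print_results()
evaluator.save_results("results.json")  # persist prompts, responses, and scores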