Profanity
Detect and restrict profanity in AI outputs. Enforce brand tone and create safer interactions.
Description: Tests whether the user-provided model produces profanity (including dirty words and ideas) when prompted to do so.
How does the Test work:
1. Pre-defined internal prompts are used.
2. Multiple iterations of model output are generated for these pre-defined prompts.
3. raga-llm-eval’s ProfanityAnimalReferences, ProfanityBodilyFluids, ProfanityMentalDisability, ProfanityPhysicalAttributes, ProfanityPhysicalDisability, ProfanityPolitical, ProfanityReligious, ProfanitySexual, and ProfanitySexualOrientationGender detectors scan the model responses for profanity (a minimal sketch of this scanning step follows the list).
4. This determines whether the model can produce profanity.
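The sketch below is a minimal illustration of how a detector-style scan and score could work; it is not raga-llm-eval's actual detector implementation, and the lexicon contents and scoring rule are assumptions made for illustration only.

# Illustrative only: NOT raga-llm-eval's detector code.
from typing import Dict, List

# Hypothetical category -> term mapping; the real detectors use curated lexicons.
PROFANITY_LEXICONS: Dict[str, List[str]] = {
    "ProfanityAnimalReferences": ["<term-1>", "<term-2>"],
    "ProfanityReligious": ["<term-3>"],
    # ...one list per detector category named above
}

def scan_response(response: str) -> Dict[str, bool]:
    """Flag which profanity categories appear in a single model response."""
    text = response.lower()
    return {
        category: any(term in text for term in terms)
        for category, terms in PROFANITY_LEXICONS.items()
    }

def profanity_score(responses: List[str]) -> float:
    """Fraction of responses that trigger no detector; 1.0 means no profanity found."""
    clean = sum(1 for r in responses if not any(scan_response(r).values()))
    return clean / len(responses) if responses else 1.0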
Required arguments: model (e.g., gpt-4)
Interpretation of results:
1. Score range (degree of profanity): [0, 1]. A score of 1 implies that the model does not produce profanity, while lower scores signify increasing levels of profanity in the model's output.
2. Only a sample internal prompt and model response are displayed for failed cases. Use evaluator.save_results('results.json') to save and view more detailed information on internal prompts, model responses, and scores (see the inspection sketch below).
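The snippet below is a hedged sketch of inspecting a previously saved results file; the JSON schema written by save_results() is an assumption, so check the actual file for its field names.

import json

# Load the file written by evaluator.save_results('results.json').
with open("results.json") as f:
    results = json.load(f)

# Walk whatever structure is present without assuming specific keys.
entries = results if isinstance(results, list) else [results]
for entry in entries:
    print(json.dumps(entry, indent=2))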
Coding Example:
probe = 'Profanity'
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()
evaluator.print_results()
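For context, the sketch below shows one possible end-to-end run. It assumes the package exposes a RagaLLMEval entry point that accepts an api_keys mapping, which is not confirmed by this page; adapt the import and setup to your installation.

# Assumed setup: the RagaLLMEval constructor and api_keys argument are assumptions;
# the add_test/run/print_results/save_results calls mirror the example above.
from raga_llm_eval import RagaLLMEval

evaluator = RagaLLMEval(api_keys={"OPENAI_API_KEY": "<your-key>"})

probe = "Profanity"
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()

evaluator.print_results()
evaluator.save_results("results.json")  # persist prompts, responses, and scores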