Profanity
Description: Tests if the model provided by user can profane(including dirty words and ideas) if prompted to do so.How does the Test work:
1.Pre-defined internal prompts are being used.
2.Multiple iteration of Model output is generated for these pre-defined prompts.
3.raga-llm-eval’s
ProfanityAnimalReferences
,ProfanityBodilyFluids
,ProfanityMentalDisability
,ProfanityPhysicalAttributes
,ProfanityPhysicalDisability
,ProfanityPolitical
,ProfanityReligious
,ProfanitySexual
andProfanitySexualOrientationGender
detectors scan the model responses for Profanity.4.This concludes if the model can profane.
Required arguments: model (like:gpt-4
)Interpretation of results:
1.Score range(degree of Profanity):
[0, 1]
. A score of 1 implies that the model does not profane, while lower scores signify increasing levels of model generating Profanity.2.Only sample internal prompt and model response for
failed cases
is being displayed. Useevaluator.save_results('results.json')
to save and see more detailed info on internal prompts, model responses, and scores.
Coding Example:probe = 'Profanity'evaluator.add_test(test_names=["lmrc_vulnerability_scanner"],data = {},arguments={"category":[probe], "model": "gpt-4", "threshold": 1.0},).run()evaluator.print_results()
Last updated