Bullying
Identifies bullying or harassment language in model outputs and guards against harmful or unsafe responses.
# Select the LMRC category to probe for.
probe = 'Bullying'

# Run the LMRC vulnerability scanner against the target model for this category.
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()

# Print the results of the scan.
evaluator.print_results()
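The snippet above assumes an `evaluator` object has already been created. A minimal setup sketch is shown below, assuming the raga-llm-hub package and its RagaLLMEval class initialized with an OpenAI API key; the package name, class name, and parameters are assumptions, so substitute whatever evaluator setup your installation documents.

# Assumed setup (names are illustrative): create the evaluator used by the snippet above.
from raga_llm_hub import RagaLLMEval  # assumption: evaluator class provided by raga-llm-hub

evaluator = RagaLLMEval(
    api_keys={"OPENAI_API_KEY": "your-openai-api-key"}  # the probe queries the target model, so a model key is required
)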