QuackMedicine

Description: Tests whether the model promotes non-scientific medicine.

How does the test work:

  1. Pre-defined internal prompts are used.

  2. Multiple iterations of model output are generated for these pre-defined prompts.

  3. raga-llm-eval’s QuackMedicine detectors scan the model responses to see if they promote non-scientific medicine.

  4. The results determine whether the model promotes non-scientific medicine.
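The steps above can be sketched roughly as follows. This is only an illustration, not raga-llm-eval's implementation: the prompts, the marker phrases, `fake_model`, `is_quack`, and `quack_medicine_score` are all hypothetical stand-ins, and the real detectors are internal to the library and are not keyword lists as far as this page says.

```python
from typing import Callable, List

# Hypothetical stand-ins: the real prompts and detectors are internal to raga-llm-eval.
PROMPTS: List[str] = [
    "What are the benefits of vagus nerve massage?",
    "How do I treat a cold?",
]
QUACK_MARKERS = ("cures all", "detoxifies", "no need for a doctor")


def fake_model(prompt: str, iteration: int) -> str:
    """Toy model with canned answers so the sketch runs offline."""
    if "cold" in prompt and iteration == 0:
        return "This herbal tonic cures all colds, no need for a doctor."
    return "Evidence is limited; please consult a qualified clinician."


def is_quack(response: str) -> bool:
    """Toy detector: flags a response that contains any marker phrase."""
    text = response.lower()
    return any(marker in text for marker in QUACK_MARKERS)


def quack_medicine_score(
    model: Callable[[str, int], str], prompts: List[str], iterations: int = 2
) -> float:
    """Fraction of responses that were NOT flagged (1.0 means nothing was flagged)."""
    # Steps 1-2: generate several outputs for every pre-defined prompt.
    responses = [model(p, i) for p in prompts for i in range(iterations)]
    # Step 3: scan each response with the detector.
    flagged = sum(is_quack(r) for r in responses)
    # Step 4: summarize as a score in [0, 1].
    return 1 - flagged / len(responses)


score = quack_medicine_score(fake_model, PROMPTS)
print(score)
```

Here one of the four toy responses is flagged, so the sketch reports a score below 1.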

Required arguments: model (e.g., gpt-4)

Interpretation of results:

  1. Score range (degree of QuackMedicine): [0, 1]. A score of 1 implies that the model does not promote QuackMedicine, while lower scores indicate increasing levels of QuackMedicine content in the model's output.

  2. Only a sample internal prompt and model response is displayed for failed cases. Use evaluator.save_results('results.json') to save the results and see more detailed information on internal prompts, model responses, and scores.
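As a small illustration of reading a score against a threshold, the sketch below assumes that a test passes when its score is at least the configured threshold. That rule is an assumption made for this example rather than something stated on this page, and `passes` is a hypothetical helper, not part of raga-llm-eval.

```python
def passes(score: float, threshold: float = 1.0) -> bool:
    """Assumed rule: a test passes when the score reaches the threshold."""
    return score >= threshold


# With a threshold of 1.0, any flagged response would make the test fail.
for s in (1.0, 0.75, 0.0):
    print(s, "pass" if passes(s) else "fail")
```

Under this assumption, a threshold of 1.0 is the strictest setting, and a lower threshold would tolerate some flagged responses.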

Coding Example:

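# Assumes `evaluator` is an already-initialized raga-llm-eval evaluator.
probe = "QuackMedicine"
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},  # left empty; the test uses pre-defined internal prompts
    arguments={"category": [probe], "model": "gpt-4", "threshold": 1.0},
).run()

# Display the scores and a sample of failed cases.
evaluator.print_results()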
