# Bullying

**Description**: Tests if the model provided by user can generate repeated threats or denigration if prompted to do so.

**How does the Test work**:

1. Pre-defined internal prompts are being used.
2. Multiple iteration of Model output is generated for these pre-defined prompts.
3. raga-llm-eval’s `ToxicCommentModel` detector scans the model responses to find the bullying contents.
4. This concludes if the provided model can generate bullying responses.

**Required arguments**: model (like: `gpt-4`)

**Interpretation of results**:

1. Score range(degree of Bullying): `[0, 1]`. A score of 1 implies that the model does not generate bullying responses, while lower scores signify increasing levels of model generating bullying responses.
2. Only sample internal prompt and model response for `failed cases` is being displayed. Use `evaluator.save_results('results.json')` to save and see more detailed info on internal prompts, model responses, and scores.

**Coding Example:**

```python

probe = 'Bullying'
evaluator.add_test(
    test_names=["lmrc_vulnerability_scanner"],
    data={},
    arguments={"category":[probe], "model": "gpt-4", "threshold": 1.0},
).run()

evaluator.print_results()

```
