Fabrication

Exclusive to enterprise customers. Contact us to activate this feature.

Objective: The Fabrication metric detects responses that include information not present in the context. The prompt is not considered; only the context and the response are compared. Use this metric to check that generated content stays faithful to the provided context and introduces no unsupported information.

Required Columns in Dataset:

  • Context: The background information or source material that contains all the facts that should be used in the response.

  • Response: The content generated by the model that is being evaluated for fabrication.
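
Before running the metric, it can help to confirm that every row actually carries both columns. The following is a minimal sketch, assuming the dataset is available as a list of plain Python dictionaries keyed by column name; the helper `missing_columns` is hypothetical and not part of any product API.

```python
# Hypothetical pre-flight check: not part of the product API.
REQUIRED_COLUMNS = ("Context", "Response")


def missing_columns(rows):
    """Return the required column names that are absent or blank in any row."""
    missing = set()
    for row in rows:
        for col in REQUIRED_COLUMNS:
            value = row.get(col)
            # Treat None and whitespace-only strings as missing.
            if value is None or not str(value).strip():
                missing.add(col)
    return sorted(missing)


rows = [
    {"Context": "Remote work reduces commuting time.",
     "Response": "Remote work cuts commutes."},
    {"Context": "", "Response": "Some answer."},  # blank Context
]

print(missing_columns(rows))  # ['Context']
```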

Score Range: 0 (no fabrication) to 1 (high fabrication)

Additional Information: Reasons and evidence for the score are provided along with the metric value to help understand and identify where the fabrication occurred.

Code Implementation

# One entry per provider; both entries use the same model.
metrics = [
    {"name": "Fabrication", "config": {"model": "gpt-4o-mini", "provider": "azure"}, "column_name": "Response_Correctness_v2"},
    {"name": "Fabrication", "config": {"model": "gpt-4o-mini", "provider": "openai"}, "column_name": "Response_Correctness_v2"}
]
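
A malformed entry in the metrics list is easy to miss, so a small shape check can be useful before submitting it. This is a sketch only: the set of required keys is inferred from the example above rather than from a documented schema, and `validate_metric` is a hypothetical helper.

```python
# Keys inferred from the example configuration; treat them as an assumption.
REQUIRED_KEYS = ("name", "config", "column_name")
REQUIRED_CONFIG_KEYS = ("model", "provider")


def validate_metric(entry):
    """Return a list of human-readable problems found in one metric entry."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key: {key}")
    config = entry.get("config")
    if isinstance(config, dict):
        for key in REQUIRED_CONFIG_KEYS:
            if key not in config:
                problems.append(f"missing config key: {key}")
    return problems


good = {
    "name": "Fabrication",
    "config": {"model": "gpt-4o-mini", "provider": "openai"},
    "column_name": "Response_Correctness_v2",
}
bad = {"name": "Fabrication", "config": {"model": "gpt-4o-mini"}}

print(validate_metric(good))
print(validate_metric(bad))
```

An empty list means the entry has the expected shape; it says nothing about whether the model or provider values are accepted by the service.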

Example:

Context: The context provided discusses the benefits of remote work, including increased flexibility, reduced commuting time, and improved work-life balance.

Response: Remote work has been shown to not only improve work-life balance but also significantly increase employee productivity by 40%. Additionally, it has been found that remote workers are more likely to stay with their companies longer, reducing turnover rates by 25%.

Metric Score: 0.8/1.0

Reasoning:

  • Fabrication: The response introduces new information not supported by the context, such as the claim that remote work increases employee productivity by 40% and reduces turnover rates by 25%. These statistics are not mentioned in the context provided.

  • Evidence: The fabricated details (productivity increase and turnover reduction) are not traceable to any information in the context, leading to a high fabrication score.

Interpretation: The high score indicates significant fabrication in the response, as it introduces new, unsupported information. To reduce the fabrication score, the response should be revised to ensure that all claims and details are backed by the context provided.
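
To make the idea of "not traceable to the context" concrete, here is a deliberately crude sketch that flags numeric claims in the response that never appear in the context. It is not how the Fabrication metric is computed (the metric is model-based, as the configuration shows); it is only a string-matching illustration, and `unsupported_numbers` is a hypothetical helper.

```python
import re

# Matches integers or decimals, with an optional trailing percent sign.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(context, response):
    """Return numeric tokens in the response that do not occur in the context."""
    context_numbers = set(NUMBER_PATTERN.findall(context))
    flagged = []
    for token in NUMBER_PATTERN.findall(response):
        # Keep first-seen order and skip repeats.
        if token not in context_numbers and token not in flagged:
            flagged.append(token)
    return flagged


context = (
    "The context provided discusses the benefits of remote work, including "
    "increased flexibility, reduced commuting time, and improved work-life balance."
)
response = (
    "Remote work has been shown to not only improve work-life balance but also "
    "significantly increase employee productivity by 40%. Additionally, it has "
    "been found that remote workers are more likely to stay with their companies "
    "longer, reducing turnover rates by 25%."
)

print(unsupported_numbers(context, response))  # ['40%', '25%']
```

A check like this catches only invented figures; unsupported qualitative claims, such as the statement about employee retention, would pass through unnoticed, which is why a model-based metric is used instead.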
