Failure Mode Analysis
Analyze where LLMs fail. Spot reasoning errors, hallucinations, and inconsistencies.
Execute Test
rules = FMA_LLMRules()
rules.add(metric="accuracy", metric_threshold=0.5, eval_metric="BLEU", threshold=0.1)
rules.add(metric="accuracy", metric_threshold=0.5, eval_metric="CosineSimilarity", threshold=0.5)
rules.add(metric="accuracy", metric_threshold=0.7, eval_metric="METEOR", threshold=0.2)
rules.add(metric="accuracy", metric_threshold=0.5, eval_metric="ROUGE", threshold=0.25)
cls_default = clustering(
    test_session=test_session,
    dataset_name="llm_dataset_testing",
    method="k-means",
    embedding_col="summary_vector",
    level="image",
    args={"numOfClusters": 5},
)
edge_case_detection = failure_mode_analysis_llm(
    test_session=test_session,
    dataset_name="dataset_name",
    test_name="fma_llm_1",
    model="modelA",
    gt="GT",
    rules=rules,
    type="fma",
    output_type="llm",
    prompt_col_name="document",
    model_column="summary",
    gt_column="reference_summary",
    embedding_col_name="document_vector",
    model_embedding_column="summary_vector",
    gt_embedding_column="reference_summary_vector",
    clustering=cls_default,
)
test_session.add(edge_case_detection)
test_session.run()
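
To make the two thresholds in each rule more concrete, here is a minimal, library-independent sketch of one plausible reading of a rule such as `eval_metric='CosineSimilarity', threshold=0.5` with `metric='accuracy', metric_threshold=0.5`. It assumes that a sample counts as correct when its similarity score is at least `threshold`, and that a cluster is flagged as a failure mode when its accuracy falls below `metric_threshold`. The helper names (`cosine_similarity`, `failing_clusters`), the comparison directions, and the toy data are illustrative assumptions, not part of the platform's API.

```python
from collections import defaultdict
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def failing_clusters(samples, threshold=0.5, metric_threshold=0.5):
    """Return {cluster_id: accuracy} for clusters below metric_threshold.

    Assumed interpretation (hypothetical, not the platform's documented logic):
    - a sample is "correct" when similarity(model, ground truth) >= threshold
    - a cluster fails when its share of correct samples < metric_threshold
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for cluster_id, model_vec, gt_vec in samples:
        totals[cluster_id] += 1
        if cosine_similarity(model_vec, gt_vec) >= threshold:
            correct[cluster_id] += 1
    accuracy = {c: correct[c] / totals[c] for c in totals}
    return {c: acc for c, acc in accuracy.items() if acc < metric_threshold}


# Toy data: (cluster id, model summary embedding, reference summary embedding).
samples = [
    (0, [1.0, 0.0], [1.0, 0.0]),  # identical direction: correct
    (0, [1.0, 1.0], [1.0, 0.0]),  # similarity ~0.71: correct
    (1, [1.0, 0.0], [0.0, 1.0]),  # orthogonal: incorrect
    (1, [0.0, 1.0], [1.0, 0.0]),  # orthogonal: incorrect
    (1, [1.0, 0.0], [1.0, 0.0]),  # identical direction: correct
]

print(failing_clusters(samples))  # only cluster 1 (accuracy 1/3) is flagged
```

Under this reading, `threshold` decides whether an individual output is acceptable, while `metric_threshold` decides whether a whole cluster is performing badly enough to be surfaced.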
Analysing Test Results

Navigating and Interpreting Results


Practical Tips