Information Retrieval
Objective: This test computes a set of information retrieval (IR) metrics that measure how effectively a search system retrieves and ranks relevant documents.
Required Parameters: prompt, context
Result Interpretation: The scores reflect the search system's ability to identify and rank relevant documents effectively. Higher scores indicate better performance.
Types of measures in the IR Metrics Test (a worked sketch of the basic ranked-list measures follows the list):
Accuracy: Reports the probability that a relevant document is ranked before a non-relevant one.
AP (Average Precision): The mean of the precision scores at each relevant item returned in a search results list.
BPM (Bejeweled Player Model): A measure for evaluating web search using a player-based model.
Bpref (Binary Preference): Examines the relative ranks of judged relevant and non-relevant documents.
Compat (Compatibility measure): Assesses top-k preferences in a ranking.
infAP (Inferred AP): AP implementation that accounts for pooled-but-unjudged documents by assuming that they are relevant at the same proportion as other judged documents.
INSQ: A measure for IR evaluation as a user process.
INST: A variant of INSQ.
IPrec (Interpolated Precision): Precision at a given recall cutoff, used to plot precision-recall graphs.
Judged: Percentage of top results with relevance judgments.
nDCG (Normalized Discounted Cumulative Gain): Evaluates ranked lists with graded relevance labels.
NERR8, NERR9, NERR10, NERR11: Variants of the Not (but Nearly) Expected Reciprocal Rank (NERR) measure.
NumQ (Number of Queries): Total number of queries.
NumRel (Number of Relevant Documents): Number of relevant documents for a query.
NumRet (Number of Retrieved Documents): Number of documents returned.
P (Precision): Fraction of the top retrieved documents that are relevant.
R (Recall): Fraction of relevant documents retrieved.
Rprec (Precision at R): Precision at rank R, where R is the number of relevant documents for the query.
SDCG (Scaled Discounted Cumulative Gain): A variant of nDCG accounting for unjudged documents.
SetAP: The unranked Set AP; i.e., SetP * SetR.
SetF: The Set F measure; i.e., the harmonic mean of SetP and SetR.
SetP: The Set Precision (SetP); i.e., the number of relevant docs divided by the total number retrieved.
SetR: The Set Recall (SetR); i.e., the number of relevant docs divided by the total number of relevant documents.
Success: Indicates whether a relevant document appears in the top results (1 if so, 0 otherwise).
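To make the basic ranked-list measures concrete, the sketch below works through P, R, AP, and Success by hand for a single query. The ranked list, document IDs, and relevance judgments are invented for illustration.

```python
# Toy example: one query, a ranked list of retrieved doc IDs (best first),
# and the set of judged-relevant doc IDs. All values are invented.
ranked = ["d3", "d1", "d7", "d2", "d5"]   # retrieval order, best first
relevant = {"d1", "d2", "d9"}             # judged-relevant documents for the query

k = 5
top_k = ranked[:k]
hits = [d for d in top_k if d in relevant]

precision_at_k = len(hits) / k              # P@5 = 2/5 = 0.4
recall_at_k = len(hits) / len(relevant)     # R@5 = 2/3 ≈ 0.67
success_at_k = 1 if hits else 0             # Success@5 = 1

# AP: precision at each rank where a relevant document appears,
# averaged over the total number of relevant documents for the query.
precisions = []
found = 0
for rank, doc in enumerate(ranked, start=1):
    if doc in relevant:
        found += 1
        precisions.append(found / rank)
ap = sum(precisions) / len(relevant)        # (1/2 + 2/4) / 3 ≈ 0.33

print(precision_at_k, recall_at_k, success_at_k, round(ap, 2))
```

Note that d9 is relevant but never retrieved; it lowers recall and AP but does not affect precision.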
Example with the "Success" metric
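The snippet below is a minimal, self-contained illustration of the "Success" measure (together with P and AP) using the open-source ir_measures Python library; it is not necessarily the interface this test exposes, and the query ID, documents, scores, and relevance labels are invented.

```python
# Illustration with the open-source ir_measures library (not necessarily the
# interface this test exposes). All IDs, scores, and judgments are invented.
import ir_measures
from ir_measures import AP, P, Success

# Relevance judgments: query_id -> {doc_id: relevance label}
qrels = {"q1": {"d1": 1, "d2": 1, "d3": 0, "d9": 1}}

# System output: query_id -> {doc_id: retrieval score}, higher is better
run = {"q1": {"d3": 0.9, "d1": 0.8, "d7": 0.6, "d2": 0.4, "d5": 0.2}}

results = ir_measures.calc_aggregate([Success@5, P@5, AP], qrels, run)
print(results)  # e.g. {Success@5: 1.0, P@5: 0.4, AP: 0.33}
```

For this data, the values agree with the hand computation in the sketch after the measure list: Success@5 = 1.0 because at least one relevant document appears in the top 5, P@5 = 0.4, and AP ≈ 0.33.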