Instruction Adherence

The Instruction Adherence metric evaluates how well the agent's responses align with the specific instructions provided in a chat use case. It assesses the degree to which the agent’s responses follow both the user-specified instructions and the overarching system prompt, ensuring that replies are contextually relevant, compliant, and within the designated guidelines. An LLM is used as a judge, scoring adherence by comparing each response against the specified instructions and the system prompt.
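
The exact judge prompt and scoring model are internal to the platform and not documented here, but the general LLM-as-judge flow can be sketched. The snippet below is a minimal illustration assuming an OpenAI-compatible judge endpoint; the prompt template, judge model name, and JSON output format are assumptions, while the three inputs (chat, instructions, system prompt) and the 0–1 score mirror the metric’s definition.

```python
import json

from openai import OpenAI  # assumption: any OpenAI-compatible judge endpoint would do

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical judge prompt; the platform's actual prompt is not documented here.
JUDGE_PROMPT = """You are grading instruction adherence.

System prompt given to the agent:
{system_prompt}

Instructions the agent was asked to follow:
{instructions}

Conversation (user prompts and agent responses):
{chat}

Return a JSON object with keys "score" (a number from 0.0 to 1.0, where 1.0
means the agent's responses fully follow both the instructions and the system
prompt) and "reasoning" (a brief explanation)."""


def score_instruction_adherence(chat: str, instructions: str, system_prompt: str) -> dict:
    """Ask a judge model to rate how well the agent followed its instructions."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of judge model
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                system_prompt=system_prompt,
                instructions=instructions,
                chat=chat,
            ),
        }],
        response_format={"type": "json_object"},  # ask for machine-readable output
    )
    return json.loads(response.choices[0].message.content)
```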

Required Columns in Dataset:

  • Chat: A single column containing the entire conversation, including user prompts and agent responses, to provide full conversational context.

  • Instructions: The specific instructions given for the agent to follow in its responses (e.g., tone, format, specific details to include or exclude).

  • System Prompt: The overarching prompt or guidelines provided to the agent, defining the general context or rules for the chat interaction.
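
For illustration, a row carrying these three columns could look like the sketch below, which uses the restaurant example from later on this page. The column names ("chat", "instructions", "system_prompt") are assumptions rather than a required schema.

```python
# One illustrative dataset row, using the restaurant example from this page.
# The column names are assumptions, not a required schema; use whatever names
# your dataset actually carries.
example_row = {
    "chat": (
        "User: Can you tell me about Italian restaurants nearby?\n"
        "Agent: Certainly! I recommend 'La Dolce Vita' for an authentic Italian "
        "experience. Let me know if you'd like directions or a reservation."
    ),
    "instructions": (
        "Provide concise, friendly responses and only suggest restaurants "
        "with high ratings."
    ),
    "system_prompt": (
        "You are a friendly assistant helping users find top-rated local "
        "dining options."
    ),
}
```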

Interpretation:

A higher Instruction Adherence score indicates that the agent’s responses closely follow the given instructions and system prompt, ensuring compliance and relevance in each interaction. This metric is essential for scenarios where precise adherence to instructions is critical, such as customer support or guided workflows.

Metric Execution via UI:

To execute this metric, select Instruction Adherence from the list of available metrics in the UI and configure the evaluation settings to assess response alignment with both the instructions and the system prompt.

Example:

  • Chat:

    • User: “Can you tell me about Italian restaurants nearby?”

    • Agent Response: “Certainly! I recommend ‘La Dolce Vita’ for an authentic Italian experience. Let me know if you’d like directions or a reservation.”

  • Instructions: “Provide concise, friendly responses and only suggest restaurants with high ratings.”

  • System Prompt: “You are a friendly assistant helping users find top-rated local dining options.”

  • Metric Score: 0.9

  • Reasoning: The agent’s response is friendly, concise, and relevant, meeting both the instructions and the system prompt. The response could have been improved slightly by explicitly mentioning the restaurant’s high rating, which would align fully with the given instructions.
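
Tying this example back to the earlier sketch, the same row could be scored programmatically. The snippet below reuses the hypothetical score_instruction_adherence function and example_row dict defined above; an actual judge model may phrase its reasoning differently or assign a slightly different score.

```python
# Illustrative usage only: reuses score_instruction_adherence and example_row
# from the sketches above.
result = score_instruction_adherence(
    chat=example_row["chat"],
    instructions=example_row["instructions"],
    system_prompt=example_row["system_prompt"],
)
print(result["score"], result["reasoning"])
```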
