LogoLogo
Slack CommunityCatalyst Login
  • Welcome
  • RagaAI Catalyst
    • User Quickstart
    • Concepts
      • Configure Your API Keys
      • Supported LLMs
        • OpenAI
        • Gemini
        • Azure
        • AWS Bedrock
        • ANTHROPIC
      • Catalyst Access/Secret Keys
      • Enable Custom Gateway
      • Uploading Data
        • Create new project
        • RAG Datset
        • Chat Dataset
          • Prompt Format
        • Logging traces (LlamaIndex, Langchain)
        • Trace Masking Functions
        • Trace Level Metadata
        • Correlating Traces with External IDs
        • Add Dataset
      • Running RagaAI Evals
        • Executing Evaluations
        • Compare Datasets
      • Analysis
      • Embeddings
    • RagaAI Metric Library
      • RAG Metrics
        • Hallucination
        • Faithfulness
        • Response Correctness
        • Response Completeness
        • False Refusal
        • Context Relevancy
        • Context Precision
        • Context Recall
        • PII Detection
        • Toxicity
      • Chat Metrics
        • Agent Quality
        • Instruction Adherence
        • User Chat Quality
      • Text-to-SQL
        • SQL Response Correctness
        • SQL Prompt Ambiguity
        • SQL Context Ambiguity
        • SQL Context Sufficiency
        • SQL Prompt Injection
      • Text Summarization
        • Summary Consistency
        • Summary Relevance
        • Summary Fluency
        • Summary Coherence
        • SummaC
        • QAG Score
        • ROUGE
        • BLEU
        • METEOR
        • BERTScore
      • Information Extraction
        • MINEA
        • Subjective Question Correction
        • Precision@K
        • Chunk Relevance
        • Entity Co-occurrence
        • Fact Entropy
      • Code Generation
        • Functional Correctness
        • ChrF
        • Ruby
        • CodeBLEU
        • Robust Pass@k
        • Robust Drop@k
        • Pass-Ratio@n
      • Marketing Content Evaluation
        • Engagement Score
        • Misattribution
        • Readability
        • Topic Coverage
        • Fabrication
      • Learning Management System
        • Topic Coverage
        • Topic Redundancy
        • Question Redundancy
        • Answer Correctness
        • Source Citability
        • Difficulty Level
      • Additional Metrics
        • Guardrails
          • Anonymize
          • Deanonymize
          • Ban Competitors
          • Ban Substrings
          • Ban Topics
          • Code
          • Invisible Text
          • Language
          • Secret
          • Sentiment
          • Factual Consistency
          • Language Same
          • No Refusal
          • Reading Time
          • Sensitive
          • URL Reachability
          • JSON Verify
        • Vulnerability Scanner
          • Bullying
          • Deadnaming
          • SexualContent
          • Sexualisation
          • SlurUsage
          • Profanity
          • QuackMedicine
          • DAN 11
          • DAN 10
          • DAN 9
          • DAN 8
          • DAN 7
          • DAN 6_2
          • DAN 6_0
          • DUDE
          • STAN
          • DAN_JailBreak
          • AntiDAN
          • ChatGPT_Developer_Mode_v2
          • ChatGPT_Developer_Mode_RANTI
          • ChatGPT_Image_Markdown
          • Ablation_Dan_11_0
          • Anthropomorphisation
      • Guardrails
        • Competitor Check
        • Gibberish Check
        • PII
        • Regex Check
        • Response Evaluator
        • Toxicity
        • Unusual Prompt
        • Ban List
        • Detect Drug
        • Detect Redundancy
        • Detect Secrets
        • Financial Tone Check
        • Has Url
        • HTML Sanitisation
        • Live URL
        • Logic Check
        • Politeness Check
        • Profanity Check
        • Quote Price
        • Restrict Topics
        • SQL Predicates Guard
        • Valid CSV
        • Valid JSON
        • Valid Python
        • Valid Range
        • Valid SQL
        • Valid URL
        • Cosine Similarity
        • Honesty Detection
        • Toxicity Hate Speech
    • Prompt Playground
      • Concepts
      • Single-Prompt Playground
      • Multiple Prompt Playground
      • Run Evaluations
      • Using Prompt Slugs with Python SDK
      • Create with AI using Prompt Wizard
      • Prompt Diff View
    • Synthetic Data Generation
    • Gateway
      • Quickstart
    • Guardrails
      • Quickstart
      • Python SDK
    • RagaAI Whitepapers
      • RagaAI RLEF (RAG LLM Evaluation Framework)
    • Agentic Testing
      • Quickstart
      • Concepts
        • Tracing
          • Langgraph (Agentic Tracing)
          • RagaAI Catalyst Tracing Guide for Azure OpenAI Users
        • Dynamic Tracing
        • Application Workflow
      • Create New Dataset
      • Metrics
        • Hallucination
        • Toxicity
        • Honesty
        • Cosine Similarity
      • Compare Traces
      • Compare Experiments
      • Add metrics locally
    • Custom Metric
    • Auto Prompt Optimization
    • Human Feedback & Annotations
      • Thumbs Up/Down
      • Add Metric Corrections
      • Corrections as Few-Shot Examples
      • Tagging
    • On-Premise Deployment
      • Enterprise Deployment Guide for AWS
      • Enterprise Deployment Guide for Azure
      • Evaluation Deployment Guide
        • Evaluation Maintenance Guide
    • Fine Tuning (OpenAI)
    • Integration
    • SDK Release Notes
      • ragaai-catalyst 2.1.7
  • RagaAI Prism
    • Quickstart
    • Sandbox Guide
      • Object Detection
      • LLM Summarization
      • Semantic Segmentation
      • Tabular Data
      • Super Resolution
      • OCR
      • Image Classification
      • Event Detection
    • Test Inventory
      • Object Detection
        • Failure Mode Analysis
        • Model Comparison Test
        • Drift Detection
        • Outlier Detection
        • Data Leakage Test
        • Labelling Quality Test
        • Scenario Imbalance
        • Class Imbalance
        • Active Learning
        • Image Property Drift Detection
      • Large Language Model (LLM)
        • Failure Mode Analysis
      • Semantic Segmentation
        • Failure Mode Analysis
        • Labelling Quality Test
        • Active Learning
        • Drift Detection
        • Class Imbalance
        • Scenario Imbalance
        • Data Leakage Test
        • Outlier Detection
        • Label Drift
        • Semantic Similarity
        • Near Duplicates Detection
        • Cluster Imbalance Test
        • Image Property Drift Detection
        • Spatio-Temporal Drift Detection
        • Spatio-Temporal Failure Mode Analysis
      • Tabular Data
        • Failure Mode Analysis
      • Instance Segmentation
        • Failure Mode Analysis
        • Labelling Quality Test
        • Drift Detection
        • Class Imbalance
        • Scenario Imbalance
        • Label Drift
        • Data Leakage Test
        • Outlier Detection
        • Active Learning
        • Near Duplicates Detection
      • Super Resolution
        • Semantic Similarity
        • Active Learning
        • Near Duplicates Detection
        • Outlier Detection
      • OCR
        • Missing Value Test
        • Outlier Detection
      • Image Classification
        • Failure Mode Analysis
        • Labelling Quality Test
        • Class Imbalance
        • Drift Detection
        • Near Duplicates Test
        • Data Leakage Test
        • Outlier Detection
        • Active Learning
        • Image Property Drift Detection
      • Event Detection
        • Failure Mode Analysis
        • A/B Test
    • Metric Glossary
    • Upload custom model
    • Event Detection
      • Upload Model
      • Generate Inference
      • Run tests
    • On-Premise Deployment
      • Enterprise Deployment Guide for AWS
      • Enterprise Deployment Guide for Azure
  • Support
Powered by GitBook
On this page

Was this helpful?

  1. RagaAI Catalyst
  2. Concepts
  3. Uploading Data

Chat Dataset

Guide for uploading chat datasets to RagaAI Catalyst

Last updated 6 months ago

Was this helpful?

When uploading a chat dataset to RagaAI Catalyst, you can map your data columns to the appropriate fields in the Catalyst schema. This enables Catalyst to accurately interpret and analyze the data. The schema mapping interface provides flexibility in matching your dataset columns to Catalyst's requirements.

Required Schema Fields

At minimum, ensure that the following columns are mapped correctly:

  1. ChatID: (Optional) A unique identifier for each conversation. If left blank, Catalyst will generate a unique ChatID for each entry.

  2. Chat: This field contains the entire conversation history in JSON format, detailing the sequence of interactions between the assistant, user, system, and function calls.

Optional Fields

You can also add additional columns to provide more context, instructions, or metadata about the conversation. These optional fields include:

  • Instruction: Any specific instructions or guidelines for handling the chat. This helps Catalyst understand the context or constraints of the conversation, ensuring that responses align with the provided rules.

  • Context: Additional information related to the chat, such as prior interactions or specific user preferences, which can influence response behavior.

  • Metadata: Any relevant metadata associated with the conversation, such as tags, sentiment, or session information. This data can include useful details about the conversation's origin, customer type, or other contextual information that helps Catalyst tailor responses effectively.

Each conversation entry should include a list of JSON objects where each object represents a message or system instruction in the chat. Below is an example of the expected structure:

ChatID
Chat

Ch_01

[

{

"role": "system",

"content": "## Important: YOU MUST ALWAYS CALL THE MOST APPROPRIATE FUNCTION WITH APT PARAMETERS FOR ALL USER QUERIES IF RELEVANT. ## Instructions: You are a smart assistant (PLEASE DON'T USE words like I am AI assistant etc) who helps our customer Ramesh Singh, with their bus ticket queries. Ramesh Singh might ask in any language regarding their bookings. Your by default language is English only but be ready to understand different language and type in your answer only in that language. For your information ticket numbers (TIN) of Ramesh Singh is: KLM123456789. He booked those ticket 100000.0 minutes ago and current time is 2024-09-15 15:45:00. Strictly follow below rules. Rule 1. Information from function are always correct.Hence assistant should never change the answer based on customer's persuasion. Rule 2. Use above ticket number (if present) to interact with customer. If no ticket number is found due to payment uncompleted then call given function without any tin number. Don't answer anything which is not related to ticket booking from redBus. Rule 3. After each of your complete answer (except for queries related to bus operator cancellation) ask Ramesh Singh if he found your answer helpful or not. If he says that assistant's answer is helpful then just type '||ANSWER HELPED CUSTOMER||'. Never ask a question like 'Please let me know if you need any further assistance with this ticket' etc etc. Ramesh Singh will eventually ask you for his query. Rule 4. Always start the conversation by greeting Ramesh Singh with his name along with his ticket number after that ask what kind of help he needs. Rule 5. Words like tin,TIN,ticket id refers to ticket number. Rule 6. If there is no ticket number then it means either payment process was unsuccessful or desired seat was not available even though payment was successful. If any amount has been deducted from customer account it will be refunded back to original source within 4-5 working days. Explain this entire statement to customer when needed. Rule 7. Always ask for email or mobile from customer. If customer gives no input or instructs you to use registered email/ mobile then call 'resend_ticket_details_on_mobile' or 'resend_ticket_details_on_email' without any argument. Rule 8. If customer has done booking related error and he has reported the error within 30 mins of booking then you can immediately type '||AGENT CALL||' with proper explanation to customer. Rule 9. If customer claims to receive call/emails from BO regarding ticket cancellation or boarding details change you can immediately type '||AGENT CALL||'. Rule 10. Do not call a human agent based on customer direction initially. First, try to solve the customer's problem on your own. If you are unable to solve a customer query within two attempts, only then type '||AGENT CALL||' so that we can transfer the conversation to a live expert agent (please don't say I am transferring to human agent etc etc). Rule 11. If the customer asks the same question two times or expresses disagreement with your answer for two times, just type '||AGENT CALL||' so that we can transfer the conversation to a live expert agent. Rule 12. Always use functions (don't skip this) to generate a response for every customer query. If by chance you can't find proper function call with appropriate argument just inform Ramesh Singh that his request can't be fulfilled. Rule 13. Whenever you type '||AGENT CALL||', always ensure to also put a message stating the user will be connected to a live agent for query resolution. ## Important: ALWAYS CALL THE MOST APPROPRIATE FUNCTION WITH APT PARAMETERS FOR ANY USER QUERY IF RELEVANT."

},

{

"role": "user",

"content": "Can I change the seat for this booking?",

"topMatches": [],

"timestamp": "2024-09-15 14:30:00"

},

{

"role": "function",

"name": "check_if_seat_change_is_allowed",

"content": "\"Seat change is not allowed for this booking as the travel date is within 24 hours.\"",

"args": "{'tin_number': 'KLM123456789'}",

"timestamp": "2024-09-15 14:30:05"

},

{

"role": "system",

"content": "If the function output says seat change is not allowed, inform the customer that they cannot change the seat for this booking as the travel date is within 24 hours. If the customer insists, type '||AGENT CALL||' after two unsuccessful attempts to resolve the query.",

"timestamp": "2024-09-15 14:30:05"

},

{

"content": "I'm sorry, Ramesh Singh, but seat change is not allowed for your ticket KLM123456789 as the travel date is within 24 hours. Is there anything else I can help you with?",

"role": "assistant",

"totalTokens": "3444",

"timestamp": "2024-09-15 14:30:10"

},

{

"role": "user",

"content": "But I really need to change the seat. Can you please do something?",

"topMatches": [],

"timestamp": "2024-09-15 14:32:30"

},

{

"content": "I understand your concern, Ramesh Singh, but unfortunately seat changes are not possible for this booking as the travel date is within 24 hours. Let me know if I can assist with anything else.",

"role": "assistant",

"totalTokens": "3444",

"timestamp": "2024-09-15 14:32:40"

},

{

"role": "user",

"content": "Can you try again?",

"topMatches": [],

"timestamp": "2024-09-15 14:33:00"

},

{

"role": "function",

"name": "findout_if_agent_call_is_required",

"content": "\"agent call is required\"",

"args": "{'intent': 'assistant is unable to solve user query within two attempts'}",

"timestamp": "2024-09-15 14:33:05"

}

]

Each chat entry contains multiple fields within each message object:

  • role: Indicates the sender's role in the conversation. Values can be system, user, assistant, or function.

    • system: Represents rules, guidelines, and instructions that the assistant must follow.

    • user: Captures the customer's messages or queries.

    • assistant: Represents the assistant's responses to the customer.

    • function: Logs calls to specific functions that perform tasks or retrieve data relevant to the conversation.

  • content: Holds the main text or instructions for the conversation.

    • For system entries, this includes detailed instructions and rules for the assistant's behavior.

    • For user entries, this is the actual query or message from the customer.

    • For assistant entries, this field contains the response provided by the assistant.

    • For function entries, this field captures the result or status of a function call and may include a description of the parameters used.

  • timestamp: A timestamp for each message in the format YYYY-MM-DD HH:MM:SS.

  • totalTokens (optional): Used for assistant entries to capture token usage, which may assist in analysis and optimization.

Example Chat Breakdown

Below is a breakdown of a sample conversation to illustrate the structure and flow:

  1. system: Provides guidelines to the assistant, such as responding with appropriate function calls and following specific rules regarding the customer’s bus ticket query.

  2. user: The customer asks, "Can I change the seat for this booking?"

  3. function: The assistant calls check_if_seat_change_is_allowed to verify if a seat change is permissible.

  4. assistant: Responds based on the function’s output, stating that seat change is not allowed within 24 hours of travel.

  5. user: Insists on changing the seat.

  6. assistant: Politely reiterates that a seat change cannot be made.

  7. user: Asks the assistant to try again.

  8. function: Calls findout_if_agent_call_is_required because the assistant cannot resolve the query within two attempts.

This structured chat format enables RagaAI Catalyst to evaluate and analyze the flow of conversations effectively. Please ensure your dataset follows this format for seamless integration and analysis.

9KB
Chat Format CSV - chat.csv
Sample CSV