Chat Dataset
Guide for uploading chat datasets to RagaAI Catalyst
When uploading a chat dataset to RagaAI Catalyst, you can map your data columns to the appropriate fields in the Catalyst schema. This enables Catalyst to accurately interpret and analyze the data. The schema mapping interface provides flexibility in matching your dataset columns to Catalyst's requirements.
Required Schema Fields
Map the following columns when you upload a chat dataset. Only Chat must be present in your data; ChatID can be left blank:
ChatID: (Optional) A unique identifier for each conversation. If left blank, Catalyst generates a unique ChatID for each entry.
Chat: This field contains the entire conversation history in JSON format, detailing the sequence of interactions between the assistant, user, system, and function calls.
Optional Fields
You can also map additional columns that provide more context, instructions, or metadata about the conversation. These optional fields include:
Instruction: Any specific instructions or guidelines for handling the chat. This helps Catalyst understand the context or constraints of the conversation, ensuring that responses align with the provided rules.
Context: Additional information related to the chat, such as prior interactions or specific user preferences, which can influence response behavior.
Metadata: Any relevant metadata associated with the conversation, such as tags, sentiment, or session information. This data can include useful details about the conversation's origin, customer type, or other contextual information that helps Catalyst tailor responses effectively.
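As a rough illustration of how the required and optional fields fit together, the sketch below assembles one dataset row and writes it as CSV using only the Python standard library. It rests on several assumptions that this guide does not confirm: that the dataset is uploaded as a CSV file, that the column headers match the schema field names (they can be mapped to other names in the schema mapping interface), and that Metadata is serialized as JSON. The helper names build_row and rows_to_csv are hypothetical.

```python
import csv
import io
import json

# A shortened conversation; see the full example later in this guide.
messages = [
    {"role": "system", "content": "You help customers with bus ticket queries."},
    {"role": "user", "content": "Can I change the seat for this booking?",
     "timestamp": "2024-09-15 14:30:00"},
]

FIELDNAMES = ["ChatID", "Chat", "Instruction", "Context", "Metadata"]


def build_row(chat_id, messages, instruction="", context="", metadata=None):
    """Build one dataset row (hypothetical helper, not part of any Catalyst SDK)."""
    return {
        "ChatID": chat_id,  # may be left blank; Catalyst then generates one
        "Chat": json.dumps(messages, ensure_ascii=False),  # whole conversation as JSON
        "Instruction": instruction,
        "Context": context,
        # Assumption: metadata is stored as a JSON string in a single column.
        "Metadata": json.dumps(metadata or {}, ensure_ascii=False),
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text; the csv module handles quoting of the JSON."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([build_row("Ch_01", messages, metadata={"tags": ["seat-change"]})])
```

Serializing the conversation with json.dumps, rather than building the string by hand, keeps quotes and newlines inside message content properly escaped.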
Each conversation entry should include a list of JSON objects where each object represents a message or system instruction in the chat. Below is an example of the expected structure:
ChatID: Ch_01
Chat:
[
{
"role": "system",
"content": "## Important: YOU MUST ALWAYS CALL THE MOST APPROPRIATE FUNCTION WITH APT PARAMETERS FOR ALL USER QUERIES IF RELEVANT. ## Instructions: You are a smart assistant (PLEASE DON'T USE words like I am AI assistant etc) who helps our customer Ramesh Singh, with their bus ticket queries. Ramesh Singh might ask in any language regarding their bookings. Your by default language is English only but be ready to understand different language and type in your answer only in that language. For your information ticket numbers (TIN) of Ramesh Singh is: KLM123456789. He booked those ticket 100000.0 minutes ago and current time is 2024-09-15 15:45:00. Strictly follow below rules. Rule 1. Information from function are always correct.Hence assistant should never change the answer based on customer's persuasion. Rule 2. Use above ticket number (if present) to interact with customer. If no ticket number is found due to payment uncompleted then call given function without any tin number. Don't answer anything which is not related to ticket booking from redBus. Rule 3. After each of your complete answer (except for queries related to bus operator cancellation) ask Ramesh Singh if he found your answer helpful or not. If he says that assistant's answer is helpful then just type '||ANSWER HELPED CUSTOMER||'. Never ask a question like 'Please let me know if you need any further assistance with this ticket' etc etc. Ramesh Singh will eventually ask you for his query. Rule 4. Always start the conversation by greeting Ramesh Singh with his name along with his ticket number after that ask what kind of help he needs. Rule 5. Words like tin,TIN,ticket id refers to ticket number. Rule 6. If there is no ticket number then it means either payment process was unsuccessful or desired seat was not available even though payment was successful. If any amount has been deducted from customer account it will be refunded back to original source within 4-5 working days. Explain this entire statement to customer when needed. Rule 7. Always ask for email or mobile from customer. If customer gives no input or instructs you to use registered email/ mobile then call 'resend_ticket_details_on_mobile' or 'resend_ticket_details_on_email' without any argument. Rule 8. If customer has done booking related error and he has reported the error within 30 mins of booking then you can immediately type '||AGENT CALL||' with proper explanation to customer. Rule 9. If customer claims to receive call/emails from BO regarding ticket cancellation or boarding details change you can immediately type '||AGENT CALL||'. Rule 10. Do not call a human agent based on customer direction initially. First, try to solve the customer's problem on your own. If you are unable to solve a customer query within two attempts, only then type '||AGENT CALL||' so that we can transfer the conversation to a live expert agent (please don't say I am transferring to human agent etc etc). Rule 11. If the customer asks the same question two times or expresses disagreement with your answer for two times, just type '||AGENT CALL||' so that we can transfer the conversation to a live expert agent. Rule 12. Always use functions (don't skip this) to generate a response for every customer query. If by chance you can't find proper function call with appropriate argument just inform Ramesh Singh that his request can't be fulfilled. Rule 13. Whenever you type '||AGENT CALL||', always ensure to also put a message stating the user will be connected to a live agent for query resolution. ## Important: ALWAYS CALL THE MOST APPROPRIATE FUNCTION WITH APT PARAMETERS FOR ANY USER QUERY IF RELEVANT."
},
{
"role": "user",
"content": "Can I change the seat for this booking?",
"topMatches": [],
"timestamp": "2024-09-15 14:30:00"
},
{
"role": "function",
"name": "check_if_seat_change_is_allowed",
"content": "\"Seat change is not allowed for this booking as the travel date is within 24 hours.\"",
"args": "{'tin_number': 'KLM123456789'}",
"timestamp": "2024-09-15 14:30:05"
},
{
"role": "system",
"content": "If the function output says seat change is not allowed, inform the customer that they cannot change the seat for this booking as the travel date is within 24 hours. If the customer insists, type '||AGENT CALL||' after two unsuccessful attempts to resolve the query.",
"timestamp": "2024-09-15 14:30:05"
},
{
"content": "I'm sorry, Ramesh Singh, but seat change is not allowed for your ticket KLM123456789 as the travel date is within 24 hours. Is there anything else I can help you with?",
"role": "assistant",
"totalTokens": "3444",
"timestamp": "2024-09-15 14:30:10"
},
{
"role": "user",
"content": "But I really need to change the seat. Can you please do something?",
"topMatches": [],
"timestamp": "2024-09-15 14:32:30"
},
{
"content": "I understand your concern, Ramesh Singh, but unfortunately seat changes are not possible for this booking as the travel date is within 24 hours. Let me know if I can assist with anything else.",
"role": "assistant",
"totalTokens": "3444",
"timestamp": "2024-09-15 14:32:40"
},
{
"role": "user",
"content": "Can you try again?",
"topMatches": [],
"timestamp": "2024-09-15 14:33:00"
},
{
"role": "function",
"name": "findout_if_agent_call_is_required",
"content": "\"agent call is required\"",
"args": "{'intent': 'assistant is unable to solve user query within two attempts'}",
"timestamp": "2024-09-15 14:33:05"
}
]
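In the example above, the function entries store args as a string that uses single quotes (Python literal syntax rather than JSON), and content as a JSON-encoded string nested inside the outer JSON string. The sketch below shows one way to decode both when inspecting a dataset locally. It assumes every function entry follows the same conventions as this sample, which the guide does not state; the helper names are hypothetical.

```python
import ast
import json


def parse_function_args(message):
    """Return the args of a function message as a dict, or None if absent or unparseable."""
    raw = message.get("args")
    if raw is None:
        return None
    try:
        # literal_eval safely parses Python literals such as "{'tin_number': '...'}"
        # without executing arbitrary code.
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return None
    return parsed if isinstance(parsed, dict) else None


def decode_function_content(message):
    """Unwrap content that was JSON-encoded twice; fall back to the raw value."""
    raw = message.get("content", "")
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return raw


call = {
    "role": "function",
    "name": "check_if_seat_change_is_allowed",
    "content": "\"Seat change is not allowed for this booking as the travel date is within 24 hours.\"",
    "args": "{'tin_number': 'KLM123456789'}",
}
print(parse_function_args(call))      # {'tin_number': 'KLM123456789'}
print(decode_function_content(call))  # Seat change is not allowed for this booking as the travel date is within 24 hours.
```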
Each message object in a chat entry contains the following fields:
role: Indicates the sender's role in the conversation. Values can be system, user, assistant, or function.
system: Represents rules, guidelines, and instructions that the assistant must follow.
user: Captures the customer's messages or queries.
assistant: Represents the assistant's responses to the customer.
function: Logs calls to specific functions that perform tasks or retrieve data relevant to the conversation.
content: Holds the main text or instructions for the conversation.
For system entries, this includes detailed instructions and rules for the assistant's behavior.
For user entries, this is the actual query or message from the customer.
For assistant entries, this field contains the response provided by the assistant.
For function entries, this field captures the result or status of a function call and may include a description of the parameters used.
timestamp: A timestamp for each message in the format YYYY-MM-DD HH:MM:SS.
totalTokens (optional): Used for assistant entries to capture token usage, which may assist in analysis and optimization.
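Before uploading, it can help to check each Chat value against the field descriptions above. The following is a minimal sketch of such a check, not Catalyst's own validation logic. It assumes that role and content are required on every message and that timestamp is validated only when present (the first system message in the example has none). The function name validate_chat is hypothetical.

```python
import json
from datetime import datetime

ALLOWED_ROLES = {"system", "user", "assistant", "function"}
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS


def validate_chat(chat_json):
    """Return a list of problems found in a Chat value; an empty list means none."""
    try:
        messages = json.loads(chat_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(messages, list):
        return ["top-level value must be a list of message objects"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not a JSON object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: missing or non-string content")
        if "timestamp" in msg:
            try:
                datetime.strptime(msg["timestamp"], TIMESTAMP_FORMAT)
            except (TypeError, ValueError):
                problems.append(f"message {i}: timestamp not in YYYY-MM-DD HH:MM:SS")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a large dataset in a single pass.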
Example Chat Breakdown
Below is a breakdown of a sample conversation to illustrate the structure and flow:
system: Provides guidelines to the assistant, such as responding with appropriate function calls and following specific rules regarding the customer’s bus ticket query.
user: The customer asks, "Can I change the seat for this booking?"
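function: The assistant calls check_if_seat_change_is_allowed to verify whether a seat change is permitted.
system: Tells the assistant how to relay the function output to the customer and when to escalate with '||AGENT CALL||'.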
assistant: Responds based on the function’s output, stating that seat change is not allowed within 24 hours of travel.
user: Insists on changing the seat.
assistant: Politely reiterates that a seat change cannot be made.
user: Asks the assistant to try again.
function: Calls findout_if_agent_call_is_required because the assistant cannot resolve the query within two attempts.
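A breakdown like the one above can be produced mechanically from a parsed Chat value, which is handy for spot-checking that a conversation has the expected flow. The sketch below lists each step by role and includes the function name for function entries. The helper summarize_flow is hypothetical and only reads the role and name fields described in this guide.

```python
def summarize_flow(messages):
    """Return one short label per message: the role, plus the name for function calls."""
    steps = []
    for msg in messages:
        role = msg.get("role", "unknown")
        if role == "function":
            steps.append(f"function: {msg.get('name', '<unnamed>')}")
        else:
            steps.append(role)
    return steps


# Abbreviated version of the sample conversation (content shortened for brevity).
sample = [
    {"role": "system", "content": "rules"},
    {"role": "user", "content": "Can I change the seat for this booking?"},
    {"role": "function", "name": "check_if_seat_change_is_allowed", "content": "..."},
    {"role": "assistant", "content": "Seat change is not allowed."},
]
for step in summarize_flow(sample):
    print(step)
# system
# user
# function: check_if_seat_change_is_allowed
# assistant
```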
This structured chat format enables RagaAI Catalyst to evaluate and analyze the flow of conversations effectively. Please ensure your dataset follows this format for seamless integration and analysis.