RAG Dataset
Once your project is created, you can upload datasets to it for evaluation.
1. Open your project from the Project list. This will take you to the Dataset tab.
2. From the options to create a new dataset, select the "Upload via CSV" method.
3. Click on the upload area and browse or drag and drop your local CSV file. Ensure the file size does not exceed 1 GB.
4. Enter a suitable name and an optional description for your dataset.
5. Click Next to proceed.
Next, you will be directed to map your dataset's schema to Catalyst's inbuilt schema, so that your column headings don't require editing.
Here is a list of Catalyst's inbuilt schema elements (definitions are for reference purposes and may vary slightly based on your use case):
| Schema Element | Definition |
|---|---|
| traceId | Unique ID associated with a trace |
| metadata | Any additional data not falling into a defined bucket; the user has to define the metadata type (numerical or categorical) |
| cost | Expense associated with generating a particular inference |
| expected_context | Context documents expected to be retrieved for a query |
| latency | Time taken for an inference to be returned |
| system_prompt | Predefined instruction provided to an LLM to shape its behaviour during interactions |
| traceUri | Unique identifier used to trace and log the sequence of operations during an LLM inference process |
| pipeline | Sequence of processes or stages that an input passes through before producing an output in LLM systems |
| response | Output generated by an LLM after processing a given prompt or query |
| context | Surrounding information or history provided to an LLM to inform and influence its responses |
| prompt | Input or query provided to an LLM that triggers the generation of a response |
| expected_response | Anticipated or ideal output that an LLM should produce in response to a given prompt |
| timestamp | Specific date and time at which an LLM action, such as an inference or a response, occurs |
This guide provides a step-by-step explanation of how to use the RagaAI Python SDK to upload data to your project. The example demonstrates how to manage datasets and upload a CSV file into the platform. The following sections cover initialisation, listing existing datasets, mapping the schema, and uploading the CSV data.
Ensure you have the RagaAI Python SDK installed. If not, you can install it using:
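```bash
pip install ragaai-catalyst
```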
You need your secret key, access key, and project name, which you can get by navigating to Settings > Authenticate in the UI.
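A minimal authentication sketch is shown below, assuming the SDK's `RagaAICatalyst` client; replace the placeholder values with your own credentials:

```python
from ragaai_catalyst import RagaAICatalyst

# Authenticate with the credentials from Settings > Authenticate
catalyst = RagaAICatalyst(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)
```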
Import the `Dataset` module from the `ragaai_catalyst` library to handle the dataset operations.
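```python
from ragaai_catalyst import Dataset
```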
Initialise the dataset manager for a specific project. This will allow you to interact with the datasets in that project.
Replace "demo_project"
with your actual project name.
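A sketch of the initialisation, using `"demo_project"` as a placeholder project name (the `project_name` keyword follows the SDK's documented pattern):

```python
# Initialise the dataset manager for a specific project
dataset_manager = Dataset(project_name="demo_project")
```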
You can list all the existing datasets within your project to check what data is already available.
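Assuming the SDK's `list_datasets()` helper on the dataset manager:

```python
# Fetch and display the datasets already present in the project
datasets = dataset_manager.list_datasets()
print("Existing Datasets:", datasets)
```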
This prints a list of existing datasets available in your project.
Retrieve the supported schema elements from the project. This will help you understand how to map your CSV columns to the dataset schema.
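A sketch, assuming a `get_schema_mapping()` helper on the dataset manager:

```python
# Retrieve the schema elements supported for column mapping
schema_elements = dataset_manager.get_schema_mapping()
print(schema_elements)
```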
This step returns the available schema elements that can be used for mapping your CSV columns.
Create a dictionary to map your CSV column names to the schema elements supported by RagaAI. For example:
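```python
# Map CSV column names (keys) to Catalyst schema elements (values)
schema_mapping = {
    'sql_context': 'context',
    'sql_prompt': 'prompt',
}
```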
In this case, the column `'sql_context'` in the CSV is mapped to `'context'` in the dataset, and `'sql_prompt'` is mapped to `'prompt'`.
Finally, use the `create_from_csv` function to upload the CSV data into the platform. Specify the CSV path, dataset name, and the schema mapping.
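A sketch of the upload call; the `csv_path` and `dataset_name` values below are placeholders:

```python
# Upload the CSV into the project using the schema mapping defined above
dataset_manager.create_from_csv(
    csv_path="path/to/your_data.csv",
    dataset_name="demo_dataset",
    schema_mapping=schema_mapping,
)
```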
Replace the `csv_path` and `dataset_name` with your CSV file path and desired dataset name, respectively.
After uploading, you can verify the upload by listing the datasets again or checking the project dashboard.
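For example:

```python
# Confirm the new dataset appears in the project
print(dataset_manager.list_datasets())
```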
Navigate to the Dataset tab inside your project to explore your dataset and run evals.