Upload dataset via CSV
Once your project is created, you can upload datasets to it. Here's how you can upload a dataset via a CSV file:
Select your newly created project from the project list.
Navigate to the Dataset tab within the project.
Click on the Create New Dataset button.
In the upload window, select the Upload CSV tab.
Click the upload area to browse for your local CSV file, or drag and drop it there. Ensure the file size does not exceed 1 GB.
Enter a suitable name for your dataset.
Click Next to proceed.
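Before starting the upload, it can help to check the file locally against the size limit mentioned above. The following is a minimal sketch, not part of Catalyst: the helper name check_csv_for_upload is hypothetical, and "1GB" is assumed to mean 10**9 bytes (the stricter reading).

```python
import os

# Assumption: "1GB" is read as 10**9 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 10**9


def check_csv_for_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if the file is unsuitable."""
    if not path.lower().endswith(".csv"):
        raise ValueError(f"{path!r} does not have a .csv extension")
    size = os.path.getsize(path)  # raises OSError if the file does not exist
    if size > max_bytes:
        raise ValueError(f"{path!r} is {size} bytes; the limit is {max_bytes}")
    return size
```

Calling check_csv_for_upload("my_data.csv") returns the size in bytes when the file passes both checks.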
Next, you will be directed to map your dataset's columns to Catalyst's inbuilt schema, so that you don't need to edit your column headings.
Here is a list of Catalyst's inbuilt schema elements (definitions are for reference purposes and may vary slightly based on your use case):
Uploading Data Using the Python SDK
This guide explains, step by step, how to use the RagaAI Python SDK to upload data to your project. The example shows how to manage datasets and upload a CSV file to the platform. The following sections cover initialisation, listing existing datasets, mapping the schema, and uploading the CSV data.
1. Prerequisites
Ensure you have the RagaAI Python SDK installed. If not, you can install it with pip: pip install ragaai-catalyst
You also need your secret key, access key, and project name, which you can find by navigating to Settings/Authenticate in the UI.
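One common way to keep these three values out of your scripts is to read them from environment variables. The sketch below is illustrative only: the variable names and the load_credentials helper are hypothetical, not something the SDK defines.

```python
import os

# Hypothetical environment variable names chosen for this sketch.
REQUIRED_VARS = (
    "CATALYST_ACCESS_KEY",
    "CATALYST_SECRET_KEY",
    "CATALYST_PROJECT_NAME",
)


def load_credentials(env=os.environ):
    """Collect the three required values, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {
        "access_key": env["CATALYST_ACCESS_KEY"],
        "secret_key": env["CATALYST_SECRET_KEY"],
        "project_name": env["CATALYST_PROJECT_NAME"],
    }
```

How the keys are then passed to the SDK depends on its authentication call, which is not covered in this section.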
2. Importing Required Modules
Import the Dataset class from the ragaai_catalyst library to handle dataset operations.
3. Initialise Dataset Management
Initialise the dataset manager for a specific project. This will allow you to interact with the datasets in that project.
Replace "demo_project" with your actual project name.
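The initialisation step might look like the sketch below. It assumes the Dataset constructor takes the project name as a keyword argument called project_name; the make_dataset_manager wrapper and its dataset_cls parameter are hypothetical and exist only so the sketch can be tried without the SDK installed.

```python
def make_dataset_manager(project_name, dataset_cls=None):
    """Create a dataset manager for one project.

    dataset_cls is a hypothetical hook for testing; by default the SDK's
    Dataset class is imported lazily when the function is called.
    """
    if not project_name or not project_name.strip():
        raise ValueError("project_name must be a non-empty string")
    if dataset_cls is None:
        # Imported here so a missing SDK surfaces as an ImportError at call time.
        from ragaai_catalyst import Dataset as dataset_cls
    # Assumption: the constructor accepts the project name as `project_name`.
    return dataset_cls(project_name=project_name.strip())
```

With the SDK installed, dataset_manager = make_dataset_manager("demo_project") gives you the object used in the following steps.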
4. List Existing Datasets
You can list all the existing datasets within your project to check what data is already available.
This prints a list of existing datasets available in your project.
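Listing datasets is also a convenient way to check that the name you plan to use is not already taken. The sketch below assumes the dataset manager exposes a list_datasets() method that returns dataset names as strings; the dataset_name_available helper is hypothetical.

```python
def dataset_name_available(dataset_manager, name):
    """Return True if no existing dataset in the project already uses `name`."""
    # Assumption: list_datasets() returns an iterable of dataset names (strings).
    existing = dataset_manager.list_datasets()
    return name not in set(existing)
```

For example, dataset_name_available(dataset_manager, "my_dataset") can be checked before the upload step.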
5. Load CSV Data
Use the pandas library to load your CSV file into a DataFrame. This step is not necessary for uploading the data but is useful to preview and manipulate your data.
Replace 'CSV_path' with the path to your CSV file.
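If you only need the column names and a few rows (for example, to plan the schema mapping) and pandas is not available, Python's standard csv module is enough. This is an alternative to the pandas preview described above, not a replacement for it; the preview_csv helper is hypothetical and assumes a UTF-8 file with a header row.

```python
import csv
from itertools import islice


def preview_csv(csv_path, n_rows=5):
    """Return (column_names, first n_rows data rows) without reading the whole file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        rows = list(islice(reader, n_rows))
    return header, rows
```

The returned header list gives you the exact column names to use as keys in the schema mapping.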
6. Get the Schema Elements
Retrieve the supported schema elements from the project. This will help you understand how to map your CSV columns to the dataset schema.
This step returns the available schema elements that can be used for mapping your CSV columns.
7. Create the Schema Mapping
Create a dictionary to map your CSV column names to the schema elements supported by RagaAI. For example:
In this case, the column 'sql_context' in the CSV is mapped to 'context' in the dataset, and 'sql_prompt' is mapped to 'prompt'.
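The mapping described above is a plain dictionary, so it can be checked locally before uploading. In the sketch below, the check_schema_mapping helper is hypothetical; it compares the mapping against the CSV's column names and against the list of supported schema elements retrieved in step 6.

```python
def check_schema_mapping(schema_mapping, csv_columns, schema_elements):
    """Return a list of problems; an empty list means the mapping looks usable."""
    problems = []
    for column, element in schema_mapping.items():
        if column not in csv_columns:
            problems.append(f"column {column!r} is not in the CSV header")
        if element not in schema_elements:
            problems.append(f"schema element {element!r} is not supported")
    return problems


# The mapping from the example: CSV column name -> schema element.
schema_mapping = {"sql_context": "context", "sql_prompt": "prompt"}
```

Pass your CSV's column names as csv_columns and the values returned in step 6 as schema_elements; any message in the returned list points to a key or value that needs fixing.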
8. Upload the Dataset from CSV
Finally, use the create_from_csv function to upload the CSV data into the platform. Specify the CSV path, dataset name, and the schema mapping.
Replace the csv_path and dataset_name with your CSV file path and desired dataset name, respectively.
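A thin wrapper around the upload call can catch simple mistakes before any data is sent. This is a sketch under assumptions: create_from_csv, csv_path, and dataset_name come from the step above, while the schema_mapping keyword name and the upload_csv helper itself are assumed for illustration.

```python
import os


def upload_csv(dataset_manager, csv_path, dataset_name, schema_mapping):
    """Upload one CSV after basic local checks; returns whatever the SDK call returns."""
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(csv_path)
    if not schema_mapping:
        raise ValueError("schema_mapping must map at least one column")
    # Assumption: create_from_csv accepts these three keyword arguments.
    return dataset_manager.create_from_csv(
        csv_path=csv_path,
        dataset_name=dataset_name,
        schema_mapping=schema_mapping,
    )
```

Failing locally on a wrong path or an empty mapping is usually easier to diagnose than an error returned by the platform.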
9. Verifying the Upload
After uploading, you can verify the upload by listing the datasets again or checking the project dashboard.
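To verify by listing, compare the dataset list taken before the upload with the one taken after it. The newly_created helper below is hypothetical and works on any two lists of dataset names, however you obtained them.

```python
def newly_created(before, after):
    """Return dataset names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))
```

If your dataset name appears in the result, the upload has been registered in the project; if it does not, check the project dashboard, since processing may not be immediate.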
10. Exploring the Dataset
Navigate to the Dataset tab inside your project to explore your dataset and run evals.