RAG Dataset

Once your project is created, you can upload datasets to it for evaluation.

Last updated 7 months ago

Via UI:

  1. Open your Project from the Project list.

  2. It will take you to the Dataset tab.

  3. From the options to create a new dataset, select the "Upload via CSV" method.

  4. Click the upload area and browse, or drag and drop, your local CSV file. Ensure the file size does not exceed 1 GB.

  5. Enter a name and an optional description for your dataset.

  6. Click Next to proceed.

Next, you will be directed to map your dataset's columns to Catalyst's inbuilt schema, so that your column headings don't require editing.

Here is a list of Catalyst's inbuilt schema elements (definitions are for reference purposes and may vary slightly based on your use case):

Schema Element: Definition

  • traceId: Unique ID associated with a trace
  • metadata: Any additional data that does not fit a defined bucket. The user must specify the metadata type (numerical or categorical)
  • cost: Expense associated with generating a particular inference
  • expected_context: Context documents expected to be retrieved for a query
  • latency: Time taken for an inference to be returned
  • system_prompt: Predefined instruction provided to an LLM to shape its behaviour during interactions
  • traceUri: Unique identifier used to trace and log the sequence of operations during an LLM inference process
  • pipeline: Sequence of processes or stages that an input passes through before producing an output in LLM systems
  • response: Output generated by an LLM after processing a given prompt or query
  • context: Surrounding information or history provided to an LLM to inform and influence its responses
  • prompt: Input or query provided to an LLM that triggers the generation of a response
  • expected_response: Anticipated or ideal output that an LLM should produce in response to a given prompt
  • timestamp: Specific date and time at which an LLM action, such as an inference or a response, occurs
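To make the mapping concrete, here is a minimal sketch of a RAG dataset CSV built with pandas. The column names are chosen to match the schema elements above (so no remapping would be needed on upload); the row contents are illustrative only.

```python
import pandas as pd

# A two-row illustrative dataset; column names mirror Catalyst's
# inbuilt schema elements listed above.
df = pd.DataFrame({
    "prompt": ["What is RAG?", "Define hallucination."],
    "context": ["RAG combines retrieval with generation.",
                "Hallucination is unsupported model output."],
    "response": ["RAG augments an LLM with retrieved documents.",
                 "A hallucination is output not grounded in the context."],
    "expected_response": ["RAG retrieves documents to ground generation.",
                          "Output that is not supported by the provided context."],
})
# Write the file that would later be uploaded.
df.to_csv("rag_dataset.csv", index=False)
```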

Via SDK:

This guide provides a step-by-step explanation of how to use the RagaAI Python SDK to upload data to your project. The example demonstrates how to manage datasets and upload a CSV file into the platform. The following sections cover initialisation, listing existing datasets, mapping the schema, and uploading the CSV data.

1. Prerequisites

  • Ensure you have the RagaAI Python SDK installed. If not, you can install it using:

    pip install ragaai-catalyst
  • You need your secret key, access key, and project name, which you can find by navigating to Settings > Authenticate in the UI.
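In practice these keys are usually exported as environment variables rather than hard-coded. The helper below is a minimal sketch that fails fast when a key is missing; the variable names CATALYST_ACCESS_KEY and CATALYST_SECRET_KEY are illustrative, not prescribed by the SDK.

```python
import os

def load_catalyst_credentials():
    """Read Catalyst keys from the environment; raise early if any are missing."""
    # Illustrative variable names; use whatever your deployment standardises on.
    names = ("CATALYST_ACCESS_KEY", "CATALYST_SECRET_KEY")
    creds = {name: os.environ.get(name) for name in names}
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return creds
```

The returned values can then be passed to the SDK's authentication entry point before any dataset operations.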

2. Importing Required Modules

Import the Dataset module from the ragaai_catalyst library to handle the dataset operations.

from ragaai_catalyst import Dataset
import pandas as pd

3. Initialise Dataset Management

Initialise the dataset manager for a specific project. This will allow you to interact with the datasets in that project.

# Initialize Dataset management for a specific project
dataset_manager = Dataset(project_name="demo_project")

Replace "demo_project" with your actual project name.

4. List Existing Datasets

You can list all the existing datasets within your project to check what data is already available.

# List existing datasets
datasets = dataset_manager.list_datasets()
print("Existing Datasets:", datasets)

This prints a list of existing datasets available in your project.

5. Get the Schema Elements

Retrieve the supported schema elements from the project. This will help you understand how to map your CSV columns to the dataset schema.

# Get the schema elements
schemaElements = dataset_manager.get_csv_schema()['data']['schemaElements']
print('Supported column names: ', schemaElements)

This step returns the available schema elements that can be used for mapping your CSV columns.

6. Create the Schema Mapping

Create a dictionary to map your CSV column names to the schema elements supported by RagaAI. For example:

# Create the schema mapping accordingly
schema_mapping = {'sql_context': 'context', 'sql_prompt': 'prompt'}

In this case, the column 'sql_context' in the CSV is mapped to 'context' in the dataset, and 'sql_prompt' is mapped to 'prompt'.
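Because a typo on either side of the mapping only surfaces at upload time, it can be worth validating the mapping first. The sketch below uses a hypothetical helper (not part of the SDK) that checks every source column exists in the CSV and every target is a supported schema element:

```python
def validate_schema_mapping(csv_columns, schema_elements, schema_mapping):
    """Return a list of problems with a CSV-column -> schema-element mapping."""
    problems = []
    for source, target in schema_mapping.items():
        if source not in csv_columns:
            problems.append(f"CSV has no column named '{source}'")
        if target not in schema_elements:
            problems.append(f"'{target}' is not a supported schema element")
    return problems

# Example: 'sql_promt' is a typo on the CSV side and gets flagged.
problems = validate_schema_mapping(
    csv_columns=["sql_context", "sql_prompt"],
    schema_elements=["context", "prompt", "response"],
    schema_mapping={"sql_context": "context", "sql_promt": "prompt"},
)
```

An empty result means the mapping is safe to pass to the upload call.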

7. Upload the Dataset from CSV

Finally, use the create_from_csv function to upload the CSV data into the platform. Specify the CSV path, dataset name, and the schema mapping.

# Create a dataset from CSV
dataset_manager.create_from_csv(
    csv_path='/content/synthetic_text_to_sql_gpt_4o_mini.csv',
    dataset_name='csv_upload31',
    schema_mapping=schema_mapping
)

Replace the csv_path and dataset_name with your CSV file path and desired dataset name, respectively.

8. Verifying the Upload

After uploading, you can verify the upload by listing the datasets again or checking the project dashboard.

# List datasets to verify the upload
datasets = dataset_manager.list_datasets()
print("Updated Datasets:", datasets)
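If the upload runs inside a script or CI job, the listing step above can be turned into an explicit check. This sketch assumes `list_datasets` returns dataset names as strings:

```python
def check_dataset_exists(datasets, dataset_name):
    """Raise if the expected dataset name is absent from the listing."""
    if dataset_name not in datasets:
        raise RuntimeError(
            f"Dataset '{dataset_name}' not found; upload may have failed. "
            f"Available: {datasets}"
        )
    return True

# Example with a stubbed listing:
check_dataset_exists(["csv_upload31", "demo_set"], "csv_upload31")
```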

9. Exploring the Dataset in the UI

Navigate to the Dataset tab inside your project to explore your dataset and run evaluations.

(Screenshots: "Upload Via CSV" dialog and the uploaded dataset view)