Quickstart
This guide outlines the steps required to run the Gateway workflow in order to route real-time prompts to various available LLMs based on performance and/or fallback rules.
Step 1: Navigate to Gateway
From the left-hand side pane, select the "Gateway" option
Step 2: Configure API Keys
Users have the freedom to use separate LLM endpoints for their production (deployed) applications. Therefore, there exists a separate section within Gateway to specify the same.
This can be done by clicking the "Configure API Keys" button on the top-right, which opens up the following sidebar:
In case users want to use the same keys they do for metric evaluations on Catalyst, the “Import from Settings” button will allow them to directly import the same (if already declared in Catalyst Settings) to the respective field here.
Step 3: Configure a RagaAI Classifier
As mentioned above, users can setup a classifier model using metric runs from their existing Catalyst datasets. The default page lists existing trained classifiers (if any).
To do so, users can follow the steps below:
Click the "Train New Classifier" button placed on the top-right of this screen. Doing so will open the setup dialog:
Users will be prompted to add a classifier name (unique), followed by the following:
An existing Catalyst dataset - This dataset must contain a “prompt” column, along with a common metric computed on at least two different LLM responses for each prompt.
The common metric - refers to the common metric (e.g, Context Precision) computed on various LLM responses inside the dataset selected above.
Clicking “Next” takes users to the schema mapping screen.
Map your dataset's column names with the relevant classifier schema elements, most importantly prompt and metric values:
Note that the configuration for each metric evaluation is stored during computation, so the model associated with a particular metric column is already known in the background, and hence need not be specified here.
Clicking “Train” will add the classifier training job in the Job ID window, and once complete, users can find it on the default screen.
Clicking on individual classifiers will allow users to view their corresponding details as shown (no edits allowed, view-only):
Users can delete existing classifiers using the three-dot menu on the top-right corner of each card on the default screen.
Step 3: Create a Deployment
Users can create multiple deployments - which are basically a combination of rules, settings, classifier details (as described above), and model choices for real-time inferences. That said, only one deployment can be active at any time.
Deployments can follow one of two routing logics: custom ordering and auto-routing.
Custom Order
To create a new custom order deployment:
Navigate to the "Deploy" section of the Gateway:
Click on "New Deployment" to open the following dialog:
The default choice is set to "Custom Order", wherein users can choose their preferred models from a multi-select dropdown (based on the API keys declared). Once chosen, users can drag-and-drop them in order of their preference.
Once the preference order is chosen, users can select the rules based on which the order will be executed by clicking "Next":
Users can enable each of the following options using a checkbox:
Model Response Timeout - time value in seconds for how long the Catalyst Router should wait for a model to inference a query before moving on to the next one
Retries on Model Timeout - numerical value of the number of times the same model should be retried on hitting the timeout value set above
Retries on Model Failure - numerical value of the number of times the same model should be retried in case a failure/error code is received
Auto-Routing
To create an auto-routiing deployment:
On the "New Deployment" dialog, select Auto-Routing from the preference dropdown:
Till users train their own classifiers, the above screen will be shown. Once 1+ classifiers have been trained, a list will be shown as follows:
For any one deployment, only one classifier can be enabled. Toggling one classifier will disable the others.
Clicking "Next" will allow users to set the rules, same as above.
Clicking "Save" will add the deployment to the default "Deploy" screen, similar to the Projects flow on the Catalyst app.
Once a deployment (custom/auto) has been created, its details can be viewed by clicking its respective card on the "Deploy" screen:
Step 4: Activate a Deployment
To activate a created deployment, simply switch on the toggle switch associated with a particular deployment:
Doing so will expose a code snippet, containing the deployment ID associated with a particular deployment as below:
The copy button next to the snippet will allow users to bring the code snippet to their application's codebase/SDK. Users can simply edit the message with their own variables to enable Routing.
Last updated