Gateway
Also referred to interchangeably as the "Router".
Exclusive to enterprise customers. Contact us to activate this feature.
The Catalyst Gateway is a post-deployment solution that helps enterprises leverage multiple Large Language Models (LLMs) in real-time operations. Users can set up manual or autonomous routing rules that decide which model answers each individual prompt in production applications.
User Workflows:
Users can define the routing logic described above in two key ways, complemented by performance analytics for every deployment:
1. Auto-Routing
Allows users to quickly train a RagaAI classifier model on an existing Catalyst dataset. By running a common metric (e.g., Context Precision) on responses from multiple LLMs, the classifier learns which LLM best answers a given class of prompts.
Once trained, a classifier can be stored on the platform and deployed with a single line of code in the user's application codebase. Users can set up multiple classifiers for different use cases.
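As a rough illustration of this workflow, the sketch below shows training and deploying an auto-routing classifier. The `catalyst_gateway` module, the `CatalystGateway` client, and all method and parameter names are hypothetical placeholders, not the shipped SDK; consult the platform documentation for the actual identifiers.

```python
# Hypothetical sketch of the auto-routing workflow. The module, class, and
# method names below are illustrative placeholders, not the actual SDK.
from catalyst_gateway import CatalystGateway  # hypothetical client

gateway = CatalystGateway(api_key="YOUR_ENTERPRISE_KEY")

# Train a routing classifier from an existing Catalyst dataset: metric scores
# (e.g., Context Precision) per candidate LLM serve as the training signal.
classifier = gateway.train_classifier(
    dataset="support-bot-eval",                # existing Catalyst dataset
    candidate_models=["model-a", "model-b"],   # LLMs being compared
    metric="context_precision",
)

# Single-line deployment in the application: each prompt is routed to the
# model the classifier predicts will answer it best.
response = classifier.route("How do I reset my billing address?")
```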
2. Manual (Custom) Ordering
Alternatively, Catalyst lets users define a prioritised fallback list of LLMs for each request, rerouting to alternative models on timeouts or extended response times. This ensures high availability: if the primary choice fails, the Gateway automatically switches to backup models, maintaining service continuity.
Users simply set up API keys for all available models and then define the order and associated routing rules. As with auto-routing, the configuration is deployed with a single line of code.
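A minimal sketch of such a configuration follows, under the same assumption as above: `CatalystGateway`, `FallbackPolicy`, and every method and parameter name here are hypothetical stand-ins for the enterprise SDK, not its real interface.

```python
# Hypothetical sketch of a manual fallback configuration; all identifiers
# here are illustrative placeholders, not the actual SDK.
from catalyst_gateway import CatalystGateway, FallbackPolicy  # hypothetical

gateway = CatalystGateway(api_key="YOUR_ENTERPRISE_KEY")

# Register provider API keys once per deployment.
gateway.set_provider_keys({"provider-a": "KEY_A", "provider-b": "KEY_B"})

# Prioritised fallback order with rerouting rules.
policy = FallbackPolicy(
    order=["model-a", "model-b", "model-c"],  # try models in this order
    timeout_seconds=10,                       # reroute after this timeout
    max_retries_per_model=1,                  # then fall through to the next
)

# Same single-line deployment pattern as auto-routing.
response = gateway.complete("Summarise this ticket.", policy=policy)
```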
3. Performance Analytics
All real-time logs related to a particular deployment are stored in a Catalyst dataset. This provides live tracking of model performance metrics, including response times, accuracy, and cost-effectiveness.
Additionally, historical data can be used to refine routing algorithms, continuously enhancing decision-making processes.
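Because logs land in an ordinary Catalyst dataset, reviewing performance can be as simple as a dataset query. The sketch below is again hypothetical; the accessor names and log field names are illustrative assumptions, not the shipped SDK.

```python
# Hypothetical sketch of reading back deployment logs; accessor and field
# names are illustrative placeholders, not the actual SDK.
from catalyst_gateway import CatalystGateway  # hypothetical client

gateway = CatalystGateway(api_key="YOUR_ENTERPRISE_KEY")

# Real-time logs for a deployment live in a Catalyst dataset, so a
# performance review is an ordinary dataset query.
logs = gateway.get_logs(deployment="support-bot-router")
for record in logs:
    print(record["model"], record["latency_ms"], record["cost_usd"])
```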
Benefits
Increased Accuracy and Relevance: By leveraging the strengths of multiple LLMs, users receive more accurate and contextually relevant responses, enhancing overall productivity.
Cost Efficiency: Optimises API usage, reducing unnecessary expenditures by selecting the most cost-effective model for each request.
Enhanced Reliability: Minimises downtime and service disruptions through intelligent load balancing and failover strategies, ensuring consistent performance.
Scalability: Supports enterprise-level operations, allowing for seamless expansion as new LLMs become available or as usage demands increase.