Guardrails

Exclusive to enterprise customers. Contact us to activate this feature.

RagaAI’s Guardrails feature supports the secure and reliable operation of large language models (LLMs) by enforcing guidelines during response generation. It provides a framework for handling diverse and emerging types of prompts with minimal risk. Guardrails are tailored to each use case and designed for low latency, so security and quality checks do not slow down responses.

These guardrails protect against potential risks like hallucinations, inappropriate content, and manipulation attempts while ensuring that LLM outputs meet the specific needs of each application. Whether you are developing chatbots, automated customer support, or complex decision-making systems, RagaAI Guardrails ensure that your model behaves reliably and securely in real time.

Example: guardrails for a chatbot use case:

  1. Context Adherence (Closed Domain Hallucinations): Ensures the response strictly follows the context in closed-domain use cases, avoiding any unnecessary or incorrect external information.

  2. Correct Language: Validates the correctness of grammar and syntax in the response to maintain high communication standards.

  3. Gibberish Text Detection: Prevents the generation of nonsensical or incoherent responses, ensuring clarity and relevance.

  4. Instruction Adherence: Measures how well the model follows instructions provided in the prompt, crucial for mission-critical applications.

  5. NSFW Text Detection: Detects and blocks any inappropriate content (e.g., explicit or harmful language) from being generated.

  6. Off-Topic Detection: Ensures that the response remains relevant to the prompt and doesn’t deviate into unrelated topics.

  7. Profanity-Free Checks: Monitors and blocks profane language, ensuring the content is appropriate and professional.

  8. Prompt Injection Protection: Identifies and mitigates attempts to manipulate the model through crafted prompts, safeguarding the integrity of responses.

  9. Response Length Appropriateness: Ensures that responses are neither too short nor unnecessarily long, optimizing the information provided.

  10. Restrict to Topic: Confirms that the response is aligned with the specified topic, keeping the output focused and relevant.

  11. Input and Output Toxic Language: Detects and filters out toxic or harmful language in both the input and output, promoting respectful and positive communication.

  12. Unusual Prompt Detection: Identifies unusual or trick prompts that could compromise the model’s integrity or output.

  13. Bias Detection: Detects and measures bias in the model’s responses to ensure fairness and avoid discriminatory outcomes.

  14. Financial Tone Validation: Ensures that the generated response maintains an appropriate tone in financial or professional contexts.

  15. Language Formality: Evaluates the level of formality in the language, ensuring it matches the expected tone (casual or formal) for the use case.

  16. Personally Identifiable Information (PII) Protection: Flags any PII in both input prompts and generated responses, maintaining privacy and compliance.

  17. Politeness Check: Ensures the response is polite and respectful, especially important in customer interactions and sensitive communications.

  18. Sensitive Topic Detection: Flags sensitive subjects in the response, ensuring that potentially harmful topics are avoided.

  19. Sexism Detection: Identifies and measures sexist language in both input prompts and generated outputs, ensuring inclusivity and fairness.

  20. Tone Assessment: Evaluates the tone of the input and output, ensuring consistency with the intended emotional or professional tone.

  21. Correctness (Open-domain Hallucinations): Validates factual correctness in responses, especially important when dealing with open-domain prompts.
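To make the idea of composable checks concrete, here is a minimal sketch of how three of the guardrails above (response length, profanity, and PII) could be chained over a model response. It is illustrative only and is not the RagaAI API: every name in it (`CheckResult`, `run_guardrails`, `is_allowed`, the word list, the email pattern, and the word-count thresholds) is a hypothetical assumption, and production guardrails typically rely on trained models rather than simple rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# Hypothetical blocklist and pattern, chosen only for illustration.
PROFANITY = {"damn", "hell"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def response_length(text: str, min_words: int = 3, max_words: int = 120) -> CheckResult:
    """Pass when the whitespace-separated word count is within the bounds."""
    n = len(text.split())
    return CheckResult("response_length", min_words <= n <= max_words, f"{n} words")


def profanity_free(text: str) -> CheckResult:
    """Pass when no blocklisted word appears in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = sorted(words & PROFANITY)
    return CheckResult("profanity_free", not hits, ", ".join(hits))


def pii_free(text: str) -> CheckResult:
    """Pass when no email address is found (a stand-in for broader PII detection)."""
    hits = EMAIL_RE.findall(text)
    return CheckResult("pii_free", not hits, ", ".join(hits))


Check = Callable[[str], CheckResult]
OUTPUT_CHECKS: List[Check] = [response_length, profanity_free, pii_free]


def run_guardrails(text: str, checks: List[Check]) -> List[CheckResult]:
    """Run every check and collect the results, so all failures are reported."""
    return [check(text) for check in checks]


def is_allowed(results: List[CheckResult]) -> bool:
    """A response is allowed only if every guardrail passed."""
    return all(r.passed for r in results)


if __name__ == "__main__":
    reply = "Contact me at jane.doe@example.com for the damn refund status today"
    for result in run_guardrails(reply, OUTPUT_CHECKS):
        print(result.name, "PASS" if result.passed else "FAIL", result.detail)
```

Running all checks instead of stopping at the first failure keeps the report complete, which helps when deciding whether to block, rewrite, or regenerate a response. The same list-of-checks pattern could be applied to the input prompt for guardrails that cover both directions.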

By implementing these guardrails, RagaAI provides a comprehensive and secure environment for LLM development, protecting both developers and their end users from harmful or incorrect outputs. The checks are designed to run with low latency, so responses are delivered in real time without compromising security or quality.
