API Trust & Safety Tools

Whether you are just starting to set up Trust & Safety for your API deployment of Claude, or your deployment is already running, here are some strategies to consider when building your own AI safety program. These suggestions are designed to help you comply with our Terms of Service and Usage Policy, which prohibit certain uses of Claude. Failure to comply with the Terms and Usage Policy may result in suspension or termination of your access to the services.

Basic Safeguards

  • Store an ID with each API call so that, if you need to pinpoint specific violative content, you can find it in your systems.

  • Consider assigning IDs to users, which can help you track specific individuals who are violating Anthropic’s AUP, allowing for more targeted action in cases of misuse.

    • The choice to pass IDs to Anthropic through the API is up to you, but if they are provided, we can pinpoint violations more precisely. To help protect end users' privacy, any IDs passed should be cryptographically hashed (see the sketch after this list).

  • Consider requiring customers to sign up for an account on your platform before they can use Claude.

  • Ensure your customers understand which uses are permitted.

  • Warn, throttle, or suspend users who repeatedly violate Anthropic’s Terms of Service and Usage Policy.
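
As a minimal sketch of the ID suggestions above, assuming the Anthropic Python SDK and a placeholder model alias, you can hash your internal user ID before passing it in the Messages API metadata, and keep the returned request ID for later lookup:

```python
import hashlib

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
audit_log: list[tuple[str, str]] = []  # stand-in for your request-audit store


def hashed_user_id(raw_id: str) -> str:
    # Hash the internal ID so no raw end-user identifier leaves your systems.
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


def ask_claude(raw_user_id: str, prompt: str):
    uid = hashed_user_id(raw_user_id)
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # substitute the model your deployment uses
        max_tokens=1024,
        # metadata.user_id lets Anthropic associate any violation with this
        # end user without learning who they are.
        metadata={"user_id": uid},
        messages=[{"role": "user", "content": prompt}],
    )
    # Store the request ID next to the hashed user ID so violative content
    # can be pinpointed in your systems later.
    audit_log.append((response.id, uid))
    return response
```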
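
For the warn/throttle/suspend step, an illustrative strike counter; the thresholds and the in-memory store are placeholders for your own policy and database:

```python
from collections import defaultdict

WARN_AT, THROTTLE_AT, SUSPEND_AT = 1, 3, 5  # placeholder thresholds
violations: defaultdict[str, int] = defaultdict(int)


def record_violation(user_id: str) -> str:
    violations[user_id] += 1
    count = violations[user_id]
    if count >= SUSPEND_AT:
        return "suspend"   # block further access for this user
    if count >= THROTTLE_AT:
        return "throttle"  # e.g. lower the user's rate limit
    return "warn"          # surface a policy reminder to the user
```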

Intermediate Safeguards

  • Create customization frameworks that restrict end-user interactions with Claude to a limited set of prompts, or that only allow Claude to draw on a specific knowledge corpus you already maintain; either approach reduces users’ ability to engage in violative behavior (see the template sketch after this list).

  • Enable additional safety filters: free real-time moderation tooling built by Anthropic that helps detect potentially harmful prompts and manage actions to reduce harm.

  • For Bedrock Customers:

    • Activate your private S3 bucket so you can store prompts and completions for your own evaluation (a logging-configuration sketch follows this list).
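
As a sketch of the customization-framework idea above, end users could be limited to a fixed set of prompt templates and supply only the fill-in fields; the template names and fields here are hypothetical:

```python
# Hypothetical allow-list: end users choose a template and supply only the
# fill-in fields, rather than writing free-form prompts.
TEMPLATES = {
    "summarize_doc": "Summarize the following internal document:\n\n{document}",
    "answer_from_kb": (
        "Answer the question using only the knowledge-base excerpt provided.\n\n"
        "Excerpt:\n{excerpt}\n\nQuestion: {question}"
    ),
}


def build_prompt(template_id: str, **fields: str) -> str:
    if template_id not in TEMPLATES:
        raise ValueError(f"Unknown template: {template_id}")
    return TEMPLATES[template_id].format(**fields)
```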
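
For the Bedrock item, one way to capture prompts and completions is Bedrock's model invocation logging. This sketch assumes boto3's put_model_invocation_logging_configuration operation, a placeholder bucket name, and a bucket policy that already grants the Bedrock service write access:

```python
import boto3

bedrock = boto3.client("bedrock")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "s3Config": {
            "bucketName": "my-claude-audit-logs",  # placeholder bucket name
            "keyPrefix": "bedrock-invocations/",
        },
        # Deliver prompt and completion text so you can evaluate it later.
        "textDataDeliveryEnabled": True,
    }
)
```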

Advanced Safeguards

Comprehensive Safeguards

  • Set up an internal human review system to flag prompts that Claude (used as a content moderator) or a moderation API marks as harmful, so you can intervene to restrict or remove users with high violation rates, as sketched below.
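
A minimal sketch of that flow, assuming the Anthropic Python SDK, a deliberately simple yes/no moderation prompt, and a standard-library queue standing in for your human-review or ticketing system:

```python
from queue import Queue

import anthropic

client = anthropic.Anthropic()
review_queue: Queue = Queue()  # stand-in for your human-review system

MODERATION_PROMPT = (
    "You are a content moderator. Reply with only Y if the user request "
    "below violates the usage policy, or N if it does not.\n\n"
    "User request: {prompt}"
)


def flag_for_review(user_id: str, prompt: str) -> bool:
    # Returns True (and enqueues the prompt for a human) if Claude marks it harmful.
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # a fast model keeps moderation cheap
        max_tokens=5,
        messages=[{"role": "user", "content": MODERATION_PROMPT.format(prompt=prompt)}],
    )
    verdict = response.content[0].text.strip().upper()
    if verdict.startswith("Y"):
        review_queue.put({"user_id": user_id, "prompt": prompt})
        return True
    return False
```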
