Skip to main content
All CollectionsTrust & Safety
API Trust & Safety Tools
API Trust & Safety Tools
Updated over 2 weeks ago

Whether you are just starting the process of setting up Trust & Safety for your API deployment of Claude, or your deployment is already running, here are some strategies to consider when building your own AI safety program.

Basic Safeguards

  • Store IDs linked with each API call, so if you need to pinpoint specific violative content you have the ability to find it in your systems.

  • Consider assigning IDs to users, which can help you track specific individuals who are violating Anthropic’s AUP, allowing for more targeted action in cases of misuse.

    • The choice to pass IDs to Anthropic through the API is up to you. But, if provided, we can more precisely pinpoint violations. To help protect end-users' privacy, any IDs passed should be cryptographically hashed.

  • Consider requiring customer to sign-up for an account on your platform before utilizing Claude

  • Ensure your customers understand permitted uses

Intermediate Safeguards

  • Create customization frameworks that restrict end-user interactions with Claude to a limited set of prompts or only allow Claude to review a specific knowledge corpus that you already have, which will decrease the ability of users to engage in violative behavior.

  • Enable additional safety filters - free real-time moderation tooling built by Anthropic for helping detect potentially harmful prompts and managing real-time actions to reduce harm

  • For Bedrock Customers:

    • Activate your private S3 bucket in order to store prompts and completions for your own evaluation

Advanced Safeguards

Comprehensive Safeguards

  • Set up an internal human review system to flag prompts that are marked by Claude (being used for content moderation) or a moderation API as harmful so you can intervene to restrict or remove users with high violation rates.

Did this answer your question?