API Documentation

Everything you need to integrate SafeModeration into your platform.

Quick start

Get up and running in under 60 seconds.

Step 1

Get your API key

Sign up at safemoderation.com/pricing to start your free trial. Your API key will be emailed to you immediately after checkout. It looks like this: sm_live_a1b2c3d4e5f6…

Step 2a

Moderate text

bash
curl -X POST https://api.safemoderation.com/.netlify/functions/moderate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "media_type": "text",
    "content": "Hello, how are you today?",
    "reference_id": "comment_84729"
  }'

javascript
const response = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: 'Hello, how are you today?',
      reference_id: 'comment_84729',
    }),
  }
);
const data = await response.json();
console.log(data.decision); // "allow"

python
import requests

response = requests.post(
    'https://api.safemoderation.com/.netlify/functions/moderate',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    },
    json={
        'media_type': 'text',
        'content': 'Hello, how are you today?',
        'reference_id': 'comment_84729',
    }
)
data = response.json()
print(data['decision'])  # "allow"

Step 2b

Moderate an image

bash
curl -X POST https://api.safemoderation.com/.netlify/functions/moderate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "media_type": "image",
    "content": "https://example.com/user-upload.jpg",
    "reference_id": "post_a8f9c2"
  }'

javascript
const response = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'image',
      content: 'https://example.com/user-upload.jpg',
      reference_id: 'post_a8f9c2',
    }),
  }
);
const data = await response.json();
console.log(data.decision); // "block"

python
import requests

response = requests.post(
    'https://api.safemoderation.com/.netlify/functions/moderate',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    },
    json={
        'media_type': 'image',
        'content': 'https://example.com/user-upload.jpg',
        'reference_id': 'post_a8f9c2',
    }
)
data = response.json()
print(data['decision'])  # "block"

Step 3

Read the response

Text response:

json
{
  "request_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "reference_id": "comment_84729",
  "decision": "allow",
  "confidence": 0.95,
  "categories": {
    "hate_speech": 0.01,
    "harassment_bullying": 0.02,
    "adult_content": 0.00,
    "violence_gore": 0.00,
    "spam_scam": 0.01,
    "suicide_self_harm": 0.00,
    "pii_exposure": 0.00,
    "profanity": 0.00
  },
  "warnings": [],
  "usage": {
    "credits_used": 1,
    "monthly_credits": 15000
  }
}

Image response:

json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "reference_id": "post_a8f9c2",
  "decision": "block",
  "confidence": 0.92,
  "categories": {
    "adult_content": 0.04,
    "violence_gore": 0.92,
    "hate_speech": 0.01,
    "suicide_self_harm": 0.00,
    "weapons": 0.78,
    "drugs": 0.00,
    "alcohol": 0.00,
    "tobacco": 0.00
  },
  "warnings": [],
  "usage": {
    "credits_used": 47,
    "monthly_credits": 15000
  }
}

The decision field tells you what to do with the content.

Reference ID

reference_id is a required field that links each moderation result back to the record in your own data model. It is stored in the log and echoed in every response. It has no effect on the moderation decision itself.

| Field | Required | Format | Description |
|---|---|---|---|
| reference_id | Yes | String, 1-256 chars, alphanumeric plus ._-/: | Your unique identifier for this content. Use your database row ID, post slug, or comment ID: any key that lets you look up the original record. |
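Because requests with a malformed reference_id are rejected with a 400, it can be worth validating the format before calling the API. A minimal sketch; isValidReferenceId is a hypothetical helper mirroring the documented format, not part of any SDK:

```javascript
// Matches the documented reference_id format:
// 1-256 characters, alphanumeric plus . _ - / :
const REFERENCE_ID_RE = /^[A-Za-z0-9._\/:-]{1,256}$/;

function isValidReferenceId(id) {
  return typeof id === 'string' && REFERENCE_ID_RE.test(id);
}
```

A failed check here means the request would draw a 400 from the API, so you can surface the error without spending a round trip.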

Why reference_id is required

SafeModeration assigns its own request_id to every call. reference_id is yours: it makes the moderation log immediately actionable without a secondary lookup. When a result comes back block, your code already knows exactly which record to act on.

Worked example

A forum stores user comments in a comments table, each with an integer primary key. When a user submits a new comment, the forum's backend calls SafeModeration before writing the row to the database:

javascript
const result = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: comment.body,
      reference_id: `comment_${comment.id}`,  // e.g. "comment_84729"
    }),
  }
).then(r => r.json());

if (result.decision === 'block') {
  // reference_id is echoed back, no extra lookup needed
  await markCommentRejected(result.reference_id);
}

The response echoes "reference_id": "comment_84729", so the forum can act on the result without tracking SafeModeration's internal request_id at all.

Metadata

The optional metadata field accepts any plain JSON object you want stored alongside the log record and echoed in the response. It has no effect on the moderation decision itself.

| Field | Required | Format | Description |
|---|---|---|---|
| metadata | No | Plain object, max 4,096 bytes (UTF-8 JSON) | Arbitrary key-value pairs you want stored with the log record. Common uses: content type, author ID, thread ID, locale, and other platform-specific context. |

What to put in metadata

Metadata is a free-form envelope: use any keys that make sense for your platform. Common examples:

  • content_type: your category for the content (e.g. comment, post, profile), useful for filtering in your moderation dashboard
  • author_id: identifier of the user who created the content, enabling author-level abuse tracking and repeat-offender detection
  • thread_id, locale, client_version: any other context your team finds useful when reviewing flagged content
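Since metadata is capped at 4,096 bytes of UTF-8 JSON, you may want to check the serialized size client-side before sending. A minimal sketch; metadataFits is a hypothetical helper, not part of any SDK:

```javascript
// metadata is limited to 4,096 bytes of serialized UTF-8 JSON.
const METADATA_BYTE_LIMIT = 4096;

function metadataFits(metadata) {
  // Measure the UTF-8 byte length, not the character count —
  // multi-byte characters count more than once toward the limit.
  const bytes = new TextEncoder().encode(JSON.stringify(metadata)).length;
  return bytes <= METADATA_BYTE_LIMIT;
}
```

Checking before the request lets oversized objects fail fast in your own code instead of drawing a 400 from the API.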

Worked example

javascript
const result = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: comment.body,
      reference_id: `comment_${comment.id}`,
      metadata: {
        content_type: 'comment',
        author_id: `user_${comment.authorId}`,
        thread_id: `thread_${comment.threadId}`,
      },
    }),
  }
).then(r => r.json());

// result.metadata is echoed back exactly as sent
console.log(result.metadata.author_id);  // the author_id you sent

The metadata object is echoed in the response unchanged, so your downstream code can read any field it needs without an extra lookup.

Authentication

SafeModeration uses API key authentication. Pass your key as a Bearer token in the Authorization header on every request.

http
Authorization: Bearer sm_live_your_key_here
⚠️

Keep your API key secret. Never expose it in client-side code, public repositories, or browser requests. Always make API calls from your server.

ℹ️

API keys are issued immediately after checkout and emailed to you. If you lose your key, contact support@safemoderation.com to have it revoked and reissued.

The /moderate endpoint

http
POST https://api.safemoderation.com/.netlify/functions/moderate

Request schema

Headers

| Header | Required | Value |
|---|---|---|
| Authorization | Yes | Bearer YOUR_API_KEY |
| Content-Type | Yes | application/json |

Body parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| media_type | string | Yes | Either "text" or "image". |
| content | string | Yes | For text: the text to moderate, max 1,024 characters. For image: a public HTTPS URL pointing to a JPEG or PNG. |
| reference_id | string | Yes | Your internal ID for this content (e.g. post_id, comment_id). Echoed in every response. 1-256 characters; alphanumeric plus ._-/:. |
| metadata | object | No | Arbitrary key-value pairs stored with the log record and echoed in the response. Any plain JSON object up to 4,096 bytes (UTF-8). Has no effect on the moderation decision. |
ℹ️

For text requests, content is limited to 1,024 characters. Requests with longer text are rejected with a 400 error. Trim content to the relevant portion before submitting.
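One way to stay under the limit is to trim before sending. A minimal sketch; truncateForModeration is a hypothetical helper, and depending on your policy you may prefer to split long text into multiple requests instead:

```javascript
// Text content over 1,024 characters is rejected with a 400.
const MAX_TEXT_CHARS = 1024;

function truncateForModeration(text) {
  // Keep only the first 1,024 characters so the request passes validation.
  return text.length > MAX_TEXT_CHARS ? text.slice(0, MAX_TEXT_CHARS) : text;
}
```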

Response

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for this request assigned by SafeModeration. Reference this ID when contacting support. |
| reference_id | string | Echoes the reference_id you sent. Always present in the response. |
| metadata | object | Echoes the metadata object you sent, unchanged. Only present if provided in the request. |
| decision | string | One of: allow, flag, block. |
| confidence | number | Confidence score from 0.00 to 1.00. |
| categories | object | Score for each moderation category (0.00-1.00). Keys vary by media_type. See the categories reference below. |
| warnings | array | Reserved for future warning codes. Currently always an empty array. |
| usage.credits_used | number | Credits consumed this month so far. |
| usage.monthly_credits | number | Your plan's monthly credit limit. |

Contract guarantees

ℹ️

These properties are stable and guaranteed in every response:

  • All category keys for the given media_type are always present, even if their score is 0.00
  • decision is always one of: allow, flag, or block
  • confidence reflects the classifier's certainty in the decision, not an average of category scores
  • Response shape does not change between requests
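Because every category key is guaranteed present, downstream code can rank categories without guarding against missing keys. A minimal sketch; topCategory is a hypothetical helper, not part of any SDK:

```javascript
// Returns the [key, score] pair with the highest score. Safe to call on
// any response: the contract guarantees all category keys are present.
function topCategory(categories) {
  return Object.entries(categories).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  );
}
```

This is useful for showing reviewers why content was flagged, e.g. surfacing the dominant category next to the decision.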

Full example

bash
curl -X POST https://api.safemoderation.com/.netlify/functions/moderate \
  -H "Authorization: Bearer sm_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "media_type": "text",
    "content": "I know where you live. Im going to find you and hurt you.",
    "reference_id": "post_a8f9c2"
  }'

javascript
const response = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: 'I know where you live. Im going to find you and hurt you.',
      reference_id: 'post_a8f9c2',
    }),
  }
);
const data = await response.json();
// data.decision === "block"

python
import requests

response = requests.post(
    'https://api.safemoderation.com/.netlify/functions/moderate',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    },
    json={
        'media_type': 'text',
        'content': 'I know where you live. Im going to find you and hurt you.',
        'reference_id': 'post_a8f9c2',
    }
)
data = response.json()
# data["decision"] == "block"

Response:

json
{
  "request_id": "9b2d3e4f-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
  "reference_id": "post_a8f9c2",
  "decision": "block",
  "confidence": 0.97,
  "categories": {
    "hate_speech": 0.08,
    "harassment_bullying": 0.96,
    "adult_content": 0.00,
    "violence_gore": 0.45,
    "spam_scam": 0.01,
    "suicide_self_harm": 0.00,
    "pii_exposure": 0.00,
    "profanity": 0.02
  },
  "warnings": [],
  "usage": {
    "credits_used": 42,
    "monthly_credits": 15000
  }
}

Decisions explained

Every response includes a decision field. Here is how to act on each value:

| Decision | Meaning | Recommended action |
|---|---|---|
| allow | Content passes moderation | Publish |
| flag | Ambiguous, possible violation | Route to human review, or add friction |
| block | Clear violation | Reject |

Integration pattern

javascript
switch (data.decision) {
  case 'allow':
    return publishContent();
  case 'flag':
    return sendToHumanReview();
  case 'block':
    return rejectContent();
}
💡

How you act on each decision is entirely up to your platform's policies. Many platforms auto-block on block, route flag to a human review queue, and auto-approve on allow. You decide the right thresholds for your use case.

Confidence scores

The confidence field reflects overall certainty in the decision, from 0.00 (uncertain) to 1.00 (very certain). Individual category scores reflect how strongly each category applies to the content.

ℹ️

Confidence scores are probabilistic, not deterministic. No automated system is 100% accurate. We recommend human review for high-stakes decisions and for content near decision boundaries.
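One way to act on this is to route low-confidence results to human review regardless of the decision. A minimal sketch; the 0.70 threshold is an illustrative value, not an API parameter, and route is a hypothetical helper:

```javascript
// Send anything near the decision boundary to human review.
// REVIEW_THRESHOLD is illustrative — tune it for your platform's risk profile.
const REVIEW_THRESHOLD = 0.7;

function route(decision, confidence) {
  if (decision === 'flag' || confidence < REVIEW_THRESHOLD) {
    return 'human_review';
  }
  return decision === 'allow' ? 'publish' : 'reject';
}
```

A lower threshold means fewer items in the review queue; a higher one means more automated decisions get a second look.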

Text categories

When media_type is "text", the response includes scores for the following eight categories. Each value is a float from 0.00 to 1.00. All keys are always present.

| Category | Key | What it detects |
|---|---|---|
| Hate speech | hate_speech | Slurs, dehumanizing language, and content targeting people based on race, religion, ethnicity, gender, sexual orientation, or other protected characteristics. |
| Harassment & bullying | harassment_bullying | Targeted abuse, threats, doxxing, coordinated harassment, and content designed to intimidate or demean specific individuals. |
| Adult content | adult_content | Explicit sexual content, graphic nudity, and solicitation. |
| Violence & gore | violence_gore | Graphic violence, threats of physical harm, glorification of violence, and disturbing imagery descriptions. |
| Spam & scams | spam_scam | Phishing attempts, fake prizes, fraudulent solicitation, impersonation, and promotional abuse. |
| Suicide & self-harm | suicide_self_harm | Content that promotes, glorifies, or provides methods for self-harm or suicide. Context-aware: prevention and awareness content is not flagged. |
| PII exposure | pii_exposure | Personally identifiable information including Social Security numbers, credit card numbers, bank account details, passwords, and similar sensitive data. |
| Profanity | profanity | Explicit language and offensive terms. |

Evasion detection

SafeModeration automatically detects common evasion techniques including l33tspeak substitution, spaced characters, Unicode homoglyphs, and repeated character patterns.

Multilingual support

The classifier detects harmful content in 50+ languages. Coverage is most thoroughly tested in English, Spanish, French, German, Portuguese, Arabic, and Russian, with strong performance across other major European, East Asian, South Asian, and Middle Eastern languages.

Image categories

When media_type is "image", the response includes scores for the following eight categories. Each value is a float from 0.00 to 1.00. All keys are always present.

| Category | Key | What it detects |
|---|---|---|
| Adult content | adult_content | Nudity, sexual content, or sexually suggestive imagery. |
| Violence & gore | violence_gore | Violence, gore, or graphic disturbing imagery. |
| Hate speech | hate_speech | Hate symbols, extremist iconography, or supremacist imagery. |
| Suicide & self-harm | suicide_self_harm | Self-injury imagery or suicide-related imagery. |
| Weapons | weapons | Firearms, knives, or weapons in threatening contexts. |
| Drugs | drugs | Illegal substances, paraphernalia, or drug use imagery. |
| Alcohol | alcohol | Alcoholic beverages or drinking imagery. Use thresholds appropriate to your jurisdiction. |
| Tobacco | tobacco | Tobacco products or smoking imagery. Use thresholds appropriate to your jurisdiction. |

Image requirements

SafeModeration fetches the image from the URL you provide, classifies it, then discards the bytes. We store the URL and a one-way hash of the image content for caching and audit purposes. The image bytes are not retained.

| Requirement | Value |
|---|---|
| Supported formats | JPEG, PNG |
| Maximum file size | 10 MB |
| Minimum dimensions | 80 × 80 pixels |
| Maximum dimensions | 8192 × 8192 pixels |
| URL protocol | HTTPS only. The URL must be publicly accessible. |
| Fetch timeout | 5 seconds |
| Maximum redirects | 3 |
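The HTTPS requirement can be checked cheaply before making the request. A minimal sketch; isAcceptableImageUrl is a hypothetical helper, and reachability, format, and size are still enforced server-side:

```javascript
// Pre-flight check for the documented URL protocol requirement.
function isAcceptableImageUrl(url) {
  try {
    return new URL(url).protocol === 'https:';
  } catch {
    // Malformed URLs would be rejected with INVALID_IMAGE_URL anyway.
    return false;
  }
}
```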
⚠️

GIF and WebP are not supported. Requests with unsupported formats return a 400 error with code IMAGE_FORMAT_UNSUPPORTED. You are not charged for failed image requests.

💡

Pass the URL of the image as stored on your own infrastructure or CDN. The URL must be reachable from our servers at the time of the request. Pre-signed URLs with short expiration windows may fail if the window closes before the request is processed.

Error codes

| Status | Description |
|---|---|
| 400 | INVALID_MEDIA_TYPE: media_type must be "text" or "image". |
| 400 | INVALID_CONTENT: content must be a non-empty string. |
| 400 | Content must be 1024 characters or fewer: the content field exceeded 1,024 characters for a text request. Trim the content before retrying. |
| 400 | "reference_id" is required: reference_id was not included in the request body. |
| 400 | "reference_id" must be 1-256 characters: reference_id was empty or exceeded the length limit. |
| 400 | "reference_id" contains invalid characters (allowed: a-z A-Z 0-9 . _ - / :): reference_id contained whitespace, quotes, or other disallowed characters. |
| 400 | metadata must be a plain object: metadata was not a JSON object (e.g. an array or string was sent). |
| 400 | metadata exceeds 4096-byte limit (5200 bytes): the JSON-serialised metadata object exceeded 4,096 bytes (UTF-8); the byte count in the message reflects your actual request. |

Image-specific errors

| Status | Description |
|---|---|
| 400 | INVALID_IMAGE_URL: the image URL is malformed or does not use HTTPS. |
| 400 | IMAGE_FETCH_FAILED: the image could not be fetched. The server returned an error, the request timed out, or a network error occurred. |
| 400 | IMAGE_FORMAT_UNSUPPORTED: the image is not JPEG or PNG. GIF, WebP, and other formats are not supported. |
| 400 | IMAGE_TOO_LARGE: the image file exceeds 10 MB. |
| 400 | IMAGE_TOO_SMALL: the image dimensions are below 80 × 80 pixels. |
| 400 | IMAGE_DIMENSIONS_TOO_LARGE: the image dimensions exceed 8192 × 8192 pixels. |
| 400 | IMAGE_URL_BLOCKED: the image URL was rejected by URL safety checks (private IP ranges, localhost, non-public hosts). |

Other errors

| Status | Description |
|---|---|
| 401 | Unauthorized: missing, invalid, or revoked API key. |
| 429 | Rate limit exceeded: the burst limit of 600 requests per minute per API key was hit. The retry_after field in the response body and the Retry-After header indicate how many seconds until the current window resets. |
| 429 | Monthly credit limit reached: the response body includes "limit_type": "monthly". All requests return 429 until the 1st of the next month or until you upgrade your plan. |
| 502 | Internal error: retry the request. If it persists, contact support@safemoderation.com. |
💡

Failed image requests do not consume credits. If the image cannot be fetched, is in an unsupported format, or fails any safety check, the request is not charged.

Error response format

json
{
  "error": "Invalid or revoked API key."
}

Image errors include an additional code field:

json
{
  "error": "Image is not JPEG or PNG.",
  "code": "IMAGE_FORMAT_UNSUPPORTED"
}
⚠️

There are two distinct 429 conditions. A burst 429 is temporary: wait the number of seconds in retry_after and resend. A monthly 429 ("limit_type": "monthly") blocks all requests until the 1st of next month or until you upgrade your plan.

Credits and rate limits

Credits

Each moderation request consumes credits from your monthly allowance:

| Content type | Credits |
|---|---|
| Text | 1 credit |
| Image | 3 credits |

Failed image requests do not consume credits. If the image cannot be fetched, is in an unsupported format, or fails any safety check, you are not charged.
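For capacity planning, expected usage maps directly onto these rates. A minimal sketch using the costs above; estimateMonthlyCredits is a hypothetical helper:

```javascript
// Per the table above: text costs 1 credit, images cost 3.
// Failed image requests are not charged, so this is an upper bound.
const CREDIT_COST = { text: 1, image: 3 };

function estimateMonthlyCredits(textRequests, imageRequests) {
  return textRequests * CREDIT_COST.text + imageRequests * CREDIT_COST.image;
}
```

For example, 10,000 text and 1,000 image moderations a month come to 13,000 credits, which fits within the Starter plan's 15,000.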

Plans

| Plan | Monthly credits | Price |
|---|---|---|
| Starter | 15,000 | $99/mo |
| Growth | 150,000 | $249/mo |
| Pro | 500,000 | $499/mo |
| Enterprise | Custom | Contact us |

Credits reset on the 1st of each calendar month. Unused credits do not roll over.

Rate limits

All plans share a burst limit of 600 requests per minute per API key. Exceeding this returns a 429 with a Retry-After header and a retry_after field in the response body indicating the seconds remaining in the current window.

Retry pattern

javascript
async function moderateWithRetry(url, options) {
  const res = await fetch(url, options);
  if (res.status === 429) {
    const data = await res.json();
    // A monthly 429 will not clear until the next billing period: don't retry.
    if (data.limit_type === 'monthly') throw new Error('Monthly limit reached');
    // Burst 429: wait out the window, then retry once.
    const waitMs = (data.retry_after ?? 60) * 1000;
    await new Promise(resolve => setTimeout(resolve, waitMs));
    return fetch(url, options);
  }
  return res;
}
💡

Track your credit usage with the usage object returned in every response. You'll also receive email alerts at 80%, 90%, and 100% of your monthly limit.

FAQ

How quickly does the API respond?

Most text requests complete in under 200ms. Image requests typically take 500ms to 2 seconds, depending on image fetch time and file size.

Does SafeModeration store the content I submit?

For authenticated production API requests, we store the moderation result, your submitted text content (for text moderation), and the image URL (for image moderation). This data is accessible to you through your dashboard and supports moderation review, audit, and analytics. For image moderation, we fetch the image, classify it, then discard the bytes. We do not retain raw image content. We store a SHA-256 hash of the image bytes for caching purposes only. Moderation logs are retained for one year and then automatically deleted. See our Privacy Policy for full details on retention and your rights.

What image formats are supported?

JPEG and PNG are supported. GIF, WebP, and other formats are not supported and will return a 400 error. Failed image requests are not charged.

How are image categories different from text categories?

Both content types return eight categories. Four overlap: hate_speech, adult_content, violence_gore, and suicide_self_harm. Text adds harassment_bullying, spam_scam, pii_exposure, and profanity. Image adds weapons, drugs, alcohol, and tobacco.

Do failed image requests count against my monthly credits?

No. If the image cannot be fetched, is in an unsupported format, or fails any safety check, the request is not charged. Credits are only consumed when a moderation result is successfully returned.

What languages are supported?

SafeModeration handles text content in 50+ languages, including all major European, East Asian, South Asian, and Middle Eastern languages. English, Spanish, French, German, Portuguese, Arabic, and Russian have the most thoroughly tested coverage.

What should I do with flag decisions?

That depends on your platform's policies. Common approaches: route to human review, add a friction step before posting, hold content pending secondary analysis, or treat identically to block. There is no single right answer.

Can I test without a paid plan?

Your 7-day free trial includes full API access. No charge until the trial ends.

What happens if the API is down?

Email support@safemoderation.com for urgent issues.

How do I cancel?

Manage your subscription from the billing section of your dashboard. Cancellation takes effect at the end of your current billing period.