API Documentation

Everything you need to integrate SafeModeration into your platform.

Quick start

Get up and running in under 60 seconds.

Step 1

Get your API key

Sign up at safemoderation.com/pricing to start your free trial. Your API key will be emailed to you immediately after checkout. It looks like this: sm_live_a1b2c3d4e5f6…

Step 2a

Moderate text

bash
curl -X POST https://api.safemoderation.com/.netlify/functions/moderate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "media_type": "text",
    "content": "Hello, how are you today?",
    "reference_id": "comment_84729"
  }'

javascript
const response = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: 'Hello, how are you today?',
      reference_id: 'comment_84729',
    }),
  }
);
const data = await response.json();
console.log(data.decision); // "allow"

python
import requests

response = requests.post(
    'https://api.safemoderation.com/.netlify/functions/moderate',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    },
    json={
        'media_type': 'text',
        'content': 'Hello, how are you today?',
        'reference_id': 'comment_84729',
    }
)
data = response.json()
print(data['decision'])  # "allow"

Step 2b

Moderate an image

bash
curl -X POST https://api.safemoderation.com/.netlify/functions/moderate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "media_type": "image",
    "content": "https://example.com/user-upload.jpg",
    "reference_id": "post_a8f9c2"
  }'

javascript
const response = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'image',
      content: 'https://example.com/user-upload.jpg',
      reference_id: 'post_a8f9c2',
    }),
  }
);
const data = await response.json();
console.log(data.decision); // "block"

python
import requests

response = requests.post(
    'https://api.safemoderation.com/.netlify/functions/moderate',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    },
    json={
        'media_type': 'image',
        'content': 'https://example.com/user-upload.jpg',
        'reference_id': 'post_a8f9c2',
    }
)
data = response.json()
print(data['decision'])  # "block"

Step 3

Read the response

Text response:

json
{
  "request_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "reference_id": "comment_84729",
  "decision": "allow",
  "confidence": 0.95,
  "categories": {
    "hate_speech": 0.01,
    "harassment_bullying": 0.02,
    "adult_content": 0.00,
    "violence_gore": 0.00,
    "spam_scam": 0.01,
    "suicide_self_harm": 0.00,
    "pii_exposure": 0.00,
    "profanity": 0.00
  },
  "warnings": [],
  "usage": {
    "credits_used": 1,
    "monthly_credits": 15000
  }
}

Image response:

json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "reference_id": "post_a8f9c2",
  "decision": "block",
  "confidence": 0.92,
  "categories": {
    "adult_content": 0.04,
    "violence_gore": 0.92,
    "hate_speech": 0.01,
    "suicide_self_harm": 0.00,
    "weapons": 0.78,
    "drugs": 0.00,
    "alcohol": 0.00,
    "tobacco": 0.00
  },
  "warnings": [],
  "usage": {
    "credits_used": 47,
    "monthly_credits": 15000
  }
}

The decision field tells you what to do with the content.

Reference ID

reference_id is a required field that links each moderation result back to the record in your own data model. It is stored in the log and echoed in every response. It has no effect on the moderation decision itself.

| Field | Required | Format | Description |
|---|---|---|---|
| reference_id | Yes | String, 1-256 chars, alphanumeric plus ._-/: | Your unique identifier for this content. Use your database row ID, post slug, or comment ID: any key that lets you look up the original record. |
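Because requests with a malformed reference_id are rejected with a 400, it can be worth validating the format before calling the API. A minimal sketch; isValidReferenceId is a hypothetical helper mirroring the documented format, not part of any SDK:

```javascript
// Matches the documented reference_id format:
// 1-256 characters, alphanumeric plus . _ - / :
const REFERENCE_ID_RE = /^[A-Za-z0-9._\/:-]{1,256}$/;

function isValidReferenceId(id) {
  return typeof id === 'string' && REFERENCE_ID_RE.test(id);
}
```

A failed check here means the request would draw a 400 from the API, so you can surface the error without spending a round trip.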

Why reference_id is required

SafeModeration assigns its own request_id to every call. reference_id is yours: it makes the moderation log immediately actionable without a secondary lookup. When a result comes back block, your code already knows exactly which record to act on.

Worked example

A forum stores user comments in a comments table, each with an integer primary key. When a user submits a new comment, the forum's backend calls SafeModeration before writing the row to the database:

javascript
const result = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: comment.body,
      reference_id: `comment_${comment.id}`,  // e.g. "comment_84729"
    }),
  }
).then(r => r.json());

if (result.decision === 'block') {
  // reference_id is echoed back, no extra lookup needed
  await markCommentRejected(result.reference_id);
}

The response echoes "reference_id": "comment_84729", so the forum can act on the result without tracking SafeModeration's internal request_id at all.

Metadata

The optional metadata field accepts any plain JSON object you want stored alongside the log record and echoed in the response. It has no effect on the moderation decision itself.

| Field | Required | Format | Description |
|---|---|---|---|
| metadata | No | Plain object, max 4,096 bytes (UTF-8 JSON) | Arbitrary key-value pairs you want stored with the log record. Common uses: content type, author ID, thread ID, locale, and other platform-specific context. |

What to put in metadata

Metadata is a free-form envelope: use any keys that make sense for your platform. Common examples:

  • content_type: your category for the content (e.g. comment, post, profile), useful for filtering in your moderation dashboard
  • author_id: identifier of the user who created the content, enabling author-level abuse tracking and repeat-offender detection
  • thread_id, locale, client_version: any other context your team finds useful when reviewing flagged content
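Since metadata is capped at 4,096 bytes of UTF-8 JSON, you may want to check the serialized size client-side before sending. A minimal sketch; metadataFits is a hypothetical helper, not part of any SDK:

```javascript
// metadata is limited to 4,096 bytes of serialized UTF-8 JSON.
const METADATA_BYTE_LIMIT = 4096;

function metadataFits(metadata) {
  // Measure the UTF-8 byte length, not the character count —
  // multi-byte characters count more than once toward the limit.
  const bytes = new TextEncoder().encode(JSON.stringify(metadata)).length;
  return bytes <= METADATA_BYTE_LIMIT;
}
```

Checking before the request lets oversized objects fail fast in your own code instead of drawing a 400 from the API.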

Worked example

javascript
const result = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: comment.body,
      reference_id: `comment_${comment.id}`,
      metadata: {
        content_type: 'comment',
        author_id: `user_${comment.authorId}`,
        thread_id: `thread_${comment.threadId}`,
      },
    }),
  }
).then(r => r.json());

// result.metadata is echoed back exactly as sent
console.log(result.metadata.author_id);  // the author_id you sent

The metadata object is echoed in the response unchanged, so your downstream code can read any field it needs without an extra lookup.

Authentication

SafeModeration uses API key authentication. Pass your key as a Bearer token in the Authorization header on every request.

http
Authorization: Bearer sm_live_your_key_here
⚠️

Keep your API key secret. Never expose it in client-side code, public repositories, or browser requests. Always make API calls from your server.

ℹ️

API keys are issued immediately after checkout and emailed to you. If you lose your key, contact support@safemoderation.com to have it revoked and reissued.

The /moderate endpoint

http
POST https://api.safemoderation.com/.netlify/functions/moderate

Request schema

Headers

| Header | Required | Value |
|---|---|---|
| Authorization | Yes | Bearer YOUR_API_KEY |
| Content-Type | Yes | application/json |

Body parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| media_type | string | Yes | Either "text" or "image". |
| content | string | Yes | For text: the text to moderate, max 1,024 characters. For image: a public HTTPS URL pointing to a JPEG or PNG. |
| reference_id | string | Yes | Your internal ID for this content (e.g. post_id, comment_id). Echoed in every response. 1-256 characters; alphanumeric plus ._-/:. |
| metadata | object | No | Arbitrary key-value pairs stored with the log record and echoed in the response. Any plain JSON object up to 4,096 bytes (UTF-8). Has no effect on the moderation decision. |
ℹ️

For text requests, content is limited to 1,024 characters. Requests with longer text are rejected with a 400 error. Trim content to the relevant portion before submitting.
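One way to stay under the limit is to trim before sending. A minimal sketch; truncateForModeration is a hypothetical helper, and depending on your policy you may prefer to split long text into multiple requests instead:

```javascript
// Text content over 1,024 characters is rejected with a 400.
const MAX_TEXT_CHARS = 1024;

function truncateForModeration(text) {
  // Keep only the first 1,024 characters so the request passes validation.
  return text.length > MAX_TEXT_CHARS ? text.slice(0, MAX_TEXT_CHARS) : text;
}
```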

Response

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for this request assigned by SafeModeration. Reference this ID when contacting support. |
| reference_id | string | Echoes the reference_id you sent. Always present in the response. |
| metadata | object | Echoes the metadata object you sent, unchanged. Only present if provided in the request. |
| decision | string | One of: allow, flag, block. |
| confidence | number | Confidence score from 0.00 to 1.00. |
| categories | object | Score for each moderation category (0.00-1.00). Keys vary by media_type. See the categories reference below. |
| warnings | array | Reserved for future warning codes. Currently always an empty array. |
| usage.credits_used | number | Credits consumed this month so far. |
| usage.monthly_credits | number | Your plan's monthly credit limit. |

Contract guarantees

ℹ️

These properties are stable and guaranteed in every response:

  • All category keys for the given media_type are always present, even if their score is 0.00
  • decision is always one of: allow, flag, or block
  • confidence reflects the classifier's certainty in the decision, not an average of category scores
  • Response shape does not change between requests
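Because every category key is guaranteed present, downstream code can rank categories without guarding against missing keys. A minimal sketch; topCategory is a hypothetical helper, not part of any SDK:

```javascript
// Returns the [key, score] pair with the highest score. Safe to call on
// any response: the contract guarantees all category keys are present.
function topCategory(categories) {
  return Object.entries(categories).reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  );
}
```

This is useful for showing reviewers why content was flagged, e.g. surfacing the dominant category next to the decision.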

Full example

bash
curl -X POST https://api.safemoderation.com/.netlify/functions/moderate \
  -H "Authorization: Bearer sm_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "media_type": "text",
    "content": "I know where you live. Im going to find you and hurt you.",
    "reference_id": "post_a8f9c2"
  }'

javascript
const response = await fetch(
  'https://api.safemoderation.com/.netlify/functions/moderate',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.SAFEMODERATION_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      media_type: 'text',
      content: 'I know where you live. Im going to find you and hurt you.',
      reference_id: 'post_a8f9c2',
    }),
  }
);
const data = await response.json();
// data.decision === "block"

python
import requests

response = requests.post(
    'https://api.safemoderation.com/.netlify/functions/moderate',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json',
    },
    json={
        'media_type': 'text',
        'content': 'I know where you live. Im going to find you and hurt you.',
        'reference_id': 'post_a8f9c2',
    }
)
data = response.json()
# data["decision"] == "block"

Response:

json
{
  "request_id": "9b2d3e4f-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
  "reference_id": "post_a8f9c2",
  "decision": "block",
  "confidence": 0.97,
  "categories": {
    "hate_speech": 0.08,
    "harassment_bullying": 0.96,
    "adult_content": 0.00,
    "violence_gore": 0.45,
    "spam_scam": 0.01,
    "suicide_self_harm": 0.00,
    "pii_exposure": 0.00,
    "profanity": 0.02
  },
  "warnings": [],
  "usage": {
    "credits_used": 42,
    "monthly_credits": 15000
  }
}

Decisions explained

Every response includes a decision field. Here is how to act on each value:

| Decision | Meaning | Recommended action |
|---|---|---|
| allow | Content passes moderation | Publish |
| flag | Ambiguous, possible violation | Route to human review, or add friction |
| block | Clear violation | Reject |

Integration pattern

javascript
switch (data.decision) {
  case 'allow':
    return publishContent();
  case 'flag':
    return sendToHumanReview();
  case 'block':
    return rejectContent();
}
💡

How you act on each decision is entirely up to your platform's policies. Many platforms auto-block on block, route flag to a human review queue, and auto-approve on allow. You decide the right thresholds for your use case.

Confidence scores

The confidence field reflects overall certainty in the decision, from 0.00 (uncertain) to 1.00 (very certain). Individual category scores reflect how strongly each category applies to the content.

ℹ️

Confidence scores are probabilistic, not deterministic. No automated system is 100% accurate. We recommend human review for high-stakes decisions and for content near decision boundaries.
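One way to act on this is to route low-confidence results to human review regardless of the decision. A minimal sketch; the 0.70 threshold is an illustrative value, not an API parameter, and route is a hypothetical helper:

```javascript
// Send anything near the decision boundary to human review.
// REVIEW_THRESHOLD is illustrative — tune it for your platform's risk profile.
const REVIEW_THRESHOLD = 0.7;

function route(decision, confidence) {
  if (decision === 'flag' || confidence < REVIEW_THRESHOLD) {
    return 'human_review';
  }
  return decision === 'allow' ? 'publish' : 'reject';
}
```

A lower threshold means fewer items in the review queue; a higher one means more automated decisions get a second look.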

Text categories

When media_type is "text", the response includes scores for the following eight categories. Each value is a float from 0.00 to 1.00. All keys are always present.

| Category | Key | What it detects |
|---|---|---|
| Hate speech | hate_speech | Slurs, dehumanizing language, and content targeting people based on race, religion, ethnicity, gender, sexual orientation, or other protected characteristics. |
| Harassment & bullying | harassment_bullying | Targeted abuse, threats, doxxing, coordinated harassment, and content designed to intimidate or demean specific individuals. |
| Adult content | adult_content | Explicit sexual content, graphic nudity, and solicitation. |
| Violence & gore | violence_gore | Graphic violence, threats of physical harm, glorification of violence, and disturbing imagery descriptions. |
| Spam & scams | spam_scam | Phishing attempts, fake prizes, fraudulent solicitation, impersonation, and promotional abuse. |
| Suicide & self-harm | suicide_self_harm | Content that promotes, glorifies, or provides methods for self-harm or suicide. Context-aware: prevention and awareness content is not flagged. |
| PII exposure | pii_exposure | Personally identifiable information including Social Security numbers, credit card numbers, bank account details, passwords, and similar sensitive data. |
| Profanity | profanity | Explicit language and offensive terms. |

Evasion detection

SafeModeration automatically detects common evasion techniques including l33tspeak substitution, spaced characters, Unicode homoglyphs, and repeated character patterns.

Multilingual support

The classifier detects harmful content in 50+ languages. Coverage is most thoroughly tested in English, Spanish, French, German, Portuguese, Arabic, and Russian, with strong performance across other major European, East Asian, South Asian, and Middle Eastern languages.

Image categories

When media_type is "image", the response includes scores for the following eight categories. Each value is a float from 0.00 to 1.00. All keys are always present.

| Category | Key | What it detects |
|---|---|---|
| Adult content | adult_content | Nudity, sexual content, or sexually suggestive imagery. |
| Violence & gore | violence_gore | Violence, gore, or graphic disturbing imagery. |
| Hate speech | hate_speech | Hate symbols, extremist iconography, or supremacist imagery. |
| Suicide & self-harm | suicide_self_harm | Self-injury imagery or suicide-related imagery. |
| Weapons | weapons | Firearms, knives, or weapons in threatening contexts. |
| Drugs | drugs | Illegal substances, paraphernalia, or drug use imagery. |
| Alcohol | alcohol | Alcoholic beverages or drinking imagery. Use thresholds appropriate to your jurisdiction. |
| Tobacco | tobacco | Tobacco products or smoking imagery. Use thresholds appropriate to your jurisdiction. |

Image requirements

SafeModeration fetches the image from the URL you provide, classifies it, then discards the bytes. We store the URL and a one-way hash of the image content for caching and audit purposes. The image bytes are not retained.

| Requirement | Value |
|---|---|
| Supported formats | JPEG, PNG |
| Maximum file size | 10 MB |
| Minimum dimensions | 80 × 80 pixels |
| Maximum dimensions | 8192 × 8192 pixels |
| URL protocol | HTTPS only. The URL must be publicly accessible. |
| Fetch timeout | 5 seconds |
| Maximum redirects | 3 |
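The HTTPS requirement can be checked cheaply before making the request. A minimal sketch; isAcceptableImageUrl is a hypothetical helper, and reachability, format, and size are still enforced server-side:

```javascript
// Pre-flight check for the documented URL protocol requirement.
function isAcceptableImageUrl(url) {
  try {
    return new URL(url).protocol === 'https:';
  } catch {
    // Malformed URLs would be rejected with INVALID_IMAGE_URL anyway.
    return false;
  }
}
```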
⚠️

GIF and WebP are not supported. Requests with unsupported formats return a 400 error with code IMAGE_FORMAT_UNSUPPORTED. You are not charged for failed image requests.

💡

Pass the URL of the image as stored on your own infrastructure or CDN. The URL must be reachable from our servers at the time of the request. Pre-signed URLs with short expiration windows may fail if the window closes before the request is processed.

Error codes

| Status | Description |
|---|---|
| 400 | INVALID_MEDIA_TYPE: media_type must be "text" or "image". |
| 400 | INVALID_CONTENT: content must be a non-empty string. |
| 400 | Content must be 1024 characters or fewer: the content field exceeded 1,024 characters for a text request. Trim the content before retrying. |
| 400 | "reference_id" is required: reference_id was not included in the request body. |
| 400 | "reference_id" must be 1-256 characters: reference_id was empty or exceeded the length limit. |
| 400 | "reference_id" contains invalid characters (allowed: a-z A-Z 0-9 . _ - / :): reference_id contained whitespace, quotes, or other disallowed characters. |
| 400 | metadata must be a plain object: metadata was not a JSON object (e.g. an array or string was sent). |
| 400 | metadata exceeds 4096-byte limit (5200 bytes): the JSON-serialised metadata object exceeded 4,096 bytes (UTF-8); the byte count in the message reflects your actual request. |

Image-specific errors

| Status | Description |
|---|---|
| 400 | INVALID_IMAGE_URL: the image URL is malformed or does not use HTTPS. |
| 400 | IMAGE_FETCH_FAILED: the image could not be fetched. The server returned an error, the request timed out, or a network error occurred. |
| 400 | IMAGE_FORMAT_UNSUPPORTED: the image is not JPEG or PNG. GIF, WebP, and other formats are not supported. |
| 400 | IMAGE_TOO_LARGE: the image file exceeds 10 MB. |
| 400 | IMAGE_TOO_SMALL: the image dimensions are below 80 × 80 pixels. |
| 400 | IMAGE_DIMENSIONS_TOO_LARGE: the image dimensions exceed 8192 × 8192 pixels. |
| 400 | IMAGE_URL_BLOCKED: the image URL was rejected by URL safety checks (private IP ranges, localhost, non-public hosts). |

Other errors

| Status | Description |
|---|---|
| 401 | Unauthorized: missing, invalid, or revoked API key. |
| 429 | Rate limit exceeded: the burst limit of 600 requests per minute per API key was hit. The retry_after field in the response body and the Retry-After header indicate how many seconds until the current window resets. |
| 429 | Monthly credit limit reached: the response body includes "limit_type": "monthly". All requests return 429 until the 1st of the next month or until you upgrade your plan. |
| 502 | Internal error: retry the request. If it persists, contact support@safemoderation.com. |
💡

Failed image requests do not consume credits. If the image cannot be fetched, is in an unsupported format, or fails any safety check, the request is not charged.

Error response format

json
{
  "error": "Invalid or revoked API key."
}

Image errors include an additional code field:

json
{
  "error": "Image is not JPEG or PNG.",
  "code": "IMAGE_FORMAT_UNSUPPORTED"
}
⚠️

There are two distinct 429 conditions. A burst 429 is temporary: wait the number of seconds in retry_after and resend. A monthly 429 ("limit_type": "monthly") blocks all requests until the 1st of next month or until you upgrade your plan.

Credits and rate limits

Credits

Each moderation request consumes credits from your monthly allowance:

| Content type | Credits |
|---|---|
| Text | 1 credit |
| Image | 3 credits |

Failed image requests do not consume credits. If the image cannot be fetched, is in an unsupported format, or fails any safety check, you are not charged.
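For capacity planning, expected usage maps directly onto these rates. A minimal sketch using the costs above; estimateMonthlyCredits is a hypothetical helper:

```javascript
// Per the table above: text costs 1 credit, images cost 3.
// Failed image requests are not charged, so this is an upper bound.
const CREDIT_COST = { text: 1, image: 3 };

function estimateMonthlyCredits(textRequests, imageRequests) {
  return textRequests * CREDIT_COST.text + imageRequests * CREDIT_COST.image;
}
```

For example, 10,000 text and 1,000 image moderations a month come to 13,000 credits, which fits within the Starter plan's 15,000.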

Plans

| Plan | Monthly credits | Price |
|---|---|---|
| Starter | 15,000 | $99/mo |
| Growth | 150,000 | $249/mo |
| Pro | 500,000 | $499/mo |
| Enterprise | Custom | Contact us |

Credits reset on the 1st of each calendar month. Unused credits do not roll over.

Rate limits

All plans share a burst limit of 600 requests per minute per API key. Exceeding this returns a 429 with a Retry-After header and a retry_after field in the response body indicating the seconds remaining in the current window.

Retry pattern

javascript
async function moderateWithRetry(url, options) {
  const res = await fetch(url, options);
  if (res.status === 429) {
    const data = await res.json();
    // A monthly 429 will not clear until the next billing period: don't retry.
    if (data.limit_type === 'monthly') throw new Error('Monthly limit reached');
    // Burst 429: wait out the window, then retry once.
    const waitMs = (data.retry_after ?? 60) * 1000;
    await new Promise(resolve => setTimeout(resolve, waitMs));
    return fetch(url, options);
  }
  return res;
}
💡

Track your credit usage with the usage object returned in every response. You'll also receive email alerts at 80%, 90%, and 100% of your monthly limit.

FAQ

How quickly does the API respond?

Most text requests complete in under 200ms. Image requests typically take 500ms to 2 seconds, depending on image fetch time and file size.

Does SafeModeration store the content I submit?

For authenticated production API requests, we store the moderation result, your submitted text content (for text moderation), and the image URL (for image moderation). This data is accessible to you through your dashboard and supports moderation review, audit, and analytics. For image moderation, we fetch the image, classify it, then discard the bytes. We do not retain raw image content. We store a SHA-256 hash of the image bytes for caching purposes only. Moderation logs are retained for one year and then automatically deleted. See our Privacy Policy for full details on retention and your rights.

What image formats are supported?

JPEG and PNG are supported. GIF, WebP, and other formats are not supported and will return a 400 error. Failed image requests are not charged.

How are image categories different from text categories?

Both content types return eight categories. Four overlap: hate_speech, adult_content, violence_gore, and suicide_self_harm. Text adds harassment_bullying, spam_scam, pii_exposure, and profanity. Image adds weapons, drugs, alcohol, and tobacco.

Do failed image requests count against my monthly credits?

No. If the image cannot be fetched, is in an unsupported format, or fails any safety check, the request is not charged. Credits are only consumed when a moderation result is successfully returned.

What languages are supported?

SafeModeration handles text content in 50+ languages, including all major European, East Asian, South Asian, and Middle Eastern languages. English, Spanish, French, German, Portuguese, Arabic, and Russian have the most thoroughly tested coverage.

What should I do with flag decisions?

That depends on your platform's policies. Common approaches: route to human review, add a friction step before posting, hold content pending secondary analysis, or treat identically to block. There is no single right answer.

Can I test without a paid plan?

Your 7-day free trial includes full API access. No charge until the trial ends.

What happens if the API is down?

Email support@safemoderation.com for urgent issues.

How do I cancel?

Manage your subscription from the billing section of your dashboard. Cancellation takes effect at the end of your current billing period.