We highly recommend using Svix webhooks instead of direct webhooks for production applications. Svix provides enterprise-grade webhook infrastructure with built-in security (cryptographic request signing), advanced retry logic, delivery tracking, and a management dashboard. Direct webhooks require you to implement your own validation and security measures. Use direct webhooks only for simple use cases, prototyping, or when you have specific requirements that prevent using Svix.
Webhooks allow your application to receive real-time updates when long-running or asynchronous jobs complete, such as parsing or extraction tasks submitted via Reducto’s API. Instead of continuously polling the job status, you can configure a webhook endpoint to be notified automatically when a job finishes. With direct mode, Reducto sends HTTP POST requests directly to your specified endpoint URL.

How Direct Webhooks Work

When you submit an async job with a direct webhook configured, Reducto will:
  1. Process your document asynchronously
  2. Send an HTTP POST request to your webhook URL when the job completes (or fails)
  3. Include the job status, job ID, and any metadata you provided in the request body

Webhook Payload

The webhook notification will be sent as a JSON POST request with the following structure:
{
  "status": "Completed",  // or "Failed"
  "job_id": "job_abc123",
  "metadata": {
    // Your custom metadata (optional)
  }
}
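Because the payload arrives from the network, it can help to check its shape before acting on it. The following is a minimal sketch, assuming only the three fields shown above; `WebhookEvent` and `parse_webhook_body` are hypothetical names for illustration and are not part of the Reducto SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Status values shown in the payload example above.
KNOWN_STATUSES = {"Completed", "Failed"}


@dataclass
class WebhookEvent:
    """Hypothetical container for the fields of a direct webhook payload."""
    job_id: str
    status: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_webhook_body(raw_body: bytes) -> WebhookEvent:
    """Parse and sanity-check a webhook body; raise ValueError if malformed."""
    try:
        data = json.loads(raw_body)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")

    job_id = data.get("job_id")
    status = data.get("status")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("missing or invalid job_id")
    if not isinstance(status, str) or status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")

    # metadata is optional; treat a missing or null value as empty.
    metadata = data.get("metadata") or {}
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a JSON object")
    return WebhookEvent(job_id=job_id, status=status, metadata=metadata)
```

In a Flask handler, this could be called as `parse_webhook_body(request.get_data())`, returning a 400 response when it raises `ValueError`.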

Setting Up Your Webhook Endpoint

Your webhook endpoint should:
  • Accept HTTP POST requests
  • Return a 2xx status code to acknowledge receipt
  • Process the webhook payload to retrieve the job result
Reducto will automatically retry failed webhook deliveries up to 3 times with exponential backoff.

Submit Async Job with Direct Webhook

from reducto import Reducto

client = Reducto(api_key="your_api_key")

# Submit an async parse job with a direct webhook
submission = client.parse.run_job(
    input="https://ci.reducto.ai/onepager.pdf",
    async_config={
        "webhook": {
            "mode": "direct",
            "url": "https://your-domain.com/webhook/reducto"
        },
        "metadata": {
            "user_id": "user_123",
            "document_type": "invoice"
        }
    }
)

print(f"Job submitted: {submission.job_id}")

Webhook Handler

from flask import Flask, request, jsonify
from reducto import Reducto

app = Flask(__name__)
client = Reducto(api_key="your_api_key")

@app.route('/webhook/reducto', methods=['POST'])
def handle_webhook():
    payload = request.json
    
    job_id = payload['job_id']
    status = payload['status']
    metadata = payload.get('metadata', {})
    
    if status == "Completed":
        # Retrieve the job result
        job = client.job.get(job_id)
        result = job.result
        
        # Process the result
        print(f"Job {job_id} completed successfully")
        print(f"Metadata: {metadata}")
        print(f"Result: {result}")
        
    elif status == "Failed":
        print(f"Job {job_id} failed")
    
    return jsonify({"received": True}), 200

if __name__ == '__main__':
    app.run(port=5000)

Using Direct Webhooks with Extract

Direct webhooks work with all async endpoints. Here’s an example with the extract endpoint:
from reducto import Reducto

client = Reducto(api_key="your_api_key")

submission = client.extract.run_job(
    input="https://ci.reducto.ai/invoice.pdf",
    instructions={
        "schema": {
            "invoice_number": "string",
            "total_amount": "number",
            "date": "string"
        }
    },
    async_config={
        "webhook": {
            "mode": "direct",
            "url": "https://your-domain.com/webhook/reducto"
        }
    }
)

print(f"Extraction job submitted: {submission.job_id}")

Validating Webhook Requests

Since direct webhooks don’t include built-in request signing like Svix webhooks, it’s important to implement your own validation strategy to ensure webhook requests are legitimate. Here are several approaches you can use:

Strategy 1: Secret Token in Metadata

Include a secret token in your webhook metadata and validate it on receipt. This is the simplest approach for basic validation.
import os
import secrets
from flask import Flask, request, jsonify, abort
from reducto import Reducto

app = Flask(__name__)
client = Reducto(api_key="your_api_key")

# Load the secret from the environment. The random fallback changes on every
# process start, so set WEBHOOK_SECRET explicitly in any real deployment.
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", secrets.token_urlsafe(32))

# When submitting a job, include the secret in metadata
def submit_job_with_secret():
    submission = client.parse.run_job(
        input="https://ci.reducto.ai/onepager.pdf",
        async_config={
            "webhook": {
                "mode": "direct",
                "url": "https://your-domain.com/webhook/reducto"
            },
            "metadata": {
                "webhook_secret": WEBHOOK_SECRET,
                "user_id": "user_123"
            }
        }
    )
    return submission.job_id

# Validate the secret in your webhook handler
@app.route('/webhook/reducto', methods=['POST'])
def handle_webhook():
    payload = request.json
    metadata = payload.get('metadata', {})
    
    # Validate the webhook secret
    if metadata.get('webhook_secret') != WEBHOOK_SECRET:
        abort(401, "Invalid webhook secret")
    
    job_id = payload['job_id']
    status = payload['status']
    
    if status == "Completed":
        job = client.job.get(job_id)
        # Process the result
        print(f"Job {job_id} completed successfully")
    
    return jsonify({"received": True}), 200
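
The secret check above uses a plain `!=` comparison. A possible hardening, sketched here under the assumption that the secret is stored under the `webhook_secret` metadata key as in the example, is to compare in constant time with the standard library's `hmac.compare_digest` and to reject missing or non-string values. The helper name `secret_matches` is hypothetical.

```python
import hmac
from typing import Any, Mapping


def secret_matches(metadata: Mapping[str, Any], expected_secret: str) -> bool:
    """Return True only if metadata carries the expected secret.

    Assumes the secret lives under the "webhook_secret" key, as in the
    example above.
    """
    provided = metadata.get("webhook_secret")
    # Reject missing or non-string values, and refuse an empty expected secret.
    if not isinstance(provided, str) or not expected_secret:
        return False
    # compare_digest avoids leaking where the two values first differ.
    return hmac.compare_digest(
        provided.encode("utf-8"), expected_secret.encode("utf-8")
    )
```

In the handler, the condition would become `if not secret_matches(metadata, WEBHOOK_SECRET): abort(401, "Invalid webhook secret")`.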

Strategy 2: Job ID Verification

Maintain a list of expected job IDs and only process webhooks for jobs you’ve submitted. This rejects forged requests that reference job IDs you never submitted.
from flask import Flask, request, jsonify, abort
from reducto import Reducto
import redis

app = Flask(__name__)
client = Reducto(api_key="your_api_key")
redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Store job IDs when submitting
def submit_and_track_job():
    submission = client.parse.run_job(
        input="https://ci.reducto.ai/onepager.pdf",
        async_config={
            "webhook": {
                "mode": "direct",
                "url": "https://your-domain.com/webhook/reducto"
            }
        }
    )
    
    # Store the job ID with an expiration (e.g., 24 hours)
    redis_client.setex(f"job:{submission.job_id}", 86400, "pending")
    return submission.job_id

# Validate job ID exists in your system
@app.route('/webhook/reducto', methods=['POST'])
def handle_webhook():
    payload = request.json
    job_id = payload['job_id']
    
    # Check if this is a job we submitted
    if not redis_client.exists(f"job:{job_id}"):
        abort(404, "Unknown job ID")
    
    status = payload['status']
    
    if status == "Completed":
        job = client.job.get(job_id)
        # Process the result
        print(f"Job {job_id} completed successfully")
        
        # Clean up the job ID
        redis_client.delete(f"job:{job_id}")
    
    return jsonify({"received": True}), 200

Strategy 3: IP Allowlisting

Restrict webhook requests to come only from Reducto’s IP addresses. Contact Reducto support to get the current list of webhook source IPs for your deployment.
from flask import Flask, request, jsonify, abort
from reducto import Reducto

app = Flask(__name__)
client = Reducto(api_key="your_api_key")

# Get this list from Reducto support
ALLOWED_IPS = [
    "203.0.113.0/24",  # Example CIDR block
    "198.51.100.0/24"  # Example CIDR block
]

def is_ip_allowed(ip_address):
    """Check if the IP address is in the allowed list"""
    from ipaddress import ip_address as parse_ip, ip_network
    
    client_ip = parse_ip(ip_address)
    for allowed_range in ALLOWED_IPS:
        if client_ip in ip_network(allowed_range):
            return True
    return False

@app.route('/webhook/reducto', methods=['POST'])
def handle_webhook():
    # Get the client IP. X-Forwarded-For is sender-controlled, so rely on it only
    # when your own reverse proxy sets or overwrites it.
    client_ip = request.headers.get('X-Forwarded-For', request.remote_addr)
    if ',' in client_ip:
        client_ip = client_ip.split(',')[0].strip()
    
    # Validate IP address
    if not is_ip_allowed(client_ip):
        abort(403, "IP address not allowed")
    
    payload = request.json
    job_id = payload['job_id']
    status = payload['status']
    
    if status == "Completed":
        job = client.job.get(job_id)
        print(f"Job {job_id} completed successfully")
    
    return jsonify({"received": True}), 200
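
An IP allowlist is only as reliable as the address it checks, and the `X-Forwarded-For` header can be set by any sender. One way to handle this, sketched below, is to honor the header only when the direct peer is one of your own proxies. The `TRUSTED_PROXIES` range and the `resolve_client_ip` helper are hypothetical and would need to match your own infrastructure.

```python
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence

# Hypothetical address ranges of your own reverse proxies or load balancers.
TRUSTED_PROXIES = [ip_network("10.0.0.0/8")]


def resolve_client_ip(remote_addr: str, forwarded_for: Optional[str],
                      trusted_proxies: Sequence = TRUSTED_PROXIES) -> str:
    """Pick the address to check against the allowlist.

    X-Forwarded-For is honored only when the direct peer is a trusted proxy.
    The header is walked right to left, skipping trusted proxies, so entries
    prepended by the sender are ignored. The returned string is not
    guaranteed to be a valid IP address; validate it before use.
    """
    def is_trusted(addr: str) -> bool:
        try:
            parsed = ip_address(addr)
        except ValueError:
            return False
        return any(parsed in net for net in trusted_proxies)

    if not forwarded_for or not is_trusted(remote_addr):
        return remote_addr

    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop was a trusted proxy; fall back to the direct peer.
    return remote_addr
```

In the handler, `client_ip = resolve_client_ip(request.remote_addr, request.headers.get('X-Forwarded-For'))` would replace the first-entry lookup.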

For production systems, combine multiple validation strategies for defense in depth:
from flask import Flask, request, jsonify, abort
from reducto import Reducto
import redis
import os

app = Flask(__name__)
client = Reducto(api_key="your_api_key")
redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True)

WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET")
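# Comma-separated exact IP addresses; empty entries are dropped so an unset
# variable disables the check instead of rejecting every request.
ALLOWED_IPS = [ip.strip() for ip in os.environ.get("ALLOWED_IPS", "").split(",") if ip.strip()]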

def validate_webhook_request(payload, client_ip):
    """Validate webhook using multiple strategies"""
    
    # 1. Validate IP address (if configured)
    if ALLOWED_IPS and client_ip not in ALLOWED_IPS:
        return False, "IP address not allowed"
    
    # 2. Validate job ID exists in our system
    job_id = payload.get('job_id')
    if not redis_client.exists(f"job:{job_id}"):
        return False, "Unknown job ID"
    
    # 3. Validate webhook secret (if configured)
    metadata = payload.get('metadata', {})
    if WEBHOOK_SECRET and metadata.get('webhook_secret') != WEBHOOK_SECRET:
        return False, "Invalid webhook secret"
    
    return True, None

@app.route('/webhook/reducto', methods=['POST'])
def handle_webhook():
    client_ip = request.headers.get('X-Forwarded-For', request.remote_addr)
    if ',' in client_ip:
        client_ip = client_ip.split(',')[0].strip()
    
    payload = request.json
    
    # Validate the request
    is_valid, error_message = validate_webhook_request(payload, client_ip)
    if not is_valid:
        abort(403, error_message)
    
    job_id = payload['job_id']
    status = payload['status']
    
    if status == "Completed":
        job = client.job.get(job_id)
        print(f"Job {job_id} completed successfully")
        redis_client.delete(f"job:{job_id}")
    
    return jsonify({"received": True}), 200

Best Practices

  1. Secure Your Endpoint: Use HTTPS for your webhook URL to ensure secure transmission of data.
  2. Implement Validation: Always validate webhook requests using one or more of the strategies above. For production systems, use a combined approach.
  3. Handle Retries: Your endpoint should be idempotent, as Reducto will retry failed deliveries up to 3 times.
  4. Quick Response: Return a 2xx status code quickly. Process the job result asynchronously if needed to avoid timeouts.
  5. Use Metadata: Include custom metadata in your webhook configuration to help identify and route jobs in your application.
  6. Error Handling: Implement proper error handling in your webhook endpoint to avoid missing notifications.
  7. Monitor Webhook Failures: Log and monitor failed webhook validations to detect potential security issues or misconfigurations.

Comparison with Svix Webhooks

Direct webhooks are simpler to set up but have fewer features compared to Svix webhooks:
| Feature | Direct Webhooks | Svix Webhooks |
| --- | --- | --- |
| Setup complexity | Simple | Requires Svix configuration |
| Retry logic | Basic (3 retries) | Advanced with exponential backoff |
| Webhook signing | Not included | Included for security |
| Delivery tracking | Not included | Full delivery dashboard |
| Multiple endpoints | One URL per job | Multiple endpoints via channels |
For production applications with high reliability requirements, consider using Svix webhooks instead.