cURL Integration
Use ScrapeHub directly from the command line
Basic Scrape Request
```bash
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "format": "json"
  }'
```
Advanced Configuration
With JavaScript Rendering
```bash
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "render_js": true,
    "wait_for": ".product-list",
    "format": "json"
  }'
```
With Custom Headers
```bash
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
      "Accept-Language": "en-US,en;q=0.9",
      "Referer": "https://example.com"
    }
  }'
```
With Pagination
```bash
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "pagination": {
      "enabled": true,
      "max_pages": 10,
      "selector": "a.next-page"
    }
  }'
```
Creating Async Jobs
```bash
# Create job
curl -X POST https://api.scrapehub.io/v4/jobs \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/large-dataset",
    "engine": "neural-x1"
  }'

# Response:
# {
#   "job_id": "job_abc123xyz",
#   "status": "pending",
#   "created_at": "2026-01-27T10:30:00Z"
# }
```
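Rather than copying the job id out of the response by hand, it can be captured with `jq` (which this guide already uses for pretty-printing). A minimal sketch; `create_job` is a hypothetical helper, not part of the ScrapeHub API, and it assumes `SCRAPEHUB_API_KEY` is exported and `jq` is installed:

```shell
# Hypothetical helper: create a job and print only its id.
create_job() {
  local target=$1
  curl -s -X POST https://api.scrapehub.io/v4/jobs \
    -H "X-API-KEY: $SCRAPEHUB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{ \"target\": \"$target\", \"engine\": \"neural-x1\" }" \
    | jq -r '.job_id'
}

# Usage (sketch):
# JOB_ID=$(create_job "https://example.com/large-dataset")
# echo "Created $JOB_ID"
```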
Checking Job Status
```bash
curl -X GET https://api.scrapehub.io/v4/jobs/job_abc123xyz \
  -H "X-API-KEY: sk_live_xxxx_449x"
```
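For long-running jobs, the status check above can be wrapped in a polling loop. A sketch, assuming the job's `status` field moves from `pending` to `completed` or `failed` (matching the creation response shown earlier); `fetch_status` and `poll_job` are hypothetical helpers, and `SCRAPEHUB_API_KEY` and `jq` are assumed available:

```shell
# Hypothetical helpers for polling a job until it finishes.
fetch_status() {
  curl -s "https://api.scrapehub.io/v4/jobs/$1" \
    -H "X-API-KEY: $SCRAPEHUB_API_KEY" | jq -r '.status'
}

poll_job() {
  local job_id=$1 max_attempts=${2:-30} delay=${3:-5} status i
  for ((i = 1; i <= max_attempts; i++)); do
    status=$(fetch_status "$job_id")
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    sleep "$delay"
  done
  echo "timeout"
  return 1
}

# Usage (sketch):
# poll_job job_abc123xyz 30 5 && echo "ready to export"
```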
Exporting Results
Export as JSON
```bash
curl -X GET "https://api.scrapehub.io/v4/jobs/job_abc123xyz/export?format=json" \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -o results.json
```
Export as CSV
```bash
curl -X GET "https://api.scrapehub.io/v4/jobs/job_abc123xyz/export?format=csv" \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -o results.csv
```
Export as Compressed JSON (gzip)
```bash
curl -X GET "https://api.scrapehub.io/v4/jobs/job_abc123xyz/export?format=json&compress=gzip" \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -o results.json.gz
```
Pretty Print JSON
```bash
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1"
  }' | jq '.'
```
Using Variables
```bash
# Store the API key as an environment variable
export SCRAPEHUB_API_KEY="sk_live_xxxx_449x"

# Use it in requests
curl -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: $SCRAPEHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1"
  }'
```
Batch Processing
batch_scrape.sh
```bash
#!/bin/bash
API_KEY="sk_live_xxxx_449x"

URLS=(
  "https://example.com/category/1"
  "https://example.com/category/2"
  "https://example.com/category/3"
)

for url in "${URLS[@]}"; do
  echo "Scraping: $url"
  curl -X POST https://api.scrapehub.io/v4/scrape \
    -H "X-API-KEY: $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{ \"target\": \"$url\", \"engine\": \"neural-x1\" }" \
    > "results_$(basename "$url").json"
  echo "Saved results for $url"
done
```
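For larger URL lists, the sequential loop in `batch_scrape.sh` can be fanned out across parallel workers with `xargs -P`. A sketch under the same assumptions; `scrape_one` and `run_batch` are hypothetical helpers, and `SCRAPEHUB_API_KEY` is assumed to be exported so child shells can see it:

```shell
# Hypothetical helpers: scrape one URL, then fan a list of URLs
# out across parallel workers.
scrape_one() {
  local url=$1
  curl -s -X POST https://api.scrapehub.io/v4/scrape \
    -H "X-API-KEY: $SCRAPEHUB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{ \"target\": \"$url\", \"engine\": \"neural-x1\" }" \
    > "results_$(basename "$url").json"
}

run_batch() {
  export -f scrape_one
  # -P 4 runs up to four scrapes at a time; mind your plan's rate limits.
  printf '%s\n' "$@" | xargs -P 4 -I {} bash -c 'scrape_one "$1"' _ {}
}

# Usage (sketch):
# run_batch "https://example.com/category/1" "https://example.com/category/2"
```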
Error Handling
```bash
# Capture the HTTP status code
HTTP_STATUS=$(curl -s -o response.json -w "%{http_code}" \
  -X POST https://api.scrapehub.io/v4/scrape \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1"
  }')

if [ "$HTTP_STATUS" -eq 200 ]; then
  echo "Success! Results saved to response.json"
  jq '.' response.json
elif [ "$HTTP_STATUS" -eq 401 ]; then
  echo "Error: Invalid API key"
elif [ "$HTTP_STATUS" -eq 429 ]; then
  echo "Error: Rate limit exceeded"
else
  echo "Error: HTTP $HTTP_STATUS"
  cat response.json
fi
```
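A 429 is usually transient, so the status handling above pairs naturally with retries. A minimal exponential-backoff sketch; `retry_with_backoff` and the `RETRY_BASE_DELAY` variable are hypothetical, and the usage comment mirrors the request above (curl's `-f` flag makes it exit non-zero on HTTP errors such as 429):

```shell
# Hypothetical helper: run a command, retrying on failure with
# exponentially growing delays (1s, 2s, 4s, ... by default).
retry_with_backoff() {
  local max_attempts=$1; shift
  local delay=${RETRY_BASE_DELAY:-1} attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}

# Usage (sketch):
# retry_with_backoff 5 curl -sf -X POST https://api.scrapehub.io/v4/scrape \
#   -H "X-API-KEY: $SCRAPEHUB_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{ "target": "https://example.com/products", "engine": "neural-x1" }' \
#   -o response.json
```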
Webhook Integration
```bash
curl -X POST https://api.scrapehub.io/v4/jobs \
  -H "X-API-KEY: sk_live_xxxx_449x" \
  -H "Content-Type: application/json" \
  -d '{
    "target": "https://example.com/products",
    "engine": "neural-x1",
    "webhook_url": "https://your-server.com/webhook"
  }'
```
Pro Tips
- Use the `-s` flag for silent mode (no progress bar)
- Use the `-v` flag for verbose output (debugging)
- Pipe results through `jq` for JSON formatting
- Store API keys in environment variables for security
- Use `-o filename` to save output to a file