Every major video model behind one API surface: Veo, Kling, Runway, Seedance, Wan, Hailuo and more. Below: the comparison table, real per-second pricing, latency numbers and the async integration pattern they all share.
The comparison
List pricing and latency compiled from provider docs and public aggregator rates, checked July 2026. Prices are per second of generated video; a "10s clip" column keeps the math honest.
| Model | $/second | 10s clip | Latency | Max res | Max length | Native audio | Image-to-video | Access |
|---|---|---|---|---|---|---|---|---|
| Google Veo 3.1 (cinematic leader) | $0.05 to 0.20 | $0.50 to 2.00 | 15 to 25s | 1080p | 8s (+extend) | Yes | Yes | Gemini API, Vertex, aggregators |
| Kling 3.0 Pro (speed + value) | $0.084 to 0.112 | $0.84 to 1.12 | 15 to 30s | 1080p | 15s | Yes (Omni) | Yes | Kling API, aggregators |
| Runway Gen-4.5 (quality benchmark #1) | $0.20 to 0.25 | $2.00 to 2.50 | 20 to 40s | 1080p | 10s (+extend) | No | Yes | Runway API, aggregators |
| Seedance 2.0 | $0.08 to 0.10 | $0.80 to 1.00 | ~45s+ | 1080p | 10s | Yes | Yes (9 ref images) | BytePlus, aggregators |
| Wan 2.7 (cheapest 1080p) | $0.02 to 0.10 | $0.20 to 1.00 | 30 to 60s | 1080p | 10s | No | Yes | Alibaba, aggregators, self-host (open) |
| Hailuo 2.3 Pro | ~$0.05 to 0.08 | ~$0.50 to 0.80 | 30 to 60s | 1080p | 10s | No | Yes | MiniMax API, aggregators |
| Luma Ray 3 | ~$0.06 to 0.12 | ~$0.60 to 1.20 | 20 to 40s | 1080p | 10s (+extend) | No | Yes | Luma API, aggregators |
| Amazon Nova Reel | ~$0.08 | ~$0.80 | 60s+ | 720p | 2min (multi-shot) | No | Yes | AWS Bedrock |
| PixVerse v6 | ~$0.04 to 0.08 | ~$0.40 to 0.80 | 20 to 40s | 1080p | 8s | No | Yes | PixVerse API, aggregators |
| Vidu Q3 | ~$0.04 to 0.08 | ~$0.40 to 0.80 | 30 to 60s | 1080p | 16s | No | Yes | Vidu API, aggregators |
| Hunyuan Video (open source) | Compute only | GPU cost | Hardware-bound | 720p+ | ~5s | No | Yes | Self-host, open weights |
| OpenAI Sora 2 (API retiring Sept 2026) | Credit-based | Varies | 60s+ | 1080p | 20s (+ext to 120s) | Yes | Yes | OpenAI API (deprecated), app |
OpenAI has announced the Sora 2 Videos API shuts down on September 24, 2026. If you built on it, the practical migrations are Veo 3.1 (closest on quality plus native audio), Kling 3.0 (cheaper, faster) or an aggregator layer so the next deprecation is a config change instead of a rewrite. That last option is exactly why we're building this unified API.
Know what you're buying
Half the frustration with this market is vendors using one term for three different products. Route yourself first:
Generative video models: Veo, Kling, Runway, Seedance, Wan, Hailuo. A prompt or a still image goes in, novel footage comes out. Priced per second of output. This page (and this API) is about these.
Avatar and presenter video: HeyGen, Synthesia, D-ID, Tavus. A script becomes a talking-head video with a licensed or cloned presenter. Priced per minute or per credit. The right choice for training and personalized outreach video.
Programmatic video editing: Shotstack, Creatomate, JSON2Video. Your existing clips, images and captions are composited by template. Nothing is generated. The right choice for automated slideshows and data-driven video.
Integration
Video generation takes 15 to 60+ seconds, so every serious API is asynchronous: submit a job, then poll or receive a webhook. Here is the same integration in three languages against our unified endpoint. Swap the model string to switch providers; nothing else changes.
    # 1. Submit the job (the JSON body needs an explicit Content-Type with curl -d)
    curl -X POST .../v1/generate \
      -H "Authorization: Bearer $KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"kling-3.0", "prompt":"product rotating on a marble pedestal, studio light", "duration":5}'

    # 2. Poll until done
    curl .../v1/jobs/gen_8f2k1 \
      -H "Authorization: Bearer $KEY"
    # {"status":"succeeded",
    #  "video_url":"https://..."}
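    import time, requests

    # BASE, auth and prompt are defined elsewhere
    job = requests.post(
        f"{BASE}/v1/generate",
        headers=auth,
        json={"model": "veo-3.1", "prompt": prompt, "duration": 8},
    ).json()

    # Poll every 3 seconds until the job leaves the "running" state
    while True:
        j = requests.get(f"{BASE}/v1/jobs/{job['id']}", headers=auth).json()
        if j["status"] != "running":
            break
        time.sleep(3)

    # Only a succeeded job carries a video_url
    if j["status"] == "succeeded":
        print(j["video_url"])
    else:
        print("job ended with status:", j["status"])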
    const job = await fetch(`${BASE}/v1/generate`, {
      method: "POST",
      headers: auth,
      body: JSON.stringify({
        model: "wan-2.7",
        prompt,
        duration: 5,
        webhook_url: "https://app.dev/hooks",
      }),
    }).then(r => r.json());

    // Webhook fires on completion:
    // { id, status: "succeeded",
    //   video_url, seconds_billed }
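On the receiving end, a handler for the completion payload could look roughly like the sketch below. It assumes only the fields shown in the webhook comment (`id`, `status`, `video_url`, `seconds_billed`) and assumes that a delivery may arrive more than once. The `jobs` dict and the `handle_webhook` name are illustrative, not part of the API.

```python
from typing import Any

# Hypothetical in-memory store keyed by job ID; a real service would use a database.
jobs: dict[str, dict[str, Any]] = {}


def handle_webhook(payload: dict[str, Any]) -> bool:
    """Record a completion event.

    Returns True if the stored state changed and False for a repeat delivery,
    so callers can skip downstream work (emails, billing records) on duplicates.
    """
    job_id = payload["id"]
    if jobs.get(job_id, {}).get("status") == payload["status"]:
        return False  # same job, same status: treat as a duplicate
    jobs[job_id] = {
        "status": payload["status"],
        "video_url": payload.get("video_url"),
        "seconds_billed": payload.get("seconds_billed", 0),
    }
    return True
```

Keeping the handler idempotent means a retried delivery cannot trigger the same follow-up action twice.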
Two production notes the docs never lead with. First, always implement the webhook path even if you start with polling: long generations plus polling loops are how serverless bills explode. Second, store the provider job ID before you await anything; when a function times out mid-poll, that ID is the difference between resuming and re-billing.
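The second note can be sketched as follows: write the job ID to durable storage right after submit, then poll with a deadline so a timed-out function can pick up where it left off. This is a minimal sketch under assumptions: the JSON file stands in for whatever storage you use, and `fetch_status` is a hypothetical callable wrapping `GET /v1/jobs/{id}`.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def save_job_id(path: Path, job_id: str) -> None:
    """Persist the job ID immediately after submit, before any polling."""
    path.write_text(json.dumps({"job_id": job_id}))


def load_job_id(path: Path) -> Optional[str]:
    """Return a previously saved job ID, or None if no job is in flight."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["job_id"]


def poll(fetch_status: Callable[[str], dict], job_id: str,
         interval: float = 3.0, timeout: float = 300.0) -> dict:
    """Poll until the job leaves "running" or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] != "running":
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running; resume later with the saved ID")
```

On a cold start, call `load_job_id` first and only submit a new job when it returns `None`; that is what turns a timeout into a resume instead of a second charge.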
Cost at scale
Per-second prices look interchangeable until you multiply. The same monthly workload, 1,000 clips of 10 seconds each, priced across the field:
| Model | 1,000 x 10s clips / month | Read |
|---|---|---|
| Wan 2.7 | $200 to 1,000 | The volume play. B-roll, drafts, high-iteration creative testing. |
| Hailuo 2.3 / PixVerse / Vidu | $400 to 800 | The value middle. Good quality per dollar for social content. |
| Seedance 2.0 | $800 to 1,000 | Pay for multimodal control: 9 reference images, audio input. |
| Kling 3.0 Pro | $840 to 1,120 | Fastest latency tier plus audio. The ads workhorse. |
| Veo 3.1 | $500 to 2,000 | Cinematic ceiling; the range is the resolution tier you pick. |
| Runway Gen-4.5 | $2,000 to 2,500 | Benchmark-topping quality. Hero shots, not volume. |
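The figures in this table are plain multiplication of the per-second list rates by clip length and volume. A small sketch, using three rate ranges copied from the comparison table (the function and dictionary names are illustrative):

```python
def monthly_cost(rate_per_second: float, clips: int = 1000,
                 seconds_per_clip: int = 10) -> float:
    """Monthly spend in dollars for a fixed workload at a per-second rate."""
    return round(rate_per_second * seconds_per_clip * clips, 2)


# (low, high) list rates in $/second, taken from the comparison table
RATES = {
    "Wan 2.7": (0.02, 0.10),
    "Veo 3.1": (0.05, 0.20),
    "Runway Gen-4.5": (0.20, 0.25),
}

for model, (low, high) in RATES.items():
    print(f"{model}: ${monthly_cost(low):,.0f} to ${monthly_cost(high):,.0f}")
```

Change `clips` or `seconds_per_clip` to price your own workload before comparing tiers.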
The strategy that falls out of this table: route by shot value. Draft and iterate on a cheap model, then re-render the winning prompt on a premium one. Teams doing this cut video spend by 60 to 80 percent versus running everything on the flagship, and a unified API makes the switch a one-line change.
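To see where a saving of that size can come from, here is a rough worked example. The number of drafts per final shot is an assumption chosen for illustration; the two rates are the low end of Wan 2.7 and the high end of Runway Gen-4.5 from the comparison table.

```python
def routed_savings(drafts: int, seconds: int,
                   cheap_rate: float, premium_rate: float) -> float:
    """Fraction saved by drafting on a cheap model and re-rendering once on a premium one.

    Baseline: every draft is rendered on the premium model.
    Routed:   every draft on the cheap model, plus one premium re-render of the winner.
    """
    baseline = drafts * seconds * premium_rate
    routed = drafts * seconds * cheap_rate + seconds * premium_rate
    return 1 - routed / baseline


# Assumed: 5 drafts per final 10-second shot
saving = routed_savings(drafts=5, seconds=10, cheap_rate=0.02, premium_rate=0.25)
print(f"{saving:.0%} saved")
```

The saving grows with the number of drafts per shot and shrinks as the price gap between the two models narrows, so it is worth plugging in your own iteration counts.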
Use-case router
| You're building | Use | Why |
|---|---|---|
| Ad creative testing at volume | Kling 3.0, Seedance 2.0 | Cheap enough to test 20 variants, fast enough to iterate same-day, native audio for sound-on placements. |
| Cinematic brand film | Veo 3.1, Runway Gen-4.5 | Best physics, coherence and camera language. Render few, render high. |
| Social b-roll pipelines | Wan 2.7, Hailuo 2.3 | Cost per clip low enough to generate daily at feed scale. |
| Product demos from stills | Kling 3.0, PixVerse v6 | Strong image-to-video: your real product photos become motion without a studio. |
| Avatar or training video | HeyGen, Synthesia, Tavus | Different category (see above). Script-to-presenter beats generative models for talking heads. |
| Full control / no per-second fees | Hunyuan, Wan (open weights) | Self-host on your GPUs. You take on the ops burden in exchange for compute-only marginal cost. |
Going direct to one provider gets you enterprise SLAs, the newest checkpoints first, and one vendor relationship. It also gets you their rate limits, their deprecation schedule (see Sora) and a rewrite every time the leaderboard flips, which in this market is roughly quarterly.
An aggregation layer (one key, one schema, model as a string parameter) trades a small markup for portability: reroute when prices drop, A/B models on real traffic, and survive deprecations with a config change. Our position is obviously the second camp, since that is what we are building, but the honest rule is: direct if you are certain of your model and volume-negotiating with the vendor; aggregate if you value optionality. Most teams shipping product, not research, want optionality.
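In practice, "model as a string parameter" tends to end up as a small routing table in config. The sketch below is one way to express it. The use-case keys are made up, and of the model strings only `kling-3.0`, `veo-3.1` and `wan-2.7` appear in the samples on this page; the rest are assumed spellings.

```python
# Hypothetical routing table: use case -> models in order of preference.
ROUTES: dict[str, list[str]] = {
    "ad_testing": ["kling-3.0", "seedance-2.0"],
    "brand_film": ["veo-3.1", "runway-gen-4.5"],
    "social_broll": ["wan-2.7", "hailuo-2.3"],
}

# Models to skip, for example after a deprecation notice.
DISABLED: set[str] = {"sora-2"}


def pick_model(use_case: str, routes: dict[str, list[str]] = ROUTES,
               disabled: set[str] = DISABLED) -> str:
    """Return the first preferred model for a use case that is not disabled."""
    for model in routes[use_case]:
        if model not in disabled:
            return model
    raise LookupError(f"no enabled model configured for {use_case!r}")
```

With this shape, surviving a deprecation means adding one string to `DISABLED` (or reordering a list) rather than touching the submit and poll code.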
FAQ
What is the cheapest AI video API?
Wan 2.7 is the cheapest mainstream 1080p option at roughly $0.02 to 0.10 per second depending on host and tier, which puts a 10-second clip at $0.20 to 1.00. If you can run your own GPUs, open-weight models (Wan, Hunyuan) drop marginal cost to pure compute. For hosted volume work, the Hailuo and PixVerse tier around $0.04 to 0.08 per second is the usual sweet spot.
Which video model is fastest?
Veo 3.1 and Kling 3.0 currently lead on latency, typically returning a 5 to 10 second clip in 15 to 30 seconds. Seedance and Wan trade speed for cost or control, usually landing at 30 to 60+ seconds. Latency varies with load, resolution and duration; benchmark with your own prompts at your usual hours before committing.
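A benchmark of that kind does not need much. The sketch below times any callable end to end; `generate` is a placeholder for your own submit-and-wait function, not something the API provides.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Time repeated calls to `generate` and summarize wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()  # should block until the clip is ready
        samples.append(time.perf_counter() - start)
    return {"median_s": statistics.median(samples), "max_s": max(samples)}
```

Run it per model at the hours your traffic actually peaks, and compare the worst case as well as the median, since tail latency is what users notice.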
Which models generate audio natively?
Veo 3.1, Kling 3.0 (Omni), Seedance 2.0 and Sora 2 generate synchronized audio natively, including ambient sound and speech. Runway, Wan, Hailuo and Luma output silent video; you add sound in post or via a TTS pass. For sound-on placements like TikTok, native audio saves a pipeline step and usually sounds more coherent.
Is the Sora 2 API shutting down?
OpenAI has announced the Sora 2 Videos API shuts down on September 24, 2026. Existing integrations keep working until then. If you are choosing today, build against Veo 3.1 or Kling 3.0, or integrate through an abstraction layer so the next provider change is configuration, not code.
How are video generation APIs priced?
Most providers bill per second of generated output, with the rate depending on model and resolution tier. Some (Runway, Sora) use credit systems that abstract the same thing. Aggregators normalize everything to per-second or per-clip pricing. Watch for the failed-generation policy: good APIs don't bill failed or moderation-blocked jobs, but not all are good.
Can I use the output commercially?
Hosted providers (Google, Kling, Runway, MiniMax) grant commercial usage rights on paid tiers, and none of the majors watermark paid API output. Open-weight models vary: Wan's license permits commercial use, while others restrict use above a certain scale. Whatever you ship, keep the generation metadata; provenance requirements (C2PA labeling) are tightening across ad platforms.
What resolution and clip length can I expect?
The current hosted standard is 1080p at 5 to 15 seconds per generation, with extension endpoints chaining clips toward 30 to 120 seconds. Nova Reel does multi-shot two-minute videos at 720p. True 4K generation is not commercially standard yet; teams upscale 1080p output when they need it.
Can I self-host a video model?
Yes: Hunyuan Video and Wan publish open weights that run on a single high-memory GPU (think H100 or a rented A100-class card). You give up managed scaling, safety filtering and the newest checkpoints, and gain unlimited generation at fixed hardware cost. Break-even versus API pricing typically lands at several thousand clips per month.
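The break-even point is a single division once you know your fixed hardware bill. The numbers below are assumptions for illustration only (a GPU rented at $2.00 per hour for a 730-hour month, compared with a hosted 10-second clip at $0.50); substitute your own quotes.

```python
import math


def break_even_clips(fixed_monthly_cost: float, api_price_per_clip: float) -> int:
    """Clips per month at which a fixed self-hosting bill equals hosted API spend."""
    return math.ceil(fixed_monthly_cost / api_price_per_clip)


# Assumed figures, not quotes: $2.00/hour GPU rental, always on, 730 hours per month
gpu_monthly = 2.00 * 730
# Assumed hosted price: $0.50 per 10-second clip
print(break_even_clips(gpu_monthly, 0.50))
```

This ignores engineering time, storage and idle capacity, all of which push the real break-even higher.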
Should I go direct to a provider or use an aggregator?
Direct if you have negotiated volume pricing with one vendor and stability matters more than flexibility. Aggregator if you want to route between models, hedge deprecations and test quality-per-dollar continuously. The market leaderboard has flipped roughly every quarter since 2024, which is the strongest argument for keeping the model name a string in your config.
Which AI video model is best?
Runway Gen-4.5 tops quality benchmarks, Veo 3.1 leads the cinematic-plus-audio combination, Kling 3.0 wins price-to-performance, and Wan 2.7 wins pure price. There is no single best, which is the point of this page: match the table above to your use case, or use a unified API and stop betting your roadmap on one vendor.
We're onboarding early-access developers now. Unified schema, per-second billing, webhooks, and model routing across everything in the table above.