AI Phone Call API Tutorial: Make Your First Outbound Call in 10 Minutes
An AI phone call API is an HTTP endpoint that dials a real phone number, runs a realtime voice agent against the audio stream, and returns a structured transcript and recording when the call ends. This tutorial walks the full loop end-to-end with copy-paste examples in cURL and Python, using ClawCall's REST API at api.clawcall.dev as the worked example. You will go from zero credentials to a finalized call transcript in about ten minutes. The post covers the asynchronous fire-and-poll pattern, the tri-auth model (anonymous, API key, Clerk session), the call lifecycle states, recording retrieval, the bridge-to-human handoff, and the agent-skill drop-in for Claude Code and Cursor. No credit card is required for the first 30 calls + 30 minutes.
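The fire-and-poll loop described above can be sketched in Python with only the standard library. This is a minimal sketch, not ClawCall's official client. The endpoint paths, `call_id`, `lifecycle`, `api_key`, and the `finalized` state come from this post. The `https://` scheme, the JSON field names `to` and `task` in the request body, and the `Authorization: Bearer` header are assumptions made for illustration. The HTTP transport is injectable, so the loop can be exercised without placing a real call.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.clawcall.dev"  # host from the post; https scheme assumed

# A transport takes (method, path, body, api_key) and returns the decoded JSON body.
Transport = Callable[[str, str, Optional[dict], Optional[str]], dict]


def http_transport(method: str, path: str, body: Optional[dict], api_key: Optional[str]) -> dict:
    """Send one JSON request with urllib. The Bearer header format is an ASSUMPTION."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # hypothetical auth header
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode())


def place_and_wait(
    to: str,
    task: str,
    transport: Transport = http_transport,
    api_key: Optional[str] = None,
    interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict, Optional[str]]:
    """Fire POST /call, then poll GET /call/:id until lifecycle == "finalized".

    Returns (final_call_json, api_key). The "to"/"task" field names are hypothetical.
    """
    created = transport("POST", "/call", {"to": to, "task": task}, api_key)
    # Anonymous first calls return an api_key; keep it for every later request.
    api_key = created.get("api_key", api_key)
    call_id = created["call_id"]
    waited = 0.0
    while True:
        call = transport("GET", f"/call/{call_id}", None, api_key)
        if call.get("lifecycle") == "finalized":
            return call, api_key
        if waited >= timeout:
            raise TimeoutError(f"call {call_id} not finalized after {timeout}s")
        sleep(interval)
        waited += interval
```

In real use you would call `place_and_wait("+14155550123", "Ask whether they are open Saturday")` and persist the returned key. The five-second default interval follows the "poll every few seconds" guidance. The timeout guards against a call that never reaches `finalized`.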
What an AI Phone Call API Actually Does
Prerequisites and the Tri-Auth Model
Your First Call: One cURL Command
Polling for the Result: The Call Lifecycle
A Working Python Loop
Handling Bridges, Recordings, and IVR Edge Cases
Skipping the Tutorial: The Agent Skill for Claude Code and Cursor
How This API Compares to the Alternatives
Frequently asked questions
- How long does it take to make my first AI phone call with the ClawCall API?
- About ten minutes from a cold start. The first POST /call request is anonymous and requires no signup, no credit card, and no provisioned phone number. You send one HTTP request with a destination number in E.164 format and a plain-English task description, save the api_key returned in the response, and poll GET /call/:id every few seconds until lifecycle equals finalized. The first 30 calls + 30 minutes of calling are free. If you are integrating into Claude Code or Cursor, the agent skill at /for-agents wraps the entire flow and gets you to a working call in roughly half that time. No SDK install is required for the raw API path.
- What does the response from POST /call look like?
- POST /call returns immediately with a JSON body containing call_id, lifecycle set to queued, and (on anonymous first calls) an api_key field starting with clawcall_sk_. The full call result is not in this response because real phone calls take minutes to complete; you poll GET /call/:id to get the final state. The polled response separates two concerns: lifecycle (queued, dialing, answered, finalized) tells you whether to keep polling, and outcome (completed, voicemail_detected, no_answer, busy, declined_by_callee, failed, and others) tells you what happened. The final response also includes talk_seconds, the transcript array, a recording_url, and a hangup_cause field.
- Do I need to handle voicemail, IVR menus, or hold time myself?
- No. The voice agent handles all three internally. When the call reaches voicemail, it follows your task: leave a concise message if you asked for one, or report that voicemail was reached if you did not. IVR menus are navigated automatically using a built-in send_dtmf tool; the menu prompts and the digits the agent pressed appear in your transcript so you can audit them. Hold time is handled by the agent simply waiting; talk_seconds reflects the full duration including hold. You do not write DTMF logic, voicemail heuristics, or hold-detection code. Your client code is one POST and a polling loop, regardless of how complex the actual call ends up being on the line.
- Can the AI hand the call to me when a human needs to take over?
- Yes, via the bridge_number parameter on POST /call. Pass your own US phone number and the agent gets a loop_in_user tool. When the agent decides handoff is appropriate (verification step, payment authorization, an unusual request, or the human on the other end insisting), it tells the callee "connecting you now," the carrier bridges the two phone calls at the network level, and you finish the conversation yourself. Bridge calls consume two numbers from the outbound pool, so they count as two against your concurrency for the duration. The bridge contract is documented at /docs and the consumer-facing pattern is shown at /hold-for-me.
- What are the rate limits and concurrency limits on the API?
- Free-tier anonymous calls are rate-limited by identity and capped at 30 calls and 30 minutes, whichever runs out later, before sign-up is required. On a paid plan, the default outbound concurrency is roughly three simultaneous calls per account, set by the size of the shared outbound phone-number pool. Bridge calls consume two numbers each, so they count as two against your concurrency. Unlimited Reserve and Unlimited Reserve Plus subscribers get a private reserved number that does not draw from the shared pool. There is no per-minute billing on any plan; pricing is flat monthly at $4.99 for Unlimited, $8.99 for Unlimited Reserve, and $14.99 for Unlimited Reserve Plus. Only US numbers are supported today.
- How does this compare to Vapi, Retell, or building on a raw carrier plus an LLM?
- Vapi and Retell are infrastructure-grade voice platforms designed for teams building their own voice products end to end, with per-minute pricing and rich assistant-configuration APIs. Rolling your own on a raw carrier plus a realtime LLM is the fully assemble-it-yourself path; you control everything and you also build the SIP plumbing, codec bridge, and turn-taking logic. ClawCall sits one level higher: a finished consumer-and-agent product exposed through a clean REST API, with flat monthly pricing, a no-card free tier of 30 calls and 30 minutes (whichever runs out later), mandatory AI disclosure, voicemail support, and a drop-in agent skill for Claude Code and Cursor. Pick the platform that matches what you are actually building. See /vs/vapi and /vs/retell for line-by-line breakdowns.
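The FAQ answer on the POST /call response separates lifecycle (whether to keep polling) from outcome (what happened on the line). A minimal sketch of acting on one GET /call/:id response follows, assuming the fields behave as described there. The action labels and the retry policy (retry on no_answer or busy, send anything else unexpected to a human) are hypothetical choices for illustration. Because the post's outcome list ends with "and others", unknown values fall through to manual review rather than raising.

```python
from dataclasses import dataclass
from typing import Optional

# Outcome values named in the post; the list is not exhaustive ("and others").
FINAL_OK = {"completed", "voicemail_detected"}
RETRYABLE = {"no_answer", "busy"}


@dataclass
class CallSummary:
    action: str  # hypothetical policy label: "poll", "done", "retry", or "review"
    outcome: Optional[str]
    talk_seconds: int
    recording_url: Optional[str]


def summarize(call: dict) -> CallSummary:
    """Decide what to do with one GET /call/:id response."""
    if call.get("lifecycle") != "finalized":
        # Only lifecycle says whether to keep polling; ignore outcome until then.
        return CallSummary("poll", None, 0, None)
    outcome = call.get("outcome")
    if outcome in FINAL_OK:
        action = "done"
    elif outcome in RETRYABLE:
        action = "retry"
    else:
        # declined_by_callee, failed, and any unlisted outcome go to a human.
        action = "review"
    return CallSummary(
        action,
        outcome,
        int(call.get("talk_seconds") or 0),
        call.get("recording_url"),
    )
```

Keying the polling decision only on lifecycle keeps the loop correct even if new outcome values appear later.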
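The FAQ says a call needs a destination in E.164 format plus a plain-English task, that only US numbers are supported, and that passing bridge_number enables the agent's loop_in_user handoff. A small client-side request builder can catch malformed numbers before they reach the API. This is a sketch under assumptions: only `bridge_number` is a parameter named in the post, while `to` and `task` are hypothetical field names, and the check only enforces the US E.164 shape (+1 followed by ten digits), not whether the number is actually assigned.

```python
import re
from typing import Optional

# US numbers in E.164 form: "+1" followed by a 10-digit NANP number.
US_E164 = re.compile(r"\+1[0-9]{10}")


def build_call_body(to: str, task: str, bridge_number: Optional[str] = None) -> dict:
    """Build a POST /call JSON body.

    "to" and "task" are HYPOTHETICAL field names; "bridge_number" is named in the post.
    """
    if not US_E164.fullmatch(to):
        raise ValueError(f"destination must be a US number in E.164 form, got {to!r}")
    if not task.strip():
        raise ValueError("task must be a non-empty plain-English description")
    body = {"to": to, "task": task}
    if bridge_number is not None:
        if not US_E164.fullmatch(bridge_number):
            raise ValueError(f"bridge_number must be a US E.164 number, got {bridge_number!r}")
        # Supplying a bridge number is what gives the agent its loop_in_user tool.
        body["bridge_number"] = bridge_number
    return body
```

Using `fullmatch` rather than `match` rejects trailing characters such as a stray newline pasted from a spreadsheet.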
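The concurrency answers above say a paid account gets roughly three simultaneous outbound calls and that a bridge call counts as two. If you fan calls out across worker threads, a small client-side gate can keep you under that ceiling. This is a hypothetical helper, not part of ClawCall's API: the default limit of 3 mirrors the post's "roughly three", so set it to whatever your account actually allows. Reserve-plan numbers that do not draw from the shared pool may not need it at all.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class CallSlots:
    """Client-side gate that keeps concurrent outbound calls under a limit.

    ASSUMPTION: limit=3 mirrors the post's "roughly three" default.
    A bridge call occupies two slots, a plain call one.
    """

    def __init__(self, limit: int = 3) -> None:
        self._limit = limit
        self._in_use = 0
        self._cond = threading.Condition()

    @property
    def in_use(self) -> int:
        with self._cond:
            return self._in_use

    @contextmanager
    def slot(self, bridge: bool = False, timeout: Optional[float] = None) -> Iterator[None]:
        need = 2 if bridge else 1
        if need > self._limit:
            raise ValueError(f"a call needing {need} slots can never fit in {self._limit}")
        with self._cond:
            # Block until enough slots are free, or give up after `timeout` seconds.
            if not self._cond.wait_for(lambda: self._in_use + need <= self._limit, timeout):
                raise TimeoutError("no free call slots")
            self._in_use += need
        try:
            yield
        finally:
            with self._cond:
                self._in_use -= need
                self._cond.notify_all()
```

Wrap each call you place in `with slots.slot(bridge=bridge_number is not None):`. Reserving both slots in a single step avoids the deadlock you could get by acquiring a counting semaphore twice.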