AI Phone Call API Tutorial: Make Your First Outbound Call in 10 Minutes

An AI phone call API is an HTTP endpoint that dials a real phone number, runs a realtime voice agent against the audio stream, and returns a structured transcript and recording when the call ends. This tutorial walks the full loop end-to-end with copy-paste examples in cURL and Python, using ClawCall's REST API at api.clawcall.dev as the worked example. You will go from zero credentials to a finalized call transcript in about ten minutes. The post covers the asynchronous fire-and-poll pattern, the tri-auth model (anonymous, API key, Clerk session), the call lifecycle states, recording retrieval, the bridge-to-human handoff, and the agent-skill drop-in for Claude Code and Cursor. No credit card is required for the first 30 calls + 30 minutes.

What an AI Phone Call API Actually Does

An AI phone call API is an HTTP endpoint you POST to with a destination number and a task description. The provider's backend dials the number over the PSTN, runs a realtime voice agent against the audio stream, and hands back a structured result once the call ends. The agent handles the messy parts a normal API cannot: pressing IVR digits, waiting through hold music, repeating itself when a receptionist mishears, recognizing voicemail and following your message fallback, and bridging the call to a human when verification is needed. You are not assembling a SIP trunk, a speech-to-text pipeline, a turn-taking model, and a TTS engine yourself. You are calling a finished product.

The most common use cases fall into two camps. Developers wire calls into agents and workflows: a coding agent session that calls a clinic to book an appointment, a task that disputes a duplicate utility charge, an automation flow that confirms restaurant reservations the night before. Consumers use the same backend through chat or a web dashboard for one-off tasks like canceling a gym membership or sitting on hold with the airline. A well-designed API serves both surfaces from the same REST contract, which means anything you build composes cleanly with the rest of the product. There is no separate developer-only tier with different limits or different model behavior.

The shape of the API tends to be similar across vendors: a POST endpoint that returns an identifier immediately, a polling endpoint for status, and a transcript plus recording at the end. What varies is how much you have to configure (assistants, prompts, voices, transports), how billing works (per-minute vs flat), and whether the call agent has guardrails around voicemail and AI disclosure.

Prerequisites and the Tri-Auth Model

You need three things: a terminal, a US phone number you control for the first test call, and roughly ten minutes. You do not need a credit card, you do not need to provision a phone number, and you do not need to wire up Telnyx, Deepgram, or ElevenLabs accounts. ClawCall manages the full telephony and voice stack and exposes a single REST surface at api.clawcall.dev. The first 30 calls + 30 minutes of calling are free.

Authentication uses a tri-auth model: anonymous (no header), API key (X-Api-Key header), or a session token. The interesting move is the anonymous path. The first POST /call without any credentials auto-issues a proto-key in the response body under the api_key field. Store it and send it as X-Api-Key on every subsequent request. When you sign up, you link that key to your account and it keeps working. This lets you test the integration with zero friction and upgrade to a paid plan later without rewriting your client code or rotating credentials. The proto-key is rate-limited to your IP's free-tier allowance until you link it, at which point it draws from your subscription.

The practical sequence for this tutorial: open a terminal, have your own US mobile number ready in E.164 format (e.g., +14155550123), and pick a one-line task description in plain English. You will not write a script, a state machine, or a dialog tree. You tell the agent what outcome you want and any constraints it should respect, and the agent does the rest.
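The proto-key bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not ClawCall's client: the key file location and the helper names (remember_key, load_key, auth_headers) are assumptions made for the example. Only the api_key response field and the X-Api-Key header come from the text.

```python
from pathlib import Path
from typing import Optional

# Hypothetical storage location for the proto-key; pick whatever suits your setup.
KEY_FILE = Path.home() / ".clawcall_key"


def load_key(key_file: Path = KEY_FILE) -> Optional[str]:
    """Return the stored key, or None if nothing has been saved yet."""
    if key_file.exists():
        return key_file.read_text().strip() or None
    return None


def remember_key(response_body: dict, key_file: Path = KEY_FILE) -> Optional[str]:
    """Persist the proto-key when a response carries one; return the key in use."""
    key = response_body.get("api_key")
    if key:
        key_file.write_text(key)
        return key
    return load_key(key_file)


def auth_headers(key: Optional[str]) -> dict:
    """Anonymous requests send no auth header; keyed requests send X-Api-Key."""
    return {"X-Api-Key": key} if key else {}
```

On the first anonymous call, pass the parsed JSON body to remember_key. On every later request, send auth_headers(load_key()).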

Your First Call: One cURL Command

Open a terminal and run a single POST. Replace +15555550123 with your own US mobile number so the first test call comes to you. The to field accepts E.164 format and is restricted to +1 NANP today. The task field is plain English: describe the outcome you want and any constraints the agent should respect.

    curl -X POST https://api.clawcall.dev/call \
      -H 'Content-Type: application/json' \
      -d '{"to": "+15555550123", "task": "Tell whoever answers this is a test of the ClawCall API and ask what the weather is where they are."}'

The response returns immediately with a call_id, a lifecycle of queued, and an api_key field if this was an anonymous call. Save both. The API is fire-and-poll by design: dialing, ringing, talking, and hangup all happen asynchronously on the telephony side, and you read the final state by polling GET /call/:id rather than holding an HTTP connection open. This matters because realistic calls run two to fifteen minutes, and no load balancer should hold a request open that long.

The pattern also keeps your client code small: one POST, then a polling loop, then a transcript. If your runtime is serverless or background-job based, the asynchronous shape is the only viable model: a synchronous wait-for-call-to-finish endpoint would either time out at the edge or hold a worker hostage. The same fire-and-poll contract appears in most production voice APIs, so the pattern you learn here ports cleanly to other vendors if you ever switch.
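If you build the request body in code rather than by hand, a client-side check on the to field catches formatting mistakes before a request goes out. The sketch below assumes the standard NANP shape (+1, then a ten-digit number whose area code and exchange both start with 2-9). The server's actual validation rules are not documented here, and build_call_payload is a hypothetical helper name.

```python
import json
import re

# +1 followed by NXX-NXX-XXXX, where N is 2-9 (standard NANP numbering).
NANP_E164 = re.compile(r"\+1[2-9]\d{2}[2-9]\d{6}")


def build_call_payload(to: str, task: str) -> str:
    """Return the JSON body for POST /call, rejecting obviously bad input."""
    if not NANP_E164.fullmatch(to):
        raise ValueError(f"expected a +1 NANP number in E.164 format, got {to!r}")
    if not task.strip():
        raise ValueError("task must be a non-empty plain-English description")
    return json.dumps({"to": to, "task": task})
```

The returned string can be passed straight to curl -d or used as the body of any HTTP client.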

Polling for the Result: The Call Lifecycle

Now poll GET /call/:id every few seconds until the call finishes. The response contract uses two fields you should learn separately. The lifecycle field walks through queued, dialing, answered, and finalized, in that order. Finalized means the call ended for any reason and the result is stable. The outcome field is a separate enum that tells you what actually happened: completed, voicemail_detected, no_answer, busy, declined_by_callee, failed, and a handful of others. Treat lifecycle as the gate for whether to keep polling, and outcome as the answer to what happened.

    curl https://api.clawcall.dev/call/$CALL_ID -H "X-Api-Key: $KEY"

Once lifecycle is finalized, the response includes talk_seconds (the single duration field you should bill or display against), a transcript array of role-tagged turns, a recording_url that resolves to an MP3 or WAV, and structured event timestamps. Poll every two to three seconds with simple backoff after the first thirty seconds; calls that hit hold music can run long.

The split between lifecycle and outcome is worth dwelling on, because designs that collapse the two into one field (a single status string with values like queued/ringing/voicemail/completed/failed) make filtering and billing harder. With a clean split, a query like "all finalized calls last week with outcome != completed" is one filter; with a merged enum you have to enumerate every non-success status value by hand. The hangup_cause and sip_hangup_cause fields are populated on every finalized call and explain why the call ended at the carrier level, which is useful when an outcome of failed needs root-causing.
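To make the lifecycle/outcome split concrete, here is a small sketch of both uses: lifecycle as the polling gate, and outcome as the filter. The field names and enum values come from the text above. The helper names are hypothetical, and the sample dictionaries stand in for parsed GET /call/:id responses.

```python
from typing import Iterable, List


def should_keep_polling(call: dict) -> bool:
    """lifecycle is the only gate: anything short of finalized means poll again."""
    return call.get("lifecycle") != "finalized"


def unsuccessful_calls(calls: Iterable[dict]) -> List[dict]:
    """Finalized calls whose outcome was anything other than completed."""
    return [
        c for c in calls
        if c.get("lifecycle") == "finalized" and c.get("outcome") != "completed"
    ]
```

Note that should_keep_polling never inspects outcome, and unsuccessful_calls does not need to know which non-success outcomes exist. That is the practical benefit of keeping the two fields separate.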

A Working Python Loop

Here is the same flow as a complete Python script. It places the call, captures the proto-key from the first response, polls until finalized, and prints the transcript. The only external dependency is requests.

    import os
    import time

    import requests

    BASE = 'https://api.clawcall.dev'

    # Place the call anonymously; the first response carries a proto-key.
    resp = requests.post(f'{BASE}/call', json={
        'to': os.environ['MY_NUMBER'],  # your own number in E.164 format
        'task': 'Ask what the weather is like and thank them.',
    }, timeout=10).json()

    key = resp.get('api_key')
    call_id = resp['call_id']
    headers = {'X-Api-Key': key} if key else {}

    # Poll until the call reaches its terminal lifecycle state.
    while True:
        call = requests.get(f'{BASE}/call/{call_id}', headers=headers, timeout=10).json()
        if call['lifecycle'] == 'finalized':
            break
        time.sleep(3)

    print(f"Outcome: {call['outcome']}, duration: {call['talk_seconds']}s")
    for turn in call['transcript']:
        print(f"{turn['role']}: {turn['text']}")

This is the entire integration. No SDK to install, no webhook server to stand up, no SIP credentials, no audio codec to negotiate. Save the proto-key returned in resp and reuse it on the next call to draw from the same usage bucket. When you eventually sign up, the same key links to your account and the script keeps working without changes.

For production use, three small refinements are worth adding. Wrap the polling loop with a maximum total wait (say 20 minutes) so a stuck call cannot block your worker forever. Add increasing backoff after the first 30 seconds (3s, 5s, 8s, then capped at 10s) to reduce request volume on long hold-time calls. And handle 429 responses by sleeping for the duration in the Retry-After header. None of these change the shape of the integration; they just make it polite.
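The three refinements can be folded into one polling helper. This is a sketch under stated assumptions rather than a reference implementation. The fetch callable is a hypothetical adapter you supply, returning a tuple of (HTTP status, Retry-After in seconds or None, parsed JSON body). Retry-After is assumed to be a number of seconds rather than an HTTP date. The sleep and clock parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

FetchResult = Tuple[int, Optional[float], dict]


def poll_delays() -> Iterator[float]:
    """3s for roughly the first 30 seconds, then 5s, 8s, and 10s from there on."""
    for _ in range(10):
        yield 3.0
    yield 5.0
    yield 8.0
    while True:
        yield 10.0


def poll_until_finalized(
    fetch: Callable[[], FetchResult],
    max_wait: float = 20 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until lifecycle is finalized, honoring 429s and a total deadline."""
    deadline = clock() + max_wait
    delays = poll_delays()
    while True:
        status, retry_after, body = fetch()
        if status == 200 and body.get("lifecycle") == "finalized":
            return body
        if status == 429:
            # Prefer the server's hint; fall back to the normal schedule.
            delay = retry_after if retry_after is not None else next(delays)
        elif status == 200:
            delay = next(delays)
        else:
            raise RuntimeError(f"unexpected HTTP status {status}")
        if clock() + delay > deadline:
            raise TimeoutError("call did not finalize within max_wait")
        sleep(delay)
```

With requests, fetch can be a small closure around requests.get that returns the response's status_code, the Retry-After header converted to a float when present, and the parsed JSON body. The rest of the script stays as it was.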

Handling Bridges, Recordings, and IVR Edge Cases

Two API features matter past the hello-world stage. First, the bridge_number field lets the AI hand the live call to you mid-conversation. Pass your own phone number when you POST /call and the agent gets a loop_in_user tool. When verification, payment confirmation, or a human handoff is needed, the agent says "connecting you now," the carrier bridges the two legs at the network level, and you finish the call yourself. This is how you survive cases where the callee insists on talking to a real human or asks for a credit card number the agent will not say out loud. The bridge consumes a second number from the outbound pool, so your concurrency drops by one for the duration.

Second, recordings are retrievable at GET /call/:id/recording. If your account has S3-compatible object storage configured, recordings are archived there automatically and the URL is presigned. Otherwise the URL fetches from the carrier directly for as long as the recording is retained.

IVR navigation is handled internally by the agent; you do not need to write DTMF logic. The send_dtmf tool is available but you rarely call it explicitly because the agent fires it when a menu prompt is detected. Transcripts include synthesized turns for IVR menus so you can audit exactly what the agent navigated through. Voicemail handling is task-driven: the agent recognizes a voicemail greeting, leaves a concise message when your task asks for one, or reports that it reached voicemail when you did not authorize a message. You do not write voicemail heuristics yourself; the transcript and final outcome show what happened on the line.
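In client code, the bridge is one extra field on the request body and the recording is one extra URL. A minimal sketch, assuming bridge_number sits at the top level of the POST /call body alongside to and task; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://api.clawcall.dev"


def call_payload(to: str, task: str, bridge_number: Optional[str] = None) -> dict:
    """Body for POST /call; bridge_number is included only when handoff is wanted."""
    payload = {"to": to, "task": task}
    if bridge_number:
        payload["bridge_number"] = bridge_number
    return payload


def recording_endpoint(call_id: str) -> str:
    """URL for GET /call/:id/recording, with the id percent-encoded for safety."""
    return f"{BASE}/call/{quote(call_id, safe='')}/recording"
```

Leaving bridge_number out keeps the call single-leg, so it only occupies one number from the outbound pool.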

Skipping the Tutorial: The Agent Skill for Claude Code and Cursor

If you are wiring this into an AI coding agent, you can skip the manual integration entirely. ClawCall ships a drop-in agent skill for Claude Code, Cursor, ClawHub, and OpenClaw. Install once and your agent gets phone-calling as a first-class capability, with the tri-auth flow, polling loop, and transcript handling already wrapped. Your agent says "call the clinic" and the skill handles POST /call, polls until finalized, and feeds the transcript back into context.

This matters because most readers of an API tutorial are not building a phone product. They are giving phone capability to an existing agent. The skill is the right entry point for that path, and the REST API is the right entry point if you are building a custom backend, a CRM integration, an automation node, or anything where you want explicit control over the HTTP layer. Both surfaces hit the same underlying contract; the skill is a convenience wrapper, not a separate product.

The shape of the install is small: the skill defines tool schemas the agent can call (call_phone, get_call_status, get_recording), maps each to the REST endpoints documented above, and handles auth bookkeeping (storing the proto-key, attaching it on subsequent calls, surfacing sign-up prompts when the free tier is exhausted). Skill install instructions and the full tool schema are at /for-agents. The skill is MIT-licensed and the docs are CC BY 4.0. If you have already followed this tutorial and run a successful call, you are five minutes from giving your agent the same capability.
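To show roughly what "maps each tool to a REST endpoint" means, here is an illustrative routing table. It is not the skill's actual source or schema, which live at /for-agents. The three tool names and the endpoints come from this tutorial; the argument names (call_id, and the call body passed through unchanged) are guesses made for the example.

```python
from typing import Optional, Tuple

BASE = "https://api.clawcall.dev"

# Tool name -> (HTTP method, path template). Illustrative only.
TOOL_ROUTES = {
    "call_phone": ("POST", "/call"),
    "get_call_status": ("GET", "/call/{call_id}"),
    "get_recording": ("GET", "/call/{call_id}/recording"),
}


def route_tool_call(tool: str, args: dict) -> Tuple[str, str, Optional[dict]]:
    """Translate an agent tool call into (method, url, json_body)."""
    if tool not in TOOL_ROUTES:
        raise ValueError(f"unknown tool: {tool}")
    method, template = TOOL_ROUTES[tool]
    if method == "POST":
        # The tool arguments become the request body as-is.
        return method, BASE + template, args
    return method, BASE + template.format(call_id=args["call_id"]), None
```

A real wrapper would add the auth bookkeeping and the polling loop on top of this table.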

How This API Compares to the Alternatives

The honest landscape: there are infrastructure-grade voice platforms aimed at developers building their own voice products, consumer-grade AI call apps with thinner APIs, and a few products in between. Vapi and Retell sit firmly in the first camp, with rich REST surfaces, configurable assistants, and per-minute pricing. Pick them when you are shipping a voice product where you want to tune the model, voice, transport, and prompting end to end. Synthflow and Vocode are similar developer-infrastructure plays; Vocode in particular is open-source and a good fit if you want self-host control. Air.ai and Regal target outbound sales call centers; their APIs are real but the product is aimed at sales orgs rather than individual agent integrations. Bland exposes a clean POST /call surface and is a reasonable peer if you want per-minute infrastructure with low setup overhead.

On the inbound side, Goodcall, Rosie, Numa, and Replicant build AI receptionists that answer your business line. They are not what you want if your goal is outbound calls from an agent or script; they are excellent if you want a 24/7 answering service for a business. Apple's Hold For Me and the HoldForMe.ai consumer app are an on-screen phone feature and a consumer app, respectively; neither exposes the kind of API this tutorial is about. On the consumer-overlap side, ClawTalk, ClawdTalk, PollyReach, AgentPhone, CallBuddy, Chirp, CallFluent, and Jarvis.cx ship task-based AI calling but most do not expose a public REST API at all, which makes them poor fits for a developer integration.

ClawCall's positioning for the typical API reader is straightforward: a finished product with a clean REST contract, flat $4.99-$14.99 monthly pricing instead of per-minute billing, a no-card free tier of 30 calls and 30 minutes, whichever lasts longer, mandatory AI disclosure, instruction-controlled voicemail handling, and a drop-in agent skill. If your goal is to give an agent phone capability today, it is the shortest path. If you are building a voice product from scratch, Vapi, Retell, or Vocode are the better starting points.
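When weighing flat monthly pricing against per-minute billing, the useful number is the break-even point: the monthly talk time at which the two cost the same. The sketch below shows the arithmetic only. The per-minute rate is a placeholder chosen for illustration, not a quoted price from any vendor, so substitute the rate you are actually comparing against.

```python
def break_even_minutes(flat_monthly_usd: float, per_minute_usd: float) -> float:
    """Minutes per month at which per-minute billing equals the flat price."""
    if per_minute_usd <= 0:
        raise ValueError("per_minute_usd must be positive")
    return flat_monthly_usd / per_minute_usd


# Placeholder rate for illustration only; not a real vendor's price.
HYPOTHETICAL_PER_MINUTE_USD = 0.10

# The two ends of the flat price range mentioned above.
for flat_price in (4.99, 14.99):
    minutes = break_even_minutes(flat_price, HYPOTHETICAL_PER_MINUTE_USD)
    print(f"${flat_price:.2f}/month breaks even at about {minutes:.0f} minutes")
```

Below the break-even point per-minute billing is cheaper; above it the flat plan is.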

Frequently asked

How long does it take to make my first AI phone call with the ClawCall API?
About ten minutes from a cold start. The first POST /call request is anonymous and requires no signup, no credit card, and no provisioned phone number. You send one HTTP request with a destination number in E.164 format and a plain-English task description, save the api_key returned in the response, and poll GET /call/:id every few seconds until lifecycle equals finalized. The first 30 calls + 30 minutes of calling are free. If you are integrating into Claude Code or Cursor, the agent skill at /for-agents wraps the entire flow and gets you to a working call in roughly half that time. No SDK install is required for the raw API path.
What does the response from POST /call look like?
POST /call returns immediately with a JSON body containing call_id, lifecycle set to queued, and (on anonymous first calls) an api_key field starting with clawcall_sk_. The full call result is not in this response because real phone calls take minutes to complete; you poll GET /call/:id to get the final state. The polled response separates two concerns: lifecycle (queued, dialing, answered, finalized) tells you whether to keep polling, and outcome (completed, voicemail_detected, no_answer, busy, declined_by_callee, failed, and others) tells you what happened. The final response also includes talk_seconds, the transcript array, a recording_url, and hangup_cause fields.
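If you prefer typed access over raw dictionaries, the fields listed above map onto a small dataclass. This is a sketch, not an official schema. It assumes the polled response echoes call_id, that transcript turns carry role and text keys (as in the Python script earlier), and that fields not yet populated are simply absent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    role: str
    text: str


@dataclass
class CallResult:
    call_id: str
    lifecycle: str
    outcome: Optional[str] = None
    talk_seconds: Optional[float] = None
    recording_url: Optional[str] = None
    hangup_cause: Optional[str] = None
    sip_hangup_cause: Optional[str] = None
    transcript: List[Turn] = field(default_factory=list)

    @classmethod
    def from_json(cls, body: dict) -> "CallResult":
        """Build a result from a parsed response, tolerating missing fields."""
        return cls(
            call_id=body["call_id"],
            lifecycle=body["lifecycle"],
            outcome=body.get("outcome"),
            talk_seconds=body.get("talk_seconds"),
            recording_url=body.get("recording_url"),
            hangup_cause=body.get("hangup_cause"),
            sip_hangup_cause=body.get("sip_hangup_cause"),
            transcript=[Turn(t["role"], t["text"]) for t in body.get("transcript") or []],
        )

    @property
    def is_final(self) -> bool:
        return self.lifecycle == "finalized"
```

The same class parses both the immediate POST /call response, where most fields are empty, and the finalized polled response.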
Do I need to handle voicemail, IVR menus, or hold time myself?
No. The voice agent handles all three internally. When the call reaches voicemail, it follows your task: leave a concise message if you asked for one, or report that voicemail was reached if you did not. IVR menus are navigated automatically using a built-in send_dtmf tool; the menu prompts and the digits the agent pressed appear in your transcript so you can audit them. Hold time is handled by the agent simply waiting; talk_seconds reflects the full duration including hold. You do not write DTMF logic, voicemail heuristics, or hold-detection code. Your client code is one POST and a polling loop, regardless of how complex the actual call ends up being on the line.
Can the AI hand the call to me when a human needs to take over?
Yes, via the bridge_number parameter on POST /call. Pass your own US phone number and the agent gets a loop_in_user tool. When the agent decides handoff is appropriate (verification step, payment authorization, an unusual request, or the human on the other end insisting), it tells the callee "connecting you now," the carrier bridges the two phone calls at the network level, and you finish the conversation yourself. Bridge calls consume two numbers from the outbound pool, so they count as two against your concurrency for the duration. The bridge contract is documented at /docs and the consumer-facing pattern is shown at /hold-for-me.
What are the rate limits and concurrency limits on the API?
Free-tier anonymous calls are rate-limited by identity and capped at 30 calls and 30 minutes, whichever lasts longer, before sign-up is required. On a paid plan, the default outbound concurrency is roughly three simultaneous calls per account, set by the size of the shared outbound phone-number pool. Bridge calls consume two numbers each, so they count as two against your concurrency. Unlimited Reserve and Unlimited Reserve Plus subscribers get a private reserved number that does not draw from the shared pool. There is no per-minute billing on any plan; pricing is flat monthly at $4.99 for Unlimited, $8.99 for Unlimited Reserve, and $14.99 for Unlimited Reserve Plus. US numbers only today.
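If your application places calls from several threads, you can mirror this accounting on the client so you do not exceed your concurrency. The sketch below is one way to do it and is not part of the API. The default capacity of three reflects the "roughly three" figure above and should be treated as an assumption. CallSlots is a hypothetical helper in which a bridged call reserves two slots.

```python
import threading
from contextlib import contextmanager


class CallSlots:
    """Client-side limiter: plain calls take one slot, bridged calls take two."""

    def __init__(self, capacity: int = 3):
        self._capacity = capacity
        self._in_use = 0
        self._cond = threading.Condition()

    @property
    def in_use(self) -> int:
        return self._in_use

    @contextmanager
    def reserve(self, bridged: bool = False):
        need = 2 if bridged else 1
        if need > self._capacity:
            raise ValueError("call needs more slots than the account has")
        with self._cond:
            # Block until enough slots are free, then claim them.
            self._cond.wait_for(lambda: self._in_use + need <= self._capacity)
            self._in_use += need
        try:
            yield
        finally:
            with self._cond:
                self._in_use -= need
                self._cond.notify_all()
```

Wrap each POST-and-poll sequence in slots.reserve(bridged=...), so that slots are released only once the call has finalized.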
How does this compare to Vapi, Retell, or building on a raw carrier plus an LLM?
Vapi and Retell are infrastructure-grade voice platforms designed for teams building their own voice products end to end, with per-minute pricing and rich assistant-configuration APIs. Rolling your own on a raw carrier plus a realtime LLM is the fully assemble-it-yourself path; you control everything and you also build the SIP plumbing, codec bridge, and turn-taking logic. ClawCall sits one level higher: a finished consumer-and-agent product exposed through a clean REST API, with flat monthly pricing, a no-card free tier of 30 calls and 30 minutes, whichever lasts longer, mandatory AI disclosure, voicemail support, and a drop-in agent skill for Claude Code and Cursor. Pick the platform that matches what you are actually building. See /vs/vapi and /vs/retell for line-by-line breakdowns.
