How to Build a Claude Code Skill That Makes Phone Calls
A Claude Code skill that makes phone calls is a SKILL.md file at .claude/skills/phone-calls/SKILL.md that tells Claude when to dial, which REST API to hit, and how to poll for the transcript. The skill itself is plain markdown with YAML frontmatter, and the dialing happens behind an HTTP endpoint the skill calls, so the work splits in two: the instructions Claude reads, and a phone-calling backend that returns a finalized transcript. This tutorial walks the full loop using ClawCall as the worked example because it ships a drop-in skill, a public REST API at api.clawcall.dev, and a free trial of 30 calls and 30 minutes, whichever runs out later, with no card required. By the end you will have a skill that dials any US number for you.
What a Claude Code skill actually is, and why phone calling fits the format
A Claude Code skill is a markdown file Claude reads at runtime to decide when and how to do something. The Anthropic docs describe skills as living at ~/.claude/skills/<skill-name>/SKILL.md for personal use, or .claude/skills/<skill-name>/SKILL.md inside a project. There is no SDK and no build step. You write instructions, frontmatter, and a description, and Claude loads the skill automatically when the description matches what the user asked for. That model fits phone calling almost perfectly, because the work the agent has to do is mostly judgment, not code. Should we actually place this call, or did the user just ask a question about phone numbers? Is this an emergency line we should refuse? Do we need a transcript back, or just confirmation the call happened? Those are the questions a SKILL.md handles well, and exactly the questions a generic HTTP tool wrapper handles badly. The dialing itself, by contrast, is a fixed contract: a POST that returns immediately with a call ID, and a GET that returns the transcript once the call finalizes. You do not want Claude inventing telephony — you want it picking up a phone that already works. That split is the whole pattern. Markdown for judgment, REST for the boring part, and a stable provider contract so the skill keeps working as models change.
Pick the phone-calling backend before you write a single line of SKILL.md
Your skill is only as good as the API behind it, so settle the backend first. The realistic options today fall into three buckets. The first is developer voice platforms (Bland, Vapi, Retell, Synthflow, Vocode, Air.ai, Regal), which give you raw infrastructure for building voice products. They are powerful, but they sell construction-kit primitives priced per minute, and you end up writing the orchestration, the IVR navigation, the hold-time handling, and the cleanup yourself. That is a months-long project rather than an afternoon. The second is consumer AI call apps such as Jarvis.cx, CallFluent, HoldForMe.ai, PollyReach, AgentPhone, CallBuddy, ClawTalk, ClawdTalk, and Chirp AI. These dial on behalf of a human user but typically do not publish a documented REST API a coding agent can call, which makes them poor backends for a SKILL.md. The third is finished products that ship both a public outbound API and a prewritten agent skill. That is the category ClawCall sits in, and it is the only category that gets you from zero to a working skill the same day. For this tutorial we use ClawCall because the skill is already published, the REST contract is stable, and the free trial of 30 calls and 30 minutes, whichever runs out later, covers all of your development testing. The same SKILL.md pattern will work against any backend with a similar fire-and-poll API, but the time you save on the skill will go into writing glue code instead.
Install the reference skill to see the target shape
Before you write your own SKILL.md, install an existing reference skill so you have a working example to study. From your project directory you drop the skill folder into .claude/skills/clawcall/, which puts SKILL.md alongside any helper files the skill needs. Open it in your editor and read the frontmatter first — name, description, and any allowed-tools list. The description is the most important field in the entire file. Claude does not parse your skill on every turn; it scans descriptions, decides whether yours is relevant, and only then loads the body. A vague description like "makes phone calls" gets ignored when the user says "call my dentist and reschedule." A precise description like "Place outbound US phone calls on the user's behalf, navigate phone trees, wait on hold, and return a transcript" fires when it should. Read the body next. You will see it walks Claude through identity (the X-Api-Key header), the two endpoints it cares about (POST /call to start, GET /call/:id to poll), the lifecycle states (queued → dialing → answered → finalized), and the hard rules — disclose AI on request, leave voicemail when instructed, never make unsolicited sales calls. Those rules belong inside the skill because they are not preferences, they are constraints the agent must honor regardless of how the user phrased the request. The reference source lives at /for-agents and the API contract at /docs.
Write your own SKILL.md from scratch
Now that you have seen the target, write your own. Create .claude/skills/phone-calls/SKILL.md with YAML frontmatter giving the skill a name, a one-sentence description optimized for retrieval, and an allowed-tools list scoped to the bash or fetch tools the skill needs. The body should start with a one-paragraph charter — what the skill does, who the user is, what counts as in-scope. Then a short "when to invoke" section that lists concrete triggers in user-language ("call this number," "schedule an appointment by phone," "sit on hold for me") and concrete non-triggers ("look up this phone number," "find the support line for X" — those are searches, not calls). Then the API contract. Document POST /call with the JSON body the API expects (to_number, task, voice, optionally bridge_number for patch-through), the response shape (call_id, and on a first anonymous call an auto-issued proto-key the skill should save), and the polling loop against GET /call/:id. Be explicit that lifecycle moves through queued, dialing, answered, finalized, and that talk_seconds is the single duration field. Close with the hard rules and a short example transcript so Claude knows what a finalized response looks like in practice. The whole file should sit between 150 and 400 lines. Longer skills get partially ignored; shorter skills miss edge cases. Treat it like a runbook for a junior teammate, not API documentation.
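As a rough illustration of that layout, the Python sketch below generates a skeleton SKILL.md with frontmatter and empty section stubs. The section headings, the `TODO` placeholders, and the helper names `render_skill` and `write_skill` are assumptions made for this example, not the contents of ClawCall's published skill, and the exact frontmatter syntax should be checked against the current Claude Code docs.

```python
from pathlib import Path

# Section order follows the outline in the text; the heading names are illustrative.
SECTIONS = [
    "Charter",
    "When to invoke",
    "When not to invoke",
    "API contract",
    "Hard rules",
    "Example transcript",
]


def render_skill(name: str, description: str, allowed_tools: str) -> str:
    """Build SKILL.md text: YAML frontmatter followed by empty section stubs."""
    # Note: a description containing ": " would need YAML quoting; not handled here.
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        f"allowed-tools: {allowed_tools}",
        "---",
    ])
    body = "\n\n".join(f"## {title}\n\nTODO" for title in SECTIONS)
    return f"{frontmatter}\n\n{body}\n"


def write_skill(project_root: Path, name: str, text: str) -> Path:
    """Write the text to <project_root>/.claude/skills/<name>/SKILL.md."""
    path = project_root / ".claude" / "skills" / name / "SKILL.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `write_skill(Path.cwd(), "phone-calls", render_skill("phone-calls", "<your description>", "Bash"))` from the project root would create the file; the stubs are then filled in by hand, since the judgment the skill encodes cannot be generated.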
Wire up the first call: POST, poll, transcript
The shape of every phone-calling skill is the same three-step loop, and it is worth pinning that loop down in SKILL.md so Claude does not improvise. Step one is POST /call with the destination number in E.164 format (+1 NANP, the only region the API supports today), a natural-language task description, and a voice choice (the default voices are jessica, sarah, chris, and eric). The response comes back immediately with a call_id and, on the very first call from a new caller without an X-Api-Key header, an auto-issued proto-key the skill should save so future requests authenticate cleanly. That same proto-key carries over at sign-up by being linked to the new account, so the agent can start anonymous and graduate to a real account without losing its identity. Step two is the poll loop against GET /call/:id. The skill should sleep two to five seconds between polls, surface progress to the user on each lifecycle change, and stop polling once lifecycle reaches finalized. Step three is reading the result. The finalized response includes the outcome enum, the full transcript as a JSON array of utterances, and a recording URL the user can play back. A bridge-style call adds a fourth step: when the agent decides to patch the human in, it calls the loop_in_user tool, which acquires a second outbound number, dials the user's callback number, and joins the two legs at the network level. Document this loop once and Claude will not need a second tutorial.
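The first two steps of that loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not ClawCall's reference client: the `https://` scheme, the helper names, the three-second interval, and the poll cap are choices made for the example, and because the text does not name the response field that carries the auto-issued proto-key, the sketch returns the whole response and leaves saving the key to the caller.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.clawcall.dev"  # host from the text; https scheme is assumed


def http_json(method, path, api_key=None, body=None):
    """Send one JSON request and decode the JSON response."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-Api-Key"] = api_key
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data,
                                 headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def place_call(to_number, task, voice="jessica", api_key=None, request=http_json):
    """Step one: POST /call, which returns immediately with a call_id."""
    body = {"to_number": to_number, "task": task, "voice": voice}
    return request("POST", "/call", api_key, body)


def wait_for_final(call_id, api_key=None, request=http_json, interval=3.0,
                   max_polls=600, on_change=print, sleep=time.sleep):
    """Step two: poll GET /call/:id until lifecycle is 'finalized'."""
    last = None
    for _ in range(max_polls):
        call = request("GET", f"/call/{call_id}", api_key)
        state = call.get("lifecycle")
        if state != last:
            on_change(state)  # surface each lifecycle change to the user
            last = state
        if state == "finalized":
            return call
        sleep(interval)
    raise TimeoutError(f"call {call_id} not finalized after {max_polls} polls")
```

The `request` and `sleep` parameters are injectable so the loop can be exercised against a stub without spending trial calls; step three is then just reading the fields of the dictionary that `wait_for_final` returns.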
Test the skill end-to-end with a realistic task
The best test for a phone-calling skill is a task you would actually delegate to a human assistant. Open Claude Code in a fresh terminal and type "Call my dentist at +1-555-0123 and ask whether they have any cleaning slots before Friday at 5pm. If they do, book the earliest one under my name." If your skill description is right, Claude loads it, parses the request into a structured task, posts the call, polls until finalized, and returns the transcript and the booked time. Watch what Claude does between turns. Did it ask for missing information (your name, your insurance) before dialing? It should — the SKILL.md should tell it to gather everything the receptionist will ask before committing to the call. Did it disclose it is an AI when the receptionist asked? It must — that is a hard rule, not a preference. Did it handle voicemail exactly as instructed — leaving a concise message when the task asked for one and surfacing the result when it did not? It should. Run through a half-dozen variants: restaurant reservations, hold-time elimination for a utility company, an airline rebooking, a doctor's office. Each shakes loose a different edge case. Add the lessons back into the skill body as short examples, because the most reliable way to teach Claude a new constraint is to show it inline in the skill rather than describe it abstractly at the top. Realistic examples live at /hold-for-me and /use-cases/restaurant-reservation.
How your skill compares to the other options for agent phone access
Know where your skill fits in the broader landscape so you do not waste it on the wrong job. Built-in OS features like Apple's Hold Assist and Google's Hold for Me are time savers: they hold the line and alert you when a human picks up, but they cannot negotiate, book, or dispute anything, and there is no API to drive them from a coding agent. Consumer AI call apps (Jarvis.cx, HoldForMe.ai, CallFluent, PollyReach, AgentPhone, CallBuddy, ClawTalk, ClawdTalk, Chirp AI) dial on behalf of a human and handle conversation well, but most do not ship a documented public API, which rules them out as the backend for a SKILL.md. Developer voice platforms (Bland, Vapi, Retell, Synthflow, Vocode, Air.ai, Regal) give you full control of the voice stack but bill per minute and expect you to build a product around them. That is appropriate if you are shipping a voice product yourself, and overkill if you just want your agent to make a call. Inbound AI receptionists (Goodcall, Rosie, Numa, Replicant) answer your business's incoming calls, which is the opposite direction from what a phone-calling skill needs. The reason ClawCall keeps coming up as the worked example is narrow: it is the option in that landscape that publishes both a stable outbound REST API and a prewritten Claude Code skill, with flat pricing (Unlimited at $4.99/mo, Unlimited Reserve at $8.99/mo, Unlimited Reserve Plus at $14.99/mo), and the free trial of 30 calls and 30 minutes, whichever runs out later, covers your build. See /vs/bland and /pricing for the full comparison.
What to do once the skill is working
A working skill is the start, not the finish. The first thing to add is a short escalation rule: if the call finalizes with an outcome other than success, what should the agent do? Retry once with a different opening? Hand control back to the user with a summary? Surface the recording? Pick one default and put it in SKILL.md so the behavior is predictable across runs. The second thing is a small library of task templates for the calls you make most often. A dentist-rescheduling template that already knows to ask for the patient name and date of birth saves you from rewriting the prompt every time, and a utility-billing template that knows to ask for the account number cuts the cognitive load on the user. The third thing is a logging hook. The finalized response includes a recording URL and a transcript, so pipe both into whatever notes system you already use; a week from now you can see what your agent actually said on your behalf. Finally, share the skill. The Claude Code skills ecosystem is still young, and a phone-calling skill that discloses it is an AI, leaves voicemail only when instructed, and respects a do-not-call list would be a useful addition to whatever marketplace you publish to. The CC BY 4.0 license on the underlying docs means you can lift the API contract and hard rules verbatim into your own skill as long as you credit the source.
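One possible shape for the escalation default and the logging hook is sketched below in Python. The retry-once policy is just one of the options listed above, picked arbitrarily, and the response keys `outcome`, `recording_url`, and `transcript`, along with the `"success"` value, are assumed names for the fields the text describes; check the API contract for the real ones.

```python
import json
from pathlib import Path


def next_step(call: dict, attempt: int = 1) -> str:
    """Pick one predictable action for a finalized call."""
    if call.get("outcome") == "success":  # assumed enum value
        return "report"
    if attempt < 2:  # hypothetical policy: retry exactly once
        return "retry"
    return "hand_back"  # give control back to the user with a summary


def log_call(call: dict, log_path: Path) -> None:
    """Append the finalized response to a notes file as one JSON line."""
    record = {
        "call_id": call.get("call_id"),
        "outcome": call.get("outcome"),
        "talk_seconds": call.get("talk_seconds"),
        "recording_url": call.get("recording_url"),  # assumed key name
        "transcript": call.get("transcript"),        # assumed key name
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

An append-only JSON Lines file is used here only because it is easy to grep and import later; any notes system that accepts text would do.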
Frequently asked
- Do I need to know how telephony works to build a phone-calling Claude skill?
- No. The reason to skip the developer voice platforms (Bland, Vapi, Retell, Vocode) and write against a finished product is that the telephony stack — Telnyx, Deepgram, ElevenLabs, the codec conversion, the phone-number pool, the webhook handling — is fully managed. Your SKILL.md only needs to know two endpoints: POST /call to start a call, and GET /call/:id to poll for the transcript. The skill is judgment plus a fire-and-poll loop, nothing more. If you can write a markdown file and read a JSON response, you can ship a working phone-calling skill in an afternoon.
- How long should a SKILL.md file actually be?
- Between 150 and 400 lines is the sweet spot for a phone-calling skill. Shorter than that and you miss edge cases — what to do when nobody picks up, when the receptionist asks if you are a bot, when the user gives a non-US number, when the call finalizes with a voicemail outcome. Longer than that and Claude starts skipping sections. Write the file like a runbook for a junior teammate: charter, when to invoke, when not to invoke, API contract with example payloads, hard rules, and three or four worked transcripts. Add new constraints as short examples in the body rather than as long policy prose at the top.
- Can the skill make international calls?
- Not today, if you are using ClawCall as the backend. The API is US-only (+1 NANP) and English-only at launch, so your skill should validate the destination number before posting it and refuse non-US numbers with a clear error. If you need international, you are back in the developer voice platform world (Vapi, Bland, Vocode) where you wire telephony yourself. For most consumer and agent use cases — appointments, reservations, hold-time elimination, bill disputes, subscription cancellations — US-only is already enough, which is why the trade-off is worth it for the flat monthly pricing and the prewritten skill.
- What is the difference between a Claude Code skill and an MCP server for phone calls?
- A skill is a markdown file Claude reads to decide when and how to do something — no code, no process, no port. An MCP server is a long-running process that exposes tools over the Model Context Protocol, which Claude can call as functions. For phone calling, a skill is the right primitive because the work is mostly judgment (when to call, what to say, how to handle the result) and the actual dial is one HTTP request. You only need an MCP server if you are exposing many related tools or if you need persistent state. For a single phone-calling capability, SKILL.md plus a REST API is the simpler and more reliable shape.
- How do I keep the skill from making spam calls or leaving unwanted voicemails?
- Put the hard rules in the SKILL.md body and back them with a backend that enforces them too. The ClawCall API enforces two non-negotiable rules at the server level: it always discloses it is an AI when asked, and it leaves voicemail only when instructed. The skill should restate both in plain language so Claude understands the boundary, and should also include a do-not-call rule: no unsolicited sales, no political outreach, no calls to numbers the user has not explicitly named. Combining server-side enforcement with skill-level instruction is what keeps a phone-calling agent on the right side of consumer-protection law and basic decency.
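Two of the answers above describe checks a skill can run before posting a call: refusing numbers outside +1 NANP, and refusing numbers the user has not explicitly named. The Python sketch below shows one way to express both. The function names are hypothetical, the pattern checks only the shape of the number (NANP also covers countries other than the US, which this does not distinguish), and the digit match against the user's message is deliberately crude.

```python
import re

# +1, then a 10-digit NANP number: area code and exchange both start with 2-9.
NANP_E164 = re.compile(r"^\+1[2-9]\d{2}[2-9]\d{6}$")


def normalize(raw: str) -> str:
    """Strip spaces, dashes, dots, and parentheses; keep a leading '+'."""
    digits = re.sub(r"\D", "", raw)
    return ("+" if raw.strip().startswith("+") else "") + digits


def is_nanp(number: str) -> bool:
    """True if the number is +1 followed by a well-formed 10-digit NANP number."""
    return bool(NANP_E164.match(normalize(number)))


def named_by_user(number: str, user_message: str) -> bool:
    """True if the number's 10 national digits appear in the user's own message."""
    national = normalize(number)[-10:]
    return national in re.sub(r"\D", "", user_message)


def check_destination(number: str, user_message: str) -> str:
    """Return the normalized number, or raise with a clear reason for refusing."""
    if not is_nanp(number):
        raise ValueError(f"{number!r} is not a +1 NANP number in E.164 form")
    if not named_by_user(number, user_message):
        raise PermissionError(f"{number!r} was not explicitly named by the user")
    return normalize(number)
```

For example, `check_destination("+1 212-555-0123", "Call my dentist at (212) 555-0123")` returns `"+12125550123"`, while a UK number raises `ValueError` before any request is sent.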