Build a Voice AI Screener That Actually Understands Candidates
The best voice AI platforms for screening applicants are custom-built systems using models like Whisper and Claude. They outperform off-the-shelf tools by analyzing conversational nuance, not just keywords.
A typical build involves a dedicated phone number that applicants call, a transcription service, and a large language model that scores the conversation against a custom rubric. The complexity depends on how many roles you're hiring for and how deep the analysis needs to be.
We built a screening system for a 12-person recruiting firm that processes 400 applicants a month. Their recruiters were spending 15 minutes per applicant on initial phone screens. The system we deployed handles the first screen in 90 seconds and delivers a scored transcript, saving them 60 hours per month.
What Problem Does This Solve?
Most HR teams first look at tools like MyInterview or HireVue. These platforms work for structured video interviews, but their voice analysis is often keyword-based. They can flag a candidate who says "teamwork" but cannot distinguish between someone who gives a genuine example and someone who just drops the buzzword. They also lock you into rigid question paths and expensive per-seat contracts.
Some teams try to build a simpler version with traditional Interactive Voice Response (IVR) systems. This approach fails because IVRs are just phone trees. They can ask "Press 1 for yes, 2 for no" to confirm a certification, but they cannot handle open-ended questions like, "Tell me about a time you solved a problem on the fly." This misses any chance to assess critical soft skills.
A regional logistics company hiring 50 warehouse staff per quarter faced this exact issue. Their two-person HR team was overwhelmed. The off-the-shelf tools filtered out good candidates who didn't use specific keywords, and the IVR system couldn't tell them anything about a candidate's personality or problem-solving skills, leading to many wasted second-round interviews.
How Does It Work?
We start by provisioning a dedicated phone number using Twilio. When a candidate calls, Twilio records the conversation and saves the audio file to an AWS S3 bucket. An AWS Lambda function triggers on the file upload and transcribes the entire conversation in under 30 seconds using OpenAI's Whisper API. This provides a clean, time-stamped text transcript for analysis.
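As a minimal sketch, the transcription step looks something like the Lambda handler below. The bucket layout and file naming are illustrative assumptions, not the production values; only `parse_s3_event` is shown in full since the API calls depend on your AWS and OpenAI credentials.

```python
import urllib.parse

# Hypothetical sketch: a Lambda handler that fires when a call recording
# lands in S3, then sends the audio to OpenAI's Whisper API. Bucket names
# and key layout are illustrative assumptions.

def parse_s3_event(event):
    """Extract (bucket, key) pairs from an S3 put-event payload."""
    records = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        records.append((bucket, key))
    return records

def lambda_handler(event, context):
    import boto3
    from openai import OpenAI

    s3 = boto3.client("s3")
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    for bucket, key in parse_s3_event(event):
        # Download the recording to Lambda's scratch space, then transcribe.
        local_path = f"/tmp/{key.split('/')[-1]}"
        s3.download_file(bucket, key, local_path)
        with open(local_path, "rb") as audio:
            transcript = client.audio.transcriptions.create(
                model="whisper-1", file=audio
            )
        # Store the transcript next to the audio for the scoring step.
        s3.put_object(
            Bucket=bucket,
            Key=key.rsplit(".", 1)[0] + ".txt",
            Body=transcript.text.encode("utf-8"),
        )
```

The event-parsing helper is kept separate from the handler so it can be unit-tested without AWS credentials.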
The transcript is then passed to the Claude 3 Sonnet API. We use a few-shot prompt that includes the job description, a custom scoring rubric with 5 key traits (e.g., reliability, problem-solving), and examples of strong and weak answers. Claude scores each trait from 1 to 10 and provides a written justification that references specific parts of the conversation. The entire analysis completes in about 15 seconds.
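The scoring step can be sketched as follows. The trait names, model string, and prompt wording are illustrative assumptions; the prompt builder is pure so the rubric can be reviewed and tested without an API call.

```python
# Hypothetical sketch of the rubric-scoring step. TRAITS and the prompt
# text are placeholders for the rubric designed during discovery.

TRAITS = ["reliability", "problem_solving", "communication",
          "adaptability", "role_fit"]

def build_rubric_prompt(job_description, transcript, traits=TRAITS):
    """Assemble the scoring prompt: job context, rubric, and transcript."""
    rubric = "\n".join(f"- {t}: score 1-10 with a written justification"
                       for t in traits)
    return (
        "You are screening a job applicant. Score the transcript below "
        "against each rubric trait, citing specific quotes.\n\n"
        f"Job description:\n{job_description}\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Transcript:\n{transcript}\n\n"
        'Reply with a JSON object: {trait: {"score": int, '
        '"justification": str}}.'
    )

def score_transcript(job_description, transcript):
    import json
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    message = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": build_rubric_prompt(job_description, transcript)}],
    )
    return json.loads(message.content[0].text)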
The final output is a structured JSON object containing the scores, justifications, and the full transcript. We push this data directly into the client’s Applicant Tracking System (ATS), like Greenhouse or Lever, using their native API. The recruiter sees a new note on the candidate's profile with a summary score, like 78/100, within 2 minutes of the call ending.
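A sketch of the ATS hand-off is below. `format_summary_note` is pure; `push_to_ats` shows the shape of the HTTP call using Greenhouse's Harvest API note endpoint, though the exact path, auth scheme, and field names should be verified against your ATS documentation before use.

```python
# Sketch of the ATS hand-off. The Greenhouse endpoint and headers follow
# the Harvest API's conventions but are assumptions here - confirm them
# against your ATS documentation.

def format_summary_note(scores):
    """Turn per-trait scores into a 0-100 summary plus a readable note."""
    total = sum(t["score"] for t in scores.values())
    summary = round(100 * total / (10 * len(scores)))
    lines = [f"AI screen summary: {summary}/100"]
    for trait, detail in scores.items():
        lines.append(f"- {trait}: {detail['score']}/10 - {detail['justification']}")
    return summary, "\n".join(lines)

def push_to_ats(candidate_id, scores, api_token, user_id):
    import requests

    _, body = format_summary_note(scores)
    resp = requests.post(
        f"https://harvest.greenhouse.io/v1/candidates/{candidate_id}"
        "/activity_feed/notes",
        auth=(api_token, ""),            # token as basic-auth username
        headers={"On-Behalf-Of": str(user_id)},
        json={"body": body, "visibility": "public"},
        timeout=10,
    )
    resp.raise_for_status()
```

Separating the formatting from the network call keeps the summary math testable and makes it easy to swap Greenhouse for Lever or another ATS.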
The entire system is serverless, built with Python on AWS Lambda and connected via Amazon SQS queues. This means there are no servers to manage or patch. The monthly AWS cost for processing 500 candidates is typically under $50. We complete the entire build, from discovery to deployment, in a 3-week cycle.
What Are the Key Benefits?
Screen 400 Applicants in an Afternoon
The system processes calls in parallel. A batch of hundreds of candidates can be screened in hours, not weeks. The first summary arrives in your ATS 2 minutes after a call ends.
Pay for Usage, Not for Seats
A single fixed-price build, then you only pay for what you use. Monthly hosting on AWS is often under $50, a fraction of a single license for many recruiting platforms.
You Own the Screening Rubric and Code
We deliver the full Python source code to your GitHub repo. You have complete control to modify the scoring logic as your hiring needs change. No vendor lock-in.
Get Alerts for High-Potential Candidates
We configure Slack or email alerts to trigger instantly for any candidate scoring above a set threshold. Your team can follow up in minutes, not days.
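The alert logic is a small piece of code. The sketch below assumes a Slack incoming webhook; the threshold value and message wording are configuration choices, not fixed parts of the system.

```python
# Sketch of the high-score alert, assuming a Slack incoming webhook.
# ALERT_THRESHOLD is a per-client configuration value.

ALERT_THRESHOLD = 80

def build_alert(candidate_name, summary_score, threshold=ALERT_THRESHOLD):
    """Return the Slack payload if the score clears the threshold, else None."""
    if summary_score < threshold:
        return None
    return {"text": (f":star: High-potential candidate: {candidate_name} "
                     f"scored {summary_score}/100 on the AI screen.")}

def send_alert(webhook_url, candidate_name, summary_score):
    import requests

    payload = build_alert(candidate_name, summary_score)
    if payload is not None:
        requests.post(webhook_url, json=payload, timeout=5).raise_for_status()
```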
Integrates Directly Into Your ATS
Results are pushed into Greenhouse, Lever, or your current CRM. Recruiters see scores and transcripts inside the tool they already use every day.
What Does the Process Look Like?
Discovery and Rubric Design (Week 1)
You provide the job description and we work together to define 5-7 key traits for the screening rubric. You receive a draft rubric for approval.
Core System Build (Week 2)
We build the Twilio-Lambda-Claude pipeline and test it with sample audio. You receive access to a staging environment to review test outputs.
ATS Integration and Launch (Week 3)
We connect the system to your ATS API and run end-to-end tests with live phone calls. You receive the full source code and system documentation.
Monitoring and Handoff (Weeks 4-8)
We monitor the first 100 live candidates, fine-tuning the prompts based on your feedback. After 8 weeks, you receive a runbook for ongoing maintenance.
Frequently Asked Questions
- What does a custom voice AI screener cost to build?
- The cost depends on the number of roles to screen for and the complexity of the ATS integration. Most builds are fixed-price projects completed in 3-4 weeks. This one-time cost replaces recurring per-seat SaaS fees that can run into thousands of dollars per year. We provide a detailed quote after a discovery call.
- What happens if a candidate has a strong accent or the call quality is bad?
- The Whisper transcription model is highly robust to accents and background noise. If transcription accuracy falls below a confidence threshold, the system flags the interview for manual review instead of generating a score. This prevents bad data from producing an unfair assessment. The recruiter gets a notification to listen to the original audio.
- How is this better than just using a platform like HireVue?
- HireVue is a capable tool for structured video interviews, but its AI scoring can be keyword-focused. Our custom approach uses a large language model to assess conversational nuance, problem-solving ability, and alignment with your specific company values. We build the rubric around your needs, not a generic template shared by all their customers.
- How do you ensure the AI scoring is fair and unbiased?
- We design the scoring prompts to explicitly ignore demographic information and focus only on the candidate's responses related to the job criteria. We also conduct bias testing by running transcripts through the system with names removed to check for score consistency. You have full visibility into the scoring logic in the source code.
- Can candidates tell they're talking to an AI?
- Yes, and we recommend being transparent. The system can start with a clear disclosure like, "This is an automated initial screening call." This manages expectations and is a better experience than pretending a human is on the line. The voice used is a high-quality neural voice from a service like ElevenLabs.
- What kind of maintenance is required after the build?
- The serverless architecture requires minimal technical maintenance. The main task is occasionally updating the scoring rubric prompts when a role's requirements change. We provide a runbook explaining how to do this. We also offer an optional flat monthly maintenance plan that covers prompt updates and ongoing system monitoring.
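The manual-review guard mentioned in the FAQ on call quality can be sketched as a small check on Whisper's verbose output, which includes per-segment average log-probabilities. The cutoff and the 25% segment ratio below are illustrative assumptions tuned per deployment.

```python
# Sketch of the low-confidence guard. LOGPROB_CUTOFF and the 25% ratio
# are illustrative values, not production constants.

LOGPROB_CUTOFF = -1.0

def needs_manual_review(segments, cutoff=LOGPROB_CUTOFF):
    """Flag the call for a human listen if too many segments are shaky."""
    if not segments:
        return True  # empty transcript: always review by hand
    low = sum(1 for s in segments if s["avg_logprob"] < cutoff)
    return low / len(segments) > 0.25
```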
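The bias testing described in the FAQ on fairness can likewise be sketched in a few lines: redact names, re-score, and compare. The 5-point tolerance is an illustrative choice, and `pairs` holds (original, redacted) summary scores produced by whatever scoring function you run.

```python
import statistics

# Sketch of the score-consistency check: compare summary scores from
# original and name-redacted transcripts. The tolerance is an assumption.

def redact_names(transcript, names):
    """Replace candidate names with a neutral token before re-scoring."""
    for name in names:
        transcript = transcript.replace(name, "[CANDIDATE]")
    return transcript

def score_drift(pairs):
    """Mean absolute score gap between original and redacted runs."""
    return statistics.mean(abs(a - b) for a, b in pairs)

def passes_bias_check(pairs, tolerance=5.0):
    return score_drift(pairs) <= tolerance
```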
Ready to Automate Your Small Business Operations?
Book a call to discuss how we can implement AI automation for your small business.
Book a Call