Simplify Job Search

AI Voice Calling Agent & Interview Screening Platform

AI-Powered Voice Screening Practice, Telephony-Based Calls, Speech-to-Text, Text-to-Speech, Candidate Response Capture, and Interview Readiness Automation

1. Client Context

Simplify Job Search was developed as an AI-powered job application and hiring platform that helps
candidates prepare better applications and helps recruiters manage hiring workflows. This case
study focuses on the AI Voice Calling Agent, a separate module designed to simulate voice-based
screening rounds and improve candidate interview readiness.

Modern hiring increasingly includes phone screening, automated first-round interviews, recruiter
screening calls, and communication-based evaluation. Many candidates may have strong technical
skills but still struggle to explain their profile confidently in a live voice interaction. Simplify Job
Search introduced the AI Voice Calling Agent to help candidates practice realistic screening
conversations before facing real recruiter or HR calls.

The AI Voice Calling Agent combines telephony, speech-to-text, text-to-speech, AI conversation
generation, response capture, payment unlock, and candidate workflow integration. The module
allows candidates to unlock a voice round, start a practice call, answer role-related and HR-style
questions, and receive an AI-assisted screening experience through a natural phone-call style
workflow.

The system uses Twilio for calling, Deepgram for speech-to-text, Amazon Polly for voice output,
and AI providers such as Azure OpenAI and Vertex AI for response generation and conversation
orchestration. AWS SQS provides asynchronous job queues, with worker-based processing for picking
and executing voice jobs. The rest of the stack comprises Node.js backend APIs, Angular 17 frontend
screens, DynamoDB, Amazon S3, the Razorpay unlock/payment flow, AWS EC2, Docker, Nginx, CloudWatch,
and CloudWatch Agent.

The goal of the module is to create a practical AI screening layer that helps candidates improve
confidence, communication, and interview preparedness while also creating a scalable monetization
opportunity for Simplify Job Search through paid AI voice rounds.

2. Problem

Candidates often prepare resumes and job applications but remain underprepared for the first
screening conversation. The first recruiter or HR call is usually short, fast, and judgment-heavy,
and many candidates lose opportunities because they are unable to communicate their value clearly.

The existing interview preparation process created several problems:

  • Candidates usually practice written applications but do not get enough realistic voice-based interview practice.
  • Many candidates are nervous during phone screening rounds and struggle with confidence, pacing, and clarity.
  • HR screening questions are repetitive but still require structured, confident, and personalized responses.
  • Candidates often do not know how to introduce themselves, explain experience, discuss strengths, or answer role-fit questions in a live call.
  • Traditional mock interview preparation requires another person, coach, recruiter, or mentor, which may not be available on demand.
  • Text-based AI chatbots do not fully replicate the pressure and communication style of a phone screening round.
  • Recruiter screening calls require listening, quick thinking, and spoken response quality, which cannot be measured only through resume tools.
  • Candidates need affordable, repeatable, and self-service interview practice that can be accessed when they are preparing for a specific role.
  • The platform needed a secure way to unlock voice rounds using payment or credits before initiating AI-powered telephony sessions.
  • Voice workflows require reliable handling of calls, speech recognition, voice generation, AI prompts, AWS SQS job queues, worker-based job processing, candidate response capture, and backend state management.
  • The system needed monitoring and logging because voice-call workflows are more operationally sensitive than normal text-based AI requests.

The key goal was to build an AI voice screening module that could simulate realistic candidate
screening calls, reduce interview anxiety, capture candidate responses, and create a repeatable
voice-practice workflow inside the Simplify Job Search platform.

3. AI Approach

The Simplify Job Search AI Voice Calling Agent was designed as a voice-first interview preparation
and screening automation module. Instead of limiting candidates to text-based AI outputs, the
platform uses calling, real-time speech processing, AI-generated interview flow, and human-like
voice responses to create a practical screening experience.

The approach combines candidate unlock/payment, call initiation, Twilio telephony, speech-to-text
transcription, AI prompt orchestration, text-to-speech response generation, candidate response
capture, screening-flow state handling, and result storage. This allows the platform to convert an
AI interview practice session into a structured, trackable, and monetizable workflow.

The approach included:

Voice Round Unlock and Payment Flow

The AI voice round is connected to the Simplify Job Search monetization system. Candidates can
access the voice round through an unlock/payment flow before starting the AI call.

The frontend provides the candidate with an AI voice round access screen, unlock modal, pricing
context, and call-start experience. Razorpay payment integration is used to create and verify
payment before enabling the paid voice round.

This creates a controlled AI usage model where high-cost voice and AI infrastructure is linked to
a paid unlock or credit-based usage workflow.
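
The verification step can be sketched as follows. Razorpay's documented check for its standard Checkout flow is an HMAC-SHA256 over `order_id|payment_id`, keyed with the account's key secret. Everything else here is hypothetical and not taken from the Simplify Job Search codebase: the helper name `isValidPaymentSignature`, its parameter names, and the sample IDs.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Recomputes the expected signature on the server and compares it in constant time.
// `serverOrderId` should be the order ID the backend stored when it created the order,
// not a value echoed back by the browser.
function isValidPaymentSignature(
  serverOrderId: string,
  paymentId: string,
  signature: string,
  keySecret: string
): boolean {
  const expected = createHmac("sha256", keySecret)
    .update(`${serverOrderId}|${paymentId}`)
    .digest("hex");
  const expectedBytes = Buffer.from(expected, "utf8");
  const receivedBytes = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return (
    expectedBytes.length === receivedBytes.length &&
    timingSafeEqual(expectedBytes, receivedBytes)
  );
}
```

Under this sketch, the backend would mark the voice round as unlocked only when the function returns true, so a forged or replayed client callback cannot start a paid telephony session.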

Candidate Practice Call Start Flow

After unlock, the candidate can start a voice practice call from the Simplify Job Search application.
The frontend sends the required candidate, job, and voice-round context to the backend so the
system can initiate or manage the screening session.

The call-start flow is designed to feel similar to a real recruiter screening round. Candidates can
practice introducing themselves, answering role-related questions, explaining projects, responding
to HR questions, and improving verbal confidence.

The module can draw on the candidate profile, resume, job context, or screening configuration
so that the AI conversation is more relevant than a generic interview script.

Asynchronous Queue and Worker Processing

The AI Voice Calling Agent also uses AWS SQS to support asynchronous voice-session processing.
Instead of keeping every operational task inside a single request-response API cycle, the platform
can place voice-round jobs into a queue and allow a backend worker to pick, process, and update
the job state.

This queue-worker approach is important for voice workflows because call initiation, transcription,
AI response generation, text-to-speech generation, response capture, and status updates may take
different amounts of time. AWS SQS helps decouple the frontend request from backend processing
and gives the system a more reliable way to handle voice jobs.

The worker processing layer can pick pending jobs from the queue, execute the required voice or
AI workflow, update DynamoDB with screening progress, and support retry/failure-handling patterns
for more operationally stable call automation.
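
The pick, process, and update cycle described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the `VoiceJob` shape, the status names, and the three-attempt limit are hypothetical, and an in-memory array stands in for the queue. With real SQS, a failed message is typically left undeleted so it reappears after its visibility timeout, and a redrive policy moves it to a dead-letter queue after too many receives.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

// Hypothetical payload passed from the API layer to the worker.
interface VoiceJob {
  jobId: string;
  candidateId: string;
  jobRoleId: string;
  status: JobStatus;
  attempts: number;
}

type Outcome = "delete" | "retry" | "dead-letter";

// Runs one job and decides what should happen to its queue message.
function processVoiceJob(
  job: VoiceJob,
  run: (job: VoiceJob) => void,
  maxAttempts = 3
): Outcome {
  job.status = "PROCESSING";
  job.attempts += 1;
  try {
    run(job); // e.g. place the call, transcribe, generate the next question
    job.status = "COMPLETED";
    return "delete";
  } catch {
    if (job.attempts >= maxAttempts) {
      job.status = "FAILED";
      return "dead-letter";
    }
    job.status = "PENDING";
    return "retry";
  }
}

// Drains the in-memory queue, re-enqueueing jobs that should be retried.
function drain(
  queue: VoiceJob[],
  run: (job: VoiceJob) => void,
  maxAttempts = 3
): VoiceJob[] {
  const finished: VoiceJob[] = [];
  while (queue.length > 0) {
    const job = queue.shift()!;
    const outcome = processVoiceJob(job, run, maxAttempts);
    if (outcome === "retry") queue.push(job);
    else finished.push(job);
  }
  return finished;
}
```

In a production worker, each status change would also be written to DynamoDB so the frontend can poll or display screening progress.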

Telephony-Based AI Interaction

Twilio is used as the core telephony layer for placing or managing voice calls. This allows the AI
screening experience to move beyond browser-only chat and into a phone-call style interaction.

The telephony layer supports the call bridge between the candidate and the AI agent. The backend
manages the call state, session flow, and integration with speech and AI services.

This approach helps candidates experience the pressure, flow, pauses, and response timing of a
real voice screening conversation.
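
One way a backend webhook could drive a single question-and-answer turn is by returning TwiML, Twilio's XML instruction format. The source does not say how call audio reaches the speech services, so this sketch assumes the simplest option: `<Say>` speaks the question and `<Record>` captures the answer, after which Twilio sends the recording details to the `action` URL. The function names and the example URL are hypothetical.

```typescript
// Escapes the five XML special characters so AI-generated text cannot break the TwiML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds TwiML that speaks one question, then records the candidate's answer.
// When recording ends, Twilio requests `actionUrl` with the recording details.
function buildQuestionTwiml(
  question: string,
  actionUrl: string,
  maxAnswerSeconds = 60
): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Say>${escapeXml(question)}</Say>`,
    `  <Record action="${escapeXml(actionUrl)}" method="POST" maxLength="${maxAnswerSeconds}" playBeep="false"/>`,
    "</Response>",
  ].join("\n");
}
```

A lower-latency design would stream call audio to the speech-to-text service in real time instead of recording whole answers. The recorded-turn version is shown only because it is the easiest to reason about.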

Speech-to-Text Processing

Candidate spoken responses need to be converted into text before they can be evaluated or used
by the AI system. The module uses Deepgram speech-to-text to transcribe candidate audio into
text.

The transcribed response can then be passed to the AI layer for follow-up question generation,
answer evaluation, conversation continuation, and response storage.

Speech-to-text is important because the system must understand what the candidate actually said
rather than depending only on predefined button-based inputs.
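
As a rough illustration, the handoff from transcription to the AI layer might look like the helper below. The nested `results.channels[].alternatives[]` shape follows Deepgram's pre-recorded transcription response. The `extractAnswer` name, the `CandidateAnswer` type, and the 0.6 confidence threshold are assumptions made for this sketch.

```typescript
// Minimal slice of a Deepgram pre-recorded transcription response.
interface DeepgramAlternative {
  transcript: string;
  confidence: number;
}
interface DeepgramResponse {
  results?: { channels?: Array<{ alternatives?: DeepgramAlternative[] }> };
}

interface CandidateAnswer {
  text: string;
  confidence: number;
}

// Returns the top transcript, or null when nothing usable was heard
// (silence, missing data, or confidence below the threshold).
function extractAnswer(
  response: DeepgramResponse,
  minConfidence = 0.6
): CandidateAnswer | null {
  const best = response.results?.channels?.[0]?.alternatives?.[0];
  if (!best) return null;
  const text = best.transcript.trim();
  if (text.length === 0 || best.confidence < minConfidence) return null;
  return { text, confidence: best.confidence };
}
```

A null result gives the conversation layer a clear signal to repeat or rephrase the question rather than evaluating a garbled answer.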

AI Conversation Orchestration

The AI layer generates the flow of the screening conversation. Based on candidate context, job
context, previous answers, and interview stage, the AI can ask relevant questions and continue
the conversation in a structured way.

The platform uses Azure OpenAI and keeps Vertex AI in its AI-provider strategy. This allows
Simplify Job Search to design prompts, conversation flows, and fallback planning around
flexible AI provider integration.

The AI approach is focused on practical screening simulation: asking meaningful questions,
maintaining a professional tone, responding naturally, and guiding the call through an
interview-like sequence.
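
A minimal sketch of how context, previous answers, and interview stage could be assembled into a provider-neutral prompt is shown below. The stage names, the `SessionContext` fields, and the prompt wording are hypothetical. The system/assistant/user message shape matches chat-style APIs such as Azure OpenAI and would need mapping for providers with a different request format.

```typescript
type Stage = "intro" | "experience" | "role_fit" | "hr" | "closing";
const STAGE_ORDER: Stage[] = ["intro", "experience", "role_fit", "hr", "closing"];

interface Turn {
  question: string;
  answer: string;
}
interface SessionContext {
  candidateName: string;
  jobTitle: string;
  resumeSummary: string;
  stage: Stage;
  history: Turn[];
}
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Moves to the following stage and stays on "closing" once it is reached.
function nextStage(stage: Stage): Stage {
  const i = STAGE_ORDER.indexOf(stage);
  return STAGE_ORDER[Math.min(i + 1, STAGE_ORDER.length - 1)];
}

// Builds the message list sent to the model before it produces the next question.
function buildMessages(ctx: SessionContext): ChatMessage[] {
  const messages: ChatMessage[] = [
    {
      role: "system",
      content:
        `You are a professional HR screener on a phone call with ${ctx.candidateName} ` +
        `for the role of ${ctx.jobTitle}. Resume summary: ${ctx.resumeSummary}. ` +
        `Current stage: ${ctx.stage}. Ask one short, spoken-style question at a time.`,
    },
  ];
  if (ctx.history.length === 0) {
    // No answers yet: give the model a user turn to respond to.
    messages.push({ role: "user", content: "(The call has just connected.)" });
  }
  for (const turn of ctx.history) {
    messages.push({ role: "assistant", content: turn.question });
    messages.push({ role: "user", content: turn.answer });
  }
  return messages;
}
```

Keeping the stage and history outside the model call means the same session state can be replayed against a different provider if the primary one fails.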

Text-to-Speech and Voice Output

After the AI generates a question or response, the text must be converted into spoken audio.
Amazon Polly handles text-to-speech generation so that the candidate hears the AI interviewer
instead of reading text on screen.

The use of text-to-speech makes the experience more realistic and helps candidates practice
listening and responding in a voice-based format.

This also gives the module flexibility to use clear, consistent, and scalable voice output instead
of relying on human interviewers.

Candidate Response Capture and Screening Record

During the voice session, candidate answers and screening responses can be captured for review,
tracking, and future improvement. The backend can store call/session context, transcribed
responses, question flow, and screening metadata.

This creates a record of the candidate practice session and provides a foundation for future scoring,
feedback, recruiter-side screening intelligence, or candidate improvement reports.

The response capture layer is important because voice practice becomes more valuable when the
platform can preserve what was asked, what was answered, and how the session progressed.
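
To make the stored record concrete, the sketch below shows one possible DynamoDB item per question-and-answer turn. The source confirms only that DynamoDB stores screening responses. The key layout (`CANDIDATE#…` and `SESSION#…#TURN#…`), the attribute names, and the `toTurnItem` helper are assumptions for illustration.

```typescript
// Hypothetical item shape for one question/answer turn.
interface ScreeningTurnItem {
  pk: string; // partition key: groups every item for one candidate
  sk: string; // sort key: orders turns within a session
  question: string;
  transcript: string;
  sttConfidence: number;
  askedAt: string; // ISO 8601 timestamp
}

interface TurnInput {
  candidateId: string;
  sessionId: string;
  turnIndex: number;
  question: string;
  transcript: string;
  sttConfidence: number;
  askedAt: Date;
}

// Builds the item to store. The turn index is zero-padded to three digits so that
// lexicographic ordering of the sort key matches the order in which questions were asked.
function toTurnItem(input: TurnInput): ScreeningTurnItem {
  const paddedTurn = ("000" + input.turnIndex).slice(-3);
  return {
    pk: `CANDIDATE#${input.candidateId}`,
    sk: `SESSION#${input.sessionId}#TURN#${paddedTurn}`,
    question: input.question,
    transcript: input.transcript,
    sttConfidence: input.sttConfidence,
    askedAt: input.askedAt.toISOString(),
  };
}
```

With keys like these, a single query on the partition key with a `SESSION#<id>` sort-key prefix would return a full session in order, which is the access pattern later scoring or feedback features would likely need.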

Monitoring, Reliability, and Operational Control

Voice calling modules require stronger operational monitoring than normal web features. The
system needs to track payment state, call initiation, transcription, AI response generation, voice
output, failure handling, and logs.

AWS EC2, Docker, Nginx, CloudWatch, CloudWatch Agent, and AWS SQS support production
deployment, routing, logs, infrastructure monitoring, queue visibility, worker execution, and
operational control.

This makes the AI Voice Calling Agent more reliable for production usage and gives the team a
foundation for debugging call or AI workflow issues.

4. Tech Used

The Simplify Job Search AI Voice Calling Agent uses frontend engineering, backend APIs, AWS SQS
queue workers, telephony services, speech processing, AI orchestration, payment integration,
database storage, and cloud deployment technologies to deliver a scalable voice-based interview
preparation workflow.

Frontend and User Experience Stack

  • Angular 17 for building the candidate-facing AI voice round interface and call-start experience.
  • TypeScript for type-safe frontend logic and maintainable voice-round workflows.
  • JavaScript for supporting frontend behavior, payment handling, and user interactions.
  • HTML for structuring candidate voice-round screens, access panels, and user flows.
  • CSS for base styling and responsive interface presentation.
  • Tailwind CSS for utility-first UI styling and clean candidate-facing layouts.
  • RxJS for managing asynchronous frontend API communication, unlock state, and call-start responses.

Backend and API Stack

  • Node.js for running backend voice-round APIs and server-side orchestration logic.
  • Express.js for API routing, middleware handling, and REST endpoint development.
  • JavaScript ES modules for modular backend implementation and maintainable service structure.
  • REST API architecture for connecting the Angular frontend with voice, payment, and AI backend workflows.
  • JWT authentication for protecting candidate voice-round access and authenticated API usage.
  • Controller-service-route structure for separating call initiation, payment verification, AI workflow, and data operations.
  • AWS SDK for integrating backend services with DynamoDB, Amazon S3, AWS SQS, AWS SES, and AWS infrastructure services.
  • Winston and Pino for backend logging, debugging, and operational observability.
  • Node cron for scheduled backend tasks and operational automation.

Communication and Queue Stack

  • AWS SQS for queue-based voice-session job processing and asynchronous backend workflows.
  • SQS worker processing for picking voice jobs, executing call/session tasks, and updating workflow state.
  • SQS message payloads for passing candidate, job, call, and screening context between API and worker layers.
  • Queue-based retry and failure handling for improving resilience of telephony, AI, STT, and TTS workflows.
  • Asynchronous processing architecture for separating frontend call-start requests from longer-running backend voice operations.

Voice Calling and Telephony Stack

  • Twilio for initiating and managing telephony-based AI voice calls.
  • Twilio Voice workflows for connecting candidates with the AI screening call experience.
  • Webhook-based call handling for managing call events, session progress, and backend voice workflows.
  • Call-state management for tracking active sessions, candidate context, and screening progress.

Speech Processing and Audio Stack

  • Deepgram for speech-to-text transcription of candidate spoken responses.
  • STT processing for converting candidate audio into text that can be used by AI workflows.
  • Amazon Polly for text-to-speech generation of AI interviewer voice responses.
  • TTS processing for converting AI-generated questions and prompts into spoken audio.
  • Voice response pipeline for connecting transcription, AI generation, and synthesized speech output.

AI and Conversation Orchestration Stack

  • Azure OpenAI for AI-powered interview question generation, response handling, and conversation flow.
  • Vertex AI and Gemini as alternative providers for future AI-provider flexibility and model fallback planning.
  • Prompt-driven conversation design for creating structured HR-style and role-related screening interactions.
  • Context-aware AI orchestration for using candidate, job, resume, and previous-response context during the call.
  • Structured screening-response capture for preserving AI questions, candidate answers, and session outcomes.

Payment, Database, and Storage Stack

  • Razorpay for paid AI voice round unlock, order creation, and payment verification.
  • Amazon DynamoDB for storing candidate records, unlock state, voice-round metadata, SQS job status, screening responses, and usage history.

Deployment and Infrastructure Stack

  • AWS EC2 for hosting production frontend/backend services and server infrastructure.
  • Docker for containerizing application services and creating portable deployments.
  • Docker Compose for managing multi-service deployment configuration.
  • Nginx reverse proxy for routing, SSL termination, and production serving.
  • CloudWatch for infrastructure monitoring, application logs, and operational visibility.
  • CloudWatch Agent for collecting server-level metrics and logs from EC2.
  • Production Angular build through Nginx for serving candidate-facing voice-round frontend assets.
  • Backend Node.js container for running voice APIs, AI orchestration, telephony hooks, SQS worker processing, and supporting utilities.
  • Environment-based configuration for securely managing AWS SQS, AWS services, Twilio, Deepgram, Amazon Polly, AI providers, authentication, and payment settings.

5. Outcome / Business Value

The Simplify Job Search AI Voice Calling Agent created a practical voice-first interview preparation
workflow and extended the platform beyond resume optimization into communication readiness.

The module provides several business benefits:

  • Helps candidates practice realistic HR and recruiter screening conversations before real interviews.
  • Reduces interview anxiety by allowing candidates to repeatedly practice voice-based responses.
  • Improves candidate communication readiness by simulating phone-call style interactions rather than only text-based chat.
  • Creates a more complete candidate preparation journey by combining resume intelligence with interview readiness.
  • Supports paid unlock and monetization through AI voice round access.
  • Uses telephony, AWS SQS worker processing, STT, TTS, and AI orchestration to create a differentiated feature compared with ordinary resume tools.
  • Captures candidate responses and screening context for future feedback, scoring, analytics, and recruiter-side intelligence.
  • Allows the platform to support on-demand mock screening without requiring human recruiters or mentors for every practice session.
  • Strengthens Simplify Job Search as an AI-powered job application companion instead of only a resume-generation product.
  • Creates a scalable foundation for future recruiter-side automated screening, candidate evaluation, and voice-based hiring workflows.
  • Improves product monetization by connecting high-value AI interactions with credits, unlocks, and payment verification.
  • Supports operational reliability through SQS-based asynchronous processing, cloud deployment, monitoring, logging, and environment-based configuration.

The AI Voice Calling Agent demonstrates how voice AI can be added to a career platform to help
candidates improve not only what they submit, but also how they communicate during the hiring
process.

6. What Similar Companies Can Learn

Career-tech platforms, HR-tech startups, job marketplaces, interview preparation tools, and
recruitment automation companies can learn several important lessons from the Simplify Job Search
AI Voice Calling Agent.

First, candidate preparation should not stop at resume generation. Many candidates fail or lose
confidence during the first screening call, so voice-based practice can become a high-value addition
to any job-search platform.

Second, text chat is not enough for communication readiness. A real screening call includes
listening, speaking, pauses, confidence, and response structure. Telephony-based AI practice creates
a more realistic preparation experience than a normal chatbot.

Third, voice AI requires a full pipeline rather than a single AI model. The strongest implementation
combines telephony, AWS SQS queues, worker processing, speech-to-text, AI conversation logic,
text-to-speech, call-state tracking, storage, payments, and monitoring.

Fourth, paid unlock workflows are important for voice AI because telephony, speech recognition,
voice generation, and LLM usage all create operational cost. Linking the feature with Razorpay,
credits, and usage history makes the system commercially sustainable.

Fifth, candidate response capture creates long-term value. Once the platform stores screening
questions, transcribed answers, and session context, it can later generate feedback, readiness scores,
recruiter insights, and improvement recommendations.

Finally, the strongest career platforms will combine application intelligence with communication
intelligence. Simplify Job Search shows how resume score, gap analysis, resume generation, cover
letter automation, and AI voice screening can work together as one end-to-end job preparation
ecosystem.
