Digital Event Horizon
Code-switching, the practice of switching between languages even mid-sentence, poses a significant challenge for voice agents. A recent study by ServiceNow-AI evaluated seven ASR systems on code-switched speech and found that top-performing systems can handle bilingual customers with surprising accuracy, but other models struggle to keep up. The researchers highlight the need for voice agents to be able to handle code-switched speech effectively in enterprise settings.
Over half of people speak more than one language, so voice agents regularly have to handle code-switched speech. A team of researchers created a benchmark and dataset to evaluate ASR systems' performance on code-switched speech in enterprise settings. The study tested four language pairs and seven ASR systems and found that the top-performing models incurred only a small penalty on code-switched speech relative to their monolingual baselines. The number of language switches in an utterance was a key predictor of transcription errors. The study highlights the need for voice agents to handle code-switched speech effectively in enterprise settings and emphasizes the importance of benchmarking language pairs before making production decisions.
The world's population is increasingly linguistically diverse, with over half of people speaking more than one language. This diversity poses a significant challenge for voice agents, which are designed to understand and transcribe speech in various languages. One of the most common challenges voice agents face is code-switching, where speakers switch between languages, sometimes mid-sentence. Despite its prevalence, there has been limited research on how voice agents handle code-switched speech in enterprise settings.
Recently, a team of researchers from ServiceNow-AI built a benchmark and dataset to evaluate models' performance on code-switched speech. They focused on automatic speech recognition (ASR) because it is the first step in any voice agent pipeline, and transcription errors can have real operational consequences in enterprise settings. The researchers tested four language pairs (Spanish-English, French-English, Canadian French-English, and German-English) and evaluated seven ASR systems, including some Large Audio Language Models (LALMs), frontier ASRs, and open-source ASRs.
The researchers analyzed errors along two dimensions: word-level accuracy and semantic accuracy. They reported three metrics: Word Error Rate (WER), Semantic Word Error Rate (SWER), and Answer Error Rate (AER). The results showed that the top-performing systems, including ElevenLabs Scribe V2 and AssemblyAI Universal 3-Pro, incurred only a small penalty on code-switched speech relative to their monolingual baselines. Other models performed poorly, particularly on code-switched speech.
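Of the three metrics, WER has a widely used standard definition: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that standard formula, not the study's evaluation code. The function name, the lowercase-and-split normalization, and the example utterance are all assumptions made for illustration. SWER and AER are specific to the study and are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. Real evaluations usually apply heavier normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical Spanish-English utterance: the transcript turns "my" into "mi".
reference = "quiero reset my password por favor"
hypothesis = "quiero reset mi password por favor"
print(round(word_error_rate(reference, hypothesis), 3))  # 0.167 (1 error / 6 words)
```

A single wrong word in a six-word utterance already yields a WER near 17%, which is one reason the study also reports semantic metrics: a substitution like this may or may not change what the customer meant.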
The researchers also investigated how code-switching breaks ASR systems. They found that the number of language switches within an utterance was the predictor most consistently associated with whether a transcription error occurred. The magnitude of errors, however, was shaped by the overall density of mixing: the more thoroughly the languages were interwoven, the larger the transcription errors.
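To make the two predictors concrete, the sketch below computes them for an utterance whose words have already been labeled with a language tag. This is only an illustration under stated assumptions: the per-word tags are supplied by hand, the function names are hypothetical, and `minority_fraction` is a simple stand-in for "density of mixing" that may differ from the measure the researchers used.

```python
from collections import Counter
from typing import Sequence


def count_switches(tags: Sequence[str]) -> int:
    """Number of points where the language tag changes between adjacent words."""
    return sum(1 for a, b in zip(tags, tags[1:]) if a != b)


def minority_fraction(tags: Sequence[str]) -> float:
    """Share of words NOT in the utterance's most frequent language.

    Assumption: a stand-in for mixing density. 0.0 means monolingual;
    0.5 means two languages are evenly mixed.
    """
    if not tags:
        return 0.0
    dominant_count = Counter(tags).most_common(1)[0][1]
    return 1 - dominant_count / len(tags)


# Hypothetical tags for "quiero reset my password por favor".
tags = ["es", "en", "en", "en", "es", "es"]
print(count_switches(tags))     # 2 switches: es->en, then en->es
print(minority_fraction(tags))  # 0.5: three words in each language
```

Under these definitions, an utterance with one borrowed word at the end has a single switch and low density, while an utterance that alternates languages word by word scores high on both.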
The study highlights the need for voice agents to be able to handle code-switched speech effectively, particularly in enterprise settings where bilingual customers may switch languages mid-sentence. The researchers conclude that before making production decisions, it is essential to benchmark the languages your customers actually speak, as performance varies substantially across models and language pairs.
Related Information:
https://www.digitaleventhorizon.com/articles/Code-Switching-in-Voice-Agents-A-Study-on-the-Challenges-and-Opportunities-deh.shtml
https://huggingface.co/blog/ServiceNow-AI/code-switching
https://aitoolly.com/ai-news/article/2026-06-10-can-voice-agents-handle-bilingual-customers-benchmarking-frontier-asr-on-code-switched-speech
Published: Wed Jun 10 19:23:36 2026 by llama3.2 3B Q4_K_M