Thi Vu
Research interests: Multimodal Foundation Models, Audio & Speech Processing, Human-centered & Accessible AI, AI for Creativity
I am a Research Resident at Qualcomm AI Research, where I am fortunate to be advised by Dr. Dat Quoc Nguyen.
I am passionate about tackling open-ended problems with direct human impact. My research vision rests on two pillars: foundational multimodal AI and human-centered, accessible AI. I believe that for AI to truly augment human capabilities, it must robustly perceive and reason about the complex, multimodal world, and offer intuitive ways for people to interact with it.
My past work has focused on leveraging the rich modality of audio to build systems that can listen, speak, and interact in ways that are both intelligent and intuitive. My research includes creating large-scale speech datasets for underrepresented languages such as Vietnamese to promote more inclusive AI, as well as developing end-to-end speech recognition systems that produce fully orthographic, word-level timestamped text without requiring cumbersome post-processing. I have also developed mobile accessibility tools that bring the benefits of AI to people with disabilities, including a sign language–to–text translator and a Braille keyboard for visually impaired users.
I am excited to continue my research journey in a Ph.D. program, where I can deepen my theoretical understanding and contribute to the future of multimodal and human-centered AI.
news
| Date | News |
|---|---|
| Oct 06, 2025 | 🏆 I am honored to receive the Outstanding Resident Award in Research 2025, as part of the 2025 Recognition Awards from the Qualcomm AI Residency Program, for my “exceptional research skills and significant impact in speech and language technologies.” |
| May 15, 2025 | 🎉 My first-author paper on Zero-shot Text-to-Speech for Vietnamese has been accepted to ACL 2025! |
selected publications
- Zero-Shot Text-to-Speech for Vietnamese. In Proceedings of ACL, 2025. (Metareview: 5/5)