University of Cambridge
Department of Engineering
Cambridge, UK
CV
GitHub
Google Scholar
LinkedIn
am3303 [at] cam.ac.uk
ABOUT ME
I am a 23-year-old PhD student at the University of Cambridge, supervised by Prof. Phil Woodland. I completed my Bachelor's and Master's degrees in computer science and AI with distinction, in a curriculum devoted to innovation and research, and earned a two-year entrepreneurship diploma in 2022. I also contribute professionally to the development of SpeechBrain, an all-in-one, open-source, PyTorch-based speech processing toolkit with more than 10,000 stars on GitHub, where I lead the core efforts of the toolkit. In 2019, I started learning deep learning as an autodidact and helped build the largest French AI community.
RESEARCH INTERESTS
My research focuses on deep learning—specifically Speech Language Models (SLMs) that natively understand and generate speech—with the long-term goal of passing a “Speech Turing Test.” I aim to realise fully speech-native agents capable of robust dialogue, reasoning, and paralinguistic expressivity, akin to Her.
SPEECHBRAIN
I serve as a core maintainer of the SpeechBrain toolkit, responsible for its development and overall management. My role entails actively supporting the toolkit by engaging in discussions, addressing issues, and reviewing pull requests. I also focus on expanding the toolkit's capabilities by introducing new features for automatic speech recognition.
One of my notable contributions was integrating OpenAI's Whisper model. My ongoing work incorporates advanced decoding methods into SpeechBrain's speech recognition systems, including CTC frame-synchronous beam search and joint CTC/attention decoding, leveraging language models such as KenLM and Transformer LMs (e.g., GPT-2) for improved performance. I am also working on integrating Speech LLMs into SpeechBrain.
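For illustration, here is a minimal sketch of how the Whisper integration can be used from Python. It assumes SpeechBrain 1.0+ and that the pretrained checkpoint named below is available on HuggingFace; any SpeechBrain ASR interface can be substituted.

```python
# Minimal usage sketch (assumes SpeechBrain >= 1.0; the checkpoint name below
# is an assumed example of a fine-tuned Whisper model hosted on HuggingFace).
from speechbrain.inference.ASR import WhisperASR

# Download the pretrained model (if needed) and cache it locally.
asr = WhisperASR.from_hparams(
    source="speechbrain/asr-whisper-large-v2-commonvoice-fr",
    savedir="pretrained_models/whisper-fr",
)

# Transcribe an audio file and print the decoded text.
print(asr.transcribe_file("example.wav"))
```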
PUBLICATIONS
- Cross-Lingual Interleaving for Spoken Language Models. ICASSP 2026 (under review).
- Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels. Preprint. [preprint]
- Discrete Audio Tokens: More Than a Survey! TMLR 2025. [preprint]
- Open-Source Conversational AI with SpeechBrain 1.0. JMLR 2024. [preprint]
- Stabilising and Accelerating Light Gated Recurrent Units for Automatic Speech Recognition. ICASSP 2023. [preprint]
For a more complete list, see my Google Scholar profile.