Department of Software and Computing Systems

Lecture

Title: From Acoustic Models to Speech LLMs: Connecting Audio and Language Understanding
Type: tutorial
Presenter: Yongjian Chen (PhD student, University of Groningen)
Venue: Room I2/POLIVALENTE, Institutos Universitarios II
Date & time: 11:00, 03/06/2025
Estimated duration: 2 hours
Contact person: Toral Ruiz, Antonio (antonio.toralr@ua.es)
Abstract:
Speech LLMs represent the convergence of multiple research areas. They are
particularly relevant to natural language processing through their capacity
to perform speech-to-text and speech-to-speech downstream tasks, with or
without intermediate transcription, and to general AI research because they
exemplify key trends such as self-supervised learning and multimodal processing.

This 2-hour tutorial traces the evolution of speech processing from traditional
acoustic models to modern Speech Large Language Models (LLMs), providing both
theoretical foundations and hands-on implementation experience. The session
begins with classical approaches (HMM-GMM, HMM-DNN), progresses through
end-to-end neural architectures and self-supervised learning paradigms
(wav2vec 2.0, HuBERT, Whisper), and culminates in contemporary Speech LLMs
that integrate audio and text understanding.

The tutorial includes a comparative analysis of foundation models, examining
pre-training paradigms and fine-tuning strategies, along with a practical
coding session in which participants implement and fine-tune ASR models,
gaining hands-on experience with modern speech processing pipelines.
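
To illustrate the kind of fine-tuning workflow such a coding session involves,
here is a minimal setup sketch assuming the Hugging Face transformers library;
the checkpoint, hyperparameters, and output path are illustrative assumptions,
and the dataset and collator construction is deliberately omitted.

    # Minimal sketch of an ASR fine-tuning setup with wav2vec 2.0 + CTC,
    # assuming the Hugging Face transformers library. Checkpoint name,
    # hyperparameters, and output_dir are illustrative assumptions.
    from transformers import (
        Trainer,
        TrainingArguments,
        Wav2Vec2ForCTC,
        Wav2Vec2Processor,
    )

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-base-960h",
        ctc_loss_reduction="mean",
    )
    model.freeze_feature_encoder()  # common practice: keep the CNN feature extractor frozen

    args = TrainingArguments(
        output_dir="wav2vec2-finetuned",  # placeholder path
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        num_train_epochs=3,
    )

    # A train_dataset and a CTC data collator (padding audio inputs and
    # text labels separately) would be built from a speech corpus resampled
    # to 16 kHz; they are omitted here, so the Trainer lines stay a sketch.
    # trainer = Trainer(model=model, args=args,
    #                   train_dataset=train_dataset, data_collator=data_collator)
    # trainer.train()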

The practical implementation session provides skills that are immediately
applicable to research projects requiring speech processing components, while
bringing together researchers from complementary domains, creating opportunities
for interdisciplinary innovation and the exchange of ideas across machine
translation, general AI, and music generation.
