Department of Software and Computing Systems

Lecture

Title: From Acoustic Models to Speech LLMs: Connecting Audio and Language Understanding
Type: tutorial
Presenter: Yongjian Chen (PhD student, University of Groningen)
Venue: Room I2/POLIVALENTE, Institutos Universitarios II
Date & time: 11:00, 03/06/2025
Estimated duration: 2 hours
Contact person: Toral Ruiz, Antonio (antonio.toralr@ua.es)
Abstract:
Speech LLMs represent the convergence of multiple research areas. They are
particularly relevant to natural language processing through their capacity
to perform speech-to-text and speech-to-speech downstream tasks, with or
without intermediate transcription, and to general AI research because they
exemplify key trends such as self-supervised learning and multimodal processing.

This 2-hour tutorial traces the evolution of speech processing from traditional
acoustic models to modern Speech Large Language Models (LLMs), providing both
theoretical foundations and hands-on implementation experience. The session
begins with classical approaches (HMM-GMM, HMM-DNN), progresses through
end-to-end neural architectures and self-supervised learning paradigms
(wav2vec 2.0, HuBERT, Whisper), and culminates in contemporary Speech LLMs
that integrate audio and text understanding.

The tutorial includes a comparative analysis of foundation models, examining
pre-training paradigms and fine-tuning strategies, along with a practical
coding session in which participants implement and fine-tune ASR models,
gaining hands-on experience with modern speech processing pipelines.
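
To illustrate the kind of fine-tuning workflow such a coding session involves,
here is a minimal setup sketch assuming the Hugging Face transformers library;
the checkpoint, hyperparameters, and output path are illustrative assumptions,
and the dataset and collator construction is deliberately omitted.

    # Minimal sketch of an ASR fine-tuning setup with wav2vec 2.0 + CTC,
    # assuming the Hugging Face transformers library. Checkpoint name,
    # hyperparameters, and output_dir are illustrative assumptions.
    from transformers import (
        Trainer,
        TrainingArguments,
        Wav2Vec2ForCTC,
        Wav2Vec2Processor,
    )

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-base-960h",
        ctc_loss_reduction="mean",
    )
    model.freeze_feature_encoder()  # common practice: keep the CNN feature extractor frozen

    args = TrainingArguments(
        output_dir="wav2vec2-finetuned",  # placeholder path
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        num_train_epochs=3,
    )

    # A train_dataset and a CTC data collator (padding audio inputs and
    # text labels separately) would be built from a speech corpus resampled
    # to 16 kHz; they are omitted here, so the Trainer lines stay a sketch.
    # trainer = Trainer(model=model, args=args,
    #                   train_dataset=train_dataset, data_collator=data_collator)
    # trainer.train()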

The practical implementation session provides skills that are immediately
applicable to research projects requiring speech processing components, while
bringing together researchers from complementary domains, creating opportunities
for interdisciplinary innovation and the exchange of ideas across machine
translation, general AI, and music generation.
