📝 Publications
( * equal contribution, # corresponding author)
2025

SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios. Kai Li, Wendi Sang, Chang Zeng, Guo Chen, Runxuan Yang, Xiaolin Hu. ICLR 2025. Singapore EXPO.
- SonicSim is a customizable simulation platform built on Habitat-sim, designed to generate high-fidelity, diverse synthetic data for speech separation and enhancement tasks involving moving sound sources, addressing the limitations of real-world and existing synthetic datasets in acoustic realism and scalability.

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation. Mohan Xu, Kai Li*, Guo Chen, Xiaolin Hu. ICLR 2025. Singapore EXPO.

Apollo: Band-sequence Modeling for High-Quality Audio Restoration Kai Li, Yi Luo. ICASSP 2025. Hyderabad, India.
2024

SafeEar: Content Privacy-Preserving Audio Deepfake Detection Xinfeng Li, Kai Li*, Yifan Zheng, Chen Yan, Xiaoyu Ji, Wenyuan Xu. CCS 2024. Salt Lake City, U.S.A.

IIANet: an intra- and inter-modality attention network for audio-visual speech separation. Kai Li, Runxuan Yang, Fuchun Sun, Xiaolin Hu. ICML 2024. Vienna, Austria.
- Inspired by the cross-modal processing mechanisms of the brain, we design intra- and inter-attention modules that integrate auditory and visual information for efficient speech separation. The model simulates audio-visual fusion at different levels of the sensory cortical areas as well as in higher association areas such as the parietal cortex.

Towards Robust Pansharpening: A Large-Scale High-Resolution Multi-Scene Dataset and Novel Approach. Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Tengfei Cao, Pin Tao. Remote Sensing 2024.

The Sound Demixing Challenge 2023 – Cinematic Demixing Track. Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji. ISMIR 2024.

SPMamba: State-space model is all you need in speech separation. Kai Li, Guo Chen. arXiv 2024.

High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark. Ben Chen, Xuechao Zou, Kai Li, Yu Zhang, Junliang Xing, Pin Tao. ICME 2024. Niagara Falls, Canada.

An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits. Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu. TPAMI 2024.

RTFS-Net: Recurrent time-frequency modelling for efficient audio-visual speech separation. Samuel Pegg*, Kai Li*, Xiaolin Hu. ICLR 2024. Vienna, Austria.

DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal From Optical Satellite Images. Xuechao Zou*, Kai Li*, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao. TGRS 2024.

Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference. Kai Li, Yi Luo. ICASSP 2024. Seoul, Korea.
- A method for training neural networks with dynamic depth and width configurations, enabling flexible extraction of subnetworks during inference without additional training.

LEFormer: A Hybrid CNN-Transformer Architecture for Accurate Lake Extraction from Remote Sensing Imagery. Ben Chen, Xuechao Zou, Yu Zhang, Jiayu Li, Kai Li, Pin Tao. ICASSP 2024. Seoul, Korea.
2023

TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion. Samuel Pegg*, Kai Li*, Xiaolin Hu. ICIST 2023. Cairo, Egypt.
- TDFNet is a multi-scale, multi-stage framework for audio-visual speech separation that combines the strengths of TDANet and CTCNet. It is designed to address the inefficiencies and limitations of existing multimodal speech separation models, particularly in real-time tasks.

PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-Performance Cloud Removal from Multi-temporal Satellite Imagery. Xuechao Zou*, Kai Li*, Junliang Xing, Pin Tao#, Yachao Cui. ECAI 2023. Kraków, Poland.

An efficient encoder-decoder architecture with top-down attention for speech separation. Kai Li, Runxuan Yang, Xiaolin Hu. ICLR 2023. Kigali, Rwanda.

One-page Report for Tencent AI Lab’s CDX 2023 System. Kai Li, Yi Luo. SDX Workshop 2023. Paris, France.
- We present the system of Tencent AI Lab for the Cinematic Sound Demixing Challenge 2023, which is based on a novel neural network architecture and a new training strategy.

Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model. Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu and Timo Gerkmann. Interspeech 2023. Dublin, Ireland.

A Neural State-Space Model Approach to Efficient Speech Separation. Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng. Interspeech 2023. Dublin, Ireland.

On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems. Kai Li, Yi Luo. ICASSP 2023. Melbourne, Australia.
- The paper explores converting RNN-based offline neural speech separation systems to online systems with minimal performance degradation.
2022

On the Use of Deep Mask Estimation Module for Neural Source Separation Systems. Kai Li, Xiaolin Hu, Yi Luo. Interspeech 2022. Incheon, Korea.
- We propose a Deep Mask Estimation Module for speech separation, which improves performance without additional computational complexity.

Inferring mechanisms of auditory attentional modulation with deep neural networks. Ting-Yu Kuo, Yuanda Liao, Kai Li, Bo Hong, Xiaolin Hu. Neural Computation 2022.
2021

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network. Xiaolin Hu*#, Kai Li*, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann. NeurIPS 2021. Online.
2020 and Prior

A Survey of Single Image Super Resolution Reconstruction. Kai Li, Shenghao Yang, Runting Dong, Jianqiang Huang#, Xiaoying Wang. IET Image Processing 2020.
- This paper provides a comprehensive survey of single image super-resolution reconstruction methods, including traditional methods and deep learning-based methods.

Single Image Super-resolution Reconstruction of Enhanced Loss Function with Multi-GPU Training. Jianqiang Huang*#, Kai Li*, Xiaoying Wang.
- This paper proposes a multi-GPU training method for single image super-resolution reconstruction, which can significantly reduce training time and improve performance.