Publications

Papers

  • SonarGuard2: Ultrasonic Face Liveness Detection Based on Adaptive Doppler Effect Feature Extraction.
    Xiaoming Zhang, Keyue Zhang, Taiping Yao, Songjun Cao, Shouhong Ding, Long Ma. INTERSPEECH 2025.
  • MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt.
    Zhichao Wu, Yueteng Kang, Songjun Cao, Long Ma, Qiulin Li, Qun Yang. INTERSPEECH 2025.
  • Monotonic Attention for Robust Text-to-Speech Synthesis in Large Language Model Frameworks.
    Yike Zhang, Yiming Li, Jie Chen, Qinghua Wu, Songjun Cao, Long Ma. INTERSPEECH 2025.
  • DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models.
    Weihao Wu*, Zhiwei Lin*, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu. ICASSP 2025.
  • M-MoE: Mixture of Mixture-of-Expert Model for CTC-based Streaming Multilingual ASR.
    Songjun Cao, Xiong Wang, Yike Zhang, Xiaoming Zhang, Long Ma. ICASSP 2025.
  • A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition.
    Yangze Li*, Xiong Wang*, Songjun Cao, Yike Zhang, Long Ma, Lei Xie. INTERSPEECH 2024.
  • DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model.
    Yanzhe Fu*, Yueteng Kang*, Songjun Cao, Long Ma. arXiv 2023.
  • Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training.
    Bowen Zhang*, Songjun Cao*, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki. INTERSPEECH 2022.
  • Improving CTC-based Speech Recognition via Knowledge Transferring from Pre-trained Language Models.
    Keqi Deng*, Songjun Cao*, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang. ICASSP 2022.
  • A Practical Framework for Multi-domain Speech Recognition and an Instance Sampling Method to Neural Language Modeling.
    Yike Zhang, Xiaobing Feng, Yi Liu, Songjun Cao, Long Ma. arXiv 2021.
  • Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Models.
    Keqi Deng*, Songjun Cao*, Yike Zhang, Long Ma. ASRU 2021.
  • Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning.
    Songjun Cao, Yueteng Kang, Yanzhe Fu, Xiaoshuo Xu, Sining Sun, Yike Zhang, Long Ma. INTERSPEECH 2021.
  • Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-supervised Learning.
    Keqi Deng*, Songjun Cao*, Long Ma. INTERSPEECH 2021.
  • Explore Wav2vec 2.0 for Mispronunciation Detection.
    Xiaoshuo Xu, Yueteng Kang, Songjun Cao, Binghuai Lin, Long Ma. INTERSPEECH 2021.
  • Improving Speech Recognition Accuracy of Local POI Using Geographical Models.
    Songjun Cao*, Yike Zhang*, Xiaobing Feng, Long Ma. SLT 2021.
  • Multi-head Monotonic Chunkwise Attention For Online Speech Recognition.
    Baiji Liu, Songjun Cao, Sining Sun, Weibin Zhang, Long Ma. arXiv 2020.