About me

I am a PhD student in Computational Linguistics and hold an MS in Computer Science & Engineering from The Ohio State University, advised by Dr. Micha Elsner and Dr. Andrew Perrault. My research focuses on speech synthesis, reinforcement learning, and generative AI. I have hands-on experience with stable diffusion models, scalable diffusion models with transformers, and GANs.

Updates

  • 05/2024: I am working as an Applied Scientist Intern at Amazon Prime Video, developing an end-to-end speech-to-speech synthesis model that can generate audio in a specified emotion and speaking style with natural speech quality.
  • 07/2023: Our paper received an ACL 2023 Area Chair Award (Linguistic Theories, Cognitive Modeling, and Psycholinguistics).
  • 05/2023: Our paper on exploring how GANs learn phonological representations was accepted to ACL 2023 main conference.
  • 03/2022: I was awarded the Summer Graduate Research Award from the Center for Cognitive and Brain Sciences at OSU! I'm super grateful to my advisor, Dr. Micha Elsner!

Research Experiences

Controllable Text to Speech Model (2024 - present)

  • Develop a lightweight text-to-speech model that generates high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).
  • Implement an LLM as a reinforcement learning agent to control the text-to-speech model (a minimal sketch follows below).
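
A minimal sketch of the controller idea, assuming the LLM agent selects discrete style tokens that condition a frozen TTS model and is updated with REINFORCE; the toy policy network, the `synthesize_and_score` reward stub, and all shapes here are hypothetical placeholders rather than the actual system:

```python
import torch
import torch.nn as nn

# Hypothetical setup: a policy (standing in for the LLM agent) picks a discrete
# style token (e.g., a pitch/speed/emotion bin) that conditions a frozen TTS model.
NUM_STYLE_TOKENS = 16

class StylePolicy(nn.Module):
    """Toy stand-in for an LLM policy mapping a prompt embedding to style-token logits."""
    def __init__(self, prompt_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(prompt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_STYLE_TOKENS),
        )

    def forward(self, prompt_emb):
        return torch.distributions.Categorical(logits=self.net(prompt_emb))

def synthesize_and_score(style_token):
    """Placeholder for: run the TTS model with this style token, then score the
    resulting audio with a style/quality critic. Returns a scalar reward."""
    return torch.rand(())  # random reward, for illustration only

policy = StylePolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

# One REINFORCE step: sample a style token, get a reward, push up its log-probability.
prompt_emb = torch.randn(1, 128)          # stand-in for an encoded style request
dist = policy(prompt_emb)
action = dist.sample()
reward = synthesize_and_score(action)
loss = -(dist.log_prob(action) * reward).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```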

EMOCLONE: Speech Emotion Cloning (2024)

  • Developed a variational autoencoder-based end-to-end speech-to-speech model using adversarial learning techniques (a minimal sketch of the VAE backbone follows below).
  • Applied reinforcement learning methods to fine-tune the VAE model for controlling emotion expression, speaker voice, and language settings. [Demo page]
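
A minimal sketch of a VAE backbone of this kind, assuming the model operates on fixed-size mel-spectrogram segments; the `SpeechVAE` name, layer sizes, and loss weighting are illustrative only, and the adversarial and RL components are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechVAE(nn.Module):
    """Toy VAE over flattened mel-spectrogram segments (illustrative sizes only)."""
    def __init__(self, input_dim=80 * 64, latent_dim=64, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, input_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    recon = F.mse_loss(x_hat, x, reduction="mean")            # reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence term
    return recon + beta * kl

model = SpeechVAE()
x = torch.randn(8, 80 * 64)               # batch of flattened mel segments
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
loss.backward()
```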

Optimizing Speech Model Using Advanced RL techniques (2023-2024)

  • Investigating and developing advanced reinforcement learning techniques to enhance diffusion-based speech synthesis models.

Neural discovery of abstract inflectional structure (2023-2024)

  • Develop an RL approach to the tradeoff between memory and prediction in morphological production.
  • NSF-BCS-2217554; Principal Investigators: Dr. Micha Elsner and Dr. Andrea Sims

Explore How GANs Learn Phonological Representations (2021-2022)

  • Train two convolutional neural network (CNN) models on English and French words in an unsupervised manner.
  • Visualize and interpret the intermediate layers of the CNNs to explore what linguistic representations they learn from raw speech data and how acoustic features of nasalization are encoded in different layers (see the sketch after this list).
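
As an illustration of the layer-inspection step, the sketch below uses PyTorch forward hooks to capture intermediate activations for later visualization; the small 1-D CNN here is a generic stand-in for the actual GAN components, and all layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Generic 1-D CNN standing in for the actual model; layer sizes are illustrative.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=25, stride=4), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=25, stride=4), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=25, stride=4), nn.ReLU(),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # store this layer's output for later analysis
    return hook

# Register a hook on every convolutional layer.
for idx, layer in enumerate(model):
    if isinstance(layer, nn.Conv1d):
        layer.register_forward_hook(save_activation(f"conv{idx}"))

waveform = torch.randn(1, 1, 16000)  # one second of fake 16 kHz audio
_ = model(waveform)

for name, act in activations.items():
    print(name, tuple(act.shape))    # inspect or plot these per-layer feature maps
```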

Presentation

Jingyi Chen, Micha Elsner. Exploring How Generative Adversarial Networks Learn Phonological Representations. ACL 2023 main conference. [Code] [Paper]

Contact

Contact Email: [LAST_NAME].9220@osu.edu

Don’t hesitate to reach out if you’re curious about my research, eager to dive into exciting topics, or keen on brewing up some potential collaborations – I’m just an email away!