  • Read: 2023/9/27 9:38:19

    eess.AS (Audio Processing): 44 papers in total

    1. Towards General-Purpose Text-Instruction-Guided Voice Conversion

    Link: https://arxiv.org/abs/2309.14324

    Authors: Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee

    Comments: Accepted to ASRU 2023

    2. Speaker anonymization using neural audio codec language models

    Link: https://arxiv.org/abs/2309.14129

    Authors: Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

    Comments: Submitted to ICASSP 2024

    3. Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

    Link: https://arxiv.org/abs/2309.14109

    Authors: Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li

    Comments: Accepted by ASRU 2023

    4. Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech

    Link: https://arxiv.org/abs/2309.14107

    Authors: Farhad Javanmardi, Saska Tirronen, Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku

    5. BiSinger: Bilingual Singing Voice Synthesis

    Link: https://arxiv.org/abs/2309.14089

    Authors: Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

    Comments: Accepted by ASRU 2023

    6. Analysis and Detection of Pathological Voice using Glottal Source Features

    Link: https://arxiv.org/abs/2309.14080

    Authors: Sudarsana Reddy Kadiri, Paavo Alku

    7. Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units

    Link: https://arxiv.org/abs/2309.13994

    Authors: Jakob Poncelet, Hugo Van hamme

    Comments: Submitted to ICASSP 2024

    8. Connecting Speech Encoder and Large Language Model for ASR

    Link: https://arxiv.org/abs/2309.13963

    Authors: Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

    9. Evaluating Classification Systems Against Soft Labels with Fuzzy Precision and Recall

    Link: https://arxiv.org/abs/2309.13938

    Authors: Manu Harju, Annamaria Mesaros

    Comments: Published in DCASE 2023

    10. Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

    Link: https://arxiv.org/abs/2309.13916

    Authors: Di Liang, Nian Shao, Xiaofei Li

    11. AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

    Link: https://arxiv.org/abs/2309.13905

    Authors: Jianwei Yu, Hangting Chen, Yanyao Bian, Xiang Li, Yi Luo, Jinchuan Tian, Mengyang Liu, Jiayi Jiang, Shuai Wang

    12. Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

    Link: https://arxiv.org/abs/2309.13874

    Authors: Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

    Comments: Submitted to ICASSP 2024

    13. A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms

    Link: https://arxiv.org/abs/2309.13819

    Authors: Wei-Ting Lai, Lachlan Birnie, Thushara Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga Samarasinghe

    14. VoiceLDM: Text-to-Speech with Environmental Context

    Link: https://arxiv.org/abs/2309.13664

    Authors: Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung

    Comments: Demos and code are available at this https URL

    15. Cross-modal Alignment with Optimal Transport for CTC-based ASR

    Link: https://arxiv.org/abs/2309.13650

    Authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

    Comments: Accepted to IEEE ASRU 2023

    16. Efficient Black-Box Speaker Verification Model Adaptation with Reprogramming and Backend Learning

    Link: https://arxiv.org/abs/2309.13605

    Authors: Jingyu Li, Tan Lee

    17. Speech enhancement with frequency domain auto-regressive modeling

    Link: https://arxiv.org/abs/2309.13537

    Authors: Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

    Comments: 10 pages

    18. Attention Is All You Need For Blind Room Volume Estimation

    Link: https://arxiv.org/abs/2309.13504

    Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin

    Comments: 5 pages, 4 figures, submitted to ICASSP 2024

    19. Contrastive Speaker Embedding With Sequential Disentanglement

    Link: https://arxiv.org/abs/2309.13253

    Authors: Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien

    Comments: Submitted to ICASSP 2024

    20. Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

    Link: https://arxiv.org/abs/2309.13102

    Authors: Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan "Honza" Silovsky

    Comments: In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023

    21. An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition

    Link: https://arxiv.org/abs/2309.14158

    Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

    Comments: Submitted to ICASSP 2024

    22. Multi-Domain Adaptation by Self-Supervised Learning for Speaker Verification

    Link: https://arxiv.org/abs/2309.14149

    Authors: Wan Lin, Lantian Li, Dong Wang

    Comments: Submitted to ICASSP 2024

    23. On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

    Link: https://arxiv.org/abs/2309.14130

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Comments: Submitted to ICASSP 2024

    24. VoiceLens: Controllable Speaker Generation and Editing with Flow

    Link: https://arxiv.org/abs/2309.14094

    Authors: Yao Shi, Ming Li

    25. Audio classification with Dilated Convolution with Learnable Spacings

    Link: https://arxiv.org/abs/2309.13972

    Authors: Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini

    26. Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    Link: https://arxiv.org/abs/2309.13942

    Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

    Comments: Published at the CVPR 2023 Sight and Sound workshop

    27. Real-Time Emergency Vehicle Detection using Mel Spectrograms and Regular Expressions

    Link: https://arxiv.org/abs/2309.13920

    Authors: Alberto Pacheco-Gonzalez, Raymundo Torres, Raul Chacon, Isidro Robledo

    Comments: In Spanish

    28. HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Link: https://arxiv.org/abs/2309.13907

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Comments: Accepted by ASRU 2023

    29. Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

    Link: https://arxiv.org/abs/2309.13876

    Authors: Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

    Comments: Accepted at ASRU 2023

    30. Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

    Link: https://arxiv.org/abs/2309.13860

    Authors: Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

    31. The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Link: https://arxiv.org/abs/2309.13573

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Comments: 8 pages, Accepted by ASRU 2023

    32. Related Rhythms: Recommendation System To Discover Music You May Like

    Link: https://arxiv.org/abs/2309.13544

    Authors: Rahul Singh, Pranav Kanuparthi

    33. Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

    Link: https://arxiv.org/abs/2309.13509

    Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

    Comments: Submitted to ASRU 2023

    34. Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

    Link: https://arxiv.org/abs/2309.13476

    Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

    Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

    35. Asca: less audio data is more insightful

    Link: https://arxiv.org/abs/2309.13373

    Authors: Xiang Li, Junhao Chen, Chao Li, Hongwu Lv

    Comments: 6 pages, 3 figures

    36. My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech

    Link: https://arxiv.org/abs/2309.13347

    Authors: Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward

    37. Two vs. Four-Channel Sound Event Localization and Detection

    Link: https://arxiv.org/abs/2309.13343

    Authors: Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello

    38. Beyond Fairness: Age-Harmless Parkinson's Detection via Voice

    Link: https://arxiv.org/abs/2309.13292

    Authors: Yicheng Wang, Xiaotian Han, Leisheng Yu, Na Zou

    39. WikiMT++ Dataset Card

    Link: https://arxiv.org/abs/2309.13259

    Authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li

    40. Importance of negative sampling in weak label learning

    Link: https://arxiv.org/abs/2309.13227

    Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

    41. Invisible Watermarking for Audio Generation Diffusion Models

    Link: https://arxiv.org/abs/2309.13166

    Authors: Xirong Cao, Xiang Li, Divyesh Jadav, Yanzhao Wu, Zhehui Chen, Chen Zeng, Wenqi Wei

    Comments: This is an invited paper for IEEE TPS, part of the IEEE CIC/CogMI/TPS 2023 conference

    42. Towards Lexical Analysis of Dog Vocalizations via Online Videos

    Link: https://arxiv.org/abs/2309.13086

    Authors: Yufei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu

    43. Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners

    Link: https://arxiv.org/abs/2309.13085

    Authors: Jieyi Huang, Chunhao Zhang, Yufei Wang, Mengyue Wu, Kenny Zhu

    44. Applied design thinking in urban air mobility: creating the airtaxi cabin design of the future from a user perspective

    Link: https://arxiv.org/abs/2309.05353

    Authors: F. Reimer, J. Herzig, L. Winkler, J. Biedermann, F. Meller, B. Nagel

    Comments: 13 pages

    Source: the "arXiv每日学术速递" (arXiv Daily Academic Digest) WeChat official account
