Natural Language Processing Academic Digest [1.10]
cs.CL: 14 papers today
QA|VQA|Question Answering|Dialogue (2 papers)
【1】 RxWhyQA: a clinical question-answering dataset with the challenge of multi-answer questions
Link: https://arxiv.org/abs/2201.02517
Affiliations: Department of Artificial Intelligence & Informatics; Center for Clinical and Translational Science; Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Rochester, MN, United States
Note: 2 tables, 3 figures
Abstract: Objectives: Create a dataset for the development and evaluation of
clinical question-answering (QA) systems that can handle multi-answer
questions. Materials and Methods: We leveraged the annotated relations from
the 2018 National NLP Clinical Challenges (n2c2) corpus to generate a QA
dataset. The 1-to-0 and 1-to-N drug-reason relations formed the unanswerable
and multi-answer entries, which represent challenging scenarios lacking in
existing clinical QA datasets. Results: The resulting RxWhyQA dataset contains
91,440 QA entries, of which half are unanswerable, and 21% (n=19,269) of the
answerable ones require multiple answers. The dataset conforms to the
community-vetted Stanford Question Answering Dataset (SQuAD) format.
Discussion: RxWhyQA is useful for comparing systems that must handle the
zero- and multi-answer challenges, demanding dual mitigation of both false
positive and false negative answers. Conclusion: We created and shared a
clinical QA dataset focused on multi-answer questions to represent real-world
scenarios.
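The 1-to-0 and 1-to-N cases map naturally onto the SQuAD 2.0 schema the dataset adopts. A minimal synthetic sketch (field names follow the public SQuAD 2.0 format; the clinical sentences and IDs below are invented for illustration):

```python
# Synthetic SQuAD-2.0-style entries illustrating a multi-answer (1-to-N)
# and an unanswerable (1-to-0) drug-reason question.
# The clinical text is invented, not taken from the n2c2 corpus.

context = ("Patient was started on metoprolol for hypertension "
           "and for rate control of atrial fibrillation.")

entries = [
    {
        "id": "demo-multi",
        "question": "Why was the patient given metoprolol?",
        "is_impossible": False,
        # 1-to-N relation: two gold answer spans for one question
        "answers": [
            {"text": "hypertension",
             "answer_start": context.find("hypertension")},
            {"text": "rate control of atrial fibrillation",
             "answer_start": context.find("rate control")},
        ],
    },
    {
        "id": "demo-unanswerable",
        "question": "Why was the patient given warfarin?",
        # 1-to-0 relation: the context never states a reason
        "is_impossible": True,
        "answers": [],
    },
]

def answer_texts(entry):
    """Return the gold answer strings (empty if unanswerable)."""
    return [a["text"] for a in entry["answers"]]
```

A scorer for this format has to penalize both missed answers (false negatives on the multi-answer entries) and spurious answers (false positives on the unanswerable half), which is the dual mitigation the abstract refers to.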
【2】 Sign Language Video Retrieval with Free-Form Textual Queries
Link: https://arxiv.org/abs/2201.02495
Affiliations: Universitat Politècnica de Catalunya, Spain; Barcelona Supercomputing Center, Spain; Machine Intelligence Laboratory, University of Cambridge, UK; Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Spain
Abstract: Systems that can efficiently search collections of sign language videos have
been highlighted as a useful application of sign language technology. However,
the problem of searching videos beyond individual keywords has received limited
attention in the literature. To address this gap, in this work we introduce the
task of sign language retrieval with free-form textual queries: given a written
query (e.g., a sentence) and a large collection of sign language videos, the
objective is to find the signing video in the collection that best matches the
written query. We propose to tackle this task by learning cross-modal
embeddings on the recently introduced large-scale How2Sign dataset of American
Sign Language (ASL). We identify that a key bottleneck in the performance of
the system is the quality of the sign video embedding which suffers from a
scarcity of labeled training data. We, therefore, propose SPOT-ALIGN, a
framework for interleaving iterative rounds of sign spotting and feature
alignment to expand the scope and scale of available training data. We validate
the effectiveness of SPOT-ALIGN for learning a robust sign video embedding
through improvements in both sign recognition and the proposed video retrieval
task.
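Once queries and videos are embedded in a shared space, retrieval reduces to nearest-neighbour search over the collection. A minimal sketch with invented toy vectors (not How2Sign features or the paper's actual model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return num / den

def retrieve(query_emb, video_embs):
    """Return the index of the video whose embedding best
    matches the text query embedding (highest cosine score)."""
    scores = [cosine(query_emb, v) for v in video_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy shared-space embeddings: video 1 points closest to the query.
query = [0.9, 0.1, 0.0]
videos = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0], [0.0, 0.0, 1.0]]
```

The hard part, which SPOT-ALIGN addresses, is learning video embeddings good enough that this simple search works; the ranking step itself stays this cheap.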
Machine Translation (1 paper)
【1】 Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
Link: https://arxiv.org/abs/2201.02229
Affiliations: School of Computing Technologies, RMIT University
Abstract: Protein-protein interactions (PPIs) are critical to normal cellular function
and are related to many disease pathways. However, only 4% of PPIs are
annotated with PTMs in biological knowledge databases such as IntAct, mainly
performed through manual curation, which is neither time nor cost-effective. We
use the IntAct PPI database to create a distant supervised dataset annotated
with interacting protein pairs, their corresponding PTM type, and associated
abstracts from the PubMed database. We train an ensemble of BioBERT models,
dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the
ensemble average-confidence approach with confidence variation to counteract
the effects of class imbalance and extract high-confidence predictions. The
PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro
of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low
variation to identify high quality predictions, tuning the predictions for
precision, we retained 19% of the test predictions with 100% precision. We
evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts, extracted 1.6
million PTM-PPI predictions (546,507 unique PTM-PPI triplets), and filtered
~5,700 (4,584 unique) high-confidence predictions. Of these, human evaluation on a
small randomly sampled subset shows that the precision drops to 33.7% despite
confidence calibration and highlights the challenges of generalisability beyond
the test set even with confidence calibration. We circumvent the problem by
only including predictions associated with multiple papers, improving the
precision to 58.8%. In this work, we highlight the benefits and challenges of
deep learning-based text mining in practice, and the need for increased
emphasis on confidence calibration to facilitate human curation efforts.
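The filtering step, combining high ensemble-average confidence with low variation across members, can be sketched as follows; the thresholds and probabilities are illustrative, not the paper's:

```python
import statistics

def high_quality(prob_per_model, mean_thresh=0.95, std_thresh=0.02):
    """Keep a prediction only if the ensemble members agree:
    high average confidence AND low spread across models."""
    m = statistics.mean(prob_per_model)
    s = statistics.pstdev(prob_per_model)
    return m >= mean_thresh and s <= std_thresh

# Per-model probabilities for one candidate PTM-PPI triplet (made up):
confident = [0.97, 0.96, 0.98, 0.97, 0.96]  # agreeing ensemble -> keep
unstable  = [0.99, 0.60, 0.95, 0.70, 0.99]  # disagreeing -> discard
```

Filtering on both statistics, rather than mean confidence alone, is what lets the approach trade recall for precision under class imbalance, as the abstract describes.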
Semantic Analysis (1 paper)
【1】 Semantic-based Data Augmentation for Math Word Problems
Link: https://arxiv.org/abs/2201.02489
Affiliations: Fudan University
Abstract: It is hard for neural math word problem (MWP) solvers to deal with
tiny local variances. In the MWP
task, some local changes preserve the original semantics while others may
totally change the underlying logic. Currently, existing datasets for MWP task
contain limited samples which are key for neural models to learn to
disambiguate different kinds of local variances in questions and solve the
questions correctly. In this paper, we propose a set of novel data augmentation
approaches to supplement existing datasets with such data that are augmented
with different kinds of local variances, and help to improve the generalization
ability of current neural models. New samples are generated by knowledge guided
entity replacement, and logic guided problem reorganization. The augmentation
approaches are ensured to keep the consistency between the new data and their
labels. Experimental results have shown the necessity and the effectiveness of
our methods.
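Knowledge-guided entity replacement can be pictured as a label-preserving substitution: entities are swapped for same-type alternatives while the underlying equation is untouched. A toy sketch (the problem text and substitution table are invented):

```python
def augment(problem, answer_expr, substitutions):
    """Knowledge-guided entity replacement sketch: swap entities for
    same-type alternatives. The underlying equation (the label) is
    unchanged by construction, keeping data and label consistent."""
    new = problem
    for old, rep in substitutions.items():
        new = new.replace(old, rep)
    return new, answer_expr

problem = "Tom has 3 apples and buys 2 more apples."
augmented, label = augment(problem, "3 + 2",
                           {"Tom": "Mary", "apples": "pears"})
```

A real implementation would draw the substitutes from a knowledge base of same-type entities; the point of the sketch is that the equation label travels with the augmented sample unchanged.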
Graph|Knowledge Graph|Knowledge (1 paper)
【1】 Predicting Patient Readmission Risk from Medical Text via Knowledge Graph Enhanced Multiview Graph Convolution
Link: https://arxiv.org/abs/2201.02510
Affiliations: University of Oregon, Eugene, OR, USA; Baidu Research
Note: SIGIR 2021
Abstract: Unplanned intensive care unit (ICU) readmission rate is an important metric
for evaluating the quality of hospital care. Efficient and accurate prediction
of ICU readmission risk can not only help prevent patients from inappropriate
discharge and potential dangers, but also reduce associated costs of
healthcare. In this paper, we propose a new method that uses medical text of
Electronic Health Records (EHRs) for prediction, which provides an alternative
perspective to previous studies that heavily depend on numerical and
time-series features of patients. More specifically, we extract discharge
summaries of patients from their EHRs, and represent them with multiview graphs
enhanced by an external knowledge graph. Graph convolutional networks are then
used for representation learning. Experimental results prove the effectiveness
of our method, yielding state-of-the-art performance for this task.
Summarization|Information Extraction (2 papers)
【1】 Video Summarization Based on Video-text Representation
Link: https://arxiv.org/abs/2201.02494
Affiliations: University of Melbourne; Tsinghua University
Abstract: Modern video summarization methods are based on deep neural networks which
require a large amount of annotated data for training. However, existing
datasets for video summarization are small-scale, easily leading to
over-fitting of the deep models. Considering that the annotation of large-scale
datasets is time-consuming, we propose a multimodal self-supervised learning
framework to obtain semantic representations of videos, which benefits the
video summarization task. Specifically, we explore the semantic consistency
between the visual information and text information of videos, for the
self-supervised pretraining of a multimodal encoder on a newly-collected
dataset of video-text pairs. Additionally, we introduce a progressive video
summarization method, where the important content in a video is pinpointed
progressively to generate better summaries. Finally, an objective evaluation
framework is proposed to measure the quality of video summaries based on video
classification. Extensive experiments demonstrate the effectiveness and
superiority of our method over the state of the art in rank correlation
coefficients, F-score, and the proposed objective evaluation.
【2】 An Unsupervised Masking Objective for Abstractive Multi-Document News Summarization
Link: https://arxiv.org/abs/2201.02321
Affiliations: Computer Science and Engineering, University of California, San Diego
Abstract: We show that a simple unsupervised masking objective can approach
supervised performance on abstractive multi-document news summarization. Our
method trains a state-of-the-art neural summarization model to predict the
masked out source document with highest lexical centrality relative to the
multi-document group. In experiments on the Multi-News dataset, our masked
training objective yields a system that outperforms past unsupervised methods
and, in human evaluation, surpasses the best supervised method without
requiring access to any ground-truth summaries. Further, we evaluate how
different measures of lexical centrality, inspired by past work on extractive
summarization, affect final performance.
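One illustrative stand-in for lexical centrality is average word overlap with the rest of the group; the masking objective would then hold out the most central document as the prediction target. A toy sketch (not the paper's exact measure):

```python
def overlap(a, b):
    """Jaccard word overlap between two documents."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def most_central(docs):
    """Index of the document with the highest total overlap with the
    others -- the one a masking objective would mask out and predict."""
    def centrality(i):
        return sum(overlap(docs[i], d)
                   for j, d in enumerate(docs) if j != i)
    return max(range(len(docs)), key=centrality)

# Toy multi-document group: the first two articles share the story.
group = [
    "storm hits coast thousands evacuated",
    "storm hits coast and thousands evacuated quickly",
    "storm damage on coast",
]
```

The abstract's final experiment corresponds to swapping `overlap` for other centrality measures from extractive summarization and observing how the trained system changes.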
GAN|Adversarial|Attacks|Generation (2 papers)
【1】 Repairing Adversarial Texts through Perturbation
Link: https://arxiv.org/abs/2201.02504
Affiliations: Singapore University of Technology and Design
Abstract: It is known that neural networks are subject to attacks through adversarial
perturbations, i.e., inputs which are maliciously crafted through perturbations
to induce wrong predictions. Furthermore, such attacks are impossible to
eliminate, i.e., the adversarial perturbation is still possible after applying
mitigation methods such as adversarial training. Multiple approaches have been
developed to detect and reject such adversarial inputs, mostly in the image
domain. Rejecting suspicious inputs however may not be always feasible or
ideal. First, normal inputs may be rejected due to false alarms generated by
the detection algorithm. Second, denial-of-service attacks may be conducted by
feeding such systems with adversarial inputs. To address the gap, in this work,
we propose an approach to automatically repair adversarial texts at runtime.
Given a text which is suspected to be adversarial, we apply multiple
adversarial perturbation methods in a positive way to identify a repair, i.e.,
a slightly mutated but semantically equivalent text that the neural network
correctly classifies. Our approach has been experimented with multiple models
trained for natural language processing tasks and the results show that our
approach is effective, i.e., it successfully repairs about 80% of the
adversarial texts. Furthermore, depending on the applied perturbation method,
an adversarial text could be repaired in as short as one second on average.
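The repair loop, perturbing a suspicious input until the classifier's prediction matches the expected label, can be sketched with a toy keyword classifier and a hypothetical synonym table (the paper's perturbation methods and verification are more elaborate):

```python
# Toy sentiment "classifier": looks for invented positive keywords.
POSITIVE = {"good", "great", "enjoyable"}

def classify(text):
    words = set(text.lower().split())
    return "pos" if words & POSITIVE else "neg"

# Toy perturbation: single-word synonym swaps (hypothetical table).
SYNONYMS = {"gud": "good", "gr8": "great"}

def repair(text, expected_label):
    """Try synonym perturbations until the classifier agrees,
    returning a slightly mutated but semantically equivalent text."""
    words = text.split()
    for i, w in enumerate(words):
        if w.lower() in SYNONYMS:
            candidate = " ".join(
                words[:i] + [SYNONYMS[w.lower()]] + words[i + 1:])
            if classify(candidate) == expected_label:
                return candidate
    return None  # no repair found

adversarial = "the movie was gud"  # obfuscation fools the keyword model
```

The key inversion is that perturbation, normally the attacker's tool, is applied "in a positive way": the defender searches the neighbourhood of the suspicious input for a variant the model handles correctly.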
【2】 A Transfer Learning Pipeline for Educational Resource Discovery with Application in Leading Paragraph Generation
Link: https://arxiv.org/abs/2201.02312
Affiliations: Yale University, USA; University of Waterloo, Canada; Tata Consultancy Services Limited, India
Abstract: Effective human learning depends on a wide selection of educational materials
that align with the learner's current understanding of the topic. While the
Internet has revolutionized human learning or education, a substantial resource
accessibility barrier still exists. Namely, the excess of online information
can make it challenging to navigate and discover high-quality learning
materials. In this paper, we propose the educational resource discovery (ERD)
pipeline that automates web resource discovery for novel domains. The pipeline
consists of three main steps: data collection, feature extraction, and resource
classification. We start with a known source domain and conduct resource
discovery on two unseen target domains via transfer learning. We first collect
frequent queries from a set of seed documents and search on the web to obtain
candidate resources, such as lecture slides and introductory blog posts. Then
we introduce a novel pretrained information retrieval deep neural network
model, query-document masked language modeling (QD-MLM), to extract deep
features of these candidate resources. We apply a tree-based classifier to
decide whether the candidate is a positive learning resource. The pipeline
achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel
target domains. Finally, we demonstrate how this pipeline can benefit an
application: leading paragraph generation for surveys. This is the first study
that considers various web resources for survey generation, to the best of our
knowledge. We also release a corpus of 39,728 manually labeled web resources
and 659 queries from NLP, Computer Vision (CV), and Statistics (STATS).
Recognition/Classification (1 paper)
【1】 Automatic Speech Recognition Datasets in Cantonese Language: A Survey and a New Dataset
Link: https://arxiv.org/abs/2201.02419
Affiliations: The Hong Kong University of Science and Technology
Abstract: Automatic speech recognition (ASR) on low resource languages improves access
of linguistic minorities to technological advantages provided by Artificial
Intelligence (AI). In this paper, we address a problem of data scarcity of Hong
Kong Cantonese language by creating a new Cantonese dataset. Our dataset,
Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read
speech paired with transcripts, collected from Cantonese audiobooks from Hong
Kong. It combines philosophy, politics, education, culture, lifestyle and
family domains, covering a wide range of topics. We also review all existing
Cantonese datasets and perform experiments on the two biggest datasets (MDCC
and Common Voice zh-HK). We analyze the existing datasets according to their
speech type, data source, total size and availability. The results of
experiments conducted with Fairseq S2T Transformer, a state-of-the-art ASR
model, show the effectiveness of our dataset. In addition, we create a powerful
and robust Cantonese ASR model by applying multi-dataset learning on MDCC and
Common Voice zh-HK.
Word2Vec|Text|Words (2 papers)
【1】 Code-Switching Text Augmentation for Multilingual Speech Processing
Link: https://arxiv.org/abs/2201.02550
Affiliations: KANARI AI, California, USA; Center for Language and Speech Processing, Johns Hopkins University, Baltimore, USA; Qatar Computing Research Institute, Qatar
Abstract: The pervasiveness of intra-utterance code-switching (CS) in spoken content
has forced ASR systems to handle mixed input. Yet, designing a CS-ASR system has
many challenges, mainly due to the data scarcity, grammatical structure
complexity, and mismatch along with unbalanced language usage distribution.
Recent ASR studies showed the predominance of E2E-ASR using multilingual data
to handle CS phenomena with little CS data. However, the dependency on the CS
data still remains. In this work, we propose a methodology to augment the
monolingual data for artificially generating spoken CS text to improve
different speech modules. We based our approach on Equivalence Constraint
theory while exploiting aligned translation pairs, to generate grammatically
valid CS content. Our empirical results show a relative gain of 29-34% in
perplexity and around 2% in WER for two ecological and noisy CS test sets.
Finally, the human evaluation suggests that 83.8% of the generated data is
acceptable to humans.
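The generation step can be pictured as swapping an aligned source span for its target-language counterpart. A toy sketch (sentences and alignments invented; real Equivalence Constraint generation also validates grammaticality at the switch points):

```python
def code_switch(src_tokens, alignment, tgt_phrases, switch_index):
    """Generate a code-switched sentence from a monolingual one by
    replacing the aligned span at switch_index with its translation."""
    start, end = alignment[switch_index]
    return src_tokens[:start] + tgt_phrases[switch_index] + src_tokens[end:]

# Toy aligned translation pair (English source, Spanish target spans):
src = ["I", "will", "go", "to", "the", "market", "tomorrow"]
alignment = [(4, 6), (6, 7)]          # spans of src covered by each phrase
tgt = [["el", "mercado"], ["manana"]]  # aligned target-language phrases
mixed = code_switch(src, alignment, tgt, 0)
```

Each monolingual sentence with N aligned spans yields multiple synthetic CS variants, which is how the method augments scarce CS training text from abundant monolingual data.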
【2】 Applying Word Embeddings to Measure Valence in Information Operations Targeting Journalists in Brazil
Link: https://arxiv.org/abs/2201.02257
Affiliations: Department of Engineering Management and Systems Engineering, The George Washington University, Washington, DC, USA
Abstract: Among the goals of information operations are to change the overall
information environment vis-à-vis specific actors. For example, "trolling
campaigns" seek to undermine the credibility of specific public figures,
leading others to distrust them and intimidating these figures into silence. To
accomplish these aims, information operations frequently make use of "trolls"
-- malicious online actors who target verbal abuse at these figures. In Brazil,
in particular, allies of Brazil's current president have been accused of
operating a "hate cabinet" -- a trolling operation that targets journalists who
have alleged corruption by this politician and other members of his regime.
Leading approaches to detecting harmful speech, such as Google's Perspective
API, seek to identify specific messages with harmful content. While this
approach is helpful in identifying content to downrank, flag, or remove, it is
known to be brittle, and may miss attempts to introduce more subtle biases into
the discourse. Here, we aim to develop a measure that might be used to assess
how targeted information operations seek to change the overall valence, or
appraisal, of specific actors. Preliminary results suggest known campaigns
target female journalists more so than male journalists, and that these
campaigns may leave detectable traces in overall Twitter discourse.
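One simple way to operationalize such a valence measure is to score an actor term by its embedding similarity to a positive lexicon minus its similarity to a negative one. A toy sketch with invented 2-d embeddings and lexicons (real word embeddings are high-dimensional and corpus-trained):

```python
import math

# Toy 2-d word embeddings (invented for illustration).
EMB = {
    "praise":  (0.9, 0.1),
    "honest":  (0.8, 0.2),
    "corrupt": (0.1, 0.9),
    "liar":    (0.2, 0.8),
    "journalist_a": (0.75, 0.25),
    "journalist_b": (0.15, 0.85),
}
POS_LEX = ["praise", "honest"]
NEG_LEX = ["corrupt", "liar"]

def cos(u, v):
    num = u[0] * v[0] + u[1] * v[1]
    return num / (math.hypot(*u) * math.hypot(*v))

def valence(word):
    """Mean similarity to the positive lexicon minus mean similarity
    to the negative lexicon, in the style of lexicon-projection methods."""
    p = sum(cos(EMB[word], EMB[w]) for w in POS_LEX) / len(POS_LEX)
    n = sum(cos(EMB[word], EMB[w]) for w in NEG_LEX) / len(NEG_LEX)
    return p - n
```

Tracking this score for actor terms over time across the Twitter corpus would surface the shift in overall appraisal that a trolling campaign tries to induce, even when no individual message trips a toxicity classifier.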
Other Neural Networks|Deep Learning|Models|Modeling (1 paper)
【1】 Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping
Link: https://arxiv.org/abs/2201.02280
Affiliations: University of Birmingham, United Kingdom; Zhejiang University, China; University of Victoria, Canada
Abstract: We propose a novel optimization framework that crops a given image based on
user description and aesthetics. Unlike existing image cropping methods, where
one typically trains a deep network to regress to crop parameters or cropping
actions, we propose to directly optimize for the cropping parameters by
repurposing pre-trained networks on image captioning and aesthetic tasks,
without any fine-tuning, thereby avoiding training a separate network.
Specifically, we search for the best crop parameters that minimize a combined
loss of the initial objectives of these networks. To make the optimization
tractable, we propose three strategies: (i) multi-scale bilinear sampling, (ii)
annealing the scale of the crop region, thereby effectively reducing the
parameter space, (iii) aggregation of multiple optimization results. Through
various quantitative and qualitative evaluations, we show that our framework
can produce crops that are well-aligned to intended user descriptions and
aesthetically pleasing.
Other (1 paper)
【1】 The Defeat of the Winograd Schema Challenge
Link: https://arxiv.org/abs/2201.02387
Affiliations: Department of Computer Science, University of Oxford, UK; Department of Computer Science, New York University, USA; Alan Turing Institute, London, UK; Robust AI, Palo Alto, CA, United States
Abstract: The Winograd Schema Challenge -- a set of twin sentences involving pronoun
reference disambiguation that seem to require the use of commonsense knowledge
-- was proposed by Hector Levesque in 2011. By 2019, a number of AI systems,
based on large pre-trained transformer-based language models and fine-tuned on
these kinds of problems, achieved better than 90% accuracy. In this paper, we
review the history of the Winograd Schema Challenge and assess its
significance.