
Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations

Aarne Talman; Antti Suni; Hande Celikkanat; Sofoklis Kakouros; Jörg Tiedemann; Martti Vainio

Abstract

In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge, this will be the largest publicly available dataset with prosodic labels. We describe the dataset construction and the resulting benchmark dataset in detail, and train a number of different models, ranging from feature-based classifiers to neural network systems, for the prediction of discretized prosodic prominence. We show that pre-trained contextualized word representations from BERT outperform the other models even with less than 10% of the training data. Finally, we discuss the dataset in light of the results and point to future research and plans for further improving both the dataset and methods of predicting prosodic prominence from text. The dataset and the code for the models are publicly available.
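
To make the phrase "discretized prosodic prominence" concrete, the following is a minimal sketch of how continuous per-word prominence scores could be binned into class labels that a classifier then predicts. It is an illustration only, not the authors' pipeline: the function name `discretize_prominence`, the example scores, and the threshold values are all hypothetical, and the abstract does not state how many classes the benchmark uses or where the cut points lie.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize_prominence(scores: Sequence[float],
                          thresholds: Sequence[float]) -> List[int]:
    """Map each continuous score to the index of the bin it falls into.

    With k thresholds there are k + 1 classes, numbered 0..k.
    A score equal to a threshold is placed in the higher class.
    """
    cuts = sorted(thresholds)
    return [bisect_right(cuts, score) for score in scores]


# Hypothetical example: made-up scores and made-up cut points.
words = ["the", "cat", "sat", "down"]
scores = [0.05, 1.40, 0.60, 2.30]
labels = discretize_prominence(scores, thresholds=[0.5, 2.0])

for word, label in zip(words, labels):
    print(f"{word}\t{label}")
```

Framed this way, predicting prominence from text becomes a word-level classification problem: each word in a sentence receives one discrete label, which is the kind of task the feature-based and neural models mentioned in the abstract are trained on.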