
Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Jack W. Rae; Sebastian Borgeaud; Trevor Cai; Katie Millican; Jordan Hoffmann; Francis Song; John Aslanides; Sarah Henderson; Roman Ring; Susannah Young; Eliza Rutherford; Tom Hennigan; Jacob Menick; Albin Cassirer; Richard Powell; George van den Driessche; Lisa Anne Hendricks; Maribeth Rauh; Po-Sen Huang; Amelia Glaese; Johannes Welbl; Sumanth Dathathri; Saffron Huang; Jonathan Uesato; John Mellor; Irina Higgins; Antonia Creswell; Nat McAleese; Amy Wu; Erich Elsen; Siddhant Jayakumar; Elena Buchatskaya; David Budden; Esme Sutherland; Karen Simonyan; Michela Paganini; Laurent Sifre; Lena Martens; Xiang Lorraine Li; Adhiguna Kuncoro; Aida Nematzadeh; Elena Gribovskaya; Domenic Donato; Angeliki Lazaridou; Arthur Mensch; Jean-Baptiste Lespiau; Maria Tsimpoukelli; Nikolai Grigorev; Doug Fritz; Thibault Sottiaux; Mantas Pajarskas; Toby Pohlen; Zhitao Gong; Daniel Toyama; Cyprien de Masson d'Autume; Yujia Li; Tayfun Terzi; Vladimir Mikulik; Igor Babuschkin; Aidan Clark; Diego de Las Casas; Aurelia Guy; Chris Jones; James Bradbury; Matthew Johnson; Blake Hechtman; Laura Weidinger; Iason Gabriel; William Isaac; Ed Lockhart; Simon Osindero; Laura Rimell; Chris Dyer; Oriol Vinyals; Kareem Ayoub; Jeff Stanway; Lorrayne Bennett; Demis Hassabis; Koray Kavukcuoglu; Geoffrey Irving
Abstract

Language models take an important step toward intelligent communication systems by leveraging large repositories of written human knowledge to better predict and understand the world. In this paper, we analyze the performance of Transformer-based language models across a wide range of scales, from models with tens of millions of parameters up to a 280-billion-parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance on the majority of them. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but smaller for logical and mathematical reasoning. We provide a holistic analysis of the training dataset and the model's behavior, covering the relationship between model scale, bias, and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.
