Stenographers Are About to Become Obsolete: In the Future, AI Will Be Able to Transcribe Everything into Text

Time: 2017-07-12 08:25 Source: 668论坛 Author: 开奖直播现场

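Editor’s note: Artificial intelligence is unstoppable. Though still imperfect, it is very likely to replace typists in the future, freeing people from the tedium of typing and even from the constraints of their devices. What other effects will convenient, efficient, and cheap AI transcription have on society? This article is translated from “When AI Can Transcribe Everything” by Greg Noone, published in The Atlantic.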

What is the best way to describe Rupert Murdoch having a foam pie thrown at his face? This wasn’t much of a problem for the world’s press, who were content to run articles depicting the incident during the media mogul’s testimony at a 2011 parliamentary committee hearing as everything from high drama to low comedy. It was another matter for the hearing’s official transcriptionist. Typically, a transcriptionist’s job only involves typing out the words as they were actually said. After the pie attack, whether by choice or hemmed in by the conventions of house style, the transcriptionist decided to go the simplest route by marking it as an “[interruption].”


Across professional fields, a whole multitude of conversations (meetings, interviews, and conference calls) need to be transcribed and recorded for future reference. This can be a daily, onerous task, but for those willing to pay, the job can be outsourced to a professional transcription service. The service, in turn, will employ staff to transcribe audio files remotely or, as in my own couple of months in the profession, attend meetings to type out what is said in real time.


Despite the recent emergence of browser-based transcription aids, transcription is an area of drudgery in the modern Western economy where machines can’t quite squeeze human beings out of the equation. That is, until last year, when Microsoft built one that could.

Automatic speech recognition, or ASR, is an area that has gripped the firm’s chief speech scientist, Xuedong Huang, since he entered a doctoral program at Scotland’s Edinburgh University. “I’d just left China,” he says, remembering the difficulty he had in using his undergraduate knowledge of American English to parse the Scottish brogue of his lecturers. “I wished every lecturer and every professor, when they talked in the classroom, could have subtitles.”

In order to reach that kind of real-time service, Huang and his team would first have to create a program capable of retrospective transcription. Advances in artificial intelligence allowed them to employ a technique called deep learning, wherein a program is trained to recognize patterns from vast amounts of data. Huang and his colleagues used their software to transcribe the NIST 2000 CTS test set, a bundle of recorded conversations that has served as the benchmark for speech recognition work for more than 20 years. The error rates of professional transcriptionists in reproducing two different portions of the test are 5.9 and 11.3 percent. The system built by the team at Microsoft edged past both.
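The article does not say how these error rates are calculated. Speech recognition benchmarks commonly report word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is only an illustration under that assumption, not the scoring procedure Huang's team used; the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: plain lowercase/whitespace tokenization. Real benchmark
    scoring applies more careful text normalization than this sketch does.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "i wished every lecturer could have subtitles"
    hypothesis = "i wish every lecturer could have subtitles"
    # One substituted word out of seven reference words.
    print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Read this way, a 5.9 percent error rate would mean roughly six word-level mistakes for every hundred words spoken.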
