“It wasn’t a real-time system,” acknowledges Huang. “It was very much like we wanted to see, with all the horsepower we have, what is the limit. But the real-time system is not that far off.”

Indeed, the promise of ASR programs capable of accurately transcribing interviews or meetings as they happen no longer seems so outlandish. At Microsoft’s Build conference last month, the company’s vice-president, Harry Shum, demonstrated a PowerPoint transcription service that would allow the spoken words of the presentation to be tied to individual slides. The firm is also in a close race with the likes of Apple and Google to perfect the transcripts produced by its real-time mobile translation app.

Huang believes the point at which transcription software will overtake human capabilities is open to interpretation. “The definition of a perfect result would be controversial,” he says, citing the error rates among human transcriptionists. “How ‘perfect’ this is depends on the scenario and the application.”

An ASR system tasked with transcribing speech in real time is only deemed successful if every word is interpreted correctly, something that largely has been achieved with mobile assistants like Cortana and Siri, but has yet to be mastered in real-time translation apps. However, a growing number of computer scientists are realizing that standards do not need to be as high when it comes to the automatic transcription of recorded audio, where any mistakes in the text can be amended after the fact.

“We don’t claim ... this is perfect. But, with good audio, it can be close to perfect.”

Two companies (Trint, a start-up in London, and Baidu, the Chinese internet giant with an application called SwiftScribe) have begun to offer browser-based tools that can convert recordings of up to an hour into text with a word-error rate of 5 percent or less. On the page, their output looks very similar to the raw documents I typed out in real time during the many meetings I attended as a freelance transcriptionist: at best, a Joycean stream-of-consciousness marvel, and at worst, gobbledygook. But by turning the user from a scribe into an editor, both programs can shave hours off an onerous and distracting task.

The amount of time saved, of course, is contingent on the quality of the audio. Trint and SwiftScribe tend to make short work of face-to-face interviews with the bare minimum of ambient noise, but struggle to transcribe recordings of crowded rooms, telephone interviews with bad reception, or anyone who speaks with an accent that isn’t American or British English. My attempt to run a recording of a German-accented speaker through Trint, for example, saw the engine interpret “it was rather cold, but the atmosphere was great” as “That heart is also all barf. Yes. His first face.”