Please use this identifier to cite or link to this item:
http://hdl.handle.net/10773/9275
Title: | Dynamic language modeling for European Portuguese |
Author: | Martins, Ciro; Teixeira, António; Neto, João |
Keywords: | Vocabulary selection; Language modeling; Information retrieval techniques; Automatic speech recognition (ASR); Broadcast news transcription |
Issue Date: | Oct-2010 |
Publisher: | Elsevier |
Abstract: | This paper reports on work on the daily adaptation of the vocabulary and language model of a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese, such as its high degree of inflection and its complex verbal system. A multi-pass speech recognition framework is proposed that uses contemporary written texts available daily on the Web. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus to select an optimal vocabulary each day. Using an information retrieval engine with the ASR hypotheses as queries, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. Applied to a daily, live closed-captioning system for TV broadcasts, the framework proved effective, yielding relative reductions of 69% in out-of-vocabulary (OOV) word rate and 12.0% in word error rate (WER) compared with the baseline system using the same vocabulary size. (C) 2010 Elsevier Ltd. All rights reserved. |
Peer review: | yes |
URI: | http://hdl.handle.net/10773/9275 |
DOI: | 10.1016/j.csl.2010.02.003 |
ISSN: | 0885-2308 |
Appears in Collections: | ESTGA - Artigos (Articles) |
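The abstract describes using ASR hypotheses as queries to an information retrieval engine that pulls relevant texts for a story-based language model, and it reports results as relative reductions. The record does not name the retrieval engine or its scoring function, so the sketch below is only an illustration under an assumed stand-in: plain TF-IDF weighting with cosine similarity, using the Python standard library. The function names (`retrieve`, `relative_reduction`) and the toy Portuguese documents are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on Unicode word characters (keeps accented letters).
    return re.findall(r"\w+", text.lower())


def build_idf(docs_tokens):
    # Smoothed inverse document frequency over the daily document pool.
    n = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(tokens, idf):
    # Raw term frequency times IDF; query words unseen in the pool are dropped.
    tf = Counter(tokens)
    return {w: c * idf[w] for w, c in tf.items() if w in idf}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(hypothesis, documents, k=2):
    # Hypothetical stand-in for the IR step: rank documents by cosine
    # similarity to the ASR hypothesis and return up to k matching indices.
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    query = tfidf(tokenize(hypothesis), idf)
    scored = [(cosine(query, tfidf(t, idf)), i) for i, t in enumerate(docs_tokens)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:k] if score > 0]


def relative_reduction(baseline, adapted):
    # Relative reduction as a fraction: (baseline - adapted) / baseline.
    return (baseline - adapted) / baseline


if __name__ == "__main__":
    docs = [
        "o governo aprovou o orçamento do estado",
        "o benfica venceu o jogo de futebol",
        "o parlamento discutiu o orçamento e o governo",
    ]
    print(retrieve("governo orçamento parlamento", docs))  # → [2, 0]
    print(relative_reduction(2.0, 1.0))  # → 0.5
```

In a real system the retrieved texts would then feed language model estimation; that step, like the part-of-speech-based vocabulary selection, is not shown here because the record gives no detail on how it is done.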
Files in This Item:
File | Description | Size | Format
---|---|---|---
Computer-Speech-language-2010.pdf | Main document | 843.47 kB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.