Title: Dynamic language modeling for European Portuguese
Author: Martins, Ciro
Teixeira, António
Neto, João
Keywords: Vocabulary selection
Language modeling
Information retrieval techniques
Automatic speech recognition (ASR)
Broadcast news transcription
Issue Date: Oct-2010
Publisher: Elsevier
Abstract: This paper reports on daily vocabulary and language model adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese, such as its high level of inflection and complex verbal system. A multi-pass speech recognition framework using contemporary written texts available daily on the Web is proposed. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus for daily selection of an optimal vocabulary. Using an information retrieval engine with the ASR hypotheses as query material, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. When applied to a daily, live closed-captioning system for TV broadcasts, the framework proved effective, yielding relative reductions in out-of-vocabulary word rate (69%) and word error rate (WER, 12.0%) compared with the baseline system at the same vocabulary size. © 2010 Elsevier Ltd. All rights reserved.
Peer review: yes
DOI: 10.1016/j.csl.2010.02.003
ISSN: 0885-2308
Appears in Collections: ESTGA - Artigos

Files in This Item:
File: Computer-Speech-language-2010.pdf
Description: Main document
Size: 843.47 kB
Format: Adobe PDF
Access: restricted
