Use this identifier to cite or link to this record: http://hdl.handle.net/10773/9275
Title: Dynamic language modeling for European Portuguese
Authors: Martins, Ciro; Teixeira, António; Neto, João
Keywords: Vocabulary selection; Language modeling; Information retrieval techniques; Automatic speech recognition (ASR); Broadcast news transcription
Date: Oct-2010
Publisher: Elsevier
Abstract: This paper reports on the daily adaptation of the vocabulary and language model of a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese such as its high degree of inflection and its complex verbal system. A multi-pass speech recognition framework is proposed that uses contemporary written texts available daily on the Web. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus to select an optimal vocabulary each day. Using an information retrieval engine with the ASR hypotheses as queries, relevant documents are extracted from a large, dynamic dataset to build a story-based language model. When applied to a daily closed-captioning system for live TV broadcasts, the framework proved effective, yielding relative reductions of 69% in the out-of-vocabulary (OOV) word rate and 12.0% in the word error rate (WER) compared with the baseline system using the same vocabulary size. (C) 2010 Elsevier Ltd. All rights reserved.
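The abstract does not say which retrieval model the information retrieval engine uses. Purely as an illustration, the sketch below assumes a plain TF-IDF vector-space model with cosine similarity, treating an ASR hypothesis as the query and ranking a small set of daily texts. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `retrieve`), the whitespace tokenizer, the IDF formula, and the top-k cut-off are all hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    # A real system would normalize Portuguese text more carefully.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build TF-IDF weight dicts for tokenized docs; also return the IDF table."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: each term counted once per doc
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(hypothesis, corpus, k=2):
    """Return indices of up to k corpus documents most similar to an ASR hypothesis."""
    docs = [tokenize(d) for d in corpus]
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the corpus carry no weight and are dropped.
    query = {
        term: tf * idf[term]
        for term, tf in Counter(tokenize(hypothesis)).items()
        if term in idf
    }
    # Sort by descending score, breaking ties by document index.
    scored = sorted(
        ((cosine(query, v), i) for i, v in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    # Keep only documents that share at least one weighted term with the query.
    return [i for score, i in scored[:k] if score > 0]


if __name__ == "__main__":
    daily_texts = [
        "o governo aprovou o orçamento do estado",
        "o benfica venceu o jogo de futebol",
        "o parlamento discutiu o orçamento do governo",
    ]
    asr_hypothesis = "debate sobre o orçamento do governo"
    # Prints the indices of the two budget-related stories (0 and 2).
    print(retrieve(asr_hypothesis, daily_texts, k=2))
```

In a framework like the one described, the documents selected this way would presumably be used to train the story-based language model; a production IR engine would typically add stemming, stop-word handling, and an inverted index instead of scoring every document.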
Peer review: yes
URI: http://hdl.handle.net/10773/9275
DOI: 10.1016/j.csl.2010.02.003
ISSN: 0885-2308
Appears in collections: ESTGA - Articles

Files in this record:
File: Computer-Speech-language-2010.pdf
Description: Main document
Size: 843.47 kB
Format: Adobe PDF (restricted access)



All records in the repository are protected by copyright law, with all rights reserved.