Web-based tool for searching tables’ contents

Oliveira, Alexandre Daniel Moreira

Please use this identifier to cite this record: http://hdl.handle.net/10773/25876

Title:	Web-based tool for searching tables’ contents
Other titles:	Ferramenta web para pesquisa em conteúdos de tabelas
Author:	Oliveira, Alexandre Daniel Moreira
Advisor:	Matos, Sérgio Guilherme Aleixo de
Keywords:	Text Mining; Table Mining; Concept Recognition; Information Retrieval; Bioinformatics
Defense date:	2018
Abstract:	The number of biomedical articles is growing constantly, and researchers find it increasingly difficult to locate relevant information efficiently, compare results, and identify new hypotheses. Text mining techniques have been explored to build systems that provide easy and fast access to the scientific literature. However, most of these tools ignore tables entirely and process only the textual parts. This dissertation focuses on the analysis and indexing of tables extracted from scientific articles, since tables often contain useful information that is not available elsewhere in the publication. The main objective of the work is to create a flexible indexing structure that handles different table formats and recognizes biomedical concepts mentioned in the tables themselves, in their captions, and in the text that references them. A web-based tool was developed that lets users search and visualize annotated tables extracted from scientific articles. The solution uses open-source frameworks, namely Neji for concept recognition and Elasticsearch for text indexing.
URI:	http://hdl.handle.net/10773/25876
Appears in collections:	UA - Dissertações de mestrado; DETI - Dissertações de mestrado

Files in this record:

File	Description	Size	Format
Dissertação.pdf		3.09 MB	Adobe PDF
