Web-based tool for searching tables’ contents

Oliveira, Alexandre Daniel Moreira

Please use this identifier to cite or link to this item: http://hdl.handle.net/10773/25876

Title:	Web-based tool for searching tables’ contents
Other Titles:	Ferramenta web para pesquisa em conteúdos de tabelas
Author:	Oliveira, Alexandre Daniel Moreira
Advisor:	Matos, Sérgio Guilherme Aleixo de
Keywords:	Text Mining; Table Mining; Concept Recognition; Information Retrieval; Bioinformatics
Defense Date:	2018
Abstract:	The number of biomedical articles is growing constantly, and researchers find it increasingly difficult to locate relevant information, compare results, and identify new hypotheses efficiently. Text mining techniques have been explored to build systems that provide easy and fast access to the scientific literature. However, most of these tools ignore tables entirely and process only the running text. This dissertation focuses on the analysis and indexing of tables extracted from scientific articles, because tables often contain information that is useful to researchers and is not available elsewhere in the publication. The main objective of the work is to create a flexible indexing structure that handles different table formats and recognizes biomedical concepts mentioned in the tables themselves, in their captions, and in the text that references them. A web-based tool was developed that allows users to search and visualize annotated tables extracted from scientific articles. The solution relies on open-source frameworks, namely Neji for concept recognition and Elasticsearch for text indexing.
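To make the idea of a flexible indexing structure more concrete, the sketch below shows one way a table, its caption, and the sentences that reference it could be flattened into a single JSON document with per-field concept annotations. It is illustrative only and is not taken from the dissertation: the field names, the `build_table_document` helper, the tiny `LEXICON`, and the sample values are all assumptions, and the dictionary lookup merely stands in for a real concept recognizer such as Neji.

```python
import json
import re

# Hypothetical mini-lexicon; a stand-in for a real concept recognizer such as Neji.
LEXICON = {
    "brca1": "GENE:BRCA1",
    "breast cancer": "DISEASE:breast_cancer",
}


def annotate(text):
    """Return lexicon matches in text, ordered by character offset."""
    found = []
    for term, concept_id in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append({
                "text": match.group(0),
                "concept": concept_id,
                "start": match.start(),
            })
    return sorted(found, key=lambda a: a["start"])


def build_table_document(article_id, caption, rows, mentions):
    """Store one record per cell, so rows may have different lengths."""
    cells = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            cells.append({
                "row": r,
                "col": c,
                "text": value,
                "annotations": annotate(value),
            })
    return {
        "article_id": article_id,
        "caption": {"text": caption, "annotations": annotate(caption)},
        "cells": cells,
        "mentions": [{"text": m, "annotations": annotate(m)} for m in mentions],
    }


# Made-up sample data; the last row is deliberately shorter than the others.
doc = build_table_document(
    article_id="example-article",
    caption="Table 1. BRCA1 variants in breast cancer patients",
    rows=[["Gene", "Cases"], ["BRCA1", "12"], ["TP53"]],
    mentions=["Table 1 lists the BRCA1 variants we observed."],
)
print(json.dumps(doc, indent=2))
```

Because each cell carries its own row and column position, the same document shape can hold tables of any size or irregularity. If such documents were indexed in Elasticsearch, one possible choice would be to map `cells` as a `nested` field so that a query can match the text and annotations of the same cell.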
URI:	http://hdl.handle.net/10773/25876
Appears in Collections:	UA - Dissertações de mestrado; DETI - Dissertações de mestrado

Files in This Item:

File	Description	Size	Format
Dissertação.pdf		3.09 MB	Adobe PDF
