Insights on neural networks

Jesus, Ricardo Jorge Bastos Cordeiro de

Please use this identifier to cite or link to this item: http://hdl.handle.net/10773/29562

Title:	Insights on neural networks
Other Titles:	Observações em redes neuronais (Portuguese title)
Author:	Jesus, Ricardo Jorge Bastos Cordeiro de
Advisor:	Aguiar, Rui Luís Andrade; Dorogovtsev, Sergey
Keywords:	Artificial neural networks; Deep learning; Machine learning; Artificial intelligence; Initialization effects
Defense Date:	Dec-2019
Abstract:	The many advances that machine learning, and especially its workhorse, deep learning, has provided to our society are undeniable. However, there is a growing feeling that the field has become poorly understood, with some researchers going as far as comparing it to a form of alchemy. A deeper understanding of the tools being used is needed; otherwise, progress is made in the dark, frequently by trial and error. In this thesis, we experiment with feedforward neural networks, trying to deconstruct the phenomena we observe and to find their root causes. We start by experimenting with a synthetic dataset. Using this toy problem, we find that the weights of trained networks show correlations that can be explained by the structure of the data samples themselves. This insight may be useful in areas such as Explainable Artificial Intelligence, to explain why a model behaves the way it does. We also find that merely changing the activation function used in a layer may cause the nodes of the network to assume fundamentally different roles. This understanding may help to draw firm conclusions about the conditions under which Transfer Learning can be applied successfully. While experimenting with this problem, we also found that the initial configuration of a network's weights may, in some situations, ultimately determine the quality (i.e., loss/accuracy) of the minimum to which the network converges, to a greater extent than one might initially expect. This observation motivated the remainder of our experiments. We continued our tests with the real-world datasets MNIST and HASYv2. We devised an initialization strategy, which we call the Dense sliced initialization, that combines the merits of a sparse initialization with those of a typical (dense) random initialization.
Afterward, we found that the initial configuration of a network's weights “sticks” throughout training, suggesting that training does not involve substantial updates; instead, it is, to some extent, a fine-tuning process. We observed this by marking the layers of networks with letters of the alphabet and noting that those marks remain visible for hundreds of epochs. Moreover, our results suggest that the small scale of the deviations caused by training is a fingerprint (i.e., a necessary condition) of successful training: as long as training succeeds, the marks remain visible. Based on these observations and our intuition about the reasons behind them, we developed what we call the Filter initialization strategy. It improved the training of the networks tested but, at the same time, worsened their generalization. Understanding the root cause of these observations may prove valuable for devising new initialization methods that generalize better.
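The abstract does not say how the letters were applied to the weights or how the deviations caused by training were measured. The sketch below is purely illustrative and rests on two assumptions of ours: a letter is stamped onto a layer as an additive offset over a tiled, letter-shaped mask, and the "small scale" of training updates is quantified as the Frobenius-norm ratio ||W_after - W_init|| / ||W_init||. The names `LETTER_T`, `marked_init`, `relative_deviation` and `mark_contrast` are hypothetical and are not taken from the thesis; the random perturbation in the demo is a stand-in for a training update, not a result.

```python
import numpy as np

# Hypothetical 5x5 bitmap of the letter "T" (assumption: the thesis's letters,
# layer sizes and marking scheme are not described in the abstract).
LETTER_T = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)


def marked_init(shape, letter, scale=0.05, offset=0.2, seed=0):
    """Gaussian weights plus `offset` wherever the tiled letter mask is 1 (assumed scheme)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, scale, size=shape)
    # Ceiling division so the tiled letter covers the whole weight matrix.
    reps = (-(-shape[0] // letter.shape[0]), -(-shape[1] // letter.shape[1]))
    mask = np.tile(letter, reps)[: shape[0], : shape[1]]
    return w + offset * mask, mask


def relative_deviation(w_init, w_after):
    """Size of the total update relative to the initial weights (Frobenius norms)."""
    return np.linalg.norm(w_after - w_init) / np.linalg.norm(w_init)


def mark_contrast(w, mask):
    """Mean weight inside the letter mark minus mean weight outside it."""
    return w[mask == 1].mean() - w[mask == 0].mean()


w0, mask = marked_init((20, 20), LETTER_T)
# Stand-in for a fine-tuning-scale update (NOT real training): small Gaussian noise.
w_after = w0 + np.random.default_rng(1).normal(0.0, 0.01, size=w0.shape)
print(relative_deviation(w0, w_after), mark_contrast(w0, mask), mark_contrast(w_after, mask))
```

Under these assumptions, a mark stays visible when the update is small relative to the initial weights, because the mean offset inside the mask barely changes. In a real experiment, `w_after` would be a layer's weights after training, and the two quantities could be tracked across epochs.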
URI:	http://hdl.handle.net/10773/29562
Appears in Collections:	UA - Dissertações de mestrado; DETI - Dissertações de mestrado

Files in This Item:

File	Size	Format
Documento_Ricardo_Jesus.pdf	30.63 MB	Adobe PDF
