Styles of Truncated Backpropagation

In my post on Recurrent Neural Networks in Tensorflow, I observed that Tensorflow's approach to truncated backpropagation (feeding in truncated subsequences of length n) is qualitatively different from "backpropagating errors a maximum of n steps". In this post, I explore the differences and ask whether one approach is better than the other.
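To make the distinction concrete, here is a minimal sketch in plain Python (no Tensorflow involved). It assumes the truncated subsequences are non-overlapping, and the function names are hypothetical, chosen only for illustration. For each timestep, it counts how many steps back that timestep's error can propagate under each style.

```python
def chunked_steps_back(seq_len, n):
    """Subsequence style: the sequence is cut into non-overlapping chunks
    of length n (an assumption), and an error at timestep t can only
    flow back to the start of its own chunk."""
    return [t - (t // n) * n + 1 for t in range(seq_len)]


def sliding_steps_back(seq_len, n):
    """'Maximum of n steps' style: an error at timestep t flows back
    through up to n timesteps, limited only by the sequence start."""
    return [t - max(0, t - n + 1) + 1 for t in range(seq_len)]


print(chunked_steps_back(6, 3))  # [1, 2, 3, 1, 2, 3]
print(sliding_steps_back(6, 3))  # [1, 2, 3, 3, 3, 3]
```

Under the subsequence style, only the last timestep of each chunk has its error propagated the full n steps, while under the other style every timestep past the first n - 1 does.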

First Convergence Bias

In this post, I offer the results of an experiment providing support for "first convergence bias", which includes the proposition that training a randomly initialized network via backpropagation may never converge to a global minimum, regardless of the initialization and number of trials.

Inverting a Neural Net

In this experiment, I "invert" a simple two-layer MNIST model to visualize what the final hidden layer representations look like when projected back into the original sample space.

Representational Power of Deeper Layers

The hidden layers in a neural network can be seen as different representations of the input. Do deeper layers learn "better" representations? In a network trained to solve a classification problem, this would mean that deeper layers provide better features than earlier layers. The natural hypothesis is that this is indeed the case. In this post, I test this hypothesis on a network with three hidden layers trained to classify the MNIST dataset. The results show that deeper layers do in fact produce better representations of the input.