Binary Stochastic Neurons in Tensorflow

In this post, I introduce and discuss binary stochastic neurons, implement trainable binary stochastic neurons in Tensorflow, and conduct several simple experiments on the MNIST dataset to get a feel for their behavior. Binary stochastic neurons offer two advantages over real-valued neurons: they can act as a regularizer, and they enable conditional computation by allowing a network to make yes/no decisions. Conditional computation opens the door to new and exciting neural network architectures, such as the choice of experts architecture and hierarchical multiscale neural networks.
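As a rough illustration (plain Python rather than Tensorflow, and not the post's own code), a binary stochastic neuron outputs 1 with probability sigmoid(a) and 0 otherwise. Because the sampling step has no useful gradient, training needs a gradient estimator. The sketch assumes a sigmoid-adjusted straight-through estimator, which is one common choice and not necessarily the one the post settles on. The function names are hypothetical.

```python
import math
import random


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def bsn_forward(a: float, rng: random.Random) -> int:
    """Fire (1) with probability sigmoid(a), otherwise stay silent (0)."""
    return 1 if rng.random() < sigmoid(a) else 0


def bsn_straight_through_grad(a: float, upstream: float) -> float:
    """Assumed estimator: on the backward pass, treat the binary sample as if
    it were sigmoid(a), so d(output)/da is approximated by p * (1 - p)."""
    p = sigmoid(a)
    return upstream * p * (1.0 - p)


rng = random.Random(0)
samples = [bsn_forward(2.0, rng) for _ in range(10000)]
firing_rate = sum(samples) / len(samples)
```

Averaging many samples at a = 2.0 gives a firing rate close to sigmoid(2.0), roughly 0.88: the expected output matches a real-valued sigmoid unit, while each individual output is a hard yes/no decision.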

Preliminary Note on the Complexity of a Neural Network

This post is a preliminary note on the "complexity" of neural networks. It's a topic that has not gotten much attention in the literature, yet is of central importance to the general understanding of neural networks. In this post I discuss complexity and generalization in broad terms, and make the argument that network structure (including parameter counts), the training methodology, and the regularizers used, though each different in concept, all contribute to this notion of neural network "complexity".

Written Memories: Understanding, Deriving and Extending the LSTM

When I was first introduced to Long Short-Term Memory networks (LSTMs), it was hard to look past their complexity. I didn't understand why they were designed the way they were designed, just that they worked. It turns out that LSTMs can be understood, and that, despite their superficial complexity, LSTMs are actually based on a couple of incredibly simple, even beautiful, insights into neural networks. This post is what I wish I had when first learning about recurrent neural networks (RNNs).

Recurrent Neural Networks in Tensorflow II

This is the second in a series of posts about recurrent neural networks in Tensorflow. In this post, we will build upon our vanilla RNN by learning how to use Tensorflow's scan and dynamic_rnn functions, upgrading the RNN cell and stacking multiple RNNs, and adding dropout and layer normalization. We will then use our upgraded RNN to generate some text, character by character.
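As a rough sketch of the layer normalization step (plain Python, not the post's Tensorflow code; exactly where it is applied inside the RNN cell is not specified here), each example's hidden vector is normalized across its own units and then rescaled by a learned gain and bias. The function name and the eps default are assumptions.

```python
import math
from typing import List


def layer_norm(h: List[float], gain: List[float], bias: List[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize one example's hidden vector across its units, then rescale."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(h, gain, bias)]


out = layer_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

The statistics come from a single example's units rather than from a mini-batch, which is what makes this kind of normalization convenient inside a recurrent cell that is applied one time step at a time.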

Styles of Truncated Backpropagation

In my post on Recurrent Neural Networks in Tensorflow, I observed that Tensorflow's approach to truncated backpropagation (feeding in truncated subsequences of length n) is qualitatively different than "backpropagating errors a maximum of n steps". In this post, I explore the differences, and ask whether one approach is better than the other.
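One way to see the difference is to count how many time steps each error is backpropagated under each style. The sketch below assumes non-overlapping subsequences of length n for the Tensorflow style and counts the current step as one step; both function names are hypothetical.

```python
from typing import List


def tf_style_steps(seq_len: int, n: int) -> List[int]:
    """Steps each error is backpropagated when the sequence is cut into
    non-overlapping subsequences of length n: gradients stop at the
    subsequence boundary, so step t only reaches back to its chunk start."""
    return [(t % n) + 1 for t in range(seq_len)]


def n_step_style_steps(seq_len: int, n: int) -> List[int]:
    """Steps each error is backpropagated when every error flows back
    n steps (or to the start of the sequence, if that is closer)."""
    return [min(t + 1, n) for t in range(seq_len)]


tf_steps = tf_style_steps(10, 5)
n_steps = n_step_style_steps(10, 5)
```

For a sequence of length 10 with n = 5, the Tensorflow style gives [1, 2, 3, 4, 5, 1, 2, 3, 4, 5] while the n-step style gives [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]: only the last error in each subsequence is backpropagated the full n steps.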

Recurrent Neural Networks in Tensorflow I

This is the first in a series of posts about recurrent neural networks in Tensorflow. In this post, we will implement a vanilla recurrent neural network (RNN) from the ground up in Tensorflow, and then translate the model into Tensorflow's RNN API.
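For reference, a vanilla RNN updates its state as h_t = tanh(W_x x_t + W_h h_{t-1} + b). The following is a minimal plain-Python sketch of that recurrence, assuming tanh as the activation; it is not the post's Tensorflow code, and the names matvec, rnn_step and rnn_forward are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rnn_step(x: Vector, h_prev: Vector, W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    pre = [p + q + r for p, q, r in zip(matvec(W_x, x), matvec(W_h, h_prev), b)]
    return [math.tanh(z) for z in pre]


def rnn_forward(xs: List[Vector], h0: Vector, W_x: Matrix, W_h: Matrix,
                b: Vector) -> List[Vector]:
    """Unroll the RNN over a sequence and return every hidden state."""
    states, h = [], h0
    for x in xs:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states


states = rnn_forward([[0.0], [0.0]], [0.0], [[0.0]], [[1.0]], [0.5])
```

The same weights are reused at every step; only the hidden state h carries information from one step to the next.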

First Convergence Bias

In this post, I offer the results of an experiment providing support for "first convergence bias", which includes the proposition that training a randomly initialized network via backpropagation may never converge to a global minimum, regardless of the initialization and number of trials.

Inverting a Neural Net

In this experiment, I "invert" a simple two-layer MNIST model to visualize what the final hidden layer representations look like when projected back into the original sample space.

Representational Power of Deeper Layers

The hidden layers in a neural network can be seen as different representations of the input. Do deeper layers learn "better" representations? In a network trained to solve a classification problem, this would mean that deeper layers provide better features than earlier layers. The natural hypothesis is that this is indeed the case. In this post, I test this hypothesis on a network with three hidden layers trained to classify the MNIST dataset. It is shown that deeper layers do in fact produce better representations of the input.

Implementing Batch Normalization in Tensorflow

Batch normalization is a deep learning technique introduced in 2015 that enables the use of higher learning rates, acts as a regularizer, and can speed up training considerably: in the original paper, it reached the same accuracy with 14 times fewer training steps. In this post, I show how to implement batch normalization in Tensorflow.
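A minimal sketch of the batch normalization transform in plain Python (not the Tensorflow implementation the post describes): during training, each feature is normalized with the mean and variance of the current mini-batch, then scaled by gamma and shifted by beta. A moving average of the batch statistics, with an assumed decay of 0.999, is kept for use at inference time. Function names and default values are hypothetical.

```python
import math
from typing import List, Tuple

Batch = List[List[float]]


def batch_norm_train(batch: Batch, gamma: List[float], beta: List[float],
                     eps: float = 1e-3) -> Tuple[Batch, List[float], List[float]]:
    """Normalize each feature with the statistics of this mini-batch."""
    n, d = len(batch), len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(d)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(d)]
    out = [[gamma[j] * (x[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
            for j in range(d)] for x in batch]
    return out, means, variances


def update_running(running: List[float], batch_stat: List[float],
                   decay: float = 0.999) -> List[float]:
    """Exponential moving average of a batch statistic, used at inference."""
    return [decay * r + (1.0 - decay) * s for r, s in zip(running, batch_stat)]


def batch_norm_infer(x: List[float], pop_mean: List[float], pop_var: List[float],
                     gamma: List[float], beta: List[float],
                     eps: float = 1e-3) -> List[float]:
    """Normalize a single example with the stored population statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, m, s, g, b in zip(x, pop_mean, pop_var, gamma, beta)]


out, means, variances = batch_norm_train([[1.0, 10.0], [3.0, 30.0]], [1.0, 1.0], [0.0, 0.0])
```

The split between batch statistics during training and stored population statistics at inference is what lets the trained network make predictions on a single example, where no mini-batch is available.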