Synthetic Gradients with Tensorflow

I stumbled upon Max Jaderberg's Synthetic Gradients paper while thinking about different forms of communication between neural modules. It's a simple idea: rather than compute gradients through backpropagation, we can train a model to predict what those gradients will be, and use our prediction to update our weights. I wanted to try using this in my own work and didn't find a Tensorflow implementation to my liking, so here is mine. I also take this opportunity to (attempt to) answer one of the questions I had while reading the paper: why not use synthetic loss instead of synthetic gradients?

Beyond Binary: Ternary and One-hot Neurons

While playing with some applications of binary neurons, I found myself wanting to use explicit activations that go beyond a simple yes/no decision. For example, we might want our neural network to make a choice between several categories (in the form of a one-hot vector) or we might want it to make a choice between ordered categories (e.g., a scale of 1 to 10). It's rather easy to extend the straight-through estimator to work well on both of these cases, and I thought I would share my work in this post. I share code for implementing ternary and one-hot neurons in Tensorflow, and show that they can learn to solve MNIST.

Non-Zero Initial States for Recurrent Neural Networks

The default approach to initializing the state of an RNN is to use a zero state. This often works well, particularly for sequence-to-sequence tasks like language modeling where the proportion of outputs that are significantly impacted by the initial state is small. In some cases, however, it makes sense to (1) train the initial state as a model parameter, (2) use a noisy initial state, or (3) both. This post examines the rationale behind trained and noisy intial states briefly, and presents drop-in Tensorflow implementations.

Recurrent Neural Networks in Tensorflow III - Variable Length Sequences

This is the third in a series of posts about recurrent neural networks in Tensorflow. In this post, we'll use Tensorflow to construct an RNN that operates on input sequences of variable lengths.

Binary Stochastic Neurons in Tensorflow

In this post, I introduce and discuss binary stochastic neurons, implement trainable binary stochastic neurons in Tensorflow, and conduct several simple experiments on the MNIST dataset to get a feel for their behavior. Binary stochastic neurons offer two advantages over real-valued neurons: they can act as a regularizer and they enable conditional computation by enabling a network to make yes/no decisions. Conditional computation opens the door to new and exciting neural network architectures, such as the choice of experts architecture and heirarchical multiscale neural networks.

Recurrent Neural Networks in Tensorflow II

This is the second in a series of posts about recurrent neural networks in Tensorflow. In this post, we will build upon our vanilla RNN by learning how to use Tensorflow's scan and dynamic_rnn models, upgrading the RNN cell and stacking multiple RNNs, and adding dropout and layer normalization. We will then use our upgraded RNN to generate some text, character by character.

Recurrent Neural Networks in Tensorflow I

This is the first in a series of posts about recurrent neural networks in Tensorflow. In this post, we will build a vanilla recurrent neural network (RNN) from the ground up in Tensorflow, and then translate the model into Tensorflow's RNN API.

Implementing Batch Normalization in Tensorflow

Batch normalization is deep learning technique introduced in 2015 that enables the use of higher learning rates, acts as a regularizer and can speed up training by 14 times. In this post, I show how to implement batch normalization in Tensorflow.