Deconstruction with Discrete Embeddings
In my post Beyond Binary, I showed how easy it is to create trainable "one-hot" neurons with the straight-through estimator. This post makes my motivation clear by demonstrating the potential of discrete embeddings. In short, discrete embeddings allow for explicit deconstruction of inherently fuzzy data, which lets us apply explicit reasoning and algorithms to the data and communicate fuzzy ideas with concrete symbols. Using discrete embeddings, we can (1) create a language model over the embeddings, which immediately gives us access to RNN-based generation of internal embeddings (and sequences thereof), and (2) index sub-parts of the embeddings, instead of entire embedding vectors, which gives us (i.e., our agents) access to search techniques that go beyond cosine similarity, such as phrase search and search using lightweight structure.
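To make point (2) concrete, here is a minimal sketch of indexing sub-parts of a discrete embedding. It is not the code from the post: it assumes each embedding is a tuple of one-hot indices (one symbol per neuron, or "slot"), and the names `build_index`, `search`, and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def build_index(embeddings):
    """Map each (slot, symbol) pair to the set of item ids containing it."""
    index = defaultdict(set)
    for item_id, symbols in embeddings.items():
        for slot, symbol in enumerate(symbols):
            index[(slot, symbol)].add(item_id)
    return index

def search(index, query):
    """Return ids matching every (slot, symbol) constraint in the query."""
    # query maps slot -> required symbol; its items are the index keys.
    matches = [index.get(key, set()) for key in query.items()]
    return set.intersection(*matches) if matches else set()

# Hypothetical discrete embeddings: one one-hot index per slot.
embeddings = {
    "a": (3, 1, 4),
    "b": (3, 7, 4),
    "c": (2, 1, 4),
}
index = build_index(embeddings)

# Items whose slot 0 is symbol 3 and whose slot 2 is symbol 4.
result = search(index, {0: 3, 2: 4})
```

Because the lookup is an exact match on individual symbols rather than a similarity score over the whole vector, a query can constrain only the slots it cares about, which is the kind of partial, structured search that cosine similarity does not offer.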
Preliminary Note on the Complexity of a Neural Network
This post is a preliminary note on the "complexity" of neural networks. It's a topic that has received little attention in the literature, yet is of central importance to the general understanding of neural networks. In this post I discuss complexity and generalization in broad terms, and argue that network structure (including parameter counts), the training methodology, and the regularizers used, though each different in concept, all contribute to this notion of neural network "complexity".
Written Memories: Understanding, Deriving and Extending the LSTM
When I was first introduced to Long Short-Term Memory networks (LSTMs), it was hard to look past their complexity. I didn't understand why they were designed the way they were, just that they worked. It turns out that LSTMs can be understood, and that, despite their superficial complexity, LSTMs are actually based on a couple of incredibly simple, even beautiful, insights into neural networks. This post is what I wish I had when first learning about recurrent neural networks (RNNs).
Skill vs Strategy
In this post I consider the distinction between skill and strategy and what it means for machine learning. Backpropagation is limited in that it develops a skill at a specific strategy, but cannot, by itself, change strategies. I look at how strategy switches are achieved in real examples and ask what algorithm might allow machines to effectively switch strategies.