The Beginning of the End of Black Box Machine Learning

A research paper came out recently called "Convergent Learning:  Do Different Neural Networks Learn The Same Representations?"  This paper is an important one that hasn't been discussed much publicly, so I want to take this space to talk, in layman's terms, about why this paper matters and the impact it will have on machine learning research going forward.

First, let me explain three things about neural networks for those of you who aren't technical.  One is that the weights of neurons are usually initialized randomly, so two neural networks trained on the same data set never start from the same initial state.  The second is that neural networks learn what are called "representations."  One way to think of a representation is as the set of key features a neural net uses to classify something.  A neural net trained character by character on English text might learn, for example, that "q" is almost always followed by a "u."  The third thing to note is that the middle layers of a neural network are called "hidden layers," partly because their activity isn't directly visible the way inputs and outputs are, and in practice we rarely understand what they are doing.  Neural nets tend to be black boxes inside.
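To make that first point concrete, here is a minimal sketch in Python (the layer sizes and the initialization scheme are my own illustrative assumptions, not anything from the paper) showing that two networks with the same architecture start from different weights:

```python
import numpy as np

def init_layer(n_in, n_out, seed):
    # One layer's weight matrix, drawn as small random values --
    # a common (and here simplified) initialization scheme.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.01, size=(n_in, n_out))

# Same architecture (784 inputs, 128 hidden units), different seeds.
w_net_a = init_layer(784, 128, seed=0)
w_net_b = init_layer(784, 128, seed=1)

print(w_net_a.shape == w_net_b.shape)  # True: identical structure
print(np.allclose(w_net_a, w_net_b))   # False: different starting states
```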

That last point is important and ties to the core message of this paper.  It isn't that we are incapable of understanding the inner workings of neural nets, it's just that doing so is difficult and doesn't usually affect the problem we are trying to solve.  If the neural net reliably solves our problem, do we care how it works?  This paper argues that yes, as the use of neural networks spreads, it is important to understand how they work.  So the research behind this paper looked at neural nets with identical architectures, trained on the same data but starting from different random initializations, to see whether their internal states ended up similar.
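As a toy version of that setup (my own sketch using scikit-learn's MLPClassifier on a small digits data set, not the networks or data from the paper), you can train two identically structured nets that differ only in their random seed:

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Same architecture, same data -- only the random
# initialization (random_state) differs.
net_a = MLPClassifier(hidden_layer_sizes=(64,),
                      random_state=0, max_iter=500).fit(X, y)
net_b = MLPClassifier(hidden_layer_sizes=(64,),
                      random_state=1, max_iter=500).fit(X, y)

# Both nets solve the task; the open question is whether
# their hidden representations ended up similar.
print(net_a.score(X, y), net_b.score(X, y))
```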

I won't go into the details of how they performed the analysis; you can read the paper if you want the technical specifics.  (Roughly, they line up neurons between the two networks by measuring how correlated their activations are over the same inputs.)  What I will point out is the key finding.  Different neural nets do tend to learn some of the same features for a given data set, even if the way those features are represented isn't identical between the nets.  But there are also many features learned by one neural net that aren't learned by the others.  So it appears that, for common classification problems, there are a few salient features that every trained network will learn, but the full set of features varies from network to network.
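Continuing the toy sketch from above (again, a much-simplified stand-in for the paper's actual analysis, using the net_a, net_b, and X defined earlier), you can pair up hidden units between the two nets by how correlated their activations are across the same inputs:

```python
import numpy as np

def hidden_activations(net, X):
    # First hidden layer of a fitted MLPClassifier
    # (ReLU is its default activation).
    return np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

h_a = hidden_activations(net_a, X)  # shape: (n_samples, 64)
h_b = hidden_activations(net_b, X)

# Correlate every unit in net A with every unit in net B,
# then pair each A-unit with its best-matching B-unit.
corr = np.corrcoef(h_a.T, h_b.T)[:64, 64:]
corr = np.nan_to_num(corr)  # dead (constant) units yield NaNs
best_match = corr.max(axis=1)

# Units with a strong match look like features both nets learned;
# units with only weak matches look unique to net A.
print(np.round(np.sort(best_match)[-5:], 2))  # strongest matches
print(np.round(np.sort(best_match)[:5], 2))   # weakest matches
```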

This work is important because starting to understand how neural networks work on the inside will undoubtedly spark more ideas and innovation in the space.  If you are technical, you should read the paper.  If you aren't, you should follow the topic conceptually as more research takes place.  It will be an important area.