I’m learning about various machine learning algorithms, so I want to record what I have learned and where I learned it, in part so that I can relearn it after I inevitably forget it!

The perceptron is “baby’s first neural network.” It can successfully learn binary classification of data that is linearly separable. The basic idea is that your training data comes to you as vectors. You can start by guessing a weight vector, which is basically a guess at the hyperplane that separates your data (the weight vector is the normal vector of that hyperplane), or you can just initialize the weights to 0. Then you look at a random data point. First, you see how your current perceptron classifies that point, which you can do by taking the dot product of the weight vector with the data point vector and just looking at its sign.
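The classification step described above is just a signed dot product. A minimal sketch (my own names; I'm assuming labels of ±1 and no bias term, so the separating hyperplane passes through the origin):

```python
import numpy as np

def predict(w, x):
    # Threshold step activation: the sign of the dot product decides
    # the class. Points with w.x >= 0 go to +1, the rest to -1.
    return 1 if np.dot(w, x) >= 0 else -1

# E.g. with normal vector w = (1, -1), the point (2, 1) lands on the
# positive side and (0, 1) on the negative side.
w = np.array([1.0, -1.0])
print(predict(w, np.array([2.0, 1.0])))  # 1
print(predict(w, np.array([0.0, 1.0])))  # -1
```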

If it is incorrectly classified, you need to adjust your weighting, which you do by adding or subtracting (a scaling of) the current data point to or from your weight vector, depending on the point’s true label: add it if the point belongs on the positive side, subtract it if it belongs on the negative side. That gives the normal vector a bit of a bump toward correctly classifying the current data point (a single update doesn’t necessarily get it all the way there). Then you pick another point and do the whole thing again. You are continuously adjusting your weights, so presumably your perceptron is getting better all the time. It is also useful to note that you need some kind of activation function to turn the dot product into a class decision, and it seems pretty typical to use a threshold step function.
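The update step might be sketched like this (again my own naming, with labels of ±1 so that adding `y * x` covers both the “add” and “subtract” cases; `eta` is the scaling, i.e. the learning rate):

```python
import numpy as np

def perceptron_update(w, x, y, eta=1.0):
    # y is the true label, +1 or -1. If the current weights misclassify
    # the point, nudge the normal vector toward it (y = +1) or away
    # from it (y = -1); otherwise leave the weights alone.
    x = np.asarray(x, dtype=float)
    pred = 1 if np.dot(w, x) >= 0 else -1
    if pred != y:
        w = w + eta * y * x
    return w
```

Starting from all-zero weights, the first misclassified point simply becomes (plus or minus) the weight vector, which is why the zero initialization works at all.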

Does this process ever end? Yes, it will, provided that your data is linearly separable. You can even find the proof here. How long does it really take? I don’t know. Presumably it’s not the worst thing to do, since lots of people reference it. What if your data isn’t linearly separable? Well, it will go on forever, so you had better pick a maximum number of iterations. Will it give you something reasonable after a reasonable number of iterations if your data is linearly separable-ish? No idea, but it seems like it might.
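Putting the pieces together with the iteration cap mentioned above, a full training loop might look like this (a sketch under my own assumptions: ±1 labels, no bias term, and toy data I made up that is separable by a line through the origin):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=1000):
    # Cap the number of passes over the data so the loop terminates
    # even if the data turns out not to be linearly separable.
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if (1 if xi @ w >= 0 else -1) != yi:
                w += eta * yi * xi
                mistakes += 1
        if mistakes == 0:  # a clean pass: every point classified correctly
            return w
    return w  # hit the iteration cap without converging

# Toy separable data: label by the sign of x0 - x1, keeping points
# away from the boundary so there's a clear margin.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
X = X[np.abs(X[:, 0] - X[:, 1]) > 0.5]
y = np.where(X[:, 0] - X[:, 1] >= 0, 1, -1)
w = train_perceptron(X, y)
```

On separable data like this, the loop exits early via the zero-mistakes check rather than running out the cap.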

I read several useful pieces to figure out what I do know. I found this material from a presentation in a graduate course on machine learning (there’s a lot of other interesting stuff on the webpage for the 2007 course, http://aass.oru.se/~lilien/ml/). I also relied heavily on this material on the perceptron from a CMU course in computer vision. Both of these sources have useful illustrations that I decided not to replicate here, so you should go look at them. The Wikipedia page on the perceptron had some good material, and I got a little curious about the history, so I read http://web.csulb.edu/~cwallis/artificialn/History.htm.