Wednesday, July 22, 2009

What are good prerequisite textbooks for someone interested in machine learning?

As mentioned in the last post, machine learning requires a good amount of background knowledge just to make sense of the topic. The core subject areas are:
  • calculus
  • linear algebra
  • combinatorics / discrete math
  • probability
  • statistics
  • information theory
  • real analysis
  • optimization theory

The following is a list of basic, introductory textbooks for each of these subjects:
  • A First Course in Probability by Sheldon Ross. Additionally helpful is William Feller's An Introduction to Probability Theory and Its Applications, Vol. 1.
  • For real analysis, Marsden and Hoffman's Elementary Classical Analysis is pretty standard. It's not a great book though. I find it annoying that the proofs are left for the ends of the chapters.
These books are very basic, at the undergraduate level for the most part. However, if, like me, you weren't a math or engineering major, you will probably find them useful for quickly filling in the gaping holes in your education. Later I'll post some introductory books at the graduate level.

Friday, July 3, 2009

What books should I read if I want to learn about machine learning?

I see this question posted on the web fairly often and I wanted to throw in my two cents. The real answer is that it greatly depends on your background and your interests. Machine learning is vastly interdisciplinary, and the disciplines it builds on require individual study before learning about machine learning really becomes fruitful. Aside from computer science, the prerequisite subjects for machine learning are, at the very least, real analysis, probability, statistics, information theory and optimization theory. Quite a list!

I was doing computer science at the undergraduate level when I became interested in machine learning. Having taken courses in basic algorithm design, theory of computation, programming languages, and AI, I thought I had the necessary background to start doing some machine learning. After all, it's still computer science, right? Wrong --- in fact totally wrong. The thing about machine learning, as opposed to many other areas of computer science, is that it relies on continuous rather than discrete math, or at least a lot of it does. So if you have a math, physics, engineering or statistics background, you are in much better shape to start reading literature on machine learning than if you are versed in fundamental CS. Of course, CS is useful, too, but it's less necessary for getting started. In fact, many of the best people in machine learning did not come from computer science.

The other thing that makes answering this question difficult is that machine learning has more than a few sub-disciplines. Picking the right book really depends on which of these you want to delve into. Do you want to do theory? Applications? Applications in a particular domain? If you want to do natural language processing, then you shouldn't get a book on kernels. However, if you want to understand machine learning in general, then you should get a survey book. There are some decent survey texts out there, but the better ones are only useful given some mathematical maturity. Even then, it may be rough going.

The best introductory machine learning textbook I know of that is relatively accessible to people with a traditional computer science education is Tom Mitchell's Machine Learning. However, it is rudimentary and is quickly becoming completely out of date (how sad!). There is also Russell and Norvig's Artificial Intelligence: A Modern Approach, which covers topics in more classical AI at the expense of machine learning topics.

I think the best survey texts on machine learning are probably Hastie, Tibshirani and Friedman's Elements of Statistical Learning, whose recent arrival in an expanded 2nd edition is what prompted me to write this post, and Chris Bishop's Pattern Recognition and Machine Learning, which attempts to be the be-all and end-all modern machine learning textbook. However, these books require more facility in the above prerequisite subjects than the average computer scientist usually has. Another commonly cited survey book in the same vein is Duda, Hart and Stork's Pattern Classification. It's more basic than the previous two and I've found it to be less useful as well.

If you are interested in natural language processing, you are in luck because the survey texts are more accessible to a general CS audience. There's Manning and Schütze's Foundations of Statistical Natural Language Processing, and Jurafsky and Martin's Speech and Language Processing. Fred Jelinek also has a wonderful book, Statistical Methods for Speech Recognition.

So these are my suggestions. Other people will have others (which I am eager to hear about). Later I will post my suggestions on good textbooks for the prereqs. Happy 4th of July, everyone.

--------------------------------------------------

Here are a couple of additions to the ML book list, and best of all, they are freely available online:

(1) Nils J. Nilsson's Introduction to Machine Learning, Draft of Incomplete Notes is similar in spirit to Tom Mitchell's book.

(2) David MacKay's Information Theory, Inference, and Learning Algorithms. This book has a different focus from the other intro books listed above. Rather than provide a comprehensive survey of machine learning methods, it seeks to introduce machine learning by way of information theory and Bayesian modeling, and thus put the topic on solid theoretical footing. Its focus is also on developing the intellectual abilities of the reader, which frankly makes it a far better introduction than the others above. And from what I've read so far, MacKay's writing is insanely entertaining. I'm definitely looking forward to having more time to read this gem.