This is my ideal library about machine learning.
Theoretical
Machine learning and statistics
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, Jerome Friedman
A must-have. Covers most of the classical algorithms commonly used in machine learning.
Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar (foreword by Francis Bach)
Provides a theoretical framework for various machine learning algorithms and a detailed implementation of some of them. Beware, this is highly theoretical and requires familiarity with probability theory!
Cours de Statistique Mathématique (in French!), by Alain Monfort.
It provides a clear and thorough introduction to the theory behind statistical hypothesis testing, estimation, and linear regression.
Lectures on Algebraic Statistics by Mathias Drton, Bernd Sturmfels, Seth Sullivant.
Following my article about speeding up cross validation, where some nice ideas arose from algebra, this book covers more of the links between statistics and algebra, and includes a chapter dedicated to open problems in this field.
Probability
Exercises in Probability: A Guided Tour From Measure Theory To Random Processes, Via Conditioning (Cambridge Series in Statistical and Probabilistic Mathematics) by Loïc Chaumont and Marc Yor.
What would machine learning be without a good understanding of probability and of the proofs behind many methods? This book provides a large range of exercises and their solutions.
Real Analysis and Probability, by R. M. Dudley.
This is one of the clearest expositions of probability theory and of the interplay between the properties of metric spaces and probability measures.
Randomized Algorithms by Rajeev Motwani, Prabhakar Raghavan
I did not know where to put this one. In my post about time complexity I discussed some issues raised by the complexity of training a classifier or a regressor. This book provides a nice overview of this theory, heavily relying on probability.
Practice
R
Applied Predictive Modeling by Max Kuhn, Kjell Johnson
Full of nice ideas about how to treat various data sets! They even implemented an R package. I tried the package but, to be honest, never felt the need to reuse any of the methods proposed.
ggplot2: Elegant Graphics for Data Analysis (Use R!)
I am a big fan of ggplot2. I know, there are now better libraries on the market, with loads and loads of new functionality, interactive plots, and so on. But mastering ggplot2 is so rewarding. It only takes 5 lines to produce great graphs summarizing findings from data, which is really handy, especially at the workplace.
Python
OCaml
Real World OCaml: Functional programming for the masses by Yaron Minsky, Anil Madhavapeddy and Jason Hickey is a good introduction to OCaml.
Miscellaneous
The Art of Mathematics: Coffee Time in Memphis
Various exercises in mathematics, with hints and (beautiful) solutions. Some are directly related to probability (11. Loaded Dice, 69. Absent-minded Passengers, 77. Independent Random Variables, 139. A Probabilistic Inequality).
Music: A Mathematical Offering by Dave Benson.
Definitely not probability or statistics oriented. However, you may find here a new way to look at the Fourier transform, and if you love music and mathematics, this is a must-have.
Though not only about machine learning, this book is about the startup scene. Absolutely non-technical, but fascinating :)
Freakonomics by Steven D. Levitt, Stephen J. Dubner
When economists meet real life :) Not technical either.
Disclaimer: these are Amazon sponsored links. However, I have read every book (in some cases, only partly) and only the best are in this list.