[NOTE] Updated November 21, 2015. This article may have outdated content or subject matter.
by Joseph Rickert
What are you reading? And what are you recommending to friends, colleagues, and students who want to learn something about R programming? A quick search of Amazon will show that there are several new R books proposed for 2016, but of course new doesn't necessarily mean better. I fully expect that many new books in all areas of statistics, data science and many other scientific disciplines that use R to provide a computational aspect for their exposition will continue to be written for years to come. All of these books will provide windows into learning R for people excited about the particular subject matter. However, so many excellent R-based texts have already been published that it will be difficult for these new works to achieve "must buy" status for the R content alone.
Below are my recommendations for good R reads. Some of these books go back a few years, but they continue to hold their value. With the possible exception of books that were based primarily on the S language, good R books don't become obsolete. Unlike some other computer languages, R evolves mostly through new capabilities added by contributed packages, not through changes to the R core. The fact that the dplyr family of packages may make data wrangling more convenient in many circumstances doesn't make a book that teaches data manipulation through base R functions any less relevant. In fact, some might argue that new students should be taught the basic functionality first. I am not a militant traditionalist, but it does seem to me that familiarity with the bare-bones basics of the language will help newcomers to gain intuition about how R works.
There are three lists below. The first lists my picks for teaching R programming (top row in the graphic). The second list provides my recommendations for people interested in learning R for data science (second row in the graphic).
The third list is of books on my shelf that I continue to value. For every entry in all three lists I provide a mini or micro review. In a few cases, I point to a more extensive review that I have previously published in this blog. My lists are in no way intended to be complete, and I apologize right now if I have omitted some really good books. Please let me know about what I have missed by commenting on this post with a mini review of your own.
Advanced R by Hadley Wickham – Anyone who wants to gain a deep understanding of the R language will certainly benefit from this book. More than a reference, it seeks to provide a conceptual framework for understanding R’s structure and to guide readers through R’s idiosyncratic mechanisms, pointing out traps, illuminating difficult concepts and providing expert commentary.
The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff – This is still my pick for the best book for people with some programming experience who want to make a serious effort at learning R. Professor Matloff’s interest in teaching the mechanics of programming, infused with his deep understanding of both the underlying computer science and statistical theory, puts this book on top.
R For Dummies by Andrie de Vries and Joris Meys – A current, concise and insightful reference to core concepts in the R language. A really nice feature of the book is its emphasis on presenting the R ecosystem along with core R concepts. When learning anything new, it is always helpful to understand the big picture. Keep this book by your computer; when you stop referring to it you will be a pretty good R programmer.
Data Science with R
Applied Predictive Modeling by Max Kuhn and Kjell Johnson – This book is the master text for predictive analytics, carefully walking through several modeling examples and making expert use of the extensive machine learning tools in R’s caret package. I have described the book more fully here.
Data Mining with Rattle and R by Graham Williams – This is the perfect first book for machine learning with R. The rattle GUI helps get across the machine learning concepts and also produces some pretty good R code to get you started.
Data Science in R: A Case Studies Approach to Computational Reasoning and Data Science by Deborah Nolan and Duncan Temple Lang – My most recent acquisition, this book consists of 12 non-trivial case studies organized under three themes: Data Manipulation and Modeling, Simulation Studies, and Data and Web Technologies. All of the data sets are messy and the projects identify and develop the kind of skills required to undertake open-ended data science projects. The book doesn’t teach R programming, but it shows why R is the appropriate language for doing data science.
Practical Data Science with R by Nina Zumel and John Mount – This book is one of a kind. It moves fluidly between the various stages of the data science process from surface considerations of working with customers to the deep details of various machine learning algorithms. There is quite a bit of original R code that you can use in real projects. Most impressive is the statistical sensibility of the authors who want you to make correct inferences from your data and machine learning models as well as effectively communicate your findings to the people paying the bills.
The Rest of My Book Shelf
A First Course in Statistical Programming with R by W. John Braun and Duncan Murdoch – A deceptively thin book that provides a sharp introduction to R and moves quickly through debugging, computational linear algebra, numerical optimization and linear programming.
An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani – This book is the companion to the master text for Machine / Statistical Learning, The Elements of Statistical Learning, and contains plenty of R code. The authors have generously posted pdf versions of both of these books online.
An R Companion to Applied Regression by John Fox and Sanford Weisberg – I have been a fan since the first edition, which is possibly the best introduction to regression analysis with R ever.
Applied Meta-Analysis with R by Ding-Geng Chen and Karl E. Peace – Provides a solid introduction to basic meta-analysis that should be very helpful to people working in the field who want to move to R.
Bayesian Computation with R by Jim Albert – A concise, undergraduate level introduction to Bayesian Statistics.
Bayesian Essentials with R by Jean-Michel Marin and Christian P. Robert – This is a solid introduction to Bayesian Statistics with lots of useful code.
Data Analysis and Graphics Using R: An Example-Based Approach by John Maindonald and John Braun – A comprehensive introduction to statistical analysis that is most suitable for self-learning. It is also a very handsome book. If you are a book person, this is the one to own.
Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill – A superb book on statistical modeling that is both practical and rigorous, with a modern perspective that should appeal to anyone, Bayesians and non-Bayesians alike.
Data Manipulation with R by Phil Spector – A concise introduction to data munging using base R capabilities. This is another book to keep with you while programming.
Doing Bayesian Data Analysis: A Tutorial with R and BUGS by John K. Kruschke – This eclectic and entertaining read is a way to learn both R and Bayesian Analysis simultaneously. It provides lots of R code to build on.
Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models by Julian J. Faraway – Building on the author’s text on linear models, this book covers a lot of ground and provides real insight.
Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos – Written to teach time series forecasting to a business audience, this free, online text is a beautiful example of both the open source ethos and of how R can help people with real business problems become productive with a very modest learning curve.
Introduction to Probability with R by Kenneth Baclawski – This is an eclectic little book. There is really not much R in it, but it is a modern introduction to probability theory including stochastic processes with enough R to help you teach yourself the math by experimenting. R is the really easy part of this book.
Introductory Statistics with R by Peter Dalgaard – A classic text with R code to get you doing real statistics very quickly and a great reference for both statistics and R that you will want to hang on to.
Introductory Time Series with R by Paul S.P. Cowpertwait and Andrew C. Metcalfe – Could be the best introduction to time series analysis ever.
Linear Models with R by Julian J. Faraway – A compact course on analyzing linear models using R. It contains several examples and enough R code to thoroughly analyze regression models.
Modern Applied Statistics with S by W.N. Venables and B.D. Ripley – Probably the best introduction to modern computational statistics out there. Even though it is S, most of the code will work in R.
R Cookbook by Paul Teetor – A solid introduction with recipes for carrying out data analyses and basic plots that you will want on your shelf.
R for Everyone: Advanced Analytics and Graphics by Jared P. Lander – An easy read with relevant machine learning examples that will get you started with R.
R for SAS and SPSS Users by Robert A. Muenchen – If you are still using SAS or SPSS you need this book. The author speaks your language, understands where you are coming from and will help you learn some R.
Regression Modeling Strategies by Frank E. Harrell, Jr. – An incredible amount of wisdom for how to do statistics backed up with mostly straightforward R code.
R Programming for Bioinformatics by Robert Gentleman – Not only for Bioinformatics. This book provides insight into the structure of the R language for intermediate and advanced programmers.
Software for Data Analysis: Programming with R by John M. Chambers – A text for advanced programmers discussing philosophy and good practices and providing deep insight into R.
Statistical Analysis of Network Data with R by Eric D. Kolaczyk and Gábor Csárdi – This is an indispensable resource for analyzing network data. Containing a thorough explanation of the igraph package, it works through exponential random graph models and other advanced topics.
Statistical Computing in C++ and R by Randall L. Eubank and Ana Kupresanin – A very approachable introduction to both R and C++ for anyone who wants to understand these languages from the perspective of numerical analysis and the nuts and bolts of linear algebra.
Statistics and Data Analysis for Financial Engineering by David Ruppert and David S. Matteson – If you are interested in financial modeling this book could be your ticket to learning R and the R packages that support time series and financial engineering.
Time Series Analysis with Applications in R by Jonathan D. Cryer and Kung-Sik Chan – A solid undergraduate-level introduction to R with step-by-step R code. Very suitable for self study.
XML and Web Technologies for Data Sciences with R by Deborah Nolan and Duncan Temple Lang – Everything a data scientist would ever really want to know about XML documents, JSON and other web technologies and how you can work with them using R.