Mathematics and Statistics References:
- Elements of Statistical Learning: the entire text is available online, along with data sets and R code.
- Shape of Data, blog about the geometry and topology of data sets, providing a lot of geometric intuition for data science work.
- Advanced Data Analysis from an Elementary Point of View, a draft of a text on data analysis that assumes familiarity with stats, probability, linear algebra, calculus (uses R).
- Nice introductory and big-picture material on “Algebraic Perspective on Deep Learning”: tensor networks and algebraic descriptions of probabilistic models.
Machine Learning References:
- Distill is a new journal with visual, easy-to-read articles about machine learning topics.
Git and GitHub:
- Git workflow, for all those things I forget every single time: http://rogerdudler.github.io/git-guide/
- Fixing git mistakes: https://sethrobertson.github.io/GitFixUm/fixup.html
- My favorite regex tester, available in different flavors: https://www.debuggex.com/
- From the same folks, a useful Python regex cheat sheet: https://www.debuggex.com/cheatsheet/regex/python
- Pandas selection with loc/iloc/ix, which I often forget (note that .ix was deprecated in pandas 0.20 and removed in 1.0, so use loc/iloc in current code): http://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
- ggplot2 cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
- Address and location data: https://openaddresses.io/
- SNAP datasets (Stanford): http://snap.stanford.edu/data/index.html
- Awesome public datasets: https://github.com/caesar0301/awesome-public-datasets
- Socrata: https://opendata.socrata.com/
Other programming/computer things:
- MySQL: http://www.elated.com/articles/mysql-for-absolute-beginners/
Information, courses, useful material:
- DataCamp: https://campus.datacamp.com/
- Transitioning to a data science career (from Insight)
- Probability and statistics from MIT OpenCourseWare