Mathematics and Statistics References:
- An Introduction to Statistical Learning with Applications in R: the entire text is available online, along with data sets and R code. Designed for a broader audience than The Elements of Statistical Learning, but covers many of the same topics. There is a MOOC for this (see below).
- The Elements of Statistical Learning: shares several authors with the book above, is also connected to the MOOC below, and has associated R packages.
- Shape of Data: a blog about the geometry and topology of data sets, offering a lot of geometric intuition for data science work.
- Advanced Data Analysis from an Elementary Point of View: a draft textbook on data analysis that assumes familiarity with statistics, probability, linear algebra, and calculus (uses R).
- Nice introductory and big-picture material on the “Algebraic Perspective on Deep Learning”: tensor networks and algebraic descriptions of probabilistic models.
Machine Learning References:
- Distill: a journal of visual, easy-to-read articles about machine learning topics.
- ShortScience: publishes summaries and reviews of research articles, mostly in machine learning and related fields.
- A nice visual introduction to machine learning from R2D3.
Git and GitHub:
- Git workflow, for all those things I forget every single time: http://rogerdudler.github.io/git-guide/
- Fixing git mistakes: https://sethrobertson.github.io/GitFixUm/fixup.html
- Also a nice tutorial to introduce you to Git: https://try.github.io
- Git cheat sheet: http://rogerdudler.github.io/git-guide/files/git_cheat_sheet.pdf
Other tools and cheat sheets:
- My favorite regex tester, available in different flavors: https://www.debuggex.com/
- From the same folks, a useful Python regex cheat sheet: https://www.debuggex.com/cheatsheet/regex/python
- Pandas selection with loc/iloc/ix, which I often forget (note that .ix was deprecated and has since been removed in pandas 1.0; use .loc or .iloc instead): http://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
- ggplot2 cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
Datasets:
- Address and location data: https://openaddresses.io/
- Reddit datasets: https://www.reddit.com/r/datasets/
- SNAP datasets (Stanford): http://snap.stanford.edu/data/index.html
- Awesome public datasets: https://github.com/caesar0301/awesome-public-datasets
- Socrata: https://opendata.socrata.com/
- Data.world: sharing and collaborating around data
Other programming/computer things:
- MySQL for absolute beginners: http://www.elated.com/articles/mysql-for-absolute-beginners/
Information, courses, useful material:
- DataCamp: https://campus.datacamp.com/
- Transitioning to a data science career (from Insight)
- Probability and statistics from MIT OpenCourseWare
- Statistical Learning from Stanford: follows the textbooks An Introduction to Statistical Learning and The Elements of Statistical Learning (the instructors are authors of both books).
- Coursera Intro to Data Science with Python
- Andrew Ng’s Machine Learning course on Coursera
Projects and Ideas:
- Data for Democracy: volunteers drive a number of projects and put much of the work on GitHub. They also have an active blog.
- Partially Derivative: a mix of topics on how data science is used and the techniques involved.
- Linear Digressions: thoughtful discussion of new techniques, applications, and issues.
- Data Science at Home
- Machine Learning Guide: essentially works through the Andrew Ng course and presents the ideas simply.
- Learning Machines 101
- Data Skeptic
- Women in Data Science conference at Stanford
- Data Driven by DJ Patil and Hilary Mason: on becoming a data-driven organization, with an introduction to some issues in data science.