Python and Anaconda

Python Website

Anaconda Website

If you want a local copy of Python on your computer, install the base program or the entire Anaconda package of programs. We recommend beginners start with Anaconda.

Here are some quick guides to follow along for installing Python and Anaconda:

Installing Python

Installing Anaconda (Windows)

Installing Anaconda (Mac)
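
Once either install finishes, a quick check can confirm which Python version is running and whether a few common data science packages are available. The sketch below is one way to do that, not an official installer step; the function name check_environment and the default package list are illustrative assumptions.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "pandas", "matplotlib")):
    """Return the Python version string and a dict of package availability.

    The package names are only examples; pass whatever you plan to use.
    """
    version = ".".join(str(part) for part in sys.version_info[:3])
    # find_spec returns None when a top-level package cannot be located
    available = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return version, available


version, available = check_environment()
print(f"Python {version}")
for name, found in available.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Run it from a terminal, a notebook cell, or Spyder. With a full Anaconda install all three packages should normally be reported as installed, while a bare Python install will likely report them as missing until you add them.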

Python User Interface Options:

Jupyter Notebooks

Google Colab

Kaggle Notebooks

Python is one of the most widely used programming languages in data science. It is open source, has powerful tools, and offers a wide array of user interfaces and user-created packages for almost any task. We recommend it for anyone who plans to integrate data science into a broader programming skill set, or who plans to work with other computer-oriented folks.

The Anaconda suite of user interfaces contains both Jupyter Notebooks and Spyder, two common programs for working in Python that we recommend for everyone. Jupyter Notebooks offer a clean, interactive environment for running Python from a browser. Their cousin, JupyterLab, is also available for operating from a remote machine. Spyder is a more traditional graphical user interface (GUI) application, similar to the RStudio or Stata environment.

Google Colab and Kaggle Notebooks are another set of interface options. Both host their runtime remotely, so you don’t need a program (like JupyterLab) installed on your local computer. They also allow you to share notebooks of code with others as easily as a Google Doc. Both also have extensive sets of tutorials and example notebooks for all kinds of applications. Colab has the additional benefit of interacting easily with other Google applications, like Earth Engine or Compute Engine.

Complete Tutorials

Open Source Data Science Masters - Collection of recommended courses with projects and timeline.

O’Reilly Data Science Handbook in Colab - Full set of tutorial notebooks on everything NumPy, Pandas, Matplotlib, and Machine Learning.

Stackify: 30 Python Tutorials - Comprehensive list of Python tutorials, from the beginner to the advanced.

Hitchhiker’s Guide to Python - This opinionated guide exists to provide both novice and expert Python developers a best-practice handbook for the installation, configuration, and usage of Python on a daily basis. It also has an exhaustive list of resources for beginners to Python!

Kaggle: Micro-Courses - Kaggle has developed a great set of tutorial notebooks conveniently organized by topic. Their idea is to get you the skills and code you need, fast. A good reference for future projects.

Beginner’s Guides

Python: The Python Tutorial - Introduction to Python by the folks who made it. The table of contents is comprehensive so it’s a good resource if you get stuck too.

EdX Berkeley: Computational Thinking with Python - First in the “Data 8” series by UC Berkeley, introducing concepts of data science through Python.

UC Berkeley: CS9H Python for Programmers - Self-paced course, includes readings, assignments, and quizzes.

Udemy: Python Bootcamp - Complete tutorial for novice Python users interested in wider computer science application, includes 24 hours of video instruction and 19 exercises. Udemy courses are almost always on sale for $12, so avoid the “sticker price”.

UC Berkeley D-Lab: Python Fundamentals - GitHub repo of Python materials split into 4 “days” of instruction. Useful for anyone teaching Python in a seminar!

Harvard: CS109 Data Science Course - Lecture videos and course materials from Harvard’s intro to data science (circa 2014).

EdX UCSD: Python for Data Science - Learn to use powerful, open-source Python tools, including Pandas, Git, and Matplotlib, to manipulate, analyze, and visualize complex datasets.

UC Berkeley: Data 100 Course Website - Collection of current and previous semesters of Berkeley’s “Principles and Techniques of Data Science”. The syllabus pages for each semester often have notebooks, slides, and other resources (Spring 20 example), while the textbook is continuously updated with new lessons.

Jupyter Notebooks

Datacamp: Jupyter Notebook Tutorial

RealPython: Intro to Jupyter Notebook

DataQuest: Jupyter Notebook for Beginners

Open Tech School: Introducing Jupyter Notebook

Intermediate Guides

EdX Berkeley: Inferential Thinking by Re-Sampling - Learn how to use inferential thinking to make conclusions about unknowns based on data in random samples.

Real Python: Intermediate Python Guides - Huge collection of guides on Django, object-oriented programming, and more!

Python Programming: Intermediate Python Programming - Complete tutorial guide including list comprehensions, generators, and multiprocessing.
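
To give a flavor of two topics named in the intermediate guides above, here is a minimal sketch contrasting a list comprehension with a generator. It is only an illustration and is not taken from any of the linked tutorials; the function names squares_list and squares_gen are made up for this example, and multiprocessing is left out to keep it short.

```python
def squares_list(n):
    """List comprehension: builds the whole list in memory at once."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Generator: yields one value at a time, computed only when requested."""
    for i in range(n):
        yield i * i


# A comprehension can also filter: keep only the even squares.
evens = [x for x in squares_list(6) if x % 2 == 0]

# sum() consumes the generator lazily, so no intermediate list is stored.
total = sum(squares_gen(6))

print(evens)  # [0, 4, 16]
print(total)  # 55
```

The list version is convenient when you need to index or reuse the results; the generator version is usually the better fit when the sequence is large and only needs to be walked through once.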

Machine Learning

Google: Introduction to Machine Learning - Full set of tutorials from Google split into conceptual overviews, videos, and code examples.

EdX Berkeley: Prediction and Machine Learning - Learn how to use machine learning, with a focus on regression and classification, to automatically identify patterns in your data and make better predictions.

Udemy: Python for Data Science and Machine Learning - Over 22 hours of video on everything from Pandas to scikit-learn and TensorFlow. Also $12.

Udemy: Machine Learning A-Z - Great course by the SuperDataScience Team with over 41 hours of video, including corollary exercises in R. Again, $12.

Udemy: AWS Machine Learning, AI, SageMaker - With Python - Learn about cloud based machine learning algorithms and how to integrate with your applications. $12.

Udemy: Deep Learning A-Z™ - Hands-On Artificial Neural Networks - Extension of the SuperDataScience Team course focusing on ANNs with over 22 hours of video. Yup, everything on Udemy is $12.

Udemy: Artificial Intelligence - Reinforcement Learning in Python - Complete guide to artificial intelligence and machine learning, prep for deep reinforcement learning. You guessed it, $12.

Udemy: Deep Learning - Convolutional Neural Networks in Python - Computer Vision and Data Science and Machine Learning combined! In Theano and TensorFlow. Okay, technically it’s $11.99.

Udemy: Unsupervised Deep Learning in Python - Theano and TensorFlow: Autoencoders, Restricted Boltzmann Machines, Deep Neural Networks, t-distributed Stochastic Neighbor Embedding (t-SNE), and Principal Component Analysis (PCA). Costs somewhere between $11.98 and $12, before tax.

Udemy: Bayesian Machine Learning in Python: A/B Testing - 5.5 hour intro to Bayesian statistics and applications for A/B testing in Python. Typically $12.

Udemy: Unsupervised Machine Learning Hidden Markov Models in Python - Intro to hidden Markov models (HMMs) for stock price analysis, language modeling, web analytics, biology, and PageRank. Also typically $12.

fast.ai - Making neural nets cool again - Introduction to deep learning, AI

Geospatial Data Visualization and Analysis

Geopandas

Geopandas: How to Download

Geopandas: Examples

Textbooks and Resources

Think Python: How to Think Like a Computer Scientist - User-friendly textbook that starts at a beginner level with very clear explanations

Python Data Science Handbook - Textbook on everything from opening notebooks to visualizations and machine learning

Python for Data Analysis by McKinney

Data Science from Scratch - Useful and to-the-point guide to many relevant data science concepts in Python; lacks a table of contents but has lots of source code

Composing Programs - Online textbook geared slightly more towards those with programming experience; has a very handy online Python tutor

Python Standard Library - Description of standard objects and modules