Data Science Topics

This page contains most of the topics I've covered in a self-set curriculum as I study the field of data science (with a strong focus on machine learning). Bullets without a link are topics that I plan to get to, but will not post an article on in the immediate future. Links labeled "coming soon" are posts currently in progress.

Courses I've taken

Machine Learning

Machine Learning Overview

The General ML Framework

Machine Learning Models


Classification algorithms are used when you have a dataset of observations and you'd like to use the features associated with each observation to predict its class.

Example: Predict the type of flower when provided information on sepal length, sepal width, color, petal width, and petal length.
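To make the flower example a little more concrete, here's a minimal sketch of one very simple classifier, 1-nearest-neighbor: a new flower gets the class of the most similar flower we've already seen. The training rows, the `predict_class` helper, and the species names are illustrative assumptions, and I've left out color since it isn't a numeric feature.

```python
import math

# Illustrative training data (an assumption for this sketch):
# (sepal length, sepal width, petal length, petal width) -> species label.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def predict_class(features, train=TRAIN):
    """Return the label of the closest training observation (1-nearest neighbor)."""
    nearest = min(train, key=lambda row: math.dist(row[0], features))
    return nearest[1]


# A short-petaled flower sits closest to the "setosa" rows.
prediction = predict_class((5.0, 3.4, 1.5, 0.2))
```

Real classifiers are usually trained on far more data and evaluated on held-out observations; this only shows the features-in, class-out shape of the problem.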


Regression algorithms are used when you have a dataset of observations where you'd like to use the features to predict a continuous output.

Example: Predict the price of a house using the following features: sq ft, number of rooms, zip code, age of house, school district.
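As a rough sketch of the house-price example, here's ordinary least squares with a single feature (square footage) in plain Python. The numbers are made up and deliberately fall on a perfect line, and `fit_line` is a hypothetical helper; a real model would use all of the features listed above.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data (an assumption): price = 150 * sq ft + 50,000, with no noise.
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(sqft, price)

# Predict a continuous output for a house we haven't seen.
predicted_price = slope * 1800 + intercept
```

The key difference from classification is the output: a number on a continuous scale rather than one of a fixed set of classes.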


Clustering is a popular technique for finding groups or segments of similar observations in your data. It is an unsupervised learning approach in the sense that you don't train the algorithm by giving it labeled examples of what you'd like it to do; you let the clustering algorithm explore the data and provide you with new insights.
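One common clustering algorithm is k-means, sketched below in plain Python. Note that no labels are passed in anywhere. The points, the `kmeans` helper, and the choice to initialize centroids from the first `k` points (done only to keep the sketch deterministic) are all assumptions; real implementations typically use smarter initialization.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid updates."""
    # Naive initialization (an assumption for determinism): the first k points.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [
        min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, labels


# Two visually obvious groups, one near (1, 1) and one near (9, 9).
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
```

The returned `labels` are just cluster indices; it's up to you to interpret what each discovered group means.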

Dimensionality Reduction

When we're building machine learning models, we sometimes deal with datasets that have well over 1,000 or even 10,000 dimensions. While this allows us to account for many features, these features are often redundant. Because of the curse of dimensionality, we'd ideally limit our features to those that capture the true signal in the data and ignore the noise. Dimensionality reduction is one technique for reducing the dimension of our feature space while retaining as much information as possible. It is also very convenient for visualizing higher-dimensional datasets in two or three dimensions. This paper provides a great overview of the different techniques available for dimensionality reduction.
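As one illustration (principal component analysis is only one of many techniques), the sketch below reduces two redundant features to a single dimension. It finds the direction of greatest variance with power iteration, which I'm using here as a simple stand-in for a full eigendecomposition. The data and the `first_principal_component` helper are assumptions made for the example.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the top principal direction and each row's 1-D projection onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


# Made-up data (an assumption): the second feature is roughly twice the first,
# so the two columns carry mostly the same information.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, projected = first_principal_component(rows)
```

Each observation is now described by one number in `projected` instead of two features, with very little information lost because the original features were so strongly correlated.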

Neural Networks

Neural networks are one of the most popular approaches to machine learning today, achieving impressive performance on a large variety of tasks. Often described as "universal function approximators", they are flexible enough to learn a wide variety of tasks.
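To give a feel for that flexibility, here's a tiny two-layer network that computes XOR, a function no single linear model can represent. The weights are hand-picked for illustration rather than learned (in practice they'd be found by training), and `forward` and `xor_net` are hypothetical names for this sketch.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with ReLU activations, followed by a linear output unit."""
    hidden = [
        relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hand-picked weights (an assumption, not learned): two hidden units.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [1.0, -2.0]
OUTPUT_BIAS = 0.0


def xor_net(a, b):
    """Compute relu(a + b) - 2 * relu(a + b - 1), which equals XOR on 0/1 inputs."""
    return forward([a, b], HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)


truth_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

The non-linear activation in the hidden layer is what makes this possible; without it, stacking layers would still only produce a linear function.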

Reinforcement Learning

Reinforcement learning is an approach to machine learning where agents are rewarded for accomplishing some task. "Good" behavior is reinforced via a reward, so this approach can more realistically be considered a method of reward maximization. This book is the canonical resource for learning RL.
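One of the simplest settings for seeing reward maximization in action is a multi-armed bandit with an epsilon-greedy agent, sketched below. The agent mostly picks the action it currently believes is best and occasionally explores at random. The reward values, the `run_bandit` helper, and its default parameters are assumptions for this sketch, and rewards are kept deterministic to make the example easy to follow.

```python
import random


def run_bandit(arm_rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: usually exploit the best-known arm, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_rewards)  # running estimate of each arm's reward
    pulls = [0] * len(arm_rewards)        # how often each arm has been chosen
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(len(arm_rewards))
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(len(arm_rewards)), key=lambda a: estimates[a])
        reward = arm_rewards[arm]
        pulls[arm] += 1
        # Incremental mean update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward


# Hypothetical environment: three actions with fixed rewards.
estimates, pulls, total_reward = run_bandit([0.1, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

Nobody tells the agent which arm is best; it should come to prefer the highest-paying one purely from the rewards it receives. Full RL problems add states and delayed rewards on top of this basic explore/exploit loop.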

Machine Learning Applications

Natural Language Processing

Data Visualization

The following are external links to useful resources. At this time, I haven't written any blog posts on data visualization, but I wanted to save a few external posts for future reference.

Data Acquisition and Wrangling