This page contains most of the topics I've covered in a self-set curriculum as I study the field of data science (with a strong focus on machine learning). Bullets without a link are topics that I plan to get to, but will not post an article on in the immediate future. Links labeled "coming soon" are posts currently in progress.
The General ML Framework
- Preparing data for a machine learning model
- Feature selection
- Evaluating a machine learning model
- Hyperparameter tuning
- Learning from imbalanced data
- Building machine learning pipelines
Machine Learning Models
Classification algorithms are used when you have a dataset of observations and you'd like to use the features associated with an observation to predict its class.
Example: Predict the type of flower when provided information on sepal length, sepal width, color, petal width, and petal length.
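As a concrete illustration, here is a minimal nearest-neighbor classifier sketch for the flower example. The measurements and labels below are made up for illustration, not real iris data:

```python
import math

# Toy dataset: (sepal_length, petal_length) -> flower class.
# Values are invented for illustration, not real measurements.
training_data = [
    ((5.0, 1.4), "setosa"),
    ((5.1, 1.5), "setosa"),
    ((6.3, 4.9), "versicolor"),
    ((6.0, 4.5), "versicolor"),
]

def classify(features):
    """Predict the class of `features` via its nearest training example."""
    def distance(example):
        return math.dist(example[0], features)
    return min(training_data, key=distance)[1]

print(classify((5.05, 1.45)))  # nearest neighbors are "setosa" examples
```

Real classifiers would consider more neighbors, scale the features, and use far more data, but the core idea (predict the class of the most similar known observations) is the same.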
Regression algorithms are used when you have a dataset of observations where you'd like to use the features to predict a continuous output.
Example: Predict the price of a house using the following features: sq ft, number of rooms, zip code, age of house, school district.
- Linear Regression
- Polynomial Regression
- Decision Tree Regression
- K-Nearest Neighbors
- Gaussian Process Regression
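The first item above, linear regression, can be sketched in a few lines using the closed-form least-squares fit on a single feature. The housing numbers below are synthetic:

```python
# Univariate linear regression via the closed-form least-squares
# solution. The data is synthetic: price (in $1000s) vs. square feet.
xs = [1000, 1500, 2000, 2500]   # square footage
ys = [200, 280, 360, 440]       # price in $1000s (made up)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

def predict(sq_ft):
    return slope * sq_ft + intercept

print(predict(1750))  # interpolates between the training points
```

With multiple features (rooms, zip code, age, school district) the same idea generalizes to solving a linear system, typically via a library rather than by hand.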
Clustering is a popular technique for finding groups, or segments, of similar observations in your data. It is an unsupervised learning approach in the sense that you don't train the algorithm with labeled examples of what you'd like it to do; you simply let the clustering algorithm explore the data and provide you with new insights.
- K-means clustering
- Soft clustering with Gaussian mixture models
- Density-based spatial clustering of applications with noise
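K-means, the first technique above, alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its assigned points. A minimal one-dimensional sketch on synthetic data:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10.
data = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
print(kmeans(data, k=2))  # centroids settle near [1.0, 10.0]
```

Real data is higher-dimensional (Euclidean distance replaces the absolute difference) and k-means is sensitive to initialization, which is why libraries run multiple restarts.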
When we're building machine learning models, we sometimes deal with datasets that have well over 1,000 or even 10,000 dimensions. While this allows us to account for many features, those features are often redundant. Due to the curse of dimensionality, we'd ideally like to limit our data to the dimensions that capture the true signal and ignore the noise. Dimensionality reduction is a technique for shrinking the feature space while maintaining as much information as possible. It is also very convenient for visualizing higher-dimensional datasets in two or three dimensions. This paper provides a great overview of the different techniques available for dimensionality reduction.
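One standard starting point (not named explicitly above) is principal component analysis: project the data onto the directions of greatest variance. A sketch using power iteration to find the top principal component of synthetic 2-D data:

```python
# Minimal PCA sketch: project 2-D points onto their top principal
# component, found via power iteration on the covariance matrix.
# The data is synthetic and lies mostly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]

n = len(points)
mx = sum(p[0] for p in points) / n
my = sum(p[1] for p in points) / n
centered = [(x - mx, y - my) for x, y in points]

# Entries of the 2x2 covariance matrix.
cxx = sum(x * x for x, _ in centered) / n
cyy = sum(y * y for _, y in centered) / n
cxy = sum(x * y for x, y in centered) / n

# Power iteration: repeatedly multiply a vector by the covariance
# matrix and renormalize; it converges to the dominant eigenvector.
vx, vy = 1.0, 0.0
for _ in range(50):
    vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
    norm = (vx * vx + vy * vy) ** 0.5
    vx, vy = vx / norm, vy / norm

# 1-D coordinates of each point along the first principal component.
projections = [x * vx + y * vy for x, y in centered]
print(projections)
```

Here two dimensions collapse to one while preserving most of the spread in the data; the same idea scales to reducing thousands of dimensions to a handful.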
Neural networks are one of the most popular approaches to machine learning today, achieving impressive performance on a large variety of tasks. Often referred to as "universal function approximators", these networks are flexible enough to be applied to a wide range of learning problems.
- Convolutional neural networks
- Introduction to convolutional neural networks
- Common ConvNet architectures
- Object detection
- Face recognition
- Recurrent neural networks
- Introduction and network representation
- Gated recurrent units: Introducing intentional memory
- Long short-term memory networks: Learning what to remember and what to forget
- Attention networks
- Transfer learning
- Image recognition
- Natural language processing
- One-shot learning
- Siamese networks
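To make the "universal function approximator" claim concrete, here is a tiny network with hand-chosen weights that computes XOR, a function no single linear layer can represent. The weights are picked for illustration; in practice they would be learned by gradient descent:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A 2-2-1 network computing XOR. Large weights make the sigmoids
# behave almost like hard thresholds, so each unit acts as a gate.
def xor_net(x1, x2):
    h_or  = sigmoid(20 * x1 + 20 * x2 - 10)      # ~1 if x1 OR x2
    h_and = sigmoid(20 * x1 + 20 * x2 - 30)      # ~1 if x1 AND x2
    return sigmoid(20 * h_or - 20 * h_and - 10)  # OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))  # prints the XOR truth table
```

The hidden layer is what buys the extra expressive power: stacking simple nonlinear units lets the network carve up the input space in ways a linear model cannot.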
Reinforcement learning is an approach to machine learning in which an agent is rewarded for accomplishing some task. "Good" behavior is reinforced via a reward signal, so this approach can more realistically be considered a method of reward maximization.
- Overview of reinforcement learning
- Planning in a stochastic environment
- Learning in a stochastic environment
- Implementations of Monte Carlo and temporal difference learning methods
- Generalizing value functions for large state-spaces
- Modeling multi-agent environments using game theory
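As a small taste of the temporal difference methods mentioned above, here is a sketch of tabular Q-learning on a toy corridor environment. The environment, rewards, and hyperparameters are all invented for illustration:

```python
import random

# Tabular Q-learning on a 5-state corridor: the agent starts at
# state 0 and receives reward 1 for stepping into terminal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # left, right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9
rng = random.Random(0)

for _ in range(500):
    state = 0
    while state != GOAL:
        a = rng.randrange(2)  # explore with purely random actions
        nxt = max(0, min(GOAL, state + ACTIONS[a]))
        reward = 1.0 if nxt == GOAL else 0.0
        # Q-learning update: bootstrap from the best next-state value.
        target = reward + (0.0 if nxt == GOAL else gamma * max(Q[nxt]))
        Q[state][a] += alpha * (target - Q[state][a])
        state = nxt

greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy_policy)  # expected: always move right -> [1, 1, 1, 1]
```

Even with completely random exploration, the learned Q-values identify the optimal policy (always move right), because the update bootstraps from the best available next action rather than the one actually taken.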
Machine Learning Applications
Natural Language Processing
- Preprocessing text data for NLP
- TF-IDF Vectorization
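The core of TF-IDF can be sketched from scratch on a toy corpus. Real implementations, such as scikit-learn's TfidfVectorizer, add smoothing and normalization; the corpus below is invented:

```python
import math

# From-scratch TF-IDF: term frequency within a document, down-weighted
# by how many documents in the corpus contain the term.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "birds fly south",
]
docs = [doc.split() for doc in corpus]

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)          # term frequency
    df = sum(1 for d in docs if term in d)   # document frequency
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)           # inverse document frequency
    return tf * idf

# "cat" is rarer across the corpus than "the", so it scores higher
# in document 0 even though "the" occurs there twice.
print(tf_idf("cat", docs[0]), tf_idf("the", docs[0]))
```

This is why TF-IDF is such a common first vectorization step: it suppresses ubiquitous words like "the" and highlights the terms that actually distinguish one document from another.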
The following are external links to useful resources. At this time, I haven't written any blog posts on data visualization, but I wanted to save a few external posts for future reference.