Data Science Scaling nearest neighbors search with approximate methods. What is nearest neighbors search? In the world of deep
Data Science Organizing machine learning projects: project management guidelines. The goal of this document is to provide a common framework for approaching machine learning projects that can be referenced by practitioners. Knowledge of machine learning is assumed. Overview This overview intends to
Data Science An overview of object detection: one-stage methods. In this post, I'll discuss an overview of deep learning techniques for object detection using convolutional neural networks. Object detection is useful for understanding what's in an image, describing both what is in
Data Science Evaluating image segmentation models. When evaluating a standard machine learning model, we usually classify our predictions into four categories: true positives, false positives, true negatives, and false negatives. However, for the dense prediction task of image segmentation,
Data Science An overview of semantic image segmentation. In this post, I'll discuss how to use convolutional neural networks for the task of semantic image segmentation. Image segmentation is a computer vision task in which we label specific regions of an
Startups Lessons learned from attempting to launch a startup. In Q4 of 2017, I made the decision to walk down the entrepreneurial path and dedicate a full-time effort towards launching a startup venture. I secured a healthy seed round of funding from
Data Science Common architectures in convolutional neural networks. In this post, I'll discuss commonly used architectures for convolutional networks. As you'll see, almost all CNN architectures follow the same general design principles of successively applying convolutional layers to the input, periodically
Data Science Variational autoencoders. In my introductory post on autoencoders, I discussed various models (undercomplete, sparse, denoising, contractive) which take data as input and discover some latent state representation of that data. More specifically, our input data
Data Science Introduction to autoencoders. Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning. Specifically, we'll design a neural network architecture such that we impose a bottleneck in the
Data Science Setting the learning rate of your neural network. In previous posts, I've discussed how we can train neural networks using backpropagation with gradient descent. One of the key hyperparameters to set in order to train a neural network is the learning
Data Science Learning from imbalanced data. In this blog post, I'll discuss a number of considerations and techniques for dealing with imbalanced data when training a machine learning model. The blog post will rely heavily on a sklearn contributor
Data Science Normalizing your data (specifically, input and batch normalization). In this post, I'll discuss considerations for normalizing your data - with a specific focus on neural networks. In order to understand the concepts discussed, it's important to have an understanding of gradient
Resolutions New Year's Resolutions 2018 After revisiting my 2017 resolutions and evaluating how well I adhered to each resolution, I'd like to set forth my resolutions for the coming year. This year, I'll set more measurable goals so that
Gratitude 2017 List of Gratitude A short list of all of the wonderful people in my life that helped me reach my personal development goals, supported me, and provided me the joys of life in 2017. To all
Data Science Hyperparameter tuning for machine learning models. When creating a machine learning model, you'll be presented with design choices as to how to define your model architecture. Oftentimes, we don't immediately know what the optimal model architecture should be
Blockchain What the heck is blockchain? Lately, I've been talking more and more about blockchain and its potential impact. As I've been learning more about the technology and sharing what I've learned with my friends, I've decided it would
Data Science Generalizing value functions for large state spaces. Up until now, we've discussed the concept of a value function primarily as a lookup table. As our agent visits specific state-action pairs and continues to explore an environment, we update the value
Data Science Implementations of Monte Carlo and Temporal Difference learning. In the previous post, I discussed two different learning methods for reinforcement learning, Monte Carlo learning and temporal difference learning. I then provided a unifying view by considering $n$-step TD learning and
Data Science Learning in a stochastic environment. Previously, I discussed how we can use the Markov Decision Process for planning in stochastic environments. For the process of planning, we already have an understanding of our environment via access to information
Data Science Overview of reinforcement learning. Reinforcement learning is a method of learning where we teach the computer to perform some task by providing it with feedback as it performs actions. This is different from supervised learning in that
Data Science SQL for data analysis. As a data scientist, you deal with a lot of data. For small datasets, maybe you just store this information in a CSV file and load it into Pandas. However, this isn't really
Data Science Convolutional neural networks. In my introductory post on neural networks, I introduced the concept of a neural network that looked something like this. As it turns out, there are many different neural network architectures, each with
Data Science Deep neural networks: preventing overfitting. In previous posts, I've introduced the concept of neural networks and discussed how we can train neural networks. For these posts, we examined neural networks that looked like this. However, many of the
Data Science Planning in a stochastic environment. In this post, I'll be discussing how to calculate the best set of actions to complete a task whilst operating in a known environment, otherwise known as planning. For this scenario, we have
Data Science Evaluating a machine learning model. So you've built a machine learning model and trained it on some data... now what? In this post, I'll discuss how to evaluate your model, and practical advice for improving the model based