Data Science - Jeremy Jordan

Understanding the Transformer architecture for neural networks

The attention mechanism allows us to merge a variable-length sequence of vectors into a fixed-size context vector. What if we could use this mechanism to entirely replace recurrence for sequential modeling? This blog post covers the Transformer architecture, which explores such an approach.

Understanding the attention mechanism in sequence models

In this blog post, we'll discuss a key innovation in sequence-to-sequence model architectures: the attention mechanism. This architecture innovation dramatically improved model performance for sequence-to-sequence tasks such as machine translation and text summarization. Moreover, the success of this attention mechanism led to

Managing your machine learning infrastructure as code with Terraform

Let's say you want to deploy a recommender system at your company. A typical architecture might include a set of inference servers to run your embedding and ranking models, an approximate nearest neighbor index to select a set of candidate items that match your query, a database to retrieve features

Terraform configuration: quick reference

This page contains a quick reference for writing Terraform configuration.

A simple solution for monitoring ML systems.

This blog post aims to provide a simple, open-source solution for monitoring ML systems. We'll discuss industry-standard monitoring tools and practices for software systems and how they can be adapted to monitor ML systems.

Effective testing for machine learning systems.

In this blog post, we'll cover what testing looks like for traditional software development, explain why testing machine learning systems can be different, and discuss some strategies for writing effective tests for machine learning systems. We'll also clarify the distinction between the closely related

An introduction to Kubernetes.

This blog post will provide an introduction to Kubernetes so that you can understand the motivation behind the tool, what it is, and how you can use it. In a follow-up post, I'll discuss how we can leverage Kubernetes to power data science workloads using more concrete (data science) examples.

Building machine learning products: a problem well-defined is a problem half-solved.

Previously, I wrote about organizing machine learning projects where I presented the framework that I use for building and deploying models. However, that framework operates on the implicit assumption that you already know generally what your model should do.

Introduction to recurrent neural networks.

In this post, I'll discuss a third type of neural network, the recurrent neural network, for learning from sequential data. For some classes of data, the order in which we receive observations is important. As an example, consider the following two sentences:

Scaling nearest neighbors search with approximate methods.

In this blog post, I'll cover a couple of techniques used for approximate nearest neighbors search. This post will not cover approximate nearest neighbors methods exhaustively, but hopefully you'll be able to understand how people generally approach this problem and how to apply these techniques

Organizing machine learning projects: project management guidelines.

The goal of this document is to provide a common framework for approaching machine learning projects that can be referenced by practitioners. If you build ML models, this post is for you.

An overview of object detection: one-stage methods.

In this post, I'll give an overview of deep learning techniques for object detection using convolutional neural networks. Object detection is useful for understanding what's in an image, describing both what is in an image and where those objects are found.

Evaluating image segmentation models.

When evaluating a standard machine learning model, we usually classify our predictions into four categories: true positives, false positives, true negatives, and false negatives. However, for the dense prediction task of image segmentation, it's not immediately clear what counts as a "

An overview of semantic image segmentation.

In this post, I'll discuss how to use convolutional neural networks for the task of semantic image segmentation. Image segmentation is a computer vision task in which we label specific regions of an image according to what's being shown.

Common architectures in convolutional neural networks.

In this post, I'll discuss commonly used architectures for convolutional networks. As you'll see, almost all CNN architectures follow the same general design principles of successively applying convolutional layers to the input, periodically downsampling the spatial dimensions while increasing the

Variational autoencoders.

A variational autoencoder (VAE) provides a probabilistic manner for describing an observation in latent space. Thus, rather than building an encoder which outputs a single value to describe each latent state attribute, we'll formulate our encoder to describe a probability distribution

Introduction to autoencoders.

Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning. Specifically, we'll design a neural network architecture such that we impose a bottleneck in the network which forces a compressed knowledge representation of the

Setting the learning rate of your neural network.

In previous posts, I've discussed how we can train neural networks using backpropagation with gradient descent. One of the key hyperparameters to set in order to train a neural network is the learning rate for gradient descent.

Learning from imbalanced data.

In this blog post, I'll discuss a number of considerations and techniques for dealing with imbalanced data when training a machine learning model. The blog post will rely heavily on a sklearn contributor package called imbalanced-learn to implement the discussed techniques.

Normalizing your data (specifically, input and batch normalization).

In this post, I'll discuss considerations for normalizing your data - with a specific focus on neural networks. In order to understand the concepts discussed, it's important to have an understanding of gradient descent.

Hyperparameter tuning for machine learning models.

When creating a machine learning model, you'll be presented with design choices as to how to define your model architecture. Oftentimes, we don't immediately know what the optimal model architecture should be for a given model, and thus we

Generalizing value functions for large state spaces.

Up until now, we've discussed the concept of a value function primarily as a lookup table. As our agent visits specific state-action pairs and continues to explore an environment, we update the value of that state-action pair independent of any other state-action

Implementations of Monte Carlo and Temporal Difference learning.

In the previous post [https://www.jeremyjordan.me/rl-learning-methods/], I discussed two different learning methods for reinforcement learning, Monte Carlo learning and temporal difference learning. I then provided a unifying view by considering $n$-step TD learning and establishing a hybrid learning method, $TD\left(

Learning in a stochastic environment.

Previously, I discussed how we can use the Markov Decision Process [https://www.jeremyjordan.me/markov-decision-process] for planning in stochastic environments. For the process of planning, we already have an understanding of our environment via access to information given by the transfer function and

Overview of reinforcement learning.

Reinforcement learning is a method of learning where we teach the computer to perform some task by providing it with feedback as it performs actions. This is different from supervised learning in that we don't explicitly provide correct and incorrect examples of