Reinforcement Learning - Jeremy Jordan

Jeremy Jordan

Reinforcement Learning

A collection of 5 posts

Generalizing value functions for large state spaces.

Up until now, we've discussed the concept of a value function primarily as a lookup table. As our agent visits specific state-action pairs and continues to explore an environment, we update the value of each visited state-action pair independently of any other state-action pairs. When we

Implementations of Monte Carlo and Temporal Difference learning.

In the previous post [https://www.jeremyjordan.me/rl-learning-methods/], I discussed two different learning methods for reinforcement learning: Monte Carlo learning and temporal difference learning. I then provided a unifying view by considering $n$-step TD learning and establishing a hybrid learning method, TD($\lambda$). These methods

Learning in a stochastic environment.

Previously, I discussed how we can use the Markov Decision Process [https://www.jeremyjordan.me/markov-decision-process] for planning in stochastic environments. For the process of planning, we already have an understanding of our environment via access to information given by the transition function and reward function. In other

Overview of reinforcement learning.

Reinforcement learning is a method of learning where we teach the computer to perform some task by providing it with feedback as it performs actions. This is different from supervised learning in that we don't explicitly provide correct and incorrect examples of how the task should be completed,

Planning in a stochastic environment.

In this post, I'll be discussing how to calculate the best set of actions to complete a task whilst operating in a known environment, otherwise known as planning. For this scenario, we have complete knowledge of the system's dynamics, including the reward of each state and