In the previous post [https://www.jeremyjordan.me/rl-learning-methods/], I discussed two
different learning methods for reinforcement learning: Monte Carlo learning and
temporal difference learning. I then provided a unifying view by considering
$n$-step TD learning and establishing a hybrid learning method, $TD\left(