In the previous post, I discussed two different learning methods for reinforcement learning, Monte Carlo learning and temporal difference learning. I then provided a unifying view by considering $n$-step TD learning and establishing hybrid learning method, $TD\left( \lambda \right)$. These methods are