
TD learning

TD learning of state values

This algorithm can only estimate the state values of a given policy $\pi$.

image-20250414160137501
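As an illustration of the state-value update, here is a minimal sketch of one TD(0) step. The function name `td0_update`, the dictionary-based value table, and the default step size `alpha` and discount `gamma` are assumptions made for this example, not part of the original notes.

```python
def td0_update(v, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to the state-value table v (a dict) in place.

    The experience sample (s, r, s_next) is assumed to be generated
    by following the given policy pi.
    """
    td_target = r + gamma * v[s_next]   # bootstrapped estimate of the return
    td_error = v[s] - td_target         # gap between current estimate and target
    v[s] = v[s] - alpha * td_error      # move v[s] a small step toward the target
    return v


# Hypothetical two-state example.
v = {"s1": 0.0, "s2": 1.0}
td0_update(v, "s1", r=1.0, s_next="s2")
print(v["s1"])
```

Only the value of the visited state `s` changes; all other entries of `v` are left as they are.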

Sarsa

Given a policy, Sarsa can estimate its action values.

Sarsa is an action-value version of the TD algorithm

image-20250414194020139
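A minimal sketch of one Sarsa step, which uses the sample $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$ to update a single action value. The function name `sarsa_update`, the `(state, action)`-keyed table, and the default `alpha` and `gamma` are assumptions made for this example.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one Sarsa update to the action-value table q in place.

    a_next is the action actually taken in s_next under the current policy.
    """
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] -= alpha * (q[(s, a)] - td_target)
    return q


# Hypothetical example: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
q[("s2", "right")] = 2.0
sarsa_update(q, "s1", "up", r=1.0, s_next="s2", a_next="right")
print(q[("s1", "up")])
```

To search for an optimal policy rather than only evaluate a given one, this update is typically alternated with a policy improvement step such as $\epsilon$-greedy.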

Expected Sarsa

image-20250415213231678

image-20250415214119677
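A minimal sketch of one Expected Sarsa step. Instead of the sampled next action, the TD target averages the next-state action values under the policy. The function name, the `pi` table of action probabilities, and the default `alpha` and `gamma` are assumptions made for this example.

```python
from collections import defaultdict


def expected_sarsa_update(q, pi, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Expected Sarsa update to q in place.

    pi[(state, action)] is the probability of taking `action` in `state`.
    """
    # Expectation of q(s_next, .) under the policy; no sampled a_next is needed.
    expected_q = sum(pi[(s_next, b)] * q[(s_next, b)] for b in actions)
    td_target = r + gamma * expected_q
    q[(s, a)] -= alpha * (q[(s, a)] - td_target)
    return q


# Hypothetical example with a uniform policy over two actions.
actions = ["left", "right"]
q = defaultdict(float)
q[("s2", "left")] = 1.0
q[("s2", "right")] = 3.0
pi = {("s2", "left"): 0.5, ("s2", "right"): 0.5}
expected_sarsa_update(q, pi, "s1", "up", r=1.0, s_next="s2", actions=actions)
print(q[("s1", "up")])
```

Because the random variable $a_{t+1}$ no longer appears in the target, the update has lower variance than Sarsa at the cost of a sum over actions per step.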

n-step Sarsa

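n-step Sarsa covers both MC learning and TD learning as special cases: $n = 1$ gives Sarsa (TD), and $n \to \infty$ gives MC learning.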

image-20250415214523965
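A minimal sketch of one n-step Sarsa update. The target sums the first $n$ discounted rewards and then bootstraps from the action value reached after $n$ steps. The function name, the `rewards` list argument, and the default `alpha` and `gamma` are assumptions made for this example.

```python
from collections import defaultdict


def n_step_sarsa_update(q, s, a, rewards, s_n, a_n, alpha=0.1, gamma=0.9):
    """Apply one n-step Sarsa update to q in place.

    rewards holds r_{t+1}, ..., r_{t+n}; (s_n, a_n) is the state-action
    pair reached after those n steps.
    """
    n = len(rewards)
    discounted = sum(gamma**k * r for k, r in enumerate(rewards))
    td_target = discounted + gamma**n * q[(s_n, a_n)]
    q[(s, a)] -= alpha * (q[(s, a)] - td_target)
    return q


# Hypothetical 2-step example.
q = defaultdict(float)
q[("s3", "right")] = 2.0
n_step_sarsa_update(q, "s1", "up", rewards=[1.0, 0.0],
                    s_n="s3", a_n="right", gamma=0.5)
print(q[("s1", "up")])
```

With a single reward in `rewards` the target reduces to the ordinary Sarsa target; as the list grows to cover the whole episode, the bootstrap term is discounted away and the target approaches the MC return.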

Q-learning

image-20250416100428680
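A minimal sketch of one Q-learning step. The TD target takes the maximum action value in the next state, so the update estimates optimal action values rather than those of the behavior policy. The function name, the `actions` list, and the default `alpha` and `gamma` are assumptions made for this example.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q in place."""
    # Greedy bootstrap: use the best action value in s_next,
    # regardless of which action the behavior policy takes there.
    best_next = max(q[(s_next, b)] for b in actions)
    td_target = r + gamma * best_next
    q[(s, a)] -= alpha * (q[(s, a)] - td_target)
    return q


# Hypothetical example.
actions = ["left", "right"]
q = defaultdict(float)
q[("s2", "left")] = 1.0
q[("s2", "right")] = 3.0
q_learning_update(q, "s1", "up", r=1.0, s_next="s2", actions=actions)
print(q[("s1", "up")])
```

Since the target does not depend on the next action actually taken, the samples may come from any behavior policy, which is why Q-learning can be run off-policy.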

Summary

image-20250416100310699
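One common way to summarise the action-value algorithms above, written here in standard textbook notation (the symbols $\alpha_t$, $\gamma$, and $\bar{q}_t$ are assumptions of this sketch and may differ from the notation in the original figure), is a single update rule that differs only in its TD target $\bar{q}_t$:

$$
q_{t+1}(s_t, a_t) = q_t(s_t, a_t) - \alpha_t(s_t, a_t)\left[ q_t(s_t, a_t) - \bar{q}_t \right]
$$

$$
\begin{aligned}
\text{Sarsa:} \quad & \bar{q}_t = r_{t+1} + \gamma\, q_t(s_{t+1}, a_{t+1}) \\
\text{n-step Sarsa:} \quad & \bar{q}_t = r_{t+1} + \gamma r_{t+2} + \dots + \gamma^{n-1} r_{t+n} + \gamma^{n} q_t(s_{t+n}, a_{t+n}) \\
\text{Expected Sarsa:} \quad & \bar{q}_t = r_{t+1} + \gamma \sum_{a} \pi_t(a \mid s_{t+1})\, q_t(s_{t+1}, a) \\
\text{Q-learning:} \quad & \bar{q}_t = r_{t+1} + \gamma \max_{a} q_t(s_{t+1}, a) \\
\text{MC learning:} \quad & \bar{q}_t = r_{t+1} + \gamma r_{t+2} + \gamma^{2} r_{t+3} + \dots
\end{aligned}
$$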