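TD learning

Published 2025-04-13, updated 2025-04-27. Category: RL.

TD learning of state values
TD learning of state values can only estimate the state values of a given policy $\pi$.

Sarsa
Given a policy, Sarsa estimates its action values. Sarsa is an action-value version of the TD algorithm.

Expected Sarsa

n-step Sarsa
n-step Sarsa contains both MC and TD: with n = 1 it reduces to Sarsa, and as n grows to the end of the episode it becomes MC learning.

Q-learning

Summary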
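As a recap of the algorithms listed in this post, the sketch below shows one possible tabular form of each update. It is a minimal illustration, not the post's own implementation: the function names, the nested-list tables `v` and `q`, and the step size `alpha` and discount `gamma` are all assumptions introduced here, and terminal-state handling is left out. The algorithms differ mainly in how the TD target is built.

```python
from typing import List, Sequence

# Hypothetical tabular representation (an assumption for this sketch):
#   v[s]    -> estimated state value of state s under a fixed policy
#   q[s][a] -> estimated action value of taking action a in state s


def td0_update(v: List[float], s: int, r: float, s_next: int,
               alpha: float, gamma: float) -> None:
    """TD learning of state values: evaluates the policy that generated the sample."""
    td_target = r + gamma * v[s_next]
    v[s] += alpha * (td_target - v[s])


def sarsa_update(q: List[List[float]], s: int, a: int, r: float,
                 s_next: int, a_next: int, alpha: float, gamma: float) -> None:
    """Sarsa: same TD update, applied to action values with the next action actually taken."""
    td_target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (td_target - q[s][a])


def expected_sarsa_update(q: List[List[float]], s: int, a: int, r: float,
                          s_next: int, pi_next: Sequence[float],
                          alpha: float, gamma: float) -> None:
    """Expected Sarsa: average q(s', .) over the policy's action probabilities pi_next."""
    expected_q = sum(p * q_sa for p, q_sa in zip(pi_next, q[s_next]))
    td_target = r + gamma * expected_q
    q[s][a] += alpha * (td_target - q[s][a])


def q_learning_update(q: List[List[float]], s: int, a: int, r: float,
                      s_next: int, alpha: float, gamma: float) -> None:
    """Q-learning: the target uses the greedy (max) action value in the next state."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def n_step_sarsa_target(rewards: Sequence[float], q_tail: float,
                        gamma: float) -> float:
    """n-step Sarsa target: n discounted rewards plus gamma^n * q(s_{t+n}, a_{t+n}).

    rewards holds r_{t+1}, ..., r_{t+n}; q_tail is q(s_{t+n}, a_{t+n}).
    With one reward this is the Sarsa target. If rewards run to the end of the
    episode and q_tail is 0, it is the MC return.
    """
    g = q_tail
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Two states, two actions; the numbers are arbitrary illustration values.
    q = [[0.0, 0.0], [1.0, 2.0]]
    sarsa_update(q, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5)
    q_learning_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
    print(q[0])
```

Sarsa and Q-learning are called here with the same reward and next state, so any difference between the two entries of `q[0]` comes only from the target: Sarsa uses the sampled next action, while Q-learning uses the maximum over next actions.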