TD learning
TD learning of state values
It can only estimate the state values of a given policy.
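
To make the state-value update concrete, here is a minimal sketch of one tabular TD(0) step. The function name `td0_update`, the dict-based value table, and the default step size `alpha` and discount `gamma` are illustrative assumptions, not something taken from the post.

```python
def td0_update(v, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to the state-value table v (dict: state -> value).

    The transition (s, r, s_next) is assumed to have been generated by the
    policy being evaluated, which is why only that policy's values are learned.
    """
    td_target = r + gamma * v[s_next]       # bootstrapped estimate of the return
    v[s] = v[s] + alpha * (td_target - v[s])  # move v[s] toward the TD target
    return v[s]


# Hypothetical two-state example.
v = {"s0": 0.0, "s1": 1.0}
td0_update(v, "s0", r=0.5, s_next="s1")
```

Only the visited state `s` is updated; every other entry of the table is left unchanged.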

Sarsa
Given a policy, it estimates that policy's action values.
Sarsa is an action-value version of the TD algorithm
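
As a rough illustration of the action-value version, the sketch below applies one Sarsa update from a single experience tuple (s, a, r, s', a'). The name `sarsa_update`, the dict keyed by `(state, action)` pairs, and the default parameters are assumptions made for this example.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one Sarsa update to the action-value table q (dict: (s, a) -> value).

    a_next is the action actually taken in s_next under the current policy,
    so the update is on-policy.
    """
    td_target = r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (td_target - old)
    return q[(s, a)]


# Hypothetical example: unseen pairs default to 0.0.
q = {("s1", "right"): 2.0}
new_q = sarsa_update(q, "s0", "up", r=1.0, s_next="s1", a_next="right")
```

Compared with the state-value update, the only change is that the table is indexed by state-action pairs and the target uses the next action as well as the next state.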

Expected Sarsa
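A minimal sketch of the Expected Sarsa update, assuming the standard formulation in which the TD target averages the next action values under the policy instead of using one sampled next action. The function name, the `policy(s, a)` callable returning a probability, and the `actions` list are hypothetical.

```python
def expected_sarsa_update(q, policy, s, a, r, s_next, actions,
                          alpha=0.1, gamma=0.9):
    """Apply one Expected Sarsa update to q (dict: (s, a) -> value).

    policy(s, a) is assumed to return the probability of taking a in s.
    """
    # Expectation of q(s_next, .) under the policy replaces the sampled a_next.
    expected_q = sum(policy(s_next, b) * q.get((s_next, b), 0.0) for b in actions)
    td_target = r + gamma * expected_q
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (td_target - old)
    return q[(s, a)]


# Hypothetical example with a uniform policy over two actions.
actions = ["left", "right"]
uniform = lambda s, a: 1.0 / len(actions)
q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
new_q = expected_sarsa_update(q, uniform, "s0", "left", 1.0, "s1", actions)
```

Because the next action is averaged out, the update needs only (s, a, r, s') and typically has lower variance than Sarsa, at the cost of a sum over actions.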


n-step Sarsa
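It includes both MC learning and TD learning (Sarsa) as special cases: n = 1 gives Sarsa, and n → ∞ gives MC learning.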
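
The relationship to MC and TD can be seen in the target alone. The sketch below computes an n-step Sarsa target from a list of rewards plus a bootstrapped action value; the helper name `n_step_target` and its argument layout are assumptions for illustration.

```python
def n_step_target(rewards, q_bootstrap, gamma=0.9):
    """Return the n-step Sarsa target.

    rewards     : [r_{t+1}, ..., r_{t+n}], so n = len(rewards)
    q_bootstrap : current estimate q(s_{t+n}, a_{t+n}); use 0.0 if the
                  episode has already terminated within these n steps
    """
    n = len(rewards)
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted + gamma ** n * q_bootstrap


# n = 1 reduces to the ordinary Sarsa target r + gamma * q(s', a').
one_step = n_step_target([1.0], q_bootstrap=2.0)

# Rewards up to the end of the episode with no bootstrap: the MC return.
mc_return = n_step_target([1.0, 1.0, 1.0], q_bootstrap=0.0, gamma=0.5)
```

Intermediate values of n trade off the bias of bootstrapping against the variance of long sampled returns.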

Q-learning
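A minimal sketch of the tabular Q-learning update, assuming the standard form whose TD target takes the maximum over next actions. The function name, the dict-based table, and the `actions` list are illustrative assumptions.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q (dict: (s, a) -> value).

    The target uses the greedy value in s_next regardless of which action
    the behavior policy takes next, so the update is off-policy.
    """
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    td_target = r + gamma * best_next
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (td_target - old)
    return q[(s, a)]


# Hypothetical example: the greedy next value is 3.0.
actions = ["left", "right"]
q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
new_q = q_learning_update(q, "s0", "left", 1.0, "s1", actions)
```

Whereas Sarsa evaluates a given policy, this update targets the optimal action values directly, which is the main difference among the three action-value algorithms listed here.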

Summary

https://printsdf.dpdns.org/posts/td-learning/ Last updated on 2025-04-13, 376 days ago
printsdf's Blog