A Minimal Getting-Started Guide to Reinforcement Learning for Robot Control

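Deep reinforcement learning has been making a big splash in robotics lately, and interest in the field keeps growing. People keep asking me about it, so I took the opportunity to write this minimal getting-started guide, which mostly answers the question of what to read first. I am still a beginner myself, so please bear with any omissions or mistakes, and feel free to discuss any questions with me. This is the first post in the series. I plan to follow up with more concrete environment setup tutorials and introductory tutorials, and I will try not to flake on them.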

Reinforcement learning / deep learning references

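OpenAI's Spinning Up covers the basic concepts of deep reinforcement learning and the main ideas behind the mainstream algorithms. It comes with companion code that is well suited to beginners, but setting up the environment is a bit painful because it depends on OpenMPI. If you really cannot get the code to run, let it go; being able to read it is good enough. I strongly recommend reading the whole thing.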
https://spinningup.openai.com/en/latest/
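OpenAI's gym (now renamed Gymnasium and hosted by the Farama Foundation, but essentially the same gym) is the classic collection of reinforcement learning environments. It is well worth spending some time playing with.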
https://gymnasium.farama.org/
https://github.com/Farama-Foundation/Gymnasium
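Stable-Baselines3 is a collection of classic and commonly used deep reinforcement learning algorithms, designed to be used together with gym. gym only provides the environment that the agent acts in; Stable-Baselines3 provides the algorithms that learn the policy controlling the agent.
Use it alongside gym and just get familiar with the API. Do not read the implementation, it is too complicated.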
https://stable-baselines3.readthedocs.io/en/master/
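This is a collection of reinforcement learning algorithms written by someone online. It covers roughly the same algorithms as Stable-Baselines3 but is much simpler, which makes it a good fit for beginners. Read it alongside Spinning Up; the code is easy to read and easy to get running.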
https://github.com/Lizhi-sjtu/DRL-code-pytorch
Another good reinforcement learning library:
https://github.com/thu-ml/tianshou
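Finally, I recommend a book called "Reinforcement Learning" (《强化学习》). It is a classic that presents the foundational theory of reinforcement learning systematically. Its material is somewhat dated, but it still helps when you are starting out. Read the first few chapters carefully; the important part is understanding the concepts of dynamic programming, Monte Carlo methods, and temporal-difference learning.
PyTorch is the deep learning framework you have to learn. In my experience the tutorials on the official site are enough, and you can ask GPT about any API you do not understand.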
https://pytorch.org/
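
To make the environment versus policy split concrete, here is a minimal sketch in plain Python. `LineWorld`, `random_policy`, `always_right_policy`, and `run_episode` are hypothetical names invented for this illustration. The class only imitates the Gymnasium-style interface, where `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`; it does not import gymnasium and is not part of any library.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk right along a line to reach the goal.

    It only imitates the Gymnasium-style reset()/step() interface.
    """

    def __init__(self, size=5, max_steps=20):
        self.size = size            # the goal sits at position size - 1
        self.max_steps = max_steps  # time limit before the episode is truncated
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos, {}         # (observation, info)

    def step(self, action):
        # action: 0 = move left, 1 = move right (clamped to the line)
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)
        self.steps += 1
        terminated = self.pos == self.size - 1                         # reached the goal
        truncated = (not terminated) and self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def random_policy(obs, rng):
    """A policy maps an observation to an action; this one ignores obs."""
    return rng.choice([0, 1])


def always_right_policy(obs, rng):
    """A hand-written policy that solves this toy task."""
    return 1


def run_episode(env, policy, seed=0):
    """Run one episode and return (total reward, number of steps)."""
    rng = random.Random(seed)
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs, rng)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward, env.steps


if __name__ == "__main__":
    print("always right:", run_episode(LineWorld(), always_right_policy))
    print("random:", run_episode(LineWorld(), random_policy))
```

The environment owns the state and the reward, the policy only chooses actions, and a learning algorithm's job is to replace the hand-written policy with a learned one. The loop in `run_episode` has the same shape you would write against a real Gymnasium environment.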

Some other resources that may help:
https://docs.anaconda.com/free/miniconda/index.html
https://code.visualstudio.com/docs
https://zh.cppreference.com/w/
https://www.liaoxuefeng.com/wiki/896043488029600
https://www.runoob.com/python3/python3-tutorial.html
https://www.runoob.com/linux/linux-tutorial.html
https://www.runoob.com/docker/docker-tutorial.html

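Those are the reinforcement learning / deep learning resources. I suggest getting comfortable with basic reinforcement learning first, along with Linux, Python, and PyTorch, before moving on to the next step: applying reinforcement learning to robots. I strongly recommend finishing Spinning Up and working through gym. A quick look at the other frameworks is enough; do not spend too much time on algorithm frameworks, since many of them may not fit your use case and they are too complicated. Likewise, do not spend too much time on algorithmic details, because early on you will not need to modify the learning algorithm itself. Spend more time with the simulators and run more real projects. Writing and practicing a lot is the fastest way to improve.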
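
If the Monte Carlo versus temporal-difference distinction from the book recommendation above feels abstract, it can be tried in a few lines before touching any framework. The sketch below is my own illustration, not code from any of the resources listed. It assumes a five-state random walk: states 1 to 5, terminal states 0 and 6, a reward of 1 only when exiting on the right, and no discounting, so the true value of state i is i/6. The names `generate_episode`, `td0`, and `monte_carlo` are hypothetical.

```python
import random

N_STATES = 5  # non-terminal states 1..5; states 0 and 6 are terminal
TRUE_VALUES = [i / 6 for i in range(1, N_STATES + 1)]


def generate_episode(rng):
    """Return a list of (state, reward, next_state), starting from the middle."""
    state = 3
    transitions = []
    while state not in (0, N_STATES + 1):
        next_state = state + rng.choice([-1, 1])
        reward = 1.0 if next_state == N_STATES + 1 else 0.0
        transitions.append((state, reward, next_state))
        state = next_state
    return transitions


def td0(episodes, alpha=0.01, seed=0):
    """TD(0): update each state toward reward + current estimate of the next state."""
    rng = random.Random(seed)
    values = [0.0] * (N_STATES + 2)  # terminal values stay at 0
    for _ in range(episodes):
        for s, r, s_next in generate_episode(rng):
            values[s] += alpha * (r + values[s_next] - values[s])
    return values[1:-1]


def monte_carlo(episodes, alpha=0.01, seed=0):
    """Constant-alpha, every-visit Monte Carlo: update toward the full return."""
    rng = random.Random(seed)
    values = [0.0] * (N_STATES + 2)
    for _ in range(episodes):
        transitions = generate_episode(rng)
        # Undiscounted, and the only nonzero reward is on the last step,
        # so every visited state shares the same return.
        g = sum(r for _, r, _ in transitions)
        for s, _, _ in transitions:
            values[s] += alpha * (g - values[s])
    return values[1:-1]


if __name__ == "__main__":
    print("true:", [round(v, 3) for v in TRUE_VALUES])
    print("TD(0):", [round(v, 3) for v in td0(5000)])
    print("MC:", [round(v, 3) for v in monte_carlo(5000)])
```

The only difference between the two functions is the update target. Monte Carlo waits for the episode to end and uses the actual return, while TD(0) bootstraps from its current estimate of the next state after every step.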

Robot control and simulator references

Ah, finally on to the main topic.

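Simulators for reinforcement learning:
NVIDIA Isaac Gym (comparable to MuJoCo). This is the simulator we mainly use at the moment, so download it and get familiar with it. I have tested it on Ubuntu 20.04, 22.04, and 24.04, and I recommend using Python 3.8.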

Official site: https://developer.nvidia.com/isaac-gym

Mirrored documentation: https://blog.zzshub.cn/legged_gym/

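Installation tutorial: https://zhuanlan.zhihu.com/p/560826876
ETH Raisim
https://raisim.com/ This is another simulator. You can take a look or skip it. Its simulation accuracy is high, but it is slow.
NVIDIA Isaac Sim. This is probably the simulator we will use in the future.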
https://docs.omniverse.nvidia.com/isaacsim/latest/overview.html
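Reinforcement learning frameworks (a second layer of wrapping around the simulator that makes it convenient for reinforcement learning, comparable to Pinocchio):
We currently use ETH's legged_gym, which works together with NVIDIA Isaac Gym.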

Official version: https://github.com/leggedrobotics/legged_gym

My modified version: https://github.com/ZzzzzzS/legged_gym

Classic papers on reinforcement learning for biped/quadruped robots

Algorithm theory

X. B. Peng, Z. Ma, P. Abbeel, S. Levine and A. Kanazawa, “AMP: adversarial motion priors for stylized physics-based character control,” ACM Transactions on Graphics, vol. 40, p. 144:1–144:20, July 2021.
X. B. Peng, P. Abbeel, S. Levine and M. van de Panne, “DeepMimic: example-guided deep reinforcement learning of physics-based character skills,” ACM Transactions on Graphics, vol. 37, p. 143:1–143:14, July 2018.
J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, “Proximal Policy Optimization Algorithms,” 2017.
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, p. 529–533, February 2015.
D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine and K. Hausman, “MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale,” 2021.
E. Jang, A. Irpan, M. Khansari, D. Kappler, F. Ebert, C. Lynch, S. Levine and C. Finn, “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning,” in Proceedings of the 5th Conference on Robot Learning, 2022.
C. Finn, P. Christiano, P. Abbeel and S. Levine, “A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models,” 2016.
A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta and A. A. Bharath, “Generative Adversarial Networks: An Overview,” IEEE Signal Processing Magazine, vol. 35, p. 53–65, January 2018.

Simulation tools

N. Rudin, D. Hoeller, P. Reist and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on Robot Learning, 2022.
V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa and G. State, “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning,” 2021.
M. Körber, J. Lange, S. Rediske, S. Steinmann and R. Glück, “Comparing Popular Simulation Environments in the Scope of Robotics and Reinforcement Learning,” 2021.

Biped/quadruped control

A. Tang, T. Hiraoka, N. Hiraoka, F. Shi, K. Kawaharazuka, K. Kojima, K. Okada and M. Inaba, “HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation,” September 2023.
J. Siekmann, Y. Godse, A. Fern and J. Hurst, “Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021.
T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun and M. Hutter, “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Science Robotics, vol. 7, p. eabk2822, 2022.
Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth and K. Sreenath, “Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control,” January 2024.
J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science Robotics, vol. 5, p. eabc5986, October 2020.
A. Kumar, Z. Fu, D. Pathak and J. Malik, “RMA: Rapid Motor Adaptation for Legged Robots,” 2021.
Y. Jin, X. Liu, Y. Shao, H. Wang and W. Yang, “High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning,” Nature Machine Intelligence, vol. 4, p. 1198–1208, December 2022.
F. Jenelten, J. He, F. Farshidian and M. Hutter, “DTC: Deep Tracking Control,” Science Robotics, vol. 9, p. eadh5401, January 2024.
D. Hoeller, N. Rudin, D. Sako and M. Hutter, “ANYmal parkour: Learning agile navigation for quadrupedal robots,” Science Robotics, vol. 9, p. eadi7566, March 2024.
H. Duan, B. Pandit, M. S. Gadde, B. van Marum, J. Dao, C. Kim and A. Fern, “Learning Vision-Based Bipedal Locomotion for Challenging Terrain,” September 2023.
D. Baek, A. Purushottam and J. Ramos, “Hybrid LMC: Hybrid Learning and Model-based Control for Wheeled Humanoid Robot via Ensemble Deep Reinforcement Learning,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.
I. M. Aswin Nahrendra, B. Yu and H. Myung, “DreamWaQ: Learning Robust Quadrupedal Locomotion With Implicit Terrain Imagination via Deep Reinforcement Learning,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.