While I was writing my Master's thesis at ASL, ETH Zurich, our team wrote a paper on achieving Whole-Body Control of a Mobile Manipulator using End-to-End Reinforcement Learning. Specifically, we used Proximal Policy Optimization (PPO), an actor-critic method that keeps many of the benefits of Trust Region Policy Optimization (TRPO) while being much simpler to implement, since it replaces TRPO's constrained second-order update with a clipped first-order objective.
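For readers unfamiliar with PPO, the core of the method is its clipped surrogate objective, which limits how far the probability ratio between the new and the old policy can move in a single update. Below is a minimal sketch of that objective in plain Python. It is not the code we used for the paper: the function name `ppo_clip_objective` is hypothetical, and the clipping range `clip_eps=0.2` is the default suggested in the original PPO paper, assumed here rather than taken from our experiments.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Hypothetical helper: PPO clipped surrogate objective (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same time steps.
    clip_eps: clipping range; 0.2 is an assumed default, not our setting.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to the interval [1 - clip_eps, 1 + clip_eps]
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    # Average over the batch of samples
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms means the policy gains nothing from pushing the ratio beyond 1 ± `clip_eps` in the direction the advantage favors, which keeps updates conservative without TRPO's explicit KL-divergence constraint. In a full training loop this value would be negated and minimized with a gradient-based optimizer, alongside a value-function loss for the critic.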
We submitted the paper to IROS/RA-L 2020 and posted a preprint on arXiv. A 60-second summary of the paper is available on YouTube.
The full paper can be read on arXiv.