Hello!
I have trained multiple reinforcement learning networks for various problems, ranging from basic OpenAI Gym environments such as CartPole, Pendulum and MountainCar to more advanced ones, such as controlling a real robot in a maze, where the NN learned to control the velocity of each wheel separately, based only on its camera input, in order to navigate successfully through the maze.
I believe I can fix (and train!) your algorithm in just a few hours.
Some of my code can be found on my GitHub: [login to view URL]
Please contact me first to discuss the details.
Regards,
Juraj K.