In this test, the environment is anchored to a specific state from earlier in the episode, so that it always returns to that state after some number of time-steps.
The implementation is linked below.
GitHub: github.com/Rowing0914/TF_RL/blob/master/test/Atari…
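To make the idea concrete, here is a minimal sketch of anchoring, assuming an environment that can snapshot and restore its internal state. This is not the code from the linked test: `ToyEnv`, `clone_state`, `restore_state`, and `run_anchored` are hypothetical names chosen for illustration, and the toy environment is just a counter standing in for an Atari emulator.

```python
class ToyEnv:
    """Hypothetical stand-in for an Atari env: a counter that moves by `action`."""

    def __init__(self):
        self.position = 0
        self.t = 0

    def step(self, action):
        # Advance the environment by one time-step.
        self.position += action
        self.t += 1
        return self.position

    def clone_state(self):
        # Snapshot everything needed to resume from this exact point.
        return (self.position, self.t)

    def restore_state(self, state):
        # Jump back to a previously saved snapshot.
        self.position, self.t = state


def run_anchored(env, n_steps, anchor_after, reset_every):
    """Save an anchor after `anchor_after` steps, then restore it
    every `reset_every` steps from that point on."""
    anchor = None
    visited = []
    for i in range(1, n_steps + 1):
        env.step(+1)
        if i == anchor_after:
            anchor = env.clone_state()
        elif anchor is not None and (i - anchor_after) % reset_every == 0:
            env.restore_state(anchor)
        visited.append(env.position)
    return visited


if __name__ == "__main__":
    # The trajectory keeps looping back to the anchored position (3).
    print(run_anchored(ToyEnv(), n_steps=12, anchor_after=3, reset_every=4))
    # → [1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4]
```

With a real emulator the snapshot would have to capture the full emulator state rather than a pair of integers, but the control flow (save once, restore periodically) would stay the same.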