Many simulated manipulation tasks use a third-person perspective, which offers greater global observability, perhaps because partial observability introduces many challenges for the agent. However, a third-person perspective can also obscure existing symmetries in the task. The authors show that in environments where an eye-in-hand perspective is sufficient for learning, this perspective consistently improves training efficiency and out-of-distribution generalization.
On various manipulation tasks, the authors test three training algorithms: DAgger, DrQ, and DAC. DAgger and DrQ are used to compare performance under various distribution shifts, such as changes in table height, distractor objects, and table texture. DAC is used to compare in-distribution and out-of-distribution generalization. The authors empirically verify that agents trained with an eye-in-hand perspective consistently outperform agents trained with a third-person perspective.
The authors also experiment with using a variational information bottleneck (VIB) to regularize third-person observations in environments where that perspective is necessary, and find that this regularization improves out-of-distribution generalization.
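To give a flavor of what VIB regularization looks like in practice, here is a minimal sketch assuming a PyTorch image encoder; the architecture, latent size, beta value, and imitation-style loss below are illustrative assumptions, not the paper's exact setup. The idea is that the encoder outputs a Gaussian over a latent code, and a beta-weighted KL penalty toward a standard normal prior is added to the policy's training loss, limiting how much observation information the latent can carry.

```python
# Minimal VIB sketch (illustrative; not the paper's architecture or hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VIBEncoder(nn.Module):
    """Encodes an image observation into a stochastic latent z ~ N(mu, sigma^2)."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Lazy linear layers infer the flattened feature size on the first forward pass.
        self.fc_mu = nn.LazyLinear(latent_dim)
        self.fc_logvar = nn.LazyLinear(latent_dim)

    def forward(self, obs: torch.Tensor):
        h = self.conv(obs)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients w.r.t. mu and sigma.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar


def vib_loss(task_loss: torch.Tensor, mu: torch.Tensor,
             logvar: torch.Tensor, beta: float = 1e-3) -> torch.Tensor:
    """Task loss plus beta-weighted KL(q(z | obs) || N(0, I))."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    return task_loss + beta * kl


# Toy usage: imitation-style loss on a batch of third-person image observations.
encoder = VIBEncoder()
policy_head = nn.LazyLinear(7)  # action dimension is illustrative
obs = torch.randn(8, 3, 84, 84)
expert_actions = torch.randn(8, 7)

z, mu, logvar = encoder(obs)
pred_actions = policy_head(z)
loss = vib_loss(F.mse_loss(pred_actions, expert_actions), mu, logvar)
loss.backward()
```

Check the paper to learn more!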