Partial observability is achieved by the introduction of unit sight range, which restricts the agents from receiving information about allied or enemy units that are out of range. Moreover, agents can only observe others if they are alive and cannot distinguish between units that are dead or out of range.
Key to our method is the insight that the full factorisation of VDN is not necessary in order to be able to extract decentralised policies that are fully consistent with their centralised counterpart. Instead, for consistency we only need to ensure that a global argmax performed on yields the same result as a set of individual argmax operations performed on each 我们方法的关键是认识到,为了能够提取与集中式策略完全一致的去中心化策略,不需要对 VDN 进行完全分解。相反,为了保持一致性,我们只需要确保在上执行的全局 argmax 产生与在每个上执行的一组单独的 argmax 操作相同的结果