Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society (2504.17404v3)
Abstract: AI systems are becoming increasingly powerful and autonomous, and may eventually surpass human-level intelligence, reaching Artificial Superintelligence (ASI). During the progression from AI to ASI, such systems may escape human control, violate human values, and, in extreme cases, lead to irreversible catastrophic consequences. This raises a pressing issue: superalignment, i.e., ensuring that AI systems much smarter than humans remain aligned with human (compatible) intentions and values. Even under this somewhat limited definition, existing scalable oversight and weak-to-strong generalization methods may prove largely infeasible and inadequate for superaligning ASI. We must therefore explore a more comprehensive definition of superalignment, along with safer and more pluralistic frameworks and approaches. In this paper, we redefine superalignment as human-AI co-alignment towards a sustainable symbiotic society, and present a framework that integrates external oversight with intrinsic proactive alignment. External oversight superalignment is grounded in human-centered ultimate decision-making, supplemented by interpretable automated evaluation and correction, to achieve continuous alignment with humanity's evolving values. Intrinsic proactive superalignment is rooted in a profound understanding of the Self, others, and society, integrating self-awareness, self-reflection, and empathy to spontaneously infer human intentions, distinguish good from evil, and consider human well-being, ultimately attaining human-AI co-alignment through iterative interaction. The integration of externally driven oversight with intrinsically driven alignment empowers sustainable symbiotic societies through human-AI co-alignment, paving the way towards safe and beneficial AGI/ASI for good, for humans, and for a symbiotic ecology.
- Feifei Zhao
- Yuwei Wang
- Enmeng Lu
- Dongcheng Zhao
- Bing Han
- Haibo Tong
- Yao Liang
- Dongqi Liang
- Kang Sun
- Lei Wang
- Yitao Liang
- Chao Liu
- Yaodong Yang
- Yi Zeng
- Boyuan Chen
- Jinyu Fan