• I am working to develop the “Goals, Methods and Failure Causes” (GMF) taxonomy for AI incidents, in the context of the AI Incident Database. You can find a status summary of the project in this AAAI SafeAI 2023 paper.
  • In AI Safety Camp 2022 I worked on learning and penalizing betrayal patterns in agent communications, in an RL setting with symmetric “observer” and “gatherer” agents. The research and its empirical results can be found in this NeurIPS 2022 ML Safety workshop paper. The codebase is available on GitHub.
  • In the 2022 AGI Safety Fundamentals course I worked on investigations of deceptively aligned agents in a toy environment. You can find the relevant code, results, and ideas for future work/extensions in this GitLab repository.