Solaris Research

Solaris Research is a technical AI safety initiative based at ETH Zürich. We build the science to detect, understand and prevent failures in frontier models.

Frontier models are deployed in new contexts at massive scale, yet we have almost no science for how risk profiles shift as models undergo continuous fine-tuning post-release. We study this gap: tracing alignment-relevant low-dimensional structures inside models and intervening on them using tools such as geometric steering and probe-based constraints.

We insist on a stronger evidential foundation than the field currently demands; too many safety claims rest on narrow benchmarks or unreproducible results. We prioritise explicit measurement design, sufficient scale and tests for alternative explanations, backed by intervention-based validation. The research questions are narrow; the stakes are not.

We work on

Misalignment science. How safety-relevant representations form and degrade across pretraining, fine-tuning, and post-deployment adaptation stages

Active mechanistic interpretability. Steering vectors, circuit analysis and probe-guided detection used proactively as interventions during training

Open benchmarks & evaluations. Tools, datasets, and reproducible benchmarks on emergent misalignment that other researchers can build on

Harness safety. How tool calling, memory and multi-agent coordination in harness systems create or compound anthropomorphised misalignment risks

What’s new

May 2026 AMR position paper was awarded an oral at ICML 2026.

April 2026 Compute grant was accepted by the Swiss AI Initiative.

March 2026 The Solaris Research team was formed at ETH Zürich.

The team

We are a team of safety-driven researchers based at ETH Zürich. We work closely with the LAS group (Prof. Andreas Krause), the IVIA lab (Prof. Menna El-Assady), the SPY Lab (Prof. Florian Tramèr) and the ETH AI Center (Dr. Imanol Schlag) within the Swiss AI ecosystem.

Anna Hedström

Core team

Cynthia Chen

Core team

Lukas Fluri

Core team

Work with us

We work with students, research labs and industry partners who share our commitment to open, evidence-based frontier safety science.

Student thesis, semester & independent projects: work on impactful problems, co-supervised across ETH labs, with a path toward ML publication

Project collaborators: bring a clear safety question; we provide compute and co-mentorship across misalignment, evaluation and interpretability topics

Labs & industry partners: if you deploy ML models in high-stakes settings and need safety audits after adaptation, collaborate with us
