Solaris Research is a technical AI safety initiative based at ETH Zürich. We build the science to detect, understand and prevent failures in frontier models.
Frontier models are deployed in new contexts at massive scale, yet we have almost no science for how risk profiles shift as models undergo continuous fine-tuning post-release. We study this gap: tracing alignment-relevant low-dimensional structures inside models and intervening on them using tools such as geometric steering and probe-based constraints.
We insist on a stronger evidential foundation than the field currently demands; too many safety claims rest on narrow benchmarks or unreproducible results. We prioritise explicit measurement design, sufficient scale and tests for alternative explanations with intervention-based validation. The research questions are narrow; the stakes are not.
We work on
Misalignment science. How safety-relevant representations form and degrade across pretraining, fine-tuning, and post-deployment adaptation stages.
Active mechanistic interpretability. Steering vectors, circuit analysis and probe-guided detection, used proactively as interventions during training.
Open benchmarks & evaluations. Tools, datasets, and reproducible benchmarks on emergent misalignment so other researchers can build on them.
Harness safety. How tool calling, memory and multi-agent coordination in harness systems create or compound anthropomorphised misalignment risks.
April 2026
Compute grant was accepted by the Swiss AI Initiative.
March 2026
Solaris Research was founded at ETH Zürich.
The team
We are a team of safety-driven researchers based at ETH Zürich. We work closely with the LAS group (Prof. Andreas Krause), the IVIA lab (Prof. Menna El-Assady), the SPY Lab (Prof. Florian Tramèr) and the ETH AI Center (Dr. Imanol Schlag) in the Swiss AI ecosystem.