I'm excited about developing AI-based activation monitoring and control mechanisms as fallback defenses against misaligned systems. Traditional alignment research remains essential, but perfect alignment may take decades, if it is achievable at all. I'm particularly interested in detecting scheming models by analyzing their internal activations rather than relying solely on behavioral observations, and potentially combining this with a dynamic activation intervention mechanism (à la Inference-Time Intervention); a minimal sketch of such a monitor-plus-intervention loop follows below. Automated monitoring becomes increasingly important as models scale, since internal representations will likely drift away from human-interpretable features toward alien ones that humans cannot directly monitor. Catching a misaligned AI "red-handed" before catastrophe occurs provides not only critical safety benefits but also political ones: concrete evidence of misalignment could galvanize support for stronger safety measures.

These control-focused techniques are appealing because they could be relatively cheap to implement, could serve as discrete, enforceable standards that could be mandated, and place less of a handicap on capabilities than approaches that force models to use only human-interpretable features, making them more likely to be widely adopted.
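To make the mechanism concrete, here is a minimal PyTorch sketch of what such a monitor-plus-intervention loop might look like: a linear probe, trained offline to flag a hypothesized "scheming" direction, scores the residual stream at one layer, and flagged activations are shifted along a steering vector, loosely in the spirit of Inference-Time Intervention (which operates on attention-head outputs rather than the residual stream). The probe, steering vector, layer choice, and threshold are all illustrative assumptions, not an established implementation.

```python
# Minimal sketch: a linear probe monitors residual-stream activations and,
# when it flags a sequence, shifts activations along a steering direction.
# The probe, steering vector, hooked layer, and thresholds are illustrative
# assumptions, not part of any published implementation.
import torch
import torch.nn as nn


class ActivationMonitor:
    def __init__(self, probe: nn.Linear, steer_dir: torch.Tensor,
                 threshold: float = 0.9, alpha: float = 5.0):
        self.probe = probe                             # trained offline on labeled activations
        self.steer_dir = steer_dir / steer_dir.norm()  # unit-norm steering direction
        self.threshold = threshold                     # probe score that triggers intervention
        self.alpha = alpha                             # intervention strength

    @torch.no_grad()
    def hook(self, module, inputs, output):
        # Transformer blocks often return tuples; activations come first.
        hidden = output[0] if isinstance(output, tuple) else output
        # Probe the last token's activation: (batch, d_model) -> (batch,).
        score = torch.sigmoid(self.probe(hidden[:, -1, :])).squeeze(-1)
        flagged = score > self.threshold
        if flagged.any():
            hidden = hidden.clone()
            # Push flagged sequences away from the probed direction.
            hidden[flagged] -= self.alpha * self.steer_dir
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden


# Usage with a HuggingFace-style transformer (the layer index is arbitrary):
# monitor = ActivationMonitor(probe, steer_dir)
# handle = model.transformer.h[20].register_forward_hook(monitor.hook)
# ...generate as usual; call handle.remove() to detach the monitor.
```

One design note: putting the monitor in a forward hook keeps it cheap and model-agnostic, which matters for the adoption argument above, since the base model needs no retraining and the check can be switched on or off per deployment.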