Improving instruction hierarchy in frontier LLMs

IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *