
New evaluation run

Set a master prompt, let the rubric generate questions, and score candidate answers, all persisted as one run.

Live evaluation requires Supabase and an OpenAI API key.

1 dimension · 1 scored by LLM judge · 0 safety gates

Each case is scored separately · subject to the daily budget cap