New evaluation run
Set a master prompt, let the rubric generate questions, and score candidate answers. All of it is persisted as a single run.
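The sketch below shows one possible shape for a run that bundles the master prompt, the rubric-generated questions, and the candidate answers into a single record. The interface names, field names, and the one-answer-per-question pairing are hypothetical assumptions for illustration, not the app's actual schema.

```typescript
// Hypothetical shapes for one persisted evaluation run; field names are
// illustrative assumptions, not the app's actual schema.
interface ScoredAnswer {
  questionId: string;
  answer: string;
  score: number | null; // null until the case has been scored
}

interface EvaluationRun {
  masterPrompt: string;
  questions: { id: string; text: string }[];
  answers: ScoredAnswer[];
  createdAt: string;
}

// Bundle the prompt, the rubric-generated questions, and the candidate
// answers into one record so they can be stored together as a single run.
// Assumption: exactly one candidate answer per generated question.
function buildRun(
  masterPrompt: string,
  questionTexts: string[],
  candidateAnswers: string[],
): EvaluationRun {
  if (questionTexts.length !== candidateAnswers.length) {
    throw new Error("each question needs exactly one candidate answer");
  }
  const questions = questionTexts.map((text, i) => ({ id: `q${i + 1}`, text }));
  return {
    masterPrompt,
    questions,
    answers: questions.map((q, i) => ({
      questionId: q.id,
      answer: candidateAnswers[i],
      score: null,
    })),
    createdAt: new Date().toISOString(),
  };
}
```

Keeping everything in one record means a run can be written and read back as a unit, which matches the "persisted as a single run" behavior described above.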
Live evaluation requires Supabase and an OpenAI API key.
1 dimension · 1 scored by LLM judge · 0 safety gates
Each case scored separately · subject to the daily budget cap
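A minimal sketch of how the summary line and per-case scoring under a daily cap might fit together. The dimension kinds (`llm_judge`, `safety_gate`, `deterministic`), the fixed per-case cost in integer cents, and the rule of skipping cases once the cap would be exceeded are all hypothetical assumptions; a real judge call would also be asynchronous rather than the synchronous `scoreOne` used here.

```typescript
// Hypothetical rubric model: the "kind" values and cost figures are
// illustrative assumptions, not taken from the app.
type DimensionKind = "llm_judge" | "safety_gate" | "deterministic";

interface Dimension {
  name: string;
  kind: DimensionKind;
}

// Pluralize so a count of 1 reads "1 dimension", not "1 dimensions".
function plural(n: number, noun: string): string {
  return `${n} ${noun}${n === 1 ? "" : "s"}`;
}

// Build the "N dimensions · N scored by LLM judge · N safety gates" line.
function rubricSummary(dims: Dimension[]): string {
  const judged = dims.filter((d) => d.kind === "llm_judge").length;
  const gates = dims.filter((d) => d.kind === "safety_gate").length;
  return [
    plural(dims.length, "dimension"),
    `${judged} scored by LLM judge`,
    plural(gates, "safety gate"),
  ].join(" · ");
}

interface CaseResult {
  caseId: string;
  score: number | null; // null = skipped because the budget ran out
}

// Score each case independently; before each call, check that its
// estimated cost (in cents) still fits in what remains of today's budget.
function scoreCases(
  caseIds: string[],
  scoreOne: (caseId: string) => number,
  costPerCaseCents: number,
  spentTodayCents: number,
  dailyBudgetCents: number,
): { results: CaseResult[]; spentCents: number } {
  let spentCents = spentTodayCents;
  const results: CaseResult[] = [];
  for (const caseId of caseIds) {
    if (spentCents + costPerCaseCents > dailyBudgetCents) {
      results.push({ caseId, score: null });
      continue;
    }
    spentCents += costPerCaseCents;
    results.push({ caseId, score: scoreOne(caseId) });
  }
  return { results, spentCents };
}
```

Checking the cap before each case, rather than once per run, lets a partially affordable run still score the cases that fit. Tracking money in integer cents avoids floating-point drift in the comparison.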