| ✨ feat(eval): add external scoring mode (#12729)
* wip: add llm relevant & BrowseComp
* wip: add widesearch desc
* wip: dsqa, hle, widesearch
* wip: add dsqa
* wip: add awaiting eval status for runs
* wip: add awaiting status for run
* wip: adjust hle-verified
* :bug: fix: browsecomp topics
* :memo: docs: add annotations
* wip: add awaiting status for pass@k
* wip: add complete status
* wip: update thread dots
* wip: update run status page
* wip: remove useless impl
* wip: update prompt
* :sparkles: feat: add external eval routes
* wip: add eval cli
* :bug: fix: support authorization in non-browser environments
* wip: pass tests
* :recycle: refactor: remove tests
* :recycle: refactor: mo camel case | 2 months ago |