| Tasks |Version|Filter|n-shot| Metric |Direction|Value | |Stderr|
|---------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.6040|± |0.0310|
| | |none | 0|acc_norm|↑ |0.6040|± |0.0310|
|arc_easy | 1|none | 0|acc |↑ |0.8320|± |0.0237|
| | |none | 0|acc_norm|↑ |0.7920|± |0.0257|
|boolq | 2|none | 0|acc |↑ |0.8920|± |0.0197|
|hellaswag | 1|none | 0|acc |↑ |0.5800|± |0.0313|
| | |none | 0|acc_norm|↑ |0.7560|± |0.0272|
|mmlu | 2|none | |acc |↑ |0.8089|± |0.0037|
| - humanities | 2|none | |acc |↑ |0.7635|± |0.0082|
| - formal_logic | 1|none | 0|acc |↑ |0.6587|± |0.0424|
| - high_school_european_history | 1|none | 0|acc |↑ |0.8182|± |0.0301|
| - high_school_us_history | 1|none | 0|acc |↑ |0.8578|± |0.0245|
| - high_school_world_history | 1|none | 0|acc |↑ |0.7975|± |0.0262|
| - international_law | 1|none | 0|acc |↑ |0.8430|± |0.0332|
| - jurisprudence | 1|none | 0|acc |↑ |0.8796|± |0.0315|
| - logical_fallacies | 1|none | 0|acc |↑ |0.8834|± |0.0252|
| - moral_disputes | 1|none | 0|acc |↑ |0.8160|± |0.0246|
| - moral_scenarios | 1|none | 0|acc |↑ |0.5520|± |0.0315|
| - philosophy | 1|none | 0|acc |↑ |0.7400|± |0.0278|
| - prehistory | 1|none | 0|acc |↑ |0.8200|± |0.0243|
| - professional_law | 1|none | 0|acc |↑ |0.6120|± |0.0309|
| - world_religions | 1|none | 0|acc |↑ |0.7895|± |0.0313|
| - other | 2|none | |acc |↑ |0.8105|± |0.0076|
| - business_ethics | 1|none | 0|acc |↑ |0.8500|± |0.0359|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.8800|± |0.0206|
| - college_medicine | 1|none | 0|acc |↑ |0.8208|± |0.0292|
| - global_facts | 1|none | 0|acc |↑ |0.5000|± |0.0503|
| - human_aging | 1|none | 0|acc |↑ |0.8027|± |0.0267|
| - management | 1|none | 0|acc |↑ |0.8835|± |0.0318|
| - marketing | 1|none | 0|acc |↑ |0.8419|± |0.0239|
| - medical_genetics | 1|none | 0|acc |↑ |0.8900|± |0.0314|
| - miscellaneous | 1|none | 0|acc |↑ |0.8640|± |0.0217|
| - nutrition | 1|none | 0|acc |↑ |0.8680|± |0.0215|
| - professional_accounting | 1|none | 0|acc |↑ |0.7040|± |0.0289|
| - professional_medicine | 1|none | 0|acc |↑ |0.9160|± |0.0176|
| - virology | 1|none | 0|acc |↑ |0.5663|± |0.0386|
| - social sciences | 2|none | |acc |↑ |0.8838|± |0.0065|
| - econometrics | 1|none | 0|acc |↑ |0.6491|± |0.0449|
| - high_school_geography | 1|none | 0|acc |↑ |0.9394|± |0.0170|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.9741|± |0.0115|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.8920|± |0.0197|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.9412|± |0.0153|
| - high_school_psychology | 1|none | 0|acc |↑ |0.9720|± |0.0105|
| - human_sexuality | 1|none | 0|acc |↑ |0.8550|± |0.0309|
| - professional_psychology | 1|none | 0|acc |↑ |0.8600|± |0.0220|
| - public_relations | 1|none | 0|acc |↑ |0.7727|± |0.0401|
| - security_studies | 1|none | 0|acc |↑ |0.7837|± |0.0264|
| - sociology | 1|none | 0|acc |↑ |0.9005|± |0.0212|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.9200|± |0.0273|
| - stem | 2|none | |acc |↑ |0.7888|± |0.0072|
| - abstract_algebra | 1|none | 0|acc |↑ |0.6200|± |0.0488|
| - anatomy | 1|none | 0|acc |↑ |0.7481|± |0.0375|
| - astronomy | 1|none | 0|acc |↑ |0.9276|± |0.0211|
| - college_biology | 1|none | 0|acc |↑ |0.9375|± |0.0202|
| - college_chemistry | 1|none | 0|acc |↑ |0.6000|± |0.0492|
| - college_computer_science | 1|none | 0|acc |↑ |0.7400|± |0.0441|
| - college_mathematics | 1|none | 0|acc |↑ |0.7100|± |0.0456|
| - college_physics | 1|none | 0|acc |↑ |0.6765|± |0.0466|
| - computer_security | 1|none | 0|acc |↑ |0.8000|± |0.0402|
| - conceptual_physics | 1|none | 0|acc |↑ |0.9277|± |0.0169|
| - electrical_engineering | 1|none | 0|acc |↑ |0.8069|± |0.0329|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.7520|± |0.0274|
| - high_school_biology | 1|none | 0|acc |↑ |0.9400|± |0.0151|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.8030|± |0.0280|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.8500|± |0.0359|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.5520|± |0.0315|
| - high_school_physics | 1|none | 0|acc |↑ |0.7748|± |0.0341|
| - high_school_statistics | 1|none | 0|acc |↑ |0.8333|± |0.0254|
| - machine_learning | 1|none | 0|acc |↑ |0.7946|± |0.0383|
|openbookqa | 1|none | 0|acc |↑ |0.3280|± |0.0298|
| | |none | 0|acc_norm|↑ |0.4720|± |0.0316|
|rte | 1|none | 0|acc |↑ |0.8200|± |0.0243|
|winogrande | 1|none | 0|acc |↑ |0.7680|± |0.0268|
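
The Stderr column can be turned into an approximate confidence interval when comparing rows. The sketch below assumes a normal approximation (value ± 1.96 × stderr for roughly 95% coverage). That is a common convention, not something the table itself states. The helper name `confidence_interval` is hypothetical, and the three sample rows are copied from the table.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) for value ± z * stderr, clipped to the [0, 1] accuracy range.

    Assumption: a normal approximation, so z = 1.96 gives roughly 95% coverage.
    """
    half_width = z * stderr
    return max(0.0, value - half_width), min(1.0, value + half_width)


# (value, stderr) pairs copied from the table.
rows = {
    "arc_challenge (acc)": (0.6040, 0.0310),
    "mmlu (acc)": (0.8089, 0.0037),
    "openbookqa (acc_norm)": (0.4720, 0.0316),
}

for task, (value, stderr) in rows.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task}: {value:.4f} [{low:.4f}, {high:.4f}]")
```

Rows whose intervals overlap heavily, which is typical for the individual MMLU subtasks given their larger standard errors, should not be read as meaningfully different under this approximation.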