|                 Tasks                 |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge                          |      1|none  |     0|acc     |↑  |0.6040|±  |0.0310|
|                                       |       |none  |     0|acc_norm|↑  |0.6040|±  |0.0310|
|arc_easy                               |      1|none  |     0|acc     |↑  |0.8320|±  |0.0237|
|                                       |       |none  |     0|acc_norm|↑  |0.7920|±  |0.0257|
|boolq                                  |      2|none  |     0|acc     |↑  |0.8920|±  |0.0197|
|hellaswag                              |      1|none  |     0|acc     |↑  |0.5800|±  |0.0313|
|                                       |       |none  |     0|acc_norm|↑  |0.7560|±  |0.0272|
|mmlu                                   |      2|none  |      |acc     |↑  |0.8089|±  |0.0037|
| - humanities                          |      2|none  |      |acc     |↑  |0.7635|±  |0.0082|
|  - formal_logic                       |      1|none  |     0|acc     |↑  |0.6587|±  |0.0424|
|  - high_school_european_history       |      1|none  |     0|acc     |↑  |0.8182|±  |0.0301|
|  - high_school_us_history             |      1|none  |     0|acc     |↑  |0.8578|±  |0.0245|
|  - high_school_world_history          |      1|none  |     0|acc     |↑  |0.7975|±  |0.0262|
|  - international_law                  |      1|none  |     0|acc     |↑  |0.8430|±  |0.0332|
|  - jurisprudence                      |      1|none  |     0|acc     |↑  |0.8796|±  |0.0315|
|  - logical_fallacies                  |      1|none  |     0|acc     |↑  |0.8834|±  |0.0252|
|  - moral_disputes                     |      1|none  |     0|acc     |↑  |0.8160|±  |0.0246|
|  - moral_scenarios                    |      1|none  |     0|acc     |↑  |0.5520|±  |0.0315|
|  - philosophy                         |      1|none  |     0|acc     |↑  |0.7400|±  |0.0278|
|  - prehistory                         |      1|none  |     0|acc     |↑  |0.8200|±  |0.0243|
|  - professional_law                   |      1|none  |     0|acc     |↑  |0.6120|±  |0.0309|
|  - world_religions                    |      1|none  |     0|acc     |↑  |0.7895|±  |0.0313|
| - other                               |      2|none  |      |acc     |↑  |0.8105|±  |0.0076|
|  - business_ethics                    |      1|none  |     0|acc     |↑  |0.8500|±  |0.0359|
|  - clinical_knowledge                 |      1|none  |     0|acc     |↑  |0.8800|±  |0.0206|
|  - college_medicine                   |      1|none  |     0|acc     |↑  |0.8208|±  |0.0292|
|  - global_facts                       |      1|none  |     0|acc     |↑  |0.5000|±  |0.0503|
|  - human_aging                        |      1|none  |     0|acc     |↑  |0.8027|±  |0.0267|
|  - management                         |      1|none  |     0|acc     |↑  |0.8835|±  |0.0318|
|  - marketing                          |      1|none  |     0|acc     |↑  |0.8419|±  |0.0239|
|  - medical_genetics                   |      1|none  |     0|acc     |↑  |0.8900|±  |0.0314|
|  - miscellaneous                      |      1|none  |     0|acc     |↑  |0.8640|±  |0.0217|
|  - nutrition                          |      1|none  |     0|acc     |↑  |0.8680|±  |0.0215|
|  - professional_accounting            |      1|none  |     0|acc     |↑  |0.7040|±  |0.0289|
|  - professional_medicine              |      1|none  |     0|acc     |↑  |0.9160|±  |0.0176|
|  - virology                           |      1|none  |     0|acc     |↑  |0.5663|±  |0.0386|
| - social sciences                     |      2|none  |      |acc     |↑  |0.8838|±  |0.0065|
|  - econometrics                       |      1|none  |     0|acc     |↑  |0.6491|±  |0.0449|
|  - high_school_geography              |      1|none  |     0|acc     |↑  |0.9394|±  |0.0170|
|  - high_school_government_and_politics|      1|none  |     0|acc     |↑  |0.9741|±  |0.0115|
|  - high_school_macroeconomics         |      1|none  |     0|acc     |↑  |0.8920|±  |0.0197|
|  - high_school_microeconomics         |      1|none  |     0|acc     |↑  |0.9412|±  |0.0153|
|  - high_school_psychology             |      1|none  |     0|acc     |↑  |0.9720|±  |0.0105|
|  - human_sexuality                    |      1|none  |     0|acc     |↑  |0.8550|±  |0.0309|
|  - professional_psychology            |      1|none  |     0|acc     |↑  |0.8600|±  |0.0220|
|  - public_relations                   |      1|none  |     0|acc     |↑  |0.7727|±  |0.0401|
|  - security_studies                   |      1|none  |     0|acc     |↑  |0.7837|±  |0.0264|
|  - sociology                          |      1|none  |     0|acc     |↑  |0.9005|±  |0.0212|
|  - us_foreign_policy                  |      1|none  |     0|acc     |↑  |0.9200|±  |0.0273|
| - stem                                |      2|none  |      |acc     |↑  |0.7888|±  |0.0072|
|  - abstract_algebra                   |      1|none  |     0|acc     |↑  |0.6200|±  |0.0488|
|  - anatomy                            |      1|none  |     0|acc     |↑  |0.7481|±  |0.0375|
|  - astronomy                          |      1|none  |     0|acc     |↑  |0.9276|±  |0.0211|
|  - college_biology                    |      1|none  |     0|acc     |↑  |0.9375|±  |0.0202|
|  - college_chemistry                  |      1|none  |     0|acc     |↑  |0.6000|±  |0.0492|
|  - college_computer_science           |      1|none  |     0|acc     |↑  |0.7400|±  |0.0441|
|  - college_mathematics                |      1|none  |     0|acc     |↑  |0.7100|±  |0.0456|
|  - college_physics                    |      1|none  |     0|acc     |↑  |0.6765|±  |0.0466|
|  - computer_security                  |      1|none  |     0|acc     |↑  |0.8000|±  |0.0402|
|  - conceptual_physics                 |      1|none  |     0|acc     |↑  |0.9277|±  |0.0169|
|  - electrical_engineering             |      1|none  |     0|acc     |↑  |0.8069|±  |0.0329|
|  - elementary_mathematics             |      1|none  |     0|acc     |↑  |0.7520|±  |0.0274|
|  - high_school_biology                |      1|none  |     0|acc     |↑  |0.9400|±  |0.0151|
|  - high_school_chemistry              |      1|none  |     0|acc     |↑  |0.8030|±  |0.0280|
|  - high_school_computer_science       |      1|none  |     0|acc     |↑  |0.8500|±  |0.0359|
|  - high_school_mathematics            |      1|none  |     0|acc     |↑  |0.5520|±  |0.0315|
|  - high_school_physics                |      1|none  |     0|acc     |↑  |0.7748|±  |0.0341|
|  - high_school_statistics             |      1|none  |     0|acc     |↑  |0.8333|±  |0.0254|
|  - machine_learning                   |      1|none  |     0|acc     |↑  |0.7946|±  |0.0383|
|openbookqa                             |      1|none  |     0|acc     |↑  |0.3280|±  |0.0298|
|                                       |       |none  |     0|acc_norm|↑  |0.4720|±  |0.0316|
|rte                                    |      1|none  |     0|acc     |↑  |0.8200|±  |0.0243|
|winogrande                             |      1|none  |     0|acc     |↑  |0.7680|±  |0.0268|
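For the 0/1 `acc` rows, the Stderr column can be reproduced from Value and the number of evaluated examples. It matches the standard error of the mean: the sample standard deviation (n − 1 denominator) divided by √n, which for binary scores reduces to √(p(1 − p)/(n − 1)). The sketch below checks this for a few rows. It makes two assumptions that the table does not state. First, most tasks were evaluated on 250 examples, which is inferred from the reported values and is consistent with a run limited to 250 examples per task. Second, `formal_logic` used its full 126-question MMLU test split. The aggregate group rows (`mmlu`, `humanities`, `other`, `social sciences`, `stem`) combine subtasks of different sizes and are not covered here.

```python
import math
import statistics

def mean_stderr(scores):
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return statistics.stdev(scores) / math.sqrt(len(scores))

def binary_stderr(p, n):
    # Same quantity in closed form for 0/1 scores: sqrt(p * (1 - p) / (n - 1)).
    return math.sqrt(p * (1 - p) / (n - 1))

# (task, reported acc, ASSUMED example count n, reported stderr)
ROWS = [
    ("arc_challenge", 0.6040, 250, 0.0310),  # n = 250 is inferred, not stated
    ("boolq", 0.8920, 250, 0.0197),          # n = 250 is inferred, not stated
    ("formal_logic", 0.6587, 126, 0.0424),   # assumes the full MMLU test split
]

for task, acc, n, reported in ROWS:
    correct = round(acc * n)                      # recover the count of correct answers
    scores = [1] * correct + [0] * (n - correct)  # per-example 0/1 accuracy
    p = correct / n
    print(f"{task}: {mean_stderr(scores):.4f} {binary_stderr(p, n):.4f} (reported {reported:.4f})")
```

Both functions agree with each other and with the table to four decimal places. This also shows why the single-task rows carry stderr values of roughly 0.02 to 0.05: with only a few hundred examples, accuracy differences of a few points between subtasks are within sampling noise.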