| feat(benchmarks): enhance sample generation and reporting with new configuration options and Excel output support | 1 month ago |
| feat: introduce stream timeout configuration for benchmarks with BENCH_STREAM_TIMEOUT_MS support | 1 month ago |
| feat(benchmarks): enhance sample generation and reporting with new configuration options and Excel output support | 1 month ago |
| feat(benchmarks): implement LLM-as-a-Judge feature for enhanced sample evaluation and reporting, refactor environment variable handling, and improve utility functions | 2 months ago |
| feat(benchmarks): add metrics for first observable component timing and enhance reporting with benchmark duration | 2 months ago |
| refactor(fs-paths): streamline formatting function and resolveSamplesDir logic | 1 month ago |
| feat(benchmarks): enhance sample generation and reporting with new configuration options and Excel output support | 1 month ago |
| feat(benchmarks): implement LLM-as-a-Judge feature for enhanced sample evaluation and reporting, refactor environment variable handling, and improve utility functions | 2 months ago |
| docs: update README.md and related files to clarify BENCH_MAAS_MODELS_PATH usage and improve model resolution logic | 1 month ago |
| feat(benchmarks): implement LLM-as-a-Judge feature for enhanced sample evaluation and reporting, refactor environment variable handling, and improve utility functions | 2 months ago |
| docs: update README.md and related files to clarify BENCH_MAAS_MODELS_PATH usage and improve model resolution logic | 1 month ago |
| feat: enhance model handling in benchmarks by introducing primary model resolution and updating README for clarity | 1 month ago |
| feat(benchmarks): add metrics for first observable component timing and enhance reporting with benchmark duration | 2 months ago |
| feat: introduce stream timeout configuration for benchmarks with BENCH_STREAM_TIMEOUT_MS support | 1 month ago |
| feat(benchmarks): add TPOT calculation and reporting enhancements, update README for clarity, and refine judge scoring system | 2 months ago |