Last commit | Last updated
SvelteKit-based WebUI (#14839) | 8 months ago
server : include usage statistics only when the user requests them (#16052) | 8 months ago
  * server : include usage statistics only when the user requests them
    When serving the OpenAI-compatible API, check whether {"stream_options": {"include_usage": true}} is set in the request before sending usage statistics.
    closes: #16048
  * add unit test
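The rule in #16052 comes down to one check on the request body. The following is a hypothetical Python illustration of that check, not the server's actual code (the llama.cpp server is written in C++). The helper name `should_include_usage` is an assumption, and so is the choice to treat any value other than a literal `true` as opting out.

```python
from typing import Any, Mapping


def should_include_usage(request: Mapping[str, Any]) -> bool:
    """Return True only when the client explicitly opted in to usage stats.

    Hypothetical helper illustrating the rule described in #16052: usage
    statistics are sent only if the request body contains
    {"stream_options": {"include_usage": true}}.
    """
    stream_options = request.get("stream_options")
    # A missing or non-object stream_options means the client did not opt in.
    if not isinstance(stream_options, Mapping):
        return False
    # Assumption: require a literal boolean true; absent, false, or any
    # other value is treated as "do not send usage statistics".
    return stream_options.get("include_usage") is True


# Usage examples
print(should_include_usage({"stream": True}))  # False
print(should_include_usage({"stream": True, "stream_options": {"include_usage": True}}))  # True
print(should_include_usage({"stream": True, "stream_options": {"include_usage": False}}))  # False
```

Defaulting to "no usage" when the option is absent matches the commit's intent that statistics are included only when the user requests them.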
server : Support multimodal completion and embeddings prompts in JSON format (#15108) | 9 months ago
  - Use server_tokens in more places in server and util.cpp
  - Convert most functions that used llama_tokens to server_tokens
  - Modify the input tokenizer to handle JSON objects as subprompts
  - Break out MTMD prompt parsing into a utility function
  - Support JSON objects with multimodal_data arrays for MTMD prompts, alongside the other existing prompt types
  - Add a capability to the model endpoint indicating whether the client can send multimodal data
  - Add tests
llama: use FA + max. GPU layers by default (#15434) | 9 months ago
  * llama: use max. GPU layers by default, auto -fa
  * ggml-backend: abort instead of segfault
server : disable context shift by default (#15416) | 9 months ago
  * server : disable context shift by default
    ggml-ci
  * server : make scope of test parameters local
server : speed up tests (#15836) | 8 months ago
  * server : speed up tests
  * clean up
  * restore timeout_seconds in some places
  * flake8
  * explicit offline