| feat: add configurable OC metadata header and fix disabled-lock Get paths. Add the oc_metadata_header gflag so operators can disable the shm lock frame. When the flag is false the worker sets metadataSize=0, buffers carry a DisabledLock, and Buffer::RLatch() returns K_NOT_SUPPORTED. Introduce Lock::IsSupported() and Buffer::CopyDataWithRLatch() so all SDK-internal Get-and-copy paths (C++ KV, C API, Java KV, Python KV) transparently skip latching when the lock is disabled, while explicit user RLatch() calls still surface K_NOT_SUPPORTED. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 1 day ago |
| excl conn err handling | 6 months ago |
| feat[UB]: Support UB data transfer for Client to Worker Put and Get requests | 3 months ago |
| Modify urma connect method | 5 months ago |
| feat(rpc): add zmq latency metrics | 1 month ago |
| excl conn err handling | 6 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| feat(rpc): add zmq latency metrics | 1 month ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| feat(log): implement unified random log sampling system. Replace LogRateLimiter's first-N rate limiting with LogSampler's unified random hash-threshold sampling across request/access/diagnostic dimensions. Key changes: - Introduce LogSampler core component with per-request sampling granularity - Implement LogSampleState RPC propagation for cross-process consistency - Add AccessRecorderGuard RAII pattern to skip construction when dropped - Support request_sample_rate/access_sample_rate/diagnostic_sample_rate (0.0-1.0) - Ensure FATAL/CHECK logs are never dropped regardless of sampling - Remove log_rate_limit gflag and LogRateLimiter class entirely - Reserve proto field 24 in RegisterClientRspPb (old log_rate_limit removed) Performance guarantees: - Hot paths (ShouldCreateLogMessage/ShouldCreatePlogMessage) have zero overhead when request is sampled out: no allocations, locks, CAS, RNG, or clock reads - Reject decision gates diagnostic/access log creation early Documentation: - Update docs/source_zh_cn/appendix/log_guide.md with sampling section - Remove obsolete DATASYSTEM_LOG_RATE_LIMIT from client_env_guide.md - Update .repo_context/ with LogSampler design (log-sampler-design.md) Tests: - UT: log_sampler_test, log_sampler_access_test, log_sampler_config_test, log_sampler_integration_test (cover LS-001 to LS-016 externally observable scenarios, 6 retained tests for external features, 14 internal-implementation tests removed) - ST: log_sampler_st_test (cover LS-002b/012/013 cross-process behavior) - Performance: log_performance_test benchmarks (LS-010/011) Breaking changes: - log_rate_limit gflag removed (reserved field 24 in proto) - DATASYSTEM_LOG_RATE_LIMIT env var removed (client receives config via RPC) - EnsureRequestSampleDecision() removed (use ShouldRecordAccess + Record backstop) | 9 days ago |
| Adjust urma log and add perf trace log. | 7 days ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| set nice for io thread | 1 month ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| feat(zmq): add metrics for ZMQ I/O fault isolation and performance profiling. Add 13 ZMQ metrics (IDs 100-113) using the datasystem::metrics framework to enable fault isolation and RPC framework performance self-proof, add tests. | 2 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| add bazel build | 2 months ago |
| Exclusive Connection Feature. | 6 months ago |
| set nice for io thread | 1 month ago |
| feat(zmq): add metrics for ZMQ I/O fault isolation and performance profiling. Add 13 ZMQ metrics (IDs 100-113) using the datasystem::metrics framework to enable fault isolation and RPC framework performance self-proof, add tests. | 2 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| fix: avoid ZMQ timeout deduction from cross-host clock skew | 1 month ago |
| fix(worker): change thread pool resource metrics from instantaneous snapshots to interval-cumulative metrics. The thread pool metrics in the resource log (idle/total/max/waiting/usage) were collected as instantaneous snapshots; with a 5-second sampling interval they almost always read idle=total, waiting=0, usage=0, which has no observational value. Switch to interval-cumulative metrics (maxRunning/total/tasksDelta/maxWaiting/usage): - maxRunning: peak number of concurrently running threads in the interval - total: current total number of threads - tasksDelta: number of tasks completed in the interval - maxWaiting: peak queue length in the interval - usage: actual utilization over the interval (totalWorkTimeNs / (total × intervalNs)) Main changes: - ThreadPool adds 4 atomic counters and a GetAndResetIntervalStats() method - DoThreadWork() adds task timing and interval counting - Add GetRpcServicesSnapshot() for liveness checks (does not reset) - The resource collection path uses GetAndResetIntervalStats() (resetting) - At collection time, seed the next interval with the current running/waiting values to handle tasks that span intervals - The master async thread pool also switches to the interval interface - The report script is adapted to the new field format | 1 month ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| feat(rpc): add zmq latency metrics | 1 month ago |
| fix: add pre-time validation in CheckRpcLatencyAfter* to prevent false slow-log positives. The four CheckRpcLatencyAfter* methods computed latency deltas without verifying the earlier tick timestamp was actually set (remained 0 if the tick was missing from latency_ticks). A zero pre-time caused subtraction to produce a bogus huge value, always exceeding the threshold. Add > 0 guard on the pre-time before computing the delta, matching the pattern already used in RecordServerLatencyMetrics. | 1 month ago |
| fix: add queue latency metrics to ZMQ RPC layer. Add CLIENT_STUB_SEND tick and RecordLatencyMetric helper to measure queue latency in microseconds. Includes corresponding test files for queue latency validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | 1 month ago |
| feat(zmq): add metrics for ZMQ I/O fault isolation and performance profiling. Add 13 ZMQ metrics (IDs 100-113) using the datasystem::metrics framework to enable fault isolation and RPC framework performance self-proof, add tests. | 2 months ago |
| fix: fix fd leak when initfrontend failed | 2 months ago |
| feat(zmq): add metrics for ZMQ I/O fault isolation and performance profiling. Add 13 ZMQ metrics (IDs 100-113) using the datasystem::metrics framework to enable fault isolation and RPC framework performance self-proof, add tests. | 2 months ago |
| Support IPv6 | 6 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| fix: bazel build enable perf | 2 months ago |
| Hi all, this is yuanrong datasystem | 7 months ago |
| Exclusive Connection Feature. | 6 months ago |
| Fix serial execution caused by global write lock when connecting different nodes. Replace connMap_ value type from weak_ptr<ZmqBaseStubConn> to shared_ptr<ZmqConnEntry> where each entry holds its own mutex. GetOrCreateConnEntry() now only uses the global connMux_ lock for map lookup/insert (short duration), then acquires the per-entry mutex for connection creation. This allows concurrent connection setup to different nodes while still serializing access to the same node's connection. | 20 days ago |
| Fix serial execution caused by global write lock when connecting different nodes. Replace connMap_ value type from weak_ptr<ZmqBaseStubConn> to shared_ptr<ZmqConnEntry> where each entry holds its own mutex. GetOrCreateConnEntry() now only uses the global connMux_ lock for map lookup/insert (short duration), then acquires the per-entry mutex for connection creation. This allows concurrent connection setup to different nodes while still serializing access to the same node's connection. | 20 days ago |
| feat(rpc): add zmq latency metrics | 1 month ago |
| feat(rpc): add zmq latency metrics | 1 month ago |
| feat(rpc): add zmq latency metrics | 1 month ago |
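
The commit "Fix serial execution caused by global write lock when connecting different nodes" describes a two-level locking scheme: a short-lived global lock for map lookup/insert, then a per-entry mutex for connection creation. A minimal sketch of that pattern follows. It is not the repository's code: `Conn`, `ConnEntry`, `ConnTable`, `GetOrCreate`, and `Size` are hypothetical names chosen for illustration, and the real `ZmqConnEntry` and `GetOrCreateConnEntry()` may differ in both shape and behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a real connection object.
struct Conn {
    std::string endpoint;
};

// One entry per remote node; its own mutex guards creation of its connection.
struct ConnEntry {
    std::mutex mux;
    std::shared_ptr<Conn> conn;
};

class ConnTable {
public:
    // Returns the connection for `endpoint`, creating it on first use.
    std::shared_ptr<Conn> GetOrCreate(const std::string &endpoint)
    {
        std::shared_ptr<ConnEntry> entry;
        {
            // Global lock: held only for the map lookup/insert.
            std::lock_guard<std::mutex> guard(mapMux_);
            auto &slot = map_[endpoint];
            if (!slot) {
                slot = std::make_shared<ConnEntry>();
            }
            entry = slot;
        }
        assert(entry != nullptr);
        // Per-entry lock: setup for one node does not block setup for other nodes,
        // while callers targeting the same node are still serialized.
        std::lock_guard<std::mutex> guard(entry->mux);
        if (!entry->conn) {
            // A real implementation would perform the (possibly slow) connect here.
            entry->conn = std::make_shared<Conn>(Conn{endpoint});
        }
        return entry->conn;
    }

    // Number of distinct nodes seen so far.
    std::size_t Size()
    {
        std::lock_guard<std::mutex> guard(mapMux_);
        return map_.size();
    }

private:
    std::mutex mapMux_;
    std::unordered_map<std::string, std::shared_ptr<ConnEntry>> map_;
};
```

In this sketch, two calls to `GetOrCreate` with the same endpoint return the same connection object, while calls for different endpoints contend only on the brief map lock.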