| fix: optimize gRPC connection for faster recovery after service restart
- Add keepalive and backoff configuration to gRPC clients:
  - worker/client/client.go
  - mq/client/client.go
  - api/eventlog/entry/grpc/client/event_log_client.go
- Add keepalive and enforcement policy to gRPC servers:
  - worker/server/server.go
  - api/eventlog/entry/grpc/server/event_log_server.go
Key changes:
- Keepalive: detect broken connections within 10 seconds
- Backoff MaxDelay: reduce from 120s to 10s for faster reconnection
- MaxConnectionAge: force reconnection every 30 minutes for load balancing
- Add Close() method to worker client for proper connection cleanup
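To illustrate why the lower Backoff MaxDelay matters, here is a minimal sketch of capped exponential backoff. It assumes gRPC's documented default backoff parameters (1s base delay, 1.6 multiplier, 120s cap) and that this change only lowers the cap to 10s. Jitter is left out for clarity, and the names `backoffConfig`, `defaultBackoff`, `tunedBackoff`, and `delays` are hypothetical, not taken from the changed files.

```go
package main

import (
	"fmt"
	"time"
)

// backoffConfig is a hypothetical stand-in for the reconnect backoff settings.
type backoffConfig struct {
	BaseDelay  time.Duration // delay before the first retry
	Multiplier float64       // growth factor applied after each failed attempt
	MaxDelay   time.Duration // upper bound on the delay between attempts
}

// Assumed values: gRPC's documented defaults, and the same with MaxDelay lowered to 10s.
var (
	defaultBackoff = backoffConfig{BaseDelay: time.Second, Multiplier: 1.6, MaxDelay: 120 * time.Second}
	tunedBackoff   = backoffConfig{BaseDelay: time.Second, Multiplier: 1.6, MaxDelay: 10 * time.Second}
)

// delays returns the wait before each of the first n reconnect attempts,
// growing exponentially and capped at MaxDelay (jitter omitted).
func delays(c backoffConfig, n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	d := float64(c.BaseDelay)
	limit := float64(c.MaxDelay)
	for i := 0; i < n; i++ {
		if d > limit {
			d = limit
		}
		out = append(out, time.Duration(d))
		d *= c.Multiplier
	}
	return out
}

func main() {
	fmt.Println("default:", delays(defaultBackoff, 12))
	fmt.Println("tuned:  ", delays(tunedBackoff, 12))
}
```

Under these assumptions, a client that has been failing to reconnect for a while waits at most 10s between attempts instead of up to 120s, so it notices a restarted server much sooner.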
This fixes an issue where API requests time out for 10+ minutes
after a worker service restart due to stale gRPC connections.
Signed-off-by: Qi Zhang <smallqi1@163.com>
| 4 months ago |