
MCP Registry Kubernetes Deployment

This directory contains Pulumi infrastructure code to deploy the MCP Registry service to a Kubernetes cluster. It supports deploying the infrastructure locally (using an existing kubeconfig, e.g. with minikube) or to Google Cloud Platform (GCP).

Quick Start

Local Development

Pre-requisites:

  1. Ensure your kubeconfig points at the cluster you want to use. For minikube, run minikube start && minikube tunnel.
  2. Run make local-up to deploy the stack.
  3. Access the registry via the ingress load balancer. You can find its external IP with kubectl get svc ingress-nginx-controller -n ingress-nginx. Then run curl -H "Host: local.registry.modelcontextprotocol.io" -k https://<EXTERNAL-IP>/v0/ping to check that the service is up.
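
The lookup-then-ping check in step 3 can be wrapped in two small shell functions. This is only a sketch: the function names ingress_ip and ping_registry are hypothetical, and it assumes the load balancer reports an IP address (some clusters report a hostname instead, in which case the jsonpath expression would need adjusting).

```shell
# Look up the external IP of the ingress load balancer.
# Assumption: the service exposes an IP under .status.loadBalancer.ingress[0].ip.
ingress_ip() {
  kubectl get svc ingress-nginx-controller -n ingress-nginx \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Ping the registry through the ingress. $1 is the Host header value.
ping_registry() {
  host=$1
  ip=$(ingress_ip)
  if [ -z "$ip" ]; then
    echo "no external IP found for the ingress controller" >&2
    return 1
  fi
  # -k skips certificate verification, as in the curl command above.
  curl -fsS -k -H "Host: ${host}" "https://${ip}/v0/ping"
}

# Example call (not run automatically):
# ping_registry local.registry.modelcontextprotocol.io
```

The same function should work for other environments by passing a different Host value.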

To change config

The stack is configured out of the box for local development. But if you want to make changes, run commands like:

PULUMI_CONFIG_PASSPHRASE="" pulumi config set mcp-registry:environment local
PULUMI_CONFIG_PASSPHRASE="" pulumi config set mcp-registry:githubClientSecret --secret <some-secret-value>

To delete the stack

Running make local-destroy and then deleting the cluster (with minikube: minikube delete) resets you to a clean state.

Production Deployment (GCP)

Note: Deployments are automatically handled by GitHub Actions with separate workflows for staging and production:

  • Staging: All merges to the main branch automatically build and deploy the latest code to the staging environment via deploy-staging.yml. Staging always uses the :main Docker image tag.

  • Production: Deployment requires explicit configuration of the Docker image tag in Pulumi.gcpProd.yaml and is triggered automatically via deploy-production.yml when this config file is pushed to main. This GitOps-style approach provides manual control and an audit trail for production versions.

To deploy a specific version to production:

  1. Cut a release (if deploying a new version):

    # Via GitHub UI: https://github.com/modelcontextprotocol/registry/releases
    # Or via gh CLI:
    gh release create v1.2.3 --generate-notes
    

    This builds Docker images tagged as 1.2.3 and latest (note: image tags do not include the 'v' prefix)

  2. Update the production image tag in Pulumi.gcpProd.yaml:

    # Edit deploy/Pulumi.gcpProd.yaml
    # Change line: mcp-registry:imageTag: 1.2.3  (note: no 'v' prefix for Docker image tags)
    git add deploy/Pulumi.gcpProd.yaml
    git commit -m "Deploy version 1.2.3 to production"
    git push
    
  3. The production deployment workflow will automatically trigger and deploy the specified version
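
Because release tags carry a 'v' prefix and image tags do not, the edit in step 2 is easy to get wrong by hand. A minimal sketch of the conversion and the file update follows. The function names are hypothetical, and it assumes the stack file contains a line of the form "mcp-registry:imageTag: <value>" with an unquoted value.

```shell
# Strip an optional leading "v" from a release tag: v1.2.3 -> 1.2.3.
image_tag_from_release() {
  printf '%s\n' "${1#v}"
}

# Rewrite the imageTag line in a Pulumi stack file.
# $1 is the file path, $2 is the new image tag.
# Assumption: the tag contains no "/" or "&" characters (true for versions like 1.2.3).
set_image_tag() {
  file=$1
  tag=$2
  tmp=$(mktemp) || return 1
  sed "s/^\([[:space:]]*mcp-registry:imageTag:\).*/\1 ${tag}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example call (not run automatically):
# set_image_tag deploy/Pulumi.gcpProd.yaml "$(image_tag_from_release v1.2.3)"
```

Writing through a temporary file avoids relying on sed -i, whose syntax differs between GNU and BSD sed.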

Manual Override: The steps below are preserved in case a manual deployment override is needed.

Pre-requisites:

  • Pulumi CLI installed
  • A Google Cloud Platform (GCP) account
  • A GCP Service Account with appropriate permissions

Steps:

  1. Create a project: gcloud projects create mcp-registry-prod
  2. Set the project: gcloud config set project mcp-registry-prod
  3. Enable required APIs: gcloud services enable storage.googleapis.com && gcloud services enable container.googleapis.com
  4. Create a service account with necessary permissions, and get the key:
    gcloud iam service-accounts create pulumi-svc
    sleep 10
    gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/container.admin"
    gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/compute.admin"
    gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/storage.admin"
    gcloud projects add-iam-policy-binding mcp-registry-prod --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/storage.hmacKeyAdmin"
    gcloud iam service-accounts add-iam-policy-binding $(gcloud projects describe mcp-registry-prod --format="value(projectNumber)")-compute@developer.gserviceaccount.com --member="serviceAccount:pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com" --role="roles/iam.serviceAccountUser"
    gcloud iam service-accounts keys create sa-key.json --iam-account=pulumi-svc@mcp-registry-prod.iam.gserviceaccount.com
    
  5. Create a GCS bucket for Pulumi state: gsutil mb gs://mcp-registry-prod-pulumi-state
  6. Set Pulumi's backend to GCS: pulumi login gs://mcp-registry-prod-pulumi-state
  7. Get the passphrase file passphrase.prod.txt from the registry maintainers
  8. Init the GCP stack: PULUMI_CONFIG_PASSPHRASE_FILE=passphrase.prod.txt pulumi stack init gcpProd
  9. Set the GCP credentials in Pulumi config:
    # Base64 encode the service account key and set it
    pulumi config set --secret gcp:credentials "$(base64 < sa-key.json)"
    
  10. Deploy: make prod-up
  11. Access the registry via the ingress load balancer. Find its external IP with kubectl get svc ingress-nginx-controller -n ingress-nginx, then run curl -H "Host: prod.registry.modelcontextprotocol.io" -k https://<EXTERNAL-IP>/v0/ping to check that the service is up.
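
The health check in the last step can also be scripted. The following is a minimal sketch in Go, not part of the repository: newPingRequest and registryHost are hypothetical names, and the program assumes the load balancer IP is passed as the first argument. Setting req.Host plays the role of curl's -H "Host: ..." flag, and InsecureSkipVerify mirrors -k, so it should only be used for this kind of direct-to-IP check.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"
)

// registryHost is the hostname used in the curl command above.
const registryHost = "prod.registry.modelcontextprotocol.io"

// newPingRequest (hypothetical helper) builds a GET for /v0/ping addressed
// to the load balancer IP while sending the registry hostname as Host.
func newPingRequest(externalIP string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "https://"+externalIP+"/v0/ping", nil)
	if err != nil {
		return nil, err
	}
	req.Host = registryHost // same role as curl -H "Host: ..."
	return req, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ping <EXTERNAL-IP>")
		return
	}
	req, err := newPingRequest(os.Args[1])
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			// Mirrors curl -k: the certificate is not issued for a bare IP.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("ping failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

A 200 status suggests the ingress is routing the hostname to the registry; any other outcome is a cue to check the pods and ingress as described under Troubleshooting.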

Structure

├── main.go                 # Pulumi program entry point
├── Pulumi.yaml             # Project configuration
├── Pulumi.local.yaml       # Local stack configuration
├── Pulumi.gcpProd.yaml     # GCP production stack configuration
├── Pulumi.gcpStaging.yaml  # GCP staging stack configuration
├── Makefile                # Build and deployment targets
├── go.mod                  # Go module dependencies
├── go.sum                  # Go module checksums
└── pkg/                    # Infrastructure packages
    ├── k8s/                # Kubernetes deployment components
    │   ├── backup.go          # Database backup configuration
    │   ├── cert_manager.go    # SSL certificate management
    │   ├── deploy.go          # Deployment orchestration
    │   ├── ingress.go         # Ingress controller setup
    │   ├── monitoring.go      # Metrics and monitoring setup
    │   ├── postgres.go        # PostgreSQL database deployment
    │   └── registry.go        # MCP Registry deployment
    └── providers/          # Kubernetes cluster providers
        ├── types.go           # Provider interface definitions
        ├── gcp/               # Google Kubernetes Engine provider
        └── local/             # Local kubeconfig provider

Architecture Overview

Deployment Flow

  1. Pulumi program starts in main.go
  2. Configuration is loaded from Pulumi config files
  3. Provider factory creates appropriate cluster provider (GCP or local)
  4. Cluster provider sets up Kubernetes access
  5. k8s.DeployAll() orchestrates complete deployment:
    • Certificate manager for SSL/TLS
    • Ingress controller for external access
    • Database for data persistence
    • Backup infrastructure for database
    • Monitoring and metrics collection
    • MCP Registry application
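
The ordering in step 5 can be pictured as a list of components applied one after another, stopping at the first failure. The sketch below is illustrative only: the real signature of k8s.DeployAll() is not shown in this README, and step, deployAll, errDemo, and the step names are all hypothetical stand-ins with no Pulumi dependency.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical unit of deployment work. The real component
// functions in pkg/k8s are not shown in this README.
type step struct {
	name string
	run  func() error
}

// errDemo is a placeholder failure used only to demonstrate early exit.
var errDemo = errors.New("demo failure")

// deployAll runs the steps in order and stops at the first failure.
// It returns the names of the steps that completed successfully.
func deployAll(steps []step) ([]string, error) {
	done := make([]string, 0, len(steps))
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("deploying %s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func main() {
	succeed := func() error { return nil }
	fail := func() error { return errDemo }

	// Same order as the list above; the step names are illustrative.
	steps := []step{
		{"cert-manager", succeed},
		{"ingress", succeed},
		{"database", fail}, // simulate a failing component
		{"backup", succeed},
		{"monitoring", succeed},
		{"registry", succeed},
	}

	done, err := deployAll(steps)
	fmt.Println("completed:", done)
	fmt.Println("error:", err)
}
```

The order matters in this picture because later components rely on earlier ones: the backup infrastructure needs a database to back up, and the registry needs both the database and the ingress.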

Configuration

| Parameter | Description | Required |
| --- | --- | --- |
| environment | Deployment environment (local/staging/prod) | Yes |
| provider | Kubernetes provider (local/gcp) | No (default: local) |
| githubClientId | GitHub OAuth Client ID | Yes |
| githubClientSecret | GitHub OAuth Client Secret | Yes |
| imageTag | Docker image tag for production environment | Yes (prod only) |
| gcpProjectId | GCP Project ID | Only when provider=gcp |
| gcpRegion | GCP Region | No (default: us-central1) |
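
The defaults and conditional requirements in the table above can be expressed as a small validation routine. This is a sketch under stated assumptions, not the repository's code: the Config struct, its field names, and the Normalize method are hypothetical, and the real program reads these values through Pulumi config rather than a plain struct.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors the parameters in the table above. The struct and its
// field names are illustrative, not taken from the repository.
type Config struct {
	Environment        string
	Provider           string
	GithubClientID     string
	GithubClientSecret string
	ImageTag           string
	GCPProjectID       string
	GCPRegion          string
}

// oneOf reports whether v equals any of the allowed values.
func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

// Normalize fills in the documented defaults, then checks the documented
// requirements and returns the first violation found.
func (c *Config) Normalize() error {
	if c.Provider == "" {
		c.Provider = "local"
	}
	if c.GCPRegion == "" {
		c.GCPRegion = "us-central1"
	}
	switch {
	case !oneOf(c.Environment, "local", "staging", "prod"):
		return fmt.Errorf("environment must be local, staging, or prod; got %q", c.Environment)
	case !oneOf(c.Provider, "local", "gcp"):
		return fmt.Errorf("provider must be local or gcp; got %q", c.Provider)
	case c.GithubClientID == "" || c.GithubClientSecret == "":
		return errors.New("githubClientId and githubClientSecret are required")
	case c.Environment == "prod" && c.ImageTag == "":
		return errors.New("imageTag is required when environment is prod")
	case c.Provider == "gcp" && c.GCPProjectID == "":
		return errors.New("gcpProjectId is required when provider is gcp")
	}
	return nil
}

func main() {
	cfg := Config{Environment: "staging", GithubClientID: "id", GithubClientSecret: "secret"}
	if err := cfg.Normalize(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("provider=%s region=%s\n", cfg.Provider, cfg.GCPRegion)
}
```

Checking these rules up front is one way to surface a missing imageTag or gcpProjectId before any cluster resources are touched.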

Database Backups

The deployment uses K8up, a Kubernetes backup operator built on Restic.

When running locally, backups are stored in a MinIO bucket. In staging and production, they are stored in a GCS bucket.

Accessing Backup Files

Local Development (MinIO)

# Expose MinIO web console
kubectl port-forward -n minio svc/minio 9000:9000 9001:9001

Then open localhost:9001, log in with username minioadmin and password minioadmin, and navigate to the k8up-backups bucket.

Staging and Production (GCS)

Decrypting and Restoring Backups

Backups are encrypted using Restic. To access the backup data:

  1. Download the backup files from the bucket:
    # Local (MinIO) - ensure port-forward is active: kubectl port-forward -n minio svc/minio 9000:9000 9001:9001
    AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin \
      aws --endpoint-url http://localhost:9000 s3 sync s3://k8up-backups/ ./backup-files/
    
    # GCS (staging/production): set ENV to staging or prod
    ENV=staging
    gsutil -m cp -r "gs://mcp-registry-${ENV}-backups/*" ./backup-files/
    
  2. Install Restic
  3. Restore the backup:
    RESTIC_PASSWORD=password restic -r ./backup-files restore latest --target ./restored-files
    
    PostgreSQL data will be in ./restored-files/data/registry-pg-1/pgdata/

Troubleshooting

To configure kubectl to access an existing GKE cluster:

# Login
gcloud auth login
gcloud auth application-default login

# For production
gcloud container clusters get-credentials mcp-registry-prod --zone us-central1-b --project mcp-registry-prod

# For staging
gcloud container clusters get-credentials mcp-registry-staging --zone us-central1-b --project mcp-registry-staging

Check Status

kubectl get pods
kubectl get deployment
kubectl get svc
kubectl get ingress
kubectl get svc -n ingress-nginx

View Logs

kubectl logs -l app=mcp-registry
kubectl logs -l app=postgres

Check Backup Status

kubectl describe schedule.k8up.io 
kubectl get backup