name: gitcode-pr-impact-locator
description: Use to identify which merged PR caused a regression, behavioral change, or anomaly in a GitCode repository. Triggers on phrases like "which PR caused this", "定位引入问题的PR" ("locate the PR that introduced the problem"), "排查哪个PR导致的" ("find out which PR caused it"), "find the PR that broke X", or when the user describes a symptom with a time window and target branch and wants to trace it to specific code changes.

GitCode PR Impact Locator

Systematically narrow down which merged PR introduced a problem by correlating PR metadata, file changes, and code diffs against the reported symptom.

Core Workflow

User describes symptom + time window + branch
              │
              ▼
  1. Clarify scope (missing info? ask)
              │
              ▼
  2. Fetch merged PR list (gc pr list)
              │
              ▼
  3. Triage: score each PR by relevance
     (keywords, files touched, timing, size)
              │
              ▼
  4. Deep-dive top suspects (gc pr diff / curl)
              │
              ▼
  5. Cross-reference file history (curl fallback)
              │
              ▼
  6. Output ranked suspects with evidence

Step 1: Clarify Scope

Before fetching PRs, ensure you have:

| Required | Optional (but helpful) |
|----------|------------------------|
| Target repo (owner/repo) | Exact file path affected |
| Target branch | Known "good" vs "bad" commit/tag |
| Time window (e.g., "last 10 days") | Specific test/feature name |
| Symptom description | Expected vs actual counts/values |

Ask only for missing critical items. Don't block on optional ones.
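The scope check above can be sketched as a small helper. The field names (`repo`, `branch`, `time_window`, `symptom`) are hypothetical labels for the four required items, not part of any gc or GitCode API.

```python
# Hypothetical keys for the four required scope items listed above.
REQUIRED_FIELDS = ("repo", "branch", "time_window", "symptom")


def missing_scope_fields(scope: dict) -> list:
    """Return the required fields that are absent or blank, in a fixed order."""
    return [key for key in REQUIRED_FIELDS if not str(scope.get(key) or "").strip()]


scope = {"repo": "owner/repo", "branch": "master", "symptom": "test count dropped"}
print(missing_scope_fields(scope))
```

Only the fields this returns need a follow-up question; optional items never block the investigation.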

Step 2: Fetch Merged PRs

Primary: gc CLI

# First batch (most recent 30)
gc pr list -R <owner/repo> --state merged --base <branch> \
  --sort updated --direction desc --limit 30 --format table

# Page 2+ (gc lacks --paginate)
gc pr list -R <owner/repo> --state merged --base <branch> \
  --sort updated --direction desc --limit 30 --page 2 --format table

Fallback: curl API (larger batches, more fields)

curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/pulls?state=merged&base=<branch>&sort=updated&direction=desc&per_page=100"

Extract with: | python3 -c "import json,sys; [print(f'#{p[\"number\"]} | {p[\"title\"]}') for p in json.load(sys.stdin)]"
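Because the list is sorted by update time rather than merge time, it is safer to walk every page and filter on `merged_at` than to stop at the first old entry. The sketch below assumes each PR object carries a `merged_at` ISO 8601 string (or null), as the list endpoint is described elsewhere in this document. `fetch_page` is a hypothetical callable that wraps whichever curl or gc call you use and returns one page as a list of dicts.

```python
from datetime import datetime, timezone
from typing import Callable


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def merged_since(fetch_page: Callable[[int], list], since: datetime, max_pages: int = 20) -> list:
    """Collect PRs merged at or after `since` (which must be timezone-aware)."""
    kept = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page means there is nothing left to read
            break
        for pr in batch:
            merged_at = pr.get("merged_at")
            if merged_at and parse_ts(merged_at) >= since:
                kept.append(pr)
    return kept


# Fake pages standing in for real API responses.
fake_pages = {
    1: [{"number": 1, "merged_at": "2024-05-10T00:00:00Z"}, {"number": 2, "merged_at": None}],
    2: [{"number": 3, "merged_at": "2024-04-01T00:00:00+08:00"}],
}
window_start = datetime(2024, 5, 1, tzinfo=timezone.utc)
recent = merged_since(lambda page: fake_pages.get(page, []), window_start)
print([pr["number"] for pr in recent])
```

The `max_pages` cap is an arbitrary guard against runaway loops; raise it for very active repositories.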

Get merge timestamps

gc pr list --format table doesn't show merge dates. Use curl to get merged_at:

curl ... | python3 -c "
import json,sys
for p in json.load(sys.stdin):
    # merged_at may be null, so substitute a placeholder before slicing the date
    print(f'#{p[\"number\"]} | merged:{(p.get(\"merged_at\") or \"?\")[:10]} | {p[\"title\"]}')
"

Step 3: Triage — Score PRs by Relevance

For each PR in the time window, score against these dimensions:

Dimension 1: Keywords (high weight)

Match the PR title against problem-domain terms. For example, if the symptom is "test_ops.py test count dropped", useful keywords include test_ops, test, skip, case, patch, common_device, common_utils, and testing.

# Quick filter
gc pr list ... --format json | python3 -c "
import json,sys
keywords = ['test_ops','skip','patch','common_device']
for p in json.load(sys.stdin):
    score = sum(1 for kw in keywords if kw.lower() in p['title'].lower())
    if score > 0: print(f'#{p[\"number\"]} [score={score}] {p[\"title\"]}')
"

Dimension 2: Files Changed (high weight)

A PR that touches the affected file or its infrastructure is high-priority.

# Get changed files (curl — gc pr view doesn't show file list)
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/pulls/<number>/files?per_page=100" \
  | python3 -c "import json,sys; [print(f['filename']) for f in json.load(sys.stdin)]"

Match against:

  • The exact file path mentioned in the symptom
  • Parent/sibling directories (e.g., test_upstream/ if symptom is about test generation)
  • Infrastructure files (common_device_type.py, common_utils.py, patch_manager.py)
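The three matching rules above can be turned into a rough score. This is a minimal sketch: the 3/2/1 values are arbitrary assumptions, and "parent/sibling directories" is simplified to "the same directory as the affected file, or anything below it".

```python
from pathlib import PurePosixPath


def file_relevance(changed_files, affected_path, infra_names=()):
    """Return the best match level across a PR's changed files.

    3 = exact affected file, 2 = infrastructure file (matched by name prefix,
    so 'common_utils.py' also matches 'common_utils.py.patch'),
    1 = same directory subtree as the affected file, 0 = no overlap.
    """
    affected = PurePosixPath(affected_path)
    best = 0
    for name in changed_files:
        path = PurePosixPath(name)
        if path == affected:
            score = 3
        elif any(path.name.startswith(infra) for infra in infra_names):
            score = 2
        elif affected.parent != PurePosixPath(".") and affected.parent in path.parents:
            score = 1
        else:
            score = 0
        best = max(best, score)
    return best


infra = ("common_device_type.py", "common_utils.py", "patch_manager.py")
changed = ["test_upstream/torch/testing/_internal/common_device_type.py.patch", "docs/readme.md"]
print(file_relevance(changed, "test/test_ops.py", infra))
```

Feed it the filenames printed by the files endpoint call above.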

Dimension 3: Timing (medium weight)

PR merged shortly before symptom appeared → high relevance. Correlate merged_at with when the problem was first observed.
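One way to make "shortly before" concrete is a linear decay over the investigation window. The decay shape and the 10-day default are assumptions made for this sketch, not something the workflow prescribes.

```python
from datetime import datetime, timezone


def timing_relevance(merged_at: datetime, first_seen: datetime, window_days: float = 10.0) -> float:
    """Score 1.0 for a merge right at first observation, falling to 0.0 at window_days earlier.

    PRs merged after the symptom was first seen score 0.0. Relax that rule
    if the observation time is only approximate.
    """
    age_days = (first_seen - merged_at).total_seconds() / 86400
    if age_days < 0:
        return 0.0
    return max(0.0, 1.0 - age_days / window_days)


first_seen = datetime(2024, 5, 10, tzinfo=timezone.utc)
print(timing_relevance(datetime(2024, 5, 5, tzinfo=timezone.utc), first_seen))
```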

Dimension 4: Change Magnitude (medium weight)

Large PRs (many files changed) that touch infrastructure are more likely to cause side effects. Check the change summary:

gc pr diff --repo <owner/repo> <number> 2>&1 | head -5
# Shows: Changes: +X -Y in Z file(s)

Quick Triage Table

Create a markdown table for the top candidates:

| PR | Score | Title | Files | Key Touchpoints |
|----|-------|-------|-------|-----------------|
| #36598 | *** | torch inductor patch add | 3 | common_device_type.py.patch |
| #34013 | ** | fix inductor patch apply bug | 96 | inductor .diff files |
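If the four dimensions have been normalised to the range 0..1, ranking and rendering the table can be automated. The numeric weights below are assumptions: the text only distinguishes "high" from "medium". The candidates in the usage lines are made-up placeholders.

```python
from dataclasses import dataclass

# Assumed weights: "high" dimensions count double, "medium" ones once.
WEIGHTS = {"keywords": 2.0, "files": 2.0, "timing": 1.0, "magnitude": 1.0}


@dataclass
class Candidate:
    number: int
    title: str
    keywords: float   # each dimension normalised to 0..1
    files: float
    timing: float
    magnitude: float

    def total(self) -> float:
        return sum(weight * getattr(self, name) for name, weight in WEIGHTS.items())


def triage_table(candidates, top=5):
    """Render the highest-scoring candidates as a markdown table."""
    rows = ["| PR | Score | Title |", "|----|-------|-------|"]
    ranked = sorted(candidates, key=lambda c: c.total(), reverse=True)
    for cand in ranked[:top]:
        rows.append(f"| #{cand.number} | {cand.total():.1f} | {cand.title} |")
    return "\n".join(rows)


# Placeholder candidates for illustration only.
demo = [
    Candidate(2, "mechanical rename", 0.0, 0.5, 1.0, 1.0),
    Candidate(1, "test infra patch", 1.0, 1.0, 0.5, 0.0),
]
print(triage_table(demo))
```

Treat the score as a sorting aid only; a low-scoring PR that touches test infrastructure still deserves a look.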

Step 4: Deep-Dive Top Suspects

For each high-scoring PR, examine the actual code changes:

gc pr diff --repo <owner/repo> <number>

Focus on:

  • Control flow changes — did a function's logic change in a way that alters what gets included/excluded?
  • Configuration changes — did a list, mapping, or flag change?
  • New additions — was a new file/patch added that modifies behavior?
  • Removals — were any guards, skips, or filters removed?

Compare before/after for critical functions. Quote the exact diff lines as evidence.
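To pull quotable evidence out of a long diff, it can help to group added and removed lines by file. The sketch assumes the diff is in standard unified format with `---`/`+++` file headers. Whether `gc pr diff` emits exactly that layout is not confirmed here.

```python
from collections import defaultdict


def _strip_prefix(path: str) -> str:
    """Drop the conventional a/ or b/ prefix from a diff header path."""
    path = path.strip()
    return path[2:] if path.startswith(("a/", "b/")) else path


def changed_lines(diff_text: str) -> dict:
    """Map each file to the lines a unified diff adds and removes.

    Limitation: a removed line whose own text starts with '-- ' looks like a
    file header and is misread.
    """
    result = defaultdict(lambda: {"added": [], "removed": []})
    old_path = current = None
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            old_path = _strip_prefix(line[4:])
        elif line.startswith("+++ "):
            new_path = _strip_prefix(line[4:])
            # Deleted files have /dev/null as the new path; keep the old name.
            current = old_path if new_path == "/dev/null" else new_path
        elif current and line.startswith("+"):
            result[current]["added"].append(line[1:])
        elif current and line.startswith("-"):
            result[current]["removed"].append(line[1:])
    return dict(result)


sample = "--- a/x.py\n+++ b/x.py\n@@ -1,2 +1,2 @@\n-@unittest.skip('x')\n+pass\n context\n"
print(changed_lines(sample))
```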

Step 5: Cross-Reference File History (curl fallback)

gc CLI lacks file history. Use curl:

# Commit history for a specific file
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/commits?path=<file>&sha=<branch>&per_page=10" \
  | python3 -c "
import json,sys
for c in json.load(sys.stdin):
    print(c['commit']['committer']['date'], c['commit']['message'].split('\n')[0])
"

This reveals if the affected file was modified by a PR not obvious from title alone.
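A sketch of that cross-check: given the commit list for the affected file, keep the commits inside the time window whose subject line contains none of the triage keywords. It assumes the response shape used above (`commit.committer.date` and `commit.message`). The sample data is invented.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def unobvious_commits(commits, keywords, since):
    """Return (date, subject) for in-window commits whose subject matches no keyword."""
    hits = []
    for entry in commits:
        info = entry["commit"]
        when = parse_ts(info["committer"]["date"])
        subject = info["message"].split("\n")[0]
        matched = any(kw.lower() in subject.lower() for kw in keywords)
        if when >= since and not matched:
            hits.append((when.date().isoformat(), subject))
    return hits


# Invented sample data in the assumed response shape.
history = [
    {"commit": {"committer": {"date": "2024-05-08T12:00:00Z"}, "message": "refactor logging\n\nbody"}},
    {"commit": {"committer": {"date": "2024-05-09T12:00:00Z"}, "message": "skip flaky test_ops case"}},
    {"commit": {"committer": {"date": "2024-03-01T12:00:00Z"}, "message": "old cleanup"}},
]
start = datetime(2024, 5, 1, tzinfo=timezone.utc)
print(unobvious_commits(history, ["test_ops", "skip"], start))
```

Anything this returns touched the affected file without saying so in its title, which makes it a candidate the keyword filter missed.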

Step 6: Output Ranked Findings

## Findings

### Primary suspect: #N "PR title" (merged YYYY-MM-DD)

**Changed files:**
- `path/to/changed/file1`
- `path/to/changed/file2`

**Key evidence:** [specific code change and why it explains the symptom]

**Diff excerpt:**
```diff
- old behavior
+ new behavior (explains symptom)
```

### Secondary suspect: #M "PR title"

...

### Excluded PRs

| PR | Reason for exclusion |
|----|----------------------|
| #X | No overlap with the affected files / timing does not match / ... |

Always include the **Excluded PRs** section; showing your work builds confidence.

## gc CLI Quick Reference

```bash
# Auth check
gc auth status

# List PRs
gc pr list -R owner/repo --state merged --base branch \
  --sort updated --direction desc --limit 30 [--page N]

# View PR
gc pr view --repo owner/repo <number>
gc pr view --repo owner/repo <number> --json  # (note: may miss description/merged_at)

# View PR diff
gc pr diff --repo owner/repo <number>

# Search issues
gc issue list -R owner/repo --search "keyword" --state all --limit 10

# List labels
gc label list -R owner/repo
```

## curl API Fallback Reference

Use when gc CLI is missing a capability. All calls need $GC_TOKEN from the environment.

# List PRs (supports per_page up to 100, includes merged_at)
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/pulls?state=merged&base=<branch>&sort=updated&direction=desc&per_page=100&page=1"

# PR details (includes body/description, merged_at)
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/pulls/<number>"

# PR files changed
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/pulls/<number>/files?per_page=100"

# File commit history (not available in gc)
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/commits?path=<file>&sha=<branch>&per_page=10"

# File content at branch (for checking current state)
curl -s -H "Authorization: Bearer $GC_TOKEN" \
  "https://gitcode.com/api/v5/repos/<owner>/<repo>/contents/<path>?ref=<branch>"
# Decode with: base64.b64decode(data['content']).decode()
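The decode step can be wrapped in a helper that takes the raw JSON body. This sketch assumes the response is a JSON object with a base64-encoded `content` field, as the decode hint above implies.

```python
import base64
import json


def decode_contents(payload: str) -> str:
    """Decode the file text from a contents-endpoint JSON response."""
    data = json.loads(payload)
    # b64decode ignores embedded newlines by default, which some APIs insert.
    return base64.b64decode(data["content"]).decode("utf-8")


# Round-trip with a locally built payload instead of a live API call.
fake_response = json.dumps({"content": base64.b64encode(b"hello\n").decode("ascii")})
print(decode_contents(fake_response), end="")
```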

## Problem-Specific Patterns

Test Case Count Changes (up or down)

Key files to check:

  • test_upstream/torch/testing/_internal/common_device_type.py.patch — controls device-type test class generation
  • test_upstream/torch/testing/_internal/common_utils.py.patch — test utility decorators (@skip, etc.)
  • Test-specific .patch files matching the affected test file

Functions to scrutinize:

  • get_all_device_types() — what device types are test classes generated for?
  • filter_desired_device_types() — any filtering changes?
  • instantiate_device_type_tests() — the entry point for test class generation

Key indicators in diffs:

  • @unittest.skip added/removed
  • Device type lists changed (["cpu", "cuda"] → ["cpu", "cuda", "npu"])
  • Filtering/normalization logic added to device type handling
  • Patch termination/formatting fixes (can cause patch application to silently change behavior)
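The first two indicators lend themselves to a mechanical scan over a diff's added and removed lines. The regular expressions below are rough assumptions. They only recognise skip-style decorators and the three device names used in the example above, so treat any hit as a prompt to read the diff, not as proof.

```python
import re

# Assumed patterns: any decorator starting with skip, and quoted device names.
SKIP_RE = re.compile(r"@(?:unittest\.)?skip\w*")
DEVICE_RE = re.compile(r"""["'](cpu|cuda|npu)["']""")


def scan_indicators(added, removed):
    """Flag skip decorators and device-type changes in a diff's changed lines."""
    findings = []
    for label, lines in (("added", added), ("removed", removed)):
        for line in lines:
            if SKIP_RE.search(line):
                findings.append(f"skip decorator {label}: {line.strip()}")
    before = {name for line in removed for name in DEVICE_RE.findall(line)}
    after = {name for line in added for name in DEVICE_RE.findall(line)}
    if before != after:
        findings.append(f"device types changed: {sorted(before)} -> {sorted(after)}")
    return findings


added_lines = ['devices = ["cpu", "cuda", "npu"]']
removed_lines = ['devices = ["cpu", "cuda"]', "@unittest.skip('flaky')"]
for finding in scan_indicators(added_lines, removed_lines):
    print(finding)
```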

Build / CI Failures

Key files: CI configs, build scripts, dependency pin files, submodule commit ID updates.

Performance Regressions

Key files: Hot-path source files, compiler/inductor config, kernel launch code, memory allocator changes.

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Only checking PR title, not diff | Titles can be misleading. Always run gc pr diff on the top suspects. |
| Stopping at first plausible PR | List excluded PRs and why they were ruled out. |
| Ignoring small PRs | A 3-file change to test infrastructure can have more impact than a 96-file mechanical fix. |
| Not checking file commit history | A file might have been changed by a PR with an unrelated-looking title. |
| Forgetting to check common_device_type.py / common_utils.py patches | These test infrastructure patches affect ALL tests, not just the ones in their filename. |