GGaorui Zhangdocs(research): align Zotero smart-import workflow

e874f7dd创建于 3月31日历史提交

name: research-init description: Initialize a research project with Zotero-integrated literature review. Creates Zotero collections, searches and imports papers, analyzes content, and generates literature review and research proposal. args:

name: topic description: Research topic or keywords required: true
name: scope description: Review scope (focused/broad) required: false default: focused
name: output_type description: Output type (review/proposal/both) required: false default: both tags: [Research, Literature Review, Zotero, Paper Search]

/research-init - Zotero-Integrated Research Startup Workflow

Launch a complete literature survey workflow for the research topic " $t o p i c ", w i t h s c o p e "$ scope" and output type "$output_type".

Usage

Basic Usage

/research-init "transformer interpretability"

Specify Scope

/research-init "few-shot learning" focused

Specify All Parameters

/research-init "neural architecture search" broad both

Workflow

Execute the following steps in order:

Step 1: Create Zotero Research Collection

Call the Zotero MCP tool zotero_create_collection to create the main collection, named Research-{Topic}-{YYYY-MM} (extract a short PascalCase keyword from the topic, use the current year and month)
Create sub-collections under the main collection:
- Core Papers
- Methods
- Applications
- Baselines
- To-Read
Record the collection_key for each sub-collection (needed for import in Step 2)

Step 2: Literature Search and Import

Use WebSearch to find papers related to "$topic"
- Search strategy: use the topic directly, plus variant combinations of key terms
- Target sources: arXiv, DOI-backed publisher landing pages, conference proceedings with full-paper pages, direct PDF pages
- Time range: focused mode searches the last 3 years, broad mode searches the last 5 years
- Target paper count: 20-50 papers for focused scope, 50-100 for broad scope
Source quality filter before extraction:
- Prefer candidates that expose at least one of: DOI, arXiv ID, direct PDF URL, or clear citation metadata for a full paper
- Explicitly avoid abstract-only pages as primary sources when a better source exists for the same paper (for example conference abstract listings, event schedule pages, teaser pages, or pages that only contain a short abstract with no DOI/arXiv/PDF)
- For the same title, source preference should be:
  1. DOI-backed publisher page
  2. arXiv abs/pdf page
  3. direct PDF URL
  4. full-paper proceedings landing page
  5. abstract-only page (last resort only)
- If the discovered page is clearly abstract-only and no DOI/arXiv/PDF can be extracted, do not prioritize it for import; keep searching for a better source first
- Abstract-only pages should not be counted toward the target paper quota unless all better identifier-bearing/full-paper sources for that title have been exhausted
Extract candidate DOI / arXiv ID / landing-page URL from filtered search results
Classify before import: For each paper, determine which sub-collection it belongs to (Core Papers, Methods, Applications, Baselines, or To-Read) based on its title, abstract, and venue
Pre-import deduplication (two-step):
- Call the Zotero MCP tool zotero_search_items with the DOI string when available to find potential matches
- Call zotero_get_item_metadata on results to confirm the DOI field matches exactly
- If confirmed match → skip import, log ("Already exists: {DOI} → {item_key}")
- For papers without DOI → search by title using token overlap ratio (lowercase both titles, remove punctuation, compute intersection of words / union of words). Ratio > 0.8 = duplicate
Abstract-only page guardrail (mandatory):
- Before calling zotero_add_items_by_identifier, check whether the chosen URL is likely an abstract-only page
- Strong signals include: URL/path contains abstract, page title/heading is an abstract listing, page body lacks PDF/full-text links, and no DOI/arXiv identifier is visible
- If it is abstract-only and no DOI/arXiv/PDF can be recovered, prefer one of:
  - keep searching for a better source for the same title, or
  - skip this candidate for now
- Do not eagerly import abstract-only pages into analytical sub-collections just to satisfy paper count
- If an abstract-only page is imported as a last-resort placeholder, it must be treated as To-Read only, never as a confirmed paper source for Core Papers, Methods, Applications, or Baselines
- When you skip such a candidate during Step 2, print this exact user-facing line in the terminal output:
  - Skipped abstract-only page; searching better source
Smart import with collection assignment: Call zotero_add_items_by_identifier with the target sub-collection's collection_key, attach_pdf=true, and fallback_mode="webpage"
- Route priority is fixed: DOI / doi.org URL → arXiv ID or arXiv URL → direct PDF URL → generic landing-page URL
- The tool will prefer proper paper/preprint items and only fall back to webpage when no reliable DOI/arXiv identifier is found
- For difficult publisher pages and cookie-gated PDFs, the import path may additionally use the local Zotero connector/browser session and optional Playwright-assisted PDF rescue before falling back to webpage
- The server records internal import events automatically for debugging, but that internal ledger is not part of the default public MCP tool surface and should not be part of the normal workflow
Collection guardrail: Only keep paper imports in Core Papers, Methods, Applications, or Baselines
- If the import result says Saved as webpage, move or create that entry only in To-Read
PDF follow-up: zotero_add_items_by_identifier(..., attach_pdf=true) already runs the PDF cascade by default. If the import result says PDF not attached, optionally call zotero_find_and_attach_pdfs({ item_keys: [...] }) for a second pass. If it still fails, log it and continue.
Automatic postpass dedupe/reconcile (mandatory): after the import batch, call zotero_reconcile_collection_duplicates on the main research collection with:

collection_key = {main research collection key}
include_subcollections = true
dry_run = false
reconcile_local_only = true
local_db_fallback = false
Goal: automatically clean duplicate parent items across the whole research collection tree, keep the canonical item with PDF when available, merge collection memberships, and remove stale duplicates when the standard APIs can handle them
Precondition: destructive dedupe with dry_run=false requires Zotero MCP write/delete permission, which means UNSAFE_OPERATIONS=items must already be enabled
Default safety mode: do not enable local_db_fallback=true automatically inside /research-init; only mention it in debug/recovery mode if residual duplicates remain after the standard pass

Step-2 terminal summary (mandatory): after import + postpass dedupe, print a user-facing compact summary table:

| Input | Zotero Key | Collection | Status |
|-------|------------|------------|--------|
| ...   | ABC123     | Core Papers | Imported as paper + PDF attached |

Status should use only user-facing phrases:
- Imported as paper + PDF attached
- Imported as paper
- Saved as webpage + PDF attached
- Saved as webpage
- Import failed
If a candidate was rejected before import because it is abstract-only and no DOI/arXiv/PDF could be recovered, print this standalone line before the table entry list continues:
- Skipped abstract-only page; searching better source
After the table, print one compact dedupe line:
- Collection dedupe summary: duplicate groups 0, duplicates trashed 0
- or Collection dedupe summary: duplicate groups N, duplicates trashed M
Then print one compact missing-PDF repair line:
- Missing PDF postpass: repaired 0 items
- or Missing PDF postpass: repaired N items
Read these counts from the zotero_reconcile_collection_duplicates summary; do not invent them
Do not print route=..., pdf_source=..., fallback_reason=..., local_item_key=..., or reconcile internals in the default terminal output
Do not dump the raw dedupe markdown table in normal mode; read the tool result, extract the counts, and summarize it in one user-facing line
If you need implementation details for debugging, explicitly say you are switching to debug mode and rerun with ZOTERO_MCP_DEBUG_IMPORT=1

Note: Zotero items can still be added to or removed from collections later, but /research-init should prefer correct collection_key assignment during import so analytical sub-collections stay clean. The default command path should now rely on zotero_reconcile_collection_duplicates as the standard postpass cleanup, not the older item-by-item local reconcile helper. When in doubt between a full-paper source and an abstract-only page for the same title, always prefer the full-paper source, even if the abstract-only page ranks higher in search results. Use canonical Zotero MCP tool names consistently in this workflow: zotero_create_collection, zotero_search_items, zotero_get_item_metadata, zotero_add_items_by_identifier, zotero_find_and_attach_pdfs, and zotero_reconcile_collection_duplicates.

Step 3: Paper Analysis

Call zotero_get_collection_items to list imported papers
Call zotero_get_item_metadata with include_abstract: true to get metadata and abstracts (ensures abstracts are available as fallback if full-text retrieval fails)
Call zotero_get_item_fulltext to read full text of papers with PDFs
For each paper, extract:
- Research question and motivation
- Core methodology
- Key findings and contributions
- Limitations and future work
Use these structured notes as intermediate analysis to inform the final literature-review.md (they are not a separate output file)

Step 4: Gap Analysis and Synthesis

Analyze all collected papers for:
- Research trends and directions
- Methodological gaps
- Unexplored application domains
- Contradictions in research findings
Identify 2-3 concrete research gaps
Formulate potential research questions

Step 5: Generate Outputs

Generate corresponding files based on output_type "$output_type":

literature-review.md - Structured literature review with real citations from Zotero
research-proposal.md - Research proposal (generated when output_type is "proposal" or "both")
references.bib - BibTeX references from Zotero data
- Primary method: Use Zotero REST API with ?format=bibtex to export accurate, complete BibTeX entries
```
GET https://api.zotero.org/users/{user_id}/collections/{collection_key}/items?format=bibtex
```
  Note: The REST API ?format=bibtex on a collection only exports items directly in that collection, not items in sub-collections. You must iterate each sub-collection key individually, or collect all item keys and use the items endpoint: GET https://api.zotero.org/users/{user_id}/items?itemKey=KEY1,KEY2,...&format=bibtex
- Fallback: Construct BibTeX manually from zotero_get_item_metadata metadata (note: volume, issue, pages, and publisher fields are not available via this tool — entries will be incomplete)

Use TodoWrite to track progress throughout the workflow.

Error Handling

If the default Zotero MCP path fails during execution, use these workflow fallbacks:

zotero_create_collection fails → Create via Zotero REST API directly
zotero_add_items_by_identifier fails → Retry with a narrower identifier (explicit DOI or arXiv ID). If the source is a publisher landing page or direct PDF, allow the smart importer to use connector/browser-session rescue and optional Playwright-assisted PDF rescue first. If smart import still fails, use an out-of-band fallback such as CrossRef metadata lookup (https://api.crossref.org/works/{DOI}) and retry the DOI-specific path or save the page as a manual webpage.
zotero_get_item_fulltext fails → Use WebFetch on the paper's DOI URL to scrape abstract → fall back to abstractNote from zotero_get_item_metadata + domain knowledge
zotero_find_and_attach_pdfs fails → Log and continue; PDFs are not required for analysis. If a needed paper still lacks a PDF, ask the user to attach it manually in Zotero Desktop and rerun analysis later.
zotero_reconcile_collection_duplicates fails → Keep the import results, log that postpass dedupe failed, and continue with analysis. In debug mode, inspect the tool's summary and consider rerunning with local_db_fallback=true only if local-only duplicates remain and the user explicitly wants aggressive cleanup.
Single paper fails → Log error, skip, and continue to next paper
API rate limit → Wait 5 seconds and retry, up to 3 attempts

Completion Checklist

Before finishing, verify:

Zotero collection Research-{Topic}-{YYYY-MM} created with sub-collections
Papers imported and organized into sub-collections (target: 20-50 focused / 50-100 broad)
PDFs attached for available open-access papers
Full-text analysis completed for core papers
Gap analysis identifies at least 2-3 concrete research gaps
Output files generated: literature-review.md, references.bib, and optionally research-proposal.md
All citations in review correspond to actual Zotero library entries

Output Files

The command generates the following files:

{project_dir}/
├── literature-review.md      # Structured literature review (with Zotero citations)
├── research-proposal.md      # Research proposal (if requested)
└── references.bib            # BibTeX references

Integration Notes

This command will:

Use Zotero MCP tools to manage literature collections and metadata
Trigger the literature-reviewer agent for literature analysis
Use the research-ideation skill methodology (5W1H, Gap Analysis)
Search for latest papers via WebSearch

Notes

Ensure the Zotero MCP service is properly configured and running
DOI import depends on network connectivity and Zotero's metadata resolution capability
PDF attachment is limited to open-access papers; paywalled papers must be added manually
Generated literature reviews and research proposals require manual review and refinement

Skill: research-ideation - Research ideation methodology
Agent: literature-reviewer - Literature search and analysis
Commands: /zotero-review - Analyze existing Zotero collections, /zotero-notes - Batch generate reading notes