Skip to content

Add on-demand flaky spec report script#24030

Open
myabc wants to merge 1 commit into
devfrom
chore/flaky-specs-report
Open

Add on-demand flaky spec report script#24030
myabc wants to merge 1 commit into
devfrom
chore/flaky-specs-report

Conversation

@myabc

@myabc myabc commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Ticket

No work package — internal tooling. Grew out of a request to cheaply pull the latest list of flaky specs across PRs to prioritise investigation.

What are you trying to accomplish?

Adds script/flaky_specs_report, an on-demand CLI that aggregates flaky specs reported across PRs over a recent time window (24/48/72h).

CI's "Report flaky specs" step posts a PR comment for every run that had to retry failed feature specs, but those comments are scattered across individual PRs. There was no way to see flakiness repo-wide or decide what to investigate first.

Usage:

script/flaky_specs_report 24                  # last 24h (default)
script/flaky_specs_report 48
script/flaky_specs_report 72
script/flaky_specs_report --freshness 48      # add freshness classification

Requires gh (authenticated) and jq. Handles both BSD (macOS) and GNU date.

What approach did you choose and why?

Those CI PR comments are issue comments, so the script reads them back via GET /repos/{owner}/{repo}/issues/comments?since=<ISO> — paginated, repo-wide, time-filtered. By default it makes a single (paginated) API call and prints raw recurrence counts, so it stays cheap and dependency-light: no Actions API, no artifact downloads, no log scraping. The visible spec list is parsed before the embedded Copilot prompt, so each occurrence is counted once.

--freshness is an optional, heavier pass that classifies each occurrence against dev (one PR/compare lookup per occurrence) to separate flakes on current branches from ones on stale branches. --freshness-commits N tunes the staleness threshold and --include-stale ranks by total while still showing the split; both imply --freshness.

To make that classification exact over time, the flaky CI comment template now embeds a hidden commit / run_url / run_id metadata block (invisible in the rendered comment). Existing historical comments lack it, so freshness for them is approximate (via the PR's current head); future comments will be exact once CI posts the new format.

I kept it as an on-demand CLI rather than a scheduled job posting to Slack/an issue — wanted the data path proven first; a recurring wrapper can come later.

Caveats:

  • Count is a frequency proxy (number of PR comments naming the spec), not an exact retry count — good enough for triage ranking.
  • Only catches specs CI retried (≤10 failures per run); runs that bail with >10 failures post no comment.
  • Fork PRs print to the job summary instead of commenting, so they aren't captured.

Merge checklist

  • Added/updated tests (spec/scripts/flaky_specs_report_spec.rb)
  • Added/updated documentation in Lookbook (patterns, previews, etc) — N/A, CLI tooling
  • Tested major browsers (Chrome, Firefox, Edge, ...) — N/A, no UI

Copilot AI review requested due to automatic review settings June 30, 2026 17:25
@github-actions

Copy link
Copy Markdown

Warning

This pull request does not link an OpenProject work package.

Please add a link to the work package in the description, or reference it in the
title in square brackets, e.g. [SLUG-123] My title here.

@myabc myabc requested a review from toy June 30, 2026 17:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an on-demand CLI script to aggregate and rank flaky feature specs reported by CI across PR comments over a configurable recent time window, enabling repo-wide flakiness triage.

Changes:

  • Introduces script/flaky_specs_report to fetch recent issue comments via gh api, filter “Flaky specs” comments, extract rspec identifiers, and rank them by recurrence.
  • Supports configurable look-back window (hours) and repo override, and handles BSD vs GNU date implementations.

Comment thread script/flaky_specs_report Outdated
@myabc myabc added the ci label Jun 30, 2026
CI posts a PR comment for every run that retried failed feature specs,
but flakiness is scattered across PRs with no repo-wide view. Adds
script/flaky_specs_report, which reads those comments back via the
issues/comments REST endpoint with a since cutoff and ranks specs by
recurrence over a 24/48/72h window.

The visible spec list is parsed before the embedded Copilot prompt so
each occurrence is counted once. The default output is a single cheap
API call; an opt-in --freshness pass classifies occurrences against
dev to separate flakes on current branches from ones on stale ones.

Also teaches the flaky CI comment template to embed hidden commit and
run metadata so future reports can classify occurrences exactly.
@myabc myabc force-pushed the chore/flaky-specs-report branch from 89dc83a to e3d6970 Compare June 30, 2026 18:56
@myabc myabc requested a review from akabiru June 30, 2026 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

2 participants