Add on-demand flaky spec report script #24030
Open
myabc wants to merge 1 commit into
Warning This pull request does not link an OpenProject work package. Please add a link to the work package in the description, or reference it in the |
Pull request overview
Adds an on-demand CLI script to aggregate and rank flaky feature specs reported by CI across PR comments over a configurable recent time window, enabling repo-wide flakiness triage.
Changes:
- Introduces `script/flaky_specs_report` to fetch recent issue comments via `gh api`, filter “Flaky specs” comments, extract `rspec` identifiers, and rank them by recurrence.
- Supports configurable look-back window (hours) and repo override, and handles BSD vs GNU `date` implementations.
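As an illustration of the look-back window, here is a minimal Python sketch of how an hours value could be turned into a `since` cutoff for the repo-wide issue comments endpoint. It is not the script's implementation: the function names `since_cutoff` and `comments_path`, the `owner/repo` placeholder, and the choice of `per_page=100` are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode


def since_cutoff(hours: int, now: Optional[datetime] = None) -> str:
    """Return the ISO 8601 UTC timestamp `hours` before `now` (hypothetical helper)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    now = now or datetime.now(timezone.utc)
    cutoff = now.astimezone(timezone.utc) - timedelta(hours=hours)
    # GitHub's `since` parameter expects YYYY-MM-DDTHH:MM:SSZ.
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


def comments_path(repo: str, hours: int, now: Optional[datetime] = None) -> str:
    """Build the issue comments path with a `since` filter (hypothetical helper)."""
    query = urlencode({"since": since_cutoff(hours, now), "per_page": 100})
    return f"repos/{repo}/issues/comments?{query}"


if __name__ == "__main__":
    # "owner/repo" is a placeholder, not the real repository.
    print(comments_path("owner/repo", 48))
```

A path built this way could be handed to `gh api --paginate`. Doing the date arithmetic in Python sidesteps the BSD vs GNU `date` difference, which is one reason this sketch does not mirror a shell implementation.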
CI posts a PR comment for every run that retried failed feature specs, but flakiness is scattered across PRs with no repo-wide view. Adds script/flaky_specs_report, which reads those comments back via the issues/comments REST endpoint with a since cutoff and ranks specs by recurrence over a 24/48/72h window. The visible spec list is parsed before the embedded Copilot prompt so each occurrence is counted once. The default output is a single cheap API call; an opt-in --freshness pass classifies occurrences against dev to separate flakes on current branches from ones on stale ones. Also teaches the flaky CI comment template to embed hidden commit and run metadata so future reports can classify occurrences exactly.
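The counting rule in this commit message (parse only the visible spec list, stop before the embedded prompt, count each occurrence once) can be sketched in Python as follows. The comment layout is assumed, not taken from the PR: `FLAKY_MARKER`, `PROMPT_MARKER`, and the `rspec ./spec/..._spec.rb:LINE` identifier shape in `SPEC_PATTERN` are hypothetical stand-ins for whatever the real CI comment template uses.

```python
import re
from collections import Counter
from typing import Iterable

FLAKY_MARKER = "Flaky specs"   # assumed text identifying a flaky-spec comment
PROMPT_MARKER = "<details>"    # assumed start of the embedded Copilot prompt
# Assumed identifier shape: "rspec ./spec/..._spec.rb", optionally with :LINE or [1:2:3].
SPEC_PATTERN = re.compile(r"rspec\s+(\./spec/\S+?_spec\.rb(?::\d+|\[[\d:]+\])?)")


def visible_specs(body: str) -> set:
    """Return the distinct spec identifiers in the visible part of one comment."""
    if FLAKY_MARKER not in body:
        return set()
    # Drop everything from the prompt onwards so repeated identifiers are ignored.
    visible = body.split(PROMPT_MARKER, 1)[0]
    return set(SPEC_PATTERN.findall(visible))


def rank_specs(bodies: Iterable[str]) -> list:
    """Rank spec identifiers by the number of comments that report them."""
    counts = Counter()
    for body in bodies:
        counts.update(visible_specs(body))
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [
        "## Flaky specs\n- `rspec ./spec/features/a_spec.rb:12`\n"
        "<details>rspec ./spec/features/a_spec.rb:12</details>",
        "## Flaky specs\n- `rspec ./spec/features/a_spec.rb:12`",
    ]
    for spec, count in rank_specs(sample):
        print(count, spec)
```

Using a set per comment means a spec listed in both the visible list and the prompt, or twice in one comment, still contributes a single occurrence for that CI run.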
89dc83a to e3d6970
Ticket
No work package — internal tooling. Grew out of a request to cheaply pull the latest list of flaky specs across PRs to prioritise investigation.
What are you trying to accomplish?
Adds `script/flaky_specs_report`, an on-demand CLI that aggregates flaky specs reported across PRs over a recent time window (24/48/72h).
CI's "Report flaky specs" step posts a PR comment for every run that had to retry failed feature specs, but those comments are scattered across individual PRs. There was no way to see flakiness repo-wide or decide what to investigate first.
Usage:
Requires `gh` (authenticated) and `jq`. Handles both BSD (macOS) and GNU `date`.
What approach did you choose and why?
Those CI PR comments are issue comments, so the script reads them back via `GET /repos/{owner}/{repo}/issues/comments?since=<ISO>`: paginated, repo-wide, time-filtered. By default it makes a single (paginated) API call and prints raw recurrence counts, so it stays cheap and dependency-light: no Actions API, no artifact downloads, no log scraping. The visible spec list is parsed before the embedded Copilot prompt, so each occurrence is counted once.
`--freshness` is an optional, heavier pass that classifies each occurrence against `dev` (one PR/compare lookup per occurrence) to separate flakes on current branches from ones on stale branches. `--freshness-commits N` tunes the staleness threshold and `--include-stale` ranks by total while still showing the split; both imply `--freshness`.
To make that classification exact over time, the flaky CI comment template now embeds a hidden `commit`/`run_url`/`run_id` metadata block (invisible in the rendered comment). Existing historical comments lack it, so freshness for them is approximate (via the PR's current head); future comments will be exact once CI posts the new format.
I kept it as an on-demand CLI rather than a scheduled job posting to Slack/an issue. I wanted the data path proven first; a recurring wrapper can come later.
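To make the hidden metadata block concrete, here is a minimal Python sketch of how a report could read it back and fall back when it is absent. The PR does not show the block's format, so the HTML-comment layout, the `flaky-specs-meta` tag, and the `key=value` syntax below are assumptions; only the field names `commit`, `run_url`, and `run_id` come from the description.

```python
import re
from typing import Optional

# Assumed layout: <!-- flaky-specs-meta commit=... run_url=... run_id=... -->
# HTML comments are not rendered by GitHub, which keeps the block invisible.
META_BLOCK = re.compile(r"<!--\s*flaky-specs-meta\s+(.*?)\s*-->", re.DOTALL)
META_FIELD = re.compile(r"(commit|run_url|run_id)=(\S+)")


def parse_metadata(body: str) -> Optional[dict]:
    """Return the embedded metadata, or None for comments in the older format."""
    match = META_BLOCK.search(body)
    if match is None:
        return None
    return dict(META_FIELD.findall(match.group(1)))


if __name__ == "__main__":
    new_style = (
        "## Flaky specs\n"
        "<!-- flaky-specs-meta commit=abc1234 "
        "run_url=https://example.com/runs/42 run_id=42 -->"
    )
    print(parse_metadata(new_style))         # exact classification is possible
    print(parse_metadata("## Flaky specs"))  # None: fall back to the PR's current head
```

A `None` result corresponds to the historical comments mentioned above, for which freshness can only be approximated.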
Caveats:
Merge checklist
spec/scripts/flaky_specs_report_spec.rb)