Skip to content

Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2)#591

Merged
igerber merged 1 commit into
mainfrom
docs/lpdid-survey-parity
Jul 1, 2026
Merged

Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2)#591
igerber merged 1 commit into
mainfrom
docs/lpdid-survey-parity

Conversation

@igerber

@igerber igerber commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Summary

  • Phase D2 of the LP-DiD salvage initiative: R-parity validation of the already-merged D1 LPDiD complex-survey path against survey::svyglm (Lumley). Verify-and-document only — no diff_diff/ change. Closes Phase D.
  • New benchmarks/R/generate_lpdid_survey_golden.R + committed lpdid_survey_panel.csv / lpdid_survey_golden.json (own seed, so the absorbing / non-absorbing goldens stay byte-identical).
  • New TestLPDiDSurveyParityR pins three structurally-distinct survey variance paths end-to-end through LPDiD.fit(survey_design=...), each per-horizon (point/SE/df) + pooled-post/pre:
    • VW full design (strata + PSU + FPC) vs svyglm(ids=~psu, strata, fpc, weights); df = n_PSU - n_strata (= 10).
    • inject weights-only, unit injected as PSU vs svyglm(ids=~unit, weights); df = n_PSU - 1 (= 139).
    • covariate direct inclusion vs svyglm(Dy ~ tr + x + factor(time), ...).
  • Plus an FPC-shrinks-SE ordering check (both SEs svyglm-pinned) and a weighting-flows guard.
  • Each LP-DiD horizon uses a fresh svydesign over that horizon's clean long-difference sample (matching the library's per-sample survey_design.resolve(sample), not subset() of a full design). svyglm is the reference implementation of the Binder TSL sandwich, so it anchors the variance directly; the clean-sample construction is independently cross-checked against alexCardazzi/lpdid (the unweighted VW event study matches to <1e-8), and every svyglm WLS point is gated == weighted feols on the same sample (<1e-7).
  • REGISTRY ## LPDiD Deviation Add Synthetic Difference-in-Differences (SDID) estimator #8 + checklist upgraded to "validated vs survey::svyglm"; CHANGELOG entry; D2 TODO row closed.

Methodology references

  • Method name(s): LPDiD complex-survey variance (Binder 1983 stratified-PSU Taylor-linearization sandwich). No estimator or default-behavior change — this PR validates the merged D1 path.
  • Paper / source link(s): Dube, Girardi, Jordà & Taylor (2025), A Local Projections Approach to Difference-in-Differences. Survey-variance reference: survey::svyglm (Lumley 2004).
  • Any intentional deviations from the source (and why): None new. svyglm is itself the reference TSL implementation, so it anchors the variance directly (no third-party survey package gate needed). Documented in docs/methodology/REGISTRY.md ## LPDiD Deviation Add Synthetic Difference-in-Differences (SDID) estimator #8.

Validation

  • Tests added/updated: tests/test_methodology_lpdid.py (TestLPDiDSurveyParityR: VW ES+pooled, weights-only inject, direct-covariate, FPC-shrinks-SE, weighting-flows). Tolerances: point abs=1e-6, SE rtol=1e-5/abs=1e-7, df exact. 17/17 pure-Python + 5/5 Rust green; existing LPDiD parity classes unchanged.
  • Backtest / simulation / notebook evidence: generator fail-closed gates — alexCardazzi clean-sample cross-check (matched to 0.00e+00), feols == svyglm point gate, no-lonely-PSU guard. Existing lpdid_test_panel.csv / lpdid_golden.json / lpdid_nonabsorbing_* confirmed byte-identical.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes (synthetic benchmark data only).

Generated with Claude Code

@github-actions

Copy link
Copy Markdown

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. This PR is validation/documentation-only for LPDiD complex-survey parity; no estimator implementation under diff_diff/ changed.

Executive Summary

  • Affected method: LPDiD complex-survey variance on the documented reweight=False path.
  • The new survey::svyglm generator matches the REGISTRY contract: fresh per-sample survey designs, pweight WLS point estimates, Binder TSL SEs, and degf(design) df.
  • Tests cover full strata+PSU+FPC, weights-only unit-PSU injection, direct covariates, pooled rows, FPC shrinkage, and weighting-flow behavior.
  • Existing unsupported survey surfaces remain tracked in TODO.md, so they are not blockers.
  • I could not execute tests in this environment: pytest is unavailable, and a manual Python check also failed because numpy is not installed.

Methodology

  • Severity: P3 informational
    Impact: The PR documents completion of the LPDiD survey R-parity validation while preserving the documented D1 scope limits. The updated registry note aligns with the implementation contract: WLS on survey pweights, stratified-PSU Binder TSL SEs, per-sample survey resolution, unit-as-PSU injection when no PSU is provided, and rejection of reweight=True, replicate weights, and non-pweights. See docs/methodology/REGISTRY.md:L1873-L1892 and TODO.md:L118-L119.
    Concrete fix: None required.

No undocumented methodology mismatch found. The generator builds event-study and pooled svyglm goldens per realized LPDiD sample, and the tests assert point/SE/df/n_PSU parity against those goldens. See benchmarks/R/generate_lpdid_survey_golden.R:L157-L228 and tests/test_methodology_lpdid.py:L389-L572.

Code Quality

  • Severity: None
    Impact: No estimator code changed, no new inline inference computation was introduced, and the new assertions are scoped to methodology parity.
    Concrete fix: None.

Performance

  • Severity: None
    Impact: The added fixture is small and test-only. The repeated LPDiD fits are acceptable for methodology validation.
    Concrete fix: None.

Maintainability

  • Severity: None
    Impact: The R generator is deterministic, pins the external alexCardazzi/lpdid cross-check commit, and writes disjoint survey fixtures, reducing fixture drift risk.
    Concrete fix: None.

Tech Debt

  • Severity: P3 informational
    Impact: Closing the D2 TODO row is consistent with the new parity fixture. Remaining LPDiD survey limitations are still tracked under Deferred / Documented.
    Concrete fix: None required.

Security

  • Severity: None
    Impact: Added data appears synthetic; no secrets or PII were evident in the changed fixture/generator paths.
    Concrete fix: None.

Documentation/Tests

  • Severity: None
    Impact: CHANGELOG, REGISTRY, TODO, generator, committed panel, committed golden JSON, and parity tests are in sync.
    Concrete fix: None.

Verification limitation: targeted pytest could not run because pytest is not installed; a direct Python parity smoke check could not run because numpy is not installed.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 30, 2026
…(Phase D2)

Pin the merged D1 LPDiD survey path end-to-end against survey::svyglm (Lumley)
goldens on a dedicated staggered-absorbing survey panel. Closes Phase D of the
LP-DiD salvage initiative. No diff_diff/ change (verify-and-document).

Three structurally-distinct survey variance paths are R-validated per-horizon
(point/SE/df) + pooled-post/pre:
  * VW full design (strata + PSU + FPC) vs svyglm(ids=~psu, strata, fpc, weights);
    df = n_PSU - n_strata = 10.
  * weights-only, unit injected as PSU vs svyglm(ids=~unit, weights); df = n_PSU - 1.
  * direct covariate inclusion vs svyglm(Dy ~ tr + x + factor(time)).

Each LP-DiD horizon uses a FRESH svydesign over that horizon's clean long-difference
sample (matching the library's per-sample resolution, not subset() of a full design).
svyglm is the reference implementation of the Binder TSL sandwich, so it anchors the
variance directly; the clean-sample construction is independently cross-checked against
alexCardazzi/lpdid (unweighted VW event study matches to <1e-8), and every svyglm WLS
point is gated == weighted feols on the same sample (<1e-7).

Tolerances: point abs=1e-6, SE rtol=1e-5/abs=1e-7, df exact. New
TestLPDiDSurveyParityR + generate_lpdid_survey_golden.R + lpdid_survey_panel.csv /
lpdid_survey_golden.json (own seed -> absorbing/non-absorbing goldens byte-identical).
REGISTRY Deviation #8 + checklist upgraded to "validated vs svyglm"; CHANGELOG; TODO row closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the docs/lpdid-survey-parity branch from e6a0d96 to ad053ff Compare June 30, 2026 22:42
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ad053ffaf1264bc0066225c01055ec9be227bdf3


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. This PR is validation/documentation-only for LPDiD complex-survey R parity; no estimator code under diff_diff/ is changed.

Executive Summary

  • Affected method: LPDiD complex-survey variance on the documented reweight=False path.
  • The updated REGISTRY note documents the exact supported scope and remaining rejected survey surfaces: reweight=True, replicate weights, and non-pweights remain deferred.
  • The new R generator uses fresh per-horizon svydesign samples and pins point/SE/df against survey::svyglm, with same-sample WLS point gates and an independent clean-sample cross-check.
  • The new Python tests cover full strata+PSU+FPC, weights-only unit-PSU injection, direct covariates, pooled rows, FPC shrinkage, and weighting flow.
  • I could not run the targeted tests here because pytest is not installed.

Methodology

  • Severity: P3 informational
    Impact: The PR validates the documented LPDiD survey path without changing estimator behavior. The registry now states WLS on survey pweights, Binder TSL SEs, per-sample design resolution, unit-as-PSU injection, df formulas, and the D2 survey::svyglm parity scope. This matches the generator and tests. See docs/methodology/REGISTRY.md:L1873-L1892, benchmarks/R/generate_lpdid_survey_golden.R:L157-L229, and tests/test_methodology_lpdid.py:L389-L572.
    Concrete fix: None required.

No undocumented methodology mismatch found.

Code Quality

No findings. The added test helpers are narrowly scoped, assert point/SE/df/n_PSU separately, and do not introduce inline inference computation or estimator-side parameter propagation changes.

Performance

No findings. The committed fixture is small, and the repeated LPDiD fits are test-only methodology validation.

Maintainability

No findings. The generator is deterministic, writes disjoint survey fixture paths, pins the external alexCardazzi/lpdid commit for the clean-sample cross-check, and records the golden metadata needed to regenerate/debug drift.

Tech Debt

  • Severity: P3 informational
    Impact: The removed TODO row for D2 is consistent with the new committed svyglm parity fixture. Remaining LPDiD survey limitations are still tracked under Deferred / Documented. See TODO.md:L118-L119.
    Concrete fix: None required.

Security

No findings. A secret-pattern scan over changed files did not flag credentials, and the added data appears synthetic.

Documentation/Tests

No findings. CHANGELOG, TODO, REGISTRY, generator, golden JSON, panel CSV, and parity tests are aligned.

Verification limitation: python -m pytest tests/test_methodology_lpdid.py::TestLPDiDSurveyParityR -q could not run because pytest is unavailable in this environment.

@igerber igerber merged commit 6596d1b into main Jul 1, 2026
25 checks passed
@igerber igerber deleted the docs/lpdid-survey-parity branch July 1, 2026 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant