Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2) by igerber · Pull Request #591 · igerber/diff-diff

igerber · 2026-06-30T20:18:42Z

Summary

Phase D2 of the LP-DiD salvage initiative: R-parity validation of the already-merged D1 LPDiD complex-survey path against survey::svyglm (Lumley). Verify-and-document only — no diff_diff/ change. Closes Phase D.
New benchmarks/R/generate_lpdid_survey_golden.R + committed lpdid_survey_panel.csv / lpdid_survey_golden.json (own seed, so the absorbing / non-absorbing goldens stay byte-identical).
New TestLPDiDSurveyParityR pins three structurally-distinct survey variance paths end-to-end through LPDiD.fit(survey_design=...), each per-horizon (point/SE/df) + pooled-post/pre:
- VW full design (strata + PSU + FPC) vs svyglm(ids=~psu, strata, fpc, weights); df = n_PSU - n_strata (= 10).
- inject weights-only, unit injected as PSU vs svyglm(ids=~unit, weights); df = n_PSU - 1 (= 139).
- covariate direct inclusion vs svyglm(Dy ~ tr + x + factor(time), ...).
Plus an FPC-shrinks-SE ordering check (both SEs svyglm-pinned) and a weighting-flows guard.
Each LP-DiD horizon uses a fresh svydesign over that horizon's clean long-difference sample (matching the library's per-sample survey_design.resolve(sample), not subset() of a full design). svyglm is the reference implementation of the Binder TSL sandwich, so it anchors the variance directly; the clean-sample construction is independently cross-checked against alexCardazzi/lpdid (the unweighted VW event study matches to <1e-8), and every svyglm WLS point is gated == weighted feols on the same sample (<1e-7).
REGISTRY ## LPDiD Deviation Add Synthetic Difference-in-Differences (SDID) estimator #8 + checklist upgraded to "validated vs survey::svyglm"; CHANGELOG entry; D2 TODO row closed.

Methodology references

Method name(s): LPDiD complex-survey variance (Binder 1983 stratified-PSU Taylor-linearization sandwich). No estimator or default-behavior change — this PR validates the merged D1 path.
Paper / source link(s): Dube, Girardi, Jordà & Taylor (2025), A Local Projections Approach to Difference-in-Differences. Survey-variance reference: survey::svyglm (Lumley 2004).
Any intentional deviations from the source (and why): None new. svyglm is itself the reference TSL implementation, so it anchors the variance directly (no third-party survey package gate needed). Documented in docs/methodology/REGISTRY.md ## LPDiD Deviation Add Synthetic Difference-in-Differences (SDID) estimator #8.

Validation

Tests added/updated: tests/test_methodology_lpdid.py (TestLPDiDSurveyParityR: VW ES+pooled, weights-only inject, direct-covariate, FPC-shrinks-SE, weighting-flows). Tolerances: point abs=1e-6, SE rtol=1e-5/abs=1e-7, df exact. 17/17 pure-Python + 5/5 Rust green; existing LPDiD parity classes unchanged.
Backtest / simulation / notebook evidence: generator fail-closed gates — alexCardazzi clean-sample cross-check (matched to 0.00e+00), feols == svyglm point gate, no-lonely-PSU guard. Existing lpdid_test_panel.csv / lpdid_golden.json / lpdid_nonabsorbing_* confirmed byte-identical.

Security / privacy

Confirm no secrets/PII in this PR: Yes (synthetic benchmark data only).

Generated with Claude Code

github-actions · 2026-06-30T20:21:38Z

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. This PR is validation/documentation-only for LPDiD complex-survey parity; no estimator implementation under diff_diff/ changed.

Executive Summary

Affected method: LPDiD complex-survey variance on the documented reweight=False path.
The new survey::svyglm generator matches the REGISTRY contract: fresh per-sample survey designs, pweight WLS point estimates, Binder TSL SEs, and degf(design) df.
Tests cover full strata+PSU+FPC, weights-only unit-PSU injection, direct covariates, pooled rows, FPC shrinkage, and weighting-flow behavior.
Existing unsupported survey surfaces remain tracked in TODO.md, so they are not blockers.
I could not execute tests in this environment: pytest is unavailable, and a manual Python check also failed because numpy is not installed.

Methodology

Severity: P3 informational
Impact: The PR documents completion of the LPDiD survey R-parity validation while preserving the documented D1 scope limits. The updated registry note aligns with the implementation contract: WLS on survey pweights, stratified-PSU Binder TSL SEs, per-sample survey resolution, unit-as-PSU injection when no PSU is provided, and rejection of reweight=True, replicate weights, and non-pweights. See docs/methodology/REGISTRY.md:L1873-L1892 and TODO.md:L118-L119.
Concrete fix: None required.

No undocumented methodology mismatch found. The generator builds event-study and pooled svyglm goldens per realized LPDiD sample, and the tests assert point/SE/df/n_PSU parity against those goldens. See benchmarks/R/generate_lpdid_survey_golden.R:L157-L228 and tests/test_methodology_lpdid.py:L389-L572.

Code Quality

Severity: None
Impact: No estimator code changed, no new inline inference computation was introduced, and the new assertions are scoped to methodology parity.
Concrete fix: None.

Performance

Severity: None
Impact: The added fixture is small and test-only. The repeated LPDiD fits are acceptable for methodology validation.
Concrete fix: None.

Maintainability

Severity: None
Impact: The R generator is deterministic, pins the external alexCardazzi/lpdid cross-check commit, and writes disjoint survey fixtures, reducing fixture drift risk.
Concrete fix: None.

Tech Debt

Severity: P3 informational
Impact: Closing the D2 TODO row is consistent with the new parity fixture. Remaining LPDiD survey limitations are still tracked under Deferred / Documented.
Concrete fix: None required.

Security

Severity: None
Impact: Added data appears synthetic; no secrets or PII were evident in the changed fixture/generator paths.
Concrete fix: None.

Documentation/Tests

Severity: None
Impact: CHANGELOG, REGISTRY, TODO, generator, committed panel, committed golden JSON, and parity tests are in sync.
Concrete fix: None.

Verification limitation: targeted pytest could not run because pytest is not installed; a direct Python parity smoke check could not run because numpy is not installed.

…(Phase D2) Pin the merged D1 LPDiD survey path end-to-end against survey::svyglm (Lumley) goldens on a dedicated staggered-absorbing survey panel. Closes Phase D of the LP-DiD salvage initiative. No diff_diff/ change (verify-and-document). Three structurally-distinct survey variance paths are R-validated per-horizon (point/SE/df) + pooled-post/pre: * VW full design (strata + PSU + FPC) vs svyglm(ids=~psu, strata, fpc, weights); df = n_PSU - n_strata = 10. * weights-only, unit injected as PSU vs svyglm(ids=~unit, weights); df = n_PSU - 1. * direct covariate inclusion vs svyglm(Dy ~ tr + x + factor(time)). Each LP-DiD horizon uses a FRESH svydesign over that horizon's clean long-difference sample (matching the library's per-sample resolution, not subset() of a full design). svyglm is the reference implementation of the Binder TSL sandwich, so it anchors the variance directly; the clean-sample construction is independently cross-checked against alexCardazzi/lpdid (unweighted VW event study matches to <1e-8), and every svyglm WLS point is gated == weighted feols on the same sample (<1e-7). Tolerances: point abs=1e-6, SE rtol=1e-5/abs=1e-7, df exact. New TestLPDiDSurveyParityR + generate_lpdid_survey_golden.R + lpdid_survey_panel.csv / lpdid_survey_golden.json (own seed -> absorbing/non-absorbing goldens byte-identical). REGISTRY Deviation #8 + checklist upgraded to "validated vs svyglm"; CHANGELOG; TODO row closed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T22:43:45Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ad053ffaf1264bc0066225c01055ec9be227bdf3

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. This PR is validation/documentation-only for LPDiD complex-survey R parity; no estimator code under diff_diff/ is changed.

Executive Summary

Affected method: LPDiD complex-survey variance on the documented reweight=False path.
The updated REGISTRY note documents the exact supported scope and remaining rejected survey surfaces: reweight=True, replicate weights, and non-pweights remain deferred.
The new R generator uses fresh per-horizon svydesign samples and pins point/SE/df against survey::svyglm, with same-sample WLS point gates and an independent clean-sample cross-check.
The new Python tests cover full strata+PSU+FPC, weights-only unit-PSU injection, direct covariates, pooled rows, FPC shrinkage, and weighting flow.
I could not run the targeted tests here because pytest is not installed.

Methodology

Severity: P3 informational
Impact: The PR validates the documented LPDiD survey path without changing estimator behavior. The registry now states WLS on survey pweights, Binder TSL SEs, per-sample design resolution, unit-as-PSU injection, df formulas, and the D2 survey::svyglm parity scope. This matches the generator and tests. See docs/methodology/REGISTRY.md:L1873-L1892, benchmarks/R/generate_lpdid_survey_golden.R:L157-L229, and tests/test_methodology_lpdid.py:L389-L572.
Concrete fix: None required.

No undocumented methodology mismatch found.

Code Quality

No findings. The added test helpers are narrowly scoped, assert point/SE/df/n_PSU separately, and do not introduce inline inference computation or estimator-side parameter propagation changes.

Performance

No findings. The committed fixture is small, and the repeated LPDiD fits are test-only methodology validation.

Maintainability

No findings. The generator is deterministic, writes disjoint survey fixture paths, pins the external alexCardazzi/lpdid commit for the clean-sample cross-check, and records the golden metadata needed to regenerate/debug drift.

Tech Debt

Severity: P3 informational
Impact: The removed TODO row for D2 is consistent with the new committed svyglm parity fixture. Remaining LPDiD survey limitations are still tracked under Deferred / Documented. See TODO.md:L118-L119.
Concrete fix: None required.

Security

No findings. A secret-pattern scan over changed files did not flag credentials, and the added data appears synthetic.

Documentation/Tests

No findings. CHANGELOG, TODO, REGISTRY, generator, golden JSON, panel CSV, and parity tests are aligned.

Verification limitation: python -m pytest tests/test_methodology_lpdid.py::TestLPDiDSurveyParityR -q could not run because pytest is unavailable in this environment.

igerber added the ready-for-ci Triggers CI test workflows label Jun 30, 2026

igerber force-pushed the docs/lpdid-survey-parity branch from e6a0d96 to ad053ff Compare June 30, 2026 22:42

igerber merged commit 6596d1b into main Jul 1, 2026
25 checks passed

igerber deleted the docs/lpdid-survey-parity branch July 1, 2026 00:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2)#591

Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2)#591
igerber merged 1 commit into
mainfrom
docs/lpdid-survey-parity

igerber commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 30, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant