feat(#1424): self.upstream property for pre-restricted ancestor access#1473
feat(#1424): self.upstream property for pre-restricted ancestor access#1473dimitri-yatsenko wants to merge 1 commit into
Conversation
Implements T2.2.b of the provenance trinity. Inside make(), self.upstream exposes a pre-constructed Diagram.trace(self & key) so users can read declared ancestors with provenance-safe, ergonomic syntax. Branch stacked on feat/1423-diagram-trace (#1471) for Diagram.trace(). What's added: - src/datajoint/autopopulate.py: - AutoPopulate._upstream class attribute (default None) — instance storage for the per-make() trace. - AutoPopulate.upstream property — returns the trace if set, raises DataJointError with a clear "only available inside make()" message otherwise. The error includes the fallback pattern (dj.Diagram.trace(self & key)) so the user knows the escape hatch. - In _populate_one, set self._upstream = Diagram.trace(self & dict(key)) immediately before the make() invocation block. Construction is lazy at this layer (graph copy only); the SQL fetch fires when the user accesses self.upstream[T].fetch(...). - The existing `finally` block (line 716) that resets _allow_insert now also resets _upstream to None, so subsequent attribute access raises a clear error rather than silently returning a stale trace from the previous make() call. What's not changed: - make() signature: unchanged — key remains a dict, make_kwargs work as before. self.upstream is a new attribute on self, not a new parameter. - Tripartite make pattern: self._upstream is set once before all three make() invocations, so all three phases see the same upstream view. Tests in tests/integration/test_autopopulate.py (5 new): - test_upstream_provides_pre_restricted_ancestor — basic case: make() reads self.upstream[Subject].fetch1("name") and the value is correctly pre-restricted to the current key. - test_upstream_rejects_non_ancestor — self.upstream[Unrelated] raises DataJointError ("not in this trace"). Inherited from Diagram.__getitem__. - test_upstream_unset_outside_make — accessing the property outside of make() raises with the helpful "only available inside make()" message. - test_upstream_cleared_after_make — after populate() completes, accessing the property on a fresh instance still raises (verifies the finally cleanup; no stale state). - test_upstream_seen_across_tripartite_make — both make_fetch / make_compute / make_insert see the same self.upstream value. Full regression: 17/17 autopopulate tests pass on MySQL. Slated for DataJoint 2.3. Blocked on #1471 (Diagram.trace) merging; T2.2.c (strict_provenance) is stacked on this PR.
Implements T2.2.c of the provenance trinity, completing the trio (Diagram.trace → self.upstream → strict_provenance). When dj.config["strict_provenance"] = True, runtime gates enforce the upstream-only convention inside make(): - Reads must target a table in the active trace's allowed set (declared ancestors + self + self's Parts). - Writes must target self or self's Parts. - Inserted rows' PK columns that overlap with the current key must equal the key's values (key-consistency rule). Default is False. Existing make() bodies are unaffected. Branch stacked on feat/1424-self-upstream (#1473) → feat/1423-diagram-trace (#1471) → fix/1429-cascade-part-part-renamed-fk (#1468). Will rebase onto master after the chain merges. What's added: - src/datajoint/provenance.py (new): the runtime context module. - `_active_strict_make` ContextVar holding (target, allowed_tables, key) for the currently-executing make() invocation. ContextVar chosen over threading.local to propagate correctly across contextvars-aware concurrency boundaries. - `push_strict_make_context` / `pop_strict_make_context` — context lifecycle managed by `_populate_one`'s try/finally. - `assert_read_allowed(query_expression)` — read gate. Recursively discovers base tables via the QueryExpression's `_support` chain and checks each against the allowed set. - `assert_write_allowed(target_table, rows)` — write gate. Verifies the target is self or one of self's Part tables, and checks the key-consistency rule on each dict row. - src/datajoint/settings.py: new `strict_provenance: bool` field on Config (default False), env-var `DJ_STRICT_PROVENANCE`, ENV_VAR_MAPPING entry. - src/datajoint/autopopulate.py: in `_populate_one`, push the strict context (when the flag is on) just before the make() invocation block. The allowed table set = trace's ancestor nodes ∪ {self.full_table_name} ∪ {self's Parts}. Pop in the existing `finally` block. - src/datajoint/expression.py: `QueryExpression.cursor` now calls `assert_read_allowed(self)` before issuing SQL. No-op outside make(). - src/datajoint/table.py: `Table.insert` calls `assert_write_allowed(self, rows)` after the existing `_allow_insert` check. No-op outside make(). Part-table detection uses class `__dict__` traversal (filtered to Part subclasses) instead of `dir/getattr` to avoid triggering the `_JobsDescriptor` (which would lazy-declare ~~table inside the populate transaction — caught by the first test iteration). Documented limitation (deferred): the read gate does not distinguish reads that came through `self.upstream` from reads of the same ancestor via a direct expression. Both are allowed if the table is in the allowed set. The intent is to catch reads from *undeclared* dependencies; tightening the "must come through self.upstream" path requires propagating an attribution marker through QueryExpression composition and is left for a follow-up release. Tests in tests/integration/test_strict_provenance.py (6 new): - test_strict_compliant_make_passes — make() reading via self.upstream and writing self.insert1 with matching key runs cleanly under strict. - test_strict_blocks_read_from_undeclared_table — read from an unrelated table raises with "strict_provenance ... undeclared" message. - test_strict_blocks_write_to_other_table — insert into a non-self, non-Part target raises "not permitted". - test_strict_blocks_write_with_mismatched_key — row PK that disagrees with the current key raises "does not match the current make() key". - test_strict_writes_to_part_table_pass — self.PartName.insert(...) works. - test_strict_off_by_default_no_change — default-off regression check; the canonical "direct (Ancestor & key).fetch1()" pattern still works when strict_provenance is unset. Regression: 17/17 autopopulate tests pass with strict_provenance unset (default). 6/6 new strict tests pass with strict_provenance=True. 8/8 trace tests + 9/9 cascade tests unaffected. Slated for DataJoint 2.3.
86b1ae7 to
7e4130f
Compare
|
The logic here is sound. One gap worth closing: the two guarantees this design most needs to prove are effectively untested.
Suggest a test that captures the populate instance ( |
Summary
T2.2.b of the provenance trinity. Inside `make()`, `self.upstream` exposes a pre-constructed `Diagram.trace(self & key)` so user code can read declared ancestors with provenance-safe, ergonomic syntax.
Closes #1424. Slated for DataJoint 2.3.
Branch dependency
Stacked on `feat/1423-diagram-trace` (#1471). This PR's base is `feat/1423-diagram-trace`, not `master`. Will rebase onto master once #1471 (and its parent #1468) merge.
What's added
Design notes
Tests
Sequencing
Once this PR merges, T2.2.c (`strict_provenance` config flag, the runtime gate) goes next. It branches off this PR.
Test plan