
Fix seed boolean null handling #5870

Open
nkwork9999 wants to merge 1 commit into SQLMesh:main from nkwork9999:fix/databricks-seed-null-booleans

@nkwork9999

Contributor

Summary

Fixes #5589.

This preserves NULL values in seed boolean columns instead of coercing them to False.

The issue was caused by the seed boolean conversion calling str_to_bool(str(value)) on every value, including missing ones. Pandas reads CSV null values as NaN, so the seed renderer converted NaN to the string "nan", and str_to_bool("nan") returned False.
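The failure mode can be reproduced without SQLMesh or pandas. The sketch below is illustrative only: `str_to_bool_sketch` is a hypothetical stand-in for SQLMesh's `str_to_bool` (its real list of truthy strings may differ), and `float("nan")` stands in for the NaN that pandas produces for a null CSV cell.

```python
def str_to_bool_sketch(s: str) -> bool:
    # Hypothetical stand-in for SQLMesh's str_to_bool: anything that is not
    # a recognised "truthy" string is treated as False.
    return s.strip().lower() in {"true", "1", "t", "y", "yes", "on"}


# Stand-in for the NaN that pandas yields for a null CSV cell.
missing = float("nan")

# The old code path stringified every value before converting it.
as_text = str(missing)
coerced = str_to_bool_sketch(as_text)

print(repr(as_text), coerced)
```

Because "nan" is not a truthy string, the missing value becomes indistinguishable from an explicit false.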

Changes

  • Preserve pandas NA values as None when rendering seed boolean columns.
  • Add a regression test for a seed CSV containing null, false, true, and null in a boolean column.
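A minimal sketch of the idea behind the fix, under stated assumptions: the names `is_missing` and `render_seed_bool` are hypothetical and do not come from the SQLMesh codebase, and `math.isnan` is used in place of a pandas NA check only to keep the example dependency-free. The sample column mirrors the null, false, true, null shape used by the regression test.

```python
import math
from typing import Any, Optional

TRUTHY = {"true", "1", "t", "y", "yes", "on"}  # assumed set, for illustration


def is_missing(value: Any) -> bool:
    # Dependency-free stand-in for a pandas NA check: None or float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def render_seed_bool(value: Any) -> Optional[bool]:
    # Check for missing values *before* stringifying, so NULL stays NULL.
    if is_missing(value):
        return None
    return str(value).strip().lower() in TRUTHY


# Same shape as the regression test's column: null, false, true, null.
column = [float("nan"), "false", "true", float("nan")]
rendered = [render_seed_bool(v) for v in column]
print(rendered)
```

The ordering is the design point: the missing-value check has to run before `str(value)`, since the string "nan" can no longer be told apart from real cell text.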

Databricks validation

Verified on Databricks Free Edition (SQL Warehouse) that Databricks preserves boolean NULL values when rows are inserted with the expected SQL shape:

CREATE SCHEMA IF NOT EXISTS sqlmesh_smoke;

CREATE OR REPLACE TABLE sqlmesh_smoke.seed_bool_null_smoke (
  id STRING,
  test_ind BOOLEAN
);

INSERT INTO sqlmesh_smoke.seed_bool_null_smoke
SELECT
  CAST(id AS STRING) AS id,
  CAST(test_ind AS BOOLEAN) AS test_ind
FROM VALUES
  ('1', NULL),
  ('2', FALSE),
  ('3', TRUE),
  ('4', NULL) AS t(id, test_ind);

SELECT
  id,
  test_ind,
  test_ind IS NULL AS is_null
FROM sqlmesh_smoke.seed_bool_null_smoke
ORDER BY id;

Result:

1  null   true
2  false  false
3  true   false
4  null   true

Also verified locally that SQLMesh now generates Databricks SQL preserving NULL values:

FROM VALUES
  ('1', CAST(NULL AS BOOLEAN)),
  ('2', FALSE),
  ('3', TRUE),
  ('4', NULL) AS t(id, test_ind)

Tests

uv run pytest tests/dbt/test_transformation.py::test_seed_boolean_nulls_are_preserved -q
1 passed

uv run pytest tests/dbt/test_transformation.py::test_seed_boolean_nulls_are_preserved tests/dbt/test_transformation.py::test_seed_single_whitespace_is_na tests/dbt/test_transformation.py::test_seed_partial_column_inference -q
3 passed

uv run ruff check sqlmesh/core/model/definition.py tests/dbt/test_transformation.py
All checks passed!

@nkwork9999 nkwork9999 force-pushed the fix/databricks-seed-null-booleans branch from 5429717 to 673aa45 Compare June 30, 2026 14:02
@StuffbyYuki StuffbyYuki self-requested a review June 30, 2026 15:23
@nkwork9999 nkwork9999 force-pushed the fix/databricks-seed-null-booleans branch from 673aa45 to 197e06a Compare June 30, 2026 15:43
Signed-off-by: nkwork9999 <143652584+nkwork9999@users.noreply.github.com>
@nkwork9999 nkwork9999 force-pushed the fix/databricks-seed-null-booleans branch from 197e06a to 6f9744d Compare June 30, 2026 15:55

Development

Successfully merging this pull request may close these issues.

Databricks CSV seed load coercing NULL booleans to False

1 participant