Versions benchmark rework by alexey-milovidov · Pull Request #968 · ClickHouse/ClickBench

alexey-milovidov · 2026-07-01T06:39:07Z

No description provided.

Rebuild the ClickHouse Versions Benchmark infrastructure from scratch around Docker images so every historical and current version can be run identically, replacing the old apt-based scripts.

- list-versions.sh: select versions from the authoritative version_date.tsv (all 1.1.x + latest patch per YY.MM, 151 versions), resolve each to a yandex/clickhouse image, package, or unavailable (image-aware, handles 3- vs 4-component tag mismatches).
- prepare-data/: build canonical Native data files for hits, SSB (SF100), mgbench (logs1/2/3) and NYC taxi using only oldest-compatible types (Nullable kept only where the queries need IS NULL); stored zstd-6 and streamed via `zstd -dc | clickhouse-client` at load time.
- create/: per-version DDL (legacy MergeTree(date,(key),8192) for the earliest 1.1.x, modern PARTITION BY/ORDER BY otherwise) with column schemas under create/schema/.
- run-version.sh / run-all.sh: provider abstraction (image + package-in-ubuntu fallback), IPv4 listen override and a matching-version sidecar client image to repair the oldest server images (back to 1.1.54019), plain INSERT ... FORMAT Native loading, 75-query set timed per dataset.

Validated full-scale on 1.1.54019 (oldest) and 1.1.54378; data files are gitignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
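The selection rule described for list-versions.sh (every 1.1.x, plus the latest patch per YY.MM) could be sketched as follows. This is an illustration only, not the script from the PR: the function name `select_versions` is hypothetical, and the input layout (one `version<TAB>date` pair per line, with an optional leading `v` and a trailing suffix such as `-lts`) is an assumption about version_date.tsv.

```shell
# select_versions: read "version<TAB>date" lines on stdin (layout assumed)
# and print every 1.1.x version plus the newest patch of each YY.MM series.
select_versions() {
    cut -f1 \
    | sed -e 's/^v//' -e 's/-[a-z]*$//' \
    | sort -V \
    | awk -F. '
        # Keep every 1.1.x release as-is.
        $1 == 1 && $2 == 1 { print; next }
        # Input is sorted ascending, so the last line seen per series wins.
        {
            key = $1 "." $2
            if (!(key in last)) order[++n] = key
            last[key] = $0
        }
        END { for (i = 1; i <= n; i++) print last[order[i]] }'
}
```

Because the input is version-sorted before filtering, the output is also in ascending version order. `sort -V` is a GNU coreutils extension.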

Add build-from-source/ to compile and run the ClickHouse versions that were never published as a Docker image or package: the bare-number early tags 53973..54011 and the 1.1.x releases 54165/54318/54335/54336/54358/54362/54370.

- Dockerfile.ubuntu1604: build a tag in its contemporary environment (Ubuntu 16.04) into a runnable clickhouse-built:<v> image. Handles the era quirks: compiler escalates with date (gcc-5 -> 6 -> 7, the later two from the ubuntu-toolchain-r PPA via ARG GCC), strip the hardcoded -Werror, tolerant submodule init (contrib/zookeeper's upstream is gone -> cmake falls back to system libzookeeper-mt-dev), IPv4 listen, a clickhouse multi-call shim and the pre-created data dirs the 2016 server needs.
- build.sh / build-all.sh: build one or many (JOBS concurrent, since a single make -j$(nproc) doesn't saturate the cores on these small codebases).
- versions.txt: the build list with tag, date and required GCC per version.
- list-versions.sh: route these versions to their clickhouse-built:<v> image and order all 189 versions chronologically; nothing is "unavailable" anymore.
- run-all.sh: load PARALLEL versions concurrently, then benchmark sequentially.
- run-version.sh: LOAD_DATASETS lets a run skip a dataset's load (e.g. the huge taxi table) while its queries still run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

create.sh parsed bare build-number versions (e.g. "53982", the pre-1.1 early-release tags) as major=53982 >= 18 and emitted modern PARTITION BY / ORDER BY syntax, which the 2016 servers reject — so every table create failed and those versions produced all-null results. Treat a bare numeric version as an early build (custom partitioning landed at build 54310; all bare tags predate it), so they correctly get the legacy MergeTree(date,(key),8192) engine. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
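The version check described above might look roughly like the sketch below. It is not the code from create.sh: the function name `ddl_style` is hypothetical, and applying the 54310 threshold to the third component of 1.1.x versions is an assumption drawn from the statement that custom partitioning landed at build 54310.

```shell
# ddl_style VERSION: print "legacy" or "modern".
# Assumed rule: a bare build number is legacy; 1.1.N is legacy when
# N < 54310; everything else gets PARTITION BY / ORDER BY syntax.
ddl_style() {
    case "$1" in
        *[!0-9.]*|'') echo unknown; return 1 ;;  # not a plain version string
        *.*) ;;                                  # dotted version, parsed below
        *) echo legacy; return 0 ;;              # bare build number, e.g. 53982
    esac
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    build=${rest#*.}
    build=${build%%.*}
    if [ "$major" -eq 1 ] && [ "$minor" -eq 1 ] && [ "$build" -lt 54310 ]; then
        echo legacy
    else
        echo modern
    fi
}
```

Checking for the bare-number form before extracting a major version is what avoids the original bug, where 53982 was compared against 18 as if it were a major version.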

Mirror the main ClickBench cloud flow for the Versions Benchmark: benchmark one version per fresh VM and send the result to the sink.

- cloud-init.sh.in: install Docker, download the prepared Native files from s3://clickhouse-public-datasets/versions-benchmark/, build the image from source when the version has none (clickhouse-built:* via build-from-source), run run-version.sh, POST the result JSON (enriched with machine + kind) and the log to sink.data on play.clickhouse.com, then terminate.
- run-benchmark.sh: resolve a version's image/tag/gcc and launch a VM (terminate-on-shutdown, capacity-retry), as the main launcher does.
- run-all-benchmarks.sh: one VM per runnable version.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Match the main ClickBench download style (resumable, giga-scale progress). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

If clickhouse dies during a query (e.g. OOM-killed), the container exits but its data layer survives. Detect the dead server (SELECT 1 fails), revive it with docker start (relaunch the daemon for the package provider), and retry the query up to CRASH_RETRIES (default 2). This keeps one heavy query from nulling out every subsequent query for that version. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
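A minimal sketch of the detect, revive and retry loop follows. The names `run_with_revive`, `server_alive`, `revive_server` and `CONTAINER` are hypothetical, and the two hooks only cover the Docker image provider; the package provider would need to relaunch the daemon instead.

```shell
CRASH_RETRIES=${CRASH_RETRIES:-2}

# Hypothetical hooks for the Docker provider; CONTAINER is an assumed variable.
server_alive() {
    docker exec "$CONTAINER" clickhouse-client --query 'SELECT 1' </dev/null >/dev/null 2>&1
}
revive_server() {
    docker start "$CONTAINER" >/dev/null
}

# run_with_revive CMD [ARG...]: run CMD; if it fails because the server
# died, revive the server and retry, at most CRASH_RETRIES times.
run_with_revive() {
    attempt=0
    while :; do
        if "$@"; then
            return 0
        fi
        # The query failed but the server is up: a genuine error, no retry.
        if server_alive; then
            return 1
        fi
        if [ "$attempt" -ge "$CRASH_RETRIES" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        revive_server || return 1
    done
}
```

Retrying only when the liveness probe fails keeps ordinary query errors from being run again needlessly.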

e.g. '26.6' resolves to '26.6.1.1193' (one patch per YY.MM is kept), and the launcher canonicalises to the full version. Exact versions and bare tags still match directly; an ambiguous prefix lists the candidates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A prefix now picks the newest matching version instead of erroring on ambiguity: 24 -> 24.12.x, 1.1 -> the latest 1.1.x, 26.6 -> 26.6.1.1193. Exact versions and bare tags still match directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
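The matching rule (exact match first, otherwise the newest version under the prefix) can be illustrated as below. The function name `resolve_version` is hypothetical, and reading the candidate list from stdin is an assumption about how the launcher is wired.

```shell
# resolve_version PREFIX: read candidate versions on stdin, one per line.
# Print PREFIX itself on an exact match, otherwise the newest version that
# starts with "PREFIX.". Return non-zero when nothing matches.
resolve_version() {
    want=$1
    list=$(cat)
    if printf '%s\n' "$list" | grep -Fxq -- "$want"; then
        printf '%s\n' "$want"
        return 0
    fi
    # Requiring the trailing dot keeps "24" from matching e.g. "240.1".
    match=$(printf '%s\n' "$list" \
        | awk -v p="$want." 'index($0, p) == 1' \
        | sort -V | tail -n 1)
    [ -n "$match" ] && printf '%s\n' "$match"
}
```

With a list that contains 26.6.1.1193, the prefix `26.6` resolves to that full version, as in the example above.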

Builds an EBS volume holding the prepared Native files (sized just-enough), snapshots it (labelled versions-data, tagged Name=clickbench-versions-data) and deletes the working volume. Standalone and not wired into the launcher: a snapshot-backed volume lazy-loads from S3, so for one-shot VMs it is not faster than the plain S3 download unless Fast Snapshot Restore or volume reuse is used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain S3 download for one-shot VMs; the snapshot approach is not used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…loop stdin

- load_data now echoes each CREATE (with DDL), the INSERT ... FORMAT Native and source file, and "loaded <table>: N rows in Ns", so the cloud-init log shows what's happening during ingest.
- Client invocations set HOME=/tmp (old images' clickhouse user has HOME /nonexistent -> history-file error) and TZ=UTC, and the sidecar client mounts the host /usr/share/zoneinfo (some old client images ship no tzdata and fail at startup with "Could not determine local time zone").
- Fix a stdin-drain regression from the crash-retry: the per-query `docker exec/run -i` client (and the SELECT 1 liveness probe) consumed the query file the benchmark loop reads on stdin, truncating each version to ~60/75 queries. Read queries on FD 3 and give the probe </dev/null.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
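The stdin-drain fix can be shown in isolation. In this sketch the names `run_queries` and `probe` are hypothetical; the point is only that the loop reads the query file on FD 3, so a command inside the loop that reads stdin cannot swallow the remaining queries.

```shell
# run_queries FILE CMD [ARG...]: send each line of FILE to CMD on its
# stdin, one invocation per line.
run_queries() {
    file=$1
    shift
    # The loop reads from FD 3, so nothing in the body can consume the
    # rest of the query file by reading stdin.
    while IFS= read -r query <&3 || [ -n "$query" ]; do
        printf '%s\n' "$query" | "$@"
    done 3<"$file"
}

# probe CMD [ARG...]: run a liveness check with an empty stdin.
probe() {
    "$@" </dev/null
}
```

If the loop used `done <"$file"` instead and the command read stdin directly, the first invocation would consume every remaining line, which matches the truncated query counts described above.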

Echo 'qN [dataset]: [t1, t2, t3]' to the log as each query finishes, and cat results/<version>.json at the end so the full result is visible in the run output / cloud-init log (and thus received via the sink), not just written to a file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pipe the compressed file through pv before zstd -> INSERT, so loads report a periodic progress bar (percentage, rate, ETA) based on the known file size. Falls back to cat when pv is absent; pv added to the cloud-init apt install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
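The pv-or-cat fallback could be written along these lines. The function name `stream_file` is hypothetical, and the use of `pv -f` is an assumption: pv normally stays silent when stderr is not a terminal, and the commit does not say which flags it passes.

```shell
# stream_file FILE: write FILE to stdout. When pv is installed, also
# report progress on stderr; pv knows the file size, so it can show a
# percentage and an ETA.
stream_file() {
    if command -v pv >/dev/null 2>&1; then
        # -f forces progress output even when stderr is a log, not a terminal.
        pv -f "$1"
    else
        cat "$1"
    fi
}

# Intended use (not run here; table and file names are placeholders):
#   stream_file hits.native.zst | zstd -dc \
#       | clickhouse-client --query "INSERT INTO hits FORMAT Native"
```

Placing pv before zstd means progress is measured against the compressed file size, which is the size known up front.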

Fetch the actual git commit date for every from-source version (the bare tags 53973..54011 had bogus 2016-01-01 fallbacks from an earlier rate-limited fetch) and record it in versions.txt; list-versions.sh now reports that commit date for built versions instead of the version_date.tsv release date. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Print 'table <TAB> bytes' from system.parts (database 'default'), falling back for old versions without it to du -sLb on the data dir (/var/lib/clickhouse or /opt/clickhouse), following symlinks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Cold-cache query reads and ingest are disk-bound, so use gp3 (default 1000 MB/s / 16000 IOPS, both overridable via throughput=/iops=) instead of gp2 whose throughput is tied to size. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Build the list of dataset files and fetch them concurrently (xargs -P 8 wget --continue --progress=dot:giga) instead of one at a time. Missing files (e.g. ssb/taxi not yet uploaded) fail their own wget and are skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Use aria2c to split each file into parallel byte-range segments (-x16 -s16) and run several files at once (-j4), so the huge taxi file isn't one slow stream and small files don't wait behind it. Falls back to parallel wget if aria2c is absent; aria2 added to the cloud-init install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

aria2 -j (multiple files at once) cross-contaminated per-file sizes (it used hits's length for ssb's byte ranges), so only hits downloaded and ssb/mgbench were aborted -> skipped at load time. Run one aria2c per file via xargs -P instead: each still uses 16 parallel segments, files download concurrently, and --allow-overwrite re-fetches a non-resumable pre-existing file rather than 416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
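One aria2c process per file, driven by xargs, might be sketched as below. The function name `fetch_all`, the `PARALLEL_FILES` and `DRY_RUN` variables, and the default of 4 concurrent files are all assumptions; only the `-x16 -s16` segments and `--allow-overwrite` come from the commit message.

```shell
# fetch_all: read one URL per line on stdin and start a separate aria2c
# per URL, so per-file state is never shared between downloads.
# PARALLEL_FILES (assumed default 4) caps concurrent files; a non-empty
# DRY_RUN prints the commands instead of running them.
fetch_all() {
    # ${DRY_RUN:+echo} is intentionally unquoted: it expands to nothing
    # when DRY_RUN is unset or empty.
    xargs -P "${PARALLEL_FILES:-4}" -n 1 \
        ${DRY_RUN:+echo} aria2c -x16 -s16 --allow-overwrite=true
}
```

Each aria2c still splits its own file into 16 byte-range segments, while xargs supplies the file-level concurrency that `-j` was meant to provide.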

Expand the Versions Benchmark to 9 datasets / 344 queries, each loaded into its own database so same-named tables never collide (e.g. TPC-H and TPC-DS both have `customer`):

- TPC-H (SF40) and TPC-DS (SF32): official schemas/queries from the ClickHouse repo, Decimal->Float64, NULL->type defaults, synth_date for the legacy engine.
- Coffee Shop (fact_sales_500m, minus the unused order_line_id column) from the published Iceberg tables.
- ontime (12 used columns) and UK price-paid, from the docs' saved copies.
- Join Order Benchmark (21 IMDB tables, 113 queries) with a CSV re-encoder.
- Narrow taxi to the 5 columns its queries use.

Also: per-dataset databases in run-version.sh/create.sh; default 6 tries (1 cold + 5 hot); dataset-qualified column schemas.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… timeout

- run-version.sh builds clickhouse-built:* images on demand (ensure_built_image) when absent, using the recipe from versions.txt (build.sh) or monthly.tsv (Dockerfile.reconstruct); nothing is pulled from a registry.
- Load all datasets in parallel (one background job per dataset / database).
- Per-query timeout (QUERY_TIMEOUT, default 100s): a query that exceeds it, or crashes the server, records null and skips its remaining tries (the server is revived after a crash so later queries still run).
- ontime: sort the dump (Year, Month, FlightDate, ...) so INSERT blocks are date-contiguous and don't exceed max_partitions_per_insert_block (~450 months).
- Reconstruct the build system for pre-2016-03 snapshots that lack one: Dockerfile.reconstruct + reconstruct.sh transplant the 2016-03 donor's build system + contrib, glob renamed sources, stub QuickLZ/MongoDB, generate re2_st, add an isnan shim; build-monthly.sh sweeps monthly.tsv. Strip the never-public add_subdirectory(private) from the 2016-06..08 tags in Dockerfile.ubuntu1604.
- cloud-init: install docker-buildx; drop the now-redundant explicit build step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
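The per-query timeout behaviour can be approximated with coreutils `timeout`. The function names `time_query` and `run_tries` are hypothetical, and padding the skipped tries with `null` is an assumption about the result layout; the sketch also does not include the server revival step.

```shell
QUERY_TIMEOUT=${QUERY_TIMEOUT:-100}

# time_query CMD [ARG...]: print the elapsed seconds, or "null" when CMD
# fails or exceeds QUERY_TIMEOUT. CMD must be an external program,
# because timeout cannot run shell functions.
time_query() {
    start=$(date +%s.%N)
    if timeout "$QUERY_TIMEOUT" "$@" >/dev/null 2>&1; then
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
    else
        echo null
    fi
}

# run_tries TRIES CMD [ARG...]: print "[t1, t2, ...]". After the first
# null, the remaining tries are not executed and are recorded as null.
run_tries() {
    tries=$1
    shift
    out=""
    failed=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$failed" -eq 1 ]; then
            t=null
        else
            t=$(time_query "$@")
        fi
        if [ "$t" = null ]; then
            failed=1
        fi
        out="${out}${out:+, }$t"
        i=$((i + 1))
    done
    echo "[$out]"
}
```

Skipping the remaining tries bounds the cost of a pathological query to one timeout per query rather than one per try. `date +%s.%N` relies on GNU date.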

…2 builds

- run-benchmark.sh: default volume 500 -> 1000 GB; parallel loading of all datasets peaks disk usage above the on-disk total (NOT_ENOUGH_SPACE otherwise).
- reconstruct.sh: build the pre-2016-02 era with the old libstdc++ ABI (_GLIBCXX_USE_CXX11_ABI=0); that era used the refcounted (COW) std::string, which the struct sizing assumes (Field's DBMS_TOTAL_FIELD_SIZE=32).

Also: generic prune of donor-listed sources absent in an older target (encoding-safe), disable utils/, strip add_subdirectory(private), vendor Poco/Ext/ScopedTry.h.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

An aborted/incomplete INSERT (crash, OOM, disk full, interrupted stream) can leave a partially-loaded table. Drop it so the dataset's queries report null instead of timing against incomplete data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add two subobjects to results/<version>.json:

- "load_time": {dataset: sum of its tables' load times in seconds}, accumulated per table during the (parallel, possibly separate) load phase into a stats file and summed per dataset at bench time.
- "data_size": {dataset: on-disk bytes}, the per-database sum(bytes_on_disk) from system.parts (each dataset is its own database), with a data-directory du fallback for old versions lacking that column.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
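Summing per-table load times into the "load_time" object could be done with a short awk pass. The function name `load_time_json` and the stats-file layout (`dataset<TAB>table<TAB>seconds`, one line per loaded table) are assumptions; the PR does not describe the file format.

```shell
# load_time_json STATS_FILE: print a JSON object mapping each dataset to
# the sum of its tables' load times, in first-seen order.
# Assumed line layout: dataset<TAB>table<TAB>seconds.
load_time_json() {
    awk -F '\t' '
        {
            if (!($1 in sum)) order[++n] = $1
            sum[$1] += $3
        }
        END {
            printf "{"
            for (i = 1; i <= n; i++)
                printf "%s\"%s\": %s", (i > 1 ? ", " : ""), order[i], sum[order[i]]
            print "}"
        }' "$1"
}
```

Appending one line per table lets parallel load jobs write to the same stats file without coordinating totals, as long as each line is written in a single append.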

2016-01 and older predate a big refactor and were built on trusty/gcc-5 with the old libstdc++ ABI (refcounted std::string). reconstruct.sh + Dockerfile.reconstruct now build them end-to-end (verified: 2016-01 builds server+client and boots, SELECT version() -> 0.0.53400):

- Base is now ubuntu:14.04 so gcc-5 defaults to the old ABI and the system boost is old-ABI too (16.04's new-ABI boost broke the client's boost::program_options).
- Vendor the never-public Yandex libs from the donor: statdaemons embedded dictionaries (via DB/Dictionaries/Embedded) and the daemon base (a thin BaseDaemon compat carrying the used API + --config-file handling, avoiding the donor's newer zkutil/graphite deps); stub statdaemons/Interests.h.
- Glob the whole dbms library (excluding the Server/Client executables + ODBC driver) so renamed/moved sources compile regardless of the donor's file lists.
- Force-include <numeric>/<random> (not transitively available on this toolchain); build re2 then the server and client with single-target makes (avoid a recursive-make race on shared static libs); os.walk for Python 3.4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 26 commits June 29, 2026 15:29

versions: default the benchmark VM to c7a.4xlarge

3f2121d

versions: download data with wget --continue --progress=dot:giga

b9ba0e5

versions: drop prepare-ebs-snapshot.sh

e5b4637

alexey-milovidov wants to merge 26 commits into main from versions-benchmark-rework
