Versions benchmark rework #968

Open
alexey-milovidov wants to merge 26 commits into main from versions-benchmark-rework

Conversation

@alexey-milovidov

Member

No description provided.

alexey-milovidov and others added 26 commits June 29, 2026 15:29
Rebuild the ClickHouse Versions Benchmark infrastructure from scratch
around Docker images so every historical and current version can be run
identically, replacing the old apt-based scripts.

- list-versions.sh: select versions from the authoritative version_date.tsv
  (all 1.1.x + latest patch per YY.MM, 151 versions), resolve each to a
  yandex/clickhouse image, package, or unavailable (image-aware, handles
  3- vs 4-component tag mismatches).
- prepare-data/: build canonical Native data files for hits, SSB (SF100),
  mgbench (logs1/2/3) and NYC taxi using only oldest-compatible types
  (Nullable kept only where the queries need IS NULL); stored zstd-6 and
  streamed via `zstd -dc | clickhouse-client` at load time.
- create/: per-version DDL (legacy MergeTree(date,(key),8192) for the
  earliest 1.1.x, modern PARTITION BY/ORDER BY otherwise) with column
  schemas under create/schema/.
- run-version.sh / run-all.sh: provider abstraction (image + package-in-
  ubuntu fallback), IPv4 listen override and a matching-version sidecar
  client image to repair the oldest server images (back to 1.1.54019),
  plain INSERT ... FORMAT Native loading, 75-query set timed per dataset.

Validated full-scale on 1.1.54019 (oldest) and 1.1.54378; data files are
gitignored.
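
The selection rule described for list-versions.sh (all 1.1.x plus the latest patch per YY.MM) can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: `select_versions` is a hypothetical name, it reads bare version strings on stdin rather than parsing version_date.tsv, and it relies on `sort -V` from GNU coreutils.

```sh
# Hypothetical sketch: keep every 1.1.x version and only the newest patch
# of each other major.minor (YY.MM) series. Input: one version per line.
select_versions() {
    sort -V | awk -F. '
        $1 == 1 && $2 == 1 { print; next }        # every 1.1.x is kept
        {
            key = $1 "." $2                       # e.g. "24.3"
            if (!(key in latest)) order[++n] = key
            latest[key] = $0                      # input is sorted, so the last one seen is the newest
        }
        END { for (i = 1; i <= n; i++) print latest[order[i]] }'
}
```

Feeding it `1.1.54019`, `1.1.54378`, `24.3.1.5` and `24.3.2.23` would keep both 1.1.x entries and only `24.3.2.23` from the 24.3 series.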

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add build-from-source/ to compile and run the ClickHouse versions that were
never published as a Docker image or package — the bare-number early tags
53973..54011 and the 1.1.x releases 54165/54318/54335/54336/54358/54362/54370.

- Dockerfile.ubuntu1604: build a tag in its contemporary environment
  (Ubuntu 16.04) into a runnable clickhouse-built:<v> image. Handles the era
  quirks: compiler escalates with date (gcc-5 -> 6 -> 7, the later two from the
  ubuntu-toolchain-r PPA via ARG GCC), strip the hardcoded -Werror, tolerant
  submodule init (contrib/zookeeper's upstream is gone -> cmake falls back to
  system libzookeeper-mt-dev), IPv4 listen, a clickhouse multi-call shim and the
  pre-created data dirs the 2016 server needs.
- build.sh / build-all.sh: build one or many (JOBS concurrent — a single
  make -j$(nproc) doesn't saturate the cores on these small codebases).
- versions.txt: the build list with tag, date and required GCC per version.
- list-versions.sh: route these versions to their clickhouse-built:<v> image and
  order all 189 versions chronologically; nothing is "unavailable" anymore.
- run-all.sh: load PARALLEL versions concurrently, then benchmark sequentially.
- run-version.sh: LOAD_DATASETS lets a run skip a dataset's load (e.g. the huge
  taxi table) while its queries still run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
create.sh parsed bare build-number versions (e.g. "53982", the pre-1.1
early-release tags) as major=53982 >= 18 and emitted modern PARTITION BY /
ORDER BY syntax, which the 2016 servers reject — so every table create failed
and those versions produced all-null results. Treat a bare numeric version as
an early build (custom partitioning landed at build 54310; all bare tags
predate it), so they correctly get the legacy MergeTree(date,(key),8192) engine.
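
A minimal sketch of the classification described above, under stated assumptions: `engine_syntax` is a hypothetical name, and treating 1.1.x builds below 54310 as legacy is inferred from the build number mentioned in this message rather than copied from create.sh.

```sh
# Hypothetical sketch of the version check; not the code from create.sh.
# Prints "legacy" for MergeTree(date,(key),8192) or "modern" for
# PARTITION BY / ORDER BY.
engine_syntax() {
    version=$1
    case $version in
        *[!0-9.]*|'') echo "unknown"; return 1 ;;
        *.*) ;;                        # dotted version, examined below
        *) echo "legacy"; return 0 ;;  # bare build number: predates build 54310
    esac
    major=${version%%.*}
    build=${version##*.}
    if [ "$major" -eq 1 ] && [ "$build" -lt 54310 ]; then
        echo "legacy"                  # assumed: 1.1.x before custom partitioning
    else
        echo "modern"
    fi
}
```

The important branch is the bare-number case: it returns before any major-version comparison, so a tag such as 53982 can no longer be mistaken for a very large major version.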

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror the main ClickBench cloud flow for the Versions Benchmark: benchmark one
version per fresh VM and send the result to the sink.

- cloud-init.sh.in: install Docker, download the prepared Native files from
  s3://clickhouse-public-datasets/versions-benchmark/, build the image from
  source when the version has none (clickhouse-built:* via build-from-source),
  run run-version.sh, POST the result JSON (enriched with machine + kind) and
  the log to sink.data on play.clickhouse.com, then terminate.
- run-benchmark.sh: resolve a version's image/tag/gcc and launch a VM
  (terminate-on-shutdown, capacity-retry), as the main launcher does.
- run-all-benchmarks.sh: one VM per runnable version.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Match the main ClickBench download style (resumable, giga-scale progress).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
If clickhouse dies during a query (e.g. OOM-killed), the container exits but its
data layer survives. Detect the dead server (SELECT 1 fails), revive it with
docker start (relaunch the daemon for the package provider), and retry the query
up to CRASH_RETRIES times (default 2). This keeps one heavy query from nulling out
every subsequent query for that version.
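
The retry logic can be sketched as below. This is an illustration only: `run_with_crash_retry`, `run_query`, `server_alive` and `revive_server` are hypothetical names, and the last three are stand-ins the caller would define (for example wrapping the client call, a SELECT 1 probe and docker start).

```sh
# Hypothetical sketch of the crash-retry loop; not the code from run-version.sh.
# run_query, server_alive and revive_server must be defined by the caller.
CRASH_RETRIES=${CRASH_RETRIES:-2}

run_with_crash_retry() {
    attempt=0
    while :; do
        if run_query "$@"; then
            return 0
        fi
        if server_alive; then
            return 1                   # the query failed but the server is up: no retry
        fi
        revive_server                  # bring the server back before anything else runs
        attempt=$((attempt + 1))
        [ "$attempt" -gt "$CRASH_RETRIES" ] && return 1
    done
}
```

In this sketch the server is revived even on the final failed attempt, so the next query starts against a live server.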

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A version may be given as a prefix: e.g. '26.6' resolves to '26.6.1.1193' (one
patch per YY.MM is kept), and the launcher canonicalises it to the full version.
Exact versions and bare tags still match directly; an ambiguous prefix lists the
candidates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A prefix now picks the newest matching version instead of erroring on ambiguity:
24 -> 24.12.x, 1.1 -> the latest 1.1.x, 26.6 -> 26.6.1.1193. Exact versions and
bare tags still match directly.
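
One way to express "newest matching version" is sketched below. `resolve_version` is a hypothetical name, the candidate list is read from stdin, and `sort -V` (GNU coreutils) is assumed to be available; the launcher in this PR may do it differently.

```sh
# Hypothetical sketch: pick the newest version matching a prefix.
# Reads candidate versions on stdin, one per line.
resolve_version() {
    prefix=$1
    # Match the exact string, or the prefix followed by a dot, so that
    # "24" selects 24.x.y.z but never 240.x or 2.4.x.
    awk -v p="$prefix" '$0 == p || index($0, p ".") == 1' | sort -V | tail -n 1
}
```

An exact version or a bare tag matches only itself, so it resolves to itself.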

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builds an EBS volume holding the prepared Native files (sized just-enough),
snapshots it (labelled versions-data, tagged Name=clickbench-versions-data) and
deletes the working volume. Standalone and not wired into the launcher: a
snapshot-backed volume lazy-loads from S3, so for one-shot VMs it is not faster
than the plain S3 download unless Fast Snapshot Restore or volume reuse is used.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain
S3 download for one-shot VMs; the snapshot approach is not used.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…loop stdin

- load_data now echoes each CREATE (with DDL), the INSERT ... FORMAT Native and
  source file, and "loaded <table>: N rows in Ns" — so the cloud-init log shows
  what's happening during ingest.
- Client invocations set HOME=/tmp (old images' clickhouse user has HOME
  /nonexistent -> history-file error) and TZ=UTC, and the sidecar client mounts
  the host /usr/share/zoneinfo (some old client images ship no tzdata and fail at
  startup with "Could not determine local time zone").
- Fix a stdin-drain regression from the crash-retry: the per-query `docker
  exec/run -i` client (and the SELECT 1 liveness probe) consumed the query
  file the benchmark loop reads on stdin, truncating each version to ~60/75
  queries. Read queries on FD 3 and give the probe </dev/null.
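
The stdin-drain fix follows a common shell pattern, sketched here with stand-ins. `client` and `probe` are hypothetical placeholders for the per-query `docker exec/run -i` client and the SELECT 1 liveness probe; both read stdin, which is how the query file used to be swallowed.

```sh
# Hypothetical stand-ins: each one consumes whatever is on its stdin.
client() { cat >/dev/null; }
probe()  { cat >/dev/null; }

run_queries() {
    # The query file is opened on FD 3, so nothing inside the loop body
    # can drain it by reading stdin.
    while IFS= read -r query <&3; do
        probe </dev/null                    # the probe gets an empty stdin
        printf '%s\n' "$query" | client     # the client gets only this query
        echo "ran: $query"
    done 3<"$1"
}
```

With the loop reading stdin directly instead (`done <"$1"`), the first stdin-reading command in the body would consume the rest of the file and the loop would end early.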

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Echo 'qN [dataset]: [t1, t2, t3]' to the log as each query finishes, and cat
results/<version>.json at the end so the full result is visible in the run
output / cloud-init log (and thus received via the sink), not just written to
a file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pipe the compressed file through pv before zstd -> INSERT, so loads report a
periodic progress bar (percentage, rate, ETA) based on the known file size.
Falls back to cat when pv is absent; pv added to the cloud-init apt install.
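
A minimal sketch of the fallback, assuming a helper named `progress_cat` (hypothetical). The `-f` flag makes pv print progress even when stderr is not a terminal, as in a cloud-init log; the 10-second interval is an arbitrary choice for illustration.

```sh
# Hypothetical sketch: write a file to stdout, through pv when it is installed.
progress_cat() {
    if command -v pv >/dev/null 2>&1; then
        pv -f -i 10 "$1"      # progress goes to stderr, data to stdout
    else
        cat "$1"
    fi
}

# Intended use (table and file names are placeholders):
#   progress_cat hits.native.zst | zstd -dc |
#       clickhouse-client --query "INSERT INTO hits FORMAT Native"
```

Because pv is given the file as an argument, it knows the total size and can report a percentage and ETA.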

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fetch the actual git commit date for every from-source version (the bare tags
53973..54011 had bogus 2016-01-01 fallbacks from an earlier rate-limited fetch)
and record it in versions.txt; list-versions.sh now reports that commit date for
built versions instead of the version_date.tsv release date.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Print 'table <TAB> bytes' from system.parts (database 'default'), falling back
for old versions without it to du -sLb on the data dir (/var/lib/clickhouse or
/opt/clickhouse), following symlinks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cold-cache query reads and ingest are disk-bound, so use gp3 (provisioned by
default at 1000 MB/s / 16000 IOPS, both overridable via throughput=/iops=)
instead of gp2, whose throughput is tied to volume size.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Build the list of dataset files and fetch them concurrently (xargs -P 8 wget
--continue --progress=dot:giga) instead of one at a time. Missing files (e.g.
ssb/taxi not yet uploaded) fail their own wget and are skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Use aria2c to split each file into parallel byte-range segments (-x16 -s16) and
run several files at once (-j4), so the huge taxi file isn't one slow stream and
small files don't wait behind it. Falls back to parallel wget if aria2c is
absent; aria2 added to the cloud-init install.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aria2 -j (multiple files at once) cross-contaminated per-file sizes (it used
hits's length for ssb's byte ranges), so only hits downloaded and ssb/mgbench
were aborted -> skipped at load time. Run one aria2c per file via xargs -P
instead: each still uses 16 parallel segments, files download concurrently, and
--allow-overwrite re-fetches a non-resumable pre-existing file rather than
failing with HTTP 416.
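
The one-process-per-file arrangement might look like the sketch below. `download_all`, `DOWNLOADER` and `FILES_AT_ONCE` are hypothetical names, and the use of `--continue=true` alongside `--allow-overwrite=true` is an assumption about how resuming is combined with overwriting, not a copy of the script.

```sh
# Hypothetical sketch: one downloader process per file, several files at once.
DOWNLOADER=${DOWNLOADER:-aria2c}
FILES_AT_ONCE=${FILES_AT_ONCE:-4}

# Reads URLs on stdin, one per line. Each invocation handles a single file
# with 16 byte-range segments, so sizes cannot leak between files.
download_all() {
    xargs -P "$FILES_AT_ONCE" -n 1 \
        "$DOWNLOADER" -x16 -s16 --continue=true --allow-overwrite=true
}
```

Setting `DOWNLOADER=echo` turns this into a dry run that prints the arguments each process would receive. A file that fails to download only fails its own process; xargs carries on with the rest.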

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expand the Versions Benchmark to 9 datasets / 344 queries, each loaded into
its own database so same-named tables never collide (e.g. TPC-H and TPC-DS
both have `customer`):

- TPC-H (SF40) and TPC-DS (SF32): official schemas/queries from the ClickHouse
  repo, Decimal->Float64, NULL->type defaults, synth_date for the legacy engine.
- Coffee Shop (fact_sales_500m, minus the unused order_line_id column) from the
  published Iceberg tables.
- ontime (12 used columns) and UK price-paid, from the docs' saved copies.
- Join Order Benchmark (21 IMDB tables, 113 queries) with a CSV re-encoder.
- Narrow taxi to the 5 columns its queries use.

Also: per-dataset databases in run-version.sh/create.sh; default 6 tries
(1 cold + 5 hot); dataset-qualified column schemas.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… timeout

- run-version.sh builds clickhouse-built:* images on demand (ensure_built_image)
  when absent, using the recipe from versions.txt (build.sh) or monthly.tsv
  (Dockerfile.reconstruct) — nothing is pulled from a registry.
- Load all datasets in parallel (one background job per dataset / database).
- Per-query timeout (QUERY_TIMEOUT, default 100s): a query that exceeds it, or
  crashes the server, records null and skips its remaining tries (the server is
  revived after a crash so later queries still run).
- ontime: sort the dump (Year, Month, FlightDate, ...) so INSERT blocks are
  date-contiguous and don't exceed max_partitions_per_insert_block (~450 months).
- Reconstruct the build system for pre-2016-03 snapshots that lack one:
  Dockerfile.reconstruct + reconstruct.sh transplant the 2016-03 donor's build
  system + contrib, glob renamed sources, stub QuickLZ/MongoDB, generate re2_st,
  add an isnan shim; build-monthly.sh sweeps monthly.tsv. Strip the never-public
  add_subdirectory(private) from the 2016-06..08 tags in Dockerfile.ubuntu1604.
- cloud-init: install docker-buildx; drop the now-redundant explicit build step.
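
The per-query timeout can be illustrated with the coreutils `timeout` command, as in the sketch below. `time_query` is a hypothetical name, GNU date (`%N`) is assumed, and keeping the timings of tries that succeeded before a failure is an assumption; the actual script may null the whole query instead.

```sh
# Hypothetical sketch of per-query timing with a timeout.
QUERY_TIMEOUT=${QUERY_TIMEOUT:-100}

# usage: time_query TRIES command [args...]
# Prints a JSON-style array of seconds; once a try fails or times out,
# that try and all remaining ones are recorded as null without running.
time_query() {
    tries=$1; shift
    result=""; failed=0; i=1
    while [ "$i" -le "$tries" ]; do
        t=null
        if [ "$failed" -eq 0 ]; then
            start=$(date +%s.%N)
            if timeout "$QUERY_TIMEOUT" "$@" >/dev/null 2>&1; then
                end=$(date +%s.%N)
                t=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
            else
                failed=1
            fi
        fi
        result="${result:+$result, }$t"
        i=$((i + 1))
    done
    printf '[%s]\n' "$result"
}
```

Note that `timeout` runs an executable, not a shell function, so the command passed in would be the client invocation itself.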

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…2 builds

- run-benchmark.sh: default volume 500 -> 1000 GB; parallel loading of all
  datasets peaks disk usage above the on-disk total (NOT_ENOUGH_SPACE otherwise).
- reconstruct.sh: build the pre-2016-02 era with the old libstdc++ ABI
  (_GLIBCXX_USE_CXX11_ABI=0) — that era used the refcounted (COW) std::string,
  which the struct sizing assumes (Field's DBMS_TOTAL_FIELD_SIZE=32). Also:
  generic prune of donor-listed sources absent in an older target (encoding-safe),
  disable utils/, strip add_subdirectory(private), vendor Poco/Ext/ScopedTry.h.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
An aborted/incomplete INSERT (crash, OOM, disk full, interrupted stream) can
leave a partially-loaded table. Drop it so the dataset's queries report null
instead of timing against incomplete data.
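
A minimal sketch of the drop-on-failure step, with hypothetical names throughout: `load_or_drop` is illustrative, and `load_table` and `drop_table` are stand-ins the caller defines (for example the `zstd -dc | clickhouse-client` pipeline and a DROP TABLE IF EXISTS query).

```sh
# Hypothetical sketch: drop a table whose load did not finish.
# load_table and drop_table must be defined by the caller.
load_or_drop() {
    table=$1
    if load_table "$table"; then
        echo "loaded $table"
    else
        echo "load of $table failed, dropping it" >&2
        drop_table "$table"
        return 1
    fi
}
```

In a plain sh pipeline the exit status is that of the last command, so a truncated input stream is only noticed if the client itself fails; bash's `set -o pipefail` is one way to surface an upstream failure as well.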

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add two subobjects to results/<version>.json:
- "load_time": {dataset: sum of its tables' load times in seconds} — accumulated
  per table during the (parallel, possibly separate) load phase into a stats
  file and summed per dataset at bench time.
- "data_size": {dataset: on-disk bytes} — per-database sum(bytes_on_disk) from
  system.parts (each dataset is its own database), with a data-directory du
  fallback for old versions lacking that column.
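
The per-dataset summation for "load_time" could be done along these lines. The stats-file layout (dataset, table and seconds separated by tabs) and the name `load_time_json` are assumptions made for this sketch; the PR does not show the real format.

```sh
# Hypothetical sketch: sum per-table load times into a per-dataset JSON object.
# Assumed stats-file format: dataset<TAB>table<TAB>seconds, one table per line.
load_time_json() {
    awk -F '\t' '
        { if (!($1 in sum)) order[++n] = $1; sum[$1] += $3 }
        END {
            printf "{"
            for (i = 1; i <= n; i++)
                printf "%s\"%s\": %.3f", (i > 1 ? ", " : ""), order[i], sum[order[i]]
            print "}"
        }' "$1"
}
```

Datasets appear in the order they are first seen in the file, and an empty file yields `{}`.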

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2016-01 and older predate a big refactor and were built on trusty/gcc-5 with the
old libstdc++ ABI (refcounted std::string). reconstruct.sh + Dockerfile.reconstruct
now build them end-to-end (verified: 2016-01 builds server+client and boots,
SELECT version() -> 0.0.53400):

- Base is now ubuntu:14.04 so gcc-5 defaults to the old ABI and the system boost
  is old-ABI too (16.04's new-ABI boost broke the client's boost::program_options).
- Vendor the never-public Yandex libs from the donor: statdaemons embedded
  dictionaries (via DB/Dictionaries/Embedded) and the daemon base (a thin
  BaseDaemon compat carrying the used API + --config-file handling, avoiding the
  donor's newer zkutil/graphite deps); stub statdaemons/Interests.h.
- Glob the whole dbms library (excluding the Server/Client executables + ODBC
  driver) so renamed/moved sources compile regardless of the donor's file lists.
- Force-include <numeric>/<random> (not transitively available on this toolchain);
  build re2 then the server and client with single-target makes (avoid a
  recursive-make race on shared static libs); os.walk for Python 3.4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>