Add array subset-check benchmark across Ruby 3.3, 3.4, 4.0 #237
Draft
etagwerker wants to merge 1 commit into
Revisits the comparison from the closed #125 (gabteles), fixing the reversed Set#subset? arguments so every approach returns the same result (guarded by an equivalence check), and benchmarks across modern Ruby versions.

Findings:
- (a1 - a2).empty? is the consistent winner for true-subset inputs.
- a1.all? { include? } only wins when a1 is NOT a subset (short-circuits) and is O(n*m) on large true subsets.
- Set#subset? (incl. to_set) went from ~6.8x slower on 3.3/3.4 to ~1.7x slower on 4.0, where Set got much faster.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Adds an Array benchmark for subset checks (is every element of a1 also in a2?), comparing five approaches across Ruby 3.3.10, 3.4.7, and 4.0.0:
- (a1 - a2).empty?
- (a1 & a2) == a1
- (a1 & a2).size == a1.size
- a1.all? { |e| a2.include?(e) }
- a1.to_set.subset?(a2.to_set)

Background
This revisits the comparison from #125 by @gabteles (now closed). That PR had two problems the reviewers (@mblumtritt, @Arcovion) hinted at back in 2017:
- Set#subset? had its arguments reversed (a2.to_set.subset?(a1.to_set)), so it returned false while every other method returned true. It wasn't measuring the same operation.

This version fixes the Set arguments, adds an equivalence guard so all five approaches must agree before the benchmark runs, and reports results across three modern Ruby versions.
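As a rough illustration of the equivalence guard described above, here is a minimal sketch. The labels, the helper name guard_equivalence!, and the sample arrays are assumptions made for this example and are not taken from the PR's benchmark file.

```ruby
require "set"

# Hypothetical sample inputs; the PR's actual benchmark data is not shown here.
# a1 has no duplicate elements: Array#& removes duplicates, so the two
# intersection-based checks would disagree with the others if a1 had repeats.
a1 = [1, 2, 3]
a2 = [1, 2, 3, 4, 5]

# The five approaches from the description, keyed by illustrative labels.
APPROACHES = {
  "difference"   => ->(x, y) { (x - y).empty? },
  "intersection" => ->(x, y) { (x & y) == x },
  "size"         => ->(x, y) { (x & y).size == x.size },
  "all_include"  => ->(x, y) { x.all? { |e| y.include?(e) } },
  "set_subset"   => ->(x, y) { x.to_set.subset?(y.to_set) }
}.freeze

# Returns the shared result, or raises if any approach disagrees.
def guard_equivalence!(approaches, x, y)
  results = approaches.transform_values { |fn| fn.call(x, y) }
  unless results.values.uniq.size == 1
    raise "approaches disagree: #{results.inspect}"
  end
  results.values.first
end

puts guard_equivalence!(APPROACHES, a1, a2)  # prints "true"

# The reversed call from #125 asks whether a2 is a subset of a1 instead.
puts a2.to_set.subset?(a1.to_set)            # prints "false"
```

Running a guard like this before timing anything means a reversed argument fails loudly instead of quietly benchmarking a different operation.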
Findings
- (a1 - a2).empty? is the consistent winner across 3.3, 3.4, and 4.0 for the common case where a1 really is a subset.
- a1.all? { include? } is data-dependent: it short-circuits on the first miss (so it wins when a1 is not a subset), but it's O(n*m) and degrades badly on large true subsets. The README entry documents this caveat.
- Set#subset? improved dramatically in Ruby 4.0: ~6.8x slower on 3.3/3.4 (dominated by to_set allocation) but only ~1.7x slower on 4.0. If you already hold Sets or check repeatedly, it scales best.

Notes
- rbenv on each version (benchmark-ips installed per version).
- Comparison: summary for 3.4.7 and 4.0.0 to keep it readable.

🤖 Generated with Claude Code
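To make the data-dependence finding concrete, the sketch below counts how many elements a1.all? { |e| a2.include?(e) } examines before it returns. The helper name elements_checked and the input sizes are assumptions chosen for illustration. It counts work rather than measuring time, so it does not reproduce the PR's numbers.

```ruby
require "set"

# Counts how many elements of a1 are examined by a1.all? { |e| a2.include?(e) }.
def elements_checked(a1, a2)
  checked = 0
  a1.all? do |e|
    checked += 1
    a2.include?(e)
  end
  checked
end

a2 = (1..1_000).to_a

# Not a subset: the first element is missing from a2, so all? stops at once.
miss_first = [0] + (1..499).to_a
puts elements_checked(miss_first, a2)    # prints 1

# True subset: every element is examined, each with a linear scan of a2.
true_subset = (1..500).to_a
puts elements_checked(true_subset, a2)   # prints 500

# When checking repeatedly, a2 can be converted to a Set once and reused.
a2_set = a2.to_set
puts true_subset.to_set.subset?(a2_set)  # prints true
puts miss_first.to_set.subset?(a2_set)   # prints false
```

The best case for all? depends on where the first miss falls in a1. A miss near the end of a1 costs almost as much as a true subset.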