Bug report
Bug description
The binary profile reader in _remote_debugging (used by python -m profiling.sampling to replay .pyb files) reads the string-table and
frame-table entry counts from the file footer and allocates arrays of that size
before validating the counts against the amount of data actually present in
the file:
- reader_parse_string_table does PyMem_Calloc(strings_count, sizeof(PyObject *))
- reader_parse_frame_table does PyMem_Malloc(frames_count * sizeof(FrameEntry))
strings_count / frames_count are 32-bit values taken from the footer, so a
tiny file can declare up to 2**32 - 1 entries and make the reader attempt a
multi-gigabyte allocation (up to ~120 GB for the frame table). Opening the file
is enough; no valid sample data is required.
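The string-table numbers can be worked out directly. The sketch below computes how many bytes PyMem_Calloc(strings_count, sizeof(PyObject *)) would request for the largest count a 32-bit field can hold. It uses struct.calcsize("P") as a stand-in for sizeof(PyObject *). sizeof(FrameEntry) is not given in this report, so the sketch leaves out the frame table.

```python
import struct

# Size of one pointer (sizeof(PyObject *)) on this build; 8 on typical 64-bit platforms.
POINTER_SIZE = struct.calcsize("P")

# Largest value a 32-bit footer count can hold.
MAX_U32 = 2**32 - 1

def string_table_alloc_bytes(strings_count):
    """Bytes PyMem_Calloc(strings_count, sizeof(PyObject *)) would request."""
    return strings_count * POINTER_SIZE

worst_case = string_table_alloc_bytes(MAX_U32)
# On a 64-bit build this prints: 34,359,738,360 bytes (~32 GiB)
print(f"{worst_case:,} bytes (~{worst_case / 2**30:.0f} GiB)")
```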
This is reachable from the CLI: python -m profiling.sampling replay <file> and
--diff-flamegraph <baseline> both open a user-supplied .pyb through
_remote_debugging.BinaryReader, and the replay input check only validates
the magic number, not the footer.
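A footer check along these lines is one way to catch such a file before it reaches BinaryReader. This is a minimal sketch. It takes its footer layout from the reproducer below: the last 32 bytes, with strings_count as a little-endian u32 at offset 0 and frames_count at offset 4. The helper name read_footer_counts is hypothetical and is not part of profiling.sampling.

```python
import os
import struct

# Assumed layout (from the reproducer): the footer is the last 32 bytes,
# strings_count is a little-endian u32 at footer offset 0,
# frames_count is a little-endian u32 at footer offset 4.
FOOTER_SIZE = 32

def read_footer_counts(path):
    """Hypothetical helper: return (strings_count, frames_count, bytes_before_footer)."""
    size = os.path.getsize(path)
    if size < FOOTER_SIZE:
        raise ValueError(f"{path}: file too small to contain a footer")
    with open(path, "rb") as f:
        f.seek(size - FOOTER_SIZE)
        footer = f.read(FOOTER_SIZE)
    strings_count, frames_count = struct.unpack_from("<II", footer, 0)
    return strings_count, frames_count, size - FOOTER_SIZE
```

If a ~100-byte file reports a count in the billions, that mismatch is exactly what the reader never checks before allocating.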
Reproducer
```python
import _remote_debugging, struct, os, tempfile

fn = tempfile.mktemp(suffix=".pyb")
w = _remote_debugging.BinaryWriter(fn, 1000, 0, compression=0)
w.finalize()  # minimal valid file, no samples
size = os.path.getsize(fn)

# footer is the last 32 bytes; the frame count is a u32 at footer offset 4
with open(fn, "r+b") as f:
    f.seek(size - 32 + 4)
    f.write(struct.pack("<I", 0xFFFFFFFF))

_remote_debugging.BinaryReader(fn)  # attempts a ~120 GB allocation
```
On an unpatched build this drives the process to several gigabytes of RSS (and
can hang) from a ~100-byte file. The string table is affected the same way via
strings_count (footer offset 0).
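The two variants differ only in the footer offset that gets overwritten, so the patch step can take the offset as a parameter. In this sketch, patch_footer_count and the offset constants are hypothetical names. The offsets are the ones stated in this report.

```python
import os
import struct

FOOTER_SIZE = 32          # footer is the last 32 bytes (per the reproducer)
STRINGS_COUNT_OFFSET = 0  # footer offset of strings_count
FRAMES_COUNT_OFFSET = 4   # footer offset of frames_count

def patch_footer_count(path, footer_offset, value=0xFFFFFFFF):
    """Hypothetical helper: overwrite one little-endian u32 count in the footer."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.seek(size - FOOTER_SIZE + footer_offset)
        f.write(struct.pack("<I", value))
```

To build the string-table variant, call patch_footer_count(fn, STRINGS_COUNT_OFFSET) on the file written by BinaryWriter, then open it with _remote_debugging.BinaryReader(fn) as before.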
Expected behavior
The reader should reject a declared count that cannot be backed by the file's
bytes with a ValueError, the same way the RLE sample count is already bounded
in binary_reader_replay.
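The expected check can be sketched as a comparison between the declared count and the bytes available for that table, made before anything is allocated. The sketch is in Python for readability. check_declared_count and MIN_ENTRY_BYTES are hypothetical names. The real minimum encoded size of a string or frame entry depends on the .pyb format, which this report does not spell out.

```python
# Hypothetical lower bound on the on-disk size of one table entry; the real
# value depends on the .pyb encoding and is an assumption here.
MIN_ENTRY_BYTES = 1

def check_declared_count(name, count, available_bytes, min_entry_bytes=MIN_ENTRY_BYTES):
    """Raise ValueError if `count` entries cannot fit in `available_bytes`."""
    if count > available_bytes // min_entry_bytes:
        raise ValueError(
            f"{name} declares {count} entries but only {available_bytes} "
            f"bytes of table data are available"
        )
    return count
```

The check divides the available bytes rather than multiplying the count. That keeps the comparison free of overflow if the same logic is ported to the C reader, where count * entry_size could wrap.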
This is a sibling of gh-148252, which hardened the same reader against malformed
.pyb files; the eager string/frame table allocations were not covered there.
Linked PRs