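Linux pipes are implemented as a kernel ring buffer of 4 KiB pages. With the standard read/write syscalls, every byte is copied twice (from user memory into the pipe, then out of the pipe into user memory), and reader and writer contend for the pipe lock.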
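As a rough illustration of the double copy, the sketch below pushes a small buffer through a pipe with plain write() and read(). The helper name `pipe_roundtrip` is hypothetical and not taken from the article; the size is capped at one page so that write() cannot block on an empty pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send `len` bytes through a fresh pipe using the
 * standard syscalls. Returns 0 on success, -1 on error.
 *   Copy 1: write() copies from `src` into kernel-owned pipe pages.
 *   Copy 2: read() copies from those pages into `dst`. */
int pipe_roundtrip(const char *src, char *dst, size_t len)
{
    /* Stay within one 4 KiB page so the write cannot block. */
    assert(len > 0 && len <= 4096);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    ssize_t w = write(fds[1], src, len);
    ssize_t r = (w == (ssize_t)len) ? read(fds[0], dst, len) : -1;

    close(fds[0]);
    close(fds[1]);
    return (r == (ssize_t)len) ? 0 : -1;
}
```

Both copies happen inside the kernel while the pipe lock is held, which is why they show up under the pipe functions in a profile.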
Profiling with perf shows most time is spent in pipe_write, page copying, allocation, and synchronization.
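The splice and vmsplice syscalls enable zero-copy transfers: vmsplice inserts existing user pages directly into the pipe buffer, and splice moves data between a pipe and another file descriptor without copying it through user space.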
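A minimal sketch of the writer side, assuming a blocking pipe: the hypothetical helper `vmsplice_all` hands the pages backing a user buffer to the pipe instead of copying them. The name and the looping strategy are illustrative assumptions, not the article's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: vmsplice `len` bytes from `buf` into the write end
 * of a pipe, retrying on partial transfers. Returns the number of bytes
 * handed over, or -1 on error. */
ssize_t vmsplice_all(int pipe_wr, char *buf, size_t len)
{
    assert(buf != NULL);

    size_t done = 0;
    while (done < len) {
        struct iovec iov = {
            .iov_base = buf + done,
            .iov_len  = len - done,
        };
        /* The kernel references the user pages rather than copying them. */
        ssize_t n = vmsplice(pipe_wr, &iov, 1, 0);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Because the pipe now refers to the caller's own pages, the buffer must not be modified until the reader has consumed it. A reader can avoid its own copy by using splice to move the data on to another file descriptor.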
Allocating 2 MiB huge pages and advising the kernel with madvise reduces get_user_pages overhead and speeds up page mapping.
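One way to request huge-page backing is sketched below, assuming transparent huge pages are available on the system. The helper `alloc_huge_hint` and the `HUGE_PAGE_SIZE` constant are hypothetical names; MADV_HUGEPAGE is only a hint, so the kernel may still back the buffer with 4 KiB pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2u << 20) /* 2 MiB */

/* Hypothetical helper: allocate `len` bytes aligned to 2 MiB and ask the
 * kernel to back the range with transparent huge pages.
 * Returns NULL if the allocation fails; release the buffer with free(). */
void *alloc_huge_hint(size_t len)
{
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    assert(len > 0 && len % HUGE_PAGE_SIZE == 0);

    void *buf = aligned_alloc(HUGE_PAGE_SIZE, len);
    if (buf == NULL)
        return NULL;

    /* Advisory only: a failure here (for example, THP disabled) is ignored
     * and the buffer remains usable with regular pages. */
    (void)madvise(buf, len, MADV_HUGEPAGE);
    return buf;
}
```

With 2 MiB pages, one page covers what would otherwise take 512 separate 4 KiB pages, so the kernel has far fewer pages to look up when the buffer is handed to vmsplice.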
Using non-blocking splice/vmsplice in a busy loop avoids blocking and waking overhead at the cost of higher CPU usage.
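The busy-loop idea can be sketched on the reader side as follows. The helper `splice_spin` is a hypothetical name: it passes SPLICE_F_NONBLOCK so that an empty pipe yields EAGAIN instead of putting the thread to sleep, and it simply retries in that case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: move up to `len` bytes from the read end of a pipe
 * to `out_fd` without blocking. Spins while the pipe is empty, stops early
 * at end-of-file. Returns the number of bytes moved, or -1 on error. */
ssize_t splice_spin(int pipe_rd, int out_fd, size_t len)
{
    assert(len > 0);

    size_t done = 0;
    while (done < len) {
        ssize_t n = splice(pipe_rd, NULL, out_fd, NULL, len - done,
                           SPLICE_F_NONBLOCK);
        if (n < 0) {
            if (errno == EAGAIN)
                continue; /* pipe empty: burn CPU instead of sleeping */
            return -1;
        }
        if (n == 0)
            break; /* all writers closed and the pipe is drained */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

The loop keeps one core fully busy while it waits, which is the CPU cost the summary mentions; it only pays off when the other side is expected to produce data almost immediately.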
Combining these optimizations increases throughput from about 3.5 GiB/s to over 60 GiB/s, roughly a 17-fold improvement.
This improvement path illustrates profiling-driven optimization, zero-copy IO, paging concepts, and trade-offs in high-performance code.