LogHouse, ClickHouse’s internal observability platform, grew from 19 PiB to over 100 PB of uncompressed data and 500 trillion rows, requiring a more efficient ingestion strategy.
The OpenTelemetry pipeline became a performance and resource bottleneck at scale, incurring high CPU usage, added latency, and data loss during load spikes.
ClickHouse developed SysEx, a specialized system-tables exporter that streams native data byte-for-byte between ClickHouse instances, cutting CPU usage by over 90% and eliminating intermediate marshalling.
SysEx uses a pull-based model with sliding time windows to ensure complete delivery, versions schemas dynamically, and relies on ClickHouse’s Merge engine to unify evolving table schemas.
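A pull-based scrape over sliding time windows can be sketched roughly as follows. This is a minimal illustration and not SysEx's actual code: the function names, the in-memory row list standing in for a system table, and the `lag` parameter (a trailing delay so late-flushed rows are not missed) are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple

Row = Tuple[datetime, str]  # (event_time, payload); hypothetical row shape


def sliding_windows(
    checkpoint: datetime, now: datetime, width: timedelta, lag: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous half-open [lo, hi) windows from checkpoint up to now - lag."""
    horizon = now - lag  # stay behind real time (assumed, to let late rows land)
    lo = checkpoint
    while lo < horizon:
        hi = min(lo + width, horizon)
        yield lo, hi
        lo = hi


def pull(source: List[Row], lo: datetime, hi: datetime) -> List[Row]:
    """Stand-in for a remote query: rows whose event_time falls in [lo, hi)."""
    return [row for row in source if lo <= row[0] < hi]


def scrape(
    source: List[Row],
    checkpoint: datetime,
    now: datetime,
    width: timedelta,
    lag: timedelta,
) -> Tuple[List[Row], datetime]:
    """Pull every window once and return the rows plus the advanced checkpoint."""
    pulled: List[Row] = []
    for lo, hi in sliding_windows(checkpoint, now, width, lag):
        pulled.extend(pull(source, lo, hi))
        checkpoint = hi  # the next scrape resumes exactly where this one stopped
    return pulled, checkpoint
```

Because the windows are half-open and contiguous, resuming from the returned checkpoint neither skips nor re-reads a row, which is the property a pull model needs for complete delivery.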
The team also integrated HyperDX, a ClickHouse-native observability UI supporting Lucene-style queries and SQL for complex analysis, and shifted to storing wide, high-cardinality events without pre-aggregation.
Future enhancements include “zero-impact scraping” from S3 storage, further reducing live-system load while maintaining high fidelity.
OpenTelemetry remains in use for crash-loop scenarios and lower-volume logs, with reduced scope to optimize resource use.