Benchmarking Simplyblock Storage on ClickBench with PostgreSQL and DuckDB

Jul 24th, 2025 | 9 min read

As storage demands scale and workloads become increasingly performance-sensitive, the right infrastructure choices can drive massive gains in both throughput and efficiency. At simplyblock, we set out to validate the performance of our storage layer using the industry-standard ClickBench benchmark suite.

Thanks to the vast number of supported databases, ClickBench is a perfect tool for running database benchmarks and quickly adapting them to new environments. We tested two popular databases—PostgreSQL and DuckDB—backed by simplyblock’s high-performance software-defined storage.

Test Environment

One of the major factors when running benchmarks is the environment used. Many original ClickBench results utilize AWS virtual machines or proprietary hardware. Simplyblock, however, ran the tests on Google Compute Engine (GCE). This was a cost-efficiency choice and doesn’t reflect any preference for GCP over AWS on simplyblock’s side.

Original Infrastructure: ClickBench

  • PostgreSQL:
    • AWS instance type: c6a.4xlarge
    • 16 vCPUs (x86_64), 32 GB RAM
    • Amazon EBS storage volume
  • DuckDB:
    • AWS instance type: c6a.metal
    • 192 vCPUs (x86_64), 384 GB RAM
    • Amazon EBS storage volume

Simplyblock-Backed Infrastructure

PostgreSQL runs:

  • GCP Instance type: c3-standard-88
  • 88 vCPUs (x86_64), 352 GB RAM
  • 100 Gbit/s Tier 1 networking
  • 6 simplyblock logical volumes connected via NVMe over TCP, combined into LVM-striped XFS volumes for WAL and table data to enable maximum parallelism

⚠️ Note: Despite the larger instance size, vCPU and RAM usage were limited during the benchmark to match the original ClickBench spec (16 vCPUs, 32 GB RAM) to ensure a fair comparison. The larger instance was chosen solely to meet GCP’s higher network throughput requirements.
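The striped-volume layout described above spreads I/O across all six logical volumes. The following sketch (not simplyblock’s actual implementation; the 64 KiB stripe size is a hypothetical value) illustrates why a sequential read engages every volume in parallel:

```python
# Illustrative sketch of LVM-style striping across N backing volumes.
# Assumed values: 6 volumes (as in the setup above), 64 KiB stripe size
# (hypothetical -- the real chunk size depends on the lvcreate options used).

STRIPE_SIZE = 64 * 1024   # bytes per stripe chunk (assumed)
NUM_VOLUMES = 6           # matches the 6 simplyblock logical volumes

def volume_for_offset(offset: int) -> int:
    """Return the index of the backing volume serving a given byte offset."""
    chunk = offset // STRIPE_SIZE
    return chunk % NUM_VOLUMES

# A 1 MiB sequential read touches chunks on all six volumes in turn,
# so the devices can service the request in parallel.
touched = {volume_for_offset(off) for off in range(0, 1024 * 1024, STRIPE_SIZE)}
print(sorted(touched))  # all six volume indices appear
```

Because consecutive chunks land on different volumes, both WAL writes and large table scans are spread across all NVMe-over-TCP connections instead of queuing on a single device.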

Simplyblock storage cluster:

  • c4a-standard-64-lssd instances, each with:
    • 64 ARM cores (AARCH64, Google Axion)
    • 256 GB RAM
    • 14x Local SSDs (NVMe, 375 GB each)
    • 75 Gbit/s Tier 1 networking
  • Each host ran 2 simplyblock nodes, totaling 6 distributed storage nodes

🐘 PostgreSQL: Raw and Indexed Queries

Test Scenarios:

  • PostgreSQL without indexes
  • PostgreSQL with indexes

Each was tested against the given baseline on ClickBench (most likely run on AWS EBS) and against simplyblock-backed storage. CPU and RAM were limited as described above. No other configuration changes were applied.

Key Observations:

| System | Load Time | Benchmark Runtime (warm) |
|---|---|---|
| PostgreSQL | 937 s | 11908 s |
| Simplyblock+PostgreSQL | 775 s | 668 s |
| PostgreSQL (Indexed) | 10357 s | 4098 s |
| Simplyblock+PostgreSQL (Indexed) | 5560 s | 1264 s |
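The headline numbers can be derived directly from the table above:

```python
# Derive the headline speedups from the summary table (values in seconds).
results = {
    "PostgreSQL":                       {"load": 937,   "runtime": 11908},
    "Simplyblock+PostgreSQL":           {"load": 775,   "runtime": 668},
    "PostgreSQL (Indexed)":             {"load": 10357, "runtime": 4098},
    "Simplyblock+PostgreSQL (Indexed)": {"load": 5560,  "runtime": 1264},
}

# Benchmark runtime improvement without indexes (the "up to 18x" claim).
runtime_speedup = results["PostgreSQL"]["runtime"] / results["Simplyblock+PostgreSQL"]["runtime"]

# Benchmark runtime improvement with indexes.
indexed_speedup = results["PostgreSQL (Indexed)"]["runtime"] / results["Simplyblock+PostgreSQL (Indexed)"]["runtime"]

# Load-time reduction for the indexed run (the "up to 47%" claim).
load_reduction = 1 - results["Simplyblock+PostgreSQL (Indexed)"]["load"] / results["PostgreSQL (Indexed)"]["load"]

print(f"runtime: {runtime_speedup:.1f}x, indexed: {indexed_speedup:.1f}x, load: {load_reduction:.0%}")
# runtime: 17.8x, indexed: 3.2x, load: 46%
```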

Simplyblock reduced the benchmark runtime by up to 18x, with significant improvements even when indexes were used.

Simplyblock reduced the runtime of certain queries by over 5000x with an average of over 500x.

Simplyblock reduced load time by up to 47%, thanks to parallel ingestion on NVMe-backed volumes.

ClickBench Results: PostgreSQL without Indexes

This benchmark compares runs for PostgreSQL without indexes on the baseline Amazon EBS vs. simplyblock storage.

Execution ratios range from 2x to over 5000x, with cold queries consistently being 10–270x faster and warm queries showing especially high improvements in complex queries.

When PostgreSQL is forced to perform full table scans (due to the lack of indexes), storage performance becomes the dominant factor; in this scenario, PostgreSQL’s performance is typically limited by disk I/O.

Simplyblock’s high-bandwidth, parallelized storage backend drastically reduces execution time for nearly every query. Our NVMe-over-TCP-based storage offers much higher throughput and parallelism compared to AWS EBS, translating to significant speedups, especially in I/O-heavy workloads.

| Query | PG (cold) | SB+PG (cold) | Rel. Speed | PG (warm) | SB+PG (warm) | Rel. Speed |
|---|---|---|---|---|---|---|
| Q01 | 269.992 | 9.881 | 27.32x | 258.534 | 2.165 | 119.44x |
| Q02 | 269.522 | 10.125 | 26.62x | 259.254 | 2.566 | 101.04x |
| Q03 | 269.672 | 10.441 | 25.83x | 258.531 | 2.780 | 92.99x |
| Q04 | 269.461 | 9.950 | 27.08x | 258.512 | 2.355 | 109.77x |
| Q05 | 284.235 | 20.557 | 13.83x | 272.647 | 12.802 | 21.30x |
| Q06 | 289.912 | 29.339 | 9.88x | 278.201 | 21.438 | 12.98x |
| Q07 | 269.372 | 10.583 | 25.45x | 258.468 | 2.365 | 109.31x |
| Q08 | 269.439 | 10.137 | 26.58x | 258.398 | 2.600 | 99.40x |
| Q09 | 296.872 | 26.183 | 11.34x | 285.173 | 18.111 | 15.75x |
| Q10 | 299.051 | 27.723 | 10.79x | 286.864 | 19.880 | 14.43x |
| Q11 | 270.779 | 11.602 | 23.34x | 258.948 | 4.182 | 61.92x |
| Q12 | 271.246 | 12.558 | 21.60x | 259.282 | 4.598 | 56.39x |
| Q13 | 274.954 | 16.276 | 16.89x | 263.681 | 8.436 | 31.26x |
| Q14 | 279.042 | 18.559 | 15.04x | 267.310 | 10.555 | 25.33x |
| Q15 | 276.684 | 17.012 | 16.26x | 265.207 | 9.825 | 26.99x |
| Q16 | 284.920 | 19.784 | 14.40x | 273.518 | 12.906 | 21.19x |
| Q17 | 475.078 | 226.061 | 2.10x | 458.150 | 154.078 | 2.97x |
| Q18 | 279.843 | 11.876 | 23.56x | 268.654 | 4.414 | 60.87x |
| Q19 | 312.733 | 148.571 | 2.10x | 301.210 | 96.187 | 3.13x |
| Q20 | 269.059 | 9.951 | 27.04x | 258.052 | 2.276 | 113.38x |
| Q21 | 269.002 | 10.867 | 24.75x | 258.047 | 3.956 | 65.23x |
| Q22 | 269.348 | 11.389 | 23.65x | 258.018 | 4.290 | 60.15x |
| Q23 | 269.183 | 11.310 | 23.80x | 258.001 | 4.367 | 59.08x |
| Q24 | 269.073 | 11.143 | 24.15x | 258.042 | 4.011 | 64.33x |
| Q25 | 269.053 | 10.016 | 26.86x | 257.948 | 2.927 | 88.14x |
| Q26 | 269.056 | 10.316 | 26.08x | 257.993 | 2.903 | 88.87x |
| Q27 | 269.003 | 10.310 | 26.09x | 257.894 | 2.961 | 87.10x |
| Q28 | 269.042 | 11.714 | 22.97x | 257.892 | 4.508 | 57.20x |
| Q29 | 279.737 | 51.639 | 5.42x | 268.922 | 45.048 | 5.97x |
| Q30 | 269.093 | 12.373 | 21.75x | 258.109 | 6.244 | 41.34x |
| Q31 | 274.112 | 15.863 | 17.28x | 262.274 | 8.001 | 32.78x |
| Q32 | 276.589 | 17.682 | 15.64x | 264.565 | 10.093 | 26.21x |
| Q33 | 483.774 | 248.939 | 1.94x | 477.739 | 172.907 | 2.76x |
| Q34 | 355.672 | 7.416 | 47.96x | 349.196 | 0.188 | 1855.95x |
| Q35 | 358.660 | 0.637 | 563.36x | 347.177 | 0.065 | 5318.80x |
| Q36 | 276.363 | 0.483 | 572.09x | 264.730 | 0.060 | 4443.27x |
| Q37 | 269.117 | 0.503 | 535.15x | 257.638 | 0.089 | 2901.18x |
| Q38 | 268.828 | 0.488 | 550.99x | 257.621 | 0.084 | 3083.21x |
| Q39 | 268.871 | 0.503 | 534.39x | 257.674 | 0.071 | 3652.88x |
| Q40 | 269.206 | 0.526 | 512.27x | 257.617 | 0.077 | 3351.02x |
| Q41 | 268.784 | 0.577 | 465.93x | 257.674 | 0.156 | 1654.48x |
| Q42 | 268.708 | 0.559 | 480.63x | 257.519 | 0.117 | 2193.35x |
| Q43 | 268.614 | 0.476 | 564.30x | 257.557 | 0.066 | 3907.14x |

ClickBench Results: PostgreSQL with Indexes

In this test, PostgreSQL used indexes, reducing the storage dependency but increasing the load time significantly. Yet, simplyblock still provided 2–25x speedups on many cold and warm queries. Some queries saw near parity or slight regressions, mostly due to PostgreSQL’s internal optimizations masking storage advantages.

Even when indexes limit I/O needs, simplyblock still enhances performance by ensuring fast data fetches and load operations. This proves its value even in optimized, production-like scenarios where queries use indexes extensively.

Interestingly, some queries are faster without indexes. This shows the benefit of extremely fast storage underneath PostgreSQL: it opens the option to reduce DDL complexity and remove the computational overhead of indexing.
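Q12 is a concrete example of this effect. Comparing its warm runtimes on simplyblock-backed storage across the two result tables:

```python
# Warm runtimes (seconds) for Q12 on simplyblock-backed PostgreSQL,
# taken from the two result tables in this post.
q12_without_index = 4.598   # SB+PG (warm), no indexes
q12_with_index = 10.481     # SB+PG Idx (warm), with indexes

ratio = q12_with_index / q12_without_index
print(f"Q12 runs {ratio:.1f}x faster without the index")  # ~2.3x
```

With storage fast enough to make full scans cheap, the planner’s index path (and its maintenance cost at load time) can become pure overhead for scan-heavy queries like this one.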

| Query | PG Idx (cold) | SB+PG Idx (cold) | Rel. Speed | PG Idx (warm) | SB+PG Idx (warm) | Rel. Speed |
|---|---|---|---|---|---|---|
| Q01 | 5.065 | 2.448 | 2.07x | 0.767 | 0.765 | 1.00x |
| Q02 | 1.363 | 1.204 | 1.13x | 0.706 | 0.714 | 0.99x |
| Q03 | 248.885 | 9.919 | 25.09x | 237.619 | 2.684 | 88.52x |
| Q04 | 8.286 | 3.488 | 2.38x | 1.307 | 1.404 | 0.93x |
| Q05 | 8.100 | 7.949 | 1.02x | 7.279 | 6.880 | 1.06x |
| Q06 | 11.367 | 9.588 | 1.19x | 6.390 | 7.186 | 0.89x |
| Q07 | 0.037 | 0.020 | 1.81x | 0.000 | 0.000 | 1.94x |
| Q08 | 1.426 | 1.248 | 1.14x | 0.696 | 0.722 | 0.96x |
| Q09 | 11.828 | 10.309 | 1.15x | 8.181 | 7.786 | 1.05x |
| Q10 | 278.031 | 28.011 | 9.93x | 267.477 | 20.817 | 12.85x |
| Q11 | 9.377 | 4.489 | 2.09x | 2.062 | 2.225 | 0.93x |
| Q12 | 854.244 | 201.946 | 4.23x | 744.483 | 10.481 | 71.03x |
| Q13 | 5.091 | 2.966 | 1.72x | 2.279 | 2.346 | 0.97x |
| Q14 | 19.979 | 13.185 | 1.52x | 9.304 | 9.597 | 0.97x |
| Q15 | 268.019 | 33.770 | 7.94x | 256.054 | 12.491 | 20.50x |
| Q16 | 18.215 | 11.620 | 1.57x | 14.332 | 11.040 | 1.30x |
| Q17 | 16.009 | 17.488 | 0.92x | 15.008 | 16.227 | 0.92x |
| Q18 | 0.028 | 0.019 | 1.43x | 0.000 | 0.000 | 1.06x |
| Q19 | 299.935 | 266.242 | 1.13x | 287.100 | 110.242 | 2.60x |
| Q20 | 0.033 | 0.019 | 1.76x | 0.000 | 0.000 | 2.42x |
| Q21 | 16.363 | 2.930 | 5.59x | 0.070 | 0.079 | 0.89x |
| Q22 | 0.141 | 0.195 | 0.72x | 0.068 | 0.079 | 0.87x |
| Q23 | 23.592 | 4.176 | 5.65x | 0.085 | 0.098 | 0.86x |
| Q24 | 0.144 | 0.193 | 0.75x | 0.069 | 0.078 | 0.89x |
| Q25 | 0.074 | 0.022 | 3.32x | 0.000 | 0.000 | 1.97x |
| Q26 | 0.034 | 0.016 | 2.09x | 0.000 | 0.000 | 2.03x |
| Q27 | 0.044 | 0.020 | 2.26x | 0.000 | 0.000 | 2.13x |
| Q28 | 257.862 | 28.305 | 9.11x | 239.039 | 6.701 | 35.67x |
| Q29 | 261.861 | 71.663 | 3.65x | 250.627 | 48.709 | 5.15x |
| Q30 | 7.665 | 6.203 | 1.24x | 6.241 | 5.331 | 1.17x |
| Q31 | 255.618 | 31.437 | 8.13x | 243.775 | 10.141 | 24.04x |
| Q32 | 258.134 | 33.690 | 7.66x | 245.997 | 12.329 | 19.95x |
| Q33 | 564.164 | 342.155 | 1.65x | 545.320 | 177.259 | 3.08x |
| Q34 | 344.261 | 65.345 | 5.27x | 333.862 | 41.039 | 8.14x |
| Q35 | 343.479 | 63.843 | 5.38x | 335.369 | 41.622 | 8.06x |
| Q36 | 47.324 | 29.850 | 1.59x | 32.149 | 26.041 | 1.23x |
| Q37 | 38.597 | 7.748 | 4.98x | 0.739 | 0.658 | 1.12x |
| Q38 | 1.260 | 1.223 | 1.03x | 0.515 | 0.463 | 1.11x |
| Q39 | 0.977 | 0.900 | 1.08x | 0.251 | 0.261 | 0.96x |
| Q40 | 1.775 | 1.670 | 1.06x | 1.060 | 0.936 | 1.13x |
| Q41 | 0.918 | 0.835 | 1.10x | 0.222 | 0.236 | 0.94x |
| Q42 | 0.982 | 0.822 | 1.19x | 0.239 | 0.248 | 0.97x |
| Q43 | 1.948 | 1.458 | 1.34x | 0.811 | 0.733 | 1.11x |

Simplyblock and PostgreSQL

Simplyblock’s storage layer dramatically improves PostgreSQL performance in both raw and indexed query scenarios.

With no indexes, individual query execution was up to 5000x faster, and the overall benchmark runtime was reduced by roughly 18x.

Even with indexes (where storage bottlenecks are less prominent), Simplyblock still reduced query time by up to 20x and cold-start load times by 47%.

🦆 DuckDB: Single File vs Partitioned Parquet

Test Scenarios:

  • DuckDB reading a single Parquet file
  • DuckDB reading partitioned Parquet files

Each scenario was tested against the provided baseline and against simplyblock-backed storage.

Key Observations:

| System | Load Time | Benchmark Runtime |
|---|---|---|
| DuckDB (single Parquet) | 102 s | 7.78 s |
| Simplyblock+DuckDB (single Parquet) | 54 s | 8.10 s |
| DuckDB (partitioned Parquet) | — | 11.82 s |
| Simplyblock+DuckDB (partitioned Parquet) | — | 10.64 s |

✅ DuckDB delivers very high performance across all tests, largely independent of the underlying storage. However, simplyblock-backed DuckDB showed faster data load times, slightly faster cold queries, and sustained competitive warm-query latencies, even in more demanding partitioned workloads.

✅ Simplyblock increases query speed on cold benchmark runs by up to 27x.

ClickBench Results: DuckDB with Single Parquet File

When using DuckDB on top of simplyblock, cold queries were accelerated by up to 26x, and warm query times showed mild improvements (~1.1–2x).

This shows that simplyblock enhances first-time access (e.g., initial analytics or scans), which is often bottlenecked by storage in columnar file formats like Parquet.

This is especially interesting in the context of massive data sets, where the data typically doesn’t fit into memory and needs to be read from the backing storage (a cold read). Simplyblock ensures fast and consistent access times whether data is already in RAM or still on disk.
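The cold/warm distinction comes down to the OS page cache: the first read of a file hits the backing storage, while repeat reads are served from RAM. A toy sketch of the effect (note that a truly cold first read would also require dropping the page cache, which needs root privileges on Linux):

```python
# Toy illustration of cold vs. warm reads via the OS page cache.
# Caveat: the first read below may already be partially warm, because
# writing the file populates the cache; a genuinely cold read would
# require dropping caches first (e.g., /proc/sys/vm/drop_caches on Linux).
import os
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(os.urandom(16 * 1024 * 1024))  # 16 MiB test file

def read_all(p: str) -> float:
    """Read the whole file in 1 MiB chunks; return elapsed seconds."""
    start = time.perf_counter()
    with open(p, "rb") as f:
        while f.read(1 << 20):
            pass
    return time.perf_counter() - start

first = read_all(path)   # potentially served from storage
second = read_all(path)  # served from the page cache (RAM)
print(f"first: {first:.4f}s, second: {second:.4f}s")
```

Once the working set exceeds RAM, every scan behaves like the first read, which is why the cold-query columns above are where storage performance matters most.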

| Query | DuckDB (cold) | SB+DuckDB (cold) | Rel. Speed | DuckDB (warm) | SB+DuckDB (warm) | Rel. Speed |
|---|---|---|---|---|---|---|
| Q01 | 0.044 | 0.026 | 1.69x | 0.001 | 0.001 | 1.00x |
| Q02 | 0.172 | 0.077 | 2.23x | 0.008 | 0.006 | 1.33x |
| Q03 | 1.984 | 0.108 | 18.37x | 0.017 | 0.014 | 1.21x |
| Q04 | 1.571 | 0.100 | 15.71x | 0.018 | 0.016 | 1.16x |
| Q05 | 1.674 | 0.536 | 3.12x | 0.148 | 0.138 | 1.07x |
| Q06 | 1.899 | 0.512 | 3.71x | 0.138 | 0.112 | 1.24x |
| Q07 | 0.108 | 0.065 | 1.66x | 0.013 | 0.012 | 1.04x |
| Q08 | 0.481 | 0.086 | 5.59x | 0.008 | 0.006 | 1.36x |
| Q09 | 2.738 | 0.622 | 4.40x | 0.156 | 0.173 | 0.90x |
| Q10 | 4.253 | 0.762 | 5.58x | 0.186 | 0.185 | 1.01x |
| Q11 | 2.291 | 0.336 | 6.82x | 0.070 | 0.036 | 1.96x |
| Q12 | 2.922 | 0.360 | 8.12x | 0.067 | 0.036 | 1.89x |
| Q13 | 2.238 | 0.702 | 3.19x | 0.127 | 0.109 | 1.17x |
| Q14 | 3.923 | 0.843 | 4.65x | 0.266 | 0.253 | 1.05x |
| Q15 | 2.841 | 0.597 | 4.76x | 0.142 | 0.128 | 1.11x |
| Q16 | 1.111 | 0.514 | 2.16x | 0.155 | 0.124 | 1.25x |
| Q17 | 3.845 | 1.050 | 3.66x | 0.354 | 0.248 | 1.43x |
| Q18 | 3.830 | 0.946 | 4.05x | 0.314 | 0.230 | 1.36x |
| Q19 | 7.164 | 1.585 | 4.52x | 0.463 | 0.463 | 1.00x |
| Q20 | 0.322 | 0.084 | 3.83x | 0.007 | 0.003 | 2.17x |
| Q21 | 15.114 | 0.567 | 26.66x | 0.090 | 0.122 | 0.74x |
| Q22 | 16.978 | 0.790 | 21.49x | 0.233 | 0.113 | 2.05x |
| Q23 | 25.856 | 1.373 | 18.83x | 0.307 | 0.197 | 1.56x |
| Q24 | 2.831 | 0.269 | 10.52x | 0.040 | 0.058 | 0.69x |
| Q25 | 1.001 | 0.122 | 8.20x | 0.022 | 0.027 | 0.83x |
| Q26 | 1.332 | 0.108 | 12.33x | 0.072 | 0.025 | 2.94x |
| Q27 | 1.254 | 0.121 | 10.36x | 0.025 | 0.022 | 1.14x |
| Q28 | 15.589 | 0.684 | 22.79x | 0.174 | 0.088 | 1.98x |
| Q29 | 11.829 | 3.981 | 2.97x | 1.410 | 3.056 | 0.46x |
| Q30 | 0.378 | 0.100 | 3.78x | 0.024 | 0.026 | 0.92x |
| Q31 | 7.071 | 0.731 | 9.67x | 0.113 | 0.105 | 1.08x |
| Q32 | 9.643 | 0.807 | 11.95x | 0.219 | 0.129 | 1.70x |
| Q33 | 7.354 | 1.767 | 4.16x | 0.597 | 0.545 | 1.10x |
| Q34 | 15.067 | 1.531 | 9.84x | 0.738 | 0.502 | 1.47x |
| Q35 | 14.977 | 1.516 | 9.88x | 0.693 | 0.502 | 1.38x |
| Q36 | 0.743 | 0.560 | 1.33x | 0.263 | 0.139 | 1.90x |
| Q37 | 0.119 | 0.095 | 1.25x | 0.023 | 0.034 | 0.69x |
| Q38 | 0.089 | 0.085 | 1.05x | 0.009 | 0.020 | 0.45x |
| Q39 | 0.104 | 0.088 | 1.18x | 0.010 | 0.024 | 0.42x |
| Q40 | 0.168 | 0.132 | 1.27x | 0.054 | 0.055 | 0.98x |
| Q41 | 0.099 | 0.081 | 1.22x | 0.005 | 0.011 | 0.41x |
| Q42 | 0.079 | 0.077 | 1.03x | 0.006 | 0.012 | 0.46x |
| Q43 | 0.073 | 0.071 | 1.03x | 0.005 | 0.012 | 0.39x |

ClickBench Results: DuckDB with Partitioned Parquet Files

With partitioned Parquet files, simplyblock showed significant cold-query improvements, up to 22x. Warm-query speedups were smaller (~1.1–2x), and a few regressions were likely due to variance in CPU-bound operations.

For complex, partitioned datasets, faster storage helps DuckDB ingest and query data more efficiently, especially when cold-starting or scanning wide partitions in analytics pipelines.

Again, with increasing data set sizes, the chance that the required data is already in RAM decreases dramatically. Here is where simplyblock really shines and increases the load performance when cold-reads happen.

| Query | DuckDB (part, cold) | SB+DuckDB (part, cold) | Rel. Speed | DuckDB (part, warm) | SB+DuckDB (part, warm) | Rel. Speed |
|---|---|---|---|---|---|---|
| Q01 | 0.109 | 0.174 | 0.63x | 0.002 | 0.002 | 1.00x |
| Q02 | 0.087 | 0.128 | 0.68x | 0.018 | 0.014 | 1.33x |
| Q03 | 0.107 | 0.141 | 0.76x | 0.024 | 0.023 | 1.04x |
| Q04 | 0.329 | 0.155 | 2.12x | 0.026 | 0.022 | 1.18x |
| Q05 | 1.026 | 0.550 | 1.87x | 0.161 | 0.132 | 1.22x |
| Q06 | 0.831 | 0.492 | 1.69x | 0.175 | 0.158 | 1.10x |
| Q07 | 0.067 | 0.108 | 0.62x | 0.018 | 0.014 | 1.29x |
| Q08 | 0.101 | 0.139 | 0.73x | 0.027 | 0.026 | 1.04x |
| Q09 | 0.677 | 0.555 | 1.22x | 0.175 | 0.164 | 1.07x |
| Q10 | 1.019 | 0.636 | 1.60x | 0.198 | 0.150 | 1.32x |
| Q11 | 0.397 | 0.310 | 1.28x | 0.047 | 0.049 | 0.96x |
| Q12 | 0.944 | 0.311 | 3.04x | 0.052 | 0.049 | 1.06x |
| Q13 | 1.249 | 0.665 | 1.88x | 0.190 | 0.158 | 1.20x |
| Q14 | 2.194 | 0.867 | 2.53x | 0.338 | 0.296 | 1.14x |
| Q15 | 0.852 | 0.698 | 1.22x | 0.169 | 0.173 | 0.98x |
| Q16 | 0.428 | 0.583 | 0.73x | 0.275 | 0.127 | 2.17x |
| Q17 | 2.098 | 0.966 | 2.17x | 0.351 | 0.250 | 1.41x |
| Q18 | 2.142 | 1.227 | 1.75x | 0.439 | 0.242 | 1.82x |
| Q19 | 4.160 | 1.627 | 2.56x | 0.771 | 0.508 | 1.52x |
| Q20 | 0.159 | 0.138 | 1.15x | 0.018 | 0.016 | 1.13x |
| Q21 | 9.590 | 0.483 | 19.86x | 0.252 | 0.244 | 1.03x |
| Q22 | 10.941 | 0.494 | 22.15x | 0.223 | 0.195 | 1.14x |
| Q23 | 19.398 | 0.879 | 22.07x | 0.612 | 0.410 | 1.49x |
| Q24 | 4.920 | 0.682 | 7.21x | 1.049 | 0.267 | 3.93x |
| Q25 | 0.177 | 0.212 | 0.83x | 0.078 | 0.076 | 1.02x |
| Q26 | 1.245 | 0.261 | 4.77x | 0.077 | 0.074 | 1.04x |
| Q27 | 0.204 | 0.173 | 1.18x | 0.110 | 0.067 | 1.65x |
| Q28 | 9.996 | 0.495 | 20.19x | 0.229 | 0.201 | 1.14x |
| Q29 | 8.748 | 3.691 | 2.37x | 2.013 | 3.635 | 0.55x |
| Q30 | 0.112 | 0.143 | 0.78x | 0.026 | 0.028 | 0.93x |
| Q31 | 2.173 | 0.623 | 3.49x | 0.175 | 0.173 | 1.01x |
| Q32 | 5.730 | 0.746 | 7.68x | 0.297 | 0.184 | 1.62x |
| Q33 | 4.566 | 1.503 | 3.04x | 0.831 | 0.595 | 1.39x |
| Q34 | 9.638 | 1.578 | 6.11x | 0.928 | 0.712 | 1.30x |
| Q35 | 9.759 | 1.700 | 5.74x | 0.789 | 0.657 | 1.20x |
| Q36 | 0.369 | 0.601 | 0.61x | 0.278 | 0.155 | 1.80x |
| Q37 | 0.176 | 0.185 | 0.95x | 0.079 | 0.081 | 0.98x |
| Q38 | 0.123 | 0.149 | 0.83x | 0.058 | 0.068 | 0.85x |
| Q39 | 0.141 | 0.156 | 0.90x | 0.037 | 0.039 | 0.95x |
| Q40 | 0.301 | 0.285 | 1.06x | 0.153 | 0.162 | 0.94x |
| Q41 | 0.102 | 0.110 | 0.93x | 0.018 | 0.020 | 0.92x |
| Q42 | 0.093 | 0.142 | 0.65x | 0.018 | 0.018 | 1.03x |
| Q43 | 0.091 | 0.140 | 0.65x | 0.020 | 0.020 | 1.03x |

Simplyblock and DuckDB

DuckDB, known for being compute-bound, saw modest but consistent improvements using simplyblock. However, when data doesn’t yet reside in memory, simplyblock’s extreme storage performance delivers up to 27x faster cold queries, faster data loads, and lower latency for partitioned datasets.

DuckDB is fast by design, so differences were narrower. However, on cold starts (when data must be read from storage), simplyblock’s faster I/O showed a clear edge. It proves that even CPU-heavy analytics engines benefit from better storage in real-world conditions.

Simplyblock: Performance of Local Disks with the Flexibility of Distributed Storage

Simplyblock combines the performance of local NVMe disks with the flexibility of distributed storage. In all tests (PostgreSQL and DuckDB), it consistently outperformed the baseline, particularly in I/O-heavy workloads.

The benchmarks validate that using simplyblock can dramatically reduce query runtimes, speed up ingestion, and make analytics pipelines far more efficient—whether in transactional databases or analytical engines.

The ClickBench benchmark results highlight the strength of simplyblock’s architecture in real-world data-intensive workloads.

  • PostgreSQL, typically limited by I/O throughput on EBS volumes, benefited dramatically from the NVMe-backed, high-bandwidth simplyblock cluster.
  • DuckDB workloads, while primarily CPU- and memory-bound, still saw tangible benefits in faster data loading and parallel query performance, especially on initial cold runs, where storage performance is often the limiting factor.

Whether you’re scaling analytics pipelines or need predictable performance in transactional workloads, simplyblock provides the performance of local disks with the flexibility of distributed block storage, and can be deployed on your “cloud” of choice (public, private, or on-premises) without vendor lock-in.

Bring your data infrastructure to the modern age with simplyblock.