At the heart of Panmnesia’s solution is a CXL 3.1-compliant root complex with multiple root ports and a host bridge featuring a host-managed device memory (HDM) decoder. This sophisticated system effectively tricks the GPU’s memory subsystem into treating PCIe-connected memory as native system memory. Extensive testing has demonstrated impressive results. Panmnesia’s CXL solution, CXL-Opt, achieved two-digit nanosecond round-trip latency, significantly outperforming both UVM and earlier CXL prototypes. In GPU kernel execution tests, CXL-Opt showed execution times up to 3.22 times faster than UVM. Older CXL memory extenders recorded around 250 nanoseconds round trip latency, with CXL-Opt potentially achieving less than 80 nanoseconds. As with CXL, the problem is usually that the memory pools add up latency and performance degrades, while these CXL extenders tend to add to the cost model as well. However, the Panmnesia CXL-Opt could find a use case, and we are waiting to see if anyone adopts this in their infrastructure.
Below are some benchmarks by Panmnesia, as well as the architecture of the CXL-Opt.