Intel said that a new mesh interconnect architecture has been designed to increase bandwidth between on-chip elements, while simultaneously decreasing latency, and improving power efficiency and scalability.
Writing in his blog, Akhilesh Kumar, Skylake-SP CPU Architect, said: “The task of adding more cores and interconnecting them to create a multi-core data center processor may sound simple, but the interconnects between CPU cores, memory hierarchy, and I/O subsystems provide critical pathways among these subsystems necessitating thoughtful architecture. These interconnects are like a well-designed highway with the right number of lanes and ramps at critical places to allow traffic to flow smoothly...”
In earlier many-core Xeon processors, Intel used a ring interconnect architecture to link the CPU cores, cache, memory, and various I/O controllers on the chips. However, life has become more difficult as the number of cores in the processors, and their memory and I/O bandwidth, have increased.
A ring architecture can require data to be sent across long stretches to reach its intended destination. The new mesh architecture addresses this limitation by interconnecting on-chip elements in a grid, which increases the number of pathways and improves efficiency.
Intel showed us this snap of the new mesh architecture.
Processor cores, on-chip cache banks, memory controllers, and I/O controllers are organised in rows and columns. Wires and switches connect the various on-chip elements and provide a more direct path than the prior ring interconnect architecture. Mesh allows for many more pathways to be implemented, which further minimizes bottlenecks, and also allows Intel to operate the mesh at a lower frequency and voltage, yet still deliver high bandwidth and low latency.
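To get a feel for why a grid gives "a more direct path" than a ring, here is a rough back-of-the-envelope sketch in Python. It is not Intel's model: it assumes a single bidirectional ring and a plain rectangular mesh with shortest-path routing, treats every on-chip element as an identical node, and ignores link width, clock speed and congestion. The function names (`ring_hops`, `mesh_hops`, `compare`) are made up for illustration.

```python
from itertools import product


def ring_hops(a, b, n):
    """Shortest hop count between stops a and b on a bidirectional ring of n stops."""
    d = abs(a - b)
    return min(d, n - d)


def mesh_hops(a, b, cols):
    """Manhattan distance between nodes a and b on a 2D mesh with `cols` columns.

    Nodes are numbered row by row, so node k sits at (k // cols, k % cols).
    """
    a_row, a_col = divmod(a, cols)
    b_row, b_col = divmod(b, cols)
    return abs(a_row - b_row) + abs(a_col - b_col)


def average_hops(n, hop_fn):
    """Average hop count over all ordered pairs of distinct nodes."""
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(hop_fn(a, b) for a, b in pairs) / len(pairs)


def compare(rows, cols):
    """Return (ring_average, mesh_average) for rows * cols nodes."""
    n = rows * cols
    ring = average_hops(n, lambda a, b: ring_hops(a, b, n))
    mesh = average_hops(n, lambda a, b: mesh_hops(a, b, cols))
    return ring, mesh


if __name__ == "__main__":
    for rows, cols in [(2, 2), (4, 4), (6, 6)]:
        ring, mesh = compare(rows, cols)
        print(f"{rows}x{cols}: ring {ring:.2f} hops, mesh {mesh:.2f} hops")
```

Under these assumptions a tiny 2x2 layout shows no difference, but at 36 nodes the ring averages a little over nine hops while the 6x6 mesh averages four, and the gap keeps widening as nodes are added. That is the scaling problem the mesh is meant to address, though real chips will not match these toy numbers.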
Kumar also says in the post: “The scalable and low-latency on-chip interconnect framework is also critical for the shared last level cache architecture. This large shared cache is valuable for complex multi-threaded server applications, such as databases, complex physical simulations, high-throughput networking applications, and for hosting multiple virtual machines. Negligible latency differences in accessing different cache banks allows software to treat the distributed cache banks as one large unified last level cache.”
Chipzilla is also implementing a modular architecture with its Xeon Scalable processors for resources that access on-chip cache, memory, I/O, and remote CPUs. These resources are distributed throughout the chip so that “hot-spots” in areas that could become bottlenecks are minimized. Intel claims the higher level of modularity in the new architecture allows available resources to scale better as the number of processor cores increases.