Forget dual-core and quad-core processors: A semiconductor company promises to pack 100 cores into a processor that can be used in applications that require hefty computing punch, like video conferencing, wireless base stations and networking. By comparison, Intel’s latest chips are expected to have just eight cores.
With a revolutionary new chip architecture and programming tool set, Anant Agarwal of Tilera embedded the processing power of hundreds of cores on a single chip. Tilera’s technology addresses the three biggest challenges in today’s semiconductor market, offering a processor that is high-performance, power-efficient, and easy to program.
“This is a general-purpose chip that can run off-the-shelf programs almost unmodified,” says Anant Agarwal, chief technical officer of Tilera, the company that is making the 100-core chip. “And we can do that while offering at least four times the compute performance of an Intel Nehalem-Ex, while burning a third of the power as a Nehalem.”
Agarwal directs the Massachusetts Institute of Technology’s vaunted Computer Science and Artificial Intelligence Laboratory, or CSAIL. The lab is housed in the university’s Stata Center, a Dr. Seussian hodgepodge of forms and angles that nicely reflects the unhindered-by-reality visionary research that goes on inside.
Tilera’s revolutionary architecture provides superior performance because it eliminates the on-chip bus interconnect, a centralized intersection through which information must flow between processor cores or between cores and memory and I/O. As manufacturers have added more cores to chips, the bus (or ring) has created an information traffic jam, because all data from these additional cores must travel through a single one-dimensional path.
Tilera’s architecture eliminates the dependence on a bus and instead puts a non-blocking, cut-through switch on each processor core, connecting it to a two-dimensional on-chip mesh network called iMesh™ (Intelligent Mesh). Each combination of a switch and a processor is called a “tile.” The iMesh provides each tile with more than a terabit per second of interconnect bandwidth, creating a more efficient distributed architecture and eliminating on-chip data congestion. Multiple parallel meshes separate different transaction types and provide more deterministic interconnect throughput.
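To make the contrast between a shared bus and a mesh concrete, here is a minimal Python sketch. It is a toy model, not Tilera’s implementation: the article does not describe how iMesh routes traffic, so the dimension-ordered (“XY”) routing used here, the 4x4 grid size, and the names `xy_route` and `max_link_load` are all assumptions made for illustration. The sketch counts how many messages share the busiest link when every tile sends to its right-hand neighbor.

```python
from collections import Counter

def xy_route(src, dst):
    """Return the directed links a message crosses from src to dst.

    Assumed routing policy (not taken from the article): travel along
    the x axis first, then along the y axis.
    """
    x, y = src
    dx, dy = dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def max_link_load(transfers):
    """Count how many transfers cross the most heavily used mesh link."""
    load = Counter()
    for src, dst in transfers:
        load.update(xy_route(src, dst))
    return max(load.values(), default=0)

# Hypothetical workload: in a 4x4 grid, every tile that has a right-hand
# neighbor sends it one message (12 transfers in total).
transfers = [((x, y), (x + 1, y)) for y in range(4) for x in range(3)]

# On a single shared bus, every transfer competes for the same path.
bus_load = len(transfers)

# On the mesh, each transfer uses its own tile-to-tile link.
mesh_load = max_link_load(transfers)

print(f"busiest shared path, bus:  {bus_load} transfers")
print(f"busiest shared path, mesh: {mesh_load} transfer(s)")
```

In this neighbor-to-neighbor pattern no two messages share a mesh link, while all of them would queue for the one bus. Less local traffic patterns would load some mesh links more heavily, so the gap shown here is a best case rather than a general figure.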