Lightelligence Reports ‘World’s First’ Optical Network-on-Chip Processor

Artificial intelligence (AI) and machine learning (ML) require real-time, parallel computation on massive amounts of data. These workloads exacerbate the memory bottleneck of classical, general-purpose CPUs from both a latency and a power perspective.

To overcome these challenges, many new players in the industry are turning toward novel technologies for the future of AI/ML computing. Recently, Lightelligence made waves in the industry when it announced a new AI/ML accelerator that leverages an optical network-on-chip (NoC).



Lightelligence says its new Hummingbird oNoC processor is the first of its kind designed for domain-specific AI workloads. Image courtesy of Lightelligence

In this piece, we’ll look at the challenges with conventional multicore AI/ML processors, the novel computing architecture developed by Lightelligence, and the company’s latest ASIC: the Hummingbird.


NoCs and Multicore Challenges

AI/ML computation involves specific mathematical functions, such as multiply-and-accumulates (MACs) and convolutions, to process large amounts of data concurrently. Because of this, standard AI/ML processing hardware tends to consist of multicore and heterogeneous systems.
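To make the workload concrete, here is a minimal sketch (illustrative only, not vendor code) of the multiply-and-accumulate operation that dominates neural-network layers: a dot product is just a chain of MACs, and a convolution is many such dot products.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc += a * b."""
    return acc + a * b

def dot(xs, ws):
    """A dot product is a chain of MACs, the core of most NN layers."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = mac(acc, x, w)
    return acc

print(dot([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # 3.0
```

An accelerator wins by executing thousands of these MACs in parallel rather than one at a time as shown here.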


An example of a heterogeneous computing architecture

An example of a heterogeneous computing architecture. Image courtesy of Routledge Handbooks Online


In a multicore system, a single piece of hardware contains many cores to process data in parallel (such as a GPU). In a heterogeneous system, like an SoC, a single chip features a variety of different computing blocks, including accelerators for MAC functions, GPUs, and general-purpose CPUs. Here, different blocks on the SoC handle different tasks to reduce power consumption and speed up overall computation for an ML model.

Regardless of which architecture is employed, the one constant between multicore and heterogeneous systems is the need for data movement. Whether data is moving between multiple processing cores or in and out of memory, high-speed computing applications tend to implement a network-on-chip to speed up data transfer between endpoints.


Different NoC architectures and configurations

Different NoC architectures and configurations. Image courtesy of ResearchGate


However, because of the physical limitations of electronic systems, these architectures are limited in bandwidth. Consequently, NoCs are also limited in the topologies they can achieve, preventing ASICs from reaching maximum performance.


Lightelligence’s oNoC Architecture

For Lightelligence, the key to enabling better-performing AI/ML accelerators is to enable new NoC topologies that maximize speed and minimize power consumption. Since conventional electrical NoCs won’t cut it, the company instead turned to optical NoCs (oNoCs) as the solution.

Lightelligence’s computing architecture consists of three major components: an electronic chip (EIC), an interposer, and a photonic chip (PIC).


A cross-sectional view of Lightelligence’s stacked architecture

A cross-sectional view of Lightelligence’s stacked architecture. Image courtesy of Lightelligence

The EIC is the part of the system that implements the digital domain, including the ALU, memory, and analog interface. The interposer connects the EIC and PIC and delivers power to both domains. The PIC hosts the oNoC, which uses optical networking to interconnect the processing cores in an all-to-all broadcasting scheme. This approach is said to allow all cores to access data simultaneously.


Lightelligence’s oNoC connects EICs with optical networking

Lightelligence’s oNoC connects EICs with optical networking. Image courtesy of Lightelligence

At a lower level, the interposer contains photonic routing waveguides that act as data communication highways between EICs. Each EIC is stacked on top of a PIC and connected via micro-bumps to form a 2D array. Light from a laser source routes through the waveguides and is translated into electrical data by modulating the light’s intensity. To do this, the analog interface on each EIC couples with the photonic interposer and alters the refractive index of the silicon waveguide to physically modulate the light’s intensity. To convert this back into a bitstream, the EIC hosts photodiodes that convert the light pulses into electrical current for use in the digital domain.
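Conceptually, the transmit/receive path above amounts to on-off intensity keying: the modulator maps bits to light intensities, and the photodiode side thresholds the received intensity to recover the bits. The sketch below is a simplified software model of that idea, not Lightelligence’s actual analog design (the power and threshold values are assumptions).

```python
LASER_POWER = 1.0  # normalized laser intensity (assumed for illustration)

def modulate(bits):
    """Transmit side: encode each bit as a light intensity (1 -> bright, 0 -> dark)."""
    return [LASER_POWER * b for b in bits]

def demodulate(intensities, threshold=0.5):
    """Receive side: a photodiode yields current proportional to intensity;
    thresholding that current recovers the original bitstream."""
    return [1 if level > threshold else 0 for level in intensities]

bits = [1, 0, 1, 1, 0]
recovered = demodulate(modulate(bits))
print(recovered)  # [1, 0, 1, 1, 0]
```

In the real device the "modulate" step is performed by changing the waveguide’s refractive index, but the bits-to-intensity-to-bits round trip is the same in principle.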

The major benefit of optical interconnects is that they operate at significantly higher speeds and lower power consumption than what’s attainable with electrical NoCs. With near-zero latency, the oNoC enables new NoC topologies, like toroidal, that aren’t otherwise feasible.


Hummingbird oNoC Processor

Recently, Lightelligence announced its new Hummingbird processor, the first product to feature its oNoC architecture.

Hummingbird is an AI/ML accelerator consisting of 64 cores, each connected to one another via the oNoC. With 64 transmitters and 512 receivers, the Hummingbird is a single-instruction, multiple-data (SIMD) solution with its own proprietary ISA.
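The SIMD idea can be illustrated in a few lines: one instruction is applied across many data lanes at once. In this NumPy sketch the 64 lanes are only an analogy for Hummingbird’s 64 cores; the chip’s actual ISA is proprietary and undisclosed, and the multiply-add workload here is an assumption for illustration.

```python
import numpy as np

LANES = 64  # stand-in for the 64 cores; purely illustrative

x = np.arange(LANES, dtype=np.float32)        # per-lane inputs
w = np.full(LANES, 0.5, dtype=np.float32)     # per-lane weights
b = np.ones(LANES, dtype=np.float32)          # per-lane biases

# A single vectorized instruction stream performs the multiply-add
# on every lane simultaneously -- the essence of SIMD.
y = x * w + b
print(y[:4])  # [1.  1.5 2.  2.5]
```

A scalar CPU would loop 64 times to do the same work; the SIMD formulation issues the operation once over all lanes.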


The Hummingbird processor stack up

The Hummingbird processor stack-up. Image courtesy of Lightelligence

While performance numbers aren’t available, the company claims that the solution offers lower latency and power consumption than anything else available. Specifically, the solution’s oNoC is said to achieve an energy efficiency ratio below 1 pJ/bit.
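To put that figure in perspective, here is a back-of-the-envelope calculation. The 1 pJ/bit value comes from the claim above; the 1 Tbit/s traffic figure is purely an assumed example, not a Hummingbird specification.

```python
# Energy per bit from the claimed efficiency ratio
energy_per_bit_j = 1e-12   # 1 pJ/bit (upper bound per the claim)

# Assumed on-chip traffic rate for illustration only
bandwidth_bps = 1e12       # 1 Tbit/s

# Power = energy per bit * bits per second
power_watts = energy_per_bit_j * bandwidth_bps
print(power_watts)  # 1.0 -> moving a terabit per second costs about 1 W
```

In other words, at this efficiency an entire terabit per second of interconnect traffic would consume on the order of a single watt.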

As it stands, the Hummingbird will be implemented in a PCIe form factor for standard servers. The device will be programmable via Lightelligence’s own SDK, which offers support for TensorFlow. First demonstrations of the chip will take place at this year’s Hot Chips conference at the end of August.