So, instead of rewiring the whole FPGA, we decided to route packets over a fixed interconnect network, letting the tiles on the chip communicate much as computers do over the Internet. Tasks run in the tiles and communicate with other tasks on other tiles, on the microprocessor, or on an ASIC, by sending messages via the network. Since the interconnect network and the tile interfaces are fixed, tasks can be dynamically created and deleted without affecting those running in other tiles.
Keeping tabs on tasks
The key to a great user experience with a device like Gecko lies in making the transitions from one function to another as smooth as possible. This responsibility falls to the real-time operating system, which manages all these complex transitions.
We based Gecko's Operating System for Reconfigurable Systems (OS4RS) on a real-time version of Linux. The OS4RS manages the dynamic creation of hardware tasks and handles communications among them. It also determines when and on which resource to schedule newly created tasks. When switching between tiles on the FPGA or between the FPGA and the StrongARM microprocessor, OS4RS must suspend certain tasks that are running so that other tasks can take a turn. To do so, it must remember the state each task was in when it stopped so that each task can restart from the same state.
To find a way to seamlessly and automatically switch a task running in software on the microprocessor to the FPGA tiles, we looked at traditional microprocessors and operating systems. These solved the software half of the problem a long time ago.
When multiple tasks need to run on a microprocessor, an OS grants each task a time slot on the processor. A running task, X, sometimes needs to be suspended temporarily so that another task, Y, can be run for a time, after which task X is resumed. Handling this suspension is a function called a context switch, which requires the operating system to save the context of the task in the processor's memory at a predefined location.
The context of a task denotes its state: all the information required to resume the task at the point where it was interrupted. For a task running in software on a microprocessor, context includes what's in the processor's registers, the data in the memory on which the task is operating, and information regarding the current state of execution of the task, such as the program counter.
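To make the idea concrete, here is a minimal sketch of a software context switch on a toy machine. It is only an illustration under stated assumptions, not Gecko's or Linux's scheduler: the `Context` class, the two-opcode instruction format, and the `round_robin` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical task context: what is needed to resume after an interruption."""
    program_counter: int = 0
    registers: dict = field(default_factory=dict)


def run_slice(program, ctx, max_steps):
    """Run at most max_steps instructions, updating ctx in place.

    Returns True once the program has run to completion.
    """
    for _ in range(max_steps):
        if ctx.program_counter >= len(program):
            return True
        op, reg, value = program[ctx.program_counter]
        if op == "set":
            ctx.registers[reg] = value
        elif op == "add":
            ctx.registers[reg] = ctx.registers.get(reg, 0) + value
        ctx.program_counter += 1
    return ctx.program_counter >= len(program)


def round_robin(programs, slice_len=2):
    """Give each task a time slot in turn, saving its context between slots."""
    # The "predefined location" where the toy OS keeps every task's context.
    saved = {name: Context() for name in programs}
    ready = list(programs)
    while ready:
        name = ready.pop(0)
        ctx = saved[name]                  # restore the context
        finished = run_slice(programs[name], ctx, slice_len)
        saved[name] = ctx                  # save it again when the slot ends
        if not finished:
            ready.append(name)             # task X waits while task Y runs
    return {name: ctx.registers for name, ctx in saved.items()}


programs = {
    "X": [("set", "a", 1), ("add", "a", 2), ("add", "a", 3)],
    "Y": [("set", "b", 10), ("add", "b", 5)],
}
result = round_robin(programs)
```

Task X is interrupted after two instructions, task Y runs, and X then resumes at its saved program counter, so both tasks finish with the same register values they would have had if they had run without interruption.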
While such a software context switch will work on Gecko's processor, the device's reconfigurable hardware requires special handling. In particular, not only the software but also the hardware states of the same task must be represented consistently.
That consistency is supplied by providing the system designer with a selection of objects that represent tasks and contain timing information. When the code that defines the objects is generated for both hardware and software, the tasks will behave uniformly, regardless of where they are running. By guaranteeing uniformity, we can write chunks of code, called switching points, that will work when the task runs in the FPGA or on the StrongARM. When a running task hits a switching point, it will stop and pass its state information to the OS4RS, which will store it in a defined format in the processor's memory.
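A switching point can be sketched as follows. This is a simplified illustration, not the OS4RS implementation: the `FrameDecoder` class, its two-field state dictionary, and the `relocate_requested` callback are hypothetical, and both the "fpga" and "cpu" instances are plain Python objects standing in for a hardware circuit and a software routine that share one state format.

```python
class FrameDecoder:
    """Hypothetical task whose state is identical wherever it runs."""

    def __init__(self, location, state=None):
        self.location = location  # "fpga" or "cpu"; a label only in this sketch
        # Assumed uniform state format: next frame index plus a running checksum.
        self.state = dict(state) if state else {"next_frame": 0, "checksum": 0}

    def run(self, frames, relocate_requested):
        """Process frames; return the state if relocated, or None when finished."""
        while self.state["next_frame"] < len(frames):
            index = self.state["next_frame"]
            self.state["checksum"] += frames[index]
            self.state["next_frame"] = index + 1
            # Switching point: between frames the state is small and well defined,
            # so the task can stop here and pass it to the operating system.
            if relocate_requested(self.state):
                return dict(self.state)
        return None


frames = [3, 1, 4, 1, 5]

# Start on the "FPGA" and ask for relocation once two frames are done.
hardware_task = FrameDecoder("fpga")
saved_state = hardware_task.run(frames, lambda s: s["next_frame"] == 2)

# Resume on the "processor" from the saved state and run to completion.
software_task = FrameDecoder("cpu", saved_state)
leftover = software_task.run(frames, lambda s: False)
```

Because both instances read and write the same state format, the relocated task ends with the same checksum as an uninterrupted run would produce.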
Besides switching points, we need one more thing for dynamic reconfiguration. To move a task from the FPGA to the StrongARM microprocessor and back, the operating system needs to know where each task is at any given time. So it assigns every task a logical address. Whenever the operating system schedules a task on the FPGA, an address translation table is updated. This table lets the operating system translate a logical address located in the registers of the StrongARM microprocessor into a physical address based on the location of the task in the FPGA's interconnect network. With switching points embedded in each task and the operating system aware of each task's location, we're ready to reconfigure.
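The address translation step might look roughly like the sketch below. It assumes a table keyed by logical task address whose entries are physical locations; the `AddressTable` class, the `send` helper, and the tuple encoding of a location are hypothetical and are not taken from the OS4RS.

```python
class AddressTable:
    """Hypothetical logical-to-physical address translation table."""

    def __init__(self):
        self._physical = {}

    def schedule(self, logical, physical):
        """Record where a task now runs; called each time a task is (re)scheduled."""
        self._physical[logical] = physical

    def resolve(self, logical):
        """Translate a logical task address into its current physical location."""
        return self._physical[logical]


def send(table, dest_logical, payload):
    """Build a message addressed to wherever the destination task currently is."""
    return {"dest": table.resolve(dest_logical), "payload": payload}


table = AddressTable()
VIDEO_DECODER = 7                       # assumed logical address

# The task is first scheduled on an FPGA tile (tile numbering is an assumption).
table.schedule(VIDEO_DECODER, ("fpga", 2))
before = send(table, VIDEO_DECODER, "frame-0")

# After relocation, the same logical address resolves to the processor.
table.schedule(VIDEO_DECODER, ("cpu", 0))
after = send(table, VIDEO_DECODER, "frame-1")
```

Senders keep using the same logical address throughout; only the table entry changes when the task moves.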
When the user decides to play a game while keeping an eye on the baby, the operating system will signal the video decoder task that it should relocate from the FPGA to the microprocessor, to free the FPGA for the game decoder task. As the video decoder task reaches a switching point, it is interrupted and transfers all of its state information to the operating system, which saves the task's context in the memory of the microprocessor. The operating system then resumes the relocated video decoder task on the microprocessor, where it will start up again right where it left off, but in software running on the microprocessor instead of as a circuit running on the FPGA. Now the FPGA is free to start the game decoder task.
In addition to allowing a user to run several programs on the same device simultaneously, the Gecko concept also gives us that glimpse of a future where a device's longevity can be extended almost indefinitely.
Every PC user is familiar with the scenario where application software autonomously checks for the availability of upgrades or patches on the Internet. Installing such an upgrade is as easy as clicking “yes” on a dialog box. Connecting the reconfigurable hardware to the Internet extends this software upgrade scenario to hardware.
When a new video compression standard emerges, the compute-intensive tasks that need to be implemented in hardware to obtain acceptable performance can be downloaded over the Internet, together with the less compute-intensive tasks of the video compression standard. The Gecko's operating system will take care of running the compute-intensive tasks in one of the tiles of the reconfigurable architecture while keeping the less compute-intensive tasks on the processor.
Gecko's successor
IMEC isn't the only research group developing this kind of reconfigurable system. Similar research is under way in labs at the University of California at Berkeley, the Imperial College of Science, Technology and Medicine in London, and the Massachusetts Institute of Technology, where researchers are working on a similar architecture as part of the RAW (Raw Architecture Workstation) processor project. RAW has 16 identical tiles containing programmable microprocessors, floating-point arithmetic units, and memories that communicate over an interconnect network that supports both compile-time and run-time routing.
Clearly, Gecko is far from the final word on reconfigurability. Rather, it is the first step toward a future flexible system-on-chip platform. Such a platform will integrate the discrete components used in the Gecko platform into a single piece of very flexible silicon. This next-generation Gecko will be built in a 45-nm technology, starting in the 2008-to-2010 time frame (we're at 90 nm today). It will consist of a regular array of FPGA tiles, each approximately 2 mm by 2 mm, connected by a packet-switched network. Some tiles will contain instruction-set processors, others FPGA hardware, and still others dedicated custom hardware.
Whatever this chameleon of devices will be called once it makes it to market is anyone's guess, but its heart, the hardware task concept, and the operating system technology, will be pure Gecko.