The main aim of this thesis is to examine, at the architectural level, the advantages of 3D stacking applied to microprocessors and related integrated microprocessor systems. Over the years, microprocessor design has aimed at lower power consumption, higher performance, smaller form factor and greater integration. 3D integration is an emerging technology that can provide improvements in all of these areas. Under conventional process scaling, the signal delay time (RC) is expected to increase with each technology node, mostly because of the increasing resistance of the wires. The situation is exacerbated by the steady increase in interconnect length and in the number of interconnect layers used. Thus, for microprocessor systems in particular, it is most important to use 3D primarily to reduce wiring. 3D systems can be divided into two basic categories according to the type of layers stacked to form the 3D entity. The first involves stacking cache, main memory or devices with similar functions onto a high-performance logic device; this type is usually referred to as “logic + memory” stacking. The second involves splitting a logic area between two or more layers and is usually referred to as “logic + logic” stacking. This thesis commences with an introduction to 3D ICs and continues by demonstrating ways to improve memory organization. It then proceeds with a unique way of “logic + memory” stacking that provides interesting opportunities for FPGA implementations. Such opportunities may best be exploited through the use of DSP blocks within FPGAs; in this context, a novel DSP block to enhance FPGA performance follows. The thesis continues with a novel type of link that is especially useful for 3D integration and concludes with a modular “logic + logic” 3D stacked multi-processor platform. More specifically, the first chapter consists of an introduction to 3D ICs.
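The wire-delay motivation given above can be made concrete with a standard first-order model. The following is a sketch under stated assumptions, not a result taken from the thesis: the symbols r and c (resistance and capacitance per unit length), L (wire length), ρ (resistivity), and w and h (wire width and height) are introduced here for illustration, and the expression is the usual Elmore approximation for a distributed RC line.

```latex
% Elmore delay of a distributed RC wire of length L (first-order estimate)
\[
  t_{\text{wire}} \approx \tfrac{1}{2}\, r\, c\, L^{2},
  \qquad
  r = \frac{\rho}{w\,h}
\]
% Shrinking the wire cross-section (w, h) raises r, hence the delay.
% Halving the wire length, e.g. by folding a block onto two stacked layers:
\[
  \frac{t_{\text{wire}}(L/2)}{t_{\text{wire}}(L)} = \frac{1}{4}
\]
```

Under this model the delay grows quadratically with length, which is why shortening the longest wires is the main lever that stacking offers.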
The second chapter presents a systematic technique to reduce the silicon area required for AVS-enhanced ISEs without compromising I/O bandwidth. The technique combines a search for the lowest-cost memory system organization, followed by a data layout phase (formulated as LICCA, a problem akin to graph coloring), with the use of input and output alignment layers placed between the memory system and the ISE logic. Optimizing the memory subsystem with this approach reduces the silicon area by around 36% while maintaining the same data bandwidth as a multi-port memory and without degrading the clock frequency. In the next chapter we propose a methodology to generate data accumulation architectures that achieve, to our knowledge, the most efficient use of the available memory bandwidth. Such architectures require the minimum number of cycles to complete a given number of computations while maintaining the same maximum rate of computation completion as known state-of-the-art implementations. The next chapter proposes the stacking of DRAM on top of an FPGA
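The data layout phase mentioned above is described as akin to graph coloring. The sketch below illustrates only that analogy and is not the thesis's LICCA formulation: it assumes that data elements read in the same cycle must sit in different single-port banks, and it uses a plain greedy coloring heuristic. The names `build_conflict_graph`, `assign_banks` and `access_groups` are hypothetical.

```python
from itertools import combinations


def build_conflict_graph(access_groups):
    """Connect every pair of elements that are accessed in the same cycle."""
    graph = {}
    for group in access_groups:
        for item in group:
            graph.setdefault(item, set())
        for a, b in combinations(group, 2):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def assign_banks(access_groups):
    """Greedy coloring: each color is a memory bank (assumed single-port)."""
    graph = build_conflict_graph(access_groups)
    # Visit the most constrained elements first; ties broken by name.
    order = sorted(graph, key=lambda v: (-len(graph[v]), str(v)))
    bank_of = {}
    for v in order:
        used = {bank_of[n] for n in graph[v] if n in bank_of}
        bank = 0
        while bank in used:  # pick the lowest bank not used by a neighbor
            bank += 1
        bank_of[v] = bank
    return bank_of


if __name__ == "__main__":
    # Hypothetical access pattern: each tuple is read in a single cycle.
    groups = [("a", "b", "c"), ("c", "d"), ("a", "d")]
    print(assign_banks(groups))
```

Greedy coloring does not guarantee the minimum number of banks in general; it is used here only because it is short and easy to check.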
Adam Shmuel Teman, Robert Giterman