UID:
almafu_9960073678202883
Extent:
1 online resource (383 p.)
Edition:
First edition.
ISBN:
9780128009796, 0128009799, 9780128011782, 0128011785
Description:
Networks-on-Chip: From Implementations to Programming Paradigms provides a thorough, bottom-up exploration of the whole NoC design space in a coherent and uniform fashion, from low-level router, buffer, and topology implementations, to routing and flow control schemes, to co-optimizations of NoCs and high-level programming paradigms. This textbook is intended for an advanced course on computer architecture and is suitable for graduate students or senior undergraduates who want to specialize in computer architecture and networks-on-chip. It is also intended for industry practitioners in microprocessor design, especially the design of many-core processors with a network-on-chip. Graduates can learn many practical and theoretical lessons from this course and may be motivated to delve further into the ideas and designs proposed in the book. Industrial engineers can refer to this book to make practical tradeoffs as well. Graduates and engineers who focus on off-chip network design can also refer to this book for deadlock-free routing algorithm designs. Provides a thorough and insightful exploration of the NoC design space, from low-level logic implementations to co-optimizations of high-level programming paradigms and NoCs. The coherent and uniform format offers readers a clear, quick, and efficient exploration of the NoC design space. Covers many novel and exciting research ideas, which encourage researchers to delve further into these topics. Presents both engineering and theoretical contributions. The detailed descriptions, comparisons, and analyses of the router, buffer, and topology implementations are of high engineering value.
Note:
Bibliographic Level Mode of Issuance: Monograph
,
Front Cover -- Networks-on-Chip: From Implementations to Programming Paradigms -- Copyright -- Contents in Brief -- Contents -- Preface -- About the Editor-in-Chief and Authors -- Editor-in-Chief -- Authors -- Part I: Prologue -- Chapter 1: Introduction -- 1.1 The dawn of the many-core era -- 1.2 Communication-centric cross-layer optimizations -- 1.3 A baseline design space exploration of NoCs -- 1.3.1 Topology -- 1.3.2 Routing algorithm -- 1.3.3 Flow control -- 1.3.4 Router microarchitecture -- 1.3.5 Performance metric -- 1.4 Review of NoC research -- 1.4.1 Research on topologies -- 1.4.2 Research on unicast routing -- 1.4.3 Research on supporting collective communications -- 1.4.4 Research on flow control -- 1.4.5 Research on router microarchitecture -- 1.5 Trends of real processors -- 1.5.1 The MIT Raw processor -- 1.5.2 The Tilera TILE64 processor -- 1.5.3 The Sony/Toshiba/IBM Cell processor -- 1.5.4 The U.T. Austin TRIPS processor -- 1.5.5 The Intel Teraflops processor -- 1.5.6 The Intel SCC processor -- 1.5.7 The Intel Larrabee processor -- 1.5.8 The Intel Knights Corner processor -- 1.5.9 Summary of real processors -- 1.6 Overview of the book -- References -- Part II: Logic implementations -- Chapter 2: A single-cycle router with wing channels -- 2.1 Introduction -- 2.2 The router architecture -- 2.2.1 The overall architecture -- 2.2.2 Wing channels -- 2.3 Microarchitecture designs -- 2.3.1 Channel dispensers -- 2.3.2 Fast arbiter components -- 2.3.3 SIG managers and SIG controllers -- 2.4 Experimental results -- 2.4.1 Simulation infrastructures -- 2.4.2 Pipeline delay analysis -- 2.4.3 Latency and throughput -- 2.4.4 Area and power consumption -- 2.5 Chapter summary -- References -- Chapter 3: Dynamic virtual channel routers with congestion awareness -- 3.1 Introduction -- 3.2 DVC with congestion awareness -- 3.2.1 DVC scheme.
,
3.2.2 Congestion avoidance scheme -- 3.3 Multiple-port shared buffer with congestion awareness -- 3.3.1 DVC scheme among multiple ports -- 3.3.2 Congestion avoidance scheme -- 3.4 DVC router microarchitecture -- 3.4.1 VC control module -- 3.4.2 Metric aggregation and congestion avoidance -- 3.4.3 VC allocation module -- 3.5 HiBB router microarchitecture -- 3.5.1 VC control module -- 3.5.2 VC allocation and output port allocation -- 3.5.3 VC regulation -- 3.6 Evaluation -- 3.6.1 DVC router evaluation -- 3.6.2 HiBB router evaluation -- 3.7 Chapter summary -- References -- Chapter 4: Virtual bus structure-based network-on-chip topologies -- 4.1 Introduction -- 4.2 Background -- 4.3 Motivation -- 4.3.1 Baseline on-chip communication networks -- 4.3.1.1 Transaction-based bus -- 4.3.1.2 Packet-based NoC -- 4.3.2 Analysis of NoC problems -- 4.3.2.1 Multihop problem -- 4.3.2.2 Multicast problem -- 4.3.3 Advantages of a transaction-based bus -- 4.4 The VBON -- 4.4.1 Interconnect structures -- 4.4.1.1 Wire delay consideration -- 4.4.2 The VB mechanism -- 4.4.2.1 The VB construction -- 4.4.2.2 VB arbitration -- 4.4.2.3 Packet format -- 4.4.2.4 VB operation -- 4.4.2.5 A simple example for VB communication -- 4.4.3 Starvation and deadlock avoidance -- 4.4.4 The VBON router microarchitecture -- 4.5 Evaluation -- 4.5.1 Simulation infrastructures -- 4.5.1.1 Router choices for comparison -- 4.5.1.2 Network configuration -- 4.5.1.3 Traffic generation -- 4.5.2 Synthetic traffic evaluations -- 4.5.2.1 Single-level 4×4 VBON -- 4.5.2.2 Hierarchical 8×8 VBON -- 4.5.3 Real application evaluations -- 4.5.4 Power consumption analysis -- 4.5.5 Overhead analysis -- 4.6 Chapter summary -- References -- Part III: Routing and flow control -- Chapter 5: Routing algorithms for workload consolidation -- 5.1 Introduction -- 5.2 Background -- 5.3 Motivation.
,
5.3.1 Insufficient information -- 5.3.2 Intraregion interference -- 5.3.3 Inter-region interference -- 5.4 Destination-based adaptive routing -- 5.4.1 Destination-based selection strategy -- 5.4.1.1 Congestion information propagation network -- 5.4.1.2 DBSS router microarchitecture -- 5.4.2 Routing function design -- 5.4.2.1 Offered path diversity -- 5.4.2.2 VC reallocation scheme -- 5.5 Evaluation -- 5.5.1 Evaluation of routing functions -- 5.5.2 Single-region performance -- 5.5.2.1 Synthetic traffic results -- 5.5.2.2 Application results -- 5.5.3 Multiple-region performance -- 5.5.3.1 Results for a small regular region -- 5.5.3.2 Irregular-region results -- 5.5.3.3 Summary -- 5.5.4 CMesh evaluation -- 5.5.4.1 Configuration -- 5.5.4.2 Performance -- 5.5.5 Hardware overhead -- 5.5.5.1 Wiring overhead -- 5.5.5.2 Router overhead -- 5.5.5.3 Power consumption -- 5.6 Analysis and discussion -- 5.6.1 In-depth analysis of interference -- 5.6.2 Design space exploration -- 5.6.2.1 Number of propagation wires -- 5.6.2.2 DBSS scalability -- 5.6.2.3 Congestion propagation delay -- 5.7 Chapter summary -- References -- Chapter 6: Flow control for fully adaptive routing -- 6.1 Introduction -- 6.2 Background -- 6.2.1 Deadlock avoidance theories -- 6.2.2 Fully adaptive routing algorithms -- 6.3 Motivation -- 6.3.1 VC reallocation -- 6.3.2 Routing flexibility -- 6.4 Flow control and routing designs -- 6.4.1 Whole packet forwarding -- 6.4.2 Aggressive VC reallocation for EVCs -- 6.4.3 Maintain routing flexibility -- 6.4.4 Router microarchitecture -- 6.5 Evaluation on synthetic traffic -- 6.5.1 Performance of synthetic workloads -- 6.5.2 Buffer utilization of routing algorithms -- 6.5.3 Sensitivity to network design -- 6.5.3.1 SFP ratio -- 6.5.3.2 VC depth -- 6.5.3.3 VC count -- 6.5.3.4 Network size -- 6.6 Evaluation of PARSEC workloads.
,
6.6.1 Methodology and configuration -- 6.6.2 Performance -- 6.7 Detailed analysis of flow control -- 6.7.1 The detailed buffer utilization -- 6.7.1.1 Allowable EVCs -- 6.7.1.2 Performance analysis -- 6.7.2 The effect of flow control on fairness -- 6.8 Further discussion -- 6.8.1 Packet length -- 6.8.2 Dynamically allocated multiqueue and hybrid flow controls -- 6.9 Chapter summary -- Appendix: Logical Equivalence of Alg and Alg + WPF -- References -- Chapter 7: Deadlock-free flow control for torus networks-on-chip -- 7.1 Introduction -- 7.2 Limitations of existing designs -- 7.2.1 Dateline -- 7.2.2 Localized bubble scheme -- 7.2.3 Critical bubble scheme -- 7.2.4 Inefficiency with variable-size packets -- 7.3 Flit bubble flow control -- 7.3.1 Theoretical description -- 7.3.2 FBFC-localized -- 7.3.3 FBFC-critical -- 7.3.4 Starvation -- 7.4 Router microarchitecture -- 7.4.1 FBFC routers -- 7.4.2 VCT routers -- 7.5 Methodology -- 7.6 Evaluation on 1D tori (rings) -- 7.6.1 Performance -- 7.6.2 Buffer utilization -- 7.6.3 Latency of short and long packets -- 7.7 Evaluation on 2D tori -- 7.7.1 Performance for a 4×4 torus -- 7.7.2 Sensitivity to SFP ratios -- 7.7.3 Sensitivity to buffer size -- 7.7.4 Scalability for an 8×8 torus -- 7.7.5 Effect of starvation -- 7.7.6 Real application performance -- 7.7.7 Large-scale systems and message passing -- 7.8 Overheads: Power and area -- 7.8.1 Methodology -- 7.8.2 Power efficiency -- 7.8.3 Area -- 7.8.4 Comparison with meshes -- 7.9 Discussion and related work -- 7.9.1 Discussion -- 7.9.2 Related work -- 7.10 Chapter summary -- References -- Part IV: Programming paradigms -- Chapter 8: Supporting cache-coherent collective communications -- 8.1 Introduction -- 8.2 Message combination framework -- 8.2.1 MCT format -- 8.2.2 Message combination example -- 8.2.3 Insufficient MCT entries -- 8.3 BAM routing.
,
8.4 Router pipeline and microarchitecture -- 8.5 Evaluation -- 8.5.1 Performance -- 8.5.1.1 Overall network performance -- 8.5.1.2 Multicast transaction performance -- 8.5.1.3 Real application performance -- 8.5.2 Comparing multicast VN configurations -- 8.5.2.1 Unicast performance -- 8.5.2.2 Multicast performance -- 8.5.3 MCT size -- 8.5.4 Sensitivity to network design -- 8.5.4.1 VC count -- 8.5.4.2 Multicast ratio -- 8.5.4.3 Destinations per multicast -- 8.5.4.4 Network size -- 8.6 Power analysis -- 8.7 Related work -- 8.7.1 Message combination -- 8.7.2 NoC multicast routing -- 8.8 Chapter summary -- References -- Chapter 9: Network-on-chip customizations for message passing interface primitives -- 9.1 Introduction -- 9.2 Background -- 9.3 Motivation -- 9.3.1 MPI adaption in NoC designs -- 9.3.2 Optimizations of MPI functions -- 9.4 Communication customization architectures -- 9.4.1 Architecture overview -- 9.4.2 The customized NoC design: VBON -- 9.4.3 The MPI primitive implementation: MU -- 9.4.3.1 The architecture of the MU -- 9.4.3.2 MPI processing unit -- 9.4.3.3 The collective operation implementation -- 9.4.3.4 Communication protocols -- 9.5 Evaluation -- 9.5.1 Methodology -- 9.5.2 Experimental results -- 9.5.2.1 The effect of point-to-point communication: Bandwidth -- 9.5.2.2 The effect of collective communication: Broadcast operations -- 9.5.2.3 The effect of collective communication: Barrier operations -- 9.5.2.4 The effect of collective communication: Reduce operation -- 9.5.2.5 The effect of application communication: Performance -- 9.5.2.6 The effect of application communication: Power and scalability -- 9.5.2.7 Implementation overheads -- 9.6 Chapter summary -- References -- Chapter 10: Message passing interface communication protocol optimizations -- 10.1 Introduction -- 10.2 Background -- 10.2.1 Communication protocols in MPI.
,
10.2.2 Existing problems.
,
English
Other editions:
ISBN 9781322477152
ISBN 1322477159
Language:
English