Format:
1 online resource (265 pages)
ISBN:
9783642416354
Series Statement:
Communications in Computer and Information Science Ser. v.396
Content:
Preface -- Organizing Committee -- Table of Contents -- Session 1: Application Specific Processors -- Design and Implementation of a Novel Entirely Covered K2 CORDIC -- 1 Introduction -- 2 Principle of k2 CORDIC Algorithm -- 2.1 Conventional CORDIC -- 2.2 k2 CORDIC Algorithm -- 3 Architecture of k2 CORDIC Algorithm -- 4 Performance Evaluation and Comparison -- 4.1 Error Analysis -- 4.2 Area Comparison -- 4.3 Speed Comparison -- 5 Conclusion -- References -- The Analysis of Generic SIMT Scheduling Model Extracted from GPU -- 1 Introduction -- 2 SIMT Scheduling Model of GPU -- 3 Analysis of the SIMT Scheduling Model Attribute -- 3.1 Influencing Factors of SIMT Scheduling Performance -- 3.2 Benchmarks -- 3.3 Analysis of Model Attribute Results -- 4 Conclusion -- References -- A Unified Cryptographic Processor for RSA and ECC in RNS -- 1 Introduction -- 2 RNS Montgomery Multiplication and Base Selection -- 2.1 Residue Number System -- 2.2 RNS Montgomery Multiplication and Data Level Parallelism Analysis -- 2.3 Base Selection and Efficient Arithmetic Implementation -- 3 Proposed Cryptographic Processor for RSA and ECC over GF(p) -- 3.1 Transport Triggered Architecture -- 3.2 The Architecture Overview of Proposed Cryptographic Processor -- 4 Coarse-Grained Reconfigurable MMAC Array -- 4.1 Coarse-Grained Reconfigurable Datapath -- 4.2 Versatile MMAC Unit -- 5 Performance Evaluation and Implementation Results -- 5.1 Performance Evaluation -- 5.2 Comparison to Related Works and Implementation Results -- 6 Conclusion -- References -- Real-Time Implementation of 4x4 MIMO-OFDM System for 3GPP-LTE Based on a Programmable Processor -- 1 Introduction -- 2 Radio System Structure -- 3 Algorithms Analysis -- 3.1 Low Pass Filtering -- 3.2 Symbol Synchronization -- 3.3 OFDM (De)modulation -- 3.4 MIMO Channel Estimation -- 3.5 MIMO Detection -- 3.6 Algorithms Summary
Content:
4 Architecture of SDR Processor -- 4.1 Matrix Architecture -- 4.2 System Mapping Scheme -- 5 Opportunities and Challenges -- 5.1 Fully Programmable Architecture -- 5.2 Challenges -- 6 Conclusions -- References -- A Market Data Feeds Processing Accelerator Based on FPGA -- 1 Introduction -- 2 Design and Implements -- 2.1 Overview -- 2.2 Original Data Generator -- 2.3 Encoder Core Module -- 2.4 Decoder Core Module -- 2.5 Latency Monitor -- 2.6 Others -- 3 Experiment Results -- 3.1 Experiment Environment -- 3.2 Experiment Results -- 3.3 Results Comparison -- 4 Conclusion -- References -- The Design of Video Accelerator Bus Wrapper -- 1 Introduction -- 2 Background -- 3 Accelerator Bus Wrapper Structure -- 3.1 Wrapper Architecture -- 3.2 The Structure of Data Stored in FIFO -- 3.3 FSM Module Design -- 4 Performance Analyzing -- 4.1 Evaluation Metric and Platform -- 4.2 Evaluation Result -- 4.3 Result Analyzing -- 4.4 Synthesis Result -- 5 Conclusion -- References -- Design and Implementation of Novel Flexible Crypto Coprocessor and Its Application in Security Protocol -- 1 Introduction -- 2 Relative Work -- 3 Implementation of the Coprocessor -- 3.1 Architecture -- 3.2 Implementations of RCB -- 4 Experimental Results -- 4.1 Performance of RCB -- 4.2 Coprocessor Application in SSL Protocol -- 5 Conclusion and Future Work -- References -- Session 2: Communication Architecture -- Wormhole Bubble in Torus Networks -- 1 Introduction -- 2 Related Works -- 3 Bubble Scheme for Wormhole -- 4 Evaluation -- 4.1 Performance with Less Than Two Packet-Sized Buffers -- 4.2 Performance with Two Packet-Sized Buffers -- 5 Conclusions -- References -- Self-adaptive Scheme to Adjust Redundancy for Network Coding with TCP -- 1 Introduction -- 2 TCP/NC Protocol -- 3 Self-adaptive TCP/NC Protocol -- 3.1 Self-adaptive Redundancy Factor -- 3.2 Self-adaptation Algorithm for R
Content:
4 Simulation Results -- 4.1 Simulation Environment Setup -- 4.2 Simulation Results -- 5 Conclusions and Future Works -- References -- Research on Shifter Based on iButterfly Network -- 1 Introduction -- 2 The Design of Shifter Architecture -- 2.1 Analysis of the Shifter Based on iButterfly Network -- 2.2 Shifter Architecture Based on iButterfly Network -- 3 Design of Key Module -- 3.1 Extract of Routing Algorithm and Map of Hardware -- 3.2 Post-processing Circuit and Hardware Implementation -- 4 Performance Evaluation -- 5 Summary and Outlook -- References -- A Highly-Efficient Approach to Adaptive Load Balance for Scalable TBGP -- 1 Introduction -- 2 TBGP Architecture -- 3 ARLP Algorithm -- 4 Performance Evaluation -- 4.1 Load Balance Ratio -- 4.2 Performance for Route Update -- 5 Conclusion -- References -- Session 3: Computer Application and Software Optimization -- OpenACC to Intel Offload: Automatic Translation and Optimization -- 1 Introduction -- 2 Overview of OpenACC and the MIC Coprocessor -- 2.1 OpenACC -- 2.2 Intel MIC -- 3 Related Work -- 4 Automatic Translation of OpenACC to Offload -- 4.1 Mapping OpenACC Directives into Offload Directives -- 4.2 OpenACC to Offload Baseline Translation -- 5 Optimization -- 5.1 Communication Optimization -- 5.2 SIMD Optimization -- 6 Experiments -- 6.1 Experiments Environment -- 6.2 Experiment Case and Result -- 7 Conclusion -- References -- Applying Variable Neighborhood Search Algorithm to Multicore Task Scheduling Problem -- 1 Introduction -- 2 The Variable Neighborhood Search Algorithm -- 3 The Multicore Task Scheduling Problem -- 3.1 The Task Graph Model -- 3.2 The Multicore Platform Model -- 4 Applying VNSA to Multicore Task Scheduling Problem -- 4.1 Formalization of the Solution -- 4.2 Transformation of the Solution -- 4.3 Generating the Initial Solution
Content:
4.4 Generating the Neighborhood and the Neighborhood Set -- 4.5 Local Search Strategy and Termination Conditions -- 5 Experiments and Results Analysis -- 6 Conclusion -- References -- Empirical Analysis of Human Behavior Patterns in BBS -- 1 Introduction -- 2 Data Set Description -- 3 Empirical Analysis of Actual Data -- 3.1 Distribution of the Click Number and Reply Number of Posts -- 3.2 Distribution of the Post Number and Reply Number of Users -- 3.3 Distribution of the One-Day One-User Reply Number on Population Level -- 3.4 Distribution of the Abnormal One-Day Reply Behaviors -- 4 Discussion and Conclusions -- References -- Performance Evaluation and Scalability Analysis of NPB-MZ on Intel Xeon Phi Coprocessor -- 1 Introduction -- 2 Intel MIC Architecture and Execution Modes -- 2.1 Intel MIC Architecture -- 2.2 Execution Modes for Intel Xeon Phi -- 3 Experiment Results and Analysis -- 3.1 Experiment Setup -- 3.2 Experimental Results and Performance Analysis -- 4 Conclusions and Future Work -- References -- An Effective Framework of Program Optimization for High Performance Computing -- 1 Introduction -- 2 Formal Description -- 3 Polyhedral Model -- 3.1 Iteration Domain -- 3.2 Array Access Functions -- 3.3 Affine Scheduling -- 4 Genetic Algorithm Based Empirical Search -- 5 Performance Evaluation -- 5.1 Environmental Setup -- 5.2 Experimental Results -- 6 Related Work and Conclusions -- References -- Session 4: IC Design and Test -- A Constant Loop Bandwidth Fraction-N Frequency Synthesizer for GNSS Receivers -- 1 Introduction -- 2 Design Considerations -- 3 Circuits Implementations -- 3.1 Wideband VCO -- 3.2 Charge Pump -- 3.3 AFC -- 4 Implementation Results -- 5 Conclusion -- Reference -- Investigation of Reproducibility and Repeatability Issue on EFT Test at IC Level to Microcontrollers -- 1 Introduction -- 2 EFT Test Method at IC Level
Content:
3 Experiment and the Results -- 4 Discussion and Analysis -- 4.1 Poor Repeatability of B, C, D Type Failure on Each Probe -- 4.2 Bad Reproducibility of E Type Failure Level of the Two Probes -- 5 Conclusion -- References -- A Scan Chain Based SEU Test Method for Microprocessors -- 1 Introduction -- 2 Scan Chain Based Method -- 3 Experimental Setup and Procedure -- 4 Results and Discussion -- 5 Conclusion -- References -- Session 5: Processor Architecture -- Achieving Predictable Performance in SMT Processors by Instruction Fetch Policy -- 1 Introduction -- 2 Cazorla Policy -- 3 Achieving Predictable Performance by Instruction Fetch Policy -- 3.1 Basic Idea -- 3.2 Implementation -- 4 Methodology -- 4.1 Simulator -- 4.2 Benchmarks -- 4.3 Metrics -- 4.4 Choosing Parameter -- 5 Results -- 5.1 Efficiency in Achieving Predictable Performance -- 5.2 The Performance of LPTs and Overall Throughput Results -- 5.3 Compared with Cazorla Policy -- 6 Conclusions -- References -- Reconfigurable Many-Core Processor with Cache Coherence -- 1 Introduction -- 2 Motivation and Background -- 2.1 Phase in Parallel Programs -- 2.2 Reconfiguration in Many-Core Processors -- 3 Reconfigurable Design for Many-Core -- 3.1 Overview -- 3.2 Reconfigurable Subnet Design -- 3.3 Reconfigurable Cache Coherence Protocol Design -- 4 Simulation -- 4.1 Simulation Platform -- 4.2 Simulation Results -- 5 Conclusion -- References -- Backhaul-Route Pre-Configuration Mechanism for Delay Optimization in NoCs -- 1 Introduction -- 2 Related Works -- 3 Backhaul-Route Pre-Configuration Mechanism -- 3.1 General Router Architecture -- 3.2 Backhaul-Route Pre-Configuration -- 3.3 Backhaul-Route Reuse -- 3.4 Backhaul-Route Termination -- 3.5 Routing Transform Mechanism -- 4 Experiment and Performance Evaluation -- 5 Conclusion -- References
Content:
A Novel CGRA Architecture and Mapping Algorithm for Application Acceleration
Additional Edition:
9783642416347
Additional Edition:
Also published as print edition: Xu, Weixia. Computer Engineering and Technology : 17th CCF Conference, NCCET 2013, Xining, China, July 20-22, 2013. Revised Selected Papers. Berlin/Heidelberg : Springer Berlin Heidelberg, c2013. 9783642416347
Language:
English
URL:
Full text
(license required)