  • 1
    In: Computer, Institute of Electrical and Electronics Engineers (IEEE), Vol. 43, No. 4 (2010-04), p. 35-43
    Type of Medium: Online Resource
    ISSN: 0018-9162
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2010
    ZDB-ID: 121237-0
    ZDB-ID: 2004656-X
  • 2
    In: ACM SIGOPS Operating Systems Review, Association for Computing Machinery (ACM), Vol. 45, No. 1 (2011-02-18), p. 34-44
    Abstract: This paper presents the architecture and motivation for a cluster-based, many-core computing architecture for energy-efficient, data-intensive computing. FAWN, a Fast Array of Wimpy Nodes, consists of a large number of slower but efficient nodes coupled with low-power storage. We present the computing trends that motivate a FAWN-like approach, for CPU, memory, and storage. We follow with a set of microbenchmarks to explore under what workloads these FAWN nodes perform well (or perform poorly), and briefly examine scenarios in which both code and algorithms may need to be re-designed or optimized to perform well on an efficient platform. We conclude with an outline of the longer-term implications of FAWN that lead us to select a tightly integrated stacked chip-and-memory architecture for future FAWN development.
    Type of Medium: Online Resource
    ISSN: 0163-5980
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2011
    ZDB-ID: 2082220-0
    ZDB-ID: 243805-7
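    The premise of this entry is that many slow but efficient nodes can deliver more work per joule than a few fast ones. A minimal back-of-the-envelope sketch in Python, with purely illustrative numbers (not taken from the paper):

        # Compare work per joule for a hypothetical brawny server and a
        # hypothetical FAWN-style wimpy node (all numbers made up).
        def queries_per_joule(qps, watts):
            return qps / watts

        brawny = queries_per_joule(qps=52_000, watts=250)
        wimpy = queries_per_joule(qps=6_000, watts=15)
        print(f"brawny: {brawny:.0f} q/J, wimpy: {wimpy:.0f} q/J")
        # Matching the brawny node's throughput takes ceil(52000/6000) = 9
        # wimpy nodes at 9 * 15 W = 135 W, roughly half the brawny's 250 W.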
  • 3
    In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 43, No. 3S (2016-01-04), p. 79-91
    Abstract: Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache line) granularity---e.g., fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page size results in an unacceptable increase in page table overhead and TLB pressure. We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques. In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page. Cache lines that are present in the overlay are accessed from there and all other cache lines are accessed from the regular physical page. Our page-overlay framework enables cache-line-granularity memory management without significantly altering the existing virtual memory framework or introducing high overheads. We show that our framework can enable simple and efficient implementations of seven memory management techniques, each of which has a wide variety of applications. We quantitatively evaluate the potential benefits of two of these techniques: overlay-on-write and sparse-data-structure computation. Our evaluations show that overlay-on-write, when applied to fork, can improve performance by 15% and reduce memory capacity requirements by 53% on average compared to traditional copy-on-write. For sparse data computation, our framework can outperform a state-of-the-art software-based sparse representation on a number of real-world sparse matrices. Our framework is general, powerful, and effective in enabling fine-grained memory management at low cost.
    Type of Medium: Online Resource
    ISSN: 0163-5964
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2016
    ZDB-ID: 2088489-8
    ZDB-ID: 186012-4
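    A minimal sketch of the page-overlay lookup described in this entry, under assumed, hypothetical names (this is not the paper's interface): each virtual page may have an overlay holding a subset of its cache lines; lines present in the overlay are served from it, everything else from the regular physical page.

        LINES_PER_PAGE = 64  # 4 KB page / 64 B cache lines

        class OverlayedPage:
            def __init__(self, physical_lines):
                self.physical = physical_lines   # list of 64 cache lines
                self.overlay = {}                # line index -> line data

            def read_line(self, idx):
                # The overlay takes precedence; fall back to the physical page.
                return self.overlay.get(idx, self.physical[idx])

            def write_line(self, idx, data, page_is_shared=False):
                if page_is_shared:
                    # Overlay-on-write: copy only the written cache line into
                    # the overlay instead of duplicating the whole 4 KB page,
                    # the fine-grained analogue of copy-on-write.
                    self.overlay[idx] = data
                else:
                    self.physical[idx] = data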
  • 4
    In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 42, No. 3 (2014-10-16), p. 157-168
    Abstract: On-chip caches maintain multiple pieces of metadata about each cached block---e.g., dirty bit, coherence information, ECC. Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to implement and scalable, it necessitates a full tag store lookup for any metadata query---resulting in high latency and energy consumption. We find that this approach is inefficient and inhibits several cache optimizations. In this work, we propose a new way of organizing the dirty bit information that enables simpler and more efficient implementations of several optimizations. In our proposed approach, we remove the dirty bits from the tag store and organize them differently in a separate structure, which we call the Dirty-Block Index (DBI). The organization of DBI is simple: it consists of multiple entries, each corresponding to some row in DRAM. A bit vector in each entry tracks whether or not each block in the corresponding DRAM row is dirty. We demonstrate the benefits of DBI by using it to simultaneously and efficiently implement three optimizations proposed by prior work: 1) Aggressive DRAM-aware writeback, 2) Bypassing cache lookups, and 3) Heterogeneous ECC for clean/dirty blocks. DBI, with all three optimizations enabled, improves performance by 31% compared to the baseline (by 6% compared to the best previous mechanism) while reducing overall cache area cost by 8% compared to prior approaches.
    Type of Medium: Online Resource
    ISSN: 0163-5964
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2014
    ZDB-ID: 2088489-8
    ZDB-ID: 186012-4
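    A minimal sketch of the Dirty-Block Index organization described in this entry (hypothetical names, illustrative only): dirty bits live outside the tag store, grouped per DRAM row, so all dirty blocks of a row can be found without any tag-store lookups.

        BLOCKS_PER_ROW = 128  # cache blocks per DRAM row (illustrative)

        class DirtyBlockIndex:
            def __init__(self):
                self.rows = {}  # DRAM row id -> bit vector of dirty blocks

            def mark_dirty(self, row, block):
                self.rows[row] = self.rows.get(row, 0) | (1 << block)

            def is_dirty(self, row, block):
                # Answers a metadata query without touching the tag store.
                return bool(self.rows.get(row, 0) & (1 << block))

            def drain_row(self, row):
                # Aggressive DRAM-aware writeback: list every dirty block in
                # the row so they can be written back while the row is open.
                bits = self.rows.pop(row, 0)
                return [b for b in range(BLOCKS_PER_ROW) if bits & (1 << b)]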
  • 5
    In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 49, No. 4 (2014-04-05), p. 655-670
    Abstract: Device drivers are an Achilles' heel of modern commodity operating systems, accounting for far too many system failures. Previous work on driver reliability has focused on protecting the kernel from unsafe driver side-effects by interposing an invariant-checking layer at the driver interface, but otherwise treating the driver as a black box. In this paper, we propose and evaluate Guardrail, a more powerful framework for run-time driver analysis that performs decoupled, instruction-grain dynamic correctness checking on arbitrary kernel-mode drivers as they execute, thereby enabling the system to detect and mitigate more challenging correctness bugs (e.g., data races, uninitialized memory accesses) that cannot be detected by today's fault isolation techniques. Our evaluation of Guardrail shows that it can find serious data races, memory faults, and DMA faults in native Linux drivers that required fixes, including previously unknown bugs. Also, with hardware logging support, Guardrail can be used for online protection of persistent device state from driver bugs with at most 10% overhead on the end-to-end performance of most standard I/O workloads.
    Type of Medium: Online Resource
    ISSN: 0362-1340, 1558-1160
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2014
    ZDB-ID: 2079194-X
    ZDB-ID: 282422-X
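    A minimal sketch of decoupled, instruction-grain checking in the spirit of this entry (hypothetical log format and checker, not Guardrail's actual design): driver execution is captured as a log of memory events, and a separate checker replays the log against shadow state, here flagging reads of never-initialized memory.

        def check_log(log):
            initialized = set()          # shadow state: addresses ever written
            for op, addr in log:         # entries: ("read" | "write", address)
                if op == "write":
                    initialized.add(addr)
                elif op == "read" and addr not in initialized:
                    yield f"uninitialized read at {addr:#x}"

        trace = [("write", 0x1000), ("read", 0x1000), ("read", 0x1008)]
        print(list(check_log(trace)))    # ['uninitialized read at 0x1008']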
  • 6
    In: IEEE Internet Computing, Institute of Electrical and Electronics Engineers (IEEE), Vol. 11, No. 2 (2007-03), p. 16-25
    Type of Medium: Online Resource
    ISSN: 1089-7801
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2007
    ZDB-ID: 2028745-8
  • 7
    In: ACM Transactions on Storage, Association for Computing Machinery (ACM), Vol. 10, No. 4 (2014-10-31), p. 1-27
    Abstract: Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how quickly it can increase or decrease its number of servers. Due to the large amount of data they must migrate during elastic resizing, state-of-the-art designs usually have to make painful trade-offs among performance, elasticity, and agility. This article describes the state of the art in elastic storage and a new system, called SpringFS, that can quickly change its number of active servers while retaining elasticity and performance goals. SpringFS uses a novel technique, termed bounded write offloading, that restricts the set of servers to which writes to overloaded servers are redirected. This technique, combined with the read offloading and passive migration policies used in SpringFS, minimizes the work needed before deactivation or activation of servers. Analysis of real-world traces from Hadoop deployments at Facebook and various Cloudera customers and experiments with the SpringFS prototype confirm SpringFS’s agility, show that it reduces the amount of data migrated for elastic resizing by up to two orders of magnitude, and show that it cuts the percentage of active servers required by 67--82%, outdoing state-of-the-art designs by 6--120%.
    Type of Medium: Online Resource
    ISSN: 1553-3077, 1553-3093
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2014
    ZDB-ID: 2177816-4
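    A minimal sketch of bounded write offloading as described in this entry (hypothetical routing logic, illustrative only): writes aimed at an overloaded or soon-to-be-deactivated server are redirected, but only into a small, bounded offload set, so few servers accumulate state that must be migrated before the system can shrink.

        OFFLOAD_SET = ["server-0", "server-1"]   # bounded offload targets

        def route_write(primary, overloaded, key):
            if primary in overloaded:
                # Redirect into the bounded set only; never scatter writes
                # across arbitrary servers.
                return OFFLOAD_SET[hash(key) % len(OFFLOAD_SET)]
            return primary

        print(route_write("server-7", {"server-7"}, key="block-42"))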
  • 8
    In: ACM Transactions on Architecture and Code Optimization, Association for Computing Machinery (ACM), Vol. 11, No. 4 (2015-01-09), p. 1-22
    Abstract: Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. In this work, we observe that both accurate and inaccurate prefetches lead to cache pollution, and propose a comprehensive mechanism to mitigate prefetcher-caused cache pollution. First, we observe that over 95% of useful prefetches in a wide variety of applications are not reused after the first demand hit (in secondary caches). Based on this observation, our first mechanism simply demotes a prefetched block to the lowest priority on a demand hit. Second, to address pollution caused by inaccurate prefetches, we propose a self-tuning prefetch accuracy predictor to predict if a prefetch is accurate or inaccurate. Only predicted-accurate prefetches are inserted into the cache with a high priority. Evaluations show that our final mechanism, which combines these two ideas, significantly improves performance compared to both the baseline LRU policy and two state-of-the-art approaches to mitigating prefetcher-caused cache pollution (up to 49%, and 6% on average for 157 two-core multiprogrammed workloads). The performance improvement is consistent across a wide variety of system configurations.
    Type of Medium: Online Resource
    ISSN: 1544-3566, 1544-3973
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2015
    ZDB-ID: 2142607-7
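    A minimal sketch of the two policies described in this entry (hypothetical structures; a real cache implements this in replacement-state bits): a prefetched block is demoted to lowest priority on its first demand hit, since over 95% of useful prefetches are not reused afterwards, and predicted-inaccurate prefetches are inserted at lowest priority to begin with.

        class CacheBlock:
            # priority 0 = evict first, 3 = keep longest (illustrative scale)
            def __init__(self, prefetched, predicted_accurate=True):
                self.prefetched = prefetched
                self.priority = 3 if (not prefetched or predicted_accurate) else 0

            def on_demand_hit(self):
                if self.prefetched:
                    # The first demand hit is usually the block's last use:
                    # demote instead of promoting.
                    self.priority = 0
                    self.prefetched = False
                else:
                    self.priority = 3    # normal promotion for demand blocks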
  • 9
    In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 45, No. 3 (2010-03-05), p. 257-270
    Abstract: Online program monitoring is an effective technique for detecting bugs and security attacks in running applications. Extending these tools to monitor parallel programs is challenging because the tools must account for inter-thread dependences and relaxed memory consistency models. Existing tools assume sequential consistency and often slow down the monitored program by orders of magnitude. In this paper, we present a novel approach that avoids these pitfalls by not relying on strong consistency models or detailed inter-thread dependence tracking. Instead, we only assume that events in the distant past on all threads have become visible; we make no assumptions on (and avoid the overheads of tracking) the relative ordering of more recent events on other threads. To overcome the potential state explosion of considering all the possible orderings among recent events, we adapt two techniques from static dataflow analysis, reaching definitions and reaching expressions, to this new domain of dynamic parallel monitoring. Significant modifications to these techniques are proposed to ensure the correctness and efficiency of our approach. We show how our adapted analysis can be used in two popular memory and security tools. We prove that our approach does not miss errors, and sacrifices precision only due to the lack of a relative ordering among recent events. Moreover, our simulation study on a collection of SPLASH-2 and PARSEC 2.0 benchmarks running a memory-checking tool on a hardware-assisted logging platform demonstrates the potential benefits in trading off a very low false positive rate for (i) reduced overhead and (ii) the ability to run on relaxed consistency models.
    Type of Medium: Online Resource
    ISSN: 0362-1340, 1558-1160
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2010
    ZDB-ID: 2079194-X
    ZDB-ID: 282422-X
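    A minimal sketch of the windowed idea described in this entry (hypothetical shadow-state shapes, not the paper's algorithm): writes older than the last visibility boundary are assumed ordered; every recent write to an address is a possible source for a read of it, akin to reaching definitions; and the checker reports a problem if any possible source is bad, so errors are never missed, at the cost of occasional false positives.

        def check_read(addr, committed_taint, recent_writes):
            # committed_taint: addr -> taint of the last pre-window write
            # recent_writes: [(addr, taint)] with unknown relative ordering
            possible = [t for a, t in recent_writes if a == addr]
            possible.append(committed_taint.get(addr, False))
            return any(possible)         # True -> report a potential error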
  • 10
    In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 45, No. 3 (2010-03-05), p. 271-284
    Abstract: Instruction-grain lifeguards monitor the events of a running application at the level of individual instructions in order to identify and help mitigate application bugs and security exploits. Because such lifeguards impose a 10-100X slowdown on existing platforms, previous studies have proposed hardware designs to accelerate lifeguard processing. However, these accelerators are either tailored to a specific class of lifeguards or suitable only for monitoring single-threaded programs. We present ParaLog, the first design of a system enabling fast online parallel monitoring of multithreaded parallel applications. ParaLog supports a broad class of software-defined lifeguards. We show how three existing accelerators can be enhanced to support online multithreaded monitoring, dramatically reducing lifeguard overheads. We identify and solve several challenges in monitoring parallel applications and/or parallelizing these accelerators, including (i) enforcing inter-thread data dependences, (ii) dealing with inter-thread effects that are not reflected in coherence traffic, (iii) dealing with unmonitored operating system activity, and (iv) ensuring lifeguards can access shared metadata with negligible synchronization overheads. We present our system design for both Sequentially Consistent and Total Store Ordering processors. We implement and evaluate our design on a 16-core simulated CMP, using benchmarks from SPLASH-2 and PARSEC and two lifeguards: a data-flow tracking lifeguard and a memory-access checker lifeguard. Our results show that (i) our parallel accelerators improve performance by 2-9X and 1.13-3.4X for our two lifeguards, respectively, (ii) we are 5-126X faster than the time-slicing approach required by existing techniques, and (iii) our average overheads for applications with eight threads are 51% and 28% for the two lifeguards, respectively.
    Type of Medium: Online Resource
    ISSN: 0362-1340, 1558-1160
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2010
    ZDB-ID: 2079194-X
    ZDB-ID: 282422-X
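    A minimal sketch of challenge (i) from this entry, enforcing inter-thread data dependences, under assumed mechanics (hypothetical versioning; ParaLog uses hardware-assisted logs rather than locks): each logged read of shared data records the data version it observed, and the lifeguard thread consuming that log waits until the shadow metadata for the address has caught up before running its check.

        import threading

        shadow_version = {}              # address -> metadata version
        cv = threading.Condition()

        def lifeguard_apply_write(addr):
            with cv:
                shadow_version[addr] = shadow_version.get(addr, 0) + 1
                cv.notify_all()

        def lifeguard_apply_read(addr, observed_version):
            with cv:
                # The producer's metadata update must be visible before this
                # read's check runs, mirroring the application's dependence.
                cv.wait_for(lambda: shadow_version.get(addr, 0) >= observed_version)
            # ... run the lifeguard's check for this read here ...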