  • 1
    In: Computer, Institute of Electrical and Electronics Engineers (IEEE), Vol. 43, No. 4 (2010-04), p. 35-43
    Type of Medium: Online Resource
    ISSN: 0018-9162
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2010
    ZDB-ID: 121237-0
    ZDB-ID: 2004656-X
  • 2
    In: ACM SIGOPS Operating Systems Review, Association for Computing Machinery (ACM), Vol. 45, No. 1 (2011-02-18), p. 34-44
    Abstract: This paper presents the architecture and motivation for a cluster-based, many-core computing architecture for energy-efficient, data-intensive computing. FAWN, a Fast Array of Wimpy Nodes, consists of a large number of slower but efficient nodes coupled with low-power storage. We present the computing trends that motivate a FAWN-like approach, for CPU, memory, and storage. We follow with a set of microbenchmarks to explore under what workloads these FAWN nodes perform well (or perform poorly), and briefly examine scenarios in which both code and algorithms may need to be re-designed or optimized to perform well on an efficient platform. We conclude with an outline of the longer-term implications of FAWN that lead us to select a tightly integrated stacked chip-and-memory architecture for future FAWN development.
    Type of Medium: Online Resource
    ISSN: 0163-5980
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2011
    ZDB-ID: 2082220-0
    ZDB-ID: 243805-7
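    The premise of this entry is that many slow but efficient nodes can deliver more work per joule than a few fast ones. A minimal back-of-the-envelope sketch in Python, with purely illustrative numbers (not taken from the paper):

        # Compare work per joule for a hypothetical brawny server and a
        # hypothetical FAWN-style wimpy node (all numbers made up).
        def queries_per_joule(qps, watts):
            return qps / watts

        brawny = queries_per_joule(qps=52_000, watts=250)
        wimpy = queries_per_joule(qps=6_000, watts=15)
        print(f"brawny: {brawny:.0f} q/J, wimpy: {wimpy:.0f} q/J")
        # Matching the brawny node's throughput takes ceil(52000/6000) = 9
        # wimpy nodes at 9 * 15 W = 135 W, roughly half the brawny's 250 W.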
  • 3
    In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 43, No. 3S (2016-01-04), p. 79-91
    Abstract: Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache line) granularity---e.g., fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page size results in an unacceptable increase in page table overhead and TLB pressure. We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques. In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page. Cache lines that are present in the overlay are accessed from there and all other cache lines are accessed from the regular physical page. Our page-overlay framework enables cache-line-granularity memory management without significantly altering the existing virtual memory framework or introducing high overheads. We show that our framework can enable simple and efficient implementations of seven memory management techniques, each of which has a wide variety of applications. We quantitatively evaluate the potential benefits of two of these techniques: overlay-on-write and sparse-data-structure computation. Our evaluations show that overlay-on-write, when applied to fork, can improve performance by 15% and reduce memory capacity requirements by 53% on average compared to traditional copy-on-write. For sparse data computation, our framework can outperform a state-of-the-art software-based sparse representation on a number of real-world sparse matrices. Our framework is general, powerful, and effective in enabling fine-grained memory management at low cost.
    Type of Medium: Online Resource
    ISSN: 0163-5964
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2016
    ZDB-ID: 2088489-8
    ZDB-ID: 186012-4
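    A minimal sketch of the page-overlay lookup described in this entry, under assumed, hypothetical names (this is not the paper's interface): each virtual page may have an overlay holding a subset of its cache lines; lines present in the overlay are served from it, everything else from the regular physical page.

        LINES_PER_PAGE = 64  # 4 KB page / 64 B cache lines

        class OverlayedPage:
            def __init__(self, physical_lines):
                self.physical = physical_lines   # list of 64 cache lines
                self.overlay = {}                # line index -> line data

            def read_line(self, idx):
                # The overlay takes precedence; fall back to the physical page.
                return self.overlay.get(idx, self.physical[idx])

            def write_line(self, idx, data, page_is_shared=False):
                if page_is_shared:
                    # Overlay-on-write: copy only the written cache line into
                    # the overlay instead of duplicating the whole 4 KB page,
                    # the fine-grained analogue of copy-on-write.
                    self.overlay[idx] = data
                else:
                    self.physical[idx] = data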
  • 4
    In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 42, No. 3 (2014-10-16), p. 157-168
    Abstract: On-chip caches maintain multiple pieces of metadata about each cached block---e.g., dirty bit, coherence information, ECC. Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to implement and scalable, it necessitates a full tag store lookup for any metadata query---resulting in high latency and energy consumption. We find that this approach is inefficient and inhibits several cache optimizations. In this work, we propose a new way of organizing the dirty bit information that enables simpler and more efficient implementations of several optimizations. In our proposed approach, we remove the dirty bits from the tag store and organize them differently in a separate structure, which we call the Dirty-Block Index (DBI). The organization of DBI is simple: it consists of multiple entries, each corresponding to some row in DRAM. A bit vector in each entry tracks whether or not each block in the corresponding DRAM row is dirty. We demonstrate the benefits of DBI by using it to simultaneously and efficiently implement three optimizations proposed by prior work: 1) Aggressive DRAM-aware writeback, 2) Bypassing cache lookups, and 3) Heterogeneous ECC for clean/dirty blocks. DBI, with all three optimizations enabled, improves performance by 31% compared to the baseline (by 6% compared to the best previous mechanism) while reducing overall cache area cost by 8% compared to prior approaches.
    Type of Medium: Online Resource
    ISSN: 0163-5964
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2014
    ZDB-ID: 2088489-8
    ZDB-ID: 186012-4
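    A minimal sketch of the Dirty-Block Index organization described in this entry (hypothetical names, illustrative only): dirty bits live outside the tag store, grouped per DRAM row, so all dirty blocks of a row can be found without any tag-store lookups.

        BLOCKS_PER_ROW = 128  # cache blocks per DRAM row (illustrative)

        class DirtyBlockIndex:
            def __init__(self):
                self.rows = {}  # DRAM row id -> bit vector of dirty blocks

            def mark_dirty(self, row, block):
                self.rows[row] = self.rows.get(row, 0) | (1 << block)

            def is_dirty(self, row, block):
                # Answers a metadata query without touching the tag store.
                return bool(self.rows.get(row, 0) & (1 << block))

            def drain_row(self, row):
                # Aggressive DRAM-aware writeback: list every dirty block in
                # the row so they can be written back while the row is open.
                bits = self.rows.pop(row, 0)
                return [b for b in range(BLOCKS_PER_ROW) if bits & (1 << b)]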
  • 5
    In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 49, No. 4 (2014-04-05), p. 655-670
    Abstract: Device drivers are an Achilles' heel of modern commodity operating systems, accounting for far too many system failures. Previous work on driver reliability has focused on protecting the kernel from unsafe driver side-effects by interposing an invariant-checking layer at the driver interface, but otherwise treating the driver as a black box. In this paper, we propose and evaluate Guardrail, a more powerful framework for run-time driver analysis that performs decoupled, instruction-grain dynamic correctness checking on arbitrary kernel-mode drivers as they execute, thereby enabling the system to detect and mitigate more challenging correctness bugs (e.g., data races, uninitialized memory accesses) that cannot be detected by today's fault isolation techniques. Our evaluation of Guardrail shows that it can find serious data races, memory faults, and DMA faults in native Linux drivers that required fixes, including previously unknown bugs. Also, with hardware logging support, Guardrail can be used for online protection of persistent device state from driver bugs with at most 10% overhead on the end-to-end performance of most standard I/O workloads.
    Type of Medium: Online Resource
    ISSN: 0362-1340, 1558-1160
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2014
    ZDB-ID: 2079194-X
    ZDB-ID: 282422-X
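    A minimal sketch of decoupled, instruction-grain checking in the spirit of this entry (hypothetical log format and checker, not Guardrail's actual design): driver execution is captured as a log of memory events, and a separate checker replays the log against shadow state, here flagging reads of never-initialized memory.

        def check_log(log):
            initialized = set()          # shadow state: addresses ever written
            for op, addr in log:         # entries: ("read" | "write", address)
                if op == "write":
                    initialized.add(addr)
                elif op == "read" and addr not in initialized:
                    yield f"uninitialized read at {addr:#x}"

        trace = [("write", 0x1000), ("read", 0x1000), ("read", 0x1008)]
        print(list(check_log(trace)))    # ['uninitialized read at 0x1008']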
  • 6
    In: IEEE Internet Computing, Institute of Electrical and Electronics Engineers (IEEE), Vol. 11, No. 2 (2007-03), p. 16-25
    Type of Medium: Online Resource
    ISSN: 1089-7801
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2007
    ZDB-ID: 2028745-8
  • 7
    In: ACM Transactions on Storage, Association for Computing Machinery (ACM), Vol. 10, No. 4 (2014-10-31), p. 1-27
    Abstract: Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how quickly it can increase or decrease its number of servers. Due to the large amount of data they must migrate during elastic resizing, state-of-the-art designs usually have to make painful trade-offs among performance, elasticity, and agility. This article describes the state of the art in elastic storage and a new system, called SpringFS, that can quickly change its number of active servers while retaining elasticity and performance goals. SpringFS uses a novel technique, termed bounded write offloading, that restricts the set of servers to which writes to overloaded servers are redirected. This technique, combined with the read offloading and passive migration policies used in SpringFS, minimizes the work needed before deactivation or activation of servers. Analysis of real-world traces from Hadoop deployments at Facebook and various Cloudera customers and experiments with the SpringFS prototype confirm SpringFS’s agility, show that it reduces the amount of data migrated for elastic resizing by up to two orders of magnitude, and show that it cuts the percentage of active servers required by 67--82%, outdoing state-of-the-art designs by 6--120%.
    Type of Medium: Online Resource
    ISSN: 1553-3077, 1553-3093
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2014
    ZDB-ID: 2177816-4
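    A minimal sketch of bounded write offloading as described in this entry (hypothetical routing logic, illustrative only): writes aimed at an overloaded or soon-to-be-deactivated server are redirected, but only into a small, bounded offload set, so few servers accumulate state that must be migrated before the system can shrink.

        OFFLOAD_SET = ["server-0", "server-1"]   # bounded offload targets

        def route_write(primary, overloaded, key):
            if primary in overloaded:
                # Redirect into the bounded set only; never scatter writes
                # across arbitrary servers.
                return OFFLOAD_SET[hash(key) % len(OFFLOAD_SET)]
            return primary

        print(route_write("server-7", {"server-7"}, key="block-42"))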
  • 8
    In: ACM Transactions on Architecture and Code Optimization, Association for Computing Machinery (ACM), Vol. 11, No. 4 (2015-01-09), p. 1-22
    Abstract: Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. In this work, we observe that both accurate and inaccurate prefetches lead to cache pollution, and propose a comprehensive mechanism to mitigate prefetcher-caused cache pollution. First, we observe that over 95% of useful prefetches in a wide variety of applications are not reused after the first demand hit (in secondary caches). Based on this observation, our first mechanism simply demotes a prefetched block to the lowest priority on a demand hit. Second, to address pollution caused by inaccurate prefetches, we propose a self-tuning prefetch accuracy predictor to predict if a prefetch is accurate or inaccurate. Only predicted-accurate prefetches are inserted into the cache with a high priority. Evaluations show that our final mechanism, which combines these two ideas, significantly improves performance compared to both the baseline LRU policy and two state-of-the-art approaches to mitigating prefetcher-caused cache pollution (up to 49%, and 6% on average for 157 two-core multiprogrammed workloads). The performance improvement is consistent across a wide variety of system configurations.
    Type of Medium: Online Resource
    ISSN: 1544-3566, 1544-3973
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2015
    ZDB-ID: 2142607-7
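    A minimal sketch of the two policies described in this entry (hypothetical structures; a real cache implements this in replacement-state bits): a prefetched block is demoted to lowest priority on its first demand hit, since over 95% of useful prefetches are not reused afterwards, and predicted-inaccurate prefetches are inserted at lowest priority to begin with.

        class CacheBlock:
            # priority 0 = evict first, 3 = keep longest (illustrative scale)
            def __init__(self, prefetched, predicted_accurate=True):
                self.prefetched = prefetched
                self.priority = 3 if (not prefetched or predicted_accurate) else 0

            def on_demand_hit(self):
                if self.prefetched:
                    # The first demand hit is usually the block's last use:
                    # demote instead of promoting.
                    self.priority = 0
                    self.prefetched = False
                else:
                    self.priority = 3    # normal promotion for demand blocks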
  • 9
    In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 45, No. 3 (2010-03-05), p. 257-270
    Abstract: Online program monitoring is an effective technique for detecting bugs and security attacks in running applications. Extending these tools to monitor parallel programs is challenging because the tools must account for inter-thread dependences and relaxed memory consistency models. Existing tools assume sequential consistency and often slow down the monitored program by orders of magnitude. In this paper, we present a novel approach that avoids these pitfalls by not relying on strong consistency models or detailed inter-thread dependence tracking. Instead, we only assume that events in the distant past on all threads have become visible; we make no assumptions on (and avoid the overheads of tracking) the relative ordering of more recent events on other threads. To overcome the potential state explosion of considering all the possible orderings among recent events, we adapt two techniques from static dataflow analysis, reaching definitions and reaching expressions, to this new domain of dynamic parallel monitoring. Significant modifications to these techniques are proposed to ensure the correctness and efficiency of our approach. We show how our adapted analysis can be used in two popular memory and security tools. We prove that our approach does not miss errors, and sacrifices precision only due to the lack of a relative ordering among recent events. Moreover, our simulation study on a collection of SPLASH-2 and PARSEC 2.0 benchmarks running a memory-checking tool on a hardware-assisted logging platform demonstrates the potential benefits in trading off a very low false positive rate for (i) reduced overhead and (ii) the ability to run on relaxed consistency models.
    Type of Medium: Online Resource
    ISSN: 0362-1340, 1558-1160
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2010
    ZDB-ID: 2079194-X
    ZDB-ID: 282422-X
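    A minimal sketch of the windowed idea described in this entry (hypothetical shadow-state shapes, not the paper's algorithm): writes older than the last visibility boundary are assumed ordered; every recent write to an address is a possible source for a read of it, akin to reaching definitions; and the checker reports a problem if any possible source is bad, so errors are never missed, at the cost of occasional false positives.

        def check_read(addr, committed_taint, recent_writes):
            # committed_taint: addr -> taint of the last pre-window write
            # recent_writes: [(addr, taint)] with unknown relative ordering
            possible = [t for a, t in recent_writes if a == addr]
            possible.append(committed_taint.get(addr, False))
            return any(possible)         # True -> report a potential error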
  • 10
    In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 45, No. 3 (2010-03-05), p. 271-284
    Abstract: Instruction-grain lifeguards monitor the events of a running application at the level of individual instructions in order to identify and help mitigate application bugs and security exploits. Because such lifeguards impose a 10-100X slowdown on existing platforms, previous studies have proposed hardware designs to accelerate lifeguard processing. However, these accelerators are either tailored to a specific class of lifeguards or suitable only for monitoring single-threaded programs. We present ParaLog, the first design of a system enabling fast online parallel monitoring of multithreaded parallel applications. ParaLog supports a broad class of software-defined lifeguards. We show how three existing accelerators can be enhanced to support online multithreaded monitoring, dramatically reducing lifeguard overheads. We identify and solve several challenges in monitoring parallel applications and/or parallelizing these accelerators, including (i) enforcing inter-thread data dependences, (ii) dealing with inter-thread effects that are not reflected in coherence traffic, (iii) dealing with unmonitored operating system activity, and (iv) ensuring lifeguards can access shared metadata with negligible synchronization overheads. We present our system design for both Sequentially Consistent and Total Store Ordering processors. We implement and evaluate our design on a 16-core simulated CMP, using benchmarks from SPLASH-2 and PARSEC and two lifeguards: a data-flow tracking lifeguard and a memory-access checker lifeguard. Our results show that (i) our parallel accelerators improve performance by 2-9X and 1.13-3.4X for our two lifeguards, respectively, (ii) we are 5-126X faster than the time-slicing approach required by existing techniques, and (iii) our average overheads for applications with eight threads are 51% and 28% for the two lifeguards, respectively.
    Type of Medium: Online Resource
    ISSN: 0362-1340, 1558-1160
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2010
    ZDB-ID: 2079194-X
    ZDB-ID: 282422-X
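    A minimal sketch of challenge (i) from this entry, enforcing inter-thread data dependences, under assumed mechanics (hypothetical versioning; ParaLog uses hardware-assisted logs rather than locks): each logged read of shared data records the data version it observed, and the lifeguard thread consuming that log waits until the shadow metadata for the address has caught up before running its check.

        import threading

        shadow_version = {}              # address -> metadata version
        cv = threading.Condition()

        def lifeguard_apply_write(addr):
            with cv:
                shadow_version[addr] = shadow_version.get(addr, 0) + 1
                cv.notify_all()

        def lifeguard_apply_read(addr, observed_version):
            with cv:
                # The producer's metadata update must be visible before this
                # read's check runs, mirroring the application's dependence.
                cv.wait_for(lambda: shadow_version.get(addr, 0) >= observed_version)
            # ... run the lifeguard's check for this read here ...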