Embodiments that dynamically conserve power in non-uniform cache access (NUCA) caches are contemplated. Various embodiments comprise a computing device having one or more processors coupled with one or more NUCA cache elements. The NUCA cache elements may comprise one or more banks of cache memory, wherein ways of the cache are vertically distributed across multiple banks. To conserve power, the computing devices generally turn off groups of banks in a sequential manner according to different power states, based on the access latencies of the banks. The computing devices may first turn off the groups having the greatest access latencies. The computing devices may conserve additional power by turning off more groups of banks according to different power states, continuing to turn off groups with larger access latencies before turning off groups with smaller access latencies.
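The latency-ordered shutdown described above can be sketched as follows. This is a minimal illustrative model, not the patented implementation; the function name, the `(group_id, access_latency)` representation, and the "power state = number of groups off" encoding are all assumptions made for the example.

```python
def banks_to_disable(bank_groups, power_state):
    """bank_groups: list of (group_id, access_latency) tuples.
    power_state: how many groups to turn off (0 = all banks on).
    Returns the ids of the groups to power down, choosing the
    highest-latency groups first, as the abstract describes."""
    by_latency = sorted(bank_groups, key=lambda g: g[1], reverse=True)
    return [gid for gid, _ in by_latency[:power_state]]

# Example: three bank groups at increasing distance from the processor.
groups = [("near", 4), ("mid", 8), ("far", 12)]
banks_to_disable(groups, 1)  # ["far"]
banks_to_disable(groups, 2)  # ["far", "mid"]
```

Deeper power states simply extend the same ordering, so the lowest-latency (most useful) banks are always the last to be switched off.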
Data Reorganization In Non-Uniform Cache Access Caches
Ganesh Balakrishnan - Apex NC, US Gordon B. Bell - Cary NC, US Anil Krishna - Cary NC, US Srinivasan Ramani - Cary NC, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 15/163
US Classification:
711119, 711118, 711129, 711157
Abstract:
Embodiments that dynamically reorganize data of cache lines in non-uniform cache access (NUCA) caches are contemplated. Various embodiments comprise a computing device, having one or more processors coupled with one or more NUCA cache elements. The NUCA cache elements may comprise one or more banks of cache memory, wherein ways of the cache are horizontally distributed across multiple banks. To improve access latency of the data by the processors, the computing devices may dynamically propagate cache lines into banks closer to the processors using the cache lines. To accomplish such dynamic reorganization, embodiments may maintain “direction” bits for cache lines. The direction bits may indicate to which processor the data should be moved. Further, embodiments may use the direction bits to make cache line movement decisions.
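The "direction bits" mechanism can be illustrated with a small model. This is a sketch under assumptions: the class and function names are mine, the direction field here simply stores the last requesting processor's id, and migration moves a line one bank per step toward that processor's home bank.

```python
class CacheLine:
    def __init__(self, tag, bank):
        self.tag = tag
        self.bank = bank        # current bank index within the NUCA array
        self.direction = None   # processor the data should move toward

def record_access(line, cpu_id):
    # On an access, remember which processor wants the data.
    line.direction = cpu_id

def migrate(line, cpu_home_bank):
    """Move the line one bank toward the home bank of the processor
    indicated by its direction bits (no-op if no direction recorded)."""
    if line.direction is None:
        return
    target = cpu_home_bank[line.direction]
    if line.bank < target:
        line.bank += 1
    elif line.bank > target:
        line.bank -= 1

line = CacheLine(tag=0x1A, bank=3)
record_access(line, cpu_id=0)
migrate(line, cpu_home_bank={0: 0, 1: 7})
# line.bank is now 2: one step closer to CPU 0's banks
```

Gradual, per-access migration like this keeps hot lines drifting toward the processors that use them without requiring a global reorganization pass.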
Gordon Bernard Bell - Cary NC, US Anil Krishna - Cary NC, US Brian Michael Rogers - Durham NC, US Ken Van Vu - Cary NC, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/08
US Classification:
711136
Abstract:
The illustrative embodiments provide a method, apparatus, and computer program product for managing a number of cache lines in a cache. In one illustrative embodiment, it is determined whether activity on a memory bus in communication with the cache exceeds a threshold activity level. A least important cache line is located in the cache responsive to a determination that the threshold activity level is exceeded, wherein the least important cache line is located using a cache replacement scheme. It is determined whether the least important cache line is clean responsive to the determination that the threshold activity level is exceeded. The least important cache line is selected for replacement in the cache responsive to a determination that the least important cache line is clean. A clean cache line is located within a subset of the number of cache lines, and the clean cache line is selected for replacement, responsive to a determination that the least important cache line is not clean, wherein each cache line in the subset is examined in ascending order of importance according to the cache replacement scheme.
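The victim-selection policy above can be sketched as a short function. This is an illustrative reading of the abstract, not the claimed implementation; the representation of lines as `(line_id, dirty)` tuples in ascending importance order, and the subset size of four, are assumptions for the example.

```python
def choose_victim(lines, bus_activity, threshold):
    """lines: ordered least- to most-important per the replacement scheme,
    each entry a (line_id, dirty) tuple. When the memory bus is busy,
    prefer a clean victim, since evicting a dirty line costs a write-back
    on the already-congested bus."""
    if bus_activity <= threshold:
        return lines[0][0]       # normal policy: evict least important line
    if not lines[0][1]:
        return lines[0][0]       # least important line is clean: evict it
    subset = lines[:4]           # examine a subset in ascending importance
    for line_id, dirty in subset:
        if not dirty:
            return line_id       # first clean line in the subset
    return lines[0][0]           # no clean line found: fall back
```

For example, with a busy bus and lines `[("a", True), ("b", False)]`, the policy skips dirty line `a` and evicts clean line `b` instead.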
Systems And Methods For Selectively Closing Pages In A Memory
Systems, methods, and media for selectively closing pages in a memory in anticipation of a context switch are disclosed. In one embodiment, a table is provided to keep track of open pages for different processes. The table comprises rows corresponding to banks of memory and columns corresponding to cores of a multi-core processing system. When a context switch signal is received, the system unsets a bit in the column corresponding to the core from which the process is to be context-switched out. If no other process is using a page opened by the process, the page is closed.
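The bank-by-core table can be modeled directly. The sketch below is illustrative only; the class and method names are assumptions, and it returns the banks whose pages become closable rather than issuing actual precharge commands.

```python
class OpenPageTable:
    """Model of the described table: one row per memory bank, one
    column (bit) per core. A set bit means that core's process has
    the bank's open page in use."""
    def __init__(self, num_banks, num_cores):
        self.bits = [[0] * num_cores for _ in range(num_banks)]

    def mark_open(self, bank, core):
        self.bits[bank][core] = 1

    def context_switch(self, core):
        """Unset this core's column; return banks whose open pages can
        now be closed because no other process is still using them."""
        closable = []
        for bank, row in enumerate(self.bits):
            if row[core]:
                row[core] = 0
                if not any(row):   # no other core's bit is still set
                    closable.append(bank)
        return closable
```

If bank 1's page is shared by two cores, switching one of them out leaves the page open; a page closes only when its row goes all-zero.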
Ganesh Balakrishnan - Apex NC, US Anil Krishna - Cary NC, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00
US Classification:
711118, 711160
Abstract:
Embodiments that distribute replacement policy bits and operate the bits in cache memories, such as non-uniform cache access (NUCA) caches, are contemplated. An embodiment may comprise a computing device, such as a computer having multiple processors or multiple cores, which has cache memory elements coupled with the multiple processors or cores. The cache memory device may track usage of cache lines by using a number of bits. For example, a controller of the cache memory may manipulate bits as part of a pseudo least recently used (LRU) system. Some of the bits may be in a centralized area of the cache. Other bits of the pseudo LRU system may be distributed across the cache. Distributing the bits across the cache may enable the system to conserve additional power by turning off the distributed bits.
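As one concrete example of the kind of replacement state being split up, here is a minimal tree pseudo-LRU for a 4-way set. The mapping suggested in the comments, with the root bit centralized and the leaf bits distributed with the bank halves they cover (so they can be powered down together), is an assumption for illustration, not the claimed design.

```python
class TreePLRU4:
    """Tree pseudo-LRU over 4 ways: one root bit selects between the
    two halves, and one leaf bit per half selects a way within it.
    The root bit could live in a centralized area while each leaf bit
    sits alongside the banks holding its two ways."""
    def __init__(self):
        self.root = 0        # which half was more recently used
        self.leaf = [0, 0]   # per half: which way was more recently used

    def touch(self, way):
        half, idx = divmod(way, 2)
        self.root = half
        self.leaf[half] = idx

    def victim(self):
        half = 1 - self.root              # less recently used half
        return 2 * half + (1 - self.leaf[half])
```

After touching ways 0 and then 3, the tree points at way 1 as the victim: the less recently used way of the less recently used half.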
Effective Prefetching With Multiple Processors And Threads
Gordon Bernard Bell - Madison WI, US Gordon Taylor Davis - Chapel Hill NC, US Jeffrey Haskell Derby - Chapel Hill NC, US Anil Krishna - Cary NC, US Srinivasan Ramani - Cary NC, US Ken Vu - Cary NC, US Steve Woolet - Raleigh NC, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 13/00
US Classification:
711137, 711121, 711124, 710 22, 712237
Abstract:
A processing system includes a memory and a first core configured to process applications. The first core includes a first cache. The processing system includes a mechanism configured to capture a sequence of addresses of the application that miss the first cache in the first core and to place the sequence of addresses in a storage array; and a second core configured to process at least one software algorithm. The at least one software algorithm utilizes the sequence of addresses from the storage array to generate a sequence of prefetch addresses. The second core issues prefetch requests for the sequence of prefetch addresses to the memory to obtain prefetched data, and the prefetched data is provided to the first core if requested.
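The helper-core idea can be illustrated with a deliberately simple predictor over the shared miss log. The stride-detection algorithm below is an assumption chosen for the example; the abstract leaves the software algorithm on the second core unspecified.

```python
def generate_prefetches(miss_log, depth=2):
    """Return prefetch addresses predicted from the tail of the miss
    log, projecting the last observed stride forward `depth` times.
    The helper core would run logic like this over the storage array
    that the first core's misses are captured into."""
    if len(miss_log) < 2:
        return []
    stride = miss_log[-1] - miss_log[-2]
    if stride == 0:
        return []
    return [miss_log[-1] + stride * i for i in range(1, depth + 1)]

# A streaming access pattern with a 0x40-byte stride:
generate_prefetches([0x100, 0x140, 0x180])  # [0x1C0, 0x200]
```

Because the prediction runs on a separate core, even a more expensive software algorithm adds no latency to the first core's demand accesses.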
A method and a system for utilizing least recently used (LRU) bits and presence bits in selecting cache-lines for eviction from a lower level cache in a processor-memory sub-system. A cache back invalidation (CBI) logic utilizes LRU bits to evict only cache-lines within an LRU group, following a cache miss in the lower level cache. In addition, the CBI logic uses presence bits to (a) indicate whether a cache-line in a lower level cache is also present in a higher level cache and (b) evict only cache-lines in the lower level cache that are not present in a corresponding higher level cache. However, when the lower level cache-line selected for eviction is also present in any higher level cache, the CBI logic invalidates the cache-line in the higher level cache. The CBI logic appropriately updates the values of presence bits and LRU bits, following evictions and invalidations.
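The selection priority described above can be sketched in a few lines. This is an illustrative reading only; the function name and the return convention (victim plus a flag saying whether a back-invalidation is needed) are assumptions made for the example.

```python
def select_victim(lru_group, present_above):
    """lru_group: candidate lower-level cache-lines, least recently
    used first. present_above: maps each line to its presence bit
    (True if a copy exists in a higher level cache). Returns
    (victim, needs_back_invalidate)."""
    for line in lru_group:
        if not present_above[line]:
            return line, False   # no higher-level copy: plain eviction
    # Every candidate is also cached above: evict the LRU line and
    # back-invalidate its higher-level copy.
    return lru_group[0], True
```

Preferring victims whose presence bit is clear avoids invalidating lines the higher level cache is still actively using.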
Anil Krishna - Cary NC, US Brian M. Rogers - Durham NC, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00
US Classification:
711133, 711113
Abstract:
The illustrative embodiments provide a method, a computer program product, and an apparatus for managing a cache. A probability of a future request for data to be stored in a portion of the cache by a thread is identified for each of the number of threads to form a number of probabilities. The data is stored with a rank in a number of ranks in the portion of the cache responsive to receiving the future request from the thread in the number of threads for the data. The rank is selected using the probability in the number of probabilities for the thread.
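One way the probability-to-rank mapping could work is a simple linear scaling, sketched below. This is an assumption for illustration; the abstract does not specify how the probability selects the rank, and the function name and rank convention (rank 0 = most protected) are mine.

```python
def insertion_rank(probability, num_ranks):
    """Map a thread's reuse probability in [0, 1] to an insertion
    rank: rank 0 is the most protected position, num_ranks - 1 the
    first candidate for eviction."""
    rank = int((1.0 - probability) * num_ranks)
    return min(rank, num_ranks - 1)

insertion_rank(0.9, 8)  # 0  (likely reuse: protected position)
insertion_rank(0.1, 8)  # 7  (unlikely reuse: near eviction)
```

Under a scheme like this, data for threads with a high probability of future requests survives longer in the shared portion of the cache, while low-probability data is evicted quickly.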
Apple
Engineer
Qualcomm Oct 1, 2012 - Aug 2017
Staff Engineer
IBM Jul 2007 - Oct 2012
Advisory Engineer and Scientist
IBM Jul 2003 - Jul 2007
Staff Scientist and Engineer
IBM Jul 2001 - Jul 2003
Engineer and Scientist
Education:
North Carolina State University 2005 - 2013
Doctorates, Doctor of Philosophy, Philosophy
North Carolina State University 2005 - 2012
Doctorates, Doctor of Philosophy, Computer Engineering
Purdue University 1999 - 2001
Master of Science, Masters, Computer Engineering
Indian Institute of Technology, Guwahati 1995 - 1999
Skills:
Computer Architecture and Microarchitecture, Embedded SoC Modeling, Performance Modeling and Analysis, CMP Design, Cache Hierarchy Design, Cache Coherency, Memory Management, High Performance Computing, Discrete Event Simulation, CSIM, IP Development, Computer Architecture, Performance Engineering, Embedded Systems, IP, SoC, ASIC, Processors, Microprocessors, Tcl, Debugging, Algorithms, System Architecture, Perl, Signal Processing, Hardware Architecture, Application Specific Integrated Circuits, C, System on a Chip