Jason Frederick Cantin - Madison WI, US Michael Ju Hyeok Lee - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
H03F003/45 H03K003/356
US Classification:
327 57, 327208, 327217, 327218, 326 94, 326 95
Abstract:
A latch circuit capable of ensuring race-free staging for signals in dynamic logic circuits is disclosed. The latch circuit includes four separate logic gates. The first inputs of the first and second logic gates are connected to a first and second precharged internal nodes of the dynamic logic circuit, respectively. The second inputs of the first and second gates are connected to a first and second differential outputs of the dynamic logic circuit, respectively. The first inputs of the third and fourth gates are connected to an output of the first and second logic gates, respectively. The second input of the fourth gate is connected to an output of the third logic gate to provide a first output for the latch circuit. Similarly, the second input of the third logic gate is connected to the output of the fourth logic gate to provide a second output for the latch circuit.
Method, Apparatus, And Computer Program Product For A Cache Coherency Protocol State That Predicts Locations Of Modified Memory Blocks
Jason Frederick Cantin - Madison WI, US Steven R. Kunkel - Rochester MN, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00 G06F 13/00 G06F 13/28
US Classification:
711141, 711118, 711119, 711154
Abstract:
A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes. Memory access requests to read exclusive or write data in a cache line that is currently in the modified invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory access requests within the local node, the memory access requests are broadcast to the remote nodes.
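The broadcast ordering in this abstract can be sketched in a few lines. This is only an illustration of the abstract's wording, not the patented implementation: the names `State`, `broadcast_plan`, and `local_can_satisfy` are hypothetical, and the scope labels are plain strings chosen for readability.

```python
from enum import Enum, auto


class State(Enum):
    INVALID = auto()           # ordinary invalid line
    MODIFIED_INVALID = auto()  # invalid here; valid copies predicted to be in the local node


def broadcast_plan(state, local_can_satisfy):
    """Return the ordered broadcast scopes for a read-exclusive or write request.

    `local_can_satisfy` is a hypothetical flag saying whether the local
    node turned out to be able to satisfy the request.
    """
    if state is State.MODIFIED_INVALID:
        # Predicted local: try the local node first, go remote only on failure.
        return ["local"] if local_can_satisfy else ["local", "remote"]
    # No local prediction: the first broadcast goes to all nodes.
    return ["all"]
```

When the prediction is right, the request never leaves the local node; when it is wrong, the cost is one extra local broadcast before the remote one.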
Data Processing System And Method For Efficient Communication Utilizing An In Coherency State
Jason F. Cantin - Madison WI, US Steven R. Kunkel - Rochester MN, US William J. Starke - Round Rock TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 13/00
US Classification:
711144, 711119, 711141, 711133, 711145
Abstract:
A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.
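The three properties that the coherency state field encodes can be written out as a small record. This is a sketch under assumptions: the field names, the constant `IN`, and the contrasting `PLAIN_INVALID` state are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoherencyState:
    tag_valid: bool            # the address tag is valid
    data_valid: bool           # the storage location holds valid data
    likely_domain_local: bool  # block is likely cached only in this coherency domain


# The state described in the abstract: valid tag, no valid data, local-only hint.
IN = CoherencyState(tag_valid=True, data_valid=False, likely_domain_local=True)

# A plain invalid state, included only for contrast (assumption).
PLAIN_INVALID = CoherencyState(tag_valid=False, data_valid=False, likely_domain_local=False)


def can_source_data(state):
    """A cache can supply data only when both the tag and the data are valid."""
    return state.tag_valid and state.data_valid


def carries_location_hint(state):
    """True when the entry holds no data but still records a domain-local hint."""
    return state.tag_valid and not state.data_valid and state.likely_domain_local
```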
Method, Apparatus, And Computer Program Product For A Cache Coherency Protocol State That Predicts Locations Of Shared Memory Blocks
Jason Frederick Cantin - Madison WI, US Steven R. Kunkel - Rochester MN, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00 G06F 13/00 G06F 13/28
US Classification:
711141, 711100, 711118, 711146
Abstract:
A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes. Memory read requests to read data in a cache line that is currently in the shared invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory read requests within the local node, the memory read requests are broadcast to the remote nodes.
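For read requests, the ordering described in this abstract might be sketched as follows. The function name `read_scopes`, the string state labels, and the `local_has_copy` flag are all hypothetical; the sketch follows the abstract's wording and is not the patented implementation.

```python
def read_scopes(line_state, local_has_copy):
    """Return the ordered broadcast scopes for a memory read request.

    line_state     -- "shared_invalid" or "invalid" (hypothetical labels)
    local_has_copy -- whether the local node can actually satisfy the read
    """
    if line_state == "shared_invalid":
        # A valid copy is predicted to be local: ask the local node first.
        if local_has_copy:
            return ["local"]
        # Prediction missed: fall back to the remote nodes.
        return ["local", "remote"]
    # Plain invalid: a valid copy is predicted to be remote, so the first
    # broadcast goes to the remote nodes and the local broadcast is skipped.
    return ["remote"]
```

Skipping the local broadcast in the plain invalid case is what removes the unnecessary local requests that the abstract mentions.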
Method, Apparatus, And Computer Program Product For A Cache Coherency Protocol State That Predicts Locations Of Modified Memory Blocks
Jason Frederick Cantin - Madison WI, US Steven R. Kunkel - Rochester MN, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00 G06F 13/00 G06F 13/28 G06F 15/76
US Classification:
711114, 711118, 711119, 711154, 712 28
Abstract:
A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes. Memory access requests to read exclusive or write data in a cache line that is currently in the modified invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory access requests within the local node, the memory access requests are broadcast to the remote nodes.
Method, Apparatus, And Computer Program Product For A Cache Coherency Protocol State That Predicts Locations Of Shared Memory Blocks
Jason Frederick Cantin - Madison WI, US Steven R. Kunkel - Rochester MN, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00 G06F 13/00 G06F 13/28
US Classification:
711141
Abstract:
A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes. Memory read requests to read data in a cache line that is currently in the shared invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory read requests within the local node, the memory read requests are broadcast to the remote nodes.
Shared Data Prefetching With Memory Region Cache Line Monitoring
A method, circuit arrangement, and design structure for prefetching data for responding to a memory request, in a shared memory computing system of the type that includes a plurality of nodes, is provided. Prefetching data comprises, receiving, in response to a first memory request by a first node, presence data for a memory region associated with the first memory request from a second node that sources data requested by the first memory request, and selectively prefetching at least one cache line from the memory region based on the received presence data. Responding to a memory request comprises tracking presence data associated with memory regions associated with cached cache lines in the first node, and, in response to a memory request by a second node, forwarding the tracked presence data for a memory region associated with the memory request to the second node.
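The selective prefetch step can be illustrated with a short sketch. The abstract does not say how presence data is encoded, so this assumes a bit vector with one bit per cache line in the region; `lines_to_prefetch` and all of its parameters are hypothetical names.

```python
def lines_to_prefetch(presence_bits, lines_per_region, requested_line, already_cached):
    """Pick cache lines in a memory region that look worth prefetching.

    presence_bits    -- int bit vector from the sourcing node; bit i set means
                        line i of the region is present there (assumed encoding)
    lines_per_region -- number of cache lines in the region
    requested_line   -- index of the line the original request already fetches
    already_cached   -- set of line indices the requesting node already holds
    """
    picks = []
    for line in range(lines_per_region):
        present_at_source = (presence_bits >> line) & 1
        if present_at_source and line != requested_line and line not in already_cached:
            picks.append(line)
    return picks
```

For example, with presence bits `0b1011` over a four-line region, a request for line 0, and line 1 already cached, only line 3 is selected.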
Qualcomm Aug 2013 - Jul 2015
Staff Engineer
Apple Aug 2013 - Jul 2015
Performance Modeling Engineer Lead
IBM Jun 2010 - Aug 2013
Senior Engineer, Systems Architecture and Performance
IBM Jun 2006 - Jun 2010
Advisory Engineer, System Architecture and Performance
Hewlett-Packard Jun 2000 - Sep 2000
Mobile Hardware Research
Education:
University of Wisconsin - Madison 2000 - 2002
Master of Science, Doctor of Philosophy, Electrical Engineering
University of Cincinnati 1995 - 2000
Bachelors, Electrical Engineering, Computer Engineering
Worcester Polytechnic Institute 1998 - 1998
Skills:
Computer Architecture, Embedded Systems, Logic Design, Hardware Architecture, Debugging, High Performance Computing, Algorithms, C++, Perl, Microprocessors, VLSI, Linux, C, Processors, Simulations, Microarchitecture, Electrical Engineering, Shell Scripting, Verilog, Genetic Algorithms, Parallel Computing, Functional Verification, Compilers, SoC, Software Engineering, System Architecture, Unix, Operating Systems, Programming Languages, VHDL, Python, Physical Design, Compiler Optimization, Cluster, Machine Learning, SystemC, Performance Modeling, System Performance, Genetic Programming, Artificial Intelligence, Cache Coherence, Programming, Application Specific Integrated Circuits, Circuit Design