Naga K. Govindaraju - Redmond WA, US David Brandon Lloyd - Redmond WA, US Yuri Dotsenko - Redmond WA, US Burton Jordan Smith - Seattle WA, US Jon L. Manferdelli - Redmond WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/14 G06F 12/00
US Classification:
708404, 708405, 711147
Abstract:
A system described herein includes a selector component that receives input data that is desirably transformed by way of a Discrete Fourier Transform, wherein the selector component selects one of a plurality of algorithms for computing the Discrete Fourier Transform from a library based at least in part upon a size of the input data. An evaluator component executes the selected one of the plurality of algorithms to compute the Discrete Fourier Transform, wherein the evaluator component leverages shared memory of a processor to compute the Discrete Fourier Transform.
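The size-based selection described in this abstract can be illustrated with a minimal sketch. The two algorithms, the size threshold, and every function name below are assumptions chosen for illustration, not details taken from the patent, and the sketch does not model the shared-memory evaluation step.

```python
import cmath

def naive_dft(x):
    """O(n^2) DFT; works for any input length."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def radix2_fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def select_algorithm(n):
    """Hypothetical selector: pick an algorithm from the input size alone."""
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    if is_power_of_two and n >= 8:  # threshold of 8 is an arbitrary assumption
        return radix2_fft
    return naive_dft

def compute_dft(x):
    """Evaluator: run whichever algorithm the selector chose."""
    return select_algorithm(len(x))(x)
```

A real library would likely hold more variants (for example mixed-radix or Bluestein-style algorithms) and tune the thresholds per processor; the point here is only that the dispatch decision depends on the input size.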
Andy Glaister - Redmond WA, US Blaise Pascal Tine - Lynnwood WA, US Derek Sessions - Bellevue WA, US Mikhail Lyapunov - Woodinville WA, US Yuri Dotsenko - Kirkland WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06T 15/80
US Classification:
345426
Abstract:
Described are compiler algorithms that partition a compute shader program into maximal-size regions, called thread-loops. The algorithms may remove original barrier-based synchronization yet the thus-transformed shader program remains semantically equivalent to the original shader program (i.e., the transformed shader program is correct). Moreover, the transformed shader program is amenable to optimization via existing compiler technology, and can be executed efficiently by CPU thread(s). A Dispatch call can be load-balanced on a CPU by assigning single or multiple CPU threads to execute thread blocks. In addition, the number of concurrently executing thread blocks does not overload the CPU.
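As a rough illustration of the thread-loop idea, the sketch below takes a small hypothetical shader containing one barrier and shows its transformed form: the code on each side of the barrier becomes a loop over all threads in the block, so a single CPU thread can run the whole block. The shader body and the function name are assumptions made for this example.

```python
def run_thread_block(inputs):
    """Execute one thread block on a single CPU thread.

    Original (hypothetical) shader, written per thread `tid`:
        shared[tid] = inputs[tid] * 2
        barrier()
        out[tid] = shared[tid] + shared[(tid + 1) % n]
    """
    n = len(inputs)
    shared = [0] * n
    out = [0] * n

    # Thread loop 1: the region before the barrier, run for every thread.
    for tid in range(n):
        shared[tid] = inputs[tid] * 2

    # No explicit barrier is needed: loop 1 has finished for all threads
    # before loop 2 starts, which is what the barrier guaranteed.

    # Thread loop 2: the region after the barrier.
    for tid in range(n):
        out[tid] = shared[tid] + shared[(tid + 1) % n]
    return out
```

Because each region is now an ordinary loop, a conventional compiler can unroll or otherwise optimize it like any other loop.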
Andy Glaister - Redmond WA, US Blaise Pascal Tine - Lynnwood WA, US Derek Sessions - Bellevue WA, US Mikhail Lyapunov - Woodinville WA, US Yuri Dotsenko - Kirkland WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 9/45
US Classification:
717146, 717151
Abstract:
Described herein are optimizations of thread loop intermediate representation (IR) code. One embodiment involves an algorithm that, based on data-flow analysis, computes sets of temporary variables that are loaded at the beginning of a thread loop and stored upon exit from a thread loop. Another embodiment involves reducing the size of a thread loop trip for a commonly-found case where a piece of compute shader is executed by a single thread (or a compiler-analyzable range of threads). In yet another embodiment, compute shader thread indices are cached to avoid excessive divisions, further improving execution speed.
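The index-caching embodiment can be sketched as follows, under the assumption that each thread loop iterates over a flat thread id and needs the corresponding three-dimensional index. The abstract does not say how the cache is organized, so the lookup table and both function names here are hypothetical.

```python
def build_thread_index_cache(dim_x, dim_y, dim_z):
    """Compute (x, y, z) for every flat thread id once per thread-group shape."""
    cache = []
    for flat in range(dim_x * dim_y * dim_z):
        x = flat % dim_x
        y = (flat // dim_x) % dim_y
        z = flat // (dim_x * dim_y)
        cache.append((x, y, z))
    return cache

def run_thread_loops(cache, num_loops):
    """Each thread loop reuses the cached indices instead of dividing again."""
    visited = []
    for _ in range(num_loops):
        for x, y, z in cache:  # table lookup only, no division or modulo
            visited.append((x, y, z))
    return visited
```

Without the cache, a shader split into several thread loops would repeat the division and modulo operations for every thread in every loop.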
Andy Glaister - Redmond WA, US Blaise Pascal Tine - Lynnwood WA, US Blake Pelton - Redmond WA, US Derek Sessions - Bellevue WA, US Mikhail Lyapunov - Woodinville WA, US Yuri Dotsenko - Kirkland WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 9/45
US Classification:
717146
Abstract:
Intermediate representation (IR) code is received as compiled from a shader in the form of shader language source code. The input IR code is first analyzed during an analysis pass, during which operations, scopes, parts of scopes, and if-statement scopes are annotated for predication, mask usage, and branch protection. This analysis outputs vectorization information that is then used by various sets of vectorization transformation rules to vectorize the input IR code, thus producing vectorized output IR code.
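To make the role of predication and masks concrete, here is a minimal sketch of one common vectorization technique, if-conversion: both sides of a branch are evaluated for every lane, and a per-lane mask selects the result. This is a generic illustration with a made-up shader body, not the transformation rules of the patent.

```python
def scalar_shader(x):
    """Hypothetical scalar shader body with a data-dependent branch."""
    if x > 0:
        return x * 2
    return -x

def vectorized_shader(xs):
    """The same computation over a vector of lanes, with the branch predicated."""
    mask = [x > 0 for x in xs]        # one predicate bit per lane
    then_vals = [x * 2 for x in xs]   # 'if' side, computed for all lanes
    else_vals = [-x for x in xs]      # 'else' side, computed for all lanes
    # The mask selects, per lane, which side's result is kept.
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
```

An analysis pass like the one described would decide which operations need such masks and which branches must be kept (for example, when one side has side effects that may not execute for masked-off lanes).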
Yuri Dotsenko - Redmond WA, US Naga Govindaraju - Redmond WA, US Charles Boyd - Redmond WA, US John Manferdelli - Redmond WA, US
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/30
US Classification:
707705, 707E17005
Abstract:
A system and method for performing a scan of an input sequence in a parallel processor having a shared register file. A two-dimensional matrix is generated, having a number of rows representing a number of threads and a number of columns based on the input sequence block size and the number of rows. One or more padding columns may be added to the matrix to avoid or reduce memory bank conflicts. A first traversal of the rows performs a reduction or a scan of each of the rows in parallel, storing the reduction values. The reduction values are used during a second traversal to propagate the reduction values. In a segmented scan, propagation is selectively performed based on flags representing segment boundaries.
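The two-traversal structure can be sketched sequentially as below, assuming an additive inclusive scan. Each row stands for the work of one thread; on a parallel processor the per-row loops would run concurrently. The function name is hypothetical, and the padding columns and the segmented variant are left out because they do not change the arithmetic.

```python
def matrix_scan(values, num_threads):
    """Inclusive prefix sum using a rows-by-columns layout (one row per thread)."""
    n = len(values)
    cols = -(-n // num_threads)  # ceiling division: columns per row
    padded = list(values) + [0] * (num_threads * cols - n)
    rows = [padded[r * cols:(r + 1) * cols] for r in range(num_threads)]

    # First traversal: reduce each row (done in parallel on real hardware).
    row_sums = [sum(row) for row in rows]

    # Exclusive scan of the row reductions gives each row its starting offset.
    offsets = []
    running = 0
    for s in row_sums:
        offsets.append(running)
        running += s

    # Second traversal: scan each row, propagating the offset from earlier rows.
    out = []
    for row, offset in zip(rows, offsets):
        acc = offset
        for v in row:
            acc += v
            out.append(acc)
    return out[:n]  # drop results for the zero padding
```

For a segmented scan, the second traversal would additionally reset the running value wherever a flag marks the start of a new segment.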
Configuring Resources Used By A Graphics Processing Unit
- Redmond WA, US Matthew D. Sandy - Bellevue WA, US Yuri Dotsenko - Kirkland WA, US Jesse T. Natalie - Redmond WA, US Max A. McMullen - Seattle WA, US
International Classification:
G06F 9/54 G06F 9/44
Abstract:
The application programming interface permits an application to specify resources to be used by shaders, executed by the GPU, through a data structure called the “root arguments.” A root signature is a data structure in an application that defines the layout of the root arguments used by an application. The root arguments are a data structure resulting from the application populating locations in memory according to the root signature. The root arguments can include one or more constant values or other state information, and/or one or more pointers to memory locations which can contain descriptors, and/or one or more descriptor tables. Thus, the root arguments can support multiple levels of indirection through which a GPU can identify resources that are available for shaders to access.
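The relationship between a root signature (the layout) and root arguments (the populated values) can be modeled with a small sketch. This is an illustrative toy model, not the Direct3D 12 API: `RootSlot`, `populate_root_arguments`, `resolve`, and the dictionary standing in for memory are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RootSlot:
    name: str
    kind: str  # "constant", "descriptor_pointer", or "descriptor_table"

def populate_root_arguments(signature, values):
    """Lay out values in the order and with the kinds the signature declares."""
    return [(slot.kind, values[slot.name]) for slot in signature]

def resolve(root_args, index, memory):
    """Follow the indirection for one root argument to what a shader would see."""
    kind, value = root_args[index]
    if kind == "constant":
        return value                  # stored inline, no indirection
    if kind == "descriptor_pointer":
        return memory[value]          # one level: address -> descriptor
    if kind == "descriptor_table":
        start, count = value          # table names a range of descriptors
        return [memory[addr] for addr in range(start, start + count)]
    raise ValueError(f"unknown root slot kind: {kind}")
```

The three slot kinds correspond to the three options the abstract lists: inline constants, pointers to descriptors, and descriptor tables, each adding a level of indirection.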
Configuring Resources Used By A Graphics Processing Unit
- Redmond WA, US Matthew D. Sandy - Bellevue WA, US Yuri Dotsenko - Kirkland WA, US Jesse T. Natalie - Redmond WA, US Max A. McMullen - Seattle WA, US
International Classification:
G06T 1/20 G06T 1/60 G06F 9/54
Abstract:
The application programming interface permits an application to specify resources to be used by shaders, executed by the GPU, through a data structure called the “root arguments.” A root signature is a data structure in an application that defines the layout of the root arguments used by an application. The root arguments are a data structure resulting from the application populating locations in memory according to the root signature. The root arguments can include one or more constant values or other state information, and/or one or more pointers to memory locations which can contain descriptors, and/or one or more descriptor tables. Thus, the root arguments can support multiple levels of indirection through which a GPU can identify resources that are available for shaders to access.
Configuring Resources Used By A Graphics Processing Unit
- Redmond WA, US Matthew D. Sandy - Bellevue WA, US Yuri Dotsenko - Kirkland WA, US Jesse T. Natalie - Redmond WA, US Max A. McMullen - Seattle WA, US
International Classification:
G06T 15/00 G06T 1/20
Abstract:
A resource used by a shader executed by a graphics processing unit is referenced using a “descriptor”. Descriptors are grouped together in a region of memory called a descriptor heap. Applications allocate and store descriptors in descriptor heaps. Applications also create one or more descriptor tables specifying a subrange of a descriptor heap. To bind resources to a shader, descriptors are first loaded into a descriptor heap. When the resources are to be used by a set of executing shaders, descriptor tables are defined on the GPU identifying ranges within the descriptor heap. Shaders, when executing, refer to the currently defined descriptor tables to access the resources made available to them. If the shader is to be executed again with different resources, and if those resources are already in memory and specified in the descriptor heap, then the descriptor tables are changed to specify different ranges of the descriptor heap.
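The rebinding behavior described here, where only the table's range changes while the heap contents stay put, can be sketched with a toy model. The class and function names are hypothetical and do not correspond to any real graphics API.

```python
class DescriptorHeap:
    """A fixed-size array of descriptors (plain strings in this sketch)."""
    def __init__(self, capacity):
        self.slots = [None] * capacity

    def store(self, index, descriptor):
        self.slots[index] = descriptor

class DescriptorTable:
    """A (start, count) view into a heap; it holds no descriptors itself."""
    def __init__(self, heap, start, count):
        if start < 0 or count < 0 or start + count > len(heap.slots):
            raise ValueError("table range lies outside the heap")
        self.heap = heap
        self.start = start
        self.count = count

    def resources(self):
        return self.heap.slots[self.start:self.start + self.count]

def run_shader(table):
    """Stand-in for a shader: it sees only what the current table exposes."""
    return list(table.resources())
```

Running the same shader with different resources then amounts to creating a table over a different range, for example `DescriptorTable(heap, 2, 2)` instead of `DescriptorTable(heap, 0, 2)`, with no descriptors copied.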
Rice University 2002 - 2007
Doctorates, Doctor of Philosophy, Computer Science
Rice University 1999 - 2002
Master of Science, Masters, Computer Science
Skills:
C++ Algorithms C Software Development Distributed Systems Computer Science Python Programming Linux High Performance Computing