Timothy Farrar's blog
is always a very good source of information if
you are looking for in-depth thoughts and experience on GPUs.
On Wednesday, Timothy published a post about his vision of how GPUs
will evolve in the near future. In particular, he suggests that
future GPUs could expose a new, highly flexible mechanism for job
distribution based on generic hardware-managed queues (FIFOs) associated
with kernels.
Current GPUs start threads by scheduling groups of independent jobs
between dependent state changes from a master control stream (a command
buffer filled by the CPU). OpenGL conditional rendering provides a
starting point for modifying the task list in this stream on the fly,
and DX11 seems to go further with the DispatchIndirect function, which
lets DX Compute grid dimensions come directly from device memory. The
idea is that future hardware may provide generic queues that could be
filled by kernels and used proactively by the hardware scheduler to set
up thread blocks and route data to an available core, which then starts
new thread blocks using the kernel associated with the queue.
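To make the queue model more concrete, here is a toy CPU-side simulation in Python. It is only a sketch of the idea, not a description of any real hardware or API: the names `KernelQueue`, `run`, `expand`, `shade` and the block width `BLOCK_SIZE` are all hypothetical. Each queue is bound to one kernel, kernels may push new work into any queue, and a small scheduler loop drains the queues in batches that stand in for thread blocks.

```python
from collections import deque

BLOCK_SIZE = 4  # hypothetical thread-block width, chosen for illustration


class KernelQueue:
    """A FIFO bound to one kernel; stands in for a hardware-managed queue."""

    def __init__(self, name, kernel):
        self.name = name
        self.kernel = kernel
        self.items = deque()

    def push(self, item):
        self.items.append(item)


def run(queues, block_size=BLOCK_SIZE):
    """Drain every queue, launching one 'thread block' per batch of items.

    Returns the list of launches as (queue name, items in the block).
    """
    launches = []
    while any(q.items for q in queues.values()):
        for q in queues.values():
            if not q.items:
                continue
            count = min(block_size, len(q.items))
            block = [q.items.popleft() for _ in range(count)]
            launches.append((q.name, block))
            # Run the kernel bound to this queue on each item of the block.
            for item in block:
                q.kernel(item, queues)
    return launches


results = []


def expand(item, queues):
    # Producer kernel: each input spawns two items for the "shade" queue.
    queues["shade"].push(item * 10)
    queues["shade"].push(item * 10 + 1)


def shade(item, queues):
    # Consumer kernel: records the item it was given.
    results.append(item)


queues = {
    "expand": KernelQueue("expand", expand),
    "shade": KernelQueue("shade", shade),
}
for i in range(3):
    queues["expand"].push(i)  # initial work, as if submitted by the CPU

launches = run(queues)
```

In this sketch the CPU only seeds the first queue; everything after that is driven by the kernels themselves filling queues, which is the part the post imagines being handled by the hardware scheduler.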
Much of the work in parallel processing is related to grouping, moving,
and compacting or expanding data, and ends up being a data routing
problem. This model seems to provide a very good way to handle grouping
for data locality. It could allow kernels that reach a divergent point
(such as branch divergence or data locality divergence) to output
threads to new queues with a new domain coordinate, to ensure a good new
grouping for continued computation. Data associated with a kernel would
also be in the queue and managed in hardware, to provide very fast
access to thread parameters.
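The regrouping at a divergent point can be illustrated with another small Python sketch. It is an assumption-laden toy, not the mechanism from Timothy's post: the helper names `split_on_divergence` and `form_blocks`, the tuple layout, and the threshold predicate are all made up for the example. Each work item is routed to a queue chosen by its branch outcome and receives a new dense coordinate in that queue, so that the blocks formed afterwards contain only items that take the same branch.

```python
from collections import defaultdict, deque


def split_on_divergence(items, predicate):
    """Route (coord, payload) items to one queue per branch outcome.

    Each item gets a new dense coordinate within its destination queue;
    the old coordinate is kept so results can be scattered back later.
    """
    queues = defaultdict(deque)
    for old_coord, payload in items:
        branch = predicate(payload)
        new_coord = len(queues[branch])
        queues[branch].append((new_coord, old_coord, payload))
    return queues


def form_blocks(queue, block_size):
    """Group a queue's items into consecutive blocks of block_size."""
    items = list(queue)
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


# Eight "threads" whose payload decides which branch they take.
work = list(enumerate([3, 8, 1, 9, 7, 2, 6, 4]))
queues = split_on_divergence(work, lambda value: value >= 5)

hot_blocks = form_blocks(queues[True], 2)    # items taking the branch
cold_blocks = form_blocks(queues[False], 2)  # items skipping it
```

Before the split, any block of two neighbouring items in `work` mixes both branches; after it, every block in `hot_blocks` and `cold_blocks` is homogeneous, at the cost of moving the data, which is exactly the routing work the post would like the hardware to take over.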
This kind of routing can be done using a CPU-like coherent cache with a
large vector processor like Larrabee, but data routing becomes expensive
with a coherent cache that consumes transistors for routing that could
have been defined explicitly by the programmer. When you attempt to do
all this routing manually with dedicated local memory and
high-throughput global memory, it is still expensive, just less
expensive. Timothy's idea is that this mechanism could be heavily
hardware accelerated and could give "traditional" GPUs a big advantage
over more generic, Larrabee-like architectures. I really think this is
the way to go for GPUs to continue to provide high performance for more
generic graphics rendering pipelines.
The same idea is developed in a TOG paper that will be presented at
SIGGRAPH this year. This paper presents GRAMPS, a programming model that
generalizes concepts from modern real-time graphics pipelines by
exposing an execution model that mixes task parallelism and data
parallelism, and that contains both fixed-function and
application-programmable processing stages that exchange data via
queues.