Yesterday, NVIDIA made public two new OpenGL extensions, NV_shader_buffer_load and NV_vertex_buffer_unified_memory. Together they allow OpenGL to be used in a new way that NVIDIA calls Bindless Graphics. With Bindless Graphics, an application can manipulate buffer objects directly through their GPU global memory addresses and control the residency of these objects itself. This removes the bottleneck of having to bind objects before using them, which forces the driver to fetch all of an object's state before it can be used or modified.
The NV_shader_buffer_load extension provides a mechanism to bind buffer objects to the context in such a way that they can be accessed by reading from a flat, 64-bit GPU address space directly from any shader stage and to query GPU addresses of buffer objects at the API level. The intent is that applications can avoid re-binding buffer objects or updating constants between each Draw call and instead simply use a VertexAttrib (or TexCoord, or InstanceID, or…) to “point” to the new object’s state.
The NV_vertex_buffer_unified_memory extension provides a mechanism to specify vertex attribute and element array locations using these GPU addresses. Binding vertex buffers is one of the most frequent and expensive operations in many GL applications, due to the cost of chasing pointers and binding objects. With this extension, an application can specify vertex attribute state directly using VBO addresses, which alleviates the overhead of object binds and driver memory management.
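To make the call sequence concrete, here is a minimal sketch in C of how the two extensions fit together: make a buffer resident and query its GPU address once, then point a vertex attribute at that address without binding the buffer again. The entry-point names, parameter lists and enum values follow the extension specifications as I understand them. Everything else is an assumption made for illustration: the `fake_driver` struct, the stub function bodies (so the sketch compiles and runs without a GL context), the made-up address value, and the hypothetical `BindlessVbo`, `make_bindless` and `set_position_source` helpers. In a real program the types and enums come from `glext.h` and the functions from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal GL types and enum values, copied here so the sketch compiles
   without GL headers. A real program gets them from glext.h. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLint;
typedef int GLsizei;
typedef unsigned char GLboolean;
typedef ptrdiff_t GLsizeiptr;
typedef uint64_t GLuint64EXT;

#define GL_FALSE 0
#define GL_FLOAT 0x1406
#define GL_ARRAY_BUFFER 0x8892
#define GL_READ_ONLY 0x88B8
#define GL_BUFFER_GPU_ADDRESS_NV 0x8F1D
#define GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV 0x8F1E
#define GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV 0x8F20

/* ASSUMPTION: stand-in "driver" that only records what was requested.
   The stubs below are NOT the real implementations. */
static struct {
    int resident;
    int unified;
    GLuint64EXT attrib_addr;
    GLsizeiptr attrib_len;
    GLsizei stride;
} fake_driver;

static void glMakeBufferResidentNV(GLenum target, GLenum access) {
    (void)target; (void)access;
    fake_driver.resident = 1;
}

static void glGetBufferParameterui64vNV(GLenum target, GLenum pname,
                                        GLuint64EXT *params) {
    (void)target;
    if (pname == GL_BUFFER_GPU_ADDRESS_NV)
        *params = 0x100000000ull; /* made-up address, for illustration only */
}

static void glEnableClientState(GLenum cap) {
    if (cap == GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV)
        fake_driver.unified = 1;
}

static void glVertexAttribFormatNV(GLuint index, GLint size, GLenum type,
                                   GLboolean normalized, GLsizei stride) {
    (void)index; (void)size; (void)type; (void)normalized;
    fake_driver.stride = stride;
}

static void glBufferAddressRangeNV(GLenum pname, GLuint index,
                                   GLuint64EXT address, GLsizeiptr length) {
    (void)index;
    if (pname == GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV) {
        fake_driver.attrib_addr = address;
        fake_driver.attrib_len = length;
    }
}

/* Hypothetical helper type: what the application keeps instead of
   re-binding the buffer object name for every draw. */
typedef struct {
    GLuint64EXT gpu_addr;
    GLsizeiptr size;
} BindlessVbo;

/* One-time setup. Assumes the buffer is already created, filled and
   bound to GL_ARRAY_BUFFER (that part is ordinary VBO code). */
static BindlessVbo make_bindless(GLsizeiptr size) {
    BindlessVbo vbo = {0, size};
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV,
                                &vbo.gpu_addr);
    assert(vbo.gpu_addr != 0);
    return vbo;
}

/* Per-draw state change: no glBindBuffer, only an address and a length. */
static void set_position_source(const BindlessVbo *vbo) {
    glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
    glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE,
                           (GLsizei)(3 * sizeof(float)));
    glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0,
                           vbo->gpu_addr, vbo->size);
}
```

An application would call `make_bindless` once per buffer at load time, then call `set_position_source` before each draw that uses it. The point of the split is that the per-draw path only passes an address and a length, so the driver has no object name to look up.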
NVIDIA provides a small bindless graphics tutorial, with a
That seems very useful, but what scares me a little is that each time you give developers lower-level access like this, you greatly reduce the potential for automatic driver optimizations. In particular, I wonder how this mechanism interacts with NVIDIA SLI mode, which provides automatic scaling of OpenGL applications across multiple GPUs. This mode duplicates data on each GPU and broadcasts drawing commands to all of them, so that each GPU can produce a different part of a frame and the parts can be composited before display. With these extensions, the same address space has to be maintained on all GPUs involved in SLI rendering, which seems very difficult, especially with heterogeneous SLI configurations.