Welcome

Welcome to Icare3D!
General - Site
Saturday, 12 June 2004



Welcome to my personal and technical website. Its aim is to publish the projects I work on as part of my studies, during internships, or simply in my free time, mainly related to computer graphics, real-time 3D programming and scientific visualisation. The second goal of this website is to provide technical news about GPU-related subjects.

Cyril


 
Farrarfocus thoughts on GPU evolution
TechnoBlog - GPU
Friday, 08 May 2009

Timothy Farrar's blog is always a very good source of information if you are looking for in-depth thoughts and experience on GPUs. On Wednesday, Timothy published a post about his vision of how GPUs may evolve in the near future. In particular, he puts forward the idea that future GPUs (NVIDIA ones in particular) could expose a new, highly flexible mechanism for job distribution based on generic hardware-managed queues (FIFOs) associated with kernels.

Current GPUs start threads by scheduling groups of independent jobs between dependent state changes from a master control stream (a command buffer filled by the CPU). OpenGL conditional rendering provides a starting point for modifying the task list in this stream on the fly, and DX11 seems to go further with the DispatchIndirect function, which lets DX Compute grid dimensions come directly from device memory. The idea is that future hardware may provide generic queues that could be filled by kernels and used proactively by the hardware scheduler to set up thread blocks and route data to an available core, which would then start new thread blocks using the kernel associated with the queue.

Much of the work in parallel processing consists of grouping, moving, compacting or expanding data, and ends up being a data routing problem. This model seems to provide a very good way to handle grouping for data locality. It could allow kernels that reach a divergent point (such as branch divergence or data locality divergence) to output threads to new queues with a new domain coordinate, ensuring a good new grouping for continued computation. Data associated with a kernel would also live in the queue and be managed in hardware, providing very fast access to thread parameters.
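
To make the regrouping idea concrete, here is a toy CPU-side sketch in Python. It is only an illustration under loud assumptions: the kernel names, the block width and the "pick the fullest queue" policy are all hypothetical, and nothing here reflects how real hardware or Timothy's proposal would actually schedule work. Each kernel returns the items it wants to push to other queues, and the scheduler always launches a block made of items from a single queue, so divergent items end up regrouped with items that take the same path.

```python
from collections import deque

BLOCK_SIZE = 4  # hypothetical thread-block width


def classify(item):
    """Kernel that diverges: route each item to the queue matching its branch."""
    target = "even_path" if item % 2 == 0 else "odd_path"
    return [(target, item)]


def even_path(item):
    return []  # terminal kernel: produces no further work


def odd_path(item):
    return []  # terminal kernel: produces no further work


KERNELS = {"classify": classify, "even_path": even_path, "odd_path": odd_path}


def run(initial_items):
    """Drain the queues, launching one block of same-kernel items at a time."""
    queues = {name: deque() for name in KERNELS}
    queues["classify"].extend(initial_items)
    launches = []  # (kernel name, items in the block), in launch order
    while any(queues.values()):
        # Assumed policy: pick the fullest queue so blocks are well filled.
        name = max(queues, key=lambda n: len(queues[n]))
        count = min(BLOCK_SIZE, len(queues[name]))
        block = [queues[name].popleft() for _ in range(count)]
        launches.append((name, block))
        for item in block:
            for target, payload in KERNELS[name](item):
                queues[target].append(payload)
    return launches
```

Calling `run(range(8))` first launches two mixed `classify` blocks, then one full block containing only even items and one containing only odd items: the divergence has been turned into two coherent groups.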

This can be done using a CPU-like coherent cache with a large vector processor like Larrabee, but data routing becomes expensive with a coherent cache, which consumes transistors on routing that could have been defined explicitly by the programmer. When you attempt to do all this routing manually with dedicated local memory and high-throughput global memory, it is still expensive, just less so. Timothy's idea is that this mechanism could be heavily hardware accelerated and could give "traditional" GPUs a big advantage over more generic, Larrabee-like architectures. I really think this is the way to go for GPUs to keep providing high performance for more generic graphics rendering pipelines.

The same idea is developed in a TOG paper that will be presented at Siggraph this year. The paper presents GRAMPS, a programming model that generalizes concepts from modern real-time graphics pipelines by exposing an execution model that mixes task parallelism and data parallelism, with both fixed-function and application-programmable processing stages that exchange data via queues.

 
NVIDIA OpenGL Bindless Graphics
TechnoBlog - GPU
Tuesday, 28 April 2009

Yesterday, NVIDIA made public two new OpenGL extensions named NV_shader_buffer_load and NV_vertex_buffer_unified_memory. These extensions allow OpenGL to be used in a totally new way that NVIDIA calls Bindless Graphics. With Bindless Graphics you can manipulate buffer objects directly through their GPU global memory addresses and control the residency of these objects from the application. This removes the bottleneck of having to bind objects before using them, which forces the driver to fetch all the object state before it can be used or modified.

The NV_shader_buffer_load extension provides a mechanism to bind buffer objects to the context in such a way that they can be accessed by reading from a flat, 64-bit GPU address space directly from any shader stage, and to query the GPU addresses of buffer objects at the API level. The intent is that applications can avoid re-binding buffer objects or updating constants between each draw call and instead simply use a VertexAttrib (or TexCoord, or InstanceID, or...) to "point" to the new object's state.

The NV_vertex_buffer_unified_memory extension provides a mechanism to specify vertex attributes and element array locations using these GPU addresses. Binding vertex buffers is one of the most frequent and expensive operations in many GL applications, due to the cost of chasing pointers and binding objects. With this extension, applications can specify vertex attribute state directly using VBO addresses, which alleviates the overhead of object binds and driver memory management.

NVIDIA provides a small bindless graphics tutorial, with a presentation of the new features.

That seems very useful, but what scares me a little is that each time you give the developer lower-level access like this, you greatly reduce the potential for automatic driver optimizations. In particular, I wonder how this mechanism interacts with NVIDIA's SLI mode, which provides automatic scaling of OpenGL applications across multiple GPUs. This mode duplicates data on each GPU and broadcasts drawing commands to all GPUs so that they can produce different parts of a frame and compose them before display. With these extensions, the same address space has to be maintained on all GPUs involved in SLI drawing, which seems very difficult, especially for heterogeneous SLI configurations.

 
CUDA Visual Studio integration
TechnoBlog - GPU
Wednesday, 22 April 2009

I would just like to share a few tips I found that help when working with CUDA under Visual Studio.

First, syntax highlighting for .cu files can be enabled with these few steps:

  1. Copy the content of the “usertype.dat” file provided by NVIDIA (NVIDIA CUDA SDK\doc\syntax_highlighting\visual_studio_8) into the “Microsoft Visual Studio 8\Common7\IDE” folder inside your Program Files folder.

  2. Open Visual Studio and go to Tools -> Options. Under Text Editor -> File Extension, add the extension “cu” as a new type.

Visual Studio relies on a feature named IntelliSense to provide completion of function and variable names, definition lookup and similar features. To get IntelliSense working with .cu files, you have to modify a Windows registry key: add the cu and cuh extensions to the NCB Default C/C++ Extensions key under the "HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++" path. (Thanks to http://www.wizardsofeast.com/?p=378 for the tip)
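
For those who prefer scripting the registry change, here is a small Python sketch. It rests on assumptions that the tip does not confirm: that the value is a semicolon-separated, semicolon-terminated list of extensions such as ".cpp;.h;", and that it should receive ".cu" and ".cuh". The helper names are made up for this example. Only the standard `winreg` module is used, and it is imported inside the function because it exists on Windows only. Back up the key before trying anything like this.

```python
KEY_PATH = r"Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++"
VALUE_NAME = "NCB Default C/C++ Extensions"


def add_extensions(current, extra=(".cu", ".cuh")):
    """Return the extension list with the missing entries appended.

    Assumes a semicolon-separated list such as ".cpp;.h;" (an assumption,
    not something stated in the original tip).
    """
    entries = [e for e in current.split(";") if e]
    known = {e.lower() for e in entries}
    for ext in extra:
        if ext.lower() not in known:
            entries.append(ext)
            known.add(ext.lower())
    return ";".join(entries) + ";"


def update_registry():
    """Apply the change to the current user's registry (Windows only)."""
    import winreg  # standard library, available on Windows only

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH, 0, access) as key:
        current, value_type = winreg.QueryValueEx(key, VALUE_NAME)
        winreg.SetValueEx(key, VALUE_NAME, 0, value_type, add_extensions(current))
```

The string manipulation is kept separate from the registry access so that it can be checked on any machine; running `add_extensions` twice leaves the list unchanged.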

For those using Visual Assist X, you can do the following. First, find the Visual Assist X install directory (X:\Program Files\Visual Assist X\AutoText\latest), make a copy of Cpp.tpl and rename it to Cu.tpl. Second, open and close Visual Studio (this initializes the Visual Assist X parameters by creating some folders/variables in the registry). Third, open regedit, go to "HKEY_CURRENT_USER\Software\Whole Tomato\Visual Assist X\VANet9", and add ".cu;" to the ExtSource key and ".cuh;" to the ExtHeader key. (Thanks to ciberxtrem for the tip)

Finally, build rules that make it easy to compile .cu files without writing the rules by hand can be integrated by installing this little wizard: http://forums.nvidia.com/index.php?showtopic=65111. More details on CUDA build rules can be found on this website: http://sarathc.wordpress.com/2008/09/26/how-to-integrate-cuda-with-visual-c/.

 
Larrabee ISA at GDC
TechnoBlog - GPU
Monday, 06 April 2009

Last week, Intel gave two talks about the Larrabee ISA, called Larrabee New Instructions (LRBNi).

The most significant thing to note is that Larrabee will expose a vector assembly, very similar to SSE instructions, but operating on 16-component vectors instead of 4. To program this, they will provide C intrinsics whose names look... really weird!

C++ Larrabee prototype library: http://software.intel.com/en-us/articles/prototype-primitives-guide/

Intel provides headers with x86 implementations of these instructions so that developers can start using them now. But I can't imagine anybody using this kind of vector intrinsics to program a data-parallel architecture. As we have seen with SSE instructions, very few programmers ended up using them, and only for very specific parts of algorithms. So I think these intrinsics will only be used to implement higher-level programming layers, like an OpenCL implementation, which is for me a much better and more flexible way to program these architectures.

The scalar model exposed for the G80 through CUDA and the PTX assembly (and that will be exposed by OpenCL) uses scalar operations over scalar registers. In this model, the underlying SIMD architecture is visible through the notion of warps, inside which programmers know that divergent branches are serialized. Inter-thread communication is exposed through the notion of CTA (Cooperative Thread Array), a group of threads able to communicate through a very fast shared memory. Coalescing rules are given to programmers so that they can make the best use of the underlying SIMD architecture, but the model is far more scalable (not restricted to a given vector size) and lets you write code in a much more natural way than a vector model.
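
The serialization of divergent branches inside a warp can be pictured with a small Python simulation. This is a simplified model, not a description of the real G80 scheduler: the branch body and the function name are invented for the example, and the only thing it tries to show is that a warp walks through each side of a branch that at least one of its lanes takes, with the other lanes masked off.

```python
WARP_SIZE = 32


def run_warp(values):
    """Execute `y = x * 2 if x >= 0 else -x` for one warp of lanes.

    Returns (results, passes), where passes counts how many branch
    sides the warp had to execute one after the other.
    """
    assert len(values) == WARP_SIZE
    take_then = [x >= 0 for x in values]  # per-lane predicate
    results = [None] * WARP_SIZE
    passes = 0
    if any(take_then):
        passes += 1  # the whole warp steps through the "then" side
        for lane, x in enumerate(values):
            if take_then[lane]:  # masked-off lanes do nothing
                results[lane] = x * 2
    if not all(take_then):
        passes += 1  # the whole warp steps through the "else" side
        for lane, x in enumerate(values):
            if not take_then[lane]:
                results[lane] = -x
    return results, passes
```

A warp whose lanes all agree on the predicate pays for one pass, while a warp with mixed signs pays for two, even though each lane only keeps the result of one side.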

Even if, for now, Larrabee exposes a vector assembly where the G80 exposes a scalar one, only the programming model varies; the underlying architectures are in the end very similar. Each Larrabee core can dual-issue instructions to an x86 unit and 16 scalar processors working in SIMD, which is very similar to a G80 multiprocessor, which can dual-issue instructions to a special unit or 8 scalar processors working in SIMD over 4 cycles (providing a 32-wide SIMD). Larrabee exposes 16-wide vector registers, where the G80 exposes scalar ones that are in fact aligned parts of a vector memory bank.

The true difference between the two architectures is that Larrabee will implement the whole graphics pipeline using these general-purpose cores (plus dedicated texture units), where the G80 still has a lot of highly optimized units and data paths dedicated to graphics operations, connected into a fixed pipeline. Intel's bet is that the flexibility provided by the fully programmable pipeline will allow better load balancing, which will compensate for the architecture's lower efficiency on graphics operations. The major asset they rely on is a binning rasterisation model: after the transform stage, triangles are assigned, based on screen-tile locality, to the cores where all the rasterisation, shading and blending are done. Thanks to this model, they can keep local screen regions per core in dedicated parts of a global L2 cache, which is also used for inter-core communication. That should allow efficient programmable blending, for instance. But I think that even they don't know if it will really be competitive for consumer graphics!
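
The binning step itself is easy to sketch. The Python below is a generic illustration of tile binning, not Intel's implementation: the function name, the tile size and the bounding-box test are assumptions chosen for simplicity. It conservatively assigns each screen-space triangle to every tile its bounding box overlaps; a real binner would likely test exact coverage.

```python
import math


def bin_triangles(triangles, width, height, tile_size):
    """Map each screen tile (tx, ty) to the indices of triangles touching it.

    Triangles are tuples of three (x, y) screen-space vertices. The test is
    conservative: it uses the bounding box, not exact coverage.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Convert the bounding box to tile coordinates and clamp to the screen.
        x0 = max(math.floor(min(xs)) // tile_size, 0)
        y0 = max(math.floor(min(ys)) // tile_size, 0)
        x1 = min(math.floor(max(xs)) // tile_size, tiles_x - 1)
        y1 = min(math.floor(max(ys)) // tile_size, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins
```

With a 128x128 screen and 64-pixel tiles, a small triangle near the origin lands in tile (0, 0) only, while a triangle straddling the centre of the screen is listed in all four tiles. Each tile's list is then the work a single core would rasterise, shade and blend locally.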

And even on that point, the Larrabee approach is not so different from the G80 approach, where triangles are rasterized globally and fragments are then spread among multiprocessors based on screen tiles (cf. http://www.icare3d.org/GPU/CN08) for fragment shading. The difference is that the z-test and blending are done by fixed ROP units, connected to the multiprocessors via a crossbar (cf. http://www.realworldtech.com/page.cfm?ArticleID=RWT090808195242).

Finally, with these talks, Intel seems to be presenting as a revolutionary new architecture something that, for the most part, has been here for more than two years with the G80, and it comes with a programming model that seems really weird compared to the CUDA model. This is all the more strange because Larrabee may not be released before Q1 2010, and by then NVIDIA and ATI will already have released their next-generation architectures, which may look even more similar to Larrabee. With Larrabee, Intel has been feeding the industry a lot of promises, like "it will be x86, so you won't have to do anything particular to use it", which we have always known to be wrong, since by nature the efficiency of a data-parallel architecture comes from its particular programming model. If proof was needed, I think this ISA is it.

Intel GDC presentations: http://software.intel.com/en-us/articles/intel-at-gdc/

Larrabee at GDC, PCWATCH review: http://translate.google.fr/translate?u=http%3A%2F%2Fpc.watch.impress.co.jp%2Fdocs%2F2009%2F0330%2Fkaigai498.htm&sl=ja&tl=en&hl=fr&ie=UTF-8

Very good article about Larrabee and LRBNi: http://www.ddj.com/architect/216402188

 

 

 
OpenGL 3.1 Specifications
TechnoBlog - GPU
Monday, 30 March 2009

The OpenGL 3.1 specification was released at GDC 2009 a few days ago. I think this is the first time in ARB history that a new version of OpenGL has been released so quickly, less than one year after OpenGL 3.0 (released at Siggraph last year).

This new revision promotes to the core some remaining G80 features that were not promoted into OpenGL 3.0: Texture Buffer Objects (GL_ARB_texture_buffer_object, a one-dimensional array of texels used as a texture without filtering, equivalent to CUDA linear textures), Uniform Buffer Objects (GL_ARB_uniform_buffer_object, which enables rapid swapping of blocks of uniforms, rapid updates and sharing across program objects), Primitive Restart (GL_NV_primitive_restart, which restarts an executing primitive; it has existed as an extension since the GeForce 6, I think), Instancing (GL_ARB_draw_instanced) and Texture Rectangle (GL_ARB_texture_rectangle). Uniform Buffer Objects have been enhanced quite a lot compared to the original GL_EXT_bindable_uniform: among other things, several buffers can be combined to populate a shader uniform block, and a standard cross-platform data storage layout is proposed.
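
Of these features, primitive restart is the easiest to illustrate without a GL context. The Python sketch below mimics on the CPU what the feature means for a triangle strip: a special index value ends the current strip and starts a new one, so several strips can be drawn from a single index buffer. The function name and the choice of restart value are just for the example; this is a model of the behaviour, not OpenGL code.

```python
def strips_to_triangles(indices, restart_index):
    """Expand a triangle-strip index list into individual triangles."""
    triangles = []
    strip = []
    # A trailing sentinel flushes the last strip.
    for index in list(indices) + [restart_index]:
        if index != restart_index:
            strip.append(index)
            continue
        for i in range(len(strip) - 2):
            a, b, c = strip[i], strip[i + 1], strip[i + 2]
            # Swap two vertices on odd triangles to keep a consistent winding.
            triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
        strip = []
    return triangles
```

For example, with `0xFFFF` as the restart value, the index list `[0, 1, 2, 3, 0xFFFF, 4, 5, 6]` yields two triangles for the first strip and one for the second, with no triangle connecting vertex 3 to vertex 4.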

There are also two new "features" that were not available as extensions before. The first is a CopyBuffer API that allows fast copies between buffer objects (VBO/PBO/UBO) and will also be useful for sharing buffers with OpenCL. The other is Signed Normalized Textures, new integer texture formats that represent values in the range [-1.0, 1.0].

Geometry shaders (GL_ARB_geometry_shader4) were not promoted, and maybe they never will be. The extension is not implemented by ATI and is not used much, since this feature is useful only in a few cases (due to the performance of implementations). Direct state access (GL_EXT_direct_state_access) was not promoted either. It is a very useful extension that reduces the cost of state changes, but it is really new (released with GL 3.0) and I didn't expect it to be promoted yet.

The deprecation model is a design mechanism introduced in GL 3.0 to allow outdated features and commands to be removed (the reverse of the extension mechanism). Core features are first marked as deprecated, then moved to an ARB extension, then eventually to an EXT or vendor extension, or removed entirely. The OpenGL 3.0 specification marks several features as deprecated, including the venerable glBegin/glEnd mechanism, display lists, matrix and attribute stacks, and the portion of the fixed function pipe subsumed by shaders (lighting, fog, texture mapping, and texture coordinate generation).

About deprecation: the specification is available in two formats, one with deprecated features (http://www.opengl.org/registry/doc/glspec31undep.20090324.pdf) and one with only "pure" GL 3.1 features (http://www.opengl.org/registry/doc/glspec31.20090324.pdf). An extension called ARB_compatibility has been introduced. If supported by an implementation, this extension ensures that all deprecated features are available. This mechanism avoids breaking compatibility for old GL applications by keeping every feature in the driver, while cleaning up the API and providing new high-performance paths. It seems to be a good mechanism, more convenient than the initial idea of creating specific contexts. NVIDIA, for instance, has promised to keep all deprecated features in their drivers to answer customer needs (mainly CAD customers, I think).

Once again, as with OpenGL 3.0, while ATI declared that they will support GL 3.1, NVIDIA announced beta support for GL 3.1 and released drivers: http://developer.nvidia.com/object/opengl_3_driver.html

To conclude, it's good to see the Khronos/ARB remaining so active since the release of OpenGL 3.0, and it's good to see OpenGL evolving in the right direction :-)

Links:

- The announcement: http://www.khronos.org/news/press/releases/khronos-releases-streamlined-opengl-3.1-specification/

- The specifications: http://www.opengl.org/registry/

- More information: http://www.g-truc.net/#news0152, http://www.skew-matrix.com/bb/viewtopic.php?f=3&t=4

 
I3D 2009 and NVIRT
TechnoBlog - GPU
Monday, 02 March 2009
I have spent the last few days in Boston attending the I3D conference (ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games), where I presented our paper on GigaVoxels. The presentation went well and a lot of people seemed interested in our method. I like the I3D conference because it's a good opportunity to meet people and to share and discuss everybody's research. In addition, there are usually a lot of GPU guys there ;-)
This year, the invited banquet speaker was Austin Robinson from the NVIDIA research group, and he exclusively announced NVIRT, the NVIDIA Ray-Tracing engine. NVIRT is a low-level API layered over CUDA, and it seems to use a lot of functionality from NVSG (NVIDIA scene graph).
 
 
The principle is to provide the API with an object scene graph and a ray generator; it then gives you the ray intersections through a traversal black box. It seems quite flexible, since intersection shaders can be written that compute arbitrary shading or launch secondary rays (for reflections, refractions or shadows). Shaders are written in CUDA and the whole API generates PTX assembly. Efficiency strategies can be defined for rays: they can be configured to return the closest intersection or to terminate on the first intersection found, which is useful for shadow computations. Different acceleration structures seem to be available to store objects, like kd-trees for static objects and BVHs (Bounding Volume Hierarchies) for dynamic ones. The SDK seems to have been designed to be quite generic, to allow more than just ray-traced rendering (collision detection, illumination or, why not, AI). It should come with a lot of samples.
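
The difference between the two ray strategies can be shown with a few lines of Python. This is a generic sketch and has nothing to do with the actual NVIRT API, which was not public at the time of writing: the function names are invented, the scene is a flat list of spheres with no acceleration structure, rays are assumed to have a normalized direction, and rays starting inside a sphere are simply treated as misses.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Distance along a normalized ray to a sphere, or None if it misses."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)  # nearest of the two roots
    return t if t > 0.0 else None


def trace(origin, direction, spheres, any_hit=False):
    """Return (sphere index, distance) for a hit, or None.

    With any_hit=True the search stops at the first intersection found,
    which is enough for a shadow ray; otherwise the closest hit is kept.
    """
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius)
        if t is None:
            continue
        if any_hit:
            return index, t
        if best is None or t < best[1]:
            best = (index, t)
    return best
```

With two unit spheres centred at z = 10 and z = 5 (listed in that order) and a ray shot along +z from the origin, the closest-hit query returns the second sphere at distance 4, while the any-hit query stops at the first sphere it tests, at distance 9. For a shadow ray, knowing that something blocks the light is all that matters, so the early exit saves work.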
 
Since it runs on CUDA, it inherits CUDA's limitations, like the cost of context switches between the graphics API and CUDA and the current impossibility of sharing textures or render targets directly. That will initially limit the usability of the API for mixed algorithms, but it seems to be a really cool toy for testing ray-tracing algorithms, and it will give NVIDIA a good black box for enhancing ray-tracing support in their future hardware.
 
EDIT: It seems it will take a little time before something appears on the NVIDIA website, so since Austin shared his slides with some people present at I3D, Eric Haines put them on his blog: http://realtimerendering.com/downloads/NVIRT-Overview.pdf
 
GigaVoxels
TechnoBlog - GPU
Thursday, 18 December 2008

A first post to talk about my work as a PhD student! I have been doing a PhD for a little more than one year now at INRIA Rhone-Alpes in Grenoble, on the rendering and exploration of very (very!) large voxel scenes. More specifically, I am working on GPU voxel ray-casting and ray-tracing, complex GPU data structures and interactive streaming of large datasets. I am working in close collaboration with Fabrice Neyret, my PhD supervisor, on this project. Sylvain Lefebvre and Elmar Eisemann are also collaborating on it.

My research webpage can be found here: http://artis.imag.fr/Membres/Cyril.Crassin/


Voxel representations are useful for semi-transparent phenomena and for rendering advanced visual effects such as accurate reflections and refractions. Such representations also provide faster rendering and higher quality (allowing better and easier data filtering) than triangle-based representations for very complex meshes (typically those leading to one or more triangles per pixel).

 

The first result of this work was a research report named Interactive GigaVoxels. We now have a paper accepted at I3D 2009 that is better written, more complete and presents our latest work and results. It introduces a rendering system we have named GigaVoxels, our real-time voxel engine. GigaVoxels is based on a kind of lightweight sparse voxel octree data structure, a fully GPU-based voxel ray-caster that provides very high quality real-time rendering, and a data streaming strategy based on visibility information provided directly by the ray-casting. 3D mip-mapping is used to provide very high filtering quality, and the out-of-core algorithm allows rendering of scenes with virtually unlimited resolution.
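
To give a feel for what "sparse" and "3D mip-mapping" mean here, the following Python toy builds a pyramid of coarser and coarser voxel levels in which empty space is simply not stored. It is emphatically not the GigaVoxels data structure: it swaps the octree and its GPU node pools for plain dictionaries keyed by coordinates, and the function names and the averaging rule (missing children count as zero density) are assumptions made for the example.

```python
def build_sparse_pyramid(voxels, levels):
    """Build mip levels for a sparse voxel set.

    `voxels` maps (x, y, z) at the finest level to a density; empty space is
    simply absent. Level 0 is the finest, and `levels` coarser levels are
    added. Each coarser node stores the average density of its 2x2x2
    children, counting missing children as 0.
    """
    pyramid = [dict(voxels)]
    for _ in range(levels):
        coarse = {}
        for (x, y, z), density in pyramid[-1].items():
            key = (x // 2, y // 2, z // 2)
            coarse[key] = coarse.get(key, 0.0) + density / 8.0
        pyramid.append(coarse)
    return pyramid


def sample(pyramid, position, level):
    """Look up the density at a finest-level position, at a given mip level."""
    key = tuple(c >> level for c in position)
    return pyramid[level].get(key, 0.0)
```

A ray-caster built on such a pyramid would pick the level from the size of a pixel's footprint at the sample position, so distant regions are read from coarse, pre-filtered nodes and only the visible, nearby regions need their finest data loaded.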

GigaVoxels : Ray-Guided Streaming for Efficient and Detailed Voxel Rendering

 

 
OpenCL specification
TechnoBlog - GPU
Monday, 08 December 2008

The first specification of OpenCL (the Open Computing Language) has just been released by Khronos. The specification can be downloaded here:

http://www.khronos.org/registry/cl/

OpenCL is a general-purpose data-parallel computing API, initiated by Apple and then standardized within Khronos. It provides a common hardware abstraction layer for programming data-parallel architectures. It is very (very!) close to CUDA and is supported by both NVIDIA and AMD, but also by Intel and IBM (among others). This API does not specifically target GPUs, so we should also see CPU implementations of it, and maybe also a Cell implementation and a Larrabee implementation (when it is out)? NVIDIA is likely to be one of the first to provide an OpenCL implementation, since they already have their own well-proven CUDA API to build on.

In addition to being cross-platform and cross-architecture, one interesting feature of OpenCL is that it has been designed to provide good interoperability with OpenGL, especially for sharing data and textures. It is meant to be the direct competitor to the DX11 compute API.

 
NVIDIA first OpenGL 3.0 drivers released
TechnoBlog - GPU
Thursday, 14 August 2008

I am at the Siggraph OpenGL 3.0 BOF right now and NVIDIA has just given us the link to their first driver supporting OpenGL 3.0 (with some limitations for now).

The driver can be downloaded here:

http://developer.nvidia.com/object/opengl_3_driver.html 

I will do a complete post on OpenGL 3.0 in a few days, but as you may know, it is not the OpenGL 3.0 we were waiting for, since the complete rebuild of the API (the main idea of Longs Peak) has not been done. But it doesn't seem as bad as we might have feared: a deprecation and profiles mechanism has been introduced, and everybody here seems confident about its use and its capacity to speed up the evolution of OpenGL.

ATI is also here, and they announced their GL 3.0 drivers for Q1 2009... Hmm, I don't know why, but I doubt we will get fully functional GL 3.0 support from them by that date, even though it is already late compared to NVIDIA.

 
OpenGL 3.0 signs of life
TechnoBlog - GPU
Tuesday, 29 July 2008

We have been without news of OpenGL 3.0 for nearly 9 months now. A specification was promised for September 2007 at Siggraph last year, and then there was no news after an ARB member reported that the spec was not ready because there were some unresolved issues they had to address. This situation was starting to become really worrying, and there has been a lot of speculation about this delay (see the opengl.org forum). The most likely explanation is that disagreements among ARB members delayed the release of the spec.

But recently we had two comforting pieces of news tending to prove that OpenGL 3.0 is not dead. The first is the creation of the OpenGL 3.0 website, with announcements for the Siggraph OpenGL BOF:

    - "OpenGL 3.0 Specification Overview"

    - "OpenGL hardware and driver plans - AMD, Intel, NVIDIA"

    - "Developer's perspective on OpenGL 3.0 "

The second is a leaked picture of an NVIDIA presentation slide about their future driver release (codename Big Bang 2, see the news here), where one can make out a reference to OpenGL 3.0 support for September. This tends to prove that the specification is now almost done and that the IHVs are already working on implementations.

I will be at Siggraph this year and I will be at the OpenGL BOF. I hope I will be able to see the final spec there!

 

 
The Froggy FragSniffer
TechnoBlog - GPU
Friday, 07 March 2008

    For today's GPU generation, we know a lot more about the hardware mechanisms than for previous generations, in particular thanks to the window opened on this hardware by CUDA. But many hardware details are still hidden from the programmer, in particular the mechanisms used for primitive rasterisation and fragment shading. As understanding how fragments are scheduled among the G80 processing units is a critical point for the research I do with Fabrice Neyret as part of my PhD, I wrote a small program to investigate this point.

    The probing tool I wrote is called The Froggy FragSniffer and can be downloaded here (see Readme for details): http://www.icare3d.org/FragSniffer/FragSniffer_0.2.zip

    We also wrote a document that presents our motivations for this work, the experiments we have made so far and the results and answers we got: http://www.icare3d.org/GPU/CN08

    We hope this document will give you useful information. We don't want it to be closed: we want it to evolve through new experiments and also thanks to outside comments, which you can leave at the bottom of this page.

 

| Cyril Crassin | Website based on Joomla | January 2006 |