GPU-based Stimulus Generation

Support for GPU

Starting from version 2.0, Psykinematix brings support for GPU-based stimulus generation by taking advantage of the impressive computing power available in modern graphics cards.

Content

What is GPU Computing?
GPU Generation of Visual Stimuli
Setting up Psykinematix for GPU support
GPU Generation of Custom Stimuli
Pros and Cons

What is GPU Computing?

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel. The GPU is nowadays an essential part of modern graphics cards.

GPU in standard graphics cards (from Nvidia, AMD/ATI or Intel) can be programmed using the OpenGL Shading Language (GLSL). GLSL is a high-level shading language based on the syntax of the C programming language. It was created to give developers more direct control of the graphics pipeline without having to use assembly language or hardware-specific languages (like NVidia CUDA or ATI Stream). The GLSL programs (or shaders) utilizes the OpenGL API, and GLSL shaders are compiled and executed on the GPU (Graphics Processing Unit).

Here are some general benefits of using GLSL:

Cross-platform compatibility on multiple operating systems,
The ability to write GLSL programs that can be used on any hardware vendor's graphics card that supports the OpenGL Shading Language,
Each hardware vendor includes a GLSL compiler in their driver, thus allowing each vendor to create code optimized for their particular graphics card's architecture.

Though the detailed description of GLSL is beyond the scope of this chapter and learning GLSL is not at all necessary, knowing how it works is useful to better understand its limitations in the context of stimulus generation. To learn more about GLSL see the official GLSL documentation.

The previous versions of Psykinematix were relying entirely on the fixed function pipeline of OpenGL 1.* where stimuli essentially consisted of precomputed textures stored in video memory:

Starting from version 2.0, Psykinematix is now relying on the programmable pipeline introduced in OpenGL 2.*:

As you can see, some key areas (in orange) of the fixed function pipeline have been replaced with shader functions in the programmable pipeline. Since shaders are actually GLSL programs this provides a great deal of flexibility and power to OpenGL rendering. Shaders are written using a C-like syntax and like C programs they need to be compiled before being loaded and executed on the GPU. Then because the results of the shader execution are sent to the next stage of the OpenGL pipeline, they have an immediate effect on the video output on a frame-basis. This results in quasi real-time update of the video stream and allows stimuli to be generated on the fly.

As shown in the programmable pipeline above, there are different types of shaders (a vertex shader, a fragment shader and even a geometry shader in more recent version of OpenGL). One of particular interest is the fragment shader which is executed for each pixel, particularly suitable for generating 2D visual stimuli. However rather than being executed in a serial loop (as you would do in C or Matlab), the fragment shader is executed for each pixel and often in parallel on the GPU, with each execution running simultaneously on a separate GPU thread. The number of shader processors (or cores) on a graphics card determines how many can be executed at one time. This makes shader programs incredibly efficient, and provides the programmer with a simple API for implementing highly parallel computation. For example, a GeForce GT 650M (found on a recent Mac Book Pro) has 384 of these shader processors, and high-end graphics cards can have more than 1,000 of them.

Visual stimuli that benefit the most from the GPU are those that previously could take either a long time to precompute or a huge amount of memory to be stored, in particular fullscreen stimuli with time-varying properties. For example a non-periodic & chromatic stimulus rendered fullscreen (1280 x 1024 x 75Hz) and presented for a duration of 1 second could require up to 375MB of video memory (for optimal performance texture-based stimuli are stored on the graphics card memory to minimize skipped frames). This is almost the maximum amount of video memory available on a 2 year-old MacBook Air model! With a GPU implementation based on GLSL shader the computation would be almost instantaneous, use virtually no video memory, and require no precomputation time beyond the shader compilation.

There are however some limitations to what shaders can compute because the GPU hardware can be limited in a number of ways:

the maximum amount of instructions a shader can have. It varies from GPU to GPU, and if a shader is too large, compilation will generate an error.
the number of shader processors (more the better),
the shader clock speed (faster the better).

So depending on the graphics card, some stimuli may be easier to generate in real-time (i.e. at frame rate) than others. The next sections detail Psykinematix GPU support, its benefits and limitations.

GPU Generation of Visual Stimuli

Memory and CPU usage considerations:

Imagine you want to present some spatiotemporal and chromatic stimulus with a diameter of 13 degrees of visual angle for a duration of 10 seconds from a distance of 57 cm on a 17" display with a resolution of 1280 x 1024 and a frame rate of 60 Hz. This would require to generate and display 600 frames with a resolution of about 512 x 512 pixels, each pixel encoded with four 8-bit components (red, green, blue, alpha). In the worse case scenario where each temporal frame of the stimulus sequence is unique, the amount of data to generate would be about 600MB.

In Psykinematix and without GPU support, this stimulus sequence would be:

precomputed as 2D arrays, which may take some significant time depending on the CPU speed (e.g. about 5 seconds on a recent MacBook Pro for a drifting grating even if accelerated through vectorial optimization),
pre-stored as a sequence of textures directly in the video memory of the graphics card (a recent MacBook Pro may have a NVIDIA GeForce GT 650M graphics card with "only" 512 MB of video memory),
and presented at run-time at the specified display refresh rate (i.e. 1 texture/frame).

There are some obvious problems with this texture approach:

waiting several seconds for the stimulus generation may be impractical, in particular if this stimulus sequence needs to be regenerate for each experimental trial,
the graphics card may not have enough memory to store the whole texture sequence,
then if the texture sequence is not readily available to the graphics card, stimulus frame may be displayed with some delay or even skipped resulting in a bad timing of the stimulus presentation.

Note that if the stimulus generation requires more video memory than available, the operating system will not fail in general but use the main memory (RAM) or even disk space as temporary storage and will transfer the data to the video memory on-demand. However this would negatively impact the stimulus timing if it takes more than 1 frame duration to perform the data transfer which would result in skipped frames. Such bad timing in stimulus presentation must be avoided at all costs, this is why Psykinematix enforces the use of the video memory to store precomputed stimuli and informs the user that the stimuli memory requirements are too large relative to the available video memory at the precomputation stage.

Without resorting to some clever work-around to minimize the memory usage and precomputation time as described in the "Tips & Techniques" chapter, even the most basic stimulus such as a drifting chromatic grating could fill the entire video memory. Most of the time, Psykinematix users would rather resort to adjusting some parameters in their experimental design (i.e. reduce the size or duration of the stimulus) or even modifying the design so the stimulus size and computation time get more in line with the hardware and other experimental requirements. These memory and cpu usage problems could greatly limit the complexity of experiments designed by vision scientists: for example, they may prevent the use of long adapting dynamic stimuli or constrain to use periodic stimuli which may even be incompatible with the actual goal of the experimental paradigm!

Thinking about getting a graphics card with more video memory? A fullscreen version of the same stimulus would require 3 GB of video memory which is more than any high-end graphics card available on the market can hold, and this would not solve the problem with too long precomputation time. Until now getting a more powerful computer was the only solution to partially alleviate the problem of generating complex, dynamic and long visual stimuli. Here comes GPU computing through GLSL shaders: if you are already familiar with Psykinematix' custom stimuli, you may simply think of (fragment) shaders as similar "mini-programs" applied at each pixel position but, rather than being executed sequentially on the CPU to generate the 2D stimulus array, they are executed in parallel directly on the GPU.

With GPU support in Psykinematix, the same stimulus would be:

precompiled as a GLSL program and loaded on the GPU as a fragment shader (takes less than 10 ms for the same grating stimulus),
and executed at run-time as a GPU shader at frame rate.

This GPU implementation based on GLSL shader solves both the memory and cpu usage problems that hinder the texture approach while maintaining a good presentation timing: it requires no precomputation time beyond the shader compilation, uses virtually no video memory, and the stimulus generation can be almost instantaneous.

GPU Support in Psykinematix:

Psykinematix v2.0 brings GPU support for visual stimuli that benefit the most of it, that is:

We have written optimized shaders (i.e. GLSL programs that run directly on GPU) for grating- and checkerboard-based stimuli. Since the custom stimuli already use Matlab/C-like expressions they can now be used directly as GLSL programs that are compiled on the fly, and automatically loaded and executed on the GPU.

So using the GPU shaders in Psykinematix does not require any additional knowledge: you continue specify your stimuli as you used to do, and write custom stimuli using the same Matlab-like mathematical expression (though you still need to be aware of some differences in functions set between the "Texture" mode and the GPU mode, detailed in the "GPU Generation of Custom Stimuli" section below).

Note that not all stimuli can benefit from GPU whether because of compatibility issues or because there is no direct benefit. If the GPU cannot be used then the stimulus generation automatically falls back on the CPU as in the previous versions of Psykinematix.

You can check how well your graphics cards can accommodate your stimulus complexity by previewing the stimuli in GPU mode by clicking on the "Preview" button (note that the small preview inset always shows the texture version): the GPU-based stimulus is then previewed inside a resizable window with an indication of the achievable frame rate relative to the display frame rate. If the stimulus generation is not limited by the GPU power, then the indicated frame rate should approximately match the display refresh rate. A significantly lower frame rate compared to the display frame rate would indicate that the stimulus complexity is surpassing the GPU computing power. Note that the GPU preview is updated whenever a change is made to the stimulus parameters and that the preview window can be set to fullscreen by clicking the green button in its top-left corner.

Setting up Psykinematix for GPU Support

Psykinematix' use of GPU is mostly transparent to the end user, you simply have to click on one of the available check boxes:

in the Timing Preferences to activate GPU globally (i.e. for all experiments and stimuli):
in the Experiment Display Settings to activate GPU for a specific experiment (has not effect if the global GPU setting above has been activated):
in the Rendering Control Settings to activate GPU for a specific stimulus (has not effect if one of the GPU settings above has been already activated):

When activated Psykinematix will use the GPU whenever possible to generate the visual stimuli. If the GPU cannot be used because the compilation of the GLSL program fails or because the stimulus does not support it (see next section), Psykinematix will simply fall back on the non-GPU implementation (i.e. in texture mode).

GPU Generation of Custom Stimuli

Psykinematix' custom stimuli use Matlab-like expressions to generate virtually any kind of visual stimulus that can be described analytically. In the texture mode, these expressions are already highly accelerated through vectorial optimization thanks to the SIMD instruction sets like AltiVec or SSE found in modern CPUs. But because these expressions are evaluated on a pixel-basis and also use a syntax very similar to the C language, they are particularly suitable for a GPU implementation. They are actually so similar to GLSL programs that they can be compiled as such without any changes most of time.

Because this is essentially transparent to the user, you can check whether your stimulus can be implemented as a GLSL shader and how well your graphics cards can accommodate its complexity by previewing the stimuli in GPU mode by clicking on the "Preview" button (note that the small preview inset always shows the texture version): the GPU-based stimulus is then previewed in a resizable window with an indication of the achievable frame rate relative to the display frame rate. If the stimulus generation is not limited by the GPU power, then the indicated frame rate should approximately match the display refresh rate. A significantly lower frame rate compared to the display frame rate would indicate that the stimulus complexity is surpassing the GPU computing power and that you may have to further optimize the expression.

Note that the GLSL program is recompiled any time a change is made in the expression, which is immediately reflected in the GPU preview as well as when changing values in the parameter table.

If the stimulus cannot be implemented (compiled) as a GLSL shader, then an error message will explain the cause and you should revise the expression to solve the problem (see below to learn mode about the differences in function sets between the texture and GPU modes). In case of compilation error, Psykinematix will automatically fall back to texture generation on the CPU as in previous versions of Psykinematix.

Useful tips when using the GPU mode:

Use the time variable directly in the expression. The GPU preview will show the resulting dynamic stimulus in real-time.
Set the size parameter to 0 to generate a fullscreen stimulus.

Differences between texture and GPU modes:

There are several difference between the "Texture" mode and the GPU mode you should be aware of:

in syntax, though the differences are fairly small and Psykinematix corrects most of them automatically.
in supported built-in variables: in addition of the built-in variables x, y, r and theta you can now use time directly in the GPU expression.
in functions set : some functions available to generate textures cannot be implemented in GLSL shaders.

Psykinematix actually uses only a subset of the GLSL v1.10 functions (see www.opengl.org/sdk/docs/manglsl/xhtml for an exhaustive list) and tries to keep the use of GLSL for stimulus generation as transparent as possible to the end user: for this purpose Psykinematix implements also some additional GPU functions used by the texture mode but absent in GLSL v1.10 as well as some useful functions from GLSL that were previously absent in the texture mode like step( ) and hsmoothstep( ). Psykinematix even adds some entirely new functions like the pnoise() to generate Perlin noise (an approximation of Gaussian filtered noise well adapted to a GPU implementation).

Unsupported functions:

Most of the functions available in custom stimuli in texture mode are also supported in GPU mode (see the custom stimuli syntax and mathematical expressions sections for a list of functions and operators). However there are some functions that are not supported in GPU mode because the parallel execution of the fragment shader would require multiple passes to access neighboring pixels for filtering or comparison purpose then greatly reducing the ability to render each frame without skipping any (i.e. so far Psykinematix only supports functions whose GLSL implementation requires a single pass in the OpenGL pipeline):

Matrix operations: shift( ), dshiftp( ), conv( ), phase(), mag( ), ifft( ), norm( ), scale( )
Matrix to Scalar operations: min( ), max( ), mean( ), sum( ), length( )
Special functions: gnoise( ), bessj0( )

These functions are not available in GLSL as they can be quite hard to implement directly on the GPU without affecting the rendering performance. We recommend trying to use a more direct approach to generate your stimuli instead. Sometimes an approximation is sufficient: for example Gaussian filtered noise can be approximated using Perlin noise that can be generated using the new built-in pnoise() function. See the texture and GLSL versions of the "Center-Surround Noise" demo.

Errors emitted in GPU mode:

GLSL is quite sensitive to the type of operands and generally does not allow operations with operands of different types. Casting to the same type is often mandatory as indicated by the error message:

Incompatible types (float and bool) in assignment

The '<' operator is a logical operator that returns a boolean value, and a boolean value cannot be assigned to a float variable. Use the new step() function instead of < , > operators or surround the "val1 < val2" sub-expressions with the float() casting function like float(val1 < val2). Note that Psykinematix does its best to automatically detect the need for casting, so this error should be rare.
'^' does not operate on 'float' and 'float'

The ^ operator is not compatible with the GLSL language. Replace x^y with pow(x,y). Psykinematix does its best to automatically replace ^ with pow(,) , so this error should be rare.

Important Note: GLSL results for pow(x,y) are undefined for x < 0 (calculators typically return an error with the x^y operation for x<0)
'||' does not operate on 'float' and 'float'

The '|' operator is a logical operator that operates only on boolean operands. Use a comparison operator to cast the float operand to a boolean value.
'*' does not operate on 'bool' and 'float'

The '*' operator applies only between operands of same type. Cast the operands to the same type.

Here is an example of errors that could occur when attempting to generate a custom stimulus in GPU mode that would generate otherwise correctly in texture mode. In Texture mode you could write (see Mapping example in Stimuli/Custom demos):

z = ( radialenv | wedgenv ) * angular * radial

Several errors would be emitted when running in GPU mode:

'||' does not operate on 'float' and 'float'

You would have to cast the radialenv and wedgenv variables to boolean values by using for example a comparison operator (>):

z = ( radialenv>0 | wedgenv>0 ) * angular * radial

Another error will be then emitted:

'*' does not operate on 'bool' and 'float'

You would have to apply the float() casting on the boolean sub-expression:

z = float( radialenv>0 | wedgenv>0 ) * angular * radial

or use separate variables (a variables is always of float type):

env = radialenv>0 | wedgenv>0

z = env * angular * radial

Pros and Cons

The advantages of the GPU mode over the texture mode are clear in terms of computational requirements. In texture mode, stimuli are generated by the CPU in a conventional way where a stimulus 2D array is precomputed and stored in video memory for every frame. In GPU mode, after compiling a GLSL program, the stimuli are generated at run-time directly on the GPU (in parallel on a pixel-basis). Another worth mentioning advantage of GPU-based stimulus generation is its seamless integration in Psykinematix!

Pros:

Real-time stimulus generation
Better timing
Quasi-null usage of memory
Allow very long temporal presentation
Effect of changing parameters is immediate
Seamless integration (not effort needed from the user in most cases, and no need to learn GLSL)
Could be used to map receptive fields in neurophysiological setup
Supported by iOS devices
Automatic fallback to software rendering (i.e. texture-based) if GLSL shader fails to compile or is not supported for a specific stimulus

Cons:

Some complex functions like ifft( ) and conv( ) are not available
Some small difference in syntax (though Psykinematix has work-around for most syntax issues)
The shader complexity depends on the graphics card (frame rate can be tested in GPU preview mode)
Required a relatively modern graphics card (NVidia GeForce FX or better, ATI Radeon 9500 or better, Intel GMA 900 or better)
Some implementations of GLSL functions are hardware-dependent (i.e. slightly different results on different graphics card like unoise( ) )

OpenGL is a registered trademark of Silicon Graphics, Inc. All other brand and product names are trademarks of their respective holders. Any omission of such trademarks from any product is regretted and is not intended as an infringement on such trademarks.