GPU Express

Large data solutions…simplified

The Andor GPU Express library has been created to simplify and optimize data transfers from the camera to a CUDA-enabled NVIDIA Graphics Processing Unit (GPU) card, facilitating accelerated GPU processing as part of the acquisition pipeline. GPU Express integrates easily with SDK3 (Windows) for Andor sCMOS cameras, providing a user-friendly but powerful solution for managing high-bandwidth data flow. It is ideal for data-intensive applications such as Light Sheet Microscopy, Super-Resolution Microscopy and Adaptive Optics.

Why consider real time GPU processing?

  • Real-time feedback and improved visualization
  • A platform to save on memory resources via the use of standard compression algorithms and/or the removal of redundant information at acquisition
  • The ability to free up disk space by storing only relevant data, reducing the volume of data written to disk

GPU Benefits

  • Enhanced convenience, afforded by simple, optimized GPU data management
  • Guaranteed optimal data throughput
  • Accelerated real time processing frame rates
  • Superb, easily accessible documentation and examples

It is possible to send data to a GPU card for processing without the GPU Express library, by calling a GPU processing library directly (with NVIDIA cards, this would be the CUDA library). However, the user must then explicitly manage the buffers that hold the data on the CPU and the GPU, and must also copy the data to and from the GPU via the CPU.

The GPU Express library provides a simpler solution by managing all of the CPU and GPU buffers required to hold and pass the acquired data. Its copy functions are also optimised to reduce latency during copies to and from the GPU, accelerating real-time processing frame rates for a given GPU card. Non-expert users therefore benefit from this optimisation without extra effort, while more expert users can focus on the algorithms to be run on the GPU and use a simplified API for their optimised copies.

Features and Benefits
✔ Simple API – Provides a clean interface that integrates easily with Andor’s SDK3 for Windows library, reducing development time.
✔ Accessible and thorough support manual, including tutorial and multiple user scenario examples.
✔ Management of all required buffers in the GPU memory space.
✔ Management of intermediate CPU buffers for copy to GPU memory from camera.
✔ Provision of functions for safe allocation and de-allocation (and locking/unlocking) of user output buffers on the CPU side to store the result of GPU processing.
✔ Provision of functions for safe locking/pinning and unlocking/unpinning of user output buffers on the CPU side (as required for asynchronous memory copies) to store the result of GPU processing.
✔ Support for multiple camera acquisition.
✔ Explicit Synchronisation call, to ensure that all previous copies and processing within a specified GPU Express Path are complete before continuing.
✔ Multiple copy functions - copies to/from GPU and CPU as required for a particular application.
✔ Management of CUDA streams - provides overlapping of copies to/from the GPU memory with accelerated GPU processing.
✔ Facilitates multiple GPU processing.
✔ A CUDA accelerated version of Andor’s unpacking library (CUDA_atunpackerlib.lib) is also provided for fast conversion between Pixel Encoding types and to provide unpacking of data due to granularity restrictions on the row width, as is the case with Camera Link frame grabbers.
✔ Includes Nearest Neighbour De-blurring library:
  • The library provides a number of functions implemented in CUDA to apply the well-known ‘Nearest Neighbour’ and ‘No Neighbour’ algorithms for 3D and 2D input microscopy datasets respectively.
  • The output from an intermediate de-noising stage of the algorithm is also provided as an option to the end-user.
  • The Nearest/No Neighbour algorithms are based upon improving an image by sharpening the edges of structures, using a sharpening filter called an ‘Unsharp Mask’.
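
The unsharp mask mentioned above can be illustrated with a minimal CPU-only sketch. This is not Andor's CUDA implementation: the names Image, box_blur_3x3 and unsharp_mask are hypothetical, and a 3x3 box blur stands in for whatever blur kernel the library actually applies. It only shows the general idea of sharpening by adding back the difference between an image and a blurred copy of itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D image. Hypothetical helper type for this sketch only.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels;  // width * height values

    // Read a pixel; coordinates outside the image are clamped to the border.
    float at(long x, long y) const {
        x = std::max(0L, std::min(x, static_cast<long>(width) - 1));
        y = std::max(0L, std::min(y, static_cast<long>(height) - 1));
        return pixels[static_cast<std::size_t>(y) * width +
                      static_cast<std::size_t>(x)];
    }
};

// Simple 3x3 box blur (assumed stand-in for the real blur kernel).
Image box_blur_3x3(const Image& in) {
    assert(in.pixels.size() == in.width * in.height);
    Image out{in.width, in.height, std::vector<float>(in.pixels.size())};
    for (std::size_t y = 0; y < in.height; ++y) {
        for (std::size_t x = 0; x < in.width; ++x) {
            float sum = 0.0f;
            for (long dy = -1; dy <= 1; ++dy)
                for (long dx = -1; dx <= 1; ++dx)
                    sum += in.at(static_cast<long>(x) + dx,
                                 static_cast<long>(y) + dy);
            out.pixels[y * in.width + x] = sum / 9.0f;
        }
    }
    return out;
}

// Unsharp mask: sharpened = original + amount * (original - blurred).
Image unsharp_mask(const Image& in, float amount) {
    const Image blurred = box_blur_3x3(in);
    Image out = in;
    for (std::size_t i = 0; i < in.pixels.size(); ++i)
        out.pixels[i] = in.pixels[i] + amount * (in.pixels[i] - blurred.pixels[i]);
    return out;
}
```

With amount set to 1, a uniform image is returned unchanged, while a single pixel of value 9 at the centre of an otherwise zero 3x3 image becomes 17, with -1 in the surrounding pixels. The edge is exaggerated at the cost of overshoot next to it, so real code would typically clamp the result to the valid pixel range.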

GPU Express Path

  • Provides a means to encapsulate a set of resources managed by the library to aid in the acquisition of a dataset
  • Each GPU Express Path references a unique ‘Input CPU Buffer Array’, ‘Input GPU Buffer Array’, and CUDA stream managed by the library
  • Each may also reference an optional array of ‘Output GPU Buffers’, and an optional array of ‘Output CPU Buffers’
  • Data from either the Input GPU Buffer or an Output GPU Buffer may be copied into an Output CPU Buffer via the GPU Express library API functions
  • Copies between buffers within the library are only allowed between buffers within the same GPU Express Path. Each copy within a GPU Express Path takes place within the same CUDA stream
  • As each GPU Express Path has its own unique CUDA stream, this provides us with the possibility to concurrently acquire and process multiple datasets within separate GPU Express Paths
  • Aids synchronisation: operations issued to a CUDA stream are guaranteed to execute in the order in which they were issued, so all operations within a GPU Express Path are ordered with respect to each other.
  • When utilising the library with the SDK3, a GPU Express ‘Set’ can be readily configured to manage a number of GPU Express Paths, facilitating performance optimization.
  • Example layout: a GPU Express Path containing an Input CPU Buffer, an Input GPU Buffer, 2 optional Output GPU Buffers and 2 optional Output CPU Buffers. All copies between any buffers in a GPU Express Path take place within the same (library-managed) CUDA stream.
  • The number of buffers of each type, and the sizes of the buffers, may vary. Example layouts of a GPU Express Path:
    • 1 Input CPU Buffer, 1 Input GPU Buffer and 1 Output CPU Buffer.
    • 1 Input CPU Buffer, 1 Input GPU Buffer, 1 Output GPU Buffer and 1 Output CPU Buffer. The output buffers in this case are of a different size to the input buffers.
    • 1 Input CPU Buffer, 1 Input GPU Buffer, 2 Output GPU Buffers and 2 Output CPU Buffers. The output buffers in this case are of a different size to the input buffers, and the 2nd buffer in each output array is of a different size to the 1st.
    • 2 Input CPU Buffers, 2 Input GPU Buffers, 1 Output GPU Buffer and 1 Output CPU Buffer. The 2nd buffer in each input array is of a different size to the 1st.
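
The rules described for a GPU Express Path (copies only between buffers of the same path, and operations executed in the order they were issued) can be modelled with a small host-only sketch. Everything here is an assumption made for illustration: Buffer, Path, enqueue_copy, enqueue and synchronise are hypothetical names, not GPU Express API calls, and plain vectors stand in for CPU and GPU memory. A real CUDA stream starts work as soon as the device is free; this model defers all work until synchronise() purely to make the ordering visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Host-only model for illustration; none of these names come from GPU Express.
struct Buffer {
    int path_id;                       // the path that owns this buffer
    std::vector<unsigned char> bytes;  // stands in for CPU or GPU memory
};

class Path {
public:
    explicit Path(int id) : id_(id) {}

    // Buffers within one path may have different sizes.
    Buffer make_buffer(std::size_t size) const {
        return Buffer{id_, std::vector<unsigned char>(size)};
    }

    // Queue a copy. Both buffers must belong to this path, and both must
    // outlive the next synchronise() call because they are held by reference.
    void enqueue_copy(const Buffer& src, Buffer& dst) {
        if (src.path_id != id_ || dst.path_id != id_)
            throw std::invalid_argument("copies must stay within one path");
        if (dst.bytes.size() < src.bytes.size())
            throw std::length_error("destination buffer too small");
        pending_.push([&src, &dst] {
            std::copy(src.bytes.begin(), src.bytes.end(), dst.bytes.begin());
        });
    }

    // Queue an arbitrary processing step (a kernel launch in real code).
    void enqueue(std::function<void()> op) {
        assert(static_cast<bool>(op));  // reject empty operations
        pending_.push(std::move(op));
    }

    // Run everything queued so far, strictly in submission order.
    void synchronise() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    int id_;
    std::queue<std::function<void()>> pending_;  // models one stream
};
```

Because each Path object owns its own queue, two paths never block each other in this model, which mirrors the point that separate paths can acquire and process separate datasets concurrently. An attempt to copy between buffers created by different Path objects throws std::invalid_argument.
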
Customer Testimonials
"I'm really impressed. As with the Andor SDK3, you provide a clean interface, and the library is well designed. I'm particularly amazed about the documentation. Reading through it, it all made complete sense. It seems that lots of the tedious Cuda buffer management should be greatly simplified using GPU Express."

Dr Benjamin Schmid, Huisken Lab, MPI of Molecular Cell Biology and Genetics, Dresden, Germany