Monday, May 20, 2013

NVIDIA CUDA™ Parallel Programming


This post is about NVIDIA CUDA parallel programming. It explains what CUDA is, gives some historical background, and ends with links for downloads and further information.


What is CUDA?
CUDA™ is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for GPU computing with CUDA. Here are a few examples:
  • Identify hidden plaque in arteries: Heart attacks are the leading cause of death worldwide. Harvard Engineering, Harvard Medical School and Brigham and Women's Hospital have teamed up to use GPUs to simulate blood flow and identify hidden arterial plaque without invasive imaging techniques or exploratory surgery.
  • Analyze air traffic flow: The National Airspace System manages the nationwide coordination of air traffic flow. Computer models help identify new ways to alleviate congestion and keep airplane traffic moving efficiently. Using the computational power of GPUs, a team at NASA obtained a large performance gain, reducing analysis time from ten minutes to three seconds.
  • Visualize molecules: A molecular simulation called NAMD (nanoscale molecular dynamics) gets a large performance boost with GPUs. The speed-up is a result of the parallel architecture of GPUs, which enables NAMD developers to port compute-intensive portions of the application to the GPU using the CUDA Toolkit.
Background

GPU Computing: The Revolution

Computing is evolving from "central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new computing paradigm, NVIDIA invented the CUDA parallel computing architecture that is now shipping in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application developers.

In the consumer market, nearly every major consumer video application has been, or will soon be, accelerated by CUDA, including products from Elemental Technologies, MotionDSP and LoiLo, Inc.
CUDA has been enthusiastically received in the area of scientific research. For example, CUDA now accelerates AMBER, a molecular dynamics simulation program used by more than 60,000 researchers in academia and pharmaceutical companies worldwide to accelerate new drug discovery.

In the financial market, Numerix and CompatibL announced CUDA support for a new counterparty risk application and achieved an 18X speedup. Numerix is used by nearly 400 financial institutions.

An indicator of CUDA adoption is the ramp of the Tesla GPU for GPU computing. There are now more than 700 GPU clusters installed around the world at Fortune 500 companies ranging from Schlumberger and Chevron in the energy sector to BNP Paribas in banking.

And with the recent launches of Microsoft Windows 7 and Apple Snow Leopard, GPU computing is going mainstream. In these new operating systems, the GPU will not only be the graphics processor, but also a general purpose parallel processor accessible to any application.

There are multiple ways to tap into the power of GPU computing: you can write code in CUDA C/C++, OpenCL, DirectCompute, CUDA Fortran and others.

It is also possible to benefit from GPU compute acceleration through GPU-enabled tools and libraries such as MATLAB, CULA and others.

Many widely adopted commercial codes have been developed to use GPU computing. Visit NVIDIA's vertical solutions page to find out if the software that you use has already been ported.



You're faced with imperatives: Improve performance. Solve a problem more quickly. Parallel processing would be faster, but the learning curve is steep – isn't it?

Not anymore. With CUDA, you can send C, C++ and Fortran code straight to the GPU, no assembly language required.

Developers at companies such as Adobe, ANSYS, Autodesk, MathWorks and Wolfram Research are waking that sleeping giant – the GPU – to do general-purpose scientific and engineering computing across a range of platforms.

Using high-level languages, GPU-accelerated applications run the sequential part of their workload on the CPU – which is optimized for single-threaded performance – while accelerating parallel processing on the GPU. This is called "GPU computing."
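
As a rough illustration of that split, here is a minimal CUDA C++ sketch (it is not taken from NVIDIA's material). The CPU prepares the data and checks the result sequentially, while the GPU adds the two arrays in parallel, one element per thread. The names vectorAdd, addOnGpu and check are hypothetical helpers made up for this example; the cudaMalloc, cudaMemcpy and cudaFree calls are the standard CUDA runtime API. Assuming the file is saved as vector_add.cu, it should build with something like "nvcc vector_add.cu -o vector_add".

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Parallel part: each GPU thread adds one pair of elements.
__global__ void vectorAdd(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // guard: the last block may have spare threads
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: stop with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host wrapper: copy inputs to the GPU, launch the kernel,
// and copy the result back. Assumes a and b have the same length.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b)
{
    const int n = static_cast<int>(a.size());
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> out(n);
    if (n == 0) return out;

    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc(dA)");
    check(cudaMalloc(&dB, bytes), "cudaMalloc(dB)");
    check(cudaMalloc(&dOut, bytes), "cudaMalloc(dOut)");
    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice), "copy a to device");
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice), "copy b to device");

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    vectorAdd<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);
    check(cudaGetLastError(), "kernel launch");

    // This copy waits for the kernel to finish before reading the result.
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost), "copy result to host");

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}

int main()
{
    // Sequential part on the CPU: build the inputs.
    const int n = 1 << 20;
    std::vector<float> a(n), b(n);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * static_cast<float>(i);
    }

    std::vector<float> sum = addOnGpu(a, b);

    // Sequential part on the CPU: verify the GPU result.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (sum[i] != a[i] + b[i]) ++mismatches;
    }
    std::printf("%d elements added on the GPU, %d mismatches\n", n, mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The launch configuration (256 threads per block) is just a common starting point, not a tuned value.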

GPU computing is possible because today's GPU does much more than render graphics: It sizzles with a teraflop of floating-point performance and crunches application tasks designed for anything from finance to medicine.

CUDA is widely deployed through thousands of applications and published research papers and supported by an installed base of over 300 million CUDA-enabled GPUs in notebooks, workstations, compute clusters and supercomputers.

Visit CUDA Zone for examples of applications in diverse vertical markets… and awaken your GPU giant.

History of GPU Computing

The first GPUs were designed as graphics accelerators, supporting only specific fixed-function pipelines. Starting in the late 1990s, the hardware became increasingly programmable, culminating in NVIDIA's first GPU in 1999. Less than a year after NVIDIA coined the term GPU, artists and game developers weren't the only ones doing ground-breaking work with the technology: Researchers were tapping its excellent floating-point performance. The General Purpose GPU (GPGPU) movement had dawned.

But GPGPU was far from easy back then, even for those who knew graphics programming languages such as OpenGL. Developers had to map scientific calculations onto problems that could be represented by triangles and polygons. GPGPU was practically off-limits to those who hadn't memorized the latest graphics APIs until a group of Stanford University researchers set out to reimagine the GPU as a "streaming coprocessor."
In 2003, a team of researchers led by Ian Buck unveiled Brook, the first widely adopted programming model to extend C with data-parallel constructs. Using concepts such as streams, kernels and reduction operators, the Brook compiler and runtime system exposed the GPU as a general-purpose processor in a high-level language. Most importantly, Brook programs were not only easier to write than hand-tuned GPU code, they were seven times faster than similar existing code.

NVIDIA knew that blazingly fast hardware had to be coupled with intuitive software and hardware tools, and invited Ian Buck to join the company and start evolving a solution to seamlessly run C on the GPU. Putting the software and hardware together, NVIDIA unveiled CUDA in 2006, the world's first solution for general-purpose computing on GPUs.


Getting Started - Parallel Computing

Whether you are porting an existing application, designing a new application, or just want to get your work done faster using the applications you have, these resources will help you get started.

GPU Acceleration in Existing Applications

Many scientists, engineers, and professionals can realize the benefits of parallel computing on GPUs simply by upgrading to GPU-accelerated versions of the applications they already use.  Examples include LabVIEW, Mathematica, MATLAB, and many more…

Developing Your Own Applications

If you are developing applications or libraries, first decide whether you want to take advantage of existing libraries that are already optimized for parallel computing on GPUs.  If all the functionality you need already exists in a library, you may simply need to use these libraries in your application.  Even if you know you want to write your own custom code for the GPU, it’s worth reviewing the available libraries to see what you can leverage.
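
To give a feel for the library route, here is a small sketch using Thrust, a C++ template library that ships with the CUDA Toolkit. The text above does not name a specific library, so Thrust is my own pick for illustration, and sumOneToN is a hypothetical function name. Note that no kernel is written by hand: the library runs the fill and the reduction on the GPU.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>

// Hypothetical example function: sums 1 + 2 + ... + n on the GPU.
long long sumOneToN(int n)
{
    // Allocate n elements in GPU memory.
    thrust::device_vector<long long> values(n);

    // Fill with 1, 2, ..., n on the device.
    thrust::sequence(values.begin(), values.end(), 1LL);

    // Parallel reduction on the device, starting from 0.
    return thrust::reduce(values.begin(), values.end(), 0LL);
}

int main()
{
    std::printf("sum of 1..1000 = %lld\n", sumOneToN(1000));
    return 0;
}
```

Compiled with nvcc, this should print 500500 for the sum of 1 to 1000. If a library call like this covers what you need, you may not have to write any device code yourself.
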
If you will be writing your own code for the GPU, there are many available language solutions and APIs, so it’s worth reviewing the options and selecting the solution that best meets your needs.

Approach                      Examples
Application Integration       MATLAB, Mathematica, LabVIEW
Implicit Parallel Languages   PGI Accelerator, HMPP
Abstraction Layer/Wrapper     PyCUDA, CUDA.NET, jCUDA
Language Integration          CUDA C/C++, PGI CUDA Fortran
Low-level Device API          CUDA C/C++, DirectCompute, OpenCL

Once you have decided which libraries and language solution or API you’re going to use, you’re ready to start programming. If you selected a solution provided by NVIDIA, download the latest CUDA Toolkit and review the Getting Started Guide. There’s also a collection of essential training materials, webinars, etc. on our CUDA Education & Training page.

Choose Your Development Platform

With over 300M CUDA-architecture GPUs sold to date, most developers will be able to use the GPU they already have to get started. When you’re ready to test and deploy your applications, make sure you review our current product lines and OEM solutions to select the best systems for your needs.
  • Tesla products are designed for datacenter and workstation computing applications
  • Quadro products are designed for professional graphics and engineering applications
  • GeForce products are designed for interactive gaming and consumer applications
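
If you are not sure whether the machine in front of you has a CUDA-capable GPU, a short device query along these lines can tell you. This is a sketch built on the CUDA runtime calls cudaGetDeviceCount and cudaGetDeviceProperties; countCudaDevices is a hypothetical helper name, and treating a runtime error as "zero devices" is a simplification chosen for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of CUDA-capable devices, or 0 if the
// runtime reports an error (for example, no driver or no device).
int countCudaDevices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 0;
    }
    return count;
}

int main()
{
    const int count = countCudaDevices();
    std::printf("CUDA devices found: %d\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;

        // Name, compute capability, multiprocessor count and memory size.
        std::printf("  [%d] %s, compute capability %d.%d, %d multiprocessors, %.1f GiB\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

The CUDA Toolkit samples include a more complete deviceQuery program that reports many more properties.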

Rounding Out Your Development Environment

There’s a good chance that the debuggers and performance analysis tools you use already have support for GPU development as well.

Need Some Help?

NVIDIA hosts the GPU Computing Developer Forums, where you’ll find a community of expert GPU developers from NVIDIA and around the world. There are also dozens of domain-specific developer communities at www.gpucomputing.net, so be sure to connect with other developers in your field.
There is a growing ecosystem of consulting firms with expertise in GPU software development. Check out the list of firms that provide high quality GPU developer training and have done great work for our customers.

Ecosystem

Tools and Training

Today, the CUDA ecosystem is growing rapidly as more and more companies provide world-class tools, services and solutions.

If you want to write your own code, the easiest way to harness the performance of GPUs is with the CUDA Toolkit, which provides a comprehensive development environment for C and C++ developers.

The CUDA Toolkit includes a compiler, math libraries and tools for debugging and optimizing the performance of your applications. You'll also find code samples, programming guides, user manuals, API references and other documentation to help you get started.

NVIDIA provides all of this free of charge, including NVIDIA Parallel Nsight for Visual Studio, the industry's first development environment for massively parallel applications that use both GPUs and CPUs.


Here are some links where you'll find more information and files to download:
http://developer.nvidia.com/cuda-downloads
http://developer.nvidia.com/cuda-toolkit
http://developer.nvidia.com/category/zone/cuda-zone
http://developer.nvidia.com/cuda-education-training



Data taken from http://www.nvidia.com/object/cuda_home_new.html and http://developer.nvidia.com/category/zone/cuda-zone