What Intel and AMD’s Latest Multicore Development Moves Mean to You

The chip vendors have significant software efforts to help developers best use their multicore processors. Here’s a summary of their latest offerings and how they may impact the software you create.

From the moment they introduced multicore processors, Intel and AMD have made significant efforts to educate, train, and assist developers in writing multicore applications – for obvious reasons. They have a vested interest in encouraging developers to use multicore hardware; that’s what sells chips.

The two firms have charged forward with their product development cycles, even as developers struggle to get their brains and code wrapped around multiple processing cores. So as Intel and AMD added cores, instructions, and functionality, the companies also upgraded their developer training and products.

In this article, we look at what the two companies have cooking, and what it means to developers and testers. While these two companies don’t agree on much, they share a similar strategy: integrating graphics processing cores into the CPU. But they use the hardware differently.

Intel

AMD announced plans to go the integrated route back in 2006 when it acquired ATI Technologies. But Intel beat AMD to the punch, owing to having far more resources to throw at the effort.

The first processors with an on-board GPU were Intel’s Westmere generation; that technology shipped in early 2010. However, the graphics chips weren’t truly integrated into the CPU. Intel simply moved the graphics processor that had been on a separate motherboard chip onto the Westmere die. It was with the Sandy Bridge generation that Intel fully integrated the CPU and GPU, so that the GPU became another core of the CPU rather than an add-on.

However, you can’t really look at the GPU cores in the new Intel processors as math co-processors, the way that Nvidia is positioning its Tesla GPUs for high-performance computing.

“We’ve been looking at using GPUs for their floating point capabilities. But on Sandy Bridge, for most programmers, the best floating point options are still the ones that fit on the CPU’s floating point unit,” says James Reinders, chief evangelist for Intel software products.

The computational capability of the GPU is still substantial, but Intel believes Sandy Bridge’s graphics shouldn’t be used for anything other than general programming functions. If you want to do any high-performance computing, you still need a dedicated GPU or Intel’s regular CPU functions. That’s because Intel’s GPU capabilities just aren’t on the same level as those of Nvidia or AMD/ATI.

But you can still use Sandy Bridge for general graphics acceleration, like the Aero interface in Windows. To help jump-start these efforts, Intel has the Visual Adrenaline program, an umbrella program for all of Intel’s interesting technologies; it includes things like tools, libraries, code samples, and other collateral to support folks in the visual computing field.

A few key components of Visual Adrenaline include the Graphics Performance Analyzers (GPA) and a media SDK. The GPA tool lets game and visual computing developers test and validate their applications’ graphics performance, and it complements Intel’s Parallel Studio 2011 products.

The Parallel Studio 2011 product line covers four major areas of development: design, build and debug, verify, and tune. Through all four steps, the Studio helps tune the application for multithreading and performance. The new version adds new techniques for building parallelism into an application, a redesigned advisor tool, and support for Microsoft Visual Studio 2010 and Windows 7.

Intel’s Media Software Development Kit (SDK) 2.0 provides new APIs for building video-driven applications that use the Sandy Bridge graphics. Developers can use Intel codecs or their own codecs as needed. The API offers support for H.264 and MPEG-2 formats as well as VC-1.

Intel’s tools are designed to optimize graphics performance on its own processors, since the location of the GPU is now very different. It used to be on a separate piece of silicon. Now it’s in the CPU. This changes how data must be handled and processed. So the tools are designed to look for bottlenecks or poorly-written code that does not maximize the throughput of data or the use of multithreading.

For non-Sandy Bridge platforms, apps still get hardware-acceleration for video and performance optimization for software-based video encoding and decoding. The tools will make sure the application uses the integrated chipset to its maximum capabilities, which are not on par with Sandy Bridge.

Aside from the fact that Sandy Bridge puts the GPU on the CPU, where it talks directly to the CPU cores, the GPU is a newer, faster generation of graphics processor. Even if the Sandy Bridge GPU were a separate physical chip, it would be faster and better-performing than prior generations of integrated chipset GPUs.

Intel is also embracing OpenCL (Open Computing Language), the open standard designed for heterogeneous computing. OpenCL was first designed and written by Apple, but the company has since turned the specification over to the Khronos Group, which also manages the OpenGL graphics standard.

Since then, OpenCL has been one of the few things that Intel, AMD, and Nvidia have been able to agree on. All three firms, despite being at each other’s throats in the marketplace, are members of the consortium and have announced their support.

Intel released an alpha test version of its OpenCL SDK but has no firm date for the final kit. The kit offers support for the OpenCL 1.1 standard for Intel’s Core processors running on Windows Vista and Windows 7. With the new version, Intel added 64-bit support and full coverage of OpenCL 1.1. In addition to the SDK, there is a Performance Analysis and Development Tool along with the GPA tool and Intel’s VTune performance analyzer.

You’ll also find some new features in Sandy Bridge itself. The Advanced Vector Extensions (AVX) can double peak SIMD floating-point throughput because they double the width of the vector registers used by the old SSE instructions, from 128 bits to 256 bits.
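As a back-of-the-envelope check on the register-width figures, the short Python sketch below counts how many floating-point values fit in one 128-bit SSE register versus one 256-bit AVX register. The function name `lanes` and the constants are illustrative only and are not part of any Intel tool. The numbers are peak per-instruction counts; real-world gains depend on memory bandwidth and on how well the code vectorizes.

```python
# Illustrative arithmetic only: how many values one SIMD register can hold.
SSE_REGISTER_BITS = 128  # register width used by SSE
AVX_REGISTER_BITS = 256  # register width introduced with AVX

FLOAT_BITS = 32   # single-precision value
DOUBLE_BITS = 64  # double-precision value


def lanes(register_bits: int, element_bits: int) -> int:
    """Return how many whole elements of the given size fit in one register."""
    return register_bits // element_bits


if __name__ == "__main__":
    for name, bits in (("SSE", SSE_REGISTER_BITS), ("AVX", AVX_REGISTER_BITS)):
        print(f"{name}: {lanes(bits, FLOAT_BITS)} floats or "
              f"{lanes(bits, DOUBLE_BITS)} doubles per register")
```

One AVX instruction can therefore operate on twice as many values as its SSE counterpart, which is where the "double" figure comes from.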

Also, AVX is more programmable than SSE, says Intel. In the past, instructions had to be lined up to the bus by the programmer. That meant writing one 128-bit instruction, two 64-bit instructions, or one 64-bit and two 32-bit instructions that could run in parallel. That was a major headache and ensured complaints from developers, according to Reinders. With AVX, data elements no longer have to divide the number of registers evenly.

AVX has been out for almost a year, so it’s already supported in some developer tools, and apps can be recompiled to make the most of the new instructions.

AMD

AMD launched its Fusion line of integrated processors, which it calls Accelerated Processing Units (APUs), late in 2010 with the low-end Ontario family of desktop processors. More will ship in 2011. Thus far, like Intel, AMD has not announced server plans, only desktop.

It’s also supporting OpenCL with its OpenCL Zone, where you can find the APU SDK for building, testing, and debugging Fusion applications.

OpenCL lets you create different application kernels, which are farmed out or distributed to the different devices available. Different tasks can go to different devices.

It’s still a bit of heavy lifting to see if one device is done, admits Margaret Lewis, AMD’s director of software product marketing.

“There’s still a lot of work the developer has to do to ensure things are parallelized and parsed out,” Lewis says. “But as these tools mature and you get this base level set of devices to do routines, over time you can mature that (code) so that as the tools get smarter, it can make those [optimization] decisions and load balance those things.”
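To make the "farm out the work, then check that each device is done" pattern concrete, here is a minimal analogy in plain Python. It does not use the OpenCL API: the "devices" are just worker threads, and the names `scale_kernel` and `run_on_devices` are hypothetical and chosen for illustration. The point is only to show the bookkeeping the developer still owns: splitting the data, dispatching the pieces, and waiting for every device to finish.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_kernel(chunk, factor):
    """Hypothetical 'kernel': the same operation applied to every element."""
    return [x * factor for x in chunk]


def run_on_devices(data, factor, devices=("cpu", "gpu")):
    """Split data across the named devices and wait for all of them to finish.

    Each 'device' is only a worker thread standing in for real hardware.
    """
    n = len(devices)
    chunk_size = (len(data) + n - 1) // n  # ceiling division
    chunks = [data[i * chunk_size:(i + 1) * chunk_size] for i in range(n)]

    with ThreadPoolExecutor(max_workers=n) as pool:
        # Farm out one chunk per device.
        futures = [pool.submit(scale_kernel, chunk, factor) for chunk in chunks]
        # The host must check explicitly that each device is done;
        # result() blocks until that device's chunk has been processed.
        results = [future.result() for future in futures]

    # Reassemble the pieces in their original order.
    return [x for part in results for x in part]


if __name__ == "__main__":
    print(run_on_devices(list(range(8)), 10))
```

In this sketch the split is a fixed even division. Deciding how much work each device should get is the load-balancing decision that, per Lewis, the tools may eventually make on the developer's behalf.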

AMD has been working heavily on OpenCL because it provides a standard way to do both CPU and GPU programming and to utilize different kinds of CPUs and GPUs. “Developers like it because they are not locked in to one kind of CPU or GPU,” says Lewis.

AMD has also released the OpenCL University Kit, a complete course in OpenCL programming spread over 13 lessons. The company also hosts regular webinars for both beginner and advanced programmers.

On the traditional compiler side, AMD continues to work with compiler vendors to make sure their compilers do what they can to make applications more parallel by nature. The company has put its efforts behind Open64, an open source, GPL-licensed compiler derived originally from the SGI MIPS compiler, MIPSpro.

AMD is working with Open64 because it provides the company with an opportunity to do more advanced acceleration, says Lewis. GCC, the open source compiler that ships with Linux, is available across a wide variety of platforms, so it’s harder to contribute optimizations to a compiler that is trying to support many different processor architectures.

Open64 is aimed at the Itanium and x86-64 market only. So Open64 is a place where AMD felt it could best help with parallelism. The catch is that Open64 is only available on Linux.

When the Chips are Down

Both firms are dedicated to heterogeneous computing, but they are not oblivious to its limitations. Using a graphics processing core that was previously mounted as a separate chip on the motherboard is no substitute for true GPU computing using a full-blown GPU from AMD or Nvidia. Those big chips have hundreds of cores operating in parallel to do massive amounts of mathematical computation, and the GPU cores on Sandy Bridge and Fusion are minuscule by comparison. Intel and AMD know that.

What they have achieved is to put onto the CPU what had previously been a separate motherboard chip, thus reducing latency between CPU and GPU, plus giving it a performance boost. But make no mistake: If you want to do heavy duty visualization, calculation, or anything requiring double-precision mathematics, you still need a dedicated GPU.

Intel eventually will have some entry-level Sandy Bridge server processors, but AMD has no server plans for now. Server computing is still something best left to traditional CPU cores, and both firms continue to optimize their tools and libraries to add automatic parallelization when apps are compiled. Those efforts will continue as both companies advance their tools and educational programs.
