How Micron’s Automata Promises to Improve Parallel Processing

The most common application for Micron’s new chip might be Big Data, but it can be used anywhere a complex or unstructured data stream needs analysis.

Micron Technology is not the first company that comes to mind when you think CPUs. Perhaps it’s not even the third or fourth to come to mind, because Micron is a memory company. Still, you should not think of the Micron Automata Processor (AP), announced at Supercomputing ’13 last November, as a CPU, either, because it’s not. Nor is it a memory device. Think of the AP more as a powerful processing “engine” that leverages the massive parallelism inherent in memory technology to make parallel processing dramatically more scalable.

The AP is geared for analysis of large, unstructured data sets or real-time data analysis challenges. As a result, it is targeted at high-performance computing applications such as graph processing, big data, and bioinformatics.

Forget quad-core or even octo-core processors. The Automata Processor is a scalable, two-dimensional fabric composed of thousands or even millions of application-specific compute machines, called automata, which operate in parallel to perform a targeted task or operation. So instead of four or eight cores brought to bear on a processing task, you have thousands, and all of them can be programmed for a specific task.
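The execution model is easier to picture in software. The sketch below is plain Python, not anything from Micron’s toolchain: it mimics a fabric of tiny, task-specific recognizers in which every recognizer sees the same input symbol on each cycle. In hardware, the per-symbol work therefore does not grow with the number of patterns; the Python loop is, of course, sequential.

```python
class PatternMatcher:
    """One tiny machine that recognizes a single fixed byte sequence,
    consuming the stream one symbol at a time."""

    def __init__(self, pattern: bytes):
        self.pattern = pattern
        self.pos = 0    # current match position
        self.hits = 0   # completed matches

    def step(self, symbol: int) -> None:
        if symbol == self.pattern[self.pos]:
            self.pos += 1
            if self.pos == len(self.pattern):
                self.hits += 1
                self.pos = 0
        else:
            # Simple restart; real automata track overlapping partial
            # matches with parallel active states.
            self.pos = 1 if symbol == self.pattern[0] else 0


def stream(data: bytes, matchers):
    for symbol in data:        # one pass over the stream...
        for m in matchers:     # ...every matcher steps "in parallel"
            m.step(symbol)
    return {m.pattern: m.hits for m in matchers}


counts = stream(b"gattacagatta", [PatternMatcher(b"gatta"),
                                  PatternMatcher(b"taca")])
```

Adding a thousand more matchers changes nothing about the structure of the loop, which is the point: in the AP, those matchers are physical elements that all observe the stream simultaneously.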

The AP design, then, is not aimed at a whole new CPU socket, although there is a prototype built on a PCI Express card. Micron put eight processors on a memory module that fits in a DIMM socket, yielding a module that can offload processing from the CPU. The Automata Processor does its work right where it sits, in memory; the CPU has very little involvement.

“Many people ask ‘Why Micron?’” said Paul Dlugosch, Director of Automata Processing development at Micron. “This is the first example of a processing device that at its core is based on memory technology or memory architecture. The way to explain that is not how cache memory can support a CPU, but how we are using memory in a fundamentally new way. With the Automata Processor, we don’t use memory as a traditional read/write storage device. Rather, memory is used as the basis of a processing engine that analyzes information as it streams across the chip.”

“The sequential instruction processing nature of conventional CPU/GPU architectures is not well aligned to the class of problems addressed by the AP,” said Dlugosch. “The fundamental problem is that of fine-grain parallelism. You have to understand the application requirements across a variety of domains. Any scalar conventional processor based on sequential instruction architecture is really where we saw the problem.”

A traditional CPU has an execution pipeline that decodes instructions, executes them, and unloads registers once execution is complete. The CPU performs operations based on the instructions as they are processed. The AP has no fixed execution pipeline. Instead, its 2D fabric of tiny processing elements answers thousands or even millions of different questions about the data at the same time, yielding massive parallelism.

Beyond the parallelism problem, the traditional method of rule sets is limiting. Dlugosch said that if you are looking for one feature in a data set (say, a specific protein sequence or a cybersecurity threat), you have a relatively easy problem; conventional CPUs are quite adept at addressing it. A single pattern is not a highly parallel problem. But if you have to process a data stream and evaluate it for tens, hundreds, or thousands of different features, that becomes a highly parallel problem for which conventional processing architectures are not well suited.
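To make the contrast concrete, here is a hedged sketch of the conventional approach Dlugosch describes: every feature is a separate rule, and a sequential processor must test each one against the stream, so the work (and the compiled rule set’s memory footprint) grows with the number of features. The rule strings below are invented for illustration.

```python
import re

# Three stand-in "features"; a real cybersecurity or bioinformatics
# rule set might contain thousands.
RULES = [re.compile(p) for p in (r"worm\.exe", r"DROP\s+TABLE", r"\beval\(")]

def scan(chunk: str) -> list[str]:
    # Sequential evaluation: cost scales with len(RULES).
    # The AP instead evaluates every rule against each symbol at once.
    return [rule.pattern for rule in RULES if rule.search(chunk)]

found = scan("GET /?q=eval(alert) HTTP/1.1")
```

Merging many rules into one combined matcher is the classic CPU workaround, but as Dlugosch notes below, the resulting data structure is exactly what blows past practical memory limits.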

“The more features you look for, the more memory is consumed. You can quickly exceed practical memory limitations, or you get such a large data structure [that] you get memory access problems,” said Dlugosch.

Examples include analyzing data coming across the wire for a multitude of malware attacks, or scanning tweets for features that help predict social trends or social unrest. These scenarios require multiple rules, and they can get away from you fast.

Reconfigurable, reprogrammable

The Automata chip is more like an FPGA than a CPU in that it is reconfigurable. And because of this, it can take on the exact configuration that is best suited to solve the problem at hand. “With the Automata Processor, you don’t write a program of instructions. You configure it by compiling a program; and from that point forward it is an autonomous machine,” said Dlugosch.

Plus, because the chip becomes what the user defines, it doesn’t need to be told what to do next. Automata is a self-operating machine that is driven only by the data it receives, not by instructions. The data flowing through the machine drives the operation. So the programmer configures it to examine all of the data coming in, and as soon as data comes into the machine, Automata sets about doing what it was instructed to do, such as pattern matching.
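As a software analogy (again, not Micron’s implementation), a data-driven machine reduces to a transition table plus a set of currently active states: no program counter, no instruction fetch. Each incoming symbol simply moves the active set, and the machine reports whenever an accepting state lights up. The tiny network below is “configured” to report wherever the pattern `ab*c` ends in the stream.

```python
def run(table, start, accepting, data):
    """Drive the automaton purely by data: one table lookup per
    (active state, symbol) pair, and nothing else."""
    active = {start}
    report = []
    for i, sym in enumerate(data):
        nxt = set()
        for state in active:
            nxt |= table.get((state, sym), set())
        nxt.add(start)            # a new match may begin at any symbol
        if nxt & accepting:
            report.append(i)      # "report on match", AP-style
        active = nxt
    return report

# The configuration (the "compiled program"): recognize a, then b*, then c.
TABLE = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}}
ends = run(TABLE, start=0, accepting={2}, data="xabbc_ac")
```

Note that `run` never consults an instruction sequence; swapping in a different `TABLE` reconfigures the machine, which mirrors the compile-then-stream workflow the SDK provides.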

Micron has a full-function SDK to accompany the AP. The SDK is designed to take a user-defined pattern rule set or analytic definition, compile it, and configure the chip to implement the exact machine required to process or analyze the data.

The Automata Processor can be configured with either a list of regular expressions in PCRE or a direct description of the automata in an XML-based high-level language the company created, called the Automata Network Markup Language (ANML). The compiler accepts PCRE natively and unmodified, and can configure the chip directly from it; ANML, however, exploits all of the architecture’s features. It allows end users to perform graphical design, such as schematic capture, or to design highly complex automata that perform highly detailed data-set analysis.

Implementation

The AP uses a DDR3-like memory interface, chosen to simplify the physical design-in process for system integrators. The AP will be made available as single components or as DIMM modules. A PCIe board populated with up to 48 APs will also be available to early-access application developers.

It’s coming soon, but you can’t get your hands on the AP quite yet. Micron is making silicon now, but a revision is planned, so don’t expect hardware samples until the second half of 2014.
