Case Study: Micron uses data and artificial intelligence to see, hear and feel

Memory-chip maker Micron Technology does more than talk about the many advantages artificial intelligence brings to industry. Using data analytics and AI in its own manufacturing processes, the company literally puts its money where its mouth is, demonstrating the value to business of the technologies Micron enables with its next-generation memory storage and processing solutions. The benefits are many, including higher yields, a safer working environment, and improved efficiencies.

The enterprise’s factories produce memory technologies on silicon wafers through a highly complex and precise process. The potential for error, and for waste, is high, but data and AI are helping to reduce that potential. When it relied on human vigilance to spot and track flaws, mechanical problems, and other potential trouble areas, the organization lost money that today’s sophisticated technologies might have saved.

The Manufacturing Process

Silicon wafers, used as the foundation for computer chips, are made from silica, a type of sand, which must be filtered and refined to 99.999 percent purity. This electronic-grade silicon is melted and compressed into ingots, which are sliced into extremely thin—0.67 mm thick—wafers.

The wafers get polished to remove any marks from the cutting, coated with a thin layer of light-sensitive material called photoresist, and etched with the design of the circuitry they will support, using a process similar to photography. The more complicated the circuitry, the more images are imprinted onto the wafer, layer upon layer, with each layer treated separately: blasted with ionized plasma, for instance, a process known as “doping,” or bathed in metals.

The finished wafer is then coated with a thin protective film before being tested (“probed”) to ensure that it works as intended.

The entire manufacturing process can involve some 1,500 steps and takes place in sterile clean rooms designed to prevent even the tiniest speck of dust from falling on the pristine wafers. But damage does occur. The fragile wafers may get scratched, nicked, or punctured, or bubbles may form under the protective film.

Often, these flaws are microscopic and completely invisible to the naked eye. Even when they are visible, people scanning the 30 to 40 photos captured of each wafer during the photographic imaging process can overlook defects due to eye fatigue or momentary inattention. Blink, and they’ve missed it.

When problems aren’t caught until the “probe” phase, much time and money have already been wasted. Chances are, the issue causing the flaw affects more than one wafer, possibly even thousands.

Other things can go wrong in production, as well. Parts wear out; pipes leak, or drip hazardous chemicals onto products or people. Catching and correcting these issues early is imperative: shutdowns are expensive, costing an average of $250,000 an hour, according to Micron experts, and given the complexities of semiconductor manufacturing, the many hours spent in recovery put the true cost in the millions. What is more, the risks associated with worker injuries are multifold.

Detecting problems in products and machinery is paramount for manufacturing efficiency, effectiveness, and safety. Unfortunately, to err is human, and even the most highly trained person will not infallibly see, hear, or feel the very minute and subtle indicators that something is awry.

Artificial intelligence technologies, however, can perform these tasks with laser-sharp precision and in a fraction of the time. Micron collects petabytes of in-house manufacturing data from more than 8,000 sources and more than 500 servers around the world, and adds the information to two different maps of environments in Apache Hadoop for data mining. The organization’s data scientists across these manufacturing networks scour this data for insights to develop models for AI and machine learning to improve and enhance factory processes.

The results, mimicking our senses of sight, sound, and touch, have been impressive—so much so that in 2018 they won Micron a coveted CIO 100 award for IT leadership.

The Sense of Sight: Wafer Imaging

Wafer flaws come in many forms. For the most part, however, they fall into one of a few common categories: tiny holes near the wafer’s edge, scratches, and bubbles in the outer film. Micron’s AI systems use “computer vision” technology to spot these defects on the images the photolithographic cameras capture as they etch circuitry onto the wafers during manufacturing.

Engineers might direct the system to scan for tiny dots (holes) at the wafers’ edges, for instance, or for contiguous or slightly broken lines (scratches), or to seek color variations resulting in dark or light spots or patterns. Some of these flaws can be spotted in near-real time, with the system sounding alerts within 10 seconds after an image is taken. Other defects might be discovered during secondary scans 15 minutes after the photographs are stored. All these processes rely on the AI system’s use of two million images stored in the Hadoop environment for comparison and contrast.
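The search for dark or light spots described above can be illustrated with a minimal sketch. It is not Micron’s computer vision system: it assumes a wafer image is already available as a grid of grayscale intensities and simply flags pixels that sit far from the image’s average brightness. The function name `find_spots` and the threshold `k` are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def find_spots(image, k=3.0):
    """Flag pixels whose brightness is at least k standard deviations from the mean.

    `image` is a list of rows of grayscale values (an assumed input format).
    Returns a list of (row, col, kind) tuples, where kind is "dark" or "light".
    """
    pixels = [p for row in image for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        return []  # perfectly uniform image: nothing stands out
    spots = []
    for r, row in enumerate(image):
        for c, p in enumerate(row):
            z = (p - mu) / sigma
            if z >= k:
                spots.append((r, c, "light"))
            elif z <= -k:
                spots.append((r, c, "dark"))
    return spots


if __name__ == "__main__":
    # Hypothetical 8x8 image: uniform gray with one dark and one light pixel.
    wafer = [[100] * 8 for _ in range(8)]
    wafer[2][3] = 0
    wafer[5][6] = 200
    print(find_spots(wafer))
```

A production system would compare each image against a large library of reference images, as the article describes, rather than against a single image’s own statistics.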

The results have proven far more accurate than engineers’ assessments, Micron IT Director Tim Long says.

“Computer vision has high accuracy and high efficiency,” he says, “and it has up-scaled our engineers’ capabilities. Our engineers can focus on the problem, and on the data collection.”

And with Micron’s AI-Auto-Defect Classification (ADC) system, technicians and engineers no longer need to classify wafer defects manually in Hadoop. Instead, AI-ADC uses deep learning to sort and categorize millions of flaws every year. Micron created this system using the latest imaging techniques available today, including neural networks, described as a biologically-inspired programming paradigm that enables a computer to learn from observational data.

This form of machine learning categorizes images according to their flaws, placing them in discrete Hadoop “clusters.” Not only does this process help engineers to discover what went wrong during manufacturing for an early fix that avoids more defects, but it enables AI systems to find flaws on their own and refine results with each iteration.

“You don’t have to tell the system where to look or what to look for,” Micron fab data science manager Ted Doros says. “You give it some examples and tell the neural net, ‘This is what you need to find.’

“This process improves yields by fine-tuning our methods. And the more fine-tuned we get, the fewer issues we’ll have.”
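The idea of teaching a system from labeled examples, rather than telling it where to look, can be sketched in a few lines. The example below deliberately swaps in a nearest-centroid classifier, which is far simpler than the deep neural networks the article describes. The two features (elongation and distance from the wafer edge, both scaled from 0 to 1) and all example values are hypothetical.

```python
from math import dist


def train_centroids(examples):
    """Average the feature vectors of each label.

    `examples` is a list of (features, label) pairs. Returns {label: centroid}.
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in sums[label])
        for label in sums
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical labeled defects: (elongation, distance_from_edge).
    labeled = [
        ((0.10, 0.05), "edge_hole"), ((0.20, 0.10), "edge_hole"),
        ((0.90, 0.50), "scratch"), ((0.80, 0.60), "scratch"),
        ((0.20, 0.60), "bubble"), ((0.10, 0.50), "bubble"),
    ]
    model = train_centroids(labeled)
    print(classify(model, (0.95, 0.40)))
```

Adding more labeled examples shifts the centroids, which is a rough analogue of the system refining its results as it sees more data.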

The Sense of Sound: Acoustic Listening

What’s the first sign that your car is having mechanical trouble? Often, it is an unusual noise coming from under the hood. The same holds true in factories, where sounds deemed abnormal can signify a wearing part or imminent breakdown.

Manufacturing plants can be very loud, though, and problem sounds get lost in the noise. Or workers may not spend enough time in one location to discern what is “normal” and what is not.

Micron’s AI systems hear anomalies in its factory machinery via acoustic sensors installed near robotic actuators or pumps. These microphones record normal activity for several weeks, and software converts the detected frequencies into graphs or charts depicting the sounds as visual data. When a new pitch or frequency appears, the system issues an alert. Often it can even discern the cause of the anomaly.
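As a rough illustration of alerting on a new frequency, the sketch below converts a baseline recording and a later recording into frequency spectra and reports any frequency that is prominent only in the later one. It is a toy under stated assumptions, not Micron’s system: both recordings are assumed to have the same length and sample rate, the transform is a naive discrete Fourier transform, and the function names and the `threshold` value are hypothetical.

```python
import cmath
import math


def spectrum(samples):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin (naive O(n^2))."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc) / n)
    return mags


def dominant_bins(samples, threshold=0.1):
    """Indices of non-DC bins whose magnitude reaches the threshold."""
    return {k for k, m in enumerate(spectrum(samples)) if k > 0 and m >= threshold}


def new_frequencies(baseline, recording, sample_rate, threshold=0.1):
    """Frequencies (Hz) prominent in `recording` but absent from `baseline`."""
    n = len(recording)
    added = dominant_bins(recording, threshold) - dominant_bins(baseline, threshold)
    return sorted(k * sample_rate / n for k in added)


if __name__ == "__main__":
    rate, n = 64, 64  # hypothetical: one second sampled at 64 Hz
    normal = [math.sin(2 * math.pi * 4 * t / rate) for t in range(n)]
    # The same machine hum plus a new, quieter 10 Hz tone.
    worn = [s + 0.5 * math.sin(2 * math.pi * 10 * t / rate)
            for t, s in enumerate(normal)]
    print(new_frequencies(normal, worn, rate))
```

Real factory audio would call for a fast Fourier transform, windowing, and weeks of baseline data, as the article notes.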

Doros likens the factory, with all its many sounds, to an orchestra, and acoustic listening-enabled machines to conductors.

“You’ve got all these musical instruments going, and when you even get subtle changes in buildup of the chemicals in the line, it’s just like, say, if you have a French horn and the musician opens a valve a little bit, it changes the pitch, and the whole sound.” The audience might miss this change, but the conductor will not.

To set up this “acoustic listening” AI system, Micron engineers created a baseline in Hadoop using the data gathered during the initial monitoring phase. Next, they scanned files for anomalous sounds, and categorized them according to cause, placing them in discrete groups, or “clusters.” The more files gathered, examined, and sorted, the more accurate the results can be, and the more capable the system becomes of detecting and diagnosing unusual sounds and their causes.

Searching these massive databases can be time-consuming. When a machine is in danger of a breakdown, however, plant managers need to know instantly.

Sending the data to a GPU system filled with Micron memory and storage, with 48,000 processing cores and terabytes of memory, can provide fast, intelligent results far more quickly than CPU-based systems can. All these GPU cores and memory working simultaneously and synergistically can refine their results in the blink of an eye with little or no human intervention, and improve their diagnostics with each iteration, similar to the way the human brain works.

“One of the key advantages of a GPU is, a CPU might have two or four processor cores on a single chip, and each core can do one thing at a time. A GPU will have thousands of cores. It can do thousands of things in parallel,” Micron Senior Fellow Mark Helm says. “For an AI workload, that’s exactly what you want.

“You don’t want a CPU to do a very complex machine learning algorithm. A GPU will break it into very small pieces and do it all in parallel, with each of these tens of thousands of cores working simultaneously. GPU processing offers an incredible advantage in the amount of time it takes to execute a decision.”

Thermal Imaging: Feeling the Heat

Not every malfunction makes noise—and in a manufacturing environment, silence can be deadly. In many instances, a change in temperature occurs, instead. Machinery may heat up, or pumps or pipes may cool down, losing heat to evaporative cooling where leaks occur.

Until recently, the only way to detect a surge in temperature was to see a red glow, sparks, or smoke. By the time these appeared, the problem had already entered the danger zone, and a plant would need to evacuate workers. As already noted, shutdowns are extremely costly, but they are preferable to risking people’s safety.

Cool spots can also indicate trouble, but these show no visible signs. And feeling with hands for thermal fluctuations is as impractical as it is dangerous.

Increasingly, however, artificial intelligence can spot temperature anomalies by analyzing infrared photographs that produce “heat maps” of the factory environment. Micron overlays images captured during normal working conditions on a fab’s digital twin, a virtual replica of the plant. These maps give AI systems a baseline against which to compare new infrared images. When the systems detect a deviation, they sound an alarm.
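Comparing a new heat map against a baseline can be sketched as a cell-by-cell difference. The example below is a simplified illustration, not Micron’s implementation: it assumes both heat maps are grids of temperatures in degrees Celsius covering the same area, and the function name `thermal_deviations` and the tolerance value are hypothetical.

```python
def thermal_deviations(baseline, current, tolerance_c=5.0):
    """Compare two temperature grids cell by cell.

    Returns (row, col, kind, delta) for every cell whose temperature differs
    from the baseline by more than `tolerance_c`; kind is "hot" or "cold".
    """
    alerts = []
    for r, (base_row, cur_row) in enumerate(zip(baseline, current)):
        for c, (base, cur) in enumerate(zip(base_row, cur_row)):
            delta = cur - base
            if abs(delta) > tolerance_c:
                alerts.append((r, c, "hot" if delta > 0 else "cold", delta))
    return alerts


if __name__ == "__main__":
    # Hypothetical 2x2 heat maps in degrees Celsius.
    normal = [[40.0, 40.0], [25.0, 25.0]]
    latest = [[40.0, 52.0], [18.0, 25.0]]
    for alert in thermal_deviations(normal, latest):
        print(alert)
```

Flagging cold cells as well as hot ones reflects the point that leaks can cool equipment without leaving any visible sign.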

Thermal imaging, still in the early stages at Micron, holds enormous potential for cost savings because of its ability to spot oncoming malfunctions early, before machine failure or serious damage occurs. Early detection can make the difference between making a simple repair and replacing an entire, expensive piece of equipment.

What is more, it can play a critical role in protecting workers, a priority for Micron. The company values its team members’ safety over profits—a major reason why it continues to invest heavily in technologies aimed at improving detection of problems before they become hazards.

“If it’s looking and it says, ‘This pump over here has a high risk,’ if it has a thermal runaway or there’s a spark, I want to know right away, and I want to notify the people in the area to evacuate,” Doros says.

Early detection of mechanical problems is the primary goal of thermal imaging, but the company also uses the technology to optimize manufacturing systems and processes. System availability is one of the greatest costs Micron incurs in wafer production, Doros says. A system shutdown leaves fewer tools available for wafer manufacture. When the number of wafers produced goes down, the overall cost of running the fab goes up. An undetected tool failure can also damage wafers, which increases costs further.

Ideally, Doros says, Micron would create a thermal image of every tool in every fab, and find in real time all the places where temperatures are too high or too low. The subsequent fine-tuning would most likely increase yields, resulting in a lower per-wafer production cost.

A Host of Benefits

Using AI to see, hear, and feel in its factories has yielded impressive results for Micron so far:

  • 25 percent faster time to yield maturity;
  • 10 percent increase in manufacturing output; and
  • 35 percent fewer quality events.

And the benefits of data analytics and AI extend beyond the fab to every aspect of Micron’s operations: sales and marketing, human resources, business operations, research and development, and more.

“This is about transforming the enterprise, not just the shop floor,” Doros says. “We can bring these techniques and methods to all the business processes within the company.”

Deep learning, for instance, has significantly improved forecasts of product demand, increasing accuracy by 10 to 20 percent, Doros says.

The company’s main focus for artificial intelligence and data analytics lies in its industrial processes, however, and the promise of its fabs running as truly “smart” cyber-physical systems with minimal human intervention.

As technologies such as 5G cellular networks, virtual and augmented reality, the internet of things, and AI and data analytics progress ever more rapidly—developments aided by Micron’s own memory and storage solutions—the fulfillment of that promise grows ever nearer.

“AI includes a lot of things,” Long says. “Really, it describes diagnostic capabilities, and how we create them using machine learning algorithms. We’re reproducing the human senses—hearing, touch, sight—by giving algorithms data, and using history as context to teach our systems. The machines will then observe and learn the patterns, so that they can make conclusions on their own.”