In semiconductor fabrication, yield is reduced by defects that appear systematically within specific patterns of the physical layout design. These defect-prone patterns are commonly known as hotspots, and they can arise from a variety of causes.
There are several known approaches to hotspot detection. One uses machine learning (ML), in which known hotspot and non-hotspot patterns are used to train a model that should then be able to predict new hotspots. The objective of ML approaches is to find all potential hotspots while reducing the overhead of false positives. The model’s ability to distinguish hotspots from non-hotspots depends on the coverage of the training dataset.
A real-world challenge in training an ML system to classify hotspots and non-hotspots is the imbalanced nature of the problem: the known hotspot patterns are always the minority class. A second challenge is correctly classifying non-hotspots that closely resemble hotspots. These “hard-to-classify” patterns are ones with a high mask error enhancement factor (MEEF), where small variations in the pattern can move it between hotspot and non-hotspot. Together, these two challenges make conventional methods of handling imbalanced training datasets inadequate for hotspot detection.
In this paper, we present a flow for quantified training dataset selection that places extra focus on the patterns that are hard to classify because of their close similarity to known hotspots. This approach yields improved model accuracy compared with conventional sampling approaches.
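To make the idea of similarity-aware selection concrete, the following is a minimal sketch and not the flow presented in this paper. It assumes each pattern has already been reduced to a numeric feature vector and that Euclidean distance is a usable proxy for pattern similarity; the function names, the parameters `n_hard` and `n_random`, and the toy data are all hypothetical. Non-hotspots closest to a known hotspot are always kept, and the remaining non-hotspots are only randomly sampled.

```python
import math
import random


def nearest_hotspot_distance(pattern, hotspots):
    """Distance from a pattern to its closest known hotspot (assumed metric)."""
    return min(math.dist(pattern, h) for h in hotspots)


def select_training_set(hotspots, non_hotspots, n_hard, n_random, seed=0):
    """Keep all hotspots, the n_hard non-hotspots nearest to any hotspot,
    and a random sample of n_random of the remaining non-hotspots."""
    # Rank non-hotspots from most to least similar to a known hotspot.
    ranked = sorted(
        non_hotspots, key=lambda p: nearest_hotspot_distance(p, hotspots)
    )
    hard = ranked[:n_hard]
    rest = ranked[n_hard:]
    # Down-sample the easy majority with a fixed seed for repeatability.
    rng = random.Random(seed)
    easy = rng.sample(rest, min(n_random, len(rest)))
    # Label 1 = hotspot, label 0 = non-hotspot.
    samples = [(p, 1) for p in hotspots] + [(p, 0) for p in hard + easy]
    return samples, hard


# Toy two-dimensional feature vectors, for illustration only.
hotspots = [(0.0, 0.0), (1.0, 1.0)]
non_hotspots = [(0.1, 0.0), (5.0, 5.0), (1.0, 1.2), (9.0, 0.0), (4.0, 8.0)]
samples, hard = select_training_set(hotspots, non_hotspots, n_hard=2, n_random=1)
```

In this toy case the two non-hotspots lying within 0.2 of a hotspot are retained as hard examples, while only one of the three distant non-hotspots is sampled, which reduces the class imbalance without discarding the patterns nearest the decision boundary.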