February 19, 2026

Researchers at Korea University have created a machine-learning model that can reportedly predict cell efficiency from wafer quality.

“We developed this industrial data-driven machine learning framework using more than 100,000 solar cell data points collected directly from a real mass-production manufacturing line,” the research's lead author, Seungtae Lee, told pv magazine. “The goal is to enable data-based decision making and intelligent automation in photovoltaic manufacturing.”

“While interest in applying artificial intelligence (AI) to manufacturing has grown rapidly, practical implementations in photovoltaic production remain limited. By directly leveraging large-scale industrial data, our work demonstrates how machine learning can support autonomous decision making for smart factories, while maintaining human interpretability and operator engagement aligned with the human-centric vision of Industry 5.0.”

The proposed approach is based on three key methodologies: predicting final solar cell efficiency solely from wafer inspection quality data using machine learning models, enabling early wafer screening before fabrication; identifying wafer-specific optimal equipment routes, referred to as “golden paths,” through optimization algorithms to improve production yield and efficiency, particularly for low-performance samples; and enhancing interpretability through feature importance and SHapley Additive exPlanations (SHAP) analyses, allowing engineers to understand the relationship between process variables and performance outcomes.

The framework enables precise wafer screening before further processing. Process path optimization is carried out using the Tree-structured Parzen Estimator (TPE), a Bayesian optimization algorithm that is widely used to tune machine-learning hyperparameters and that searches for high-performing settings without exhaustive testing.

The framework also uses the Extremely Randomized Trees (ET) model, an ensemble algorithm for regression and classification, as the objective function for this optimization.
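As a rough illustration of how a TPE-style search could pick an equipment route, the sketch below runs a simplified, categorical-only version of the algorithm in plain Python. It is not the researchers' implementation: the process steps, machine IDs, per-machine offsets, and the `predicted_efficiency` function (a stand-in for a trained ET model) are all hypothetical. Past trials are split into a good and a bad group, and each new route favors machines that are more frequent in the good group than in the bad one.

```python
import random
from collections import Counter

# Hypothetical process steps and machine IDs (not taken from the study).
STEPS = {
    "texturing": ["T1", "T2", "T3"],
    "diffusion": ["D1", "D2"],
    "wet_bench": ["W1", "W2", "W3"],
}
# Hypothetical per-machine efficiency offsets, in percentage points.
OFFSETS = {"T1": 0.00, "T2": 0.05, "T3": -0.05, "D1": 0.02, "D2": 0.00,
           "W1": -0.10, "W2": 0.00, "W3": 0.15}


def predicted_efficiency(route):
    """Stand-in for the trained regressor: predicted efficiency (%) for a route."""
    return 18.0 + sum(OFFSETS[machine] for machine in route.values())


def smoothed_prob(counts, choice, n_obs, n_choices):
    """Categorical probability estimate with add-one smoothing."""
    return (counts[choice] + 1.0) / (n_obs + n_choices)


def tpe_search(n_trials=60, n_startup=10, gamma=0.25, n_candidates=24, seed=0):
    rng = random.Random(seed)
    history = []  # (route, score) pairs observed so far
    for trial in range(n_trials):
        if trial < n_startup:
            # Warm-up: sample routes uniformly at random.
            route = {step: rng.choice(machines) for step, machines in STEPS.items()}
        else:
            # Split past trials into a "good" top fraction and the "bad" rest.
            ranked = sorted(history, key=lambda item: item[1], reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good, bad = ranked[:n_good], ranked[n_good:]
            route = {}
            for step, machines in STEPS.items():
                l_counts = Counter(r[step] for r, _ in good)
                g_counts = Counter(r[step] for r, _ in bad)
                # Draw candidates from l(x), then keep the one maximizing l(x)/g(x).
                weights = [smoothed_prob(l_counts, m, len(good), len(machines))
                           for m in machines]
                candidates = rng.choices(machines, weights=weights, k=n_candidates)
                route[step] = max(
                    candidates,
                    key=lambda m: smoothed_prob(l_counts, m, len(good), len(machines))
                    / smoothed_prob(g_counts, m, len(bad), len(machines)),
                )
        history.append((route, predicted_efficiency(route)))
    return max(history, key=lambda item: item[1])


best_route, best_score = tpe_search()
print(best_route, round(best_score, 2))
```

In a production setting, the objective would take the wafer's inspection features together with the candidate route, so that the search could be repeated per wafer. A maintained library implementation of TPE would normally replace this hand-written loop.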

The study leveraged a dataset of more than 100,000 samples from a PERC solar cell production line using multicrystalline silicon wafers. Aggressive outlier removal was applied through k-means clustering, an unsupervised algorithm that groups data points into clusters based on similarity, combined with efficiency-based filtering to enhance data quality.
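The article does not describe exactly how the clusters were used to remove outliers. The sketch below shows one plausible reading on synthetic data, assuming that sparsely populated k-means clusters are treated as outlier groups and that efficiency values outside a plausible window are discarded. The feature values, the cluster count, the 5% cluster-size cutoff, and the 16% to 20% efficiency window are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in data: two wafer-quality features and measured efficiency (%).
n = 1000
features = rng.normal(loc=[0.10, 0.05], scale=[0.02, 0.01], size=(n, 2))
efficiency = 18.5 - 8.0 * features[:, 0] + rng.normal(0.0, 0.1, size=n)
# Inject corrupted records so both filters have something to remove.
features[:10] += 0.5       # inspection readings far from the main population
efficiency[10:20] = 5.0    # implausibly low efficiency measurements


def clean_mask(features, efficiency, n_clusters=3, min_cluster_frac=0.05,
               eff_range=(16.0, 20.0)):
    """Return a boolean mask of the samples kept by both filters."""
    # Standardize so each feature contributes comparably to distances.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(z)
    # Treat sparsely populated clusters as outlier groups and drop them.
    sizes = np.bincount(labels, minlength=n_clusters)
    keep_cluster = sizes[labels] >= min_cluster_frac * len(labels)
    # Efficiency-based filter: keep only values inside a plausible window.
    low, high = eff_range
    keep_eff = (efficiency >= low) & (efficiency <= high)
    return keep_cluster & keep_eff


mask = clean_mask(features, efficiency)
print(f"kept {mask.sum()} of {mask.size} samples")
```

Other rules, such as dropping points that lie far from their cluster centroid, would fit the same description. The right choice depends on how the bad records are distributed in the real data.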

The researchers claim that the ET model achieves high predictive accuracy, is robust to noise, and trains quickly, making it suitable for industrial environments. Defect-related features, including defect area fraction, grain defect area fraction, and dark area fraction, were found to be critical for efficiency prediction.
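A minimal sketch of this kind of model, using scikit-learn's `ExtraTreesRegressor` on synthetic data, could look as follows. The column names, the synthetic relationship between features and efficiency, and the 18% screening threshold are assumptions made for illustration and do not come from the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical column names; the last one is deliberately uninformative.
FEATURES = ["defect_area_fraction", "grain_defect_area_fraction",
            "dark_area_fraction", "wafer_thickness"]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.3, size=(2000, len(FEATURES)))
# Synthetic target: efficiency (%) falls with the three defect-related features.
y = 19.0 - 4.0 * X[:, 0] - 2.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(0.0, 0.05, 2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_test)
r2 = r2_score(y_test, predicted)
importances = dict(zip(FEATURES, model.feature_importances_))

# Early screening: flag wafers whose predicted efficiency is below a cutoff.
flagged = predicted < 18.0

print(f"R^2 on held-out wafers: {r2:.3f}")
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {value:.3f}")
print(f"flagged {flagged.sum()} of {flagged.size} wafers for review")
```

Because the model sees only inspection features, a prediction like this can be made before a wafer enters fabrication, which is what makes early screening possible.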

Moreover, SHAP analysis provided directional insights, identifying thresholds where features begin to reduce efficiency. In the optimized “golden paths” for process equipment, the wet bench was the process step that contributed most to efficiency gains, especially for low-performing wafers.
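To show what a threshold looks like in a Shapley-based analysis, the sketch below computes exact Shapley values for a small toy model against a single baseline wafer, using only the Python standard library. It does not use the SHAP package or the study's model. The toy efficiency function, its threshold at a defect area fraction of 0.10, and the baseline values are hypothetical.

```python
from itertools import combinations
from math import factorial

# Hypothetical baseline wafer used as the reference point.
BASELINE = {
    "defect_area_fraction": 0.05,
    "grain_defect_area_fraction": 0.05,
    "dark_area_fraction": 0.05,
}


def toy_model(x):
    """Hypothetical efficiency model (%): defect area only hurts above 0.10."""
    penalty = 6.0 * max(0.0, x["defect_area_fraction"] - 0.10)
    penalty += 2.0 * x["grain_defect_area_fraction"] + 1.0 * x["dark_area_fraction"]
    return 19.0 - penalty


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x, relative to one baseline."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that exclude the current feature.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {f: x[f] if (f in subset or f == name) else baseline[f]
                          for f in names}
                without_f = {f: x[f] if f in subset else baseline[f] for f in names}
                total += weight * (model(with_f) - model(without_f))
        phi[name] = total
    return phi


# Sweep one feature and watch its contribution turn negative past the threshold.
for d in (0.02, 0.06, 0.10, 0.14, 0.18):
    wafer = dict(BASELINE, defect_area_fraction=d)
    phi = shapley_values(toy_model, wafer, BASELINE)
    print(f"defect_area_fraction={d:.2f}  "
          f"contribution={phi['defect_area_fraction']:+.3f}")
```

In this toy case the contribution stays at zero up to 0.10 and becomes increasingly negative beyond it, which is the kind of directional pattern a SHAP dependence plot is used to reveal. Exact enumeration scales exponentially with the number of features, so real analyses rely on approximations or tree-specific algorithms.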

“While the methodology was validated using multicrystalline solar cell production data, it can be adapted to other photovoltaic technologies using the same underlying framework,” said Lee. “In the case of monocrystalline silicon solar cells, the approach is similarly applicable; however, the lack of grain boundaries compared to multicrystalline wafers limits the number of directly measurable quality-related features.”

“A similar methodological framework can be applied to perovskite solar cells,” he concluded.

The new methodology was introduced in “Industrial data-driven machine learning framework for wafer quality-based decision making toward smart solar-cell manufacturing,” published in Energy and AI.

In August, the same research group presented a machine learning model for predicting sheet resistance in phosphorus oxychloride (POCl3) doping processes in solar cell manufacturing. The model was found to optimize process conditions more efficiently and quickly than the conventional, expensive trial-and-error methods used in the PV industry.

“We found that the model's learned representations and predictions are consistent with established physical and theoretical understanding. This provides confidence in the reliability and interpretability of the model in real-world manufacturing environments,” Lee told pv magazine at the time. “We believe that this methodology could be extended beyond solar cell manufacturing to a wide range of industrial processes.”