A research group from South Korea’s Jeju National University has addressed dataset imbalance in the image-based classification of photovoltaic modules by employing two complementary augmentation strategies: a feature-level synthetic minority over-sampling technique (SMOTE) and image-level stable diffusion (SD) augmentation.
Dataset imbalance occurs when one class in a dataset contains many more examples than another. This can cause models to favor the majority class and overlook rarer but important cases.
“Our study demonstrates that diffusion-based augmentation significantly enhances minority-class detection of dusty panels, while maintaining deployment-level robustness,” corresponding author Raj Kumar told pv magazine. “Unlike prior works that relied solely on traditional oversampling or GAN-based augmentation, our research systematically compares feature-level and image-level imbalance mitigation strategies,” added Kumar. “We also introduced a two-stage synthetic image validation protocol including manual screening plus FID, KID, and perceptual hashing metrics, alongside stratified 10-fold cross-validation and imbalance-sensitive metrics such as F1-score, Cohen’s κ, and MCC.”
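The imbalance-sensitive metrics Kumar mentions can all be derived from a binary confusion matrix. The following is an illustrative sketch, not the study’s evaluation code. It assumes “dusty” is the positive (minority) class, and the function name `imbalance_metrics` is hypothetical.

```python
import math


def imbalance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary-classification metrics; the positive class is assumed to be "dusty"."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_observed = (tp + tn) / total
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)  # both say "dusty" by chance
    p_no = ((fn + tn) / total) * ((fp + tn) / total)   # both say "clean" by chance
    p_expected = p_yes + p_no
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    # Matthews correlation coefficient: uses all four confusion-matrix cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


# Hypothetical test set of 100 clean and 68 dusty images, scored by a model that
# always answers "clean": accuracy is about 0.6, but F1, kappa and MCC are all 0.
print(imbalance_metrics(tp=0, fp=0, fn=68, tn=100))
```

This degenerate case shows why accuracy alone can flatter a model on imbalanced data, while κ and MCC stay at zero for a classifier that has learned nothing about the minority class.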
SMOTE is a widely used method for addressing class imbalance. It takes a sample from the minority class, picks one of its nearest minority-class neighbors, and creates a new synthetic sample at a random point between the two. Because this interpolation works on feature vectors rather than on images as spatial structures, it can disrupt spatial relationships, which are critical for image-based tasks such as detecting dust on solar panels.
To address this limitation, the team explored whether SD could be more effective. This generative artificial intelligence (AI) model starts from random noise and, guided by a text prompt, iteratively removes that noise until a realistic image of a dusty panel emerges. The generated images are then filtered to remove artifacts and duplicates, and the accepted synthetic images are added to the training dataset.
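The interpolation step behind SMOTE can be sketched in a few lines. This is a minimal, SMOTE-style illustration on plain feature vectors, not the study’s implementation. The function name `smote`, the default `k=5`, and picking base samples at random are assumptions, and the article does not say which feature representation the team used.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Four hypothetical 2-D minority samples; every synthetic point falls inside their square.
print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2))
```

If the vectors were flattened pixels, each synthetic sample would be a pixel-wise blend of two images, a ghosted overlay rather than a plausible new panel, which illustrates the spatial concern raised above.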
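Kumar names perceptual hashing as one of the validation metrics. The article does not specify which hash the team used, so the following sketch assumes average hashing (aHash), one common variant, together with a hypothetical Hamming-distance threshold of 5 bits for flagging near-duplicates. Images are plain lists of grayscale rows whose sides are multiples of 8.

```python
def average_hash(gray: list[list[float]], size: int = 8) -> int:
    """aHash of a grayscale image; assumes height and width are multiples of `size`."""
    bh, bw = len(gray) // size, len(gray[0]) // size
    # Downscale to size x size by averaging non-overlapping pixel blocks.
    small = [
        sum(gray[r][c]
            for r in range(i * bh, (i + 1) * bh)
            for c in range(j * bw, (j + 1) * bw)) / (bh * bw)
        for i in range(size)
        for j in range(size)
    ]
    mean = sum(small) / len(small)
    bits = 0
    for v in small:  # one bit per block: brighter than the mean or not
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def filter_near_duplicates(candidates, accepted_hashes=(), max_distance=5):
    """Keep candidates whose hash differs from every accepted hash by > max_distance bits."""
    kept, hashes = [], list(accepted_hashes)
    for img in candidates:
        h = average_hash(img)
        if all(hamming(h, other) > max_distance for other in hashes):
            kept.append(img)
            hashes.append(h)
    return kept


horizontal = [[c * 10 for c in range(16)] for _ in range(16)]
brighter_copy = [[v + 1 for v in row] for row in horizontal]  # near-duplicate
vertical = [[r * 10 for _ in range(16)] for r in range(16)]
print(len(filter_near_duplicates([horizontal, brighter_copy, vertical])))
```

Hashing catches duplicates cheaply. Distribution-level checks such as FID and KID, which the team also reports, require a pretrained feature extractor and are not covered by this sketch.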

Image: Jeju National University, Energy Reports, CC BY 4.0
The two approaches, SMOTE and SD, were evaluated against each other and against the original imbalanced dataset. The team used the publicly available SolNET dataset, which contains images of polycrystalline silicon modules labeled as either clean or dusty. The dataset comprises 842 images and is imbalanced, with 502 clean modules and 340 dusty ones. All images were resized to 224×224 pixels.
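The class counts above can be used to gauge the size of the imbalance. The following sketch assumes that balancing means raising the dusty class to parity with the clean class. The article does not state how many synthetic images the team actually generated.

```python
counts = {"clean": 502, "dusty": 340}

total = sum(counts.values())                        # 842 images
majority = max(counts.values())
imbalance_ratio = majority / min(counts.values())   # about 1.48 clean images per dusty one
# Synthetic images needed per class to reach parity (assumption: full balancing).
needed = {label: majority - n for label, n in counts.items()}
# Accuracy of a trivial model that always answers "clean" on the full dataset.
baseline_accuracy = majority / total

print(total, needed["dusty"], round(imbalance_ratio, 2), round(baseline_accuracy, 3))
# → 842 162 1.48 0.596
```

A model that ignores the dusty class would already reach about 59.6% accuracy on the full dataset, which is why the study also reports imbalance-sensitive metrics.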
Three convolutional neural network (CNN) models (VGG-16, ResNet50, and MobileNetV3) were tested on both the balanced augmented datasets and the original imbalanced dataset. In all experiments, the data were split into 80% for training and 20% for testing, and the models were evaluated using accuracy, precision, recall, F1-score, and confusion matrices.
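An 80/20 split on imbalanced data is usually stratified so that both parts keep the original class ratio. The article does not say whether this particular split was stratified, although the team also reports stratified 10-fold cross-validation. The following is a stdlib-only sketch under that assumption, with a hypothetical function name.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


labels = ["clean"] * 502 + ["dusty"] * 340
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # → 674 168
```

With these counts the test set holds 100 clean and 68 dusty images. In practice, scikit-learn’s `train_test_split(..., stratify=labels)` does the same job.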
“Several findings were particularly noteworthy,” said Kumar. “ResNet50, for instance, improved from 76.53% accuracy on the imbalanced dataset to 98.87% using SD augmentation. Minority-class detection was nearly perfect, with scores reaching 99%, and both Cohen’s κ and MCC exceeded 0.90.”
Kumar also emphasized that, while SMOTE improved model performance, SD consistently produced superior results by preserving spatial realism. “After training on the balanced datasets, models retained up to 98% accuracy even when tested on the original imbalanced dataset,” he noted. “These findings confirm that realistic, image-level augmentation is critical for improving PV dust detection performance.”
Looking ahead, the team is working to further enhance their imbalance mitigation framework by integrating undersampling with oversampling techniques. “While this study focused on SD and SMOTE, our updated research introduces Tomek-Link undersampling to remove borderline majority-class samples and reduce class overlap,” Kumar explained. “This hybrid approach improves decision boundary clarity, minimizes noise, and enhances minority-class detection more effectively than oversampling alone.”
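A Tomek link is a pair of samples from different classes that are each other’s nearest neighbor. Removing the majority-class member of each pair is the standard Tomek-link undersampling step. The sketch below illustrates that textbook definition with hypothetical helper names. The article does not describe how the team combines it with SD or SMOTE.

```python
import math


def tomek_links(points, labels):
    """Index pairs (i, j), i < j, of opposite-class mutual nearest neighbours."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def undersample_majority(points, labels, majority):
    """Drop the majority-class member of every Tomek link (binary case)."""
    drop = {i if labels[i] == majority else j for i, j in tomek_links(points, labels)}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


pts = [[0.0], [1.0], [1.2], [5.0], [6.0]]
lbl = ["clean", "clean", "dusty", "dusty", "dusty"]
print(tomek_links(pts, lbl))  # → [(1, 2)]
```

Only the borderline clean sample at index 1 is removed, which sharpens the boundary between the two classes without discarding majority samples far from it.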
The two methodologies were presented in “Mitigating dataset imbalance using image-based stable diffusion and feature-level SMOTE for solar panel classification with CNNs,” published in Energy Reports.