Undergraduate Research: AI, Deep Reinforcement Learning, Deep Learning, Scheduling, Intelligent Multi-Robot Scheduling in Semiconductor Manufacturing Systems, and Modeling and Scheduling Algorithms for Discrete-Event Systems.
Guangdong University of Technology, School of Computer Science
Program
Artificial Intelligence Innovation Class (B.Eng.)
Research
Research Projects
Selected undergraduate research projects on reinforcement learning and intelligent scheduling for semiconductor manufacturing and central air-conditioning systems.
Project 01 (2025.11-Present)
Scheduling of Single-Arm Multicluster Tools for Multi-Type Wafers with Parallel Processing, Residency Time Constraints, and Reentrant Buffer Cooling Operations
Developing a scheduling framework for single-arm multicluster tools that process multiple wafer types under parallel-processing, residency-time, and reentrant buffer cooling constraints. The project is ongoing; an exact model, a domain-specific neighborhood search, and a learning-guided optimization loop are already in place.
Project 02 (2025.11-2026.03)
Scheduling of Single-Arm Multicluster Tools for Multi-Type Wafers with Parallel Processing and Residency Time Constraints
Developed a multi-agent reinforcement-learning scheduling framework for single-arm multicluster tools under parallel-processing and residency-time constraints. In representative two-type and three-type wafer scenarios, the proposed MA-QMIX-PPO achieved the best or tied-best minimum makespan against MAQMIX, MAPPO, MADDPG, MATD3, and MACTD4. The related paper has been submitted to an SCI-indexed journal with me as second author.
Project 03 (2025.06-2026.01)
Scheduling of a Single-Arm Cluster Tool for Processing Two Types of Wafers with Parallel Processing
Deep Reinforcement Learning, Scheduling, Single Robot, Single Cluster Tool, CCF-C Conference
Time
2025.06-2026.01
Role
Project Lead
Developed a deep-reinforcement-learning scheduling strategy for a single robotic arm in a cluster tool handling two wafer types with parallel processing and residency-time constraints. The improved TD3 method achieved lower makespan, faster convergence, and stronger generalization than TD3, DDPG, and DQN across representative scenarios, and the related paper has been accepted by a CCF-C conference with me as first author.
Project 04 (2024.09-2025.07)
Intelligent Scheduling System for Central Air Conditioning
Neural Networks, DDPG, HVAC Control, Power Regulation, Guangdong Regional Third Prize
Time
2024.09-2025.07
Role
Project Participant
Based on the operating principles of central air-conditioning systems, this project studied how to coordinate multiple components, used neural-network-based models to predict post-frequency-regulation power under constrained external conditions, and developed an adaptive intelligent scheduling system with a DDPG-based control algorithm. The project won Third Prize in the Guangdong Regional Competition of the China Robot and Artificial Intelligence Competition.
Awards
Honors and Awards
Current Focus
Competitions in Preparation
I am currently preparing for several major competitions that are in progress or approaching submission and final evaluation.
In Progress
China Robot and Artificial Intelligence Competition
Currently advancing toward a national-level award after the regional competition stage.
China Statistical Modeling Competition
Preparing modeling work, data analysis, and competition materials for the upcoming round.
Network Technology Challenge
Currently organizing technical preparation and solution design for the next competition stage.
Challenge Cup Special Competition
Developing project materials and implementation details for submission and final presentation.
China Undergraduate Mathematical Contest in Modeling
2024
Award: Second Prize, Guangdong Provincial Division
National Mathematics Competition for College Students
2024
Award: Third Prize, Guangdong Provincial Division
China Robot and Artificial Intelligence Competition
2025
Track: AI Innovation
Award: Third Prize, Guangdong Regional Competition
Outstanding Student Scholarship, Guangdong University of Technology
2023-2024
Award: Third-Class Scholarship
Outstanding Student Scholarship, Guangdong University of Technology
2024-2025
Award: Second-Class Scholarship
Outstanding Class Monitor and Student Cadre, Guangdong University of Technology
2024-2025
Award: First-Class Scholarship
Education
Education Timeline
Guangdong University of Technology
2023-2027
Program
Artificial Intelligence Innovation Class (B.Eng.)
GPA
3.536 / 5.0 (First Five Semesters)
English Proficiency
IELTS
Target Band 7.0 (In Preparation)
CET-4
Score: 465
CET-6
Score: 466
Coursework
Major Courses
Introduction to Artificial Intelligence, Machine Learning, Deep Learning, Reinforcement Learning, Data Structures and Algorithms, Computer Networks, Operating Systems, Computer Organization, Engineering Mathematical Analysis, Linear Algebra, Discrete Mathematics, Numerical Methods, Software Engineering, Programming, Database Systems, Digital Logic and System Design, Optimization Methods, Linux Technologies, Big Data Principles and Applications, Compiler Principles, Natural Language Processing, High-Performance Computing, Knowledge Engineering and Knowledge Graphs, Semiconductor Manufacturing Processes and Equipment
This study focuses on the scheduling problem of a single-arm cluster tool processing multiple types of wafers under wafer residency time constraints, and proposes a scheduling strategy based on deep reinforcement learning. The main contents and contributions are as follows:
This paper studies the scheduling problem of a single-arm cluster tool processing two wafer types. Unlike many existing studies that focus only on steady-state scheduling, my work considers the entire production cycle, including the startup transient, steady-state, and close-down transient phases. At the same time, it incorporates parallel processing modules, shared module competition, and wafer residency time constraints. The challenge is that a single robotic arm can move only one wafer at a time, while different wafer routes may overlap and share modules, and wafers must be removed from modules within a strict time limit after processing.
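The residency-time constraint described above can be written as a simple check. The following is a minimal sketch, not the paper's implementation: the `Visit` record and its fields (`process_end`, `unload_time`, `limit`) are hypothetical names introduced only for illustration, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One wafer visit to a process module (all fields are illustrative)."""
    wafer: str
    module: str
    process_end: float   # time at which processing in the module finishes
    unload_time: float   # time at which the robot removes the wafer
    limit: float         # maximum allowed residency after processing


def residency_violations(visits):
    """Return the visits whose post-processing residency breaks the limit."""
    bad = []
    for v in visits:
        residency = v.unload_time - v.process_end
        # A wafer cannot be unloaded before processing ends, and it must
        # leave the module within `limit` time units afterwards.
        if residency < 0 or residency > v.limit:
            bad.append(v)
    return bad


# Hypothetical data: B1 waits 25 time units against a limit of 20.
visits = [
    Visit("A1", "PM1", process_end=50.0, unload_time=58.0, limit=20.0),
    Visit("B1", "PM2", process_end=70.0, unload_time=95.0, limit=20.0),
]
late = residency_violations(visits)
```

Because the single arm serves every module, a late unload in one module is usually caused by the robot being busy elsewhere, which is why the constraint couples all scheduling decisions.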
To address this, I formulated the problem as an MDP and developed an improved TD3 algorithm tailored to this scheduling environment. The main technical contributions include dynamic noise exploration, dynamic prioritized replay, adaptive cyclic learning rate, an attention mechanism, a makespan-oriented reward function, and normalized state features. These designs improve both training stability and policy quality.
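Three of the listed components can be illustrated with short schedule functions. The text above does not give their exact form, so this sketch assumes an exponentially decaying noise scale for dynamic exploration, a triangular cycle for the learning rate, and a terminal reward normalized by a reference makespan. All function names and default values are assumptions, not values from the paper.

```python
import math


def noise_scale(step, sigma_start=0.5, sigma_end=0.05, decay_steps=10_000):
    """Exponentially decay the exploration-noise standard deviation."""
    frac = math.exp(-step / decay_steps)
    return sigma_end + (sigma_start - sigma_end) * frac


def cyclic_lr(step, base_lr=1e-4, max_lr=1e-3, half_cycle=2_000):
    """Triangular cyclic learning rate: rises for half_cycle steps, then falls."""
    cycle_pos = step % (2 * half_cycle)
    # Distance from the peak scaled to [0, 1]: 1 at the cycle start, 0 at the peak.
    x = abs(cycle_pos / half_cycle - 1.0)
    return base_lr + (max_lr - base_lr) * (1.0 - x)


def terminal_reward(makespan, reference_makespan):
    """Makespan-oriented reward: positive when the episode beats the reference."""
    return (reference_makespan - makespan) / reference_makespan
```

With these defaults the noise starts at 0.5 and approaches 0.05, and the learning rate peaks every 4,000 steps. A shorter makespan than the reference yields a positive terminal reward.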
In the experiments, I designed four representative scenarios with and without shared PMs and parallel paths, and compared the proposed method against traditional TD3, DDPG, and DQN. The results show that my improved TD3 achieves better minimum makespan and faster convergence in most scenarios, demonstrating stronger robustness and generalization capability.
The significance of this work is that it demonstrates the feasibility of using deep reinforcement learning for complex, highly constrained, and dynamically coupled scheduling problems in semiconductor manufacturing. It provides a more flexible and practical alternative to conventional rule-based or static optimization approaches.
This research has been accepted by a CCF-C international conference and provides a novel solution for scheduling optimization in semiconductor manufacturing.
This study extends the single-arm scheduling problem to a multi-cluster tool scenario and, in the context of wafer residency time constraints and multi-type wafer processing, proposes a scheduling strategy based on multi-agent deep reinforcement learning. The main contents and contributions are as follows:
This paper studies a more challenging setting: single-arm multicluster tools. Compared with a single cluster tool, a multicluster tool consists of multiple interconnected cluster tools linked by shared buffer modules, which introduces stronger coupling, more severe resource competition, and a higher risk of deadlock. This work focuses on concurrent processing of two or three wafer types, while considering shared BMs, parallel PMs, wafer residency time constraints, and full-cycle scheduling.
To solve this problem, I formulated it as a multi-agent reinforcement learning problem and proposed the MA-QMIX-PPO algorithm. In this framework, each robot is controlled by an independent agent, while coordination is achieved through centralized training and decentralized execution. The key designs include QMIX-based value decomposition, action masking for invalid actions, epsilon-greedy exploration with exponential decay, PPO clipping for stable policy updates, asynchronous actor-critic updates, soft target-network updates, and state normalization.
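Three of these designs are easy to show in isolation: action masking, epsilon-greedy exploration with exponential decay, and the PPO clipped surrogate. The sketch below is illustrative only; it does not include the QMIX mixing network, and the function names, decay constants, and the three-action example are assumptions rather than details of MA-QMIX-PPO.

```python
import math
import random


def epsilon(step, eps_start=1.0, eps_end=0.05, decay=5_000):
    """Exponentially decayed exploration rate."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay)


def select_action(q_values, valid_mask, eps, rng):
    """Epsilon-greedy choice restricted to actions marked valid."""
    valid = [a for a, ok in enumerate(valid_mask) if ok]
    if not valid:
        raise ValueError("no valid action in this state")
    if rng.random() < eps:
        return rng.choice(valid)
    # Greedy step: invalid actions are never considered.
    return max(valid, key=lambda a: q_values[a])


def ppo_clip_term(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate for one sample: min(r * A, clip(r, 1 - e, 1 + e) * A)."""
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


rng = random.Random(0)
# Action 2 has the highest value but is masked out (for example, because
# the target module is occupied), so the greedy choice falls to action 1.
action = select_action([0.1, 0.4, 0.9], [True, True, False], eps=0.0, rng=rng)
```

Masking before selection means an agent can never pick a move that is physically impossible, which removes one common source of wasted exploration in constrained scheduling environments.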
In the experiments, I evaluated the method on representative scenarios for two-type wafer processing at a 1:1 ratio and three-type wafer processing at a 1:1:1 ratio. The proposed MA-QMIX-PPO achieved the best or tied-best minimum makespan compared with MAQMIX, MAPPO, MADDPG, MATD3, and MACTD4, while also showing strong coordination ability and stable training behavior.
The significance of this work is that it extends intelligent scheduling from a single-equipment setting to a multi-equipment collaborative setting, showing that multi-agent reinforcement learning can effectively handle real-world semiconductor systems with shared resources, strong coupling, and deadlock constraints. It provides a strong foundation for future work on larger-scale and more realistic manufacturing systems.
This research is currently under submission to a top-tier SCI journal and is expected to provide theoretical and technical support for intelligent scheduling in semiconductor manufacturing systems.
This project focused on the intelligent scheduling of a central air-conditioning system from September 2024 to July 2025. Starting from the operating principles of the system, we studied how major components such as chillers, pumps, fans, and auxiliary control units could be scheduled in a coordinated manner so that the entire system could respond efficiently to changing environmental and operating conditions.
A core part of the work was to analyze the coupling relationships among different devices and identify scheduling patterns that balance system stability, energy efficiency, and control responsiveness. Instead of optimizing each component independently, the project treated central air conditioning as a collaborative multi-device system and explored how equipment-level decisions affect the overall operating state.
To support intelligent decision-making, we applied neural network techniques to model and predict the power consumption of central air-conditioning equipment after frequency regulation under constrained external conditions. This prediction module provided a relatively accurate estimate of how each device would respond after control adjustments, which created a practical data-driven basis for subsequent scheduling and optimization.
On top of the predictive model, we designed and implemented an intelligent scheduling algorithm and system based on the DDPG reinforcement learning framework. The algorithm was built to adapt to environmental changes and dynamic variations in device power, enabling the scheduler to continuously adjust control actions and improve the coordination strategy in real time. This made the system more adaptive than conventional rule-based scheduling approaches.
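The core update rules of DDPG are compact enough to sketch without a deep-learning library. The example below shows the critic target, the soft target-network update, and a mapping from a bounded actor output to a frequency setpoint. The 30 to 50 Hz range and all names are hypothetical and are not taken from the project.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


def scale_action(raw, low, high):
    """Map an actor output in [-1, 1] to a physical setpoint in [low, high]."""
    raw = max(-1.0, min(1.0, raw))
    return low + (raw + 1.0) * 0.5 * (high - low)


# Hypothetical usage: a pump frequency setpoint between 30 Hz and 50 Hz.
setpoint_hz = scale_action(0.0, 30.0, 50.0)
```

In a setup like the one described, the reward passed to `td_target` could be computed from the predicted post-regulation power, so the prediction model and the controller share one objective.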
The project was recognized with Third Prize in the Guangdong Regional Competition of the China Robot and Artificial Intelligence Competition, which highlighted both its technical completeness and its practical value. Overall, this work demonstrated how neural-network-based power prediction and reinforcement-learning-based control can be integrated into an end-to-end intelligent scheduling system for central air conditioning, providing a useful reference for energy management and smart building applications.
This project, running from November 2025 to the present, studies the scheduling of single-arm multicluster tools for multi-type wafers under parallel processing, residency time constraints, and reentrant buffer cooling operations. I serve as the overall project lead. The work is ongoing, but it has already produced concrete advances in modeling, search design, and preliminary optimization results.
The system considered in this project contains two single-arm robots, multiple process modules, and two buffer modules that jointly support transfer, queuing, and cooling. Three wafer types follow different processing routes. In particular, the C-type route reenters BM1 for a second waiting stage, and this reentrant buffer operation directly determines the required cooling time. As a result, BM1 is not merely a passive buffer but a critical bottleneck where resource competition, cooling feasibility, and downstream timing constraints become tightly coupled.
The main challenge is not a simple routing problem, because the wafer paths are fixed in advance. The real difficulty is to schedule all operations on time while satisfying several hard constraints simultaneously: resource exclusivity for robots, PMs, and BMs; precedence relationships inside each wafer route; upper bounds on residency time after specific processing stages; and minimum waiting requirements for buffer cooling. This makes the problem a strongly coupled combinatorial optimization task with shared resources, time windows, and explicit cooling logic.
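The four families of hard constraints listed above can be expressed as a schedule checker. This is a simplified sketch under stated assumptions: the `Op` record and its fields are hypothetical, each operation is treated as occupying its resource only during [start, end), and the residency bound is modeled as a maximum gap before the wafer's next step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    wafer: str
    step: int                 # position in the wafer's fixed route
    resource: str             # robot, PM, or BM occupied during [start, end)
    start: float
    end: float
    min_duration: float = 0.0              # e.g. minimum cooling time in a buffer
    max_gap_after: Optional[float] = None  # residency bound before the next step


def check_schedule(ops):
    """Return a list of human-readable violations (empty means feasible)."""
    problems = []

    # 1. Resource exclusivity: intervals on one resource must not overlap.
    by_resource = defaultdict(list)
    for op in ops:
        by_resource[op.resource].append(op)
    for res, group in by_resource.items():
        group.sort(key=lambda o: o.start)
        for prev, cur in zip(group, group[1:]):
            if cur.start < prev.end:
                problems.append(f"overlap on {res}: {prev.wafer} and {cur.wafer}")

    # 2. Precedence and residency along each wafer route.
    by_wafer = defaultdict(list)
    for op in ops:
        by_wafer[op.wafer].append(op)
    for wafer, route in by_wafer.items():
        route.sort(key=lambda o: o.step)
        for prev, cur in zip(route, route[1:]):
            gap = cur.start - prev.end
            if gap < 0:
                problems.append(f"precedence broken for {wafer} at step {cur.step}")
            elif prev.max_gap_after is not None and gap > prev.max_gap_after:
                problems.append(f"residency exceeded for {wafer} after step {prev.step}")

    # 3. Minimum waiting (cooling) time.
    for op in ops:
        if op.end - op.start < op.min_duration:
            problems.append(f"cooling too short for {op.wafer} on {op.resource}")
    return problems


# Hypothetical schedule that breaks exclusivity, residency, and cooling at once.
ops = [
    Op("C1", 0, "PM1", 0, 10, max_gap_after=5),
    Op("C1", 1, "BM1", 18, 20, min_duration=6),
    Op("A1", 0, "BM1", 19, 25),
]
problems = check_schedule(ops)
```

A checker of this kind is useful as an independent validator for schedules produced by any solver or learned policy, since it does not depend on how the schedule was generated.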
To address this, the project has established a three-layer framework. At the bottom layer, a CP-SAT interval-scheduling model translates the physical process flow into a unified set of verifiable variables and constraints, ensuring that generated schedules remain strictly feasible. On top of that, a domain-specific large neighborhood search strategy repeatedly destroys and repairs only the most critical local regions, such as bottleneck resources, high-risk residency chains, BM1-related operation chains, and the special C-type cooling flow. At the upper layer, a GNN-plus-RL controller observes the current schedule as a dynamic graph and learns which neighborhood operator and relaxation size should be selected next.
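The destroy-and-repair loop at the middle layer can be sketched on a toy problem. This is a stand-in, not the project's pipeline: the repair step here is greedy reinsertion rather than a CP-SAT solve, the destroyed jobs are chosen uniformly at random rather than by a learned controller, and the single-resource problem with release times is invented for illustration.

```python
import random


def makespan(order, jobs):
    """Completion time of the last job when run in `order` on one resource."""
    t = 0.0
    for j in order:
        release, duration = jobs[j]
        t = max(t, release) + duration
    return t


def repair(partial, removed, jobs):
    """Greedy repair: reinsert each removed job at its best position."""
    order = list(partial)
    for j in removed:
        candidates = [order[:i] + [j] + order[i:] for i in range(len(order) + 1)]
        order = min(candidates, key=lambda o: makespan(o, jobs))
    return order


def lns(jobs, order, iterations=200, destroy_size=2, seed=0):
    """Destroy-repair loop that accepts only non-worsening schedules."""
    rng = random.Random(seed)
    best, best_cost = list(order), makespan(order, jobs)
    for _ in range(iterations):
        # Destroy: free a small subset of jobs from the incumbent sequence.
        removed = rng.sample(best, destroy_size)
        partial = [j for j in best if j not in removed]
        # Repair: rebuild a complete sequence around the fixed remainder.
        candidate = repair(partial, removed, jobs)
        cost = makespan(candidate, jobs)
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical jobs: name -> (release time, duration), with a poor initial order.
jobs = {"w1": (0, 4), "w2": (3, 2), "w3": (6, 3), "w4": (10, 1)}
start = ["w4", "w3", "w2", "w1"]
best, cost = lns(jobs, start)
```

In the framework described above, the random `rng.sample` call is the place where a learned controller would choose which neighborhood to destroy and how many operations to relax.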
The project has already moved beyond conceptual design. A unified exact model has been built, key domain neighborhoods have been defined, and the learning-guided destroy-repair loop has been connected to the optimization pipeline. Preliminary experiments indicate that domain-aware search can improve schedule quality over a short-horizon exact baseline, and the learning-guided version shows additional promise in handling larger and more constrained instances. These results are important because they demonstrate that the project is generating real technical value even before the full study is finished.
Current work focuses on extending the experimental evaluation, aligning time-budget comparisons across baselines, and further refining the learned search controller. The project is still in progress, but it has already produced solid interim results and a clear path toward a stronger final solution for intelligent semiconductor scheduling with reentrant buffer cooling constraints.
English Summary
This study focuses on the scheduling problem of a single-arm cluster tool processing multiple wafer types under wafer residency time constraints, and proposes a scheduling strategy based on deep reinforcement learning. Beyond establishing the scheduling model, the work delivers clear performance gains: the improved TD3 policy achieves better makespan, faster convergence, and stronger robustness than several mainstream deep-RL baselines in representative scenarios, and the resulting paper has been accepted by a CCF-C international conference with me as first author.
This first-author paper has been accepted by a CCF-C international conference and provides a novel solution for scheduling optimization in semiconductor manufacturing.
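Chinese Introduction (translated)
This study addresses the scheduling of a single-arm, single cluster tool processing multiple wafer types under wafer residency-time constraints and proposes a scheduling strategy based on deep reinforcement learning. Beyond formulating the problem, the work produced notable experimental results: the improved TD3 achieved better makespan, faster convergence, and stronger robustness across multiple representative scenarios, and the paper has been accepted by a CCF-C international conference with me as first author.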
This study extends the single-arm scheduling problem to a multicluster-tool setting and, under wafer residency-time constraints and multi-type wafer processing, proposes a scheduling strategy based on multi-agent deep reinforcement learning. The work is especially outcome-driven: the proposed MA-QMIX-PPO achieves the best or tied-best minimum makespan in representative two-type and three-type wafer scenarios, outperforms several strong MARL baselines in stability and coordination quality, and has progressed to submission to a top-tier SCI journal, with me as second author.
This second-author paper is currently under submission to a top-tier SCI journal and is expected to provide theoretical and technical support for intelligent scheduling in semiconductor manufacturing systems.
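Chinese Introduction (translated)
This study extends the single-arm scheduling problem to the multicluster-tool setting and, under wafer residency-time constraints and concurrent processing of multiple wafer types, proposes a scheduling strategy based on multi-agent deep reinforcement learning. Beyond completing the modeling and algorithm design for this more complex setting, the work achieved strong results in representative experiments: MA-QMIX-PPO obtained the best or tied-best minimum makespan in the two-type and three-type concurrent wafer-processing scenarios and showed better stability and coordination than several mainstream multi-agent reinforcement-learning baselines. The related paper, with me as second author, has entered the submission stage at a top-tier SCI journal.
This research has been submitted to a top-tier SCI journal with me as second author and is expected to provide further theoretical and technical support for intelligent scheduling in semiconductor manufacturing systems.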