Background
Integrated circuits (ICs), commonly referred to as microchips or computer chips, play a pivotal role in modern electronic devices. These compact chips consist of interconnected components, including transistors, resistors, and capacitors, etched onto a single piece of semiconductor material, typically silicon. Their widespread use in computers, smartphones, and televisions enables essential functions such as data processing and storage.
ICs have revolutionized electronics by enabling device miniaturization and enhanced functionality. As technology rapidly evolves, ICs continue to improve in performance while shrinking in size, following the trajectory predicted by Moore's Law [1]. This evolution brings new challenges for semiconductor manufacturing companies: the increasing complexity of manufacturing processes makes it more difficult to maintain stringent quality control standards.
This report focuses on a case study of a critical assembly plant within a global semiconductor company. Responsible for several final stages of chip manufacturing, this plant faces immense pressure to maintain product quality. To prevent the "garbage in, garbage out" phenomenon and enhance early failure detection, quality control efforts concentrate on the initial stage of the entire process: the Flip Chip Attach module. In this module, chips are inverted to connect solder bumps with the underlying electronics or circuit boards.
Problem and solution introduction
Currently, the challenges within the Flip Chip Attach module stem from managing process parameters that significantly impact chip quality. Specifically, the adoption of advanced Thermal Compression Bonding (TCB) methods introduces issues such as thermal coefficient of expansion mismatches, temperature gradients, misalignment, tilt, and process-induced stress and warpage [2]. Additionally, foreign materials (FM), including dust, dirt, and unintentional contamination from the environment, affect product quality. All semiconductor products are sensitive to FM, making its control crucial not only in this module but throughout the entire manufacturing process. In 1993, Parks and O'Hanlon stated that microcontamination directly impacts the yield and reliability of semiconductor devices [3]. Hence, controlling microcontamination and reducing the defect losses it induces remain key challenges in semiconductor manufacturing processes.
The proposed solution to the quality challenges faced in the Flip Chip Attach module involves the integration of two robust methodologies: the DMAIC (Define, Measure, Analyze, Improve, Control) framework and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. In detail, the research is structured around two key contributions:
- Problem Identification: Leveraging the DMAIC framework, we meticulously identify the existing challenges within the Flip Chip Attach module during the DMA (Define, Measure, Analyze) phase
- DBSCAN Implementation: The application of the DBSCAN clustering algorithm - known for its ability to detect arbitrary-shaped clusters in non-spherical data regions while effectively identifying noise outliers - occurs during the IC (Improve and Control) phase, aligning seamlessly with the DMAIC model
The above context and issues form the title of this thesis: "Integrating DMAIC Framework and DBSCAN Clustering Algorithm for Enhanced Quality Assurance in Semiconductor Manufacturing".
Thesis goals
The primary goal of this study is to enhance quality control and quality assurance capabilities, thereby minimizing defect waste within the Flip Chip Attach module of the semiconductor industry
To achieve the defined goal, the thesis focuses on the following specific missions:
• Leveraging the DMAIC methodology to systematically evaluate and identify the root causes of defect waste within the study module
• Exploring the machine learning technique DBSCAN to optimize the existing process, ensuring precise control over the quality outcomes of each product within the study module
• Diligently documenting the results and outcomes of interventions after implementation
Scope and constraints
The focus of this study lies within the Flip Chip Attach operation in an Assembly Test Manufacturing plant, a critical stage in semiconductor production. Specifically, the study addresses a chronic issue related to foreign materials (FM) impacting the current process.
Key considerations are as below:
- Time Constraints: Conducting a comprehensive thesis in the semiconductor industry demands substantial time and resources, and experimental work and data collection are essential components. Due to confidentiality concerns and potential intellectual property violations, specific company names, machinery details, and process parameters cannot be disclosed explicitly. The study duration is set at six months.
- Adherence to Company Rules: Definitions and efficiency criteria will align with company guidelines
- Data Accessibility Challenges: The semiconductor industry operates in a highly competitive landscape, with companies safeguarding proprietary data. Consequently, access to relevant and detailed information may be restricted, impacting the depth of analysis.
- Resource Availability: Research in this field necessitates specialized equipment, facilities, and expert knowledge. However, the limited availability of these resources may pose a challenge to the author.
Structure
The thesis comprises five well-defined chapters, each contributing to a comprehensive understanding of the research topic:
The introductory chapter sets the stage by providing an overview of the thesis. It explains the rationale behind the study, emphasizing its significance and urgency within the context of the company. Additionally, this chapter outlines the goals to be achieved and establishes the study's scope.
- Chapter 2: Literature Review and Methodology
This chapter delves into the core literature relevant to the research. Topics such as DMAIC (Define, Measure, Analyze, Improve, Control), Lean Six Sigma, quality control, quality improvement, and the fundamental DBSCAN machine learning algorithm are explored. Furthermore, insights from related studies that inform the approach are presented.
Here, the research subject is introduced: the Flip Chip Attach operation within an assembly and test manufacturing company. Leveraging the DMAIC methodology, the author addresses the current problem and sets clear objectives during the Define phase. Data collection and process capability measurement occur in the subsequent Measure phase. Techniques and tools are then employed to analyze the root cause during the Analyze phase.
The fourth chapter focuses on the proposed solution. The author applies the machine learning method DBSCAN to overcome limitations in the existing process, aiming to solve the identified problem during the Improve phase. Additionally, how this solution can be sustained during manufacturing is discussed, and its effectiveness is validated in the Control phase.
In the final chapter, the study's results are consolidated. This part provides a comprehensive overview of the findings and offers recommendations for further research in this critical domain.
The semiconductor manufacturing industry, characterized by its complexity and stringent quality requirements, necessitates continuous process improvements and robust quality assurance methods. With the advent of advanced manufacturing technologies and the increasing demand for high-quality semiconductor devices, maintaining process efficiency and minimizing defects have become paramount.
Literature review
Six Sigma and DMAIC Framework
Six Sigma is a continuous improvement methodology that has been acknowledged for its effectiveness in attaining organizational objectives related to operational efficiency and service quality [4]. It is based on the cyclical project implementation methodology known as DMAIC, displayed in Fig. 2.1. DMAIC (Define, Measure, Analyze, Improve, and Control) is a structured, data-driven problem-solving methodology central to Six Sigma. It is widely adopted across various industries to enhance process performance and quality [5]. Recent studies underscore its efficacy in complex manufacturing settings. The DMAIC framework starts with defining the problem and setting goals. The Measure phase involves collecting data to understand current performance. Analyzing this data helps identify the root causes of issues. The Improve phase focuses on implementing solutions, followed by the Control phase to maintain the improvements [6].
Recent research highlights the continued relevance and application of DMAIC Six Sigma in manufacturing [7]. For example, a 2018 study by Gupta et al. demonstrated significant improvements in operational efficiency and defect reduction through DMAIC in the tire-manufacturing context [8]. Similarly, Kumar et al. emphasized the benefits of DMAIC and stated that the challenges of applying this methodology are often outweighed by the significant returns from the initiative [9].
DBSCAN Clustering Algorithm
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a robust clustering algorithm used to identify clusters in large datasets by discovering areas of high density separated by areas of low density. Unlike traditional clustering methods such as k-means, DBSCAN does not require the number of clusters to be predefined and is effective in handling noise and outliers [10].
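As a brief illustration of this behavior, the following minimal sketch runs scikit-learn's DBSCAN on synthetic two-dimensional data (the eps and min_samples values here are purely illustrative and are not the parameters used later in this thesis):

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense groups plus a few scattered points acting as noise
data = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
    rng.normal(loc=1.0, scale=0.1, size=(50, 2)),
    rng.uniform(low=-1.0, high=2.0, size=(5, 2)),
])

# eps is the neighborhood radius, min_samples the density threshold
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(data)

# DBSCAN labels noise points as -1; cluster ids start at 0
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", int((labels == -1).sum()))

Because the number of clusters is discovered from the data rather than specified in advance, the same call works whether one or several dense groups are present.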
A study in 2021 utilized DBSCAN, along with other clustering algorithms, to automatically group components by similarity in the automotive industry, facilitating anomaly detection and cost reduction through improved specification and pricing accuracy [11]. Although DBSCAN offers significant advantages, its adoption within the manufacturing context remains limited, with sparse related research.
Integrating DMAIC and DBSCAN for Quality Assurance
The integration of DBSCAN within the Six Sigma DMAIC framework for enhanced quality assurance in semiconductor manufacturing represents a novel approach in the field. While Six Sigma methodologies are well established for process improvement, the incorporation of DBSCAN introduces a data-driven clustering technique not commonly utilized in this context. This integration holds promise for addressing quality assurance challenges in semiconductor manufacturing by leveraging DBSCAN's ability to identify patterns and anomalies in complex datasets, thus enhancing the efficiency and effectiveness of the DMAIC process.
The Define phase serves as the foundation of DMAIC, emphasizing the importance of clearly articulating the problem or opportunity for improvement. By establishing project goals, scope, and stakeholders, this phase ensures alignment with organizational objectives and sets the stage for success. Effective definition lays the groundwork for focused efforts and informed decision-making throughout the project lifecycle.
In the Define phase, project teams collaborate to precisely define the problem statement, identify critical-to-quality factors, and establish measurable objectives. Clear delineation of project boundaries and deliverables facilitates a shared understanding among stakeholders and guides subsequent activities.
Table 2.1 Tools used for Define phase
Step in Define phase | Tools
Identify the Problem or Opportunity | Brainstorming sessions, affinity diagrams, stakeholder analysis
Develop a Problem Statement | 5 Whys, fishbone diagram
Define Goal and Objective | SMART criteria, project charter template
Establish Project Scope | Project scope statement, project charter
Identify Key Stakeholders | Stakeholder analysis matrices
Gain Leadership Support | Executive summaries, presentations, and business cases
The Measure phase of DMAIC is pivotal in obtaining a comprehensive understanding of the current state of the process under scrutiny. Through systematic data collection and analysis, this phase establishes baseline performance metrics, enabling informed decision-making and targeted interventions.
During the Measure phase, emphasis is placed on quantifying process performance and variability. Through data gathering and analysis techniques, key process parameters are identified and measured to assess process capability and stability.
Table 2.2 Tools used for Measure phase
Step in Measure phase | Tools
Identify Process Inputs and Outputs | Process mapping techniques, value stream mapping, and flowcharts
Define Measurement Objectives | SMART criteria (Specific, Measurable, Achievable, Relevant, Time-bound), Measurement Systems Analysis (MSA)
 | Check sheets, surveys, sampling plans, and automated data collection systems
Collect Data | Data collection forms, software applications for data entry, automated sensors, and monitoring systems
Analyze Data Quality | Statistical process control (SPC) charts, histograms, and data validation techniques
 | Statistical software packages such as Minitab, JMP, or Excel with data analysis tools
Establish Baseline Performance | Control charts, trend analysis, and benchmarking against industry standards
The Analyze phase delves deeper into the data collected during the Measure phase to uncover root causes of process variation or defects. Employing statistical tools and methodologies, this phase aims to identify critical factors influencing process performance and inform targeted improvement efforts.
In the Analyze phase, data is scrutinized to discern patterns, trends, and relationships indicative of underlying causes of process inefficiencies or defects. Statistical analysis techniques, such as hypothesis testing and regression analysis, are employed to validate hypotheses and prioritize improvement opportunities.
Table 2.3 Tools used for Analyze phase
Step in Analyze phase | Tools
Review Data Collected in the Measure Phase | Statistical analysis software such as Minitab, JMP, or Excel
Identify Potential Root Causes | Cause-and-effect diagrams (fishbone diagrams), scatter plots, and regression analysis
Conduct Root Cause Analysis | 5 Whys technique, fault tree analysis, and Ishikawa diagrams
Perform Hypothesis Testing | Statistical tests such as t-tests, ANOVA (analysis of variance), chi-square tests, and hypothesis testing frameworks
Verify Root Causes | Design of experiments (DOE), regression analysis, and control charts
Summarize Findings | Reports, presentations, and visual aids
The Improve phase represents the proactive stage of DMAIC, where targeted interventions are implemented to address identified root causes and drive measurable improvements in process performance. Through experimentation and innovation, this phase seeks to optimize process parameters and enhance overall effectiveness.
In the Improve phase, validated solutions are developed and implemented to address identified root causes of process variation or defects. Rigorous testing and piloting of improvement initiatives are conducted to validate their effectiveness before full-scale implementation.
Table 2.4 Tools used for Improvement phase
Step in Improve phase | Tools
Generate Potential Solutions | Brainstorming sessions, mind maps, and affinity diagrams
Evaluate and Prioritize Solutions | Decision matrices, prioritization matrices, and cost-benefit analysis tools
Develop Implementation Plan | Gantt charts, action plans, and project management software
Implement Solutions | Project management tools, task tracking systems, and regular progress reviews
Conduct Pilot Tests or Trials | Control charts, process capability analysis, and feedback mechanisms
Refine and Optimize Solutions | Continuous improvement tools such as PDCA (Plan-Do-Check-Act) cycles, iterative testing
Document Lessons Learned | Lessons learned templates, post-implementation reviews, and knowledge management systems
The Control phase of DMAIC focuses on sustaining the gains achieved through process improvement efforts and embedding a culture of continuous improvement within the organization. By implementing monitoring mechanisms and control measures, this phase ensures long-term stability and reliability of the improved process.
In the Control phase, measures are put in place to monitor and sustain the improvements achieved during the project. Robust control systems, standardized procedures, and ongoing training are established to prevent regression and promote ongoing performance optimization.
Table 2.5 Tools used for Control phase
Step in Control phase | Tools
Develop Control Plans | Control plan templates, standard operating procedures (SOPs), and process flowcharts
Implement Control Measures | Control charts, statistical process control (SPC) software, and automated monitoring systems
 | Work instructions, training manuals, and e-learning platforms
 | Audit checklists, process audits, and performance scorecards
 | Corrective action request (CAR) forms, root cause analysis tools, and 5 Whys
Monitor and Continuously Improve | Performance dashboards, trend analysis tools
Document Lessons Learned and Best Practices | Lessons learned templates, knowledge sharing platforms, and post-implementation reviews
2.1.3.2 Tools frequently used in Quality Control, Lean Six Sigma
Flowcharts are visual representations of processes, depicting the sequence of steps involved and the flow of materials or information
Flowcharts are used to enhance process understanding by providing a clear visual depiction of workflows and to facilitate process analysis and the identification of inefficiencies or bottlenecks. They also:
• Simplify complex processes into easy-to-understand diagrams
• Standardize processes by documenting workflow procedures
• Improve communication and collaboration among team members by visualizing process steps and dependencies
Pareto charts are bar graphs that display the frequency or impact of different categories of problems or defects in descending order
Pareto charts prioritize improvement efforts by identifying the most significant issues that contribute to the majority of problems, so that resources can be focused on high-impact issues to achieve maximum improvement with minimal effort (a small charting sketch is given after the list below). Pareto charts:
• Provide a visual representation of the "vital few" factors that contribute to the majority of problems, enabling targeted problem-solving
• Enhance decision-making by highlighting areas where intervention will yield the greatest benefit
• Facilitate communication of priorities and improvement opportunities to stakeholders
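A minimal sketch of how such a chart can be produced with pandas and matplotlib is shown below; the defect categories and counts are invented for illustration only:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical defect counts, sorted from largest to smallest contributor
counts = pd.Series({"Foreign material": 90, "Misalignment": 5, "Warpage": 3, "Other": 2})
counts = counts.sort_values(ascending=False)
cumulative_pct = counts.cumsum() / counts.sum() * 100

fig, ax_bar = plt.subplots()
ax_bar.bar(counts.index, counts.values)
ax_bar.set_ylabel("Defect count")

# The cumulative-percentage line on a secondary axis highlights the "vital few"
ax_line = ax_bar.twinx()
ax_line.plot(counts.index, cumulative_pct.values, marker="o", color="black")
ax_line.set_ylabel("Cumulative %")

plt.tight_layout()
plt.show()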
Cause-and-Effect Diagrams (Fishbone Diagrams):
Cause-and-effect diagrams visually depict the possible causes of a problem or effect, categorizing potential factors into major categories such as people, process, equipment, and environment
Fishbone diagrams systematically identify the root causes of problems or defects by analyzing potential contributing factors. Moreover, they facilitate collaborative problem-solving and brainstorming sessions by visually organizing and categorizing ideas. They:
• Encourage structured problem-solving by guiding teams through the identification and analysis of potential causes
• Promote cross-functional collaboration by involving stakeholders from different areas of expertise
• Support targeted improvement efforts by helping teams prioritize root causes for further investigation and resolution
Figure 2.3 Cause and Effect diagram (Fishbone diagram)
Scatter plots display the relationship between two variables in a dataset, helping identify correlations, trends, or patterns. They visualize relationships between variables, aiding in data analysis and interpretation, and help reveal correlations and potential cause-and-effect relationships. They:
• Provide a graphical representation of data, making complex relationships easier to understand
• Help identify outliers, anomalies, and nonlinear relationships
• Support informed decision-making and hypothesis testing
Figure 2.4 Scatterplot diagram
Process Control Charts:
Process control charts monitor process performance over time, displaying process data along with control limits to identify trends, shifts, or abnormal variations
Process control charts provide real-time feedback on process stability and performance so that deviations from established control limits can be detected and timely corrective action taken (a sketch of the limit calculation follows the list below). A process control chart:
• Ensures process stability and consistency
• Prevents defects or failures by monitoring key process parameters
• Facilitates continuous improvement and process control
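A minimal sketch of the underlying limit calculation for an individuals-type chart is given below; the measurements are simulated and the three-sigma convention is one common choice, not necessarily the rule used on the production line:

import numpy as np

rng = np.random.default_rng(1)
measurements = rng.normal(loc=10.0, scale=0.2, size=200)  # simulated process data

center_line = measurements.mean()
sigma = measurements.std(ddof=1)
ucl = center_line + 3 * sigma  # upper control limit
lcl = center_line - 3 * sigma  # lower control limit

out_of_control = (measurements > ucl) | (measurements < lcl)
print(f"CL={center_line:.3f}, UCL={ucl:.3f}, LCL={lcl:.3f}, "
      f"out-of-control points: {int(out_of_control.sum())}")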
Failure Mode Effect Analysis (FMEA)
FMEA systematically evaluates potential failure modes, their causes, and their effects, ranking each by Severity, Occurrence, and Detection to compute a Risk Priority Number (RPN = Severity x Occurrence x Detection) that prioritizes mitigation efforts.
Methodology
The methodology is the basis that guides problem solving; it comprises a system of tasks to be performed, arranged logically. It provides the framework for conducting research, collecting data, and building the thesis systematically and consistently, ensuring logical rigor in both research and problem solving. The methodology is briefly described in the diagram below:
Figure 2.7 Methodology using DMAIC model
- Step 1: Identify current status. The scope of this study focuses on the Flip Chip Attach module, an assembly manufacturing process in the semiconductor industry. Quality issues arise due to foreign materials (FM), resulting in a Failure Modes and Effects Analysis (FMEA) Risk Priority Number (RPN) exceeding the target limit. At the conclusion of this step, the study launches the project with specific goals in mind.
- Step 2: Identify the root cause. First, this study identifies the current process of the Flip Chip Attach module. Second, data retrieved from the database via SQL Pathfinder are analyzed in JMP Pro software and visualized with scatterplot diagrams. Pareto charts, fishbone diagrams, a hypothesis table, and scatter plots are then used to analyze the main root cause of the FM. Finally, the study identifies a solution to minimize the impact of FM on the quality of the module.
- Step 3: Solution. After all current quality control methods were found unable to detect the root cause of FM impacting product quality, a data-driven approach is adopted to overcome the existing limitation. The DBSCAN clustering algorithm is explored in a Python environment and embedded into the SQL Pathfinder tool. Lastly, the project owner revalidates the effectiveness of DBSCAN and qualifies it following the company's new process procedures.
- Step 4: Conclusion and recommendation. Evaluate the results after improvement and verify them against the target goal set in Step 1. Reassess the overall improvement process and recommend next steps for future research.
Table 2.6 Tools used in each methodology step
Step | Tools
2. Identify the root cause | Pareto chart, fishbone diagram, hypothesis table, SPC chart
Object introduction
This study examines one of the first ATM (Assembly Test Manufacturing) plants of a semiconductor company in Vietnam. The company started production in 2010, with main products including processors, 5G, IoT, and mobile chips. The semiconductor industry creates high-value products but also requires high investment in technology, machines, and materials. Therefore, factory operations always focus on key indicators such as safety, quality, yield, delivery, and cost. Continuous improvement is an essential activity that the company must pursue to drive those indicators toward their goals.
The main product of the company is a processor chip that contains these components: die, substrate, epoxy, heat spreader, capacitors, resistors, and balls. Refer to Fig. 3.1 for the processor chip structure with a cross-section.
Figure 3.1 Image of a CPU chip product (a) and structure cross-section (b)
Given the complexity of chip design, the manufacturing process must be precisely created to meet all assembly and test requirements. The manufacturing process is divided into three groups, Assembly, Test, and Finish packaging, as shown in Fig. 3.2 below.
Figure 3.2 Assembly-Test-Finish Manufacturing Process
Initially, the die and substrate are sourced from the fabrication manufacturer. Following the attachment of components such as resistors and capacitors, the die and substrate undergo the flip chip attach operation for assembly. Subsequently, the processor chip progresses through additional assembly processes, including epoxy underfill, heat spreader, and ball attach. The chips are then subjected to the Test operation, where functional and non-functional chips are differentiated. Defective chips are identified, segregated, and scrapped, while chips exhibiting satisfactory performance are categorized as functionally sound. Ultimately, the finished-good chips are carefully packaged and dispatched to the next-level customer.
This study aims to address a quality issue in the Flip Chip Attach operation, which occurs at the beginning of the assembly process and whose materials are distributed to many downstream operations. A defect in the Flip Chip Attach operation is only identified when the material reaches the Test operation, resulting in waste. Therefore, ensuring quality within the Flip Chip Attach process is crucial and essential.
Overview about Flipchip Attach operation
Flip Chip Attach process
The Flip Chip Attach operation consists of many machines arranged next to each other to create a long link. Each machine performs a different but essential function. The sequence of the Flip Chip Attach link is described in the figure below:
Figure 3.3 Flipchip Attach module tool configuration
The function of each machine above is explained in detail in the table below:
Destack machine | Destacks media trays from a bundle into individual trays (Fig. 3.4) and feeds them to the flux dispensing machine
Flux dispensing machine | Flux is a cleaning agent used to remove particles that affect solder quality; this machine dispenses flux on top of the substrate surface, covering the entire C4 bump area
Input shuttle | Receives media trays with dispensed flux and distributes them to the bonding machines; it also acts as a communicator with the station controller, which holds information on the substrate lot, die lot, and completed units
Bonding machine | One link normally has 4 individual bonding machines, which attach the die to the substrate and create the interconnection between die and substrate
Output shuttle | Receives completed media trays from the bonders and communicates with the input shuttle about completed media trays
Stack machine | Stacks all completed trays into a bundle contained in a cart
Figure 3.4 Media Tray Bundle (a) and single media tray (b)
The bonding machine, as explained in its function, is the central apparatus in the Flip Chip Attach process, playing a crucial role in producing high-value outcomes. It facilitates the interconnection between the die and substrate using advanced TCB technology. Therefore, maintaining control over the bonding process quality is the primary challenge in this operation.
Overview about Bonding machine
The bonding machine receives substrate media trays with dispensed flux from the upstream tool through a highway; the bonding machine concept is described in Figure 3.5. The concept includes two sides, referred to as left and right, and each side features two individual Substrate Transfer Arms (STAs) and Bond Stages (BSs). The STA picks one row of each media tray and places it onto the BS. Once the substrate is placed on the BS, the BS moves into the bonding area underneath the Bond Head (BH), while the other side waits until the current substrate completes bonding. The bonding process interleaves left and right until the full media tray is finished. During bonding, only one die is attached to one substrate at a time. At the front of the bonding machine, dies are contained in a reel, fed by the Die Pick Arm (DPA), and then transferred by the Die Transfer Arm (DTA) to the Bond Head.
The Flip Chip Attach module using Thermo-Compression Bonding (TCB) is a cutting-edge packaging technology employed in the semiconductor industry. This approach offers several advantages in terms of performance, reliability, and miniaturization compared to traditional packaging methods. In Fig. 3.6, the semiconductor die is flipped upside down and directly bonded to the substrate, enabling shorter interconnects and improved electrical performance. TCB is a bonding technique that uses heat and pressure to create a reliable and robust interconnection between the flip chip and the substrate. This method enhances the mechanical and electrical properties of the package.
Figure 3.6 Bonding process using TCB technology (a) and complete unit after bonding (b)
Precise alignment of the semiconductor die and the substrate is crucial for successful bonding. Advanced alignment techniques ensure accurate positioning. This step involves the application of heat and pressure to create a metallurgical bond between the metal pads on the semiconductor die and the substrate. The process is carefully controlled to avoid damage to the sensitive electronic components.
FlipChip Attach key process parameters
As an essential upstream module, Flip Chip Attach requires high-quality outcomes to distribute good units downstream. Key process parameters and SPC (Statistical Process Control) are the fundamental tools used to ensure the consistency and reliability of the flip chip attach process. The key process parameters of the flip chip attach process are described below:
- Alignment Accuracy: Precise alignment between the die and substrate is critical for achieving reliable electrical connections. Misalignment can result in signal degradation, short circuits, or open circuits, ultimately affecting the functionality of the IC.
- Material Dispensing: Flux is used to remove oxides, contaminants, and other surface impurities that could inhibit the formation of a strong metallurgical bond between the solder and the substrate.
- Bonding Profile: The bonding profile during the flip chip attachment process affects the reflow of solder bumps and the curing of underfill material. Precise control of temperature, force, and height profiles is necessary to ensure proper bonding and avoid defects.
All key process parameters are controlled at the operation on two levels, lot level and unit level, as shown in Fig. 3.7 and described below:
➢ Lot level, SPC implementation: Statistical Process Control (SPC) methodologies are applied to analyze process variation and maintain process stability within acceptable limits. Control charts, process capability indices, and other SPC tools help monitor process performance and identify trends or abnormalities. Lot-level data are calculated at the end of lot processing and uploaded to the database.
➢ Unit level: The bonding machine has real-time monitoring capability; continuous monitoring of process parameters, such as alignment accuracy and bonding profiles, allows for early detection of deviations and immediate corrective actions. Unit-level data are retrieved from the database and calculated after every unit is bonded. Control charts are also applied at the unit level to detect abnormalities in a timely manner.
Flip Chip Attach key failure mode
The Test operation in semiconductor manufacturing refers to the process of evaluating the functionality, performance, and reliability of integrated circuits (ICs) before they are released for commercial use. This step is essential for ensuring that semiconductor devices meet specified quality standards and functional requirements. Test operations encompass a variety of techniques and methodologies aimed at detecting defects, verifying functionality, and assessing performance parameters. With regard to functional testing, some key failure modes caused by the Flip Chip Attach operation are screened out and triggered back upstream for investigation. These key failure modes are described in Table 3.2 below.
Table 3.2 Flipchip attach failure mode description
Failure mode | Description
Non-contact open (NCO) | No connection between the die and substrate bump
Non-wet | Solder fails to adequately bond with the surfaces it is intended to join
Solder bump bridging (SBB) | Solder bumps unintentionally merge, forming a short circuit between neighboring bumps
Missing bump | Solder missing from its intended location
Die crack | Physical defect (fissure) within the die
Capacitor damage | Physical defect on capacitor components
Current state and Define problem
Current state
As semiconductor technology advances and the demand for higher performance and miniaturization increases, bonding technology has become a crucial element in integrated circuit packaging. However, the reliability and performance of flip chip attach modules are frequently compromised by the presence of microcontamination, posing significant challenges to the semiconductor industry.
Microcontamination in flip chip attach modules refers to the unintended presence of minute particles, residues, or impurities at the microscale level during the assembly process. These contaminants can originate from various sources, including the manufacturing environment, materials used in the process, or handling procedures. The impact of microcontamination on flip chip modules includes degraded electrical performance, compromised mechanical integrity, and an increased risk of device failure [3].
The difficulties associated with microcontamination in flip chip attach modules are multifaceted. First, the small size of contaminants makes detection and characterization challenging, often requiring sophisticated equipment and techniques. Second, the dynamic nature of the assembly process, involving intricate steps such as flux application, solder deposition, and reflow, creates opportunities for contamination introduction at each stage. Additionally, the diversity of materials used in flip chip assembly, such as solder bumps, underfills, and substrates, further complicates the identification and mitigation of potential contaminants.
Addressing microcontamination challenges in flip chip attach modules necessitates a comprehensive understanding of the root causes, detection methodologies, and effective mitigation strategies. This research uses the DMAIC methodology to investigate the root cause and proposes innovative solutions to enhance detection, increasing the reliability and performance of flip chip attach modules in the presence of microcontamination.
Define problem
In the FMEA (Failure Modes and Effects Analysis) review of Flip Chip Attach performance in Q3'2023, the FMEA identified the top 10 failure modes impacting the module quality portfolio; SBB (Solder Bump Bridging), with 6 events, had the most excursions, as shown in Fig. 3.8 below.
Figure 3.8 Top failure mode excursion event in 2022-2023
Based on RPN (Risk Priority Number), SBB and NW/NCO (Non-Wet and Non-Contact Open) are classified as high-risk failure modes that require a mitigation plan. These failure modes are only screened and detected at the Test operation. While NW/NCO already had actions taken to reduce its Occurrence, resulting in an RPN that decreased significantly quarter over quarter, SBB has remained above the target goal of 125 for 7 consecutive quarters, as shown in Fig. 3.9.
Figure 3.9 RPN for NCO/NonWet and SBB failure mode
SBB is identified as a bottleneck causing productivity losses in the Flip Chip Attach module. As a consequence, SBB affected many lots with high yield loss caused by short-circuited units (Fig. 3.10). Failed units must be scrapped immediately. Moreover, every excursion event took considerable effort from engineering to trace back data, investigate, disposition, and apply corrective and preventive actions.
Figure 3.10 SBB failure mode cross section
Measure the problem
In the Measure phase, the main mission is to obtain an overall picture of the process in order to identify the root cause of SBB. Failed units were demixed at the Test operation and submitted to the Failure Analysis lab. After confirmation, a Pareto chart of all root causes from historical events shows that FM (Foreign Material) contributed 90% of the failed units that caused SBB. FM refers to all types of microcontamination in the clean room process.
Figure 3.11 Pareto chart for main contributor of SBB
The lab's material analysis (figure below) found that the main elements of some FM samples sent to the lab are Fe, Cr, C, O, and similar elements, which are mostly observed in stainless steel equipment or tooling; refer to Fig. 3.12.
Figure 3.12 Material analysis for FM element evaluation
To further trace the source of FM, a fishbone diagram was used to brainstorm ideas with many equipment experts and process engineers, as displayed in Fig. 3.13.
Figure 3.13 Fishbone diagram of FM sourcing
Of the six main factors given (Environment, Method, Machine, Material, Manpower, and Measurement), five received feedback from engineers as potential FM sources. Next, hypothesis Table 3.3 was built to determine the likelihood of each source; collecting data for or against each hypothesis helps narrow down the FM source.
Table 3.3 Hypothesis table for FM sources
Hypothesis | Action plan to validate
NA Clean room meet 10K standard with frequently record particle data
Meet criteria on last record
Observation and validation current process
NA NA No gap if strictly follow spec guidance
Check by naked eyes and wipe some sensitive area
FM analysis shown FM element is
Observed White FM as flux evaporation and accumulation
FM analysis shown FM element is
White FM as flux evaporation and accumulation
Observe randomly material when running prod
NA NA Not observed Low
Observation FM analysis shown FM element is organic
Operator wear clean room suit while running prod
From the hypothesis table above, issues related to human factors or execution mistakes are mainly a matter of discipline in complying with clean room protocols and are not covered in this case study. Besides, it is clear that FM originating from equipment and tool part failures is unpredictable, requiring a deeper understanding of equipment and process gaps as well as enhanced detection.
Analyze the problem
The mission of this phase is to identify the root cause with data validation. The SBB failure mode during the flip chip attach process occurs when FM persists underneath the substrate or is present before the substrate is placed on the BS; it directly impacts the bonding process, as shown in Figs. 3.14 and 3.15.
Figure 3.14 FM present in Bond stage
Figure 3.15 FM present underneath of substrate
Further investigation of unit-level Key Process Parameters (KPPs) for abnormalities shows that two KPPs retrieved from the unit-level bonding log file, ContactHeight and DeltaHeight, are sensitive to FM and were defined for timely triggering during lot processing. However, the process still has several limitations when trying to utilize these KPPs.
SPC charts are also set up for these KPPs with predefined limits. As shown in Fig. 3.16, the DELTAHEIGHT upper control limit is set at 0; with this control limit, a potential outlier unit could escape detection. Normally, the upper limit would simply be revised to a value that catches all failed and outlier units, for example an UCL of 0.5. However, when considering the whole population of data points across many tools, as in Fig. 3.17, that number is no longer appropriate and might trigger many false alarms, leading to unhealthy productivity performance and over-rejection.
Figure 3.16 SPC unit level limit for ContactHeight and DeltaHeight with less flexibility
Figure 3.17 Many over reject (false alarm) if revise the current Unit level limit
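The over-rejection risk can be illustrated with a small simulation; the tool names, offsets, and the 0.5 limit below are synthetic values chosen only to mirror the situation described above, not actual process data:

import numpy as np

rng = np.random.default_rng(2)
# Three hypothetical bonders whose DeltaHeight distributions sit at different offsets
tools = {name: rng.normal(loc=mu, scale=0.1, size=1000)
         for name, mu in [("bonder_A", -0.3), ("bonder_B", 0.0), ("bonder_C", 0.3)]}

fixed_ucl = 0.5  # a single revised limit applied to every tool
for name, values in tools.items():
    false_alarm_rate = float((values > fixed_ucl).mean())
    print(f"{name}: {false_alarm_rate:.2%} of normal units exceed the fixed UCL")

A limit loose enough for the highest-sitting tool lets outliers on the lower-sitting tools escape, while a limit tight enough for the lower tools over-rejects the higher one, which is exactly the trade-off described above.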
Therefore, an unsupervised machine learning methodology was adopted to enhance FM detection while still adapting well to the current characteristics of the unit-level data: because no control limit needs to be defined, it flexibly adapts to process variation. DBSCAN is a popular clustering algorithm used in data analysis and machine learning. It is designed to group together data points that are closely packed and to separate them from data points located in less dense regions. DBSCAN is particularly useful for identifying clusters of irregular shapes and is robust to noise in the data. Based on the theoretical framework given in Section 2.3, the analysis phase focuses on how DBSCAN is applied to the current data model.
Improvement proposal
Scripting logic
From the grouped tool-level data, the two distributions of ContactHeight and DeltaHeight are collected for analysis. Broken down to bond stage or pocket level, these KPPs need to be normalized into a z-score distribution using formula (1) below to ensure that all data sets share the same scale.
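Assuming the standard z-score form, formula (1) can be written as

$$ z = \frac{x - \mu}{\sigma} \tag{1} $$

where x is the raw KPP value (ContactHeight or DeltaHeight) and μ and σ are the mean and standard deviation of that KPP within the corresponding bond stage or pocket grouping.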
DBSCAN automatically finds all separate clusters based on the eps and min samples the user provides, then labels those clusters, followed by a commonality check if the data are continuous. Technically, the DBSCAN methodology goes through three stages, retrieve raw data, find pattern, and commonality check, as shown in Fig. 4.2.
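A condensed sketch of the three stages is shown below; the column names follow the appendix script, while the function bodies are simplified illustrations rather than the production code:

import pandas as pd
from sklearn.cluster import DBSCAN

def retrieve_raw_data(path: str) -> pd.DataFrame:
    # Stage 1: load unit-level KPP data exported from the database and drop missing rows
    return pd.read_csv(path).dropna()

def find_pattern(df: pd.DataFrame, eps: float, min_samples: int) -> pd.DataFrame:
    # Stage 2: cluster the normalized KPPs; label -1 marks units outside any dense region
    features = df[["CONTACTHEIGHT_NORM", "DELTAHEIGHT_NORM"]]
    df = df.copy()
    df["PATTERN_LABEL"] = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    return df

def commonality_check(df: pd.DataFrame) -> pd.DataFrame:
    # Stage 3: count units outside the dominant cluster per bond stage and pocket
    baseline = df["PATTERN_LABEL"].mode().iloc[0]
    abnormal = df[df["PATTERN_LABEL"] != baseline]
    return (abnormal.groupby(["BONDSTAGE", "CARRIER_X"])
                    .size()
                    .reset_index(name="NUM_ABNORMAL"))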
Script validation
After building a script compatible with the data structure, the next step is to validate it against actual results and to optimize Eps or Min samples if necessary. From the raw data, DBSCAN is easily able to distinguish an abnormal pattern within the population and label it differently from the normal one. After that, the abnormal data are drilled down to bond stage level to find any commonalities at pocket level, as in Fig. 4.3.
Script Optimization
A few noise factors in the current data structure were observed during the validation step. The first is outliers during lot processing due to process variation; this is normal behavior without any impact on quality, but it can create many different clusters, as in Fig. 4.4, where DBSCAN identifies 3 clusters: cluster 0 is the main one, and the remaining 2 clusters are considered outliers, making it difficult to assess the returned data.
Figure 4.4 More than 1 cluster during validation
The second noise factor is quite similar to the outlier issue after drilling the data down to pocket level: some pockets include outliers in the clustering, adding noise to the result assessment. In Fig. 4.5, for example, if pocket 1 and pocket 3 each have 1 outlier unit, the failure rate of each pocket would be 14%, while pocket 7 has 5 units detected by DBSCAN, so its failure rate of 72% is far higher than that of the other 2 pockets.
Figure 4.5 Noise factor Outlier at pocket level
From an engineering assessment perspective, given these noise factors encountered during the validation step, engineering rules were created by the engineers who define the process and are well acquainted with the equipment and tooling:
• 1st rule: If FM persists on the surface of the platform, underneath the substrate, more than one consecutive unit will be impacted (excluding outliers that appear only once during bonding; refer to Fig. 4.6a).
• 2nd rule: For the two chosen KPPs, the more negative the value, the safer the bonding process (negative clusters are excluded and merged into the main cluster, Fig. 4.6b).
• 3rd rule: Given the minimum lot size running at the Flip Chip Attach module returned at pocket level, the algorithm only runs with data of more than 3 units (min samples ≥ 3). A sketch of how these rules can be applied to the DBSCAN output is given after this list.
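The sketch below shows one way these rules could be applied to the labeled DBSCAN output; it is a simplified illustration in which the continuity requirement of the 1st rule is reduced to "more than one abnormal unit per pocket", and the column names follow the appendix script:

import pandas as pd

def apply_engineering_rules(labeled: pd.DataFrame) -> pd.DataFrame:
    df = labeled.copy()
    main_label = df["PATTERN_LABEL"].mode().iloc[0]

    # 2nd rule: clusters sitting on the more-negative side of the main cluster are safe,
    # so merge them back into the main cluster instead of flagging them
    means = df.groupby("PATTERN_LABEL")[["CONTACTHEIGHT_NORM", "DELTAHEIGHT_NORM"]].mean()
    safe_labels = means[(means < means.loc[main_label]).all(axis=1)].index
    df.loc[df["PATTERN_LABEL"].isin(safe_labels), "PATTERN_LABEL"] = main_label

    # 1st rule (simplified): real FM on the bond stage hits more than one unit in a pocket
    abnormal = df[df["PATTERN_LABEL"] != main_label]
    hits = abnormal.groupby(["BONDSTAGE", "CARRIER_X"]).size().reset_index(name="NUM_ABNORMAL")
    hits = hits[hits["NUM_ABNORMAL"] > 1]

    # 3rd rule: only evaluate pockets with at least 3 bonded units (min samples >= 3)
    pocket_sizes = df.groupby(["BONDSTAGE", "CARRIER_X"]).size().reset_index(name="NUM_UNITS")
    hits = hits.merge(pocket_sizes, on=["BONDSTAGE", "CARRIER_X"])
    return hits[hits["NUM_UNITS"] >= 3]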
Script Qualification process
After the optimization step, the engineer needs to qualify the proposed DBSCAN algorithm by going through the qualification process before implementing it. The qualification process ensures the script runs normally without any impact on productivity or tool performance and without inaccurate triggering. The qualification process evaluates the DBSCAN algorithm under two criteria, trigger rate and precision, using the matrix below:
Table 4.1 Matrix for Valid and Invalid issue
 | DBSCAN not triggered | DBSCAN triggered
Invalid issue | True Negative (TN) | False Positive (FP)
Valid issue | False Negative (FN) | True Positive (TP)
The two formulas and criteria for trigger rate and precision are described below; the criteria values were aligned with the statistical agent.
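Assuming the standard confusion-matrix definitions implied by Table 4.1, formulas (1) and (2) can be written as

$$ \text{Trigger rate} = \frac{FP + TP}{TN + FP + FN + TP} \tag{1} $$

$$ \text{Precision} = \frac{TP}{TP + FP} \tag{2} $$

that is, the fraction of processed lots (or units) that DBSCAN flags, and the fraction of flagged cases that turn out to be valid issues.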
Following the matrix in Table 4.1 above, the trigger rate formula (1) and precision (2) are defined so as not to impact production productivity, because every time DBSCAN is triggered, the production line has to be stopped and actions taken immediately to remove the FM sticking to the pedestal.
DBSCAN data collection was performed under high-volume manufacturing mode in both the Vietnam and China factories, for a duration of one quarter from the time the script was turned on.
As a final result, 14,371 production lots were within the scope of the DBSCAN qualification; 28 lots triggered the DBSCAN script, with 226 units impacted. The trigger rate is 0.2% (28/14,371 lots) at lot level and 0.001% (226/26.3M units) at unit level. At lot level the trigger rate does not meet the criteria, but at unit level it still passes, remaining below the 500 DPM (defects per million) criterion.
For the accuracy rate, 9 lots were sampled to validate whether SBB failures were present. Referring to Fig. 4.7, based on results from the X-ray machine at the Failure Analysis lab, all 9 lots had SBB, and FM was detected on the impacted pockets. The accuracy across the 9 lots is 100% (9/9 lots detected), and DBSCAN successfully flagged and highlighted the failed units.
Figure 4.7 FM observed on all cases triggered
Finally, the DBSCAN algorithm was fully approved, having proven that the data returned have high accuracy and do not impact manufacturing productivity.
Implementation
After final approval from the qualification process, DBSCAN is approved for use and will proliferate across other factory sites for all products running at the Flip Chip Attach module.
The DBSCAN algorithm is put into a query in SQL Pathfinder, as in Fig. 4.8, and a script host runs it with an hourly refresh to retrieve and analyze data frequently, as in Fig. 4.9.
Figure 4.9 Script host for DBSCAN script running
Control phase after implementation
The final stage of DMAIC is the Control phase, where engineering controls the improvements made, documents the improvement plan, and monitors and reviews the results.
The script host runs every hour and returns the latest data across the Flip Chip Attach link if any abnormalities are found. The result is returned as an automail notification, as in Fig. 4.10; the automail includes all information on which link/bonder/bond stage and product triggered DBSCAN. The operator needs to stop the impacted bonder and physically check whether the bond stage has potential FM present before resuming production mode.
Figure 4.10 Automail notification for operator take prompted response
As aimed for in the initial Define stage, FMEA is used to highlight the bottleneck of the process, but it can also be used to document the improvement and implementation plan. The key information to fill in is detailed below and in Fig. 4.11:
• Tool type/Function: the tool area where the improvement is performed (Flip Chip Attach)
• Potential Failure Mode/Effects of Failure: SBB causing circuit shorts
• Severity/Occurrence/Detection: rank the current performance of Flip Chip Attach
• Potential Causes/Mechanism of Failure: the root cause of the failure (FM)
• Prevention Control: enable DBSCAN detection
• Detection Control: the operation that screens the failure mode (Test operation)
• Responsibility: the person in charge
• Target completion date: the due date by which completion is required
• Actions taken: documents the actions that have been completed
Figure 4.11 FMEA template to document improvement
Comparing RPN and Occurrence before and after the improvement, the RPN trend shows a significant reduction, meeting the module goal of 125, while Occurrence has been reduced for two consecutive quarters; refer to Fig. 4.12 and Fig. 4.13.
Figure 4.12 RPN Before and After do improvement
Figure 4.13 Occurrence of SBB before and after improvement
Conclusion
After the study, the thesis has demonstrated results against the methodology set out at the beginning; details of the results are shown in Table 5.1 below:
Methodology step | Chapter | Result
1. Identify current status | 3 | Identify current process and problem
2. Identify the root cause | 3 | Identify the root cause and limitations of the current process
3. Solution | 4 | DBSCAN algorithm approached to solve the problem
4. Conclusion and recommendation | 5 | Evaluate the result and limitations of the solution, give recommendations
After DBSCAN was enabled, the SBB failure mode induced by FM was significantly reduced quarter over quarter, bringing the RPN and Occurrence of the SBB failure mode below the module goal target. DBSCAN helped detect 28 out of 14,371 lots with abnormal KPPs and screened out 226 failed units out of 26.3 million units within the Flip Chip Attach module (below 10 DPM).
Evaluation of study
Advantages of study
- Identified the root cause impacting the quality indicator at the Flip Chip Attach operation
- Identified the limitations of the quality control process
- The data-driven approach successfully serves as an alternative to the traditional SPC control chart
- Assessed all noise factors of the process to improve the data returned
- The effectiveness of the solution resulted in a consecutive reduction in the number of occurrence events
Disadvantages of study
- The study only aims to enhance detection through a data-driven approach; it does not focus on eliminating the root cause that creates FM, as FM is unpredictable
- The boundary of the study is limited to the Flip Chip Attach operation
- Manufacturing resources are assumed to remain unchanged, under normal conditions.
Study limitation
- This study specifically focuses on the Flipchip attach operation and may not fully represent other operational functions
- Due to company-sensitive data, we have renamed the company, certain Key Performance Parameters (KPPs), and products
- Given differences in technology and products, KPP behavior may vary slightly and not be identical across all cases.
Recommendation for future research
Building on the advantages of the study, the author recommends the following for future research:
- Apply the approach to all products running at the Flip Chip Attach operation and to upcoming products
- Apply DBSCAN to other KPPs to enhance detection of other failure modes impacting quality outcomes
- Monitor data results from DBSCAN to ensure continued high accuracy and effectiveness
[1] G. Moore, "Cramming more components onto integrated circuits," Electronics, vol. 38, no. 8, pp. 114-117, 1965.
[2] R. K. Ulrich and W. D. Brown, Advanced Electronic Packaging, Fayetteville, Arkansas: Wiley, 2005.
[3] H. G. Parks and J. F. O'Hanlon, "The evolving role of defects and contamination in semiconductor manufacturing," Microelectronics, vol. 24, no. 4, pp. 313-327, 1993.
[4] E. Arnheiter and J. Maleyeff, "The integration of lean management and Six Sigma," The TQM Magazine, 2005.
[5] M. Harry and R. Schroeder, Six Sigma: The Breakthrough Management Strategy Revolutionizing the World's Top Corporations, New York: Currency, 2000.
[6] P. S. Pande, R. P. Neuman, and R. R. Cavanagh, The Six Sigma Way: How GE, Motorola, and Other Top Companies are Honing Their Performance, New York: McGraw-Hill, 2000.
[7] M. Harry and R. Schroeder, "Six sigma: The breakthrough management strategy revolutionizing the world's top corporations," Journal of Software Engineering and Applications, vol. 7, no. 6, 2000.
[8] V. Gupta, R. Jain, M. Meena et al., "Six-sigma application in tire-manufacturing company: a case study," J. Ind. Eng. Int., vol. 14, pp. 511-520, 2018.
[9] P. Kumar et al., "Exploring the Potential of Six Sigma (DMAIC) in Minimizing the Production Defects," in Proceedings of the International Conference on Industrial & Mechanical Engineering and Operations Management, Bangladesh, 2021.
[10] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Knowledge Discovery and Data Mining, vol. 96, no. 34, pp. 226-231, 1996.
[11] M. Guerreiro et al., "Anomaly Detection in Automotive Industry Using Clustering Methods—A Case Study," Applied Sciences, vol. 11, no. 21, 2021.
The logic process of the algorithm is described in Figure App.1 below.
Figure App.1 Logic process of the algorithm
- Step 1: Input variables to be used globally; data are retrieved from the database
- Step 2: Create a table with new columns added from those input variables
- Step 3: Run DBSCAN with the min sample and radius on the assigned KPP name
- Step 4: Label the patterns returned from the DBSCAN output
- Step 5: Find commonalities at pocket level and bond stage level to calculate the failure rate for each pocket and bond stage
- Step 6: Predict any units that have a low chance to fail and exclude them from the pattern
- Step 7: Return pass and fail units, merging all class outputs
Input: global variables
lst_kpp = ['CONTACTHEIGHT', 'DELTAHEIGHT']
lst_std = ['CONTACTHEIGHT_STD', 'DELTAHEIGHT_STD']
lst_norm = ['CONTACTHEIGHT_NORM', 'DELTAHEIGHT_NORM']
lst_bs_mean = ['CONTACTHEIGHT_BS_MEAN', 'DELTAHEIGHT_BS_MEAN']
lst_bs_std = ['CONTACTHEIGHT_BS_STD', 'DELTAHEIGHT_BS_STD']
key_pocket = ['LINK_ID', 'LOT', 'TEST_BOND_HEAD_ID', 'COMPLEVEL_1', 'BONDSTAGE', 'CARRIER_X']
key_bs = ['LINK_ID', 'LOT', 'TEST_BOND_HEAD_ID', 'COMPLEVEL_1', 'BONDSTAGE']
columns = df.columns
kpp = columns[columns.isin(lst_kpp)].values
lot = df.LOT.unique()
die = df.COMPLEVEL_1.unique()
pocket = df.CARRIER_X.unique()
- df (DataFrame): Input data frame
- output (DataFrame): Resultant data frame
Initialize an empty DataFrame called 'output'
For each 'lot' in the data frame:
For each 'die' in the data frame:
Filter rows in 'df' where 'LOT' equals 'lot' and 'COMPLEVEL_1' equals 'die'
Get unique values of 'TEST_BOND_HEAD_ID' and store in 'bonder'
For each 'bonder':
Filter rows in 'df' where 'TEST_BOND_HEAD_ID' equals 'bonder'
Get unique values of 'BONDSTAGE' and store in 'bondstage'
For each 'bs' in 'bondstage':
Filter rows in 'df' where 'BONDSTAGE' equals 'bs'
Get unique values of 'CARRIER_X' and store in 'pocket'
Initialize empty arrays 'qty' and 'pocket_sequence'
For each 'poc' in 'pocket':
Filter rows in 'df' where 'CARRIER_X' equals 'poc'
Compute the quantity of rows and store in 'qty_poc'
Append 'qty_poc' to 'qty'
Append a sequence from 1 to 'qty_poc' to 'pocket_sequence'
Add a new column 'UNIT_QTY_POCKET' to 'df_bondstage' with values from 'qty'
Add a new column 'POCKET_SEQUENCE' to 'df_bondstage' with values from 'pocket_sequence'
For each 'kpp' (a list of features):
Compute the mean and standard deviation of 'kpp' in 'df_bondstage'
If 'kpp_std' is not equal to 0:
Compute normalized values for 'kpp' and add a new column 'kpp_NORM'
Compute normalized values assuming a denominator of
Add a new column 'kpp_NORM'
Compute skewness and kurtosis of 'kpp' and add new columns 'skew' and 'kurt'
Concatenate 'df_bondstage' with 'output' and update
Return the final 'output' DataFrame
Class Clustering (df, kpp_name, radius, ms)
- df (DataFrame): Input data frame
- kpp_name (list): List of feature names
- radius (float): DBSCAN epsilon value
- ms (int): Minimum samples for DBSCAN
- output (DataFrame): Resultant data frame
Initialize an empty DataFrame called 'output'
For each 'lot' in the data frame:
For each 'die' in the data frame:
Filter rows in 'df' where 'LOT' equals 'lot' and 'COMPLEVEL_1' equals 'die'
Get unique values of 'TEST_BOND_HEAD_ID' and store in 'bonder'
For each 'bonder':
Filter rows in 'df' where 'TEST_BOND_HEAD_ID' equals 'bonder'
Get unique values of 'BONDSTAGE' and store in 'bondstage'
For each 'bs' in 'bondstage':
Filter rows in 'df' where 'BONDSTAGE' equals 'bs'
Apply DBSCAN to the selected features in 'df_bondstage' with parameters 'radius' and 'ms'
Get cluster labels and store in 'labels'
Create a new DataFrame 'df_distance' as a copy of 'df_bondstage'
Add a new column 'PATTERN_LABEL' to 'df_distance' with values from 'labels'
Concatenate 'df_distance' with 'output' and update 'output'
Return the final 'output' DataFrame
Class Label_Correction(output)
- output (DataFrame): Input data frame
- corrected_output (DataFrame): Resultant data frame after label correction
# Increment positive pattern labels by 1
For each row in 'output':
If 'PATTERN_LABEL' is greater than or equal to 0:
Increment 'PATTERN_LABEL' by 1
# Compute the most frequent pattern label for each group defined by 'key_bs'
Initialize an empty DataFrame called 'gb'
For each unique value in 'key_bs':
Group rows in 'output' by 'key_bs'
Compute the most frequent pattern label and store it in a new column 'BASELINE_LABEL'
Append this information to 'gb'
# Merge the computed baseline labels back into the original data frame
Join 'output' with 'gb' on 'key_bs' using an inner join
If 'PATTERN_LABEL' equals 'BASELINE_LABEL', set it to 0
If 'PATTERN_LABEL' is greater than 'BASELINE_LABEL', decrement it by 1
Return the corrected 'output' DataFrame
Class Find_CA(Non_baseline, threshold, num_pocket)
- Non_baseline (DataFrame): Input data frame
- threshold (float): Threshold for failure rate
- num_pocket (int): Minimum number of units per pocket
- output (DataFrame): Resultant data frame after finding CA
Initialize a new DataFrame 'abn_table_pocket' as a copy of 'Non_baseline'
For each feature 'i' in the list 'lst_norm':
Filter rows in 'abn_table_pocket' where 'i' value is greater than -1
Add a new column 'NUM_UNIT_POCKET' to 'abn_table_pocket' with initial value 0
Group 'abn_table_pocket' by 'key_pocket' and count the number of rows
Reset the index and keep only relevant columns
Initialize another new DataFrame 'abn_table_bs' as a copy of 'Non_baseline'
Add a new column 'NUM_UNIT_BS' to 'abn_table_bs' with initial value 0
Group 'abn_table_bs' by 'key_bs' and count the number of rows
Reset the index and keep only relevant columns
# Merge the two tables to compute the failure rate at pocket
Join 'abn_table_pocket' with 'abn_table_bs' using 'key_bs' as the common key
Compute the failure rate by dividing 'NUM_UNIT_POCKET' by 'NUM_UNIT_BS'
Filter rows in 'failure_table' where the failure rate is greater than the specified threshold
Filter rows where the number of units per pocket is greater than the specified 'num_pocket'
# Merge with 'Non_baseline' table to get VID
Join 'Non_baseline' with 'failure_table' using 'key_pocket' as the common key, keeping only inner matches
Return the final 'output' DataFrame
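A minimal sketch of the failure-rate check follows (toy unit counts; the 0.5 threshold and the minimum pocket size of 3 are illustrative values): only pockets whose outlier units are both frequent enough and numerous enough survive the filter.

import pandas as pd

# Toy per-pocket and per-bondstage outlier counts
abn_table_pocket = pd.DataFrame({'key_bs': ['A', 'A'],
                                 'CARRIER_X': [1, 2],
                                 'NUM_UNIT_POCKET': [6, 1]})
abn_table_bs = pd.DataFrame({'key_bs': ['A'], 'NUM_UNIT_BS': [10]})

# Merge the two tables and compute the failure rate at pocket level
failure_table = abn_table_pocket.merge(abn_table_bs, on='key_bs')
failure_table['FAILURE_RATE'] = failure_table.NUM_UNIT_POCKET / failure_table.NUM_UNIT_BS

# Keep pockets exceeding both the failure-rate threshold and the minimum unit count
failure_table = failure_table[(failure_table.FAILURE_RATE > 0.5) &
                              (failure_table.NUM_UNIT_POCKET > 3)]
print(failure_table)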
Class Predict(CA_table, clustering_kpp, clustering_radius, clustering_ms, threshold, num_pocket)
- CA_table (DataFrame): Input data frame
- clustering_kpp (list): List of features for clustering
- clustering_radius (float): DBSCAN epsilon value for clustering
- clustering_ms (int): Minimum samples for DBSCAN clustering
- threshold (float): Threshold for failure rate
- num_pocket (int): Minimum number of units per pocket
- output (DataFrame): Resultant data frame after prediction
Initialize an empty DataFrame called 'output'
Create a copy of 'CA_table' called 'CA_table_temp'
Apply clustering to 'CA_table_temp' using the specified parameters ('clustering_kpp', 'clustering_radius', 'clustering_ms')
Correct labels in 'CA_table_temp' using the 'Label_Correction' function
Drop the 'BASELINE_LABEL' column from 'CA_table_temp'
Drop columns related to unit counts and failure rates from 'CA_table_temp'
For each pattern label 'i' from 0 to the maximum pattern label in 'CA_table_temp':
Filter rows in 'CA_table_temp' where 'PATTERN_LABEL' equals 'i'
Find CA using the 'Find_CA' function with specified 'threshold' and 'num_pocket'
Concatenate the resulting DataFrame with 'output'
Return the final 'output' DataFrame
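The idea behind this prediction step, re-clustering the flagged units on their pocket sequence to check whether they fail consecutively, can be sketched with toy sequence positions; eps=3 and min_samples=2 mirror the values passed in the main program below:

import pandas as pd
from sklearn.cluster import DBSCAN

# Toy pocket-sequence positions of flagged units in one pocket
ca = pd.DataFrame({'POCKET_SEQUENCE': [3, 4, 5, 6, 20]})

# A small eps groups only consecutive (or near-consecutive) positions
db = DBSCAN(eps=3, min_samples=2).fit(ca[['POCKET_SEQUENCE']])
ca['PATTERN_LABEL'] = db.labels_  # consecutive failures share a label; isolated ones become noise (-1)
print(ca)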
Class Merge_Table(predict_table, table):
Initialize an empty DataFrame called ‘output’
Extract the columns ‘COMPLEVEL_1’ and ‘VISUAL_ID’ from ‘predict_table’ and store them in ‘predict_vid’
Add a new column ‘PREDICTION’ to ‘predict_vid’ with an initial value of 1
Merge ‘table’ with ‘predict_vid’ on the columns ‘COMPLEVEL_1’ and ‘VISUAL_ID’ using a left join, then update the ‘PREDICTION’ column: if it is not equal to 1, set it to 0
Return the final ‘output’ DataFrame
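A minimal sketch of this merge-and-flag logic with toy identifiers: the left join keeps every unit in 'table', and units without a matching prediction receive a flag of 0.

import pandas as pd

# Toy full unit table and toy predicted-unit table
table = pd.DataFrame({'COMPLEVEL_1': ['D1', 'D1', 'D2'],
                      'VISUAL_ID':   ['V1', 'V2', 'V3']})
predict_vid = pd.DataFrame({'COMPLEVEL_1': ['D1'], 'VISUAL_ID': ['V2']})
predict_vid['PREDICTION'] = 1

# Left join; units without a prediction get flag 0
output = table.merge(predict_vid, on=['COMPLEVEL_1', 'VISUAL_ID'], how='left')
output['PREDICTION'] = output['PREDICTION'].fillna(0).astype(int)
print(output)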
Class Main Program (if __name__ == “__main__”):
Create the ‘table’ DataFrame using the ‘Create_Table’ function with input data ‘df’
Apply clustering to ‘table’ using the specified parameters (‘lst_norm’, 1.8, 4) and store the result in ‘cluster_table’
Correct labels in ‘cluster_table’ using the ‘Label_Correction’ function
Drop the ‘BASELINE_LABEL’ column from ‘cluster_table’
If ‘cluster_table’ is not empty:
Find CA using the ‘Find_CA’ function with the specified ‘threshold’ and ‘num_pocket’ for rows where ‘PATTERN_LABEL’ is not 0
Otherwise, set ‘predict_table’ to an empty DataFrame
If ‘cluster_table’ is not empty:
Predict using the ‘Predict’ function with specified parameters (‘CA_table’, [‘POCKET_SEQUENCE’], 3, 2, 0.5, 3)
Save ‘predict_table’ to a CSV file named ‘predict_table.csv’
Compute the ‘STD_table’ for rows where ‘PATTERN_LABEL’ is 0 using the ‘STD_Pocket’ function
Compute the ‘STD_pocket_table’ using the ‘High_STD_Pocket’ function with ‘STD_table’ as input
Save ‘STD_pocket_table’ to a CSV file named ‘STD_pocket_table.csv’
Merge ‘predict_table’ with ‘table’ using the ‘Merge_Table’ function and save the result to a CSV file named ‘Raw.csv’
Following the same logic as the algorithm in Appendix A, the DBSCAN clustering code was also implemented in a Python environment, as shown below.
# coding: utf-8
import numpy as np
import math as m
import pandas as pd
from sklearn.cluster import DBSCAN
import warnings
import os

warnings.filterwarnings('ignore')
# Read KPP data + drop NA values to prevent missing data
path = os.getcwd()
df = pd.read_csv('DBSCAN_summary.csv')
if (df.shape[0] == 0):
    exit()
df = df.dropna()
sort_order = ['LOT', 'COMPLEVEL_1', 'TEST_BOND_HEAD_ID', 'BONDSTAGE',
              'CARRIER_X', 'PROCESSTIMESTAMP']
df = df.sort_values(by=sort_order)
# Declare global variables
lst_kpp = ['CONTACTHEIGHT', 'DELTAHEIGHT']
lst_std = ['CONTACTHEIGHT_STD', 'DELTAHEIGHT_STD']
lst_norm = ['CONTACTHEIGHT_NORM', 'DELTAHEIGHT_NORM']
lst_bs_mean = ['CONTACTHEIGHT_BS_MEAN', 'DELTAHEIGHT_BS_MEAN']
lst_bs_std = ['CONTACTHEIGHT_BS_STD', 'DELTAHEIGHT_BS_STD']
lst_quantile = ['CONTACTHEIGHT_QUANTILE', 'DELTAHEIGHT_QUANTILE']
lst_FLAG = ['CONTACTHEIGHT_STD_FLAG', 'DELTAHEIGHT_STD_FLAG']
lst_skew = ['CONTACTHEIGHT_BS_SKEW', 'DELTAHEIGHT_BS_SKEW']
lst_kurt = ['CONTACTHEIGHT_BS_KURT', 'DELTAHEIGHT_BS_KURT']
key_pocket = ['LINK_ID', 'LOT', 'TEST_BOND_HEAD_ID', 'COMPLEVEL_1',
              'BONDSTAGE', 'CARRIER_X']
key_bs = ['LINK_ID', 'LOT', 'TEST_BOND_HEAD_ID', 'COMPLEVEL_1', 'BONDSTAGE']
columns = df.columns
kpp = columns[columns.isin(lst_kpp)].values
lot = df.LOT.unique()
die = df.COMPLEVEL_1.unique()
pocket = df.CARRIER_X.unique()
# Distance function
def euclidean_distance(x, y):
    sum = 0
    for i in range(x.shape[0]):
        sum += (x[i] - y[i]) ** 2
    return m.sqrt(sum)
def Create_Table(df):
    output = pd.DataFrame()
    for l in lot:
        for d in die:
            df_lot_die = df[(df.LOT == l) & (df.COMPLEVEL_1 == d)]
            bonder = df_lot_die.TEST_BOND_HEAD_ID.unique()
            for b in bonder:
                df_bonder = df_lot_die[df_lot_die.TEST_BOND_HEAD_ID == b]
                bondstage = df_bonder.BONDSTAGE.unique()
                for bs in bondstage:
                    df_bondstage = df_bonder[df_bonder.BONDSTAGE == bs]
                    pocket = df_bondstage.CARRIER_X.unique()

                    # Compute how many units per bondstage pocket and put into column 'UNIT_QTY_POCKET'
                    qty = np.empty(0)
                    pocket_sequence = np.empty(0)
                    for poc in pocket:
                        df_poc = df_bondstage[df_bondstage.CARRIER_X == poc]
                        qty_poc = np.full(df_poc.shape[0], df_poc.shape[0])
                        qty = np.concatenate([qty, qty_poc])
                        pocket_sequence = np.concatenate([pocket_sequence, list(range(1, np.max(qty_poc) + 1))])
                    df_bondstage['UNIT_QTY_POCKET'] = qty
                    df_bondstage['POCKET_SEQUENCE'] = pocket_sequence

                    # Z-score normalization
                    for i in range(len(lst_kpp)):
                        kpp_mean = np.mean(df_bondstage[kpp[i]])
                        kpp_std = np.std(df_bondstage[kpp[i]])
                        if (kpp_std != 0):
                            df_bondstage[lst_kpp[i] + '_' + 'NORM'] = df_bondstage[kpp[i]].map(lambda x: (x - kpp_mean) / kpp_std)
                        else:
                            df_bondstage[lst_kpp[i] + '_' + 'NORM'] = df_bondstage[kpp[i]].map(lambda x: (x - kpp_mean) / 1)

                        # Compute skew and kurtosis
                        df_bondstage[lst_skew[i]] = df_bondstage[kpp[i]].skew()
                        df_bondstage[lst_kurt[i]] = df_bondstage[kpp[i]].kurt()
                    output = pd.concat([output, df_bondstage])
    return output
# Loop through each group and run DBSCAN to find patterns
def Clustering(df, kpp_name, radius, ms):
    # Initialize the output dataframe
    output = pd.DataFrame()
    for l in lot:
        for d in die:
            df_lot_die = df[(df.LOT == l) & (df.COMPLEVEL_1 == d)]
            bonder = df_lot_die.TEST_BOND_HEAD_ID.unique()
            for b in bonder:
                df_bonder = df_lot_die[df_lot_die.TEST_BOND_HEAD_ID == b]
                bondstage = df_bonder.BONDSTAGE.unique()
                for bs in bondstage:
                    df_bondstage = df_bonder[df_bonder.BONDSTAGE == bs]
                    # pocket = df_bondstage.CARRIER_X.unique()

                    # DBSCAN
                    db = DBSCAN(eps=radius, min_samples=ms).fit(df_bondstage.loc[:, kpp_name])
                    labels = db.labels_
                    df_distance = df_bondstage.copy()
                    df_distance['PATTERN_LABEL'] = labels

                    # DataFrame concatenation
                    output = pd.concat([output, df_distance])
    return output
# Correct the true label for baseline and abnormal patterns (if they exist)
def Label_Correction(output):
    output.loc[output.PATTERN_LABEL >= 0, 'PATTERN_LABEL'] = output.PATTERN_LABEL + 1

    # Create new column BASELINE_LABEL, which is the most repeated value in PATTERN_LABEL
    gb = output.groupby(key_bs)['PATTERN_LABEL'].agg([('BASELINE_LABEL', lambda x: x.value_counts().index[0])])

    # Convert the groupby output from Series to DataFrame
    gb = gb.reset_index()
    output = pd.merge(output, gb, on=key_bs, how='inner')

    # Previously, all pattern labels were incremented. Patterns whose label
    # matches the baseline are set to 0; other pattern labels are decreased back.
    output.loc[output.PATTERN_LABEL == output.BASELINE_LABEL, 'PATTERN_LABEL'] = 0
    output.loc[output.PATTERN_LABEL > output.BASELINE_LABEL, 'PATTERN_LABEL'] = output.PATTERN_LABEL - 1
    return output
# Filter out noise and abnormal patterns by doing CA
def Find_CA(Non_baseline, threshold, num_pocket):
    # Find CA on pocket
    abn_table_pocket = Non_baseline.copy()
    # Filter outlier units having both CONTACTHEIGHT_NORM & DELTAHEIGHT_NORM > -1
    for i in lst_norm:
        abn_table_pocket = abn_table_pocket[abn_table_pocket[i] > -1]
    abn_table_pocket['NUM_UNIT_POCKET'] = 0
    abn_table_pocket = abn_table_pocket.groupby(key_pocket).count().reset_index()
    # Only focus on a few columns
    abn_table_pocket = abn_table_pocket[['LINK_ID', 'LOT', 'TEST_BOND_HEAD_ID', 'COMPLEVEL_1',
                                         'BONDSTAGE', 'CARRIER_X', 'NUM_UNIT_POCKET']]

    # Find CA on bondstage
    abn_table_bs = Non_baseline.copy()
    abn_table_bs['NUM_UNIT_BS'] = 0
    abn_table_bs = abn_table_bs.groupby(key_bs).count().reset_index()
    # Only focus on a few columns
    abn_table_bs = abn_table_bs[['LINK_ID', 'LOT', 'TEST_BOND_HEAD_ID', 'COMPLEVEL_1',
                                 'BONDSTAGE', 'NUM_UNIT_BS']]

    # Merge the two tables to compute the failure rate at pocket level
    failure_table = abn_table_pocket.join(abn_table_bs.set_index(key_bs), on=key_bs)

    # Compute the failure rate
    failure_table['FAILURE_RATE'] = failure_table.NUM_UNIT_POCKET / failure_table.NUM_UNIT_BS
    # The outlier units need to concentrate at one pocket with failure rate > threshold
    failure_table = failure_table[failure_table.FAILURE_RATE > threshold]
    failure_table = failure_table[failure_table.NUM_UNIT_POCKET > num_pocket]

    # Merge with the Non_baseline table to get VID
    output = pd.merge(Non_baseline, failure_table, on=key_pocket, how='inner')
    return output
# DBSCAN those CA_table units to see whether they fail continuously or not
def Predict(CA_table, clustering_kpp, clustering_radius, clustering_ms, threshold, num_pocket):
    output = pd.DataFrame()
    CA_table_temp = CA_table.copy()
    CA_table_temp = Clustering(CA_table, clustering_kpp, clustering_radius, clustering_ms)
    CA_table_temp = Label_Correction(CA_table_temp)
    CA_table_temp = CA_table_temp.drop(['BASELINE_LABEL'], axis='columns')
    CA_table_temp = CA_table_temp.drop(['NUM_UNIT_POCKET', 'NUM_UNIT_BS', 'FAILURE_RATE'], axis='columns')
    for i in range(np.max(CA_table_temp.PATTERN_LABEL) + 1):
        df_temp = Find_CA(CA_table_temp[CA_table_temp.PATTERN_LABEL == i], threshold, num_pocket)
        output = pd.concat([output, df_temp])
    return output
# Function to compute std at pocket level
def STD_Pocket(baseline):
    high_std_table = baseline.copy()

    # Compute pocket std
    high_std_table = high_std_table.groupby(key_pocket)[lst_kpp].std().reset_index()

    # Rename columns from [lst_kpp] to [lst_std]
    high_std_table.columns = high_std_table.columns[:6].tolist() + lst_std

    # Merge with cluster_table to get VID
    output = pd.merge(cluster_table, high_std_table, on=key_pocket, how='inner')

    # Calculate std quantile
    high_std_table = high_std_table.groupby(key_bs)[lst_std].quantile(0.5).reset_index()

    # Rename columns from [lst_std] to [lst_quantile]
    high_std_table.columns = high_std_table.columns[:5].tolist() + lst_quantile

    # Merge to get the final output table
    output = pd.merge(output, high_std_table, on=key_bs, how='inner')
    return output
# Function to get pockets having high std but not establishing an abnormal pattern
def High_STD_Pocket(STD_table):
    for i in range(len(lst_kpp)):
        STD_table[lst_std[i] + '_' + 'FLAG'] = 0
        STD_table.loc[(STD_table[lst_std[i]] > 2 * STD_table[lst_quantile[i]]) &
                      (STD_table[lst_skew[i]] > -0.5) &
                      (STD_table['UNIT_QTY_POCKET'] > 8), lst_std[i] + '_' + 'FLAG'] = 1
    FLAG_table = np.array(STD_table.iloc[:, STD_table.shape[1] - len(lst_kpp):STD_table.shape[1]])
    for i in range(FLAG_table.shape[0]):
        if np.any(FLAG_table[i]):