Several distinct issues must be solved when analyzing big, very high-dimensional data, or big data containing discrete variables of very high cardinality:
- Performance requirements must be satisfied for effective process monitoring and anomaly detection, predictive analytics and fault classification, and root-cause analysis.
- Discrete predictor variables of high cardinality (for example, codes identifying thousands of tools) must be pre-processed and converted into fewer columns, or into single-column continuous derived variables.
- Initial feature selection methods must then be applied to reduce the very large number of predictor variables to a smaller subset of “important” predictors, which are then related to key process outcomes using machine learning algorithms.
- Results must be delivered to an interactive visualization platform that enables actionable insights for engineers and process stakeholders.
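To make the pre-processing and screening steps above concrete, here is a minimal sketch of one common approach: smoothed target (impact) encoding to collapse a high-cardinality code column into a single continuous column, followed by a simple correlation-based initial feature screen. The function names, the smoothing parameter, and the correlation-based ranking are illustrative assumptions, not details of the architecture described in the whitepaper:

```python
from collections import defaultdict
from statistics import mean

def target_encode(codes, outcome, smoothing=10.0):
    """Collapse a high-cardinality discrete column (e.g. thousands of tool IDs)
    into one continuous column: the smoothed per-category mean of the outcome."""
    global_mean = mean(outcome)
    sums, counts = defaultdict(float), defaultdict(int)
    for code, y in zip(codes, outcome):
        sums[code] += y
        counts[code] += 1
    # Shrink rarely seen categories toward the global mean to limit overfitting.
    encoding = {
        code: (sums[code] + smoothing * global_mean) / (counts[code] + smoothing)
        for code in counts
    }
    return [encoding[code] for code in codes]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def screen_features(features, outcome, top_k=10):
    """Initial feature screen: rank continuous predictor columns by the
    absolute correlation with the process outcome, keeping the top_k names."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], outcome)),
        reverse=True,
    )
    return ranked[:top_k]
```

In production these transformations would run at scale on the analytics platform rather than in plain Python; the sketch only illustrates the transformation logic.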
Download the whitepaper to learn about an architecture developed by TIBCO for a large semiconductor manufacturer to implement these steps efficiently, along with real-world analytics use cases typically encountered in this industry.