The Anomaly Detection workflow detects outliers, correlates anomalies across instruments,
classifies root causes, and forecasts future values. It runs in batch and streaming modes,
producing anomaly scores, classifications, and a growth quality prediction alongside the
raw timeseries.
This workflow runs on the outputs of the Characterization and Tool State workflows.
Detection Process
For each timeseries property:
- Embedding: Raw timeseries windows are encoded into high-dimensional vectors using a timeseries foundation model fine-tuned on process data.
- Outlier detection: Each embedding is scored against learned baselines. Outliers in the latent space correspond to unusual temporal patterns in the original signal.
- Cross-source correlation: After per-parameter detection completes, the system identifies temporal overlaps and embedding-space similarities across instruments to group related anomalies.
- Classification: Each anomaly is classified by probable root cause (e.g., source depletion, temperature excursion, substrate defect, chamber contamination) using the embedding vector and contextual features.
- Forecasting: The model projects the timeseries forward with confidence intervals, flagging predicted anomalies before they develop.
- Growth quality scoring: Signals from all active data sources are combined into a single quality score representing the probability of meeting target specifications.
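
The cross-source correlation step in the list above can be illustrated with a minimal sketch. It assumes cosine similarity as the measure of embedding-space similarity and a placeholder threshold of 0.8; the page specifies neither. The `Anomaly` record, its fields, and all function names are hypothetical and not part of the Atomscale API.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Anomaly:
    source: str             # instrument or data source name (hypothetical field)
    start: float            # seconds from the run's time origin
    end: float
    embedding: list[float]  # latent vector of the anomalous window


def overlaps(a: Anomaly, b: Anomaly) -> bool:
    """True when the two time intervals share at least one instant."""
    return a.start <= b.end and b.start <= a.end


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def linked(a: Anomaly, b: Anomaly, min_similarity: float = 0.8) -> bool:
    """Link anomalies from different sources that overlap in time and look alike.

    min_similarity is an assumed placeholder, not a documented default.
    """
    return (
        a.source != b.source
        and overlaps(a, b)
        and cosine_similarity(a.embedding, b.embedding) >= min_similarity
    )


def group_anomalies(anomalies: list[Anomaly], min_similarity: float = 0.8) -> list[list[Anomaly]]:
    """Group anomalies into connected components of the 'linked' relation."""
    parent = list(range(len(anomalies)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(anomalies)):
        for j in range(i + 1, len(anomalies)):
            if linked(anomalies[i], anomalies[j], min_similarity):
                parent[find(i)] = find(j)

    groups: dict[int, list[Anomaly]] = {}
    for i, anomaly in enumerate(anomalies):
        groups.setdefault(find(i), []).append(anomaly)
    return list(groups.values())
```

Grouping by connected components means that two anomalies can end up in the same group through a third one that links to both, which is one reasonable reading of "group related anomalies" but not the only one.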
Output Metrics
For each property in a timeseries, the workflow produces:

| Metric | Description | Range |
|---|---|---|
| anomaly_score | Latent-space outlier score. Higher values indicate greater deviation from baselines. | 0 (normal) to 1 (anomalous) |
| anomaly_class | Classification label, assigned when the anomaly score exceeds the detection threshold | Categorical |
| forecast | Predicted value N steps ahead based on current trajectory | Same unit as property |
| forecast_lower / forecast_upper | Forecast confidence interval bounds | Same unit as property |
| z_score | Standard deviations from running mean (legacy) | Unbounded |
| ema_z_score | Residual z-score relative to exponential moving average (legacy) | Unbounded |
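
The table does not define the two legacy metrics beyond a one-line description, so the following is only one plausible reading. It assumes `z_score` compares each sample with the mean and standard deviation of all earlier samples, and `ema_z_score` divides the residual from an exponential moving average by the root of an exponentially weighted mean of squared residuals. The function names and the smoothing factor `alpha` are hypothetical.

```python
import math


def running_z_scores(values: list[float]) -> list[float]:
    """z-score of each sample against the mean and std of all earlier samples."""
    scores = []
    count, mean, m2 = 0, 0.0, 0.0  # Welford accumulators
    for x in values:
        if count >= 2 and m2 > 0.0:
            std = math.sqrt(m2 / (count - 1))
            scores.append((x - mean) / std)
        else:
            scores.append(math.nan)  # not enough history yet
        # Fold the current sample into the running statistics.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return scores


def ema_z_scores(values: list[float], alpha: float = 0.1) -> list[float]:
    """z-score of each sample's residual from an exponential moving average."""
    scores = []
    ema = None
    ema_var = 0.0  # exponentially weighted mean of squared residuals
    for x in values:
        if ema is None:
            ema = x
            scores.append(math.nan)
            continue
        residual = x - ema
        if ema_var > 0.0:
            scores.append(residual / math.sqrt(ema_var))
        else:
            scores.append(math.nan)
        # Update after scoring so a sample does not dampen its own score.
        ema_var = (1 - alpha) * ema_var + alpha * residual * residual
        ema += alpha * residual
    return scores
```

Both functions return NaN until enough history exists to estimate a spread. Because the statistics are updated only after a sample is scored, a sudden spike is measured against the history that preceded it.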
At the growth session level:

| Metric | Description | Range |
|---|---|---|
| growth_quality_score | Probability of meeting target specifications, combining signals across all data sources | 0 (poor) to 1 (excellent) |
Execution Modes
Batch: Runs automatically after the analysis pipeline completes on uploaded data. The full timeseries is evaluated in a single pass.

Streaming: Runs incrementally with each incoming data chunk during a live run. The system maintains a rolling detection context across chunks to detect both sudden anomalies and emerging trends. Forecasts and the growth quality score update with each chunk. Error-level anomalies trigger real-time alerts.

Scores and forecasts are not emitted until enough observations have been processed to fill the embedding window. Before that threshold, outputs are NaN.

Severity and Thresholds
Severity is assigned based on anomaly score and impact on the growth quality prediction:

| Severity | Meaning | Action |
|---|---|---|
| Warning | Score elevated but within tolerance, or minor quality score impact. May self-correct. | Review after the run or if warnings accumulate. |
| Error | Score exceeds the critical threshold, or quality score has dropped significantly. | Investigate immediately. Triggers real-time alerts in streaming runs. |
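
The two-level scheme in the table can be written as a small decision function. This is a sketch only: the page gives no numeric thresholds, so every default below is a placeholder, and `assign_severity` and its parameters are hypothetical names rather than part of the product.

```python
from typing import Optional


def assign_severity(
    anomaly_score: float,
    quality_drop: float,
    warning_threshold: float = 0.6,    # placeholder, not a documented default
    critical_threshold: float = 0.85,  # placeholder, not a documented default
    minor_drop: float = 0.05,          # placeholder, not a documented default
    major_drop: float = 0.2,           # placeholder, not a documented default
) -> Optional[str]:
    """Map an anomaly score and a drop in growth quality score to a severity.

    quality_drop is the decrease in growth_quality_score attributed to the
    anomaly, expressed as a positive number.
    """
    # Error takes precedence: either condition alone is enough.
    if anomaly_score > critical_threshold or quality_drop >= major_drop:
        return "Error"
    if anomaly_score > warning_threshold or quality_drop >= minor_drop:
        return "Warning"
    return None  # below every threshold: no severity assigned
```

Checking the Error conditions first keeps the two levels mutually exclusive, so a detection that satisfies both rows of the table is reported once at the higher level.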
Default baselines are calibrated from your organization’s historical data. Per-project overrides let you tighten thresholds for critical parameters or relax them for known-noisy signals.

Embedding Pipeline
The workflow uses a pretrained timeseries transformer fine-tuned on Atomscale’s accumulated process data. Fine-tuning produces embeddings that are more discriminative for growth process anomalies than the base model.

Timeseries are split into overlapping windows, each encoded into a high-dimensional vector capturing temporal structure.

The approach is source-agnostic: a single detection engine works across all characterization and tool state types without per-source threshold tuning.

Outlier Detection
Detection operates in the model’s latent space rather than on raw values. Embeddings encode temporal structure, not just instantaneous magnitudes, catching patterns that statistical methods on raw values miss.

Ensemble outlier detection scores each embedding against learned baselines. Baselines are computed per organization from historical data: the distribution of embeddings from past growths defines “normal” for each recipe and instrument combination.

Scores are calibrated to consistent 0-to-1 values across parameter types and instruments. Near 0 means consistent with the baseline; near 1 means far from any previously observed pattern.

Cross-Source Correlation
After per-parameter detection completes, the correlation engine groups related anomalies across instruments within a growth session (all data items sharing a single run’s time origin).

Two anomalies from different sources are linked when they overlap in time and their embedding vectors are similar in latent space. Correlated groups carry compound severity: co-occurring anomalies across instruments are stronger evidence of a real process deviation than any individual detection.

Anomaly Classification
Anomalies are classified using the embedding vector combined with contextual features (instrument type, recipe phase, parameter identity). The classification model is trained on labeled data from production use, where operators confirm or dismiss detections.

Categories include source depletion, temperature control excursion, substrate quality variation, and chamber contamination signatures. New categories can emerge as the model encounters novel patterns that don’t fit existing clusters.

Predictive Forecasting
The model generates forecasts with confidence intervals by projecting the embedding trajectory forward. During streaming, forecasts update with each chunk.

A predicted anomaly is flagged when the forecast crosses the anomaly threshold within the prediction horizon, providing early warning before the parameter has fully deviated.

Growth Quality Scoring
The growth quality score combines anomaly signals from all active data sources into a single probability of the final outcome meeting target specifications, trained on historical anomaly-outcome correlations.

During streaming, the score updates in real time. A sudden drop, even without any individual Error-level anomaly, can indicate a compound process shift that warrants attention.

Process-Level Compound Rules
Beyond per-parameter detection, the system evaluates compound rules combining conditions across parameters or instruments (e.g., RHEED trajectory divergence while metrology anomaly score is elevated and optical segmentation shows a morphology shift).

Compound rules are defined at the project level and encode domain-specific knowledge about failure modes that no single parameter captures alone.
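
To make the idea concrete, a compound rule can be thought of as a conjunction of conditions evaluated against the latest signals in a growth session. The sketch below models the example above in that way. It does not reflect how rules are actually configured in a project: the `CompoundRule` class, the signal names, and the numeric thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical snapshot: latest value of each signal in a growth session.
Snapshot = dict[str, float]


@dataclass
class CompoundRule:
    name: str
    conditions: list[Callable[[Snapshot], bool]]

    def fires(self, snapshot: Snapshot) -> bool:
        """The rule fires only when every condition holds at the same time."""
        return all(condition(snapshot) for condition in self.conditions)


# Signal names and thresholds are illustrative placeholders.
rheed_morphology_rule = CompoundRule(
    name="rheed_divergence_with_morphology_shift",
    conditions=[
        lambda s: s.get("rheed_trajectory_divergence", 0.0) > 0.5,
        lambda s: s.get("metrology_anomaly_score", 0.0) > 0.6,
        lambda s: s.get("optical_morphology_shift", 0.0) > 0.5,
    ],
)

snapshot = {
    "rheed_trajectory_divergence": 0.7,
    "metrology_anomaly_score": 0.65,
    "optical_morphology_shift": 0.8,
}
print(rheed_morphology_rule.fires(snapshot))
```

A missing signal is treated as 0.0 here, so a rule cannot fire when one of its inputs is absent. Whether absent data should suppress or trigger a rule is a design choice the page does not address.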