Anomaly Detection

The Anomaly Detection workflow detects outliers, correlates anomalies across instruments, classifies root causes, and forecasts future values. It runs in batch and streaming modes, producing anomaly scores, classifications, and a growth quality prediction alongside the raw timeseries. This workflow runs on the outputs of Characterization and Tool State workflows.


Detection Process

For each timeseries property:

Embedding: Raw timeseries windows are encoded into high-dimensional vectors using a timeseries foundation model fine-tuned on process data.
Outlier detection: Each embedding is scored against learned baselines. Outliers in the latent space correspond to unusual temporal patterns in the original signal.
Cross-source correlation: After per-parameter detection completes, the system identifies temporal overlaps and embedding-space similarities across instruments to group related anomalies.
Classification: Each anomaly is classified by probable root cause (e.g., source depletion, temperature excursion, substrate defect, chamber contamination) using the embedding vector and contextual features.
Forecasting: The model projects the timeseries forward with confidence intervals, flagging predicted anomalies before they develop.
Growth quality scoring: Signals from all active data sources are combined into a single quality score representing the probability of meeting target specifications.

Output Metrics

For each property in a timeseries, the workflow produces:

Metric	Description	Range
anomaly_score	Latent-space outlier score. Higher values indicate greater deviation from baselines.	0 (normal) to 1 (anomalous)
anomaly_class	Classification label, assigned when the anomaly score exceeds the detection threshold	Categorical
forecast	Predicted value N steps ahead based on current trajectory	Same unit as property
forecast_lower / forecast_upper	Forecast confidence interval bounds	Same unit as property
z_score	Standard deviations from running mean (legacy)	Unbounded
ema_z_score	Residual z-score relative to exponential moving average (legacy)	Unbounded
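The two legacy metrics are described in one line each, so the sketch below shows only one plausible reading of them. It assumes a cumulative running mean and standard deviation for `z_score`, and an exponential moving average with a hypothetical smoothing factor `alpha` for `ema_z_score`. The function names and the choice to score each point against history that excludes it are assumptions, and the platform's actual definitions may differ.

```python
import math

def running_z_scores(values):
    """z-score of each value against the mean/std of the values seen before it."""
    scores = []
    n, mean, m2 = 0, 0.0, 0.0  # Welford accumulators for mean and variance
    for x in values:
        if n >= 2 and m2 > 0:
            std = math.sqrt(m2 / (n - 1))
            scores.append((x - mean) / std)
        else:
            scores.append(math.nan)  # not enough history to estimate spread
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return scores

def ema_z_scores(values, alpha=0.1):
    """Residual (value minus EMA) divided by an EMA estimate of the residual std."""
    scores = []
    ema, var = None, 0.0
    for x in values:
        if ema is None:
            ema = x
            scores.append(math.nan)  # first point has no residual
            continue
        resid = x - ema
        scores.append(resid / math.sqrt(var) if var > 0 else math.nan)
        # Update the variance and the average only after scoring the point.
        var = (1 - alpha) * var + alpha * resid * resid
        ema = (1 - alpha) * ema + alpha * x
    return scores
```

Both functions return NaN until enough history exists to estimate a spread, and both are unbounded, consistent with the Range column above.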

At the growth session level:

Metric	Description	Range
growth_quality_score	Probability of meeting target specifications, combining signals across all data sources	0 (poor) to 1 (excellent)

Execution Modes

Batch: Runs automatically after the analysis pipeline completes on uploaded data. The full timeseries is evaluated in a single pass.
Streaming: Runs incrementally with each incoming data chunk during a live run. The system maintains a rolling detection context across chunks to detect both sudden anomalies and emerging trends. Forecasts and the growth quality score update with each chunk. Error-level anomalies trigger real-time alerts.

Scores and forecasts are reported as NaN until enough observations have been processed to fill the embedding window.
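The warm-up behavior in streaming mode can be illustrated with a small rolling buffer. This is a minimal sketch, not the platform's implementation: `StreamingScorer`, `window_size`, and `score_fn` are hypothetical names, and `score_fn` stands in for the embedding and scoring steps.

```python
import math
from collections import deque

class StreamingScorer:
    """Keeps a rolling window across chunks; emits NaN until the window is full."""

    def __init__(self, window_size, score_fn):
        self.window = deque(maxlen=window_size)  # oldest samples drop off automatically
        self.score_fn = score_fn                 # placeholder for embed + score

    def process_chunk(self, chunk):
        outputs = []
        for x in chunk:
            self.window.append(x)
            if len(self.window) < self.window.maxlen:
                outputs.append(math.nan)  # warm-up: window not yet filled
            else:
                outputs.append(self.score_fn(list(self.window)))
        return outputs
```

Because the buffer persists between calls to `process_chunk`, a window can span a chunk boundary, which is what allows detection of patterns that no single chunk contains.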

Severity and Thresholds

Severity is assigned based on anomaly score and impact on the growth quality prediction:

Severity	Meaning	Action
Warning	Score elevated but within tolerance, or minor quality score impact. May self-correct.	Review after the run or if warnings accumulate.
Error	Score exceeds the critical threshold, or quality score has dropped significantly.	Investigate immediately. Triggers real-time alerts in streaming runs.

Default baselines are calibrated from your organization’s historical data. Per-project overrides let you tighten thresholds for critical parameters or relax them for known-noisy signals.
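As an illustration of how severity and per-project overrides might fit together, the sketch below uses entirely hypothetical threshold names and values. The real defaults come from your organization's calibrated baselines and are not stated in this document.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative assumptions, not platform defaults.
    warning_score: float = 0.6
    error_score: float = 0.85
    warning_quality_drop: float = 0.05
    error_quality_drop: float = 0.20

def assign_severity(anomaly_score, quality_drop, thresholds=Thresholds()):
    """Return "Error", "Warning", or None from the score and the quality score drop."""
    t = thresholds
    if anomaly_score >= t.error_score or quality_drop >= t.error_quality_drop:
        return "Error"
    if anomaly_score >= t.warning_score or quality_drop >= t.warning_quality_drop:
        return "Warning"
    return None

# A hypothetical per-project override that tightens thresholds for a critical parameter.
strict_project = replace(Thresholds(), warning_score=0.4, error_score=0.7)
```

With these example values, a score of 0.75 is a Warning under the defaults but an Error under `strict_project`.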

Embedding Pipeline

The workflow uses a pretrained timeseries transformer fine-tuned on Atomscale’s accumulated process data. Fine-tuning produces embeddings that are more discriminative for growth process anomalies than the base model.

Timeseries are split into overlapping windows, each encoded into a high-dimensional vector capturing temporal structure.

The approach is source-agnostic: a single detection engine works across all characterization and tool state types without per-source threshold tuning.
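The overlapping-window split can be sketched as follows. The function name and the `window_size` and `stride` parameters are assumptions for illustration; the document does not state the actual window length or overlap.

```python
def sliding_windows(values, window_size, stride):
    """Split a sequence into fixed-length windows; stride < window_size gives overlap."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(values) - window_size
    # Only full windows are produced; a trailing partial window is dropped.
    return [values[i:i + window_size] for i in range(0, last_start + 1, stride)]
```

For a 10-sample series with `window_size=4` and `stride=2`, this yields four windows starting at indices 0, 2, 4, and 6. Each window would then be passed to the encoder to produce one embedding.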

Outlier Detection

Detection operates in the model’s latent space rather than on raw values. Embeddings encode temporal structure, not just instantaneous magnitudes, catching patterns that statistical methods on raw values miss.

Ensemble outlier detection scores each embedding against learned baselines. Baselines are computed per organization from historical data: the distribution of embeddings from past growths defines “normal” for each recipe and instrument combination.

Scores are calibrated to consistent 0-to-1 values across parameter types and instruments. Near 0 means consistent with the baseline; near 1 means far from any previously observed pattern.
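To make the idea of scoring against a baseline concrete, the sketch below uses a single k-nearest-neighbor distance detector in place of the ensemble, which this document does not specify. The class name, the choice of `k`, and the calibration formula are all assumptions chosen for illustration.

```python
import math

def knn_distance(vec, baseline, k, skip_index=None):
    """Mean Euclidean distance from vec to its k nearest baseline embeddings."""
    dists = sorted(math.dist(vec, b) for i, b in enumerate(baseline) if i != skip_index)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

class BaselineScorer:
    """Scores an embedding in [0, 1) against historical baseline embeddings."""

    def __init__(self, baseline, k=3):
        # Requires at least two baseline embeddings.
        self.baseline = baseline
        self.k = k
        # Tolerance: the largest leave-one-out distance seen within the baseline itself.
        self.tolerance = max(
            knn_distance(b, baseline, k, skip_index=i) for i, b in enumerate(baseline)
        )

    def score(self, vec):
        d = knn_distance(vec, self.baseline, self.k)
        excess = max(0.0, d - self.tolerance)
        if excess == 0.0:
            return 0.0  # no farther from neighbors than baseline points are from each other
        return excess / (excess + self.tolerance)  # approaches 1 as the distance grows
```

Embeddings that sit as close to the baseline as the baseline points sit to one another score 0, and the score approaches 1 as an embedding moves away from every previously observed pattern.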

Cross-Source Correlation

After per-parameter detection completes, the correlation engine groups related anomalies across instruments within a growth session (all data items sharing a single run’s time origin).

Two anomalies from different sources are linked when they overlap in time and their embedding vectors are similar in latent space. Correlated groups carry compound severity: co-occurring anomalies across instruments are stronger evidence of a real process deviation than any individual detection.
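The linking rule (different sources, overlapping time ranges, similar embeddings) can be sketched as below. The `Anomaly` fields, the use of cosine similarity, the `min_sim` cutoff of 0.8, and the union-find grouping are assumptions; the document does not say which similarity measure or threshold the correlation engine uses.

```python
import math
from dataclasses import dataclass

@dataclass
class Anomaly:
    source: str       # instrument or data source name
    start: float      # seconds from the run's time origin
    end: float
    embedding: tuple

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def linked(a, b, min_sim=0.8):
    """True when two anomalies come from different sources, overlap in time, and are similar."""
    overlap = a.start <= b.end and b.start <= a.end
    return a.source != b.source and overlap and cosine(a.embedding, b.embedding) >= min_sim

def group_anomalies(anomalies, min_sim=0.8):
    """Return groups of anomaly indices connected by links (union-find)."""
    parent = list(range(len(anomalies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(anomalies)):
        for j in range(i + 1, len(anomalies)):
            if linked(anomalies[i], anomalies[j], min_sim):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(anomalies)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Groups with more than one member are the candidates for compound severity; singletons are anomalies that no other instrument corroborated.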

Anomaly Classification

Anomalies are classified using the embedding vector combined with contextual features (instrument type, recipe phase, parameter identity). The classification model is trained on labeled data from production use, where operators confirm or dismiss detections.

Categories include source depletion, temperature control excursion, substrate quality variation, and chamber contamination signatures. New categories can emerge as the model encounters novel patterns that don’t fit existing clusters.

Predictive Forecasting

The model generates forecasts with confidence intervals by projecting the embedding trajectory forward. During streaming, forecasts update with each chunk.

A predicted anomaly is flagged when the forecast crosses the anomaly threshold within the prediction horizon, providing early warning before the parameter has fully deviated.
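The flagging rule can be sketched as a scan over the forecast horizon. This assumes the forecast has already been converted into one projected anomaly score per future step, which is an interpretation rather than something the document states. The function and parameter names are hypothetical.

```python
def first_predicted_anomaly(projected_scores, threshold, horizon):
    """Return the 1-based step at which the projection first reaches the threshold, or None."""
    for step, score in enumerate(projected_scores[:horizon], start=1):
        if score >= threshold:
            return step
    return None
```

The returned step count is the lead time: a smaller number means less time to intervene before the parameter is expected to deviate.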

Growth Quality Scoring

The growth quality score combines anomaly signals from all active data sources into a single probability of the final outcome meeting target specifications. The scoring model is trained on historical anomaly-outcome correlations.

During streaming, the score updates in real time. A sudden drop, even without any individual Error-level anomaly, can indicate a compound process shift that warrants attention.

Process-Level Compound Rules

Beyond per-parameter detection, the system evaluates compound rules combining conditions across parameters or instruments (e.g., RHEED trajectory divergence while metrology anomaly score is elevated and optical segmentation shows a morphology shift).

Compound rules are defined at the project level and encode domain-specific knowledge about failure modes that no single parameter captures alone.
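One way to picture a compound rule is as a named set of conditions that must all hold at once. The sketch below is illustrative only: the `CompoundRule` class, the signal keys, and the 0.6 cutoff are hypothetical and do not reflect the platform's rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple

Condition = Callable[[Mapping[str, object]], bool]

@dataclass(frozen=True)
class CompoundRule:
    name: str
    conditions: Tuple[Condition, ...]

    def matches(self, signals):
        """True only when every condition holds for the current signals."""
        return all(cond(signals) for cond in self.conditions)

# Hypothetical rule mirroring the example in the text above.
surface_degradation = CompoundRule(
    name="surface_degradation",
    conditions=(
        lambda s: bool(s.get("rheed_trajectory_divergence", False)),
        lambda s: float(s.get("metrology_anomaly_score", 0.0)) >= 0.6,
        lambda s: bool(s.get("morphology_shift", False)),
    ),
)
```

A rule of this shape fires only when all three instruments agree, so it can catch a failure mode even when each individual signal stays below its own Error threshold.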
