The Similarity workflow computes embeddings from characterization timeseries data and uses them for two purposes: projecting data items into a 2D map for visual comparison, and tracking how similar a live item is to selected reference items over time. Similarity is scoped per workflow type (e.g., RHEED rotating, RHEED stationary) and per organization. Each workflow type has its own embedding space.

Embedding Projection

The embedding pipeline takes characterization timeseries, encodes them into high-dimensional embeddings, and projects those embeddings down to 2D coordinates. The result is a scatter plot where data items with similar characterization profiles appear near each other.

When enough data items are available, the pipeline can use a landmark-based projection that scales better to large datasets.

The projection refreshes on demand and tracks its status (fresh, updating, stale, or error) so the frontend knows when to show updated coordinates.
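The status tracking can be pictured as a small state machine. The sketch below is illustrative only: the four status names come from this page, but the allowed transitions, the `ProjectionStatus` enum, and the `transition` and `has_current_coordinates` helpers are assumptions, not the product's actual API.

```python
from enum import Enum


class ProjectionStatus(Enum):
    """Status values named in the docs; the enum itself is hypothetical."""
    FRESH = "fresh"
    UPDATING = "updating"
    STALE = "stale"
    ERROR = "error"


# Assumed transitions: new data marks a fresh projection stale, a refresh
# moves it to updating, and an update ends in either fresh or error.
ALLOWED_TRANSITIONS = {
    ProjectionStatus.FRESH: {ProjectionStatus.STALE, ProjectionStatus.UPDATING},
    ProjectionStatus.STALE: {ProjectionStatus.UPDATING},
    ProjectionStatus.UPDATING: {ProjectionStatus.FRESH, ProjectionStatus.ERROR},
    ProjectionStatus.ERROR: {ProjectionStatus.UPDATING},
}


def transition(current: ProjectionStatus, target: ProjectionStatus) -> ProjectionStatus:
    """Return the new status, or raise if the move is not in the assumed table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def has_current_coordinates(status: ProjectionStatus) -> bool:
    """Assumption: only a fresh projection is safe to render as up to date."""
    return status is ProjectionStatus.FRESH


status = ProjectionStatus.STALE
status = transition(status, ProjectionStatus.UPDATING)  # refresh requested
status = transition(status, ProjectionStatus.FRESH)     # refresh succeeded
```

A frontend following this sketch would poll the status and redraw the scatter plot only once `has_current_coordinates` returns true.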

Similarity Trajectory

The trajectory pipeline computes how similar a tracked item is to one or more reference items over time. It slides a window across the tracked item’s timeseries, encodes each window into an embedding, and compares it against precomputed reference prototypes.

Two execution modes are available:
  • Stream: A single incremental update, typically triggered when new data arrives from the RHEED pipeline. Appends one new similarity score to the existing trajectory.
  • Historical: Recomputes the full trajectory by iterating through the entire timeseries. Used when a user initiates a new comparison.
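The two modes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `encode_window` is a toy stand-in for the real encoder, cosine similarity is an assumed metric (this page does not name one), and the function names, window size, and step are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def encode_window(window: Sequence[float]) -> Vector:
    """Toy stand-in for the real encoder: embeds a window as [mean, std]."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((x - mean) ** 2 for x in window) / n
    return [mean, math.sqrt(variance)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity metric; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def historical_trajectory(series: Sequence[float], prototype: Vector,
                          window: int, step: int = 1) -> List[float]:
    """Historical mode: slide the window over the whole series from the start."""
    scores = []
    for end in range(window, len(series) + 1, step):
        embedding = encode_window(series[end - window:end])
        scores.append(cosine_similarity(embedding, prototype))
    return scores


def stream_update(trajectory: List[float], series: Sequence[float],
                  prototype: Vector, window: int) -> List[float]:
    """Stream mode: append one score for the most recent window only."""
    if len(series) < window:
        return trajectory  # not enough data for a full window yet
    embedding = encode_window(series[-window:])
    trajectory.append(cosine_similarity(embedding, prototype))
    return trajectory


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
prototype = encode_window([4.0, 5.0, 6.0])  # precomputed reference prototype

full = historical_trajectory(series, prototype, window=3)
partial = historical_trajectory(series[:5], prototype, window=3)
streamed = stream_update(partial, series, prototype, window=3)
```

In this sketch, a stream update after one new sample yields the same trajectory as a full historical recompute, which is the property that lets the incremental mode stand in for the expensive one when data arrives one point at a time.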
When data items have angle tags (common with RHEED), the pipeline groups comparisons by matching angles so that, for example, a [100] azimuth item is only compared against [100] references.
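The angle grouping amounts to bucketing references by tag before pairing. The sketch below assumes a hypothetical `Item` record with an optional `angle` string; it also assumes that untagged items are paired only with untagged references, which this page does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical data item; angle is a tag such as "[100]" or None."""
    item_id: str
    angle: Optional[str] = None


def pair_by_angle(tracked: List[Item],
                  references: List[Item]) -> Dict[Optional[str], List[Tuple[str, str]]]:
    """Return (tracked_id, reference_id) pairs, grouped by matching angle tag."""
    refs_by_angle: Dict[Optional[str], List[Item]] = defaultdict(list)
    for ref in references:
        refs_by_angle[ref.angle].append(ref)

    pairs: Dict[Optional[str], List[Tuple[str, str]]] = defaultdict(list)
    for item in tracked:
        # Only references sharing the tracked item's angle are compared.
        for ref in refs_by_angle.get(item.angle, []):
            pairs[item.angle].append((item.item_id, ref.item_id))
    return dict(pairs)


tracked_items = [Item("t1", "[100]"), Item("t2", "[110]")]
reference_items = [Item("r1", "[100]"), Item("r2", "[110]"), Item("r3", "[100]")]
comparisons = pair_by_angle(tracked_items, reference_items)
```

Here `t1` is paired with `r1` and `r3` but never with `r2`, since `r2` carries a different azimuth tag.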