Embeddings¶
Declarative embedding pipelines for observation processing.
Overview¶
The falcon.embeddings package provides tools for building observation embedding
networks via declarative YAML configuration. Embeddings map raw observations to
lower-dimensional summary statistics before they enter the estimator.
Declarative Configuration¶
Embedding networks are defined in YAML using the _target_ / _input_ system:
- _target_: Import path of the nn.Module class to instantiate
- _input_: List of observation node names (or nested sub-embeddings) that feed into this module
Basic Embedding¶
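A minimal single-input pipeline might look like the sketch below; the observation node name x and the layer sizes are placeholders chosen for illustration.

```yaml
embedding:
  # Project a single observation node (here called "x") to a 16-dimensional summary
  _target_: torch.nn.Linear
  in_features: 128   # placeholder: dimensionality of the "x" node
  out_features: 16   # placeholder: size of the resulting summary statistic
  _input_: [x]
```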
Multi-Input Embedding¶
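A module can consume several observation nodes at once by listing them in _input_. The sketch below reuses the model.Concatenate module that also appears in the nested example further down; the node names summary_stats and metadata are placeholders.

```yaml
embedding:
  # Concatenate two observation nodes into one feature vector
  _target_: model.Concatenate
  _input_: [summary_stats, metadata]   # placeholder node names
```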
Nested Pipeline¶
Sub-embeddings can be nested arbitrarily deep. Each _input_ entry can itself
be a _target_ / _input_ block:
```yaml
embedding:
  _target_: model.Concatenate
  _input_:
    - _target_: timm.create_model
      model_name: resnet18
      pretrained: true
      num_classes: 0
      _input_:
        _target_: model.Unsqueeze
        _input_: [image]
    - _target_: torch.nn.Linear
      in_features: 64
      out_features: 32
      _input_: [metadata]
```
Built-in Utilities¶
Normalization¶
LazyOnlineNorm¶
Online normalization with lazy initialization and optional momentum adaptation. Normalizes inputs to zero mean and unit variance using exponential moving averages.
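As a sketch, the normalizer can be declared as a pipeline stage; the import path falcon.embeddings.norms.LazyOnlineNorm (inferred from its source location) and the node name x are assumptions.

```yaml
embedding:
  # Running zero-mean / unit-variance normalization of the "x" node
  _target_: falcon.embeddings.norms.LazyOnlineNorm   # assumed import path
  momentum: 0.01
  adaptive_momentum: true
  _input_: [x]   # placeholder observation node
```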
DiagonalWhitener¶
Diagonal whitening with optional Hartley transform preprocessing and momentum-based running statistics.
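A similar hedged sketch, assuming the import path falcon.embeddings.norms.DiagonalWhitener and a 64-dimensional observation node x:

```yaml
embedding:
  # Per-feature whitening, optionally preceded by a Hartley transform
  _target_: falcon.embeddings.norms.DiagonalWhitener   # assumed import path
  dim: 64            # placeholder: feature dimension of "x"
  momentum: 0.1
  use_fourier: true  # whiten in Hartley space
  _input_: [x]
```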
hartley_transform¶
A standalone function for computing the discrete Hartley transform, useful as a preprocessing step.
Dimensionality Reduction¶
PCAProjector¶
Streaming dual PCA projector with momentum-based updates. Performs online SVD for dimensionality reduction with variance-based prior (ridge-like regularization) and optional output normalization.
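A sketch of the projector as a declarative stage, assuming the import path falcon.embeddings.svd.PCAProjector and an observation node named x:

```yaml
embedding:
  # Streaming PCA: keep 10 components, refit after every 256 buffered samples
  _target_: falcon.embeddings.svd.PCAProjector   # assumed import path
  n_components: 10
  buffer_size: 256
  momentum: 0.1
  _input_: [x]   # placeholder observation node
```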
Class Reference¶
instantiate_embedding¶
Instantiate embedding pipeline from config.
Source code in falcon/embeddings/builder.py
EmbeddingWrapper¶
Bases: Module
Sequential execution of modules with shared data dictionary.
Source code in falcon/embeddings/builder.py
LazyOnlineNorm¶
LazyOnlineNorm(momentum=0.01, epsilon=1e-20, log_prefix=None, adaptive_momentum=False, monotonic_variance=True, use_log_update=False)
Bases: Module
Source code in falcon/embeddings/norms.py
DiagonalWhitener¶
Bases: Module
- dim: number of features (the last dimension of x)
- momentum: how much of the new batch statistics to use (PyTorch-style)
- eps: small constant for numerical stability
- use_fourier: if True, apply the Hartley transform before whitening
Source code in falcon/embeddings/norms.py
update¶
Update the running mean and variance from the current batch x, a Tensor of shape (batch_size, dim).
Source code in falcon/embeddings/norms.py
__call__¶
Apply whitening: (x - mean) / std. If use_fourier is set, whitening is performed in Hartley space and the result is transformed back.
Source code in falcon/embeddings/norms.py
hartley_transform¶
Hartley transform: H(x) = Re(FFT(x)) - Im(FFT(x)). It is its own inverse: H(H(x)) = x.
PCAProjector¶
PCAProjector(n_components=10, oversampling=10, buffer_size=256, momentum=0.1, normalize_output=True, use_prior=True)
Bases: Module
A streaming, dual PCA projector with momentum-based updates.
This class maintains a running mean and a set of principal components (eigenvectors) and eigenvalues (variances) of the input data, computed using a dual PCA approach. It stores incoming data in a buffer until the buffer is full, then performs an eigen-decomposition update. A momentum term blends the new PCA decomposition with the existing one to adapt over time.
Optionally:
- A "prior" can be applied, which acts like ridge (Tikhonov) regularization. It assumes white noise on the inputs and shrinks each principal component proportionally to 1 / (1 + 1 / eigenvalue).
- The output can be normalized so that the expected variance (averaged over all features) is unity.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n_components | int | Number of principal components to retain. | 10 |
| oversampling | int | Extra dimensions to capture a slightly larger subspace (not currently used). | 10 |
| buffer_size | int | Number of samples to accumulate before performing an SVD update. | 256 |
| momentum | float | Blend factor for merging the new PCA decomposition with the old one. | 0.1 |
| normalize_output | bool | Whether to normalize the reconstructed outputs so that they have unit average variance. | True |
| use_prior | bool | Whether to apply a variance-based prior (ridge-like shrinkage). | True |
Source code in falcon/embeddings/svd.py
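Because forward reconstructs back to the original feature dimension D, the projector can be placed in front of a learned layer whose in_features matches D. The following is a sketch only; the import path and the node name x are assumptions, as above.

```yaml
embedding:
  _target_: torch.nn.Linear
  in_features: 128   # must equal D, the dimension of "x" (PCAProjector preserves shape)
  out_features: 8
  _input_:
    _target_: falcon.embeddings.svd.PCAProjector   # assumed import path
    n_components: 10
    normalize_output: true
    use_prior: true
    _input_: [x]   # placeholder observation node
```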
update¶
Accumulate a batch of data in the buffer. If the buffer is full, update the PCA decomposition.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | Tensor | A batch of input data with shape (batch_size, D). | required |
Source code in falcon/embeddings/svd.py
forward¶
Filter the input data by projecting onto the learned principal components, optionally applying a variance-based prior, then reconstructing and possibly normalizing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | Tensor | A batch of input data with shape (batch_size, D). | required |

Returns:

| Type | Description |
|---|---|
| Tensor | A batch of data with shape (batch_size, D) after the PCA transform (and optional prior and normalization). |