FRAME_FM.models package
Submodules
FRAME_FM.models.demo_autoencoder module
EuroSAT Autoencoder (torchvision EuroSAT friendly)
This module defines a simple convolutional autoencoder intended for use with
torchvision.datasets.EuroSAT and a dataloader that yields batches as:
batch = (x, y)
where:
x (Tensor): float Tensor of shape [B, C, H, W]
y (Tensor): class label Tensor of shape [B] (not used for reconstruction loss)
Important
Your transforms should convert PIL -> Tensor (e.g., ToTensor()).
For this architecture, it's simplest to resize images to 64x64 so that the four MaxPool(2) layers reduce the input to a 4x4 spatial map.
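The 64x64 recommendation follows from simple arithmetic, sketched below. The helper name `pooled_size` is hypothetical and not part of FRAME_FM; it assumes non-overlapping pooling with floor rounding, which is the default behaviour of `torch.nn.MaxPool2d(2)`.

```python
def pooled_size(size: int, n_pools: int, pool: int = 2) -> int:
    """Spatial size after n_pools non-overlapping max-pool layers.

    Assumes stride == kernel size and floor rounding (no ceil_mode).
    """
    for _ in range(n_pools):
        size //= pool
    return size


# 64 -> 32 -> 16 -> 8 -> 4: the 4x4 map mentioned above.
print(pooled_size(64, n_pools=4))

# A size that is not a multiple of 16 loses pixels to rounding:
# 100 -> 50 -> 25 -> 12 -> 6
print(pooled_size(100, n_pools=4))
```

Input sizes that are multiples of 16 pool down without rounding, which is presumably what makes a symmetric decoder straightforward to reconstruct the original resolution.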
- class FRAME_FM.models.demo_autoencoder.EuroSATAutoencoder(in_channels: int = 3, base_ch: int = 32, k_size: int = 5, latent_dim: int = 256, lr: float = 0.001, **kwargs)
Bases: BaseModule
Convolutional autoencoder:
x -> encoder -> z -> decoder -> x_recon
Uses MSE reconstruction loss.
- Reference: Thappitla, R.S., Villuri, V.G.K. & Kumar, S. An autoencoder driven deep learning geospatial approach to flood vulnerability analysis in the upper and middle basin of river Damodar. Sci Rep 15, 33741 (2025). https://doi.org/10.1038/s41598-025-96781-2
- Hydra config (example):
    _target_: FRAME_FM.model_code.demo_model.EuroSATAutoencoder
    in_channels: 3
    base_ch: 32
    k_size: 5
    latent_dim: 256
    lr: 1e-3
FRAME_FM.models.convae module
This demo applies a convolutional autoencoder to a stack of geospatial tiles. A ConvAutoencoder class is defined to read tiles from the input batch and pass them through the convolutional encoder-decoder network.
- class FRAME_FM.models.convae.ConvAutoencoder(in_channels: int = 3, base_channels: int = 32, kernel_size: int = 3, latent_dim: int = 256, input_dim: int = 32, plotting: bool = False, lr=0.0, weight_decay=1e-05)
Bases: BaseModule
Defines the autoencoder and its training and validation steps.
FRAME_FM.models.mmmae module
- class FRAME_FM.models.mmmae.MultimodalMaskedAutoencoder(input_shapes: list[dict[str, int] | list[int]], n_channels: list[int], patch_shapes: list[dict[str, int] | list[int]], inputs_positioned: list[str] | str = '', position_space: dict[str, tuple[float, float]] | list[tuple[float, float]] | None = None, pos_embed_ratio: dict[str, float] | list[float] | None = None, encoder_embed_dim: int = 16, encoder_depth: int = 24, encoder_num_heads: int = 16, decoder_embed_dim: int = 16, decoder_depth: int = 8, decoder_num_heads: int = 16, mlp_ratio: float = 4.0, norm_layer: type[torch.nn.LayerNorm] = torch.nn.LayerNorm, norm_token_loss: bool = False, learning_rate: float = 0.001, default_mask_ratio: float = 0.75)
Bases: BaseModule
Masked autoencoder with flexible multi-input embeddings and a transformer backbone.
- forward(inputs: list, mask_ratio: float = 0.75) -> tuple[Tensor, list[Tensor], Tensor]
Apply MMMAE to inputs and return the loss, predictions, and mask.
- Parameters:
inputs (list) – Batched model inputs.
mask_ratio (float, optional) – Proportion of token embeddings to mask per batch. Defaults to 0.75.
- Returns:
Mean squared error of model predictions, over masked tokens, shape [1].
Model predictions of input tokens, shapes ([B, L_i, D_i])_i.
Mask with 0 where token extracted, 1 otherwise, shape [B, sum(L_i)].
- Return type:
tuple[torch.Tensor, list[torch.Tensor], torch.Tensor]
- forward_decoder(x: Tensor, ids_restore: Tensor, metadata_embed: Tensor) -> list[Tensor]
Transform encoding of masked inputs, decode using a transformer, and reconstruct tokens.
- Parameters:
x (torch.Tensor) – Encodings of shuffled, masked tokens, shape [B, 1 + (1-p)L, D].
ids_restore (torch.Tensor) – IDs with which to restore original, unshuffled encodings, shape [B, L].
metadata_embed (torch.Tensor) – Encodings of input mode and positions, shape [B, L, D_d].
- Returns:
Decoded tokens for each input, as reconstructed by input_embedders, shapes ([B, L_i, D_i])_i with sum(L_i) = L.
- Return type:
list[torch.Tensor]
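The role of ids_restore can be illustrated with a minimal sketch of the unshuffling step, following the pattern used in the original masked autoencoder reference code. This is an assumption about how forward_decoder behaves, not its actual implementation: the function name `restore_with_mask_tokens` and the `mask_token` argument are hypothetical, and the projection to the decoder width and the addition of metadata_embed are omitted.

```python
import torch


def restore_with_mask_tokens(x, ids_restore, mask_token):
    """Re-insert mask tokens and undo the shuffle (hypothetical helper).

    x:           [B, 1 + K, D] leading token followed by K visible tokens
    ids_restore: [B, L] indices that undo the shuffle
    mask_token:  [1, 1, D] placeholder used for every masked position
    """
    B, L = ids_restore.shape
    D = x.shape[-1]
    n_masked = L + 1 - x.shape[1]
    fill = mask_token.expand(B, n_masked, D)
    # Visible tokens first, then placeholders: still in shuffled order.
    tokens = torch.cat([x[:, 1:, :], fill], dim=1)  # [B, L, D]
    index = ids_restore.unsqueeze(-1).expand(-1, -1, D)
    tokens = torch.gather(tokens, 1, index)  # original order
    return torch.cat([x[:, :1, :], tokens], dim=1)  # [B, 1 + L, D]


# Toy check with made-up sizes: keep K = 2 of L = 6 tokens.
torch.manual_seed(0)
B, L, D, K = 2, 6, 3, 2
original = torch.randn(B, L, D)
ids_shuffle = torch.argsort(torch.rand(B, L), dim=1)
ids_restore = torch.argsort(ids_shuffle, dim=1)
keep_index = ids_shuffle[:, :K].unsqueeze(-1).expand(-1, -1, D)
kept = torch.gather(original, 1, keep_index)
leading = torch.zeros(B, 1, D)
mask_token = torch.full((1, 1, D), -1.0)
restored = restore_with_mask_tokens(
    torch.cat([leading, kept], dim=1), ids_restore, mask_token
)
visible = ids_restore < K  # True where the original token was kept
```

After restoring, visible positions hold their original tokens and every other position holds the placeholder, so the decoder sees a full-length sequence in the original order.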
- forward_encoder(inputs: list, mask_ratio: float) -> tuple[Tensor, Tensor, Tensor, Tensor]
Tokenise and embed inputs, randomly mask tokens, and encode using a transformer.
- Parameters:
inputs (list) – Batched model inputs, for conversion by input_embedders into token and position embeddings of shapes ([B, L_i, D])_i and ([B, L_i, D_d])_i.
mask_ratio (float) – Proportion p of token embeddings to mask per batch.
- Returns:
Encodings of randomly selected input embeddings, shape [B, 1 + (1-p)sum(L_i), D].
Batched mode and position embeddings for decoder, shape [B, sum(L_i), D_d].
Mask with 0 where token extracted, 1 otherwise, shape [B, sum(L_i)].
IDs with which to restore original, unshuffled token embeddings, shape [B, sum(L_i)].
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
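The sequence lengths in these shapes can be worked through with a small sketch. It rests on two assumptions that the documentation does not confirm: each input is split into non-overlapping patches (so L_i is the product of input size over patch size along each axis), and the number of kept tokens is rounded down. The helper names and the example shapes are hypothetical.

```python
from math import prod


def token_count(input_shape, patch_shape):
    """Tokens per sample when each axis is cut into non-overlapping patches."""
    if any(s % p for s, p in zip(input_shape, patch_shape)):
        raise ValueError("patch shape must divide input shape")
    return prod(s // p for s, p in zip(input_shape, patch_shape))


def encoder_sequence_length(token_counts, mask_ratio):
    """1 + (1 - p) * sum(L_i), with the kept count rounded down (assumed)."""
    kept = int(sum(token_counts) * (1 - mask_ratio))
    return 1 + kept


# Two made-up modalities: a 64x64 image in 16x16 patches and a
# length-12 series in patches of 4.
counts = [token_count([64, 64], [16, 16]), token_count([12], [4])]
print(counts)                                 # L_i per input
print(encoder_sequence_length(counts, 0.75))  # tokens seen by the encoder
```

With these shapes, sum(L_i) = 19, so at a mask ratio of 0.75 only 4 tokens plus the leading extra token reach the encoder, while the mask and the restore IDs keep the full length of 19.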
- forward_loss(inputs: list, predictions: list[Tensor], mask: Tensor) -> Tensor
Calculate masked-token MSE between batched inputs and model predictions.
- Parameters:
inputs (list) – Batched model inputs, for conversion to tokens by input_embedders.
predictions (list[torch.Tensor]) – Model predictions, shapes ([B, L_i, D_i])_i.
mask (torch.Tensor) – Mask with 1 where token masked, shape [B, sum(L_i)].
- Returns:
Average mean squared error over the batch, shape [1].
- Return type:
torch.Tensor
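A minimal sketch of a masked-token MSE is given below, assuming the loss is averaged over masked tokens only, as in the original masked autoencoder reference code. The function name `masked_token_mse` is hypothetical, the targets are assumed to be already tokenised (the real method converts inputs with input_embedders), and the norm_token_loss option is ignored.

```python
import torch


def masked_token_mse(targets, predictions, mask):
    """Mean squared error averaged over masked tokens (hypothetical helper).

    targets, predictions: lists of tensors with shapes [B, L_i, D_i]
    mask:                 [B, sum(L_i)], 1 where the token was masked
    """
    # Mean over each token's features gives one error per token: [B, L_i].
    per_token = [
        ((p - t) ** 2).mean(dim=-1) for p, t in zip(predictions, targets)
    ]
    # Concatenate along the token axis to line up with the mask.
    per_token = torch.cat(per_token, dim=1)  # [B, sum(L_i)]
    return (per_token * mask).sum() / mask.sum()


# Toy inputs: two modalities with different token widths.
targets = [torch.zeros(1, 2, 4), torch.zeros(1, 3, 2)]
predictions = [torch.ones(1, 2, 4), 2 * torch.ones(1, 3, 2)]
mask = torch.tensor([[1.0, 0.0, 0.0, 1.0, 1.0]])
loss = masked_token_mse(targets, predictions, mask)
print(loss)  # (1 + 4 + 4) / 3 masked tokens
```

Averaging per token before concatenating lets inputs with different token widths D_i share one mask. Note that this sketch returns a 0-dimensional tensor rather than one of shape [1].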
- initialize_weights()
Initialise layer weights and parameters, including in input embedders.
- input_embedders: list[BaseEmbedder]
- random_masking(x: Tensor, mask_ratio: float) -> tuple[Tensor, Tensor, Tensor]
Shuffle batched token embeddings and mask random selection.
- Parameters:
x (torch.Tensor) – Batched token embeddings, shape [B, L, D].
mask_ratio (float) – Proportion p of token embeddings to mask per batch.
- Returns:
Randomly selected (unmasked) token embeddings, shape [B, (1-p)L, D].
Mask with 0 where token extracted, 1 otherwise, shape [B, L].
IDs with which to restore original, unshuffled token embeddings, shape [B, L].
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
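Per-sample random masking is commonly implemented by sorting random noise, as in the original masked autoencoder reference code. The sketch below follows that pattern and is an assumption about this method's internals, not a copy of it; in particular, rounding the kept count down is assumed.

```python
import torch


def random_masking(x: torch.Tensor, mask_ratio: float):
    """Keep a random subset of tokens per sample (sketch, not FRAME_FM code).

    x: [B, L, D]. Returns kept tokens [B, K, D], mask [B, L], and
    ids_restore [B, L], where K = int(L * (1 - mask_ratio)).
    """
    B, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))

    # Sorting uniform noise yields an independent permutation per sample.
    noise = torch.rand(B, L, device=x.device)
    ids_shuffle = torch.argsort(noise, dim=1)
    # The argsort of a permutation is its inverse.
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    # The first len_keep shuffled positions are the visible tokens.
    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    # Build the mask in shuffled order, then unshuffle it.
    mask = torch.ones(B, L, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_kept, mask, ids_restore


x = torch.randn(2, 8, 4)
x_kept, mask, ids_restore = random_masking(x, mask_ratio=0.75)
print(x_kept.shape, mask.shape, ids_restore.shape)
```

With L = 8 and a mask ratio of 0.75, each sample keeps 2 tokens and masks 6, so every row of the mask sums to 6.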