FRAME_FM.models package¶

Submodules¶

FRAME_FM.models.demo_autoencoder module¶

EuroSAT Autoencoder (torchvision EuroSAT friendly)

This module defines a simple convolutional autoencoder intended for use with torchvision.datasets.EuroSAT and a dataloader that yields batches as:

batch = (x, y)

where:

x (Tensor): float Tensor of shape [B, C, H, W]
y (Tensor): class label Tensor [B] (not used for reconstruction loss)

Important

Your transforms should convert PIL -> Tensor (e.g., ToTensor()).
For this architecture, it’s simplest to resize images to 64x64 so that 4x MaxPool(2) leads to a 4x4 spatial map.
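
The resize recommendation above follows from simple shape arithmetic. The sketch below is plain Python rather than the module's own code; `pooled_size` is a hypothetical helper, and it assumes each pooling layer halves the spatial size with floor division, as `MaxPool2d(2)` does with its default stride and no padding.

```python
def pooled_size(size: int, n_pools: int = 4, pool: int = 2) -> int:
    """Spatial size after n_pools non-overlapping pooling layers."""
    for _ in range(n_pools):
        size //= pool  # floor division mirrors the default MaxPool2d(2) behaviour
    return size


print(pooled_size(64))  # 4: a 64x64 input ends as a 4x4 map
print(pooled_size(32))  # 2: a 32x32 input would end as a 2x2 map
```

Under these assumptions, any input size that is not a multiple of 16 loses border pixels to the floor division, which is one reason a fixed 64x64 resize is the simplest choice.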

class FRAME_FM.models.demo_autoencoder.EuroSATAutoencoder(in_channels: int = 3, base_ch: int = 32, k_size: int = 5, latent_dim: int = 256, lr: float = 0.001, **kwargs)[source]¶

Bases: BaseModule

Convolutional autoencoder:

x -> encoder -> z -> decoder -> x_recon

Uses MSE reconstruction loss.

> Thappitla, R.S., Villuri, V.G.K. & Kumar, S. An autoencoder driven deep learning geospatial approach to flood vulnerability analysis: in the upper and middle basin of river Damodar. Sci Rep 15, 33741 (2025). https://doi.org/10.1038/s41598-025-96781-2
Hydra config (example):

    _target_: FRAME_FM.model_code.demo_model.EuroSATAutoencoder
    in_channels: 3
    base_ch: 32
    k_size: 5
    latent_dim: 256
    lr: 1e-3

configure_optimizers()[source]¶

forward(x: Tensor) → Tuple[Tensor, Tensor][source]¶

Parameters: x – Tensor [B, C, H, W]
Returns: x_recon: Tensor [B, C, H, W]; z: Tensor [B, latent_dim]
Return type: Tuple[Tensor, Tensor]

test_step_body(batch, batch_idx: int = 0)[source]¶

training_step_body(batch, batch_idx: int = 0)[source]¶

validation_step_body(batch, batch_idx: int = 0)[source]¶

FRAME_FM.models.convae module¶

This demo shows the application of a convolutional autoencoder to a stack of geospatial tiles. A ConvAutoencoder class is defined to read tiles from the input batch and pass them through the convolutional encoder-decoder network.

class FRAME_FM.models.convae.ConvAutoencoder(in_channels: int = 3, base_channels: int = 32, kernel_size: int = 3, latent_dim: int = 256, input_dim: int = 32, plotting: bool = False, lr=0.0, weight_decay=1e-05)[source]¶

Bases: BaseModule

Convolutional autoencoder (AE) module that defines the model together with its training and validation steps.

configure_optimizers()[source]¶

forward(x)[source]¶

on_validation_epoch_end()[source]¶

on_validation_epoch_start()[source]¶

training_step_body(batch, batch_idx)[source]¶

validation_step_body(batch, batch_idx)[source]¶

FRAME_FM.models.mmmae module¶

class FRAME_FM.models.mmmae.MultimodalMaskedAutoencoder(input_shapes: list[dict[str, int] | list[int]], n_channels: list[int], patch_shapes: list[dict[str, int] | list[int]], inputs_positioned: list[str] | str = '', position_space: dict[str, tuple[float, float]] | list[tuple[float, float]] | None = None, pos_embed_ratio: dict[str, float] | list[float] | None = None, encoder_embed_dim: int = 16, encoder_depth: int = 24, encoder_num_heads: int = 16, decoder_embed_dim: int = 16, decoder_depth: int = 8, decoder_num_heads: int = 16, mlp_ratio: float = 4.0, norm_layer: type[~torch.nn.modules.normalization.LayerNorm] = <class 'torch.nn.modules.normalization.LayerNorm'>, norm_token_loss: bool = False, learning_rate: float = 0.001, default_mask_ratio: float = 0.75)[source]¶

Bases: BaseModule

Masked Autoencoder with flexible multi-input embeddings and transformer backbone

configure_optimizers()[source]¶

forward(inputs: list, mask_ratio: float = 0.75) → tuple[Tensor, list[Tensor], Tensor][source]¶

Apply MMMAE to inputs and return the loss, predictions, and mask.

Parameters:

inputs (list) – Batched model inputs.
mask_ratio (float, optional) – Proportion of token embeddings to mask per batch. Defaults to 0.75.

Returns:

Mean squared error of model predictions, over masked tokens, shape [1].
Model predictions of input tokens, shapes ([B, L_i, D_i])_i.
Mask with 0 where token extracted, 1 otherwise, shape [B, sum(L_i)].

Return type:

tuple[torch.Tensor, list[torch.Tensor], torch.Tensor]

forward_decoder(x: Tensor, ids_restore: Tensor, metadata_embed: Tensor) → list[Tensor][source]¶

Transform encoding of masked inputs, decode using a transformer, and reconstruct tokens.

Parameters:

x (torch.Tensor) – Encodings of shuffled, masked tokens, shape [B, 1 + (1-p)L, D].
ids_restore (torch.Tensor) – IDs with which to restore original, unshuffled encodings, shape [B, L].
metadata_embed (torch.Tensor) – Encodings of input mode and positions, shape [B, L, D_d]

Returns:

Decoded tokens for each input, as reconstructed by input_embedders, shapes ([B, L_i, D_i])_i with sum(L_i) = L.

Return type:

list[torch.Tensor]
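
How ids_restore recovers the original token order can be sketched in plain Python for a single sample. This is an illustration under assumptions, not the module's implementation: `restore_tokens` and the `"<mask>"` placeholder are hypothetical, tokens are strings instead of embedding vectors, and the leading extra token implied by the `1 +` in the shape above is omitted.

```python
def restore_tokens(kept, ids_restore, mask_token="<mask>"):
    """Pad the kept tokens with mask tokens, then undo the shuffle."""
    # After shuffling, kept tokens come first and masked positions follow.
    padded = list(kept) + [mask_token] * (len(ids_restore) - len(kept))
    # ids_restore[j] is the shuffled position of original token j.
    return [padded[i] for i in ids_restore]


# Original tokens a, b, c, d were shuffled to c, a, d, b and the first two kept.
kept = ["c", "a"]
ids_restore = [1, 3, 0, 2]
restored = restore_tokens(kept, ids_restore)
print(restored)  # ['a', '<mask>', 'c', '<mask>']
```

In a tensor implementation the same gather would typically be done per batch row, with a learned mask embedding in place of the string placeholder.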

forward_encoder(inputs: list, mask_ratio: float) → tuple[Tensor, Tensor, Tensor, Tensor][source]¶

Tokenise and embed inputs, randomly mask tokens, and encode using a transformer.

Parameters:

inputs (list) – Batched model inputs, for conversion by input_embedders into token and position embeddings of shapes ([B, L_i, D])_i and ([B, L_i, D_d])_i
mask_ratio (float) – Proportion p of token embeddings to mask per batch.

Returns:

Encodings of randomly selected input embeddings, shape [B, 1 + (1-p)sum(L_i), D].
Batched mode and position embeddings for decoder, shape [B, sum(L_i), D_d]
Mask with 0 where token extracted, 1 otherwise, shape [B, sum(L_i)].
IDs with which to restore original, unshuffled token embeddings, shape [B, sum(L_i)].

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
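
The sequence length in the first return value can be worked out by hand. The sketch below rests on two assumptions that the documentation does not confirm: each input is split into non-overlapping patches, so L_i is the product of per-axis patch counts, and the number of kept tokens is truncated to an integer. `token_count` and `encoded_length` are hypothetical helpers.

```python
from math import prod


def token_count(input_shape, patch_shape):
    """Number of tokens L_i, assuming non-overlapping patches on every axis."""
    return prod(size // patch for size, patch in zip(input_shape, patch_shape))


def encoded_length(token_counts, mask_ratio):
    """Length 1 + (1-p) * sum(L_i), assuming the kept count is truncated."""
    total = sum(token_counts)
    kept = int(total * (1 - mask_ratio))
    return 1 + kept


# One 64x64 input with 16x16 patches and one length-32 input with patches of 4.
counts = [token_count([64, 64], [16, 16]), token_count([32], [4])]
print(counts)                        # [16, 8]
print(encoded_length(counts, 0.75))  # 7
```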

forward_loss(inputs: list, predictions: list[Tensor], mask: Tensor) → Tensor[source]¶

Calculate masked-token MSE between batched inputs and model predictions.

Parameters:

inputs (list) – Batched model inputs, for conversion to tokens by input_embedders.
predictions (list[torch.Tensor]) – Model predictions, shapes ([B, L_i, D_i])_i.
mask (torch.Tensor) – Mask with 1 where token masked, shape [B, sum(L_i)].

Returns:

Average mean squared error over the batch, shape [1].

Return type:

torch.Tensor
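
A masked-token MSE of this kind can be illustrated for a single sample in plain Python. This is a minimal sketch that assumes the per-token error is averaged over the token dimension and then over masked tokens only; the exact reduction used by forward_loss is not stated here, and `masked_mse` is a hypothetical name.

```python
def masked_mse(targets, predictions, mask):
    """Mean squared error averaged over tokens whose mask value is 1."""
    per_token = [
        sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)
        for target, pred in zip(targets, predictions)
    ]
    n_masked = sum(mask)
    # Visible tokens (mask 0) contribute nothing to the loss.
    return sum(err * m for err, m in zip(per_token, mask)) / n_masked


targets = [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0]]
predictions = [[1.0, 4.0], [5.0, 5.0], [2.0, 1.0]]
mask = [1, 0, 1]  # the middle token was visible to the encoder
loss = masked_mse(targets, predictions, mask)
print(loss)  # 1.25
```

The large error on the middle token is ignored because that token was not masked, which is the point of restricting the loss to masked positions.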

initialize_weights()[source]¶

Initialise layer weights and parameters, including those in the input embedders.

input_embedders: list[BaseEmbedder]¶

random_masking(x: Tensor, mask_ratio: float) → tuple[Tensor, Tensor, Tensor][source]¶

Shuffle batched token embeddings and mask random selection.

Parameters:

x (torch.Tensor) – Batched token embeddings, shape [B, L, D].
mask_ratio (float) – Proportion p of token embeddings to mask per batch.

Returns:

Randomly selected token embeddings, shape [B, (1-p)L, D].
Mask with 0 where token extracted, 1 otherwise, shape [B, L].
IDs with which to restore original, unshuffled token embeddings, shape [B, L].

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
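
One common way to implement this kind of masking is to sort random noise and keep the first (1-p)L tokens of the shuffled order. The sketch below shows that idea for a single sample in plain Python; it is an assumption about the approach, not a copy of this method, and `random_masking_ids` is a hypothetical helper that returns indices rather than embeddings.

```python
import random


def random_masking_ids(num_tokens, mask_ratio, rng):
    """Return kept token indices, the 0/1 mask, and the restore indices."""
    len_keep = int(num_tokens * (1 - mask_ratio))
    noise = [rng.random() for _ in range(num_tokens)]
    # Argsort of the noise gives a random permutation of token indices.
    ids_shuffle = sorted(range(num_tokens), key=noise.__getitem__)
    # Argsort of the permutation gives its inverse.
    ids_restore = sorted(range(num_tokens), key=ids_shuffle.__getitem__)
    ids_keep = ids_shuffle[:len_keep]
    # In shuffled order the kept tokens come first; unshuffle to original order.
    shuffled_mask = [0] * len_keep + [1] * (num_tokens - len_keep)
    mask = [shuffled_mask[i] for i in ids_restore]
    return ids_keep, mask, ids_restore


ids_keep, mask, ids_restore = random_masking_ids(16, 0.75, random.Random(0))
print(len(ids_keep), sum(mask))  # 4 12
```

With 16 tokens and a mask ratio of 0.75, four tokens are kept and twelve are masked, and the mask is 0 exactly at the kept indices.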

test_step_body(batch, batch_idx)[source]¶

training_step_body(batch, batch_idx)[source]¶

validation_step_body(batch, batch_idx)[source]¶


Module contents¶