FRAME_FM.models package

Submodules

FRAME_FM.models.demo_autoencoder module

EuroSAT Autoencoder (torchvision EuroSAT friendly)

This module defines a simple convolutional autoencoder intended for use with torchvision.datasets.EuroSAT and a dataloader that yields batches as:

batch = (x, y)

where:

  • x (Tensor): float Tensor of shape [B, C, H, W]

  • y (Tensor): class label Tensor [B] (not used for reconstruction loss)

Important

  • Your transforms should convert PIL -> Tensor (e.g., ToTensor()).

  • For this architecture, it’s simplest to resize images to 64x64 so that 4x MaxPool(2) leads to a 4x4 spatial map.
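The arithmetic behind the 64x64 recommendation can be checked with a short sketch. It assumes each MaxPool(2) stage halves height and width with floor division, and that the convolutions in between preserve spatial size; the helper name pooled_size is hypothetical and not part of FRAME_FM.

```python
def pooled_size(size: int, n_pools: int = 4, pool: int = 2) -> int:
    """Spatial size after n_pools rounds of MaxPool(pool), rounding down."""
    for _ in range(n_pools):
        size //= pool
    return size


# 64 -> 32 -> 16 -> 8 -> 4
print(pooled_size(64))  # 4
```

Any other input size would change the spatial map reaching the bottleneck, which is why resizing to 64x64 is the simplest option.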

class FRAME_FM.models.demo_autoencoder.EuroSATAutoencoder(in_channels: int = 3, base_ch: int = 32, k_size: int = 5, latent_dim: int = 256, lr: float = 0.001, **kwargs)[source]

Bases: BaseModule

Convolutional autoencoder:

x -> encoder -> z -> decoder -> x_recon

Uses MSE reconstruction loss.

> Thappitla, R.S., Villuri, V.G.K. & Kumar, S. An autoencoder driven deep learning geospatial approach to flood vulnerability analysis in the upper and middle basin of river Damodar. Sci Rep 15, 33741 (2025). https://doi.org/10.1038/s41598-025-96781-2

Hydra config (example):

  _target_: FRAME_FM.models.demo_autoencoder.EuroSATAutoencoder
  in_channels: 3
  base_ch: 32
  k_size: 5
  latent_dim: 256
  lr: 1e-3

configure_optimizers()[source]
forward(x: Tensor) -> Tuple[Tensor, Tensor][source]
Parameters:

x – Tensor [B, C, H, W]

Returns:

  • x_recon – Tensor [B, C, H, W]

  • z – Tensor [B, latent_dim]

Return type:

Tuple[Tensor, Tensor]
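The x -> encoder -> z -> decoder -> x_recon flow and the two returned shapes can be illustrated with a small stand-in model. This is a sketch only, not the FRAME_FM implementation: the class name TinyConvAutoencoder, the channel doubling per stage, the transposed-convolution decoder, and the final Sigmoid (which assumes inputs scaled to [0, 1], as ToTensor() produces) are all assumptions. It also assumes an odd k_size and 64x64 inputs.

```python
import torch
from torch import nn


class TinyConvAutoencoder(nn.Module):
    """Illustrative stand-in; layer choices are assumptions."""

    def __init__(self, in_channels=3, base_ch=32, k_size=5, latent_dim=256):
        super().__init__()
        chans = [in_channels] + [base_ch * 2**i for i in range(4)]
        # Encoder: four conv + ReLU + MaxPool(2) stages, 64x64 -> 4x4.
        enc = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            enc += [nn.Conv2d(c_in, c_out, k_size, padding=k_size // 2),
                    nn.ReLU(), nn.MaxPool2d(2)]
        self.encoder = nn.Sequential(*enc)
        self.feat_shape = (chans[-1], 4, 4)
        feat_dim = chans[-1] * 4 * 4
        self.to_latent = nn.Linear(feat_dim, latent_dim)
        self.from_latent = nn.Linear(latent_dim, feat_dim)
        # Decoder: four stride-2 transposed convs, 4x4 -> 64x64.
        rev = chans[::-1]
        dec = []
        for c_in, c_out in zip(rev[:-1], rev[1:]):
            dec += [nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2),
                    nn.ReLU()]
        dec[-1] = nn.Sigmoid()  # last activation keeps outputs in [0, 1]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        h = self.encoder(x)                        # [B, 8 * base_ch, 4, 4]
        z = self.to_latent(h.flatten(1))           # [B, latent_dim]
        h = self.from_latent(z).view(-1, *self.feat_shape)
        return self.decoder(h), z                  # x_recon, z


model = TinyConvAutoencoder(base_ch=8, latent_dim=16)
x = torch.rand(2, 3, 64, 64)
x_recon, z = model(x)
loss = nn.functional.mse_loss(x_recon, x)  # MSE reconstruction loss
print(x_recon.shape, z.shape)
```

The labels y from the batch play no role here, since the loss compares x_recon only with x.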

test_step_body(batch, batch_idx: int = 0)[source]
training_step_body(batch, batch_idx: int = 0)[source]
validation_step_body(batch, batch_idx: int = 0)[source]

FRAME_FM.models.convae module

This demo shows the application of a convolutional autoencoder to a stack of geospatial tiles. A ConvAutoencoder class is defined to read tiles from the input batch and pass them through the convolutional encoder-decoder network.

class FRAME_FM.models.convae.ConvAutoencoder(in_channels: int = 3, base_channels: int = 32, kernel_size: int = 3, latent_dim: int = 256, input_dim: int = 32, plotting: bool = False, lr=0.0, weight_decay=1e-05)[source]

Bases: BaseModule

Defines the autoencoder (AE) together with its training and validation steps.

configure_optimizers()[source]
forward(x)[source]
on_validation_epoch_end()[source]
on_validation_epoch_start()[source]
training_step_body(batch, batch_idx)[source]
validation_step_body(batch, batch_idx)[source]

FRAME_FM.models.mmmae module

class FRAME_FM.models.mmmae.MultimodalMaskedAutoencoder(input_shapes: list[dict[str, int] | list[int]], n_channels: list[int], patch_shapes: list[dict[str, int] | list[int]], inputs_positioned: list[str] | str = '', position_space: dict[str, tuple[float, float]] | list[tuple[float, float]] | None = None, pos_embed_ratio: dict[str, float] | list[float] | None = None, encoder_embed_dim: int = 16, encoder_depth: int = 24, encoder_num_heads: int = 16, decoder_embed_dim: int = 16, decoder_depth: int = 8, decoder_num_heads: int = 16, mlp_ratio: float = 4.0, norm_layer: type[~torch.nn.modules.normalization.LayerNorm] = <class 'torch.nn.modules.normalization.LayerNorm'>, norm_token_loss: bool = False, learning_rate: float = 0.001, default_mask_ratio: float = 0.75)[source]

Bases: BaseModule

Masked Autoencoder with flexible multi-input embeddings and transformer backbone

configure_optimizers()[source]
forward(inputs: list, mask_ratio: float = 0.75) -> tuple[Tensor, list[Tensor], Tensor][source]

Apply MMMAE to inputs and return the loss, predictions, and mask.

Parameters:
  • inputs (list) – Batched model inputs.

  • mask_ratio (float, optional) – Proportion of token embeddings to mask per batch. Defaults to 0.75.

Returns:

  • Mean squared error of model predictions, over masked tokens, shape [1].

  • Model predictions of input tokens, shapes ([B, L_i, D_i])_i.

  • Mask with 0 where token extracted, 1 otherwise, shape [B, sum(L_i)].

Return type:

tuple[torch.Tensor, list[torch.Tensor], torch.Tensor]

forward_decoder(x: Tensor, ids_restore: Tensor, metadata_embed: Tensor) -> list[Tensor][source]

Transform encoding of masked inputs, decode using a transformer, and reconstruct tokens.

Parameters:
  • x (torch.Tensor) – Encodings of shuffled, masked tokens, shape [B, 1 + (1-p)L, D].

  • ids_restore (torch.Tensor) – IDs with which to restore original, unshuffled encodings, shape [B, L].

  • metadata_embed (torch.Tensor) – Encodings of input mode and positions, shape [B, L, D_d]

Returns:

Decoded tokens for each input, as reconstructed by input_embedders, shapes ([B, L_i, D_i])_i with sum(L_i) = L.

Return type:

list[torch.Tensor]
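The role of ids_restore can be sketched as follows: learned mask tokens are appended to the kept encodings, and the full sequence is gathered back into its original order. This follows the common masked-autoencoder recipe and is an assumption about this method, not a copy of it; the function name restore_token_order is hypothetical, and the extra leading token in x is assumed to have been removed beforehand.

```python
import torch


def restore_token_order(x_kept, ids_restore, mask_token):
    """x_kept: [B, L_keep, D], ids_restore: [B, L], mask_token: [D]."""
    B, L_keep, D = x_kept.shape
    L = ids_restore.shape[1]
    # Fill the masked positions with copies of the mask token.
    fill = mask_token.expand(B, L - L_keep, D)
    x_full = torch.cat([x_kept, fill], dim=1)            # shuffled order
    index = ids_restore.unsqueeze(-1).expand(-1, -1, D)
    return torch.gather(x_full, 1, index)                # original order


tokens = torch.arange(8.0).view(1, 4, 2)         # original tokens t0..t3
ids_shuffle = torch.tensor([[2, 0, 3, 1]])       # shuffled order of tokens
ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation
x_kept = tokens[:, [2, 0]]                       # encoder saw t2 and t0
mask_token = torch.full((2,), -1.0)
restored = restore_token_order(x_kept, ids_restore, mask_token)
print(restored)
```

In the result, t0 and t2 are back at positions 0 and 2, and positions 1 and 3 hold the mask token.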

forward_encoder(inputs: list, mask_ratio: float) -> tuple[Tensor, Tensor, Tensor, Tensor][source]

Tokenise and embed inputs, randomly mask tokens, and encode using a transformer.

Parameters:
  • inputs (list) – Batched model inputs, for conversion by input_embedders into token and position embeddings of shapes ([B, L_i, D])_i and ([B, L_i, D_d])_i

  • mask_ratio (float) – Proportion p of token embeddings to mask per batch.

Returns:

  • Encodings of randomly selected input embeddings, shape [B, 1 + (1-p)sum(L_i), D].

  • Batched mode and position embeddings for decoder, shape [B, sum(L_i), D_d]

  • Mask with 0 where token extracted, 1 otherwise, shape [B, sum(L_i)].

  • IDs with which to restore original, unshuffled token embeddings, shape [B, sum(L_i)].

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
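As a worked example of the encoder output length 1 + (1-p)sum(L_i), the sketch below computes it for two inputs. The helper name encoded_length is hypothetical, and rounding the kept-token count down is an assumption; the method itself may round differently.

```python
def encoded_length(token_counts, mask_ratio):
    """Sequence length of the encoder output: 1 + (1 - p) * sum(L_i)."""
    total = sum(token_counts)
    n_keep = int(total * (1 - mask_ratio))  # assumed to round down
    return 1 + n_keep                       # +1 for the extra leading token


# Two inputs with 64 and 16 tokens, 75% masked: 80 tokens, 20 kept.
print(encoded_length([64, 16], mask_ratio=0.75))  # 21
```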

forward_loss(inputs: list, predictions: list[Tensor], mask: Tensor) -> Tensor[source]

Calculate masked-token MSE between batched inputs and model predictions.

Parameters:
  • inputs (list) – Batched model inputs, for conversion to tokens by input_embedders.

  • predictions (list[torch.Tensor]) – Model predictions, shapes ([B, L_i, D_i])_i.

  • mask (torch.Tensor) – Mask with 1 where token masked, shape [B, sum(L_i)].

Returns:

Average mean squared error over the batch, shape [1].

Return type:

torch.Tensor
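A minimal sketch of a masked-token MSE over several inputs is shown below. It assumes the targets have already been converted to tokens with the same shapes as the predictions, and it averages the per-token error over masked tokens only; the function name masked_token_mse is hypothetical, and the sketch returns a 0-dimensional tensor rather than one of shape [1].

```python
import torch


def masked_token_mse(targets, predictions, mask):
    """targets, predictions: lists of [B, L_i, D_i]; mask: [B, sum(L_i)]."""
    # Mean squared error per token, concatenated across inputs.
    per_token = torch.cat(
        [((p - t) ** 2).mean(dim=-1) for p, t in zip(predictions, targets)],
        dim=1,
    )                                              # [B, sum(L_i)]
    # Average over masked tokens only (mask == 1); assumes mask.sum() > 0.
    return (per_token * mask).sum() / mask.sum()


targets = [torch.zeros(1, 2, 3), torch.zeros(1, 1, 2)]
predictions = [torch.ones(1, 2, 3), torch.full((1, 1, 2), 3.0)]
mask = torch.tensor([[0.0, 1.0, 1.0]])  # first token visible, rest masked
loss = masked_token_mse(targets, predictions, mask)
print(loss)  # per-token errors are 1, 1, 9; masked mean is (1 + 9) / 2
```

Visible tokens contribute nothing to the loss, so the model is scored only on what it had to reconstruct.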

initialize_weights()[source]

Initialise layer weights and parameters, including in input embedders.

input_embedders: list[BaseEmbedder]
random_masking(x: Tensor, mask_ratio: float) -> tuple[Tensor, Tensor, Tensor][source]

Shuffle batched token embeddings and mask random selection.

Parameters:
  • x (torch.Tensor) – Batched token embeddings, shape [B, L, D].

  • mask_ratio (float) – Proportion p of token embeddings to mask per batch.

Returns:

  • Randomly selected (unmasked) token embeddings, shape [B, (1-p)L, D].

  • Mask with 0 where token extracted, 1 otherwise, shape [B, L].

  • IDs with which to restore original, unshuffled token embeddings, shape [B, L].

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
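The shuffle-and-mask step can be sketched with the argsort-of-noise approach widely used in masked autoencoders. This is an illustration under that assumption, not the library's code; in particular, drawing independent noise per sample and rounding the kept count down are assumptions.

```python
import torch


def random_masking(x, mask_ratio):
    """x: [B, L, D]. Returns kept tokens, mask [B, L], ids_restore [B, L]."""
    B, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=x.device)        # one score per token
    ids_shuffle = torch.argsort(noise, dim=1)        # random permutation
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation
    # Keep the first len_keep tokens of the shuffled order.
    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    # Mask in shuffled order (0 = kept, 1 = masked), then unshuffle it.
    mask = torch.ones(B, L, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_kept, mask, ids_restore


x = torch.arange(48.0).view(2, 8, 3)
x_kept, mask, ids_restore = random_masking(x, mask_ratio=0.75)
print(x_kept.shape, mask.sum(dim=1))
```

With L = 8 and p = 0.75, two tokens per sample are kept and six are marked as masked.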

test_step_body(batch, batch_idx)[source]
training_step_body(batch, batch_idx)[source]
validation_step_body(batch, batch_idx)[source]

Module contents