module dataset
class DatasetDCA
Dataset class for handling multiple sequence alignment (MSA) data.
method __init__
__init__(
path_data: str | Path,
path_weights: str | Path | None = None,
alphabet: str = 'protein',
clustering_th: float = 0.8,
no_reweighting: bool = False,
device: torch.device = torch.device('cpu'),
dtype: torch.dtype = torch.float32,
message: bool = True
)
Initialize the dataset.
Args:
path_data (str | Path): Path to the multiple sequence alignment in FASTA format.
path_weights (str | Path | None, optional): Path to the file containing the importance weights of the sequences. If None, the weights are computed automatically.
alphabet (str, optional): Selects the type of encoding of the sequences. Default choices are ("protein", "rna", "dna"). Defaults to "protein".
clustering_th (float, optional): Sequence identity threshold for clustering. Defaults to 0.8.
no_reweighting (bool, optional): If True, the weights are not computed. Defaults to False.
device (torch.device, optional): Device to be used. Defaults to "cpu".
dtype (torch.dtype, optional): Data type of the dataset. Defaults to torch.float32.
message (bool, optional): Whether to print the import message. Defaults to True.
method get_effective_size
get_effective_size() → int
Returns the effective size (Meff) of the dataset.
Returns:
int: Effective size of the dataset.
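To make the link between `clustering_th`, the sequence weights, and Meff concrete, here is a minimal pure-Python sketch of the reweighting convention commonly used in DCA: each sequence is weighted by the inverse of the number of sequences (itself included) whose fractional identity with it reaches the threshold, and Meff is the sum of the weights. The helper names `sequence_weights` and `effective_size` are hypothetical, and the exact comparison rule (`>=` on fractional identity) is an assumption; the library's own implementation may differ. The sketch also returns a float, whereas the documented method is annotated as returning `int`.

```python
def sequence_weights(msa, clustering_th=0.8):
    """Hypothetical helper: weight = 1 / (size of the sequence's identity cluster)."""
    length = len(msa[0])
    weights = []
    for seq_a in msa:
        # Count sequences (including seq_a itself) whose fractional identity
        # with seq_a is at least clustering_th. This rule is an assumption.
        neighbours = 0
        for seq_b in msa:
            matches = sum(a == b for a, b in zip(seq_a, seq_b))
            if matches / length >= clustering_th:
                neighbours += 1
        weights.append(1.0 / neighbours)
    return weights


def effective_size(weights):
    """Hypothetical helper: Meff as the sum of the sequence weights."""
    return sum(weights)


# Toy alignment: the first two sequences are 90% identical, the third is unrelated.
msa = ["ACDEFGHIKL", "ACDEFGHIKV", "WYWYWYWYWY"]
weights = sequence_weights(msa, clustering_th=0.8)
meff = effective_size(weights)
print(weights, meff)
```

Under this convention, skipping the reweighting amounts to giving every sequence weight 1, so Meff equals the number of sequences.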
method get_num_residues
get_num_residues() → int
Returns the number of residues (L) in the multiple sequence alignment.
Returns:
int: Length of the MSA.
method get_num_states
get_num_states() → int
Returns the number of states (q) in the alphabet.
Returns:
int: Number of states.
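The quantities L and q can be illustrated with a small sketch that encodes an alignment into integer indices. The token string below (a gap symbol followed by the 20 standard amino acids) is an assumption chosen for illustration, as is the helper name `encode`; the ordering and content of the library's "protein" alphabet are not specified in this reference.

```python
# Assumed token ordering for illustration only: gap + 20 standard amino acids.
TOKENS = "-ACDEFGHIKLMNPQRSTVWY"


def encode(msa, tokens=TOKENS):
    """Hypothetical helper: map each character to its index in `tokens`."""
    index = {tok: i for i, tok in enumerate(tokens)}
    return [[index[c] for c in seq] for seq in msa]


msa = ["AC-DE", "ACWDE"]
encoded = encode(msa)

num_residues = len(encoded[0])  # L: number of columns in the alignment
num_states = len(TOKENS)        # q: number of symbols in the alphabet
print(num_residues, num_states)
```

With this assumed alphabet, a one-hot representation of the alignment would have shape (number of sequences, L, q).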
method shuffle
shuffle() → None
Shuffles the dataset.
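When a dataset carries per-sequence weights, a shuffle has to apply the same permutation to the sequences and to the weights so that each sequence keeps its own weight. The sketch below shows that idea in pure Python; `shuffle_together` is a hypothetical helper that returns new lists, whereas the documented `shuffle()` returns `None` and presumably acts in place. Whether the library permutes the weights in this way is an assumption.

```python
import random


def shuffle_together(sequences, weights, seed=None):
    """Hypothetical helper: apply one random permutation to both lists."""
    rng = random.Random(seed)
    perm = list(range(len(sequences)))
    rng.shuffle(perm)
    # Index both lists with the same permutation to keep pairs aligned.
    return [sequences[i] for i in perm], [weights[i] for i in perm]


seqs = ["AAA", "CCC", "DDD"]
w = [0.25, 0.5, 1.0]
shuffled_seqs, shuffled_w = shuffle_together(seqs, w, seed=0)
print(list(zip(shuffled_seqs, shuffled_w)))
```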
This file was automatically generated via lazydocs.