`module` `adabmDCA.dataset`

`class` `DatasetDCA`

Dataset class for handling multiple sequence alignment (MSA) data.

`method` `__init__`

__init__(
    path_data: str,
    path_weights: Optional[str] = None,
    alphabet: str = 'protein',
    clustering_th: float = 0.8,
    no_reweighting: bool = False,
    remove_duplicates: bool = False,
    filter_sequences: bool = False,
    message: bool = True,
    device: device = device(type='cpu'),
    dtype: dtype = torch.float32
)

Initialize the dataset.

Args:

path_data (str): Path to the multiple sequence alignment in FASTA format.
path_weights (Optional[str], optional): Path to the file containing the importance weights of the sequences. If None, the weights are computed automatically.
alphabet (str, optional): Selects the type of encoding of the sequences. Default choices are ("protein", "rna", "dna"). Defaults to "protein".
clustering_th (float, optional): Sequence identity threshold for clustering. Defaults to 0.8.
no_reweighting (bool, optional): If True, the weights are not computed. Defaults to False.
remove_duplicates (bool, optional): If True, removes duplicate sequences from the dataset. Defaults to False.
filter_sequences (bool, optional): If True, removes sequences containing tokens not in the alphabet. Defaults to False.
message (bool, optional): If True, prints the import message. Defaults to True.
device (torch.device, optional): Device to be used. Defaults to "cpu".
dtype (torch.dtype, optional): Data type of the dataset. Defaults to torch.float32.
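
A minimal usage sketch, assuming the `adabmDCA` package and PyTorch are installed. The toy alignment, the file name, and the `write_toy_fasta` helper are made up for illustration; only the constructor arguments and getter methods documented on this page are used, and the import is guarded so the snippet still runs without the library.

```python
import os
import tempfile


def write_toy_fasta(path):
    """Write a tiny, made-up aligned FASTA file (hypothetical data)."""
    records = [("seq1", "ACDEF"), ("seq2", "ACDEG"), ("seq3", "WYWY-")]
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n{seq}\n")
    return len(records)


# Create the toy alignment in a temporary directory.
tmp_dir = tempfile.mkdtemp()
fasta_path = os.path.join(tmp_dir, "toy_msa.fasta")
n_written = write_toy_fasta(fasta_path)

try:
    import torch
    from adabmDCA.dataset import DatasetDCA
except ImportError:
    DatasetDCA = None  # library not available; skip the demonstration

if DatasetDCA is not None:
    dataset = DatasetDCA(
        path_data=fasta_path,
        path_weights=None,      # let the class compute the weights
        alphabet="protein",
        clustering_th=0.8,
        device=torch.device("cpu"),
        dtype=torch.float32,
    )
    print("L    =", dataset.get_num_residues())
    print("q    =", dataset.get_num_states())
    print("Meff =", dataset.get_effective_size())
```

Passing `path_weights=None` leaves the weight computation to the class, as described above; set `no_reweighting=True` to skip it.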

`method` `get_effective_size`

get_effective_size() → int

Returns the effective size (Meff) of the dataset.

Returns:

int: Effective size of the dataset.
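
For intuition, the DCA literature commonly defines the effective size as the sum of per-sequence weights, where each weight is the inverse of the number of sequences whose identity to that sequence reaches the clustering threshold (`clustering_th`). The pure-Python sketch below follows that convention. It is an assumption about what Meff means here, not the library's code: `sequence_identity` and `effective_size` are hypothetical helpers, the use of `>=` in the comparison is a guess, and the sketch returns a float rather than reproducing the documented `int` return type.

```python
def sequence_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_size(seqs, clustering_th=0.8):
    """Sum of weights 1 / (number of sequences within the identity threshold)."""
    weights = []
    for a in seqs:
        # Each sequence counts itself, so n_neighbors is at least 1.
        n_neighbors = sum(sequence_identity(a, b) >= clustering_th for b in seqs)
        weights.append(1.0 / n_neighbors)
    return sum(weights)


# Toy alignment (made up): the first two sequences share 4 of 5 positions.
toy_msa = ["ACDEF", "ACDEG", "WYWYW"]
meff = effective_size(toy_msa, clustering_th=0.8)
print(meff)
```

In this toy alignment the first two sequences fall in the same cluster and share one unit of weight, so the three sequences contribute an effective size of 2.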

`method` `get_num_residues`

get_num_residues() → int

Returns the number of residues (L) in the multiple sequence alignment.

Returns:

int: Length of the MSA.

`method` `get_num_states`

get_num_states() → int

Returns the number of states (q) in the alphabet.

Returns:

int: Number of states.
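
To make L and q concrete, the sketch below encodes a toy alignment as integer indices. It assumes that the protein alphabet consists of the 20 standard amino acids plus a gap symbol. The token string and its ordering are assumptions for illustration and are not taken from this page.

```python
# Assumed protein alphabet: gap symbol followed by the 20 standard amino acids.
PROTEIN_TOKENS = "-ACDEFGHIKLMNPQRSTVWY"

# Toy alignment (made up); all sequences have the same length.
msa = ["ACDEF", "ACDEG", "WYWY-"]

# Map each token to an integer index, then encode every sequence.
token_to_index = {token: i for i, token in enumerate(PROTEIN_TOKENS)}
encoded = [[token_to_index[c] for c in seq] for seq in msa]

num_residues = len(encoded[0])    # L: number of columns in the alignment
num_states = len(PROTEIN_TOKENS)  # q: number of tokens in the alphabet

# A one-hot encoding of this alignment would have shape (M, L, q).
one_hot_shape = (len(msa), num_residues, num_states)
print(num_residues, num_states, one_hot_shape)
```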

`method` `shuffle`

shuffle() → None

Shuffles the dataset.

This file was automatically generated via lazydocs.
