`module` `adabmDCA.cobalt`

`function` `split_train_test`

split_train_test(
    headers: ndarray,
    X: Tensor,
    seqid_th: float,
    rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor, ndarray, Tensor]

Splits X into two sets, T and S, such that no sequence in S has more than a fraction `seqid_th` of its residues identical to any sequence in T.

Args:

headers (np.ndarray): Array of sequence headers.
X (torch.Tensor): Encoded input MSA, shape (batch_size, L).
seqid_th (float): Threshold sequence identity.
rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.

Returns: Tuple[np.ndarray, torch.Tensor, np.ndarray, torch.Tensor]: Training headers (np.ndarray), training sequences (torch.Tensor), test headers (np.ndarray), test sequences (torch.Tensor).
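The separation condition described above can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: `fractional_identity` and `split_is_valid` are hypothetical helper names, and sequences are assumed to be equal-length lists of integer-encoded residues rather than a `torch.Tensor`.

```python
from typing import Sequence


def fractional_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions at which two encoded sequences match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def split_is_valid(train, test, seqid_th: float) -> bool:
    """True if no test sequence exceeds seqid_th identity with any training sequence."""
    return all(
        fractional_identity(s, t) <= seqid_th
        for s in test
        for t in train
    )


# Toy data (hypothetical): two training sequences and one test sequence.
train = [[0, 1, 2, 3], [3, 2, 1, 0]]
test = [[0, 1, 3, 2]]

ok_loose = split_is_valid(train, test, 0.5)   # closest pair matches at 2 of 4 positions
ok_strict = split_is_valid(train, test, 0.4)  # the same pair now violates the threshold
```

A check like this may be handy for sanity-testing the output of `split_train_test` on a small alignment, after converting the returned tensors to lists.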

`function` `prune_redundant_sequences`

prune_redundant_sequences(
    headers: ndarray,
    X: Tensor,
    seqid_th: float,
    rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor]

Prunes sequences from X so that no remaining sequence has more than a fraction `seqid_th` of its residues identical to any other sequence in the set.

Args:

headers (np.ndarray): Array of sequence headers.
X (torch.Tensor): Encoded input MSA.
seqid_th (float): Threshold sequence identity.
rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.

Returns: Tuple[np.ndarray, torch.Tensor]: Headers of the pruned sequences (np.ndarray), pruned sequences (torch.Tensor).
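To make the pruning criterion concrete, here is a simple greedy sketch in plain Python. It is an assumption-laden illustration and not how `prune_redundant_sequences` is implemented: it visits sequences in input order and ignores `rnd_gen`, and `prune_greedy` is a hypothetical name. Sequences are assumed to be equal-length lists of integer-encoded residues.

```python
def fractional_identity(a, b):
    """Fraction of aligned positions at which two encoded sequences match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def prune_greedy(headers, seqs, seqid_th):
    """Keep a sequence only if it is within seqid_th identity of every sequence kept so far."""
    kept_headers, kept_seqs = [], []
    for header, seq in zip(headers, seqs):
        if all(fractional_identity(seq, k) <= seqid_th for k in kept_seqs):
            kept_headers.append(header)
            kept_seqs.append(seq)
    return kept_headers, kept_seqs


# Toy data (hypothetical): "b" differs from "a" at a single position.
headers = ["a", "b", "c"]
seqs = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]

pruned_headers, pruned_seqs = prune_greedy(headers, seqs, seqid_th=0.5)
# "b" is dropped because it is 75% identical to "a"; "a" and "c" share no positions.
```

The result of a greedy pass depends on the visiting order, which is one reason an implementation might accept a random number generator.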

`function` `run_cobalt`

run_cobalt(
    headers: ndarray,
    X: Tensor,
    t1: float,
    t2: float,
    t3: float,
    max_train: Optional[int] = None,
    max_test: Optional[int] = None,
    rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor, ndarray, Tensor]

Runs the Cobalt algorithm to split the input MSA into training and test sets.

Args:

headers (np.ndarray): Array of sequence headers.
X (torch.Tensor): Encoded input MSA.
t1 (float): No sequence in S has more than this fraction of its residues identical to any sequence in T.
t2 (float): No pair of test sequences has a fractional identity above this value.
t3 (float): No pair of training sequences has a fractional identity above this value.
max_train (Optional[int], optional): Maximum number of sequences in the training set. Defaults to None.
max_test (Optional[int], optional): Maximum number of sequences in the test set. Defaults to None.
rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.

Returns: Tuple[np.ndarray, torch.Tensor, np.ndarray, torch.Tensor]: Training headers (np.ndarray), training sequences (torch.Tensor), test headers (np.ndarray), test sequences (torch.Tensor).
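The three thresholds can be checked against a candidate split with a short plain-Python sketch. The helper names (`max_identity_within`, `max_identity_between`, `check_thresholds`) are hypothetical and not part of `adabmDCA`; sequences are assumed to be equal-length lists of integer-encoded residues.

```python
from itertools import combinations


def fractional_identity(a, b):
    """Fraction of aligned positions at which two encoded sequences match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_identity_within(seqs):
    """Largest pairwise identity inside one set (0.0 if fewer than two sequences)."""
    return max((fractional_identity(a, b) for a, b in combinations(seqs, 2)), default=0.0)


def max_identity_between(set_a, set_b):
    """Largest identity between any sequence of set_a and any sequence of set_b."""
    return max((fractional_identity(a, b) for a in set_a for b in set_b), default=0.0)


def check_thresholds(train, test, t1, t2, t3):
    """Report which of the three documented conditions a split satisfies."""
    return {
        "t1": max_identity_between(train, test) <= t1,  # train vs. test separation
        "t2": max_identity_within(test) <= t2,          # redundancy within the test set
        "t3": max_identity_within(train) <= t3,         # redundancy within the training set
    }


# Toy split (hypothetical data).
train = [[0, 0, 0, 0], [1, 1, 0, 0]]
test = [[2, 2, 2, 2], [2, 2, 3, 3]]

report = check_thresholds(train, test, t1=0.3, t2=0.5, t3=0.4)
# The two training sequences are 50% identical, so only the t3 condition fails here.
```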

This file was automatically generated via lazydocs.
