
module adabmDCA.cobalt


function split_train_test

split_train_test(
    headers: ndarray,
    X: Tensor,
    seqid_th: float,
    rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor, ndarray, Tensor]

Splits X into a training set T and a test set S such that no sequence in S has more than a fraction 'seqid_th' of its residues identical to any sequence in T.

Args:

  • headers (np.ndarray): Array of sequence headers.
  • X (torch.Tensor): Encoded input MSA, shape (batch_size, L).
  • seqid_th (float): Threshold sequence identity.
  • rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.

Returns: Tuple[np.ndarray, torch.Tensor, np.ndarray, torch.Tensor]: (np.ndarray) Training headers, (torch.Tensor) Training sequences, (np.ndarray) Test headers, (torch.Tensor) Test sequences.
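The guarantee above can be illustrated with a simpler, well-known technique: a cluster-based split. This is a hypothetical sketch, not the procedure split_train_test uses internally. `fractional_identity`, `cluster_split`, and the `test_fraction` knob are illustrative names that are not part of adabmDCA. Identity is assumed to be the fraction of aligned positions (gaps included) at which two encoded sequences agree. Plain Python lists stand in for np.ndarray and torch.Tensor, and a seeded `random.Random` stands in for the torch.Generator. Sequences whose identity exceeds seqid_th are linked by single-linkage clustering, and each cluster goes wholly to one side, so every train/test pair ends up at or below the threshold.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def fractional_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions where two equal-length encoded sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_split(
    headers: List[str],
    X: List[List[int]],
    seqid_th: float,
    test_fraction: float = 0.2,  # hypothetical knob, not an adabmDCA parameter
    seed: Optional[int] = None,
) -> Tuple[List[str], List[List[int]], List[str], List[List[int]]]:
    """Hypothetical split: no test sequence exceeds seqid_th identity to any training sequence."""
    n = len(X)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-linkage clustering: merge every pair above the identity threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if fractional_identity(X[i], X[j]) > seqid_th:
                parent[find(i)] = find(j)

    # Group indices by cluster root, then shuffle the clusters.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())
    random.Random(seed).shuffle(clusters)

    # Assign whole clusters to the test set until it reaches the target size.
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cluster in clusters:
        if len(test_idx) < test_fraction * n:
            test_idx.extend(cluster)
        else:
            train_idx.extend(cluster)

    # Restore the original MSA order within each set.
    train_idx.sort()
    test_idx.sort()
    return (
        [headers[i] for i in train_idx],
        [X[i] for i in train_idx],
        [headers[i] for i in test_idx],
        [X[i] for i in test_idx],
    )
```

Because clusters are assigned whole, the resulting test fraction is only approximate, and one large cluster can absorb most of the alignment.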


function prune_redundant_sequences

prune_redundant_sequences(
    headers: ndarray,
    X: Tensor,
    seqid_th: float,
    rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor]

Prunes sequences from X such that no sequence has more than a fraction 'seqid_th' of its residues identical to any other sequence in the set.

Args:

  • headers (np.ndarray): Array of sequence headers.
  • X (torch.Tensor): Encoded input MSA.
  • seqid_th (float): Threshold sequence identity.
  • rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.

Returns: Tuple[np.ndarray, torch.Tensor]: (np.ndarray) Headers of pruned sequences, (torch.Tensor) Pruned sequences.
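The pruning criterion can be met with a simple greedy pass. This is a minimal sketch under stated assumptions, not necessarily how prune_redundant_sequences is implemented. It assumes a random visiting order and the same positional identity measure (fraction of aligned positions that match). Each sequence is kept only if its identity to every sequence already kept is at most seqid_th, which enforces the pairwise bound by construction. `greedy_prune` and `fractional_identity` are hypothetical names. Plain lists stand in for the array types, and a seeded `random.Random` stands in for rnd_gen.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fractional_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions where two equal-length encoded sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_prune(
    headers: List[str],
    X: List[List[int]],
    seqid_th: float,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical greedy pruning: keep a sequence only if no kept sequence is too similar."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)  # random visiting order, seeded for reproducibility
    kept: List[int] = []
    for i in order:
        # Compare the candidate against every sequence kept so far.
        if all(fractional_identity(X[i], X[k]) <= seqid_th for k in kept):
            kept.append(i)
    kept.sort()  # restore the original MSA order
    return [headers[i] for i in kept], [X[i] for i in kept]
```

The result depends on the visiting order: a different seed can keep a different representative from each group of near-duplicates.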


function run_cobalt

run_cobalt(
    headers: ndarray,
    X: Tensor,
    t1: float,
    t2: float,
    t3: float,
    max_train: Optional[int] = None,
    max_test: Optional[int] = None,
    rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor, ndarray, Tensor]

Runs the Cobalt algorithm to split the input MSA into training and test sets.

Args:

  • headers (np.ndarray): Array of sequence headers.
  • X (torch.Tensor): Encoded input MSA.
  • t1 (float): No sequence in the test set S has more than this fraction of its residues identical to any sequence in the training set T.
  • t2 (float): No pair of test sequences has a fractional sequence identity greater than this value.
  • t3 (float): No pair of training sequences has a fractional sequence identity greater than this value.
  • max_train (Optional[int], optional): Maximum number of sequences in the training set. Defaults to None.
  • max_test (Optional[int], optional): Maximum number of sequences in the test set. Defaults to None.
  • rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.

Returns: Tuple[np.ndarray, torch.Tensor, np.ndarray, torch.Tensor]: (np.ndarray) Training headers, (torch.Tensor) Training sequences, (np.ndarray) Test headers, (torch.Tensor) Test sequences.
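The three thresholds can be combined as follows. This is a hypothetical sketch that assumes a particular step order: split at t1, prune the test set at t2 and the training set at t3, then randomly subsample down to max_train and max_test. The step order, the cluster-based split (swapped in for Cobalt's own splitting step), and the names `split_pipeline`, `cluster_split`, `prune`, and `identity` are all illustrative assumptions, not adabmDCA's API. Pruning and subsampling only remove sequences, so the t1 guarantee established by the split survives both.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions where two equal-length encoded sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def prune(idx: List[int], X: List[List[int]], th: float, rng: random.Random) -> List[int]:
    """Greedily keep indices whose identity to every already-kept index is <= th."""
    order = idx[:]
    rng.shuffle(order)
    kept: List[int] = []
    for i in order:
        if all(identity(X[i], X[k]) <= th for k in kept):
            kept.append(i)
    return sorted(kept)


def cluster_split(
    X: List[List[int]], th: float, rng: random.Random, test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Single-linkage clusters at th, each assigned whole to test or train."""
    n = len(X)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if identity(X[i], X[j]) > th:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())
    rng.shuffle(clusters)
    train: List[int] = []
    test: List[int] = []
    for cluster in clusters:
        if len(test) < test_fraction * n:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


def split_pipeline(
    headers: List[str], X: List[List[int]], t1: float, t2: float, t3: float,
    max_train: Optional[int] = None, max_test: Optional[int] = None, seed: Optional[int] = None,
):
    """Hypothetical composition of the three thresholds; not adabmDCA's run_cobalt."""
    rng = random.Random(seed)
    train, test = cluster_split(X, t1, rng)  # no test seq > t1 identity to any train seq
    test = prune(test, X, t2, rng)  # no test pair above t2
    train = prune(train, X, t3, rng)  # no train pair above t3
    # Random subsampling only removes sequences, so all three bounds still hold.
    if max_train is not None and len(train) > max_train:
        train = sorted(rng.sample(train, max_train))
    if max_test is not None and len(test) > max_test:
        test = sorted(rng.sample(test, max_test))
    return (
        [headers[i] for i in train], [X[i] for i in train],
        [headers[i] for i in test], [X[i] for i in test],
    )
```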


This file was automatically generated via lazydocs.