
module cobalt


function split_train_test

split_train_test(
    headers: list[str],
    X: Tensor,
    seqid_th: float,
    rnd_gen: Generator | None = None
) → tuple[list, Tensor, list, Tensor]

Splits X into a training set T and a test set S such that no sequence in S has more than a fraction 'seqid_th' of its residues identical to any sequence in T.

Args:

  • headers (list[str]): List of sequence headers.
  • X (torch.Tensor): Encoded input MSA.
  • seqid_th (float): Threshold sequence identity.
  • rnd_gen (torch.Generator, optional): Random number generator. Defaults to None.

Returns:

  • tuple[list, torch.Tensor, list, torch.Tensor]: Headers and encoded sequences of the training set, followed by those of the test set.
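The condition that a split must satisfy can be illustrated with a small pure-Python check. This is only a sketch of the constraint, not the library's implementation: the helper names `fractional_identity` and `split_violations` are hypothetical, sequences are plain lists of integer codes rather than tensors, and identity is assumed to be the fraction of aligned columns that match, with gaps counted like any other symbol.

```python
def fractional_identity(a, b):
    """Fraction of aligned columns in which two encoded sequences agree.

    Assumption: both sequences come from the same MSA, so they have
    equal length, and gap symbols are compared like any other symbol.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def split_violations(train, test, seqid_th):
    """Return (test_index, train_index) pairs whose identity exceeds seqid_th."""
    return [
        (i, j)
        for i, s in enumerate(test)
        for j, t in enumerate(train)
        if fractional_identity(s, t) > seqid_th
    ]


# Toy data: four aligned columns, residues encoded as small integers.
train = [[0, 1, 2, 3], [3, 2, 1, 0]]
test = [[0, 1, 2, 0], [1, 1, 1, 1]]

# The first test sequence shares 3 of 4 columns with the first training sequence.
violations = split_violations(train, test, seqid_th=0.5)
print(violations)
```

An empty list means the split respects the threshold; any returned pair points at a test sequence that is too close to a training sequence.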

function prune_redundant_sequences

prune_redundant_sequences(
    headers: list[str],
    X: Tensor,
    seqid_th: float,
    rnd_gen: Generator | None = None
) → tuple[list, Tensor]

Prunes sequences from X so that no remaining sequence has more than a fraction 'seqid_th' of its residues identical to any other remaining sequence.

Args:

  • headers (list[str]): List of sequence headers.
  • X (torch.Tensor): Encoded input MSA.
  • seqid_th (float): Threshold sequence identity.
  • rnd_gen (torch.Generator, optional): Random number generator. Defaults to None.

Returns:

  • tuple[list, torch.Tensor]: Headers and encoded sequences that remain after pruning.
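One simple way to obtain a set with this property is a greedy pass that keeps a sequence only if it is not too similar to anything already kept. The sketch below is an illustration under stated assumptions, not the algorithm used by `prune_redundant_sequences`: `greedy_prune` and `identity` are hypothetical names, sequences are plain lists of integer codes, and Python's `random.Random` stands in for the `torch.Generator` argument.

```python
import random


def identity(a, b):
    """Fraction of aligned columns in which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_prune(headers, seqs, seqid_th, rng=None):
    """Keep a subset in which every pair has identity <= seqid_th.

    Sequences are visited in input order, or in a shuffled order when a
    random.Random instance is supplied, and kept only if they are not
    too similar to any sequence kept so far.
    """
    order = list(range(len(seqs)))
    if rng is not None:
        rng.shuffle(order)
    kept = []
    for i in order:
        if all(identity(seqs[i], seqs[k]) <= seqid_th for k in kept):
            kept.append(i)
    kept.sort()  # report survivors in their original order
    return [headers[i] for i in kept], [seqs[i] for i in kept]


headers = ["a", "b", "c"]
seqs = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]

# "a" and "b" share 3 of 4 columns, so only one of them can survive at 0.5.
kept_headers, kept_seqs = greedy_prune(headers, seqs, seqid_th=0.5)
print(kept_headers)
```

Which member of a redundant pair survives depends on the visiting order, which is presumably why the documented function accepts a random number generator.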

function run_cobalt

run_cobalt(
    headers: list[str],
    X: Tensor,
    t1: float,
    t2: float,
    t3: float,
    max_train: int | None,
    max_test: int | None,
    rnd_gen: Generator | None = None
) → tuple[list, Tensor, list, Tensor]

Runs the Cobalt algorithm to split the input MSA into training and test sets.

Args:

  • headers (list[str]): List of sequence headers.
  • X (torch.Tensor): Encoded input MSA.
  • t1 (float): Threshold between the two sets. No sequence in the test set S has more than this fraction of its residues identical to any sequence in the training set T.
  • t2 (float): Threshold within the test set. No pair of test sequences has fractional identity greater than this value.
  • t3 (float): Threshold within the training set. No pair of training sequences has fractional identity greater than this value.
  • max_train (int | None): Maximum number of sequences in the training set.
  • max_test (int | None): Maximum number of sequences in the test set.
  • rnd_gen (torch.Generator, optional): Random number generator. Defaults to None.

Returns:

  • tuple[list, torch.Tensor, list, torch.Tensor]: Headers and encoded sequences of the training set, followed by those of the test set.
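The roles of t1, t2, t3 and the two size caps can be summarized as a set of conditions on the output. The checker below is a hedged sketch of those conditions as described in the argument list, not part of the module: `satisfies_thresholds` and its helpers are hypothetical names, and sequences are plain lists of integer codes rather than tensors.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of aligned columns in which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_cross_identity(test, train):
    """Largest identity between any test sequence and any training sequence."""
    return max((identity(s, t) for s in test for t in train), default=0.0)


def max_pairwise_identity(seqs):
    """Largest identity between any two distinct sequences of one set."""
    return max((identity(a, b) for a, b in combinations(seqs, 2)), default=0.0)


def satisfies_thresholds(train, test, t1, t2, t3, max_train=None, max_test=None):
    """Check the documented constraints on a training/test split."""
    return (
        max_cross_identity(test, train) <= t1      # test vs. training
        and max_pairwise_identity(test) <= t2      # within the test set
        and max_pairwise_identity(train) <= t3     # within the training set
        and (max_train is None or len(train) <= max_train)
        and (max_test is None or len(test) <= max_test)
    )


train = [[0, 1, 2, 3], [3, 2, 1, 0]]
test = [[1, 0, 3, 2], [2, 3, 0, 1]]

ok = satisfies_thresholds(train, test, t1=0.25, t2=0.5, t3=0.5)
print(ok)
```

A check of this kind can be run on the output of any splitting routine after converting the tensors to lists, for example with `Tensor.tolist()`.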

This file was automatically generated via lazydocs.