`module` `cobalt`

`function` `split_train_test`

split_train_test(
    headers: list[str],
    X: Tensor,
    seqid_th: float,
    rnd_gen: Generator | None = None
) → tuple[list, Tensor, list, Tensor]

Splits X into two sets, T and S, such that no sequence in S has more than a fraction `seqid_th` of its residues identical to any sequence in T.

Args:

headers (list[str]): List of sequence headers.
X (torch.Tensor): Encoded input MSA.
seqid_th (float): Threshold sequence identity.
rnd_gen (torch.Generator, optional): Random number generator. Defaults to None.

Returns:

tuple[list, torch.Tensor, list, torch.Tensor]: Headers and encoded sequences of the training set, followed by the headers and encoded sequences of the test set.
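
The identity constraint described above can be illustrated with a small, self-contained sketch. It does not call the `cobalt` module; it only shows what "no sequence in S is more than `seqid_th` identical to any sequence in T" means. The helpers `fractional_identity` and `split_is_valid` are hypothetical names, sequences are plain lists of integers standing in for the encoded tensor, and identity is assumed to be counted over all aligned columns (the library may treat gaps differently).

```python
def fractional_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree.

    Assumption: every aligned column counts, including gap symbols.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def split_is_valid(t_set, s_set, seqid_th):
    """True if no sequence in s_set exceeds seqid_th identity to any sequence in t_set."""
    return all(
        fractional_identity(s, t) <= seqid_th
        for s in s_set
        for t in t_set
    )


# Toy integer-encoded alignment with four columns.
t_set = [[0, 1, 2, 3], [0, 1, 2, 2]]
s_set = [[3, 2, 1, 0], [0, 3, 3, 3]]

print(split_is_valid(t_set, s_set, seqid_th=0.5))  # True
print(split_is_valid(t_set, s_set, seqid_th=0.4))  # False
```

The closest pair here, `[0, 3, 3, 3]` and `[0, 1, 2, 3]`, agrees at two of four positions, so the split passes at a threshold of 0.5 and fails at 0.4.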

`function` `prune_redundant_sequences`

prune_redundant_sequences(
    headers: list[str],
    X: Tensor,
    seqid_th: float,
    rnd_gen: Generator | None = None
) → tuple[list, Tensor]

Prunes sequences from X such that no remaining sequence has more than a fraction `seqid_th` of its residues identical to any other remaining sequence.

Args:

headers (list[str]): List of sequence headers.
X (torch.Tensor): Encoded input MSA.
seqid_th (float): Threshold sequence identity.
rnd_gen (torch.Generator, optional): Random number generator. Defaults to None.

Returns:

tuple[list, torch.Tensor]: Headers and encoded sequences retained after pruning.
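
One simple way to reach the pruning guarantee stated above is a greedy pass over the sequences in random order. The sketch below is an illustration under stated assumptions and not necessarily the procedure `prune_redundant_sequences` uses: `greedy_prune` and `fractional_identity` are hypothetical names, sequences are plain lists of integers, and Python's `random.Random` stands in for the `torch.Generator` argument.

```python
import random


def fractional_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_prune(headers, seqs, seqid_th, rng=None):
    """Keep a sequence only if it is at most seqid_th identical to every kept one.

    Assumption: a random visiting order followed by a greedy filter; the result
    depends on that order, which is why a random generator is accepted.
    """
    rng = rng or random.Random()
    order = list(range(len(seqs)))
    rng.shuffle(order)

    kept = []
    for i in order:
        if all(fractional_identity(seqs[i], seqs[j]) <= seqid_th for j in kept):
            kept.append(i)

    kept.sort()  # restore the original ordering of the surviving sequences
    return [headers[i] for i in kept], [seqs[i] for i in kept]


headers = ["a", "b", "c"]
seqs = [[0, 1, 2, 3], [0, 1, 2, 0], [3, 3, 3, 3]]

kept_headers, kept_seqs = greedy_prune(headers, seqs, seqid_th=0.5, rng=random.Random(0))
print(kept_headers)
```

Sequences `a` and `b` are 75% identical, so only one of them survives at a threshold of 0.5; which one depends on the shuffled order. Sequence `c` is kept in every case.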

`function` `run_cobalt`

run_cobalt(
    headers: list[str],
    X: Tensor,
    t1: float,
    t2: float,
    t3: float,
    max_train: int | None,
    max_test: int | None,
    rnd_gen: Generator | None = None
) → tuple[list, Tensor, list, Tensor]

Runs the Cobalt algorithm to split the input MSA into training and test sets.

Args:

headers (list[str]): List of sequence headers.
X (torch.Tensor): Encoded input MSA.
t1 (float): No sequence in S has more than this fraction of its residues identical to any sequence in T.
t2 (float): No pair of test sequences has a fractional identity greater than this value.
t3 (float): No pair of training sequences has a fractional identity greater than this value.
max_train (int | None): Maximum number of sequences in the training set.
max_test (int | None): Maximum number of sequences in the test set.
rnd_gen (torch.Generator, optional): Random number generator. Defaults to None.

Returns:

tuple[list, torch.Tensor, list, torch.Tensor]: Headers and encoded sequences of the training set, followed by the headers and encoded sequences of the test set.
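
The way the three thresholds and the two size limits combine can be sketched as a checker for a finished split. This is not part of the `cobalt` module: `check_constraints` and its helpers are hypothetical names, sequences are plain lists of integers, identity is counted over all aligned columns (so it is symmetric and the S/T roles in `t1` do not matter here), and `None` for `max_train` or `max_test` is assumed to mean "no limit".

```python
from itertools import combinations


def fractional_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def max_pairwise_identity(seqs):
    """Largest identity between any two distinct sequences in one set."""
    return max((fractional_identity(a, b) for a, b in combinations(seqs, 2)), default=0.0)


def max_cross_identity(set_a, set_b):
    """Largest identity between a sequence of set_a and a sequence of set_b."""
    return max((fractional_identity(a, b) for a in set_a for b in set_b), default=0.0)


def check_constraints(train, test, t1, t2, t3, max_train=None, max_test=None):
    """Report which of the documented conditions a given split satisfies."""
    return {
        "t1_between_sets": max_cross_identity(test, train) <= t1,
        "t2_within_test": max_pairwise_identity(test) <= t2,
        "t3_within_train": max_pairwise_identity(train) <= t3,
        "max_train": max_train is None or len(train) <= max_train,
        "max_test": max_test is None or len(test) <= max_test,
    }


train = [[0, 1, 2, 3], [0, 1, 0, 0]]
test = [[3, 3, 3, 3], [2, 2, 1, 1]]

report = check_constraints(train, test, t1=0.3, t2=0.5, t3=0.5, max_train=2, max_test=1)
for name, ok in report.items():
    print(f"{name}: {ok}")
```

In this toy split the identity conditions all hold (the closest cross-set pair is 25% identical and the two training sequences are 50% identical), while the `max_test=1` limit is violated because the test set contains two sequences.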

This file was automatically generated via lazydocs.
