module adabmDCA.cobalt
function split_train_test
split_train_test(
headers: ndarray,
X: Tensor,
seqid_th: float,
rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor, ndarray, Tensor]
Splits X into two sets, T and S, such that no sequence in S has more than a fraction 'seqid_th' of its residues identical to any sequence in T. The two sets are returned as the training and test sets.
Args:
headers (np.ndarray): Array of sequence headers.
X (torch.Tensor): Encoded input MSA, shape (batch_size, L).
seqid_th (float): Threshold sequence identity.
rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.
Returns:
Training and test sets as: (np.ndarray) Training headers, (torch.Tensor) Training sequences, (np.ndarray) Test headers, (torch.Tensor) Test sequences.
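The identity criterion described above can be illustrated with a small pure-Python sketch. This is not the adabmDCA implementation: the helper names fractional_identity and split_is_valid are hypothetical, the sequences are assumed to be plain lists of integer tokens (for example, obtained from a tensor with X.tolist()), and identity is assumed to be counted over all L aligned columns, gaps included.

```python
from typing import Sequence


def fractional_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions at which two encoded sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    if len(a) == 0:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def split_is_valid(
    train: Sequence[Sequence[int]],
    test: Sequence[Sequence[int]],
    seqid_th: float,
) -> bool:
    """True if no test sequence exceeds seqid_th identity to any training sequence."""
    return all(
        fractional_identity(s, t) <= seqid_th
        for s in test
        for t in train
    )


# Toy data: the test sequence shares 2 of 4 positions with the first
# training sequence and 1 of 4 with the second.
train = [[0, 1, 2, 3], [3, 2, 1, 0]]
test = [[0, 1, 0, 0]]

print(split_is_valid(train, test, seqid_th=0.5))   # True
print(split_is_valid(train, test, seqid_th=0.25))  # False
```

A check like this can be run on the output of a split to confirm that the threshold holds, at a cost quadratic in the number of sequences.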
function prune_redundant_sequences
prune_redundant_sequences(
headers: ndarray,
X: Tensor,
seqid_th: float,
rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor]
Prunes sequences from X so that no remaining sequence has more than a fraction 'seqid_th' of its residues identical to any other remaining sequence.
Args:
headers (np.ndarray): Array of sequence headers.
X (torch.Tensor): Encoded input MSA.
seqid_th (float): Threshold sequence identity.
rnd_gen (Optional[torch.Generator], optional): Random generator. Defaults to None.
Returns:
Tuple[np.ndarray, torch.Tensor]: (np.ndarray) Headers of pruned sequences, (torch.Tensor) Pruned sequences.
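One way to obtain a set satisfying the pruning condition is a greedy pass in random order, sketched below in pure Python. The strategy actually used by prune_redundant_sequences is not documented here, so this is only an illustration: greedy_prune and fractional_identity are hypothetical helpers, sequences are assumed to be lists of integer tokens, and Python's random.Random stands in for the torch generator.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fractional_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions at which two encoded sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_prune(
    headers: Sequence[str],
    seqs: Sequence[Sequence[int]],
    seqid_th: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Sequence[int]]]:
    """Keep a subset in which no pair of sequences exceeds seqid_th identity.

    Sequences are visited in random order; one is kept only if it is
    sufficiently different from every sequence kept so far.
    """
    rng = rng or random.Random()
    order = list(range(len(seqs)))
    rng.shuffle(order)

    kept: List[int] = []
    for i in order:
        if all(fractional_identity(seqs[i], seqs[j]) <= seqid_th for j in kept):
            kept.append(i)

    kept.sort()  # restore the original input order
    return [headers[i] for i in kept], [seqs[i] for i in kept]


# Toy data: "a" and "b" are 75% identical, "c" is distant from both.
headers = ["a", "b", "c"]
seqs = [[0, 1, 2, 3], [0, 1, 2, 0], [3, 3, 3, 3]]
kept_headers, kept_seqs = greedy_prune(headers, seqs, 0.5, random.Random(0))
print(kept_headers)  # "c" plus exactly one of "a" or "b"
```

Which of two near-identical sequences survives depends on the visiting order, which is why a seeded generator is useful for reproducible results.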
function run_cobalt
run_cobalt(
headers: ndarray,
X: Tensor,
t1: float,
t2: float,
t3: float,
max_train: Optional[int] = None,
max_test: Optional[int] = None,
rnd_gen: Optional[Generator] = None
) → Tuple[ndarray, Tensor, ndarray, Tensor]
Runs the Cobalt algorithm to split the input MSA into training and test sets.
Args:
headers (np.ndarray): Array of sequence headers.
X (torch.Tensor): Encoded input MSA.
t1 (float): Maximum fractional identity allowed between the two sets of the split: no sequence in S has more than this fraction of its residues identical to any sequence in T.
t2 (float): Maximum fractional identity allowed between any pair of test sequences.
t3 (float): Maximum fractional identity allowed between any pair of training sequences.
max_train (Optional[int], optional): Maximum number of sequences in the training set. Defaults to None.
max_test (Optional[int], optional): Maximum number of sequences in the test set. Defaults to None.
rnd_gen (Optional[torch.Generator], optional): Random number generator. Defaults to None.
Returns:
Training and test sets as: (np.ndarray) Training headers, (torch.Tensor) Training sequences, (np.ndarray) Test headers, (torch.Tensor) Test sequences.
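The three thresholds t1, t2 and t3 can be checked on a finished split with a short pure-Python sketch. It does not call run_cobalt and is not part of adabmDCA: check_cobalt_split and its helpers are hypothetical names, and the training and test sequences are assumed to be lists of integer tokens of equal length.

```python
from itertools import combinations
from typing import Dict, Sequence


def fractional_identity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of aligned positions at which two encoded sequences agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def max_identity_within(seqs: Sequence[Sequence[int]]) -> float:
    """Largest pairwise identity inside one set (0.0 if fewer than two sequences)."""
    return max(
        (fractional_identity(a, b) for a, b in combinations(seqs, 2)),
        default=0.0,
    )


def max_identity_between(
    xs: Sequence[Sequence[int]], ys: Sequence[Sequence[int]]
) -> float:
    """Largest identity between a sequence of xs and a sequence of ys."""
    return max(
        (fractional_identity(a, b) for a in xs for b in ys),
        default=0.0,
    )


def check_cobalt_split(train, test, t1: float, t2: float, t3: float) -> Dict[str, bool]:
    """Report which of the three identity constraints the split satisfies."""
    return {
        "t1_train_vs_test": max_identity_between(test, train) <= t1,
        "t2_within_test": max_identity_within(test) <= t2,
        "t3_within_train": max_identity_within(train) <= t3,
    }


# Toy data: the closest train/test pair is 50% identical; no pair inside
# either set shares any position.
train = [[0, 1, 2, 3], [3, 2, 1, 0]]
test = [[0, 1, 0, 0], [2, 2, 2, 2]]

print(check_cobalt_split(train, test, t1=0.5, t2=0.8, t3=0.8))
print(check_cobalt_split(train, test, t1=0.4, t2=0.8, t3=0.8))
```

With t1=0.5 all three checks pass; lowering t1 to 0.4 makes only the train-versus-test check fail, since the closest pair across the two sets is 50% identical.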
This file was automatically generated via lazydocs.