`module` `fasta`

Global Variables

TOKENS_PROTEIN
TOKENS_RNA
TOKENS_DNA

`function` `get_tokens`

get_tokens(alphabet: str) → str

Converts the alphabet into the corresponding tokens.

Args:

alphabet (str): Alphabet to be used for the encoding. It can be either "protein", "rna", "dna" or a custom string of tokens.

Returns:

str: Tokens of the alphabet.
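The lookup described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the token strings in `ASSUMED_TOKENS` and the name `get_tokens_sketch` are assumptions, since this page lists `TOKENS_PROTEIN`, `TOKENS_RNA` and `TOKENS_DNA` without showing their values.

```python
# Assumed token strings (gap symbol first); the real TOKENS_* constants may differ.
ASSUMED_TOKENS = {
    "protein": "-ACDEFGHIKLMNPQRSTVWY",
    "rna": "-ACGU",
    "dna": "-ACGT",
}


def get_tokens_sketch(alphabet: str) -> str:
    """Return the tokens of a named alphabet, or the input itself if it is custom."""
    # Any string that is not a known alphabet name is treated as a custom token string.
    return ASSUMED_TOKENS.get(alphabet, alphabet)


rna_tokens = get_tokens_sketch("rna")
custom_tokens = get_tokens_sketch("ab01")
```

In this sketch a custom string is passed through unchanged, so the position of each character in the string becomes its numeric code.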

`function` `encode_sequence`

encode_sequence(sequence: str | ndarray | list, tokens: str) → ndarray

Encodes a sequence or a list of sequences into a numeric format.

Args:

sequence (str | np.ndarray | list): Input sequence.
tokens (str): Alphabet to be used for the encoding.

Returns:

np.ndarray: Encoded sequence or sequences.
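One plausible way to picture the encoding is to map each character to its index in `tokens`. The helper `encode_sketch` below is hypothetical and only illustrates the idea; it assumes that all sequences in a list have the same length (as in an alignment), so the result is a rectangular array.

```python
import numpy as np


def encode_sketch(sequence, tokens: str) -> np.ndarray:
    """Map each character to its position in `tokens` (illustrative only)."""
    letter_to_index = {letter: i for i, letter in enumerate(tokens)}
    if isinstance(sequence, str):
        # Single sequence: 1D array of token indices.
        return np.array([letter_to_index[c] for c in sequence], dtype=int)
    # Several sequences: one row of token indices per sequence.
    return np.array([[letter_to_index[c] for c in s] for s in sequence], dtype=int)


single = encode_sketch("-AC", "-ACGU")
batch = encode_sketch(["-AC", "GU-"], "-ACGU")
```

A character missing from `tokens` raises a `KeyError` in this sketch; whether the real function behaves the same way is not stated here.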

`function` `decode_sequence`

decode_sequence(sequence: list | ndarray | Tensor, tokens: str) → str | ndarray

Takes a numeric sequence or a list of numeric sequences as input and returns the corresponding string representation.

Args:

sequence (list | np.ndarray | torch.Tensor): Input sequences. Can be either a 1D or a 2D iterable.
tokens (str): Alphabet to be used for the decoding.

Returns:

str | np.ndarray: string or array of strings with the decoded input.
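Decoding is the inverse lookup: each integer selects a character from `tokens`. The sketch below uses a hypothetical helper `decode_sketch` and assumes the input can be converted with `np.asarray`; it is not the module's own code.

```python
import numpy as np


def decode_sketch(sequence, tokens: str):
    """Turn token indices back into strings (illustrative only)."""
    arr = np.asarray(sequence)
    if arr.ndim == 1:
        # 1D input: a single decoded string.
        return "".join(tokens[int(i)] for i in arr)
    # 2D input: one decoded string per row.
    return np.array(["".join(tokens[int(i)] for i in row) for row in arr])


one = decode_sketch([0, 1, 2], "-ACGU")
many = decode_sketch([[0, 1], [3, 4]], "-ACGU")
```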

`function` `import_from_fasta`

import_from_fasta(
    fasta_name: str | Path,
    tokens: str | None = None,
    filter_sequences: bool = False,
    remove_duplicates: bool = True
) → Tuple[ndarray, ndarray]

Import sequences from a fasta file. The following operations are performed:
- If 'tokens' is provided, encodes the sequences in numeric format.
- If 'filter_sequences' is True, removes the sequences whose tokens are not present in the alphabet.
- If 'remove_duplicates' is True, removes the duplicated sequences.

Args:

fasta_name (str | Path): Path to the fasta file.
tokens (str | None, optional): Alphabet to be used for the encoding. If provided, encodes the sequences in numeric format. Defaults to None.
filter_sequences (bool, optional): If True, removes the sequences whose tokens are not present in the alphabet. Defaults to False.
remove_duplicates (bool, optional): If True, removes the duplicated sequences. Defaults to True.

Raises:

RuntimeError: The file is not in fasta format.

Returns:

Tuple[np.ndarray, np.ndarray]: headers, sequences.
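The reading and de-duplication steps can be sketched with a small parser. `read_fasta_sketch` is a hypothetical helper that works on an iterable of lines rather than a file path; keeping the first occurrence of each duplicated sequence is an assumption, as is the exact condition that triggers the `RuntimeError`.

```python
import numpy as np


def read_fasta_sketch(lines, remove_duplicates: bool = True):
    """Parse fasta-formatted lines into (headers, sequences) arrays (illustrative only)."""
    headers, sequences = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            headers.append(line[1:])
            sequences.append("")
        elif not headers:
            # Sequence data before any header: not a fasta file.
            raise RuntimeError("The input is not in fasta format.")
        else:
            # Sequences may span several lines; join them.
            sequences[-1] += line
    headers, sequences = np.array(headers), np.array(sequences)
    if remove_duplicates:
        # Keep the first occurrence of each sequence, preserving file order.
        _, first = np.unique(sequences, return_index=True)
        keep = np.sort(first)
        headers, sequences = headers[keep], sequences[keep]
    return headers, sequences


example = [">a", "AC", "GU", ">b", "ACGU", ">c", "GGGG"]
headers, sequences = read_fasta_sketch(example)
```

Here records `a` and `b` carry the same sequence, so only `a` and `c` survive de-duplication.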

`function` `write_fasta`

write_fasta(
    fname: str,
    headers: ndarray,
    sequences: ndarray,
    numeric_input: bool = False,
    remove_gaps: bool = False,
    tokens: str = 'protein'
)

Generate a fasta file with the input sequences.

Args:

fname (str): Name of the output fasta file.
headers (np.ndarray): Array of sequences' headers.
sequences (np.ndarray): Array of sequences.
numeric_input (bool, optional): Whether the sequences are in numeric (encoded) format or not. Defaults to False.
remove_gaps (bool, optional): If True, removes the gap characters from the sequences before writing them. Defaults to False.
tokens (str, optional): Alphabet to be used for the encoding. Defaults to "protein".
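The output layout and the effect of `remove_gaps` can be illustrated with a short writer. `write_fasta_sketch` is a hypothetical helper that writes to an open text handle instead of a file name, handles only string (non-numeric) input, and assumes `-` is the gap symbol.

```python
import io


def write_fasta_sketch(handle, headers, sequences, remove_gaps: bool = False, gap: str = "-"):
    """Write one '>header' line followed by one sequence line per record (illustrative only)."""
    for header, seq in zip(headers, sequences):
        if remove_gaps:
            # Drop alignment gaps so the unaligned sequence is written.
            seq = seq.replace(gap, "")
        handle.write(f">{header}\n{seq}\n")


buffer = io.StringIO()
write_fasta_sketch(buffer, ["seq1", "seq2"], ["AC-GU", "A--GU"], remove_gaps=True)
fasta_text = buffer.getvalue()
```

Numeric input would first need to be decoded back to strings with the chosen alphabet, which is presumably why the real function accepts `tokens`.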

`function` `compute_weights`

compute_weights(
    data: ndarray | Tensor,
    th: float = 0.8,
    device: device = device(type='cpu'),
    dtype: dtype = torch.float32
) → Tensor

Computes the weight to be assigned to each sequence 's' in 'data' as 1 / n_clust, where 'n_clust' is the number of sequences that have a sequence identity with 's' >= th.

Args:

data (np.ndarray | torch.Tensor): Encoded input dataset.
th (float, optional): Sequence identity threshold for the clustering. Defaults to 0.8.
device (torch.device, optional): Device. Defaults to "cpu".
dtype (torch.dtype, optional): Data type. Defaults to torch.float32.

Returns:

torch.Tensor: Array with the weights of the sequences.
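The reweighting rule can be written out in a few lines of NumPy. This is a sketch under stated assumptions rather than the module's implementation: `compute_weights_sketch` is a hypothetical name, sequence identity is taken to be the fraction of matching positions, each sequence counts itself in 'n_clust', and the all-pairs comparison uses memory proportional to M² × L, so it only suits small datasets.

```python
import numpy as np


def compute_weights_sketch(data, th: float = 0.8) -> np.ndarray:
    """Weight each sequence by 1 / (number of sequences with identity >= th)."""
    data = np.asarray(data)
    # identity[i, j] = fraction of positions where sequences i and j agree.
    identity = (data[:, None, :] == data[None, :, :]).mean(axis=2)
    # Each sequence counts itself, so n_clust >= 1 and the division is safe.
    n_clust = (identity >= th).sum(axis=1)
    return 1.0 / n_clust


encoded = np.array([
    [0, 1, 2, 3],
    [0, 1, 2, 0],  # identity 0.75 with the first sequence
    [3, 3, 3, 3],  # identity at most 0.25 with the others
])
weights = compute_weights_sketch(encoded, th=0.7)
```

With a threshold of 0.7 the first two sequences fall in the same neighbourhood and each receives weight 0.5, while the third keeps weight 1.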

`function` `validate_alphabet`

validate_alphabet(sequences: ndarray, tokens: str)

Check if the chosen alphabet is compatible with the input sequences.

Args:

sequences (np.ndarray): Input sequences.
tokens (str): Alphabet to be used for the encoding.

Raises:

KeyError: The chosen alphabet is incompatible with the multiple sequence alignment.
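A compatibility check of this kind can be sketched by comparing the characters used in the sequences with the alphabet. `validate_alphabet_sketch` is a hypothetical helper and the error message is illustrative only.

```python
def validate_alphabet_sketch(sequences, tokens: str) -> None:
    """Raise KeyError if any sequence uses a character outside `tokens` (illustrative only)."""
    # Collect every distinct character that appears in the sequences.
    used = set("".join(sequences))
    missing = used - set(tokens)
    if missing:
        raise KeyError(f"Tokens {sorted(missing)} are not in the alphabet '{tokens}'.")


# Passes silently: every character is in the alphabet.
validate_alphabet_sketch(["AC-", "GU-"], "-ACGU")
```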

This file was automatically generated via lazydocs.
