adabmDCApy APIs

This section describes all the functions and classes available in the Python implementation of adabmDCA.

Submodules

adabmDCA.dataset module

class adabmDCA.dataset.DatasetDCA(path_data: str | Path, path_weights: str | Path | None = None, alphabet: str = 'protein', device: device = device(type='cpu'))

Bases: Dataset

get_effective_size() int

Returns the effective size (Meff) of the dataset.

Returns:

Effective size of the dataset.

Return type:

int

get_num_residues() int

Returns the number of residues (L) in the multi-sequence alignment.

Returns:

Length of the MSA.

Return type:

int

get_num_states() int

Returns the number of states (q) in the alphabet.

Returns:

Number of states.

Return type:

int

shuffle() None

Shuffles the dataset.
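
A minimal usage sketch of the class above; the file paths are placeholders, not part of the package:

    import torch
    from adabmDCA.dataset import DatasetDCA

    dataset = DatasetDCA(
        path_data="data/msa.fasta",      # hypothetical path to the input MSA
        path_weights=None,               # optional path to precomputed weights
        alphabet="protein",
        device=torch.device("cpu"),
    )

    print("Effective size (Meff):", dataset.get_effective_size())
    print("Number of residues (L):", dataset.get_num_residues())
    print("Number of states (q):", dataset.get_num_states())
    dataset.shuffle()  # shuffle the dataset in place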

adabmDCA.fasta_utils module

adabmDCA.fasta_utils.compute_weights(data: ndarray | Tensor, th: float = 0.8, device: device = device(type='cpu')) Tensor

Computes the weight to be assigned to each sequence ‘s’ in ‘data’ as 1 / n_clust, where ‘n_clust’ is the number of sequences that have a sequence identity with ‘s’ >= th.

Parameters:
  • data (np.ndarray | torch.Tensor) – Encoded input dataset.

  • th (float, optional) – Sequence identity threshold for the clustering. Defaults to 0.8.

  • device (torch.device, optional) – Device. Defaults to “cpu”.

Returns:

Array with the weights of the sequences.

Return type:

torch.Tensor

adabmDCA.fasta_utils.decode_sequence(sequence: ndarray, tokens: str) str | ndarray

Takes a numeric sequence or a list of sequences as input and returns the corresponding string representation.

Parameters:
  • sequence (np.ndarray) – Input sequences. Can be either a 1D or a 2D array.

  • tokens (str) – Alphabet to be used for the encoding.

Returns:

Decoded input.

Return type:

str | np.ndarray

adabmDCA.fasta_utils.encode_sequence(sequence: str | ndarray, tokens: str) ndarray

Encodes a sequence or a list of sequences into a numeric format.

Parameters:
  • sequence (str | np.ndarray) – Input sequence or list of sequences.

  • tokens (str) – Alphabet to be used for the encoding.

Returns:

Encoded sequence or sequences.

Return type:

np.ndarray

adabmDCA.fasta_utils.get_tokens(alphabet: str) str

Converts the alphabet into the corresponding tokens.

Parameters:

alphabet (str) – Alphabet to be used for the encoding. It can be either “protein”, “rna”, “dna” or a custom string of tokens.

Returns:

Tokens of the alphabet.

Return type:

str
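
A minimal encode/decode round trip, assuming every character of the input belongs to the chosen alphabet:

    from adabmDCA.fasta_utils import get_tokens, encode_sequence, decode_sequence

    tokens = get_tokens("protein")            # token string of the protein alphabet
    seq = "ACDEF-GHIKL"                       # hypothetical aligned sequence
    encoded = encode_sequence(seq, tokens)    # numeric encoding (np.ndarray)
    decoded = decode_sequence(encoded, tokens)
    print(decoded)                            # should match the input sequence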

adabmDCA.fasta_utils.import_clean_dataset(filein: str, tokens: str = 'protein') Tuple[ndarray, ndarray]

Imports data from a fasta file and removes all the sequences whose tokens are not present in a specified alphabet.

Parameters:
  • filein (str) – Input fasta.

  • tokens (str, optional) – Alphabet to be used for the encoding. Defaults to “protein”.

Returns:

headers, sequences.

Return type:

Tuple[np.ndarray, np.ndarray]

adabmDCA.fasta_utils.import_from_fasta(fasta_name: str | Path, tokens: str | None = None) Tuple[ndarray, ndarray]

Import data from a fasta file.

Parameters:
  • fasta_name (Union[str, Path]) – Path to the fasta file.

  • tokens (str, optional) – Alphabet to be used for the encoding. If provided, encodes the sequences in numeric format. Defaults to None.

Raises:

RuntimeError – The file is not in fasta format.

Returns:

headers, sequences.

Return type:

Tuple[np.ndarray, np.ndarray]

adabmDCA.fasta_utils.validate_alphabet(sequences: ndarray, tokens: str)

Check if the chosen alphabet is compatible with the input sequences.

Parameters:
  • sequences (np.ndarray) – Input sequences.

  • tokens (str) – Alphabet to be used for the encoding.

Raises:

KeyError – The chosen alphabet is incompatible with the Multi-Sequence Alignment.

adabmDCA.fasta_utils.write_fasta(fname: str, headers: ndarray, sequences: ndarray, numeric_input: bool = False, remove_gaps: bool = False, alphabet: str = 'protein')

Generate a fasta file with the input sequences.

Parameters:
  • fname (str) – Name of the output fasta file.

  • headers (np.ndarray) – Array of sequences’ headers.

  • sequences (np.ndarray) – Array of sequences.

  • numeric_input (bool, optional) – Whether the sequences are in numeric (encoded) format or not. Defaults to False.

  • remove_gaps (bool, optional) – If True, removes the gap from the alignment. Defaults to False.

  • alphabet (str, optional) – Alphabet to be used for the encoding. Defaults to “protein”.
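
A sketch of a typical load/weight/save workflow with these utilities; the file names are placeholders:

    import torch
    from adabmDCA.fasta_utils import (
        get_tokens, import_from_fasta, compute_weights, write_fasta,
    )

    tokens = get_tokens("protein")
    # Passing tokens makes import_from_fasta return numerically encoded sequences.
    headers, sequences = import_from_fasta("data/msa.fasta", tokens=tokens)

    # Weight each sequence as 1 / n_clust at an 80% identity threshold.
    weights = compute_weights(sequences, th=0.8, device=torch.device("cpu"))

    # Write the (numeric) sequences back to a new fasta file.
    write_fasta(
        "data/msa_copy.fasta",
        headers,
        sequences,
        numeric_input=True,
        remove_gaps=False,
        alphabet="protein",
    )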

adabmDCA.functional module

adabmDCA.functional.one_hot(x: Tensor, num_classes: int = -1, dtype: dtype = torch.float32)

A one-hot encoding function that is faster than the native PyTorch one; it works with torch.int32 inputs and returns a float tensor. Works only for 2D tensors.

Parameters:
  • x (torch.Tensor) – Input tensor to be one-hot encoded.

  • num_classes (int, optional) – Number of classes. If -1, the number of classes is inferred from the input tensor. Defaults to -1.

  • dtype (torch.dtype, optional) – Data type of the output tensor. Defaults to torch.float32.

Returns:

One-hot encoded tensor.

Return type:

torch.Tensor
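
A minimal usage sketch (shapes are chosen for illustration only):

    import torch
    from adabmDCA.functional import one_hot

    # 2D integer tensor of shape (number of sequences, L), with values in [0, q).
    x = torch.randint(0, 21, (128, 50), dtype=torch.int32)
    x_oh = one_hot(x, num_classes=21, dtype=torch.float32)
    print(x_oh.shape)  # torch.Size([128, 50, 21])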

adabmDCA.graph module

adabmDCA.graph.decimate_graph(pij: Tensor, params: Dict[str, Tensor], mask: Tensor, drate: float) Tuple[Dict[str, Tensor], Tensor]

Performs one decimation step and updates the parameters and mask.

Parameters:
  • pij (torch.Tensor) – Two-point marginal probability distribution.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • mask (torch.Tensor) – Mask.

  • drate (float) – Percentage of active couplings to be pruned at each decimation step.

Returns:

Updated parameters and mask.

Return type:

Tuple[Dict[str, torch.Tensor], torch.Tensor]

adabmDCA.graph.update_mask(mask: Tensor, Dkl: Tensor, drate: float) Tensor

Updates the mask by pruning the active couplings with the smallest Dkl; the number of couplings removed is determined by drate.

Parameters:
  • mask (torch.Tensor) – Mask.

  • Dkl (torch.Tensor) – Kullback-Leibler divergence matrix.

  • drate (float) – Percentage of active couplings to be pruned at each decimation step.

Returns:

Updated mask.

Return type:

torch.Tensor
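
A sketch of a single pruning step with update_mask. The (L, q, L, q) shapes below are an assumption made for illustration, matching the shape of the two-point frequency tensors used elsewhere in the package:

    import torch
    from adabmDCA.graph import update_mask

    L, q = 50, 21
    mask = torch.ones((L, q, L, q))            # assumed shape: fully connected graph
    Dkl = torch.rand((L, q, L, q))             # placeholder Kullback-Leibler divergence matrix
    mask = update_mask(mask, Dkl, drate=0.01)  # prune 1% of the active couplings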

adabmDCA.io module

adabmDCA.io.load_chains(fname: str, tokens: str, load_weights: bool = False) ndarray | Tuple[ndarray, ndarray]

Loads the sequences from a fasta file and returns the numeric-encoded version. If the sequences are weighted, the log-weights are also returned. If the sequences are not weighted, the log-weights are set to 0.

Parameters:
  • fname (str) – Path to the file containing the sequences.

  • tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.

  • load_weights (bool, optional) – If True, the log-weights are loaded and returned. Defaults to False.

Returns:

Numeric-encoded sequences and log-weights if load_weights is True.

Return type:

np.ndarray | Tuple[np.ndarray, np.ndarray]

adabmDCA.io.load_params(fname: str, tokens: str, device: device, dtype: dtype = torch.float32) Dict[str, Tensor]

Import the parameters of the model from a file.

Parameters:
  • fname (str) – Path of the file that stores the parameters.

  • tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.

  • device (torch.device) – Device where to store the parameters.

  • dtype (torch.dtype) – Data type of the parameters. Defaults to torch.float32.

Returns:

Parameters of the model.

Return type:

Dict[str, torch.Tensor]

adabmDCA.io.load_params_oldformat(fname: str, device: device, dtype: dtype = torch.float32) Dict[str, Tensor]

Import the parameters of the model from a file. Assumes the old DCA format.

Parameters:
  • fname (str) – Path of the file that stores the parameters.

  • device (torch.device) – Device where to store the parameters.

  • dtype (torch.dtype) – Data type of the parameters. Defaults to torch.float32.

Returns:

Parameters of the model.

Return type:

Dict[str, torch.Tensor]

adabmDCA.io.save_chains(fname: str, chains: Tensor, tokens: str, log_weights: Tensor | None = None)

Saves the chains in a fasta file.

Parameters:
  • fname (str) – Path to the file where to save the chains.

  • chains (torch.Tensor) – Chains.

  • tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.

  • log_weights (torch.Tensor, optional) – Log-weights of the chains. Defaults to None.

adabmDCA.io.save_params(fname: str, params: Dict[str, Tensor], mask: Tensor, tokens: str) None

Saves the parameters of the model in a file.

Parameters:
  • fname (str) – Path to the file where to save the parameters.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • mask (torch.Tensor) – Mask of the coupling matrix that determines which are the non-zero entries.

  • tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.

adabmDCA.io.save_params_oldformat(fname: str, params: Dict[str, Tensor], mask: Tensor) None

Saves the parameters of the model in a file. Assumes the old DCA format.

Parameters:
  • fname (str) – Path to the file where to save the parameters.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • mask (torch.Tensor) – Mask of the coupling matrix that determines which are the non-zero entries.
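
A sketch of loading parameters and saving them back, assuming a parameter file produced by a previous run; the path and the values of L and q are placeholders and must match the stored model:

    import torch
    from adabmDCA.fasta_utils import get_tokens
    from adabmDCA.io import load_params, save_params
    from adabmDCA.utils import get_mask_save

    device = torch.device("cpu")
    tokens = get_tokens("protein")

    params = load_params("results/params.dat", tokens=tokens, device=device)

    # Keep only the upper-triangular part of the coupling matrix when saving.
    L, q = 50, 21
    mask = get_mask_save(L, q, device=device)
    save_params("results/params_copy.dat", params, mask=mask, tokens=tokens)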

adabmDCA.parser module

adabmDCA.parser.add_args_contacts(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_dca(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_dms(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_eaDCA(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_edDCA(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_energies(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_sample(parser: ArgumentParser) ArgumentParser
adabmDCA.parser.add_args_train(parser: ArgumentParser) ArgumentParser
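
These helpers attach the command-line options of the corresponding adabmDCA scripts to an existing argparse.ArgumentParser. A minimal sketch:

    import argparse
    from adabmDCA.parser import add_args_train

    parser = argparse.ArgumentParser(description="Train a DCA model")
    parser = add_args_train(parser)   # adds the training-related arguments
    parser.print_help()               # inspect the available flags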

adabmDCA.plot module

adabmDCA.plot.plot_PCA(fig: figure, data1: array, data2: array, dim1: int, dim2: int, labels: List[str], title: str)
adabmDCA.plot.plot_autocorrelation(ax: Axes, checkpoints: array, autocorr: array, gen_seqid: array, data_seqid: array)
adabmDCA.plot.plot_hist(ax, data1, data2, color, dim, labels, orientation='vertical')
adabmDCA.plot.plot_pearson_sampling(ax: Axes, checkpoints: array, pearsons: array, pearson_training: array | None = None)
adabmDCA.plot.plot_scatter_correlations(ax: Axes, Cij_data: array, Cij_gen: array, Cijk_data: array, Cijk_gen: array, pearson_Cij: float, pearson_Cijk: float) Axes
adabmDCA.plot.plot_scatter_labels(ax, data1, data2, dim1, dim2, labels)

adabmDCA.resampling module

adabmDCA.resampling.compute_mixing_time(sampler: Callable, data: Tensor, params: Dict[str, Tensor], n_max_sweeps: int, beta: float) Dict[str, list]

Computes the mixing time using the t and t/2 method. The sampling will halt when the mixing time is reached or the limit of n_max_sweeps sweeps is reached.

Parameters:
  • sampler (Callable) – Sampling function.

  • data (torch.Tensor) – Initial data.

  • params (Dict[str, torch.Tensor]) – Parameters for the sampling.

  • n_max_sweeps (int) – Maximum number of sweeps.

  • beta (float) – Inverse temperature for the sampling.

Returns:

Results of the mixing time analysis.

Return type:

Dict[str, list]

adabmDCA.sampling module

adabmDCA.sampling.get_deltaE(idx: int, chain: Tensor, residue_old: Tensor, residue_new: Tensor, params: Dict[str, Tensor], L: int, q: int) float
adabmDCA.sampling.get_sampler(sampling_method: str) Callable

Returns the sampling function corresponding to the chosen method.

Parameters:

sampling_method (str) – String indicating the sampling method. Choose between ‘metropolis’ and ‘gibbs’.

Raises:

KeyError – Unknown sampling method.

Returns:

Sampling function.

Return type:

Callable

adabmDCA.sampling.gibbs_sampling(chains: Tensor, params: Dict[str, Tensor], nsweeps: int, beta: float = 1.0) Tensor

Gibbs sampling.

Parameters:
  • chains (torch.Tensor) – Initial chains.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • nsweeps (int) – Number of sweeps.

  • beta (float, optional) – Inverse temperature. Defaults to 1.0.

Returns:

Updated chains.

Return type:

torch.Tensor

adabmDCA.sampling.metropolis(chains: Tensor, params: Dict[str, Tensor], nsweeps: int, beta: float = 1.0) Tensor

Metropolis sampling.

Parameters:
  • chains (torch.Tensor) – One-hot encoded sequences.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • nsweeps (int) – Number of sweeps to be performed.

  • beta (float, optional) – Inverse temperature. Defaults to 1.0.

Returns:

Updated chains.

Return type:

torch.Tensor

adabmDCA.sampling.metropolis_sweep(chains: Tensor, params: Dict[str, Tensor], beta: float) Tensor

Performs a Metropolis sweep over the chains.

Parameters:
  • chains (torch.Tensor) – One-hot encoded sequences.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • beta (float) – Inverse temperature.

Returns:

Updated chains.

Return type:

torch.Tensor
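
A minimal end-to-end sampling sketch. The random MSA below is a placeholder standing in for real data, and the model is initialized as a profile model from the single-site frequencies:

    import torch
    from adabmDCA.functional import one_hot
    from adabmDCA.sampling import get_sampler
    from adabmDCA.stats import get_freq_single_point
    from adabmDCA.utils import init_chains, init_parameters

    device = torch.device("cpu")
    L, q, n_chains = 50, 21, 1000

    # Placeholder encoded MSA, turned into one-hot format.
    msa = torch.randint(0, q, (200, L), dtype=torch.int32)
    data = one_hot(msa, num_classes=q)

    fi = get_freq_single_point(data, weights=None, pseudo_count=1e-3)
    params = init_parameters(fi)                       # profile-model initialization
    chains = init_chains(n_chains, L, q, device, fi=fi)

    sampler = get_sampler("gibbs")                     # or "metropolis"
    chains = sampler(chains=chains, params=params, nsweeps=10)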

adabmDCA.statmech module

adabmDCA.statmech.compute_energy(X: Tensor, params: Dict[str, Tensor]) Tensor

Compute the DCA energy of the sequences in X.

Parameters:
  • X (torch.Tensor) – Sequences in one-hot encoding format.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

Returns:

DCA Energy of the sequences.

Return type:

torch.Tensor

adabmDCA.statmech.compute_entropy(chains: Tensor, params: Dict[str, Tensor], logZ: float) float

Compute the entropy of the DCA model.

Parameters:
  • chains (torch.Tensor) – Chains that are supposed to be an equilibrium realization of the model.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • logZ (float) – Log-partition function of the model.

Returns:

Entropy of the model.

Return type:

float

adabmDCA.statmech.compute_logZ_exact(all_states: Tensor, params: Dict[str, Tensor]) float

Compute the log-partition function of the model.

Parameters:
  • all_states (torch.Tensor) – All possible states of the system.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

Returns:

Log-partition function of the model.

Return type:

float

adabmDCA.statmech.compute_log_likelihood(fi: Tensor, fij: Tensor, params: Dict[str, Tensor], logZ: float) float

Compute the log-likelihood of the model.

Parameters:
  • fi (torch.Tensor) – Single-site frequencies of the data.

  • fij (torch.Tensor) – Two-site frequencies of the data.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • logZ (float) – Log-partition function of the model.

Returns:

Log-likelihood of the model.

Return type:

float

adabmDCA.statmech.enumerate_states(L: int, q: int, device: device = device(type='cpu')) Tensor

Enumerate all possible states of a system of L sites and q states.

Parameters:
  • L (int) – Number of sites.

  • q (int) – Number of states.

  • device (torch.device, optional) – Device to store the states. Defaults to “cpu”.

Returns:

All possible states.

Return type:

torch.Tensor
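
For a system small enough to enumerate, the exact log-partition function can be computed as sketched below; the random single-site frequencies and their assumed (L, q) shape are placeholders used only to obtain a runnable example:

    import torch
    from adabmDCA.statmech import enumerate_states, compute_logZ_exact
    from adabmDCA.utils import init_parameters

    device = torch.device("cpu")
    L, q = 4, 3                                   # q**L = 81 states, feasible to enumerate

    all_states = enumerate_states(L, q, device=device)

    # Placeholder single-site frequencies (assumed shape (L, q)) to initialize the model.
    fi = torch.softmax(torch.randn(L, q, device=device), dim=-1)
    params = init_parameters(fi)

    logZ = compute_logZ_exact(all_states, params)
    print("logZ =", logZ)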

adabmDCA.statmech.update_weights_AIS(prev_params: Dict[str, Tensor], curr_params: Dict[str, Tensor], chains: Tensor, log_weights: Tensor) Tuple[Tensor, Tensor]

Update the weights used during the trajectory Annealed Importance Sampling (AIS) algorithm.

Parameters:
  • prev_params (Dict[str, torch.Tensor]) – Params at time t-1.

  • curr_params (Dict[str, torch.Tensor]) – Params at time t.

  • chains (torch.Tensor) – Chains at time t-1.

  • log_weights (torch.Tensor) – Log-weights at time t-1.

Returns:

Log-weights and chains at time t.

Return type:

Tuple[torch.Tensor, torch.Tensor]

adabmDCA.stats module

adabmDCA.stats.extract_Cij_from_freq(fij: Tensor, pij: Tensor, fi: Tensor, pi: Tensor, mask: Tensor | None = None) Tuple[float, float]

Extracts the lower triangular part of the covariance matrices of the data and chains starting from the frequencies.

Parameters:
  • fij (torch.Tensor) – Two-point frequencies of the data.

  • pij (torch.Tensor) – Two-point frequencies of the chains.

  • fi (torch.Tensor) – Single-point frequencies of the data.

  • pi (torch.Tensor) – Single-point frequencies of the chains.

  • mask (torch.Tensor, optional) – Mask for comparing just a subset of the couplings. Defaults to None.

Returns:

Extracted two-point frequencies of the data and chains.

Return type:

Tuple[float, float]

adabmDCA.stats.extract_Cij_from_seqs(data: Tensor, chains: Tensor, weights: Tensor | None = None, pseudo_count: float = 0.0, mask: Tensor | None = None) Tuple[float, float]

Extracts the lower triangular part of the covariance matrices of the data and chains starting from the sequences.

Parameters:
  • data (torch.Tensor) – Data sequences.

  • chains (torch.Tensor) – Chain sequences.

  • weights (torch.Tensor, optional) – Importance weights of the sequences. Defaults to None.

  • pseudo_count (float, optional) – Pseudo count. Defaults to 0.0.

  • mask (torch.Tensor, optional) – Mask for comparing just a subset of the couplings. Defaults to None.

Returns:

Two-point frequencies of the data and chains.

Return type:

Tuple[float, float]

adabmDCA.stats.generate_unique_triplets(L: int, ntriplets: int, device: device = device(type='cpu')) Tensor

Generates a set of unique triplets of positions. Used to compute the 3-points statistics.

Parameters:
  • L (int) – Length of the sequences.

  • ntriplets (int) – Number of triplets to be generated.

  • device (torch.device, optional) – Device to perform computations on. Defaults to “cpu”.

Returns:

Tensor of shape (ntriplets, 3) containing the indices of the triplets.

Return type:

torch.Tensor

adabmDCA.stats.get_correlation_two_points(fij: Tensor, pij: Tensor, fi: Tensor, pi: Tensor, mask: Tensor | None = None) Tuple[float, float]

Computes the Pearson coefficient and the slope between the two-point frequencies of data and chains.

Parameters:
  • fij (torch.Tensor) – Two-point frequencies of the data.

  • pij (torch.Tensor) – Two-point frequencies of the chains.

  • fi (torch.Tensor) – Single-point frequencies of the data.

  • pi (torch.Tensor) – Single-point frequencies of the chains.

  • mask (torch.Tensor, optional) – Mask to select the couplings to use for the correlation coefficient. Defaults to None.

Returns:

Pearson correlation coefficient of the two-site statistics and slope of the interpolating line.

Return type:

Tuple[float, float]

adabmDCA.stats.get_covariance_matrix(data: Tensor, weights: Tensor, pseudo_count: float = 0.0) Tensor

Computes the weighted covariance matrix of the input multiple sequence alignment (MSA).

Parameters:
  • data (torch.Tensor) – Input MSA in one-hot variables.

  • weights (torch.Tensor) – Importance weights of the sequences.

  • pseudo_count (float, optional) – Pseudo count. Defaults to 0.0.

Returns:

Covariance matrix.

Return type:

torch.Tensor

adabmDCA.stats.get_freq_single_point(data: Tensor, weights: Tensor | None, pseudo_count: float = 0.0) Tensor

Computes the single point frequencies of the input MSA.

Parameters:
  • data (torch.Tensor) – One-hot encoded data array.

  • weights (torch.Tensor | None) – Weights of the sequences.

  • pseudo_count (float, optional) – Pseudo count to be added to the frequencies. Defaults to 0.0.

Raises:

ValueError – If the input data is not a 3D tensor.

Returns:

Single point frequencies.

Return type:

torch.Tensor

adabmDCA.stats.get_freq_three_points(data: Tensor, weights: Tensor, ntriplets: int, device: device = device(type='cpu')) Tensor

Computes the 3-body statistics of the input MSA.

Parameters:
  • data (torch.Tensor) – Input MSA in one-hot encoding.

  • weights (torch.Tensor) – Importance weights for the sequences.

  • ntriplets (int) – Number of triplets to test.

  • device (torch.device, optional) – Device to perform computations on. Defaults to “cpu”.

Returns:

3-points connected correlation for ntriplets randomly extracted triplets.

Return type:

torch.Tensor

adabmDCA.stats.get_freq_two_points(data: Tensor, weights: Tensor | None, pseudo_count: float = 0.0) Tensor

Computes the 2-points statistics of the input MSA.

Parameters:
  • data (torch.Tensor) – One-hot encoded data array.

  • weights (torch.Tensor | None) – Array of weights to assign to the sequences.

  • pseudo_count (float, optional) – Pseudo count for the single- and two-point statistics. Acts as a regularization. Defaults to 0.0.

Raises:

ValueError – If the input data is not a 3D tensor.

Returns:

Matrix of two-point frequencies of shape (L, q, L, q).

Return type:

torch.Tensor
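
A sketch of comparing data and model statistics: one- and two-point frequencies are computed for both, then summarized by the Pearson coefficient. The random one-hot tensors stand in for real data and model samples:

    import torch
    from adabmDCA.functional import one_hot
    from adabmDCA.stats import (
        get_freq_single_point, get_freq_two_points, get_correlation_two_points,
    )

    L, q = 50, 21
    data = one_hot(torch.randint(0, q, (500, L), dtype=torch.int32), num_classes=q)
    chains = one_hot(torch.randint(0, q, (500, L), dtype=torch.int32), num_classes=q)

    fi = get_freq_single_point(data, weights=None, pseudo_count=1e-3)
    fij = get_freq_two_points(data, weights=None, pseudo_count=1e-3)
    pi = get_freq_single_point(chains, weights=None)
    pij = get_freq_two_points(chains, weights=None)

    pearson, slope = get_correlation_two_points(fij, pij, fi, pi)
    print(f"Pearson: {pearson:.3f}, slope: {slope:.3f}")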

adabmDCA.stats.get_slope(x, y)

adabmDCA.training module

adabmDCA.training.train_graph(sampler: Callable, chains: Tensor, mask: Tensor, fi: Tensor, fij: Tensor, params: Dict[str, Tensor], nsweeps: int, lr: float, max_epochs: int, target_pearson: float, tokens: str = 'protein', check_slope: bool = False, log_weights: Tensor | None = None, file_paths: Dict[str, Path] | None = None, progress_bar: bool = True, device: device = device(type='cpu')) Tuple[Tensor, Dict[str, Tensor], Tensor]

Trains the model on a given graph until the target Pearson correlation is reached or the maximum number of epochs is exceeded.

Parameters:
  • sampler (Callable) – Sampling function.

  • chains (torch.Tensor) – Markov chains simulated with the model.

  • mask (torch.Tensor) – Mask encoding the sparse graph.

  • fi (torch.Tensor) – Single-point frequencies of the data.

  • fij (torch.Tensor) – Two-point frequencies of the data.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • nsweeps (int) – Number of Gibbs steps for each gradient estimation.

  • lr (float) – Learning rate.

  • max_epochs (int) – Maximum number of gradient updates to be done.

  • target_pearson (float) – Target Pearson coefficient.

  • tokens (str, optional) – Alphabet to be used for the encoding. Defaults to “protein”.

  • log_weights (torch.Tensor, optional) – Log-weights used for the online computation of the log-likelihood. Defaults to None.

  • check_slope (bool, optional) – Whether to take into account the slope for the convergence criterion or not. Defaults to False.

  • file_paths (Dict[str, Path], optional) – Dictionary containing the paths where to save log, params, and chains. Defaults to None.

  • progress_bar (bool, optional) – Whether to display a progress bar or not. Defaults to True.

  • device (torch.device, optional) – Device to be used. Defaults to “cpu”.

Returns:

Updated chains and parameters, log-weights for the log-likelihood computation.

Return type:

Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]

adabmDCA.training.update(sampler: Callable, chains: Tensor, fi: Tensor, fij: Tensor, pi: Tensor, pij: Tensor, params: Dict[str, Tensor], mask: Tensor, lr: float, nsweeps: int) Tuple[Tensor, Dict[str, Tensor]]

Updates the parameters of the model and the Markov chains.

Parameters:
  • sampler (Callable) – Sampling function.

  • chains (torch.Tensor) – Markov chains simulated with the model.

  • fi (torch.Tensor) – Single-point frequencies of the data.

  • fij (torch.Tensor) – Two-point frequencies of the data.

  • pi (torch.Tensor) – Single-point marginals of the model.

  • pij (torch.Tensor) – Two-point marginals of the model.

  • params (Dict[str, torch.Tensor]) – Parameters of the model.

  • mask (torch.Tensor) – Mask of the interaction graph.

  • lr (float) – Learning rate.

  • nsweeps (int) – Number of Monte Carlo updates.

Returns:

Updated chains and parameters.

Return type:

Tuple[torch.Tensor, Dict[str, torch.Tensor]]
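
The sketch below strings the pieces together into a small training run. It is illustrative only: the data are random placeholders, and the fully connected mask of shape (L, q, L, q) is an assumption about the expected mask format:

    import torch
    from adabmDCA.functional import one_hot
    from adabmDCA.sampling import get_sampler
    from adabmDCA.stats import get_freq_single_point, get_freq_two_points
    from adabmDCA.training import train_graph
    from adabmDCA.utils import init_chains, init_parameters

    device = torch.device("cpu")
    L, q = 30, 21

    # Placeholder MSA; in practice use DatasetDCA or import_from_fasta.
    data = one_hot(torch.randint(0, q, (200, L), dtype=torch.int32), num_classes=q)
    fi = get_freq_single_point(data, weights=None, pseudo_count=1e-3)
    fij = get_freq_two_points(data, weights=None, pseudo_count=1e-3)

    params = init_parameters(fi)
    chains = init_chains(1000, L, q, device, fi=fi)
    mask = torch.ones((L, q, L, q), device=device)   # assumed shape, fully connected graph

    chains, params, log_weights = train_graph(
        sampler=get_sampler("gibbs"),
        chains=chains,
        mask=mask,
        fi=fi,
        fij=fij,
        params=params,
        nsweeps=10,
        lr=0.05,
        max_epochs=100,
        target_pearson=0.95,
        device=device,
    )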

adabmDCA.utils module

adabmDCA.utils.get_device(device: str) device

Returns the device where to store the tensors.

Parameters:

device (str) – Device to be used.

Returns:

Device.

Return type:

torch.device

adabmDCA.utils.get_mask_save(L: int, q: int, device: device) Tensor

Returns the mask to save the upper-triangular part of the coupling matrix.

Parameters:
  • L (int) – Length of the MSA.

  • q (int) – Number of values that each residue can assume.

  • device (torch.device) – Device where to store the mask.

Returns:

Mask.

Return type:

torch.Tensor

adabmDCA.utils.init_chains(num_chains: int, L: int, q: int, device: device, fi: Tensor | None = None) Tensor

Initialize the chains of the DCA model. If ‘fi’ is provided, the chains are sampled from the profile model, otherwise they are sampled uniformly at random.

Parameters:
  • num_chains (int) – Number of parallel chains.

  • L (int) – Length of the MSA.

  • q (int) – Number of values that each residue can assume.

  • device (torch.device) – Device where to store the chains.

  • fi (torch.Tensor, optional) – Single-point frequencies. Defaults to None.

Returns:

Initialized parallel chains in one-hot encoding format.

Return type:

torch.Tensor

adabmDCA.utils.init_parameters(fi: Tensor) Dict[str, Tensor]

Initialize the parameters of the DCA model.

Parameters:

fi (torch.Tensor) – Single-point frequencies of the data.

Returns:

Parameters of the model.

Return type:

Dict[str, torch.Tensor]

adabmDCA.utils.resample_sequences(data: Tensor, weights: Tensor, nextract: int) Tensor

Extracts nextract sequences from data with replacement according to the weights.

Parameters:
  • data (torch.Tensor) – Data array.

  • weights (torch.Tensor) – Weights of the sequences.

  • nextract (int) – Number of sequences to be extracted.

Returns:

Extracted sequences.

Return type:

torch.Tensor
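
A minimal sketch of weighted resampling; the data tensor and uniform weights are placeholders:

    import torch
    from adabmDCA.utils import get_device, resample_sequences

    device = get_device("cpu")                 # returns a torch.device

    data = torch.randint(0, 21, (100, 50))     # placeholder dataset of 100 sequences
    weights = torch.ones(100) / 100            # uniform importance weights
    subset = resample_sequences(data, weights, nextract=20)
    print(subset.shape)                        # torch.Size([20, 50])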

adabmDCA.utils.set_zerosum_gauge(params: Dict[str, Tensor]) Dict[str, Tensor]

Sets the zero-sum gauge on the coupling matrix.

Parameters:

params (Dict[str, torch.Tensor]) – Parameters of the model.

Returns:

Parameters with fixed gauge.

Return type:

Dict[str, torch.Tensor]

Module contents