adabmDCApy APIs
This section describes all the functions available in the Python implementation of adabmDCA.
Submodules
adabmDCA.dataset module
- class adabmDCA.dataset.DatasetDCA(path_data: str | Path, path_weights: str | Path | None = None, alphabet: str = 'protein', device: device = device(type='cpu'))
Bases: Dataset
- get_effective_size() int
Returns the effective size (Meff) of the dataset.
- Returns:
Effective size of the dataset.
- Return type:
int
- get_num_residues() int
Returns the number of residues (L) in the multi-sequence alignment.
- Returns:
Length of the MSA.
- Return type:
int
- get_num_states() int
Returns the number of states (q) in the alphabet.
- Returns:
Number of states.
- Return type:
int
- shuffle() None
Shuffles the dataset.
adabmDCA.fasta_utils module
- adabmDCA.fasta_utils.compute_weights(data: ndarray | Tensor, th: float = 0.8, device: device = device(type='cpu')) Tensor
Computes the weight to be assigned to each sequence ‘s’ in ‘data’ as 1 / n_clust, where ‘n_clust’ is the number of sequences that have a sequence identity with ‘s’ >= th.
- Parameters:
data (np.ndarray | torch.Tensor) – Encoded input dataset.
th (float, optional) – Sequence identity threshold for the clustering. Defaults to 0.8.
device (torch.device, optional) – Device. Defaults to “cpu”.
- Returns:
Array with the weights of the sequences.
- Return type:
torch.Tensor
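The weighting rule above can be sketched as follows. This is a hypothetical re-implementation (compute_weights_sketch is not part of adabmDCA), assuming data is a numeric-encoded MSA of shape (M, L) and that every sequence counts itself in n_clust. It builds the full M × M identity matrix, so memory grows as M²L; a production version would process the alignment in blocks.

```python
import torch


def compute_weights_sketch(data, th: float = 0.8) -> torch.Tensor:
    """Hypothetical 1 / n_clust weighting for a numeric-encoded MSA of shape (M, L)."""
    data = torch.as_tensor(data)
    # Fraction of identical positions for every pair of sequences, shape (M, M).
    identity = (data[:, None, :] == data[None, :, :]).float().mean(dim=-1)
    # n_clust counts all sequences (itself included) with identity >= th.
    n_clust = (identity >= th).sum(dim=1)
    return 1.0 / n_clust.float()


msa = torch.tensor([[0, 1, 2, 3],
                    [0, 1, 2, 3],
                    [3, 2, 1, 0]])
weights = compute_weights_sketch(msa, th=0.8)
print(weights)  # tensor([0.5000, 0.5000, 1.0000])
```

Summing the weights gives the standard definition of the effective number of sequences Meff (here 2.0).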
- adabmDCA.fasta_utils.decode_sequence(sequence: ndarray, tokens: str) str | ndarray
Takes a numeric sequence, or a list of sequences, as input and returns the corresponding string encoding.
- Parameters:
sequence (np.ndarray) – Input sequences. Can be either a 1D or a 2D array.
tokens (str) – Alphabet to be used for the decoding.
- Returns:
Decoded input.
- Return type:
str | np.ndarray
- adabmDCA.fasta_utils.encode_sequence(sequence: str | ndarray, tokens: str) ndarray
Encodes a sequence or a list of sequences into a numeric format.
- Parameters:
sequence (str | np.ndarray) – Input sequence or sequences.
tokens (str) – Alphabet to be used for the encoding.
- Returns:
Encoded sequence or sequences.
- Return type:
np.ndarray
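A minimal sketch of the encoding and decoding round trip, assuming a protein alphabet ordered as "-ACDEFGHIKLMNPQRSTVWY" with the gap as token 0; the exact token string returned by get_tokens may differ. encode_sketch and decode_sketch are hypothetical helpers, not library functions.

```python
import numpy as np

TOKENS = "-ACDEFGHIKLMNPQRSTVWY"  # assumed protein alphabet, gap first


def encode_sketch(sequence, tokens=TOKENS):
    """Map a string (or a list of equal-length strings) to integer indices."""
    lookup = {t: i for i, t in enumerate(tokens)}
    if isinstance(sequence, str):
        # Unknown characters raise KeyError.
        return np.array([lookup[c] for c in sequence], dtype=np.int32)
    return np.array([encode_sketch(s, tokens) for s in sequence])


def decode_sketch(sequence, tokens=TOKENS):
    """Map a 1D index array to a string, or a 2D array to an array of strings."""
    sequence = np.asarray(sequence)
    if sequence.ndim == 1:
        return "".join(tokens[i] for i in sequence)
    return np.array([decode_sketch(s, tokens) for s in sequence])


print(encode_sketch("AC-"))                              # [1 2 0]
print(decode_sketch(np.array([[1, 2, 0], [0, 0, 1]])))   # ['AC-' '--A']
```

An unknown character raises KeyError in this sketch, mirroring the error documented for validate_alphabet.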
- adabmDCA.fasta_utils.get_tokens(alphabet: str) str
Converts the alphabet into the corresponding tokens.
- Parameters:
alphabet (str) – Alphabet to be used for the encoding. It can be either “protein”, “rna”, “dna” or a custom string of tokens.
- Returns:
Tokens of the alphabet.
- Return type:
str
- adabmDCA.fasta_utils.import_clean_dataset(filein: str, tokens: str = 'protein') Tuple[ndarray, ndarray]
Imports data from a fasta file and removes all the sequences whose tokens are not present in a specified alphabet.
- Parameters:
filein (str) – Input fasta.
tokens (str, optional) – Alphabet to be used for the encoding. Defaults to “protein”.
- Returns:
headers, sequences.
- Return type:
Tuple[np.ndarray, np.ndarray]
- adabmDCA.fasta_utils.import_from_fasta(fasta_name: str | Path, tokens: str | None = None) Tuple[ndarray, ndarray]
Import data from a fasta file.
- Parameters:
fasta_name (Union[str, Path]) – Path to the fasta file.
tokens (str, optional) – Alphabet to be used for the encoding. If provided, the sequences are encoded in numeric format. Defaults to None.
- Raises:
RuntimeError – The file is not in fasta format.
- Returns:
headers, sequences.
- Return type:
Tuple[np.ndarray, np.ndarray]
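The FASTA layout read by import_from_fasta can be illustrated with a small parser. This is a hypothetical sketch working on an in-memory string (parse_fasta_sketch is not a library function); it concatenates multi-line sequences and raises RuntimeError when sequence data appears before the first header, which is one assumed way of detecting a non-FASTA file.

```python
from typing import Tuple

import numpy as np


def parse_fasta_sketch(text: str) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical FASTA parser returning (headers, sequences)."""
    headers, sequences = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers.append(line[1:])
            sequences.append("")
        elif not headers:
            # Sequence data before any header: not a valid FASTA file.
            raise RuntimeError("The file is not in fasta format.")
        else:
            sequences[-1] += line  # multi-line sequences are concatenated
    return np.array(headers), np.array(sequences)


headers, seqs = parse_fasta_sketch(">seq1\nAC-\nDE\n>seq2\nKLM\n")
print(headers, seqs)  # ['seq1' 'seq2'] ['AC-DE' 'KLM']
```

To read a file, pass the file's text, for example `Path("alignment.fasta").read_text()` with a hypothetical file name.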
- adabmDCA.fasta_utils.validate_alphabet(sequences: ndarray, tokens: str)
Check if the chosen alphabet is compatible with the input sequences.
- Parameters:
sequences (np.ndarray) – Input sequences.
tokens (str) – Alphabet to be used for the encoding.
- Raises:
KeyError – The chosen alphabet is incompatible with the Multi-Sequence Alignment.
- adabmDCA.fasta_utils.write_fasta(fname: str, headers: ndarray, sequences: ndarray, numeric_input: bool = False, remove_gaps: bool = False, alphabet: str = 'protein')
Generate a fasta file with the input sequences.
- Parameters:
fname (str) – Name of the output fasta file.
headers (np.ndarray) – Array of sequences’ headers.
sequences (np.ndarray) – Array of sequences.
numeric_input (bool, optional) – Whether the sequences are in numeric (encoded) format or not. Defaults to False.
remove_gaps (bool, optional) – If True, removes the gaps from the sequences. Defaults to False.
alphabet (str, optional) – Alphabet to be used for the encoding. Defaults to “protein”.
adabmDCA.functional module
- adabmDCA.functional.one_hot(x: Tensor, num_classes: int = -1, dtype: dtype = torch.float32)
A one-hot encoding function, faster than the PyTorch built-in, that works with torch.int32 inputs and returns a float Tensor. Works only for 2D tensors.
- Parameters:
x (torch.Tensor) – Input tensor to be one-hot encoded.
num_classes (int, optional) – Number of classes. If -1, the number of classes is inferred from the input tensor. Defaults to -1.
dtype (torch.dtype, optional) – Data type of the output tensor. Defaults to torch.float32.
- Returns:
One-hot encoded tensor.
- Return type:
torch.Tensor
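One way to build a one-hot encoding for a 2D integer tensor is a single scatter_ into a preallocated output. The sketch below (one_hot_sketch, hypothetical) follows the documented signature but is not the library's implementation, and no claim is made about its relative speed.

```python
import torch


def one_hot_sketch(x: torch.Tensor, num_classes: int = -1, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """One-hot encode a 2D integer tensor of shape (N, L) into (N, L, num_classes)."""
    if x.dim() != 2:
        raise ValueError("Expected a 2D tensor of shape (N, L).")
    if num_classes < 0:
        # Infer the number of classes from the largest index present.
        num_classes = int(x.max().item()) + 1
    out = torch.zeros(*x.shape, num_classes, dtype=dtype, device=x.device)
    # scatter_ needs int64 indices; write 1 at the position given by each entry.
    out.scatter_(-1, x.long().unsqueeze(-1), 1.0)
    return out


x = torch.tensor([[0, 2], [1, 0]], dtype=torch.int32)
print(one_hot_sketch(x).shape)  # torch.Size([2, 2, 3])
```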
adabmDCA.graph module
- adabmDCA.graph.decimate_graph(pij: Tensor, params: Dict[str, Tensor], mask: Tensor, drate: float) Tuple[Dict[str, Tensor], Tensor]
Performs one decimation step and updates the parameters and mask.
- Parameters:
pij (torch.Tensor) – Two-point marginal probability distribution.
params (Dict[str, torch.Tensor]) – Parameters of the model.
mask (torch.Tensor) – Mask encoding the sparse graph.
drate (float) – Percentage of active couplings to be pruned at each decimation step.
- Returns:
Updated parameters and mask.
- Return type:
Tuple[Dict[str, torch.Tensor], torch.Tensor]
- adabmDCA.graph.update_mask(mask: Tensor, Dkl: Tensor, drate: float) Tensor
Updates the mask by removing the n_remove active couplings with the smallest Dkl, where n_remove is set by drate.
- Parameters:
mask (torch.Tensor) – Mask encoding the sparse graph.
Dkl (torch.Tensor) – Kullback-Leibler divergence matrix.
drate (float) – Percentage of active couplings to be pruned at each decimation step.
- Returns:
Updated mask.
- Return type:
torch.Tensor
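A hypothetical sketch of the decimation rule, assuming n_remove = int(drate × number of active couplings) and that mask holds 1 for active and 0 for pruned entries. It treats every mask entry independently: keeping a symmetric (L, q, L, q) mask symmetric would require removing (i, a, j, b) and (j, b, i, a) together.

```python
import torch


def update_mask_sketch(mask: torch.Tensor, Dkl: torch.Tensor, drate: float) -> torch.Tensor:
    """Hypothetical decimation step: drop the active entries with the smallest Dkl."""
    active = mask.bool()
    n_remove = int(drate * active.sum().item())  # fraction of currently active entries
    if n_remove == 0:
        return mask.clone()
    # Inactive entries get +inf so they are never selected for removal.
    scores = Dkl.masked_fill(~active, float("inf")).flatten()
    remove_idx = torch.topk(scores, n_remove, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[remove_idx] = 0
    return new_mask.view_as(mask)


mask = torch.ones(4)
Dkl = torch.tensor([0.4, 0.1, 0.3, 0.2])
print(update_mask_sketch(mask, Dkl, drate=0.5))  # tensor([1., 0., 1., 0.])
```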
adabmDCA.io module
- adabmDCA.io.load_chains(fname: str, tokens: str, load_weights: bool = False) ndarray | Tuple[ndarray, ndarray]
Loads the sequences from a fasta file and returns the numeric-encoded version. If the sequences are weighted, the log-weights are also returned. If the sequences are not weighted, the log-weights are set to 0.
- Parameters:
fname (str) – Path to the file containing the sequences.
tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.
load_weights (bool, optional) – If True, the log-weights are loaded and returned. Defaults to False.
- Returns:
Numeric-encoded sequences and log-weights if load_weights is True.
- Return type:
np.ndarray | Tuple[np.ndarray, np.ndarray]
- adabmDCA.io.load_params(fname: str, tokens: str, device: device, dtype: dtype = torch.float32) Dict[str, Tensor]
Import the parameters of the model from a file.
- Parameters:
fname (str) – Path of the file that stores the parameters.
tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.
device (torch.device) – Device where to store the parameters.
dtype (torch.dtype) – Data type of the parameters. Defaults to torch.float32.
- Returns:
Parameters of the model.
- Return type:
Dict[str, torch.Tensor]
- adabmDCA.io.load_params_oldformat(fname: str, device: device, dtype: dtype = torch.float32) Dict[str, Tensor]
Import the parameters of the model from a file. Assumes the old DCA format.
- Parameters:
fname (str) – Path of the file that stores the parameters.
device (torch.device) – Device where to store the parameters.
dtype (torch.dtype) – Data type of the parameters. Defaults to torch.float32.
- Returns:
Parameters of the model.
- Return type:
Dict[str, torch.Tensor]
- adabmDCA.io.save_chains(fname: str, chains: Tensor, tokens: str, log_weights: Tensor | None = None)
Saves the chains in a fasta file.
- Parameters:
fname (str) – Path to the file where to save the chains.
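chains (torch.Tensor) – Chains to be saved.
tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.
log_weights (torch.Tensor, optional) – Log-weights of the chains, saved together with the sequences if provided. Defaults to None.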
- adabmDCA.io.save_params(fname: str, params: Dict[str, Tensor], mask: Tensor, tokens: str) None
Saves the parameters of the model in a file.
- Parameters:
fname (str) – Path to the file where to save the parameters.
params (Dict[str, torch.Tensor]) – Parameters of the model.
mask (torch.Tensor) – Mask of the coupling matrix that determines which are the non-zero entries.
tokens (str) – “protein”, “dna”, “rna” or another string with the alphabet to be used.
- adabmDCA.io.save_params_oldformat(fname: str, params: Dict[str, Tensor], mask: Tensor) None
Saves the parameters of the model in a file. Assumes the old DCA format.
- Parameters:
fname (str) – Path to the file where to save the parameters.
params (Dict[str, torch.Tensor]) – Parameters of the model.
mask (torch.Tensor) – Mask of the coupling matrix that determines which are the non-zero entries.
adabmDCA.parser module
- adabmDCA.parser.add_args_contacts(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_dca(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_dms(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_eaDCA(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_edDCA(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_energies(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_sample(parser: ArgumentParser) ArgumentParser
- adabmDCA.parser.add_args_train(parser: ArgumentParser) ArgumentParser
adabmDCA.plot module
- adabmDCA.plot.plot_PCA(fig: figure, data1: array, data2: array, dim1: int, dim2: int, labels: List[str], title: str)
- adabmDCA.plot.plot_autocorrelation(ax: Axes, checkpoints: array, autocorr: array, gen_seqid: array, data_seqid: array)
- adabmDCA.plot.plot_hist(ax, data1, data2, color, dim, labels, orientation='vertical')
- adabmDCA.plot.plot_pearson_sampling(ax: Axes, checkpoints: array, pearsons: array, pearson_training: array | None = None)
- adabmDCA.plot.plot_scatter_correlations(ax: Axes, Cij_data: array, Cij_gen: array, Cijk_data: array, Cijk_gen: array, pearson_Cij: float, pearson_Cijk: float) Axes
- adabmDCA.plot.plot_scatter_labels(ax, data1, data2, dim1, dim2, labels)
adabmDCA.resampling module
- adabmDCA.resampling.compute_mixing_time(sampler: Callable, data: Tensor, params: Dict[str, Tensor], n_max_sweeps: int, beta: float) Dict[str, list]
Computes the mixing time using the t and t/2 method. Sampling stops when the mixing time is reached or after n_max_sweeps sweeps.
- Parameters:
sampler (Callable) – Sampling function.
data (torch.Tensor) – Initial data.
params (Dict[str, torch.Tensor]) – Parameters for the sampling.
n_max_sweeps (int) – Maximum number of sweeps.
beta (float) – Inverse temperature for the sampling.
- Returns:
Results of the mixing time analysis.
- Return type:
Dict[str, list]
adabmDCA.sampling module
- adabmDCA.sampling.get_deltaE(idx: int, chain: Tensor, residue_old: Tensor, residue_new: Tensor, params: Dict[str, Tensor], L: int, q: int) float
- adabmDCA.sampling.get_sampler(sampling_method: str) Callable
Returns the sampling function corresponding to the chosen method.
- Parameters:
sampling_method (str) – String indicating the sampling method. Choose between ‘metropolis’ and ‘gibbs’.
- Raises:
KeyError – Unknown sampling method.
- Returns:
Sampling function.
- Return type:
Callable
- adabmDCA.sampling.gibbs_sampling(chains: Tensor, params: Dict[str, Tensor], nsweeps: int, beta: float = 1.0) Tensor
Performs nsweeps sweeps of Gibbs sampling on the chains.
- Parameters:
chains (torch.Tensor) – Initial chains.
params (Dict[str, torch.Tensor]) – Parameters of the model.
nsweeps (int) – Number of sweeps.
beta (float, optional) – Inverse temperature. Defaults to 1.0.
- Returns:
Updated chains.
- Return type:
torch.Tensor
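A minimal Gibbs-sampling sketch, assuming the standard DCA energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) and that the fields and couplings are passed as separate tensors of shape (L, q) and (L, q, L, q) rather than through params, whose key names are not documented here. Couplings are assumed symmetric with zero diagonal blocks J_ii. Each sweep visits the sites in random order and resamples site i from its conditional distribution; the function names are hypothetical.

```python
import torch


def gibbs_sweep_sketch(chains, fields, couplings, beta=1.0):
    """One Gibbs sweep; chains is a (N, L, q) one-hot float tensor updated in place."""
    N, L, q = chains.shape
    for i in torch.randperm(L).tolist():
        # Local field on site i: h_i(a) + sum_j sum_b J_ij(a, b) x_jb, shape (N, q).
        local = fields[i] + torch.einsum("njb,ajb->na", chains, couplings[i])
        # Conditional distribution p(s_i = a | rest) proportional to exp(beta * local(a)).
        probs = torch.softmax(beta * local, dim=-1)
        new_states = torch.multinomial(probs, num_samples=1).squeeze(-1)
        chains[:, i, :] = torch.nn.functional.one_hot(new_states, q).to(chains.dtype)
    return chains


def gibbs_sampling_sketch(chains, fields, couplings, nsweeps, beta=1.0):
    for _ in range(nsweeps):
        chains = gibbs_sweep_sketch(chains, fields, couplings, beta)
    return chains
```

Because the conditional of one site depends only on its local field, all N chains are updated in parallel with a single einsum per site.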
- adabmDCA.sampling.metropolis(chains: Tensor, params: Dict[str, Tensor], nsweeps: int, beta: float = 1.0) Tensor
Performs nsweeps sweeps of Metropolis sampling on the chains.
- Parameters:
chains (torch.Tensor) – One-hot encoded sequences.
params (Dict[str, torch.Tensor]) – Parameters of the model.
nsweeps (int) – Number of sweeps to be performed.
beta (float, optional) – Inverse temperature. Defaults to 1.0.
- Returns:
Updated chains.
- Return type:
torch.Tensor
- adabmDCA.sampling.metropolis_sweep(chains: Tensor, params: Dict[str, Tensor], beta: float) Tensor
Performs a Metropolis sweep over the chains.
- Parameters:
chains (torch.Tensor) – One-hot encoded sequences.
params (Dict[str, torch.Tensor]) – Parameters of the model.
beta (float) – Inverse temperature.
- Returns:
Updated chains.
- Return type:
torch.Tensor
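A Metropolis sweep can be sketched as below, assuming the energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), symmetric couplings with J_ii = 0, and fields and couplings passed as separate tensors. metropolis_sweep_sketch is hypothetical; it proposes a uniformly random new state at each site and accepts it with probability min(1, exp(-β ΔE)), which may differ from the proposal used by adabmDCA.

```python
import torch


def metropolis_sweep_sketch(chains, fields, couplings, beta):
    """One Metropolis sweep; chains is a (N, L, q) one-hot float tensor updated in place."""
    N, L, q = chains.shape
    for i in torch.randperm(L).tolist():
        # Local field on site i, shape (N, q).
        local = fields[i] + torch.einsum("njb,ajb->na", chains, couplings[i])
        old = chains[:, i, :].argmax(dim=-1)
        proposal = torch.randint(0, q, (N,), device=chains.device)
        # Energy change for s_i: old -> proposal, i.e. local(old) - local(proposal).
        delta_E = local.gather(1, old[:, None]).squeeze(1) - local.gather(1, proposal[:, None]).squeeze(1)
        # Accept with probability min(1, exp(-beta * delta_E)).
        accept = torch.rand(N, device=chains.device) < torch.exp(-beta * delta_E)
        new_states = torch.where(accept, proposal, old)
        chains[:, i, :] = torch.nn.functional.one_hot(new_states, q).to(chains.dtype)
    return chains
```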
adabmDCA.statmech module
- adabmDCA.statmech.compute_energy(X: Tensor, params: Dict[str, Tensor]) Tensor
Compute the DCA energy of the sequences in X.
- Parameters:
X (torch.Tensor) – Sequences in one-hot encoding format.
params (Dict[str, torch.Tensor]) – Parameters of the model.
- Returns:
DCA Energy of the sequences.
- Return type:
torch.Tensor
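For reference, the usual DCA (Potts) energy is E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j). The hypothetical sketch below evaluates it for one-hot sequences with a single matrix product, assuming fields of shape (L, q) and symmetric couplings of shape (L, q, L, q) with zero diagonal blocks; the 1/2 factor compensates for every pair appearing twice in the full matrix. Whether compute_energy uses exactly this sign convention is an assumption.

```python
import torch


def compute_energy_sketch(X, fields, couplings):
    """X: (N, L, q) one-hot; fields: (L, q); couplings: (L, q, L, q) symmetric, J_ii = 0."""
    N, L, q = X.shape
    x = X.reshape(N, L * q)
    field_term = x @ fields.reshape(L * q)
    coupling_term = 0.5 * ((x @ couplings.reshape(L * q, L * q)) * x).sum(dim=1)
    return -field_term - coupling_term


# Worked example with L = 2, q = 2.
fields = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
M = torch.tensor([[0.5, 0.0], [0.0, -1.0]])  # J_12(a, b)
couplings = torch.zeros(2, 2, 2, 2)
couplings[0, :, 1, :] = M
couplings[1, :, 0, :] = M.T  # symmetry: J_21(b, a) = J_12(a, b)
X = torch.nn.functional.one_hot(torch.tensor([[0, 1], [0, 0], [1, 1]]), 2).float()
print(compute_energy_sketch(X, fields, couplings))  # tensor([-3.0000, -1.5000, -1.0000])
```

For the sequence (0, 0), for instance, the fields contribute 1 and the coupling J_12(0, 0) = 0.5, so E = -1.5.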
- adabmDCA.statmech.compute_entropy(chains: Tensor, params: Dict[str, Tensor], logZ: float) float
Compute the entropy of the DCA model.
- Parameters:
chains (torch.Tensor) – Chains that are supposed to be an equilibrium realization of the model.
params (Dict[str, torch.Tensor]) – Parameters of the model.
logZ (float) – Log-partition function of the model.
- Returns:
Entropy of the model.
- Return type:
float
- adabmDCA.statmech.compute_logZ_exact(all_states: Tensor, params: Dict[str, Tensor]) float
Compute the log-partition function of the model.
- Parameters:
all_states (torch.Tensor) – All possible states of the system.
params (Dict[str, torch.Tensor]) – Parameters of the model.
- Returns:
Log-partition function of the model.
- Return type:
float
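Exact enumeration is only feasible for tiny systems (q^L states), but it makes the definitions concrete. The sketch below assumes p(s) = exp(-E(s)) / Z with E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), computes log Z with logsumexp, and obtains the entropy through the identity S = ⟨E⟩ + log Z, with ⟨E⟩ estimated from equilibrium chains as the inputs of compute_entropy suggest. All function names are hypothetical.

```python
import itertools
import math

import torch


def energy_sketch(X, fields, couplings):
    # E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), X one-hot of shape (N, L, q).
    N, L, q = X.shape
    x = X.reshape(N, L * q)
    return -(x @ fields.reshape(L * q)) - 0.5 * ((x @ couplings.reshape(L * q, L * q)) * x).sum(dim=1)


def enumerate_states_sketch(L, q):
    # All q**L sequences, one-hot encoded: shape (q**L, L, q).
    states = torch.tensor(list(itertools.product(range(q), repeat=L)))
    return torch.nn.functional.one_hot(states, q).float()


def logZ_exact_sketch(all_states, fields, couplings):
    # log Z = log sum_s exp(-E(s)), computed stably.
    return torch.logsumexp(-energy_sketch(all_states, fields, couplings), dim=0).item()


def entropy_sketch(chains, fields, couplings, logZ):
    # S = -sum_s p log p = <E>_p + log Z, with <E>_p estimated from equilibrium chains.
    return energy_sketch(chains, fields, couplings).mean().item() + logZ


all_states = enumerate_states_sketch(L=1, q=2)
fields = torch.tensor([[0.0, math.log(3.0)]])  # Z = 1 + 3 = 4
couplings = torch.zeros(1, 2, 1, 2)
print(logZ_exact_sketch(all_states, fields, couplings))  # close to log(4) ≈ 1.3863
```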
- adabmDCA.statmech.compute_log_likelihood(fi: Tensor, fij: Tensor, params: Dict[str, Tensor], logZ: float) float
Compute the log-likelihood of the model.
- Parameters:
fi (torch.Tensor) – Single-site frequencies of the data.
fij (torch.Tensor) – Two-site frequencies of the data.
params (Dict[str, torch.Tensor]) – Parameters of the model.
logZ (float) – Log-partition function of the model.
- Returns:
Log-likelihood of the model.
- Return type:
float
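Under the Boltzmann form p(s) = exp(-E(s)) / Z with E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), the average log-likelihood depends on the data only through fi and fij: LL = Σ_{i,a} f_i(a) h_i(a) + Σ_{i<j} Σ_{a,b} f_ij(a, b) J_ij(a, b) - log Z. A hypothetical sketch, assuming symmetric couplings with zero diagonal blocks and per-sequence averaging (the library may normalize differently):

```python
import torch


def log_likelihood_sketch(fi, fij, fields, couplings, logZ):
    """Average log-likelihood per sequence, <-E>_data - log Z.

    fi: (L, q) and fij: (L, q, L, q) data frequencies; couplings symmetric with J_ii = 0.
    """
    field_term = (fi * fields).sum()
    # The 1/2 avoids counting each pair (i, j) twice in the full symmetric tensor.
    coupling_term = 0.5 * (fij * couplings).sum()
    return (field_term + coupling_term).item() - logZ
```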
- adabmDCA.statmech.enumerate_states(L: int, q: int, device: device = device(type='cpu')) Tensor
Enumerate all possible states of a system of L sites and q states.
- Parameters:
L (int) – Number of sites.
q (int) – Number of states.
device (torch.device, optional) – Device to store the states. Defaults to “cpu”.
- Returns:
All possible states.
- Return type:
torch.Tensor
- adabmDCA.statmech.update_weights_AIS(prev_params: Dict[str, Tensor], curr_params: Dict[str, Tensor], chains: Tensor, log_weights: Tensor) Tuple[Tensor, Tensor]
Updates the weights used by the trajectory Annealed Importance Sampling (AIS) algorithm.
- Parameters:
prev_params (Dict[str, torch.Tensor]) – Params at time t-1.
curr_params (Dict[str, torch.Tensor]) – Params at time t.
chains (torch.Tensor) – Chains at time t-1.
log_weights (torch.Tensor) – Log-weights at time t-1.
- Returns:
Log-weights and chains at time t.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
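In annealed importance sampling between successive models θ_{t-1} → θ_t, each chain carries a log-weight that is incremented by the change in unnormalized log-probability of its current configuration. A hypothetical sketch, taking energies (for example from compute_energy) instead of parameters and assuming p ∝ exp(-E); it does not reproduce the chain update that update_weights_AIS also returns.

```python
import math

import torch


def ais_update_sketch(log_weights, energy_prev, energy_curr):
    """log w_t = log w_{t-1} + E_{t-1}(x) - E_t(x), with x the chains at time t-1."""
    return log_weights + energy_prev - energy_curr


def ais_logZ_sketch(logZ_init, log_weights):
    """Estimate log Z_t = log Z_0 + log mean_n exp(log w_n)."""
    n = log_weights.numel()
    return logZ_init + (torch.logsumexp(log_weights, dim=0) - math.log(n)).item()
```

The second function turns the weights into an estimate of log Z_t given the exact log Z_0 of the starting model (for instance L log q when all parameters are zero).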
adabmDCA.stats module
- adabmDCA.stats.extract_Cij_from_freq(fij: Tensor, pij: Tensor, fi: Tensor, pi: Tensor, mask: Tensor | None = None) Tuple[float, float]
Extracts the lower triangular part of the covariance matrices of the data and chains starting from the frequencies.
- Parameters:
fij (torch.Tensor) – Two-point frequencies of the data.
pij (torch.Tensor) – Two-point frequencies of the chains.
fi (torch.Tensor) – Single-point frequencies of the data.
pi (torch.Tensor) – Single-point frequencies of the chains.
mask (torch.Tensor, optional) – Mask for comparing just a subset of the couplings. Defaults to None.
- Returns:
Extracted covariances (lower triangular part) of the data and chains.
- Return type:
Tuple[float, float]
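The covariance extraction can be sketched as follows, assuming fi of shape (L, q) and fij of shape (L, q, L, q), with C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b). The hypothetical helper ignores the optional mask; calling it once on (fij, fi) and once on (pij, pi) gives the two vectors that are compared.

```python
import torch


def covariance_lower_triangle_sketch(fij, fi):
    """Flattened C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b) for all pairs i > j."""
    L, q = fi.shape
    cij = fij - torch.einsum("ia,jb->iajb", fi, fi)
    i_idx, j_idx = torch.tril_indices(L, L, offset=-1)
    # Advanced indexing on dims 0 and 2 gives shape (n_pairs, q, q).
    return cij[i_idx, :, j_idx, :].reshape(-1)
```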
- adabmDCA.stats.extract_Cij_from_seqs(data: Tensor, chains: Tensor, weights: Tensor | None = None, pseudo_count: float = 0.0, mask: Tensor | None = None) Tuple[float, float]
Extracts the lower triangular part of the covariance matrices of the data and chains starting from the sequences.
- Parameters:
data (torch.Tensor) – Data sequences.
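chains (torch.Tensor) – Chain sequences.
weights (torch.Tensor, optional) – Weights of the data sequences. Defaults to None.
pseudo_count (float, optional) – Pseudo count. Defaults to 0.0.
mask (torch.Tensor, optional) – Mask for comparing just a subset of the couplings. Defaults to None.
- Returns:
Covariances (lower triangular part) of the data and chains.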
- Return type:
Tuple[float, float]
- adabmDCA.stats.generate_unique_triplets(L: int, ntriplets: int, device: device = device(type='cpu')) Tensor
Generates a set of unique triplets of positions. Used to compute the three-point statistics.
- Parameters:
L (int) – Length of the sequences.
ntriplets (int) – Number of triplets to be generated.
device (torch.device, optional) – Device to perform computations on. Defaults to “cpu”.
- Returns:
Tensor of shape (ntriplets, 3) containing the indices of the triplets.
- Return type:
torch.Tensor
- adabmDCA.stats.get_correlation_two_points(fij: Tensor, pij: Tensor, fi: Tensor, pi: Tensor, mask: Tensor | None = None) Tuple[float, float]
Computes the Pearson coefficient and the slope between the two-point frequencies of data and chains.
- Parameters:
fij (torch.Tensor) – Two-point frequencies of the data.
pij (torch.Tensor) – Two-point frequencies of the chains.
fi (torch.Tensor) – Single-point frequencies of the data.
pi (torch.Tensor) – Single-point frequencies of the chains.
mask (torch.Tensor, optional) – Mask to select the couplings to use for the correlation coefficient. Defaults to None.
- Returns:
Pearson correlation coefficient of the two-site statistics and slope of the interpolating line.
- Return type:
Tuple[float, float]
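Given the two covariance vectors, the Pearson coefficient and the slope can be sketched as below. The function is hypothetical, and regressing the chains' covariances on the data's is an assumed convention for the slope.

```python
import torch


def correlation_and_slope_sketch(cij_data, cij_chains):
    x, y = cij_data.flatten(), cij_chains.flatten()
    pearson = torch.corrcoef(torch.stack([x, y]))[0, 1].item()
    # Least-squares slope of y (chains) regressed on x (data).
    xc, yc = x - x.mean(), y - y.mean()
    slope = ((xc * yc).sum() / (xc * xc).sum()).item()
    return pearson, slope
```

A well-trained model should give both values close to 1.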
- adabmDCA.stats.get_covariance_matrix(data: Tensor, weights: Tensor, pseudo_count: float = 0.0) Tensor
Computes the weighted covariance matrix of the input multi-sequence alignment.
- Parameters:
data (torch.Tensor) – Input MSA in one-hot encoding.
weights (torch.Tensor) – Importance weights of the sequences.
pseudo_count (float, optional) – Pseudo count. Defaults to 0.0.
- Returns:
Covariance matrix.
- Return type:
torch.Tensor
- adabmDCA.stats.get_freq_single_point(data: Tensor, weights: Tensor | None, pseudo_count: float = 0.0) Tensor
Computes the single-point frequencies of the input MSA.
- Parameters:
data (torch.Tensor) – One-hot encoded data array.
weights (torch.Tensor | None) – Weights of the sequences.
pseudo_count (float, optional) – Pseudo count to be added to the frequencies. Defaults to 0.0.
- Raises:
ValueError – If the input data is not a 3D tensor.
- Returns:
Single point frequencies.
- Return type:
torch.Tensor
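A hypothetical sketch of the weighted single-site frequencies, assuming the common pseudo-count convention f_i(a) = (1 - λ) f_i^data(a) + λ / q; the exact regularization used by get_freq_single_point is not stated here.

```python
import torch


def freq_single_point_sketch(data, weights=None, pseudo_count=0.0):
    """Weighted single-site frequencies of one-hot data (M, L, q), shape (L, q)."""
    if data.dim() != 3:
        raise ValueError("Expected one-hot data of shape (M, L, q).")
    M, L, q = data.shape
    if weights is None:
        weights = torch.ones(M, dtype=data.dtype, device=data.device)
    w = weights.reshape(M) / weights.sum()
    fi = torch.einsum("m,mia->ia", w, data)
    # Mix with the uniform distribution 1/q.
    return (1.0 - pseudo_count) * fi + pseudo_count / q
```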
- adabmDCA.stats.get_freq_three_points(data: Tensor, weights: Tensor, ntriplets: int, device: device = device(type='cpu')) Tensor
Computes the 3-body statistics of the input MSA.
- Parameters:
data (torch.Tensor) – Input MSA in one-hot encoding.
weights (torch.Tensor) – Importance weights for the sequences.
ntriplets (int) – Number of triplets to test.
device (torch.device, optional) – Device to perform computations on. Defaults to “cpu”.
- Returns:
Three-point connected correlations for ntriplets randomly extracted triplets.
- Return type:
torch.Tensor
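The connected three-point correlation of positions (i, j, k) is the weighted average of the product of the centred one-hot variables, C_ijk(a, b, c) = ⟨(x_ia - f_ia)(x_jb - f_jb)(x_kc - f_kc)⟩. The hypothetical sketch below computes it for a given set of triplets (for instance from generate_unique_triplets), assuming weights of shape (M,); it is not necessarily how get_freq_three_points is implemented.

```python
import torch


def three_point_connected_sketch(data, weights, triplets):
    """data: (M, L, q) one-hot; weights: (M,); triplets: (T, 3). Returns (T, q**3)."""
    w = weights / weights.sum()
    fi = torch.einsum("m,mia->ia", w, data)
    centred = data - fi  # (M, L, q)
    results = []
    for i, j, k in triplets.tolist():
        c = torch.einsum("m,ma,mb,mc->abc", w, centred[:, i], centred[:, j], centred[:, k])
        results.append(c.flatten())
    return torch.stack(results)
```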
- adabmDCA.stats.get_freq_two_points(data: Tensor, weights: Tensor | None, pseudo_count: float = 0.0) Tensor
Computes the 2-points statistics of the input MSA.
- Parameters:
data (torch.Tensor) – One-hot encoded data array.
weights (torch.Tensor | None) – Weights to assign to the sequences.
pseudo_count (float, optional) – Pseudo count for the single-point and two-point statistics. Acts as a regularization. Defaults to 0.0.
- Raises:
ValueError – If the input data is not a 3D tensor.
- Returns:
Matrix of two-point frequencies of shape (L, q, L, q).
- Return type:
torch.Tensor
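A hypothetical sketch of the weighted two-site frequencies with the pseudo-count convention f_ij(a, b) = (1 - λ) f_ij^data(a, b) + λ / q², which is consistent with single-site frequencies regularized as (1 - λ) f_i + λ / q. The diagonal blocks are reset to diag(f_i), an assumed choice so that summing f_ij over b returns f_i for every j.

```python
import torch


def freq_two_points_sketch(data, weights=None, pseudo_count=0.0):
    """Weighted two-site frequencies of one-hot data (M, L, q), shape (L, q, L, q)."""
    if data.dim() != 3:
        raise ValueError("Expected one-hot data of shape (M, L, q).")
    M, L, q = data.shape
    if weights is None:
        weights = torch.ones(M, dtype=data.dtype, device=data.device)
    w = weights.reshape(M) / weights.sum()
    fi = (1.0 - pseudo_count) * torch.einsum("m,mia->ia", w, data) + pseudo_count / q
    fij = (1.0 - pseudo_count) * torch.einsum("m,mia,mjb->iajb", w, data, data) + pseudo_count / q**2
    # Diagonal blocks: f_ii(a, b) = delta_ab * f_i(a).
    idx = torch.arange(L, device=data.device)
    fij[idx, :, idx, :] = torch.diag_embed(fi)
    return fij
```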
- adabmDCA.stats.get_slope(x, y)
adabmDCA.training module
- adabmDCA.training.train_graph(sampler: Callable, chains: Tensor, mask: Tensor, fi: Tensor, fij: Tensor, params: Dict[str, Tensor], nsweeps: int, lr: float, max_epochs: int, target_pearson: float, tokens: str = 'protein', check_slope: bool = False, log_weights: Tensor | None = None, file_paths: Dict[str, Path] | None = None, progress_bar: bool = True, device: device = device(type='cpu')) Tuple[Tensor, Dict[str, Tensor], Tensor]
Trains the model on a given graph until the target Pearson correlation is reached or the maximum number of epochs is exceeded.
- Parameters:
sampler (Callable) – Sampling function.
chains (torch.Tensor) – Markov chains simulated with the model.
mask (torch.Tensor) – Mask encoding the sparse graph.
fi (torch.Tensor) – Single-point frequencies of the data.
fij (torch.Tensor) – Two-point frequencies of the data.
params (Dict[str, torch.Tensor]) – Parameters of the model.
nsweeps (int) – Number of sampling sweeps for each gradient estimation.
lr (float) – Learning rate.
max_epochs (int) – Maximum number of gradient updates to be done.
target_pearson (float) – Target Pearson coefficient.
tokens (str, optional) – Alphabet to be used for the encoding. Defaults to “protein”.
log_weights (torch.Tensor, optional) – Log-weights used for the online computation of the log-likelihood. Defaults to None.
check_slope (bool, optional) – Whether to take into account the slope for the convergence criterion or not. Defaults to False.
file_paths (Dict[str, Path], optional) – Dictionary containing the paths where to save log, params, and chains. Defaults to None.
progress_bar (bool, optional) – Whether to display a progress bar or not. Defaults to True.
device (torch.device, optional) – Device to be used. Defaults to “cpu”.
- Returns:
Updated chains, updated parameters, and log-weights for the log-likelihood computation.
- Return type:
Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
- adabmDCA.training.update(sampler: Callable, chains: Tensor, fi: Tensor, fij: Tensor, pi: Tensor, pij: Tensor, params: Dict[str, Tensor], mask: Tensor, lr: float, nsweeps: int) Tuple[Tensor, Dict[str, Tensor]]
Updates the parameters of the model and the Markov chains.
- Parameters:
sampler (Callable) – Sampling function.
chains (torch.Tensor) – Markov chains simulated with the model.
fi (torch.Tensor) – Single-point frequencies of the data.
fij (torch.Tensor) – Two-point frequencies of the data.
pi (torch.Tensor) – Single-point marginals of the model.
pij (torch.Tensor) – Two-point marginals of the model.
params (Dict[str, torch.Tensor]) – Parameters of the model.
mask (torch.Tensor) – Mask of the interaction graph.
lr (float) – Learning rate.
nsweeps (int) – Number of Monte Carlo updates.
- Returns:
Updated chains and parameters.
- Return type:
Tuple[torch.Tensor, Dict[str, torch.Tensor]]
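Under the usual Boltzmann-machine learning rule, the log-likelihood gradient is the difference between data frequencies and model marginals. A hypothetical sketch of the gradient-ascent step only, assuming fields and couplings are passed as separate tensors and that the mask multiplies the coupling update; the documented function also advances the chains with the sampler for nsweeps sweeps, which is omitted here.

```python
import torch


def update_params_sketch(fields, couplings, fi, fij, pi, pij, mask, lr):
    """One gradient-ascent step; for E = -sum h - sum_{i<j} J, dLL/dh = fi - pi, dLL/dJ = fij - pij."""
    new_fields = fields + lr * (fi - pi)
    # Entries with mask == 0 receive no update.
    new_couplings = couplings + lr * (fij - pij) * mask
    return new_fields, new_couplings
```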
adabmDCA.utils module
- adabmDCA.utils.get_device(device: str) device
Returns the device where to store the tensors.
- Parameters:
device (str) – Device to be used.
- Returns:
Device.
- Return type:
torch.device
- adabmDCA.utils.get_mask_save(L: int, q: int, device: device) Tensor
Returns the mask to save the upper-triangular part of the coupling matrix.
- Parameters:
L (int) – Length of the MSA.
q (int) – Number of values that each residue can assume.
device (torch.device) – Device where to store the mask.
- Returns:
Mask.
- Return type:
torch.Tensor
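A sketch of such a mask, assuming a boolean (L, q, L, q) layout that selects every pair i < j for all states a, b (mask_save_sketch is hypothetical; the dtype of the real mask may differ):

```python
import torch


def mask_save_sketch(L: int, q: int, device: str = "cpu") -> torch.Tensor:
    """Boolean (L, q, L, q) mask that is True only for position pairs i < j."""
    upper = torch.triu(torch.ones(L, L, device=device), diagonal=1).bool()
    # Broadcast the (L, L) pattern over the q x q state indices.
    return upper[:, None, :, None].expand(L, q, L, q).clone()
```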
- adabmDCA.utils.init_chains(num_chains: int, L: int, q: int, device: device, fi: Tensor | None = None) Tensor
Initialize the chains of the DCA model. If ‘fi’ is provided, the chains are sampled from the profile model, otherwise they are sampled uniformly at random.
- Parameters:
num_chains (int) – Number of parallel chains.
L (int) – Length of the MSA.
q (int) – Number of values that each residue can assume.
device (torch.device) – Device where to store the chains.
fi (torch.Tensor, optional) – Single-point frequencies. Defaults to None.
- Returns:
Initialized parallel chains in one-hot encoding format.
- Return type:
torch.Tensor
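Both initializations can be sketched in a few lines; the helper is hypothetical and assumes fi holds a probability vector per site, shape (L, q).

```python
import torch


def init_chains_sketch(num_chains, L, q, device="cpu", fi=None):
    """One-hot chains of shape (num_chains, L, q), uniform or from the profile model fi."""
    if fi is None:
        states = torch.randint(0, q, (num_chains, L), device=device)
    else:
        # Each site i is sampled independently from fi[i]; multinomial returns (L, num_chains).
        states = torch.multinomial(fi.to(device), num_chains, replacement=True).T
    return torch.nn.functional.one_hot(states, q).float()
```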
- adabmDCA.utils.init_parameters(fi: Tensor) Dict[str, Tensor]
Initialize the parameters of the DCA model.
- Parameters:
fi (torch.Tensor) – Single-point frequencies of the data.
- Returns:
Parameters of the model.
- Return type:
Dict[str, torch.Tensor]
- adabmDCA.utils.resample_sequences(data: Tensor, weights: Tensor, nextract: int) Tensor
Extracts nextract sequences from data with replacement according to the weights.
- Parameters:
data (torch.Tensor) – Data array.
weights (torch.Tensor) – Weights of the sequences.
nextract (int) – Number of sequences to be extracted.
- Returns:
Extracted sequences.
- Return type:
torch.Tensor
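Weighted resampling with replacement maps directly onto torch.multinomial. A hypothetical sketch, assuming non-negative weights of shape (M,) or (M, 1):

```python
import torch


def resample_sequences_sketch(data, weights, nextract):
    """Draw nextract rows of data with replacement, with probability proportional to weights."""
    idx = torch.multinomial(weights.flatten().float(), nextract, replacement=True)
    return data[idx]
```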
- adabmDCA.utils.set_zerosum_gauge(params: Dict[str, Tensor]) Dict[str, Tensor]
Sets the zero-sum gauge on the coupling matrix.
- Parameters:
params (Dict[str, torch.Tensor]) – Parameters of the model.
- Returns:
Parameters with fixed gauge.
- Return type:
Dict[str, torch.Tensor]
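A hypothetical sketch of the zero-sum gauge, assuming couplings of shape (L, q, L, q): J_ij(a, b) → J_ij(a, b) - J_ij(·, b) - J_ij(a, ·) + J_ij(·, ·), where · denotes the mean over that state index. Afterwards every block sums to zero over a and over b. Following the description, only the couplings are transformed; leaving the energies unchanged would also require shifting the fields, and whether set_zerosum_gauge does so is not stated.

```python
import torch


def zerosum_couplings_sketch(couplings: torch.Tensor) -> torch.Tensor:
    """Zero-sum gauge on couplings (L, q, L, q): sums over a and over b vanish."""
    return (
        couplings
        - couplings.mean(dim=1, keepdim=True)       # subtract J_ij(., b)
        - couplings.mean(dim=3, keepdim=True)       # subtract J_ij(a, .)
        + couplings.mean(dim=(1, 3), keepdim=True)  # add back J_ij(., .)
    )
```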