Skip to content

Script Arguments

In this section we list all the possible command-line arguments for the main routines of adabmDCA 2.0.

Train a DCA model

Command Default value Description
-d, --data N/A Filename of the dataset to be used for training the model.
-o, --output DCA_model Path to the folder where to save the model.
-m, --model bmDCA Type of model to be trained. Possible options are bmDCA, eaDCA, and edDCA.
-w, --weights None Path to the file containing the weights of the sequences. If None, the weights are computed automatically.
--clustering_seqid 0.8 Sequence identity threshold to be used for computing the sequence weights.
--no_reweighting N/A If this flag is used, the routine assigns uniform weights to the sequences.
-p, --path_params None Path to the file containing the model's parameters. Required for restoring the training.
-c, --path_chains None Path to the FASTA file containing the model's chains. Required for restoring the training.
-l, --label None A label to identify different algorithm runs. It prefixes the output files with this label.
--alphabet protein Type of encoding for the sequences. Choose among protein, rna, dna, or a user-defined string of tokens.
--lr 0.05 Learning rate.
--nsweeps 10 Number of sweeps for each gradient estimation.
--sampler gibbs Sampling method to be used. Possible options are gibbs and metropolis.
--nchains 10000 Number of Markov chains to run in parallel.
--target 0.95 Pearson correlation coefficient on the two-sites statistics to be reached.
--nepochs 50000 Maximum number of epochs allowed.
--pseudocount None Pseudo count for the single and two-sites statistics. Acts as a regularization. If None, it is set to \(1/M_{\mathrm{eff}}\).
--seed 0 Random seed.
--nthreads¹ 1 Number of threads used in the Julia multithreaded version.
--device¹ cuda Device to be used between cuda (GPU) and CPU. Used in the Python version.
--dtype¹ float32 Data type to be used between float32 and float64. Used in the Python version.

eaDCA options

Command Default value Description
--gsteps 10 Number of gradient updates to be performed on a given graph.
--factivate 0.001 Fraction of inactive couplings to try to activate at each graph update.

edDCA options

Command Default value Description
--gsteps 10 The number of gradient updates applied at each step of the graph convergence process.
--density 0.02 Target density to be reached.
--drate 0.01 Fraction of remaining couplings to be pruned at each decimation step.

Sampling from a DCA model

Command Default value Description
-p, --path_params N/A Path to the file containing the parameters of the DCA model to sample from.
-d, --data N/A Filename of the dataset MSA.
-o, --output N/A Path to the folder where to save the output.
--ngen None Number of samples to generate.
-l, --label None A label to identify different algorithm runs. It prefixes the output files with this label.
-w, --weights None Path to the file containing the weights of the sequences. If None, the weights are computed automatically.
--clustering_seqid 0.8 Sequence identity threshold to be used for computing the sequence weights.
--no_reweighting N/A If this flag is used, the routine assigns uniform weights to the sequences.
--nmeasure 10000 Number of data sequences to use for computing the mixing time. The value min(nmeasure, len(data)) is taken.
--nmix 2 Number of mixing times used to generate 'ngen' sequences starting from random.
--max_nsweeps 10000 Maximum number of sweeps allowed.
--alphabet protein Type of encoding for the sequences. Choose among protein, rna, dna, or a user-defined string of tokens.
--sampler gibbs Sampling method to be used. Possible options are gibbs and metropolis.
--beta 1.0 Inverse temperature to be used for the sampling.
--pseudocount None Pseudo count for the single and two-sites statistics. Acts as a regularization. If None, it is set to \(1/M_{\mathrm{eff}}\).
--device¹ cuda Device to be used between cuda (GPU) and CPU. Used in the Python version.
--dtype¹ float32 Data type to be used between float32 and float64. Used in the Python version.

Computing DCA energies of a MSA

Command Default value Description
-d, --data N/A Filename of the input MSA.
-p, --path_params N/A Path to the file containing the parameters of the DCA model.
-o, --output N/A Path to the folder where to save the output.
--alphabet protein Type of encoding for the sequences. Choose among protein, rna, dna, or a user-defined string of tokens.
--device¹ cuda Device to be used between cuda (GPU) and CPU. Used in the Python version.
--dtype¹ float32 Data type to be used between float32 and float64. Used in the Python version.

Generate a Deep Mutational Scan (DMS) from a wild type

Command Default value Description
-d, --data N/A Filename of the input MSA containing the wild type. If multiple sequences are present, the first one is used.
-p, --path_params N/A Path to the file containing the parameters of the DCA model.
-o, --output N/A Path to the folder where to save the output.
--alphabet protein Type of encoding for the sequences. Choose among protein, rna, dna, or a user-defined string of tokens.
--device¹ cuda Device to be used between cuda (GPU) and CPU. Used in the Python version.
--dtype¹ float32 Data type to be used between float32 and float64. Used in the Python version.

Compute the Frobenius contact matrix

Command Default value Description
-p, --path_params N/A Path to the file containing the parameters of the DCA model.
-o, --output N/A Path to the folder where to save the output.
-l, --label None If provided, adds a label to the output files inside the output folder.
--alphabet protein Type of encoding for the sequences. Choose among protein, rna, dna, or a user-defined string of tokens.
--device¹ cuda Device to be used between cuda (GPU) and CPU. Used in the Python version.
--dtype¹ float32 Data type to be used between float32 and float64. Used in the Python version.

¹ Used in specific versions of the software.