Script Arguments

This section lists all command-line arguments for the main routines of adabmDCA 2.0.

Train a DCA model

Command	Default value	Description
`-d, --data`	N/A	Filename of the dataset to be used for training the model.
`-o, --output`	DCA_model	Path to the folder in which to save the model.
`-m, --model`	bmDCA	Type of model to be trained. Possible options are `bmDCA`, `eaDCA`, and `edDCA`.
`-w, --weights`	None	Path to the file containing the weights of the sequences. If `None`, the weights are computed automatically.
`--clustering_seqid`	0.8	Sequence identity threshold to be used for computing the sequence weights.
`--no_reweighting`	N/A	If this flag is used, the routine assigns uniform weights to the sequences.
`-p, --path_params`	None	Path to the file containing the model's parameters. Required to resume a previous training run.
`-c, --path_chains`	None	Path to the FASTA file containing the model's chains. Required to resume a previous training run.
`-l, --label`	None	Label used to tell different runs apart. It is prepended to the names of the output files.
`--alphabet`	protein	Type of encoding for the sequences. Choose among `protein`, `rna`, `dna`, or a user-defined string of tokens.
`--lr`	0.05	Learning rate.
`--nsweeps`	10	Number of sweeps for each gradient estimation.
`--sampler`	gibbs	Sampling method to be used. Possible options are `gibbs` and `metropolis`.
`--nchains`	10000	Number of Markov chains to run in parallel.
`--target`	0.95	Target value of the Pearson correlation coefficient on the two-site statistics.
`--nepochs`	50000	Maximum number of epochs allowed.
`--pseudocount`	None	Pseudocount for the single-site and two-site statistics. Acts as a regularizer. If `None`, it is set to \(1/M_{\mathrm{eff}}\).
`--seed`	0	Random seed.
`--nthreads`¹	1	Number of threads used in the Julia multithreaded version.
`--device`¹	cuda	Device to run on, either cuda (GPU) or CPU. Used in the Python version.
`--dtype`¹	float32	Data type to use, either float32 or float64. Used in the Python version.
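
The reweighting options (`--clustering_seqid`, `--no_reweighting`) and the default pseudocount \(1/M_{\mathrm{eff}}\) can be illustrated with a minimal sketch. It assumes the convention commonly used in DCA, where each sequence is weighted by the inverse of the number of sequences whose identity with it is at least the threshold, and \(M_{\mathrm{eff}}\) is the sum of the weights. The function names and the toy alignment are hypothetical, and the exact rule used by adabmDCA (for instance, strict versus non-strict inequality) may differ.

```python
def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def compute_weights(msa: list[str], seqid: float = 0.8) -> list[float]:
    """Weight each sequence by 1 / (number of sequences within the threshold).

    Assumption: a sequence is a neighbour when identity >= seqid, and every
    sequence counts as its own neighbour.
    """
    weights = []
    for a in msa:
        n_neighbours = sum(sequence_identity(a, b) >= seqid for b in msa)
        weights.append(1.0 / n_neighbours)
    return weights


# Toy alignment (hypothetical): three near-identical sequences and one outlier.
msa = ["ACDEF", "ACDEG", "ACDEF", "WYWYW"]

weights = compute_weights(msa, seqid=0.8)   # reweighted case
uniform = [1.0] * len(msa)                  # what --no_reweighting amounts to

m_eff = sum(weights)          # effective number of sequences
pseudocount = 1.0 / m_eff     # default used when --pseudocount is None
print(weights, m_eff, pseudocount)
```

In this toy case the three similar sequences share one unit of weight, so the effective number of sequences is about 2 rather than 4.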

eaDCA options

Command	Default value	Description
`--gsteps`	10	Number of gradient updates to be performed on a given graph.
`--factivate`	0.001	Fraction of inactive couplings to try to activate at each graph update.

edDCA options

Command	Default value	Description
`--gsteps`	10	The number of gradient updates applied at each step of the graph convergence process.
`--density`	0.02	Target density to be reached.
`--drate`	0.01	Fraction of remaining couplings to be pruned at each decimation step.
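
To get a feeling for how `--density` and `--drate` interact, the sketch below estimates how many decimation steps are needed to reach the target density. It assumes that the model starts fully connected (density 1) and that each step removes exactly a fraction `drate` of the remaining couplings, ignoring integer rounding. The helper name is hypothetical and is not part of adabmDCA.

```python
import math


def decimation_steps(target_density: float, drate: float,
                     start_density: float = 1.0) -> int:
    """Smallest k such that start_density * (1 - drate)**k <= target_density."""
    if not 0.0 < drate < 1.0:
        raise ValueError("drate must lie strictly between 0 and 1")
    if not 0.0 < target_density <= start_density:
        raise ValueError("target_density must lie in (0, start_density]")
    k = math.log(target_density / start_density) / math.log(1.0 - drate)
    return max(math.ceil(k), 0)


# Default values from the table above.
steps = decimation_steps(target_density=0.02, drate=0.01)
print(steps)
```

With the defaults this gives 390 steps, since \(0.99^{390} \approx 0.0198\) while \(0.99^{389} \approx 0.0200\) is still slightly above the target.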

Sampling from a DCA model

Command	Default value	Description
`-p, --path_params`	N/A	Path to the file containing the parameters of the DCA model to sample from.
`-d, --data`	N/A	Filename of the MSA dataset.
`-o, --output`	N/A	Path to the folder in which to save the output.
`--ngen`	None	Number of samples to generate.
`-l, --label`	None	Label used to tell different runs apart. It is prepended to the names of the output files.
`-w, --weights`	None	Path to the file containing the weights of the sequences. If `None`, the weights are computed automatically.
`--clustering_seqid`	0.8	Sequence identity threshold to be used for computing the sequence weights.
`--no_reweighting`	N/A	If this flag is used, the routine assigns uniform weights to the sequences.
`--nmeasure`	10000	Number of data sequences used to compute the mixing time. The effective value is min(`nmeasure`, len(data)).
`--nmix`	2	Number of mixing times to run when generating `ngen` sequences from a random initialization.
`--max_nsweeps`	10000	Maximum number of sweeps allowed.
`--alphabet`	protein	Type of encoding for the sequences. Choose among `protein`, `rna`, `dna`, or a user-defined string of tokens.
`--sampler`	gibbs	Sampling method to be used. Possible options are `gibbs` and `metropolis`.
`--beta`	1.0	Inverse temperature to be used for the sampling.
`--pseudocount`	None	Pseudocount for the single-site and two-site statistics. Acts as a regularizer. If `None`, it is set to \(1/M_{\mathrm{eff}}\).
`--device`¹	cuda	Device to run on, either cuda (GPU) or CPU. Used in the Python version.
`--dtype`¹	float32	Data type to use, either float32 or float64. Used in the Python version.

Computing DCA energies of an MSA

Command	Default value	Description
`-d, --data`	N/A	Filename of the input MSA.
`-p, --path_params`	N/A	Path to the file containing the parameters of the DCA model.
`-o, --output`	N/A	Path to the folder in which to save the output.
`--alphabet`	protein	Type of encoding for the sequences. Choose among `protein`, `rna`, `dna`, or a user-defined string of tokens.
`--device`¹	cuda	Device to run on, either cuda (GPU) or CPU. Used in the Python version.
`--dtype`¹	float32	Data type to use, either float32 or float64. Used in the Python version.
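
As a reminder of what a DCA energy is, the sketch below evaluates the standard Potts energy \(E(a) = -\sum_i h_i(a_i) - \sum_{i<j} J_{ij}(a_i, a_j)\) for an encoded sequence. The sign convention, the nested-list layout of the parameters, and the two-letter toy alphabet are assumptions made for illustration. They do not describe the parameter file format of adabmDCA.

```python
def encode(sequence: str, alphabet: str) -> list[int]:
    """Map each token of the sequence to its index in the alphabet."""
    index = {token: k for k, token in enumerate(alphabet)}
    return [index[token] for token in sequence]


def potts_energy(seq: list[int], h: list, J: list) -> float:
    """E(a) = -sum_i h[i][a_i] - sum_{i<j} J[i][j][a_i][a_j]."""
    length = len(seq)
    energy = -sum(h[i][seq[i]] for i in range(length))
    for i in range(length):
        for j in range(i + 1, length):
            energy -= J[i][j][seq[i]][seq[j]]
    return energy


# Toy model (hypothetical): 2 sites, alphabet of 2 tokens.
alphabet = "AB"
h = [[0.5, -0.5],
     [0.0, 1.0]]
J01 = [[1.0, 0.0],
       [0.0, 2.0]]
J = [[None, J01],
     [None, None]]   # only the i < j blocks are stored

energy_ab = potts_energy(encode("AB", alphabet), h, J)
energy_bb = potts_energy(encode("BB", alphabet), h, J)
print(energy_ab, energy_bb)
```

Under this convention, lower energies correspond to sequences that the model considers more probable.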

Generate a Deep Mutational Scan (DMS) from a wild type

Command	Default value	Description
`-d, --data`	N/A	Filename of the input MSA containing the wild type. If multiple sequences are present, the first one is used.
`-p, --path_params`	N/A	Path to the file containing the parameters of the DCA model.
`-o, --output`	N/A	Path to the folder in which to save the output.
`--alphabet`	protein	Type of encoding for the sequences. Choose among `protein`, `rna`, `dna`, or a user-defined string of tokens.
`--device`¹	cuda	Device to run on, either cuda (GPU) or CPU. Used in the Python version.
`--dtype`¹	float32	Data type to use, either float32 or float64. Used in the Python version.
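
Conceptually, a deep mutational scan enumerates every single-site substitution of the wild type and scores each mutant against it. The sketch below illustrates this under stated assumptions: the score is taken to be the energy difference \(\Delta E = E(\text{mutant}) - E(\text{wild type})\), mutation labels use 1-based positions, and `toy_energy` is a hypothetical stand-in for a DCA energy function. The output format of adabmDCA may differ.

```python
from typing import Callable


def single_mutants(wild_type: str, alphabet: str):
    """Yield (label, mutant) for every single-site substitution of the wild type."""
    for pos, wt_token in enumerate(wild_type):
        for token in alphabet:
            if token != wt_token:
                mutant = wild_type[:pos] + token + wild_type[pos + 1:]
                yield f"{wt_token}{pos + 1}{token}", mutant   # 1-based label


def mutational_scan(wild_type: str, alphabet: str,
                    energy: Callable[[str], float]) -> dict:
    """Map each mutation label to E(mutant) - E(wild type)."""
    e_wt = energy(wild_type)
    return {label: energy(mutant) - e_wt
            for label, mutant in single_mutants(wild_type, alphabet)}


def toy_energy(sequence: str) -> float:
    """Hypothetical energy: each 'A' lowers the energy by one unit."""
    return -float(sequence.count("A"))


scan = mutational_scan("ACA", "ACD", toy_energy)
for label, delta in scan.items():
    print(label, delta)
```

For a wild type of length \(L\) and an alphabet of \(q\) tokens, the scan contains \(L(q-1)\) entries, here \(3 \times 2 = 6\).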

Compute the Frobenius contact matrix

Command	Default value	Description
`-p, --path_params`	N/A	Path to the file containing the parameters of the DCA model.
`-o, --output`	N/A	Path to the folder in which to save the output.
`-l, --label`	None	If provided, adds a label to the output files inside the output folder.
`--alphabet`	protein	Type of encoding for the sequences. Choose among `protein`, `rna`, `dna`, or a user-defined string of tokens.
`--device`¹	cuda	Device to run on, either cuda (GPU) or CPU. Used in the Python version.
`--dtype`¹	float32	Data type to use, either float32 or float64. Used in the Python version.
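
The core of a Frobenius contact score is the norm of each coupling block, \(F_{ij} = \sqrt{\sum_{a,b} J_{ij}(a,b)^2}\). The sketch below computes only this quantity. DCA pipelines often add further steps, such as a gauge transformation, the exclusion of the gap symbol, or an average-product correction. These are not shown, and whether and how adabmDCA applies them is not stated here. The dictionary layout of the couplings is a hypothetical choice made for the example.

```python
import math


def frobenius_norm(block: list) -> float:
    """Square root of the sum of squared entries of one q x q coupling block."""
    return math.sqrt(sum(value * value for row in block for value in row))


def contact_matrix(couplings: dict, length: int) -> list:
    """Symmetric L x L matrix of Frobenius norms with a zero diagonal."""
    scores = [[0.0] * length for _ in range(length)]
    for (i, j), block in couplings.items():
        scores[i][j] = scores[j][i] = frobenius_norm(block)
    return scores


# Toy couplings (hypothetical): 3 sites, alphabet of 2 tokens, keys are (i, j) with i < j.
couplings = {
    (0, 1): [[3.0, 0.0], [0.0, 4.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[1.0, 1.0], [1.0, 1.0]],
}

F = contact_matrix(couplings, length=3)
for row in F:
    print(row)
```

Pairs with larger scores are the stronger candidates for contacts.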

¹ Available only in specific versions of the software, as noted in the description.