Training DCA models 🚀

All versions of adabmDCA (Python, Julia, and C++) expose the same command-line interface through the adabmDCA command.

To see the complete list of training options:

$ adabmDCA train -h

The standard command to start training a DCA model is:

$ adabmDCA train -m <model> -d <fasta_file> -o <output_folder> -l <label>

Arguments 🧩

  • <model> ∈ {bmDCA, eaDCA, edDCA}
    Selects the training routine.
    By default, the fully connected bmDCA algorithm is used. edDCA can follow two different routines: either it decimates a pre-trained bmDCA model, or it first trains a bmDCA model and then decimates it.
  • <fasta_file> – Path to the FASTA file containing the training MSA.
  • <output_folder> – Folder where results will be stored (created if missing).
  • <label> – Optional tag for output files.
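
For example, a minimal bmDCA run might look like this (msa.fasta, results, and myrun are placeholder names):

$ adabmDCA train -m bmDCA -d msa.fasta -o results -l myrun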

Training Behavior ⚙️

Training stops when the Pearson correlation between model and empirical connected correlations reaches the target value (default: 0.95).

  • Early training is fast (e.g., Pearson ≈ 0.9 after ~100 iterations).
  • Pushing the correlation toward higher values takes significantly longer (convergence slows following a power-law decay).

For a quick coarse model, set:

--target 0.9
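
With the placeholder names from above, the full command reads:

$ adabmDCA train -m bmDCA -d msa.fasta -o results -l myrun --target 0.9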

Output Files 📁

During training, adabmDCA maintains three output files:

  • <label>_params.dat – Non‑zero model parameters.
    Lines starting with J → couplings; lines starting with h → biases.

  • <label>_chains.fasta – State of the Markov chains

  • <label>_adabmDCA.log – Log file updated throughout training

Update intervals:

  • bmDCA: every 50 updates
  • eaDCA, edDCA: every 10 updates
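
As a purely illustrative sketch of <label>_params.dat (only the leading J/h markers are guaranteed by the description above; the field layout shown here, site indices followed by symbol identifiers and the parameter value, is an assumption):

J 0 1 2 5 0.0213
J 0 2 4 4 -0.0047
h 0 2 -1.2345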


Restoring Interrupted Training 🔄

Resume training using:

$ adabmDCA train [...] -p <file_params> -c <file_chains>
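
With the default output naming, a restart could look like this (paths are illustrative):

$ adabmDCA train -m bmDCA -d msa.fasta -o results -l myrun -p results/myrun_params.dat -c results/myrun_chains.fasta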

Importance Weights 🏋️‍♂️

Provide custom weights with:

--weights <path>

Otherwise, weights are computed automatically and stored as:

<label>_weights.dat

Options:

  • --clustering_seqid <value> – sequence-identity threshold for the clustering used to compute the weights (default: 0.8)
  • --no_reweighting – use uniform weights
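
For example, to disable reweighting, or to tighten the clustering threshold (placeholder names as above):

$ adabmDCA train -m bmDCA -d msa.fasta -o results -l myrun --no_reweighting
$ adabmDCA train -m bmDCA -d msa.fasta -o results -l myrun --clustering_seqid 0.9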

Choosing the Alphabet 🔠

Default alphabet: protein.

Specify alternatives:

  • RNA → --alphabet rna
  • DNA → --alphabet dna
  • Custom → --alphabet ABCD-
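
For instance, to train on an RNA alignment (rna_msa.fasta is a placeholder):

$ adabmDCA train -m bmDCA -d rna_msa.fasta -o results -l rna_run --alphabet rna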

eaDCA 🌱

Enable with:

--model eaDCA

Key hyperparameters:

  • --factivate – fraction of inactive couplings activated at each graph update (default: 0.001)
  • --gsteps – parameter updates per graph update (default: 10)

Recommended: reduce the number of Monte Carlo sweeps to 5 (via --nsweeps), as in the example below.
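
Putting this together, an eaDCA run with the recommended sweep count might look like (placeholder names as above):

$ adabmDCA train -m eaDCA -d msa.fasta -o results -l myrun --factivate 0.001 --gsteps 10 --nsweeps 5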


edDCA ✂️ (Decimated DCA)

Run decimation:

$ adabmDCA train -m edDCA -d <fasta_file> -p <params> -c <chains>

Two workflows:

  1. Use pre‑trained bmDCA (params + chains)
  2. Train bmDCA automatically, then decimate

Key hyperparameters:

  • --gsteps – parameter updates per decimation step (default: 10)
  • --drate – pruning fraction (default: 0.01)
  • --density – target graph density (default: 0.02)
  • --target – Pearson threshold (default: 0.95)
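
For example, decimating a pre-trained model down to the default 2% density (params.dat and chains.fasta are placeholders for the pre-trained model files):

$ adabmDCA train -m edDCA -d msa.fasta -p params.dat -c chains.fasta --drate 0.01 --density 0.02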

Choosing Hyperparameters 🎚️

Defaults work well for clean and moderately diverse MSAs. For more difficult datasets, consider tuning:


Learning Rate

  • Default: 0.01
  • If chains mix poorly, try:
    --lr 0.005

Number of Markov Chains

  • Default: 10,000
  • Using fewer chains reduces the memory required for training, but it can also slow down convergence.
  • Change with:
    --nchains <value>

Number of Monte Carlo Steps

  • Controlled by --nsweeps
  • Default: 10
  • Recommended range: 10–50. Higher values drastically increase training time and, empirically, do little to improve convergence.

Regularization (Pseudocount)

Controlled by --pseudocount.

Default:

α = 1 / M_eff

where M_eff is the effective number of sequences (the sum of the importance weights).

Increasing α (e.g., α = 0.001 or 0.01) may help when training struggles to converge or when the mixing time of the model is very high, but it also makes the model less expressive.
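
As an illustration, a run combining several of these adjustments for a difficult MSA might look like this (placeholder names as above; the values are examples, not recommendations):

$ adabmDCA train -m bmDCA -d msa.fasta -o results -l myrun --lr 0.005 --nchains 5000 --nsweeps 20 --pseudocount 0.01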