Configuration

The main input to the ArcticTraining CLI is a YAML configuration file that defines fields for the TrainerConfig class. TrainerConfig is a Pydantic configuration model that also contains the sub-configurations for data, model, tokenizer, optimizer, and so on.
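As a rough illustration, the Python dict below mirrors what a minimal YAML file might contain once parsed. According to the schema, only `model` (with `model.name_or_path`) and `data` (with `data.sources`) are required. The model and dataset names are placeholders. A bare dataset name is used as the source because the `DataSourceConfig.type` description says it defaults to 'huggingface' when only a name or path is given. `missing_required` is a hypothetical helper for illustration, not part of ArcticTraining.

```python
from typing import Any

# Parsed equivalent of a minimal YAML config (placeholder values):
#   type: sft
#   model:
#     name_or_path: org/model-name
#   data:
#     sources:
#       - org/dataset-name
minimal_config: dict[str, Any] = {
    "type": "sft",
    "model": {"name_or_path": "org/model-name"},
    "data": {"sources": ["org/dataset-name"]},
}

# Required keys, taken from the "required" lists in the JSON schema.
REQUIRED = {
    (): ["model", "data"],
    ("model",): ["name_or_path"],
    ("data",): ["sources"],
}

def missing_required(cfg: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return dotted paths of required keys that are absent."""
    missing = []
    for prefix, keys in REQUIRED.items():
        node: Any = cfg
        for part in prefix:
            node = node.get(part, {}) if isinstance(node, dict) else {}
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(prefix + (key,)))
    return missing

print(missing_required(minimal_config))  # []
print(missing_required({"model": {}}))   # ['data', 'model.name_or_path', 'data.sources']
```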

pydantic model arctic_training.config.trainer.TrainerConfig

Bases: BaseConfig

Base Trainer Configuration.

JSON schema:

{
   "title": "TrainerConfig",
   "description": "Base Trainer Configuration.",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "sft",
         "description": "Trainer type. ",
         "title": "Type",
         "type": "string"
      },
      "code": {
         "default": "train.py",
         "description": "Path to the python script containing custom trainer implementation. ",
         "format": "path",
         "title": "Code",
         "type": "string"
      },
      "skip_validation": {
         "default": false,
         "description": "Skips validation of types for subconfigs and registered classes. ",
         "title": "Skip Validation",
         "type": "boolean"
      },
      "model": {
         "$ref": "#/$defs/ModelConfig",
         "description": "Model configuration. "
      },
      "tokenizer": {
         "$ref": "#/$defs/TokenizerConfig",
         "description": "Tokenizer configuration. "
      },
      "data": {
         "$ref": "#/$defs/DataConfig",
         "description": "Train and eval data configuration. "
      },
      "logger": {
         "$ref": "#/$defs/LoggerConfig",
         "description": "Logger configuration. "
      },
      "wandb": {
         "$ref": "#/$defs/WandBConfig",
         "description": "Weights and Biases configuration. "
      },
      "scheduler": {
         "$ref": "#/$defs/SchedulerConfig",
         "description": "Scheduler configuration. "
      },
      "optimizer": {
         "$ref": "#/$defs/OptimizerConfig",
         "description": "Optimizer configuration. "
      },
      "deepspeed": {
         "additionalProperties": true,
         "default": {},
         "description": "DeepSpeed config dict. Will be automatically filled if not provided by the user. ",
         "title": "Deepspeed",
         "type": "object"
      },
      "epochs": {
         "default": 1,
         "description": "Number of epochs to train. ",
         "minimum": 0,
         "title": "Epochs",
         "type": "integer"
      },
      "loss_log_interval": {
         "default": 1,
         "description": "Number of steps between logging loss. ",
         "minimum": 0,
         "title": "Loss Log Interval",
         "type": "integer"
      },
      "train_log_iter_interval": {
         "default": 1,
         "description": "Iters between training metric log outputs. `0` is off, only intervals of `1` currently supported. ",
         "enum": [
            0,
            1
         ],
         "title": "Train Log Iter Interval",
         "type": "integer"
      },
      "train_log_metrics_path": {
         "default": "train-log-metrics.jsonl",
         "description": ".jsonl path to log precise metrics according to the `train_log_iter_interval` schedule. Defaults to `./train-log-metrics.jsonl` ",
         "format": "path",
         "title": "Train Log Metrics Path",
         "type": "string"
      },
      "gradient_accumulation_steps": {
         "default": 1,
         "description": "Number of gradient accumulation steps. ",
         "minimum": 1,
         "title": "Gradient Accumulation Steps",
         "type": "integer"
      },
      "micro_batch_size": {
         "default": 1,
         "description": "Micro batch size per GPU. ",
         "minimum": 1,
         "title": "Micro Batch Size",
         "type": "integer"
      },
      "sequence_parallel_size": {
         "default": 1,
         "description": "Sequence Parallelism Degree. Disabled if set to 1 ",
         "minimum": 1,
         "title": "Sequence Parallel Size",
         "type": "integer"
      },
      "activation_checkpoint_cpu_offload": {
         "default": false,
         "description": "Offload activation checkpoint tensors to cpu. Enables a much longer sequence length. It is not very beneficial if sequence length is <64k  ",
         "title": "Activation Checkpoint Cpu Offload",
         "type": "boolean"
      },
      "tiled_mlp_compute": {
         "default": false,
         "description": "Tile the MLP computation to save GPU memory. Currently only limited architectures supported, but can be expanded to more. ",
         "title": "Tiled Mlp Compute",
         "type": "boolean"
      },
      "seed": {
         "default": 42,
         "description": "Random seed value for numpy, python.random, torch, and transformers. ",
         "minimum": 0,
         "title": "Seed",
         "type": "integer"
      },
      "checkpoint": {
         "default": [],
         "description": "Checkpoint configurations. Multiple checkpoint engines may be used together. ",
         "items": {
            "$ref": "#/$defs/CheckpointConfig"
         },
         "title": "Checkpoint",
         "type": "array"
      },
      "train_iters": {
         "default": 0,
         "description": "Maximum number of training iterations. ",
         "minimum": 0,
         "title": "Train Iters",
         "type": "integer"
      },
      "eval_frequency": {
         "default": 0,
         "minimum": 0,
         "title": "Eval Frequency",
         "type": "integer"
      },
      "exit_iteration": {
         "default": 0,
         "description": "Force exit of training after specified iteration count (useful for debugging). ",
         "minimum": 0,
         "title": "Exit Iteration",
         "type": "integer"
      },
      "min_iterations": {
         "default": 0,
         "description": "When >0, the training dataset will be replicated until there is enough data to run this many iterations. ",
         "minimum": 0,
         "title": "Min Iterations",
         "type": "integer"
      },
      "overfit_first_batch": {
         "default": false,
         "description": "Train only on repetitions of the first training batch. Useful for development. ",
         "title": "Overfit First Batch",
         "type": "boolean"
      },
      "mem_profiler": {
         "default": null,
         "description": "Enable memory profiling. ",
         "enum": [
            null,
            "step",
            "e2e"
         ],
         "title": "Mem Profiler"
      },
      "mem_profiler_dir": {
         "description": "Path to save memory profiling results. Defaults to `logger.output_dir/mem-prof`. ",
         "format": "path",
         "title": "Mem Profiler Dir",
         "type": "string"
      },
      "mem_profiler_max_entries": {
         "default": 100000,
         "description": "Maximum number of entries to store in the memory profiler. ",
         "minimum": 1,
         "title": "Mem Profiler Max Entries",
         "type": "integer"
      },
      "kill_switch_path": {
         "default": "/tmp/at_kill_switch",
         "description": "Path to a file that can be used to trigger a graceful shutdown mid-training (sets early exit to True). ",
         "format": "path",
         "title": "Kill Switch Path",
         "type": "string"
      }
   },
   "$defs": {
      "CheckpointConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Checkpoint engine type. ",
               "title": "Type",
               "type": "string"
            },
            "output_dir": {
               "description": "Checkpoint output directory. If directory does not exist, it will be created. ",
               "format": "path",
               "title": "Output Dir",
               "type": "string"
            },
            "enabled": {
               "default": true,
               "description": "Enable this checkpoint engine. ",
               "title": "Enabled",
               "type": "boolean"
            },
            "auto_resume": {
               "default": false,
               "description": "If a checkpoint is found in the output directory, resume training from that checkpoint. ",
               "title": "Auto Resume",
               "type": "boolean"
            },
            "save_every_n_steps": {
               "default": 0,
               "description": "How often to trigger a checkpoint save by training global step count. ",
               "minimum": 0,
               "title": "Save Every N Steps",
               "type": "integer"
            },
            "save_every_n_epochs": {
               "default": 0,
               "description": "How often to trigger a checkpoint save by training epoch count. ",
               "minimum": 0,
               "title": "Save Every N Epochs",
               "type": "integer"
            },
            "save_end_of_training": {
               "default": false,
               "description": "Whether to save a checkpoint at the end of training. ",
               "title": "Save End Of Training",
               "type": "boolean"
            }
         },
         "required": [
            "output_dir"
         ],
         "title": "CheckpointConfig",
         "type": "object"
      },
      "DataConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Data factory type. Defaults to the `data_factory_type` in the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "sources": {
               "description": "List of data sources to use for training. These must be registered `DataSource`. ",
               "items": {
                  "$ref": "#/$defs/DataSourceConfig"
               },
               "title": "Sources",
               "type": "array"
            },
            "eval_sources": {
               "default": [],
               "description": "list of data sources to use for evaluation. These must be registered `DataSource`. ",
               "items": {
                  "$ref": "#/$defs/DataSourceConfig"
               },
               "title": "Eval Sources",
               "type": "array"
            },
            "train_eval_split": {
               "default": [
                  1.0,
                  0.0
               ],
               "description": "How much of the training data to use for evaluation. ",
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "number"
                  }
               ],
               "title": "Train Eval Split",
               "type": "array"
            },
            "max_length": {
               "default": 8192,
               "description": "Maximum length of the input sequence. ",
               "title": "Max Length",
               "type": "integer"
            },
            "num_proc": {
               "default": 16,
               "description": "Number of processes to use for data loading. ",
               "title": "Num Proc",
               "type": "integer"
            },
            "dl_num_workers": {
               "default": 2,
               "description": "Number of DL workers per gpu. ",
               "title": "Dl Num Workers",
               "type": "integer"
            },
            "seed": {
               "default": 42,
               "description": "Seed for data loading. ",
               "title": "Seed",
               "type": "integer"
            },
            "use_data_cache": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Whether to cache loaded data. ",
               "title": "Use Data Cache"
            },
            "cache_processed_data": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Deprecated, please use \"use_data_cache\". ",
               "title": "Cache Processed Data"
            },
            "cache_dir": {
               "default": "/tmp",
               "description": "Directory to store cached data. ",
               "format": "path",
               "title": "Cache Dir",
               "type": "string"
            },
            "cache_fs_type": {
               "default": "auto",
               "enum": [
                  "auto",
                  "local",
                  "shared"
               ],
               "title": "Cache Fs Type",
               "type": "string"
            }
         },
         "required": [
            "sources"
         ],
         "title": "DataConfig",
         "type": "object"
      },
      "DataSourceConfig": {
         "additionalProperties": false,
         "description": "Base DataSource configuration.",
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Data source type. Defaults to 'huggingface' if only a dataset name or path is provided.",
               "title": "Type",
               "type": "string"
            },
            "split": {
               "default": "",
               "description": "Which split the data source is used for. This will be automatically set to either \"train\" or \"eval\" if no value is passed.\n\nFor HFDataSource, this can be any value supported by Dataset slice splits:\nhttps://huggingface.co/docs/datasets/en/loading#slice-splits.",
               "title": "Split",
               "type": "string"
            },
            "sample_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Ratio of the dataset to randomly sample. If None, all examples are used.",
               "title": "Sample Ratio"
            },
            "sample_count": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Number of examples to randomly sample. If None, all examples are used.",
               "title": "Sample Count"
            },
            "sample_seed": {
               "default": 42,
               "description": "Seed for random sampling. Used only if `sample_ratio` or `sample_count` is set.",
               "title": "Sample Seed",
               "type": "integer"
            },
            "process": {
               "default": true,
               "description": "Whether to process the data with the data factory `process` function (e.g., tokenization for SFTDataFactory). ",
               "title": "Process",
               "type": "boolean"
            }
         },
         "title": "DataSourceConfig",
         "type": "object"
      },
      "LoggerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "output_dir": {
               "default": "/dev/null",
               "description": "Output directory for log files. ",
               "format": "path",
               "title": "Output Dir",
               "type": "string"
            },
            "level": {
               "default": "WARNING",
               "description": "Log level for the logger. ",
               "title": "Level",
               "type": "string"
            },
            "print_output_ranks": {
               "anyOf": [
                  {
                     "const": "*",
                     "type": "string"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": [
                  0
               ],
               "description": "Which ranks will print logs. Either a list of ranks or \"*\" for all ranks. ",
               "title": "Print Output Ranks"
            },
            "file_output_ranks": {
               "anyOf": [
                  {
                     "const": "*",
                     "type": "string"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": "*",
               "description": "Which ranks will output logs to a file. Either a list of ranks or \"*\" for all ranks. ",
               "title": "File Output Ranks"
            }
         },
         "title": "LoggerConfig",
         "type": "object"
      },
      "ModelConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Model factory type. ",
               "title": "Type",
               "type": "string"
            },
            "name_or_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "format": "path",
                     "type": "string"
                  }
               ],
               "description": "Model name (as described in Hugging Face model hub) or local path to model checkpoint. ",
               "title": "Name Or Path"
            },
            "dtype": {
               "default": "torch.bfloat16",
               "description": "Data type for model weights. ",
               "examples": [
                  "float32",
                  "bfloat16"
               ],
               "title": "Torch Dtype",
               "type": "string"
            },
            "save_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Name to use when saving the model. ",
               "title": "Save Name"
            },
            "attn_implementation": {
               "default": "sdpa",
               "description": "Attention implementation to use. ",
               "title": "Attn Implementation",
               "type": "string"
            },
            "disable_activation_checkpoint": {
               "default": false,
               "description": "Disable the use of activation checkpointing. ",
               "title": "Disable Activation Checkpoint",
               "type": "boolean"
            },
            "peft_config": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Configuration for Parameter Efficient Fine Tuning. ",
               "title": "Peft Config"
            }
         },
         "required": [
            "name_or_path"
         ],
         "title": "ModelConfig",
         "type": "object"
      },
      "OptimizerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Optimizer factory type. Defaults to the `optimizer_factory_type` of the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "weight_decay": {
               "default": 0.1,
               "description": "Coefficient for L2 regularization applied to the optimizer's weights. ",
               "minimum": 0.0,
               "title": "Weight Decay",
               "type": "number"
            },
            "betas": {
               "default": [
                  0.9,
                  0.999
               ],
               "description": "Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam). ",
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "number"
                  }
               ],
               "title": "Betas",
               "type": "array"
            },
            "lr": {
               "default": 0.0005,
               "description": "The initial learning rate. ",
               "minimum": 0.0,
               "title": "Lr",
               "type": "number"
            }
         },
         "title": "OptimizerConfig",
         "type": "object"
      },
      "SchedulerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Scheduler factory type. Defaults to the `scheduler_factory_type` of the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "lr": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The initial learning rate. Deprecated in favor of `optimizer.lr`. ",
               "title": "Lr"
            }
         },
         "title": "SchedulerConfig",
         "type": "object"
      },
      "TokenizerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Tokenizer factory type. Defaults to the `tokenizer_factory_type` of the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "name_or_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "format": "path",
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "",
               "description": "Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer. ",
               "title": "Name Or Path"
            }
         },
         "title": "TokenizerConfig",
         "type": "object"
      },
      "WandBConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "enable": {
               "default": false,
               "description": "Whether to enable Weights and Biases logging. ",
               "title": "Enable",
               "type": "boolean"
            },
            "entity": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Weights and Biases entity name. ",
               "title": "Entity"
            },
            "project": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "arctic-training",
               "description": "Weights and Biases project name. ",
               "title": "Project"
            },
            "name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Weights and Biases run name. ",
               "title": "Name"
            }
         },
         "title": "WandBConfig",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "model",
      "data"
   ]
}
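Several of the fields above combine into the number of samples consumed per optimizer step. The sketch below assumes the usual convention for sequence parallelism, where the `sequence_parallel_size` ranks in a group work on the same samples, so the data-parallel degree is `world_size // sequence_parallel_size`. `effective_batch_size` is a hypothetical helper, not an ArcticTraining function.

```python
def effective_batch_size(
    micro_batch_size: int,
    gradient_accumulation_steps: int,
    world_size: int,
    sequence_parallel_size: int = 1,
) -> int:
    """Samples consumed per optimizer step (hypothetical helper).

    Assumes ranks in the same sequence-parallel group share samples, so only
    world_size // sequence_parallel_size ranks contribute distinct data.
    """
    if world_size % sequence_parallel_size != 0:
        raise ValueError("world_size must be divisible by sequence_parallel_size")
    data_parallel_size = world_size // sequence_parallel_size
    return micro_batch_size * gradient_accumulation_steps * data_parallel_size

# 8 GPUs, micro_batch_size=1, gradient_accumulation_steps=4 -> 32 samples per step
print(effective_batch_size(1, 4, 8))
# Same setup with sequence_parallel_size=8 -> a single data-parallel group -> 4 samples
print(effective_batch_size(1, 4, 8, sequence_parallel_size=8))
```

With sequence parallelism enabled, raising `gradient_accumulation_steps` is one way to keep the per-step sample count unchanged.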

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
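`extra = forbid` means an unknown or misspelled key is rejected instead of being silently ignored. The JSON schema expresses the same rule as `"additionalProperties": false`. The stdlib sketch below imitates that check for the top-level keys only. `TOP_LEVEL_FIELDS` is copied from the Fields list, and `reject_unknown_keys` is hypothetical. The real model raises a Pydantic `ValidationError` rather than a `ValueError`.

```python
TOP_LEVEL_FIELDS = frozenset({
    "activation_checkpoint_cpu_offload", "checkpoint", "code", "data",
    "deepspeed", "epochs", "eval_frequency", "exit_iteration",
    "gradient_accumulation_steps", "kill_switch_path", "logger",
    "loss_log_interval", "mem_profiler", "mem_profiler_dir",
    "mem_profiler_max_entries", "micro_batch_size", "min_iterations",
    "model", "optimizer", "overfit_first_batch", "scheduler", "seed",
    "sequence_parallel_size", "skip_validation", "tiled_mlp_compute",
    "tokenizer", "train_iters", "train_log_iter_interval",
    "train_log_metrics_path", "type", "wandb",
})

def reject_unknown_keys(cfg: dict) -> None:
    """Raise if cfg has keys that are not TrainerConfig fields (sketch of extra='forbid')."""
    unknown = sorted(set(cfg) - TOP_LEVEL_FIELDS)
    if unknown:
        raise ValueError(f"Extra inputs are not permitted: {unknown}")

reject_unknown_keys({"model": {}, "data": {}, "epochs": 2})  # passes silently
try:
    reject_unknown_keys({"model": {}, "data": {}, "epoch": 2})  # typo for "epochs"
except ValueError as err:
    print(err)  # Extra inputs are not permitted: ['epoch']
```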

Fields:

activation_checkpoint_cpu_offload (bool)
checkpoint (List[arctic_training.config.checkpoint.CheckpointConfig])
code (pathlib.Path)
data (arctic_training.config.data.DataConfig)
deepspeed (Dict[str, Any])
epochs (int)
eval_frequency (int)
exit_iteration (int)
gradient_accumulation_steps (int)
kill_switch_path (pathlib.Path)
logger (arctic_training.config.logger.LoggerConfig)
loss_log_interval (int)
mem_profiler (Literal[None, 'step', 'e2e'])
mem_profiler_dir (pathlib.Path)
mem_profiler_max_entries (int)
micro_batch_size (int)
min_iterations (int)
model (arctic_training.config.model.ModelConfig)
optimizer (arctic_training.config.optimizer.OptimizerConfig)
overfit_first_batch (bool)
scheduler (arctic_training.config.scheduler.SchedulerConfig)
seed (int)
sequence_parallel_size (int)
skip_validation (bool)
tiled_mlp_compute (bool)
tokenizer (arctic_training.config.tokenizer.TokenizerConfig)
train_iters (int)
train_log_iter_interval (Literal[0, 1])
train_log_metrics_path (pathlib.Path)
type (str)
wandb (arctic_training.config.wandb.WandBConfig)

Validators:

build_deepspeed_config » all fields
coerce_deepspeed_human_friendly_values » deepspeed
init_checkpoint_configs » checkpoint
init_data_config » data
init_dist » all fields
init_model_config » model
init_optimizer_config » optimizer
init_scheduler_config » scheduler
init_tokenizer_config » tokenizer
initialize_logger » logger
mem_profiler_mkdir » all fields
set_tokenizer » all fields
train_log_metrics_path_prep » all fields
validate_eval_frequency » all fields
validate_single_checkpoint_resume » all fields

field type: str = 'sft'

Trainer type.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field code: Path = PosixPath('train.py')

Path to the Python script containing a custom trainer implementation.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
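A standard-library way to import a script from a path such as `code` is `importlib.util.spec_from_file_location`. This is a sketch of that general technique, not necessarily how ArcticTraining loads the file. `load_script`, the module name `custom_train`, and the `MyTrainer` class in the usage are hypothetical.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType

def load_script(path: Path, module_name: str = "custom_train") -> ModuleType:
    """Import a Python file by path so the classes it defines become available."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing, as in the importlib docs recipe
    spec.loader.exec_module(module)
    return module

# Usage: write a throwaway script and import it.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "train.py"
    script.write_text("class MyTrainer:\n    name = 'my-trainer'\n")
    module = load_script(script)

print(module.MyTrainer.name)  # my-trainer
```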

field skip_validation: bool = False

Skips validation of types for subconfigs and registered classes.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field model: ModelConfig [Required]

Model configuration.

Validated by:

build_deepspeed_config
init_dist
init_model_config
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field tokenizer: TokenizerConfig [Optional]

Tokenizer configuration.

Validated by:

build_deepspeed_config
init_dist
init_tokenizer_config
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
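The `tokenizer` field is optional, and `TokenizerConfig.name_or_path` defaults to an empty string. One plausible reading of the `set_tokenizer` validator is that it falls back to `model.name_or_path` when no tokenizer path is given. That behavior is an assumption here, and `resolve_tokenizer_path` is a hypothetical helper.

```python
from typing import Optional

def resolve_tokenizer_path(model_name_or_path: str, tokenizer_name_or_path: Optional[str]) -> str:
    """Assumed fallback: reuse the model path when no tokenizer path is configured."""
    # Both None and the schema default "" are treated as "not set".
    return tokenizer_name_or_path or model_name_or_path

print(resolve_tokenizer_path("org/model-name", ""))         # org/model-name
print(resolve_tokenizer_path("org/model-name", "org/tok"))  # org/tok
```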

field data: DataConfig [Required]

Train and eval data configuration.

Validated by:

build_deepspeed_config
init_data_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field logger: LoggerConfig [Optional]

Logger configuration.

Validated by:

build_deepspeed_config
init_dist
initialize_logger
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field wandb: WandBConfig [Optional]

Weights and Biases configuration.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field scheduler: SchedulerConfig [Optional]

Scheduler configuration.

Validated by:

build_deepspeed_config
init_dist
init_scheduler_config
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field optimizer: OptimizerConfig [Optional]

Optimizer configuration.

Validated by:

build_deepspeed_config
init_dist
init_optimizer_config
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field deepspeed: Dict[str, Any] = {}

DeepSpeed configuration dictionary. It is filled in automatically if not provided by the user.

Validated by:

build_deepspeed_config
coerce_deepspeed_human_friendly_values
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field epochs: int = 1

Number of epochs to train.

Constraints:

ge = 0

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field loss_log_interval: Annotated[int] = 1

Number of steps between logging loss.

Constraints:

ge = 0
func = parse_human_val

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
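Several integer fields (loss_log_interval, train_iters, min_iterations, mem_profiler_max_entries and others) are passed through parse_human_val, which suggests they accept human-friendly spellings of large numbers. The exact syntax parse_human_val accepts is not documented on this page, so the following is a hypothetical stand-in that assumes underscores, scientific notation and K/M/B suffixes; check the ArcticTraining source before relying on any particular form.

```python
import re

# Hypothetical stand-in for parse_human_val; the real function's accepted
# syntax may differ. Accepts e.g. 1000, "1_000", "1e3", "2K", "1.5M", "3B".
_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_human_int(value) -> int:
    if isinstance(value, int):
        return value
    text = str(value).strip().replace("_", "").upper()
    match = re.fullmatch(r"([0-9]*\.?[0-9]+(?:E[0-9]+)?)([KMB]?)", text)
    if match is None:
        raise ValueError(f"cannot parse {value!r} as an integer")
    number, suffix = match.groups()
    result = float(number) * _SUFFIXES.get(suffix, 1)
    if not result.is_integer():
        raise ValueError(f"{value!r} is not a whole number")
    return int(result)


print(parse_human_int("100K"))  # 100000
```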

field train_log_iter_interval: Literal[0, 1] = 1

Number of iterations between training-metric log outputs. 0 disables this logging; only an interval of 1 is currently supported.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field train_log_metrics_path: Path = PosixPath('train-log-metrics.jsonl')

Path to the .jsonl file in which precise metrics are logged according to the train_log_iter_interval schedule. Defaults to ./train-log-metrics.jsonl.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
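A .jsonl file holds one JSON object per line, which makes it easy to append one record per iteration and to load the whole log afterwards. A minimal sketch of writing and reading such a file follows; the record keys (iter, loss) are hypothetical, since the metrics ArcticTraining actually writes are not listed on this page.

```python
import json
import tempfile
from pathlib import Path


def append_metrics(path: Path, record: dict) -> None:
    # Append one JSON object as a single line (the .jsonl convention).
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_metrics(path: Path) -> list[dict]:
    # Parse every non-empty line back into a dict.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "train-log-metrics.jsonl"
    for it in range(1, 4):
        append_metrics(log_path, {"iter": it, "loss": round(1.0 / it, 4)})
    records = read_metrics(log_path)

print([r["iter"] for r in records])  # [1, 2, 3]
```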

field gradient_accumulation_steps: int = 1

Number of gradient accumulation steps.

Constraints:

ge = 1

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field micro_batch_size: int = 1

Micro batch size per GPU.

Constraints:

ge = 1

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
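micro_batch_size is per GPU, so the number of samples consumed per optimizer step also depends on gradient_accumulation_steps and on how many ranks see distinct data. The sketch below uses the standard DeepSpeed relation train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * data-parallel size. It assumes, as is usual for sequence parallelism but not stated on this page, that ranks in one sequence-parallel group share a sample, so the data-parallel size is world_size / sequence_parallel_size. The helpers are illustrative; build_deepspeed_config may fill these keys differently.

```python
def global_batch_size(micro_batch_size: int, gradient_accumulation_steps: int,
                      world_size: int, sequence_parallel_size: int = 1) -> int:
    """Samples per optimizer step (sketch; assumes SP ranks share samples)."""
    if world_size % sequence_parallel_size != 0:
        raise ValueError("world_size must be divisible by sequence_parallel_size")
    data_parallel_size = world_size // sequence_parallel_size
    return micro_batch_size * gradient_accumulation_steps * data_parallel_size


def deepspeed_batch_keys(micro_batch_size: int, gradient_accumulation_steps: int,
                         world_size: int, sequence_parallel_size: int = 1) -> dict:
    # Standard DeepSpeed config keys for batch sizing.
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": gradient_accumulation_steps,
        "train_batch_size": global_batch_size(
            micro_batch_size, gradient_accumulation_steps,
            world_size, sequence_parallel_size),
    }


# 8 GPUs, micro batch 2, 4 accumulation steps:
print(global_batch_size(2, 4, 8))     # 64
print(global_batch_size(2, 4, 8, 2))  # 32 (sequence parallelism degree 2)
```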

field sequence_parallel_size: int = 1

Sequence parallelism degree. Sequence parallelism is disabled when this is set to 1.

Constraints:

ge = 1

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field activation_checkpoint_cpu_offload: bool = False

Offload activation checkpoint tensors to CPU memory. This allows much longer sequence lengths, but brings little benefit when the sequence length is below 64k.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field tiled_mlp_compute: bool = False

Tile the MLP computation to save GPU memory. Only a limited set of architectures is currently supported, though support can be extended to others.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field seed: int = 42

Random seed for numpy, Python's random module, torch, and transformers.

Constraints:

ge = 0

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
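Seeding several random number generators from one value is a common pattern. A minimal sketch follows that tolerates numpy or torch being absent; transformers.set_seed does a similar job when transformers is installed. This is not ArcticTraining's own seeding code.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed Python's random module, plus numpy and torch if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    except ImportError:
        pass


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True
```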

field checkpoint: List[CheckpointConfig] = []

Checkpoint configurations. Multiple checkpoint engines may be used together.

Validated by:

build_deepspeed_config
init_checkpoint_configs
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field train_iters: Annotated[int] = 0

Maximum number of training iterations.

Constraints:

ge = 0
func = parse_human_val

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field exit_iteration: int = 0

Force exit of training after specified iteration count (useful for debugging).

Constraints:

ge = 0

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field min_iterations: Annotated[int] = 0

When >0, the training dataset will be replicated until there is enough data to run this many iterations.

Constraints:

ge = 0
func = parse_human_val

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
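How many copies of the dataset min_iterations implies is a ceiling division. The sketch assumes the caller knows how many samples one iteration consumes and that the dataset is replicated in whole copies; how ArcticTraining counts iterations and replicates data is not specified here, and copies_needed is a hypothetical helper.

```python
def copies_needed(num_samples: int, samples_per_iteration: int, min_iterations: int) -> int:
    """Whole copies of the dataset needed to cover min_iterations (sketch)."""
    if min_iterations <= 0:
        return 1  # feature disabled: use the dataset once
    required = min_iterations * samples_per_iteration
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-required // num_samples))


# 1000 samples, 32 samples per iteration, at least 100 iterations:
# 3200 samples are needed, so 4 copies (4000 samples, 125 iterations).
print(copies_needed(1000, 32, 100))  # 4
```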

field overfit_first_batch: bool = False

Train only on repetitions of the first training batch. Useful for development.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field mem_profiler: Literal[None, 'step', 'e2e'] = None

Enable memory profiling. None (the default) disables it; 'step' and 'e2e' select the profiling mode.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field mem_profiler_dir: Path [Optional]

Path to save memory profiling results. Defaults to logger.output_dir/mem-prof.

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field mem_profiler_max_entries: Annotated[int] = 100000

Maximum number of entries to store in the memory profiler.

Constraints:

ge = 1
func = parse_human_val

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume

field kill_switch_path: Path = PosixPath('/tmp/at_kill_switch')

Path to a file that can be used to trigger a graceful shutdown mid-training (sets early exit to True).

Validated by:

build_deepspeed_config
init_dist
mem_profiler_mkdir
set_tokenizer
train_log_metrics_path_prep
validate_eval_frequency
validate_single_checkpoint_resume
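kill_switch_path and exit_iteration both end training early. The loop below is a hypothetical sketch of how the two checks can be combined, assuming the trainer simply polls for the file's existence once per iteration; it is not ArcticTraining's trainer loop.

```python
import tempfile
from pathlib import Path


def run_training(train_iters: int, exit_iteration: int, kill_switch_path: Path) -> int:
    """Return the number of iterations completed before stopping (sketch)."""
    completed = 0
    for iteration in range(1, train_iters + 1):
        if kill_switch_path.exists():
            break  # graceful shutdown requested from outside the process
        # ... one training step would run here ...
        completed = iteration
        if exit_iteration and iteration >= exit_iteration:
            break  # forced exit, e.g. for debugging
    return completed


with tempfile.TemporaryDirectory() as tmp:
    switch = Path(tmp) / "at_kill_switch"
    print(run_training(10, 3, switch))  # 3
    switch.touch()  # from a shell: touch /tmp/at_kill_switch
    print(run_training(10, 0, switch))  # 0
```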

pydantic model arctic_training.config.checkpoint.CheckpointConfig

Bases: BaseConfig

JSON schema:

{
   "title": "CheckpointConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Checkpoint engine type. ",
         "title": "Type",
         "type": "string"
      },
      "output_dir": {
         "description": "Checkpoint output directory. If directory does not exist, it will be created. ",
         "format": "path",
         "title": "Output Dir",
         "type": "string"
      },
      "enabled": {
         "default": true,
         "description": "Enable this checkpoint engine. ",
         "title": "Enabled",
         "type": "boolean"
      },
      "auto_resume": {
         "default": false,
         "description": "If a checkpoint is found in the output directory, resume training from that checkpoint. ",
         "title": "Auto Resume",
         "type": "boolean"
      },
      "save_every_n_steps": {
         "default": 0,
         "description": "How often to trigger a checkpoint save by training global step count. ",
         "minimum": 0,
         "title": "Save Every N Steps",
         "type": "integer"
      },
      "save_every_n_epochs": {
         "default": 0,
         "description": "How often to trigger a checkpoint save by training epoch count. ",
         "minimum": 0,
         "title": "Save Every N Epochs",
         "type": "integer"
      },
      "save_end_of_training": {
         "default": false,
         "description": "Whether to save a checkpoint at the end of training. ",
         "title": "Save End Of Training",
         "type": "boolean"
      }
   },
   "additionalProperties": false,
   "required": [
      "output_dir"
   ]
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

auto_resume (bool)
enabled (bool)
output_dir (pathlib.Path)
save_end_of_training (bool)
save_every_n_epochs (int)
save_every_n_steps (int)
type (str)

Validators:

resolve_output_dir » output_dir

field type: str = '': Checkpoint engine type.

field output_dir: Path [Required]

Checkpoint output directory. If directory does not exist, it will be created.

Validated by:

resolve_output_dir

field enabled: bool = True: Enable this checkpoint engine.

field auto_resume: bool = False: If a checkpoint is found in the output directory, resume training from that checkpoint.

field save_every_n_steps: Annotated[int] = 0

How often to trigger a checkpoint save by training global step count.

Constraints:

ge = 0
func = parse_human_val

field save_every_n_epochs: Annotated[int] = 0

How often to trigger a checkpoint save by training epoch count.

Constraints:

ge = 0
func = parse_human_val

field save_end_of_training: bool = False: Whether to save a checkpoint at the end of training.
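The three save triggers are independent of each other. The sketch below assumes that a value of 0 disables the corresponding interval (both defaults are 0, the minimum allowed) and that epochs are counted from 1; SaveSchedule is a hypothetical helper, not part of the checkpoint engine API.

```python
from dataclasses import dataclass


@dataclass
class SaveSchedule:
    # Mirrors the three CheckpointConfig trigger fields.
    save_every_n_steps: int = 0
    save_every_n_epochs: int = 0
    save_end_of_training: bool = False

    def should_save_at_step(self, global_step: int) -> bool:
        n = self.save_every_n_steps
        return n > 0 and global_step > 0 and global_step % n == 0

    def should_save_at_epoch_end(self, epoch: int) -> bool:
        # epoch is 1-based: the first completed epoch is epoch 1.
        n = self.save_every_n_epochs
        return n > 0 and epoch > 0 and epoch % n == 0

    def should_save_at_end(self) -> bool:
        return self.save_end_of_training


print(SaveSchedule(save_every_n_steps=500).should_save_at_step(1000))  # True
```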

pydantic model arctic_training.config.data.DataConfig

Bases: BaseConfig

JSON schema:

{
   "title": "DataConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Data factory type. Defaults to the `data_factory_type` in the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "sources": {
         "description": "List of data sources to use for training. These must be registered `DataSource`. ",
         "items": {
            "$ref": "#/$defs/DataSourceConfig"
         },
         "title": "Sources",
         "type": "array"
      },
      "eval_sources": {
         "default": [],
         "description": "list of data sources to use for evaluation. These must be registered `DataSource`. ",
         "items": {
            "$ref": "#/$defs/DataSourceConfig"
         },
         "title": "Eval Sources",
         "type": "array"
      },
      "train_eval_split": {
         "default": [
            1.0,
            0.0
         ],
         "description": "How much of the training data to use for evaluation. ",
         "maxItems": 2,
         "minItems": 2,
         "prefixItems": [
            {
               "type": "number"
            },
            {
               "type": "number"
            }
         ],
         "title": "Train Eval Split",
         "type": "array"
      },
      "max_length": {
         "default": 8192,
         "description": "Maximum length of the input sequence. ",
         "title": "Max Length",
         "type": "integer"
      },
      "num_proc": {
         "default": 16,
         "description": "Number of processes to use for data loading. ",
         "title": "Num Proc",
         "type": "integer"
      },
      "dl_num_workers": {
         "default": 2,
         "description": "Number of DL workers per gpu. ",
         "title": "Dl Num Workers",
         "type": "integer"
      },
      "seed": {
         "default": 42,
         "description": "Seed for data loading. ",
         "title": "Seed",
         "type": "integer"
      },
      "use_data_cache": {
         "anyOf": [
            {
               "type": "boolean"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Whether to cache loaded data. ",
         "title": "Use Data Cache"
      },
      "cache_processed_data": {
         "anyOf": [
            {
               "type": "boolean"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Deprecated, please use \"use_data_cache\". ",
         "title": "Cache Processed Data"
      },
      "cache_dir": {
         "default": "/tmp",
         "description": "Directory to store cached data. ",
         "format": "path",
         "title": "Cache Dir",
         "type": "string"
      },
      "cache_fs_type": {
         "default": "auto",
         "enum": [
            "auto",
            "local",
            "shared"
         ],
         "title": "Cache Fs Type",
         "type": "string"
      }
   },
   "$defs": {
      "DataSourceConfig": {
         "additionalProperties": false,
         "description": "Base DataSource configuration.",
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Data source type. Defaults to 'huggingface' if only a dataset name or path is provided.",
               "title": "Type",
               "type": "string"
            },
            "split": {
               "default": "",
               "description": "Which split the data source is used for. This will be automatically set to either \"train\" or \"eval\" if no value is passed.\n\nFor HFDataSource, this can be any value supported by Dataset slice splits:\nhttps://huggingface.co/docs/datasets/en/loading#slice-splits.",
               "title": "Split",
               "type": "string"
            },
            "sample_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Ratio of the dataset to randomly sample. If None, all examples are used.",
               "title": "Sample Ratio"
            },
            "sample_count": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Number of examples to randomly sample. If None, all examples are used.",
               "title": "Sample Count"
            },
            "sample_seed": {
               "default": 42,
               "description": "Seed for random sampling. Used only if `sample_ratio` or `sample_count` is set.",
               "title": "Sample Seed",
               "type": "integer"
            },
            "process": {
               "default": true,
               "description": "Whether to process the data with the data factory `process` function (e.g., tokenization for SFTDataFactory). ",
               "title": "Process",
               "type": "boolean"
            }
         },
         "title": "DataSourceConfig",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "sources"
   ]
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

cache_dir (pathlib.Path)
cache_fs_type (Literal['auto', 'local', 'shared'])
cache_processed_data (bool | None)
dl_num_workers (int)
eval_sources (List[arctic_training.config.data.DataSourceConfig])
max_length (int)
num_proc (int)
seed (int)
sources (List[arctic_training.config.data.DataSourceConfig])
train_eval_split (Tuple[float, float])
type (str)
use_data_cache (bool | None)

Validators:

deprecate_cache_processed_data » cache_processed_data
deprecate_cache_processed_data » use_data_cache
init_source_configs » eval_sources
init_source_configs » sources
resolve_cache_dir » cache_dir
set_cache_fs_type » all fields
validate_cache_dir » all fields
validate_train_eval_split » all fields

field type: str = ''

Data factory type. Defaults to the data_factory_type in the trainer.

Validated by:

set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field sources: List[DataSourceConfig] [Required]

List of data sources to use for training. Each must be a registered DataSource.

Validated by:

init_source_configs
set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field eval_sources: List[DataSourceConfig] = []

List of data sources to use for evaluation. Each must be a registered DataSource.

Validated by:

init_source_configs
set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field train_eval_split: Tuple[float, float] = (1.0, 0.0)

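How to split the training data between training and evaluation, as a (train, eval) pair of fractions. The default (1.0, 0.0) holds out nothing for evaluation.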

Validated by:

set_cache_fs_type
validate_cache_dir
validate_train_eval_split
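Assuming the pair is read as (train fraction, eval fraction), which matches the default (1.0, 0.0), a split can be sketched as follows. The sum check is only a guess at what validate_train_eval_split enforces, and the real implementation may shuffle before splitting; split_train_eval is hypothetical.

```python
def split_train_eval(samples: list, split: tuple[float, float] = (1.0, 0.0)) -> tuple[list, list]:
    """Split samples into (train, eval) by fractions, without shuffling (sketch)."""
    train_frac, eval_frac = split
    if min(split) < 0 or train_frac + eval_frac > 1.0 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    n = len(samples)
    n_eval = int(n * eval_frac)
    n_train = min(int(n * train_frac), n - n_eval)
    # Eval samples come from the end so the two parts never overlap.
    return samples[:n_train], samples[n - n_eval:]


train, held_out = split_train_eval(list(range(10)), (0.9, 0.1))
print(len(train), held_out)  # 9 [9]
```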

field max_length: Annotated[int] = 8192

Maximum length of the input sequence.

Constraints:

func = parse_human_val

Validated by:

set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field num_proc: int = 16

Number of processes to use for data loading.

Validated by:

set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field dl_num_workers: int = 2

Number of DataLoader workers per GPU.

Validated by:

set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field seed: int = 42

Seed for data loading.

Validated by:

set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field use_data_cache: Optional[bool] = None

Whether to cache loaded data.

Validated by:

deprecate_cache_processed_data
set_cache_fs_type
validate_cache_dir
validate_train_eval_split

field cache_processed_data: Optional[bool] = None

Deprecated; use use_data_cache instead.

Validated by:

deprecate_cache_processed_data
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
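cache_processed_data is deprecated in favor of use_data_cache, and both fields pass through deprecate_cache_processed_data. A common way to handle such a rename is to warn and forward the old value when the new field is unset. The sketch below shows that pattern on a plain dict, under the assumption that ArcticTraining behaves similarly; migrate_cache_flag is hypothetical.

```python
import warnings


def migrate_cache_flag(data_cfg: dict) -> dict:
    """Forward the deprecated cache_processed_data to use_data_cache (sketch)."""
    cfg = dict(data_cfg)
    old = cfg.get("cache_processed_data")
    if old is not None:
        warnings.warn("cache_processed_data is deprecated; use use_data_cache",
                      DeprecationWarning, stacklevel=2)
        if cfg.get("use_data_cache") is None:
            cfg["use_data_cache"] = old
    return cfg


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    migrated = migrate_cache_flag({"cache_processed_data": True})
print(migrated["use_data_cache"])  # True
```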

field cache_dir: Path = PosixPath('/tmp')

Directory to store cached data.

Validated by:

resolve_cache_dir
set_cache_fs_type
validate_cache_dir
validate_train_eval_split

validator init_source_configs » eval_sources, sources

Converts string and dict inputs to the correct subclass of DataSourceConfig. If a string is passed, “huggingface” is used as the DataSource type.

Return type:

List[DataSourceConfig]

Parameters:

v (List[str | Dict | DataSourceConfig])
info (ValidationInfo)

pydantic model arctic_training.config.logger.LoggerConfig

Bases: BaseConfig

JSON schema:

{
   "title": "LoggerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "output_dir": {
         "default": "/dev/null",
         "description": "Output directory for log files. ",
         "format": "path",
         "title": "Output Dir",
         "type": "string"
      },
      "level": {
         "default": "WARNING",
         "description": "Log level for the logger. ",
         "title": "Level",
         "type": "string"
      },
      "print_output_ranks": {
         "anyOf": [
            {
               "const": "*",
               "type": "string"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": [
            0
         ],
         "description": "Which ranks will print logs. Either a list of ranks or \"*\" for all ranks. ",
         "title": "Print Output Ranks"
      },
      "file_output_ranks": {
         "anyOf": [
            {
               "const": "*",
               "type": "string"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": "*",
         "description": "Which ranks will output logs to a file. Either a list of ranks or \"*\" for all ranks. ",
         "title": "File Output Ranks"
      }
   },
   "additionalProperties": false
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

file_output_ranks (Literal['*'] | List[int])
level (str)
output_dir (pathlib.Path)
print_output_ranks (Literal['*'] | List[int])

Validators:

fill_output_ranks » all fields
set_wandb_output_dir » all fields

field output_dir: Path = PosixPath('/dev/null')

Output directory for log files.

Validated by:

fill_output_ranks
set_wandb_output_dir

field level: str = 'WARNING'

Log level for the logger.

Validated by:

fill_output_ranks
set_wandb_output_dir

field print_output_ranks: Union[Literal['*'], List[int]] = [0]

Which ranks will print logs. Either a list of ranks or “*” for all ranks.

Validated by:

fill_output_ranks
set_wandb_output_dir

field file_output_ranks: Union[Literal['*'], List[int]] = '*'

Which ranks will output logs to a file. Either a list of ranks or “*” for all ranks.

Validated by:

fill_output_ranks
set_wandb_output_dir
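print_output_ranks and file_output_ranks accept either "*" or an explicit list of ranks. The sketch below resolves either form for a given world size and decides whether a rank should emit output, assuming "*" simply expands to every rank; fill_output_ranks presumably does something comparable, but its exact behavior is not shown here. Rejecting out-of-range ranks is a choice made in this sketch only.

```python
from typing import List, Literal, Union

Ranks = Union[Literal["*"], List[int]]


def resolve_ranks(ranks: Ranks, world_size: int) -> List[int]:
    # "*" expands to every rank; explicit lists are checked and sorted.
    if ranks == "*":
        return list(range(world_size))
    bad = [r for r in ranks if not 0 <= r < world_size]
    if bad:
        raise ValueError(f"ranks {bad} out of range for world_size={world_size}")
    return sorted(set(ranks))


def should_output(rank: int, ranks: Ranks, world_size: int) -> bool:
    return rank in resolve_ranks(ranks, world_size)


print(resolve_ranks("*", 4))  # [0, 1, 2, 3]
print(should_output(3, [0], 8))  # False
```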

pydantic model arctic_training.config.model.ModelConfig

Bases: BaseConfig

JSON schema:

{
   "title": "ModelConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Model factory type. ",
         "title": "Type",
         "type": "string"
      },
      "name_or_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "format": "path",
               "type": "string"
            }
         ],
         "description": "Model name (as described in Hugging Face model hub) or local path to model checkpoint. ",
         "title": "Name Or Path"
      },
      "dtype": {
         "default": "torch.bfloat16",
         "description": "Data type for model weights. ",
         "examples": [
            "float32",
            "bfloat16"
         ],
         "title": "Torch Dtype",
         "type": "string"
      },
      "save_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Name to use when saving the model. ",
         "title": "Save Name"
      },
      "attn_implementation": {
         "default": "sdpa",
         "description": "Attention implementation to use. ",
         "title": "Attn Implementation",
         "type": "string"
      },
      "disable_activation_checkpoint": {
         "default": false,
         "description": "Disable the use of activation checkpointing. ",
         "title": "Disable Activation Checkpoint",
         "type": "boolean"
      },
      "peft_config": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Configuration for Parameter Efficient Fine Tuning. ",
         "title": "Peft Config"
      }
   },
   "additionalProperties": false,
   "required": [
      "name_or_path"
   ]
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

attn_implementation (str)
disable_activation_checkpoint (bool)
dtype (arctic_training.config.enums.DType)
name_or_path (str | pathlib.Path)
peft_config (Dict | None)
save_name (str | None)
type (str)

Validators:

validate_attn_implementation » attn_implementation
validate_peft_config_type » peft_config

field type: str = '': Model factory type.

field name_or_path: Union[str, Path] [Required]: Model name (as listed on the Hugging Face Model Hub) or local path to a model checkpoint.

field dtype: DType = DType.BF16: Data type for model weights.

field save_name: Optional[str] = None: Name to use when saving the model.

field attn_implementation: str = 'sdpa'

Attention implementation to use.

Validated by:

validate_attn_implementation

field disable_activation_checkpoint: bool = False: Disable the use of activation checkpointing.

field peft_config: Optional[Dict] = None

Configuration for Parameter-Efficient Fine-Tuning (PEFT).

Validated by:

validate_peft_config_type

pydantic model arctic_training.config.optimizer.OptimizerConfig

Bases: BaseConfig

JSON schema:

{
   "title": "OptimizerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Optimizer factory type. Defaults to the `optimizer_factory_type` of the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "weight_decay": {
         "default": 0.1,
         "description": "Coefficient for L2 regularization applied to the optimizer's weights. ",
         "minimum": 0.0,
         "title": "Weight Decay",
         "type": "number"
      },
      "betas": {
         "default": [
            0.9,
            0.999
         ],
         "description": "Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam). ",
         "maxItems": 2,
         "minItems": 2,
         "prefixItems": [
            {
               "type": "number"
            },
            {
               "type": "number"
            }
         ],
         "title": "Betas",
         "type": "array"
      },
      "lr": {
         "default": 0.0005,
         "description": "The initial learning rate. ",
         "minimum": 0.0,
         "title": "Lr",
         "type": "number"
      }
   },
   "additionalProperties": false
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

betas (Tuple[float, float])
learning_rate (float)
type (str)
weight_decay (float)

field type: str = '': Optimizer factory type. Defaults to the optimizer_factory_type of the trainer.

field weight_decay: Annotated[float] = 0.1

Weight decay (L2 regularization) coefficient applied to the parameters being optimized.

Constraints:

ge = 0.0
func = parse_human_val

field betas: Tuple[float, float] = (0.9, 0.999): Coefficients (beta1, beta2) used to compute running averages of the gradient and its square, as in Adam.

field learning_rate: Annotated[float] = 0.0005 (alias 'lr')

The initial learning rate.

Constraints:

ge = 0.0
func = parse_human_val
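learning_rate can also be given under its alias lr, since populate_by_name, validate_by_alias and validate_by_name are all enabled. The sketch below accepts either spelling and builds keyword arguments in the shape torch.optim.AdamW takes (lr, betas, weight_decay), using the defaults listed above. The choice of AdamW and the helper name are assumptions for illustration; the optimizer actually built depends on the optimizer factory type.

```python
def optimizer_kwargs(opt_cfg: dict) -> dict:
    """Map an optimizer config section to AdamW-style kwargs (sketch)."""
    # Accept the field name or its alias; the field name wins if both are set.
    lr = opt_cfg.get("learning_rate", opt_cfg.get("lr", 5e-4))
    betas = tuple(opt_cfg.get("betas", (0.9, 0.999)))
    weight_decay = opt_cfg.get("weight_decay", 0.1)
    if lr < 0 or weight_decay < 0:
        raise ValueError("lr and weight_decay must be >= 0")
    return {"lr": lr, "betas": betas, "weight_decay": weight_decay}


# With PyTorch installed, the result can be passed straight through:
#   import torch
#   opt = torch.optim.AdamW(model.parameters(), **optimizer_kwargs({"lr": 1e-5}))
print(optimizer_kwargs({"lr": 1e-5}))  # {'lr': 1e-05, 'betas': (0.9, 0.999), 'weight_decay': 0.1}
```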

pydantic model arctic_training.config.scheduler.SchedulerConfig

Bases: BaseConfig

JSON schema:

{
   "title": "SchedulerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Scheduler factory type. Defaults to the `scheduler_factory_type` of the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "lr": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The initial learning rate. Deprecated in favor of `optimizer.learning_rate`. ",
         "title": "Lr"
      }
   },
   "additionalProperties": false
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

learning_rate (float | None)
type (str)

Validators:

_deprecated_learning_rate » learning_rate

field type: str = '': Scheduler factory type. Defaults to the scheduler_factory_type of the trainer.

field learning_rate: Optional[float] = None (alias 'lr')

The initial learning rate. Deprecated in favor of optimizer.learning_rate.

Validated by:

_deprecated_learning_rate

pydantic model arctic_training.config.tokenizer.TokenizerConfig

Bases: BaseConfig

JSON schema:

{
   "title": "TokenizerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Tokenizer factory type. Defaults to the `tokenizer_factory_type` of the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "name_or_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "",
         "description": "Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer. ",
         "title": "Name Or Path"
      }
   },
   "additionalProperties": false
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

name_or_path (str | pathlib.Path | None)
type (str)

field type: str = '': Tokenizer factory type. Defaults to the tokenizer_factory_type of the trainer.

field name_or_path: Union[str, Path, None] = '': Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer.

pydantic model arctic_training.config.wandb.WandBConfig

Bases: BaseConfig

{
   "title": "WandBConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "enable": {
         "default": false,
         "description": "Whether to enable Weights and Biases logging. ",
         "title": "Enable",
         "type": "boolean"
      },
      "entity": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Weights and Biases entity name. ",
         "title": "Entity"
      },
      "project": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "arctic-training",
         "description": "Weights and Biases project name. ",
         "title": "Project"
      },
      "name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Weights and Biases run name. ",
         "title": "Name"
      }
   },
   "additionalProperties": false
}

Config:

extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True

Fields:

enable (bool)
entity (str | None)
name (str | None)
project (str | None)

field enable: bool = False: Whether to enable Weights and Biases logging.

field entity: Optional[str] = None: Weights and Biases entity name.

field project: Optional[str] = 'arctic-training': Weights and Biases project name.

field name: Optional[str] = None: Weights and Biases run name.

Numerical Formatting

When specifying numerical values in the configuration file, you can use human-friendly strings to represent very large or very small numbers. The following formats are supported:

X%: This format represents a percentage. For example, 50% is equivalent to 0.5.
XeY: This format represents a number in scientific notation. For example, 1e-6 is equivalent to 0.000001.
X^Y: This format represents a number raised to a power. For example, 2^20 is equivalent to 1048576.
XK: This format represents a number in thousands (base 10). For example, 1K is equivalent to 1000. Similarly, you can use M for millions, B for billions, and T for trillions.
XKi: This format represents a number in binary (base 2) multiples of 1024. For example, 1Ki is equivalent to 1024. Similarly, you can use Mi for mebi (1024^2), Gi for gibi (1024^3), and Ti for tebi (1024^4).
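The formats above can be illustrated with a small parser. This is a minimal sketch, not ArcticTraining's implementation. parse_human_number is a hypothetical name, and details such as returning an int for whole-number results and accepting fractional prefixes like 1.5K are assumptions of this sketch.

```python
from typing import Union

Number = Union[int, float]

# Multipliers taken from the list above.
DECIMAL_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def _plain(token: str) -> Number:
    """Parse a plain number, keeping integers exact when possible."""
    try:
        return int(token)
    except ValueError:
        # float() also handles scientific notation such as "1e-6".
        return float(token)


def parse_human_number(text: str) -> Number:
    """Hypothetical parser for the human-friendly formats listed above."""
    s = text.strip()
    if s.endswith("%"):
        return _plain(s[:-1]) / 100
    if "^" in s:
        base, exponent = s.split("^", 1)
        return _plain(base) ** _plain(exponent)
    # Check two-letter binary suffixes before single-letter decimal ones.
    for table in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, multiplier in table.items():
            if s.endswith(suffix):
                return _plain(s[: -len(suffix)]) * multiplier
    return _plain(s)
```

Parsing the integer part with int() first keeps results such as 2^20 exact instead of routing them through floating point.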