Configuration

The main input to the ArcticTraining CLI is a YAML configuration file that defines fields for the TrainerConfig class. TrainerConfig is a Pydantic configuration model that also contains the sub-configurations for data, model, and other components.
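The JSON schema below marks only two top-level sections as required, `model` and `data`, and within them only `model.name_or_path` and `data.sources`. The sketch below writes a minimal configuration as a Python dict with the same nesting a YAML file would use, and checks it against those required-field lists copied from the schema. It does not import arctic_training; the model and dataset names are placeholders. Passing a bare dataset name as a source is an assumption based on the DataSourceConfig description ("Defaults to 'huggingface' if only a dataset name or path is provided").

```python
# Minimal TrainerConfig-shaped dict; the YAML file would carry the same keys.
# "my-org/my-model" and "my-org/my-dataset" are placeholder names.
minimal_config = {
    "type": "sft",  # optional: "sft" is the schema default
    "model": {"name_or_path": "my-org/my-model"},
    # Assumption: a bare dataset name is accepted and treated as a Hugging Face source.
    "data": {"sources": ["my-org/my-dataset"]},
}

# Required keys copied from the "required" lists in the JSON schema.
REQUIRED = {
    (): ["model", "data"],
    ("model",): ["name_or_path"],
    ("data",): ["sources"],
}


def missing_required(config: dict) -> list[str]:
    """Return the dotted paths of required keys absent from `config`."""
    missing = []
    for prefix, keys in REQUIRED.items():
        node = config
        for part in prefix:
            node = node.get(part, {})
        for key in keys:
            if key not in node:
                missing.append(".".join(prefix + (key,)))
    return missing


print(missing_required(minimal_config))  # []
print(missing_required({"model": {}}))   # ['data', 'model.name_or_path', 'data.sources']
```

Real validation, including defaults and type coercion, is performed by TrainerConfig itself; this only illustrates which keys cannot be omitted.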

pydantic model arctic_training.config.trainer.TrainerConfig

Bases: BaseConfig

Base Trainer Configuration.

JSON schema:
{
   "title": "TrainerConfig",
   "description": "Base Trainer Configuration.",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "sft",
         "description": "Trainer type. ",
         "title": "Type",
         "type": "string"
      },
      "code": {
         "default": "train.py",
         "description": "Path to the python script containing custom trainer implementation. ",
         "format": "path",
         "title": "Code",
         "type": "string"
      },
      "skip_validation": {
         "default": false,
         "description": "Skips validation of types for subconfigs and registered classes. ",
         "title": "Skip Validation",
         "type": "boolean"
      },
      "model": {
         "$ref": "#/$defs/ModelConfig",
         "description": "Model configuration. "
      },
      "tokenizer": {
         "$ref": "#/$defs/TokenizerConfig",
         "description": "Tokenizer configuration. "
      },
      "data": {
         "$ref": "#/$defs/DataConfig",
         "description": "Train and eval data configuration. "
      },
      "logger": {
         "$ref": "#/$defs/LoggerConfig",
         "description": "Logger configuration. "
      },
      "wandb": {
         "$ref": "#/$defs/WandBConfig",
         "description": "Weights and Biases configuration. "
      },
      "scheduler": {
         "$ref": "#/$defs/SchedulerConfig",
         "description": "Scheduler configuration. "
      },
      "optimizer": {
         "$ref": "#/$defs/OptimizerConfig",
         "description": "Optimizer configuration. "
      },
      "deepspeed": {
         "additionalProperties": true,
         "default": {},
         "description": "DeepSpeed config dict. Will be automatically filled if not provided by the user. ",
         "title": "Deepspeed",
         "type": "object"
      },
      "epochs": {
         "default": 1,
         "description": "Number of epochs to train. ",
         "minimum": 0,
         "title": "Epochs",
         "type": "integer"
      },
      "loss_log_interval": {
         "default": 1,
         "description": "Number of steps between logging loss. ",
         "minimum": 0,
         "title": "Loss Log Interval",
         "type": "integer"
      },
      "train_log_iter_interval": {
         "default": 1,
         "description": "Iters between training metric log outputs. `0` is off, only intervals of `1` currently supported. ",
         "enum": [
            0,
            1
         ],
         "title": "Train Log Iter Interval",
         "type": "integer"
      },
      "train_log_metrics_path": {
         "default": "train-log-metrics.jsonl",
         "description": ".jsonl path to log precise metrics according to the `train_log_iter_interval` schedule. Defaults to `./train-log-metrics.jsonl` ",
         "format": "path",
         "title": "Train Log Metrics Path",
         "type": "string"
      },
      "gradient_accumulation_steps": {
         "default": 1,
         "description": "Number of gradient accumulation steps. ",
         "minimum": 1,
         "title": "Gradient Accumulation Steps",
         "type": "integer"
      },
      "micro_batch_size": {
         "default": 1,
         "description": "Micro batch size per GPU. ",
         "minimum": 1,
         "title": "Micro Batch Size",
         "type": "integer"
      },
      "sequence_parallel_size": {
         "default": 1,
         "description": "Sequence Parallelism Degree. Disabled if set to 1 ",
         "minimum": 1,
         "title": "Sequence Parallel Size",
         "type": "integer"
      },
      "activation_checkpoint_cpu_offload": {
         "default": false,
         "description": "Offload activation checkpoint tensors to CPU. Enables much longer sequence lengths, but is not very beneficial for sequence lengths below 64k. ",
         "title": "Activation Checkpoint Cpu Offload",
         "type": "boolean"
      },
      "tiled_mlp_compute": {
         "default": false,
         "description": "Tile the MLP computation to save GPU memory. Currently only limited architectures supported, but can be expanded to more. ",
         "title": "Tiled Mlp Compute",
         "type": "boolean"
      },
      "seed": {
         "default": 42,
         "description": "Random seed value for numpy, python.random, torch, and transformers. ",
         "minimum": 0,
         "title": "Seed",
         "type": "integer"
      },
      "checkpoint": {
         "default": [],
         "description": "Checkpoint configurations. Multiple checkpoint engines may be used together. ",
         "items": {
            "$ref": "#/$defs/CheckpointConfig"
         },
         "title": "Checkpoint",
         "type": "array"
      },
      "train_iters": {
         "default": 0,
         "description": "Maximum number of training iterations. ",
         "minimum": 0,
         "title": "Train Iters",
         "type": "integer"
      },
      "eval_frequency": {
         "default": 0,
         "minimum": 0,
         "title": "Eval Frequency",
         "type": "integer"
      },
      "exit_iteration": {
         "default": 0,
         "description": "Force exit of training after specified iteration count (useful for debugging). ",
         "minimum": 0,
         "title": "Exit Iteration",
         "type": "integer"
      },
      "min_iterations": {
         "default": 0,
         "description": "When >0, the training dataset will be replicated until there is enough data to run this many iterations. ",
         "minimum": 0,
         "title": "Min Iterations",
         "type": "integer"
      },
      "overfit_first_batch": {
         "default": false,
         "description": "Train only on repetitions of the first training batch. Useful for development. ",
         "title": "Overfit First Batch",
         "type": "boolean"
      },
      "mem_profiler": {
         "default": null,
         "description": "Enable memory profiling. ",
         "enum": [
            null,
            "step",
            "e2e"
         ],
         "title": "Mem Profiler"
      },
      "mem_profiler_dir": {
         "description": "Path to save memory profiling results. Defaults to `logger.output_dir/mem-prof`. ",
         "format": "path",
         "title": "Mem Profiler Dir",
         "type": "string"
      },
      "mem_profiler_max_entries": {
         "default": 100000,
         "description": "Maximum number of entries to store in the memory profiler. ",
         "minimum": 1,
         "title": "Mem Profiler Max Entries",
         "type": "integer"
      },
      "kill_switch_path": {
         "default": "/tmp/at_kill_switch",
         "description": "Path to a file that can be used to trigger a graceful shutdown mid-training (sets early exit to True). ",
         "format": "path",
         "title": "Kill Switch Path",
         "type": "string"
      }
   },
   "$defs": {
      "CheckpointConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Checkpoint engine type. ",
               "title": "Type",
               "type": "string"
            },
            "output_dir": {
               "description": "Checkpoint output directory. If directory does not exist, it will be created. ",
               "format": "path",
               "title": "Output Dir",
               "type": "string"
            },
            "enabled": {
               "default": true,
               "description": "Enable this checkpoint engine. ",
               "title": "Enabled",
               "type": "boolean"
            },
            "auto_resume": {
               "default": false,
               "description": "If a checkpoint is found in the output directory, resume training from that checkpoint. ",
               "title": "Auto Resume",
               "type": "boolean"
            },
            "save_every_n_steps": {
               "default": 0,
               "description": "How often to trigger a checkpoint save by training global step count. ",
               "minimum": 0,
               "title": "Save Every N Steps",
               "type": "integer"
            },
            "save_every_n_epochs": {
               "default": 0,
               "description": "How often to trigger a checkpoint save by training epoch count. ",
               "minimum": 0,
               "title": "Save Every N Epochs",
               "type": "integer"
            },
            "save_end_of_training": {
               "default": false,
               "description": "Whether to save a checkpoint at the end of training. ",
               "title": "Save End Of Training",
               "type": "boolean"
            }
         },
         "required": [
            "output_dir"
         ],
         "title": "CheckpointConfig",
         "type": "object"
      },
      "DataConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Data factory type. Defaults to the `data_factory_type` in the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "sources": {
               "description": "List of data sources to use for training. These must be registered `DataSource`. ",
               "items": {
                  "$ref": "#/$defs/DataSourceConfig"
               },
               "title": "Sources",
               "type": "array"
            },
            "eval_sources": {
               "default": [],
               "description": "List of data sources to use for evaluation. These must be registered `DataSource`. ",
               "items": {
                  "$ref": "#/$defs/DataSourceConfig"
               },
               "title": "Eval Sources",
               "type": "array"
            },
            "train_eval_split": {
               "default": [
                  1.0,
                  0.0
               ],
               "description": "How much of the training data to use for evaluation. ",
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "number"
                  }
               ],
               "title": "Train Eval Split",
               "type": "array"
            },
            "max_length": {
               "default": 8192,
               "description": "Maximum length of the input sequence. ",
               "title": "Max Length",
               "type": "integer"
            },
            "num_proc": {
               "default": 16,
               "description": "Number of processes to use for data loading. ",
               "title": "Num Proc",
               "type": "integer"
            },
            "dl_num_workers": {
               "default": 2,
               "description": "Number of DataLoader workers per GPU. ",
               "title": "Dl Num Workers",
               "type": "integer"
            },
            "seed": {
               "default": 42,
               "description": "Seed for data loading. ",
               "title": "Seed",
               "type": "integer"
            },
            "use_data_cache": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Whether to cache loaded data. ",
               "title": "Use Data Cache"
            },
            "cache_processed_data": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Deprecated, please use \"use_data_cache\". ",
               "title": "Cache Processed Data"
            },
            "cache_dir": {
               "default": "/tmp",
               "description": "Directory to store cached data. ",
               "format": "path",
               "title": "Cache Dir",
               "type": "string"
            },
            "cache_fs_type": {
               "default": "auto",
               "enum": [
                  "auto",
                  "local",
                  "shared"
               ],
               "title": "Cache Fs Type",
               "type": "string"
            }
         },
         "required": [
            "sources"
         ],
         "title": "DataConfig",
         "type": "object"
      },
      "DataSourceConfig": {
         "additionalProperties": false,
         "description": "Base DataSource configuration.",
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Data source type. Defaults to 'huggingface' if only a dataset name or path is provided.",
               "title": "Type",
               "type": "string"
            },
            "split": {
               "default": "",
               "description": "Which split the data source is used for. This will be automatically set to either \"train\" or \"eval\" if no value is passed.\n\nFor HFDataSource, this can be any value supported by Dataset slice splits:\nhttps://huggingface.co/docs/datasets/en/loading#slice-splits.",
               "title": "Split",
               "type": "string"
            },
            "sample_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Ratio of the dataset to randomly sample. If None, all examples are used.",
               "title": "Sample Ratio"
            },
            "sample_count": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Number of examples to randomly sample. If None, all examples are used.",
               "title": "Sample Count"
            },
            "sample_seed": {
               "default": 42,
               "description": "Seed for random sampling. Used only if `sample_ratio` or `sample_count` is set.",
               "title": "Sample Seed",
               "type": "integer"
            },
            "process": {
               "default": true,
               "description": "Whether to process the data with the data factory `process` function (e.g., tokenization for SFTDataFactory). ",
               "title": "Process",
               "type": "boolean"
            }
         },
         "title": "DataSourceConfig",
         "type": "object"
      },
      "LoggerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "output_dir": {
               "default": "/dev/null",
               "description": "Output directory for log files. ",
               "format": "path",
               "title": "Output Dir",
               "type": "string"
            },
            "level": {
               "default": "WARNING",
               "description": "Log level for the logger. ",
               "title": "Level",
               "type": "string"
            },
            "print_output_ranks": {
               "anyOf": [
                  {
                     "const": "*",
                     "type": "string"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": [
                  0
               ],
               "description": "Which ranks will print logs. Either a list of ranks or \"*\" for all ranks. ",
               "title": "Print Output Ranks"
            },
            "file_output_ranks": {
               "anyOf": [
                  {
                     "const": "*",
                     "type": "string"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": "*",
               "description": "Which ranks will output logs to a file. Either a list of ranks or \"*\" for all ranks. ",
               "title": "File Output Ranks"
            }
         },
         "title": "LoggerConfig",
         "type": "object"
      },
      "ModelConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Model factory type. ",
               "title": "Type",
               "type": "string"
            },
            "name_or_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "format": "path",
                     "type": "string"
                  }
               ],
               "description": "Model name (as described in Hugging Face model hub) or local path to model checkpoint. ",
               "title": "Name Or Path"
            },
            "dtype": {
               "default": "torch.bfloat16",
               "description": "Data type for model weights. ",
               "examples": [
                  "float32",
                  "bfloat16"
               ],
               "title": "Torch Dtype",
               "type": "string"
            },
            "save_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Name to use when saving the model. ",
               "title": "Save Name"
            },
            "attn_implementation": {
               "default": "sdpa",
               "description": "Attention implementation to use. ",
               "title": "Attn Implementation",
               "type": "string"
            },
            "disable_activation_checkpoint": {
               "default": false,
               "description": "Disable the use of activation checkpointing. ",
               "title": "Disable Activation Checkpoint",
               "type": "boolean"
            },
            "peft_config": {
               "anyOf": [
                  {
                     "additionalProperties": true,
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Configuration for Parameter Efficient Fine Tuning. ",
               "title": "Peft Config"
            }
         },
         "required": [
            "name_or_path"
         ],
         "title": "ModelConfig",
         "type": "object"
      },
      "OptimizerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Optimizer factory type. Defaults to the `optimizer_factory_type` of the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "weight_decay": {
               "default": 0.1,
               "description": "Coefficient for L2 regularization applied to the optimizer's weights. ",
               "minimum": 0.0,
               "title": "Weight Decay",
               "type": "number"
            },
            "betas": {
               "default": [
                  0.9,
                  0.999
               ],
               "description": "Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam). ",
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "number"
                  }
               ],
               "title": "Betas",
               "type": "array"
            },
            "lr": {
               "default": 0.0005,
               "description": "The initial learning rate. ",
               "minimum": 0.0,
               "title": "Lr",
               "type": "number"
            }
         },
         "title": "OptimizerConfig",
         "type": "object"
      },
      "SchedulerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Scheduler factory type. Defaults to the `scheduler_factory_type` of the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "lr": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The initial learning rate. Deprecated in favor of `optimizer.learning_rate`. ",
               "title": "Lr"
            }
         },
         "title": "SchedulerConfig",
         "type": "object"
      },
      "TokenizerConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Tokenizer factory type. Defaults to the `tokenizer_factory_type` of the trainer. ",
               "title": "Type",
               "type": "string"
            },
            "name_or_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "format": "path",
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "",
               "description": "Tokenizer name (as listed on the Hugging Face model hub) or path to a local directory containing the tokenizer. ",
               "title": "Name Or Path"
            }
         },
         "title": "TokenizerConfig",
         "type": "object"
      },
      "WandBConfig": {
         "additionalProperties": false,
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "enable": {
               "default": false,
               "description": "Whether to enable Weights and Biases logging. ",
               "title": "Enable",
               "type": "boolean"
            },
            "entity": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Weights and Biases entity name. ",
               "title": "Entity"
            },
            "project": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "arctic-training",
               "description": "Weights and Biases project name. ",
               "title": "Project"
            },
            "name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Weights and Biases run name. ",
               "title": "Name"
            }
         },
         "title": "WandBConfig",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "model",
      "data"
   ]
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True
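With `extra = forbid`, any key that is not a declared field is rejected, and field constraints such as `minimum` or `enum` in the schema are enforced on the values. The sketch below approximates both behaviors with the standard library only, using the top-level property names and a subset of the constraints copied from the JSON schema. It is an illustration, not the Pydantic validation ArcticTraining actually runs; the helper name is hypothetical.

```python
# Top-level property names copied from the TrainerConfig JSON schema.
KNOWN_KEYS = {
    "local_rank", "global_rank", "world_size", "type", "code", "skip_validation",
    "model", "tokenizer", "data", "logger", "wandb", "scheduler", "optimizer",
    "deepspeed", "epochs", "loss_log_interval", "train_log_iter_interval",
    "train_log_metrics_path", "gradient_accumulation_steps", "micro_batch_size",
    "sequence_parallel_size", "activation_checkpoint_cpu_offload",
    "tiled_mlp_compute", "seed", "checkpoint", "train_iters", "eval_frequency",
    "exit_iteration", "min_iterations", "overfit_first_batch", "mem_profiler",
    "mem_profiler_dir", "mem_profiler_max_entries", "kill_switch_path",
}
# A subset of the "minimum" and "enum" constraints from the same schema.
MINIMUMS = {"epochs": 0, "gradient_accumulation_steps": 1, "micro_batch_size": 1,
            "sequence_parallel_size": 1, "seed": 0, "train_iters": 0}
ENUMS = {"train_log_iter_interval": {0, 1}, "mem_profiler": {None, "step", "e2e"}}


def check_top_level(config: dict) -> list[str]:
    """Return one error string per unknown key or out-of-range value."""
    errors = []
    for key, value in config.items():
        if key not in KNOWN_KEYS:
            errors.append(f"{key}: unknown key (extra = forbid)")
        elif key in MINIMUMS and value < MINIMUMS[key]:
            errors.append(f"{key}: must be >= {MINIMUMS[key]}")
        elif key in ENUMS and value not in ENUMS[key]:
            errors.append(f"{key}: must be one of {sorted(ENUMS[key], key=str)}")
    return errors


# A misplaced learning rate (it belongs under `optimizer`) is caught as an extra key.
print(check_top_level({"epochs": 2, "micro_batch_size": 0, "learning_rate": 1e-5}))
# ['micro_batch_size: must be >= 1', 'learning_rate: unknown key (extra = forbid)']
```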

Validators:
  • build_deepspeed_config » all fields

  • coerce_deepspeed_human_friendly_values » deepspeed

  • init_checkpoint_configs » checkpoint

  • init_data_config » data

  • init_dist » all fields

  • init_model_config » model

  • init_optimizer_config » optimizer

  • init_scheduler_config » scheduler

  • init_tokenizer_config » tokenizer

  • initialize_logger » logger

  • mem_profiler_mkdir » all fields

  • set_tokenizer » all fields

  • train_log_metrics_path_prep » all fields

  • validate_eval_frequency » all fields

  • validate_single_checkpoint_resume » all fields
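The `checkpoint` field is a list of CheckpointConfig entries, so several checkpoint engines can run side by side. The validator name `validate_single_checkpoint_resume` suggests, as an inference from the name rather than something this page states, that at most one entry may set `auto_resume`. The sketch below mirrors that reading and also shows one plausible interpretation of `save_every_n_steps`, where the default of 0 disables step-based saving. The engine type names, output paths, and helper names are placeholders.

```python
# Hypothetical checkpoint list: engine types and paths are placeholders.
checkpoints = [
    {"type": "deepspeed", "output_dir": "ckpt/ds", "auto_resume": True, "save_every_n_steps": 500},
    {"type": "huggingface", "output_dir": "ckpt/hf", "save_end_of_training": True},
]


def check_single_resume(entries: list[dict]) -> None:
    """Assumed rule: at most one checkpoint entry may set auto_resume."""
    resuming = [e["output_dir"] for e in entries if e.get("auto_resume", False)]
    if len(resuming) > 1:
        raise ValueError(f"only one checkpoint may set auto_resume, got {resuming}")


def should_save_at_step(entry: dict, global_step: int) -> bool:
    """Assumed semantics: save every n global steps; n == 0 means never."""
    n = entry.get("save_every_n_steps", 0)
    return n > 0 and global_step % n == 0


check_single_resume(checkpoints)  # passes: only one entry resumes
print([s for s in range(1, 1501) if should_save_at_step(checkpoints[0], s)])  # [500, 1000, 1500]
```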

field type: str = 'sft'

Trainer type.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field code: Path = PosixPath('train.py')

Path to the python script containing custom trainer implementation.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
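The `code` field points at a Python script (default `train.py`) that holds a custom trainer implementation. How ArcticTraining imports that file is not shown on this page; the sketch below uses the standard `importlib` recipe for importing a module from a file path, which is one way such a script can be loaded so that the classes it defines, and anything it registers at import time, become available. `load_script` and `MyTrainer` are hypothetical names.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_script(path: Path) -> ModuleType:
    """Import a Python file by path and return the resulting module object."""
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    return module


# Usage: write a throwaway train.py and import it.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "train.py"
    script.write_text("class MyTrainer:\n    name = 'my-trainer'\n")
    module = load_script(script)

print(module.MyTrainer.name)  # my-trainer
```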

field skip_validation: bool = False

Skips validation of types for subconfigs and registered classes.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field model: ModelConfig [Required]

Model configuration.

Validated by:
  • build_deepspeed_config

  • init_dist

  • init_model_config

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field tokenizer: TokenizerConfig [Optional]

Tokenizer configuration.

Validated by:
  • build_deepspeed_config

  • init_dist

  • init_tokenizer_config

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
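TokenizerConfig gives `name_or_path` a default of `""`, and the `set_tokenizer` validator runs over all fields. A plausible reading, assumed here rather than confirmed by this page, is that an empty (or null) tokenizer path falls back to `model.name_or_path`, so the tokenizer matches the model unless overridden. The sketch implements that assumed fallback on a plain dict; the function and model names are hypothetical.

```python
def resolve_tokenizer_path(config: dict) -> str:
    """Assumed fallback: use the model's name_or_path when the tokenizer's is empty or None."""
    tokenizer = config.get("tokenizer") or {}
    return tokenizer.get("name_or_path") or config["model"]["name_or_path"]


print(resolve_tokenizer_path({"model": {"name_or_path": "my-org/my-model"}}))
# my-org/my-model
print(resolve_tokenizer_path({"model": {"name_or_path": "my-org/my-model"},
                              "tokenizer": {"name_or_path": "my-org/my-tokenizer"}}))
# my-org/my-tokenizer
```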

field data: DataConfig [Required]

Train and eval data configuration.

Validated by:
  • build_deepspeed_config

  • init_data_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
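DataConfig's `train_eval_split` is a pair of fractions with default `[1.0, 0.0]`, so by default no training data is held out for evaluation. The sketch below shows one way such a pair can be applied: shuffle sample indices with a fixed seed and carve off the evaluation fraction. The shuffle-then-slice strategy, the requirement that the fractions sum to 1, and the function name are assumptions, not ArcticTraining's implementation.

```python
import random


def split_indices(n: int, split: tuple[float, float], seed: int = 42) -> tuple[list[int], list[int]]:
    """Deterministically shuffle range(n) and split it by (train, eval) fractions."""
    train_frac, eval_frac = split
    # Assumption: the two fractions are expected to sum to 1.
    if abs(train_frac + eval_frac - 1.0) > 1e-9:
        raise ValueError("train and eval fractions should sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_eval = round(n * eval_frac)
    return indices[n_eval:], indices[:n_eval]


train_idx, eval_idx = split_indices(1000, (0.9, 0.1))
print(len(train_idx), len(eval_idx))  # 900 100
```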

field logger: LoggerConfig [Optional]

Logger configuration.

Validated by:
  • build_deepspeed_config

  • init_dist

  • initialize_logger

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
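LoggerConfig's `print_output_ranks` (default `[0]`) and `file_output_ranks` (default `"*"`) each accept either `"*"` or a list of ranks, so by default only rank 0 prints logs while every rank writes logs to a file. A minimal sketch of that selection rule; `rank_selected` is a hypothetical helper name.

```python
from typing import Literal, Union

RankSpec = Union[Literal["*"], list[int]]


def rank_selected(rank: int, spec: RankSpec) -> bool:
    """True if `rank` is covered by a print/file output rank spec."""
    return spec == "*" or rank in spec


# With the defaults, rank 3 writes to a file but does not print.
print(rank_selected(3, [0]), rank_selected(3, "*"))  # False True
```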

field wandb: WandBConfig [Optional]

Weights and Biases configuration.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field scheduler: SchedulerConfig [Optional]

Scheduler configuration.

Validated by:
  • build_deepspeed_config

  • init_dist

  • init_scheduler_config

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field optimizer: OptimizerConfig [Optional]

Optimizer configuration.

Validated by:
  • build_deepspeed_config

  • init_dist

  • init_optimizer_config

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field deepspeed: Dict[str, Any] = {}

DeepSpeed config dict. Will be automatically filled if not provided by the user.

Validated by:
  • build_deepspeed_config

  • coerce_deepspeed_human_friendly_values

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field epochs: int = 1

Number of epochs to train.

Constraints:
  • ge = 0

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field loss_log_interval: Annotated[int] = 1

Number of steps between logging loss.

Constraints:
  • ge = 0

  • func = parse_human_val

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field train_log_iter_interval: Literal[0, 1] = 1

Number of iterations between training-metric log outputs. 0 disables logging; only an interval of 1 is currently supported.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field train_log_metrics_path: Path = PosixPath('train-log-metrics.jsonl')

Path to a .jsonl file where precise metrics are logged according to the train_log_iter_interval schedule. Defaults to ./train-log-metrics.jsonl.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
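Because the metrics file is JSON Lines (one JSON object per line), it can be loaded for offline analysis with the standard library alone. A minimal sketch, assuming only that each non-empty line is a standalone JSON object; the helper name read_train_log_metrics is hypothetical, and the record keys depend on what the trainer actually writes.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def read_train_log_metrics(
    path: Union[str, Path] = "train-log-metrics.jsonl",
) -> List[Dict[str, Any]]:
    """Load every record from a JSON Lines metrics file (hypothetical helper)."""
    records: List[Dict[str, Any]] = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines, such as a trailing newline
                continue
            records.append(json.loads(line))
    return records
```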

field gradient_accumulation_steps: int = 1

Number of gradient accumulation steps.

Constraints:
  • ge = 1

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field micro_batch_size: int = 1

Micro batch size per GPU.

Constraints:
  • ge = 1

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field sequence_parallel_size: int = 1

Sequence parallelism degree. Sequence parallelism is disabled when set to 1.

Constraints:
  • ge = 1

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
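The batch-size fields above combine into the number of samples consumed per optimizer step. A sketch of the usual arithmetic, assuming that the ranks of one sequence-parallel group share a single sample (as in DeepSpeed Ulysses-style sequence parallelism), so that the data-parallel degree is world_size / sequence_parallel_size. The helper global_batch_size is hypothetical and not part of ArcticTraining.

```python
def global_batch_size(
    micro_batch_size: int,
    gradient_accumulation_steps: int,
    world_size: int,
    sequence_parallel_size: int = 1,
) -> int:
    """Samples per optimizer step (hypothetical helper).

    Assumes ranks in the same sequence-parallel group process the same
    sample, so only world_size // sequence_parallel_size ranks contribute
    distinct samples.
    """
    if world_size % sequence_parallel_size != 0:
        raise ValueError("world_size must be divisible by sequence_parallel_size")
    data_parallel_size = world_size // sequence_parallel_size
    return micro_batch_size * gradient_accumulation_steps * data_parallel_size
```

For example, on 8 GPUs with micro_batch_size 1 and gradient_accumulation_steps 4, this gives 32 samples per step without sequence parallelism and 8 with sequence_parallel_size 4.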

field activation_checkpoint_cpu_offload: bool = False

Offload activation checkpoint tensors to CPU. This enables much longer sequence lengths, but offers little benefit for sequence lengths below 64k.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field tiled_mlp_compute: bool = False

Tile the MLP computation to save GPU memory. Only a limited set of architectures is currently supported, though support can be extended to others.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field seed: int = 42

Random seed for NumPy, Python's random module, PyTorch, and Transformers.

Constraints:
  • ge = 0

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field checkpoint: List[CheckpointConfig] = []

Checkpoint configurations. Multiple checkpoint engines may be used together.

Validated by:
  • build_deepspeed_config

  • init_checkpoint_configs

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field train_iters: Annotated[int] = 0

Maximum number of training iterations.

Constraints:
  • ge = 0

  • func = parse_human_val

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field exit_iteration: int = 0

Force training to exit after the specified iteration count (useful for debugging).

Constraints:
  • ge = 0

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field min_iterations: Annotated[int] = 0

When greater than 0, the training dataset is replicated until there is enough data to run this many iterations.

Constraints:
  • ge = 0

  • func = parse_human_val

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
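To reason about min_iterations, it helps to count how many copies of the dataset are needed. A hedged sketch, assuming one iteration consumes one global batch and that the dataset is replicated in whole copies that are concatenated; ArcticTraining's actual replication logic may differ, and the helper dataset_copies_needed is hypothetical.

```python
def dataset_copies_needed(
    num_samples: int, global_batch_size: int, min_iterations: int
) -> int:
    """Whole dataset copies needed to run at least min_iterations (hypothetical).

    With copies * num_samples >= min_iterations * global_batch_size, the
    concatenated data yields at least min_iterations full batches.
    """
    if min_iterations <= 0:
        return 1  # feature disabled: the dataset is used as-is
    if num_samples <= 0 or global_batch_size <= 0:
        raise ValueError("num_samples and global_batch_size must be positive")
    samples_needed = min_iterations * global_batch_size
    # Integer ceiling division, avoiding floating-point rounding.
    return max(1, -(-samples_needed // num_samples))
```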

field overfit_first_batch: bool = False

Train only on repetitions of the first training batch. Useful for development.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field mem_profiler: Literal[None, 'step', 'e2e'] = None

Enable memory profiling.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field mem_profiler_dir: Path [Optional]

Path to save memory profiling results. Defaults to logger.output_dir/mem-prof.

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field mem_profiler_max_entries: Annotated[int] = 100000

Maximum number of entries to store in the memory profiler.

Constraints:
  • ge = 1

  • func = parse_human_val

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume

field kill_switch_path: Path = PosixPath('/tmp/at_kill_switch')

Path to a file that can be used to trigger a graceful shutdown mid-training (sets early exit to True).

Validated by:
  • build_deepspeed_config

  • init_dist

  • mem_profiler_mkdir

  • set_tokenizer

  • train_log_metrics_path_prep

  • validate_eval_frequency

  • validate_single_checkpoint_resume
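A graceful shutdown can be requested from outside the training process by creating the file at kill_switch_path. A minimal sketch, assuming the trainer only checks whether the file exists (not its contents) and that the default path is in use. Whether the trainer deletes the file itself is not stated here, so the sketch also includes a hypothetical cleanup helper to run before the next job.

```python
from pathlib import Path

# Default value of TrainerConfig.kill_switch_path.
KILL_SWITCH = Path("/tmp/at_kill_switch")


def request_graceful_shutdown(path: Path = KILL_SWITCH) -> None:
    """Create the kill-switch file (assumes the trainer polls for its existence)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)


def clear_kill_switch(path: Path = KILL_SWITCH) -> None:
    """Remove a stale kill-switch file; a no-op if it is already absent."""
    path.unlink(missing_ok=True)
```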

pydantic model arctic_training.config.checkpoint.CheckpointConfig

Bases: BaseConfig

JSON schema:
{
   "title": "CheckpointConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Checkpoint engine type. ",
         "title": "Type",
         "type": "string"
      },
      "output_dir": {
         "description": "Checkpoint output directory. If directory does not exist, it will be created. ",
         "format": "path",
         "title": "Output Dir",
         "type": "string"
      },
      "enabled": {
         "default": true,
         "description": "Enable this checkpoint engine. ",
         "title": "Enabled",
         "type": "boolean"
      },
      "auto_resume": {
         "default": false,
         "description": "If a checkpoint is found in the output directory, resume training from that checkpoint. ",
         "title": "Auto Resume",
         "type": "boolean"
      },
      "save_every_n_steps": {
         "default": 0,
         "description": "How often to trigger a checkpoint save by training global step count. ",
         "minimum": 0,
         "title": "Save Every N Steps",
         "type": "integer"
      },
      "save_every_n_epochs": {
         "default": 0,
         "description": "How often to trigger a checkpoint save by training epoch count. ",
         "minimum": 0,
         "title": "Save Every N Epochs",
         "type": "integer"
      },
      "save_end_of_training": {
         "default": false,
         "description": "Whether to save a checkpoint at the end of training. ",
         "title": "Save End Of Training",
         "type": "boolean"
      }
   },
   "additionalProperties": false,
   "required": [
      "output_dir"
   ]
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field type: str = ''

Checkpoint engine type.

field output_dir: Path [Required]

Checkpoint output directory. If the directory does not exist, it will be created.

Validated by:
  • resolve_output_dir

field enabled: bool = True

Enable this checkpoint engine.

field auto_resume: bool = False

If a checkpoint is found in the output directory, resume training from that checkpoint.

field save_every_n_steps: Annotated[int] = 0

How often, in global training steps, to trigger a checkpoint save.

Constraints:
  • ge = 0

  • func = parse_human_val

field save_every_n_epochs: Annotated[int] = 0

How often, in training epochs, to trigger a checkpoint save.

Constraints:
  • ge = 0

  • func = parse_human_val

field save_end_of_training: bool = False

Whether to save a checkpoint at the end of training.
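The CheckpointConfig save triggers can be read as a single predicate. A sketch of one plausible interpretation, assuming a value of 0 disables an interval trigger (consistent with the defaults, under which no periodic saves occur), that global_step and epoch count completed steps and epochs starting from 1, and that a trigger fires at positive multiples of its interval. The function should_save_checkpoint is hypothetical and is not ArcticTraining's checkpoint engine.

```python
def should_save_checkpoint(
    global_step: int,
    epoch: int,
    *,
    end_of_epoch: bool = False,
    end_of_training: bool = False,
    enabled: bool = True,
    save_every_n_steps: int = 0,
    save_every_n_epochs: int = 0,
    save_end_of_training: bool = False,
) -> bool:
    """Decide whether a checkpoint save is due (hypothetical helper)."""
    if not enabled:
        return False
    if end_of_training and save_end_of_training:
        return True
    # Step-based trigger: 0 is treated as "disabled" (assumption).
    if save_every_n_steps > 0 and global_step > 0 and global_step % save_every_n_steps == 0:
        return True
    # Epoch-based trigger is only evaluated at epoch boundaries.
    if (
        end_of_epoch
        and save_every_n_epochs > 0
        and epoch > 0
        and epoch % save_every_n_epochs == 0
    ):
        return True
    return False
```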

pydantic model arctic_training.config.data.DataConfig

Bases: BaseConfig

JSON schema:
{
   "title": "DataConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Data factory type. Defaults to the `data_factory_type` in the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "sources": {
         "description": "List of data sources to use for training. These must be registered `DataSource`. ",
         "items": {
            "$ref": "#/$defs/DataSourceConfig"
         },
         "title": "Sources",
         "type": "array"
      },
      "eval_sources": {
         "default": [],
         "description": "list of data sources to use for evaluation. These must be registered `DataSource`. ",
         "items": {
            "$ref": "#/$defs/DataSourceConfig"
         },
         "title": "Eval Sources",
         "type": "array"
      },
      "train_eval_split": {
         "default": [
            1.0,
            0.0
         ],
         "description": "How much of the training data to use for evaluation. ",
         "maxItems": 2,
         "minItems": 2,
         "prefixItems": [
            {
               "type": "number"
            },
            {
               "type": "number"
            }
         ],
         "title": "Train Eval Split",
         "type": "array"
      },
      "max_length": {
         "default": 8192,
         "description": "Maximum length of the input sequence. ",
         "title": "Max Length",
         "type": "integer"
      },
      "num_proc": {
         "default": 16,
         "description": "Number of processes to use for data loading. ",
         "title": "Num Proc",
         "type": "integer"
      },
      "dl_num_workers": {
         "default": 2,
         "description": "Number of DL workers per gpu. ",
         "title": "Dl Num Workers",
         "type": "integer"
      },
      "seed": {
         "default": 42,
         "description": "Seed for data loading. ",
         "title": "Seed",
         "type": "integer"
      },
      "use_data_cache": {
         "anyOf": [
            {
               "type": "boolean"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Whether to cache loaded data. ",
         "title": "Use Data Cache"
      },
      "cache_processed_data": {
         "anyOf": [
            {
               "type": "boolean"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Deprecated, please use \"use_data_cache\". ",
         "title": "Cache Processed Data"
      },
      "cache_dir": {
         "default": "/tmp",
         "description": "Directory to store cached data. ",
         "format": "path",
         "title": "Cache Dir",
         "type": "string"
      },
      "cache_fs_type": {
         "default": "auto",
         "enum": [
            "auto",
            "local",
            "shared"
         ],
         "title": "Cache Fs Type",
         "type": "string"
      }
   },
   "$defs": {
      "DataSourceConfig": {
         "additionalProperties": false,
         "description": "Base DataSource configuration.",
         "properties": {
            "local_rank": {
               "title": "Local Rank",
               "type": "integer"
            },
            "global_rank": {
               "title": "Global Rank",
               "type": "integer"
            },
            "world_size": {
               "title": "World Size",
               "type": "integer"
            },
            "type": {
               "default": "",
               "description": "Data source type. Defaults to 'huggingface' if only a dataset name or path is provided.",
               "title": "Type",
               "type": "string"
            },
            "split": {
               "default": "",
               "description": "Which split the data source is used for. This will be automatically set to either \"train\" or \"eval\" if no value is passed.\n\nFor HFDataSource, this can be any value supported by Dataset slice splits:\nhttps://huggingface.co/docs/datasets/en/loading#slice-splits.",
               "title": "Split",
               "type": "string"
            },
            "sample_ratio": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Ratio of the dataset to randomly sample. If None, all examples are used.",
               "title": "Sample Ratio"
            },
            "sample_count": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Number of examples to randomly sample. If None, all examples are used.",
               "title": "Sample Count"
            },
            "sample_seed": {
               "default": 42,
               "description": "Seed for random sampling. Used only if `sample_ratio` or `sample_count` is set.",
               "title": "Sample Seed",
               "type": "integer"
            },
            "process": {
               "default": true,
               "description": "Whether to process the data with the data factory `process` function (e.g., tokenization for SFTDataFactory). ",
               "title": "Process",
               "type": "boolean"
            }
         },
         "title": "DataSourceConfig",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "sources"
   ]
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field type: str = ''

Data factory type. Defaults to the data_factory_type in the trainer.

Validated by:
  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field sources: List[DataSourceConfig] [Required]

List of data sources to use for training. Each must be a registered DataSource.

Validated by:
  • init_source_configs

  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field eval_sources: List[DataSourceConfig] = []

List of data sources to use for evaluation. Each must be a registered DataSource.

Validated by:
  • init_source_configs

  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field train_eval_split: Tuple[float, float] = (1.0, 0.0)

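How to split the training data between training and evaluation, given as a (train, eval) pair of fractions. The default (1.0, 0.0) uses all of it for training.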

Validated by:
  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field max_length: Annotated[int] = 8192

Maximum length of the input sequence.

Constraints:
  • func = parse_human_val

Validated by:
  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field num_proc: int = 16

Number of processes to use for data loading.

Validated by:
  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field dl_num_workers: int = 2

Number of DataLoader workers per GPU.

Validated by:
  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field seed: int = 42

Seed for data loading.

Validated by:
  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field use_data_cache: Optional[bool] = None

Whether to cache loaded data.

Validated by:
  • deprecate_cache_processed_data

  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field cache_processed_data: Optional[bool] = None

Deprecated, please use “use_data_cache”.

Validated by:
  • deprecate_cache_processed_data

  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

field cache_dir: Path = PosixPath('/tmp')

Directory to store cached data.

Validated by:
  • resolve_cache_dir

  • set_cache_fs_type

  • validate_cache_dir

  • validate_train_eval_split

validator init_source_configs » eval_sources, sources

Converts string and dict inputs to the correct DataSourceConfig subclass. If a string is passed, “huggingface” is used as the DataSource type.

Return type:

List[DataSourceConfig]

Parameters:
  • v (List[str | Dict | DataSourceConfig])

  • info (ValidationInfo)
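The DataSourceConfig sampling fields (sample_ratio, sample_count, sample_seed) select a seeded random subset of a source. A minimal sketch over example indices, assuming that setting both options at once is an error, that sample_ratio lies in (0, 1], and that a sample_count larger than the source is clamped; these rules are assumptions, and the helper sample_indices is hypothetical.

```python
import random
from typing import List, Optional


def sample_indices(
    num_examples: int,
    sample_ratio: Optional[float] = None,
    sample_count: Optional[int] = None,
    sample_seed: int = 42,
) -> List[int]:
    """Seeded random subset of example indices (hypothetical helper)."""
    if sample_ratio is not None and sample_count is not None:
        raise ValueError("set at most one of sample_ratio and sample_count")  # assumption
    if sample_ratio is None and sample_count is None:
        return list(range(num_examples))  # None means all examples are used
    if sample_ratio is not None:
        if not 0.0 < sample_ratio <= 1.0:
            raise ValueError("sample_ratio must be in (0, 1]")
        k = int(num_examples * sample_ratio)
    else:
        k = min(int(sample_count), num_examples)  # clamp (assumption)
    rng = random.Random(sample_seed)  # same seed gives the same subset
    return sorted(rng.sample(range(num_examples), k))
```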

pydantic model arctic_training.config.logger.LoggerConfig

Bases: BaseConfig

JSON schema:
{
   "title": "LoggerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "output_dir": {
         "default": "/dev/null",
         "description": "Output directory for log files. ",
         "format": "path",
         "title": "Output Dir",
         "type": "string"
      },
      "level": {
         "default": "WARNING",
         "description": "Log level for the logger. ",
         "title": "Level",
         "type": "string"
      },
      "print_output_ranks": {
         "anyOf": [
            {
               "const": "*",
               "type": "string"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": [
            0
         ],
         "description": "Which ranks will print logs. Either a list of ranks or \"*\" for all ranks. ",
         "title": "Print Output Ranks"
      },
      "file_output_ranks": {
         "anyOf": [
            {
               "const": "*",
               "type": "string"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": "*",
         "description": "Which ranks will output logs to a file. Either a list of ranks or \"*\" for all ranks. ",
         "title": "File Output Ranks"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

Validators:
  • fill_output_ranks » all fields

  • set_wandb_output_dir » all fields

field output_dir: Path = PosixPath('/dev/null')

Output directory for log files.

Validated by:
  • fill_output_ranks

  • set_wandb_output_dir

field level: str = 'WARNING'

Log level for the logger.

Validated by:
  • fill_output_ranks

  • set_wandb_output_dir

field print_output_ranks: Union[Literal['*'], List[int]] = [0]

Which ranks will print logs. Either a list of ranks or “*” for all ranks.

Validated by:
  • fill_output_ranks

  • set_wandb_output_dir

field file_output_ranks: Union[Literal['*'], List[int]] = '*'

Which ranks will output logs to a file. Either a list of ranks or “*” for all ranks.

Validated by:
  • fill_output_ranks

  • set_wandb_output_dir
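Both print_output_ranks and file_output_ranks accept either "*" or a list of integer ranks. A sketch of how such a value can be resolved into a per-rank decision; the helpers are hypothetical and are not the fill_output_ranks validator, and dropping out-of-range ranks in expand_ranks is an assumption.

```python
from typing import List, Literal, Union

Ranks = Union[Literal["*"], List[int]]


def rank_is_selected(rank: int, ranks: Ranks) -> bool:
    """True if this rank should emit output under the given setting."""
    return ranks == "*" or rank in ranks


def expand_ranks(ranks: Ranks, world_size: int) -> List[int]:
    """Concrete sorted rank list; out-of-range ranks are dropped (assumption)."""
    if ranks == "*":
        return list(range(world_size))
    return sorted(r for r in set(ranks) if 0 <= r < world_size)
```

With the defaults, only rank 0 prints to the console, while every rank writes a log file.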

pydantic model arctic_training.config.model.ModelConfig

Bases: BaseConfig

JSON schema:
{
   "title": "ModelConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Model factory type. ",
         "title": "Type",
         "type": "string"
      },
      "name_or_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "format": "path",
               "type": "string"
            }
         ],
         "description": "Model name (as described in Hugging Face model hub) or local path to model checkpoint. ",
         "title": "Name Or Path"
      },
      "dtype": {
         "default": "torch.bfloat16",
         "description": "Data type for model weights. ",
         "examples": [
            "float32",
            "bfloat16"
         ],
         "title": "Torch Dtype",
         "type": "string"
      },
      "save_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Name to use when saving the model. ",
         "title": "Save Name"
      },
      "attn_implementation": {
         "default": "sdpa",
         "description": "Attention implementation to use. ",
         "title": "Attn Implementation",
         "type": "string"
      },
      "disable_activation_checkpoint": {
         "default": false,
         "description": "Disable the use of activation checkpointing. ",
         "title": "Disable Activation Checkpoint",
         "type": "boolean"
      },
      "peft_config": {
         "anyOf": [
            {
               "additionalProperties": true,
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Configuration for Parameter Efficient Fine Tuning. ",
         "title": "Peft Config"
      }
   },
   "additionalProperties": false,
   "required": [
      "name_or_path"
   ]
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field type: str = ''

Model factory type.

field name_or_path: Union[str, Path] [Required]

Model name (as listed on the Hugging Face Model Hub) or local path to a model checkpoint.

field dtype: DType = DType.BF16

Data type for model weights.

field save_name: Optional[str] = None

Name to use when saving the model.

field attn_implementation: str = 'sdpa'

Attention implementation to use.

Validated by:
  • validate_attn_implementation

field disable_activation_checkpoint: bool = False

Disable the use of activation checkpointing.

field peft_config: Optional[Dict] = None

Configuration for Parameter Efficient Fine Tuning.

Validated by:
  • validate_peft_config_type

pydantic model arctic_training.config.optimizer.OptimizerConfig

Bases: BaseConfig

JSON schema:
{
   "title": "OptimizerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Optimizer factory type. Defaults to the `optimizer_factory_type` of the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "weight_decay": {
         "default": 0.1,
         "description": "Coefficient for L2 regularization applied to the optimizer's weights. ",
         "minimum": 0.0,
         "title": "Weight Decay",
         "type": "number"
      },
      "betas": {
         "default": [
            0.9,
            0.999
         ],
         "description": "Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam). ",
         "maxItems": 2,
         "minItems": 2,
         "prefixItems": [
            {
               "type": "number"
            },
            {
               "type": "number"
            }
         ],
         "title": "Betas",
         "type": "array"
      },
      "lr": {
         "default": 0.0005,
         "description": "The initial learning rate. ",
         "minimum": 0.0,
         "title": "Lr",
         "type": "number"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field type: str = ''

Optimizer factory type. Defaults to the optimizer_factory_type of the trainer.

field weight_decay: Annotated[float] = 0.1

Weight decay (L2 regularization) coefficient applied to the model weights by the optimizer.

Constraints:
  • ge = 0.0

  • func = parse_human_val

field betas: Tuple[float, float] = (0.9, 0.999)

Coefficients used for computing running averages of the gradient and its square (e.g., (beta1, beta2) for Adam).
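For Adam-style optimizers, the two betas are the decay rates of the exponential moving averages of the gradient and of its square. A scalar sketch of those two running averages, with the bias correction from the original Adam formulation, shown only to illustrate what betas controls; it is not ArcticTraining's optimizer code, and adam_moments is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def adam_moments(
    grads: Iterable[float], betas: Tuple[float, float] = (0.9, 0.999)
) -> List[Tuple[float, float]]:
    """Bias-corrected (m_hat, v_hat) after each scalar gradient."""
    beta1, beta2 = betas
    m = v = 0.0
    history: List[Tuple[float, float]] = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g      # running average of the gradient
        v = beta2 * v + (1.0 - beta2) * g * g  # running average of its square
        m_hat = m / (1.0 - beta1 ** t)         # correct the bias from zero init
        v_hat = v / (1.0 - beta2 ** t)
        history.append((m_hat, v_hat))
    return history
```

For a constant gradient g, the bias-corrected averages equal g and g squared at every step; larger betas make the averages react more slowly when the gradient changes.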

field learning_rate: Annotated[float] = 0.0005 (alias 'lr')

The initial learning rate.

Constraints:
  • ge = 0.0

  • func = parse_human_val

pydantic model arctic_training.config.scheduler.SchedulerConfig

Bases: BaseConfig

JSON schema:
{
   "title": "SchedulerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Scheduler factory type. Defaults to the `scheduler_factory_type` of the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "lr": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The initial learning rate. Deprecated in favor of `optimizer.learning_rate`. ",
         "title": "Lr"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field type: str = ''

Scheduler factory type. Defaults to the scheduler_factory_type of the trainer.

field learning_rate: Optional[float] = None (alias 'lr')

The initial learning rate. Deprecated in favor of optimizer.learning_rate.

Validated by:
  • _deprecated_learning_rate

pydantic model arctic_training.config.tokenizer.TokenizerConfig

Bases: BaseConfig

JSON schema:
{
   "title": "TokenizerConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "type": {
         "default": "",
         "description": "Tokenizer factory type. Defaults to the `tokenizer_factory_type` of the trainer. ",
         "title": "Type",
         "type": "string"
      },
      "name_or_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "",
         "description": "Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer. ",
         "title": "Name Or Path"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field type: str = ''

Tokenizer factory type. Defaults to the tokenizer_factory_type of the trainer.

field name_or_path: Union[str, Path, None] = ''

Tokenizer name (as listed on the Hugging Face Model Hub) or path to a local directory containing the tokenizer.

pydantic model arctic_training.config.wandb.WandBConfig

Bases: BaseConfig

JSON schema:
{
   "title": "WandBConfig",
   "type": "object",
   "properties": {
      "local_rank": {
         "title": "Local Rank",
         "type": "integer"
      },
      "global_rank": {
         "title": "Global Rank",
         "type": "integer"
      },
      "world_size": {
         "title": "World Size",
         "type": "integer"
      },
      "enable": {
         "default": false,
         "description": "Whether to enable Weights and Biases logging. ",
         "title": "Enable",
         "type": "boolean"
      },
      "entity": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Weights and Biases entity name. ",
         "title": "Entity"
      },
      "project": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "arctic-training",
         "description": "Weights and Biases project name. ",
         "title": "Project"
      },
      "name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Weights and Biases run name. ",
         "title": "Name"
      }
   },
   "additionalProperties": false
}

Config:
  • extra: str = forbid

  • use_enum_values: bool = True

  • validate_default: bool = True

  • use_attribute_docstrings: bool = True

  • populate_by_name: bool = True

  • validate_by_alias: bool = True

  • validate_by_name: bool = True

field enable: bool = False

Whether to enable Weights and Biases logging.

field entity: Optional[str] = None

Weights and Biases entity name.

field project: Optional[str] = 'arctic-training'

Weights and Biases project name.

field name: Optional[str] = None

Weights and Biases run name.

Numerical Formatting

When specifying numerical values in the configuration file, you can use human-friendly strings to represent very large or very small numbers. The following formats are supported:

  • X%: This format represents a percentage. For example, 50% is equivalent to 0.5.

  • XeY: This format represents a number in scientific notation. For example, 1e-6 is equivalent to 0.000001.

  • X^Y: This format represents a number raised to a power. For example, 2^20 is equivalent to 1048576.

  • XK: This format represents a number in thousands (base 10). For example, 1K is equivalent to 1000. Similarly, you can use M for millions, B for billions, and T for trillions.

  • XKi: This format represents a number in units of 1024 (the binary prefix kibi). For example, 1Ki is equivalent to 1024. Similarly, you can use Mi (mebi, 2^20), Gi (gibi, 2^30), and Ti (tebi, 2^40).
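The formats above can be sketched as a small parser. This is an illustrative re-implementation under stated assumptions, not ArcticTraining's parse_human_val: it assumes suffixes are case-sensitive, that each string uses exactly one format, and that plain numbers (including scientific notation such as 1e-6) pass through unchanged. The library may treat edge cases differently.

```python
from typing import Union

Number = Union[int, float]

_DECIMAL = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def _num(text: str) -> Number:
    """Parse an int if possible, else a float (covers 1e-6, 0.5, ...)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_human_number(value: Union[str, Number]) -> Number:
    """Hypothetical parser for human-friendly numeric strings."""
    if isinstance(value, (int, float)):
        return value
    s = value.strip()
    if s.endswith("%"):  # X%: percentage
        return float(s[:-1]) / 100
    if "^" in s:  # X^Y: power
        base, exp = s.split("^", 1)
        return _num(base) ** _num(exp)
    for suffix, mult in _BINARY.items():  # check two-letter binary suffixes first
        if s.endswith(suffix):
            return _num(s[: -len(suffix)]) * mult
    for suffix, mult in _DECIMAL.items():  # base-10 suffixes
        if s.endswith(suffix):
            return _num(s[: -len(suffix)]) * mult
    return _num(s)  # plain integer, float, or XeY notation
```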