Configuration
The main input to the ArcticTraining CLI is a YAML configuration file that
defines values for the fields of the TrainerConfig class. TrainerConfig is a
Pydantic configuration model that also contains the sub-configurations for
data, model, tokenizer, logger, optimizer, scheduler, and checkpointing.
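As a rough illustration of the shape such a file takes, the sketch below builds the equivalent nested mapping in Python and checks it against the `required` entries of the JSON schema on this page (`model` and `data` at the top level, `name_or_path` under `model`, `sources` under `data`). The model and dataset names are placeholders, and `missing_required` is a hypothetical helper that is not part of ArcticTraining; the real validation is performed by the Pydantic models.

```python
# Required keys per section, copied from the "required" lists in the JSON schema.
REQUIRED = {
    "": ["model", "data"],
    "model": ["name_or_path"],
    "data": ["sources"],
}


def missing_required(config: dict) -> list[str]:
    """Return dotted paths of required keys that are absent from `config`."""
    missing = []
    for section, keys in REQUIRED.items():
        scope = config if section == "" else config.get(section, {})
        for key in keys:
            if key not in scope:
                missing.append(f"{section}.{key}" if section else key)
    return missing


# Mirrors what the YAML file would contain once parsed.
config = {
    "type": "sft",
    "model": {"name_or_path": "my-org/my-model"},  # placeholder name
    # Assumes a bare dataset name is accepted, as the DataSourceConfig
    # `type` description suggests; the name itself is a placeholder.
    "data": {"sources": ["my-org/my-dataset"]},
    "epochs": 1,
    "micro_batch_size": 1,
}

print(missing_required(config))
```

Every other field shown in the schema has a default, so a file with only these two sections should be enough to satisfy the required-field check.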
- pydantic model arctic_training.config.trainer.TrainerConfig
  Bases: BaseConfig
  Base Trainer Configuration.
  JSON schema:
{ "title": "TrainerConfig", "description": "Base Trainer Configuration.", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "sft", "description": "Trainer type. ", "title": "Type", "type": "string" }, "code": { "default": "train.py", "description": "Path to the python script containing custom trainer implementation. ", "format": "path", "title": "Code", "type": "string" }, "skip_validation": { "default": false, "description": "Skips validation of types for subconfigs and registered classes. ", "title": "Skip Validation", "type": "boolean" }, "model": { "$ref": "#/$defs/ModelConfig", "description": "Model configuration. " }, "tokenizer": { "$ref": "#/$defs/TokenizerConfig", "description": "Tokenizer configuration. " }, "data": { "$ref": "#/$defs/DataConfig", "description": "Train and eval data configuration. " }, "logger": { "$ref": "#/$defs/LoggerConfig", "description": "Logger configuration. " }, "wandb": { "$ref": "#/$defs/WandBConfig", "description": "Weights and Biases configuration. " }, "scheduler": { "$ref": "#/$defs/SchedulerConfig", "description": "Scheduler configuration. " }, "optimizer": { "$ref": "#/$defs/OptimizerConfig", "description": "Optimizer configuration. " }, "deepspeed": { "additionalProperties": true, "default": {}, "description": "DeepSpeed config dict. Will be automatically filled if not provided by the user. ", "title": "Deepspeed", "type": "object" }, "epochs": { "default": 1, "description": "Number of epochs to train. ", "minimum": 0, "title": "Epochs", "type": "integer" }, "loss_log_interval": { "default": 1, "description": "Number of steps between logging loss. ", "minimum": 0, "title": "Loss Log Interval", "type": "integer" }, "train_log_iter_interval": { "default": 1, "description": "Iters between training metric log outputs. 
`0` is off, only intervals of `1` currently supported. ", "enum": [ 0, 1 ], "title": "Train Log Iter Interval", "type": "integer" }, "train_log_metrics_path": { "default": "train-log-metrics.jsonl", "description": ".jsonl path to log precise metrics according to the `train_log_iter_interval` schedule. Defaults to `./train-log-metrics.jsonl` ", "format": "path", "title": "Train Log Metrics Path", "type": "string" }, "gradient_accumulation_steps": { "default": 1, "description": "Number of gradient accumulation steps. ", "minimum": 1, "title": "Gradient Accumulation Steps", "type": "integer" }, "micro_batch_size": { "default": 1, "description": "Micro batch size per GPU. ", "minimum": 1, "title": "Micro Batch Size", "type": "integer" }, "sequence_parallel_size": { "default": 1, "description": "Sequence Parallelism Degree. Disabled if set to 1 ", "minimum": 1, "title": "Sequence Parallel Size", "type": "integer" }, "activation_checkpoint_cpu_offload": { "default": false, "description": "Offload activation checkpoint tensors to cpu. Enables a much longer sequence length. It is not very beneficial if sequence length is <64k ", "title": "Activation Checkpoint Cpu Offload", "type": "boolean" }, "tiled_mlp_compute": { "default": false, "description": "Tile the MLP computation to save GPU memory. Currently only limited architectures supported, but can be expanded to more. ", "title": "Tiled Mlp Compute", "type": "boolean" }, "seed": { "default": 42, "description": "Random seed value for numpy, python.random, torch, and transformers. ", "minimum": 0, "title": "Seed", "type": "integer" }, "checkpoint": { "default": [], "description": "Checkpoint configurations. Multiple checkpoint engines may be used together. ", "items": { "$ref": "#/$defs/CheckpointConfig" }, "title": "Checkpoint", "type": "array" }, "train_iters": { "default": 0, "description": "Maximum number of training iterations. 
", "minimum": 0, "title": "Train Iters", "type": "integer" }, "eval_interval": { "default": 0, "description": "Number of iterations between evaluations. If 0, no evaluation is performed. ", "minimum": 0, "title": "Eval Interval", "type": "integer" }, "eval_log_iter_interval": { "default": 1, "description": "Iters between eval metric log outputs. `0` is off. ", "minimum": 0, "title": "Eval Log Iter Interval", "type": "integer" }, "exit_iteration": { "default": 0, "description": "Do not continue training after specified iteration count even if there is still data and epochs to run (useful for debugging and tests). ", "minimum": 0, "title": "Exit Iteration", "type": "integer" }, "exit_iteration_this_run": { "default": 0, "description": "Force exit of training after specified iteration count in this run (but will restart running until `exit_iteration` or running out of data/epochs after resume (useful for debugging and tests). ", "minimum": 0, "title": "Exit Iteration This Run", "type": "integer" }, "min_iterations": { "default": 0, "description": "When >0, the training dataset will be replicated until there is enough data to run this many iterations. ", "minimum": 0, "title": "Min Iterations", "type": "integer" }, "overfit_first_batch": { "default": false, "description": "Train only on repetitions of the first training batch. Useful for development. ", "title": "Overfit First Batch", "type": "boolean" }, "mem_profiler": { "default": null, "description": "Enable memory profiling. ", "enum": [ null, "step", "e2e" ], "title": "Mem Profiler" }, "mem_profiler_dir": { "description": "Path to save memory profiling results. Defaults to `logger.output_dir/mem-prof`. ", "format": "path", "title": "Mem Profiler Dir", "type": "string" }, "mem_profiler_max_entries": { "default": 100000, "description": "Maximum number of entries to store in the memory profiler. 
", "minimum": 1, "title": "Mem Profiler Max Entries", "type": "integer" }, "kill_switch_path": { "default": "/tmp/at_kill_switch", "description": "Path to a file that can be used to trigger a graceful shutdown mid-training (sets early exit to True). ", "format": "path", "title": "Kill Switch Path", "type": "string" } }, "$defs": { "CheckpointConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Checkpoint engine type. ", "title": "Type", "type": "string" }, "output_dir": { "description": "Checkpoint output directory. If directory does not exist, it will be created. ", "format": "path", "title": "Output Dir", "type": "string" }, "enabled": { "default": true, "description": "Enable this checkpoint engine. ", "title": "Enabled", "type": "boolean" }, "auto_resume": { "default": false, "description": "If a checkpoint is found in the output directory, resume training from that checkpoint. ", "title": "Auto Resume", "type": "boolean" }, "save_every_n_steps": { "default": 0, "description": "How often to trigger a checkpoint save by training global step count. ", "minimum": 0, "title": "Save Every N Steps", "type": "integer" }, "save_every_n_epochs": { "default": 0, "description": "How often to trigger a checkpoint save by training epoch count. ", "minimum": 0, "title": "Save Every N Epochs", "type": "integer" }, "save_end_of_training": { "default": false, "description": "Whether to save a checkpoint at the end of training. 
", "title": "Save End Of Training", "type": "boolean" } }, "required": [ "output_dir" ], "title": "CheckpointConfig", "type": "object" }, "DataConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Data factory type. Defaults to the `data_factory_type` in the trainer. ", "title": "Type", "type": "string" }, "sources": { "description": "List of data sources to use for training. These must be registered `DataSource`. ", "items": { "$ref": "#/$defs/DataSourceConfig" }, "title": "Sources", "type": "array" }, "eval_sources": { "default": [], "description": "list of data sources to use for evaluation. These must be registered `DataSource`. ", "items": { "$ref": "#/$defs/DataSourceConfig" }, "title": "Eval Sources", "type": "array" }, "train_eval_split": { "default": [ 1.0, 0.0 ], "description": "How much of the training data to use for evaluation. ", "maxItems": 2, "minItems": 2, "prefixItems": [ { "type": "number" }, { "type": "number" } ], "title": "Train Eval Split", "type": "array" }, "max_length": { "default": 8192, "description": "Maximum length of the input sequence. ", "title": "Max Length", "type": "integer" }, "num_proc": { "default": 16, "description": "Number of processes to use for data loading. ", "title": "Num Proc", "type": "integer" }, "dl_num_workers": { "default": 2, "description": "Number of DL workers per gpu. ", "title": "Dl Num Workers", "type": "integer" }, "seed": { "default": 42, "description": "Seed for data loading. ", "title": "Seed", "type": "integer" }, "use_data_cache": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": null, "description": "Whether to cache loaded data. 
", "title": "Use Data Cache" }, "cache_processed_data": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": null, "description": "Deprecated, please use \"use_data_cache\". ", "title": "Cache Processed Data" }, "cache_dir": { "default": "/tmp", "description": "Directory to store cached data. ", "format": "path", "title": "Cache Dir", "type": "string" }, "cache_fs_type": { "default": "auto", "enum": [ "auto", "local", "shared" ], "title": "Cache Fs Type", "type": "string" }, "fail_on_missing_cache": { "default": false, "description": "Whether to fail if the cache is missing. ", "title": "Fail On Missing Cache", "type": "boolean" } }, "required": [ "sources" ], "title": "DataConfig", "type": "object" }, "DataSourceConfig": { "additionalProperties": false, "description": "Base DataSource configuration.", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Data source type. Defaults to 'huggingface' if only a dataset name or path is provided.", "title": "Type", "type": "string" }, "split": { "default": "", "description": "Which split to load for a given data source. This will be automatically set to either \"train\" or \"eval\" if no value is passed.\n\nFor HFDataSource, this can be any value supported by Dataset slice splits:\nhttps://huggingface.co/docs/datasets/en/loading#slice-splits.", "title": "Split", "type": "string" }, "sample_ratio": { "anyOf": [ { "type": "number" }, { "type": "null" } ], "default": null, "description": "Ratio of the dataset to randomly sample. If None, all examples are used.", "title": "Sample Ratio" }, "sample_count": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Number of examples to randomly sample. 
If None, all examples are used.", "title": "Sample Count" }, "sample_seed": { "default": 42, "description": "Seed for random sampling. Used only if `sample_ratio` or `sample_count` is set.", "title": "Sample Seed", "type": "integer" }, "process": { "default": true, "description": "Whether to process the data with the data factory `process` function (e.g., tokenization for SFTDataFactory). ", "title": "Process", "type": "boolean" } }, "title": "DataSourceConfig", "type": "object" }, "LoggerConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "output_dir": { "default": "/dev/null", "description": "Output directory for log files. ", "format": "path", "title": "Output Dir", "type": "string" }, "level": { "default": "WARNING", "description": "Log level for the logger. ", "title": "Level", "type": "string" }, "print_output_ranks": { "anyOf": [ { "const": "*", "type": "string" }, { "items": { "type": "integer" }, "type": "array" } ], "default": [ 0 ], "description": "Which ranks will print logs. Either a list of ranks or \"*\" for all ranks. ", "title": "Print Output Ranks" }, "file_output_ranks": { "anyOf": [ { "const": "*", "type": "string" }, { "items": { "type": "integer" }, "type": "array" } ], "default": "*", "description": "Which ranks will output logs to a file. Either a list of ranks or \"*\" for all ranks. ", "title": "File Output Ranks" } }, "title": "LoggerConfig", "type": "object" }, "ModelConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Model factory type. 
", "title": "Type", "type": "string" }, "name_or_path": { "anyOf": [ { "type": "string" }, { "format": "path", "type": "string" } ], "description": "Model name (as described in Hugging Face model hub) or local path to model checkpoint. ", "title": "Name Or Path" }, "dtype": { "default": "torch.bfloat16", "description": "Data type for model weights. ", "examples": [ "float32", "bfloat16" ], "title": "Torch Dtype", "type": "string" }, "save_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Name to use when saving the model. ", "title": "Save Name" }, "attn_implementation": { "default": "sdpa", "description": "Attention implementation to use. ", "title": "Attn Implementation", "type": "string" }, "disable_activation_checkpoint": { "default": false, "description": "Disable the use of activation checkpointing. ", "title": "Disable Activation Checkpoint", "type": "boolean" }, "peft_config": { "anyOf": [ { "additionalProperties": true, "type": "object" }, { "type": "null" } ], "default": null, "description": "Configuration for Parameter Efficient Fine Tuning. ", "title": "Peft Config" }, "hf_config_kwargs": { "additionalProperties": true, "description": "Optional kwargs to override in the HF model config object created by `AutoConfig.from_pretrained(model.name_or_path)` ", "title": "Hf Config Kwargs", "type": "object" } }, "required": [ "name_or_path" ], "title": "ModelConfig", "type": "object" }, "OptimizerConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Optimizer factory type. Defaults to the `optimizer_factory_type` of the trainer. ", "title": "Type", "type": "string" }, "weight_decay": { "default": 0.1, "description": "Coefficient for L2 regularization applied to the optimizer's weights. 
", "minimum": 0.0, "title": "Weight Decay", "type": "number" }, "betas": { "default": [ 0.9, 0.999 ], "description": "Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam). ", "maxItems": 2, "minItems": 2, "prefixItems": [ { "type": "number" }, { "type": "number" } ], "title": "Betas", "type": "array" }, "lr": { "default": 0.0005, "description": "The initial learning rate. ", "minimum": 0.0, "title": "Lr", "type": "number" } }, "title": "OptimizerConfig", "type": "object" }, "SchedulerConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Scheduler factory type. Defaults to the `scheduler_factory_type` of the trainer. ", "title": "Type", "type": "string" }, "lr": { "anyOf": [ { "type": "number" }, { "type": "null" } ], "default": null, "description": "The initial learning rate. Deprecated in favor of `optimizer.learning_rate`. ", "title": "Lr" } }, "title": "SchedulerConfig", "type": "object" }, "TokenizerConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Tokenizer factory type. Defaults to the `tokenizer_factory_type` of the trainer. ", "title": "Type", "type": "string" }, "name_or_path": { "anyOf": [ { "type": "string" }, { "format": "path", "type": "string" }, { "type": "null" } ], "default": "", "description": "Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer. 
", "title": "Name Or Path" }, "tokenize_kwargs": { "additionalProperties": true, "description": "Optional kwargs to be passed to tokenizer.tokenize in addition or to override the default values passed in the corresponding data factory's process function", "title": "Tokenize Kwargs", "type": "object" } }, "title": "TokenizerConfig", "type": "object" }, "WandBConfig": { "additionalProperties": false, "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "enable": { "default": false, "description": "Whether to enable Weights and Biases logging. ", "title": "Enable", "type": "boolean" }, "entity": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Weights and Biases entity name. ", "title": "Entity" }, "project": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "arctic-training", "description": "Weights and Biases project name. ", "title": "Project" }, "name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Weights and Biases run name. ", "title": "Name" } }, "title": "WandBConfig", "type": "object" } }, "additionalProperties": false, "required": [ "model", "data" ] }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
  build_deepspeed_config » all fields
  coerce_deepspeed_human_friendly_values » deepspeed
  init_checkpoint_configs » checkpoint
  init_data_config » data
  init_dist » all fields
  init_model_config » model
  init_optimizer_config » optimizer
  init_scheduler_config » scheduler
  init_tokenizer_config » tokenizer
  initialize_logger » logger
  mem_profiler_mkdir » all fields
  set_max_length » all fields
  set_tokenizer » all fields
  train_log_metrics_path_prep » all fields
  validate_eval_interval » all fields
  validate_sft_sample_packing » all fields
  validate_single_checkpoint_resume » all fields
- field type: str = 'sft'
  Trainer type.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field code: Path = PosixPath('train.py')
  Path to the Python script containing the custom trainer implementation.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field skip_validation: bool = False
  Skips validation of types for subconfigs and registered classes.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field model: ModelConfig [Required]
  Model configuration.
  - Validated by: build_deepspeed_config, init_dist, init_model_config, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field tokenizer: TokenizerConfig [Optional]
  Tokenizer configuration.
  - Validated by: build_deepspeed_config, init_dist, init_tokenizer_config, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field data: DataConfig [Required]
  Train and eval data configuration.
  - Validated by: build_deepspeed_config, init_data_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field logger: LoggerConfig [Optional]
  Logger configuration.
  - Validated by: build_deepspeed_config, init_dist, initialize_logger, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field wandb: WandBConfig [Optional]
  Weights and Biases configuration.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field scheduler: SchedulerConfig [Optional]
  Scheduler configuration.
  - Validated by: build_deepspeed_config, init_dist, init_scheduler_config, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field optimizer: OptimizerConfig [Optional]
  Optimizer configuration.
  - Validated by: build_deepspeed_config, init_dist, init_optimizer_config, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field deepspeed: Dict[str, Any] = {}
  DeepSpeed config dict. Will be automatically filled if not provided by the user.
  - Validated by: build_deepspeed_config, coerce_deepspeed_human_friendly_values, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field epochs: int = 1
  Number of epochs to train.
  - Constraints: ge = 0
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field loss_log_interval: Annotated[int] = 1
  Number of steps between logging loss.
  - Constraints: ge = 0, func = parse_human_val
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field train_log_iter_interval: Literal[0, 1] = 1
  Iterations between training metric log outputs. `0` turns logging off; only an interval of `1` is currently supported.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field train_log_metrics_path: Path = PosixPath('train-log-metrics.jsonl')
  Path of the .jsonl file that receives precise metrics according to the `train_log_iter_interval` schedule. Defaults to `./train-log-metrics.jsonl`.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
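A `.jsonl` file holds one JSON object per line, so a metrics log of this kind can be appended to after each iteration and read back line by line. The sketch below shows that general pattern only. The record keys (`iter`, `loss`) and both helper names are assumptions made for illustration and may not match what ArcticTraining actually writes.

```python
import json
import tempfile
from pathlib import Path


def append_metrics(path: Path, record: dict) -> None:
    """Append one JSON object to the file as a single line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_metrics(path: Path) -> list[dict]:
    """Parse every non-empty line of the file as one JSON object."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "train-log-metrics.jsonl"
    # Hypothetical per-iteration metrics; real records may carry other keys.
    for step, loss in enumerate([2.31, 2.07, 1.94], start=1):
        append_metrics(log_path, {"iter": step, "loss": loss})
    records = read_metrics(log_path)

print(records[-1])
```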
- field gradient_accumulation_steps: int = 1
  Number of gradient accumulation steps.
  - Constraints: ge = 1
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field micro_batch_size: int = 1
  Micro batch size per GPU.
  - Constraints: ge = 1
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field sequence_parallel_size: int = 1
  Sequence parallelism degree. Disabled if set to 1.
  - Constraints: ge = 1
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
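Taken together, `micro_batch_size`, `gradient_accumulation_steps`, and `sequence_parallel_size` determine how many sequences contribute to each optimizer step. DeepSpeed defines the global batch size as micro batch size × gradient accumulation steps × number of data-parallel ranks. The sketch below applies that relation and additionally assumes that each group of `sequence_parallel_size` GPUs shares the same sequences and so counts as a single data-parallel rank. That second part is an assumption; this page does not show how ArcticTraining computes the value internally.

```python
def global_batch_size(
    micro_batch_size: int,
    gradient_accumulation_steps: int,
    world_size: int,
    sequence_parallel_size: int = 1,
) -> int:
    """Sequences consumed per optimizer step under the stated assumptions."""
    if world_size % sequence_parallel_size != 0:
        raise ValueError("world_size must be divisible by sequence_parallel_size")
    # Assumption: GPUs in one sequence-parallel group process the same samples.
    data_parallel_ranks = world_size // sequence_parallel_size
    return micro_batch_size * gradient_accumulation_steps * data_parallel_ranks


# 8 GPUs, no sequence parallelism.
print(global_batch_size(1, 4, world_size=8))
# Same 8 GPUs split into sequence-parallel groups of 4.
print(global_batch_size(2, 4, world_size=8, sequence_parallel_size=4))
```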
- field activation_checkpoint_cpu_offload: bool = False
  Offload activation checkpoint tensors to CPU. Enables a much longer sequence length, but offers little benefit when the sequence length is below 64k.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field tiled_mlp_compute: bool = False
  Tile the MLP computation to save GPU memory. Only a limited set of architectures is currently supported, though support can be extended to more.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field seed: int = 42
  Random seed value for numpy, python.random, torch, and transformers.
  - Constraints: ge = 0
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field checkpoint: List[CheckpointConfig] = []
  Checkpoint configurations. Multiple checkpoint engines may be used together.
  - Validated by: build_deepspeed_config, init_checkpoint_configs, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field train_iters: Annotated[int] = 0
  Maximum number of training iterations.
  - Constraints: ge = 0, func = parse_human_val
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field eval_interval: Annotated[int] = 0
  Number of iterations between evaluations. If 0, no evaluation is performed.
  - Constraints: ge = 0, func = parse_human_val
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field eval_log_iter_interval: Annotated[int] = 1
  Iterations between eval metric log outputs. `0` is off.
  - Constraints: ge = 0, func = parse_human_val
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field exit_iteration: int = 0
  Do not continue training after the specified iteration count, even if there is still data and epochs to run (useful for debugging and tests).
  - Constraints: ge = 0
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field exit_iteration_this_run: int = 0
  Force training to exit after the specified iteration count in this run. After a resume, training continues until `exit_iteration` is reached or the data/epochs run out (useful for debugging and tests).
  - Constraints: ge = 0
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
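The difference between `exit_iteration` and `exit_iteration_this_run` is easiest to see across several resumed runs. The simulation below is only a sketch of the behaviour described above, not ArcticTraining's implementation: it assumes `0` disables a limit, that `exit_iteration` is compared against the global iteration count, and that `exit_iteration_this_run` is compared against iterations completed since the last (re)start. The function names are hypothetical.

```python
def should_exit(global_iter, iters_this_run, exit_iteration=0, exit_iteration_this_run=0):
    """True once either limit is reached; a limit of 0 is treated as disabled."""
    if exit_iteration > 0 and global_iter >= exit_iteration:
        return True
    if exit_iteration_this_run > 0 and iters_this_run >= exit_iteration_this_run:
        return True
    return False


def simulate_run(start_iter, total_iters, exit_iteration=0, exit_iteration_this_run=0):
    """Run from a resumed global iteration and return the global iteration reached."""
    global_iter = start_iter
    iters_this_run = 0
    while global_iter < total_iters:
        global_iter += 1  # one training iteration
        iters_this_run += 1
        if should_exit(global_iter, iters_this_run, exit_iteration, exit_iteration_this_run):
            break
    return global_iter


# Three consecutive runs, each resuming where the previous one stopped.
position = 0
for _ in range(3):
    position = simulate_run(position, total_iters=100,
                            exit_iteration=10, exit_iteration_this_run=4)
    print(position)
```

Under these assumptions each run stops after four iterations until the overall cap of ten is reached, which is the pattern one would use to exercise checkpoint resume in a test.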
- field min_iterations: Annotated[int] = 0
  When >0, the training dataset will be replicated until there is enough data to run this many iterations.
  - Constraints: ge = 0, func = parse_human_val
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
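To get a feel for how much replication `min_iterations` implies, the arithmetic can be sketched as follows. This assumes the dataset is repeated a whole number of times and that one copy of the data yields a known number of iterations (`iters_per_copy`, a hypothetical input). The actual replication strategy used by ArcticTraining is not described on this page.

```python
import math


def replication_factor(iters_per_copy: int, min_iterations: int) -> int:
    """Smallest number of dataset copies that yields at least `min_iterations`."""
    if iters_per_copy < 1:
        raise ValueError("iters_per_copy must be at least 1")
    if min_iterations <= 0:
        return 1  # feature off: use the dataset once
    return max(1, math.ceil(min_iterations / iters_per_copy))


# One pass over the data gives 30 iterations, but 100 are requested.
print(replication_factor(iters_per_copy=30, min_iterations=100))
```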
- field overfit_first_batch: bool = False
  Train only on repetitions of the first training batch. Useful for development.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field mem_profiler: Literal[None, 'step', 'e2e'] = None
  Enable memory profiling.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field mem_profiler_dir: Path [Optional]
  Path to save memory profiling results. Defaults to `logger.output_dir/mem-prof`.
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field mem_profiler_max_entries: Annotated[int] = 100000
  Maximum number of entries to store in the memory profiler.
  - Constraints: ge = 1, func = parse_human_val
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
- field kill_switch_path: Path = PosixPath('/tmp/at_kill_switch')
  Path to a file that can be used to trigger a graceful shutdown mid-training (sets early exit to True).
  - Validated by: build_deepspeed_config, init_dist, mem_profiler_mkdir, set_max_length, set_tokenizer, train_log_metrics_path_prep, validate_eval_interval, validate_sft_sample_packing, validate_single_checkpoint_resume
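A file-based kill switch like `kill_switch_path` is usually polled from the training loop. The sketch below assumes the trigger is simply that the file exists; the page only says the file "can be used to trigger" a shutdown, so the exact condition is an assumption. `kill_switch_triggered` is a hypothetical helper, and a temporary directory stands in for `/tmp/at_kill_switch`.

```python
import tempfile
from pathlib import Path


def kill_switch_triggered(path: Path) -> bool:
    """Assumed trigger condition: the kill-switch file exists."""
    return path.exists()


with tempfile.TemporaryDirectory() as tmp:
    switch = Path(tmp) / "at_kill_switch"
    before = kill_switch_triggered(switch)
    switch.touch()  # what an operator would do from a shell with `touch`
    after = kill_switch_triggered(switch)

print(before, after)
```

With a check of this kind, the loop can finish the current iteration and save state before exiting rather than being killed mid-step.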
- pydantic model arctic_training.config.checkpoint.CheckpointConfig
  Bases: BaseConfig
  JSON schema:
{ "title": "CheckpointConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Checkpoint engine type. ", "title": "Type", "type": "string" }, "output_dir": { "description": "Checkpoint output directory. If directory does not exist, it will be created. ", "format": "path", "title": "Output Dir", "type": "string" }, "enabled": { "default": true, "description": "Enable this checkpoint engine. ", "title": "Enabled", "type": "boolean" }, "auto_resume": { "default": false, "description": "If a checkpoint is found in the output directory, resume training from that checkpoint. ", "title": "Auto Resume", "type": "boolean" }, "save_every_n_steps": { "default": 0, "description": "How often to trigger a checkpoint save by training global step count. ", "minimum": 0, "title": "Save Every N Steps", "type": "integer" }, "save_every_n_epochs": { "default": 0, "description": "How often to trigger a checkpoint save by training epoch count. ", "minimum": 0, "title": "Save Every N Epochs", "type": "integer" }, "save_end_of_training": { "default": false, "description": "Whether to save a checkpoint at the end of training. ", "title": "Save End Of Training", "type": "boolean" } }, "additionalProperties": false, "required": [ "output_dir" ] }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
resolve_output_dir » output_dir
-
field type:
str= '' Checkpoint engine type.
-
field output_dir:
Path[Required] Checkpoint output directory. If directory does not exist, it will be created.
- Validated by:
resolve_output_dir
-
field enabled:
bool= True Enable this checkpoint engine.
-
field auto_resume:
bool= False If a checkpoint is found in the output directory, resume training from that checkpoint.
-
field save_every_n_steps:
Annotated[int] = 0 How often to trigger a checkpoint save by training global step count.
- Constraints:
ge = 0
func = parse_human_val
json_schema_input_type = PydanticUndefined
-
field save_every_n_epochs:
Annotated[int] = 0 How often to trigger a checkpoint save by training epoch count.
- Constraints:
ge = 0
func = parse_human_val
json_schema_input_type = PydanticUndefined
-
field save_end_of_training:
bool= False Whether to save a checkpoint at the end of training.
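The interaction of the save-related fields can be sketched as a single predicate. This is a minimal sketch under stated assumptions: `CheckpointSettings` and `should_save` are hypothetical names, and treating an interval of 0 as "disabled" is an assumption suggested by the defaults, not something the reference above states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointSettings:
    """Hypothetical stand-in mirroring a few CheckpointConfig fields."""

    enabled: bool = True
    save_every_n_steps: int = 0
    save_every_n_epochs: int = 0
    save_end_of_training: bool = False


def should_save(
    cfg: CheckpointSettings,
    global_step: int,
    epochs_completed: Optional[int] = None,
    training_finished: bool = False,
) -> bool:
    """Return True if a checkpoint would be written at this point.

    `epochs_completed` is only passed when the current step ends an epoch.
    """
    if not cfg.enabled:
        return False
    if training_finished and cfg.save_end_of_training:
        return True
    # Assumption: an interval of 0 means "never trigger on this condition".
    if cfg.save_every_n_steps > 0 and global_step % cfg.save_every_n_steps == 0:
        return True
    if (
        epochs_completed is not None
        and cfg.save_every_n_epochs > 0
        and epochs_completed % cfg.save_every_n_epochs == 0
    ):
        return True
    return False


cfg = CheckpointSettings(save_every_n_steps=100, save_end_of_training=True)
print(should_save(cfg, global_step=200))
print(should_save(cfg, global_step=150))
print(should_save(cfg, global_step=150, training_finished=True))
```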
- pydantic model arctic_training.config.data.DataConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "DataConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Data factory type. Defaults to the `data_factory_type` in the trainer. ", "title": "Type", "type": "string" }, "sources": { "description": "List of data sources to use for training. These must be registered `DataSource`. ", "items": { "$ref": "#/$defs/DataSourceConfig" }, "title": "Sources", "type": "array" }, "eval_sources": { "default": [], "description": "list of data sources to use for evaluation. These must be registered `DataSource`. ", "items": { "$ref": "#/$defs/DataSourceConfig" }, "title": "Eval Sources", "type": "array" }, "train_eval_split": { "default": [ 1.0, 0.0 ], "description": "How much of the training data to use for evaluation. ", "maxItems": 2, "minItems": 2, "prefixItems": [ { "type": "number" }, { "type": "number" } ], "title": "Train Eval Split", "type": "array" }, "max_length": { "default": 8192, "description": "Maximum length of the input sequence. ", "title": "Max Length", "type": "integer" }, "num_proc": { "default": 16, "description": "Number of processes to use for data loading. ", "title": "Num Proc", "type": "integer" }, "dl_num_workers": { "default": 2, "description": "Number of DL workers per gpu. ", "title": "Dl Num Workers", "type": "integer" }, "seed": { "default": 42, "description": "Seed for data loading. ", "title": "Seed", "type": "integer" }, "use_data_cache": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": null, "description": "Whether to cache loaded data. ", "title": "Use Data Cache" }, "cache_processed_data": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": null, "description": "Deprecated, please use \"use_data_cache\". ", "title": "Cache Processed Data" }, "cache_dir": { "default": "/tmp", "description": "Directory to store cached data. ", "format": "path", "title": "Cache Dir", "type": "string" }, "cache_fs_type": { "default": "auto", "enum": [ "auto", "local", "shared" ], "title": "Cache Fs Type", "type": "string" }, "fail_on_missing_cache": { "default": false, "description": "Whether to fail if the cache is missing. ", "title": "Fail On Missing Cache", "type": "boolean" } }, "$defs": { "DataSourceConfig": { "additionalProperties": false, "description": "Base DataSource configuration.", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Data source type. Defaults to 'huggingface' if only a dataset name or path is provided.", "title": "Type", "type": "string" }, "split": { "default": "", "description": "Which split to load for a given data source. This will be automatically set to either \"train\" or \"eval\" if no value is passed.\n\nFor HFDataSource, this can be any value supported by Dataset slice splits:\nhttps://huggingface.co/docs/datasets/en/loading#slice-splits.", "title": "Split", "type": "string" }, "sample_ratio": { "anyOf": [ { "type": "number" }, { "type": "null" } ], "default": null, "description": "Ratio of the dataset to randomly sample. If None, all examples are used.", "title": "Sample Ratio" }, "sample_count": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Number of examples to randomly sample. If None, all examples are used.", "title": "Sample Count" }, "sample_seed": { "default": 42, "description": "Seed for random sampling. Used only if `sample_ratio` or `sample_count` is set.", "title": "Sample Seed", "type": "integer" }, "process": { "default": true, "description": "Whether to process the data with the data factory `process` function (e.g., tokenization for SFTDataFactory). ", "title": "Process", "type": "boolean" } }, "title": "DataSourceConfig", "type": "object" } }, "additionalProperties": false, "required": [ "sources" ] }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
cache_fs_type (Literal['auto', 'local', 'shared'])
eval_sources (List[arctic_training.config.data.DataSourceConfig])
sources (List[arctic_training.config.data.DataSourceConfig])
- Validators:
deprecate_cache_processed_data » cache_processed_data
deprecate_cache_processed_data » use_data_cache
resolve_cache_dir » cache_dir
set_cache_fs_type » all fields
validate_cache_dir » all fields
validate_train_eval_split » all fields
-
field type:
str= '' Data factory type. Defaults to the data_factory_type in the trainer.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field sources:
List[DataSourceConfig] [Required] List of data sources to use for training. These must be registered DataSource.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field eval_sources:
List[DataSourceConfig] = [] List of data sources to use for evaluation. These must be registered DataSource.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field train_eval_split:
Tuple[float,float] = (1.0, 0.0) How much of the training data to use for evaluation.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field max_length:
Annotated[int] = 8192 Maximum length of the input sequence.
- Constraints:
func = parse_human_val
json_schema_input_type = PydanticUndefined
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field num_proc:
int= 16 Number of processes to use for data loading.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field dl_num_workers:
int = 2 Number of DataLoader (DL) workers per GPU.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field seed:
int= 42 Seed for data loading.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field use_data_cache:
Optional[bool] = None Whether to cache loaded data.
- Validated by:
deprecate_cache_processed_data
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field cache_processed_data:
Optional[bool] = None Deprecated, please use “use_data_cache”.
- Validated by:
deprecate_cache_processed_data
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field cache_dir:
Path= PosixPath('/tmp') Directory to store cached data.
- Validated by:
resolve_cache_dir
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
-
field fail_on_missing_cache:
bool= False Whether to fail if the cache is missing.
- Validated by:
set_cache_fs_type
validate_cache_dir
validate_train_eval_split
- validator init_source_configs » eval_sources, sources
Convert string and dict input to correct subclass of DataSourceConfig.
- Return type:
List[DataSourceConfig]
- Parameters:
v (List[str | Dict | DataSourceConfig])
info (ValidationInfo)
Note
If data.max_length is not set in your configuration, it is automatically set to the value of model.config.max_position_embeddings (if available) from the Hugging Face model config. If your model config does not have this attribute, you must set max_length manually to avoid errors caused by sequences longer than the model was built to handle.
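The fallback described in the note can be sketched as follows. This is an illustrative sketch only: `resolve_max_length` is a hypothetical helper, not the library's `set_max_length` validator, and a `SimpleNamespace` stands in for the Hugging Face model config object.

```python
from types import SimpleNamespace
from typing import Optional


def resolve_max_length(user_max_length: Optional[int], hf_model_config: object) -> int:
    """Pick the sequence length: the user's value wins, else the model's limit."""
    if user_max_length is not None:
        return user_max_length
    # Not every model config defines this attribute, so look it up defensively.
    fallback = getattr(hf_model_config, "max_position_embeddings", None)
    if fallback is None:
        raise ValueError(
            "max_length is not set and the model config has no "
            "max_position_embeddings; set data.max_length manually."
        )
    return fallback


with_limit = SimpleNamespace(max_position_embeddings=4096)
without_limit = SimpleNamespace()

print(resolve_max_length(None, with_limit))
print(resolve_max_length(2048, with_limit))
```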
- pydantic model arctic_training.config.logger.LoggerConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "LoggerConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "output_dir": { "default": "/dev/null", "description": "Output directory for log files. ", "format": "path", "title": "Output Dir", "type": "string" }, "level": { "default": "WARNING", "description": "Log level for the logger. ", "title": "Level", "type": "string" }, "print_output_ranks": { "anyOf": [ { "const": "*", "type": "string" }, { "items": { "type": "integer" }, "type": "array" } ], "default": [ 0 ], "description": "Which ranks will print logs. Either a list of ranks or \"*\" for all ranks. ", "title": "Print Output Ranks" }, "file_output_ranks": { "anyOf": [ { "const": "*", "type": "string" }, { "items": { "type": "integer" }, "type": "array" } ], "default": "*", "description": "Which ranks will output logs to a file. Either a list of ranks or \"*\" for all ranks. ", "title": "File Output Ranks" } }, "additionalProperties": false }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
fill_output_ranks » all fields
set_wandb_output_dir » all fields
-
field output_dir:
Path= PosixPath('/dev/null') Output directory for log files.
- Validated by:
fill_output_ranks
set_wandb_output_dir
-
field level:
str= 'WARNING' Log level for the logger.
- Validated by:
fill_output_ranks
set_wandb_output_dir
-
field print_output_ranks:
Union[Literal['*'],List[int]] = [0] Which ranks will print logs. Either a list of ranks or “*” for all ranks.
- Validated by:
fill_output_ranks
set_wandb_output_dir
-
field file_output_ranks:
Union[Literal['*'],List[int]] = '*' Which ranks will output logs to a file. Either a list of ranks or “*” for all ranks.
- Validated by:
fill_output_ranks
set_wandb_output_dir
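The rank fields accept either an explicit list or "*" for all ranks. One plausible way to interpret such a value is sketched below; `expand_ranks` and `rank_should_log` are hypothetical helpers written for illustration and are not the library's `fill_output_ranks` validator.

```python
from typing import List, Union


def expand_ranks(ranks: Union[str, List[int]], world_size: int) -> List[int]:
    """Turn "*" into every rank in the job; pass explicit lists through."""
    if ranks == "*":
        return list(range(world_size))
    return list(ranks)


def rank_should_log(rank: int, ranks: Union[str, List[int]], world_size: int) -> bool:
    """Return True if this rank is selected by the setting."""
    return rank in expand_ranks(ranks, world_size)


# Defaults from the fields above: print on rank 0 only, write files on all ranks.
print(rank_should_log(1, [0], world_size=4))
print(rank_should_log(1, "*", world_size=4))
```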
- pydantic model arctic_training.config.model.ModelConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "ModelConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Model factory type. ", "title": "Type", "type": "string" }, "name_or_path": { "anyOf": [ { "type": "string" }, { "format": "path", "type": "string" } ], "description": "Model name (as described in Hugging Face model hub) or local path to model checkpoint. ", "title": "Name Or Path" }, "dtype": { "default": "torch.bfloat16", "description": "Data type for model weights. ", "examples": [ "float32", "bfloat16" ], "title": "Torch Dtype", "type": "string" }, "save_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Name to use when saving the model. ", "title": "Save Name" }, "attn_implementation": { "default": "sdpa", "description": "Attention implementation to use. ", "title": "Attn Implementation", "type": "string" }, "disable_activation_checkpoint": { "default": false, "description": "Disable the use of activation checkpointing. ", "title": "Disable Activation Checkpoint", "type": "boolean" }, "peft_config": { "anyOf": [ { "additionalProperties": true, "type": "object" }, { "type": "null" } ], "default": null, "description": "Configuration for Parameter Efficient Fine Tuning. ", "title": "Peft Config" }, "hf_config_kwargs": { "additionalProperties": true, "description": "Optional kwargs to override in the HF model config object created by `AutoConfig.from_pretrained(model.name_or_path)` ", "title": "Hf Config Kwargs", "type": "object" } }, "additionalProperties": false, "required": [ "name_or_path" ] }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
validate_attn_implementation » attn_implementation
validate_peft_config_type » peft_config
-
field type:
str= '' Model factory type.
-
field name_or_path:
Union[str,Path] [Required] Model name (as described in Hugging Face model hub) or local path to model checkpoint.
-
field dtype:
DType= DType.BF16 Data type for model weights.
-
field save_name:
Optional[str] = None Name to use when saving the model.
-
field attn_implementation:
str= 'sdpa' Attention implementation to use.
- Validated by:
validate_attn_implementation
-
field disable_activation_checkpoint:
bool= False Disable the use of activation checkpointing.
-
field peft_config:
Optional[Dict] = None Configuration for Parameter Efficient Fine Tuning.
- Validated by:
validate_peft_config_type
-
field hf_config_kwargs:
Dict [Optional] Optional kwargs to override in the HF model config object created by AutoConfig.from_pretrained(model.name_or_path).
- pydantic model arctic_training.config.optimizer.OptimizerConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "OptimizerConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Optimizer factory type. Defaults to the `optimizer_factory_type` of the trainer. ", "title": "Type", "type": "string" }, "weight_decay": { "default": 0.1, "description": "Coefficient for L2 regularization applied to the optimizer's weights. ", "minimum": 0.0, "title": "Weight Decay", "type": "number" }, "betas": { "default": [ 0.9, 0.999 ], "description": "Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam). ", "maxItems": 2, "minItems": 2, "prefixItems": [ { "type": "number" }, { "type": "number" } ], "title": "Betas", "type": "array" }, "lr": { "default": 0.0005, "description": "The initial learning rate. ", "minimum": 0.0, "title": "Lr", "type": "number" } }, "additionalProperties": false }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
-
field type:
str= '' Optimizer factory type. Defaults to the optimizer_factory_type of the trainer.
-
field weight_decay:
Annotated[float] = 0.1 Coefficient for L2 regularization applied to the optimizer’s weights.
- Constraints:
ge = 0.0
func = parse_human_val
json_schema_input_type = PydanticUndefined
-
field betas:
Tuple[float,float] = (0.9, 0.999) Tuple of coefficients used for computing running averages of gradient and its square (e.g., (beta1, beta2) for Adam).
-
field learning_rate:
Annotated[float] = 0.0005 (alias 'lr') The initial learning rate.
- Constraints:
ge = 0.0
func = parse_human_val
json_schema_input_type = PydanticUndefined
- pydantic model arctic_training.config.scheduler.SchedulerConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "SchedulerConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Scheduler factory type. Defaults to the `scheduler_factory_type` of the trainer. ", "title": "Type", "type": "string" }, "lr": { "anyOf": [ { "type": "number" }, { "type": "null" } ], "default": null, "description": "The initial learning rate. Deprecated in favor of `optimizer.learning_rate`. ", "title": "Lr" } }, "additionalProperties": false }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
_deprecated_learning_rate » learning_rate
-
field type:
str= '' Scheduler factory type. Defaults to the scheduler_factory_type of the trainer.
-
field learning_rate:
Optional[float] = None (alias 'lr') The initial learning rate. Deprecated in favor of optimizer.learning_rate.
- Validated by:
_deprecated_learning_rate
- pydantic model arctic_training.config.tokenizer.TokenizerConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "TokenizerConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "type": { "default": "", "description": "Tokenizer factory type. Defaults to the `tokenizer_factory_type` of the trainer. ", "title": "Type", "type": "string" }, "name_or_path": { "anyOf": [ { "type": "string" }, { "format": "path", "type": "string" }, { "type": "null" } ], "default": "", "description": "Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer. ", "title": "Name Or Path" }, "tokenize_kwargs": { "additionalProperties": true, "description": "Optional kwargs to be passed to tokenizer.tokenize in addition or to override the default values passed in the corresponding data factory's process function", "title": "Tokenize Kwargs", "type": "object" } }, "additionalProperties": false }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
-
field type:
str= '' Tokenizer factory type. Defaults to the tokenizer_factory_type of the trainer.
-
field name_or_path:
Union[str,Path,None] = '' Tokenizer name (as described in Hugging Face model hub) or local path directory containing tokenizer.
-
field tokenize_kwargs:
Dict [Optional] Optional kwargs passed to tokenizer.tokenize, either in addition to or overriding the default values passed in the corresponding data factory’s process function.
- pydantic model arctic_training.config.wandb.WandBConfig
Bases:
BaseConfig
Show JSON schema
{ "title": "WandBConfig", "type": "object", "properties": { "local_rank": { "title": "Local Rank", "type": "integer" }, "global_rank": { "title": "Global Rank", "type": "integer" }, "world_size": { "title": "World Size", "type": "integer" }, "enable": { "default": false, "description": "Whether to enable Weights and Biases logging. ", "title": "Enable", "type": "boolean" }, "entity": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Weights and Biases entity name. ", "title": "Entity" }, "project": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "arctic-training", "description": "Weights and Biases project name. ", "title": "Project" }, "name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Weights and Biases run name. ", "title": "Name" } }, "additionalProperties": false }
- Config:
extra: str = forbid
use_enum_values: bool = True
validate_default: bool = True
use_attribute_docstrings: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
-
field enable:
bool= False Whether to enable Weights and Biases logging.
-
field entity:
Optional[str] = None Weights and Biases entity name.
-
field project:
Optional[str] = 'arctic-training' Weights and Biases project name.
-
field name:
Optional[str] = None Weights and Biases run name.
Numerical Formatting
When specifying numerical values in the configuration file, you can use human-friendly strings to represent very large or very small numbers. The following formats are supported:
- X%: This format represents a percentage. For example, 50% is equivalent to 0.5.
- XeY: This format represents a number in scientific notation. For example, 1e-6 is equivalent to 0.000001.
- X^Y: This format represents a number raised to a power. For example, 2^20 is equivalent to 1048576.
- XK: This format represents a number in thousands (base 10). For example, 1K is equivalent to 1000. Similarly, you can use M for millions, B for billions, and T for trillions.
- XKi: This format represents a number in kibibytes (base 2). For example, 1Ki is equivalent to 1024. Similarly, you can use Mi for mebibytes, Gi for gibibytes, and Ti for tebibytes.
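To make the formats above concrete, here is a small stand-alone parser that handles each of them. It is a sketch written for illustration: `parse_human_number` and its helper are hypothetical and may differ from the library's own `parse_human_val` in edge cases and error handling.

```python
from typing import Union

Number = Union[int, float]

DECIMAL_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def _to_number(text: str) -> Number:
    """Parse plain ints and floats, including scientific notation like 1e-6."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_human_number(value: Union[str, int, float]) -> Number:
    """Convert a human-friendly numeric string into an int or float."""
    if not isinstance(value, str):
        return value  # already numeric, nothing to parse
    text = value.strip()
    if text.endswith("%"):
        return _to_number(text[:-1]) / 100
    if "^" in text:
        base, exponent = text.split("^", 1)
        return _to_number(base) ** _to_number(exponent)
    # Check two-character binary suffixes before one-character decimal ones.
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, multiplier in suffixes.items():
            if text.endswith(suffix):
                return _to_number(text[: -len(suffix)]) * multiplier
    return _to_number(text)


for example in ["50%", "1e-6", "2^20", "1K", "1Ki"]:
    print(example, "->", parse_human_number(example))
```

Binary suffixes are tested first so that a value such as 1Mi is never mistaken for a malformed 1M.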