Command Line Interface

llm-atc train

Launch a training job on a cloud provider.

llm-atc train [OPTIONS]

Options

--model_type <model_type>

Required. LLM type to train. Run llm-atc show_models to see a list of supported models.

--finetune_data <finetune_data>

Required. Local or cloud URI of the finetuning data (e.g. ~/mychat.json, s3://my_bucket/my_chat.json).

-n, --name <name>

Required. Name of this model run.

--description <description>

Description of this model run.

-c, --cluster <cluster>

Name of the SkyPilot cluster. If the name matches an existing cluster, that cluster is used.

--cloud <cloud>

Which cloud provider to use.

--envs <envs>

Environment variables for the run. Usage: llm-atc train … --envs 'MODEL_SIZE=7 USE_FLASH_ATTN=0 WANDB_API_KEY=<mywandb_key>'

--accelerator <accelerator>

Required. Which GPU type to use.

--detach_setup <detach_setup>

Launch the task non-interactively. Don’t stream setup logs.

--detach_run <detach_run>

Run the job non-interactively. The calling terminal doesn’t block while the job runs.

--no_setup <no_setup>

Skip setup. Faster if the cluster is already provisioned and UP.
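The options above can be assembled into an invocation programmatically. The following is a minimal Python sketch, not part of llm-atc itself: the helper names format_envs and build_train_command are hypothetical, and the values used (the model type "vicuna", the accelerator string "A100:8", the run name) are placeholders chosen for illustration. Run llm-atc show_models for the real list of model types. The sketch only builds and prints the command; it does not launch anything.

```python
import shlex


def format_envs(envs):
    """Join a dict into the space-separated KEY=VALUE string shown for --envs."""
    return " ".join(f"{key}={value}" for key, value in envs.items())


def build_train_command(model_type, finetune_data, name, accelerator,
                        envs=None, cloud=None):
    """Build the argument list for `llm-atc train` from the documented options."""
    cmd = [
        "llm-atc", "train",
        "--model_type", model_type,        # required
        "--finetune_data", finetune_data,  # required
        "--name", name,                    # required
        "--accelerator", accelerator,      # required
    ]
    if cloud is not None:
        cmd += ["--cloud", cloud]
    if envs:
        # Passed as a single argument, so no shell quoting is needed here.
        cmd += ["--envs", format_envs(envs)]
    return cmd


cmd = build_train_command(
    model_type="vicuna",                          # placeholder model type
    finetune_data="s3://my_bucket/my_chat.json",  # URI from the option text
    name="my-chat-run",                           # placeholder run name
    accelerator="A100:8",                         # placeholder GPU spec
    envs={"MODEL_SIZE": 7, "USE_FLASH_ATTN": 0},
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
# To actually launch: subprocess.run(cmd, check=True)
```

Building the command as a list keeps the whole --envs string as one argument, which is what the quoting in the usage example achieves on the command line.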

llm-atc serve

Create a cluster that serves an OpenAI-compatible API server using FastChat and vLLM.

llm-atc serve [OPTIONS]

Options

-n, --name <name>

Required. Name of the model to serve.

-e, --envs <envs>

Environment variables for this serve deployment, e.g. HF_TOKEN='<huggingface_token>'

--accelerator <accelerator>

Which GPU instance to use for serving.

-c, --cluster <cluster>

Name of the SkyPilot cluster. If the name matches an existing cluster, that cluster is used.

--cloud <cloud>

Which cloud provider to use for deployment.

--region <region>

Which region to deploy to. Defaults to any region.

--zone <zone>

Which zone to deploy to. Defaults to any zone.

--no_setup

Skip the setup step.

Default

False

--detach_setup

Don’t connect to this session

Default

False

-d, --detach_run

Don’t connect to this session

Default

False
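Unlike most options above, --no_setup, --detach_setup and --detach_run are listed without a value and default to False, so they are passed as bare flags. A minimal Python sketch of assembling a serve invocation under that reading follows. The helper name build_serve_command is hypothetical, and the model name and accelerator string are placeholders. The envs string is passed through unchanged, since only a single HF_TOKEN example is documented.

```python
import shlex


def build_serve_command(name, accelerator=None, envs=None,
                        no_setup=False, detach_setup=False, detach_run=False):
    """Build the argument list for `llm-atc serve` from the documented options."""
    cmd = ["llm-atc", "serve", "--name", name]  # --name is required
    if accelerator is not None:
        cmd += ["--accelerator", accelerator]
    if envs is not None:
        cmd += ["--envs", envs]  # passed through verbatim
    # Boolean flags: present means True, absent means the default (False).
    if no_setup:
        cmd.append("--no_setup")
    if detach_setup:
        cmd.append("--detach_setup")
    if detach_run:
        cmd.append("--detach_run")
    return cmd


serve_cmd = build_serve_command(
    name="my-chat-run",    # placeholder model name
    accelerator="A100:1",  # placeholder GPU spec
    detach_run=True,
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(serve_cmd))
# To actually deploy: subprocess.run(serve_cmd, check=True)
```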

llm-atc list

List models created by llm-atc. Models that are done have their status permanently marked as available. Pending jobs require a cluster to be UP in order to update their status. TODO: add checks for status of runs.

llm-atc list [OPTIONS]

Options

--limit <limit>

Maximum number of models to print.

--model_type <model_type>

Filter models by model type

--name <name>

Filter models by name. Matches model names that contain the given pattern.

llm-atc show_models

List supported models for training in llm-atc.

llm-atc show_models [OPTIONS]