Command Line Interface

llm-atc train

Launch a training job on a cloud provider.

llm-atc train [OPTIONS]

Options

--model_type <model_type>

Required LLM type to train. Run llm-atc show_models to see the list of supported models.

--finetune_data <finetune_data>

Required Local or cloud URI of the finetuning data (e.g. ~/mychat.json, s3://my_bucket/my_chat.json).

--checkpoint_bucket <checkpoint_bucket>

Required Object store bucket name.

--checkpoint_store <checkpoint_store>

Required Object store type ['S3', 'GCS', 'AZURE', 'R2', 'IBM'].

-n, --name <name>

Required Name of this model run.

--description <description>

Description of this model run.

-c, --cluster <cluster>

Name of the SkyPilot cluster. If the name matches an existing cluster, that cluster is reused.

--cloud <cloud>

Which cloud provider to use.

--envs <envs>

Environment variable to set on the remote node. Can be specified multiple times. Examples:

1. --envs MY_ENV=1: set $MY_ENV on the cluster to 1.

2. --envs MY_ENV2=$HOME: set $MY_ENV2 on the cluster to the value of $HOME in the local environment where the CLI command is run.

3. --envs MY_ENV3: set $MY_ENV3 on the cluster to the value of $MY_ENV3 in the local environment.

--region <region>

Which region to train in. Defaults to any region.

--zone <zone>

Which zone to train in. Defaults to any zone.

--accelerator <accelerator>

Required Which GPU type to use.

--detach_setup <detach_setup>

Launch the task non-interactively; setup logs are not streamed.

--detach_run <detach_run>

Perform execution non-interactively; the calling terminal does not block on run.

--no_setup

Skip setup. Faster if the cluster is already provisioned and UP.

Default

False
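Putting the options above together, a training launch might look like the following sketch. The model type, bucket name, accelerator spec, and run name are illustrative placeholders, not values confirmed by this reference; run llm-atc show_models for the actual model list and substitute your own bucket and cloud.

```shell
# Illustrative fine-tuning launch (all values are placeholders):
# train a supported model on chat data in S3, checkpointing to an S3 bucket.
llm-atc train \
  --model_type vicuna \
  --finetune_data s3://my_bucket/my_chat.json \
  --checkpoint_bucket my-checkpoint-bucket \
  --checkpoint_store S3 \
  --name myvicuna \
  --description "first finetuning run" \
  --cloud aws \
  --accelerator A100 \
  --detach_run
```

With --detach_run, the command returns once the job is submitted instead of streaming run logs to the calling terminal.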

llm-atc serve

Create a cluster that serves an OpenAI-compatible API server using FastChat and vLLM.

llm-atc serve [OPTIONS]

Options

-n, --name <name>

Required Name of the model to serve.

--source <source>

Object store path to llm-atc finetuned model checkpoints, e.g. s3://<bucket-name>/<path>/<to>/<checkpoints>.

-e, --envs <envs>

Environment variable to set on the remote node. Can be specified multiple times. Examples:

1. --envs MY_ENV=1: set $MY_ENV on the cluster to 1.

2. --envs MY_ENV2=$HOME: set $MY_ENV2 on the cluster to the value of $HOME in the local environment where the CLI command is run.

3. --envs MY_ENV3: set $MY_ENV3 on the cluster to the value of $MY_ENV3 in the local environment.

--accelerator <accelerator>

Which GPU type to use for serving.

-c, --cluster <cluster>

Name of the SkyPilot cluster. If the name matches an existing cluster, that cluster is reused.

--cloud <cloud>

Which cloud provider to use for deployment.

--region <region>

Which region to deploy in. Defaults to any region.

--zone <zone>

Which zone to deploy in. Defaults to any zone.

--no_setup

Skip the setup step.

Default

False

--detach_setup

Run setup non-interactively; don't attach to the setup session.

Default

False

-d, --detach_run

Run execution non-interactively; don't attach to the run session.

Default

False
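A serving launch might look like the sketch below. The model name, checkpoint path, and accelerator are illustrative placeholders; the query step assumes the FastChat/vLLM server listens on its common default port 8000 with the standard OpenAI-compatible /v1 routes, which you should confirm against your deployment.

```shell
# Illustrative serving launch (all values are placeholders):
# serve a finetuned checkpoint from object storage.
llm-atc serve \
  --name myvicuna \
  --source s3://my-checkpoint-bucket/myvicuna \
  --accelerator A100 \
  --cloud aws \
  --detach_run

# Once the cluster is UP, query the OpenAI-compatible endpoint.
# Replace <server-ip> with the cluster head node's IP; port 8000 is the
# usual FastChat/vLLM default, not a value guaranteed by this reference.
curl http://<server-ip>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "myvicuna", "messages": [{"role": "user", "content": "Hello"}]}'
```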

llm-atc list

List models created by llm-atc. Completed models are permanently marked as available. Pending jobs require their cluster to be UP before their status can be updated.

llm-atc list [OPTIONS]

Options

--limit <limit>

Maximum number of models to print.

--model_type <model_type>

Filter models by model type

--name <name>

Filter models by name. Matches model names that contain <name> as a substring.
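The filters combine, so a single invocation can narrow the listing by type and name at once. The model type and name pattern below are illustrative:

```shell
# Show up to 10 models of an illustrative type whose names contain "chat"
llm-atc list --limit 10 --model_type vicuna --name chat
```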

llm-atc show_models

List supported models for training with llm-atc.

llm-atc show_models [OPTIONS]