Get Started: Your First Trainwave Job
This guide walks you through the process of launching your first machine learning job on Trainwave. Let's get started!
Step 1: Create an Organization
An organization helps you manage all your projects under a single umbrella. Each organization has its own separate billing, making it easy to track and manage costs.
When you first create a Trainwave account, you'll need to set up an organization. You can easily do this from the top left corner of the Trainwave web UI.
Step 2: Create a Project
Once you have an organization, you can create a project to house your training jobs. To create a new project, navigate to: https://trainwave.ai/projects
Make a note of your project ID, as you'll need it later.
Step 3: (Optional) Invite team members
If you're working with a team, you can invite them to your organization. You can all share the same billing and collaborate on projects.
You can invite them by going to this page: https://trainwave.ai/orgs/members
Step 4: Fund your account with some credits
To run jobs on Trainwave, you'll need to add credits to your account. Visit the billing page to add funds: https://trainwave.ai/orgs/billing
NOTE: If you run out of funds, your jobs will be terminated and you will not be able to launch another job until you add more credits to your account.
Step 5: Install the CLI
The Trainwave CLI gives you powerful command-line control over your training jobs. Install it using pip:
pip install trainwave-cli
Make sure you have your preferred Python environment set up before installing.
More information on the CLI can be found in the CLI Reference and a more in-depth installation guide in the Installation section.
Step 6: Authenticate the CLI
Log in to Trainwave through your CLI for secure access:
wave auth login
This will open a browser window for authentication.
Alternatively, you can create an API key in the web UI and configure it with the CLI:
wave auth set-token <API_KEY>
Verify your login by running:
wave auth whoami
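On a machine without a browser, the API-key route is usually the more practical one. The sketch below picks between the two login methods described above. It assumes the key has been exported in a variable named TRAINWAVE_API_KEY, which is a hypothetical name chosen for this example and not something the CLI is documented here to read on its own.

```shell
# TRAINWAVE_API_KEY is a hypothetical variable name used only in this sketch.
if ! command -v wave >/dev/null 2>&1; then
  # The CLI is not on PATH yet, so there is nothing to authenticate.
  auth_method="none"
  echo "wave CLI not found; install it first (Step 5)" >&2
elif [ -n "${TRAINWAVE_API_KEY:-}" ]; then
  # Pass the key from the environment instead of typing it inline.
  auth_method="token"
  wave auth set-token "$TRAINWAVE_API_KEY"
else
  # No key exported: fall back to the interactive browser login.
  auth_method="browser"
  wave auth login
fi

# Confirm which account the CLI is now using.
if [ "$auth_method" != "none" ]; then
  wave auth whoami
fi
```

Reading the key from the environment keeps it out of your shell history.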
Step 7: Create your first job
To run a job, you need a trainwave.toml configuration file in your project.
There are two ways to create this file:
- Through the web UI, using the wave config command
- Manually creating the file
Option 1: Using the wave config command
wave config
This will open a browser window where you can configure your job. Once you're done, the configuration file will be saved in your project.
The file will be saved to trainwave.toml in your project directory, or you can specify a different path. Once the file is saved, you can edit it manually if needed.
We recommend adding any variables that your job requires and setting them as environment variables in the configuration file.
Option 2: Manually creating the file
For this simple setup, we will assume you only need one configuration file. Here is a sample:
name = "Finetune LLAMA3" # The name of the job
project = "p-eqhplsmc" # The project ID you noted in Step 2
expires = "1h" # Optional: Will kill the job after 1h
gpu_type = "RTX A5000" # The type of GPU to use
gpus = 2 # The number of GPUs to use
hdd_size_mb = 51200 # Size of the disk you need
setup_command = "bash setup.sh" # This is a command that will run first to set up your env
run_command = "bash run.sh" # This is the command that starts your training
compliance_soc2 = true # Optional: If you care about compliance
image = "trainwave/pytorch:2.3.1" # You can find the list of images under the "Images" documentation
env_vars.WANDB_API_KEY = "${WANDB_API_KEY}" # This will take your current env value for "WANDB_API_KEY"
env_vars.HUGGINGFACE_TOKEN = "${HF_TOKEN}" # Same for "HF_TOKEN"
Copy and paste this configuration into a file called trainwave.toml in your project directory and customize it to your needs.
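The sample configuration refers to two scripts, setup.sh and run.sh, whose contents this guide does not show. As a rough sketch, the commands below create minimal versions of both in the current directory. The file names requirements.txt and train.py are hypothetical placeholders for your own dependency list and training entry point, and the sketch assumes both commands run from the root of your uploaded project.

```shell
# Write a minimal setup script: installs dependencies before training starts.
# requirements.txt is a hypothetical file name; use your own dependency list.
cat > setup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
pip install -r requirements.txt
EOF

# Write a minimal run script: starts the training process.
# train.py is a hypothetical entry point; replace it with your own.
cat > run.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
python train.py
EOF
```

Keeping the setup and training steps in separate scripts matches the setup_command and run_command split in the configuration file.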
Step 8: Launch!
Once you've configured your job, launch it with:
wave jobs launch
This uploads your code and runs it on a machine in the cloud.
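Because the sample configuration reads WANDB_API_KEY and HF_TOKEN from your current environment, it can help to check that they are set before launching. The following is a minimal sketch of such a pre-flight check. The preflight function is a hypothetical helper written for this example and is not part of the Trainwave CLI.

```shell
# preflight CONFIG_FILE VAR...
# Returns 0 only if the config file exists and every named variable is
# exported with a non-empty value. Hypothetical helper, not a CLI feature.
preflight() {
  config_file="$1"
  shift
  rc=0
  if [ ! -f "$config_file" ]; then
    echo "missing config file: $config_file" >&2
    rc=1
  fi
  for var in "$@"; do
    # printenv only sees exported variables, which is what a child process gets.
    if [ -z "$(printenv "$var")" ]; then
      echo "environment variable not set: $var" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Launch only when the checks pass and the CLI is installed.
if preflight trainwave.toml WANDB_API_KEY HF_TOKEN && command -v wave >/dev/null 2>&1; then
  wave jobs launch
else
  echo "pre-flight checks failed; job not launched" >&2
fi
```

Adjust the list of variable names to match the env_vars entries in your own trainwave.toml.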
Additional documentation
How to manage secrets and environment variables: Variables
To see the full documentation for the configuration file please see: Configuration docs
To see the full documentation for the CLI please see: CLI docs