Replicate: Version control for machine learning

You can store your experiment data in the cloud on Google Cloud Storage or Amazon S3.

This means you can store results from multiple training machines in one place and collaborate with other people.

When you list experiments on your local machine, Replicate will list all of the experiments that have been run anywhere by anyone, so you can easily compare them and download their results.

Log in to Google Cloud

To store data on Google Cloud Storage, you first need to install the Google Cloud SDK if you haven't already. Then, run gcloud init and follow its instructions:

  • Log in to your Google account when prompted.
  • If it asks you to choose a project, pick the default or first option.
  • If it asks whether you want to set a default region, hit enter to accept, then select any region from the list (you want a default region, but it doesn't matter which one).

Next, run the following command. The Cloud SDK needs you to log in a second time so that other applications can use your Google Cloud account:

gcloud auth application-default login

Point Replicate at Google Cloud

Then, create replicate.yaml with this content, replacing [your model name] with your model's name, or some other unique string:

repository: "gs://replicate-repository-[your model name]"

Log in to Amazon Web Services

First, you need to install and configure the Amazon Web Services CLI if you haven't already. The easiest way to do this on macOS is to install it with Homebrew:

  1. If you haven't already, install Homebrew.
  2. Run brew install awscli.
  3. If you don't already have an Amazon Web Services account, sign up for Amazon Web Services.
  4. Run aws configure and follow these instructions to configure it (see the example below). You will need to get an access key ID and a secret access key, and set the region to us-east-1 (unless you want to use a different region). You don't need to set an output format or profile.

If you're on another platform, follow these instructions to install the Amazon Web Services CLI.
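
For reference, an aws configure session looks roughly like this; the key values below are placeholders, and you can leave the output format blank:

    $ aws configure
    AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
    AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Default region name [None]: us-east-1
    Default output format [None]: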

Point Replicate at Amazon S3

Then, create replicate.yaml with this content, replacing [your model name] with your model's name, or some other unique string:

repository: "s3://replicate-repository-[your model name]"

Now, when you run your training script, calling experiment.checkpoint() will upload your working directory and checkpoint metadata to this bucket.
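
To make this concrete, here is a minimal sketch of what a training script using this API might look like. It follows the general shape of the tutorial's train.py, but the parameters, metrics, and training loop below are placeholders rather than the tutorial's actual code:

    import replicate

    def train():
        # Create an experiment; its params are saved to the repository
        experiment = replicate.init(
            path=".", params={"learning_rate": 0.01, "num_epochs": 10}
        )

        for epoch in range(10):
            # ... run one epoch of training here ...
            loss = 1.0 / (epoch + 1)  # placeholder metric

            # Save the working directory and these metrics as a checkpoint.
            # With replicate.yaml pointing at a bucket, this is the call that
            # uploads to cloud storage.
            experiment.checkpoint(path=".", step=epoch, metrics={"loss": loss})

    if __name__ == "__main__":
        train()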

If you're following along from the tutorial, run python train.py again. This time it will save your experiments to cloud storage. (It takes a second to save each checkpoint, so press Ctrl-C after a few epochs if you don't want to wait.)

Now, when you run replicate ls, it will list experiments from the bucket.
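
If you want to double-check that data is actually landing in the bucket, you can also list it directly with your cloud provider's CLI, using whichever bucket name you put in replicate.yaml. For example:

    gsutil ls gs://replicate-repository-[your model name]

or, if you're using Amazon S3:

    aws s3 ls s3://replicate-repository-[your model name]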

Migrate data

If you switch from local file storage to remote cloud storage, you will no longer see your local experiments. Unlike Git, which stores data both locally and remotely, Replicate only stores data in one location.

It is easy to migrate locations, if you ever need to, because Replicate just stores its data as plain files. To migrate from local file storage to the cloud, just copy the .replicate directory, or wherever you store your data, to your Amazon S3 or Google Cloud Storage bucket.
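
For example, assuming your local data is in the default .replicate directory and your bucket matches the replicate.yaml examples above, something like one of these commands should do it:

    # Google Cloud Storage
    gsutil -m rsync -r .replicate gs://replicate-repository-[your model name]

    # Amazon S3
    aws s3 sync .replicate s3://replicate-repository-[your model name]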

What's next

You might want to take a look at:

Let’s build this together

Everyone uses version control for software, but it’s much less common in machine learning.

This causes all sorts of problems: people are manually keeping track of things in spreadsheets, model weights are scattered on S3, and results can’t be reproduced. Somebody who wrote a model has left the team? Bad luck – nothing’s written down and you’ve probably got to start from scratch.

So why isn’t everyone using Git? Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t record information automatically from inside a training script. There are some solutions for these things, but they feel like band-aids.

We spent a year talking to people in the ML community about this, and this is what we found out:

  • We need a native version control system for ML. It’s sufficiently different to normal software that we can’t just put band-aids on existing systems.
  • It needs to be small, easy to use, and extensible. We found people struggling to migrate to “AI Platforms”. We believe tools should do one thing well and combine with other tools to produce the system you need.
  • It needs to be open source. There are a number of proprietary solutions, but something so foundational needs to be built by and for the ML community.

We need your help to make this a reality. If you’ve built this for yourself, or are just interested in this problem, join us to help build a better system for everyone.

Join our Discord chat, or get involved on GitHub.



Core team

Ben Firshman

Product at Docker, creator of Docker Compose.

Andreas Jansson

ML infrastructure and research at Spotify.

We also built arXiv Vanity, which lets you read arXiv papers as responsive web pages.
