Friday, June 05, 2020

Downloading files from Google Cloud

While doing some testing with GATK 4, I found myself needing to download files from Google Cloud. Google Cloud likes to use URLs that start with gs://. For example, the URL for some tumor data is

gs://gatk-best-practices/somatic-b37/HCC1143.bam .

You can't just visit that URL in your browser, though; or at least I couldn't. I had to install gsutil as described here: https://cloud.google.com/storage/docs/gsutil_install#linux . This is one of those weird installs where they provide a script online for you to run; a bit dangerous, but at least they don't ask for sudo. It downloads about a gazillion files, then asks permission to muck with your settings. I said no, of course, and it gave me a couple of files to source if I wanted. One of them provides autocomplete, but the other simply adds a directory to the path, so I created an environment module to do that work. Now I can download the files I need:

$ gsutil cp gs://gatk-best-practices/somatic-b37/HCC1143.bam .
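
If you don't use environment modules, the path change can be sketched in plain shell. This is only a rough equivalent of what the path file does, not a copy of it. The install location ($HOME/google-cloud-sdk) and the GCLOUD_SDK_DIR variable name are my assumptions; point it at wherever the installer actually put things.

```shell
# Assumed install location (hypothetical default); override GCLOUD_SDK_DIR
# if the installer put the SDK somewhere else.
GCLOUD_SDK_DIR="${GCLOUD_SDK_DIR:-$HOME/google-cloud-sdk}"

# Prepend the SDK's bin directory to PATH, but only if it is not already there.
case ":$PATH:" in
    *":$GCLOUD_SDK_DIR/bin:"*) ;;
    *) PATH="$GCLOUD_SDK_DIR/bin:$PATH" ;;
esac
export PATH

# Check whether the shell can find gsutil now; complain on stderr if not.
command -v gsutil || echo "gsutil not found in $GCLOUD_SDK_DIR/bin" >&2
```

Sourcing this from a shell startup file would have the same effect as loading the module, minus the ability to unload it again.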