This tutorial shows you how to use GPUs on Dataflow to process Landsat 8 satellite images and render them as JPEG files. The tutorial is based on the example Processing Landsat satellite images with GPUs.
Objectives
- Build a Docker image for Dataflow that has TensorFlow with GPU support.
- Run a Dataflow job with GPUs.
Costs
This tutorial uses billable components of Google Cloud, including:
- Cloud Storage
- Dataflow
- Artifact Registry
Use the pricing calculator to generate a cost estimate based on your projected usage.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
  gcloud init
- Create or select a Google Cloud project.
  - Create a Google Cloud project:
    gcloud projects create PROJECT_ID
    Replace PROJECT_ID with a name for the Google Cloud project you are creating.
  - Select the Google Cloud project that you created:
    gcloud config set project PROJECT_ID
    Replace PROJECT_ID with your Google Cloud project name.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Dataflow, Cloud Build, and Artifact Registry APIs:
  gcloud services enable dataflow.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com
- If you're using a local shell, then create local authentication credentials for your user account:
  gcloud auth application-default login
  You don't need to do this if you're using Cloud Shell.
  If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.
- Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser
  gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
  - Replace PROJECT_ID with your project ID.
  - Replace USER_IDENTIFIER with the identifier for your user account. For example, user:[email protected].
  - Replace ROLE with each individual role.
- Grant roles to your Compute Engine default service account. Run the following command once for each of the following IAM roles: roles/dataflow.admin, roles/dataflow.worker, roles/bigquery.dataEditor, roles/pubsub.editor, roles/storage.objectAdmin, and roles/artifactregistry.reader.
  gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER[email protected]" --role=SERVICE_ACCOUNT_ROLE
  - Replace PROJECT_ID with your project ID.
  - Replace PROJECT_NUMBER with your project number. To find your project number, see Identify projects.
  - Replace SERVICE_ACCOUNT_ROLE with each individual role.
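The six grants above can also be applied in a loop. The following is a minimal sketch: the PROJECT_ID and PROJECT_NUMBER values are placeholders you must change, and the leading echo makes it a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: print one add-iam-policy-binding command per role.
# PROJECT_ID and PROJECT_NUMBER are placeholders; remove "echo" to
# actually apply the bindings.
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789012"

for ROLE in roles/dataflow.admin roles/dataflow.worker \
    roles/bigquery.dataEditor roles/pubsub.editor \
    roles/storage.objectAdmin roles/artifactregistry.reader; do
  echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${PROJECT_NUMBER}[email protected]" \
    --role="$ROLE"
done
```

Remove the echo only after confirming the printed commands reference the right project and service account.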
- To store the output JPEG image files from this tutorial, create a Cloud Storage bucket:
- In the Google Cloud console, go to the Cloud Storage Buckets page.
- Click Create.
- On the Create a bucket page, enter your bucket information. To go to the next
step, click Continue.
- For Name your bucket, enter a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
-
In the Choose where to store your data section, do the following:
- Select a Location type.
- Choose a location where your bucket's data is permanently stored from the Location type drop-down menu.
- If you select the dual-region location type, you can also choose to enable turbo replication by using the relevant checkbox.
- To set up cross-bucket replication, select Add cross-bucket replication via Storage Transfer Service and follow these steps:
  - In the Bucket menu, select a bucket.
  - In the Replication settings section, click Configure to configure settings for the replication job. The Configure cross-bucket replication pane appears.
  - To filter objects to replicate by object name prefix, enter a prefix that you want to include or exclude objects from, then click Add a prefix.
  - To set a storage class for the replicated objects, select a storage class from the Storage class menu. If you skip this step, the replicated objects will use the destination bucket's storage class by default.
  - Click Done.
-
In the Choose how to store your data section, do the following:
- In the Set a default class section, select Standard.
- To enable hierarchical namespace, in the Optimize storage for data-intensive workloads section, select Enable hierarchical namespace on this bucket.
- In the Choose how to control access to objects section, select whether or not your bucket enforces public access prevention, and select an access control method for your bucket's objects.
-
In the Choose how to protect object data section, do the
following:
- Select any of the options under Data protection that you
want to set for your bucket.
- To enable soft delete, click the Soft delete policy (For data recovery) checkbox, and specify the number of days you want to retain objects after deletion.
- To set Object Versioning, click the Object versioning (For version control) checkbox, and specify the maximum number of versions per object and the number of days after which the noncurrent versions expire.
- To enable the retention policy on objects and buckets, click the Retention (For compliance) checkbox, and then do the following:
- To enable Object Retention Lock, click the Enable object retention checkbox.
- To enable Bucket Lock, click the Set bucket retention policy checkbox, and choose a unit of time and a length of time for your retention period.
- To choose how your object data will be encrypted, expand the Data encryption section and select a Data encryption method.
- Click Create.
Set up your working environment
Download the starter files, and then create your Artifact Registry repository.
Download the starter files
Download the starter files, and then change directories.
Clone the python-docs-samples repository.
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
Go to the sample code directory.
cd python-docs-samples/dataflow/gpu-examples/tensorflow-landsat
Configure Artifact Registry
Create an Artifact Registry repository so that you can upload artifacts. Each repository can contain artifacts for a single supported format.
All repository content is encrypted using either Google-owned and Google-managed encryption keys or customer-managed encryption keys. Artifact Registry uses Google-owned and Google-managed encryption keys by default, and no configuration is required for this option.
You must have at least Artifact Registry Writer access to the repository.
Run the following command to create a new repository. The command uses the --async flag and returns immediately, without waiting for the operation in progress to complete.
gcloud artifacts repositories create REPOSITORY \
--repository-format=docker \
--location=LOCATION \
--async
Replace REPOSITORY with a name for your repository. For each repository location in a project, repository names must be unique.
Before you can push or pull images, configure Docker to authenticate requests for Artifact Registry. To set up authentication to Docker repositories, run the following command:
gcloud auth configure-docker LOCATION-docker.pkg.dev
The command updates your Docker configuration. You can now connect with Artifact Registry in your Google Cloud project to push images.
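To make the naming concrete: the host you authenticate to is LOCATION-docker.pkg.dev, and images you push live under LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE. The following sketch composes both, using example placeholder values only.

```shell
#!/bin/sh
# Sketch: compose the Artifact Registry host and a full image path from
# the pieces used above. All values are example placeholders.
LOCATION="us-central1"
PROJECT_ID="my-project"
REPOSITORY="my-repo"
IMAGE="tensorflow-landsat"

HOST="${LOCATION}-docker.pkg.dev"   # the host passed to gcloud auth configure-docker
IMAGE_PATH="${HOST}/${PROJECT_ID}/${REPOSITORY}/${IMAGE}"
echo "$HOST"
echo "$IMAGE_PATH"
```

The first printed line is the host to pass to gcloud auth configure-docker; the second is the kind of tag you would give to docker push.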
Build the Docker image
Cloud Build lets you build a Docker image by using a Dockerfile and save it into Artifact Registry, where the image is accessible to other Google Cloud products.
Build the container image by using the build.yaml config file.
gcloud builds submit --config build.yaml
Run the Dataflow job with GPUs
The following code block demonstrates how to launch this Dataflow pipeline with GPUs.
Run the Dataflow pipeline by using the run.yaml config file.
export PROJECT=PROJECT_NAME
export BUCKET=BUCKET_NAME
export JOB_NAME="satellite-images-$(date +%Y%m%d-%H%M%S)"
export OUTPUT_PATH="gs://$BUCKET/samples/dataflow/landsat/output-images/"
export REGION="us-central1"
export GPU_TYPE="nvidia-tesla-t4"
gcloud builds submit \
--config run.yaml \
--substitutions _JOB_NAME=$JOB_NAME,_OUTPUT_PATH=$OUTPUT_PATH,_REGION=$REGION,_GPU_TYPE=$GPU_TYPE \
--no-source
Replace the following:
- PROJECT_NAME: the Google Cloud project name
- BUCKET_NAME: the name of the Cloud Storage bucket (without the gs:// prefix)
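Before launching, you can compose and inspect the value that the --substitutions flag receives. The following sketch prints it; the BUCKET value is a placeholder.

```shell
#!/bin/sh
# Sketch: build and print the --substitutions value passed to
# "gcloud builds submit" above. BUCKET is a placeholder.
BUCKET="my-bucket"
JOB_NAME="satellite-images-$(date +%Y%m%d-%H%M%S)"
OUTPUT_PATH="gs://$BUCKET/samples/dataflow/landsat/output-images/"
REGION="us-central1"
GPU_TYPE="nvidia-tesla-t4"

SUBS="_JOB_NAME=$JOB_NAME,_OUTPUT_PATH=$OUTPUT_PATH,_REGION=$REGION,_GPU_TYPE=$GPU_TYPE"
echo "$SUBS"
```

Each _KEY=value pair in the printed string maps onto a user-defined substitution consumed by the build config.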
After you run this pipeline, wait until the command finishes. If you exit your shell, you might lose the environment variables that you've set.
To avoid sharing the GPU between multiple worker processes, this sample uses a machine type with 1 vCPU. The pipeline's memory requirements are handled by using 13 GB of extended memory. For more information, read GPUs and worker parallelism.
View your results
The pipeline in tensorflow-landsat/main.py processes Landsat 8 satellite images and renders them as JPEG files. Use the following steps to view these files.
List the output JPEG files with details by using the Google Cloud CLI.
gcloud storage ls "gs://$BUCKET/samples/dataflow/landsat/" --long --readable-sizes
Copy the files into your local directory.
mkdir outputs
gcloud storage cp "gs://$BUCKET/samples/dataflow/landsat/*" outputs/
Open these image files with the image viewer of your choice.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- See a minimal GPU-enabled TensorFlow example
- See a minimal GPU-enabled PyTorch example
- Learn more about GPU support on Dataflow.
- See tasks for Using GPUs.
- Explore reference architectures, diagrams, and best practices for Google Cloud. Take a look at our Cloud Architecture Center.