I am currently studying for the GCP Data Engineer exam and have struggled to understand when to use Cloud Scheduler and when to use Cloud Composer.

From reading the docs, I have the impression that Cloud Composer should be used when there are interdependencies between jobs, e.g. when one job needs the output of another and must start only after the first has finished. You can then flexibly chain as many of these "workflows" as you want, restart jobs when they fail, run batch jobs and shell scripts, chain queries, and so on.

Cloud Scheduler has very similar capabilities in terms of what tasks it can execute. However, it is used more for regular jobs that run at fixed intervals, and not necessarily when there are interdependencies between jobs or when one job has to wait for another before starting. It therefore seems to be more tailored to "simpler" tasks.

These thoughts came after attempting to answer some exam questions I found. However, I was surprised by the "correct answers" I found, and was hoping someone could clarify whether these answers are correct and whether I have understood when to use one over the other.

Here are the example questions that confused me on this topic:

You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day.

Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?

Any insight on this would be greatly appreciated.
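To make the "interdependent steps with retries" idea from the post concrete, here is a minimal plain-Python sketch. It is not Cloud Composer, Airflow, or Cloud Scheduler code; it only illustrates what an orchestrator adds on top of a timed trigger: running steps in dependency order and retrying a failed step a fixed number of times. The function `run_pipeline` and the step names (`shell_script`, `hadoop_job`, `bigquery_query`) are hypothetical stand-ins chosen to mirror the first exam question.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps, deps, max_retries=2):
    """Run steps in dependency order, retrying each failed step.

    steps: dict mapping step name -> zero-argument callable (hypothetical jobs)
    deps:  dict mapping step name -> set of step names that must finish first
    max_retries: extra attempts allowed after the first failure
    Returns a list of (step name, attempts used) in execution order.
    """
    graph = {name: deps.get(name, set()) for name in steps}
    log = []
    # static_order() yields every step after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        attempts = 0
        while True:
            attempts += 1
            try:
                steps[name]()
                break
            except Exception:
                if attempts > max_retries:
                    raise RuntimeError(
                        f"step {name!r} failed after {attempts} attempts"
                    )
        log.append((name, attempts))
    return log


# Hypothetical stand-ins for the real jobs; hadoop_job fails once, then succeeds.
calls = {"hadoop_job": 0}


def shell_script():
    pass


def hadoop_job():
    calls["hadoop_job"] += 1
    if calls["hadoop_job"] < 2:
        raise RuntimeError("transient failure")


def bigquery_query():
    pass


steps = {
    "shell_script": shell_script,
    "hadoop_job": hadoop_job,
    "bigquery_query": bigquery_query,
}
# hadoop_job waits for shell_script; bigquery_query waits for hadoop_job.
deps = {"hadoop_job": {"shell_script"}, "bigquery_query": {"hadoop_job"}}

result = run_pipeline(steps, deps)
```

A purely time-based trigger would start each of these steps at a fixed time whether or not the previous one had finished or succeeded. The ordering and retry logic above is the part a workflow orchestrator is meant to manage.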