Create a Scheduled Job

On the Job Management page in Safe Haven, click Create Job to configure a PySpark or Python job. A job specifies the files that contain the Python code you want to run, either on a schedule or immediately. After you create a job, it appears on the Job Management page, where you can view its status and next scheduled run time, and run, stop, enable, disable, or delete it.

Before you create a job, do the following:

  • In your code repository bucket, create the Python code files that you want to run, for example:

    gs://{name_of_your_GCP_bucket}-coderepo/{your_file_name}.py

  • Determine any optional arguments.

  • Decide when the job should run and at what interval if it is a recurring job.
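As a minimal sketch of the path convention shown above, the helper below builds the gs:// location of a code file from a bucket name and a file name. The function name code_repo_uri and the sample values are hypothetical and used only for illustration; the "-coderepo" suffix and the ".py" extension come from the example path.

```python
def code_repo_uri(bucket_name: str, file_name: str) -> str:
    """Build the gs:// URI of a job file in the code repository bucket.

    Hypothetical helper: it only mirrors the path pattern
    gs://{name_of_your_GCP_bucket}-coderepo/{your_file_name}.py
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be a bare bucket name")
    if not file_name:
        raise ValueError("file_name must not be empty")
    # Append the .py extension only when the caller left it off.
    if not file_name.endswith(".py"):
        file_name += ".py"
    return f"gs://{bucket_name}-coderepo/{file_name}"


# "acme" and "daily_report" are placeholder values.
print(code_repo_uri("acme", "daily_report"))
```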

Procedure. To create a scheduled job:
  1. In the left navigation bar, click Job Management.

    LSH-Select-Job_Management.png

    The Job Management page displays, with the Scheduled Jobs tab selected by default.

    job-manager-parent-main-01.png
  2. On the Scheduled Jobs tab, click Create Job.

    job-management-create-job-08272025-01.png

    The Create Job page displays the Details step.

    job-manager-jobs-config-1--01.png
  3. Enter a unique name for the job and a description that will help you remember what the job does, and then click Next.

    The Job Settings step appears.

    job-manager-jobs-config-2--01.png
  4. From the Job Type drop-down list, select PySpark (the default option) or Python.

    • If you selected PySpark:

      job-management-step-2-pyspark-08202025-01.png
      1. In the Cluster Type drop-down list, select the Dataproc cluster type based on the resource intensity of your PySpark job (see “Python Server and PySpark Dataproc Cluster Sizing” below).

      2. In the Dataproc Version drop-down list, select a Dataproc version.

    • If you selected Python:

      job-management-step-2-python-08202025-01.png
      1. In the Version drop-down list, select a Python version.

      2. In the Server Type drop-down list, select the server size based on the resource intensity of your Python job (see “Python Server and PySpark Dataproc Cluster Sizing” below).

    Python Server and PySpark Dataproc Cluster Sizing

    Note

    Size      Machine Type    vCPUs  RAM     Worker Number  Max Number
    Small     n2-standard-4   4      16 GB   2              16
    Medium    n2-standard-8   8      32 GB   2              16
    Large     n2-standard-16  16     64 GB   2              16
    X-Large   n2-standard-32  32     128 GB  2              16
    XX-Large  n2-standard-64  64     256 GB  2              16
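One way to use the sizing table is to pick the smallest size whose machine type meets a job's per-machine vCPU and RAM needs. The sketch below encodes the table rows and does that lookup. The names Size, SIZES, and smallest_fit are hypothetical. The sketch ignores the Worker Number and Max Number columns, so for a PySpark job it reflects a single machine, not the whole cluster.

```python
from typing import NamedTuple, Optional


class Size(NamedTuple):
    name: str
    machine_type: str
    vcpus: int
    ram_gb: int


# Rows copied from the sizing table, ordered smallest to largest.
SIZES = [
    Size("Small", "n2-standard-4", 4, 16),
    Size("Medium", "n2-standard-8", 8, 32),
    Size("Large", "n2-standard-16", 16, 64),
    Size("X-Large", "n2-standard-32", 32, 128),
    Size("XX-Large", "n2-standard-64", 64, 256),
]


def smallest_fit(min_vcpus: int, min_ram_gb: int) -> Optional[Size]:
    """Return the smallest size that meets both requirements, or None."""
    for size in SIZES:
        if size.vcpus >= min_vcpus and size.ram_gb >= min_ram_gb:
            return size
    return None


# A job that needs about 6 vCPUs and 20 GB of RAM per machine.
choice = smallest_fit(6, 20)
print(choice.name if choice else "No listed size is large enough")
```

A linear scan is enough here because the list has five entries and is already sorted by capacity.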

  5. In the Code File section, click Add Code File to select the code file for the job.

    Note

    When managing code files for scheduled jobs in the GitLab shared repository, be aware of the following:

    • Removed files: If your scheduled job's code file is removed from the GitLab repository, the removal will not impact existing scheduled jobs that use the file. These jobs will continue to use the version of the file that existed before its removal. If the removal of the file from the GitLab repository was expected and you no longer want its associated scheduled jobs to run, you must stop or disable the scheduled jobs in the Job Management UI.

    • Modified files: If your scheduled job's code file is modified in the GitLab repository, the modification might affect any scheduled jobs associated with the file. No action is needed if the change was expected, because scheduled jobs automatically use the latest file version. If the modification was unexpected, you can either revert the changes in Jupyter or select a previous version of the code file in the Job Management UI.

    job-management-step-2-add-code-file-08202025-01.png

    The From GCS bucket and From GitLab shared repository drop-down lists display.

    job-management-step-2-add-code-drop-downs-08202025-01.png
  6. Open the drop-down list for the location that contains the file you want to use for this job.

    Note

    The file must exist in the GCS bucket or GitLab repository before it can be selected.

    • For files in a GCS bucket, click the From GCS bucket drop-down list and select the appropriate file.

    • For files in a GitLab shared repository, click the From GitLab shared repository drop-down list and select the appropriate branch and file.

  7. Click Add Code File to add the selected code file.

    job-management-step-2-add-code2-08202025-01.png
  8. (Optional) In the Arguments field, enter a comma-delimited list of job arguments to pass to the main class and to any additional Python files.

    job-management-step-2-add-arguments-08202025-01.png
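On the code side, a job file needs to read whatever arguments the job passes to it. The sketch below uses the standard argparse module. It assumes that each item in the comma-delimited list reaches the script as a separate command-line argument, which this page does not confirm, so check how your job actually receives its arguments. The argument names input_path, run_date, and mode, and the sample values, are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_job_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse positional job arguments.

    Assumption: the platform passes each comma-delimited item as its own
    command-line argument. Pass argv=None to read from sys.argv.
    """
    parser = argparse.ArgumentParser(description="Hypothetical scheduled job")
    parser.add_argument("input_path", help="location of the input data")
    parser.add_argument("run_date", help="date to process, as YYYY-MM-DD")
    # Optional third argument with a fallback value.
    parser.add_argument("mode", nargs="?", default="full")
    return parser.parse_args(argv)


# Simulates an Arguments field of: gs://example-bucket/input.csv,2025-01-31
args = parse_job_args(["gs://example-bucket/input.csv", "2025-01-31"])
print(args.input_path, args.run_date, args.mode)
```

Accepting an explicit argv list keeps the parsing logic testable without running the job.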
  9. (Optional) To add an additional file needed to run your job:

    1. Click Add Files.

      job-management-step-2-add-files-08202025-01.png

      The From GCS bucket and From GitLab shared repository drop-down lists display.

      job-management-step-2-add-files2-08202025-01.png
    2. Open the drop-down list for the location that contains the file you want to use for this job.

      Note

      The file must exist in the GCS bucket or GitLab repository before it can be selected.

      • For files in a GCS bucket, click the From GCS bucket drop-down list and select the appropriate file.

      • For files in a GitLab shared repository, click the From GitLab shared repository drop-down list and select the appropriate branch and file.

    3. Click Add Files to add the selected file.

      job-management-step-2-add-files3-08202025-01.png
    4. To add additional files, repeat Step 9.

  10. (Optional) If your Python job requires any non-standard Python packages, enter each package in the Additional Packages field using the <package_name>==<version_number> format. For more information, see "Supported Python Packages."

    job-management-step-2-add-files4-08202025-01.png
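Before typing a package into the Additional Packages field, it can help to confirm that the entry matches the <package_name>==<version_number> format. The sketch below is a rough check with a regular expression. The names PIN_PATTERN and parse_pin are hypothetical, the pattern is a simplification and not the full Python packaging grammar, and the check does not tell you whether Safe Haven supports the package.

```python
import re
from typing import Tuple

# Simplified pattern: a package name, "==", then a version string.
PIN_PATTERN = re.compile(
    r"([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9][A-Za-z0-9.!+_-]*)"
)


def parse_pin(spec: str) -> Tuple[str, str]:
    """Split 'name==version' into (name, version) or raise ValueError."""
    match = PIN_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"expected <package_name>==<version_number>, got {spec!r}")
    return match.group(1), match.group(2)


# "requests==2.31.0" is only a sample entry.
name, version = parse_pin("requests==2.31.0")
print(name, version)
```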
  11. Click Next. The Schedule step appears.

  12. From the Repeats Every drop-down list, select an interval at which you want to repeat the job.

    job-manager-jobs-config-3--01.png

    To specify a custom schedule, enter a cron schedule expression, which tells the cron utility to run your job at a given day and time and at a recurring interval. A cron expression is a string of space-delimited fields, each containing integers or the special characters comma (,), hyphen (-), asterisk (*), and slash (/), in the following order:

    • Minute (0–59)

    • Hour (0–23)

    • Day of the month (1–31)

    • Month of the year (1–12)

    • Day of the week (0–6, where 0 = Sunday)

    For more information, see Google Cloud's "Configuring cron job schedules."
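For illustration, in standard five-field cron syntax, "*/15 * * * *" runs every 15 minutes, "0 6 * * 1-5" runs at 06:00 Monday through Friday, and "30 2 1 * *" runs at 02:30 on the first day of each month. The sketch below checks an expression against the field order and ranges listed above before you enter it. The names CRON_FIELDS and validate_cron are hypothetical, and the check is deliberately narrow: it rejects month or weekday names and other extensions that some cron implementations accept, and it does not show what Safe Haven itself accepts.

```python
from typing import List, Tuple

# (field name, minimum, maximum), in the order the fields appear.
CRON_FIELDS: List[Tuple[str, int, int]] = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]


def _check_part(part: str, name: str, low: int, high: int) -> None:
    """Validate one comma-separated item such as '*', '5', '1-5', or '*/15'."""
    base, slash, step = part.partition("/")
    if slash and (not step.isdigit() or int(step) < 1):
        raise ValueError(f"{name}: bad step in {part!r}")
    if base == "*":
        return
    start, dash, end = base.partition("-")
    bounds = [start, end] if dash else [start]
    for value in bounds:
        if not value.isdigit() or not low <= int(value) <= high:
            raise ValueError(f"{name}: {value!r} is outside {low}-{high}")
    if dash and int(start) > int(end):
        raise ValueError(f"{name}: range {base!r} is reversed")


def validate_cron(expression: str) -> None:
    """Raise ValueError if the expression is not five in-range fields."""
    fields = expression.split()
    if len(fields) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    for field, (name, low, high) in zip(fields, CRON_FIELDS):
        for part in field.split(","):
            _check_part(part, name, low, high)


for expr in ["*/15 * * * *", "0 6 * * 1-5", "30 2 1 * *"]:
    validate_cron(expr)  # raises ValueError if the expression is invalid
```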

  13. As needed, enter the start run time and time zone options.

  14. Click Next, review your job information, and then click Create.

    You are returned to the Job Management page and your job displays the Processing status.

    To disable your job at any time, toggle its Enable switch to the off position.