Limiting Job Concurrency on a Remote Sparkler Engine

When ETL jobs are compiled on a remote Sparkler engine, all of the jobs are executed simultaneously by default. For pipelines with more than 110 jobs, running every job concurrently can consume all of the ports in the default port range and cause the pipeline to fail. To limit the number of jobs that Sparkler executes concurrently on a remote Spark cluster, you can add a configuration file to the cluster that specifies the maximum number of jobs to run at the same time. When the number of jobs exceeds that limit, the additional jobs are queued and then executed as resources are freed. Follow the instructions below to configure the limit.
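At a glance, the procedure looks like the sketch below. The scp command and the <anzo_host> placeholder are only an illustration of one way to copy the template from the Anzo server to the remote cluster; use whatever file transfer method you prefer. The detailed steps follow.

    # Stop the Sparkler server on the remote cluster (step 1)
    ./<install_path>/sparkler/bin/sparkler-server stop
    # Copy the configuration template from the Anzo server to the remote cluster
    # and rename it to application.conf (steps 2 and 3)
    scp <anzo_host>:<install_path>/Server/data/sdiScripts/spark-2.2/compile/dependencies-lib/sparkler/conf/application.conf.template <install_path>/sparkler/conf/application.conf
    # Edit maxActiveJobs in application.conf (step 4), then restart the server (step 5)
    ./<install_path>/sparkler/bin/sparkler-server start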

  1. If necessary, run the following command to stop the remote Sparkler server:
    ./<install_path>/sparkler/bin/sparkler-server stop
  2. Anzo's embedded Sparkler engine includes a configuration file template, application.conf.template, that you can copy to the remote cluster. If needed, you can retrieve application.conf.template from the following directory on the Anzo server:
    <install_path>/Server/data/sdiScripts/spark-2.2/compile/dependencies-lib/sparkler/conf
  3. Rename application.conf.template to application.conf and place application.conf in the <install_path>/sparkler/conf/ directory on the remote cluster.
  4. Open application.conf in an editor. At the top of the file, in the server block, change the value of maxActiveJobs to the maximum number of jobs that you want Sparkler to execute concurrently; a quick way to confirm the change is shown after these steps. The setting and its default value are shown below:
    server {
      actorSystemName = "SparklerServerSystem"
      actorName = "SparklerJobActor"
      retryDelay = "3 seconds"
      maxRetries = 5
      maxActiveJobs = 1
    ...
  5. Save and close application.conf, and then run the following command to restart the Sparkler server:
    ./<install_path>/sparkler/bin/sparkler-server start
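
After you edit the file, a quick way to confirm that the new limit is in place is to read the value back from application.conf on the remote cluster. The value 10 below is only an example; choose a limit that suits your cluster's resources and port range.

    # Check the configured limit (run on the remote cluster)
    grep maxActiveJobs <install_path>/sparkler/conf/application.conf
    # Expected output, echoing the value you set, for example:
    # maxActiveJobs = 10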
