Limiting the Number of Unstructured Status Journals

To limit the disk space used by Graph Studio Unstructured pipelines, you can configure the Graph Studio Unstructured Distributed service to cap the number of status journals kept on disk. When the limit is reached and a pipeline generates a new journal, the oldest journal is deleted.

Journals are removed based on their timestamps alone. The pipeline that a journal belongs to is not considered when selecting which journals to delete.
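The retention rule described above (one global limit, oldest-first eviction by timestamp, pipeline ignored) can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the `Journal` record and the `add_journal` function are hypothetical names, and the `-1` means unlimited convention mirrors the default value documented for the property.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Journal:
    # Hypothetical record for illustration only; not an Anzo type.
    pipeline: str
    timestamp: datetime


def add_journal(journals: list[Journal], new: Journal, limit: int) -> list[Journal]:
    """Add a journal, then evict the oldest journals if the global limit is exceeded.

    A negative limit (the documented default is -1) means unlimited.
    """
    # Order every journal by timestamp, regardless of which pipeline wrote it.
    kept = sorted(journals + [new], key=lambda j: j.timestamp)
    if limit < 0:
        return kept
    # Drop the oldest entries until at most `limit` journals remain.
    excess = len(kept) - limit
    return kept[excess:] if excess > 0 else kept


if __name__ == "__main__":
    def at(minute: int) -> datetime:
        return datetime(2024, 1, 1, 0, minute)

    journals = [Journal("A", at(1)), Journal("B", at(2)), Journal("A", at(3))]
    journals = add_journal(journals, Journal("B", at(4)), limit=3)
    print([(j.pipeline, j.timestamp.minute) for j in journals])
    # → [('B', 2), ('A', 3), ('B', 4)]
```

In the usage example, pipeline A's first journal is evicted even though pipeline B also has older-than-newest journals, because only the timestamp decides which journal goes.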

Follow the instructions below to configure the Unstructured Distributed service to limit the number of status journals on disk.

  1. In the Administration application, expand the Servers menu and click Advanced Configuration. Click I understand and accept the risk.
  2. Search for the Anzo Unstructured Distributed bundle and view its details.
  3. Click the Services tab and expand Anzo Unstructured Distributed.
  4. Locate the com.cambridgesemantics.anzo.unstructured.distributed.defaultNumStatusJournalGlobalLimit property.

  5. Click the property to make it editable, and replace the current value with the maximum number of status journals to keep on disk. The default value is -1 (unlimited). For example, set the value to 10 to keep 10 status journals.

  6. After changing the value, click the checkmark icon for that property to save the change.
  7. Restart Graph Studio to apply the configuration change.