Adding an XML Data Source
This topic provides instructions for adding an XML data source.
- In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources.
- Click the Add Data Source button and select File > XML Data Source. Anzo opens the Create XML Data Source screen.
- Specify a name for the data source in the Title field, and type an optional description in the Description field.
- Click the File Location field to open the File Location dialog box.
- Follow the appropriate steps below depending on whether the file is on your computer or the shared File Store:
If the file is on your computer:
The From Your Computer option is a convenient way to do a one-time ingestion so you can quickly get started with your data. It should not be relied upon as part of a regular onboarding workflow unless the server is configured to store uploaded files on the shared file store as described in Setting the Default File Upload Path. Data source files that are routinely updated and re-ingested should be hosted on a shared file store.
- As a best practice, check the upload location listed in the Upload To field by hovering your pointer over the value to view the tooltip. Make sure the upload location is a directory on the shared file store and not in the server installation path. If the file is not uploaded to the shared file store, applications like AnzoGraph cannot access it. In addition, other users cannot create graphmarts from the data source because they typically do not have access to the file location.
For example, the Upload To tooltip might show that the file will be uploaded to the server installation path, /opt/Anzo/Server/data...
If your Upload To location is configured to upload the file to the server installation path, click Change and select an upload location that is on the shared file store. Clicking Change opens the Upload Folder Location dialog box, where you can select, for example, a folder called fileUploads on the shared file store.
- Drag and drop the file onto the screen or click browse to navigate to the file and select it. Anzo attaches the file and the OK button becomes active.
- Click OK. Anzo lists the path to the file in the XML File Location field.
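The Upload To check described above amounts to asking whether the upload directory sits under the shared file store rather than under the server installation path. The following is a minimal Python sketch of that check, not part of Anzo. The function name and the /mnt/shared path are hypothetical, and /opt/Anzo/Server/data is only the visible prefix of the truncated path shown earlier.

```python
from pathlib import PurePosixPath


def is_on_shared_store(upload_dir, shared_root):
    """Return True if upload_dir is shared_root itself or a directory beneath it."""
    try:
        # relative_to raises ValueError when upload_dir is not under shared_root.
        PurePosixPath(upload_dir).relative_to(PurePosixPath(shared_root))
        return True
    except ValueError:
        return False


# Hypothetical shared file store mount point (an assumption for illustration).
SHARED_ROOT = "/mnt/shared"

print(is_on_shared_store("/mnt/shared/fileUploads", SHARED_ROOT))
print(is_on_shared_store("/opt/Anzo/Server/data", SHARED_ROOT))
```

The comparison is made per path component, so a sibling directory such as /mnt/shared2 is not mistaken for a location under /mnt/shared.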
If the file is on the File Store:
- Click the From File Store radio button.
- In the File Location dialog box, on the left side of the screen, select the appropriate File Store. On the right side of the screen, navigate to the directory that contains the file to import. The screen displays the list of files in the directory.
- Select the file that you want to import and then click OK to close the dialog box. Anzo lists the path to the file in the XML File Location field.
If you have multiple files with the same schema (that is, the files contain the same elements in the same order) and you want them to be imported as if they were a single file, you can select the Insert Wildcard option. Then type a string that uses asterisks as wildcard characters to match the files with similar names. All files that match the string are imported as one file, and the pipeline creates a single job to ingest them. After typing the string, click Apply to add it to the Selected list.
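To picture how an asterisk wildcard string selects a group of files, as with the Insert Wildcard option described above, here is a minimal Python sketch. It assumes glob-style matching through the standard fnmatch module, which also treats ? and [] as special characters. Anzo's exact matching rules are not stated here, so treat this only as an approximation. The file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the names that match an asterisk wildcard pattern, sorted."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


# Hypothetical directory listing on the file store.
listing = ["sales_2023.xml", "sales_2024.xml", "sales_notes.txt", "inventory.xml"]

# Both yearly sales files match, so they would be ingested together.
print(select_files(listing, "sales_*.xml"))
# ['sales_2023.xml', 'sales_2024.xml']
```

Previewing a pattern against a directory listing like this can help confirm that it selects only files that share the same schema.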
- Click Save to create the data source. Anzo adds the source and displays the Overview screen.
- By default, when the data from this source is ingested, the entire root node is captured; the node at the root of the hierarchy is loaded to AnzoGraph in its entirety. Building the hierarchical record for a large file is extremely memory intensive. To increase load performance and decrease memory usage when onboarding a large file with many repeating elements, Cambridge Semantics recommends that you configure the Root Element Name field on the Overview tab. This field designates the element in the hierarchy that should be treated as the root node. Specifying the desired root node tells the Graph Data Interface to scan into memory only the data that you are interested in and not the entire file. To set the root element, follow these steps:
- Click in the Root Element Name field to make it editable.
- Type the name of the element to designate as the root element, exactly as it appears in the file. Anzo captures, in its entirety, whichever node in the hierarchy matches the Root Element Name value.
You do not need to supply the path to an element, even if it is low in the hierarchy. For specificity, however, you can use dot notation to supply the path. For example, specifying "city" captures all city elements anywhere in the file, but specifying "country.state.city" captures only the city elements that are under country and state in the hierarchy. You can also include the dollar sign ($) character to anchor the selector at the root of the file. For example, "data" captures all data elements anywhere in the file, but "$.data" captures only the data elements that are at the root of the hierarchy.
As an example, for a file that contains weather data in daily, hourly, minutely, and currently hierarchies, specifying "hourly" targets only the data under the hourly hierarchy.
- Click the checkmark icon to save the change.
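The selector rules for the Root Element Name field (a bare name, a dot-notation path, and the $ anchor) can be illustrated with a short Python sketch. This is not how the Graph Data Interface is implemented. It only models the matching behavior described above, and it assumes that the XML document element counts as the first step of the path. The function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def matches(path, selector):
    """Check a selector against a list of element names from the top of the file down."""
    parts = selector.split(".")
    if parts[0] == "$":
        # Anchored selector: it must describe the full path from the top of the file.
        return path == parts[1:]
    # Unanchored selector: it only has to match the tail of the path.
    return path[-len(parts):] == parts


def capture(root, selector):
    """Return every element whose path matches the selector (depth-first)."""
    found = []

    def walk(elem, parent_path):
        path = parent_path + [elem.tag]
        if matches(path, selector):
            found.append(elem)
        for child in elem:
            walk(child, path)

    walk(root, [])
    return found


SAMPLE = """
<data>
  <country><state><city>Boston</city></state></country>
  <region><city>Metro</city></region>
  <archive><data>old</data></archive>
</data>
"""

root = ET.fromstring(SAMPLE)
print(len(capture(root, "city")))                # 2: city elements anywhere
print(len(capture(root, "country.state.city")))  # 1: only the city under country and state
print(len(capture(root, "data")))                # 2: the top-level data and the nested one
print(len(capture(root, "$.data")))              # 1: only the data element at the top
```

Note that this sketch parses the whole sample into memory for simplicity. The memory benefit described earlier comes from the Graph Data Interface scanning only the selected elements, which this illustration does not reproduce.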
For information about creating or changing primary keys and foreign keys, see Assigning Primary and Foreign Keys in a Schema.
When you are ready to onboard the data to Anzo, see Onboarding Data with the Automated Workflow for next steps. Or, if you want to onboard or virtualize the source by manually writing SPARQL queries against the Graph Data Interface service, see Onboarding or Virtualizing Data with SPARQL Queries.