Importing Data from XML Files

This topic provides instructions for creating an XML data source, scanning a file, and generating the schema.

  1. In the Anzo console, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources.

  2. Click the Create button and select XML Data Source. Anzo opens the Create XML Data Source screen.

  3. Type a name for the data source in the Title field, and type an optional description in the Description field.
  4. Click the XML File Location field to open the File Location dialog box.
  5. In the File Location dialog box, select the file connection for the file on the left side of the dialog box. On the right side, navigate to the directory that contains the file to import. The dialog box displays the list of files in the directory.

  6. Select the file that you want to import and then click OK to close the dialog box. Alternatively, if you have multiple files with the same schema (the files contain the same elements in the same order), you can select the Insert Wildcard option and then type a string that uses asterisks (*) as wildcard characters to match files with similar names. Anzo imports all files that match the string as one file and creates a single job in the pipeline to ingest them. After typing the string, click Apply to add it to the Selected list.
  7. Specify the type of schema that Anzo should create. Click the Schema Type field and select one of the following types from the drop-down list:
    • Flat: By default, the Schema Type is set to Flat. A flat schema type results in a single schema table with a single mapping file and ETL job. A flat schema is ideal for files that contain many different nested objects whose relationships are mostly one-to-one. However, if the file contains a large number of arrays or several very large arrays, a flat schema is not recommended because the import can require extensive server resources and take a long time to process.
      Note: In Flat mode, Anzo creates relationships that go from the parent node to the child node. For example: Person → Address.
    • Relational: A relational schema type results in multiple schema tables, mappings, and jobs. Generating a relational schema is ideal for files that include many arrays or a number of very large arrays. Creating a relational schema from a file that contains many different objects with one-to-one relationships can result in poor import performance and a very large number of small tables, mappings, and ETL jobs.
      Note: In Relational mode, Anzo creates relationships that go from the child node to the parent node. For example: Address → Person.

    Anzo performs pre-processing before creating the schema. If the specified Schema Type would result in poor performance or require extensive resources, Anzo displays a warning and prompts you to change the schema type before proceeding with the schema creation.

  8. The Schema File Location field defines where Anzo saves the generated schema. Cambridge Semantics recommends that you leave the field blank. If you want to designate a custom location, click Browse and choose a file location.
  9. The value in the Scan Depth field indicates the number of entities in the file that Anzo should scan to find all of the unique objects to include as classes and properties in the generated model. The scan process follows nested objects, counting one object array as one row. Edit the value as needed. A value of -1 instructs Anzo to scan the entire file.
  10. If the XML file contains lists of objects that are not nested, the file scan cannot determine if any of the objects are the same type, and Anzo treats each object as a new type. To ensure that repeating object paths are treated as the same type if the XML elements are all at the same level, use standard XML path (XPath) syntax to define the repeating element types in the Repeating Element Paths field. If the file nests elements, leave this field blank. Separate paths with semicolons (;). For example:
    /root/people;/root/people/vehicles;/root/people/vehicles/maintenance
  11. Click Save & Extract Schema to scan the file and generate the schema. Anzo saves the data source, creates the schema, and displays the data source overview.
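
If you are unsure which paths to enter in the Repeating Element Paths field (step 10), it can help to inspect the file before creating the data source. The following Python sketch is not part of Anzo and does not reproduce Anzo's scan. It uses only the standard library, and the sample document and the repeating_paths helper are hypothetical. It assumes that an element "repeats" when the same tag appears more than once under one parent, and it joins the resulting paths with semicolons in the format that the field expects.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample file contents, modeled on the example paths in step 10.
SAMPLE = """
<root>
  <people><name>Ann</name>
    <vehicles><make>Ford</make>
      <maintenance><date>2021-01-05</date></maintenance>
      <maintenance><date>2021-06-11</date></maintenance>
    </vehicles>
    <vehicles><make>Audi</make></vehicles>
  </people>
  <people><name>Bob</name></people>
</root>
"""

def repeating_paths(xml_text):
    """Return a semicolon-separated list of paths whose tag occurs
    more than once under the same parent element."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(elem, path):
        # Count how often each child tag occurs under this element.
        counts = Counter(child.tag for child in elem)
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if counts[child.tag] > 1 and child_path not in found:
                found.append(child_path)
            walk(child, child_path)

    walk(root, f"/{root.tag}")
    return ";".join(found)

print(repeating_paths(SAMPLE))
# /root/people;/root/people/vehicles;/root/people/vehicles/maintenance
```

Treat the output as a list of candidates to review, not as a value to paste in blindly: a tag that happens to occur only once in the inspected file is not reported, even if it repeats in other files with the same schema.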

To view the schema that Anzo created, you can click the Schema Name link at the bottom of the screen under Schema Details. Anzo opens the Tables screen for the schema, where you can access schema details.
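
When you review the generated tables, it may help to keep in mind how the two schema types in step 7 differ. The following Python sketch is a conceptual illustration only and is not how Anzo builds schemas. The sample data, function names, and the person_id column are hypothetical. It assumes that a flat layout repeats the parent's values for each array entry, while a relational layout stores the array entries in their own table with a reference back to the parent (Address → Person).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: each person has an array of address elements.
SAMPLE = """
<people>
  <person><name>Ann</name>
    <address><city>Boston</city></address>
    <address><city>Austin</city></address>
  </person>
  <person><name>Bob</name>
    <address><city>Denver</city></address>
  </person>
</people>
"""

def flat_rows(xml_text):
    """One wide table: the parent's values repeat for every child entry."""
    rows = []
    for person in ET.fromstring(xml_text).findall("person"):
        for address in person.findall("address"):
            rows.append({"name": person.findtext("name"),
                         "city": address.findtext("city")})
    return rows

def relational_tables(xml_text):
    """Two tables: each address row points back to its parent person."""
    persons, addresses = [], []
    people = ET.fromstring(xml_text).findall("person")
    for person_id, person in enumerate(people, start=1):
        persons.append({"person_id": person_id,
                        "name": person.findtext("name")})
        for address in person.findall("address"):
            addresses.append({"person_id": person_id,
                              "city": address.findtext("city")})
    return persons, addresses

for row in flat_rows(SAMPLE):
    print(row)
persons, addresses = relational_tables(SAMPLE)
print(persons)
print(addresses)
```

Under these assumptions, the flat layout grows with every array entry because parent values are duplicated, which is one way to picture why large arrays favor the Relational schema type.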

The source data can now be onboarded to Anzo. For instructions on onboarding the data by letting Anzo automatically generate the mapping, model, and ETL pipeline, see Ingesting Data into Anzo. For information about generating metrics, see Generating Source Data Metrics. For information about adding a schema to a metadata dictionary, see Using Data Dictionaries.
