Scanning the Whole CSV File on Import

To improve the accuracy of data type assignment when importing CSV files, you can configure Anzo to scan the entire file before inferring the data type for each column. Follow the instructions below to enable whole-file scanning.

This change affects all CSV file imports; users cannot opt out of a complete scan at import time. This configuration is not related to the Use Extended Sample setting in the file import options. Scanning entire files significantly increases the time it takes to import them, because every row must be read before any data type is assigned. However, scanning the complete file is the best way to ensure that data type assignments are accurate.
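The trade-off described above can be illustrated with a minimal Python sketch. It is not Anzo's inference logic. The function names, the three-way type hierarchy, and the sample data are all hypothetical, chosen only to show how inferring a type from the first rows of a file can differ from inferring it from every row.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Return the narrowest of "integer", "decimal", "string" that fits every value."""
    def fits(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if fits(int):
        return "integer"
    if fits(float):
        return "decimal"
    return "string"


def infer_column_types(csv_text, sample_rows=None):
    """Infer one type per column from the first sample_rows rows (all rows if None)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(islice(reader, sample_rows))
    return {name: infer_type([row[name] for row in rows]) for name in reader.fieldnames}


# Hypothetical data: the "code" column looks numeric until the last row.
data = "id,code\n1,100\n2,200\n3,300\n4,A-17\n"

sampled = infer_column_types(data, sample_rows=3)  # looks only at the first 3 rows
full = infer_column_types(data)                    # scans every row

print("sampled:", sampled)
print("full:   ", full)
```

In this sketch the sampled pass labels the code column as an integer because the non-numeric value falls outside the sample, while the full pass labels it as a string. The full pass also has to read every row, which is why a complete scan takes longer on large files.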

  1. In the Administration application, expand the Servers menu and click Advanced Configuration. Click I understand and accept the risk.
  2. Search for the Anzo Utilityservices VFS bundle and view its details.
  3. Click the Services tab and expand UtilityServices VFS Activator.
  4. Find the com.cambridgesemantics.anzo.utilityservices.vfs.isSampleEntireFile property, and select the checkbox to enable the option.

    When isSampleEntireFile is enabled, the values in the maxSampleSize and sampleSize properties are ignored and Anzo always scans entire CSV files on import.

  5. Restart Anzo to apply the configuration changes.