The ASDD profiler was introduced in Continuous Compliance version 9.0 and represents the future direction for sensitive data discovery. It offers several advantages over the legacy profiler, but currently has some limitations as well.
The introduction of the ASDD profiler does not change the legacy profiler. Existing profiling jobs should continue to function as before.
The ASDD profiler currently supports the following:
JSON files (not supported by the legacy profiler)
The ASDD profiler uses classifiers rather than search and type expressions. Classifiers support more features and configuration options than expressions.
The LIST classifier framework is new and has no equivalent functionality in the legacy profiler.
The TYPE classifier framework uses standard Java SQLType values to identify data types, which should provide broad support for type detection across all database variants.
The PATH classifier supports exact matching and can be configured to consider the table name in addition to the column name when matching.
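To illustrate the difference between exact matching on the column name alone and matching that also considers the table name, here is a minimal Python sketch. The function `path_matches` and its parameters are hypothetical, and case-insensitive comparison is an assumption; this is not the profiler's actual PATH classifier configuration.

```python
from typing import Optional


def path_matches(pattern_column: str,
                 column: str,
                 pattern_table: Optional[str] = None,
                 table: Optional[str] = None) -> bool:
    """Hypothetical PATH-style check: exact column-name match, plus an
    exact table-name match when a table pattern is configured.

    Case-insensitive comparison is an assumption of this sketch.
    """
    if column.casefold() != pattern_column.casefold():
        return False  # exact matching: "SSN_HASH" does not match "SSN"
    if pattern_table is None:
        return True   # table name not considered
    return table is not None and table.casefold() == pattern_table.casefold()
```

With a table pattern configured, a column named `SSN` in `EMPLOYEE` matches while the same column name in `CUSTOMER` does not.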
The REGEX classifier supports the following checksums for data-level recognition:
a. LUHN: Luhn checksum for credit card numbers.
b. MOD10_ABA: Modulus 10 checksum for ABA routing numbers (USA).
c. MOD11_NHS: Modulus 11 checksum for 10-digit NHS (National Health Service) numbers.
d. MOD11_TFN: Modulus 11 checksum for 9-digit TFNs (Australian Tax File Numbers).
e. MOD97: Modulus 97 checksum for IBANs (International Bank Account Numbers).
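The checksums listed above are public, well-known validation schemes. As a point of reference, the Python sketch below implements each one independently; the function names are hypothetical and this is not the profiler's own code. In a REGEX classifier, a checksum of this kind would presumably be applied after a value matches the pattern, to discard values that have the right shape but fail validation (an assumption about how the two are combined).

```python
def _digits(value: str) -> str:
    """Strip spaces and hyphens; return '' if anything other than ASCII digits remains."""
    s = value.replace(" ", "").replace("-", "")
    return s if s.isascii() and s.isdigit() else ""


def luhn_valid(value: str) -> bool:
    """Luhn: double every second digit from the right, sum, check mod 10."""
    s = _digits(value)
    if not s:
        return False
    total = 0
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def aba_valid(value: str) -> bool:
    """ABA routing number: weights 3, 7, 1 repeated; weighted sum mod 10 == 0."""
    s = _digits(value)
    if len(s) != 9:
        return False
    weights = [3, 7, 1] * 3
    return sum(int(c) * w for c, w in zip(s, weights)) % 10 == 0


def nhs_valid(value: str) -> bool:
    """NHS number: weights 10..2 on the first 9 digits; check = 11 - (sum mod 11)."""
    s = _digits(value)
    if len(s) != 10:
        return False
    total = sum(int(c) * w for c, w in zip(s[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10:
        return False  # no valid check digit exists for this prefix
    return check == int(s[9])


def tfn_valid(value: str) -> bool:
    """Australian TFN: weights 1, 4, 3, 7, 5, 8, 6, 9, 10; weighted sum mod 11 == 0."""
    s = _digits(value)
    if len(s) != 9:
        return False
    weights = [1, 4, 3, 7, 5, 8, 6, 9, 10]
    return sum(int(c) * w for c, w in zip(s, weights)) % 11 == 0


def iban_valid(value: str) -> bool:
    """IBAN: move the first 4 chars to the end, map A..Z to 10..35, mod 97 == 1."""
    s = value.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rotated = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rotated)
    return int(numeric) % 97 == 1
```

For example, `4111 1111 1111 1111` passes the Luhn check, while changing its last digit makes it fail.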
The ASDD profiler provides better matching when a table has fewer rows than the target number of rows for profiling, and in general provides more nuanced confidence values in profiling results.
The ASDD profiler attempts to retrieve more data values when a large fraction of the data values for a column are null or empty. The threshold to trigger an additional query is controlled by the application setting ASDD/DefaultNullFilterThreshold.
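The refetch behavior can be sketched as follows. This is a minimal Python illustration under stated assumptions: the threshold is treated as a fraction between 0 and 1, blank strings count as empty, and the two fetch callables are hypothetical stand-ins for the profiler's queries. The actual unit and default value of ASDD/DefaultNullFilterThreshold are not described here.

```python
from typing import Callable, List, Optional, Sequence

Value = Optional[str]


def is_null_or_empty(v: Value) -> bool:
    """Treat None and blank strings as missing (blank-as-empty is an assumption)."""
    return v is None or v.strip() == ""


def null_fraction(values: Sequence[Value]) -> float:
    """Fraction of sampled values that are null or empty (0.0 for an empty sample)."""
    if not values:
        return 0.0
    return sum(is_null_or_empty(v) for v in values) / len(values)


def collect_values(fetch_sample: Callable[[], List[Value]],
                   fetch_non_null: Callable[[], List[Value]],
                   threshold: float) -> List[Value]:
    """Run the normal sample query; if too many values are missing, issue one
    additional (hypothetical) query that asks only for non-null, non-empty values."""
    values = fetch_sample()
    if null_fraction(values) >= threshold:
        values = fetch_non_null()
    return values
```

Keeping the queries behind callables leaves the decision rule testable without a database connection.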
The ASDD profiler supports statistical sampling for Oracle, SQL Server and SAP ASE databases, so that the data sampled will better reflect the full range of values for each column across the entire table.
When data sampling is employed, the sample percentage is always set to 1%; if this percentage does not yield enough rows, the query is performed again without sampling.
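The sample-then-fallback logic described above can be sketched in Python. The query text uses Oracle's `SAMPLE (percent)` clause and SQL Server's `TABLESAMPLE (percent PERCENT)` clause for illustration only; it is not necessarily the SQL the profiler generates, SAP ASE is not covered, and the function names are hypothetical.

```python
from typing import Callable, List

SAMPLE_PERCENT = 1  # fixed sample percentage described above


def sample_clause(dialect: str, percent: int = SAMPLE_PERCENT) -> str:
    """Illustrative table-sampling syntax; not necessarily what the profiler emits."""
    if dialect == "oracle":
        return f"SAMPLE ({percent})"
    if dialect == "sqlserver":
        return f"TABLESAMPLE ({percent} PERCENT)"
    raise ValueError(f"no sampling clause sketched for dialect {dialect!r}")


def build_query(column: str, table: str, dialect: str, sampled: bool) -> str:
    query = f"SELECT {column} FROM {table}"
    if sampled:
        query += " " + sample_clause(dialect)
    return query


def fetch_rows(run_query: Callable[[str], List[str]],
               column: str, table: str, dialect: str,
               target_rows: int) -> List[str]:
    """Try a 1% sample first; if it returns too few rows, rerun without sampling."""
    rows = run_query(build_query(column, table, dialect, sampled=True))
    if len(rows) < target_rows:
        rows = run_query(build_query(column, table, dialect, sampled=False))
    return rows
```

A production version would also need to quote identifiers and cap the number of rows returned; both are omitted to keep the fallback rule visible.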
The ASDD Standard profile set contains data-level logic by default, allowing some columns containing sensitive information to be identified even if the column names are not meaningful.
New or improved REGEX classifiers are present for the Zip Code and Email Address domains.
New LIST classifiers are present for First and Last Name, Full Name, US City, US State, and Country domains.
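A LIST classifier, as its name and the domains above suggest, compares column values against a list of known values. The sketch below shows one plausible way to turn such comparisons into a match fraction; the function names, the case- and whitespace-insensitive normalization, and the 0.5 cutoff are all assumptions for illustration, and the profiler's actual scoring and confidence calculation may differ.

```python
from typing import Iterable, Optional


def list_match_fraction(values: Iterable[Optional[str]],
                        reference: Iterable[str]) -> float:
    """Fraction of non-empty values found in the reference list
    (case- and whitespace-insensitive comparison is an assumption)."""
    ref = {r.strip().casefold() for r in reference}
    candidates = [v.strip().casefold() for v in values if v is not None and v.strip()]
    if not candidates:
        return 0.0
    hits = sum(1 for v in candidates if v in ref)
    return hits / len(candidates)


def list_classify(values: Iterable[Optional[str]],
                  reference: Iterable[str],
                  min_fraction: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the column if enough values are on the list."""
    return list_match_fraction(values, reference) >= min_fraction
```

Scoring by fraction rather than requiring every value to match tolerates abbreviations and typos such as `CA` in a US State column.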
Classifiers and profile sets using them may be exported and imported using the Engine Sync feature. Classifiers are included when the Export Settings action is performed from the Environments tab.
The primary limitation of the ASDD Profiler is that it is not yet supported for all connectors. The UI will report an error if the user attempts to save a job using an ASDD profile set with an unsupported connector.
Currently, the following limitations apply to the ASDD Profiler:
XML, fixed-width, and mainframe profiling are not supported.
Discovery may fail or produce lower-quality results for some extended connectors due to known issues:
The SQL syntax used for column truncation is not compatible with all database variants. This is known to cause failures in discovery jobs for the Informix database.