ASDD features and support
The ASDD profiler was introduced in Continuous Compliance version 9.0.0.0 and represents the future direction for sensitive data discovery. It offers a number of advantages compared to the legacy profiler, but currently has some limitations as well.
The introduction of the ASDD profiler does not make any changes to the legacy profiler. Existing profiling jobs should continue to function as they have in the past.
The ASDD profiler currently supports the following:
Databases
Delimited files
Fixed width files
JSON files
XML files
Mainframe
ASDD features
The ASDD profiler uses classifiers rather than search and type expressions. Classifiers support more features and configuration options than expressions.
The LIST classifier framework is new and has no equivalent functionality in the legacy profiler.
The TYPE classifier framework uses standard Java SQLType values to identify data types, which should provide broad support for type detection across all database variants.
The PATH classifier supports exact matching and can be configured to consider table name in addition to column name when matching.
The REGEX classifier supports the following checksums for data-level recognition:
a. LUHN: Luhn checksum for credit card numbers.
b. MOD10_ABA: Modulus 10 checksum for ABA routing numbers (USA).
c. MOD11_NHS: Modulus 11 checksum for 10-digit NHS (National Health Service) numbers.
d. MOD11_TFN: Modulus 11 checksum for 9-digit TFNs (Australian Tax File Numbers).
e. MOD97: Modulus 97 checksum for IBANs (International Bank Account Numbers).
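For readers unfamiliar with these checksums, the sketch below shows the publicly documented form of each algorithm. It is illustrative only: the function names are hypothetical, and nothing here is taken from the ASDD profiler's own implementation, which may differ in input handling or edge cases.

```python
def _is_digits(text: str, length: int) -> bool:
    """True if text consists of exactly `length` ASCII digits."""
    return len(text) == length and text.isascii() and text.isdigit()


def luhn_valid(number: str) -> bool:
    """LUHN: double every second digit from the right; the total must be a multiple of 10."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            # Doubling, then subtracting 9 from two-digit results, sums their digits.
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def aba_valid(routing: str) -> bool:
    """MOD10_ABA: weights 3, 7, 1 repeated over 9 digits; the sum must be a multiple of 10."""
    if not _is_digits(routing, 9):
        return False
    weights = (3, 7, 1) * 3
    return sum(w * int(c) for w, c in zip(weights, routing)) % 10 == 0


def nhs_valid(number: str) -> bool:
    """MOD11_NHS: weights 10..2 over the first 9 digits; the 10th digit is the check digit."""
    if not _is_digits(number, 10):
        return False
    total = sum(w * int(c) for w, c in zip(range(10, 1, -1), number[:9]))
    check = 11 - total % 11
    if check == 11:
        check = 0
    # A computed check value of 10 means the number can never be valid.
    return check != 10 and check == int(number[9])


def tfn_valid(number: str) -> bool:
    """MOD11_TFN: fixed weights over 9 digits; the weighted sum must be a multiple of 11."""
    if not _is_digits(number, 9):
        return False
    weights = (1, 4, 3, 7, 5, 8, 6, 9, 10)
    return sum(w * int(c) for w, c in zip(weights, number)) % 11 == 0


def iban_valid(iban: str) -> bool:
    """MOD97: move the first 4 characters to the end, map A..Z to 10..35, then n % 97 == 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(c, 36) maps '0'..'9' to 0..9 and 'A'..'Z' to 10..35.
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1
```

A checksum test of this kind is typically paired with a pattern match, so that a value which merely looks like a card or account number is only counted when its check digits also agree. The IBAN sketch does not verify country-specific lengths.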
The ASDD profiler provides better matching when the number of rows in a table is less than the target number of rows for profiling, and in general provides more nuanced confidence values in profiling results.
The ASDD profiler attempts to retrieve more data values when a large fraction of the data values for a column are null or empty. The threshold that triggers an additional query is controlled by the application setting ASDD/DefaultNullFilterThreshold. For more information, see the Application settings section of the Masking API client page.
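The null-filter decision can be pictured with a minimal sketch. The function names are hypothetical, and two details are assumptions made for illustration: that the threshold is expressed as a fraction between 0 and 1, and that the additional query is triggered when the null or empty fraction strictly exceeds it. The documentation does not state either detail.

```python
def null_fraction(values):
    """Fraction of values that are None or the empty string."""
    if not values:
        return 0.0
    empty = sum(1 for v in values if v is None or v == "")
    return empty / len(values)


def needs_additional_query(values, threshold):
    """Assumed rule: re-query when the null/empty fraction exceeds the threshold."""
    return null_fraction(values) > threshold
```

Under these assumptions, a column sample in which three of four values are null or empty would trigger a second query at a threshold of 0.5, while a sample with one null in four would not.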
The ASDD profiler supports statistical sampling for Oracle, SQL Server, and SAP ASE databases, so the data sampled will better reflect the full range of values for each column across the entire table.
Sample percentage
When data sampling is employed, the sample percentage is always set to 1%. If this percentage does not yield enough rows, the query is performed again without sampling.
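The sample-then-fall-back behavior can be sketched as follows. This is an illustration under stated assumptions, not the profiler's code: `fetch_profile_rows`, `run_query`, and `make_fake_query` are hypothetical names, "enough rows" is assumed to mean the target row count for profiling, and the fake query stands in for a real database call.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SAMPLE_PERCENT = 1  # the sample percentage is fixed at 1%


def fetch_profile_rows(
    run_query: Callable[[Optional[int]], Sequence],
    target_rows: int,
) -> Sequence:
    """Query with 1% sampling first; rerun without sampling if too few rows come back."""
    rows = run_query(SAMPLE_PERCENT)
    if len(rows) < target_rows:
        rows = run_query(None)  # None means "no sampling" in this sketch
    return rows


def make_fake_query(table: List, limit: int) -> Tuple[Callable, List]:
    """Stand-in for a database: samples by taking evenly spaced rows, capped at `limit`."""
    calls: List[Optional[int]] = []

    def run_query(sample_percent: Optional[int]) -> List:
        calls.append(sample_percent)
        if sample_percent is None:
            return table[:limit]
        step = 100 // sample_percent
        return table[::step][:limit]

    return run_query, calls
```

With a 500-row table and a target of 100 rows, the sampled query returns only 5 rows, so the unsampled query runs as well. With a 50,000-row table, the sampled query alone returns enough rows.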
The ASDD Standard profile set contains data-level logic by default, allowing some columns containing sensitive information to be identified even if the column names are not meaningful.
New or improved REGEX classifiers for Zip Code and Email Address domains.
New LIST classifiers are present for First and Last Name, Full Name, US City, US State, and Country domains.
Classifiers and profile sets using them may be exported and imported using the Engine Sync feature. Classifiers are included when the Export Settings action is performed from the Environments tab.
Prior to the 22.0.0.0 release, the assignment threshold for an ASDD profile set was determined exclusively by the ASDD application setting DefaultAssignmentThreshold. The assignment threshold can now be set for each profile set individually. See Configuring profile sets for more information.
ASDD limitations
The primary limitation of the ASDD profiler is that it is not yet supported for all connectors. The UI reports an error if the user attempts to save a job that uses an ASDD profile set with an unsupported connector.
Currently, the following limitations apply to the ASDD profiler:
Profiling XML or JSON documents stored in database columns is not supported.
Discovery may fail or produce lower-quality results for some extended connectors due to known issues:
The SQL syntax used for column truncation is not compatible with all database variants. This is known to cause failures in discovery jobs for the Informix database.