Discovering your sensitive data
Overview
After connecting data to the masking service, the next step is to discover which of that data needs to be secured. This process is referred to as sensitive data discovery, or profiling, throughout the product documentation.
Once a rule set has been created (see Managing rule sets), profiling is done by running a profiling job for that rule set. A profiling job examines metadata, such as column names and types, and potentially the data itself, to determine which columns or fields contain sensitive information. When the profiler determines that a data item is sensitive, it assigns the matching domain and its associated masking algorithm to that column or field. A profiling job covers only the tables and files present in the rule set; new objects that become accessible through the connector are not discovered automatically and must be added to the rule set manually.
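The scoping rule above (only objects in the rule set are profiled, and a match assigns both a domain and its masking algorithm) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the masking service's implementation: `Column`, `run_profiling_job`, `name_detector`, the table names, and the algorithm name `FirstNameLookup` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Column:
    table: str
    name: str
    data_type: str
    domain: Optional[str] = None      # set by profiling on a match
    algorithm: Optional[str] = None   # default algorithm of the matched domain


# Hypothetical detector: returns a domain name for a sensitive column, else None.
Detector = Callable[[Column], Optional[str]]


def run_profiling_job(rule_set: list[Column], detect: Detector,
                      default_algorithms: dict[str, str]) -> list[Column]:
    """Profile only the columns listed in the rule set and return the flagged ones."""
    flagged = []
    for col in rule_set:
        domain = detect(col)
        if domain is not None:
            col.domain = domain
            col.algorithm = default_algorithms[domain]
            flagged.append(col)
    return flagged


# Everything a hypothetical connector can see, including a table added later.
connector_tables = {
    "CUSTOMER": [Column("CUSTOMER", "FIRST_NAME", "String"),
                 Column("CUSTOMER", "ID", "Number")],
    "NEW_TABLE": [Column("NEW_TABLE", "FIRST_NAME", "String")],
}

# The rule set was created before NEW_TABLE existed, so it only lists CUSTOMER.
rule_set = connector_tables["CUSTOMER"]


def name_detector(col: Column) -> Optional[str]:
    # Toy metadata check standing in for real classifier logic.
    return "FIRST_NAME" if "FIRST" in col.name.upper() else None


flagged = run_profiling_job(rule_set, name_detector, {"FIRST_NAME": "FirstNameLookup"})
print([(c.table, c.name, c.algorithm) for c in flagged])
# [('CUSTOMER', 'FIRST_NAME', 'FirstNameLookup')]

# NEW_TABLE.FIRST_NAME is never examined because it is not in the rule set.
print(connector_tables["NEW_TABLE"][0].domain)
# None
```

To have the new table profiled, it would first have to be appended to `rule_set`, mirroring the manual step described above.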
The Continuous Compliance product currently ships with the Automated Sensitive Data Discovery (ASDD) profiler. The ASDD profiler supports a wide range of logic for detecting sensitive fields, along with improved data inspection logic for databases.
Concepts
Profile set
The profile set chosen for a profiling job defines the logic used to determine which columns or fields in the rule set contain sensitive information. A profile set contains a set of classifiers that define the recognition logic for the ASDD profiler. Because each classifier is associated with a domain, the composition of the profile set determines which types of sensitive data a profiling job using that profile set can detect. Several built-in profile sets are available by default.
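To make the link between a profile set's composition and what it can detect concrete, here is a minimal Python sketch. The `Classifier` type, the classifier names, and the domain names are hypothetical; the only point illustrated is that a profiling job cannot report a domain that no classifier in the chosen profile set targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classifier:
    name: str
    framework: str   # e.g. "PATH" or "REGEX"
    domain: str      # the domain this classifier reports on a match


def detectable_domains(profile_set: list[Classifier]) -> set[str]:
    """A profiling job can only assign domains that some classifier in the set targets."""
    return {c.domain for c in profile_set}


# Two hypothetical profile sets; the second adds one classifier.
names_only = [
    Classifier("first-name-path", "PATH", "FIRST_NAME"),
    Classifier("last-name-path", "PATH", "LAST_NAME"),
]
names_and_ids = names_only + [Classifier("tax-id-regex", "REGEX", "TAX_ID")]

print(sorted(detectable_domains(names_only)))     # ['FIRST_NAME', 'LAST_NAME']
print(sorted(detectable_domains(names_and_ids)))  # ['FIRST_NAME', 'LAST_NAME', 'TAX_ID']
```

A tax ID column profiled with `names_only` would therefore go undetected, however obvious its contents.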
Domain
A domain represents a particular type of sensitive information, such as first name or tax ID number. Based on the detection logic in the profile set, a profiling job may assign a domain to a field or column in the rule set; when it does, the default masking algorithm defined for that domain is also assigned. The domain mechanism helps ensure that the same masking algorithm is applied consistently across rule sets whenever a particular type of sensitive data is discovered.
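The consistency guarantee follows from looking up the algorithm through the domain rather than choosing it per rule set. The Python sketch below assumes a hypothetical `DOMAIN_DEFAULTS` registry and `RuleSet` class; the rule set, table, and algorithm names are invented for illustration and do not reflect the product's data model.

```python
from dataclasses import dataclass, field

# Hypothetical domain registry: each domain carries exactly one default masking algorithm.
DOMAIN_DEFAULTS: dict[str, str] = {
    "FIRST_NAME": "FirstNameLookup",
    "TAX_ID": "TaxIdSecureLookup",
}


@dataclass
class RuleSet:
    name: str
    # "TABLE.COLUMN" -> (domain, masking algorithm)
    assignments: dict[str, tuple[str, str]] = field(default_factory=dict)

    def assign_domain(self, path: str, domain: str) -> None:
        # The algorithm is derived from the domain, never chosen per rule set,
        # so every rule set masks the same kind of data the same way.
        self.assignments[path] = (domain, DOMAIN_DEFAULTS[domain])


hr = RuleSet("hr_db")
crm = RuleSet("crm_db")

# Different column names in different rule sets, same discovered domain.
hr.assign_domain("EMPLOYEE.SSN", "TAX_ID")
crm.assign_domain("CLIENT.TAX_NO", "TAX_ID")

print(hr.assignments["EMPLOYEE.SSN"])   # ('TAX_ID', 'TaxIdSecureLookup')
print(crm.assignments["CLIENT.TAX_NO"])  # ('TAX_ID', 'TaxIdSecureLookup')
```

Changing the entry for `TAX_ID` in the registry would change the default for every future assignment of that domain, in every rule set.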
Classifier
A classifier defines a specific piece of logic for recognizing sensitive data. Classifiers may only be used with the ASDD profiler. Classifiers use a framework and instance model, similar to algorithms. A framework represents a particular software module for detecting sensitive information, while an instance provides the configuration for a framework and associates it with a particular domain. The pre-built ASDD Standard profile set includes a number of classifier instance definitions. It is possible to create additional instances using the API client.
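The framework and instance split can be pictured as one reusable detection routine paired with several configurations, each tied to its own domain. In this Python sketch, `path_framework`, `ClassifierInstance`, the regular expressions, and the domain names are hypothetical stand-ins; they are not the ASDD classifier API or its built-in instance definitions.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


# A "framework": reusable detection logic that knows nothing about domains.
def path_framework(config: dict[str, Any], path: str) -> bool:
    return any(re.search(p, path, re.IGNORECASE) for p in config["patterns"])


@dataclass
class ClassifierInstance:
    """An instance: a framework plus its configuration plus the domain it reports."""
    name: str
    framework: Callable[[dict[str, Any], str], bool]
    config: dict[str, Any]
    domain: str

    def classify(self, path: str) -> Optional[str]:
        return self.domain if self.framework(self.config, path) else None


# Two hypothetical instances sharing one framework, configured for different domains.
first_name = ClassifierInstance(
    "first-name-path", path_framework, {"patterns": [r"first_?name", r"fname"]}, "FIRST_NAME"
)
email = ClassifierInstance(
    "email-path", path_framework, {"patterns": [r"e_?mail"]}, "EMAIL"
)

print(first_name.classify("CUSTOMER.FIRST_NAME"))  # FIRST_NAME
print(email.classify("CUSTOMER.EMAIL_ADDR"))       # EMAIL
print(email.classify("CUSTOMER.FIRST_NAME"))       # None
```

Adding a new instance only requires new configuration; the framework code itself is unchanged, which matches the statement that additional instances, but not additional frameworks, can be created.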
The following classifier frameworks are available:
PATH - Examines the path to the data in question and applies regular expression and/or exact match logic to match domains. For databases, the path includes the table and column name.
TYPE - Uses the data type and length of a field or column to reject possible domain matches. Supported types are String, Number, Date and Binary.
REGEX - Matches the data itself using regular expressions to match or reject domains.
LIST - Checks whether data values are present in a list of values to match or reject domains.
Of these frameworks, PATH and TYPE operate at the column level, while REGEX and LIST operate at the data level. It is not currently possible to install additional classifier frameworks.
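The following Python sketch contrasts the column-level frameworks (PATH, TYPE), which look only at metadata, with the data-level frameworks (REGEX, LIST), which look at sampled values. It rests on several assumptions that are not taken from the product: the function names, the `Verdict` values, the order in which checks run, the match ratios (0.8 and 0.5), the sample values, and the `TAX_ID`/`FIRST_NAME` rules are all invented to show the idea, not how ASDD actually combines classifier results.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MATCH = "match"
    REJECT = "reject"
    ABSTAIN = "abstain"


@dataclass
class Column:
    table: str
    name: str
    data_type: str        # String, Number, Date or Binary
    length: int
    sample: list[str]     # hypothetical sample of the column's values


# Column-level: PATH inspects "TABLE.COLUMN" only.
def path_check(col: Column, patterns: list[str]) -> Verdict:
    path = f"{col.table}.{col.name}"
    hit = any(re.search(p, path, re.IGNORECASE) for p in patterns)
    return Verdict.MATCH if hit else Verdict.ABSTAIN


# Column-level: TYPE can only reject a candidate domain, never confirm one.
def type_check(col: Column, allowed_types: set[str], min_length: int) -> Verdict:
    if col.data_type not in allowed_types or col.length < min_length:
        return Verdict.REJECT
    return Verdict.ABSTAIN


# Data-level: REGEX tests each sampled value against a pattern.
def regex_check(col: Column, pattern: str, min_ratio: float) -> Verdict:
    hits = sum(1 for v in col.sample if re.fullmatch(pattern, v))
    return Verdict.MATCH if col.sample and hits / len(col.sample) >= min_ratio else Verdict.ABSTAIN


# Data-level: LIST tests each sampled value for membership in a known list.
def list_check(col: Column, values: list[str], min_ratio: float) -> Verdict:
    known = {v.lower() for v in values}
    hits = sum(1 for v in col.sample if v.lower() in known)
    return Verdict.MATCH if col.sample and hits / len(col.sample) >= min_ratio else Verdict.ABSTAIN


def classify(col: Column) -> Optional[str]:
    # Hypothetical TAX_ID rule: cheap metadata checks first, data inspection last.
    if type_check(col, {"String"}, 9) is not Verdict.REJECT:
        if path_check(col, [r"ssn", r"tax_?id"]) is Verdict.MATCH:
            return "TAX_ID"
        if regex_check(col, r"\d{3}-\d{2}-\d{4}", 0.8) is Verdict.MATCH:
            return "TAX_ID"
    # Hypothetical FIRST_NAME rule using a (tiny) value list.
    if type_check(col, {"String"}, 1) is not Verdict.REJECT:
        if list_check(col, ["alice", "bob", "carol"], 0.5) is Verdict.MATCH:
            return "FIRST_NAME"
    return None


print(classify(Column("EMP", "SSN", "String", 11, [])))                          # TAX_ID
print(classify(Column("EMP", "NOTES", "String", 20, ["123-45-6789"])))           # TAX_ID
print(classify(Column("EMP", "SSN", "Number", 9, [])))                           # None
print(classify(Column("EMP", "COL7", "String", 30, ["Alice", "Bob", "xyz"])))    # FIRST_NAME
```

The third call shows TYPE overriding a promising PATH match: a numeric column named `SSN` is rejected before its name or values are considered.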