
Discovering your sensitive data

Overview

After connecting data to the masking service, the next step is to discover which data should be secured. This process is referred to as sensitive data discovery, or profiling, throughout the product documentation.

Once a rule set has been created, profiling is done by running a profiling job for that rule set (see Managing rule sets). A profiling job examines the metadata, such as column names and types, and potentially the data itself, to determine which columns or fields contain sensitive information. When the profiler determines that a data item is sensitive, it assigns the matching domain and its associated masking algorithm to the column or field. A profiling job covers only the tables and files present in the rule set; new objects that become accessible through the defined connector are not discovered and must be added to the rule set manually.

The Continuous Compliance product currently ships with two distinct profiling implementations: the new Automated Sensitive Data Discovery (ASDD) profiler and the legacy profiler. The content of the profile set determines which implementation is used when a profiling job is run. The ASDD profiler supports a wider range of logic for detecting sensitive fields and improved data inspection logic for databases. However, at this time, ASDD profiling is limited to specific database variants.

Concepts

Profile set

The profile set chosen defines the logic that will be used to determine which columns or fields in the rule set contain sensitive information. A profile set may contain either a set of search expressions and type expressions, or a set of classifiers, which define the recognition logic for the legacy or ASDD profiler, respectively. Because each expression or classifier is associated with a domain, the composition of the profile set determines which types of sensitive data may be detected by a profiling job that uses that profile set. Several built-in profile sets are available by default.

Domain

A domain represents a particular type of sensitive information, such as first name or tax ID number. Based on the detection logic in the profile set, a profiling job may assign a domain to a particular field or column in the rule set; when this occurs, the default masking algorithm defined for that domain is also assigned. The domain mechanism helps to ensure that the same masking algorithm is applied consistently across rule sets whenever a particular type of sensitive data is discovered.
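The relationship between a domain and its default algorithm can be pictured with a small sketch. This is an illustration only, not the product's implementation: the `Column` class, the `assign_domain` helper, and the algorithm names in `DEFAULT_ALGORITHM` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of domains and their default masking algorithms.
# The algorithm names are placeholders, not real product identifiers.
DEFAULT_ALGORITHM = {
    "FIRST_NAME": "HypotheticalFirstNameAlgorithm",
    "TAX_ID": "HypotheticalTaxIdAlgorithm",
}


@dataclass
class Column:
    """A column or field in a rule set (illustrative model)."""
    name: str
    domain: Optional[str] = None
    algorithm: Optional[str] = None


def assign_domain(column: Column, domain: str) -> Column:
    """Assign a domain and, with it, that domain's default algorithm."""
    column.domain = domain
    column.algorithm = DEFAULT_ALGORITHM[domain]
    return column
```

Because the algorithm is looked up from the domain rather than chosen per column, two differently named columns that receive the same domain end up with the same algorithm.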

Level - column or data

The term Level is used for search expressions to indicate whether the data itself is examined or profiling is based only on the field or column name and type. Examining the data is more time-consuming than examining metadata alone, as the profiling job must retrieve data from the data source.

Classifier

A classifier defines a specific piece of logic for recognizing sensitive data. Classifiers may only be used with the ASDD profiler. Classifiers use a framework and instance model, similar to algorithms. A framework represents a particular software module for detecting sensitive information, while an instance provides the configuration for a framework and associates it with a particular domain. The pre-built ASDD Standard profile set includes a number of classifier instance definitions. It is possible to create additional instances using the API client.

The following classifier frameworks are available:

  • PATH - Examines the path to the data in question and applies regular expression and/or exact match logic to match domains. For databases, the path includes the table and column name.

  • TYPE - Uses the data type and length of a field or column to reject possible domain matches. Supported types are String, Number, Date and Binary.

  • REGEX - Matches the data itself using regular expressions to match or reject domains.

  • LIST - Checks whether data values are present in a list of values to match or reject domains.

Of these frameworks, PATH and TYPE operate at the column level, while REGEX and LIST operate at the data level. It is not currently possible to install additional classifier frameworks.
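As a rough illustration of the kind of check each framework performs, the sketch below models them as plain functions. It is not the ASDD implementation: the function names, the ratio-based scoring, and the minimum-length rule for TYPE are assumptions made for the example.

```python
import re
from typing import Dict, Iterable, List, Set


def path_matches(path: str, pattern: str) -> bool:
    """PATH-style check (column level): regex against the table/column path."""
    return re.search(pattern, path, re.IGNORECASE) is not None


def type_allows(data_type: str, length: int, allowed: Dict[str, int]) -> bool:
    """TYPE-style check (column level): reject unless the type is listed
    and the length meets the assumed minimum for that type."""
    min_length = allowed.get(data_type)
    return min_length is not None and length >= min_length


def regex_match_ratio(values: List[str], pattern: str) -> float:
    """REGEX-style check (data level): share of values fully matching."""
    if not values:
        return 0.0
    compiled = re.compile(pattern)
    return sum(1 for v in values if compiled.fullmatch(v)) / len(values)


def list_match_ratio(values: List[str], known: Set[str]) -> float:
    """LIST-style check (data level): share of values found in a list.
    `known` is expected to hold lowercase entries."""
    if not values:
        return 0.0
    return sum(1 for v in values if v.lower() in known) / len(values)
```

The first two functions need only metadata, while the last two need sampled data values, which mirrors the column-level and data-level split described above.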

Search expression

A search expression defines a regular expression (regex) that is used to match data to a domain. How the regex is applied depends on the level chosen: column-level expressions are matched against the field or column name, while data-level expressions are matched against the data values themselves. Every built-in legacy profile set includes a number of column-level search expressions designed to identify common sensitive data types (SSN, name, address, etc.). The pre-built profile sets do not include any data-level expressions by default; however, some data-level expressions are provided outside of any profile set and may be added to user-created profile sets. You can also create additional search expressions.
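To make the effect of the level setting concrete, here is a minimal sketch of how a search expression might be evaluated. The `SearchExpression` class, the `matches` function, and the example patterns are hypothetical and do not reflect the product's built-in expressions or its matching rules.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchExpression:
    """Illustrative model of a search expression."""
    domain: str
    regex: str
    level: str  # "column" or "data"


def matches(expr: SearchExpression, column_name: str,
            sample_values: List[str]) -> bool:
    """Apply the regex to the column name or to the data, per the level."""
    pattern = re.compile(expr.regex, re.IGNORECASE)
    if expr.level == "column":
        # Column level: only metadata is inspected; data is never read.
        return pattern.search(column_name) is not None
    # Data level: values retrieved from the data source are inspected.
    return any(pattern.search(v) is not None for v in sample_values)
```

A column-level expression such as `ssn|social` would flag a column named `CUST_SSN` without reading any rows, whereas a data-level expression such as `^\d{3}-\d{2}-\d{4}$` can flag a column named `NOTES` only if sampled values match.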

Type expression

A type expression defines a constraint limiting matches for a particular domain to a particular set of data types, with an optional minimum length for each type. For example, matches for the FIRST_NAME domain may be limited to only string columns with a length of 8 characters or more. Supported types are STRING, NUMBER, DATE, and BINARY. The Standard profile set includes type expressions for most domains, and more may be created if desired.
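The FIRST_NAME example above can be sketched as a simple constraint check. The dictionary layout and the `type_expression_allows` function are assumptions for illustration; only the type names and the 8-character example come from the text.

```python
from typing import Dict, Optional

# Allowed data types for a domain, each with an optional minimum length.
# None means "no minimum length" for that type.
FIRST_NAME_TYPES: Dict[str, Optional[int]] = {"STRING": 8}


def type_expression_allows(constraints: Dict[str, Optional[int]],
                           data_type: str, length: int) -> bool:
    """Return True if a column of this type and length may match the domain."""
    if data_type not in constraints:
        return False  # type is not permitted for this domain
    min_length = constraints[data_type]
    return min_length is None or length >= min_length
```

With these constraints, a STRING column of length 20 remains a candidate for FIRST_NAME, while a STRING column of length 4 or a NUMBER column of any length is excluded.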

