Creating a profiling job

This section describes how to create a profiling job. You can create profiling jobs for database, XML, mainframe, delimited, and fixed-width file rule sets. It is not currently possible to profile XML or JSON documents stored in database columns.

When a profiling job runs, it applies all of the recognition logic in the selected profile set to each data element in the rule set. Several application settings also influence profiler behavior; refer to the Profile group settings section of this article.

The profiler assigns each sensitive data element to a domain, and each domain has a default masking algorithm. You can then update the algorithm assignments manually in the inventory as needed to finalize the masking rule sets for your data sources.
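
The profiling pass described above can be pictured roughly as follows. This is a minimal sketch, not the product's implementation: it assumes each Profiler Expression is a regular expression tested against a column name, and the class, domain, and algorithm names (`ProfilerExpression`, `FULL_NAME`, `FirstNameLookup`, and so on) are hypothetical. The `check_all` flag models the Multiple Profiler Expression Check job option, and `MULTI_MATCH_ALGORITHM` is a placeholder for the algorithm named by the DefaultMultiphiAlgorithm application setting.

```python
import re
from dataclasses import dataclass


@dataclass
class ProfilerExpression:
    pattern: str  # regular expression matched against the column name (assumption)
    domain: str   # domain assigned when the pattern matches


# Hypothetical profile set and domain -> default algorithm mapping.
EXPRESSIONS = [
    ProfilerExpression(r"first_?name|fname", "FIRST_NAME"),
    ProfilerExpression(r"name$", "FULL_NAME"),
]
DEFAULT_ALGORITHMS = {"FIRST_NAME": "FirstNameLookup", "FULL_NAME": "FullNameLookup"}
# Placeholder for the algorithm configured in DefaultMultiphiAlgorithm.
MULTI_MATCH_ALGORITHM = "MultiMatchPlaceholder"


def profile_column(column_name, expressions=EXPRESSIONS, check_all=False):
    """Return (matched_domains, algorithm) for one column; ([], None) if nothing matches."""
    matched = []
    for expr in expressions:
        if re.search(expr.pattern, column_name, re.IGNORECASE):
            matched.append(expr.domain)
            if not check_all:
                break  # default: stop testing after the first matching expression
    if not matched:
        return [], None
    if len(matched) == 1:
        # A single match assigns the domain's default masking algorithm.
        return matched, DEFAULT_ALGORITHMS[matched[0]]
    # Several matches: use the multi-match algorithm instead.
    return matched, MULTI_MATCH_ALGORITHM


print(profile_column("FIRST_NAME"))                  # (['FIRST_NAME'], 'FirstNameLookup')
print(profile_column("FIRST_NAME", check_all=True))  # (['FIRST_NAME', 'FULL_NAME'], 'MultiMatchPlaceholder')
print(profile_column("ORDER_ID"))                    # ([], None)
```

Stopping at the first match keeps the default pass cheap; checking every expression costs more but surfaces columns that plausibly belong to more than one domain.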

Column and Field Priority

If you wish to prevent the profiler from updating the domain and algorithm assignments for a particular column or file field, set the Priority value for the column or field to USER.
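
The effect of the USER priority can be sketched as a guard applied before the profiler writes its result. This is an illustrative model only: `InventoryColumn` and `apply_profiler_result` are hypothetical names, and `"AUTO"` is a placeholder for any priority value other than USER.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryColumn:
    name: str
    priority: str               # "USER" locks the assignment; "AUTO" is a hypothetical placeholder
    domain: Optional[str] = None
    algorithm: Optional[str] = None


def apply_profiler_result(column, domain, algorithm):
    """Write the profiler's assignment unless the column's Priority is USER.

    Returns True if the column was updated.
    """
    if column.priority == "USER":
        return False  # the manual assignment is preserved
    column.domain = domain
    column.algorithm = algorithm
    return True


locked = InventoryColumn("SSN", "USER", "SSN", "CustomSsnAlgorithm")
open_col = InventoryColumn("EMAIL", "AUTO")
print(apply_profiler_result(locked, "ACCOUNT_NO", "AccountLookup"))  # False
print(locked.domain)                                                 # SSN
print(apply_profiler_result(open_col, "EMAIL", "EmailLookup"))       # True
```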

Creating a new profiling job

Profiling jobs are listed, together with masking jobs, on the Overview page of the environment they belong to. To open the Overview page, click an environment; the Overview tab displays by default.

To create a new profiling job:

Click the Profile button at the top of the page.
The Create Profiling Job window appears.
You will be prompted for the following information:
- Job Name: A free-form name for the job. The name must be unique.
- Multi Tenant: Check this box if the job is for a multi-tenant database. This option allows existing rule sets to be re-used across identical schemas accessed through different connectors. The connector is selected when the job is executed.
- Rule Set: Select the rule set that this job will profile.
- No. of Streams: The number of parallel streams to use when running the job. For example, selecting two streams lets the job profile two tables in the rule set concurrently instead of one table at a time.
- Min Memory (MB) (optional): Minimum amount of memory to allocate for the job, in megabytes.
- Max Memory (MB) (optional): Maximum amount of memory to allocate for the job, in megabytes. When an ASDD profile set is selected, the max memory for the job must be at least 1024 MB per stream. For example, if No. of Streams is 4, this value must be 4096 or higher.
- Feedback Size (optional): The number of rows to process before writing a progress message to the logs. Choose a value that gives the level of detail you need to monitor the job. For example, if you set this number significantly higher than the number of rows the job processes, the job's progress shows only 0% or 100%.
- Multiple Profiler Expression Check: By default, the profiler stops testing Profiler Expressions on a column or data value after the first expression matches. Check this box if the job should check all Profiler Expressions. If multiple Profiler Expressions match, the Profiler report will indicate multiple matches and the algorithm specified by the DefaultMultiphiAlgorithm application setting will be assigned. This setting applies to both the legacy and ASDD profilers.
- Clear previous auto domain assignments: When checked, the job clears the domain assignments made by previous runs for columns that have 'Enable Automatic Updates' enabled.
- Profile Sets: The name of the Profile Set to use. A Profile Set is a set of Profiler Expressions (for example, a set of financial expressions) or classifiers. The selected profile set determines whether the legacy or ASDD profiler runs. If the ASDD profiler does not support the current data source, selecting an ASDD profile set results in an error, and you must select a different profile set. Refer to this section for information regarding which connectors are supported by ASDD.
- Comments (optional): Add comments related to this job.
- Email (optional): E-mail addresses to which status messages are sent. Separate multiple addresses with a comma (,).
When you are finished, click Save.
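
Two of the settings above follow simple arithmetic: the ASDD memory floor and the effect of Feedback Size on progress reporting. The sketch below is illustrative only. The function names are hypothetical, and `reported_progress` assumes a simplified model in which one progress message is logged every Feedback Size rows; it is not the engine's logging code.

```python
ASDD_MIN_MB_PER_STREAM = 1024  # documented minimum per stream for ASDD profile sets


def min_asdd_max_memory_mb(streams):
    """Smallest Max Memory (MB) accepted for an ASDD job with this many streams."""
    return ASDD_MIN_MB_PER_STREAM * streams


def asdd_memory_ok(streams, max_memory_mb):
    """True if max_memory_mb meets the 1024 MB-per-stream ASDD requirement."""
    return max_memory_mb >= min_asdd_max_memory_mb(streams)


def reported_progress(total_rows, feedback_size):
    """Progress percentages logged if one message is written every feedback_size rows.

    Simplified model (an assumption): start at 0%, log after each full batch, end at 100%.
    """
    points = [0]
    for processed in range(feedback_size, total_rows, feedback_size):
        points.append(100 * processed // total_rows)
    points.append(100)
    return points


print(min_asdd_max_memory_mb(4))      # 4096
print(asdd_memory_ok(4, 4000))        # False
print(reported_progress(1000, 250))   # [0, 25, 50, 75, 100]
print(reported_progress(1000, 5000))  # [0, 100]
```

Under this model, any Feedback Size at or above the row count leaves only the 0% and 100% endpoints, which matches the behavior described for Feedback Size.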