Delphix masking terminology
Before getting started with the Continuous Compliance Engine, an overview of universal terms and concepts will help unify your understanding of how the different masking components come together. The following provides a brief overview of the key concepts within the masking service.
Products
Term | Definition |
---|---|
Delphix Continuous Data | Delphix Continuous Data is a Delphix product to deliver data on-demand to application developers and testers. Running as a virtual appliance, it is sometimes referred to as a Data Engine. |
Delphix Continuous Data with Elastic Data | Elastic Data is a storage feature of Delphix Continuous Data that allows optimal data management to minimize costs through the use of block and elastic storage. |
Delphix Continuous Compliance | Delphix Continuous Compliance is a Delphix product for discovering sensitive data and replacing it with realistic but fictitious data. Running as a virtual appliance, it is sometimes referred to as a Compliance Engine. |
Data Control Tower | Data Control Tower (DCT) is a Delphix product that provides a data mesh to unify data governance, automation, and compliance across all applications and cloud platforms. |
Interfaces
Term | Definition |
---|---|
Delphix Setup | Delphix Setup is the Delphix Continuous Data's user interface for system administrators to configure their engine settings, such as storage, support bundle, authentication, and network configurations. |
Delphix Management | Delphix Management is Delphix Continuous Data's user interface for product administrators to manage their virtualized or masked datasets. |
Delphix Self-Service | Delphix Self-Service is the Delphix Continuous Data user interface designed specifically for project teams, application developers, and testers to manage their virtualized datasets. |
Command Line Interface (CLI) | Command Line Interface (CLI) is the engine's terminal interface, which allows users to perform various administrative commands. |
Delphix Download Portal | Delphix Download Portal is the location where users download Delphix's products. |
Delphix Support Portal | Delphix Support Portal is the location where users receive support for Delphix's products. |
Core concepts
Term | Definition |
---|---|
Virtualization | Virtualization describes the capability of producing a functioning database or filesystem copy that is lightweight and ephemeral. |
Masking | Masking describes the capability of iterating through a dataset to identify all sensitive fields and replace them with desensitized values to eliminate risk in lower environments. |
Data Source | Data Source is a database or unstructured files located in a user's environment. It generally describes ingestion sources and is typically located in a Source or Staging Environment. |
Dataset | Dataset is an instance of any collection of data, such as VDB, dSource, vFiles, data source, or database. The dataset may or may not be managed by Delphix Continuous Data. |
Data Source Connectors
Term | Definition |
---|---|
Connector | Connector refers to Delphix Continuous Data's data source connection mechanism. A connector enables Delphix Continuous Data functionality with a specific data source system or DBMS. See other connector types for specific details. |
Standard Connector | Standard Connectors are connectors that are built and supported by Delphix. They are included for free with a Delphix Continuous Data License Agreement. |
Select Connector | Select Connectors are connectors that are built and supported by Delphix but require a separate License Agreement. |
Premium Connector | Premium Connectors are connectors that are built and supported by a third party. They often require a separate License Agreement. |
Plugin | Plugin is the software delivery framework for many Connectors. The user must upload the plugin into Delphix Continuous Data to install the associated Connector. |
Continuous Compliance
Term | Definition |
---|---|
Application | An Application is a tag that is assigned to one or more environments. We recommend using an application name that is the same as the application associated with the environments. |
Connector | Connector refers to Delphix Continuous Data's data source connection mechanism. A connector enables Delphix Continuous Data functionality with a specific data source system or DBMS. See other connector types for specific details. |
Domain | A domain represents a correlation between a sensitive data category (for example, Social Security numbers) and the way it should be secured. |
Environment | An environment is a construct that can be used to describe a collection of masking jobs associated with a group of data sources. |
dSource | dSource is the copy of a source database's persistent data layer that Delphix Continuous Data uses to create and update virtual databases (VDBs). Based on the ingestion model and data source type, the dSource may be exposed through a mount point and interact with a Staging Database Instance, Database, or Files. Consult the data source connector documentation for specific details. |
Virtual Database (VDB) | Virtual Database (VDB) is a full read-and-write copy of the source data that is provisioned from either a dSource or another VDB. A VDB is provisioned and managed by Delphix Continuous Data. |
Timeflow | Timeflow describes the timeline of data of a virtual database or dSource. |
Snapshot | A snapshot represents the state of a dataset at a specific moment in time. Snapshots are used to create or refresh the same or another timeline. |
Hooks | Hooks are mechanisms that allow the execution of custom operations at specific points in various processes like linking, provisioning, and managing virtual datasets. |
Delphix Connector | Delphix Connector is a service that runs on the Windows Staging and Target Environments to enable communication with Delphix Continuous Data. The Delphix Connector should not be confused with Data Source Connectors. |
Ruleset | A rule set is a group of tables or flat files within a particular data source that a user may choose to run profile, masking, or tokenization jobs on. |
Source Environment | Source Environment is the "production" or "golden" environment that contains the ideal database the user would like to virtualize in Delphix Continuous Data. |
Target Environment | Target Environment is the configured infrastructure, with available database binaries, in which virtual database copies will be hosted. |
Host | Host is a single server within the environment collection. An environment is considered to be one or more hosts. For example, a RAC environment contains multiple hosts. |
Masking algorithms
The following terminology relates to the different algorithms that users may use to secure their data.
Term | Definition |
---|---|
Algorithm Framework | A type of masking algorithm. One or more usable instances of an algorithm framework may be created. For example, "FIRST NAME SL" is an instance of the Secure Lookup algorithm framework. |
Algorithm Instance | A named combination of algorithm framework and configuration values. Algorithm instances are applied to data fields and columns in the inventory in order to mask data. |
Built-in Algorithm | An algorithm instance or framework included with the Masking Engine software. This includes several built-in algorithm instances that provide masking behavior that doesn't correspond to any built-in algorithm framework. |
Non-conformant Data | Some masking algorithms require data to be in a particular format. The required format may vary by the configuration of the algorithm instance. For example, a particular Segment Mapping algorithm might be configured to expect a 10-digit number. Data that doesn't fit the pattern expected by an algorithm is called nonconforming data or non-conformant data. By default, non-conformant data is not masked, and warnings are recorded for the masking job. Warnings are indicated by a yellow triangle warning marker next to the job execution on the Environment and Job Monitor pages. Whether non-conformant data results in a warning or failure is configurable for each algorithm instance. |
Collision | The term collision describes the case where a masking algorithm masks two or more unique input values to the same output value. For example, a first name Secure Lookup algorithm might mask both "Amy" and "Jane" to the same masked value "Beth". This may be desirable, in the sense that it further obfuscates the original data; however, collisions are problematic for data columns with uniqueness constraints. |
Secure Lookup | The most commonly used algorithm framework. Secure lookup works by replacing each data value with a new value chosen from an input file. Replacement values are chosen based on a cryptographic hash of the original value, so masking output is consistent for each input (a minimal sketch follows this table). Secure lookup algorithms are easy to configure and work with different languages. When this algorithm replaces real data with fictional data, collisions, described above, are possible. Because many types of data, such as first or last name, address, etc., are not unique in real data, this is often acceptable. However, if unique masking output for each unique input is required, consider using a mapping or segment mapping algorithm, described below. |
Segment Mapping | This algorithm permutes short numeric or alphanumeric values to other values of the same format. This algorithm is guaranteed not to produce collisions, so long as the set of permissible mask values is at least as large as the input or "real" set (a sketch of this idea also follows this table). The maximum number of digits or characters in the masked value is 36. You might use this method if you need columns with unique values, such as Social Security Numbers, primary key columns, or foreign key columns. |
Mapping | Similar to secure lookup, a mapping algorithm allows you to provide a set of values that will replace the original data. There will be no collisions in the masked data, because each input is always matched to the same output, and each output value is only assigned to one input value. In order to accomplish this, the algorithm records, in an encrypted format, all known input to output mappings. You can use a mapping algorithm on any set of values, of any length, but you must know how many values you plan to mask, and provide a set of unique replacement values sufficient to replace each unique input value. NOTE: When you use a mapping algorithm, you cannot mask more than one table at a time. You must mask tables serially. |
Binary Lookup | Replaces objects that appear in object columns. For example, if a bank has an object column that stores images of checks, you can use a binary lookup algorithm to mask those images. The Delphix Engine cannot change data within images themselves, such as the name on X-rays or driver’s licenses. However, you can replace all such images with a new, fictional image. This fictional image is provided by the owner of the original data. |
Tokenization | The only type of algorithm that allows you to reverse its masking. For example, you can use a tokenization algorithm to mask data before you send it to an external vendor for analysis. The vendor can then identify accounts that need attention without having any access to the original, sensitive data. Once you have the vendor’s feedback, you can reverse the masking and take action on the appropriate accounts. Like mapping, a tokenization algorithm creates a unique token for each input such as “David” or “Melissa.” The Delphix Engine stores both the token and original so that you can reverse masking later. |
Min Max | Values that are extremely high or low in certain categories allow viewers to infer someone’s identity, even if their name has been masked. For example, a salary of $1 suggests a company’s CEO, and some age ranges suggest higher insurance risk. You can use a min max algorithm to move all values of this kind into the midrange. |
Data Cleaning | Does not perform any masking. Instead, it standardizes varied spellings, misspellings, and abbreviations of the same name. For example, “Ariz,” “Az,” and “Arizona” can all be cleaned to “AZ.” |
Free Text Redaction | Helps you remove sensitive data that appears in free-text columns such as “Notes.” This type of algorithm requires some expertise to use, because you must set it to recognize sensitive data within a block of text. One challenge is that individual words might not be sensitive on their own, but together they may be. This algorithm uses profiler sets to determine which information it needs to mask. You can decide which expressions the algorithm uses to search for material such as addresses. For example, you can set the algorithm to look for “St,” “Cir,” “Blvd,” and other words that suggest an address. You can also use pattern matching to identify potential sensitive information. For example, a number that takes the form 123-45-6789 is likely to be a Social Security Number. You can use a free text redaction algorithm to show or hide information by using either a “deny list” or an “allow list” (see the redaction sketch following this table). |
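To make the Secure Lookup behavior concrete, here is a minimal Python sketch of hash-based lookup masking. The replacement list, function name, and hash choice are illustrative assumptions rather than the engine's actual implementation; the sketch only demonstrates why output is consistent for each input and why collisions can occur.

```python
# Minimal sketch of hash-based lookup masking, the idea behind Secure Lookup.
# The replacement list and function name are illustrative, not the engine's
# actual implementation.
import hashlib

REPLACEMENTS = ["Beth", "Carol", "Dana", "Evan"]  # values from a lookup file

def secure_lookup(value: str) -> str:
    """Pick a replacement deterministically from a hash of the input,
    so the same input always masks to the same output."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLACEMENTS)
    return REPLACEMENTS[index]

# Consistent: the same input always yields the same masked value.
assert secure_lookup("Amy") == secure_lookup("Amy")

# Collisions are possible: distinct inputs can land on the same replacement,
# which is usually acceptable for non-unique fields such as first names.
print(secure_lookup("Amy"), secure_lookup("Jane"))
```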
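In the same spirit, here is a minimal sketch of the collision-free, format-preserving idea behind Segment Mapping. A real segment mapping algorithm divides values into configurable segments and derives its permutations from a key; the fixed digit permutation below is an assumption for illustration only.

```python
# Minimal sketch of collision-free, format-preserving masking in the spirit
# of Segment Mapping: each digit is sent through a fixed permutation of 0-9,
# so distinct inputs always produce distinct outputs of the same format.
PERMUTATION = "2907461853"  # illustrative permutation of the digits 0-9

def mask_digits(value: str) -> str:
    if not value.isdigit():
        raise ValueError("non-conformant data: expected digits only")
    return "".join(PERMUTATION[int(d)] for d in value)

# Same format in, same format out, and no two inputs share an output.
print(mask_digits("123456789"))  # a 9-digit, SSN-like value -> "907461853"
```

Because the mapping is a bijection on digit strings, uniqueness constraints are preserved, which is why this family of algorithms suits primary and foreign key columns.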
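Finally, here is a minimal deny-list sketch of free text redaction using pattern matching. The regular expressions and redaction tokens are illustrative assumptions, not the engine's built-in profiler expressions.

```python
# Minimal deny-list redaction sketch: an SSN-like pattern plus a short list
# of words that suggest an address. Patterns here are illustrative only.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
ADDRESS_HINTS = re.compile(r"\b\d+\s+\w+\s+(St|Cir|Blvd|Ave)\b")

def redact(text: str) -> str:
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    text = ADDRESS_HINTS.sub("[REDACTED-ADDRESS]", text)
    return text

print(redact("Call re: 123-45-6789, lives at 42 Main St."))
# -> "Call re: [REDACTED-SSN], lives at [REDACTED-ADDRESS]."
```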
Profile job concepts
The following concepts are options available to the user when configuring a profiling job.
Term | Definition |
---|---|
Job Name | A free-form name for the job you are creating. Must be unique. |
Multi-Tenant | Check the box if the job is for a multi-tenant database. This option allows existing rulesets to be reused to mask identical schemas via different connectors. The connector can be selected at job execution time. |
Rule Set | Select a ruleset that this job will execute against. |
No. of Streams | The number of parallel streams to use when running the job. For example, you can select two streams to run two tables in the ruleset concurrently instead of one table at a time. |
Min Memory (MB) optional | Minimum amount of memory to allocate for the job, in megabytes. |
Max Memory (MB) optional | Maximum amount of memory to allocate for the job, in megabytes. |
Feedback Size optional | The number of rows to process before writing a message to the log. Set this parameter to the appropriate level of detail required for monitoring your job. For example, if you set this number significantly higher than the actual number of rows in a job, the progress for that job will only show 0% or 100%. |
Profile Sets optional | The name of a profile set, which is a subset of expressions (for example, a subset of financial expressions). |
Comments optional | Add comments related to this job. |
Email optional | Add email address(es) to which to send status messages. Separate addresses with a comma (,). |
Masking job concepts
These concepts are options available to the user for configuring a masking job.
Term | Definition |
---|---|
Job Name | A free-form name for the job you are creating. Must be unique across the entire application. |
Masking Method | Select either In-Place or On-The-Fly. |
Multi-Tenant | Check the box if the job is for a multi-tenant database. This option allows existing rulesets to be reused to mask identical schemas via different connectors. The connector can be selected at job execution time. |
Rule Set | Select a ruleset for this job to execute against. |
Min Memory (MB) optional | Minimum amount of memory to allocate for the job, in megabytes. |
Max Memory (MB) optional | Maximum amount of memory to allocate for the job, in megabytes. |
Update Threads | The number of update threads to run in parallel to update the target database. For databases using T-SQL, multiple update/insert threads can cause deadlocks. If you see this type of error, reduce the number of threads that you specify in this box. |
Commit Size | The number of rows to process before issuing a commit to the database. |
Feedback Size | The number of rows to process before writing a message to the logs. Set this parameter to the appropriate level of detail required for monitoring your job. For example, if you set this number significantly higher than the actual number of rows in a job, the progress for that job will only show 0% or 100%. |
Disable Trigger optional | Whether to automatically disable database triggers. By default, this checkbox is clear, and triggers are not automatically disabled. |
Drop Index optional | Whether to automatically drop indexes on columns that are being masked and automatically re-create the index when the masking job is completed. By default, this checkbox is clear, and indexes are not automatically dropped. |
Prescript optional | Specify the full pathname of a file that contains SQL statements to run before the job starts, or click Browse to specify a file. If you are editing the job and a prescript file is already specified, you can click the Delete button to remove the file. (The Delete button only appears if a prescript file was already specified.) |
Postscript optional | Specify the full pathname of a file that contains SQL statements to be run after the job finishes, or click Browse to specify a file. If you are editing the job and a postscript file is already specified, you can click the Delete button to remove the file. (The Delete button only appears if a postscript file was already specified.) |
Comments optional | Add comments related to this masking job. |
Email optional | Add email address(es) to which to send status messages. |
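The options above can also be supplied programmatically. Below is a hedged sketch that creates a masking job through the engine's REST API; the host, endpoint paths, and field names mirror the table above but are assumptions to verify against the API documentation for your engine version.

```python
# Hedged sketch: creating a masking job via the engine's REST API instead of
# the UI. The base URL and field names are assumptions -- check them against
# the API documentation for your engine version.
import requests

BASE = "http://masking-engine.example.com/masking/api"  # hypothetical host

# Authenticate first; the engine returns a token for subsequent calls.
auth = requests.post(f"{BASE}/login",
                     json={"username": "admin", "password": "secret"})
headers = {"Authorization": auth.json()["Authorization"]}

job = {
    "jobName": "mask-hr-schema",  # must be unique across the application
    "rulesetId": 42,              # the ruleset this job executes against
    "onTheFlyMasking": False,     # False corresponds to an in-place job
    "feedbackSize": 50000,        # rows processed between log messages
}
resp = requests.post(f"{BASE}/masking-jobs", json=job, headers=headers)
resp.raise_for_status()
print(resp.json())
```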