Data masking is tightly coupled with building test data. Two major types of data masking are static and on-the-fly data masking.
Static data masking
Static data masking is usually performed on the golden copy of the database, but it can also be applied to values in other sources, including files. In database environments, production database administrators typically load table backups into a separate environment, reduce the dataset to a subset that holds the data necessary for a particular round of testing (a technique called "subsetting"), apply data masking rules while the data is at rest, apply necessary code changes from source control, and/or push the data to the desired environment.
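The subset, mask, and push steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `customers` table, its columns, the `region` filter, and the masking rules (a placeholder name, and an SSN reduced to its last four digits) are all hypothetical, and in-memory SQLite databases stand in for the production backup and the test environment.

```python
import sqlite3

SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT, region TEXT)"

def build_masked_subset(source, target, region):
    """Copy one region's customers from source to target, masking name and SSN."""
    target.execute(SCHEMA)
    # Subsetting: keep only the rows needed for this round of testing.
    rows = source.execute(
        "SELECT id, name, ssn, region FROM customers WHERE region = ?", (region,)
    ).fetchall()
    # Masking (hypothetical rules): placeholder name, SSN reduced to its last four digits.
    masked = [(cid, f"Customer {cid}", "XXX-XX-" + ssn[-4:], reg)
              for cid, _name, ssn, reg in rows]
    # Push: load the masked subset into the target environment.
    target.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", masked)
    target.commit()
    return len(masked)

# Stand-in for a restored production backup.
source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "Lynne Smith", "123-45-6789", "EU"),
    (2, "Raj Patel", "987-65-4321", "US"),
    (3, "Ana Lima", "555-12-3456", "EU"),
])

target = sqlite3.connect(":memory:")
copied = build_masked_subset(source, target, "EU")
masked_rows = target.execute("SELECT name, ssn FROM customers ORDER BY id").fetchall()
```

The source database is never modified; only the copy that reaches the test environment carries masked values.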
Deterministic data masking
Deterministic masking replaces a given value in a column with the same masked value every time it occurs, whether in the same row, the same table, the same database or schema, or across instances, servers, and database types. Example: A database has multiple tables, each with a column that holds first names. With deterministic masking, a first name will always be replaced with the same value (“Lynne” will always become “Denise”) wherever “Lynne” may appear in the database.
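One possible way to obtain this behaviour, shown here as a sketch and not as a standard implementation, is to derive the replacement from a keyed hash of the original value. The replacement list, the key, and the function name `mask_first_name` are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical pool of replacement names; a real pool would be much larger.
REPLACEMENT_NAMES = ["Denise", "Carlos", "Mei", "Priya", "Tomas", "Aisha", "Ingrid", "Kofi"]

def mask_first_name(name: str, key: bytes) -> str:
    """Map a name to a replacement that depends only on the name and the key."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]

SECRET_KEY = b"example-key-kept-outside-test-environments"  # assumption for the demo

# The same input yields the same output in every table.
customers_masked = [mask_first_name(n, SECRET_KEY) for n in ["Lynne", "Omar"]]
orders_masked = [mask_first_name(n, SECRET_KEY) for n in ["Omar", "Lynne", "Lynne"]]
```

Because the output depends only on the input and the key, joins on the masked column still line up across tables and servers. With a small replacement pool, different originals can map to the same replacement, which may or may not be acceptable for a given test.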
Statistical data obfuscation
There are also alternatives to static data masking that rely on stochastic perturbations of the data and preserve some of the statistical properties of the original data. Examples of statistical data obfuscation methods include differential privacy and the DataSifter method.
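As a rough illustration of a stochastic perturbation, the sketch below applies the Laplace mechanism, a common building block for differential privacy, to a counting query. It is not the DataSifter method, and the function names, the sample data, and the choice of epsilon are assumptions made for the example.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values that satisfy the predicate."""
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one record changes a count by at most 1 (sensitivity 1),
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [23, 35, 41, 58, 62, 29, 47]   # hypothetical sensitive column
rng = random.Random()                  # unseeded: each run gives a different answer
noisy_over_40 = private_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng)
```

A smaller epsilon gives more noise and stronger privacy; averaged over many queries the noise cancels out, which is the sense in which aggregate statistics are preserved.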
On-the-fly data masking
On-the-fly data masking happens while data is transferred from one environment to another, without the data touching the disk on its way. The same technique is applied to "Dynamic Data Masking" but one record at a time. This type of data masking is most useful for environments that do continuous deployments as well as for heavily integrated applications. Organizations that employ continuous deployment or continuous delivery practices do not have the time necessary to create a backup and load it to the golden copy of the database. Thus, continuously sending smaller subsets (deltas) of masked testing data from production is important. In heavily integrated applications, developers get feeds from other production systems at the very onset of development, and masking of these feeds is either overlooked or not budgeted until later, making organizations non-compliant. In such cases, having on-the-fly data masking in place becomes essential.
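The idea of masking a delta in transit can be sketched with a generator that transforms each record in memory as it streams from source to destination. The field names and masking rules below are hypothetical, and a plain list stands in for the production feed.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]
Rule = Callable[[str], str]

def mask_stream(records: Iterable[Record], rules: Dict[str, Rule]) -> Iterator[Record]:
    """Yield masked copies of records one at a time, without staging them on disk."""
    for record in records:
        yield {field: rules[field](value) if field in rules else value
               for field, value in record.items()}

# Hypothetical rules: fixed placeholder e-mail, card number reduced to its last four digits.
rules: Dict[str, Rule] = {
    "email": lambda value: "user@example.invalid",
    "card": lambda value: "*" * (len(value) - 4) + value[-4:],
}

# Stand-in for a small delta pulled from a production feed.
delta = [
    {"id": "1", "email": "lynne@corp.example", "card": "4111111111111111"},
    {"id": "2", "email": "omar@corp.example", "card": "5500005555555559"},
]

test_environment = list(mask_stream(iter(delta), rules))
```

Only masked records ever reach `test_environment`; the unmasked values exist solely in the source and, briefly, in memory during the transfer.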
Dynamic data masking
Dynamic data masking is similar to on-the-fly data masking, but the two differ: on-the-fly data masking copies data from one source to another so that the latter can be shared, whereas dynamic data masking happens at runtime, on demand, so there is no need for a second data source in which to store the masked data. Dynamic data masking enables several scenarios, many of which revolve around strict privacy regulations, e.g. those of the Monetary Authority of Singapore or the privacy regulations in Europe. Dynamic data masking is attribute-based and policy-driven. Policies include:
• Doctors can view the medical records of patients they are assigned to (data filtering)
• Doctors cannot view the SSN field inside a medical record (data masking)
Dynamic data masking can also be used to encrypt or decrypt values on the fly, especially when using format-preserving encryption. Several standards have emerged in recent years to implement dynamic data filtering and masking. For instance,
XACML policies can be used to mask data inside databases. There are several possible technologies for applying dynamic data masking:
• In the database: The database receives the SQL and rewrites the returned result set so that it is masked. Applicable for developers and database administrators, but not for applications (because connection pools, application caching and data buses hide the application user identity from the database and can also cause application data corruption).
• Network proxy between the application and the database: Captures the SQL and rewrites the select request. Applicable for developers and database administrators with simple 'select' requests, but not for stored procedures (for which the proxy only sees the exec call) or applications (because connection pools, application caching and data buses hide the application user identity from the database and can also cause application data corruption).
• Database proxy: A variation of the network proxy, usually deployed between applications/users and the database. Applications and users connect to the database through the database security proxy. There are no changes to the way applications and users connect to the database, and no agent needs to be installed on the database server. The SQL queries are rewritten, and when implemented this way, dynamic data masking is also supported within stored procedures and database functions.
• Network proxy between the end user and the application: Identifies text strings and replaces them. This method is not applicable for complex applications, as it easily causes corruption when the real-time string replacement is unintentionally applied.
• Code changes in the applications and XACML: Code changes are usually hard to perform, impossible to maintain and not applicable for packaged applications.
• Within the application run-time: By instrumenting the application run-time, policies are defined to rewrite the result set returned from the data sources, while having full visibility of the application user. This method is the only applicable way to dynamically mask complex applications, as it enables control over the data request, the data result and the user result.
• Supported by a browser plugin: In the case of SaaS or local web applications, browser add-ons can be configured to mask data fields corresponding to precise CSS selectors. This can be accomplished either by marking sensitive fields in the application, for example with an HTML class, or by finding the right selectors that identify the fields to be obfuscated or masked.
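The two example policies for doctors (filtering by assignment, masking the SSN field) can be sketched as a function that rewrites a result set at request time with knowledge of the application user. This is a simplified illustration, not a XACML engine; the `User` type, the record layout, and the mask string are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass(frozen=True)
class User:
    name: str
    role: str
    assigned_patients: FrozenSet[str]

def view_medical_records(user: User, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the records this user may see, with the SSN field masked."""
    visible = []
    for record in records:
        # Data filtering: doctors only see patients they are assigned to.
        if user.role != "doctor" or record["patient_id"] not in user.assigned_patients:
            continue
        # Data masking: the SSN is hidden in the returned copy only.
        masked = dict(record)
        masked["ssn"] = "***-**-****"
        visible.append(masked)
    return visible

# Hypothetical stored data; it is never modified by the masking step.
records = [
    {"patient_id": "p1", "diagnosis": "asthma", "ssn": "123-45-6789"},
    {"patient_id": "p2", "diagnosis": "fracture", "ssn": "987-65-4321"},
]
doctor = User(name="Dr. Lee", role="doctor", assigned_patients=frozenset({"p1"}))
result = view_medical_records(doctor, records)
```

The stored records keep their real values; each caller receives a view shaped by the policy, so no second, masked data source is needed.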
Data masking and the cloud
In recent years, organizations have increasingly developed their new applications in the cloud, regardless of whether the final applications will be hosted in the cloud or on-premises. Cloud solutions currently allow organizations to use infrastructure as a service, platform as a service, and software as a service. There are various modes of creating test data and moving it from on-premises databases to the cloud, or between different environments within the cloud. Dynamic data masking becomes even more critical in the cloud when customers need to protect PII while relying on cloud providers to administer their databases. Data masking invariably becomes a part of these processes in the systems development life cycle (SDLC), as the development environments' service-level agreements (SLAs) are usually not as stringent as the production environments' SLAs, regardless of whether the application is hosted in the cloud or on-premises.
==See also==