Symantec™ Data Loss Prevention Administration Guide

Version 15.5

Last updated: 19 August 2019

Symantec Data Loss Prevention Administration Guide
Documentation version: 15.5d

Legal Notice
Copyright © 2019 Symantec Corporation. All rights reserved.

Symantec, CloudSOC, Blue Coat, the Symantec Logo, the Checkmark Logo, the Blue Coat logo, and the
Shield Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S.
and other countries. Other names may be trademarks of their respective owners.

This Symantec product may contain third party software for which Symantec is required to provide attribution
to the third party (“Third Party Programs”). Some of the Third Party Programs are available under open
source or free software licenses. The License Agreement accompanying the Software does not alter any
rights or obligations you may have under those open source or free software licenses. Please see the
Third Party Legal Notice Appendix to this Documentation or TPIP ReadMe File accompanying this Symantec
product for more information on the Third Party Programs.

The product described in this document is distributed under licenses restricting its use, copying, distribution,
and decompilation/reverse engineering. No part of this document may be reproduced in any form by any
means without prior written authorization of Symantec Corporation and its licensors, if any.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE
DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY
INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL
DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS
DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO
CHANGE WITHOUT NOTICE.

The Licensed Software and Documentation are deemed to be commercial computer software as defined
in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19 "Commercial Computer
Software - Restricted Rights" and DFARS 227.7202, et seq. "Commercial Computer Software and
Commercial Computer Software Documentation," as applicable, and any successor regulations, whether
delivered by Symantec as on premises or hosted services. Any use, modification, reproduction, release,
performance, display or disclosure of the Licensed Software and Documentation by the U.S. Government
shall be solely in accordance with the terms of this Agreement.
Symantec Corporation
350 Ellis Street
Mountain View, CA 94043

https://www.symantec.com

Symantec Support
All support services will be delivered in accordance with your support agreement and the
then-current Enterprise Technical Support policy.

Knowledge Base Articles and Symantec Connect

Before you contact Technical Support, you can find free content in our online Knowledge Base,
which includes troubleshooting articles, how-to articles, alerts, and product manuals. In the
search box of the following URL, type the name of your product:
https://support.symantec.com
Access our blogs and online forums to engage with other customers, partners, and Symantec
employees on a wide range of topics at the following URL:
https://www.symantec.com/connect

Technical Support and Enterprise Customer Support

Symantec Support maintains support centers globally 24 hours a day, 7 days a week. Technical
Support’s primary role is to respond to specific queries about product features and functionality.
Enterprise Customer Support assists with non-technical questions, such as license activation,
software version upgrades, product access, and renewals.
For Symantec Support terms, conditions, policies, and other support information, see:
https://entced.symantec.com/default/ent/supportref
To contact Symantec Support, see:
https://support.symantec.com/en_US/contact-support.html

Contents

Symantec Support .............................................................................................. 4

Section 1 Getting started .............................................................. 71

Chapter 1 Introducing Symantec Data Loss Prevention ................ 72
About updates to the Symantec Data Loss Prevention Administration Guide ........ 72
About Symantec Data Loss Prevention ............................................. 75
About the Enforce Server platform ................................................... 77
About Network Monitor and Prevent ................................................. 78
About Network Discover/Cloud Storage Discover ................................ 78
About Network Protect ................................................................... 79
About Endpoint Discover ................................................................ 80
About Endpoint Prevent ................................................................. 80

Chapter 2 Getting started administering Symantec Data Loss Prevention ........ 82
About Symantec Data Loss Prevention administration .......................... 82
About the Enforce Server administration console ................................ 83
Logging on and off the Enforce Server administration console ............... 84
About the administrator account ...................................................... 85
Performing initial setup tasks .......................................................... 85
Changing the administrator password ............................................... 86
Adding an administrator email account .............................................. 86
Editing a user profile ..................................................................... 87
Changing your password ............................................................... 89

Chapter 3 Working with languages and locales ............................... 91

About support for character sets, languages, and locales ...................... 91
Supported languages for detection ................................................... 92
Working with international characters ............................................... 93
About Symantec Data Loss Prevention language packs ....................... 94
About locales ............................................................................... 95
Using a non-English language on the Enforce Server administration console ........ 95
Using the Language Pack Utility ...................................................... 96

Section 2 Managing the Enforce Server platform ........ 100
Chapter 4 Managing Enforce Server services and settings ......... 101
About Symantec Data Loss Prevention services ................................ 101
About starting and stopping services on Windows ............................. 102
Starting an Enforce Server on Windows .................................... 103
Stopping an Enforce Server on Windows ................................... 103
Starting a detection server on Windows .................................... 104
Stopping a detection server on Windows ................................... 104
Starting services on single-tier Windows installations ................... 104
Stopping services on single-tier Windows installations ................. 105
About starting and stopping services on Linux .................................. 105
Starting an Enforce Server on Linux ......................................... 105
Stopping an Enforce Server on Linux ........................................ 106
Starting a detection server on Linux ......................................... 106
Stopping a detection server on Linux ........................................ 107
Starting services on single-tier Linux installations ........................ 107
Stopping services on single-tier Linux installations ...................... 107

Chapter 5 Managing roles and users ............................................... 109

About role-based access control .................................................... 109
About configuring roles and users .................................................. 110
About recommended roles for your organization ................................ 111
Roles included with solution packs ................................................. 112
Configuring roles ........................................................................ 114
Configuring user accounts ............................................................ 121
Configuring password enforcement settings ..................................... 124
Resetting the Administrator password ............................................. 125
Manage and add roles ................................................................. 126
Manage and add users ................................................................ 126
About authenticating users ........................................................... 127
Configuring user authentication ..................................................... 131
About SAML authentication .................................................... 131
Setting up authentication ........................................................ 132
Administrator Bypass URL ..................................................... 133
Set up and configure the authentication method .......................... 133
Set up the SAML authentication configuration ............................ 135
Generate or download Enforce (service providers) SAML metadata ........ 135
Configure the Enforce Server as a SAML service provider with the IdP (Create an application in your identity provider) ........ 136
Export the IdP metadata to DLP .............................................. 136
Configuring Active Directory authentication ................................ 136
Configuring forms-based authentication .................................... 137
Configuring certificate authentication ........................................ 137
Integrating Active Directory for user authentication ............................ 137
Creating the configuration file for Active Directory integration ........ 138
Verifying the Active Directory connection ................................... 140
Configuring the Enforce Server for Active Directory authentication ........ 141
About certificate authentication configuration .................................... 142
Configuring certificate authentication for the Enforce Server administration console ........ 144
Adding certificate authority (CA) certificates to the Tomcat trust store ........ 146
Mapping Common Name (CN) values to Symantec Data Loss Prevention user accounts ........ 149
About certificate revocation checks .......................................... 150
Troubleshooting certificate authentication .................................. 153
Disabling password authentication and forms-based logon ............ 154

Chapter 6 Connecting to group directories ..................................... 155

Creating connections to LDAP servers ............................................ 155
Configuring directory server connections ......................................... 156
Scheduling directory server indexing .............................................. 158

Chapter 7 Managing stored credentials .......................................... 160

About the credential store ............................................................. 160
Adding new credentials to the credential store .................................. 161
Configuring endpoint credentials .................................................... 161
Managing credentials in the credential store ..................................... 162
Managing stored credentials ......................................................... 162

Chapter 8 Managing system events and messages ...................... 164

About system events ................................................................... 164
System events reports ................................................................. 165
Working with saved system reports ................................................ 168
Server and Detectors event detail .................................................. 169
Configuring event thresholds and triggers ........................................ 170
About system event responses ...................................................... 172
Enabling a syslog server .............................................................. 174
About system alerts ..................................................................... 175
Configuring the Enforce Server to send email alerts ........................... 176
Configuring system alerts ............................................................. 177
About log review ......................................................................... 179
System event codes and messages ................................................ 180

Chapter 9 Managing the Symantec Data Loss Prevention database ........ 206
Working with Symantec Data Loss Prevention database diagnostic tools ........ 206
Viewing tablespaces and data file allocations ................................... 207
Adjusting warning thresholds for tablespace usage in large databases ........ 208
Generating a database report ................................................. 208
Viewing table details .................................................................... 209
Checking the database update readiness ........................................ 210
Preparing to run the Update Readiness tool ............................... 211
Creating the Update Readiness tool database account ................. 213
Running the Update Readiness tool from the Enforce Server administration console ........ 214
Running the Update Readiness tool at the command line ............. 215
Reviewing update readiness results ......................................... 218

Chapter 10 Working with Symantec Information Centric Encryption ........ 219
About Symantec Information Centric Encryption ................................ 219
About the Symantec ICE Utility ...................................................... 220
Overview of implementing Information Centric Encryption capabilities ........ 222
Configuring the Enforce Server to connect to the Symantec ICE Cloud ........ 224

Chapter 11 Working with Symantec Information Centric Tagging ........ 226
About integrating Information Centric Tagging with Data Loss Prevention ........ 226
Overview of steps to tie Information Centric Tagging to Data Loss Prevention ........ 228
Integrating the ICT server with the Enforce Server ............................. 229
About automatic and static imports of the ICT classification taxonomy ........ 229
Using the ICT Web Service for scheduled classification taxonomy imports ........ 230
Using an XML file for static classification taxonomy imports ........... 231
Importing the ICT classification taxonomy ........................................ 231
Supported file types for ICT-Data Loss Prevention integration .............. 232

Chapter 12 Adding a new product module ........................................ 234

Installing a new license file ........................................................... 234
About system upgrades ............................................................... 235

Chapter 13 Applying a Maintenance Pack ......................................... 236

Applying a Symantec Data Loss Prevention Maintenance Pack ............ 236
Steps to apply a maintenance pack on Windows servers .............. 236
Steps to apply a maintenance pack on Linux servers ................... 240

Section 3 Managing detection servers ................................ 245

Chapter 14 Installing and managing detection servers and cloud detectors ........ 246
About managing Symantec Data Loss Prevention servers ................... 247
Preparing for Microsoft Rights Management file monitoring ................. 247
Enabling Microsoft Rights Management file monitoring ................. 248
Enabling Advanced Process Control ............................................... 250
Server controls ........................................................................... 251
Server configuration—basic .......................................................... 253
Network Monitor Server—basic configuration ............................. 254
Network Prevent for Email Server—basic configuration ................ 256
Network Prevent for Web Server—basic configuration ................. 259
Network Discover/Cloud Storage Discover Server and Network Protect—basic configuration ........ 261
Endpoint Server—basic configuration ....................................... 262
Single Tier Monitor — basic configuration .................................. 263
Editing a detector ........................................................................ 272
Server and detector configuration—advanced .................................. 273
Adding a detection server ............................................................. 273
Adding a cloud detector ............................................................... 275
Removing a server ...................................................................... 277
Importing SSL certificates to Enforce or Discover servers .................... 277
About the Overview screen ........................................................... 278
Configuring the Enforce Server to use a proxy to connect to cloud services ........ 279
Server and detector status overview ............................................... 280
Recent error and warning events list ............................................... 282
Server/Detector Detail screen ........................................................ 283
Advanced server settings ............................................................. 285
Advanced detector settings ........................................................... 326
About using load balancers in an endpoint deployment ....................... 330

Chapter 15 Managing log files ............................................................. 333

About log files ............................................................................ 333
Operational log files .............................................................. 334
Debug log files ..................................................................... 337
Log collection and configuration screen ........................................... 343
Configuring server logging behavior ............................................... 343
Collecting server logs and configuration files .................................... 347
About log event codes ................................................................. 350
Network Prevent for Web operational log files and event codes ........ 351
Network Prevent for Web access log files and fields .................... 352
Network Prevent for Web protocol debug log files ....................... 354
Network Prevent for Email log levels ........................................ 355
Network Prevent for Email operational log codes ........................ 355
Network Prevent for Email originated responses and codes .......... 359

Chapter 16 Using Symantec Data Loss Prevention utilities .......... 362

About Symantec Data Loss Prevention utilities ................................. 362
About Endpoint utilities ................................................................ 363
About DBPasswordChanger ......................................................... 364
DBPasswordChanger syntax .................................................. 364
Example of using DBPasswordChanger .................................... 365

Section 4 Authoring policies ..................................................... 366

Chapter 17 Introduction to policies .................................................... 368
About Data Loss Prevention policies ............................................... 368
Policy components ...................................................................... 370
Policy templates ......................................................................... 371
Solution packs ........................................................................... 372
Policy groups ............................................................................. 372
Policy deployment ....................................................................... 373
Policy severity ............................................................................ 374
Policy authoring privileges ............................................................ 375
Data Profiles .............................................................................. 375
User Groups .............................................................................. 376
Policy template import and export .................................................. 377
Workflow for implementing policies ................................................. 378
Viewing, printing, and downloading policy details ............................... 379

Chapter 18 Overview of policy detection ........................................... 381

Detecting data loss ..................................................................... 381
Content that can be detected .................................................. 382
Files that can be detected ...................................................... 382
Protocols that can be monitored .............................................. 382
Endpoint events that can be detected ....................................... 383
Identities that can be detected ................................................. 383
Languages that can be detected .............................................. 383
Data Loss Prevention policy detection technologies ........................... 383
Policy matching conditions ............................................................ 386
Content matching conditions ................................................... 387
File property matching conditions ............................................. 388
Protocol matching condition for network .................................... 389
Endpoint matching conditions ................................................. 389
Groups (identity) matching conditions ....................................... 390
Detection messages and message components ................................ 391
Exception conditions ................................................................... 393
Compound conditions .................................................................. 394
Policy detection execution ............................................................ 394
Two-tier detection for DLP Agents .................................................. 395

Chapter 19 Creating policies from templates ................................... 397

Creating a policy from a template ................................................... 397
US Regulatory Enforcement policy templates ................................... 400
General Data Protection Regulation (GDPR) policy templates .............. 402
International Regulatory Enforcement policy templates ....................... 403
Customer and Employee Data Protection policy templates .................. 404
Confidential or Classified Data Protection policy templates .................. 405
Network Security Enforcement policy templates ................................ 406
Acceptable Use Enforcement policy templates .................................. 407
Colombia Personal Data Regulatory Enforcement policy template ........ 408
Choosing an Exact Data Profile ..................................................... 409
Choosing an Indexed Document Profile ........................................... 411

Chapter 20 Configuring policies .......................................................... 412

Adding a new policy or policy template ............................................ 412
Configuring policies ..................................................................... 413
Adding a rule to a policy ............................................................... 415
Configuring policy rules ................................................................ 417
Defining rule severity ................................................................... 420
Configuring match counting .......................................................... 421
Selecting components to match on ................................................. 423
Adding an exception to a policy ..................................................... 424
Configuring policy exceptions ........................................................ 426
Configuring compound match conditions ......................................... 429
Input character limits for policy configuration .................................... 431

Chapter 21 Administering policies ...................................................... 432

Manage and add policies ............................................................. 432
Manage and add policy groups ...................................................... 435
Creating and modifying policy groups ............................................. 436
Importing policies ........................................................................ 437
About importing policies ......................................................... 437
About policy references ......................................................... 438
Exporting policies ....................................................................... 439
About policy export ............................................................... 439
Cloning policies .......................................................................... 440
Importing policy templates ............................................................ 441
Exporting policy detection as a template .......................................... 442
Adding an automated response rule to a policy ................................. 442
Removing policies and policy groups .............................................. 443
Viewing and printing policy details .................................................. 444
Downloading policy details ........................................................... 444
Troubleshooting policies ............................................................... 445
Updating EDM and IDM profiles to the latest version .......................... 446
Updating policies after upgrading to the latest version ........................ 447

Chapter 22 Best practices for authoring policies ............................ 449

Best practices for authoring policies ................................................ 449
Develop a policy strategy that supports your data security objectives ........ 451
Use a limited number of policies to get started .................................. 451
Use policy templates but modify them to meet your requirements ......... 452
Use the appropriate match condition for your data loss prevention objectives ........ 452
Test and tune policies to improve match accuracy ............................. 453
Start with high match thresholds to reduce false positives ................... 454
Use a limited number of exceptions to narrow detection scope ............. 455
Use compound conditions to improve match accuracy ........................ 455
Author policies to limit the potential effect of two-tier detection ............. 456
Use policy groups to manage policy lifecycle .................................... 457
Follow detection-specific best practices ........................................... 457

Chapter 23 Increasing the Inspection Content Size ........................ 459

Increasing the inspection content size ............................................. 459

Chapter 24 Installing remote indexers ............................................... 463

About installing remote indexers .................................................... 463
Installing a remote indexer on Windows ........................................... 464
Installing a remote indexer on Linux ................................................ 466
Configuring a remote indexer on Linux ............................................ 466

Chapter 25 Detecting content using Exact Match Data Identifiers (EMDI) ........ 468
Introducing Exact Match Data Identifiers (EMDI) ............................... 468
About using EMDI to protect content ........................................ 469
About EMDI and key columns ................................................. 470
About EMDI policy features .................................................... 470
EMDI compared to EDM ........................................................ 471
About the Exact Match Data Identifier profile and index ................ 473
About the Exact Match Data Identifier source file ........................ 473
About cleansing the Exact Match Data Identifier source file ........... 474
About EMDI index scheduling ................................................. 475
Configuring Exact Match Data Identifier profiles ................................ 476
Creating the Exact Match Data Identifier source file ..................... 477
Preparing the Exact Match Data Identifier source for indexing ........ 478
Uploading the Exact Match Data Identifier source files to the
Enforce Server ............................................................... 480
Adding Exact Match Data Identifier Profiles ................................ 482
Creating and modifying the Exact Match Data Identifier
profiles ......................................................................... 483
Scheduling EMDI profile indexing ............................................ 485
Associating data identifiers with your data source (EMDI) ............. 486
Adding an EMDI check to a built-in or custom data identifier
condition in a policy ........................................................ 487
Using multi-token matching with EMDI ............................................ 488
Characteristics of multi-token cells for EMDI .............................. 489
Multi-token with spaces for EMDI ............................................. 490
Multi-token with mixed language characters for EMDI .................. 490
Multi-token with punctuation for EMDI ....................................... 491
Additional examples for multi-token cells with punctuation for
EMDI ........................................................................... 492
Multi-token punctuation characters for EMDI .............................. 495
Proximity matching example for EMDI ...................................... 496
Memory requirements for EMDI ..................................................... 498
EMDI memory configuration and limitations ............................... 499
Overview of configuring memory and indexing the data source for
EMDI ........................................................................... 500
Determining requirements for both local indexers and remote
indexers for EMDI ........................................................... 500
Detection server memory requirements for EMDI ........................ 501
Increasing the memory for the detection server (File Reader) for
EMDI ........................................................................... 503
Profile size limitations on the DLP Agent for EMDI ...................... 504
Remote EMDI indexing ................................................................ 504
About the Remote EMDI Indexer ............................................. 505
About the SQL Preindexer and EMDI ....................................... 505
System requirements for remote EMDI indexing ......................... 505
Workflow for remote EMDI indexing ......................................... 506
About installing the Remote EMDI indexer ................................. 507
Creating an EMDI profile template for remote indexing ................. 508
Downloading and copying the EMDI profile file to a remote
system ......................................................................... 509
Generating remote index files for EMDI ..................................... 509
Remote EMDI indexing examples using data source file ............... 510
Remote EMDI Indexer command options .................................. 511
Remote EMDI indexing examples using the SQL Preindexer ......... 512
Copying and loading EMDI remote index files to the Enforce
Server .......................................................................... 513
Troubleshooting EMDI preindexing errors .................................. 514
Properties file settings for EMDI ..................................................... 515
Best practices for using EMDI ....................................................... 517
Never use a personal identifier as an optional column in
EMDI ........................................................................... 519
Use three or more columns in a match for EMDI ......................... 519
Don’t use EMDI validators as both optional and required for a
given data identifier in a policy .......................................... 519
Use additional validators with EMDI where possible ..................... 519
Limit the required number of columns to two or three for
EMDI ........................................................................... 519
When matching with only a single optional column, avoid adding
low-variability values as optional columns with EMDI ............. 519
Use full disk encryption on EMDI endpoint deployments ............... 519
Cleanse the EMDI data source file of blank columns and duplicate
rows ............................................................................ 519
Remove ambiguous character types from the EMDI data source
file ............................................................................... 520
Clean up your EMDI data source for multi-token matching ............ 521
Do not use the comma delimiter if the EMDI data source has
number fields ................................................................. 521
Ensure that the EMDI data source is clean for indexing ................ 522
Include column headers as the first row of the EMDI data source
file ............................................................................... 522
Check the EMDI system alerts to tune profile accuracy ................ 522
Use scheduled indexing to automate EMDI profile updates ........... 523
EMDI Troubleshooting ................................................................. 523
The EMDI index doesn’t get published to the Endpoint
Agent ........................................................................... 523
The EMDI index doesn’t get published to the Endpoint Agent and
the EnabledOnAgents setting is true ................................... 523
A key column that is in an EMDI index doesn’t generate an incident
................................................................................... 524
EMDI generates an unexpectedly high number of false
positives ....................................................................... 524

Chapter 26 Detecting content using Exact Data Matching
(EDM) ............................................................................. 525

Introducing Exact Data Matching (EDM) .......................................... 525
About using EDM to protect content ......................................... 526
EDM policy features .............................................................. 527
About the Exact Data Profile and index ..................................... 528
About the exact data source file ............................................... 529
About cleansing the exact data source file for EDM ..................... 530
About using System Fields for data source validation with
EDM ............................................................................ 530
About index scheduling for EDM .............................................. 531
About the Content Matches Exact Data From condition for
EDM ............................................................................ 532
About Data Owner Exception for EDM ...................................... 532
About profiled Directory Group Matching (DGM) for EDM ............. 533
About two-tier detection for EDM on the endpoint ........................ 533
About upgrading EDM deployments ......................................... 534
Configuring Exact Data profiles for EDM .......................................... 534
Creating the exact data source file for EDM ............................... 535
Creating the exact data source file for Data Owner Exception for
EDM ............................................................................ 536
Creating the exact data source file for profiled DGM for
EDM ............................................................................ 537
Preparing the exact data source file for indexing for EDM ............. 537
Uploading exact data source files for EDM to the Enforce
Server .......................................................................... 539
Creating and modifying Exact Data Profiles for EDM .................... 541
Mapping Exact Data Profile fields for EDM ................................. 545
Using system-provided pattern validators for EDM profiles ............ 547
Scheduling Exact Data Profile indexing for EDM ......................... 548
Managing and adding Exact Data Profiles for EDM ...................... 550
Configuring EDM policies ............................................................. 551
Configuring the Content Matches Exact Data policy condition for
EDM ............................................................................ 551
Configuring Data Owner Exception for EDM policy
conditions ..................................................................... 554
Configuring the Sender/User based on a Profiled Directory policy
condition for EDM ........................................................... 554
Configuring the Recipient based on a Profiled Directory policy
condition for EDM ........................................................... 555
About configuring natural language processing for Chinese,
Japanese, and Korean for EDM policies .............................. 556
Configuring Advanced Settings for EDM policies ......................... 557
Using multi-token matching with EDM ............................................. 560
Characteristics of multi-token cells (EDM) .................................. 560
Multi-token with spaces (EDM) ................................................ 561
Multi-token with stopwords (EDM) ............................................ 562
Multi-token with mixed language characters (EDM) ..................... 562
Multi-token with punctuation (EDM) .......................................... 563
Additional examples for multi-token cells with punctuation
(EDM) .......................................................................... 564
Some special use cases for system-recognized data patterns
(EDM) .......................................................................... 567
Multi-token punctuation characters (EDM) ................................. 569
Match count variant examples (EDM) ....................................... 570
Proximity matching example for EDM ....................................... 572
Updating EDM indexes to the latest version ..................................... 574
Update process using the Remote EDM Indexer ......................... 574
Update process using the Enforce Server for EDM ...................... 576
EDM index out-of-date error codes ........................................... 578
Memory requirements for EDM ...................................................... 579
About memory requirements for EDM ....................................... 579
Overview of configuring memory and indexing the data source for
EDM ............................................................................ 580
Determining requirements for both local and remote indexers for
EDM ............................................................................ 580
Detection server memory requirements for EDM ......................... 582
Increasing the memory for the detection server (File Reader) for
EDM ............................................................................ 584
Using the EDM Memory Requirements Spreadsheet ................... 585
Remote EDM indexing ................................................................. 585
About the Remote EDM Indexer .............................................. 586
About the SQL Preindexer for EDM .......................................... 586
System requirements for remote EDM indexing .......................... 587
Workflow for remote EDM indexing .......................................... 587
Installing the Remote EDM Indexer .......................................... 588
Creating an EDM profile template for remote indexing .................. 589
Downloading and copying the EDM profile file to a remote
system ......................................................................... 591
Generating remote index files for EDM ...................................... 591
Remote indexing examples using data source file (EDM) .............. 592
Remote indexing examples using SQL Preindexer (EDM) ............. 593
Copying and loading remote EDM index files to the Enforce
Server .......................................................................... 594
SQL Preindexer command options (EDM) ................................. 595
Remote EDM Indexer command options ................................... 597
Troubleshooting preindexing errors for EDM .............................. 598
Troubleshooting remote indexing errors for EDM ......................... 599
Best practices for using EDM ........................................................ 601
Ensure data source has at least one column of unique data
(EDM) .......................................................................... 602
Cleanse the data source file of blank columns and duplicate rows
(EDM) .......................................................................... 603
Remove ambiguous character types from the data source file
(EDM) .......................................................................... 604
Understand how multi-token cell matching functions (EDM) ........... 604
Do not use the comma delimiter if the data source has number
fields (EDM) .................................................................. 605
Map data source column to system fields to leverage validation
(EDM) .......................................................................... 605
Ensure that the data source is clean for indexing (EDM) ............... 605
Leverage EDM policy templates when possible .......................... 606
Include column headers as the first row of the data source file
(EDM) .......................................................................... 606
Check the system alerts to tune profile accuracy (EDM) ............... 607
Use stopwords to exclude common words from detection
(EDM) .......................................................................... 607
Use scheduled indexing to automate profile updates (EDM) .......... 607
Match on 3 columns in an EDM condition to increase detection
accuracy ....................................................................... 608
Leverage exception tuples to avoid false positives (EDM) ............. 609
Use a WHERE clause to detect records that meet specific criteria
(EDM) .......................................................................... 609
Use the minimum matches field to fine tune EDM rules ................ 610
Combine Data Identifiers with EDM rules to limit the impact of
two-tier detection ............................................................ 610
Include an email address field in the Exact Data Profile for profiled
DGM (EDM) .................................................................. 610
Use profiled DGM for Network Prevent for Web identity detection
(EDM) .......................................................................... 611

Chapter 27 Detecting content using Indexed Document
Matching (IDM) ............................................................ 612
Introducing Indexed Document Matching (IDM) ................................. 612
About using IDM .................................................................. 613
Supported forms of matching for IDM ....................................... 613
Types of IDM detection .......................................................... 614
About the Indexed Document Profile ........................................ 615
About the document data source ............................................. 616
About the indexing process .................................................... 616
About indexing remote documents ........................................... 617
About the server index files and the agent index files ................... 618
About index deployment and logging ........................................ 619
Using IDM to detect exact files ................................................ 620
Using IDM to detect exact and partial file contents ....................... 621
About using the Content Matches Document Signature policy
condition ....................................................................... 623
About white listing partial file contents ....................................... 624
Configuring IDM profiles and policy conditions .................................. 625
Preparing the document data source for indexing ........................ 625
White listing file contents to exclude from partial matching ............ 627
Manage and add Indexed Document Profiles ............................. 628
Creating and modifying Indexed Document Profiles ..................... 629
Configure endpoint partial content matching ............................... 632
Uploading a document archive to the Enforce Server ................... 633
Referencing a document archive on the Enforce Server ............... 634
Using local path on Enforce Server .......................................... 636
Using the remote SMB share option to index file shares ............... 637
Using the remote SMB share option to index SharePoint
documents .................................................................... 637
Filtering documents by file name ............................................. 640
Filtering documents by file size ................................................ 642
Scheduling document profile indexing ....................................... 643
Changing the default indexer properties .................................... 644
Enabling Agent IDM .............................................................. 645
Estimating endpoint memory use for agent IDM .......................... 646
Configuring the Content Matches Document Signature policy
condition ....................................................................... 646
Best practices for using IDM ......................................................... 648
Reindex IDM profiles after upgrade .......................................... 649
Do not compress files in the document source ............................ 649
Do not index empty documents ............................................... 649
Prefer partial matching over exact matching on the DLP
Agent ........................................................................... 650
Understand limitations of exact matching ................................... 650
Use white listing to exclude non-sensitive content from partial
matching ...................................................................... 651
Filter documents from indexing to reduce false positives ............... 652
Distinguish IDM exceptions from white listing and filtering ............. 652
Create separate profiles to index large document sources ............ 653
Use WebDAV or CIFS to index remote document data
sources ........................................................................ 653
Use scheduled indexing to keep profiles up to date ..................... 653
Use parallel IDM rules to tune match thresholds ......................... 654
Remote IDM indexing .................................................................. 655
About the Remote IDM Indexer ............................................... 655
Installing the Remote IDM Indexer .......................................... 656
Indexing the document data source using the GUI edition
(Windows only) .............................................................. 656
Scheduling remote indexing with the Remote IDM Indexer app
for Windows .................................................................. 659
Incremental indexing ............................................................. 661
Logging and troubleshooting ................................................... 662
Copying the preindex file to the Enforce Server host .................... 662
Loading the remote index file into the Enforce Server ................... 663

Chapter 28 Detecting content using Vector Machine Learning
(VML) .............................................................................. 664
Introducing Vector Machine Learning (VML) ..................................... 664
About the Vector Machine Learning Profile ................................ 665
About the content you train ..................................................... 665
About the base accuracy from training percentage rates ............... 666
About the Similarity Threshold and Similarity Score ..................... 667
About using unaccepted VML profiles in policies ......................... 667
Configuring VML profiles and policy conditions ................................. 668
Creating new VML profiles ..................................................... 669
Working with the Current Profile and Temporary Workspace
tabs ............................................................................. 670
Uploading example documents for training ................................ 671
Training VML profiles ............................................................ 672
Adjusting the memory allocation .............................................. 675
Managing training set documents ............................................ 676
Managing VML profiles .......................................................... 677
Changing names and descriptions for VML profiles ..................... 679
Configuring the Detect using Vector Machine Learning Profile
condition ....................................................................... 679
Configuring VML policy exceptions ........................................... 680
Adjusting the Similarity Threshold ............................................ 681
Testing and tuning VML profiles ............................................... 682
Properties for configuring training ............................................ 683
Log files for troubleshooting VML training and policy
detection ...................................................................... 686
Best practices for using VML ......................................................... 687
When to use VML ................................................................. 688
When not to use VML ............................................................ 689
Recommendations for training set definition ............................... 689
Guidelines for training set sizing .............................................. 690
Recommendations for uploading documents for training ............... 691
Guidelines for profile sizing ..................................................... 691
Recommendations for accepting or rejecting a profile .................. 692
Guidelines for accepting or rejecting training results .................... 693
Recommendations for deploying profiles ................................... 694

Chapter 29 Detecting content using Form Recognition -
Sensitive Image Recognition ..................................... 695
About Form Recognition detection .................................................. 695
How Form Recognition works ................................................. 696
Configuring Form Recognition detection .......................................... 696
Preparing a Form Recognition Gallery Archive ........................... 697
Configuring a Form Recognition profile ..................................... 698
Configuring the Form Recognition detection rule ......................... 699
Configuring the Form Recognition exception rule ........................ 700
Managing Form Recognition profiles ............................................... 700
Advanced server settings for Form Recognition ................................ 702
Viewing a Form Recognition incident .............................................. 703

Chapter 30 Detecting Content using OCR - Sensitive Image
Recognition ................................................................... 704
About content detection with OCR Sensitive Image Recognition ........... 705
Detection types supported for OCR extraction ............................ 705
File types supported for OCR extraction .................................... 705
About extracting images from Microsoft Office documents for OCR
and Form Recognition ..................................................... 706
OCR Server system requirements .................................................. 706
Using diagnostics for sizing OCR Server deployments ........................ 706
Creating a null policy to assist in OCR diagnostics for Discover
Servers ............................................................................... 708
Using the OCR Server Sizing Estimator spreadsheet ......................... 710
Setting up OCR Servers ............................................................... 710
Installing an OCR Sensitive Image Recognition license ...................... 711
Creating an OCR configuration ...................................................... 711
Using the OCR engine ................................................................. 713
More about languages and Dictionaries ........................................... 713
Specialized Dictionaries available for OCR content
extraction ...................................................................... 714
Languages supported for OCR extraction .................................. 714
Viewing OCR incidents in reports ................................................... 715
Advanced Server settings and Troubleshooting for Sensitive Image
Recognition content extraction ................................................. 715
Chapter 31 Detecting content using data identifiers ...................... 717

Introducing data identifiers ............................................................ 717
System-defined data identifiers ............................................... 718
Extending and customizing data identifiers ................................ 731
About data identifier configuration ............................................ 731
About data identifier breadths ................................................. 731
About optional validators for data identifiers ............................... 732
About data identifier patterns .................................................. 732
About pattern validators ......................................................... 733
About data normalizers .......................................................... 733
About cross-component matching ............................................ 733
About unique match counting .................................................. 734
Configuring data identifier policy conditions ...................................... 734
Workflow for configuring data identifier policies ........................... 734
Managing and adding data identifiers ....................................... 735
Editing data identifiers ........................................................... 736
Configuring the Content Matches data identifier condition ............. 737
Using data identifier breadths .................................................. 738
Selecting a data identifier breadth ............................................ 739
Using optional validators ........................................................ 762
Configuring optional validators ................................................ 763
Acceptable characters for optional validators .............................. 764
Using unique match counting .................................................. 775
Configuring unique match counting .......................................... 775
Modifying system data identifiers ................................................... 776
Cloning a system data identifier before modifying it ..................... 777
Editing pattern validator input .................................................. 778
List of pattern validators that accept input data ........................... 778
Editing keywords for international PII data identifiers .................... 779
List of keywords for international system data identifiers ............... 780
Updating policies to use the Randomized US SSN data
identifier ....................................................................... 810
Creating custom data identifiers ..................................................... 811
Workflow for creating custom data identifiers .............................. 812
Custom data identifier configuration ......................................... 814
Using the data identifier pattern language .................................. 814
Writing data identifier patterns to match data .............................. 817
Using pattern validators ......................................................... 818
Selecting pattern validators .................................................... 829
Selecting a data normalizer .................................................... 830
Creating custom script validators ............................................. 831
Configuring pre- and post-validators ......................................... 831
Best practices for using data identifiers ........................................... 833
Use data identifiers instead of regular expressions to improve
accuracy ....................................................................... 834
Clone system-defined data identifiers before modifying to preserve
original state .................................................................. 835
Modify data identifier definitions when you want tuning to apply
globally ........................................................................ 835
Consider using multiple breadths in parallel to detect different
severities of confidential data ............................................ 836
Avoid matching on the Envelope over HTTP to reduce false
positives ....................................................................... 836
Use the Randomized US SSN data identifier to detect SSNs ......... 836
Use unique match counting to improve accuracy and ease
remediation ................................................................... 837

Chapter 32 Detecting content using keyword matching ................ 838

Introducing keyword matching ....................................................... 838
About keyword matching for Chinese, Japanese, and Korean
(CJK) languages ............................................................ 839
About keyword proximity ........................................................ 840
Keyword matching syntax ...................................................... 840
Keyword matching examples .................................................. 841
Keyword matching examples for CJK languages ......................... 842
About updates to the Drug, Disease, and Treatment keyword
lists ............................................................................. 843
Configuring keyword matching ...................................................... 844
Configuring the Content Matches Keyword condition ................... 844
Enabling and using CJK token verification for server keyword
matching ...................................................................... 847
Updating the Drug, Disease, and Treatment keyword lists for your
HIPAA and Caldicott policies ............................................. 848
Best practices for using keyword matching ....................................... 849
Enable token verification on the server to reduce false positives
for CJK keyword detection ................................................ 850
Keep the keyword lists for your HIPAA and Caldicott policies up
to date ......................................................................... 850
Tune keywords lists for data identifiers to improve match
accuracy ....................................................................... 851
Use keyword matching to detect document metadata ................... 851
Use VML to generate and maintain large keyword
dictionaries ................................................................... 851

Chapter 33 Detecting content using regular expressions .............. 852

Introducing regular expression matching ......................................... 852
About the updated regular expression engine ................................... 853
About writing regular expressions ................................................... 853
Configuring the Content Matches Regular Expression condition ........... 854
Best practices for using regular expression matching ......................... 855
When to use regular expression matching ................................. 856
Use look ahead and look behind characters to improve regular
expression accuracy ....................................................... 856
Use regular expressions sparingly to support efficient
performance .................................................................. 857
Test regular expressions before deployment to improve
accuracy ....................................................................... 857

Chapter 34 Detecting content using classification
matching ....................................................................... 858

Introducing classification matching ................................................. 858
Supported file types .................................................................... 859
How tag matching works .............................................................. 860
Configuring the Content Matches Classification condition .................... 863

Chapter 35 Detecting international language content ................... 866

Detecting non-English language content .......................................... 866
Best practices for detecting non-English language content .................. 867
Use international policy templates for policy creation ................... 867
Use custom keywords for system data identifiers ........................ 869
Enable token validation to match Chinese, Japanese, and Korean
keywords on the server .................................................... 899

Chapter 36 Detecting file properties .................................................. 900

Introducing file property detection ................................................... 900
About file type matching ......................................................... 900
About file format support for file type matching ........................... 901
About custom file type identification .......................................... 901
About file size matching ......................................................... 902
About file name matching ....................................................... 903
Configuring file property matching .................................................. 903
Configuring the Message Attachment or File Type Match
condition ....................................................................... 904
Configuring the Message Attachment or File Size Match
condition ....................................................................... 905
Configuring the Message Attachment or File Name Match
condition ....................................................................... 906
File name matching syntax ..................................................... 907
File name matching examples ................................................. 907
Enabling the Custom File Type Signature condition in the policy
console ........................................................................ 908
Configuring the Custom File Type Signature condition .................. 908
Best practices for using file property matching .................................. 909
Use compound file property rules to protect design and multimedia
files ............................................................................. 909
Do not use file type matching to detect content ........................... 910
Calculate file size properly to improve match accuracy ................. 910
Use expression patterns to match file names ............................. 910
Use scripts and plugins to detect custom file types ...................... 910

Chapter 37 Detecting network incidents ........................................... 912

Introducing protocol monitoring for network ...................................... 912
Configuring the Protocol Monitoring condition for network
detection ............................................................................. 913
Best practices for using network protocol matching ............................ 914
Use separate policies for specific protocols ................................ 914
Consider detection server network placement to support IP
address matching ........................................................... 914

Chapter 38 Detecting endpoint events .............................................. 915

Introducing endpoint event detection .............................................. 915
About endpoint protocol monitoring .......................................... 915
About endpoint destination monitoring ...................................... 916
About endpoint global application monitoring .............................. 916
About endpoint location detection ............................................ 917
About endpoint device detection .............................................. 917
Configuring endpoint event detection conditions ................................ 917
Configuring the Endpoint Monitoring condition ............................ 918
Configuring the Endpoint Location condition ............................... 919
Configuring the Endpoint Device Class or ID condition ................. 920
Gathering endpoint device IDs for removable devices .................. 921
Creating and modifying endpoint device configurations ................ 922
Best practices for using endpoint detection ...................................... 923

Chapter 39 Detecting described identities ........................................ 925

Introducing described identity matching ........................................... 925
Described identity matching examples ............................................ 925
Configuring described identity matching policy conditions .................... 926
About Reusable Sender/Recipient Patterns ............................... 927
Configuring the Sender/User Matches Pattern condition ............... 927
Configuring a Reusable Sender Pattern .................................... 929
Configuring the Recipient Matches Pattern condition ................... 930
Configuring a Reusable Recipient Pattern ................................. 931
Best practices for using described identity matching ........................... 932
Define precise identity patterns to match users ........................... 932
Specify email addresses exactly to improve accuracy .................. 933
Match domains instead of IP addresses to improve
accuracy ....................................................................... 933

Chapter 40 Detecting synchronized identities ................................. 935

Introducing synchronized Directory Group Matching (DGM) ................. 935
About two-tier detection for synchronized DGM ................................. 936
Configuring User Groups .............................................................. 936
Configuring synchronized DGM policy conditions .............................. 938
Configuring the Sender/User based on a Directory Server Group
condition ....................................................................... 939
Configuring the Recipient based on a Directory Server Group
condition ....................................................................... 940
Best practices for using synchronized DGM ..................................... 941
Refresh the directory on initial save of the User Group ................. 941
Distinguish synchronized DGM from other types of endpoint
detection ...................................................................... 941

Chapter 41 Detecting profiled identities ........................................... 942

Introducing profiled Directory Group Matching (DGM) ......................... 942
About two-tier detection for profiled DGM ......................................... 942
Configuring Exact Data profiles for DGM ......................................... 943
Configuring profiled DGM policy conditions ...................................... 944
Configuring the Sender/User based on a Profiled Directory
condition ....................................................................... 944
Configuring the Recipient based on a Profiled Directory
condition ....................................................................... 945
Best practices for using profiled DGM ............................................. 946
Follow EDM best practices when implementing profiled
DGM ............................................................................ 946
Include an email address field in the Exact Data Profile for profiled
DGM ............................................................................ 946
Use profiled DGM for Network Prevent for Web identity
detection ...................................................................... 947

Chapter 42 Using contextual attributes for Application
Detection ....................................................................... 948

Introducing contextual attributes for cloud applications ....................... 948
Configuring contextual attribute conditions ....................................... 948
Contextual attribute categories ................................................ 949

Chapter 43 Supported file formats for detection ............................ 962

Overview of detection file format support ......................................... 962
Supported formats for file type identification ..................................... 964
Supported formats for content extraction ......................................... 980
Supported word-processing formats for content extraction ............ 980
Supported presentation formats for content extraction .................. 982
Supported spreadsheet formats for content extraction .................. 983
Supported text and markup formats for content extraction ............. 984
Supported email formats for content extraction ........................... 985
Supported CAD formats for content extraction ............................ 985
Supported graphics formats for content extraction ....................... 986
Supported database formats for content extraction ...................... 986
Other file formats supported for content extraction ....................... 986
Supported encapsulation formats for subfile extraction ....................... 987
Supported file formats for metadata extraction .................................. 989
About document metadata detection ........................................ 989
Enabling server metadata detection ......................................... 990
Enabling endpoint metadata detection ...................................... 990
Best practices for using metadata detection ............................... 991

Chapter 44 Supported Office Open XML formats for
high-performance content extraction ..................... 996

About high-performance content extraction for Office Open XML
formats ............................................................................... 996
Enabling high-performance content extraction for Office Open XML
files .................................................................................... 998
About metadata extraction for Office Open XML files .......................... 999
About subfile extraction for Office Open XML files ............................ 1000

Chapter 45 Library of system data identifiers ................................ 1004

Library of system data identifiers .................................................. 1013
ABA Routing Number ................................................................. 1013
ABA Routing Number wide breadth ........................................ 1013
ABA Routing Number medium breadth .................................... 1014
ABA Routing Number narrow breadth ..................................... 1014
Argentina Tax Identification Number .............................................. 1015
Argentina Tax Identification Number wide breadth ..................... 1016
Argentina Tax Identification Number medium breadth ................. 1016
Argentina Tax Identification Number narrow breadth .................. 1017
Australia Driver's License Number ................................................ 1018
Australia Driver's License Number wide breadth ........................ 1018
Australia Driver's License Number narrow breadth ..................... 1019
Australian Business Number ....................................................... 1020
Australian Business Number wide breadth ............................... 1020
Australian Business Number medium breadth ........................... 1021
Australian Business Number narrow breadth ............................ 1021
Australian Company Number ....................................................... 1022
Australian Company Number wide breadth .............................. 1023
Australian Company Number medium breadth .......................... 1023
Australian Company Number narrow breadth ........................... 1023
Australian Medicare Number ....................................................... 1024
Australian Medicare Number wide breadth ............................... 1024
Australian Medicare Number medium breadth .......................... 1025
Australian Medicare Number narrow breadth ............................ 1026
Australian Passport Number ........................................................ 1027
Australian Passport Number wide breadth ............................... 1027
Australian Passport Number narrow breadth ............................ 1028
Australian Tax File Number ......................................................... 1029
Australian Tax File Number wide breadth ................................. 1029
Australian Tax File Number narrow breadth .............................. 1029
Austria Passport Number ............................................................ 1030
Austria Passport Number wide breadth ................................... 1030
Austria Passport Number narrow breadth ................................ 1031
Austria Tax Identification Number ................................................. 1031
Austria Tax Identification Number wide breadth ......................... 1032
Austria Tax Identification Number narrow breadth ...................... 1032
Austria Value Added Tax (VAT) Number ......................................... 1033
Austria Value Added Tax (VAT) Number wide breadth ................. 1033
Austria Value Added Tax (VAT) Number medium breadth ............ 1034
Austria Value Added Tax (VAT) Number narrow breadth .............. 1035
Austrian Social Security Number .................................................. 1036
Austrian Social Security Number wide breadth .......................... 1036
Austrian Social Security Number medium breadth ..................... 1037
Austrian Social Security Number narrow breadth ....................... 1037
Belgian National Number ............................................................ 1039
Belgian National Number wide breadth ................................... 1040
Belgian National Number medium breadth ............................... 1040
Belgian National Number narrow breadth ................................. 1041
Belgium Driver's Licence Number ................................................. 1042
Belgium Driver's Licence Number wide breadth ........................ 1042
Belgium Driver's Licence Number narrow breadth ..................... 1043
Belgium Passport Number .......................................................... 1044
Belgium Passport Number wide breadth .................................. 1044
Belgium Passport Number narrow breadth ............................... 1044
Belgium Tax Identification Number ................................................ 1045
Belgium Tax Identification Number wide breadth ....................... 1045
Belgium Tax Identification Number narrow breadth .................... 1046
Belgium Value Added Tax (VAT) Number ....................................... 1047
Belgium Value Added Tax (VAT) Number wide breadth ............... 1048
Belgium Value Added Tax (VAT) Number medium breadth ........... 1048
Belgium Value Added Tax (VAT) Number narrow breadth ............ 1049
Brazilian Election Identification Number ......................................... 1049
Brazilian Election Identification Number wide breadth ................. 1050
Brazilian Election Identification Number medium breadth ............ 1051
Brazilian Election Identification Number narrow breadth .............. 1052
Brazilian National Registry of Legal Entities Number ........................ 1053
Brazilian National Registry of Legal Entities Number wide
breadth ....................................................................... 1054
Brazilian National Registry of Legal Entities Number medium
breadth ....................................................................... 1054
Brazilian National Registry of Legal Entities Number narrow
breadth ....................................................................... 1055
Brazilian Natural Person Registry Number (CPF) ............................. 1055
Brazilian Natural Person Registry Number wide breadth ............. 1056
Brazilian Natural Person Registry Number medium breadth ......... 1056
Brazilian Natural Person Registry Number narrow breadth ......... 1057
British Columbia Personal Healthcare Number ................................ 1058
British Columbia Personal Healthcare Number wide breadth .... 1058
British Columbia Personal Healthcare Number medium
breadth ....................................................................... 1058
British Columbia Personal Healthcare Number narrow
breadth ....................................................................... 1059
Bulgaria Value Added Tax (VAT) Number ....................................... 1060
Bulgaria Value Added Tax (VAT) Number wide breadth ............... 1061
Bulgaria Value Added Tax (VAT) Number medium breadth .......... 1061
Bulgaria Value Added Tax (VAT) Number narrow breadth ............ 1062
Bulgarian Uniform Civil Number - EGN .......................................... 1063
Bulgarian Uniform Civil Number - EGN wide breadth .................. 1063
Bulgarian Uniform Civil Number - EGN medium breadth ............. 1064
Bulgarian Uniform Civil Number - EGN narrow breadth ............... 1065
Burgerservicenummer ................................................................ 1066
Burgerservicenummer wide breadth ....................................... 1066
Burgerservicenummer narrow breadth .................................... 1066
Canada Driver's License Number ................................................. 1067
Canada Driver's License Number wide breadth ......................... 1067
Canada Driver's License Number medium breadth .................... 1068
Canada Driver's License Number narrow breadth ...................... 1069
Canada Passport Number .......................................................... 1070
Canada Passport Number wide breadth .................................. 1071
Canada Passport Number narrow breadth ............................... 1071
Canada Permanent Residence (PR) Number .................................. 1072
Canada Permanent Residence (PR) Number wide breadth ......... 1072
Canada Permanent Residence (PR) Number narrow
breadth ....................................................................... 1073
Canadian Social Insurance Number .............................................. 1074
Canadian Social Insurance Number wide breadth ...................... 1075
Canadian Social Insurance Number medium breadth ................. 1075
Canadian Social Insurance Number narrow breadth ................... 1076
Chilean National Identification Number .......................................... 1077
Chilean National Identification Number wide breadth .................. 1077
Chilean National Identification Number medium breadth ............. 1078
Chilean National Identification Number narrow breadth ............... 1078
China Passport Number ............................................................. 1079
China Passport Number wide breadth ..................................... 1080
China Passport Number narrow breadth .................................. 1080
Codice Fiscale .......................................................................... 1081
Codice Fiscale wide breadth ................................................. 1081
Codice Fiscale narrow breadth .............................................. 1082
Colombian Addresses ................................................................ 1082
Colombian Addresses wide breadth ........................................ 1083
Colombian Addresses narrow breadth ..................................... 1084
Colombian Cell Phone Number .................................................... 1085
Colombian Cell Phone Number wide breadth ............................ 1085
Colombian Cell Phone Number narrow breadth ......................... 1086
Colombian Personal Identification Number ..................................... 1088
Colombian Personal Identification Number wide breadth ............. 1088
Colombian Personal Identification Number narrow breadth .......... 1089
Colombian Tax Identification Number ............................................ 1090
Colombian Tax Identification Number wide breadth .................... 1090
Colombian Tax Identification Number narrow breadth ................. 1091
Credit Card Magnetic Stripe Data ................................................. 1092
Credit Card Number .................................................................. 1095
Credit Card Number wide breadth .......................................... 1095
Credit Card Number medium breadth ...................................... 1096
Credit Card Number narrow breadth ....................................... 1100
Croatia National Identification Number ........................................... 1104
Croatia National Identification Number wide breadth .................. 1105
Croatia National Identification Number medium breadth .............. 1105
Croatia National Identification Number narrow breadth ............... 1105
CUSIP Number ......................................................................... 1106
CUSIP Number wide breadth ................................................ 1107
CUSIP Number medium breadth ............................................ 1107
CUSIP Number narrow breadth ............................................. 1108
Cyprus Tax Identification Number ................................................. 1109
Cyprus Tax Identification Number wide breadth ......................... 1109
Cyprus Tax Identification Number medium breadth .................... 1109
Cyprus Tax Identification Number narrow breadth ...................... 1110
Cyprus Value Added Tax (VAT) Number ......................................... 1111
Cyprus Value Added Tax (VAT) Number wide breadth ................ 1111
Cyprus Value Added Tax (VAT) Number medium breadth ............ 1111
Cyprus Value Added Tax (VAT) Number narrow breadth ............. 1112
Czech Republic Driver's Licence Number ....................................... 1112
Czech Republic Driver's Licence Number wide breadth .............. 1113
Czech Republic Driver's Licence Number narrow breadth ........... 1113
Czech Republic Personal Identification Number .............................. 1114
Czech Republic Personal Identification Number wide
breadth ....................................................................... 1115
Czech Republic Personal Identification Number medium
breadth ....................................................................... 1115
Czech Republic Personal Identification Number narrow
breadth ....................................................................... 1116
Czech Republic Tax Identification Number ...................................... 1117
Czech Republic Tax Identification Number wide breadth ............. 1118
Czech Republic Tax Identification Number medium breadth ......... 1119
Czech Republic Tax Identification Number narrow breadth .......... 1120
Czech Republic Value Added Tax (VAT) Number ............................. 1121
Czech Republic Value Added Tax (VAT) Number wide
breadth ....................................................................... 1122
Czech Republic Value Added Tax (VAT) Number medium
breadth ....................................................................... 1123
Czech Republic Value Added Tax (VAT) Number narrow
breadth ....................................................................... 1124
Denmark Personal Identification Number ....................................... 1126
Denmark Personal Identification Number wide breadth ............... 1126
Denmark Personal Identification Number medium breadth .......... 1126
Denmark Personal Identification Number narrow breadth ............ 1127
Denmark Tax Identification Number .............................................. 1128
Denmark Tax Identification Number wide breadth ...................... 1128
Denmark Tax Identification Number medium breadth .................. 1129
Denmark Tax Identification Number narrow breadth ................... 1129
Denmark Value Added Tax (VAT) Number ...................................... 1130
Denmark Value Added Tax (VAT) Number wide breadth .............. 1131
Denmark Value Added Tax (VAT) Number medium breadth ......... 1131
Denmark Value Added Tax (VAT) Number narrow breadth ........... 1132
Driver's License Number - CA State ............................................ 1133
Driver's License Number - CA State wide breadth ..................... 1133
Driver's License Number - CA State medium breadth ................ 1134
Driver's License Number - FL, MI, MN States .................................. 1134
Driver's License Number - FL, MI, MN States wide breadth .......... 1135
Driver's License Number - FL, MI, MN States medium
breadth ....................................................................... 1135
Driver's License Number - IL State ............................................... 1136
Driver's License Number - IL State wide breadth ........................ 1136
Driver's License Number - IL State medium breadth .................... 1137
Driver's License Number - NJ State .............................................. 1138
Driver's License Number - NJ State wide breadth ....................... 1138
Driver's License Number - NJ State medium breadth .................. 1138
Driver's License Number - NY State .............................................. 1139
Driver's License Number - NY State wide breadth ...................... 1139
Driver's License Number - NY State medium breadth ................. 1140
Driver's License Number - WA State ............................................. 1140
Driver's License Number - WA State wide breadth ..................... 1141
Driver's License Number - WA State medium breadth ................ 1141
Driver's License Number - WA State narrow breadth .................. 1142
Driver's License Number - WI State .............................................. 1142
Driver's License Number - WI State wide breadth ...................... 1143
Driver's License Number - WI State medium breadth .................. 1143
Driver's License Number - WI State narrow breadth ................... 1144
Drug Enforcement Agency (DEA) Number ...................................... 1145
Drug Enforcement Agency (DEA) Number wide breadth ............. 1145
Drug Enforcement Agency (DEA) Number medium breadth ......... 1146
Drug Enforcement Agency (DEA) Number narrow breadth .......... 1146
Estonia Driver's Licence Number .................................................. 1147
Estonia Driver's Licence Number .................................................. 1147
Estonia Driver's Licence Number wide breadth ......................... 1147
Estonia Driver's Licence Number narrow breadth ...................... 1148
Estonia Passport Number ........................................................... 1149
Estonia Passport Number wide breadth ................................... 1149
Estonia Passport Number narrow breadth ................................ 1150
Estonia Personal Identification Code ............................................. 1151
Estonia Personal Identification Code wide breadth ..................... 1152
Estonia Personal Identification Code medium breadth ................ 1152
Estonia Personal Identification Code narrow breadth .................. 1153
Estonia Value Added Tax (VAT) Number ........................................ 1153
Estonia Value Added Tax (VAT) Number wide breadth ................ 1154
Estonia Value Added Tax (VAT) Number medium breadth ........... 1154
Estonia Value Added Tax (VAT) Number narrow breadth ............. 1155
European Health Insurance Card Number ...................................... 1156
European Health Insurance Card Number wide breadth .............. 1156
European Health Insurance Card Number narrow breadth ........... 1160
Finland Driver's Licence Number .................................................. 1165
Finland Driver's Licence Number wide breadth ......................... 1166
Finland Driver's Licence Number medium breadth ..................... 1166
Finland Driver's Licence Number narrow breadth ...................... 1166
Finland European Health Insurance Number .................................. 1167
Finland European Health Insurance Number wide breadth .......... 1168
Finland European Health Insurance Number narrow breadth ....... 1168
Finland Passport Number ........................................................... 1169
Finland Passport Number wide breadth ................................... 1170
Finland Passport Number narrow breadth ................................ 1170
Finland Tax Identification Number ................................................. 1171
Finland Tax Identification Number wide breadth ........................ 1171
Finland Tax Identification Number medium breadth .................... 1172
Finland Tax Identification Number narrow breadth ..................... 1172
Finland Value Added Tax (VAT) Number ........................................ 1173
Finland Value Added Tax (VAT) Number wide breadth ................ 1173
Finland Value Added Tax (VAT) Number medium breadth ............ 1174
Finland Value Added Tax (VAT) Number narrow breadth ............. 1175
Finnish Personal Identification Number .......................................... 1175
Finnish Personal Identification Number wide breadth ................. 1176
Finnish Personal Identification Number medium breadth ............. 1176
Finnish Personal Identification Number narrow breadth .............. 1176
France Driver's License Number .................................................. 1177
France Driver's License Number wide breadth .......................... 1178
France Driver's License Number narrow breadth ....................... 1178
France Health Insurance Number ................................................. 1179
France Health Insurance Number wide breadth ......................... 1179
France Health Insurance Number narrow breadth ...................... 1180
France Tax Identification Number ................................................. 1181
France Tax Identification Number wide breadth ......................... 1181
France Tax Identification Number narrow breadth ...................... 1181
France Value Added Tax (VAT) Number ......................................... 1182
France Value Added Tax (VAT) Number wide breadth ................. 1182
France Value Added Tax (VAT) Number medium breadth ............ 1183
France Value Added Tax (VAT) Number narrow breadth .............. 1184
French INSEE Code .................................................................. 1185
French INSEE Code wide breadth .......................................... 1185
French INSEE Code narrow breadth ....................................... 1186
French Passport Number ............................................................ 1187
French Passport Number wide breadth ................................... 1187
French Passport Number narrow breadth ................................ 1187
French Social Security Number .................................................... 1188
French Social Security Number wide breadth ........................... 1188
French Social Security Number medium breadth ....................... 1189
French Social Security Number narrow breadth ........................ 1189
German Passport Number .......................................................... 1190
German Passport Number wide breadth .................................. 1190
German Passport Number medium breadth ............................. 1191
German Passport Number narrow breadth ............................... 1191
German Personal ID Number ...................................................... 1192
German Personal ID Number wide breadth .............................. 1192
German Personal ID Number medium breadth .......................... 1193
German Personal ID Number narrow breadth ........................... 1193
Germany Driver's License Number ............................................... 1194
Germany Driver's License Number wide breadth ....................... 1194
Germany Driver's License Number narrow breadth .................... 1195
Germany Value Added Tax (VAT) Number ...................................... 1196
Germany Value Added Tax (VAT) Number wide breadth .............. 1196
Germany Value Added Tax (VAT) Number medium breadth ......... 1196
Germany Value Added Tax (VAT) Number narrow breadth ........... 1197
Germany Tax Identification Number .............................................. 1198
Germany Tax Identification Number wide breadth ...................... 1198
Germany Tax Identification Number medium breadth ................. 1199
Germany Tax Identification Number narrow breadth ................... 1199
Greece Passport Number ........................................................... 1200
Greece Passport Number wide breadth ................................... 1201
Greece Passport Number narrow breadth ................................ 1201
Greece Social Security Number (AMKA) ........................................ 1202
Greece Social Security Number (AMKA) wide breadth ................ 1202
Greece Social Security Number (AMKA) medium breadth ........... 1203
Greece Social Security Number (AMKA) narrow breadth ............. 1203
Greek Tax Identification Number .................................................. 1204
Greek Tax Identification Number wide breadth .......................... 1204
Greek Tax Identification Number medium breadth ...................... 1205
Greek Tax Identification Number narrow breadth ....................... 1205
Greece Value Added Tax (VAT) Number ........................................ 1206
Greece Value Added Tax (VAT) Number wide breadth ................ 1207
Greece Value Added Tax (VAT) Number medium breadth ............ 1207
Greece Value Added Tax (VAT) Number narrow breadth ............. 1208
Healthcare Common Procedure Coding System (HCPCS CPT
Code) ............................................................................... 1208
Healthcare Common Procedure Coding System (HCPCS CPT
Code) medium breadth .................................................. 1209
Healthcare Common Procedure Coding System (HCPCS CPT
Code) narrow breadth .................................................... 1210
Health Insurance Claim Number ................................................... 1212
Health Insurance Claim Number wide breadth .......................... 1212
Health Insurance Claim Number medium breadth ...................... 1213
Health Insurance Claim Number narrow breadth ....................... 1214
Hong Kong ID .......................................................................... 1215
Hong Kong ID wide breadth .................................................. 1216
Hong Kong ID narrow breadth ............................................... 1216
Hungary Driver's Licence Number ................................................ 1217
Hungary Driver's Licence Number wide breadth ........................ 1218
Hungary Driver's Licence Number narrow breadth ..................... 1218
Hungary Passport Number .......................................................... 1219
Hungary Passport Number wide breadth ................................. 1220
Hungary Passport Number medium breadth ............................. 1220
Hungary Passport Number narrow breadth .............................. 1220
Hungarian Social Security Number ............................................... 1221
Hungarian Social Security Number wide breadth ....................... 1222
Hungarian Social Security Number medium breadth .................. 1222
Hungarian Social Security Number narrow breadth .................... 1222
Hungarian Tax Identification Number ............................................. 1223
Hungarian Tax Identification Number wide breadth .................... 1224
Hungarian Tax Identification Number medium breadth ................ 1224
Hungarian Tax Identification Number narrow breadth ................. 1224
Hungarian VAT Number .............................................................. 1225
Hungarian VAT Number wide breadth ..................................... 1226
Hungarian VAT Number medium breadth ................................. 1226
Hungarian VAT Number narrow breadth .................................. 1226
IBAN Central ............................................................................ 1227
IBAN Central wide breadth ................................................... 1228
IBAN Central narrow breadth ................................................ 1229
IBAN East ............................................................................... 1231
IBAN East wide breadth ....................................................... 1232
IBAN East narrow breadth .................................................... 1234
IBAN West ............................................................................... 1237
IBAN West wide breadth ...................................................... 1237
IBAN West narrow breadth ................................................... 1239
Iceland National Identification Number ........................................... 1241
Iceland National Identification Number wide breadth .................. 1242
Iceland National Identification Number medium breadth .............. 1243
Iceland National Identification Number narrow breadth ............... 1244
Iceland Passport Number ........................................................... 1245
Iceland Passport Number wide breadth ................................... 1245
Iceland Passport Number narrow breadth ................................ 1246
Iceland Value Added Tax (VAT) Number ......................................... 1247
Iceland Value Added Tax (VAT) Number wide breadth ................ 1247
Iceland Value Added Tax (VAT) Number narrow breadth ............. 1248
Indian Aadhaar Card Number ...................................................... 1249
Indian Aadhaar Card Number wide breadth .............................. 1249
Indian Aadhaar Card Number medium breadth ......................... 1249
Indian Aadhaar Card Number narrow breadth ........................... 1250
Indian Permanent Account Number .............................................. 1251
Indian Permanent Account Number wide breadth ...................... 1251
Indian Permanent Account Number narrow breadth ................... 1252
India RuPay Card Number .......................................................... 1252
India RuPay Card Number wide breadth .................................. 1253
India RuPay Card Number medium breadth ............................. 1253
India RuPay Card Number narrow breadth ............................... 1254
Indonesian Identity Card Number ................................................. 1255
Indonesian Identity Card Number wide breadth ......................... 1256
Indonesian Identity Card Number medium breadth .................... 1256
Indonesian Identity Card Number narrow breadth ...................... 1256
International Mobile Equipment Identity Number .............................. 1257
International Mobile Equipment Identity Number wide
breadth ....................................................................... 1258
International Mobile Equipment Identity Number medium
breadth ....................................................................... 1258
International Mobile Equipment Identity Number narrow
breadth ....................................................................... 1259
International Securities Identification Number .................................. 1259
International Securities Identification Number wide breadth ......... 1260
International Securities Identification Number medium
breadth ....................................................................... 1260
International Securities Identification Number narrow
breadth ....................................................................... 1260
IP Address ............................................................................... 1261
IP Address wide breadth ...................................................... 1261
IP Address medium breadth .................................................. 1262
IP Address narrow breadth ................................................... 1263
IPv6 Address ........................................................................... 1263
IPv6 Address wide breadth ................................................... 1264
IPv6 Address medium breadth ............................................... 1264
IPv6 Address narrow breadth ................................................ 1265
Ireland Passport Number ............................................................ 1266
Ireland Passport Number wide breadth .................................... 1266
Ireland Passport Number narrow breadth ................................. 1267
Ireland Tax Identification Number ................................................. 1268
Ireland Tax Identification Number wide breadth ......................... 1268
Ireland Tax Identification Number medium breadth ..................... 1269
Ireland Tax Identification Number narrow breadth ...................... 1270
Ireland Value Added Tax (VAT) Number ......................................... 1271
Ireland Value Added Tax (VAT) Number wide breadth ................. 1272
Ireland Value Added Tax (VAT) Number medium breadth ............ 1273
Ireland Value Added Tax (VAT) Number narrow breadth .............. 1273
Irish Personal Public Service Number ............................................ 1274
Irish Personal Public Service Number wide breadth ................... 1275
Irish Personal Public Service Number medium breadth ............... 1275
Irish Personal Public Service Number narrow breadth ................ 1276
Israel Personal Identification Number ............................................ 1276
Israel Personal Identification Number wide breadth .................... 1277
Israel Personal Identification Number medium breadth ............... 1277
Israel Personal Identification Number narrow breadth ................. 1277
Italy Driver's Licence Number ...................................................... 1278
Italy Driver's Licence Number wide breadth .............................. 1279
Italy Driver's Licence Number narrow breadth ........................... 1279
Italy Health Insurance Number ..................................................... 1280
Italy Health Insurance Number wide breadth ............................ 1280
Italy Health Insurance Number narrow breadth ......................... 1281
Italy Passport Number ................................................................ 1282
Italy Passport Number wide breadth ....................................... 1282
Italy Passport Number narrow breadth .................................... 1282
Italy Value Added Tax (VAT) Number ............................................. 1283
Italy Value Added Tax (VAT) Number wide breadth .................... 1283
Italy Value Added Tax (VAT) Number medium breadth ................ 1284
Italy Value Added Tax (VAT) Number narrow breadth ................. 1285
Japan Driver's License Number ................................................... 1285
Japan Driver's License Number wide breadth ........................... 1286
Japan Driver's License Number medium breadth ....................... 1286
Japan Driver's License Number narrow breadth ........................ 1286
Japan Passport Number ............................................................. 1287
Japan Passport Number wide breadth ..................................... 1287
Japan Passport Number narrow breadth .................................. 1288
Japanese Juki-Net Identification Number ....................................... 1289
Japanese Juki-Net Identification Number wide breadth ............... 1289
Japanese Juki-Net Identification Number medium breadth .......... 1290
Japanese Juki-Net Identification Number narrow breadth ............ 1290
Japanese My Number - Corporate ................................................ 1291
Japanese My Number - Corporate wide breadth ........................ 1291
Japanese My Number - Corporate narrow breadth ..................... 1292
Japanese My Number - Personal ................................................. 1292
Japanese My Number - Personal wide breadth ......................... 1293
Japanese My Number - Personal medium breadth ..................... 1293
Japanese My Number - Personal narrow breadth ...................... 1294
Kazakhstan Passport Number ..................................................... 1295
Kazakhstan Passport Number wide breadth ............................. 1295
Kazakhstan Passport Number narrow breadth .......................... 1296
Korea Passport Number ............................................................. 1296
Korea Passport Number wide breadth ..................................... 1297
Korea Passport Number narrow breadth .................................. 1297
Korea Residence Registration Number for Foreigners ...................... 1298
Korea Residence Registration Number for Foreigners wide
breadth ....................................................................... 1298
Korea Residence Registration Number for Foreigners medium
breadth ....................................................................... 1299
Korea Residence Registration Number for Foreigners narrow
breadth ....................................................................... 1299
Korea Residence Registration Number for Korean ........................... 1300
Korea Residence Registration Number for Korean wide
breadth ....................................................................... 1301
Korea Residence Registration Number for Korean medium
breadth ....................................................................... 1301
Korea Residence Registration Number for Korean narrow
breadth ....................................................................... 1302
Latvia Driver's Licence Number .................................................... 1303
Latvia Driver's Licence Number wide breadth ........................... 1303
Latvia Driver's Licence Number narrow breadth ........................ 1304
Latvia Passport Number ............................................................. 1305
Latvia Passport Number wide breadth ..................................... 1305
Latvia Passport Number narrow breadth .................................. 1305
Latvia Personal Identification Number ........................................... 1306
Latvia Personal Identification Number wide breadth ................... 1307
Latvia Personal Identification Number medium breadth ............... 1307
Latvia Personal Identification Number narrow breadth ................ 1307
Latvia Value Added Tax (VAT) Number .......................................... 1308
Latvia Value Added Tax (VAT) Number wide breadth .................. 1309
Latvia Value Added Tax (VAT) Number medium breadth ............. 1309
Latvia Value Added Tax (VAT) Number narrow breadth ............... 1310
Liechtenstein Passport Number ................................................... 1311
Liechtenstein Passport Number wide breadth ........................... 1311
Liechtenstein Passport Number narrow breadth ........................ 1312
Lithuania Personal Identification Number ....................................... 1312
Lithuania Personal Identification Number wide breadth ............... 1313
Lithuania Personal Identification Number medium breadth .......... 1314
Lithuania Personal Identification Number narrow breadth ............ 1314
Lithuania Tax Identification Number .............................................. 1315
Lithuania Tax Identification Number wide breadth ...................... 1315
Lithuania Tax Identification Number medium breadth .................. 1316
Lithuania Tax Identification Number narrow breadth ................... 1316
Lithuania Value Added Tax (VAT) Number ...................................... 1317
Lithuania Value Added Tax (VAT) Number wide breadth .............. 1318
Lithuania Value Added Tax (VAT) Number medium breadth ......... 1318
Lithuania Value Added Tax (VAT) Number narrow breadth ........... 1319
Luxembourg National Register of Individuals Number ....................... 1320
Luxembourg National Register of Individuals Number wide
breadth ....................................................................... 1320
Luxembourg National Register of Individuals Number medium
breadth ....................................................................... 1321
Luxembourg National Register of Individuals Number narrow
breadth ....................................................................... 1321
Luxembourg Passport Number .................................................... 1322
Luxembourg Passport Number wide breadth ............................ 1322
Luxembourg Passport Number narrow breadth ......................... 1323
Luxembourg Tax Identification Number .......................................... 1324
Luxembourg Tax Identification Number wide breadth .................. 1324
Luxembourg Tax Identification Number medium breadth ............. 1325
Luxembourg Tax Identification Number narrow breadth ............... 1326
Luxembourg Value Added Tax (VAT) Number .................................. 1327
Luxembourg Value Added Tax (VAT) Number wide breadth ......... 1328
Luxembourg Value Added Tax (VAT) Number medium
breadth ....................................................................... 1329
Luxembourg Value Added Tax (VAT) Number narrow
breadth ....................................................................... 1329
Macau National Identification Number ........................................... 1331
Macau National Identification Number wide breadth ................... 1331
Macau National Identification Number narrow breadth ................ 1332
Malaysia Passport Number ......................................................... 1333
Malaysia Passport Number wide breadth ................................. 1333
Malaysia Passport Number narrow breadth .............................. 1334
Malaysian MyKad Number (MyKad) .............................................. 1335
Malaysian MyKad Number (MyKad) wide breadth ...................... 1335
Malaysian MyKad Number (MyKad) medium breadth ................. 1336
Malaysian MyKad Number (MyKad) narrow breadth ................... 1336
Malta National Identification Number ............................................. 1337
Malta National Identification Number wide breadth ..................... 1338
Malta National Identification Number narrow breadth .................. 1338
Malta Tax Identification Number ................................................... 1339
Malta Tax Identification Number wide breadth ........................... 1339
Malta Tax Identification Number narrow breadth ........................ 1340
Malta Value Added Tax (VAT) Number ........................................... 1342
Malta Value Added Tax (VAT) Number wide breadth ................... 1342
Malta Value Added Tax (VAT) Number medium breadth .............. 1343
Malta Value Added Tax (VAT) Number narrow breadth ................ 1343
Medicare Beneficiary Identifier ..................................................... 1344
Medicare Beneficiary Identifier wide breadth ............................. 1345
Medicare Beneficiary Identifier medium breadth ........................ 1345
Medicare Beneficiary Identifier narrow breadth .......................... 1345
Mexican Personal Registration and Identification Number .................. 1346
Mexican Personal Registration and Identification Number wide
breadth ....................................................................... 1347
Mexican Personal Registration and Identification Number medium
breadth ....................................................................... 1347
Mexican Personal Registration and Identification Number narrow
breadth ....................................................................... 1348
Mexican Tax Identification Number ............................................... 1349
Mexican Tax Identification Number wide breadth ....................... 1349
Mexican Tax Identification Number medium breadth ................... 1350
Mexican Tax Identification Number narrow breadth .................... 1350
Mexican Unique Population Registry Code ..................................... 1351
Mexican Unique Population Registry Code wide breadth ............. 1352
Mexican Unique Population Registry Code medium breadth .... 1352
Mexican Unique Population Registry Code narrow breadth .......... 1352
Mexico CLABE Number .............................................................. 1353
Mexico CLABE Number wide breadth ..................................... 1353
Mexico CLABE Number medium breadth ................................. 1354
Mexico CLABE Number narrow breadth .................................. 1354
National Drug Code (NDC) .......................................................... 1355
National Drug Code (NDC) wide breadth ................................. 1355
National Drug Code (NDC) medium breadth ............................. 1356
National Drug Code (NDC) narrow breadth .............................. 1356
National Provider Identifier Number .............................................. 1357
National Provider Identifier Number wide breadth ...................... 1357
National Provider Identifier Number medium breadth .................. 1358
National Provider Identifier Number narrow breadth ................... 1358
Netherlands Bank Account Number .............................................. 1359
Netherlands Bank Account Number wide breadth ...................... 1360
Netherlands Bank Account Number medium breadth ................. 1360
Netherlands Bank Account Number narrow breadth ................... 1361
Netherlands Driver's License Number ........................................... 1362
Netherlands Driver's License Number wide breadth ................... 1362
Netherlands Driver's License Number narrow breadth ................ 1362
Netherlands Passport Number ..................................................... 1363
Netherlands Passport Number wide breadth ............................. 1363
Netherlands Passport Number narrow breadth .......................... 1364
Netherlands Tax Identification Number .......................................... 1364
Netherlands Tax Identification Number wide breadth .................. 1365
Netherlands Tax Identification Number medium breadth .............. 1365
Netherlands Tax Identification Number narrow breadth ............... 1366
Netherlands Value Added Tax (VAT) Number .................................. 1367
Netherlands Value Added Tax (VAT) Number wide breadth .......... 1368
Netherlands Value Added Tax (VAT) Number medium
breadth ....................................................................... 1368
Netherlands Value Added Tax (VAT) Number narrow
breadth ....................................................................... 1369
New Zealand Driver's Licence Number .......................................... 1370
New Zealand Driver's Licence Number wide breadth .................. 1370
New Zealand Driver's Licence Number narrow breadth ............... 1370
New Zealand National Health Index Number ................................... 1371
New Zealand National Health Index Number wide breadth .......... 1372
New Zealand National Health Index Number medium
breadth ....................................................................... 1372
New Zealand National Health Index Number narrow breadth ....... 1372
New Zealand Passport Number ................................................... 1373
New Zealand Passport Number wide breadth ........................... 1373
New Zealand Passport Number narrow breadth ........................ 1374
Norway Driver's Licence Number ................................................. 1375
Norway Driver's Licence Number wide breadth ......................... 1376
Norway Driver's Licence Number narrow breadth ...................... 1376
Norway National Identification Number .......................................... 1377
Norway National Identification Number wide breadth .................. 1377
Norway National Identification Number medium breadth ............. 1378
Norway National Identification Number narrow breadth ............... 1379
Norway Value Added Tax Number ................................................ 1379
Norway Value Added Tax Number wide breadth ........................ 1380
Norway Value Added Tax Number medium breadth ................... 1381
Norway Value Added Tax Number narrow breadth ..................... 1381
Norwegian Birth Number ............................................................ 1382
Norwegian Birth Number wide breadth .................................... 1382
Norwegian Birth Number medium breadth ................................ 1383
Norwegian Birth Number narrow breadth ................................. 1383
People's Republic of China ID ..................................................... 1384
People's Republic of China ID wide breadth ............................. 1385
People's Republic of China ID narrow breadth .......................... 1385
Poland Driver's Licence Number .................................................. 1386
Poland Driver's Licence Number wide breadth .......................... 1386
Poland Driver's Licence Number narrow breadth ....................... 1387
Poland European Health Insurance Number ................................... 1387
Poland European Health Insurance Number wide breadth ........... 1388
Poland European Health Insurance Number narrow breadth ........ 1388
Poland Passport Number ............................................................ 1389
Poland Passport Number wide breadth ................................... 1390
Poland Passport Number narrow breadth ................................ 1390
Poland Value Added Tax (VAT) Number ......................................... 1391
Poland Value Added Tax (VAT) Number wide breadth ................. 1392
Poland Value Added Tax (VAT) Number medium breadth ............ 1392
Poland Value Added Tax (VAT) Number narrow breadth .............. 1393
Polish Identification Number ........................................................ 1394
Polish Identification Number wide breadth ................................ 1394
Polish Identification Number medium breadth ........................... 1395
Polish Identification Number narrow breadth ............................. 1395
Polish REGON Number .............................................................. 1396
Polish REGON Number wide breadth ..................................... 1396
Polish REGON Number medium breadth ................................. 1397
Polish REGON Number narrow breadth .................................. 1397
Polish Social Security Number (PESEL) ........................................ 1398
Polish Social Security Number (PESEL) wide breadth ................ 1399
Polish Social Security Number (PESEL) medium breadth ............ 1399
Polish Social Security Number (PESEL) narrow breadth ............. 1399
Polish Tax Identification Number .................................................. 1400
Polish Tax Identification Number wide breadth .......................... 1401
Polish Tax Identification Number medium breadth ...................... 1401
Polish Tax Identification Number narrow breadth ....................... 1401
Portugal Driver's Licence Number ................................................ 1402
Portugal Driver's Licence Number wide breadth ........................ 1403
Portugal Driver's Licence Number narrow breadth ..................... 1403
Portugal National Identification Number ......................................... 1404
Portugal National Identification Number wide breadth ................. 1405
Portugal National Identification Number medium breadth ............ 1405
Portugal National Identification Number narrow breadth .............. 1406
Portugal Passport Number .......................................................... 1407
Portugal Passport Number wide breadth .................................. 1408
Portugal Passport Number narrow breadth ............................... 1408
Portugal Tax Identification Number ............................................... 1408
Portugal Tax Identification Number wide breadth ....................... 1409
Portugal Tax Identification Number medium breadth ................... 1409
Portugal Tax Identification Number narrow breadth .................... 1410
Portugal Value Added Tax (VAT) Number ....................................... 1411
Portugal Value Added Tax (VAT) Number wide breadth ............... 1412
Portugal Value Added Tax (VAT) Number medium breadth .......... 1412
Portugal Value Added Tax (VAT) Number narrow breadth ............ 1413
Randomized US Social Security Number (SSN) .............................. 1414
Randomized US Social Security Number (SSN) medium
breadth ....................................................................... 1415
Randomized US Social Security Number (SSN) narrow
breadth ....................................................................... 1415
Romania Driver's Licence Number ................................................ 1416
Romania Driver's Licence Number wide breadth ....................... 1417
Romania Driver's Licence Number narrow breadth .................... 1418
Romania National Identification Number ........................................ 1419
Romania National Identification Number wide breadth ................ 1419
Romania National Identification Number medium breadth ........... 1419
Romania National Identification Number narrow breadth ............. 1420
Romania Value Added Tax (VAT) Number ...................................... 1420
Romania Value Added Tax (VAT) Number wide breadth .............. 1421
Romania Value Added Tax (VAT) Number medium breadth ......... 1422
Romania Value Added Tax (VAT) Number narrow breadth ........... 1423
Romanian Numerical Personal Code ............................................ 1425
Romanian Numerical Personal Code wide breadth .................... 1425
Romanian Numerical Personal Code medium breadth ................ 1425
Romanian Numerical Personal Code narrow breadth ................. 1426
Russian Passport Identification Number ......................................... 1427
Russian Passport Identification Number wide breadth ................ 1427
Russian Passport Identification Number narrow breadth ............. 1427
Russian Taxpayer Identification Number ........................................ 1428
Russian Taxpayer Identification Number wide breadth ................ 1429
Russian Taxpayer Identification Number medium breadth ........... 1429
Russian Taxpayer Identification Number narrow breadth ............. 1429
SEPA Creditor Identifier Number North .......................................... 1430
SEPA Creditor Identifier Number North wide breadth .................. 1431
SEPA Creditor Identifier Number North medium breadth ............. 1433
SEPA Creditor Identifier Number North narrow breadth ............... 1435
SEPA Creditor Identifier Number South ......................................... 1437
SEPA Creditor Identifier Number South wide breadth ................. 1438
SEPA Creditor Identifier Number South medium breadth ............. 1439
SEPA Creditor Identifier Number South narrow breadth .............. 1439
SEPA Creditor Identifier Number West .......................................... 1441
SEPA Creditor Identifier Number West wide breadth .................. 1442
SEPA Creditor Identifier Number West medium breadth .............. 1443
SEPA Creditor Identifier Number West narrow breadth ............... 1443
Serbia Unique Master Citizen Number ........................................... 1445
Serbia Unique Master Citizen Number wide breadth .................. 1446
Serbia Unique Master Citizen Number medium breadth .............. 1446
Serbia Unique Master Citizen Number narrow breadth ............... 1447
Serbia Value Added Tax (VAT) Number .......................................... 1448
Serbia Value Added Tax (VAT) Number wide breadth ................. 1449
Serbia Value Added Tax (VAT) Number medium breadth ............. 1449
Serbia Value Added Tax (VAT) Number narrow breadth .............. 1450
Singapore NRIC data identifier ..................................................... 1451
Slovakia Driver's Licence Number ................................................ 1451
Slovakia Driver's Licence Number wide breadth ........................ 1452
Slovakia Driver's Licence Number narrow breadth ..................... 1452
Slovakia National Identification Number ......................................... 1453
Slovakia National Identification Number wide breadth ................. 1454
Slovakia National Identification Number medium breadth ............ 1455
Slovakia National Identification Number narrow breadth .............. 1455
Slovakia Passport Number .......................................................... 1457
Slovakia Passport Number wide breadth ................................. 1458
Slovakia Passport Number narrow breadth .............................. 1458
Slovakia Value Added Tax (VAT) Number ....................................... 1459
Slovakia Value Added Tax (VAT) Number wide breadth ............... 1460
Slovakia Value Added Tax (VAT) Number medium breadth .......... 1460
Slovakia Value Added Tax (VAT) Number narrow breadth ............ 1460
Slovenia Passport Number .......................................................... 1461
Slovenia Passport Number wide breadth ................................. 1462
Slovenia Passport Number narrow breadth .............................. 1462
Slovenia Tax Identification Number ............................................... 1463
Slovenia Tax Identification Number wide breadth ....................... 1463
Slovenia Tax Identification Number medium breadth .................. 1464
Slovenia Tax Identification Number narrow breadth .................... 1464
Slovenia Unique Master Citizen Number ........................................ 1465
Slovenia Unique Master Citizen Number wide breadth ................ 1465
Slovenia Unique Master Citizen Number medium breadth ........... 1466
Slovenia Unique Master Citizen Number narrow breadth ............. 1466
Slovenia Value Added Tax (VAT) Number ....................................... 1467
Slovenia Value Added Tax (VAT) Number wide breadth .............. 1468
Slovenia Value Added Tax (VAT) Number medium breadth .......... 1468
Slovenia Value Added Tax (VAT) Number narrow breadth ........... 1469
South African Personal Identification Number ................................. 1469
South African Personal Identification Number wide breadth ......... 1470
South African Personal Identification Number medium breadth ...... 1470
South African Personal Identification Number narrow breadth ....... 1471
South Korea Resident Registration Number .................................... 1471
South Korea Resident Registration Number wide breadth ........... 1472
South Korea Resident Registration Number medium breadth .......... 1472
South Korea Resident Registration Number narrow breadth ........ 1473
Spain Value Added Tax (VAT) Number ........................................... 1474
Spain Value Added Tax (VAT) Number wide breadth .................. 1474
Spain Value Added Tax (VAT) Number medium breadth .............. 1475
Spain Value Added Tax (VAT) Number narrow breadth ............... 1476
Spain Driver's Licence Number .................................................... 1477
Spain Driver's Licence Number wide breadth ............................ 1477
Spain Driver's Licence Number narrow breadth ......................... 1478
Spanish Customer Account Number ............................................. 1479
Spanish Customer Account Number wide breadth ..................... 1480
Spanish Customer Account Number medium breadth ................. 1480
Spanish Customer Account Number narrow breadth .................. 1481
Spanish DNI ID ......................................................................... 1481
Spanish DNI ID wide breadth ................................................ 1482
Spanish DNI ID narrow breadth ............................................. 1482
Spanish Passport Number .......................................................... 1483
Spanish Passport Number wide breadth .................................. 1483
Spanish Passport Number narrow breadth ............................... 1484
Spanish Social Security Number ................................................. 1485
Spanish Social Security Number wide breadth .......................... 1485
Spanish Social Security Number medium breadth ..................... 1486
Spanish Social Security Number narrow breadth ....................... 1486
Spanish Tax Identification (CIF) .................................................... 1487
Spanish Tax Identification (CIF) wide breadth ........................... 1487
Spanish Tax Identification (CIF) medium breadth ....................... 1488
Spanish Tax Identification (CIF) narrow breadth ........................ 1489
Sri Lanka National Identity Number ............................................... 1490
Sri Lanka National Identity Number wide breadth ...................... 1490
Sri Lanka National Identity Number medium breadth .................. 1491
Sri Lanka National Identity Number narrow breadth ................... 1491
Sweden Driver's Licence Number ................................................. 1492
Sweden Driver's Licence Number wide breadth ........................ 1493
Sweden Driver's Licence Number medium breadth .................... 1493
Sweden Driver's Licence Number narrow breadth ..................... 1493
Sweden Tax Identification Number ................................................ 1494
Sweden Tax Identification Number wide breadth ....................... 1495
Sweden Tax Identification Number medium breadth ................... 1495
Sweden Tax Identification Number narrow breadth .................... 1496
Sweden Value Added Tax (VAT) Number ....................................... 1496
Sweden Value Added Tax (VAT) Number wide breadth ............... 1497
Sweden Value Added Tax (VAT) Number medium breadth ........... 1497
Sweden Value Added Tax (VAT) Number narrow breadth ............ 1498
Swedish Passport Number .......................................................... 1499
Swedish Passport Number wide breadth ................................. 1499
Swedish Passport Number narrow breadth .............................. 1500
Sweden Personal Identification Number ......................................... 1501
Sweden Personal Identification Number wide breadth ................ 1501
Sweden Personal Identification Number medium breadth ........... 1502
Sweden Personal Identification Number narrow breadth ............. 1502
SWIFT Code ........................................................................... 1503
SWIFT Code wide breadth .................................................... 1503
SWIFT Code narrow breadth ................................................. 1504
Swiss AHV Number ................................................................... 1505
Swiss AHV Number wide breadth ........................................... 1506
Swiss AHV Number narrow breadth ........................................ 1506
Swiss Social Security Number (AHV) ............................................ 1507
Swiss Social Security Number (AHV) wide breadth .................... 1507
Swiss Social Security Number (AHV) medium breadth ............... 1508
Swiss Social Security Number (AHV) narrow breadth ................. 1508
Switzerland Health Insurance Card Number ................................... 1509
Switzerland Health Insurance Card Number wide breadth ........... 1510
Switzerland Health Insurance Card Number narrow breadth ........ 1510
Switzerland Passport Number ...................................................... 1511
Switzerland Passport Number wide breadth ............................. 1512
Switzerland Passport Number narrow breadth .......................... 1512
Switzerland Value Added Tax (VAT) Number ................................... 1513
Switzerland Value Added Tax (VAT) Number wide breadth .......... 1514
Switzerland Value Added Tax (VAT) Number medium breadth ......... 1514
Switzerland Value Added Tax (VAT) Number narrow breadth .......... 1515
Taiwan ROC ID ......................................................................... 1515
Taiwan ROC ID wide breadth ................................................ 1516
Taiwan ROC ID narrow breadth ............................................. 1516
Thailand Passport Number .......................................................... 1517
Thailand Passport Number wide breadth ................................. 1517
Thailand Passport Number narrow breadth .............................. 1518
Thailand Personal Identification Number ........................................ 1519
Thailand Personal Identification Number wide breadth ................ 1519
Thailand Personal Identification Number medium breadth ........... 1520
Thailand Personal Identification Number narrow breadth ............. 1520
Turkish Identification Number ...................................................... 1521
Turkish Identification Number wide breadth .............................. 1521
Turkish Identification Number medium breadth .......................... 1522
Turkish Identification Number narrow breadth ........................... 1522
UK Bank Account Number Sort Code ............................................ 1523
UK Bank Account Number Sort Code wide breadth .................... 1523
UK Bank Account Number Sort Code medium breadth ............... 1524
UK Bank Account Number Sort Code narrow breadth ................. 1524
UK Drivers Licence Number ........................................................ 1525
UK Drivers Licence Number wide breadth ................................ 1525
UK Drivers Licence Number medium breadth ........................... 1526
UK Drivers Licence Number narrow breadth ............................. 1526
UK Electoral Roll Number ........................................................... 1527
UK National Health Service (NHS) Number .................................... 1528
UK National Health Service (NHS) Number medium breadth ....... 1529
UK National Health Service (NHS) Number narrow breadth ......... 1529
UK National Insurance Number .................................................... 1530
UK National Insurance Number wide breadth ........................... 1531
UK National Insurance Number medium breadth ....................... 1531
UK National Insurance Number narrow breadth ........................ 1531
UK Passport Number ................................................................. 1532
UK Passport Number wide breadth ......................................... 1532
UK Passport Number medium breadth .................................... 1533
UK Passport Number narrow breadth ...................................... 1533
UK Tax ID Number .................................................................... 1534
UK Tax ID Number wide breadth ............................................ 1534
UK Tax ID Number medium breadth ....................................... 1535
UK Tax ID Number narrow breadth ......................................... 1535
UK Value Added Tax (VAT) Number .............................................. 1536
UK Value Added Tax (VAT) Number wide breadth ...................... 1536
UK Value Added Tax (VAT) Number medium breadth ................. 1537
UK Value Added Tax (VAT) Number narrow breadth ................... 1538
Ukraine Identity Card ................................................................. 1539
Ukraine Identity Card wide breadth ......................................... 1540
Ukraine Identity Card medium breadth .................................... 1540
Ukraine Identity Card narrow breadth ...................................... 1541
Ukraine Passport (Domestic) ....................................................... 1541
Ukraine Passport (Domestic) wide breadth ............................... 1542
Ukraine Passport (Domestic) narrow breadth ............................ 1542
Ukraine Passport (International) ................................................... 1543
Ukraine Passport (International) wide breadth ........................... 1543
Ukraine Passport (International) narrow breadth ........................ 1544
United Arab Emirates Personal Number ......................................... 1544
United Arab Emirates Personal Number wide breadth ............... 1545
United Arab Emirates Personal Number medium breadth ............ 1545
United Arab Emirates Personal Number narrow breadth ............. 1545
US Individual Tax Identification Number (ITIN) ................................ 1546
US Individual Tax Identification Number (ITIN) wide breadth ........ 1547
US Individual Tax Identification Number (ITIN) medium breadth ..... 1547
US Individual Tax Identification Number (ITIN) narrow breadth ...... 1548
US Passport Number ................................................................. 1548
US Passport Number wide breadth ......................................... 1549
US Passport Number narrow breadth ...................................... 1549
US Social Security Number (SSN) ................................................ 1550
US Social Security Number (SSN) wide breadth ........................ 1551
US Social Security Number (SSN) medium breadth ................... 1551
US Social Security Number (SSN) narrow breadth ..................... 1552
US ZIP+4 Postal Codes ............................................................. 1553
US ZIP+4 Postal Codes wide breadth ..................................... 1553
US ZIP+4 Postal Codes medium breadth ................................. 1554
US ZIP+4 Postal Codes narrow breadth .................................. 1554
Venezuela National Identification Number ...................................... 1555
Venezuela National Identification Number wide breadth .............. 1556
Venezuela National Identification Number medium breadth ......... 1556
Venezuela National Identification Number narrow breadth ........... 1556

Chapter 46 Library of policy templates ............................................ 1558

Caldicott Report policy template ................................................... 1561
Canadian Social Insurance Numbers policy template ........................ 1562
CAN-SPAM Act policy template .................................................... 1563
Colombian Personal Data Protection Law 1581 policy template .......... 1564
Common Spyware Upload Sites policy template .............................. 1564
Competitor Communications policy template ................................... 1565
Confidential Documents policy template ......................................... 1565
Credit Card Numbers policy template ............................................ 1566
Customer Data Protection policy template ...................................... 1567
Data Protection Act 1998 policy template ....................................... 1568
Data Protection Directives (EU) policy template ............................... 1570
Defense Message System (DMS) GENSER Classification policy template ..... 1572
Design Documents policy template ............................................... 1573
Employee Data Protection policy template ...................................... 1574
Encrypted Data policy template .................................................... 1575
Export Administration Regulations (EAR) policy template .................. 1576
FACTA 2003 (Red Flag Rules) policy template ................................ 1577
Financial Information policy template ............................................. 1581
Forbidden Websites policy template .............................................. 1581
Gambling policy template ............................................................ 1582
General Data Protection Regulation (Banking and Finance) ............... 1583
General Data Protection Regulation (Digital Identity) ........................ 1617
General Data Protection Regulation (Government Identification) ......... 1618
General Data Protection Regulation (Healthcare and Insurance) ......... 1656
General Data Protection Regulation (Personal Profile) ...................... 1672
General Data Protection Regulation (Travel) ................................... 1675
Gramm-Leach-Bliley policy template ............................................. 1688
HIPAA and HITECH (including PHI) policy template ......................... 1690
Human Rights Act 1998 policy template ......................................... 1694
Illegal Drugs policy template ........................................................ 1695
Individual Taxpayer Identification Numbers (ITIN) policy template ........ 1695
International Traffic in Arms Regulations (ITAR) policy template .......... 1696
Media Files policy template ......................................................... 1697
Medicare and Medicaid (including PHI) .......................................... 1698
Merger and Acquisition Agreements policy template ......................... 1699
NASD Rule 2711 and NYSE Rules 351 and 472 policy template ......... 1700
NASD Rule 3010 and NYSE Rule 342 policy template ...................... 1702
NERC Security Guidelines for Electric Utilities policy template ............ 1703
Network Diagrams policy template ................................................ 1704
Network Security policy template .................................................. 1705
Offensive Language policy template .............................................. 1705
Office of Foreign Assets Control (OFAC) policy template ................... 1706
OMB Memo 06-16 and FIPS 199 Regulations policy template ............ 1707
Password Files policy template .................................................... 1709
Payment Card Industry (PCI) Data Security Standard policy template ..... 1709
PIPEDA policy template ............................................................. 1711
Price Information policy template .................................................. 1713
Project Data policy template ........................................................ 1713
Proprietary Media Files policy template .......................................... 1713
Publishing Documents policy template ........................................... 1714
Racist Language policy template .................................................. 1715
Restricted Files policy template .................................................... 1715
Restricted Recipients policy template ............................................ 1715
Resumes policy template ............................................................ 1716
Sarbanes-Oxley policy template ................................................... 1716
SEC Fair Disclosure Regulation policy template .............................. 1719
Sexually Explicit Language policy template ..................................... 1721
Source Code policy template ....................................................... 1722
State Data Privacy policy template ................................................ 1723
SWIFT Codes policy template ...................................................... 1726
Symantec DLP Awareness and Avoidance policy template ................ 1726
UK Drivers License Numbers policy template .................................. 1727
UK Electoral Roll Numbers policy template ..................................... 1727
UK National Health Service (NHS) Number policy template ............... 1728
UK National Insurance Numbers policy template ............................. 1728
UK Passport Numbers policy template ........................................... 1728
UK Tax ID Numbers policy template .............................................. 1729
US Intelligence Control Markings (CAPCO) and DCID 1/7 policy template ..... 1729
US Social Security Numbers policy template ................................... 1730
Violence and Weapons policy template .......................................... 1731
Webmail policy template ............................................................. 1731
Yahoo Message Board Activity policy template ................................ 1732
Yahoo and MSN Messengers on Port 80 policy template ................... 1733

Section 5 Configuring policy response rules ................. 1736

Chapter 47 Responding to policy violations .................................... 1737
About response rules ................................................................. 1738
About response rule actions ........................................................ 1738
Response rule actions for all detection servers ................................ 1739
Response rule actions for endpoint detection .................................. 1740
Response rule actions for Network Prevent detection ....................... 1741
Response rule actions for Network Protect detection ........................ 1742
Response rule actions for Cloud Storage detection .......................... 1743
Response rule actions for Cloud Applications and API appliance detectors ..... 1744
About response rule execution types ............................................. 1750
About Automated Response rules ................................................ 1751
About Smart Response rules ....................................................... 1751
About response rule conditions .................................................... 1752
About response rule action execution priority .................................. 1753
About response rule authoring privileges ....................................... 1757
Implementing response rules ....................................................... 1758
Response rule best practices ...................................................... 1759

Chapter 48 Configuring and managing response rules ................ 1761

Manage response rules .............................................................. 1761
Adding a new response rule ........................................................ 1762
Configuring response rules ......................................................... 1763
About configuring Smart Response rules ....................................... 1764
Configuring response rule conditions ............................................ 1764
Configuring response rule actions ................................................ 1765
Modifying response rule ordering ................................................. 1769
About removing response rules .................................................... 1770

Chapter 49 Response rule conditions ............................................... 1771

Configuring the Endpoint Location response condition ...................... 1771
Configuring the Endpoint Device response condition ........................ 1772
Configuring the Incident Type response condition ............................ 1773
Configuring the Incident Match Count response condition .................. 1774
Configuring the Protocol or Endpoint Monitoring response condition ..... 1775
Configuring the SEP Intensity Level response condition .................... 1777
Configuring the Severity response condition ................................... 1778

Chapter 50 Response rule actions ..................................................... 1780

Configuring the Add Note action ................................................... 1782
Configuring the Encrypt Smart Response action .............................. 1783
Configuring the Limit Incident Data Retention action ......................... 1783
Retaining data for endpoint incidents ...................................... 1784
Discarding data for network incidents ...................................... 1785
Configuring the Log to a Syslog Server action ................................. 1785
Configuring the Send Email Notification action ................................ 1786
Configuring the Server FlexResponse action .................................. 1788
Configuring the Set Attribute action ............................................... 1789
Configuring the Set Status action ................................................. 1790
Configuring the Cloud Storage: Add Visual Tag action ...................... 1791
Configuring the Cloud Storage: Quarantine action ............................ 1791
Configuring the Quarantine Smart Response action ......................... 1793
Configuring the Network Protect: SharePoint Quarantine smart response action ..... 1793
Configuring the Network Protect: SharePoint Release from Quarantine smart response action ..... 1795
Configuring the Remove Collaborator Access Smart Response action ..... 1797
Configuring the Remove Shared Links Smart Response action ........... 1797
Configuring the Restore File Smart Response action ........................ 1797
Configuring the Custom Action on Data-at-Rest action ...................... 1798
Configuring the Delete Data-at-Rest action ..................................... 1799
Configuring the Encrypt Data-at-Rest action ................................... 1799
Configuring the Perform DRM on Data-at-Rest action ....................... 1800
Configuring the Quarantine Data-at-Rest action ............................... 1801
Configuring the Remove Shared Links in Data-at-Rest action ............. 1802
Configuring the Tag Data-at-Rest action ......................................... 1802
Configuring the Prevent download, copy, print action ........................ 1803
Configuring the Remove Collaborator Access action ........................ 1804
Configuring the Set Collaborator Access to 'Edit' action ..................... 1804
Configuring the Set Collaborator Access to 'Preview' action ............... 1805
Configuring the Set Collaborator Access to 'Read' action ................... 1805
Configuring the Set File Access to 'All Read' action .......................... 1806
Configuring the Set File Access to 'Internal Edit' .............................. 1806
Configuring the Set File Access to 'Internal Read' action ................... 1807
Configuring the Add two-factor authentication action ........................ 1808
Configuring the Block Data-in-Motion action ................................... 1808
Configuring the Custom Action on Data-in-Motion action ................... 1809
Configuring the Encrypt Data-in-Motion action ................................. 1810
Configuring the Perform DRM on Data-in-Motion action .................... 1810
Configuring the Quarantine Data-in-Motion action ............................ 1811
Configuring the Redact Data-in-Motion action ................................. 1812
Configuring the Endpoint: FlexResponse action ............................... 1813
Configuring the Endpoint: ICT Classification And Tagging action ......... 1814
Configuring the Endpoint Discover: Quarantine File action ................. 1815
Configuring the Endpoint Prevent: Block action ............................... 1817
Configuring the Endpoint Prevent: Encrypt action ............................ 1821
Configuring the Endpoint Prevent: Notify action ............................... 1825
Configuring the Endpoint Prevent: User Cancel action ...................... 1828
Configuring the Network Prevent for Web: Block FTP Request action ..... 1831
Configuring the Network Prevent for Web: Block HTTP/S action ......... 1831
Configuring the Network Prevent: Block SMTP Message action .......... 1832
Configuring the Network Prevent: Modify SMTP Message action ........ 1833
Configuring the Network Prevent for Web: Remove HTTP/S Content action ..... 1835
Configuring the Network Protect: Copy File action ............................ 1836
Configuring the Network Protect: Quarantine File action .................... 1837
Configuring the Network Protect: Encrypt File action ........................ 1838

Section 6 Remediating and managing incidents ......... 1840

Chapter 51 Remediating incidents .................................................... 1841
About incident remediation .......................................................... 1841
Remediating incidents ................................................................ 1844
Executing Smart response rules ................................................... 1845
Incident remediation action commands .......................................... 1845
Response action variables .......................................................... 1847
General incident variables .................................................... 1847
Network Monitor and Network Prevent incident variables ............ 1848
Discover incident variables ................................................... 1848
Endpoint incident variables ................................................... 1849
Application incident variables ................................................ 1849

Chapter 52 Remediating Network incidents ................................... 1851

Network incident list ................................................................... 1851
Network incident list—Actions ...................................................... 1854
Network incident list—Columns .................................................... 1856
Network incident snapshot .......................................................... 1857
Network incident snapshot—Heading and navigation ....................... 1857
Network incident snapshot—General information ............................. 1858
Network incident snapshot—Matches ............................................ 1860
Network incident snapshot—Attributes .......................................... 1861
Network summary report ............................................................ 1861

Chapter 53 Remediating Endpoint incidents .................................. 1863

About endpoint incident lists ........................................................ 1863
Endpoint incident snapshot ......................................................... 1866
Reporting on Endpoint Prevent response rules ................................ 1871
Endpoint incident destination or protocol-specific information ............. 1872
Endpoint incident summary reports ............................................... 1874

Chapter 54 Remediating Discover incidents ................................... 1876

About reports for Network Discover ............................................... 1876
About incident reports for Network Discover/Cloud Storage
Discover ........................................................................... 1877
Discover incident reports ............................................................ 1878
Discover incident lists ................................................................ 1879
Discover incident actions ............................................................ 1879
Discover incident entries ............................................................. 1880
Discover incident snapshot ......................................................... 1882
Discover summary reports .......................................................... 1885

Chapter 55 Working with Application incidents ............................. 1887

About Applications incident reports ............................................... 1887
Applications incident list ............................................................. 1889
Applications incident entries ........................................................ 1889
Applications incident actions ........................................................ 1891
Applications incident snapshot ..................................................... 1892
Applications summary reports ...................................................... 1896

Chapter 56 Managing and reporting incidents ............................... 1897

About Symantec Data Loss Prevention reports ................................ 1899
About strategies for using reports ................................................. 1900
Setting report preferences ........................................................... 1901
About incident reports ................................................................ 1902
About dashboard reports and executive summaries ......................... 1903
Viewing dashboards .................................................................. 1905
Creating dashboard reports ......................................................... 1906
Configuring dashboard reports ..................................................... 1907
Choosing reports to include in a dashboard .................................... 1909
About summary reports .............................................................. 1909
Viewing summary reports ........................................................... 1909
Creating summary reports ........................................................... 1910
Viewing incidents ...................................................................... 1911
About custom reports and dashboards .......................................... 1912
Using IT Analytics to manage incidents .......................................... 1913
Filtering reports ........................................................................ 1914
Saving custom incident reports .................................................... 1914
Scheduling custom incident reports ............................................... 1915
Delivery schedule options for incident and system reports ................. 1917
Delivery schedule options for dashboard reports ............................. 1919
Using the date widget to schedule reports ...................................... 1921
Editing custom dashboards and reports ......................................... 1921
Exporting incident reports ........................................................... 1921
Exported fields for Network Monitor .............................................. 1922
Exported fields for Network Discover/Cloud Storage Discover ............ 1923
Exported fields for Endpoint Discover ............................................ 1924
Deleting incidents ..................................................................... 1925
About the incident deletion process ........................................ 1926
Configuring the incident deletion job schedule .......................... 1927
Starting and stopping incident deletion jobs .............................. 1927
Working with the deletion jobs history ..................................... 1928
About automatically flagging incidents for deletion ..................... 1929
About creating incident reports for automatic incident deletion
flagging ...................................................................... 1930
Configuring automatic incident deletion flagging ........................ 1931
Managing automatic incident deletion flagging .......................... 1931
Troubleshooting automatic incident deletion flagging .................. 1932
Deleting custom dashboards and reports ....................................... 1932
Common incident report features .................................................. 1933
Page navigation in incident reports ............................................... 1934
Incident report filter and summary options ...................................... 1934
Sending incident reports by email ................................................. 1935
Printing incident reports .............................................................. 1936
Incident snapshot history tab ....................................................... 1936
Incident snapshot notes tab ......................................................... 1937
Incident snapshot attributes section .............................................. 1937
Incident snapshot correlations tab ................................................ 1937
Incident snapshot policy section ................................................... 1938
Incident snapshot matches section ............................................... 1938
Incident snapshot access information section .................................. 1938
Customizing incident snapshot pages ........................................... 1939
About filters and summary options for reports ................................. 1940
General filters for reports ............................................................ 1941
Summary options for incident reports ............................................ 1944
Advanced filter options for reports ................................................ 1949

Chapter 57 Hiding incidents ............................................................... 1958

About incident hiding ................................................................. 1958
Hiding incidents ....................................................................... 1959
Unhiding hidden incidents ........................................................... 1959
Preventing incidents from being hidden ......................................... 1960
Deleting hidden incidents ............................................................ 1961

Chapter 58 Working with incident data ........................................... 1962

About incident status attributes .................................................... 1962
Configuring status attributes and values ......................................... 1964
Configuring status groups ........................................................... 1965
Export web archive .................................................................... 1966
Export web archive—Create Archive ............................................. 1966
Export web archive—All Recent Events ......................................... 1967
About custom attributes .............................................................. 1968
About using custom attributes ...................................................... 1969
How custom attributes are populated ............................................ 1970
Configuring custom attributes ...................................................... 1970
Setting custom attributes ............................................................ 1971
Setting the values of custom attributes manually .............................. 1972

Chapter 59 Working with user risk .................................................... 1973

About user risk ......................................................................... 1973
About user data sources ............................................................. 1975
Defining custom attributes for user data ................................... 1976
Bringing in user data ........................................................... 1977
About identifying users in web incidents ........................................ 1981
Enabling user identification and configuring the mapping
schedule ..................................................................... 1982
Checking the status of the domain controllers ........................... 1983
Viewing the user list ................................................................... 1983
Viewing user details ................................................................... 1984
Working with the user risk summary .............................................. 1984

Chapter 60 Implementing lookup plug-ins ...................................... 1986

About lookup plug-ins ................................................................ 1986
Types of lookup plug-ins ....................................................... 1987
About lookup parameters ..................................................... 1990
About plug-in deployment ..................................................... 1991
About plug-in chaining ......................................................... 1991
About upgrading lookup plug-ins ............................................ 1991
Implementing and testing lookup plug-ins ....................................... 1992
Managing and configuring lookup plug-ins ............................... 1994
Creating new lookup plug-ins ................................................ 1995
Selecting lookup parameters ................................................. 1996
Enabling lookup plug-ins ...................................................... 2001
Chaining lookup plug-ins ...................................................... 2002
Reloading lookup plug-ins .................................................... 2002
Troubleshooting lookup plug-ins ............................................. 2003
Configuring detailed logging for lookup plug-ins ........................ 2004
Configuring advanced plug-in properties .................................. 2005
Configuring the CSV Lookup Plug-In ............................................. 2006
Requirements for creating the CSV file .................................... 2008
Specifying the CSV file path .................................................. 2009
Choosing the CSV file delimiter ............................................. 2009
Selecting the CSV file character set ........................................ 2009
Mapping attributes and parameter keys to CSV fields ................. 2010
CSV attribute mapping example ............................................. 2011
Testing and troubleshooting the CSV Lookup Plug-In ................ 2012
CSV Lookup Plug-In tutorial .................................................. 2013
Configuring LDAP Lookup Plug-Ins ............................................... 2015
Requirements for LDAP server connections ............................. 2016
Mapping attributes to LDAP data ............................................ 2017
Attribute mapping examples for LDAP ..................................... 2018
Testing and troubleshooting LDAP Lookup Plug-ins ................... 2018
LDAP Lookup Plug-In tutorial ................................................ 2019
Configuring Script Lookup Plug-Ins ............................................... 2020
Writing scripts for Script Lookup Plug-Ins ................................. 2021
Specifying the Script Command ............................................. 2022
Specifying the Arguments ..................................................... 2023
Enabling the stdin and stdout options ...................................... 2023
Enabling incident protocol filtering for scripts ............................ 2024
Enabling and encrypting script credentials ............................... 2025
Chaining multiple Script Lookup Plug-Ins ................................. 2027
Script Lookup Plug-In tutorial ................................................ 2027
Example script ................................................................... 2029
Configuring migrated Custom (Legacy) Lookup Plug-Ins ................... 2031

Section 7 Monitoring and preventing data loss in the network ............. 2033

Chapter 61 Implementing Network Monitor ................................... 2034
Implementing Network Monitor ..................................................... 2034
About IPv6 support for Network Monitor ......................................... 2036
Choosing a network packet capture method ................................... 2037
About packet capture software installation and configuration .............. 2038
Installing WinPcap on a Windows platform ............................... 2038
Updating the Endace card driver ............................................ 2039
Installing and updating the Napatech network adapter and driver
software ...................................................................... 2039
Configuring the Network Monitor Server ......................................... 2045
Enabling GET processing with Network Monitor .............................. 2046
Creating a policy for Network Monitor ............................................ 2047
Testing Network Monitor ............................................................. 2048

Chapter 62 Implementing Network Prevent for Email .................. 2049

Implementing Network Prevent for Email ........................................ 2049
About Mail Transfer Agent (MTA) integration ................................... 2051
Configuring Network Prevent for Email Server for reflecting or
forwarding mode ................................................................. 2051
Configuring Linux IP tables to reroute traffic from a restricted
port ............................................................................ 2056
Specifying one or more upstream mail transfer agents (MTAs) ........... 2057
Creating a policy for Network Prevent for Email ............................... 2058
About policy violation data headers ............................................... 2059
Enabling policy violation data headers ........................................... 2060
Testing Network Prevent for Email ................................................ 2061

Chapter 63 Implementing Network Prevent for Web .................... 2062

Implementing Network Prevent for Web ......................................... 2062
Configuring Network Prevent for Web Server .................................. 2064
Configuring a secure ICAP keystore for Network Prevent for Web ....... 2067
About proxy server configuration .................................................. 2070
Configuring request and response mode services ...................... 2070
Specifying one or more proxy servers ............................................ 2071
Enabling GET processing for Network Prevent for Web ..................... 2072
Creating policies for Network Prevent for Web ................................ 2073
Testing Network Prevent for Web ................................................. 2074
Troubleshooting information for Network Prevent for Web Server ........ 2075

Section 8 Discovering where confidential data is stored .................. 2076

Chapter 64 About Network Discover ................................................ 2078

About Network Discover/Cloud Storage Discover ............................. 2078
How Network Discover/Cloud Storage Discover works ...................... 2079

Chapter 65 Setting up and configuring Network Discover ........... 2082

Setting up and configuring Network Discover/Cloud Storage
Discover ........................................................................... 2082
Modifying the Network Discover/Cloud Storage Discover Server
configuration ...................................................................... 2083
Configuring Network Discover to use a proxy to connect to the
Symantec ICE Cloud for file share scans ................................. 2085
Adding a new Network Discover/Cloud Storage Discover target .......... 2086
Editing an existing Network Discover/Cloud Storage Discover
target ................................................................................ 2088

Chapter 66 Network Discover scan target configuration options ............ 2090

Network Discover/Cloud Storage Discover scan target configuration
options ............................................................................. 2090
Configuring the required fields for Network Discover targets ............... 2092
Scheduling Network Discover/Cloud Storage Discover scans ............. 2093
Providing the password authentication for Network Discover scanned
content ............................................................................. 2095
Managing cloud storage authorizations .......................................... 2096
Providing Box cloud storage authorization credentials ................ 2097
Encrypting passwords in configuration files ..................................... 2100
Setting up Network Discover/Cloud Storage Discover filters to include
or exclude items from the scan .............................................. 2100
Filtering Discover targets by item size ........................................... 2103
Filtering Discover targets by date last accessed or modified ............... 2103
Optimizing resources with Network Discover/Cloud Storage Discover
scan throttling ..................................................................... 2106
Creating an inventory of the locations of unprotected sensitive
data ................................................................................. 2107

Chapter 67 Managing Network Discover target scans .................. 2110

Managing Network Discover/Cloud Storage Discover target
scans ............................................................................... 2111
Managing Network Discover/Cloud Storage Discover targets ............. 2111
About the Network Discover/Cloud Storage Discover scan target
list ............................................................................. 2111
Working with Network Discover/Cloud Storage Discover scan
targets ........................................................................ 2113
Removing Network Discover/Cloud Storage Discover scan
targets ........................................................................ 2113
Managing Network Discover/Cloud Storage Discover scan histories
....................................................................................... 2114
About Discover and Endpoint Discover scan histories ................ 2114
Working with Network Discover/Cloud Storage Discover scan
histories ...................................................................... 2116
Deleting Network Discover/Cloud Storage Discover scans .......... 2116
About Discover scan details .................................................. 2117
Working with Network Discover/Cloud Storage Discover scan
details ........................................................................ 2120
Managing Network Discover/Cloud Storage Discover Servers ............ 2121
Viewing Network Discover/Cloud Storage Discover server
status ......................................................................... 2121
About Network Discover/Cloud Storage Discover scan
optimization ....................................................................... 2122
About the difference between incremental scans and differential
scans ............................................................................... 2124
About incremental scans ............................................................ 2125
Scanning new or modified items with incremental scans .................... 2126
About managing incremental scans .............................................. 2127
Scanning new or modified items with differential scans ..................... 2128
Configuring parallel scanning of Network Discover/Cloud Storage
Discover targets .................................................................. 2128
About grid scanning ................................................................... 2130
Configuring grid scanning ........................................................... 2132
Renewing grid communication certificates for Discover detection
servers ............................................................................. 2134
Migrating a Discover scan from a single server to a grid .................... 2136
Grid scanning performance guidelines ........................................... 2136
Troubleshooting grid scans ......................................................... 2138

Chapter 68 Using Server FlexResponse plug-ins to remediate incidents ....... 2140

About the Server FlexResponse platform ....................................... 2140
Using Server FlexResponse custom plug-ins to remediate
incidents ........................................................................... 2142
Deploying a Server FlexResponse plug-in ...................................... 2143
Adding a Server FlexResponse plug-in to the plug-ins properties
file ............................................................................. 2143
Creating a properties file to configure a Server FlexResponse
plug-in ........................................................................ 2145
Locating incidents for manual remediation ...................................... 2148
Using the action of a Server FlexResponse plug-in to remediate an
incident manually ................................................................ 2149
Verifying the results of an incident response action .......................... 2150
Troubleshooting a Server FlexResponse plug-in .............................. 2151

Chapter 69 Setting up scans of Box cloud storage using an on-premises detection server ................................. 2153

Setting up scans of Box cloud storage targets using an on-premises
detection server .................................................................. 2153
Configuring scans of Box cloud storage targets ............................... 2154
Optimizing Box cloud storage scanning ......................................... 2156
Configuring remediation options for Box cloud storage targets ............ 2157

Chapter 70 Setting up scans of file shares ...................................... 2159

Setting up server scans of file systems .......................................... 2159
Supported file system targets ...................................................... 2160
Automatically discovering servers and shares before configuring a file
system target ..................................................................... 2161
Working with Content Root Enumeration scans ......................... 2162
Troubleshooting Content Root Enumeration scans ..................... 2164
Automatically discovering open file shares ..................................... 2165
About automatically tracking incident remediation status ................... 2166
Troubleshooting automated incident remediation tracking ............ 2167
Configuration options for Automated Incident Remediation
Tracking ...................................................................... 2168
Excluding internal DFS folders ..................................................... 2171
Configuring scans of Microsoft Outlook Personal Folders (.pst
files) ................................................................................. 2171
Configuring scans of file systems ................................................. 2172
Optimizing file system target scanning ........................................... 2176
Configuring Network Protect for file shares ..................................... 2177
Priority of write-access credentials for file shares ............................. 2179

Chapter 71 Setting up scans of Lotus Notes databases ............... 2181

Setting up server scans of IBM (Lotus) Notes databases ................... 2181
Supported IBM (Lotus) Notes targets ............................................ 2182
Configuring and running IBM (Lotus) Notes scans ............................ 2182
Configuring IBM (Lotus) Notes DIIOP mode configuration scan
options ............................................................................. 2185

Chapter 72 Setting up scans of SQL databases .............................. 2187

Setting up server scans of SQL databases ..................................... 2187
Supported SQL database targets ................................................. 2188
Configuring and running SQL database scans ................................ 2188
Installing the JDBC driver for SQL database targets ......................... 2192
SQL database scan configuration properties ................................... 2192

Chapter 73 Setting up scans of SharePoint servers ...................... 2195

Setting up server scans of SharePoint servers ................................ 2195
About scans of SharePoint servers ............................................... 2196
Supported SharePoint server targets ............................................. 2198
Access privileges for SharePoint scans ......................................... 2198
About Alternate Access Mapping Collections .................................. 2198
Configuring and running SharePoint server scans ............................ 2198
Configuring Network Protect for SharePoint servers ......................... 2203
Installing the SharePoint solution on the Web Front Ends in a
farm ................................................................................. 2205
Enabling SharePoint scanning without installing the SharePoint
solution ............................................................................. 2207
Setting up SharePoint scans to use Kerberos authentication .............. 2208
Troubleshooting SharePoint scans ............................................... 2209

Chapter 74 Setting up scans of Exchange servers ......................... 2211

Setting up server scans of Exchange repositories ............................ 2211
About scans of Exchange servers ................................................. 2212
Supported Exchange Server targets .............................................. 2213
Configuring Exchange Server scans ............................................. 2214
Setting up Exchange scans to use Kerberos authentication ............... 2217
Example configurations and use cases for Exchange scans ............... 2218
Troubleshooting Exchange scans ................................................. 2219

Chapter 75 About Network Discover scanners ............................... 2220

How Network Discover scanners work ........................................... 2220
Troubleshooting scanners ........................................................... 2221
Scanner processes ................................................................... 2222
Scanner installation directory structure .......................................... 2223
Scanner configuration files .......................................................... 2224
Scanner controller configuration options ........................................ 2225

Chapter 76 Setting up scanning of Documentum repositories .................. 2227

Setting up remote scanning of Documentum repositories .................. 2227
Supported Documentum (scanner) targets ..................................... 2228
Installing Documentum scanners .................................................. 2228
Starting Documentum scans ........................................................ 2230
Configuration options for Documentum scanners ............................. 2231
Example configuration for scanning all documents in a Documentum
repository .......................................................................... 2233

Chapter 77 Setting up scanning of file systems ............................. 2235

Setting up remote scanning of file systems ..................................... 2236
Supported file system scanner targets ........................................... 2237
Installing file system scanners ..................................................... 2237
Starting file system scans ........................................................... 2239
Installing file system scanners silently from the command line ............ 2241
Configuration options for file system scanners ................................. 2242
Example configuration for scanning the C drive on a Windows
computer ........................................................................... 2243
Example configuration for scanning the /usr directory on UNIX .......... 2243
Example configuration for scanning with include filters ...................... 2243
Example configuration for scanning with exclude filters ..................... 2244
Example configuration for scanning with include and exclude filters .... 2244
Example configuration for scanning with date filtering ...................... 2245
Example configuration for scanning with file size filtering ................... 2245
Example configuration for scanning that skips symbolic links on UNIX
systems ............................................................................ 2246

Chapter 78 Setting up scanning of OpenText (Livelink) targets ......................... 2247
Setting up remote scanning of OpenText (Livelink) repositories ........... 2247
Supported OpenText (Livelink) scanner targets ............................... 2248
Creating an ODBC data source for SQL Server ............................... 2248
Installing Livelink scanners ......................................................... 2249
Starting OpenText (Livelink) scans ................................................ 2251
Configuration options for Livelink scanners ..................................... 2253
Example configuration for scanning a Livelink database .................... 2254

Chapter 79 Setting up scanning of Web servers ............................ 2255

Setting up remote scanning of web servers .................................... 2255
Supported web server (scanner) targets ........................................ 2256
Installing web server scanners ..................................................... 2256
Starting web server scans ........................................................... 2258
Configuration options for web server scanners ................................ 2260
Example configuration for a web site scan with no authentication ........ 2262
Example configuration for a web site scan with basic
authentication .................................................................... 2262
Example configuration for a web site scan with form-based
authentication .................................................................... 2263
Example configuration for a web site scan with NTLM ....................... 2263
Example of URL filtering for a web site scan ................................... 2264
Example of date filtering for a web site scan ................................... 2265

Chapter 80 Setting up Web Services for custom scan targets ........................... 2266
Setting up Web Services for custom scan targets ............................ 2266
About setting up the Web Services Definition Language (WSDL) ........ 2267
Example of a Web Services Java client ......................................... 2267
Sample Java code for the Web Services example ............................ 2268

Section 9 Discovering and preventing data loss on endpoints ........................ 2272
Chapter 81 Overview of Symantec Data Loss Prevention for
endpoints ..................................................................... 2273
About discovering and preventing data loss on endpoints .................. 2273
Guidelines for authoring Endpoint policies ...................................... 2275

Chapter 82 Summary of DLP Agent for Mac support .................... 2277

About DLP Agent feature-level support .......................................... 2277
Mac agent installation and tools feature details ................................ 2278
Mac agent installation support ............................................... 2278
Mac endpoint tools features .................................................. 2279
Mac agent management features ................................................. 2279
Mac agent endpoint location ................................................. 2280
Mac agent groups features ................................................... 2280
Overview of Mac agent detection technologies and policy authoring
features ............................................................................ 2280
Mac agent detection technologies .......................................... 2281
Mac agent policy response rule features .................................. 2284
Mac agent monitoring support ...................................................... 2297
Mac agent removable storage features .................................... 2287
Clipboard features supported on Mac agents ............................ 2288
Mac agent Email features ..................................................... 2289
Mac agent browser features .................................................. 2290
Mac agent Application Monitoring features ............................... 2291
Mac agent copy to network share features ............................... 2292
Mac agent filter by file properties features ................................ 2292
Mac agent filter by network properties features ......................... 2293
Endpoint Prevent for Mac agent advanced agent settings
features ............................................................................ 2293
Endpoint Discover for Mac targets features .................................... 2294
Endpoint Discover for Mac file system support ................................ 2295
Endpoint Discover for Mac advanced agent settings support .............. 2295

Chapter 83 Using Endpoint Prevent .................................................. 2296

About Endpoint Prevent monitoring ............................................... 2296
About removable storage monitoring ....................................... 2297
About endpoint network monitoring ......................................... 2299
About CD/DVD monitoring .................................................... 2300
About print/fax monitoring ..................................................... 2301
About network share monitoring ............................................. 2302
About clipboard monitoring ................................................... 2303
About global application monitoring ........................................ 2303
About group-specific application monitoring: using overrides ........ 2304
About cloud storage application monitoring .............................. 2305
About virtual desktop support with Endpoint Prevent .................. 2306
About rules results caching (RRC) .......................................... 2309
About policy creation for Endpoint Prevent ..................................... 2309

About monitoring policies with response rules for Endpoint Servers ............. 2310
How to implement Endpoint Prevent ............................................. 2312
Setting the endpoint location ................................................. 2313
About Endpoint Prevent response rules in different locales .......... 2314

Chapter 84 Using Endpoint Discover ................................................ 2316

How Endpoint Discover works ..................................................... 2316
About Endpoint Discover scanning ............................................... 2316
About scanning targeted endpoints ........................................ 2317
About Endpoint Discover full scanning .................................... 2318
About Endpoint Discover incremental scanning ......................... 2318
About Endpoint Discover classification scanning ....................... 2320
About parallel scans on targeted endpoints .............................. 2321
Optimizing the scan for endpoint performance .......................... 2322
Preparing to set up Endpoint Discover ........................................... 2322
Creating a policy group for Endpoint Discover ........................... 2323
Creating a policy for Endpoint Discover ................................... 2324
Adding a rule for Endpoint Discover ........................................ 2324
Setting up and configuring Endpoint Discover ................................. 2325
Creating an Endpoint Discover scan ............................................. 2326
Creating a new Endpoint Discover target ................................. 2327
About Endpoint Discover filters .............................................. 2334
Configuring Endpoint Discover scan timeout settings ................. 2341
Managing Endpoint Discover target scans ...................................... 2342
About managing Endpoint Discover scans ............................... 2342
About Endpoint Discover targeted endpoints scan details ............ 2343
About remediating Endpoint Discover incidents ......................... 2345
About Endpoint reports ........................................................ 2345

Chapter 85 Working with agent configurations .............................. 2347

About agent configurations ......................................................... 2347
About cloning agent configurations ......................................... 2348
Adding and editing agent configurations ........................................ 2348
Channel settings ................................................................. 2349
Channel Filters settings ........................................................ 2353
Application Monitoring settings .............................................. 2362
Device Control settings ........................................................ 2364
Agent settings .................................................................... 2364
Advanced agent settings ...................................................... 2372
Setting specific channels to monitor based on the endpoint
location ....................................................................... 2411

Applying agent configurations to an agent group ............................. 2412

Configuring the agent connection status ........................................ 2412

Chapter 86 Working with Agent Groups ........................................... 2414

About agent groups ................................................................... 2414
Developing a strategy for deploying Agent Groups ........................... 2415
Overview of the agent group deployment process ............................ 2416
Creating and managing agent attributes ........................................ 2417
Creating a new agent attribute ............................................... 2418
Defining a search filter for creating user-defined attributes ........... 2419
Verifying attribute queries with the Attribute Query Resolver
tool ............................................................................ 2419
Applying a new attribute or changed attribute to agents .............. 2420
Undoing changes to agent attributes ....................................... 2421
Editing user-defined agent attributes ....................................... 2421
Viewing and managing agent groups ............................................. 2421
Agent group conditions ........................................................ 2422
Creating a new agent group .................................................. 2423
Updating outdated agent configurations ................................... 2423
Assigning configurations to deploy groups ............................... 2424
Verify that group assignments are correct ................................ 2424
Viewing group conflicts ............................................................... 2425
Changing groups ...................................................................... 2425

Chapter 87 Managing Symantec DLP Agents .................................. 2427

About Symantec DLP Agent administration .................................... 2427
Agent Overview screen ........................................................ 2428
About agent events ............................................................. 2446
About Symantec DLP Agent removal ...................................... 2454
About DLP Agent logs ................................................................ 2457
Setting the log levels for an Endpoint Agent ............................. 2457
About agent password management ............................................. 2458
Create a new agent uninstall or Endpoint tools password ............ 2459
Change an existing agent uninstall or Endpoint tools
password .................................................................... 2460
Retain existing agent uninstall or Endpoint tools passwords ......... 2460

Chapter 88 Using application monitoring ........................................ 2461

About global application monitoring .............................................. 2461
Changing global application monitoring settings ........................ 2462

Monitoring instant messenger applications on Mac endpoints .................... 2465
List of CD/DVD applications .................................................. 2466
About adding applications ........................................................... 2467
Adding a Windows application ..................................................... 2468
Using the GetAppInfo tool ..................................................... 2471
Adding a macOS application ....................................................... 2472
Defining macOS application binary names ............................... 2475
Ignoring macOS applications ....................................................... 2475
About Application File Access monitoring ....................................... 2476
Implementing Application File Access monitoring ............................. 2477

Chapter 89 Working with Endpoint FlexResponse ......................... 2479

About Endpoint FlexResponse ..................................................... 2479
Deploying Endpoint FlexResponse ............................................... 2481
About deploying Endpoint FlexResponse plug-ins on endpoints .......... 2481
Deploying Endpoint FlexResponse plug-ins using a silent installation
process ............................................................................ 2482
About the Endpoint FlexResponse utility ........................................ 2483
Deploying an Endpoint FlexResponse plug-in using the Endpoint
FlexResponse utility ............................................................ 2485
Enabling Endpoint FlexResponse on the Enforce Server ................... 2486
Uninstalling an Endpoint FlexResponse plug-in using the Endpoint
FlexResponse utility ............................................................ 2486
Retrieving an Endpoint FlexResponse plug-in from a specific
endpoint ............................................................................ 2487
Retrieving a list of Endpoint FlexResponse plug-ins from an
endpoint ............................................................................ 2488

Chapter 90 Using Endpoint tools ....................................................... 2489

About Endpoint tools .................................................................. 2489
Using Endpoint tools with Windows 7/8.1/10 ............................. 2491
Shutting down the agent and the watchdog services on Windows
endpoints .................................................................... 2492
Using Endpoint tools with macOS .......................................... 2492
Shutting down the agent service on Mac endpoints .................... 2493
Inspecting the database files accessed by the agent .................. 2493
Viewing extended log files .................................................... 2494
About the Device ID utilities .................................................. 2496
Starting DLP Agents that run on Mac endpoints ........................ 2499

Chapter 91 Using SEP Intensive Protection .................................... 2501

About the SEP Intensive Protection file reputation service ................. 2501
Enabling SEP Intensive Protection ................................................ 2502
Setting the SEP Intensity Level .................................................... 2503
Adding a SEP Intensive Protection response rule ............................ 2503

Section 10 Monitoring data loss in cloud applications ............................... 2505
Chapter 92 Working with Application Detection ............................ 2506
About Application Detection ........................................................ 2506
Managing Application Detection ................................................... 2507

Chapter 93 Working with Cloud Service for Email ......................... 2513

About Cloud Service for Email ..................................................... 2513
About updating email domains in the Enforce Server administration
console ............................................................................. 2514
Viewing Cloud Service for Email detector details ....................... 2514
Adding the unique TXT record to your DNS settings ................... 2516
Updating email domains ....................................................... 2516
Update override by the Symantec Cloud Service ....................... 2518
Encrypting cloud email with Symantec Information Centric
Encryption ......................................................................... 2518
Implementing ICE with Cloud Service for Email ......................... 2519
Configuring the Enforce Server to communicate with the ICE
service ....................................................................... 2519
Creating encryption response rules for ICE encryption ................ 2520
About decrypting ICE encrypted email ..................................... 2522
Viewing details about ICE incidents ........................................ 2522

Section 11 Monitoring data loss using DLP Appliances ................................ 2526
Chapter 94 Implementing and working with DLP
Appliances ................................................................... 2527
About DLP Appliances ............................................................... 2527
About obtaining the appliance activation file and licenses .................. 2528

Obtaining activation and license files for the virtual appliance ............... 2528
Obtaining license files for the DLP S500-10 Hardware
Appliance .................................................................... 2530
About the Command Line Interface (CLI) ....................................... 2531
About performance tuning and sizing for appliances ......................... 2531

Chapter 95 Deploying DLP Appliances ............................................. 2532

Deployment overview for the virtual appliance ................................. 2532
Setting up the virtual appliance .................................................... 2534
Deployment overview for the DLP-S500 hardware appliance .............. 2536
Setting up the DLP-S500 Appliance .............................................. 2537
Adding an appliance .................................................................. 2539
Configuring the API Detection for Developer Apps Appliance ............. 2540

Chapter 96 Post-deployment tasks ................................................... 2542

Unbinding or resetting a DLP appliance ......................................... 2542
Updating appliance software ....................................................... 2543
Log files and logging for appliances .............................................. 2544

Index ................................................................................................................. 2545

Section 1
Getting started

■ Chapter 1. Introducing Symantec Data Loss Prevention

■ Chapter 2. Getting started administering Symantec Data Loss Prevention

■ Chapter 3. Working with languages and locales

Chapter 1
Introducing Symantec Data Loss Prevention
This chapter includes the following topics:

■ About updates to the Symantec Data Loss Prevention Administration Guide

■ About Symantec Data Loss Prevention

■ About the Enforce Server platform

■ About Network Monitor and Prevent

■ About Network Discover/Cloud Storage Discover

■ About Network Protect

■ About Endpoint Discover

■ About Endpoint Prevent

About updates to the Symantec Data Loss Prevention Administration Guide
This guide is occasionally updated as new information becomes available. You can find the
latest version of the Symantec Data Loss Prevention Administration Guide at the following link
to the Symantec Support Center article: http://www.symantec.com/docs/DOC9261.
Subscribe to the article at the Support Center to be notified when there are updates.
The following table provides the history of updates to this version of the Symantec Data Loss
Prevention Administration Guide:
Introducing Symantec Data Loss Prevention 73
About updates to the Symantec Data Loss Prevention Administration Guide

Table 1-1 Change history for the Symantec Data Loss Prevention Administration Guide

Date Description

19 August 2019 Corrected the path to the index data files on Windows to read
ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\datafiles.

Updated the Windows and Linux pathnames for the Indexer.properties file.

Added a table of channel-specific limits to the "Increasing the inspection content size" chapter.

Changed the name of SmtpPrevent0.log to SmtpPrevent_operational0.log.

Corrected the name of the log file for SMTP Prevent. The filename was
SmtpPrevent0.log; now it is RequestProcessor0.log.

Clarified that you must use dedicated hardware or VMs with dedicated
resources for OCR Servers.

Corrected the description of the Limit Incident Data Retention response rule
to indicate that the rule is only supported for Endpoint Prevent.

Corrected the default value for the advanced agent setting
"NetworkMonitor.NUM_OF_LISTENER_THREADS".

14 June 2019 Minor editorial updates.


11 June 2019 Updated Secure ICAP content to reflect that both self-signed and CA-issued
certificates are supported.

Deleted text that indicated document source files uploaded to the Enforce
Server are deleted after indexing. IDM source files are not deleted from the
Enforce Server after indexing.

Added that for OCR, only load balancers without persistence enabled are
supported.

Added support for OCR and Cloud Prevent for Office 365 on Azure.

Updated default value for Lexer.MaximumNumberOfTokens to 30000.

Fixed squished text in the "Advanced settings for OCR and FR image
extraction" table.

Added detailed information about pre- and post-validator characters for custom
data identifiers.

Added information about high-performance content extraction for Office Open XML files.

Added information about whitelisting the Titanium server for SEP Intensive
Protection.

26 March 2019 Updated cross reference to locale settings to one for JDK 8 and JRE 8:
Changed to
https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html.

Updated "About updating email domains in the Enforce Server administration
console" to remove "new" and add references to 15.5.

Fixed broken link to "Detecting content using data identifiers" in "Introducing
Exact Match Data Identifiers" topic to point to "About data identifiers."

Fixed broken xref and properties name in Profile size limitations on the DLP
Agent for EMDI.

Removed the “Only available with Network Prevent for Email” text from two
response rule topics: "Network Prevent: Modify SMTP Message" and "Network
Prevent: Block SMTP Message".

6 March 2019 Reinstated procedure for starting an Enforce Server (inadvertently dropped
in previous release).

11 February 2019 Minor fixes to layout for readability.


1 February 2019 ■ Revised entire "Detecting content using Exact Match Data Identifiers"
chapter.
■ Made minor updates to "Using diagnostics for OCR Server deployments"
section.
■ Added new "Creating a null policy to assist in OCR diagnostics for Discover
Servers" section.
■ Fixed formatting in table 69-2.
■ Added Cloud Applications and API Appliance lookup parameters.
■ Corrected data exposure detail description for "Document is Exposed."

About Symantec Data Loss Prevention

Symantec Data Loss Prevention enables you to:
■ Discover and locate confidential information in cloud storage repositories, on file and web
servers, in databases, and on endpoints (desktop and laptop systems)
■ Protect confidential information through quarantine
■ Monitor network traffic for transmission of confidential data
■ Monitor the use of sensitive data on endpoints
■ Prevent transmission of confidential data to outside locations
■ Automatically enforce data security and encryption policies
Symantec Data Loss Prevention includes the following components:
■ Enforce Server
See “About the Enforce Server platform” on page 77.
See “About Symantec Data Loss Prevention administration” on page 82.
See “About the Enforce Server administration console” on page 83.
■ Network Discover/Cloud Storage Discover
See “About Network Discover/Cloud Storage Discover” on page 78.
■ Network Protect
See “About Network Protect” on page 79.
■ Network Monitor
See “About Network Monitor and Prevent” on page 78.
■ Network Prevent
See “About Network Monitor and Prevent” on page 78.

■ Endpoint Discover
See “About Endpoint Discover” on page 80.
■ Endpoint Prevent
See “About Endpoint Prevent” on page 80.
The Discover, Protect, Monitor, and Prevent modules can be deployed as stand-alone products
or in combination. Regardless of which stand-alone products you deploy, the Enforce Server
is always provided for central management. Note that the Network Protect module requires
the Network Discover/Cloud Storage Discover module.
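The two deployment rules in the preceding paragraph (the Enforce Server is always included, and Network Protect requires Network Discover/Cloud Storage Discover) can be expressed as a small validation check. The following is a minimal Python sketch that assumes only those two rules. The function name `plan_deployment` and the module-name strings are hypothetical and are not part of any Symantec Data Loss Prevention tool or API.

```python
from typing import Iterable, List

# Hypothetical module names, taken from the product list in this guide.
ENFORCE_SERVER = "Enforce Server"
NETWORK_DISCOVER = "Network Discover/Cloud Storage Discover"
NETWORK_PROTECT = "Network Protect"


def plan_deployment(modules: Iterable[str]) -> List[str]:
    """Return the components to deploy for the selected product modules.

    Hypothetical helper that encodes only the two rules stated in the guide:
      * the Enforce Server is always deployed for central management;
      * Network Protect requires Network Discover/Cloud Storage Discover.
    """
    # Remove duplicates while keeping the caller's order; the Enforce
    # Server is handled separately below.
    selected = [m for m in dict.fromkeys(modules) if m != ENFORCE_SERVER]
    if NETWORK_PROTECT in selected and NETWORK_DISCOVER not in selected:
        raise ValueError(f"{NETWORK_PROTECT} requires {NETWORK_DISCOVER}")
    # The Enforce Server always comes first: it manages every other module.
    return [ENFORCE_SERVER] + selected
```

For example, `plan_deployment(["Network Monitor", "Network Prevent"])` returns the Enforce Server followed by both modules, while selecting Network Protect without Network Discover/Cloud Storage Discover raises an error.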
Associated with each product module are corresponding detection servers and cloud detectors:
■ Network Discover/Cloud Storage Discover Server locates exposed confidential data
on a broad range of enterprise data repositories, including:
■ Box cloud storage
■ File servers
■ Databases
■ Microsoft SharePoint
■ IBM/Lotus Notes
■ EMC Documentum
■ Livelink
■ Microsoft Exchange
■ Web servers
■ Other data repositories
If you are licensed for Network Protect, this server also copies and quarantines sensitive
data on file servers and in Box cloud storage, as specified in your policies.
See “About Network Discover/Cloud Storage Discover” on page 78.
■ Network Monitor Server monitors the traffic on your network.
See “About Network Monitor and Prevent” on page 78.
■ Network Prevent for Email Server blocks emails that contain sensitive data.
See “About Network Monitor and Prevent” on page 78.
■ Network Prevent for Web Server blocks HTTP postings and FTP transfers that contain
sensitive data.
See “About Network Monitor and Prevent” on page 78.
■ Endpoint Server monitors and prevents the misuse of confidential data on endpoints.
See “About Endpoint Discover” on page 80.
See “About Endpoint Prevent” on page 80.

The distributed architecture of Symantec Data Loss Prevention allows organizations to:
■ Perform centralized management and reporting.
■ Define data security policies centrally, once, and deploy them immediately across the entire
Symantec Data Loss Prevention suite.
■ Scale data loss prevention according to the size of your organization.

About the Enforce Server platform

The Symantec Data Loss Prevention Enforce Server is the central management platform that
enables you to define, deploy, and enforce data loss prevention and security policies. The
Enforce Server administration console provides a centralized, web-based interface for deploying
detection servers, authoring policies, remediating incidents, and managing the system.
See “About Symantec Data Loss Prevention” on page 75.
The Enforce platform provides you with the following capabilities:
■ Build and deploy accurate data loss prevention policies. You can choose among various
detection technologies, define rules, and specify actions to include in your data loss
prevention policies. Using provided regulatory and best-practice policy templates, you can
meet your regulatory compliance, data protection and acceptable-use requirements, and
address specific security threats.
See “About Data Loss Prevention policies” on page 368.
See “Detecting data loss” on page 381.
■ Automatically deploy and enforce data loss prevention policies. You can automate policy
enforcement options for notification, remediation workflow, blocking, and encryption.
■ Measure risk reduction and demonstrate compliance. The reporting features of the Enforce
Server enable you to create actionable reports identifying risk reduction trends over time.
You can also create compliance reports to address conformance with regulatory
requirements.
See “About Symantec Data Loss Prevention reports” on page 1899.
See “About incident reports” on page 1902.
■ Empower rapid remediation. Based on incident severity, you can automate the entire
remediation process using detailed incident reporting and workflow automation. Role-based
access controls empower individual business units and departments to review and remediate
those incidents that are relevant to their business or employees.
See “About incident remediation” on page 1841.
See “Remediating incidents” on page 1844.
■ Safeguard employee privacy. You can use the Enforce Server to review incidents without
revealing the sender identity or message content. In this way, multi-national companies
can meet legal requirements on monitoring European Union employees and transferring
personal data across national boundaries.
See “About role-based access control” on page 109.

About Network Monitor and Prevent

The Symantec Data Loss Prevention network data monitoring and prevention products include:
■ Network Monitor
Network Monitor captures and analyzes traffic on your network. It detects confidential data
and significant traffic metadata over the protocols that you specify, such as SMTP,
FTP, HTTP, and various IM protocols. You can configure a Network Monitor Server to
monitor custom protocols and to use a variety of filters (per protocol) to filter out low-risk
traffic.
■ Network Prevent for Email
Network Prevent for Email integrates with standard MTAs and hosted email services to
provide in-line active SMTP email management. Policies that are deployed on an in-line
Network Prevent for Email Server direct the next-hop mail server to block, reroute, or tag
email messages. These actions are based on specific content and other message attributes.
Communication between MTAs and Network Prevent for Email Server can be secured as
necessary using TLS.
Implement Network Monitor, review the incidents it captures, and refine your policies
accordingly before you implement Network Prevent for Email.
See the Symantec Data Loss Prevention MTA Integration Guide for Network Prevent for
Email.
■ Network Prevent for Web
For in-line active web request management, Network Prevent for Web integrates with an
HTTP, HTTPS, or FTP proxy server. This integration uses the Internet Content Adaptation
Protocol (ICAP) . The Network Prevent for Web Server detects confidential data in HTTP,
HTTPS, or FTP content. When it does, it causes the proxy to reject requests or remove
HTML content as specified by the governing policies.
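ICAP itself is specified in RFC 3507. To make the integration concrete, the following Python sketch builds the kind of REQMOD message a proxy sends to an ICAP server for an outgoing HTTP request that carries a body. It illustrates the protocol only and is not the Network Prevent for Web implementation: the ICAP host icap.example.net, the service name reqmod, and the function build_reqmod are hypothetical, and a real proxy also uses ICAP features this sketch omits, such as OPTIONS requests and Preview.

```python
# Minimal sketch of an ICAP REQMOD message (RFC 3507) as a proxy might send it.
# The ICAP host, service name, and function name are hypothetical examples.


def build_reqmod(icap_host: str, service: str, http_headers: str, body: bytes = b"") -> bytes:
    """Wrap an HTTP request (headers plus optional body) in an ICAP REQMOD message."""
    if not http_headers.endswith("\r\n\r\n"):
        raise ValueError("HTTP headers must end with a blank line (CRLF CRLF)")
    hdr = http_headers.encode("ascii")
    # Encapsulated offsets count bytes from the start of the encapsulated section:
    # the HTTP request headers begin at 0, and the body (if any) follows them.
    if body:
        encapsulated = f"req-hdr=0, req-body={len(hdr)}"
        # ICAP carries message bodies in HTTP/1.1 chunked encoding.
        payload = f"{len(body):x}\r\n".encode("ascii") + body + b"\r\n0\r\n\r\n"
    else:
        encapsulated = f"req-hdr=0, null-body={len(hdr)}"
        payload = b""
    icap_headers = (
        f"REQMOD icap://{icap_host}/{service} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: {encapsulated}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap_headers + hdr + payload


if __name__ == "__main__":
    http = (
        "POST /upload HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "Content-Length: 11\r\n"
        "\r\n"
    )
    print(build_reqmod("icap.example.net", "reqmod", http, b"hello world").decode("ascii"))
```

In ICAP, the server can answer with the request unchanged, a modified request, or an HTTP response that the proxy returns to the client instead; that last option is the general mechanism by which an ICAP server can have a proxy reject a request.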

About Network Discover/Cloud Storage Discover

Network Discover/Cloud Storage Discover scans cloud storage repositories, networked file
shares, web content servers, databases, document repositories, and endpoint systems at high
speeds to detect exposed data and documents. Network Discover/Cloud Storage Discover
enables companies to understand exactly where confidential data is exposed and helps
significantly reduce the risk of data loss.
Network Discover/Cloud Storage Discover gives organizations the following capabilities:

■ Pinpoint unprotected confidential data. Network Discover/Cloud Storage Discover helps
organizations accurately locate at-risk data that is stored on their networks. You can then
inform shared file server owners so that they can protect the data.
■ Reduce proliferation of confidential data. Network Discover/Cloud Storage Discover helps
organizations to detect the spread of sensitive information throughout the company and
reduce the risk of data loss.
■ Automate investigations and audits. Network Discover/Cloud Storage Discover streamlines
data security investigations and compliance audits. It accomplishes this task by enabling
users to scan for confidential data automatically, as well as review access control and
encryption policies.
■ During incident remediation, Veritas Data Insight helps organizations identify data owners
and responsible parties for information whose metadata or tracking information is
incomplete or inaccurate.
See the Symantec Data Loss Prevention Data Insight Implementation Guide.
■ To provide additional flexibility in remediating Network Discover/Cloud Storage Discover
incidents, use the FlexResponse application programming interface (API), or the
FlexResponse plug-ins that are available.
See the Symantec Data Loss Prevention FlexResponse Platform Developers Guide, or
contact Symantec Professional Services for a list of plug-ins.
See “About Symantec Data Loss Prevention” on page 75.

About Network Protect

Network Protect reduces your risk by removing exposed confidential data, intellectual property,
and classified information from open file shares on network servers or desktop computers.
Note that there is no separate Network Protect server; the Network Protect product module
adds protection functionality to the Network Discover Server.
Network Protect gives organizations the following capabilities:
■ Apply visual tags to content in Box cloud storage. Network Protect can apply a text tag to
policy-violating files that are stored in Box cloud storage.
■ Quarantine exposed files. Network Protect can automatically move those files that violate
policies to a quarantine area that re-creates the source file structure for easy location.
Optionally, Symantec Data Loss Prevention can place a marker text file in the original
location of the offending file. The marker file can explain why and where the original file
was quarantined.
■ Copy exposed or suspicious files. Network Protect can automatically copy those files that
violate policies to a quarantine area. The quarantine area can re-create the source file
structure for easy location, and leave the original file in place.
■ Quarantine file restoration. Network Protect can easily restore quarantined files to their
original or a new location.
■ Enforce access control and encryption policies. Network Protect proactively ensures
workforce compliance with existing access control and encryption policies.
See “About Symantec Data Loss Prevention” on page 75.
See “Configuring Network Protect for file shares” on page 2177.
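The quarantine, copy, and restore behaviors listed above follow a common pattern: mirror each file's path, relative to the scanned share, under a quarantine root; optionally leave a marker file in the original location; and reverse the move later. The Python sketch below shows that pattern on a local file system. It is not how Network Protect is implemented; the function names, the marker-file suffix .quarantined.txt, and the marker text are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical naming convention for the marker file left in the original location.
MARKER_SUFFIX = ".quarantined.txt"


def marker_for(path: Path) -> Path:
    """Return the marker-file path that sits next to the original file."""
    return path.with_name(path.name + MARKER_SUFFIX)


def quarantine(src: Path, share_root: Path, quarantine_root: Path,
               copy_only: bool = False, leave_marker: bool = True) -> Path:
    """Move (or copy) src into quarantine_root, re-creating its folder structure."""
    dest = quarantine_root / src.relative_to(share_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if copy_only:
        shutil.copy2(src, dest)  # the original file stays in place
    else:
        shutil.move(str(src), str(dest))
        if leave_marker:
            marker_for(src).write_text(
                f"{src.name} violated a data loss prevention policy and was moved to {dest}\n",
                encoding="utf-8",
            )
    return dest


def restore(dest: Path, share_root: Path, quarantine_root: Path,
            target_root: Optional[Path] = None) -> Path:
    """Move a quarantined file back to its original location or under target_root."""
    rel = dest.relative_to(quarantine_root)
    target = (target_root or share_root) / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(dest), str(target))
    original_marker = marker_for(share_root / rel)
    if original_marker.exists():
        original_marker.unlink()  # the marker no longer describes the file's location
    return target
```

Because the quarantine path is derived from the path relative to the share, restoring to the original location needs no extra bookkeeping; passing target_root restores the same relative structure to a new location instead.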

About Endpoint Discover

Endpoint Discover detects sensitive data on your desktop or your laptop endpoints. It consists
of at least one Endpoint Server and at least one Symantec DLP Agent that runs on an endpoint.
You can have many Symantec DLP Agents connected to a single Endpoint Server. Symantec
DLP Agents:
■ Detect sensitive data in the endpoint file system.
■ Collect data on that activity.
■ Send incidents to the Endpoint Server.
■ Send the data to the associated Endpoint Server for analysis, if necessary.
See “About Endpoint Prevent” on page 80.
See “About Symantec Data Loss Prevention” on page 75.

About Endpoint Prevent

Endpoint Prevent detects sensitive data and prevents it from leaving your desktop or your
laptop endpoints. It consists of at least one Endpoint Server and all the Symantec DLP Agents
running on the endpoint systems that are connected to it. You can have many Symantec DLP
Agents connected to a single Endpoint Server. Endpoint Prevent detects sensitive data in the
following types of data transfer:
■ Application monitoring
■ CD/DVD
■ Clipboard
■ Email/SMTP
■ eSATA removable drives
■ FTP
■ HTTP/HTTPS
■ IM
■ Network shares
■ Print/Fax
■ USB removable media devices
See “About Endpoint Discover” on page 80.
See “About Symantec Data Loss Prevention” on page 75.
Chapter 2
Getting started administering Symantec Data Loss Prevention
This chapter includes the following topics:

■ About Symantec Data Loss Prevention administration

■ About the Enforce Server administration console

■ Logging on and off the Enforce Server administration console

■ About the administrator account

■ Performing initial setup tasks

■ Changing the administrator password

■ Adding an administrator email account

■ Editing a user profile

■ Changing your password

About Symantec Data Loss Prevention administration

The Symantec Data Loss Prevention system consists of one Enforce Server and one or more
detection servers.
The Enforce Server stores all system configuration, policies, saved reports, and other Symantec
Data Loss Prevention information and manages all activities.
System administration is performed from the Enforce Server administration console, which is
accessed by a Firefox or Internet Explorer Web browser. The Enforce console is displayed
after you log on.
See “About the Enforce Server administration console” on page 83.
After completing the installation steps in the Symantec Data Loss Prevention Installation Guide,
you must perform initial configuration tasks to get Symantec Data Loss Prevention up and
running for the first time. These are essential tasks that you must perform before the system
can begin monitoring data on your network.
See “Performing initial setup tasks” on page 85.

About the Enforce Server administration console

You administer the Symantec Data Loss Prevention system through the Enforce Server
administration console.
The Administrator user can see and access all parts of the administration console. Other users
can see only the parts to which their roles grant them access. The user account under which
you are currently logged on appears at the top right of the screen.
When you first log on to the administration console, the default Home page is displayed. You
and your users can change the default Home page using the Home page selection button.
See Table 2-1 on page 83.
To navigate through the system, select items from one of the four menu clusters (Home,
Incidents, Manage, and System).
Located in the upper-right portion of the administration console are the following navigation
and operation icons:

Table 2-1 Administration console navigation and operation icons

Icon Description

Help. Click this icon to access the context-sensitive online help for your current page.

Select this page as your Home page. If the current screen cannot be selected as
your Home page, this icon is unavailable.

Back to previous screen. Symantec recommends using this Back button rather than
your browser Back button, which may lead to unpredictable behavior.

Screen refresh. Symantec recommends using this Refresh button rather than your
browser Reload or Refresh button, which may lead to unpredictable behavior.

Print the current report. If the current screen contents cannot be sent to the printer,
this icon is unavailable.

Email the current report to one or more recipients. If the current screen contents
cannot be sent as an email, this icon is unavailable.

See “Logging on and off the Enforce Server administration console” on page 84.

Logging on and off the Enforce Server administration console

If you are assigned more than one role, you can only log on under one role at a time. You must
specify the role name and user name at logon.
To log on to the Enforce Server
1 On the Enforce Server host, open a browser and point it to the URL for your server (as
provided by the Symantec Data Loss Prevention administrator).
2 On the Symantec Data Loss Prevention logon screen, enter your user name in the
Username field. For the administrator role, this user name is always Administrator.
Users with multiple roles should specify the role name and the user name in the format
role\user (for example, ReportViewer\bsmith). If they do not, Symantec Data Loss
Prevention assigns the user a role upon logon.
See “Configuring roles” on page 114.
3 In the Password field, type the password. For the administrator at first logon, this password
is the password you created during the installation.
For installation details, see the appropriate Symantec Data Loss Prevention Installation
Guide.
4 Click login.
The Enforce Server administration console appears. The administrator can access all
parts of the administration console, but another user can see only those parts that are
authorized for that particular role.
To log out of the Enforce Server
1 Click logout at the top right of the screen.
2 Click OK to confirm.
Symantec Data Loss Prevention displays a message confirming the logout was successful.
See “Editing a user profile” on page 87.
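The role\user format described in step 2 of the logon procedure is easy to split mechanically. As an illustration only, the hypothetical Python helper parse_logon below separates the role from the user name in the way the format is described here. It is not part of the product, and the Enforce Server's own handling of edge cases (for example, additional backslashes) is not documented in this section.

```python
from typing import NamedTuple, Optional


class Logon(NamedTuple):
    role: Optional[str]  # None when no role prefix was typed
    user: str


def parse_logon(username_field: str) -> Logon:
    """Split a Username field value of the form role\\user into its parts."""
    role, sep, user = username_field.partition("\\")
    if not sep:
        # No role given; per this guide, the server then assigns a role at logon.
        return Logon(None, username_field)
    if not role or not user:
        raise ValueError(f"expected role\\user, got {username_field!r}")
    return Logon(role, user)


if __name__ == "__main__":
    # Use a raw string: in a normal Python literal, "\b" would be a backspace character.
    print(parse_logon(r"ReportViewer\bsmith"))  # Logon(role='ReportViewer', user='bsmith')
    print(parse_logon("Administrator"))
```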

About the administrator account

The Symantec Data Loss Prevention system is preconfigured with a permanent administrator
account. The account name, Administrator, is case-sensitive and cannot be changed. You
configured a password for the administrator account during installation.
Refer to the Symantec Data Loss Prevention Installation Guide for more information.
Only the administrator can see or modify the administrator account. Role options do not appear
on the administrator's configuration screen, because the administrator always has access to
every part of the system.
See “Changing the administrator password” on page 86.
See “Adding an administrator email account” on page 86.

Performing initial setup tasks

After completing the installation steps in the Symantec Data Loss Prevention Installation Guide,
you must perform initial configuration tasks to get Symantec Data Loss Prevention up and
running for the first time. These are essential tasks that you must perform before the system
can begin monitoring data on your network.
■ Change the Administrator's password to a unique password only you know, and add an
email address for the Administrator user account so you can be notified of various system
events.
See “About the administrator account” on page 85.
■ Add and configure your detection servers.
See “Adding a detection server” on page 273.
See “Server configuration—basic” on page 253.
■ Add any user accounts you need in addition to those supplied by your Symantec Data Loss
Prevention solution pack.
■ Review the policy templates provided with your Symantec Data Loss Prevention solution
pack to familiarize yourself with their content and data requirements. Revise the policies or
create new ones as needed.
■ Add the data profiles that you plan to associate with policies.
Data profiles are not always required. This step is necessary only if you are licensed for
data profiles and if you intend to use them in policies.
Changing the administrator password

During installation, you created a generic administrator password. When you log on for the
first time, you should change this password to a unique, secret password.
See the Symantec Data Loss Prevention Installation Guide for more information.
Passwords are case-sensitive and they must contain at least eight characters.
Note that you can configure Symantec Data Loss Prevention to require strong passwords.
Strong passwords are passwords specifically designed to be difficult to break. Password policy
is configured from the System > Settings > General > Configure screen.
When your password expires, Symantec Data Loss Prevention displays the Password Renewal
window at the next logon. When the Password Renewal window appears, type your old
password, and then type your new password and confirm it.
See “Configuring user accounts” on page 121.
To change the administrator password
1 Log on as administrator.
2 Click Profile in the upper-right corner of the administration console.
3 On the Edit Profile screen:
■ Enter your new password in the New Password field.
■ Re-enter your new password in the Re-enter New Password field. The two new
passwords must be identical.
Note that passwords are case-sensitive.
4 Click Save.
See “About the administrator account” on page 85.
See “About the Enforce Server administration console” on page 83.
See “About the Overview screen” on page 278.
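The two documented rules for a new password, a minimum of eight characters and two identical, case-sensitive entries, can be expressed as a small check. The hypothetical password_problems function below is a sketch of those rules only. It does not model the optional strong-password policy configured on the System > Settings > General > Configure screen, whose exact rules are not described here.

```python
from typing import List

MIN_PASSWORD_LENGTH = 8  # minimum length stated in this guide


def password_problems(new_password: str, reentered: str) -> List[str]:
    """Return the documented rules that a new password fails; an empty list means it passes."""
    problems = []
    if len(new_password) < MIN_PASSWORD_LENGTH:
        problems.append(f"must contain at least {MIN_PASSWORD_LENGTH} characters")
    # Passwords are case-sensitive, so the two entries are compared exactly.
    if new_password != reentered:
        problems.append("the two new passwords must be identical")
    return problems
```

Comparing the strings exactly, rather than after lower-casing them, is what makes "Secr3tPass" and "secr3tpass" count as different passwords.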

Adding an administrator email account

You can specify an email address to receive administrator account related messages.
To add or change an administrator email account

1 Click Profile in the upper-right corner of the administration console.
2 Type the new (or changed) administrator email address in the Email Address field.
The email address must include a fully qualified domain name. For example:
[email protected].

3 Click Save.
See “About the administrator account” on page 85.
See “About the Enforce Server administration console” on page 83.
See “About the Overview screen” on page 278.
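A quick way to catch the most common mistake, an address that uses a bare host name such as admin@mailhost, is to check that the domain part contains at least one dot. The hypothetical Python function below is a rough sketch of that check. It is not the validation the Enforce Server performs, and it does not implement full email address syntax.

```python
def has_fully_qualified_domain(address: str) -> bool:
    """Rough check that an email address ends in a dotted domain name such as example.com."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    labels = domain.rstrip(".").split(".")
    # Require at least two labels (for example "example" and "com"), none of them empty.
    return len(labels) >= 2 and all(labels)
```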

Editing a user profile

System users can use the Profile screen to configure their profile passwords, email addresses,
and languages.
Users can also specify their report preferences at the Profile screen.
To display the Profile screen, click the drop-down list at the top-right of the Enforce Server
administration console, then select Profile.
The Profile screen is divided into the following sections:
■ Authentication. Use this section to change your password, or select certificate
authentication, if available.
■ General. Use this section to specify your email address, choose a language preference,
and view your selected home page.
■ Report Preferences. Use this section to specify your preferred text encoding, CSV delimiter,
and XML export preferences.
■ Roles. This section displays your role. This section is not displayed for the
administrator because the administrator is authorized to perform all roles.

The Authentication section:

To change your password
1 Enter your new password in the New Password field.
2 Re-enter your new password in the Re-enter New Password field.
3 Click Save.
The next time you log on, you must use your new password.
See “Changing your password” on page 89.
To use certificate authentication

1 If certificate authentication is available to you, select Use Certificate authentication.
2 Enter your LDAP common name (CN) in the Common Name (CN) field.
3 Click Save.

The General section:

To specify a new personal email address
1 In the Email Address field, enter your personal email address.
2 Click Save.
Individual Symantec Data Loss Prevention users can choose which of the available languages
and locales they want to use.
To choose a language for individual use
1 Click the option next to your language choice.
2 Click Save.
The Enforce Server administration console is re-displayed in the new language.
Choosing a language profile has no effect on the detection of policy violations. Detection is
performed on all content that is written in any supported language regardless of the language
you choose for your profile.
See “About support for character sets, languages, and locales” on page 91.
The languages available to you are determined at product installation and by any language
packs for Symantec Data Loss Prevention that are added later. The effect of choosing a
different language varies as follows:
■ Locale only. If the language you choose has the notice Translations not available, dates
and numbers are displayed in formats appropriate for the language. Reports and lists are
sorted in accordance with that language. But the administration console menus, labels,
screens, and Help system are not translated and remain in English.
See “About locales” on page 95.
■ Translated. If the language you choose does not display the notice Translations not available,
then in addition to the number and date formats and the sort order, the administration
console menus, labels, screens, and in some cases the Help system are translated into
the chosen language.
See “About Symantec Data Loss Prevention language packs” on page 94.
The Report Preferences section:

To select your text encoding
1 Select a text encoding option:
■ Use browser default encoding. Check this box to specify that text files use the same
encoding as your browser.
■ Pull-down menu. Click an encoding option in the pull-down menu to select it.

2 Click Save.
The new text encoding is applied to exported CSV files, so that you can match the
encoding that your CSV applications expect.
To select a CSV delimiter
1 Choose one of the delimiters from the pull-down menu.
2 Click Save.
The new delimiter is applied to the next comma-separated values (CSV) list that you
export.
See “About incident reports” on page 1902.
See “Exporting incident reports” on page 1921.
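To see why the encoding and delimiter settings matter to the application that opens the file, consider how the same rows serialize under different choices. The hypothetical export_csv function below uses Python's standard csv module. It only illustrates the effect of the two settings and is not the Enforce Server's export code.

```python
import csv
import io
from typing import Iterable, Sequence


def export_csv(rows: Iterable[Sequence[str]], delimiter: str = ",",
               encoding: str = "utf-8") -> bytes:
    """Serialize rows as delimited text, then encode them with the chosen text encoding."""
    buf = io.StringIO()
    # csv.writer quotes any field that contains the delimiter, a quote, or a line break.
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    # Encoding raises UnicodeEncodeError if a character cannot be represented.
    return buf.getvalue().encode(encoding)
```

For example, "Müller" becomes the bytes M\xc3\xbcller in UTF-8 but M\xfcller in Windows-1252 (cp1252), so an application that assumes the wrong encoding may display the name incorrectly; likewise, a field that contains the chosen delimiter is wrapped in quotes.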
To select XML export details
1 Include Incident Violations in XML Export. If this box is checked, reports exported to
XML include the highlighted matches on each incident snapshot.
2 Include Incident History in XML Export. If this box is checked, reports exported to XML
include the incident history data that is contained in the History tab of each incident
snapshot.
3 Click Save.
Your selections are applied to the next report you export to XML.
If neither box is checked, the exported XML report contains only the basic incident information.
See “About incident reports” on page 1902.
See “Exporting incident reports” on page 1921.

Changing your password

When your password expires, Symantec Data Loss Prevention requires you to specify a new
password the next time you attempt to log on, and displays the Password Renewal window.
To change your password from the Password Renewal window
1 Enter your old password in the Old password field of the Password Renewal window.
2 Enter your new password in the New Password field of the Password Renewal window.
3 Re-enter your new password in the Re-enter New Password field of the Password
Renewal window.
The next time you log on, you must use your new password.
You can also change your password at any time from the Profile screen.
See “Editing a user profile” on page 87.
See “About the administrator account” on page 85.
See “Logging on and off the Enforce Server administration console” on page 84.
Chapter 3
Working with languages and locales
This chapter includes the following topics:

■ About support for character sets, languages, and locales

■ Supported languages for detection

■ Working with international characters

■ About Symantec Data Loss Prevention language packs

■ About locales

■ Using a non-English language on the Enforce Server administration console

■ Using the Language Pack Utility

About support for character sets, languages, and locales

Symantec Data Loss Prevention fully supports international deployments by offering a large
number of languages and localization options:
■ Policy creation and violation detection across many languages.
The supported languages can be used in keywords, data identifiers, regular expressions,
exact data profiles (EDM) and document profiles (IDM).
See “Supported languages for detection” on page 92.
■ Operation on localized and Multilingual User Interface (MUI) versions of Windows operating
systems.
■ International character sets. To view and work with international character sets, the system
on which you are viewing the Enforce Server administration console must have the
appropriate capabilities.
See “Working with international characters” on page 93.
■ Locale-based date and number formats, as well as sort orders for lists and reports.
See “About locales” on page 95.
■ Localized user interface (UI) and Help system. Language packs for Symantec Data Loss
Prevention provide language-specific versions of the Enforce Server administration console.
They may also provide language-specific versions of the online Help system.

Note: These language packs are added separately following initial product installation.

■ Localized product documentation.

■ Language-specific notification pop-ups. Endpoint notification pop-ups appear in the display
language that is selected on the endpoint instead of the system locale language. For
example, if the system locale is set to English and the user sets the display language to
German, the notification pop-up appears in German.

Note: A mixed-language notification pop-up appears if the user locale language does not
match the language used in the response rule.

Supported languages for detection

■ Arabic
■ Brazilian Portuguese
■ Chinese (traditional)
■ Chinese (simplified)
■ Czech
■ Danish
■ Dutch
■ English
■ Finnish
■ French
■ German
■ Greek
■ Hebrew
■ Hungarian
■ Italian
■ Japanese
■ Korean
■ Norwegian
■ Polish
■ Portuguese
■ Romanian
■ Russian
■ Spanish
■ Swedish
See “About support for character sets, languages, and locales” on page 91.

Working with international characters

You can use a variety of languages in Symantec Data Loss Prevention, based on:
■ The operating system-based character set installed on the computer from which you view
the Enforce Server administration console
■ The capabilities of your browser
For example, an incident report on a scan of Russian-language data would contain Cyrillic
characters. To view that report, the computer and browser you use to access the Enforce
Server administration console must be capable of displaying these characters. Here are some
general guidelines:
■ If the computer you use to access the Enforce Server administration console has an
operating system localized for a particular language, you should be able to view and use
a character set that supports that language.
■ If the operating system of the computer you use to access the administration console is
not localized for a particular language, you may need to add supplemental language support.
This supplemental language support is added to the computer you use to access the
administration console, not to the Enforce Server.
■ On a Windows system, you can add fonts for some character sets on the Languages tab of
Control Panel > Regional and Language Options, under Supplemental Language Support.

■ It may also be necessary to set your browser to accommodate the characters you want to
view and enter.

Note: The Enforce Server administration console supports UTF-8 encoded data.
See the Symantec Data Loss Prevention Release Notes for known issues regarding specific
languages.
See “About support for character sets, languages, and locales” on page 91.
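Because the administration console works with UTF-8 encoded data, text stored in a legacy single-byte encoding, such as Cyrillic saved as Windows-1251, is one possible reason characters display incorrectly even when the right fonts are installed. As a troubleshooting aid under that assumption, the hypothetical Python helper below checks whether a byte string is valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    russian = "Привет"
    print(is_valid_utf8(russian.encode("utf-8")))   # True
    print(is_valid_utf8(russian.encode("cp1251")))  # False: Windows-1251 bytes are not valid UTF-8
```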

About Symantec Data Loss Prevention language packs

Language packs for Symantec Data Loss Prevention localize the product for a particular
language on Windows-based systems. After a language pack is added to Symantec Data Loss
Prevention, administrators can specify it as the system-wide default. If administrators make
multiple language packs available for use, individual users can choose the language they want
to work in.
See “Using a non-English language on the Enforce Server administration console” on page 95.
Language packs provide the following:
■ The locale of the selected language becomes available to administrators and end users in
the Enforce Server Configuration screen.
■ Enforce Server screens, menu items, commands, and messages appear in the language.
■ The Symantec Data Loss Prevention online Help system may be displayed in the language.
Language packs for Symantec Data Loss Prevention are available from Symantec File Connect.

Caution: When you install a new version of Symantec Data Loss Prevention, any language
packs you have installed are deleted. For a new, localized version of Symantec Data Loss
Prevention, you must upgrade to a new version of the language pack.

See “About locales” on page 95.

See “About support for character sets, languages, and locales” on page 91.
About locales
Locales are installed as part of a language pack.
A locale does the following:
■ Displays dates and numbers in formats appropriate for that locale.
■ Sorts lists and reports based on text columns, such as "policy name" or "file owner,"
alphabetically according to the rules of the locale.
An administrator can also configure an additional locale for use by individual users. This
additional locale need only be supported by the required version of Java.
For a list of these locales, see
https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html.
The locale can be specified at product installation time, as described in the Symantec Data
Loss Prevention Installation Guide. It can also be configured at a later time using the Language
Pack Utility.
See “Using a non-English language on the Enforce Server administration console” on page 95.
See “About support for character sets, languages, and locales” on page 91.

Using a non-English language on the Enforce Server administration console

The use of locales and languages is specified through the Enforce Server administration
console by the following roles:
■ Symantec Data Loss Prevention administrator. Specifies that one of the available languages
be the default system-wide language and sets the locale.
■ Individual Symantec Data Loss Prevention user. Chooses which of the available locales
to use.

Note: The addition of multiple language packs could slightly affect Enforce Server performance,
depending on the number of languages and customizations present. This occurs because an
additional set of indexes has to be built and maintained for each language.

Warning: Do not modify the Oracle database NLS_LANGUAGE and NLS_TERRITORY settings.

See “About Symantec Data Loss Prevention language packs” on page 94.
See “About locales” on page 95.
A Symantec Data Loss Prevention administrator specifies which of the available languages
is the default system-wide language.
To choose the default language for all users
1 On the Enforce Server, go to System > Settings > General and click Configure.
The Edit General Settings screen is displayed.
2 Scroll to the Language section of the Edit General Settings screen, and click the button
next to the language you want to use as the system-wide default.
3 Click Save.
Individual Symantec Data Loss Prevention users can choose which of the available languages
and locales they want to use by updating their profiles.
See “Editing a user profile” on page 87.
Administrators can use the Language Pack Utility to update the available languages.
See “Using the Language Pack Utility” on page 96.
See “About support for character sets, languages, and locales” on page 91.

Note: If the Enforce Server runs on a Linux host, you must install language fonts on the host
machine using the Linux Package Manager application. Language font packages begin with
fonts-<language_name>. For example, fonts-japanese-0.20061016-4.el5.noarch

Using the Language Pack Utility

To make a specific locale available for Symantec Data Loss Prevention, you add language
packs through the Language Pack Utility.
You run the Language Pack Utility from the command line. Its executable,
LanguagePackUtility.exe, resides in the
\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\bin directory on
Windows and in the /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/bin
directory on Linux.
To use the Language Pack Utility, you must have Read, Write, and Execute permissions on
the \Program Files\Symantec\DataLossPrevention\EnforceServer\15.5 (Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5 (Linux) folder and all of its subfolders.
If you are running the utility on Linux, you must be a root user.
To display help for the utility, such as the list of valid options and their flags, enter
LanguagePackUtility without any flags.
Note: Running the Language Pack Utility causes the SymantecDLPManagerService and
SymantecDLPIncidentPersisterService services to stop for as long as 20 seconds. Any
users who are logged on to the Enforce Server administration console will be logged out
automatically. When finished making its updates, the utility restarts the services automatically,
and users can log back on to the administration console.

Language packs for Symantec Data Loss Prevention can be obtained from Symantec File
Connect.
To add a language pack (Windows)
1 Advise other users that anyone currently using the Enforce Server administration console
must save their work and log off.
2 Run the Language Pack Utility with the -a flag followed by the name of the ZIP file for
that language pack. Enter:

LanguagePackUtility -a filename

where filename is the fully qualified path and name of the language pack ZIP file.
For example, if the Japanese language pack ZIP file is stored in c:\temp, add it by entering:

LanguagePackUtility -a c:\temp\Symantec_DLP_15.5_Japanese.zip

To add multiple language packs during the same session, specify multiple file names,
separated by spaces, for example:

LanguagePackUtility -a c:\temp\Symantec_DLP_15.5_Japanese.zip c:\temp\Symantec_DLP_15.5_Chinese.zip

3 Log on to the Enforce Server administration console and confirm that the new language
option is available on the Edit General Settings screen. To do this, go to System >
Settings > General > Configure > Edit General Settings.
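If you script this step, the following Python sketch assembles the -a command line from a list of language pack ZIP files. Because the utility expects the fully qualified path and name of each file, the sketch rejects bare file names. The function name build_add_command is hypothetical and is not part of Symantec Data Loss Prevention; the sketch only builds the argument list and does not run the utility.

```python
import ntpath


def build_add_command(zip_paths, utility="LanguagePackUtility"):
    """Build the argument list for LanguagePackUtility -a (hypothetical helper).

    Each entry must be a fully qualified Windows path to a language pack ZIP file.
    """
    if not zip_paths:
        raise ValueError("specify at least one language pack ZIP file")
    for path in zip_paths:
        # Reject bare file names such as "Symantec_DLP_15.5_Chinese.zip".
        if not ntpath.isabs(path):
            raise ValueError(f"not a fully qualified path: {path}")
        if not path.lower().endswith(".zip"):
            raise ValueError(f"not a ZIP file: {path}")
    # Several files after a single -a flag add several packs in one session.
    return [utility, "-a", *zip_paths]


if __name__ == "__main__":
    cmd = build_add_command([
        r"c:\temp\Symantec_DLP_15.5_Japanese.zip",
        r"c:\temp\Symantec_DLP_15.5_Chinese.zip",
    ])
    print(" ".join(cmd))
```

On the Enforce Server host, you could pass the returned list to subprocess.run(cmd, check=True). Depending on your PATH, you may need to replace the first element with the full path to LanguagePackUtility.exe.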
To add a language pack (Linux)
1 Advise other users that anyone currently using the Enforce Server administration console
must save their work and log off.
2 Open a terminal session to the Enforce Server host and switch to the DLP_system_account
by running the following command:
su - DLP_system_account

3 Run the following command:

DLP_home/Protect/bin/LanguagePackUtility -a <path to language pack zip file>

4 Log on to the Enforce Server administration console and confirm that the new language
option is available on the Edit General Settings screen. To do this, go to System >
Settings > General > Configure > Edit General Settings.
To remove a language pack
1 Advise users that anyone currently using the Enforce Server administration console must
save their work and log off.
2 Run the Language Pack Utility with the -r flag followed by the Java locale code of the
language pack you want to remove. Enter:

LanguagePackUtility -r locale

where locale is a valid Java locale code corresponding to a Symantec Data Loss Prevention
language pack.
For example, to remove the French language pack, enter:

LanguagePackUtility -r fr_FR

To remove multiple language packs during the same session, specify multiple locale codes,
separated by spaces.
3 Log on to the Enforce Server administration console and confirm that the language pack
is no longer available on the Edit General Settings screen. To do this, go to System >
Settings > General > Configure > Edit General Settings.
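A similar sketch can check locale codes before you pass them to the -r flag. It assumes that valid codes follow the Java language_COUNTRY pattern shown in the examples (fr_FR, pt_BR) and that -r accepts several codes separated by spaces. The function build_remove_command and its regular expression are illustrative assumptions, not product behavior.

```python
import re

# Assumption: locale codes take the Java form language_COUNTRY, for example
# fr_FR or pt_BR. A bare language code such as "fr" is also accepted here.
LOCALE_CODE = re.compile(r"[a-z]{2,3}(?:_[A-Z]{2})?")


def build_remove_command(locales, utility="LanguagePackUtility"):
    """Build the argument list for LanguagePackUtility -r (hypothetical helper)."""
    if not locales:
        raise ValueError("specify at least one locale code")
    invalid = [code for code in locales if not LOCALE_CODE.fullmatch(code)]
    if invalid:
        raise ValueError(f"not a locale code: {', '.join(invalid)}")
    # Assumed: several codes after a single -r flag remove several packs.
    return [utility, "-r", *locales]
```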
Removing a language pack has the following effects:
■ Users can no longer select the locale of the removed language pack for individual use.

Note: If the locale of the language pack is supported by the version of Java required for
running Symantec Data Loss Prevention, the administrator can later specify it as an alternate
locale for any users who need it.

■ For users who had selected the removed locale, the locale reverts to the system-wide default
configured by the administrator.

■ If the removed language was the system-wide default locale, the system locale reverts to
English.
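The effects listed above can be modeled as follows. This is a hypothetical illustration of the documented rules, not how the Enforce Server stores locale settings: LocaleSettings, remove_language_pack, and the use of en_US as the code for English are all assumptions.

```python
from dataclasses import dataclass, field

# Assumption: the guide says only "English"; en_US is used here as its code.
ENGLISH = "en_US"


@dataclass
class LocaleSettings:
    system_default: str
    # Maps each user name to the locale that user selected for individual use.
    user_locales: dict = field(default_factory=dict)


def remove_language_pack(settings, removed):
    """Return the locale settings that result from removing one language pack."""
    # If the removed pack supplied the system-wide default, fall back to English.
    if settings.system_default == removed:
        system_default = ENGLISH
    else:
        system_default = settings.system_default
    # Users who had selected the removed locale revert to the system-wide default.
    user_locales = {
        user: system_default if locale == removed else locale
        for user, locale in settings.user_locales.items()
    }
    return LocaleSettings(system_default, user_locales)
```

Note that when the removed locale was also the system-wide default, affected users end up with English, because the system default itself has reverted to English.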

To change or add a locale

1 Advise users that anyone currently using the Enforce Server administration console must
save their work and log off.
2 Run the Language Pack Utility using the -c flag followed by the Java locale code for the
locale that you want to change or add. Enter:

LanguagePackUtility -c locale

where locale is a valid locale code recognized by Java, such as pt_PT for Portuguese.
For example, to change the locale to Brazilian Portuguese, enter:

LanguagePackUtility -c pt_BR

3 Log on to the Enforce Server administration console and confirm that the new alternate
locale is now available on the Edit General Settings screen. To do this, go to System >
Settings > General > Configure > Edit General Settings.
If you specify a locale for which there is no language pack, "Translations not available"
appears next to the locale name. This means that formatting and sort order are appropriate
for the locale, but the Enforce Server administration console screens and online Help are
not translated.

Note: Of the locales that are not based on a previously installed Symantec Data Loss Prevention
language pack, administrators can make only one available to users.

See “About support for character sets, languages, and locales” on page 91.
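The following hypothetical sketch models the two rules in this procedure: a locale without a language pack is shown with "Translations not available", and at most one such locale can be made available to users. The function name and the exact label format are assumptions, and the sketch also assumes that English always counts as translated.

```python
def locale_choices(available, translated):
    """Label the locales offered to users (hypothetical model).

    available: locale codes made available to users, in display order.
    translated: locale codes backed by English or an installed language pack.
    """
    untranslated = [code for code in available if code not in translated]
    # Only one additional locale may lack a language pack.
    if len(untranslated) > 1:
        raise ValueError(
            f"only one locale without a language pack is allowed, got {untranslated}"
        )
    return [
        code if code in translated else f"{code} (Translations not available)"
        for code in available
    ]
```

Formatting and sort order still follow the untranslated locale; only the console screens and online Help remain in the default language.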
Section 2
Managing the Enforce Server platform

■ Chapter 4. Managing Enforce Server services and settings

■ Chapter 5. Managing roles and users

■ Chapter 6. Connecting to group directories

■ Chapter 7. Managing stored credentials

■ Chapter 8. Managing system events and messages

■ Chapter 9. Managing the Symantec Data Loss Prevention database

■ Chapter 10. Working with Symantec Information Centric Encryption

■ Chapter 11. Working with Symantec Information Centric Tagging

■ Chapter 12. Adding a new product module

■ Chapter 13. Applying a Maintenance Pack

Chapter 4
Managing Enforce Server services and settings
This chapter includes the following topics:

■ About Symantec Data Loss Prevention services

■ About starting and stopping services on Windows

■ About starting and stopping services on Linux

About Symantec Data Loss Prevention services

The Symantec Data Loss Prevention services may need to be stopped and started periodically.
This section provides a brief description of each service and how to start and stop the services
on supported platforms.
The Symantec Data Loss Prevention services for the Enforce Server are described in the
following table:

Table 4-1 Symantec Data Loss Prevention Enforce Server services

Service Name                                Description

Symantec DLP Manager                        Provides the centralized reporting and management services for
                                            Symantec Data Loss Prevention.
                                            If you have more than 50 policies, 50 detection servers, or 50,000
                                            agents, increase the Max Memory for this service from 2048 to 4096.
                                            You can adjust this setting in the SymantecDLPManager.conf file.
                                            See “To increase memory for the Symantec DLP Manager service”
                                            on page 102.

Symantec DLP Detection Server Controller    Controls the detection servers.

Symantec DLP Notifier                       Provides the database notifications.

Symantec DLP Incident Persister             Writes the incidents to the database.

To increase memory for the Symantec DLP Manager service

1 Open the SymantecDLPManager.conf file in a text editor.
You can find this configuration file in one of the following locations:
■ Windows: \Program
Files\Symantec\DataLossPrevention\EnforceServer\Services

■ Linux: /opt/Symantec/DataLossPrevention/EnforceServer/Services

2 Change the value of the wrapper.java.maxmemory parameter to 4096.

wrapper.java.maxmemory = 4096

3 Save and close the file.
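Instead of editing SymantecDLPManager.conf by hand, you could script step 2. This Python sketch assumes the file contains an uncommented line of the form wrapper.java.maxmemory = <number>. It rewrites only that value, keeps the surrounding spacing, and saves a .bak copy of the file first. The function names set_max_memory and update_conf_file are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches an uncommented "wrapper.java.maxmemory = <number>" line and keeps the
# text before the number (key, spacing, "=") in group 1.
MAX_MEMORY = re.compile(
    r"^([ \t]*wrapper\.java\.maxmemory[ \t]*=[ \t]*)\d+[ \t]*$", re.MULTILINE
)


def set_max_memory(conf_text, megabytes=4096):
    """Return conf_text with wrapper.java.maxmemory set to megabytes."""
    new_text, count = MAX_MEMORY.subn(lambda m: f"{m.group(1)}{megabytes}", conf_text)
    if count == 0:
        raise ValueError("wrapper.java.maxmemory not found")
    return new_text


def update_conf_file(path, megabytes=4096):
    """Back up the file, then rewrite it with the new maximum memory value."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(set_max_memory(path.read_text(), megabytes))
```

The JVM heap size is fixed when the service starts, so the new value presumably takes effect only after the Symantec DLP Manager service is restarted.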

See “About starting and stopping services on Windows” on page 102.

About starting and stopping services on Windows

The procedures for starting and stopping services vary according to installation configurations
and between Enforce and detection servers.
■ See “Starting an Enforce Server on Windows” on page 103.
■ See “Stopping an Enforce Server on Windows” on page 103.
■ See “Starting a detection server on Windows” on page 104.
■ See “Stopping a detection server on Windows” on page 104.
■ See “Starting services on single-tier Windows installations” on page 104.
■ See “Stopping services on single-tier Windows installations” on page 105.

Starting an Enforce Server on Windows

Use the following procedure to start the Symantec Data Loss Prevention services on a Windows
Enforce Server.
To start the Symantec Data Loss Prevention services on a Windows Enforce Server
1 On the computer that hosts the Enforce Server, navigate to Start > All Programs >
Administrative Tools > Services to open the Windows Services menu.
2 Start the Symantec Data Loss Prevention services in the following order:
■ SymantecDLPNotifierService

■ SymantecDLPManagerService

■ SymantecDLPIncidentPersisterService

■ SymantecDLPDetectionServerControllerService

Note: Start the SymantecDLPNotifierService service before starting any other services.

See “Stopping an Enforce Server on Windows” on page 103.

Stopping an Enforce Server on Windows

Use the following procedure to stop the Symantec Data Loss Prevention services on a Windows
Enforce Server.
To stop the Symantec Data Loss Prevention services on a Windows Enforce Server
1 On the computer that hosts the Enforce Server, navigate to Start > All Programs >
Administrative Tools > Services to open the Windows Services menu.
2 From the Services menu, stop all running Symantec Data Loss Prevention services in the
following order:
■ SymantecDLPDetectionServerControllerService

■ SymantecDLPIncidentPersisterService

■ SymantecDLPManagerService

■ SymantecDLPNotifierService

See “Starting an Enforce Server on Windows” on page 103.


Starting a detection server on Windows

To start the Symantec Data Loss Prevention service on a Windows detection server
1 On the computer that hosts the detection server, navigate to Start > All Programs >
Administrative Tools > Services to open the Windows Services menu.
2 Start the SymantecDLPDetectionServerService service.
See “Stopping a detection server on Windows” on page 104.

Stopping a detection server on Windows

Use the following procedure to stop the Symantec Data Loss Prevention service on a Windows
detection server.
To stop the Symantec Data Loss Prevention service on a Windows detection server
1 On the computer that hosts the detection server, navigate to Start > All Programs >
Administrative Tools > Services to open the Windows Services menu.
2 Stop the SymantecDLPDetectionServerService service.
See “Starting a detection server on Windows” on page 104.

Starting services on single-tier Windows installations

Use the following procedure to start the Symantec Data Loss Prevention services on a single-tier
installation on Windows.
To start the Symantec Data Loss Prevention services on a single-tier Windows installation
1 On the computer that hosts the Symantec Data Loss Prevention server applications,
navigate to Start > All Programs > Administrative Tools > Services to open the Windows
Services menu.
2 Start the Symantec Data Loss Prevention services in the following order:
■ SymantecDLPNotifierService

■ SymantecDLPManagerService

■ SymantecDLPIncidentPersisterService

■ SymantecDLPDetectionServerControllerService

■ SymantecDLPDetectionServerService

Note: Start the SymantecDLPNotifierService service before starting other services.

See “Stopping services on single-tier Windows installations” on page 105.


Stopping services on single-tier Windows installations

Use the following procedure to stop the Symantec Data Loss Prevention services on a single-tier
installation on Windows.
To stop the Symantec Data Loss Prevention services on a single-tier Windows installation
1 On the computer that hosts the Symantec Data Loss Prevention server applications,
navigate to Start > All Programs > Administrative Tools > Services to open the Windows
Services menu.
2 From the Services menu, stop all running Symantec Data Loss Prevention services in the
following order:
■ SymantecDLPDetectionServerService

■ SymantecDLPDetectionServerControllerService

■ SymantecDLPIncidentPersisterService

■ SymantecDLPManagerService

■ SymantecDLPNotifierService

See “Starting services on single-tier Windows installations” on page 104.
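In the Windows procedures above, each documented stop order is the exact reverse of its start order. The following Python sketch encodes that rule for the Enforce Server and for single-tier installations and, by default, only prints the commands. It assumes you run it from an elevated command prompt on the server host and that the standard Windows net start and net stop commands are acceptable in place of the Services menu; service_commands and run are hypothetical helpers.

```python
import subprocess

# Start order from the Windows procedures in this section.
ENFORCE_START_ORDER = [
    "SymantecDLPNotifierService",
    "SymantecDLPManagerService",
    "SymantecDLPIncidentPersisterService",
    "SymantecDLPDetectionServerControllerService",
]
# A single-tier installation also runs the detection server service, last.
SINGLE_TIER_START_ORDER = ENFORCE_START_ORDER + ["SymantecDLPDetectionServerService"]


def service_commands(start_order, action):
    """Return "net start" or "net stop" commands in the documented order."""
    if action == "start":
        names = list(start_order)
    elif action == "stop":
        # On Windows, the documented stop order reverses the start order.
        names = list(reversed(start_order))
    else:
        raise ValueError(f"unknown action: {action}")
    return [["net", action, name] for name in names]


def run(commands, dry_run=True):
    """Print the commands, or run them one at a time and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(service_commands(SINGLE_TIER_START_ORDER, "stop"))
```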

About starting and stopping services on Linux

The procedures for starting and stopping services vary according to installation configurations
and between Enforce and detection servers.
■ See “Starting an Enforce Server on Linux” on page 105.
■ See “Stopping an Enforce Server on Linux” on page 106.
■ See “Starting a detection server on Linux” on page 106.
■ See “Stopping a detection server on Linux” on page 107.
■ See “Starting services on single-tier Linux installations” on page 107.
■ See “Stopping services on single-tier Linux installations” on page 107.

Starting an Enforce Server on Linux

Use the following procedure to start the Symantec Data Loss Prevention services on a Linux
Enforce Server.

To start the Symantec Data Loss Prevention services on a Linux Enforce Server
1 On the computer that hosts the Enforce Server, log on as root.
2 Start the Symantec DLP Notifier service by running the following command:

service SymantecDLPNotifierService start

3 Start the remaining Symantec Data Loss Prevention services by running the following
commands:

service SymantecDLPManagerService start
service SymantecDLPIncidentPersisterService start
service SymantecDLPDetectionServerControllerService start

See “Stopping an Enforce Server on Linux” on page 106.

Stopping an Enforce Server on Linux

Use the following procedure to stop the Symantec Data Loss Prevention services on a Linux
Enforce Server.
To stop the Symantec Data Loss Prevention services on a Linux Enforce Server
1 On the computer that hosts the Enforce Server, log on as root.
2 Stop all running Symantec Data Loss Prevention services by running the following
commands:

service SymantecDLPIncidentPersisterService stop
service SymantecDLPManagerService stop
service SymantecDLPDetectionServerControllerService stop
service SymantecDLPNotifierService stop

See “Starting an Enforce Server on Linux” on page 105.

Starting a detection server on Linux

Use the following procedure to start the Symantec Data Loss Prevention service on a Linux
detection server.
To start the Symantec Data Loss Prevention service on a Linux detection server
1 On the computer that hosts the detection server, log on as root.
2 Start the Symantec Data Loss Prevention service by running the following command:

service SymantecDLPDetectionServerService start


See “Stopping a detection server on Linux” on page 107.

Stopping a detection server on Linux

Use the following procedure to stop the Symantec Data Loss Prevention service on a Linux
detection server.
To stop the Symantec Data Loss Prevention service on a Linux detection server
1 On the computer that hosts the detection server, log on as root.
2 Stop the Symantec Data Loss Prevention service by running the following command:

service SymantecDLPDetectionServerService stop

See “Starting a detection server on Linux” on page 106.

Starting services on single-tier Linux installations

Use the following procedure to start the Symantec Data Loss Prevention services on a single-tier
installation on Linux.
To start the Symantec Data Loss Prevention services on a single-tier Linux installation
1 On the computer that hosts the Symantec Data Loss Prevention server applications, log
on as root.
2 Start the Symantec DLP Notifier service by running the following command:

service SymantecDLPNotifierService start

3 Start the remaining Symantec Data Loss Prevention services by running the following
commands:

service SymantecDLPManagerService start
service SymantecDLPDetectionServerService start
service SymantecDLPIncidentPersisterService start
service SymantecDLPDetectionServerControllerService start

See “Stopping services on single-tier Linux installations” on page 107.

Stopping services on single-tier Linux installations

Use the following procedure to stop the Symantec Data Loss Prevention services on a single-tier
installation on Linux.

To stop the Symantec Data Loss Prevention services on a single-tier Linux installation
1 On the computer that hosts the Symantec Data Loss Prevention servers, log on as root.
2 Stop all running Symantec Data Loss Prevention services by running the following
commands:

service SymantecDLPIncidentPersisterService stop
service SymantecDLPManagerService stop
service SymantecDLPDetectionServerService stop
service SymantecDLPDetectionServerControllerService stop
service SymantecDLPNotifierService stop

See “Starting services on single-tier Linux installations” on page 107.
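On Linux, the documented stop orders are not exact reversals of the start orders, so a script should copy each sequence as written. This Python sketch does that for the Enforce Server, detection server, and single-tier procedures above and, by default, only prints the service commands. It assumes you run it as root; linux_commands and run are hypothetical helpers.

```python
import subprocess

NOTIFIER = "SymantecDLPNotifierService"
MANAGER = "SymantecDLPManagerService"
PERSISTER = "SymantecDLPIncidentPersisterService"
CONTROLLER = "SymantecDLPDetectionServerControllerService"
DETECTION = "SymantecDLPDetectionServerService"

# Orders copied from the Linux procedures in this section. The stop orders
# are not simple reversals of the start orders, so each one is listed explicitly.
LINUX_ORDERS = {
    ("enforce", "start"): [NOTIFIER, MANAGER, PERSISTER, CONTROLLER],
    ("enforce", "stop"): [PERSISTER, MANAGER, CONTROLLER, NOTIFIER],
    ("detection", "start"): [DETECTION],
    ("detection", "stop"): [DETECTION],
    ("single-tier", "start"): [NOTIFIER, MANAGER, DETECTION, PERSISTER, CONTROLLER],
    ("single-tier", "stop"): [PERSISTER, MANAGER, DETECTION, CONTROLLER, NOTIFIER],
}


def linux_commands(installation, action):
    """Return the "service <name> <action>" commands for one installation type."""
    try:
        names = LINUX_ORDERS[(installation, action)]
    except KeyError:
        raise ValueError(f"unknown installation or action: {installation}, {action}") from None
    return [["service", name, action] for name in names]


def run(commands, dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(linux_commands("enforce", "start"))
```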

Chapter 5
Managing roles and users
This chapter includes the following topics:

■ About role-based access control

■ About configuring roles and users

■ About recommended roles for your organization

■ Roles included with solution packs

■ Configuring roles

■ Configuring user accounts

■ Configuring password enforcement settings

■ Resetting the Administrator password

■ Manage and add roles

■ Manage and add users

■ About authenticating users

■ Configuring user authentication

■ Integrating Active Directory for user authentication

■ About certificate authentication configuration

About role-based access control

Symantec Data Loss Prevention provides role-based access control to govern how users
access product features and functionality. For example, a role might let users view reports,
but prevent users from creating policies or deleting incidents. Or, a role might let users author
policy response rules but not detection rules.

Roles determine what a user can see and do in the Enforce Server administration console.
For example, the Report role is a specific role that is included in most Symantec Data Loss
Prevention solution packs. Users in the Report role can view incidents, create policies,
and configure Discover targets (if you are running a Discover Server). However, users in the
Report role cannot create Exact Data or Document Profiles. Also, users in the Report role
cannot perform system administration tasks. When a user logs on to the system in the Report
role, the Manage > Data Profiles and the System > Login Management modules in the
Enforce Server administration console are not visible to this user.
You can assign a user to more than one role. Membership in multiple roles allows a user to
perform different kinds of work in the system. For example, you grant the information security
manager user (InfoSec Manager) membership in two roles: ISR (information security first
responder) and ISM (information security manager). The InfoSec Manager can log on to the
system as either a first responder (ISR) or a manager (ISM), depending on the task(s) to
perform. The InfoSec Manager only sees the Enforce Server components appropriate for those
tasks.
You can also combine roles and policy groups to limit the policies and detection servers that
a user can configure. For example, you associate a role with the European Office policy group.
This role grants access to the policies that are designed only for the European office.
See “Policy deployment” on page 373.
Users who are assigned to multiple roles must specify the desired role when they log on. Consider
an example where you assign the user named "User01" to two roles, "Report" and "System
Admin." To log on to the system as a system administrator, "User01" would use the following
syntax: Login: System Admin\User01
See “Logging on and off the Enforce Server administration console” on page 84.
The Administrator user (created during installation) has access to every part of the system
and therefore is not a member of any access-control role.
See “About the administrator account” on page 85.

About configuring roles and users

When you install the Enforce Server, you create a default Administrator user that has access
to all roles. If you import a solution pack to the Enforce Server, the solution pack includes
several roles and users to get you started.
See “About the administrator account” on page 85.
You may want to add roles and users to the Enforce Server. When adding roles and users,
consider the following guidelines:
■ Understand the roles necessary for your business users and for the information security
requirements and procedures of your organization.

See “About recommended roles for your organization” on page 111.

■ Review the roles that were created when you installed a solution pack. You can likely use several
of them (or modified versions of them) for users in your organization.
See “Roles included with solution packs” on page 112.
■ If necessary, modify the solution-pack roles and create any required new roles.
See “Configuring roles” on page 114.
■ Create users and assign each of them to one or more roles.
See “Configuring user accounts” on page 121.
■ Manage roles and users and remove those not being used.
See “Manage and add roles” on page 126.
See “Manage and add users” on page 126.

About recommended roles for your organization

To determine the most useful roles for your organization, review your business processes and
security requirements.
Most businesses and organizations find the following roles fundamental when they implement
the Symantec Data Loss Prevention system:
■ System Administrator
This role provides access to the System module and associated menu options in the
Enforce Server administration console. Users in this role can monitor and manage the
Enforce Server and detection server(s). Users in this role can also deploy detection servers
and run Discover scans. However, users in this role cannot view detailed incident information
or author policies. All solution packs create a "Sys Admin" role that has system administrator
privileges.
■ User Administrator
This role grants users the right to manage users and roles. Typically this role grants no
other access or privileges. Because of the potential for misuse, it is recommended that no
more than two people in the organization be assigned this role (primary and backup).
■ Policy Administrator
This role grants users the right to manage policies and response rules. Typically this role
grants no other access or privileges. Because of the potential for misuse, it is recommended
that no more than two people in the organization be assigned this role (primary and backup).
■ Policy Author
This role provides access to the Policies module and associated menu options in the
Enforce Server administration console. This role is suited for information security managers
who track incidents and respond to risk trends. An information security manager can author
new policies or modify existing policies to prevent data loss. All solution packs create
an "InfoSec Manager" (ISM) role that has policy authoring privileges.
■ Incident Responder
This role provides access to the Incidents module and associated menu options in the
Enforce Server administration console. Users in this role can track and remediate incidents.
Businesses often have at least two incident responder roles that provide two levels of
privileges for viewing and responding to incidents.
A first-level responder may view generic incident information, but cannot access incident
details (such as sender or recipient identity). In addition, a first-level responder may also
perform some incident remediation, such as escalating an incident or informing the violator
of corporate security policies. A second-level responder might be an escalation responder
who has the ability to view incident details and edit custom attributes. A third-level responder
might be an investigation responder who can create response rules, author policies, and
create policy groups.
All solution packs create an "InfoSec Responder" (ISR) role. This role serves as a first-level
responder. You can use the ISM (InfoSec Manager) role to provide second-level responder
access.
Your business probably requires variations on these roles, as well as other roles. For more
ideas about these and other possible roles, see the descriptions of the roles that are imported
with solution packs.
See “Roles included with solution packs” on page 112.

Roles included with solution packs

The various solution packs offered with Symantec Data Loss Prevention create roles and users
when installed. For all solution packs there is a standard set of roles and users. You may see
some variation in those roles and users, depending on the solution pack you import.
The following table summarizes the Financial Services Solution Pack roles. These roles are
largely the same as the roles that are found in other Symantec Data Loss Prevention solution
packs.
See Table 5-1 on page 113.

Table 5-1 Financial Services Solution Pack roles

Role Name Description

Compliance Compliance Officer:

■ Users in this role can view, remediate, and delete incidents; look up attributes;
and edit all custom attributes.
■ This comprehensive role provides users with privileges to ensure that
compliance regulations are met. It also allows users to develop strategies for
risk reduction at a business unit (BU) level, and view incident trends and risk
scorecards.

Exec Executive:

■ Users in this role can view, remediate, and delete incidents; look up attributes;
and view all custom attributes.
■ This role provides users with access privileges to prevent data loss risk at the
macro level. Users in this role can review the risk trends and performance
metrics, as well as incident dashboards.

HRM HR Manager:

■ Users in this role can view, remediate, and delete incidents; look up attributes;
and edit all custom attributes.
■ This role provides users with access privileges to respond to the security
incidents that are related to employee breaches.

Investigator Incident Investigator:

■ Users in this role can view, remediate, and delete incidents; look up attributes;
and edit all custom attributes.
■ This role provides users with access privileges to research details of incidents,
including forwarding incidents to forensics. Users in this role may also
investigate specific employees.

ISM InfoSec Manager:

■ Users in this role can view, remediate, and delete incidents. They can look
up attributes, edit all custom attributes, and author policies and response rules.
■ This role provides users with second-level incident response privileges. Users
can manage escalated incidents within the information security team.

ISR InfoSec Responder:

■ Users in this role can view, remediate, and delete incidents; look up attributes;
and view or edit some custom attributes. They have no access to sender or
recipient identity details.
■ This role provides users with first-level incident response privileges. Users
can view policy incidents, find broken business processes, and enlist the
support of the extended remediation team to remediate incidents.

Report Reporting and Policy Authoring:

■ Users in this role can view and remediate incidents, and author policies. They
have no access to incident details.
■ This role provides a single role for policy authoring and data loss risk
management.

Sys Admin System administrator:

■ Users in this role can administer the system and the system users, and can
view incidents. They have no access to incident details.

Configuring roles
Each Symantec Data Loss Prevention user is assigned to one or more roles that define the
privileges and rights that user has within the system. A user’s role determines system
administration privileges, policy authoring rights, incident access, and more. If a user is a
member of multiple roles, the user must specify the role when logging on, for example: Login:
Sys Admin/sysadmin01.

See “About role-based access control” on page 109.

See “About configuring roles and users” on page 110.
To configure a role
1 Navigate to the System > Login Management > Roles screen.
2 Click Add Role.
The Configure Role screen appears, displaying the following tabs: General, Incident
Access, Policy Management, and Users.
3 In the General tab:
■ Enter a unique Name for the role. The name field is case-sensitive and is limited to
30 characters. The name you enter should be short and self-describing. Use the
Description field to annotate the role name and explain its purpose in more detail.
The role name and description appear in the Role List screen.
■ In the User Privileges section, you grant user privileges for the role.
System privilege(s):

User Administration (Superuser) Select the User Administration option to enable users to create
additional roles and users in the Enforce Server.

Server Administration Select the Server Administration option to enable users to perform the
following functions:
■ Configure detection servers.
■ Create and manage Data Profiles for Exact Data Matching (EDM),
Form Recognition, Indexed Document Matching (IDM), and Vector
Machine Learning (VML).
■ Configure and assign incident attributes.
■ Configure system settings.
■ Configure response rules.
■ Create policy groups.
■ Configure recognition protocols.
■ View system event and traffic reports.
■ Import policies.
Note: Selecting Server Administration also provides Agent Management
privileges.

Agent Management Select the Agent Management option to enable users to perform the
following functions:
■ Review agent status
■ Review agent events
■ Manage agents and perform troubleshooting tasks
■ Delete, restart, and shut down agents
■ Change the Endpoint Server to which agents connect
■ Pull agent logs
■ Access agent summary reports
■ Add and update agent configurations
■ Manage and create agent groups
■ View agent group conflicts
■ Review server logs
■ Manage server logs, including canceling log collection, configuring
logs, and downloading and deleting logs

People privilege:

User Reporting (Risk Summary, User Snapshot) Select the User Reporting option to enable users
to view the user risk summary.
Note: The Incident > View privilege is automatically enabled for all incident types for users
with the User Reporting privilege.
See “About user risk” on page 1973.

■ In the Incidents section, you grant users in this role the following incident privilege(s).
These settings apply to all incident reports in the system, including the Executive
Summary, Incident Summary, Incident List, and Incident Snapshots.

View Select the View option to enable users in this role to view policy violation
incidents.
You can customize incident viewing access by selecting various Actions
and Display Attribute options as follows:
■ By default, the View option is enabled (selected) for all types of
incidents: Network Incidents, Discover Incidents, and Endpoint
Incidents.
■ To restrict viewing access to only certain incident types, select
(highlight) the type of incident you want to authorize this role to view.
(Hold down the Ctrl key to make multiple selections.) If a role does
not allow a user to view part of an incident report, the option is
replaced with "Not Authorized" or is blank.
Note: If you revoke an incident-viewing privilege for a role, the system
deletes any saved reports for that role that rely on the revoked privilege.
For example, if you revoke (deselect) the privilege to view network
incidents, the system deletes any saved network incident reports
associated with the role.

Actions Select among the following Actions to customize the actions a user can
perform when an incident occurs:
■ Remediate Incidents
This privilege lets users change the status or severity of an incident,
set a data owner, add a comment to the incident history, set the Do
Not Hide and Allow Hiding options, and execute response rule
actions. In addition, if you are using the Incident Reporting and Update
API, select this privilege to remediate the location and status attributes.
■ Smart Response Rules to execute
You specify which Smart Response Rules can be executed on
a per-role basis. Configured Smart Response Rules are listed in the
"Available" column on the left. To expose a Smart Response Rule
for execution by a user of this role, select it and click the arrow to add
it to the right-side column. Use the CTRL key to select multiple rules.
■ Perform attribute lookup
Lets users look up incident attributes from external sources and
populate their values for incident remediation.
■ Delete incidents
Lets users delete an incident.
■ Hide incidents
Lets users hide an incident.
■ Unhide incidents
Lets users restore previously hidden incidents.
■ Export Web archive
Lets users export a report that the system compiles from a Web
archive of incidents.
■ Export XML
Lets users export a report of incidents in XML format.
■ Email incident report as CSV attachment
Lets users email a report that contains a comma-separated listing of
incident details as an attachment.

Incident Reporting and Update API Select among the following user privileges to enable access for Web
Services clients that use the Incident Reporting and Update API or the
deprecated Reporting API:
■ Incident Reporting
Enables Web Services clients to retrieve incident details.
■ Incident Update
Enables Web Services clients to update incident details. (Does not
apply to clients that use the deprecated Reporting API.)

See the Symantec Data Loss Prevention Incident Reporting and Update
API Developers Guide for more information.

Display Attributes Select among the following Display Attributes to customize what
attributes appear in the Incidents view for the policy violations that users
of the role can view.

Shared attributes are common to all types of incidents:

■ Matches
The highlighted text of the message that violated the policy appears
on the Matches tab of the Incident Snapshot screen.
■ History
The incident history.
■ Body
The body of the message.
■ Attachments
The names of any attachments or files.
■ Sender
The message sender.
■ Recipients
The message recipients.
■ Subject
The subject of the message.
■ Original Message
Controls whether or not the original message that caused the policy
violation incident can be viewed.
Note: To view an attachment properly, both the "Attachments" and the
"Original Message" options must be checked.

Endpoint attributes are specific to Endpoint incidents:

■ Username
The name of the Endpoint user.
■ Machine name
The name of the computer where the Endpoint Agent is installed.
Discover attributes are specific to Discover incidents:
■ File Owner
The name of the owner of the file being scanned.
■ Location
The location of the file being scanned.

Custom Attributes The Custom Attributes list includes all of the custom attributes
configured by your system administrator, if any.
■ Select View All if you want users to be able to view all custom attribute
values.
■ Select Edit All if you want users to edit all custom attribute values.
■ To restrict the users to certain custom attributes, clear the View All
and Edit All check boxes and individually select the View and/or Edit
check box for each custom attribute you want viewable or editable.
Note: If you select Edit for any custom attribute, the View check box is
automatically selected (indicated by being grayed out). If you want the
users in this role to be able to view all custom attribute values, select
View All.

■ In the Discover section, you grant users in this role the following privileges:

Folder Risk Reporting This privilege lets users view Folder Risk Reports. Refer to the Symantec
Data Loss Prevention Data Insight Implementation Guide.
Note: This privilege is only available for Symantec Data Loss Prevention
Data Insight licenses.

Content Root Enumeration This privilege lets users configure and run Content Root Enumeration
scans. For more information about Content Root Enumeration scans, see
“Working with Content Root Enumeration scans” on page 2162.
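The custom attribute rules described above (View All, Edit All, and "Edit implies View") can be summarized in a small function. This is a minimal sketch for illustration only, not how the Enforce Server stores role settings; the function name and the attribute names in the example are hypothetical. It assumes that Edit All, like a per-attribute Edit selection, also grants viewing.

```python
from typing import Dict, Set, Tuple


def effective_custom_attribute_access(
    all_attributes: Set[str],
    view_all: bool,
    edit_all: bool,
    view: Set[str],
    edit: Set[str],
) -> Dict[str, Tuple[bool, bool]]:
    """Return {attribute: (can_view, can_edit)} for one role.

    View All / Edit All apply to every attribute; selecting Edit for an
    attribute automatically grants View for it (assumed to hold for Edit All too).
    """
    result = {}
    for attr in sorted(all_attributes):
        can_edit = edit_all or attr in edit
        can_view = view_all or attr in view or can_edit  # Edit implies View
        result[attr] = (can_view, can_edit)
    return result
```

For example, a role with View selected for "Region" and Edit selected for "Case ID" can view both attributes but can edit only "Case ID".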

4 In the Incident Access tab, configure any conditions (filters) on the types of incidents
that users in this role can view.

Note: You must select the View option on the General tab for settings on the Incident
Access tab to have any effect.

To add an Incident Access condition:

■ Click Add Condition.
■ Select the type of condition and its parameters from left to right, as if writing a sentence.
(Note that the first drop-down list in a condition contains the alphabetized
system-provided conditions that are associated with any custom attributes.)
For example, select Policy Group from the first drop-down list, select Is Any Of from
the second list, and then select Default Policy Group from the final listbox. These
settings would limit users to viewing only those incidents that the default policy group
detected.
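The Policy Group example above reads as a filter over incidents. The following minimal sketch models an "Is Any Of" condition; the class and function names are hypothetical, and it assumes that when a role has several Incident Access conditions, an incident must satisfy all of them to be visible.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class IsAnyOfCondition:
    """Hypothetical model of one condition, read left to right:
    <field> Is Any Of <values>."""
    field: str
    values: frozenset

    def matches(self, incident: Dict[str, str]) -> bool:
        return incident.get(self.field) in self.values


def visible_incidents(incidents: Iterable[Dict[str, str]],
                      conditions: List[IsAnyOfCondition]) -> List[Dict[str, str]]:
    # Assumption: multiple conditions are combined with AND.
    return [i for i in incidents if all(c.matches(i) for c in conditions)]


# Usage: limit a role to incidents from the Default Policy Group.
default_only = IsAnyOfCondition("Policy Group", frozenset({"Default Policy Group"}))
```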

5 In the Policy Management tab, select one or more of the following policy privileges for the role:

■ Import Policies
This privilege lets users import policy files that have been exported from an Enforce
Server.
To enable this privilege, the role must also have the Server Administration, Author
Policies, Author Response Rules, and All Policy Groups privileges.
■ Author Policies
This privilege lets users add, edit, and delete policies within the selected policy
groups.
It also lets users modify system data identifiers, create custom data identifiers, and
create and modify User Groups.
This privilege does not let users create or manage Data Profiles. This activity requires
Enforce Server administrator privileges.
■ Discover Scan Control
Lets the users in this role create Discover targets, run scans, and view Discover
Servers.
■ Credential Management
Lets users create and modify the credentials that the system requires to access target
systems and perform Discover scans.
■ Policy Groups
Select All Policy Groups only if users in this role need access to all existing policy
groups and any that will be created in the future.
Otherwise you can select individual policy groups or the Default Policy Group.

Note: These options do not grant the right to create, modify, or delete policy groups.
Only the users whose role includes the Server Administration privilege can work with
policy groups.

■ Author Response Rules

Enables users in this role to create, edit, and delete response rules.

Note: Users cannot edit or author response rules for policy remediation unless you
select the Author Response Rules option.

Note: Preventing users from authoring response rules does not prevent them from executing
response rules. For example, a user with no response-rule authoring privileges can still
execute smart response rules from an incident list or incident snapshot.
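Because Import Policies depends on four other privileges, a quick prerequisite check can help when you plan roles. This minimal sketch uses the privilege names from the Import Policies description above as plain strings; the constant and function names are hypothetical and do not correspond to any Enforce Server API.

```python
from typing import Set

# Prerequisites listed above for the Import Policies privilege.
IMPORT_POLICIES_PREREQUISITES = frozenset({
    "Server Administration",
    "Author Policies",
    "Author Response Rules",
    "All Policy Groups",
})


def missing_for_import_policies(role_privileges: Set[str]) -> Set[str]:
    """Return the prerequisites a role still lacks before Import Policies can be enabled."""
    return set(IMPORT_POLICIES_PREREQUISITES - role_privileges)
```

An empty result means that the role already has every privilege that Import Policies requires.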

6 In the Users tab, select the users to whom you want to assign this role. If you have not yet
configured any users, you can assign users to roles after you create the users.
7 Click Save to save your newly created role to the Enforce Server database.

Configuring user accounts

User accounts are the means by which users log on to the system and perform tasks. The role
that the user account belongs to limits what the user can do in the system.
To configure a user account:
1 In the Enforce Server Administration Console, select System > Login Management >
DLP Users to create a new user account or to reconfigure an existing user account. Or,
click Profile to reconfigure the user account with which you are currently logged on.
2 Click Add DLP User to add a new user, or click the name of an existing user to modify
that user's configuration.
3 Enter a name for a new user account in the Name field.
■ The user account name must be between 8 and 30 characters long and cannot contain
backslashes (\). User account names are case-sensitive.
■ If you use certificate authentication, the Name field value does not have to match the
user's Common Name (CN). However, you may choose to use the same value for
both the Name and Common Name (CN) so that you can easily locate the configuration
for a specific CN. The Enforce Server administration console shows only the Name
field value in the list of configured users.
■ If you are using Active Directory authentication, the user account name must match
the name of the Active Directory user account. Note that all Symantec Data Loss
Prevention user names are case-sensitive, even though Active Directory user names
are not. Active Directory users must enter the case-sensitive account name
when logging on to the Enforce Server administration console.
See “Integrating Active Directory for user authentication” on page 137.
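If you provision accounts in bulk, you might check candidate names against the rules in step 3 before you enter them. This is a minimal sketch of those documented rules only; the function names are hypothetical, and the Enforce Server may apply additional checks that are not listed here.

```python
from typing import List


def validate_user_name(name: str) -> List[str]:
    """Return a list of rule violations for a proposed DLP user name (empty if valid)."""
    problems = []
    if not 8 <= len(name) <= 30:
        problems.append("must be between 8 and 30 characters long")
    if "\\" in name:
        problems.append("must not contain a backslash")
    return problems


def names_match(entered: str, configured: str) -> bool:
    # DLP user names are case-sensitive, so compare exactly (no case folding).
    return entered == configured
```

The exact comparison in `names_match` reflects the note above: an Active Directory user who types a name in different case from the configured account name does not match it.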

4 Configure the Authentication section of the Configure User page. Only options that are
enabled are available on this page.

Option Instructions

Use Single Sign On Mapping If SAML authentication is enabled, the user can sign on using Single Sign On Mapping
on the Configure User page.

Use Password access Select this option to use password authentication and allow the user to sign on using the
Enforce Server administration console logon page. This option is required if the user account
will be used for a Reporting API Web Service client.

If you select this option, also enter the user password in the Password and the Re-enter
Password fields. The password must be at least eight characters long and is case-sensitive.
For security purposes, the password is obfuscated and each character appears as an asterisk.

If you configure advanced password settings, the user must specify a strong password. In
addition, passwords can be set to expire, in which case the user must define a new one
periodically.

See “Configuring password enforcement settings” on page 124.

You can choose password authentication even if you also use certificate authentication. If you
use certificate authentication, you can optionally disable sign-on from the Enforce Server
administration console logon page.

See “Disabling password authentication and forms-based logon” on page 154.

Symantec Data Loss Prevention authenticates all Reporting API clients using password
authentication. If you configure Symantec Data Loss Prevention to use certificate authentication,
any user account that is used to access the Reporting API Web Service must have a valid
password. See the Symantec Data Loss Prevention Reporting API Developers Guide.
Note: If you configure Active Directory integration with the Enforce Server, users authenticate
using their Active Directory passwords. In this case the password field does not appear on
the Users screen.

See “Integrating Active Directory for user authentication” on page 137.


Use Certificate authentication Select this option to use certificate authentication and allow the user to sign on
automatically (single sign-on) with a certificate that is generated by a separate Public Key Infrastructure (PKI). This
option is available only if you have manually configured support for certificate authentication.

See “About authenticating users” on page 127.

See “About certificate authentication configuration” on page 142.

If you select this option, you must specify the common name (CN) value for the user in the
Common Name (CN) field. The CN value appears in the Subject field of the user's certificate,
which is generated by the PKI. Common names generally use the format first_name
last_name identification_number.

The Enforce Server uses the CN value to map the certificate to this user account. If an
authenticated certificate contains the specified CN value, all other attributes of this user
account, such as the default role and reporting preferences, are applied when the user logs
on.
Note: You cannot specify the same Common Name (CN) value in multiple Enforce Server
user accounts.

Account Disabled Select this option to lock the user out of the Enforce Server administration console. This option
disables access for the user account regardless of which authentication mechanism you use.

For security, after a certain number of consecutive failed logon attempts, the system
automatically disables the account and locks out the user. In this case the Account Disabled
option is checked. To reinstate the user account and allow the user to log on to the system,
clear this option.
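The CN rules in the table above (one account per CN, and no access for a disabled account) can be modeled with a simple index. This is a minimal in-memory sketch for illustration; the class names are hypothetical and do not reflect how the Enforce Server stores user accounts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserAccount:
    name: str
    common_name: str
    disabled: bool = False


class CommonNameRegistry:
    """Hypothetical index from certificate CN to Enforce user account."""

    def __init__(self) -> None:
        self._by_cn: Dict[str, UserAccount] = {}

    def add(self, account: UserAccount) -> None:
        # A CN value cannot be specified in more than one user account.
        if account.common_name in self._by_cn:
            raise ValueError(f"CN already assigned: {account.common_name}")
        self._by_cn[account.common_name] = account

    def resolve(self, certificate_cn: str) -> Optional[UserAccount]:
        # Account Disabled blocks access regardless of the authentication mechanism.
        account = self._by_cn.get(certificate_cn)
        if account is None or account.disabled:
            return None
        return account
```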

5 Optionally enter an Email Address and select a Language for the user in the General
section of the page. The Language selection depends on the language pack(s) you have
installed.
6 In the Report Preferences section of the Users screen, specify how this user receives
incident reports, including Text File Encoding and CSV Delimiter.
If the role grants the privilege for XML Export, you can choose to include incident violations
and incident history in the XML export.
7 In the Roles section, select the roles that are available to this user to assign data and
incident access privileges.
You must assign the user at least one role to access the Enforce Server administration
console.
See “Configuring roles” on page 114.

8 Select the Default Role to assign to this user at log on.

The default role is applied if no specific role is requested when the user logs on.
For example, the Enforce Server administration console uses the default role if the user
uses single sign-on with certificate authentication or uses the logon page.

Note: Individual users can change their default role by clicking Profile and selecting a
different option from the Default Role menu. The new default role is applied at the next
logon.

See “About authenticating users” on page 127.

9 Click Save to save the user configuration.

Note: Once you have saved a new user, you cannot edit the user name.

10 Manage users and roles as necessary.

See “Manage and add roles” on page 126.
See “Manage and add users” on page 126.

Configuring password enforcement settings

At the System > Settings > General screen, you can require users to use strong passwords.
Strong passwords must contain at least eight characters, at least one number, and at least
one uppercase letter. Strong passwords cannot have more than two repeated characters in a
row. If you enable strong passwords, the effect is system-wide. Existing users without a strong
password must update their profiles at next logon.
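To pre-check candidate passwords, for example in an onboarding script, you can encode the strong-password rules above as follows. This is a minimal sketch of the documented rules only, not the Enforce Server's own validation code, and the function name is hypothetical.

```python
import re


def is_strong_password(password: str) -> bool:
    """Check a password against the documented strong-password rules."""
    if len(password) < 8:
        return False  # at least eight characters
    if not any(ch.isdigit() for ch in password):
        return False  # at least one number
    if not any(ch.isupper() for ch in password):
        return False  # at least one uppercase letter
    # Reject three or more identical characters in a row ("ccc", "111").
    if re.search(r"(.)\1\1", password):
        return False
    return True
```

Two repeated characters ("Secc1234") pass, while three ("Seccc123") fail, which matches the "no more than two repeated characters in a row" rule.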
You can also require users to change their passwords at regular intervals. In this case at the
end of the interval you specify, the system forces users to create a new password.
If you use Active Directory authentication, these password settings apply only to the
Administrator password. All other user account passwords are derived from Active Directory.
See “Integrating Active Directory for user authentication” on page 137.

To configure advanced authentication settings

1 Go to System > Settings > General and click Configure.
2 To require strong passwords, locate the DLP User Authentication section and select
Require Strong Passwords.
Symantec Data Loss Prevention prompts existing users who do not have strong passwords
to create one at next logon.
3 To set the period for which passwords remain valid, type a number (representing the
number of days) in the Password Rotation Period field.
To let passwords remain valid forever, type 0 (zero).
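The Password Rotation Period setting can be modeled as a simple date calculation. This is a minimal sketch; the function names are hypothetical, and it assumes that a password counts as expired starting on the day the rotation period ends. The product's exact cut-off is not stated here.

```python
from datetime import date, timedelta
from typing import Optional


def password_expiry(last_changed: date, rotation_days: int) -> Optional[date]:
    """Return the date a password expires, or None if it never expires.

    rotation_days mirrors the Password Rotation Period field: 0 means
    passwords remain valid forever.
    """
    if rotation_days < 0:
        raise ValueError("rotation period cannot be negative")
    if rotation_days == 0:
        return None
    return last_changed + timedelta(days=rotation_days)


def must_change_password(last_changed: date, rotation_days: int, today: date) -> bool:
    # Assumption: the password is treated as expired on the expiry date itself.
    expiry = password_expiry(last_changed, rotation_days)
    return expiry is not None and today >= expiry
```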

Resetting the Administrator password

Symantec Data Loss Prevention provides the AdminPasswordReset utility to reset the
Administrator's password. There is no method to recover a lost password, but you can use
this utility to assign a new password. You can also use this utility if certificate authentication
mechanisms are disabled and you have not yet defined a password for the Administrator
account.
To use the AdminPasswordReset utility, you must specify the password to the Enforce Server
database. Use the following procedure to reset the password.
To reset the Administrator password for forms-based logon
1 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not change permissions or ownership on any configuration file from another
root or Administrator account.

2 Change directory to /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/bin
(Linux) or c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\bin
(Windows). If you installed Symantec Data Loss Prevention into a different directory,
substitute the correct path.
3 Execute the AdminPasswordReset utility using the following syntax:

AdminPasswordReset -dbpass oracle_password -newpass new_administrator_password

Replace oracle_password with the password to the Enforce Server database, and replace
new_administrator_password with the password you want to set.
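If you script the reset, for example in a recovery runbook, the following minimal Python sketch builds the command from step 3 as an argument list and runs it without a command shell, so special characters in either password are not interpreted by a shell. It assumes that the utility can be invoked as AdminPasswordReset from the bin directory in step 2 and uses only the -dbpass and -newpass options shown above. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_reset_command(bin_dir: Path, db_password: str, new_admin_password: str) -> List[str]:
    """Build the AdminPasswordReset argument list shown in step 3."""
    return [
        str(bin_dir / "AdminPasswordReset"),  # assumed executable name and location
        "-dbpass", db_password,
        "-newpass", new_admin_password,
    ]


def reset_admin_password(bin_dir: Path, db_password: str, new_admin_password: str) -> None:
    # A list argument (no shell=True) passes each password as a single, literal argument.
    subprocess.run(build_reset_command(bin_dir, db_password, new_admin_password),
                   cwd=bin_dir, check=True)
```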

Manage and add roles

The System > Login Management > Roles screen displays an alphabetical list of the roles
that are defined for your organization.
Roles listed on this screen display the following information:
■ Name – The name of the role
■ Description – A brief description of the role
Assuming that you have the appropriate privileges, you can view, add, modify, or delete roles
as follows:
■ Add a new role, or modify an existing one.
Click Add Role to begin adding a new role to the system.
Click anywhere in a row or the pencil icon (far right) to modify that role.
See “Configuring roles” on page 114.
■ Click the red X icon (far right) to delete the role; a dialog box confirms the deletion.
Before editing or deleting roles, note the following guidelines:
■ If you change the privileges for a role, users in that role who are currently logged on to the
system are not affected. For example, if you remove the Edit privilege for a role, users
currently logged on retain permission to edit custom attributes for that session. However,
the next time users log on, the changes to that role take effect, and those users can no
longer edit custom attributes.
■ If you revoke an incident-viewing privilege for a role, the Enforce Server automatically
deletes any saved reports that rely on the revoked privilege. For example, if you revoke
the privilege to view network incidents, the system deletes any saved network incident
reports associated with the newly restricted role.
■ Before you can delete a role, you must make sure there are no users associated with the
role.
■ When you delete a role, you delete all shared saved reports that a user in that role saved.
See “Manage and add users” on page 126.
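Two of the guidelines above lend themselves to a small model: privilege changes reach a user only at the next logon, and a role can be deleted only after it has no users. The following in-memory sketch is illustrative only. The names are hypothetical, and it does not model the deletion of saved reports.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Session:
    user: str
    # Privileges are copied when the user logs on, so later edits to the
    # role do not change an existing session.
    privileges: FrozenSet[str]


def log_on(user: str, role: str, role_privileges: Dict[str, Set[str]]) -> Session:
    return Session(user, frozenset(role_privileges[role]))


def delete_role(role: str, role_privileges: Dict[str, Set[str]],
                role_members: Dict[str, Set[str]]) -> None:
    # A role can be deleted only after all users have been removed from it.
    if role_members.get(role):
        raise ValueError(f"role {role!r} still has users assigned")
    del role_privileges[role]
    role_members.pop(role, None)
```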

Manage and add users

The System > Login Management > DLP Users screen lists all the active user accounts in
the system.
For each user account, the following information is listed:
■ User Name – The name the user enters to log on to the Enforce Server
■ Email – The email address of the user

■ Access – The role(s) in which the user is a member

Assuming that you have the appropriate privileges, you can add, edit, or delete user accounts
as follows:
■ Add a new user account, or modify an existing one.
Click Add to begin adding a new user to the system.
Click anywhere in a row or the pencil icon (far right) to view and edit that user account.
See “Configuring user accounts” on page 121.
■ Click the red X icon (far right) to delete the user account; a dialog box confirms the deletion.

Note: The Administrator account is created during installation and cannot be removed from the
system.

Note: When you delete a user account, you also delete all private saved reports that are
associated with that user.

See “Manage and add roles” on page 126.

About authenticating users

Enforce Server administration console logon authentication options include SAML, forms-based,
Active Directory/Kerberos, and certificate.
Table 5-2 provides the descriptions of these mechanisms for authenticating users to the Enforce
Server administration console:

Table 5-2 Enforce Server authentication mechanisms

Authentication mechanism    Sign-on mechanism    Description

SAML authentication    Single sign-on    With SAML authentication, the Enforce Server administration console
authenticates each user by validating the supplied email, user name,
or other user attributes that map to attributes the identity provider uses.

When SAML is enabled, users access the Enforce Server Admin console
URL and are redirected to the identity provider logon page, where they
enter their credentials. After they are authenticated with the identity
provider, their user attributes are sent to the Enforce Server. The
Enforce Server attempts to find a user with matching attributes. If the
user is found, they are logged on to the Enforce Server administration
console.

Configuration template file used:

springSecurityContext-SAML.xml

See “About SAML authentication” on page 131.

Password authentication    Forms-based sign-on    With password authentication, the Enforce Server administration console
authenticates each user. It determines whether the supplied user name and
password combination matches an active user account in the Enforce
Server configuration. An active user account is authenticated if it has
been assigned a valid role.

Users enter their credentials into the Enforce Server administration
console's logon page and submit them over an HTTPS connection to
the Tomcat container that hosts the administration console.

With password authentication, you must configure the user name and
password of each user account directly in the Enforce Server
administration console. You must also ensure that each user account
has at least one assigned role.

Configuration template file used:

springSecurityContext-Form.xml

See “Manage and add users” on page 126.


Active Directory authentication    Forms-based sign-on    With Microsoft Active Directory authentication, the Enforce Server
administration console first evaluates a supplied user name to determine
whether the name exists in a configured Active Directory server. If the user
name exists in Active Directory, the supplied password for the user is
evaluated against the Active Directory password. Any password that is
configured in the Enforce Server configuration is ignored.

With Active Directory authentication, you must configure a user account
for each new Active Directory user in the Enforce Server administration
console. When you upgrade to Symantec Data Loss Prevention 15,
your existing users do not have to be set up again.

You do not have to enter a password for an Active Directory user
account. You can switch to Active Directory authentication after you
have already created user accounts in the system. However, only those
existing user names that match Active Directory user names remain
valid after the switch.

Configuration template file used:

springSecurityContext-Kerberos.xml

See “Verifying the Active Directory connection” on page 140.


Certificate authentication    Single sign-on from Public Key Infrastructure (PKI)    Certificate authentication enables a user to automatically log on to the
Enforce Server administration console using an X.509 client certificate.
This certificate is generated by your public key infrastructure (PKI). To
use certificate-based single sign-on, you must first enable certificate
authentication as described in this section.

See “Configuring certificate authentication for the Enforce Server
administration console” on page 144.

The client certificate must be delivered to the Enforce Server when a
client's browser performs the SSL handshake with the Enforce Server
administration console. For example, you might use a smart card reader
and middleware with your browser to automatically present a certificate
to the Enforce Server. Or, you might obtain an X.509 certificate from a
certificate authority. Then you would upload the certificate to a browser
that is configured to send the certificate to the Enforce Server.

When a user accesses the Enforce Server administration console, the
PKI automatically delivers the user's certificate to the Tomcat container
that hosts the administration console. The Tomcat container validates
the client certificate using the certificate authorities that you have
configured in the Tomcat trust store.

Configuration template file used:

springSecurityContext-Certificate.xml

See “Adding certificate authority (CA) certificates to the Tomcat trust
store” on page 146.

The Enforce Server administration console uses the validated certificate
to determine whether the certificate has been revoked.

See “About certificate revocation checks” on page 150.

If the certificate is valid and has not been revoked, then the Enforce
Server uses the common name (CN) in the certificate to determine if
that CN is mapped to an active user account with a role in the Enforce
Server configuration. For each user that accesses the Enforce Server
administration console using certificate-based single sign-on, you must
create a user account in the Enforce Server that defines the
corresponding user's CN value. You must also assign one or more valid
roles to the user account.
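The Active Directory row of Table 5-2 describes a sequence of checks: the name must exist in Active Directory, the password is checked against Active Directory only, and a matching, case-sensitive Enforce user account with at least one role must exist. The following minimal sketch walks through that sequence. The names are hypothetical, and the `ad_accounts` dictionary is only a stand-in for a real Active Directory lookup, not a way to store credentials.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DlpUser:
    name: str
    roles: List[str] = field(default_factory=list)
    enforce_password: Optional[str] = None  # ignored under Active Directory authentication


def authenticate_with_ad(name: str, password: str,
                         dlp_users: Dict[str, DlpUser],
                         ad_accounts: Dict[str, str]) -> bool:
    """ad_accounts stands in for the directory: {lower-cased user name: password}."""
    # 1. The name must exist in Active Directory (AD names are not case-sensitive).
    ad_password = ad_accounts.get(name.lower())
    if ad_password is None:
        return False
    # 2. The password is evaluated against Active Directory only.
    if password != ad_password:
        return False
    # 3. A matching Enforce user account (case-sensitive) with a role must exist.
    user = dlp_users.get(name)
    return user is not None and len(user.roles) > 0
```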

Here are some important things to note when you set up SAML authentication.
■ You must restart the manager when you change the way you authenticate users in SAML.
If you change the mapping criteria in the springSecurityContext file for SAML without
restarting the manager, users fall out of sync, because the system continues to use the
previous version of the file. For example, if you change the mapping criteria from user name
to email address, you must restart the manager.
■ You must remap each user when you change the way you map users in SAML. Changing
the mapping criteria invalidates existing user mappings.
■ You must validate the XML syntax before you restart the manager. Some characters that
can be part of a user attribute, such as "&", make the XML invalid. Replace these
characters with their XML escape strings. For example, instead of "&" use "&amp;".
■ Do not delete any XML nodes in the XML files.
■ Attribute names in XML must exactly match (including case) attribute names in the identity
provider.
■ When switching from forms-based to SAML authentication, you must go through each user
and disable password access for non-Web Services users.
■ When switching from Certificate authentication to SAML authentication, make sure that the
clientAuth value in server.xml is set to false.
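The escaping and validation advice in the list above can be checked with the Python standard library before you restart the manager. This minimal sketch escapes a user attribute value for use inside a double-quoted XML attribute and tests whether a piece of XML is well-formed. The helper names are hypothetical. To check a whole file, you could call `ET.parse` on its path instead of `ET.fromstring`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_attribute_value(value: str) -> str:
    """Escape a user attribute value for use inside a double-quoted XML attribute."""
    # escape() handles &, < and >; the extra entity covers the double quote.
    return escape(value, {'"': "&quot;"})


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

For example, an attribute value of R&D must be written as R&amp;D; the unescaped form makes the parser reject the document.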

See “Configuring user authentication” on page 131.

Configuring user authentication

About SAML authentication
SAML (Security Assertion Markup Language) user authentication is now available for logging
on to the Enforce Server administration console. SAML is an XML-based open standard data
format for exchanging authentication and authorization data between service providers and
identity providers. DLP is the service provider.
Before using SAML, you must set up the service provider and the identity provider, and map the
user attributes that identify the user.
Three types of mapping are available: by email, by user name, and by custom user attributes.
When you use SAML, the ROLE\USERNAME logon for local users is not supported.
Symantec supports the following identity providers, both on-premises and cloud based:
■ SAM (Symantec Access Manager)
■ Okta
■ SSOCircle
See the Symantec Data Loss Prevention System Requirements Guide at
http://www.symantec.com/docs/doc10602 for updates on supported IdPs.
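Whichever mapping type you choose (email, user name, or a custom attribute), the Enforce Server looks for a user whose mapped value matches the value that the identity provider sends. This minimal sketch illustrates that lookup. The user record keys ("username", "email") and the function name are hypothetical. The lookup is case-sensitive on the attribute name, consistent with the SAML notes, which require attribute names to match the identity provider's names exactly.

```python
from typing import Dict, List, Optional


def find_mapped_user(assertion: Dict[str, str],
                     users: List[Dict[str, str]],
                     mapping_attribute: str) -> Optional[Dict[str, str]]:
    """Return the Enforce user whose mapped value equals the IdP's value.

    mapping_attribute is "email", "username", or the name of a custom attribute.
    """
    value = assertion.get(mapping_attribute)
    if value is None:
        return None  # the IdP did not send the attribute under this exact name
    for user in users:
        if user.get(mapping_attribute) == value:
            return user
    return None
```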

See “Setting up authentication” on page 132.

Setting up authentication
Table 5-3 shows a summary of the tasks for the setup with links to more information on each
step.

Table 5-3 Authentication configuration steps

Step    Task    More information

Step 1    Edit the Spring context file for the authentication method.    See “Set up and configure the authentication method” on page 133.

Step 2    Set up the authentication configuration.    For SAML: See “Set up the SAML authentication configuration” on page 135.
For Active Directory/Kerberos: See “Configuring Active Directory authentication” on page 136.
For Forms-based: See “Configuring forms-based authentication” on page 137.
For Certificate: See “Configuring certificate authentication” on page 137.

Step 3    Restart the Enforce Server.    See “About Symantec Data Loss Prevention services” on page 101.

Step 4    For SAML, generate and download the service provider SAML metadata. The Enforce Server administration console is the service provider.    See “Generate or download Enforce (service providers) SAML metadata” on page 135.

Step 5    For SAML, configure Enforce as a SAML service provider with the identity provider.    See “Configure the Enforce Server as a SAML service provider with the IdP (Create an application in your identity provider)” on page 136.

Step 6    For SAML, download the identity provider metadata.    See “Export the IdP metadata to DLP” on page 136.

Step 7    Complete the process by restarting the Enforce Server.    See “About Symantec Data Loss Prevention services” on page 101.

Step 8    Log on to the Enforce Server administration console using the Administrator Bypass URL.    See “Administrator Bypass URL” on page 133.

Note: The Enforce Server administration console (the service provider in SAML) and the IdP
exchange messages using the settings in the configuration. Ensure that your settings match
your IdP's configuration and capabilities. Mismatched settings break SAML authentication.
You must restart the Enforce Server twice: once after you set up the authentication configuration
in the springSecurityContext.xml file, and once after you download the IdP metadata file
and replace the contents of idp-metadata.xml in the Enforce install directory with the IdP
metadata.

See “Administrator Bypass URL” on page 133.

Administrator Bypass URL

The administrator bypass URL, https://<hostnameOrIP>/ProtectManager/admin/Logon,
enables you to bypass SAML authentication. You can log on to the Enforce Server
administration console and use forms-based authentication to set up users. You must enter
this URL in your browser; you cannot navigate to this URL through the Enforce Server
administration console user interface.

Note: Only one active logon is available with the Bypass URL.

See “Set up and configure the authentication method” on page 133.

Set up and configure the authentication method

These steps present an overview of the common tasks for setting up and configuring all
authentication methods. Additional steps or changes for each method are explained in "Final
steps" following the initial template file configuration.

Note: The files that you must modify are commented with details to help you through the update
process.

To set up the authentication method

1 Delete (or rename) the springSecurityContext.xml file in the [your install
directory]/Protect/tomcat/webapps/ProtectManager/WEB-INF/ folder.

2 Go to the [your install directory]/Protect/tomcat/webapps/ProtectManager/security/template folder
and select the appropriate configuration template file for your authentication method:
■ springSecurityContext-SAML.xml for SAML authentication configurations

■ springSecurityContext-Form.xml for forms and client certificate-based authentication
configurations
■ springSecurityContext-Certificate.xml for client certificate-based authentication
only
■ springSecurityContext-Kerberos.xml for Active Directory/Kerberos authentication
configurations

3 Copy the file you selected into the [your install
directory]/Protect/tomcat/webapps/ProtectManager/WEB-INF/ folder.

4 Rename the file to springSecurityContext.xml.

5 Configure the springSecurityContext.xml file for your authentication method, as described in step 6.
6 Final steps:
■ SAML: For instructions on how to set up the SAML authentication configuration, see
“Set up the SAML authentication configuration” on page 135.
■ Forms Based: If the template file that you copied is for forms-based authentication,
there are no additional settings to configure. The DLP User Authentication section
of the General Settings now indicates that your user authentication method is Forms
Based.
■ Client certificate: To enable client certificate authentication, set clientAuth to want
or true in <InstallDirectory>/Protect/tomcat/conf/server.xml. The DLP User Authentication
section of the General Settings now indicates that your user authentication method is
Certificate.
■ Active Directory: To enable Active Directory authentication, replace the value for
krbConfLocation in [your install
directory]/Protect/tomcat/webapps/ProtectManager/WEB-INF/springSecurityContext.xml
with the path to your krb5.ini file.
The DLP User Authentication section of the General Settings now indicates that
your user authentication method is Active Directory. You can configure the list of
domains in this DLP User Authentication section of the General Settings page.

Note: You can no longer perform the initial setup of Active Directory through the Enforce
Server administration console.

See “Configuring the Enforce Server for Active Directory authentication” on page 141.
See “Set up the SAML authentication configuration” on page 135.
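Steps 1 through 4 above can be scripted. The following Python sketch is illustrative only: the function name install_template is hypothetical, and because this guide spells the template names with inconsistent capitalization, it looks up the template case-insensitively. It assumes the default folder layout under your install directory and renames any existing springSecurityContext.xml to springSecurityContext.xml.bak (the "rename" option in step 1). You still complete the final step for your method by hand.

```python
import shutil
from pathlib import Path

# Authentication methods covered by the templates listed in step 2.
METHODS = {"saml", "form", "certificate", "kerberos"}


def install_template(install_dir: Path, method: str) -> Path:
    """Copy the template for `method` into WEB-INF as springSecurityContext.xml."""
    if method not in METHODS:
        raise ValueError(f"unknown authentication method: {method}")
    app = install_dir / "Protect" / "tomcat" / "webapps" / "ProtectManager"
    template_dir = app / "security" / "template"
    wanted = f"springsecuritycontext-{method}.xml"
    # Match the template file name without regard to case.
    matches = [p for p in template_dir.iterdir() if p.name.lower() == wanted]
    if not matches:
        raise FileNotFoundError(f"{wanted} not found in {template_dir}")
    target = app / "WEB-INF" / "springSecurityContext.xml"
    # Step 1: keep the previous file as a backup instead of deleting it.
    if target.exists():
        target.replace(target.with_name(target.name + ".bak"))
    # Steps 3 and 4: copy the template and give it the expected name.
    shutil.copyfile(matches[0], target)
    return target
```

For example, install_template(Path("/opt/Symantec/DataLossPrevention/EnforceServer/15.5"), "kerberos") would install the Active Directory template, assuming that is your install directory.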

Set up the SAML authentication configuration

Get the information about your IdP, such as its choice of authentication methods, available
user identifiers, available user attributes, and the required service provider metadata.
Open the springSecurityContext.xml file in [your install
directory]/Protect/tomcat/webapps/ProtectManager/WEB-INF/ and set the entityBaseURL
property to your Enforce URL: https://<host name or IP>/ProtectManager.

Note: Unless you only want to access the Enforce Server administration console from the host
machine, don't use localhost as the host name.
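Both the entityBaseURL value and the administrator bypass URL derive from the Enforce host name. This small Python sketch, with hypothetical helper names, builds both URLs from a host name and flags loopback host names, which, per the note above, work only when you browse from the Enforce Server host itself.

```python
from urllib.parse import urlsplit

# Host names that resolve only on the Enforce Server host itself.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def enforce_urls(host: str) -> dict:
    """Return the SAML entityBaseURL and the administrator bypass URL for a host."""
    base = f"https://{host}/ProtectManager"
    return {"entityBaseURL": base, "bypassURL": f"{base}/admin/Logon"}


def is_loopback(url: str) -> bool:
    """True if the URL points at the local machine rather than a reachable host name."""
    return (urlsplit(url).hostname or "") in LOOPBACK_HOSTS
```

With the hypothetical host dlp.example.com, enforce_urls returns https://dlp.example.com/ProtectManager and https://dlp.example.com/ProtectManager/admin/Logon.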

Set the nameID property by editing the value of <property name="nameID"> in the Spring
file. Use a name identifier format such as emailAddress, WindowsDomainQualifiedName, or
another name identifier format that your IdP supports. Here's an example for email address:
<property name="nameID"
value="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress" />

You may want to use a combination of user attributes returned from the IdP to identify a Data
Loss Prevention user. In this case you can set the userAttributes property. For example:

<bean id="userLookupService" class="com.vontu.login.spring.VontuSAMLUserDetailsService">
<property name="userAttributes">
<set>
<value>UserName</value>
<value>EmailAddress</value>
<value>EmployeeID</value>
</set>
</property>
</bean>
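Because you edit the Spring file by hand, a read-only check before restarting can catch a missing quotation mark or a property left inside a comment. The following Python sketch (function names are hypothetical) parses the file with the standard library's ElementTree, which raises an error on malformed XML and skips comments, and reports the nameID and entityBaseURL values and the userAttributes list. It deliberately does not write the file back, because rewriting it with ElementTree would discard the explanatory comments in the template.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix so Spring's default xmlns does not matter.
    return tag.rsplit("}", 1)[-1]


def property_values(xml_text: str, names: set) -> dict:
    """Return the value attribute of each <property> whose name is in `names`."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    found = {}
    for el in root.iter():
        if _local(el.tag) == "property" and el.get("name") in names:
            if el.get("value") is not None:
                found[el.get("name")] = el.get("value")
    return found


def user_attributes(xml_text: str) -> list:
    """Return the <value> entries of the userAttributes property, in order."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if _local(el.tag) == "property" and el.get("name") == "userAttributes":
            return [v.text.strip() for v in el.iter()
                    if _local(v.tag) == "value" and v.text]
    return []
```

Typical use: property_values(Path("springSecurityContext.xml").read_text(), {"nameID", "entityBaseURL"}).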

Generate or download Enforce (service providers) SAML metadata

To download the Enforce SAML metadata
1 Restart the Enforce Server.
2 Log on as Administrator using the bypass URL. You access the bypass URL directly in your
browser; you don't need to log on to the Enforce Server administration console first.

3 Go to System > Settings > General and navigate to the DLP User Authentication
section.
4 Click the link to the right of The SAML config file for your IdP is at to download the
metadata.
See “Configure the Enforce Server as a SAML service provider with the IdP (Create an
application in your identity provider)” on page 136.

Configure the Enforce Server as a SAML service provider with the IdP (Create an application in your identity provider)
These steps vary depending on the IdP that you use. Here is a broad overview of the steps if
you use Symantec VIP Access Manager as your IdP:
To configure the Enforce Server as a SAML service provider with the IdP (create an application)
1 Log on to the VIP Access Manager administration console as administrator.
2 Click generic template.
3 Name the connector.
4 Select the access policy as SSO (single sign-on).
5 Configure your portal by selecting an icon for your site (this icon appears on the identity
provider's dashboard).
6 Upload the Enforce Server metadata.
See “Export the IdP metadata to DLP” on page 136.

Export the IdP metadata to DLP

Download the IdP metadata and replace the contents of the idp-metadata.xml file at
<installdirectory>/Protect/tomcat/webapps/ProtectManager/security/idp-metadata.xml
with the IdP metadata that you downloaded.
See “Configuring Active Directory authentication” on page 136.
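You can guard the metadata replacement above with a quick sanity check, so that an HTML error page or a truncated download does not end up in idp-metadata.xml. This Python sketch (hypothetical function name) accepts the file only if its root element is a SAML 2.0 metadata EntityDescriptor or EntitiesDescriptor in the urn:oasis:names:tc:SAML:2.0:metadata namespace, keeps a .bak copy of the previous file, and returns the IdP's entityID. It does not validate signatures or certificates inside the metadata.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ACCEPTED_ROOTS = {f"{{{MD_NS}}}EntityDescriptor", f"{{{MD_NS}}}EntitiesDescriptor"}


def replace_idp_metadata(downloaded: Path, target: Path) -> str:
    """Copy downloaded IdP metadata over idp-metadata.xml after a basic check."""
    root = ET.parse(downloaded).getroot()  # ParseError if not well-formed XML
    if root.tag not in ACCEPTED_ROOTS:
        raise ValueError(f"not SAML 2.0 metadata (root element {root.tag})")
    if target.exists():
        # Keep the previous metadata so you can roll back.
        shutil.copyfile(target, target.with_name(target.name + ".bak"))
    shutil.copyfile(downloaded, target)
    # EntitiesDescriptor has no entityID attribute; return "" in that case.
    return root.get("entityID", "")
```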

Configuring Active Directory authentication

If the template file that you copied is for Active Directory/Kerberos authentication, open the
<InstallDirectory>/Protect/tomcat/webapps/ProtectManager/WEB-INF/springSecurityContext.xml
file in a text editor. This is the springSecurityContext-Kerberos.xml file that you previously
renamed to springSecurityContext.xml. Set the krbConfLocation value to the path of your
Kerberos configuration file (krb5.ini). For example (line breaks added for legibility):

<bean class="org.springframework.security.kerberos.authentication.sun.
GlobalSunJaasKerberosConfig">
<property name="krbConfLocation" value="C:\Program Files\Symantec\
DataLossPrevention\EnforceServer\15.5\protect
\config\krb5.ini"/>
</bean>

See “Set up and configure the authentication method” on page 133.

See “Configuring forms-based authentication” on page 137.
See “Integrating Active Directory for user authentication” on page 137.

Configuring forms-based authentication

After you copy the template file for forms-based authentication, there are no additional settings
to configure.
See “Configuring certificate authentication” on page 137.

Configuring certificate authentication

After you copy the template file for client certificate-based authentication, open the
<InstallDirectory>/Protect/tomcat/conf/server.xml file and set the clientAuth value to
want or true.

See “Generate or download Enforce (service providers) SAML metadata” on page 135.
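To confirm the result of the clientAuth edit, the following Python sketch (hypothetical function name) reads server.xml and reports the effective client-certificate mode of each HTTPS connector. It relies on our reading of the Tomcat documentation, which describes clientAuth as an alias for the certificateVerification attribute of the default SSLHostConfig: true corresponds to required, want to optional, and false to none. Letting an explicit certificateVerification take precedence is a simplification of this sketch, and values it does not recognize are reported as unknown.

```python
import xml.etree.ElementTree as ET

# clientAuth values and their certificateVerification equivalents (assumed
# from the Tomcat documentation); certificateVerification values map to themselves.
NORMALIZE = {
    "true": "required", "required": "required",
    "want": "optional", "optional": "optional",
    "false": "none", "none": "none",
}


def client_cert_modes(server_xml: str) -> dict:
    """Map each HTTPS connector port to its effective client-certificate mode."""
    root = ET.fromstring(server_xml)
    modes = {}
    for conn in root.iter("Connector"):
        if conn.get("secure") != "true" and conn.get("SSLEnabled") != "true":
            continue  # plain HTTP connector: no client certificates involved
        raw = conn.get("clientAuth")
        host = conn.find("SSLHostConfig")
        # Sketch choice: certificateVerification on the first SSLHostConfig wins.
        if host is not None and host.get("certificateVerification"):
            raw = host.get("certificateVerification")
        modes[conn.get("port")] = NORMALIZE.get((raw or "none").lower(), "unknown")
    return modes
```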

Integrating Active Directory for user authentication

You can configure the Enforce Server to use Microsoft Active Directory for user authentication.
After you switch to Active Directory authentication, you must still define users in the Enforce
Server administration console. If the user names that you enter in the administration console
match Active Directory users, the system associates the new user accounts with their Active
Directory passwords.
You can switch to Active Directory authentication after you have already created user accounts
in the system. Only those existing user names that match Active Directory user names remain
valid after the switch.
Users must use their Active Directory passwords when they log on. Note that all Symantec
Data Loss Prevention user names remain case sensitive, even though Active Directory user
names are not, so users must still enter their case-sensitive Symantec Data Loss Prevention
user name when they log on.
To use Active Directory authentication
1 Verify that the Enforce Server host is time-synchronized with the Active Directory server.

Note: Ensure that the clock on the Active Directory host is synchronized to within five minutes
of the clock on the Enforce Server host.

2 (Linux only) Make sure that the following Red Hat RPMs are installed on the Enforce
Server host:
■ krb5-workstation

■ krb5-libs

■ pam_krb5

3 Create the krb5.ini (or krb5.conf for Linux) configuration file that gives the Enforce
Server information about your Active Directory domain structure and Active Directory
server addresses.
See “Creating the configuration file for Active Directory integration” on page 138.
4 Confirm that the Enforce Server can communicate with the Active Directory server.
See “Verifying the Active Directory connection” on page 140.
5 Configure Symantec Data Loss Prevention to use Active Directory authentication.
See “Configuring the Enforce Server for Active Directory authentication” on page 141.
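Kerberos authentication fails when the host clocks drift too far apart, which is why step 1 asks for synchronization within five minutes. The comparison itself is simple arithmetic; this Python sketch (hypothetical function name) checks two clock readings that you have collected, for example from each host's time service, against the five-minute limit. How you obtain the Active Directory server's time is outside the scope of the sketch.

```python
from datetime import datetime, timedelta

# Maximum clock difference allowed between the two hosts (from the note in step 1).
MAX_SKEW = timedelta(minutes=5)


def clocks_in_sync(enforce_now: datetime, ad_now: datetime,
                   max_skew: timedelta = MAX_SKEW) -> bool:
    """True if two timezone-aware clock readings differ by at most max_skew."""
    if enforce_now.tzinfo is None or ad_now.tzinfo is None:
        # Naive datetimes from hosts in different time zones would compare wrongly.
        raise ValueError("use timezone-aware datetimes")
    return abs(enforce_now - ad_now) <= max_skew
```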

Creating the configuration file for Active Directory integration

You must create a krb5.ini configuration file (or krb5.conf on Linux) to give Symantec Data
Loss Prevention information about your Active Directory domain structure and server locations.
This step is required if you have more than one Active Directory domain. However, even if
your Active Directory structure includes only one domain, it is still recommended to create this
file. The kinit utility uses this file to confirm that Symantec Data Loss Prevention can
communicate with the Active Directory server.

Note: If you are running Symantec Data Loss Prevention on Linux, verify the Active Directory
connection using the kinit utility. The kinit utility requires the configuration file to be
named krb5.conf on Linux, so rename the krb5.ini file to krb5.conf.

Symantec Data Loss Prevention provides a sample krb5.ini file that you can modify for use
with your own system. The sample file is stored in \15.5\Protect\config (for example,
\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config
on Windows or /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config
on Linux). If you are running Symantec Data Loss Prevention on Linux, Symantec recommends
renaming the file to krb5.conf. The sample file, which is divided into two sections, looks like
this:

[libdefaults]
default_realm = TEST.LAB
[realms]
ENG.COMPANY.COM = {
kdc = engAD.eng.company.com
}
MARK.COMPANY.COM = {
kdc = markAD.eng.company.com
}
QA.COMPANY.COM = {
kdc = qaAD.eng.company.com
}

The [libdefaults] section identifies the default domain. (Note that Kerberos realms
correspond to Active Directory domains.) The [realms] section defines an Active Directory
server for each domain. In the previous example, the Active Directory server for
ENG.COMPANY.COM is engAD.eng.company.com.
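To see how the two sections relate, the following Python sketch (hypothetical function name) reads a file in the sample's layout and returns the default realm and the Active Directory server (kdc) listed for each realm. It handles only the simple layout shown above, not every construct that Kerberos configuration files allow.

```python
def parse_krb5(text: str) -> tuple:
    """Return (default_realm, {realm: [kdc, ...]}) from krb5.ini / krb5.conf text."""
    default_realm = None
    realms = {}
    section = None
    current = None  # realm whose { ... } block is open
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "libdefaults":
            key, _, value = line.partition("=")
            if key.strip() == "default_realm":
                default_realm = value.strip()
        elif section == "realms":
            if line.endswith("{"):
                current = line[:-1].split("=")[0].strip()
                realms[current] = []
            elif line == "}":
                current = None
            elif current is not None:
                key, _, value = line.partition("=")
                if key.strip() == "kdc":
                    realms[current].append(value.strip())
    return default_realm, realms
```

Applied to the sample, it returns TEST.LAB as the default realm and maps ENG.COMPANY.COM to engAD.eng.company.com.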

To create the krb5.ini or krb5.conf file

1 Go to SymantecDLP\Protect\config and locate the sample krb5.ini file. For example,
locate the file in \Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config (on
Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config (on Linux).

2 Copy the sample krb5.ini file to the c:\windows directory (on Windows) or the /etc
directory (on Linux). On Linux, rename the file to krb5.conf, because the kinit
command-line tool that you use to verify the Active Directory connection requires that name.
See “Verifying the Active Directory connection” on page 140.
3 Open the krb5.ini or krb5.conf file in a text editor.
4 Replace the sample default_realm value with the fully qualified name of your default
domain. (The value for default_realm must be all capital letters.) For example, modify
the value to look like the following:

default_realm = MYDOMAIN.LAB

5 Replace the other sample domain names with the names of your actual domains. (Domain
names must be all capital letters.) For example, replace ENG.COMPANY.COM with
ADOMAIN.COMPANY.COM.

6 Replace the sample kdc values with the host names or IP addresses of your Active
Directory servers. (Be sure to follow the specified format, in which each opening brace ({)
is followed immediately by a line break.) For example, replace engAD.eng.company.com with
ADserver.eng.company.com, and so on.

7 Remove any unused kdc entries from the configuration file. For example, if you have only
two domains besides the default domain, delete the unused kdc entry.
8 Save the file.
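If you maintain the file for several domains, you can also generate it. This Python sketch (hypothetical function name) writes the same layout as the sample and enforces the all-capitals rule from steps 4 and 5. Writing the returned text to c:\windows\krb5.ini or /etc/krb5.conf is left to you, and, like the sample, it supports one kdc per realm.

```python
def build_krb5(default_realm: str, realms: dict) -> str:
    """Render krb5.ini / krb5.conf text in the layout of the shipped sample.

    `realms` maps each Active Directory domain (Kerberos realm) to the host
    name or IP address of its Active Directory server (the kdc value)."""
    names = [default_realm, *realms]
    lowercase = [n for n in names if n != n.upper()]
    if lowercase:
        raise ValueError(f"realm names must be all capital letters: {lowercase}")
    lines = ["[libdefaults]", f"default_realm = {default_realm}", "[realms]"]
    for realm, kdc in realms.items():
        # Each opening brace is followed immediately by a line break (step 6).
        lines += [f"{realm} = {{", f"kdc = {kdc}", "}"]
    return "\n".join(lines) + "\n"
```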

Verifying the Active Directory connection

kinit is a command-line tool you can use to confirm that the Active Directory server responds
to requests. It also verifies that the Enforce Server has access to the Active Directory server.
For Microsoft Windows installations, the Symantec Data Loss Prevention installer installs the
utility in the C:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\jre\bin directory.
For Linux installations, the utility is part of the Red Hat Enterprise Linux distribution, and is in
the following location: /usr/kerberos/bin/kinit. You can also download Java SE 6 and
locate the kinit tool in \java_home\jdk1.6.0\bin.

If you run the Enforce Server on Linux, use the kinit utility to test access from the Enforce
Server to the Active Directory server. Rename the krb5.ini file as krb5.conf. The kinit
utility requires the file to be named krb5.conf on Linux.
To test the connection to the Active Directory server
1 On the Enforce Server host, go to the command line and navigate to the directory where
kinit is located.

2 Issue a kinit command using a known user name and password as parameters. (Note
that the password is visible in clear text when you type it on the command line.) For
example, issue the following:

kinit kchatterjee mypwd10#

The first time you contact Active Directory you may receive an error that it cannot find the
krb5.ini or krb5.conf file in the expected location. On Windows, the error looks similar
to the following:

krb_error 0 Could not load configuration file c:\winnt\krb5.ini

(The system cannot find the file specified) No error.

In this case, copy the krb5.ini or krb5.conf file to the expected location and then rerun
the kinit command.
3 Depending on how the Active Directory server responds to the command, take one of the
following actions:
■ If the Active Directory server indicates it has successfully created a Kerberos ticket,
continue configuring Symantec Data Loss Prevention.
■ If you receive an error message, consult with your Active Directory administrator.

Configuring the Enforce Server for Active Directory authentication

Perform the procedure in this section when you first set up Active Directory authentication,
and any time you want to modify existing Active Directory settings. Make sure that you have
completed the prerequisite steps before you enable Active Directory authentication.
See “Integrating Active Directory for user authentication” on page 137.
To configure the Enforce Server to use Active Directory for authentication:
1 Make sure all users other than the Administrator are logged out of the system.
2 In the Enforce Server administration console, go to System > Settings > General and
click Configure (at top left).

3 At the Edit General Settings screen that appears, locate the Active Directory
Authentication section near the bottom and select (check) Perform Active Directory
Authentication.
The system then displays several fields to fill out.
4 See “Creating the configuration file for Active Directory integration” on page 138.
5 If your environment has more than one Active Directory domain, click Configure and
enter the domain names (separated by commas) in the Active Directory Domain List
field.
The system displays Active Directory domains in a drop-down list on the user logon page.
Users then select the appropriate domain at logon. Do not list the default domain, as it
already appears in the drop-down list by default.
6 Click Save.
7 Go to the operating system services tool and restart the Symantec Data Loss Prevention
Manager service.
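The Active Directory Domain List value in step 5 follows two rules: names are separated by commas, and the default domain is left out. This Python sketch (hypothetical function name) builds the value from a list of domains, trimming blanks and duplicates. Whether the field also accepts a space after each comma is not stated here, so the sketch uses bare commas.

```python
def domain_list_field(domains, default_domain: str) -> str:
    """Build the comma-separated Active Directory Domain List value.

    The default domain is dropped because it already appears in the logon drop-down."""
    kept = []
    for d in domains:
        d = d.strip()
        if d and d.upper() != default_domain.upper() and d not in kept:
            kept.append(d)
    return ",".join(kept)
```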

About certificate authentication configuration

Certificate authentication enables a user to automatically log on to the Enforce Server
administration console. The user logs on using a client certificate that your public key
infrastructure (PKI) generates. When a user accesses the Enforce Server administration
console, the PKI automatically delivers the user's certificate to the Tomcat container that hosts
the administration console. The Tomcat container validates the client certificate using the
certificate authorities that you have configured in the Tomcat trust store.
The client certificate is delivered to the Enforce Server computer when a client's browser
performs the SSL handshake with the Enforce Server. For example, some browsers might be
configured to operate with a smart card reader to present the certificate. Alternatively, you can
upload the X.509 certificate to a browser and configure the browser to send the certificate to
the Enforce Server.
If the certificate is valid, the Enforce Server administration console can also check whether
the certificate has been revoked.
See “About certificate revocation checks” on page 150.
If the certificate is valid, then the Enforce Server uses the common name (CN) in the certificate
to determine if that CN is mapped to an active user account with a role.

Note: Some browsers cache a user's client certificate, and automatically log the user on to the
Administration Console after the user has chosen to sign out. In this case, users must close
the browser window to complete the log out process.

The following table describes the steps necessary to use certificate authentication with
Symantec Data Loss Prevention.

Table 5-4 Steps to configure certificate authentication

| Phase | Action | Description |
|---|---|---|
| 1 | Enable certificate authentication on the Enforce Server computer. | You can configure an existing Enforce Server to enable authentication. Enforce Servers have form-based authentication by default. See “Configuring certificate authentication for the Enforce Server administration console” on page 144. |
| 2 | Add certificate authority (CA) certificates to establish the trust chain. | You can add CA certificates to the Tomcat trust store with the Java keytool utility to manually add certificates to an existing Enforce Server. See “Adding certificate authority (CA) certificates to the Tomcat trust store” on page 146. |
| 3 | (Optional) Change the Tomcat trust store password. | The Symantec Data Loss Prevention installer configures each new Enforce Server installation with a default Tomcat trust store password. Follow these instructions to configure a secure password. See “Changing the Tomcat trust store password” on page 147. |
| 4 | Map certificate common name (CN) values to Enforce Server user accounts. | See “Mapping Common Name (CN) values to Symantec Data Loss Prevention user accounts” on page 149. |
| 5 | Configure the Enforce Server to check for certificate revocation. | See “About certificate revocation checks” on page 150. |
| 6 | Verify Enforce Server access using certificate-based single sign-on. | See “Troubleshooting certificate authentication” on page 153. |
| 7 | (Optional) Disable forms-based logon. | If you want to use certificate-based single sign-on for all access to the Enforce Server, disable forms-based logon. See “Disabling password authentication and forms-based logon” on page 154. |

Configuring certificate authentication for the Enforce Server administration console

Form-based authentication is available by default on the Enforce Server. You must add
certificate authentication manually. Follow this procedure to manually enable form and certificate
authentication on a Symantec Data Loss Prevention installation.
To enable form and certificate authentication for users of the Enforce Server administration
console
1 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not change permissions or ownership on any configuration file from another
root or Administrator account.

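2 Copy the corresponding template file (SpringSecurityContext-Form.xml, for forms and client
certificate-based authentication) into the Tomcat WEB-INF directory and rename it to
springSecurityContext.xml.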
3 Edit C:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\tomcat\conf\server.xml
(Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/conf/server.xml
(Linux) and change the certificateVerification value from none to optional. Change
the revocationEnabled value from true to false. Save the file.
4 Restart the Enforce Server. This change to the server.xml file that you edited in the
previous step enables the Use Certificate authentication check box in the Enforce Server
administration console user interface.
5 Log on to the Enforce Server administration console and go to System > Login
Management > DLP Users.
6 Check Use Certificate authentication and indicate the corresponding CN mapping.
7 Add the CA certificates to the Tomcat trust store using the Java keytool utility.
See “Adding certificate authority (CA) certificates to the Tomcat trust store” on page 146.
Ensure that you have installed all necessary certificates and that users can log on with
certificate authentication.
Now the end user has both form-based authentication and certificate authentication.
See “About certificate revocation checks” on page 150.
Follow this procedure to enable certificate authentication only, without forms-based logon, on
Symantec Data Loss Prevention.

To enable certificate authentication for users of the Enforce Server administration console
1 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not change permissions or ownership on any configuration file from another
root or Administrator account.

2 Copy the corresponding springSecurityContext.xml file into the Tomcat WEB-INF directory.
3 Edit C:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\tomcat\conf\server.xml
(Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/conf/server.xml
(Linux) and change the certificateVerification value from false to optional. Save
the file.
4 Restart the Enforce Server. This change to the server.xml file that you edited in the
previous step enables the Use Certificate authentication check box in the Enforce Server
administration console user interface.
5 Log on to the Enforce Server administration console and go to System > Login
Management > DLP Users.
6 Check Use Certificate authentication and indicate the corresponding Common Name
(CN) mapping.
7 Add the CA certificates to the Tomcat trust store using the Java keytool utility.
See “Adding certificate authority (CA) certificates to the Tomcat trust store” on page 146.
Ensure that you have installed all necessary certificates and that users can log on with
certificate authentication.

8 For certificate authentication only, copy the springSecurityContext-Certificate.xml
file from C:\Program Files\Symantec\DataLossPrevention\EnforceServer\
15.5\Protect\tomcat\webapps\ProtectManager\security\template (Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/
15.5/Protect/tomcat/webapps/ProtectManager/security/template (Linux) to the WEB-INF
directory and rename it to springSecurityContext.xml.

9 Edit the C:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\tomcat\conf\server.xml
(Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/conf/server.xml
file and change the certificateVerification value from optional to required.
Restart the Enforce Server.
Now the user has certificate authentication only.

See “Adding certificate authority (CA) certificates to the Tomcat trust store” on page 146.
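The two procedures above differ mainly in the certificateVerification value in server.xml: none (the starting value in the first procedure) for forms only, optional for forms plus certificates, and required for certificates only. The following Python sketch (hypothetical names) applies the value for a chosen mode with a plain text substitution, which leaves the comments and layout of server.xml untouched. It assumes the attribute is already present, and it also changes any occurrence inside an XML comment, so review the result before you restart.

```python
import re

# certificateVerification value for each logon mode, per the two procedures above.
MODES = {"form": "none", "form+certificate": "optional", "certificate": "required"}

_ATTR = re.compile(r'(certificateVerification\s*=\s*")([^"]*)(")')


def set_certificate_verification(server_xml: str, mode: str) -> str:
    """Return server.xml text with every certificateVerification value set for `mode`."""
    value = MODES[mode]
    # Text substitution (not an XML rewrite) keeps comments and formatting intact.
    updated, count = _ATTR.subn(lambda m: m.group(1) + value + m.group(3), server_xml)
    if count == 0:
        raise ValueError("no certificateVerification attribute found in server.xml")
    return updated
```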

Adding certificate authority (CA) certificates to the Tomcat trust store

To use certificate authentication with Symantec Data Loss Prevention, you must add all of the
CA certificates that are required to authenticate users in your system to the Tomcat trust store.
For Symantec Data Loss Prevention 15.0 and later, CA certificates can only be imported to
the Enforce Server using the Java keytool utility. Each X.509 certificate must be provided in
Distinguished Encoding Rules (DER) format in a .cer file. If multiple CAs are required to
establish the certificate chain, then you must add multiple .cer files.
To add CA certificates to the Tomcat trust store
1 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not change permissions or ownership on any configuration file from another
root or Administrator account.

2 Change directory to the
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/conf
(Linux) or c:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\tomcat\conf
(Windows) directory. If you installed Symantec Data Loss Prevention to a different directory,
substitute the correct path.
3 Copy all certificate files (.cer files) that you want to import to the conf directory on the
Enforce Server computer.

4 Use the keytool utility that is installed with Symantec Data Loss Prevention to add a
certificate to the Tomcat trust store. For Windows systems, enter:

"c:\Program Files\Symantec\DataLossPrevention\EnforceServer\jre\bin\keytool" -import -trustcacerts -alias CA_CERT_1 -file certificate_1.cer -keystore .\truststore.jks

For Linux systems, enter:

/opt/Symantec/DataLossPrevention/jre/bin/keytool -import -trustcacerts -alias CA_CERT_1 -file certificate_1.cer -keystore ./truststore.jks

In these commands, replace CA_CERT_1 with a unique alias for the certificate that you
import. Replace certificate_1.cer with the name of the certificate file you copied to the
Enforce Server computer.
5 Enter the password to the keystore at the keytool utility prompt. The default keystore
password is protect.
6 Repeat these steps to install all the certificate files that are necessary to complete the
certificate chain.
7 Stop and then restart the Symantec DLP Manager service to apply your changes.
8 If you have not yet changed the default Tomcat trust store password, do so now.
See “Changing the Tomcat trust store password” on page 147.
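When a chain has several CA certificates, step 6 repeats the import once per file. This Python sketch (hypothetical function names) builds one keytool command per .cer file using only the options shown in step 4, with aliases CA_CERT_1, CA_CERT_2, and so on, and includes a rough check that a file is DER rather than PEM: a DER X.509 certificate begins with the ASN.1 SEQUENCE tag 0x30, whereas a PEM file begins with the text -----BEGIN. The check is a heuristic, not a certificate parser.

```python
def looks_like_der(data: bytes) -> bool:
    """Rough check that .cer file contents are DER rather than PEM (not a full parse)."""
    return data[:1] == b"\x30"  # ASN.1 SEQUENCE tag that starts a DER certificate


def keytool_import_commands(keytool: str, cert_files, keystore: str = "./truststore.jks"):
    """Build one keytool -import command per certificate, each with a unique alias."""
    commands = []
    for i, cer in enumerate(cert_files, start=1):
        commands.append([
            keytool, "-import", "-trustcacerts",
            "-alias", f"CA_CERT_{i}",  # unique alias per certificate (step 4)
            "-file", str(cer),
            "-keystore", keystore,
        ])
    return commands
```

Each command list can be passed to subprocess.run(cmd, check=True) from the conf directory; keytool still prompts for the trust store password and may ask you to confirm that the certificate should be trusted, so run it interactively.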

Changing the Tomcat trust store password

When you install Symantec Data Loss Prevention, the Tomcat trust store uses protect as
the default password. Follow this procedure to assign a secure password to the Tomcat trust
store when you use certificate authentication.

To change the Tomcat trust store password

1 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not change permissions or ownership on any configuration file from another
root or Administrator account.

2 Change directory to the
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/jre/bin/ (Linux) or
c:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\
(Windows) directory. If you installed Symantec Data Loss Prevention to a different directory,
substitute the correct path.
3 Use the keytool utility that is installed with Symantec Data Loss Prevention to change
the Tomcat truststore password. For Windows systems, enter:

"c:\Program Files\Symantec\DataLossPrevention\ServerJRE\1.8.0_162\bin\keytool" -storepasswd -new new_password -keystore ./truststore.jks

For Linux systems, enter:

/opt/Symantec/DataLossPrevention/EnforceServer/15.5/jre/bin/keytool -storepasswd
-new new_password -keystore ./truststore.jks

Replace new_password with a secure password.

4 Enter the current password to the keystore when the keytool utility prompts you to do
so. The default password is protect.
5 Change directory to the
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/conf
(Linux) or c:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\tomcat\conf
(Windows) directory. If you installed Symantec Data Loss Prevention into a different
directory, substitute the correct path.
6 Open the server.xml file with a text editor.

7 In the following line in the file, edit the truststorePass="protect" entry to specify your
new password:

<Connector URIEncoding="UTF-8" acceptCount="100" clientAuth="want"
debug="0" disableUploadTimeout="true" enableLookups="false"
keystoreFile="conf/.keystore" keystorePass="protect"
maxSpareThreads="75" maxThreads="150" minSpareThreads="25"
port="443" scheme="https" secure="true" sslProtocol="TLS"
truststoreFile="conf/truststore.jks" truststorePass="protect"/>

Replace protect with the new password that you defined in the keytool command.
8 Save your changes and exit the text editor.
9 Change directory to the
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config (Linux)
or c:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config (Windows)
directory. If you installed Symantec Data Loss Prevention into a different directory,
substitute the correct path.
10 Open the Manager.properties file with a text editor.
Add the following line in the file to specify the new password:

com.vontu.manager.tomcat.truststore.password = password

Replace password with the new password. Do not enclose the password in quotation
marks.
11 Save your changes and exit the text editor.
12 Open the Protect.properties file with a text editor.
13 Edit (or if not present, add) the following line in the file to specify the new password:
com.vontu.manager.tomcat.truststore.password = password

Replace password with the new password. Do not enclose the password in quotation
marks.
14 Save your changes and exit the text editor.
15 Stop and then restart the Symantec DLP Manager service to apply your changes.
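Steps 7 through 13 put the same new password in three places. This Python sketch (hypothetical function names) shows the two kinds of edit: replacing the truststorePass attribute in server.xml, with XML escaping for characters such as & or a double quotation mark, and editing or adding the com.vontu.manager.tomcat.truststore.password line in Manager.properties and Protect.properties without quotation marks. It operates on file text; reading and writing the files, and keeping backups, is left to you.

```python
import re
from xml.sax.saxutils import escape

KEY = "com.vontu.manager.tomcat.truststore.password"


def set_properties_password(text: str, password: str) -> str:
    """Edit, or append if missing, the trust store password line (unquoted value).

    Backslashes have special meaning in Java .properties files; this sketch
    does not escape them, so avoid them or escape them yourself."""
    line = f"{KEY} = {password}"
    pattern = re.compile(rf"^{re.escape(KEY)}\s*[=:].*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda _m: line, text, count=1)
    separator = "" if not text or text.endswith("\n") else "\n"
    return text + separator + line + "\n"


def set_truststore_pass(server_xml: str, password: str) -> str:
    """Replace the truststorePass attribute value, escaping XML special characters."""
    safe = escape(password, {'"': "&quot;"})
    return re.sub(r'(truststorePass=")[^"]*(")',
                  lambda m: m.group(1) + safe + m.group(2), server_xml)
```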

Mapping Common Name (CN) values to Symantec Data Loss Prevention user accounts

Each user that accesses the Enforce Server administration console using certificate-based
single sign-on must have an active user account in the Enforce Server configuration. The user
account associates the common name (CN) value from the user's client certificate to one or
more roles in the Enforce Server administration console. You can map a CN value to only one
Enforce Server user account.
The user account that you create does not require a separate Enforce Server administration
console password. You can optionally configure a password if you want to allow the user to
also log on from the Enforce Server administration console log-on page. If you enable password
authentication and the user does not provide a certificate when the browser asks for one, then
the Enforce Server displays the log-on page. A log-on failure is displayed if password
authentication is disabled and the user does not provide a certificate.
An active user account must identify a user's CN value and have a valid role assigned in the
Enforce Server to log on using single sign-on with certificate authentication. You can disable
or delete the associated Enforce Server user account to prevent a user from accessing the
Enforce Server administration console without revoking their client certificate.
See “Configuring user accounts” on page 121.
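The mapping rule above can be sketched as a lookup keyed by CN. In this Python sketch, the Account type, the role names, and the simplified subject parsing (no escaped commas or multi-valued RDNs) are all assumptions for illustration; it does not describe how the Enforce Server stores accounts. Keying a dictionary by CN mirrors the rule that a CN maps to only one user account, and the final check mirrors the requirement for an active account with a role.

```python
from typing import Dict, NamedTuple, Optional


class Account(NamedTuple):
    """Hypothetical stand-in for an Enforce Server user account."""
    user_name: str
    active: bool
    roles: tuple


def common_name(subject_dn: str) -> str:
    """Extract the CN from a subject such as 'CN=jdoe, OU=IT, O=Example'."""
    for rdn in subject_dn.split(","):
        key, sep, value = rdn.strip().partition("=")
        if sep and key.strip().upper() == "CN":
            return value.strip()
    raise ValueError(f"no CN in subject: {subject_dn}")


def account_for(subject_dn: str, accounts: Dict[str, Account]) -> Optional[Account]:
    """Return the account that may log on with this certificate, or None."""
    account = accounts.get(common_name(subject_dn))  # one account per CN
    if account is not None and account.active and account.roles:
        return account
    return None
```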

About certificate revocation checks

While managing your public key infrastructure, you may need to revoke a client's certificate
with the CA. For example, you might revoke a certificate if an employee leaves the company,
or if an employee's credentials are lost or stolen. When you revoke a certificate, the CA uses
one or more Certificate Revocation Lists (CRLs) to publish those certificates that are no longer
valid.

Note: Certificate revocation checking is disabled by default. You must enable it and configure
it. See “Configuring certificate revocation checks” on page 151.

Symantec Data Loss Prevention retrieves revocation lists from a Certificate Revocation List
Distribution Point (CRLDP). To check revocation using a CRLDP, the client certificate must
include a CRL distribution point field. The following shows an example CRLDP field definition:

[1]CRL Distribution Point

Distribution Point Name:
Full Name: URL=http://my_crldp

Note: Symantec Data Loss Prevention does not support specifying the CRLDP using an LDAP
URL.

If the CRL distribution point is defined in each certificate and the Enforce Server can directly
access the CRLDP server, then no additional configuration is required to perform revocation checks.
If the CRL distribution point is accessible only through a proxy server, then you must configure the
proxy server settings in the Symantec Data Loss Prevention configuration.

See “Accessing the CRLDP with a proxy” on page 152.

Regardless of which revocation checking method you use, you must enable certificate revocation
checks on the Enforce Server computer. Certificate revocation checks are enabled by default
if you select certificate installation during the Enforce Server installation. If you upgraded an
existing Symantec Data Loss Prevention installation, certificate revocation is not enabled by
default.
See “Configuring certificate revocation checks” on page 151.

Configuring certificate revocation checks

When you enable certificate revocation checks, Symantec Data Loss Prevention uses a CRLDP
to determine the revocation status.
Follow this procedure to enable certificate revocation checks.
To configure certificate revocation checks
1 Ensure that the CRLDP is defined in the CRL distribution point field of each client certificate.
2 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not use another root or Administrator account to change the permissions or
ownership of any configuration file.

3 Open the server.xml file, located at
c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\tomcat\conf\server.xml
(Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/conf/server.xml
(Linux), and change the revocationEnabled value from false to true.
4 To enable revocation checking using a CRLDP, add or uncomment the following line in
the SymantecDLPManager.conf file:

wrapper.java.additional.22=-Dcom.sun.security.enableCRLDP=true

This option is enabled by default for new Symantec Data Loss Prevention installations.

5 If you use CRLDP revocation checks, optionally configure the cache lifetime using the
property:

wrapper.java.additional.22=-Dsun.security.certpath.ldap.cache.lifetime=30

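This parameter specifies the length of time, in seconds, to cache the revocation lists that
are obtained from a CRL distribution point. After this time is reached, a lookup is performed
to refresh the cache the next time there is an authentication request. The default cache
lifetime is 30 seconds. Specify 0 to disable the cache, or -1 to store cache results indefinitely.
Each wrapper.java.additional property must use its own index number. If index 22 is already
used in the file (for example, by the property from step 4), use an unused index instead.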
6 Stop and then restart the Symantec DLP Manager service to apply your changes.
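Step 3 can be scripted if you manage several Enforce Servers. The following Python sketch uses a plain text substitution, so the rest of server.xml (comments, attribute order, line endings) is left exactly as it was. It assumes only that the setting appears as an XML attribute written `revocationEnabled="false"`. The guide does not say which element carries the attribute, so the sketch changes every occurrence, including any inside XML comments. The function names are hypothetical. Back up server.xml before running it.

```python
import re
from typing import Tuple

# revocationEnabled="false" or revocationEnabled='false', spaces allowed around "=".
_PATTERN = re.compile(r"""(revocationEnabled\s*=\s*)(["'])false\2""")


def enable_revocation(xml_text: str) -> Tuple[str, int]:
    """Return the rewritten text and the number of attributes changed."""
    return _PATTERN.subn(
        lambda m: m.group(1) + m.group(2) + "true" + m.group(2), xml_text)


def enable_revocation_in_file(path: str) -> int:
    """Apply enable_revocation() to a file in place, keeping its line endings."""
    with open(path, encoding="utf-8", newline="") as handle:
        new_text, changed = enable_revocation(handle.read())
    if changed:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(new_text)
    return changed
```

Running the function a second time changes nothing, so it is safe to include in a repeatable setup script. Restart the Symantec DLP Manager service afterward, as step 6 requires.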

Note: Symantec Data Loss Prevention supports certificate revocation when the Enforce Server
is in non-FIPS mode.
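The cache lifetime rules in step 5 are: a positive number of seconds, 0 to disable caching, and -1 to cache indefinitely. A refresh happens on the next authentication request after expiry. The following Python model illustrates those documented semantics only; it does not reproduce how the JVM implements its certificate path cache. The `CrlCache` class and the `fetch` callback are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CrlCache:
    """Caches fetched revocation lists per distribution-point URL."""

    def __init__(self, lifetime_seconds: int = 30,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if lifetime_seconds < -1:
            raise ValueError("lifetime must be -1, 0, or a positive number of seconds")
        self.lifetime = lifetime_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return the CRL for url, fetching it only when the cache cannot serve it."""
        if self.lifetime == 0:
            return fetch(url)  # caching disabled: always look up
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None:
            fetched_at, crl = entry
            if self.lifetime == -1 or now - fetched_at < self.lifetime:
                return crl  # still within the lifetime (or cached indefinitely)
        # Missing or expired: refresh during this authentication request.
        crl = fetch(url)
        self._entries[url] = (now, crl)
        return crl
```

A shorter lifetime notices newly revoked certificates sooner, at the cost of more requests to the CRLDP server.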

Accessing the CRLDP with a proxy

Symantec recommends that you allow direct access from the Enforce Server computer to all
CRLDP servers that are required to perform certificate revocation checks. If the CRLDP servers
are accessible only through a proxy, then you must configure the proxy settings on the Enforce
Server computer.
When you configure a proxy, the Enforce Server uses your proxy configuration for all HTTP
connections, such as those connections that are created to connect to a CRLDP server to
fetch certificate revocation lists. Check with your proxy administrator before you configure
these proxy settings, and consider allowing direct access to CRLDP servers if at all possible.
To configure proxy settings for a CRLDP server
1 Ensure that the CRLDP is defined in the CRL distribution point field of each client certificate.
2 Log on to the Enforce Server computer using the account that you created during Symantec
Data Loss Prevention installation.

Note: Do not use another root or Administrator account to change the permissions or
ownership of any configuration file.

3 Change directory to the
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config (Linux) or
c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config (Windows)
directory. If you installed Symantec Data Loss Prevention into a different directory,
substitute the correct path.
4 Open the SymantecDLPManager.conf file with a text editor.

5 Add or edit the following configuration properties to identify the proxy:

wrapper.java.additional.22=-Dhttp.proxyHost=myproxy.mydomain.com
wrapper.java.additional.23=-Dhttp.proxyPort=8080
wrapper.java.additional.24=-Dhttp.nonProxyHosts=hosts

Replace myproxy.mydomain.com and 8080 with the host name and port of your proxy
server. Replace hosts with the hosts that the Enforce Server should connect to directly,
without using the proxy. You can include server host names, fully qualified domain names,
or IP addresses, separated with a pipe character (|). For example:

wrapper.java.additional.24=-Dhttp.nonProxyHosts=crldp-server|127.0.0.1|DataInsight_Server_Host

6 Save your changes to the configuration file.

7 Stop and then restart the Symantec DLP Manager service to apply your changes.
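Before you restart the service, it can help to check which hosts the http.nonProxyHosts value from step 5 sends directly. Java documents this property as a list of patterns separated by |, where each pattern may start or end with a `*` wildcard. The following Python sketch approximates that matching. It is not the JDK's code, the function names are hypothetical, and the case-insensitive comparison is an assumption made because host names are case-insensitive.

```python
from typing import List


def parse_non_proxy_hosts(value: str) -> List[str]:
    """Split an http.nonProxyHosts value on '|' and drop empty entries."""
    return [part.strip() for part in value.split("|") if part.strip()]


def bypasses_proxy(host: str, non_proxy_hosts: str) -> bool:
    """Approximate whether a connection to host would skip the proxy."""
    host = host.lower()
    for pattern in parse_non_proxy_hosts(non_proxy_hosts.lower()):
        if pattern.startswith("*") and host.endswith(pattern[1:]):
            return True  # leading wildcard, for example *.example.com
        if pattern.endswith("*") and host.startswith(pattern[:-1]):
            return True  # trailing wildcard, for example 10.*
        if host == pattern:
            return True
    return False
```

For example, `bypasses_proxy("crldp-server", "crldp-server|127.0.0.1|DataInsight_Server_Host")` is true, so CRL downloads from that host would not use the proxy.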

Troubleshooting certificate authentication

By default, Symantec Data Loss Prevention logs each successful log-on request to the Enforce
Server administration console. Symantec Data Loss Prevention also logs an error message
if a log-on request is made without supplying a certificate, or if a valid certificate presents a CN
that does not map to a valid user account in the Enforce Server configuration.

Note: If certificate authentication fails while the browser establishes an HTTPS connection to
the Enforce Server administration console, then Symantec Data Loss Prevention cannot log
an error message.

You can optionally log additional information about certificate revocation checks by adding or
uncommenting the following system property in the SymantecDLPManager.conf file:

wrapper.java.additional.90=-Djava.security.debug=certpath

SymantecDLPManager.conf is located in the
c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config (Windows)
or /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config (Linux)
directory. All debug messages are logged to
c:\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\logs\debug\SymantecDLPManager.log
(Windows) or
/var/log/Symantec/DataLossPrevention/EnforceServer/15.5/debug/SymantecDLPManager.log
(Linux).
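The procedures in this chapter add several wrapper.java.additional.<n> properties to SymantecDLPManager.conf. As printed, the examples reuse index 22 for three different properties. In a Java Service Wrapper configuration file, each numbered line is a separate property key, so when the same key appears twice, typically only one of the values takes effect. The following Python sketch assumes the file uses simple `key=value` lines with `#` comments. It lists indexes that are assigned more than once and suggests the next unused index. The function names are hypothetical, and commented-out lines are not counted as used.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Active (uncommented) wrapper.java.additional.<n>=<value> lines.
_LINE = re.compile(r"^\s*wrapper\.java\.additional\.(\d+)\s*=\s*(.*?)\s*$")


def additional_indexes(conf_text: str) -> Dict[int, List[str]]:
    """Map each wrapper.java.additional index to the values assigned to it."""
    found: Dict[int, List[str]] = defaultdict(list)
    for line in conf_text.splitlines():
        match = _LINE.match(line)
        if match:  # lines that start with '#' never match, so comments are skipped
            found[int(match.group(1))].append(match.group(2))
    return dict(found)


def duplicate_indexes(conf_text: str) -> Dict[int, List[str]]:
    """Return only the indexes that are assigned more than once."""
    return {i: v for i, v in additional_indexes(conf_text).items() if len(v) > 1}


def next_free_index(conf_text: str, start: int = 1) -> int:
    """Return the smallest unused index at or above start."""
    used = set(additional_indexes(conf_text))
    index = start
    while index in used:
        index += 1
    return index
```

Run the check on a copy of the file's contents before you restart the Symantec DLP Manager service. Then renumber any duplicate before you rely on the new setting.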

Disabling password authentication and forms-based logon

Forms-based logon with password authentication can be used as a fallback access mechanism
while you configure and test certificate authentication. After you configure certificate
authentication, you can disable forms-based logon and password authentication. Your public
key infrastructure then handles all logon requests.
After you configure the common name (CN) mappings with both forms-based logon and certificate
authentication enabled, you can switch to certificate-only authentication. To do so, replace the
springSecurityContext.xml file with the springSecurityContext-Certificate.xml file and restart
the Enforce Server. Forms-based logon is then completely disabled.

Note: When you disable forms-based logon you disable the feature for all users, including
those with Administrator privileges. As an alternative, you can disable forms-based logon or
certificate authentication for an individual user by configuring that user's account.
See “Configuring user accounts” on page 121.

If you later turn on forms-based logon but the Administrator user account does not have a
password configured, you can reset the Administrator password using the AdminPasswordReset
utility.
See “Resetting the Administrator password” on page 125.
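Because switching to certificate-only logon replaces one file with another, it is worth keeping a backup so that forms-based logon can be restored later. The following Python sketch shows that backup-then-replace approach. It assumes that springSecurityContext.xml and springSecurityContext-Certificate.xml are in the same directory, which you pass in; this section does not give that directory. The `config_dir` argument, the `.forms-backup` file name, and both function names are hypothetical. Run it while the Enforce Server is stopped, and restart the server afterward.

```python
import shutil
from pathlib import Path

ACTIVE = "springSecurityContext.xml"
CERT_ONLY = "springSecurityContext-Certificate.xml"
BACKUP = "springSecurityContext.xml.forms-backup"  # hypothetical backup name


def switch_to_certificate_only(config_dir: str) -> Path:
    """Back up the active context file, then overwrite it with the
    certificate-only variant. Returns the path of the backup."""
    directory = Path(config_dir)
    active, cert_only, backup = (directory / n for n in (ACTIVE, CERT_ONLY, BACKUP))
    if not cert_only.is_file():
        raise FileNotFoundError(cert_only)
    if not backup.exists():
        # Keep only the first backup, so repeated runs never overwrite the
        # original forms-enabled configuration.
        shutil.copy2(active, backup)
    shutil.copy2(cert_only, active)
    return backup


def restore_forms_logon(config_dir: str) -> None:
    """Put the backed-up forms-enabled context file back in place."""
    directory = Path(config_dir)
    shutil.copy2(directory / BACKUP, directory / ACTIVE)
```

Keeping the backup also makes the recovery path explicit if no Administrator password is configured when forms-based logon is turned back on.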
Chapter 6
Connecting to group directories
This chapter includes the following topics:

■ Creating connections to LDAP servers

■ Configuring directory server connections

■ Scheduling directory server indexing

Creating connections to LDAP servers

Symantec Data Loss Prevention supports directory server connections to LDAP-compliant
directory servers such as Microsoft Active Directory (AD). A group directory connection specifies
how the Enforce Server or Discover Server connects to the directory server.
The connection to the directory server must be established before you create any User Groups
in the Enforce Server. The Enforce Server or Discover Server uses the connection to obtain
details about the groups. If this connection is not created, you cannot define any User
Groups. The connection is not persistent, but you can configure it to synchronize
at a specified interval. The directory server contains all of the information that you need to
create User Groups.
See “User Groups” on page 376.

Note: If you use a directory server that contains a self-signed authentication certificate, you
must add the certificate to the Enforce Server or the Discover Server. If your directory server
uses a pre-authorized certificate, it is automatically added to the Enforce Server or Discover
Server. See “Importing SSL certificates to Enforce or Discover servers” on page 277.

To create a group directory connection

1 Navigate to the System > Settings > Directory Connections screen.
2 Click Add Connection.
3 Configure the directory connection.
See “Configuring directory server connections” on page 156.

Configuring directory server connections

The Directory Connections page is the home page for configuring directory server connections.
Once you define the directory connection, you can create one or more User Groups.
See “Configuring User Groups” on page 936.

Table 6-1 Configuring directory server connections

Step Action Description

1 Navigate to the Directory Connections page (if not already there).
This page is available at System > Settings > Directory Connections.

2 Click Create New Connection.
This action takes you to the Configure Directory Connection page.

3 Enter a Name for the directory server connection.
The Connection Name is the user-defined name for the connection. It appears on the
Directory Connections home page once the connection is configured.

4 Specify the Network Parameters for the directory server connection.
Table 6-2 provides details on these parameters. Enter or specify the following parameters:
■ The Hostname of the computer where the directory server is installed.
■ The Port on the directory server that supports connections.
■ The Base DN (distinguished name) of the directory server.
■ The Encryption Method for the connection, either None or Secure.

5 Specify the Authentication mode for connecting to the directory server.
Table 6-3 provides details on configuring the authentication parameters.

6 Click Test Connection to verify the connection.
If there is anything wrong with the connection, the system displays an error message
describing the problem.

7 Click Save to save the directory connection configuration.
The system automatically indexes the directory server after you successfully create, test,
and save the directory server connection.

8 Select the Index and Replication Status tab.
Verify that the directory server was indexed. After some time (depending on the size of the
directory server query), the Replication Status should show "Completed <date> <time>". If
the status is not completed, verify that you have configured and tested the directory
connection properly, and contact your directory server administrator for assistance.

9 Select the Index Settings tab.
You can adjust the directory server indexing schedule as necessary on the Index Settings tab.

See “Scheduling directory server indexing” on page 158.

Table 6-2 Directory connection network parameters

Network parameters Description

Hostname Enter the Hostname of the directory server.

For example: enforce.dlp.symantec.com

You must enter the fully qualified domain name (FQDN) of the directory server. Do not use
the IP address.

Port Enter the connection Port for the directory server.

For example: 389

Typically, the port is 389, or 636 for secure connections.

Base DN Enter the Base DN for the directory server. This field only accepts one directory
server entry.

For example: dc=enforce,dc=dlp,dc=symantec,dc=com

The Base DN string cannot contain any space characters.

The Base DN is the base distinguished name of the directory server. Typically, this
name is derived from the domain name of the directory server. The Base DN parameter
defines the starting point of the directory server search.

Encryption Method Select the Secure option if you want the communication between the directory server
and the Enforce Server to be encrypted using SSL.
Note: If you choose to use a secure connection, you may need to import the SSL
certificate for the directory server to the Enforce Server keystore. See “Importing SSL
certificates to Enforce or Discover servers” on page 277.
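Table 6-2 notes that the Base DN is typically derived from the directory server's domain name and cannot contain spaces. It also notes that the port is typically 389, or 636 for secure connections. The following Python sketch shows the domain-to-Base DN conversion used in the table's examples and a default-port choice. The function names are hypothetical, and the results are only starting values, because your directory may use a different Base DN or port.

```python
def domain_to_base_dn(domain: str) -> str:
    """Convert a DNS domain such as enforce.dlp.symantec.com into
    dc=enforce,dc=dlp,dc=symantec,dc=com."""
    labels = [label for label in domain.strip().strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain")
    if any(" " in label for label in labels):
        raise ValueError("the Base DN cannot contain spaces")
    return ",".join(f"dc={label}" for label in labels)


def default_ldap_port(encryption_method: str) -> int:
    """Typical port for the Encryption Method setting: None or Secure."""
    ports = {"none": 389, "secure": 636}
    try:
        return ports[encryption_method.strip().lower()]
    except KeyError:
        raise ValueError("encryption method must be 'None' or 'Secure'") from None
```

For the example host in the table, `domain_to_base_dn("enforce.dlp.symantec.com")` produces the Base DN shown above.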

Table 6-3 Directory connection authentication parameters

Authentication Description

Authentication Select the Authentication option to connect to the directory server using
authentication mode. Check Connect with Credentials to add your username and
password to authenticate to the directory server.

Username To authenticate with Active Directory, use one of the following methods:

■ Domain and user name, for example: Domain\username

■ User name and domain, for example: username@domain.com
■ Fully distinguished user name and domain (without spaces), for example:
cn=username,cn=Users,dc=domain,dc=com
To authenticate with another type of directory server:

■ A different syntax may be required, for example:

uid=username,ou=people,o=company

Password Enter the password for the user name that was specified in the preceding field.

The password is obfuscated when you enter it.
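Table 6-3 lists three user name formats for Active Directory. The following Python sketch classifies a user name into one of those formats, which gives a quick sanity check before you click Test Connection. It is an illustration only. The format labels are hypothetical, the distinguished-name check follows the table's "without spaces" rule, and a value that passes the check can still be rejected by the directory server.

```python
import re

_DOMAIN_USER = re.compile(r"^[^\\\s]+\\[^\\\s]+$")      # Domain\username
_USER_AT_DOMAIN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # username@domain.com
# attr=value pairs separated by commas, no spaces, for example cn=...,dc=...
_DN = re.compile(r"^[A-Za-z]+=[^,=\s]+(?:,[A-Za-z]+=[^,=\s]+)+$")


def username_format(username: str) -> str:
    """Return 'domain\\user', 'user@domain', 'distinguished name', or 'unknown'."""
    if _DOMAIN_USER.match(username):
        return "domain\\user"
    if _USER_AT_DOMAIN.match(username):
        return "user@domain"
    if _DN.match(username):
        return "distinguished name"
    return "unknown"
```

The distinguished-name pattern also accepts the non-Active Directory example uid=username,ou=people,o=company, because it checks only the attr=value syntax.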

Scheduling directory server indexing

Each directory server connection is set to automatically index the User Groups that are
hosted in the configured LDAP directory server once, at 12:00 AM the day after you create the
initial connection. After you create, test, and save the directory server connection, the system
automatically indexes all of the User Groups that are hosted in that directory. You can modify
the indexing schedule to specify when and how often the index is synchronized: daily, weekly,
or monthly.

To schedule group directory indexing

1 Select an existing group directory server connection from the System > Settings >
Directory Connections screen. Or, create a new connection.
See “Configuring directory server connections” on page 156.
2 Adjust the Index Settings to the desired schedule.
See Table 6-4 on page 159.

Table 6-4 Schedule group directory server indexing and view status

Index Settings Description

Index the directory server once.
The Once setting is selected by default and automatically indexes the directory server at
12:00 AM the day after you create the initial connection.
You can modify the default Once indexing schedule by specifying when and how often the
index is rebuilt.

Index the directory server daily.
Select the Daily option to schedule the index daily.
Specify the time of day and, optionally, the Until duration for this schedule.

Index the directory server weekly.
Select the Weekly option to schedule the index to occur once a week.
Specify the day of the week to index.
Specify the time to index.
Optionally, specify the Until duration for this schedule.

Index the directory server monthly.
Specify the day of the month to index the directory and the time.
Optionally, specify the Until duration for this schedule.

View the indexing and replication status.
Select the Index and Replication Status tab to view the status of the indexing process.
■ Indexing Status
Displays the date and time of the next scheduled index.
■ Detection Server Name
Displays the detection server where the User Group profile is deployed.
■ Replication Status
Displays the date and time of the most recent synchronization with the
directory group server.
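The default Once schedule and the Daily and Weekly options in Table 6-4 can be expressed as a "next run" calculation. The following Python sketch uses only the standard `datetime` module. The function names, the schedule labels, and the treatment of the Until date as an inclusive last day are assumptions for illustration. Monthly scheduling is left out to keep the sketch short.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def first_once_run(created: datetime) -> datetime:
    """Default Once schedule: 12:00 AM on the day after the connection is created."""
    return datetime.combine(created.date() + timedelta(days=1), time(0, 0))


def next_run(schedule: str, now: datetime, at: time,
             weekday: Optional[int] = None,
             until: Optional[date] = None) -> Optional[datetime]:
    """Next indexing time for a 'daily' or 'weekly' schedule.

    weekday uses Python's convention (Monday=0 ... Sunday=6).
    Returns None when the next run would fall after the Until date.
    """
    candidate = datetime.combine(now.date(), at)
    if schedule == "daily":
        if candidate <= now:
            candidate += timedelta(days=1)  # today's slot has passed
    elif schedule == "weekly":
        if weekday is None:
            raise ValueError("weekly schedules need a weekday")
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        if candidate <= now:
            candidate += timedelta(days=7)  # this week's slot has passed
    else:
        raise ValueError(f"unsupported schedule: {schedule!r}")
    if until is not None and candidate.date() > until:
        return None
    return candidate
```

For a connection created at 3:30 PM, `first_once_run` returns 12:00 AM on the following day, which matches the default described above.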
Chapter 7
Managing stored credentials
This chapter includes the following topics:

■ About the credential store

■ Adding new credentials to the credential store

■ Configuring endpoint credentials

■ Managing credentials in the credential store

■ Managing stored credentials

About the credential store

An authentication credential can be stored as a named credential in a central credential store.
It can be defined once, and then referenced by any number of Discover targets. Passwords
are encrypted before they are stored.
The credential store simplifies management of user name and password changes.
You can add, delete, or edit stored credentials.
See “Adding new credentials to the credential store” on page 161.
See “Managing credentials in the credential store” on page 162.
The Credential Management screen is accessible to users with the "Credential Management"
privilege.
Stored credentials can be used when you edit or create a Discover target.
See “Network Discover/Cloud Storage Discover scan target configuration options” on page 2090.

Adding new credentials to the credential store

You can add new credentials to the credential store. These credentials can later be referenced
with the credential name.
To add a stored credential
1 Click System > Settings > Credentials, and click Add Credential.
2 Enter the following information:

Credential Name Enter a name for this stored credential.

The credential name must be unique within the
credential store. The name is used only to identify
the credential.

Access Username Enter the user name for authentication in the
NT4 format, <domain_name>\<username>. The
user name must be a Windows domain user account.

Access Password Enter the password for authentication.

Re-enter Access Password Re-enter the password.

3 Click Save.
4 You can later edit or delete credentials from the credential store.
See “Managing credentials in the credential store” on page 162.
See “Configuring endpoint credentials” on page 161.

Configuring endpoint credentials

You must add credentials to the Credential Store before you can use them for Endpoint
FlexResponse or the Endpoint Discover Quarantine response rule. The credentials are stored
in an encrypted folder on all endpoints that are connected to an Endpoint Server. Because every
endpoint stores the credentials, be careful about the type of credentials you store: use
credentials that cannot access other areas of your system. Before your endpoint credentials
can be used, you must enable the Enforce Server to recognize them.
To create endpoint credentials
1 Go to: System > Settings > General.
2 Click Configure.

3 Under the Credential Management section, ensure that the Allow Saved Credentials
on Endpoint Agent checkbox is selected.
4 Click Save.
5 Go to: System > Settings > Credentials.
6 Click Add Credential.
7 Under the General section, enter the details of the credential you want to add.
8 Under Usage Permission, select Servers and Endpoint agents.
9 Click Save.
See “About the credential store” on page 160.
See “Configuring the Endpoint Discover: Quarantine File action” on page 1815.

Managing credentials in the credential store

You can delete or edit a stored credential.
To delete a stored credential
1 Click System > Settings > Credentials. Locate the name of the stored credential that
you want to remove.
2 Click the delete icon to the right of the name. A credential can be deleted only if it is not
currently referenced in a Discover target or indexed document profile.
To edit a stored credential
1 Click System > Settings > Credentials. Locate the name of the stored credential that
you want to edit.
2 Click the edit icon (pencil) to the right of the name.
3 Update the user name or password.
4 Click Save.
5 If you change the password for a given credential, the new password is used for all
subsequent Discover scans that use that credential.
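The credential store rules described above can be modeled in a few lines: names must be unique, a credential is referenced by name, a referenced credential cannot be deleted, and a password change applies to later scans. The following Python sketch is hypothetical. It does not reflect how Symantec Data Loss Prevention stores or encrypts credentials. It also checks the <domain_name>\<username> NT4 format described for the Access Username field.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set

# <domain_name>\<username>: exactly one backslash, no whitespace (NT4 format).
_NT4 = re.compile(r"^[^\\\s]+\\[^\\\s]+$")


@dataclass
class Credential:
    username: str
    password: str  # the product encrypts stored passwords; this sketch does not


@dataclass
class CredentialStore:
    credentials: Dict[str, Credential] = field(default_factory=dict)
    references: Dict[str, Set[str]] = field(default_factory=dict)  # name -> targets

    def add(self, name: str, username: str, password: str) -> None:
        if name in self.credentials:
            raise ValueError(f"credential name {name!r} is already in use")
        if not _NT4.match(username):
            raise ValueError("use the NT4 format <domain_name>\\<username>")
        self.credentials[name] = Credential(username, password)

    def reference(self, name: str, target: str) -> Credential:
        """Record that a Discover target uses the named credential."""
        credential = self.credentials[name]  # KeyError if the name is unknown
        self.references.setdefault(name, set()).add(target)
        return credential

    def release(self, name: str, target: str) -> None:
        self.references.get(name, set()).discard(target)

    def update_password(self, name: str, password: str) -> None:
        # Targets look credentials up by name, so later scans see the change.
        self.credentials[name].password = password

    def delete(self, name: str) -> None:
        if self.references.get(name):
            raise ValueError(f"{name!r} is still referenced by {sorted(self.references[name])}")
        self.credentials.pop(name)
        self.references.pop(name, None)
```

Storing only the credential name in each target is what lets a single password update take effect everywhere.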

Managing stored credentials

An authentication credential can be stored in a central credential store. It can be defined once
as a named credential, and then referenced by any number of Network Discover/Cloud Storage
Discover targets.

Store your authentication credentials in a central store to simplify management of user name
and password changes.
You can add, delete, or edit stored credentials.
To add a stored credential
1 In System > Settings > Credentials, click Add Credential.
2 Enter the following information:

Credential Name Enter a name for this stored credential.

The credential name must be unique within the
credential store. The name is used only to identify
the credential.

Access Username Enter the user name for authentication.

Access Password Enter the password for authentication.

Re-enter Access Password Re-enter the password.

3 Click Save.
To delete a stored credential
1 In System > Settings > Credentials, locate the name of the stored credential that you
want to remove.
2 Click the delete icon to the right of the name. A credential can be deleted only if it is not
currently referenced in a Discover target or indexed document profile.
To edit a stored credential
1 In System > Settings > Credentials, locate the name of the stored credential that you
want to edit.
2 Click the edit icon (pencil) to the right of the name.
3 Update the user name or password.
4 Click Save.
5 If you change the password for a given credential, the new password is used for all
subsequent Discover scans that use that credential.
See “Providing the password authentication for Network Discover scanned content” on page 2095.
Chapter 8
Managing system events and messages
This chapter includes the following topics:

■ About system events

■ System events reports

■ Working with saved system reports

■ Server and Detectors event detail

■ Configuring event thresholds and triggers

■ About system event responses

■ Enabling a syslog server

■ About system alerts

■ Configuring the Enforce Server to send email alerts

■ Configuring system alerts

■ About log review

■ System event codes and messages

About system events

System events related to your Symantec Data Loss Prevention installation are monitored,
reported, and logged. System events include notifications from Cloud Operations for cloud
services.
System event reports are viewed from the Enforce Server administration console:
■ The five most recent system events of severity Warning or Severe are listed on the
Overview screen (System > Servers and Detectors > Overview).
See “About the Overview screen” on page 278.
■ Reports on all system events of any severity can be viewed by going to System > Servers
and Detectors > Events.
See “System events reports” on page 165.
■ Recent system events for a particular detection server or cloud service are listed on the
Server/Detector Detail screen for that server or detector.
See “Server/Detector Detail screen” on page 283.
■ Click on any event in an event list to go to the Event Details screen for that event. The
Event Details screen provides additional information about the event.
See “Server and Detectors event detail” on page 169.
There are three ways that system events can be brought to your attention:
■ System event reports displayed on the administration console
■ System alert email messages
See “About system alerts” on page 175.
■ Syslog functionality
See “Enabling a syslog server” on page 174.
Some system events require a response.
See “About system event responses” on page 172.
To narrow the focus of system event management you can:
■ Use the filters in the various system event notification methods.
See “System events reports” on page 165.
■ Configure the system event thresholds for individual servers.
See “Configuring event thresholds and triggers” on page 170.

System events reports

To view all system events, go to the system events report screen (System > Servers and
Detectors > Events). This screen lists events, one event per line. The list contains those
events that match the selected date range, and any other filter options that are listed in the
Applied Filters bar. For each event, the following information is displayed:

Table 8-1
Events Description

Type The type (severity) of the event. Type may be any one of those listed in Table 8-2.

Time The date and time of the event.

Server The name of the server on which the event occurred.

Host The IP address or host name of the server on which the event occurred.

Code A number that identifies the kind of event.

See “System event codes and messages” on page 180 for information on event
code numbers.

Summary A brief description of the event. Click on the summary for more detail about the event.

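Table 8-2 System event types

Event Description

Info System information.

Warning A problem that is not severe enough to generate an error.

Severe An error that requires immediate attention.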

You can select from several report handling options.

See “Common incident report features” on page 1933.
Click any event in the list to go to the Event Details screen for that event. The Event Details
screen provides additional information about the event.
See “Server and Detectors event detail” on page 169.
Since the list of events can be long, filters are available to help you select only the events that
you are interested in. By default, only the Date filter is enabled and it is initially set to All Dates.
The Date filter selects events by the dates the events occurred.
To filter the list of system events by date of occurrence
1 Go to the Filter section of the events report screen and select one of the date range
options. To specify beginning and end dates, select Custom from the date list.
2 Click Apply.
In addition to filtering by date range, you can also apply advanced filters. Advanced filters are
cumulative with the current date filter. This means that events are only listed if they match the
advanced filter and also fall within the current date range. Multiple advanced filters can be
applied. If multiple filters are applied, events are only listed if they match all the filters and the
date range.
To apply additional advanced filters
1 Click on Advanced Filters and Summarization.
2 Click on Add Filter.
3 Choose the filter you want to use from the left-most drop-down list. Available filters are
listed in Table 8-3.
4 Choose the filter-operator from the middle drop-down list.

Note: You can use the Cloud Operations filter value to view events from Cloud Operations
for your detectors.

For each advanced filter you can specify a filter-operator Is Any Of or Is None Of.
5 Enter the filter value, or values, in the right-hand text box, or click a value in the list to
select it.
■ To select multiple values from a list, hold down the Control key and click each one.
■ To select a range of values from a list, click the first one, then hold down the Shift key
and click the last value in the range you want.

6 (Optional) Specify additional advanced filters if needed.

7 When you have finished specifying a filter or set of filters, click Apply.
Click the red X to delete an advanced filter.
The Applied Filters bar lists the filters that are used to produce the list of events that is
displayed. Note that multiple filters are cumulative. For an event to appear on the list it must
pass all the applied filters.
The following advanced filters are available:

Table 8-3 System events advanced filter options

Filter Description

Event Code Filter events by the code numbers that identify each
kind of event. You can filter by a single code number
or by multiple code numbers separated by commas
(2121, 1202, 1204). Filtering by code number
ranges, or with greater-than or less-than operators, is
not supported.

Event type Filter events by event severity type (Info, Warning,
or Severe).

Server Filter events by the server on which the event
occurred.
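The filtering rules above can be sketched in Python. A date range is combined with advanced filters, each filter uses Is Any Of or Is None Of, and an event is listed only if it passes all of them. Event codes are entered as a comma-separated list, with no ranges. The `Event` and `AdvancedFilter` types and their field names are hypothetical. They model the report's behavior as described, not its implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass
class Event:
    time: datetime
    code: int
    severity: str  # "Info", "Warning", or "Severe"
    server: str


@dataclass
class AdvancedFilter:
    attribute: str  # "code", "severity", or "server"
    operator: str   # "Is Any Of" or "Is None Of"
    values: Set[object]

    def matches(self, event: Event) -> bool:
        value = getattr(event, self.attribute)
        if self.operator == "Is Any Of":
            return value in self.values
        if self.operator == "Is None Of":
            return value not in self.values
        raise ValueError(f"unknown operator: {self.operator!r}")


def parse_event_codes(text: str) -> Set[int]:
    """Parse '2121, 1202, 1204'; ranges such as '1200-1210' raise ValueError."""
    return {int(part) for part in text.split(",") if part.strip()}


def filter_events(events: Iterable[Event], start: Optional[datetime],
                  end: Optional[datetime],
                  filters: List[AdvancedFilter]) -> List[Event]:
    """Keep events inside the date range that pass every advanced filter."""
    result = []
    for event in events:
        if start is not None and event.time < start:
            continue
        if end is not None and event.time > end:
            continue
        if all(f.matches(event) for f in filters):  # filters are cumulative
            result.append(event)
    return result
```

Because every filter must match, adding a filter can only shorten the list. This is the cumulative behavior that the Applied Filters bar reflects.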

Note: A small subset of the parameters that trigger system events have thresholds that can
be configured. These parameters should only be adjusted with advice from Symantec Support.
Before changing these settings, you should have a thorough understanding of the implications
that are involved. The default values are appropriate for most installations.
See “Configuring event thresholds and triggers” on page 170.

See “About system events” on page 164.

See “Server and Detectors event detail” on page 169.
See “Working with saved system reports” on page 168.
See “About system alerts” on page 175.

Working with saved system reports

The System Reports screen lists system and agent-related reports that have previously been
saved. To display the System Reports screen, click System > System Reports. Use this
screen to work with saved system reports.
To create a saved system report
1 Go to one of the following screens:
■ System Events (System > Events)
■ Agents Overview (System > Agents > Overview)
■ Agents Events (System > Agents > Events)
See “About the Enforce Server administration console” on page 83.
2 Select the filters and summaries for your custom report.
See “About custom reports and dashboards” on page 1912.
3 Select Report > Save As.

4 Enter the saved report information.

See “Saving custom incident reports” on page 1914.
5 Click Save.
The System Reports screen is divided into two sections:
■ System Event - Saved Reports lists saved system reports.
■ Agent Management - Saved Reports lists saved agent reports.
For each saved report you can perform the following operations:
■ Share the report. Click share to allow other Symantec Data Loss Prevention users who
have the same role as you to share the report. Sharing a report cannot be undone; after a
report is shared it cannot be made private. After a report is shared, all users with whom it
is shared can view, edit, or delete the report.
See “Saving custom incident reports” on page 1914.
■ Change the report name or description. Click the pencil icon to the right of the report name
to edit it.
■ Change the report scheduling. Click the calendar icon to the right of the report name to
edit the delivery schedule of the report and to whom it is sent.
See “Saving custom incident reports” on page 1914.
See “Delivery schedule options for incident and system reports” on page 1917.
■ Delete the report. Click the red X to the right of the report name to delete the report.

Server and Detectors event detail

To view the Server and Detectors Event Detail screen, go to System > Servers and
Detectors > Events and click one of the listed events.
See “System events reports” on page 165.
The Server and Detectors Event Detail screen displays all of the information available for
the selected event. The information on this screen is not editable.
The Server and Detectors Event Detail screen is divided into two sections—General and
Message.

Table 8-4 Event detail — General

Item Description

Type The event is one of the following types:

■ Info: Information about the system.
■ Warning: A problem that is not severe enough to generate an error.
■ Severe: An error that requires immediate attention.

Time The date and time of the event.

Server or Detector The name of the server or detector.

Host The host name or IP address of the server.

Table 8-5 Event detail — Message

Item Description

Code A number that identifies the kind of event.

See “System event codes and messages” on page 180.

Summary A brief description of the event.

Detail Detailed information about the event.

See “About system events” on page 164.

See “System events reports” on page 165.
See “About system alerts” on page 175.

Configuring event thresholds and triggers

A small subset of the parameters that trigger system events has configurable thresholds.
These parameters are configured separately for each detection server or detector. Adjust
them only with advice from Symantec Support, and only after you thoroughly understand the
implications of the change. The default values are appropriate for most installations.
See “About system events” on page 164.

To view and change the configurable parameters that trigger system events
1 Go to the Overview screen (System > Servers and Detectors > Overview).
2 Click on the name of a detection server or detector to display that server's Server/Detector
Detail screen.
3 Click Server/Detector Settings.
The Advanced Server/Detector Settings screen for that server is displayed.
4 Change the configurable parameters, as needed.

Table 8-6 Configurable parameters that trigger events

Parameter Description Event

BoxMonitor.DiskUsageError
Description: Indicates the amount of filled disk space (as a percentage) that triggers a Severe system event. For example, if a detection server is installed on the C drive and this value is 90, the detection server generates a Severe system event when C drive usage is 90% or greater. The default is 90.
Event: Low disk space

BoxMonitor.DiskUsageWarning
Description: Indicates the amount of filled disk space (as a percentage) that triggers a Warning system event. For example, if a detection server is installed on the C drive and this value is 80, the detection server generates a Warning system event when C drive usage is 80% or greater. The default is 80.
Event: Low disk space

BoxMonitor.MaxRestartCount
Description: Indicates the number of times that a system process can be restarted in one hour before a Severe system event is generated. The default is 3.
Event: process_name restarts excessively

IncidentDetection.MessageWaitSevere
Description: Indicates the number of minutes that messages can wait to be processed before a Severe system event about message wait times is sent. The default is 240.
Event: Long message wait time

IncidentDetection.MessageWaitWarning
Description: Indicates the number of minutes that messages can wait to be processed before a Warning system event about message wait times is sent. The default is 60.
Event: Long message wait time

IncidentWriter.BacklogInfo
Description: Indicates the number of incidents that can be queued before an Info system event is generated. This type of backlog usually indicates that incidents are not being processed, or are not being processed correctly, because the system may have slowed down or stopped. The default is 1000.
Event: N incidents in queue

IncidentWriter.BacklogWarning
Description: Indicates the number of incidents that can be queued before a Warning system event is generated. This type of backlog usually indicates that incidents are not being processed, or are not being processed correctly, because the system may have slowed down or stopped. The default is 3000.
Event: N incidents in queue

IncidentWriter.BacklogSevere
Description: Indicates the number of incidents that can be queued before a Severe system event is generated. This type of backlog usually indicates that incidents are not being processed, or are not being processed correctly, because the system may have slowed down or stopped. The default is 10000.
Event: N incidents in queue
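To make the comparisons in Table 8-6 concrete, the following Python sketch maps a measured value (percentage of disk filled, minutes of message wait, or number of queued incidents) to the system event severity that the thresholds would produce. It is a hypothetical illustration, not product code: the dictionary only restates the documented defaults, the function names are invented, and treating every threshold as "reached at or above the value" is an assumption carried over from the disk usage descriptions.

```python
# Hypothetical sketch: maps a measured value to the system event severity
# that the Table 8-6 thresholds would produce. The numbers restate the
# documented defaults; a real server uses its own configured values.
DEFAULTS = {
    "BoxMonitor.DiskUsageWarning": 80,           # percent of disk filled
    "BoxMonitor.DiskUsageError": 90,             # percent of disk filled
    "IncidentDetection.MessageWaitWarning": 60,  # minutes of message wait
    "IncidentDetection.MessageWaitSevere": 240,  # minutes of message wait
    "IncidentWriter.BacklogInfo": 1000,          # queued incidents
    "IncidentWriter.BacklogWarning": 3000,
    "IncidentWriter.BacklogSevere": 10000,
}


def classify(value, levels):
    """Return the severity of the highest threshold that value reaches, or None.

    levels is a list of (severity, threshold) pairs in ascending threshold
    order; "reaches" means value >= threshold (an assumption).
    """
    result = None
    for severity, threshold in levels:
        if value >= threshold:
            result = severity
    return result


def disk_usage_event(percent_used, cfg=DEFAULTS):
    return classify(percent_used, [
        ("Warning", cfg["BoxMonitor.DiskUsageWarning"]),
        ("Severe", cfg["BoxMonitor.DiskUsageError"]),
    ])


def message_wait_event(minutes, cfg=DEFAULTS):
    return classify(minutes, [
        ("Warning", cfg["IncidentDetection.MessageWaitWarning"]),
        ("Severe", cfg["IncidentDetection.MessageWaitSevere"]),
    ])


def backlog_event(queued, cfg=DEFAULTS):
    return classify(queued, [
        ("Info", cfg["IncidentWriter.BacklogInfo"]),
        ("Warning", cfg["IncidentWriter.BacklogWarning"]),
        ("Severe", cfg["IncidentWriter.BacklogSevere"]),
    ])


print(disk_usage_event(85))     # Warning
print(message_wait_event(30))   # None
print(backlog_event(12000))     # Severe
```

Passing a different cfg dictionary shows how raising or lowering a threshold shifts the point at which each event would appear.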

About system event responses

There are three ways that system events can be brought to your attention:
■ System event reports displayed on the administration console
■ System alert email messages
See “About system alerts” on page 175.
■ Syslog functionality
See “Enabling a syslog server” on page 174.

In most cases, the system event summary and detail information should provide enough
information to direct investigation and remediation steps. The following table provides some
general guidelines for responding to system events.

Table 8-7 System event responses

System event or category Appropriate response

Low disk space
If this event is reported on a detection server, recycle the Symantec Data Loss Prevention services on the detection server. The detection server may have lost its connection to the Enforce Server, in which case it queues its incidents locally and fills up the disk.

If this event is reported on an Enforce Server, check the status of the Oracle and the Symantec DLP Incident Persister services. Low disk space may result if incidents do not transfer properly from the file system to the database. This event may also indicate that more disk space is needed.

Tablespace is almost full
Add data files to the database. When the hard disk is at 80% of capacity, obtain a bigger disk instead of adding more data files.

Refer to the Symantec Data Loss Prevention Installation Guide.

Licensing and versioning
Contact Symantec Support.

Monitor not responding
Restart the Symantec DLP Detection Server service. If the event persists, check the network connections. Verify that the computer that hosts the detection server is running by connecting to it with terminal services or another remote desktop connection method. If necessary, contact Symantec Support.

See “About Symantec Data Loss Prevention services” on page 101.

Alert or scheduled report sending failed
Go to System > Settings > General and ensure that the settings in the Reports and Alerts and SMTP sections are configured correctly. Check network connectivity between the Enforce Server and the SMTP server. If the problem persists, contact Symantec Support.

Auto key ignition failed
Contact Symantec Support.

Cryptographic keys are inconsistent
Contact Symantec Support.

Long message wait time
Increase detection server capacity by adding more CPUs or by replacing the computer with a more powerful one.

Decrease the load on the detection server. You can decrease the load by applying traffic filters that are configured to detect fewer incidents. You can also re-route portions of the traffic to other detection servers.

Increase the threshold wait times (IncidentDetection.MessageWaitWarning and IncidentDetection.MessageWaitSevere) if all of the following items are true:
■ This message is issued during peak hours.
■ The message wait time drops to zero before the next peak.
■ The business is willing to accept such delays in message processing.

process_name restarts excessively
Check the process by going to System > Servers and Detectors > Overview. To see individual processes on this screen, you must enable Process Control at System > Settings > General > Configure.

N incidents in queue
Investigate the reason for the incidents filling up the queue. The most likely reasons are as follows:
■ Connection problems. Response: Make sure the communication link between the Endpoint Server and the detection server is stable.
■ Insufficient connection bandwidth for the number of generated incidents (typical for WAN connections). Response: Consider changing policies (by configuring the filters) so that they generate fewer incidents.

Enabling a syslog server

Syslog functionality sends Severe system events to a syslog server. Syslog servers allow
system administrators to filter and route the system event notifications on a more granular
level. System administrators who use syslog regularly for monitoring their systems may prefer
to use syslog instead of alerts. Syslog may be preferred if the volume of alerts seems unwieldy
for email.
Syslog functionality is an on or off option. If syslog is turned on, all Severe events are sent to
the syslog server.

To enable syslog functionality

1 Go to the \Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config
directory on Windows or the /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config
directory on Linux.
2 Open the Manager.properties file.
3 Uncomment the #systemevent.syslog.host= line by removing the # symbol from the
beginning of the line, and enter the hostname or IP address of the syslog server.
4 Uncomment the #systemevent.syslog.port= line by removing the # symbol from the
beginning of the line. Enter the port number on which the syslog server accepts connections
from the Enforce Server. The default is 514.
5 Uncomment the #systemevent.syslog.format= [{0}] {1} - {2} line by removing the
# symbol from the beginning of the line. Then define the system event message format
to be sent to the syslog server:
If the line is uncommented without any changes, the notification messages are sent in the
format: [server name] summary - details. The format variables are:
■ {0} - the name of the server on which the event occurred
■ {1} - the event summary
■ {2} - the event detail
For example, the following configuration specifies that Severe system event notifications
are sent to a syslog host named server1 which uses port 600.

systemevent.syslog.host=server1
systemevent.syslog.port=600
systemevent.syslog.format= [{0}] {1} - {2}

Using this example, a low disk space event notification from an Enforce Server on a host
named dlp-1 would look like:

[dlp-1] Low disk space - Hard disk space for incident data storage server is low. Disk usage is over 82%.
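The substitution that the format string performs can be sketched in Python as follows. The function name is hypothetical, and the sketch handles only the {0}, {1}, and {2} placeholders described above. It also assumes that Manager.properties follows Java properties-file conventions, where whitespace after the = sign is not part of the value.

```python
import re


def render_syslog_message(fmt, server, summary, detail):
    """Replace {0}, {1}, and {2} in fmt with the server, summary, and detail."""
    values = [server, summary, detail]
    return re.sub(r"\{(\d)\}", lambda m: values[int(m.group(1))], fmt)


# Read the format value the way a properties-file parser would
# (assumption: whitespace after '=' is not part of the value).
line = "systemevent.syslog.format= [{0}] {1} - {2}"
key, _, value = line.partition("=")
fmt = value.lstrip()

print(render_syslog_message(
    fmt,
    "dlp-1",
    "Low disk space",
    "Hard disk space for incident data storage server is low. Disk usage is over 82%.",
))
# [dlp-1] Low disk space - Hard disk space for incident data storage server is low. Disk usage is over 82%.
```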

See “About system events” on page 164.
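If you manage several Enforce Servers, you may want to script steps 3 through 5. The following Python sketch is a hypothetical helper, not a Symantec utility: it rewrites the text of Manager.properties so that the three systemevent.syslog.* lines are uncommented and set. It assumes those lines already exist in the file, as the procedure describes; back up the file before you write the result over it.

```python
# Hypothetical helper: applies steps 3 through 5 to the text of
# Manager.properties. Only the three systemevent.syslog.* keys are changed;
# every other line is returned unchanged.
SYSLOG_KEYS = ("systemevent.syslog.host", "systemevent.syslog.port",
               "systemevent.syslog.format")


def enable_syslog(text, host, port=514, fmt="[{0}] {1} - {2}"):
    values = dict(zip(SYSLOG_KEYS, (host, str(port), fmt)))
    out = []
    for line in text.splitlines():
        # Drop any leading '#' so commented-out settings are recognized.
        candidate = line.strip().lstrip("#").strip()
        key, sep, _ = candidate.partition("=")
        key = key.strip()
        if sep and key in values:
            out.append(f"{key}={values[key]}")  # uncommented and set
        else:
            out.append(line)  # leave every other line untouched
    return "\n".join(out) + "\n"


sample = (
    "#systemevent.syslog.host=\n"
    "#systemevent.syslog.port=\n"
    "#systemevent.syslog.format= [{0}] {1} - {2}\n"
)
print(enable_syslog(sample, "server1", 600))
```

With the sample input, the result matches the server1 and port 600 example shown earlier in this section.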

About system alerts

System alerts are email messages that are sent to designated addresses when a particular
system event occurs. You define which alerts, if any, to use for your installation.

Alerts are specified and edited on the Configure Alert screen. To reach it, go to System
> Servers and Detectors > Alerts and click Add Alert.
Alerts can be specified based on event severity, server name, event code, or a combination
of those factors. Alerts can be sent for any system event.
The email that is generated by the alert has a subject line that begins with Symantec Data
Loss Prevention System Alert followed by a short event summary. The body of the email
contains the same information that the Server and Detectors Event Detail screen displays,
so it provides complete information about the event.
See “Configuring the Enforce Server to send email alerts” on page 176.
See “Configuring system alerts” on page 177.
See “Server and Detectors event detail” on page 169.

Configuring the Enforce Server to send email alerts

To send email alerts about specified system events, you must configure the Enforce Server
to send alerts and reports. This section describes how to specify the report format and how
to configure Symantec Data Loss Prevention to communicate with an SMTP server.
After completing the configuration described here, you can schedule the sending of specific
reports and create specific system alerts.
To configure Symantec Data Loss Prevention to send alerts and reports
1 Go to System > Settings > General and click Configure.
The Edit General Settings screen is displayed.
2 In the Reports and Alerts section, select one of the following distribution methods:
■ Send reports as links, logon is required to view. Symantec Data Loss Prevention
sends email messages with links to reports. You must log on to the Enforce Server to
view the reports.

Note: Reports with incident data cannot be distributed if this option is set.

■ Send report data with emails. Symantec Data Loss Prevention sends email messages
and attaches the report data.

3 Enter the Enforce Server domain name or IP address in the Fully Qualified Manager
Name field.
If you send reports as links, Symantec Data Loss Prevention uses the domain name as
the basis of the URL in the report email.
Do not specify a port number unless you have modified the Enforce Server to run on a
port other than the default of 443.
4 If you want alert recipients to see any correlated incidents, check the Correlations Enabled
box.
When correlations are enabled, users see them on the Incident Snapshot screen.
5 In the SMTP section, identify the SMTP server to use for sending out alerts and reports.
Enter the relevant information in the following fields:
■ Server: The fully qualified hostname or IP address of the SMTP server that Symantec
Data Loss Prevention uses to deliver system events and scheduled reports.
■ System email: The email address for the alert sender. Symantec Data Loss Prevention
specifies this email address as the sender of all outgoing email messages. Your IT
department may require the system email to be a valid email address on your SMTP
server.
■ User ID: If your SMTP server requires it, type a valid user name for accessing the
server. For example, enter DOMAIN\bsmith.
■ Password: If your SMTP server requires it, enter the password for the User ID.

6 Click Save.
See “About system alerts” on page 175.
See “Configuring system alerts” on page 177.
See “About system events” on page 164.
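Before you rely on alerts, you may want to confirm from the Enforce Server host that the SMTP server accepts mail with the settings you entered. The following Python sketch uses the standard smtplib and email modules. The port (25), the optional STARTTLS step, and the example addresses are assumptions to adjust for your mail server, and the function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a plain-text test message; sender plays the role of System email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP settings test"
    msg.set_content("Test message to verify the SMTP settings used for alerts and reports.")
    return msg


def send_test_message(server, sender, recipient, user=None, password=None,
                      port=25, use_starttls=False):
    """Send the test message; raises smtplib.SMTPException or OSError on failure."""
    msg = build_test_message(sender, recipient)
    with smtplib.SMTP(server, port, timeout=30) as smtp:
        if use_starttls:
            smtp.starttls()           # only if your server requires TLS
        if user:
            smtp.login(user, password)  # the User ID and Password fields
        smtp.send_message(msg)


# Example call (placeholder host and addresses):
# send_test_message("smtp.example.com", "dlp-alerts@example.com", "admin@example.com")
```

A failure here points to the SMTP settings or to network connectivity between the Enforce Server and the SMTP server rather than to the alert definitions.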

Configuring system alerts

You can configure Symantec Data Loss Prevention to send an email alert whenever it detects
a specified system event. Alerts can be specified based on event severity, server name, or
event code, or a combination of those factors. Alerts can be sent for any system event.
See “About system alerts” on page 175.
Note that the Enforce Server must first be configured to send alerts and reports.
See “Configuring the Enforce Server to send email alerts” on page 176.

Alerts are specified and edited on the Configure Alert screen. To reach it, go to System
> Servers and Detectors > Alerts, and then click Add Alert to create a new alert, or click the
name of an existing alert to modify it.
To create or modify an alert
1 Go to the Alerts screen (System > Servers and Detectors > Alerts).
2 Click the Add Alert tab to create a new alert, or click on the name of an alert to modify
it.
The Configure Alert screen is displayed.
3 Fill in (or modify) the name of the alert. The alert name is displayed in the subject line of
the email alert message.
4 Fill in (or modify) a description of the alert.
5 Click Add Condition to specify a condition that will trigger the alert.
Each time you click Add Condition you can add another condition. If you specify multiple
conditions, every one of the conditions must be met to trigger the alert.
Click on the red X next to a condition to remove it from an existing alert.
6 Enter the email address that the alert is to be sent to. Separate multiple addresses by
commas.
7 Limit the maximum number of times this alert can be sent in one hour by entering a number
in the Max Per Hour box.
If no number is entered in this box, there is no limit on the number of times this alert can
be sent out. The recommended practice is to limit alerts to one or two per hour, and to
substitute a larger number later if necessary. If you specify a large number, or no number
at all, recipient mailboxes may be overloaded with continual alerts.
8 Click Save to finish.
The Alerts list is displayed.
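The Max Per Hour limit in step 7 can be pictured as a counter over recent sends. The following Python sketch keeps a rolling 60-minute window. This guide does not say whether the product counts per rolling window or per clock hour, so the window, like the class name, is an assumption.

```python
from collections import deque


class AlertThrottle:
    """Allow at most max_per_hour sends in any rolling 3600-second window.

    max_per_hour=None models an empty Max Per Hour box: no limit.
    """

    def __init__(self, max_per_hour=None):
        self.max_per_hour = max_per_hour
        self.sent = deque()  # send times in seconds, oldest first

    def allow(self, now):
        if self.max_per_hour is None:
            return True
        # Forget sends that are an hour or more in the past.
        while self.sent and now - self.sent[0] >= 3600:
            self.sent.popleft()
        if len(self.sent) < self.max_per_hour:
            self.sent.append(now)
            return True
        return False


throttle = AlertThrottle(max_per_hour=2)
print([throttle.allow(t) for t in (0, 60, 120, 3601)])  # [True, True, False, True]
```

The third alert is suppressed because two were already sent within the hour; by 3601 seconds the first send has aged out of the window.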
There are three kinds of conditions that you can specify to trigger an alert:
■ Event type - the severity of the event.
■ Server - the server associated with the event.
■ Event code - a code number that identifies a particular kind of event.
For each kind of condition, you can choose one of two operators:
■ Is any of.
■ Is none of.
For each kind of condition, you can specify appropriate parameters:
■ Event type. You can select one or a combination of Information, Warning, and Severe. Click
an event type to specify it. To specify multiple types, hold down the Control key while
clicking event types. You can specify one, two, or all three types.
■ Server. You can select one or more servers from the list of available servers. Click the
name of a server to specify it. To specify multiple servers, hold down the Control key while
clicking server names. You can specify as many servers as necessary.
■ Event code. Enter the code number. To enter multiple code numbers, separate them with
commas or use the Return key to enter each code on a separate line.
See “System event codes and messages” on page 180.
By combining multiple conditions, you can define alerts that cover a wide variety of system
conditions.

Note: If you define more than one condition, the conditions are treated as if they were connected
by the Boolean "AND" operator. This means that the Enforce Server only sends the alert if all
conditions are met. For example, if you define an event type condition and a server condition,
the Enforce Server only sends the alert if the specified event occurs on the designated server.
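The way conditions combine can be sketched in Python: each condition tests one field of an event with "is any of" or "is none of", and the alert fires only when every condition matches. The class and function names are hypothetical, and this models the documented behavior rather than the Enforce Server's implementation. Because all() returns True for an empty list, this sketch would fire an alert that has no conditions on every event; how the product treats such an alert is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemEvent:
    event_type: str  # "Information", "Warning", or "Severe"
    server: str
    code: int


@dataclass(frozen=True)
class Condition:
    field: str       # "event_type", "server", or "code"
    operator: str    # "is any of" or "is none of"
    values: frozenset

    def matches(self, event):
        hit = getattr(event, self.field) in self.values
        if self.operator == "is any of":
            return hit
        if self.operator == "is none of":
            return not hit
        raise ValueError(f"unknown operator: {self.operator!r}")


def alert_fires(conditions, event):
    """All conditions must match (Boolean AND)."""
    return all(c.matches(event) for c in conditions)


def parse_codes(text):
    """Parse event codes separated by commas, line breaks, or both."""
    return frozenset(int(tok) for tok in text.replace(",", " ").split())


severe_on_dlp1 = [
    Condition("event_type", "is any of", frozenset({"Severe"})),
    Condition("server", "is any of", frozenset({"dlp-1"})),
]
print(alert_fires(severe_on_dlp1, SystemEvent("Severe", "dlp-1", 1014)))  # True
print(alert_fires(severe_on_dlp1, SystemEvent("Severe", "dlp-2", 1014)))  # False
```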

See “About system alerts” on page 175.

See “Configuring the Enforce Server to send email alerts” on page 176.
See “System events reports” on page 165.

About log review

Your Symantec Data Loss Prevention installation includes a number of log files. These files
provide information on server communication, Enforce Server and detection server operation,
incident detection, and so on.
By default, logs for the Enforce Server and detection server are stored in the following
directories:
■ Windows: c:\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\logs
■ Linux: /var/log/Symantec/DataLossPrevention/EnforceServer/15.5/
See “About log files” on page 333.
See also the Symantec Data Loss Prevention System Maintenance Guide for additional
information about working with logs.
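The following Python sketch returns the default log directory listed above for a given operating system and lists the most recently modified log files under a directory. The function names are hypothetical, and the *.log pattern is an assumption (file names such as FileReader.log appear elsewhere in this chapter). An installation in a non-default location uses different directories.

```python
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath

# Default Enforce Server and detection server log directories for 15.5,
# copied from the list above.
DEFAULT_LOG_DIRS = {
    "Windows": PureWindowsPath(
        r"c:\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\logs"
    ),
    "Linux": PurePosixPath("/var/log/Symantec/DataLossPrevention/EnforceServer/15.5"),
}


def default_log_dir(system=None):
    """Return the default log directory for system (defaults to this host's OS)."""
    system = system or platform.system()  # "Windows" or "Linux" on supported hosts
    if system not in DEFAULT_LOG_DIRS:
        raise ValueError(f"no default log directory documented for {system!r}")
    return DEFAULT_LOG_DIRS[system]


def newest_logs(directory, count=5):
    """Return up to count *.log files under directory, newest first."""
    files = [p for p in Path(directory).rglob("*.log") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)[:count]
```

On an Enforce Server or detection server host, newest_logs(default_log_dir()) lists the logs that changed most recently, which is often a quick way to find the file that matches a new system event.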

System event codes and messages

Symantec Data Loss Prevention system events are monitored, reported, and logged. Each
event is identified by a code number, listed in the following tables.
See “About system events” on page 164.
System event lists and reports can be filtered by event codes.
See “System events reports” on page 165.

Note: Numbers enclosed in braces, such as {0}, indicate text strings that are dynamically
inserted into the actual event name or description message.
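When you filter reports or alerts by event code, it can help to know which table a code belongs to. The tables in this section group codes mostly by hundreds, with 1750 through 1758 split out of the file scan range. The following Python sketch encodes that grouping. It is an observation about these tables, not a documented numbering rule; the function name is hypothetical, and the sketch assumes that codes 2000 through 2099 belong to Table 8-19.

```python
import bisect

# (first code, table category) pairs inferred from Tables 8-8 through 8-19.
RANGES = [
    (1000, "General detection server"),
    (1100, "Endpoint server"),
    (1200, "Detection configuration"),
    (1300, "File reader"),
    (1400, "ICAP"),
    (1500, "MTA"),
    (1600, "File inductor"),
    (1700, "File scan"),
    (1750, "Incident attachment external storage"),
    (1800, "Incident persister and incident writer"),
    (1900, "Install or update"),
    (2000, "Key ignition password"),
    (2100, None),  # codes from 2100 up are not covered by this sketch
]
STARTS = [start for start, _ in RANGES]


def event_category(code):
    """Return the table category for an event code, or None if unknown."""
    i = bisect.bisect_right(STARTS, code) - 1
    return RANGES[i][1] if i >= 0 else None


print(event_category(1014))  # General detection server
print(event_category(1752))  # Incident attachment external storage
```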

Table 8-8 General detection server events

Code Summary Description

1000 Monitor started All monitor processes have been started.

1001 Local monitor started All monitor processes have been started.

1002 Monitor started Some monitor processes are disabled and haven't been
started.

1003 Local monitor started Some monitor processes are disabled and haven't been
started.

1004 Monitor stopped All monitor processes have been stopped.

1005 Local monitor stopped All monitor processes have been stopped.

1006 {0} failed to start Process {0} can't be started. See log files for more detail.

1007 {0} restarts excessively Process {0} has restarted {1} times during last {2} minutes.

1008 {0} is down {0} process went down before it had fully started.

1010 Restarted {0} {0} process was restarted because it went down unexpectedly.

1011 Restarted {0} {0} was restarted because it was not responding.

1012 Unable to start {0} Cannot bind to the shutdown datagram socket. Will retry.

1013 {0} resumed starting Successfully bound to the shutdown socket.

1014 Low disk space Hard disk space is low. Symantec Data Loss Prevention
server disk usage is over {0}%.

Table 8-9 Endpoint server events

Code Summary Description

1100 Aggregator started None

1101 Aggregator failed to start Error starting Aggregator. {0} No incidents will be detected.

1102 Communications with non-legacy agents are disabled SSL keystore and truststore are not configured for this endpoint server. Please go to configure server page to configure SSL keystore and truststore.

Table 8-10 Detection configuration events

Code Summary Description

1200 Loaded policy "{0}" Policy "{0}" v{1} ({2}) has been successfully loaded.

1201 Loaded policies {0} None

1202 No policies loaded No relevant policies are found. No incidents will be detected.

1203 Unloaded policy "{0}" Policy "{0}" has been unloaded.

1204 Updated policy "{0}" Policy "{0}" has been successfully updated. The current policy
version is {1}. Active channels: {2}.

1205 Incident limit reached for Policy "{0}" The policy "{0}" has found incidents in more than {1} messages within the last {2} hours. The policy will not be enforced until the policy is changed, or the reset period of {2} hours is reached.

1206 Long message wait time Message wait time was {0}:{1}:{2}:{3}.

1207 Failed to load Vector Machine Learning profile Failed to load [{0}] Vector Machine Learning profile. See server logs for more details.

1208 Failed to unload Vector Machine Learning profile Failed to unload [{0}] Vector Machine Learning profile. See server logs for more details.

1209 Loaded Vector Machine Learning profile Loaded [{0}] Vector Machine Learning profile.

1210 Unloaded Vector Machine Learning profile Unloaded [{0}] Vector Machine Learning profile.

1211 Vector Machine Learning training successful Training succeeded for [{0}] Vector Machine Learning profile.

1212 Vector Machine Learning training failed Training failed for [{0}] Vector Machine Learning profile.

1213 {0} messages timed out in Detection recently {0} messages timed out in Detection in the last {1} minutes. Enable Detection execution trace logs for details.

1214 Detected regular expression rules with invalid patterns Policy set contains regular expression rule(s) with invalid patterns. See FileReader.log for details.

Table 8-11 File reader events

Code Summary Description

1301 File Reader started None

1302 File Reader failed to start Error starting File Reader. {0} No incidents will be detected.

1303 Unable to delete folder File Reader was unable to delete folder "{0}" in the file system.
Please investigate, as this will cause system malfunction.

1304 Channel enabled Monitor channel "{0}" has been enabled.

1305 Channel disabled Monitor channel "{0}" has been disabled.

1306 License received. {0}.

1306 License received. None

1307 started Process is started.

1308 down Process is down.

Table 8-12 ICAP events

Code Summary Description

1400 ICAP channel configured The channel is in {0} mode

1401 Invalid license The ICAP channel is not licensed or the license has expired.
No incidents will be detected or prevented by the ICAP
channel.

1402 Content Removal Incorrect Configuration rule in line {0} is outdated or not written in
proper grammar format. Either remove it from the config file
or update the rule.

1403 Out of memory Error (Web Prevent) while processing message While processing request on connection ID{0}, out of memory error occurred. Please tune your setup for traffic load.

1404 Host restriction Any host (ICAP client) can connect to ICAP Server.

1405 Host restriction error Unable to get the IP address of host {0}.

1406 Host restriction error Unable to get the IP address of any host in Icap.AllowHosts.

1407 Protocol Trace Enabled Enabled Traces available at {0}.

1408 Invalid Load Balance Factor Icap LoadBalanceFactor configured to 0. Treating it as 1.

Table 8-13 MTA events

Code Summary Description

1500 Invalid license The SMTP Prevent channel is not licensed or the license has
expired. No incidents will be detected or prevented by the
SMTP Prevent channel.

1501 Bind address error Unable to bind {0}. Please check the configured address or the RequestProcessor log for more information.

1502 MTA restriction error Unable to resolve host {0}.

1503 All MTAs restricted Client MTAs are restricted, but no hosts were resolved.
Please check the RequestProcessor log for more information
and correct the RequestProcessor.AllowHosts setting for this
Prevent server.

1504 Downstream TLS Handshake failed TLS handshake with downstream MTA {0} failed. Please check SmtpPrevent and RequestProcessor logs for more information.

1505 Downstream TLS Handshake successful TLS handshake with downstream MTA {0} was successfully completed.

Table 8-14 File inductor events

Code Summary Description

1600 Override folder invalid Monitor channel {0} has invalid source folder: {1} Using folder:
{2}.

1601 Source folder invalid Monitor channel {0} has invalid source folder: {1} The channel
is disabled.

Table 8-15 File scan events

Code Summary Description

1700 Scan start failed Discover target with ID {0} does not exist.

1701 Scan terminated {0}

1702 Scan completed Scan completed. Discover Target Name - "{0}"

1703 Scan start failed {0}

1704 Share list had errors {0}

1705 Scheduled scan failed Failed to start a scheduled scan of Discover target {0}. {1}

1706 Scan suspend failed {0}

1707 Scan resume failed {0}

1708 Scheduled scan suspension failed Scheduled suspension failed for scan of Discover target {0}. {1}

1709 Scheduled scan resume failed Scheduled suspension failed for scan of Discover target {0}. {1}

1710 Maximum Scan Duration Timeout Occurred Discover target "{0}" timed out because of Maximum Scan Duration.

1711 Maximum Scan Duration Timeout Failed Maximum scan time duration timed out for scan: {0}. However, an error occurred while trying to abort the scan.

1712 Scan Idle Timeout Occurred Discover target "{0}" timed out because of Scan Idle Timeout.

1713 Scan Idle Timeout Failed Maximum idle time duration timed out for scan: {0}. However, an error occurred while trying to abort the scan.

1714 Scan terminated - Invalid Server State Scan of discover target "{0}" has been terminated from the state of "{1}" because the associated discover server {2} entered an unexpected state of "{3}".

1715 Scan terminated - Server Removed Scan of discover target "{0}" has been terminated because the associated discover server {1} is no longer available.

1716 Scan terminated - Server Reassigned Scan of discover target "{0}" has been terminated because the associated discover server {1} is already scanning discover target(s) "{2}".

1717 Scan terminated - Transition Failed Failed to handle the state change of discover server {1} while scanning discover target "{0}". See log files for details.

1718 Scan start failed Scan of discover target "{0}" has failed to start. See log files
for detailed error description.

1719 Scan start failed due to unsupported target type Scan of discover target "{0}" has failed, as its target type is no longer supported.

1720 Scan started Scan started. Discover Target Name - "{0}"

1721 Scan paused Scan paused. Discover Target Name - "{0}"

1722 Scan stopped Scan stopped. Discover Target Name - "{0}"

1723 Scan queued Scan queued. Discover Target Name - "{0}"

1724 Scan failed Scan failed. Discover Target Name - "{0}"

Table 8-16 Incident attachment external storage events

Code Summary Description

1750 Incident attachment migration started Migration of incident attachments from database to external storage directory has started.

1751 Incident attachment migration completed Completed migrating incident attachments from database to external storage directory.

1752 Incident attachment migration failed One or more incident attachments could not be migrated from database to external storage directory. Check the incident persister log for more details. Once the error is resolved, restart the SymantecDLPIncidentPersisterService service to resume the migration.

1753 Incident attachment migration error. One or more incident attachments migration from database to external storage directory has encountered error. Check the incident persister log for more details. Migration will continue and will retry erred attachment later.

1754 Failed to update incident attachment deletion schedule Failed to update the schedule to delete incident attachments in the external directory. Check the incident persister log for more details.

1755 Incident attachment deletion started Deletion of obsolete incident attachments from the external storage directory has started.

1756 Incident attachment deletion completed Deletion of obsolete incident attachments from the external storage directory has completed.

1757 Incident attachment deletion failed One or more incident attachments could not be deleted from the external storage directory. Check the incident persister log for more details.

1758 Incident attachment external storage directory is not accessible Incident attachment external storage directory is not accessible. Check the incident persister log for more details.

Incident attachment external storage directory is accessible Incident attachment external storage directory is accessible.

Table 8-17 Incident persister and incident writer events

Code Summary Description

1800 Incident Persister is unable to process incident Persister ran out of memory processing incident {0}.

1801 Incident Persister failed to process incident {0}

1802 Corrupted incident received A corrupted incident was received, and renamed to {0}.

1803 Policy misconfigured Policy "{0}" has no associated severity.

1804 Incident Persister is unable to start Incident Persister cannot start because it failed to access the incident folder {0}. Check folder permissions.

1805 Incident Persister is unable to access Incidents folder The Incident Persister is unable to access the incident folder {0}. Check folder permissions.

1806 Response rule processing failed to start Response rule processing failed to start: {0}.

1807 Response rule processing execution failed Response rule command runtime execution failed from error: {0}.

1808 Unable to write incident Failed to delete old temporary file {0}.

1809 Unable to write incident Failed to rename temporary incident file {0}.

1810 Unable to list incidents Failed to list incident files in folder {0}. Check folder
permissions.

1811 Error sending incident Unexpected error occurred while sending an incident. {0}
Look in the incident writer log for more information.

Table 8-17 Incident persister and incident writer events (continued)

Code Summary Description

1812 Incident writer stopped Failed to delete incident file {0} after it was sent. Delete the
file manually, correct the problem and restart the incident
writer.

1813 Failed to list incidents Failed to list incident files in folder {0}. Check folder
permissions.

1814 Incident queue backlogged There are {0} incidents in this server's queue.

1815 Low disk space on incident server Hard disk space for the incident data storage server is low.
Disk usage is over {0}%.

1816 Failed to update policy statistics Failed to update policy statistics for policy {0}.

1817 Daily incident maximum exceeded The daily incident maximum for policy {0} has been exceeded.\n No further incidents will be generated.

1818 Incident is oversized, has been persisted with a limited number of components and/or violations Incident is oversized, has been partially persisted with messageID {0}, Incident File Name {1}.

1821 Failure to process an incident received from the cloud gateway Unexpected error occurred while sending an incident {0}
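Events 1808 and 1809 suggest that incident files are first written under a temporary name and then renamed into place. The sketch below illustrates that write-then-rename pattern in Python; the helper name `write_incident_atomically` and the `.tmp` suffix are hypothetical, and the code is not the product's implementation, only a way to see which step each event would correspond to.

```python
import os
import tempfile


def write_incident_atomically(folder: str, name: str, payload: bytes) -> str:
    """Hypothetical helper: write payload to folder/name via a temp file and a rename."""
    final_path = os.path.join(folder, name)
    tmp_path = final_path + ".tmp"  # assumed (hypothetical) temp-file naming convention

    # A leftover temp file from an earlier, interrupted attempt is removed first.
    # A failure at this step is the kind of problem event 1808 reports.
    if os.path.exists(tmp_path):
        os.remove(tmp_path)

    with open(tmp_path, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before renaming

    # On POSIX systems os.replace is atomic when both paths are on the same
    # file system, so a reader never sees a half-written file. A failure at
    # this step is the kind of problem event 1809 reports.
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        write_incident_atomically(folder, "incident-0001", b"example payload")
        print(os.listdir(folder))  # only the final file remains
```

Renaming rather than writing in place means a crash mid-write leaves only a stale temporary file, which the next attempt cleans up.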

Table 8-18 Install or update events

Code Summary Description

1900 Failed to load update package Database connection error occurred while loading the
software update package {0}.

1901 Software update failed Failed to apply software update from package {0}. Check the
update service log.

Table 8-19 Key ignition password events

Code Summary Description

2000 Key ignition error Failed to ignite keys with the new ignition password. Detection
against Exact Data Profiles will be disabled.

2001 Unable to update key ignition password. The key ignition password won't be updated, because the cryptographic keys aren't ignited. Exact Data Matching will be disabled.

Table 8-20 Admin password reset event code

Code Summary Description

2099 Administrator password reset The Administrator password has been reset by the password
reset tool.

Table 8-21 Manager administrator and policy events

Code Summary Description

2100 Administrator saved The administrator settings were successfully saved.

2101 Data source removed The data source with ID {0} was removed by {1}.

2102 Data source saved The {0} data source was saved by {1}.

2103 Document source removed The document source with ID {0} was removed by {1}.

2104 Document source saved The {0} document source was saved by {1}.

2105 New protocol created The new protocol {0} was created by {1}.

2106 Protocol order changed The protocol {0} was moved {1} by {2}.

2107 Protocol removed The protocol {0} was removed by {1}.

2108 Protocol saved The protocol {0} was edited by {1}.

2109 User removed The user with ID {0} was removed by {1}.

2110 User saved The user {0} was saved by {1}.

2111 Runaway lookup detected One of the attribute lookup plug-ins did not complete
gracefully and left a running thread in the system. Manager
restart may be required for cleanup.

2112 Loaded Custom Attribute Lookup Plug-ins The following Custom Attribute
Lookup Plug-ins were loaded: {0}.

2113 No Custom Attribute Lookup Plug-in was loaded No Custom Attribute Lookup Plug-in was found.

2114 Custom attribute lookup failed Lookup plug-in {0} timed out. It was unloaded.

2115 Custom attribute lookup failed Failed to instantiate lookup plug-in {0}. It was unloaded. Error
message: {1}

2116 Policy changed The {0} policy was changed by {1}.

2117 Policy removed The {0} policy was removed by {1}.


Table 8-21 Manager administrator and policy events (continued)

Code Summary Description

2118 Alert or scheduled report sending failed. {0} configured by {1} contains the following unreachable email addresses: {2}. Either the addresses are bad or your email server does not allow relay to those addresses.

2119 System settings changed The system settings were changed by {0}.

2120 Endpoint Location settings changed The endpoint location settings were changed by {0}.

2121 The account ''{1}'' has been locked out The maximum consecutive failed logon number of {0} attempts has been exceeded for account ''{1}'', consequently it has been locked out.

2122 Loaded FlexResponse Actions The following FlexResponse Actions were loaded: {0}.

2123 No FlexResponse Action was loaded. No FlexResponse Action was found.

2124 A runaway FlexResponse action was detected. One of the FlexResponse plug-ins did not complete gracefully and left a running thread in the system. Manager restart may be required for cleanup.

2125 Data Insight settings changed. The Data Insight settings were changed by {0}.

2126 Agent configuration created Agent configuration {0} was created by {1}.

2127 Agent configuration modified Agent configuration {0} was modified by {1}.

2128 Agent configuration removed Agent configuration {0} was removed by {1}.

2129 Agent configuration applied Agent configuration {0} was applied to endpoint server {1} by
{2}.

2130 Directory Connection source removed The directory connection source with ID {0} was removed by {1}.

2131 Directory Connection source saved The {0} directory connection source was saved by {1}.

2132 Agent Troubleshooting Task Agent Troubleshooting task of type {0} created by user {1}.

2133 Certificate authority file generated. Certificate authority file {0} generated.

2134 Certificate authority file is corrupt. Certificate authority file {0} is corrupt.

Table 8-21 Manager administrator and policy events (continued)

Code Summary Description

2135 Password changed for certificate authority file. Password changed for certificate authority file {0}. New certificate authority file is {1}.

2136 Server keystore generated. Server keystore {0} generated for endpoint server {1}.

2137 Server keystore is missing or corrupt. Server keystore {0} for endpoint server {1} is missing or corrupt.

2138 Server truststore generated. Server truststore {0} generated for endpoint server {1}.

2139 Server truststore is missing or corrupt. Server truststore {0} for endpoint server {1} is missing or corrupt.

2140 Client certificates and key generated. Client certificates and key generated.

2141 Agent installer package generated. Agent installer package generated for platforms {0}.
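The messages in these tables use numbered placeholders ({0}, {1}, and so on) that are filled in when the event is raised, and event 2121 wraps a placeholder in doubled single quotes (''{1}''). That matches the java.text.MessageFormat convention, in which two single quotes produce one literal quote; whether the product actually uses MessageFormat is an assumption. The minimal Python sketch below handles only {n} and '' (not MessageFormat's quoted sections or number and date formatting), and `render_event_message` and the account name `jsmith` are hypothetical.

```python
import re

# Matches either a doubled single quote or a numbered placeholder such as {0}.
_TOKEN = re.compile(r"''|\{(\d+)\}")


def render_event_message(template: str, *args: object) -> str:
    """Simplified MessageFormat-style rendering (a sketch, not the product's code).

    {n} is replaced by args[n] and '' becomes one literal quote. Placeholders
    without a matching argument are left unchanged.
    """

    def substitute(match: re.Match) -> str:
        if match.group(1) is None:  # the '' alternative matched
            return "'"
        index = int(match.group(1))
        return str(args[index]) if index < len(args) else match.group(0)

    return _TOKEN.sub(substitute, template)


template_2121 = (
    "The maximum consecutive failed logon number of {0} attempts has been "
    "exceeded for account ''{1}'', consequently it has been locked out."
)
print(render_event_message(template_2121, 5, "jsmith"))
# The maximum consecutive failed logon number of 5 attempts has been exceeded
# for account 'jsmith', consequently it has been locked out.
```

Reading the doubled quotes this way explains why the rendered event shows the account name in single quotes.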

Table 8-22 Enforce licensing and key ignition events

Code Summary Description

2200 End User License Agreement accepted The Symantec Data Loss Prevention End User License Agreement was accepted by {0}, {1}, {2}.

2201 License is invalid None

2202 License has expired One or more of your product licenses has expired. Some
system features may be disabled. Check the status of your
licenses on the system settings page.

2203 License about to expire One or more of your product licenses will expire soon. Check
the status of your licenses on the system settings page.

2204 No license The license does not exist, is expired or invalid. No incidents
will be detected.

2205 Keys ignited The cryptographic keys were ignited by administrator logon.

2206 Key ignition failed Failed to ignite the cryptographic keys manually. Please look
in the Enforce Server logs for more information. It will be
impossible to create new exact data profiles.

2207 Auto key ignition The cryptographic keys were automatically ignited.

Table 8-22 Enforce licensing and key ignition events (continued)

Code Summary Description

2208 Manual key ignition required The automatic ignition of the cryptographic keys is not
configured. Administrator logon is required to ignite the
cryptographic keys. No new exact data profiles can be created
until the administrator logs on.

Table 8-23 Manager major events

Code Summary Description

2300 Low disk space Hard disk space is low. Symantec Data Loss Prevention
Enforce Server disk usage is over {0}%.

2301 Tablespace is almost full Oracle tablespace {0} is over {1}% full.

2302 {0} not responding Detection Server {0} did not update its heartbeat for at least
20 minutes.

2303 Monitor configuration changed The {0} monitor configuration was changed by {1}.

2304 System update uploaded A system update was uploaded that affected the following
components: {0}.

2305 SMTP server is not reachable. SMTP server is not reachable. Cannot send out alerts or
schedule reports.

2306 Enforce Server started The Enforce Server was started.

2307 Enforce Server stopped The Enforce Server was stopped.

2308 Monitor status updater exception The monitor status updater encountered a general exception.
Please look at the Enforce Server logs for more information.

2309 System statistics update failed Unable to update the Enforce Server disk usage and database
usage statistics. Please look at the Enforce Server logs for
more information.

2310 Statistics aggregation failure The statistics summarization task encountered a general
exception. Refer to the Enforce Server logs for more
information.

2311 Version mismatch Enforce version is {0}, but this monitor's version is {1}.

2312 Incident deletion failed Incident Deletion failed.

2313 Incident deletion completed Incident deletion ran for {0} and deleted {1} incident(s).

2314 Endpoint data deletion failed Endpoint data deletion failed.


Table 8-23 Manager major events (continued)

Code Summary Description

2315 Low disk space on incident server Hard disk space for the incident data storage server is low.
Disk usage is over {0}%.

2316 Over {0} incidents currently contained in the database Persisting over {0} incidents can decrease database performance.

2318 Incident deletion flagging process started. Incident deletion flagging process started.

2319 Incident deletion flagging process ended. Incident deletion flagging process ended.
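Event 2302 is raised when a detection server has not updated its heartbeat for at least 20 minutes. A minimal sketch of that staleness test follows, assuming the Enforce Server keeps a last-heartbeat timestamp per server; the dictionary layout, the function name, and the server names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold quoted in the description of event 2302.
HEARTBEAT_TIMEOUT = timedelta(minutes=20)


def stale_detection_servers(last_heartbeat: dict[str, datetime], now: datetime) -> list[str]:
    """Return servers whose heartbeat is at least 20 minutes old (hypothetical helper)."""
    return sorted(
        name
        for name, seen in last_heartbeat.items()
        if now - seen >= HEARTBEAT_TIMEOUT
    )


now = datetime(2019, 8, 19, 12, 0, tzinfo=timezone.utc)
heartbeats = {
    "detector-a": now - timedelta(minutes=5),
    "detector-b": now - timedelta(minutes=20),
    "detector-c": now - timedelta(hours=1),
}
print(stale_detection_servers(heartbeats, now))  # ['detector-b', 'detector-c']
```

Using `>=` matches the "at least 20 minutes" wording, so a server exactly 20 minutes silent is reported.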

Table 8-24 Monitor version support events

Code Summary Description

2320 Version obsolete Detection server is not supported when two major versions
older than Enforce server version. Enforce version is {0}, and
this detection server's version is {1}. This detection server
must be upgraded.

2321 Version older than Enforce version Enforce will not have visibility for this detection server and will not be able to send updates to it. Detection server incidents will be received and processed normally. Enforce version is {0}, and this detection server's version is {1}.

2322 Version older than Enforce version Functionality introduced with recent versions of Enforce relevant to this type of detection server will not be supported by this detection server. Enforce version is {0}, and this detection server's version is {1}.

2323 Minor version older than Enforce minor version Functionality introduced with recent versions of Enforce relevant to this type of detection server will not be supported by this detection server and might be incompatible with this detection server. Enforce version is {0}, and this detection server's version is {1}. This detection server should be upgraded.

2324 Version newer than Enforce version Detection server is not supported when its version is newer than the Enforce server version. Enforce version is {0}, and this detection server's version is {1}. Enforce must be upgraded or detection server must be downgraded.
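The five events in Table 8-24 depend on how the detection server version compares with the Enforce Server version. The sketch below classifies a pair of versions under two assumptions: each version is a hypothetical (major, minor) pair of integers, and "two major versions older" means a gap of two or more. The table gives no rule for choosing between 2321 and 2322, so the sketch returns both as candidates; `Version` and `version_events` are illustrative names, not product APIs.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Assumed version model: an integer (major, minor) pair."""
    major: int
    minor: int


def version_events(enforce: Version, detector: Version) -> tuple[int, ...]:
    """Return candidate Table 8-24 event codes for a detection server version."""
    if detector > enforce:   # NamedTuples compare field by field
        return (2324,)       # newer than Enforce: not supported
    major_gap = enforce.major - detector.major
    if major_gap >= 2:
        return (2320,)       # two or more major versions older: obsolete
    if major_gap == 1:
        return (2321, 2322)  # older major version; the table does not say which applies
    if detector.minor < enforce.minor:
        return (2323,)       # same major, older minor: should be upgraded
    return ()                # same version: no version event


print(version_events(Version(15, 5), Version(14, 6)))  # (2321, 2322)
```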

Table 8-25 Manager reporting events

Code Summary Description

2400 Export web archive finished Archive "{0}" for user {1} was created successfully.

2401 Export web archive canceled Archive "{0}" for user {1} was canceled.

2402 Export web archive failed Failed to create archive "{0}" for user {1}. The report specified
had over {2} incidents.

2403 Export web archive failed Failed to create archive "{0}" for user {1}. Failure occurred at
incident {2}.

2404 Unable to run scheduled report The scheduled report job {0} was invalid and has been
removed.

2405 Unable to run scheduled report The scheduled report {0} owned by {1} encountered an error:
{2}.

2406 Report scheduling is disabled The scheduled report {0} owned by {1} cannot be run because
report scheduling is disabled.

2407 Report scheduling is disabled The scheduled report cannot be run because report
scheduling is disabled.

2408 Unable to run scheduled report Unable to connect to mail server when delivering scheduled
report {0}{1}.

2409 Unable to run scheduled report User {0} is no longer in role {1} which scheduled report {2}
belongs to. The schedule has been deleted.

2410 Unable to run scheduled report Unable to run scheduled report {0} for user {1} because the
account is currently locked.

2411 Scheduled report sent The scheduled report {0} owned by {1} was successfully sent.

2412 Export XML report failed XML Export of report by user [{0}] failed.

2420 Unable to run scheduled data owner report distribution Unable to distribute report {0} (id={1}) by data owner because sending of report data has been disabled.

2421 Report distribution by data owner failed Report distribution by data owner for report {0} (id={1}) failed.

2422 Report distribution by data owner finished Report distribution by data owner for report {0} (id={1}) finished with {2} incidents for {3} data owners. {4} incidents for {5} data owners failed to be exported.

Table 8-25 Manager reporting events (continued)

Code Summary Description

2423 Report distribution to data owner truncated The report distribution {1} (id={2}) for the data owner "{0}" exceeded the maximum allowed size. Only the first {3} incidents were sent to "{0}".

Table 8-26 Messaging events

Code Summary Description

2500 Unexpected Error Processing Message {0} encountered an unexpected error processing a message. See the log file for details.

2501 Memory Throttler disabled {0} x {1} bytes need to be available for memory throttling.
Only {2} bytes were available. Memory Throttler has been
disabled.
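Event 2501 states the memory throttler's requirement as a product, {0} x {1} bytes, and disables the throttler when only {2} bytes are available. The sketch below shows that arithmetic. The table does not say what the two factors represent, so the parameter names `count` and `unit_bytes`, like the function name, are hypothetical.

```python
from typing import Optional


def memory_throttler_event(count: int, unit_bytes: int, available_bytes: int) -> Optional[str]:
    """Return the event 2501 text if the throttler must be disabled, else None.

    count and unit_bytes are hypothetical names for placeholders {0} and {1}.
    """
    required_bytes = count * unit_bytes
    if available_bytes >= required_bytes:
        return None  # enough memory: the throttler stays enabled
    return (
        f"{count} x {unit_bytes} bytes need to be available for memory throttling. "
        f"Only {available_bytes} bytes were available. Memory Throttler has been disabled."
    )


print(memory_throttler_event(4, 256_000_000, 2_000_000_000))  # None
print(memory_throttler_event(4, 256_000_000, 512_000_000))
```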

Table 8-27 Detection server communication events

Code Summary Description

2600 Communication error Unexpected error occurred while sending {1} updates to {0}.
{2} Please look at the monitor controller logs for more
information.

2650 Communication error(VML) Unexpected error occurred while sending profile updates
config set {0} to {1} {2}. Please look at the monitor controller
logs for more information.

Table 8-28 Monitor controller events

Code Summary Description

2700 Monitor Controller started Monitor Controller service was started.

2701 Monitor Controller stopped Monitor Controller service was stopped.

2702 Update transferred to {0} Successfully transferred update package {1} to detection
server {0}.

2703 Update transfer complete Successfully transferred update package {0} to all detection
servers.

2704 Update of {0} failed Failed to transfer update package to detection server {0}.

2705 Configuration file delivery complete Successfully transferred config file {0} to detection server.

Table 8-28 Monitor controller events (continued)

Code Summary Description

2706 Log upload request sent. Successfully sent log upload request {0}.

2707 Unable to send log upload request Encountered a recoverable error while attempting to deliver log upload request {0}.

2708 Unable to send log upload request Encountered an unrecoverable error while attempting to deliver log upload request {0}.

2709 Using built-in certificate Using built-in certificate to secure the communication between
Enforce and Detection Servers.

2710 Using user generated certificate Using user generated certificate to secure the communication
between Enforce and Detection Servers.

2711 Time mismatch between Enforce and Monitor. This may affect certain functions in the system. Time mismatch between Enforce and Monitor. It is recommended to fix the time on the monitor through automatic time synchronization.

2712 Connected to cloud detector Connected to cloud detector.

2713 Cloud connector disconnected Error {0} - check your network settings.

Table 8-29 Packet capture events

Code Summary Description

2800 Bad spool directory configured for Packet Capture Packet Capture has been configured with a spool directory: {0}. This directory does not have write privileges. Please check the directory permissions and monitor configuration file. Then restart the monitor.

2801 Failed to send list of NICs. {0} {0}.

Table 8-30 EDM index events and messages

Code Summary Description

2900 EDM profile search failed {0}.

2901 Keys are not ignited Exact Data Matching will be disabled until the cryptographic
keys are ignited.

2902 Index folder inaccessible Failed to list files in the index folder {0}. Check the
configuration and the folder permissions.

Table 8-30 EDM index events and messages (continued)

Code Summary Description

2903 Created index folder The local index folder {0} specified in the configuration had
not existed. It was created.

2904 Invalid index folder The index folder {0} specified in the configuration does not
exist.

2905 Exact data profile creation failed Data file for exact data profile "{0}" was not created. Please
look in the enforce server logs for more information.

2906 Indexing canceled Creation of database profile "{0}" was canceled.

2907 Replication canceled Canceled replication of database profile "{0}" version {1} to
server {2}.

2908 Replication failed Connection to database was lost while replicating database
profile {0} to server {1}.

2909 Replication failed Database error occurred while replicating database profile
{0} to server {1}.

2910 Failed to remove index file Failed to delete index file {1} of database profile {0}.

2911 Failed to remove index files Failed to delete index files {1} of database profile {0}.

2912 Failed to remove orphaned file Failed to remove orphaned database profile index file {0}.

2913 Replication failed Replication of database profile {0} to server {2} failed.{1}
Check the monitor controller log for more details.

2914 Replication completed Completed replication of database profile {0} to server {2}.
File {1} was transferred successfully.

2915 Replication completed Completed replication of database profile {0} to the server
{2}. Files {1} were transferred successfully.

2916 Database profile removed Database profile {0} was removed. File {1} was deleted
successfully.

2917 Database profile removed Database profile {0} was removed. Files {1} were deleted
successfully.

2918 Loaded database profile Loaded database profile {0} from {1}.

2919 Unloaded database profile Unloaded database profile {0}.

2920 Failed to load database profile {2} No incidents will be detected against database profile "{0}"
version {1}.

Table 8-30 EDM index events and messages (continued)

Code Summary Description

2921 Failed to unload database profile {2} It may not be possible to reload the database profile "{0}"
version {1} in the future without detection server restart.

2922 Couldn't find registered content Registered content with ID {0} wasn't found in database during
indexing.

2923 Database error Database error occurred during indexing. {0}

2924 Process shutdown during indexing The process has been shutdown during indexing. Some registered content may have failed to create.

2925 Policy is inaccurate Policy "{0}" has one or more rules with unsatisfactory
detection accuracy against {1}.{2}

2926 Created exact data profile Created {0} from file "{1}".\nRows processed: {2}\nInvalid
rows: {3}\nThe exact data profile will now be replicated to all
Symantec Data Loss Prevention Servers.

2927 User Group "{0}" synchronization failed The following User Group directories have been removed/renamed in the Directory Server and could not be synchronized: {1}. Please update the "{2}" User Group page to reflect such changes.

2928 One or more EDM profiles are out of date and must be reindexed Check the "Manage > Data Profiles > Exact Data" page for more details. The following EDM profiles are out of date: {0}.

Table 8-31 IDM index events and messages

Code Summary Description

3000 {0} {1} Document profile wasn't created.

3001 Indexing canceled Creation of document profile "{0}" was canceled.

3002 Replication canceled Canceled replication of document profile "{0}" version {1} to
server {2}.

3003 Replication failed Connection to database was lost while replicating document
profile "{0}" version {1} to server {2}.

3004 Replication failed Database error occurred while replicating document profile
"{0}" version {1} to server {2}.

3005 Failed to remove index file Failed to delete index file {2} of document profile "{0}" version
{1}.

Table 8-31 IDM index events and messages (continued)

Code Summary Description

3006 Failed to remove index files Failed to delete index files {2} of document profile "{0}" version
{1}.

3007 Failed to remove orphaned file {0}

3008 Replication failed Replication of document profile "{0}" version {1} to server {3}
failed. {2}\nCheck the monitor controller log for more details.

3009 Replication completed Completed replication of document profile "{0}" version {1}
to server {3}. File {2} was transferred successfully.

3010 Replication completed Completed replication of document profile "{0}" version {1}
to server {3}.\nFiles {2} were transferred successfully.

3011 Document profile removed Document profile "{0}" version {1} was removed. File {2} was
deleted successfully.

3012 Document profile removed Document profile "{0}" version {1} was removed. Files {2}
were deleted successfully.

3013 Loaded document profile Loaded document profile "{0}" version {1} from {2}.

3014 Unloaded document profile Unloaded document profile "{0}" version {1}.

3015 Failed to load document profile {2} No incidents will be detected against document profile "{0}"
version {1}.

3016 Failed to unload document profile {2} It may not be possible to reload the document profile "{0}"
version {1} in the future without monitor restart.

3017 Created document profile Created "{0}" from "{1}". There are {2} accessible files in the
content root. {3} The profile contains index for {4}
document(s). {5} The document profile will now be replicated
to all Symantec Data Loss Prevention Servers.

3018 Document profile {0} has reached maximum size. Only {1} out of {2} documents
are indexed.

3019 Nothing to index Document source "{0}" found no files to index.

3020 Created document profile Created "{0}" from "{1}". There are {2} accessible files in the
content root. {3} The profile contains index for {4}
document(s). Comparing to last indexing run: {5} new
document(s) were added, {6} document(s) were updated, {7}
documents were unchanged, and {8} documents were
removed. The document profile will now be replicated to all
Symantec Data Loss Prevention servers.

Table 8-31 IDM index events and messages (continued)

Code Summary Description

3021 Nothing to index The new remote IDM profile for source "{0}" was identical to
the previous imported version.

3022 Profile conversion IDM profile {0} has been converted to {1} on the endpoint.

3023 Endpoint IDM profiles memory usage IDM profile {0} size plus already deployed profiles size are too large to fit on the endpoint, only exact matching will be available.
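Event 3020 summarizes an incremental indexing run by comparing it with the previous run: new, updated, unchanged, and removed documents. The sketch below reproduces that bookkeeping, assuming each run is summarized as a mapping from document path to a content fingerprint (here a SHA-256 digest). How the product actually detects changed documents is not described in this table, and the function and file names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint used to detect changed documents (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def compare_index_runs(previous: dict[str, str], current: dict[str, str]) -> dict[str, int]:
    """Classify documents by comparing two runs' {path: fingerprint} maps."""
    counts = {"new": 0, "updated": 0, "unchanged": 0, "removed": 0}
    for path, digest in current.items():
        if path not in previous:
            counts["new"] += 1
        elif previous[path] != digest:
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1
    # Documents present in the last run but absent now were removed from the content root.
    counts["removed"] = sum(1 for path in previous if path not in current)
    return counts


last_run = {"a.docx": fingerprint(b"v1"), "b.docx": fingerprint(b"same"), "c.docx": fingerprint(b"old")}
this_run = {"a.docx": fingerprint(b"v2"), "b.docx": fingerprint(b"same"), "d.docx": fingerprint(b"new")}
print(compare_index_runs(last_run, this_run))
# {'new': 1, 'updated': 1, 'unchanged': 1, 'removed': 1}
```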

Table 8-32 Attribute lookup events

Code Summary Description

3100 Invalid Attributes detected with Script Lookup Plugin Invalid or unsafe Attributes passed from Standard In were removed during script execution. Please check the logs for more details.

3101 Invalid Attributes detected with Script Lookup Plugin Invalid or unsafe Attributes passed to Standard Out were removed during script execution. Please check the logs for more details.

Table 8-33 Monitor stub events

Code Summary Description

3200 AggregatorStub started None

3201 {0} updated List of updates:{1}.

3202 {0} store initialized Initial items:{1}.

3203 Received {0} Size: {1} bytes.

3204 FileReaderStub started None

3205 IncidentWriterStub started Using test incidents folder {0}.

3206 Received configuration for {0} {1}.

3207 PacketCaptureStub started None

3208 RequestProcessorStub started None

3209 Received advanced settings None

3210 Updated settings Updated settings:{0}.


Table 8-33 Monitor stub events (continued)

Code Summary Description

3211 Loaded advanced settings None

3212 UpdateServiceStub started None

3213 DetectionServerDatabaseStub started None

Table 8-34 Packet capture events

Code Summary Description

3300 Packet Capture started Packet Capture has successfully started.

3301 Capture failed to start on device {0} Device {0} is configured for capture, but could not be initialized. Please see PacketCapture.log for more information.

3302 PacketCapture could not elevate its privilege level PacketCapture could not elevate its privileges. Some initialization tasks are likely to fail. Please check ownership and permissions of the PacketCapture executable.

3303 PacketCapture failed to drop its privilege level Root privileges are still attainable after attempting to drop them. PacketCapture will not continue.

3304 Packet Capture started again as more disk space is available Packet capture started processing again because some disk space was freed on the monitor hard drives.

3305 Packet Capture stopped due to disk space limit Packet capture stopped processing packets because there is too little space on the monitor hard drives.

3306 Endace DAG driver is not available Packet Capture was unable to activate Endace device support. Please see PacketCapture.log for more information.

3307 PF_RING driver is not available Packet Capture was unable to activate devices using the
PF_RING interface. Please check PacketCapture.log and
your system logs for more information.

3308 PACKET_MMAP driver is not available Packet Capture was unable to activate devices using the PACKET_MMAP interface. Please check PacketCapture.log and your system logs for more information.

3309 {0} is not available Packet Capture was unable to load {0}. No native capture
interface is available. Please see PacketCapture.log for more
information.

Table 8-34 Packet capture events (continued)

Code Summary Description

3310 No {0} Traffic Captured {0} traffic has not been captured in the last {1} seconds.
Please check Protocol filters and the traffic sent to the
monitoring NIC.

3311 Could not create directory Could not create directory {0} : {1}.
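Events 3305 and 3304 describe Packet Capture stopping when the monitor's disks run low and starting again once space is freed. One common way to build such a stop/resume pair without rapid flapping is hysteresis: stop below a low-water mark and resume only above a higher one. The tables do not say whether Packet Capture uses two thresholds, so the class `CaptureDiskGuard` and its byte thresholds are assumptions for illustration.

```python
from typing import Optional


class CaptureDiskGuard:
    """Hypothetical stop/resume controller with separate low and high water marks."""

    def __init__(self, stop_below: int, resume_above: int) -> None:
        if resume_above <= stop_below:
            raise ValueError("resume_above must be greater than stop_below")
        self.stop_below = stop_below
        self.resume_above = resume_above
        self.capturing = True

    def update(self, free_bytes: int) -> Optional[int]:
        """Feed the current free disk space; return 3305 on stop, 3304 on restart."""
        if self.capturing and free_bytes < self.stop_below:
            self.capturing = False
            return 3305  # stopped due to disk space limit
        if not self.capturing and free_bytes > self.resume_above:
            self.capturing = True
            return 3304  # started again as more disk space is available
        return None  # no state change, so no event


guard = CaptureDiskGuard(stop_below=1_000, resume_above=5_000)
print([guard.update(free) for free in (8_000, 900, 3_000, 6_000)])
# [None, 3305, None, 3304]
```

Between the two marks the guard keeps its current state, which is what stops capture from toggling on every small change in free space.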

Table 8-35 Log collection events

Code Summary Description

3400 Couldn't add files to zip The files requested for collection could not be written to an
archive file.

3401 Couldn't send log collection The files requested for collection could not be sent.

3402 Couldn't read logging properties A properties file could not be read. Logging configuration
changes were not applied.

3403 Couldn't unzip log configuration package The zip file containing logging configuration changes could not be unpacked. Configuration changes will not be applied.

3404 Couldn't find files to collect There were no files found for the last log collection request
sent to server.

3405 File creation failed Could not create file to collect endpoint logs.

3406 Disk usage exceeded File creation failed due to insufficient disk space.

3407 Max open file limit exceeded File creation failed as max allowed number of files are already
open.

Table 8-36 Enforce SPC events

Code Summary Description

3500 SPC Server successfully registered. SPC Server successfully registered. Product Instance Id [{0}].

3501 SPC Server successfully unregistered. SPC Server successfully unregistered. Product Instance Id [{0}].

3502 A self-signed certificate was generated. A self-signed certificate was generated. Certificate alias [{0}].

Table 8-37 Enforce user data sources events

Code Summary Description

3600 User import completed successfully. User import from source {0} completed successfully.

3601 User import failed. User import from data source {0} has failed.

3602 Updated user data linked to incidents. Updated user data linked to {0} existing incident events.

Table 8-38 Catalog item distribution related events

Code Summary Description

3700 Unable to write catalog item Failed to delete old temporary file {0}.

3701 Unable to rename catalog item Failed to rename temporary catalog item file {0}.

3702 Unable to list catalog items Failed to list catalog item files in folder {0}. Check folder
permissions.

3703 Error sending catalog items Unexpected error occurred while sending a catalog item. {0} Look in the file reader log for more information.

3704 File Reader failed to delete files. Failed to delete catalog file {0} after it was sent.\nDelete the
file manually, correct the problem and restart the File Reader.

3705 Failed to list catalog item files Failed to list catalog item files in folder {0}. Check folder
permissions.

3706 The configuration is not valid. The property {0} was configured with invalid value {1}. Please
make sure that this has correct value provided.

3707 Scan failed: Remediation detection catalog could not be updated Remediation detection catalog update timed out after {0} seconds for target {1}.

Table 8-39 Detection server database events

Code Summary Description

3800 DetectionServerDatabase started None

3801 DetectionServerDatabase failed to start Error starting DetectionServerDatabase. Reason: {0}.

Table 8-39 Detection server database events (continued)

Code Summary Description

3802 Invalid Port for DetectionServerDatabase Could not retrieve the port for DetectionServerDatabase process to listen to connection. Reason: {0}. Check if the property file setting has the valid port number.

Table 8-40 Endpoint communication layer events

Code Summary Description

3900 Internal communications error. Internal communications error. Please see {0} for errors.
Search for the string {1}.

3901 System events have been suppressed. System event throttle limit exceeded. {0} events have been suppressed. Internal error code = {1}.
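Event 3901 reports that a system event throttle limit was exceeded and how many events were suppressed. The limit and time window are not documented here; the sketch below assumes a simple fixed-window throttle with a hypothetical per-window limit, and counts suppressed events so the total can be reported afterwards. `EventThrottle` is an illustrative name, not a product class.

```python
from typing import Optional


class EventThrottle:
    """Fixed-window throttle: allow `limit` events per window, count the rest."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start: Optional[float] = None
        self.emitted_in_window = 0
        self.suppressed = 0

    def allow(self, now: float) -> bool:
        """Return True if an event at time `now` (in seconds) may be emitted."""
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now  # start a new window
            self.emitted_in_window = 0
        if self.emitted_in_window < self.limit:
            self.emitted_in_window += 1
            return True
        self.suppressed += 1  # the kind of total event 3901 reports in {0}
        return False


throttle = EventThrottle(limit=2, window_seconds=60)
print([throttle.allow(t) for t in (0, 1, 2, 3)], throttle.suppressed)
# [True, True, False, False] 2
```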

Table 8-41 Agent communication event code

Code Summary Description

4000 Agent Handshaker error Agent Handshaker error. Please see {0} for errors. Search
for the string {1}.

Table 8-42 Monitor controller replication communication layer application error events

Code Summary Description

4050 Agent data batch persist error Unexpected error occurred while agent data was being persisted: {0}. Please look at the monitor controller logs for more information.

4051 Agent status attribute batch persist error Status attribute data for {0} agent(s) could not be persisted. Please look at the monitor controller logs for more information.

4052 Agent event batch persist Event data for {0} agent(s) could not be persisted. Please
look at the monitor controller logs for more information.

Table 8-43 Enforce Server web services event code

Code | Summary | Description

4101 | Response Rule Execution Service Database failure on request fetch | Request fetch failed even after {0} retries. Database connection still down. The service will be stopped.

Table 8-44 Cloud service enrollment events

Code | Summary | Description

4200 | Cloud Service enrollment: successfully received client certificate from Symantec Managed PKI Service | Cloud Service enrollment: successfully received client certificate from Symantec Managed PKI Service.

4201 | Cloud Service enrollment: error requesting client certificate from Symantec Managed PKI Service | ERROR {0}.

4205 | Symantec Managed PKI certificate expires in {0} days | Symantec Managed PKI certificate expires in {0} days.

4206 | Symantec Managed PKI Service certificate has expired | Symantec Managed PKI Service certificate has expired.

4210 | Cloud Service enrollment bundle error | Invalid enrollment file content.

4211 | Cloud Service enrollment bundle error | Enrollment file missing from ZIP bundle.

4212 | Invalid Cloud Detector enrollment bundle | Detector info doesn't match the existing configuration.

Table 8-45 Cloud detector event code

Code | Summary | Description

4300 | Cloud Detector created in Enforce | Cloud detector {0} created in Enforce.

Table 8-46 User Groups profile event code

Code | Summary | Description

4400 | One or more User Group profiles are out of date and must be reindexed. | Check the "Manage > Policies > User Groups" page for more details. The following User Group profiles are out of date: {0}.

Table 8-47 Cloud operations event code

Code | Summary | Description

4701 | Cloud operations events or notifications | Cloud operations issued an event or notification about the cloud service.

Table 8-48 OCR event codes

Code | Summary | Description

4800 | OCR service is busy | Request not processed. OCR server's request queue is full.

4801 | Request failed to connect to OCR server | Please verify OCR server's address, port, and that it is reachable. Check logs for more detail.

4802 | OCR server had an internal server error | Please check OCR server logs for details about what went wrong.

4803 | OCR request was not successful | {0}

4804 | Failed to initialize OCR Client | {0}

4805 | An Unknown error encountered | {0}

4807 | The client and/or OCR server are not authorized with each other | Unable to verify client and server with each other as authorized endpoints. Please verify that the client and server keystores are configured correctly. Check logs on detection server and OCR server for more details.
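
In the Description column, {0}, {1}, and so on are placeholders that are filled in at run time with values such as a target name or a retry count. The following Python sketch assumes those placeholders are positional, which makes them compatible with Python's str.format. It renders a message from a template and, in reverse, matches a logged message back to its event code, which can help when you search logs or alerts. The EVENT_TEMPLATES dictionary holds only a few descriptions copied from the tables above, and all names in the sketch are hypothetical, not part of Symantec Data Loss Prevention.

```python
import re

# Hypothetical lookup table: a few Description templates copied from the
# event tables above. Extend it with the codes you care about.
EVENT_TEMPLATES = {
    3707: "Remediation detection catalog update timed out after {0} seconds for target {1}.",
    3901: "System event throttle limit exceeded. {0} events have been suppressed. Internal error code = {1}.",
    4205: "Symantec Managed PKI certificate expires in {0} days.",
    4300: "Cloud detector {0} created in Enforce.",
}

# Matches positional placeholders such as {0} or {1}.
PLACEHOLDER = re.compile(r"\{\d+\}")


def render_event(code, *args):
    """Fill the positional placeholders of an event description."""
    return EVENT_TEMPLATES[code].format(*args)


def template_to_regex(template):
    """Escape the literal text and turn each placeholder into a capture group."""
    literals = PLACEHOLDER.split(template)
    return re.compile("^" + "(.+?)".join(re.escape(p) for p in literals) + "$")


PATTERNS = {code: template_to_regex(t) for code, t in EVENT_TEMPLATES.items()}


def identify_event(message):
    """Return (code, captured values) for the first matching template, or None."""
    for code, pattern in PATTERNS.items():
        match = pattern.match(message)
        if match:
            return code, match.groups()
    return None
```

Splitting each template on its placeholders before escaping keeps the literal text exact, so only the placeholder positions match variable values.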
Chapter 9
Managing the Symantec Data Loss Prevention database
This chapter includes the following topics:

■ Working with Symantec Data Loss Prevention database diagnostic tools

■ Viewing tablespaces and data file allocations

■ Viewing table details

■ Checking the database update readiness

Working with Symantec Data Loss Prevention database diagnostic tools
The Enforce Server administration console lets you view diagnostic information about the
tablespaces and tables in your database to help you better manage your database resources.
You can see how full your tablespaces and tables are, and whether the data files in each
tablespace can extend automatically to accommodate more data. This information can help you
decide where to enable the Oracle Autoextend feature on data files, or how to otherwise manage
your database resources. You can also generate a detailed database report to share with
Symantec Technical Support for help with troubleshooting database issues.
You can view the allocation of tablespaces, including the size, space usage, extendability,
status, and number of files in each tablespace. You can also view the name, size, and
Autoextend setting for each file in a tablespace. In addition, you can view table-level allocations
for incident data tables, other tables, indexes, and large object (LOB) tables.

You can generate a full database report in HTML format to share with Symantec Technical
Support at any time by clicking Get full report. The data in the report can help Symantec
Technical Support troubleshoot issues in your database.
See “Generating a database report” on page 208.

Viewing tablespaces and data file allocations

You can view tablespaces and data file allocations on the Database Tablespaces Summary
page (System > Database > Tablespaces Summary).
The Database Tablespaces Summary page displays the following information:
■ Name: The name of the tablespace.
■ Size: The size of the tablespace in megabytes.
■ Used (%): The percentage of the tablespace currently in use. This percentage is calculated
based on the Used (MB) and Size values. It does not take into account the Extendable
To (MB) value.
■ Used (MB): The amount of the tablespace currently in use, in megabytes.
■ Extendable To (MB): The size to which the tablespace can be extended. This value is
based on the Autoextend settings of the files within the tablespace.
■ Status: The current status of the tablespace, based on the percentage of the tablespace
currently in use relative to the warning thresholds. If you are using the default warning
threshold settings, the status is:
■ OK: The tablespace is under 80% full, or the tablespace can be automatically extended.
■ Warning: The tablespace is between 80% and 90% full. If you see a warning on a
tablespace, consider enabling Autoextend on the data files in the tablespace or
increasing the maximum size to which the data files can automatically extend.
■ Severe: The tablespace is more than 90% full. If you see a severe warning on a
tablespace, you should enable Autoextend on the data files in the tablespace, increase
the maximum size to which the data files can automatically extend, or determine whether
you can purge some of the data in the tablespace.

■ Number of Files: The number of data files in the tablespace.
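
The status rules above can be summarized as a small function. This Python sketch assumes the default thresholds, treats a tablespace that can still auto-extend as OK regardless of its percentage (one reading of the OK rule), and assigns exactly 80% and exactly 90% to Warning, because the text does not specify boundary handling. The function names are hypothetical; this is not the Enforce Server's own code.

```python
def used_percent(used_mb: float, size_mb: float) -> float:
    """Used (%) is derived from Used (MB) and Size only; Extendable To (MB) is ignored."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return 100.0 * used_mb / size_mb


def tablespace_status(used_mb: float, size_mb: float, can_autoextend: bool,
                      warning: float = 80.0, severe: float = 90.0) -> str:
    """Classify a tablespace as OK, Warning, or Severe (assumed boundary handling)."""
    pct = used_percent(used_mb, size_mb)
    # Assumption: any tablespace that can still auto-extend is reported as OK.
    if can_autoextend or pct < warning:
        return "OK"
    # Between the warning and severe thresholds, inclusive.
    if pct <= severe:
        return "Warning"
    return "Severe"


# Example: 850 MB used of 1000 MB, no Autoextend.
print(tablespace_status(850, 1000, can_autoextend=False))  # Warning
```

Passing warning=85 and severe=95 mirrors the thresholds recommended for large databases in the next section.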

Select a tablespace from the list to view details about the files it contains. The tablespace file
view displays the following information:
■ Name: The name of the file.
■ Size: The size of the file, in megabytes.
■ Auto Extendable: Specifies if the file is automatically extendable based on the Autoextend
setting of the file in the Oracle database.
■ Extendable To (MB): The maximum size to which the file can be automatically extended,
in megabytes.
■ Path: The path to the file.

Adjusting warning thresholds for tablespace usage in large databases

If your database contains a very large amount of data (1 terabyte or more), you may want to
adjust the warning thresholds for tablespace usage. For such large databases, Symantec
recommends adjusting the Warning threshold to 85% full, and the Severe threshold to 95%
full. You may want to set these thresholds even higher for larger databases. You can specify
these values in the Manager.properties file.
To adjust the tablespace usage warning thresholds
1 Open the Manager.properties file in a text editor.
2 Set the Warning and Severe thresholds to the following values:

com.vontu.manager.tablespaceThreshold.warning=85
com.vontu.manager.tablespaceThreshold.severe=95

3 Save the changes to the Manager.properties file and close it.

4 Restart the Symantec DLP Manager service to apply your changes.
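
If you prefer to script step 2, the following Python sketch sets the two threshold keys in the text of a .properties file. It replaces every existing occurrence of each key (in Java properties files the last occurrence of a key wins) and appends any key that is missing. It is a simplified, hypothetical helper: it recognizes only key=value and key:value lines, does not handle line continuations or escaped separators, and reads and writes the file as ISO-8859-1 (the traditional encoding of Java properties files). The location of Manager.properties depends on your installation, and you still need to restart the Symantec DLP Manager service afterward.

```python
import re
from pathlib import Path

# Values recommended above for databases of 1 terabyte or more.
THRESHOLDS = {
    "com.vontu.manager.tablespaceThreshold.warning": "85",
    "com.vontu.manager.tablespaceThreshold.severe": "95",
}

# A key at the start of a non-comment line, followed by '=' or ':'.
KEY_RE = re.compile(r"\s*([^#!=:\s][^=:\s]*)\s*[=:]")


def set_properties(text: str, updates: dict) -> str:
    """Return text with each key in updates set to its new value."""
    seen = set()
    out = []
    for line in text.splitlines():
        match = KEY_RE.match(line)
        key = match.group(1) if match else None
        if key in updates:
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Append keys that were not already present in the file.
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    return "\n".join(out) + "\n"


def update_file(path) -> None:
    """Rewrite the properties file at path with the new thresholds."""
    p = Path(path)
    # latin-1 round-trips every byte, so unrelated content is preserved.
    p.write_text(set_properties(p.read_text(encoding="latin-1"), THRESHOLDS),
                 encoding="latin-1")
```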

Generating a database report

You can generate a full database report in HTML format at any time by clicking Get full report
on the Database Tablespaces Summary page. The database report includes the following
information:
■ Detailed database information
■ Incident data distribution
■ Message data distribution
■ Policy group information
■ Policy information
■ Endpoint agent information
■ Detection server (monitor) information
Symantec Technical Support may request this report to help troubleshoot database issues.

To generate a database report

1 Navigate to System > Database > Tablespaces Summary.
2 Click Get full report.
3 The report takes several minutes to generate. Refresh your screen after several minutes
to view the link to the report.
4 To open or save the report, click the link above the Tablespaces Allocation table. The
link includes the timestamp of the report for your convenience.
5 In the Open File dialog box, choose whether to open the file or save it.
6 To view the report, open it in a web browser or text editor.
7 To update the report, click Update full report.

Viewing table details

You can view table-level allocations on the Database Table Details page (System > Database
> Table Details). Viewing table-level allocations can be useful after a large data purge to see
the de-allocation of space within your database segments. You can refresh the information
displayed on this page by clicking Update table data at any time.
The Database Table Details page displays your table-level allocations on one of four tabs:
■ Incident Tables: This tab lists all the incident data tables in the Symantec Data Loss
Prevention database schema. The tab displays the following information:
■ Table Name: The name of the table.
■ In Tablespace: The name of the tablespace that contains the table.
■ Size (MB): The size of the table, in megabytes.
■ % Full: The percentage of the table currently in use.

■ Other Tables: This tab lists all other tables in the schema. The tab displays the following
information:
■ Table Name: The name of the table.
■ In Tablespace: The name of the tablespace that contains the table.
■ Size (MB): The size of the table, in megabytes.
■ % Full: The percentage of the table currently in use.

■ Indices: This tab lists all of the indexes in the schema. The tab displays the following
information:
■ Index Name: The name of the index.
■ Table Name: The name of the table that contains the index.
■ In Tablespace: The name of the tablespace that contains the index.
■ Size (MB): The size of the index, in megabytes.
■ % Full: The percentage of the index currently in use.

■ LOB Segments: This tab lists all of the large object (LOB) tables in the schema. The tab
displays the following information:
■ Table Name: The name of the table.
■ Column Name: The name of the table column containing the LOB data.
■ In Tablespace: The name of the tablespace that contains the table.
■ LOB Segment Size (MB): The size of the LOB segment, in megabytes.
■ LOB Index Size: The size of the LOB index, in megabytes.
■ % Full: The percentage of the table currently in use.

Note: For each table, the percentage used value shows, in dark blue, the percentage of the
table currently in use as reported by the Oracle database. It also shows, in light blue, an
additional estimated percentage-used range that Symantec Data Loss Prevention calculates
based on tablespace utilization.

Checking the database update readiness

You use the Update Readiness tool to confirm that the Oracle database is ready to upgrade
to the next Symantec Data Loss Prevention version.
The Update Readiness tool tests the following items in the database schema:
■ Oracle version
■ Oracle patches
■ Permissions
■ Tablespaces
■ Existing schema against standard schema
■ Real Application Clusters
■ Change Data Capture
■ Virtual columns
■ Partitioned tables
■ Numeric overflow
■ Temp Oracle space
Table 9-1 lists tasks you complete to run the tool.

Table 9-1 Using the Update Readiness tool

Step | Task | Details

1 | Prepare to run the Update Readiness tool. | See “Preparing to run the Update Readiness tool” on page 211.

2 | Create the Update Readiness tool database account. | See “Creating the Update Readiness tool database account” on page 213.

3 | Run the tool. | See “Running the Update Readiness tool at the command line” on page 215.

4 | Review the update readiness results. | See “Reviewing update readiness results” on page 218.

Preparing to run the Update Readiness tool

Preparing the Update Readiness tool includes downloading the tool and moving it to the Enforce
Server.
To prepare the Update Readiness tool
1 Obtain the latest version of the tool (for both major and minor release versions of Symantec
Data Loss Prevention) from Software Downloads.
The tool file name is Symantec_DLP_15.5_Update_Readiness_Tool_15.5.0-1.zip. The
tool version changes when updated tools are released.
The latest version of the Update Readiness tool includes important fixes and improvements,
and should be the version that you use before attempting an upgrade. See the Support
Center article About the Symantec Data Loss Prevention Update Readiness tool, and
URT test results for information about the latest version; subscribe to the article to be
informed about new versions.
Symantec recommends that you download the tool to the DLPDownloadHome directory.

Note: Review the Readme file that is included with the tool for a list of Symantec Data
Loss Prevention versions the tool is capable of testing.

2 Log on as Administrator to the database server system.

3 Confirm the following if you are running a three-tier deployment:
■ That you are running the same Oracle Client version as the Oracle Server version.
If the versions do not match, the Oracle Client cannot connect to the database, which
causes the Update Readiness tool to fail.
■ That the Oracle Client is installed as Administrator.
If the Oracle Client is not installed as Administrator, reinstall it and select Administrator
on the Select Installation Type panel. Selecting Administrator enables the
command-line clients, expdp and impdp.

4 Stop Oracle database jobs if your database has scheduled jobs.

See “Stopping Oracle database jobs” on page 212.
5 Unzip the tool, then copy the contents of the unzipped folder to the following location. Do
not copy the unzipped folder itself to this location; the contents of the tool folder must reside
directly in the URT folder:
c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\Migrator\URT\
(for Windows)
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/Migrator/URT/
(for Linux)
During the upgrade process, the Migration Utility checks the database update readiness
by running the Update Readiness tool from this location.
See “Checking the database update readiness” on page 210.
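
Step 5 matters because the Migration Utility runs the tool from the URT folder itself, not from a subfolder. The following Python sketch, offered as an illustration rather than a Symantec-provided script, extracts a ZIP archive into a destination folder and, if every entry sits under one wrapping folder, strips that folder so the contents land directly in the destination. Whether the downloaded archive actually contains a wrapping folder is not stated here, which is why the function checks. The function name extract_flat is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_flat(zip_path, dest_dir):
    """Extract zip_path so that its files land directly in dest_dir."""
    dest = Path(dest_dir)
    root = dest.resolve()
    with zipfile.ZipFile(zip_path) as zf:
        entries = [(info, PurePosixPath(info.filename).parts) for info in zf.infolist()]
        tops = {parts[0] for _, parts in entries if parts}
        # Strip the top-level folder only if it wraps every entry.
        strip = len(tops) == 1 and any(len(parts) > 1 for _, parts in entries)
        for info, parts in entries:
            if strip:
                parts = parts[1:]
            if not parts:
                continue  # the wrapping folder entry itself
            target = dest.joinpath(*parts)
            resolved = target.resolve()
            # Refuse entries such as "../x" that would escape dest_dir.
            if resolved != root and root not in resolved.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                with zf.open(info) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
    return dest
```

For example, on Windows you might call extract_flat("Symantec_DLP_15.5_Update_Readiness_Tool_15.5.0-1.zip", r"c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\Migrator\URT") from an elevated prompt.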

Stopping Oracle database jobs

If your database has scheduled jobs, you must unschedule them and clear the jobs queue
before you run the Update Readiness tool and start the migration process. After the jobs are
unscheduled and the jobs queue is clear, you can run the Update Readiness tool and continue
your migration.

To unschedule jobs
1 Log on to SQL*Plus using the Symantec Data Loss Prevention database user name and
password.
2 Run the following:

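BEGIN
  FOR rec IN (SELECT * FROM user_jobs) LOOP
    -- Mark each job broken so that it cannot start again, then remove it.
    dbms_job.broken(rec.job, TRUE);
    dbms_job.remove(rec.job);
  END LOOP;
  -- DBMS_JOB calls are transactional; commit so that the job queue sees the changes.
  COMMIT;
END;
/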

3 Verify that all jobs are unscheduled by running the following:

Select count(*) from user_jobs;

Confirm that the count is zero. If the count is not zero, run the block in step 2 again. A job
that is running when you clear the queue continues to run until it completes and is not cleared.
For long-running jobs, Symantec recommends that you wait for the job to complete instead of
terminating it.
4 Exit SQL*Plus.

Creating the Update Readiness tool database account

Before you can run the Update Readiness tool, you must create a database account.
To create the new Update Readiness tool database account
1 Navigate to the /script folder where you extracted the Update Readiness tool.
2 Start SQL*Plus:

sqlplus /nolog

3 Run the oracle_create_user.sql script:

SQL> @oracle_create_user.sql

4 At the Please enter the password for sys user prompt, enter the password for the SYS
user.
5 At the Please enter Service Name prompt, enter the database service name (typically
protect).
6 At the Please enter required username to be created prompt, enter a name for the new
Update Readiness tool database account.
7 At the Please enter a password for the new username prompt, enter a password for
the new Update Readiness tool database account.
Use the following guidelines to create an acceptable password:
■ Passwords cannot contain more than 30 characters.
■ Passwords cannot contain double quotation marks, commas, or backslashes.
■ Avoid using the & character.
■ Passwords are case-sensitive by default. You can change the case sensitivity through
an Oracle configuration setting.
■ If your password uses special characters other than _, #, or $, or if your password
begins with a number, you must enclose the password in double quotes when you
configure it.
Store the user name and password in a secure location for future use. You use this user
name and password to run the Update Readiness tool.
8 As the database sysdba user, grant permission to the Symantec Data Loss Prevention
schema user name for the following database objects:

sqlplus sys/[sysdba password] as sysdba

GRANT READ,WRITE ON directory DATA_PUMP_DIR TO [schema user name];
GRANT SELECT ON dba_registry_history TO [schema user name];
GRANT SELECT ON dba_temp_free_space TO [schema user name];

See “Preparing to run the Update Readiness tool” on page 211.

See “Checking the database update readiness” on page 210.
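
You can check a candidate password against the guidelines in step 7 before you run the script. The following Python sketch encodes them as stated: it rejects passwords longer than 30 characters or containing double quotation marks, commas, or backslashes, warns about the & character, and reports whether the password must be enclosed in double quotes. It assumes that any character other than a letter, a digit, _, #, or $ counts as a special character; that reading and the helper names are assumptions, not Oracle or Symantec definitions.

```python
FORBIDDEN = {'"', ",", "\\"}
PLAIN_SPECIALS = set("_#$")


def check_password(pw: str):
    """Return (errors, warnings, needs_quotes) for the step 7 guidelines."""
    errors, warnings = [], []
    if len(pw) > 30:
        errors.append("longer than 30 characters")
    bad = sorted(FORBIDDEN & set(pw))
    if bad:
        errors.append("contains forbidden characters: " + " ".join(bad))
    if "&" in pw:
        warnings.append("contains '&', which the guidelines say to avoid")
    # Assumption: every non-alphanumeric character except _, #, $ is "special".
    specials = {c for c in pw if not c.isalnum()} - PLAIN_SPECIALS
    needs_quotes = bool(specials) or pw[:1].isdigit()
    return errors, warnings, needs_quotes


def quote_if_needed(pw: str) -> str:
    """Return the password as it should be typed, quoted when required."""
    errors, _, needs_quotes = check_password(pw)
    if errors:
        raise ValueError("; ".join(errors))
    return f'"{pw}"' if needs_quotes else pw
```

For example, quote_if_needed("p@ss") returns the password wrapped in double quotes, while a password that uses only letters, digits, and $ is returned unchanged.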

Running the Update Readiness tool from the Enforce Server administration console
You can run the Update Readiness tool from the Enforce Server administration console to
check the update readiness for the next Symantec Data Loss Prevention version. To run the
tool, you must have User Administration (Superuser) or Server Administration user privileges.
To run the Update Readiness tool
1 Go to System > Servers and Detectors > Overview, and click System Servers and
Detectors Overview.
2 Click Upload the Update Readiness tool and locate the tool.
If the tool has already been uploaded and you upload a new version, the old version
is deleted.
See “Preparing to run the Update Readiness tool” on page 211.
3 Enter the Update Readiness tool database account user credentials.

Warning: Do not enter the protect user database credentials. Entering credentials other
than the Update Readiness tool database account overwrites the Symantec Data Loss
Prevention database.

See “Creating the Update Readiness tool database account” on page 213.
4 Click Run Update Readiness Tool to begin the update readiness check.
You can click Refresh this page to update the status of the readiness check. Each refresh
displays a link to a summary of the results returned so far. The process may take up to an
hour, depending on the size of the database.
When the tool completes the test, you are provided with a link you can use to download
the results log.
See “Reviewing update readiness results” on page 218.
See “Checking the database update readiness” on page 210.

Running the Update Readiness tool at the command line

You can run the Update Readiness tool from the command prompt on the Enforce Server host
computer.

To run the Update Readiness tool

1 Open a command prompt window.
2 Go to the URT directory:
c:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\Migrator\URT
(for Windows)
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/Migrator/URT
(for Linux)
3 Run the Update Readiness tool using the following command. Enter the command on a single
line; it is wrapped here for readability.

"C:\Program Files\Symantec\DataLossPrevention\ServerJRE\1.8.0_181\bin\java" UpdateReadinessTool
--username <schema user name>
--password <password>
--readiness_username <readiness_username>
--readiness_password <readiness_password>
--sid <database_system_id>
[--quick]
(for Windows)

"/opt/Symantec/DataLossPrevention/ServerJRE/1.8.0_181/bin/java" UpdateReadinessTool
--username <schema user name>
--password <password>
--readiness_username <readiness_username>
--readiness_password <readiness_password>
--sid <database_system_id>
[--quick]
(for Linux)

The following table describes the command arguments:

<schema user name> | The Symantec Data Loss Prevention schema user name.

<password> | The Symantec Data Loss Prevention schema password.

<readiness_username> | The Update Readiness tool database account user you created. See “Creating the Update Readiness tool database account” on page 213.

<readiness_password> | The password for the Update Readiness tool database account user.

<database_system_id> | The database system ID (SERVICE_NAME), typically "protect".

[--quick] | Optional. Runs only the database object check and skips the update readiness test.

After the test completes, you can locate the results in a log file in the /output directory.
This directory is located where you extracted the Update Readiness tool. If you do not
include [--quick] when you run the tool, the test may take up to an hour to complete.
You can verify the status of the test by reviewing log files in the /output directory.
See “Preparing to run the Update Readiness tool” on page 211.
See “Reviewing update readiness results” on page 218.
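
When you script step 3, building the command as an argument list avoids quoting problems with the space in C:\Program Files. The following Python sketch assembles the arguments in the order shown above. The function name is hypothetical, the JRE paths are the ones given in this step and may differ on your installation, and the default SID of protect reflects the "typically protect" note in the argument table.

```python
# Java launchers from step 3; the JRE folder name (1.8.0_181) may differ on your system.
JAVA = {
    "windows": r"C:\Program Files\Symantec\DataLossPrevention\ServerJRE\1.8.0_181\bin\java",
    "linux": "/opt/Symantec/DataLossPrevention/ServerJRE/1.8.0_181/bin/java",
}


def build_urt_command(platform, username, password, readiness_username,
                      readiness_password, sid="protect", quick=False):
    """Return the Update Readiness tool command line as a list of arguments."""
    argv = [
        JAVA[platform], "UpdateReadinessTool",
        "--username", username,
        "--password", password,
        "--readiness_username", readiness_username,
        "--readiness_password", readiness_password,
        "--sid", sid,
    ]
    if quick:
        # Only the database object check; skips the update readiness test.
        argv.append("--quick")
    return argv
```

You could then run the result with subprocess.run(argv, cwd=urt_dir, check=True), where urt_dir is the URT directory from step 2. Each list element is passed as one argument, so values containing spaces are not split.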

Reviewing update readiness results

After you run the Update Readiness tool, the tool returns test results in a log file. Table 9-2
lists the results summarized in the log file.

Table 9-2 Update Readiness results

Status | Description

Pass | Items that display under this section are confirmed and ready for update.

Warning | If not fixed, items that display under this section may prevent the database from upgrading properly.

Error | These items prevent the upgrade from completing and must be fixed.

See “Checking the database update readiness” on page 210.
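
Table 9-2 suggests a simple decision rule: any Error blocks the upgrade, and any Warning should be resolved before you proceed. The following Python sketch applies that rule to a list of (item, status) pairs. It assumes that you have already pulled each item's status out of the log file; the log's layout is not described in this guide, so no parser is shown, and the function name is hypothetical.

```python
from collections import Counter

STATUSES = {"Pass", "Warning", "Error"}


def readiness_verdict(results):
    """Summarize (item, status) pairs from an Update Readiness run."""
    counts = Counter(status for _, status in results)
    unknown = set(counts) - STATUSES
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    if counts["Error"]:
        return "blocked: fix Error items before upgrading"
    if counts["Warning"]:
        return "review: Warning items may prevent the database from upgrading properly"
    return "ready"
```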

Chapter 10
Working with Symantec Information Centric Encryption
This chapter includes the following topics:

■ About Symantec Information Centric Encryption

■ About the Symantec ICE Utility

■ Overview of implementing Information Centric Encryption capabilities

■ Configuring the Enforce Server to connect to the Symantec ICE Cloud

About Symantec Information Centric Encryption

Symantec Information Centric Encryption (ICE) is a risk-reduction solution that lets your
employees, partners, and trusted individuals securely share company email and files. Symantec
ICE can help you to detect confidential email and files and encrypt them so that only the users
that you authorize can access them.
Typical encryption technologies may allow data loss after email or files are decrypted. Once
they are decrypted, they can be sent to other individuals and are no longer protected. However,
ICE encryption technology encrypts and protects email and files throughout their life, regardless
of where they travel.
When an email or file is determined to be confidential or critical, ICE automatically encrypts it
in place by using the ICE library and encryption services. Once it is encrypted, only the users
that you authorize can read it.
ICE also includes the Information Centric Encryption Cloud Console, which provides you with
visibility into the use of ICE-encrypted email and files. You can monitor who has accessed
those emails and files, where they are accessed from, and how they are used. You can also
use the ICE Cloud Console to set specific group permissions. You can set permissions for
saving, sharing, and editing email and files for policy groups. You can also revoke access
to individual emails and files or revoke rights to access email and files for specific policy groups.
How and what you protect depends upon the Symantec solution you integrate with ICE. ICE
is designed to bring end-to-end encryption to multiple Symantec products, enhancing the
security of your emails and files. Table 10-1 lists the most common ways you can use ICE
with Symantec products.

Table 10-1 ICE encryption solutions

To... | Use ICE with...

Protect files in cloud file storage such as Box and OneDrive. | Symantec CloudSOC

Protect files stored in cloud file storage such as Box and OneDrive, enterprise file storage such as File System servers and Microsoft SharePoint, and endpoint content such as removable drives. Protect files uploaded by browsers over HTTPS. Protect emails and attachments for Microsoft Office 365 Exchange Online and Google G Suite Gmail. | Symantec Data Loss Prevention 15 and later. Symantec Data Loss Prevention also allows you to create robust policies and remediation rules to protect these files and emails.

Protect emails and email attachments in the cloud. | Symantec Data Loss Prevention for Email with Cloud Console (DLP Cloud Console). ICE with DLP Cloud Console has a minimal on-premises footprint.

Integrate classification with encryption capabilities for multilevel protection of sensitive information both inside and outside your network. Applies to files and email in a Windows environment. | Symantec Information Centric Tagging (ICT). Integrating the capabilities of ICE and ICT results in a powerful information protection solution known as Symantec Information Centric Security Module.

See the Symantec™ Information Centric Encryption Deployment Guide for details on integrating
Symantec ICE with Symantec Data Loss Prevention.

About the Symantec ICE Utility

The Symantec ICE Utility allows an authorized user to decrypt a file that has been encrypted
by ICE. If a user attempts to access a file that ICE protects, the ICE Utility prompts the user
for authentication. If the user is authenticated, the ICE Utility decrypts the file. The user can
decrypt ICE-encrypted files when endpoints are not connected to the Internet.
The ICE Utility also applies any permission sets assigned to the user in the ICE Cloud Console.
For example, if you have disabled printing for the user or the policy group, the user is not able
to print the document.

Note: On mobile devices, the ICE Utility is called ICE Workspace. You can get ICE Workspace
with the VIP Access for Mobile app.

The ICE Utility is context aware, meaning that it recognizes a user's environment. The ICE
Utility can be deployed in two types of environments: managed environments and unmanaged
environments.
The Symantec ICE Utility automatically detects a network proxy that is configured on an
endpoint and uses it to connect to the Symantec ICE Cloud. Additionally, in managed
environments, the ICE Utility uses the network proxy settings that are stored in the agent
configuration of the DLP Agent installed on the same endpoint.
■ In managed environments, your organization provides and maintains the devices on which
users access protected files.
In managed environments, the ICE Utility leverages the policies and security controls that
your organization puts in place over user devices. In this environment, the ICE Utility gives
the user greater flexibility with decrypting and working with protected files. Files open in
their native app, and the user has full access to the file to edit, share, save, save as, and
print the file. Users are required to authenticate at least once every 180 days (configurable
in the ICE Cloud Console).
The managed version of the ICE Utility works the same across Windows and macOS
platforms; however, the Windows version of the ICE Utility installation package also includes
the ICT agent. Users can only install the ICT agent if you have implemented ICT and
correctly configured the ICT agent installation package.
■ In unmanaged environments, such as those of your partners or in which employees bring
their own devices, users' devices are outside your direct control.
Since you have no direct control over the security of the users' devices in unmanaged
environments, the ICE Utility provides additional security. The ICE Utility enforces stricter
restrictions over when and how a file is decrypted, and allows you greater content control
through the use of permission sets.
When users attempt to open a protected file on a device without the ICE Utility, they are
prompted to download the ICE Utility.
Users that attempt to access an encrypted file are required to authenticate at least once
every 24 hours (configurable in the ICE Cloud Console).
■ On Windows, supported file types are decrypted and opened in their native app, but
the permissions that you assigned to the user are enforced. So, if you have restricted
printing for the user or the policy group, the user is unable to print the file.
Files that ICE does not support open in their native app, but ICE does not enforce
permissions.
■ On macOS, supported file types are opened in their native app, if the edit permission
is enabled on the Information Centric Encryption Cloud Console. However, if the
permissions include content lock or print restrictions, such files open in the Mac
Preview application in view-only mode. For Office formats, ICE-encrypted files launch
the Microsoft Office application. If the user does not have Microsoft Office installed,
then Word documents open in Mac TextEdit, and Excel and PowerPoint files open in
Mac Preview.
■ On iOS, supported file types are opened in a view-only mode irrespective of the
permissions that are assigned to the user.

In all environments, when the user finishes with the file, the ICE Utility encrypts it again,
maintaining the file's security throughout its lifetime. However, if the permissions for a user
allow the user to save the file with a new name, the new file is not encrypted.
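
The authentication intervals described above (at least once every 180 days in managed environments and every 24 hours in unmanaged environments, both configurable in the ICE Cloud Console) reduce to an elapsed-time check. The following Python sketch only illustrates that policy; it is not how the ICE Utility is implemented, and the names and the choice to require authentication exactly at the interval boundary are assumptions. Use timezone-aware datetimes for both arguments.

```python
from datetime import datetime, timedelta, timezone

# Default intervals from the text; both are configurable in the ICE Cloud Console.
DEFAULT_REAUTH = {
    "managed": timedelta(days=180),
    "unmanaged": timedelta(hours=24),
}


def reauth_required(environment, last_auth, now=None, intervals=None):
    """True if the time since last_auth has reached the environment's interval."""
    intervals = intervals or DEFAULT_REAUTH
    now = now or datetime.now(timezone.utc)
    return now - last_auth >= intervals[environment]
```

Passing a different intervals dictionary models shorter or longer intervals configured in the ICE Cloud Console.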
See the following for more information about the ICE Utility:
■ How to provide the ICE Utility to your users
■ How users are authenticated through the ICE Utility
■ Where ICE Utility logs are stored
■ How the ICE Utility works on mobile devices
■ How customers using Symantec Data Loss Prevention Cloud Service for Email with Microsoft
Office 365 Exchange Online can allow users to view emails without the ICE Utility

Overview of implementing Information Centric Encryption capabilities
The high-level steps for implementing Information Centric Encryption with Symantec Data
Loss Prevention are provided in Table 10-2. Specific task steps are provided in the topics
referenced in the "Details" column.
For more information about Information Centric Encryption, refer to the Symantec Information
Centric Encryption Deployment Guide at http://www.symantec.com/docs/DOC9707.

Table 10-2 Overview of implementing Information Centric Encryption capabilities

Step | Action | Details

1 | Depending on your organization's security needs, install one or both of the following licenses: Network Protect ICE, Endpoint Prevent ICE. | See “Installing a new license file” on page 234.

2 | Configure the Enforce Server to connect to the Symantec ICE Cloud. | See “Configuring the Enforce Server to connect to the Symantec ICE Cloud” on page 224.

3 | Configure policy response rule actions to protect sensitive files using ICE encryption. | See “Configuring the Endpoint Prevent: Encrypt action” on page 1821. See “Configuring the Network Protect: Encrypt File action” on page 1838. See “Configuring the Server FlexResponse action” on page 1788.

4 | Configure Network Protect to enable ICE encryption protection for supported scan targets. | See “Configuring Network Protect for file shares” on page 2177.

5 | Configure Cloud Service for Email policy response rule actions to protect both sensitive emails and attachments or sensitive email attachments using ICE encryption. | See “Encrypting cloud email with Symantec Information Centric Encryption” on page 2518.

6 | Enable ICE encryption in Endpoint Prevent to protect confidential files that are stored on removable devices that are connected to endpoints, stored on cloud storage applications, or uploaded with browsers using HTTPS. | See “Information Centric Encryption settings for DLP Agents” on page 2371. See “Configuring Network Protect for SharePoint servers” on page 2203.

7 | Download and then install the ICE Utility on all managed devices within your organization. The ICE Utility is required for users to be able to access ICE-encrypted files. Unmanaged device users will be prompted to download and install the ICE Utility when they attempt to access an ICE-encrypted file for the first time on a particular device. | The ICE Utility is available for download from Symantec FileConnect. See “About the Symantec ICE Utility” on page 220.

Configuring the Enforce Server to connect to the Symantec ICE Cloud
After you install the Endpoint Prevent ICE license, or the Network Protect ICE license, or
upload your Cloud Service for Email enrollment bundle, you must configure the Enforce Server
to connect to the Symantec ICE Cloud. This step is a prerequisite for enabling any of the
encryption-related functions that you can configure using the Enforce Server administration
console.
See “Installing a new license file” on page 234.
To configure the Enforce Server to connect to the Symantec ICE Cloud:
1 Go to System > Settings > General and click Configure.
2 At the Edit General Settings screen, scroll down to the ICE Cloud Access Settings
section.
3 Type the following Symantec ICE Cloud details in the provided fields:
■ Service URL
■ Customer ID
■ Domain ID
■ Service User ID
■ Service Password

Note: Obtain this information from the Settings > Advanced Configuration > External
Services page of the ICE Cloud Console. The Service Password is visible only when you
first authorize an external service. If you have lost your Service Password, you must obtain
a new one.

4 Click Save.
5 To enable and configure the ICE functionality in Symantec Data Loss Prevention, do one
or more of the following, depending on which ICE licenses are installed:
■ Configure Network Protect to enable ICE encryption protection for the supported scan
targets.
See “Configuring Network Protect for file shares” on page 2177.
■ Configure Cloud Service for Email to enable ICE email encryption of Office 365 email
and Gmail in the cloud.
See the Cloud Service for Email Implementation Guide at the Symantec Support Center
at http://www.symantec.com/docs/DOC9008.
■ Enable ICE in Endpoint Prevent to encrypt the following sensitive files:
■ Files that are transferred to removable storage
■ Files that are transferred by a cloud storage application
■ Files that are uploaded with browsers using HTTPS
See “Information Centric Encryption settings for DLP Agents” on page 2371.
Chapter 11
Working with Symantec Information Centric Tagging
This chapter includes the following topics:

■ About integrating Information Centric Tagging with Data Loss Prevention

■ Overview of steps to tie Information Centric Tagging to Data Loss Prevention

■ Integrating the ICT server with the Enforce Server

■ Importing the ICT classification taxonomy

■ Supported file types for ICT-Data Loss Prevention integration

About integrating Information Centric Tagging with Data Loss Prevention
Symantec Information Centric Tagging (ICT) is a data classification product that defines and
supports the application of tags and watermarks to emails and files. Information Centric Tagging
is also part of the separately licensed Information Centric Security Module (ICSM). ICSM
additionally offers data protection by providing encryption options, including Symantec
Information Centric Encryption (ICE), that can be associated with certain tags.
The data classification taxonomy is a hierarchy of configured Organization-Scope-Sensitivity
Level tags. You use the administration console to import the taxonomy from the Information
Centric Tagging product into the Data Loss Prevention Enforce Server database.
Note: Import of the taxonomy requires that a Data Loss Prevention domain user, whose name
is identified when ICT server credentials are added to the credential store, is also associated
in ICT with certain Active Directory User Groups. This association provides the user access
to ICT Administration Webservice methods. Additionally, an entry must be added to the Windows
Hosts file, mapping the ICT server IP address to its host name.

Once you have imported the taxonomy, you select appropriate tags from it to define response
rules of the ICT Classification And Tagging action type. You then attach the rules to policies
so that ICT tags are applied to content according to your corporate policy.
Tags can be applied in two ways:
■ You create Endpoint Discover scans. These scans apply the tags in response to policy
violations, or to all targeted content solely as a baseline Classification Scan.
■ ICT end users apply tags. The ICT Administrator enables Data Loss Prevention integration
by selecting the Symantec DLP Policies Integration option during ICT system setup.
Data Loss Prevention policies that are configured with ICT-based response rules are imported
to ICT. Data Loss Prevention policies, not ICT rules, then drive automatic classification on
the ICT endpoint.
You can also use the imported taxonomy to create detection rules using the Content Matches
Classification option. You create the rules by selecting the tags displayed on the administration
console. Tagged content is discovered in the metadata of supported emails and files.

Note: Tagging can be used to notify Symantec Endpoint Protection (SEP) about certain files.
(This requires a separate license and the presence of a SEP agent on the Data Loss Prevention
endpoint.) To enable integration with SEP, when the ICT Administrator creates the classification
taxonomy, the Administrator can enable the Information Centric Defense option. This ICD
option appears on the classification level screens. When your Endpoint Discover scan runs
and applies a tag that contains this option, Data Loss Prevention notifies SEP about this file.
In a forthcoming release of SEP that integrates this functionality, SEP administrators will be
able to configure SEP to take the necessary action on the classified file.

The integration of ICT with Data Loss Prevention requires ongoing coordination between you
and the ICT Administrator. Some of the events requiring communication include:
■ You decide to use ICT tags in Data Loss Prevention. You notify the ICT Administrator, who
lets you know when the ICT taxonomy is ready. You import the taxonomy into Data Loss
Prevention, create ICT-based response rules that use those tags, and attach them to
policies.
■ If ICT end users will be applying the tags, you notify the ICT Administrator that the policies
are in place. The ICT Administrator confirms that the Symantec DLP Policies Integration
check box is selected on the ICT Administration Console. The Data Loss Prevention policies
are imported to ICT so that automatic classification is driven by Data Loss Prevention
policies, not by ICT rules.
■ If you will be applying the tags as part of Endpoint Discover scans, as a courtesy, you notify
the ICT Administrator. If ICT end users are working with those files, tagging activity may
fail.
See “Overview of steps to tie Information Centric Tagging to Data Loss Prevention” on page 228.
For more information, see the Information Centric Tagging documentation here:
https://support.symantec.com/en_US/article.DOC11257.html

Overview of steps to tie Information Centric Tagging to Data Loss Prevention
The high-level steps for integrating Symantec Information Centric Tagging with Symantec Data
Loss Prevention are provided in Table 11-1. Specific task steps are provided in the topics
referenced in the "Details" column.

Table 11-1 Overview of implementing Information Centric Tagging capabilities

Step 1
Action: Prepare to integrate the ICT server with the Enforce Server by defining the ICT server credentials, and the ICT Web Service URL or an XML-file pathname.
Details: See “Integrating the ICT server with the Enforce Server” on page 229.

Step 2
Action: Schedule or trigger the Information Centric Tagging classification taxonomy import.
Details: See “Importing the ICT classification taxonomy” on page 231.

Step 3
Action: For detection purposes, define detection rules with the Content Matches Classification option, then attach them to policies.
Details: See “Configuring the Content Matches Classification condition” on page 863.

Step 4
Action: For tagging purposes, define response rules with the ICT Classification And Tagging action type, then attach them to policies.
Details: See “Configuring response rule actions” on page 1765.
See “Configuring the Endpoint: ICT Classification And Tagging action” on page 1814.

Step 5
Action: For ICT tagging driven by Endpoint Discover scans, define the scans, either for policy-violation tagging or as a baseline Classification Scan.
Note: These tagging scans require the DLP Agent on the endpoint (Mac and Windows).
Details: See “About Endpoint Discover classification scanning” on page 2320.
See “Creating an Endpoint Discover scan” on page 2326.

Step 6
Action: For ICT tagging applied by end users, have the ICT Administrator enable Symantec DLP Policies Integration from the ICT Administration Console.
Note: This form of tagging requires both the DLP Agent and the ICT agent on the endpoint (Windows only).

Integrating the ICT server with the Enforce Server

To integrate the Enforce Server with the ICT server, define the ICT server settings. These
settings include the ICT server credentials and the ICT Web Service URL or an XML pathname.
To define your Information Centric Tagging server settings
1 In the Enforce Server administration console, navigate to System > Settings > Information
Centric Tagging.
2 To enable the settings, click Edit.
3 In the Server Credential field, select the ICT server credential from the drop-down menu.
The credential name represents the login and password to the ICT server.
To add the credential to the menu, go to the Credentials page in the administration
console and add it there. The credential must be a Windows domain user account with
privileges to access ICT. These privileges are established in ICT when the ICT Administrator
associates the domain user with certain Active Directory User Groups. See "Creating AD users,
groups, and Organizational Units" in the Information Centric Tagging Deployment Guide.
See “Adding new credentials to the credential store” on page 161.
4 In the ICT Web Service URL field, type either the ICT Web Service URL or an XML file
pathname.
See “About automatic and static imports of the ICT classification taxonomy” on page 229.
If you need to change the ICT Web Service URL, see “Changing the ICT Web Service URL”
on page 230.

About automatic and static imports of the ICT classification taxonomy

You can use the ICT Web Service for automatic, scheduled imports of the ICT classification
taxonomy. If you cannot use the ICT Web Service (for example, because of a restrictive firewall
or a policy on the Enforce Server that does not allow database updates from external
processes), you can instead import a static, XML-based version of the taxonomy. With either
method, you can also perform the import immediately rather than schedule it.
See “Using the ICT Web Service for scheduled classification taxonomy imports” on page 230.
See “Using an XML file for static classification taxonomy imports” on page 231.

Using the ICT Web Service for scheduled classification taxonomy imports
To use the ICT Web Service for ICT classification taxonomy imports
◆ On the Information Centric Tagging page, in the ICT Web Service URL field, type the
ICT Web Service URL.
The URL syntax is
http://<ICT_server>/ICT/Admin-Webservice/Classifications.asmx.
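Before you save the URL, you may want to confirm that the Enforce Server can reach it. The following is a minimal sketch, assuming a bash shell with curl on a host in the Enforce Server's network (for example, a Linux Enforce Server). The function names ict_ws_url and check_ict_ws are hypothetical, and ict01.example.com is a placeholder host name; neither is part of Symantec Data Loss Prevention or ICT.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Symantec Data Loss Prevention or ICT.

# Build the ICT Web Service URL from the ICT server host name, following the
# documented syntax http://<ICT_server>/ICT/Admin-Webservice/Classifications.asmx
ict_ws_url() {
  printf 'http://%s/ICT/Admin-Webservice/Classifications.asmx\n' "$1"
}

# Send one HTTP request to the service and report the status code.
# Any HTTP status (including 401 if the service requires Windows
# authentication) shows that port 80 on the ICT server is reachable.
check_ict_ws() {
  local url code
  url="$(ict_ws_url "$1")"
  # curl exits non-zero only if no HTTP response arrives (DNS failure,
  # refused connection, or the 10-second timeout).
  if code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")"; then
    echo "HTTP $code from $url"
  else
    echo "no response from $url" >&2
    return 1
  fi
}
```

For example, `check_ict_ws ict01.example.com` prints the HTTP status that the ICT server returns. The check deliberately does not treat 401 as a failure, because it tests network reachability rather than the credential.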

Requirements for using the ICT Web Service for imports are:
■ Network connectivity on port 80 between the Data Loss Prevention Enforce Server and
the Information Centric Tagging server.
■ The ICT server must be identified to Windows on the Enforce Server through the Hosts file:
■ Navigate to %systemdrive%\Windows\System32\drivers\etc\.
■ Edit the Windows Hosts file to map the ICT server IP address to its host name, using
the tab-separated format: <IP> <FQDN of ICT server>.
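If you script the Enforce Server configuration, the Hosts file change can be made safe to rerun. The following is a minimal sketch in bash, assuming a bash environment such as Git Bash running with Administrator rights on the Enforce Server, where the Hosts file typically appears as /c/Windows/System32/drivers/etc/hosts. The function names are hypothetical, and 192.0.2.10 (a documentation address) and ict01.example.com are placeholders for your ICT server's IP address and FQDN.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Hosts file entry that maps the ICT server.

# Format one entry: IP address, a tab, then the FQDN.
hosts_entry() {
  printf '%s\t%s\n' "$1" "$2"
}

# Return 0 if the hosts file already maps the FQDN to the IP address.
has_hosts_entry() {
  local file="$1" ip="$2" fqdn="$3"
  awk -v ip="$ip" -v host="$fqdn" '
    { sub(/\r$/, "") }            # tolerate Windows CRLF line endings
    $1 ~ /^#/ { next }            # skip comment lines
    $1 == ip {
      # host names are compared case-insensitively
      for (i = 2; i <= NF; i++) if (tolower($i) == tolower(host)) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Append the entry only if it is missing, so rerunning adds no duplicates.
ensure_hosts_entry() {
  local file="$1" ip="$2" fqdn="$3"
  has_hosts_entry "$file" "$ip" "$fqdn" && return 0
  # Start the new entry on its own line if the file lacks a final newline.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  hosts_entry "$ip" "$fqdn" >> "$file"
}
```

For example: `ensure_hosts_entry /c/Windows/System32/drivers/etc/hosts 192.0.2.10 ict01.example.com`. Checking before appending keeps the file clean if the script runs more than once.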

Changing the ICT Web Service URL

You rarely need to change the ICT Web Service URL. If a change is necessary (for example,
because you renamed the Information Centric Tagging server), see Table 11-2 for the actions
that you may need to take.

Table 11-2 Implications of changing the ICT Web Service URL

Circumstance: You have not yet synchronized an ICT classification import using this URL.
Action: Change the URL without taking any other action.
Click Edit to enable the ICT Web Service URL field. Make the change, then click Save.

Circumstance: You have synchronized an ICT classification import using this URL, and the new URL still points to the same taxonomy as before.
Action: Change the URL without taking any other action.

Circumstance: You have synchronized an ICT classification import using this URL, but the new URL points to a different taxonomy.
Action: If you have existing detection rules in use:
1 Delete any incidents generated from those rules.
2 Delete any detection rules that use the Content Matches Classification option.
3 Define new rules using the taxonomy that results from using the new ICT Web Service URL.

Using an XML file for static classification taxonomy imports

To import the ICT taxonomy using an XML file
1 Log on to the ICT server as a Windows user with privileges to access the ICT SQL
database.
2 Use the local Internet Explorer browser on the server to browse the ICT Web Service.
The Web Service URL uses this syntax:
http://<ICT_server>/ICT/Admin-Webservice/Classifications.asmx

3 Run the GetAllClassifications operation: on the Classifications tab, click Invoke.
4 Select and copy the entire resulting XML from the IE browser window and save it to a text
file.
5 Copy the file to any location on the Enforce Server.

Note: This step requires administrator (write) permission on the Enforce Server.

6 On the Information Centric Tagging page, in the ICT Web Service URL field, enter the
XML pathname instead of the URL. A sample XML pathname is:
file://Program Files/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config/ICT.xml
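If you prefer not to copy the XML out of the browser window by hand, the following bash sketch is one possible alternative to steps 2 to 4. It assumes that bash and curl are available on the ICT server (for example, Git Bash), that the ICT Web Service accepts the ASP.NET HTTP POST binding for local requests (the binding that the Invoke button on the local test page relies on), and that no extra authentication options are needed; add curl authentication options if your service requires them. The function names and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to steps 2-4: save the GetAllClassifications
# result with curl instead of copying it from Internet Explorer.
# Run it on the ICT server itself, as the local-only Invoke page requires.

ICT_WS_LOCAL="http://localhost/ICT/Admin-Webservice/Classifications.asmx"

# Call the operation through the ASMX HTTP POST binding (assumed enabled
# for local requests) and write the XML response to the given file.
# -f makes curl fail on HTTP error statuses instead of saving an error page.
fetch_classifications() {
  local out="$1"
  curl -sf --data '' "$ICT_WS_LOCAL/GetAllClassifications" -o "$out"
}

# Sanity check before copying the file to the Enforce Server: the file must
# be non-empty and start with an XML declaration, as ASMX responses
# typically do (an HTML error page would not).
has_xml_declaration() {
  local file="$1"
  [ -s "$file" ] && [ "$(head -c 5 "$file")" = "<?xml" ]
}
```

For example, `fetch_classifications ICT.xml && has_xml_declaration ICT.xml`, then copy ICT.xml to the Enforce Server as described in step 5.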

Importing the ICT classification taxonomy

You can establish a daily import schedule or do an immediate import.
To set a synchronization schedule for the ICT classification taxonomy import

◆ On the Information Centric Tagging page, in the Sync daily at field, select the hour and
minute for the import from the two drop-down menus. The synchronization then runs daily at
the selected time.
To do an immediate import of the ICT classification taxonomy
◆ On the Information Centric Tagging page, to immediately trigger an import, click SYNC NOW.
After a synchronization runs, the imported taxonomy appears on the Information Centric
Tagging page, under the columns for Organization, Scope, Sensitivity, and Level. Click
any column to sort it.
Be aware that when you resynchronize the taxonomy, any existing taxonomy is deleted and
replaced with the new one.
Note that in Information Centric Tagging, once a classification is created, it cannot be deleted.
Your existing Data Loss Prevention detection policies continue to work when a new import runs.
However, the ICT Administrator can change the classifications, so you should review your
existing policies periodically. If necessary, update them, or delete and recreate them, to
reflect changed Organization-Scope-Sensitivity Level tags. If the names of your policies
reflect the tags being detected, review the names as well.

Supported file types for ICT-Data Loss Prevention integration
Table 11-3 lists the supported file types from which ICT tags can be read by Data Loss
Prevention policies (Detection) and to which the DLP Agent and ICT agent can write tags
(Endpoint Discover).

Table 11-3 Supported file types for ICT-Data Loss Prevention integration

File type                    Extension               Read tags   Write tags
Portable Document Format     pdf                     Y           Y
Image files                  gif                     Y           Y
JPEG                         jpe, jpg, jpeg, jfif    N           Y
TIFF                         tif, tiff               N           Y
Hypertext Markup Language    htm, html               N           N
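If you want to pre-check which files in a scan target can carry ICT tags, the table can be encoded as a small lookup. The following bash function is a hypothetical sketch that simply mirrors Table 11-3; it is not a product API, and it looks only at the file name extension, not at the file content.

```shell
#!/usr/bin/env bash
# Hypothetical lookup based on Table 11-3. Prints "<read> <write>" (Y or N)
# for the extension of the given file name, or "unsupported" if the
# extension is not listed in the table.
ict_tag_support() {
  local ext
  # Take the text after the last dot and compare it case-insensitively.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf | gif)                            echo "Y Y" ;;
    jpe | jpg | jpeg | jfif | tif | tiff) echo "N Y" ;;
    htm | html)                           echo "N N" ;;
    *)                                    echo "unsupported" ;;
  esac
}
```

For example, `for f in /data/share/*; do printf '%s: %s\n' "$f" "$(ict_tag_support "$f")"; done` lists the read and write support for each file in a directory (the path is a placeholder).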

Chapter 12
Adding a new product module
This chapter includes the following topics:

■ Installing a new license file

■ About system upgrades

Installing a new license file

When you first purchase Symantec Data Loss Prevention, upgrade to a later version, or
purchase additional product modules, you must install one or more Symantec Data Loss
Prevention license files. License files have names in the format name.slf.
You can also install a license file for one module initially and install license files for
additional modules later.
To install a license:
1 Download the new license file.
2 Go to System > Settings > General and click Configure.
3 At the Edit General Settings screen, scroll down to the License section.
4 In the Install License field, browse for the new Symantec Data Loss Prevention license
file you downloaded, then click Save to agree to the terms and conditions of the end user
license agreement (EULA) for the software and to install the license.
The Current License list displays the following information for each product license:
■ Product – The individual Symantec Data Loss Prevention product name
■ Count – The number of users licensed to use the product
■ Status – The current state of the product
■ Expiration – The expiration date of the license for the product

A month before the license expires, warning messages appear on the System > Servers >
Overview screen. When you see a message about the expiration of your license, contact
Symantec to purchase a new license key before the current license expires.

About system upgrades

For information about upgrading the Symantec Data Loss Prevention software, see the
Symantec Data Loss Prevention Upgrade Guide.
See “About Symantec Data Loss Prevention administration” on page 82.
Chapter 13
Applying a Maintenance Pack
This chapter includes the following topics:

■ Applying a Symantec Data Loss Prevention Maintenance Pack

Applying a Symantec Data Loss Prevention Maintenance Pack
Maintenance Packs can only be applied to an already installed version of Symantec Data Loss
Prevention. For example, a maintenance pack for 15.5 can only be applied to Symantec Data
Loss Prevention 15.5 (new or upgraded installation).
Before applying a maintenance pack or installing Symantec Data Loss Prevention, refer to the
Symantec Data Loss Prevention System Requirements and Compatibility Guide for information
about system requirements. This guide is available online here:
https://www.symantec.com/docs/DOC10602

Steps to apply a maintenance pack on Windows servers

The following table describes the high-level steps that are involved in applying the maintenance
pack to a Windows server. Each step is described in more detail elsewhere in this chapter, as
indicated.
Before you apply a maintenance pack, create an EnforceReinstallationResources.zip file
using the Reinstallation Resources Utility. This file includes the CryptoMasterKey.properties
file and the keystore files for your Symantec Data Loss Prevention deployment. You can use
the file to roll back to a previous version.
See the Symantec Data Loss Prevention Upgrade Guide for Windows at the Symantec Support
Center at http://www.symantec.com/docs/DOC9258.

Table 13-1 Steps to apply the maintenance pack to a Windows environment

Step 1
Action: Download and extract the maintenance pack software.
Description: See “Downloading the maintenance pack software for Windows servers” on page 237.

Step 2
Action: Confirm that all users are logged out of the Enforce Server administration console.
Description: If users are logged in during the maintenance pack application process, subsequent logins fail during the End User Licensing Agreement confirmation.

Step 3
Action: Apply the maintenance pack to the Enforce Server.
Description: See “Updating the Enforce Server on Windows” on page 237.
The process to apply the maintenance pack to a single-tier installation omits the detection server update step.
See “Updating a single-tier system on Windows” on page 239.

Step 4
Action: Apply the maintenance pack to the detection server.
Description: See “Updating the detection server on Windows” on page 238.

Downloading the maintenance pack software for Windows servers

Copy the MSP files to the computer from which you intend to perform the upgrade. That
computer must have a reliable network connection to the Enforce Server.
Copy the MSP files into a directory on a system that is accessible to you. The root directory
where you move the files is referred to as the DLPDownloadHome directory.
Choose from the following files based on your current installation:
■ Apply the maintenance pack to the Enforce Server: EnforceServer.msp
■ Apply the maintenance pack to the detection server: DetectionServer.msp
■ Apply the maintenance pack to a single-tier installation: SingleTierServer.msp

Updating the Enforce Server on Windows

These instructions assume that Symantec Data Loss Prevention 15.5 is installed and that the
EnforceServer.msp file has been copied into the DLPDownloadHome directory on the Enforce
Server computer.
To update the Enforce Server

◆ Install the maintenance pack by completing the following steps:

Note: You can install the maintenance pack using Silent Mode by running the following
command:
msiexec /p "EnforceServer.msp" ORACLE_PASSWORD=<ORACLE PASSWORD> /qn /norestart /L*v EnforceServer.log

where <ORACLE PASSWORD> is the database password used for Symantec Data Loss
Prevention 15.5.

a Click Start > Run > Browse to navigate to the folder where you copied the
EnforceServer.msp file.

b Double-click EnforceServer.msp to execute the file, and click OK.

c Click Next on the Welcome panel.

d Enter the Symantec Data Loss Prevention database password in the Oracle Database Server
Information panel.

e Click Update.

The update process may take a few minutes. The installation program window may display
for a few minutes while the services start up. After the update process completes, a
completion notice displays.

Updating the detection server on Windows

These instructions assume that Symantec Data Loss Prevention 15.5 is installed and the
DetectionServer.msp file has been copied into the DLPDownloadHome directory on the detection
server computer.
To update the detection server

◆ Install the maintenance pack by completing the following steps:

Note: You can install the maintenance pack using Silent Mode by running the following
command:
msiexec /p "DetectionServer.msp" /qn /norestart /L*v DetectionServer.log

a Click Start > Run > Browse to navigate to the folder where you copied the
DetectionServer.msp file.

b Double-click DetectionServer.msp to execute the file, and click OK.

c Click Next on the Welcome panel.

d Click Update.

The update process may take a few minutes. The installation program window may display
for a few minutes while the services start up. After the update process completes, a
completion notice displays.

Updating a single-tier system on Windows

The following instructions assume that the SingleTierServer.msp file has been copied into
the DLPDownloadHome directory on the Enforce Server computer.
To update a single-tier system

◆ Install the maintenance pack by completing the following steps:

Note: You can install the maintenance pack using Silent Mode by running the following
command:
msiexec /p "SingleTierServer.msp" ORACLE_PASSWORD=<ORACLE PASSWORD> /qn /norestart /L*v EnforceServer.log

where <ORACLE PASSWORD> is the database password used for Symantec Data Loss
Prevention.

a Click Start > Run > Browse to navigate to the folder where you copied the
SingleTierServer.msp file.

b Double-click SingleTierServer.msp to execute the file, and click OK.

c Click Next on the Welcome panel.

d Enter the Symantec Data Loss Prevention database password in the Oracle Database
Server Information panel.

e Click Update.

The update process may take a few minutes. The installation program window may display
for a few minutes while the services start up. After the update process completes, a
completion notice displays.

Steps to apply a maintenance pack on Linux servers

The following table describes the high-level steps that are involved in applying a Symantec
Data Loss Prevention maintenance pack to a Linux server. Each step is described in more
detail elsewhere in this chapter, as indicated.
Before you apply a maintenance pack, create an EnforceReinstallationResources.zip file
using the Reinstallation Resources Utility. This file includes the CryptoMasterKey.properties
file and the keystore files for your Symantec Data Loss Prevention deployment. You can use
the file to roll back to a previous version.
See the Symantec Data Loss Prevention Upgrade Guide for Linux at the Symantec Support
Center at http://www.symantec.com/docs/DOC9258.
Table 13-2 Steps to apply the maintenance pack on Linux

Step 1
Action: Download and extract the maintenance pack software.
Description: See “Downloading and extracting the maintenance pack software for Linux servers” on page 241.

Step 2
Action: Confirm that all users are logged out of the Enforce Server administration console.
Description: If users are logged in during the maintenance pack application process, subsequent logins fail during the End User Licensing Agreement confirmation.

Step 3
Action: Apply the maintenance pack to the Enforce Server.
Description: See “Updating the Enforce Server on Linux” on page 241.
The process to apply the maintenance pack to a single-tier installation omits the detection server update step.
See “Updating a single-tier system on Linux” on page 243.

Step 4
Action: Apply the maintenance pack to the detection server.
Description: See “Updating the detection server on Linux” on page 242.

Downloading and extracting the maintenance pack software for Linux servers
Copy the ZIP files to the computer from which you intend to perform the upgrade. That computer
must have a reliable network connection to the Enforce Server.
Copy the ZIP files into a directory on a system that is accessible to you. The root directory
where you move the files is referred to as the DLPDownloadHome directory.
Choose from the following files based on your current installation:
■ Apply the maintenance pack to the Enforce Server: EnforceServer.zip
■ Apply the maintenance pack to the detection server: DetectionServer.zip
■ Update a single-tier installation: SingleTierServer.zip

Updating the Enforce Server on Linux

The instructions that follow describe how to install the maintenance pack on an Enforce Server
on a Linux computer.
These instructions assume that Symantec Data Loss Prevention 15.5 is installed and that the
EnforceServer.zip file has been copied into the /opt/temp directory on the Enforce Server
computer.
To update the Enforce Server

1 Log on as root to the Enforce Server system.
2 Navigate to the directory where you copied the EnforceServer.zip file. (/opt/temp)
3 Unzip the file to the same directory.
4 Perform the update process by running the following command:

rpm -Uvh \
  symantec-dlp-15-5-content-extraction-service-15.5-01074.x86_64.rpm \
  symantec-dlp-15-5-server-platform-common-15.5-01074.x86_64.rpm \
  symantec-dlp-15-5-content-extraction-plugins-15.5-01074.x86_64.rpm \
  symantec-dlp-15-5-enforce-server-15.5-01074.x86_64.rpm

Note: Replace the file names with those of the maintenance pack version that you are installing.

You can also install all of the RPMs at once by running the following command:
rpm -Uvh *.rpm

If you used any relocations (--relocate default-path=new-path) during the initial
installation, you must use them again with the upgrade command.
5 Run the Update Configuration utility by running the following command:
cd "/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/install"
./EnforceServerUpdateConfigurationUtility

Note: You can run the Update Configuration utility in Silent Mode by running the following
command:
./EnforceServerUpdateConfigurationUtility -silent \
  -ORACLE_HOME=/opt/oracle/product/12.1.0/db_1 -oraclePassword=<ORACLE PASSWORD>

where <ORACLE PASSWORD> is the database password used for Symantec Data Loss
Prevention.

During the update process, services shut down, then restart automatically. You can review
the update log file EnforceServerUpdateConfigurationUtility.log located at
/var/log/Symantec/DataLossPrevention/EnforceServer/15.5/debug.
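Steps 2 to 5 can be collected into one script so that the same sequence is repeated on every upgrade. The following bash sketch is hypothetical and not shipped with the product; it assumes the default /opt/temp staging directory and the default installation path shown above, does not add any --relocate options (add yours to the rpm line if you used them), and runs the Update Configuration utility interactively. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for steps 2-5 of "To update the Enforce Server" on Linux.
# Set DRY_RUN=1 to print each command instead of running it.
set -euo pipefail

STAGE_DIR="${STAGE_DIR:-/opt/temp}"   # directory that holds EnforceServer.zip
INSTALL_DIR="/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/install"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

update_enforce_server() {
  if [ "${DRY_RUN:-0}" != "1" ] && [ "$(id -u)" -ne 0 ]; then
    echo "Run this as root." >&2
    return 1
  fi
  # Steps 2-3: unzip the maintenance pack into the same directory.
  run unzip -o "$STAGE_DIR/EnforceServer.zip" -d "$STAGE_DIR"
  # Step 4: upgrade all of the RPMs at once. Add the same --relocate
  # options here if you used any during the initial installation.
  run rpm -Uvh "$STAGE_DIR"/*.rpm
  # Step 5: run the Update Configuration utility from its own directory,
  # in a subshell so that the caller's working directory is unchanged.
  (
    run cd "$INSTALL_DIR"
    run ./EnforceServerUpdateConfigurationUtility
  )
}
```

To use it, source the file, review the plan with `DRY_RUN=1 update_enforce_server`, and then run `update_enforce_server` as root.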

Updating the detection server on Linux

The instructions that follow describe how to apply the maintenance pack to a detection server
on a Linux computer.
These instructions assume that Symantec Data Loss Prevention 15.5 is installed and that the
DetectionServer.zip file has been copied into the /opt/temp/ directory on the server
computer.
To update the detection server
1 Log on as root to the system where the detection server is installed.
2 Navigate to the directory where you copied the DetectionServer.zip file. (/opt/temp)
3 Unzip the file to the same directory.
4 Apply the maintenance pack to the detection server by running the following command:

Note: Replace the file names with those of the maintenance pack version you are installing.

You can install all of the RPMs at once by running the following command:
rpm -Uvh *.rpm

If you used any relocations (--relocate default-path=new-path) during the initial
installation, you must use them again with the upgrade command.
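These procedures all end with `rpm -Uvh *.rpm`. If the zip file was not extracted into the current directory, the `*.rpm` glob matches nothing, bash passes the literal string `*.rpm` to rpm, and rpm fails. The following bash sketch is hypothetical: `list_mp_rpms` and `mp_upgrade_cmd` are illustrative names, not Symantec tools. The first function lists the RPM files in a directory. The second prints the upgrade command without running it, and adds `--relocate OLDPATH=NEWPATH` only when you pass the same relocation that you used for the initial installation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Symantec tools) for the rpm step of a maintenance pack.

# List the .rpm files in a directory, one per line, without the directory part.
# Returns 1 if there are none (for example, because the zip was not extracted).
list_mp_rpms() {
  local dir="${1:-.}"
  local had_nullglob=0
  local -a rpms
  shopt -q nullglob && had_nullglob=1
  shopt -s nullglob                         # an unmatched glob expands to nothing
  rpms=("$dir"/*.rpm)
  (( had_nullglob )) || shopt -u nullglob   # restore the caller's setting
  if (( ${#rpms[@]} == 0 )); then
    echo "no .rpm files found in $dir" >&2
    return 1
  fi
  printf '%s\n' "${rpms[@]##*/}"
}

# Print (do not run) the upgrade command. The optional argument is the
# OLDPATH=NEWPATH pair that was passed to rpm --relocate at install time.
mp_upgrade_cmd() {
  local relocation="${1:-}"
  local -a cmd=(rpm -Uvh)
  if [[ -n "$relocation" ]]; then
    cmd+=(--relocate "$relocation")
  fi
  cmd+=('*.rpm')                            # quoted so the glob is printed, not expanded
  echo "${cmd[*]}"
}
```

For example, `list_mp_rpms /opt/temp && mp_upgrade_cmd /opt/Symantec=/data/Symantec` prints the file list, followed by `rpm -Uvh --relocate /opt/Symantec=/data/Symantec *.rpm`. Both paths in that example are placeholders, not documented defaults. Because the sketch prints the command instead of running it, it is safe to try. Run the printed command yourself from the directory that holds the RPMs.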

Updating a single-tier system on Linux

The instructions that follow describe how to apply the maintenance pack to a single-tier
installation on a Linux computer.
These instructions assume that Symantec Data Loss Prevention 15.5 is installed and that the
SingleTierServer.zip file has been copied into the /opt/temp directory on the computer.

To update a single-tier installation

1 Log on as root to the Enforce Server system.
2 Navigate to the directory where you copied the SingleTierServer.zip file. (/opt/temp)
3 Unzip the file to the same directory.
4 Apply the maintenance pack to the single-tier installation by running the following command:

rpm -Uvh
symantec-dlp-15-5-content-extraction-plugins-15.5-01074.x86_64.rpm
symantec-dlp-15-5-content-extraction-service-15.5-01074.x86_64.rpm
symantec-dlp-15-5-detection-server-15.5-01074.x86_64.rpm
symantec-dlp-15-5-enforce-server-15.5-01074.x86_64.rpm
symantec-dlp-15-5-server-platform-common-15.5-01074.x86_64.rpm
symantec-dlp-15-5-single-tier-server-15.5-01074.x86_64.rpm

Note: Replace the file names with those of the maintenance pack version you are installing.

You can install all of the RPMs at once by running the following command:
rpm -Uvh *.rpm

If you used any relocations (--relocate default-path=new-path) during the initial
installation, you must use them again with the upgrade command.
5 Run the Update Configuration utility by running the following command:
./SingleTierServerUpdateConfigurationUtility

Note: You can also run the Update Configuration utility in silent mode by running the following
command:
./SingleTierServerUpdateConfigurationUtility -silent \
-ORACLE_HOME=/opt/oracle/product/12.1.0/db_1 -oraclePassword=<ORACLE PASSWORD>

where <ORACLE PASSWORD> is the database password used for Symantec Data Loss
Prevention 15.5.

During the update process, services shut down, then restart automatically. You can review
the update log file SingleTierServerUpdateConfigurationUtility.log located at
/var/log/Symantec/DataLossPrevention/SingleTierServer/15.5/debug/.
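In each procedure in this section, the RPM file names end in a version and build tag such as 15.5-01074. If you update more than one server, it is easy to leave RPMs from two different builds in the same directory before you run `rpm -Uvh *.rpm`. The following bash helpers are hypothetical, not Symantec tools. They assume that file names follow the pattern shown above (`<package>-<version>-<build>.<arch>.rpm`). They extract the tag from each name and fail if more than one tag is present.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not Symantec tools); require bash 4 or later (mapfile).
# Assumed file name pattern: <package>-<version>-<build>.<arch>.rpm, e.g.
# symantec-dlp-15-5-enforce-server-15.5-01074.x86_64.rpm  ->  tag 15.5-01074

# Print the version-build tag of one RPM file name.
mp_build_tag() {
  local name="${1##*/}"          # drop any leading directory
  name="${name%.*.rpm}"          # drop ".<arch>.rpm"
  local build="${name##*-}"      # last dash-separated field
  local rest="${name%-*}"
  local version="${rest##*-}"    # the field before it
  echo "${version}-${build}"
}

# Print the single tag shared by all the given files. Fail if no files are
# given or if the files come from more than one build.
check_mp_builds() {
  local f
  local -a tags
  mapfile -t tags < <(for f in "$@"; do mp_build_tag "$f"; done | sort -u)
  if (( ${#tags[@]} == 0 )); then
    echo "no RPM files given" >&2
    return 1
  fi
  if (( ${#tags[@]} > 1 )); then
    echo "RPM files from more than one build:" >&2
    printf '  %s\n' "${tags[@]}" >&2
    return 1
  fi
  echo "${tags[0]}"
}
```

For example, `check_mp_builds /opt/temp/*.rpm` prints one tag when the directory is consistent. It returns 1 and lists every tag it finds when builds are mixed.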
Section 3
Managing detection servers

■ Chapter 14. Installing and managing detection servers and cloud detectors

■ Chapter 15. Managing log files

■ Chapter 16. Using Symantec Data Loss Prevention utilities

Chapter 14
Installing and managing detection servers and cloud detectors
This chapter includes the following topics:

■ About managing Symantec Data Loss Prevention servers

■ Preparing for Microsoft Rights Management file monitoring

■ Enabling Advanced Process Control

■ Server controls

■ Server configuration—basic

■ Editing a detector

■ Server and detector configuration—advanced

■ Adding a detection server

■ Adding a cloud detector

■ Removing a server

■ Importing SSL certificates to Enforce or Discover servers

■ About the Overview screen

■ Configuring the Enforce Server to use a proxy to connect to cloud services

■ Server and detector status overview

■ Recent error and warning events list

■ Server/Detector Detail screen

■ Advanced server settings

■ Advanced detector settings

■ About using load balancers in an endpoint deployment

About managing Symantec Data Loss Prevention servers
Symantec Data Loss Prevention servers and cloud detectors are managed from the System
> Servers and Detectors > Overview screen. This screen provides an overview of your
system, including server status and recent system events. It displays summary information
about all Symantec Data Loss Prevention servers, a list of recent error and warning events,
and information about your license. From this screen you can add or remove detection servers.
■ Click on the name of a server to display its Server/Detector Detail screen, from which you
can control and configure that server.
See “Installing a new license file” on page 234.
See “About the Enforce Server administration console” on page 83.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Adding a detection server” on page 273.
See “Adding a cloud detector” on page 275.
See “Removing a server” on page 277.
See “Server controls” on page 251.
See “Server configuration—basic” on page 253.

Preparing for Microsoft Rights Management file monitoring
You must complete the following prerequisites before you enable Microsoft Rights Management
(RMS) file detection. The prerequisites apply to RMS administered by Azure RMS or by Active
Directory (AD) RMS.

Table 14-1 Microsoft Rights Management file monitoring prerequisites

RMS client Requirements

Azure RMS Install the RMS client, version 2.1, on the detection server.

AD RMS ■ Install the RMS client, version 2.1, on the detection server using a domain service
user that is added to the AD RMS Super Users group.
■ Provide both the AD RMS Service User and the DLP Service User with Read and
Execute permissions to access ServerCertification.asmx. Refer to the
Microsoft Developer Network for additional details:
https://msdn.microsoft.com/en-us/library/mt433203.aspx.
■ Add the detection server to the AD RMS server domain.
■ Run the detection server services using a domain user that is a member of the AD
RMS Super Users group.

After you install the detection server, you enable RMS file detection. See “Enabling Microsoft
Rights Management file monitoring” on page 248.

Enabling Microsoft Rights Management file monitoring

Symantec Data Loss Prevention can detect files that are encrypted using Microsoft Rights
Management (RMS) administered by Azure or Active Directory (AD).
Before you enable Microsoft Rights Management file monitoring, confirm that prerequisites for
the RMS environment and the detection server have been completed. See “Preparing for
Microsoft Rights Management file monitoring” on page 247.

Enabling RMS detection for Azure-managed RMS

For Azure RMS, complete the following on each detection server to enable RMS file monitoring:
1 Locate the Enable-Plugin.ps1 plugin script on the detection server at the following
path:

C:\Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Protect\plugins\
contentextraction\MicrosoftRightsManagementPlugin\

2 Run the plugin by executing the following command:

powershell.exe -ExecutionPolicy RemoteSigned -File "C:\Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Protect\plugins\contentextraction\MicrosoftRightsManagementPlugin\Enable-Plugin.ps1"

3 Run the configuration utility ConfigurationCreator.exe to add the system user. Run
the utility as the protect user.

Note: Enter all credentials accurately to ensure that the feature is enabled.

C:\Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Protect\plugins\contentextraction\MicrosoftRightsManagementPlugin\ConfigurationCreator.exe
Do you want to configure ADAL authentication [y/n]: n
Do you want to configure symmetric key authentication [y/n]: y
Enter your symmetric key (base-64): [user's Azure RMS symmetric key]
Enter your app principal ID: [user's Azure RMS app principal ID]
Enter your BPOS tenant ID: [user's Azure RMS BPOS tenant ID]

After you run the utility, the following files are created in the
MicrosoftRightsManagementPlugin folder in
\Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Protect\plugins\contentextraction\:

■ rightsManagementConfiguration

■ rightsManagementConfigurationProtection

4 Restart each detection server to complete the process.

Note: You can confirm that Symantec Data Loss Prevention is monitoring RMS content
by reviewing the ContentExtractionHost_FileReader.log file (located at
\ProgramData\Symantec\DataLossPrevention\DetectionServer\15.5\protect\Logs\debug).
Error messages that display for the MicrosoftRightsManagementPlugin.cpp item indicate
that the plugin is not monitoring RMS content.

Enabling RMS detection for AD-managed RMS

For AD RMS, complete the following on each detection server to enable RMS file monitoring:
1 Run the plugin, Enable-Plugin.ps1, which is located at \Program
Files\Symantec\DataLossPrevention\Protect\bin on the Enforce Server.

powershell.exe -ExecutionPolicy RemoteSigned -File "C:\Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Protect\plugins\contentextraction\MicrosoftRightsManagementPlugin\Enable-Plugin.ps1"

2 Restart each detection server to complete the process.

Enabling Advanced Process Control

Symantec Data Loss Prevention Advanced Process Control lets you start or stop individual
server processes from the Enforce Server administration console. You do not have to start or
stop an entire server. This feature can be useful for debugging. When Advanced Process
Control is off (the default), each Server/Detector Detail screen shows only the status of the
entire server. When you turn Advanced Process Control on, the General section of the
Server/Detector Detail screen displays individual processes.
See “Server/Detector Detail screen” on page 283.
To enable Advanced Process Control
1 Go to System > Settings > General and click Configure.
The Edit General Settings screen is displayed.
2 Scroll down to the Process Control section and check the Advanced Process Control
box.
3 Click Save.
Table 14-2 describes the individual processes and the servers on which they run once advanced
process control is enabled.

Table 14-2 Advanced processes

Process: Monitor Controller
Description: The Monitor Controller process controls detection servers.
Control: The MonitorController Status is available for the Enforce Server.

Process: File Reader
Description: The File Reader process detects incidents.
Control: The FileReader Status is available for all detection servers.

Process: Incident Writer
Description: The Incident Writer process sends incidents to the Enforce Server.
Control: The IncidentWriter Status is available for all detection servers, unless they are
part of a single-tier installation, in which case there is only one Incident Writer process.

Process: Packet Capture
Description: The Packet Capture process captures network streams.
Control: The PacketCapture Status is available for Network Monitor.

Process: Request Processor
Description: The Request Processor processes SMTP requests.
Control: The RequestProcessor Status is available for Network Prevent for Email.

Process: Endpoint Server
Description: The Endpoint Server process interacts with Symantec DLP Agents.
Control: The EndpointServer Status is available for Endpoint Prevent.

Process: Detection Server Database
Description: The Detection Server Database process is used for automated incident
remediation tracking.
Control: The DetectionServerDatabase Status is available for Network Discover.

See “Server configuration—basic” on page 253.

Server controls
Servers and their processes are controlled from the Server/Detector Detail screen.
■ To reach the Server/Detector Detail screen for a particular server, go to the System >
Servers and Detectors > Overview screen and click a server name, detector name, or
appliance name in the list.
See “Server/Detector Detail screen” on page 283.
The General section of the Server/Detector Detail screen shows the current status of the
server and its processes. The Start, Recycle, and Stop buttons control server and process
operations. The possible status values are:

Table 14-3 Server status values

Status

Starting - In the process of starting.

Running - Running without errors.

Running Selected - Some processes on the server are stopped or have errors. To see
the statuses of individual processes, you must first enable Advanced Process Control
on the System Settings screen.

Stopping - In the process of stopping.

Stopped - Fully stopped.

Unknown - The Server has encountered one of the following errors:

■ Start. To start a server or process, click Start.
■ Recycle. To stop and restart a server, click Recycle.
■ Stop. To stop a server or process, click Stop.
■ To halt a process during its start-up procedure, click Terminate.
■ To reboot an appliance, click Reboot.

Note: Status and controls for individual server processes are only displayed if Advanced
Process Control is enabled for the Enforce Server. To enable Advanced Process Control, go
to System > Settings > General > Configure, check the Advanced Process Control box,
and click Save.

■ To update the status, click the refresh icon in the upper-right portion of the screen, as
needed.
See “About Symantec Data Loss Prevention administration” on page 82.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “System events reports” on page 165.
See “Server and Detectors event detail” on page 169.

Server configuration—basic
Enforce Servers are configured from the System > Settings > General menu.
Detection servers and detectors are configured from each server's individual Configure Server
screen.
To configure a server
1 Go to the System > Servers and Detectors > Overview screen.
2 Click the name of the server in the list.
That server's Server/Detector Detail screen is displayed. The following buttons are in
the upper-left portion of the Server/Detector Detail screen:
■ Done. Click Done to return to the previous screen.
■ Configure. Click Configure to specify a basic configuration for this server.
■ Server Settings. Click Server Settings to specify advanced configuration parameters
for this server. Use caution when modifying advanced server settings. It is
recommended that you check with Symantec Support before changing any of the
advanced settings.
See “Server and detector configuration—advanced” on page 273.
See Symantec Data Loss Prevention online Help for information about advanced
server configuration.

3 Click Configure or Server Settings to display a configuration screen for that type of
server.
4 Specify or change settings on the screen as needed, and then click Save.
Click Cancel to return to the previous screen without changing any settings.

Note: A server must be recycled before new settings take effect.

See “Server controls” on page 251.

For all detection servers, the Configure Server screen contains a General section with the
following parameters:
■ Name. The name you choose to give the server. This name appears in the Enforce Server
administration console (System > Servers and Detectors > Overview). The name is
limited to 255 characters.
■ Host. The host name or IP address of the system hosting the server. Host names must be
fully qualified. If the host has more than one IP address, specify the address on which the
detection server listens for connections to the Enforce Server.
■ Port. The port number used by the detection server to communicate with the Enforce
Server. The default is 8100.
For Single Tier Monitors, the Host field on the Configure Server page is pre-populated with
the local IP address 127.0.0.1. You cannot change this value.
The next portions of a Configure Server screen vary according to the type of server, except
for the OCR Engine and Detection tabs, which are common to all servers.
Click the OCR Engine tab to set up a connection to an OCR server.
See “Server configuration—basic” on page 705.
Click the Detection tab to customize the Inspection Content Size.
See “Increasing the inspection content size” on page 459.
See “Network Monitor Server—basic configuration” on page 254.
See “Network Discover/Cloud Storage Discover Server and Network Protect—basic
configuration” on page 261.
See “Network Prevent for Email Server—basic configuration” on page 256.
See “Network Prevent for Web Server—basic configuration” on page 259.
See “Endpoint Server—basic configuration” on page 262.
See “Single Tier Monitor — basic configuration” on page 263.
See “Server/Detector Detail screen” on page 283.

Network Monitor Server—basic configuration

Field Description

Source Folder Override The source folder is the directory the server uses to
buffer network streams before it processes them.
The recommended setting is to leave the Source
Folder Override field blank to accept the default. If
you want to specify a custom buffer directory, type
the full path to the directory.

Network Interfaces Select the network interface card(s) to use for
monitoring. To monitor a NIC, WinPcap software
must be installed on the Network Monitor Server.

See the Symantec Data Loss Prevention Installation
Guide for more information about NICs.

The Protocol section of the Packet Capture tab specifies the types of network traffic (by protocol)
to capture. It also specifies any custom parameters to apply. This section lists the standard
protocols that you have licensed with Symantec, and any custom TCP protocols you have
added.
To monitor a particular protocol, check its box. When you initially configure a server, the settings
for each selected protocol are inherited from the system-wide protocol settings. You configure
these settings by going to System > Settings > Protocol. System-wide default settings are
listed as Standard.
Consult Symantec Data Loss Prevention online Help for information about working with
system-wide settings.
To override the inherited filtering settings for a protocol, click the name of the protocol. The
following custom settings are available (some settings may not be available for some protocols):
■ IP filter
■ L7 sender filter
■ L7 recipient filter
■ Content filter
■ Search Depth (packets)
■ Sampling rate
■ Maximum wait until written
■ Maximum wait until dropped
■ Maximum stream packets
■ Minimum stream size
■ Maximum stream size

■ Segment Interval
■ No traffic notification timeout (The maximum value for this setting is 360000 seconds.)
Use the SMTP Copy Rule to modify the source folder where this server retrieves SMTP
message files. You can modify the Source Folder by entering the full path to a folder.
See “About Symantec Data Loss Prevention administration” on page 82.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “Server controls” on page 251.
In addition to the settings available through the Configure Server screen, you can specify
advanced settings for this server. To specify advanced configuration parameters, click Server
Settings on the server's Server/Detector Detail screen. Use caution when modifying advanced
server settings. Check with Symantec Support before you change any advanced setting.
See “Advanced server settings” on page 285.
See the Symantec Data Loss Prevention online Help for information about advanced server
settings.

Network Prevent for Email Server—basic configuration

Field Description

Trial Mode Trial mode lets you test prevention capabilities
without blocking requests. When trial mode is
selected, the server detects incidents and creates
incident reports, but does not block any messages.
Deselect this option to block those messages that
are found to violate Symantec Data Loss Prevention
policies.

Keystore Password If you use TLS authentication in a forwarding mode
configuration, enter the correct password for the
keystore file.

Next Hop Configuration Select Reflect to operate Network Prevent for Email
Server in reflecting mode. Select Forward to
operate in forwarding mode.
Note: If you select Forward, you must also select
Enable MX Lookup or Disable MX Lookup to
configure the method that is used to determine the
next-hop MTA.

Enable MX Lookup This option applies only to forwarding mode
configurations.

Select Enable MX Lookup to perform a DNS query
on a domain name to obtain the mail exchange (MX)
records for the server. Network Prevent for Email
Server uses the returned MX records to select the
address of the next hop mail server.

If you select Enable MX Lookup, also add one or
more domain names in the Enter Domains text
box. For example:

companyname.com

Network Prevent for Email Server performs MX
record queries for the domain names that you
specify.
Note: You must include at least one valid entry in
the Enter Domains text box to successfully
configure forwarding mode behavior.

Disable MX Lookup This field applies only to forwarding mode
configurations.

Select Disable MX Lookup if you want to specify
the exact hostname or IP address of one or more
next-hop MTAs. Network Prevent for Email Server
uses the hostnames or addresses that you specify
and does not perform an MX record lookup.

If you select Disable MX Lookup, also add one or
more hostnames or IP addresses for next-hop MTAs
in the Enter Hostnames text box. You can specify
multiple entries by placing each entry on a separate
line. For example:

smtp1.companyname.com
smtp2.companyname.com
smtp3.companyname.com

Network Prevent for Email Server always tries to
use the first MTA that you specify in the list. If that
MTA is not available, Network Prevent for Email
Server tries the next available entry in the list.
Note: You must include at least one valid entry in
the Enter Hostnames text box to successfully
configure forwarding mode behavior.
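You can preview both next-hop modes from a shell. This is a hedged sketch, not the Network Prevent for Email implementation, and the function names are hypothetical. This guide does not say how the server chooses among several MX records. For Enable MX Lookup, the first function therefore applies the standard SMTP rule (lowest preference value first, RFC 5321) to `dig +short MX <domain>` output. The second function mirrors the documented Disable MX Lookup behavior of trying hostnames in the order listed. The reachability check you pass in is your own choice. The `nc`-based example assumes that the MTA listens on TCP port 25 and that your netcat supports `-z` and `-w`.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; they do not configure Network Prevent for Email.

# Enable MX Lookup: read "preference host." lines, as printed by
#   dig +short MX companyname.com
# and print the hosts in standard SMTP order (lowest preference first).
mx_hosts_by_preference() {
  sort -n -k1,1 | awk '{ sub(/\.$/, "", $2); print $2 }'
}

# Disable MX Lookup: try next-hop MTAs in the order listed and print the
# first one that the supplied check command reports as available.
#   $1     name of a command that takes a host and returns 0 if it is usable
#   $2...  hostnames or IP addresses, in the order entered in Enter Hostnames
first_available_mta() {
  local check="$1" host
  shift
  for host in "$@"; do
    if "$check" "$host"; then
      echo "$host"
      return 0
    fi
  done
  return 1   # no listed MTA is available
}

# Example check (assumption: SMTP on TCP port 25, netcat with -z and -w).
smtp_port_open() {
  nc -z -w 5 "$1" 25 >/dev/null 2>&1
}
```

For example, `dig +short MX companyname.com | mx_hosts_by_preference` lists candidate MTAs in preference order. `first_available_mta smtp_port_open smtp1.companyname.com smtp2.companyname.com smtp3.companyname.com` prints the first listed host that accepts a connection.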

See the Symantec Data Loss Prevention MTA Integration Guide for Network Prevent for Email
for additional information about configuring Network Prevent for Email Server options.
See “About Symantec Data Loss Prevention administration” on page 82.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “Server controls” on page 251.
In addition to the settings available through the Configure Server screen, you can specify
advanced settings for this server. To specify advanced configuration parameters, click Server
Settings on the server's Server/Detector Detail screen. Use caution when modifying advanced
server settings. Check with Symantec Support before you change any advanced setting.
See “Advanced server settings” on page 285.

See the Symantec Data Loss Prevention online Help for information about advanced server
settings.

Network Prevent for Web Server—basic configuration

Detection servers are configured from each server's individual Configure Server screen. To
display the Configure Server screen, go to the Overview screen (System > Servers and
Detectors > Overview) and click the name of the server in the list. That server's
Server/Detector Detail screen appears. Click Configure to display the Configure Server
screen.
A Network Prevent for Web Server Configure Server screen is divided into a general section,
a Symantec Encryption Server Administration section, and two tabs:
■ General section. This section specifies the server's name, host, and port.
■ Symantec Encryption Server Administration section. This section specifies the Symantec
Encryption Server Name, the Universal Service Protocol Port, and the Credential.
■ ICAP tab. This tab is for configuring the Internet Content Adaptation Protocol (ICAP). Use
the ICAP tab to configure web-based network traffic.
The ICAP tab is divided into the following sections:
■ The Trial Mode section enables you to test your policies without blocking traffic. When trial
mode is selected, the server detects incidents and creates incident reports, but it does not
block any traffic. Check the box to enable trial mode.
■ Check the box in the Security Configuration section to enable Secure ICAP with the Blue
Coat ProxySG server. To enable Secure ICAP, you must also have a keystore configured and
provide the keystore password.
See “Configuring a secure ICAP keystore for Network Prevent for Web” on page 2067.
For instructions on setting up the Secure ICAP client configuration with Blue Coat ProxySG,
see the Blue Coat ProxySG documentation at
https://www.symantec.com/docs/DOC10459.html.
■ The Request Filtering section configures traffic filtering criteria:

Field Description

Ignore Requests Smaller Than Specify the minimum body size of HTTP
requests to inspect on this server. The
default value is 4096 bytes. HTTP requests
with bodies smaller than this number are
not inspected.

Ignore Requests from Hosts or Domains Enter the host names or domains whose
requests should be filtered out (ignored).
Enter one host or domain name per line.

Ignore Requests from User Agents Enter the names of user agents whose
requests should be filtered out (ignored).
Enter one agent per line.

■ The Response Filtering section configures the filtering criteria to manage HTTP responses:

Field Description

Ignore Responses Smaller Than Enter the minimum body size of HTTP
responses to inspect on this server. The
default value is 4096 bytes. HTTP
responses with bodies smaller than this
number are not inspected.

Inspect Content Type Specify the MIME content types that you
want this server to monitor. By default, this
field contains content type values for
standard Microsoft Office, PDF, and
plain-text formats. You can add other MIME
content type values. Enter separate content
types on separate lines. For example, to
inspect Excel files enter
application/vnd.ms-excel.

Ignore Responses from Hosts or Domains Enter the host names or domains whose
responses are to be ignored. Enter one host
or domain name per line.

Ignore Responses to User Agents Enter the names of user agents whose
responses are to be ignored. Enter one user
agent per line.

■ Click the OCR Engine tab to add an OCR Engine Configuration profile. Scroll to select
a configuration.
See “Server configuration—basic” on page 705.
See “Creating an OCR configuration” on page 711.
■ The Connection section configures settings for the ICAP connection between an HTTP
proxy server and the Network Prevent for Web Server:

Field Description

TCP Port Specify the TCP port number that this
server is to use to listen to ICAP requests.
The same value must be configured on the
HTTP proxy sending ICAP requests to this
server. The recommended value is 1344.

Maximum Number of Requests Enter the maximum number of simultaneous
ICAP request connections. The default is
25.

Maximum Number of Responses Enter the maximum number of simultaneous
ICAP response connections from the HTTP
proxy or proxies that are allowed. The
default is 25.

Connection Backlog Enter the maximum number of waiting
connections allowed. Each waiting
connection means that a user waits at their
browser. The minimum value is 1.
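The Response Filtering settings described above amount to a skip-or-inspect decision for each HTTP response. The following bash function is a hypothetical model of that decision. It only makes the documented size and content-type rules concrete and does not show how Network Prevent for Web implements filtering. It assumes the 4096-byte default for Ignore Responses Smaller Than. It also assumes case-insensitive exact matching of the MIME types listed one per line in Inspect Content Type, and that parameters such as `; charset=utf-8` are ignored in the comparison. It does not model the host, domain, or user-agent ignore lists.

```shell
#!/usr/bin/env bash
# Hypothetical model of two ICAP response filters (not product code).
# Requires bash 4 or later (${var,,} lower-casing).
#   $1  response body size in bytes
#   $2  Content-Type header value, e.g. "application/pdf; charset=binary"
#   $3  newline-separated MIME types to inspect (as in Inspect Content Type)
#   $4  minimum body size (Ignore Responses Smaller Than); default 4096
# Prints "inspect" or "skip".
icap_response_decision() {
  local size="$1" ctype="$2" inspect_types="$3" min="${4:-4096}"
  # Bodies smaller than the minimum are not inspected.
  if (( size < min )); then
    echo skip
    return
  fi
  # Compare only the media type: drop parameters, remove spaces, lower-case.
  ctype="${ctype%%;*}"
  ctype="${ctype//[[:space:]]/}"
  ctype="${ctype,,}"
  local t
  while IFS= read -r t; do
    t="${t//[[:space:]]/}"
    if [[ -n "$t" && "${t,,}" == "$ctype" ]]; then
      echo inspect
      return
    fi
  done <<< "$inspect_types"
  echo skip
}
```

For example, with `application/vnd.ms-excel` in the list, a 10000-byte Excel response yields `inspect`. The same response at 1000 bytes yields `skip` because it falls below the default minimum.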

See “Configuring Network Prevent for Web Server” on page 2064.

See “About Symantec Data Loss Prevention administration” on page 82.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “Server controls” on page 251.
In addition to the settings available through the Configure Server screen, you can specify
advanced settings for this server. To specify advanced configuration parameters, click Server
Settings on the server's Server/Detector Detail screen. Use caution when modifying Advanced
Server settings. Check with Symantec Support before you change any advanced setting.
See “Advanced server settings” on page 285.
See the Symantec Data Loss Prevention online Help for information about Advanced Server
settings.

Network Discover/Cloud Storage Discover Server and Network Protect—basic configuration
Detection servers are configured from each server's individual Configure Server screen. To
display the Configure screen for a server, go to the System > Servers and Detectors >
Overview screen and click the name of the server in the list. That server's Server/Detector
Detail screen is displayed. Click Configure. The server's Configure Server screen is displayed.
See “Modifying the Network Discover/Cloud Storage Discover Server configuration” on page 2083.
A Network Discover Server's Configure Server screen is divided into the following sections:
■ General section. This section is for specifying the server's name, host, and port.
See “Server configuration—basic” on page 253.
■ Discover tab. This tab is for performing the following configurations:
■ Modifying the number of parallel scans that run on this Discover Server.
The maximum count can be increased at any time. After it is increased, any queued
scans that are eligible to run on the Network Discover Server are started. The count
can be decreased only if the Network Discover Server has no running scans. Before
you reduce the count, pause or stop all scans running on the server.
To view the scans running on Network Discover Servers, go to Manage > Discover
Scanning > Discover Targets.

■ Configuring network proxy settings for connecting to the Symantec Information Centric
Encryption (ICE) Cloud.
You can specify an existing network proxy in your setup and, optionally, provide the
authentication credentials for connecting to it. Network Discover uses the proxy server
to communicate with the ICE Cloud whenever file share (File System) scans trigger the
Network Protect: Encrypt File response action.
See “Configuring Network Discover to use a proxy to connect to the Symantec ICE
Cloud for file share scans” on page 2085.
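The rule for changing the number of parallel scans can be written as a small check. This bash sketch is hypothetical. The function name and its arguments are illustrative, and the Enforce Server administration console applies the rule itself. An increase is always allowed. A decrease is allowed only when no scans are running on the server.

```shell
#!/usr/bin/env bash
# Hypothetical check mirroring the documented rule for the parallel scan count.
#   $1 current maximum count   $2 requested count   $3 scans now running
# Returns 0 if the change is allowed, 1 otherwise (with a reason on stderr).
can_change_scan_count() {
  local current="$1" requested="$2" running="$3"
  if (( requested >= current )); then
    return 0      # increasing (or leaving unchanged) is always allowed
  fi
  if (( running > 0 )); then
    echo "pause or stop the $running running scan(s) before reducing the count" >&2
    return 1
  fi
  return 0        # decreasing is allowed when nothing is running
}
```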

See “About Symantec Data Loss Prevention administration” on page 82.

See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “Server controls” on page 251.
In addition to the settings available through the Configure Server screen, you can also specify
advanced settings for this server. To specify advanced configuration parameters, click Server
Settings on the Server/Detector Detail screen. Use caution when modifying advanced server
settings. It is recommended that you check with Symantec Support before changing any of
the advanced settings.
See “Advanced server settings” on page 285.

Endpoint Server—basic configuration

Detection servers are configured from each server's individual Configure Server screen. To
display the Configure screen for a server, go to the System > Servers and Detectors >
Overview screen and click the name of the server. The Server/Detector Detail screen for
that server is displayed. Click Configure to display the Configure Server screen for that
server.
See “Adding a detection server” on page 273.
The Configure Server screen for an Endpoint Server is divided into a general section and the
following tabs:
■ General. This section is for specifying the server name, host, and port.
See “Server configuration—basic” on page 253.
■ Agent. This section is for adding agent security certificates to the Endpoint Server.
See “Adding and editing agent configurations” on page 2348.
Agent Listener. Use this section to configure the Endpoint Server to listen for connections
from Symantec DLP Agents:

Field Description

Bind address Enter the IP address on which the Endpoint Server listens for communications from
the Symantec DLP Agents. The default IP address is 0.0.0.0 which allows the
Endpoint Server to listen on all host IP addresses.

Port Enter the port over which the Endpoint Server listens for communications from the
Symantec DLP Agents.
Note: Many Linux systems restrict ports below 1024 to root access. On these Linux
systems, you cannot configure the Endpoint Server to listen for connections from
Symantec DLP Agents on the restricted ports.
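If you script your deployment, you can check the Linux port restriction before you save the Agent Listener settings. The following is a minimal sketch under stated assumptions. The function `check_listener_port` is hypothetical and is not part of Symantec Data Loss Prevention. It encodes only the valid TCP port range and the below-1024 rule described above.

```python
# Hypothetical pre-flight check for an Agent Listener port (not a Symantec DLP API).
import platform
from typing import Optional

PRIVILEGED_PORT_LIMIT = 1024  # many Linux systems restrict ports below this to root


def check_listener_port(port: int, system: Optional[str] = None) -> list[str]:
    """Return a list of problems with a proposed listener port (empty if none found)."""
    system = system or platform.system()  # "Linux", "Windows", ...
    problems = []
    if not 1 <= port <= 65535:
        problems.append(f"port {port} is outside the valid TCP range 1-65535")
    elif system == "Linux" and port < PRIVILEGED_PORT_LIMIT:
        problems.append(f"port {port} is restricted to root access on many Linux systems")
    return problems
```

For example, `check_listener_port(443, "Linux")` reports a problem. `check_listener_port(8443, "Linux")` returns an empty list. The value 8443 is an arbitrary example, not a documented default.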

Note: If you are using FIPS 140-2 mode for communication between the Endpoint Server and
DLP Agents, do not use Diffie-Hellman (DH) cipher suites. Mixing cipher suites prevents the
agent and Endpoint Server from communicating. You can confirm the current cipher suite setting
by referring to the EndpointCommunications.SSLCipherSuites setting on the Server
Settings page. See “Advanced server settings” on page 285.
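To review the EndpointCommunications.SSLCipherSuites value for DH suites, you can use a small script. The sketch below makes two assumptions. First, it assumes the setting holds a comma-separated list of standard cipher suite names; confirm the actual format on your system. Second, it treats only finite-field DH and DHE key exchange as "Diffie-Hellman", so ECDHE suites are not flagged. The helper name `find_dh_suites` is hypothetical.

```python
# Hypothetical helper: flag finite-field Diffie-Hellman (DH/DHE) cipher suites.
# Assumes a comma-separated list of standard names such as
# TLS_DHE_RSA_WITH_AES_128_CBC_SHA; ECDHE suites are deliberately not flagged.

DH_KEY_EXCHANGE_TOKENS = {"DH", "DHE"}


def find_dh_suites(setting_value: str) -> list[str]:
    """Return the suites whose key exchange token is DH or DHE."""
    suites = [s.strip() for s in setting_value.split(",") if s.strip()]
    flagged = []
    for suite in suites:
        # Suite names use "_" as a separator, so "TLS_ECDHE_RSA_..." yields the
        # token "ECDHE" and never a bare "DH" or "DHE" token.
        tokens = set(suite.upper().split("_"))
        if tokens & DH_KEY_EXCHANGE_TOKENS:
            flagged.append(suite)
    return flagged
```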

Single Tier Monitor — basic configuration

Detection servers are configured from each server's individual Configure Server screen. To
display the Configure Server screen, go to the System > Servers and Detectors > Overview
screen and click the name of the server in the list. That server's Server/Detector Detail screen
appears. Click Configure to display the Configure Server screen.
The Single Tier Monitor is a detection server that includes the detection capabilities of the
Network Monitor, Network Discover/Cloud Storage Discover, Network Prevent for Web, Network
Prevent for Email, and the Endpoint Prevent and Endpoint Discover detection servers. Each
of these detection server types is associated with one or more detection "channels." The Single
Tier Monitor deployment simplifies Symantec Data Loss Prevention administration and reduces
maintenance and hardware costs for small organizations, or for branch offices of larger
enterprises that would benefit from on-site deployments of Symantec Data Loss Prevention.

Configuring the channels for Network Monitor

Network Monitor uses two channels: Packet Capture and SMTP Copy Rule. To configure
Network Monitor, enter your configuration information on both the Packet Capture and SMTP
Copy Rule tabs on the Configure Server screen.
To configure the Packet Capture and SMTP Copy Rule tabs
1 Optional: On the Packet Capture tab of the Configure Server screen, specify the Source
Folder Override.
The source folder is the directory the server uses to buffer network streams before it
processes them. The recommended setting is to leave the Source Folder Override field
blank to accept the default. If you want to specify a custom buffer directory, type the full
path to the directory.
2 Select the Network Interfaces.
Select the network interface card(s) to use for monitoring.
To monitor a NIC, WinPcap software must be installed on the Network Monitor
Server.
See the Symantec Data Loss Prevention Installation Guide for more information about
NICs.
3 In the Protocol section, check the box for each type of network traffic to capture.
When you initially configure a server, the settings for each selected protocol are inherited
from the system-wide protocol settings. You configure these settings by going to System
> Settings > Protocol. System-wide default settings are listed as Standard. To override
the inherited filtering settings for a protocol, click the name of the protocol. The following
custom settings are available (some settings may not be available for some protocols):
■ IP filter
■ L7 sender filter
■ L7 recipient filter
■ Content filter
■ Search Depth (packets)
■ Sampling rate
■ Maximum wait until written
■ Maximum wait until dropped
■ Maximum stream packets
■ Minimum stream size
■ Maximum stream size
■ Segment Interval
■ No traffic notification timeout (The maximum value for this setting is 360000 seconds.)

4 Optional: On the SMTP Copy Rule tab, specify the Source Folder Override to modify
the source folder where this server retrieves SMTP message files.
You can modify the source folder by entering the full path to a folder. Leave this field blank
to use the default source folder.
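Step 3 describes an inheritance rule. A server starts with the system-wide Standard protocol settings, and an override replaces only the settings you change. The sketch below is a hypothetical model of that rule, and its setting keys are illustrative names rather than product identifiers. It also enforces the documented 360000-second maximum for the no traffic notification timeout.

```python
# Hypothetical model of per-server protocol overrides on top of the
# system-wide "Standard" settings; the keys are illustrative names only.
from typing import Any

NO_TRAFFIC_TIMEOUT_MAX = 360000  # seconds, the documented maximum


def effective_settings(standard: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Start from the system-wide defaults and replace only the overridden keys."""
    merged = {**standard, **overrides}
    timeout = merged.get("no_traffic_notification_timeout")
    if timeout is not None and timeout > NO_TRAFFIC_TIMEOUT_MAX:
        raise ValueError(f"no traffic notification timeout exceeds {NO_TRAFFIC_TIMEOUT_MAX} seconds")
    return merged
```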

Configuring the channel for Network Discover/Cloud Storage Discover

Network Discover/Cloud Storage Discover uses the Discover channel. On the Discover tab,
you can modify the number of parallel scans that run on the Single Tier Monitor by entering a
number in the Maximum Parallel Scans field.

Note: If you plan to use the grid scanning feature to distribute the scanning workload across
multiple detection servers, retain the default value (1).

The maximum count can be increased at any time. After it is increased, any queued scans
that are eligible to run on the Network Discover Server are started. The count can be decreased
only if the Network Discover Server has no running scans. Before you reduce the count, pause
or stop all scans running on the server.
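The rules for changing Maximum Parallel Scans can be modeled in a few lines. This is a toy model for illustration only, and the class and method names are hypothetical. An increase starts eligible queued scans immediately. A decrease is refused while any scan is running.

```python
# Toy model of the Maximum Parallel Scans rules (hypothetical names, not a DLP API).
from collections import deque


class ParallelScanLimit:
    def __init__(self, maximum: int = 1):
        self.maximum = maximum
        self.running: list[str] = []
        self.queued: deque[str] = deque()

    def submit(self, scan: str) -> None:
        self.queued.append(scan)
        self._start_eligible()

    def _start_eligible(self) -> None:
        # Start queued scans, oldest first, while there is spare capacity.
        while self.queued and len(self.running) < self.maximum:
            self.running.append(self.queued.popleft())

    def set_maximum(self, new_maximum: int) -> None:
        if new_maximum < self.maximum and self.running:
            raise RuntimeError("pause or stop all running scans before reducing the count")
        self.maximum = new_maximum
        # An increase immediately starts eligible queued scans.
        self._start_eligible()
```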

Configuring the channel for Network Prevent for Web

Network Prevent for Web uses the ICAP channel. The ICAP channel configuration tab includes
the Request Filtering, Response Filtering, and Connection sections.

To configure the ICAP tab

1 Verify or change the Trial Mode setting. Trial Mode lets you test prevention without
blocking requests in real time. If you select Trial Mode, Symantec Data Loss Prevention
detects incidents and indicates that it has blocked an HTTP communication, but it does
not block the communication.
2 Verify or modify the filter options for requests from HTTP clients (user agents). The options
in the Request Filtering section are as follows:

Ignore Requests Smaller Than Specifies the minimum body size of HTTP
requests to inspect. (The default is 4096 bytes.)
For example, search strings typed into search
engines such as Yahoo or Google are usually
short. By adjusting this value, you can exclude
those searches from inspection.

Ignore Requests without Attachments Causes the server to inspect only the requests
that contain attachments. This option can be
useful if you are mainly concerned with requests
intended to post sensitive files.

Ignore Requests to Hosts or Domains Causes the server to ignore requests to the hosts
or domains you specify. This option can be useful
if you expect a lot of HTTP traffic between the
domains of your corporate headquarters and
branch offices. You can type one or more host
or domain names (for example,
www.company.com), each on its own line.

Ignore Requests from User Agents Causes the server to ignore requests from user
agents (HTTP clients) you specify. This option
can be useful if your organization uses a program
or language (such as Java) that makes frequent
HTTP requests. You can type one or more user
agent values, each on its own line.
3 Verify or modify the filter options for responses from web servers. The options in the
Response Filtering section are as follows:

Ignore Responses Smaller Than Specifies the minimum size of the body of HTTP
responses that are inspected by this server.
(Default is 4096 bytes.)

Inspect Content Type Specifies the MIME content types that Symantec
Data Loss Prevention should monitor in
responses. By default, this field contains
content-type values for Microsoft Office, PDF,
and plain text formats. To add others, type one
MIME content type per line. For example, type
application/word2013 to have Symantec
Data Loss Prevention analyze Microsoft Word
2013 files.

Note that it is generally more efficient to specify
MIME content types at the Web proxy level.

Ignore Responses from Hosts or Domains Causes the server to ignore responses from the
hosts or domains you specify. You can type one
or more host or domain names (for example,
www.company.com), each on its own line.

Ignore Responses to User Agents Causes the server to ignore responses to user
agents (HTTP clients) you specify. You can type
one or more user agent values, each on its own
line.
4 Verify or modify settings for the ICAP connection between the HTTP proxy server and the
Web Prevent Server. The Connection options are as follows:

TCP Port Specifies the TCP port number over which this
server listens for ICAP requests. This number
must match the value that is configured on the
HTTP proxy that sends ICAP requests to this
server. The recommended value is 1344.

Maximum Number of Requests Specifies the maximum number of simultaneous
ICAP request connections from the HTTP proxy
or proxies. The default is 25.

Maximum Number of Responses Specifies the maximum number of simultaneous
ICAP response connections from the HTTP proxy
or proxies. The default is 25.

Connection Backlog Specifies the number of waiting connections
allowed. A waiting connection is a user waiting
in the browser for an HTTP response. The
minimum value is 1. If the HTTP proxy gets too
many requests (or responses), the proxy handles
them according to your proxy configuration. You
can configure the HTTP proxy to block any
requests (or responses) greater than this number.
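The Request Filtering options in step 2 amount to a single inspect-or-skip decision for each HTTP request. The following sketch models that decision under stated assumptions. The 4096-byte default comes from the table above. The matching rules are assumptions, not documented product behavior: hosts are compared case-insensitively and an ignored domain also covers its subdomains, and user agents are matched by substring. The `RequestFilter` class is hypothetical.

```python
# Hypothetical model of the ICAP Request Filtering options. The matching
# rules (domain suffix match, user-agent substring match) are assumptions.
from dataclasses import dataclass, field


@dataclass
class RequestFilter:
    min_body_bytes: int = 4096              # Ignore Requests Smaller Than
    ignore_without_attachments: bool = False
    ignored_domains: list[str] = field(default_factory=list)
    ignored_user_agents: list[str] = field(default_factory=list)

    def should_inspect(self, host: str, user_agent: str, body_size: int, has_attachment: bool) -> bool:
        if body_size < self.min_body_bytes:
            return False
        if self.ignore_without_attachments and not has_attachment:
            return False
        host = host.lower()
        for domain in self.ignored_domains:
            domain = domain.lower()
            if host == domain or host.endswith("." + domain):
                return False
        if any(ua.lower() in user_agent.lower() for ua in self.ignored_user_agents):
            return False
        return True
```

The Response Filtering options in step 3 follow the same pattern, with one addition: a check of the response's MIME content type against the Inspect Content Type list.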

Configuring the channel for Network Prevent for Email

Network Prevent for Email uses the Inline SMTP channel. The Inline SMTP configuration tab
is divided into three sections: Maximum number of connections, Security Configuration,
and Next Hop Configuration.
To configure the Inline SMTP tab
1 Verify or change the Trial Mode setting. Trial Mode lets you test prevention without
blocking requests in real time. If you select Trial Mode, Symantec Data Loss Prevention
detects incidents and indicates that it has blocked an email message, but it does not block
the message.
2 Verify or modify the Maximum number of connections. By default, the maximum number
of connections is 12.
3 If you use TLS authentication in a forwarding mode configuration, enter the correct
password for the keystore file in the Keystore Password field of the Security
Configuration section.
4 In the Next Hop Configuration section, configure reflecting mode or forwarding mode by
modifying the following fields:

Field Description

Next Hop Configuration Select Reflect to operate Network Prevent for
Email Server in reflecting mode. Select Forward
to operate in forwarding mode.
Note: If you select Forward, you must also select
Enable MX Lookup or Disable MX Lookup to
configure the method used to determine the
next-hop MTA.

Enable MX Lookup This option applies only to forwarding mode
configurations.
Select Enable MX Lookup to perform a DNS
query on a domain name to obtain its mail
exchange (MX) records. Network Prevent for
Email Server uses the returned MX records to
select the address of the next-hop mail server.
If you select Enable MX Lookup, also add one
or more domain names in the Enter Domains
text box. For example:

companyname.com

Network Prevent for Email Server performs MX
record queries for the domain names that you
specify.
Note: You must include at least one valid entry
in the Enter Domains text box to successfully
configure forwarding mode behavior.

Disable MX Lookup This option applies only to forwarding mode
configurations.
Select Disable MX Lookup if you want to specify
the exact hostname or IP address of one or more
next-hop MTAs. Network Prevent for Email
Server uses the hostnames or addresses that
you specify and does not perform an MX record
lookup.
If you select Disable MX Lookup, also add one
or more hostnames or IP addresses for next-hop
MTAs in the Enter Hostnames text box. You can
specify multiple entries by placing each entry on
a separate line. For example:

smtp1.companyname.com
smtp2.companyname.com
smtp3.companyname.com

Network Prevent for Email Server always tries to
proxy to the first MTA that you specify in the list.
If that MTA is not available, Network Prevent for
Email Server tries the next available entry in the
list.
Note: You must include at least one valid entry
in the Enter Hostnames text box to successfully
configure forwarding mode behavior.
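The two forwarding-mode options in step 4 can be pictured as two small functions. The sketch below is not Network Prevent for Email's actual code. It assumes MX records are ordered by the usual SMTP rule, in which the lowest preference value is tried first. Here, equal preferences keep their input order, although RFC 5321 suggests randomizing them. The failover follows the documented list order. `try_connect` is a hypothetical, caller-supplied check.

```python
# Sketch of next-hop MTA selection (illustrative only, not product code).
from typing import Callable, Iterable, Optional


def order_mx_records(records: Iterable[tuple[int, str]]) -> list[str]:
    """Return MX hostnames sorted by preference; lower values are tried first."""
    return [host for _, host in sorted(records, key=lambda r: r[0])]


def choose_next_hop(hosts: list[str], try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first host, in list order, for which try_connect succeeds."""
    for host in hosts:
        if try_connect(host):
            return host
    return None  # no next-hop MTA is available
```

In a real check, `try_connect` could wrap `socket.create_connection((host, 25), timeout=5)` and return False when that call raises `OSError`.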

Configuring the channel for Endpoint

Endpoint uses the Endpoint channel. You can configure the Endpoint channel on the Agent
tab.
To configure the Agent tab
◆ Configure the Agent Listener fields:

Field Description

Bind address Enter the IP address on which the Endpoint Server listens for communications
from the Symantec DLP Agents. The default IP address is 0.0.0.0 which allows
the Endpoint Server to listen on all host IP addresses.

Port Enter the port over which the Endpoint Server listens for communications from
the Symantec DLP Agents.

Configuring Advanced Server Settings for the Single Tier Monitor

Because the Single Tier Monitor runs multiple channels on the same detection server, you
must modify some Advanced Server Settings to get the best performance from your system.
To modify the Advanced Server Settings on your Single Tier Monitor
1 Log on to the Enforce Server as Administrator.
2 Go to System > Servers and Detectors > Overview.
The Overview page appears.
3 Click the Single Tier Monitor detection server row.
The Server/Detector Detail page appears.
4 Click Server Settings.
The Server/Detector Detail - Advanced Settings page appears.
5 Modify the following settings:

Setting Value

MessageChain.NumChains 32

MessageChain.CacheSize 32

PacketCapture.NUMBER_BUFFER_POOL_PACKETS 1,200,000

PacketCapture.NUMBER_SMALL_POOL_PACKETS 1,000,000

6 Click Save.
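If you manage several Single Tier Monitors, a short checklist script can show which of the settings in step 5 still differ from the listed values. This sketch is hypothetical. It assumes you have copied the current values from the Advanced Settings page into a dictionary of strings, and it ignores thousands separators such as those in 1,200,000.

```python
# Hypothetical checklist: compare current Advanced Settings values (copied from
# the Server/Detector Detail - Advanced Settings page) with the listed values.
RECOMMENDED = {
    "MessageChain.NumChains": 32,
    "MessageChain.CacheSize": 32,
    "PacketCapture.NUMBER_BUFFER_POOL_PACKETS": 1_200_000,
    "PacketCapture.NUMBER_SMALL_POOL_PACKETS": 1_000_000,
}


def settings_to_change(current: dict[str, str]) -> dict[str, int]:
    """Return the settings whose current value is missing or differs."""
    changes = {}
    for name, wanted in RECOMMENDED.items():
        raw = current.get(name, "").replace(",", "").strip()
        if not raw.isdigit() or int(raw) != wanted:
            changes[name] = wanted
    return changes
```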
See “About Symantec Data Loss Prevention administration” on page 82.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “Server controls” on page 251.
See “Advanced server settings” on page 285.
See the Symantec Data Loss Prevention online Help for information about Advanced Server
settings.

Editing a detector
You can change the name of your detector on the Server/Detector Detail screen.

Editing the name of a detector

1 Go to System > Servers and Detectors > Overview and click on the name of the detector.
The Server/Detector Detail screen appears.
2 Click Edit.
The Edit Detector page appears.
3 Enter a new name for the detector in the Detector Name field.
4 Click Save.

Server and detector configuration—advanced

Symantec Data Loss Prevention provides advanced server and detector configuration settings
for each detection server or detector in your system.

Note: Check with Symantec Support before changing any advanced settings. If you make a
mistake when changing advanced settings, you can severely degrade performance or even
disable the server entirely.

To change an advanced configuration setting for a detection server or detector

1 Go to System > Servers and Detectors > Overview and click on the name of the detection
server.
That server's Server/Detector Detail screen appears.
2 Click Server Settings or Detector Settings, as appropriate.
The Server/Detector Detail - Advanced Settings screen appears.
See Symantec Data Loss Prevention online Help for information about advanced server
configuration.
See “Advanced server settings” on page 285.
3 With the guidance of Symantec Support, modify the appropriate setting(s).
4 Click Save.
Changes to settings on this screen normally do not take effect until you restart the server.
See “Server configuration—basic” on page 253.

Adding a detection server

Add the detection servers that you want to your Symantec Data Loss Prevention system from
the System > Servers and Detectors > Overview screen.

You can add the following types of servers:

■ Network Monitor Server, which monitors network traffic.
■ Network Discover/Cloud Storage Discover Server, which inspects stored data for policy
violations.
■ Network Prevent for Email Server, which prevents SMTP violations.
■ Cloud Prevent for Email Server, which prevents Microsoft Office 365 Exchange traffic
violations.
■ Network Prevent for Web Server, which prevents policy violations in traffic that passes
through an ICAP proxy server, such as FTP, HTTP, and HTTPS.
■ Endpoint Prevent, which controls Symantec DLP Agents that monitor and scan endpoints.
■ Single-Tier Server: If you select the Single-Tier Server option, the detection servers that
you have licensed are installed on the same host as the Enforce Server. The single-tier
server performs detection for the following products (you must have a license for each):
Network Monitor, Network Discover, Network Prevent for Email, Network Prevent for Web,
and Endpoint Prevent.

Note: Symantec recommends that you apply the same hardware and software configuration
to all of the detection servers that you intend to use for grid scans. Symantec Data Loss
Prevention supports grid scans that have up to 11 participating detection servers.

To add a detection server

1 Go to the System Overview screen (System > Servers and Detectors > Overview).
See “About the Overview screen” on page 278.
2 Click Add Server.
The Add Server screen appears.
3 Select the type of server you want to install and click Next.
The Configure Server screen for that detection server appears.
4 To perform the basic server configuration, use the Configure Server screen, then click
Save when you are finished.
See “Network Monitor Server—basic configuration” on page 254.
See “Network Prevent for Email Server—basic configuration” on page 256.
See Symantec Data Loss Prevention Cloud Prevent for Microsoft Office 365 Implementation
Guide for more details.
See “Network Prevent for Web Server—basic configuration” on page 259.
See “Network Discover/Cloud Storage Discover Server and Network Protect—basic
configuration” on page 261.
See “Endpoint Server—basic configuration” on page 262.
5 In addition to the configuration steps specific to each server, you can configure the OCR
Engine or the detection server's Inspection Content Size from tabs on this screen.
See OCR Engine configuration.
See Inspection Content Size settings.
6 To return to the System Overview screen, click Done.
Your new server is displayed in the Servers and Detectors list with a status of Unknown.
7 Click on the server to display its Server/Detector Detail screen.
See “Server/Detector Detail screen” on page 283.
8 Click [Recycle] to restart the server.
9 Click Done to return to the System Overview screen.
When the server is finished restarting, its status displays Running.
10 If necessary, click Server Settings on the Server/Detector Detail screen to perform
Advanced Server configuration.
See “Advanced server settings” on page 285.
See Symantec Data Loss Prevention online Help for information about Advanced Server
configuration.
See “Server configuration—basic” on page 253.

Adding a cloud detector

A cloud detector is a Symantec Data Loss Prevention detection service deployed in the
Symantec Cloud. After Symantec has set up your detection service in the cloud, Symantec
sends you an enrollment bundle. This bundle contains the information that you need to set up
the connection from your on-premises Enforce Server to the detection service in the Symantec
Cloud.
The enrollment bundle is a ZIP archive. For security reasons, save the unextracted ZIP file
to a location that is not accessible by other users. For example, on a Microsoft Windows
system, save the bundle to a folder such as:

c:\Users\username\downloads

On a Linux system, save the bundle to a directory such as:

/home/username/
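Before you register the detector, you can confirm that the saved file is a ZIP archive and restrict its permissions, without extracting it. The sketch below shows one way to do this. The function name `secure_enrollment_bundle` is hypothetical. On Windows, `os.chmod` can only toggle the read-only flag, so use folder or file ACLs there instead.

```python
# Hypothetical helper: verify the enrollment bundle is a ZIP archive (without
# extracting it) and limit it to owner read/write on POSIX systems.
import os
import stat
import zipfile


def secure_enrollment_bundle(path: str) -> None:
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    # Mode 0o600: owner read/write only. On Windows this call only affects the
    # read-only flag, so rely on NTFS permissions for access control there.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```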

See the documentation for your cloud detector for more detailed information about the
enrollment process.
After you have saved the enrollment bundle, register your cloud detector to enable
communication between it and your on-premises Enforce Server.
To register a cloud detector
1 Log on to the Enforce Server as Administrator.
2 Navigate to System > Servers and Detectors > Overview.
The Overview page appears.
3 Click Add Cloud Detector.
The Add Cloud Detector page appears.
4 Click Browse in the Enrollment Bundle File field.
5 Locate your saved enrollment bundle file, then enter a name in the Detector Name field.
6 Click Enroll Detector.
The Server/Detector Detail screen appears.
7 If necessary, click Detector Settings on the Server/Detector Detail screen to perform
advanced detector configuration.
See “Advanced detector settings” on page 326.
8 Click Done.
It may take several minutes for the Enforce Server administration console to show that the
cloud detector is running. To verify that the detector was added, check the System > Servers
and Detectors > Overview page. The detector should appear in the Servers and Detectors
list with the Connected status.

Removing a server
See the appropriate Symantec Data Loss Prevention Installation Guide for information about
uninstalling Symantec Data Loss Prevention from a server.
An Enforce Server administration console lists the detection servers registered with it on the
System > Servers and Detectors > Overview screen. If Symantec Data Loss Prevention is
uninstalled from a detection server, or that server is stopped or disconnected from the network,
its status is shown as Unknown on the console.
A detection server can be removed (de-registered) from an Enforce Server administration
console. When a detection server is removed from an Enforce Server, its Symantec Data Loss
Prevention services continue to operate: the detection server keeps running until some action
is taken to halt it. Incidents that it detects are stored on the detection server. If the
detection server is re-registered with an Enforce Server, the stored incidents are then
forwarded to Enforce.
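The behavior described above can be summarized as a small state model. This is a toy illustration with hypothetical names, not product code. A de-registered detection server keeps detecting and storing incidents locally. When it is registered again, it forwards the stored backlog.

```python
# Toy model of de-registration behavior (hypothetical names, illustration only).
class DetectionServerModel:
    def __init__(self):
        self.registered = True
        self.local_incidents: list[str] = []   # incidents stored on the detection server
        self.forwarded: list[str] = []         # incidents delivered to Enforce

    def detect(self, incident: str) -> None:
        # Detection continues whether or not the server is registered.
        self.local_incidents.append(incident)
        if self.registered:
            self._forward()

    def deregister(self) -> None:
        # Removing the server from Enforce does not stop detection.
        self.registered = False

    def register(self) -> None:
        self.registered = True
        self._forward()

    def _forward(self) -> None:
        self.forwarded.extend(self.local_incidents)
        self.local_incidents.clear()
```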
To remove (de-register) a detection server from Enforce
1 Go to System > Servers and Detectors > Overview.
See “About the Overview screen” on page 278.
2 In the Servers and Detectors section of the screen, click the red X on a server's status
line to remove it from this Enforce Server administration console.
See “Server controls” on page 251.
3 Click OK to confirm.
The server's status line is removed from the System Overview list.

Importing SSL certificates to Enforce or Discover servers

You can import SSL certificates to the Java trusted keystore on the Enforce or Discover servers.
The SSL certificate can be self-signed (server) or issued by a well-known certificate authority
(CA).
You may need to import an SSL certificate to make secure connections to external servers
such as Active Directory (AD). If a recognized authority has signed the certificate of the external
server, the certificate is automatically added to the Enforce Server. If the server certificate is
self-signed, you must manually import it to the Enforce or Discover Servers.

Table 14-4 Importing an SSL certificate to Enforce or Discover

Step Description

1 Copy the certificate file you want to import to the Enforce Server or Discover Server computer.

2 Change directory to
c:\Program Files\Symantec\DataLossPrevention\ServerJRE\1.8.0_181\bin on
the Enforce Server or Discover Server computer.

3 Execute the keytool utility with the -importcert option to import the public key certificate
to the Enforce Server or Discover Server keystore:

keytool -importcert -alias new_endpointgroup_alias -keystore ..\lib\security\cacerts -file my-domaincontroller.crt

In this example command, new_endpointgroup_alias is a new alias to assign to the imported
certificate and my-domaincontroller.crt is the path to your certificate.

4 When you are prompted, enter the password for the keystore.

By default, the password is changeit. To change the keystore password, use:

keytool -storepasswd -keystore ..\lib\security\cacerts

5 Answer Yes when you are asked if you trust this certificate.

6 Restart the Enforce Server or Discover Server.
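To script the import in Table 14-4, you can build the keytool command with absolute paths, so that it does not depend on the current directory. The sketch below makes several assumptions. It uses the ServerJRE path shown in the table, so adjust the path to your installation. It assumes the standard JRE layout, with keytool.exe in bin and cacerts in lib\security. The alias and certificate file name are placeholders. keytool still prompts for the keystore password and for trust confirmation, so run the command in an interactive console.

```python
# Sketch: build and run the keytool import from Table 14-4 with absolute paths.
# Assumes the standard JRE layout (bin\keytool.exe, lib\security\cacerts).
import subprocess
from pathlib import PureWindowsPath

JRE_HOME = PureWindowsPath(r"c:\Program Files\Symantec\DataLossPrevention\ServerJRE\1.8.0_181")


def build_import_command(alias: str, cert_file: str) -> list[str]:
    return [
        str(JRE_HOME / "bin" / "keytool.exe"),
        "-importcert",
        "-alias", alias,
        "-keystore", str(JRE_HOME / "lib" / "security" / "cacerts"),
        "-file", cert_file,
    ]


def import_certificate(alias: str, cert_file: str) -> None:
    # keytool prompts for the keystore password and asks whether to trust the
    # certificate; check=True raises CalledProcessError if keytool fails.
    subprocess.run(build_import_command(alias, cert_file), check=True)
```

For example, `import_certificate("new_endpointgroup_alias", r"C:\certs\my-domaincontroller.crt")` runs the same import as step 3. The certificate path in this example is hypothetical. Restart the server afterward, as in step 6.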

See “Configuring directory server connections” on page 156.

About the Overview screen

To reach the System Overview screen, go to System > Servers and Detectors > Overview.
This screen provides a quick snapshot of system status. It lists information about the Enforce
Server and each registered detection server, cloud detector, or appliance.
The System Overview screen provides the following features:
■ The Add Server button is used to register a detection server. When this screen is first
viewed after installation, only the Enforce Server is listed. You must register your various
detection servers with the Add Server button. After you register detection servers, they
are listed in the Servers and Detectors section of the screen.
See “Adding a detection server” on page 273.
■ The Add Cloud Detector button is used to register a cloud detector. When this screen is
first viewed after installation, only the Enforce Server is listed. You must register your cloud
detectors with the Add Cloud Detector button. After you register cloud detectors, they are
listed in the Servers and Detectors section of the screen.
■ The Add Appliance button is used to register an appliance. When this screen is first
viewed after installation, only the Enforce Server is listed. You must register your appliances
with the Add Appliance button. After you register your appliances, they are listed in the
Servers and Detectors section of the screen.
See “Adding an appliance” on page 2539.
■ The System Readiness and Appliances Update button opens the System Readiness and
Appliances Update screen, where you can run tests to confirm database update readiness
and update appliances.
■ The Upgrade button is for upgrading Symantec Data Loss Prevention to a newer version.
See “About system upgrades” on page 235.
See also the appropriate Symantec Data Loss Prevention Upgrade Guide.
■ The Servers and Detectors section of the screen displays summary information about
the status of each server, detector, or appliance. It can also be used to remove (de-register)
a server, detector, or appliance.
See “Server and detector status overview” on page 280.
■ The Recent Error and Warning Events section shows the last five events of error or
warning severity for any of the servers listed in the Servers and Detectors section.
See “Recent error and warning events list” on page 282.
■ The License section of the screen lists the Symantec Data Loss Prevention individual
products that you are licensed to use.
See “Server configuration—basic” on page 253.
See “About Symantec Data Loss Prevention administration” on page 82.

Configuring the Enforce Server to use a proxy to connect to cloud services

To configure the Enforce Server to use a proxy to connect to cloud services, first set up
your proxy according to the proxy manufacturer's instructions. Then use the following
instructions to configure the Enforce Server to support the use of the proxy.
If you have configured the Enforce Server to connect to the Symantec ICE Cloud, Network
Protect uses the configured proxy to connect to the ICE Cloud whenever a SharePoint scan
triggers the SharePoint Encrypt response action.
See “Configuring the Enforce Server to connect to the Symantec ICE Cloud” on page 224.

Network Discover also supports network proxies for connecting to the ICE Cloud during file
share (File System) scans. To configure the network proxy settings for file share scans, you
must update the Network Discover/Cloud Storage Discover Server configuration.
See “Configuring Network Discover to use a proxy to connect to the Symantec ICE Cloud for
file share scans” on page 2085.
To configure the Enforce Server to use a proxy to connect to a cloud service
1 Go to System > Settings > General and click Configure. The Edit General Settings
screen is displayed.
2 In the Enforce to Cloud Proxy Settings section, select one of the following proxy
categories:
■ No proxy, or transparent proxy
■ Manual proxy

3 If you choose Manual proxy, fields for a URL, Port, and Proxy is Authenticated appear.
■ Enter the HTTP Proxy URL.
■ Enter a port number.

4 If you are using an authenticated proxy, also enter a user ID and a password.

Note: The Enforce Server supports basic authentication when using a proxy to connect
to cloud services. For connecting to the ICE Cloud, the Enforce Server supports basic,
NTLM, and Kerberos authentication.

5 Click Save.
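Before or after you save the settings, you may want to confirm from the Enforce Server host that the proxy accepts basic-authenticated connections. The sketch below uses Python's standard `urllib.request.ProxyHandler`, which sends basic proxy credentials that are embedded in the proxy URL. The host, port, and credentials are placeholders. This sketch covers only basic authentication, not the NTLM or Kerberos options available for the ICE Cloud.

```python
# Sketch: build a basic-auth proxy URL and an opener that routes through it.
# Host, port, and credentials are placeholders for your own values.
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_proxy_url(host: str, port: int, user: Optional[str] = None, password: Optional[str] = None) -> str:
    credentials = ""
    if user is not None:
        # Percent-encode credentials so characters such as "@" or ":" are safe.
        credentials = quote(user, safe="") + ":" + quote(password or "", safe="") + "@"
    return f"http://{credentials}{host}:{port}"


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

For a quick connectivity test, call `make_opener(build_proxy_url("proxy.example.com", 8080, "user", "secret")).open("https://www.example.com", timeout=10)` and check that it returns without raising an error.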

Server and detector status overview

The Servers and Detectors section of the System Overview screen is reached by System
> Servers and Detectors > Overview. This section of the screen provides a quick overview
of system status.

Table 14-5 Server and detector statuses

Icon Status Description

Starting The server is starting up.

Running The server is running normally without errors.

Running Selected Some Symantec Data Loss Prevention processes on the server are
stopped or have errors. To see the statuses of individual processes, you
must first enable Advanced Process Control on the System Settings
screen.

See “Enabling Advanced Process Control” on page 250.

Stopping The server is in the process of stopping Symantec Data Loss Prevention
services.

See “About Symantec Data Loss Prevention services” on page 101.

Stopped All Symantec Data Loss Prevention processes are stopped.

Unknown The server is experiencing one of the following errors:

■ The Enforce Server is not reachable from the server.
■ Symantec Data Loss Prevention is not installed on the server.
■ A license key has not been configured for the Enforce Server.
■ There is a problem with Symantec Data Loss Prevention account
permissions in Windows.

For each server, the following additional information appears. You can also click on any server
name to display the Server/Detector Detail screen for that server.

Table 14-6 Server and detector status additional information

Column name Description

Messages (Last 10 sec) The number of messages processed in the last 10 seconds.

Messages (Today) The number of messages processed since 12:00 AM today.

Incidents (Today) The number of incidents processed since 12:00 AM today.

For Endpoint Servers, the Messages and Incidents counts do not align, because
messages are processed on the endpoints rather than on the Endpoint Server. The
incident count still increases, however.

Incident Queue For the Enforce Server, this is the number of incidents that are in the
database, but do not yet have an assigned status. This number is updated
whenever this screen is generated.

For the other types of servers, this is the number of incidents that have
not yet been written to the Enforce Server. This number is updated
approximately every 30 seconds. If the server is shut down, this number
is the last number the server reported, and those incidents are most
likely still in the incidents folder.

Message Wait Time The amount of time it takes to process a message after it enters the
system. This data applies to the last message processed. If the server
that processed the last message is disconnected, this value is N/A.

To see details about a server or detector

◆ Click on any server name to see additional details regarding that server.
See “Server/Detector Detail screen” on page 283.
To remove a server or detector from an Enforce Server
◆ Click the red X for that server, and then confirm your decision.

Note: Removing (de-registering) a server only disconnects it from this Enforce Server; it does
not stop the detection server from operating.

See “Removing a server” on page 277.

Recent error and warning events list

The Recent Error and Warning Events section of the System > Servers and Detectors >
Overview screen shows the last five events of either error or warning severity for any of the
servers listed in the Servers and Detectors section.

Table 14-7 Recent error and warning events information

Column name Description

Type The yellow triangle indicates a warning; the red octagon indicates an error.

Time The date and time when the event occurred.

Server The name of the server on which the event occurred.

Host The IP address or name of the machine where the server resides. The server and
host names may be the same.

Code The system event code. The Message column provides the code text. Event lists
can be filtered by code number.

Message A summary of the error or warning message that is associated with this event code.

■ To display a list of all error and warning events, click Show all.
■ To display the Event Detail screen for additional information about that particular event,
click an event.
See “About the Overview screen” on page 278.
See “System events reports” on page 165.
See “Server and Detectors event detail” on page 169.

Server/Detector Detail screen

The Server/Detector Detail screen provides detailed information about a single selected
server, detector, or appliance. The Server/Detector Detail screen is also used to control and
configure a server, detector, or appliance.
To display the Server/Detector Detail screen for a particular server or detector
1 Navigate to the System > Servers and Detectors > Overview screen.
2 Click the detection server, detector, or appliance name in the Servers and Detectors list.
See “About the Overview screen” on page 278.
The Server/Detector Detail screen is divided into sections. The sections listed below cover
all server, detector, and appliance types; the system displays only the sections that apply to
the type of detection.

Table 14-8 Server Detail screen display information

Server Detail display sections Description

General The General section identifies the server, displays system status and statistics,
and provides controls for starting and stopping the server and its processes.

See “Server controls” on page 251.

Configuration The Configuration section displays the Channels, Policy Groups, Agent
Configuration, User Device, and Configuration Status for the detection server.

All Agents The All Agents section displays a summary of all agents that are assigned to
an Endpoint Server.

Click the number next to an agent status to view agent details on the System
> Agents > Overview > Summary Reports screen.
Note: The system displays this section only for an Endpoint Server.

Recent Error and Warning Events The Recent Error and Warning Events section displays the five
most recent Warning or Severe events that have occurred on this server.

Click on an event to show event details. Click show all to display all error and
warning events.

See “About system events” on page 164.

All Recent Events The All Recent Events section displays all events of all severities that have
occurred on this server during the past 24 hours.

Click on an event to show event details. Click show all to display all detection
server events.

Deployed Exact Data Profiles The Deployed Exact Data Profile section lists any Exact Data or
Document Profiles you have deployed to the detection server. The system displays the
version of the index in the profile.

See “Data Profiles” on page 375.

See “About the Overview screen” on page 278.

See “Server configuration—basic” on page 253.
See “Server controls” on page 251.
See “System events reports” on page 165.
See “Server and Detectors event detail” on page 169.

Advanced server settings

Click Server Settings on the detection server's System > Servers and Detectors > Overview
> Server/Detector Detail screen to modify the settings on that server.
Use caution when modifying these settings on a server. Contact Symantec Support before
changing any of the settings on this screen. Changes to these settings normally do not take
effect until after the server has been restarted.
You cannot change settings for the Enforce Server from the Server/Detector Detail screen.
The Server/Detector Detail - Advanced Settings screen only displays for detection servers
and detectors.

Note: If you change advanced server settings on Endpoint Servers in a load-balanced
environment, you must apply the same changes to all Endpoint Servers in the load-balanced
environment.

Table 14-9 Detection server advanced settings

Setting Default Description

BoxMonitor.Channels Varies The values are case-sensitive and comma-separated if multiple.
Although any mix of them can be configured, the following are the officially supported
configurations:

■ Network Monitor Server: Packet Capture, Copy Rule
■ Discover Server: Discover
■ Endpoint Server: Endpoint
■ Network Prevent for Email: Inline SMTP
■ Network Prevent for Web: ICAP

BoxMonitor.DetectionServerDatabase on Enables the BoxMonitor process to start the Automated
Incident Remediation Tracking database on the Detection Server. If you set this to off, you
must start the remediation tracking database manually.

BoxMonitor.DetectionServerDatabaseMemory -Xrs -Xms300M -Xmx1024M Any combination of JVM
memory flags can be used.

BoxMonitor.DiskUsageError 90 The amount of disk space filled (as a percentage) that will
trigger a severe system event. For instance, if Symantec Data Loss Prevention is installed on
the C drive and this value is 90, then the detection server creates a severe system event when
the C drive usage is above 90%.

BoxMonitor.DiskUsageWarning 80 The amount of disk space filled (as a percentage) that will
trigger a warning system event. For instance, if Symantec Data Loss Prevention is installed on
the C drive and this value is 80, then the detection server generates a warning system event
when the C drive usage is above 80%.
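The two disk-usage thresholds amount to a simple classification of current usage. The Python sketch below is illustrative only: `classify_disk_usage` and `used_percent` are hypothetical helpers, not part of Symantec Data Loss Prevention, and they assume usage is measured as a percentage of the volume where the product is installed.

```python
import shutil


def classify_disk_usage(used_percent, warning=80, error=90):
    """Hypothetical helper: map a disk-usage percentage to the system event
    described by BoxMonitor.DiskUsageWarning and BoxMonitor.DiskUsageError."""
    if used_percent > error:        # "above 90%" -> severe system event
        return "severe"
    if used_percent > warning:      # "above 80%" -> warning system event
        return "warning"
    return "none"


def used_percent(path):
    """Percentage of the volume that contains `path` currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# Example: check the volume that holds the current directory.
print(classify_disk_usage(used_percent(".")))
```

The error threshold is checked first, so in this sketch usage above 90% yields only the severe event; the guide does not say whether the product also raises the warning in that case.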

BoxMonitor.EndpointServer on Enables the Endpoint Server.

BoxMonitor.EndpointServerMemory -Xrs -Xms300M -Xmx4096M Any combination of JVM memory flags
can be used. For example: -Xrs -Xms300m -Xmx1024m.

BoxMonitor.FileReader on If off, the BoxMonitor cannot start the FileReader, although it can
still be started manually.

BoxMonitor.FileReaderMemory -Xrs -Xms1200M -Xmx4G FileReader JVM command-line arguments.

BoxMonitor.HeartbeatGapBeforeRestart 960000 The time interval in milliseconds that the
BoxMonitor waits for a monitor process (for example, FileReader, IncidentWriter) to report the
heartbeat. If the heartbeat is not received within this time interval, the BoxMonitor restarts
the process.

BoxMonitor.IncidentWriter on If off, the BoxMonitor cannot start the IncidentWriter in the
two-tier mode, although it can still be started manually. This setting has no effect in the
single-tier mode.

BoxMonitor.IncidentWriterMemory -Xrs IncidentWriter JVM command-line arguments. For example:
-Xrs

BoxMonitor.InitialRestartWaitTime 5000 The time interval in milliseconds that the BoxMonitor
waits after restarting a monitor process, such as FileReader or IncidentWriter.

BoxMonitor.MaxRestartCount 3 The number of times that a process can be restarted in one hour
before generating a SEVERE system event.
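To illustrate how BoxMonitor.MaxRestartCount bounds restarts, here is a minimal Python sketch. It assumes a sliding one-hour window and that the SEVERE event becomes due once the count in that window exceeds the setting; the guide says only "in one hour", so the product may use a different window or trigger point. `RestartTracker` is a hypothetical name.

```python
from collections import deque


class RestartTracker:
    """Hypothetical model of BoxMonitor.MaxRestartCount for one monitor process.

    Assumes a sliding one-hour window and that the SEVERE event is due once the
    number of restarts in that window exceeds the configured count.
    """

    WINDOW_SECONDS = 3600

    def __init__(self, max_restart_count=3):
        self.max_restart_count = max_restart_count
        self._restarts = deque()  # timestamps (in seconds) of recent restarts

    def record_restart(self, now):
        """Record a restart at time `now`; return True if a SEVERE event is due."""
        self._restarts.append(now)
        # Forget restarts that are an hour or more older than this one.
        while now - self._restarts[0] >= self.WINDOW_SECONDS:
            self._restarts.popleft()
        return len(self._restarts) > self.max_restart_count
```

With the default of 3, a fourth restart within the same hour would be the first to report a SEVERE event under these assumptions.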

BoxMonitor.MaxRestartCountDuringStartup 5 The maximum number of times that the monitor server
attempts to restart on its own.

BoxMonitor.PacketCapture on If off, the BoxMonitor cannot start PacketCapture, although it can
still be started manually. The PacketCapture channel must be enabled for this setting to work.

BoxMonitor.PacketCaptureDirectives -Xrs PacketCapture command-line parameters (in Java). For
example: -Xrs

BoxMonitor.ProcessLaunchTimeout 30000 The time interval (in milliseconds) allowed for a monitor
process (for example, FileReader) to start.

BoxMonitor.ProcessShutdownTimeout 45000 The time interval (in milliseconds) allotted to each
monitor process to shut down gracefully. If the process is still running after this time, the
BoxMonitor attempts to kill the process.

BoxMonitor.RequestProcessor on If off, the BoxMonitor cannot start the RequestProcessor,
although it can still be started manually. The Inline SMTP channel must be enabled for this
setting to work.

BoxMonitor.RequestProcessorMemory -Xrs -Xms300M -Xmx1300M Any combination of JVM memory
flags can be used. For example: -Xrs -Xms300M -Xmx1300M

BoxMonitor.RmiConnectionTimeout 15000 The time interval (in milliseconds) allowed to establish
a connection to the RMI object.

BoxMonitor.RmiRegistryPort 37329 The TCP port on which the BoxMonitor starts the RMI registry.

BoxMonitor.StatisticsUpdatePeriod 10000 The monitor statistics are updated after this time
interval (in milliseconds).

Classification.WebserviceLogRetentionDats 7 Specifies the number of days classification web
service logs are retained.

ContentExtraction.DefaultCharsetForSubFileName N/A Defines the default character set that is
used in decoding the sub-filename if the charset conversion fails.

ContentExtraction.EnableMetaData off Allows detection on file metadata.

If the setting is turned on, you can detect metadata for Microsoft Office and PDF files. For
Microsoft Office files, OLE metadata is supported, which includes the fields Title, Subject,
Author, and Keywords. For PDF files, only Document Information Dictionary metadata is
supported, which includes fields such as Author, Title, Subject, Creation, and Update dates.
Extensible Metadata Platform (XMP) content is not detected. Note that enabling this metadata
detection option can cause false positives.

ContentExtraction.ImageExtractorEnabled 1 Allows you to adjust or turn off content extraction
for Form Recognition.

The default setting, 1, loads the Image Extractor plug-in on demand. If one or more Form
Recognition rules are used, the Dynamic Image Extractor plug-in automatically loads on the
detection server when corresponding policy updates are received. When Form Recognition rules
are deleted or disabled, the plug-in automatically unloads. This option prevents the Dynamic
Image Extractor plug-in from running if Form Recognition is not being used.

Enter 0 to disable the Image Extractor plug-in. This setting prevents Form Recognition from
extracting images, effectively disabling the feature.

Enter 2 if you want the Image Extractor plug-in to load when the content extraction service
launches after the detection server starts up. The plug-in continues to run regardless of
whether Form Recognition policies have been configured.
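The three ContentExtraction.ImageExtractorEnabled values can be summarized as a small decision function. This Python sketch is a hypothetical model of the behavior described above, not product code; the function name is an assumption.

```python
def image_extractor_loaded(setting, form_recognition_rules_active):
    """Hypothetical model of ContentExtraction.ImageExtractorEnabled.

    0 -> never load (Form Recognition cannot extract images)
    1 -> load on demand, only while Form Recognition rules are in use
    2 -> load when content extraction starts and keep running
    """
    if setting == 0:
        return False
    if setting == 1:
        return form_recognition_rules_active
    if setting == 2:
        return True
    raise ValueError(f"unsupported ImageExtractorEnabled value: {setting!r}")
```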

ContentExtraction.LongContentSize 1M If the message component exceeds this size (in bytes),
then the ContentExtraction.LongTimeout is used instead of ContentExtraction.ShortTimeout.
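The choice between the short and long extraction timeouts depends only on component size. The following Python sketch assumes that size suffixes such as 1M and 3K use binary (1,024-based) multipliers, which the guide does not specify, and uses 60,000 ms as the LongTimeout value (one of the two documented defaults for ContentExtraction.LongTimeout). `parse_size` and `extraction_timeout_ms` are hypothetical helpers.

```python
import re

# Assumed binary multipliers for size suffixes; not confirmed by the guide.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a size such as "1M" or "3K" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value.upper())
    if not match:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def extraction_timeout_ms(component_size, long_content_size="1M",
                          short_timeout_ms=30_000, long_timeout_ms=60_000):
    """Pick ShortTimeout or LongTimeout for a component of `component_size` bytes."""
    if component_size > parse_size(long_content_size):
        return long_timeout_ms
    return short_timeout_ms
```

A component exactly equal to LongContentSize keeps the short timeout here, because the long timeout applies only when the component exceeds that size.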

ContentExtraction.LongTimeout Varies The default value for this setting varies depending on
detection server type (60,000 or 120,000).

The time interval (in milliseconds) given to the ContentExtractor to process a document larger
than ContentExtraction.LongContentSize. If the document cannot be processed within the
specified time, it is reported as unprocessed. This value should be greater than
ContentExtraction.ShortTimeout and less than ContentExtraction.RunawayTimeout.

ContentExtraction.MarkupAsText off Bypasses Content Extraction for files that are determined
to be XML or HTML. Use this setting in cases such as web pages that contain data in the header
block or in script blocks. Default is off.

ContentExtraction.MaxContentSize 30M The maximum size (in MB) of the document that can be
processed by the ContentExtractor.

ContentExtraction.MaxNumImagesToExtract 10 The maximum number of images to extract from PDF
files and multi-page TIFF documents.

ContentExtraction.RunawayTimeout 300,000 The time interval (in milliseconds) given to the
ContentExtractor to finish processing any document. If the ContentExtractor does not finish
processing a document within this time, it is considered unstable and is restarted. This value
should be significantly greater than ContentExtraction.LongTimeout.
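The ordering constraints stated for the three ContentExtraction timeouts can be checked mechanically before a restart. This Python sketch is a hypothetical pre-flight check, not a product feature; it tests only strict ordering, because the guide does not quantify "significantly greater".

```python
def check_extraction_timeouts(short_ms, long_ms, runaway_ms):
    """Return a list of problems with the ordering
    ShortTimeout < LongTimeout < RunawayTimeout (an empty list means consistent)."""
    problems = []
    if not short_ms < long_ms:
        problems.append("ShortTimeout should be less than LongTimeout")
    if not long_ms < runaway_ms:
        problems.append("LongTimeout should be less than RunawayTimeout")
    return problems


# Documented defaults: ShortTimeout 30,000; LongTimeout 60,000 or 120,000; RunawayTimeout 300,000.
print(check_extraction_timeouts(30_000, 120_000, 300_000))
```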

ContentExtraction.ShortTimeout 30,000 The time interval (in milliseconds) given to the
ContentExtractor to process a document smaller than ContentExtraction.LongContentSize. If the
document cannot be processed within the specified time, it is reported as unprocessed. This
value should be less than ContentExtraction.LongTimeout.

ContentExtraction.TemporaryDirectory N/A Specifies the directory for temporary content
extraction files.

ContentExtraction.TrackedChanges off Allows detection of content that has changed over time
(Track Changes content) in Microsoft Office documents.

Note: Enabling this option might reduce the accuracy rate for IDM and data identifiers. The
default is off (disallow).

To index content that has changed over time, set ContentExtraction.TrackedChanges=on in the
Indexer.properties file. The default and recommended setting is off.

DDM.MaxBinMatchSize 30,000,000 The maximum size (in bytes) used to generate the MD5 hash for
an exact binary match in an IDM. This setting should not be changed. The following conditions
must be met for IDM to work correctly:

■ This setting must be exactly identical to the max_bin_match_size setting on the Enforce
Server in the indexer.properties file.
■ This setting must be smaller than or equal to the FileReader.MaxFileSize value.
■ This setting must be smaller than or equal to the ContentExtraction.MaxContentSize value on
the Enforce Server in the indexer.properties file.

Note: Changing the first or third item in the list requires re-indexing all IDM files.
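The three IDM conditions for DDM.MaxBinMatchSize can be expressed as a consistency check. In this hypothetical Python sketch every value is given in bytes, so size strings such as 30M must be converted first; how the product interprets those suffixes is an assumption left to the caller.

```python
def check_max_bin_match_size(ddm_max_bin_match_size,
                             indexer_max_bin_match_size,
                             filereader_max_file_size,
                             indexer_max_content_size):
    """Check the IDM conditions listed for DDM.MaxBinMatchSize.
    All arguments are byte counts; an empty list means the values are consistent."""
    problems = []
    if ddm_max_bin_match_size != indexer_max_bin_match_size:
        problems.append("must equal max_bin_match_size in indexer.properties")
    if ddm_max_bin_match_size > filereader_max_file_size:
        problems.append("must be <= FileReader.MaxFileSize")
    if ddm_max_bin_match_size > indexer_max_content_size:
        problems.append("must be <= ContentExtraction.MaxContentSize in indexer.properties")
    return problems


MB = 1024 * 1024  # assumed meaning of the "M" suffix
print(check_max_bin_match_size(30_000_000, 30_000_000, 30 * MB, 30 * MB))
```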

Detection.EncodingGuessingDefaultEncoding ISO-8859-1 Specifies the backup encoding assumed for
a byte stream.

Detection.EncodingGuessingEnabled on Designates whether the encoding of unknown byte streams
should be guessed.

Detection.EncodingGuessingMinimumConfidence 50 Specifies the confidence level required for
guessing the encoding of unknown byte streams.
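The three Detection.EncodingGuessing settings combine into a fallback rule. This Python sketch assumes that the confidence value is on a 0 to 100 scale and that a guess equal to the minimum confidence is accepted; the encoding detector itself is out of scope, and `choose_encoding` is a hypothetical name.

```python
def choose_encoding(guess, confidence, enabled=True,
                    minimum_confidence=50, default_encoding="ISO-8859-1"):
    """Hypothetical model of the Detection.EncodingGuessing* settings.

    `guess` and `confidence` (assumed 0-100) come from some encoding detector.
    The guess is used only when guessing is enabled and confident enough;
    otherwise the backup encoding is assumed.
    """
    if enabled and guess is not None and confidence >= minimum_confidence:
        return guess
    return default_encoding


raw = b"caf\xe9"
print(raw.decode(choose_encoding(None, 0)))  # falls back to ISO-8859-1
```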

Detection.MessageTimeoutReportIntervalInSeconds 3600 Number of seconds between each System
Event published to display the number of messages that have timed out recently. These System
Events are scheduled to be published at a fixed rate, but are skipped if no messages have
timed out in that period.

DI.MaxViolations 100 Specifies the maximum number of violations allowed with data identifiers.

Discover.CountAllFilteredItems false Provides more accurate scan statistics by counting the
items in folders skipped because of filtering.

Setting the value to false enables optimized Discover path filters, which improve performance
but may occasionally lead to unexpected filter behavior. Optimized filters normalize slashes,
truncate filter strings before wildcard characters, and remove trailing slashes. Therefore,
the filter string /Fol*der will match /Folder, but it will also match /FolXYZ.

Set this value to true to disable optimized Discover path filters.
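The /Fol*der example follows from the three normalization steps. The Python sketch below approximates optimized matching as a prefix test on the normalized filter and compares it with ordinary wildcard matching from the standard fnmatch module. Treating backslashes as the slashes to normalize, treating ? as a wildcard, and matching case-sensitively are all assumptions; `optimize_filter` and `optimized_match` are hypothetical names.

```python
from fnmatch import fnmatchcase


def optimize_filter(pattern):
    """Approximate optimized filter normalization: convert backslashes to
    forward slashes, cut the string before the first wildcard, and remove
    trailing slashes."""
    normalized = pattern.replace("\\", "/")
    for i, ch in enumerate(normalized):
        if ch in "*?":
            normalized = normalized[:i]
            break
    return normalized.rstrip("/")


def optimized_match(pattern, path):
    """Treat the optimized filter as a plain prefix of the normalized path."""
    return path.replace("\\", "/").startswith(optimize_filter(pattern))


for path in ("/Folder", "/FolXYZ"):
    # Optimized matching accepts both paths; full wildcard matching accepts only /Folder.
    print(path, optimized_match("/Fol*der", path), fnmatchcase(path, "/Fol*der"))
```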

Discover.Exchange.FollowRedirects true Specifies whether to follow redirects. Symantec Data
Loss Prevention follows redirects only from the public root folder.

Discover.Exchange.ScanHiddenItems false Scans hidden items in Exchange repositories when set
to true.

Discover.Exchange.UseSecureHttpConnections true Specifies whether connections to Exchange
repositories and Active Directory are secure when using the Exchange Web Services crawler.

Discover.IgnorePstMessageClasses IPM.Appointment, IPM.Contact, IPM.Task, REPORT.IPM.Note.DR,
REPORT.IPM.Note.IPNRN This setting specifies a comma-separated list of .pst message classes.
All items in a .pst file that have a message class in the list will be ignored (no attempt
will be made to extract the .pst item). This setting is case-sensitive.

Discover.IncludePstMessageClasses IPM.Note This setting specifies a comma-separated list of
.pst message classes. All items in a .pst file that have a message class in the list will be
included.

When both the include setting and the ignore setting are defined,
Discover.IncludePstMessageClasses takes precedence.

Discover.PollInterval 10000 Specifies the time interval (in milliseconds) at which the Enforce
Server retrieves data from the Discover monitor while scanning.

Discover.Sharepoint.FetchACL true Controls ACL fetching for integrated SharePoint scans. Set
this value to false to turn off ACL fetching. The default value is true (on).

Discover.Sharepoint.SocketTimeout 60000 Sets the timeout value (in milliseconds) of the socket
connection between the Network Discover server and the SharePoint target.

Discover.ValidateSSLCertificates false Set to true to enable validation of the SSL
certificates for the HTTPS connections for SharePoint and Exchange targets. When validation is
enabled, scanning SharePoint or Exchange servers that use self-signed or untrusted certificates
fails. If the SharePoint web application or Exchange server is signed by a certificate issued
by a certificate authority (CA), then the server certificate or the server CA certificate must
reside in the Java trusted keystore used by the Discover Server. If the certificate is not in
the keystore, you must import it manually using the keytool utility.

See “Importing SSL certificates to Enforce or Discover servers” on page 277.

EDM.HighlightAllMatchesInProximity false If false (default), the system highlights the
minimum number of matches, starting from the leftmost. For example, if the EDM policy is
configured to match 3 out of 8 column fields in the index, only the first 3 matches are
highlighted in the incident snapshot.

If true, the system highlights all matches occurring in the proximity window, including
duplicates. For example, if the policy is configured to match 3 of 8 and there are 7 matches
occurring within the proximity window, the system highlights all 7 matches in the incident
snapshot.
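The two highlighting modes can be modeled as follows. In this hypothetical Python sketch each match is a (token position, column name) pair inside one proximity window. In the default mode it keeps the leftmost match for each new column until the required number of fields is reached, which is our reading of "the minimum number of matches"; the other mode keeps every match, duplicates included.

```python
def matches_to_highlight(matches, required_fields, highlight_all=False):
    """Hypothetical model of EDM.HighlightAllMatchesInProximity.

    `matches` is a list of (token_position, column_name) pairs found inside
    one proximity window; the same column may match more than once.
    """
    ordered = sorted(matches)          # leftmost (lowest position) first
    if highlight_all:
        return ordered                 # true: every match, duplicates included
    chosen, seen = [], set()
    for position, column in ordered:   # false: minimum set, starting from the leftmost
        if column not in seen:
            seen.add(column)
            chosen.append((position, column))
            if len(chosen) == required_fields:
                break
    return chosen
```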

EDM.MatchCountVariant 3 Specifies how matches are counted.

■ 1 - Counts the total number of token sets matched.
■ 2 - Counts the number of unique token sets matched.
■ 3 - Counts the number of unique super sets of token sets. (default)

See “Configuring Advanced Settings for EDM policies” on page 557.

EDM.MaximumNumberOfMatchesToReturn 100 Defines an upper limit on the number of matches
returned from each RAM index search.

See “Configuring Advanced Settings for EDM policies” on page 557.

EDM.RunProximityLogic true If true, runs the token proximity check.

See “Configuring Advanced Settings for EDM policies” on page 557.

EDM.SimpleTextProximityRadius 35 Number of tokens that are evaluated together when the
proximity check is enabled.

See “Configuring Advanced Settings for EDM policies” on page 557.

EDM.TokenVerifierEnabled false If enabled (true), the server validates tokens for Chinese,
Japanese, and Korean (CJK) keywords.

Default is disabled (false).


EndpointCommunications.AllConnInboundDataThrottleInKBPS 0 If enabled, limits the transfer
rate of all inbound traffic in kilobits per second.

Default is disabled.

Changes to this setting apply to all new connections. Changes do not affect existing
connections.

EndpointCommunications.AllConnOutboundDataThrottleInKBPS 0 If enabled, limits the transfer
rate of all outbound traffic in kilobits per second.

Default is disabled.

Changes to this setting apply to all new connections. Changes do not affect existing
connections.

EndpointCommunications.ApplicationHandshakeTimeoutInSeconds 60 Maximum time for the server to
wait for each round trip during application handshake communications before closing the
server-to-agent connection.

Applies to the duration of time between when the agent accepts the TCP connection and when the
agent receives the handshake message. This duration includes the SSL handshake and the agent
receiving the HTTP headers. If the process exceeds the specified duration, the connection
closes.

Changes to this setting apply to all new connections. Changes do not affect existing
connections.

EndpointCommunications.MaxActiveAgentsPerServer 90000 Sets the maximum number of agents
associated with a given server at any moment in time.

This setting takes effect after the next Endpoint Server restart.

EndpointCommunications.MaxActiveAgentsPerServerGroup 150000 Sets the maximum number of agents
that will be associated with a given group of servers behind the same local load balancer at
any moment in time. Used for maximum sizes of caches for internal endpoint features.

This setting takes effect after the next Endpoint Server restart.

EndpointCommunications.MaxConcurrentConnections 90000 Sets the maximum number of simultaneous
connections to allow.

Changes to this setting apply to all new connections. Changes do not affect existing
connections.

EndpointCommunications.MaxConnectionLifetimeInSeconds 86400 (1 day) Sets the maximum time to
allow a connection to remain open. Do not set connections to remain open indefinitely. Closing
connections ensures that SSL session keys are frequently updated, which improves security. This
timeout only applies during the normal operation phase of a connection, after the SSL
handshake and application handshake phases of a connection.

This setting takes effect immediately for all connections.

EndpointCommunications.ShutdownTimeoutInMillis 5000 (5 seconds) Sets the maximum time to wait
to gracefully close connections during shutdown before forcing connections to close.

This setting takes effect immediately for all connections.

EndpointCommunications.SSLCipherSuites TLS_RSA_WITH_AES_128_CBC_SHA Lists the allowed SSL
cipher suites. Enter multiple entries, separated by commas.

Changes to this setting apply to all new connections. Changes do not affect existing
connections. You must restart the Endpoint Server for your changes to take effect.
See “Server controls” on page 251.

If you are using FIPS 140-2 mode for communication between the Endpoint Server and DLP Agents,
do not use Diffie-Hellman (DH) cipher suites. Mixing cipher suites prevents the agent and
Endpoint Server from communicating.

EndpointCommunications.SSLSessionCacheTimeoutInSeconds 86400 Sets the maximum SSL session
entry lifetime in the SSL session cache.

The default setting equals one day.

This setting takes effect after the next Endpoint Server restart.

EndpointMessageStatistics.MaxFileDetectionCount 100 The maximum number of times a valid file
is scanned without causing an incident. After this number is exceeded, a system event is
generated recommending that the file be filtered out.

EndpointMessageStatistics.MaxFolderDetectionCount 1800 The maximum number of times a valid
folder is scanned without causing an incident. After this number is exceeded, a system event
is generated recommending that the folder be filtered out.

EndpointMessageStatistics.MaxMessageCount 2000 The maximum number of times a valid message is
scanned without causing an incident. After this number is exceeded, a system event is
generated recommending that the file be filtered out.

EndpointMessageStatistics.MaxSetSize 3 The maximum number of hosts displayed from which valid
files, folders, and messages come. When a system event for
EndpointMessageStatistics.MaxFileDetectionCount,
EndpointMessageStatistics.MaxFolderDetectionCount, or
EndpointMessageStatistics.MaxMessageCount is generated, Symantec Data Loss Prevention lists the
host machines where these system events were generated. This setting limits the number of
hosts displayed in the list.

EndpointServer.Discover.ScanStatusBatchInterval 60000 The interval of time in milliseconds
that the Endpoint Server accumulates Endpoint Discover scan statuses before sending them to the
Enforce Server as a batch.

EndpointServer.Discover.ScanStatusBatchSize 1000 The number of scan statuses the Aggregator
accumulates before sending them to the Enforce Server as a batch. The Endpoint Server forwards
a batch of statuses to the Enforce Server when the status count reaches the configured value.

The batch is forwarded to the Enforce Server when any of the thresholds for the following
settings are met:

■ EndpointServer.Discover.ScanStatusBatchInterval
■ EndpointServer.Discover.ScanStatusBatchSize
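Because a batch is forwarded when either threshold is met, the two ScanStatusBatch settings behave like a size-or-age flush. The Python sketch below is a hypothetical model, not the Endpoint Server's code; it assumes the interval is measured from the first status queued in the current batch, and `send` stands in for whatever forwards the batch to the Enforce Server.

```python
import time


class ScanStatusBatcher:
    """Hypothetical size-or-age batcher modeled on
    EndpointServer.Discover.ScanStatusBatchSize and ScanStatusBatchInterval."""

    def __init__(self, send, batch_size=1000, batch_interval_ms=60000,
                 clock=time.monotonic):
        self.send = send                    # forwards one batch (a list) upstream
        self.batch_size = batch_size
        self.batch_interval_s = batch_interval_ms / 1000.0
        self.clock = clock                  # injectable for testing
        self.pending = []
        self.first_added = None             # when the current batch started

    def add(self, status):
        """Queue one scan status and flush if a threshold is now met."""
        if not self.pending:
            self.first_added = self.clock()
        self.pending.append(status)
        self.maybe_flush()

    def maybe_flush(self):
        """Forward the pending batch if it is full or old enough.
        Call this periodically so that a partial batch is not held forever."""
        if not self.pending:
            return
        full = len(self.pending) >= self.batch_size
        stale = self.clock() - self.first_added >= self.batch_interval_s
        if full or stale:
            self.send(list(self.pending))
            self.pending.clear()
```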

EndpointServer.EndpointSystemEventQueueSize 20000 The maximum number of system events that can
be stored in the endpoint agent's queue to be sent to the Endpoint Server. If the database
connection is lost or some other occurrence results in a massive number of system events, any
additional system events that occur after this number is reached are discarded. This value can
be adjusted according to memory requirements.
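The discard behavior of EndpointServer.EndpointSystemEventQueueSize corresponds to a bounded queue that rejects new items when full. This hypothetical Python sketch keeps the oldest queued events and drops newcomers, as the description states. Python's `deque(maxlen=...)` would do the opposite and silently evict the oldest entries, so it is deliberately not used here.

```python
from collections import deque


class SystemEventQueue:
    """Hypothetical model of EndpointServer.EndpointSystemEventQueueSize:
    once the queue holds `max_size` events, newly arriving events are discarded."""

    def __init__(self, max_size=20000):
        self.max_size = max_size
        self.events = deque()
        self.discarded = 0

    def offer(self, event):
        """Queue an event; return False if it was discarded."""
        if len(self.events) >= self.max_size:
            self.discarded += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        """Remove and return all queued events, for example to send them on."""
        items = list(self.events)
        self.events.clear()
        return items
```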

EndpointServer.MaxPercentageMemToStoreEndpointFiles 60 The maximum amount (in percentage) of
memory to use to store shadow cache files.

EndpointServer.MaxTimeToKeepEndpointFilesOpen 20000 The maximum time (in minutes) that an
endpoint file is kept open. The file stays open until this interval elapses or until its size
exceeds the EndpointServer.MaxEndpointFileSize setting, whichever occurs first.

EndpointServer.MaxTimeToWaitForWriter 1000 The maximum time (in milliseconds) that the agent
will wait to connect to the server.

EndpointServer.NoOfRecievers 15 The number of endpoint shadow cache file receivers.

EndpointServer.NoOfWriters 10 The number of endpoint shadow cache file writers.

FileReader.MaxFileSize 30M The maximum size (in MB) of a message to be processed. Larger
messages are truncated to this size. To process large files, ensure that this value is equal
to or greater than the value of ContentExtraction.MaxContentSize.

FileReader.MaxFileSystemCrawlerMemory 1024M The maximum memory that is allocated for the File
System Crawler. If this value is less than FileReader.MaxFileSize, then the greater of the two
values is assigned.

FileReader.MaxReadGap 15 The time that a child process can have data but not have read
anything before it stops sending heartbeats.

FileReader.ScheduledInterval 1000 The time interval (in milliseconds) between drop folder
checks by the FileReader. This affects Copy Rule, Packet Capture, and File System channels
only.

FileReader.TempDirectory Path to a secure directory as specified in the filereader.temp.io.dir
attribute in the FileReader.properties configuration file. A secure directory on the detection
server in which to store temporary files for the file reader.

FormRecognition.ALIGNMENT_COEFFICIENT 85.00 A threshold on a scale from 0 to 100, indicating
how well an image should align with an indexed gallery form in order to create an incident.

FormRecognition.CANONICAL_FORM_WIDTH 930 The width in pixels to which all images are internally resized for form recognition.

Icap.AllowHosts any The default value of "any" permits all systems to connect to the Network Prevent for Web Server on the ICAP service port. Replacing "any" with the IP address or Fully-Qualified Domain Name (FQDN) of one or more systems restricts ICAP connections to just those designated systems. To designate multiple systems, separate their IP addresses or FQDNs with commas.

Icap.AllowStreaming false If true, ICAP output is streamed directly to the proxy without first buffering the ICAP request.

Icap.BufferSize 3K The size (in kilobytes) of the memory buffer used for ICAP request streaming and chunking. Streaming can happen only if the request is larger than FileReader.MaxFileSize and the request has a Content-Length header.

Icap.DisableHealthCheck false If true, disables the ICAP periodic self-check. If false, enables the ICAP periodic self-check. Disabling the self-check is useful during debugging because it removes the clutter that self-check requests produce in the logs.

Icap.EnableIncidentSuppression true Enables the Incident Suppression cache for Gmail Tablet ICAP traffic.

Icap.EnableTrace false If set to true, protocol debug tracing is enabled once a folder is specified using the Icap.TraceFolder setting.

Icap.ExchangeActiveSyncCommandsToInspect SendMail A comma-separated, case-sensitive list of ActiveSync commands that should be sent through Symantec Data Loss Prevention detection. If this parameter is left blank, ActiveSync support is disabled. If this parameter is set to "any", all ActiveSync commands are inspected.

Icap.IncidentSuppressionCacheCleanupInterval 120000 The time interval (in milliseconds) at which the Incident Suppression cache clean-up thread runs.

Icap.IncidentSuppressionCacheTimeout 120000 The time (in milliseconds) after which an Incident Suppression cache entry is invalidated.

Icap.LoadBalanceFactor 1 The number of web proxy servers that a Network Prevent for Web Server is able to communicate with. For example, if the server is configured to communicate with 3 proxies, set the Icap.LoadBalanceFactor value to 3.

Icap.SpoolFolder N/A This value is required for ICAP spools.

Icap.TraceFolder N/A The fully qualified name of the folder or directory where protocol debug trace data is stored when the Icap.EnableTrace setting is true. By default, this setting is left blank.

ImagePreclassifier.ENABLE_FORM_RECOGNITION_PRECLASSIFIER true Determines what types of images are processed for form recognition. If true, Symantec Data Loss Prevention filters out colorful photographs, images such as logos and email signatures, and other images that are not characteristic of forms. If false, Symantec Data Loss Prevention processes all images.

ImagePreclassifier.ENABLE_OCR_PRECLASSIFIER true Determines what types of images are processed for optical character recognition (OCR). If true, Symantec Data Loss Prevention filters out colorful photographs, images such as logos and email signatures, and other images that do not include meaningful text. If false, Symantec Data Loss Prevention processes all images.

ImageRecognition.NUM_WORKER_THREADS 2 The number of threads in the pool used by the image recognition detection process. The value for this setting should equal half the number of physical cores on your system.

IncidentDetection.IncidentLimitResetTime 86400000 Specifies the time frame (in milliseconds) used by the IncidentDetection.MaxIncidentsPerPolicy setting. The default setting of 86400000 equals one day.

IncidentDetection.MaxContentLength 2000000 Applies only to regular expression rules. On a per-component basis, only the first MaxContentLength characters are scanned for violations. The default (2,000,000) is equivalent to more than 1,000 pages of typical text. The limit exists to prevent regular expression rules from taking too long.

IncidentDetection.MaxIncidentsPerPolicy 10000 Defines the maximum number of incidents detected by a specific policy on a particular monitor within the time frame specified in the IncidentDetection.IncidentLimitResetTime setting. The default is 10,000 incidents per policy per time limit.

IncidentDetection.MessageWaitSevere 240 The number of minutes to wait before sending a severe system event about message wait times.

IncidentDetection.MessageWaitWarning 60 The number of minutes to wait before sending a warning system event about message wait times.

IncidentDetection.MinNormalizedSize 30 This setting applies to IDM detection. It MUST be kept in sync with the corresponding setting in the Indexer.properties file on the Enforce Server (which applies to indexing). Derivative detections apply to a message only when its normalized content is greater than this setting. If the normalized content size is less than this setting, IDM detection does a straight binary match.

IncidentDetection.patternConditionMaxViolations 100 The maximum number of matches a detection server reports. The detection server does not report more matches than the value of the IncidentDetection.patternConditionMaxViolations parameter, even if more matches exist.

IncidentDetection.StopCachingWhenMemoryLowerThan 400M Instructs detection to stop caching tokenized and cryptographic content between rule executions if the available JVM memory drops below this value (in megabytes). Setting this attribute to 0 enables caching regardless of the available memory and is not recommended because OutOfMemoryErrors may occur.

Setting this attribute to a value close to, or larger than, the value of the -Xmx option in BoxMonitor.FileReaderMemory effectively disables the caching.

Note that setting this value too low can have severe performance consequences.

IncidentDetection.TrialMode false A prevention trial mode setting for generating prevention incidents without a prevention setup.

If true, SMTP incidents from the Copy Rule and Packet Capture channels, and HTTP incidents from the Packet Capture channel, appear as if they were prevented.

IncidentWriter.BacklogInfo 1000 The number of incidents that collect in the log before an information-level message about the number of messages is generated.

IncidentWriter.BacklogSevere 10000 The number of incidents that collect in the log before a severe-level message about the number of messages is generated.

IncidentWriter.BacklogWarning 3000 The number of incidents that collect in the log before a warning-level message about the number of messages is generated.

IncidentWriter.ResolveIncidentDNSNames false If true, only recipient host names are resolved from IP addresses.

IncidentWriter.ShouldEncryptContent true If true, the monitor encrypts the body of every message, message component, and cracked component before writing it to disk or sending it to the Enforce Server.

Keyword.TokenVerifierEnabled false Default is disabled (false).

If enabled (true), the server validates tokens for Asian language keywords (Chinese, Japanese, and Korean).

See “Enabling and using CJK token verification for server keyword matching” on page 847.

L7.cleanHttpBody true If true, HTML entity references are replaced with spaces.

L7.DefaultBATV Standard This setting determines the tagging scheme that Network Prevent for Email uses to interpret Bounce Address Tag Validation (BATV) tags in the MAIL FROM header of a message. If this setting is “Standard” (the default), Network Prevent uses the tagging scheme described in the BATV specification, which also provides more information about BATV:

http://tools.ietf.org/html/draft-levine-mass-batv-02

Change this setting to “Ironport” to enable compatibility with the IronPort proxy’s implementation of BATV tagging.

L7.DefaultUrlEncodedCharset UTF-8 Defines the default character set used to decode query parameters or a URL-encoded body when the character set information is missing from the header.

L7.discardDuplicateMessages true If true, the Monitor ignores duplicate messages based on the messageID.

L7.httpClientIdHeader X-Forwarded-For The sender identifier header name.

L7.MAX_NUM_HTTP_HEADERS 30 Any HTTP message that contains more than the specified number of header lines is discarded.

L7.maxWordLength 30 The maximum word length (in characters) allowed in UTCP string extraction.

L7.messageIDCacheCleanupInterval 600000 The length of time that the messageID is cached. The system does not cache duplicate messages during this time period if the L7.discardDuplicateMessages setting is set to true.

L7.minSizeOfGetUrl 100 The minimum size of the GET URL to process. Symantec Data Loss Prevention does not inspect HTTP GET actions for policy violations if the number of bytes in the URL is less than the value of this setting. For example, with the default value of 100, no detection check is performed when a browser displays the Symantec web site at http://www.symantec.com/index.jsp, because that URL contains only 33 characters, which is less than the 100 minimum.

Note: Other request types such as POST or PUT are not affected by L7.minSizeOfGetUrl. For Symantec Data Loss Prevention to inspect any GET actions at all, the L7.processGets setting must be set to true.

L7.processGets true If true, GET requests are processed. If false, GET requests are not processed. Note that this setting interacts with the L7.minSizeOfGetUrl setting.

Lexer.IncludePunctuationInWords true If true, punctuation characters internal to a token are considered during detection.

See “Configuring Advanced Settings for EDM policies” on page 557.

Lexer.MaximumNumberOfTokens 30000 The maximum number of tokens extracted from each message component for detection. Applies to all detection technologies where tokenization is required (EDM, profiled DGM, and the system patterns supported by those technologies). Increasing the default value may cause the detection server to run out of memory and restart.

See “Configuring Advanced Settings for EDM policies” on page 557.

Lexer.Validate true If true, performs system pattern-specific validation.

See “Configuring Advanced Settings for EDM policies” on page 557.

MessageChain.ArchiveTimedOutStreams false Specifies whether messages should be archived to the temp folder.

MessageChain.CacheSize 8 Limits the number of messages that can be queued in the message chains.

MessageChain.ContentDumpEnabled false If set to true, each message entering the detection message chain is logged to ${SymantecDLP.temp.dir}/dump. This setting is intended for use in troubleshooting and debugging.

MessageChain.MaximumComponentTime 60,000 The time interval (in milliseconds) allowed before any chain component is restarted.

MessageChain.MaximumFailureTime 360000 The number of milliseconds that must elapse before the file reader is restarted. This time is tracked after a message chain error is detected and that message chain has not recovered.

MessageChain.MaximumMessageTime Varies This setting is either 600,000 or 1,800,000, depending on the detection server type.

The maximum time interval (in milliseconds) that a message can remain in a message chain.

MessageChain.MemoryThrottlerReservedBytes 200,000,000 The number of bytes that must be available before a message is sent through the message chain. This setting can avoid out-of-memory issues. The default value is 200 MB. The throttler can be disabled by setting this value to 0.

MessageChain.MinimumFailureTime 30000 The number of milliseconds that must elapse before failure of a message chain is tracked. Failure eventually leads to restarting the message chain or file reader.

MessageChain.NumChains Varies This number varies depending on the detection server type. It is either 4 or 8.

The number of messages that the file reader processes in parallel. Setting this number higher than 8 (with the other default settings) is not recommended: a higher setting does not substantially increase performance, and there is a much greater risk of running out of memory. Setting this number lower than 8 (in some cases 1) helps when processing big files, but it may slow down the system considerably.

MessageChain.StopProcessingWhenMemoryLowerThan 200M Instructs detection to stop drilling down into and processing sub-files if available JVM memory drops below this value. Setting this attribute to 0 forces sub-file processing, regardless of how little memory is available. Setting this attribute to a value close to or larger than the value of the -Xmx option in BoxMonitor.FileReaderMemory effectively disables sub-file processing.

OCR.ENABLE_AUTO_LANGUAGE_DETECTION true When true, this setting enables the OCR engine to extract text more quickly by automatically identifying the language or languages in an image, rather than processing every language in the OCR configuration. When false, the OCR engine extracts the text using every language in the OCR configuration, which makes text extraction slower but improves accuracy.

OCR.ENABLE_SPELL_CHECK true When true, this setting enables the OCR engine to extract text more accurately by using internal spelling dictionaries. When false, the accuracy of extracted text may be reduced.

OCR.RECORD_REQUEST_STATISTICS false When true, this setting enables the OCR sizing tool. The OCR sizing tool gives you insight into your image traffic data, which helps you determine the sizing requirements for your OCR implementation.

PacketCapture.DISCARD_HTTP_GET true If true, discards HTTP GET streams.

PacketCapture.DOES_DISCARD_TRIGGER_STREAM_DUMP false If true, a list of TCP streams is dumped to an output file in the log directory the first time a discard message is received.

PacketCapture.ENDACE_BIN_PATH N/A To enable packet capture using an Endace card, enter the path to the Endace /bin directory. Note that environment variables (such as %ENDACE_HOME%) cannot be used in this setting. For example: /usr/local/bin

PacketCapture.ENDACE_LIB_PATH N/A To enable packet capture using an Endace card, enter the path to the Endace /lib directory. Note that environment variables (such as %ENDACE_HOME%) cannot be used in this setting. For example: /usr/local/lib

PacketCapture.ENDACE_XILINX_PATH N/A To enable packet capture using an Endace card, enter the path to the Endace /xilinx directory. Note that environment variables (such as %ENDACE_HOME%) cannot be used in this setting. For example: /usr/local/dag/xilinx

PacketCapture.Filter tcp || ip proto 47 || (vlan && (tcp || ip proto 47)) When set to the default value, only TCP and GRE (IP protocol 47) packets, whether or not they are VLAN-tagged, are sent to Network Monitor; all other packets are filtered out. The default value can be overridden with a filter expression in the format documented for the tcpdump program. This setting allows specialists to create more exact filters (source and destination IPs for given ports).

PacketCapture.INPUT_SOURCE_FILE /dummy.dmp The full path and name of the input file.

PacketCapture.IS_ARCHIVING_PACKETS false DO NOT USE THIS FIELD.

Diagnostic setting that creates dumps of packets captured in packetcapture for later reuse. This feature is unsupported and does not have normal error checking. It may cause repeated restarts of pcap.

PacketCapture.IS_ENDACE_ENABLED false To enable packet capture using an Endace card, set this value to true.

PacketCapture.IS_FTP_RETR_ENABLED false If true, FTP GETS and FTP PUTS are processed. If false, only FTP PUTS are processed.

PacketCapture.IS_INPUT_SOURCE_FILE false If true, continually reads in packets from a tcpdump-formatted file indicated in INPUT_SOURCE_FILE. Set to dag when an Endace card is installed.

PacketCapture.IS_NAPATECH_ENABLED false To enable packet capture using a Napatech card, set this value to true. The default setting is false.

PacketCapture.KERNEL_BUFFER_SIZE_I686 64M For 32-bit Linux platforms, this setting specifies the amount of memory allocated to buffer network packets. Specify K for kilobytes or M for megabytes. Do not specify a value larger than 128M.

PacketCapture.KERNEL_BUFFER_SIZE_Win32 16M For 32-bit Windows platforms, this setting specifies the amount of memory allocated to buffer network packets. Specify K for kilobytes or M for megabytes.

PacketCapture.KERNEL_BUFFER_SIZE_X64 64M For 64-bit Windows platforms, this setting specifies the amount of memory allocated to buffer network packets. Specify K for kilobytes or M for megabytes.

PacketCapture.KERNEL_BUFFER_SIZE_X86_64 64M For 64-bit Linux platforms, this setting specifies the amount of memory allocated to buffer network packets. Specify K for kilobytes or M for megabytes. Do not specify a value larger than 64M.

PacketCapture.MAX_FILES_PER_DIRECTORY 30000 After the specified number of file streams are processed, a new directory is created.

PacketCapture.MBYTES_LEFT_TO_DISABLE_CAPTURE 1000 If the amount of disk space (in MB) left on the drop_pcap drive falls below this value, packet capture is suspended. For example, if this number is 100, pcap stops writing out drop_pcap files when there is less than 100 MB on the installed drive.

PacketCapture.MBYTES_REQUIRED_TO_RESTART_CAPTURE 1500 The amount of disk space (in MB) needed on the drop_pcap drive before packet capture resumes after stopping due to lack of space. For example, if this value is 150 and packet capture is suspended, packet capture resumes when more than 150 MB is available on the drop_pcap drive.

PacketCapture.NAPATECH_TOOLS_PATH N/A This setting specifies the location of the Napatech Tools directory. This directory is not set by default. If packet capture is enabled for Napatech, enter the fully qualified path to the Napatech Tools installation directory.

PacketCapture.NO_TRAFFIC_ALERT_PERIOD 86,400 The refresh time (in seconds) between no-traffic alert messages. No-traffic system events are created for a given protocol based on this time period. For instance, if this is set to 24*60*60 (86,400) seconds, a new message is sent every day that there is no new traffic for a given protocol. Do not confuse this setting with the per-protocol traffic timeout, which specifies how long the system initially goes without traffic before it sends the first alert.

PacketCapture.NUMBER_BUFFER_POOL_PACKETS 600000 The number of standard-sized preallocated packet buffers used to buffer and sort incoming traffic.

PacketCapture.NUMBER_JUMBO_POOL_PACKETS 1 The number of large-sized preallocated packet buffers that are used to buffer and sort incoming traffic.

PacketCapture.NUMBER_SMALL_POOL_PACKETS 200000 The number of small-sized preallocated packet buffers that are used to buffer and sort incoming traffic.

PacketCapture.RING_CAPTURE_LENGTH 1518 Controls the amount of packet data that is captured. The default value of 1518 is sufficient to capture typical Ethernet networks and Ethernet over 802.1Q tagged VLANs.

PacketCapture.RING_DEVICE_MEM 67108864 This setting is deprecated. Instead, use the PacketCapture.KERNEL_BUFFER_SIZE_I686 setting (for 32-bit Linux platforms) or the PacketCapture.KERNEL_BUFFER_SIZE_X86_64 setting (for 64-bit Linux platforms).

Specifies the amount of memory (in bytes) to be allocated to buffer packets per device. (The default of 67108864 is equivalent to 64 MB.)

PacketCapture.SIZE_BUFFER_POOL_PACKETS 1540 The size of standard-sized buffer pool packets.

PacketCapture.SIZE_JUMBO_POOL_PACKETS 10000 The size of jumbo-sized buffer pool packets.

PacketCapture.SIZE_SMALL_POOL_PACKETS 150 The size of small-sized buffer pool packets.

PacketCapture.SPOOL_DIRECTORY N/A The directory in which to spool streams with large numbers of packets. This setting is user-defined.

PacketCapture.STREAM_WRITE_TIMEOUT 5000 The time (in milliseconds) between each count (the StreamManager's write timeout).

RequestProcessor.AddDefaultHeader true If true, adds a default header to every email processed (when in Inline SMTP mode). The default header is defined by RequestProcessor.DefaultPassHeader. This header is added to every message that passes through the system, whether the message is redirected, has another header added, or has no policy violations.

RequestProcessor.AddHeaderOnMessageTimeout false The default value sets the system to continue sending messages if there is a message timeout.

If set to true, the X-Header "X-Symantec-DLP: Message timed out (potential Enforce System event 1213)" is inserted in the email message. The downstream edge MTA uses this header information to handle the message, and the log message displays "Passed message through due to timeout, with added timeout header."

RequestProcessor.AllowExtensions 8BITMIME VRFY DSN HELP PIPELINING SIZE ENHANCEDSTATUSCODES STARTTLS This setting lists the SMTP protocol extensions that Network Prevent for Email can use when it communicates with other MTAs.

RequestProcessor.AllowHosts any The default value of any permits all systems to make connections to the Network Prevent for Email Server on the SMTP service port. Replacing any with the IP address or Fully-Qualified Domain Name (FQDN) of one or more systems restricts SMTP connections to just those designated systems. To designate multiple systems, separate their addresses with commas. Use only a comma to separate addresses; do not include any spaces between the addresses.

RequestProcessor.AllowUnauthenticatedConnections false The default value ensures that MTAs must authenticate with Network Prevent for Email for TLS communication.

RequestProcessor.Backlog 12 The backlog that the request processor specifies for the server socket listener.

RequestProcessor.BindAddress 0.0.0.0 The IP address to which a Network Prevent for Email Server listener binds. When BindAddress is configured, the server only answers connections to that IP address. The default value of 0.0.0.0 is a wildcard that permits listening on all available addresses, including 127.0.0.1.

RequestProcessor.BlockStatusCodeOverride 5.7.1 Enables overriding of the ESMTP status code sent back to the upstream MTA when executing a block response rule.

Accepted values are 5.7.0 and 5.7.1. If any other value is entered, this setting falls back to the default of 5.7.1.

Use of the 5.7.0 value (other or undefined security status) is preferred when the detection server is working with Office365 email, because the 5.7.1 value provides an incorrect context for the Office365 use case.

RequestProcessor.CacheCleanupInterval 120000 Specifies the interval after which cached responses are cleaned from the cache. Units are in milliseconds.

RequestProcessor.CachedMessageTimeout 120000 Specifies the amount of time after generation when a given cached response can be cleared from the cache. Units are in milliseconds.

RequestProcessor.CacheEnabled false Enables caching of responses for duplicate SMTP messages. The cache was added as part of the cloud solution to support envelope splitting.

RequestProcessor.DefaultCommandTimeout 300 Specifies the number of seconds the Network Prevent for Email Server waits for a response to an SMTP command before closing connections to the upstream and downstream MTAs. The default is 300 seconds. This setting does not apply to the "." command (the end of a DATA command). Do not modify the default without first consulting Symantec support.

RequestProcessor.DefaultPassHeader X-CFilter-Loop: Reflected The default header that is added when in Inline SMTP mode if RequestProcessor.AddDefaultHeader is set to true. The value must be in a valid header format; an X- header is recommended.

RequestProcessor.DotCommandTimeout 600 Specifies the number of seconds the Network Prevent for Email Server waits for a response to the "." command (the end of a DATA command) before closing connections to the upstream and downstream MTAs. The default is 600 seconds. Do not modify the default without first consulting Symantec support.

RequestProcessor.ForwardConnectionTimeout 20000 The timeout value to use when forwarding to an MTA.

RequestProcessor.KeyManagementAlgorithm SunX509 The key management algorithm used in TLS communication.

RequestProcessor.MaxLineSize 1048576 The maximum size (in bytes) of data lines expected from an external MTA. Data lines larger than this size are broken down to this size.

RequestProcessor.Mode ESMTP Specifies the protocol mode to use (SMTP or ESMTP).

RequestProcessor.MTAResubmitPort 10026 The port number used by the request processor on the MTA to resend the SMTP message.

RequestProcessor.NumberOfDNSAttempts 4 The maximum number of DNS queries that Network Prevent for Email performs when it attempts to obtain mail exchange (MX) records for a domain. Network Prevent for Email uses this setting only if you have enabled MX record lookups.

RequestProcessor.RPLTimeout 360000 The maximum time (in milliseconds) allowed for email message processing by a Prevent server. Any email messages not processed during this time interval are passed on by the server.

RequestProcessor.ServerSocketPort 10025 The port number that the SMTP monitor uses to listen for incoming connections from the MTA.

RequestProcessor.TagHighestSeverity false When set to true, an additional email header that reports the highest severity of all the violated policies is added to the message. For example, if the email violated a policy of severity HIGH and a policy of severity LOW, the header reads: X-DLP-MAX-Severity:HIGH.

RequestProcessor.TagPolicyCount false When set to true, an additional email header that reports the total number of policies that the message violates is added to the message. For example, if the message violates 3 policies, the header X-DLP-Policy-Count: 3 is added.

RequestProcessor.TagScore false When set to true, an additional email header that reports the total cumulative score of all the policies that the message violates is added to the message. Scores are calculated using the following weights: High=4, Medium=3, Low=2, and Info=1. For example, if a message violates three policies, one with a severity of medium and two with a severity of low, the header X-DLP-Score: 7 (3 + 2 + 2) is added.

RequestProcessor.TrustManagementAlgorithm PKIX The trust management algorithm that Network Prevent for Email uses when it validates certificates for TLS communication. You can optionally specify a built-in Java trust manager algorithm (such as SunX509 or SunPKIX) or a custom algorithm that you have developed.

RequestProcessorListener.ServerSocketPort 12355 The local TCP port that FileReader uses to listen for connections from RequestProcessor on a Network Prevent server.

ServerCommunicator.CONNECT_DELAY_POST_WAKEUP_OR_POST_VPN_SECONDS 60 The delay time (in seconds) after which a detection server that comes back online attempts to connect to the Enforce Server. The default value is 60 seconds. The range for this setting is 30 to 600 seconds.

SocketCommunication.BufferSize 8K The size of the buffer that Network Prevent for Web uses to process ICAP requests. Increase the default value only if you need to process ICAP requests that are greater than 8K. Certain features, such as Active Directory authentication, may require an increase in buffer size.

UnicodeNormalizer.AsianCharRanges default Can be used to override the default definition of characters that the detection engine considers Asian. Must be either default or a comma-separated list of ranges, for example: 11A8-11F9,3200-321E

UnicodeNormalizer.Enabled on Can be used to disable Unicode normalization.

Enter off to disable.

UnicodeNormalizer.NewlineEliminationEnabled on Can be used to disable newline elimination for Asian languages.

Enter off to disable.
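The three RequestProcessor tagging settings in Table 14-9 (TagHighestSeverity, TagPolicyCount, and TagScore) derive their header values from the severities of the violated policies. The following Python sketch reproduces that arithmetic so that you can predict the header values for a given set of violations. It is an illustration only, not Symantec code: the function name, its parameters, the header spacing, and the behavior when no policy is violated are assumptions; only the header names and the High=4, Medium=3, Low=2, Info=1 weights come from the table.

```python
"""Hypothetical sketch: predict the X-DLP-* headers described for the
RequestProcessor.TagHighestSeverity, TagPolicyCount, and TagScore settings.
Not Symantec code; names, spacing, and empty-input behavior are assumptions."""

# Severity weights as listed for RequestProcessor.TagScore.
SEVERITY_WEIGHTS = {"HIGH": 4, "MEDIUM": 3, "LOW": 2, "INFO": 1}


def dlp_tag_headers(
    violated_severities: list[str],
    tag_highest_severity: bool = False,  # documented default: false
    tag_policy_count: bool = False,      # documented default: false
    tag_score: bool = False,             # documented default: false
) -> dict[str, str]:
    """Return header name -> value for one message's violated-policy severities."""
    severities = [s.upper() for s in violated_severities]
    headers: dict[str, str] = {}
    if not severities:
        return headers  # assumption: no tags when no policy is violated
    if tag_highest_severity:
        # The severity with the largest weight is reported.
        headers["X-DLP-MAX-Severity"] = max(severities, key=SEVERITY_WEIGHTS.__getitem__)
    if tag_policy_count:
        headers["X-DLP-Policy-Count"] = str(len(severities))
    if tag_score:
        headers["X-DLP-Score"] = str(sum(SEVERITY_WEIGHTS[s] for s in severities))
    return headers


if __name__ == "__main__":
    # One medium-severity and two low-severity violations, as in the TagScore example.
    result = dlp_tag_headers(
        ["Medium", "Low", "Low"],
        tag_highest_severity=True,
        tag_policy_count=True,
        tag_score=True,
    )
    for name, value in result.items():
        print(f"{name}: {value}")
```

Run as a script, the sketch prints a score of 7 for the TagScore example, together with a highest severity of MEDIUM and a policy count of 3.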

See “About Symantec Data Loss Prevention administration” on page 82.

See “Advanced agent settings” on page 2372.
See “About the Overview screen” on page 278.
See “Server/Detector Detail screen” on page 283.
See “Server configuration—basic” on page 253.
See “Server controls” on page 251.

Advanced detector settings

Click Detector Settings on the detector's System > Servers and Detectors > Overview >
Server/Detector Detail screen to modify the settings on that server.
Use caution when modifying these settings on a detector. Contact Symantec Support before
changing any of the settings on this screen. Changes to these settings normally do not take
effect until after the detector has been restarted.
You cannot change settings for the Enforce Server from the Server/Detector Detail screen.
The Server/Detector Detail - Advanced Settings screen is displayed only for detection servers
and detectors.

Table 14-10 Detector advanced settings

Setting Default Description

ContentExtraction.EnableMetaData off Allows detection on file metadata. If the setting is turned on, you can detect metadata for Microsoft Office and PDF files. For Microsoft Office files, OLE metadata is supported, which includes the fields Title, Subject, Author, and Keywords. For PDF files, only Document Information Dictionary metadata is supported, which includes fields such as Author, Title, Subject, and the creation and update dates. Extensible Metadata Platform (XMP) content is not detected. Note that enabling this metadata detection option can cause false positives.

ContentExtraction.MarkupAsText off Bypasses Content Extraction for files that are determined to be XML or HTML. Use this setting in cases such as web pages that contain data in the header block or in script blocks. Default is off.

ContentExtraction.TrackedChanges off Allows detection of content that has changed over time (Track Changes content) in Microsoft Office documents.
Note: Enabling this option might reduce the accuracy rate for IDM and data identifiers. The default is off (disallow).
To index content that has changed over time, set ContentExtraction.TrackedChanges=on in the \Protect\config\Indexer.properties file. The default and recommended setting is ContentExtraction.TrackedChanges=off.

Table 14-10 Detector advanced settings (continued)

Setting Default Description

DDM.MaxBinMatchSize 30,000,000 The maximum size (in bytes) used to generate the MD5 hash for an exact binary match in an IDM. This setting should not be changed. The following conditions must be met for IDM to work correctly:
■ This setting must be identical to the max_bin_match_size setting on the Enforce Server in file indexer.properties.
■ This setting must be smaller than or equal to the FileReader.FileMaxSize value.
■ This setting must be smaller than or equal to the ContentExtraction.MaxContentSize value on the Enforce Server in file indexer.properties.
Note: Changing the first or third item in the list requires re-indexing all IDM files.

Detection.EncodingGuessingDefaultEncoding ISO-8859-1 Specifies the fallback encoding that is assumed for a byte stream.

Detection.EncodingGuessingEnabled on Designates whether the encoding of unknown byte streams should be guessed.

Detection.EncodingGuessingMinimumConfidence 50 Specifies the minimum confidence level required for guessing the encoding of unknown byte streams.

DI.MaxViolations 100 Specifies the maximum number of violations allowed with data identifiers.

EDM.MatchCountVariant 3 Specifies how matches are counted:
■ 1 - Counts the total number of token sets matched.
■ 2 - Counts the number of unique token sets matched.
■ 3 - Counts the number of unique super sets of token sets. (default)
See “Configuring Advanced Settings for EDM policies” on page 557.

EDM.MaximumNumberOfMatchesToReturn 100 Defines an upper limit on the number of matches returned from each RAM index search.
See “Configuring Advanced Settings for EDM policies” on page 557.

Table 14-10 Detector advanced settings (continued)

Setting Default Description

EDM.SimpleTextProximityRadius 35 Number of tokens that are evaluated together when the proximity check is enabled.
See “Configuring Advanced Settings for EDM policies” on page 557.

EDM.TokenVerifierEnabled false If enabled (true), the server validates tokens for Chinese, Japanese, and Korean (CJK) keywords. Default is disabled (false).

IncidentDetection.MaxContentLength 2000000 Applies only to regular expression rules. On a per-component basis, only the first MaxContentLength characters are scanned for violations. The default (2,000,000) is equivalent to more than 1,000 pages of typical text. The limit exists to prevent regular expression rules from taking too long.

IncidentDetection.MinNormalizedSize 30 This setting applies to IDM detection. It must be kept in sync with the corresponding setting in the Indexer.properties file on the Enforce Server (which applies to indexing). Derivative detections only apply to messages when their normalized content is greater than this setting. If the normalized content size is less than this setting, IDM detection does a straight binary match.

IncidentDetection.patternConditionMaxViolations 100 The maximum number of matches a detector reports. The detector does not report more matches than this value, even if more matches exist.

Keyword.TokenVerifierEnabled false Default is disabled (false). If enabled (true), the server validates tokens for Asian language keywords (Chinese, Japanese, and Korean).
See “Enabling and using CJK token verification for server keyword matching” on page 847.

Table 14-10 Detector advanced settings (continued)

Setting Default Description

Lexer.IncludePunctuationInWords true If true, punctuation characters internal to a token are considered during detection.
See “Configuring Advanced Settings for EDM policies” on page 557.

Lexer.MaximumNumberOfTokens 30000 Maximum number of tokens extracted from each message component for detection. Applicable to all detection technologies where tokenization is required (EDM, profiled DGM, and the system patterns supported by those technologies). Increasing the default value may cause the detector to run out of memory and restart.
See “Configuring Advanced Settings for EDM policies” on page 557.

Lexer.Validate true If true, performs system pattern-specific validation.
See “Configuring Advanced Settings for EDM policies” on page 557.

UnicodeNormalizer.AsianCharRanges default Can be used to override the default definition of characters that are considered Asian by the detection engine. Must be either default, or a comma-separated list of ranges, for example: 11A8-11F9,3200-321E

UnicodeNormalizer.Enabled on Can be used to disable Unicode normalization. Enter off to disable.

UnicodeNormalizer.NewlineEliminationEnabled on Can be used to disable newline elimination for Asian languages. Enter off to disable.
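The DDM.MaxBinMatchSize entry in Table 14-10 lists three conditions that must hold between detector values and Enforce Server indexer values. The Python sketch below checks those conditions for values that you have already read from the Server/Detector Detail screen and from indexer.properties. It is an illustrative helper, not a Symantec tool: the dictionary layout is an assumption, only the setting names quoted in Table 14-10 come from this guide, and every example size other than the 30,000,000 default is hypothetical.

```python
from __future__ import annotations


def check_idm_binary_match_sizes(detector: dict[str, int],
                                 indexer: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the values are consistent.

    detector: detection server settings, keyed by the names used in Table 14-10.
    indexer:  Enforce Server indexer.properties values.
    """
    problems = []
    size = detector["DDM.MaxBinMatchSize"]
    # Condition 1: identical to max_bin_match_size on the Enforce Server.
    if size != indexer["max_bin_match_size"]:
        problems.append("DDM.MaxBinMatchSize differs from max_bin_match_size")
    # Condition 2: no larger than FileReader.FileMaxSize.
    if size > detector["FileReader.FileMaxSize"]:
        problems.append("DDM.MaxBinMatchSize exceeds FileReader.FileMaxSize")
    # Condition 3: no larger than ContentExtraction.MaxContentSize on the Enforce Server.
    if size > indexer["ContentExtraction.MaxContentSize"]:
        problems.append("DDM.MaxBinMatchSize exceeds ContentExtraction.MaxContentSize")
    return problems


# Hypothetical values; read the real ones from your own deployment.
detector = {"DDM.MaxBinMatchSize": 30_000_000, "FileReader.FileMaxSize": 30_000_000}
indexer = {"max_bin_match_size": 30_000_000,
           "ContentExtraction.MaxContentSize": 30_000_000}
print(check_idm_binary_match_sizes(detector, indexer))  # []
```

Because changing max_bin_match_size or ContentExtraction.MaxContentSize requires re-indexing all IDM files, a check like this is most useful before you make such a change.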

About using load balancers in an endpoint deployment

You can use a load balancer to manage multiple Endpoint Servers, or a server pool. Adding
Endpoint Servers to a load-balanced server pool enables Symantec Data Loss Prevention to
use less bandwidth while managing more agents. When setting up a server pool to manage
Endpoint Servers and agents, default Symantec Data Loss Prevention settings allow for
communication between servers and agents. However, there are a number of load balancer
settings that may affect how Endpoint Servers and agents communicate. You may have to
make changes to advanced agent and server settings if the load balancer you use does not
use default settings.
In general, load balancers should have the following settings applied to work best with Symantec
Data Loss Prevention:
■ 1-Gbps throughput
■ Source IP persistence. Set the persistence time to be greater than the agent polling period.
■ 24-hour SSL session timeout period
The Endpoint Servers communicate most efficiently with agents when the load balancer is set
up to use source IP persistence. (The name of this setting may differ across load balancer brands.)
Using source IP persistence in a Symantec Data Loss Prevention implementation ensures
that if an agent is restarted on the same network, it reconnects to the same Endpoint Server
regardless of the SSL session state. Source IP persistence also uses less bandwidth during
the SSL handshake between agents and Endpoint Servers. This setting also helps maintain
the event/attribute cache coherence.
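To see why the persistence time should exceed the agent polling period, consider the following toy Python model of source IP persistence. It is a simplification for illustration only: real load balancers implement persistence in their own ways, and the class, the server names, the 300-second polling interval in the demo, and the round-robin choice for new or expired clients are all assumptions.

```python
from __future__ import annotations

import time
from typing import Callable


class SourceIpPersistence:
    """Toy model: pin each client IP to one Endpoint Server until the entry expires."""

    def __init__(self, servers: list[str], persistence_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.persistence_seconds = persistence_seconds
        self.clock = clock
        self._table: dict[str, tuple[str, float]] = {}  # ip -> (server, last seen)
        self._next = 0  # round-robin position for new or expired clients

    def route(self, client_ip: str) -> str:
        now = self.clock()
        entry = self._table.get(client_ip)
        if entry is not None and now - entry[1] <= self.persistence_seconds:
            server = entry[0]  # persistence entry still valid: same server
        else:
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
        self._table[client_ip] = (server, now)  # refresh the entry on every request
        return server


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is repeatable
    for persistence in (600, 200):
        now[0] = 0.0
        lb = SourceIpPersistence(["es-1", "es-2"], persistence, clock=lambda: now[0])
        before = lb.route("10.0.0.5")
        now[0] = 300.0  # the same agent polls again 300 s later (hypothetical period)
        after = lb.route("10.0.0.5")
        print(persistence, before, after)
    # prints:
    # 600 es-1 es-1
    # 200 es-1 es-2
```

With a persistence time (600 s) longer than the polling interval, the agent keeps its Endpoint Server; with a shorter one (200 s), the entry expires between polls and the agent can land on a different server.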
For agents that connect to the Endpoint Server over a NAT or a proxy, SSL session server
affinity is the optimal load balancer setting. However, if this setting is used and the agent is
restarted, or if the SSL cached session identity is flushed, a new SSL session is negotiated.
Negotiating a new SSL session may cause the agent to connect to a different monitor more
frequently, which may interfere with agent status updates on the Enforce Server.
Review the agent connection settings if the load balancer idle connection setting is not set
to its default. The load balancer idle connection setting can also be called connection timeout
interval, clean idle connection, and so on, depending on the load balancer brand.
You can assess your Symantec Data Loss Prevention and load balancer settings by considering
the following two scenarios:
■ Default DLP settings. Table 14-11
■ Non-default DLP settings. Table 14-12

Note: Contact Symantec Support before changing default advanced agent and advanced
server settings.

Table 14-11 Default Symantec Data Loss Prevention settings scenario

Description:
Symantec Data Loss Prevention uses non-persistent connections by default. Using non-persistent connections means that Endpoint Servers close connections to agents after agents are idle for 30 seconds.

Resolution:
Consider how the agent idle timeout coincides with the load balancer close idle connection setting. If the load balancer is configured to close idle connections after less than 30 seconds, agents are prematurely disconnected from Endpoint Servers.
To resolve the issue, complete one of the following:
■ Change the agent idle timeout setting (EndpointCommunications.IDLE_TIMEOUT_IN_SECONDS.int) to less than the close idle connection setting on the load balancer.
■ Change the agent heartbeat setting (EndpointCommunications.HEARTBEAT_INTERVAL_IN_SECONDS.int) to a value less than the load balancer close idle connections setting. You must also increase the no traffic timeout setting (CommLayer.NO_TRAFFIC_TIMEOUT_IN_SECONDS.int) to a value greater than the agent heartbeat setting.

Table 14-12 Non-default Symantec Data Loss Prevention settings scenario

Description:
Consider how changes to default Symantec Data Loss Prevention settings affect how the load balancer handles idle and persistent agent connections. For example, if you change the idle timeout setting to 0 to create a persistent connection and you leave the default agent heartbeat setting (270 seconds), you must consider the idle connection setting on the load balancer. If the idle connection setting on the load balancer is less than 270 seconds, then agents are prematurely disconnected from Endpoint Servers.

Resolution:
To resolve the issue, complete one of the following:
■ Change the agent heartbeat (EndpointCommunications.HEARTBEAT_INTERVAL_IN_SECONDS.int) and no traffic timeout (CommLayer.NO_TRAFFIC_TIMEOUT_IN_SECONDS.int) settings to less than the load balancer idle connection setting.
■ Verify that the no traffic timeout setting is greater than the heartbeat setting.
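The two scenarios in Tables 14-11 and 14-12 reduce to a pair of inequalities. The following Python sketch encodes them so that you can test candidate values before you change anything (after contacting Symantec Support, as the note above advises). The function and parameter names are hypothetical. The defaults of 30 seconds (idle timeout) and 270 seconds (heartbeat) come from the tables, an idle timeout of 0 means persistent connections, and the no traffic timeout values in the demo are illustrative. The sketch does not model any particular load balancer, and it does not check the additional Table 14-12 advice to keep the no traffic timeout below the load balancer limit.

```python
def agents_stay_connected(lb_idle_seconds: int,
                          no_traffic_timeout: int,
                          idle_timeout: int = 30,
                          heartbeat: int = 270) -> bool:
    """Return True if either remedy from Tables 14-11 and 14-12 is satisfied.

    Parameter mapping (names here are illustrative):
      idle_timeout       -> EndpointCommunications.IDLE_TIMEOUT_IN_SECONDS.int
      heartbeat          -> EndpointCommunications.HEARTBEAT_INTERVAL_IN_SECONDS.int
      no_traffic_timeout -> CommLayer.NO_TRAFFIC_TIMEOUT_IN_SECONDS.int
    """
    # Remedy A: non-persistent connections (idle_timeout > 0) are closed by
    # Symantec Data Loss Prevention before the load balancer's idle limit.
    closes_before_lb = 0 < idle_timeout < lb_idle_seconds
    # Remedy B: heartbeats arrive more often than the load balancer's idle
    # limit, and the no traffic timeout does not expire between heartbeats.
    heartbeat_keeps_alive = (heartbeat < lb_idle_seconds
                             and no_traffic_timeout > heartbeat)
    return closes_before_lb or heartbeat_keeps_alive


# Default settings behind a load balancer that drops idle connections after 20 s:
print(agents_stay_connected(lb_idle_seconds=20, no_traffic_timeout=300))  # False
# Persistent connections (idle timeout 0) with a 120 s heartbeat and a 200 s limit:
print(agents_stay_connected(200, no_traffic_timeout=180,
                            idle_timeout=0, heartbeat=120))  # True
```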

See “Advanced server settings” on page 285.

See “Advanced agent settings” on page 2372.
Chapter 15
Managing log files
This chapter includes the following topics:

■ About log files

■ Log collection and configuration screen

■ Configuring server logging behavior

■ Collecting server logs and configuration files

■ About log event codes

About log files

Symantec Data Loss Prevention provides a number of different log files that record information
about the behavior of the software. Log files fall into these categories:
■ Operational log files record detailed information about the tasks the software performs and
any errors that occur while the software performs those tasks. You can use the contents
of operational log files to verify that the software functions as you expect it to. You can also
use these files to troubleshoot any problems in the way the software integrates with other
components of your system.
For example, you can use operational log files to verify that a Network Prevent for Email
Server communicates with a specific MTA on your network.
See “Operational log files” on page 334.
■ Debug log files record fine-grained technical details about the individual processes or
software components that comprise Symantec Data Loss Prevention. The contents of
debug log files are not intended for use in diagnosing system configuration errors or in
verifying expected software functionality. You do not need to examine debug log files to
administer or maintain a Symantec Data Loss Prevention installation. However, Symantec
Support may ask you to provide debug log files for further analysis when you report a
problem. Some debug log files are not created by default. Symantec Support can explain
how to configure the software to create the file if necessary.
See “Debug log files” on page 337.
■ Installation log files record information about the Symantec Data Loss Prevention installation
tasks that are performed on a particular computer. You can use these log files to verify an
installation or troubleshoot installation errors. Installation log files reside in the following
locations:
■ installdir\SymantecDLP\.install4j\installation.log stores the installation log
for Symantec Data Loss Prevention.
■ installdir\oracle_home\admin\protect\ stores the installation log for Oracle.
See the Symantec Data Loss Prevention Installation Guide for more information.

Operational log files

The Enforce Server and the detection servers store operational log files in the
c:\ProgramData\Symantec\DataLossPrevention\<EnforceServer or
DetectionServer>\15.5\Protect\logs\ directory on Windows installations and in the
/var/log/Symantec/DataLossPrevention/<EnforceServer or DetectionServer>/15.5/
directory on Linux installations. A number at the end of the log file name indicates the count
(shown as 0 in Table 15-1).
Table 15-1 lists and describes the Symantec Data Loss Prevention operational log files.

Table 15-1 Operational log files

Log file name Description Server

agentmanagement_webservices_access_0.log
Logs successful and failed attempts to access the Agent Management API web service.
Server: Enforce Server

agentmanagement_webservices_soap_0.log
Logs the entire SOAP request and response for most requests to the Agent Management API web service.
Server: Enforce Server

Table 15-1 Operational log files (continued)

Log file name Description Server

boxmonitor_operational_0.log
The BoxMonitor process oversees the detection server processes that pertain to that particular server type. For example, the processes that run on Network Monitor are file reader and packet capture.
The BoxMonitor log file is typically very small, and it shows how the application processes are running.
Server: All detection servers

detection_operational_0.log
The detection operational log file provides details about the detection server configuration and whether the server is operating correctly.
Server: All detection servers

detection_operational_trace_0.log
The detection trace log file provides details about each message that the detection server processes. The log file includes information such as:
■ The policies that were applied to the message
■ The policy rules that were matched in the message
■ The number of incidents the message generated
Server: All detection servers

machinelearning_training_operational_0.log
This log records information about the tasks, logs, and configuration files called on startup of the VML training process.
Server: Enforce Server

manager_operational_0.log
Logs information about the Symantec Data Loss Prevention manager process, which implements the Enforce Server administration console user interface.
Server: Enforce Server

Table 15-1 Operational log files (continued)

Log file name Description Server

monitorcontroller_operational_0.log
Records a detailed log of the connections between the Enforce Server and all detection servers. It provides details about the information that is exchanged between these servers, including whether policies have been pushed to the detection servers.
Server: Enforce Server

SmtpPrevent_operational0.log
This operational log file pertains to SMTP Prevent only. It is the primary log for tracking the health and activity of a Network Prevent for Email system. Examine this file for information about the communication between the MTAs and the detection server.
Server: SMTP Prevent detection servers

WebPrevent_Access0.log
This access log file contains information about the requests that are processed by Network Prevent for Web detection servers. It is similar to web access logs for a proxy server.
Server: Network Prevent for Web detection servers

WebPrevent_Operational0.log
This operational log file reports on the operating condition of Network Prevent for Web, such as whether the system is up or down, and on connection management.
Server: Network Prevent for Web detection servers

webservices_access_0.log
This log file records successful and failed attempts to access the Incident Reporting Web Service.
Server: Enforce Server

Table 15-1 Operational log files (continued)

Log file name Description Server

webservices_soap_0.log
Contains the entire SOAP request and response for most requests to the Incident Reporting API Web Service. This log records all requests and responses except responses to incident binary requests. This log file is not created by default. See the Symantec Data Loss Prevention Incident Reporting API Developers Guide for more information.
Server: Enforce Server

See “Network Prevent for Web operational log files and event codes” on page 351.
See “Network Prevent for Web access log files and fields” on page 352.
See “Network Prevent for Email log levels” on page 355.
See “Network Prevent for Email operational log codes” on page 355.
See “Network Prevent for Email originated responses and codes” on page 359.
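If you script log collection outside the administration console, the directory rules and the count suffix described at the start of this section can be expressed as in the following Python sketch. The helper names are hypothetical, the paths are the ones given above for version 15.5, and the file name pattern is inferred from the names in Table 15-1 (some names put an underscore before the count and some do not).

```python
import re
from pathlib import PurePath, PurePosixPath, PureWindowsPath

_SERVER_DIRS = {"EnforceServer", "DetectionServer"}
# Base name, an optional underscore, the trailing count, then ".log".
_COUNTED_NAME = re.compile(r"^(?P<base>.+?)_?(?P<count>\d+)\.log$")


def operational_log_dir(server_dir: str, windows: bool) -> PurePath:
    """Return the 15.5 operational log directory for an Enforce or detection server."""
    if server_dir not in _SERVER_DIRS:
        raise ValueError(f"expected one of {sorted(_SERVER_DIRS)}")
    if windows:
        return PureWindowsPath(r"c:\ProgramData\Symantec\DataLossPrevention",
                               server_dir, "15.5", "Protect", "logs")
    return PurePosixPath("/var/log/Symantec/DataLossPrevention", server_dir, "15.5")


def split_log_name(file_name: str):
    """Split 'detection_operational_0.log' into ('detection_operational', 0).

    Returns None for names without a trailing count, such as 'jdbc.log'.
    """
    match = _COUNTED_NAME.match(file_name)
    if match is None:
        return None
    return match.group("base"), int(match.group("count"))
```

For example, split_log_name("SmtpPrevent_operational0.log") returns ("SmtpPrevent_operational", 0), which lets you group the rotated copies of one log by their base name.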

Debug log files

The Enforce Server and the detection servers store debug log files in the
c:\ProgramData\Symantec\DataLossPrevention\<EnforceServer or
DetectionServer>\15.5\Protect\logs\ directory on Windows installations and in the
/var/log/Symantec/DataLossPrevention/<EnforceServer or DetectionServer>/15.5/
directory on Linux installations. A number at the end of the log file name indicates the count
(shown as 0 in Table 15-2).
The following table lists and describes the Symantec Data Loss Prevention debug log files.

Table 15-2 Debug log files

Log file name Description Server

Aggregator0.log
This file describes communications between the detection server and the agents. Look at this log to troubleshoot the following problems:
■ Connections to the agents
■ Incidents that do not appear when they should
■ Unexpected agent events
Server: Endpoint detection servers

BoxMonitor0.log
This file is typically very small, and it shows how the application processes are running. The BoxMonitor process oversees the detection server processes that pertain to that particular server type. For example, the processes that run on Network Monitor are file reader and packet capture.
Server: All detection servers

ContentExtractionAPI_FileReader.log
Logs the behavior of the Content Extraction API file reader that sends requests to the plug-in host. The default logging level is "info", which is configurable using log4cxx_config_filereader.xml in the C:\Program Files\Symantec\DataLossPrevention\DetectionServer (Windows) or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux) directory.
Server: Detection Server

ContentExtractionAPI_Manager.log
Logs the behavior of the Content Extraction API manager that sends requests to the plug-in host. The default logging level is "info", which is configurable using log4cxx_config_manager.xml in the C:\Program Files\Symantec\DataLossPrevention\DetectionServer (Windows) or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux) directory.
Server: Enforce Server

Table 15-2 Debug log files (continued)

Log file name Description Server

ContentExtractionHost_FileReader.log
Logs the behavior of the Content Extraction File Reader hosts and plug-ins. The default logging level is "info", which is configurable using log4cxx_config_filereader.xml in the C:\Program Files\Symantec\DataLossPrevention\DetectionServer (Windows) or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux) directory.
Server: Detection Server

ContentExtractionHost_Manager.log
Logs the behavior of the Content Extraction Manager hosts and plug-ins. The default logging level is "info", which is configurable using log4cxx_config_manager.xml in the C:\Program Files\Symantec\DataLossPrevention\DetectionServer (Windows) or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux) directory.
Server: Enforce Server

DiscoverNative.log.0
This log file is located in c:\Program Files\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\logs\debug.
This log file contains the log statements that the Network Discover/Cloud Storage Discover native code emits. Currently, it contains information related to .pst scanning. This log file applies only to the Network Discover/Cloud Storage Discover Servers that run on Windows platforms.
You can configure this log in the c:\Program Files\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\config\DiscoverNativeLogging.properties file.
Server: Discover detection servers

FileReader0.log
This log file pertains to the file reader process and contains application-specific logging, which may be helpful in resolving issues in detection and incident creation. One symptom that shows up in this log is content extractor timeouts.
Server: All detection servers

Table 15-2 Debug log files (continued)

Log file name Description Server

flash_client_0.log
Logs messages from the Adobe Flex client used for folder risk reports by Network Discover.
Server: Enforce Server

flash_server_remoting_0.log
Contains log messages from BlazeDS, an open-source component that responds to remote procedure calls from an Adobe Flex client. This log indicates whether the Enforce Server has received messages from the Flash client. At permissive log levels (FINE, FINER, FINEST), the BlazeDS logs contain the content of the client requests to the server and the content of the server responses to the client.
Server: Enforce Server

IncidentPersister0.log
This log file pertains to the Incident Persister process. This process reads incidents from the incidents folder on the Enforce Server and writes them to the database. Look at this log if the incident queue on the Enforce Server (manager) grows too large. You can also observe this situation by checking the incidents folder on the Enforce Server to see if incidents have backed up.
Server: Enforce Server

Indexer0.log
This log file contains information when an EDM profile or IDM profile is indexed. It also includes the information that is collected when the external indexer is used. If indexing fails, consult this log.
Server: Enforce Server (or the computer where the external indexer is running)

jdbc.log
This log file is a trace of JDBC calls to the database. By default, writing to this log is turned off.
Server: Enforce Server

Table 15-2 Debug log files (continued)

Log file name Description Server

machinelearning_native_filereader.log
This log file records the runtime category classification (positive and negative) and associated confidence levels for each message detected by a VML profile. The default logging level is "info", which is configurable using log4cxx_config_filereader.xml in the C:\Program Files\Symantec\DataLossPrevention\DetectionServer (Windows) or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux) directory.
Server: Detection Server

machinelearning_training_0_0.log
This log file records the design-time base accuracy percentages for the k-fold evaluations for all VML profiles.
Server: Enforce Server

machinelearning_training_native_manager.log
This log file records the total number of features modeled at design-time for each VML profile training run. The default logging level is "info", which is configurable using log4cxx_config_manager.xml in the C:\Program Files\Symantec\DataLossPrevention\DetectionServer (Windows) or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux) directory.
Server: Enforce Server

MonitorController0.log
This log file is a detailed log of the connections between the Enforce Server and the detection servers. It gives details about the information that is exchanged between these servers, including whether policies have been pushed to the detection servers.
Server: Enforce Server

PacketCapture.log
This log file pertains to the packet capture process that reassembles packets into messages and writes to the drop_pcap directory. Look at this log if there is a problem with dropped packets or traffic is lower than expected. PacketCapture is not a Java process, so it does not follow the same logging rules as the other Symantec Data Loss Prevention system processes.
Server: Network Monitor

Table 15-2 Debug log files (continued)

Log file name Description Server

PacketCapture0.log
This log file describes issues with PacketCapture communications.
Server: Network Monitor

RequestProcessor0.log
This log file pertains to SMTP Prevent only. The log file is primarily for use in cases where SmtpPrevent_operational0.log is not sufficient.
Server: SMTP Prevent detection servers

ScanDetail-target-0.log
Where target is the name of the scan target. All white spaces in the target's name are replaced with hyphens. This log file pertains to Discover server scanning. It is a file-by-file record of what happened in the scan. If the scan of a file is successful, the entry reads success, followed by the path, size, time, owner, and ACL information of the scanned file. If the scan failed, a warning appears followed by the file name.
Server: Discover detection servers

tomcat\localhost.date.log
These Tomcat log files contain information for any action that involves the user interface. The logs include the user interface errors from the red error message box, password failures when logging on, and Oracle errors (ORA-#).
Server: Enforce Server

SymantecDLPIncidentPersister.log
This log file contains minimal information: stdout and stderr only (fatal events).
Server: Enforce Server

SymantecDLPManager.log
This log file contains minimal information: stdout and stderr only (fatal events).
Server: Enforce Server

SymantecDLPMonitor.log
This log file contains minimal information: stdout and stderr only (fatal events).
Server: All detection servers

SymantecDLPMonitorController.log
This log file contains minimal information: stdout and stderr only (fatal events).
Server: Enforce Server

SymantecDLPNotifier.log
This log file pertains to the Notifier service and its communications with the Enforce Server and the MonitorController service. Look at this file to see if the MonitorController service registered a policy change.
Server: Enforce Server

SymantecDLPUpdate.log
This log file is populated when you update Symantec Data Loss Prevention.
Server: Enforce Server
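To locate the ScanDetail log for a particular Discover target, you can derive the file name from the target name. This Python sketch assumes that each whitespace character becomes exactly one hyphen and that the count defaults to 0; the guide states only that white spaces are replaced with hyphens, so confirm against the actual file names in your log directory. The function name and the example target names are hypothetical.

```python
import re


def scan_detail_log_name(target_name: str, count: int = 0) -> str:
    """Build the ScanDetail-<target>-<count>.log name for a Discover scan target."""
    # Assumption: every whitespace character maps to a single hyphen.
    hyphenated = re.sub(r"\s", "-", target_name)
    return f"ScanDetail-{hyphenated}-{count}.log"


print(scan_detail_log_name("Finance File Share"))  # ScanDetail-Finance-File-Share-0.log
```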

See “Network Prevent for Web protocol debug log files” on page 354.
See “Network Prevent for Email log levels” on page 355.

Log collection and configuration screen

Use the System > Servers and Detectors > Logs screen to collect log files or to configure
logging behavior for any Symantec Data Loss Prevention server. The Logs screen contains
two tabs that provide the following features:
■ Collection—Use this tab to collect log files and configuration files from one or more
Symantec Data Loss Prevention servers.
See “Collecting server logs and configuration files” on page 347.
■ Configuration—Use this tab to configure basic logging behavior for a Symantec Data Loss
Prevention server, or to apply a custom log configuration file to a server.
See “Configuring server logging behavior” on page 343.
See “About log files” on page 333.

Configuring server logging behavior

Use the Configuration tab of the System > Servers and Detectors > Logs screen to change
logging configuration parameters for any server in the Symantec Data Loss Prevention
deployment. The Select a Diagnostic Log Setting menu provides preconfigured settings for
Enforce Server and detection server logging parameters. You can select an available
preconfigured setting to define common log levels or to enable logging for common server
features. The Select a Diagnostic Log Setting menu also provides a default setting that
returns logging configuration parameters to the default settings used at installation time.
Table 15-3 describes the preconfigured log settings available for the Enforce Server.
Optionally, you can upload a custom log configuration file that you have created or modified
using a text editor. (Use the Collection tab to download a log configuration file that you want
to customize.) You can upload only those configuration files that modify logging properties (file
names that end with Logging.properties). When you upload a new log configuration file to
a server, the server first backs up the existing configuration file of the same name. The new
file is then copied into the configuration file directory and its properties are applied immediately.
You do not need to restart the server process for the changes to take effect, unless you are
directed to do so. As of the current software release, only changes to the
PacketCaptureNativeLogging.properties and DiscoverNativeLogging.properties files
require you to restart the server process.
See “Server controls” on page 251.
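The upload behavior described above (back up the existing file of the same name, copy in the new file, and restart only for the two native logging files) can be modeled as in the following Python sketch. It operates on a local directory only, to illustrate the sequence; it is not a substitute for the Configuration tab. The function name is hypothetical, and the .bak backup name is an assumption because the guide does not name the backup file.

```python
import shutil
from pathlib import Path

# Per this guide, only changes to these two files require a server process restart.
RESTART_REQUIRED = {"PacketCaptureNativeLogging.properties",
                    "DiscoverNativeLogging.properties"}


def apply_log_config(new_file: Path, config_dir: Path) -> bool:
    """Copy new_file into config_dir; return True if a restart is required."""
    # Only files that modify logging properties can be uploaded.
    if not new_file.name.endswith("Logging.properties"):
        raise ValueError("only *Logging.properties files can be applied")
    target = config_dir / new_file.name
    if target.exists():
        # Back up the existing file of the same name first (hypothetical .bak suffix).
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(new_file, target)
    return new_file.name in RESTART_REQUIRED
```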

Make sure that the configuration file that you upload contains valid property definitions that
are applicable to the type of server you want to configure. If you make a mistake when uploading
a log configuration file, use the preconfigured Restore Defaults setting to revert the log
configuration to its original installed state.
The Enforce Server administration console performs only minimal validation of the log
configuration files that you upload. It ensures that:
■ Configuration file names correspond to actual logging configuration file names.
■ Root level logging is enabled in the configuration file. This configuration ensures that some
basic logging functionality is always available for a server.
■ Properties in the file that define logging levels contain only valid values (such as INFO,
FINE, or WARNING).

If the server detects a problem with any of these items, it displays an error message and
cancels the file upload.
If the Enforce Server successfully uploads a log configuration file change to a detection server,
the administration console reports that the configuration change was submitted. If the detection
server then encounters any problems when it tries to apply the configuration change, it logs a
system event warning to indicate the problem.

Table 15-3 Preconfigured log settings for the Enforce Server

Select a Diagnostic Log Setting value | Description

Restore Defaults
Restores log file parameters to their default values.

Incident Reporting API SOAP Logging
Logs the entire SOAP request and response message for most requests to the Incident
Reporting API Web Service. The logged messages are stored in the
webservices_soap.log file. To begin logging to this file, edit the
c:\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\ManagerLogging.properties
(Windows) or
/var/log/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config/ManagerLogging.properties
(Linux) file to set the
com.vontu.enforce.reportingapi.webservice.log.WebServiceSOAPLogHandler.level
property to INFO.
You can use the contents of webservices_soap.log to diagnose problems when
developing Incident Reporting API Web Service clients. See the Symantec Data Loss
Prevention Incident Reporting API Developers Guide for more information.

Custom Attribute Lookup Logging
Logs diagnostic information each time the Enforce Server uses a lookup plug-in to
populate custom attributes for an incident. Lookup plug-ins populate custom attribute
data using LDAP, CSV files, or other data repositories. The diagnostic information is
recorded in the Tomcat log file
(c:\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\logs\tomcat\localhost.date.log
[Windows] or
/var/log/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/tomcat/localhost.date.log
[Linux]) and the IncidentPersister_0.log file.

See “About custom attributes” on page 1968.

See “About using custom attributes” on page 1969.
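The manual edit that the Incident Reporting API SOAP Logging row describes can also be scripted. The following Python sketch is a minimal illustration, not a Symantec tool. It works on the text of ManagerLogging.properties that you have already read into memory, handles only simple `key = value` lines, and uses a hypothetical helper named `set_property`. Back up the original file before you write the result back.

```python
import re

# Property named in the Incident Reporting API SOAP Logging row above.
SOAP_LEVEL_KEY = ("com.vontu.enforce.reportingapi.webservice.log."
                  "WebServiceSOAPLogHandler.level")


def set_property(text, key, value):
    """Return text with key set to value.

    An existing uncommented "key = ..." or "key: ..." line is replaced in
    place; otherwise a new line is appended at the end.
    """
    pattern = re.compile(
        r"^([ \t]*)" + re.escape(key) + r"[ \t]*[=:].*$", re.MULTILINE)
    new_line = f"{key} = {value}"
    if pattern.search(text):
        # A function replacement avoids backslash handling in the new text.
        return pattern.sub(lambda m: m.group(1) + new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


before = "# Manager logging\n" + SOAP_LEVEL_KEY + " = OFF\n"
after = set_property(before, SOAP_LEVEL_KEY, "INFO")
print(after)
```

A commented-out line such as `#com.vontu...level = OFF` is left alone, and the new setting is appended instead.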

Table 15-4 Preconfigured log settings for detection servers

Select a Diagnostic Log Setting value | Detection server uses | Description

Restore Defaults | All detection servers
Restores log file parameters to their default values.

Discover Trace Logging | Network Discover Servers
Enables informational logging for Network Discover scans. These log messages are
stored in FileReader0.log.

Detection Trace Logging | All detection servers
Logs information about each message that the detection server processes. This
includes information such as:
■ The policies that were applied to the message
■ The policy rules that were matched in the message
■ The number of incidents that the message generated
When you enable Detection Trace Logging, the resulting messages are stored in the
detection_operational_trace_0.log file.
Note: Trace logging can produce a large amount of data, and the data is stored in
clear text format. Use trace logging only when you need to debug a specific problem.

Packet Capture Debug Logging | Network Monitor Servers
Enables basic debug logging for packet capture with Network Monitor. This setting logs
information in the PacketCapture.log file.
Although this type of logging can produce a large amount of data, the Packet Capture
Debug Logging setting limits the log file size to 50 MB and the maximum number of log
files to 10.
If you apply this log configuration setting to a server, you must restart the server
process to enable the change.

Email Prevent Logging | Network Prevent for Email servers
Enables full message logging for Network Prevent for Email servers. This setting logs
the complete message content and includes execution and error tracing information.
Logged information is stored in the RequestProcessor0.log file.
Note: Trace logging can produce a large amount of data, and the data is stored in
clear text format. Use trace logging only when you need to debug a specific problem.
See “Network Prevent for Email operational log codes” on page 355.
See “Network Prevent for Email originated responses and codes” on page 359.

ICAP Prevent Message Processing Logging | Network Prevent for Web servers
Enables operational and access logging for Network Prevent for Web. This setting logs
information in the FileReader0.log file.
See “Network Prevent for Web operational log files and event codes” on page 351.
See “Network Prevent for Web access log files and fields” on page 352.

Follow this procedure to change the log configuration for a Symantec Data Loss Prevention
server.
To configure logging properties for a server
1 Click the Configuration tab if it is not already selected.
2 If you want to configure logging properties for a detection server, select the server name
from the Select a Detection Server menu.
3 If you want to apply preconfigured log settings to a server, select the configuration name
from the Select a Diagnostic Configuration menu next to the server you want to
configure.
See Table 15-3 and Table 15-4 for a description of the diagnostic configurations.
4 If you instead want to use a customized log configuration file, click Browse... next to the
server you want to configure. Then select the logging configuration file to use from the
File Upload dialog, and click Open. You can upload only logging configuration files, not
configuration files that affect other server features.

Note: If the Browse button is unavailable because of a previous menu selection, click
Clear Form.

5 Click Configure Logs to apply the preconfigured setting or custom log configuration file
to the selected server.
6 Check for any system event warnings that indicate a problem in applying configuration
changes on a server.
See “Log collection and configuration screen” on page 343.

Note: The following debug log files are configured manually outside of the logging framework
available through the Enforce Server administration console:
ContentExtractionAPI_FileReader.log, ContentExtractionAPI_Manager.log,
ContentExtractionHost_FileReader.log, ContentExtractionHost_Manager.log,
machinelearning_native_filereader.log, and
machinelearning_training_native_manager.log. Refer to the entry for each of these log
files in the debug log file list for configuration details. See “Debug log files” on page 337.

Collecting server logs and configuration files

Use the Collection tab of the System > Servers and Detectors > Logs screen to collect log
files and configuration files from one or more Symantec Data Loss Prevention servers. You
can collect files from a single detection server or from all detection servers, as well as from
the Enforce Server computer. You can limit the collected files to only those files that were last
updated in a specified range of dates.
The Enforce Server administration console stores all log and configuration files that you collect
in a single ZIP file on the Enforce Server computer. If you retrieve files from multiple Symantec
Data Loss Prevention servers, each server's files are stored in a separate subdirectory of the
ZIP file.
Checkboxes on the Collection tab enable you to collect different types of files from the selected
servers. Table 15-5 describes each type of file.

Table 15-5 File types for collection

File type Description

Operational Logs
Operational log files record detailed information about the tasks the software performs and any errors
that occur while the software performs those tasks. You can use the contents of operational log files
to verify that the software functions as you expect it to. You can also use these files to troubleshoot
any problems in the way the software integrates with other components of your system.

For example, you can use operational log files to verify that a Network Prevent for Email Server
communicates with a specific MTA on your network.

Debug and Trace Logs
Debug log files record fine-grained technical details about the individual processes or software
components that comprise Symantec Data Loss Prevention. The contents of debug log files are not
intended for use in diagnosing system configuration errors or in verifying expected software
functionality. You do not need to examine debug log files to administer or maintain a Symantec
Data Loss Prevention installation. However, Symantec Support may ask you to provide debug log
files for further analysis when you report a problem. Some debug log files are not created by default.
Symantec Support can explain how to configure the software to create the file if necessary.

Configuration Files
Use the Configuration Files option to retrieve both logging configuration files and server feature
configuration files.

Logging configuration files define the overall level of logging detail that is recorded in server log files.
Logging configuration files also determine whether specific features or subsystem events are recorded
to log files.

For example, by default the Enforce console does not log SOAP messages that are generated from
Incident Reporting API Web service clients. The ManagerLogging.properties file contains a
property that enables logging for SOAP messages.

You can modify many common logging configuration properties by using the presets that are available
on the Configuration tab.

If you want to update a logging configuration file by hand, use the Configuration Files checkbox to
download the configuration files for a server. You can modify individual logging properties using a
text editor and then use the Configuration tab to upload the modified file to the server.

See “Configuring server logging behavior” on page 343.

The Configuration Files option retrieves the active logging configuration files and also any backup
log configuration files that were created when you used the Configuration tab. This option also
retrieves server feature configuration files. Server feature configuration files affect many different
aspects of server behavior, such as the location of a syslog server or the communication settings of
the server. You can collect these configuration files to help diagnose problems or verify server settings.
However, you cannot use the Configuration tab to change server feature configuration files. You
can only use the tab to change logging configuration files.

Agent Logs
Use the Agent Logs option to collect DLP agent service and operational log files from an Endpoint
Prevent detection server. This option is available only for Endpoint Prevent servers. To collect agent
logs using this option, you must have already pulled the log files from individual agents to the Endpoint
Prevent detection server using a Pull Logs action.

Use the Agent List screen to select individual agents and pull selected log files to the Endpoint
Prevent detection server. Then use the Agent Logs option on this page to collect the log files.

When the logs are pulled from the endpoint, they are stored on the Endpoint Server in an unencrypted
format. After you collect the logs from the Endpoint Server, the logs are deleted from the Endpoint
Server and are stored only on the Enforce Server. You can only collect logs from one endpoint at a
time.

See “Using the Agent List screen” on page 2430.

Operational, debug, and trace log files are stored in the server_identifier/logs subdirectory
of the ZIP file. server_identifier identifies the server that generated the log files, and it
corresponds to one of the following values:
■ If you collect log files from the Enforce Server, Symantec Data Loss Prevention replaces
server_identifier with the string Enforce. Note that Symantec Data Loss Prevention does
not use the localized name of the Enforce Server.
■ If a detection server’s name includes only ASCII characters, Symantec Data Loss Prevention
uses the detection server name for the server_identifier value.
■ If a detection server’s name contains non-ASCII characters, Symantec Data Loss Prevention
uses the string DetectionServer-ID-id_number for the server_identifier value. id_number
is a unique identification number for the detection server.
If you collect agent service log files or operational log files from an Endpoint Prevent server,
the files are placed in the server_identifier/agentlogs subdirectory. Each agent log file
uses the individual agent name as the log file prefix.
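If you script the unpacking of a collected ZIP file, you can express the naming rules above as a small Python sketch. The function names are hypothetical, and `server_id` stands for the detection server's identification number, which this sketch assumes you already know. `str.isascii()` requires Python 3.7 or later.

```python
def server_identifier(is_enforce, name="", server_id=None):
    """Return the ZIP subdirectory name for a server's collected files."""
    if is_enforce:
        return "Enforce"  # never the localized Enforce Server name
    if name.isascii():
        return name  # ASCII-only detection server names are used as-is
    return f"DetectionServer-ID-{server_id}"


def collected_path(identifier, agent_logs=False):
    """Return the subdirectory that holds logs or agent logs for a server."""
    return f"{identifier}/agentlogs" if agent_logs else f"{identifier}/logs"


print(collected_path(server_identifier(False, "Monitor01", 3)))
print(collected_path(server_identifier(False, "Überwachung", 7)))
```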
Follow this procedure to collect log files and log configuration files from Symantec Data Loss
Prevention servers.
To collect log files from one or more servers
1 Click the Collection tab if it is not already selected.
2 Use the Date Range menu to select a range of dates for the files you want to collect. Note
that the collection process does not truncate downloaded log files in any way. The date
range limits collected files to those files that were last updated in the specified range.
3 To collect log files from the Enforce Server, select one or more of the checkboxes next
to the Enforce Server entry to indicate the type of files you want to collect.
4 To collect log files from one or all detection servers, use the Select a Detection Server
menu to select either the name of a detection server or the Collect Logs from All
Detection Servers option. Then select one or more of the checkboxes next to the menu
to indicate the type of files you want to collect.
5 Click Collect Logs to begin the log collection process.
The administration console adds a new entry for the log collection process in the Previous
Log Collections list at the bottom of the screen. If you are retrieving many log files, you
may need to refresh the screen periodically to determine when the log collection process
has completed.

Note: You can run only one log collection process at a time.

6 To cancel an active log collection process, click Cancel next to the log collection entry.
You may need to cancel log collection if one or more servers are offline and the collection
process cannot complete. When you cancel the log collection, the ZIP file contains only
those files that were successfully collected.
7 To download collected logs to your local computer, click Download next to the log collection
entry.
8 To remove ZIP files stored on the Enforce Server, click Delete next to a log collection
entry.
See “Log collection and configuration screen” on page 343.
See “About log files” on page 333.
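The date range in step 2 selects whole files by their last-modified time. The following Python sketch illustrates that selection rule on a local directory. It is an assumption about the behavior described above, not the Enforce Server's implementation, and `files_updated_between` is a hypothetical name.

```python
import os
from datetime import datetime


def files_updated_between(directory, start, end):
    """Return sorted paths of regular files in directory whose last-modified
    time falls within [start, end]. Files are selected whole, never truncated."""
    selected = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                modified = datetime.fromtimestamp(entry.stat().st_mtime)
                if start <= modified <= end:
                    selected.append(entry.path)
    return sorted(selected)


# Example: files in the current directory last updated during August 2019.
print(files_updated_between(".", datetime(2019, 8, 1), datetime(2019, 8, 31, 23, 59)))
```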

About log event codes

Operational log file messages are formatted to closely match industry standards for the various
protocols involved. These log messages contain event codes that describe the specific task
that the software was trying to perform when the message was recorded. Log messages are
generally formatted as:

Timestamp [Log Level] (Event Code) Event description [event parameters]

■ See “Network Prevent for Web operational log files and event codes” on page 351.
■ See “Network Prevent for Email operational log codes” on page 355.
■ See “Network Prevent for Email originated responses and codes” on page 359.
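You can split a message in the general format above into its parts with a regular expression. This Python sketch assumes that the timestamp is followed by whitespace, that the log level appears in square brackets, and that the event code is a four-digit number in parentheses. The sample line is invented. The sketch also flags codes in the 5000 range as errors, which matches the event and error codes listed in Table 15-6 and Table 15-9.

```python
import re

# Assumed shape: Timestamp [Log Level] (Event Code) Event description ...
LINE_RE = re.compile(
    r"^(?P<timestamp>.+?)\s+\[(?P<level>[A-Z]+)\]\s+"
    r"\((?P<code>\d{4})\)\s+(?P<description>.*)$")


def parse_operational_line(line):
    """Return a dict of the message parts, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    # Codes in the 5000 range are listed as errors in Tables 15-6 and 15-9.
    record["is_error"] = 5000 <= record["code"] < 6000
    return record


# Invented sample line for illustration only.
sample = ("2019-08-19 10:15:01 [INFO] (1201) Connection (id=12) opened from "
          "host(10.0.0.5:51123)")
print(parse_operational_line(sample))
```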
Network Prevent for Web operational log files and event codes
Network Prevent for Web log file names use the format of WebPrevent_OperationalX.log
(where X is a number). The number of files that are stored and their sizes can be specified by
changing the values in the FileReaderLogging.properties file. This file is in the c:\Program
Files\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\config (Windows)
or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux)
directory. By default, the values are:
■ com.vontu.icap.log.IcapOperationalLogHandler.limit = 5000000

■ com.vontu.icap.log.IcapOperationalLogHandler.count = 5
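The limit and count values together bound the disk space that these log files use. Assuming that limit is the approximate maximum size of one file in bytes and count is the number of rotated files (the java.util.logging FileHandler convention), the defaults allow about 5 × 5,000,000 = 25,000,000 bytes, or roughly 25 MB. The following Python sketch reads the two values from the text of a properties file and computes that bound. The function names are hypothetical.

```python
import re


def rotation_settings(text, handler):
    """Return (limit, count) for handler from properties text; a value that
    is not set is returned as None."""
    values = {"limit": None, "count": None}
    pattern = re.compile(
        r"^[ \t]*" + re.escape(handler) + r"\.(limit|count)[ \t]*=[ \t]*(\d+)[ \t]*$",
        re.MULTILINE)
    for match in pattern.finditer(text):
        values[match.group(1)] = int(match.group(2))
    return values["limit"], values["count"]


def approx_max_bytes(limit, count):
    """Approximate upper bound on disk use for a rotating file set, assuming
    limit is the per-file size in bytes and count is the number of files."""
    return limit * count


defaults = """
com.vontu.icap.log.IcapOperationalLogHandler.limit = 5000000
com.vontu.icap.log.IcapOperationalLogHandler.count = 5
"""
limit, count = rotation_settings(defaults, "com.vontu.icap.log.IcapOperationalLogHandler")
print(approx_max_bytes(limit, count))  # 25000000, roughly 25 MB
```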

Table 15-6 lists the Network Prevent for Web-defined operational logging codes by category.
Placeholders in the message text, such as conn_id, icap_bind_port, and N, represent event parameters.

Table 15-6 Status codes for Network Prevent for Web operational logs

Code Text and Description

Operational Events

1100 Starting Network Prevent for Web

1101 Shutting down Network Prevent for Web

Connectivity Events

1200 Listening for incoming connections at icap_bind_address:icap_bind_port

Where:

■ icap_bind_address is the Network Prevent for Web bind address to which the server listens.
This address is specified with the Icap.BindAddress Advanced Setting.
■ icap_bind_port is the port at which the server listens. This port is set in the Server >
Configure page.

1201 Connection (id=conn_id) opened from host(icap_client_ip:icap_client_port)

Where:

■ conn_id is the connection ID that is allocated to this connection. This ID can be helpful
for correlating entries across multiple logs.
■ icap_client_ip and icap_client_port are the IP address and port of the proxy from which the
connection to Network Prevent for Web was made.
1202 Connection (id=conn_id) closed (close_reason)

Where:

■ conn_id is the connection ID that is allocated to the connect operation.

■ close_reason provides the reason for closing the connection.

1203 Connection states: REQMOD=N, RESPMOD=N, OPTIONS=N, OTHERS=N

Where N indicates the number of connections in each state at the time the message was logged.

This message provides the system state in terms of connection management. It is logged
whenever a connection is opened or closed.

Connectivity Errors

5200 Failed to create listener at icap_bind_address:icap_bind_port

Where:

■ icap_bind_address is the Network Prevent for Web bind address to which the server listens.
This address can be specified with the Icap.BindAddress Advanced Setting.
■ icap_bind_port is the port at which the server listens. This port is set on the Server >
Configure page.

5201 Connection was rejected from unauthorized host (host_ip:port)

Where host_ip and port are the IP address and port of the proxy system from which a connection
attempt to Network Prevent for Web was made. A host that is not listed in the Icap.AllowHosts
Advanced Setting cannot form a connection.

See “About log files” on page 333.

Network Prevent for Web access log files and fields

Network Prevent for Web log file names use the format of WebPrevent_AccessX.log (where
X is a number). The number of files that are stored and their sizes can be specified by changing
the values in the FileReaderLogging.properties file. By default, the values are:
■ com.vontu.icap.log.IcapAccessLogHandler.limit = 5000000

■ com.vontu.icap.log.IcapAccessLogHandler.count = 5

A Network Prevent for Web access log is similar to a proxy server’s web access log. The “start”
log message format is:
# Web Prevent starting: start_time

Where start_time format is date:time, for example: 13/Aug/2018:03:11:22:015-0700.

The description message format is:

# host_ip "auth_user" time_stamp "request_line" icap_status_code request_size
"referer" "user_agent" processing_time(ms) conn_id client_ip client_port action_code
icap_method_code traffic_source_code

Table 15-7 lists the fields. The values of fields that are enclosed in quotes in this example are
quoted in an actual message. If field values cannot be determined, the message displays -
or "" as a default value.

Table 15-7 Network Prevent for Web access log fields

Fields Explanation

host_ip IP address of the host that made the request.

auth_user Authorized user for this request.

time_stamp Time that Network Prevent for Web receives the request.

request_line Line that represents the request.

icap_status_code ICAP response code that Network Prevent for Web sends for this
request.

request_size Request size in bytes.

referer Value of the Referer header in the request, which contains the URI from which
this request came.

user_agent User agent that is associated with the request.

processing_time (milliseconds) Request processing time in milliseconds. This value is the
total of the receiving, content inspection, and sending times.

conn_id Connection ID associated with the request.

client_ip IP of the ICAP client (proxy).

client_port Port of the ICAP client (proxy).

action_code An integer that represents the action that Network Prevent for Web takes.
The action code is one of the following:

■ 0 = UNKNOWN
■ 1 = ALLOW
■ 2 = BLOCK
■ 3 = REDACT
■ 4 = ERROR
■ 5 = ALLOW_WITHOUT_INSPECTION
■ 6 = OPTIONS_RESPONSE
■ 7 = REDIRECT

icap_method_code An integer that represents the ICAP method that is associated with this
request. The ICAP method code is one of the following:

■ -1 = ILLEGAL
■ 0 = OPTIONS
■ 1 = REQMOD
■ 2 = RESPMOD
■ 3 = LOG

traffic_source_code An integer that represents the source of the network traffic. The
traffic source code is one of the following:

■ 1 = WEB
■ 2 = UNKNOWN
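The following Python sketch splits one access log entry into the fields of Table 15-7 and decodes the three numeric codes. It is an illustration under assumptions: entries are whitespace-separated, the quoted fields use double quotes as shown in the format above, time_stamp contains no spaces, and lines that begin with # are the start marker or the format description rather than entries. The sample entry is invented.

```python
import shlex

# Field order from the description message format shown above.
FIELDS = ["host_ip", "auth_user", "time_stamp", "request_line",
          "icap_status_code", "request_size", "referer", "user_agent",
          "processing_time_ms", "conn_id", "client_ip", "client_port",
          "action_code", "icap_method_code", "traffic_source_code"]

ACTIONS = {0: "UNKNOWN", 1: "ALLOW", 2: "BLOCK", 3: "REDACT", 4: "ERROR",
           5: "ALLOW_WITHOUT_INSPECTION", 6: "OPTIONS_RESPONSE", 7: "REDIRECT"}
ICAP_METHODS = {-1: "ILLEGAL", 0: "OPTIONS", 1: "REQMOD", 2: "RESPMOD", 3: "LOG"}
TRAFFIC_SOURCES = {1: "WEB", 2: "UNKNOWN"}


def decode(value, table):
    """Map a numeric code to its name; None if the value is "-" or empty."""
    try:
        return table.get(int(value), value)
    except ValueError:
        return None


def parse_access_line(line):
    """Split one access log entry into named fields, or return None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line, start marker, or format description
    values = shlex.split(line)  # honors the double-quoted fields
    if len(values) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, values))
    record["action"] = decode(record["action_code"], ACTIONS)
    record["icap_method"] = decode(record["icap_method_code"], ICAP_METHODS)
    record["traffic_source"] = decode(record["traffic_source_code"], TRAFFIC_SOURCES)
    return record


# Invented sample entry for illustration only.
sample = ('10.1.2.3 "-" 19/Aug/2019:10:15:01:015-0700 '
          '"POST http://example.com/upload HTTP/1.1" 200 5120 '
          '"http://example.com/form" "Mozilla/5.0" 42 17 10.0.0.5 51123 2 1 1')
entry = parse_access_line(sample)
print(entry["action"], entry["icap_method"], entry["traffic_source"])
```

shlex raises ValueError on an unbalanced quote, so a production parser would need to catch that for malformed lines.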

See “About log files” on page 333.

Network Prevent for Web protocol debug log files

To enable ICAP trace logging, set the Icap.EnableTrace advanced setting to true and use
the Icap.TraceFolder advanced setting to specify a directory to receive the traces. You must
restart the Symantec Data Loss Prevention service for this change to take effect.
Trace files that are placed in the specified directory have file names in the format
timestamp-conn_id. The first line of a trace file provides information about the connecting host
IP and port along with a timestamp. Data that is read from the socket is displayed in the
format <<timestamp number_of_bytes_read. Data that is written to the socket is displayed
in the format >>timestamp number_of_bytes_written. The last line should indicate that the
connection has been closed.
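You can summarize a trace file by adding up the byte counts on its << (read) and >> (written) lines. This Python sketch assumes that each such line starts with the marker, followed by a timestamp without spaces and then the byte count, and that no traced data line happens to begin with << or >>. The sample lines and function names are invented.

```python
import re

# Assumed line shapes: "<<timestamp bytes" and ">>timestamp bytes".
READ_RE = re.compile(r"^<<(\S+)\s+(\d+)")
WRITE_RE = re.compile(r"^>>(\S+)\s+(\d+)")


def parse_trace_name(filename):
    """Split a trace file name of the form timestamp-conn_id."""
    timestamp, conn_id = filename.rsplit("-", 1)
    return timestamp, conn_id


def summarize_trace(lines):
    """Total the bytes read from and written to the socket."""
    totals = {"read": 0, "written": 0}
    for line in lines:
        read = READ_RE.match(line)
        if read:
            totals["read"] += int(read.group(2))
            continue
        written = WRITE_RE.match(line)
        if written:
            totals["written"] += int(written.group(2))
    return totals


# Invented sample content.
sample = ["Connection from 10.0.0.5:51123 at 20190819101501",
          "<<20190819101501 512",
          ">>20190819101502 128",
          "<<20190819101503 64",
          "Connection closed"]
print(parse_trace_name("20190819101501-17"), summarize_trace(sample))
```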
Note: Trace logging produces a large amount of data and therefore requires a large amount
of free disk storage space. Trace logging should be used only for debugging an issue because
the data that is written in the file is in clear text.

See “About log files” on page 333.

Network Prevent for Email log levels

Network Prevent for Email log file names use the format of EmailPrevent_OperationalX.log
(where X is a number). The number of files that are stored and their sizes can be specified by
changing the values in the FileReaderLogging.properties file. By default, the values are:
■ com.vontu.mta.log.SmtpOperationalLogHandler.limit = 5000000

■ com.vontu.mta.log.SmtpOperationalLogHandler.count = 5

At various log levels, components in the com.vontu.mta.rp package output varying levels of
detail. The com.vontu.mta.rp.level setting in the RequestProcessorLogging.properties file
specifies the log level. This file is in the c:\Program
Files\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\config (Windows)
or /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config (Linux)
directory. For example, com.vontu.mta.rp.level = FINE specifies the FINE level of detail.
Table 15-8 describes the Network Prevent for Email log levels.

Table 15-8 Network Prevent for Email log levels

Level Guidelines

INFO General events: connect and disconnect notices, information on the messages that are
processed per connection.

FINE Some additional execution tracing information.

FINER Envelope command streams, message headers, detection results.

FINEST Complete message content, deepest execution tracing, and error tracing.

See “About log files” on page 333.

Network Prevent for Email operational log codes

Table 15-9 lists the defined Network Prevent for Email operational logging codes by category.
Table 15-9 Status codes for Network Prevent for Email operational log

Code Description

Core Events

1100 Starting Network Prevent for Email

1101 Shutting down Network Prevent for Email

1102 Reconnecting to FileReader (tid=id)

Where id is the thread identifier.

The RequestProcessor attempts to re-establish its connection with the FileReader for detection.

1103 Reconnected to the FileReader successfully (tid=id)

The RequestProcessor was able to re-establish its connection to the FileReader.

Core Errors

5100 Could not connect to the FileReader (tid=id timeout=.3s)

An attempt to re-connect to the FileReader failed.

5101 FileReader connection lost (tid=id)

The RequestProcessor connection to the FileReader was lost.

Connectivity Events

1200 Listening for incoming connections (local=hostname)

Where hostname is an IP address or fully qualified domain name.

1201 Connection accepted (tid=id cid=N local=hostname:port remote=hostname:port)

Where N is the connection identifier.

1202 Peer disconnected (tid=id cid=N local=hostname:port remote=hostname:port)

1203 Forward connection established (tid=id cid=N local=hostname:port remote=hostname:port)
1204 Forward connection closed (tid=id cid=N local=hostname:port remote=hostname:port)

1205 Service connection closed (tid=id cid=N local=hostname:port remote=hostname:port
messages=1 time=0.14s)

Connectivity Errors

5200 Connection is rejected from the unauthorized host (tid=id local=hostname:port
remote=hostname:port)

5201 Local connection error (tid=id cid=N local=hostname:port remote=hostname:port
reason=Explanation)

5202 Sender connection error (tid=id cid=N local=hostname:port remote=hostname:port
reason=Explanation)

5203 Forwarding connection error (tid=id cid=N local=hostname:port remote=hostname:port
reason=Explanation)

5204 Peer disconnected unexpectedly (tid=id cid=N local=hostname:port remote=hostname:port
reason=Explanation)

5205 Could not create listener (address=local=hostname:port reason=Explanation)

5206 Authorized MTAs contains invalid hosts: hostname, hostname, ...

5207 MTA restrictions are active, but no MTAs are authorized to communicate with this host
5208 TLS handshake failed (reason=Explanation tid=id cid=N local=hostname remote=hostname)

5209 TLS handshake completed (tid=id cid=N local=hostname remote=hostname)

5210 All forward hosts unavailable (tid=id cid=N reason=Explanation)

5211 DNS lookup failure (tid=id cid=N NextHop=hostname reason=Explanation)

5303 Failed to encrypt incoming message (tid=id cid=N local=hostname remote=hostname)

5304 Failed to decrypt outgoing message (tid=id cid=N local=hostname remote=hostname)

Message Events

1300 Message complete (cid=N message_id=3 dlp_id=message_identifier
size=number sender=email_address recipient_count=N
disposition=response estatus=statuscode rtime=N
dtime=N mtime=N

Where:

■ recipient_count is the total number of addressees in the To, CC, and BCC fields.
■ response is the Network Prevent for Email response, which can be one of: PASS, BLOCK,
BLOCK_AND_REDIRECT, REDIRECT, MODIFY, or ERROR.
■ estatus is an Enhanced Status code.
See “Network Prevent for Email originated responses and codes” on page 359.
■ rtime is the time in seconds for Network Prevent for Email to fully receive the message
from the sending MTA.
■ dtime is the time in seconds for Network Prevent for Email to perform detection on
the message.
■ mtime is the total time in seconds for Network Prevent for Email to process the
message.

Message Errors
5300 Error while processing message (cid=N message_id=header_ID
dlp_id=message_identifier size=0 sender=email_address
recipient_count=N disposition=response estatus=statuscode
rtime=N dtime=N mtime=N reason=Explanation

Where header_ID is the RFC 822 Message-ID header, if one exists.

5301 Sender rejected during re-submit

5302 Recipient rejected during re-submit

See “About log files” on page 333.
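The parameters of the 1300 and 5300 messages are key=value pairs, so you can extract them with a regular expression. This Python sketch assumes that values contain no spaces (a multiword reason=Explanation would be cut at the first space) and that time values may carry an optional trailing s, as in code 1205. The sample line is invented.

```python
import re

# key=value pairs; values are assumed to contain no spaces.
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def parse_message_fields(line):
    """Extract the key=value parameters from a 1300 or 5300 message."""
    fields = dict(PAIR_RE.findall(line))
    for key in ("rtime", "dtime", "mtime"):
        if key in fields:
            # Strip an optional trailing "s" unit or closing parenthesis.
            fields[key] = float(fields[key].rstrip("s)"))
    if "recipient_count" in fields:
        fields["recipient_count"] = int(fields["recipient_count"].rstrip(")"))
    return fields


# Invented sample line for illustration only.
sample = ("Message complete (cid=4 message_id=<abc@example.com> dlp_id=a1b2 "
          "size=2048 sender=alice@example.com recipient_count=3 "
          "disposition=BLOCK estatus=5.7.1 rtime=0.12 dtime=0.30 mtime=0.45")
fields = parse_message_fields(sample)
print(fields["disposition"], fields["estatus"], fields["dtime"])
```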

Network Prevent for Email originated responses and codes

Network Prevent for Email originates the following responses. Other protocol responses are
expected as Network Prevent for Email relays command stream responses from the forwarding
MTA to the sending MTA. Table 15-10 shows the responses that occur in situations where
Network Prevent must override the receiving MTA. It also shows the situations where Network
Preventgenerates a specific response to an event that is not relayed from downstream.
“Enhanced Status” is the RFC1893 Enhanced Status Code associated with the response.

Table 15-10 Network Prevent for Email originated responses

Code | Enhanced Status | Text | Description

250 | 2.0.0 | Ok: Carry on.
Success code that Network Prevent for Email uses.

221 | 2.0.0 | Service closing.
The normal connection termination code that Network Prevent for Email generates if a
QUIT request is received when no forward MTA connection is active.

451 | 4.3.0 | Error: Processing error.
This “general, transient” error response is issued when a (potentially) recoverable
error condition arises and a more specific error response is not available. Forward
connections are sometimes closed, and their unexpected termination is occasionally a
cause of a code 451, status 4.3.0. However, sending connections should remain open when
such a condition arises unless the sending MTA chooses to terminate.
421 | 4.3.0 | Fatal: Processing error. Closing connection.
This “general, terminal” error response is issued when a fatal, unrecoverable error
condition arises. This error results in the immediate termination of any sender or
receiver connections.

421 | 4.4.1 | Fatal: Forwarding agent unavailable.
An attempt to connect to the forward MTA was refused or otherwise failed to establish
properly.

421 | 4.4.2 | Fatal: Connection lost to forwarding agent.
Closing connection. The forward MTA connection is lost in a state where further
conversation with the sending MTA is not possible. The loss usually occurs in the middle
of message header or body buffering. The connection is terminated immediately.

451 | 4.4.2 | Error: Connection lost to forwarding agent.
The forward MTA connection was lost in a state that may be recoverable if the
connection can be re-established. The sending MTA connection is maintained unless the
sending MTA chooses to terminate.

421 | 4.4.7 | Error: Request timeout exceeded.
The last command issued did not receive a response within the time window that is
defined in RequestProcessor.DefaultCommandTimeout. (If the command issued was the “.”,
the time window may instead come from RequestProcessor.DotCommandTimeout.) The
connection is closed immediately.

421 | 4.4.7 | Error: Connection timeout exceeded.
The connection was idle (no commands actively awaiting response) in excess of the time
window that is defined in RequestProcessor.DefaultCommandTimeout.
501 | 5.5.2 | Fatal: Invalid transmission request.
A fatal violation of the SMTP protocol (or the constraints that are placed on it)
occurred. The violation is not expected to change on a resubmitted message attempt. This
message is only issued in response to a single command or data line that exceeds the
boundaries that are defined in RequestProcessor.MaxLineLength.

502 | 5.5.1 | Error: Unrecognized command.
Defined but not currently used.

550 | 5.7.1 | User supplied
This combination of code and status indicates that a Blocking response rule has been
engaged. The text that is returned is supplied as part of the response rule definition.

Note that a 4xx code and a 4.x.x enhanced status indicate a temporary error. In such cases
the MTA can resubmit the message to the Network Prevent for Email Server. A 5xx code and
a 5.x.x enhanced status indicate a permanent error. In such cases the MTA should treat the
message as undeliverable.
See “About log files” on page 333.
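The temporary/permanent distinction in the note above follows the class digit of the reply code and of the enhanced status (4 for transient, 5 for permanent). As an illustration only, the following Python sketch shows how a sending MTA or a log-analysis script might classify the responses in Table 15-10. It is not part of Symantec Data Loss Prevention; the names `parse_reply` and `next_action` and the action labels are hypothetical, and the sketch assumes each reply line has the form `<code> <enhanced status> <text>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmtpReply:
    """One reply line, such as '421 4.4.7 Error: Request timeout exceeded.'"""
    code: int
    enhanced_status: str
    text: str

    @property
    def is_temporary(self) -> bool:
        # 4xx reply code (4.x.x enhanced status): transient failure.
        return self.code // 100 == 4

    @property
    def is_permanent(self) -> bool:
        # 5xx reply code (5.x.x enhanced status): permanent failure.
        return self.code // 100 == 5


def parse_reply(line: str) -> SmtpReply:
    """Split a reply line into reply code, enhanced status, and free text."""
    code_str, status, text = line.strip().split(" ", 2)
    # The class digit of the enhanced status should agree with the reply code.
    if status.split(".")[0] != code_str[0]:
        raise ValueError(f"code {code_str} and status {status} disagree")
    return SmtpReply(int(code_str), status, text)


def next_action(line: str) -> str:
    """Hypothetical decision an upstream MTA could make for a given reply."""
    reply = parse_reply(line)
    if reply.is_temporary:
        return "queue-and-retry"  # the MTA can resubmit the message later
    if reply.is_permanent:
        return "bounce"  # treat the message as undeliverable
    return "continue"  # 2xx/3xx: not an error reply
```

For example, `next_action("451 4.4.2 Error: Connection lost to forwarding agent.")` returns `"queue-and-retry"`, while a 550 5.7.1 reply from a Blocking response rule returns `"bounce"`.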
Chapter 16
Using Symantec Data Loss Prevention utilities
This chapter includes the following topics:

■ About Symantec Data Loss Prevention utilities

■ About Endpoint utilities

■ About DBPasswordChanger

About Symantec Data Loss Prevention utilities

Symantec provides a suite of utilities to help users perform tasks that are done infrequently.
The utilities are typically used for troubleshooting and maintenance tasks. They are also used
to prepare data and files for use with the Symantec Data Loss Prevention software.
The Symantec Data Loss Prevention utilities are provided for both Windows and Linux operating
systems. You use the command line to run the utilities on both operating systems. The utilities
operate in a similar manner regardless of operating system.
Table 16-1 describes how and when to use each utility.

Table 16-1 Symantec Data Loss Prevention utilities

Name Description

DBPasswordChanger Changes the encrypted password that the Enforce Server uses to connect to the Oracle
database.

See “About DBPasswordChanger” on page 364.

sslkeytool Generates custom authentication keys to improve the security of the data that is transmitted
between the Enforce Server and detection servers. The custom authentication keys must be
copied to each Symantec Data Loss Prevention server.

See the topic "About the sslkeytool utility and server certificates" in the Symantec Data Loss
Prevention Installation Guide.

SQL Preindexer Indexes an SQL database or runs an SQL query on specific data tables within the database.
This utility is designed to pipe its output directly to the Remote EDM Indexer utility.

See “About the SQL Preindexer for EDM” on page 586.

Remote EDM Indexer Converts a comma-separated or tab-delimited data file into an exact data matching index.
The utility can be run on a remote machine to provide the same indexing functionality that is
available locally on the Enforce Server.

This utility is often used with the SQL Preindexer. The SQL Preindexer can run an SQL query
and pass the resulting data directly to the Remote EDM Indexer to create an EDM index.

See “About the Remote EDM Indexer” on page 586.

About Endpoint utilities

Table 16-2 describes those utilities that apply to the Endpoint products.
See “About agent password management” on page 2489.

Table 16-2 Endpoint utilities

Name Description

Service_Shutdown.exe This utility enables an administrator to turn off both the agent and the watchdog services on
an endpoint. (As a tamper-proofing measure, it is not possible for a user to stop either the
agent or the watchdog service.)

See “Shutting down the agent and the watchdog services on Windows endpoints” on page 2492.

Vontu_sqlite3.exe This utility provides an SQL interface that enables you to view or modify the encrypted
database files that the Symantec DLP Agent uses. Use this tool when you want to investigate
or make changes to the Symantec Data Loss Prevention files.

See “Inspecting the database files accessed by the agent” on page 2493.

Logdump.exe This tool lets you view the Symantec DLP Agent extended log files, which are hidden for
security reasons.

See “Viewing extended log files” on page 2494.

Start_agent This utility enables an administrator to start DLP Agents on Mac endpoints that were
shut down using the shutdown task.

See “Starting DLP Agents that run on Mac endpoints” on page 2499.

About DBPasswordChanger
Symantec Data Loss Prevention stores encrypted passwords to the Oracle database in a file
that is called DatabasePassword.properties, located in C:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config (Windows)
or /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config (Linux).
Because the contents of the file are encrypted, you cannot directly modify the file. The
DBPasswordChanger utility changes the stored Oracle database passwords that the Enforce
Server uses.
Before you use DBPasswordChanger to change the Oracle database password, you must:
■ Shut down the Enforce Server.
■ Change the Oracle database password using Oracle utilities.
See “Example of using DBPasswordChanger” on page 365.

DBPasswordChanger syntax
The DBPasswordChanger utility uses the following syntax:

DBPasswordChanger password_file new_oracle_password

All command-line parameters are required. The following table describes each command-line
parameter.
See “Example of using DBPasswordChanger” on page 365.

Table 16-3 DBPasswordChanger command-line parameters

Parameter Description

password_file Specifies the file that contains the encrypted password. By default, this file is
named DatabasePassword.properties and is stored in
C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config (Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config (Linux).

new_oracle_password Specifies the new Oracle password to encrypt and store.

Example of using DBPasswordChanger

If Symantec Data Loss Prevention was installed in the default location, then the
DBPasswordChanger utility is located at C:\Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\bin (Windows) or
/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/bin (Linux). You must
be an Administrator (or root) to run DBPasswordChanger.
For example, type:

DBPasswordChanger \Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Datab
protect_oracle

See “DBPasswordChanger syntax” on page 364.
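The syntax and default paths above can be combined into a small wrapper. The following Python sketch is a hedged illustration, not a Symantec-supplied tool: `build_command` and `change_stored_password` are hypothetical names, and the sketch assumes that the utility in the `bin` directory can be invoked as `DBPasswordChanger` (check the actual file name and extension on your system). It only builds and runs the documented two-argument command; it does not shut down the Enforce Server or change the password in Oracle, both of which you must do first.

```python
import subprocess
from pathlib import PurePosixPath, PureWindowsPath

# Default 15.5 installation locations, as documented in this section.
WINDOWS_BASE = PureWindowsPath(
    r"C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect")
LINUX_BASE = PurePosixPath("/opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect")


def build_command(new_oracle_password: str, windows: bool) -> list:
    """Return argv for: DBPasswordChanger password_file new_oracle_password."""
    if not new_oracle_password:
        raise ValueError("new_oracle_password is required")
    base = WINDOWS_BASE if windows else LINUX_BASE
    # Assumption: the executable in bin is invoked as "DBPasswordChanger".
    tool = base / "bin" / "DBPasswordChanger"
    password_file = base / "config" / "DatabasePassword.properties"
    return [str(tool), str(password_file), new_oracle_password]


def change_stored_password(new_oracle_password: str, windows: bool) -> None:
    # Run as Administrator (or root), only after the Enforce Server is shut
    # down and the password has already been changed with Oracle utilities.
    subprocess.run(build_command(new_oracle_password, windows), check=True)
```

Because the new password is passed as a command-line argument, other users on the host may be able to see it in the process list while the command runs.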

Section 4
Authoring policies

■ Chapter 17. Introduction to policies

■ Chapter 18. Overview of policy detection

■ Chapter 19. Creating policies from templates

■ Chapter 20. Configuring policies

■ Chapter 21. Administering policies

■ Chapter 22. Best practices for authoring policies

■ Chapter 23. Increasing the Inspection Content Size

■ Chapter 24. Installing remote indexers

■ Chapter 25. Detecting content using Exact Match Data Identifiers (EMDI)

■ Chapter 26. Detecting content using Exact Data Matching (EDM)

■ Chapter 27. Detecting content using Indexed Document Matching (IDM)

■ Chapter 28. Detecting content using Vector Machine Learning (VML)

■ Chapter 29. Detecting content using Form Recognition - Sensitive Image Recognition

■ Chapter 30. Detecting Content using OCR - Sensitive Image Recognition

■ Chapter 31. Detecting content using data identifiers

■ Chapter 32. Detecting content using keyword matching

■ Chapter 33. Detecting content using regular expressions

■ Chapter 34. Detecting content using classification matching

■ Chapter 35. Detecting international language content

■ Chapter 36. Detecting file properties

■ Chapter 37. Detecting network incidents

■ Chapter 38. Detecting endpoint events

■ Chapter 39. Detecting described identities

■ Chapter 40. Detecting synchronized identities

■ Chapter 41. Detecting profiled identities

■ Chapter 42. Using contextual attributes for Application Detection

■ Chapter 43. Supported file formats for detection

■ Chapter 44. Supported Office Open XML formats for high-performance content extraction

■ Chapter 45. Library of system data identifiers

■ Chapter 46. Library of policy templates

Chapter 17
Introduction to policies
This chapter includes the following topics:

■ About Data Loss Prevention policies

■ Policy components

■ Policy templates

■ Solution packs

■ Policy groups

■ Policy deployment

■ Policy severity

■ Policy authoring privileges

■ Data Profiles

■ User Groups

■ Policy template import and export

■ Workflow for implementing policies

■ Viewing, printing, and downloading policy details

About Data Loss Prevention policies

You implement policies to detect and prevent data loss. A Symantec Data Loss Prevention
policy combines detection rules and response actions. If a policy rule is violated, the system
generates an incident that you can report and act on. The policy rules you implement are
based on your information security objectives. The actions you take in response to policy
violations are based on your compliance requirements. The Enforce Server administration
console provides an intuitive, centralized, Web-based interface for authoring policies.
See “Workflow for implementing policies” on page 378.
Table 17-1 describes the policy authoring features provided by Symantec Data Loss Prevention.

Table 17-1 Policy authoring features

Feature Description

Intuitive policy building
The policy builder interface supports Boolean logic for detection configuration.
You can combine different detection methods and technologies in a single policy.
See “Detecting data loss” on page 381.
See “Best practices for authoring policies” on page 449.

Decoupled response rules
The system stores response rules and policies as separate entities.
You can manage and update response rules without having to change policies; you can reuse
response rules across policies.
See “About response rules” on page 1738.

Fine-grained policy reporting
The system provides severity levels for policy violations.
You can report the overall severity of a policy violation by the highest severity.
See “Policy severity” on page 374.

Centralized data and group profiling
The system stores data and group profiles separately from policies.
This separation enables you to manage and update profiles without changing policies.
See “Data Profiles” on page 375.
See “User Groups” on page 376.

Template-based policy detection
The system provides 65 pre-built policy templates.
You can use these templates to quickly configure and deploy policies.
See “Policy templates” on page 371.

Policy sharing
The system supports policy template import and export.
You can share policy templates across environments and systems.
See “Policy template import and export” on page 377.

Role-based access control
The system provides role-based access control for various user and administrative functions.
You can create roles for policy authoring, policy administration, and response rule authoring.
See “Policy authoring privileges” on page 375.

Policy components
A valid policy has at least one detection or group rule with at least one match condition.
Response rules are optional policy components.
Table 17-2 describes Data Loss Prevention policy components.

Table 17-2 Policy components

Component Use Description

Policy group (Required)
A policy must be assigned to a single Policy Group.
See “Policy groups” on page 372.

Policy name (Required)
The policy name must be unique within the Policy Group.
See “Manage and add policies” on page 432.

Policy rule (Required)
A valid policy must contain at least one rule that declares at least one match condition.
See “Policy matching conditions” on page 386.

Data Profile (May be required)
Exact Data Matching (EDM), Indexed Document Matching (IDM), Vector Machine Learning (VML),
and Form Recognition policies all require Data Profiles.
See “Data Profiles” on page 375.

User group (May be required)
A policy requires a User Group only if a group method in the policy requires it.
Synchronized DGM rules and exceptions require a User Group.
See “User Groups” on page 376.

Policy description (Optional)
A policy description helps users identify the purpose of the policy.
See “Configuring policies” on page 413.

Policy label (Optional)
A policy label helps Veritas Data Insight business users identify the purpose of the policy
when using the Self-Service Portal.
See “Configuring policies” on page 413.

Response Rule (Optional)
A policy can implement one or more response rules to report and remediate incidents.
See “About response rules” on page 1738.

Policy exception (Optional)
A policy can contain one or more exceptions to exclude data from matching.
See “Exception conditions” on page 393.

Compound match conditions (Optional)
A policy rule or exception can implement multiple match conditions.
See “Compound conditions” on page 394.

Policy templates
Symantec Data Loss Prevention provides policy templates to help you quickly deploy detection
policies in your enterprise. You can share policies across systems and environments by
importing and exporting policy rules and exceptions as templates.
Using policy templates saves you time and helps you avoid errors and information gaps in
your policies because the detection methods are predefined. You can edit a template to create
a policy that precisely suits your needs. You can also export and import your own policy
templates.
Some policy templates are based on well-known sets of regulations, such as the Payment
Card Industry Data Security Standard, Gramm-Leach-Bliley, California SB1386, and HIPAA. Other
policy templates are more generic, such as Customer Data Protection, Employee Data
Protection, and Encrypted Data. Although the regulation-based templates can help address
the requirements of the relevant regulations, consult with your legal counsel to verify compliance.
See “Creating a policy from a template” on page 397.
Table 17-3 describes the system-defined policy templates provided by Symantec Data Loss
Prevention.

Table 17-3 System-defined policy templates

Policy template type Description

US Regulatory Enforcement See “US Regulatory Enforcement policy templates” on page 400.

General Data Protection Regulation See “General Data Protection Regulation (GDPR) policy templates”
on page 402.

International Regulatory Enforcement See “International Regulatory Enforcement policy templates” on page 403.

Customer and Employee Data Protection See “Customer and Employee Data Protection policy templates”
on page 404.

Confidential or Classified Data Protection See “Confidential or Classified Data Protection policy templates”
on page 405.

Network Security Enforcement See “Network Security Enforcement policy templates” on page 406.

Acceptable Use Enforcement See “Acceptable Use Enforcement policy templates” on page 407.

Imported Templates See “Policy template import and export” on page 377.

Solution packs
Symantec Data Loss Prevention provides solution packs for several industry verticals. A
solution pack contains configured policies, response rules, user roles, reports, protocols, and
the incident statuses that support a particular industry or organization. For a list of available
solution packs and instructions, refer to chapter 4, "Importing a solution pack" in the Symantec
Data Loss Prevention Installation Guide. You can import only one solution pack to an Enforce
Server.
After you import the solution pack, start by reviewing its policies. By default, the solution
pack activates the policies that it provides.
See “Manage and add policies” on page 432.

Policy groups
You deploy policies to detection servers using policy groups. Policy groups limit the policies,
incidents, and detection mechanisms that are accessible to specific users.
Each policy belongs to one policy group. When you configure a policy, you assign it to a policy
group. You can change the policy group assignment, but you cannot assign a policy to more
than one policy group. You deploy policy groups to one or more detection servers.
The Enforce Server is configured with a single policy group called the Default Policy Group.
The system deploys the default policy group to all detection servers. If you define a new policy,
the system assigns the policy to the default policy group, unless you create and specify a
different policy group. You can change the name of the default policy group. A solution pack
creates several policy groups and assigns policies to them.
After you create a policy group, you can link policies, Discover targets, and roles to the policy
group. When you create a Discover target, you must associate it with a single policy group.
When you associate a role with particular policy groups, you can restrict users in that role to those policy groups.
Policies in that policy group detect incidents and report them to users in the role that is assigned
to that policy group.
The relationship between policy groups and detection servers depends on the server type.
You can deploy a policy group to one or more Network Monitor, Network Prevent, or Endpoint
Servers. Policy groups that you deploy to an Endpoint Server apply to any DLP Agent that is
registered with that server. The Enforce Server automatically associates all policy groups with
all Network Discover Servers.
For Network Monitor and Network Prevent, each policy group is assigned to one or more
Network Monitor Servers, Network Prevent for Email Servers, or Network Prevent for Web
Servers. For Network Discover, policy groups are assigned to individual Discover targets. A
single detection server may handle as many policy groups as necessary to scan its targets.
For Endpoint Monitor, policy groups are assigned to the Endpoint Server and apply to all
registered DLP Agents.
See “Manage and add policy groups” on page 435.
See “Creating and modifying policy groups” on page 436.
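To make the relationships described above concrete, here is a minimal Python model under simplified assumptions: each policy belongs to exactly one policy group, the Default Policy Group is deployed to every detection server (as on a new system), other groups run only on the servers they are deployed to, and Network Discover Servers receive all policy groups. Discover targets and role-based access are omitted, policies are identified by a hypothetical policy ID, and all class and method names are hypothetical rather than a Symantec API.

```python
from dataclasses import dataclass, field

DEFAULT_GROUP = "Default Policy Group"


@dataclass(frozen=True)
class DetectionServer:
    name: str
    server_type: str  # e.g. "Network Monitor", "Network Discover", "Endpoint"


@dataclass
class PolicyGroup:
    name: str
    deployed_to: set = field(default_factory=set)  # detection server names


class PolicyRegistry:
    """Hypothetical model of policy, policy group, and server relationships."""

    def __init__(self) -> None:
        self.groups = {DEFAULT_GROUP: PolicyGroup(DEFAULT_GROUP)}
        self.group_of = {}  # policy ID -> its single policy group

    def add_group(self, name: str) -> None:
        self.groups.setdefault(name, PolicyGroup(name))

    def assign(self, policy_id: str, group: str = DEFAULT_GROUP) -> None:
        # A policy belongs to exactly one group; reassigning replaces the old one.
        if group not in self.groups:
            raise KeyError(f"unknown policy group: {group}")
        self.group_of[policy_id] = group

    def deploy(self, group: str, server: DetectionServer) -> None:
        self.groups[group].deployed_to.add(server.name)

    def policies_for(self, server: DetectionServer) -> set:
        if server.server_type == "Network Discover":
            # All policy groups are associated with all Network Discover Servers.
            active = set(self.groups)
        else:
            # Default group everywhere; other groups only where deployed.
            active = {DEFAULT_GROUP} | {
                g.name for g in self.groups.values() if server.name in g.deployed_to}
        return {p for p, g in self.group_of.items() if g in active}
```

A Network Monitor Server in this model runs only the default group's policies until you deploy another group to it, which mirrors the "Inbound"/"Outbound" style of allocation described in the next section.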

Policy deployment
You can use policy groups to organize and deploy your policies in different ways. For example,
consider a situation in which your detection servers are set up across a system that spans
several countries. You can use policy groups to ensure that a detection server runs only the
policies that are valid for a specific location.
You can dedicate some of your detection servers to monitor internal network traffic and dedicate
others to monitor network exit points. You can use policy groups to deploy less restrictive
policies to servers that monitor internal traffic. At the same time, you can deploy stricter policies
to servers that monitor traffic leaving your network.
You can use policy groups to organize policies and incidents by business units, departments,
geographic regions, or any other organizational unit. For example, policy groups for specific
departments may be appropriate where security responsibilities are distributed among various
groups. In such cases, policy groups provide for role-based access control over the viewing
and editing of incidents. You deploy policy groups according to the required division of access
rights within your organization (for example, by business unit).
You can use policy groups for detection-server allocation, which may be more common where
security departments are centralized. In these cases, you would carefully choose the detection
server allocation for each role and reflect the server name in the policy group name. For
example, you might name the groups Inbound and Outbound, United States and International,
or Testing and Production.
In more complex environments, you might consider some combination of the following policy
groups for deploying policies:
■ Sales and Marketing - US
■ Sales and Marketing - Europe
■ Sales and Marketing - Asia
■ Sales and Marketing - Australia, New Zealand
■ Human Resources - US
■ Human Resources - International
■ Research and Development
■ Customer service
Lastly, you can use policy groups to test policies before deploying them in production, to
manage legacy policies, and to import and export policy templates.
See “Policy groups” on page 372.
See “About role-based access control” on page 109.

Policy severity
When you configure a detection rule, you can select a policy severity level. You can then use
response rules to take action based on a severity level. For example, you can configure a
response rule to take action after a specified number of "High" severity violations.
See “About response rule conditions” on page 1752.
The default severity level is set to "High," unless you change it. The default severity level
applies to any condition that the detection rule matches. For example, if the default severity
level is set to "High," every detection rule violation is labeled with this severity level. If you do
not want to tag every violation with a specific severity, you can define the criteria by which a
severity level is established. In this case the default behavior is overridden. For example, you
can define the "High" severity level to be applied only after a specified number of condition
matches have occurred.
See “Defining rule severity” on page 420.
In addition, you can define multiple severity levels to layer severity reporting. For example,
you can set the "High" severity level to apply after 100 matches and the "Medium" severity level
to apply after 50 matches.

Table 17-4 Rule severity levels

Rule severity level Description

High If a condition match occurs, it is labeled "High" severity.

Medium If a condition match occurs, it is labeled "Medium" severity.

Low If a condition match occurs, it is labeled "Low" severity.

Info If a condition match occurs, it is labeled "Info" severity.
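The layering described above can be expressed as a small function. This Python sketch is an assumption-laden model, not the product's algorithm: it assumes that the highest level whose match-count threshold is met wins, that the default level applies when no threshold is met, and that the overall severity of a violation is the highest level among its matched rules. The function names and the `default="Low"` value used below are hypothetical.

```python
# Severity levels from Table 17-4, ordered lowest to highest.
SEVERITY_ORDER = ["Info", "Low", "Medium", "High"]


def rule_severity(match_count: int, thresholds: dict, default: str = "High") -> str:
    """Return the highest level whose match-count threshold is met.

    thresholds maps a level to the minimum number of condition matches,
    for example {"High": 100, "Medium": 50}. If no threshold is met,
    the default level applies.
    """
    met = [level for level, minimum in thresholds.items() if match_count >= minimum]
    if not met:
        return default
    return max(met, key=SEVERITY_ORDER.index)


def incident_severity(rule_levels: list) -> str:
    """Overall severity of a violation: the highest level among matched rules."""
    if not rule_levels:
        raise ValueError("no matched rules")
    return max(rule_levels, key=SEVERITY_ORDER.index)
```

For example, with thresholds of 100 for "High" and 50 for "Medium" and a default of "Low", a rule with 60 matches is labeled "Medium", and a rule with 10 matches is labeled "Low".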

Policy authoring privileges

Policy authors configure and manage policies and their rules and exceptions. To author policies,
a user must be assigned to a role that grants the policy authoring privilege. This role can be
expanded to include management of policy groups, scanning targets, and credentials.
Response rule authoring privileges are separate from policy authoring and administration
privileges. Whether policy authors also have response rule authoring privileges depends on
your enterprise needs.
Table 17-5 describes the typical privileges for the policy and response rule authoring roles.

Table 17-5 Policy authoring privileges

Role privilege Description

Author Policies
Add, configure, and manage policies.
Add, configure, and manage policy rules and exceptions.
Import and export policy templates.
Modify system-defined data identifiers and create custom data identifiers.
Add, configure, and manage User Groups.
Add response rules to policies (but do not create response rules).
See “About role-based access control” on page 109.

Enforce Server Administration
Add, configure, and manage policy groups.
Add, configure, and manage Data Profiles.
See “Configuring roles” on page 114.

Author Response Rules
Add, configure, and manage response rules (but do not add them to policies).
See “About response rule authoring privileges” on page 1757.

Data Profiles
Data Profiles are user-defined configurations that you create to implement Exact Data Matching
(EDM), Indexed Document Matching (IDM), Form Recognition, and Vector Machine Learning
(VML) policy conditions.
See “Data Loss Prevention policy detection technologies” on page 383.
Table 17-6 describes the types of Data Profiles that the system supports.

Table 17-6 Types of Data Profiles

Data Profile type Description

Exact Data Profile
An Exact Data Profile is used for Exact Data Matching (EDM) policies. The Exact Data Profile
contains data that has been indexed from a structured data source, such as a database,
directory server, or CSV file. The Exact Data Profile runs on the detection server. If an EDM
policy is deployed to an endpoint, the DLP Agent sends the message to the detection server
for evaluation (two-tier detection).
See “About the Exact Data Profile and index” on page 528.
See “Introducing profiled Directory Group Matching (DGM)” on page 942.
See “About two-tier detection for EDM on the endpoint” on page 533.

Indexed Document Profile
An Indexed Document Profile is used for Indexed Document Matching (IDM) policies. The
Indexed Document Profile contains data that has been indexed from a collection of confidential
documents. The Indexed Document Profile runs on the detection server. If an IDM policy is
deployed to an endpoint, the DLP Agent sends the message to the detection server for
evaluation (two-tier detection).
See “About the Indexed Document Profile” on page 615.

Vector Machine Learning Profile
A Vector Machine Learning Profile is used for Vector Machine Learning (VML) policies. The
Vector Machine Learning Profile contains a statistical model of the features (keywords)
extracted from content that you want to protect. The detection server and the DLP Agent load
the VML profile into memory. VML does not require two-tier detection.
See “About the Vector Machine Learning Profile” on page 665.

Form Recognition Profile
A Form Recognition Profile is used for Form Recognition policies. The Form Recognition
Profile contains blank images of the forms that you want to detect.
When you configure a profile, you specify a numeric Fill Threshold value from 1 to 10, where 1
represents a minimally filled-out form and 10 a completely filled-in form. If the Fill Threshold
is met or exceeded, an incident is opened.

See “Managing Form Recognition profiles” on page 700.
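The endpoint behavior described in Table 17-6 amounts to a lookup from profile type to where detection runs. The following Python sketch models only the three profile types whose endpoint behavior the table states (EDM, IDM, and VML); Form Recognition is deliberately left out because the table does not say where it is evaluated. The names `Tier`, `ENDPOINT_TIER`, and `needs_two_tier` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    AGENT = "evaluated locally by the DLP Agent"
    DETECTION_SERVER = "sent to the detection server (two-tier detection)"


# Only the profile types whose endpoint behavior Table 17-6 states.
ENDPOINT_TIER = {
    "EDM": Tier.DETECTION_SERVER,
    "IDM": Tier.DETECTION_SERVER,
    "VML": Tier.AGENT,
}


def needs_two_tier(profile_conditions: list) -> bool:
    """True if any profile-based condition must be evaluated on the server."""
    unknown = [c for c in profile_conditions if c not in ENDPOINT_TIER]
    if unknown:
        raise ValueError(f"endpoint behavior not modeled for: {unknown}")
    return any(ENDPOINT_TIER[c] is Tier.DETECTION_SERVER for c in profile_conditions)
```

In this model, a policy that uses only a VML condition is evaluated entirely on the endpoint, while adding an IDM condition to the same policy requires two-tier detection.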

User Groups
You define User Groups on the Enforce Server. User Groups contain user identity information
that you populate by synchronizing the Enforce Server with a group directory server (Microsoft
Active Directory).
You must have at least policy authoring or server administrator privileges to define User Groups.
You must define the User Groups before you synchronize users.
Once you define a User Group, you populate it with users, groups, and business units from
your directory server. After the user group is populated, you associate it with the User/Sender
and Recipient detection rules or exceptions. The policy only applies to members of that User
Group.
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.
See “Configuring directory server connections” on page 156.
See “Configuring User Groups” on page 936.

Policy template import and export

You can export and import policy templates to and from the Enforce Server. This feature lets
you share policy templates across environments, version existing policies, and archive legacy
policies.
Consider a scenario where you author and refine a policy on a test system and then export
the policy as a template. You then import this policy template to a production system for
deployment to one or more detection servers. Or, if you want to retire a policy, you export it
as a template for archiving, then remove it from the system.
See “Importing policy templates” on page 441.
See “Exporting policy detection as a template” on page 442.
A policy template is an XML file. The template contains the policy metadata and the detection
and group rules and exceptions. If a policy template contains more than one condition that
requires a Data Profile, the system imports only one of these conditions. A policy template
does not include policy response rules or modified or custom data identifiers.
Table 17-7 describes policy template components.

Table 17-7 Components included in policy templates

Policy component Description Included in template

Policy metadata (name, description, label)
The name of the template must be fewer than 60 characters, or it does not appear in the
Imported Templates list.
Included in template: YES

Described Content Matching (DCM) rules and exceptions
If the template contains only DCM methods, it imports exactly as it was exported, without changes.
Included in template: YES

Exact Data Matching (EDM) and Indexed Document Matching (IDM) conditions
If the template contains multiple EDM or IDM match conditions, only one is exported.
If the template contains an EDM and an IDM condition, the system drops the IDM condition.
Included in template: YES

User Group
User group methods are maintained on import only if the user groups exist on the target
before import.
Included in template: NO

Policy Group
Policy groups do not export. On import you can select a local policy group; otherwise, the
system assigns the policy to the Default Policy Group.
Included in template: NO

Response Rules
You must define and add response rules to policies from the local Enforce Server instance.
Included in template: NO

Data Profiles
On import you must reference a locally defined Data Profile; otherwise, the system drops any
methods that require a Data Profile.
Included in template: NO

Custom data identifiers
Modified and custom data identifiers do not export.
Included in template: NO

Custom protocols
Custom protocols do not export.
Included in template: NO

Policy state
Policy state (Active/Suspended) does not export.
Included in template: NO
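The import rules in Table 17-7 can be summarized as a filter over the template's contents. This Python sketch is a simplified model under stated assumptions, not how the Enforce Server implements import: it collapses the export-time and import-time rules into one step, represents conditions and locally available Data Profiles as plain technology names such as "EDM", and models only EDM and IDM among the profile-based conditions. Components that templates never carry (response rules, custom data identifiers, custom protocols, policy state) are not modeled. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_GROUP = "Default Policy Group"
PROFILE_CONDITIONS = ("EDM", "IDM")  # EDM is kept when both are present


@dataclass
class Template:
    name: str
    conditions: list  # e.g. ["DCM", "IDM", "EDM"]
    user_groups: list = field(default_factory=list)


@dataclass
class ImportedPolicy:
    name: str
    conditions: list
    user_groups: list
    policy_group: str
    listed: bool  # shown in the Imported Templates list


def import_template(t: Template, local_user_groups: set, local_profiles: set,
                    policy_group: Optional[str] = None) -> ImportedPolicy:
    # DCM and other non-profile conditions carry over unchanged.
    kept = [c for c in t.conditions if c not in PROFILE_CONDITIONS]
    # At most one profile-based condition survives; EDM wins over IDM.
    chosen = next((c for c in PROFILE_CONDITIONS if c in t.conditions), None)
    # It is kept only if a matching Data Profile exists locally.
    if chosen is not None and chosen in local_profiles:
        kept.append(chosen)
    return ImportedPolicy(
        name=t.name,
        conditions=kept,
        # User group methods survive only if the groups already exist locally.
        user_groups=[g for g in t.user_groups if g in local_user_groups],
        # Without a local selection, the policy lands in the default group.
        policy_group=policy_group or DEFAULT_GROUP,
        listed=len(t.name) < 60,
    )
```

For instance, a template with DCM, IDM, and EDM conditions imported onto a system that has only an Exact Data Profile keeps the DCM and EDM conditions and is placed in the Default Policy Group unless you select another group.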

Workflow for implementing policies

Policies define the content, event context, and identities you want to detect. Policies may also
define response rule actions if a policy is violated. Successful policy creation is a process that
requires careful analysis and proper configuration to achieve optimum results.
Table 17-8 describes the typical workflow for implementing Data Loss Prevention policies.

Table 17-8 Policy implementation process

Action Description

Familiarize yourself with the different types of detection technologies and methods that
Symantec Data Loss Prevention provides, and considerations for authoring data loss prevention
policies.
See “Detecting data loss” on page 381.
See “Data Loss Prevention policy detection technologies” on page 383.
See “Policy matching conditions” on page 386.
See “Best practices for authoring policies” on page 449.

Develop a policy detection strategy that defines the type of data you want to protect from
data loss.
See “Develop a policy strategy that supports your data security objectives” on page 451.

Review the policy templates that ship with Symantec Data Loss Prevention, and any templates
that you import manually or by solution pack.
See “Policy templates” on page 371.
See “Solution packs” on page 372.

Create policy groups to control how your policies are accessed, edited, and deployed.
See “Policy groups” on page 372.
See “Policy deployment” on page 373.

To detect exact data or content or similar unstructured data, create one or more Data Profiles.
See “Data Profiles” on page 375.

To detect exact identities from a synchronized directory server (Active Directory), configure
one or more User Groups.
See “User Groups” on page 376.

Configure conditions for detection and group rules and exceptions.
See “Creating a policy from a template” on page 397.

Test and tune your policies.
See “Test and tune policies to improve match accuracy” on page 453.

Add response rules to the policy to take action when the policy is violated.
See “About response rules” on page 1738.

Manage the policies in your enterprise.
See “Manage and add policies” on page 432.

Viewing, printing, and downloading policy details

You may be required to share high-level details about your policies with individuals who are
not Symantec Data Loss Prevention users. For example, you might be asked to provide policy
details to an information security officer in your company, or to an outside security auditor.
To facilitate such an action, you can view and print policy details in an easily readable format
from the Policy List screen. The policy detail view does not include any technical nomenclature
or branding specific to Symantec Data Loss Prevention. It displays the policy name, description,
label, group, status, version, and last modified date for the policy. It also displays the detection
and the response rules for that policy.
Any user with the Author Policies privilege for a given policy or set of policies can view and
print policy details.
See “Policy authoring privileges” on page 375.
Table 17-9 describes how to work with policy details.

Table 17-9 Working with policy details

Action | Description
View and print details for a single policy. | See “Viewing and printing policy details” on page 444.
Download details for all policies. | See “Downloading policy details” on page 444.
Chapter 18
Overview of policy detection
This chapter includes the following topics:

■ Detecting data loss

■ Data Loss Prevention policy detection technologies

■ Policy matching conditions

■ Detection messages and message components

■ Exception conditions

■ Compound conditions

■ Policy detection execution

■ Two-tier detection for DLP Agents

Detecting data loss

Symantec Data Loss Prevention detects data from virtually any type of message or file, any
user, sender, or recipient, wherever your data or endpoints exist. You can use Data Loss
Prevention to detect both the content and the context of data within your enterprise. You define
and manage your detection policies from the centralized, Web-based Enforce Server
administration console.
See “Content that can be detected” on page 382.
See “Files that can be detected” on page 382.
See “Protocols that can be monitored” on page 382.
See “Endpoint events that can be detected” on page 383.
See “Identities that can be detected” on page 383.
See “Languages that can be detected” on page 383.

Content that can be detected

Symantec Data Loss Prevention detects data and document content, including text, markup,
presentations, spreadsheets, archive files and their contents, email messages, database files,
designs and graphics, multimedia files, image-based forms and more. For example, the system
can open a compressed file and scan a Microsoft Word document within the compressed file
for the keyword "confidential." If the keyword is matched, the detection engine flags the message
as an incident.
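The recursive extraction described above can be pictured with a short Python sketch. It assumes that archives and Office Open XML documents (.docx) are both ZIP containers, opens them recursively, and decodes every leaf as UTF-8 text. The function name `scan_for_keyword` is hypothetical, and the sketch does not reflect how the Symantec detection engine extracts content; a real extractor would also parse the Word XML, because a keyword can be split across several text runs.

```python
import io
import zipfile


def scan_for_keyword(data: bytes, keyword: str, path: str = "<root>") -> list[str]:
    """Return the paths of all leaf items whose text contains keyword (case-insensitive).

    ZIP-based containers (including .zip archives and .docx files) are opened
    and scanned recursively; anything else is decoded as UTF-8 text.
    """
    if zipfile.is_zipfile(io.BytesIO(data)):
        hits: list[str] = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                # Recurse so that a Word file inside a compressed file is opened too.
                hits += scan_for_keyword(archive.read(member), keyword, f"{path}/{member}")
        return hits
    text = data.decode("utf-8", errors="ignore")
    return [path] if keyword.casefold() in text.casefold() else []
```

A hit path such as `<root>/report.docx/word/document.xml` shows the nesting that was traversed to reach the matching text.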
Content-based detection is based on actual content, not the file itself. A detection server can
detect extracts or derivatives of protected or described content. This content may include
sections of documents that have been copied and pasted to other documents or emails. A
detection server can also identify sensitive data in a different file format than the source file.
For example, if a confidential Word file is fingerprinted, the detection engine can match the
content emailed in a PDF attachment.
See “Content matching conditions” on page 387.
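To illustrate why extracted or reformatted content can still match, here is a generic shingle-fingerprinting sketch in Python. It assumes that the text has already been extracted from each file (for example, from the Word source and from the PDF attachment), hashes every run of five normalized words, and reports what fraction of a candidate's fingerprints also occur in the indexed document. This is a textbook technique shown for intuition only, not the IDM algorithm; `fingerprints`, `containment`, and the window size `k` are illustrative assumptions.

```python
import hashlib
import re


def fingerprints(text: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive normalized words (a 'shingle')."""
    words = re.findall(r"\w+", text.lower())
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def containment(indexed: set[str], candidate: str, k: int = 5) -> float:
    """Fraction of the candidate's shingles that also appear in the indexed document."""
    cand = fingerprints(candidate, k)
    return len(cand & indexed) / len(cand) if cand else 0.0
```

Because the fingerprints depend only on normalized words, the file format that carried the text does not affect the score, and a pasted excerpt scores high even though it is only a fraction of the original.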

Files that can be detected

Symantec Data Loss Prevention recognizes many types of files and attachments based on
their context, including file type, file name, and file size. Symantec Data Loss Prevention
identifies over 300 types of files, including word-processing formats, multimedia files,
spreadsheets, presentations, pictures, encapsulation formats, encryption formats, and others.
For file type detection, the system does not rely on the file extension to identify the file type.
For example, the system recognizes a Microsoft Word file even if a user changes the file
extension to .txt. In this case the detection engine checks the binary signature of the file to
match its type.
See “File property matching conditions” on page 388.
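A minimal sketch of signature-based type detection in Python follows. It checks only a handful of well-known leading byte sequences; the table and the `identify` helper are illustrative. Telling a .docx apart from any other ZIP archive would also require inspecting the archive contents.

```python
# Leading byte sequences ("magic numbers") for a few common formats.
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (legacy .doc, .xls, .ppt)"),
    (b"PK\x03\x04", "ZIP container (includes .docx, .xlsx, .pptx)"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def identify(data: bytes) -> str:
    """Identify a file from its first bytes, ignoring its name and extension."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

For example, a legacy Word file renamed to report.txt still begins with the OLE2 signature, so `identify` reports it as an Office document regardless of the .txt extension.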

Protocols that can be monitored

Symantec Data Loss Prevention detects messages on the network by identifying the protocol
signature: email (SMTP), Web (HTTP), file transfer (FTP), newsgroups (NNTP), TCP, Telnet,
and SSL.
You can configure a detection server to listen on non-default ports for data loss violations. For
example, if your network transmits Web traffic on port 81 instead of port 80, the system still
recognizes the transmitted content as HTTP.
See “Protocol matching condition for network” on page 389.
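The idea of recognizing a protocol by its content rather than by its port can be sketched in Python by looking at the first bytes a client sends. The `guess_protocol` function and its short list of signatures are assumptions for illustration. Because it never looks at the port number, HTTP on port 81 is still classified as HTTP. A production classifier would also handle server-first exchanges, fragmented packets, and many more protocols.

```python
# Leading bytes that identify a few client-first protocols (illustrative, not exhaustive).
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ", b"PATCH ", b"CONNECT ")


def guess_protocol(payload: bytes) -> str:
    """Classify a client's first payload bytes without looking at the TCP port."""
    if payload.startswith(HTTP_METHODS):
        return "HTTP"
    if payload[:2] == b"\x16\x03":  # TLS record: handshake content type 0x16, version major 3
        return "TLS/SSL"
    command = payload[:5].upper()
    if command in (b"EHLO ", b"HELO "):
        return "SMTP"
    if command == b"USER ":
        return "FTP"
    return "unknown"
```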

Endpoint events that can be detected

Symantec Data Loss Prevention lets you detect data loss violations at several endpoint
destinations. These destinations include the local drive, CD/DVD drive, removable storage
devices, network file shares, Windows Clipboard, printers and faxes, and application files. You
can also detect protocol events on the endpoint for email (SMTP), Web (HTTP), and file transfer
(FTP) traffic.
For example, the DLP Agent (installed on each endpoint computer) can detect the copying of
a confidential file to a USB device. Or, the DLP Agent can allow the copying of files only to a
specific class of USB device that meets corporate encryption requirements.
See “Endpoint matching conditions” on page 389.
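The USB example above amounts to an allow rule keyed on the device identifier. The following Python sketch assumes a Windows-style USB storage instance ID and a hypothetical approved vendor and product (`ExampleCo`, `SecureDrive`). Both the pattern and the `copy_allowed` helper are illustrative; they are not the syntax of the Endpoint Device Class or ID condition.

```python
import re

# Hypothetical allow pattern for one approved, hardware-encrypted USB drive model.
# Windows USB storage instance IDs typically look like:
#   USBSTOR\Disk&Ven_<vendor>&Prod_<product>&Rev_<revision>\<serial>
APPROVED_DEVICE = re.compile(r"^USBSTOR\\Disk&Ven_ExampleCo&Prod_SecureDrive", re.IGNORECASE)


def copy_allowed(device_instance_id: str, file_is_confidential: bool) -> bool:
    """Allow confidential files only on devices whose ID matches the approved pattern."""
    if not file_is_confidential:
        return True
    return APPROVED_DEVICE.match(device_instance_id) is not None
```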

Identities that can be detected

Symantec Data Loss Prevention lets you detect the identity of data users, message senders,
and message recipients using a variety of methods. These methods include described identity
patterns and exact identities matched from a directory server or a corporate database.
For example, you can detect email messages sent by a specific user, or allow email messages
sent to or from a specific group of users as defined in your Microsoft Active Directory server.
See “Groups (identity) matching conditions” on page 390.
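The difference between a described identity (a pattern) and an exact identity (membership in a directory group) can be sketched in Python as follows. The `DIRECTORY` dictionary stands in for a hypothetical snapshot of a synchronized Active Directory server; the function names, addresses, and group names are invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical snapshot of a synchronized directory: group name -> member email addresses.
DIRECTORY = {
    "Finance": {"alice@example.com", "bob@example.com"},
    "Legal": {"carol@example.com"},
}


def sender_in_group(sender: str, group: str) -> bool:
    """Exact identity: the sender must be a member of the named directory group."""
    return sender.lower() in DIRECTORY.get(group, set())


def sender_matches_pattern(sender: str, patterns: list[str]) -> bool:
    """Described identity: the sender matches any wildcard pattern, such as '*@example.org'."""
    return any(fnmatchcase(sender.lower(), p.lower()) for p in patterns)
```

An exact identity only changes when the directory changes, whereas a pattern matches any address that fits it, including addresses that did not exist when the policy was written.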

Languages that can be detected

Symantec Data Loss Prevention provides broad international support for detecting data loss
in many languages. Supported languages include most Western and Central European
languages, Hebrew, Arabic, Chinese (simplified and traditional), Japanese, Korean, and more.
The detection engine uses Unicode internally. You can build localized policy rules and
exceptions using any detection technology in any supported language.
See “Supported languages for detection” on page 92.
See “Detecting non-English language content” on page 866.
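As an illustration of why Unicode-aware processing matters, the following Python sketch normalizes text with NFKC and case folding before comparing keywords. As a result, full-width Latin letters that are common in Japanese and Chinese text, and the German ß, compare as a reader would expect. This shows one common normalization approach; it is an assumption, not a description of the detection engine's internals.

```python
import unicodedata


def normalize(text: str) -> str:
    # NFKC folds compatibility forms (for example, full-width Latin letters) into
    # their canonical equivalents; casefold() is a Unicode-aware, more aggressive lower().
    return unicodedata.normalize("NFKC", text).casefold()


def contains_keyword(text: str, keyword: str) -> bool:
    """Case- and width-insensitive keyword test on normalized text."""
    return normalize(keyword) in normalize(text)
```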

Data Loss Prevention policy detection technologies

Symantec Data Loss Prevention provides several types of detection technologies to help you
author policies to detect data loss. Each type of detection technology provides unique
capabilities. Often you combine technologies in policies to achieve precise detection results.
In addition, Symantec Data Loss Prevention provides you with several ways to extend policy
detection and match any type of data, content, or files you want.
See “About Data Loss Prevention policies” on page 368.

See “Best practices for authoring policies” on page 449.

Table 18-1 lists the types of detection technologies and customizations that Data Loss Prevention provides.

Table 18-1 Data Loss Prevention detection technologies

Technology | Description
Exact Data Matching (EDM) | Use EDM to detect personally identifiable information. See “Introducing Exact Data Matching (EDM)” on page 525.
Exact Match Data Identifiers (EMDI) | Use EMDI to detect structured data, especially personally identifiable information. EMDI provides better matching performance and greater memory efficiency than EDM. See “Introducing Exact Match Data Identifiers (EMDI)” on page 468.
Indexed Document Matching (IDM) | Use IDM to detect exact files and file contents, and derivative content. See “Introducing Indexed Document Matching (IDM)” on page 612.
Vector Machine Learning (VML) | Use VML to detect similar document content. See “Introducing Vector Machine Learning (VML)” on page 664.
Form Recognition | Use Form Recognition to detect images of forms that belong to a gallery associated with a Form Recognition policy. See “About Form Recognition detection” on page 695.
Directory Group Matching (DGM) | Use DGM to detect exact identities synchronized from a directory server or profiled from a database. See “Introducing synchronized Directory Group Matching (DGM)” on page 935. See “Introducing profiled Directory Group Matching (DGM)” on page 942.
Described Content Matching (DCM) | Use DCM to detect message content and context, including:
■ Data Identifiers to match content using precise patterns and data validators. See “Introducing data identifiers” on page 717.
■ Keywords to detect content using keywords, key phrases, and keyword dictionaries. See “Introducing keyword matching” on page 838.
■ Regular Expressions to detect characters, patterns, and strings. See “Introducing regular expression matching” on page 852.
■ File properties to detect files by type, name, size, and custom type. See “Introducing file property detection” on page 900.
■ User, sender, and recipient patterns to detect described identities. See “Introducing described identity matching” on page 925.
■ Protocol signatures to detect network traffic. See “Introducing protocol monitoring for network” on page 912.
■ Destinations, devices, and protocols to detect endpoint events. See “Introducing endpoint event detection” on page 915.
Information Centric Tagging (ICT) | Use classifications to detect Information Centric Tagging tags. See “Introducing classification matching” on page 858.
Custom policy detection methods | Data Loss Prevention provides methods for customizing and extending detection, including:
■ Custom Data Identifiers: Implement your own data identifier patterns and apply system-defined validators. See “Introducing data identifiers” on page 717.
■ Custom script validators for Data Identifiers: Use the Symantec Data Loss Prevention Scripting Language to validate custom data types. See “Workflow for creating custom data identifiers” on page 812.
■ Custom file type identification: Use the Symantec Data Loss Prevention Scripting Language to detect custom file types. See “About custom file type identification” on page 901.
■ Custom endpoint device detection: Detect or allow any endpoint device using regular expressions. See “About endpoint device detection” on page 917.
■ Custom network protocol detection: Define custom TCP ports to tap. See “Introducing protocol monitoring for network” on page 912.
■ Custom content extraction: Use a plug-in to identify custom file formats and extract file contents for analysis by the detection server. See “Overview of detection file format support” on page 962.

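The pairing of a pattern with a data validator, which is the core of a data identifier, can be sketched in Python with a simplified 16-digit card-number pattern and the Luhn checksum. The pattern, the function names, and the choice of Luhn as the validator are illustrative assumptions. The built-in data identifiers and validators are configured in the Enforce Server administration console, not written as code.

```python
import re

# Simplified candidate pattern: 16 digits, optionally grouped in fours by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum; valid if total % 10 == 0."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return pattern matches that also pass the validator."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group(0))]
```

The regular expression alone would also flag arbitrary 16-digit strings such as order numbers; the validator discards candidates whose checksum fails, which is how validators reduce false positives.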
Policy matching conditions

Symantec Data Loss Prevention provides several types of match conditions, each offering
unique detection capabilities. You implement match conditions in policies as rules or exceptions.
Detection rules use conditions to match message content or context. Group rules use conditions
to match identities. You can also use conditions as detection and group policy exceptions.
See “Exception conditions” on page 393.
Table 18-2 lists the various types of policy matching conditions provided by Data Loss
Prevention.

Table 18-2 Policy match condition types

Condition type | Description
Content | See “Content matching conditions” on page 387.
File property | See “File property matching conditions” on page 388.
Protocol | See “Protocol matching condition for network” on page 389.
Endpoint | See “Endpoint matching conditions” on page 389.
Groups (identity) | See “Groups (identity) matching conditions” on page 390.

Content matching conditions

Symantec Data Loss Prevention provides several conditions to match message content. Certain
content conditions require an associated Data Profile and index. For content detection, some
conditions let you match on individual message components, including the header, subject,
body, and attachments.
See “Detection messages and message components” on page 391.
See “Content that can be detected” on page 382.
Table 18-3 lists the content matching conditions that you can use without a Data Profile and
index.

Table 18-3 Content matching conditions

Content rule type | Description
Content Matches Regular Expression | Match described content using regular expressions. See “Introducing regular expression matching” on page 852. See “Configuring the Content Matches Regular Expression condition” on page 854.
Content Matches Keyword | Match described content using keywords, key phrases, and keyword dictionaries. See “Introducing keyword matching” on page 838. See “Configuring the Content Matches Keyword condition” on page 844.
Content Matches Data Identifier | Match described content using Data Identifier patterns and validators. See “Introducing data identifiers” on page 717. See “Configuring the Content Matches data identifier condition” on page 737.
Content Matches Classification | Match described content in files and emails tagged by Information Centric Tagging. See “Introducing classification matching” on page 858.

Table 18-4 lists the content matching conditions that require a Data Profile and index.

See “Data Profiles” on page 375.

See “Two-tier detection for DLP Agents” on page 395.

Table 18-4 Index-based content matching conditions

Content rule type | Description
Content Matches Exact Data From an Exact Data Profile (EDM) | Match exact data profiled from a structured data source such as a database or CSV file. See “Introducing Exact Data Matching (EDM)” on page 525. See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551. Note: This condition requires two-tier detection on the endpoint. See “About two-tier detection for EDM on the endpoint” on page 533.
Content Matches Document Signature From an Indexed Document Profile (IDM) | Match files and file contents exactly or partially using fingerprinting. See “Introducing Indexed Document Matching (IDM)” on page 612. See “Configuring the Content Matches Document Signature policy condition” on page 646. Note: This condition requires two-tier detection on the endpoint when two-tier IDM is enabled on the Endpoint Server (two_tier_idm = on). See “About the Indexed Document Profile” on page 615.
Detect using Vector Machine Learning profile (VML) | Match file contents with features similar to the example content that you used to train the profile. See “Introducing Vector Machine Learning (VML)” on page 664. See “Configuring the Detect using Vector Machine Learning Profile condition” on page 679.

File property matching conditions

Symantec Data Loss Prevention provides several conditions to match file properties, including
file type, file size, and file name.
See “Files that can be detected” on page 382.

Table 18-5 File property match conditions

Condition type | Description
Message Attachment or File Type Match | Match specific file formats and document attachments. See “About file type matching” on page 900. See “Configuring the Message Attachment or File Type Match condition” on page 904.
Message Attachment or File Size Match | Match files or attachments over or under a specified size. See “About file size matching” on page 902. See “Configuring the Message Attachment or File Size Match condition” on page 905.
Message Attachment or File Name Match | Match files or attachments whose names match a specific name or a wildcard pattern. See “About file name matching” on page 903. See “Configuring the Message Attachment or File Name Match condition” on page 906.
Message/Email Properties and Attributes | Classify Microsoft Exchange email messages based on specific message attributes (MAPI attributes).
Custom File Type Signature | Match custom file types based on their binary signature using scripting. See “About custom file type identification” on page 901. See “Enabling the Custom File Type Signature condition in the policy console” on page 908.

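File name and file size conditions reduce to simple comparisons. In the following Python sketch, `file_name_matches` applies shell-style wildcards case-insensitively and `file_size_matches` compares a size against a threshold. Both helpers are hypothetical, and this section does not say whether a size threshold is inclusive or exclusive, so the strict comparison used here is an assumption.

```python
from fnmatch import fnmatchcase


def file_name_matches(name: str, patterns: list[str]) -> bool:
    """True if the file name matches any wildcard pattern, ignoring case."""
    return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)


def file_size_matches(size_bytes: int, threshold_bytes: int, over: bool = True) -> bool:
    """True if the size is strictly over (or, with over=False, strictly under) the threshold."""
    return size_bytes > threshold_bytes if over else size_bytes < threshold_bytes
```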
Protocol matching condition for network

Symantec Data Loss Prevention provides the single Protocol Monitoring condition to match
network traffic for policy detection rules and exceptions.
See “Protocols that can be monitored” on page 382.

Table 18-6 Protocol matching condition for network monitoring

Match condition | Description
Protocol Monitoring | Match network traffic transmitted using a specified protocol, including SMTP, FTP, HTTP/S, IM, and NNTP. See “Introducing protocol monitoring for network” on page 912. See “Configuring the Protocol Monitoring condition for network detection” on page 913.

Endpoint matching conditions

Symantec Data Loss Prevention provides several conditions for matching endpoint events.
See “Endpoint events that can be detected” on page 383.

Table 18-7 Endpoint matching conditions

Condition | Description
Protocol or Endpoint Monitoring | Match endpoint messages transmitted using a specified transport protocol or when data is moved or copied to a particular destination. See “Introducing endpoint event detection” on page 915. See “Configuring the Endpoint Monitoring condition” on page 918.
Endpoint Device Class or ID | Match endpoint events occurring on specified hardware devices. See “Introducing endpoint event detection” on page 915. See “Configuring the Endpoint Device Class or ID condition” on page 920.
Endpoint Location | Match endpoint events depending on whether the DLP Agent is on or off the corporate network. See “Introducing endpoint event detection” on page 915. See “Configuring the Endpoint Location condition” on page 919.

Groups (identity) matching conditions

Symantec Data Loss Prevention provides several conditions for matching the identity of users
and groups, and message senders and recipients.
The sender and recipient pattern rules are reusable across policies. The Directory Group
Matching (DGM) rules let you match on senders and recipients derived from Active Directory
(synchronized DGM) or from an Exact Data Profile (profiled DGM).
See “Identities that can be detected” on page 383.
See “Two-tier detection for DLP Agents” on page 395.

Table 18-8 Available group rules for identity matching

Group rule | Description
Sender/User Matches Pattern | Match message senders and users by email address, user ID, IM screen name, and IP address. See “Introducing described identity matching” on page 925. See “Configuring the Sender/User Matches Pattern condition” on page 927.
Recipient Matches Pattern | Match message recipients by email or IP address, or Web domain. See “Introducing described identity matching” on page 925. See “Configuring the Recipient Matches Pattern condition” on page 930.
Sender/User based on a Directory Server Group | Match message senders and users from a synchronized directory server. See “Introducing synchronized Directory Group Matching (DGM)” on page 935. See “Configuring the Sender/User based on a Directory Server Group condition” on page 939.
Sender/User based on a Directory from: an Exact Data Profile | Match message senders and users from a profiled directory server. See “Introducing profiled Directory Group Matching (DGM)” on page 942. See “Configuring the Sender/User based on a Profiled Directory condition” on page 944. Note: This condition requires two-tier detection on the endpoint. See “About two-tier detection for profiled DGM” on page 942.
Recipient based on a Directory Server Group | Match message recipients from a synchronized directory server. See “Introducing synchronized Directory Group Matching (DGM)” on page 935. See “Configuring the Recipient based on a Directory Server Group condition” on page 940. Note: This condition requires two-tier detection on the endpoint. See “About two-tier detection for synchronized DGM” on page 936.
Recipient based on a Directory from: an Exact Data Profile | Match message recipients from a profiled directory server. See “Configuring Exact Data profiles for DGM” on page 943. See “Configuring the Recipient based on a Profiled Directory condition” on page 945. Note: This condition requires two-tier detection on the endpoint. See “About two-tier detection for profiled DGM” on page 942.

Detection messages and message components

Data Loss Prevention detection servers and DLP Agents receive input data for analysis in the
form of messages. The system determines the message type; for example, an email or a Word
document. Depending on the message type, the system either parses the message content
into components (header, subject, body, attachments), or it leaves the message intact. The
system evaluates the message or message components to see if any policy match conditions
apply. If a condition applies and it supports component matching, the system evaluates the
content against each selected message component. If the condition does not support component
matching, the system evaluates the entire message against the match condition.
See “Selecting components to match on” on page 423.
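The component-level evaluation described above can be modeled in Python as follows. The `Message` class and the `matched_components` function are hypothetical. Passing `selected=None` models a condition that does not support component matching, so the whole message is evaluated as one unit.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    envelope: str = ""
    subject: str = ""
    body: str = ""
    attachments: list[str] = field(default_factory=list)

    def components(self) -> dict[str, list[str]]:
        return {
            "envelope": [self.envelope],
            "subject": [self.subject],
            "body": [self.body],
            "attachments": list(self.attachments),
        }


def matched_components(msg: Message, predicate: Callable[[str], bool],
                       selected: Optional[set[str]] = None) -> list[str]:
    """Return the selected components that match, or the entire message if selected is None."""
    if selected is None:
        whole = " ".join(text for texts in msg.components().values() for text in texts)
        return ["entire message"] if predicate(whole) else []
    return [name for name, texts in msg.components().items()
            if name in selected and any(predicate(text) for text in texts)]
```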

The content-based conditions support cross-component matching. You can configure the DCM
content conditions to match across all message components. The EDM condition matches on
the message envelope, body, and attachments. The document conditions match on the message
body and attachments, except File Type and File Name, which match only on attachments.
Protocol, endpoint, and identity conditions match on the entire message, as does any condition
evaluated by the DLP Agent. The subject component applies only to SMTP email or NNTP
messages.
Table 18-9 summarizes the component matching supported by each match condition type.

Table 18-9 Message components to match on

Condition type Envelope Subject Body Attachment(s)

Described content (DCM) conditions for content detection: Keyword, Data Identifier, Regular Expression match match match match

Information Centric Tagging (ICT) classifications for content detection: Classification match match

Exact Data Matching (EDM) match match match

Indexed Document Matching (IDM) match match

Vector Machine Learning (VML) match match

Form Recognition match

File Size (DCM) match match

File Type and File Name (DCM) match

Protocol (DCM) match (entire message)

Endpoint (DCM) match (entire message)

Identity (DCM and DGM) match (entire message)

Any condition evaluated by the DLP Agent match (entire message)

Exception conditions
Symantec Data Loss Prevention provides policy exceptions to exclude messages and message
components from matching. You can use exception conditions to refine the scope of your
detection and group rules.
See “Use a limited number of exceptions to narrow detection scope” on page 455.

Warning: Do not use multiple compound exceptions in a single policy. Doing so can cause
detection to run out of memory. If you find that the policy needs multiple compound exceptions
to produce matches, you should reconsider the design of the matching conditions.

The system evaluates an inbound message or message component against policy exceptions
before policy rules. If the exception supports cross-component matching (content-based
exceptions), the exception can be configured to match on individual message components.
Otherwise, the exception matches on the entire message.
If an exception is met, the system ejects the entire message or message component containing
the content that triggered the exception. The ejected message or message component is no
longer available for evaluation against policy rules. The system does not discard only the
matched content or data item; it discards the entire message or message component that
contained the excepted item.

Note: Symantec Data Loss Prevention does not support match-level exceptions, only component
or message-level exceptions.

For example, consider a policy that has a detection rule with one condition and an exception
with one condition. The rule matches messages containing Microsoft Word attachments and
generates an incident for each match. The exception excludes from matching any message sent
from a particular email address. An email from that address that contains a Word attachment
is excepted from matching and does not trigger an incident. The detection exception condition
that excludes messages from that sender takes precedence over the detection rule match
condition that would otherwise match on the message.
See “Policy detection execution” on page 394.
You can implement any condition as an exception, except the EDM condition Content Matches
Exact Data From. In addition, Network Prevent for Web does not support synchronized DGM
exceptions. You can implement IDM as an exception, but the exception excludes exact files
from matching, not file contents. To exclude file contents from matching, "whitelist" them. VML
can be used as an exception if the content is from the same category.
See “Adding an exception to a policy” on page 424.
See “CAN-SPAM Act policy template” on page 1563.

See “White listing file contents to exclude from partial matching” on page 627.
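The component-level ejection described above can be sketched in Python. The `evaluate` function is hypothetical: exceptions run first, any component that an exception matches is removed whole (not only the matched text), and only the remaining components are tested against the rule.

```python
from typing import Callable

Predicate = Callable[[str], bool]


def evaluate(components: dict[str, str], exceptions: list[Predicate], rule: Predicate) -> bool:
    """Return True if the rule matches any component that survives the exceptions."""
    # Exceptions run first. A component that matches any exception is ejected
    # in full; there is no match-level exception that removes only the matched text.
    remaining = {
        name: text
        for name, text in components.items()
        if not any(exception(text) for exception in exceptions)
    }
    # Only the surviving components are evaluated against the rule.
    return any(rule(text) for text in remaining.values())
```

For instance, if a body contains both a test-data marker that an exception matches and an SSN that the rule matches, the whole body is ejected; an attachment that also contains an SSN survives and still produces a match.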

Compound conditions
A valid policy must declare at least one rule that defines at least one match condition. The
condition matches input data to detect data loss. A rule with a single condition is a simple rule.
Optionally, you can declare multiple conditions within a single detection or group rule. A rule
with multiple conditions is a compound condition.
For compound conditions, each condition in the rule must match to trigger a violation. Thus,
for a single policy that declares one rule with two conditions, if one condition matches but the
other does not, detection does not report a match. If both conditions match, detection reports
a match, assuming that the rule is set to count all matches. In programmatic terms, two or
more conditions in the same rule are ANDed together.
Like rules, you can declare multiple conditions within a single exception. In this case, all
conditions in the exception must match for the exception to apply.
See “Policy detection execution” on page 394.
See “Use compound conditions to improve match accuracy” on page 455.
See “Exception conditions” on page 393.

Policy detection execution

You can include any combination of detection rules, group rules, and exceptions in a single
policy. A detection server evaluates policy exceptions first. If any exception is met, the entire
message or message component matching the exception is ejected and is no longer available
for policy matching.
The detection server evaluates the detection and group rules in the policy on a per-rule basis.
In programmatic terms, where you have a single policy definition, the connection between
conditions in the same rule or exception is AND (compound conditions). The connection
between two or more rules of the same type is OR (for example, two detection rules). However, if you
combine rules of different types in a single policy (for example, one detection rule and one group
rule), the connection between the rules is AND. In this configuration, both rules must match to
trigger an incident. Exception conditions created across the "Detection" and "Groups" tabs, by
contrast, are connected by an implicit OR.
See “Compound conditions” on page 394.
See “Exception conditions” on page 393.
Table 18-10 summarizes the policy condition execution logic for the detection server for various
policy configurations.

Table 18-10 Policy condition execution logic

Policy configuration | Logic | Description
Compound conditions | AND | If a single rule or exception in a policy contains two or more match conditions, all conditions must match.
Rules or exceptions of the same type | OR | If there are two detection rules in a single policy, or two group rules in a single policy, or two exceptions of the same type (detection or group), the rules or exceptions are independent of each other.
Rules of different types | AND | If one or more detection rules are combined with one or more group rules in a single policy, the rules are dependent.
Exceptions of different types | OR | If one or more detection exceptions are combined with one or more group exceptions in a single policy, the exceptions are independent.

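The execution logic in Table 18-10 can be expressed as a small Python function. This sketch is a simplification under stated assumptions: each condition is a predicate over a whole message (component-level ejection is ignored), a rule or exception is a list of conditions, and `policy_matches` is a hypothetical name.

```python
from typing import Callable

Condition = Callable[[dict], bool]


def rule_matches(rule: list[Condition], msg: dict) -> bool:
    # Conditions within one rule or exception (a compound condition) are ANDed.
    return all(condition(msg) for condition in rule)


def policy_matches(
    msg: dict,
    detection_rules: list[list[Condition]],
    group_rules: list[list[Condition]],
    detection_exceptions: list[list[Condition]],
    group_exceptions: list[list[Condition]],
) -> bool:
    if not detection_rules and not group_rules:
        return False  # a valid policy declares at least one rule
    # Exceptions are evaluated first; detection and group exceptions are ORed.
    if any(rule_matches(exc, msg) for exc in detection_exceptions + group_exceptions):
        return False
    # Rules of the same type are ORed; a rule type with no rules does not constrain the result.
    detection_ok = not detection_rules or any(rule_matches(r, msg) for r in detection_rules)
    group_ok = not group_rules or any(rule_matches(r, msg) for r in group_rules)
    # Rules of different types are ANDed.
    return detection_ok and group_ok
```

Representing each rule as a list keeps the compound AND explicit, while the `any` calls express the OR between rules of the same type.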
Two-tier detection for DLP Agents

Symantec Data Loss Prevention uses a two-tier detection architecture to analyze activity on
endpoints for some index-based match conditions.
Two-tier detection requires communication and data transfer between the DLP Agent and the
Endpoint Server to detect incidents. If a match condition requires two-tier detection, the condition
is not evaluated locally on the endpoint by the DLP Agent. Instead, the DLP Agent sends the
data to the Endpoint Server for policy evaluation.
See “Guidelines for authoring Endpoint policies” on page 2275.
The effect of two-tier detection is that policy evaluation is delayed for the time it takes the data
to be sent to and evaluated by the Endpoint Server. If the DLP Agent is not connected to the
network or cannot communicate with the Endpoint Server, the condition requiring two-tier
detection is not evaluated until the DLP Agent connects. This delay can impact performance
of the DLP Agent if the message is a large file or attachment.
See “Troubleshooting policies” on page 445.
Two-tier detection has implications for the kinds of policies you author for endpoints. You can
reduce the potential bottleneck of two-tier detection by knowing which detection conditions
require two-tier detection and authoring your endpoint policies to eliminate or reduce the need
for it.
See “Author policies to limit the potential effect of two-tier detection” on page 456.
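The routing decision that the DLP Agent faces can be sketched in Python as follows. The condition labels, the `AgentSketch` class, and its queue are hypothetical. The sketch only mirrors the behavior described above: the conditions listed in Table 18-11 are sent to the Endpoint Server (IDM only when two_tier_idm = on), and they wait while the agent cannot reach the server.

```python
from collections import deque
from dataclasses import dataclass, field

# Illustrative labels for the non-IDM conditions that Table 18-11 lists as two-tier.
ALWAYS_TWO_TIER = {"EDM", "PROFILED_DGM", "SYNC_DGM_RECIPIENT"}


@dataclass
class AgentSketch:
    two_tier_idm: bool = True  # stands in for the Endpoint Server setting two_tier_idm = on/off
    connected: bool = True
    pending: deque = field(default_factory=deque)

    def requires_server(self, condition: str) -> bool:
        if condition == "IDM":
            return self.two_tier_idm
        return condition in ALWAYS_TWO_TIER

    def evaluate(self, condition: str, event: str) -> str:
        if not self.requires_server(condition):
            return "evaluated locally"
        if not self.connected:
            # Two-tier conditions wait until the agent can reach the Endpoint Server.
            self.pending.append((condition, event))
            return "queued until connected"
        return "sent to Endpoint Server"
```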
Table 18-11 lists the detection conditions that require two-tier detection on the endpoint.

Note: You cannot combine an Endpoint Prevent: Notify or Block response rule with two-tier
match conditions, including Exact Data Matching (EDM), Directory Group Matching (DGM),
and Indexed Document Matching (IDM) when two-tier detection is enabled. If you do, the
system displays a warning for both the detection condition and the response rule.

Table 18-11 Policy matching conditions requiring two-tier detection

Exact Data Matching (EDM)
Match condition: Content Matches Exact Data from an Exact Data Profile
See “Introducing Exact Data Matching (EDM)” on page 525.
See “About two-tier detection for EDM on the endpoint” on page 533.

Profiled Directory Group Matching (DGM)
Match conditions: Sender/User based on a Directory from an Exact Data Profile; Recipient
based on a Directory from an Exact Data Profile
See “Introducing profiled Directory Group Matching (DGM)” on page 942.
See “About two-tier detection for profiled DGM” on page 942.

Synchronized Directory Group Matching (DGM)
Match condition: Recipient based on a Directory Server Group
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.
See “About two-tier detection for synchronized DGM” on page 936.

Indexed Document Matching (IDM)
Match condition: Content Matches Document Signature from an Indexed Document Profile
See “Introducing Indexed Document Matching (IDM)” on page 612.
See “Two-tier IDM detection” on page 615.
Note: Two-tier detection for IDM applies only if it is enabled on the Endpoint Server
(two_tier_idm = on). If Endpoint IDM is enabled instead (two_tier_idm = off), two-tier
detection is not used.
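As a planning aid, Table 18-11 and the preceding Notify/Block note can be expressed as a lookup. This is a sketch under stated assumptions: the condition labels are copied from the table, while the function names and the `two_tier_idm` argument (mirroring the Endpoint Server setting named above) are hypothetical and not a Symantec API.

```python
# Match conditions from Table 18-11 that require two-tier detection.
TWO_TIER_CONDITIONS = {
    "Content Matches Exact Data from an Exact Data Profile",        # EDM
    "Sender/User based on a Directory from an Exact Data Profile",  # profiled DGM
    "Recipient based on a Directory from an Exact Data Profile",    # profiled DGM
    "Recipient based on a Directory Server Group",                  # synchronized DGM
}
# IDM requires two-tier detection only when two_tier_idm = on.
IDM_CONDITION = "Content Matches Document Signature from an Indexed Document Profile"
BLOCKING_RESPONSES = {"Endpoint Prevent: Notify", "Endpoint Prevent: Block"}


def requires_two_tier(condition: str, two_tier_idm: bool = True) -> bool:
    """Return True if the condition is evaluated by the Endpoint Server."""
    if condition == IDM_CONDITION:
        return two_tier_idm
    return condition in TWO_TIER_CONDITIONS


def policy_warnings(conditions, responses, two_tier_idm=True):
    """List Notify/Block response rules that conflict with two-tier conditions."""
    two_tier = [c for c in conditions if requires_two_tier(c, two_tier_idm)]
    blocking = sorted(set(responses) & BLOCKING_RESPONSES)
    return [f"{r} conflicts with two-tier condition: {c}"
            for c in two_tier for r in blocking]


if __name__ == "__main__":
    for warning in policy_warnings([IDM_CONDITION], ["Endpoint Prevent: Block"]):
        print(warning)
```

Running such a check over a draft endpoint policy shows which conditions to replace (for example, with described content conditions) before you add a Notify or Block response rule.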
Chapter 19
Creating policies from templates
This chapter includes the following topics:

■ Creating a policy from a template

■ US Regulatory Enforcement policy templates

■ General Data Protection Regulation (GDPR) policy templates

■ International Regulatory Enforcement policy templates

■ Customer and Employee Data Protection policy templates

■ Confidential or Classified Data Protection policy templates

■ Network Security Enforcement policy templates

■ Acceptable Use Enforcement policy templates

■ Columbia Personal Data Regulatory Enforcement policy template

■ Choosing an Exact Data Profile

■ Choosing an Indexed Document Profile

Creating a policy from a template

You can create a policy from a system-provided template or from a template you import to the
Enforce Server.
See “Policy templates” on page 371.
See “Policy template import and export” on page 377.
Creating policies from templates 398
Creating a policy from a template

Table 19-1 Create a policy from a template

Action Description

Add a policy from a template. See “Adding a new policy or policy template” on page 412.

Choose the template you want to use. At the Manage > Policies > Policy List > New Policy - Template List screen,
the system lists all policy templates.
System-provided template categories:

■ See “US Regulatory Enforcement policy templates” on page 400.

■ See “General Data Protection Regulation (GDPR) policy templates” on page 402.
■ See “International Regulatory Enforcement policy templates” on page 403.
■ See “Customer and Employee Data Protection policy templates” on page 404.
■ See “Confidential or Classified Data Protection policy templates” on page 405.
■ See “Network Security Enforcement policy templates” on page 406.
■ See “Acceptable Use Enforcement policy templates” on page 407.
■ See “Columbia Personal Data Regulatory Enforcement policy template”
on page 408.
Imported Templates appear individually after import:

■ See “Importing policy templates” on page 441.

Click Next to configure the policy. For example, select the Webmail policy template and click Next.

See “Configuring policies” on page 413.

Choose a Data Profile (if prompted). If the template relies on one or more Data Profiles, the system prompts you to
select each:
■ Exact Data Profile
See “Choosing an Exact Data Profile” on page 409.
■ Indexed Document Profile
See “Choosing an Indexed Document Profile” on page 411.
If you do not have a Data Profile, you can either:

■ Cancel the policy definition process, define the profile, and resume creating the
policy from the template.
■ Click Next to configure the policy without a Data Profile.
When you create the policy, the system drops any rules or exceptions that rely on
the Data Profile.

Note: You should use a profile if a template calls for it.


Edit the policy name or description (optional). If you intend to modify a system-defined template, you may want
to change the name so you can distinguish it from the original.

See “Configuring policies” on page 413.

Note: If you want to export the policy as a template, the policy name must be fewer
than 60 characters. If it is longer, the template does not appear in the Imported
Templates section of the Template List screen.

Note: The Policy Label field is reserved for the Veritas Data Insight Self-Service
Portal.

Select a policy group (if necessary). If you have defined a policy group, select it from the Policy Group list.
See “Creating and modifying policy groups” on page 436.

If you have not defined a policy group, the system deploys the policy to the Default
Policy Group.

Edit the policy rules or exceptions (if necessary). The Configure Policy screen displays the rules and exceptions
(if any) provided by the policy.

You can modify, add, and remove policy rules and exceptions to meet your
requirements.

See “Configuring policy rules” on page 417.

See “Configuring policy exceptions” on page 426.

Save the policy and export it (optional). Click Save to save the policy.
You can export policy detection as a template for sharing or archiving.

See “Exporting policy detection as a template” on page 442.

For example, if you changed the configuration of a system-defined policy template,
you may want to export it for sharing across environments.

Test and tune the policy (recommended). Test and tune the policy using data the policy should and should not detect.
Review the incidents that the policy generates. Refine the policy rules and
exceptions as necessary to reduce false positives and false negatives.

Add response rules (optional). Add response rules to the policy to report and remediate violations.

See “Implementing response rules” on page 1758.

Note: Response rules are not included in policy templates.
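The Data Profile step and the export name limit in Table 19-1 can be summarized in a short sketch. It assumes a simplified model in which each rule or exception records the type of Data Profile it depends on; `Rule`, `create_from_template`, and `can_export_as_template` are hypothetical names, not part of the product.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Rule:
    name: str
    requires_profile: Optional[str] = None  # "EDM" or "IDM"; None if no Data Profile is needed
    is_exception: bool = False


def create_from_template(rules: List[Rule], selected_profiles: Set[str]) -> List[Rule]:
    """Keep rules and exceptions whose Data Profile was selected; drop the others."""
    return [r for r in rules
            if r.requires_profile is None or r.requires_profile in selected_profiles]


def can_export_as_template(policy_name: str) -> bool:
    """The exported template appears under Imported Templates only if the name
    is fewer than 60 characters long."""
    return len(policy_name) < 60


if __name__ == "__main__":
    template_rules = [
        Rule("Keyword rule"),
        Rule("EDM rule", requires_profile="EDM"),
        Rule("IDM exception", requires_profile="IDM", is_exception=True),
    ]
    kept = create_from_template(template_rules, selected_profiles={"IDM"})
    print([r.name for r in kept])  # ['Keyword rule', 'IDM exception']
```

Because the dropped rules are silently removed, selecting the profile the template calls for is the safer choice.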

US Regulatory Enforcement policy templates

Symantec Data Loss Prevention provides several policy templates supporting US Regulatory
Enforcement guidelines.
See “Creating a policy from a template” on page 397.

Table 19-2 US Regulatory Enforcement policy templates

Policy template Description

CAN-SPAM Act Establishes requirements for sending commercial email.

See “CAN-SPAM Act policy template” on page 1563.

Defense Message System (DMS) GENSER Classification Detects information classified as confidential.

See “Defense Message System (DMS) GENSER Classification policy template” on page 1572.

Export Administration Regulations (EAR) Enforces the U.S. Department of Commerce Export Administration
Regulations (EAR).

See “Export Administration Regulations (EAR) policy template” on page 1576.

FACTA 2003 (Red Flag Rules) Enforces sections 114 and 315 (or Red Flag Rules) of the Fair
and Accurate Credit Transactions Act (FACTA) of 2003.

See “FACTA 2003 (Red Flag Rules) policy template” on page 1577.

Gramm-Leach-Bliley This policy limits sharing of consumer information by financial institutions.

See “Gramm-Leach-Bliley policy template” on page 1688.

HIPAA and HITECH (including PHI) This policy enforces the US Health Insurance Portability and
Accountability Act (HIPAA).

See “HIPAA and HITECH (including PHI) policy template” on page 1690.

International Traffic in Arms Regulations (ITAR) This policy enforces the US Department of State ITAR provisions.

See “International Traffic in Arms Regulations (ITAR) policy template” on page 1696.

Medicare and Medicaid (including PHI) This policy detects protected health information (PHI) associated
with the United States Medicare and Medicaid programs.

See “Medicare and Medicaid (including PHI)” on page 1698.


NASD Rule 2711 and NYSE Rules 351 and 472 This policy protects the name(s) of any companies that are involved
in an upcoming stock offering.

See “NASD Rule 2711 and NYSE Rules 351 and 472 policy
template” on page 1700.

NASD Rule 3010 and NYSE Rule 342 This policy monitors broker-dealer communications.

See “NASD Rule 3010 and NYSE Rule 342 policy template”
on page 1702.

NERC Security Guidelines for Electric Utilities This policy detects the information that is outlined in the North
American Electric Reliability Council (NERC) security guidelines
for the electricity sector.

See “NERC Security Guidelines for Electric Utilities policy template” on page 1703.

Office of Foreign Assets Control (OFAC) This template detects communications involving targeted OFAC
groups.

See “Office of Foreign Assets Control (OFAC) policy template” on page 1706.

OMB Memo 06-16 and FIPS 199 Regulations This template detects information that is classified as confidential.

See “OMB Memo 06-16 and FIPS 199 Regulations policy template”
on page 1707.

Payment Card Industry Data Security Standard This template detects credit card number data.

See “Payment Card Industry (PCI) Data Security Standard policy template” on page 1709.

Sarbanes-Oxley This template detects sensitive financial data.

See “Sarbanes-Oxley policy template” on page 1716.

SEC Fair Disclosure Regulation This template detects the disclosure of material financial
information.

See “SEC Fair Disclosure Regulation policy template” on page 1719.

State Data Privacy This template detects breaches of state-mandated confidentiality.

See “State Data Privacy policy template” on page 1723.


US Intelligence Control Markings (CAPCO) and DCID 1/7 This template detects authorized terms to identify
classified information in the US Federal Intelligence community.

See “US Intelligence Control Markings (CAPCO) and DCID 1/7 policy template” on page 1729.

General Data Protection Regulation (GDPR) policy templates

The General Data Protection Regulation (GDPR) is a regulation that strengthens and unifies data
protection for individuals within the EU. It also addresses the export of personal data outside the
EU. The primary objectives of the GDPR are to give citizens control over their personal data and to
simplify the regulatory environment for international business by unifying regulation within the
EU. The GDPR replaced the EU Data Protection Directives on 25 May 2018.
Symantec Data Loss Prevention provides several policy templates for General Data Protection
Regulation (GDPR) compliance.
See “Creating a policy from a template” on page 397.

Table 19-3 General Data Protection Regulation (GDPR) policy templates
Policy template Description

General Data Protection Regulation (Banking and Finance) This policy protects personally identifiable
information related to banking and finance.

See “General Data Protection Regulation (Banking and Finance)” on page 1583.

General Data Protection Regulation (Digital Identity) This policy protects personally identifiable
information related to digital identity.

See “General Data Protection Regulation (Digital Identity)” on page 1617.

General Data Protection Regulation (Government Identification) This policy protects personally
identifiable information related to government identification.

See “General Data Protection Regulation (Government Identification)” on page 1618.

General Data Protection Regulation (Healthcare and Insurance) This policy protects personally
identifiable information related to healthcare and insurance.

See “General Data Protection Regulation (Healthcare and Insurance)” on page 1656.

General Data Protection Regulation (Personal Profile) This policy protects personally identifiable
information related to personal profile data.

See “General Data Protection Regulation (Personal Profile)” on page 1672.

General Data Protection Regulation (Travel) This policy protects personally identifiable information
related to travel.

See “General Data Protection Regulation (Travel)” on page 1675.

International Regulatory Enforcement policy templates

Symantec Data Loss Prevention provides several policy templates for International Regulatory
Enforcement.
See “Creating a policy from a template” on page 397.

Table 19-4 International Regulatory Enforcement policy templates

Policy template Description

Caldicott Report This policy protects UK patient information.

See “Caldicott Report policy template” on page 1561.

Data Protection Act 1998 This policy protects personally identifiable information.

See “Data Protection Act 1998 policy template” on page 1568.

EU Data Protection Directives This policy detects personal data specific to the EU directives.

See “Data Protection Directives (EU) policy template” on page 1570.

Note: The EU Data Protection Directives were replaced by the General Data
Protection Regulation (GDPR) on 25 May 2018. See “General Data
Protection Regulation (GDPR) policy templates” on page 402.

Human Rights Act 1998 This policy enforces Article 8 of the act for UK citizens.
See “Human Rights Act 1998 policy template” on page 1694.

PIPEDA This policy detects Canadian citizen customer data.

See “PIPEDA policy template” on page 1711.

Customer and Employee Data Protection policy templates

Symantec Data Loss Prevention provides several policy templates for Customer and Employee
Data Protection.
See “Creating a policy from a template” on page 397.

Table 19-5 Customer and Employee Data Protection policy templates

Policy template Description

Canadian Social Insurance Numbers This policy detects patterns indicating Canadian social insurance
numbers.

See “Canadian Social Insurance Numbers policy template” on page 1562.

Credit Card Numbers This policy detects patterns indicating credit card numbers.

See “Credit Card Numbers policy template” on page 1566.

Customer Data Protection This policy detects customer data.

See “Customer Data Protection policy template” on page 1567.

Employee Data Protection This policy detects employee data.

See “Employee Data Protection policy template” on page 1574.

Individual Taxpayer Identification Numbers (ITIN) This policy detects IRS-issued tax processing numbers.

See “Individual Taxpayer Identification Numbers (ITIN) policy template” on page 1695.

SWIFT Codes This policy detects codes banks use to transfer money across
international borders.

See “SWIFT Codes policy template” on page 1726.


UK Drivers License Numbers This policy detects UK Drivers License Numbers.

See “UK Drivers License Numbers policy template” on page 1727.

UK Electoral Roll Numbers This policy detects UK Electoral Roll Numbers.

See “UK Electoral Roll Numbers policy template” on page 1727.

UK National Insurance Numbers This policy detects UK National Insurance Numbers.

See “UK National Insurance Numbers policy template” on page 1728.

UK National Health Service Number This policy detects personal identification numbers issued by the NHS.

See “UK National Health Service (NHS) Number policy template” on page 1728.

UK Passport Numbers This policy detects valid UK passport numbers.

See “UK Passport Numbers policy template” on page 1728.

UK Tax ID Numbers This policy detects UK Tax ID Numbers.

See “UK Tax ID Numbers policy template” on page 1729.

US Social Security Numbers This policy detects patterns indicating social security numbers.

See “US Social Security Numbers policy template” on page 1730.

Confidential or Classified Data Protection policy templates

Symantec Data Loss Prevention provides several policy templates for Confidential or Classified
Data Protection.
See “Creating a policy from a template” on page 397.

Table 19-6 Confidential or Classified Data Protection policy templates

Policy template Description

Confidential Documents This policy detects company-confidential documents.

See “Confidential Documents policy template” on page 1565.

Design Documents This policy detects various types of design documents.

See “Design Documents policy template” on page 1573.


Encrypted Data This policy detects the use of encryption by a variety of methods.
See “Encrypted Data policy template” on page 1575.

Financial Information This policy detects financial data and information.

See “Financial Information policy template” on page 1581.

Merger and Acquisition Agreements This policy detects information and communications about upcoming merger
and acquisition activity.

See “Merger and Acquisition Agreements policy template” on page 1699.

Price Information This policy detects specific SKU and pricing information.

See “Price Information policy template” on page 1713.

Project Data This policy detects discussions of sensitive projects.

See “Project Data policy template” on page 1713.

Proprietary Media Files This policy detects various types of video and audio files.

See “Proprietary Media Files policy template” on page 1713.

Publishing Documents This policy detects various types of publishing documents.

See “Publishing Documents policy template” on page 1714.

Resumes This policy detects active job searches.

See “Resumes policy template” on page 1716.

Source Code This policy detects various types of source code.

See “Source Code policy template” on page 1722.

Symantec DLP Awareness and Avoidance This policy detects any communications that refer to Symantec DLP or
other data loss prevention systems and possible avoidance of detection.

See “Symantec DLP Awareness and Avoidance policy template” on page 1726.

Network Security Enforcement policy templates

Symantec Data Loss Prevention provides several policy templates for Network Security
Enforcement.
See “Creating a policy from a template” on page 397.

Table 19-7 Network Security Enforcement policy templates

Policy template Description

Common Spyware Upload Sites This policy detects access to common spyware upload Web sites.
See “Common Spyware Upload Sites policy template” on page 1564.

Network Diagrams This policy detects computer network diagrams.

See “Network Diagrams policy template” on page 1704.

Network Security This policy detects evidence of hacking tools and attack planning.

See “Network Security policy template” on page 1705.

Password Files This policy detects password file formats.

See “Password Files policy template” on page 1709.

Acceptable Use Enforcement policy templates

Symantec Data Loss Prevention provides several policy templates for enforcing acceptable use
of information.
See “Creating a policy from a template” on page 397.

Table 19-8 Acceptable Use Enforcement policy templates

Policy template Description

Competitor Communications This policy detects forbidden communications with competitors.

See “Competitor Communications policy template” on page 1565.

Forbidden Websites This policy detects access to specified Web sites.

See “Forbidden Websites policy template” on page 1581.

Gambling This policy detects any reference to gambling.

See “Gambling policy template” on page 1582.

Illegal Drugs This policy detects conversations about illegal drugs and controlled
substances.

See “Illegal Drugs policy template” on page 1695.

Media Files This policy detects various types of video and audio files.

See “Media Files policy template” on page 1697.


Offensive Language This policy detects the use of offensive language.

See “Offensive Language policy template” on page 1705.

Racist Language This policy detects the use of racist language.

See “Racist Language policy template” on page 1715.

Restricted Files This policy detects various file types that are generally inappropriate to send
out of the company.

See “Restricted Files policy template” on page 1715.

Restricted Recipients This policy detects communications with specified recipients.

See “Restricted Recipients policy template” on page 1715.

Sexually Explicit Language This policy detects sexually explicit content.

See “Sexually Explicit Language policy template” on page 1721.

Violence and Weapons This policy detects violent language and discussions about weapons.

See “Violence and Weapons policy template” on page 1731.

Webmail This policy detects the use of a variety of Webmail services.

See “Webmail policy template” on page 1731.

Yahoo Message Board Activity This policy detects Yahoo message board activity.
See “Yahoo Message Board Activity policy template” on page 1732.

Yahoo and MSN Messengers on Port 80 This policy detects Yahoo IM and MSN Messenger activity.

See “Yahoo and MSN Messengers on Port 80 policy template” on page 1733.

Columbia Personal Data Regulatory Enforcement policy template

Symantec Data Loss Prevention provides a policy template for the enforcement of Colombian
personal data regulations.
See “Creating a policy from a template” on page 397.

Table 19-9 Columbia Personal Data Regulatory Enforcement policy template

Policy template Description

Colombian Personal Data Protection Law 1581 This policy detects violations of the Colombian Personal
Data Protection Law 1581.

See “Colombian Personal Data Protection Law 1581 policy template” on page 1564.

Choosing an Exact Data Profile

If the policy template you select implements Exact Data Matching (EDM), the system prompts
you to choose an Exact Data Profile. Table 19-10 lists the policy templates that are based on
Exact Data Profiles.
If you do not have an Exact Data Profile, you can cancel policy creation and define a profile.
Alternatively, you can choose not to use an Exact Data Profile. In this case, the system disables
the associated EDM detection rules in the policy template, but you can still use any Described
Content Matching (DCM) rules or exceptions that the policy template provides.
See “Introducing Exact Data Matching (EDM)” on page 525.
See “About the Exact Data Profile and index” on page 528.
To choose an Exact Data Profile
1 Select an Exact Data Profile from the list of available profiles.
2 Click Next to continue with creating the policy from the template.
Click Previous to return to the list of policy templates.
See “Creating a policy from a template” on page 397.

Note: When the system prompts you to select an Exact Data Profile, the display lists the data
columns to include in the profile to provide the highest level of accuracy. If data fields in your
Exact Data Profile are not represented in the selected policy template, the system displays
those fields for content matching when you define the detection rule.
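The column comparison described in the note amounts to two set differences. A minimal sketch, assuming the template's recommended columns and the profile's columns are available as plain lists of names; `compare_columns` and the sample column names are illustrative only.

```python
def compare_columns(template_columns, profile_columns):
    """Compare a template's recommended EDM columns with an Exact Data Profile.

    Returns (missing, extra):
      missing: recommended columns the profile lacks, which may reduce accuracy
      extra:   profile columns the template does not use; these are still
               offered for content matching when you define the detection rule
    """
    template_set = set(template_columns)
    profile_set = set(profile_columns)
    missing = sorted(template_set - profile_set)
    extra = sorted(profile_set - template_set)
    return missing, extra


if __name__ == "__main__":
    missing, extra = compare_columns(
        ["First Name", "Last Name", "SSN"],        # hypothetical template columns
        ["Last Name", "SSN", "Employee ID"],       # hypothetical profile columns
    )
    print(missing)  # ['First Name']
    print(extra)    # ['Employee ID']
```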

Table 19-10 Policy templates that implement Exact Data Matching (EDM)

Policy template Description

Caldicott Report See “Caldicott Report policy template” on page 1561.

Customer Data Protection See “Customer Data Protection policy template” on page 1567.

Data Protection Act 1998 See “Data Protection Act 1998 policy template” on page 1568.

Employee Data Protection See “Employee Data Protection policy template” on page 1574.

EU Data Protection Directives See “Data Protection Directives (EU) policy template” on page 1570.

Export Administration Regulations (EAR) See “Export Administration Regulations (EAR) policy template”
on page 1576.

FACTA 2003 (Red Flag Rules) See “FACTA 2003 (Red Flag Rules) policy template” on page 1577.

General Data Protection Regulation See “General Data Protection Regulation (Banking and Finance)”
(Banking and Finance) on page 1583.

General Data Protection Regulation See “General Data Protection Regulation (Digital Identity)” on page 1617.
(Digital Identity)

General Data Protection Regulation See “General Data Protection Regulation (Government Identification)”
(Government Identification) on page 1618.

General Data Protection Regulation See “General Data Protection Regulation (Healthcare and Insurance)”
(Healthcare and Insurance) on page 1656.

General Data Protection Regulation See “General Data Protection Regulation (Personal Profile)” on page 1672.
(Personal Profile)

General Data Protection Regulation See “General Data Protection Regulation (Travel)” on page 1675.
(Travel)

Gramm-Leach-Bliley See “Gramm-Leach-Bliley policy template” on page 1688.

HIPAA and HITECH (including PHI) See “HIPAA and HITECH (including PHI) policy template” on page 1690.

Human Rights Act 1998 See “Human Rights Act 1998 policy template” on page 1694.

International Traffic in Arms Regulations See “International Traffic in Arms Regulations (ITAR) policy template”
(ITAR) on page 1696.

Payment Card Industry Data Security See “Payment Card Industry (PCI) Data Security Standard policy
Standard template” on page 1709.

PIPEDA See “PIPEDA policy template” on page 1711.

Price Information See “Price Information policy template” on page 1713.

Resumes See “Resumes policy template” on page 1716.

State Data Privacy See “State Data Privacy policy template” on page 1723.

Choosing an Indexed Document Profile

If the policy template you chose uses Indexed Document Matching (IDM) detection, the system
prompts you to select the Document Profile.
See “Introducing Indexed Document Matching (IDM)” on page 612.
To use a Document Profile
1 Select the Document Profile from the list of available profiles.
2 Click Next to create the policy from the template.
See “Creating a policy from a template” on page 397.
If you do not have a Document Profile, you can cancel policy creation and define the Document
Profile. Alternatively, you can choose not to use a Document Profile. In this case, the system
disables any IDM rules or exceptions for the policy instance. If the policy template contains DCM
rules or exceptions, you can still use them.
See “About the Indexed Document Profile” on page 615.

Table 19-11 Policy templates that implement Indexed Document Matching (IDM)

Policy template Description

CAN-SPAM Act (IDM exception) See “CAN-SPAM Act policy template” on page 1563.

NASD Rule 2711 and NYSE Rules 351 and 472 See “NASD Rule 2711 and NYSE Rules 351 and 472 policy template”
on page 1700.

NERC Security Guidelines for Electric Utilities See “NERC Security Guidelines for Electric Utilities policy template”
on page 1703.

Sarbanes-Oxley See “Sarbanes-Oxley policy template” on page 1716.

SEC Fair Disclosure Regulation See “SEC Fair Disclosure Regulation policy template” on page 1719.

Confidential Documents See “Confidential Documents policy template” on page 1565.

Design Documents See “Design Documents policy template” on page 1573.

Financial Information See “Financial Information policy template” on page 1581.

Project Data See “Project Data policy template” on page 1713.

Proprietary Media Files See “Proprietary Media Files policy template” on page 1713.

Publishing Documents See “Publishing Documents policy template” on page 1714.

Source Code See “Source Code policy template” on page 1722.

Network Diagrams See “Network Diagrams policy template” on page 1704.
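Tables 19-10 and 19-11 determine which Data Profiles the template wizard asks for. The sketch below encodes only a few rows from each table as an illustration, not the full lists; `profiles_to_prompt` is a hypothetical helper, not a product function.

```python
# Illustrative subsets of Table 19-10 (EDM) and Table 19-11 (IDM).
EDM_TEMPLATES = {
    "Customer Data Protection",
    "Employee Data Protection",
    "HIPAA and HITECH (including PHI)",
    "Payment Card Industry Data Security Standard",
}
IDM_TEMPLATES = {
    "CAN-SPAM Act",
    "Confidential Documents",
    "Sarbanes-Oxley",
    "Source Code",
}


def profiles_to_prompt(template: str) -> list:
    """Return the Data Profile types the wizard prompts for, in prompt order."""
    prompts = []
    if template in EDM_TEMPLATES:
        prompts.append("Exact Data Profile")
    if template in IDM_TEMPLATES:
        prompts.append("Indexed Document Profile")
    return prompts


if __name__ == "__main__":
    for name in ("Source Code", "Customer Data Protection", "Webmail"):
        print(name, profiles_to_prompt(name))
```

A template that appears in neither table, such as Webmail, goes straight to the Configure Policy screen without a profile prompt.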

Chapter 20
Configuring policies
This chapter includes the following topics:

■ Adding a new policy or policy template

■ Configuring policies

■ Adding a rule to a policy

■ Configuring policy rules

■ Defining rule severity

■ Configuring match counting

■ Selecting components to match on

■ Adding an exception to a policy

■ Configuring policy exceptions

■ Configuring compound match conditions

■ Input character limits for policy configuration

Adding a new policy or policy template

As a policy author you can define a new policy from scratch or from a template.
See “Workflow for implementing policies” on page 378.
Configuring policies 413
Configuring policies

To add a new policy or a policy template

1 Click New at the Manage > Policies > Policy List screen.
See “Manage and add policies” on page 432.
2 Choose the type of policy you want to add at the New Policy screen.
Select Add a blank policy to add a new empty policy.
See “Policy components” on page 370.
Select Add a policy from a template to add a policy from a template.
See “Policy templates” on page 371.
3 Click Next to configure the policy or the policy template.
See “Configuring policies” on page 413.
See “Creating a policy from a template” on page 397.
Click Cancel to return to the Policy List screen without adding a policy.

Configuring policies
The Manage > Policies > Policy List > Configure Policy screen is the home page for
configuring policies.
Table 20-1 describes the workflow for configuring policies.

Table 20-1 Configuring policies

Action Description

Define a new policy, or edit an existing policy. Add a new blank policy.

See “Adding a new policy or policy template” on page 412.

Create a policy from a template.

See “Creating a policy from a template” on page 397.

Select an existing policy at the Manage > Policies > Policy List screen to edit it.

See “Manage and add policies” on page 432.

Enter a policy Name and Description. The policy name must be unique in the policy group you deploy
the policy to.

See “Input character limits for policy configuration” on page 431.

Note: The Policy Label field is reserved for the Veritas Data
Insight Self-Service Portal.

Select the Policy Group from the list where the policy is to be deployed. The Default Policy Group is selected
if there is no policy group configured.

See “Creating and modifying policy groups” on page 436.

Set the Status for the policy. You can enable (default setting) or disable a policy. A disabled
policy is deployed but is not loaded into memory to detect
incidents.

See “Manage and add policies” on page 432.

Add a rule to the policy, or edit an existing rule. Click Add Rule to add a rule.

See “Adding a rule to a policy” on page 415.

Select an existing rule to edit it.

Configure the rule with one or more conditions. For a valid policy, you must configure at least one rule that
declares at least one condition. Compound conditions and
exceptions are optional.

See “Configuring policy rules” on page 417.

Optionally, add one or more policy exceptions, or edit an existing exception. Click Add Exception to add it.
See “Adding an exception to a policy” on page 424.

Select an existing exception to edit it.

Configure any exception(s). See “Configuring policy exceptions” on page 426.

Save the policy configuration. Click Save to save the policy configuration to the Enforce Server
database.

See “Policy components” on page 370.

Export the policy as a template. Optionally, you can export the policy rules and exceptions as a
template.

See “Exporting policy detection as a template” on page 442.

Add one or more response rules to the policy. You configure response rules independently of policies.

See “Configuring response rules” on page 1763.

See “Adding an automated response rule to a policy” on page 442.
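The requirements in Table 20-1 can be collected into a pre-save check. This is a sketch under assumptions: the `Policy` structure and `validate_policy` function are hypothetical, and they model only what the table states (a name that is unique within the policy group, at least one rule with at least one condition, and the Default Policy Group fallback).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

DEFAULT_GROUP = "Default Policy Group"


@dataclass
class Policy:
    name: str
    rules: List[List[str]] = field(default_factory=list)  # each rule is a list of conditions
    group: Optional[str] = None                            # None means no group was chosen


def validate_policy(policy: Policy, names_by_group: Dict[str, Set[str]]) -> List[str]:
    """Return problems that would prevent saving; an empty list means the policy is valid."""
    group = policy.group or DEFAULT_GROUP  # fall back to the Default Policy Group
    problems = []
    if policy.name in names_by_group.get(group, set()):
        problems.append(f"Policy name '{policy.name}' is already used in {group}")
    if not any(len(conditions) > 0 for conditions in policy.rules):
        problems.append("Add at least one rule that declares at least one condition")
    return problems


if __name__ == "__main__":
    p = Policy("PCI Audit", rules=[["Content Matches Data Identifier"]])
    print(validate_policy(p, {DEFAULT_GROUP: {"PCI Audit"}}))
```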

Adding a rule to a policy

At the Manage > Policies > Policy List > Configure Policy – Add Rule screen you add one
or more rules to a policy.
You can add two types of rules to a policy: detection and group. If two or more rules in a policy
are the same type, the system connects them by OR. If two or more rules in the same policy
are different types, the system connects them by AND.
See “Policy detection execution” on page 394.

Note: Exceptions are added separately from rules. See “Adding an exception to a policy”
on page 424.

To add one or more rules to a policy

1 Choose the type of rule (detection or group) to add to the policy.
To add a detection rule, select the Detection tab and click Add Rule.
To add a group (identity) rule, select the Groups tab and click Add Rule.
See “Policy matching conditions” on page 386.
2 Select the detection or the group rule you want to implement from the list of rules.
See Table 20-2 on page 415.
3 Select the prerequisite component, if required.
If the policy rule requires a Data Profile, Data Identifier, or User Group, select it from
the list.
4 Click Next to configure the policy rule.
See “Configuring policy rules” on page 417.

Table 20-2 Adding policy rules

Rule | Prerequisite | Description

Content match conditions

Content Matches Regular Expression | (none) | See “Introducing regular expression matching” on page 852.
Content Matches Exact Data | Exact Data Profile | See “About the Exact Data Profile and index” on page 528. See “Choosing an Exact Data Profile” on page 409.
Content Matches Keyword | (none) | See “Introducing keyword matching” on page 838.
Content Matches Document Signature | Indexed Document Profile | See “Introducing Indexed Document Matching (IDM)” on page 612. See “Choosing an Indexed Document Profile” on page 411.
Content Matches Data Identifier | Data Identifier | See “Introducing data identifiers” on page 717. See “Selecting a data identifier breadth” on page 739.
Content Matches Classification | ICT | See “Overview of steps to tie Information Centric Tagging to Data Loss Prevention” on page 228. See “Configuring the Content Matches Classification condition” on page 863.
Detect using Vector Machine Learning | VML Profile | See “Introducing Vector Machine Learning (VML)” on page 664. See “Configuring VML profiles and policy conditions” on page 668.

Context match conditions

Contextual Attributes (Cloud Applications and API Detection Appliance only) | Cloud Detection Service or API Detection Appliance | See “Introducing contextual attributes for cloud applications” on page 948.

File Properties match conditions

Message Attachment or File Type Match | (none) | See “About file type matching” on page 900.
Message Attachment or File Size Match | (none) | See “About file size matching” on page 902.
Message Attachment or File Name Match | (none) | See “About file name matching” on page 903.
Custom File Type Signature | Rule enabled; custom script | See “About custom file type identification” on page 901. See “Enabling the Custom File Type Signature condition in the policy console” on page 908.

Protocol and Endpoint match conditions

Protocol Monitoring | Custom protocols (if any) | See “Introducing protocol monitoring for network” on page 912.
Endpoint Monitoring | (none) | See “About endpoint protocol monitoring” on page 915.
Endpoint Device Class or ID | Custom device(s) | See “About endpoint device detection” on page 917.
Endpoint Location | (none) | See “About endpoint location detection” on page 917.

Form Recognition

Detect using Form Recognition Profile | Form Recognition Profile | See “About Form Recognition detection” on page 695. See “Configuring the Form Recognition detection rule” on page 699.

Groups (Identities) match conditions

Sender/User Matches Pattern; Recipient Matches Pattern | (none) | See “Introducing described identity matching” on page 925.
Sender/User based on a Directory Server Group; Recipient based on a Directory Server Group | User Group | See “Introducing synchronized Directory Group Matching (DGM)” on page 935. See “Configuring User Groups” on page 936.
Sender/User based on a Directory from:; Recipient based on a Directory from: | Exact Data Profile | See “Introducing profiled Directory Group Matching (DGM)” on page 942. See “Configuring Exact Data profiles for DGM” on page 943.

Configuring policy rules

At the Manage > Policies > Policy List > Configure Policy – Edit Rule screen, you configure
a policy rule with one or more match conditions. The configuration of each rule condition
depends on its type.
See Table 20-4 on page 419.

Table 20-3 Configuring policy rules

Step | Action | Description

Step 1 | Add a rule to a policy, or modify a rule. | See “Adding a rule to a policy” on page 415. To modify an existing rule, select the rule in the policy builder interface at the Configure Policy – Edit Rule screen.
Step 2 | Name the rule, or modify a name. | In the General section of the rule, enter a name in the Rule Name field, or modify the name of an existing rule.
Step 3 | Set the rule severity. | In the Severity section of the rule, select or modify a "Default" severity level. In addition to the default severity, you can add multiple severity levels to a rule. See “Defining rule severity” on page 420.
Step 4 | Configure the match condition. | In the Conditions section of the rule, configure one or more match conditions for the rule. The configuration of a condition depends on its type. See Table 20-4 on page 419.
Step 5 | Configure match counting (if required). | If the rule calls for it, configure how you want to count matches. See “Configuring match counting” on page 421.
Step 6 | Select components to match on (if available). | If the rule is content-based, select one or more available message components to match on. See “Selecting components to match on” on page 423.
Step 7 | Add and configure one or more additional match conditions (optional). | To define a compound rule, add another match condition from the Also Match list. Configure the additional condition according to its type (Step 4). See “Configuring compound match conditions” on page 429.
Note: All conditions in a single rule must match to trigger an incident. See “Policy detection execution” on page 394.
Step 8 | Save the policy configuration. | When you are done configuring the rule, click OK. This action returns you to the Configure Policy screen, where you can click Save to save the policy. See “Manage and add policies” on page 432.

Table 20-4 lists each of the available match conditions and provides links to topics for
configuring each condition.
Table 20-4 Configuring policy match conditions

Rule | Description

Content match conditions

Content Matches Regular Expression | See “Configuring the Content Matches Regular Expression condition” on page 854.
Content Matches Exact Data from an Exact Data Profile | See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.
Content Matches Keyword | See “Configuring the Content Matches Keyword condition” on page 844.
Content Matches Document Signature | See “Configuring the Content Matches Document Signature policy condition” on page 646.
Content Matches Data Identifier | See “Configuring the Content Matches data identifier condition” on page 737.
Detect using Vector Machine Learning profile | See “Configuring the Detect using Vector Machine Learning Profile condition” on page 679.
Content Matches Classification | See “Configuring the Content Matches Classification condition” on page 863.
Detect using Form Recognition profile | See “Configuring the Form Recognition detection rule” on page 699.

Context match conditions

Contextual Attributes (Cloud Applications and API Detection Appliance only) | See “Introducing contextual attributes for cloud applications” on page 948.

File Properties match conditions

Message Attachment or File Type Match | See “Configuring the Message Attachment or File Type Match condition” on page 904.
Message Attachment or File Size Match | See “Configuring the Message Attachment or File Size Match condition” on page 905.
Message Attachment or File Name Match | See “Configuring the Message Attachment or File Name Match condition” on page 906.
Custom File Type Signature | See “Configuring the Custom File Type Signature condition” on page 908.

Protocol match conditions

Network Monitoring | See “Configuring the Protocol Monitoring condition for network detection” on page 913.
Endpoint Monitoring | See “Configuring the Endpoint Monitoring condition” on page 918.
Endpoint Device Class or ID | See “Configuring the Endpoint Device Class or ID condition” on page 920.
Endpoint Location | See “Configuring the Endpoint Location condition” on page 919.

Groups match conditions

Sender/User Matches Pattern | See “Configuring the Sender/User Matches Pattern condition” on page 927.
Recipient Matches Pattern | See “Configuring the Recipient Matches Pattern condition” on page 930.
Sender/User based on a Directory Server Group | See “Configuring the Sender/User based on a Directory Server Group condition” on page 939.
Sender/User based on a Directory from an Exact Data Profile | See “Configuring the Sender/User based on a Profiled Directory condition” on page 944.
Recipient based on a Directory Server Group | See “Configuring the Recipient based on a Directory Server Group condition” on page 940.
Recipient based on a Directory from an Exact Data Profile | See “Configuring the Recipient based on a Profiled Directory condition” on page 945.

Defining rule severity

The system assigns a severity level to a policy rule violation. The default setting is "High." You
can change the default and add one or more additional severity levels.
See “Policy severity” on page 374.
Policy rule severity works with the Severity response rule condition. If you set the default
policy rule severity level to "High" and define additional severity levels, the system does not
assign the additional severity to the incident based on match count. As a result, if a response
rule is set to a match count severity level that is lower than the default "High" severity, the
response rule does not execute.
See “Configuring the Severity response condition” on page 1778.

To define policy rule severity

1 Configure a policy rule.
See “Configuring policy rules” on page 417.
2 Select a Default level from the Severity list.
The default severity level is the baseline level that the system reports. The system applies
the default severity level to any rule match, unless additional severity levels override the
default setting.
3 Click Add Severity to define additional severity levels for the rule.
Each severity level you add is based on the match count.
4 Select the desired severity level, choose the match count range, and enter the match
count.
For example, you can set a Medium severity with X range to match after 100 matches
have been counted.
5 If you add an additional severity level, you can select it to be the default severity.
6 To remove a defined severity level, click the X icon beside the severity definition.
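How a default severity and additional match-count levels might combine can be sketched in Python. This is an illustrative model only, under assumptions this guide does not state: each additional level applies when the match count is greater than or equal to its threshold, and the level with the highest satisfied threshold overrides the default. The function name `rule_severity` is hypothetical.

```python
from typing import List, Tuple


def rule_severity(match_count: int, default: str,
                  additional: List[Tuple[str, int]]) -> str:
    """Return the severity reported for a rule match.

    `additional` holds hypothetical (severity, minimum_match_count) pairs.
    Assumption: a level applies when match_count >= its threshold, and the
    highest satisfied threshold overrides the default severity.
    """
    severity = default
    best_threshold = -1
    for level, threshold in additional:
        if match_count >= threshold and threshold > best_threshold:
            best_threshold = threshold
            severity = level
    return severity
```

Under this model, a rule with a Low default and a Medium level at 100 matches reports Low for 5 matches and Medium for 150 matches.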

Configuring match counting

Some conditions let you specify how you want to count matches. Count all matches is the
default behavior. You can configure the minimum number of matches required to cause an
incident. Or, you can count all matches as one incident. If a condition supports match counting,
you can configure this setting for both policy rules and exceptions.
See Table 20-6 on page 422.

Table 20-5 Configuring match counting

Parameter | Condition type | Incident description

Check for existence | Simple | This configuration reports a match count of 1 if there are one or more matches; it does not count multiple matches. For example, 10 matches are one incident.
Check for existence | Compound | This configuration reports a match count of 1 if there are one or more matches and ALL conditions in the rule or exception are set to check for existence.
Count all matches | Simple | This configuration reports a match count equal to the exact number of matches detected by the condition. For example, 10 matches count as 10 incidents.
Count all matches | Compound | This configuration reports a match count equal to the sum of all condition matches in the rule or exception. This default of one incident per condition match applies if any condition in the rule or exception is set to count all matches. For example, if a rule has two conditions, one set to count all matches that detects four matches and the other set to check for existence that detects six matches, the reported match count is 10. If a third condition in the rule detects a match, the match count is 11.
Only report incidents with at least _ matches | | You can change the default of one incident per match count by specifying the minimum number of matches required to report an incident. For example, in a rule with two conditions, if you configure one condition to count all matches and specify five as the minimum number of matches for each condition, a sum of 10 matches reported by the two conditions generates two incidents. To achieve this behavior, you must select this option consistently for each condition in the rule or exception.
Note: The count all matches setting applies to each message component you match on. For example, consider a policy where you specify a match count of 3 and configure a keyword rule that matches on all four message components (the default setting for this condition). If a message is received with two instances of the keyword in the body and one instance of the keyword in the envelope, the system does not report this as a match. However, if three instances of the keyword appear in an attachment (or any other single message component), the system reports it as a match.
Count all unique matches | Only count unique matches | Unique match counting is available for Data Identifiers, keyword matching, and regular expression matching. See “About unique match counting” on page 734.
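The counting rules in Table 20-5 can be expressed as a short Python sketch. It is an illustrative model built from the table's own examples, not Symantec code; the function names, the mode constants, and the convention that a condition with zero matches yields a count of 0 (because all conditions in a rule must match) are assumptions.

```python
from typing import Dict, List, Tuple

EXISTENCE = "check_for_existence"  # hypothetical mode labels
COUNT_ALL = "count_all_matches"


def compound_match_count(conditions: List[Tuple[str, int]]) -> int:
    """Reported match count for a rule or exception.

    Each entry is (counting_mode, matches_detected) for one condition.
    If every condition checks for existence, the count is 1; otherwise
    it is the sum of all condition matches.
    """
    if not conditions or any(count == 0 for _, count in conditions):
        return 0  # every condition in a compound rule must match
    if all(mode == EXISTENCE for mode, _ in conditions):
        return 1
    return sum(count for _, count in conditions)


def meets_minimum(component_counts: Dict[str, int], minimum: int) -> bool:
    """Minimum-match check applied per message component (see the Note).

    Counts in different components are not added together.
    """
    return any(count >= minimum for count in component_counts.values())
```

For the table's example, `compound_match_count([(COUNT_ALL, 4), (EXISTENCE, 6)])` gives 10, and `meets_minimum({"body": 2, "envelope": 1}, 3)` is false while `meets_minimum({"attachment": 3}, 3)` is true.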

Table 20-6 Conditions that support match counting

Condition | Description

Content Matches Regular Expression | See “Introducing regular expression matching” on page 852. See “Configuring the Content Matches Regular Expression condition” on page 854.
Content Matches Keyword | See “Introducing keyword matching” on page 838. See “Configuring the Content Matches Keyword condition” on page 844.
Content Matches Document Signature (IDM) | See “Configuring the Content Matches Document Signature policy condition” on page 646.
Content Matches Data Identifier | See “Introducing data identifiers” on page 717. See “Configuring the Content Matches data identifier condition” on page 737. See “Configuring unique match counting” on page 775.
Recipient Matches Pattern | See “Introducing described identity matching” on page 925. See “Configuring the Recipient Matches Pattern condition” on page 930.

Selecting components to match on

The availability of one or more message components to match on depends on the type of rule
or exception condition you implement.
See “Detection messages and message components” on page 391.

Table 20-7 Match on components

Component Description

Envelope If the condition supports matching on the Envelope component, select it to match on the message
metadata. The envelope contains the header, transport information, and the subject if the message
is an SMTP email.

If the condition does not support matching on the Envelope component, this option is grayed out.

If the condition matches on the entire message, the Envelope is selected and cannot be deselected,
and the other components cannot be selected.

Subject Certain detection conditions match on the Subject component for some types of messages.

See “Detection messages and message components” on page 391.

For the detection conditions that support subject component matching, you can match on the Subject
for the following types of messages:

■ SMTP (email) messages from Network Monitor or Network Prevent for Email.
■ NNTP messages from Network Monitor.

To match on the Subject component, you must select (check) the Subject component and uncheck
(deselect) the Envelope component for the policy rule. If you select both components, the system
matches the subject twice because the message subject is included in the envelope as part of the
header.

Body If the condition matches on the Body message component, select it to match on the text or content
of the message.

Attachment(s) If the condition matches on the Attachment(s) message component, select it to detect content in
files sent by, downloaded with, or attached to the message.
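The reason for deselecting Envelope when you match on Subject can be shown with a hypothetical Python model in which the envelope text includes the Subject header, as the Subject row describes. The dictionary layout and the `count_keyword` helper are illustrative assumptions, not the product's internal message format.

```python
from typing import Dict, Iterable


def count_keyword(message: Dict[str, str], components: Iterable[str],
                  keyword: str) -> int:
    """Count case-insensitive keyword occurrences across the selected components."""
    return sum(message.get(c, "").lower().count(keyword.lower()) for c in components)


# Hypothetical SMTP message: the envelope carries the headers, including Subject.
message = {
    "envelope": "From: alice@example.com\nTo: bob@example.com\nSubject: Q3 budget",
    "subject": "Q3 budget",
    "body": "Please review the attached file.",
}

print(count_keyword(message, ["envelope", "subject"], "budget"))  # 2: counted twice
print(count_keyword(message, ["subject"], "budget"))  # 1
```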

Adding an exception to a policy

At the Manage > Policies > Policy List > Configure Policy – Add Exception screen, you
add one or more exception conditions to a policy. Policy exceptions are executed before policy
rules. If there is an exception match, the entire message is discarded.
See “Exception conditions” on page 393.

Note: You can create exceptions for all policy conditions, except the EDM condition Content
Matches Exact Data From. In addition, Network Prevent for Web does not support
synchronized DGM exceptions.

To add an exception to a policy

1 Add an exception to a policy.
To add a detection rule exception, select the Detection tab and click Add Exception.
To add a group rule exception, select the Groups tab and click Add Exception.
2 Select the policy exception to implement.
The Add Detection Exception screen lists all available detection exceptions that you
can add to a policy.
The Add Group Exception screen lists all available group exceptions that you can add
to a policy.
See Table 20-8 on page 425.
3 If necessary, choose the profile, data identifier, or user group.
4 Click Next to configure the exception.
See “Configuring policy exceptions” on page 426.
Table 20-8 Selecting a policy exception

Exception | Prerequisite | Description

Content

Content Matches Regular Expression | (none) | See “Introducing regular expression matching” on page 852.
Content Matches Keyword | (none) | See “Introducing keyword matching” on page 838.
Content Matches Document Signature | Indexed Document Profile | See “Choosing an Indexed Document Profile” on page 411.
Content Matches Data Identifier | Data Identifier | See “Introducing data identifiers” on page 717. See “Selecting a data identifier breadth” on page 739.
Detect using Vector Machine Learning profile | VML Profile | See “Configuring VML policy exceptions” on page 680. See “Configuring VML profiles and policy conditions” on page 668.

Context

Contextual Attributes (Cloud Applications and API Detection Appliance only) | Cloud Detection Service or API Detection Appliance | See “Introducing contextual attributes for cloud applications” on page 948.

File Properties

Message Attachment or File Type Match | (none) | See “About file type matching” on page 900.
Message Attachment or File Size Match | (none) | See “About file size matching” on page 902.
Message Attachment or File Name Match | (none) | See “About file name matching” on page 903.
Custom File Type Signature | Condition enabled; custom script added | See “About custom file type identification” on page 901.

Protocol and Endpoint

Network Protocol | (none) | See “Introducing protocol monitoring for network” on page 912.
Endpoint Protocol, Destination, Application | (none) | See “About endpoint protocol monitoring” on page 915.
Endpoint Device Class or ID | (none) | See “About endpoint device detection” on page 917.
Endpoint Location | (none) | See “About endpoint location detection” on page 917.

Form Recognition

Detect using Form Recognition Profile | Form Recognition Profile | See “About Form Recognition detection” on page 695. See “Configuring the Form Recognition exception rule” on page 700.

Group (identity)

Sender/User Matches Pattern; Recipient Matches Pattern | (none) | See “Introducing described identity matching” on page 925.
Sender/User based on a Directory Server Group; Recipient based on a Directory Server Group | User Group | See “Introducing synchronized Directory Group Matching (DGM)” on page 935. See “Configuring User Groups” on page 936.
Note: Network Prevent for Web does not support this type of exception. Use profiled DGM instead.
Sender/User based on a Directory from:; Recipient based on a Directory from: | Exact Data Profile | See “Introducing profiled Directory Group Matching (DGM)” on page 942. See “Configuring Exact Data profiles for DGM” on page 943.

Configuring policy exceptions

At the Manage > Policies > Policy List > Configure Policy – Edit Exception screen, you
configure one or more conditions for a policy exception.
See Table 20-10 on page 428.
If an exception condition matches, the system discards the matched component. This
component is no longer available for evaluation.
See “Exception conditions” on page 393.
Table 20-9 Configure policy exceptions

Step | Action | Description

Step 1 | Add a new policy exception, or edit an existing exception. | See “Adding an exception to a policy” on page 424. Select an existing policy exception to modify it.
Step 2 | Name the exception, or edit an existing name or description. | In the General section, enter a unique name for the exception, or modify the name of an existing exception.
Note: The exception name is limited to 60 characters.
Step 3 | Select the components to apply the exception to (if available). | If the exception is content-based, you can match on the entire message or on individual message components. See “Detection messages and message components” on page 391.
Select one of the Apply Exception to options:
■ Entire Message
This option applies the exception to the entire message.
■ Matched Components Only
This option applies the exception to each message component you select from the Match On options in the Conditions section of the exception.
Step 4 | Configure the exception condition. | In the Conditions section of the Configure Policy - Edit Exception screen, define the condition for the policy exception. The configuration of a condition depends on the exception type. See Table 20-10 on page 428.
Step 5 | Add one or more additional conditions to the exception (optional). | You can add conditions until the exception is structured as desired. To add another condition to an exception, select the condition from the Also Match list. Click Add and configure the condition. See “Configuring compound match conditions” on page 429.
Step 6 | Save and manage the policy. | Click OK to complete the exception definition process. Click Save to save the policy. See “Manage and add policies” on page 432.
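The two Apply Exception to options can be modeled with a simplified Python sketch. It assumes a hypothetical message representation (a dictionary of component name to text) and checks the exception condition against every component; the real product evaluates only the components selected in Match On. The function name `apply_exception` and the scope strings are illustrative, not product identifiers.

```python
from typing import Callable, Dict, Optional

# Hypothetical message model: component name -> text.
Message = Dict[str, str]


def apply_exception(message: Message, matches: Callable[[str], bool],
                    scope: str) -> Optional[Message]:
    """Evaluate one exception condition before the policy rules run.

    scope is "entire_message" or "matched_components". Returns the message
    left for rule evaluation, or None if the whole message is excepted.
    """
    matched = {name for name, text in message.items() if matches(text)}
    if not matched:
        return message  # exception does not apply; all components reach the rules
    if scope == "entire_message":
        return None  # the whole message is excepted; no incident is generated
    if scope == "matched_components":
        # Matched components are discarded; the rest remain available to the rules.
        return {name: text for name, text in message.items() if name not in matched}
    raise ValueError(f"unknown scope: {scope}")
```

Because exceptions run first, a message returned as `None` never reaches the rules, while a trimmed message is evaluated by the rules using only its remaining components.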

Table 20-10 lists the exception conditions that you can configure, with links to configuration
details.
Table 20-10 Policy exception conditions available for configuration

Exception | Description

Content

Content Matches Regular Expression | See “Configuring the Content Matches Regular Expression condition” on page 854.
Content Matches Keyword | See “Configuring the Content Matches Keyword condition” on page 844.
Content Matches Document Signature | See “Configuring the Content Matches Document Signature policy condition” on page 646.
Content Matches Data Identifier | See “Configuring the Content Matches data identifier condition” on page 737.
Detect using Vector Machine Learning Profile | See “Configuring VML policy exceptions” on page 680.

Context

Contextual Attributes (Cloud Applications and API Detection Appliance only) | See “Introducing contextual attributes for cloud applications” on page 948.

File Properties

Message Attachment or File Type Match | See “Configuring the Message Attachment or File Type Match condition” on page 904.
Message Attachment or File Size Match | See “Configuring the Message Attachment or File Size Match condition” on page 905.
Message Attachment or File Name Match | See “Configuring the Message Attachment or File Name Match condition” on page 906.
Custom File Type Signature | See “Configuring the Custom File Type Signature condition” on page 908.

Protocol and Endpoint

Network Protocol | See “Configuring the Protocol Monitoring condition for network detection” on page 913.
Endpoint Protocol or Destination | See “Configuring the Endpoint Monitoring condition” on page 918.
Endpoint Device Class or ID | See “Configuring the Endpoint Device Class or ID condition” on page 920.
Endpoint Location | See “Configuring the Endpoint Location condition” on page 919.

Form Recognition

Detect using Form Recognition profile | See “Configuring the Form Recognition exception rule” on page 700.

Group (identity)

Sender/User Matches Pattern | See “Configuring the Sender/User Matches Pattern condition” on page 927.
Recipient Matches Pattern | See “Configuring the Recipient Matches Pattern condition” on page 930.
Sender/User based on a Directory Server Group | See “Configuring the Sender/User based on a Directory Server Group condition” on page 939.
Recipient based on a Directory Server Group | See “Configuring the Recipient based on a Directory Server Group condition” on page 940.
Sender/User based on a Directory from an EDM Profile | See “Configuring the Sender/User based on a Profiled Directory condition” on page 944.
Recipient based on a Directory from an EDM Profile | See “Configuring the Recipient based on a Profiled Directory condition” on page 945.

Configuring compound match conditions

You can create compound match conditions for policy rules and exceptions.
See “Configuring compound match conditions” on page 429.
The detection engine connects compound conditions with an AND. All conditions in the rule
or exception must be met to trigger or except an incident.
See “Policy detection execution” on page 394.
There is no limit to the number of match conditions you can include in a rule or exception.
However, the conditions you declare in a single rule or exception should be logically
associated. Do not confuse a compound rule or exception with multiple rules or exceptions in
a policy.
See “Use compound conditions to improve match accuracy” on page 455.
Table 20-11 Configure a compound policy rule or exception

Step | Action | Description

Step 1 | Modify or configure an existing policy rule or exception. | You can add one or more additional match conditions to a rule or exception at the Configure Policy – Edit Rule or Configure Policy – Edit Exception screen.
Step 2 | Select an additional match condition. | Select the additional match condition from the Also Match list. This list appears at the bottom of the Conditions section for an existing rule or exception.
Step 3 | Review the available conditions. | The system lists all available additional conditions you can add to a policy rule or exception. See “Adding a rule to a policy” on page 415. See “Adding an exception to a policy” on page 424.
Step 4 | Add the additional condition. | Click Add to add the additional match condition to the policy rule or exception. Once added, you can collapse and expand each condition in a rule or exception.
Step 5 | Configure the additional condition. | See “Configuring policy rules” on page 417. See “Configuring policy exceptions” on page 426.
Step 6 | Select the same or any component to match. | If the condition supports component matching, specify where the data must match to generate or except an incident.
Same Component – The matched data must exist in the same component as the other condition(s) that also support component matching to trigger a match.
Any Component – The matched data can exist in any component that you have selected.
See “About cross-component matching” on page 733.
Step 7 | Repeat this process to add more match conditions to the rule or exception. | You can add as many conditions to a rule or exception as you need. All conditions in a single rule or exception must match to trigger an incident, or to trigger the exception.
Step 8 | Save the policy. | Click OK to close the rule or exception configuration screen. Click Save to save the policy configuration.
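The Same Component and Any Component options in Step 6 can be contrasted in a small Python sketch. It models a message as a hypothetical dictionary of component name to text and each condition as a predicate; the function name `compound_matches` is illustrative. In both modes the conditions are ANDed, as the compound-condition rules require; only the placement of the matches differs.

```python
from typing import Callable, Dict, List

Condition = Callable[[str], bool]  # hypothetical: predicate over one component's text


def compound_matches(message: Dict[str, str], conditions: List[Condition],
                     same_component: bool) -> bool:
    """All conditions must match (AND).

    same_component=True models "Same Component": a single component must
    satisfy every condition. False models "Any Component": each condition
    may be satisfied by a different selected component.
    """
    if not conditions:
        return False
    if same_component:
        return any(all(cond(text) for cond in conditions) for text in message.values())
    return all(any(cond(text) for text in message.values()) for cond in conditions)
```

For a message whose body mentions a project name and whose attachment contains an SSN, a two-condition rule matches under Any Component but not under Same Component.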


Input character limits for policy configuration

When configuring a policy, consider the following input character limits for policy configuration
components.

Table 20-12 Input character limits for policy configuration

Configuration element | Input character limit

Name of a policy component, including: | 60 characters
■ Policy
■ Rule
■ Exception
■ Group
■ Condition
Note: To import a policy as a template, the policy name must be less than 60 characters; otherwise, it does not appear in the Imported Templates list.
Description of a policy component | 255 characters
Name of a Data Profile, including: | 255 characters
■ Exact Data
■ Indexed Document
■ Vector Machine Learning
■ Form Recognition
Data Identifier pattern limits | 100 characters per line
See “Using the data identifier pattern language” on page 814.
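If you prepare policy names, descriptions, or data identifier patterns outside the Enforce Server console, a pre-check against the limits in Table 20-12 can catch problems early. The following Python sketch is hypothetical: the field names and both functions are illustrative and are not part of any Symantec tool or API.

```python
from typing import Dict, List

# Limits from Table 20-12; the dictionary keys are hypothetical field names.
LIMITS = {"component_name": 60, "description": 255, "data_profile_name": 255}
PATTERN_LINE_LIMIT = 100


def check_limits(values: Dict[str, str]) -> List[str]:
    """Return a list of limit violations for the supplied fields."""
    problems = []
    for field, text in values.items():
        if field == "data_identifier_pattern":
            # The pattern limit applies per line, not to the whole pattern.
            for number, line in enumerate(text.splitlines(), start=1):
                if len(line) > PATTERN_LINE_LIMIT:
                    problems.append(
                        f"pattern line {number} exceeds {PATTERN_LINE_LIMIT} characters")
        elif field in LIMITS and len(text) > LIMITS[field]:
            problems.append(f"{field} exceeds {LIMITS[field]} characters")
    return problems


def importable_as_template(policy_name: str) -> bool:
    """Per the note in Table 20-12, the name must be less than 60 characters."""
    return len(policy_name) < 60
```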

Chapter 21
Administering policies
This chapter includes the following topics:

■ Manage and add policies

■ Manage and add policy groups

■ Creating and modifying policy groups

■ Importing policies

■ Exporting policies

■ Cloning policies

■ Importing policy templates

■ Exporting policy detection as a template

■ Adding an automated response rule to a policy

■ Removing policies and policy groups

■ Viewing and printing policy details

■ Downloading policy details

■ Troubleshooting policies

■ Updating EDM and IDM profiles to the latest version

■ Updating policies after upgrading to the latest version

Manage and add policies

The Manage > Policies > Policy List screen is the home page for adding and managing
policies. You implement policies to detect and report data loss.

See “Workflow for implementing policies” on page 378.

Table 21-1 lists and describes the actions you can take at the Policy List screen.

Table 21-1 Policy List screen actions

Action Description

Add a policy Click New to create a new policy.

See “Adding a new policy or policy template” on page 412.

Modify a policy Click the policy name or edit icon to modify an existing policy.

See “Configuring policies” on page 413.

Activate a policy Select the policy or policies you want to activate, then click Activate in the policy list
toolbar.

Make a policy inactive Select the policy or policies you want to make inactive, then click Suspend in the policy
list toolbar.
Note: By default, all solution pack policies are activated on installation of the solution
pack.

Sort policies Click any column header to sort the policy list.

Filter policies You can filter your policy list by Status, Name, Description, or Policy Group.

To filter your policy list, click Filter in the policy list toolbar, then select or enter your filter
criteria in the appropriate column or columns.

To remove filters from your policy list, click Clear in the policy list toolbar.

Remove a policy Select the policy or policies you want to remove, then click Delete in the policy list toolbar.

You can also click the red X icon at the end of the policy row to delete an individual
policy.
Note: You cannot remove a policy that has active incidents.

See “Removing policies and policy groups” on page 443.

Import and export policies You can import and export policies using the Import and Export buttons in the policy
list toolbar.

See “Importing policies” on page 437.

See “Exporting policies” on page 439.

Export and import policy templates You can export and import policy templates for reuse when authoring new policies.
See “Importing policy templates” on page 441.

See “Exporting policy detection as a template” on page 442.


Download policy details Click Download Details in the policy list toolbar to download details for the selected
policies in the Policy List. Symantec Data Loss Prevention exports the policy details
as HTML files in a ZIP archive. Open the archive to view and print policy details.

See “Downloading policy details” on page 444.

View and print policy details To view policy details for a single policy, click the printer icon at the end of the policy
row. To print the policy details, use the print feature of your web browser.

See “Viewing and printing policy details” on page 444.

Clone a policy Select the policy or policies you want to clone, then click Clone in the policy list toolbar.

See “Cloning policies” on page 440.

Assign policies to a policy group You can assign individual or multiple policies to a policy group from the policy list page.
Select the policy or policies you want to assign to a policy group, then click Assign
Group in the policy list toolbar. Select the policy group from the drop-down list.

See “Policy groups” on page 372.

Table 21-2 lists and describes the display fields at the Policy List screen.

Table 21-2 Policy List screen display fields

Column Description

Status The status column displays one of three states for the policy:

■ Misconfigured Policy:
The policy icon is a yellow caution sign.
See “Policy components” on page 370.
■ Active Policy:
The policy icon is green. An active policy can detect incidents.
■ Suspended Policy:
The policy icon is red. A suspended policy is deployed but does not detect incidents.

Name View and sort by the name of the policy.

See “About Data Loss Prevention policies” on page 368.

Description View the description of the policy.

See “Policy templates” on page 371.

Policy Group View and sort by the policy group to which the policy is deployed.

See “Policy groups” on page 372.


Last Modified View and sort by the date the policy was last updated.
See “Policy authoring privileges” on page 375.

Manage and add policy groups

The System > Servers and Detectors > Policy Groups screen lists the configured policy
groups in the system.
From the Policy Groups screen you manage existing policy groups and add new ones.

Table 21-3 Policy Groups screen actions

Action Description

Add a policy group Click Add to define a new policy group.

See “Policy groups” on page 372.

Modify a policy group To modify an existing policy group, click the name of the group.

See “Creating and modifying policy groups” on page 436.

Remove a policy group Select the policy group, then click Delete.
Note: If you delete a policy group, you delete any policies that are assigned to that group.

See “Removing policies and policy groups” on page 443.

Find a policy group You can search for a policy group by entering a search term in the Search bar.
You can filter your results by Name, Description, or Servers by selecting the filter then
clicking Apply Filter.

View policies in a group To view the policies deployed to an existing policy group, navigate to the System > Servers
and Detectors > Policy Groups > Configure Policy Group screen.

See “Creating and modifying policy groups” on page 436.

Table 21-4 Policy Groups screen display fields

Column Description

Name The name of the policy group.

Description The description of the policy group.


Available Servers and Detectors The detection server or cloud detector to which the policy group is deployed.
See “Policy deployment” on page 373.

Last Modified The date the policy group was last modified.

Creating and modifying policy groups

At the System > Servers and Detectors > Policy Groups screen you configure a new policy
group or modify an existing one.
See “Policy groups” on page 372.
To configure a policy group
1 Add a new policy group, or modify an existing one.
See “Manage and add policy groups” on page 435.
2 Enter the Name of the policy group, or modify an existing name.
Use an informative name. Policy authors and Enforce Server administrators rely on the
policy group name when they associate the policy group with policies, roles, and targets.
The name value is limited to 256 characters.
3 Enter a Description of the policy group, or modify the existing description.
4 Select one or more Servers and Detectors to assign the policy group to.
The system displays a check box for each detection server currently configured and
registered with the Enforce Server.
■ Select the All Servers or Detectors option to assign the policy group to all detection
servers and cloud detectors in your system. If you leave this check box unselected,
you can assign the policy group to individual servers.
The All Discover Servers entry is not configurable because the system automatically
assigns all policy groups to all Network Discover Servers. This feature lets you assign
policy groups to individual Discover targets.
See “Configuring the required fields for Network Discover targets” on page 2092.
■ Deselect the All Servers or Detectors option to assign the policy group to individual
detection servers.
The system displays a check box for each server currently configured and registered
with the Enforce Server.

Select each individual detection server to assign the policy group.

5 Click Save to save the policy group configuration.
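You can check a proposed name against the documented 256-character limit before you enter it in step 2. The following is a minimal Python sketch. The function validate_policy_group_name is hypothetical and is not part of any Symantec interface. Rejecting a blank name is an assumption, not a documented rule.

```python
# Documented limit for the policy group Name field (step 2).
MAX_POLICY_GROUP_NAME_LENGTH = 256


def validate_policy_group_name(name: str) -> list:
    """Return a list of problems with a proposed policy group name.

    Hypothetical pre-check, not part of the product: an empty list means
    the name passes both checks below.
    """
    problems = []
    # Assumption: a blank name is never an informative name.
    if not name.strip():
        problems.append("name is empty")
    # Documented rule: the name value is limited to 256 characters.
    if len(name) > MAX_POLICY_GROUP_NAME_LENGTH:
        problems.append(
            f"name has {len(name)} characters; "
            f"the limit is {MAX_POLICY_GROUP_NAME_LENGTH}"
        )
    return problems
```

For example, validate_policy_group_name("Finance - Production") returns an empty list, and a 300-character name returns one problem.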

Note: The Policies in this Group section of the Configure Policy Group screen lists all the policies in
the policy group. You cannot edit these entries. When you create a new policy group, this
section is blank. After you deploy one or more policies to a policy group (during policy
configuration), the Policies in this Group section displays each policy in the policy group.

See “Configuring policies” on page 413.

See “Policy deployment” on page 373.

Importing policies
You can export policies from an Enforce Server and import them to another Enforce Server.
This feature makes it easier to move policies from one environment to another. For example,
you can export policies from your test environment and import them into your production
environment.

About importing policies

To import policies, you must have the Import Policies privilege. To enable this privilege, you
must also have the Server Administration, Author Policies, Author Response Rules, and
All Policy Groups privileges.
See “Configuring roles” on page 114.
When you import a policy, note the following points:
■ The policy is imported in the same state in which it was exported. For example, if a policy
was active when it was exported, it will be active when you import it. The only exception
to this behavior applies to policies that already exist on the system to which you are importing
the policy (the "target system"). If the existing policy is active, then the imported policy will also be
active, regardless of its state on the exporting system.
■ Imported policies will overwrite existing policies that have the same name. You can change
the name of the exported policy in the XML file if you want to import it without overwriting
the existing policy.
■ If the policy group to which the exported policy belonged exists on the target system, the
policy will be added to that policy group, or overwrite a policy of the same name in that
group. If the policy group does not exist on the target system, it will be created upon import.
If the policy exists on the target system, but it belongs to a different policy group, the
imported policy will be assigned to a newly created policy group on the target system, and
will not overwrite the existing policy.

■ When you import a policy, you can choose whether or not to import its response rules if
those rules conflict with existing response rules on the target system.
■ The Policy Import Preview page will display warnings about any policy elements that will
be created or overwritten when you import the policy.
■ You can only import one policy at a time.
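The rules above interact: the name, the policy group, and the existing policy's state each affect the result. The following Python sketch models one reading of those rules so you can predict an outcome before importing. It does not call the product. The names Policy, ImportOutcome, and predict_import are hypothetical. The name of the newly created group is not documented, so a placeholder is used. The state handling in the different-group case is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Policy:
    name: str
    group: str
    active: bool


@dataclass
class ImportOutcome:
    overwrites_existing: bool  # replaces a same-name policy on the target
    group: str                 # policy group the imported policy lands in
    creates_group: bool        # a new policy group is created on the target
    active: bool               # state of the imported policy


NEW_GROUP_PLACEHOLDER = "<new policy group>"  # real name is not documented


def predict_import(exported: Policy, target_policies: List[Policy],
                   target_groups: Set[str]) -> ImportOutcome:
    """Predict where and in what state an imported policy lands.

    A model of one reading of the import rules; it does not call the product.
    """
    existing: Optional[Policy] = next(
        (p for p in target_policies if p.name == exported.name), None)

    if existing is None:
        # No name clash: keep the exported state; reuse or create the group.
        return ImportOutcome(
            overwrites_existing=False,
            group=exported.group,
            creates_group=exported.group not in target_groups,
            active=exported.active,
        )

    if existing.group == exported.group:
        # Same name, same group: the import overwrites the existing policy.
        # Documented exception: an active existing policy stays active.
        return ImportOutcome(
            overwrites_existing=True,
            group=exported.group,
            creates_group=False,
            active=existing.active or exported.active,
        )

    # Same name, different group: the import goes to a newly created group
    # and the existing policy is not overwritten. Assumption: the exported
    # state is kept, because the text does not cover this case.
    return ImportOutcome(
        overwrites_existing=False,
        group=NEW_GROUP_PLACEHOLDER,
        creates_group=True,
        active=exported.active,
    )
```

Keeping the model as a pure function makes it easy to tabulate outcomes for a batch of exports. Verify any surprising prediction on the Policy Import Preview page.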
To import a policy
1 Navigate to Manage > Policies > Policy List.
2 Click Import.
The Import Policy page appears.
3 Click Browse to select the exported policy file you want to import.
4 Click Import Policy.
The Policy Import Preview page appears. This page will warn you of any policy elements
that may be overwritten when you import this policy. If the policy you are importing includes
any response rules among the elements that may be overwritten, you can exclude those
response rules from import on this page.
5 Click Proceed with import.
The policy is imported. If the policy has any unresolved references, the Policy References
Check page appears.
You can resolve any unresolved policy references on this page.
See “About policy references” on page 438.

About policy references

Policies are exported in XML format. The XML policy files contain policy metadata, references
to any data profiles, response rules, data identifiers, and the detection and group rules and
exceptions. The files do not contain the actual data profiles, directory connections, credentials,
or FlexResponse plug-ins. You must provide those items on the system into which you are
importing the policy.
When you import a policy, Symantec Data Loss Prevention will alert you to any unresolved
references on the Policy References Check page. The Policy References Check page
displays at the end of the policy import process. You can also view this page by clicking the
unresolved references icon on the Policy List and Policy Edit pages.
To resolve policy references, click the edit (pencil) icon on the Policy References Check
page. Symantec Data Loss Prevention displays the appropriate edit page for each unresolved
reference. Table 21-5 provides information about resolving policy references.

Table 21-5 Resolving policy references

Unresolved policy reference Resolution

Policy group where no detection server is specified: Select detection servers for the policy group.

Directory connection with missing credentials: Provide the credentials for the directory connection.

EDM profile with missing source file and index: Specify the correct data source file.

IDM profile with missing import path and file name: Specify the correct data source.

Remote IDM profile with missing credentials: Provide the credentials for the remote IDM profile.

VML profile with trained profile and related data missing: Provide the trained profile and its related data, then train and accept the VML profile.

Form Recognition profile with missing gallery ZIP archive: Provide the gallery ZIP archive.

Endpoint quarantine response rule with missing saved credentials: Provide the credentials for the endpoint quarantine response rule.

Response rule with a missing Server FlexResponse plug-in: Deploy the Server FlexResponse JAR file on the target system.

See “Deploying a Server FlexResponse plug-in” on page 2143.

Exporting policies
You can export your policy data to an XML file to easily share policies between Enforce Servers.

About policy export

Exported policies include the following items:
■ Policy rules, including Form Recognition, EDM, IDM, and VML definitions
■ Endpoint locations and devices
■ Sender and recipient patterns
■ Response rules
■ Data identifiers
■ Custom protocols
Exported policies do not include the following items:
■ Credentials
■ Form Recognition, EDM, IDM, or VML indexes
■ Form Recognition, EDM or IDM data source files
■ VML training files
■ FlexResponse plug-ins
To export policies
1 Navigate to Manage > Policies > Policy List.
2 Take one of the following actions:
■ To export a single policy, click the export icon for that policy.
■ To export multiple policies to a ZIP archive, select the policies you want to export, then
click Export.

3 Symantec Data Loss Prevention exports your policy or policies using the following naming
conventions:
■ For single policies, the naming convention is ENFORCEHOSTNAME-POLICYNAME-DATE-TIME.XML.
■ For bulk policy export, the naming convention is ENFORCEHOSTNAME-policies-DATE-TIME.ZIP.
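If you script around exports, for example to archive them or to confirm that an expected export exists, it helps to generate the expected file name. The following Python sketch builds names from the conventions above. The guide does not state the DATE or TIME format. The YYYYMMDD and HHMMSS formats used here are assumptions, and the helper export_file_name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def export_file_name(enforce_host: str, when: datetime,
                     policy_name: Optional[str] = None) -> str:
    """Build an export file name following the documented pattern.

    Assumption: DATE is YYYYMMDD and TIME is HHMMSS; the guide does not
    state either format, so compare against a real export first.
    """
    date_part = when.strftime("%Y%m%d")
    time_part = when.strftime("%H%M%S")
    if policy_name is not None:
        # Single policy: ENFORCEHOSTNAME-POLICYNAME-DATE-TIME.XML
        return f"{enforce_host}-{policy_name}-{date_part}-{time_part}.XML"
    # Bulk export: ENFORCEHOSTNAME-policies-DATE-TIME.ZIP
    return f"{enforce_host}-policies-{date_part}-{time_part}.ZIP"
```

For example, export_file_name("enforce01", datetime(2019, 8, 19, 14, 30), "Webmail") returns "enforce01-Webmail-20190819-143000.XML". The example host name is hypothetical. The sketch builds names rather than parsing them because host names and policy names can both contain hyphens, which makes splitting a name back into its parts ambiguous.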

Cloning policies
You can clone policies from the Policy List page.
Cloned policies are copies of the original policy. They include the following items:
■ Policy name (modified), description, and policy group.
Cloned policies appear in the Policy List as Copy N of original policy name.
■ Policy rules, including Form Recognition, EDM, IDM, and VML definitions

■ Endpoint locations and devices

■ Sender and recipient patterns
■ Response rules
■ Data identifiers
■ Custom protocols

Note: You must have policy authoring privileges to clone policies.

For information about importing and exporting policies and policy templates, see these topics:
See “Exporting policies” on page 439.
See “Importing policies” on page 437.
See “Exporting policy detection as a template” on page 442.
See “Importing policy templates” on page 441.

Importing policy templates

You can import one or more policy templates to the Enforce Server. You must have policy
system privileges to import policy templates.
See “Policy template import and export” on page 377.
See “Exporting policy detection as a template” on page 442.
To import one or more policy templates to the Enforce Server
1 Place one or more policy template XML files in the
\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\templates
directory on the Enforce Server host.
You can import multiple policy templates by placing them all in the templates directory.
2 Make sure that the directory and file(s) are readable by the "protect" system user.
3 Log on to the Enforce Server Administration Console with policy authoring privileges.
4 Navigate to Manage > Policies > Policy List and click Add Policy.
5 Choose the option Add a policy from a template and click Next.
6 Scroll down to the bottom of the template list to the Imported Templates section.
You should see an entry for each XML file you placed in the templates directory.
7 Select the imported policy template and click Next to configure it.
See “Configuring policies” on page 413.
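A template that is not well-formed XML, or that cannot be read, will not be usable after step 1. The following Python sketch is a pre-check you could run on the Enforce Server host before step 3. It uses only the standard library and reports each .xml file in the templates directory as "ok" or with the problem found. It checks readability only for the account that runs the script. That is not the same as step 2's requirement that the "protect" user can read the files. The helper name check_templates is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict

# Templates directory from step 1 (Windows path on the Enforce Server host).
TEMPLATES_DIR = Path(
    r"\Program Files\Symantec\DataLossPrevention\EnforceServer"
    r"\15.5\Protect\config\templates"
)


def check_templates(directory: Path) -> Dict[str, str]:
    """Map each *.xml file in the directory to "ok" or a problem description.

    Only well-formedness and readability for the account running this
    script are checked; that is not the same as readability for "protect".
    """
    results: Dict[str, str] = {}
    for path in sorted(directory.glob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as err:
            results[path.name] = f"not well-formed XML: {err}"
        except OSError as err:
            results[path.name] = f"cannot read: {err}"
        else:
            results[path.name] = "ok"
    return results


if __name__ == "__main__":
    # Optional argument: a different directory to check.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else TEMPLATES_DIR
    for name, status in check_templates(target).items():
        print(f"{name}: {status}")
```

Well-formed XML is necessary but not sufficient: a file can parse cleanly and still not be a valid policy template. The Imported Templates section in step 6 remains the real test.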

Exporting policy detection as a template

You can export policy detection rules and exceptions in a template (XML file). You cannot
export policy response rules. You can only export one policy template at a time.
See “Policy template import and export” on page 377.
To export a policy as a template
1 Log on to the Enforce Server administration console with administrator privileges.
2 Navigate to the Manage > Policies > Policy List > Configure Policy screen for the policy
you want to export.
3 At the bottom of the Configure Policy screen, click the Export this policy as a template
link.
4 Save the policy to a local or network destination of your choice.
For example, the system exports a policy named Webmail to the policy template file
Webmail.xml, which you can save to your local drive.

See “Importing policy templates” on page 441.

For information about importing, exporting, and cloning policies, see these topics:
See “Exporting policies” on page 439.
See “Importing policies” on page 437.
See “Cloning policies” on page 440.

Adding an automated response rule to a policy

You can add one or more automated response rules to a policy to take action when that policy
is violated.
See “About response rules” on page 1738.

Note: Smart response rules are executed manually and are not deployed with policies.

To add an automated response rule to a policy

1 Log on to the Enforce Server administration console with policy authoring privileges.
See “Policy authoring privileges” on page 375.
2 Navigate to the Manage > Policies > Policy List > Configure Policy screen for the policy
you want to add a response rule to.

3 Select the response rule you want to add from those available in the drop-down menu.
Policies and response rules are configured separately. To add a response rule to a policy,
the response rule must first be defined and saved independently.
See “Implementing response rules” on page 1758.
4 Click Add Response Rule to add the response rule to the policy.
5 Repeat the process to add additional response rules to the policy.
6 Save the policy when you are done adding response rules.
7 Verify that the policy status is green after adding the response rule to the policy.
See “Manage and add policies” on page 432.

Note: If the policy status is a yellow caution sign, the policy is misconfigured. The system does
not support certain pairings of detection rules and automated response rule actions. See
Table 81-2 on page 2276.

Removing policies and policy groups

Consider the following guidelines before you delete a policy or a policy group from the Enforce
Server.

Table 21-6 Guidelines for removing policies and policy groups

Action: Remove a policy
Description: If you attempt to delete a policy that has associated incidents, the system does not let you remove the policy.
Guideline: If you want to delete a policy, you must first delete all incidents that are associated with that policy from the Enforce Server.

See “Manage and add policies” on page 432.

An alternative is to create an undeployed policy group (one that is not assigned to any detection servers). This method is useful to maintain legacy policies and incidents for review without keeping these policies in a deployed policy group.

See “Policy template import and export” on page 377.

Action: Remove a policy group
Description: If you attempt to delete a policy group that contains one or more policies, the system displays an error message and does not delete the policy group.
Guideline: Before you delete a policy group, remove any policies from that group by either deleting them or assigning them to different policy groups.

See “Manage and add policy groups” on page 435.

If you want to remove a policy group, create a maintenance policy group and move the policies you want to remove to the maintenance group.

See “Creating and modifying policy groups” on page 436.

See “About Data Loss Prevention policies” on page 368.

See “Policy groups” on page 372.

Viewing and printing policy details

You can view and print policy details for a single policy from the Policy List screen.
You must have the Author Policies privilege for the policies you want to view and print.
See “Policy authoring privileges” on page 375.
See “Viewing, printing, and downloading policy details” on page 379.
To view and print policy details
1 Navigate to Manage > Policies > Policy List and click the printer icon at the end of the
policy row.
The Policy Snapshot screen appears.
2 View the general policy information, detection rules, and response rules on the Policy
Snapshot screen.
3 To print the policy details, use the Print command in your web browser from the Policy
Snapshot screen.

Downloading policy details

You can download a ZIP archive of details for policies in the Policy List. The ZIP archive
contains HTML documents with details for each selected policy on the Policy List, as well as
an index file to make it easier to find the policy details you want. Each file is named using the
policy ID, such as 123.html. The index file is named downloaded_policies_DATE.html, and it
contains the policy name, description, status, policy group, and last modified date of all selected
policies in the download, as well as links to the policy details.
You must have the Author Policies privilege for the policies you want to download.
See “Policy authoring privileges” on page 375.
See “Viewing, printing, and downloading policy details” on page 379.
To download policy details
1 Navigate to Manage > Policies > Policy List, select the policy or policies you want, then
click Download Details.
2 In the Open File dialog box, select Save File, then click OK.
3 To view details for a policy, extract the files from the ZIP archive, then open the file you
want to view. Use the index file to search through the downloaded policies by policy name,
description, status, policy group, or last modified date.
The Policy Snapshot screen appears.
4 To print the policy details, use the Print command in your web browser from the Policy
Snapshot screen.
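Because the archive uses predictable file names, you can locate a policy's file by ID without opening the index by hand. The following Python sketch relies only on the naming described above: one <policy ID>.html file per policy and one downloaded_policies_DATE.html index. The DATE format is not documented, so the index pattern accepts any text in that position. The helper summarize_download is hypothetical. Tolerating a folder prefix inside the archive is a defensive assumption.

```python
import re
import sys
import zipfile
from typing import Dict, List, Tuple

# Names described above: one <policy ID>.html file per policy plus an
# index file named downloaded_policies_DATE.html. The DATE format is not
# documented, so the index pattern accepts any text in that position.
POLICY_FILE = re.compile(r"^(\d+)\.html$")
INDEX_FILE = re.compile(r"^downloaded_policies_.+\.html$")


def summarize_download(archive_path) -> Tuple[List[str], Dict[int, str]]:
    """Return (index files, {policy ID: archive member}) for a download."""
    index_files: List[str] = []
    policies: Dict[int, str] = {}
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            base = member.rsplit("/", 1)[-1]  # tolerate a folder prefix
            if INDEX_FILE.match(base):
                index_files.append(member)
            else:
                match = POLICY_FILE.match(base)
                if match:
                    policies[int(match.group(1))] = member
    return index_files, policies


if __name__ == "__main__" and len(sys.argv) > 1:
    index_files, policies = summarize_download(sys.argv[1])
    print("Index:", ", ".join(index_files) or "(none)")
    for policy_id, member in sorted(policies.items()):
        print(f"Policy {policy_id}: {member}")
```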

Troubleshooting policies
Table 21-7 lists log files to consult for troubleshooting policies.

Table 21-7 Log files for troubleshooting policies

Log file Description

SymantecDLPDetectionServer.log Logs when policies and profiles are sent from the Enforce Server to
detection servers and endpoint servers. Displays JRE errors.

See “Debug log files” on page 337.

detection_operational.log, detection_operational_trace.log Logs the loading of policies and detection execution.

See “Operational log files” on page 334.

FileReader.log Logs when an index file is loaded into memory. For EDM, look for the
line "loaded database profile." For IDM, look for the line "loaded
document profile."

See “Debug log files” on page 337.

Indexer.log Logs the operations of the Indexer process to generate EDM and IDM
indexes.

See “Debug log files” on page 337.
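To confirm that EDM and IDM indexes were loaded, you can search FileReader.log for the two lines named in Table 21-7. The following Python sketch collects those lines. This section does not give the log's location, so the path is passed as an argument. Case-insensitive matching and UTF-8 decoding are assumptions about the log format, and the helper find_profile_loads is hypothetical.

```python
import sys
from typing import Dict, List

# Lines documented for FileReader.log when an index is loaded into memory.
MARKERS = {
    "EDM": "loaded database profile",
    "IDM": "loaded document profile",
}


def find_profile_loads(log_path) -> Dict[str, List[str]]:
    """Collect the FileReader.log lines that contain each marker.

    Matching is case-insensitive and the file is read as UTF-8; both are
    assumptions about the log format.
    """
    hits: Dict[str, List[str]] = {kind: [] for kind in MARKERS}
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            lowered = line.lower()
            for kind, marker in MARKERS.items():
                if marker in lowered:
                    hits[kind].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    for kind, lines in find_profile_loads(sys.argv[1]).items():
        print(f"{kind}: {len(lines)} load(s)")
        for line in lines:
            print("  " + line)
```

An empty result for a profile type you expect to be deployed is a cue to check the other logs in Table 21-7.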


See “About log files” on page 333.

See “Log collection and configuration screen” on page 343.
See “Configuring server logging behavior” on page 343.
See “Collecting server logs and configuration files” on page 347.
See “Log files for troubleshooting VML training and policy detection” on page 686.
See “Advanced server settings” on page 285.
See “Advanced agent settings” on page 2372.

Updating EDM and IDM profiles to the latest version

You must reindex your data and document sources when you upgrade. Before deploying an
index into production, test the updated profile and policies based on the profile to ensure that
they detect data loss as expected on the upgraded system.
Table 21-8 lists the reindexing requirements for updating your EDM and IDM profiles and
provides links for more information.

Table 21-8 Reindexing requirements for EDM and IDM data profiles

Technology and features: Exact Data Matching (EDM)
■ Multi-token matching
■ Proportional proximity range
Required action(s): If you have existing Exact Data profiles supporting EDM policies and you want to use new EDM features, before upgrading the detection server(s) you must:
■ Reindex each structured data source using the latest EDM indexer, and
■ Load each index into a newly generated Exact Data profile.
More information: See “Updating EDM indexes to the latest version” on page 574.

Technology and features: Indexed Document Matching (IDM)
■ Exact match IDM on the endpoint (Agent IDM)
Required action(s): If you have existing Indexed Document profiles supporting IDM policies and you want to use Agent IDM, after upgrading you must:
■ Disable two-tier detection on the Endpoint Server, and
■ Reindex each document data source so that the endpoint index is generated and deployed to the Endpoint Server for download by the DLP Agent.

Updating policies after upgrading to the latest version

Several policy templates were updated in Symantec Data Loss Prevention 15.1. When you
upgrade to version 15.1, the system updates the system-defined policy templates. Policies
you have created based on an upgraded policy template are not changed, so that configurations
you have made are not overwritten. If you have created policies based on one or more of the
updated policy templates, you should update your policies so that they are current.
The General Data Protection Regulation (GDPR) policy templates were updated to include
several new European data identifiers. The keyword lists were also updated.
Policy templates that use data identifier patterns to detect Social Security Numbers (SSNs)
were updated to use the Randomized US SSN data identifier in Symantec Data Loss Prevention
12.5. The Randomized US SSN data identifier detects both traditional and randomized SSNs.
Symantec recommends that you update your SSN policies to use the Randomized US SSN
data identifier if you have not done so already.
See “Updating policies to use the Randomized US SSN data identifier” on page 810.
Table 21-9 lists the policy templates updated for this release of Symantec Data Loss Prevention.

Table 21-9 Policy templates updated in Data Loss Prevention version 12.5

Updated template: General Data Protection Regulation (Banking and Finance)
Updated component(s): Data identifiers, Keyword lists
Policy description: This policy protects personally identifiable information related to banking and finance.
See “General Data Protection Regulation (Banking and Finance)” on page 1583.

Updated template: General Data Protection Regulation (Digital Identity)
Updated component(s): Data identifiers, Keyword lists
Policy description: This policy protects personally identifiable information related to digital identity.
See “General Data Protection Regulation (Digital Identity)” on page 1617.

Updated template: General Data Protection Regulation (Government Identification)
Updated component(s): Data identifiers, Keyword lists
Policy description: This policy protects personally identifiable information related to government identification.
See “General Data Protection Regulation (Government Identification)” on page 1618.

Updated template: General Data Protection Regulation (Healthcare and Insurance)
Updated component(s): Data identifiers, Keyword lists
Policy description: This policy protects personally identifiable information related to healthcare and insurance.
See “General Data Protection Regulation (Healthcare and Insurance)” on page 1656.

Updated template: General Data Protection Regulation (Personal Profile)
Updated component(s): Data identifiers, Keyword lists
Policy description: This policy protects personally identifiable information related to personal profile data.
See “General Data Protection Regulation (Personal Profile)” on page 1672.

Updated template: General Data Protection Regulation (Travel)
Updated component(s): Data identifiers, Keyword lists
Policy description: This policy protects personally identifiable information related to travel.
See “General Data Protection Regulation (Travel)” on page 1675.
Chapter 22
Best practices for authoring policies
This chapter includes the following topics:

■ Best practices for authoring policies

■ Develop a policy strategy that supports your data security objectives

■ Use a limited number of policies to get started

■ Use policy templates but modify them to meet your requirements

■ Use the appropriate match condition for your data loss prevention objectives

■ Test and tune policies to improve match accuracy

■ Start with high match thresholds to reduce false positives

■ Use a limited number of exceptions to narrow detection scope

■ Use compound conditions to improve match accuracy

■ Author policies to limit the potential effect of two-tier detection

■ Use policy groups to manage policy lifecycle

■ Follow detection-specific best practices

Best practices for authoring policies

This section provides general policy authoring best practices for Symantec Data Loss
Prevention. This section assumes that the reader has general familiarity with policy authoring,
including the configuration, testing, and deployment of policies, detection rules, match
conditions, and policy exceptions.

See “About Data Loss Prevention policies” on page 368.

See “Detecting data loss” on page 381.
Best practices are not intended to provide detailed troubleshooting guidance. Rather, the goal
of this section is to provide best practices that, when followed, help to reduce the need for
policy troubleshooting and support.

Table 22-1 Summary of policy authoring best practices

Develop a policy strategy that supports your data security objectives. See “Develop a policy strategy that supports your data security objectives” on page 451.

Use a limited number of policies to get started. See “Use a limited number of policies to get started” on page 451.

Use policy templates but modify them to meet your requirements. See “Use policy templates but modify them to meet your requirements” on page 452.

Use policy groups to manage policy lifecycle. See “Use policy groups to manage policy lifecycle” on page 457.

Use the appropriate match condition for your data loss prevention objectives. See “Use the appropriate match condition for your data loss prevention objectives” on page 452.

Test and tune policies to improve match accuracy. See “Test and tune policies to improve match accuracy” on page 453.

Start with high match thresholds to reduce false positives. See “Start with high match thresholds to reduce false positives” on page 454.

Use a limited number of exceptions to narrow detection scope. See “Use a limited number of exceptions to narrow detection scope” on page 455.

Use compound conditions to improve match accuracy. See “Use compound conditions to improve match accuracy” on page 455.

Author policies to limit the potential effect of two-tier detection. See “Author policies to limit the potential effect of two-tier detection” on page 456.

Follow detection-specific best practices. See “Follow detection-specific best practices” on page 457.

Develop a policy strategy that supports your data security objectives
The goal of detection is to achieve accurate results based on true policy matches. Well-authored
policies should accurately detect the data you want to protect with minimal false positives.
Through the use of well-defined policies that implement the right type and combination of rules,
conditions, and exceptions, you can achieve accurate detection results and prevent the loss
of the most critical data in your enterprise.
There are two general approaches to developing a data loss prevention policy strategy:
■ Information-driven – Identify sensitive data and author policies to prevent it from being lost.
■ Regulation-driven – Review government and industry regulations and author policies to
comply with them.
Table 22-2 describes these two approaches in more detail.

Table 22-2 Policy detection approaches

Approach | Description
Information-driven | With this approach, you start by identifying the specific data items and data combinations you want to protect. Examples of such data include fields profiled from a database, a list of keywords, a set of users, or a combination of these elements. You then group similar data items together and create policies to identify and protect them. This approach works best when you have limited access to the data or no particular concerns about a given regulation.
Regulation-driven | With this approach, you begin with a policy template based on the regulations with which you must comply. Examples of such templates include HIPAA and FACTA. Also, begin with a large set of data (such as customer or employee data). Use the high-level requirements stipulated by the regulations as the basis for this approach. Then, decide which sensitive data items and documents in your enterprise meet these requirements. These data items become the conditions for the detection rules and exceptions in your policies.

Use a limited number of policies to get started

The policy detection rules you implement are based on your organization's information security objectives. The actions you take in response to policy violations are based on your organization's compliance requirements. In general, start small with policy detection: enable one or two policy templates, or a few simple conditions such as keyword matching. Review the incidents each policy detects, and tune the results before you implement response rules to take action.
Generally, it is better to have fewer policies that are configured to address specific data loss prevention objectives than many policies that attempt to address all of your security
requirements. Having too many policies can impact the performance of the system and can
lead to too many false positives.
See “Test and tune policies to improve match accuracy” on page 453.

Use policy templates but modify them to meet your requirements
Policy templates provide an excellent starting point for authoring policies. Symantec Data Loss
Prevention provides 65 pre-built policy templates that contain detection rules and conditions
for many different types of use cases, including regulatory compliance, data protection, security
enforcement, and acceptable use scenarios.
Use the system-provided policy templates as starting points for your policies. Doing so saves time and helps you avoid errors and information gaps in your policies, because the detection methods are predefined. However, in most situations you will want to modify the policy template and tailor it for your specific environment. Deploying a policy template out-of-the-box without configuring it for your environment is not recommended.
See “Creating a policy from a template” on page 397.

Use the appropriate match condition for your data loss prevention objectives
To prevent data loss, you must accurately detect all types of confidential data wherever that data is stored, copied, or transmitted. To meet your data security objectives, implement the detection methods that are appropriate for the type of data you want to protect. The recommended approach is to determine which detection methods work best for you and then tune the policies as necessary based on the results of your detection testing.
Table 22-3 describes the primary use case for each type of policy match condition provided
by Data Loss Prevention.

Table 22-3 Match conditions compared

Type of data you want to protect | Condition | Matching
Personally Identifiable Information (PII), such as SSNs, CCNs, and Driver's License numbers | EDM | Exact profiled data
| Data Identifiers | Described, validated data patterns
Confidential documents, such as Microsoft Word, PowerPoint, PDF, etc. | IDM | Exact file contents; partial file contents (derivative)
| VML | Similar file contents
Confidential files and images, such as CAD drawings | IDM | Exact file
| File Properties | File context (type, name, size)
Words and phrases, such as "Confidential" or "Proprietary" | Keywords | Exact words, phrases, proximity
Characters, strings, text | Regular Expressions | Described text
Network and endpoint communications | Protocol and Endpoint | Protocols, destinations, monitoring
Determined by the identity of the user, sender, recipient | Synchronized DGM | Exact identity from LDAP server
| Profiled DGM | Exact profiled identity
| Sender/user, recipient | Described identity patterns
Describes a document, such as author, title, date, etc. | Content-based conditions | File type metadata

Test and tune policies to improve match accuracy

When you create detection policies, there are two common detection problems to avoid. If you create a policy that is too general or too broad, it generates incidents when no real match has occurred (false positives). On the other hand, if a policy has rules that are too specific or narrow about the data it detects, the policy may miss some of the matches you intend to catch (false negatives). Table 22-4 describes these common problems in more detail.
To reduce false positives and negatives, you need to tune your policies. The best way to tune detection is to identify a single, specific use case that is a priority, such as protecting source code for a particular product. You then create a single policy (either from scratch or based on a template, depending on your DLP strategy) that contains one or two detection rules, and test the policy to see how many incidents (quantity) and what types of incidents (quality) it generates. Based on these initial results, adjust the detection rule(s) as needed. If the policy generates more false positives than you want, make the detection rule(s) more specific by fine-tuning the existing match conditions, adding match conditions, and creating policy exceptions. If the policy misses incidents it should detect, make the detection condition(s) less specific.
As your policies mature, it is important to continuously test and tune them to ensure ongoing
accuracy.
See “Follow detection-specific best practices” on page 457.

Table 22-4 Common detection problems to avoid

Problem | Cause | Description
False positives | Policy rules too general or broad | False positives create high costs in the time and resources required to investigate and resolve apparent incidents that are not actual incidents. Because many organizations do not have the capacity to manage excess false positives, your policies should define contextual rules to improve accuracy. For example, suppose a policy designed to protect customer names generates an incident for anything that contains a first and last name. Because most messages contain a name, in many cases both first and last names, this policy is too broad and general. Although it may catch all instances of customer names being sent outside the network, it will return too many false positives by detecting email messages that do not divulge protected information. First and last names require a much greater understanding of context to determine whether the data is confidential.
False negatives | Policy rules too tight or narrow | False negatives hide gaps in security: they allow data loss, with the potential for financial losses, legal exposure, and damage to the reputation of an organization. False negatives are especially dangerous because you do not know that you have lost sensitive data. For example, a policy that contains a keyword match on the word "confidential" but also contains a condition that excludes all Microsoft Word documents would be too narrow. It would be susceptible to false negatives because it would likely miss many actual incidents contained in such documents.

See “Start with high match thresholds to reduce false positives” on page 454.
See “Use a limited number of exceptions to narrow detection scope” on page 455.
See “Use compound conditions to improve match accuracy” on page 455.
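The tuning loop described above (test one policy, inspect the quantity and quality of incidents, then tighten or loosen rules) can be summarized in a small helper. This is a minimal sketch, assuming that a reviewer has labeled each test incident as a true or false match and that you also know how many known sensitive test samples the policy failed to flag. The `ReviewedIncident` type, the function name, and the 20% tolerance are hypothetical illustrations, not product features.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReviewedIncident:
    """An incident from a test run, labeled by a reviewer (hypothetical record type)."""
    incident_id: int
    is_true_match: bool  # True if the incident really exposed protected data


def tuning_advice(
    incidents: List[ReviewedIncident],
    missed_samples: int,
    max_false_positive_rate: float = 0.2,
) -> Tuple[float, str]:
    """Return the false positive rate of a test run and a tuning direction.

    missed_samples is the number of known sensitive test samples the policy
    failed to flag (false negatives). max_false_positive_rate is an
    illustrative tolerance, not a product setting.
    """
    total = len(incidents)
    false_positives = sum(1 for i in incidents if not i.is_true_match)
    rate = false_positives / total if total else 0.0
    if rate > max_false_positive_rate:
        # Too broad: fine-tune conditions, add conditions, or add exceptions.
        return rate, "more specific"
    if missed_samples > 0:
        # Too narrow: loosen the detection conditions.
        return rate, "less specific"
    return rate, "no change"


run = [ReviewedIncident(1, True), ReviewedIncident(2, False),
       ReviewedIncident(3, False), ReviewedIncident(4, True)]
print(tuning_advice(run, missed_samples=0))
```

The sketch checks false positives before false negatives only to keep the example short; in practice you would weigh both against your own objectives.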

Start with high match thresholds to reduce false positives
For content-based detection rules, a configuration setting lets you "count all matches" but report an incident only after a threshold number of matches has been reached. The general recommendation is to start with high match thresholds for your content-based detection policies. As you tune your policies, you can lower the match thresholds to refine detection.
See “Configuring match counting” on page 421.
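The effect of a match threshold can be illustrated with a short model: every match in the content is counted, and an incident is reported only when the count reaches the threshold. This is a minimal sketch, assuming a simple SSN-like regular expression as the condition; it is not how Symantec Data Loss Prevention implements match counting, and real data identifiers also validate their matches.

```python
import re

# Illustrative SSN-like pattern (an assumption for this sketch).
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_all_matches(text: str) -> int:
    """Count every match in the text rather than stopping at the first one."""
    return len(SSN_LIKE.findall(text))


def creates_incident(text: str, threshold: int) -> bool:
    """Report an incident only once the match count reaches the threshold."""
    return count_all_matches(text) >= threshold


message = "IDs: 123-45-6789, 234-56-7890 and 345-67-8901"
print(creates_incident(message, threshold=5))  # False: initial high threshold
print(creates_incident(message, threshold=3))  # True: after lowering the threshold
```

Starting high means a message with a few incidental matches does not create an incident; lowering the threshold later widens detection once you have seen the results.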

Use a limited number of exceptions to narrow detection scope
You can implement exception conditions for any detection rule except EDM rules. Used sparingly, exception conditions can help reduce false positives by narrowing the scope of policy detection. However, if you need to use several exceptions in a single policy to achieve the desired detection results, reconsider the design of the policy. Make sure the policy is well-defined and uses the proper match conditions.

Caution: Too many compound exceptions in a policy can cause system performance issues.
You should avoid the use of compound exceptions as much as possible.

It is important to understand how exception conditions work so you can use them properly. Exception conditions disqualify messages from creating incidents. The detection server checks exception conditions before match conditions. If an exception condition matches, the system immediately discards the entire message or message component that met the exception. There is no support for match-level exceptions. Once a message or message component is discarded by meeting an exception, its data is no longer available for policy evaluation.
See “Exception conditions” on page 393.
See “Use compound conditions to improve match accuracy” on page 455.
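The evaluation order described above can be modeled in a few lines: exceptions run first, and a matching exception discards the whole component, so its content never reaches the match conditions. This is a conceptual sketch only, assuming conditions can be represented as simple text predicates; the `Rule` type and the example phrases are hypothetical and do not reflect the product's internals.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Condition = Callable[[str], bool]


@dataclass
class Rule:
    match_conditions: List[Condition]  # all must match
    exception_conditions: List[Condition] = field(default_factory=list)


def evaluate(component: str, rule: Rule) -> bool:
    """Return True if the message component would create an incident."""
    # Exceptions are checked first. A matching exception discards the whole
    # component, so none of its content is evaluated against match conditions.
    if any(exception(component) for exception in rule.exception_conditions):
        return False
    return all(condition(component) for condition in rule.match_conditions)


rule = Rule(
    match_conditions=[lambda text: "confidential" in text.lower()],
    exception_conditions=[lambda text: "approved for public release" in text.lower()],
)
print(evaluate("Confidential: Q3 forecast", rule))  # True
print(evaluate("Confidential draft, approved for public release", rule))  # False
```

The second call shows why broad exceptions are risky: the component contains a genuine keyword match, but the exception removes the entire component from evaluation.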

Use compound conditions to improve match accuracy

Compound conditions can help you improve the match accuracy of your policies. Suppose
you are concerned about Microsoft Word documents leaving the network. Initially, you add a
policy that uses an attachment type condition to catch all Word files. You quickly discover that
too many messages contain Word file attachments that do not divulge protected information.
When you examine the incidents more closely, you realize that you are more concerned with
Word files that contain the word CONFIDENTIAL. In this case, you can convert the attachment type condition to a compound rule by adding a keyword condition for the word CONFIDENTIAL. This configuration achieves more accurate detection results.
See “Compound conditions” on page 394.
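The Word-file example above can be sketched as a compound rule in which every condition must match. This is a simplified model, assuming that the attachment type can be approximated by the file extension (a real attachment type condition may identify file types by other means) and that the keyword check is case-insensitive; the `Attachment` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attachment:
    file_name: str
    text: str


def is_word_file(att: Attachment) -> bool:
    # Simplification: checks the extension only.
    return att.file_name.lower().endswith((".doc", ".docx"))


def has_confidential_keyword(att: Attachment) -> bool:
    # Case-insensitive keyword check (an assumption of this sketch).
    return "CONFIDENTIAL" in att.text.upper()


# A compound rule matches only when every one of its conditions matches.
COMPOUND_RULE: List[Callable[[Attachment], bool]] = [is_word_file, has_confidential_keyword]


def rule_matches(att: Attachment) -> bool:
    return all(condition(att) for condition in COMPOUND_RULE)


print(rule_matches(Attachment("menu.docx", "Lunch options")))  # False
print(rule_matches(Attachment("roadmap.docx", "CONFIDENTIAL: roadmap")))  # True
```

With only `is_word_file` in the list, both attachments would match; adding the keyword condition removes the first one, which is the false positive the original single-condition policy produced.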
Author policies to limit the potential effect of two-tier detection
The Exact Data Matching (EDM) and profiled Directory Group Matching (DGM) conditions
require two-tier detection. For these conditions, the DLP Agent must send the data to the
Endpoint Server for evaluation. Indexed Document Matching (IDM) uses two-tier detection if
it is enabled.
See “Two-tier detection for DLP Agents” on page 395.
On the endpoint, the DLP Agent executes the least expensive rules first. If you deploy a policy that requires two-tier detection to the endpoint, you can author the policy in a way that limits the potential effect of two-tier detection.
Table 22-5 provides some considerations for authoring policies to limit the potential effect of
two-tier detection.
See “Detection messages and message components” on page 391.

Table 22-5 Policy configurations for two-tier detection rules

Two-tier match condition | Policy configuration
Exact Data Matching (EDM) | For EDM policies, consider including Data Identifier rules OR'd with EDM rules. For example, for a policy that uses an EDM condition to match social security numbers, you could add a second rule that uses the SSN Data Identifier condition. The Data Identifier does not require two-tier detection and is evaluated locally by the DLP Agent. If the DLP Agent is not connected to the Endpoint Server when the DLP Agent receives the data, the DLP Agent can still perform SSN pattern matching based on the Data Identifier condition. See “Combine Data Identifiers with EDM rules to limit the impact of two-tier detection” on page 610. For example policy configurations, each of the policy templates that provide EDM conditions also provide corresponding Data Identifier conditions. See “Choosing an Exact Data Profile” on page 409.
Indexed Document Matching (IDM) | For IDM policies that match file contents, consider using VML rules OR'd with IDM rules. VML rules do not require two-tier detection and are executed locally by the DLP Agent. If you do not need to match file contents exactly, you may want to use VML instead of IDM. See “Use the appropriate match condition for your data loss prevention objectives” on page 452. If you are only concerned with file matching, not file contents, consider using compound file property rules instead of IDM. File property rules do not require two-tier detection. See “Use compound file property rules to protect design and multimedia files” on page 909.
Directory Group Matching (DGM) | For the synchronized DGM Recipient condition, consider including a Recipient Matches Pattern condition OR'd with the DGM condition. The pattern condition does not require two-tier detection and is evaluated locally by the DLP Agent. See “About two-tier detection for synchronized DGM” on page 936.
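The EDM recommendation in Table 22-5 can be modeled as two OR'd rules: a local pattern rule that the agent can always evaluate, and a two-tier rule that needs the Endpoint Server. This is a conceptual sketch, assuming a set of profiled values stands in for the server-side EDM index and a formatted-SSN regular expression stands in for the SSN Data Identifier; neither reflects how the product stores indexes or communicates with the server.

```python
import re
from typing import Optional, Set

# Stand-in for the SSN Data Identifier: formatted SSNs only (illustrative).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def data_identifier_rule(text: str) -> bool:
    """Evaluated locally on the agent; no server round trip needed."""
    return SSN_PATTERN.search(text) is not None


def edm_rule(text: str, server_index: Optional[Set[str]]) -> Optional[bool]:
    """Two-tier rule that needs the Endpoint Server's profiled values.

    server_index is None when the agent cannot reach the Endpoint Server;
    the rule then returns None because it cannot be evaluated.
    """
    if server_index is None:
        return None
    return any(token in server_index for token in re.findall(r"[\w-]+", text))


def policy_matches(text: str, server_index: Optional[Set[str]]) -> bool:
    # The rules are OR'd. The cheaper local rule runs first, so the server
    # is needed only when the local rule does not match.
    if data_identifier_rule(text):
        return True
    return edm_rule(text, server_index) is True


profiled = {"123456789"}  # hypothetical profiled (unformatted) SSN
print(policy_matches("SSN 123-45-6789", None))    # True: local rule, no server
print(policy_matches("SSN 123456789", profiled))  # True: EDM rule via server
print(policy_matches("SSN 123456789", None))      # False: server unreachable
```

The last call shows the gap that the OR'd Data Identifier rule narrows: while disconnected, only content the local rule can recognize is still detected.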

Use policy groups to manage policy lifecycle

Use policy groups to test policies before using them in production. Create a test policy group
to which only you have access. Then, create policies and add them to the test policy group.
Review the incidents your test policies capture. After you tune the policies and confirm that
they capture the expected incidents, you can rename the policy group and grant the appropriate
roles access to it. You can also use policy groups to manage legacy policies, as well as policies
you want to import or export.
See “Policy groups” on page 372.
See “Removing policies and policy groups” on page 443.

Follow detection-specific best practices

In addition to these general policy authoring considerations, be aware of the policy tuning considerations that are specific to each type of match condition.
Table 22-6 lists detection-specific considerations, with links to topics for more information.

Table 22-6 Best practices for specific detection methods

Detection method | Description
EDM | See “Best practices for using EDM” on page 601.
IDM | See “Best practices for using IDM” on page 648.
VML | See “Best practices for using VML” on page 687.
Data identifiers | See “Best practices for using data identifiers” on page 833.
Keywords | See “Best practices for using keyword matching” on page 849.
Regular expressions | See “Best practices for using regular expression matching” on page 855.
Non-English language detection | See “Best practices for detecting non-English language content” on page 867.
File properties | See “Best practices for using file property matching” on page 909.
Network protocols | See “Best practices for using network protocol matching” on page 914.
Endpoint events | See “Best practices for using endpoint detection” on page 923.
Described identities | See “Best practices for using described identity matching” on page 932.
Synchronized DGM | See “Best practices for using synchronized DGM” on page 941.
Profiled DGM | See “Best practices for using profiled DGM” on page 946.
Metadata detection | See “Best practices for using metadata detection” on page 991.
Chapter 23
Increasing the Inspection Content Size
This chapter includes the following topics:

■ Increasing the inspection content size

Increasing the inspection content size

Data Loss Prevention lets you increase the inspection content size from the administration console. The default maximum file inspection size remains 30 MB, but you can adjust it to higher values. For detection servers, use the slider on the System > Servers and Detectors > Overview > Configure Server page, under the Detection tab. Currently, the highest limit for detection servers (except Discover Exchange Crawler and Web Prevent) is 2 GB.
For agent configurations, use the slider on the System > Agents > Agent Configuration page, under the Settings tab. Currently, the highest limit for the DLP Agent is 150 MB.
There are different content inspection file size limits for different channels. Table 23-1 lists the
different channels that Symantec has tested and the corresponding supported file size limits.

Table 23-1 Channel-specific content inspection file size limits

Channel | File size limit
Endpoint Prevent | 150 MB
EDAR | 150 MB
Discover Exchange Crawler | 150 MB
Discover File System | 2 GB
Discover SharePoint | 2 GB
Appliance - REST | 1.2 GB (1.7 GB Base64 encoded)
Web Prevent FTP | 150 MB
Web Prevent HTTPS/HTTP | 100 MB
SMTP Prevent | 150 MB
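If you plan capacity or pre-screen test files, the tested limits in Table 23-1 can be kept in a simple lookup. This is a minimal sketch, assuming that the table's MB and GB are binary units and that a file is "within" a limit when its size is less than or equal to it; the dictionary and function names are hypothetical. The limit actually applied on a server or agent is the one you configure with the slider, not this table.

```python
from typing import Dict

MB = 1024 ** 2  # assumption: the table's MB and GB are binary units
GB = 1024 ** 3

# Tested content inspection file size limits from Table 23-1.
TESTED_LIMITS: Dict[str, int] = {
    "Endpoint Prevent": 150 * MB,
    "EDAR": 150 * MB,
    "Discover Exchange Crawler": 150 * MB,
    "Discover File System": 2 * GB,
    "Discover SharePoint": 2 * GB,
    "Appliance - REST": int(1.2 * GB),
    "Appliance - REST (Base64 encoded)": int(1.7 * GB),
    "Web Prevent FTP": 150 * MB,
    "Web Prevent HTTPS/HTTP": 100 * MB,
    "SMTP Prevent": 150 * MB,
}


def within_tested_limit(channel: str, file_size_bytes: int) -> bool:
    """Return True if the file size is within the tested limit for the channel.

    Raises KeyError for a channel that is not listed in the table.
    """
    return file_size_bytes <= TESTED_LIMITS[channel]


print(within_tested_limit("Discover File System", 1 * GB))      # True
print(within_tested_limit("Web Prevent HTTPS/HTTP", 120 * MB))  # False
```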

Increasing the maximum inspection size limit for files means that larger files are inspected. Inspection of larger files takes longer and requires more memory to complete. Timeout limits also increase, so the detection engine takes longer to time out when detection fails.
Depending on the content inspection size you choose, certain advanced settings are
automatically adjusted. The Inspection Content Size feature only shows the inspection size
options that you can enable based on your existing system memory.

Note: To complete the update, you must restart the service after you have increased the
maximum inspection size limit using the slider or edited any properties files.

Whether the "Increasing the maximum inspection size limit" feature is enabled depends on several factors:
■ For a new detection server, the slider is disabled by default and the check box is not selected.
■ For a new Agent, the slider is enabled at 30 MB by default and the check box is selected.
■ Memory limits on the server are different from memory limits on the agent.
■ You cannot use the slider to increase the maximum inspection size limit if the detection server is not connected to an Enforce Server.

Note: The maximum inspection size limit for the DLP cloud services is not
customer-configurable. These limits are enumerated in the Service Description for the DLP
cloud services. This feature is only available for detection servers, appliances, and the DLP
Agent.

To customize the inspection content size

1 Go to System > Servers and Detectors > Configure a Server for detection servers or
System > Agents > Agent Configuration > Settings for DLP Agents.
2 Click the Detection tab for detection servers, or go to the Settings section for DLP Agents.
3 Under Inspection Content Size, click Customize settings.
Move the slider to the size you want. The values that follow are examples only; you see only the options that can be enabled based on your system memory.
■ 30 MB, 50 MB, 100 MB, or 150 MB for DLP Agents
■ 30 MB, 100 MB, 150 MB, 500 MB, or 2 GB for detection servers and appliances
When you select a new size, Symantec Data Loss Prevention automatically updates Advanced Server or Advanced Agent settings to implement your selection. If your settings differ from the preferred and recommended settings, a Preview updated settings link appears.
4 Click Preview updated settings to see the Advanced Setting Name, Current Value,
and Preferred Value.
5 For detection servers only, if you need to change properties file settings, a Tuning Guidelines link appears. You can click the link and review the Symantec Support Center article Guidelines for editing properties files to scan large files. You do not need to edit properties files for the DLP Agent.
6 Restart the service. To complete the update, you must restart the service after you have
adjusted the maximum inspection size limit using the slider or edited any properties files.
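The Preview updated settings view in step 4 compares each advanced setting's current value with its preferred value. A minimal sketch of that comparison follows, assuming settings can be represented as name/value strings; the setting names in the example are hypothetical placeholders, not real Advanced Server or Advanced Agent setting names.

```python
from typing import Dict, List, Tuple


def preview_updated_settings(
    current: Dict[str, str], preferred: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (setting name, current value, preferred value) for every setting
    whose current value differs from the preferred value."""
    rows = []
    for name in sorted(preferred):
        current_value = current.get(name, "<not set>")
        if current_value != preferred[name]:
            rows.append((name, current_value, preferred[name]))
    return rows


# Hypothetical setting names and values, for illustration only.
current = {"Example.MaxFileSize": "30M", "Example.Timeout": "30000"}
preferred = {"Example.MaxFileSize": "150M", "Example.Timeout": "30000"}
rows = preview_updated_settings(current, preferred)
show_preview_link = bool(rows)  # the link appears only when something differs
print(rows)
```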

System Event Codes

System events are shown whenever the Advanced Settings are updated. For a list of system events that you might see after Advanced Settings have been updated, see Table 23-2.

Table 23-2 System Events for changes in Advanced Settings for larger files

System event code | Description/Message | Server or Agent
5306 | Agent advanced settings update is complete. | Agent
5307 | Agent advanced settings have been updated. | Agent
5308 | Agent advanced settings update has failed. | Agent
5309 | Server advanced settings update is complete. | Server
5310 | Advanced settings have been updated for the server. | Server
5311 | Advanced settings update has failed for the server {0}. | Server
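If you review system event codes in bulk (for example, from a report you export; how you obtain the codes is an assumption outside this guide), the codes in Table 23-2 can be classified with a lookup. This is a minimal sketch; the dictionary layout and function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Event codes from Table 23-2: code -> (component, outcome).
ADVANCED_SETTINGS_EVENTS: Dict[int, Tuple[str, str]] = {
    5306: ("Agent", "complete"),
    5307: ("Agent", "updated"),
    5308: ("Agent", "failed"),
    5309: ("Server", "complete"),
    5310: ("Server", "updated"),
    5311: ("Server", "failed"),
}


def failed_updates(codes: Iterable[int]) -> List[Tuple[int, str]]:
    """Return (code, component) for each failed advanced settings update,
    ignoring codes that are not in the table."""
    result = []
    for code in codes:
        component, outcome = ADVANCED_SETTINGS_EVENTS.get(code, ("", ""))
        if outcome == "failed":
            result.append((code, component))
    return result


print(failed_updates([5306, 5308, 1234, 5311]))
```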

If you choose a setting of 500 MB or greater on the detection server, Symantec recommends that you enable external storage for incident attachments (blob externalization). To enable external storage for incident attachments during installation or upgrade, see “External storage for incident attachments” in the Symantec Data Loss Prevention Installation Guide and the Symantec Data Loss Prevention Upgrade Guide. You can find the Symantec Data Loss Prevention Installation Guide at the Symantec Support Center at https://www.symantec.com/docs/doc9257.html. You can find the Symantec Data Loss Prevention Upgrade Guide at the Symantec Support Center at https://www.symantec.com/docs/doc9258.html.
To enable external storage for incident attachments after installation or upgrade, see “About the incident attachment external storage directory” in the Symantec Data Loss Prevention System Maintenance Guide. You can find the Symantec Data Loss Prevention System Maintenance Guide at the Symantec Support Center at https://www.symantec.com/docs/doc9267.html.
Chapter 24
Installing remote indexers
This chapter includes the following topics:

■ About installing remote indexers

■ Installing a remote indexer on Windows

■ Installing a remote indexer on Linux

■ Configuring a remote indexer on Linux

About installing remote indexers

You install remote indexers on one or more systems where the confidential files you want to
index are stored. The steps to install remote indexers are different depending on the operating
system.

Note: The indexer that is available on the Enforce Server administration console does not
require separate installation. It is installed when you install the Enforce Server.

If you install a remote indexer on Windows, you can perform a Silent Mode installation or use the graphical user interface to install.
See “Installing a remote indexer on Windows” on page 464.
On Linux, you install RPM files, then you configure the installation. You can configure the
installation using the Silent Mode method or by running a command prompt to enter
configuration parameters.
See “Installing a remote indexer on Linux” on page 466.
You can install the Remote EDM, the Remote EMDI, and the Remote IDM Indexer on all
supported Windows and Linux platforms. See the Symantec Data Loss Prevention System
Requirements Guide for platform details.
Note: You must be logged on as administrator (Windows) or root (Linux) to install the remote indexers. There is an issue with the permissions that are needed to run the remote indexers. You need to follow a workaround procedure to ensure that users other than administrator or root can run the remote indexers.

See “Installing a remote indexer on Windows” on page 464.

Installing a remote indexer on Windows

Follow this procedure to install the remote indexer software on a remote indexer computer.
You specify the type of remote indexer during the configuration process that follows this
installation process.

Note: The following instructions assume that the indexer installer (Indexers.msi) has been
copied from the Enforce Server to a local directory on the remote computer. The Indexers.msi
file is included in your software download (DLPDownloadHome) directory. It should have been
copied to a local directory on the Enforce Server during the Enforce Server installation process.

Using the graphical user interface method to install does not generate log information. To generate log information, run the installation using the following command:
msiexec /i Indexers.msi /L*v C:\indexers_install.log

You can complete the installation using Silent Mode. Supply values specific to your installation for the following parameters:

Table 24-1 Indexer Silent Mode installation parameters for Windows

Parameter | Description
INSTALLATION_DIRECTORY | Specifies where the remote indexer is installed. The default location is C:\Program Files\Symantec\DataLossPrevention.
DATA_DIRECTORY | Defines where Symantec Data Loss Prevention stores files that are updated while the indexer is running (for example, logs and licenses). The default location is C:\ProgramData\Symantec\DataLossPrevention\Indexer\.
JRE_DIRECTORY | Specifies where the JRE resides.
FIPS_OPTION | Defines whether to disable (Disabled) or enable (Enabled) FIPS encryption.

The following is an example of what the completed command might look like:

msiexec /i Indexers.msi /qn /norestart /L*v Indexers.log FIPS_OPTION=Disabled INSTALLATION_DIRECTORY="C:\Program Files\Symantec\DataLossPrevention" DATA_DIRECTORY="C:\ProgramData\Symantec\DataLossPrevention\Indexer\"
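If you script Silent Mode installations across several indexer hosts, you can assemble the command line from the parameters in Table 24-1. This is a minimal sketch, assuming you pass the resulting string to `subprocess.run` on the Windows host yourself; the function name is hypothetical. It builds a single string rather than an argument list because Python's list-to-command-line conversion would escape the embedded quotation marks around the directory values.

```python
from typing import Optional


def build_indexer_install_command(
    installation_directory: str,
    data_directory: str,
    fips_enabled: bool = False,
    jre_directory: Optional[str] = None,
    log_file: str = "Indexers.log",
) -> str:
    """Build a Silent Mode msiexec command line for Indexers.msi."""
    parts = [
        "msiexec", "/i", "Indexers.msi", "/qn", "/norestart", "/L*v", log_file,
        "FIPS_OPTION=" + ("Enabled" if fips_enabled else "Disabled"),
        f'INSTALLATION_DIRECTORY="{installation_directory}"',
        f'DATA_DIRECTORY="{data_directory}"',
    ]
    if jre_directory is not None:
        parts.append(f'JRE_DIRECTORY="{jre_directory}"')
    return " ".join(parts)


command = build_indexer_install_command(
    installation_directory=r"C:\Program Files\Symantec\DataLossPrevention",
    # A raw string cannot end in a backslash, so this value uses escapes.
    data_directory="C:\\ProgramData\\Symantec\\DataLossPrevention\\Indexer\\",
)
print(command)
```

With the default locations, the printed string matches the example command above.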

To install a remote indexer on Windows

1 Log on as Administrator to the system on which you intend to install the remote indexer.
2 Go to the folder where you copied the Indexers.msi file.

Note: Using the graphical user interface method to install does not generate log information.
To generate log information, run the installation using the following command:
msiexec /i Indexers.msi /L*v C:\indexer_install.log

3 Double-click Indexers.msi to open the file, and click OK.

4 In the Welcome panel, click Next.
5 After you review the license agreement, select I accept the agreement, and click Next.
6 In the Destination Folder panel, accept the default destination directory, or enter an
alternate directory, and click Next. The default installation directory is:
c:\Program Files\Symantec\DataLossPrevention\

Symantec recommends that you use the default destination directory. References to the
"installation directory" in Symantec Data Loss Prevention documentation are to this default
location.
7 In the JRE Directory panel, accept the default JRE location (or click Browse to locate
it), and click Next.
8 In the FIPS Cryptography Mode panel, select whether to disable or enable FIPS
encryption.
9 Click Next.
10 Click Install.
See “About the Remote EDM Indexer” on page 586.

See “About the Remote EMDI Indexer” on page 505.

See “About the Remote IDM Indexer” on page 655.
See “Installing a remote indexer on Linux” on page 466.

Installing a remote indexer on Linux

Note: The following instructions assume that the Indexers.zip file has been copied into the
/opt/temp/ directory on the server computer.

To install an indexer on Linux

1 Log on as root to the computer on which you intend to install the remote indexer.
2 Copy the remote indexer installer (Indexers.zip) from the Enforce Server to a local directory on the remote indexer computer. The Indexers.zip file is included in your software download (DLPDownloadHome) directory. It should have been copied to a local directory on the Enforce Server during the Enforce Server installation process.
3 Navigate to the directory where you copied the Indexers.zip file (/opt/temp/).
4 Unzip the file to the same directory.
5 Confirm file dependencies for RPM files by running the following command:
rpm -qpR symantec-dlp-15-1-indexers-15.5-1.el6.x86_64.rpm

6 Run the following command to install all RPM files in the folder:
rpm -ivh *.rpm

See “Configuring a remote indexer on Linux” on page 466.

Configuring a remote indexer on Linux

After you install a remote indexer, you configure it by running the Remote indexer configuration
utility.
You can complete the installation using Silent Mode. Table 24-2 lists the parameters you use during the remote indexer Silent Mode installation.

Table 24-2 Indexer Silent Mode installation parameters on Linux

Parameter | Description
jreDirectory | Specifies where the JRE resides.
fipsOption | Defines whether to disable (Disabled) or enable (Enabled) FIPS encryption.

The following is an example of what the completed command might look like:

./IndexersConfigurationUtility -silent \
  -jreDirectory=/opt/Symantec/DataLossPrevention/Server\ JRE/1.8.0_181/ \
  -fipsOption=Disabled
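When the silent configuration is run from a script, passing the arguments as a list avoids shell quoting entirely, so the space in "Server JRE" needs no backslash escape. This is a minimal sketch, assuming Python is available on the indexer host and that you run it as root; the function name is hypothetical, and the JRE path is the example value shown above.

```python
from typing import List

# Installation directory from the configuration procedure.
INSTALL_DIR = "/opt/Symantec/DataLossPrevention/Indexers/15.5/Protect/install"


def build_config_args(jre_directory: str, fips_enabled: bool = False) -> List[str]:
    """Argument list for a Silent Mode run of IndexersConfigurationUtility."""
    return [
        "./IndexersConfigurationUtility",
        "-silent",
        f"-jreDirectory={jre_directory}",
        "-fipsOption=" + ("Enabled" if fips_enabled else "Disabled"),
    ]


args = build_config_args("/opt/Symantec/DataLossPrevention/Server JRE/1.8.0_181/")
# To run it on the indexer host:
#   import subprocess
#   subprocess.run(args, cwd=INSTALL_DIR, check=True)
print(args)
```

On Linux, `subprocess.run` resolves a relative program path such as `./IndexersConfigurationUtility` against `cwd`, so the utility is found in the installation directory.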

To configure a remote indexer on Linux

1 Navigate to the installation directory:
/opt/Symantec/DataLossPrevention/Indexers/15.5/Protect/install

2 Run the remote indexer configuration utility. Use the following command to launch the
utility:
./IndexersConfigurationUtility

3 Enter the following information in the Remote indexer configuration utility:

JRE directory | Enter the JRE directory. The default directory is /opt/Symantec/DataLossPrevention/Server JRE/[JRE version]. Note: If you install the JRE before running ./IndexersConfigurationUtility, you do not need to enter the JRE directory. The Remote Indexer Configuration Utility automatically defines the JRE path.
FIPS encryption | Select whether to disable or enable FIPS encryption.

See “About the Remote EDM Indexer” on page 586.

See “About the Remote EMDI Indexer” on page 505.
See “About the Remote IDM Indexer” on page 655.
Chapter 25
Detecting content using Exact Match Data Identifiers (EMDI)
This chapter includes the following topics:

■ Introducing Exact Match Data Identifiers (EMDI)

■ Configuring Exact Match Data Identifier profiles

■ Using multi-token matching with EMDI

■ Memory requirements for EMDI

■ Remote EMDI indexing

■ Properties file settings for EMDI

■ Best practices for using EMDI

■ EMDI Troubleshooting

Introducing Exact Match Data Identifiers (EMDI)

Exact Match Data Identifier (EMDI) detection is a powerful exact matching detection technology that enables you to detect structured data, especially personally identifiable information (PII), with a high degree of accuracy. You can use EMDI to exactly match indexed records across all Data Loss Prevention channels. Fast and secure, EMDI can help you reduce false positives compared to data identifiers and regular expressions. EMDI also provides better matching performance and greater memory efficiency than Exact Data Matching (EDM).

Before you proceed with EMDI, it's important for you to have a good understanding of data
identifiers and how they are used in Symantec Data Loss Prevention.
See “About using EMDI to protect content” on page 469.

About using EMDI to protect content

EMDI works as an additional validation check against data identifier pattern matchers. With
EMDI, Data Loss Prevention doesn't rely on the Credit Card Number data identifier to match
any pattern that looks like a credit card number and passes a Luhn check. Instead, EMDI
enables customers to exactly match only the credit card numbers that are contained within
their index of records. To exactly match, you can use the Credit Card Number and at least
one additional column of identifying information within the index of records, such as the Issuing
Bank Number that corresponds to that record in the data source that the EMDI profile uses.
Since data sources can contain more than two kinds of information, you could also use the
Card Expiration Date as a third field to ensure an accurate match. Both system (built-in) and
custom data identifiers are supported.
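To make the difference concrete, the following Python sketch contrasts a pattern-only check (a 16-digit string that passes the Luhn checksum) with an exact-match validation against indexed card numbers. It is a simplified illustration, not how Data Loss Prevention implements either check: the function names are hypothetical, the index is a plain in-memory set, and a real EMDI match also requires a value from at least one other column of the same record.

```python
from __future__ import annotations


def luhn_valid(number: str) -> bool:
    """Return True if a digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk the digits right to left and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def pattern_only_match(candidate: str) -> bool:
    """Hypothetical data-identifier-style check: any 16-digit, Luhn-valid string."""
    return len(candidate) == 16 and luhn_valid(candidate)


def exact_match(candidate: str, indexed_numbers: set[str]) -> bool:
    """Hypothetical EMDI-style check: the pattern must match AND the value
    must be one of the card numbers in the indexed records."""
    return pattern_only_match(candidate) and candidate in indexed_numbers


indexed = {"4111111111111111"}
print(pattern_only_match("4012888888881881"))  # True: valid pattern and checksum
print(exact_match("4012888888881881", indexed))  # False: not in the index
```

Both numbers are well-known test card numbers. The pattern-only check accepts either one, while the exact-match check accepts only the number that is present in the index.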
EMDI covers every EDM detection use case that involves two or more columns with at least
one column that has highly unique data that matches a highly discriminatory pattern (that is
expressible with a data identifier). These columns are known as "key columns."
EMDI supports up to 4 million rows and 32 columns per index. Indexes of any size within these
limits are always deployed to detection servers, appliances, and cloud services. By default,
indexes larger than 100 MB are not distributed to DLP Agents, but you can configure this limit.
All existing system data identifiers and most custom data identifiers are supported.
You configure EMDI at Manage > Data Profiles > Exact Data > Add Exact Match Data Identifier
Profile. The following procedure provides the steps you need to take for implementation.
To configure EMDI
1 You identify and prepare the data you want to protect.
2 You create an Exact Match Data Identifier profile and identify data source columns as
Required, Optional, or Ignore to generate a match. Required columns must be mapped
to either a built-in system data identifier or a custom data identifier.
3 You enable the index as an Exact Match Data Identifier validator either inline in a policy
as part of a data identifier condition, or as part of the configuration of the data identifier.
4 When you add an EMDI validator to an existing data identifier validator, EMDI is used
each time the existing validator is used in a policy.
5 You index the structured data source using the Enforce Server administration console,
or remotely using the Remote EMDI Indexer. During the indexing process, the system
indexes record data that is contained within tabular CSV files. You can schedule indexing
on a regular basis to ensure that the EMDI index reflects the current data.

See “About EMDI and key columns” on page 470.

About EMDI and key columns

An important concept for EMDI is the "key column." When using EMDI, you must specify two
or more columns, with at least one "key column" that has highly unique and discriminatory
values that match a distinctive pattern (that is expressible with a data identifier).
In each of the following examples, the first column listed is the "key" column: its data
identifier pattern must be present in any match.
■ Detect two (or more) out of
(Account Number, Routing Number, First Name, Last Name, Last 4 SSN)
■ Detect two (or more) out of
(Driver's License Number, First Name, Last Name, DOB, Address, City, State)
■ Detect two (or more) out of
(Medical Record Number, First Name, Last Name, Last 4 SSN)
■ Detect two (or more) out of
(Credit Card Number, Issuing Bank Name, CVV, Card Expiration Date)
■ Detect both of
(Part Number, Part Description)
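The "two (or more) out of" rules above can be read as a k-of-n test in which the key column is mandatory. The following Python sketch expresses that reading. It is an illustration under simplifying assumptions: it treats each cell as a single token and ignores the proximity window, and the function name and sample record are hypothetical.

```python
from __future__ import annotations


def record_matches(
    record: dict[str, str],
    found_tokens: set[str],
    key_column: str,
    min_columns: int = 2,
) -> bool:
    """Hypothetical k-of-n rule with a mandatory key column.

    The record matches only if its key column value was found AND at least
    `min_columns` of its column values (key included) were found.
    """
    if record[key_column] not in found_tokens:
        return False
    matched = sum(1 for value in record.values() if value in found_tokens)
    return matched >= min_columns


patient = {
    "Medical Record Number": "MRN4471920",
    "First Name": "Dana",
    "Last Name": "Okafor",
    "Last 4 SSN": "6620",
}
print(record_matches(patient, {"MRN4471920", "Okafor"}, "Medical Record Number"))  # True
print(record_matches(patient, {"Dana", "Okafor"}, "Medical Record Number"))  # False: no key
```

For a "Detect both of" rule such as (Part Number, Part Description), `min_columns` would simply equal the number of columns, 2.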
See “About EMDI policy features” on page 470.

About EMDI policy features

EMDI policy matching validates data identifier pattern matches against an indexed data
source. EMDI searches for indexed content in a given message or file, and generates an
incident if a match is found within a proximity window before and after the data identifier match.
By default, the proximity window is 50 tokens before and 50 tokens after the data identifier
match; 50 tokens is also the maximum value. You can configure the window to any value from 1 to 50.
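The following Python sketch shows one way to compute such a window over a list of tokens. The tokenization (a plain list of strings split on whitespace) and the function name are assumptions for illustration; the product's own tokenizer is not described here.

```python
from __future__ import annotations

MAX_WINDOW = 50  # default and maximum window size, in tokens, on each side


def proximity_window(tokens: list[str], match_index: int, window: int = MAX_WINDOW) -> list[str]:
    """Return up to `window` tokens before and after the data identifier match
    at `match_index`, excluding the match itself."""
    if not 1 <= window <= MAX_WINDOW:
        raise ValueError(f"window must be between 1 and {MAX_WINDOW} tokens")
    start = max(0, match_index - window)
    end = min(len(tokens), match_index + window + 1)
    return tokens[start:match_index] + tokens[match_index + 1:end]


words = "Card 4111111111111111 issued by Example Bank expires 12/27".split()
print(proximity_window(words, 1, window=3))  # ['Card', 'issued', 'by', 'Example']
```

Clamping `start` and `end` keeps the window inside the message when the match is near its beginning or end.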
Policy matching requirements and features of EMDI include the following:
■ You must specify one required column that can be matched by a highly discriminating data
identifier. This column is referred to as the "key column."
■ The key column must be highly variable (with few repeating values).
■ A minimum of two columns is required for a match: a required "key" column and an
optional column.
■ For highly variable data (with few repeated values in the index) the EMDI algorithm
generates fewer than one false positive per 1000 data identifier matches. Common repeated
values in key or non-key columns may result in higher rates of false positives.

■ The number of rows per index is limited to 4 million.

■ The system provides match highlighting on the incident snapshot screen. All tokens from
matching rows are highlighted, not only the matching data identifier value.
■ EMDI supports single-token and multi-token cell indexing and matching. A multi-token is
a cell that contains two or more words. Since a single CJK (Chinese, Japanese, Korean)
character is regarded as a token, two or more CJK characters are treated as a multi-token.
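As a rough illustration of the single-token versus multi-token distinction, the following Python sketch counts tokens in a cell by splitting on whitespace and treating each CJK character as its own token. The Unicode ranges and the rule that a run of non-CJK characters forms one token are simplifying assumptions, not the product's tokenizer.

```python
# Simplifying assumption: these Unicode blocks stand in for "CJK characters".
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def is_cjk(char: str) -> bool:
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in CJK_RANGES)


def count_tokens(cell: str) -> int:
    """Count tokens: whitespace separates words, each CJK character is one
    token, and each run of non-CJK characters inside a word is one token."""
    count = 0
    for word in cell.split():
        in_run = False
        for char in word:
            if is_cjk(char):
                count += 1
                in_run = False
            elif not in_run:
                count += 1
                in_run = True
    return count


def is_multi_token(cell: str) -> bool:
    return count_tokens(cell) >= 2


for cell in ("4111111111111111", "Dana Okafor", "東", "東京"):
    print(cell, is_multi_token(cell))
```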
See “EMDI compared to EDM” on page 471.

EMDI compared to EDM

EMDI relies on a different underlying detection technology than EDM, and is neither a substitute
nor a replacement for EDM. However, one of the advantages of EMDI over EDM is that EMDI
is available as a locally-executed exact matching technology on the DLP Agent. EDM is only
available on the DLP Agent in two-tier detection mode. Table 25-1 lists comparisons between
EMDI and EDM.

Table 25-1 EMDI compared to EDM

EMDI: EMDI can support EDM detection scenarios that involve matching against two or more
columns of a data source when at least one of those columns matches a data identifier. EMDI
supports both system and custom data identifiers.
EDM: There is no requirement that EDM must match against a column that can be represented
by a data identifier.

EMDI: EMDI scans an entire data source, within the stated limits.
EDM: By default, EDM scans only the first 30,000 tokens of inspected content, though this
limit can be increased.

EMDI: EMDI performs matching locally on the DLP Agent, so there is no need to implement
two-tier detection.
EDM: EDM is only available on the DLP Agent in two-tier detection mode.

EMDI: Available on all channels, including detection servers, appliances, the cloud, and DLP
Agents (including disconnected DLP Agents).
EDM: EDM is available on detection servers, appliances, and the cloud. EDM is only available
on the endpoint in two-tier detection mode.

EMDI: Supports blocking, user notification, and encryption on the DLP Agent.
EDM: EDM is only available on the DLP Agent in two-tier detection mode. When operating in
two-tier detection mode, the DLP Agent does not support synchronous response actions such as
blocking, user notification, or encryption.

EMDI: The memory footprint for EMDI is 1/5 of the memory footprint for EDM for the same
indexed data source.
EDM: The EDM memory footprint is about 5 times the memory footprint for EMDI.

EMDI: EMDI supports up to 4 million rows x 32 columns per index, up to 128 million cells per
index.
EDM: EDM supports hundreds of millions of rows x 32 columns, up to 6 billion cells per index.

EMDI: EMDI has a stringent security model that makes it suitable for profile deployment on
the DLP Agent.
EDM: EDM profiles are never deployed on the DLP Agent.

EMDI: There is no natural language processing for Chinese, Japanese, and Korean for EMDI
matching.
EDM: EDM supports natural language processing for Chinese, Japanese, and Korean.

You can use either EMDI or EDM for some exact matching cases that have at least two source
columns and where one column has values that can be expressed with a data identifier. The
following recommendations detail when it is better to use EMDI rather than EDM, and vice
versa.

Use EMDI instead of EDM if:

■ You already use data identifiers and you want to improve detection accuracy with exact
matching.
■ You need exact matching and detection-time enforcement on your DLP Agents, such as
blocking, user notification, or encryption.
■ You need more flexibility in identifier detection. For example, you need to detect
identifiers with nonstandard separator characters, such as matching 123*456, 123/456,
or 123_456.
■ You need to use exact matching in an exception.
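To illustrate the separator flexibility mentioned above, the following Python sketch uses a regular expression that accepts an optional single punctuation character between two groups of three digits. It then normalizes each hit by removing the separator, so the result could be compared against indexed values. The three-plus-three digit shape is a hypothetical identifier format chosen only for this example.

```python
from __future__ import annotations

import re

# Hypothetical identifier: 3 digits, an optional single separator that is not a
# letter, digit, or whitespace (for example *, /, or _), then 3 digits.
FLEXIBLE_ID = re.compile(r"\b(\d{3})[^A-Za-z0-9\s]?(\d{3})\b")


def normalized_ids(text: str) -> list[str]:
    """Return every identifier found in `text` with its separator removed."""
    return [first + second for first, second in FLEXIBLE_ID.findall(text)]


print(normalized_ids("IDs 123*456, 123/456 and 123_456"))
# ['123456', '123456', '123456']
```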

Use EDM instead of EMDI if:

■ You need to exclude specific combinations of columns from a match. For example, you
need to match three of the following four columns: Identification Number, Last Name, City,
and Postal Code; but you need to exclude the Last Name, City, and Postal Code
combination.
■ You need to use more discriminating policy features, such as data owner exception and
the where clause.
■ You need to protect data sources with a large number of rows (greater than 4 million).

See “About the Exact Match Data Identifier profile and index” on page 473.

About the Exact Match Data Identifier profile and index

The Exact Match Data Identifier Profile is the user-defined configuration that you create to
index the data source. The index is a secure file that contains hashes of the exact data values
from each field in your data source, along with information about those data values. The index
does not contain the data values themselves.
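The idea of an index that holds hashes rather than values can be sketched in Python as follows. This is not the format of EmdiDataSource.rdx or any Symantec algorithm. It is a hypothetical illustration that assumes a keyed hash (HMAC-SHA256), a simple trim-and-lowercase normalization, and a lookup keyed on the digest of the key column.

```python
from __future__ import annotations

import hashlib
import hmac
from collections import defaultdict


def cell_digest(value: str, secret: bytes) -> bytes:
    """Keyed SHA-256 digest of a normalized cell value (normalization is an assumption)."""
    normalized = value.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).digest()


def build_index(rows: list[list[str]], key_column: int, secret: bytes) -> dict[bytes, list[tuple[bytes, ...]]]:
    """Map the digest of each key column value to the digests of its full rows.
    No plaintext cell values are stored in the result."""
    index = defaultdict(list)
    for row in rows:
        digests = tuple(cell_digest(value, secret) for value in row)
        index[digests[key_column]].append(digests)
    return dict(index)


def lookup(index: dict[bytes, list[tuple[bytes, ...]]], key_value: str, secret: bytes) -> list[tuple[bytes, ...]]:
    """Return the hashed rows whose key column equals `key_value`, if any."""
    return index.get(cell_digest(key_value, secret), [])


secret = b"example-secret"  # hypothetical key, for illustration only
rows = [
    ["4111111111111111", "Example Bank", "12/27"],
    ["4012888888881881", "Other Bank", "01/26"],
]
index = build_index(rows, key_column=0, secret=secret)
print(len(lookup(index, "4111111111111111", secret)))  # 1
```

A detection engine working this way could hash candidate tokens found near a data identifier match and compare them with the hashed row, without ever holding the original values.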
The index that is generated consists of one binary file called EmdiDataSource.rdx. By
default, Symantec Data Loss Prevention stores index files in
C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\index
(on Windows) or in
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/Protect/index (on
Linux) on the Enforce Server. Symantec Data Loss Prevention automatically deploys all EMDI
indexes (*.rdx files) to the index directory on all detection servers.
The system deploys the endpoint index (EmdiDataSource.rdx) to each designated Endpoint
Server. When a DLP Agent connects to the Endpoint Server, the DLP Agent downloads the
latest version of the endpoint index; if the agent already has the latest version of the index,
nothing happens. The indexes are saved in an encrypted binary format in the endpoint database.
When an active policy that references an EMDI profile is deployed to a detection server, the
detection server loads the corresponding EMDI index into RAM. If a new detection server is
added after an index has been created, the *.rdx files in the index folder on the Enforce
Server are deployed to the index folder on the new detection server. You cannot manually
deploy index files to detection servers.
See “About the Exact Match Data Identifier source file” on page 473.

About the Exact Match Data Identifier source file

The data source file is a tabular file containing data in a standard delimited format (comma,
semicolon, pipe, or tab). You extract the data from a database, spreadsheet, or other structured
data source. You also cleanse the data for profiling. You upload the data source file to the
Enforce Server when you define the Exact Match Data Identifier Profile. For example, you
can convert an Excel spreadsheet to a comma-separated values (CSV) format. The resulting
*.csv file can be used as the data source for your EMDI profile.
See “Cleanse the EMDI data source file of blank columns and duplicate rows” on page 519.
See “Creating the Exact Match Data Identifier source file” on page 477.
You can use the SQL preindexer to index the data source directly. However, this approach
has limitations because in most cases the data must first be cleansed before it is indexed.
See “Remote EMDI indexing” on page 504.

The data source file must contain at least one key column that contains largely unique values
that can be expressed as a data identifier. You can edit the parameters that control how strictly
the indexer enforces key-column uniqueness in the Indexer.properties file, located at
\Program Files\Symantec\Data Loss Prevention\EnforceServer\15.5\Protect\config\Indexer.properties
(Windows) or
/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config/Indexer.properties
(Linux).

These parameters are listed in Table 25-2.

Table 25-2 Parameters affecting indexer sensitivity to key-column uniqueness

Parameter in Indexer.properties: EMDI.MaxDuplicateCellsPercentage=1
Function: Maximum percentage of duplicated key column cells in the index; the default value
is 1%.

Parameter in Indexer.properties: EMDI.MaxNonMatchingDIPercentage=1
Function: Maximum percentage of key column cells that don’t match the data identifier that is
assigned to this profile; the default value is 1%.

Non-configurable limits for EMDI: The same value can appear no more than five times in a
key column in a given EMDI index. This limit is separate from
EMDI.MaxDuplicateCellsPercentage, which instead limits the total percentage of duplicated
key column cells in the index.
See “Best practices for using EMDI” on page 517.
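Before indexing, you might check a key column against these limits yourself. The following Python sketch reports the duplicate percentage, the percentage of cells that fail a stand-in regular expression, and the highest number of times any single value repeats. How the indexer actually counts duplicates is not documented here. Counting every occurrence after the first as a duplicated cell is an assumption, as is the use of a regular expression in place of a data identifier.

```python
from __future__ import annotations

import re
from collections import Counter


def key_column_report(
    values: list[str],
    pattern: str,
    max_duplicate_pct: float = 1.0,  # mirrors EMDI.MaxDuplicateCellsPercentage=1
    max_nonmatching_pct: float = 1.0,  # mirrors EMDI.MaxNonMatchingDIPercentage=1
    max_repeats: int = 5,  # a value may appear at most five times in a key column
) -> dict[str, object]:
    total = len(values)
    counts = Counter(values)
    # Assumption: every occurrence after the first counts as a duplicated cell.
    duplicate_cells = sum(count - 1 for count in counts.values())
    nonmatching = sum(1 for value in values if not re.fullmatch(pattern, value))
    duplicate_pct = 100.0 * duplicate_cells / total if total else 0.0
    nonmatching_pct = 100.0 * nonmatching / total if total else 0.0
    highest_repeat = max(counts.values(), default=0)
    return {
        "duplicate_pct": duplicate_pct,
        "nonmatching_pct": nonmatching_pct,
        "highest_repeat": highest_repeat,
        "ok": duplicate_pct <= max_duplicate_pct
        and nonmatching_pct <= max_nonmatching_pct
        and highest_repeat <= max_repeats,
    }


# Hypothetical 9-digit account numbers; one value appears six times in total.
accounts = [f"{n:09d}" for n in range(1000)] + ["000000001"] * 5
report = key_column_report(accounts, r"\d{9}")
print(report["highest_repeat"], report["ok"])  # 6 False
```

In this example the duplicate percentage (about 0.5%) is within the default, but the column still fails because one value appears more than five times.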

Note: The data source file should be a text-based file that uses commas, semicolons, pipes,
or tabs as delimiters. Avoid using a spreadsheet format (such as XLS or XLSX) for the data
source file, because spreadsheet programs can render long numbers in scientific notation.

See “About cleansing the Exact Match Data Identifier source file” on page 474.

About cleansing the Exact Match Data Identifier source file

Once you have created the data source file, you must prepare the data for indexing by cleansing
it. You must cleanse the data source file to ensure that your EMDI policies are as accurate as
possible. You can use tools such as Stream Editor (sed) and awk to cleanse the data source
file. Melissa Data provides tools for normalizing data in the data source, such as addresses.
Table 25-3 provides the steps you must take to cleanse the data source file for indexing.

Table 25-3 Workflow for cleansing the data source file

Step 1: Prepare the data source file for indexing.
See “Preparing the Exact Match Data Identifier source for indexing” on page 478.

Step 2: Ensure that you have specified a key column that can be matched by a highly variable
data identifier. Ensure that the key column contains reasonably unique data.
See “About EMDI and key columns” on page 470.

Step 3: Remove incomplete and duplicate records. Do not fill empty cells with fake data.
See “About cleansing the Exact Match Data Identifier source file” on page 474.

Step 4: Remove improper characters.
See “Remove ambiguous character types from the EMDI data source file” on page 520.

Step 5: Verify that the data source file is below the error threshold. The error threshold is the
maximum percentage of rows that contain errors before indexing stops.
See “Preparing the Exact Match Data Identifier source for indexing” on page 478.

See “About EMDI index scheduling” on page 475.

About EMDI index scheduling

After you have indexed an exact data source extract, its schema cannot be changed. If the
data source changes, or the number of columns or data mapping of the exact data source file
changes, you must create a new EMDI index and update the policies that reference the changed
data. In this case you can schedule the indexing to keep the index in sync with the data source.
Here is a typical use case: You extract data from a database to a file and cleanse it to create
your data source file. Using the Enforce Server administration console you define an Exact
Match Data Identifier profile and index the data source file. The system generates the *.rdx
index files and deploys them to one or more detection servers, appliances, cloud services,
and agents. If you know that the data changes frequently, you need to generate a new data
source file regularly to keep up with the changes to the database. In this case, you can use
index scheduling to automate the indexing of the data source file so you do not have to return
to the Enforce Server administration console and reindex the updated data source. Your only
task is to provide an updated and cleansed data source file to the Enforce Server for scheduled
indexing.
See “Configuring Exact Match Data Identifier profiles” on page 476.

Configuring Exact Match Data Identifier profiles

To implement EMDI, you create the Exact Match Data Identifier Profile and index the data
source. You also need to edit an existing data identifier or create a new custom data identifier.
Then, for each data identifier breadth, you must add and configure EMDI as an optional validator
and enable an EMDI validation check during policy creation or on the Manage > Policies >
Data Identifiers page. Table 25-4 details the steps in this process.
See “About the Exact Match Data Identifier profile and index” on page 473.

Table 25-4 Implementing Exact Match Data Identifier matching

Step 1: Create the data source file.
Export the source data from the database (or other data repository) to a tabular text file with
delimited fields.
See “About the Exact Match Data Identifier source file” on page 473.
See “Creating the Exact Match Data Identifier source file” on page 477.

Step 2: Prepare the data source file for indexing.
Cleanse the data source file.
See “Cleanse the EMDI data source file of blank columns and duplicate rows” on page 519.

Step 3: Upload the data source file to the Enforce Server.
You can copy or upload the data source file to the Enforce Server, or access it remotely.
See “Uploading the Exact Match Data Identifier source files to the Enforce Server” on page 480.

Step 4: Edit an existing data identifier or create a new custom data identifier to add EMDI as a
validator.
See “Adding an EMDI check to a built-in or custom data identifier condition in a policy” on
page 487.

Step 5: Create an Exact Match Data Identifier profile.
An Exact Match Data Identifier profile is required to use Exact Match Data Identifier matching.
The Exact Match Data Identifier profile specifies the data source, data field types, and the
indexing schedule.
See “Adding Exact Match Data Identifier Profiles” on page 482.
See “Creating and modifying the Exact Match Data Identifier profiles” on page 483.

Step 6: Mark each column in the data source as Ignore, Optional, or Required.
Use the slider to mark each index column (data source field) as Ignore, Optional, or Required.
Each index must contain at least one required ("key") column that is mapped to a system data
identifier or custom data identifier. It must also contain at least one optional column.
See “Adding Exact Match Data Identifier Profiles” on page 482.
See “Creating and modifying the Exact Match Data Identifier profiles” on page 483.

Step 7: Enable the policy as an Exact Match Data Identifier check.
After the policy is created, it must be enabled as an Exact Match Data Identifier Check for data
identifier validation.
See “Adding an EMDI check to a built-in or custom data identifier condition in a policy” on
page 487.

Step 8: Index the data source, or schedule indexing.
Schedule the indexing to keep the index in sync with the data source.
See “About EMDI index scheduling” on page 475.
See “Scheduling EMDI profile indexing” on page 485.

See “Creating the Exact Match Data Identifier source file” on page 477.

Creating the Exact Match Data Identifier source file

The first step in the EMDI indexing process is to create the data source. A data source is a
tabular file containing data in a standard delimited format, with data delimited by commas,
semicolons, pipes, or tabs.
See Table 25-5 for instructions.

Table 25-5 Create the exact match data identifier source file

Step 1: Export the data you want to protect from a database or other tabular data format, such
as an Excel spreadsheet, to a tabular text file. The data source file you create must be a tabular
text file that contains rows of data from the original source. Each row from the original source is
included as a row in the data source file. Delimit columns using a tab, a comma, a semicolon,
or a pipe. Pipe is preferred. Do not use a comma if your data source fields contain numbers.
See “About the exact data source file” on page 529.
The data source file cannot exceed 32 columns or 4 million rows. If you plan to upload the data
source file to the Enforce Server, browser capacity limits the data source size to 2 GB. For larger
files, you can copy the file to the Enforce Server using FTP/S, SCP, SFTP, CIFS, or NFS.

Step 2: For all EMDI implementations, make sure that the data source contains at least one
column of unique data values (Required column) and one Optional column. Three or more
columns (including one Required column) are recommended.

Step 3: Prepare the exact match data identifier source file for indexing.
See “Preparing the Exact Match Data Identifier source for indexing” on page 478.
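A quick structural check of a pipe-delimited data source file can catch several of these problems before upload. The following Python sketch uses only the standard csv module to flag files with fewer than 2 or more than 32 columns, rows whose column count differs from the first row, values enclosed in quotes, and files with more than 4 million rows. The function name is hypothetical, and the sketch treats the first row like any other row.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable

MAX_COLUMNS = 32
MAX_ROWS = 4_000_000


def check_data_source(lines: Iterable[str], delimiter: str = "|") -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    # QUOTE_NONE keeps quote characters in the cell so they can be reported.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    width = None
    row_count = 0
    for line_no, row in enumerate(reader, start=1):
        row_count += 1
        if width is None:
            width = len(row)
            if not 2 <= width <= MAX_COLUMNS:
                problems.append(f"line {line_no}: {width} columns; expected 2 to {MAX_COLUMNS}")
        elif len(row) != width:
            problems.append(f"line {line_no}: expected {width} columns, found {len(row)}")
        if any(len(cell) >= 2 and cell[0] == cell[-1] == '"' for cell in row):
            problems.append(f"line {line_no}: value enclosed in quotes")
    if row_count > MAX_ROWS:
        problems.append(f"{row_count} rows; the limit is {MAX_ROWS}")
    return problems


# With a file on disk (the path is illustrative):
# with open("datasource.txt", newline="", encoding="utf-8") as source:
#     print(check_data_source(source))
sample = ["4111111111111111|Example Bank|12/27\n", "4012888888881881|Other Bank\n"]
print(check_data_source(sample))  # ['line 2: expected 3 columns, found 2']
```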

Preparing the Exact Match Data Identifier source for indexing

Once you create the Exact Match Data Identifier source file, you must prepare it so that you
can index your data. When you index an EMDI profile, the Enforce Server keeps track of empty
cells and any misplaced data that count as errors.
EMDI is designed to detect combinations of globally unique data fields. Your EMDI index must
include at least one column of data that contains nearly unique values for each record (row).
Column data such as account numbers, social security numbers, and credit card numbers
are often highly unique. On the other hand, states, ZIP Codes, and names are not unique.
If you do not include at least one column of unique data (a key column) in your index, your
EMDI profile does not accurately detect the data you want to protect.
Table 25-6 describes the various types of unique data to include in your EMDI indexes, as
well as fields that are not unique. You can include the non-unique fields in your EMDI indexes
as long as you have at least one unique column field.

Table 25-6 Examples of unique data for EMDI policies

Unique data for EMDI. The following data fields are often unique:
■ Account number
■ Bank Card number
■ Phone number
■ Social security number
■ Tax ID number
■ Driver’s license number
■ Employee number
■ Insurance number

Non-unique data for EMDI. The following data fields are not unique:
■ First name
■ Last name
■ City
■ State
■ ZIP Code
■ Password
■ PIN

When you index an EMDI profile, the Enforce Server keeps track of empty cells and any
misplaced data, which count as errors. For example, an error may be a name that appears in
a column for phone numbers. By default, errors can make up to five percent of the data in the
profile. If this error threshold is reached, Symantec Data Loss Prevention stops indexing and
displays an error to warn you that your data may be unorganized or corrupted.
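The following Python sketch estimates that error rate for a data source before you index it. It assumes that a row counts as an error if any cell is empty or if the key column fails a stand-in regular expression, and that indexing stops once the rate reaches the threshold. The indexer's exact error rules are not described here, and the function names are hypothetical.

```python
from __future__ import annotations

import re

DEFAULT_ERROR_THRESHOLD_PCT = 5.0  # default error threshold described above


def error_rate_pct(rows: list[list[str]], key_column: int, key_pattern: str) -> float:
    """Percentage of rows with an empty cell or a key value that fails the pattern."""
    if not rows:
        return 0.0
    errors = sum(
        1
        for row in rows
        if any(cell.strip() == "" for cell in row)
        or not re.fullmatch(key_pattern, row[key_column])
    )
    return 100.0 * errors / len(rows)


def would_stop_indexing(
    rows: list[list[str]],
    key_column: int,
    key_pattern: str,
    threshold_pct: float = DEFAULT_ERROR_THRESHOLD_PCT,
) -> bool:
    """Assumption: indexing stops once the error rate reaches the threshold."""
    return error_rate_pct(rows, key_column, key_pattern) >= threshold_pct


# 97 well-formed rows plus 3 rows with a name in the phone number column.
rows = [[f"555{n:07d}", "Dana"] for n in range(97)] + [["Dana Okafor", "Dana"]] * 3
print(error_rate_pct(rows, 0, r"\d{10}"), would_stop_indexing(rows, 0, r"\d{10}"))  # 3.0 False
```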
To prepare the exact match data identifier source for EMDI indexing
1 Make sure that the data source file is formatted as follows:
■ The data source must have at least two columns and at least one column that can be
mapped to a data identifier. One of the columns should contain unique values, such as
credit card numbers, driver’s license numbers, or account numbers (as opposed to first
and last names, which are generic).
See “Ensure data source has at least one column of unique data (EDM)” on page 602.
■ Verify that you have delimited the data source using commas, pipes ( | ), tabs, or
semicolons. If the data source file uses commas as delimiters, remove any commas
that do not serve as delimiters.
See “Do not use the comma delimiter if the data source has number fields (EDM)”
on page 605.
■ Verify that data values are not enclosed in quotes.
■ Remove single-character and abbreviated data values from the data source. For
example, remove the column name and all values for a column in which the possible
values are Y and N. You should also remove values such as "CA" for California, or
other abbreviations for states.
■ Remove columns with frequently repeating values.
■ Optionally, remove any columns that contain numeric values with fewer than five digits,
as these can cause false positives in production deployments.
See “Remove ambiguous character types from the data source file (EDM)” on page 604.
■ A field delimiter should not appear in a field value.
■ Eliminate duplicate records.
See “Cleanse the data source file of blank columns and duplicate rows (EDM)”
on page 603.

2 Once you have prepared the exact match data identifier source file, proceed with the next
step in the EMDI process: upload the exact data source file to the Enforce Server for
profiling the data you want to protect.
See “Uploading the Exact Match Data Identifier source files to the Enforce Server” on page 480.
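Some of the formatting steps above can be scripted. The following Python sketch trims whitespace, strips enclosing quotes, drops incomplete and duplicate records, and writes a pipe-delimited file. Because it uses csv.QUOTE_NONE without an escape character, the csv module raises csv.Error if a value contains the delimiter, which surfaces that problem instead of hiding it. It does not remove abbreviated values or short numeric columns, which need judgment about your data. The function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Iterable
from typing import TextIO


def cleanse_rows(rows: Iterable[list[str]]) -> list[list[str]]:
    """Trim cells, strip enclosing quotes, and drop incomplete or duplicate records."""
    seen: set[tuple[str, ...]] = set()
    cleaned: list[list[str]] = []
    for row in rows:
        cells = tuple(cell.strip().strip('"').strip() for cell in row)
        if any(cell == "" for cell in cells):
            continue  # incomplete record: drop it rather than filling in fake data
        if cells in seen:
            continue  # exact duplicate of a record already kept
        seen.add(cells)
        cleaned.append(list(cells))
    return cleaned


def write_pipe_delimited(rows: Iterable[list[str]], out: TextIO) -> None:
    """Write rows with '|' delimiters and no quoting; raises csv.Error if a
    value contains the delimiter."""
    writer = csv.writer(out, delimiter="|", quoting=csv.QUOTE_NONE, lineterminator="\n")
    writer.writerows(rows)


raw = [
    ['"4111111111111111"', " Example Bank "],
    ["4111111111111111", "Example Bank"],
    ["4012888888881881", ""],
]
buffer = io.StringIO()
write_pipe_delimited(cleanse_rows(raw), buffer)
print(buffer.getvalue(), end="")  # 4111111111111111|Example Bank
```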

Uploading the Exact Match Data Identifier source files to the Enforce Server
After you have prepared the data source file for indexing, load it to the Enforce Server so the
data source can be indexed.
See “Creating and modifying the Exact Match Data Identifier profiles” on page 483.
Listed here are the options you have for making the data source file available to the Enforce
Server. Consult with your database administrator to determine the best method for your needs.

Table 25-7 Uploading the exact match data identifier source file to the Enforce Server for
indexing

Upload option: Upload Data Source to Server Now
Use case: Data source file is less than 50 MB.
Description: If you have a smaller data source file (less than 50 MB), upload the data source file
to the Enforce Server using the Enforce Server administration console. When creating the Exact
Match Data Identifier Profile, you can specify the file path or browse to the directory and upload
the data source file.
Note: Due to browser capacity limits, the maximum file size that you can upload is 2 GB.
However, uploading any file over 50 MB is not recommended, since files over this size can take
a long time to upload. If your data source file is over 50 MB, consider copying the data source
file to the datafiles directory using the next option.

Upload option: Reference Data Source on Manager Host
Use case: Data source file is over 50 MB.
Description: If you have a large data source file (over 50 MB), copy it to the datafiles directory
on the host where the Enforce Server is installed.
On Windows this directory is located at
C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\datafiles.
On Linux this directory is located at
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/datafiles.
This option is convenient because it makes the data file available through a drop-down list
during configuration of the Exact Match Data Identifier Profile. If it is a large file, use a
third-party solution (such as Secure FTP) to transfer the data source file to the Enforce Server.
Note: Ensure that the Enforce Server user (usually called "protect") has modify permissions (on
Windows) or rw permissions (on Linux) for all files in the datafiles directory.

Upload option: Use This File Name
Use case: Data source file is not yet created.
Description: You may want to create an EMDI profile before you have created the exact match
data identifier source file. In this case you can create a profile template and specify the name of
the data source file you plan to create. This option lets you define EMDI policies using the EMDI
profile template before you index the data source. The policies do not operate until the data
source is indexed.
When you have created the data source file, place it in the
\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\datafiles
directory on Windows or the
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/Protect/datafiles
directory on Linux. Then index the data source immediately on save or schedule indexing.
See “Creating and modifying the Exact Match Data Identifier profiles” on page 483.

Upload option: Use This File Name and Load Externally Generated Index
Use case: Data source is to be indexed remotely and the index copied to the Enforce Server.
Description: In some environments it may not be secure or feasible to copy or upload the data
source file to the Enforce Server. In this situation you can index the data source remotely using
the Remote EMDI Indexer.
See “Remote EMDI indexing” on page 504.
This utility lets you index an exact match data identifier source on a computer other than the
Enforce Server host. This feature is useful when you do not want to copy the data source file to
the same computer as the Enforce Server. As an example, consider a situation where the
originating department wants to avoid the security risk of copying the data to an
extra-departmental host. In this case you can use the Remote EMDI Indexer.
First you create an EMDI profile template where you choose the Use this File Name and the
Number of Columns options. You must specify the name of the exact match data identifier
source file and the number of columns it contains.
See “Creating an EMDI profile template for remote indexing” on page 508.
You then use the Remote EMDI Indexer to remotely index the data source, copy the index files
to the Enforce Server host, and load the externally generated index. The Load Externally
Generated Index option is only available after you have defined and saved the profile. Remote
indexes are loaded on Windows from the
\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index
directory and on Linux from the
/var/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/index
directory on the Enforce Server host.

See “Uploading the Exact Match Data Identifier source files to the Enforce
Server” on page 480.

See “Adding Exact Match Data Identifier Profiles” on page 482.

Adding Exact Match Data Identifier Profiles

The Manage > Data Profiles > Exact Data > Add Exact Match Data Identifier Profile screen
is the home page for managing and adding Exact Match Data Identifier profiles. An Exact
Match Data Identifier profile is required to implement data identifier conditions with EMDI
optionally enabled as a validator. An Exact Match Data Identifier profile specifies the data
source, the indexing parameters, and the indexing schedule. Once you have created the EMDI
profile, you index the data source and add the data identifier validation on the Manage >
Policies > Data Identifiers page or on the Manage > Data Profiles > Exact Data > Add
Exact Match Data Identifier Profile page.
See “Creating and modifying the Exact Match Data Identifier profiles” on page 483.

Creating and modifying the Exact Match Data Identifier profiles

See “Configuring Exact Match Data Identifier profiles” on page 476.

Note: If you use the Remote EMDI Indexer to generate the index for an Exact Match Data
Identifier profile, see “Creating an EMDI profile template for remote indexing” on page 508.

To create or modify an Exact Match Data Identifier Profile

1 Make sure that you have created the data source file.
See “Creating the Exact Match Data Identifier source file” on page 477.
2 Make sure that you have prepared the data source file for indexing.
See “Preparing the Exact Match Data Identifier source for indexing” on page 478.
3 In the Enforce Server administration console, navigate to Manage > Data Profiles >
Exact Data.
4 Click Add Exact Match Data Identifier Profile.
5 Enter a unique, descriptive Name for the profile (limited to 256 characters).
For easy reference, choose a name that describes the data content and the index type
(for example, Employee Data EMDI).
If you modify an existing Exact Match Data Identifier profile, you can change the profile
name.
6 Select one of the following Data Source options to make the data source file available to
the Enforce Server:
■ Upload Data Source to Server Now
If you want to create a new profile, click Browse and select the data source file, or
enter the full path to the data source file.
If you want to modify an existing profile, select Upload Now.
See “Uploading the Exact Match Data Identifier source files to the Enforce Server”
on page 480.
■ Reference Data Source on Manager Host
If you copied the data source file to the datafiles directory on the Enforce Server, it
appears in the drop-down list for selection.

See “Uploading the Exact Match Data Identifier source files to the Enforce Server”
on page 480.
■ Use This File Name
Select this option if you have not yet created the data source file but want to configure
EMDI policies using a placeholder EMDI profile. Enter the file name of the data source
you plan to create and the Number of Columns it will contain. When you create the data
source, you must copy it to the datafiles directory.

Note: Use this option with caution. Remember to create the data source file and copy it to
the datafiles directory. Give the data source file exactly the same name that you enter
here, and make sure that it contains exactly the number of columns that you specify here.

■ Load Externally Generated Index

Select this option if you have created an index on a remote computer using the Remote
EMDI Indexer. This option is only available after you have defined and saved the
profile. Externally generated indexes are loaded on Windows from the
\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index
directory and on Linux from the
/var/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/index directory
on the Enforce Server host.

7 If the first row of your data source contains Column Names, select Read first row as
column names.
8 Specify the Error Threshold, which is the maximum percentage of rows that can contain
errors before indexing stops.
A data source error is an empty cell, a cell with the wrong type of data, or extra
cells in the data source. For example, a name in a column for phone numbers is an error.
If errors exceed the threshold percentage of the overall data source (by default, 5%), the
system stops indexing and displays an indexing error message, and the index is not created.
Although you can change the threshold value, more than a small percentage of errors in the
data source can indicate that the data source is corrupt, is in an incorrect format, or
cannot be read. If you have a significant percentage of errors (10% or more), stop indexing
and cleanse the data source.
See “Preparing the Exact Match Data Identifier source for indexing” on page 478.
9 Select the Column Separator Char (delimiter) that you have used to separate the values
in the data source file. The delimiters you can use are tabs, commas, semicolons, or
pipes.

10 Select one of the following encoding values for the content to analyze, which must match
the encoding of your data source:
■ ISO-8859-1 (Latin-1) (default value)
Standard 8-bit encoding for Western European languages using the Latin alphabet.
■ UTF-8
Use this encoding for all languages that use the Unicode 4.0 standard (all single- and
double-byte characters), including those in East Asian languages.
■ UTF-16
Use this encoding for all languages that use the Unicode 4.0 standard (all single- and
double-byte characters), including those in East Asian languages.

Note: Make sure that you select the correct encoding. The system does not prevent you
from creating an EMDI profile using the wrong encoding. The system only reports an error
at run-time when the EMDI policy attempts to match inbound data. To make sure that you
select the correct encoding, after you click Next, verify that the column names appear
correctly. If the column names do not look correct, you chose the wrong encoding.

11 Click Next to go to the second Add Exact Match Data Identifier Profile screen.
See “Scheduling EMDI profile indexing” on page 485.
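Because the system only reports a wrong encoding at run time, and stops indexing when errors exceed the Error Threshold, it can help to check a data source locally before you upload it. The following Python sketch is a hypothetical pre-flight check, not part of Symantec Data Loss Prevention, and the function names are illustrative. guess_encoding() suggests one of the three encoding options: a UTF-16 byte order mark means UTF-16, bytes that decode as UTF-8 mean UTF-8, and anything else falls back to ISO-8859-1 (which accepts every byte, so this is a guess, not proof). UTF-16 files without a byte order mark are not detected. error_rate() counts a row as an error if it has the wrong number of cells or an empty cell; it cannot detect wrong-type data such as a name in a phone number column, and the product's own validation may differ.

```python
import codecs
import csv
import io

def guess_encoding(raw: bytes) -> str:
    """Suggest an EMDI encoding option for a sample of a data source file."""
    # A UTF-16 byte order mark is a strong signal.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # final=False tolerates a multi-byte character cut off at the end of a sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(raw, final=False)
    except UnicodeDecodeError:
        # Every byte is valid ISO-8859-1, so this is only a fallback guess.
        return "ISO-8859-1"
    return "UTF-8"

def error_rate(text: str, expected_columns: int, delimiter: str = "|",
               skip_header: bool = False) -> tuple[int, int, float]:
    """Return (rows checked, rows with errors, error percentage)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    if skip_header:
        next(reader, None)  # the column-name row is not data
    checked = errors = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        checked += 1
        wrong_width = len(row) != expected_columns
        has_empty = any(cell.strip() == "" for cell in row)
        if wrong_width or has_empty:
            errors += 1
    percent = 100.0 * errors / checked if checked else 0.0
    return checked, errors, percent
```

For example, read the first few kilobytes of the file in binary mode for guess_encoding(), then compare the percentage from error_rate() with the Error Threshold you plan to set (5% by default) before you upload the file.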

Scheduling EMDI profile indexing

When you configure an Exact Match Data Identifier profile, you can set a schedule for indexing
the data source (Submit Indexing Job on Schedule).
See “About EMDI index scheduling” on page 475.
Before you set up a schedule, consider the following recommendations:
■ If you update your data sources occasionally (for example, less than once a month), there
is no need to create a schedule. Index the data each time you update the data source.
■ Schedule indexing for times of minimal system use. Indexing affects performance throughout
the Symantec Data Loss Prevention system, and large data sources can take time to index.
■ Index a data source as soon as you add or modify the corresponding exact data profile,
and re-index the data source whenever you update it. For example, if you update the data
source every Wednesday at 2:00 A.M., schedule indexing every Wednesday at 3:00 A.M. Do not
index data sources daily, as daily indexing can degrade performance.
■ If you need to update indexes frequently (for example, daily), Symantec recommends that
you use the Remote EMDI Indexer.

■ Monitor results and modify your indexing schedule accordingly. If performance is good and
you want more timely updates, schedule more frequent data updates and indexing.
The Indexing section lets you index the Exact Match Data Identifier profile as soon as you
save it (recommended). You can also index on a regular schedule as follows:

Table 25-8 Scheduling indexing for Exact Match Data Identifier Profiles

Parameter: Submit Indexing Job on Save
Description: Select this option to index the Exact Match Data Identifier profile.

Parameter: Submit Indexing Job on Schedule
Description: Select this option to schedule an indexing job. The default option is No Regular
Schedule. If you want to index according to a schedule, select a schedule period, as described
below.

Parameter: Index Once
Description:
On: Enter the date to index the profile in the format MM/DD/YY. You can also click the date
widget and select a date.
At: Select the hour to start indexing.

Parameter: Index Daily
Description:
At: Select the hour to start indexing.
Until: Select this check box to specify a date in the format MM/DD/YY when the indexing should
stop. You can also click the date widget and select a date.

Parameter: Index Weekly
Description:
Day of the week: Select the day(s) to index the profile.
At: Select the hour to start indexing.
Until: Select this check box to specify a date in the format MM/DD/YY when the indexing should
stop. You can also click the date widget and select a date.

Parameter: Index Monthly
Description:
Day: Enter the number of the day of each month you want the indexing to occur. The number
must be 1 through 28.
At: Select the hour to start indexing.
Until: Select this check box to specify a date in the format MM/DD/YY when the indexing should
stop. You can also click the date widget and select a date.

See “Associating data identifiers with your data source (EMDI)” on page 486.
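The 1 through 28 restriction for Index Monthly guarantees that the chosen day exists in every month, including February. The following Python sketch is a hypothetical illustration (not the Enforce Server scheduler; the function name is made up) of how the next Index Monthly run can be computed under that rule. It ignores the Until date and time zones.

```python
from datetime import datetime

def next_monthly_run(now: datetime, day: int, hour: int) -> datetime:
    """Return the next Index Monthly run strictly after `now`."""
    if not 1 <= day <= 28:
        raise ValueError("day must be 1 through 28")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0 through 23")
    # Day 28 or lower exists in every month, so replace() cannot fail here.
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate > now:
        return candidate
    # Otherwise roll over to the same day and hour next month.
    if now.month == 12:
        return candidate.replace(year=now.year + 1, month=1)
    return candidate.replace(month=now.month + 1)
```

If day 29, 30, or 31 were allowed, the rollover step would need extra rules for short months, which is one reason a 1 through 28 limit keeps a monthly schedule simple.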

Associating data identifiers with your data source (EMDI)

On this screen you associate data identifiers with your data source.

To continue configuring your Exact Match Data Identifier profiles

1 Designate columns in your data source as Required, Optional, or Ignored. You must
associate Required columns with an existing data identifier.
Confirm that the column names in your data source are accurately represented in the
Data Source Field column. If you selected the Read first row as column names option, the
Data Source Field column lists the names in the first row of your data source. If you did
not select this option, the column lists Col 1, Col 2, and so on.
2 In the Indexing section of the screen, select one of the following options:
■ Submit Indexing Job on Save
Select this option to begin indexing the data source when you save the exact data
profile.
■ Submit Indexing Job on Schedule
Select this option to index the data source according to a specific schedule. Make a
selection from the Schedule drop-down list and specify days, dates, and times as
required.
See “Scheduling EMDI profile indexing” on page 485.

3 Click Finish.
After Symantec Data Loss Prevention finishes indexing, it deletes the original data source
from the Enforce Server. After you index a data source, you cannot change its schema.
If you change column designations for a data source after you index it, you must create
a new EMDI profile.
You can add Exact Match Data Identifier validators to existing data identifier policies.
See “Adding an EMDI check to a built-in or custom data identifier condition in a policy”
on page 487.

Adding an EMDI check to a built-in or custom data identifier condition in a policy

You can add an EMDI validation check to an existing data identifier, or you can create a custom
data identifier that includes an EMDI validation check.
To add an EMDI validation check to an existing policy
1 Go to Manage > Policies > Policy List.
2 Check the box to choose an existing policy.
3 Double-click the policy to begin editing.
4 Rename the policy to indicate that it uses EMDI as a validator.
5 Verify the Wide, Medium, or Narrow breadth.

6 Click Optional Validators.

7 Click Exact Match Data Identifier Check.
8 Select a Profile. When you scroll to view profiles, you only see profiles where the key
column matches the data identifier in use.
9 Select at least one Required column that must be matched.
10 Choose how many other optional columns to match. You must have at least one optional
column.
11 Select the desired Proximity using the slider. The maximum proximity for EMDI is 50
tokens before or after the data identifier or pattern match. You can select a lower level.
12 Verify a Match Counting value. Your options are:
■ Check for existence (don't count multiple matches)
■ Count all matches
■ Count all unique matches
13 Select a value for Only report incidents with at least [n] matches.
14 Select the message components to match on:
■ Envelope
■ Subject
■ Body
■ Attachments
15 Click OK.
16 Click Save.
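The Match Counting options in step 12 and the threshold in step 13 work together. The following Python sketch is a simplified, hypothetical model of that interaction: the function names and mode strings are illustrative, each validated EMDI match is treated as a string value, and the product's internal counting may differ.

```python
def count_matches(matches: list[str], mode: str) -> int:
    """Apply a Match Counting option to the validated matches in one message."""
    if mode == "existence":  # Check for existence (don't count multiple matches)
        return 1 if matches else 0
    if mode == "all":        # Count all matches
        return len(matches)
    if mode == "unique":     # Count all unique matches
        return len(set(matches))
    raise ValueError(f"unknown Match Counting mode: {mode!r}")

def should_report(matches: list[str], mode: str, min_matches: int) -> bool:
    """Model of 'Only report incidents with at least [n] matches'."""
    return count_matches(matches, mode) >= min_matches
```

In this model, a message with the matches ["123-45-6789", "123-45-6789", "987-65-4321"] and a threshold of 3 produces an incident under Count all matches (3 matches) but not under Count all unique matches (2 unique matches).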
You can also create a custom data identifier that includes an EMDI validation check. To review
the steps to create a custom data identifier, see the "Detecting content using data identifiers"
topic in the Symantec Data Loss Prevention Administration Guide or Help. Then follow the
steps to add an EMDI validator. For more information on configuring policies, see the
"Configuring policies" topic in the Symantec Data Loss Prevention Administration Guide or
Help.
See “Using multi-token matching with EMDI” on page 488.

Using multi-token matching with EMDI

EMDI validation occurs after a pattern in a file or message matches a data identifier. The EMDI
validator then searches within the defined proximity window (by default, plus or minus 50
tokens) for both individual tokens and multi-token strings. It then validates whether any of
those tokens in combination with the matching data identifier pattern correspond to a row in
the EMDI index. If the Required column matches and there are enough Optional column
matches within the proximity window, then an EMDI match is generated.
A multi-token cell is a cell in the index that contains multiple words separated by spaces,
leading or trailing punctuation, or alternating Latin and Chinese, Japanese, or Korean (CJK)
characters. The sub-token parts of a multi-token cell obey the same rules as single-token cells:
they are normalized according to their pattern where normalization can apply. Messages and
files that are inspected must match a multi-token cell exactly, including whitespace and
punctuation (assuming the default settings).
For example, an indexed cell containing the string "Bank of America" is a multi-token comprising
three sub-token parts. During detection, "bank of america" (normalized) matches the multi-token
cell, but "bank america" does not.
See “Characteristics of multi-token cells for EMDI” on page 489.
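The "Bank of America" example can be modeled as a contiguous sub-token sequence match. In this hypothetical Python sketch, normalization is reduced to case-folding and splitting on whitespace (which also collapses repeated spaces). Punctuation, CJK, and pattern-specific normalization are deliberately left out, so this is not the product's matcher.

```python
def sub_tokens(text: str) -> list[str]:
    # Simplified normalization: case-fold, then split on runs of whitespace.
    return text.casefold().split()

def contains_cell(content: str, cell: str) -> bool:
    """True if the cell's sub-tokens appear contiguously and in order."""
    tokens, cell_parts = sub_tokens(content), sub_tokens(cell)
    n = len(cell_parts)
    if n == 0:
        return False
    # Slide a window of n tokens across the content.
    return any(tokens[i:i + n] == cell_parts for i in range(len(tokens) - n + 1))
```

Under these assumptions, contains_cell("wire sent via bank of america", "Bank of America") is true, while contains_cell("wire sent via bank america", "Bank of America") is false because the "of" sub-token is missing.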

Characteristics of multi-token cells for EMDI

Table 25-9 lists and describes characteristics of multi-token matching.

Table 25-9 Characteristics of multi-tokens

Characteristic: The number of tokens in a single cell is limited to 100 tokens.
Description: With CJK tokens, each character is treated as a single token and the number of
CJK characters is limited to 100. If more than 100 tokens are found in a single cell during
indexing, indexing is terminated.

Characteristic: Whitespace in Latin multi-token cells is considered, but multiple white spaces
are normalized to 1.
Description: See “Multi-token with spaces for EMDI” on page 490.

Characteristic: Punctuation immediately preceding and following a token or sub-token is always
ignored.
Description: See “Multi-token with punctuation for EMDI” on page 491.
See “Additional examples for multi-token cells with punctuation for EMDI” on page 492.

Characteristic: You can configure how punctuation within a token or multi-token is treated
during detection. For most cases the default setting (true) is appropriate. With the false
setting, punctuation is treated as whitespace.
Description: Lexer.IncludePunctuationInWords = true
Note: This setting can only be set to false on the server, not on the DLP Agent. On the DLP
Agent, this setting is fixed to true.
See “Configuring Advanced Settings for EDM policies” on page 557.

Characteristic: For proximity range checking, the sub-token parts of a multi-token are counted
as single tokens.
Description: See “Proximity matching example for EMDI” on page 496.

See “Multi-token with spaces for EMDI” on page 490.
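Because the 100-token limit counts each CJK character separately, a long CJK cell can reach the limit much sooner than a Latin cell of similar length. The following Python sketch is a hypothetical approximation of a pre-indexing check, not the product's tokenizer. Assumptions: CJK is detected with a few common Unicode blocks, each CJK character is one token, a run of other non-whitespace characters is one token, and punctuation (approximated with Python's string.punctuation) never starts a token, so leading punctuation and punctuation-only runs are not counted.

```python
import string

MAX_TOKENS_PER_CELL = 100
PUNCTUATION = set(string.punctuation)

def is_cjk(ch: str) -> bool:
    """Rough CJK test using a few common Unicode blocks."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
            or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
            or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
            or 0xAC00 <= cp <= 0xD7AF)  # Hangul Syllables

def count_cell_tokens(cell: str) -> int:
    count = 0
    for chunk in cell.split():          # runs of whitespace separate tokens
        in_word = False
        for ch in chunk:
            if is_cjk(ch):
                count += 1              # each CJK character is its own token
                in_word = False
            elif ch in PUNCTUATION:
                continue                # punctuation never starts a token
            elif not in_word:
                count += 1              # start of a Latin or numeric token
                in_word = True
    return count

def check_cell(cell: str) -> int:
    """Return the token count, or raise if the cell exceeds the limit."""
    n = count_cell_tokens(cell)
    if n > MAX_TOKENS_PER_CELL:
        raise ValueError(f"{n} tokens in one cell; the limit is {MAX_TOKENS_PER_CELL}")
    return n
```

In this approximation, "Bank of America" counts as 3 tokens and "EMDI 傠傫" also counts as 3 (one Latin token plus two CJK characters).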


Multi-token with spaces for EMDI

Table 25-10 shows examples of multi-tokens with spaces for EMDI.

Table 25-10 Multi-token cell with spaces examples

Description: Cell contains space.
Indexed content: Bank of America
Detected content: Bank of America
Explanation: Cell with spaces is multi-token. Multi-token must match exactly.

Description: Cell contains multiple spaces.
Indexed content: Bank of America
Detected content: Bank of America
Explanation: Multiple spaces are normalized to one.

Description: Cells contain space between CJK characters.
Indexed content: 傠傫傠傫
Detected content: 傠傫傠傫; 傠傫傠傫
Explanation: White spaces between CJK characters are ignored.

Description: Cells contain space between Latin and CJK characters.
Indexed content: EMDI 傠傫
Detected content: EMDI 傠傫; EMDI傠傫
Explanation: White spaces between Latin and CJK characters are ignored.

See “Multi-token with mixed language characters for EMDI” on page 490.

Multi-token with mixed language characters for EMDI

Table 25-11 shows examples of multi-tokens with mixed Latin and CJK characters.

Table 25-11 Multi-token cell with Latin and CJK characters examples for EMDI

Description: Cell includes Latin and CJK characters with no spaces.
Cell content: ABC傠傫; 傠傫ABC
Should match: ABC傠傫; 傠傫ABC. Also matches with: ABC 傠傫; 傠傥 ABC
Explanation: Mixed Latin-CJK cell is multi-token. Whitespace between Latin and CJK characters
is ignored. EMDI ignores whitespace between the Latin characters and the CJK token.

Description: Cell includes Latin and CJK with one or more spaces.
Cell content: ABC 傠傫; 傠傥 ABC
Should match: ABC 傠傫; 傠傥 ABC. Also matches with: ABC傠傫; 傠傫ABC
Explanation: Multiple spaces are ignored.

Description: Cell contains Latin or CJK with numbers.
Cell content: 什仁仂仃仄仅仇仈仉; 147(什仂仅 51-1)
Should match: 什仁仂仃仄仅仇仈仉; 147(什仂仅 51-1)
Explanation: Single-token cell.

See “Multi-token with punctuation for EMDI” on page 491.

Multi-token with punctuation for EMDI

Punctuation is always ignored if it comes at the beginning (leading) or end (trailing) of a token
or multi-token. Whether punctuation that is included in a token or multi-token is required for
matching depends on the Advanced server setting Lexer.IncludePunctuationInWords,
which by default is set to true (enabled). For the DLP Agent, this value is set to true and
cannot be changed.
See “Multi-token punctuation characters (EDM)” on page 569.

Note: For convenience, the Lexer.IncludePunctuationInWords parameter is referred to as "WIP"
throughout this section.

The WIP setting operates at detection time to alter how matches are reported. For most EMDI
policies you should not change the WIP setting. For a few limited situations, such as account
numbers or addresses, you may need to set Lexer.IncludePunctuationInWords = false, depending
on your detection requirements.
See “Multi-token punctuation characters (EDM)” on page 569.
Table 25-12 lists and explains how multi-token matching works with punctuation.

Table 25-12 Multi-token punctuation table for EMDI

Indexed content | Detected content | WIP setting | Match | Explanation
a.b | a.b | TRUE | Yes | The indexed content and the detected content are exactly the same.
a.b | a.b | FALSE | No | The detected content is treated as "a b" and is therefore not a match.
a.b | a b | TRUE | No | The indexed content and the detected content are different.
a.b | a b | FALSE | No | The indexed content and the detected content are different.
a b | a.b | TRUE | No | The indexed content and the detected content are different.
a b | a.b | FALSE | Yes | The detected content is treated as "a b" and is therefore a match.
a b | a b | TRUE | Yes | The indexed content and the detected content are exactly the same.
a b | a b | FALSE | Yes | The indexed content and the detected content are exactly the same.

See “Additional examples for multi-token cells with punctuation for EMDI” on page 492.
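The WIP behavior in Table 25-12 can be modeled in a few lines. This Python sketch is a hypothetical illustration for Latin text only, not the product's lexer: it approximates the Table 25-14 characters with Python's string.punctuation, compares whole strings rather than searching a proximity window, and treats the unpunctuated variant as the two-token string "a b", which is the reading under which the table's explanations hold. Following the rule that internal punctuation is always retained at indexing time, the indexed side is always normalized as if WIP were true.

```python
import string

# ASCII approximation of the characters in Table 25-14 (string.punctuation
# also contains a few extra symbols such as ":" and "_").
PUNCT = string.punctuation

def normalize(text: str, wip: bool) -> list[str]:
    """Sub-tokens for Latin text under a given WIP setting (CJK omitted)."""
    if not wip:
        # WIP false: punctuation is treated as whitespace.
        text = "".join(" " if ch in PUNCT else ch for ch in text)
    tokens = []
    for tok in text.split():        # repeated spaces collapse here
        tok = tok.strip(PUNCT)      # leading/trailing punctuation is ignored
        if tok:
            tokens.append(tok.casefold())
    return tokens

def cell_matches(indexed: str, detected: str, wip: bool) -> bool:
    # Internal punctuation is always retained at indexing time,
    # so the indexed side is normalized as if WIP were true.
    return normalize(indexed, wip=True) == normalize(detected, wip)
```

With the default WIP setting, cell_matches("346 Guerrero St., Apt. #2", "346 Guerrero St Apt 2", True) is true, because every punctuation mark in the indexed address is leading or trailing.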

Additional examples for multi-token cells with punctuation for EMDI

Table 25-13 lists and describes some additional examples for multi-token cells with punctuation.
Keep in mind that during indexing, if a token includes punctuation marks between characters,
the punctuation is always retained. This means that EMDI cannot detect that cell if the WIP
setting is false. In other words, if indexed data has a cell that has a token with internal
punctuation, the WIP setting should be set to true.

Table 25-13 Additional use cases for multi-token cells with punctuation for EMDI

Description: Cell contains a physical address with punctuation.
Indexed content: 346 Guerrero St., Apt. #2
Detected content: 346 Guerrero St., Apt. #2; 346 Guerrero St Apt 2
Explanation: The indexed content is a multi-token cell. Both match because the punctuation
comes at the beginning or end of the sub-token parts and is therefore ignored.

Description: Cell contains internal punctuation with no space before or after.
Indexed content: O'NEAL ST.
Detected content: O'NEAL ST
Explanation: The indexed content is a multi-token cell. Internal punctuation is included
(assuming WIP is true), and leading or trailing punctuation is ignored (assuming there is a
space delimiter after the punctuation).

Description: Cell contains Asian language characters (CJK) with indexed internal punctuation.
Indexed content: 傠傫##傠傫
Detected content: 傠傫##傠傫 (if WIP true)
Explanation: The indexed content is a single token cell. During detection, Asian language
characters (CJK) with internal punctuation are affected by the WIP setting. Thus, in this
example 傠傫##傠傫 matches only if the WIP setting is true. If the WIP setting is false,
傠傫##傠傫 is considered a multi-token because the internal punctuation is treated as
whitespace. Thus, no content can match.

Description: Cell contains Asian language characters (CJK) without indexed internal punctuation.
Indexed content: 傠傫傠傫
Detected content: 傠傫傠傫; 傠傫##傠傫 (if WIP false)
Explanation: The indexed content is a multi-token cell. The detected content matches as
indexed. If the WIP setting is false, the detected content also matches 傠傫##傠傫 because
internal punctuation is ignored.

Description: Cell contains mix of Latin and CJK characters with punctuation separating the
Latin characters and Asian characters.
Indexed content: EMDI##傠傫
Detected content: EMDI 傠傫
Explanation: The indexed content is a multi-token cell. A cell with alternating Latin and CJK
characters is always a multi-token. Punctuation between Latin and Asian characters is always
treated as a single white space regardless of the WIP setting.

Description: Cell contains mix of Latin and CJK characters with internal punctuation.
Indexed content: DLP##EMDI 傠傫##傠傥
Detected content: DLP##EMDI##傠傫##傠傥 (if WIP true); DLP##EMDI 傠傫##傠傥 (if WIP true)
Explanation: The indexed content is a multi-token cell. During detection, punctuation between
the Latin characters and the Asian characters is treated as a single whitespace. Leading and
trailing punctuation is ignored. If the WIP setting is true, the punctuation internal to the
Latin characters and internal to the Asian characters is retained. If the WIP setting is
false, no content can match because internal punctuation is ignored.

Description: Cell contains mix of Latin and CJK characters with internal punctuation.
Indexed content: DLP EMDI 傠傫傠傥
Detected content: DLP EMDI 傠傫傠傥; DLP#EMDI 傠傫#傠傥 (if WIP false);
DLP#EMDI##傠傫#傠傥 (if WIP false)
Explanation: The indexed content is a multi-token cell. During detection, punctuation between
the Latin characters and the Asian characters is treated as a single whitespace. Leading and
trailing punctuation is ignored. Thus, it matches as indexed. If the WIP setting is false, it
also matches DLP;EMDI##傠傫#傠傥 because internal punctuation is ignored.

See “Multi-token punctuation characters for EMDI” on page 495.

Multi-token punctuation characters for EMDI

In EMDI, a multi-token cell is any indexed cell that contains punctuation (as well as spaces
or alternating Latin words and CJK characters).
Table 25-14 lists the symbols that are identified and treated as punctuation during EMDI
indexing.

Table 25-14 Characters treated as punctuation for indexing for EMDI

Punctuation name Character representation

Apostrophe '

Tilde ~

Exclamation point !

Ampersand &

Dash -

Single quotation mark '

Double quotation mark "

Period (dot) .

Question mark ?

At sign @

Dollar sign $

Percent sign %

Asterisk *

Caret symbol ^

Open parenthesis (

Close parenthesis )

Open bracket [

Close bracket ]

Open brace {

Close brace }

Forward slash /

Back slash \

Pound sign #

Equal sign =

Plus sign +

See “Proximity matching example for EMDI” on page 496.

Proximity matching example for EMDI

EMDI protects confidential data by correlating uniquely identifiable information, such as a
social security number, with data that is not unique, such as a last name. When you correlate
data, it is important to ensure that terms are related. In natural languages, it is more likely that
when two words appear close together they are used in the same context and are therefore
related.
Based on the premise that word proximity indicates relatedness, EMDI employs a
proximity-matching radius to limit how much free-form content the system examines when it
searches for matches. EMDI proximity matching is designed to reduce false positives by
ensuring that matched terms are proximate.
EMDI supports up to 50 tokens before and after the data identifier match. You can lower this
limit during policy creation. The proximity radius does not depend on the number of columns in
the policy.
Table 25-15 shows a proximity matching example based on the default proximity radius setting.
In this example, the detected content produces one unique token set match, described as
follows:
■ The proximity range window is 100 tokens (50 tokens before and after the matching data
identifier pattern).
■ The total number of tokens from "Stevens" to the first token of "Bank of America" is within
100 tokens.
■ "Bank of America" is a multi-token. Each sub-token part of a multi-token is counted as a
single token for proximity purposes.
■ If a multi-token begins within the proximity window, it is matched even if it ends after the
proximity window. For example, "Bank of America" is matched if "Bank" is in the proximity
window, even if "of America" is not within the window.
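The proximity rules above can be sketched in Python. This is a hypothetical illustration, not the product's matcher, and the function names are made up: tokens are whitespace-separated, case-folded, and stripped of leading and trailing punctuation; the data identifier match is passed in as a token position (the anchor); and a multi-token value counts if its first sub-token falls within the radius, even when its remaining sub-tokens extend past the window.

```python
import string

def simple_tokens(text: str) -> list[str]:
    """Whitespace tokens with leading/trailing punctuation stripped, case-folded."""
    out = []
    for raw in text.split():
        tok = raw.strip(string.punctuation)
        if tok:
            out.append(tok.casefold())
    return out

def starts_in_window(tokens: list[str], anchor: int, cell: str, radius: int = 50) -> bool:
    """True if `cell` begins within `radius` tokens of position `anchor`."""
    sub = simple_tokens(cell)
    if not sub:
        return False
    lo = max(0, anchor - radius)
    hi = min(len(tokens) - 1, anchor + radius)
    for start in range(lo, hi + 1):
        # Only the first sub-token must be inside the window; later
        # sub-tokens may extend past it.
        if tokens[start:start + len(sub)] == sub:
            return True
    return False

def optional_column_hits(tokens: list[str], anchor: int, optional_cells: list[str],
                         radius: int = 50) -> int:
    """Count how many optional-column values from one index row are in range."""
    return sum(starts_in_window(tokens, anchor, c, radius) for c in optional_cells)
```

For example, with the tokens of the detected content in Table 25-15 and the anchor set to the position of "123-45-6789", both "Stevens" and "Bank of America" start within 50 tokens of the anchor, so optional_column_hits() returns 2 for that row.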

Table 25-15 Proximity example for EMDI

Indexed data:
Last_Name | SSN | Employer
Stevens | 123-45-6789 | Bank of America

Data Identifier Match: Social Security Number
Policy: Match 3 of 3
Proximity: Radius = 50 tokens (default)

Detected content:
Zendrerit inceptos Kathy Stevens lorem ipsum pharetra convallis leo suscipit ipsum sodales
rhoncus, vitae dui nisi volutpat augue maecenas in, luctus id risus magna arcu maecenas leo
quisque. Rutrum convallis tortor urna morbi elementum hac curabitur morbi, nunc dictum primis
elit senectus faucibus convallis surfrent. Aptentnour gravida adipiscing iaculis himenaeos,
123-45-6789. Dictumst lorem eget ipsum. Hendrerit inceptos other sagittis quisque. Leo mollis
per nisl per felis, nullam cras mattis augue turpis integer pharetra convallis suscipit
hendrerit? Lubilia en mictumst horem eget ipsum. Inceptos urna sagittis quisque dictum odio
hendrerit convallis suscipit ipsum wrdsrf Zendrerit inceptos Kathy lorem ipsum Bank of America.

See “Memory requirements for EMDI” on page 498.

Memory requirements for EMDI

Using EMDI in a Symantec Data Loss Prevention deployment affects the hardware memory
requirements of that deployment. In particular, EMDI affects the memory required to index the
data source as well as the memory required to load the index on the detection server, the
appliance, and the endpoint.
Once you have established what your specific EMDI memory requirements are, you can
evaluate how those requirements affect the general system requirements for your Data Loss
Prevention deployment. See the Symantec Data Loss Prevention System Requirements and
Compatibility Guide for details about general requirements.
See “EMDI memory configuration and limitations” on page 499.

EMDI memory configuration and limitations

The memory requirements for EMDI are related to several factors, including:
■ Number of indexes you are building
■ Total size of the indexes
■ Number of cells in each index
These size limitations apply to EMDI indexes:
■ The maximum number of rows supported is 4 million. This count does not include invalid
rows.
■ The maximum number of columns is 32.
■ The number of invalid entries allowed is configurable in the Indexer.properties file. The
default is 1%.
■ A value from the required column can have a maximum of 5 duplicates: a specific value in
the required column cannot appear more than 5 times, and you cannot raise this limit. In
addition, the total number of duplicate values in each required (or "key") column cannot
exceed 1% of the values. You can change this percentage by editing
EMDI.MaxDuplicateCellsPercentage=1 in the properties file.

■ The maximum number of supported cells is 128 million.

■ If any of these limits is exceeded, index creation is terminated.
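Because index creation terminates when any of these limits is exceeded, you may want to check a data source against them before indexing. The following Python sketch is a hypothetical pre-check, not part of the product. Assumptions: the rows are already valid (the row limit does not count invalid rows), the required ("key") column is column 0 by default, and "duplicate values" means every occurrence of a key value after its first, which is one reading of the 1% rule above.

```python
from collections import Counter

MAX_ROWS = 4_000_000
MAX_COLUMNS = 32
MAX_CELLS = 128_000_000
MAX_OCCURRENCES_PER_KEY = 5   # a key value may appear at most 5 times
MAX_DUPLICATE_PERCENT = 1.0   # EMDI.MaxDuplicateCellsPercentage default

def check_emdi_limits(rows: list[list[str]], key_column: int = 0) -> list[str]:
    """Return descriptions of EMDI size limits that `rows` would exceed."""
    problems = []
    n_rows = len(rows)
    n_cols = max((len(r) for r in rows), default=0)
    if n_rows > MAX_ROWS:
        problems.append(f"{n_rows} rows exceeds {MAX_ROWS}")
    if n_cols > MAX_COLUMNS:
        problems.append(f"{n_cols} columns exceeds {MAX_COLUMNS}")
    if n_rows * n_cols > MAX_CELLS:
        problems.append(f"{n_rows * n_cols} cells exceeds {MAX_CELLS}")
    counts = Counter(r[key_column] for r in rows)
    over = sorted(v for v, c in counts.items() if c > MAX_OCCURRENCES_PER_KEY)
    if over:
        problems.append(f"key values appearing more than {MAX_OCCURRENCES_PER_KEY} times: {over}")
    # Assumption: every occurrence after the first counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    if n_rows and 100.0 * duplicates / n_rows > MAX_DUPLICATE_PERCENT:
        problems.append(f"{duplicates} duplicate key values exceed {MAX_DUPLICATE_PERCENT}%")
    return problems
```

An empty result means that, under these assumptions, the data source stays within the documented limits.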
Table 25-16 gives an overview of the steps that you can follow to determine and set memory
requirements for EMDI.

Table 25-16 Workflow for determining memory requirements for EMDI indexes

Step 1
Action: Determine the memory that is required to index the data source.
For more information: See “Determining requirements for both local indexers and remote
indexers for EMDI” on page 500.

Step 2
Action: Determine the memory that is required to load the index on the detection server or
the endpoint.
For more information: See “Detection server memory requirements for EMDI” on page 501.

Step 3
Action: Increase the detection server or endpoint memory according to your calculations.
For more information: See “Increasing the memory for the detection server (File Reader) for
EMDI” on page 503.
See “Properties file settings for EMDI” on page 515.

Step 4
Action: Repeat for each EMDI index you want to deploy.

See “Overview of configuring memory and indexing the data source for EMDI” on page 500.

Overview of configuring memory and indexing the data source for EMDI

Table 25-17 provides the steps for determining how much memory is needed to index the data
source.

Table 25-17 Memory requirements for indexing the data source for EMDI

Step 1
Action: Estimate the memory requirements for the indexer.
Details: See “Determining requirements for both local indexers and remote indexers for EMDI”
on page 500.

Step 2
Action: Increase the indexer memory.
Details: The next step is to increase the memory allocated to the indexer. The procedure for
increasing the indexer memory differs depending on whether you use the EMDI indexer local to
the Enforce Server or the Remote EMDI Indexer.

Step 3
Action: Restart the Symantec DLP Manager service.
Details: You must restart this service after you have changed the memory allocation.

Step 4
Action: Index the data source.
Details: The last step is to index the data source. You need to index before you calculate
remaining memory requirements.

See “Configuring Exact Data profiles for EDM” on page 534.

See “Determining requirements for both local indexers and remote indexers for EMDI”
on page 500.

Determining requirements for both local indexers and remote indexers for EMDI

This topic provides an overview of memory requirements for both the EMDI indexer that is
local to the Symantec Data Loss Prevention Enforce Server and the Remote EMDI Indexer.
You do not need to change the EMDI indexer default value of 2048 MB. Make sure that the
system has enough additional free memory in case of parallel indexing. The additional memory
that is required depends on the number of required and optional columns as well as the number
of cells. The following examples use these variables:
R – Number of required columns
P – Number of optional columns
B – Bytes per cell
The general formula is: B = 4 * R * P / (P+1)
The total memory that is required for an index is the number of cells multiplied by B.

Example 1
For an index with 5 million cells (1 million rows x 5 columns), 1 required column, and 4 optional columns:
B = 4 * 1 * 4 / 5 = 3.2 bytes per cell
The total memory that is required for this index = 5 million * 3.2 bytes = 16 MB

Example 2
For an index with 40 million cells (4 million rows x 10 columns), 1 required column, and 9 optional columns:
B = 4 * 1 * 9 / 10 = 3.6 bytes per cell
The total memory that is required for this index = 40 million * 3.6 bytes = 144 MB

Example 3
For an index with 128 million cells (4 million rows x 32 columns), 1 required column, and 31 optional columns:
B = 4 * 1 * 31 / 32 = 3.875 bytes per cell
The total memory that is required for this index = 128 million * 3.875 bytes = 496 MB
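As a cross-check, the formula and the three examples can be reproduced in a few lines of Python. This is an illustrative sketch, not a Symantec tool: the function names are hypothetical, and it assumes decimal megabytes (1 MB = 1,000,000 bytes), which is what the examples above appear to use. Exact fractions keep B free of floating-point rounding.

```python
from fractions import Fraction


def emdi_bytes_per_cell(required_cols: int, optional_cols: int) -> Fraction:
    """B = 4 * R * P / (P + 1), kept as an exact fraction."""
    return Fraction(4 * required_cols * optional_cols, optional_cols + 1)


def emdi_index_memory_mb(cells: int, required_cols: int, optional_cols: int) -> float:
    """Additional indexer memory for one index: cells * B, in decimal MB."""
    total_bytes = cells * emdi_bytes_per_cell(required_cols, optional_cols)
    return float(total_bytes / 1_000_000)


# Examples 1-3 above: (cells, required columns, optional columns)
for cells, r, p in [(5_000_000, 1, 4), (40_000_000, 1, 9), (128_000_000, 1, 31)]:
    print(f"{cells:,} cells: B = {float(emdi_bytes_per_cell(r, p))} bytes per cell, "
          f"{emdi_index_memory_mb(cells, r, p)} MB")
```

For several indexes that are indexed in parallel, you would add up the per-index results.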
See “Detection server memory requirements for EMDI” on page 501.

Detection server memory requirements for EMDI

The detection server should not use more than 60% of the memory of the computer. For
example, if your detection server needs 6 GB of memory to run, make sure that you have 10
GB on that server.
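The 60% guideline amounts to dividing the detection server's memory requirement by 0.6 and rounding up to a whole number of gigabytes. A minimal sketch, with a hypothetical function name, that reproduces the 6 GB to 10 GB example:

```python
def min_physical_memory_gb(detection_server_gb: int, max_share_pct: int = 60) -> int:
    """Smallest whole number of GB of physical memory such that the detection
    server uses no more than max_share_pct percent of it."""
    # Integer ceiling division, ceil(a / b) == -(-a // b), avoids float rounding.
    return -(-detection_server_gb * 100 // max_share_pct)


print(min_physical_memory_gb(6))  # 10
```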

Default configuration for a detection server

The default configuration for a detection server has 4 GB of Java heap memory and eight message chains. Use the following formulas and Table 25-18 to calculate your actual memory requirements.

To load the index, the detection server needs, on average, 3.5 bytes per cell for system memory
plus 1 GB Java heap memory for each message chain in the detection server. The following
examples show scenarios for a customer who has three indexes that are all under the same
schedule.
For Java heap memory requirements, the formula is:
Java heap memory requirement = the number of message chains * 1 GB.
For system memory requirements, the general formula is:
System memory requirement = number of cells * 3.5 bytes.

Detection Server memory settings

The Advanced Server Settings property that controls the number of message chains is MessageChain.NumChains.

The Java heap memory settings for a detection server are set in the Enforce Server
administration console at the Server Detail - Advanced Server Settings page, using the
BoxMonitor.FileReaderMemory property. The format is -Xrs -Xms1200M -Xmx4G. You don't
need to change the system memory setting, but make sure that the detection server has
enough free memory available.

Note: When you update this setting, only change the -Xmx value in this property. For example, only change "4G" to a new value, and leave all other values the same.

The examples in Table 25-18 show the settings for three different situations.

Table 25-18 EMDI detection server Java heap memory settings and additional system memory examples

Example 1: 2 message chains, a 5 million cell index
  Calculation:
    Java heap memory requirement: 2 * 1 GB = 2 GB
    System memory requirement: 5 million * 3.2 = 16 MB
  BoxMonitor.FileReaderMemory setting: -Xmx2G
  Additional system memory required: 16 MB

Example 2: 4 message chains, five 40 million cell indexes
  Calculation:
    Java heap memory requirement: 4 * 1 GB = 4 GB
    System memory requirement: 5 * 40 million * 3.6 = 720 MB
  BoxMonitor.FileReaderMemory setting: -Xmx4G
  Additional system memory required: 720 MB

Example 3: 24 message chains, ten 128 million cell indexes
  Calculation:
    Java heap memory requirement: 24 * 1 GB = 24 GB
    System memory requirement: 10 * 128 million * 3.875 = 4960 MB
  BoxMonitor.FileReaderMemory setting: -Xmx24G
  Additional system memory required: 4.96 GB
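The calculations in Table 25-18 can be scripted as follows. This sketch is illustrative and its names are hypothetical; it assumes that you already know the bytes per cell for each index (from the indexer formula, or the 3.5-byte average) and reports decimal megabytes, as the table does.

```python
from fractions import Fraction

JAVA_HEAP_GB_PER_CHAIN = 1  # Java heap memory requirement = message chains * 1 GB


def file_reader_xmx(message_chains: int) -> str:
    """The -Xmx token to use in BoxMonitor.FileReaderMemory."""
    return f"-Xmx{message_chains * JAVA_HEAP_GB_PER_CHAIN}G"


def additional_system_memory_mb(indexes) -> float:
    """Sum of cells * bytes-per-cell over all indexes, in decimal MB.

    indexes: iterable of (number_of_cells, bytes_per_cell) pairs.
    """
    # Fraction(str(...)) turns 3.6 into exactly 18/5 instead of a binary float.
    total = sum(cells * Fraction(str(bpc)) for cells, bpc in indexes)
    return float(total / 1_000_000)


# Example 2 from Table 25-18: 4 chains, five 40 million cell indexes at 3.6 bytes per cell
print(file_reader_xmx(4), additional_system_memory_mb([(40_000_000, 3.6)] * 5), "MB")
```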

See “Increasing the memory for the detection server (File Reader) for EMDI” on page 503.

Increasing the memory for the detection server (File Reader) for EMDI
This topic provides instructions for increasing the File Reader memory allocation for a detection
server. These instructions assume that you have performed the necessary calculations.
See “Determining requirements for both local indexers and remote indexers for EMDI”
on page 500.
To increase the memory for detection server processing
1 In the Enforce Server administration console, navigate to the Server Detail - Advanced
Server Settings screen for the detection server where the EMDI index is deployed or to
be deployed.
2 Locate the following setting: BoxMonitor.FileReaderMemory.

3 Change the -Xmx4G value in the following string to match the calculations you have made.
-Xrs -Xms1200M -Xmx4G -XX:PermSize=128M -XX:MaxPermSize=256M
For example: -Xrs -Xms1200M -Xmx11G -XX:PermSize=128M -XX:MaxPermSize=256M
4 Save the configuration and restart the detection server.
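Because only the -Xmx value should change, a script that prepares the new BoxMonitor.FileReaderMemory value can replace that single token and leave every other option untouched. A minimal Python sketch; the function name is hypothetical, and you still enter the resulting value on the Server Detail - Advanced Server Settings screen yourself:

```python
import re


def set_max_heap(file_reader_memory: str, new_max: str) -> str:
    """Replace only the -Xmx token in a BoxMonitor.FileReaderMemory value."""
    if not re.fullmatch(r"\d+[KkMmGg]?", new_max):
        raise ValueError(f"unexpected heap size: {new_max!r}")
    # -Xms1200M and -XX:MaxPermSize=256M do not start with -Xmx, so they are untouched.
    updated, count = re.subn(r"-Xmx\S+", f"-Xmx{new_max}", file_reader_memory)
    if count != 1:
        raise ValueError("expected exactly one -Xmx option")
    return updated


current = "-Xrs -Xms1200M -Xmx4G -XX:PermSize=128M -XX:MaxPermSize=256M"
print(set_max_heap(current, "11G"))
# -Xrs -Xms1200M -Xmx11G -XX:PermSize=128M -XX:MaxPermSize=256M
```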
See “Profile size limitations on the DLP Agent for EMDI” on page 504.

Profile size limitations on the DLP Agent for EMDI

By default, profiles larger than 100 MB are not sent to the DLP Agent. To change this limit, edit the EMDI.MaxEndpointProfileMemoryInMB property in the Protect.properties file.
See “Properties file settings for EMDI” on page 515.
The limit applies to each profile individually: there is no limit on the number of profiles of up to 100 MB that are sent to the agent. If you increase the default limit or plan to deploy multiple indexes, provision extra memory on your DLP Agents to accommodate the increase.

Note: By default, EMDI profiles are not deployed to DLP Agents (EMDI.EnabledOnAgents is set to false). To enable EMDI deployments to DLP Agents, set the EMDI.EnabledOnAgents property in the Protect.properties file to true for each DLP Agent.
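The interaction between EMDI.EnabledOnAgents and the per-profile EMDI.MaxEndpointProfileMemoryInMB limit can be modeled as shown below. This is a hypothetical sketch of the rules described above, not product code; the profile names and sizes are invented for illustration. A profile exactly at the limit is treated as eligible, because only profiles larger than the limit are excluded.

```python
def profiles_sent_to_agents(profile_mb: dict, enabled_on_agents: bool = False,
                            max_profile_mb: int = 100) -> list:
    """Names of the EMDI profiles that would be sent to DLP Agents (sorted).

    enabled_on_agents mirrors EMDI.EnabledOnAgents (default false);
    max_profile_mb mirrors EMDI.MaxEndpointProfileMemoryInMB (default 100).
    The limit is checked per profile, never against the combined total.
    """
    if not enabled_on_agents:
        return []
    return sorted(name for name, size in profile_mb.items() if size <= max_profile_mb)


def agent_memory_mb(profile_mb: dict, sent: list) -> int:
    """Memory the agent must provision for all deployed profiles combined."""
    return sum(profile_mb[name] for name in sent)


profiles = {"customers": 80, "employees": 95, "payments": 140}  # hypothetical sizes in MB
sent = profiles_sent_to_agents(profiles, enabled_on_agents=True)
print(sent, agent_memory_mb(profiles, sent))  # ['customers', 'employees'] 175
```

The example shows why agent memory must be sized for the total: each profile is under the limit, but together they need 175 MB.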

Remote EMDI indexing

An EMDI index maps the data you want to protect to the Exact Match Data Identifier profile.
Here's the typical EMDI workflow for creating the EMDI index:
■ Upload the data source file to the Enforce Server.
■ Create the Exact Match Data Identifier profile.
■ Index the data source.
Instead of uploading the data source file to the Enforce Server for indexing, you can index the
data source locally and securely using the Remote EMDI Indexer.
For example, if copying the confidential data source file to the Enforce Server presents a
potential security or logistical issue, you can use the Remote EMDI Indexer to create the
cryptographic index directly on the data source host before moving the index to the Enforce
Server.
See “About the Remote EMDI Indexer” on page 505.
See “About the SQL Preindexer and EMDI” on page 505.

The Remote EMDI Indexer is a standalone tool that lets you index the data source file directly
on the data source host.
See “System requirements for remote EMDI indexing” on page 505.

About the Remote EMDI Indexer

The Remote EMDI Indexer utility converts a data source file to an EMDI index. The utility is
similar to the local EMDI Indexer that you can use on the Enforce Server. However, the Remote
EMDI Indexer is designed for use on a computer that is not part of the Symantec Data Loss
Prevention server configuration.
The Remote EMDI Indexer has the following advantages over using the EMDI Indexer on the
Enforce Server:
■ It enables the owner of the data, rather than the Symantec Data Loss Prevention
administrator, to index the data. The Symantec Data Loss Prevention administrator does
not need to have access to the original data source that is indexed.
■ It shifts the system load that is required for indexing onto another computer. The CPU and RAM on the Enforce Server remain available for other tasks.
See “About the SQL Preindexer and EMDI” on page 505.
See “Workflow for remote EMDI indexing” on page 506.

About the SQL Preindexer and EMDI

You use the SQL Preindexer utility with the Remote EMDI Indexer to run SQL queries against
Oracle databases. Then you pipe the resulting data to the Remote EMDI Indexer for indexing.
See “System requirements for remote EMDI indexing” on page 505.
The SQL Preindexer utility is installed in the C:\Program Files\Symantec\DataLossPrevention\ServerPlatformCommon\Indexer\15.5\Protect\bin directory during installation of the Remote EMDI Indexer. Used together with the Remote EMDI Indexer, the SQL Preindexer generates an index directly from an Oracle SQL database: the SQL Preindexer runs the database query and passes the results to the standard input of the Remote EMDI Indexer utility.
To use the SQL Preindexer, the data source must be relatively clean, because the query result data is piped directly to the Remote EMDI Indexer.
See “About the Remote EMDI Indexer” on page 505.

System requirements for remote EMDI indexing

The Remote EMDI Indexer runs on the Windows and Linux operating system versions that are supported for Symantec Data Loss Prevention servers. See the Symantec Data Loss Prevention System Requirements and Compatibility Guide for more information about operating system support.
The SQL Preindexer supports Oracle databases and requires a relatively clean data source.
See “About the SQL Preindexer and EMDI” on page 505.
The RAM requirements for using the Remote EMDI Indexer vary according to the size of the
data source being indexed.
See “Memory requirements for EMDI” on page 498.

Workflow for remote EMDI indexing

This section summarizes the steps to index a data file on a remote machine and then use the
index in Symantec Data Loss Prevention.
See “About the Exact Data Profile and index” on page 528.

Table 25-19 Steps to use the Remote EMDI Indexer

Step 1 Install the Remote EMDI Indexer on a computer that is not part of the Symantec Data Loss Prevention system.
  See “About installing the Remote EMDI indexer” on page 507.

Step 2 Create an Exact Match Data Identifier profile on the Enforce Server to use with the Remote EMDI Indexer.
  On the Enforce Server, generate an EMDI profile template using the *.emdi file name extension and specifying the exact number of columns to be indexed.
  See “Creating an EMDI profile template for remote indexing” on page 508.

Step 3 Copy the Exact Match Data Identifier profile file to the computer where the Remote EMDI Indexer resides.
  Download the profile template from the Enforce Server and copy it to the remote data source host computer.
  See “Downloading and copying the EMDI profile file to a remote system” on page 509.

Step 4 Run the Remote EMDI Indexer and create the index files.
  If you have a cleansed data source file, use the RemoteEMDIIndexer with the -data, -profile, and -result options.
  If the data source is an Oracle database, use the SqlPreindexer and the RemoteEMDIIndexer to index the data source directly with the -alias (Oracle DB host), -username and -password credentials, and the -query string or -query_path.
  See “Generating remote index files for EDM” on page 591.

Step 5 Copy the index files from the remote machine to the Enforce Server.
  Copy the resulting *.pdx and *.rdx files from the remote machine to the Enforce Server host, on Windows at C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\index or on Linux at /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/index.
  See “Copying and loading remote EDM index files to the Enforce Server” on page 594.

Step 6 Load the index files into the Enforce Server.
  Update the EMDI profile by loading the externally generated index. Submit the profile for indexing.
  See “Copying and loading remote EDM index files to the Enforce Server” on page 594.

Step 7 Troubleshoot any problems that occur during the indexing process.
  Verify that indexing starts and completes.
  Check the system events for Code 2926 ("Created Exact Data Profile" and "Data source saved").
  The ExternalEmdiDataSource.<name>.pdx and *.rdx files are removed from the index directory and replaced by the file EmdiDataSource.<tenant id>.<profile id>.<version>.rdx.
  See “Troubleshooting remote indexing errors for EDM” on page 599.

Step 8 Create a policy with an EMDI condition.
  You should see the column data for defining the EMDI condition.
  See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.

About installing the Remote EMDI indexer

You install the remote indexer on one or more systems where the confidential files that you
want to protect are stored. The process for installing a remote indexer is the same for EMDI,
EDM, and IDM.
See “About installing remote indexers” on page 589.
You can install the Remote EMDI indexer on all supported Windows and Linux platforms. See
the Symantec Data Loss Prevention System Requirements Guide for platform details.

Creating an EMDI profile template for remote indexing

The EMDI Indexer uses an Exact Match Data Identifier Profile when it runs to ensure that the
data is correctly formatted. You must create the Exact Data Profile before you use the Remote
EMDI Indexer. The profile is a template that describes the columns that are used to organize
the data. The profile does not need to contain any data. After creating the profile, copy it to
the computer that runs the Remote EMDI Indexer.
To create an EMDI profile for remote indexing
1 From the Enforce Server administration console, navigate to the Manage > Data Profiles
> Exact Data screen.
2 Click Add Exact Match Data Identifier Profile.
3 In the Name field, enter a name for the profile.
4 In the Data Source field, select Use This File Name, and enter the name of the index file to create with the *.emdi extension.
You must select this option when you create only the profile template. Later, you index the data source against the profile using the Remote EMDI Indexer. Enter the file name of the data source you plan to create for remote EMDI indexing, and be sure to name the data source file exactly as you enter it here.
After you copy the generated remote index back to the Enforce Server, use the Load
Externally Generated Index option to load the remote index into the profile template.
See “Copying and loading remote EDM index files to the Enforce Server” on page 594.
5 For remote EMDI indexing, you must specify the exact Number of Columns that the index will have. Make sure that the data source file contains exactly the number of columns you specify here.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
6 If the first row of the data source contains the column names, select the option Read first
row as column names.
7 In the Error Threshold text box, enter the maximum percentage of rows that can contain
errors.
If, during indexing of the data source, the number of rows with errors exceeds the
percentage that you specify here, the indexing operation fails.
8 In the Column Separator Char field, select the type of character that is used in your data
source to separate the columns of data.
9 In the File Encoding field, select the character encoding that is used in your data source.
If Latin characters are used, select the ISO-8859-1 option. For East Asian languages, use
either the UTF-8 or UTF-16 options.

10 Click Next to map the column headings from the data source to the profile.
11 At least one field must be selected as Required and mapped to a Data Identifier. At least
one field must be Optional.
12 Do not select any Indexing option on this screen, because you intend to index remotely.
13 Click Finish to complete the profile creation process.
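Because the profile fixes the Number of Columns and the Error Threshold, it can help to check the data source file for rows with the wrong column count before you run the Remote EMDI Indexer. The following Python function is a hypothetical pre-check, not part of Symantec Data Loss Prevention: it splits each line on the separator character without handling quoting, and it does not validate values against data identifiers, so the indexer may still reject rows that pass this check.

```python
def check_data_source(lines, expected_columns: int, separator: str,
                      error_threshold_pct: float, skip_header: bool = False):
    """Hypothetical pre-check of a data source file before remote EMDI indexing.

    Flags non-blank rows whose column count differs from the profile's
    Number of Columns, then compares the error rate with the profile's
    Error Threshold. Returns (bad_line_numbers, error_pct, within_threshold).
    """
    bad, total = [], 0
    for line_no, line in enumerate(lines, start=1):
        if skip_header and line_no == 1:
            continue  # "Read first row as column names" is selected
        row = line.rstrip("\r\n")
        if not row.strip():
            continue  # skip blank lines
        total += 1
        if len(row.split(separator)) != expected_columns:
            bad.append(line_no)
    error_pct = 100.0 * len(bad) / total if total else 0.0
    return bad, error_pct, error_pct <= error_threshold_pct
```

For example, for a tab-separated file with a header row you might call check_data_source(f, 5, "\t", 5.0, skip_header=True) on a file opened with the same encoding that you selected in the profile.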

Downloading and copying the EMDI profile file to a remote system

To download and copy the EMDI profile to the remote system
1 Configure an Exact Match Data Identifier Profile.
See “Creating an EDM profile template for remote indexing” on page 589.
2 Download the EMDI profile by selecting the download profile link at the Manage > Data
Profiles > Exact Data screen.
The system prompts you to save the EMDI profile as a file. The file extension is *.emdi.
3 Save the file.
If the data source host computer where you intend to run the Remote EMDI Indexer is
available on the same subnet as the Enforce Server, you can browse to that computer
and select it as the destination. Otherwise, manually copy the profile to the remote system.
4 Use the profile to index the data source using the Remote EMDI Indexer.
See “Generating remote index files for EDM” on page 591.

Generating remote index files for EMDI

You use the command-line Remote EMDI Indexer utility to generate an EMDI index for importing
to the Enforce Server. You can use the Remote EMDI Indexer to index a data source file that
you have generated and cleansed. Or you can pipe the output from the SQL Preindexer to
the standard input of the Remote EMDI Indexer. The SQL Preindexer requires an Oracle DB
data source and clean data.
When the indexing process completes, the Remote EMDI Indexer generates one .pdx file and one or more .rdx files in the specified result directory. The files are named after the data source that was indexed:
■ ExternalEmdiDataSource.<DataSourceName>.pdx
■ ExternalEmdiDataSource.<DataSourceName>.<EmdiDataSourceID>.rdx

The number of .rdx files depends on how many columns you selected as key columns when you created the profile.
For example, if you choose two columns, such as the CCN and SSN, you get two .rdx files.

Table 25-20 Options for generating remote EMDI indexes

Remote EMDI Indexer with data source file
  Description: Specify the data source file, EMDI profile, and output directory.
  Remarks: Use when you have a cleansed data source file; use for upgrading to DLP 15.5.
  See “Remote indexing examples using data source file (EDM)” on page 592.

Remote EMDI Indexer with SQL Preindexer
  Description: Query the database and pipe the output to the stdin of the Remote EMDI Indexer.
  Remarks: Requires an Oracle database and clean data.
  See “Remote indexing examples using SQL Preindexer (EDM)” on page 593.

Remote EMDI indexing examples using data source file

To use the Remote EMDI Indexer to index a flat data source file that you have generated and
cleansed, specify the local data source file name and path (-data), the local EMDI profile file
name and path (-profile), and the output directory for the generated index files (-result).
The syntax for using the Remote EMDI Indexer to generate an index from a cleansed data
source tabular text file is as follows:

RemoteEMDIIndexer -data=<local data source filename and path>
-profile=<local *.emdi profile file name and path>
-result=<local output directory for *.rdx and *.pdx index files>

For example:

RemoteEMDIIndexer -data=C:\EMDIIndexDirectory\CustomerData.dat
-profile=C:\EMDIIndexDirectory\RemoteEMDIProfile.emdi
-result=C:\EMDIIndexDirectory\

This command generates an EMDI index using the local data source tabular text file CustomerData.dat and the local RemoteEMDIProfile.emdi profile that you generated on the Enforce Server and copied to the remote host. The generated index files are placed in the C:\EMDIIndexDirectory\ directory.
When the generation of the indexes is successful, the utility displays the message "Successfully
created index" as the last line of output.
The remote EMDI indexer creates one .pdx file and one or more .rdx files:

■ ExternalEmdiDataSource.<DataSourceName>.pdx
■ ExternalEmdiDataSource.<DataSourceName>.<EmdiDataSourceID>.rdx
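If you script remote indexing, for example as part of a scheduled job on the remote machine, a small wrapper can build the command line and check for the success message. This Python sketch assumes that RemoteEMDIIndexer is on the PATH (otherwise pass its full path as exe) and that the success message appears as the last line of either standard output or standard error; the function names are hypothetical.

```python
import subprocess


def remote_emdi_index_cmd(data, profile, result, verbose=False, exe="RemoteEMDIIndexer"):
    """Argument list for indexing a cleansed data source file."""
    cmd = [exe, f"-data={data}", f"-profile={profile}", f"-result={result}"]
    if verbose:
        cmd.append("-verbose")
    return cmd


def indexing_succeeded(output: str) -> bool:
    """True if the last non-empty output line reports success."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and "Successfully created index" in lines[-1]


def run_remote_emdi_indexer(data, profile, result, verbose=False, exe="RemoteEMDIIndexer"):
    proc = subprocess.run(remote_emdi_index_cmd(data, profile, result, verbose, exe),
                          capture_output=True, text=True)
    # Assumption: success is reported on stdout or stderr; a nonzero exit is a failure.
    if proc.returncode != 0 or not (indexing_succeeded(proc.stdout)
                                    or indexing_succeeded(proc.stderr)):
        raise RuntimeError(f"indexing failed:\n{proc.stdout}\n{proc.stderr}")
    return proc.stdout


# Usage, mirroring the example above:
# run_remote_emdi_indexer(r"C:\EMDIIndexDirectory\CustomerData.dat",
#                         r"C:\EMDIIndexDirectory\RemoteEMDIProfile.emdi",
#                         "C:\\EMDIIndexDirectory\\")
```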

Remote EMDI Indexer command options

After installation, the Remote EMDI Indexer utility is available at \Program Files\Symantec\DataLossPrevention\Indexer\15.5\Protect\bin (Windows) and /opt/Symantec/DataLossPrevention/Indexer/15.5/Protect/bin (Linux).

If you are on Linux, change users to “SymantecDLP” before running the Remote EMDI Indexer.
The installation program creates the “SymantecDLP” user.
The Remote EMDI Indexer provides a command line interface. The syntax for running the
utility is as follows:

RemoteEMDIIndexer -profile=<file *.emdi> -result=<out_dir> [options]

Note the following about the syntax:

■ The Remote EMDI Indexer requires the -profile and -result arguments.
■ If you use a flat data source file as input, you must specify the file name and local path
using the -data option.
■ The -data option is omitted when you use the SQL Preindexer to pipe the data to the
Remote EMDI Indexer.
See “Remote indexing examples using data source file (EDM)” on page 592.
Table 25-21 describes the command options for the Remote EMDI Indexer.

Table 25-21 Remote EMDI Indexer command options

-data
  Summary: Data source to be indexed (stdin). Required if you use a tabular text file.
  Description: Specifies the data source to be indexed. If this option is not specified, the utility reads data from stdin. Required if you use a data source file rather than the SQL Preindexer.

-encoding
  Summary: Character encoding of the data to be indexed (ISO-8859-1).
  Description: Specifies the character encoding of the data to index. The default is ISO-8859-1. Use UTF-8 or UTF-16 if the data contains non-English characters.

-ignore_date
  Summary: Ignore the expiration date of the EMDI profile.
  Description: Overrides the expiration date of the Exact Data Profile if the profile has expired. By default, an Exact Data Profile expires after 30 days.

-profile
  Summary: File containing the EMDI profile. Required.
  Description: Specifies the Exact Match Data Identifier profile to use. You obtain this profile by clicking the download link on the Exact Match Data Identifier screen in the Enforce Server administration console.

-result
  Summary: Directory to place the resulting indexes. Required.
  Description: Specifies the directory where the index files are generated.

-verbose
  Summary: Display verbose output.
  Description: Displays a statistical summary of the indexing operation when the index is complete.
  See “Troubleshooting preindexing errors for EDM” on page 598.

Remote EMDI indexing examples using the SQL Preindexer

If your data source is an Oracle DB and has clean data, you can index the data source directly
using the SQL Preindexer with the Remote EMDI Indexer.
The syntax is as follows:

SqlPreindexer -alias=<oracle connect string: //host:port/SID>
-username=<DB user> -password=<DB password> -query=<sql to run> |
RemoteEMDIIndexer -profile=<*.emdi profile file name and path>
-result=<output directory for index files>

For example:

SqlPreindexer -alias=@//myhost:1521/orcl -username=scott -password=tiger
-query="SELECT name, salary FROM employee" |
RemoteEMDIIndexer -profile=C:\ExportEMDIProfile.emdi -result=C:\EMDIIndexDirectory\

With this command, the SQL Preindexer utility connects to the Oracle database and runs the SQL query to retrieve name and salary data from the employee table. The SQL query must be in quotes. The SQL Preindexer writes the query result to stdout, and the pipe (|) passes it to the stdin of the Remote EMDI Indexer. The Remote EMDI Indexer reads the query result from stdin and indexes the data using the ExportEMDIProfile.emdi profile, as specified by the profile file name and local file path.

When the generation of the indexes is successful, the utility displays the message "Successfully
created index" as the last line of output.
In addition, the utility places the following generated index files in the EMDIIndexDirectory
-result directory:
■ ExternalEmdiDataSource.<DataSourceName>.pdx
■ ExternalEmdiDataSource.<DataSourceName>.<EmdiDataSourceID>.rdx
The number of .rdx files depends on how many columns you selected as key columns when you created the profile.
For example, if you choose two columns, such as the CCN and SSN, you get two .rdx files.
Here is an example using SQL Preindexer and Remote EMDI Indexer commands:

SqlPreindexer -alias=@//localhost:1521/CUST -username=cust_user -password=cust_pword
-query="SELECT account_id, amount_owed, available_credit FROM customer_account" -verbose |
RemoteEMDIIndexer -profile=C:\EMDIIndexDirectory\CustomerData.emdi
-result=C:\EMDIIndexDirectory\ -verbose

Here the SQL Preindexer command queries the CUST.customer_account table in the database for the account_id, amount_owed, and available_credit records. The result is piped to the Remote EMDI Indexer, which generates the index files based on the CustomerData.emdi profile. The -verbose option is used for troubleshooting.
As an alternative to the -query SQL string, you can use the -query_path option and specify the path and name of a file that contains the SQL query (*.sql). If you do not specify a query or a query path, the entire DB is queried.

SqlPreindexer -alias=@//localhost:1521/cust -username=cust_user -password=cust_pwrd
-query_path=C:\EMDIIndexDirectory\QueryCust.sql -verbose |
RemoteEMDIIndexer -profile=C:\EMDIIndexDirectory\CustomerData.emdi
-result=C:\EMDIIndexDirectory\ -verbose
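The same pipeline can be driven from Python with the subprocess module, which connects the stdout of SqlPreindexer to the stdin of RemoteEMDIIndexer just as the | operator does. This is a sketch under the assumption that both utilities are on the PATH; the helper names are hypothetical. Because each argument is passed as a separate list element, the SQL query does not need shell quoting.

```python
import subprocess


def sql_preindexer_args(alias, username, password, query=None, query_path=None,
                        verbose=False, exe="SqlPreindexer"):
    """Argument list for SqlPreindexer; pass query or query_path, not both."""
    if query is not None and query_path is not None:
        raise ValueError("use either query or query_path, not both")
    args = [exe, f"-alias={alias}", f"-username={username}", f"-password={password}"]
    if query is not None:
        args.append(f"-query={query}")
    elif query_path is not None:
        args.append(f"-query_path={query_path}")
    if verbose:
        args.append("-verbose")
    return args


def run_pipeline(preindexer_args, indexer_args):
    """Equivalent of: SqlPreindexer ... | RemoteEMDIIndexer ...

    Returns (preindexer_returncode, indexer_returncode, indexer_stdout).
    """
    pre = subprocess.Popen(preindexer_args, stdout=subprocess.PIPE)
    idx = subprocess.Popen(indexer_args, stdin=pre.stdout,
                           stdout=subprocess.PIPE, text=True)
    pre.stdout.close()  # so SqlPreindexer sees a broken pipe if the indexer exits early
    out, _ = idx.communicate()
    pre.wait()
    return pre.returncode, idx.returncode, out


# Usage, mirroring the first example above:
# run_pipeline(
#     sql_preindexer_args("@//myhost:1521/orcl", "scott", "tiger",
#                         query="SELECT name, salary FROM employee"),
#     ["RemoteEMDIIndexer", r"-profile=C:\ExportEMDIProfile.emdi",
#      "-result=C:\\EMDIIndexDirectory\\"])
```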

Copying and loading EMDI remote index files to the Enforce Server
The system creates one .pdx file and one or more .rdx files in the -result directory when
you remotely index a data source:
■ ExternalEmdiDataSource.<DataSourceName>.pdx
■ ExternalEmdiDataSource.<DataSourceName>.<EmdiDataSourceID>.rdx

One .rdx file is created for every key column. For example, the .rdx file can be
ExternalEmdiDataSource.MyProfile.3.rdx.

After you create the index files on a remote machine, you must copy the files to the Enforce Server, load them into the previously created remote EMDI profile, and submit the indexing job.
See “Creating an EMDI profile template for remote indexing” on page 508.
To copy and load the files on the Enforce Server
1 Go to the directory where the index files were generated. (This directory is the one specified
in the -result option.)
2 Copy all of the index files with .pdx and .rdx extensions to the index directory on the
Enforce Server. This directory is located at
C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\index
(Windows) or /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/index
(Linux).
3 From the Enforce Server administration console, navigate to the Manage > Data Profiles > Exact Data screen.
This screen lists all the Exact Match Data Identifier profiles in the system.
4 Click the name of the Exact Match Data Identifier profile you used with the Remote EMDI
Indexer.
5 To load the new index files, go to the Data Source section of the Exact Data Profile and
select Load Externally Generated Index.
6 In the Indexing section, select Submit Indexing Job on Save.
As an alternative to indexing immediately on save, you can set up a job on the remote
machine to run the Remote EMDI Indexer on a schedule. The job should also copy the
generated files to the index directory on the Enforce Server. You can then schedule loading
the updated index files on the Enforce Server from the profile by selecting Load Externally
Generated Index and Submit Indexing Job on Schedule and configuring an indexing
schedule.
See “Use scheduled indexing to automate profile updates (EDM)” on page 607.
7 Click Save.
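Step 2 can be automated when the Enforce Server index directory is reachable from the remote machine, for example through a mounted network share (an assumption; this section does not describe how the files are transferred). The following hypothetical Python helper copies only the .pdx and .rdx files and stops if either file type is missing, since a complete remote index has one .pdx file and at least one .rdx file.

```python
import shutil
from pathlib import Path


def copy_index_files(result_dir, index_dir):
    """Copy every *.pdx and *.rdx file from the -result directory to the
    Enforce Server index directory. Returns the copied file names, sorted."""
    src, dst = Path(result_dir), Path(index_dir)
    files = sorted(p for p in src.iterdir()
                   if p.is_file() and p.suffix in (".pdx", ".rdx"))
    for suffix in (".pdx", ".rdx"):
        if not any(p.suffix == suffix for p in files):
            raise FileNotFoundError(f"no {suffix} file in {src}")
    for p in files:
        shutil.copy2(p, dst / p.name)  # copy2 also preserves timestamps
    return [p.name for p in files]
```

After the copy, you still load the files from the profile with Load Externally Generated Index, as described in steps 5 through 7.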

Troubleshooting EMDI preindexing errors

You may encounter errors when you index large amounts of data. Often the set of data contains
a data record that is incomplete, inconsistent, or inaccurate. Data rows that contain more
columns than expected or incorrect column data types often cannot be properly indexed and
are unrecognized.
The SQL Preindexer can be configured to provide a summary of information about the indexing operation when it completes. To do so, specify the -verbose option when you run the SQL Preindexer.
To see the rows of data that the Remote EMDI Indexer did not index, adjust the configuration
in the Indexer.properties file using the following procedure.
To record those data rows that were not indexed
1 Locate the Indexer.properties file at \Program Files\Symantec\DataLossPrevention\Indexer\15.5\Protect\config\Indexer.properties (Windows) or /opt/Symantec/DataLossPrevention/Indexer/15.5/Protect/config/Indexer.properties (Linux).
2 Open the file in a text editor.
3 Locate the create_error_file property and change its value from “false” to “true”.
4 Save and close the Indexer.properties file.
The Remote EMDI Indexer logs errors in a file with the same name as the data file being
indexed and the .err suffix.
The rows of data that are listed in the error file are not encrypted. Safeguard the error file
to minimize any security risk from data exposure.
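Step 3 of the procedure can also be scripted. The following sketch, with a hypothetical function name, switches create_error_file from false to true while preserving the rest of Indexer.properties, including Windows line endings; it assumes the property is written on a single line in the form key = value.

```python
import re

# Matches a "create_error_file = false" line, keeping its spacing and line ending.
_CREATE_ERROR_FILE = re.compile(
    r"^([ \t]*create_error_file[ \t]*[=:][ \t]*)false[ \t]*(\r?)$", re.MULTILINE)


def enable_error_file(properties_text: str) -> str:
    """Return Indexer.properties text with create_error_file switched to true."""
    updated, count = _CREATE_ERROR_FILE.subn(r"\g<1>true\g<2>", properties_text)
    if count == 0:
        raise ValueError("no 'create_error_file = false' line found")
    return updated
```

Read the file in binary or with newline="" if you want to keep its original line endings when you write it back.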
See “About the SQL Preindexer for EDM” on page 586.

Properties file settings for EMDI

The settings for EMDI in Table 25-22 can be configured in the Indexer.properties, ProfileIndexConfiguration.properties, and Protect.properties files. These settings enable EMDI on the DLP Agent and control other EMDI metrics for columns, cells, log files, and profile memory usage.
The Protect.properties and the ProfileIndexConfiguration.properties files are available
on the Enforce Server and the detection server.
The Indexer.properties file is available on the Enforce Server and on any computer where you install the Remote Indexer for EMDI, IDM, or EDM.
After you edit a properties file, restart the corresponding service so that your changes take effect.

Note: The EMDI.MaxEndpointProfileMemoryInMB setting in the Protect.properties file can be adjusted both on the Enforce Server and on the detection server. The Enforce Server administration console uses the setting on the Enforce Server to indicate whether a profile is too large to be sent to the DLP Agent. The setting on the detection server is the actual profile limit. Keep both settings identical on the Enforce Server and on the detection servers to avoid confusion.
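To confirm that EMDI.MaxEndpointProfileMemoryInMB is identical on the Enforce Server and on every detection server, you can compare the Protect.properties files. This Python sketch uses a deliberately minimal key = value parser (it skips # and ! comments but does not handle line continuations or escape sequences), and the server names in the example are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Minimal key = value parser for .properties text."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue  # not a key/value line
        sep = min(positions)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


def mismatched_servers(enforce_text: str, detection_texts: dict,
                       key: str = "EMDI.MaxEndpointProfileMemoryInMB") -> dict:
    """Detection servers whose value for key differs from the Enforce Server's."""
    expected = parse_properties(enforce_text).get(key)
    mismatches = {}
    for server, text in detection_texts.items():
        value = parse_properties(text).get(key)
        if value != expected:
            mismatches[server] = value
    return mismatches


enforce = "EMDI.EnabledOnAgents = false\nEMDI.MaxEndpointProfileMemoryInMB = 100\n"
servers = {"detect01": "EMDI.MaxEndpointProfileMemoryInMB = 100\n",
           "detect02": "EMDI.MaxEndpointProfileMemoryInMB = 200\n"}
print(mismatched_servers(enforce, servers))  # {'detect02': '200'}
```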

Table 25-22 EMDI parameters configurable in properties files

Protect.properties
  Location on the Enforce Server:
    C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Protect.properties (Windows)
    /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config/Protect.properties (Linux)
  Location on the detection server:
    C:\Program Files\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\config\Protect.properties (Windows)
    /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config/Protect.properties (Linux)

  EMDI.EnabledOnAgents
    Default: false
    Description: EMDI is disabled by default on DLP Agents. To enable EMDI on DLP Agents, set this property to true.

  EMDI.MaxEndpointProfileMemoryInMB
    Default: 100
    Description: Maximum endpoint EMDI memory usage per profile, in megabytes. This limit applies to each profile, not to all profiles combined.

Indexer.properties
  Location on the Enforce Server:
    C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Indexer.properties (Windows)
    /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config/Indexer.properties (Linux)

  emdi_indexer_log_max_files
    Default: 100
    Description: The maximum number of log files for the EMDI indexer.

  MaxDuplicateCellsPercentage
    Default: 1
    Description: The maximum percentage (an integer) of duplicate cells in an EMDI index, relative to the number of rows.

  MaxNonMatchingDIPercentage
    Default: 1
    Description: The maximum percentage (an integer) of key column values that do not match a profile data identifier, relative to the number of rows.

ProfileIndexConfiguration.properties
  Location on the Enforce Server:
    C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\ProfileIndexConfiguration.properties (Windows)
    /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config/ProfileIndexConfiguration.properties (Linux)
  Location on the detection server:
    C:\Program Files\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\config\ProfileIndexConfiguration.properties (Windows)
    /opt/Symantec/DataLossPrevention/DetectionServer/15.5/Protect/config/ProfileIndexConfiguration.properties (Linux)

  emdi_matcher_log_max_files
    Default: 100
    Description: The maximum number of log files for the EMDI matcher.

Best practices for using EMDI

Consider the recommendations in this section when you implement EMDI, to ensure that your
EMDI policies are as accurate as possible. Best practices are not intended to provide detailed
troubleshooting guidance. Following these best practices enables you to create a solid
implementation and reduces the need for troubleshooting and support.
Table 25-23 Summary of EMDI Best Practices

■ Never use any personally identifiable information (PII) as an optional column.
See “Never use a personal identifier as an optional column in EMDI” on page 519.
■ Use three or more columns in a match.
See “Use three or more columns in a match for EMDI” on page 519.
■ Don’t use EMDI validators as both optional and required for a given data identifier in a policy.
See “Don’t use EMDI validators as both optional and required for a given data identifier in a policy” on page 519.
■ Use additional validators with EMDI where possible.
See “Use additional validators with EMDI where possible” on page 519.
■ Limit the required number of columns to no more than two or three.
See “Limit the required number of columns to two or three for EMDI” on page 519.
■ When matching with only a single optional column, avoid adding low-variability values as optional columns.
See “When matching with only a single optional column, avoid adding low-variability values as optional columns with EMDI” on page 519.
■ Use full disk encryption on endpoint deployments.
See “Use full disk encryption on EMDI endpoint deployments” on page 519.
■ Eliminate duplicate rows and blank columns before indexing.
See “Cleanse the EMDI data source file of blank columns and duplicate rows” on page 519.
■ To reduce false positives, avoid single characters, quotes, abbreviations, numeric fields with fewer than 5 digits, and dates.
See “Remove ambiguous character types from the EMDI data source file” on page 520.
■ Clean up your data source for multi-token cell matching.
See “Clean up your EMDI data source for multi-token matching” on page 521.
■ Use the pipe (|) character to delimit columns in your data source.
See “Do not use the comma delimiter if the EMDI data source has number fields” on page 521.
■ Ensure that the EMDI data source is clean for indexing.
See “Ensure that the EMDI data source is clean for indexing” on page 522.
■ Include the column headers as the first row of the data source file.
See “Include column headers as the first row of the EMDI data source file” on page 522.
■ Check the system alerts to tune Exact Match Data Identifier profiles.
See “Check the EMDI system alerts to tune profile accuracy” on page 522.
■ Automate profile updates with scheduled indexing.
See “Use scheduled indexing to automate EMDI profile updates” on page 523.
Never use a personal identifier as an optional column in EMDI

Map any personal identifier as a required column. Never use any personal identifier such as
an SSN, Credit Card Number, or Bank Account Number as an optional column.

Use three or more columns in a match for EMDI

Use three or more columns in a match to minimize false positives.

Don’t use EMDI validators as both optional and required for a given
data identifier in a policy
Do not use an EMDI validator in-line in a policy for a data identifier condition when the data
identifier has already been configured to use an EMDI validator.

Use additional validators with EMDI where possible

Use an additional validator, such as a Luhn check for a credit card number. These additional validators are applied before the EMDI lookup, which reduces the number of false positives and improves performance.
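The Luhn check is a standard checksum for payment card numbers. The following Python sketch shows the general algorithm so that you can see why it discards most random digit strings before any EMDI lookup takes place. It illustrates the technique only; it is not the product's internal validator.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits of `number` pass the Luhn checksum.

    Spaces and hyphens are ignored; any other non-digit character fails.
    """
    digits = number.replace(" ", "").replace("-", "")
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost (check) digit and double every second digit.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


candidates = ["4111 1111 1111 1111", "4111 1111 1111 1112"]
# Only candidates that pass the checksum would go on to the more expensive lookup.
print([c for c in candidates if luhn_valid(c)])  # ['4111 1111 1111 1111']
```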

Limit the required number of columns to two or three for EMDI

Try to limit the required number of columns to no more than two or three. The memory used
by a profile grows linearly with the number of required columns.

When matching with only a single optional column, avoid adding low-variability values as optional columns with EMDI

When matching with a single optional column, avoid adding very low-variability values such as states or 5-digit ZIP Codes as optional columns. Low-variability values increase the likelihood of false positives.
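One way to spot low-variability columns before you map them as optional is to compare the number of distinct values in each column with the number of populated cells. The following Python sketch does this for a pipe-delimited data source with a header row. The function names and the 0.1 cutoff are illustrative assumptions, not Symantec-recommended values.

```python
import csv
import io


def column_variability(text: str, delimiter: str = "|") -> dict:
    """Map each column header to (distinct non-empty values / non-empty cells)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    columns = [[] for _ in header]
    for row in reader:
        for index, cell in enumerate(row[: len(header)]):
            cell = cell.strip()
            if cell:
                columns[index].append(cell)
    return {
        name: (len(set(values)) / len(values) if values else 0.0)
        for name, values in zip(header, columns)
    }


def low_variability_columns(text: str, max_ratio: float = 0.1, delimiter: str = "|") -> list:
    """Columns whose distinct-value ratio is at or below `max_ratio` (example cutoff)."""
    ratios = column_variability(text, delimiter)
    return [name for name, ratio in ratios.items() if ratio <= max_ratio]


sample = "SSN|State\n111-11-1111|CA\n222-22-2222|CA\n333-33-3333|NY\n444-44-4444|CA\n"
print(column_variability(sample))  # {'SSN': 1.0, 'State': 0.5}
```

In a real extract with thousands of rows, a State column has at most a few dozen distinct values, so its ratio falls far below that of an SSN column.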

Use full disk encryption on EMDI endpoint deployments

For endpoint deployments, we recommend full disk encryption on the device.

Cleanse the EMDI data source file of blank columns and duplicate
rows
The data source file should be as clean as possible before you create the EMDI index; otherwise, the resulting profile may produce false positives.
When you create the data source file, avoid including empty cells or blank columns. Blank columns or fields count as errors when you generate the EMDI profile. A data source error is either an empty cell or a cell with the wrong type of data (for example, a name in a phone number column). The error threshold is the maximum percentage of rows that contain errors before indexing stops. If the errors exceed the error threshold percentage for the profile (by default, 5%), the system stops indexing and displays an indexing error message.
The best practice is to remove blank columns and empty cells from the data source file, rather than increasing the error threshold. Keep in mind that if you have many empty cells, the system may require a 100% error threshold to create the profile. If you specify 100% as the error threshold, the system indexes the data source without checking for errors.
In addition, do not fill empty cells or blank fields with fake data to meet the error threshold. Adding fake or "null" data to the data source file reduces the accuracy of the EMDI profile and is discouraged. Content you want to monitor should be legitimate and not null.
See “Do not use the comma delimiter if the EMDI data source has number fields” on page 521.
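Before indexing, you can estimate how far a data source is from the error threshold. The following Python sketch is a hypothetical helper that assumes a pipe-delimited file with a header row. It reports blank columns, duplicate rows, and the percentage of rows that contain at least one empty cell. It only approximates the indexer's error count, which also includes cells with the wrong type of data.

```python
import csv
import io


def audit_data_source(text: str, delimiter: str = "|") -> dict:
    """Report blank columns, duplicate rows, and the share of rows with an empty cell."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    width = len(header)
    body = []
    for row in reader:
        cells = [cell.strip() for cell in row][:width]
        cells += [""] * (width - len(cells))  # treat missing trailing cells as empty
        body.append(tuple(cells))
    blank_columns = [name for i, name in enumerate(header) if all(not r[i] for r in body)]
    rows_with_empty = sum(1 for r in body if any(not cell for cell in r))
    return {
        "blank_columns": blank_columns,
        "duplicate_rows": len(body) - len(set(body)),
        "pct_rows_with_empty_cells": 100.0 * rows_with_empty / len(body) if body else 0.0,
    }


sample = "SSN|Name|Fax\n111-11-1111|Ann Lee|\n222-22-2222|Bo Chan|\n222-22-2222|Bo Chan|\n"
print(audit_data_source(sample))
```

In this sample, the empty Fax column alone puts every row in error, which is why removing blank columns is preferable to raising the threshold.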

Remove ambiguous character types from the EMDI data source file
Remove extraneous spaces, punctuation, and inconsistently populated fields from the data source file. You can use tools such as Stream Editor (sed) and AWK to remove these items from your data source file or files before indexing them.
Table 25-24 lists the characters to avoid in the data source file.

Table 25-24 Characters to avoid in the EMDI data source file

Single characters: Single-character fields should be eliminated from the data source file. They are more likely to cause false positives, since a single character appears frequently in normal communications.

Abbreviations: Abbreviated fields should be eliminated from the data source file for the same reason as single characters.

Quotes: Text fields should not be enclosed in quotes.

Small numbers: Indexing numeric fields that contain fewer than 5 digits is not recommended because it likely yields many false positives.

Dates: Date fields are also not recommended. Dates are treated like a string, so if you index a date, such as 12/6/2007, the string has to match exactly. The indexer only matches 12/6/2007, and not any other date formats, such as Dec 6, 2007, 12-6-2007, or 6 Dec 2007. It must be an exact match.
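The checks in Table 25-24 can be scripted in the same spirit as a sed or AWK pass. The following Python sketch flags quoted, single-character, short numeric, and date-like cells, and strips surrounding quotes and extra whitespace. Abbreviations cannot be detected reliably by pattern, so they are not checked, and the date pattern (numeric formats such as 12/6/2007 only) is an assumption.

```python
import re

# Numeric dates only, such as 12/6/2007 or 12-6-2007 (an assumed pattern).
DATE_PATTERN = re.compile(r"^[0-9]{1,2}[/-][0-9]{1,2}[/-][0-9]{2,4}$")
QUOTES = "\"'"


def strip_quotes(value: str) -> str:
    """Remove one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in QUOTES:
        return value[1:-1]
    return value


def cell_issues(cell: str) -> list:
    """List the Table 25-24 problems detected in a single cell."""
    issues = []
    value = cell.strip()
    if strip_quotes(value) != value:
        issues.append("quoted")
        value = strip_quotes(value).strip()
    if len(value) == 1:
        issues.append("single character")
    if value.isascii() and value.isdigit() and len(value) < 5:
        issues.append("short number")
    if DATE_PATTERN.match(value):
        issues.append("date")
    return issues


def normalize_cell(cell: str) -> str:
    """Strip surrounding quotes and collapse runs of whitespace to one space."""
    return " ".join(strip_quotes(cell.strip()).split())


print(cell_issues('"Smith"'), cell_issues("12/6/2007"), normalize_cell('  "Joe   Brown" '))
```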

Clean up your EMDI data source for multi-token matching

An EMDI validator performs a full-text search within 50 tokens of a Data Identifier match, checking each token (except tokens that are excluded because of ignored columns in the data source) for potential matches.
If a cell in the data profile contains multiple words that are separated by spaces or punctuation, or that alternate between Latin and Chinese, Japanese, and Korean (CJK) characters, the cell is a multi-token cell. The sub-token parts of a multi-token cell obey the same rules as single-token cells: they are normalized according to their pattern where normalization can apply.
If a cell contains a multi-token, the multi-token must match exactly. For example, a column field with the value “Joe Brown” is a multi-token cell (assuming that multi-token matching is enabled). At run-time the processor looks for the exact string “Joe Brown”, including the space (multiple spaces are normalized to one). The system does not match on “Joe” and “Brown” if they are detected as separate single tokens.
Finally, do not change the WIP setting from "true" to "false" unless you are sure that is the result you want to achieve. Set WIP = false only when you need to loosen the matching criteria, for example for account numbers whose formatting may change across messages. Test detection results to make sure that you get the matches that you expect.

Note: For the sake of brevity, the Lexer.IncludePunctuationInWords parameter is referred to by the three-letter acronym "WIP."
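The exact-match rule for multi-token cells can be illustrated with a simplified model: split text on whitespace, which also collapses runs of spaces, and require the cell's tokens to appear consecutively and in order. This Python sketch is not the product's lexer. In particular, it does not normalize punctuation, which the product handles according to the WIP setting, so in this model a message containing “Joe Brown,” with a trailing comma would not match.

```python
def contains_multi_token(message: str, cell: str) -> bool:
    """True if the cell's tokens appear consecutively, in order, in the message.

    str.split() with no argument splits on any whitespace run, so
    "Joe    Brown" and "Joe Brown" produce the same tokens.
    """
    message_tokens = message.split()
    cell_tokens = cell.split()
    n = len(cell_tokens)
    if n == 0:
        return False
    return any(
        message_tokens[i:i + n] == cell_tokens
        for i in range(len(message_tokens) - n + 1)
    )


print(contains_multi_token("Contact Joe    Brown today", "Joe Brown"))  # True
print(contains_multi_token("Joe called; Brown replied", "Joe Brown"))  # False
```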

Do not use the comma delimiter if the EMDI data source has number
fields
Of the four types of column delimiters that you can choose from for separating the fields in the data source file (pipe, tab, semicolon, or comma), the pipe, semicolon, and tab (the default) are recommended. The comma delimiter is ambiguous and should not be used, especially if one or more fields in your data source contain numbers. If you use a comma-delimited data source file, make sure there are no commas in the data set other than those used as column delimiters.

Note: The system also treats the pound sign, equals sign, plus sign, and colon characters as separators, but you should not use them because, like the comma, their meaning is ambiguous.
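The following Python sketch shows why the comma is ambiguous: a salary written as 42,500 adds an extra field when the record is joined and split on commas, but not on pipes, tabs, or semicolons. Quoting the field would resolve the ambiguity, but Table 25-24 advises against quotes. The helper safe_delimiter is hypothetical; it picks the first preferred delimiter that does not occur in any field.

```python
def field_counts(record: list) -> dict:
    """Join a record with each delimiter, split it back, and count the fields."""
    return {d: len(d.join(record).split(d)) for d in ("|", "\t", ";", ",")}


def safe_delimiter(rows: list, preference=("|", "\t", ";")):
    """Return the first preferred delimiter that appears in no field, or None."""
    for delimiter in preference:
        if not any(delimiter in cell for row in rows for cell in row):
            return delimiter
    return None


record = ["Bob", "Smith", "42,500"]
print(field_counts(record))  # {'|': 3, '\t': 3, ';': 3, ',': 4}
```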

Ensure that the EMDI data source is clean for indexing

The following list summarizes a cleansed data source that is ready for indexing:
■ It contains at least one Required (key) column and one Optional column.
■ It is not a single-column data source; it has two or more columns.
■ Empty cells and rows and blank columns are removed.
■ Incomplete and duplicate records are removed.
■ The number of faulty cells is below the default error rate (5%) for indexing.
■ Fake data is not used to fill in blank cells or rows.
■ Improper and ambiguous characters are removed.
■ Multi-tokens comply with space and memory requirements.
■ Column fields are validated against the system-defined patterns that are available.
■ Mappings are validated against policy templates where applicable.

Include column headers as the first row of the EMDI data source file
When you extract the source data to the data source file, you should include the column
headers as the first row in the data source file. Including the column headers makes it easier
for you to identify the data you want to use in your policies.
The column names reflect the column mappings that were created when the exact data profile
was added. If there is an unmapped column, it is called Col X, where X is the column number
(starting with 1) in the original data profile.

Check the EMDI system alerts to tune profile accuracy

You should always review the system alerts after creating the Exact Match Data Identifier profile. The system alerts provide very specific information about problems that you encounter when you create the profile. For example, they can reveal a problem such as an SSN in an address field, which affects accuracy.
Use scheduled indexing to automate EMDI profile updates

When you configure an Exact Match Data Identifier Profile, you can set a schedule for indexing the data source file. Index scheduling lets you decide when to index the data source file. For example, instead of indexing the data source at the same time that you define the profile, you can schedule it for a later date. If you need to reindex the data source regularly, you can schedule recurring indexing. Before you set up an index schedule, consider the following:
■ If you update your data sources occasionally (for example, less than once a month),
generally there is no need to create a schedule. Index the data each time you update the
data source.
■ Schedule indexing for times of minimal system use. Indexing affects performance throughout
the Symantec Data Loss Prevention system, and large data sources can take time to index.
■ Index a data source as soon as you add or modify the corresponding exact data profile,
and re-index the data source whenever you update it. For example, consider a scenario
whereby every Wednesday at 2:00 P.M. you generate an updated data source file. In this
case you could schedule indexing every Wednesday at 3:00 P.M. This would give you
enough time to cleanse the data source file and copy it to the Enforce Server.
■ Do not index data sources daily. Daily indexing can degrade performance.
■ Monitor results and modify your indexing schedule accordingly. For example, if performance is good and you want more timely updates, schedule more frequent data updates and indexing.

EMDI Troubleshooting
Scan the following problems and solutions before you call Symantec support. Also, follow
EMDI Best Practices to avoid problems in your EMDI deployment.
See “Best practices for using EMDI” on page 517.

The EMDI index doesn’t get published to the Endpoint Agent

Solution: Verify that the EMDI.EnabledOnAgents parameter is set to true in the Protect.properties file on each endpoint server.

The EMDI index doesn’t get published to the Endpoint Agent and
the EnabledOnAgents setting is true
Solution: Verify that the EMDI.MaxEndpointProfileMemoryInMB parameter in the
Protect.properties file on each endpoint server is set to a value larger than the index size.
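The two publishing checks above can be combined into a quick pre-flight test. This Python sketch is hypothetical. It takes Protect.properties values from an endpoint server (already parsed into a dictionary) and an index size in megabytes that you supply, and reports which condition would keep the index from being published. Comparing the index size directly with the memory limit follows the guidance above and is a simplification.

```python
from typing import Optional


def endpoint_publish_problem(props: dict, index_size_mb: float) -> Optional[str]:
    """Return a description of the first blocking setting, or None if neither applies."""
    if props.get("EMDI.EnabledOnAgents", "false").strip().lower() != "true":
        return "EMDI.EnabledOnAgents is not set to true"
    limit_mb = float(props.get("EMDI.MaxEndpointProfileMemoryInMB", "100"))
    # The limit must be larger than the index size.
    if limit_mb <= index_size_mb:
        return (f"EMDI.MaxEndpointProfileMemoryInMB ({limit_mb:g}) is not larger "
                f"than the index size ({index_size_mb:g} MB)")
    return None


print(endpoint_publish_problem({"EMDI.EnabledOnAgents": "true"}, 150))
```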
A key column that is in an EMDI index doesn’t generate an incident

Solution: If the Data Identifier in the key (required) column is associated with other validators,
make sure that the value passes these validators. Disable the validation against the EMDI
profile to see if an incident is generated against the same file or message.

EMDI generates an unexpectedly high number of false positives

Solution: Increase the minimum number of optional columns required for a match or remove
any optional columns that contain a large number of repeated values (for example, state or
ZIP Code).
Chapter 26
Detecting content using
Exact Data Matching (EDM)
This chapter includes the following topics:

■ Introducing Exact Data Matching (EDM)

■ Configuring Exact Data profiles for EDM

■ Configuring EDM policies

■ Using multi-token matching with EDM

■ Updating EDM indexes to the latest version

■ Memory requirements for EDM

■ Remote EDM indexing

■ Best practices for using EDM

Introducing Exact Data Matching (EDM)

Exact Data Matching (EDM) is designed to protect your most sensitive content. You can use
EDM to detect structured, tabular data, including personally identifiable information (PII). EDM
is designed to find records that are part of an indexed data source in either structured or unstructured targets. Examples of such data include social security numbers, bank account numbers, and credit card numbers. You can also detect confidential customer and employee records,
price list entries, parts from a parts list, and other confidential data stored in a structured data
source, such as a database, directory server, or a structured data file such as CSV or
spreadsheet.
To implement EDM policies, you identify and prepare the data you want to protect. You create an Exact Data Profile and index the structured data source using the Enforce Server administration console, or remotely using the Remote EDM Indexer. During the indexing process, the system indexes the data by accessing and extracting the text-based content, normalizing it, and securing it using a nonreversible hash. You can schedule indexing on a regular basis after you have pulled current data from the data source to ensure that the EDM index reflects the current data.
Once you have profiled the data, you configure the Content Matches Exact Data condition
to match individual pieces of the indexed data. For increased accuracy you can configure the
condition to match combinations of data fields from a particular record. The EDM policy condition
matches on data coming from the same row or record of data. For example, you can configure
the EDM policy condition to look for any three of First Name, Last Name, SSN, Account Number,
or Phone Number occurring together in a message and corresponding to a record from your
customer database.
Once the policy is deployed to one or more detection servers, cloud detection services, or
appliances, the system can detect the data fields (or records) that you have profiled in either
structured or unstructured format. For example, you could deploy the EDM policy to a Network
Discover Server and scan data repositories for confidential data matching data records in the
index. You could also deploy the EDM policy to a Network Prevent for Email Server to detect
records in email communications and attachments, such as Microsoft Word files. If the
attachment is a spreadsheet, such as Microsoft Excel, the EDM policy can detect the presence
of confidential records there as well.
See “About the Exact Data Profile and index” on page 528.

About using EDM to protect content

To understand how EDM works, consider the following example. Your company maintains an
employee database that contains the following column fields:
■ First Name
■ Last Name
■ SSN
■ Date of Hire
■ Salary
In a structured data format such as a database, each row represents one record, with each
record containing values for each column data field. In this example, each row in the database
contains information for one employee, and you can use EDM to protect each record. For
example, one row in the data source file contains the following pipe ("|") delimited record:
First Name | Last Name | SSN | Date of hire | Salary
Bob | Smith | 123-45-6789 | 05/26/99 | $42500
You create an Exact Data Profile and index the data source file. When you configure the profile,
you map the data field columns to system-defined patterns and validate the data. You then
configure the EDM policy condition that references the Exact Data Profile. In this example, the
condition matches if a message contains all five data fields.
The detection server reports a match if it detects the following in any inbound message:
Bob Smith 123-45-6789 05/26/99 $42500
But, a message containing the following does not match because that record is not in the
index:
Betty Smith 000-00-0000 05/26/99 $42500
If you limited the condition to matching only the Last Name, SSN, and Salary column fields,
the following message is a match because it meets the criteria:
Robert, Smith, 123-45-6789, 05/29/99, $42500
Finally, the following message contents do not match because the value for the SSN is not
present in the profile:
Bob, Smith, 415-789-0000, 05/26/99, $42500
See “Configuring Exact Data profiles for EDM” on page 534.
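The matching logic in this example can be modeled in a few lines. This Python sketch works on plain strings for readability. The product compares hashed, normalized values and also applies a proximity range, neither of which is modeled here. Splitting messages on whitespace and commas is an assumption chosen to handle the sample messages, and the function names are hypothetical.

```python
import re

COLUMNS = ["First Name", "Last Name", "SSN", "Date of hire", "Salary"]
INDEXED_ROWS = [["Bob", "Smith", "123-45-6789", "05/26/99", "$42500"]]


def tokenize(text: str) -> set:
    """Split a message on whitespace and commas."""
    return set(re.findall(r"[^\s,]+", text))


def condition_matches(message: str, required: list, rows=INDEXED_ROWS) -> bool:
    """True if one indexed row supplies a value in the message for every required column."""
    found = tokenize(message)
    positions = [COLUMNS.index(name) for name in required]
    return any(all(row[p] in found for p in positions) for row in rows)


print(condition_matches("Bob Smith 123-45-6789 05/26/99 $42500", COLUMNS))  # True
print(condition_matches("Betty Smith 000-00-0000 05/26/99 $42500", COLUMNS))  # False
```

Because all required values must come from the same row, a message that mixes values from different records does not match.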

EDM policy features

EDM policy matching involves searching for indexed content in a given message or file and
generating an incident if a match is found within the defined proximity range. The proximity
range can be changed by editing the EDM.SimpleTextProximityRadius Advanced Server
setting.
Policy matching features of EDM include the following:
■ You can select any number of columns to be matched from a given data source.
■ You can define excluded combinations so that matches against those combinations are
not reported.
■ When the system creates the index, it provides pattern validation for social security numbers,
credit card numbers, U.S. and Canada phone numbers and ZIP codes, email and IP
addresses, numbers, percents, and fields containing other values.
■ There is an editable stopword dictionary that prevents single-token stopwords from matching, so EDM does not treat articles and prepositions as possible field matches. Stopwords are common words, such as articles and prepositions. Stopwords are not indexed.
■ The system provides match highlighting at the incident snapshot screen: tokens from
matching rows are highlighted.
■ You can use a WHERE clause in the EDM rule and matches that do not satisfy the WHERE
clause are ignored. For example, you can use a WHERE clause to only match on records
where the customer's country is the United States.
■ You can use Data Owner Exception to ignore detection based on the sender or recipient's
email address or domain. Data owner exception lets you tag or authorize a specific field
in an Exact Data Profile as the data owner. At run-time if the sender or recipient of the data
is authorized as a data owner, the condition does not trigger a match and the data is sent
or received by the data owner.
■ You can use profiled Directory Group Matching (DGM) to match on senders or recipients
of data based on email address or Windows user name.
■ The proximity matching range is proportional to the number of required matches set in the policy condition.
■ Full support for single- and multi-token cell indexing and matching. A multi-token cell is an indexed cell that contains two or more words. Since a single CJK (Chinese, Japanese, Korean) character is regarded as a token, two or more CJK characters are regarded as a multi-token.
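Two of these features act as filters around the match itself. The Python sketch below models them with hypothetical names: a stopword set that removes tokens before matching (the sample words are illustrative, not the product's dictionary), and a WHERE-style filter that discards matched rows whose column values do not satisfy the clause. Only equality comparisons are modeled.

```python
# Illustrative sample only; the product ships its own editable stopword dictionary.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for"}


def drop_stopwords(tokens: list) -> list:
    """Remove single-token stopwords so they can never count as field matches."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def apply_where(matched_rows: list, columns: list, where: dict) -> list:
    """Keep only matched rows whose values equal every column/value pair in `where`."""
    positions = {columns.index(name): value for name, value in where.items()}
    return [row for row in matched_rows
            if all(row[i] == value for i, value in positions.items())]


columns = ["Name", "Country"]
matched = [["Bob Smith", "United States"], ["Ana Silva", "Brazil"]]
print(apply_where(matched, columns, {"Country": "United States"}))
```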

About the Exact Data Profile and index

The Exact Data Profile is the user-defined configuration that you create to index the data source. The index is a set of secure files that contain hashes of the exact data values from each field in your data source, along with information about those data values. The index does not contain the data values themselves.
The index that is generated consists of 19 binary DataSource.rdx files, each sized to fit into the random access memory (RAM) of the detection servers. By default, Symantec Data
Loss Prevention stores index files in
C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\index
(on Windows) or in
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/Protect/index (on
Linux) on the Enforce Server.
Symantec Data Loss Prevention automatically deploys all EDM indexes (*.rdx files) to the
index directory on all detection servers. When an active policy that references an EDM profile
is deployed to a detection server, the detection server loads the corresponding EDM index
into RAM. If a new detection server is added after an index has been created, the *.rdx files
in the index folder on the Enforce Server are deployed to the index folder on the new detection
server. You cannot manually deploy index files to detection servers.
At run-time during detection, the system converts extracted content into hashed data values
using the same algorithm it employs for indexes. It then compares data values from input
content to those in the appropriate index file(s), identifying matches.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.
See “Memory requirements for EDM” on page 579.
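The following Python sketch illustrates the general idea of a hash-based index. Normalized values are hashed when the index is built, and extracted content is hashed the same way at detection time, so the index never stores the original text. This guide does not describe the product's normalization rules or hash algorithm. The keyed HMAC-SHA-256, the lowercasing, and the key shown are assumptions for illustration only, and a real index also records which row each value belongs to.

```python
import hashlib
import hmac

# Hypothetical key and normalization; the product's real scheme is not documented here.
INDEX_KEY = b"example-index-key"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(value.lower().split())


def hash_value(value: str) -> str:
    """Keyed, one-way hash of a normalized value."""
    return hmac.new(INDEX_KEY, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(rows: list) -> set:
    """Index time: store only hashes, never the cell values themselves."""
    return {hash_value(cell) for row in rows for cell in row}


def in_index(index: set, extracted: str) -> bool:
    """Detection time: hash extracted content the same way and look it up."""
    return hash_value(extracted) in index


index = build_index([["Bob", "Smith", "123-45-6789"]])
print(in_index(index, "SMITH"), in_index(index, "Betty"))  # True False
```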

About the exact data source file

The data source file is a tabular file containing data in a standard delimited format (comma,
semicolon, pipe, or tab) that has been extracted from a database, spreadsheet, or other
structured data source, and cleansed for profiling. You upload the data source file to the Enforce
Server when you are defining the Exact Data Profile. For example, you can convert an Excel
spreadsheet to a comma-separated values (CSV) format and the resulting *.csv file can be
used as the data source for your EDM profile.
See “About cleansing the exact data source file for EDM” on page 530.
See “Creating the exact data source file for EDM” on page 535.
You can use the SQL pre-indexer to index the data source directly. However, this approach
has limitations because in most cases the data must first be cleansed before it is indexed.
See “Remote EDM indexing” on page 585.
The data source file must contain at least one unique column field. A unique column field is a
column that has mostly unique values. It can have duplicate values, but not more than the
number set in term_commonority_threshold. The default value for this setting is 10. Some
examples of unique column fields include social security number, drivers license number, and
credit card number.
See “Best practices for using EDM” on page 601.
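You can check candidate unique columns before you build the profile. This Python sketch assumes that term_commonority_threshold caps how many times any single value may repeat in the column. That reading of the setting is an assumption, so confirm the behavior for your version before relying on it.

```python
from collections import Counter


def is_unique_column(values: list, threshold: int = 10) -> bool:
    """True if no non-empty value repeats more than `threshold` times.

    Assumes term_commonority_threshold (default 10) limits repeats of any one value.
    """
    counts = Counter(v.strip() for v in values if v.strip())
    return bool(counts) and max(counts.values()) <= threshold


print(is_unique_column(["123-45-6789", "987-65-4321"]))  # True
print(is_unique_column(["CA"] * 500))  # False
```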
The maximum number of columns for a single data source file is 32. If the data source file has
more than 32 columns, the Enforce Server administration console produces an error message
at the profile screen, and the data source file is not indexed. The maximum number of rows
is 4,294,967,294 and the total number of cells in a single data source file cannot exceed 6
billion cells. If your data source file is larger than this, split it into multiple files and index each
separately.
Table 26-1 summarizes size limitations for EDM data source files.

Note: The format for the data source file should be a text-based format using commas,
semicolons, pipes, or tabs as delimiters. In general, avoid using a spreadsheet format for the data source file (such as XLS or XLSX), because spreadsheet programs can render long numbers in scientific notation.
Table 26-1 EDM data source file size limitations

Data source file Limit Description

Columns 32 The data source file cannot have more than 32 columns. If it does, the system
does not index it.

Cells 6 billion The data source file cannot have more than 6 billion data cells. If it does, the
system does not index it.

Rows 4,294,967,294 The maximum number of rows supported is 4,294,967,294.
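To check a file against these limits before you upload it, or to decide how many rows to put in each file when you split a large data source, you can apply the figures from Table 26-1 directly. The following Python sketch uses only those documented limits; the function names are hypothetical.

```python
# Limits from Table 26-1.
MAX_COLUMNS = 32
MAX_ROWS = 4_294_967_294
MAX_CELLS = 6_000_000_000


def limit_violations(rows: int, columns: int) -> list:
    """Return the names of the Table 26-1 limits that a data source file exceeds."""
    problems = []
    if columns > MAX_COLUMNS:
        problems.append("columns")
    if rows > MAX_ROWS:
        problems.append("rows")
    if rows * columns > MAX_CELLS:
        problems.append("cells")
    return problems


def max_rows_per_file(columns: int) -> int:
    """Largest row count per file that stays within both the row and cell limits."""
    if not 1 <= columns <= MAX_COLUMNS:
        raise ValueError("column count must be between 1 and 32")
    return min(MAX_ROWS, MAX_CELLS // columns)


print(limit_violations(200_000_000, 31))  # ['cells']
print(max_rows_per_file(31))  # 193548387
```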

About cleansing the exact data source file for EDM

Once you have created the data source file, you must prepare the data for indexing by cleansing
it. It is critical that you cleanse the data source file to ensure that your EDM policies are as
accurate as possible. You can use tools such as Stream Editor (sed) and awk to cleanse the
data source file. Melissa Data provides good tools for normalizing data in the data source,
such as addresses.
Table 26-2 provides the workflow for cleansing the data source file for indexing.

Table 26-2 Workflow for cleansing the data source file for EDM

Step 1: Prepare the data source file for indexing.
See “Preparing the exact data source file for indexing for EDM” on page 537.

Step 2: Ensure that the data source has at least one column of unique data.
See “Ensure data source has at least one column of unique data (EDM)” on page 602.

Step 3: Remove incomplete and duplicate records. Do not fill empty cells with bogus data.
See “Cleanse the data source file of blank columns and duplicate rows (EDM)” on page 603.

Step 4: Remove improper characters.
See “Remove ambiguous character types from the data source file (EDM)” on page 604.

Step 5: Verify that the data source file is below the error threshold. The error threshold is the maximum percentage of rows that contain errors before indexing stops.
See “Preparing the exact data source file for indexing for EDM” on page 537.

About using System Fields for data source validation with EDM
Column headings in your data source are useful for visual reference. However, they do not tell Symantec Data Loss Prevention what kind of data the columns contain. To do this, you use the Field Mappings section of the Exact Data Profile to specify mappings between the fields in your data source and system fields. You can also use field mappings to specify fields that the system recognizes in the system-provided policy templates. The Field Mappings section also gives you advanced options for specifying custom fields and validating the data in those fields.
See “Mapping Exact Data Profile fields for EDM” on page 545.
Consider the following example use of field mappings. Your company wants to protect employee
data, including employee social security numbers. You create a Data Loss Prevention policy
based on the Employee Data Protection template. The policy requires an exact data index
with fields for social security numbers and other employee data. You prepare your data source
and then create the Exact Data Profile. To validate the data in the social security number
field, you map this column field in your index to the "Social Security Number" system field
pattern. The system then validates all data in that field using the Social Security Number
validator to ensure that each data item is a social security number.
Using the system-defined field patterns to validate your data is critical to the accuracy of your
EDM policies. If there is no system-defined field pattern that corresponds to one or more data
fields in your index, you can define custom fields and choose the appropriate validator to
validate the data.
See “Map data source column to system fields to leverage validation (EDM)” on page 605.
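To see what field validation involves, the following Python sketch implements a simplified social security number check. It accepts the ddd-dd-dddd format (hyphens optional, an assumption) and rejects ranges that are never issued: area 000, 666, or 900 to 999; group 00; serial 0000. It is not the product's Social Security Number validator, whose exact rules are not documented here.

```python
import re

# Simplified format: ddd-dd-dddd, hyphens optional (an assumption).
SSN_PATTERN = re.compile(r"^([0-9]{3})-?([0-9]{2})-?([0-9]{4})$")


def looks_like_ssn(value: str) -> bool:
    """Format check plus never-issued ranges; not the product's validator."""
    match = SSN_PATTERN.match(value.strip())
    if not match:
        return False
    area, group, serial = match.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


column = ["123-45-6789", "000-00-0000", "Bob Smith"]
print([v for v in column if not looks_like_ssn(v)])  # ['000-00-0000', 'Bob Smith']
```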

About index scheduling for EDM

After you have indexed an exact data source extract, its schema cannot be changed because
the *.rdx index file is binary. If the data source changes, or the number of columns or data
mapping of the exact data source file changes, you must create a new EDM index and update
the policies that reference the changed data. In this case you can schedule the indexing to
keep the index in sync with the data source.
The typical use case is as follows. You extract data from a database to a file and cleanse it to
create your data source file. Using the Enforce Server administration console you define an
Exact Data Profile and index the data source file. The system generates the *.rdx index files
and deploys them to one or more detection servers. However, if you know that the data changes
frequently, you need to generate a new data source file weekly or monthly to keep up with the
changes to the database. In this case, you can use index scheduling to automate the indexing
of the data source file so you do not have to return to the Enforce Server administration console
and reindex the updated data source. Your only task is to drop an updated and cleansed data
source file to the Enforce Server for scheduled indexing.
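If you script the hand-off of the updated file, the file should appear in one step, so that a scheduled indexing job never reads a half-copied file. The Python sketch below assumes that you pass in the datafiles directory and the file name that the Exact Data Profile expects; both are your values, not fixed names. It copies to a hidden temporary file in the same directory and then renames it with `os.replace`. The rename is atomic when both paths are on the same file system. `mkstemp` creates an owner-only file, so you may still need to adjust ownership or permissions for the Enforce user.

```python
import os
import shutil
import tempfile
from pathlib import Path

def publish_data_source(cleansed_file: str, datafiles_dir: str, target_name: str) -> Path:
    """Copy a cleansed data source into datafiles_dir as target_name, atomically.

    The bytes are written to a hidden temporary file in the same directory and
    then renamed over the target, so a reader sees either the previous file or
    the complete new one. Note: mkstemp creates an owner-only (0600) file.
    """
    target_dir = Path(datafiles_dir)
    fd, tmp_path = tempfile.mkstemp(dir=str(target_dir), prefix=".", suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as dst, open(cleansed_file, "rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # data is on disk before the rename
        final_path = target_dir / target_name
        os.replace(tmp_path, final_path)  # atomic when both paths share a file system
        return final_path
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave a partial file behind
        raise
```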

Note: You must reindex after upgrading to the latest version of Symantec Data Loss Prevention.

See “Configuring Exact Data profiles for EDM” on page 534.

See “Scheduling Exact Data Profile indexing for EDM” on page 548.
Detecting content using Exact Data Matching (EDM) 532
Introducing Exact Data Matching (EDM)

See “Use scheduled indexing to automate profile updates (EDM)” on page 607.

About the Content Matches Exact Data From condition for EDM
The Content Matches Exact Data From an Exact Data Profile condition is the detection
component you use to implement EDM policy conditions. When you define this condition, you
select the EDM profile on which the condition is based. You also select the columns you want
to use in your condition, as well as any WHERE clause limitations.

Note: You cannot use the Content Matches Exact Data From an Exact Data Profile condition
as a policy exception. Symantec Data Loss Prevention does not support the use of the EDM
condition as a policy exception.

See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.

About Data Owner Exception for EDM

Although EDM does not support the explicit use of match exceptions in policies, EDM does
support criteria-based matching exceptions. This feature of EDM is known as Data Owner
Exception. Data owner exception lets you tag or authorize a specific field in an Exact Data
Profile as the data owner. At run time, if the sender or recipient of the data is an authorized
data owner, the condition does not trigger a match, and the data owner can send or receive
the data.
You implement data owner exception by including either the email address field or domain
address field in your Exact Data Profile. In the EDM policy condition, you specify the field as
either the sender or recipient data owner. An authorized data owner, identified by email address
or a domain address, who is a sender can send confidential information without triggering an
EDM match or incident. This means that the sender can send any information that is contained
in the row where the sender's email address or domain is specified. Authorized data owner
recipients can be specified individually or all recipients in the list can be allowed to receive the
data without triggering a match.
As a policy author, data owner exception gives you the flexibility to allow data owners to use
their own data legitimately. For example, if data owner exception is enabled, an employee can
send an email containing their confidential information (such as an account number) without
triggering a match or an incident. Similarly, if data owner exception is configured for a recipient,
the system does not trigger an EDM match or incident if the data owner receives their own
information, such as when someone outside the company sends an email to the data owner
containing the data owner's account number.
See “About upgrading EDM deployments” on page 534.
See “Creating the exact data source file for Data Owner Exception for EDM” on page 536.

See “Configuring Data Owner Exception for EDM policy conditions” on page 554.
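To make the row-level rule concrete, here is a simplified Python model of a sender-side data owner exception. It is not how Symantec Data Loss Prevention implements the feature. The sketch assumes that detection has already found which profile rows the message content matches. A matched row is then dropped when the sender's full address equals the row's email field, or when the sender's domain exactly equals the row's domain field. `ProfileRow`, `sender_owns_row`, and `rows_that_still_match` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProfileRow:
    """One row of a hypothetical Exact Data Profile with data owner fields."""
    account_number: str  # the protected value
    owner_email: str     # column mapped as the data owner email address ("" if unused)
    owner_domain: str    # column mapped as the data owner domain ("" if unused)

def sender_owns_row(row: ProfileRow, sender: str) -> bool:
    """True if the sender is this row's data owner, by email or exact domain.

    The domain comparison is exact: a sender at a subdomain of owner_domain
    does not count, because subdomains must be listed explicitly.
    """
    sender = sender.strip().lower()
    _, _, sender_domain = sender.rpartition("@")
    email_match = bool(row.owner_email) and sender == row.owner_email.lower()
    domain_match = bool(row.owner_domain) and sender_domain == row.owner_domain.lower()
    return email_match or domain_match

def rows_that_still_match(matched_rows: list[ProfileRow], sender: str) -> list[ProfileRow]:
    """Drop matched rows owned by the sender; the rest would still produce a match."""
    return [row for row in matched_rows if not sender_owns_row(row, sender)]
```

For example, if a message matches rows for two different owners, only the row that the sender owns is excused. The other row still triggers the condition.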

About profiled Directory Group Matching (DGM) for EDM

Profiled Directory Group Matching (DGM) is a specialized implementation of EDM that is used
to detect the exact identity of a message user, sender, or recipient that has been profiled from
a directory server or database.
Profiled DGM leverages EDM technology to detect identities that you have indexed from your
database or directory server using an Exact Data Profile. For example, you can use profiled
DGM to identify network user activity or to analyze content associated with particular users,
senders, or recipients. Or, you can exclude certain email addresses from analysis. Or, you
might want to prevent certain people from sending confidential information by email.
To implement profiled DGM, your exact data source file must contain one or more of the
following fields:
■ Email address
■ IP address
■ Windows user name
■ IM name
If you include the email address field in the DGM profile, the field appears in the Directory
EDM drop-down list at the incident snapshot screen in the Enforce Server administration
console, which facilitates remediation.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
See “Include an email address field in the Exact Data Profile for profiled DGM (EDM)”
on page 610.
See “Use profiled DGM for Network Prevent for Web identity detection (EDM)” on page 611.

About two-tier detection for EDM on the endpoint

The EDM index is server-based. If you deploy a policy containing an EDM condition to the
DLP Agent on the endpoint, the system uses two-tier detection to evaluate data for matching.
The EDM detection condition is not evaluated locally by the DLP Agent. Instead, the DLP
Agent sends the data to the Endpoint Server for evaluation against the index. If the endpoint
is offline, the message cannot be sent until the server is available, which can affect endpoint
performance. In addition, two-tier detection has no ability to block, encrypt, or notify. Symantec
does not recommend two-tier detection.
See “Two-tier detection for DLP Agents” on page 395.
To check whether you are using two-tier detection, read the
C:\ProgramData\Symantec\DataLossPrevention\DetectionServer\15.5\logs\debug\FileReader.log
file on the Endpoint Server to see if any EDM indexes are loaded. Look for the line "loaded database
profile."
See “Troubleshooting policies” on page 445.
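The log check above can be scripted. This Python sketch prints nothing by itself. It returns the lines of FileReader.log that contain the phrase "loaded database profile", ignoring case. The default path is the one given in this guide. Two things are assumptions: that the log is a text file readable as UTF-8 (undecodable bytes are replaced), and that the phrase appears literally. `loaded_profile_lines` is a hypothetical helper name.

```python
from pathlib import Path

# Path as given in this guide; adjust the version folder to match your installation.
FILE_READER_LOG = Path(
    r"C:\ProgramData\Symantec\DataLossPrevention\DetectionServer\15.5\logs\debug\FileReader.log"
)

def loaded_profile_lines(log_path: Path = FILE_READER_LOG) -> list[str]:
    """Return the log lines that mention 'loaded database profile' (case-insensitive)."""
    needle = "loaded database profile"
    with log_path.open(encoding="utf-8", errors="replace") as log:
        return [line.rstrip("\r\n") for line in log if needle in line.lower()]

if __name__ == "__main__":
    for line in loaded_profile_lines():
        print(line)
```

A non-empty result means that EDM indexes are loaded on that Endpoint Server, which is the indicator this section describes.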

About upgrading EDM deployments

To take advantage of the latest EDM enhancements, you must upgrade your servers to the
latest version of Symantec Data Loss Prevention, and you must reindex your EDM
data sources using the latest version of the EDM Indexer. Reindex after you upgrade all
of your servers. That way, detection servers that have not yet been upgraded can continue
to work with the old indexes during the upgrade.
See “About Data Owner Exception for EDM” on page 532.
See “Updating EDM indexes to the latest version” on page 574.
See “Memory requirements for EDM” on page 579.
See “EDM index out-of-date error codes” on page 578.

Configuring Exact Data profiles for EDM

To implement EDM, you create the Exact Data Profile, index the data source, and define one
or more Content Matches Exact Data conditions to match profiled data exactly.
See “About the Exact Data Profile and index” on page 528.

Table 26-3 Implementing Exact Data Matching with EDM

Step 1: Create the data source file.
Export the source data from the database (or other data repository) to a tabular text file with
delimited fields.
If you want to exempt data owners from matching, you need to include specific data items in
the data source file.
See “About the exact data source file” on page 529.
If you want to match identities for profiled Directory Group Matching (DGM), you need to include
specific data items in the data source file.
See “Creating the exact data source file for EDM” on page 535.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.

Step 2: Prepare the data source file for indexing.
Cleanse the data source file.
See “Preparing the exact data source file for indexing for EDM” on page 537.

Step 3: Upload the data source file to the Enforce Server.
You can copy or upload the data source file to the Enforce Server, or access it remotely.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.

Step 4: Create an Exact Data Profile.
An Exact Data Profile is required to implement Exact Data Matching (EDM) policies. The Exact
Data Profile specifies the data source, data field types, and the indexing schedule.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Step 5: Map and validate the data fields.
You map the source data fields to system or custom data types that the system validates. For
example, a social security number data field needs to be nine digits.
See “About using System Fields for data source validation with EDM” on page 530.
See “Mapping Exact Data Profile fields for EDM” on page 545.

Step 6: Index the data source, or schedule indexing.
Schedule the indexing to keep the index in sync with the data source.
See “About index scheduling for EDM” on page 531.
See “Scheduling Exact Data Profile indexing for EDM” on page 548.

Step 7: Configure and tune one or more Content Matches Exact Data policy conditions.
See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.

Creating the exact data source file for EDM

The first step in the EDM indexing process is to create the data source. A data source is a
tabular file containing data in a standard delimited format, where data is delimited by commas,
semicolons, pipes, or tabs.
If you plan to use a policy template, review it before creating the data source file to see which
data fields the policy uses. For relatively small data sources, include as many suggested fields
in your data source as possible. However, note that the more fields you include, the more
memory the resulting index requires. This consideration is important if you have a large data
source. When you create the data profile, you can confirm how well the fields in your data
source match against the suggested fields for the template.

See Table 26-4 on page 536.

Table 26-4 Create the exact data source file

Step Description

1 Export the data you want to protect from a database or other tabular data format, such as an Excel
spreadsheet, to a tabular text file. The data source file you create must be a tabular text file that contains
rows of data from the original source. Each row from the original source is included as a row in the data
source file. Delimit columns using a tab, a comma, or a pipe. Pipe is preferred. Comma should not be
used if your data source fields contain numbers.

See “About the exact data source file” on page 529.

You must maintain all the structured data that you exported from the source database table or table-like
format in one data source file. You cannot split the data source across multiple files.

The data source file cannot exceed 32 columns, 4,294,967,294 rows, or 6 billion cells. If you plan to
upload the data source file to the Enforce Server, browser capacity limits the data source size to 2 GB.
For file sizes larger than this size you can copy the file to the Enforce Server using FTP/S, SCP, SFTP,
CIFS, or NFS.

2 Include required data fields for specific EDM implementations:

■ Unique data
For all EDM implementations, make sure that the data source contains at least one column of unique
data.
See “Ensure data source has at least one column of unique data (EDM)” on page 602.
■ Data Owner Exception
Make sure that the data source contains the email address field or domain field, if you plan to use
data owner exceptions.
See “Creating the exact data source file for Data Owner Exception for EDM” on page 536.
■ Directory Group Matching
Make sure that the data source includes one or more sender/recipient identifying fields.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.

3 Prepare the data source file for indexing.

See “Preparing the exact data source file for indexing for EDM” on page 537.
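For step 1, a minimal Python sketch of exporting a table to a pipe-delimited text file follows. It makes several assumptions. The source is any DB-API connection; `sqlite3` is used here only because it ships with Python. Output is UTF-8, so select UTF-8 as the profile encoding. The first row holds column names, for use with Read first row as column names. The sketch enforces the 32-column limit from this table. It refuses values that contain a pipe, a double quote, or a line break, because the data source must be one unquoted row per line. `export_pipe_delimited` is a hypothetical name.

```python
import sqlite3

MAX_COLUMNS = 32  # column limit stated in Table 26-4
UNSAFE_CHARS = ("|", '"', "\r", "\n")

def export_pipe_delimited(conn: sqlite3.Connection, query: str, out_path: str,
                          header: bool = True) -> int:
    """Write query results to a pipe-delimited UTF-8 file; return data rows written."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    if len(columns) > MAX_COLUMNS:
        raise ValueError(f"{len(columns)} columns exceeds the {MAX_COLUMNS}-column limit")
    written = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        if header:
            out.write("|".join(columns) + "\n")
        for row in cur:
            values = ["" if v is None else str(v) for v in row]
            bad = next((v for v in values if any(ch in v for ch in UNSAFE_CHARS)), None)
            if bad is not None:
                raise ValueError(f"data row {written + 1} has an unsafe value: {bad!r}")
            out.write("|".join(values) + "\n")
            written += 1
    return written
```

Failing on an unsafe value, rather than quoting or escaping it silently, keeps the file in the plain unquoted form that the preparation checklist asks for.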

Creating the exact data source file for Data Owner Exception for
EDM
To implement Data Owner Exception and exclude data owners from detection, you must explicitly
include each user's email address or domain address in the Exact Data Profile. Each expected
domain (for example, symantec.com) must be explicitly added to the Exact Data Profile. The
system does not automatically match on subdomains (for example, support.symantec.com), so
each subdomain must also be explicitly added to the Exact Data Profile.

To implement the data owner exception feature, you must include either or both of the following
fields in your data source file:
■ Email address, such as [email protected]
■ Domain address, such as symantec.com
See “About Data Owner Exception for EDM” on page 532.
See “Configuring Data Owner Exception for EDM policy conditions” on page 554.
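Subdomains are not matched automatically. One way to fill a domain field from existing owner email addresses is to take the exact domain part of each address, without rolling it up to a parent domain. An address at a subdomain then yields its own subdomain entry. This Python sketch assumes well-formed addresses and lowercases the result. `owner_domain` and `distinct_domains` are hypothetical helpers.

```python
def owner_domain(email: str) -> str:
    """Return the exact domain part of an email address, lowercased.

    No parent-domain roll-up: 'support.example.com' stays as-is, matching the
    rule that each subdomain must be listed explicitly.
    """
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain.lower()

def distinct_domains(emails: list[str]) -> list[str]:
    """Return the sorted set of exact domains found in an email column."""
    return sorted({owner_domain(e) for e in emails})

if __name__ == "__main__":
    print(distinct_domains(["a@example.com", "b@Support.Example.com", "c@example.com"]))
    # ['example.com', 'support.example.com']
```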

Creating the exact data source file for profiled DGM for EDM
Profiled DGM leverages Exact Data Matching (EDM) technology to precisely detect identities.
Identity-related attributes may include an IP address, email address, user name, business
unit, department, manager, title, or employment status. Other attributes may be whether that
employee has provided consent to be monitored, or whether the employee has access to
sensitive information. To implement profiled DGM, you must include at least one required data
field in your data source.
See “About the Exact Data Profile and index” on page 528.
Table 26-5 lists the required fields for profiled DGM. The data source file must contain at least
one of these fields.

Table 26-5 Profiled DGM data source fields for EDM

■ Email address: If you use an email address column field in the data source file, the email
address appears in the Directory EDM drop-down list at the incident snapshot screen.
■ IP address: For example, 172.24.56.33.
■ Windows user name: If you use a Windows user name field in your data source, the data must
be in the following format: domain\user; for example, ACME\john_smith.
■ AOL IM name: IM screen name.
■ Skype name: For example, myscreenname123.
■ Microsoft Office Communicator name
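Before you index a DGM data source, you can spot-check the identity columns against the formats in Table 26-5. This Python sketch checks two of them. The Windows user name must be domain\user, with exactly one backslash and non-empty parts. The IP address is checked with the standard `ipaddress` module, which also accepts IPv6; this table does not say whether IPv6 is supported. `is_windows_user_name` and `is_ip_address` are hypothetical names.

```python
import ipaddress
import re

# domain\user with exactly one backslash and non-empty parts, e.g. ACME\john_smith
WINDOWS_USER = re.compile(r"[^\\]+\\[^\\]+")

def is_windows_user_name(value: str) -> bool:
    """True if the value has the domain\\user shape required for DGM."""
    return WINDOWS_USER.fullmatch(value.strip()) is not None

def is_ip_address(value: str) -> bool:
    """True if the value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    print(is_windows_user_name(r"ACME\john_smith"), is_windows_user_name("john_smith"))
    print(is_ip_address("172.24.56.33"), is_ip_address("172.24.56.333"))
```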

Preparing the exact data source file for indexing for EDM
Once you create the exact data source file, you must prepare it so that you can efficiently index
the data you want to protect.

When you index an exact data profile, the Enforce Server keeps track of empty cells and any
misplaced data, which count as errors. For example, an error may be a name that appears in
a column for phone numbers. Errors are tolerated up to a configurable percentage of the data
in the profile (five percent, by default). If errors reach this threshold, Symantec Data Loss
Prevention stops indexing and displays an error to warn you that your data may be
unorganized or corrupt.
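The threshold is a simple percentage of rows. This Python sketch estimates it before you index. It counts a row as an error when the row has the wrong number of cells or any empty cell. It does not detect the wrong type of data in a cell, which the real indexer also counts. It follows step 9 of the profile procedure in treating the threshold as a maximum that must not be exceeded. For example, 12,000 error rows out of 200,000 is 6 percent, above the 5 percent default. All function names are hypothetical.

```python
def count_error_rows(lines: list[str], expected_columns: int, delimiter: str = "|") -> int:
    """Count rows with a wrong cell count or at least one empty cell (header excluded)."""
    errors = 0
    for line in lines:
        cells = line.rstrip("\r\n").split(delimiter)
        if len(cells) != expected_columns or any(not c.strip() for c in cells):
            errors += 1
    return errors

def error_rate_percent(error_rows: int, total_rows: int) -> float:
    """Percentage of data source rows that contain at least one error."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return 100.0 * error_rows / total_rows

def indexing_would_stop(error_rows: int, total_rows: int, threshold_percent: float = 5.0) -> bool:
    """Model of the rule: indexing stops when the error rate exceeds the threshold."""
    return error_rate_percent(error_rows, total_rows) > threshold_percent

if __name__ == "__main__":
    print(error_rate_percent(12_000, 200_000))   # 6.0
    print(indexing_would_stop(12_000, 200_000))  # True
```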
To prepare the exact data source for EDM indexing
1 Make sure that the data source file is formatted as follows:
■ If the data source has more than 200,000 rows, verify that it has at least two columns
of data. One of the columns should contain unique values, such as credit card
numbers, driver’s license numbers, or account numbers (as opposed to first and last
names, which are generic).
See “Ensure data source has at least one column of unique data (EDM)” on page 602.
■ Verify that you have delimited the data source using pipes ( | ) or tabs. If the data
source file uses commas as delimiters, remove any commas that do not serve as
delimiters.
See “Do not use the comma delimiter if the data source has number fields (EDM)”
on page 605.
■ Verify that data values are not enclosed in quotes.
■ Remove single-character and abbreviated data values from the data source. For
example, remove the column name and all values for a column in which the possible
values are Y and N.
■ Optionally, remove any columns that contain numeric values with fewer than five digits,
as these can cause false positives in production.
See “Remove ambiguous character types from the data source file (EDM)” on page 604.
■ Verify that numbers, such as credit card or social security numbers, are delimited internally by
dashes, by spaces, or not at all. Make sure that you do not use a data-field delimiter
such as a comma as an internal delimiter in any such numbers. For example:
123-45-6789, or 123 45 6789, or 123456789 are valid, but not 123,45,6789.
See “Do not use the comma delimiter if the data source has number fields (EDM)”
on page 605.
■ Eliminate duplicate records, which can cause duplicate incidents in production.
See “Cleanse the data source file of blank columns and duplicate rows (EDM)”
on page 603.
■ Do not index common values. EDM works best with values that are unique. Think
about the data you want to index (and thus protect). Is this data truly valuable? If the
value is something common, it is not useful as an EDM value. For example, suppose
that you want to look for "US states." Since there are only 50 states, if your exact data
profile has 300,000 rows, the result is a lot of duplicates of common values. Symantec
Data Loss Prevention indexes all values in the exact data profile, regardless of whether
the data is used in a policy. It is good practice to use values that are less common,
and preferably unique, to get the best results with EDM.
See “Ensure data source has at least one column of unique data (EDM)” on page 602.

2 Once you have prepared the exact data source file, proceed with the next step in the EDM
process: upload the exact data source file to the Enforce Server for profiling the data you
want to protect.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
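Several items in the checklist above can be checked mechanically before upload. This Python sketch covers only some of them and does not replace the checklist. It assumes a pipe-delimited file with the header row already removed. It reports duplicate rows, quoted values, single-character values, numeric values with fewer than five digits, comma-grouped numbers, blank columns, and the more-than-200,000-rows-with-one-column case. `preflight` is a hypothetical name.

```python
import re
from collections import Counter

COMMA_GROUPED_NUMBER = re.compile(r"\d+(?:,\d+)+")  # e.g. 123,45,6789

def preflight(lines: list[str], delimiter: str = "|") -> list[str]:
    """Return warnings for a delimited EDM data source (header row excluded)."""
    rows = [line.rstrip("\r\n").split(delimiter) for line in lines if line.strip()]
    if not rows:
        return ["data source is empty"]
    warnings: list[str] = []

    width = max(len(r) for r in rows)
    if len(rows) > 200_000 and width < 2:
        warnings.append("over 200,000 rows but fewer than two columns")

    duplicates = sum(n - 1 for n in Counter(map(tuple, rows)).values() if n > 1)
    if duplicates:
        warnings.append(f"{duplicates} duplicate row(s)")

    issues: Counter = Counter()
    for row in rows:
        for value in row:
            v = value.strip()
            if len(v) >= 2 and v[0] == v[-1] == '"':
                issues["value enclosed in quotes"] += 1
            if len(v) == 1:
                issues["single-character value"] += 1
            if v.isdigit() and len(v) < 5:
                issues["numeric value with fewer than five digits"] += 1
            if COMMA_GROUPED_NUMBER.fullmatch(v):
                issues["comma used inside a number"] += 1
    warnings.extend(f"{count} x {label}" for label, count in issues.items())

    for col in range(width):
        if all(col >= len(r) or not r[col].strip() for r in rows):
            warnings.append(f"column {col + 1} is blank")
    return warnings
```

A sample run on three rows with a duplicate, a Y/N column, a quoted value, a comma-grouped number, and an empty trailing column reports each of those problems.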

Uploading exact data source files for EDM to the Enforce Server
After you have prepared the data source file for indexing, load it to the Enforce Server so the
data source can be indexed.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.
Listed here are the options you have for making the data source file available to the Enforce
Server. Consult with your database administrator to determine the best method for your needs.

Table 26-6 Uploading the data source file for EDM to the Enforce Server for indexing

Upload option: Upload Data Source to Server Now
Use case: Data source file is less than 50 MB.
Description: If you have a smaller data source file (less than 50 MB), upload the data source
file to the Enforce Server using the Enforce Server administration console (web interface).
When creating the Exact Data Profile, you can specify the file path or browse to the directory
and upload the data source file.
Note: Due to browser capacity limits, the maximum file size that you can upload is 2 GB.
However, uploading any file over 50 MB is not recommended, since files over this size can take
a long time to upload. If your data source file is over 50 MB, consider copying the data source
file to the datafiles directory using the next option.

Upload option: Reference Data Source on Manager Host
Use case: Data source file is over 50 MB.
Description: If you have a large data source file (over 50 MB), copy it to the datafiles
directory on the host where Enforce is installed.
■ On Windows this directory is located at
C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\datafiles.
■ On Linux this directory is located at
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/datafiles.
This option is convenient because it makes the data file available through a drop-down list
during configuration of the Exact Data Profile. If it is a large file, use a third-party solution
(such as Secure FTP) to transfer the data source file to the Enforce Server.
Note: Ensure that the Enforce user (usually called "protect") has modify permissions (on
Windows) or rw permissions (on Linux) for all files in the datafiles directory.

Upload option: Use This File Name
Use case: Data source file is not yet created.
Description: You may want to create an EDM profile before you have created the data source
file. In this case you can create a profile template and specify the name of the data source
file you plan to create. This option lets you define EDM policies using the EDM profile template
before you index the data source. The policies do not operate until the data source is indexed.
When you have created the data source file, place it in the
\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\datafiles
directory (Windows) or the
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/Protect/datafiles directory
(Linux), and then either index the data source immediately on save or schedule indexing.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Upload option: Use This File Name and Load Externally Generated Index
Use case: Data source is to be indexed remotely and copied to the Enforce Server.
Description: In some environments it may not be secure or feasible to copy or upload the data
source file to the Enforce Server. In this situation you can index the data source remotely
using the Remote EDM Indexer.
See “Remote EDM indexing” on page 585.
This utility lets you index an exact data source on a computer other than the Enforce Server
host. This feature is useful when you do not want to copy the data source file to the same
computer as the Enforce Server. As an example, consider a situation where the originating
department wants to avoid the security risk of copying the data to an extra-departmental host.
In this case you can use the Remote EDM Indexer.
First you create an EDM profile template where you choose the Use this File Name and the
Number of Columns options. You must specify the name of the data source file and the number
of columns it contains.
See “Creating an EDM profile template for remote indexing” on page 589.
You then use the Remote EDM Indexer to index the data source remotely, copy the index files to
the Enforce Server host, and load the externally generated index. The Load Externally
Generated Index option is only available after you have defined and saved the profile. Remote
indexes are loaded from the
\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index directory
on the Enforce Server host.
See “Copying and loading remote EDM index files to the Enforce Server”
on page 594.
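The choice in Table 26-6 depends mainly on file size and on whether the file may be copied to the Enforce Server at all. This Python sketch encodes that rule of thumb. One assumption: "50 MB" is read as 50 × 1024 × 1024 bytes, because the table does not define the unit. The returned strings are the option names from the table. `suggested_upload_option` is a hypothetical helper.

```python
MB = 1024 * 1024  # assumption: the table's "MB" read as a binary megabyte

def suggested_upload_option(size_bytes: int, can_copy_to_enforce: bool = True) -> str:
    """Suggest a Table 26-6 upload option from file size and copy policy."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if not can_copy_to_enforce:
        # The data may not leave its host: index remotely, then load the index.
        return "Use This File Name and Load Externally Generated Index"
    if size_bytes < 50 * MB:
        return "Upload Data Source to Server Now"
    # Browser upload tops out at 2 GB and is slow above 50 MB, so copy the file instead.
    return "Reference Data Source on Manager Host"
```

For example, `suggested_upload_option(os.path.getsize(path))` gives a starting point for a local file.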

Creating and modifying Exact Data Profiles for EDM

The Manage > Data Profiles > Exact Data > Add Exact Data Profile screen is the home
page for managing and adding Exact Data Profiles. An Exact Data Profile is required to
implement an instance of the Content Matches Exact Data conditions. An Exact Data Profile
specifies the data source, the indexing parameters, and the indexing schedule. Once you have
created the EDM profile, you index the data source and configure one or more Content Matches
Exact Data conditions that can be added to rules to use the profile and detect exact content
matches.
See “Configuring Exact Data profiles for EDM” on page 534.

Note: If you are using the Remote EDM Indexer to generate the Exact Data Profile, refer to
the following topic.

To create or modify an Exact Data Profile

1 Make sure that you have created the data source file.
See “Creating the exact data source file for EDM” on page 535.
2 Make sure that you have prepared the data source file for indexing.
See “Preparing the exact data source file for indexing for EDM” on page 537.
3 Make sure that the data source contains the email address field or domain field, if you
plan to use data owner exceptions.
See “About Data Owner Exception for EDM” on page 532.
4 In the Enforce Server administration console, navigate to Manage > Data Profiles >
Exact Data.
5 Click Add Exact Data Profile.
6 Enter a unique, descriptive Name for the profile (limited to 256 characters).
For easy reference, choose a name that describes the data content and the index type
(for example, Employee Data EDM).
If you modify an existing Exact Data Profile you can change the profile name.
7 Select one of the following Data Source options to make the data source file available to
the Enforce Server:
■ Upload Data Source to Server Now
If you are creating a new profile, click Browse and select the data source file, or enter
the full path to the data source file.
If you are modifying an existing profile, select Upload Now.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
■ Reference Data Source on Manager Host
If you copied the data source file to the datafiles directory on the Enforce Server, it
appears in the drop-down list for selection.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
■ Use This File Name
Select this option if you have not yet created the data source file but want to configure
EDM policies using a placeholder EDM profile. Enter the file name of the data source
you plan to create and the Number of Columns it will have. When you
create the data source, you must copy it to the datafiles directory.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
■ Load Externally Generated Index
Select this option if you have created an index on a remote computer using the Remote
EDM Indexer. This option is only available after you have defined and saved the profile.
Profiles are loaded from the
\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index
directory (Windows) or the
/var/Symantec/DataLossPrevention/EnforceServer/15.5/index directory (Linux)
on the Enforce Server host.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.

8 If the first row of your data source contains Column Names, select Read first row as
column names.
9 Specify the Error Threshold, which is the maximum percentage of rows that contain
errors before indexing stops.
A data source error is either an empty cell, a cell with the wrong type of data, or extra
cells in the data source. For example, a name in a column for phone numbers is an error.
If errors exceed a certain percentage of the overall data source (by default, 5%), the
system quits indexing and displays an indexing error message. The index is not created
if the data source has more invalid records than the error threshold value allows. Although
you can change the threshold value, more than a small percentage of errors in the data
source can indicate that the data source is corrupt, is in an incorrect format, or cannot be
read. If you have a significant percentage of errors (10% or more), stop indexing and
cleanse the data source.
See “Preparing the exact data source file for indexing for EDM” on page 537.
10 Select the Column Separator Char (delimiter) that you have used to separate the values
in the data source file. The delimiters you can use are tabs, commas, or pipes.
11 Select one of the following encoding values for the content to analyze, which must match
the encoding of your data source:
■ ISO-8859-1 (Latin-1) (default value)
Standard 8-bit encoding for Western European languages using the Latin alphabet.
■ UTF-8
Use this encoding for all languages that use the Unicode 4.0 standard (all single- and
double-byte characters), including East Asian languages.
■ UTF-16
Use this encoding for all languages that use the Unicode 4.0 standard (all single- and
double-byte characters), including East Asian languages.

Note: Make sure that you select the correct encoding. The system does not prevent you
from creating an EDM profile using the wrong encoding. The system only reports an error
at run-time when the EDM policy attempts to match inbound data. To make sure that you
select the correct encoding, after you click Next, verify that the column names appear
correctly. If the column names do not look correct, you chose the wrong encoding.

12 Click Next to go to the second Add Exact Data Profile screen.

13 The Field Mappings section displays the columns in the data source and the field to
which each column is mapped in the Exact Data Profile. Field mappings in existing Exact
Data Profiles are fixed and, therefore, are not editable.
See “About using System Fields for data source validation with EDM” on page 530.
See “Mapping Exact Data Profile fields for EDM” on page 545.
Confirm that the column names in your data source are accurately represented in the
Data Source Field column. If you selected the Column Names option, the Data Source
Field column lists the names in the first row of your data source. If you did not select the
Column Names option, the column lists Col 1, Col 2, and so on.
14 In the System Field column, select a field from the drop-down list for each data source
field. This step is required if you use a policy template, or if you want to check for errors
in the data source.
For example, for a data source field that is called SOCIAL_SECURITY_NUMBER, select
Social Security Number from the corresponding drop-down list. The values in the System
Field drop-down lists include all suggested fields for all policy templates.
15 Optionally, specify and name any custom fields (that is, the fields that are not pre-populated
in the System Field drop-down lists). To do so, perform these steps in the following order:
■ Click Advanced View to the right of the Field Mappings heading. This screen displays
two additional columns (Custom Name and Type).
■ To add a custom system field name, go to the appropriate System Field drop-down
list. Select Custom, and type the name in the corresponding Custom Name text field.
■ To specify a pattern type (for purposes of error checking), go to the appropriate Type
drop-down list and select the pattern you want. To see descriptions of all available pattern
types, click Description at the top of the column.

16 Check your field mappings against the suggested fields for the policy template you plan
to use. To do so, go to the Check Mappings Against drop-down list, select a template,
and click Check now on the right.
The system displays a list of all template fields that you have not mapped. You can go
back and map these fields now. Alternatively, you may want to expand your data source
to include as many expected fields as possible, and then re-create the exact data profile.
Symantec recommends that you include as many expected data fields as possible.
17 In the Indexing section of the screen, select one of the following options:
■ Submit Indexing Job on Save
Select this option to begin indexing the data source when you save the exact data
profile.
■ Submit Indexing Job on Schedule
Select this option to index the data source according to a specific schedule. Make a
selection from the Schedule drop-down list and specify days, dates, and times as
required.
See “About index scheduling for EDM” on page 531.
See “Scheduling Exact Data Profile indexing for EDM” on page 548.

18 Click Finish.
After Symantec Data Loss Prevention finishes indexing, it deletes the original data source
from the Enforce Server. After you index a data source, you cannot change its schema.
If you change column mappings for a data source after you index it, you must create a
new exact data profile.
After indexing completes, you can create Content Matches Exact Data conditions that
reference the Exact Data Profile you created, and add them to policy rules.
See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.

Mapping Exact Data Profile fields for EDM

After you have added and configured the data source file and settings, the Manage > Data
Profiles > Exact Data > Add Exact Data Profile screen lets you map the fields from the data
source file to the Exact Data Profile you configure.
To enable error checking on a field in a data source or to use the index with a policy template
that uses a system field, you must map the field in the data source to the system field. The
Field Mappings section lets you map the columns in the original data source to system fields
in the Exact Data Profile.

Table 26-7 Field mapping options

Field Description

Data Source Field If you selected the Column Names option at the Add Exact Data Profile screen, this column
lists the values that are found in the first row from the data source. If you did not select this
option, this column lists the columns by generic names (such as Col 1, Col 2, and so on).
Note: If you implement a data owner exception, you must map the email address field, the
domain field, or both.

See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.

System Field Select the system field for each column.

A system field value (except None Selected) cannot be mapped to more than one column.

Some system fields have system patterns associated with them (such as social security
number) and some do not (such as last name).

See “Using system-provided pattern validators for EDM profiles” on page 547.

Check mappings against policy template Select a policy template from the drop-down list to compare the field mappings against,
and then click Check now.
All policy templates that implement EDM appear in the drop-down menu, including any you
have imported.

See “Choosing an Exact Data Profile” on page 409.

If you plan to use more than one policy template, select one and check it, and then select
another and check it, and so on.

If there are any fields in the policy template for which no data exists in the data source, a
message appears listing the missing fields. You can save the profile anyway or use a different
Exact Data Profile.

Advanced View If you want to customize the schema for the exact data profile, click Advanced View to display
the advanced field mapping options.

Table 26-8 lists and describes the additional columns you can specify in the Advanced View
screen.

Indexing Select one of the indexing options.

See “Scheduling Exact Data Profile indexing for EDM” on page 548.

Finish Click Finish when you are done configuring the Exact Data Profile.

From the Advanced View, you map the system and data source fields to system patterns.
System patterns describe the expected structure of the data in each field of the Exact Data Profile,
which enables efficient error checking and gives the indexer hints about the data.

Table 26-8 Advanced View options for EDM

Field Description

Custom Name If you select Custom Name for a System Field, enter a unique name for it and then select a
value for Type. The name is limited to 60 characters.

Type If you select a value other than Custom for a System Field, some data types automatically
select a value for Type. For example, if you select Birth Date for the System Field, Date is
automatically selected as the Type. You can accept it or change it.

Some data types do not automatically select a value for Type. For example, if you select
Account Number for the System Field, the Type remains unselected. You can specify the
data type of your particular account numbers.

See “Using system-provided pattern validators for EDM profiles” on page 547.

Description Click the link (description) beside the Type column header to display a pop-up window
containing the available system data types.

See “Using system-provided pattern validators for EDM profiles” on page 547.

Simple View Click Simple View to return to the Simple View (with the Custom Name and Type columns
hidden).

See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Using system-provided pattern validators for EDM profiles

Table 26-9 lists and describes the system-provided data validators for EDM profiles.

Table 26-9 System-provided data validators for EDM profiles

Type Description

Credit Card Number The Credit Card pattern is built around knowledge of various international credit cards,
their registered prefixes, and the number of digits in their account numbers. The following credit card
types are validated: MasterCard, Visa, American Express, Diners Club, Discover,
Enroute, and JCB.

Optional spaces within credit card numbers are recognized, but only in generally accepted
locations (for example, after every 4th digit for MasterCard and Visa). The permitted locations
of spaces differ by card type. Credit card numbers are also validated with a checksum
algorithm: if a number looks like a credit card number (that is, it has the correct number of
digits and a correct prefix) but fails the checksum, it is treated as an ordinary number, not
as a credit card number.

Email Email is a sequence of characters that looks like the following: string@string.tld, where
string may contain letters, digits, underscore, dash, and dot, and 'tld' is one of the approved
DNS top-level generic domains, or any two letters (for country domains).


IP Address IP Address is a collection of 4 sequences of between 1 and 3 digits, separated by dots.

Number Number is either a floating-point number or an integer, either by itself or enclosed in round brackets (parentheses).

Percent Percent is a number immediately followed by the percent sign ("%"). No space is allowed
between a number and a percent sign.

Phone Only US and Canadian telephone numbers are recognized. The phone number must not start
with the digit 1, except for numbers that include a country code.
A phone number can be in one of the following formats:

■ 7 digits (no spaces or dashes)

■ Same as above, preceded by 3 digits, or by 3 digits in round brackets, followed by spaces
or dashes
■ 3 digits, followed by optional spaces or dashes, followed by 4 digits
■ Same as above, preceded by the number 1, followed by spaces or dashes

Any of these formats can optionally be followed by an extension number, preceded by spaces or
dashes. The extension number is 2 to 5 digits preceded by any of the following (case-insensitive):
'x', 'ex', 'ext', 'exten', 'extens', or 'extensions', optionally followed by a dot and spaces.
Note: The system does not recognize the pattern XXX-XXX-XXXX as a valid phone number
format because this format is frequently used in other forms of identification. If your data source
contains a column of phone numbers in that format, select None Selected to avoid confusion
between phone numbers and other data.

Postal Code Only US ZIP codes and Canadian Postal Codes are recognized. The US ZIP code is a sequence
of 5 digits, optionally followed by a dash and another 4 digits. The Canadian Postal
Code is a sequence like K2B 8C8, that is, "letter-digit-letter-space-digit-letter-digit", where
the space(s) in the middle are optional.

Social Security Number Only US Social Security Numbers are recognized. A Social Security Number is 3
digits, optionally followed by spaces or dashes, followed by 2 digits, optionally followed by
spaces or dashes, followed by 4 digits.
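The checks in Table 26-9 can be approximated in a few lines of Python, which may help when pre-screening a data source column before you assign a Type. This is a minimal sketch, not the product's validator: it assumes the credit card checksum is the standard Luhn (mod 10) algorithm, accepts only 16-digit numbers with optional spaces after each group of 4 digits (the MasterCard and Visa layout), and skips prefix checks. The function names and the regular expressions for the other types are hypothetical approximations of the shapes described above.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn (mod 10) checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card(text: str) -> bool:
    # Hypothetical simplification: 16 digits, optionally spaced after every 4th digit.
    # Real card types differ in length, prefix, and permitted space positions.
    if re.fullmatch(r"\d{4}(?: ?\d{4}){3}", text) is None:
        return False
    return luhn_ok(text.replace(" ", ""))


# Hypothetical approximations of the other documented shapes.
PATTERNS = {
    "ip_address": re.compile(r"\d{1,3}(?:\.\d{1,3}){3}"),  # 4 groups of 1-3 digits
    "percent": re.compile(r"\d+(?:\.\d+)?%"),              # no space before "%"
    "us_zip": re.compile(r"\d{5}(?:-\d{4})?"),             # 12345 or 12345-6789
    "ca_postal": re.compile(r"[A-Z]\d[A-Z] *\d[A-Z]\d"),   # K2B 8C8 or K2B8C8
    "us_ssn": re.compile(r"\d{3}[ -]*\d{2}[ -]*\d{4}"),    # optional spaces/dashes
}


def matches(kind: str, value: str) -> bool:
    """Return True if the whole value has the shape registered under `kind`."""
    return PATTERNS[kind].fullmatch(value) is not None
```

For example, `looks_like_card("4111 1111 1111 1111")` passes because the digits satisfy the Luhn checksum, while changing the last digit to 2 makes it fail, which mirrors the rule that such a value is treated as an ordinary number.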

Scheduling Exact Data Profile indexing for EDM

When you configure an Exact Data Profile, you can set a schedule for indexing the data source
(Submit Indexing Job on Schedule).
See “About index scheduling for EDM” on page 531.
Before you set up a schedule, consider the following recommendations:
■ If you update your data sources occasionally (for example, less than once a month), there
is no need to create a schedule. Index the data each time you update the data source.

■ Schedule indexing for times of minimal system use. Indexing affects performance throughout
the Symantec Data Loss Prevention system, and large data sources can take time to index.
■ Index a data source as soon as you add or modify the corresponding exact data profile,
and re-index the data source whenever you update it. For example, suppose that you update
the data source every Wednesday at 2:00 A.M. In this case, schedule indexing for every
Wednesday at 3:00 A.M. Do not index data sources daily, as this can degrade performance.
■ If you need to update indexes frequently (for example, daily), Symantec recommends that
you use the Remote EDM Indexer.
■ Monitor results and modify your indexing schedule accordingly. For example, if performance
is good and you want more timely updates, schedule more frequent data updates and
indexing.
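The Wednesday recommendation above (update at 2:00 A.M., index at 3:00 A.M.) can be made concrete with a small helper that computes the next weekly indexing slot. This is a hypothetical planning sketch: schedules are set in the Indexing section of the Enforce Server console, and `next_weekly_run` is not part of Symantec Data Loss Prevention.

```python
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday() numbers Monday as 0


def next_weekly_run(now: datetime, weekday: int, hour: int) -> datetime:
    """Hypothetical helper: next `weekday` at `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested day of the week (0-6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate
```

For example, called on Monday, 19 August 2019 at noon, `next_weekly_run(now, WEDNESDAY, 3)` returns Wednesday, 21 August 2019 at 03:00, one hour after the assumed 02:00 data source update.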
The Indexing section lets you index the Exact Data Profile as soon as you save it
(recommended) or on a regular schedule as follows:

Table 26-10 Scheduling indexing for Exact Data Profiles for EDM

Parameter Description

Submit Indexing Job on Save Select this option to index the Exact Data Profile when you click Save.

Index Once On – Enter the date to index the Exact Data Profile in the format MM/DD/YY. You can also click the
date widget and select a date.

At – Select the hour to start indexing.

Index Daily At – Select the hour to start indexing.

Until – Select this check box to specify a date in the format MM/DD/YY when the indexing should
stop. You can also click the date widget and select a date.

Index Weekly Day of the week – Select the day(s) to index the Exact Data Profile.

At – Select the hour to start indexing.

Until – Select this check box to specify a date in the format MM/DD/YY when the indexing should
stop. You can also click the date widget and select a date.

Index Monthly Day – Enter the number of the day of each month you want the indexing to occur. The number
must be 1 through 28.

At – Select the hour to start indexing.

Until – Select this check box to specify a date in the format MM/DD/YY when the indexing should
stop. You can also click the date widget and select a date.
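The Index Monthly and Until fields can be modeled the same way. The helpers below are hypothetical, not part of Symantec Data Loss Prevention: they enforce the 1 through 28 range for Day and parse the MM/DD/YY format used by Until. A plausible reason for the 28-day limit, which this guide does not state, is that every month has at least 28 days, so the chosen day always exists; the sketch relies on that when it rolls over to the next month.

```python
from datetime import datetime


def next_monthly_run(now: datetime, day: int, hour: int) -> datetime:
    """Hypothetical helper: next run on `day` of the month at `hour`:00 after `now`."""
    if not 1 <= day <= 28:
        raise ValueError("Day must be 1 through 28")
    # Days 1-28 exist in every month, so these replace() calls cannot fail.
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        if now.month == 12:
            candidate = candidate.replace(year=now.year + 1, month=1)
        else:
            candidate = candidate.replace(month=now.month + 1)
    return candidate


def parse_until(text: str) -> datetime:
    """Parse an Until date in MM/DD/YY format, for example "12/31/19"."""
    return datetime.strptime(text, "%m/%d/%y")
```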

See “Mapping Exact Data Profile fields for EDM” on page 545.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Managing and adding Exact Data Profiles for EDM

You manage and create Exact Data Profiles for EDM at the Manage > Data Profiles > Exact
Data screen. Once a profile has been created, the Exact Data screen lists all Exact Data
Profiles configured in the system.
See “About the Exact Data Profile and index” on page 528.

Table 26-11 Exact Data screen actions for EDM

Action Description

Add EDM profile Click Add Exact Data Profile to define a new Exact Data Profile.

See “Configuring Exact Data profiles for EDM” on page 534.

Edit EDM profile To modify an existing Exact Data Profile, click the name of the profile, or click the pencil icon
at the far right of the profile row.

See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Remove EDM profile Click the red X icon at the far right of the profile row to delete the Exact Data Profile from the
system. A dialog box confirms the deletion.
Note: You cannot edit or remove a profile if another user is currently modifying that profile, or if a
policy exists that depends on that profile.

Download EDM profile Click the download profile link to download and save the Exact Data Profile.

This is useful for archiving and sharing profiles across environments. The file is in the binary
*.edm format.

Refresh EDM profile status Click the refresh arrow icon at the upper right of the Exact Data screen to fetch the latest status
of the indexing process.

If indexing is in progress, the system displays the message "Indexing is starting."
The system does not automatically refresh the screen when the indexing process completes.

Table 26-12 Exact Data screen details for EDM

Column Description

Exact Data Profile The name of the exact data profile.

Last Active Version The version of the exact data profile and the name of the detection server that runs the profile.


Status The current status of the exact data profile, which can be any of the following:
■ Next scheduled indexing (if it is not currently indexing)
■ Sending an index to a detection server
■ Indexing
■ Deploying to servers

In addition, the current status of the indexing process for each detection server, which can be
any of the following:

■ Completed, including a completion date

■ Pending index completion (waiting for the Enforce Server to finish indexing the exact data
source file)
■ Replicating indexing
■ Creating index (internally)
■ Building caches

Error messages The Exact Data screen displays any error messages in red.

For example, if the Exact Data Profile is corrupt or does not exist, the system displays an error
message.

Configuring EDM policies

This section describes how to configure EDM policy conditions.
See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.
See “Configuring Data Owner Exception for EDM policy conditions” on page 554.
See “Configuring the Sender/User based on a Profiled Directory policy condition for EDM”
on page 554.
See “Configuring the Recipient based on a Profiled Directory policy condition for EDM”
on page 555.
See “Configuring Advanced Settings for EDM policies” on page 557.

Configuring the Content Matches Exact Data policy condition for EDM

Once you have defined the Exact Data Profile and indexed the data source, you configure one
or more Content Matches Exact Data conditions in policy rules.
See “About the Content Matches Exact Data From condition for EDM” on page 532.

Table 26-13 Configure the Content Matches Exact Data policy condition for EDM

Steps Action Description

1 Configure an EDM policy detection rule. Create a new EDM detection rule in a policy, or modify an existing EDM rule.
See “Configuring policies” on page 413.

See “Configuring policy rules” on page 417.

Match Data Rows when All of these match

2 Select the fields to match. The first thing you do when configuring the EDM condition is select each data
field that you want the condition to match. You can select or deselect all fields
at once. The system displays all the fields or columns that were included in the
index. You do not have to select all the fields, but you should select at least 2 or
3, one of which must be unique, such as social security number, credit card
number, and so forth.

See “Best practices for using EDM” on page 601.

3 Choose the number of selected fields to match. Choose the number of the selected fields to match from the drop-down menu.
This number represents how many of the selected fields must be present in a message
to trigger a match, so it cannot exceed the number of data fields you check. For example,
if you choose 2 of the selected fields from the menu, you must check at least two fields,
and at least two of the checked fields must be present in a message for detection.

See “Ensure data source has at least one column of unique data (EDM)”
on page 602.

4 Select the WHERE clause to enter specific field values to match (optional). The WHERE clause option matches on the specified field value. You specify a
WHERE clause value by selecting an exact data field from the menu and by
entering a value for that field in the adjacent text box. If you enter more than one
value, separate the values with commas.

See “Use a WHERE clause to detect records that meet specific criteria (EDM)”
on page 609.

For example, consider an Exact Data Profile for "Employees" with a "State" field
containing state abbreviations. In this example, to implement the WHERE clause,
you select (check) WHERE, choose "State" from the drop-down list, and enter
CA,NV in the text box. This WHERE clause then limits the detection server to
matching messages that contain either CA or NV as the value for the State field.
Note: You cannot specify a field for WHERE that is the same as one of the
selected matched fields.

Ignore Data Rows when Any of these match

5 Ignore data owners (optional). Selecting this option implements Data Owner Exception.
See “Configuring Data Owner Exception for EDM policy conditions” on page 554.


6 Exclude data field combinations (optional). You can use excluded data field combinations to specify combinations of data
values that are exempted from detection. If the data appears in exempted pairs
or groups, it does not cause a match. Excluded combinations are only available
when matching 2 or 3 fields. To enable this option, you must select 2 or 3 fields
to match from the _ of the selected fields menu at the top of the condition
configuration.

See “Leverage exception tuples to avoid false positives (EDM)” on page 609.

To implement excluded combinations, select an option from each Field N column
that appears. Then click the right-arrow icon to add the field combination to the
Excluded Combinations list. To remove a field from the list, select it and click
the left-arrow icon.
Note: Hold down the Ctrl key to select more than one field in the right-most
column.

Additional match condition parameters

7 Select an incident minimum. Enter or modify the minimum number of matches required for the condition to
report an incident.

For example, consider a scenario where you specify 1 of the selected fields for
a social security number field and an incident minimum of 5. In this situation, the
engine must detect at least five matching social security numbers in a single
message to trigger an incident.
See “Match count variant examples (EDM)” on page 570.

8 Select components to match on. Select one or more message components to match on:

■ Envelope – The header of the message.
■ Subject – (Not available for EDM.)
■ Body – The content of the message.
■ Attachments – The content of any files attached to or transported by the
message.

See “Selecting components to match on” on page 423.

9 Select one or more conditions to also match. Select this option to create a compound condition. All conditions must match for
the rule to trigger an incident.

You can Add any available condition from the list.

See “Configuring compound match conditions” on page 429.

10 Test and troubleshoot the policy. See “Test and tune policies to improve match accuracy” on page 453.

See “Troubleshooting policies” on page 445.
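Steps 2 through 7 combine several filters. The following sketch models them over a toy index to show how they interact: a row matches when at least `min_fields` of the selected fields appear in the message, WHERE values restrict which rows count, excluded combinations exempt specific value pairs, and the incident minimum sets how many matches trigger an incident. Everything here is hypothetical (the `EdmCondition` class, the field names, and the assumption that message tokens are already normalized into a set). It ignores proximity, multi-token cells, and the match count variants, and it is not how the detection server is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EdmCondition:
    """Hypothetical model of a Content Matches Exact Data condition (not product code)."""
    selected: list[str]       # fields checked in the condition
    min_fields: int           # "N of the selected fields"
    where: dict[str, set[str]] = field(default_factory=dict)      # WHERE field -> allowed values
    excluded: list[dict[str, str]] = field(default_factory=list)  # exempt value combinations
    incident_minimum: int = 1

    def __post_init__(self) -> None:
        if self.min_fields > len(self.selected):
            raise ValueError("cannot require more fields than are selected")
        if set(self.where) & set(self.selected):
            raise ValueError("a WHERE field cannot also be a matched field")

    def row_matches(self, row: dict[str, str], tokens: set[str]) -> bool:
        # WHERE: the row's value for each WHERE field must be one of the listed values.
        if any(row.get(f) not in allowed for f, allowed in self.where.items()):
            return False
        # Which selected fields of this row appear in the message?
        hits = {f: row[f] for f in self.selected if row.get(f) in tokens}
        if len(hits) < self.min_fields:
            return False
        # Exclusions: ignore the row if its matched values form an exempt combination.
        return not any(
            all(hits.get(f) == v for f, v in combo.items()) for combo in self.excluded
        )

    def triggers(self, rows: list[dict[str, str]], tokens: set[str]) -> bool:
        # Incident minimum: this sketch simply counts matching rows.
        return sum(self.row_matches(r, tokens) for r in rows) >= self.incident_minimum
```

With a WHERE of `{"state": {"CA", "NV"}}`, for example, only rows whose state is CA or NV can contribute matches, as in the Employees example in step 4.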

Configuring Data Owner Exception for EDM policy conditions

To exempt data owners from detection, you must include either an email address field or a
domain field (for example, symantec.com) in your Exact Data Profile. Once Data Owner
Exception (DOE) is enabled, if the sender or recipient of confidential information is the data
owner (by email address or domain), the detection server allows the data to be sent or received
without generating an incident.
To configure DOE for an EDM policy condition
1 When you are configuring the Content Matches Exact Data condition, select the Ignore
data owners option.
2 Select one of the following options:
■ Sender matches — Select this option to EXCLUDE the data sender from detection.
■ Any or All Recipient matches — Select one of these options to EXCLUDE any or
all data recipient(s) from detection.

Note: When you configure DOE for the EDM condition, you cannot select a value for Ignore
Sender/Recipient that is the same as one of the matched fields.

See “About Data Owner Exception for EDM” on page 532.
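The owner test behind DOE can be sketched as follows. This is an illustration under stated assumptions, not the detection server's logic: the mapped columns are assumed to be named `email` and `domain`, comparisons are assumed to be case-insensitive, and "Any Recipient" and "All Recipients" are read as "at least one recipient is the owner" and "every recipient is the owner". The function names are hypothetical.

```python
from __future__ import annotations


def domain_of(address: str) -> str:
    """Return the lower-cased part of an email address after the last "@"."""
    return address.rsplit("@", 1)[-1].lower()


def is_data_owner(address: str, row: dict[str, str]) -> bool:
    # Hypothetical field names "email" and "domain" stand in for the mapped columns.
    addr = address.lower()
    email = row.get("email", "").lower()
    domain = row.get("domain", "").lower()
    return (email != "" and addr == email) or (domain != "" and domain_of(addr) == domain)


def ignore_row(row: dict[str, str], sender: str | None = None,
               recipients: list[str] | tuple[str, ...] = (), mode: str = "sender") -> bool:
    """Return True if Data Owner Exception suppresses this matched row."""
    if mode == "sender":
        return sender is not None and is_data_owner(sender, row)
    if mode == "any_recipient":
        return any(is_data_owner(r, row) for r in recipients)
    if mode == "all_recipients":
        return len(recipients) > 0 and all(is_data_owner(r, row) for r in recipients)
    raise ValueError(f"unknown mode: {mode}")
```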

Configuring the Sender/User based on a Profiled Directory policy condition for EDM

The Sender/User based on a Directory from detection rule lets you create detection rules
based on sender identity or (for endpoint incidents) user identity. This condition requires an
Exact Data Profile.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
After you select the Exact Data Profile, when you configure the rule, the directory you selected
and the sender identifier(s) appear at the top of the page.
Table 26-14 describes the parameters for configuring the Sender/User based on a Directory
from an EDM Profile condition.

Table 26-14 Configuring the Sender/User based on a Directory from an EDM Profile condition

Parameter Description

Where Select this option to have the system match on the specified field values. Specify the values by
selecting a field from the drop-down list and typing the values for that field in the adjacent text box.
If you enter more than one value, separate the values with commas.

For example, for an Employees directory group profile that includes a Department field, you would
select Where, select Department from the drop-down list, and enter Marketing,Sales in the text
box. If the condition is implemented as a rule, in this example a match occurs only if the sender or
user works in Marketing or Sales (as long as the other input content meets all other detection criteria).
If the condition is implemented as an exception, in this example the system excludes messages
from a sender or user who works in Marketing or Sales from matching.

Is Any Of Enter or modify the information you want to match. For example, if you want to match any sender
in the Sales department, select Department from the drop-down list, and then enter Sales in this
field (assuming that your data includes a Department column). Use a comma-separated list if you
want to specify more than one value.
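Both Where and Is Any Of take a comma-separated list of values. A minimal sketch of that matching logic, assuming a hypothetical in-memory directory keyed by lower-case email address and assuming that spaces around commas are ignored (the Recipient example in Table 26-15 writes "Marketing, Sales" with a space). `parse_values` and `sender_matches` are illustrative names, not product APIs.

```python
from __future__ import annotations


def parse_values(text: str) -> set[str]:
    """Split a comma-separated entry such as "Marketing,Sales" into trimmed values."""
    return {v.strip() for v in text.split(",") if v.strip()}


def sender_matches(directory: dict[str, dict[str, str]], sender: str,
                   field: str, values_text: str) -> bool:
    """True if the sender's directory record has `field` equal to one of the values."""
    record = directory.get(sender.lower())
    return record is not None and record.get(field) in parse_values(values_text)
```

Used in a rule, a True result lets the rule match when all other criteria are met; used in an exception, the same True result removes the message from matching.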

Configuring the Recipient based on a Profiled Directory policy condition for EDM

The Recipient based on a Directory from condition lets you create detection methods based
on the identity of the recipient. This method requires an Exact Data Profile.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
After you select the Exact Data Profile, when you configure the rule, the directory you selected
and the recipient identifier(s) appear at the top of the page.
Table 26-15 describes the parameters for configuring Recipient based on a Directory from
an EDM profile condition.

Table 26-15 Configuring the Recipient based on a Directory from an EDM profile condition

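Parameter Description

Where Select this option to have the system match on the specified field values. Specify the values by
selecting a field from the drop-down list and typing the values for that field in the adjacent text box.
If you enter more than one value, separate the values with commas.

For example, for an Employees directory group profile that includes a Department field, you would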
select Where, select Department from the drop-down list, and enter Marketing, Sales in the text
box. For a detection rule, this example causes the system to capture an incident only if at least one
recipient works in Marketing or Sales (as long as the input content meets all other detection criteria).
For an exception, this example prevents the system from capturing an incident if at least one recipient
works in Marketing or Sales.


Is Any Of Enter or modify the information you want to match. For example, if you want to match any recipient
in the Sales department, select Department from the drop-down list, and then enter Sales in this
field (assuming that your data includes a Department column). Use a comma-separated list if you
want to specify more than one value.
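For recipients, the rule and exception behavior described in Table 26-15 depends on whether at least one recipient matches. This hypothetical sketch uses a directory keyed by lower-case email address; `recipient_hit` and `capture_incident` are illustrative names, not product APIs.

```python
from __future__ import annotations


def recipient_hit(directory: dict[str, dict[str, str]], recipients: list[str],
                  field: str, values: set[str]) -> bool:
    """True if at least one recipient's directory record has `field` in `values`."""
    return any(directory.get(r.lower(), {}).get(field) in values for r in recipients)


def capture_incident(other_criteria_met: bool, hit: bool, as_exception: bool) -> bool:
    # As a detection rule, a matching recipient is also required; as an exception,
    # a matching recipient prevents the incident.
    return other_criteria_met and (not hit if as_exception else hit)
```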

About configuring natural language processing for Chinese, Japanese, and Korean for EDM policies

Introducing EDM token matching

Symantec Data Loss Prevention detection servers support natural language processing for
Chinese, Japanese, and Korean (CJK) in policies that use Exact Data Matching (EDM)
detection. When natural language processing for CJK languages is enabled, the detection
server validates CJK tokens before reporting a match, which improves matching accuracy.

EDM token matching examples for CJK languages

Table 26-16 provides EDM token matching examples for Chinese, Japanese, and Korean
languages. All examples assume that the keyword condition is configured to match on whole
words only.
If token verification is enabled, the message must be long enough for the token verifier to
recognize its language. For example, the message “東京都市部の人口” ("the population of urban
Tokyo") is too short for the token verification process to recognize the language of the message.
The following message is long enough for token verification processing:
今朝のニュースによると東京都市部の人口は増加傾向にあるとのことでした。全国的な人口減少の傾向の中、東京への一極集中を表しています。
(In English: "According to this morning's news, the population of urban Tokyo is trending upward.
Amid a nationwide population decline, this reflects the concentration of people in Tokyo.")

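Table 26-16 EDM token matching examples for CJK

■ Chinese: keyword 通信. With token validation ON, it matches 数字无线通信. With token validation
OFF, it matches 数字无线通信 and 交通信息网站.
■ Japanese: keyword 京都市. With token validation ON, it matches 京都府京都市左京区. With token
validation OFF, it matches 京都府京都市左京区 and 東京都市部の人口.
■ Korean: keyword 정부. With token validation ON, it matches 정부의 방침. With token validation
OFF, it matches 정부의 방침 and 의정부 경전철.

With token validation OFF, each keyword also matches inside an unrelated word: 通信
("communication") spans the boundary between 交通 and 信息 in 交通信息网站 ("traffic information
website"), 京都市 ("Kyoto City") spans 東京 and 都市部 in 東京都市部 ("urban Tokyo"), and 정부
("government") is part of the place name 의정부 (Uijeongbu). Token validation suppresses these
false positives.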
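The difference between the two columns can be illustrated by comparing a plain substring search with a check that the keyword starts and ends on token boundaries. This is a conceptual sketch only: the word segmentations below are hand-made assumptions rather than output of the product's tokenizer, the language-detection requirement is not modeled, and `naive_match` and `boundary_match` are hypothetical names.

```python
from __future__ import annotations


def naive_match(tokens: list[str], keyword: str) -> bool:
    """Validation OFF (illustrative): plain substring search over the joined text."""
    return keyword in "".join(tokens)


def boundary_match(tokens: list[str], keyword: str) -> bool:
    """Validation ON (illustrative): the keyword must equal one token or a run of
    consecutive tokens, so it starts and ends on token boundaries."""
    for start in range(len(tokens)):
        joined = ""
        for tok in tokens[start:]:
            joined += tok
            if joined == keyword:
                return True
            if len(joined) >= len(keyword):
                break  # overshot the keyword without an exact fit
    return False


# Hand-made segmentations (assumptions): (should match, false positive when OFF).
EXAMPLES = {
    "京都市": (["京都府", "京都市", "左京区"], ["東京", "都市部", "の", "人口"]),
    "通信": (["数字", "无线", "通信"], ["交通", "信息", "网站"]),
    "정부": (["정부", "의", "방침"], ["의정부", "경전철"]),
}
```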


Enabling and using CJK token verification for EDM

To use token verification for Chinese, Japanese, and Korean (CJK) languages you must enable
it on each detection server by setting the advanced server setting EDM.TokenVerifierEnabled
to true. In addition, there must be a sufficient amount of message text for the system to
recognize the language.
Table 26-17 lists and describes the detection server parameter that lets you enable token
verification for CJK languages.

Table 26-17 EDM token verification parameter

Setting Default Description

EDM.TokenVerifierEnabled false Default is disabled (false).

If enabled (true), the server validates tokens for Chinese,
Japanese, and Korean language keywords.

The following procedure describes how to enable and use token verification for CJK languages
in EDM policies. See also “Enable keyword token verification for CJK” on page 848.
Enable EDM token verification for CJK
1 Log on to the Enforce Server as an administrative user.
2 Navigate to the System > Servers and Detectors > Overview > Server/Detector Detail
- Advanced Settings screen for the detection server you want to configure.
See “Advanced server settings” on page 285.
3 Locate the parameter EDM.TokenVerifierEnabled.
4 Change the value from false (the default) to true.
Setting the server parameter EDM.TokenVerifierEnabled = true enables token validation
for CJK token detection.
5 Save the detection server configuration.
6 Recycle the detection server.

Configuring Advanced Settings for EDM policies

EDM has various advanced settings available at the System > Servers and Detectors >
Overview > Server/Detector Detail - Advanced Settings screen for the chosen detection
server. Use caution when modifying these settings on a server. Check with Symantec Data
Loss Prevention Support before changing any of the settings on this screen. Changes to these
settings do not take effect until after the server is restarted.
See “Advanced server settings” on page 285.

Table 26-18 Advanced Settings for EDM indexing and detection

EDM parameter Default Description

EDM.MatchCountVariant 3 This setting specifies how matches are counted.

■ 1 - Counts the number of token sets matched regardless of use
of the same tokens across several matches.
■ 2 - Counts the number of unique token sets.
■ 3 - Counts the number of unique supersets of token sets. (default)

See “Match count variant examples (EDM)” on page 570.

EDM.MaximumNumberOfMatchesToReturn 100 Defines an upper limit on the number of matches returned from each
RAM index search. For multi-file indices, this limit is applied to each
sub-index search independently before the search results are
combined. As a result, the number of actual matches can exceed
this limit for multi-file indices.

EDM.RunProximityLogic true If true (default), this setting runs the token proximity check. The
free-form text proximity is defined by the setting
EDM.SimpleTextProximityRadius. The tabular text proximity
is defined by belonging to the same table row.
Note: Disabling proximity is not recommended because it can
negatively affect the performance of the system.

EDM.SimpleTextProximityRadius 35 Provides the baseline range for proximity checking a matched token.
This value is multiplied by the number of required matches to equal
the complete proximity check range.

To keep the same "required match density," the proximity check
range behaves like a moving window in a text page. N is the SimpleTextProximityRadius
value, and D is the proportionality factor for the window, which is set in the policy
condition by choosing how many fields to match on for the EDM condition. A set of
tokens is in the proximity range if the first token is within N x D words of the
last token. The proximity check range is therefore directly proportional to the number
of required matches (D), with N as the constant of proportionality.

See “Proximity matching example for EDM” on page 572.

Note: Increasing the radius value higher than the default can
negatively affect system performance and is not recommended.

EDM.TokenVerifierEnabled false Default is disabled (false).

If enabled (true), the server validates tokens for Chinese, Japanese,
and Korean language keywords.


Lexer.IncludePunctuationInWords true If true, during detection punctuation characters are treated as
part of a token.

If false, during detection punctuation within a token or multi-token
is treated as white space.

See “Multi-token with punctuation (EDM)” on page 563.

Note: This setting applies to detection content, not to indexed
content.

Lexer.MaximumNumberOfTokens 30000 Maximum number of tokens extracted from each message
component for detection. Applies to all detection technologies
where tokenization is required (EDM, profiled DGM, and the system
patterns supported by those technologies). Increasing the default
value may cause the detection server to run out of memory and
restart.

Lexer.Validate true If true, performs system pattern-specific validation during indexing.

Setting this to false is not recommended.

See “Using system-provided pattern validators for EDM profiles”
on page 547.

MessageChain.NumChains Varies The number of messages that the filereader processes in parallel.
The default varies by detection server type and is either 4 or 8.
Setting this number higher than 8 (with the other default
settings) is not recommended: a higher setting does not substantially
increase performance, and there is a much greater risk of running
out of memory. Setting this to less than 8 (in some cases 1) helps
when processing big files, but it may slow down the system
considerably.
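The three EDM.MatchCountVariant values can be compared on a small example. This sketch represents each match as the set of matched token values and reads variant 3 ("unique supersets of token sets") as counting only the distinct token sets that are not contained in a larger matched set. That reading is an assumption; "Match count variant examples (EDM)" on page 570 gives the product's own examples, and `count_matches` is a hypothetical name.

```python
from __future__ import annotations


def count_matches(matches: list[frozenset[str]], variant: int = 3) -> int:
    """Illustrative counting for EDM.MatchCountVariant values 1, 2, and 3."""
    if variant == 1:
        return len(matches)  # every match, even if it reuses tokens
    unique = set(matches)
    if variant == 2:
        return len(unique)  # distinct token sets
    if variant == 3:
        # Assumed reading of "unique supersets": drop any distinct set that is a
        # proper subset of another matched set, then count what remains.
        return sum(1 for s in unique if not any(s < other for other in unique))
    raise ValueError("variant must be 1, 2, or 3")
```

For a message producing the matches {SSN, last name} twice, {SSN} once, and a second SSN once, this sketch counts 4, 3, and 2 matches for variants 1, 2, and 3.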

Note: The maximum number of tokens per multi-token is calculated, and stopwords are evaluated,
during indexing. The Lexer.MaxTokensPerMultiToken and Lexer Stopword Languages advanced
server settings are no longer necessary. The stopword language on the Enforce Server is specified in
the Indexer.properties file at C:\Program Files\Symantec\Data Loss Prevention\Indexer\15.5\Protect\config\Indexer.properties.
For English, the property is stopword_languages = en.
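The proximity rules described for EDM.RunProximityLogic and EDM.SimpleTextProximityRadius in Table 26-18 reduce to two simple tests. This is a hedged sketch: token positions are assumed to be word offsets, and whether the N x D bound is inclusive is an assumption (this sketch uses `<=`). `within_proximity` and `same_row` are hypothetical names.

```python
from __future__ import annotations


def within_proximity(positions: list[int], radius: int = 35, fields_required: int = 2) -> bool:
    """Free-form text: matched tokens must lie within N x D words of each other,
    with N = EDM.SimpleTextProximityRadius and D = number of fields to match."""
    window = radius * fields_required
    return max(positions) - min(positions) <= window


def same_row(row_ids: list[int]) -> bool:
    """Tabular text: proximity means every matched token comes from one table row."""
    return len(set(row_ids)) == 1
```

With the default radius of 35 and a condition that matches 2 fields, matched tokens at word offsets 10 and 75 are in range (65 words apart, window 70), while offsets 10 and 90 are not unless the condition requires 3 fields (window 105).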
Detecting content using Exact Data Matching (EDM) 560
Using multi-token matching with EDM

Using multi-token matching with EDM

EDM policy matching is based on tokens in the index. For languages based on the Latin alphabet, a token is a word or string of alphanumeric characters delimited by spaces. For Chinese, Japanese, and Korean (CJK) languages, tokens are determined by other means; for example, each CJK character is treated as a single token. Tokens are normalized so that formatting and case are ignored. At run-time the server performs a full-text search against an inbound message and checks each word against the index for potential matches. The matching algorithm compares each word in the message with the contents of each token in the index.
A multi-token cell is a cell in the index that contains multiple words separated by spaces, leading or trailing punctuation, or alternating Latin and CJK characters. The sub-token parts of a multi-token cell obey the same rules as single-token cells: they are normalized according to their pattern where normalization applies. Inbound message data must match a multi-token cell exactly, including whitespace, punctuation, and stopwords (assuming the default settings).
For example, an indexed cell containing the string "Bank of America" is a multi-token comprising
3 sub-token parts. During detection, the inbound message "bank of america" (normalized)
matches the multi-token cell, but "bank america" does not.
Multi-token matching is enabled by default. Multi-token cells are more computationally expensive
than single-token cells. If the index includes multi-token cells, you must verify that you have
enough memory to index, load, and process the EDM profile.
See “Characteristics of multi-token cells (EDM)” on page 560.
See “Memory requirements for EDM” on page 579.

Characteristics of multi-token cells (EDM)

Table 26-19 lists and describes characteristics of multi-token matching.
See “Using multi-token matching with EDM” on page 560.

Table 26-19 Characteristics of multi-tokens for EDM

Characteristic: The number of tokens in a single cell is limited to 200 tokens.
Description: The number of characters is not limited. For CJK text, each character is treated as a single token, so the number of CJK characters in a cell is limited to 200.

Characteristic: Whitespace in Latin multi-token cells is considered, but multiple whitespace characters are normalized to one.
Description: See “Multi-token with spaces (EDM)” on page 561.

Characteristic: Punctuation immediately preceding and following a token or sub-token is always ignored.
Description: See “Multi-token with punctuation (EDM)” on page 563.
See “Additional examples for multi-token cells with punctuation (EDM)” on page 564.

Characteristic: You can configure how punctuation within a token or multi-token is treated during detection. For most cases the default setting ("true") is appropriate. If set to "false," punctuation is treated as whitespace.
Description: Lexer.IncludePunctuationInWords = true
See “Configuring Advanced Settings for EDM policies” on page 557.

Characteristic: For proximity range checking, the sub-token parts of a multi-token are counted as single tokens.
Description: See “Proximity matching example for EDM” on page 572.

Characteristic: Stopword filtering does not apply when matching multi-tokens. In other words, stopwords are not excluded from a multi-token.
Description: See “Multi-token with stopwords (EDM)” on page 562.

Characteristic: Multi-tokens are more computationally expensive than single tokens and require additional memory for indexing, loading, and processing.
Description: See “Memory requirements for EDM” on page 579.

Multi-token with spaces (EDM)

Table 26-20 shows examples of multi-tokens with spaces.

Table 26-20 Multi-token cell with spaces examples

Description | Indexed content | Detected content | Explanation
Cell contains spaces | Bank of America | Bank of America | A cell with spaces is a multi-token. The multi-token must match exactly.
Cell contains multiple spaces | Bank   of   America | Bank of America | Multiple spaces are normalized to one.
Cell contains spaces between CJK characters | 傠傫 傠傫 | 傠傫 傠傫 or 傠傫傠傫 | Whitespace between CJK characters is ignored.
Cell contains spaces between Latin and CJK characters | EDM 傠傫 | EDM 傠傫 or EDM傠傫 | Whitespace between Latin and CJK characters is ignored.
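You can approximate the whitespace rules in Table 26-20 with a tokenizer that emits each CJK character as its own token and each run of other non-space characters as one token. Under that assumption, none of the following changes the token sequence:

■ Extra spaces
■ Spaces between CJK characters
■ Spaces at a Latin/CJK boundary

The Unicode ranges treated as "CJK" below are an assumption for illustration. This is not the product's tokenizer.

```python
import re

# Assumed CJK ranges: Hiragana/Katakana, CJK Extension A,
# CJK Unified Ideographs, and Hangul syllables.
CJK = r"\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"

# A CJK character is a token by itself; any other run of non-space characters is one token.
TOKEN = re.compile(rf"[{CJK}]|[^\s{CJK}]+")


def tokens(text: str) -> list[str]:
    return [t.lower() for t in TOKEN.findall(text)]


def same_cell(indexed: str, detected: str) -> bool:
    """True when both strings produce the same token sequence."""
    return tokens(indexed) == tokens(detected)


print(tokens("EDM傠傫"))                                   # ['edm', '傠', '傫']
print(same_cell("Bank   of   America", "Bank of America"))  # True
print(same_cell("傠傫 傠傫", "傠傫傠傫"))                    # True
```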

Multi-token with stopwords (EDM)

Stopwords are common words, such as articles and prepositions. When creating single tokens, the EDM indexing process ignores words found in the EDM stopword list (\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\config\stopwords), as well as single letters. However, when creating multi-tokens, stopwords and single letters are not ignored. Instead, they are part of the multi-token.
Table 26-21 shows multi-token matches with stopwords, single letters, and single digits.

Table 26-21 Cell contains stopwords or single letter or single digit (EDM)

Description | Cell content | Should match | Explanation
Cell contains a stopword. | throw other ball | throw other ball | The common word "other" is filtered out during indexing, but not when it is part of a multi-token.
Cell contains a single letter. | throw a ball | throw a ball | The single letter "a" is filtered out, but not when it is part of a multi-token.
Cell contains a single digit. | throw 1 ball | throw 1 ball | Unlike single letters, which are treated as stopwords, single digits are never ignored.
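The following is a minimal sketch of the single-token versus multi-token stopword rule described above. STOPWORDS is a hypothetical three-word subset used only for illustration; the real list is read from the stopwords directory named above. Splitting on whitespace stands in for real tokenization.

```python
# Hypothetical subset of the EDM stopword list, for illustration only.
STOPWORDS = {"other", "the", "of"}


def indexed_tokens(cell: str) -> list[str]:
    """Return the tokens that would be indexed for one cell (simplified)."""
    parts = cell.lower().split()
    if len(parts) == 1:
        word = parts[0]
        # A single-token cell that is a stopword or a single letter is skipped.
        if word in STOPWORDS or (len(word) == 1 and word.isalpha()):
            return []
    # Multi-token cells keep every part, including stopwords and single letters.
    return parts


print(indexed_tokens("other"))             # []
print(indexed_tokens("throw other ball"))  # ['throw', 'other', 'ball']
print(indexed_tokens("throw a ball"))      # ['throw', 'a', 'ball']
print(indexed_tokens("1"))                 # ['1']
```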

Multi-token with mixed language characters (EDM)

Table 26-22 shows examples of multi-tokens with mixed Latin and CJK characters.

Table 26-22 Multi-token cell with Latin and CJK characters examples (EDM)

Description | Cell content | Should match | Explanation
Cell includes Latin and CJK characters with one or more spaces. | ABC 傠傫; 傠傥 ABC | ABC 傠傫; 傠傥 ABC; also matches ABC傠傫 and 傠傫ABC | EDM ignores whitespace between the Latin characters and the CJK token. Multiple spaces are ignored.
Cell contains Latin or CJK characters with numbers. | 什仁仂仃仄仅仇仈仉; 147(什仂仅 51-1) | 什仁仂仃仄仅仇仈仉; 147(什仂仅 51-1) | Single-token cell.

Multi-token with punctuation (EDM)

Punctuation is always ignored if it comes at the beginning (leading) or end (trailing) of a token or multi-token. Whether punctuation inside a token or multi-token is required for matching depends on the Advanced Server Setting Lexer.IncludePunctuationInWords, which is set to true (enabled) by default.
See “Multi-token punctuation characters (EDM)” on page 569.

Note: For convenience, the Lexer.IncludePunctuationInWords parameter is referred to by the three-letter acronym "WIP" throughout this section.

The WIP setting operates at detection time. It determines whether punctuation inside a token in the detected content is retained or treated as whitespace, and therefore what matches. For most EDM policies you should not change the WIP setting. For a few limited situations, such as account numbers or addresses, you may need to set Lexer.IncludePunctuationInWords = false, depending on your detection requirements.
See “Multi-token punctuation characters (EDM)” on page 569.
Table 26-23 lists and explains how multi-token matching works with punctuation.

Table 26-23 Multi-token punctuation table (EDM)

Indexed content | Detected content | WIP setting | Match | Explanation
a.b | a.b | TRUE | Yes | The indexed content and the detected content are exactly the same.
a.b | a.b | FALSE | No | The detected content is treated as "a b" and is therefore not a match.
a.b | a b | TRUE | No | The indexed content and the detected content are different.
a.b | a b | FALSE | No | The indexed content and the detected content are different.
a b | a.b | TRUE | No | The indexed content and the detected content are different.
a b | a.b | FALSE | Yes | The detected content is treated as "a b" and is therefore a match.
a b | a b | TRUE | Yes | The indexed content and the detected content are exactly the same.
a b | a b | FALSE | Yes | The indexed content and the detected content are exactly the same.
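You can summarize the WIP rules in a few lines of code. This Python sketch illustrates the rules; it is not the product's lexer. It applies three rules:

■ Leading and trailing punctuation is always stripped.
■ Indexed text always keeps internal punctuation.
■ Detected text keeps internal punctuation only when WIP is true.

The punctuation set comes from Table 26-26. The sketch deliberately leaves out system-recognized patterns, stopwords, and CJK handling. The loop tries the combinations of "a.b" and "a b" under both WIP settings. The results follow the explanations given for Table 26-23.

```python
# Characters treated as punctuation (Table 26-26).
PUNCT = "'~!&-\".?@$%*^()[]{}/\\#=+"


def tokenize(text: str, wip: bool) -> list[str]:
    tokens = []
    for word in text.split():
        # Leading and trailing punctuation is always ignored.
        word = word.strip(PUNCT)
        if not wip:
            # WIP false: punctuation inside a token is treated as whitespace.
            word = "".join(" " if ch in PUNCT else ch for ch in word)
        tokens.extend(word.lower().split())
    return tokens


def matches(indexed: str, detected: str, wip: bool) -> bool:
    # Indexing always retains internal punctuation, so it behaves like WIP = true.
    return tokenize(indexed, wip=True) == tokenize(detected, wip=wip)


for indexed, detected in [("a.b", "a.b"), ("a.b", "a b"), ("a b", "a.b"), ("a b", "a b")]:
    for wip in (True, False):
        print(f"{indexed!r:7} {detected!r:7} WIP={wip!s:5} -> {matches(indexed, detected, wip)}")
```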

Additional examples for multi-token cells with punctuation (EDM)

Table 26-24 lists and describes some additional examples for multi-token cells with punctuation.
In these examples, the key point is that during indexing, punctuation between the characters of a token is always retained. This means that EDM cannot detect such a cell if the WIP setting is false. In other words, if the indexed data has a cell that contains a token with internal punctuation, keep the WIP setting at true.

Table 26-24 Additional use cases for multi-token cells with punctuation (EDM)

Description: Cell contains internal
Indexed content: O'NEAL ST.
Detected content: O'NEAL ST
Explanation: The indexed content is a

Description: Cell contains Asian language characters (CJK) with indexed internal punctuation.
Indexed content: 傠傫##傠傫
Detected content: 傠傫##傠傫 (if WIP true)
Explanation: The indexed content is a single-token cell. During detection, Asian language characters (CJK) with internal punctuation are affected by the WIP setting. Thus, in this example 傠傫##傠傫 matches only if the WIP setting is true. If the WIP setting is false, 傠傫##傠傫 is considered a multi-token because the internal punctuation is treated as whitespace. Thus, no content can match.

Description: Cell contains Asian
Indexed content: 傠傫傠傫
Detected content: 傠傫傠傫
Explanation: The indexed content is a

Description: Cell contains a mix of Latin and CJK characters with punctuation separating the Latin and Asian characters.
Indexed content: EDM##傠傫
Detected content: EDM 傠傫
Explanation: The indexed content is a multi-token cell. A cell with alternating Latin and CJK characters is always a multi-token, and punctuation between Latin and Asian characters is always treated as a single white space regardless of the WIP setting.

Description: Cell contains a mix of Latin and CJK characters with internal punctuation.
Indexed content: DLP##EDM 傠傫##傠傥
Detected content: DLP##EDM##傠傫##傠傥 (if WIP true); DLP##EDM 傠傫##傠傥 (if WIP true)
Explanation: The indexed content is a multi-token cell. During detection, punctuation between the Latin and Asian characters is treated as a single whitespace, and leading and trailing punctuation is ignored. If the WIP setting is true, the punctuation internal to the Latin characters and internal to the Asian characters is retained. If the WIP setting is false, no content can match because internal punctuation is treated as whitespace.

Description: Cell contains a mix of Latin and CJK characters with internal punctuation.
Indexed content: DLP EDM 傠傫傠傥
Detected content: DLP EDM 傠傫傠傥; DLP#EDM 傠傫#傠傥 (if WIP false); DLP#EDM##傠傫#傠傥 (if WIP false)
Explanation: The indexed content is a multi-token cell. During detection, punctuation between the Latin and Asian characters is treated as a single whitespace, and leading and trailing punctuation is ignored. Thus, it matches as indexed. If the WIP setting is false, it also matches DLP;EDM##傠傫#傠傥 because internal punctuation is treated as whitespace.

Some special use cases for system-recognized data patterns (EDM)

EDM provides validation for and recognition of the following special data patterns:
■ Credit card number
■ Email address
■ IP address
■ Number
■ Percent
■ Phone number (US, Canada)
■ Postal code (US, Canada)
■ Social security number (US SSN)
See “Using system-provided pattern validators for EDM profiles” on page 547.

Note: It is a best practice to always validate your index against the recognized system patterns
when the data source includes one or more such column fields. See “Map data source column
to system fields to leverage validation (EDM)” on page 605.

The general rule for system-recognized patterns is that the WIP setting does not apply during detection. Instead, the rules for that particular pattern apply. In other words, if the pattern is recognized during detection, the WIP setting is not checked. This is always true if the pattern is a string of characters such as an email address, or if the cell contains a number that conforms to one of the recognized number patterns (such as CCN or SSN).
In addition, the WIP setting may still not apply to a generic number, such as an account number, that does not conform to one of the recognized number patterns. To ensure accurate matching for such generic numbers, do not include punctuation in these number cells. If the cell contents conform to one of the system-recognized patterns, the punctuation rules for that pattern apply and the WIP setting does not.
See “Do not use the comma delimiter if the data source has number fields (EDM)” on page 605.
Table 26-25 on page 568 lists and describes examples for detecting system-recognized data patterns.

Caution: This list is not exhaustive. It is provided for informational purposes only to ensure that
you are aware that data that matches system-defined patterns takes precedence and the WIP
setting is ignored. Before deploying your EDM policies into production, you must test detection
accuracy and adjust the index accordingly to ensure that the data that you have indexed
matches as expected during detection.

Table 26-25 Some special use cases for system-recognized data patterns (EDM)

Description | Indexed content | Detected content | Explanation
Cell contains an email address. | person@example.com | person@example.com | An email address is indexed and detected as a single token regardless of the WIP setting, and it must match exactly as indexed. Even if you set WIP to false, the text "person example com" does not match as a multi-token and does not match the indexed single token.
Cell contains a 10-digit account number. | ########## | ##########; (###) ### ####; (###) ###-#### | The WIP setting is ignored because the number conforms to the phone number pattern, and its rules take precedence.
Cell contains a 10-digit account number. | ## ###### ## | ## ###### ## | Must match exactly. The pattern ##-######-## does not match even if WIP is set to false.
Cell contains a 10-digit account number. | ### #### ### | ### #### ### | Must match exactly. The pattern ###-####-### does not match even if WIP is set to false.
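The following Python sketch shows the precedence rule: a recognized system pattern is handled by its own rules before the WIP setting is consulted. The three regular expressions are hypothetical, deliberately simplified stand-ins. This guide does not describe the product's actual recognizers and validators, and the sketch is only intended to reproduce the examples in Table 26-25.

```python
import re

# Hypothetical, simplified recognizers (not the product's real patterns).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
US_PHONE = re.compile(r"\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})")
HYPHENATED_NUMBER = re.compile(r"\d+(-\d+)+")

# Characters treated as punctuation (Table 26-26).
PUNCT = "'~!&-\".?@$%*^()[]{}/\\#=+"


def wip_tokens(text: str, wip: bool) -> list[str]:
    """Generic tokenization for text that matches no system pattern."""
    tokens = []
    for word in text.split():
        word = word.strip(PUNCT)
        if not wip:
            word = "".join(" " if ch in PUNCT else ch for ch in word)
        tokens.extend(word.lower().split())
    return tokens


def detect_tokens(text: str, wip: bool) -> list[str]:
    text = text.strip()
    if EMAIL.fullmatch(text):
        return [text.lower()]              # one token; WIP is never consulted
    phone = US_PHONE.fullmatch(text)
    if phone:
        return ["".join(phone.groups())]   # normalized to ten digits
    if HYPHENATED_NUMBER.fullmatch(text):
        return [text]                      # kept whole; WIP is never consulted
    return wip_tokens(text, wip)
```

For example, detect_tokens("(123) 456-7890", wip=False) and detect_tokens("1234567890", wip=False) both return ["1234567890"]. By contrast, "12-345678-90" stays a single token and never equals the three tokens of "12 345678 90".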

Multi-token punctuation characters (EDM)

In EDM, a multi-token cell is any indexed cell that contains punctuation (as well as spaces or alternating Latin words and CJK characters).
Table 26-26 on page 569 lists the symbols that are identified and treated as punctuation during EDM indexing.

Table 26-26 Characters treated as punctuation for indexing (EDM)

Punctuation name Character representation

Apostrophe '

Tilde ~

Exclamation point !

Ampersand &

Dash -

Single quotation mark '

Double quotation mark "

Period (dot) .

Question mark ?

At sign @

Dollar sign $

Percent sign %

Asterisk *

Caret symbol ^

Open parenthesis (

Close parenthesis )

Open bracket [

Close bracket ]

Open brace {

Close brace }

Forward slash /

Back slash \

Pound sign #

Equal sign =

Plus sign +

Match count variant examples (EDM)

The default value for the Advanced Server setting EDM.MatchCountVariant eliminates matches that consist of the same set of tokens as another match. You rarely need to change the default value, but if necessary you can use this parameter to configure how EDM matches are counted.
See “Advanced server settings” on page 285.
Table 26-27 provides examples for match counting. All examples assume that the policy is
set to match three out of four column fields and that the profile index contains the following
cell contents:
Kathy | Stevens | 123-45-6789 | 1111-1111-1111-1111
Kathy | Stevens | 123-45-6789 | 2222-2222-2222-2222
Kathy | Stevens | 123-45-6789 | 3333-3333-3333-3333

Table 26-27 Match count variant examples (EDM)

Inbound message contents: Kathy Stevens 123-45-6789
Match count variant 1: 3 matches. Records matched in the profile: first name, last name, and SSN.
Match count variant 2: 1 match. Number of unique token sets matched.
Match count variant 3: 1 match. Number of unique supersets of token sets.

Inbound message contents: Kathy Stevens 123-45-6789 1111-1111-1111-1111 Kathy Stevens 123-45-6789
Match count variant 1: 3 matches.
Match count variant 2: 1 match if EDM.HighlightAllMatchesInProximity = false (default); 2 matches if EDM.HighlightAllMatchesInProximity = true.
Match count variant 3: 1 match.
Explanation: If EDM.HighlightAllMatchesInProximity = false, EDM matches the left-most tokens for each profile data row. The token set for each row is as follows:
Row # 1: Kathy Stevens 123-45-6789
Row # 2: Kathy Stevens 123-45-6789
Row # 3: Kathy Stevens 123-45-6789
If EDM.HighlightAllMatchesInProximity = true, EDM matches all tokens within the proximity window. The token set for each row is as follows:
Row # 1: Kathy Stevens 123-45-6789 1111-1111-1111-1111 Kathy Stevens 123-45-6789
Row # 2: Kathy Stevens 123-45-6789 Kathy Stevens 123-45-6789
Row # 3: Kathy Stevens 123-45-6789 Kathy Stevens 123-45-6789

Inbound message contents: 1111-1111-1111-1111 Kathy Stevens 123-45-6789
Match count variant 1: 3 matches.
Match count variant 2: 2 matches.
Match count variant 3: 2 matches if EDM.HighlightAllMatchesInProximity = false (default); 1 match if EDM.HighlightAllMatchesInProximity = true.
Explanation: If EDM.HighlightAllMatchesInProximity = false, EDM matches the left-most tokens for each profile data row. The token set for each row is as follows:
Row # 1: 1111-1111-1111-1111 Kathy Stevens
Row # 2: Kathy Stevens 123-45-6789
Row # 3: Kathy Stevens 123-45-6789
If EDM.HighlightAllMatchesInProximity = true, EDM matches all tokens within the proximity window. The token set for each row is as follows:
Row # 1: 1111-1111-1111-1111 Kathy Stevens 123-45-6789
Row # 2: Kathy Stevens 123-45-6789
Row # 3: Kathy Stevens 123-45-6789
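You can model the three variants as counts over the token sets that each profile row matched. In this Python sketch, each set holds the positions of the matched tokens in the inbound message. Variant 3 is interpreted as the number of distinct sets that are not a proper subset of another matched set. That interpretation and the function name are assumptions inferred from the examples in Table 26-27, not a documented algorithm.

```python
def match_counts(row_sets: list[frozenset[int]]) -> dict[int, int]:
    """Count EDM matches under match count variants 1, 2, and 3 (interpretation)."""
    unique = set(row_sets)
    return {
        1: len(row_sets),                     # one match per matched profile row
        2: len(unique),                       # distinct token sets
        3: sum(1 for s in unique              # sets not contained in a larger set
               if not any(s < other for other in unique)),
    }


# Message "1111-1111-1111-1111 Kathy Stevens 123-45-6789" (token positions 0-3),
# EDM.HighlightAllMatchesInProximity = false: each row keeps its left-most tokens.
left_most = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 3})]
print(match_counts(left_most))       # {1: 3, 2: 2, 3: 2}

# Same message with EDM.HighlightAllMatchesInProximity = true.
all_in_window = [frozenset({0, 1, 2, 3}), frozenset({1, 2, 3}), frozenset({1, 2, 3})]
print(match_counts(all_in_window))   # {1: 3, 2: 2, 3: 1}
```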

Proximity matching example for EDM

EDM protects confidential data by correlating uniquely identifiable information, such as SSN,
with data that is not unique, such as last name. When correlating data, it is important to ensure
that terms are related. In natural languages, it is more likely that when two words appear close
together they are being used in the same context and are therefore related.
Based on the premise that word proximity indicates relatedness, EDM employs a
proximity-matching radius or range to limit how much freeform content the system will examine
when searching for matches. EDM proximity matching is designed to reduce false positives
by ensuring that matched terms are proximate.
The proximity range depends on the policy definition: it is the proximity radius multiplied by the number of matches required by the EDM policy condition. The radius is set by the Advanced Server Setting parameter EDM.SimpleTextProximityRadius. The default value is 35. Proximity matching applies to both free-form text and tabular data, and there is no distinction between the two at run-time. Thus, tabular data is treated the same as free-form text, and the proximity check is not limited to the contents of a single row.

For example, assuming the default radius of 35 and a policy set to match 3 out of 4 column fields, the proximity range is 105 tokens (35 x 3). If the policy matches 2 out of 3 column fields, the proximity range is 70 tokens (35 x 2).
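The following small helper expresses this arithmetic. It is a sketch with hypothetical names. It assumes, consistent with the example that follows, that the window is counted from the leftmost to the rightmost matched token inclusive, and that a span exactly equal to the range still qualifies.

```python
DEFAULT_RADIUS = 35  # default of EDM.SimpleTextProximityRadius


def proximity_range(required_matches: int, radius: int = DEFAULT_RADIUS) -> int:
    """Proximity range in tokens: radius multiplied by the required matches."""
    return radius * required_matches


def within_range(positions: list[int], required_matches: int,
                 radius: int = DEFAULT_RADIUS) -> bool:
    """True if the matched token positions fit inside the proximity range.

    Each sub-token of a multi-token (for example "Bank of America") occupies
    its own position, so it counts as a separate token.
    """
    span = max(positions) - min(positions) + 1  # inclusive token count
    return span <= proximity_range(required_matches, radius)


print(proximity_range(3))              # 105
print(proximity_range(2))              # 70
print(within_range([0, 60, 104], 3))   # True: span of 105 tokens
print(within_range([0, 60, 105], 3))   # False: span of 106 tokens
```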

Warning: While you can decrease the value of the proximity radius, Symantec does not recommend increasing this value beyond the default (35). Doing so may cause performance issues. See “Configuring Advanced Settings for EDM policies” on page 557.

Table 26-28 shows a proximity matching example based on the default proximity radius setting.
In this example, the detected content produces 1 unique token set match, described as follows:
■ The proximity range window is 105 tokens (35 x 3).
■ The proximity range window starts at the leftmost match ("Stevens") and ends at the
rightmost match ("123-45-6789").
■ The total number of tokens from "Stevens" to the SSN (including both) is 105 tokens.
■ The stopwords "other" and "a" are counted for proximity range purposes.
■ "Bank of America" is a multi-token. Each sub-token part of a multi-token is counted as a
single token for proximity purposes.

Table 26-28 Proximity example for EDM

Indexed data:
Last_Name | Employer | SSN
Stevens | Bank of America | 123-45-6789

Policy: Match 3 of 3 tokens
Proximity: Radius = 35 (default)

Detected content:
Zendrerit inceptos Kathy Stevens lorem ipsum pharetra convallis leo suscipit ipsum sodales rhoncus, vitae dui nisi volutpat augue maecenas in, luctus id risus magna arcu maecenas leo quisque. Rutrum convallis tortor urna morbi elementum hac curabitur morbi, nunc dictum primis elit senectus faucibus convallis surfrent. Aptentnour gravida adipiscing iaculis himenaeos, himenaeos a porta etiam viverra. Class torquent uni other tristique cubilia in Bank of America. Dictumst lorem eget ipsum. Hendrerit inceptos other sagittis quisque. Leo mollis per nisl per felis, nullam cras mattis augue turpis integer pharetra convallis suscipit hendrerit? Lubilia en mictumst horem eget ipsum. Inceptos urna sagittis quisque dictum odio hendrerit convallis suscipit ipsum wrdsrf 123-45-6789.

Updating EDM indexes to the latest version

When you upgrade to the latest version of Symantec Data Loss Prevention, you must update
each Exact Data profile by reindexing the data source using the latest EDM Indexer. You need
to verify the amount of memory that is required for indexing the data source, and loading and
processing the index at run-time on the detection server.
See “About upgrading EDM deployments” on page 534.
See “Memory requirements for EDM” on page 579.
If you do not reindex the data source file, the system presents error messages indicating that the Exact Data profile is out of date. You must reindex the Exact Data profile and recalculate the memory requirements.
See “EDM index out-of-date error codes” on page 578.
Two primary upgrade scenarios exist for EDM:
■ You use the Remote EDM Indexer to create indexes remotely and copy them to the Enforce
Server.
See “Update process using the Remote EDM Indexer” on page 574.
■ You already have a data source file that is current and cleansed that you can copy to the
upgraded Enforce Server for indexing.
See “Update process using the Enforce Server for EDM” on page 576.

Update process using the Remote EDM Indexer

You can use the following procedure for upgrading your EDM deployments to the latest version
of Symantec Data Loss Prevention. This procedure assumes that you can remotely index the
data source and copy the index file to the Enforce Server.
See “Remote EDM indexing” on page 585.
If remote indexing is not possible, the other option for upgrade is to copy the data source file
to the Enforce Server.
See “Update process using the Enforce Server for EDM” on page 576.
Table 26-29 Update process using the Remote EDM Indexer

Step 1: Upgrade the Enforce Server to the latest version.
Refer to the Symantec Data Loss Prevention Upgrade Guide at http://www.symantec.com/docs/DOC9258 for details.
Do not upgrade the EDM detection server(s) now.
The latest Enforce Server can continue to receive incidents from older detection servers during the upgrade process. However, policies and other data cannot be pushed out to older detection servers: communication between the latest version of Enforce and previous versions of detection servers is one-way only.

Step 2: Create a newly generated remote EDM profile template.
Using the latest Enforce Server administration console, create a new EDM profile template for remote EDM indexing.
See “Creating an EDM profile template for remote indexing” on page 589.
Download the *.edm profile template and copy it to the remote data source host system.
See “Downloading and copying the EDM profile file to a remote system” on page 591.

Step 3: Install the latest version of the Remote EDM Indexer on the remote data source host.
Install the latest version of the Symantec Data Loss Prevention Remote EDM Indexer on the remote data source host so that you can index the data source.
See “Remote EDM indexing” on page 585.

Step 4: Calculate the memory that is required to index the data source and adjust the indexer memory setting.
Calculate the memory that is required for indexing before you attempt to index the data source. The Remote EDM Indexer is allocated sufficient memory to index most data sources. If you have a very large index, you may have to allocate more memory.
See “Memory requirements for EDM” on page 579.

Step 5: Index the data source using the latest Remote EDM Indexer.
The result of this process is multiple latest-version-compatible *.rdx files that you can load into the latest version of the Enforce Server.
If you have a data source file prepared, run the Remote EDM Indexer and index it.
See “Remote indexing examples using data source file (EDM)” on page 592.
If the data source is an Oracle database and the data is clean, use the SQL Preindexer to pipe the data to the Remote EDM Indexer.
See “Remote indexing examples using SQL Preindexer (EDM)” on page 593.

Step 6: Calculate the memory that is required to load and process the index, and adjust the detection server memory setting for each EDM detection server host.
You need to calculate how much RAM the detection server requires to load and process the index at run-time. These calculations are required for each EDM index you want to deploy.
See “Memory requirements for EDM” on page 579.

Step 7: Update the EDM profile by loading the latest version of the index.
Copy the *.pdx and *.rdx files from the remote host to the latest Enforce Server host file system.
Load the index into the EDM profile you created in Step 2.
See “Copying and loading remote EDM index files to the Enforce Server” on page 594.

Step 8: Upgrade one or more EDM detection servers to the latest version.
Once you have created the latest-version-compliant EDM profiles and upgraded the Enforce Server, you can upgrade the detection servers.
Refer to the Symantec Data Loss Prevention Upgrade Guide at http://www.symantec.com/docs/DOC9258 for details.
Make sure that you have calculated and verified the memory requirements for loading and processing multi-token indexes on the detection server.
See “Memory requirements for EDM” on page 579.

Step 9: Test and verify the updated index.
To test the upgraded system and updated index, you can create a new policy that references the updated index.

Step 10: Remove out-of-date EDM indexes.
Once you have verified the new EDM index and policy, you can retire the legacy EDM index and policy.

Update process using the Enforce Server for EDM

Use the following index update procedure if remote indexing is not possible and you have a
current data source file that you can copy to the Enforce Server.
Table 26-30 Update process using the Enforce Server

Step 1: Upgrade the Enforce Server to the latest version.
Refer to the Symantec Data Loss Prevention Upgrade Guide at http://www.symantec.com/docs/DOC9258 for details.
Do not upgrade the EDM detection servers now.
The Enforce Server can continue to receive incidents from older detection servers during the upgrade process. Policies and other data cannot be pushed out to older detection servers (communication between the current version of Enforce and older detection servers is one-way only).

Step 2: Create, prepare, and copy the data source file to the 15.5 Enforce Server host.
Copy the data source file to the opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/datafiles (Linux) or ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\datafiles (Windows) directory on the upgraded 15.5 Enforce Server host file system.
See “Creating the exact data source file for EDM” on page 535.
See “Preparing the exact data source file for indexing for EDM” on page 537.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.

Step 3: Calculate the memory that is required to index the data source and update the indexer memory setting.
Calculate the memory that is required for indexing before you attempt to index the data source.
See “Memory requirements for EDM” on page 579.

Step 4: Create a new latest-version-compliant EDM profile and index the data source file.
Create a new EDM profile using the latest version of the Enforce Server administration console.
Choose the option Reference Data Source on Manager Host for uploading the data source file (assuming that you copied it to the /datafiles directory).
Index the data source file when you save the profile.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Step 5: Calculate the memory that is required to load and process the index at run-time. Adjust the memory settings for each EDM detection server host.
You need to calculate how much RAM the detection server requires to load and process the index at run-time. These calculations are required for each EDM index you want to deploy, and the memory adjustments are cumulative.
See “Memory requirements for EDM” on page 579.

Step 6: Upgrade the EDM detection servers to the latest version.
Once you have created the latest-version-compliant EDM profile, you can upgrade the detection servers.
Refer to the Symantec Data Loss Prevention Upgrade Guide at http://www.symantec.com/docs/DOC9258 for details.
Make sure that you have calculated and verified the memory requirements for loading and processing multi-token indexes on the detection server.
See “Memory requirements for EDM” on page 579.

Step 7: Test and verify the updated index.
To test the upgraded system and updated index, you can create a new policy that references the updated index.

Step 8: Remove out-of-date EDM indexes.
Once you have verified the new EDM index and policy, you can retire the legacy EDM index and policy.
Note: Indexes that are created for versions earlier than 14.0 do not work with version
14.5 and later.

See “Remote EDM indexing” on page 585.

EDM index out-of-date error codes

The latest version of Symantec Data Loss Prevention provides several enhancements for EDM. You must reindex the data source for each Exact Data profile using the latest EDM Indexer.
If your EDM index is not compliant with the current version, the system returns error codes.
These error codes are listed in Table 26-31.

Table 26-31 Error messages for non-compliant Exact Data Profiles

Error message type                 Error code  Error message

Enforce Server error event         2928        One or more profiles are out of date and must be reindexed.
                                               See “Updating EDM indexes to the latest version” on page 574.
                                               See “Memory requirements for EDM” on page 579.

Enforce Server error event detail  2928        Check the Manage > Data Profiles > Exact Data page for more details.
                                               The following EDM profiles are out of date: Profile X, Profile XY, and so on.

System Event error                 2928        One or more profiles are out of date and must be reindexed.

Exact Data Profile error           N/A         This profile is out of date and must be reindexed.

Memory requirements for EDM

Using EDM affects the hardware memory requirements for Symantec Data Loss Prevention deployments. In particular, EDM affects the memory required to index the data source as well as the memory required to load the index on the detection server.
Once you have established what your specific EDM memory requirements are, you can evaluate
how those requirements affect the general system requirements for your Data Loss Prevention
deployment. See the Symantec Data Loss Prevention System Requirements and Compatibility
Guide for details about general requirements and potential EDM deployment impact.

About memory requirements for EDM

The memory requirements for EDM are related to several factors, including:
■ Number of indexes you are building
■ Total size of the indexes
■ Number of cells in each index
■ Number of message chains
These size limitations apply to EDM indexes:
■ The maximum number of rows supported is 4,294,967,294.
■ The maximum number of supported cells is 6 billion.
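As a quick pre-check before you build an index, the two size limits above can be expressed as a small Python function. This is a minimal sketch: the function name and the idea of checking planned row and column counts ahead of indexing are illustrative assumptions, not part of the product. Cells are counted as rows multiplied by columns.

```python
from __future__ import annotations

# Documented EDM index size limits (from the list above).
MAX_ROWS = 4_294_967_294
MAX_CELLS = 6_000_000_000


def check_edm_limits(rows: int, columns: int) -> list[str]:
    """Return the limit violations for a planned EDM data source.

    Hypothetical helper: cells are counted as rows * columns.
    An empty list means the planned index is within both limits.
    """
    problems = []
    cells = rows * columns
    if rows > MAX_ROWS:
        problems.append(f"{rows:,} rows exceeds the maximum of {MAX_ROWS:,}")
    if cells > MAX_CELLS:
        problems.append(f"{cells:,} cells exceeds the maximum of {MAX_CELLS:,}")
    return problems


if __name__ == "__main__":
    # 10 columns by 200 million rows (2 billion cells): within both limits.
    print(check_edm_limits(200_000_000, 10))
    # 10 columns by 700 million rows (7 billion cells): exceeds the cell limit.
    print(check_edm_limits(700_000_000, 10))
```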
Table 26-32 gives an overview of the steps that you can follow to determine and set memory
requirements for EDM.

Table 26-32 Workflow for determining memory requirements for EDM indexes

Step  Action  For more information

1  Determine the memory that is required to index the data source.
   See “Overview of configuring memory and indexing the data source for EDM” on page 580.

2  Increase the indexer memory according to your calculations.
   See “Determining requirements for both local and remote indexers for EDM” on page 580.

3  Determine the memory that is required to load the index on the detection server.
   See “Detection server memory requirements for EDM” on page 582.

4  Increase the detection server memory according to your calculations.
   See “Increasing the memory for the detection server (File Reader) for EDM” on page 584.

5  Repeat for each EDM index you want to deploy.

Overview of configuring memory and indexing the data source for EDM

Table 26-33 provides the steps for determining how much memory is needed to index the data
source.

Table 26-33 Memory requirements for indexing the data source for EDM

Step  Action  Details

1  Estimate the memory requirements for the indexer.
   See “Determining requirements for both local and remote indexers for EDM” on page 580.

2  Increase the indexer memory.
   The next step is to increase the memory allocated to the indexer. The procedure for increasing the indexer memory differs depending on whether you are using the EDM indexer local to the Enforce Server or the Remote EDM Indexer.

3  Restart the Symantec DLP Manager service.
   You must restart this service after you have changed the memory allocation.

4  Index the data source.
   The last step is to index the data source. You need to do this before you calculate remaining memory requirements.

See “Configuring Exact Data profiles for EDM” on page 534.

Determining requirements for both local and remote indexers for EDM

This topic provides an overview of memory requirements for both the EDM indexer that is local to the Symantec Data Loss Prevention Enforce Server and the Remote EDM Indexer.
With the default settings, both EDM indexers can index any data source with 500 million cells or less. For a data source with more than 500 million cells, each cell beyond the first 500 million requires an additional 3 bytes of memory to index.

You can schedule indexing for multiple indexes serially (at different times) or in parallel (at the
same time). When indexing serially, you need to allocate memory to accommodate the indexing
of the biggest index. When indexing in parallel, you need to allocate memory to accommodate
the indexing of all indexes that you are creating at that time.

Serial indexing
If you create the indexes serially (no two are created in parallel), the memory requirement is determined by the largest index. For the largest data source in Table 26-34 (2 billion cells), the additional memory is:
(2 billion cells – 0.5 billion default cells) x 3 bytes = 4.5 GB, rounded up to 5 GB of additional memory.
The total indexer memory is therefore 7 GB: the 2 GB (2048 MB) default memory for the Enforce Server plus the 5 GB of additional system memory.
Table 26-34 provides examples of how the data source size affects indexer memory requirements for serial indexing.

Table 26-34 Examples for indexer memory requirements-serial indexing for EDM

Data source size   Indexer memory requirement   Description

100 million cells  2048 MB (default)            No additional RAM is needed for the indexer.

500 million cells  2048 MB (default)            No additional RAM is needed for the indexer.

1 billion cells    4 GB                         If you have a single data source with 1 billion cells (for example, 10 columns by 100 million rows), you need extra system memory for 0.5 billion cells (1 billion cells – 0.5 billion default): 0.5 billion x 3 bytes = 1.5 GB of RAM (rounded up to 2 GB) to index the data source. This amount is added to the default indexer RAM allotment.

2 billion cells    7 GB                         If you have a single data source with 2 billion cells (for example, 10 columns by 200 million rows), you need extra system memory for 1.5 billion cells (2 billion cells – 0.5 billion default): 1.5 billion x 3 bytes = 4.5 GB of RAM (rounded up to 5 GB) to index the data source. This amount is added to the default indexer RAM allotment.
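The rows of Table 26-34 follow one formula, which this Python sketch reproduces. It assumes what the worked examples imply: 1 GB is treated as 10^9 bytes, the extra memory is rounded up to a whole GB, and the indexer default is 2 GB (2048 MB). The function name is illustrative only.

```python
DEFAULT_INDEXER_GB = 2          # 2048 MB default indexer memory
DEFAULT_CELLS = 500_000_000     # cells the default settings can index
BYTES_PER_CELL = 3              # extra bytes per cell beyond the default
GB = 10**9                      # the worked examples use decimal gigabytes


def indexer_memory_gb(cells: int) -> int:
    """Total indexer memory (GB) for one data source indexed on its own."""
    extra_cells = max(0, cells - DEFAULT_CELLS)
    extra_bytes = extra_cells * BYTES_PER_CELL
    extra_gb = -(-extra_bytes // GB)  # integer ceiling division (round up)
    return DEFAULT_INDEXER_GB + extra_gb


if __name__ == "__main__":
    for cells in (100_000_000, 500_000_000, 1_000_000_000, 2_000_000_000):
        print(f"{cells:>13,} cells -> {indexer_memory_gb(cells)} GB")
```

Integer arithmetic is used so that the rounding matches the table exactly (for example, 1.5 GB becomes 2 GB and 4.5 GB becomes 5 GB) without floating-point surprises.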

Parallel indexing with EDM

If you index the four data sources in Table 26-34 simultaneously (in parallel), you are indexing a total of 3.6 billion cells (0.1 + 0.5 + 1 + 2 billion), which is more than the 500 million cells that the default settings support. The additional memory required (3.6 billion cells – 0.5 billion cells provided by default) is as follows:
3.1 billion cells x 3 bytes = 9.3 GB, rounded up to 10 GB of additional memory.

As explained in detail later, you set wrapper.java.maxmemory to 12 GB. This memory requirement includes the 2048 MB default memory for the Enforce Server and the additional 10 GB of system memory from the calculation above.

Note: For CJK language indexes, or indexes that are predominantly multi-token, these formulas
should use a multiplier of 4 bytes instead of 3 bytes. In both of these cases, a 350-million cell
data source is supported by default.
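The difference between serial and parallel indexing, including the CJK and multi-token variant from the note above, can be sketched in Python as follows. Two assumptions are ours: 1 GB is treated as 10^9 bytes (as in the worked examples), and the note is read as replacing both the multiplier (4 bytes) and the default cell allowance (350 million cells). Function names are hypothetical; the total indexer memory is the 2 GB default plus the returned value.

```python
from __future__ import annotations

GB = 10**9  # decimal gigabytes, as in the worked examples


def extra_indexer_gb(cells: int, bytes_per_cell: int = 3,
                     default_cells: int = 500_000_000) -> int:
    """Additional indexer memory (whole GB, rounded up) beyond the 2 GB default."""
    extra_bytes = max(0, cells - default_cells) * bytes_per_cell
    return -(-extra_bytes // GB)


def serial_extra_gb(index_cells: list[int], **kwargs) -> int:
    """Serial indexing: size the memory for the largest single index."""
    return max(extra_indexer_gb(c, **kwargs) for c in index_cells)


def parallel_extra_gb(index_cells: list[int], **kwargs) -> int:
    """Parallel indexing: size the memory for all cells indexed at the same time."""
    return extra_indexer_gb(sum(index_cells), **kwargs)


if __name__ == "__main__":
    sources = [100_000_000, 500_000_000, 1_000_000_000, 2_000_000_000]
    print("serial extra GB:", serial_extra_gb(sources))
    print("parallel extra GB:", parallel_extra_gb(sources))
    # Assumed CJK / multi-token settings: 4 bytes per cell, 350 million default cells.
    print("parallel extra GB (CJK):",
          parallel_extra_gb(sources, bytes_per_cell=4, default_cells=350_000_000))
```

For the four data sources in Table 26-34, this gives 5 GB of extra memory when indexing serially and 10 GB when indexing in parallel, which with the 2 GB default matches the 12 GB wrapper.java.maxmemory value above.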

Detection server memory requirements for EDM

The detection server should not use more than 60% of the memory of the computer. For
example, if your detection server needs 6 GB memory to run, make sure you have 10 GB on
that server.
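The 60% guideline translates into a minimum amount of physical memory for the host. In this Python sketch the requirement is expressed in GB and the result is rounded up to a whole GB; both are illustrative choices, and the function name is hypothetical. Exact fractions are used so that values such as 37.2 GB do not pick up floating-point error before rounding.

```python
import math
from fractions import Fraction

# The detection server should use at most 60% of the computer's memory.
MAX_SHARE = Fraction(60, 100)


def min_host_memory_gb(required_gb: float) -> int:
    """Smallest whole number of GB of host memory that keeps the detection
    server's requirement at or below 60% of the total."""
    return math.ceil(Fraction(str(required_gb)) / MAX_SHARE)


if __name__ == "__main__":
    print(min_host_memory_gb(6))  # the example in the text: 6 GB needs 10 GB
```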

Default configuration for a detection server

The default configuration for a detection server has a 4 GB Java heap and 8 message chains. Use the following formulas and Table 26-35 to calculate your actual memory requirements. In addition, you can use the spreadsheet provided at the Symantec Support Center at http://www.symantec.com/docs/DOC8255.html to determine your actual memory requirements.
See “Using the EDM Memory Requirements Spreadsheet” on page 585.
To load the index, the detection server needs 13 bytes of system memory per cell, plus 1 GB of Java heap memory for each message chain in the detection server. The following examples show scenarios for a customer who has three indexes that are all under the same schedule.
For Java heap memory requirements, the formula is:
Java heap memory requirement = the number of message chains * 1 GB.
For system memory requirements, the general formula is:
System memory requirement = number of cells * 13 bytes.
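The two formulas above can be written as a short Python sketch. The figures in Table 26-35 match binary gigabytes (2^30 bytes), so the sketch uses that unit; the function names are hypothetical. Because the sketch converts the total number of cells in one step, Example 2 comes out at about 37.5 GB rather than the table's 37.2 GB, which adds three individually rounded values.

```python
from __future__ import annotations

GIB = 2**30             # Table 26-35's figures match binary gigabytes
BYTES_PER_CELL = 13     # system memory per loaded cell
HEAP_GB_PER_CHAIN = 1   # Java heap per message chain


def java_heap_gb(message_chains: int) -> int:
    """Java heap memory requirement: number of message chains * 1 GB."""
    return message_chains * HEAP_GB_PER_CHAIN


def system_memory_gb(index_cells: list[int]) -> float:
    """System memory requirement: total cells of all loaded indexes * 13 bytes."""
    return sum(index_cells) * BYTES_PER_CELL / GIB


if __name__ == "__main__":
    # Example 2 from Table 26-35: three indexes and 24 message chains.
    cells = [100_000_000, 1_000_000_000, 2_000_000_000]
    print(f"Java heap: {java_heap_gb(24)} GB")
    print(f"System memory: {system_memory_gb(cells):.1f} GB")
```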

Detection Server memory settings for EDM

The Advanced Server Settings property for the number of message chains is:
MessageChain.NumChains.

The Java heap memory settings for a detection server are set in the Enforce Server administration console at the Server Detail - Advanced Server Settings page, using the BoxMonitor.FileReaderMemory property. The format is -Xrs -Xms1200M -Xmx4G. You do not need to change the system memory setting, but make sure that the detection server has enough free memory available.

Note: When you update this setting, change only the -Xmx value in this property. For example, change only "4G" to a new value, and leave all other values the same.
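If you prepare new values for this setting ahead of time, for example while working through the calculations for several detection servers, a small helper can enforce the rule in the note above by rewriting only the -Xmx token. This Python sketch operates on the property's string value only; entering the value in the administration console is still a manual step, and the helper name is hypothetical.

```python
import re

# Matches the maximum-heap flag, for example "-Xmx4G" or "-Xmx4096M".
XMX_PATTERN = re.compile(r"-Xmx\d+[KkMmGg]?")


def set_max_heap(file_reader_memory: str, new_value: str) -> str:
    """Return the BoxMonitor.FileReaderMemory value with only -Xmx changed.

    new_value is a size such as "28G"; -Xrs, -Xms, and any other flags
    are left exactly as they were.
    """
    if not re.fullmatch(r"\d+[KkMmGg]?", new_value):
        raise ValueError(f"unexpected heap size: {new_value!r}")
    updated, count = XMX_PATTERN.subn(f"-Xmx{new_value}", file_reader_memory, count=1)
    if count != 1:
        raise ValueError("no -Xmx flag found in the setting")
    return updated


if __name__ == "__main__":
    print(set_max_heap("-Xrs -Xms1200M -Xmx4G", "28G"))  # -Xrs -Xms1200M -Xmx28G
```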

The examples in Table 26-35 show the settings for five different situations.

Table 26-35 EDM detection server Java heap memory settings and additional system memory examples

Example 1: Single small index with 2 million cells to load
Java heap memory requirement: 2 * 1 GB = 2 GB
System memory requirement: 2 million * 13 bytes = 25 MB
BoxMonitor.FileReaderMemory setting: -Xmx6G
Additional system memory required: 25 MB

Example 2: 3 indexes when running 24 chains:
■ Index 1: 100 million cells
■ Index 2: 1 billion cells
■ Index 3: 2 billion cells
Java heap memory requirement: 24 * 1 GB = 24 GB
System memory requirement:
For the 100 million cell index: 100 million * 13 bytes = 1.2 GB
For the 1 billion cell index: 1 billion * 13 bytes = 12 GB
For the 2 billion cell index: 2 billion * 13 bytes = 24 GB
Total system memory requirement: 1.2 GB + 12 GB + 24 GB = 37.2 GB
BoxMonitor.FileReaderMemory setting: -Xmx28G
Additional system memory required: 37.2 GB

Example 3: One single index with 5 billion cells and 24 message chains
Java heap memory requirement: 24 * 1 GB = 24 GB
System memory requirement: 5 billion * 13 bytes = 60.5 GB
BoxMonitor.FileReaderMemory setting: -Xmx28G
Additional system memory required: 60.5 GB

Example 4: One single index with 1.6 billion cells and 24 message chains
Java heap memory requirement: 24 * 1 GB = 24 GB
System memory requirement: 1.6 billion * 13 bytes = 19.3 GB
BoxMonitor.FileReaderMemory setting: -Xmx28G
Additional system memory required: 19.3 GB

Example 5: One single index with 500 million cells and 8 message chains
Java heap memory requirement: 8 * 1 GB = 8 GB
System memory requirement: 500 million * 13 bytes = 6.1 GB
BoxMonitor.FileReaderMemory setting: -Xmx12G
Additional system memory required: 6.1 GB

Increasing the memory for the detection server (File Reader) for EDM
This topic provides instructions for increasing the File Reader memory allocation for a detection
server. These instructions assume that you have performed the necessary calculations.
To increase the memory for detection server processing
1 In the Enforce Server administration console, navigate to the Server Detail - Advanced
Server Settings screen for the detection server where the EDM index is deployed or to
be deployed.
2 Locate the following setting: BoxMonitor.FileReaderMemory.

Using the EDM Memory Requirements Spreadsheet

The EDM Memory Requirements Spreadsheet is a tool that you can use to determine the
additional system memory needed on the detection server to run your indexes. It is available
as an Excel spreadsheet on the Symantec Support Center at:
https://support.symantec.com/en_US/article.DOC8255.html
Figure 26-1 shows an example of the spreadsheet with four message chains and three indexes.

Figure 26-1 EDM Memory Requirements Spreadsheet

To compute the additional system memory required to run your indexes, enter the following information:
1. Obtain the number of cells in each index (you can specify up to 10 indexes).
2. Enter that number into # of cells in Index.
When you change any value, the spreadsheet updates the Required RAM field.
The value in the Required RAM field is the additional system memory that is required to run
the indexes specified.

Remote EDM indexing

An EDM index maps the data you want to protect to the Exact Data profile. The typical EDM
workflow for creating the EDM index is to upload the data source file to the Enforce Server,
create the Exact Data profile, and index the data source. Instead of uploading the data source
file to the Enforce Server for indexing, you can index the data source locally and securely using
the Remote EDM Indexer.
See “About the Exact Data Profile and index” on page 528.

For example, if copying the confidential data source file to the Enforce Server presents a
potential security or logistical issue, you can use the Remote EDM Indexer to create the
cryptographic index directly on the data source host before moving the index to the Enforce
Server. If you are upgrading to the latest Symantec Data Loss Prevention version, you may
want to use the Remote EDM Indexer to update your existing EDM indexes.
See “About the Remote EDM Indexer” on page 586.
See “About the SQL Preindexer for EDM” on page 586.
The Remote EDM Indexer is a standalone tool that lets you index the data source file directly
on the data source host.
See “System requirements for remote EDM indexing” on page 587.

About the Remote EDM Indexer

The Remote EDM Indexer utility converts a data source file to an EDM index. The utility is
similar to the local EDM Indexer used by the Enforce Server. However, the Remote EDM
Indexer is designed for use on a computer that is not part of the Symantec Data Loss Prevention
server configuration.
Using the Remote EDM Indexer to index a data source on a remote machine has the following
advantages over using the EDM Indexer on the Enforce Server:
■ It enables the owner of the data, rather than the Symantec Data Loss Prevention
administrator, to index the data.
■ It shifts the system load that is required for indexing onto another computer, which leaves the CPU and RAM on the Enforce Server available for other tasks.
See “About the SQL Preindexer for EDM” on page 586.
See “Workflow for remote EDM indexing” on page 587.

About the SQL Preindexer for EDM

You use the SQL Preindexer utility with the Remote EDM Indexer to run SQL queries against
Oracle databases and pipe the resulting data to the Remote EDM Indexer for indexing.
See “System requirements for remote EDM indexing” on page 587.
The SQL Preindexer utility is installed in the C:\Program Files\Symantec\DataLossPrevention\ServerPlatformCommon\Indexer\15.1\Protect\bin directory during installation of the Remote EDM Indexer. Together with the Remote EDM Indexer, the SQL Preindexer lets you generate an index directly from an Oracle SQL database: the SQL Preindexer runs the database query and passes the query results to the standard input of the Remote EDM Indexer utility.
To use the SQL Preindexer, the data source must be relatively clean, since the query result data is piped directly to the Remote EDM Indexer.

See “About the Remote EDM Indexer” on page 586.

System requirements for remote EDM indexing

The Remote EDM Indexer runs on the Windows and Linux operating system versions that are
supported for Symantec Data Loss Prevention servers. See the Symantec Data Loss Prevention
System Requirements and Compatibility Guide for more information about operating system
support.
The SQL Preindexer supports Oracle databases and requires a relatively clean data source.
See “About the SQL Preindexer for EDM” on page 586.
The RAM requirements for using the Remote EDM Indexer vary according to the size of the
data source being indexed and the number of multi-token columns in the data source.
See “Memory requirements for EDM” on page 579.

Workflow for remote EDM indexing

This section summarizes the steps to index a data file on a remote machine and then use the
index in Symantec Data Loss Prevention.
See “About the Exact Data Profile and index” on page 528.

Table 26-36 Steps to use the Remote EDM Indexer

Step    Action and description

Step 1  Install the Remote EDM Indexer on a computer that is not part of the Symantec Data Loss Prevention system.
        See “Installing the Remote EDM Indexer” on page 588.

Step 2  Create an Exact Data Profile on the Enforce Server to use with the Remote EDM Indexer.
        On the Enforce Server, generate an EDM Profile template using the *.edm file name extension and specifying the exact number of columns to be indexed.
        See “Creating an EDM profile template for remote indexing” on page 589.

Step 3  Copy the Exact Data Profile file to the computer where the Remote EDM Indexer resides.
        Download the profile template from the Enforce Server and copy it to the remote data source host computer.
        See “Downloading and copying the EDM profile file to a remote system” on page 591.

Step 4  Run the Remote EDM Indexer and create the index files.
        If you have a cleansed data source file, use the RemoteEDMIndexer with the -data, -profile, and -result options.
        If the data source is an Oracle database, use the SqlPreindexer and the RemoteEDMIndexer to index the data source directly with the -alias (Oracle DB host), -username and -password credentials, and the -query string or -query_path option.
        See “Generating remote index files for EDM” on page 591.

Step 5  Copy the index files from the remote machine to the Enforce Server.
        Copy the resulting *.pdx and *.rdx files from the remote machine to the Enforce Server host at C:\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index.
        See “Copying and loading remote EDM index files to the Enforce Server” on page 594.

Step 6  Load the index files into the Enforce Server.
        Update the EDM profile by loading the externally generated index.
        Submit the profile for indexing.
        See “Copying and loading remote EDM index files to the Enforce Server” on page 594.
        The ExternalDataSource.<name>.rdx and *.pdx files are removed from the index directory and replaced by the file DataSource.<profile id>.<version>.rdxver.
        See “Troubleshooting remote indexing errors for EDM” on page 599.

Step 8  Create a policy with an EDM condition.
        You should see the column data for defining the EDM condition.
        See “Configuring the Content Matches Exact Data policy condition for EDM” on page 551.

Installing the Remote EDM Indexer

You install the Remote EDM Indexer on one or more systems where the confidential files you
want to index are stored. The process for installing a remote indexer is the same for EMDI,
EDM, and IDM.

See “About installing remote indexers” on page 589.

You can install the Remote EDM Indexer on all of the supported Windows and Linux platforms.
See the Symantec Data Loss Prevention System Requirements Guide for platform details.

Creating an EDM profile template for remote indexing

The EDM Indexer uses an Exact Data Profile when it runs to ensure that the data is correctly
formatted. You must create the Exact Data Profile before you use the Remote EDM Indexer.
The profile is a template that describes the columns that are used to organize the data. The
profile does not need to contain any data. After creating the profile, copy it to the computer
that runs the Remote EDM Indexer.
See “About the Exact Data Profile and index” on page 528.
To create an EDM profile for remote indexing
1 From the Enforce Server administration console, navigate to the Manage > Data Profiles
> Exact Data screen.
2 Click Add Exact Data Profile.
3 In the Name field, enter a name for the profile.
4 In the Data Source field, select Use This File Name, and enter the name of the index
file to create with the *.edm extension.
You must select this option because you are only creating the profile template at this point. Later, you index the data source against this profile using the Remote EDM Indexer.
Enter the file name of the data source you plan to create for remote EDM indexing. Be sure to name the data source file exactly the same as the name you enter here.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
Once you have copied the generated remote index back to the Enforce Server, you use the Load Externally Generated Index option to load the remote index into the profile template.
See “Copying and loading remote EDM index files to the Enforce Server” on page 594.
5 In the Number of Columns text box, specify the number of columns in the data source
to be indexed.
For remote EDM indexing purposes, you must specify the exact Number of Columns the index is to have. Make sure that the data source file contains exactly the number of columns that you specify here.
See “Uploading exact data source files for EDM to the Enforce Server” on page 539.
6 If the first row of the data source contains the column names, select the option Read first
row as column names.

7 In the Error Threshold text box, enter the maximum percentage of rows that can contain
errors.
If, during indexing of the data source, the number of rows with errors exceeds the
percentage that you specify here, the indexing operation fails.
8 In the Column Separator Char field, select the type of character that is used in your data
source to separate the columns of data.
9 In the File Encoding field, select the character encoding that is used in your data source.
If Latin characters are used, select the ISO-8859-1 option. For East Asian languages, use
either the UTF-8 or UTF-16 options.
10 Click Next to map the column headings from the data source to the profile.
11 In the Field Mappings section, map the Data Source Field to the System Field for each
column by selecting the column name from the System Field drop-down list.
The Data Source Field lists the number of columns you specified at the previous screen.
The System Field contains a list of standard column headings. If any of the column
headings in your data source match the choices available in the System Field list, map
each accordingly. Be sure that you match the selection in the System Field column to its
corresponding numbered column in the Data Source Field.
For example, for a data source that you have specified in the profile as having three
columns, the mapping configuration may be:

Data Source Field    System Field
Col 1                First Name
Col 2                Last Name
Col 3                Social Security Number

12 If a Data Source Field does not map to a heading value in the options available from the
System Field column, click the Advanced View link.
In the Advanced View the system displays a Custom Name column beside the System
Field column.
Enter the correct column name in the text box that corresponds to the appropriate column
in the data source.
Optionally, you can specify the data type for the Custom Name you entered by selecting
the data type from the Type drop-down list. These data types are system-defined. Click
the description link beside the Type name for details on each system-defined data type.

13 If you intend to use the Exact Data Profile to implement a policy template that contains
one or more EDM rules, you can validate your profile mappings for the template. To do
this, select the template from the Check mappings against policy template drop-down
list and click Check now. The system indicates any unmapped fields that the template
requires.
14 Do not select any Indexing option available at this screen, since you intend to index
remotely.
15 Click Finish to complete the profile creation process.
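Because indexing fails when the share of malformed rows exceeds the Error Threshold, you may want to check a data source file against the Number of Columns and Column Separator Char settings before you run the Remote EDM Indexer. This is a minimal Python sketch that assumes a plain delimited text file with no quoted or escaped separators; it is a rough, hypothetical pre-check and does not reproduce the EDM Indexer's own validation rules. The function name and sample data are made up for illustration.

```python
from __future__ import annotations


def precheck_rows(lines: list[str], separator: str, expected_columns: int,
                  error_threshold_pct: float,
                  skip_header: bool = False) -> tuple[int, float, bool]:
    """Count rows whose column count differs from the profile's Number of Columns.

    Returns (bad_rows, bad_percentage, within_threshold).
    """
    rows = [line.rstrip("\r\n") for line in lines]
    if skip_header and rows:
        rows = rows[1:]              # "Read first row as column names"
    rows = [r for r in rows if r]    # ignore empty lines
    if not rows:
        return 0, 0.0, True
    bad = sum(1 for r in rows if len(r.split(separator)) != expected_columns)
    pct = 100.0 * bad / len(rows)
    return bad, pct, pct <= error_threshold_pct


if __name__ == "__main__":
    data = [
        "First Name|Last Name|SSN\n",
        "Ada|Lovelace|111-22-3333\n",
        "Alan|Turing\n",              # missing a column
        "Grace|Hopper|444-55-6666\n",
    ]
    print(precheck_rows(data, "|", 3, error_threshold_pct=5, skip_header=True))
```

In practice you would pass the lines of the data source file, opened with the same encoding that you select in the File Encoding field.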

Downloading and copying the EDM profile file to a remote system

To download and copy the EDM profile to a remote system
1 Configure an Exact Data Profile.
See “Creating an EDM profile template for remote indexing” on page 589.
2 Download the EDM profile by selecting the download profile link at the Manage > Data
Profiles > Exact Data screen.
The system prompts you to save the EDM profile as a file. The file extension is *.edm.
3 Save the file.
If the data source host computer where you intend to run the Remote EDM Indexer is available on the same subnet as the Enforce Server, you can browse to that computer and select it as the destination. Otherwise, manually copy the profile to the remote system.
4 Use the profile to index the data source using the Remote EDM Indexer.
See “Generating remote index files for EDM” on page 591.

Generating remote index files for EDM

You use the command-line Remote EDM Indexer utility to generate an EDM index for importing to the Enforce Server. You can use the Remote EDM Indexer to index a data source file that you have generated and cleansed. Or you can pipe the output from the SQL Preindexer to the standard input of the Remote EDM Indexer. The SQL Preindexer requires an Oracle DB data source and clean data.
When the indexing process completes, the Remote EDM Indexer generates several files in the specified result directory. These files are named after the data file that was indexed: one file has the .pdx extension, and the .rdx index is written as twelve files named ExternalDataSource.<DataSourceName>.rdx.0 through ExternalDataSource.<DataSourceName>.rdx.11.
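Before you copy the output to the Enforce Server, you can confirm that the result directory contains the full set of files. This Python sketch assumes the naming described above (one .pdx file and twelve .rdx.N files). The directory and data source name in the usage lines come from the example later in this section and are placeholders for your own values.

```python
from __future__ import annotations

from pathlib import Path


def expected_index_files(data_source_name: str) -> list[str]:
    """File names for a remote EDM index: one .pdx plus .rdx.0 through .rdx.11."""
    base = f"ExternalDataSource.{data_source_name}"
    return [f"{base}.pdx"] + [f"{base}.rdx.{i}" for i in range(12)]


def missing_index_files(result_dir: str | Path, data_source_name: str) -> list[str]:
    """Return the expected index files that are not present in result_dir."""
    folder = Path(result_dir)
    return [name for name in expected_index_files(data_source_name)
            if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_index_files(r"C:\EDMIndexDirectory", "CustomerData")
    print("all index files present" if not missing else f"missing: {missing}")
```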

Table 26-37 Options for generating remote EDM indexes

Use case: Remote EDM Indexer with data source file
Description: Specify the data source file, EDM profile, and output directory.
Remarks: Use when you have a cleansed data source file; use for upgrading to the latest version.
See “Remote indexing examples using data source file (EDM)” on page 592.

Use case: Remote EDM Indexer with SQL Preindexer
Description: Query the DB and pipe the output to stdin of the Remote EDM Indexer.
Remarks: Requires Oracle DB and clean data.
See “Remote indexing examples using SQL Preindexer (EDM)” on page 593.

Remote indexing examples using data source file (EDM)

To use the Remote EDM Indexer to index a flat data source file you have generated and
cleansed, you specify the local data source file name and path (-data), the local EDM profile
file name and path (-profile), and the output directory for the generated index files (-result).
The syntax for using the Remote EDM Indexer to generate an index from a cleansed data
source tabular text file is as follows:

RemoteEDMIndexer -data=<local data source filename and path>
-profile=<local *.edm profile file name and path>
-result=<local output directory for *.rdx and *.pdx index files>

For example:

RemoteEDMIndexer -data=C:\EDMIndexDirectory\CustomerData.dat
-profile=C:\EDMIndexDirectory\RemoteEDMProfile.edm
-result=C:\EDMIndexDirectory\

This command generates an EDM index using the local data source tabular text file
CustomerData.dat and the local RemoteEDMProfile.edm file that you generated and copied
from the Enforce Server to the remote host, where \EDMIndexDirectory is the directory for
placing the generated index files.
When the generation of the indexes is successful, the utility displays the message "Successfully
created index" as the last line of output.
In addition, the following index files are created and placed in the -result directory:
■ ExternalDataSource.CustomerData.pdx
■ ExternalDataSource.CustomerData.rdx

Twelve files, named ExternalDataSource.<DataSourceName>.rdx.0 through ExternalDataSource.<DataSourceName>.rdx.11, are always generated. Copy these files to the Enforce Server and update the EDM profile using the remote index.
See “Remote EDM Indexer command options” on page 597.
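If you drive the Remote EDM Indexer from a script, for example as part of the scheduled job described later in this section, you can build the command as an argument list and judge success by the last line of output, as described above. This is a minimal Python sketch. It assumes that RemoteEDMIndexer is on the PATH and that the success message may appear on either output stream, so stderr is merged into stdout. The function names are hypothetical, and the sketch does not rely on the tool's exit code, which this guide does not document.

```python
from __future__ import annotations

import subprocess

SUCCESS_MESSAGE = "Successfully created index"


def remote_edm_indexer_cmd(data: str, profile: str, result: str,
                           exe: str = "RemoteEDMIndexer") -> list[str]:
    """Build the argument list for indexing a cleansed data source file."""
    return [exe, f"-data={data}", f"-profile={profile}", f"-result={result}"]


def indexing_succeeded(output: str) -> bool:
    """True if the last non-empty line of output is the success message."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1].startswith(SUCCESS_MESSAGE)


def run_remote_edm_indexer(data: str, profile: str, result: str) -> bool:
    """Run the indexer and report whether it printed the success message last."""
    proc = subprocess.run(remote_edm_indexer_cmd(data, profile, result),
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True)
    return indexing_succeeded(proc.stdout)
```

Called with the paths from the example command above, run_remote_edm_indexer() returns True only when the indexer's final line of output is the success message. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.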

Remote indexing examples using SQL Preindexer (EDM)

If your data source is an Oracle DB and has clean data, you can index the data source directly
using the SQL Preindexer with the Remote EDM Indexer.
The syntax is as follows:

SqlPreindexer -alias=<oracle connect string: //host:port/SID>
-username=<DB user> -password=<DB password> -query=<sql to run> |
RemoteEDMIndexer -profile=<*.edm profile file name and path>
-result=<output directory for index files>

For example:

SqlPreindexer -alias=@//myhost:1521/orcl -username=scott -password=tiger
-query="SELECT name, salary FROM employee" |
RemoteEDMIndexer -profile=C:\ExportEDMProfile.edm -result=C:\EDMIndexDirectory\

With this command, the SQL Preindexer utility connects to the Oracle database and runs the SQL query to retrieve name and salary data from the employee table. The SQL query must be in quotes. The SQL Preindexer writes the result of the query to standard output (stdout), and the pipe (|) passes that output to the Remote EDM Indexer. The Remote EDM Indexer reads the query result from standard input (stdin) and indexes the data using the ExportEDMProfile.edm profile, as specified by the profile file name and local file path.
When the generation of the indexes is successful, the utility displays the message "Successfully
created index" as the last line of output.
In addition, the utility places the following generated index files in the EDMIndexDirectory
-result directory:
■ ExternalDataSource.CustomerData.pdx
■ ExternalDataSource.CustomerData.rdx
Here is another example using SQL Preindexer and Remote EDM Indexer commands:

SqlPreindexer -alias=@//localhost:1521/CUST -username=cust_user -password=cust_pword
-query="SELECT account_id, amount_owed, available_credit FROM customer_account" -verbose |
RemoteEDMIndexer -profile=C:\EDMIndexDirectory\CustomerData.edm
-result=C:\EDMIndexDirectory\ -verbose

Here the SQL Preindexer command queries the CUST.customer_account table in the database for the account_id, amount_owed, and available_credit columns. The result is piped to the Remote EDM Indexer, which generates the index files based on the CustomerData.edm profile. The -verbose option is used for troubleshooting.
As an alternative to the -query SQL string, you can use the -query_path option and specify the file path and name for the SQL query (*.sql). If you do not specify a query or query path, the entire DB is queried.

SqlPreindexer -alias=@//localhost:1521/cust -username=cust_user -password=cust_pwrd
-query_path=C:\EDMIndexDirectory\QueryCust.sql -verbose |
RemoteEDMIndexer -profile=C:\EDMIndexDirectory\CustomerData.edm
-result=C:\EDMIndexDirectory\ -verbose

See “SQL Preindexer command options (EDM)” on page 595.
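The SqlPreindexer | RemoteEDMIndexer pipe can also be reproduced without a shell, for example from a scheduled script. This Python sketch builds the SQL Preindexer arguments under the rules in this section (-alias and -username required, -password optional) and connects the two processes with a pipe. It assumes that both utilities are on the PATH and that each list element reaches the program as a single argument, so the shell quotes shown around the query in the examples are not added. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def sql_preindexer_cmd(alias: str, username: str, password: str | None = None,
                       query: str | None = None, query_path: str | None = None,
                       exe: str = "SqlPreindexer") -> list[str]:
    """Build SqlPreindexer arguments; -alias and -username are required."""
    if not alias or not username:
        raise ValueError("-alias and -username are required")
    cmd = [exe, f"-alias={alias}", f"-username={username}"]
    if password is not None:
        cmd.append(f"-password={password}")
    if query is not None:
        # One list element, so the query needs no extra shell quoting here.
        cmd.append(f"-query={query}")
    if query_path is not None:
        cmd.append(f"-query_path={query_path}")
    return cmd


def run_pipeline(producer: list[str], consumer: list[str]) -> int:
    """Run `producer | consumer` without a shell; return the consumer's exit code."""
    with subprocess.Popen(producer, stdout=subprocess.PIPE) as prod:
        with subprocess.Popen(consumer, stdin=prod.stdout) as cons:
            prod.stdout.close()  # the consumer now holds the only read end
            return cons.wait()
```

For example, run_pipeline(sql_preindexer_cmd("@//myhost:1521/orcl", "scott", query="SELECT name, salary FROM employee"), ["RemoteEDMIndexer", "-profile=C:\\ExportEDMProfile.edm", "-result=C:\\EDMIndexDirectory\\"]) mirrors the first example above. A password that is passed on the command line can be visible to other users of the host in process listings, so consider omitting -password where an interactive prompt is acceptable.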

Copying and loading remote EDM index files to the Enforce Server
The following files are created in the -result directory when you remotely index a data source:
■ ExternalDataSource.<DataSourceName>.pdx
■ ExternalDataSource.<DataSourceName>.rdx.0 through ExternalDataSource.<DataSourceName>.rdx.11

After you create the index files on a remote machine, the files must be copied to the Enforce
Server, loaded into the previously created remote EDM profile, and indexed.
See “Creating an EDM profile template for remote indexing” on page 589.
To copy and load the files on the Enforce Server
1 Go to the directory where the index files were generated. (This directory is the one specified
in the -result option.)
2 Copy all of the index files with .pdx and .rdx extensions to the index directory on the
Enforce Server. This directory is located at
C:\ProgramData\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Index
(Windows) or /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/index
(Linux).
3 From the Enforce Server administration console, navigate to the Manage > Policies >
Exact Data screen.
This screen lists all the Exact Data Profiles in the system.
4 Click the name of the Exact Data Profile you used with the Remote EDM Indexer.
5 To load the new index files, go to the Data Source section of the Exact Data Profile and
select Load Externally Generated Index.

6 In the Indexing section, select Submit Indexing Job on Save.

As an alternative to indexing immediately on save, consider scheduling a job on the remote
machine to run the Remote EDM Indexer on a regular basis. The job should also copy
the generated files to the index directory on the Enforce Server. You can then schedule
loading the updated index files on the Enforce Server from the profile by selecting Load
Externally Generated Index and Submit Indexing Job on Schedule and configuring
an indexing schedule.
See “Use scheduled indexing to automate profile updates (EDM)” on page 607.
7 Click Save.
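Steps 1 and 2 above can be scripted, for example as part of the scheduled job that step 6 suggests. This is a minimal Python sketch, assuming the Enforce Server index directory is reachable from the remote machine as a local or mounted path (how you transfer the files is up to you). The function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def copy_index_files(result_dir: str, index_dir: str) -> List[str]:
    """Copy the .pdx and .rdx.N files from result_dir into index_dir."""
    src, dest = Path(result_dir), Path(index_dir)
    copied = []
    for path in sorted(src.iterdir()):
        # Matches names such as X.pdx and X.rdx.0 ... X.rdx.11; skips everything else.
        if path.is_file() and (path.suffix == ".pdx" or ".rdx" in path.suffixes):
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied
```

Steps 3 through 7 still take place in the Enforce Server administration console.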

SQL Preindexer command options (EDM)

On install, the SQL Preindexer utility is available at C:\Program Files\Symantec\Data Loss
Prevention\Indexer\15.1\Protect\bin (Windows) and
/opt/Symantec/DataLossPrevention/Indexer/15.1/Protect/bin (Linux).

The SQL Preindexer provides a command-line interface. The syntax for running the utility is
as follows:

SqlPreindexer -alias=<@//oracle_host:port/SID> -username=<DB_user> [options]

Note the following about the arguments:

■ The SQL Preindexer requires the -alias and -username arguments.
■ If you omit the -password option, the user is prompted to enter it.
■ If you use the -query option, the SQL query string must be in quotes.
■ If you omit the -query option, the utility indexes the entire database.
■ To query using wildcards, use the -query_path option. The SQL Preindexer does not
support wildcards in a query that is passed on the command line with the -query option. For
example, "select * from CUST_DATA" does not work with -query; instead, you must list each
column field: "select cust_ID, cust_Name, cust_SSN from CUST_DATA". The query "select
* from CUST_DATA" does work when you supply it with the -query_path option.
See “Remote indexing examples using SQL Preindexer (EDM)” on page 593.
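The notes above can be turned into a quick pre-flight check before you run the utility. This Python sketch is hypothetical (it is not part of the product): it takes the options as a dictionary with the leading "-" removed and reports the problems the notes describe. The wildcard test is crude and treats any "*" in -query as a wildcard.

```python
from typing import Dict, List, Optional


def check_sqlpreindexer_options(options: Dict[str, Optional[str]]) -> List[str]:
    """Return messages for SqlPreindexer options; keys omit the leading '-'."""
    messages = []
    for required in ("alias", "username"):
        if not options.get(required):
            messages.append(f"-{required} is required")
    query = options.get("query")
    # Crude check: any '*' in -query is treated as a wildcard, even in arithmetic.
    if query and "*" in query:
        messages.append("wildcards are not supported with -query; use -query_path")
    if not query and not options.get("query_path"):
        messages.append("no -query or -query_path: the entire database will be indexed")
    return messages
```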
Table 26-38 lists the command options for the SQL Preindexer.

Table 26-38 SQL Preindexer command options (EDM)

■ -alias (Oracle DB connect string; required)
Specifies the database alias that is used to connect to the database, in the following format:
@//oracle_DB_host:port/SID
For example:
-alias=@//myhost:1521/ORCL
-alias=@//localhost:1521/CUST
■ -driver (Oracle JDBC driver class)
Specifies the JDBC driver class, for example: oracle.jdbc.driver.OracleDriver.
■ -encoding (Character encoding; default: iso-8859-1)
Specifies the character encoding of the data to index. The default is iso-8859-1. Data with
non-English characters should use UTF-8 or UTF-16.
■ -password (Oracle DB password)
Specifies the password to the database. If this option is not specified, the password is read
from stdin.
■ -query (SQL query)
Specifies the SQL query to perform. The statement must be enclosed in quotes. If you omit the
-query option, the utility indexes the entire database.
■ -query_path (SQL script)
Specifies the full local path and file name of a file that contains a SQL query to run. This
option can be used as an alternative to the -query option when the query is a long SQL statement.
■ -separator (Output column separator; default: tab)
Specifies whether the output column separator is a comma, pipe, or tab. The default separator
is a tab. To specify a comma separator or pipe separator, enclose the separator character in
quotation marks: "," or "|".
■ -subprotocol (Oracle thin driver)
Specifies the JDBC connect string subprotocol (for example, oracle:thin).
■ -username (Oracle DB user; required)
Specifies the name of the database user.
■ -verbose (Print verbose output for debugging)
Displays a statistical summary of the operation when it is complete.

See “Troubleshooting preindexing errors for EDM” on page 598.


Remote EDM Indexer command options

On install, the Remote EDM Indexer utility is available at \Program Files\Symantec\Data
Loss Prevention\Indexer\15.1\Protect\bin (Windows) and
/opt/Symantec/DataLossPrevention/Indexer/15.1/Protect/bin (Linux).

If you are on Linux, change users to the “SymantecDLP” user before running the Remote EDM
Indexer. (The installation program creates the “SymantecDLP” user.)
The Remote EDM Indexer provides a command line interface. The syntax for running the utility
is as follows:

RemoteEDMIndexer -profile=<file *.edm> -result=<out_dir> [options]

Note the following about the syntax:

■ The Remote EDM Indexer requires the -profile and -result arguments.
■ If you use a flat data source file as input, you must specify the file name and local path
using the -data option.
■ The -data option is omitted when you use the SQL Preindexer to pipe the data to the
Remote EDM Indexer.
See “Remote indexing examples using data source file (EDM)” on page 592.
Table 26-39 describes the command options for the Remote EDM Indexer.

Table 26-39 Remote EDM Indexer command options

■ -data (Data source to be indexed; default: stdin; required if you use a tabular text file)
Specifies the data source to be indexed. If this option is not specified, the utility reads data
from stdin. This option is required if you use a data source file and not the SQL Preindexer.
■ -encoding (Character encoding of the data to be indexed; default: ISO-8859-1)
Specifies the character encoding of the data to index. The default is ISO-8859-1. Use UTF-8 or
UTF-16 if the data contains non-English characters.
■ -ignore_date (Ignore the expiration date of the EDM profile)
Overrides the expiration date of the Exact Data Profile if the profile has expired. (By default,
an Exact Data Profile expires after 30 days.)
■ -profile (File containing the EDM profile; required)
Specifies the Exact Data Profile to be used. This profile is the one that you obtain by clicking
the “download link” on the Exact Data screen in the Enforce Server management console.
■ -result (Directory to place the resulting indexes; required)
Specifies the directory where the index files are generated.
■ -verbose (Display verbose output)
Displays a statistical summary of the indexing operation when the index is complete.

See “Troubleshooting preindexing errors for EDM” on page 598.

Troubleshooting preindexing errors for EDM

If you receive an error that the SQL Preindexer was unable to perform the query or failed to
prepare for indexing, verify that the -query string is in quotes. You can test your -query string
by running only the SQL Preindexer command. If the command is correct, the data queried from
the database is written to stdout and displayed in the console.
You may encounter errors when you index large amounts of data. Often the set of data contains
a data record that is incomplete, inconsistent, or inaccurate. Data rows that contain more
columns than expected or incorrect column data types often cannot be properly indexed and
are not recognized.
The SQL Preindexer can be configured to provide a summary of information about the indexing
operation when it completes. To do so, specify the -verbose option when running the SQL
Preindexer.
To see the rows of data that the Remote EDM Indexer did not index, adjust the configuration
in the Indexer.properties file using the following procedure.
To record those data rows that were not indexed
1 Locate the Indexer.properties file at \Program Files\Symantec\Data Loss
Prevention\Indexer\15.1\Protect\config\Indexer.properties (Windows) or
/opt/Symantec/DataLossPrevention/Indexer/15.1/Protect/config/Indexer.properties
(Linux).
2 Open the file in a text editor.
3 Locate the create_error_file property and change the “false” setting to “true.”
4 Save and close the Indexer.properties file.
The Remote EDM Indexer logs errors in a file with the same name as the data file being
indexed and the .err suffix.
The rows of data that are listed in the error file are not encrypted. Safeguard the error file
to minimize any security risk from data exposure.
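The edit in steps 3 and 4 can also be scripted, for example when you manage several indexer hosts. This Python sketch assumes the property appears on its own line in key = value (or key=value) form with the value false; the helper names are hypothetical, and writing a .bak copy first is a precaution of this sketch rather than a documented requirement.

```python
import re
from pathlib import Path

# Assumed line shape: "create_error_file = false" (spacing and '=' or ':' may vary).
_ERROR_FILE_FALSE = re.compile(
    r"^([ \t]*create_error_file[ \t]*[=:][ \t]*)false[ \t]*$", re.MULTILINE)


def enable_error_file(properties_text: str) -> str:
    """Return the text with create_error_file switched from false to true."""
    return _ERROR_FILE_FALSE.sub(r"\g<1>true", properties_text)


def enable_error_file_in_place(path: str) -> None:
    """Keep a .bak copy of the properties file, then rewrite it with the property enabled."""
    target = Path(path)
    text = target.read_text()
    target.with_name(target.name + ".bak").write_text(text)
    target.write_text(enable_error_file(text))
```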
See “About the SQL Preindexer for EDM” on page 586.

Troubleshooting remote indexing errors for EDM

The Remote EDM Indexer displays a message that indicates whether the indexing operation
was successful or not. If the Remote EDM Indexer successfully creates the index, the console
displays the message "Successfully created index" as the last line of output. In addition, *.pdx
and *.rdx files are created in the -result directory.
Whether indexing succeeds depends on the error threshold that you specify in the EDM profile: if
the percentage of errors is below the threshold, the indexing operation completes successfully.
Detailed information about the indexing operation is available with the -verbose option.
See “Remote EDM Indexer command options” on page 597.
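If you capture the Remote EDM Indexer console output (for example, from a scheduled job), the two success signals described above can be checked together. A minimal Python sketch; the function name is hypothetical, and how you capture the output is up to you.

```python
from pathlib import Path

SUCCESS_MESSAGE = "Successfully created index"


def indexing_succeeded(console_output: str, result_dir: str) -> bool:
    """Check both success signals: the final console line and the .pdx/.rdx files."""
    lines = [line.strip() for line in console_output.splitlines() if line.strip()]
    # startswith() tolerates trailing text on the line, an assumption of this sketch.
    last_line_ok = bool(lines) and lines[-1].startswith(SUCCESS_MESSAGE)
    directory = Path(result_dir)
    has_pdx = any(directory.glob("*.pdx"))
    has_rdx = any(directory.glob("*.rdx*"))
    return last_line_ok and has_pdx and has_rdx
```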
If the index generation is not successful, try these troubleshooting tips:

Table 26-40 Remote Indexer troubleshooting tips for EDM

■ Error: Index files not generated
Symptom: Use the -verbose option in the command to reveal the error message.
Description: Specifying the -verbose option when running the Remote EDM Indexer provides a
statistical summary of information about the indexing operation after it completes. This
information includes the number of errors and where the errors occurred.
■ Error: "Failed to create index", "Cannot compute index", or "Unable to generate index"
Symptom: Verify file and path names.
Description: Verify that you included the full path and proper file name for the -data file and
the -profile file (*.edm). The paths must be local to the host.
■ Error: "Destination is not a directory"
Symptom: Directory path not correct.
Description: Verify that you properly entered the full path to the destination directory for the
required -result argument.
■ Error: *.idx file instead of *.rdx file
Symptom: Did not use the -data argument.
Description: The -data option is required if you are using a data source file and not the SQL
Preindexer. In other words, the only time you do not use the -data argument is when you are
using the SQL Preindexer. If you run the Remote EDM Indexer without the -data option and without
a SQL Preindexer query, you get *.idx and *.rdx files that cannot be used for the EDM index.
Rerun the index using the -data option or a SQL Preindexer -query or -query_path.

In addition, you may encounter errors when you index large amounts of data. Often the set of
data contains a data record that is incomplete, inconsistent, or incorrectly formatted. Data rows
that contain more columns than expected or incorrect data types often cannot be properly
indexed and are not recognized during indexing. The rows of data with errors cannot be indexed
until those errors are corrected and the Remote EDM Indexer is rerun. Symantec provides a
couple of ways to get information about any errors and the ultimate success of the indexing
operation.
To see the actual rows of data that the Remote EDM Indexer failed to index, modify the
Indexer.properties file.

To modify the Indexer.properties file and view remote indexing errors

1 Locate the Indexer.properties file at \Program Files\Symantec\Data Loss
Prevention\Indexer\15.1\Protect\config\Indexer.properties (Windows) or
/opt/Symantec/DataLossPrevention/Indexer/15.1/Protect/config/Indexer.properties
(Linux).
2 To edit the file, open it in a text editor.
3 Locate the create_error_file property and change the “false” value to “true.”
4 Save and close the Indexer.properties file.
The Remote EDM Indexer logs errors in a file with the same name as the indexed data
file and with an .err extension. This error file is created in the logs directory.
The rows of data that are listed in the error file are not encrypted. Encrypt the error file to
minimize any security risk from data exposure.

Best practices for using EDM

EDM is the most accurate form of detection. It is also the most complex to set up and maintain.
To ensure that your EDM policies are as accurate as possible, consider the recommendations
in this section when you are implementing your EDM profiles and policies.
The following table provides a summary of the EDM policy considerations discussed in this
chapter, with links to individual topics for more details.

Table 26-41 Summary of EDM best practices

■ Ensure that the data source file contains at least one column of unique data. See “Ensure data source has at least one column of unique data (EDM)” on page 602.
■ Eliminate duplicate rows and blank columns before indexing. See “Cleanse the data source file of blank columns and duplicate rows (EDM)” on page 603.
■ To reduce false positives, avoid single characters, quotes, abbreviations, numeric fields with fewer than 5 digits, and dates. See “Remove ambiguous character types from the data source file (EDM)” on page 604.
■ Understand multi-token indexing and clean up as necessary. See “Understand how multi-token cell matching functions (EDM)” on page 604.
■ Use the pipe (|) character to delimit columns in your data source. See “Do not use the comma delimiter if the data source has number fields (EDM)” on page 605.
■ Review an example cleansed data source file. See “Ensure that the data source is clean for indexing (EDM)” on page 605.
■ Map data source columns to system fields to leverage validation during indexing. See “Map data source column to system fields to leverage validation (EDM)” on page 605.
■ Leverage EDM policy templates whenever possible. See “Leverage EDM policy templates when possible” on page 606.
■ Include the column headers as the first row of the data source file. See “Include column headers as the first row of the data source file (EDM)” on page 606.
■ Check the system alerts to tune Exact Data Profiles. See “Check the system alerts to tune profile accuracy (EDM)” on page 607.
■ Use stopwords to exclude common words from matching. See “Use stopwords to exclude common words from detection (EDM)” on page 607.
■ Automate profile updates with scheduled indexing. See “Use scheduled indexing to automate profile updates (EDM)” on page 607.
■ Match on two or three columns in an EDM rule. See “Match on 3 columns in an EDM condition to increase detection accuracy” on page 608.
■ Leverage exception tuples to avoid false positives. See “Leverage exception tuples to avoid false positives (EDM)” on page 609.
■ Use a WHERE clause to detect records that meet specific criteria. See “Use a WHERE clause to detect records that meet specific criteria (EDM)” on page 609.
■ Use the minimum matches field to fine-tune EDM rules. See “Use the minimum matches field to fine tune EDM rules” on page 610.
■ Consider using Data Identifiers in combination with EDM rules. See “Combine Data Identifiers with EDM rules to limit the impact of two-tier detection” on page 610.
■ Include an email address field in the Exact Data Profile for profiled DGM. See “Include an email address field in the Exact Data Profile for profiled DGM (EDM)” on page 610.
■ Use profiled DGM for Network Prevent for Web identity detection. See “Use profiled DGM for Network Prevent for Web identity detection (EDM)” on page 611.

Ensure data source has at least one column of unique data (EDM)
EDM is designed to detect combinations of data fields that are globally unique. At a minimum,
your EDM index must include at least one column of data that contains a unique value for each
record (row). Column data such as account number, social security number, and credit
card number are inherently unique, whereas state or zip code are not unique, nor are names.
If you do not include at least one column of unique data in your index, your EDM profile will
not accurately detect the data you want to protect.
A unique column field is a column that has mostly unique values. It can have duplicate values,
but not more than the number set in term_commonority_threshold. The default value for this
setting is 10.
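To see which columns in a data source would qualify as unique, you can count how often each value repeats. This Python sketch reads the threshold as the maximum number of times any single value may occur in a column, which is an assumption about how term_commonority_threshold is applied; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def unique_columns(rows: Sequence[Sequence[str]], threshold: int = 10) -> List[int]:
    """Return indexes of columns in which no non-empty value occurs more than `threshold` times."""
    if not rows:
        return []
    result = []
    for col in range(len(rows[0])):
        counts = Counter(row[col] for row in rows if col < len(row) and row[col] != "")
        if counts and max(counts.values()) <= threshold:
            result.append(col)
    return result
```

For example, a column of account numbers passes, while a state column in which "CA" appears on hundreds of rows does not.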
Table 26-42 describes the various types of unique data to include in your EDM indexes, as
well as fields that are not unique. You can include the non-unique fields in your EDM indexes
as long as you have at least one column field that is unique.

Table 26-42 Examples of unique data for EDM policies

Unique data for EDM. The following data fields are usually unique:
■ Account number
■ Bank card number
■ Phone number
■ Email address
■ Social security number
■ Tax ID number
■ Driver's license number
■ Employee number
■ Insurance number
Non-unique data. The following data fields are not unique:
■ First name
■ Last name
■ City
■ State
■ Zip code
■ Password
■ PIN number

Cleanse the data source file of blank columns and duplicate rows (EDM)
The data source file should be as clean as possible before you create the EDM index; otherwise,
the resulting profile may create false positives.
When you create the data source file, avoid including empty cells or blank columns. Blank
columns or fields count as “errors” when you generate the EDM profile. A data source error is
either an empty cell or a cell with the wrong type of data (a name appearing in a phone number
column). The error threshold is the maximum percentage of rows that contain errors before
indexing stops. If the errors exceed the error threshold percentage for the profile (by default,
5%), the system stops indexing and displays an indexing error message.
The best practice is to remove blank columns and empty cells from the data source file, rather
than increasing the error threshold. Keep in mind that if you have many empty cells, it may
require a 100% error threshold for the system to create the profile. If you specify 100% as the
error threshold, the system indexes the data source without checking for errors.
In addition, do not fill empty cells or blank fields with bogus data so that the error threshold is
met. Adding fictitious or "null" data to the data source file will reduce the accuracy of the EDM
profile and is strongly discouraged. Content you want to monitor should be legitimate and not
null.
See “About cleansing the exact data source file for EDM” on page 530.
See “Preparing the exact data source file for indexing for EDM” on page 537.
See “Ensure that the data source is clean for indexing (EDM)” on page 605.
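A minimal Python sketch of the cleanup described above, assuming the data source has already been split into rows of cells. It drops blank columns, empty rows, and duplicate rows, then reports the percentage of remaining rows that still contain an empty cell, to compare against the error threshold (5% by default). It does not detect cells with the wrong type of data, which also count as errors. The names are hypothetical.

```python
from typing import List, Sequence, Tuple

DEFAULT_ERROR_THRESHOLD = 5.0  # percent of rows, the default described above


def cleanse(rows: Sequence[Sequence[str]]) -> Tuple[List[List[str]], float]:
    """Drop blank columns, empty rows, and duplicate rows.

    Returns the cleaned rows and the percentage of them that still contain an
    empty cell (one kind of indexing error; wrong-type cells are not detected).
    """
    width = max((len(r) for r in rows), default=0)
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    # Keep only columns with at least one non-blank cell.
    keep = [c for c in range(width) if any(r[c].strip() for r in padded)]
    seen = set()
    clean: List[List[str]] = []
    for r in padded:
        row = tuple(r[c].strip() for c in keep)
        if not any(row) or row in seen:  # empty row or duplicate
            continue
        seen.add(row)
        clean.append(list(row))
    errors = sum(1 for r in clean if "" in r)
    return clean, (100.0 * errors / len(clean) if clean else 0.0)
```

If the reported percentage is above the threshold, fix or remove the incomplete records rather than raising the threshold or filling the gaps with placeholder data.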

Remove ambiguous character types from the data source file (EDM)
The data source file should not contain extraneous spaces, punctuation, or inconsistently
populated fields. You can use tools such as Stream Editor (sed) and AWK to remove these
items from your data source file or files before indexing them.

Table 26-43 Characters to avoid in the data source file

■ Single characters: Single-character fields should be eliminated from the data source file.
They are more likely to cause false positives, because a single character appears frequently
in normal communications.
■ Abbreviations: Abbreviated fields should be eliminated from the data source file for the same
reason as single characters.
■ Quotes: Text fields should not be enclosed in quotes.
■ Small numbers: Indexing numeric fields that contain fewer than 5 digits is not recommended
because it is likely to yield many false positives.
■ Dates: Date fields are also not recommended. Dates are treated as strings, so if you index a
date such as 12/6/2007, the string must match exactly. The indexer matches only 12/6/2007, and
not other date formats such as Dec 6, 2007,
12-6-2007, or 6 Dec 2007. It must be an exact match.
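Before indexing, you can screen cells for the character types in Table 26-43. This Python sketch flags single characters, quoted values, numbers with fewer than 5 digits, and a few numeric date shapes. Abbreviations cannot be recognized this simply and are left to manual review. The function name and the date pattern are assumptions for illustration.

```python
import re
from typing import List

# Illustrative date shapes only (for example 12/6/2007 or 12-6-2007); real data may need more.
_DATE = re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")


def ambiguous_reasons(cell: str) -> List[str]:
    """Return the Table 26-43 categories that a single cell falls into."""
    value = cell.strip()
    reasons = []
    if len(value) == 1:
        reasons.append("single character")
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        reasons.append("quoted")
    if value.isdigit() and len(value) < 5:
        reasons.append("small number")
    if _DATE.fullmatch(value):
        reasons.append("date")
    return reasons
```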

Understand how multi-token cell matching functions (EDM)

An EDM rule performs a full-text search against the message, checking each word (except
those that are excluded by way of the columns you choose to match in the policy) for potential
matches. The matching algorithm compares each individual word in the message with the
contents of each token in the data profile.
If a cell in the data profile contains multiple words separated by spaces, punctuation, or
alternating Latin and Chinese, Japanese, and Korean (CJK) language characters, the cell is
a multi-token cell. The sub-token parts of a multi-token cell obey the same rules as single-token
cells: they are normalized according to their pattern where normalization can apply.
If a cell contains multiple tokens, the whole multi-token value must match exactly. For example,
a column field with the value "Joe Brown" is a multi-token cell (assuming multi-token matching is
enabled). At run time, the processor looks for the exact string "Joe Brown", including the space
(multiple spaces are normalized to one). The system does not match on "Joe" and "Brown" if
they are detected as separate single tokens.
In addition, multi-token cells are more computationally expensive than single-token cells. If
the index includes multi-token cells, you must verify that you have enough memory to index,
load, and process the EDM profile.

If multi-token matching is enabled, any punctuation immediately before or after a space is
ignored.
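The following Python sketch illustrates the matching rules described above for a single multi-token cell: punctuation touching a space is dropped, runs of whitespace collapse to one space, and the whole phrase must appear contiguously. It is a simplified model of those stated rules, not the product's tokenizer; it ignores case handling, CJK boundaries, and pattern-specific normalization, and the function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Drop punctuation that touches a space, then collapse runs of whitespace to one space."""
    text = re.sub(r"[^\w\s]+(?=\s)|(?<=\s)[^\w\s]+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def multi_token_match(cell: str, message: str) -> bool:
    """True if the whole normalized cell value appears as a contiguous phrase in the message."""
    phrase = normalize(cell)
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, normalize(message)) is not None
```

In this model, "Joe Brown" matches "Please call Joe   Brown, today" but not "Joe and Mary Brown", because the two tokens are not adjacent.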
Finally, do not change the WIP setting from "true" to "false" unless you are sure that is the
result you want to achieve. Set WIP = false only when you need to loosen the matching criteria,
for example for account numbers whose formatting may vary across messages. Test detection
results to make sure that you get the matches you expect.
See “Memory requirements for EDM” on page 579.

Do not use the comma delimiter if the data source has number fields (EDM)
Of the column delimiters that you can choose from for separating the fields in the data source
file (pipe, tab, semicolon, or comma), the pipe, semicolon, or tab (default) is recommended. The
comma delimiter is ambiguous and should not be used, especially if one or more fields in your
data source contain numbers. If you use a comma-delimited data source file, make sure there are
no commas in the data set other than those used as column delimiters.

Note: Although the system also treats the pound sign, equals sign, plus sign, semicolon, and
colon characters as separators, you should not use these because, like the comma, their
meaning is ambiguous.
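Stray commas in number fields (for example, 1,500) silently add columns when the comma is the delimiter. This Python sketch flags lines whose field count differs from the most common count in the file. It assumes unquoted fields, as Table 26-43 recommends, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def inconsistent_rows(text: str, delimiter: str = "|") -> List[int]:
    """Return 1-based line numbers whose field count differs from the most common count."""
    counts: Dict[int, int] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():  # blank lines are skipped but keep their numbering
            counts[number] = len(line.split(delimiter))
    if not counts:
        return []
    expected = Counter(counts.values()).most_common(1)[0][0]
    return [number for number, n in counts.items() if n != expected]
```

Running it on the same data with the comma and with the pipe delimiter shows why the pipe is the safer choice when a field contains a comma.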

Map data source column to system fields to leverage validation (EDM)

When you create the Exact Data Profile, you can validate how well the fields in your data
source match against system-defined patterns for that field. For example, if you map a field
to the credit card system pattern, the system will validate that the data matches the credit card
system pattern. If it does not, the system will create an error for every record that contains an
invalid credit card number. Mapping data source fields in your index to system-defined field
patterns helps you ensure that the fields in your index meet the data type criteria.
If there is no corresponding system field to map to a data source column, consider creating a
custom field to map data source column data. You can use the description field to annotate
both system and custom fields.
See “Mapping Exact Data Profile fields for EDM” on page 545.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.
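As an illustration of the kind of check that pattern validation performs, the following Python sketch validates card numbers with the Luhn checksum and counts the failing records in a column. It is an assumption that the credit card system pattern includes a Luhn check; the product's actual validation rules are not described in this section, and the function names are hypothetical.

```python
import re
from typing import Iterable


def luhn_valid(value: str) -> bool:
    """Luhn checksum on a card number; spaces and dashes are ignored."""
    digits = value.replace(" ", "").replace("-", "")
    if not re.fullmatch(r"[0-9]{13,19}", digits):  # typical card number lengths
        return False
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def count_invalid(values: Iterable[str]) -> int:
    """How many values in a mapped credit card column fail the check."""
    return sum(1 for v in values if not luhn_valid(v))
```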

Ensure that the data source is clean for indexing (EDM)

The following list summarizes a cleansed data source that is ready for indexing:
■ It contains at least one unique column field.
■ It is not a single-column data source; it has two or more columns.
■ Empty cells and rows and blank columns are removed.
■ Incomplete and duplicate records are removed.
■ The number of faulty cells is below the default error rate (5%) for indexing.
■ Bogus data is not used to fill in blank cells or rows.
■ Improper and ambiguous characters are removed.
■ Multi-tokens comply with space and memory requirements.
■ Column fields are validated against the system-defined patterns that are available.
■ Mappings are validated against policy templates where applicable.
See “Ensure data source has at least one column of unique data (EDM)” on page 602.
See “Cleanse the data source file of blank columns and duplicate rows (EDM)” on page 603.
See “Remove ambiguous character types from the data source file (EDM)” on page 604.
See “Understand how multi-token cell matching functions (EDM)” on page 604.
See “Map data source column to system fields to leverage validation (EDM)” on page 605.

Leverage EDM policy templates when possible

Symantec Data Loss Prevention provides several policy templates that implement EDM rules.
The general recommendation is to use policy templates whenever possible when implementing
EDM. If you do use a policy template for EDM, you should validate the index against the
template when you configure the Exact Data Profile.
See “Creating and modifying Exact Data Profiles for EDM” on page 541.

Include column headers as the first row of the data source file (EDM)
When you extract the source data to the data source file, you should include the column
headers as the first row in the data source file. Including the column headers will make it easier
for you to identify the data you want to use in your policies.
The column names reflect the column mappings that were created when the exact data profile
was added. If there is an unmapped column, it is called Col X, where X is the column number
(starting with 1) in the original data profile.
If the Exact Data Profile is to be used for DGM, the file must have a column with a heading of
email; otherwise, the DGM does not appear in the Directory EDM drop-down list on the remediation
page.
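The header and naming rules above are easy to check before you index a file. This Python sketch tests whether a header row contains an email column and shows the Col X naming for unmapped columns. The mapping dictionary is a hypothetical stand-in for the profile's column mappings, and the case-insensitive comparison is an assumption.

```python
from typing import Dict, List


def has_email_column(header_line: str, delimiter: str) -> bool:
    """True if the header row has a column headed 'email' (case-insensitive, an assumption)."""
    return any(name.strip().lower() == "email" for name in header_line.split(delimiter))


def column_display_names(num_columns: int, mappings: Dict[int, str]) -> List[str]:
    """Mapped names where they exist, otherwise 'Col X' with X counted from 1."""
    return [mappings.get(i, f"Col {i}") for i in range(1, num_columns + 1)]
```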

Check the system alerts to tune profile accuracy (EDM)

You should always review the system alerts after creating the Exact Data Profile. The system
alerts provide very specific information about problems encountered when creating the profile,
such as an SSN in an address field, which will affect accuracy.

Use stopwords to exclude common words from detection (EDM)

During indexing, words found in stopword files are ignored. Stopwords are common words
that are excluded from matching. For example, the stopwords file contains common words
such as articles, prepositions, and so forth. You can adjust the stopwords file by adding words
to it or removing words from it. It is recommended that you back up the original before changing
it.
Stopword files are located at the following directory where the detection server running the
index is installed: \Program
Data\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\stopwords.
By default, the system uses the stopwords_en.txt file, which is the English language version.
Stopword files for other languages are located in the same directory. You can change the
default stopword language file by updating the stopword_languages = en property in the
C:\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Indexer.properties
file on the Enforce Server.

Use scheduled indexing to automate profile updates (EDM)

When you configure an Exact Data Profile, you can set a schedule for indexing the data
source file. Index scheduling lets you decide when you want to index the data source file. For
example, instead of indexing the data source at the same time that you define the profile, you
can schedule it for a later date. Alternatively, if you need to reindex the data source regularly,
you can set up a recurring indexing schedule.
Before you set up an index schedule, consider the following:
■ If you update your data sources occasionally (for example, less than once a month),
generally there is no need to create a schedule. Index the data each time you update the
data source.
■ Schedule indexing for times of minimal system use. Indexing affects performance throughout
the Symantec Data Loss Prevention system, and large data sources can take time to index.
■ Index a data source as soon as you add or modify the corresponding exact data profile,
and re-index the data source whenever you update it. For example, consider a scenario
whereby every Wednesday at 2:00 P.M. you generate an updated data source file. In this
case you could schedule indexing every Wednesday at 3:00 P.M., giving you enough time
to cleanse the data source file and copy it to the Enforce Server.
■ Do not index data sources daily as this can degrade performance.

■ Monitor results and modify your indexing schedule accordingly. For example, if performance is
good and you want more timely updates, schedule more frequent data updates and indexing.
Consider using scheduled indexing with remote EDM indexing to keep an EDM profile up to
date. For example, you can schedule a cron job on the remote machine to run the Remote
EDM Indexer on a regular basis. The job can also copy the generated index files to the index
directory on the Enforce Server. You can then configure the Enforce Server to load the externally
generated index and submit it for indexing on a scheduled basis.
See “About index scheduling for EDM” on page 531.
See “Scheduling Exact Data Profile indexing for EDM” on page 548.
See “Copying and loading remote EDM index files to the Enforce Server” on page 594.

Match on 3 columns in an EDM condition to increase detection accuracy
In a structured data format such as a database, each row represents one record, with each
record containing related values for each column data field. Thus, for an EDM policy rule
condition to match, all the data must come from the same row or record of data. When you
define an EDM rule, you must select the fields that must be present to be a match. Although
you can select any number of columns to match in a row (up to the total number of columns
in the index, which is a maximum of 32), it is recommended that you match on at least 2 or 3
columns, one of which must be unique. Generally, matching on 3 fields is preferred, but if one
of the columns contains a unique value such as an SSN or credit card number, 2 columns may
be used.
Consider the following example. You want to create an EDM policy condition based on an
Exact Data Profile that contains the following 5 columns of indexed data:
■ First Name
■ Last Name
■ Social security number (SSN)
■ Phone Number
■ Email Address
If you select all 5 columns to be included in the policy, consider the possible results based on
the number of fields you require for each match.
If you choose "1 of the selected fields" to match, the policy will undoubtedly generate a large
number of false positives because the record will not be unique enough. (Even if the condition
only matches the SSN field, there may still be false positives because there are other types
of nine-digit numbers that may trigger a match.)
If you choose "2 of the selected fields" to match, the policy still produces false positives
because some combinations of data are not sensitive on their own: First Name + Last Name,
Phone Number + Email Address, or First Name + Phone Number.
If you choose to match on 4 or all 5 of the column fields, you will not be able to exclude certain
data field combinations because that option is only available for matches on 2 or 3 fields.
See “Leverage exception tuples to avoid false positives (EDM)” on page 609.
In this example, to ensure that you generate the most accurate match, the recommendation
is that you choose "3 of the selected fields" to match. In this way you can reduce the number
of false positives while using one or more exceptions to exclude the combinations that do not
present a concern, such as First Name + Last Name + Phone Number.
Whatever number of fields you choose to match, ensure that you include the column with the
most unique data, and that you match on at least 2 column fields.
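The following minimal Python sketch illustrates the N-of-M idea from this section: a condition matches only when enough of the selected columns from the same row appear in a message. It is an illustration under stated assumptions, not how the detection engine works internally: it compares plaintext values against one hypothetical indexed row using a naive tokenizer, whereas a real Exact Data Profile stores hashed values. The column names and the `tokens` and `row_match` helpers are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word-like tokens; a crude stand-in for real content extraction."""
    raw = re.findall(r"[\w@.\-]+", text.lower())
    return {t.strip(".-") for t in raw}  # drop trailing sentence punctuation


def row_match(message: str, row: dict[str, str], selected: list[str], required: int) -> list[str]:
    """Return the selected columns of ONE indexed row found in the message,
    or an empty list if fewer than `required` of them are present."""
    found = tokens(message)
    matched = [col for col in selected if row[col].lower() in found]
    return matched if len(matched) >= required else []


# One record (row) from a hypothetical Exact Data Profile with 5 columns.
record = {
    "first_name": "Jane",
    "last_name": "Doe",
    "ssn": "123-45-6789",
    "phone": "555-0100",
    "email": "jdoe@example.com",
}

# "3 of the selected fields": name + SSN from the same row is a match.
print(row_match("Jane Doe, SSN 123-45-6789.", record, list(record), 3))
# A name alone does not reach 3 fields.
print(row_match("Meeting with Jane Doe today", record, list(record), 3))
```

The second call shows why a name-only combination is not reported under "3 of the selected fields"; lowering `required` to 2 would report it, which is the false-positive pattern described above.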

Leverage exception tuples to avoid false positives (EDM)

The EDM policy condition lets you define exception tuples to exclude combinations of data.
You must select 2 or 3 columns to match to leverage exception tuples.
EDM allows detection based on any combination of columns in a given row of data (that is, N
of M fields from a given record). It can trigger on "tuples," or specified sets of data types. For
example, a combination of the first name and SSN fields could be acceptable, but a combination
of the last name and SSN fields would not. EDM also allows more complex rules such as
looking for N of M fields, but excluding specified tuples. For example, this type of rule definition
is required to identify incidents in violation of state data privacy laws, such as California SB
1386, which requires a first name and last name in combination with any of the following: SSN,
bank account number, credit card number, or driver's license number.
While exception tuples can help you reduce false positives, if you are using several exception
tuples, it may be a sign your index is flawed. In this case, consider redoing your index so you
do not have to use so many excluded combinations to achieve the desired matches.
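A sketch of how exception tuples and a tuple-based rule such as the California SB 1386 example could be expressed over the set of matched columns. It assumes, for illustration, that an exception suppresses a match only when the matched columns are exactly the excluded combination; the column names and helper functions are hypothetical.

```python
# Hypothetical column names for an Exact Data Profile.
NAMES = frozenset({"first_name", "last_name"})
SB1386_SENSITIVE = frozenset({"ssn", "bank_account", "credit_card", "drivers_license"})


def is_excluded(matched: set[str], exception_tuples: list[frozenset[str]]) -> bool:
    """True if the matched column combination is exactly one of the excluded tuples."""
    return frozenset(matched) in exception_tuples


def sb1386_violation(matched: set[str]) -> bool:
    """First and last name together with at least one of the listed sensitive fields."""
    return NAMES <= matched and bool(matched & SB1386_SENSITIVE)


# Exclude the harmless First Name + Last Name + Phone Number combination.
exceptions = [frozenset({"first_name", "last_name", "phone"})]

for combo in ({"first_name", "last_name", "phone"}, {"first_name", "last_name", "ssn"}):
    print(sorted(combo), "excluded" if is_excluded(combo, exceptions) else "reported")
```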

Use a WHERE clause to detect records that meet specific criteria (EDM)

Another configuration parameter of the EDM policy condition is the "Where" clause option.
This option matches on the exact value you specify for the field you select. You can enter
multiple values by separating each with commas. Using a WHERE clause to detect records
that meet specific criteria helps you improve the accuracy of your EDM policies.

For example, if you want to match only those records in an Exact Data Profile for "Employees"
whose "State" field contains certain states, you could configure the match where "State" equals
"CA,NV". This rule then causes the detection engine to match a message that contains either
CA or NV as content.
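One way to picture the WHERE clause is as a filter on the indexed records before matching. This Python sketch assumes exact, case-sensitive comparison and trims spaces around the comma-separated values; those details, the record layout, and the helper names are assumptions rather than documented behavior.

```python
def parse_where_values(values: str) -> set[str]:
    """Split a comma-separated WHERE value list such as "CA,NV" into exact values."""
    return {v.strip() for v in values.split(",") if v.strip()}


def apply_where(rows: list[dict[str, str]], field: str, values: str) -> list[dict[str, str]]:
    """Keep only the records whose `field` exactly equals one of the listed values."""
    allowed = parse_where_values(values)
    return [row for row in rows if row.get(field) in allowed]


employees = [
    {"name": "Ana", "State": "CA"},
    {"name": "Ben", "State": "NV"},
    {"name": "Cy", "State": "TX"},
]
print([r["name"] for r in apply_where(employees, "State", "CA,NV")])  # ['Ana', 'Ben']
```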

Use the minimum matches field to fine tune EDM rules

The minimum matches field is useful for fine-tuning the sensitivity of an EDM rule. For example,
one employee's first and last name in an outgoing email may be acceptable. However, the first
and last names of 100 employees in the same email constitute a serious breach. Another example
might be a last name and social security number policy. The policy might allow an employee to
send information to a doctor, but sending two last names and social security numbers is suspicious.
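A minimal sketch of the minimum matches threshold for the doctor example above. It assumes the threshold counts distinct indexed records, so the same record matched twice counts once; that counting rule and the function name are assumptions for illustration.

```python
def meets_minimum_matches(matched_record_ids: list[int], minimum_matches: int) -> bool:
    """Report an incident only when enough distinct indexed records matched."""
    return len(set(matched_record_ids)) >= minimum_matches


# Doctor scenario with minimum matches = 2: one patient is tolerated, two are not.
print(meets_minimum_matches([1042], 2))        # False
print(meets_minimum_matches([1042, 2177], 2))  # True
print(meets_minimum_matches([1042, 1042], 2))  # False: same record twice
```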

Combine Data Identifiers with EDM rules to limit the impact of two-tier detection
When implementing EDM policies, it is recommended that you combine Data Identifier (DI)
rules with the EDM condition to form compound policies. For reference, note that all
system-provided policy templates that implement EDM rules also implement Data Identifier
rules in the same policy.
Data Identifiers and EDM are both designed to protect personally identifiable information (PII).
Including Data Identifiers with your EDM rules makes your policies more robust and reusable
across detection servers because, unlike EDM rules, Data Identifiers are executed on the
endpoint and do not require two-tier detection. Thus, if an endpoint is off the network, the Data
Identifier rules can still protect PII such as SSNs.
Data Identifier rules are also useful in your EDM policies while you are gathering and
preparing your confidential data for EDM indexing. For example, a policy might contain the
US SSN Data Identifier to cover as yet unindexed or unknown SSNs, alongside an EDM rule.

Include an email address field in the Exact Data Profile for profiled DGM (EDM)
You must include the appropriate fields in the Exact Data Profile to implement profiled DGM.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
If you include the email address field in the Exact Data Profile for profiled DGM and map it to
the email data validator, the email address appears in the Directory EDM drop-down list on
the remediation page.

Use profiled DGM for Network Prevent for Web identity detection (EDM)
If you want to implement DGM for Network Prevent for Web, use one of the profiled DGM
conditions to implement identity matching. For example, you may want to use identity matching
to block all web traffic for specific users. For Network Prevent for Web, you cannot use
synchronized DGM conditions for this use case.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
See “Configuring the Sender/User based on a Profiled Directory condition” on page 944.
Chapter 27
Detecting content using Indexed Document Matching (IDM)
This chapter includes the following topics:

■ Introducing Indexed Document Matching (IDM)

■ Configuring IDM profiles and policy conditions

■ Best practices for using IDM

■ Remote IDM indexing

Introducing Indexed Document Matching (IDM)

You use Indexed Document Matching (IDM) to protect confidential information that is stored
as unstructured data in documents and files. For example, you can use IDM to detect financial
report data stored in Microsoft Office documents, merger and acquisition information stored
in PDF files, and source code stored in text files. You can also use IDM to detect binary files,
such as JPEG images, CAD designs, and multimedia files. In addition, you can use IDM to
detect derived content such as text that has been copied from a source document to another
file.
See “Supported forms of matching for IDM” on page 613.
See “About the Indexed Document Profile” on page 615.
Detecting content using Indexed Document Matching (IDM) 613
Introducing Indexed Document Matching (IDM)

About using IDM

To use IDM, you collect the documents and files that you want to protect and index them
using the Enforce Server. During the indexing process, the system uses an algorithm to
fingerprint each file or its contents. You then create a policy that contains one or more IDM
conditions that reference the index. The system then checks files against the index for
matches.
For example, consider a document source you have collected that includes several confidential
Microsoft Office documents (Word, Excel, PowerPoint) and image files (JPEG, BMP). You
create an Indexed Document Profile and index the documents and files. You then configure
the Content Matches Document Signature policy condition with a Minimum Document
Exposure setting of 50%. The IDM policy and index are deployed to a detection server.
In production, the detection server checks inbound files against the index for matches. If an
inbound text-based file whose contents the system can extract contains 50% or more of the
content indexed from one of the source documents, the system records a match. And if an
inbound image file has the same binary signature as one of the indexed files, the system
records a match. The server and agent perform exact file matching automatically on binary
(non-extractable) files even though the policy condition is configured for partial matching.
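The decision described in this example can be sketched in Python as follows. This is a simplified model under stated assumptions: passages are represented as precomputed hash sets, and "exposure" is taken to be the share of the indexed document's passage hashes found in the inbound file, which is our assumption rather than a documented formula. `IndexedDocument` and `idm_matches` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexedDocument:
    binary_md5: str            # MD5 of the raw file bytes
    passage_hashes: frozenset  # selected passage hashes; empty if contents were not extractable


def idm_matches(inbound: bytes, inbound_passages: Optional[frozenset],
                indexed: IndexedDocument, min_exposure: float) -> bool:
    # Non-extractable on either side: exact binary matching, whatever min_exposure says.
    if inbound_passages is None or not indexed.passage_hashes:
        return hashlib.md5(inbound).hexdigest() == indexed.binary_md5
    # Extractable: assumed exposure = share of indexed passages seen in the inbound file.
    exposure = len(indexed.passage_hashes & inbound_passages) / len(indexed.passage_hashes)
    return exposure >= min_exposure


jpeg = b"\xff\xd8\xff\xe0 example image bytes"
image = IndexedDocument(hashlib.md5(jpeg).hexdigest(), frozenset())
report = IndexedDocument("not-used", frozenset({11, 22, 33, 44}))

print(idm_matches(jpeg, None, image, 0.5))                         # True: same binary signature
print(idm_matches(b"docx", frozenset({11, 22, 99}), report, 0.5))  # True: 2 of 4 passages = 50%
print(idm_matches(b"docx", frozenset({11, 99}), report, 0.5))      # False: 1 of 4 passages = 25%
```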

Note: The Mac Agent is substantially the same as the Windows Agent, except that the Mac
Agent does not support two-tier detection, and different channels are supported on the Mac
Agent and Windows Agent. See “Overview of Mac agent detection technologies and policy
authoring features” on page 2280.

See “Types of IDM detection” on page 614.

See “About the Indexed Document Profile” on page 615.

Supported forms of matching for IDM

IDM supports three forms of matching: exact file, exact file contents, and partial file contents.
Detection servers support all three forms of matching. The DLP Agent supports exact file and
partial file contents matching locally on the endpoint.
Table 27-1 summarizes the forms of matching by the platforms that IDM supports.

Table 27-1 Forms of matching for IDM

Type of matching | Description | Platform
Partial file contents | Match of discrete passages of extracted and normalized file contents. See “Using IDM to detect exact and partial file contents” on page 621. | Detection server, DLP Agent
Exact file | Match is based on the binary signature of the file. See “Using IDM to detect exact files” on page 620. | Detection server, DLP Agent
Exact file contents | Match is an exact match of the extracted and normalized file contents. See “Using IDM to detect exact and partial file contents” on page 621. | Detection server. Note: Symantec recommends that you use partial file contents matching rather than exact file contents matching.

Types of IDM detection

There are three types of IDM detection implementations: agent, server, and two-tier. The type
you choose is based on your data loss prevention requirements.
Table 27-2 summarizes the three types of IDM detection.

Table 27-2 Types of IDM detection

Type | Description | Details
Agent IDM | The DLP Agent supports partial file contents matching in addition to exact file matching locally on the endpoint. | See “Agent IDM detection” on page 614.
Server IDM | The detection server performs exact file matching, exact file contents matching, and partial file contents matching. | See “Server IDM detection” on page 615.
Two-tier IDM | The DLP Agent sends the data to the detection server for policy evaluation. | See “Two-tier IDM detection” on page 615.

Agent IDM detection

With Agent IDM detection the DLP Agent evaluates documents locally in real time for partial
file contents and exact file matches. Agent IDM lets you use the block, notify, and user cancel
response rules on the endpoint with IDM policies. Symantec Data Loss Prevention also supports
detection on stream-based channels such as Printing or Copying/Pasting from the Clipboard.
See “Supported forms of matching for IDM” on page 613.

Agent IDM is enabled by default on a newly installed Endpoint Server. Agent IDM for macOS
is also enabled by default on newly installed Endpoint Servers, but it is disabled if you upgrade.
In the case of all upgrades, if you want to use agent IDM, you must enable it and reindex your
IDM profiles so that the endpoint index is generated and made available for download by DLP
Agents.

Server IDM detection

With server IDM detection, the IDM index is deployed to one or more detection servers and
all detection processing occurs on the server or servers. You can use server IDM to perform
exact file matching and file contents matching. For file contents matching, you can choose to
match file contents exactly or partially (10% to 90%) according to the Minimum Document
Exposure set for the IDM condition.
See “Supported forms of matching for IDM” on page 613.

Two-tier IDM detection

Two-tier is a method of detection that requires communication and data transfer between the
DLP Agent and the Endpoint Server to detect incidents. It is recommended only if you have
very large indexes and the agents do not have enough space to support the profiles. Two-tier
detection has more latency than local detection and requires substantially more network
bandwidth. As a result, it does not support inline response rules for blocking or pop-up
notifications.
With two-tier IDM the DLP Agent sends the data to the Endpoint Server for matching against
the server index. If two-tier detection is enabled for IDM, the server supports all forms of
matching, including exact file, exact file contents, and partial file contents.

Note: Two-tier detection is not supported on agents running on macOS endpoints.

If you use two-tier detection for IDM on the Windows endpoint, make sure that you understand
the performance implications of two-tier detection.
See “Two-tier detection for DLP Agents” on page 395.

About the Indexed Document Profile

The Indexed Document Profile is the user-defined configuration for creating and generating
IDM indexes. You define an Indexed Document Profile using the Enforce Server administration
console. You reference the profile in one or more IDM policy rules or exceptions. The profile
is reusable across policies: you can create one document profile and reference it in multiple
policies. When you create the Indexed Document Profile, you have the option of indexing
the document source immediately when you save the profile or at a scheduled time. However, you
must index the document source before you can detect policy violations.

See “Creating and modifying Indexed Document Profiles” on page 629.

For example, consider a scenario where you want to create an IDM index to detect when exact
versions of certain documents are found, or when passages or sections of the documents are
exposed. When you define the Indexed Document Profile, you can upload the documents
to the Enforce Server, or you can index the documents using the Remote IDM Indexer. You
can also use file name and file size filters in the document profile to include or ignore certain
files during indexing.

About the document data source

The document data source is the collection of documents you want to index and detect using
IDM. The indexing algorithm uses a fixed amount of memory per document, so it is bound by
the number of documents, rather than their total size. With a profile using 2 GB when loaded
in memory, approximately 1,000,000 documents can be indexed. The exact number of
documents the system permits depends on how many documents have text that can be
extracted.
See “Preparing the document data source for indexing” on page 625.
For smaller document sets (50 MB or less), you can upload the source files to the Enforce
Server using a ZIP file. For larger document sets (up to 2 GB), you can copy the source files
to the host file system where the Enforce Server is installed, either encapsulated within a single
ZIP file or as individual files. You can use FTP/S to transfer the files to the Enforce Server.
Alternatively, you can use the Remote IDM Indexer to remotely index documents.
See “About indexing remote documents” on page 617.
The document data source can contain any file type and any combination of files. If the system
can extract the contents of the file, IDM detects file contents, either exactly or partially depending
on the platform and the policy configuration. If the system cannot extract the contents of the
file, IDM detects the exact file.
See “Supported forms of matching for IDM” on page 613.

About the indexing process

The IDM indexer is a separate process that installs with and runs on the Enforce Server. Partial
matching is disabled by default on the Agent, and enabled by default on the Detection Server.
See “Configure endpoint partial content matching” on page 632.
You can index up to 1,000,000 documents on the Server and up to 30,000 documents on the
Agent. These values are based on the initial default limits of 2 GB (Server) and 60 MB (Agent).
You can change the 60 MB limit on the Configure Partial Matching page. While it is possible to
reconfigure the 2 GB limit by changing the value of
com.vontu.profiles.documents.maxIndexSize in \Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\indexer.properties,
Symantec recommends that you contact Symantec Support before reconfiguring properties
files.
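Because the indexer uses a fixed amount of memory per document, the stated defaults imply a rough per-document budget, and capacity should scale roughly linearly with the limit. The Python sketch below works through that arithmetic. It assumes binary units (1 GB = 1024^3 bytes) and strict linear scaling, and the 120 MB agent limit in the last line is a hypothetical value; actual counts also depend on how many documents have extractable text.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def bytes_per_document(limit_bytes: int, documents: int) -> float:
    """Average memory budget per indexed document implied by a limit."""
    return limit_bytes / documents


def scaled_capacity(new_limit_bytes: int, ref_limit_bytes: int, ref_documents: int) -> int:
    """Linear estimate: with fixed memory per document, capacity scales with the limit."""
    return new_limit_bytes * ref_documents // ref_limit_bytes


print(round(bytes_per_document(2 * GiB, 1_000_000)))  # 2147 (Server default)
print(round(bytes_per_document(60 * MiB, 30_000)))    # 2097 (Agent default)
print(scaled_capacity(120 * MiB, 60 * MiB, 30_000))   # 60000 (hypothetical 120 MB agent limit)
```

Both defaults work out to roughly 2 KB per document, which is consistent with a fixed per-document cost.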
During indexing, the system stores the document source in \Program
Files\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\documentprofiles
(on Windows) or
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/documentprofiles
(on Linux).
The result of the indexing process is four separate indexes: one for detection servers (the
server index) and three for DLP Agents (the endpoint indexes). All indexes are generated
regardless of whether or not you are licensed for Endpoint Prevent or Endpoint Discover. On
the Enforce Server, the system stores the indexes in \Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index (on Windows)
or /var/Symantec/DataLossPrevention/EnforceServer/15.5/index (on Linux).
See “About the server index files and the agent index files” on page 618.
For most IDM deployments there is no need to configure the indexer. If necessary you can
configure key settings for the indexer using the file \Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Indexer.properties.

Note: Symantec recommends that you contact Symantec Support for guidance if you decide
to modify a properties file. Modifying properties incorrectly can cause serious issues with the
operation of Symantec Data Loss Prevention.

About indexing remote documents

IDM indexing can be done on the Enforce Server or remotely, using the Remote IDM Indexer.
See “Creating and modifying Indexed Document Profiles” on page 629.
Using the CIFS protocol you can remotely index documents that are stored on one or more
file shares in a Microsoft Windows-networked environment. You provide the Universal Naming
Convention (UNC) path to a shared network folder resource and index the documents that are
stored in that folder or its subfolders, depending on the level of permission granted.
See “Using the remote SMB share option to index file shares” on page 637.
WebDAV provides extensions to the HTTP 1.1 protocol that enable collaborative editing and
management of files that are stored on remote web servers. You can index such documents
remotely by exposing them to the Enforce Server using WebDAV. For example, you can use
the remote SMB option with a UNC address and a WebDAV client to index Microsoft SharePoint
or OpenText Livelink documents.
See “Using the remote SMB share option to index SharePoint documents” on page 637.

Note: To index documents on a SharePoint server using the Remote SMB Share option, you
must deploy the Enforce Server to a supported Windows Server operating system host. Data
Loss Prevention depends on Windows NTLM services to mount a WebDAV server.

About the server index files and the agent index files
When you create an Indexed Document Profile and index a document data source, the
system generates four index files, one for the server and three for the endpoint. The indexes
are generated regardless of whether or not you are licensed for a particular detection server
or the DLP Agent.
See “About index deployment and logging” on page 619.
The server index is a binary file named DocSource.rdx. The server index supports exact file,
exact file contents, and partial file contents matching. If the document data source is large,
the server index may span multiple *.rdx files.
The endpoint index consists of one secure binary file, either EndpointDocSource.rdx or
LegacyEndpointDocSource.rdx for backward compatibility with 14.0 and 12.5 Agents. The
endpoint index supports exact file and partial file contents matching. EncryptedDocSource.rdx
is for endpoint partial matching.
See “Supported forms of matching for IDM” on page 613.
To create the index entries for exact file and exact file contents matching, the system uses the
MD5 message-digest algorithm. This algorithm is a one-way hash function that takes as input
a message of arbitrary length and produces as output a 128-bit message-digest or "fingerprint"
of the input. If the message input is a text-based document that the system can extract contents
from, such as a Microsoft Word file, the system extracts all of the file content, normalizes it by
removing whitespace, punctuation, and formatting, and creates a cryptographic hash. Otherwise,
if the message input is a file that the system cannot extract the contents from, such as an
image file, small file, or unsupported file type, the system creates a cryptographic hash based
on the binary signature of the file.
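The two kinds of exact fingerprint described above can be sketched with Python's standard `hashlib`. The sketch approximates normalization by keeping only letters and digits (`str.isalnum`), encodes the normalized text as UTF-8, and falls back to a hash of the raw bytes when fewer than 50 normalized characters remain (the `min_normalized_size=50` default described in Table 27-6). The encoding, the absence of case folding, and the helper names are assumptions; the product's own extraction and normalization may differ.

```python
import hashlib
from typing import Optional, Tuple

MIN_NORMALIZED_SIZE = 50  # mirrors the min_normalized_size=50 default in Indexer.properties


def normalize(text: str) -> str:
    """Drop whitespace and punctuation, keeping only letters and digits."""
    return "".join(ch for ch in text if ch.isalnum())


def exact_fingerprint(raw: bytes, extracted_text: Optional[str]) -> Tuple[str, str]:
    """Return ("contents", md5) for extractable text with enough normalized characters,
    otherwise ("binary", md5) computed over the raw file bytes."""
    if extracted_text is not None:
        normalized = normalize(extracted_text)
        if len(normalized) >= MIN_NORMALIZED_SIZE:
            return "contents", hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return "binary", hashlib.md5(raw).hexdigest()


# Two files with different bytes but the same words get the same contents fingerprint.
a = "Quarterly revenue rose 12 percent, driven by cloud sales in EMEA and APAC regions."
b = "Quarterly  revenue rose 12 percent; driven by cloud sales in EMEA and APAC regions!"
print(exact_fingerprint(b"file-one", a) == exact_fingerprint(b"file-two", b))  # True
print(exact_fingerprint(b"tiny", "Hello, world")[0])                           # binary
```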

Note: To improve accuracy across different versions of the Enforce Server and DLP Agent,
only binary MD5 matching is supported on the agent for exact matching, whether or not the
file contains text.

See “Using IDM to detect exact files” on page 620.

See “Using IDM to detect exact and partial file contents” on page 621.
In addition, for file formats the system can extract the contents from, the indexer creates hashes
for discrete sections of content or text passages. These hashes are used for partial matching
for both server and agent indexes. The system uses a selection method to store hashed
sections of partial content so that not all extractable text is indexed. The hash function ensures
that the server index does not contain actual document content. Table 27-3 summarizes the
types of matching supported by the endpoint and server indexes.

Table 27-3 Types of matching supported by the endpoint and server indexes

Message input | Output | Matches | Included in index file
Text-based file that the system can extract the contents from | A single cryptographic hash derived from all of the extracted and normalized file contents | Exact file contents | DocSource.rdx, LegacyEndpointDocSource.rdx
Text-based file that the system can extract the contents from | One or more rolling hashes based on discrete passages of extracted and normalized content using a selection method | Partial file contents (10% to 90%) | DocSource.rdx, EndpointDocSource.rdx, EncryptedDocSource.rdx
Binary file, custom file, small file, encapsulated file. Agent only: text-based file that the system can extract the contents from | A single cryptographic hash based on the binary signature of the file | Exact file binary | DocSource.rdx, EndpointDocSource.rdx, LegacyEndpointDocSource.rdx
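The guide does not document which rolling hash or selection method the indexer uses. As an illustration only, this Python sketch swaps in a well-known pairing: a Rabin-Karp polynomial rolling hash over fixed-length windows of normalized text, with "0 mod p" selection so that only about 1/p of the window hashes are kept. The window length `k`, the sampling rate `p`, and the exposure formula are arbitrary, hypothetical choices.

```python
BASE = 257
MOD = (1 << 61) - 1  # large prime modulus


def normalize(text: str) -> str:
    """Keep only letters and digits, dropping whitespace and punctuation."""
    return "".join(ch for ch in text if ch.isalnum())


def rolling_hashes(text: str, k: int) -> list[int]:
    """Rabin-Karp polynomial hash of every k-character window, updated in O(1) per step."""
    if len(text) < k:
        return []
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = [h]
    for i in range(k, len(text)):
        h = ((h - ord(text[i - k]) * high) * BASE + ord(text[i])) % MOD
        hashes.append(h)
    return hashes


def select(hashes: list[int], p: int) -> set[int]:
    """'0 mod p' selection: keep roughly 1/p of the hashes, chosen by value alone."""
    return {h for h in hashes if h % p == 0}


def exposure(source: str, inbound: str, k: int = 8, p: int = 4) -> float:
    """Fraction of the source's selected passage hashes that also occur in the inbound text."""
    indexed = select(rolling_hashes(normalize(source), k), p)  # only these are stored
    if not indexed:
        return 0.0
    seen = set(rolling_hashes(normalize(inbound), k))
    return len(indexed & seen) / len(indexed)
```

Because selection depends only on a hash value, a passage selected when the source is indexed is also found in any inbound text that contains the same passage, which is one way partial matching can work without storing the text itself.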

About index deployment and logging

The Enforce Server is responsible for deploying the IDM server and endpoint indexes to the
detection and Endpoint Servers. You cannot manually deploy the indexes.
The system deploys the server index to each designated detection server in the folder \Program
Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\index (on Windows)
or /var/Symantec/DataLossPrevention/EnforceServer/15.5/index (on Linux). At run-time,
the detection server loads the server index into random access memory (RAM) when an active
IDM policy that references that index is deployed to that detection server.
The system deploys the endpoint index (either EndpointDocSource.rdx or
LegacyEndpointDocSource.rdx) to each designated Endpoint Server. When a DLP Agent
connects to the Endpoint Server, the DLP Agent downloads the endpoint index. Assuming
agent IDM is enabled, the DLP Agent loads the endpoint index into memory when the index
is required by an active local policy.
See “Estimating endpoint memory use for agent IDM” on page 646.
You cannot manually deploy either the server or endpoint index files by copying the *.rdx file
or files from the Enforce Server to a detection server. The detection server does not monitor
the index destination folder for new index files; the detection server must be notified by the
Enforce Server that an index has been deployed. If a detection server is offline during the
index deployment process, the Enforce Server stops trying to deploy the index. When the
detection server comes back online the Enforce Server deploys the index to the detection
server. The same is true for DLP Agents. There is no way to manually copy the endpoint index
to the endpoint host and have the DLP Agent recognize the index.
Table 27-4 summarizes how IDM indexes are deployed and the log files to check to
troubleshoot index deployment.

Table 27-4 IDM index deployment and logging

Platform | Index file | Deployment | Logged
Server | DocSource.rdx | Sent automatically by the Enforce Server to each designated detection server after the index is generated. Loaded by the detection server into RAM at run-time. | detection_operational.log: use to identify if the index profile was deployed to the detection server. FileReader.log: use to determine if the index profile is loaded into memory.
Agent | EndpointDocSource.rdx or LegacyEndpointDocSource.rdx | Both of these files are sent by the Enforce Server to each designated Endpoint Server. The agent selects the appropriate file, based on the version of the agent. LegacyEndpointDocSource.rdx is for backward compatibility with 14.0 and 12.5 Agents. Downloaded by the DLP Agent based on the agent connection interval. Loaded into RAM at run-time when a local, active policy requires the index. | endpoint_server_operational.log: use to identify if the index profile was deployed to the Endpoint Server. Pull the agent logs to see if the index profile is loaded into memory.

Using IDM to detect exact files

The system performs exact file matching automatically on all binary files. In addition, if the file
format is text-based but the system is unable to extract the contents from the file, the system
performs exact file matching. This behavior is true even if you select a Minimum Document
Exposure percentage for the IDM condition that is less than Exact. The DLP Agent performs
exact file matching on all files, both binary files and files with extractable text.
See “About the server index files and the agent index files” on page 618.
For example, an IDM rule with a minimum document exposure set to 50% automatically
attempts to match a binary file exactly because the Minimum Document Exposure setting
only applies to files that the system can extract the contents from. In addition, the system
performs exact file matching for files containing a very small amount of text, as well as files
that were encapsulated when indexed, even if text-based.
As an optimization for exact file type matching in Endpoint IDM detection, the system checks
the byte size of the file before computing the run-time hash for comparison against the index.
If the byte size does not match the size of the indexed file, there is no need to compute the exact
file hash. The system does not consider the file format when creating the exact file fingerprint.
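The size-first optimization can be sketched as an in-memory index keyed by byte size. This Python class is hypothetical and only models the idea: a hash is computed at run time only when some indexed file has exactly the same size.

```python
import hashlib
from collections import defaultdict


class ExactFileIndex:
    """Exact-file index keyed by byte size, so most lookups skip hashing entirely."""

    def __init__(self) -> None:
        self._md5_by_size: dict[int, set[str]] = defaultdict(set)
        self.hashes_computed = 0  # counts run-time hash computations, for illustration

    def add(self, data: bytes) -> None:
        """Index-time: record the file's MD5 under its byte size."""
        self._md5_by_size[len(data)].add(hashlib.md5(data).hexdigest())

    def matches(self, data: bytes) -> bool:
        candidates = self._md5_by_size.get(len(data))  # .get() does not create a new key
        if not candidates:
            return False  # no indexed file has this size: no need to compute the hash
        self.hashes_computed += 1
        return hashlib.md5(data).hexdigest() in candidates


index = ExactFileIndex()
index.add(b"CAD-design-v1")
print(index.matches(b"a file of another size"), index.hashes_computed)  # False 0
print(index.matches(b"CAD-design-v1"), index.hashes_computed)           # True 1
```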
Table 27-5 summarizes exact file type matching behavior.

Table 27-5 Requirements for using IDM to detect files

File format | Example | Description
File format from which the system cannot extract the contents | Proprietary or non-supported document format | If the system cannot extract the contents from the file format, you can use IDM to detect that specific file using exact binary matching. See “Do not compress files in the document source” on page 649.
Binary file | GIF, MPG, AVI, CAD design, JPEG files, audio/video files | You can use IDM to detect binary file types from which you cannot extract the contents, such as images, graphics, JPEGs, etc. Binary file detection is not supported on stream-based channels.
File containing a small amount of text | CAD files and Visio diagrams | A file containing a small amount of text is treated as a binary file even if its contents are text-based and can be extracted. See “Using IDM to detect exact and partial file contents” on page 621.
Encapsulated file | Any file that is encapsulated when indexed (even if it is text-based and its contents can be extracted); for example, a Microsoft Word file archived in a ZIP file | If a document data source file is encapsulated in an archive file, the file contents of the subfile cannot be extracted and only the binary signature of the file can be fingerprinted. This does not apply to the main archive file that contains the document data source being indexed. See “About the document data source” on page 616.

Using IDM to detect exact and partial file contents

The primary use case for IDM is to detect file contents (as distinguished from binary files, such
as audio or video files, for example). On both the server and the endpoint, you can use IDM
to match files exactly or partially (10% to 90%). Additionally, on the server, file contents can
be matched exactly. Symantec recommends that you use partial content match because it is
much more reliable than exact content match. File contents include text-based content of any
document type the system can extract the file contents from, such as Microsoft Office documents
(Word, Excel, PowerPoint), PDF, and many more.
See “Supported formats for content extraction” on page 980.
An exact file contents match means that the normalized extracted content from the file matches
exactly the content of a file that has been indexed. With partial matching on the endpoint, using
a 90% threshold generates 90% to 100% content matches. These are less strict than the
previous exact content matches and may, in some cases, match even if there are some minor
differences between the scanned file and the indexed file.
The system does not consider the file format or file size when creating the cryptographic hash
for the index or when checking for an exact file contents match against the index. A document
might contain much more content, but the system detects only the file contents that are indexed
as part of the Indexed Document Profile. For example, consider a situation where you index
a one-page document, and that one-page document is included as part of a 100-page document.
The 100-page document is considered an exact match because its content matches the
one-page document exactly.
See “About the server index files and the agent index files” on page 618.
For text-based files from which you can extract the contents, in addition to creating the MD5
fingerprint for exact file contents matching, the system uses a rolling hash algorithm to register
discrete sections or passages of content. In this case the system uses a selection method to
store hashed sections of content; not all text is hashed in the index. The index does not contain
actual document content.
Table 27-6 lists the requirements to match file contents using IDM.

Table 27-6 Requirements for using IDM to detect content

Requirement Description

File formats from The system must be able to extract the the file format and extract file content. Data Loss
which you can extract Prevention supports content extraction for over 100 file types.
the contents
See “Supported formats for content extraction” on page 980.

Unencapsulated file To match file contents, the source file cannot be encapsulated in an archive file when the
source file is indexed. If a file in the document source is encapsulated in an archive file, the
system does not index the file contents of the encapsulated file. Any encapsulated file is
considered for exact matches only, like image files and other unsupported file formats.

See “Do not compress files in the document source” on page 649.
Note: The exception to this is the main ZIP file that contains the document data source, for
those upload methods that use an archive file. See “Creating and modifying Indexed Document
Profiles” on page 629.
Detecting content using Indexed Document Matching (IDM) 623
Introducing Indexed Document Matching (IDM)


Minimum amount of For exact file contents matching, the source file must contain a minimum of 50 characters of
text normalized text before the extracted content is indexed. Normalization involves
the removal of punctuation and whitespace, so a normalized character is either a
number or a letter. This size is set by the min_normalized_size=50 parameter in the file
\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Indexer.properties.
If the file contains fewer than 50 normalized characters, the system performs an exact file match
against the file binary.
Note: Symantec advises that you consult with Symantec Support for guidance if you need to
change an advanced setting or edit a properties file. Incorrectly updating a properties file can
have unintended consequences.

For partial file contents matching, there must be at least 300 normalized characters. However,
the exact length is variable depending on the file contents and encoding.

See “Do not index empty documents” on page 649.

Maximum amount of The default maximum size of the document that can be processed for content extraction at
text run-time is 30,000,000 bytes. If your document is over 30,000,000 bytes you need to increase
the default maximum size in Advanced server settings. Contact Symantec Support for
assistance when changing Advanced server settings, to avoid any unintended consequences.
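The size rules in Table 27-6 can be restated as a small decision function. The Python sketch below is illustrative only: it assumes that normalization keeps letters and digits (as the table describes), uses the documented defaults (50 and roughly 300 normalized characters, 30,000,000 bytes), and treats the 300-character figure as a fixed cutoff even though the guide says the exact length varies with content and encoding. The function names and return labels are hypothetical.

```python
MIN_NORMALIZED_SIZE = 50           # min_normalized_size default (Indexer.properties)
MIN_PARTIAL_SIZE = 300             # approximate minimum for partial matching
MAX_EXTRACTION_BYTES = 30_000_000  # default run-time content extraction limit


def normalized_length(text: str) -> int:
    """Count characters left after removing punctuation and whitespace."""
    return sum(1 for ch in text if ch.isalnum())


def matching_mode(extracted_text: str) -> str:
    """Classify a document using the thresholds in Table 27-6 (illustrative)."""
    n = normalized_length(extracted_text)
    if n < MIN_NORMALIZED_SIZE:
        return "exact binary match"
    if n < MIN_PARTIAL_SIZE:
        return "exact file contents match"
    return "exact and partial file contents match"


def exceeds_extraction_limit(size_bytes: int) -> bool:
    """True if a document is too large for run-time extraction at the default setting."""
    return size_bytes > MAX_EXTRACTION_BYTES
```

For example, a document whose extracted text is only a short title page would fall into the exact binary match case.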

About using the Content Matches Document Signature policy condition
You use the IDM condition Content Matches Document Signature From to implement IDM
detection rules and exceptions in your policies.
See “Configuring the Content Matches Document Signature policy condition” on page 646.
When you configure this condition, you specify the IDM index to use and how the condition
should match against the index using the Minimum Document Exposure setting. You can
select either Exact or a partial-match threshold from 10% to 90%. For example, if you select 70% for the
Minimum Document Exposure, a match occurs only if 70% or more of the hashed file contents
are detected.
See “Use parallel IDM rules to tune match thresholds” on page 654.
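As a rough illustration of the Minimum Document Exposure setting, the sketch below treats the index for one document as a set of section hashes and reports a partial match when the share of those hashes found in the scanned content meets the configured percentage. Representing the index as a plain Python set, and the function name, are assumptions rather than the product's internal design.

```python
from __future__ import annotations


def meets_exposure(indexed: set[int], detected: set[int], minimum_percent: int) -> bool:
    """True if at least minimum_percent of the indexed hashes were detected."""
    if not 10 <= minimum_percent <= 90:
        raise ValueError("partial thresholds range from 10% to 90%")
    if not indexed:
        return False
    found = len(indexed & detected)
    # Integer arithmetic avoids floating-point rounding at the threshold boundary
    return found * 100 >= minimum_percent * len(indexed)
```

With 10 indexed hashes and a 70% setting, detecting 7 of them is a match and detecting 6 is not.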
If a file is not text-based, if its content is not extractable, if it is very small, or if it is encapsulated
in an archive file, the file is matched exactly based on its binary signature. The system performs this
form of matching automatically, regardless of the configuration option you choose for the Minimum
Document Exposure setting. This setting applies only to partial file contents matching.

See “Using IDM to detect exact files” on page 620.

Table 27-7 describes the matching supported by the Content Matches Document Signature
From policy condition.

Table 27-7 Minimum document exposure settings for the IDM condition

Configuration setting File contents Match Example

Exact file matching File contents All of the extracted and normalized file contents, if Microsoft Word
the file is text-based and from which the content is not
extractable
See “Using IDM to detect exact and partial file contents” on page 621.

Exact content matching The endpoint performs binary matching on all files. Microsoft Word, JPG, MP3

Partial content matching File contents Discrete passages of text Microsoft Word
See “Using IDM to detect exact and partial file contents” on page 621.

About white listing partial file contents

Often sensitive documents contain standard boilerplate text that does not require protection,
including front matter, headers, and footers. Information contained in document headers and
footers is likely to cause false positives. Likewise, boilerplate text, such as standard language
and non-proprietary corporate content that is repeated across confidential documents, can
cause false positives.
See “White listing file contents to exclude from partial matching” on page 627.
Removing non-sensitive boilerplate or header/footer content before indexing is usually not
feasible, especially if you have a large document data set. In this case you can configure the
system to exclude ("whitelist") non-sensitive text. You do this by adding the text to ignore to
the whitelist file. During indexing, any whitelisted content found in the source files is ignored.
At run-time the content does not cause false positives because it has been excluded.
See “Use white listing to exclude non-sensitive content from partial matching” on page 651.
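Conceptually, whitelisting removes the boilerplate's section hashes from a document's fingerprint before the fingerprint is stored. The sketch below shows that set difference using a simple, hypothetical fingerprint that hashes every run of five consecutive words with SHA-256. The product's own hashing and selection method is not documented here, so treat the `shingles` function as a stand-in.

```python
from __future__ import annotations

import hashlib


def shingles(text: str, size: int = 5) -> set[str]:
    """Hash every run of `size` consecutive words (illustrative fingerprint)."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def index_fingerprint(document: str, whitelist: str) -> set[str]:
    """Fingerprint of the document minus anything that also occurs in the whitelist."""
    return shingles(document) - shingles(whitelist)
```

Because the boilerplate hashes never enter the index, a scanned message that contains only the boilerplate has nothing to match against.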

Note: White listing only applies to partial file contents matching; it does not apply to exact file
contents matching. The white listing file is not checked at run-time when the system computes
the cryptographic hashes for exact file contents matching.

Configuring IDM profiles and policy conditions

Table 27-8 provides the workflow for creating IDM profiles and configuring IDM policies.
Complete the steps to ensure that your IDM rules are properly implemented and are as accurate
and efficient as possible.

Table 27-8 Implementing IDM

Step Action Description

1 Identify the content you want to protect and See “Using IDM to detect exact and partial file contents”
collect the documents that contain this on page 621.
content.
See “Using IDM to detect exact files” on page 620.

2 Prepare the documents for indexing. See “Preparing the document data source for indexing”
on page 625.

3 Whitelist headers, footers, and boilerplate See “White listing file contents to exclude from partial
text. matching” on page 627.

4 Create an Indexed Document Profile and See “Creating and modifying Indexed Document Profiles”
specify the document source. on page 629.

5 Configure any document source filters. See “Filtering documents by file name” on page 640.

6 Schedule indexing as necessary. See “Scheduling document profile indexing” on page 643.

7 Configure one or more IDM policy conditions See “Configuring the Content Matches Document Signature
or exceptions. policy condition” on page 646.

8 Test and troubleshoot your IDM See “Troubleshooting policies” on page 445.
implementation.

Preparing the document data source for indexing

You must collect and prepare the documents you want to index. These documents are known
as the document data source.
See “About the document data source” on page 616.
A document data source is a ZIP archive file that contains the documents to index. It can also
be the files stored in a file share on a local or remote computer. A document data source ZIP
file can contain any file type and any combination of files. If you have a file share that already
contains the documents you want to protect, you can reference this share in the document
profile.

Table 27-9 Preparing the document source for indexing

Step Action Description

1 Collect all of the documents Collect all of the documents you want to index and put them in a folder.
you want to protect.
See “About the document data source” on page 616.

2 Uncompress all the files you The files you index should be in their unencapsulated, uncompressed state.
want to index. Check the document collection to make sure none of the files are
encapsulated in an archive file, such as ZIP, TAR, or RAR. If a file is
embedded in an archive file, extract the source file from the archive file and
remove the archive file.

See “Using IDM to detect exact and partial file contents” on page 621.

3 Separate the documents if To protect a large amount of content and files, create a separate collection
you have more than for each set of up to 1,000,000 files, with all files in their
1,000,000 files to index. unencapsulated, uncompressed state. For example, if you have 1,500,000
documents you want to index, separate the files into two folders, one folder
containing 750,000 files and another folder containing the remaining 750,000
files. Alternatively, you can change the value of
com.vontu.profiles.documents.maxIndexSize in the
Indexer.properties file to accommodate larger data sets. The rule of thumb is
2 GB per 1 million documents.

See “Create separate profiles to index large document sources” on page 653.

4 Decide how you are going to The indexing process is a separate process that runs on the Enforce Server.
make the document source To index the document source you must make the files accessible to the
files available to the Enforce Enforce Server. You have several options. Decide which one works best
Server. for your needs and proceed accordingly.

See “Uploading a document archive to the Enforce Server” on page 633.

See “Referencing a document archive on the Enforce Server” on page 634.

See “Using local path on Enforce Server” on page 636.

See “Using the remote SMB share option to index file shares” on page 637.

5 Configure the document The next step is to configure the document profile. If you also
profile. want to exclude specific document content from detection, whitelist it.

See “Creating and modifying Indexed Document Profiles” on page 629.

See “White listing file contents to exclude from partial matching” on page 627.
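Step 3 of Table 27-9 amounts to splitting a file list into collections of at most 1,000,000 files. The Python sketch below plans that split evenly and applies the 2 GB per 1 million documents rule of thumb. The function names are hypothetical, and how the 2 GB figure maps onto the value you set for com.vontu.profiles.documents.maxIndexSize is not specified here.

```python
from __future__ import annotations

import math

MAX_FILES_PER_COLLECTION = 1_000_000  # per-collection guidance from Table 27-9
GB_PER_MILLION_DOCS = 2               # rule of thumb for index size


def plan_batches(files: list[str], limit: int = MAX_FILES_PER_COLLECTION) -> list[list[str]]:
    """Split files into the fewest collections of at most `limit` files, sized evenly."""
    if not files:
        return []
    count = math.ceil(len(files) / limit)  # minimum number of collections needed
    size = math.ceil(len(files) / count)   # even share per collection (never above limit)
    return [files[i:i + size] for i in range(0, len(files), size)]


def estimated_index_gb(document_count: int) -> float:
    """Apply the 2 GB per 1 million documents rule of thumb."""
    return GB_PER_MILLION_DOCS * document_count / 1_000_000
```

For 1,500,000 files this produces two collections of 750,000 files each, matching the example in the table; each collection can then be copied to its own folder and indexed by a separate profile.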

White listing file contents to exclude from partial matching

You use white listing to exclude unimportant or noncritical content, such as standard boilerplate
text and document headers and footers, from the IDM index. White listing such content helps to
reduce false positives.
See “About white listing partial file contents” on page 624.
See “Use white listing to exclude non-sensitive content from partial matching” on page 651.
To exclude content from matching, you copy the content you want to exclude to a text file and
save the file as Whitelisted.txt. By default, the file must contain at least 300 non-whitespace
characters to have its content fingerprinted for white listing purposes. When you index the
document source, the Enforce Server or the Remote IDM Indexer looks for the
Whitelisted.txt file.
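Preparing Whitelisted.txt by hand is easy to get wrong if the boilerplate is short. The Python sketch below joins boilerplate snippets into a Whitelisted.txt file and refuses to write it when the text has fewer than the default 300 non-whitespace characters. The function names are hypothetical, and writing the file as UTF-8 is an assumption; the guide does not state the expected encoding.

```python
from __future__ import annotations

from pathlib import Path

MIN_NON_WHITESPACE = 300  # default minimum for whitelist fingerprinting


def non_whitespace_count(text: str) -> int:
    """Count characters that are not spaces, tabs, or line breaks."""
    return sum(1 for ch in text if not ch.isspace())


def write_whitelist(snippets: list[str], target_dir: Path) -> Path:
    """Write boilerplate snippets to Whitelisted.txt, refusing text that is too short."""
    text = "\n\n".join(s.strip() for s in snippets)
    if non_whitespace_count(text) < MIN_NON_WHITESPACE:
        raise ValueError(
            f"Whitelisted.txt needs at least {MIN_NON_WHITESPACE} non-whitespace characters"
        )
    path = target_dir / "Whitelisted.txt"
    path.write_text(text, encoding="utf-8")  # assumption: UTF-8 is acceptable
    return path
```

In practice, `target_dir` would be the whitelisted directory on the Enforce Server host that Table 27-10 names.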

Table 27-10 describes the process for excluding document content using white listing.

Table 27-10 White listing non-sensitive content

Step Action Description

1 Copy the content you want to Copy only noncritical content you want to exclude, such as standard
exclude from matching into a text boilerplate text and document headers and footers, to the text file. By
file. default, for file contents matching the file to be indexed must contain
at least 300 characters. This default setting applies to the
Whitelisted.txt file as well. For whitelisted text you can change
this default setting.

See “Changing the default indexer properties” on page 644.

2 Save the text file as The Whitelisted.txt file is the source file for storing content you
Whitelisted.txt. want to exclude from matching.

3 Save the file to the whitelisted directory on the Enforce Server host file system.
Save the file to \Program Files\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\documentprofiles\whitelisted
(on Windows) or /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/documentprofiles/whitelisted (on Linux).

4 Configure the Indexed When you index the document data source, the Enforce Server looks
Document Profile and generate for the Whitelisted.txt file. If the file exists, the Enforce Server
the index. copies it to Whitelisted.x.txt, where x is a unique identification
number corresponding to the Indexed Document Profile. Future
indexing of the profile uses the profile-specific Whitelisted.x.txt
file, not the generic Whitelisted.txt file.

See “Creating and modifying Indexed Document Profiles” on page 629.

Manage and add Indexed Document Profiles

The Manage > Data Profiles > Indexed Documents screen lists all configured Indexed
Document Profiles in the system. From this screen you can manage existing profiles and
add new ones.

Table 27-11 Indexed Documents screen actions

Action Description

Add IDM profile Click Add Document Profile to create a new Indexed Document Profile.

See “Configuring IDM profiles and policy conditions” on page 625.

Edit IDM profile Click the name of the Document Profile, or click the pencil icon to the far right of the profile, to
modify an existing Document Profile.

See “Creating and modifying Indexed Document Profiles” on page 629.

Remove IDM profile Click the red X icon at the far right of the document profile row to delete that profile from
the system. A dialog box confirms the deletion.
Note: You cannot edit or remove a profile if another user is currently modifying that profile, or if a
policy exists that depends on that profile.

Refresh IDM profile Click the refresh arrow icon at the upper right of the Indexed Documents screen to fetch the
status latest status of the indexing process. If you are in the process of indexing, the system displays
the message "Indexing is starting." The system does not automatically update the screen when
the indexing process is complete.

Table 27-12 Indexed Documents screen details

Column Description

Document Profile The name of the Indexed Document Profile.


Detection server The name of the detection server that indexes the Document Profile and the Document Profile
version.

Click the triangle icon beside the Document Profile name to display this information. It appears
beneath the name of the Document Profile.

Location The location of the file(s) on the Enforce Server that the system has profiled and indexed.

Documents The number of documents that the system has indexed for the document profile.

Status The current status of the document indexing process, which can be any of the following:

■ Next scheduled indexing (if it is not currently indexing)

■ Sending an index to a detection server
■ Indexing
■ Deploying to a detection server

In addition, beneath the status of the indexing process, the system displays the status of each
detection server, which can be any of the following:

■ Completed, including a completion date

■ Pending index completion (that is, waiting for the Enforce Server to finish indexing a file)
■ Replicating indexing
■ Creating index (internally)

Error messages The Indexed Documents screen also displays any error messages in red (for example, if the
document profile is corrupted or does not exist).

See “Data Profiles” on page 375.

See “Scheduling document profile indexing” on page 643.
See “Configuring the Content Matches Document Signature policy condition” on page 646.

Creating and modifying Indexed Document Profiles

You define and configure an Indexed Document Profile at the screen Manage > Data Profiles
> Indexed Documents > Configure Document Profile. The document profile specifies the
document data source, the indexing parameters, and the indexing schedule. You must define
a document profile to implement IDM detection.
See “About the Indexed Document Profile” on page 615.
Table 27-13 describes the steps for creating and modifying IDM profiles.

Table 27-13 Configuring a document profile

Step Action Description

1 Navigate to the screen Manage You must be logged on to the Enforce Server administration console
> Data Profiles > Indexed as an administrator or policy author.
Documents.
See “Policy authoring privileges” on page 375.

2 Click Add Document Profile. To edit an existing Indexed Document Profile, select it instead.

See “Manage and add Indexed Document Profiles” on page 628.

3 Enter a Name for the Document Choose a name that describes the data content and the index type
Profile. (for example, "Research Docs IDM"). The name is limited to 255
characters.

See “Input character limits for policy configuration” on page 431.


4 Select the Document Source Select one of the five options for indexing the document data source,
method for indexing. depending on how large your data source is and how you have
packaged it.

See “About the document data source” on page 616.

The options for making the data source available to the Enforce Server are:

■ Upload Document Archive to Server Now
To use this method, click Browse and select a ZIP file containing
the documents to be indexed. The maximum size of the ZIP file
is 50 MB.
See “Uploading a document archive to the Enforce Server”
on page 633.
■ Reference Archive on Enforce Server
Use this method if you have copied the ZIP file to the file system
host where the Enforce Server is installed. The maximum size of
the ZIP file is 2 GB. This ZIP file is available for selection in the
drop-down field.
See “Referencing a document archive on the Enforce Server”
on page 634.
■ Use Local Path on Enforce Server
This method lets you index individual files that are local to the
Enforce Server. With this method the files to be indexed cannot
be archived in a ZIP file.
See “Using local path on Enforce Server” on page 636.
■ Use Remote SMB Share
See “About indexing remote documents” on page 617.
See “Using the remote SMB share option to index SharePoint
documents” on page 637.
■ Import from a remotely created IDM profile
The Remote IDM Indexer is a standalone tool that lets you index
your confidential documents and files locally on the systems where
these files are stored. For more information, see “About the
Remote IDM Indexer” on page 655.

5 Optionally, configure any Filters. You can specify file name and file size filters in the document profile.
The filters tell the system which files to include or ignore during
indexing.

See “Filter documents from indexing to reduce false positives”

on page 652.

Enter files to include in the File Name Include Filters field, or enter
files to exclude in the File Name Exclude Filters field.

See “Filtering documents by file name” on page 640.

Select file sizes to ignore, either Ignore Files Smaller Than or Ignore
Files Larger Than.

See “Filtering documents by file size” on page 642.

6 Select one of the Indexing As part of creating a document profile, you can set up a schedule for
options. indexing the document source.
You do not have to select an indexing option to create a profile that
you can reference in a policy, but you must select an indexing option
to generate the index and actually detect matches using an IDM policy.

■ Select Submit Indexing Job on Save to index the document
source immediately on save of the Document Profile.
■ Select Submit Indexing Job on Schedule to display schedule
options so that you can schedule indexing at a later time.
See “Scheduling document profile indexing” on page 643.

7 Click Save. You must save the document profile.
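The file name and file size filters in step 5 decide which files reach the indexer. The guide does not define the filter syntax in this section, so the Python sketch below assumes shell-style wildcard patterns (such as *.docx), case-insensitive name matching, and sizes in bytes. The `should_index` function and its parameters are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatch


def should_index(name: str, size: int,
                 include: list[str] | None = None,
                 exclude: list[str] | None = None,
                 min_size: int | None = None,
                 max_size: int | None = None) -> bool:
    """Apply hypothetical include/exclude wildcard filters and size bounds to one file."""
    lowered = name.lower()
    # Include filters: the file must match at least one pattern, if any are given
    if include and not any(fnmatch(lowered, p.lower()) for p in include):
        return False
    # Exclude filters: any matching pattern removes the file
    if exclude and any(fnmatch(lowered, p.lower()) for p in exclude):
        return False
    if min_size is not None and size < min_size:  # Ignore Files Smaller Than
        return False
    if max_size is not None and size > max_size:  # Ignore Files Larger Than
        return False
    return True
```

For example, `should_index("draft_plan.docx", 500, exclude=["draft_*"])` returns False under these assumed semantics.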

Configure endpoint partial content matching

You can enable or disable Endpoint partial content matching for IDM profiles on the Enforce
Server administration console at Manage > Data Profiles > Indexed Documents > Configure
Endpoint Partial Matching. This page displays a snapshot in time of all deployed profiles
with their estimated current size. When you click Save, the profiles that you have selected
have partial matching enabled.
Table 27-14 describes the steps for configuring partial content matching on the endpoint.

Table 27-14 Configuring endpoint partial content matching

Step Action Description

1 Navigate to the Manage > Data Profiles > Indexed Documents screen.

2 Click Configure Partial The Configure Partial Content Matching page displays a
Matching. snapshot of all profiles that are deployed at the time you
access the page, along with their estimated current size.
Note: The Configure Partial Content Matching page is not
accessible while any IDM profile is being indexed.

3 Click the checkbox under Endpoint Partial Matching for all profiles that you want
to enable for partial matching.
Note: If a profile starts re-indexing while you are on this page, the profile size
changes significantly, and the profile is also selected for partial matching, the list
of selected profiles might be affected.

4 Click Save.
Note: The total size of all deployed profiles on the endpoint cannot
exceed the value of Endpoint Total Profile Size (MB), which
defaults to 60 MB. To change this value, enter a
different value in the Endpoint Total Profile Size (MB) box.

After you click Save, the profiles that you have selected have
partial matching enabled. Click Refresh to ensure that you
have the latest status of the indexing operation.
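Before you select profiles on the Configure Partial Content Matching page, it can help to check whether their estimated sizes fit within Endpoint Total Profile Size (MB). The Python sketch below sums the estimated sizes shown on that page for the profiles you plan to select. It assumes those estimates are what count against the limit; the profile names and function name are hypothetical.

```python
from __future__ import annotations

DEFAULT_ENDPOINT_TOTAL_MB = 60  # default Endpoint Total Profile Size (MB)


def fits_on_endpoint(estimated_mb: dict[str, float], selected: list[str],
                     limit_mb: float = DEFAULT_ENDPOINT_TOTAL_MB) -> bool:
    """True if the selected profiles' estimated sizes add up to no more than limit_mb."""
    return sum(estimated_mb[name] for name in selected) <= limit_mb
```

For example, two profiles estimated at 25 MB and 30 MB fit under the 60 MB default, but adding a third 10 MB profile does not unless you raise the limit.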

Uploading a document archive to the Enforce Server

The Upload Document Archive to Server Now option lets you upload a ZIP file with a
maximum size of 50 MB to the Enforce Server and index its contents. To use this method of
indexing, the document source must meet the requirements described in Table 27-15.
The procedure “To upload the document archive to Enforce Server” describes the process for
using the Upload Document Archive to Server Now method of indexing.

To upload the document archive to Enforce Server

1 Navigate to the screen Manage > Data Profiles > Indexed Documents > Configure
Document Profile.
2 Select the option Upload Document Archive to Server Now.
Click Browse and select the ZIP file. The ZIP file can be anywhere on the same network
as the Enforce Server.
Optionally, you can type the full path and the file name if the ZIP file is local to the Enforce
Server, for example: c:\Documents\Research.zip.
3 Specify one or more file name or file size filters (optional).
See “Filtering documents by file name” on page 640.
4 Select one of the indexing options (optional).
See “Scheduling document profile indexing” on page 643.
5 Click Save.

Table 27-15 Requirements for using the Upload Document Archive to Server Now option

Requirement Description

ZIP file only The document archive must be a ZIP file; no other encapsulation formats are supported
for this option.

50 MB or less You cannot use this option if the document archive ZIP file is more than 50 MB because
files exceeding that size limit can take too long to upload and slow the performance of the
Enforce Server. If the document archive ZIP file is over 50 MB, use the Reference Archive
on Enforce Server method instead.

UTF-8 file names only The IDM indexing process fails (and presents you with an "unexpected error") if the
document archive (ZIP file) contains non-ASCII file names in encodings other than UTF-8.
If the ZIP file contains files with non-ASCII file names, use one of the following options
instead to make the files available to the Enforce Server for indexing:

■ Use the Remote IDM Indexer.

■ Use Local Path on Enforce Server
■ Use Remote SMB Share
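You can check an archive for this problem before you upload it. In the ZIP format, general purpose bit 11 (0x800) marks an entry name as UTF-8 encoded. The Python sketch below lists entries whose names are non-ASCII but lack that flag; Python's `zipfile` module decodes such names as code page 437 by default, so the reported names may look garbled. This is a heuristic, and the guide does not say how the Enforce Server itself detects the encoding. The function name is hypothetical.

```python
from __future__ import annotations

import zipfile

UTF8_FLAG = 0x800  # ZIP general purpose bit 11: entry name is UTF-8 encoded


def risky_names(zip_path: str) -> list[str]:
    """List entries whose names are non-ASCII but not flagged as UTF-8."""
    risky = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not info.filename.isascii() and not (info.flag_bits & UTF8_FLAG):
                risky.append(info.filename)
    return risky
```

An empty result suggests the archive is safe for the archive-based options; any listed entries point to using one of the alternatives above.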

Referencing a document archive on the Enforce Server

You use the Reference Archive on Enforce Server option to create an IDM index based on
a ZIP file that is local to the Enforce Server. You use this option to index source documents
that are archived in a ZIP file that is larger than 50 MB.
See “About the document data source” on page 616.

Note: If the ZIP file is less than 50 MB, you can use the Upload Document Archive to Server
Now option instead. See “Uploading a document archive to the Enforce Server” on page 633.
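The size limits for the two archive-based options suggest a simple rule for choosing between them. The Python sketch below encodes the limits stated in this section (50 MB for upload, 2 GB for a referenced archive). The function name is hypothetical, and it assumes 1 MB = 1,048,576 bytes, which the guide does not specify.

```python
MB = 1024 * 1024                       # assumption: binary megabytes
UPLOAD_LIMIT_BYTES = 50 * MB           # Upload Document Archive to Server Now
REFERENCE_LIMIT_BYTES = 2 * 1024 * MB  # Reference Archive on Enforce Server (2 GB)


def suggest_source_option(zip_size_bytes: int) -> str:
    """Pick a ZIP-based document source option from the archive size (illustrative)."""
    if zip_size_bytes <= UPLOAD_LIMIT_BYTES:
        return "Upload Document Archive to Server Now"
    if zip_size_bytes <= REFERENCE_LIMIT_BYTES:
        return "Reference Archive on Enforce Server"
    # Too large for either archive option: index unarchived files instead
    return "Use Local Path on Enforce Server or Use Remote SMB Share"
```

Smaller archives can also use Reference Archive on Enforce Server; the sketch simply prefers the upload option when it is allowed.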

To use the Reference Archive on Enforce Server option, you copy the ZIP file to the \Program
Files\Symantec\DataLossPrevention\EnforceServer\Protect\documentprofiles folder
on the Enforce Server file system host. Once you have copied the ZIP file to the Enforce
Server, you can select the document source from the pull-down menu at the Add Document
Profile screen. See “Creating and modifying Indexed Document Profiles” on page 629.
The following procedure describes how to use the Reference Archive on Enforce Server option.
To reference the document archive on the Enforce Server
1 Copy the ZIP file to the Enforce Server.
■ On Windows, copy the ZIP file to directory \Program
Files\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\Protect\documentprofiles
■ On Linux, copy the ZIP file to directory
/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/documentprofiles

See Table 27-16 on page 636.

Note: The system deletes the document data source file after the indexing process
completes.

2 Log on to the Enforce Server administration console.

3 Navigate to the screen Manage > Data Profiles > Indexed Documents > Configure
Document Profile.
4 Select the file from the Reference Archive on Enforce Server pull-down menu.

Note: A document source currently referenced by another Indexed Document Profile
does not appear in the list.

5 Specify one or more file name or file size filters (optional).

See “Filtering documents by file name” on page 640.
6 Select one of the indexing options (optional).
See “Scheduling document profile indexing” on page 643.
7 Click Save to save the document profile.

Table 27-16 Requirements to use the option Reference Archive on Enforce Server

Requirement Description

ZIP file only The document archive must be a ZIP file; no other encapsulation formats are supported
for this option.

2 GB or less The ZIP file can be at most 2 GB. Consider using a third-party solution (such as Secure
FTP) to copy the ZIP file securely to the Enforce Server.

See “About the document data source” on page 616.

Subfiles not archived Make sure the subfiles are plain files that are not encapsulated in an archive (other
than the top-level profile archive).

See “Do not compress files in the document source” on page 649.

See “Do not index empty documents” on page 649.

UTF-8 file names only Do not use this method if any of the files you are indexing have non-ASCII
file names.
Use one of the following options instead:

■ Use the Remote IDM Indexer.

■ Use Local Path on Enforce Server
See “Using local path on Enforce Server” on page 636.
■ Use Remote SMB Share
See “Using the remote SMB share option to index file shares” on page 637.

Using local path on Enforce Server

The Use Local Path on Enforce Server method lets you index individual files that are local
to the Enforce Server. With this method the files to be indexed cannot be archived in a ZIP
file.
See “Creating and modifying Indexed Document Profiles” on page 629.
To use the Use Local Path on Enforce Server method of making the document source
available to the Enforce Server for indexing, you enter the local path to the directory that
contains the documents to index. For example, if you copied the files to the file system at
directory C:\Documents, you would enter C:\Documents in the field for the Use Local Path
on Enforce Server option. You must specify the full (absolute) path, not a relative path. Do not include
the actual file names in the path.

Note: If the files you index include a file that is more than 2 GB in size, the system indexes all
the files except that file. This applies only to the Use Local Path on Enforce Server
option. It does not apply to the Reference Archive on Enforce Server option.
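Before you point Use Local Path on Enforce Server at a directory, you can list which files would be skipped for exceeding 2 GB. The Python sketch below walks the directory and partitions its files by size. It assumes that 2 GB means 2,147,483,648 bytes and that subdirectories are indexed too (os.walk descends into them); the guide confirms neither. The function name is hypothetical.

```python
from __future__ import annotations

import os

TWO_GB = 2 * 1024 ** 3  # assumption: 2 GB = 2,147,483,648 bytes


def partition_by_size(root: str) -> tuple[list[str], list[str]]:
    """Return (files that would be indexed, files skipped for exceeding 2 GB)."""
    if not os.path.isabs(root):
        raise ValueError("Use Local Path on Enforce Server requires an absolute path")
    indexed: list[str] = []
    skipped: list[str] = []
    # Assumption: subdirectories are included, as os.walk descends into them
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            target = skipped if os.path.getsize(path) > TWO_GB else indexed
            target.append(path)
    return indexed, skipped
```

For the earlier example, you would call `partition_by_size(r"C:\Documents")` on the Enforce Server host.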

Using the remote SMB share option to index file shares

The Use Remote SMB Share method lets you index documents remotely using the Common
Internet File System (CIFS) protocol. To use this method of making the document source
available to the Enforce Server, you enter the Universal Naming Convention (UNC) path for
the Server Message Block (SMB) share that contains the documents to index.
See “About indexing remote documents” on page 617.
The procedure “To index remote documents on file shares using CIFS” on page 637 provides the steps
for using CIFS to index remote documents.

Note: Symantec Data Loss Prevention does not delete documents after indexing when you
use the Use Remote SMB Share option.

To index remote documents on file shares using CIFS

1 Log on to the Enforce Server administration console.
2 Navigate to the screen Manage > Data Profiles > Indexed Documents > Configure
Document Profile.
3 Select the option Use Remote SMB Share.
4 Enter the UNC Path for the SMB share that contains the documents to index.
A UNC path consists of a server name, a share name, and an optional file path, for
example: \\server\share\file_path.
5 Enter a valid user name and password for the share, and then re-enter the password.
The user you specify must have general access to the shared drive and read permissions
for the constituent files.
Optionally, you can Use Saved Credentials, in which case the credentials are available
from the pull-down menu.
See “About the credential store” on page 160.
6 Complete the configuration of the Indexed Document Profile.
See “Creating and modifying Indexed Document Profiles” on page 629.
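A quick way to catch typing mistakes in the UNC Path field is to split the path into the parts that step 4 describes: a server name, a share name, and an optional file path. The Python sketch below performs that split and rejects paths that lack the leading double backslash or a share name. The function name is hypothetical, and the sketch does not check that the share is reachable.

```python
from __future__ import annotations


def parse_unc(path: str) -> tuple[str, str, str]:
    r"""Split a UNC path such as \\server\share\dir into (server, share, remainder)."""
    if not path.startswith("\\\\"):
        raise ValueError("A UNC path must start with two backslashes")
    parts = path[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("A UNC path needs both a server name and a share name")
    server, share, *rest = parts
    # Rejoin any remaining components, ignoring empty ones from trailing backslashes
    return server, share, "\\".join(p for p in rest if p)
```

For example, `parse_unc(r"\\server\share\file_path")` returns `("server", "share", "file_path")`.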

Using the remote SMB share option to index SharePoint documents

To remotely index files on SharePoint, you expose the remote file share using WebDAV. Once
you have enabled WebDAV for SharePoint, you use the Use Remote SMB Share option and
enter the UNC path to index the remote documents. Symantec Data Loss Prevention supports
remote IDM indexing using WebDAV for SharePoint 2007 and SharePoint 2010 instances.
See “About indexing remote documents” on page 617.

Table 27-17 provides the procedure for remotely indexing SharePoint documents using WebDAV.

Table 27-17 Indexing of SharePoint documents

Step Task Description

1 Enable WebDAV for See “Enabling WebDAV for Microsoft IIS” on page 639.
SharePoint.

2 Start the WebClient service. From the computer where the Enforce Server is installed, start the WebClient
service using the "Services" console. If this service is "disabled," right-click it
and select Properties. Enable the service, set it to Manual, then Start it.
Note: You must have administrative privileges to enable this service.

3 Access the SharePoint From the computer where your Enforce Server is installed, access SharePoint
instance. using your browser and the following address format:

http://<server_name>:port

For example: http://protect-x64:80

4 Log on to SharePoint as an You do not need to have SharePoint administrative privileges.

authorized user.

5 Locate the documents to In SharePoint, navigate to the documents you want to scan. Often SharePoint
scan. documents are stored at the Home > Shared Documents screen. Your
documents may be stored in a different location.

6 Find the UNC path for the In SharePoint for the documents you want to scan, select the option Library
documents. > Open with Explorer. Windows Explorer should open a window and display
the documents. Look in the Address field for the path to the documents. This
address is the UNC path you need to scan the documents remotely. For
example: \\protect-x64\Shared Documents. Copy this path to the
Clipboard or a text file.

7 Create the IDM Index. See “Creating and modifying Indexed Document Profiles” on page 629.
Detecting content using Indexed Document Matching (IDM) 639
Configuring IDM profiles and policy conditions

Table 27-17 Indexing of SharePoint documents (continued)

Step Task Description

8 Configure the SharePoint To configure the remote indexing source:

remote indexing source.
■ For the Document Source field, select the Use Remote SMB Share option.
■ For the UNC Path, paste (or enter) the address you copied from the previous
step. For example: \\protect-x64\Shared Documents.
■ For the User Credentials, enter your SharePoint user name and password,
or select the same from the Saved Credentials drop-down list.
■ Select the option Submit Indexing on Save and click Save.

9 Verify success. At the Manage > Data Profiles > Indexed Documents screen you should see
that the index was successfully created. Check the "Status" and the number
of documents indexed. If the index was successfully created you can now use
it to create IDM policies.
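Step 6 reads the UNC path from Windows Explorer. If you want to predict that path from the SharePoint URL, the following Python sketch assumes the Windows WebClient (WebDAV redirector) naming convention. That convention uses \\host\path for HTTP on the default port. It appends @SSL to the host for HTTPS and @port for a non-default port. The function name webdav_url_to_unc is hypothetical. Always confirm the result against the path that Explorer displays.

```python
from urllib.parse import unquote, urlsplit


def webdav_url_to_unc(url: str) -> str:
    """Derive a WebClient-style UNC path from a SharePoint URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("Expected an http:// or https:// URL with a host name")
    default_port = 443 if parts.scheme == "https" else 80
    server = parts.hostname
    if parts.scheme == "https":
        server += "@SSL"  # assumed convention: HTTPS is marked with @SSL
    if parts.port is not None and parts.port != default_port:
        server += f"@{parts.port}"  # assumed convention: non-default ports use @port
    # URL path segments become backslash-separated; %20 is decoded to a space.
    path = unquote(parts.path).strip("/").replace("/", "\\")
    return "\\\\" + server + ("\\" + path if path else "")
```

For example, webdav_url_to_unc("http://protect-x64:80/Shared%20Documents") returns \\protect-x64\Shared Documents, which is the path used in step 8.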

See “Troubleshooting SharePoint document indexing” on page 640.

Enabling WebDAV for Microsoft IIS

There are various methods for enabling WebDAV for IIS. The following steps provide one approach, in this case on Windows Server 2008 R2. This approach is provided as an example only; your approach and environment may differ.
Microsoft IIS deployments that host SharePoint instances can be enabled to accept WebDAV connections from web clients.
See “Using the remote SMB share option to index SharePoint documents” on page 637.
Enable WebDAV for SharePoint
1 Log on to the SharePoint system where you want to enable WebDAV.
2 Open the Internet Information Services (IIS) Manager console.
3 Select the server name in the IIS tree.
4 Expand the tree, click the Web Sites folder and expand it.
5 Select the SharePoint instance from the list.
6 Right-click the SharePoint instance and select New > Virtual Directory.
7 The Virtual Directory Creation Wizard appears. Click Next.
8 Enter a name in the Alias field (such as "WebDAV") and click Next.
9 Enter a directory path in the Web Site Content Directory field. It can be any directory
path as long as it exists. Click Next.
10 Select Read access and click Next.

11 Click Finish.
12 Right-click the virtual directory that you created and select Properties.
13 In the Virtual Directory tab, select the option "A redirection to a URL" and click Create.
The alias name is populated in the Application Name field.
14 Enter the SharePoint site URL in the "Redirect to" field and click OK. WebDAV is now
enabled for this SharePoint instance.

Troubleshooting SharePoint document indexing

If you cannot connect the Enforce Server computer to the SharePoint Server computer after
enabling WebDAV, make sure that you have started the WebClient service on the Enforce
Server computer. You must start this service and test the WebDAV connection before you
configure IDM indexing.
See “Using the remote SMB share option to index SharePoint documents” on page 637.
If you plan to re-index SharePoint documents periodically as they are updated, it may be useful
to map the remote network resource to the local computer where the Enforce Server is installed.
You can use the "net use" command at a Windows command prompt to map SharePoint using the UNC path. Enclose a UNC path that contains spaces in quotation marks. For example:
■ net use
This command without parameters retrieves and displays a list of network connections.
■ net use s: "\\sharepoint_server\Shared Documents"
This command assigns (maps) the SharePoint share to the local "S" drive.
■ net use * "\\sharepoint_server\Shared Documents"
This command assigns (maps) the SharePoint share to the next available drive letter.
■ net use s: /delete
This command removes the network mapping to the specified drive.

Filtering documents by file name

When you configure an Indexed Document Profile, you have the option of using filters to include documents in your data source in indexing, or to exclude them from it. There are two types of file name filters: File Name Include Filters and File Name Exclude Filters. If you choose to use file name filters, Symantec recommends that you select either inclusion filters or exclusion filters, but not both.
See “Filter documents from indexing to reduce false positives” on page 652.
Table 27-18 describes the differences between the include and exclude filters for file names.

Table 27-18 File name filters distinguished

File Name Include Filters
If the File Name Include Filters field is empty, matching is performed on all documents in the document profile. If you enter anything in the File Name Include Filters field, it is treated as an inclusion filter. In this case the document is indexed only if it matches the filter you specify.
For example, if you enter *.docx in the File Name Include Filters field, the system indexes only the *.docx files in the document source.

File Name Exclude Filters
The File Name Exclude Filters field lets you specify the documents to exclude from the matching process.
If you leave the File Name Exclude Filters field empty, the system performs matching on all documents in the ZIP file or file share. If you enter any values in the field, the system scans only those documents that do not match the filter.

The system treats forward slashes (/) and backslashes (\) as equivalent. The system ignores
whitespace at the beginning or end of the pattern. File name filtering does not support escape
characters, so you cannot match on literal question marks, commas, or asterisks.
Table 27-19 describes the syntax accepted by the File Name Filters feature. The syntax for
the Include and Exclude filters is the same.

Table 27-19 File name filtering syntax

Operator Description

Asterisk (*) Represents any number of characters.

Question mark (?) Represents a single character.

Comma (,) and newline Represents a logical OR.

Table 27-20 provides sample filters and descriptions of behavior if you enter them in the File
Name Include Filters field:

Table 27-20 File name filter examples

*.txt,*.docx
The system indexes only .txt and .docx files in the ZIP file or file share, ignoring everything else.

?????.docx
The system indexes only files that have the .docx extension and a five-character name, such as hello.docx and stats.docx, but not good.docx or marketing.docx.

*/documentation/*,*/specs/*
The system indexes only files in two subdirectories below the root directory, one called "documentation" and the other called "specs."

*\scan_dir\l*.txt (example with wildcards and subdirectories)
IDM indexing fails or ignores the filter setting if the File Name Include or Exclude filter string starts with an alphanumeric character and includes a wildcard, for example: l*.txt. The workaround is to configure the include or exclude filter string as shown in this example, that is, *\scan_dir\l*.txt.
For example, the filter l*.txt does not work for the file path \\dlp.symantec.com\scan_dir\lincoln-LyceumAddress.txt. However, if the filter is configured as *\scan_dir\l*.txt, the indexer acknowledges the filter and indexes the file.
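The rules in Tables 27-19 and 27-20 can be modeled as simple wildcard matching. The following Python sketch is an illustration under stated assumptions, not the indexer's implementation. It matches each pattern against the whole normalized path, which is consistent with the documented failure of l*.txt against a full UNC path. It also treats matching as case-sensitive and lets * and ? match path separators. The helper names compile_filter and should_index are hypothetical.

```python
import re


def compile_filter(filter_text: str) -> list:
    """Turn a file name filter string into compiled patterns (comma/newline = OR)."""
    patterns = []
    for raw in re.split(r"[,\n]", filter_text):
        # Trim surrounding whitespace and treat \ and / as the same separator.
        pattern = raw.strip().replace("\\", "/")
        if not pattern:
            continue
        # Escape everything (no escape characters are supported), then
        # turn the escaped * and ? back into wildcards.
        regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
        patterns.append(re.compile(regex))
    return patterns


def should_index(path: str, include: str = "", exclude: str = "") -> bool:
    """Apply include and exclude filters to one document path."""
    normalized = path.replace("\\", "/")
    includes, excludes = compile_filter(include), compile_filter(exclude)
    if includes and not any(p.fullmatch(normalized) for p in includes):
        return False  # an include filter is set and this path does not match it
    if any(p.fullmatch(normalized) for p in excludes):
        return False  # the path matches an exclude filter
    return True
```

Under these assumptions, should_index("hello.docx", include="?????.docx") returns True. The call should_index(r"\\dlp.symantec.com\scan_dir\lincoln-LyceumAddress.txt", include="l*.txt") returns False, and the same call with include=r"*\scan_dir\l*.txt" returns True.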

Filtering documents by file size

Filters let you specify documents to include or exclude from indexing. The types of filters include
File Name Include Filters, File Name Exclude Filters, and File Size Filters. You use file size
filters to exclude files from the matching process based on their size. Any files that match the
size filters are ignored.
See “Filtering documents by file name” on page 640.
In the Size Filters fields, specify any restrictions on the size of files the system should index.
In general you should use only one type of file size filter.
See “Filter documents from indexing to reduce false positives” on page 652.
Table 27-21 describes the file size filter options.

Table 27-21 File size filter configuration options

Ignore Files Smaller Than
To exclude files smaller than a particular size:
■ Enter a number in the field for Ignore Files Smaller Than.
■ Select the appropriate unit of measure (Bytes, KB, or MB) from the drop-down list.
For example, to prevent indexing of files smaller than one kilobyte (1 KB), enter 1 in the field and select KB from the corresponding drop-down list.

Ignore Files Larger Than
To exclude files larger than a particular size:
■ Enter a number in the field for Ignore Files Larger Than.
■ Select the appropriate unit of measure (Bytes, KB, or MB) from the drop-down list.
For example, to prevent indexing of files larger than two megabytes (2 MB), enter 2 in the field and select MB from the corresponding drop-down list.
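The two size filters combine as a simple range check. The following Python sketch assumes that KB and MB are binary units (1 KB = 1,024 bytes). It also assumes that a file whose size equals a limit is kept. The guide states neither detail. The function name passes_size_filters is hypothetical.

```python
# Assumption: binary units; the guide does not say whether 1 KB is 1,000 or 1,024 bytes.
UNIT_BYTES = {"Bytes": 1, "KB": 1024, "MB": 1024 * 1024}


def passes_size_filters(size_bytes, smaller_than=None, larger_than=None):
    """Return True if a file survives the Ignore Files Smaller/Larger Than filters.

    Each filter is an optional (number, unit) pair, for example (1, "KB").
    """
    if smaller_than is not None:
        number, unit = smaller_than
        if size_bytes < number * UNIT_BYTES[unit]:
            return False  # ignored: smaller than the lower limit
    if larger_than is not None:
        number, unit = larger_than
        if size_bytes > number * UNIT_BYTES[unit]:
            return False  # ignored: larger than the upper limit
    return True
```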

Scheduling document profile indexing

When you configure a document profile, select Submit Indexing Job on Save to index the
document profile as soon as you save it. Alternatively, you can set up a schedule for indexing
the document source.
To schedule document indexing, select Submit Indexing Job on Schedule and select a
schedule from the drop-down list as described in Table 27-22.

Note: The Enforce Server can index only one document profile at a time. If one indexing
process is scheduled to start while another indexing process is running, the new process does
not begin until the first process completes.

Table 27-22 Options for scheduling Document Profile indexing

Index Once
On – Enter the date to index the document profile in the format MM/DD/YY. You can also click the date widget and select a date.
At – Select the hour to start indexing.

Index Daily
At – Select the hour to start indexing.
Until – Select this check box to specify a date in the format MM/DD/YY when the indexing should stop. You can also click the date widget and select a date.

Index Weekly
Day of the week – Select the day(s) to index the document.
At – Select the hour to start indexing.
Until – Select this check box to specify a date in the format MM/DD/YY when the indexing should stop. You can also click the date widget and select a date.

Index Monthly
Day – Enter the number of the day of each month you want the indexing to occur. The number must be 1 through 28.
At – Select the hour to start indexing.
Until – Select this check box to specify a date in the format MM/DD/YY when the indexing should stop. You can also click the date widget and select a date.
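Index Monthly is the only schedule with a restricted day range. Days 1 through 28 exist in every month, which is a likely reason for the limit. The following Python sketch computes the next start time for a monthly schedule. It is a hypothetical helper. It assumes that Until suppresses any run that falls after the Until date.

```python
from datetime import date, datetime
from typing import Optional


def next_monthly_run(now: datetime, day: int, hour: int,
                     until: Optional[date] = None) -> Optional[datetime]:
    """Next start time for an Index Monthly schedule (hypothetical helper)."""
    if not 1 <= day <= 28:
        # Days 1 through 28 exist in every month, including February.
        raise ValueError("Day must be 1 through 28")
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed; move to the same day next month.
        if now.month == 12:
            candidate = candidate.replace(year=now.year + 1, month=1)
        else:
            candidate = candidate.replace(month=now.month + 1)
    if until is not None and candidate.date() > until:
        return None  # assumption: no runs after the Until date
    return candidate
```

For example, with now = datetime(2019, 8, 19, 10, 0), next_monthly_run(now, 15, 2) returns datetime(2019, 9, 15, 2, 0).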

Changing the default indexer properties

The server index contains the MD5 fingerprint of each indexed file and hashes of discrete passages of content. The fingerprint is taken from the raw binary, or from the exact extracted content if the contents of the file can be extracted.
See “Using IDM to detect exact and partial file contents” on page 621.
The size of the passages depends on the low_threshold_k setting in the indexer properties file (\Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\indexer.properties).
Generally, there is no need to change the default settings. When you lower the default minimum,
the Enforce Server creates hashes out of smaller sections of the documents it indexes.
The default settings apply to the Whitelisted.txt file as well. If the amount of content you
need to whitelist is less than the minimum amount required for partial matching, you can adjust
the default minimum setting.
To change the default minimum for whitelisted text
1 On the Symantec Data Loss Prevention host, navigate to the directory \Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config on Windows, or /opt/Symantec/DataLossPrevention/EnforceServer/15.5/Protect/config on Linux.

2 Use a text editor to open the file indexer.properties.

3 Locate the parameter low_threshold_k:

low_threshold_k=50

4 Change the numerical portion of the parameter value to the minimum number of characters that you want to allow in Whitelisted.txt.
For example, to change the minimum to 30 characters, modify the value to look like the
following:

low_threshold_k=30

The value for this parameter must match the min_normalized_size value. The default
for min_normalized_size is 50.
5 Save the file.
For more information on IDM configuration and customization, see the article "Understanding IDM configuration and customization" at http://www.support.symantec.com/doc/TECH234899 in the Symantec Support Center.
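You might script the change described in the steps above, for example across several Enforce Server installations. In that case the edit amounts to rewriting one key=value line. The following Python sketch is a hypothetical helper. It handles only the key=value form of a Java properties file and leaves comments and other keys untouched. It does not update min_normalized_size, because this section does not state where that value is set. Back up indexer.properties before you change it.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties text with an existing key=value line rewritten (hypothetical helper)."""
    # [ \t]* (not \s*) so that the match never runs onto the next line.
    pattern = re.compile(rf"^({re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, properties_text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text
```

To apply it, read the text of indexer.properties, call set_property(text, "low_threshold_k", "30"), and write the result back to the file.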

Enabling Agent IDM

You enable exact and partial match IDM on the Windows endpoint by setting the advanced agent configuration parameter Detection.TWO_TIER_IDM_ENABLED.str to OFF. When two-tier detection is OFF, the DLP Agent performs exact file matching and partial file contents matching locally, assuming you have generated the endpoint index.

Note: Two-tier deployment is not supported on the Mac Agent.

See “Creating and modifying Indexed Document Profiles” on page 629.

For new installations, exact and partial match IDM on the endpoint is the default setting for
the default endpoint agent configuration (TWO_TIER_IDM_ENABLED = OFF); you do not
need to enable it.
For systems upgraded from 12.0.x, exact and partial match IDM on the endpoint is disabled (TWO_TIER_IDM_ENABLED = ON) so that there is no change in functionality for existing IDM policies deployed to the endpoint. If you want to use exact and partial match IDM on the endpoint after upgrade, you need to turn off two-tier detection and reindex each document data source.
See “To turn two-tier detection on or off” on page 645.
To turn two-tier detection on or off
1 Log on to the Enforce Server administration console.
2 Navigate to System > Agents > Agent Configuration.
3 Select the applicable agent configuration.
4 Select the Advanced Agent Settings tab.
5 Locate the Detection.TWO_TIER_IDM_ENABLED.str parameter.

6 Change the value to either "ON" or "OFF" (case insensitive) depending on your
requirements.
See Table 27-23 on page 646.
7 Click Save at the top of the page to save the changes.
8 Apply the agent configuration to the agent group or groups.
See “Applying agent configurations to an agent group” on page 2412.

Table 27-23 Advanced agent settings for exact match IDM on the endpoint

Parameter: Detection.TWO_TIER_IDM_ENABLED.str

Value: OFF
Default: New installation, or system upgrade from 12.5 or later
Detection engine: DLP Agent
Matching type: Exact file, Partial file contents

Value: ON
Default: System upgrade from 12.0.x
Detection engine: Endpoint Server
Matching type: Exact file, Exact file contents, Partial file contents
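Table 27-23 can also be read as a lookup from the setting value to the detection engine and matching types. The following Python sketch restates the table. The dictionary and the describe_two_tier function are illustrative only and are not part of the product.

```python
# Restatement of Table 27-23; this data layout is illustrative only.
TWO_TIER_MODES = {
    "OFF": ("DLP Agent", ["Exact file", "Partial file contents"]),
    "ON": ("Endpoint Server", ["Exact file", "Exact file contents", "Partial file contents"]),
}


def describe_two_tier(value: str):
    """Map a Detection.TWO_TIER_IDM_ENABLED.str value to (engine, matching types)."""
    key = value.strip().upper()  # the setting value is case insensitive
    if key not in TWO_TIER_MODES:
        raise ValueError("Value must be ON or OFF")
    return TWO_TIER_MODES[key]
```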

Estimating endpoint memory use for agent IDM

On the agent, partial matching requires about 2 KB of RAM per indexed file, or about 60 MB for 30,000 files. Exact matching only requires about 40 bytes per file.
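These figures scale linearly with the number of indexed files. The following Python sketch applies them. It uses decimal units (1 KB = 1,000 bytes), which reproduces the guide's 60 MB figure for 30,000 files. The per-file costs are approximate, so treat the result as an estimate only. The function name is hypothetical.

```python
PARTIAL_BYTES_PER_FILE = 2 * 1000  # "about 2 KB" per file (decimal KB assumed)
EXACT_BYTES_PER_FILE = 40          # "about 40 bytes" per file


def estimate_agent_index_bytes(file_count: int, partial: bool = True) -> int:
    """Rough agent RAM estimate for an endpoint IDM index."""
    per_file = PARTIAL_BYTES_PER_FILE if partial else EXACT_BYTES_PER_FILE
    return file_count * per_file
```

For example, estimate_agent_index_bytes(30_000) returns 60_000_000 (about 60 MB). The call estimate_agent_index_bytes(30_000, partial=False) returns 1_200_000 (about 1.2 MB).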
See “About the server index files and the agent index files” on page 618.

Configuring the Content Matches Document Signature policy condition

The Content Matches Document Signature From condition matches unstructured document content based on an Indexed Document Profile. The Content Matches Document Signature From condition is available for detection rules and exceptions.
See “About using the Content Matches Document Signature policy condition” on page 623.

To configure the Content Matches Document Signature condition

1 Add an IDM condition to a policy rule or exception, or modify an existing one.
See “Configuring policies” on page 413.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the IDM condition parameters.
See Table 27-24 on page 647.
3 Save the policy configuration.

Table 27-24 Content Matches Document Signature condition parameters

Set the Minimum Document Exposure.
Select an option from the drop-down list.
Choose Exact to match document contents exactly.
Choose a percentage between 10% and 90% to match document contents partially.

Configure Match Counting.
Select how you want to count matches:
■ Check for existence
Reports a match count of 1 if there are one or more condition matches.
■ Count all matches
Reports a match count of the exact number of matches.
See “Configuring match counting” on page 421.

Select the components to Match On.
Select one of the available message components to match on:
■ Body – The content of the message.
■ Attachments – Any files that are attached to or transferred by the message.
See “Selecting components to match on” on page 423.

Configure additional conditions to Also Match.
Select this option to create a compound condition. All conditions must be met to trigger or except a match.
You can Add any available condition from the drop-down menu.

Test and tune the policy.
See “Test and tune policies to improve match accuracy” on page 453.
See “Use parallel IDM rules to tune match thresholds” on page 654.
See “Troubleshooting policies” on page 445.
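The Minimum Document Exposure and Match Counting settings in Table 27-24 can be sketched as two small functions. This Python sketch is a simplified model. It assumes that exposure is the fraction of an indexed document's content found in the message. It does not reproduce how the detection engine actually measures that fraction. Both function names are hypothetical.

```python
def reported_match_count(match_count: int, count_all: bool) -> int:
    """Count all matches reports the exact number; Check for existence reports 1 for any match."""
    if count_all:
        return match_count
    return 1 if match_count > 0 else 0


def meets_minimum_exposure(exposed_fraction: float, minimum_percent: int) -> bool:
    """Simplified partial-match check: is enough of the indexed document exposed?"""
    if not 10 <= minimum_percent <= 90:
        raise ValueError("Partial matching accepts 10% through 90%")
    return exposed_fraction * 100 >= minimum_percent
```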


Best practices for using IDM

Indexed Document Matching (IDM) is designed to protect document content and images. IDM relies on an index of fingerprinted documents to perform partial and derivative text-based content matching. You can also use IDM to match indexed documents exactly based on their binary signature, including not only text-based documents but also graphics and media files.
Because of the broad range of matching supported by IDM, you should consider the best
practices in this section to implement IDM policies that accurately match the data you want to
protect.
Table 27-25 summarizes the IDM considerations discussed in this section, with links to individual
topics for each.

Table 27-25 IDM policy best practices

Reindex IDM profiles after upgrade.
See “Reindex IDM profiles after upgrade” on page 649.

Do not compress documents whose content you want to fingerprint.
See “Do not compress files in the document source” on page 649.

Prefer partial matching over exact matching on the DLP Agent.
See “Prefer partial matching over exact matching on the DLP Agent” on page 650.

Do not index text-based documents without content.
See “Do not index empty documents” on page 649.

Be aware of the limitations of exact matching.
See “Understand limitations of exact matching” on page 650.

Use white listing to exclude partial file contents from matching and reduce false positives.
See “Use white listing to exclude non-sensitive content from partial matching” on page 651.

Filter non-critical documents from indexing to reduce false positives.
See “Filter documents from indexing to reduce false positives” on page 652.

Change the index max size to index more than 1,000,000 documents.
See “Create separate profiles to index large document sources” on page 653.

Use remote indexing for large document sets.
See “Remote IDM indexing” on page 655.

Use scheduled indexing to automate profile updates.
See “Use scheduled indexing to keep profiles up to date” on page 653.

Use multiple IDM rules in parallel to establish and tune match thresholds.
See “Use parallel IDM rules to tune match thresholds” on page 654.

Reindex IDM profiles after upgrade

You must update each Indexed Document Matching profile by reindexing each associated data source after you upgrade Symantec Data Loss Prevention.
If you have upgraded Symantec Data Loss Prevention and you want to use partial-match IDM
on the endpoint for existing IDM policies, you must reindex the data source for each Indexed
Document Profile so that each endpoint index is generated and deployed to DLP Agents.
See “Enabling Agent IDM” on page 645.

Do not compress files in the document source

For file formats whose content can be extracted, the server indexing process opens the
document, extracts the text-based content, and fingerprints the data in full and in part (sections).
However, the indexing process cannot recursively inspect document archives that are contained
in the document set. If a document whose file contents you want to index is compressed in an
archive file (such as ZIP, RAR, or TAR) within the document data source, the system cannot
extract the contents from the file and index its content. In this case, the system only takes a
cryptographic hash of the binary file signature. The embedded file is considered for exact file
matches only, like image files and other unsupported file formats.
This behavior is specific to the design-time indexing process only. At run-time the detection
server does recursively inspect document archives and extract the text of files contained in
those archives. But, to be able to evaluate such content, the IDM index must have been able
to index all content files.
The best practice is not to include any files whose content you want to index in a document
archive. The lone exception is the document archive ZIP file that you upload or copy to the
Enforce Server that contains the entire document set. All files in that container file must be
uncompressed. If the Document Archive uploaded to the Enforce Server for indexing contains
one or more embedded archive files (such as a ZIP), the system performs an exact binary
match on any file contained in the embedded archive file.
See “Creating and modifying Indexed Document Profiles” on page 629.

Do not index empty documents

You should be careful about the documents you index. In particular, avoid indexing blank or
empty documents.
For example, indexing a PPTX file that contains only photographs or other graphical content but no textual content matches other blank PPTX files exactly and produces false positives. In this case, even though a PPTX file contains no user-entered text, the file does contain header and footer placeholder text that the system extracts as file contents. Because the amount of text extracted and normalized is more than 50 non-whitespace characters, the system treats the file as text rather than binary and creates a cryptographic hash of all of the file contents. As a result, all other blank PPTX files produce exact file contents matches because the resulting MD5 of the extracted content is the same.

Note: This behavior has not been observed with XLSX files; that is, false positives do not get
created if the blank files are different.

See “Using IDM to detect exact and partial file contents” on page 621.
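The blank-PPTX problem follows directly from hashing normalized text. The following Python sketch is a toy model, not the product's extraction or normalization code. It assumes normalization simply removes whitespace, and it uses the 50-character threshold described above. The placeholder strings are hypothetical. Two presentations with identical placeholder text and no other text produce the same MD5 value.

```python
import hashlib
from typing import Optional

MIN_NORMALIZED_SIZE = 50  # non-whitespace characters, per the text above


def content_fingerprint(extracted_text: str) -> Optional[str]:
    """Toy model: MD5 of normalized extracted text, or None if there is too little text."""
    # Assumption: normalization only removes whitespace; the real rules are not shown here.
    normalized = "".join(extracted_text.split())
    if len(normalized) <= MIN_NORMALIZED_SIZE:
        return None  # treated as binary: only an exact binary match applies
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```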

Prefer partial matching over exact matching on the DLP Agent

If you are deploying IDM policies to the endpoint, partial match IDM is recommended. The main
advantage of partial match IDM on the endpoint is that matching is fast because it is done
locally by the agent instead of remotely by the server. In addition, partial match IDM lets you
use response rules directly on the endpoint.
See “Types of IDM detection” on page 614.

Understand limitations of exact matching

Exact match means just that: inbound data must match the MD5 fingerprint of either a binary file signature or the exact extracted and normalized file contents.
See “Supported forms of matching for IDM” on page 613.
Consider the following when implementing server exact match IDM:
■ White listing only applies to partial file contents matching.
■ For binary files and text-based files coming into the detection engine for exact file matching,
as an optimization the system checks the byte size of the file before computing the run-time
MD5 for comparison against the index. If the file byte sizes do not match there is no
comparison of the cryptographic hashes.
■ File type is never checked for exact file or exact file contents matching.
■ Some file formats change the byte size of a file if the file is opened by the native application
and then saved without changes, resulting in the file not matching exactly. For example, if
you open a file such as a JPEG image with Windows Picture and Fax Viewer and save the
file without making changes, the binary size of the file is nonetheless changed, resulting
in no exact match.
■ For some applications, the Windows Print operation may alter the file data such that the extracted file contents do not match exactly. Known file types that are affected by this include Microsoft Office documents.
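The size-before-hash optimization in the list above can be sketched as an index keyed first by byte size and then by MD5. The ExactFileIndex class below is a hypothetical Python illustration. It also shows why a resaved file with a different byte size can never match exactly.

```python
import hashlib
from collections import defaultdict


class ExactFileIndex:
    """Toy exact-match index keyed by byte size, then MD5 (illustrative only)."""

    def __init__(self) -> None:
        self._by_size = defaultdict(set)

    def add(self, data: bytes) -> None:
        self._by_size[len(data)].add(hashlib.md5(data).hexdigest())

    def matches(self, data: bytes) -> bool:
        candidates = self._by_size.get(len(data))
        if not candidates:
            return False  # no indexed file has this size, so skip hashing entirely
        return hashlib.md5(data).hexdigest() in candidates
```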
Table 1 lists some known limitations with exact content matching. This list is not exhaustive
and there may be other file formats that change on resave.

Table 1 Limitations of exact file content matching

File type Application Result on resave

dwg AutoCAD 2012 Does not match

jpeg Windows Picture and Fax Viewer Does not match

doc Microsoft Office Word 2007 Does not match

xls Microsoft Excel 2007 Does not match

ppt Microsoft PowerPoint 2007 Does not match

pdf Adobe Acrobat 9 Pro Does not match

docx Microsoft Office Word 2007 Match

xlsx Microsoft Excel 2007 Match

pptx Microsoft PowerPoint 2007 Match

Use white listing to exclude non-sensitive content from partial matching

White listing is designed to let you exclude partial file contents from matching. You use white
listing to exclude headers, footers, and boilerplate content from partial matching and reduce
false positives. Information contained in document headers and footers is likely to cause false
positives. Likewise, boilerplate text, such as standard language and non-proprietary corporate content that is often repeated across confidential documents, can cause false positives.
Ideally, you should remove headers and footers from documents before you index them.
However, this may not be feasible, especially if you have a large document set. As a best
practice, you should whitelist header, footer, and boilerplate content so that this text is excluded
when the server index is generated. If you use white listing, generally you can lower the
Minimum Document Exposure setting in the policy without increasing false positives because
more of the content indexed is confidential data, instead of common, repeated content.

Note: White listing does not apply to exact file or exact file contents matching.

See “About white listing partial file contents” on page 624.

See “White listing file contents to exclude from partial matching” on page 627.
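The effect of white listing on partial matching can be pictured as a set difference: passages that also occur in the whitelisted text are dropped before fingerprinting. The following Python sketch is a toy model that treats each non-empty line as a passage. This section does not describe the product's real passage segmentation, which depends on settings such as low_threshold_k. The function names are hypothetical.

```python
def passages(text: str) -> set:
    """Toy model: each non-empty line, with whitespace collapsed, is one passage."""
    return {" ".join(line.split()) for line in text.splitlines() if line.strip()}


def indexable_passages(document: str, whitelist: str) -> set:
    """Drop passages that also appear in the whitelisted text before fingerprinting."""
    return passages(document) - passages(whitelist)
```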

Filter documents from indexing to reduce false positives

When you configure an Indexed Document Profile, you have the option of using filters to include
or exclude documents in your data source for indexing. There are two types of filters: file name
and file size.
See “Creating and modifying Indexed Document Profiles” on page 629.
You use filtering to filter non-critical documents from indexing and ensure that your index is
protecting only confidential files and file contents. Filtering helps reduce false positives and
decrease the size of the IDM index.
See “Do not index empty documents” on page 649.
The best practice is to use either an exclusion filter or an inclusion filter for each filter type, but
not both. For example, you may not need to index all of the files you include in a document
archive or expose to the system by file share. In this case, you can enumerate the files you
want to include (inclusion filter) or list the file types you want to exclude from indexing (exclusion
filter), but you should not use both. You can also use file size filters to set a threshold for the
file size to include or exclude in the index.
See “Filtering documents by file name” on page 640.
See “Filtering documents by file size” on page 642.

Distinguish IDM exceptions from white listing and filtering

White listing lets you exclude partial file contents from matching. Filtering lets you exclude
specific documents from the indexing process. IDM exceptions, on the other hand, let you
except indexed files from exact matching at run-time.
You use the IDM condition as a policy exception to exclude files from detection. To be excepted
from matching, an inbound file must be an exact match with a file in the IDM index. You cannot
use IDM exceptions to exclude content from matching. To exclude content, you must whitelist
it.

Note: White listing is not available for exact file or file contents matching; it is only available
for partial content matching.

Table 27-27 White listing, filters, and exceptions distinguished

Exception
Use: Except an exact file from matching.
As an example, the CAN-SPAM Act policy template uses an IDM exception.

White listing
Use: Except file contents from matching.
See “Use white listing to exclude non-sensitive content from partial matching” on page 651.

Filtering
Use: Include files in indexing, or exclude files from it.
See “Filter documents from indexing to reduce false positives” on page 652.

Create separate profiles to index large document sources

IDM detection is based on an Indexed Document Profile. The maximum single IDM profile size
in RAM is 2 GB. This maximum size limit is based on the overall number of the documents
being indexed. Depending on the size of the actual source files and their extracted text size,
this translates into approximately 1,000,000 files. You can change the 2 GB maximum size of
a single IDM profile index in the indexer.properties file using
com.vontu.profiles.documents.maxIndexSize.

See “About the document data source” on page 616.

If you need to index more than 1,000,000 files, the best practice is to organize the documents
into separate ZIP files or share directories. You should create a separate Indexed Document
Profile for each individual document set. Then, you can define separate rules that reference
each index and add the rules to one or more policies.
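Splitting a large document set into several profiles is a batching problem. The following Python sketch is hypothetical and uses file count as the only limit. The real limit is the 2 GB in-memory index size, which depends on the extracted text of each file. The 1,000,000 figure is therefore only an approximation.

```python
from itertools import islice
from typing import Iterable, Iterator, List

MAX_FILES_PER_PROFILE = 1_000_000  # approximate figure from the guide


def split_into_profiles(paths: Iterable[str],
                        max_files: int = MAX_FILES_PER_PROFILE) -> Iterator[List[str]]:
    """Yield batches of paths, one batch per Indexed Document Profile."""
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, max_files))
        if not batch:
            return
        yield batch
```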

Use WebDAV or CIFS to index remote document data sources

For smaller document sets (50 MB or less), you can upload the files to the Enforce Server.
For larger document sets, consider using FTP Secure to upload the files to the Enforce Server.
Alternatively, you can remotely index documents that are stored on a file share that supports
the CIFS protocol, or on a web server that supports the WebDAV protocol, such as Microsoft
SharePoint or OpenText Livelink.
See “About indexing remote documents” on page 617.

Use scheduled indexing to keep profiles up to date

You can use index scheduling to keep your IDM profiles up to date. The initial indexing run scans
all the documents to be indexed. Each subsequent run scans only the differences since the
previous run. You should schedule indexing outside of normal business hours to reduce any
potential effect on the system.
See “Scheduling document profile indexing” on page 643.

Before you set up an indexing schedule, consider the following recommendations:

■ If you update your document sources occasionally (for example, less than once a month),
there is no need to create a schedule. Index the document each time you update it.
■ Schedule indexing for times of minimal system use. Indexing affects performance throughout
the Symantec Data Loss Prevention system, and large documents can take time to index.
■ Index a document as soon as you add or modify the corresponding document profile, and
re-index the document whenever you update it. For example, consider a situation where
every Wednesday at 2:00 A.M. you update a document. In this case scheduling the index
process to run every Wednesday at 3:00 A.M. is optimal. Scheduling document indexing
daily is not recommended because that is too frequent and can degrade server performance.
■ Monitor results and modify your indexing schedule accordingly. If performance is good and
you want more timely updates, schedule more frequent document updates and indexing.
■ Symantec Data Loss Prevention performs incremental indexing. When a previously indexed
share or directory is indexed again, only the files that have changed or been added are
indexed. Files that are no longer in the document source are removed from the index during
this indexing. As a result, a reindexing operation can run significantly faster than the initial
indexing operation.
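To line an indexing slot up with a known update time, as in the Wednesday 2:00 A.M. update and 3:00 A.M. index example above, you need the next occurrence of a weekday and hour. The following is a minimal sketch of that calculation only; the schedule itself is configured in the Enforce Server administration console, and `next_weekly_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday, hour, minute=0):
    """Return the first datetime strictly after `now` on `weekday` at hour:minute.

    weekday follows datetime.weekday(): Monday=0 ... Sunday=6.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if candidate <= now:
        # Today's slot has already passed, so use the same slot next week.
        candidate += timedelta(days=7)
    return candidate
```

Scheduling one hour after the weekly update gives the update time to finish before the index run starts.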

Use parallel IDM rules to tune match thresholds

The primary use case for IDM policies is to detect unstructured document content based on
a percentage match requirement called the Minimum Document Exposure. This configurable
parameter specifies the minimum percentage of an indexed document's content that must
appear in the message to produce a match. The IDM policy default is “Exact,” which means
that, for text-based documents, all of the content of the indexed document must appear in the
message to create an incident. A Minimum Document Exposure setting of 10% means that, on
average, one page of a 10-page indexed document must appear in the message to create an
incident.
A message might contain much more content than the indexed document, but Symantec Data
Loss Prevention protects only the content that is indexed as part of a document profile. For
example, suppose you index a one-page document, and that one-page document is included
as part of a 100-page document. The 100-page document is considered an exact match
because it contains the entire content of the one-page indexed document. In addition, the
matched document does not have to be of the same file type or format as the indexed
document. For example, if you index a Word document as part of a document profile and its
contents are pasted into the body of an email message or used to create a PDF, the engine
considers it a match.
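The arithmetic behind the two examples can be sketched as follows: one page of a 10-page indexed document gives 10% exposure, and a one-page indexed document found whole inside a 100-page message gives 100%. This is a minimal sketch that measures content in abstract units (pages, as in the guide's examples) and reads "minimum" as greater than or equal to. The real engine compares fingerprints of extracted text, and both function names are hypothetical.

```python
def document_exposure(matched_units, indexed_units):
    """Percentage of an indexed document's content that appears in a message.

    Units are abstract (pages in the guide's examples); the real engine compares
    fingerprints of extracted text, not pages.
    """
    if indexed_units <= 0:
        raise ValueError("the indexed document must contain some content")
    return 100.0 * min(matched_units, indexed_units) / indexed_units


def meets_minimum_exposure(matched_units, indexed_units, minimum_pct):
    """True when the exposure reaches the Minimum Document Exposure setting."""
    return document_exposure(matched_units, indexed_units) >= minimum_pct
```

The size of the surrounding message does not enter the calculation, which is why the 100-page document still counts as an exact match.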
A rule of thumb for the Minimum Document Exposure setting is 60%. Minimum Document
Exposure settings below 50% typically create many false positives. Starting with a rate of 60%
should give you enough information to determine whether to move to a higher or lower match
percentage without creating excessive false positives.
As an alternative, consider taking a tiered approach to establishing Minimum Document
Exposure settings. For example, you can create multiple IDM rules, each with a different
threshold percentage, such as 80% for documents with a high match percentage, 50% for
documents with a medium match percentage, and 10% for documents with a low match
percentage. This approach helps you filter out false positives and establish an accurate
Minimum Document Exposure setting for each IDM index you deploy as part of your policies.
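With parallel rules, each rule is evaluated on its own, so a message with high exposure satisfies every lower tier as well. The following is a minimal sketch of that behavior using the example thresholds above. The rule names and `matching_rules` are hypothetical, and the list is assumed to be ordered from the strictest to the loosest tier.

```python
# Hypothetical parallel rules, one per Minimum Document Exposure tier,
# ordered from the strictest to the loosest threshold.
RULES = [("IDM high (80%)", 80), ("IDM medium (50%)", 50), ("IDM low (10%)", 10)]


def matching_rules(exposure_pct, rules=RULES):
    """Names of every rule whose threshold the measured exposure reaches.

    Rules are evaluated independently, so a high-exposure message matches all tiers;
    with RULES ordered strictest first, the first name is the strictest tier that fired.
    """
    return [name for name, threshold in rules if exposure_pct >= threshold]
```

Reviewing which tier the incidents come from shows where false positives begin, which is the input you need to settle on a single Minimum Document Exposure value per index.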

Remote IDM indexing

This section provides instructions and reference content for using the Remote IDM Indexer.

About the Remote IDM Indexer

The Remote IDM Indexer is a standalone tool. With it you can index your confidential documents
and files locally on the systems where these files are stored. The Remote IDM Indexer frees
you from having to collect and copy all the files to the Enforce Server host for indexing.
The Remote IDM Indexer generates a preindex file (*.prdx) that is encrypted and password
protected. You upload the preindex file to the Enforce Server host for final index generation
and deployment.
The Remote IDM Indexer is supported on Windows and Linux platforms. The tool is configured
using a command line interface (CLI) or a properties file. On Windows, you can use the graphical
user interface (GUI) edition of the tool to configure it.
You can integrate the tool with external systems to schedule indexing. In addition, you can
incrementally index a data source by specifying an existing *.prdx file when you run the tool.

Table 27-28 Remote IDM Indexer features

Feature                          Description
Familiar installation            DLP installers for Windows and Linux
Various configuration options    Properties file (default)
                                 Command-line interface (CLI)
                                 Graphical user interface (GUI) (Windows)
Secure preindex file             Password protected
                                 Encrypted data contents
Incremental indexing             Ability to load an existing preindex and scan only new or updated files
Scheduled indexing               Windows Task Scheduler
                                 Linux cron job
Secure upload to Enforce         UI for uploading the preindex to the Enforce Server. The user must
                                 provide the password to complete the indexing process.

Installing the Remote IDM Indexer

You install the Remote IDM Indexer on one or more systems where the confidential files you
want to index are stored. The process for installing a remote indexer is the same for EMDI,
EDM, and IDM.
See “About installing remote indexers” on page 589.
You can install the Remote IDM Indexer on all supported Windows and Linux platforms. See
the Symantec Data Loss Prevention System Requirements Guide for platform details.

Indexing the document data source using the GUI edition (Windows only)
To configure the GUI edition of the Remote IDM Indexer, you enter the parameters into the
required fields. Optionally, you can provide additional parameters, such as a whitelist file or
file name and file size filters.
On successful completion of indexing, the tool generates the preindex file (*.prdx). You move
this file to the Enforce Server to complete the indexing process.
Figure 27-1 shows the GUI edition of the Remote IDM Indexer.
Table 27-29 provides instructions for configuring the GUI edition of the Remote IDM Indexer.

Figure 27-1 Remote IDM Indexer GUI edition


Table 27-29 Configuring the Remote IDM Indexer using the GUI edition

Step 1: Enter the Source URI path.
The source URI is the local file path (directory folder) where the files to be indexed are
stored. It can also be a shared file system path that is accessible by the host.
The files to be indexed should not be encapsulated.
If the document data source requires credentials, provide them in the URI Credentials section.

Step 2: Enter the Output File name.
Specify the file path and name for the preindex file that the tool generates. Include the *.prdx
file extension when you specify the output file name.

Step 3: Optionally, enter the Whitelist File path.
Specify the file path to the whitelist.txt file. Text in the whitelist file is ignored during detection
for server-based partial matching.

Step 4: Optionally, enter one or more File Name Filters.
Enter one or more file names to include in or exclude from indexing.
The File Name Include Filter includes the named files for indexing.
The File Name Exclude Filter excludes the named files from indexing.
The include and exclude filters accept both comma-separated and newline-separated values.
Use only one type of filter. For example, if you use a file name include filter, do not also
provide a file name exclude filter.

Step 5: Optionally, enter a File Size Filter.

Step 6: Optionally, click Always keep files.
Click Always keep files in the following cases:
■ When you want to incrementally add multiple data sources to the same preindex file.
■ When you have a folder whose content gets moved and you want to keep the old content in
the preindex file.

Step 7: Click Run to index the data source immediately.
Click Run to start the indexing process.
Alternatively, you can click Schedule to schedule indexing. The tool opens a dialog for
creating a Windows Task Scheduler task.
See “Scheduling remote indexing with the Remote IDM Indexer app for Windows” on page 659.

Step 8: Enter the Password for the preindex file.
For security purposes, you must provide a password for the preindex file. The password must
meet one of the following requirements:
■ ASCII password: a minimum of 10 characters, with at least one uppercase letter, one
lowercase letter, and one number.
■ Non-ASCII password: a minimum of 10 characters, including at least one number.
The preindex file is encrypted with the password you provide. You need this password to load
the preindex into the Enforce Server for indexing.

Step 9: Verify indexing progress.
When you click Run, the status bar shows the scanning completion percentage. In addition,
the Progress section of the interface provides the following information:
■ Current Stage: Running, Completed, or Error.
■ Progress: The total number of files indexed.
■ Current File: The name of the file that is being indexed.
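If you generate preindex passwords in a script, for example for scheduled jobs, you can check them against the step 8 rules before you run the tool. The following is a minimal sketch of those rules. It assumes that "number" means a Unicode decimal digit (0 to 9 for ASCII text); that reading is an assumption, and `valid_preindex_password` is a hypothetical helper, not part of the Remote IDM Indexer.

```python
def valid_preindex_password(password: str) -> bool:
    """Check a preindex password against the rules listed in step 8."""
    if len(password) < 10:
        return False
    # Assumption: "number" means any Unicode decimal digit; for ASCII text that is 0-9.
    has_digit = any(ch.isdecimal() for ch in password)
    if password.isascii():
        # ASCII passwords also need at least one uppercase and one lowercase letter.
        return (
            has_digit
            and any(ch.isupper() for ch in password)
            and any(ch.islower() for ch in password)
        )
    # Non-ASCII passwords need only the minimum length and at least one number.
    return has_digit
```

`str.isascii()` requires Python 3.7 or later.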

Scheduling remote indexing with the Remote IDM Indexer app for Windows
If you use the Windows GUI version of the Remote IDM Indexer, you can schedule or edit a
task directly from the tool. The following screenshots illustrate the process.
See “To schedule indexing using the Windows GUI version” on page 659.
See “To edit an existing scheduled task using the Windows GUI” on page 661.
To schedule indexing using the Windows GUI version
1 Click Schedule to open the scheduling dialog. See Figure 27-2 on page 660.
2 Click Create to create a new scheduled task. Or, if you already have a task created, click
Edit.
You are prompted to provide a UTF-8-encoded password file in cleartext for the scheduled
job. Limit access to this file to the appropriate user, such as your Protect user.
Click Create and provide the credentials to the Windows host.
3 Enter the user name and password for the Windows host where the Task Scheduler is
installed.
When you enter the appropriate credentials (administrator privileges are generally required),
the Remote IDM Indexer creates a new task in the Windows Task Scheduler. The tool
displays a dialog indicating that the task was successfully created and provides you with
the name of the task. See Figure 27-3 on page 660.
4 Click OK to close the dialog.
After you complete this operation, the Windows Task Scheduler interface appears.
5 Select the SymantecDLP folder in the Task Scheduler Library.
Notice that a task named "Remote IDM Indexer <time-stamp>" appears on the right.
See Figure 27-4 on page 661.
6 Double-click the created task.
This action opens the Windows Task Scheduler properties dialog for this task. Using this
dialog, you can schedule when the Remote IDM Indexer should run. Refer to the Task
Scheduler help for details on using the Windows Task Scheduler.

Figure 27-2 Scheduling indexing dialog

Figure 27-3 Successfully scheduled task dialog


Figure 27-4 Symantec DLP scheduled task

To edit an existing scheduled task using the Windows GUI

1 Click Schedule to open the dialog. See Figure 27-2 on page 660.
2 Click Edit/Delete Existing Tasks to open the Windows Task Scheduler utility. Here
you can edit or delete an existing scheduled task.

Figure 27-5 Windows Task Scheduler properties configuration

See “Incremental indexing” on page 661.

Incremental indexing
You can incrementally index a remote data source by specifying an existing preindex file
(*.prdx) in the command line argument when you run the tool.
In the GUI version of the tool, you can browse to and select an existing *.prdx file for the
Output File path.
The indexing process appends newly indexed files and file contents to the existing preindex
entries.
The tool compares the last modified date of each file. If a file was modified after it was
preindexed, the tool updates the preindex with the changes that were made to the file. If the
last modified date has not changed, the preindex is not updated for that file. If you change any
include, exclude, or size filters for an existing preindex file, those filters are also applied to
previously indexed files. For example, for a remote data source with ten .docx files and ten
.pptx files, if your first remote indexing job has no filters, all files are indexed. If you add an
exclude filter for .docx files (-exclude_filter=*.docx) and run the indexing job again, the
.docx files are removed from the index and only the .pptx files remain.
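The decision logic described above can be modeled in a few lines. The following is a minimal sketch, assuming that filters are glob patterns like the `*.docx` example and that, without Always keep files, files missing from the current data source are dropped. The second assumption follows from the description of Always keep files. `plan_incremental` is a hypothetical helper and does not reflect the Remote IDM Indexer's internal implementation.

```python
from fnmatch import fnmatch


def plan_incremental(previous, current, include=(), exclude=()):
    """Decide what an incremental run re-indexes and what it drops.

    previous: path -> last-modified time recorded in the existing preindex
    current:  path -> last-modified time seen in the data source now
    include/exclude: glob patterns such as "*.docx" (use one kind, not both)
    Returns (to_index, to_drop) as sorted lists of paths.
    """
    if include and exclude:
        raise ValueError("use an include filter or an exclude filter, not both")

    def allowed(path):
        if include:
            return any(fnmatch(path, pattern) for pattern in include)
        return not any(fnmatch(path, pattern) for pattern in exclude)

    # New files, and files modified after they were preindexed, are (re)indexed.
    to_index = sorted(
        path for path, mtime in current.items()
        if allowed(path) and (path not in previous or mtime > previous[path])
    )
    # Changed filters also apply to earlier entries; files gone from the source
    # are dropped too (keep_all_files is not modeled in this sketch).
    to_drop = sorted(
        path for path in previous
        if path not in current or not allowed(path)
    )
    return to_index, to_drop
```

In the ten .docx and ten .pptx example, running with `exclude=["*.docx"]` returns all ten .docx paths in `to_drop` and re-indexes only the .pptx files whose modification time changed.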

Always keep files

You can select Always keep files in the Remote IDM Indexer GUI version for Windows, or use
keep_all_files=true at the command line for Windows and Linux, when you want to
incrementally add multiple data sources to the same preindex file. This option keeps files that
are in the previous preindex but not in the current data source. You can also use
keep_all_files if you have a folder whose content is moved and you want to keep the old
content in the preindex file.
The previous IDM incremental indexer, and the indexer available through the Enforce Server
administration console, replace the entire old index with a new one. For example, when
document set A is indexed and then document set B is incrementally indexed for the same
profile, the index of set A is dropped and replaced with the index of set B.
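The difference between the two behaviors comes down to which old entries survive a run. The following is a minimal, illustrative model of that difference, using the set A and set B example. `merge_preindex` is hypothetical, and "entries" stand in for whatever the preindex actually stores.

```python
def merge_preindex(previous, scanned, keep_all_files=False):
    """Combine preindex entries after a run (illustrative model only).

    previous: path -> entry from the existing preindex
    scanned:  path -> entry for every file found in the current data source
    With keep_all_files, entries for files missing from the current source survive;
    without it, the result reflects only the current source.
    """
    merged = dict(previous) if keep_all_files else {}
    merged.update(scanned)  # current entries win over older ones for the same path
    return merged
```

With `keep_all_files=True`, indexing set B after set A yields entries for both sets. Without it, the result holds only set B, which matches the replace-everything behavior of the earlier indexers.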

Logging and troubleshooting

Remote IDM indexing status messages are logged to the Indexer.log file.
The log file path is C:\ProgramData\Symantec\DataLossPrevention\Indexer\15.5\logs
(Windows) or /var/log/Symantec/DataLossPrevention/Indexer/15.5/ (Linux).
The log presents error messages indicating whether file access was denied or file indexing
failed.
See “Copying the preindex file to the Enforce Server host” on page 662.

Copying the preindex file to the Enforce Server host

After you generate the preindex file, you must copy it to the Enforce Server host so that it can
be loaded for profiling and deployment.
Copy the *.prdx file to the following directory on the Enforce Server host:
■ Windows: C:\Program Files\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\documentprofiles
■ Linux: /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/documentprofiles

You can use FTP or FTP/S to copy the *.prdx file to the Enforce Server host file system.

Note: Make sure that the Enforce user who reads and loads the .prdx file has the permissions
required to copy and load the file.
See “Loading the remote index file into the Enforce Server” on page 663.
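If an FTPS server runs on or near the Enforce Server host, the copy can be scripted with Python's standard `ftplib`. The following is a minimal sketch under that assumption. The host name, account, and `remote_dir` are placeholders for your environment, because an FTP server may expose the documentprofiles directory under a different path. `profile_dir` and `upload_preindex` are hypothetical helpers.

```python
import ftplib
import os

# Enforce Server directories for preindex files, as listed in this guide.
ENFORCE_PROFILE_DIRS = {
    "windows": r"C:\Program Files\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\documentprofiles",
    "linux": "/var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/documentprofiles",
}


def profile_dir(enforce_os):
    """Return the documentprofiles directory for an Enforce Server on 'windows' or 'linux'."""
    return ENFORCE_PROFILE_DIRS[enforce_os.lower()]


def upload_preindex(local_path, host, user, password, remote_dir):
    """Upload a *.prdx preindex (local_path is a str) over FTPS (FTP with explicit TLS).

    remote_dir is the path as the FTP server exposes it, which may differ from
    the local directory returned by profile_dir().
    """
    if not local_path.lower().endswith(".prdx"):
        raise ValueError("expected a *.prdx preindex file")
    with ftplib.FTP_TLS(host) as ftps:
        ftps.login(user, password)  # FTP_TLS.login() negotiates TLS first
        ftps.prot_p()               # also encrypt the data channel
        ftps.cwd(remote_dir)
        with open(local_path, "rb") as fh:
            ftps.storbinary(f"STOR {os.path.basename(local_path)}", fh)
```

After the upload, confirm that the Enforce user can read the file, as the preceding note requires.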

Loading the remote index file into the Enforce Server

The Enforce Server administration console provides a user interface for uploading remote IDM
preindexes to the Enforce Server.
The Data Loss Prevention administrator or policy author must specify the preindex password
that was entered when the preindex file was initially created.
The system uses the preindex to generate the final index that is deployed to detection servers
and agents (if Agent IDM is enabled).

Note: If you have not copied the preindex file to the proper directory on the Enforce Server
host (C:\Program Files\Symantec\DataLossPrevention\ServerPlatformCommon\15.5\documentprofiles
on Windows or /var/Symantec/DataLossPrevention/ServerPlatformCommon/15.5/documentprofiles
on Linux), the file does not appear in the drop-down field for selection.

Figure 27-6 Loading the remote index into Enforce

Chapter 28
Detecting content using Vector Machine Learning (VML)
This chapter includes the following topics:

■ Introducing Vector Machine Learning (VML)

■ Configuring VML profiles and policy conditions

■ Best practices for using VML

Introducing Vector Machine Learning (VML)

Vector Machine Learning (VML) performs statistical analysis to protect unstructured data. The
analysis determines if content is similar to example content you train against.
With VML you do not have to locate and fingerprint all of the data you want to protect. You
also do not have to describe it and risk potential inaccuracies. Instead, you train the system
to learn the type of content you want to protect based on example documents you provide.
VML detection is based on a VML profile. You create a VML profile by uploading a
representative amount of content from a specific category of data. The system scans the
content, extracts the features, and creates a statistical model based on the frequency of
keywords in the example documents. At run-time the system applies the model to analyze and
detect the content that has the features that are statistically similar to the profile.
VML simplifies the detection of unstructured, text-based content and offers the potential for
high accuracy. The key to implementing VML is the example content you train the system
against. You must be careful to select the documents that are representative of the type of
content you want to protect. And, you must select good examples of content you want to ignore
that are closely related to the content you want to protect.

See “Configuring VML profiles and policy conditions” on page 668.
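To build intuition for training on positive and negative examples, the following sketch classifies text by keyword-frequency similarity. VML itself uses a proprietary feature-selection algorithm, so this sketch swaps in a much simpler, openly named technique: nearest-centroid classification with cosine similarity over term frequencies. It is not how Symantec Data Loss Prevention computes profiles or scores, and all names in it are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term frequencies of lower-cased word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def centroid(docs):
    """Average term-frequency vector of a training set."""
    if not docs:
        raise ValueError("a training set needs at least one document")
    total = Counter()
    for doc in docs:
        total.update(tf_vector(doc))
    return {term: count / len(docs) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def looks_positive(text, positive_centroid, negative_centroid):
    """True when the text is closer to the positive set than to the negative set."""
    vec = tf_vector(text)
    return cosine(vec, positive_centroid) > cosine(vec, negative_centroid)
```

The sketch shows why closely related negative examples matter: they pull shared vocabulary toward the negative side, so only content that resembles the positive set more closely is flagged.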

About the Vector Machine Learning Profile

The Vector Machine Learning Profile is the data profile that you define for implementing VML
policies.
For example, you might create a VML profile to protect your source code. You train the system
using positive example documents (proprietary code that you want to protect). You also train
the system using negative example documents (open source code that you do not care to
protect). A VML policy references the VML profile to analyze message data and recognize the
content that is similar to the positive features. The VML profile can be tuned, and it can be
easily updated by adding or removing documents to or from the training sets.
See “Data Profiles” on page 375.
See “Creating new VML profiles” on page 669.

About the content you train

Collecting the documents for training is the most important step in the Vector Machine Learning
process. Vector Machine Learning is only as accurate as the example content you train against.
See “Configuring VML profiles and policy conditions” on page 668.
A VML profile is based on a category of content representing a specific business use case. A
category of content comprises two training sets: positive and negative.
The positive training set is content you want to protect. More specific categorization results in
better accuracy. For example, “Customer Purchase Orders” is better than “Financial Documents”
because it is more specific.
The negative training set is content you want to ignore, yet related to the positive training set.
For example, if the positive training set is “Weekly Sales Reports," the negative training set
might contain "Sales Press Releases."
You should collect an equal amount of positive and negative content that is primarily text-based.
You do not have to collect all the content you want to protect. However, you do need to
assemble training sets large enough to produce reliable statistics.
The recommended number of documents is 250 per training set. The minimum number of
documents per training set is 50.
Table 28-1 summarizes the baseline requirements for the content you collect for VML profile
training.
Table 28-1 VML training set requirements

Category of content: Single, specific business use case category
Type of data: Text-based (primarily)
Size: 30 MB per upload; no size limit per category

Training set   Quantity                                 Content
Positive       Recommended: 250 documents               Content you want to protect.
               Minimum: 50 documents
Negative       Approximately the same amount as the     Content you do not want to protect, yet
               positive category                        thematically related to the positive category.
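Before you upload, you can sanity-check the size of your training sets against Table 28-1. The following is a minimal sketch that flags sets below the minimum or recommended counts and sets that are noticeably unbalanced. The 20% balance tolerance is a hypothetical stand-in for "approximately the same amount", and `check_training_sets` is not part of the product.

```python
RECOMMENDED_DOCS = 250
MINIMUM_DOCS = 50


def check_training_sets(n_positive, n_negative, balance_tolerance=0.2):
    """Return human-readable warnings; an empty list means no issues were found."""
    warnings = []
    for name, n in (("positive", n_positive), ("negative", n_negative)):
        if n < MINIMUM_DOCS:
            warnings.append(f"{name} set has {n} documents; minimum is {MINIMUM_DOCS}")
        elif n < RECOMMENDED_DOCS:
            warnings.append(f"{name} set has {n} documents; {RECOMMENDED_DOCS} recommended")
    larger = max(n_positive, n_negative)
    # Hypothetical reading of "approximately the same amount" as a relative tolerance.
    if larger and abs(n_positive - n_negative) / larger > balance_tolerance:
        warnings.append(
            f"positive and negative sets differ in size by more than {balance_tolerance:.0%}"
        )
    return warnings
```

A positive set of 40 documents against a negative set of 250, for example, produces two warnings: one for falling below the minimum and one for imbalance.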

About the base accuracy from training percentage rates

During the VML profile training process, the system extracts example document content and
converts it to raw text. The system selects features (or keywords) using a proprietary algorithm
and generates the VML profile. As part of the training process, the system calculates and
reports base accuracy rates for false positives and false negatives. The base accuracies from
training percentage rates indicate the quality of your positive and negative training sets.
The goal is to achieve 100% accuracy (0% base false rates), but obtaining this level of quality
for both training sets is usually not possible. You should reject a training profile if either the
base false positive rate or the base false negative rate is more than 5%. A relatively high base
false percentage rate indicates that the training set is not well categorized. In this case you
need to add documents to an under-represented training set or remove documents from an
over-represented training set, or both.
See “Managing training set documents” on page 676.
Table 28-2 describes what the base accuracy percentage rates from training mean in relation
to the positive and negative training sets for a given VML profile.

Table 28-2 Base accuracy rates from training

Accuracy rate                  Description
Base false positive rate (%)   The percentage of the content in the negative training set that is
                               statistically similar to the positive content.
Base false negative rate (%)   The percentage of the content in the positive training set that is
                               statistically similar to the negative content.
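The 5% acceptance rule can be expressed as a small check. The following is a minimal sketch, assuming you already know how many negative documents were judged similar to the positive content and the reverse. The guide does not say how the system counts these internally. `base_rates` and `should_accept` are hypothetical helpers.

```python
MAX_BASE_FALSE_RATE = 5.0  # percent; reject the profile if either rate exceeds this


def base_rates(neg_similar_to_pos, n_negative, pos_similar_to_neg, n_positive):
    """Return (base false positive %, base false negative %) as defined in Table 28-2."""
    false_positive = 100.0 * neg_similar_to_pos / n_negative
    false_negative = 100.0 * pos_similar_to_neg / n_positive
    return false_positive, false_negative


def should_accept(fp_rate, fn_rate, limit=MAX_BASE_FALSE_RATE):
    """Accept only when neither base false rate is above the limit."""
    return fp_rate <= limit and fn_rate <= limit
```

For example, 5 of 250 negative documents and 10 of 250 positive documents give rates of 2% and 4%, which are acceptable. A base false negative rate of 6% would call for adjusting the training sets.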

About the Similarity Threshold and Similarity Score

Each VML profile has a Similarity Threshold that can be set from 0 to 10. This setting is used
to make an adjustment for imperfect information within a training set to achieve the best
accuracy possible. During detection, a message must have a Similarity Score greater than the
Similarity Threshold for an incident to be generated. The Similarity Threshold is set at the
profile level—not within a policy. It is set this way because there is an ideal Similarity Threshold
setting that is unique to your training set where the best accuracy rates can be achieved (both
in terms of false positives and false negatives).
When a VML policy detects an incident, the system displays the Similarity Score in the match
highlighting section of the Incident Snapshot in the Enforce Server administration console.
The Similarity Score indicates how similar the detected content is to the VML profile. The
higher the score the more statistically similar the message is to the positive example documents
in your VML profile.
Consider an example where a Similarity Threshold is set to 4 and a message with a Similarity
Score of 5 is detected. In this case the system reports the match as an incident and displays
the Similarity Score during match highlighting. However, if a message is detected with a
Similarity Score of 3, the system does not report a match (and creates no incident) because the
Similarity Score is below the Similarity Threshold.
Table 28-3 describes the Similarity Threshold and Similarity Score numbers.

Table 28-3 Similarity Threshold and Similarity Score details

Similarity             Description
Similarity Threshold   The Similarity Threshold is a configurable parameter between 0 and 10 that is
                       unique to each VML profile. The default setting is 10, which requires the closest
                       match between the VML profile features and the detected message content. As
                       such, this setting is likely to produce the fewest incidents. A setting of 0
                       produces the most matches, many of which are likely to be false positives.
                       See “Adjusting the Similarity Threshold” on page 681.
Similarity Score       The Similarity Score is a read-only run-time statistic between 0 and 10 that the
                       system reports based on the detection results of a VML policy. To report an
                       incident, the Similarity Score must be higher than the Similarity Threshold;
                       otherwise, the VML policy does not report a match.
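The incident decision described in this section is a strict comparison. The following is a minimal sketch of it, using the example values above (threshold 4, scores 5 and 3). `vml_incident` is a hypothetical helper. Under the "must be higher" wording, a score equal to the threshold does not create an incident.

```python
def vml_incident(similarity_score, similarity_threshold):
    """True when a VML match should be reported as an incident."""
    if not 0 <= similarity_threshold <= 10:
        raise ValueError("the Similarity Threshold must be between 0 and 10")
    # The score must be strictly higher than the profile's threshold.
    return similarity_score > similarity_threshold
```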

About using unaccepted VML profiles in policies

The system lets you create a policy that is based on a VML profile that has never been accepted.
However, the VML profile is not active and is not deployed to a referenced policy until the
profile is initially accepted.
See “Training VML profiles” on page 672.

Where you have a VML policy that references a never-accepted VML profile, the result of this
configuration depends on the type of detection server. Table 28-4 describes the behavior:

Table 28-4 References to never-accepted VML profiles

Detection server               Description
Discover Server                Discover scanning does not begin until all policy dependencies are loaded.
                               A Discover scan based on a VML policy does not start until the referenced
                               VML profile is accepted. In this case, the system displays a message in the
                               Discover scanning interface indicating that the scan is waiting for the
                               dependency to load.
Network and Endpoint Servers   For a simple rule, or a compound rule where the conditions are ANDed, the
                               entire rule fails because the VML condition cannot match. If this is the only
                               rule in the policy, the policy does not work.
                               For a policy where there are multiple rules that are ORed, only the VML rule
                               fails; the other rules in the policy are evaluated.

See “Policy detection execution” on page 394.
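The Network and Endpoint behavior in Table 28-4 follows from ordinary AND/OR evaluation in which an unavailable condition never matches. The following is a minimal sketch of that logic only; it does not model Discover Server, which waits instead. The representation of rules and the name `evaluate_policy` are hypothetical.

```python
def evaluate_policy(rules, condition_results):
    """Evaluate a policy whose rules are ORed and whose conditions within a rule are ANDed.

    rules: list of rules; each rule is a list of condition names
    condition_results: condition name -> True/False, or None when the condition
    cannot be evaluated (here: a VML condition whose profile was never accepted)
    """
    def rule_matches(conditions):
        # A condition that cannot be evaluated never matches, so an ANDed rule fails.
        return all(condition_results.get(name) is True for name in conditions)

    return any(rule_matches(rule) for rule in rules)
```

A rule such as `["vml", "keyword"]` therefore never fires while the profile is unaccepted, but a separate `["keyword"]` rule in the same policy still can.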

Configuring VML profiles and policy conditions

Vector Machine Learning (VML) performs statistical analysis to protect unstructured data. It
also determines if content is similar to an example set of documents you train against.
See “Introducing Vector Machine Learning (VML)” on page 664.
The following table describes the process for implementing VML.

Table 28-5 Implementing VML

Step 1: Collect the example documents for training the system.
Collect a representative number of example documents that contain the positive content that
you want to protect and the negative content you want to ignore.
See “About the content you train” on page 665.

Step 2: Create a new VML profile.
Define a new VML profile based on the specific business category of data from which you have
derived your positive and negative training sets.
See “Creating new VML profiles” on page 669.

Step 3: Upload the example documents.
Upload the example positive and negative training sets separately to the Enforce Server.
See “Uploading example documents for training” on page 671.

Step 4: Train the VML profile.
Train the system to learn the type of content you want to protect and generate the VML profile.
See “Training VML profiles” on page 672.

Step 5: Accept or reject the trained profile.
Accept the trained profile to deploy it. Or, reject the profile, update one or both of the training
sets (by adding or removing example documents), and restart the training process.
See “About the base accuracy from training percentage rates” on page 666.
See “Managing VML profiles” on page 677.

Step 6: Create a VML policy and test detection.
Create a VML policy that references the VML profile.
See “Configuring the Detect using Vector Machine Learning Profile condition” on page 679.
Test and review incidents based on the Similarity Score.
See “About the Similarity Threshold and Similarity Score” on page 667.

Step 7: Tune the VML profile.
Adjust the Similarity Threshold setting as necessary to optimize detection results.
See “Adjusting the Similarity Threshold” on page 681.

Step 8: Follow VML best practices.
See “Best practices for using VML” on page 687.

Creating new VML profiles

A VML profile contains the model that is generated from the training set contents. Once you
define a VML profile, you use it to create one or more VML policies.
See “Configuring VML profiles and policy conditions” on page 668.

Note: You must have Enforce Server administrator privileges to create VML profiles.

To create a new VML profile

1 Click New Profile from the Manage > Data Profiles > Vector Machine Learning screen
(if you have not already done so).
2 Enter a Name for the VML profile in the Create New Profile dialog.
Use a logical name for the VML profile that corresponds to the category of data you want
to protect.
See “About the content you train” on page 665.
3 Optionally, enter a Description for the VML profile.
You may want to include a description that identifies the purpose of the VML profile.
4 Click Create to create the new VML profile.
Or, click Cancel to cancel the operation.
5 Click Manage Profile to upload example documents.
See “Uploading example documents for training” on page 671.

Working with the Current Profile and Temporary Workspace tabs

For any single VML profile there are two possible versions: Current and Temporary. The
Current Profile is the run-time version; the Temporary Profile is the design-time version. As
you develop a VML profile, you create a Current Profile that you have trained, accepted, and
perhaps deployed to one or more policies. You also create a Temporary Profile that you actively
edit and tune.
The Enforce Server administration console displays each version of the VML profile in separate
tabs:
■ Current Profile
This version is the active instance of the VML profile. This version has been successfully
trained and accepted; it is available for deployment to one or more policies.
■ Temporary Workspace
This version is an editable version of the VML profile. This version has not been trained,
or accepted, or both; it cannot be deployed to a policy.
Initially, when you create a new VML profile, the system displays only the Current Profile tab
with an empty training set. After you initially train and accept the VML profile, the Trained Set
table in the Current Profile tab is populated with details about the training set. The information
that is displayed in this table and tab is read-only.
Detecting content using Vector Machine Learning (VML) 671
Configuring VML profiles and policy conditions

To edit a VML profile

◆ Click Manage Profile to the far right of the Current Profile tab.
The system displays the editable version of the profile in the Temporary Workspace tab.
You can now proceed with training and managing the profile.
See “Training VML profiles” on page 672.
The Temporary Workspace tab remains present in the user interface until you train and
accept a new version of the VML profile. In other words, there is no way to close the Temporary
Workspace tab without training and accepting, even if you made no changes to the profile.
Once you accept a new version of the VML profile, the system overwrites the previous Current
Profile with the newly accepted version. You cannot revert to a previously accepted Current
Profile. However, you can revert to previous versions of the training set for a Temporary Profile.
See “Managing training set documents” on page 676.

Uploading example documents for training

The training set comprises the example positive and negative documents you want to train
the system against. You upload the positive and the negative documents separately.

Note: You can upload individual documents. However, we recommend that you upload a
document archive (such as ZIP, RAR, or TAR) that contains the recommended (250) or
minimum (50) number of example documents. The maximum upload size is 30 MB. If you have
more than 30 MB of data to upload, partition the documents across multiple archives. See
“About the content you train” on page 665.
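If your example documents exceed the 30 MB upload limit, you can split them into several archives before you upload them. The following Python sketch shows one way to do that. It is not a Symantec tool, and plan_batches and write_archives are hypothetical names. It assumes that 30 MB means 30 × 1024 × 1024 bytes. It also estimates archive size from uncompressed file sizes and reserves 1 MB of headroom for ZIP overhead, so treat the batch sizes as approximate. Run it separately for the positive and the negative documents, because each training set is uploaded separately.

```python
import os
import zipfile

# Assumption: the 30 MB limit is 30 * 1024 * 1024 bytes. Reserve 1 MB of
# headroom for ZIP headers, because this sketch estimates archive size from
# uncompressed file sizes.
UPLOAD_LIMIT_BYTES = 30 * 1024 * 1024
HEADROOM_BYTES = 1024 * 1024


def plan_batches(files, limit=UPLOAD_LIMIT_BYTES - HEADROOM_BYTES):
    """Group (name, size) pairs, in order, into batches whose total size stays within limit."""
    batches, current, current_size = [], [], 0
    for name, size in files:
        if size > limit:
            raise ValueError(f"{name} alone exceeds the upload limit")
        if current and current_size + size > limit:
            # Close the current batch and start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def write_archives(folder, prefix):
    """Write one ZIP archive per batch of files found directly in folder."""
    paths = sorted(
        os.path.join(folder, entry)
        for entry in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, entry))
    )
    files = [(path, os.path.getsize(path)) for path in paths]
    archives = []
    for index, batch in enumerate(plan_batches(files), start=1):
        archive_name = f"{prefix}_{index}.zip"
        with zipfile.ZipFile(archive_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                zf.write(path, arcname=os.path.basename(path))
        archives.append(archive_name)
    return archives
```

For example, write_archives("positive_examples", "positive") would create positive_1.zip, positive_2.zip, and so on. The folder and prefix names are only placeholders. You would then upload each archive with the Positive category selected.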

To upload the training set

1 Click Manage Profile from the Current Profile tab (if you have not already done so).
This action enables the VML profile for editing in the Temporary Workspace tab.
See “Working with the Current Profile and Temporary Workspace tabs” on page 670.
2 Click Upload Contents (if you have not already done so).
This action opens the Upload Contents dialog.
3 Select the category of content:
■ Choose Positive: match contents similar to these to upload a positive document
archive.
■ Choose Negative: ignore contents similar to these to upload a negative document
archive.

4 Click Browse to select the document archive to upload.

Detecting content using Vector Machine Learning (VML) 672
Configuring VML profiles and policy conditions

5 Navigate the file system to where you have stored the example documents.
6 Choose the file to upload and click Open.
7 Verify that you have chosen the correct category of content: Positive or Negative.
If you mismatch the upload (select Negative but upload a Positive document archive), the
resulting profile is inaccurate.
8 Click Submit to upload the document archive to the Enforce Server.
The system displays a message indicating whether the file uploaded successfully. If the
upload was successful, the document archive appears in the New Documents table. This table
displays the document type, name, size, date uploaded, and the user who uploaded it. If
the upload was not successful, check the error message and retry the upload. Click the
X icon in the Remove column to delete an uploaded document or document archive from
the training set.
9 Click Upload Contents to repeat the process for the other training set.
The profile is not complete and cannot be trained until you have uploaded the minimum
number of positive and negative example documents.
See Table 28-1 on page 666.
10 Once you have successfully uploaded both training sets you are ready to train the VML
profile.
See “Training VML profiles” on page 672.
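Before you upload, you may want to confirm locally that both training sets meet the document counts discussed in this section. The sketch below is a hypothetical check: training_readiness is not part of the product, and the Enforce Server still applies its own minimum, which the minimum_documents_per_category property controls (default 50). The sketch applies the 250-document recommendation to both sets. That is an assumption based on the guidance that the negative set should be of similar size to the positive set.

```python
# Counts stated in this guide: at least 50 example documents per training set
# (the minimum_documents_per_category default) and 250 recommended.
MINIMUM_DOCUMENTS = 50
RECOMMENDED_DOCUMENTS = 250


def training_readiness(positive_count, negative_count,
                       minimum=MINIMUM_DOCUMENTS,
                       recommended=RECOMMENDED_DOCUMENTS):
    """Return (ready, messages) for the sizes of the two training sets."""
    ready = True
    messages = []
    for label, count in (("positive", positive_count), ("negative", negative_count)):
        if count < minimum:
            # Below the minimum, the profile cannot be trained.
            ready = False
            messages.append(f"{label} set has {count} documents; at least {minimum} are required")
        elif count < recommended:
            messages.append(f"{label} set has {count} documents; {recommended} are recommended")
    return ready, messages
```

For example, training_readiness(120, 60) reports that training can proceed, and returns two messages noting that neither set reaches the recommended count.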

Training VML profiles

During the profile training process, the system scans the training content, extracts key features,
and generates a statistical model. When the training process completes successfully, the
system prompts you to accept or reject the training profile. If you accept the training results,
that version of the VML profile becomes the Current Profile. The Current Profile is active and
available for use in one or more policies.
See “Configuring VML profiles and policy conditions” on page 668.
Detecting content using Vector Machine Learning (VML) 673
Configuring VML profiles and policy conditions

Table 28-6 Training the VML profile

Step Action Description

Step 1 Enable training mode. Select the VML profile you want to train from the Manage > Data Profiles >
Vector Machine Learning screen. Or, create a new VML profile.

See “Creating new VML profiles” on page 669.

Click Manage Profile to the far right of the Current Profile tab. The system
displays the profile for training in the Temporary Workspace tab.

See “Working with the Current Profile and Temporary Workspace tabs”
on page 670.

Step 2 Upload the training content. Familiarize yourself with the training set requirements and recommendations.
See “About the content you train” on page 665.

Upload the positive and the negative training sets in separate document archives
to the Enforce Server.

See “Uploading example documents for training” on page 671.

Step 3 Adjust the memory allocation (only if necessary). The default value is "High," which
generally results in the best training set accuracy rates. Typically you do not need to change
this setting. In some situations you may want to choose a "Medium" or "Low" memory setting
(for example, when deploying the profile to the endpoint).

See “Adjusting the memory allocation” on page 675.

Note: If you change the memory setting, you must do so before you train the
profile to ensure accurate training results. If you have already trained the profile,
you must retrain it after you adjust the memory allocation.

Step 4 Start the training process. Click Start Training to begin the profile training process.
During the training process, the system:

■ Extracts the key features from the content;

■ Creates the model;
■ Calculates the predicted accuracy based on the averaged false positive and
false negative rates for the entire training set;
■ Generates the VML profile.

Step 5 Verify training completion. When the training process completes, the system indicates
whether the training profile was successfully created.

If the training process failed, the system displays an error. Check the debug log
files and restart the training process.

See “Debug log files” on page 337.

On successful completion of the training process, the system displays the following
information for the New Profile:

■ Trained Example Documents

The number of example documents in each training set that the system has
trained against and profiled.
■ Accuracy Rate From Training
The quality of the training set expressed as base false positive and base false
negative percentage rates.
See “About the base accuracy from training percentage rates” on page 666.
■ Memory
The minimum amount of memory that is required to load the profile at run-time
for detection.

Note: If you previously accepted the profile, the system also displays the Current
Profile statistics for side-by-side comparison.

Step 6 Accept or reject the training profile. If the training process is successful, the system
prompts you to accept or reject the training profile. Base your decision on the Accuracy Rate
From Training percentages.

See “About the base accuracy from training percentage rates” on page 666.
To accept or reject the training profile:

■ Click Accept to save the training results as the active Current Profile.
Once you accept the training profile, it appears in the Current Profile tab
and the Temporary Workspace tab is removed.
■ Click Reject to discard the training results.
The profile remains in the Temporary Workspace tab for editing. You can
adjust one or both of the training sets by adding or removing documents and
retraining the profile.
See “Managing training set documents” on page 676.

Note: A trained VML profile is not active until you accept it. The system lets you
create a policy based on a VML profile that has not been trained or accepted.
However, the VML profile is not deployed to that policy until the profile is accepted.
See “About using unaccepted VML profiles in policies” on page 667.

Step 7 Test and tune the profile. Once you have successfully trained and accepted the VML
profile, you can use it to define policy rules and tune the VML profile.

See “Configuring the Detect using Vector Machine Learning Profile condition”
on page 679.

See “About the Similarity Threshold and Similarity Score” on page 667.
Note: For more information, refer to the Symantec Data Loss Prevention Vector
Machine Learning Best Practices Guide, available at the Symantec Support
Center at (http://www.symantec.com/docs/DOC8733).
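You can reason about the accept-or-reject decision in Step 6 numerically. The sketch below averages per-fold false positive and false negative percentages, which are the kind of per-fold values that machinelearning_training.log records. It then applies the best-practice guideline of rejecting a result when either base rate is more than 5% (see “Recommendations for accepting or rejecting a profile” on page 692). This is an illustration under assumptions. The exact averaging the product performs is not documented here, average_rates and recommend are hypothetical names, and the sketch does not parse the log file, whose format this guide does not describe.

```python
from statistics import mean

# Best-practice guideline: reject the training result if either base
# accuracy rate from training is more than 5%.
MAX_BASE_RATE_PERCENT = 5.0


def average_rates(fold_rates):
    """Average a list of (false_positive_pct, false_negative_pct) pairs, one per fold."""
    false_positive = mean(fp for fp, _ in fold_rates)
    false_negative = mean(fn for _, fn in fold_rates)
    return false_positive, false_negative


def recommend(fold_rates, limit=MAX_BASE_RATE_PERCENT):
    """Return ("accept" or "reject", averaged FP %, averaged FN %)."""
    false_positive, false_negative = average_rates(fold_rates)
    if false_positive > limit or false_negative > limit:
        return "reject", false_positive, false_negative
    return "accept", false_positive, false_negative
```

With the default mld_num_folds value of 10, you would pass ten (false positive, false negative) pairs, one for each fold.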

Adjusting the memory allocation

The Memory Allocation setting determines the amount of memory that is required to load the
VML profile at run-time for policy detection. The more memory you allocate to training, the
larger the VML profile becomes, because more features are modeled. By default this value is
set to "High," and you should not normally need to adjust it. However, resources are limited
on the endpoint. If you intend to deploy the VML profile to the endpoint, use a lower memory
setting to reduce the size of the profile.
To adjust memory allocation
1 Click Adjust beside the Memory Allocation setting.
This setting is available in the Temporary Workspace tab. If it is not available, click Manage
Profile from the Current Profile tab.
See “Working with the Current Profile and Temporary Workspace tabs” on page 670.
2 Select the desired memory allocation level.
The following options are available:
■ High
Requires a higher amount of run-time memory; generally yields higher detection
accuracy (default setting).
■ Medium
Requires less run-time memory than High and more than Low.
■ Low
Requires less run-time memory; may result in lower detection accuracy.

3 Click Save to save the setting.

The Memory Setting display should reflect the adjustment you made.
Detecting content using Vector Machine Learning (VML) 676
Configuring VML profiles and policy conditions

4 Click Start Training to start the training process.

You must adjust the memory allocation before you train the VML profile. If you have already
trained the profile, retrain after adjusting this setting.
See “Training VML profiles” on page 672.
5 Verify the amount of memory that is required to run the VML profile.
After you train the VML profile, the system displays the Memory Required (KB) value.
This value represents the minimum amount of memory that is required to load the profile
at run-time.
See “Managing VML profiles” on page 677.

Managing training set documents

As you train and tune a VML profile, you may need to adjust one or both of the training sets.
For example, if you reject a training profile, you must add or remove example documents to
improve the training accuracy rates.
See “About the base accuracy from training percentage rates” on page 666.
To add documents to a training set
1 Click Manage Profile for the profile you want to edit.
The editable profile appears in the Temporary Workspace tab.
2 Click Upload Contents.
See “Uploading example documents for training” on page 671.
To remove documents from a training set
1 Click Manage Profile for the profile you want to edit.
The editable profile appears in the Temporary Workspace tab.
2 Click the red X in the Mark Removed column for the trained document you want to remove.
The removed document appears in the Removed Documents table. Repeat this process
as necessary to remove all unwanted documents from the training set.
3 Click Start Training to retrain the profile.
You must retrain and accept the updated profile to complete the document removal
process. If you do not accept the new profile, the document you attempted to remove
remains part of the profile.
See “Training VML profiles” on page 672.

To revert removed documents

1 Click the revert icon in the Revert column for a document you have removed.
The document is added back to the training set.
2 Click Start Training to retrain the profile.
You must retrain the profile and reaccept it even though you reverted to the original
configuration.

Managing VML profiles

The Manage > Data Profiles > Vector Machine Learning screen is the home page for
managing existing VML profiles and the starting point for creating new VML profiles.
See “Configuring VML profiles and policy conditions” on page 668.

Note: You must have Enforce Server administrator privileges to manage and create VML
profiles.

Table 28-7 Creating and managing VML profiles

Action Description

Create new profiles. Click New Profile to create a new VML profile.

See “Creating new VML profiles” on page 669.

View and sort profiles. The system lists all existing VML profiles and their state at the Vector
Machine Learning screen.

Click the column header to sort the VML profiles by name or status.

Manage and train profiles. Select a VML profile from the list to display and manage it.
The Current Profile tab displays the active profile.

See “Working with the Current Profile and Temporary Workspace tabs” on page 670.

Click Manage Profile to edit the profile.

The editable profile appears in the Temporary Workspace tab. From this tab you
can:

■ Upload training set documents.

See “Uploading example documents for training” on page 671.
■ Train the profile.
See “Training VML profiles” on page 672.
■ Add and remove documents from the training sets.
See “Managing training set documents” on page 676.

Monitor profiles. The system lists and describes the status of all VML profiles.
■ Memory Required (KB)
The minimum amount of memory that is required to load the profile in memory
for detection.
See “Adjusting the memory allocation” on page 675.
■ Status
The present status of the profile.
See Table 28-8 on page 678.
■ Deployment Status
The historical status of the profile.
See Table 28-9 on page 679.

Remove profiles. Click the X icon at the far right to delete an existing profile.

If you delete an existing profile, the system removes the profile metadata and the
Training Set from the Enforce Server.

The Status field displays the current state of each VML profile.

Table 28-8 Status values for VML profiles

Status value Description

Accepted on <date> The date the training profile was accepted.

Managing The current profile is enabled for editing.

Empty The profile is created, but no content is uploaded.

Awaiting Acceptance The profile is ready to be accepted.

Canceling Training The system is in the process of canceling the training.

Training Canceled The training process is canceled.

Failed The training process failed.

Training <time> The training is in progress (for the time indicated).

The Deployment Status field indicates whether the VML profile has ever been accepted.

Table 28-9 Deployment Status values for VML profiles

Status value Description

Never Accepted The VML profile has never been accepted.

See “About using unaccepted VML profiles in policies”
on page 667.

Accepted on <date> The VML profile was accepted on the date indicated.

Changing names and descriptions for VML profiles

If necessary, you can change the name of a VML profile or edit its description. When you are
ready to deploy a VML profile to one or more policies, give the profile a self-describing name
so that policy authors can easily recognize it.

Note: You do not have to retrain a profile if you change the name or description.

To change the VML profile name or description

1 Select the VML profile from the Manage > Data Profiles > Vector Machine Learning
screen.
See “Managing VML profiles” on page 677.
2 Click the Edit link beside the name of the VML profile.
3 Edit the name and description of the profile in the Change Name and Description dialog
that appears.
4 Click OK to save the changes to the VML profile name or description.
5 Verify the changes at the home screen for the VML profile.

Configuring the Detect using Vector Machine Learning Profile condition
Once you have trained and accepted the VML profile, you configure a VML policy using the
Detect using Vector Machine Learning Profile condition. This condition references the VML
profile to detect the content that is similar to the example content you have trained against.
See “Configuring VML profiles and policy conditions” on page 668.

Table 28-10 Configuring a VML policy rule

Step Action Description

Step 1 Create and train the VML profile. See “Creating new VML profiles” on page 669.

See “Training VML profiles” on page 672.

See “About using unaccepted VML profiles in policies” on page 667.

Step 2 Configure a new or an existing policy. See “Configuring policies” on page 413.

Step 3 Add the VML rule to the policy. From the Configure Policy screen:

■ Select Add Rule.

■ Select the Detect using Vector Machine Learning profile rule from
the list of content rules.
■ Select the VML profile you want to use from the drop-down menu.
■ Click Next.

Step 4 Configure the VML detection rule. Name the rule and configure the rule severity.
See “Configuring policy rules” on page 417.

Step 5 Select components to match on. Select one or both message components to Match On:
■ Body, which is the content of the message
■ Attachments, which are any files transported by the message

Note: On the endpoint, the Symantec DLP Agent matches on the entire
message, not individual message components.

See “Selecting components to match on” on page 423.

Step 6 Configure additional conditions (optional). Optionally, you can create a compound detection
rule by adding more conditions to the rule.

To add additional conditions, select the desired condition from the drop-down menu and click Add.
Note: All conditions must match for the rule to trigger an incident.

See “Configuring compound match conditions” on page 429.

Step 7 Save the policy configuration. Click OK then click Save to save the policy.

Configuring VML policy exceptions

In some situations, you may want to implement a VML policy exception to ignore certain
content.
See “Configuring VML profiles and policy conditions” on page 668.

Table 28-11 Configuring a VML policy exception

Step Action Description

Step 1 Create and train the VML profile. See “Creating new VML profiles” on page 669.
See “Training VML profiles” on page 672.

Step 2 Configure a new or an existing policy. See “Configuring policies” on page 413.

Step 3 Add a VML exception to the policy. From the Configure Policy screen:
■ Select Add Exception.
■ Select the Detect using Vector Machine Learning profile exception
from the list of content exceptions.
■ Select the VML profile you want to use from the drop-down menu.
■ Click Next.

Step 4 Configure the policy exception. Name the exception.

Select the components you want to apply the exception to:

■ Entire Message
Select this option to compare the exception against the entire
message. If an exception is found anywhere in the message, the
exception is triggered and no matching occurs.
■ Matched Components Only
Select this option to match the exception against the same
component as the rule. For example, if the rule matches on the Body
and the exception occurs in an attachment, the exception is not
triggered.

Step 5 Configure the condition. Generally you can accept the default condition settings for policy
exceptions.

See “Configuring policy exceptions” on page 426.

Step 6 Save the policy configuration. Click OK then click Save to save the policy.

Adjusting the Similarity Threshold

You adjust the Similarity Threshold setting to tune the VML profile. The Similarity Threshold
determines how similar detected content must be to a VML profile to produce an incident.
See “About the Similarity Threshold and Similarity Score” on page 667.

Note: You do not have to retrain the VML profile after you adjust the Similarity Threshold,
unless you modify a training set based on testing results.

To adjust the Current Value of the Similarity Threshold

1 Click Edit beside the Similarity Threshold label for the VML profile you want to tune.
This action opens the Similarity Threshold dialog.
2 Drag the meter to the desired Current Value setting.
You set the Similarity Threshold to a decimal value between 0 and 10. The default value
is 10, which produces the fewest incidents; a setting of 0 produces the most incidents.
3 Click Save to save the Similarity Threshold setting.
4 Test the VML profile using a VML policy.
Compare the Similarity Scores across matches. A detected message must have a Similarity
Score higher than the Similarity Threshold to produce an incident. Make further adjustments
to the Similarity Threshold setting as necessary to optimize and fine-tune the VML profile.
See “Configuring the Detect using Vector Machine Learning Profile condition” on page 679.
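The rule stated in step 4 above is a strict comparison: a message whose Similarity Score equals the Similarity Threshold does not produce an incident. The following Python sketch expresses that rule and uses it to filter a set of test results. The function names and the (document, score) input format are hypothetical, chosen only for illustration.

```python
def produces_incident(similarity_score, similarity_threshold):
    """Return True if a detected message's score is higher than the threshold."""
    if not 0 <= similarity_threshold <= 10:
        raise ValueError("The Similarity Threshold must be between 0 and 10")
    return similarity_score > similarity_threshold


def incidents(results, similarity_threshold):
    """Return the names of the documents whose scores would produce incidents."""
    return [name for name, score in results
            if produces_incident(score, similarity_threshold)]
```

For example, with the threshold at 5.0, a document that scores exactly 5.0 is not reported. A document that scores 7.8 is reported.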

Testing and tuning VML profiles

You tune a VML profile by testing it with the Similarity Threshold set to 0. After you determine
the possible range of Similarity Scores for false positives, adjust the Similarity Threshold to
be greater than the highest Similarity Score among the false positives. This process is known
as negative testing.
A good training set has a well-defined range within which you can set the Similarity Threshold
to achieve the best accuracy rates. A poor training set yields poor accuracy regardless of the
Similarity Threshold. A Similarity Threshold that is set too low can result in a large number of
false positives; one that is set too high can result in a large number of false negatives.
To determine the proper Similarity Threshold setting, we recommend that you perform negative
testing as described in the following steps.

Table 28-12 Steps for tuning VML profiles

Step Action Description

Step 1 Train the VML profile. Follow the recommendations in this guide for defining the category and uploading
the training set documents. Adjust the memory allocation before you train the
profile. See “Training VML profiles” on page 672 for help performing the tasks involved.

Step 2 Set the Similarity Threshold to 0. The default Similarity Threshold is 10. At this value the system does not
generate any incidents. A setting of 0 produces the most incidents, many of which are likely
to be false positives. The purpose of setting the value to 0 is to see the entire range of
potential matches, so that you can then set the Similarity Threshold to a value greater than
the highest false positive score.

Step 3 Create a VML policy. Create a policy that references the VML profile you want to tune. The profile must
be accepted to be deployable to a policy.

Step 4 Test the policy. Test the VML policy using a corpus of test data. For example, you can use the
DLP_Wikipedia_sample.zip file to test your VML policies against. Create a
mechanism to detect incidents. The mechanism can be a Discover scan target of
a local file folder where you place the test data. Or it can be a DLP Agent scan of
a copy/paste operation.

Step 5 Review any incidents. Review any matches at the Incident Snapshot screen and verify that each match has a
relatively low Similarity Score, which indicates a false positive. If one or more test
documents produce a match with a relatively high Similarity Score, you have a training set
quality issue. In this case, review the content and, if appropriate, add the document(s) to
the positive training set. Then retrain and retune the profile.

See “Log files for troubleshooting VML training and policy detection” on page 686.

Step 6 Adjust the Similarity Threshold. Review the incidents to determine the highest Similarity Score among the
detected false positives that you have tested the profile against. Then adjust the Similarity
Threshold for the profile to be greater than the highest Similarity Score for the false positives.

For example, if the highest detected false positive has a Similarity Score of 4.5,
set the Similarity Threshold to 4.6. This setting filters the known false positives
from being reported as incidents.

Properties for configuring training

VML includes several property files for configuring VML training and logging. The following
table lists and describes relevant VML configuration properties.

Table 28-13 Property files for VML

Property file at \Protect\config\ Description

MLDTraining.properties Main property file for configuring VML training settings.

See Table 28-14 on page 684.

Manager.properties Property file for the Enforce Server; contains 1 VML setting.

See Table 28-15 on page 685.


MLDTrainingLogging.properties Properties file for configuring VML logging.

See “Log files for troubleshooting VML training and policy
detection” on page 686.

The following table lists and describes the VML training parameters available for configuration
in properties file MLDTraining.properties.

Table 28-14 Relevant configuration parameters for VML training

Parameter Description

minimum_documents_per_category Specifies the minimum number of documents that are
required for each training set (positive and negative). The
default setting is 50. Reducing this number below 50 is
not recommended or supported.

See “Recommendations for training set definition” on page 689.

mld_num_folds Specifies the number of folds to use for the k-fold
evaluation process. The default is 10.

Reducing this value shortens training time because fewer
folds are evaluated, potentially at the cost of visibility
into profile quality. You don't need to change this value
unless you have a large number of example documents (and
thus very large training sets), or unless you know for
certain that you have a well-categorized overall training
set.

See “Recommendations for accepting or rejecting a profile” on page 692.

minimum_features_to_keep Specifies the minimum number of features to keep for the
profile. The default setting is 1000.

Lowering this value can help reduce the size of the profile.
However, adjusting this setting is not recommended.
Instead, use the memory allocation setting to tune the size
of the profile.

See “Guidelines for profile sizing” on page 691.


significance_threshold Specifies the minimum number of times a word must occur
before it is considered a feature. The default is 2.

Increasing this value (to 3 or 4, for example) may help
reduce the size of the profile because fewer words qualify
as features. You should not adjust this setting unless
setting the memory allocation to "Low" does not produce
a small enough profile for your deployment requirements.

See “Guidelines for profile sizing” on page 691.

stopword_file Specifies the stopword file. The default is
\config\machinelearningconfig\stopwords.txt.

Stopwords are common words, such as articles and
prepositions. During training the system ignores (does not
consider for feature extraction) any word that is contained
in the stopwords file.

If you add words to be ignored, you must use all lowercase
because VML feature extraction normalizes the content to
lowercase for evaluation.

logging_config_file Specifies the configuration file for standard VML logging.

See “Log files for troubleshooting VML training and policy
detection” on page 686.

native_logging_config_file Specifies the configuration file for native VML logging.

See “Log files for troubleshooting VML training and policy
detection” on page 686.
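The stopword_file and significance_threshold descriptions above imply a simple pipeline. First normalize the content to lowercase, then drop stopwords, then keep words that occur at least the threshold number of times. The following Python sketch illustrates that idea, and also shows why entries in the stopword file that contain uppercase letters never match. It is a simplified model, not the VML feature extractor. The tokenization rule (runs of lowercase letters and digits), the choice to count occurrences across all documents combined, and the function names are all assumptions.

```python
import re
from collections import Counter


def non_lowercase_stopwords(stopword_lines):
    """Return stopword entries that contain uppercase letters and would never match."""
    words = (line.strip() for line in stopword_lines)
    return [word for word in words if word and word != word.lower()]


def candidate_features(documents, stopwords, significance_threshold=2):
    """Count non-stopword words and keep those that occur at least significance_threshold times."""
    counts = Counter()
    for text in documents:
        # Content is normalized to lowercase before feature extraction.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return {word: count for word, count in counts.items()
            if count >= significance_threshold}
```

Raising significance_threshold in this model shrinks the set of candidate features. That mirrors the guide's note that higher values can reduce profile size.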

The following parameter is available for configuration in properties file
Manager.properties.

Table 28-15 Configuration parameter for VML profiles

Parameter Description

DEFAULT_SIMILARITY_THRESHOLD Establishes the default value for the Similarity Threshold,
which is 10. Changing this value affects the default value
only. You can adjust the value using the Enforce Server
administration console.

See “Testing and tuning VML profiles” on page 682.
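If you change any of these parameters, it can help to record which values differ from the documented defaults. The following Python sketch compares the text of a properties file against the defaults listed in Table 28-14 and Table 28-15. Its parser is deliberately simplified and assumes one key=value (or key: value) pair per line. It does not handle the line continuations or escape sequences that the full Java properties format allows. The comparison is textual, so, for example, "10.0" is reported as different from "10". The function names are hypothetical.

```python
import re

# Documented defaults from the configuration tables in this section.
DOCUMENTED_DEFAULTS = {
    "minimum_documents_per_category": "50",
    "mld_num_folds": "10",
    "minimum_features_to_keep": "1000",
    "significance_threshold": "2",
    "DEFAULT_SIMILARITY_THRESHOLD": "10",
}


def parse_simple_properties(text):
    """Parse key=value or key: value lines, skipping blank lines and # or ! comments."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first = or : and trim surrounding whitespace.
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        properties[parts[0]] = parts[1] if len(parts) > 1 else ""
    return properties


def changed_from_defaults(properties):
    """Return {key: (default, actual)} for documented keys whose values differ."""
    return {
        key: (default, properties[key])
        for key, default in DOCUMENTED_DEFAULTS.items()
        if key in properties and properties[key] != default
    }
```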


Log files for troubleshooting VML training and policy detection

The system provides debug log files for troubleshooting the VML training process and policy
detection. The following table lists and describes the debug log files.
See “Troubleshooting policies” on page 445.

Table 28-16 Debug log files for VML

Log file Description

machinelearning_training.log Records the accuracy from training percentage rates for
each fold of the evaluation process for each VML profile
training run.

Use this log to examine the quality of each training set at
a granular, per-fold level.

See “Recommendations for accepting or rejecting a
profile” on page 692.

machinelearning_native_filereader.log Records the "distance," which is expressed as a positive
or negative number, and the "confidence," which is a
similarity percentage, for each message evaluated by a
VML policy.

Use this log to examine all messages or documents
evaluated by VML policies, including positive matches with
similarity percentages beneath the Similarity Threshold, and
messages the system has categorized as negative
(expressed as a negative "distance" number).

See “Testing and tuning VML profiles” on page 682.

machinelearning_training_native_manager.log Records the total number of features modeled and the
number of features kept to generate the profile for each
training run.
The number of features kept for the profile, relative to the
total number of features modeled, depends on the memory
allocation setting:

■ If "high," the system keeps 80% of the features.
■ If "medium," the system keeps 50% of the features.
■ If "low," the system keeps 30% of the features.

See “Guidelines for profile sizing” on page 691.
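To estimate how the memory allocation setting affects profile size, you can apply the percentages listed above to the total number of features that machinelearning_training_native_manager.log reports. The sketch below does that with integer arithmetic. It also assumes, although this guide does not state it, that minimum_features_to_keep (default 1000) acts as a floor on the number of features kept. features_kept is a hypothetical name.

```python
# Percentage of modeled features kept for each memory allocation level,
# as listed for machinelearning_training_native_manager.log.
KEEP_PERCENT = {"high": 80, "medium": 50, "low": 30}

# Default of minimum_features_to_keep in MLDTraining.properties.
# Assumption: treated here as a floor on the number of features kept.
MINIMUM_FEATURES_TO_KEEP = 1000


def features_kept(total_features, allocation, floor=MINIMUM_FEATURES_TO_KEEP):
    """Estimate the number of features a profile keeps for a memory allocation level."""
    percent = KEEP_PERCENT[allocation.lower()]
    by_percentage = total_features * percent // 100
    # The floor can never exceed the number of features actually modeled.
    return max(by_percentage, min(floor, total_features))
```

For example, if a training run models 10,000 features, this estimate keeps 8,000 at "High" and 3,000 at "Low."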


Best practices for using VML

This section provides best practices for implementing VML policies, including best practices
for testing and tuning your VML policies.
In addition, you can download example VML training set documents from the Symantec Support
Center at http://www.symantec.com/docs/DOC8733. These documents are provided under
the Creative Commons license (http://creativecommons.org/licenses/by-sa/3.0/).
Table 28-17 provides a summary of the VML best practices that are discussed in this section.
It includes links to individual topics for more in-depth recommendations.

Table 28-17 Summary of VML best practices

Functional area: Best practice

Recommended uses for VML: Use VML to protect unstructured, text-based content. Do not use VML to protect graphics, binary data, or personally identifiable information (PII).
See “When to use VML” on page 688.

Category of content: Define the VML profile based on a single category of content that you want to protect. The category of content should be derived from a specific business use case. Narrowly defined categories are better than broadly defined ones.
See “Recommendations for training set definition” on page 689.

Positive training set: Archive and upload the recommended number (250) of example documents for the positive training set, or at least the minimum number (50).
See “Guidelines for training set sizing” on page 690.

Negative training set: Archive and upload the example documents for the negative training set. Ideally, the negative training set contains about as many well-categorized documents as the positive training set. In addition, add some documents containing generic or neutral content to your negative training set.
See “Guidelines for training set sizing” on page 690.

Profile sizing: Consider adjusting the memory allocation to low. Internal testing has shown that setting the memory allocation to low may improve accuracy in certain cases.
See “Guidelines for profile sizing” on page 691.

Training set quality: Reject the training result and adjust the example documents if either base error rate from training (false positive or false negative) is more than 5%.
See “Recommendations for accepting or rejecting a profile” on page 692.

Profile tuning: Perform negative testing to tune the VML profile by using a corpus of testable data.
See “Testing and tuning VML profiles” on page 682.


Profile deployment: Remove accepted profiles not in use by policies to reduce detection server load. Tune the Similarity Threshold before deploying a profile into production across all endpoints to avoid network overhead.
See “Recommendations for deploying profiles” on page 694.

When to use VML

VML is designed to protect unstructured content that is primarily text-based. VML is well-suited
for protecting sensitive content that is so widely distributed that gathering all of it for
fingerprinting is not possible or practical. VML is also well-suited for protecting sensitive content
that you cannot describe precisely enough to achieve high matching accuracy.
The following table summarizes the recommended use cases for VML.

Table 28-18 Recommended uses for VML

Use VML when: Explanation

It is not possible or practical to fingerprint all the data you want to protect.
Collecting all of the content you want to protect for fingerprinting is often an impossible task. This situation arises for many forms of unstructured data: marketing materials, financial documents, patient records, product formulas, source code, and so forth.
VML works well for this situation because you do not have to collect all of the content you want to protect. You collect a smaller set of example documents.

You cannot adequately describe the data you want to protect.
Describing the data you want to protect is often difficult without sacrificing some accuracy. This situation may arise when you have long keyword lists that are hard to generate, tune, and maintain.
VML works well in these situations because it automatically models the features (keywords) you want to protect. It enables you to easily manage and update the source content.

A policy reports frequent false positives.
Sometimes a certain category of information is a constant source of false positives. For example, a weekly sales report may consistently produce false positives for a Data Identifier policy looking for Social Security numbers.
VML may work well here because you can train against the content that causes the false positives and create a policy exception to ignore those features.
Note: The false positive contents must belong to a well-defined category for VML to be an effective solution for this use case. See “Recommendations for training set definition” on page 689.

When not to use VML

VML is not designed to protect structured data, such as Personally Identifiable Information
(PII), or binary content, such as documents that contain mostly graphics or image files.
The following table summarizes the non-recommended uses of VML.

Table 28-19 Non-recommended uses for VML

Do not use VML to: Explanation

Protect personally identifiable information (PII).
Exact Data Matching (EDM) and Data Identifiers are the best options for protecting the common types of PII.

Protect binary files and images.
Indexed Document Matching (IDM) is the best option to protect content that is largely binary, such as image files or CAD files.

Recommendations for training set definition

A VML category is the specific business use case from which you derive your example
documents for training the VML profile. The more specific the category, the better the detection
results. For example, the category "Financial Documents" is not recommended because it is
too broad. A better category classification is "Sales Forecasts" or "Quarterly Earnings" because
each is particular to a specific business use case.
A VML category contains two sets of training content: positive and negative. The positive
training set contains content you want to protect; the negative training set contains content
you want to ignore. You should derive both the positive and negative training sets from the
same category of content so that all documents are thematically related.
Using entirely generic content for the negative training set, while possible, is not
recommended. Although generic content produces good design-time training accuracy rates, the
resulting profile cannot detect the content you want to protect at run time with sufficient accuracy.

Note: While a completely generic negative training set is not recommended, seeding the
negative training set with some neutral-content documents does have value. See “Guidelines
for training set sizing” on page 690.

The following table provides some example categories and possible positive and negative
training sets for each category.

Table 28-20 Some example categories and training sets

Product source code
Positive training set: Proprietary product source code
Negative training set: Source code from open source projects

Product formulas
Positive training set: Proprietary product formulas
Negative training set: Non-proprietary product information

Quarterly earnings
Positive training set: Pre-release earnings; sales estimates; accounting documents
Negative training set: Details of published annual accounts

Marketing plans
Positive training set: Marketing plans
Negative training set: Published marketing collateral and advertising copy

Medical records
Positive training set: Patient medical records
Negative training set: Healthcare documents

Customer sales
Positive training set: Customer purchasing patterns
Negative training set: Publicly available consumer data

Mergers and acquisitions
Positive training set: Confidential legal documents; M&A documents
Negative training set: Publicly available materials; press releases

Manufacturing methods
Positive training set: Proprietary manufacturing methods
Negative training set: Industry standards and research

Guidelines for training set sizing

VML is only as accurate as the example content you use for training. To use VML, you do not have to
locate all the data you want to protect, nor do you have to describe it. Instead, your sample
documents must accurately represent the type of content you want to protect. They must also
represent content that you want to ignore, and that content must be thematically related to the
positive content.
More example documents yield more accurate VML profiles. A well-defined category of content
ideally contains 500 example documents: 250 positive and 250 negative. The minimum number of
documents per training set is 50.
Ideally, you collect a similar number of negative and positive documents for training. You
should seed the negative training set with generic or neutral-content documents. The archive
file DLP_Wikipedia_sample.zip that is attached to this guide at the Symantec Support Center
is provided for this purpose.
For example, suppose your positive training set contains 250 example documents and your negative
training set contains 150 documents. You can add 100 to 200 generic documents from the
DLP_Wikipedia_sample.zip archive file to your negative training set. Internal testing has
shown that adding generic content to complement a well-defined negative training set can
improve accuracy for VML.
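The example above amounts to topping up the negative set until it is at least as large as the positive set. A minimal Python sketch of that arithmetic follows. The helper name `generic_docs_to_add` is hypothetical, and treating "match the positive set" as the lower bound is an assumption drawn from the 250/150 example, not a product rule.

```python
def generic_docs_to_add(positive: int, negative: int) -> int:
    """Smallest number of generic (neutral-content) documents, for example
    from DLP_Wikipedia_sample.zip, that brings the negative training set up
    to the size of the positive training set. Hypothetical heuristic."""
    if positive < 0 or negative < 0:
        raise ValueError("document counts cannot be negative")
    return max(0, positive - negative)


if __name__ == "__main__":
    positive, negative = 250, 150  # the example from this section
    floor = generic_docs_to_add(positive, negative)
    # The guide suggests adding 100 to 200 generic documents in this case.
    for extra in (floor, 200):
        print(f"add {extra} generic documents -> {negative + extra} negative documents")
```

The helper gives only a floor; the guide's range goes up to 200 generic documents for this example.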

If you cannot collect enough positive documents to meet the minimum requirement, you can
upload the under-sized training set multiple times. For example, consider the category of content
"Sales Forecasts." For this category you have collected 25 positive spreadsheets and 50 negative
documents. In this case, you can upload the positive training set twice to reach the minimum
document threshold and equal the number of negative documents. Use this technique for
development and testing purposes only. Production profiles should be trained against at least
the minimum number of documents for both training sets.
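The number of uploads in the "Sales Forecasts" example follows from a simple ceiling division. The Python sketch below assumes the goal is to reach both the minimum of 50 and the size of the negative set; the function name `upload_count` is hypothetical.

```python
import math

MINIMUM_PER_SET = 50  # minimum documents per training set (Table 28-21)


def upload_count(positive: int, negative: int, minimum: int = MINIMUM_PER_SET) -> int:
    """Number of times to upload an under-sized positive training set so that
    it reaches the minimum and at least equals the negative set.
    Hypothetical helper; use the technique for development and testing only."""
    if positive <= 0:
        raise ValueError("need at least one positive document")
    target = max(minimum, negative)
    return math.ceil(target / positive)


if __name__ == "__main__":
    # "Sales Forecasts": 25 positive spreadsheets, 50 negative documents.
    print(upload_count(25, 50))  # 2
```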
Table 28-21 lists the minimum and recommended numbers of documents to include
in each training set.

Note: These training set guidelines assume an average document size of 3 KB. If your documents
are larger, fewer documents may be sufficient.

Table 28-21 Training set size guidelines

Positive example documents: minimum 50, recommended 250

Negative example documents: minimum 50, recommended 250

Total number of documents for the category: minimum 100, recommended 500

Recommendations for uploading documents for training

While you can upload individual documents to the Enforce Server for training, it is recommended
that you upload a document archive (ZIP, RAR, TAR) that contains the example documents
for each training set. The maximum upload size is 30 MB. There is no training set size limit.
To gather the documents for training, it is recommended that you create a staging area. For
example, consider a category called "Sales Reports." In this case you would create a folder
called \VML\training_stage\sales_reports that represents the category. Within this folder
you would create two subfolders, one for the positive training set and the other for the negative
training set (for example: \VML\training_stage\sales_reports\positive). When you are
ready to train the profile, you compress the positive subfolder and the negative subfolder into
separate document archives. You can partition the training set across archives if you have
more than 30 MB of data to upload for a training set. Do not embed an archive within an archive.
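The staging layout above can be scripted. The following Python sketch zips the positive and negative subfolders of a category staging folder into one or more archives each, keeping every archive under the 30 MB upload limit and skipping archive files so that no archive is embedded in another. The function names, the `positive_N.zip` naming scheme, reading "MB" as decimal megabytes, and the 5% headroom are all assumptions for illustration, not Enforce Server requirements.

```python
import zipfile
from pathlib import Path

# 30 MB upload limit from the guide, read as decimal megabytes (the more
# conservative reading), minus 5% headroom for ZIP headers and for files
# that do not compress.
MAX_UPLOAD_BYTES = 30 * 1000 * 1000
DEFAULT_LIMIT = MAX_UPLOAD_BYTES * 95 // 100
# Files skipped so that no archive is embedded within an archive.
ARCHIVE_SUFFIXES = {".zip", ".rar", ".tar", ".gz", ".tgz", ".7z"}


def partition(files, limit):
    """Group files so that each group's total uncompressed size is <= limit."""
    groups, current, size = [], [], 0
    for f in files:
        n = f.stat().st_size
        if n > limit:
            raise ValueError(f"{f} alone exceeds the upload limit")
        if current and size + n > limit:
            groups.append(current)
            current, size = [], 0
        current.append(f)
        size += n
    if current:
        groups.append(current)
    return groups


def build_archives(category_dir, out_dir, limit=DEFAULT_LIMIT):
    """Zip <category_dir>/positive and <category_dir>/negative into
    positive_N.zip and negative_N.zip files in out_dir."""
    category_dir, out_dir = Path(category_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for subset in ("positive", "negative"):
        src = category_dir / subset
        if not src.is_dir():
            raise FileNotFoundError(f"missing staging folder: {src}")
        files = sorted(
            p for p in src.rglob("*")
            if p.is_file() and p.suffix.lower() not in ARCHIVE_SUFFIXES
        )
        for i, group in enumerate(partition(files, limit), start=1):
            archive = out_dir / f"{subset}_{i}.zip"
            with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
                for f in group:
                    zf.write(f, arcname=f.relative_to(src).as_posix())
            created.append(archive)
    return created
```

For the example category you might call `build_archives(r"\VML\training_stage\sales_reports", r"\VML\upload\sales_reports")`, where the output folder is hypothetical, and then upload the archives for each training set. Because the split is based on uncompressed sizes, most archives end up well below the limit.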

Guidelines for profile sizing

Before you train a VML profile, you can adjust the amount of memory allocated to the profile.
The amount of memory you allocate determines how many features the system models, which
in turn affects the size of the profile. The higher the memory allocation setting, the more in-depth
the feature extraction and the plotting of the model, and the larger the profile. In general, for
server-based policy detection, the recommended memory allocation setting is high, which is
the default setting.
On the endpoint, the VML profile is deployed to the host computer and loaded into memory
by the DLP Agent. (Unlike EDM and IDM, VML does not rely on two-tier detection for endpoint
policies.) Because memory on the endpoint is limited, the recommendation is to allocate low
or medium memory for endpoint policies. Internal testing has shown that reducing the memory
allocation does not reduce the accuracy of the profile and may improve accuracy in certain
situations.

Table 28-22 Memory allocation recommendations

Memory allocation: Description

High: Default setting, generally appropriate for server-based detection.

Medium: Use this setting to reduce the size of the profile.

Low: Use this setting for endpoint detection.

Recommendations for accepting or rejecting a profile

When you train a VML profile against the category content, the system selects features, creates
the model, and calculates the base accuracy rates for false positives and negatives. Base
accuracy rates are calculated using a standard and generally accepted process called k-folds
evaluation. The base accuracy rates provide you with an early indicator of the quality of your
category training sets.
To illustrate how the k-folds evaluation process works, assume that you have a category with
500 total example documents: 250 positive and 250 negative. During the training run, the
system divides the training set into 10 folds. Each fold is a distinct subset of the overall training
set and contains both positive and negative example documents. The system uses nine folds
to generate a VML profile and one fold to test the profile. Any of the folds can become the
test fold for the first round of evaluation. For the next round, the next fold in the queue becomes
the test fold. This process repeats for all 10 folds. The system performs a final training run
called the cross-fold, averages the results of all folds, and generates the final model.
On successful completion of the training process, the system displays the averaged accuracy
rates and prompts you to accept or reject the training profile. The false positive rate
is the percentage of negative test documents that are misclassified as positive. The false
negative rate is the percentage of positive test documents that are misclassified as negative.
As a general guideline, you should reject the training profile if either rate is more than 5%.
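The rotation described above is standard stratified k-fold cross-validation. The Python sketch below shows the mechanics under stated assumptions: documents are dealt round-robin into folds (the product's actual partitioning is not documented here), and `train` is a hypothetical stand-in for the VML training step that returns a classifier. The rates are computed exactly as defined above.

```python
from typing import Callable, List, Sequence, Tuple

Doc = Tuple[str, bool]              # (document id, is_positive)
Classifier = Callable[[str], bool]  # True if a document is classified positive


def make_folds(docs: Sequence[Doc], k: int = 10) -> List[List[Doc]]:
    """Deal positives and negatives round-robin so every fold holds both."""
    positives = [d for d in docs if d[1]]
    negatives = [d for d in docs if not d[1]]
    if len(positives) < k or len(negatives) < k:
        raise ValueError("each training set needs at least k documents")
    folds: List[List[Doc]] = [[] for _ in range(k)]
    for group in (positives, negatives):
        for i, doc in enumerate(group):
            folds[i % k].append(doc)
    return folds


def fold_rates(test_fold: Sequence[Doc], classify: Classifier) -> Tuple[float, float]:
    """(false positive %, false negative %) for one test fold."""
    negatives = [doc_id for doc_id, positive in test_fold if not positive]
    positives = [doc_id for doc_id, positive in test_fold if positive]
    fp = sum(1 for doc_id in negatives if classify(doc_id))
    fn = sum(1 for doc_id in positives if not classify(doc_id))
    return 100.0 * fp / len(negatives), 100.0 * fn / len(positives)


def k_fold_evaluate(docs: Sequence[Doc],
                    train: Callable[[List[Doc]], Classifier], k: int = 10):
    """Train on k-1 folds, test on the held-out fold, rotate, then average.

    Returns (per-fold rates, average FP %, average FN %, final classifier).
    """
    folds = make_folds(docs, k)
    per_fold = []
    for i, test_fold in enumerate(folds):
        training = [d for j, fold in enumerate(folds) if j != i for d in fold]
        per_fold.append(fold_rates(test_fold, train(training)))
    avg_fp = sum(fp for fp, _ in per_fold) / k
    avg_fn = sum(fn for _, fn in per_fold) / k
    final = train(list(docs))  # final model trained on all documents
    return per_fold, avg_fp, avg_fn, final
```

With 250 positive and 250 negative documents, each fold holds 25 of each, so one misclassified negative document in a fold yields a 4% false positive rate for that fold.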

Note: You can use the log file machinelearning_training.log to evaluate per-fold training
accuracy rates.
See “Log files for troubleshooting VML training and policy detection” on page 686.

Guidelines for accepting or rejecting training results

You decide to accept or reject a training profile based on the false positive and false negative
percentages that the system displays to you at the end of the training process.
See “About the Similarity Threshold and Similarity Score” on page 667.
To better understand how the system calculates the Machine Learning Profile training set
accuracy rates, consider the following example.
You have a training set that includes 1000 documents, 500 positive and 500 negative. When
you train the profile, the system takes 90% of the documents, extracts the features, and creates
a model. It takes the remaining 10% of the documents and evaluates their features against
the model for similarity. It then produces false positive and false negative rates. Each such
round is known as a "fold." For each training set, the system evaluates ten folds, each
time comparing a different 10% of the documents against the other 90%. At the end of the cycle,
the system performs a cross-fold evaluation of all ten folds. It then produces an average
rate for both the positive and negative categories.
Assume that the training process yields a base false positive rate of approximately
1.2% and a base false negative rate of approximately 1%. On average, 1.2% of the negative
documents in the training set are miscategorized as positive, and 1% of the positive documents
in the training set are miscategorized as negative. While the goal is 0% for both rates, in general a
rate under 5% for each category is acceptable.
The percentages that are produced at the end of the training process are averages across the
10 folds. Rather than relying on the general 5% rule of thumb, the better practice is to review
the percentage rate results for each fold. To review the percentage rates, examine the log file
\ProgramData\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\logs\debug\mld0.log
(Windows) or
/var/log/Symantec/DataLossPrevention/DetectionServer/15.5/debug/mld0.log (Linux).
As Table 28-23 shows, the individual fold rates give you a reading for each of the ten folds on
which to base your decision to accept or reject the profile.

Table 28-23 Training set accuracy evaluation process

Fold evaluation Per fold category accuracy rates and cross-fold averages

Fold 0 false positive rate 2.013422727584839 false negative rate 0.0

Fold 1 false positive rate 1.3513513803482056 false negative rate 1.7857142686843872


Fold 2 false positive rate 1.3513513803482056 false negative rate 0.8928571343421936

Fold 3 false positive rate 1.3513513803482056 false negative rate 1.7857142686843872

Fold 4 false positive rate 1.3513513803482056 false negative rate 0.8928571343421936

Fold 5 false positive rate 1.3513513803482056 false negative rate 2.6785714626312256

Fold 6 false positive rate 0.0 false negative rate 0.0

Fold 7 false positive rate 0.6756756901741028 false negative rate 0.0

Fold 8 false positive rate 1.3513513803482056 false negative rate 0.8928571343421936

Fold 9 false positive rate 1.3513513803482056 false negative rate 1.8018018007278442

Cross-fold Avg False Positive Rate 1.214855808019638 Avg False Negative Rate 1.0730373203754424
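You can reproduce the cross-fold averages from the per-fold lines and, at the same time, find the worst fold. The Python sketch below assumes each fold appears as a line shaped like the Table 28-23 rows; check the actual format of your mld0.log before relying on the pattern. Accepting only when every fold is at or below 5% is one possible, stricter reading of the per-fold review, not a product rule.

```python
import re
from statistics import mean

# Assumed line shape, copied from the Table 28-23 rows.
FOLD_LINE = re.compile(
    r"Fold\s+(\d+)\s+false positive rate\s+([\d.]+)\s+false negative rate\s+([\d.]+)"
)
MAX_RATE = 5.0  # percent; the guide's general rule of thumb

SAMPLE = """\
Fold 0 false positive rate 2.013422727584839 false negative rate 0.0
Fold 1 false positive rate 1.3513513803482056 false negative rate 1.7857142686843872
Fold 2 false positive rate 1.3513513803482056 false negative rate 0.8928571343421936
Fold 3 false positive rate 1.3513513803482056 false negative rate 1.7857142686843872
Fold 4 false positive rate 1.3513513803482056 false negative rate 0.8928571343421936
Fold 5 false positive rate 1.3513513803482056 false negative rate 2.6785714626312256
Fold 6 false positive rate 0.0 false negative rate 0.0
Fold 7 false positive rate 0.6756756901741028 false negative rate 0.0
Fold 8 false positive rate 1.3513513803482056 false negative rate 0.8928571343421936
Fold 9 false positive rate 1.3513513803482056 false negative rate 1.8018018007278442
"""


def review_folds(text: str, max_rate: float = MAX_RATE) -> dict:
    """Average the per-fold rates and flag the worst fold for each rate."""
    folds = [(int(n), float(fp), float(fn)) for n, fp, fn in FOLD_LINE.findall(text)]
    if not folds:
        raise ValueError("no per-fold lines found")
    worst_fp = max(folds, key=lambda f: f[1])
    worst_fn = max(folds, key=lambda f: f[2])
    return {
        "folds": len(folds),
        "avg_fp": mean(f[1] for f in folds),
        "avg_fn": mean(f[2] for f in folds),
        "worst_fp_fold": worst_fp[0],
        "worst_fn_fold": worst_fn[0],
        # Hypothetical stricter policy: every fold must be within max_rate.
        "accept": worst_fp[1] <= max_rate and worst_fn[1] <= max_rate,
    }


if __name__ == "__main__":
    print(review_folds(SAMPLE))
```

On the Table 28-23 data, the plain mean of the ten folds matches the reported cross-fold averages, and fold 5 stands out as the worst fold for false negatives.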

Recommendations for deploying profiles

Accepted VML profiles are transferred to every detection server and Symantec DLP Agent
even if those profiles are not required by the active policies on that server or endpoint. Detection
servers load all VML profiles into memory regardless of whether or not any associated VML
policies are deployed to those servers. DLP Agents only load the VML profiles that are required
by an active policy. To optimize server performance, do not accept (deploy) unnecessary
VML profiles, and remove any accepted (deployed) VML profiles that are not required by
active policies.
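The clean-up advice comes down to a set difference between the accepted profiles and the profiles that active policies reference. The Python sketch below assumes you have gathered both lists yourself, for example from the Enforce Server console; the function name, the profile names, and the sizes are hypothetical, and profile size is used only as a rough proxy for the memory a profile occupies.

```python
def review_vml_profiles(profile_sizes_mb: dict, in_use: set):
    """Return (unused profiles, approx. MB each detection server loads,
    approx. MB that removing the unused profiles would free).

    Detection servers load all accepted VML profiles, whether or not an
    active policy uses them, so unused profiles cost memory on every server.
    """
    unused = sorted(set(profile_sizes_mb) - set(in_use))
    loaded = sum(profile_sizes_mb.values())
    freed = sum(profile_sizes_mb[name] for name in unused)
    return unused, loaded, freed


if __name__ == "__main__":
    # Hypothetical accepted profiles and their sizes in MB.
    sizes = {"Sales Forecasts": 12.0, "Quarterly Earnings": 8.0, "Old M&A": 20.0}
    print(review_vml_profiles(sizes, {"Sales Forecasts", "Quarterly Earnings"}))
```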
In addition, when you change the Similarity Threshold, the system re-syncs the entire profile
with the detection servers and DLP Agents. If you have a large VML profile and potential
bandwidth limitations (for example, deployment to many endpoints), the re-sync may cause network
congestion. In this case, test and tune the profile on a few selected endpoints before
deploying the profile into production at every endpoint on your network.
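To judge whether a re-sync is likely to matter, multiply the profile size by the number of recipients and by the number of threshold changes you expect while tuning. The Python sketch below is a back-of-the-envelope estimate under that assumption; it ignores compression and protocol overhead, and the function name and example numbers are hypothetical.

```python
def resync_traffic_mb(profile_mb: float, recipients: int, threshold_changes: int = 1) -> float:
    """Rough estimate of data pushed while tuning: each Similarity Threshold
    change re-syncs the entire profile to every recipient (detection servers
    plus DLP Agents). Ignores compression and protocol overhead."""
    if profile_mb < 0 or recipients < 0 or threshold_changes < 0:
        raise ValueError("inputs must be non-negative")
    return profile_mb * recipients * threshold_changes


if __name__ == "__main__":
    # Hypothetical 15 MB profile, 4 detection servers, 5 threshold changes.
    print(resync_traffic_mb(15.0, 4 + 5000, 5))  # every endpoint: 375300.0
    print(resync_traffic_mb(15.0, 4 + 20, 5))    # pilot group of 20: 1800.0
```

The comparison assumes that, during tuning, the profile reaches only the pilot endpoints.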
Chapter 29
Detecting content using Form Recognition - Sensitive Image Recognition
This chapter includes the following topics:

■ About Form Recognition detection

■ Configuring Form Recognition detection

■ Managing Form Recognition profiles

■ Advanced server settings for Form Recognition

■ Viewing a Form Recognition incident

About Form Recognition detection

Form Recognition provides the ability to detect forms that contain sensitive information, such
as tax forms, medical forms, insurance forms, and so on.
Form Recognition detects forms in a variety of file formats, including the following:
■ Microsoft Office documents
■ PDF (version 1.2 and later only)
■ PDF files that use the AcroForms format
■ XFA (Only the hard-copy image, or the image that you would see if you printed the form,
is supported. Soft copies, such as fillable forms, are not supported. Text extraction from
XFA is also not supported.)
■ JPEG (.jpg, .jpeg)

■ PNG
■ TIFF (single page or multi-page, .tif or .tiff)
■ Bitmap (.bmp, .dib)
Form Recognition is available for Network Monitor, Network Prevent for Email, Network Prevent
for Web, and Network Discover. Form Recognition is not available for Endpoint Discover,
Endpoint Prevent, or any cloud detectors.
See “Configuring Form Recognition detection” on page 696.
See “About extracting images from Microsoft Office documents for OCR and Form Recognition”
on page 706.

How Form Recognition works

Symantec Data Loss Prevention analyzes the features of your blank forms and stores the
results as key points in the Form Recognition profile. This process is called indexing. The
detection server then compares images in network traffic or stored in data repositories to the
forms you have indexed. The extent to which a detected form matches the key points of an
indexed blank form is called the alignment. By default, 85% of the key points must match, or
align, for the form to be considered a match.
The comparison between the detected image and the indexed blank form also allows Symantec
Data Loss Prevention to determine how much of the form has been filled in. The fill threshold
is expressed as a value from 1 to 10, where 1 is a minimally filled-in form and 10 is an entirely
filled-in form. You use the fill threshold to specify when Symantec Data Loss Prevention creates
an incident. A low fill threshold creates more incidents because it detects incomplete forms,
partially filled-in forms, and electronically fillable forms with as little as one check box filled.
A high fill threshold creates fewer incidents, but may not catch all possible data loss. A fill
threshold of 0 detects all matching forms, including blank forms. By default, the fill threshold
for a Form Recognition profile is 1. You can specify another value when you create a profile.
You can also adjust this value for an existing profile to fine-tune your detection results.
See “Configuring Form Recognition detection” on page 696.
See “Managing Form Recognition profiles” on page 700.
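The interaction between alignment and fill threshold can be summarized as a simple decision. The Python sketch below is a conceptual model only: the product computes alignment and fill internally, and the `FormComparison` type, its integer fill level, and the `creates_incident` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

DEFAULT_MIN_ALIGNMENT = 0.85  # 85% of key points must align (default)
DEFAULT_FILL_THRESHOLD = 1    # default fill threshold for a profile


@dataclass
class FormComparison:
    """Hypothetical result of comparing one detected image to one indexed form."""
    aligned_key_points: int  # key points of the indexed blank form found in the image
    total_key_points: int    # key points stored in the profile for that blank form
    fill_level: int          # 0 = blank ... 10 = entirely filled in


def creates_incident(c: FormComparison,
                     fill_threshold: int = DEFAULT_FILL_THRESHOLD,
                     min_alignment: float = DEFAULT_MIN_ALIGNMENT) -> bool:
    """A form matches when enough key points align; it creates an incident
    when its fill level also reaches the fill threshold (0 also flags blanks)."""
    if not 0 <= fill_threshold <= 10:
        raise ValueError("fill threshold must be between 0 and 10")
    if c.total_key_points <= 0:
        raise ValueError("indexed form has no key points")
    alignment = c.aligned_key_points / c.total_key_points
    return alignment >= min_alignment and c.fill_level >= fill_threshold
```

In this model a blank but well-aligned form creates an incident only when the fill threshold is 0, and a heavily filled form that aligns poorly never does.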

Configuring Form Recognition detection

To configure Form Recognition, you collect blank copies of the forms that you want to protect,
save them as single-page PDF files, and add them to a ZIP archive. This ZIP archive is called a
Gallery Archive. You then upload your Gallery Archive to a Form Recognition profile on the
Enforce Server for indexing. The Enforce Server indexes your forms and pushes the index out
to your detection servers. You also specify the fill threshold for the profile, which determines
how much of the form must be filled in to trigger an incident.

Table 29-1 provides a high-level workflow for configuring Form Recognition detection:

Table 29-1 Form Recognition workflow

Step 1: Collect and prepare blank copies of the forms you want to protect.
See “Preparing a Form Recognition Gallery Archive” on page 697.

Step 2: Configure a Form Recognition profile. Specify the Gallery Archive with the forms you want to detect and a Fill Threshold for creating incidents.
See “Configuring a Form Recognition profile” on page 698.

Step 3: Configure a policy with a Form Recognition detection or exception rule using your Form Recognition profile.
See “Configuring the Form Recognition detection rule” on page 699.
See “Configuring the Form Recognition exception rule” on page 700.

Preparing a Form Recognition Gallery Archive

The Form Recognition gallery archive is a ZIP archive containing single-page PDF copies of
the blank forms you want to protect. You use the gallery archive to create a Form Recognition
profile.
Symantec recommends that you index no more than 500 total images across all Form
Recognition profiles. To improve performance, Symantec recommends creating fewer profiles
that contain more forms, rather than more profiles that contain fewer forms.
For best results, ensure that the form images in your gallery archive meet the following
guidelines:
■ The PDF files containing the form images should be at least 200 DPI.
■ Forms with electronically fillable fields must be in AcroForm format. Other interactive form
formats are not supported for detection.
■ Each form should have a sufficient amount of text and graphical content. Sparse forms
may cause more false matches.
■ Each form should contain unique content. Forms that share very similar content are harder
to match and may cause more false matches. For example, tax forms from 2014 and 2015
would share many similar features, and would be difficult to detect if they were in the same
profile.
■ Each form should have content evenly distributed across the page. Forms with clustered
content and sparse areas are more difficult to match.
■ Each form should have a white or light-colored background. Black or dark backgrounds
are not supported.

To prepare a Form Recognition Gallery Archive

1 Collect blank copies of the forms you want to detect.
2 Save all blank copies of forms as PDF files. Consider the following guidelines as you
prepare PDF files:
■ The gallery must contain only PDF files. Symantec Data Loss Prevention ignores any
other folders and files in the ZIP archive.
■ If a form has two or more pages, separate them into single-page files, then convert to
PDF format.
For example, if your form is a single three-page Microsoft Word file titled
YourForm.docx, separate the file into three separate single-page files, then convert
them to PDF:
■ YourForm_1of3.PDF

■ YourForm_2of3.PDF

■ YourForm_3of3.PDF

■ If your form contains electronically fillable fields, use a PDF editing tool for the
conversion process that retains AcroForms formatting, for example Adobe Acrobat.
■ If your form includes several pages of un-fillable boilerplate, only add the fillable pages
to your gallery archive.

3 Add all single-page PDF files to a ZIP archive.
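Before uploading, you may want to confirm what the archive actually contains. The Python sketch below lists the PDF entries in a Gallery Archive, reports the entries that would be ignored, and totals the forms across galleries against the 500-image recommendation. Counting only top-level PDFs is a conservative assumption, and the function names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

MAX_TOTAL_FORMS = 500  # recommended ceiling across all Form Recognition profiles


def inspect_gallery(zip_path):
    """Split a gallery archive's entries into top-level PDFs and everything else.

    Conservative assumption: only PDF files at the top level of the ZIP are
    counted; files in subfolders and non-PDF files are reported as ignored.
    """
    pdfs, ignored = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if path.suffix.lower() == ".pdf" and len(path.parts) == 1:
                pdfs.append(info.filename)
            else:
                ignored.append(info.filename)
    return pdfs, ignored


def forms_within_budget(zip_paths, limit=MAX_TOTAL_FORMS):
    """Total PDF forms across galleries, and whether it stays within the limit."""
    total = sum(len(inspect_gallery(p)[0]) for p in zip_paths)
    return total, total <= limit
```

Checking that each PDF has a single page and a resolution of at least 200 DPI requires a PDF library and is left out of this sketch.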

Configuring a Form Recognition profile

Configure a Form Recognition profile by uploading a Gallery Archive and specifying a Fill
Threshold.
See “Preparing a Form Recognition Gallery Archive” on page 697.
To configure and index a Form Recognition profile
1 Navigate to Manage > Data Profiles > Form Recognition to display the Form
Recognition Profiles screen.
2 Click Add Profile to display the Configure Form Recognition Profile screen.
3 Enter a name for the profile in the Name field.

Note: The name you enter is used when you configure policies and appears in the incident
snapshot for Form Recognition incidents.

4 (Optional) Enter a description for the profile in the Description field.


5 Enter a value in the Fill Threshold field.

The fill threshold is a value from 1 to 10, where 1 represents a form that has been filled in
minimally, and 10 a form that has been filled in completely. You can also enter 0 to detect
blank forms.

Note: For electronically filled forms, entering 1 for the fill threshold detects any electronically
filled item on a form. For example, setting the threshold to 1 detects a single selected
check-box. In contrast, setting the threshold to 1 may not detect a similar check-box that
has been filled in using a pen.

6 Upload the gallery archive by clicking Browse and selecting the gallery archive ZIP file.
7 Click Save to begin indexing the profile.
When the gallery completes indexing, you can use it to configure a Form Recognition rule
in a policy.
See “Configuring the Form Recognition detection rule” on page 699.

Configuring the Form Recognition detection rule

You configure the detection rule by specifying a Form Recognition profile.
See “Configuring a Form Recognition profile” on page 698.
The indexed forms in the profile are compared against detected forms to determine if the forms
match. The Form Recognition rule matches on attachments only.
To configure the Form Recognition detection rule
1 Go to Manage Policies > Policy List, click New, and create a new blank policy or policy
from a template.
See “Adding a new policy or policy template” on page 412.
2 Click Add Rule on the Detection tab to display the Configure Policy - Add Rule page.
3 Select Detect using Form Recognition Profile in the Form Recognition section
and select the Form Recognition profile that contains the forms you want to protect.
4 Click Next to display the Configure Policy - Edit Rule page.
5 Enter a name for the rule in the Rule Name field.
6 Choose the rule severity.
See “Policy severity” on page 374.

7 Select the conditions for the Form Recognition detection rule.

You can use the Also Match field to configure compound match conditions. See
“Compound conditions” on page 394.
8 Click OK to add the detection rule.
9 Click Save to apply the detection rule to the policy.
The new policy displays in the Policy List.

Configuring the Form Recognition exception rule

You configure the exception rule by specifying a Form Recognition profile.
See “Configuring a Form Recognition profile” on page 698.
To configure the Form Recognition exception rule
1 Go to Manage Policies > Policy List, click New, and create a new blank policy or policy
from a template.
See “Adding a new policy or policy template” on page 412.
2 Click Add Exception on the Detection tab to display the Configure Policy - Add
Exception page.
3 Select Detect using Form Recognition Profile in the Form Recognition section and
select the Form Recognition profile that contains the forms you want to protect.
4 Click Next to display the Configure Policy - Edit Exception page.
5 Enter a name for the exception in the Exception Name field.
6 Select the conditions for the Form Recognition exception rule.
You can use the Also Match field to configure compound match conditions. See
“Compound conditions” on page 394.
7 Click OK to add the exception rule.
8 Click Save to apply the exception rule to the policy.
The new policy displays in the Policy List.

Managing Form Recognition profiles

The Form Recognition Profiles screen (Manage > Data Profiles > Form Recognition)
provides a summarized view of all Form Recognition profiles. You can use this screen to
confirm that a profile was indexed successfully, view the indexing status, and so on.

Table 29-2 Form Recognition Profiles details

Element: Description

Add Profile: Click Add Profile to configure a new Form Recognition profile.
See “Configuring a Form Recognition profile” on page 698.

Show Entries: Select a value from Show Entries to specify the number of profiles to display on this page.

Page navigation: You can use the following buttons to change the view of profiles:
■ Click Last to view profiles with the most recent dates in ascending order.
■ Click a number to navigate to that specific page number.
■ Click Next to view the next page.
■ Click Previous to view the previous page.

Profile Name: Click the Profile Name to view or edit the profile.
Note: You can sort column data in ascending order (A-Z/1-3) by clicking the up arrow or in descending order (Z-A/3-1) by clicking the down arrow.

Description: The profile description. You can edit the description by clicking the profile name or the pencil icon in the Actions column.

State: Each profile displays one of the following states:
■ Gallery missing or invalid displays when indexing for the profile failed. The gallery did not upload because the ZIP archive is invalid.
■ Indexing not started displays when indexing for the profile did not start. The uploaded gallery did not process.
■ Indexing in progress displays when the uploaded gallery is indexing.
■ Profile indexed displays when indexing for this profile is complete and the index was successfully created.
■ Invalid gallery displays when indexing for the profile failed. The uploaded gallery did not start indexing because it is invalid.
■ Index contains no images displays when indexing for the profile failed. The uploaded gallery did not index because it contains no compatible files.
■ Indexing failed displays when indexing for this profile failed. The uploaded gallery was not indexed.
■ Indexing found some unusable files displays when indexing for the profile completes with errors. Some of the files in the uploaded gallery cannot be indexed.

Gallery The gallery archive name.

You cannot edit the gallery name. You can upload a new gallery or an
existing gallery that has been renamed by clicking the profile name or
the pencil icon in the Actions column.

Usable Forms Count The total number of form images in the gallery that have been indexed
without errors and can be used in a policy.

Date Indexed The date when the profile was last indexed.

Index Version The version number of the index.

Fill Threshold The fill threshold value you provided when you configured the Form
Recognition profile. You can edit this value by clicking the profile name
or the pencil icon in the Actions column.

Actions Click the pencil icon to edit profile details.

Click the red X to delete a profile. If you delete a profile, the system
removes the profile metadata and gallery from the Enforce Server.

Advanced server settings for Form Recognition

Some of the default Form Recognition server settings might require testing and fine-tuning to
determine what works best for your needs. You can modify these settings on the System >
Servers and Detectors > Overview > Server/Detector Detail - Advanced Settings page.
Symantec recommends that you contact Symantec Technical Support before modifying any
advanced server settings.
There are nine advanced settings related to Form Recognition:
■ ContentExtraction.ImageExtractorEnabled
■ ContentExtraction.MaxNumImagesToExtract
■ FormRecognition.ALIGNMENT_COEFFICIENT
■ FormRecognition.CANONICAL_FORM_WIDTH
■ FormRecognition.MAXIMUM_FORM_WIDTH
■ FormRecognition.MINIMUM_FORM_ASPECT_RATIO
■ FormRecognition.MINIMUM_FORM_WIDTH
■ FormRecognition.OPENCV_THREADPOOL_SIZE
■ FormRecognition.PRECLASSIFIER_ACTION
For details about these settings, see “Advanced server settings” on page 285.

Viewing a Form Recognition incident

You view and remediate Form Recognition incidents as you would any Symantec Data Loss
Prevention incident. See “About incident remediation” on page 1841.
In addition to the usual incident snapshot information, Form Recognition incidents include:
■ Yellow highlighted areas on the form, which indicate form elements that align and electronic
fields that have been filled.
■ Orange highlighted areas on the form, indicating questionable areas.
■ A Similarity Score, which indicates how similar the form elements are. The higher the
score, the more statistically similar the field contents are to the form fields.
Chapter 30
Detecting Content using OCR - Sensitive Image Recognition
This chapter includes the following topics:

■ About content detection with OCR Sensitive Image Recognition

■ OCR Server system requirements

■ Using diagnostics for sizing OCR Server deployments

■ Creating a null policy to assist in OCR diagnostics for Discover Servers

■ Using the OCR Server Sizing Estimator spreadsheet

■ Setting up OCR Servers

■ Installing an OCR Sensitive Image Recognition license

■ Creating an OCR configuration

■ Using the OCR engine

■ More about languages and Dictionaries

■ Viewing OCR incidents in reports

■ Advanced Server settings and Troubleshooting for Sensitive Image Recognition content extraction
About content detection with OCR Sensitive Image Recognition

OCR (optical character recognition) Sensitive Image Recognition provides the capability to
extract text from images (scanned documents, screenshots, pictures, Microsoft Office
documents, and so on) and from PDFs, enabling you to use new or preexisting text-based
detection rules on this content.
The extracted text then enters the detection chain and is processed identically to conventionally
extracted text. Incident snapshots for OCR text are similar to those for conventionally extracted
text: the text excerpt is displayed, with the detected words highlighted. OCR incidents have
visual indicators denoting that the text came from OCR, and a thumbnail of the original image.
You can set up OCR to use various languages. To improve recognition results, you can also
choose a specialized dictionary (such as legal, financial, or medical) to enable supplemental
spell checking. You can also set up a customized dictionary to deal with proper nouns or other
terms specific to your business.
While OCR content extraction can integrate with both Windows and Linux detection servers,
Symantec supports installing the OCR Server on Windows servers only. OCR content extraction
is not supported on the Windows Agents, macOS Agents, the Data Loss Prevention cloud
services, or the Data Loss Prevention appliances (both virtual and physical). For information
on supported versions of Windows servers, see the Symantec Data Loss Prevention System
Requirements Guide at
http://www.symantec.com/docs/DOC10602
See “Installing an OCR Sensitive Image Recognition license” on page 711.

Detection types supported for OCR extraction

The following detection types are supported for OCR extraction:
■ Network Monitor
■ Network Prevent for Email
■ Network Prevent for Web
■ Network Discover
■ Cloud Prevent for Office 365 on Azure

File types supported for OCR extraction

Images of the following file types are extracted and sent to OCR:
■ JPEG (.jpg, .jpeg)
■ PNG
■ TIFF (single page or multi-page, .tif or .tiff)
■ Bitmap (.bmp)
■ Images extracted from PDF files, such as pages from a scanned document.
■ Images extracted from Microsoft Office documents.

About extracting images from Microsoft Office documents for OCR and Form Recognition

You can extract images from Microsoft Office documents for OCR and Form Recognition
detection in Symantec Data Loss Prevention 15.5. Data Loss Prevention can extract image file
formats including BMP, PNG, and JPG from Word, Excel, and PowerPoint. This capability is
dynamically enabled by default and can be disabled or statically enabled by changing the
ContentExtraction.ImageExtractorEnabled Advanced setting.

See “Advanced Server settings and Troubleshooting for Sensitive Image Recognition content
extraction” on page 715.

OCR Server system requirements

The OCR (optical character recognition) Server has specific hardware, operating system, and
server settings requirements that differ from those of the Data Loss Prevention Enforce Server
and detection servers. You can find the latest information on these requirements in the article
"Symantec Data Loss Prevention OCR Server System Requirements and OCR Server Sizing
Estimator" at the Symantec Support Center at http://www.symantec.com/docs/doc10612.html
See “Using diagnostics for sizing OCR Server deployments” on page 706.

Using diagnostics for sizing OCR Server deployments

When you enable OCR.RECORD_REQUEST_STATISTICS on a given detection server, the detection
server starts logging. It collects metrics on the images that it encounters that are suitable for
OCR submission. Not all images that the detection server encounters are suitable for OCR
submission. For example, the images that are the wrong dimensions or are unlikely to contain
text that can be transcribed won’t be submitted to OCR for processing.
You can measure the proportion of files and messages that Data Loss Prevention inspects
and that contain images that can be submitted to OCR. The resulting metrics can be used to
help you properly size and scale your OCR Server deployment. First, you need to set the
OCR.RECORD_REQUEST_STATISTICS Advanced Server setting to true. Then, Symantec
recommends that you allow the detection server to operate normally for one calendar week.
The system collects metrics on the images that it encounters and logs the results for the
trailing 24 hours in the OcrRequestsRecord0.log file. If you let the server run for one calendar
week, you can plot the trailing 24-hour data over this longer interval, which shows the peaks
and valleys of your potential OCR image load. During this process, no incidents are created,
and only the images that are suitable for submission to OCR are counted.

Note: You do not need the Symantec Data Loss Prevention Sensitive Image Recognition
add-on license to use this feature. You can estimate sizing requirements for an OCR Server
deployment before you purchase the Sensitive Image Recognition add-on license that
includes the OCR feature.

Figure 30-1 is a sample of an OcrRequestsRecord0.log showing a snapshot of the results.

In it you can see samples of the values that you can enter in the OCR Server Sizing Estimator
spreadsheet to help you to size your OCR Server deployment.

Figure 30-1 Sample OcrRequestsRecord0.log results

After you run the OCR diagnostics, set OCR.RECORD_REQUEST_STATISTICS to false to stop
logging to the OcrRequestsRecord0.log file.
To run diagnostics for OCR sizing for the Network Prevent for Email, Network Prevent for Web,
and Network Monitor data-in-motion channels
1 Go to System > Servers and Detectors > Overview and select a detection server.
2 Click Server Settings.
3 Set OCR.RECORD_REQUEST_STATISTICS to true.
4 Click Save.
5 Restart the detection server.
6 Let the detection server run for a week and collect metrics. This process works best for
the data in motion channels, such as Network Prevent for Email, Network Prevent for
Web, and Network Monitor.
7 Consult the OcrRequestsRecord0.log to get the values to enter in the OCR Server Sizing
Estimator spreadsheet.
8 Go to the OCR Server Sizing Estimator spreadsheet at
https://www.symantec.com/docs/DOC10612.
9 Enter the following values from the log into the green cells:
■ Percentage of messages containing images requiring OCR (OCR messages)
■ Estimated average number of images per OCR message
10 The spreadsheet calculates the number of OCR Servers that you need to deploy for the
image traffic of each detection server in your Symantec Data Loss Prevention deployment.
11 Set OCR.RECORD_REQUEST_STATISTICS to false to disable logging.
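This guide does not describe the spreadsheet's internal formula. As a rough illustration only, the following sketch shows how the two logged values combine with your own daily message volume into an approximate count of images sent to OCR, and how you might take the peak across a week of samples. The DailySample type, its field names, and the message-volume input are hypothetical; they are not the OcrRequestsRecord0.log format and not the estimator's calculation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DailySample:
    """Hypothetical per-day figures; not the OcrRequestsRecord0.log format."""
    messages_inspected: int              # your own traffic volume for the day (assumption)
    pct_messages_with_ocr_images: float  # "Percentage of messages containing images requiring OCR"
    avg_images_per_ocr_message: float    # "Estimated average number of images per OCR message"


def estimated_ocr_images(sample: DailySample) -> float:
    """Approximate number of images submitted to OCR in one day."""
    ocr_messages = sample.messages_inspected * sample.pct_messages_with_ocr_images / 100.0
    return ocr_messages * sample.avg_images_per_ocr_message


def peak_daily_ocr_images(week: Sequence[DailySample]) -> float:
    """Return the busiest day's estimate so that sizing covers peaks, not the average."""
    if not week:
        raise ValueError("at least one daily sample is required")
    return max(estimated_ocr_images(day) for day in week)


if __name__ == "__main__":
    week = [
        DailySample(100_000, 5.0, 2.0),  # illustrative numbers, not measured data
        DailySample(80_000, 4.0, 1.5),
    ]
    print(peak_daily_ocr_images(week))  # prints 10000.0
```

Sizing against the peak day rather than the weekly average follows the guide's advice to look at the peaks and valleys of your potential OCR image load. Use the spreadsheet itself for the actual server count.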
You use a different technique for estimating OCR Server sizing requirements for Network
Discover. See “Creating a null policy to assist in OCR diagnostics for Discover Servers”
on page 708.

Creating a null policy to assist in OCR diagnostics for Discover Servers

When you enable OCR.RECORD_REQUEST_STATISTICS on a given detection server, the detection
server starts logging. It collects metrics on the images that it encounters that are suitable for
OCR submission. Not all images that the detection server encounters are suitable for OCR
submission. For example, the images that are the wrong dimensions or are unlikely to contain
text that can be transcribed won’t be submitted to OCR for processing.
For Network Discover, you can directly measure the proportion of images suitable for submission
to OCR for each Discover scan target by enabling the OCR.RECORD_REQUEST_STATISTICS
advanced setting before you run a scan against that target. To expedite the scan process,
Symantec recommends binding a null policy to the Discover scan target.
Creating a null policy group
1 Go to System > Servers and Detectors > Policy Groups.
2 Click Add.
3 In the Name field, enter a name, such as Null Group, and a description.
4 Set the Policy Group to Null.
5 Check all boxes under Servers and Detectors to assign the policy group to all servers.
6 Click Save.
Creating a null policy that is suspended

1 Go to Manage > Policies > Policy List.
2 Click New.
3 Click Add a blank policy.
4 Enter a Name, such as Null Policy.
5 Set the Policy Group to Null.
6 Set the Status to Suspended.
7 On the Detection tab, click Add Rule then add an existing rule such as Message
Attachment or File Name Match or Message Attachment or File Type Match with no
exceptions to the policy.
Bind the suspended policy to the Null policy group and run the scans
1 Go to Manage > Discover Scanning > Discover Targets.
2 Click New Target, and select File System from the pull-down menu. On the General
tab, type the Name of the Discover target.
3 In the General section, create a new scan named, for example, "Fileshare Scan."
4 Select the Null Policy policy group.
5 Under Scan Execution select Always scan all items.
6 Indicate the targets that you want to scan.
7 Under Scan Schedule, schedule the scans.
When the scans are invoked, Discover crawls all of the scan targets. Files that are detected
in the pipeline are analyzed and metrics for images are collected. Make sure that
OCR.RECORD_REQUEST_STATISTICS is enabled. No incidents are generated, because no
active policy is associated with the scans.
The scans still take time, because crawling remote repositories is time-consuming, but the
crawling goes faster than normal because no policies are executed.
8 After the scan operation is complete, unbind the null policy group from the scans and
re-bind the appropriate policy groups.
9 Consult the OcrRequestsRecord0.log to get the values to enter in the OCR Server Sizing
Estimator spreadsheet. See Figure 30-1 on page 707.
10 Go to the OCR Server Sizing Estimator spreadsheet at
https://www.symantec.com/docs/DOC10612.
11 Enter the following values from the log into the green cells:
■ Percentage of messages containing images requiring OCR (OCR messages)
■ Estimated average number of images per OCR message
12 The spreadsheet calculates the number of OCR Servers that you need to deploy for the
image traffic of each detection server in your Symantec Data Loss Prevention deployment.
13 Set OCR.RECORD_REQUEST_STATISTICS to false to disable logging.
See “Using the OCR Server Sizing Estimator spreadsheet” on page 710.

Using the OCR Server Sizing Estimator spreadsheet

The OCR Server Sizing Estimator spreadsheet can help you to estimate how many OCR
Servers you need for each detection server in your deployment. The spreadsheet and directions
on how to use it are available at the Symantec Support Center at
https://www.symantec.com/docs/doc10612.html
See “Setting up OCR Servers” on page 710.

Setting up OCR Servers

OCR content extraction also requires installation of an OCR Server. You configure the OCR
Server (micro service) from the Enforce Server administration console. Symantec recommends
that you install the OCR Server on dedicated hardware, or on VMs with dedicated resources,
because of its high processing requirements. A certificate for communication between the
OCR client on the Enforce Server and the OCR Server is also required.
The OCR Server is an independent server, separate from any Data Loss Prevention detection
server. You can configure the detection server to talk to an OCR address (IP address or host
name). That address can either be a single OCR Server, or a single load balancer in front of
several OCR Servers. You can use an external load balancer or another technology, such as
Windows Network Load Balancing.
Note: A detection server can only be configured with a single OCR Server address. You can
use the IP address or host name for a single OCR Server. Or, you can use the virtual IP
address for a load balancer (or pair of load balancers) that front-ends multiple OCR Servers.
If you want to configure a detection server to communicate with a pool of OCR Servers, the
detection server is limited to supporting configuration of a single OCR Server address. You
must front-end multiple OCR Servers with a load balancer that provides that single address. In
addition, Symantec supports only load balancers that do not have persistence enabled.
In the single OCR Server case, it can be installed on a separate computer, or on the same
computer as the detection server (not recommended). Configuration information is included
with the request, so OCR Servers can service requests from different detection servers that
are configured differently.
For example, you can configure one detection server to detect English with the highest possible
OCR accuracy. Then, you can configure another detection server to detect Japanese, with
the highest possible speed. In this case, the same OCR Server is able to handle both types
of requests. Symantec recommends that you install the OCR Server on a computer separate
from the detection server. However, Symantec supports co-locating the OCR Server with
a detection server.
You install an OCR Server using the Symantec DLP OCR Server Installer setup wizard.
To install an OCR Server
1 Open the OCR Server Installer.
2 Double click OCRServerInstaller64.
3 Click Next.
4 Select the desired destination directory. Click Next. The installer runs.
5 Click Finish when the installation is complete.
Now the OCR service is running and is ready to receive OCR requests.
See “Creating an OCR configuration” on page 711.

Installing an OCR Sensitive Image Recognition license

When you first purchase Symantec Data Loss Prevention, upgrade to a later version, or
purchase additional product modules, you must install one or more Symantec Data Loss
Prevention license files. To use OCR (optical character recognition), you must install the
Symantec Data Loss Prevention OCR Sensitive Image Recognition license. License files have
names in the format name.slf.
For more information on adding a license to Symantec Data Loss Prevention, see “Installing a
new license file” on page 234.
See “OCR Server system requirements” on page 706.

Creating an OCR configuration

Adding an OCR profile
1 Go to System > Settings > OCR Engine Configuration.
2 Click Add OCR Engine Configuration.
Configuring the OCR Engine

1 Enter the Name of the profile.
2 Enter an optional Description of the profile.
3 Enter the OCR server hostname of the server where the OCR requests should be sent.
It can be a single load balancer or an individual OCR Server.
4 Enter the Port number of the port where requests should be sent. The default port is 8555.
5 Enter the OCR Engine timeout (seconds) value. This setting defines how long an OCR
request can run before it times out. The default timeout is 30 seconds.
The timeout is how much time the request is allowed to spend inside the OCR (optical
character recognition) Server, and does not include transit time or other delays.
Set this timeout in coordination with the other content extraction timeout settings in the
Advanced Settings. As with other content extraction operations, if the timeout is reached, the
OCR component is skipped and the previously extracted content moves on to detection.

6 Enter a value for Accuracy vs speed. By default, the OCR Server sets the value
dynamically for each document. The Sensitive Image Recognition pre-classifier on the
detection server inspects each image and determines whether it is suitable for OCR content
extraction (and form recognition). It then determines which preset is most appropriate. If
you uncheck this box, you can select a preset to use for all images. You can choose from
Accurate, Balanced, or Fast. A fixed preset can be appropriate for Discover scans, where
accuracy is prioritized over time.
7 In the Supported Languages section, select the candidate languages for OCR.
You can select one or more languages, and then the OCR Server selects a language
from that pool to use for the image. Symantec assumes that documents are primarily one
language (for example, all French, or all English, as opposed to mixed English and French).
The number of languages should be as small as possible. The more languages you select,
the slower the processing speed.
Even if a language is not selected, you may still get accurate text from that language. For
example, you can select English and German and submit a mixed English-French image to
the OCR Server. It may choose English and still return some French text. The language
selection affects which spell-check dictionary to use. It also affects the pool of characters
to choose from if a character in the image is unclear.
8 In the Languages and Dictionaries Specialized Dictionaries section, enable
supplemental spell checking for different businesses (legal, financial, medical) across
different languages.
9 In the Languages and Dictionaries Custom Dictionary section, specify the name of
your custom dictionary file to aid recognition accuracy. For example, if certain proper
nouns give the OCR Server difficulty, you can place them in this custom dictionary.
Using Dictionaries and spell checking improves recognition results for low-quality scans
and images (such as faxes). If the characters are crisp and clean they are easier for the
engine to read, and the Dictionaries are less useful.
10 The custom dictionary is a text file, with one entry per line. This text file must be placed
in the dictionary directory of each server at c:\Symantec\DLPOCR\Protect\bin.
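The guide specifies only that the custom dictionary is a text file with one entry per line. The following sketch prepares such a file from a list of terms. The function names, the whitespace collapsing, the removal of exact duplicates, and the UTF-8 encoding are assumptions made for illustration; confirm the encoding that your OCR Server expects before you copy the file to c:\Symantec\DLPOCR\Protect\bin.

```python
from pathlib import Path
from typing import Iterable, List


def custom_dictionary_entries(terms: Iterable[str]) -> List[str]:
    """Normalize terms into dictionary entries (hypothetical helper)."""
    seen = set()
    entries: List[str] = []
    for term in terms:
        # Collapse runs of whitespace, including newlines, so each term stays on one line.
        cleaned = " ".join(term.split())
        # Skip blank terms and exact duplicates, keeping the first-seen order.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            entries.append(cleaned)
    return entries


def write_custom_dictionary(terms: Iterable[str], path: Path) -> int:
    """Write one entry per line and return the number of entries written."""
    entries = custom_dictionary_entries(terms)
    # Assumption: UTF-8; the guide does not state which encoding the OCR Server expects.
    path.write_text("".join(entry + "\n" for entry in entries), encoding="utf-8")
    return len(entries)
```

For example, passing proper nouns that the OCR Server misreads, such as company or product names, produces a file with each name on its own line, ready to be copied to each server.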
Assign a profile to a detection server
1 Go to System > Servers and Detectors > Overview.
2 Select a monitor.
3 On the Server/Detector Detail page, click Configure.
4 On the Configure Server page, click OCR Engine. In OCR Engine Configuration select
the configuration that you want to use for the server.
5 Click Save.
See “Using the OCR engine” on page 713.

Using the OCR engine

You can see all of your OCR configurations and add an OCR Engine configuration on the OCR
Engine Configuration page. On this page, you can:
■ Click Add OCR Engine Configuration to add a new configuration.
■ Click the name of the configuration or the pencil icon to edit an existing configuration.
■ Click the red X to delete a configuration.
See “Server configuration—basic” on page 705.
See “Viewing OCR incidents in reports” on page 715.

More about languages and Dictionaries

Instead of choosing from a pool of languages, the OCR Server assumes that all selected
languages may be in the image. This is a good strategy for the mixed language document use
case, but selecting more than four languages is not recommended, as it can adversely affect
both speed and accuracy.
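The configuration fields described in "Creating an OCR configuration," together with the guidance above on the number of languages, can be summarized as a small validation sketch. The class, field names, and warning text are hypothetical and do not correspond to any Data Loss Prevention API; only the defaults (port 8555, a 30-second timeout), the three presets, and the four-language recommendation come from this guide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Presets named on the configuration page; None means the preset is chosen dynamically.
PRESETS = ("Accurate", "Balanced", "Fast")
RECOMMENDED_MAX_LANGUAGES = 4


@dataclass
class OcrEngineConfiguration:
    """Hypothetical model of the OCR Engine Configuration fields; not a product API."""
    name: str
    hostname: str                 # a single OCR Server or a single load balancer address
    languages: List[str]          # candidate languages from the Supported Languages section
    port: int = 8555              # documented default port
    timeout_seconds: int = 30     # documented default OCR Engine timeout
    preset: Optional[str] = None  # None: Accuracy vs speed is set dynamically
    description: str = ""

    def warnings(self) -> List[str]:
        """List settings that are invalid or that the guide advises against."""
        issues: List[str] = []
        if self.preset is not None and self.preset not in PRESETS:
            issues.append(f"unknown preset {self.preset!r}; use one of {', '.join(PRESETS)}")
        if not self.languages:
            issues.append("select at least one candidate language")
        if len(self.languages) > RECOMMENDED_MAX_LANGUAGES:
            issues.append("more than four languages can reduce both speed and accuracy")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        return issues
```

A configuration with English only and the default port and timeout produces no warnings. Selecting five languages produces a single warning that reflects the recommendation above.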
Specialized Dictionaries available for OCR content extraction

The following specialized Dictionaries are available for OCR content extraction:
■ Dutch Legal Dictionary
■ Dutch Medical Dictionary
■ English Financial Dictionary
■ English Legal Dictionary
■ English Medical Dictionary
■ French Legal Dictionary
■ French Medical Dictionary
■ German Legal Dictionary
■ German Medical Dictionary

Languages supported for OCR extraction

The following languages are supported for OCR extraction:
■ Arabic
■ Chinese (Simplified)
■ Chinese (Traditional)
■ Czech
■ Danish
■ Dutch
■ English
■ Finnish
■ French
■ German
■ Greek
■ Hungarian
■ Italian
■ Japanese
■ Korean
■ Norwegian
■ Polish
■ Portuguese
■ Portuguese (Brazilian)
■ Romany
■ Russian
■ Spanish
■ Swedish
■ Turkish
Other languages can be detected if they use supported character sets.

Viewing OCR incidents in reports

OCR incidents are flagged and detected text is highlighted in yellow in incident reports.
Thumbnails of the page are included in the incident. Clicking on the thumbnail enables you to
view a larger version of the image. This image contains the extracted text that violates the
Symantec Data Loss Prevention policy.

Advanced Server settings and Troubleshooting for Sensitive Image Recognition content extraction

The following tables detail Advanced settings and troubleshooting tips for Sensitive Image
Recognition content extraction.

Table 30-1 Advanced settings for OCR and FR image extraction

ContentExtraction.ImageExtractorEnabled = 1
State: Default value. Dynamically enables and disables extraction of images, with no restarts required.
Behavior:
■ Enabled when a Form Recognition rule is present and enabled, or when an OCR
configuration is assigned to the Monitor.
■ Disabled when no enabled Form Recognition rule is present and no OCR configuration
is assigned to the Monitor.

ContentExtraction.ImageExtractorEnabled = 2
State: Always enabled.
Behavior: Images are extracted.

ContentExtraction.ImageExtractorEnabled = 0
State: Always disabled.
Behavior: No images are extracted.

ContentExtraction.MaxNumImagesToExtract = 10
State: Set to 10 by default.
Behavior: The first 10 images are extracted. You can change this setting to any value.

Note: You must restart the server when you change Advanced settings.
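To make the interaction between the two settings in Table 30-1 concrete, the following sketch models how many images would be extracted from one document. It is a hypothetical model of the documented behavior, not product code; the function and parameter names are illustrative, and the cap is modeled as a simple minimum because the guide states only that the first N images are extracted.

```python
def images_to_extract(setting: int, form_recognition_rule_enabled: bool,
                      ocr_configuration_assigned: bool, images_in_document: int,
                      max_images: int = 10) -> int:
    """Return how many images would be extracted under Table 30-1 (hypothetical model)."""
    if setting == 0:    # ContentExtraction.ImageExtractorEnabled = 0: always disabled
        enabled = False
    elif setting == 2:  # = 2: always enabled
        enabled = True
    elif setting == 1:  # = 1 (default): enabled only when FR or OCR needs images
        enabled = form_recognition_rule_enabled or ocr_configuration_assigned
    else:
        raise ValueError("ContentExtraction.ImageExtractorEnabled must be 0, 1, or 2")
    if not enabled:
        return 0
    # ContentExtraction.MaxNumImagesToExtract caps extraction at the first N images.
    return min(images_in_document, max_images)
```

For example, with the default settings and an OCR configuration assigned, a document that contains 25 images yields 10 extracted images; raising max_images to 30 yields all 25.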

Consult the following table for troubleshooting tips when using image extraction for OCR and
FR.

Table 30-2 Troubleshooting image extraction for OCR and FR

Issue: No images are extracted even though a Form Recognition rule is present or an OCR
configuration is assigned to the monitor.
Solution:
■ Check whether the ContentExtraction.ImageExtractorEnabled setting is equal to 0. If it is,
change it to 1 or 2.
■ Make sure that the policy with the Form Recognition rule is not suspended.

Issue: Only 10 images are extracted, although the document contains many more images.
Solution: In the Advanced settings, change ContentExtraction.MaxNumImagesToExtract
from 10 to a greater value.

Log settings:
■ Set com.vontu.detection.level = FINEST in FileReaderlogging.properties.
■ Set the "cehhost" category to TRACE in log4cxx_config_filereader.xml.
Chapter 31
Detecting content using data identifiers
This chapter includes the following topics:

■ Introducing data identifiers

■ Configuring data identifier policy conditions

■ Modifying system data identifiers

■ Creating custom data identifiers

■ Best practices for using data identifiers

Introducing data identifiers

Symantec Data Loss Prevention provides data identifiers to detect specific instances of
described content. Data identifiers let you quickly implement precise, short-form data matching
with minimal effort.
Data identifiers are algorithms that combine pattern matching with data validators to detect
content. Patterns are similar to regular expressions but more efficient because they are tuned
to match the data precisely. Validators are accuracy checks that focus the scope of detection
and ensure compliance.
For example, the "Credit Card Number" system data identifier detects numbers that match a
specific pattern. The matched pattern is then validated by a "Luhn check," a checksum
algorithm: a calculation over the first 15 digits of the number must produce the 16th digit,
which serves as a check digit.
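The Luhn check mentioned above is a standard checksum. The following sketch shows the general technique for illustration only; it is not the Symantec Data Loss Prevention implementation, which also applies pattern matching and other validators. The function name is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep only digits, so separators such as spaces and hyphens are dropped.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    # The number is valid when the weighted sum is a multiple of 10.
    return total % 10 == 0
```

For example, the widely used test number 4111 1111 1111 1111 passes, and changing its last digit to 2 makes it fail. Because a single mistyped digit changes the sum, the check filters out many random 16-digit strings that merely match the credit card pattern.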
Symantec Data Loss Prevention provides pre-configured data identifiers that you can use to
detect commonly used sensitive data, such as credit card, social security, and driver's license
numbers. Most data identifiers come in three breadths—wide, medium, and narrow—so you
can fine-tune your detection results. Data identifiers offer broad support for detecting
international content.
If a system-defined data identifier does not meet your needs, you can modify it. You can also
define your own custom data identifiers to detect any content that you can describe.
See “System-defined data identifiers” on page 718.
See “Selecting a data identifier breadth” on page 739.

System-defined data identifiers

Symantec Data Loss Prevention provides several system-defined data identifiers to help you
detect and validate pattern-based sensitive data.

Table 31-1 System data identifiers

Category Description

Personal Identity Detect various types of identification numbers for the regions of Africa, Asia Pacific, Europe,
North America, and South America.

See “Personal identity data identifiers” on page 718.

Financial Detect financial identification numbers, such as credit card numbers and ABA routing numbers.

See “Financial data identifiers” on page 729.

Healthcare Detect U.S. and international drug codes, and other healthcare-related pattern-based sensitive
data.

See “Healthcare data identifiers” on page 729.

Information Technology Detect IP addresses.

See “Information technology data identifiers” on page 730.

International keywords International keywords for PII data identifiers.

See “International keywords for PII data identifiers” on page 730.

Personal identity data identifiers

Symantec Data Loss Prevention provides various data identifiers for detecting personally
identifiable information (PII) for the regions of Africa, Asia Pacific, Europe, North America, and
South America.
Table 31-2 lists system-defined data identifiers for the Middle East and Africa region.
Table 31-2 African personal identity

Data identifier Description

South African Personal Identification Number See “South African Personal Identification Number”
on page 1469.

Table 31-3 lists system-defined data identifiers for the Asia Pacific region.

Table 31-3 Asia Pacific personal identity

Data identifier Description

Australia Driver's License Number See “Australia Driver's License Number” on page 1018.

Australian Business Number See “Australian Business Number wide breadth” on page 1020.

Australian Company Number See “Australian Company Number” on page 1022.

Australian Passport Number See “Australian Passport Number” on page 1027.

Australian Tax File Number See “Australian Tax File Number” on page 1029.

China Passport Number See “China Passport Number” on page 1079.

Hong Kong ID See “Hong Kong ID” on page 1215.

India RuPay Card Number See “India RuPay Card Number” on page 1252.

Indian Aadhaar Card Number See “Indian Aadhaar Card Number” on page 1249.

Indian Permanent Account Number See “Indian Permanent Account Number” on page 1251.

Indonesian Identity Card Number See “Indonesian Identity Card Number” on page 1255.

Israel Personal Identification Number See “Israel Personal Identification Number” on page 1276.

Japan Driver's License Number See “Japan Driver's License Number” on page 1285.

Japan Passport Number See “Japan Passport Number” on page 1287.

Japanese Juki-Net Identification Number See “Japanese Juki-Net Identification Number” on page 1289.

Japanese My Number - Corporate See “Japanese My Number - Corporate” on page 1291.

Japanese My Number - Personal See “Japanese My Number - Personal” on page 1292.

Kazakhstan Passport Number See “Kazakhstan Passport Number” on page 1295.

Korea Passport Number See “Korea Passport Number” on page 1296.

Korean Residence Registration Number for Foreigners See “Korea Residence Registration Number for Foreigners”
on page 1298.

Korean Residence Registration Number for Korean See “Korea Residence Registration Number for Korean”
on page 1300.

Macau Individual Identification Number See “Macau National Identification Number” on page 1331.

Malaysia Passport Number See “Malaysia Passport Number” on page 1333.

Malaysian MyKad Number See “Malaysian MyKad Number (MyKad)” on page 1335.

New Zealand Driver's License Number See “New Zealand Driver's Licence Number” on page 1370.

New Zealand National Health Index Number See “New Zealand National Health Index Number”
on page 1371.

New Zealand Passport Number See “New Zealand Passport Number” on page 1373.

People's Republic of China ID See “People's Republic of China ID” on page 1384.

Singapore NRIC See “Singapore NRIC data identifier” on page 1451.

Sri Lanka National Identity Number See “Sri Lanka National Identity Number” on page 1490.

Taiwan ID See “Taiwan ROC ID” on page 1515.

Thailand Passport Number See “Thailand Passport Number” on page 1517.

Thailand Personal Identification Number See “Thailand Personal Identification Number” on page 1519.

United Arab Emirates Personal Number See “United Arab Emirates Personal Number” on page 1544.

Table 31-4 lists system-defined data identifiers for the European region.

Table 31-4 European personal identity

Data identifier Description

Austria Passport Number See “Austria Passport Number” on page 1030.

Austria Tax Identification Number See “Austria Tax Identification Number” on page 1031.

Austria Value Added Tax (VAT) Number See “Austria Value Added Tax (VAT) Number” on page 1033.

Austrian Social Security Number See “Austrian Social Security Number” on page 1036.

Belgian National Number See “Belgian National Number” on page 1039.

Belgium Driver's License Number See “Belgium Driver's Licence Number” on page 1042.

Belgium Passport Number See “Belgium Passport Number” on page 1044.

Belgium Tax Identification Number See “Belgium Tax Identification Number” on page 1045.

Belgium Value Added Tax (VAT) Number See “Belgium Value Added Tax (VAT) Number”
on page 1047.

Bulgaria Value Added Tax (VAT) Number See “Bulgaria Value Added Tax (VAT) Number”
on page 1060.

Bulgarian Uniform Civil Number - EGN See “Bulgarian Uniform Civil Number - EGN” on page 1063.

Burgerservicenummer See “Burgerservicenummer” on page 1066.

Codice Fiscale See “Codice Fiscale” on page 1081.

Croatia National Identification Number See “Croatia National Identification Number” on page 1104.

Cyprus Tax Identification Number See “Cyprus Tax Identification Number” on page 1109.

Cyprus Value Added Tax (VAT) Number See “Cyprus Value Added Tax (VAT) Number” on page 1111.

Czech Republic Driver's License Number See “Czech Republic Driver's Licence Number”
on page 1112.

Czech Republic Personal Identification Number See “Czech Republic Personal Identification Number”
on page 1114.

Czech Republic Tax Identification Number See “Czech Republic Tax Identification Number”
on page 1117.

Czech Republic Value Added Tax (VAT) Number See “Czech Republic Value Added Tax (VAT) Number”
on page 1121.

Denmark Personal Identification Number See “Denmark Personal Identification Number” on page 1126.

Denmark Tax Identification Number See “Denmark Tax Identification Number” on page 1128.

Denmark Value Added Tax (VAT) Number See “Denmark Value Added Tax (VAT) Number”
on page 1130.

Estonia Driver's Licence Number See “Estonia Driver's Licence Number” on page 1147.

Estonia Personal Identification Code See “Estonia Personal Identification Code” on page 1151.

Estonia Passport Number See “Estonia Passport Number” on page 1149.

Estonia Value Added Tax (VAT) Number See “Estonia Value Added Tax (VAT) Number” on page 1153.

European Health Insurance Card Number See “European Health Insurance Card Number”
on page 1156.

Finland Driver's Licence Number See “Finland Driver's Licence Number” on page 1165.

Finland European Health Insurance Number See “Finland European Health Insurance Number”
on page 1167.

Finland Passport Number See “Finland Passport Number” on page 1169.

Finland Tax Identification Number See “Finland Tax Identification Number” on page 1171.

Finland Value Added Tax (VAT) Number See “Finland Value Added Tax (VAT) Number” on page 1173.

Finnish Personal Identification Number See “Finnish Personal Identification Number” on page 1175.

France Driver's License Number See “France Driver's License Number” on page 1177.

France Health Insurance Number See “France Health Insurance Number” on page 1179.

France Tax Identification Number See “France Tax Identification Number” on page 1181.

France Value Added Tax (VAT) Number See “France Value Added Tax (VAT) Number” on page 1182.

French INSEE Code See “French INSEE Code” on page 1185.

French Passport Number See “French Passport Number” on page 1187.

French Social Security Number See “French Social Security Number” on page 1188.

German Passport Number See “German Passport Number” on page 1190.

German Personal ID Number See “German Personal ID Number” on page 1192.

Germany Driver's License Number See “Germany Driver's License Number” on page 1194.

Germany Tax Identification Number See “Germany Tax Identification Number” on page 1198.

Germany Value Added Tax (VAT) Number See “Germany Value Added Tax (VAT) Number”
on page 1196.

Greece Passport Number See “Greece Passport Number” on page 1200.

Greece Social Security Number (AMKA) See “Greece Social Security Number (AMKA)” on page 1202.

Greece Value Added Tax (VAT) Number See “Greece Value Added Tax (VAT) Number” on page 1206.

Greek Tax Identification Number See “Greek Tax Identification Number” on page 1204.

Hungarian Social Security Number See “Hungarian Social Security Number” on page 1221.

Hungarian Tax Identification Number See “Hungarian Tax Identification Number” on page 1223.

Hungarian VAT Number See “Hungarian VAT Number” on page 1225.

Hungary Driver's Licence Number See “Hungary Driver's Licence Number” on page 1217.

Hungary Passport Number See “Hungary Passport Number” on page 1219.

Iceland National Identification Number See “Iceland National Identification Number” on page 1241.

Iceland Passport Number See “Iceland Passport Number” on page 1245.

Iceland Value Added Tax (VAT) Number See “Iceland Value Added Tax (VAT) Number” on page 1247.

Ireland Passport Number See “Ireland Passport Number” on page 1266.

Ireland Tax Identification Number See “Ireland Tax Identification Number” on page 1268.

Ireland Value Added Tax (VAT) Number See “Ireland Value Added Tax (VAT) Number” on page 1271.

Irish Personal Public Service Number See “Irish Personal Public Service Number” on page 1274.

Italy Driver's License Number See “Italy Driver's Licence Number” on page 1278.

Italy Health Insurance Number See “Italy Health Insurance Number” on page 1280.

Italy Passport Number See “Italy Passport Number” on page 1282.

Italy Value Added Tax (VAT) Number See “Italy Value Added Tax (VAT) Number” on page 1283.

Latvia Driver's Licence Number See “Latvia Driver's Licence Number” on page 1303.

Latvia Passport Number See “Latvia Passport Number” on page 1305.

Latvia Personal Identification Number See “Latvia Personal Identification Number” on page 1306.

Latvia Value Added Tax (VAT) Number See “Latvia Value Added Tax (VAT) Number” on page 1308.

Liechtenstein Passport Number See “Liechtenstein Passport Number” on page 1311.

Lithuania Personal Identification Number See “Lithuania Personal Identification Number” on page 1312.

Lithuania Tax Identification Number See “Lithuania Tax Identification Number” on page 1315.

Lithuania Value Added Tax Number See “Lithuania Value Added Tax (VAT) Number”
on page 1317.

Luxembourg National Register of Individuals Number See “Luxembourg National Register of Individuals Number”
on page 1320.

Luxembourg Passport Number See “Luxembourg Passport Number” on page 1322.

Luxembourg Tax Identification Number See “Luxembourg Tax Identification Number” on page 1324.

Luxembourg Value Added Tax (VAT) Number See “Luxembourg Value Added Tax (VAT) Number”
on page 1327.

Malta National Identification Number See “Malta National Identification Number” on page 1337.

Malta Tax Identification Number See “Malta Tax Identification Number” on page 1339.

Malta Value Added Tax (VAT) Number See “Malta Value Added Tax (VAT) Number” on page 1342.

Netherlands Bank Account Number See “Netherlands Bank Account Number” on page 1359.

Netherlands Driver's License Number See “Netherlands Driver's License Number” on page 1362.

Netherlands Passport Number See “Netherlands Passport Number” on page 1363.

Netherlands Tax Identification Number See “Netherlands Tax Identification Number” on page 1364.

Netherlands Value Added Tax (VAT) Number See “Netherlands Value Added Tax (VAT) Number”
on page 1367.

Norway Driver's Licence Number See “Norway Driver's Licence Number” on page 1375.

Norway National Identification Number See “Norway National Identification Number” on page 1377.

Norway Value Added Tax Number See “Norway Value Added Tax Number” on page 1379.

Norwegian Birth Number See “Norwegian Birth Number” on page 1382.

Poland Driver's Licence Number See “Poland Driver's Licence Number” on page 1386.

Poland European Health Insurance Number See “Poland European Health Insurance Number”
on page 1387.

Poland Passport Number See “Poland Passport Number” on page 1389.

Poland Value Added Tax (VAT) Number See “Poland Value Added Tax (VAT) Number” on page 1391.

Polish Identification Number See “Polish Identification Number” on page 1394.

Polish REGON Number See “Polish REGON Number” on page 1396.

Polish Social Security Number (PESEL) See “Polish Social Security Number (PESEL)” on page 1398.

Polish Tax Identification Number (NIP) See “Polish Tax Identification Number” on page 1400.

Portugal Driver's Licence Number See “Portugal Driver's Licence Number” on page 1402.

Portugal National Identification Number See “Portugal National Identification Number” on page 1404.

Portugal Passport Number See “Portugal Passport Number” on page 1407.

Portugal Tax Identification Number See “Portugal Tax Identification Number” on page 1408.

Portugal Value Added Tax (VAT) Number See “Portugal Value Added Tax (VAT) Number”
on page 1411.

Romania Driver's Licence Number See “Romania Driver's Licence Number” on page 1416.

Romania National Identification Number See “Romania National Identification Number” on page 1419.

Romania Value Added Tax (VAT) Number See “Romania Value Added Tax (VAT) Number”
on page 1420.

Romanian Numerical Personal Code (CNP) See “Romanian Numerical Personal Code” on page 1425.

Russian Passport Identification Number See “Russian Passport Identification Number” on page 1427.

Russian Taxpayer Identification Number See “Russian Taxpayer Identification Number” on page 1428.

SEPA Creditor Identifier Number North See “SEPA Creditor Identifier Number North” on page 1430.

SEPA Creditor Identifier Number South See “SEPA Creditor Identifier Number South” on page 1437.

SEPA Creditor Identifier Number West See “SEPA Creditor Identifier Number West” on page 1441.

Serbia Unique Master Citizen Number See “Serbia Unique Master Citizen Number” on page 1445.

Serbia Value Added Tax (VAT) Number See “Serbia Value Added Tax (VAT) Number” on page 1448.

Slovakia Driver's Licence Number See “Slovakia Driver's Licence Number” on page 1451.

Slovakia National Identification Number See “Slovakia National Identification Number” on page 1453.

Slovakia Passport Number See “Slovakia Passport Number” on page 1457.

Slovakia Value Added Tax (VAT) Number See “Slovakia Value Added Tax (VAT) Number”
on page 1459.

Slovenia Passport Number See “Slovenia Passport Number” on page 1461.

Slovenia Tax Identification Number See “Slovenia Tax Identification Number” on page 1463.

Slovenia Unique Master Citizen Number See “Slovenia Unique Master Citizen Number” on page 1465.

Slovenia Value Added Tax (VAT) Number See “Slovenia Value Added Tax (VAT) Number”
on page 1467.

Spain Driver's License Number See “Spain Driver's Licence Number” on page 1477.

Spain Value Added Tax (VAT) Number See “Spain Value Added Tax (VAT) Number” on page 1474.

Spanish Customer Account Number See “Spanish Customer Account Number” on page 1479.

Spanish DNI Identification Number See “Spanish DNI ID” on page 1481.

Spanish Passport Number See “Spanish Passport Number” on page 1483.

Spanish Social Security Number See “Spanish Social Security Number” on page 1485.

Spanish Tax Identification (CIF) See “Spanish Tax Identification (CIF)” on page 1487.

Sweden Driver's Licence Number See “Sweden Driver's Licence Number” on page 1492.

Sweden Personal Identification Number See “Sweden Personal Identification Number” on page 1501.

Sweden Tax Identification Number See “Sweden Tax Identification Number” on page 1494.

Sweden Value Added Tax (VAT) Number See “Sweden Value Added Tax (VAT) Number”
on page 1496.

Swedish Passport Number See “Swedish Passport Number” on page 1499.

Swiss AHV Number See “Swiss AHV Number” on page 1505.

Swiss Social Security Number (AHV) See “Swiss Social Security Number (AHV)” on page 1507.

Switzerland Health Insurance Card Number See “Switzerland Health Insurance Card Number”
on page 1509.

Switzerland Passport Number See “Switzerland Passport Number” on page 1511.

Switzerland Value Added Tax (VAT) Number See “Switzerland Value Added Tax (VAT) Number”
on page 1513.

Turkish Identification Number See “Turkish Identification Number” on page 1521.

UK Bank Account Number Sort Code See “UK Bank Account Number Sort Code” on page 1523.

UK Driver's Licence Number See “UK Drivers Licence Number” on page 1525.

UK Electoral Roll Number See “UK Electoral Roll Number” on page 1527.

UK Passport Number See “UK Passport Number” on page 1532.

UK National Health Service (NHS) Number See “UK National Health Service (NHS) Number”
on page 1528.

UK National Insurance Number See “UK National Insurance Number” on page 1530.

UK Tax ID Number See “UK Tax ID Number” on page 1534.

UK Value Added Tax (VAT) Number See “UK Value Added Tax (VAT) Number” on page 1536.

Ukraine Identity Card See “Ukraine Identity Card” on page 1539.

Ukraine Passport (Domestic) See “Ukraine Passport (Domestic)” on page 1541.

Ukraine Passport (International) See “Ukraine Passport (International)” on page 1543.

Table 31-5 lists system-defined data identifiers for the North American region.

Table 31-5 North American personal identity

Data identifier Description

Canada Driver's License Number See “Canada Driver's License Number” on page 1067.

Canada Passport Number See “Canada Passport Number” on page 1070.

Canada Permanent Residence (PR) Number See “Canada Permanent Residence (PR) Number”
on page 1072.

Canadian Social Insurance Number See “Canadian Social Insurance Number” on page 1074.

Driver's License Number – CA State See “Driver's License Number – CA State” on page 1133.

Driver's License Number – FL, MI, MN States See “Driver's License Number - FL, MI, MN States”
on page 1134.

Driver's License Number – IL State See “Driver's License Number - IL State” on page 1136.

Driver's License Number – NJ State See “Driver's License Number - NJ State” on page 1138.

Driver's License Number – NY State See “Driver's License Number - NY State” on page 1139.

Driver's License Number – WA State See “Driver's License Number - WA State” on page 1140.

Driver's License Number - WI State See “Driver's License Number - WI State” on page 1142.

Mexican Personal Registration and Identification Number See “Mexican Personal Registration and Identification Number” on page 1346.

Mexican Tax Identification Number See “Mexican Tax Identification Number” on page 1349.

Mexican Unique Population Registry Code (CURP) See “Mexican Unique Population Registry Code”
on page 1351.

Mexico CLABE Number See “Mexico CLABE Number” on page 1353.

Randomized US Social Security Number (SSN) See “Randomized US Social Security Number (SSN)”
on page 1414.

US Individual Tax ID Number (ITIN) See “US Individual Tax Identification Number (ITIN)”
on page 1546.

US Passport Number See “US Passport Number” on page 1548.

US Social Security Number (SSN) See “US Social Security Number (SSN)” on page 1550.
Note: This data identifier is replaced by the Randomized
US SSN data identifier.

US ZIP+4 Postal Codes See “US ZIP+4 Postal Codes” on page 1553.

Table 31-6 lists system-defined data identifiers for the South American region.

Table 31-6 South American personal identity

Data identifier Description

Argentina Tax Identification Number See “Argentina Tax Identification Number” on page 1015.

Brazilian Election Identification Number See “Brazilian Election Identification Number” on page 1049.

Brazilian National Registry of Legal Entities Number See “Brazilian National Registry of Legal Entities Number”
on page 1053.

Brazilian Natural Person Registry Number See “Brazilian Natural Person Registry Number (CPF)”
on page 1055.

Chilean National Identification Number See “Chilean National Identification Number” on page 1077.

Colombian Addresses See “Colombian Addresses” on page 1082.

Colombian Cell Phone Number See “Colombian Cell Phone Number” on page 1085.

Colombian Personal Identification Number See “Colombian Personal Identification Number” on page 1088.

Colombian Tax Identification Number See “Colombian Tax Identification Number” on page 1090.

Venezuela National Identification Number See “Venezuela National Identification Number” on page 1555.

Financial data identifiers

Table 31-7 lists system-defined data identifiers for detecting financial identification numbers,
such as credit card numbers and ABA routing numbers.

Table 31-7 Financial data identifiers

Data identifier Description

ABA Routing Number See “ABA Routing Number” on page 1013.

Credit Card Number See “Credit Card Number” on page 1095.

Credit Card Magnetic Stripe Data See “Credit Card Magnetic Stripe Data” on page 1092.

CUSIP Number See “CUSIP Number” on page 1106.

IBAN Central See “IBAN Central” on page 1227.

IBAN East See “IBAN East” on page 1231.

IBAN West See “IBAN West” on page 1237.

International Securities Identification Number See “International Securities Identification Number” on page 1259.

SWIFT Code See “SWIFT Code” on page 1503.

Healthcare data identifiers

Table 31-8 lists system-defined data identifiers for detecting U.S. and international drug codes,
and healthcare provider and consumer information.

Table 31-8 Healthcare

Data identifier Description

Australian Medicare Number See “Australian Medicare Number” on page 1024.

British Columbia Personal Healthcare Number See “British Columbia Personal Healthcare Number”
on page 1058.

Drug Enforcement Agency (DEA) Number See “Drug Enforcement Agency (DEA) Number”
on page 1145.

Healthcare Common Procedure Coding System (HCPCS CPT Code) See “Healthcare Common Procedure Coding System (HCPCS CPT Code)” on page 1208.

Health Insurance Claim Number See “Health Insurance Claim Number” on page 1212.

Medicare Beneficiary Identifier See “Medicare Beneficiary Identifier” on page 1344.

National Drug Code See “National Drug Code (NDC)” on page 1355.

National Provider Identifier Number See “National Provider Identifier Number” on page 1357.

Information technology data identifiers

Table 31-9 lists system-defined data identifiers for detecting information
technology-related patterns, such as IPv4 and IPv6 addresses, and mobile device identification
numbers.

Table 31-9 Information technology

Data identifier Description

International Mobile Equipment Identity Number See “International Mobile Equipment Identity Number”
on page 1257.

IP Address See “IP Address” on page 1261.

IPv6 Address See “IPv6 Address” on page 1263.

International keywords for PII data identifiers

Symantec Data Loss Prevention lets you modify system data identifiers and customize the
input keywords to detect a broad range of international content.
See “Extending and customizing data identifiers” on page 731.
See “Use custom keywords for system data identifiers” on page 869.

Extending and customizing data identifiers

You can customize data identifiers to suit your requirements. You can extend system-defined
data identifiers by modifying them, and you can create new data identifiers for custom data
matching.
The most common use case for modifying a system-defined data identifier is to edit the data
input for a validator that accepts data input. For example, if the data identifier implements the
"Find keywords" validator, you may want to add or remove values from the list of keywords.
Another use case may involve adding validators to or removing them from the data identifier, or
changing one or more of the patterns defined by the data identifier.
See “Cloning a system data identifier before modifying it” on page 777.
To create a custom data identifier, you implement one or more detection patterns, select one
or more data validators, provide the data input if a validator requires it, and choose a data
normalizer.
See “Custom data identifier configuration” on page 814.
Policy authors can reuse modified and custom data identifiers in one or more policies.
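The parts of a custom data identifier listed above (patterns, validators with optional data input, and a normalizer) can be modeled as a small data structure. This is a minimal Python sketch for illustration only: CustomDataIdentifier, keyword_validator, and the "EMP-" employee ID format are hypothetical names, and Python re syntax stands in for the more limited data identifier pattern language.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model, not a Symantec API.
# A validator receives (normalized match, full text) and returns True to keep the match.
Validator = Callable[[str, str], bool]


def keyword_validator(keywords: List[str]) -> Validator:
    """Validator that takes data input: keep a match only if the text contains a keyword."""
    lowered = [k.lower() for k in keywords]
    return lambda value, text: any(k in text.lower() for k in lowered)


@dataclass
class CustomDataIdentifier:
    name: str
    patterns: List[str]                                   # one or more detection patterns
    validators: List[Validator] = field(default_factory=list)
    normalizer: Callable[[str], str] = lambda s: s        # identity normalizer by default

    def find(self, text: str) -> List[str]:
        hits = []
        for pattern in self.patterns:
            for m in re.finditer(pattern, text):
                value = self.normalizer(m.group(0))
                # A match is reported only if every configured validator accepts it.
                if all(check(value, text) for check in self.validators):
                    hits.append(value)
        return hits


employee_id = CustomDataIdentifier(
    name="Employee ID (hypothetical)",
    patterns=[r"\bEMP-\d{6}\b"],
    validators=[keyword_validator(["employee"])],
    normalizer=lambda s: s.replace("-", ""),
)
```

With this sketch, employee_id.find("Employee record EMP-123456") reports one normalized match, while the same number in text without the keyword is not reported.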

About data identifier configuration

You can configure three types of data identifiers:
■ Instance – defined at the policy level
See “Configuring data identifier policy conditions” on page 734.
■ Modified – configured at the system level
See “Modifying system data identifiers” on page 776.
■ Custom – created at the system level
See “Creating custom data identifiers” on page 811.
The type of data identifier you implement depends on your business requirements. For most
use cases, configuring a policy instance using a non-modified, system-defined data identifier
is sufficient to accurately detect data loss. Should you need to, you can extend a system-defined
data identifier by modifying it, or you can implement one or more custom data identifiers to
detect unique data.
Data identifier configuration done at the policy instance level is specific to that policy.
Modifications you make to data identifiers at the system level apply to all data identifiers derived
from the modified data identifier.
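The difference in scope can be pictured as shared versus local state. In this hypothetical Python model (SystemDataIdentifier, PolicyInstance, and the default breadth of "wide" are assumptions, not product objects or defaults), every policy instance holds a reference to one system-level definition, while settings such as breadth live on the instance.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SystemDataIdentifier:
    """System-level definition shared by every policy that uses it (hypothetical model)."""
    name: str
    keywords: List[str]


@dataclass
class PolicyInstance:
    """Policy-level instance: a reference to the shared definition plus local settings."""
    policy: str
    identifier: SystemDataIdentifier
    breadth: str = "wide"                                  # assumed default, for illustration
    optional_validators: Set[str] = field(default_factory=set)


ssn = SystemDataIdentifier("Randomized US SSN", ["ssn", "social security number"])
hr_policy = PolicyInstance("HR policy", ssn)
finance_policy = PolicyInstance("Finance policy", ssn)

# Instance-level change: affects only the HR policy.
hr_policy.breadth = "medium"

# System-level change: visible through every policy instance that references it.
ssn.keywords.append("ss#")
```

After these two changes, only the HR policy uses the medium breadth, but both policies see the new keyword.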

About data identifier breadths

System data identifiers are implemented by breadth. The breadth defines the scope of detection
for that data identifier. Each data identifier implements at least one breadth of detection. The
widest option available for the data identifier is likely to produce the most false positive matches;
the narrowest option produces the fewest. Generally the validators, and often the patterns, differ
among breadths.
See “Using data identifier breadths” on page 738.
For example, the Driver's License Number – CA State data identifier provides wide and medium
breadths, with the medium breadth using a keyword validator.
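The following Python sketch illustrates how a wide and a medium breadth can share a pattern while the medium breadth adds a keyword validator. Both the pattern (one letter followed by seven digits) and the keyword list are assumptions for illustration; they are not the product's actual Driver's License Number – CA State definition.

```python
import re
from typing import List

# Assumed format for illustration only: one letter followed by seven digits.
CA_DL = re.compile(r"\b[A-Za-z]\d{7}\b")
# Hypothetical keyword list for the medium breadth's keyword validator.
KEYWORDS = ("driver's license", "driver license", "dl#")


def detect(text: str, breadth: str = "wide") -> List[str]:
    matches = CA_DL.findall(text)
    if breadth == "wide":
        return matches  # pattern only: broadest scope, most false positives
    if breadth == "medium":
        lowered = text.lower()
        # Same pattern, narrowed by a keyword validator.
        return matches if any(k in lowered for k in KEYWORDS) else []
    raise ValueError(f"unsupported breadth: {breadth}")
```

An order number such as B1234567 matches at the wide breadth but not at the medium breadth unless a keyword such as "driver's license" appears in the text.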

Note: Not all system data identifiers provide each breadth of detection. Refer to the complete
list of data identifiers and breadths to determine what is available.
See “Selecting a data identifier breadth” on page 739.

About optional validators for data identifiers

Optional validators help you refine the scope of detection for a data identifier. When you
configure a data identifier instance, you can select among five optional validators.
See “Using optional validators” on page 762.
The type of characters accepted by each optional validator depends on the data identifier.
See “Acceptable characters for optional validators” on page 764.

Note: Optional validators only apply to the policy instance you are actively configuring; they
do not apply system-wide.
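Conceptually, an optional validator is an extra filter applied to the matches of one policy instance. This Python sketch models two such filters, an exact-match exclusion and a required beginning, with hypothetical parameter names (exclude_exact, require_prefixes); it is not the product's implementation or its exact list of optional validators.

```python
from typing import Iterable, List


def apply_optional_validators(
    matches: Iterable[str],
    exclude_exact: Iterable[str] = (),
    require_prefixes: Iterable[str] = (),
) -> List[str]:
    """Filter matches using instance-level options (hypothetical names)."""
    excluded = set(exclude_exact)
    prefixes = tuple(require_prefixes)
    kept = []
    for m in matches:
        if m in excluded:
            continue  # drop known test or sample values
        if prefixes and not m.startswith(prefixes):
            continue  # keep only matches that begin with a required prefix
        kept.append(m)
    return kept
```

Because these options are passed per call rather than stored on the data identifier itself, they mirror the rule that optional validators apply only to the policy instance being configured.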

About data identifier patterns

Data identifiers implement patterns to match data. The data identifier pattern syntax is similar
to the regular expression language, but more limited. For example, the data identifier pattern
syntax does not support some regular expression features, including grouping, lookahead and
lookbehind expressions, and many special characters (notably the dot "." character). In addition,
the system only allows the use of ASCII characters for data identifier patterns.
See “Using the data identifier pattern language” on page 814.
When you edit a system data identifier, the system exposes the pattern for viewing and editing.
The system-defined data identifier patterns have been tuned and optimized for precise content
matching.
See “Selecting a data identifier breadth” on page 739.
You can also create a custom data identifier, in which case you must implement
at least one pattern. The best way to understand how to write patterns is to examine the
system-defined data identifier patterns.
See “Writing data identifier patterns to match data” on page 817.

The data identifier pattern language is a subset of the regular expression language.
See “Data identifier pattern language specification” on page 815.
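Before you paste a regular expression into a data identifier, a rough pre-check can flag the features this section names as unsupported. This Python sketch checks only those restrictions (non-ASCII characters, grouping, lookahead and lookbehind, and an unescaped dot); the full list of unsupported special characters is in the specification referenced above, and the function name is hypothetical.

```python
import re
from typing import List

# Only the restrictions named in this section; the real list is longer.
# Simplification: escaped backslashes such as "\\(" are not handled.
CHECKS = [
    ("lookahead/lookbehind", re.compile(r"\(\?<?[=!]")),
    ("grouping", re.compile(r"(?<!\\)\((?!\?<?[=!])")),
    ("dot", re.compile(r"(?<!\\)\.")),
]


def unsupported_features(pattern: str) -> List[str]:
    problems = []
    if not pattern.isascii():
        problems.append("non-ASCII character")
    for name, rx in CHECKS:
        if rx.search(pattern):
            problems.append(name)
    return problems
```

For example, a pattern like \d{3}-\d{2}-\d{4} passes, while (\d{3})-\d{4} is flagged for grouping.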

About pattern validators

Pattern validators are validation checks applied to data matched by a data identifier pattern.
Validators help refine the scope of detection and reduce false positives. Many validators allow
for data input. For example, the Keyword validator lets you enter a list of keywords.
See “Using pattern validators” on page 818.
When you modify a data identifier, you can edit the input values for any validator that accepts
data.
See “Editing pattern validator input” on page 778.
When you modify a data identifier, you can add and remove pattern validators. When you
create custom data identifiers, you can configure one or more validators. The system also
provides you with the ability to author a custom script validator to define your own validation
check.
See “Selecting pattern validators” on page 829.
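Validators act as a chain of checks: a match survives only if every configured validator accepts it. This Python sketch uses two illustrative checks, a keyword validator that takes data input and a Luhn (mod 10) checksum as an example of a custom validation check. The document does not say which algorithm any particular system validator uses, and all function names here are hypothetical.

```python
import re
from typing import Callable, List

Validator = Callable[[str, str], bool]  # (match, full text) -> keep the match?


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) checksum, used here only as an example validation check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def checksum_validator(match: str, text: str) -> bool:
    digits = re.sub(r"\D", "", match)
    return bool(digits) and luhn_valid(digits)


def keyword_validator(keywords: List[str]) -> Validator:
    """A validator that takes data input: a list of keywords."""
    lowered = [k.lower() for k in keywords]
    return lambda match, text: any(k in text.lower() for k in lowered)


def validate(matches: List[str], text: str, validators: List[Validator]) -> List[str]:
    """Keep only the matches that pass every configured validator."""
    return [m for m in matches if all(v(m, text) for v in validators)]
```

Adding validators can only narrow the result: with no validators every match is kept, and each added check removes matches that fail it.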

About data normalizers

A data normalizer reconciles the data detected by the data identifier pattern with the format
expected by the normalizer. You cannot modify the normalizer of a system-defined data
identifier. When you create a custom data identifier, you select a data normalizer.
See “Acceptable characters for optional validators” on page 764.
See “Selecting a data normalizer” on page 830.
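As an illustration of what normalization can mean, the following Python sketch shows two hypothetical normalizers that strip separators (and, for alphanumeric data, fold case) so that differently formatted matches compare as the same value. These are assumptions for illustration, not the product's built-in normalizers.

```python
import re


def digits_only(raw: str) -> str:
    """Hypothetical normalizer: drop every non-digit so formatting differences disappear."""
    return re.sub(r"\D", "", raw)


def alphanumeric_upper(raw: str) -> str:
    """Hypothetical normalizer for mixed letters and digits: drop separators, fold case."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()
```

With digits_only, the inputs "123-45-6789", "123 45 6789", and "123456789" all normalize to the same string.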

About cross-component matching

Data identifiers support component matching. This means that you can configure data identifiers
to match on one or more message components. However, if the data identifier implements a
validator (optional or required), such as Find keywords, the validated data and the matched
data must exist in the same component to trigger or except an incident.
See “Detection messages and message components” on page 391.
For example, consider a scenario where you implement the Randomized US Social Security
Number (SSN) data identifier. This data identifier detects various 9-digit patterns and uses
a keyword validator to narrow the scope of detection. (The keywords and phrases in the list are
"social security number", "ssn", and "ss#".) If the detection engine receives a message with the
number pattern 123-45-6789 and the keyword "social security number", and both data items are
contained in the message attachment component, the detection engine reports a match.
However, if the attachment contains the number but the body contains the keyword,
the detection engine does not consider this to be a match.
See “Configuring the Content Matches data identifier condition” on page 737.
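The same-component rule in this example can be sketched in Python by evaluating each message component independently. The SSN pattern below is a simplified stand-in (it is not the Randomized US SSN pattern), the keyword check is a plain substring test, and the component names "body" and "attachment" are illustrative.

```python
import re
from typing import Dict, List

# Simplified stand-ins for the data identifier's pattern and keyword list.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
KEYWORDS = ("social security number", "ssn", "ss#")


def matches_by_component(message: Dict[str, str]) -> Dict[str, List[str]]:
    """Report pattern matches only in components that also contain a keyword."""
    result: Dict[str, List[str]] = {}
    for component, text in message.items():
        lowered = text.lower()
        if not any(k in lowered for k in KEYWORDS):
            continue  # the keyword must be in the same component as the number
        hits = SSN.findall(text)
        if hits:
            result[component] = hits
    return result
```

A message whose attachment holds both the number and the keyword produces a match; a message whose body holds the keyword while the attachment holds the number produces none.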

About unique match counting

Data identifiers, keywords, and regular expressions support unique match counting. This
feature lets you count only those pattern matches that are unique.
Unique match counting is useful when you are only concerned with detecting the presence of
unique patterns and not with detecting every matched pattern. For example, you could use
unique match counting to trigger an incident if a document contains 10 or more unique social
security numbers. In this case, if a document contained 10 instances of the same social security
number, the policy would not trigger an incident.
See “Using unique match counting” on page 775.
See “Configuring unique match counting” on page 775.
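The difference between counting every match and counting unique matches can be sketched in a few lines of Python. The pattern is a simplified SSN stand-in and the threshold of 10 mirrors the example above; the function names are hypothetical.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # simplified stand-in pattern


def count_matches(text: str, unique: bool = False) -> int:
    matches = SSN.findall(text)
    # Unique counting collapses repeated occurrences of the same value.
    return len(set(matches)) if unique else len(matches)


def triggers(text: str, threshold: int = 10, unique: bool = True) -> bool:
    return count_matches(text, unique) >= threshold
```

A document that repeats one number ten times reaches the threshold only when unique counting is off; a document with ten different numbers reaches it either way.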

Configuring data identifier policy conditions

Table 31-10 lists and describes the configuration options for data identifier conditions.
See “Introducing data identifiers” on page 717.
See “Configuring the Content Matches data identifier condition” on page 737.

Table 31-10 Policy instance data identifier configuration

Selectable at the policy level:
■ Breadth – You can implement any breadth the data identifier supports at the instance level.
■ Optional Validators – You can select one or more optional validators at the instance level.
Not configurable:
■ Patterns – You cannot modify the match patterns at the instance level.
■ Mandatory Validators – You cannot modify, add, or remove required validators at the instance level.

Workflow for configuring data identifier policies

Table 31-11 describes the workflow for implementing system-defined data identifiers.

Table 31-11 Workflow for implementing data identifiers

Step Action Description

1 Decide the type of data identifier you want to implement. See “Introducing data identifiers” on page 717.

2 Decide the data identifier breadth. See “About data identifier breadths” on page 731.

3 Configure the data identifier. See “Configuring the Content Matches data identifier condition” on page 737.

4 Test and tune the data identifier policy. See “Best practices for using data identifiers” on page 833.

Managing and adding data identifiers

The Manage > Policies > Data Identifiers screen lists all data identifiers, including system-
and custom-defined. From this screen you manage and modify existing data identifiers, and
add new ones.
See “Introducing data identifiers” on page 717.

Table 31-12 Manage data identifiers

■ Edit a data identifier
Select the data identifier from the list to modify it.
See “Selecting a data identifier breadth” on page 739.
See “Extending and customizing data identifiers” on page 731.
See “Editing data identifiers” on page 736.

■ Define a custom data identifier
Click Add data identifier to create a custom data identifier.
See “Custom data identifier configuration” on page 814.
See “Workflow for creating custom data identifiers” on page 812.

■ Sort and view data identifiers
The list is sorted alphabetically by Name. You can also sort by Category.
A pencil icon to the left of a data identifier indicates that it has been modified from its
original state or that it is a custom data identifier.

■ Remove a data identifier
Click the X icon on the right side to delete a data identifier.
The system does not let you delete system data identifiers; you can delete only custom data
identifiers.
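The listing behavior in Table 31-12 (sorting by name or category, the pencil-icon indicator, and the restriction on deleting system data identifiers) can be sketched as a small Python model. This is a hypothetical illustration, not the product's implementation. The category names are illustrative, and the case-insensitive sort order is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataIdentifierEntry:
    name: str
    category: str
    is_system: bool
    is_modified: bool = False

    @property
    def shows_pencil_icon(self) -> bool:
        # Shown for a modified system identifier or for any custom identifier.
        return self.is_modified or not self.is_system


class DataIdentifierList:
    def __init__(self) -> None:
        self._entries: Dict[str, DataIdentifierEntry] = {}

    def add(self, entry: DataIdentifierEntry) -> None:
        self._entries[entry.name] = entry

    def sorted_by(self, key: str = "name") -> List[DataIdentifierEntry]:
        # Default order is by name; category is the alternative sort key.
        if key not in ("name", "category"):
            raise ValueError("sort key must be 'name' or 'category'")
        # Case-insensitive ordering (an assumption), with name as a tie-breaker.
        return sorted(
            self._entries.values(),
            key=lambda e: (getattr(e, key).casefold(), e.name.casefold()),
        )

    def delete(self, name: str) -> None:
        # Only custom data identifiers can be deleted.
        if self._entries[name].is_system:
            raise PermissionError(f"cannot delete system data identifier {name!r}")
        del self._entries[name]


ids = DataIdentifierList()
ids.add(DataIdentifierEntry("Credit Card Number", "Financial", is_system=True))
ids.add(DataIdentifierEntry("CUSIP Number", "Financial", is_system=True, is_modified=True))
ids.add(DataIdentifierEntry("Badge ID", "Custom", is_system=False))

print([e.name for e in ids.sorted_by("name")])
# ['Badge ID', 'Credit Card Number', 'CUSIP Number']
print([e.name for e in ids.sorted_by() if e.shows_pencil_icon])
# ['Badge ID', 'CUSIP Number']
```

Calling `ids.delete("Badge ID")` succeeds, while `ids.delete("Credit Card Number")` raises `PermissionError`.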

Editing data identifiers

You can modify system-defined data identifiers, including their patterns, validators, and
validator input. Modifications propagate to every policy that declares the data identifier. You
cannot rename a system data identifier. Consider cloning a system data identifier before you
modify it so that its original state is preserved.
See “Extending and customizing data identifiers” on page 731.
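Why cloning matters can be seen in a small model in which policies hold a reference to a shared data identifier definition. This Python sketch is a simplification under that assumption. The class names, the `clone` helper, and the validator labels are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataIdentifier:
    name: str
    patterns: List[str]
    validators: List[str] = field(default_factory=list)


@dataclass
class Policy:
    name: str
    identifier: DataIdentifier  # the policy holds a reference, not a copy


def clone(identifier: DataIdentifier, new_name: str) -> DataIdentifier:
    """Make an independent copy that can be edited without touching the original."""
    cloned = copy.deepcopy(identifier)
    cloned.name = new_name
    return cloned


system_id = DataIdentifier("Example ID", patterns=[r"\d{9}"], validators=["checksum"])
policy_a = Policy("Policy A", system_id)
policy_b = Policy("Policy B", system_id)

# Clone first, then edit the clone: the original and its policies are unchanged.
custom_id = clone(system_id, "Example ID (custom)")
custom_id.validators.clear()
print(policy_a.identifier.validators)  # ['checksum']

# Edit the system definition in place: the change reaches every referencing policy.
system_id.validators.append("require_keyword")
print(policy_b.identifier.validators)  # ['checksum', 'require_keyword']
```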

Note: When you export a policy template, the system does not export the data identifier
definitions themselves; it exports only a reference to each system data identifier. The target
system into which the template is imported supplies the actual data identifier definition. As a
result, modifications that you make to a system-defined data identifier are not included in the
exported template.
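A template that carries only a reference behaves as in the following sketch. The JSON layout, the two registries, and the function names are hypothetical and do not represent the actual policy template format; the point is only that the importing system resolves the name against its own definitions.

```python
import json
from typing import Dict, List

Registry = Dict[str, List[str]]  # data identifier name -> its patterns

# Each system keeps its own definitions. The source system has a local edit.
source_system: Registry = {"Example ID": ["pattern-1", "pattern-2 (local edit)"]}
target_system: Registry = {"Example ID": ["pattern-1"]}


def export_template(policy: str, identifier: str, registry: Registry) -> str:
    """Write only the identifier's name, never its patterns."""
    if identifier not in registry:
        raise KeyError(identifier)
    return json.dumps({"policy": policy, "data_identifier": identifier})


def import_template(template: str, registry: Registry) -> List[str]:
    """Resolve the reference against the importing system's own definition."""
    name = json.loads(template)["data_identifier"]
    return registry[name]


template = export_template("Example policy", "Example ID", source_system)
print(template)  # {"policy": "Example policy", "data_identifier": "Example ID"}
print(import_template(template, target_system))  # ['pattern-1']
```

The local edit on the source system never reaches the target system, because only the name travels in the template.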

Table 31-13 Workflow for editing data identifiers

Step 1: Clone the system data identifier you want to modify.
Clone the system data identifier before you modify it.
See “Cloning a system data identifier before modifying it” on page 777.
See “Clone system-defined data identifiers before modifying to preserve original state” on
page 835.

Step 2: Edit the cloned data identifier.
If you are modifying a system data identifier, click the plus sign to display the breadth, and
then edit the data identifier.
See “Selecting a data identifier breadth” on page 739.

Step 3: Edit one or more Patterns.
You can modify any pattern that the data identifier provides.
See “Writing data identifier patterns to match data” on page 817.

Step 4: Edit the data input for any validator that accepts input.
See “Editing pattern validator input” on page 778.
See “List of pattern validators that accept input data” on page 778.

Step 5: Optionally, add or remove Validators as necessary.
See “Selecting pattern validators” on page 829.

Step 6: Save the data identifier.
Click Save to save the modifications.
After the data identifier is saved, its icon on the Data Identifiers screen indicates that it is
modified from its original state or is custom.
See “Managing and adding data identifiers” on page 735.
Note: Click Cancel to discard your changes without saving the data identifier.

Step 7: Implement the data identifier in a policy rule or exception.
See “Configuring the Content Matches data identifier condition” on page 737.

Configuring the Content Matches data identifier condition

You can configure the Content Matches data identifier condition in policy detection rules and
exceptions.
See “Introducing data identifiers” on page 717.

Table 31-14 Configuring the Content Matches data identifier condition

Step 1: Add a data identifier rule or exception to a policy, or configure an existing one.
Select the Content Matches data identifier condition on the Add Detection Rule or Add
Exception screen.
See “Adding a rule to a policy” on page 415.
See “Adding an exception to a policy” on page 424.

Step 2: Choose a data identifier.
Choose a data identifier from the list and click Next.
See “System-defined data identifiers” on page 718.

Step 3: Select a Breadth of detection.
Use the breadth option to narrow the scope of detection.
See “About data identifier breadths” on page 731.
Wide is the default setting and detects the broadest set of matches. Medium and narrow
breadths, if available, check additional criteria and detect fewer matches.
See “Selecting a data identifier breadth” on page 739.

Step 4: Select and configure one or more Optional Validators.
Optional validators restrict the match criteria and reduce false positives.
See “About optional validators for data identifiers” on page 732.

Step 5: Configure Match Counting.
Select how you want to count matches:
■ Check for existence
Do not count multiple matches; report a match count of 1 for one or more matches.
■ Count all matches
Count each match; specify the minimum number of matches required to report an incident.
See “Configuring match counting” on page 421.
■ Count all unique matches
This is the default setting.
See “About unique match counting” on page 734.
See “Configuring unique match counting” on page 775.

Step 6: Configure the message components to Match On.
Select one or more message components on which to match.
On the endpoint, the detection engine matches the entire message, not individual components.
See “Selecting components to match on” on page 423.
If the data identifier uses optional or required keyword validators, the keyword must be
present in the same component as the matched data identifier content.
See “About cross-component matching” on page 733.

Step 7: Configure additional conditions to Also Match.
Optionally, click Add to add one or more conditions from the Also Match condition list.
All conditions in a compound rule or exception must match to trigger or except an incident.
See “Configuring compound match conditions” on page 429.
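Steps 6 and 7 describe two matching rules: a keyword that a data identifier requires must appear in the same message component as the matched content, and every condition in a compound rule must match. The following Python sketch illustrates both under simplifying assumptions. The message model, the nine-digit pattern, the `routing` keyword, the attachment condition, and the treatment of an endpoint message as one concatenated unit are all hypothetical.

```python
import re
from typing import Callable, Dict, List

Message = Dict[str, str]  # component name -> text (hypothetical model)
Condition = Callable[[Message], bool]

PATTERN = re.compile(r"\b\d{9}\b")  # hypothetical data identifier pattern
KEYWORD = "routing"                 # hypothetical required keyword


def component_matches(text: str) -> bool:
    """The pattern and the keyword must both occur in this one component."""
    return bool(PATTERN.search(text)) and KEYWORD in text.lower()


def data_identifier_condition(msg: Message, match_on: List[str], endpoint: bool = False) -> bool:
    if endpoint:
        # Endpoint: the entire message is evaluated as a single unit.
        return component_matches(" ".join(msg.values()))
    # Otherwise each selected component is evaluated on its own.
    return any(component_matches(msg[name]) for name in match_on if name in msg)


def has_attachment(msg: Message) -> bool:
    """A second, hypothetical condition for the compound rule."""
    return "attachment" in msg


def compound_rule(msg: Message, conditions: List[Condition]) -> bool:
    """All conditions in a compound rule must match (logical AND)."""
    return all(condition(msg) for condition in conditions)


split = {"subject": "Routing details", "body": "Number: 123456789"}
print(data_identifier_condition(split, ["subject", "body"]))                 # False
print(data_identifier_condition(split, ["subject", "body"], endpoint=True))  # True

rule: List[Condition] = [lambda m: data_identifier_condition(m, ["body"]), has_attachment]
together = {"body": "Routing number 123456789", "attachment": "statement.pdf"}
print(compound_rule(together, rule))  # True
print(compound_rule(split, rule))     # False
```

In the `split` message the keyword sits in the subject and the number in the body, so no single component satisfies the data identifier unless the whole message is evaluated together.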

Using data identifier breadths

Each system data identifier provides one or more breadths of detection. When you configure
a system data identifier instance, or when you modify a system data identifier, you select which
breadth to implement. Not all breadth options are available for each data identifier.
See “About data identifier breadths” on page 731.

Table 31-15 Available rule breadths for system data identifiers

■ Wide
The wide breadth defines one or more patterns that create the greatest number of matches. In
general, this breadth produces a higher rate of false positives than the medium and narrow
breadths.
■ Medium
The medium breadth may refine the detection patterns, add one or more data validators, or
both, to limit the number of matches.
■ Narrow
The narrow breadth offers the tightest patterns and strictest validation to provide the most
accurate positive matches. In general, this breadth requires the presence of a keyword or other
validating restriction to trigger a match.
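To make the three breadths concrete, the following Python sketch applies progressively stricter checks to nine-digit numbers. It is a hypothetical illustration, not the product's definition of any data identifier. The wide breadth uses only a pattern, the medium breadth adds the publicly known ABA routing-number checksum (digit weights 3, 7, 1) as a validator, and the narrow breadth also requires the keyword `routing`.

```python
import re
from typing import List

NINE_DIGITS = re.compile(r"\b\d{9}\b")


def aba_checksum_ok(digits: str) -> bool:
    """ABA routing number check: the 3-7-1 weighted digit sum is divisible by 10."""
    weights = (3, 7, 1) * 3
    return sum(w * int(d) for w, d in zip(weights, digits)) % 10 == 0


def find_matches(text: str, breadth: str) -> List[str]:
    candidates = NINE_DIGITS.findall(text)
    if breadth == "Wide":
        return candidates  # pattern only: the most matches
    validated = [c for c in candidates if aba_checksum_ok(c)]
    if breadth == "Medium":
        return validated  # pattern plus a validator
    if breadth == "Narrow":
        # pattern, validator, and a required keyword
        return validated if "routing" in text.lower() else []
    raise ValueError(f"unknown breadth: {breadth!r}")


text = "Order 123456789 shipped; wire to 021000021."
print(find_matches(text, "Wide"))    # ['123456789', '021000021']
print(find_matches(text, "Medium"))  # ['021000021']
print(find_matches(text, "Narrow"))  # []
print(find_matches("Routing: 021000021", "Narrow"))  # ['021000021']
```

Each step down in breadth removes candidates, which is the trade-off the table describes: fewer false positives at the cost of possibly missing some true matches.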

Selecting a data identifier breadth

You cannot change the normalizer that a system data identifier implements. Knowing which
normalizer a data identifier uses is helpful when you implement one or more optional validators.
See “Acceptable characters for optional validators” on page 764.
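The normalizer names in Table 31-16 suggest how matched text is canonicalized before validation. The following Python sketch shows one plausible interpretation. The exact behavior of each normalizer, the ASCII-only handling, and the idea that validator input is compared against normalized text are assumptions for illustration, not documented product behavior.

```python
import re


def normalize(value: str, normalizer: str) -> str:
    """One plausible (assumed) reading of each normalizer name; ASCII only."""
    if normalizer == "Digits":
        return re.sub(r"[^0-9]", "", value)        # keep digits only
    if normalizer == "Digits and Letters":
        return re.sub(r"[^0-9A-Za-z]", "", value)  # keep digits and letters
    if normalizer == "Lowercase":
        return value.lower()                       # fold case
    if normalizer == "Do nothing":
        return value                               # leave text unchanged
    raise ValueError(f"unknown normalizer: {normalizer!r}")


# A hypothetical exclusion entered with dashes still matches a value written
# with spaces, because both sides pass through the same "Digits" normalizer.
excluded = {normalize("123-45-6789", "Digits")}
print(normalize("123 45 6789", "Digits") in excluded)     # True
print(normalize("DE 123-456-789", "Digits and Letters"))  # DE123456789
```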

Table 31-16 System data identifier breadths and normalizers

Data identifier | Breadth(s) | Normalizer | See
ABA Routing Number | Wide, Medium, Narrow | Digits | “ABA Routing Number” on page 1013
Argentina Tax Identification Number | Wide, Medium, Narrow | Digits | “Argentina Tax Identification Number” on page 1015
Australia Driver's License Number | Wide, Narrow | Digits and Letters | “Australia Driver's License Number” on page 1018
Australian Business Number | Wide, Medium, Narrow | Digits | “Australian Business Number wide breadth” on page 1020
Australian Company Number | Wide, Medium, Narrow | Digits | “Australian Company Number” on page 1022
Australian Medicare Number | Wide, Medium, Narrow | Digits | “Australian Medicare Number” on page 1024
Australian Passport Number | Wide, Narrow | Lowercase | “Australian Passport Number” on page 1027
Australian Tax File Number | Wide, Medium, Narrow | Digits | “Australian Tax File Number” on page 1029
Austria Passport Number | Wide, Narrow | Digits and Letters | “Austria Passport Number” on page 1030
Austria Tax Identification Number | Wide, Narrow | Digits | “Austria Tax Identification Number” on page 1031
Austria Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Austria Value Added Tax (VAT) Number” on page 1033
Austrian Social Security Number | Wide, Medium, Narrow | Digits | “Austrian Social Security Number” on page 1036
Belgian National Number | Wide, Medium, Narrow | Digits | “Belgian National Number” on page 1039
Belgium Driver's License Number | Wide, Narrow | Digits | “Belgium Driver's Licence Number” on page 1042
Belgium Passport Number | Wide, Narrow | Digits and Letters | “Belgium Passport Number” on page 1044
Belgium Tax Identification Number | Wide, Narrow | Digits | “Belgium Tax Identification Number” on page 1045
Belgium Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Belgium Value Added Tax (VAT) Number” on page 1047
Brazilian Election Identification Number | Wide, Medium, Narrow | Digits | “Brazilian Election Identification Number” on page 1049
Brazilian National Registry of Legal Entities Number | Wide, Medium, Narrow | Digits | “Brazilian National Registry of Legal Entities Number” on page 1053
Brazilian Natural Person Registry Number | Wide, Medium, Narrow | Digits | “Brazilian Natural Person Registry Number (CPF)” on page 1055
British Columbia Personal Healthcare Number | Wide, Medium, Narrow | Digits | “British Columbia Personal Healthcare Number” on page 1058
Bulgaria Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Bulgaria Value Added Tax (VAT) Number” on page 1060
Bulgarian Uniform Civil Number - EGN | Wide, Medium, Narrow | Digits | “Bulgarian Uniform Civil Number - EGN” on page 1063
Burgerservicenummer | Wide, Narrow | Digits | “Burgerservicenummer” on page 1066
Canada Driver's License Number | Wide, Medium, Narrow | Digits and Letters | “Canada Driver's License Number” on page 1067
Canada Passport Number | Wide, Narrow | Digits and Letters | “Canada Passport Number” on page 1070
Canada Permanent Residence (PR) Number | Wide, Narrow | Digits and Letters | “Canada Permanent Residence (PR) Number” on page 1072
Canadian Social Insurance Number | Wide, Medium, Narrow | Digits | “Canadian Social Insurance Number” on page 1074
Chilean National Identification Number | Wide, Medium, Narrow | Digits and Letters | “Chilean National Identification Number” on page 1077
China Passport Number | Wide, Narrow | Digits and Letters | “China Passport Number” on page 1079
Codice Fiscale | Wide, Narrow | Digits and Letters | “Codice Fiscale” on page 1081
Colombian Addresses | Wide, Narrow | Lowercase | “Colombian Addresses” on page 1082
Colombian Cell Phone Number | Wide, Narrow | Digits | “Colombian Cell Phone Number” on page 1085
Colombian Personal Identification Number | Wide, Narrow | Digits | “Colombian Personal Identification Number” on page 1088
Colombian Tax Identification Number | Wide, Narrow | Digits | “Colombian Tax Identification Number” on page 1090
Credit Card Magnetic Stripe Data | Medium | Digits | “Credit Card Magnetic Stripe Data” on page 1092
Credit Card Number | Wide, Medium, Narrow | Digits | “Credit Card Number” on page 1095
Croatia National Identification Number | Wide, Medium, Narrow | Digits and Letters | “Croatia National Identification Number” on page 1104
CUSIP Number | Wide, Medium, Narrow | Lowercase | “CUSIP Number” on page 1106
Cyprus Tax Identification Number | Wide, Medium, Narrow | Digits and Letters | “Cyprus Tax Identification Number” on page 1109
Cyprus Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Cyprus Value Added Tax (VAT) Number” on page 1111
Czech Republic Driver's Licence Number | Wide, Narrow | Digits and Letters | “Czech Republic Driver's Licence Number” on page 1112
Czech Republic Personal Identification Number | Wide, Medium, Narrow | Digits | “Czech Republic Personal Identification Number” on page 1114
Czech Republic Tax Identification Number | Wide, Medium, Narrow | Digits | “Czech Republic Tax Identification Number” on page 1117
Czech Republic Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Czech Republic Value Added Tax (VAT) Number” on page 1121
Denmark Personal Identification Number | Wide, Medium, Narrow | Digits and Letters | “Denmark Personal Identification Number” on page 1126
Denmark Tax Identification Number | Wide, Medium, Narrow | Digits | “Denmark Tax Identification Number” on page 1128
Denmark Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Denmark Value Added Tax (VAT) Number” on page 1130
Driver's License Number – CA State | Wide, Medium | Lowercase | “Driver's License Number – CA State” on page 1133
Driver's License Number – FL, MI, MN States | Wide, Medium | Lowercase | “Driver's License Number - FL, MI, MN States” on page 1134
Driver's License Number – IL State | Wide, Medium | Lowercase | “Driver's License Number - IL State” on page 1136
Driver's License Number – NJ State | Wide, Medium | Lowercase | “Driver's License Number - NJ State” on page 1138
Driver's License Number – NY State | Wide, Medium | Lowercase | “Driver's License Number - NY State” on page 1139
Driver's License Number – WA State | Wide, Medium, Narrow | Lowercase | “Driver's License Number - WA State” on page 1140
Driver's License Number – WI State | Wide, Medium, Narrow | Digits and Letters | “Driver's License Number - WI State” on page 1142
Drug Enforcement Agency (DEA) Number | Wide, Medium, Narrow | Lowercase | “Drug Enforcement Agency (DEA) Number” on page 1145
Estonia Driver's Licence Number | Wide, Narrow | Digits and Letters | “Estonia Driver's Licence Number” on page 1147
Estonia Passport Number | Wide, Narrow | Digits and Letters | “Estonia Passport Number” on page 1149
Estonia Personal Identification Code | Wide, Medium, Narrow | Digits | “Estonia Personal Identification Code” on page 1151
Estonia Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Estonia Value Added Tax (VAT) Number” on page 1153
European Health Insurance Card Number | Wide, Narrow | Digits | “European Health Insurance Card Number” on page 1156
Finland Driver's Licence Number | Wide, Medium, Narrow | Digits and Letters | “Finland Driver's Licence Number” on page 1165
Finland European Health Insurance Number | Wide, Narrow | Digits | “Finland European Health Insurance Number” on page 1167
Finland Passport Number | Wide, Narrow | Digits and Letters | “Finland Passport Number” on page 1169
Finland Tax Identification Number | Wide, Medium, Narrow | Do nothing | “Finland Tax Identification Number” on page 1171
Finland Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Finland Value Added Tax (VAT) Number” on page 1173
Finnish Personal Identification Number | Wide, Medium, Narrow | Lowercase | “Finnish Personal Identification Number” on page 1175
France Driver's License Number | Wide, Narrow | Digits | “France Driver's License Number” on page 1177
France Health Insurance Number | Wide, Narrow | Digits | “France Health Insurance Number” on page 1179
France Tax Identification Number | Wide, Narrow | Digits | “France Tax Identification Number” on page 1181
France Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “France Value Added Tax (VAT) Number” on page 1182
French INSEE Code | Wide, Narrow | Digits | “French INSEE Code” on page 1185
French Passport Number | Wide, Narrow | Digits and Letters | “French Passport Number” on page 1187
French Social Security Number | Wide, Medium, Narrow | Digits and Letters | “French Social Security Number” on page 1188
German Passport Number | Wide, Medium, Narrow | Lowercase | “German Passport Number” on page 1190
German Personal ID Number | Wide, Medium, Narrow | Lowercase | “German Personal ID Number” on page 1192
Germany Driver's License Number | Wide, Narrow | Digits and Letters | “Germany Driver's License Number” on page 1194
Germany Tax Identification Number | Wide, Medium, Narrow | Digits | “Germany Tax Identification Number” on page 1198
Germany Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Germany Value Added Tax (VAT) Number” on page 1196
Greece Passport Number | Wide, Narrow | Digits and Letters | “Greece Passport Number” on page 1200
Greece Social Security Number (AMKA) | Wide, Medium, Narrow | Digits | “Greece Social Security Number (AMKA)” on page 1202
Greece Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Greece Value Added Tax (VAT) Number” on page 1206
Greek Tax Identification Number | Wide, Medium, Narrow | Digits | “Greek Tax Identification Number” on page 1204
Healthcare Common Procedure Coding System (HCPCS CPT Code) | Medium, Narrow | Digits and Letters | “Healthcare Common Procedure Coding System (HCPCS CPT Code)” on page 1208
Health Insurance Claim Number | Wide, Medium, Narrow | Digits and Letters | “Health Insurance Claim Number” on page 1212
Hong Kong ID | Wide, Narrow | Lowercase | “Hong Kong ID” on page 1215
Hungarian Social Security Number | Wide, Medium, Narrow | Digits | “Hungarian Social Security Number” on page 1221
Hungarian Tax Identification Number | Wide, Medium, Narrow | Digits | “Hungarian Tax Identification Number” on page 1223
Hungarian VAT Number | Wide, Medium, Narrow | Lowercase | “Hungarian VAT Number” on page 1225
Hungary Driver's Licence Number | Wide, Narrow | Digits and Letters | “Hungary Driver's Licence Number” on page 1217
Hungary Passport Number | Wide, Medium, Narrow | Digits and Letters | “Hungary Passport Number” on page 1219
IBAN Central | Wide, Narrow | Do nothing | “IBAN Central” on page 1227
IBAN East | Wide, Narrow | Do nothing | “IBAN East” on page 1231
IBAN West | Wide, Narrow | Do nothing | “IBAN West” on page 1237
Iceland National Identification Number | Wide, Medium, Narrow | Digits | “Iceland National Identification Number” on page 1241
Iceland Passport Number | Wide, Narrow | Digits and Letters | “Iceland Passport Number” on page 1245
Iceland Value Added Tax (VAT) Number | Wide, Narrow | Digits and Letters | “Iceland Value Added Tax (VAT) Number” on page 1247
India RuPay Card Number | Wide, Medium, Narrow | Digits | “India RuPay Card Number” on page 1252
Indian Aadhaar Card Number | Wide, Medium, Narrow | Digits | “Indian Aadhaar Card Number” on page 1249
Indian Permanent Account Number | Wide, Narrow | Digits and Letters | “Indian Permanent Account Number” on page 1251
Indonesian Identity Card Number | Wide, Medium, Narrow | Digits | “Indonesian Identity Card Number” on page 1255
International Mobile Equipment Identity Number | Wide, Medium, Narrow | Digits | “International Mobile Equipment Identity Number” on page 1257
International Securities Identification Number | Wide, Medium, Narrow | Lowercase | “International Securities Identification Number” on page 1259
IP Address | Wide, Medium, Narrow | Do nothing | “IP Address” on page 1261
IPv6 Address | Wide, Medium, Narrow | Do nothing | “IPv6 Address” on page 1263
Ireland Passport Number | Wide, Narrow | Digits and Letters | “Ireland Passport Number” on page 1266
Ireland Tax Identification Number | Wide, Medium, Narrow | Digits and Letters | “Ireland Tax Identification Number” on page 1268
Ireland Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Ireland Value Added Tax (VAT) Number” on page 1271
Irish Personal Public Service Number | Wide, Medium, Narrow | Lowercase | “Irish Personal Public Service Number” on page 1274
Israel Personal Identification Number | Wide, Medium, Narrow | Digits | “Israel Personal Identification Number” on page 1276
Italy Driver's Licence Number | Wide, Narrow | Digits and Letters | “Italy Driver's Licence Number” on page 1278
Italy Health Insurance Number | Wide, Narrow | Digits and Letters | “Italy Health Insurance Number” on page 1280
Italy Passport Number | Wide, Narrow | Digits and Letters | “Italy Passport Number” on page 1282
Italy Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Italy Value Added Tax (VAT) Number” on page 1283
Japan Driver's License Number | Wide, Medium, Narrow | Digits | “Japan Driver's License Number” on page 1285
Japan Passport Number | Wide, Narrow | Digits and Letters | “Japan Passport Number” on page 1287
Japanese Juki-Net Identification Number | Wide, Medium, Narrow | Digits | “Japanese Juki-Net Identification Number” on page 1289
Japanese My Number - Corporate | Wide, Narrow | Digits | “Japanese My Number - Corporate” on page 1291
Japanese My Number - Personal | Wide, Medium, Narrow | Digits | “Japanese My Number - Personal” on page 1292
Kazakhstan Passport Number | Wide, Narrow | Digits and Letters | “Kazakhstan Passport Number” on page 1295
Korea Passport Number | Wide, Narrow | Digits and Letters | “Korea Passport Number” on page 1296
Korea Residence Registration Number for Foreigners | Wide, Medium, Narrow | Digits | “Korea Residence Registration Number for Foreigners” on page 1298
Korea Residence Registration Number for Korean | Wide, Medium, Narrow | Digits | “Korea Residence Registration Number for Korean” on page 1300
Latvia Driver's Licence Number | Wide, Narrow | Digits and Letters | “Latvia Driver's Licence Number” on page 1303
Latvia Passport Number | Wide, Narrow | Digits and Letters | “Latvia Passport Number” on page 1305
Latvia Personal Identification Number | Wide, Medium, Narrow | Digits | “Latvia Personal Identification Number” on page 1306
Latvia Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Latvia Value Added Tax (VAT) Number” on page 1308
Liechtenstein Passport Number | Wide, Narrow | Digits and Letters | “Liechtenstein Passport Number” on page 1311
Lithuania Personal Identification Number | Wide, Medium, Narrow | Digits | “Lithuania Personal Identification Number” on page 1312
Lithuania Tax Identification Number | Wide, Medium, Narrow | Digits | “Lithuania Tax Identification Number” on page 1315
Lithuania Value Added Tax Number | Wide, Medium, Narrow | Digits and Letters | “Lithuania Value Added Tax (VAT) Number” on page 1317
Luxembourg National Register of Individuals Number | Wide, Medium, Narrow | Digits | “Luxembourg National Register of Individuals Number” on page 1320
Luxembourg Passport Number | Wide, Narrow | Digits and Letters | “Luxembourg Passport Number” on page 1322
Luxembourg Tax Identification Number | Wide, Medium, Narrow | Digits | “Luxembourg Tax Identification Number” on page 1324
Luxembourg Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Luxembourg Value Added Tax (VAT) Number” on page 1327
Macau Individual Identification Number | Wide, Narrow | Digits | “Macau National Identification Number” on page 1331
Malaysia Passport Number | Wide, Narrow | Digits and Letters | “Malaysia Passport Number” on page 1333
Malaysian MyKad Number | Wide, Medium, Narrow | Digits | “Malaysian MyKad Number (MyKad)” on page 1335
Malta National Identification Number | Wide, Narrow | Digits and Letters | “Malta National Identification Number” on page 1337
Malta Tax Identification Number | Wide, Narrow | Digits and Letters | “Malta Tax Identification Number” on page 1339
Malta Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Malta Value Added Tax (VAT) Number” on page 1342
Medicare Beneficiary Identifier | Wide, Medium, Narrow | Digits and Letters | “Medicare Beneficiary Identifier” on page 1344
Mexican Personal Registration and Identification Number | Wide, Medium, Narrow | Digits and Letters | “Mexican Personal Registration and Identification Number” on page 1346
Mexican Tax Identification Number | Wide, Medium, Narrow | Digits and Letters | “Mexican Tax Identification Number” on page 1349
Mexican Unique Population Registry Code (CURP) | Wide, Medium, Narrow | Lowercase | “Mexican Unique Population Registry Code” on page 1351
Mexico CLABE Number | Wide, Medium, Narrow | Digits | “Mexico CLABE Number” on page 1353
National Drug Code | Wide, Medium, Narrow | Do nothing | “National Drug Code (NDC)” on page 1355
National Provider Identifier Number | Wide, Medium, Narrow | Digits | “National Provider Identifier Number” on page 1357
Netherlands Bank Account Number | Wide, Medium, Narrow | Digits and Letters | “Netherlands Bank Account Number” on page 1359
Netherlands Driver's License Number | Wide, Narrow | Digits | “Netherlands Driver's License Number” on page 1362
Netherlands Passport Number | Wide, Narrow | Digits and Letters | “Netherlands Passport Number” on page 1363
Netherlands Tax Identification Number | Wide, Medium, Narrow | Digits | “Netherlands Tax Identification Number” on page 1364
Netherlands Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Netherlands Value Added Tax (VAT) Number” on page 1367
New Zealand Driver's License Number | Wide, Narrow | Digits and Letters | “New Zealand Driver's Licence Number” on page 1370
New Zealand National Health Index Number | Wide, Medium, Narrow | Lowercase | “New Zealand National Health Index Number” on page 1371
New Zealand Passport Number | Wide, Narrow | Digits and Letters | “New Zealand Passport Number” on page 1373
Norway Driver's Licence Number | Wide, Narrow | Digits | “Norway Driver's Licence Number” on page 1375
Norway National Identification Number | Wide, Medium, Narrow | Digits | “Norway National Identification Number” on page 1377
Norway Value Added Tax Number | Wide, Medium, Narrow | Digits and Letters | “Norway Value Added Tax Number” on page 1379
Norwegian Birth Number | Wide, Medium, Narrow | Digits | “Norwegian Birth Number” on page 1382
People's Republic of China ID | Wide, Narrow | Lowercase | “People's Republic of China ID” on page 1384
Poland Driver's Licence Number | Wide, Narrow | Digits | “Poland Driver's Licence Number” on page 1386
Poland European Health Insurance Number | Wide, Narrow | Digits | “Poland European Health Insurance Number” on page 1387
Poland Passport Number | Wide, Narrow | Digits and Letters | “Poland Passport Number” on page 1389
Poland Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Poland Value Added Tax (VAT) Number” on page 1391
Polish Identification Number | Wide, Medium, Narrow | Digits and Letters | “Polish Identification Number” on page 1394
Polish REGON Number | Wide, Medium, Narrow | Digits | “Polish REGON Number” on page 1396
Polish Social Security Number (PESEL) | Wide, Medium, Narrow | Digits | “Polish Social Security Number (PESEL)” on page 1398
Polish Tax Identification Number | Wide, Medium, Narrow | Digits | “Polish Tax Identification Number” on page 1400
Portugal Driver's Licence Number | Wide, Narrow | Digits and Letters | “Portugal Driver's Licence Number” on page 1402
Portugal National Identification Number | Wide, Medium, Narrow | Digits and Letters | “Portugal National Identification Number” on page 1404
Portugal Passport Number | Wide, Narrow | Digits and Letters | “Portugal Passport Number” on page 1407
Portugal Tax Identification Number | Wide, Medium, Narrow | Digits | “Portugal Tax Identification Number” on page 1408
Portugal Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Portugal Value Added Tax (VAT) Number” on page 1411
Randomized US Social Security Number (SSN) | Medium, Narrow | Digits | “Randomized US Social Security Number (SSN)” on page 1414
Romania Driver's Licence Number | Wide, Narrow | Lowercase | “Romania Driver's Licence Number” on page 1416
Romania National Identification Number | Wide, Medium, Narrow | Digits | “Romania National Identification Number” on page 1419
Romania Value Added Tax (VAT) Number | Wide, Medium, Narrow | Digits and Letters | “Romania Value Added Tax (VAT) Number” on page 1420
Romanian Numerical Personal Code | Wide, Medium, Narrow | Digits | “Romanian Numerical Personal Code” on page 1425
Russian Passport Identification Number | Wide, Narrow | Digits | “Russian Passport Identification Number” on page 1427
Russian Taxpayer Identification Number | Wide, Medium, Narrow | Digits | “Russian Taxpayer Identification Number” on page 1428
SEPA Creditor Identifier Number North | Wide, Medium, Narrow | Digits and Letters | “SEPA Creditor Identifier Number North” on page 1430

SEPA Creditor Identifier Number South Wide Digits and Letters

See “SEPA Creditor Identifier Number South” on page 1437. Medium

Narrow

SEPA Creditor Identifier Number West Wide Digits and Letters

See “SEPA Creditor Identifier Number West” on page 1441. Medium

Narrow

Serbia Unique Master Citizen Number Wide Digits

See “Serbia Unique Master Citizen Number” on page 1445. Medium

Narrow

Serbia Value Added Tax (VAT) Number Wide Digits and Letters

See “Serbia Value Added Tax (VAT) Number” on page 1448. Medium

Narrow

Singapore NRIC Wide Lowercase

See “Singapore NRIC data identifier” on page 1451.

Slovakia Driver's Licence Number Wide Digits and Letters

See “Slovakia Driver's Licence Number” on page 1451. Narrow

Slovakia National Identification Number Wide Digits and Letters

See “Slovakia National Identification Number” on page 1453. Medium

Narrow

Slovakia Passport Number Wide Digits and Letters

See “Slovakia Passport Number” on page 1457. Narrow

Slovakia Value Added Tax (VAT) Number Wide Digits and Letters

See “Slovakia Value Added Tax (VAT) Number” on page 1459. Medium

Narrow

Slovenia Passport Number Wide Digits and Letters

See “Slovenia Passport Number” on page 1461. Narrow

Slovenia Tax Identification Number Wide Digits

See “Slovenia Tax Identification Number” on page 1463. Medium

Narrow

Slovenia Unique Master Citizen Number Wide Digits

See “Slovenia Unique Master Citizen Number” on page 1465. Medium

Narrow

Slovenia Value Added Tax (VAT) Number Wide Digits and Letters

See “Slovenia Value Added Tax (VAT) Number” on page 1467. Medium

Narrow

South African Personal Identification Number Wide Digits

See “South African Personal Identification Number” on page 1469. Medium

Narrow

Spain Driver's Licence Number Wide Digits and Letters

See “Spain Driver's Licence Number” on page 1477. Narrow

Spain Value Added Tax (VAT) Number Wide Digits and Letters

See “Spain Value Added Tax (VAT) Number” on page 1474. Medium

Narrow

Spanish Customer Account Number Wide Digits

See “Spanish Customer Account Number” on page 1479. Medium

Narrow

Spanish DNI ID Wide Digits and Letters

See “Spanish DNI ID” on page 1481. Narrow

Spanish Social Security Number Wide Digits

See “Spanish Social Security Number” on page 1485. Medium

Narrow

Spanish Tax Identification (CIF) Wide Digits and Letters

See “Spanish Tax Identification (CIF)” on page 1487. Medium

Narrow

Sri Lanka National Identity Number Wide Digits and Letters

See “Sri Lanka National Identity Number” on page 1490. Medium

Narrow

Sweden Driver's Licence Number Wide Digits

See “Sweden Driver's Licence Number” on page 1492. Medium

Narrow

Sweden Tax Identification Number Wide Digits

See “Sweden Tax Identification Number” on page 1494. Medium

Narrow

Sweden Value Added Tax (VAT) Number Wide Digits and Letters

See “Sweden Value Added Tax (VAT) Number” on page 1496. Medium

Narrow

Swedish Passport Number Wide Digits and Letters

See “Swedish Passport Number” on page 1499. Narrow

Swedish Personal Identification Number Wide Digits

See “Sweden Personal Identification Number” on page 1501. Medium

Narrow

SWIFT Code Wide Swift

See “SWIFT Code” on page 1503. Narrow

Swiss AHV Number Wide Digits

See “Swiss AHV Number” on page 1505. Narrow

Swiss Social Security Number (AHV) Wide Digits

See “Swiss Social Security Number (AHV)” on page 1507. Medium

Narrow

Switzerland Health Insurance Card Number Wide Digits

See “Switzerland Health Insurance Card Number” on page 1509. Narrow

Switzerland Passport Number Wide Digits and Letters

See “Switzerland Passport Number” on page 1511. Narrow

Switzerland Value Added Tax (VAT) Number Wide Lowercase

See “Switzerland Value Added Tax (VAT) Number” on page 1513. Medium

Narrow

Taiwan ROC ID Wide Do nothing

See “Taiwan ROC ID” on page 1515. Narrow

Thailand Passport Number Wide Digits and Letters

See “Thailand Passport Number” on page 1517. Narrow

Thailand Personal Identification Number Wide Digits

See “Thailand Personal Identification Number” on page 1519. Medium

Narrow

Turkish Identification Number Wide Digits

See “Turkish Identification Number” on page 1521. Medium

Narrow

UK Bank Account Number Sort Code Wide Digits

See “UK Bank Account Number Sort Code” on page 1523. Medium

Narrow

UK Driver's Licence Number Wide Digits and Letters

See “UK Drivers Licence Number” on page 1525. Medium

Narrow

UK Electoral Roll Number Narrow Lowercase

See “UK Electoral Roll Number” on page 1527.

UK National Health Service (NHS) Number Medium Digits

See “UK National Health Service (NHS) Number” on page 1528. Narrow

UK National Insurance Number Wide Lowercase

See “UK National Insurance Number” on page 1530. Medium

Narrow

UK Passport Number Wide Do nothing

See “UK Passport Number” on page 1532. Medium

Narrow

UK Tax ID Number Wide Do nothing

See “UK Tax ID Number” on page 1534. Medium

Narrow

UK Value Added Tax (VAT) Number Wide Digits and Letters

See “UK Value Added Tax (VAT) Number” on page 1536. Medium

Narrow

Ukraine Identity Card Wide Digits

See “Ukraine Identity Card” on page 1539. Medium

Narrow

Ukraine Passport (Domestic) Wide Digits

See “Ukraine Passport (Domestic)” on page 1541. Narrow

Ukraine Passport (International) Wide Digits and Letters

See “Ukraine Passport (International)” on page 1543. Narrow

United Arab Emirates Personal Number Wide Digits

See “United Arab Emirates Personal Number” on page 1544. Medium

Narrow

US Individual Tax ID Number (ITIN) Wide Digits

See “US Individual Tax Identification Number (ITIN)” on page 1546. Medium

Narrow

US Passport Number Wide Digits

See “US Passport Number” on page 1548. Narrow

US Social Security Number (SSN) Wide Digits

See “US Social Security Number (SSN)” on page 1550. Medium

Narrow

US ZIP+4 Postal Codes Wide Digits and Letters

See “US ZIP+4 Postal Codes” on page 1553. Medium

Narrow

Venezuela National ID Number Wide Digits and Letters

See “Venezuela National Identification Number” on page 1555. Medium

Narrow
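
The Normalizer column in Table 31-16 is easier to reason about with a concrete model. The following Python sketch is a mental model only, not the product's implementation: it assumes that Digits keeps only digits, Digits and Letters keeps only letters and digits, Lowercase converts letters to lowercase, and Do nothing leaves the text unchanged. The function name `normalize` is hypothetical, and the Swift normalizer is not modeled.

```python
import re


def normalize(value: str, normalizer: str) -> str:
    """Hypothetical model of the normalizers named in Table 31-16."""
    if normalizer == "Digits":
        # Assumed behavior: keep only the digits 0-9.
        return re.sub(r"[^0-9]", "", value)
    if normalizer == "Digits and Letters":
        # Assumed behavior: keep only ASCII letters and digits.
        return re.sub(r"[^0-9A-Za-z]", "", value)
    if normalizer == "Lowercase":
        # Assumed behavior: convert letters to lowercase.
        return value.lower()
    if normalizer == "Do nothing":
        # No normalization is applied.
        return value
    raise ValueError(f"normalizer not modeled in this sketch: {normalizer}")


print(normalize("123-45-6789", "Digits"))                # 123456789
print(normalize("AB 12 34 56 C", "Digits and Letters"))  # AB123456C
print(normalize("S1234567D", "Lowercase"))               # s1234567d
```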

Using optional validators

Table 31-17 lists the optional validators that policy authors can configure for system data identifiers.
See “About optional validators for data identifiers” on page 732.

Table 31-17 Available optional validators for policy instances

Optional validator Description

Require beginning characters Match the characters that begin (lead) the matched data item.

For example, for the Driver's License Number – CA State data identifier, you could require the beginning
character to be the letter "C." In this case, the engine matches the license number C6457291.

See “Acceptable characters for optional validators” on page 764.

Require ending characters Match the characters that end (trail) the matched data item.

See “Acceptable characters for optional validators” on page 764.

Exclude beginning characters Exclude from matching the characters that begin (lead) the matched data item.
See “Acceptable characters for optional validators” on page 764.

Exclude ending characters Exclude from matching the characters that end (trail) the matched data item.
See “Acceptable characters for optional validators” on page 764.

Find keywords Match one or more keywords or key phrases in addition to the matched data item. You can
also check the proximity of the matched data to the list of keywords.

You can also make keyword matching case-sensitive. An incident is generated when all of the data
identifier patterns in the rule match. Captured keywords are highlighted in incidents. Proximity checking,
case sensitivity, and validator highlighting are disabled by default; you must enable them to use them.

The keyword must be detected in the same message component as the data identifier
content to report a match.

See “About cross-component matching” on page 733.

This optional validator accepts any characters (numbers, letters, others).

See “Acceptable characters for optional validators” on page 764.

See “List of pattern validators that accept input data” on page 778.

Exact Match Data Identifier Check Look up tokens around a pattern in an Exact Match Data Identifier (EMDI) index and validate the pattern.

See “Adding an EMDI check to a built-in or custom data identifier condition in a policy”
on page 487.
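
To make the filtering behavior of the validators in Table 31-17 concrete, here is a minimal Python sketch. It is an illustrative model under stated assumptions, not Symantec's implementation: the class `OptionalValidators` and its field names are hypothetical, keyword matching is a simple case-insensitive substring test within one message component, and case normalization and keyword proximity are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class OptionalValidators:
    """Hypothetical model of the optional validators in Table 31-17.

    Each list holds the comma-separated values an administrator would enter.
    Case normalization and keyword proximity are deliberately not modeled.
    """
    require_beginning: list[str] = field(default_factory=list)
    require_ending: list[str] = field(default_factory=list)
    exclude_beginning: list[str] = field(default_factory=list)
    exclude_ending: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)

    def accepts(self, match: str, component_text: str) -> bool:
        """Return True if a candidate match passes every configured validator."""
        # Require: the match must start/end with at least one listed value.
        if self.require_beginning and not match.startswith(tuple(self.require_beginning)):
            return False
        if self.require_ending and not match.endswith(tuple(self.require_ending)):
            return False
        # Exclude: the match is dropped if it starts/ends with any listed value.
        if self.exclude_beginning and match.startswith(tuple(self.exclude_beginning)):
            return False
        if self.exclude_ending and match.endswith(tuple(self.exclude_ending)):
            return False
        # Find keywords: at least one keyword must occur in the same message
        # component as the match (case-insensitive substring test here).
        if self.keywords:
            text = component_text.lower()
            if not any(k.lower() in text for k in self.keywords):
                return False
        return True


ca_rule = OptionalValidators(require_beginning=["C"])
print(ca_rule.accepts("C6457291", "License C6457291"))  # True
print(ca_rule.accepts("D6457291", "License D6457291"))  # False
```

An unconfigured list imposes no constraint, which mirrors how a validator that is not selected has no effect on matching.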

Configuring optional validators

You configure optional validators to refine the scope of a data identifier in a policy
instance. System and custom data identifiers support the configuration of optional validators.
See “About optional validators for data identifiers” on page 732.
The type of input an optional validator allows (numbers, letters, or other characters) depends on
the data identifier. If you enter unacceptable input characters and attempt to save the
configuration, the system reports an error.
For example, the US Social Security Number (SSN) data identifier accepts numbers only. If
you configure the "Require ending characters" optional validator and enter letters,
you receive the following error when you attempt to save the configuration: Input to "Require
ending characters" Validator is incorrect: List contains non-number character.
See “Acceptable characters for optional validators” on page 764.
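
The save-time check described above can be sketched in Python as follows. This is a hypothetical model, not the product's code: the function `parse_validator_input` and the `ACCEPTED_CHECKS` mapping are illustrative names, trimming whitespace around each comma-separated value is an assumption, and only the "non-number character" error wording comes from this guide; the wording for other character classes is invented for the sketch.

```python
# Hypothetical mapping from the character classes used in this guide to checks.
ACCEPTED_CHECKS = {
    "Numbers only": str.isdigit,
    "Letters only": str.isalpha,
    "Alphanumeric": str.isalnum,
    "Any characters": lambda value: True,
}


def parse_validator_input(raw: str, accepted: str, validator_name: str) -> list[str]:
    """Split comma-separated validator input and reject values of the wrong type."""
    # Trimming whitespace and skipping empty entries are assumptions of this sketch.
    values = [v.strip() for v in raw.split(",") if v.strip()]
    check = ACCEPTED_CHECKS[accepted]
    for value in values:
        if not check(value):
            if accepted == "Numbers only":
                # Wording documented in this guide.
                detail = "List contains non-number character."
            else:
                # Hypothetical wording; not documented in this guide.
                detail = f"List contains a value that is not {accepted.lower()}."
            raise ValueError(f'Input to "{validator_name}" Validator is incorrect: {detail}')
    return values


print(parse_validator_input("6789, 0000", "Numbers only", "Require ending characters"))
# ['6789', '0000']
try:
    parse_validator_input("AB", "Numbers only", "Require ending characters")
except ValueError as err:
    print(err)
# Input to "Require ending characters" Validator is incorrect: List contains non-number character.
```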

To configure an optional validator

1 Click the plus sign beside the Optional Validators label for the data identifier instance
you are configuring.
See “Configuring the Content Matches data identifier condition” on page 737.
2 Select one or more optional validators.
See “About optional validators for data identifiers” on page 732.
3 Provide the expected input for each optional validator you select.
Each value can be of any length. Use commas to separate multiple values.
4 Click Save to save the configuration.
If the system displays an error message, make sure you have entered the correct type of
expected character input.
See “Acceptable characters for optional validators” on page 764.

Acceptable characters for optional validators

Each optional validator requires you to enter one or more data values. You must enter the
type of data that the data identifier accepts. Table 31-18 lists the acceptable data
type for each data identifier/optional validator pairing.
See “About optional validators for data identifiers” on page 732.

Note: The Find keywords optional validator accepts any characters as values for all data
identifiers.

The type of data expected by the optional validator depends on the data identifier. Most data
identifier/optional validator pairings accept numbers only; some accept alphanumeric values,
and a few accept any characters. If you enter unacceptable input and attempt to save the
policy, the system reports an error.
See “Configuring optional validators” on page 763.

Table 31-18 Acceptable characters for optional validators

Data Identifier Exclude/require beginning characters Exclude/require ending characters

ABA Routing Number Numbers only Numbers only

Argentina Tax Identification Number Numbers only Numbers only

Australia Driver's License Number Alphanumeric Alphanumeric

Australian Business Number Numbers only Numbers only

Australian Company Number Numbers only Numbers only

Australian Medicare Number Numbers only Numbers only

Australian Passport Number Letters only (normalized to lowercase) Numbers only

Australian Tax File Number Numbers only Numbers only

Austria Passport Number Alphanumeric Alphanumeric

Austria Tax Identification Number Numbers only Numbers only

Austria Value Added Tax (VAT) Number Letters only Numbers only

Austrian Social Security Number Numbers only Numbers only

Belgian National Number Numbers only Numbers only

Belgium Driver's Licence Number Numbers only Numbers only

Belgium Passport Number Alphanumeric Alphanumeric

Belgium Tax Identification Number Numbers only Numbers only

Belgium Value Added Tax (VAT) Number Letters only Numbers only

Brazilian Election Identification Number Numbers only Numbers only

Brazilian National Registry of Legal Entities Number Numbers only Numbers only

Brazilian Natural Person Registry Number Numbers only Numbers only

British Columbia Personal Number Numbers only Numbers only

Bulgaria Value Added Tax (VAT) Number Letters only Numbers only

Bulgarian Uniform Civil Number - EGN Numbers only Numbers only

Burgerservicenummer Numbers only Numbers only

Canada Driver's License Number Alphanumeric Alphanumeric

Canada Passport Number Letters only Numbers only

Canada Permanent Resident (PR) Number Letters only Numbers only

Canadian Social Insurance Number Numbers only Numbers only

Chilean National Identification Number Alphanumeric Alphanumeric

China Passport Number Alphanumeric Alphanumeric

Codice Fiscale Letters only Letters only

Colombian Addresses Numbers only Numbers only

Colombian Cell Phone Number Numbers only Numbers only

Colombian Personal Identification Number Numbers only Numbers only

Colombian Tax Identification Number Numbers only Numbers only

Common Procedure Coding System (HCPCS CPT Code) Alphanumeric Alphanumeric

Credit Card Magnetic Stripe Data Numbers only Numbers only

Credit Card Number Numbers only Numbers only

Croatia National Identification Number Alphanumeric Alphanumeric

CUSIP Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Cyprus Tax Identification Number Letters only Numbers only

Cyprus Value Added Tax (VAT) Number Alphanumeric Alphanumeric

Czech Republic Driver's Licence Number Letters only Numbers only

Czech Republic Personal Identification Number Numbers only Numbers only

Czech Republic Tax Identification Number Numbers only Numbers only

Czech Republic Value Added Tax (VAT) Number Letters only Numbers only

Denmark Personal Identification Number Alphanumeric Alphanumeric

Denmark Tax Identification Number Numbers only Numbers only

Denmark Value Added Tax (VAT) Number Letters only Numbers only

Driver's License Number – CA State Letters only (normalized to lowercase) Numbers only

Driver's License Number – FL, MI, MN States Letters only (normalized to lowercase) Numbers only

Driver's License Number – IL State Letters only (normalized to lowercase) Numbers only

Driver's License Number – NJ State Letters only (normalized to lowercase) Numbers only

Driver's License Number – NY State Numbers only Numbers only

Driver's License Number - WA State Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Driver's License Number - WI State Letters only Numbers only

Drug Enforcement Agency (DEA) Number Letters only (normalized to lowercase) Numbers only

Estonia Driver's Licence Number Letters only Numbers only

Estonia Passport Number Letters only Numbers only

Estonia Personal Identification Number Numbers only Numbers only

Estonia Value Added Tax (VAT) Number Letters only Numbers only

European Health Insurance Card Number Numbers only Numbers only

Finland Driver's Licence Number Alphanumeric Alphanumeric

Finland European Health Insurance Number Numbers only Numbers only

Finland Passport Number Letters only Numbers only

Finland Tax Identification Number Alphanumeric Alphanumeric

Finland Value Added Tax (VAT) Number Letters only Numbers only

Finnish Personal Identification Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

France Driver's Licence Number Numbers only Numbers only

France Health Insurance Number Numbers only Numbers only

France Tax Identification Number Numbers only Numbers only

France Value Added Tax (VAT) Number Letters only Numbers only

French INSEE Code Numbers only Numbers only

French Passport Number Alphanumeric Alphanumeric

French Social Security Number Alphanumeric Alphanumeric

German Passport Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

German Personal Identification Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

German Driver's Licence Number Alphanumeric Alphanumeric

German Tax Identification Number Numbers only Numbers only

German Value Added Tax (VAT) Number Letters only Numbers only

Greece Passport Number Letters only Numbers only

Greece Social Security Number (AMKA) Numbers only Numbers only

Greece Value Added Tax (VAT) Number Letters only Numbers only

Greek Tax Identification Number Numbers only Numbers only

Health Insurance Claim Number Alphanumeric Alphanumeric

Hong Kong ID Alphanumeric Alphanumeric

Hungarian Social Security Number Numbers only Numbers only

Hungarian Tax Identification Number Numbers only Numbers only

Hungarian VAT Number Letters only (normalized to lowercase) Numbers only

Hungary Driver's Licence Number Letters only Numbers only

Hungary Passport Number Letters only Numbers only

IBAN Central Alphanumeric Alphanumeric

IBAN East Alphanumeric Alphanumeric

IBAN West Alphanumeric Alphanumeric

Iceland National Identification Number Numbers only Numbers only

Iceland Passport Number Letters only Numbers only

Iceland Value Added Tax (VAT) Number Letters only Numbers only

India RuPay Card Number Numbers only Numbers only

Indian Aadhar Card Number Numbers only Numbers only

Indonesian Identity Card Number Letters only Letters only

International Mobile Equipment Identity Number Numbers only Numbers only

International Securities Identification Number Letters only (normalized to lowercase) Numbers only

IP Address Any characters Any characters

IPv6 Address Alphanumeric Alphanumeric

Ireland Passport Number Letters only Numbers only

Ireland Tax Identification Number Alphanumeric Alphanumeric

Ireland Value Added Tax (VAT) Number Letters only Numbers only

Irish Personal Public Service Number Numbers only Letters only (normalized to lowercase)

Israel Personal Identification Number Numbers only Numbers only

Italy Driver's Licence Number Letters only Letters only

Italy Health Insurance Number Letters only Letters only

Italy Passport Number Alphanumeric Alphanumeric

Italy Value Added Tax (VAT) Number Letters only Numbers only

Japan Driver's License Number Numbers only Numbers only

Japan Passport Number Letters only Numbers only

Japanese Juki-Net ID Number Numbers only Numbers only

Japanese My Number - Corporate Numbers only Numbers only

Japanese My Number - Personal Numbers only Numbers only

Kazakhstan Passport Number Letters only Numbers only

Korea Passport Number Alphanumeric Alphanumeric

Korea Residence Registration Number for Foreigners Numbers only Numbers only

Korea Residence Registration Number for Korean Numbers only Numbers only

Latvia Driver's Licence Number Letters only Numbers only

Latvia Passport Number Letters only Numbers only

Latvia Personal Identification Number Numbers only Numbers only

Latvia Value Added Tax (VAT) Number Letters only Numbers only

Liechtenstein Passport Number Letters only Numbers only

Lithuania Personal Identification Number Numbers only Numbers only

Lithuania Tax Identification Number Numbers only Numbers only

Lithuania Value Added Tax (VAT) Number Letters only Numbers only

Luxembourg National Register of Individuals Number Numbers only Numbers only

Luxembourg Passport Number Alphanumeric Alphanumeric

Luxembourg Tax Identification Number Numbers only Numbers only

Luxembourg Value Added Tax (VAT) Number Letters only Numbers only

Macau National Identification Number Numbers only Numbers only

Malaysia Passport Number Letters only Numbers only

Malaysian MyKad Number (MyKad) Numbers only Numbers only

Malta National Identification Number Numbers only Letters only

Malta Tax Identification Number Alphanumeric Alphanumeric

Malta Value Added Tax (VAT) Number Alphanumeric Alphanumeric

Medicare Beneficiary Number Alphanumeric Alphanumeric

Mexican Personal Registration and Identification Number Alphanumeric Alphanumeric

Mexican Tax Identification Number Alphanumeric Alphanumeric

Mexican Unique Population Registry Code Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Mexico CLABE Number Numbers only Numbers only

National Drug Code (NDC) Numbers only Numbers only

National Provider Identifier Number Numbers only Numbers only

Netherlands Bank Account Number Alphanumeric Alphanumeric

Netherlands Driver's Licence Number Numbers only Numbers only

Netherlands Passport Number Alphanumeric Alphanumeric

Netherlands Tax Identification Number Numbers only Numbers only

Netherlands Value Added Tax (VAT) Number Letters only Numbers only

New Zealand Driver's License Number Letters only Numbers only

New Zealand National Health Index Number Letters only (normalized to lowercase) Numbers only

New Zealand Passport Number Letters only Numbers only

Norway Driver's Licence Number Numbers only Numbers only

Norway National Identification Number Numbers only Numbers only

Norway Value Added Tax Number Alphanumeric Alphanumeric

Norwegian Birth Number Numbers only Numbers only

People's Republic of China ID Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Poland Driver's Licence Number Numbers only Numbers only

Poland European Health Insurance Number Numbers only Numbers only

Poland Passport Number Letters only Numbers only

Poland Value Added Tax (VAT) Number Letters only Numbers only

Polish Identification Number Letters only Numbers only

Polish REGON Number Numbers only Numbers only

Polish Social Security Number (PESEL) Numbers only Numbers only

Polish Tax Identification Number Numbers only Numbers only

Portugal Driver's Licence Number Letters only Numbers only

Portugal National Identification Number Alphanumeric Alphanumeric

Portugal Passport Number Letters only Numbers only

Portugal Tax Identification Number Numbers only Numbers only

Portugal Value Added Tax (VAT) Number Letters only Numbers only

Randomized US Social Security Number (SSN) Numbers only Numbers only

Romania Driver's Licence Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Romania National Identification Number Numbers only Numbers only

Romania Numerical Personal Code Numbers only Numbers only

Romania Value Added Tax (VAT) Number Letters only Numbers only

Romanian Numerical Personal Code Numbers only Numbers only

Russian Passport Identification Number Numbers only Numbers only

Russian Taxpayer Identification Number Numbers only Numbers only

SEPA Creditor Identifier Number North Alphanumeric Alphanumeric

SEPA Creditor Identifier Number South Alphanumeric Alphanumeric

SEPA Creditor Identifier Number West Alphanumeric Alphanumeric

Serbia Unique Master Citizen Number Numbers only Numbers only

Serbia Value Added Tax (VAT) Number Alphanumeric Alphanumeric

Singapore NRIC Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Slovakia Driver's Licence Number Letters only Numbers only

Slovakia National Identification Number Alphanumeric Alphanumeric

Slovakia Passport Number Letters only Numbers only

Slovakia Value Added Tax (VAT) Number Letters only Numbers only

Slovenia Passport Number Letters only Numbers only

Slovenia Tax Identification Number Numbers only Numbers only

Slovenia Unique Master Citizen Number Numbers only Numbers only

Slovenia Value Added Tax (VAT) Number Letters only Numbers only

South African Personal Identification Number Numbers only Numbers only

Spain Driver's Licence Number Alphanumeric Alphanumeric

Spain Value Added Tax (VAT) Number Alphanumeric Alphanumeric

Spanish Customer Account Number Numbers only Numbers only

Spanish DNI ID Alphanumeric Alphanumeric

Spanish Passport Number Alphanumeric Alphanumeric

Spanish Social Security Number Numbers only Numbers only

Spanish Tax ID (CIF) Alphanumeric Alphanumeric

Sri Lanka National Identification Number Alphanumeric Alphanumeric

Sweden Driver's Licence Number Numbers only Numbers only

Sweden Personal Identification Number Numbers only Numbers only

Sweden Tax Identification Number Numbers only Numbers only

Sweden Value Added Tax (VAT) Number Letters only Numbers only

Swedish Passport Number Alphanumeric Alphanumeric

SWIFT Code Alphanumeric Alphanumeric

Swiss AHV Number Numbers only Numbers only

Swiss Social Security Number (AHV) Alphanumeric Alphanumeric

Switzerland Health Insurance Card Number Numbers only Numbers only

Switzerland Passport Number Letters only Numbers only

Switzerland Value Added Tax (VAT) Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

Taiwan ROC ID Alphanumeric Alphanumeric

Thailand Passport Number Letters only Numbers only

Thailand Personal ID Number Numbers only Numbers only

Turkish Identification Number Numbers only Numbers only

UK Bank Account Number Sort Code Numbers only Numbers only

UK Driver's Licence Number Alphanumeric (normalized to lowercase) Alphanumeric (normalized to lowercase)

UK Electoral Roll Number Letters only (normalized to lowercase) Numbers only

UK National Health Service (NHS) Number Numbers only Numbers only

UK National Insurance Number Letters only (normalized to lowercase) Letters only (normalized to lowercase)

UK Passport Number Numbers only Numbers only

UK Tax ID Number Numbers only Numbers only

UK Value Added Tax (VAT) Number Letters only Numbers only

Ukraine Identity Card Numbers only Numbers only

Ukraine Passport (Domestic) Numbers only Numbers only

Ukraine Passport (International) Alphanumeric Alphanumeric

United Arab Emirates Personal Number Numbers only Numbers only

US Individual Tax Identification Number (ITIN) Numbers only Numbers only

US Passport Number Numbers only Numbers only

US Social Security Number (SSN) Numbers only Numbers only

US ZIP+4 Postal Codes Letters only Numbers only

Venezuela National ID Number Letters only Numbers only

Using unique match counting

When you define a new data identifier rule, keyword rule, or regular expression rule,
Count all unique matches is the default method for counting matches.
The following table describes unique match counting characteristics.

Table 31-19 Unique match counting characteristics

Unique match counting characteristic Description

First match is unique The first occurrence of a given match in a message component is the unique match; later identical matches in that component are duplicates.

See “Detection messages and message components” on page 391.

Match count updated for each unique match The match count is incremented by 1 for each unique pattern match.

Only unique matches are highlighted Duplicate matches are neither counted nor highlighted on the Incident Snapshot screen.

See “Remediating incidents” on page 1844.

Uniqueness does not span message components For example, if the same SSN appears in both the message body and an attachment, two unique matches are generated, not one, because each instance is detected in a separate message component.

Compound rule with data identifier and keyword proximity conditions In a compound rule that combines a data identifier condition with a keyword condition that specifies keyword proximity logic, the reported match is the first match found.
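
The counting rules in Table 31-19 can be modeled in a few lines of Python. This is a hypothetical sketch, not the detection engine: the function names are illustrative, a message is represented as a mapping from component name to the list of matched strings in the order found, and matches are compared exactly as written (whether the product compares values before or after normalization is not stated here).

```python
def unique_matches_per_component(components: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep the first occurrence of each distinct match within each component."""
    result = {}
    for name, matches in components.items():
        seen = set()
        firsts = []
        for m in matches:
            if m not in seen:  # the first occurrence is the unique match
                seen.add(m)
                firsts.append(m)
        result[name] = firsts  # duplicates are neither counted nor kept
    return result


def unique_match_count(components: dict[str, list[str]]) -> int:
    # Uniqueness does not span components, so per-component counts are summed.
    return sum(len(v) for v in unique_matches_per_component(components).values())


message = {
    "body": ["123-45-6789", "123-45-6789"],
    "attachment": ["123-45-6789", "987-65-4321"],
}
print(unique_match_count(message))  # 3
```

Here the repeated SSN in the body counts once, but the same SSN in the attachment counts again because it is in a separate component.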

Configuring unique match counting

Count all unique matches is the default selection for new data identifier rules that you create. After
upgrading Data Loss Prevention, you may need to manually configure pre-existing data identifier
rules to use unique match counting, if you did not do so before the upgrade.
See “About unique match counting” on page 734.

To configure unique match counting

1 On the Manage > Policies > Policy List screen, select the policy that contains the data
identifier rule or rules you want to update.
2 On the Configure Policy screen, select the data identifier rule.
3 Select the match counting option Count all unique matches.
4 Click OK to apply the unique match counting configuration change.
5 Click Save to save the policy change.
6 Test unique match counting.
Create an incident with multiple instances of a data identifier pattern, such as several
instances of the same social security number in the same message component (for
example, in an email attachment).
On the Incident Snapshot screen, verify that only unique matches are highlighted and counted.
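For the test in step 6, you can generate an attachment that repeats one SSN-formatted value. This Python sketch is an illustration only. The file name, the record layout, and the `TEST_SSN` value are assumptions, and the value must pass the validators of the data identifier you are testing for an incident to be created at all.

```python
import tempfile
from pathlib import Path

# Hypothetical test value. Replace it with a number that passes the validators of
# the data identifier under test; some validators reject well-known test numbers.
TEST_SSN = "123-45-6789"


def write_test_attachment(path: Path, repeats: int = 5) -> Path:
    """Write a text file that repeats one SSN-formatted value `repeats` times."""
    lines = [f"Record {i}: SSN {TEST_SSN}" for i in range(1, repeats + 1)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    out = write_test_attachment(Path(tempfile.gettempdir()) / "unique_match_test.txt")
    print(out, out.read_text(encoding="utf-8").count(TEST_SSN))
```

Attach the generated file to a test email. With Count all unique matches selected, you would expect the Incident Snapshot to highlight and count the repeated value once for the attachment.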

Modifying system data identifiers

The system lets you modify system-defined data identifiers, but you cannot delete them. You
can also modify any custom data identifiers that you have created. Any modification you make
to the configuration of a data identifier takes effect system-wide: the modifications apply to
any policy that currently declares, or later declares, the data identifier.
There is no way to automatically revert a data identifier to its original configuration once it is
modified. Before you modify a system data identifier, consider cloning it.
The system does not include modified data identifiers in policies exported as templates. Before
modifying a system data identifier, export any policies that declare it.
See “Editing data identifiers” on page 736.
See “Editing pattern validator input” on page 778.

Note: The system does not export modified and custom data identifiers in a policy template.
The system exports a reference to the system data identifier. The target system where the
policy template is imported provides the actual data identifier. See “Clone system-defined data
identifiers before modifying to preserve original state” on page 835.



Table 31-20 System data identifier modification options

Modifiable at the system level:
■ Patterns: You can edit one or more data identifier patterns at the system level.
■ Active Validators: You can add or remove required validators at the system level.
■ Data Entry: You can edit the input of an active validator for a system data identifier.

Not configurable:
■ Name, Description, and Category: You cannot modify the name, description, or category of a system data identifier.
■ Breadth: You cannot define a new detection breadth for a system data identifier; you can only modify an existing breadth.
■ Optional Validators: You cannot define optional validators at the system level. You can only configure optional validators at the policy level.
■ Data Normalizer: You cannot modify the type of data normalizer implemented by a system data identifier.
■ Delete: You cannot delete a system data identifier.

Cloning a system data identifier before modifying it

The Enforce Server does not provide an automated mechanism for cloning a system data
identifier.
See “Extending and customizing data identifiers” on page 731.
Before you modify a system data identifier, consider manually cloning it so that you can revert
to the original configuration if necessary. At a minimum, export a policy as a template before
you modify any system data identifier that the policy declares.
To manually clone a system data identifier
1 Review the original configuration of the data identifier you want to modify.
2 Create a custom data identifier.
See “Workflow for creating custom data identifiers” on page 812.
3 Copy the configuration of the original data identifier to the custom data identifier.
Add the pattern(s), validator(s), any data input, and the normalizer.
See “Selecting a data identifier breadth” on page 739.
4 Save the custom data identifier.
5 Modify the custom data identifier to suit your needs.
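The point of the manual clone is that the copy and the original are independent, so edits to the copy never alter the system data identifier. The following Python sketch models that idea with a hypothetical `DataIdentifierConfig` record; it is not a product API, and the pattern, validator input, and normalizer values are illustrative only.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataIdentifierConfig:
    """Hypothetical model of the settings you copy by hand (not a product API)."""
    name: str
    patterns: Dict[str, List[str]] = field(default_factory=dict)    # breadth -> patterns
    validators: Dict[str, List[str]] = field(default_factory=dict)  # breadth -> validators
    data_input: Dict[str, str] = field(default_factory=dict)        # validator -> input
    normalizer: str = ""


def clone_identifier(original: DataIdentifierConfig, new_name: str) -> DataIdentifierConfig:
    # deepcopy so that editing the clone's lists and dicts never touches the original
    clone = copy.deepcopy(original)
    clone.name = new_name
    return clone


system_di = DataIdentifierConfig(
    name="US Social Security Number (SSN)",
    patterns={"medium": [r"\d{3}-\d{2}-\d{4}"]},          # illustrative pattern
    validators={"medium": ["Find keywords"]},
    data_input={"Find keywords": "ssn, social security number"},
    normalizer="digits-only (illustrative)",
)
custom_di = clone_identifier(system_di, "Custom SSN")
custom_di.data_input["Find keywords"] += ", taxpayer id"
print(system_di.data_input["Find keywords"])
```

Using a deep copy rather than a shallow one matters here: a shallow copy would share the nested dictionaries, so editing the clone's validator input would also change the original.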

Editing pattern validator input

At the system level, you can edit the data input that a required validator accepts. Not all
validators accept data input.
See “About pattern validators” on page 733.
To edit required validator input
1 On the Manage > Policies > Data Identifiers screen, select the data identifier you want to
edit.
2 Select the Rule Breadth you want to modify.
Generally, the medium and narrow breadth options include validators that accept data
input.
3 From the Active Validators list, select the editable validator whose input you want to edit.
For example, select Find keywords.
See “List of pattern validators that accept input data” on page 778.
4 Edit the input for the validator in the Description and Data Entry fields.
5 Select the options you want for the keywords:
■ Proximity - To find a keyword only within a set distance of the matched pattern,
check this box and specify the Word Distance.
■ Case sensitive - Check this box to search for case-sensitive matches only.
■ Highlight keywords in incident - Check this box to highlight the matched
keywords in incidents.

6 Click Update Validator to save your changes to the validator input, or click Discard
Changes to discard them.
7 Click Save to save the data identifier.
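The keyword options in step 5 can be pictured with a small Python sketch. This is not how the product implements the Find keywords validator. The word-window semantics of `word_distance` (N words before or after the match) and the whole-word matching are assumptions made for illustration, and `keyword_near` is a hypothetical helper.

```python
import re
from typing import List


def parse_keywords(data_entry: str) -> List[str]:
    """Split a comma-separated Data Entry value into trimmed keywords."""
    return [kw.strip() for kw in data_entry.split(",") if kw.strip()]


def keyword_near(text: str, start: int, end: int, keywords: List[str],
                 word_distance: int, case_sensitive: bool = False) -> bool:
    """Return True if a keyword occurs within `word_distance` words of text[start:end]."""
    if word_distance <= 0:
        return False
    # Assumed window: the N words immediately before and after the matched pattern.
    before = " ".join(text[:start].split()[-word_distance:])
    after = " ".join(text[end:].split()[:word_distance])
    flags = 0 if case_sensitive else re.IGNORECASE
    for kw in keywords:
        # Whole-word match so that "ssn" does not match inside "ssns".
        pattern = re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", flags)
        if pattern.search(before) or pattern.search(after):
            return True
    return False


text = "Employee SSN: 123-45-6789 on file"
match = re.search(r"\d{3}-\d{2}-\d{4}", text)
keywords = parse_keywords("ssn, social security number")
print(keyword_near(text, match.start(), match.end(), keywords, word_distance=3))
print(keyword_near(text, match.start(), match.end(), keywords, word_distance=3,
                   case_sensitive=True))
```

The first call finds "SSN" within three words of the match; the second fails because the keyword was entered in lowercase and case-sensitive matching is on.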

List of pattern validators that accept input data

The following table lists all available pattern validators that accept data input. The input data
is editable in the system-level definition of the data identifier.

Note: Input for beginning and ending validators applies to the text of the match itself.
Input for prefix and suffix validators applies to the characters before and after the matched text.

Table 31-21 Pattern validators that accept input data

Validator | Description
Exact Match | Enter a comma-separated list of values. If the values are numeric, do NOT enter any dashes or other separators. Each value can be of any length.
Exclude beginning characters | Enter a comma-separated list of values. If the values are numeric, do NOT enter any dashes or other separators. Each value can be of any length.
Exclude ending characters | Enter a comma-separated list of values. If the values are numeric, do NOT enter any dashes or other separators. Each value can be of any length.
Exclude exact match | Enter a comma-separated list of values. Each value can be of any length.
Exclude prefix | Enter a comma-separated list of values. Each value can be of any length.
Exclude suffix | Enter a comma-separated list of values. Each value can be of any length.
Find keywords | Enter a comma-separated list of values. Each value can be of any length.
Require beginning characters | Enter a comma-separated list of values. If the values are numeric, do NOT enter any dashes or other separators. Each value can be of any length.
Require ending characters | Enter a comma-separated list of values. If the values are numeric, do NOT enter any dashes or other separators. Each value can be of any length.
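The difference between a beginning-characters validator and a prefix validator, and the reason numeric input omits dashes, can be sketched in Python. This is a hypothetical model for numeric matches only: it assumes the match is compared with its separators stripped and that whitespace before the match is ignored for prefixes. Neither assumption is confirmed by this guide.

```python
import re
from typing import List


def parse_values(data_entry: str) -> List[str]:
    """Split a comma-separated validator input into trimmed values."""
    return [v.strip() for v in data_entry.split(",") if v.strip()]


def digits_only(match_text: str) -> str:
    # Assumption: numeric matches are compared without dashes or other
    # separators, which is why numeric input values must not contain them.
    return re.sub(r"\D", "", match_text)


def excluded_by_beginning(match_text: str, data_entry: str) -> bool:
    """Exclude beginning characters: looks at the start of the match itself."""
    digits = digits_only(match_text)
    return any(digits.startswith(v) for v in parse_values(data_entry))


def excluded_by_prefix(text: str, match_start: int, data_entry: str) -> bool:
    """Exclude prefix: looks at the characters just before the match."""
    preceding = text[:match_start].rstrip()  # assumption: whitespace is ignored
    return any(preceding.endswith(v) for v in parse_values(data_entry))


text = "Ref# 987-65-4321"
start = text.index("987")
print(excluded_by_beginning(text[start:], "987, 000"))
print(excluded_by_prefix(text, start, "Ref#"))
```

Both calls return True, but for different reasons: the first inspects the first digits of the matched number, while the second inspects the text that precedes the match.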

Editing keywords for international PII data identifiers

To use keywords for international data identifiers

1 Create a policy using one of the system-provided international data identifiers listed
in the table.
See “List of keywords for international system data identifiers” on page 780.
2 Select the Find Keywords optional validator.
See “Configuring the Content Matches data identifier condition” on page 737.
3 Copy and paste the appropriate comma-separated keywords from the list into the Find
Keywords optional validator field.
See “Configuring optional validators” on page 763.

List of keywords for international system data identifiers

Table 31-22 provides keywords for several system-defined international data identifiers. You
can modify the specified data identifier using the corresponding keyword(s).
See “Extending and customizing data identifiers” on page 731.
See “Introducing data identifiers” on page 717.
See “Selecting a data identifier breadth” on page 739.

Table 31-22 Keyword list for international PII data identifiers

Data identifier | Language | Keywords | English translation
Argentina Tax Identification Number | Spanish | Número de Identificación Fiscal, número de contribuyente, Número de identificación fiscal Argentina, Argentina número de contribuyente | Tax identification number, taxpayer number, Argentina tax identification number, Argentina taxpayer number
Austria Passport Number | German | REISEPASS, ÖSTERREICHISCH REISEPASS, reisepass | Passport, Austrian passport
Austria Tax Identification Number | German | Österreich, Steuernummer | Austria, tax number
Austria Value Added Tax (VAT) Number | German | MwSt, Umsatzsteuernummer, MwSt Nummer, Ust.-Identifikationsnummer, umsatzsteuer, Umsatzsteuer-Identifikationsnummer | VAT, sales tax number, VAT number, VAT identification number, sales tax, UID number

Austrian Social Security Number | German | sozialversicherungsnummer, soziale sicherheit kein, Versicherungsnummer, Österreichischen SSN, Österreichischen Sozialversicherungs | Social insurance number, social security number, insurance number, Austrian SSN, Austrian social insurance
Belgian National Number | French | Numéro national, numéro de sécurité, numéro d'assuré, identifiant national, identifiantnational#, Numéronational# | National number, security number, number of insured, national identification, national identification #, national number #
Belgium Driver's License Number | German, French, Frisian | Führerschein, Fuhrerschein, Fuehrerschein, Führerscheinnummer, Fuhrerscheinnummer, Fuehrerscheinnummer, Führerscheinnummer, Fuhrerscheinnummer, Fuehrerscheinnummer, Führerschein- Nr, Fuhrerschein- Nr, Fuehrerschein- Nr, permis de conduire, rijbewijs, Rijbewijsnummer, Numéro permis conduire | Driver's license, driver's license number, driving permit, driving permit number
Belgium Passport Number | Dutch, German, French | Paspoort, paspoort, paspoortnummer, Reisepass kein, Reisepass, Passnummer, Passeport, Passeport livre, Passeport carte, numéro passeport | Passport, passport number, passport book, passport card
Belgium Tax Identification Number | Dutch, German, French | Numéro de registre national, numéro d'identification fiscale, belasting aantal, Steuernummer | National registry number, tax identification number, tax number
Belgium Value Added Tax (VAT) Number | German, French | Numéro T.V.A, Umsatzsteuer-Identifikationsnummer, Umsatzsteuernummer | VAT number, tax identification number

Brazilian Election Identification Number | Brazilian Portuguese | número identificação, identificação do eleitor, ID eleitor eleição, número identificação eleitoral, Número identificação eleitoral brasileira, IDeleitoreleição# | Identification number, voter identification, electoral identification number, Brazilian electoral identification number
Brazilian National Registry of Legal Entities Number | Brazilian Portuguese | Brasileira ID Legal, entidades jurídicas ID, Registro Nacional de Pessoas Jurídicas n º, BrasileiraIDLegal# | Brazilian legal identification, legal entities ID, National Registry of Legal Entities No
Brazilian Natural Person Registry Number | Brazilian Portuguese | Cadastro de Pessoas Físicas, Brasileiro Pessoa Natural Número de Registro, pessoa natural número de registro, pessoas singulares registro NO | Registration of individuals, Brazilian Natural Person Registry Number, natural person registry number, individual registration number
British Columbia Personal Healthcare Number | French | MSP nombre, soins de santé no, soins de santé personnels nombre, MSPNombre#, soinsdesanténo# | MSP Number, MSP no, personal healthcare number, Healthcare No, PHN
Bulgaria Value Added Tax (VAT) Number | Bulgarian | номер на таксата, ДДС, ДДС#, ДДС номер., ДДС номер.#, номер на данъка върху добавената стойност, данък върху добавената стойност, ДДС номер | Fee number, VAT, VAT number, value added tax
Bulgarian Uniform Civil Number - EGN | Bulgarian | Униформ граждански номер, Униформ ID, Униформ граждански ID, Униформ граждански не., български Униформ граждански номер, УниформгражданскиID#, Униформгражданскине.# | Uniform civil number, Uniform ID, Uniform civil ID, Bulgarian uniform civil number
Burgerservicenummer | Dutch | Persoonsnummer, sofinummer, sociaal-fiscaal nummer, persoonsgebonden | person number, social-fiscal number (abbreviation), social-fiscal number, person-related number
Canada Driver's License Number | French | permis de conduire | Driver's license

Canada Passport Number | French | numéro passeport, No passeport, Numbert passeport# | Passport number, passport no., passport#
Canada Permanent Resident (PR) Number | French | numéro résident permanent, résident permanent non, résident permanent no., carte résident permanent, numéro carte résident permanent, pr non | permanent resident number, permanent resident no, permanent resident number, permanent resident card, permanent resident card number, pr no
Chilean National Identification Number | Spanish | Chilena número identificación, nacional identidad, número identificación, número identificación nacional, identidad número, NúmerodeIdentificación#, Identidadchilenano#, Rol Único Nacional, RolÚnicoNacional#, nacionalidentidad# | Chilean identification number, national identity, identification number, national identification number, identity number, Unique National Role
China Passport Number | Chinese | 中国护照, 护照, 护照本 | Chinese passport, passport, passport book
Codice Fiscale | Italian | codice fiscal, dati anagrafici, partita I.V.A., p. iva | tax code, personal data, VAT number, VAT number
Columbian Addresses | Spanish | Calle, Cll, Carrera, Cra, Cr, Avenida, Av, Dg, Diagonal, Diag, Tv, Trans, Transversal, vereda | Street, St, Career, Avenue, Diagonal, Transversal, sidewalk
Columbian Cell Phone Number | Spanish | numero celular, número de teléfono, teléfono celular no., numero celular# | Cellular number, telephone number, cellular telephone number
Columbian Personal Identification Number | Spanish | cedula, cédula, c.c., c.c, C.C., C.C, cc, CC, NIE., NIE, nie., nie, cedula de ciudadania, cédula de ciudadanía, cc#, CC #, documento de identificacion, documento de identificación, Nit. | Identification card, citizenship card, identification document
Columbian Tax Identification Number | Spanish | NIT., NIT, nit., nit, Nit. | TIN (tax identification number)

Croatia National Identification Number | Croatian | Osobna iskaznica, Nacionalni identifikacijski broj, osobni ID, osobni identifikacijski broj, porez iskaznica, porezni broj, porezni identifikacijski broj, porez kod, šifra poreznog obveznika | Personal ID, national identification number, personal ID, personal identification number, tax identification card, tax number, tax identification number, tax code, taxpayer code
Cyprus Tax Identification Number | Turkish, Greek | αριθμός φορολογικού μητρώου, Vergi Kimlik Numarası, vergi numarası, Kıbrıs TIN numarası | Tax identification number, tax number, TIN number, Cyprus TIN number
Cyprus Value Added Tax (VAT) Number | Turkish, Greek | KDV, kdv#, KDV numarası, Katma değer Vergisi, Φόρος Προστιθέμενης Αξίας | VAT, VAT number, value added tax
Czech Republic Driver's Licence Number | Czech | řidičský průkaz, řidičský prúkaz, číslo řidičského průkazu, řidičské číslo řidičů, ovladače lic., Číslo licence řidiče, Řidičský průkaz, povolení řidiče, řidiči povolení, povolení k jízdě, číslo licence | Driving license, driver's license number, driving license number, driver's lic., driver license number, driver's permit
Czech Republic Personal Identification Number | Czech | Česká Osobní identifikační číslo, Osobní identifikační číslo., identifikační číslo, čeština identifikační číslo | Czech Personal Identification Number, personal identification number, Czech identification number
Czech Republic Tax Identification Number | Czech | osobní kód, Národní identifikační číslo, osobní identifikační číslo, cínové číslo, daňové identifikačné číslo, daňový poplatník id | Personal code, national identification number, personal identification number, TIN number, tax identification number, taxpayer ID
Czech Republic Value Added Tax (VAT) Number | Czech | číslo DPH, Daň z přidané hodnoty, Dan z pridané hodnoty, Daň přidané hodnoty, Dan pridané hodnoty, DPH, DIC, DIČ | VAT number, value added tax, VAT

Denmark Personal Identification Number | Danish | Nationalt identifikationsnummer, personnummer, unikt identifikationsnummer, identifikationsnummer, centrale personregister, cpr, cpr-nummer, cpr#, cpr-nummer#, identifikationsnummer#, personnummer# | National identification number, personal number, unique identification number, identification number, central registry of persons, CPR number
Denmark Value Added Tax (VAT) Number | Danish | moms, momsnummer, moms identifikationsnummer, merværdiafgift | VAT number, vat, value added tax number, vat identification number
Estonia Driver's Licence Number | Estonian | juhiluba, JUHILUBA, juhiluba number, juhiloa number, Juhiluba, juhi litsentsi number | Driving license, driving license number, driver's license number, license number
Estonia Passport Number | Estonian | Pass, pass, passi number, pass nr, pass#, Pass nr, Eesti passi number | Passport, passport number, Estonian passport number
Estonia Personal Identification Code | Estonian | isikukood, isikukood#, IK, IK#, maksu ID, maksukohustuslase identifitseerimisnumber, maksukood, maksukood#, maksuID#, maksumaksja kood, maksumaksja identifitseerimisnumber | Personal identification code, tax ID, taxpayer identification number, tax identification number, tax code, taxpayer code
Estonia Value Added Tax (VAT) Number | Estonian | käibemaksu registreerimisnumber, käibemaksu, Käibemaksu number, käibemaks, käibemaks#, käibemaksu# | VAT registration number, VAT, VAT number

European Health Insurance Card Number | Croatian, Danish, Estonian, Finnish, French, German, Irish, Italian, Luxembourgish, Polish, Slovenian, Spanish | numero conto medico, tessera sanitaria assicurazione numero, carta assicurazione numero, Krankenversicherungsnummer, assicurazione sanitaria numero, medisch rekeningnummer, ziekteverzekeringskaartnummer, verzekerings kaart nummer, gezondheidskaart nummer, gezondheidskaart, medizinische Kontonummer, Krankenversicherungskarte Nummer, Versicherungsnummer, Gesundheitskarte Nummer, Gesundheitskarte, arstliku konto number, ravikindlustuse kaardi number, tervisekaart, tervisekaardi number, Uimhir ehic, tarjeta salud, broj kartice zdravstvenog osiguranja, kartice osiguranja broj, zdravstvenu karticu, zdravstvene kartice broj, ehic broj, numero tessera sanitaria, numero carta di assicurazione, tessera sanitaria, numero ehic, Gesondheetskaart, ehic nummer, numer rachunku medycznego, numer karty ubezpieczenia zdrowotne, numer karty ubezpieczenia, karta zdrowia, numer karty zdrowia, numer ehic, sairausvakuutuskortin numero, vakuutuskortin numero, terveyskortti, terveyskortin numero, medicinsk kontonummer, ehic numeris, medizinescher Konto Nummer, zdravstvena izkaznica | Medical account number, health insurance card number, insurance card number, health insurance number, medical account number, health card number, health card, insurance number, EHIC number

Finland Driver's License Number | Finnish, Swedish | permis de conduire, ajokortti, ajokortin numero, kuljettaja lic., körkort, körkort nummer, förare lic. | Driver's license, driver's license number, driver's lic.
Finland European Health Insurance Number | Finnish | Suomi EHIC-numero, Sairausvakuutuskortti, sairaanhoitokortin, Sjukförsäkringskort, ehic, sairaanhoitokortin, Suomen sairausvakuutuskortti, Finska sjukförsäkringskort, Terveyskortti, Hälsokort, ehic#, sairausvakuutusnumero, sjukförsäkring nummer | Finland EHIC number, sickness insurance card, health insurance card, EHIC, Finnish health insurance card, Health Card, Survival Card, health insurance number
Finland Passport Number | Finnish | Suomen passin numero, suomalainen passi, passin numero, passin numero.#, passin numero#, passin numero, passin numero., passin numero#, passi# | Finnish passport number, Finnish passport, passport number, passport number, passport #
Finland Tax Identification Number | Finnish | verotunniste, verokortti, verotunnus, veronumero | Tax identification number, tax card, tax ID, tax number
Finland Value Added Tax (VAT) Number | Finnish | arvonlisäveronumero, ALV, arvonlisäverotunniste, ALV nro, ALV numero, alv | VAT number, VAT, VAT identification number
Finnish Personal Identification Number | Finnish | tunnistenumero, henkilötunnus, yksilöllinen henkilökohtainen tunnistenumero, Ainutlaatuinen henkilökohtainen tunnus, identiteetti numero, Suomen kansallinen henkilötunnus, henkilötunnusnumero#, kansallisen tunnistenumero, tunnusnumero, kansallinen tunnus numero | Identification number, personal identification number, unique personal identification number, identity number, Finnish personal identification number, national identification number
France Driver's License Number | French | permis de conduire | Driver's license
France Health Insurance Number | French | carte vitale, carte d'assuré social | Health card, social insurance card

France Tax Identification Number | French | numéro d'identification fiscale | Tax identification number
France Value Added Tax (VAT) Number | French | Numéro d'identification taxe sur valeur ajoutée, Numéro taxe valeur ajoutée, taxe valeur ajoutée, Taxe sur la valeur ajoutée, Numéro de TVA intracommunautaire, n° TVA, numéro de TVA, Numéro de TVA en France, français numéro de TVA, Numéro d'identification SIREN | Value added tax identification number, value added tax number, value added tax, VAT number, French VAT number, SIREN identification number
French INSEE Code | French | INSEE, numéro de sécu, code sécu | INSEE, social security number, social security code
French Passport Number | French | Passeport français, Passeport, Passeport livre, Passeport carte, numéro passeport | French passport, passport, passport book, passport card, passport number
French Social Security Number | French | sécurité sociale non., sécurité sociale numéro, code sécurité sociale, numéro d'assurance, sécuritésocialenon.#, sécuritésocialeNuméro# | Social security number, social security code, insurance number
German Passport Number | German | Reisepass kein, Reisepass, Deutsch Passnummer, Passnummer, Reisepasskein#, Passnummer# | Passport number, passport, German passport number, passport number
German Personal ID Number | German | persönliche identifikationsnummer, ID-Nummer, Deutsch persönliche-ID-Nummer, persönliche ID Nummer, eindeutige ID-Nummer, persönliche Nummer, identität nummer, Versicherungsnummer, persönlicheNummer#, IDNummer# | Personal identification number, ID number, German personal ID number, personal ID number, clear ID number, personal number, identity number, insurance number

Germany Driver's License Number | German | Führerschein, Fuhrerschein, Fuehrerschein, Führerscheinnummer, Fuhrerscheinnummer, Fuehrerscheinnummer, Führerscheinnummer, Fuhrerscheinnummer, Fuehrerscheinnummer, Führerschein- Nr, Fuhrerschein- Nr, Fuehrerschein- Nr | Driver's license, driver's license number
Germany Value Added Tax (VAT) Number | German | Mehrwertsteuer, MwSt, Mehrwertsteuer Identifikationsnummer, Mehrwertsteuer nummer | Value added tax, value added tax identification number, value added tax number
Greece Passport Number | Greek | λλάδα pasport αριθμός, Ελλάδα pasport όχι., Ελλάδα Αριθμός Διαβατηρίου, διαβατήριο, Διαβατήριο, ΕΛΛΑΔΑ ΔΙΑΒΑΤΗΡΙΟ, Ελλάδα Διαβατήριο, ελλάδα διαβατήριο, Διαβατήριο Βιβλίο, βιβλίο διαβατηρίου | Greece passport number, Greece passport no., passport, Greece passport, passport book
Greece Social Security Number (AMKA) | Greek | Αριθμού Μητρώου Κοινωνικής Ασφάλισης | Social security number
Greece Value Added Tax (VAT) Number | Greek | FPA, fpa, Foros Prostithemenis Axias, arithmós dexamenís, Fóros Prostithémenis Axías, μέγας κάδος, ΦΠΑ, Φ Π Α, Φόρος Προστιθέμενης Αξίας, ΦΟΡΟΣ ΠΡΟΣΤΙΘΕΜΕΝΗΣ ΑΞΙΑΣ, φόρος προστιθέμενης αξίας, Arithmos Forologikou Mitroou, Α.Φ.Μ, ΑΦΜ | VAT, value added tax, tax identification number
Greek Tax Identification Number | Greek | Αριθμός Φορολογικού Μητρώου, AΦΜ, Φορολογικού Μητρώου Νο., τον αριθμό φορολογικού μητρώου | Tax identification number, TIN, tax registry number
Hong Kong ID | Chinese (Traditional) | 身份證, 三顆星 | Identity card, Hong Kong permanent resident ID Card

Hungary Driver's Licence Number | Hungarian | jogosítvány, Illesztőprogramok Lic, jogsi, licencszám, vezetői engedély, VEZETŐI ENGEDÉLY, vezető engedély, VEZETŐ ENGEDÉLY | License, driver's lic, driver's license, number of licenses, driving license
Hungary Passport Number | French, Hungarian | útlevél, Magyar útlevélszám, útlevél könyv, nombre, numéro de passeport, hongrois, numéro de passeport hongrois | Passport, Hungarian passport number, passport book, number, passport number
Hungarian Social Security Number | Hungarian | Magyar társadalombiztosítási szám, Társadalombiztosítási szám, társadalombiztosítási ID, szociális biztonsági kódot, szociális biztonság nincs., társadalombiztosításiID# | Hungarian social security number, social security number, social security ID, social security code
Hungarian Tax Identification Number | Hungarian | Magyar adóazonosító jel no, adóazonosító szám, magyar adószám, Magyar adóhatóság no., azonosító szám, adóazonosító no., adóhatóság no | Hungarian tax identification number, tax identification number, Hungarian tax number, Hungarian tax authority number, tax number, tax authority number
Hungarian VAT Number | Hungarian | Közösségi adószám, Általános forgalmi adó szám, hozzáadottérték adó, magyar Közösségi adószám | Value added tax identification number, sales tax number, value added tax, Hungarian value added tax number
Iceland National Identification Number | Icelandic | kennitala, persónuleg kennitala, galdur númer, skattanúmer, skattgreiðenda kóða, kennitala skattgreiðenda | Social security number, personal identification number, magic number, tax code, taxpayer code, taxpayer ID number
Iceland Passport Number | Icelandic | vegabréf, vegabréfs númer, Vegabréf Nei, vegabréf# | Passport, passport number, passport no.
Iceland Value Added Tax (VAT) Number | Icelandic | virðisaukaskattsnúmer, vsk númer | VAT number

Indonesian Identity Card Number | Indonesian, Portuguese | Kartu Tanda Penduduk nomor, número do cartão, Kartu identitas Indonesia no, kartu no., Kartu identitas Indonesia nomor, Nomor Induk Kependudukan, númerodocartão, kartuno., KartuidentitasIndonesiano | Identity card number, card number, Indonesian identity card number, card no., Indonesian identity card number, ID number
International Bank Account Number (IBAN) Central | French | Code IBAN, numéro IBAN | IBAN Code, IBAN number
International Bank Account Number (IBAN) East | French | Code IBAN, numéro IBAN | IBAN Code, IBAN number
International Bank Account Number (IBAN) West | French | Code IBAN, numéro IBAN | IBAN Code, IBAN number
Ireland Passport Number | Irish | irelande passeport, Éire pas, no de passeport, pas uimh, uimhir pas, numéro de passeport | Ireland passport, passport number, passport
Ireland Tax Identification Number | Irish | uimhir carthanachta, Uimhir chláraithe charthanais, uimhir CHY, CHY uimh., uimhir thagartha cánach, uimhir aitheantais cánach ireland, aitheantais cánach irish, uimhir aitheantais cánach, id cánach, uimhir chánach, cáin #, STÁIN, cáin id uimh. | Charity number, charity registration number, CHY number, tax reference number, Ireland tax identification number, Irish tax identification, tax identification number, tax id, TIN, Ireland tin
Ireland Value Added Tax (VAT) Number | Irish | cáin bhreisluacha, CBL, CBL aon, Uimhir CBL, Uimhir CBL hÉireann, bhreisluacha uimhir chánach | Ireland VAT number, VAT number, VAT no, VAT#, value added tax number, value added tax, irish VAT
Irish Personal Public Service Number | Gaelic | Gaeilge Uimhir Phearsanta Seirbhíse Poiblí, PPS Uimh., uimhir phearsanta seirbhíse poiblí, seirbhíse Uimh, PPS Uimh, PPS seirbhís aon | Irish personal public service number, PPS no., personal public service number, service no., PPS no., PPS service one

Israel Personal Hebrew, Arabic ‫זהות‬,‫מספר זיהוי ישראלי‬,‫מספר זיהוי‬ Israeli identity number, identity
Identification Number ‫هوية‬,‫هويةاسرائيلية عدد‬,‫ישראלית‬ number, unique identity number,
‫عدد هوية فريدة من نوعها‬,‫رقم الهوية‬,‫ إسرائيلية‬personal ID, unique personal ID,
unique ID

Italy Driver's License Italian patente guida numero, patente di Driver's license number, driver's
Number guida numero, patente di guida, license
patente guida

Italy Health Insurance Italian TESSERA SANITARIA, tessera Health insurance card, Italian
Number sanitaria, tessera sanitaria health insurance card
italiana

Italian Passport Italian Repubblica Italiana Passaporto, Italian Republic passport,

Number Passaporto, Passaporto Italiana, passport, Italian passport, Italian
passport number, Italiana passport number, passport
Passaporto numero, Passaporto number
numero, Numéro passeport
italien, numéro passeport

Italy Value Added Tax Italian IVA, numero partita IVA, IVA#, VAT, VAT number, VAT#, VAT
(VAT) Number numero IVA number

Japan Driver's License Japanese 公安委員会, 番号, 免許, 交付, 運転 Public Security Committee,
Number 免許, 運転免許証, ドライバライセ driver's license, driving license,
ンス, ドライバーズライセンス, ラ driver license, driver's license
イセンス, 運転免許証番号 number, driving license number,
driver license number, license

Japanese Juki-Net ID Number
Language: Japanese
Keywords: 住基ネット識別番号, 住基ネット番号, 識別番号, 個人識別番号
English translation: Juki-Net identification number, Juki-Net number, identification number, personal identification number

Japanese My Number - Corporate
Language: Japanese
Keywords: マイナンバー, 共通番号
English translation: My number, common number

Japanese My Number - Personal
Language: Japanese
Keywords: マイナンバー, 個人番号, 共通番号
English translation: My number, personal number, common number

Japan Passport Number
Language: Japanese
Keywords: 日本国旅券, パスポート, パスポート数
English translation: Japanese passport, passport, passport number

Kazakhstan Passport Number
Language: Kazakh
Keywords: төлқұжат, төлқұжат нөмірі, номер паспорта, заграничный пасспорт, национальный паспорт
English translation: Passport, passport number, passport ID, international passport, national passport

Korea Passport Number
Language: Korean
Keywords: 한국어 여권, 여권, 여권 번호, 대한민국
English translation: Korean passport, passport, passport number, Republic of Korea

Korea Residence Registration Number for Foreigners
Language: Korean
Keywords: 외국인 등록 번호, 주민번호
English translation: Foreigner registration number, social security number

Korean Residence Registration Number for Korean
Language: Korean
Keywords: 주민등록번호, 주민번호
English translation: Resident registration number, social security number

Latvia Driver's Licence Number
Language: Latvian
Keywords: licences numurs, vadītāja apliecība, autovadītāja apliecība, vadītāja apliecības numurs, Vadītāja licences numurs, vadītāji lic., vadītāja atļauja
English translation: License number, driver's license, driver's license number, driver's lic.

Latvia Passport Number
Language: Latvian
Keywords: LATVIJA, LETTONIE, Pases Nr., Pases Nr, Pase, pase, pases numurs, Pases Nr, pases grāmata, pase#, pases karte
English translation: Latvia, passport no., passport number, passport book, passport #, passport card

Latvia Personal Identification Number
Language: Latvian
Keywords: Personas kods, personas kods, latvijas personas kods, Valsts identifikācijas numurs, valsts identifikācijas numurs, identifikācijas numurs, nacionālais id, latvija alva, alva, nodokļu identifikācijas numurs, nodokļu id, alvas nē, nodokļa numurs
English translation: Latvia personal code, personal code, national identification number, identification number, national ID, latvia TIN, TIN, tax identification number, tax ID, TIN number, tax number

Latvia Value Added Tax (VAT) Number
Language: Latvian
Keywords: PVN Nr, PVN maksātāja numurs, PVN numurs, PVN#, pievienotās vērtības nodoklis, pievienotās vērtības nodokļa numurs
English translation: VAT no., VAT payer number, VAT number, VAT#, value added tax, value added tax number

Liechtenstein Passport Number
Language: German
Keywords: Reisepass, Pass Nr, Pass Nr., Reisepass#, Pass Nr#
English translation: Passport, passport no.

Lithuania Personal Identification Number
Language: Lithuanian
Keywords: Nacionalinis ID, Nacionalinis identifikavimo numeris, asmens kodas
English translation: National ID, national identification number, personal ID

Lithuania Tax Identification Number
Language: Lithuanian
Keywords: mokesčių identifikavimo Nr., mokesčių identifikavimo numeris, mokesčių ID, mokesčių id nr, mokesčių id nr., mokesčių ID#, mokesčių numeris, mokestis Nr, mokestis #, Mokesčių identifikavimo numeris
English translation: tax identification number, tax ID, tax ID number, tax ID number, tax ID #, tax number, tax no., fee #

Lithuania Value Added Tax (VAT) Number
Language: Lithuanian
Keywords: pridėtinės vertės mokesčio numeris, PVM, PVM#, pridėtinės vertės mokestis, PVM numeris, PVM registracijos numeris
English translation: VAT number, VAT, VAT #, Value added tax, VAT registration number

Luxembourg National Register of Individuals Number
Language: German, French
Keywords: Eindeutige ID-Nummer, Eindeutige ID, ID personnelle, Numéro d'identification personnel, IDpersonnelle#, Persönliche Identifikationsnummer, EindeutigeID#
English translation: Unique ID number, unique ID, personal ID, personal identification number

Luxembourg Passport Number
Language: French and German
Keywords: passnummer, ausweisnummer, passeport, reisepass, pass, pass net, pass nr, no de passeport, passeport nombre, numéro de passeport
English translation: Passport number, passport, Luxembourg pass, Luxembourg passport

Luxembourg Tax Identification Number
Language: French, German
Keywords: Zinn, Zinn Nummer, Luxembourg Tax Identifikatiounsnummer, Steier Nummer, Steier ID, Sozialversicherungsausweis, Zinnzahl, Zinn nein, Zinn#, luxemburgische steueridentifikationsnummer, Steuernummer, Steuer ID, sécurité sociale, carte de sécurité sociale, étain, numéro d'étain, étain non, étain#, Numéro d'identification fiscal luxembourgeois, numéro d'identification fiscale
English translation: TIN, TIN number, Luxembourg tax identification number, tax number, tax ID, social security ID, Luxembourg tax identification number, Social Security, Social Security Card, tax identification number

Luxembourg Value Added Tax (VAT) Number
Language: German, Luxembourgish
Keywords: TVA kee, TVA#, TVA Aschreiwung kee, T.V.A, stammnummer, bleiwen, geheescht, gitt id, mehrwertsteuer, vat registrierungsnummer, umsatzsteuer-id, wat, umsatzsteuernummer, umsatzsteuer-identifikationsnummer, id de la batterie, lëtzebuerg vat nee, registréierung nummer, numéro de TVA, numéro de enregistrement vat
English translation: Luxembourg VAT number, VAT number, VAT, value added tax number, VAT ID, VAT registration number, value added tax

Macau National Identification Number
Language: Chinese, Portuguese
Keywords: 身份证号码, 唯一的识别号码
English translation: ID number, unique identification number
Keywords: número de identificação, número cartão identidade, número cartão identidade nacional, número identificação pessoal, número identificação único, id único não, ID único#
English translation: Identification number, identity card number, national identity card number, personal identification number, unique identification number, unique non-ID, unique ID #

Malaysia Passport Number
Language: Malay
Keywords: pasport, nombor pasport, pasport#
English translation: Passport, passport number, passport #

Malaysian MyKad Number (MyKad)
Language: Malay
Keywords: nombor kad pengenalan, kad pengenalan no, kad pengenalan Malaysia, bilangan identiti unik, nombor peribadi, nomborperibadi#, kadpengenalanno#
English translation: Identification card number, identification card no., Malaysian identification card, unique identity number, personal number

Malta National Identification Number
Language: Maltese
Keywords: numru identifikazzjoni nazzjonali, ID nazzjonali, numru identifikazzjoni personali, ID personali, IDnazzjonali#, IDpersonali#
English translation: national identification number, national ID, personal identification number, personal ID

Malta Tax Identification Number
Language: Maltese
Keywords: kodiċi tat-taxxa, numru tat-taxxa, numru identifikazzjoni tat-taxxa, taxxaid#, numru identifikazzjoni kontribwent, kodiċi kontribwent, landa, landa nru
English translation: Tax code, tax number, tax identification number, taxid#, taxpayer identification number, taxpayer code, tin, tin no

Malta Value Added Tax (VAT) Number
Language: Maltese
Keywords: Numru tal-VAT, numru tal-VAT, bettija, valur miżjud taxxa in-numru, bettija identifikazzjoni in-numru
English translation: VAT number, VAT, value added tax number, vat identification number

Mexican Personal Registration and Identification Number
Language: Spanish
Keywords: Clave de Registro de Identidad Personal, Código de Identificación Personal mexicana, número de identificación personal mexicana
English translation: Personal identity registration key, Mexican personal identification code, Mexican personal identification number

Mexican Tax Identification Number
Language: Spanish
Keywords: Registro Federal de Contribuyentes, número de identificación de impuestos, Código del Registro Federal de Contribuyentes, Número RFC, Clave del RFC
English translation: Federal taxpayer registry, tax identification number, federal taxpayer registry number, RFC number, RFC key

Mexican Unique Population Registry Code
Language: Spanish
Keywords: Única de registro de Población, clave única, clave única de identidad, clave personal Identidad, personal Identidad Clave, ClaveÚnica#, clavepersonalIdentidad#
English translation: Unique population registry, unique key, unique identity key, unique personal identity, personal identity key

Mexico CLABE Number
Language: Spanish
Keywords: Clave Bancaria Estandarizada, Estandarizado Banco número de clave, número de clave, clave número, clave#
English translation: Standardized banking code, standardized bank code number, code number

Netherlands Bank Account Number
Language: Dutch, Papiamento
Keywords: bancu aklarashon number, aklarashon number, bankrekeningnummer, rekeningnummer
English translation: Bank account number, account number

Netherlands Driver's License Number
Language: Dutch
Keywords: RIJMEWIJS, permis de conduire, rijbewijs, Rijbewijsnummer, RIJBEWIJSNUMMER
English translation: Driver's license, driving permit, driver's license number

Netherlands Passport Number
Language: Dutch
Keywords: Nederlanden paspoort nummer, Paspoort, paspoort, Nederlanden paspoortnummer, paspoortnummer
English translation: Dutch passport number, passport, passport number

Netherlands Tax Identification Number
Language: Dutch, Papiamento, Norwegian
Keywords: Nederlands belasting identificatienummer, identificatienummer van belasting, identificatienummer belasting, Nederlands belasting identificatie, Nederlands belasting id nummer, Nederlands belastingnummer, btw nummer, Nederlandse belasting identificatie, Nederlands belastingnummer, netherlands tax identification tal, netherland's tax identification tal, tax identification tal, tax tal, Nederlânske tax identification tal, Hollânske tax identification, Nederlânsk tax tal, Hollânske tax id tal, netherlands impuesto identification number, netherland's impuesto identification number, impuesto identification number, impuesto number, hulandes impuesto identification number, hulandes impuesto identification, hulandes impuesto number, hulandes impuesto id number
English translation: Dutch tax identification number, tax identification number, Dutch tax identification, Dutch tax number, tax number

Netherlands Value Added Tax (VAT) Number
Language: Dutch, Frisian
Keywords: wearde tafoege tax getal, BTW nûmer, BTW-nummer
English translation: Value added tax number, VAT number

New Zealand Driver's Licence Number
Language: Maori
Keywords: raihana taraiwa
English translation: Driving license

New Zealand Passport Number
Language: Maori
Keywords: uruwhenua, tau uruwhenua, uruwhenua no, uruwhenua no.
English translation: Passport, passport no.

Norway Driver's Licence Number
Language: Norwegian
Keywords: førerkort, førerkortnummer
English translation: Driver's license, driver's license number

Norway National Identification Number
Language: Norwegian
Keywords: Nasjonalt ID, personlig ID, Nasjonalt ID#, personlig ID#, skatt id, skattenummer, skattekode, skattebetalers id, skattebetalers identifikasjonsnummer
English translation: National ID, personal ID, national ID #, personal ID #, tax ID, tax code, taxpayer ID, taxpayer identification number

Norway Value Added Tax Number
Language: Norwegian
Keywords: mva, MVA, momsnummer, Momsnummer, momsregistreringsnummer
English translation: VAT, VAT number, VAT registration number

Norwegian Birth Number
Language: Norwegian
Keywords: fødsel nummer, Fødsel nr, fødsel nei, fødselnei#, fødselnummer#
English translation: Birth number

People's Republic of China ID
Language: Chinese (Simplified)
Keywords: 身份证, 居民信息, 居民身份信息
English translation: Identity Card, Information of resident, Information of resident identification

Poland Driver's Licence Number
Language: Polish
Keywords: Kierowcy Lic., prawo jazdy, numer licencyjny, zezwolenie na prowadzenie, PRAWO JAZDY
English translation: Driver's license number, driving license, license number

Poland European Health Insurance Number
Language: Polish
Keywords: Numer EHIC, Karta Ubezpieczenia Zdrowotnego, Europejska Karta Ubezpieczenia Zdrowotnego, numer ubezpieczenia zdrowotnego, numer rachunku medycznego
English translation: EHIC number, Health Insurance Card, European Health Insurance Card, health insurance number, medical account number

Poland Passport Number
Language: French, Polish
Keywords: paszport#, numer paszportu, Nr paszportu, paszport, książka paszportowa
English translation: Passport #, passport number, passport number, passport, passport book
Keywords: passeport, nombre, numéro de passeport, passeport#, No de passeport
English translation: Passport, number, passport number, passport #, passport number

Poland Value Added Tax (VAT) Number
Language: Polish
Keywords: Numer Identyfikacji Podatkowej, NIP, nip, Liczba VAT, podatek od wartosci dodanej, faktura VAT, faktura VAT#
English translation: Tax identification number, tax ID number, VAT number, value added tax, VAT invoice, VAT invoice #

Polish Identification Number
Language: Polish
Keywords: owód osobisty, Tożsamości narodowej, osobisty numer identyfikacyjny, niepowtarzalny numer, numer
English translation: Identification card, national identity, identification card number, unique number, number

Polish REGON Number
Language: Polish
Keywords: numer statystyczny, REGON, numeru REGON, numerstatystyczny#, numeruREGON#
English translation: Statistical number, REGON number

Polish Social Security Number (PESEL)
Language: Polish
Keywords: PESEL Liczba, społeczny bezpieczeństwo liczba, społeczny bezpieczeństwo ID, społeczny bezpieczeństwo kod, PESELliczba#, społecznybezpieczeństwoliczba#
English translation: PESEL number, social security number, social security ID, social security code

Polish Tax Identification Number
Language: Polish
Keywords: Numer Identyfikacji Podatkowej, Polski numer identyfikacji podatkowej, NumerIdentyfikacjiPodatkowej#
English translation: Tax identification number, Polish tax identification number

Portugal Driver's License Number
Language: Portuguese
Keywords: carteira de motorista, carteira motorista, carteira de habilitação, carteira habilitação, número de licença, número licença, permissão de condução, permissão condução, Licença condução Portugal, carta de condução
English translation: driver's license, license number, driving license, driving license Portugal

Portugal National Identification Number
Language: Portuguese
Keywords: bilhete de identidade, número de identificação civil, número de cartão de cidadão, documento de identificação, cartão de cidadão, número bi de portugal, número do documento
English translation: identity card, civil identification number, citizen's card number, identification document, citizen's card, bi number of Portugal, document number

Portugal Passport Number
Language: French and Portuguese
Keywords: passaporte, passeport, portuguese passport, portuguese passeport, portuguese passaporte, passaporte nº, passeport nº
English translation: Passport number, passport, Portuguese passport

Portugal Tax Identification Number
Language: Portuguese
Keywords: número identificação fiscal
English translation: Tax identification number

Portugal Value Added Tax (VAT) Number
Language: Portuguese
Keywords: imposto sobre valor acrescentado, VAT nº, número iva, vat não, código iva
English translation: Value added tax, VAT, VAT number, VAT code

Romania Driver's Licence Number
Language: Romanian
Keywords: permis de conducere, PERMIS DE CONDUCERE, Permis de conducere, numărul permisului de conducere, Numărul permisului de conducere
English translation: Driving license, driving license number

Romania National Identification Number
Language: Romanian
Keywords: numărul de identificare fiscală, identificarea fiscală nr #, codul fiscal nr.
English translation: fiscal identification number, tax identification number, fiscal code number

Romania Value Added Tax (VAT) Number
Language: Romanian
Keywords: CIF, cif, CUI, cui, TVA, tva, TVA#, tva#, taxa pe valoare adaugata, cod fiscal, cod fiscal de identificare, cod fiscal identificare, Cod Unic de Înregistrare, cod unic de identificare, cod unic identificare, cod unic de înregistrare, cod unic înregistrare
English translation: VAT, VAT #, value added tax, fiscal code, fiscal identification code, unique registration code, unique identification code, code unique registration

Romanian Numerical Personal Code
Language: Romanian
Keywords: Cod Numeric Personal, cod identificare personal, cod unic identificare, număr personal unic, număr identitate, număr identificare personal, număridentitate#, CodNumericPersonal#, numărpersonalunic#
English translation: Personal numeric code, personal identification code, unique identification code, identity number, personal identification number

Russian Passport Identification Number
Language: Russian
Keywords: паспорт нет, паспорт, номер паспорта, паспорт ID, Российской паспорт, Русский номер паспорта, паспорт#, паспортID#, номерпаспорта#
English translation: Passport no., passport, passport number, passport ID, Russian passport, Russian passport number

Russian Taxpayer Identification Number
Language: Russian
Keywords: НДС, номер налогоплательщика, Налогоплательщика ИД, налог число, налогчисло#, ИНН#, НДС#
English translation: TIN (tax identification number), taxpayer number, taxpayer ID, tax number

SEPA Creditor Identifier Number North
Language: Bulgarian, Finnish, French, German, Irish, Italian, Luxembourgish, Portuguese, Spanish
Keywords: SEPA-Gläubiger-Identifikator, Gläubiger-ID, SEPA-ID, Gläubiger-Kennung
English translation: SEPA creditor identifier, creditor ID, SEPA ID, creditor ID
Keywords: ID créancier, ID SEPA, Identifiant du créancie
English translation: Creditor ID, SEPA ID
Keywords: SEPA Krediter Identifizéierer, Kreditergeld, Krediter Identifizéierer
English translation: SEPA creditor identifier, crediting, creditor identification
Keywords: SEPA kreditoridentifikator, Kreditoridentifikator
English translation: SEPA creditor identifier, Creditor Identifier
Keywords: Velkojan tunnus, SEPA-tunnus, Velkojan tunniste
English translation: Creditor ID, SEPA ID, Creditor identifier
Keywords: ID Creidiúnaí, Aithnitheoir Creidiúnaí
English translation: Creditor ID, Creditor Identifier
Keywords: ID del creditore, Identificatore del creditore
English translation: Creditor ID, Creditor Identifier
Keywords: Identificador de acreedor SEPA, ID del acreedor, ID de SEPA, Identificador del acreedor
English translation: Creditor Identifier SEPA, Creditor ID, SEPA ID, Creditor Identifier
Keywords: Identificador Credor SEPA, Identificador do Credor
English translation: SEPA Creditor Identifier, Creditor Identifier

SEPA Creditor Identifier Number South
Language: Bulgarian, Finnish, French, German, Irish, Italian, Luxembourgish, Portuguese, Spanish
Keywords: SEPA-Gläubiger-Identifikator, Gläubiger-ID, SEPA-ID, Gläubiger-Kennung
English translation: SEPA creditor identifier, creditor ID, SEPA ID, creditor ID
Keywords: ID créancier, ID SEPA, Identifiant du créancie
English translation: Creditor ID, SEPA ID
Keywords: SEPA Krediter Identifizéierer, Kreditergeld, Krediter Identifizéierer
English translation: SEPA creditor identifier, crediting, creditor identification
Keywords: SEPA kreditoridentifikator, Kreditoridentifikator
English translation: SEPA creditor identifier, Creditor Identifier
Keywords: Velkojan tunnus, SEPA-tunnus, Velkojan tunniste
English translation: Creditor ID, SEPA ID, Creditor identifier
Keywords: ID Creidiúnaí, Aithnitheoir Creidiúnaí
English translation: Creditor ID, Creditor Identifier
Keywords: ID del creditore, Identificatore del creditore
English translation: Creditor ID, Creditor Identifier
Keywords: Identificador de acreedor SEPA, ID del acreedor, ID de SEPA, Identificador del acreedor
English translation: Creditor Identifier SEPA, Creditor ID, SEPA ID, Creditor Identifier
Keywords: Identificador Credor SEPA, Identificador do Credor
English translation: SEPA Creditor Identifier, Creditor Identifier

SEPA Creditor Identifier Number West
Language: Bulgarian, Finnish, French, German, Irish, Italian, Luxembourgish, Portuguese, Spanish
Keywords: SEPA-Gläubiger-Identifikator, Gläubiger-ID, SEPA-ID, Gläubiger-Kennung
English translation: SEPA creditor identifier, creditor ID, SEPA ID, creditor ID
Keywords: ID créancier, ID SEPA, Identifiant du créancie
English translation: Creditor ID, SEPA ID
Keywords: SEPA Krediter Identifizéierer, Kreditergeld, Krediter Identifizéierer
English translation: SEPA creditor identifier, crediting, creditor identification
Keywords: SEPA kreditoridentifikator, Kreditoridentifikator
English translation: SEPA creditor identifier, Creditor Identifier
Keywords: Velkojan tunnus, SEPA-tunnus, Velkojan tunniste
English translation: Creditor ID, SEPA ID, Creditor identifier
Keywords: ID Creidiúnaí, Aithnitheoir Creidiúnaí
English translation: Creditor ID, Creditor Identifier
Keywords: ID del creditore, Identificatore del creditore
English translation: Creditor ID, Creditor Identifier
Keywords: Identificador de acreedor SEPA, ID del acreedor, ID de SEPA, Identificador del acreedor
English translation: Creditor Identifier SEPA, Creditor ID, SEPA ID, Creditor Identifier
Keywords: Identificador Credor SEPA, Identificador do Credor
English translation: SEPA Creditor Identifier, Creditor Identifier

Serbia Unique Master Citizen Number
Language: Serbian
Keywords: јединствен мајстор грађанин Број, Јединствен матични број, јединствен број ид, Национални идентификациони број
English translation: Unique master citizen number, unique identification number, unique id number, National identification number

Serbia Value Added Tax (VAT) Number
Language: Serbian
Keywords: poreski identifikacioni broj, PORESKI IDENTIFIKACIONI BROJ, Poreski br., ПДВ број, Порез на додату вредност, PDV broj, Porez na dodatu vrednost, porez na dodatu vrednost, PDV, pdv, ПДВ, порески идентификациони број, PIB, pib, пиб, poreski broj, порески број
English translation: Tax identification number, VAT number, value added tax, VAT, identification number, tax number

Slovakia Driver's Licence Number
Language: Slovak
Keywords: vodičský preukaz, Vodičský preukaz, VODIČSKÝ PREUKAZ, číslo vodičského preukazu, ovládače lic., povolenie vodiča, povolenia vodičov, povolenie na jazdu, povolenie jazdu, číslo licencie
English translation: Driving license, license number

Slovakia National Identification Number
Language: Hungarian, Slovak
Keywords: identifikačné číslo, személyi igazolvány száma, személyigazolvány szám, číslo občianského preukazu, identifikačná karta č, személyi igazolvány szám, nemzeti személyi igazolvány száma, číslo národnej identifikačnej karty, národná identifikačná karta č, nemzeti személyazonosító igazolvány, nemzeti azonosító szám, národné identifikačné číslo, národná identifikačná značka č, nemzeti azonosító szám, azonosító szám, identifikačné číslo
English translation: ID number, identity card number, national identity card number, national identification number, identification number, ID card number, identification card, national identity card

Slovakia Passport Number
Language: French, Slovak
Keywords: PASSEPORT, passeport, cestovný pas, číslo pasu, pas č, Číslo pasu, PAS, CESTOVNÝ PAS, Passeport n°
English translation: Passport, passport number, passport no

Slovakia Value Added Tax (VAT) Number
Language: Slovak
Keywords: číslo DPH, číslo dane z pridanej hodnoty, identifikačné číslo vat, dph, DPH, daň z pridanej hodnoty, daň pridanej hodnoty, číslo dane pridanej hodnoty, identifikačné číslo DPH
English translation: VAT number, value added tax number, VAT, value added tax, VAT identification number

Slovenia Passport Number
Language: French, Slovenian
Keywords: številka potnega lista, potni list, knjiga potnega lista, potni list #, passeport, Passeport
English translation: Passport number, passport, passport book, passport #

Slovenia Tax Identification Number
Language: Slovenian
Keywords: identifikacijska številka davka, Slovenska davčna številka, Davčna številka
English translation: Tax identification number, Slovenian tax number, tax number

Slovenia Unique Master Citizen Number
Language: Slovenian
Keywords: EMŠO, emšo, edinstvena številka državljana, enotna identifikacijska številka, Enotna maticna številka obcana, enotna maticna številka obcana, številka državljana, edinstvena identifikacijska številka
English translation: Unique national number, unique identification number, uniform registration number, unique registration number, citizen's number, unique identification number

Slovenia Value Added Tax (VAT) Number
Language: Slovenian
Keywords: številka davka na dodano vrednost, DDV št, slovenia vat št
English translation: Value added tax number, VAT no, Slovenia vat no

South African Personal Identification Number
Language: Afrikaans
Keywords: nasionale identifikasie nommer, nasionale identiteitsnommer, versekering aantal, persoonlike identiteitsnommer, unieke identiteitsnommer, identiteitsnommer, identiteitsnommer#, versekeringaantal#, nasionaleidentiteitsnommer#
English translation: National identification number, national identity number, insurance number, personal identity number, unique identity number, identity number

South Korea Resident Registration Number
Language: Korean
Keywords: 주민등록번호, 주민번호
English translation: Resident Registration Number, Resident Number

Spain Driver's License Number
Language: Spanish
Keywords: permiso de conducción, permiso conducción, Número licencia conducir, Número de carnet de conducir, Número carnet conducir, licencia conducir, Número de permiso de conducir, Número de permiso conducir, Número permiso conducir, permiso conducir, licencia de manejo, el carnet de conducir, carnet conducir
English translation: Driver's license, driver's license number, driving license, driving permit, driving permit number

Spain Value Added Tax (VAT) Number
Language: Spanish
Keywords: Número IVA españa, Número de IVA español, español Número IVA, Número de valor agregado, IVA, Número IVA, Número impuesto sobre valor añadido, Impuesto valor agregado, Impuesto sobre valor añadido, valor añadido el impuesto, valor añadido el impuesto numero
English translation: Spain VAT number, Spanish VAT number, VAT Number, VAT, value added tax number, value added tax

Spanish Customer Account Number
Language: Spanish
Keywords: número cuenta cliente, código cuenta, cuenta cliente ID, número cuenta bancaria cliente, código cuenta bancaria
English translation: Customer account number, account code, customer account ID, customer bank account number, bank account code

Spanish DNI ID
Language: Spanish
Keywords: NIE número, Documento Nacional de Identidad, Identidad único, Número nacional identidad, DNI Número
English translation: NIE number, national identity document, unique identity, national identity number, DNI number

Spanish Passport Number
Language: Spanish
Keywords: libreta pasaporte, número pasaporte, Número Pasaporte, España pasaporte, pasaporte
English translation: passport book, passport number, Spanish passport, passport

Spanish Social Security Number
Language: Spanish
Keywords: Número de la Seguridad Social, número de la seguridad social
English translation: Social security number

Spanish Tax ID (CIF)
Language: Spanish
Keywords: número de contribuyente, número de impuesto corporativo, número de Identificación fiscal, CIF número, CIFnúmero#
English translation: taxpayer number, corporate tax number, tax identification number, CIF number

Sri Lanka National Identity Number
Language: Sinhala
Keywords: See the user interface.
English translation: ID, national identity number, personal identification number, National Identity Card number

Sweden Driver's License Number
Language: Finnish, Romani, Swedish, Yiddish
Keywords: ajokortti, permis de conducere, ajokortin numero, kuljettajat lic., drivere lic., körkort, numărul permisului de conducere, ‫שאָפער דערלויבעניש‬ ‫נומער‬, körkort nummer, förare lic., ‫דריווערס דערלויבעניש‬, körkortsnummer
English translation: Driver's license, driver's license number, driving license number

Sweden Personal Identification Number
Language: Swedish
Keywords: personnummer ID, personligt id-nummer, unikt id-nummer, personnummer, identifikationsnumret, personnummer#, identifikationsnumret#
English translation: ID number, personal ID number, unique ID number, personal, identification number

Sweden Tax Identification Number
Language: Swedish
Keywords: skattebetalarens identifikationsnummer, Sverige TIN, TIN-nummer
English translation: Tax identification number, Swedish TIN, TIN number

Sweden Value Added Tax (VAT) Number
Language: Swedish
Keywords: moms#, sverige moms, sverige momsnummer, sverige moms nr, sweden vat nummer, sweden momsnummmer, momsregistreringsnummer
English translation: Swedish VAT, Swedish VAT number, VAT registration number

Swedish Passport Number
Language: Swedish
Keywords: Passnummer, pass, sverige pass, SVERIGE PASS, sverige Passnummer
English translation: Passport number, passport, Swedish passport, Swedish passport number

Switzerland Health Insurance Card Number
Language: German, Italian
Keywords: medizinische Kontonummer, Krankenversicherungskarte Nummer, numero conto medico, tessera sanitaria assicurazione numero, assicurazione sanitaria numero
English translation: Medical account number, health insurance card number, health insurance number

Switzerland Passport Number
Language: French, German, Italian
Keywords: Passeport, passeport, numéro passeport, numéro de passeport, passeport#, No de Passport, passport Number, passeport, No de passeport., passport #
English translation: Passport, passport number, passport #, passport book
Keywords: Numéro de passeport, PASSEPORT, LIVRE DE PASSEPORT
English translation: Passport, passport number, passport no., passport #
Keywords: Pass, Passnummer, Pass#, Pass Nr., Pass Nr, PASS
English translation: Passport, passport #
Keywords: Passaporto, Numero di passaporto, passaporto, Passaporto n, Passaporto n., passaporto#, Passaport, numero passaporto, numero di passaporto, numero passaporto, passaporto n, PASSAPORTO
Keywords: Reisepass, Reisepass#, REISEPASS

Switzerland Value Added Tax (VAT) Number
Language: French, German, Italian
Keywords: T.V.A, numéro TVA, T.V.A#, numéro taxe valeur ajoutée, T.V.A., taxe sur la valeur ajoutée, T.V.A#, numéro enregistrement TVA, Numéro TVA
English translation: VAT, VAT number, VAT #, value added tax number, value added tax, VAT registration number
Keywords: I.V.A, Partita IVA, I.V.A#, numero IVA
English translation: VAT, VAT number, VAT #
Keywords: MwSt, Umsatzsteuer-Identifikationsnummer, MwSt#, Mehrwertsteuer-Nummer, Mehrwertsteuer, VAT Registrierungsnummer, Umsatzsteuer-Identifikationsnummer
English translation: VAT, VAT registration number, VAT #, VAT number

Swiss AHV Number
Language: French
Keywords: Numéro AVS, numéro d'assuré, identifiant national, numéro d'assurance vieillesse, numéro de sécurité soclale, Numéro AVH
English translation: AVS number, insurance number, national identifier, national insurance number, social security number, AVH number
Language: German
Keywords: AHV-Nummer, Matrikelnumme, Personenidentifikationsnummer
English translation: AHV number, Swiss Registration number, PIN
Language: Italian
Keywords: AVS, AVH
English translation: AVS, AVH


Swiss Social Security Number (AHV)
Language: French, German, Italian
Keywords: Identifikationsnummer, sozialversicherungsnummer, identification personnelle ID, Steueridentifikationsnummer, Steuer ID, codice fiscale, Steuernummer
English translation: Identification number, social security number, personal identification ID, tax identification number, tax ID, social security number, tax number

Taiwan ROC ID
Language: Chinese (Traditional)
Keywords: 中華民國國民身分證
English translation: Taiwan ID

Thailand Passport Number
Language: Thai
Keywords: หนังสือเดินทาง, หมายเลขหนังสือเดินทาง
English translation: Passport, passport number

Thailand Personal ID Number
Language: Thai
Keywords: ประกันภัยจำนวน, หมายเลขประจำตัวส่วนบุคคล, หมายเลขประจำตัวที่ไม่ซ้ำกัน, ประกันภัยจำนวน#, หมายเลขประจำตัวส่วนบุคคล#, หมายเลขประจำตัวที่ไมซ้ำกัน#
English translation: Insurance number, personal identification, identification number

Turkish Identification Number
Language: Turkish
Keywords: Kimlik Numarası, Türkiye Cumhuriyeti Kimlik Numarası, vatandaş kimliği, kişisel kimlik no, kimlik Numarası#, vatandaş kimlik numarası, Kişisel kimlik Numarası
English translation: Identification number, Turkish Republic identification number, citizen identity, personal identification number, citizen identification number

Ukraine Identity Card
Language: Ukrainian
Keywords: посвідчення особи України
English translation: Ukraine identity card

Ukraine Passport Number (Domestic)
Language: Ukrainian
Keywords: паспорт, паспорт України, номер паспорта, персональний
English translation: Passport, Ukraine passport, passport number

Ukraine Passport Number (International)
Language: Ukrainian
Keywords: паспорт, паспорт України, номер паспорта
English translation: Passport, Ukraine passport, passport number

United Arab Emirates Personal Number
Language: Arabic
Keywords: ‫فريدة‬,‫رقم التعريف الشخصي‬,‫ الهوية الشخصية رقم‬, ‫هوية‬,‫التأمينرقم‬,‫التأمين رقم‬,‫ من نوعها هوية رقم‬, ‫فريدة‬#
English translation: Personal ID Number, PIN, Unique ID Number, Insurance Number, Unique Identity #

Venezuela National ID Number
Language: Spanish
Keywords: cédula de identidad número, clave única de identidad, personal de identidad clave, personal de identidad, número de identificación nacional, número ID nacional
English translation: National ID number, national identification number, personal ID number, personal identification, unique identification number
Detecting content using data identifiers 810
Modifying system data identifiers

Updating policies to use the Randomized US SSN data identifier

The Randomized US Social Security Number (SSN) data identifier detects both traditional and
randomized SSNs.
See “Use the Randomized US SSN data identifier to detect SSNs” on page 836.
All policy templates that previously used the US Social Security Number (SSN) data identifier
to detect SSNs are updated to use the Randomized US Social Security Number (SSN) data
identifier.
See “Updating policies after upgrading to the latest version” on page 447.
If you have existing policies that use the US SSN data identifier to detect SSNs, you should
update each policy to use the Randomized US SSN data identifier. If you have created policies
using the version 12.5 Randomized US SSN data identifier, you should update each to use
the latest version of the Randomized US SSN data identifier.
The following procedure provides steps for updating your SSN policies.
To update a policy to use the Randomized US SSN data identifier
1 Edit the policy that implements the US SSN data identifier or the 12.5 Randomized US
SSN data identifier.
See “Configuring policies” on page 413.
2 Edit the rule that contains the US SSN data identifier.
See “Configuring policy rules” on page 417.
3 Remove the US SSN data identifier.
4 Add the Randomized US SSN data identifier.
See “Managing and adding data identifiers” on page 735.
5 Save the policy.

6 Test policy detection for both traditional and randomized US SSNs.
See “Test and tune policies to improve match accuracy” on page 453.
7 Deploy the updated SSN policy into production.
See “Policy deployment” on page 373.

Creating custom data identifiers

You can create and delete one or more custom data identifiers. A custom data identifier may
be a system data identifier that you have cloned and intend to modify, or one that you create
from scratch. A custom data identifier is reusable across policies. Changes that you make to a custom
data identifier at the system level affect every policy that currently uses the custom data
identifier or that uses it later.
Table 31-23 lists the components of custom data identifiers.
See “Workflow for creating custom data identifiers” on page 812.

Table 31-23 Custom data identifier components

Component Description

Patterns Define one or more data identifier pattern language patterns, separated by line breaks.

See “About data identifier patterns” on page 732.

See “Using the data identifier pattern language” on page 814.

Data Normalizer Select a data normalizer to standardize the data before matching against it.

See “Selecting a data normalizer” on page 830.

Validators Add or remove validators to perform validation checks on the data detected by the
pattern(s).

See “About pattern validators” on page 733.

Validation Checks Select system-provided validation checks to add them to your list of Active Validators.

See “About pattern validators” on page 733.

Description and Data Entry Provide comma-separated data values for any validators that require data input.

See “About pattern validators” on page 733.


Table 31-23 Custom data identifier components (continued)

Component Description

Pre- and Post-Validators Pre- and post-validators define characters and character ranges that are valid before
or after a data identifier pattern.

See “Configuring pre- and post-validators” on page 831.

Workflow for creating custom data identifiers

You can implement custom data identifiers to detect unique content. To implement a custom
data identifier, you must define at least one pattern and select a data normalizer. Validators
are optional.
See “Custom data identifier configuration” on page 814.
When you define a custom data identifier, the system assigns it to the "Wide" breadth by
default. This is not a limitation, however, because the actual scope of detection is determined
by the pattern(s) and validator(s) that you define.

Table 31-24 Implementing custom data identifiers

Step Action Description

1 Select Manage > Policies > Data Identifiers.
The Data Identifiers screen lists all data identifiers available in the system.

2 Select Add data identifier.
Enter a Name for the custom data identifier. The name must be unique.
Enter a Description for the custom data identifier.
A custom data identifier is assigned to the Custom category by default, and the category cannot be changed.
The description field is limited to 255 characters per line.

3 Enter one or more Patterns to match data.
You must enter at least one pattern for the custom data identifier to be valid.
Separate multiple patterns by line breaks.
See “Writing data identifier patterns to match data” on page 817.
See “Using the data identifier pattern language” on page 814.

4 Select a Data Normalizer.
You must select a data normalizer.
See “Selecting a data normalizer” on page 830.
The following normalizers are available:
■ Digits
■ Digits and Letters
■ Lowercase
■ Swift codes
■ Do nothing
Select this option if you do not want to normalize the data.

5 Select zero or more Validation Checks.
Including a validator to check and verify pattern matching is optional.
See “Selecting pattern validators” on page 829.

6 Pre- and Post-Validators: Specify characters or character ranges that are valid or invalid before or after a data identifier pattern.
Pre- and post-validators are required. You can accept the default values or edit them as necessary.
See “Configuring pre- and post-validators” on page 831.

7 Save the custom data identifier.
Click Save at the upper left of the screen.
Once you define and save a custom data identifier, it appears alphabetically in the list of data identifiers on the Data Identifiers screen.
To edit a custom data identifier, select it from the list.
See “Editing data identifiers” on page 736.
Note: Click Cancel to discard the custom data identifier without saving it.

8 Implement the custom data identifier in one or more policies.
The system lists all custom data identifiers beneath the Custom category for the "Content Matches data identifier" condition on the Configure Policy - Add Rule and Configure Policy - Add Exception screens.
See “Configuring the Content Matches data identifier condition” on page 737.
You can configure optional validators at the policy instance level for custom data identifiers.
See “Configuring optional validators” on page 763.


Custom data identifier configuration

You can create and delete one or more custom data identifiers. A custom data identifier can
be used across policies. Changes that you make to a custom data identifier at the system level affect
every policy that currently uses the custom data identifier or that uses it later.
See “Workflow for creating custom data identifiers” on page 812.

Table 31-25 Custom data identifier configuration

Configurable at the custom level:

■ Name and Description
You must give a custom data identifier a unique name. It is good practice to provide a description
for the custom data identifier. You can change the name or description of a custom data identifier
when you modify it.
■ Patterns
You must define at least one pattern for the custom data identifier to be valid.
■ Active Validators
You can add one or more required validators to a custom data identifier.
■ Description and Data Entry
You can edit the input of an active validator that accepts data input.
■ Data Normalizer
You must select a data normalizer when defining a custom data identifier.
■ Pre- and Post-Validators
You can edit the values for the valid and invalid pre- and post-validator characters.

Not configurable:

■ Category
The system assigns a custom data identifier to the Custom category. You cannot change this setting.
■ Breadth
The system assigns a custom data identifier to the Wide rule breadth. You cannot change this setting.
■ Optional Validators
Custom data identifiers support all optional validators, but they are configured at the policy
instance level.
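The constraints in Table 31-24 and Table 31-25 can be summarized as a small data model. The following Python sketch is hypothetical: `CustomDataIdentifier` and `definition_errors` are illustrative names, not part of any Symantec Data Loss Prevention API, and the exact, case-sensitive comparison used to test name uniqueness is an assumption.

```python
from dataclasses import dataclass, field

# Data normalizers offered when you create a custom data identifier.
NORMALIZERS = {"Digits", "Digits and Letters", "Lowercase", "Swift codes", "Do nothing"}
MAX_DESCRIPTION_LINE = 255  # characters per line in the Description field


@dataclass
class CustomDataIdentifier:
    """Hypothetical model of a custom data identifier definition."""

    name: str
    patterns: list[str]
    normalizer: str
    description: str = ""
    active_validators: list[str] = field(default_factory=list)  # optional
    # Assigned by the system and not configurable for custom data identifiers.
    category: str = field(default="Custom", init=False)
    breadth: str = field(default="Wide", init=False)


def definition_errors(identifier: CustomDataIdentifier, existing_names: set[str]) -> list[str]:
    """Return the reasons a definition would be rejected; an empty list means none were found."""
    errors = []
    if not identifier.name.strip():
        errors.append("a name is required")
    elif identifier.name in existing_names:  # assumption: exact, case-sensitive match
        errors.append(f"the name {identifier.name!r} is already in use")
    if not any(pattern.strip() for pattern in identifier.patterns):
        errors.append("at least one pattern is required")
    if identifier.normalizer not in NORMALIZERS:
        errors.append(f"unknown data normalizer {identifier.normalizer!r}")
    for number, line in enumerate(identifier.description.splitlines(), start=1):
        if len(line) > MAX_DESCRIPTION_LINE:
            errors.append(f"description line {number} exceeds {MAX_DESCRIPTION_LINE} characters")
    return errors


employee_id = CustomDataIdentifier(
    name="Employee ID",        # hypothetical identifier name
    patterns=["\\d{6}"],       # hypothetical pattern
    normalizer="Do nothing",
)
print(definition_errors(employee_id, existing_names={"Employee Number"}))  # prints []
```

The category and breadth fields are excluded from the constructor (`init=False`) to mirror the fact that you cannot set them for a custom data identifier.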

Using the data identifier pattern language

The data identifier pattern language is a limited subset of the regular expression lexicon. It does
not support all regular expression characters and constructs, so a regular expression pattern
requires some syntax changes before you can use it as a data identifier pattern.
Each line of a data identifier pattern is limited to 100 characters. A pattern can be longer than
100 characters overall, but you must split it across lines so that no single line exceeds
100 characters.

See “Input character limits for policy configuration” on page 431.
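The per-line limit, together with the ASCII-only rule described in “Writing data identifier patterns to match data” on page 817, is easy to check before you paste patterns into the Patterns field. The following Python sketch is illustrative only: `lint_pattern_field` is a hypothetical helper, and because it does not parse the pattern language it reports the regular expression constructs listed in Table 31-26 as warnings rather than errors.

```python
MAX_LINE_LENGTH = 100  # characters allowed on one line of the Patterns field

# Regular expression constructs that Table 31-26 lists as unsupported or different.
REGEX_ONLY_CONSTRUCTS = ("*", "|", ".", "\\s")


def lint_pattern_field(field_text: str) -> list[str]:
    """Return warnings for a Patterns field that holds one pattern per line."""
    warnings = []
    for number, line in enumerate(field_text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            warnings.append(f"line {number}: {len(line)} characters (limit {MAX_LINE_LENGTH})")
        if not line.isascii():
            warnings.append(f"line {number}: contains non-ASCII characters")
        for construct in REGEX_ONLY_CONSTRUCTS:
            if construct in line:
                warnings.append(f"line {number}: uses regular expression construct {construct!r}")
    return warnings


print(lint_pattern_field("\\d{3} - \\d{2} - \\d{4}\n\\d{9}"))  # prints []
```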

Table 31-26 lists the known differences between regular expressions and the data identifier
pattern language. For more detailed information about the data identifier pattern language,
see “Data identifier pattern language specification” on page 815.

Table 31-26 Data identifier pattern language limitations

Character Description

*, |, . The asterisk (*), pipe (|), and dot (.) characters are not supported for data identifier
patterns.

\w The \w construct cannot be used to match the underscore character (_).

\s The \s construct cannot be used to match a whitespace character; instead, use an actual
whitespace character.

\d For digits, use the construct \d.

Grouping Grouping only works at the beginning of the pattern, for example:

\d{4} - 2049 does not work; instead use 2049 - \d{4}

\d{2} /19 \d{2} does not work; instead use \d{2} /[1][9] \d{2}

Groupings are allowed at the beginning of the pattern, as in the credit card data identifier.

Data identifier pattern language specification

You can use three types of tokens when defining a data identifier pattern. A token is a sequence
of non-whitespace characters that is either at the beginning of the file or preceded by one or more
whitespace characters, and that is followed by whitespace characters or the end of the file. The
three token types that are used in data identifier patterns are:
■ Character literals
■ Bracket expressions
■ Special characters
You can follow each token by an optional quantifier.
See the section called “Quantifiers” on page 817.
Data identifier patterns only match a complete token or set of tokens.

Literal characters, metacharacters, and special characters

Most characters are literal matches in the data identifier pattern language. For example, the
character a in the data identifier pattern matches the character a in your content. The data
identifier pattern language includes four metacharacters. To match these metacharacters as
character literals, use the backslash to escape the characters in your data identifier pattern.
See Table 31-27 for descriptions of these metacharacters.

Table 31-27 Metacharacters

Character Description

[ This character is used to begin a bracket expression.

{ This character is used to quantify the preceding token.

? This character is used to quantify the preceding token.

\ This character is used to escape the following character.

The data identifier pattern language includes five predefined special characters. See Table 31-28
for descriptions of these special characters.

Table 31-28 Special characters

Character Description

\l This special character matches any ASCII letter.

\L This special character matches any non-ASCII letter character, including Unicode characters.

\d This special character matches any ASCII digit.

\D This special character matches any non-ASCII digit, including Unicode characters.

\w This special character matches any character not matched by \l or \d, including Unicode characters.

Bracket expressions
Bracket expressions begin with [ and end with ], and contain at least one character within the
body of the expression. For example, the bracket expression [abcd] matches any of the
letters "a," "b," "c," or "d."
You can include a character range within a bracket expression by separating two characters
with a hyphen: -. For example, the bracket expression [a-z] matches the lower-case letters
"a" through "z". Any two characters separated by - are interpreted as a range. The relative
ordering of the range does not matter: [a-z] and [z-a] match the same characters.
You can include the characters "]" and "-" in your bracket expression if you follow these rules:

■ The "]" character must appear as the first character in your bracket expression. For example:
[]a-z] matches the "]" character or any lower-case letter between "a" and "z."

■ The "-" character must appear as either the first or last character in your bracket expression.
If your bracket expression contains both the "]" and "-" characters, the "]" must be the first
character, and "-" the last character. For example: []-] matches either "]" or "-."

Order of interpretation
Data identifier patterns are interpreted from left to right. For example, the bracket expression
[a-d-z] is interpreted as the range a-d and then the literals - and z.
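The bracket expression rules and the left-to-right interpretation can be expressed as a short parser. This Python sketch is an illustration under stated assumptions, not the product's parser: `parse_bracket_expression` and `bracket_matches` are hypothetical helpers that take one complete bracket expression and return inclusive character ranges, and they do not handle backslash escapes inside brackets because this section does not describe them.

```python
def parse_bracket_expression(token: str) -> list[tuple[str, str]]:
    """Parse a bracket expression such as "[a-z]" into inclusive (low, high) ranges.

    The body is read from left to right: "x-y" becomes a range when a character
    follows the hyphen; otherwise each character, including "-", is a literal.
    """
    if len(token) < 3 or token[0] != "[" or token[-1] != "]":
        raise ValueError(f"not a bracket expression with a non-empty body: {token!r}")
    body = token[1:-1]
    if "]" in body[1:]:
        raise ValueError('"]" must be the first character of the bracket expression')
    ranges = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # Range: the order of the endpoints does not matter ([z-a] == [a-z]).
            low, high = sorted((body[i], body[i + 2]))
            ranges.append((low, high))
            i += 3
        else:
            ranges.append((body[i], body[i]))  # single literal character
            i += 1
    return ranges


def bracket_matches(ranges: list[tuple[str, str]], char: str) -> bool:
    """Return True if char falls inside any of the parsed ranges."""
    return any(low <= char <= high for low, high in ranges)


print(parse_bracket_expression("[a-d-z]"))  # [('a', 'd'), ('-', '-'), ('z', 'z')]
```

Because a range is only formed when a character follows the hyphen, a trailing "-" (as in []-]) falls out as a literal without a special case.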

Quantifiers
You can follow any token in your data identifier pattern with a quantifier. The quantifier specifies
how many occurrences of the preceding token to match. See Table 31-29 for a description of the
quantifiers available in the data identifier pattern language.

Table 31-29 Quantifiers

Quantifier Description

? This quantifier specifies that the expression should match zero or one
occurrences of the preceding token.

{n} This quantifier specifies that the expression should match exactly n occurrences
of the preceding token.

{n, m} This quantifier specifies that the expression should match between n and m
occurrences of the preceding token (inclusive).
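To show how quantifiers combine with individual tokens, the following Python sketch translates a single character literal, an escaped metacharacter, or the \l or \d special character, plus an optional quantifier, into a Python `re` fragment. It is a simplified model built from Tables 31-27 through 31-29, not the product's matcher: `translate_token` is a hypothetical helper, it assumes \l means [A-Za-z] and \d means [0-9], it does not model \L, \D, \w, or bracket expressions, and the usage line joins tokens with no separator because this section does not say how whitespace between tokens is matched.

```python
import re

# Assumed Python equivalents of two special characters (ASCII letters, ASCII digits).
SPECIAL_CHARACTERS = {"\\l": "[A-Za-z]", "\\d": "[0-9]"}
METACHARACTERS = "[{?\\"
# {n} or {n,m}; optional whitespace after the comma, as in the "{n, m}" notation.
BRACE_QUANTIFIER = re.compile(r"\{([0-9]+)(?:,\s*([0-9]+))?\}")


def translate_token(token: str) -> str:
    """Translate one token plus an optional quantifier into a Python regex fragment."""
    # Step 1: read the token itself.
    if token.startswith("\\"):
        escape, rest = token[:2], token[2:]
        if escape in SPECIAL_CHARACTERS:
            fragment = SPECIAL_CHARACTERS[escape]
        elif len(escape) == 2 and escape[1] in METACHARACTERS:
            fragment = re.escape(escape[1])  # an escaped metacharacter is a literal
        else:
            raise ValueError(f"{escape!r} is not modeled in this sketch")
    else:
        literal, rest = token[:1], token[1:]
        if not literal or literal in METACHARACTERS:
            raise ValueError(f"{token!r} does not start with a character literal")
        fragment = re.escape(literal)
    # Step 2: read the optional quantifier that follows the token.
    if rest == "":
        return fragment
    if rest == "?":
        return fragment + "?"  # zero or one occurrence
    match = BRACE_QUANTIFIER.fullmatch(rest)
    if match is None:
        raise ValueError(f"unsupported quantifier {rest!r}")
    low, high = match.group(1), match.group(2)
    if high is None:
        return f"{fragment}{{{low}}}"  # exactly n occurrences
    if int(high) < int(low):
        raise ValueError(f"{rest!r}: m must not be less than n")
    return f"{fragment}{{{low},{high}}}"  # between n and m occurrences, inclusive


ssn_like = "".join(translate_token(t) for t in ["\\d{3}", "-", "\\d{2}", "-", "\\d{4}"])
print(bool(re.fullmatch(ssn_like, "234-56-7890")))  # True
```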

Writing data identifier patterns to match data

If you modify an existing data identifier, you can edit its patterns. If you create a custom data
identifier, you must implement at least one pattern. Data identifier patterns are implemented
using a syntax that is similar to the regular expression language, with limitations. In addition,
the system only allows the use of ASCII characters for data identifier patterns.
See “About data identifier patterns” on page 732.
See “Data identifier pattern language specification” on page 815.
To edit or implement a pattern
1 Review the patterns for the data identifier you want to modify.
See “Selecting a data identifier breadth” on page 739.
2 Consider cloning the data identifier, if you are modifying a system data identifier.
See “Cloning a system data identifier before modifying it” on page 777.

3 Select Manage > Policies > Data Identifiers in the Enforce Server administration console.
4 Select the data identifier you want to modify.
5 Select the breadth for the data identifier you want to modify.
Generally, patterns vary among detection breadths.
6 In the Patterns field, modify an existing pattern, or enter one or more new patterns,
separated by line breaks.
Data identifier patterns use a syntax that is similar to regular expressions. However, much of the
regular expression syntax is not supported.
See “Using the data identifier pattern language” on page 814.
7 Click Save to save the data identifier.

Using pattern validators

The following table lists all available pattern validators. Validators marked with an asterisk (*)
beside the name in the table below require data input.
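Several validators in the following table are described by their rules. As an illustration only, this Python sketch re-implements three of them from their table descriptions: Luhn Check, ABA Checksum, and Advanced SSN. It is not Symantec's code, and the function names are hypothetical. The 3-7-1 digit weighting used for ABA routing numbers is the standard ABA check-digit scheme (the table says only "position-weighted"). For Advanced SSN, the sketch assumes that the accepted delimiters are a hyphen, a space, or none, and it reads "zeros in any group" as a group made up entirely of zeros.

```python
import re

DIGITS = "0123456789"


def luhn_check(number: str) -> bool:
    """Luhn Check: double every second digit from the right; the total must end in 0."""
    digits = [int(c) for c in number if c in DIGITS]  # ignore spaces and dashes
    if not digits:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# ABA Checksum: allowed two-digit prefixes 00-15, 21-32, 61-72, and 80.
ABA_PREFIXES = set(range(0, 16)) | set(range(21, 33)) | set(range(61, 73)) | {80}


def aba_checksum(routing: str) -> bool:
    if len(routing) != 9 or any(c not in DIGITS for c in routing):
        return False
    if int(routing[:2]) not in ABA_PREFIXES:
        return False
    weights = (3, 7, 1) * 3  # position weights for the nine digits
    return sum(int(d) * w for d, w in zip(routing, weights)) % 10 == 0


# Assumption: groups are separated by a hyphen, a space, or nothing.
SSN = re.compile(r"([0-9]{3})([- ]?)([0-9]{2})([- ]?)([0-9]{4})")


def advanced_ssn(candidate: str) -> bool:
    match = SSN.fullmatch(candidate)
    if match is None:
        return False
    area, first_delimiter, group, second_delimiter, serial = match.groups()
    if first_delimiter != second_delimiter:
        return False  # the same delimiter must separate all groups
    if area == "000" or group == "00" or serial == "0000":
        return False  # assumption: no group may consist entirely of zeros
    if int(area) >= 773 or area == "666":
        return False
    digits = area + group + serial
    if len(set(digits)) == 1:
        return False  # all the same digit
    if digits == "123456789" or digits.startswith("98765432"):
        return False  # reserved for advertising: 123-45-6789, 987-65-432x
    return True


print(luhn_check("4111 1111 1111 1111"), aba_checksum("021000021"), advanced_ssn("234-56-7890"))
# True True True
```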

Table 31-30 Available validators for system and custom data identifiers

Validator Description

ABA Checksum Every ABA routing number must start with one of the following two-digit prefixes:
00-15, 21-32, 61-72, or 80, and must pass an ABA-specific, position-weighted checksum.

Advanced KRRN Validation Validates that the 3rd and 4th digits are a valid month, that the 5th and 6th digits are a valid
day, and that the checksum matches the check digit.

Advanced SSN Checks that no group of the SSN consists entirely of zeros, that the area number (first
group) is less than 773 and is not 666, that the same delimiter separates all groups,
that the number does not consist of all the same digit, and that the number is not reserved
for advertising (123-45-6789, 987-65-432x).

Argentinian Tax Identity Computes the checksum and validates the pattern against it.
Number Validation Check

Australian Business Number Computes the checksum and validates the pattern against it.
Validation Check

Australian Company Number Computes the checksum and validates the pattern against it.
Validation Check

Australian Medicare Number Computes the checksum and validates the pattern against it.
Validation Check

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Australian Tax File validation Computes the checksum and validates the pattern against it.
check

Austria VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Austrian Social Security Computes the checksum and validates the pattern against it.
Number Validation Check

Basic SSN Performs minimal SSN validation.

Belgian National Number Computes the checksum and validates the pattern against it.
Validation Check

Belgian Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Belgium VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Brazil Election Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Brazilian National Registry of Computes the checksum and validates the pattern against it.
Legal Entities Number
Validation Check

Brazilian Natural Person Computes the checksum and validates the pattern against it.
Registry Number Validation
Check

British Columbia Personal Computes the checksum and validates the pattern against it.
Healthcare Number Validation
Check

Bulgaria Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

Bulgarian Uniform Civil Computes the checksum and validates the pattern against it.
Number Validation Check

Burgerservicenummer Check Performs a check for the Burgerservicenummer.

Canada Driver's License Computes the checksum and validates the pattern against it.
Number Check

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Chilean National Identification Computes the checksum and validates the pattern against it.
Number Validation Check

China ID checksum validator Computes the checksum and validates the pattern against it.

Codice Fiscale Control Key Computes the control key and checks if it is valid.
Check

Croatia National Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Cusip Validation Validator checks for invalid CUSIP ranges and computes the CUSIP checksum
(Modulus 10 Double Add Double algorithm).

Custom Script* Enter a custom script to validate pattern matches for this data identifier breadth.

See “Creating custom script validators” on page 831.

Cyprus Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Cyprus Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

Czech Personal Identity Computes the checksum and validates the pattern against it.
Number Validation Check

Czech Republic Tax Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Czech Republic VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Denmark Personal Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Denmark Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Denmark VAT Number Computes the checksum and validates the pattern against it.
Validation Check

DNI control key check Computes the control key and checks if it is valid.

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Driver's License Number WA Computes the checksum and validates the pattern against it.
State Validation Check

Driver's License Number WI Computes the checksum and validates the pattern against it.
State Validation Check

Drug Enforcement Agency Computes the checksum and validates the pattern against it.
Number Validation Check

Duplicate digits Ensures that a string of digits does not consist of a single repeated digit.

Dutch Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Estonia Personal Computes the checksum and validates the pattern against it.
Identification Number Check

Estonia Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

Exact Match* Enter a comma-separated list of values. If the values are numeric, do NOT enter
any dashes or other separators. Each value can be of any length.

Exact Match Data Identifier Looks up tokens around a pattern for the Exact Match Data Identifier index and
Check validates the pattern.

Exclude beginning Enter a comma-separated list of values. If the values are numeric, do NOT enter
characters* any dashes or other separators. Each value can be of any length.
Note: Beginning and ending validators concern the text of the match itself. Prefix
and suffix validators concern characters before and after matched text.

Exclude ending characters* Enter a comma-separated list of values. If the values are numeric, do NOT enter
any dashes or other separators. Each value can be of any length.

Exclude exact match* Enter a comma-separated list of values. Each value can be of any length.

Exclude prefix* Enter a comma-separated list of values. Each value can be of any length.
Note: Prefix and suffix validators concern characters before and after matched text.
Beginning and ending validators concern the text of the match itself.

Exclude suffix* Enter a comma-separated list of values. Each value can be of any length.

Find keywords* Enter a comma-separated list of values. Each value can be of any length.

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Finland Driver's Licence Computes the checksum and validates the pattern against it.
Number Validation Check

Finland Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Finland VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Finnish Personal Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

France VAT Number Computes the checksum and validates the pattern against it.
Validation Check

French Social Security Computes the checksum and validates the pattern against it.
Number Validation Check

German ID Number Validation Computes the checksum and validates the pattern against it.
Check

German Passport Number Computes the checksum and validates the pattern against it.
Validation Check

Germany Tax Number Computes the checksum and validates the pattern against it.
Validation Check

Germany VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Greece Social Security Computes the checksum and validates the pattern against it.
Number (AMKA)

Greece VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Greek Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

HCPCS CPT Code Validation Computes the checksum and validates the pattern against it.
Check

Health Care Insurance Computes the checksum and validates the pattern against it.
Number Check

Hong Kong ID Computes the checksum and validates the pattern against it.

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Hungarian Social Security Computes the checksum and validates the pattern against it.
Validation Check

Hungarian Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Hungarian VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Hungary Passport Number Computes the checksum and validates the pattern against it.
Validation Check

Iceland National Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Indonesian Kartu Tanda Computes the checksum and validates the pattern against it.
Penduduk Validation Check

INSEE Control Key Validator computes the INSEE control key and compares it to the last 2 digits of the
pattern.

IP Basic Check Every IP address must match the format x.x.x.x and every number must be less than
256.

IP Octet Check Every IP address must match the format x.x.x.x, every number must be less than
256, and no IP address can contain only single-digit numbers (1.1.1.2).

IP Reserved Range Check Checks whether the IP address falls into any of the "Bogons" ranges. If so, the match
is invalid.

IPv6 Basic Validation Check Every IPv6 address must match the format xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
and every number must be lower than ffff.

Ipv6 Medium Validation Check Every IPv6 address must match the format xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
and every number must be lower than ffff. No IPv6 address can start with 0.

Ipv6 Reserved Validation Every IPv6 address must match the format xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
Check and every number must be lower than ffff. No IPv6 address can start with 0. Each
IPv6 address must be fully compressed.

Ireland Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Ireland VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Irish Personal Public Service Computes the checksum and validates the pattern against it.
Number Validation Check

Israel Personal Identity Computes the checksum and validates the pattern against it.
Number Validation Check

Italy VAT Number Validation Computes the checksum and validates the pattern against it.
Check

Japan Driver's License Computes the checksum and validates the pattern against it.
Number Validation Check

Japanese Juki-Net ID Computes the checksum and validates the pattern against it.
Validation Check

Japanese My Number Computes the checksum and validates the pattern against it.
Validation Check

KRRN Foreign Validation Validates that the 3rd and 4th digits are a valid month, that the 5th and 6th digits are a valid
Check day, and that the checksum matches the check digit.

Latvia Personal Code Check Computes the checksum and validates the pattern against it.

Latvia Value Added Tax (VAT) Computes the checksum and validates the pattern against it.
Number Validation Check

Lithuania Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Lithuania Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

Luhn Check Computes the Luhn checksum and validates the matched pattern against it.

Luxembourg National Computes the checksum and validates the pattern against it.
Register of Individuals
Number Validation Check

Luxembourg Tax Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Luxembourg VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Malaysian MyKad Number Computes the checksum and validates the pattern against it.
Validation Check

Malta Value Added Tax (VAT) Computes the checksum and validates the pattern against it.
Number Validation Check

Medicare Beneficiary Identifier Computes the checksum and validates the pattern against it.
Number Validation Check

Mexican CRIP Validation Computes the checksum and validates the pattern against it.
Check

Mexican Tax Identification Computes the checksum and validates the pattern against it.
Validation Check

Mexican Unique Population Computes the checksum and validates the pattern against it.
Registry Code Validation
Check

Mexico CLABE Number Computes the checksum and validates the pattern against it.
Validation Check

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the complete match.

National Provider Identifier Computes the checksum and validates the pattern against it.
Number Validation Check

National Securities Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Netherlands Bank Account Computes the checksum and validates the pattern against it.
Number Validation Check

Netherlands VAT Number Computes the checksum and validates the pattern against it.
Validation Check

New Zealand National Health Computes the checksum and validates the pattern against it.
Index Number Validation
Check

NIB Number Validation Check Computes the ISO 7064 Mod 97-10 checksum of the complete match of the NIB
Number.

No Validation Performs no validation.


Table 31-30 Available validators for system and custom data identifiers (continued)

Validator Description

Norway National Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Norway Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Check

Norwegian Birth Number Computes the checksum and validates the pattern against it.
Validation Check

Number Delimiter Validates a match by checking the surrounding digits.

Poland VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Polish ID Number Validation Computes the checksum and validates the pattern against it.
Check

Polish REGON Number Computes the checksum and validates the pattern against it.
Validation Check

Polish Social Security Number Computes the checksum and validates the pattern against it.
Validation Check

Polish Tax ID Number Computes the checksum and validates the pattern against it.
Validation Check

Portugal National Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Portugal Tax and VAT Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Randomized US Social Computes the checksum and validates the pattern against it.
Security Number Validation
Check

Require beginning characters* Enter a comma-separated list of values. If the values are numeric, do NOT enter
any dashes or other separators. Each value can be of any length.

Require ending characters* Enter a comma-separated list of values. If the values are numeric, do NOT enter
any dashes or other separators. Each value can be of any length.

Romania Driver's Licence Computes the checksum and validates the pattern against it.
Number Validation Check

Romania National Computes the checksum and validates the pattern against it.
Identification Number Check

Romania VAT Number Computes the checksum and validates the pattern against it.
Validation Check

Romanian Numerical Personal Computes the checksum and validates the pattern against it.
Code Check

Russian Taxpayer Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

SEPA Creditor Number Computes the checksum and validates the pattern against it.
Validation Check

Serbia Value Added Tax (VAT) Computes the checksum and validates the pattern against it.
Number Validation Check

Singapore NRIC Computes the Singapore NRIC checksum and validates the pattern against it.

Slovakia National Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Slovakia Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

Slovenia Tax Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Slovenia Unique Master Computes the checksum and validates the pattern against it.
Citizen Number Validation
Check

Slovenia Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

South African Personal Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Spain VAT Number Validation Computes the checksum and validates the pattern against it.
Check

Spanish Customer Account Computes the checksum and validates the pattern against it.
Number Validation Check

Spanish SSN Number Computes the checksum and validates the pattern against it.
Validation Check

Spanish Tax ID Number Computes the checksum and validates the pattern against it.
Validation Check

Sri Lanka National Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

SSN Area-Group number For a given area number (first group), the SSA might not have assigned all group numbers
(second group). The validator eliminates SSNs with invalid group numbers.

Sweden TaxPayer Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Sweden Value Added Tax Computes the checksum and validates the pattern against it.
Number Validation Check

Swedish Personal Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Swiss AHV Computes the Swiss AHV Modulus 11 checksum.

Swiss Social Security Number Computes the checksum and validates the pattern against it.
Validation Check

Switzerland Value Added Tax Computes the checksum and validates the pattern against it.
(VAT) Number Validation
Check

Taiwan ID Computes the Taiwan ID checksum.

Thailand Personal Computes the checksum and validates the pattern against it.
Identification Number
Validation Check

Turkish Identification Number Computes the checksum and validates the pattern against it.
Validation Check

UK Bank Sort Code Check Computes the checksum and validates the pattern against it.

UK Drivers License Every UK driver's license number must be 16 characters long, and the two-digit number at
the 8th and 9th positions must be greater than 00 and less than 32.

UK NHS Computes the UK NHS checksum.

UK VAT Number Validation Computes the checksum and validates the pattern against it.
Check

Ukraine Identity Card Check Validates that the first eight digits are a correctly formatted date.

Venezuela Identification Computes the checksum and validates the pattern against it.
Number Validation Check

Verhoeff Validation Check Computes the checksum and validates the pattern against it.

Ukraine Identity Card Check Computes the checksum and validates the pattern against it.

Zip+4 Postal Codes Validation Computes the checksum and validates the pattern against it.
Check
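
Most rows in the table say only that the validator "computes the checksum." To illustrate what such a check involves, the following Python sketch implements three well-known schemes whose names appear in the table: ISO 7064 Mod 97-10, the Verhoeff algorithm, and the UK NHS modulus 11 check. It is not the Symantec Data Loss Prevention implementation. The function names are hypothetical, and the sketch assumes the match has already been normalized to ASCII digits.

```python
"""Illustrative checksum checks; not the Symantec Data Loss Prevention implementation."""

# Verhoeff multiplication table (dihedral group D5).
VERHOEFF_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Verhoeff position-dependent permutation table.
VERHOEFF_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def _is_digits(value: str) -> bool:
    """True for a non-empty string of ASCII digits only."""
    return value != "" and all(ch in "0123456789" for ch in value)


def mod97_10_valid(value: str) -> bool:
    """ISO 7064 Mod 97-10: the complete number, check digits included, leaves remainder 1."""
    return _is_digits(value) and int(value) % 97 == 1


def verhoeff_valid(value: str) -> bool:
    """Verhoeff: process digits right to left; a valid number ends with check value 0."""
    if not _is_digits(value):
        return False
    check = 0
    for i, ch in enumerate(reversed(value)):
        check = VERHOEFF_D[check][VERHOEFF_P[i % 8][int(ch)]]
    return check == 0


def uk_nhs_valid(value: str) -> bool:
    """UK NHS number: weights 10 down to 2 on the first nine digits, modulus 11."""
    if not _is_digits(value) or len(value) != 10:
        return False
    total = sum(int(d) * w for d, w in zip(value[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10:  # a computed check digit of 10 means the number is invalid
        return False
    return check == int(value[9])
```

For example, `mod97_10_valid("12345676")` and `uk_nhs_valid("9434765919")` both return `True`, while changing the last digit of either number makes the check fail.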

Selecting pattern validators

Symantec Data Loss Prevention provides a comprehensive set of validators to improve pattern
matching accuracy.
See “About pattern validators” on page 733.
When you modify a data identifier, the system exposes the active validators used by the data
identifier. When you modify or create a data identifier, the system displays all system-defined
data validators from which you can choose.

Note: The active validators that allow for and define input are not to be confused with the
"Optional validators" that can be configured for any runtime instance of a particular data
identifier. Optional validators are always configurable at the instance level. Active validators
are only configurable at the system level.

Select a validator from the "Validation Checks" list on the left, then click Add Validator to the
right. If the validator requires input, provide the required data using a comma-separated list
and then click Add Validator.
See “Selecting pattern validators” on page 829.

To select a pattern validator

1 Create a custom data identifier.
See “Workflow for creating custom data identifiers” on page 812.
2 In the Validators section, select the desired validator.
See “About pattern validators” on page 733.
3 If the validator does not require data input, click Add Validator.
The validator is added to the Active Validators list.
4 If the validator requires data input, enter the data values in the Description and Data
Entry field.
5 Edit the input for the validator in the Description and Data Entry field as needed. If you are
using the Find keywords validator, also select the options you want for the keyword:
■ Proximity: Finds a keyword only within the specified proximity of the matched pattern.
Check this box and specify the Word Distance.
■ Case sensitive: Check this box if you want to search for a case-sensitive match.
■ Highlight keywords in incident: Check this box if you want to highlight the matched
keywords in incidents.

6 Click Add Validator when you are done entering the values.
The validator is added to the Active Validators list.
7 To remove a validator, select it in the Active Validators list and click the red X icon.
8 Click Save to save the configuration of the data identifier.

Selecting a data normalizer

When you create a custom data identifier, you must select a normalizer to reconcile the data
detected by the pattern with the format expected by the validators.
See “Workflow for creating custom data identifiers” on page 812.
Table 31-31 lists and describes the normalizers you can implement for custom data identifiers.

Note: You cannot modify the normalizer of a system-defined data identifier.

Table 31-31 Available data normalizers

Normalizer Description

Digits Only numeric characters are allowed.

Digits and Letters Alphanumeric characters are allowed.

Lowercase Only letters are allowed, normalized to lowercase.

Swift codes Code must match SWIFT requirements.

Do nothing The data is not normalized; it is evaluated as is.
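
The following Python sketch approximates the Digits, Digits and Letters, Lowercase, and Do nothing normalizers so that a validator receives a predictable string. It assumes that normalizing means discarding the characters a normalizer does not allow, and it restricts letters and digits to ASCII. The function name and normalizer keys are hypothetical, and the Swift codes normalizer is not modeled.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)
ASCII_DIGITS = set(string.digits)


def normalize(value: str, normalizer: str) -> str:
    """Hypothetical approximation of the normalizers in Table 31-31.

    Assumption: disallowed characters are stripped rather than causing a rejection.
    """
    if normalizer == "digits":
        # Keep only numeric characters, e.g. drop dashes and spaces.
        return "".join(ch for ch in value if ch in ASCII_DIGITS)
    if normalizer == "digits_and_letters":
        # Keep alphanumeric characters, preserving their case.
        return "".join(ch for ch in value if ch in ASCII_DIGITS or ch in ASCII_LETTERS)
    if normalizer == "lowercase":
        # Keep letters only and convert them to lowercase.
        return "".join(ch.lower() for ch in value if ch in ASCII_LETTERS)
    if normalizer == "do_nothing":
        # Evaluate the data exactly as detected.
        return value
    raise ValueError(f"unsupported normalizer: {normalizer}")
```

With this sketch, `normalize("123-45-6789", "digits")` yields `"123456789"`, which is the kind of input a digit-based checksum validator expects.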

Creating custom script validators

The custom script validation check lets you enter a custom script to validate pattern matches.
To implement a custom validator, you use the Symantec Data Loss Prevention Scripting
Language.
You can implement a custom script validator in a system data identifier you modify or in a
custom data identifier.

Note: Refer to the Symantec Data Loss Prevention Detection Customization Guide for details
on using the Symantec Data Loss Prevention Scripting Language.

To implement a custom script validator

1 Modify an existing data identifier or create a custom data identifier.
See “Workflow for creating custom data identifiers” on page 812.
2 Select the Custom Script validator from the list of Validation Checks.
3 Enter your custom script in the Description and Data Entry field.
4 Click Add Validator to add the custom validator to the Active Validators list.
5 Click Save to save the configuration of the data identifier.

Configuring pre- and post-validators

Pre- and Post-Validators define characters and character ranges that are valid before or
after a data identifier pattern. They can be helpful for eliminating false-positive detection results.
Acceptable characters for pre- and post-validators include ASCII characters 32 through 126
(as literal characters), and the special characters \S (non-whitespace or Unicode characters)
and \w (any character not matched by a letter or digit). \S is acceptable as a valid or invalid
character for both pre- and post-validators. \w is acceptable as an invalid character for both
pre- and post-validators. Additionally, the \l (letter) and \d (digit) special characters are
acceptable as invalid pre- or post-validator characters.
Though they are not defined here, white spaces such as tabs and new lines are also treated
as valid characters for pre- and post-validators.
Pre- and Post-Validators are required in custom data identifiers. The fields are pre-populated
with default values, but you can edit them as necessary to tune your results.
The default values for the pre- and post-validators are:
Pre-validators:
■ Valid: ,=:#"'()>;@!`~$%^*\S
■ Invalid: \S\w
Post-validators:
■ Valid: ,."'()<;&=@`~\S
■ Invalid: \S\w
The pre- and post-validators only check the character immediately preceding or following the
matched data identifier. In cases where the same characters appear in both the valid and
invalid fields, the valid field takes precedence. For example, where \S (a Unicode character)
appears in both the valid and invalid field for pre-validator characters, Unicode characters will
be considered valid pre-validator characters.

Examples
These examples show some matching and non-matching pre- and post-validators for a 10-digit
data identifier pattern \d{10}:

Table 31-32 Pre- and post-validator characters

Character position Valid Invalid

Pre-validator characters !(, \S\w

Post-validator characters ), \S\w

The following strings would match or not match the data identifier pattern based on the
preceding or following characters as described here:

Table 31-33 Pre- and post-validator pattern matching examples

String Pattern match condition Description

A1234567890 No match The character A preceding the
\d{10} pattern is not a valid
pre-validator character, so the
pattern does not match.

!1234567890 Match The character ! preceding the
\d{10} pattern is a valid
pre-validator character, so the
pattern matches.

1234567890} No match The character } following the
\d{10} pattern is not a valid
post-validator character, so the
pattern does not match.

(1234567890) Match The character ( preceding the
\d{10} pattern is a valid
pre-validator character. The
character ) following the pattern
is a valid post-validator character.
Because both characters are
valid, the pattern matches.

@1234567890 No match The character @ preceding the
\d{10} pattern is not a valid
pre-validator character, so it does
not override the invalid special
characters \S\w. The pattern
does not match.

,1234567890, Match The character , is a valid pre- and
post-validator character, so the
pattern matches.

1234567890 Match The \d{10} pattern has no
preceding or following character,
so the pattern matches.
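
The boundary logic in Table 31-33 can be sketched in Python. The sketch assumes that the invalid field is \S\w, which together reject every non-whitespace character that is not explicitly listed as valid, so only the valid lists from Table 31-32 need to be passed in. It also follows the rules stated earlier: only the single adjacent character is checked, white space counts as valid, and the valid field takes precedence. The function names are hypothetical.

```python
import re
from typing import List, Optional


def _boundary_ok(ch: Optional[str], valid_chars: str) -> bool:
    """Return True if the character next to a match is acceptable.

    Assumption: the invalid field is \\S\\w, so any non-whitespace character
    that is not listed as valid is rejected.
    """
    if ch is None or ch.isspace():  # start/end of text, or white space
        return True
    return ch in valid_chars  # the valid field takes precedence


def find_ten_digit_matches(text: str, valid_pre: str = "!(,",
                           valid_post: str = "),") -> List[str]:
    """Apply pre- and post-validators to matches of the pattern \\d{10}."""
    results = []
    for m in re.finditer(r"[0-9]{10}", text):
        before = text[m.start() - 1] if m.start() > 0 else None
        after = text[m.end()] if m.end() < len(text) else None
        if _boundary_ok(before, valid_pre) and _boundary_ok(after, valid_post):
            results.append(m.group())
    return results
```

Running `find_ten_digit_matches` on each string in Table 31-33 reproduces the Match and No match column.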

Best practices for using data identifiers

Data identifiers are algorithms that combine pattern matching with data validators to detect
content. Symantec Data Loss Prevention provides a number of system-defined data identifiers
for common data patterns, such as SSNs, Tax IDs, and more. In addition, you can define your
own custom data identifiers to match any data you can describe using the data identifier pattern
language. Data identifiers are commonly used to detect personally identifiable information
(PII).
This section provides best practices for implementing data identifier policies.
Table 31-34 summarizes the best practices in this section.

Table 31-34 Summary of data identifier best practices

Best practice Description

Use data identifiers instead of regular expressions when See “Use data identifiers instead of regular expressions
possible. to improve accuracy” on page 834.

Modify data identifier definitions when you want tuning to See “Modify data identifier definitions when you want tuning
apply globally. to apply globally” on page 835.

Clone system-defined data identifiers before modifying See “Clone system-defined data identifiers before
them. modifying to preserve original state” on page 835.

Consider using multiple data identifier breadths in parallel. See “Consider using multiple breadths in parallel to detect
different severities of confidential data” on page 836.

Avoid matching on the Envelope over HTTP. See “Avoid matching on the Envelope over HTTP to reduce
false positives” on page 836.

Use the Randomized US SSN data identifier to detect See “Use the Randomized US SSN data identifier to detect
traditional and randomized SSNs. SSNs” on page 836.

Use unique match counting to improve accuracy and ease See “Use unique match counting to improve accuracy and
remediation. ease remediation” on page 837.

Use data identifiers instead of regular expressions to improve accuracy

Data identifiers are designed to protect personally identifiable information (PII) with very good
accuracy (<10% false positive rate). If a data identifier is available for the type of content you
want to protect, you should use the data identifier instead of a regular expression because
data identifiers are more efficient than regular expressions. Out-of-the-box data identifier
patterns are tuned for accuracy, including region, industry, and country nuances. In addition,
data identifiers include validation checks to verify the data that is matched by the pattern. This
additional layer of intelligence screens out test data and other triggers of false positive incidents.
Regular expressions, on the other hand, can be computationally expensive and can lead to
increased false positives.
For example, to detect social security numbers (SSNs), use the Randomized
US SSN data identifier instead of a regular expression pattern. The Randomized US SSN data
identifier is more accurate than any regular expression you can write and much easier and
quicker to implement.

Note: The data identifier pattern language is a limited subset of the regular expression language.
Not all regular expression constructs or characters are supported for data identifier patterns.
See “Using the data identifier pattern language” on page 814.

Clone system-defined data identifiers before modifying to preserve original state

Before you modify a system data identifier or create a custom data identifier, consider the
following:
■ If you want to modify a system data identifier, manually clone it as a custom data identifier
and then modify the cloned copy. In this fashion you preserve the state of the original
system-defined data identifier.
■ Data identifiers do not export as part of a policy template. As such, you should add the
data identifier to a policy and export the policy as a template before modifying the data
identifier.
An exported template contains a reference to each data identifier that is implemented in
that policy. On import to a target system, the template uses the reference to select the local
data identifier. If the system data identifier has been modified, the target system does not
recognize it on import.
See “Cloning a system data identifier before modifying it” on page 777.

Modify data identifier definitions when you want tuning to apply globally

Data identifiers offer two levels of configuration:
■ Definitions
■ Instances
Data identifier definitions are configured at the system level of the Enforce Server. At the
definition level you can tune which validators are used, as well as the data that is supplied
to any required validator that the definition declares.
Data identifier instances are only configured at the policy rule level. Any configurations that
are made at the rule level are local in scope and applicable only to that policy. At the rule level
you use optional validators, such as require or exclude beginning or ending characters, to tune
the instance of the data identifier rule.
The general recommendation is to configure data identifier definitions so that the changes
apply globally to any instance of that data identifier definition. Such configurations are reusable
across policies. Rule-level optional validators, such as require or exclude beginning or ending
characters, should be reserved for policy-specific tuning.

Consider using multiple breadths in parallel to detect different severities of confidential data

Matching data identifiers against content often requires fine-tuning as you adjust the
configuration to keep both false positives and false negatives to a minimum. After you configure
an instance of the Content Matches Data Identifier condition, study the matches and adjust
the configuration to ensure optimum data matching success.
Consider adjusting the data identifier breadth you use if the data identifier produces too many
false positives or false negatives. For example, if you use a wide breadth and receive many false
positives, consider using a medium breadth or narrow breadth.
See “About data identifier breadths” on page 731.
As an alternative approach, consider using multiple data identifier breadths in parallel in the
same policy, with a different severity level for each rule. For example, in a single policy that is
designed to detect credit card numbers, you can add three rules to the policy, each using a
different breadth (one wide, one medium, one narrow). You would then set the narrow-breadth rule
to generate high-severity incidents and the wide-breadth rule to generate low-severity incidents.
Using this layered approach lets you survey the data flowing through the enterprise using a policy
that covers both ends of the spectrum. You can use this sampling-based approach to focus your
remediation efforts on the highest-priority incidents while still detecting and being able to review
low-severity incidents.

Avoid matching on the Envelope over HTTP to reduce false positives

Sometimes HTTP transmissions contain session IDs in the header that can trigger false
positives for numeric data identifiers. For example, some social media sites such as Facebook
and LinkedIn contain a session ID that may at times match the CCN and SSN data identifiers
exactly, causing false positives.
To reduce false positives in connection with HTTP session IDs in the message header, the
best practice is not to match on the “Envelope” message component when you implement
numeric data identifiers, specifically the CCN or SSN data identifiers.

Use the Randomized US SSN data identifier to detect SSNs

In 2011, the United States Social Security Administration (SSA) began issuing randomized
SSNs. Under this scheme, the high group number (second part of the SSN) no longer
corresponds to the area number (first part of the SSN). Also, the range of the area number
can go up to 899 instead of 773. Randomization applies to SSNs issued on or after June 25,
2011. It does not apply to SSNs issued before that date.
To support the new randomized SSN scheme, Symantec Data Loss Prevention provides the
system-defined Randomized US Social Security Number (SSN) data identifier.
See “Randomized US Social Security Number (SSN)” on page 1414.
The Randomized US SSN data identifier detects both traditional and randomized SSNs. The
Randomized US SSN data identifier replaces the US SSN data identifier, which only detects
traditional SSNs.
Symantec recommends that you use the Randomized US SSN data identifier for all new
policies that you want to use to detect SSNs, and that you update your existing SSN policies
to use the Randomized US SSN data identifier. For your existing policies that already implement
the traditional US SSN data identifier, you can add the Randomized US SSN data identifier
as an OR'd rule so that both run in parallel as you test the policy to ensure it accurately detects
both styles of SSNs.
See “Updating policies to use the Randomized US SSN data identifier” on page 810.
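
Because randomization removed the link between area and group numbers, only a few structural rules remain for a randomized SSN. The following Python sketch checks them. It uses the 001 to 899 area range stated above plus general SSA issuance rules (area 666, group 00, and serial 0000 are never issued). It is an illustration under those assumptions, not a description of what the Randomized US SSN data identifier actually checks; that identifier may apply other checks that this sketch does not model. The function name is hypothetical.

```python
import re

# Hypothetical input format: nine digits, optionally separated by dashes.
_SSN = re.compile(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})")


def plausible_randomized_ssn(value: str) -> bool:
    """Structural checks that still apply after SSN randomization (a sketch)."""
    m = _SSN.fullmatch(value)
    if m is None:
        return False
    area, group, serial = (int(part) for part in m.groups())
    # Area numbers run from 001 to 899 under randomization; 666 is never issued.
    if area == 0 or area == 666 or area > 899:
        return False
    # Group 00 and serial 0000 are never issued. No area-to-group check is made,
    # because randomization removed the link between area and group numbers.
    return group != 0 and serial != 0
```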

Use unique match counting to improve accuracy and ease remediation

By default, the data identifier rule configuration counts only unique matches. With this option,
each unique match is reported once, at the first location where it is found in the message or
message component, and only unique matches are counted and highlighted. You can also choose
the option that counts all matches.
The best practice is to use unique match counting when you only care about unique matches,
not duplicate matches. For example, if you are using the Credit Card Numbers data identifier
to protect credit card numbers, and you only care if a document contains 25 or more unique
numbers, use the count all unique matches option instead of the count all matches option. If you
counted all matches, a document containing the same CCN 25 times would trigger the policy,
which is not the objective of your policy.
See “About unique match counting” on page 734.
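
A minimal Python sketch of the difference between the two counting options, assuming the data identifier has already produced a list of matched strings. The names `match_count` and `rule_triggers` are hypothetical, and the threshold of 25 comes from the example above.

```python
from typing import List


def match_count(matches: List[str], unique: bool = True) -> int:
    """Count unique matches (the default) or every match."""
    return len(set(matches)) if unique else len(matches)


def rule_triggers(matches: List[str], threshold: int = 25, unique: bool = True) -> bool:
    """True when the counted matches reach the rule's minimum match count."""
    return match_count(matches, unique) >= threshold
```

With unique counting, a document that repeats one card number 25 times counts as a single match and does not trigger the rule; with all-match counting, the same document reaches 25 and triggers it.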
Chapter 32
Detecting content using keyword matching
This chapter includes the following topics:

■ Introducing keyword matching

■ Configuring keyword matching

■ Best practices for using keyword matching

Introducing keyword matching

Symantec Data Loss Prevention provides the Content Matches Keyword policy condition for
keyword detection.
To detect data loss using keyword matching, the detection engine compares inbound messages
or message components against each keyword in a list of one or more keywords or keyword
phrases. Keyword matching supports both whole word and partial word matching, as well as
word proximity. Keyword matching is supported on the server and on the endpoint. Unique
match counting is supported for keywords.
See “Using unique match counting” on page 775.
Table 32-1 lists typical keyword matching use cases.

Table 32-1 Keyword matching use cases

Configuration Typical use

Whole word matching Languages based on the Latin alphabet

UTF-8 characters

Chinese, Japanese, and Korean (CJK) languages with token verification enabled for the
server

CJK keywords on the endpoint

See “About keyword matching for Chinese, Japanese, and Korean (CJK) languages”
on page 839.

Partial word matching Languages based on the Latin alphabet

Mixed languages

See “Keyword matching examples” on page 841.

About keyword matching for Chinese, Japanese, and Korean (CJK) languages

Symantec Data Loss Prevention detection servers support natural language processing for
Chinese, Japanese, and Korean (CJK) keywords. When natural language processing for CJK
languages is enabled, the detection server validates CJK tokens before reporting a match.
For CJK languages, a token is a single character which constitutes a word. Thus, partial word
matching does not apply to CJK languages.
Token validation for CJK keywords is only supported for detection servers and is disabled by
default. You must enable token validation for each detection server. In addition, you must match
on whole words for token validation to apply.
On the endpoint you can use whole word matching for CJK keywords.
Table 32-2 summarizes keyword matching use cases for CJK languages.

Table 32-2 Keyword matching use cases for CJK languages

Detection component Use case

Server Enable token verification on the detection server and use whole word matching

See “Enabling and using CJK token verification for server keyword matching” on page 847.

Endpoint Use whole word matching

See “Keyword matching examples for CJK languages” on page 842.

About keyword proximity

Using keyword proximity, a policy author can define a pair of keywords and specify a word
range between them. If the words occur within that range, a match is triggered. For example,
an instance of the Content Matches Keyword condition might require that any instance of
the words “confidential” and “information” occurring within 10 words of each other triggers a
match.
Alternatively, you can use keyword proximity to exclude matching words within a specified
distance by using the Content Matches Keyword condition as a detection exception. In this
case any occurrence of the words “confidential” and “information” within 10 words of each other is
excepted from matching.
For Chinese, Japanese, and Korean (CJK) languages, a single CJK character is counted as
one word.
See “Keyword matching syntax” on page 840.
See “Keyword matching examples” on page 841.
See “Configuring the Content Matches Keyword condition” on page 844.

Keyword matching syntax

When you define a keyword rule, the system evaluates every keyword in the condition list
against each message component (header, subject, body, attachment).
Consider the following syntactical guidelines when creating keyword lists.

Table 32-3 Keyword matching syntax

Behavior Description

Whole word matching With whole word matching, keywords match at word boundaries only (\W in the regular
expression lexicon). Any characters other than A-Z, a-z, and 0-9 are interpreted as word
boundaries.

With whole word matching, keywords must have at least one alphanumeric character (a letter
or a number). A keyword that consists only of characters treated as white space, such as "..", is ignored.

Quotation marks Do not use quotation marks when you enter keywords or phrases because quotes are interpreted
literally and will be required in the match.

White space The system strips out the white space before and after keywords or key phrases. Each
whitespace within a keyword phrase is counted. In addition to actual spaces, all characters
other than A-Z, a-z, and 0-9 are interpreted as white spaces.

Case sensitivity The case sensitivity option that you choose applies to all keywords in the list for that condition.

Table 32-3 Keyword matching syntax (continued)

Behavior Description

Plurals and verb All plurals and verb inflections must be listed explicitly. If listing every form
inflections becomes impractical, use the wildcard character (asterisk [*]) to detect a keyword suffix (in
whole word mode only).

Keyword phrases You can enter keyword phrases, such as social security number (without quotes). The system
looks for the entire phrase without returning matches on individual constituent words (such as
social or security).

Keyword variants The system only detects the exact keyword or key phrase, not variants. For example, if you
specify the key phrase social security number, detection does not match a phrase that
contains two spaces between the words.

Matching multiple The system implies an OR between keywords. That is, a message component matches if it
keywords contains any of the keywords, not necessarily all of them. To perform an ALL (or AND) keyword
match, combine multiple keyword conditions in a compound rule or exception.

Alpha-numeric During keyword matching, only a letter or a digit is considered a valid keyword start position.
characters Special characters (non-alphanumeric) are treated as delimiters (ignored). For example, the
ampersand character ("&") and the underscore character ("_") are special characters and are
not considered for keyword start position.

For example, consider the following:

____keyword__

Keyword

&&akeyword&&

123Keyword__

For these examples, the valid keyword start positions are as follows: k, K, a, and 1.
Note: This same behavior applies to keyword validators implemented in data identifiers.

Proximity The word distance (proximity value) is exclusive of detected keywords. Thus, a word distance
of 10 allows for a proximity window of 12 words.
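
The proximity rule can be checked with a short Python sketch. It assumes single-word keywords, case-insensitive comparison, and that keyword order does not matter. It tokenizes using the rule in this table (any character other than A-Z, a-z, and 0-9 is a boundary) and does not model CJK text, where each character counts as one word. The function names are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into words; non-alphanumeric characters act as boundaries."""
    return re.findall(r"[A-Za-z0-9]+", text)


def within_word_distance(text: str, first: str, second: str, distance: int) -> bool:
    """True if the two keywords occur with at most `distance` words between them."""
    words = [w.lower() for w in tokenize(text)]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_positions:
        for j in second_positions:
            # abs(i - j) - 1 counts only the words strictly between the two
            # keywords, so the full window spans distance + 2 words.
            if i != j and abs(i - j) - 1 <= distance:
                return True
    return False
```

With a word distance of 10, two keywords separated by 10 other words fall inside the 12-word window and match; separated by 11 words, they do not.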

Keyword matching examples

To implement keyword matching, you can enter one or more keywords or phrases, each
separated by a comma or newline character. You can match on whole or partial words, and
specify case sensitivity. You can use the asterisk (*) wildcard character to detect a keyword
suffix (in whole word mode only).
See “Keyword matching syntax” on page 840.

Table 32-4 Keyword matching examples

Keyword type Keyword(s) Matches Does Not Match

keyword confidential confidential confidentially (in

whole word mode
-confidential;
only, otherwise it
"confidential" would match)
Confidential

CONFIDENTIAL

key phrase internal use only internal use only internal use

internal use ONLY (if case

insensitive is selected)

keyword list Newline delimited: Comma delimited: hacks hackers

hack hack, hacker, hacks hack shack

hacker hacker

hacks

keyword with wildcard priv* private prize

privilege prevent

privy

privity

privs

priv

keyword dictionary account number, account ps, american If any keyword or phrase is amx
express, americanexpress, amex, bank present, the data is matched:
creditcard
card, bankcard, card num, card number,
cc #, cc#, ccn, check card, checkcard, amex master card
credit card, credit card #, credit card credit card car
number, credit card#, debit card,
debitcard, diners club, dinersclub, mastercard
discover, enroute, japanese card bureau,
jcb, mastercard, mc, visa, (etc....)
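
The whole word, partial word, wildcard, and case-sensitivity behaviors in Table 32-4 can be approximated with regular expressions. The following Python sketch is not the detection engine's implementation. It assumes that word boundaries are any character other than ASCII A-Z, a-z, and 0-9, that matching is case-insensitive unless requested otherwise, and that keywords in one condition are OR'd. The function names are hypothetical.

```python
import re
from typing import Iterable

WORD_CHARS = "A-Za-z0-9"  # any other character is treated as a word boundary


def compile_keyword(keyword: str, whole_word: bool = True,
                    case_sensitive: bool = False) -> "re.Pattern[str]":
    """Build a regular expression for one keyword or key phrase (illustrative only)."""
    keyword = keyword.strip()  # white space around keywords is ignored
    # A trailing asterisk acts as a suffix wildcard, in whole word mode only.
    wildcard = whole_word and keyword.endswith("*")
    body = re.escape(keyword[:-1] if wildcard else keyword)
    if wildcard:
        body += f"[{WORD_CHARS}]*"
    if whole_word:
        # Require a non-alphanumeric character (or the text edge) on both sides.
        body = f"(?<![{WORD_CHARS}]){body}(?![{WORD_CHARS}])"
    return re.compile(body, 0 if case_sensitive else re.IGNORECASE)


def matches_any(text: str, keywords: Iterable[str], **options) -> bool:
    """Keywords in one condition are OR'd: a single keyword match is enough."""
    return any(compile_keyword(k, **options).search(text) for k in keywords)
```

For example, `matches_any("confidentially", ["confidential"])` is `False` in whole word mode but `True` with `whole_word=False`, and the keyword `priv*` matches `privilege` but not `prevent`.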

Keyword matching examples for CJK languages

Table 32-5 provides keyword matching examples for Chinese, Japanese, and Korean
languages. All examples assume that the keyword condition is configured to match on whole
words only.

If token verification is enabled, the message must be long enough for the token validator to
recognize the language. For example, the message “東京都市部の人口” is too short for the token
validation process to recognize the language of the message. The following message is long
enough for token validation processing:
今朝のニュースによると東京都市部の人口は増加傾向にあるとのことでした。全国的な人口
減少の傾向の中、東京への一極集中を表しています。
See “About keyword matching for Chinese, Japanese, and Korean (CJK) languages”
on page 839.
Token validation for CJK language keywords is not available on the endpoint. To match CJK
keywords on the endpoint, configure the condition to match on whole words only.

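Table 32-5 Keyword matching examples for CJK

Chinese
  Keyword: 通信 (communication)
  Matches on server with token validation ON: 数字无线通信 (digital wireless communication)
  Matches on server with token validation OFF: 数字无线通信; 交通信息网站 (traffic information website)
  Matches on endpoint: 数字无线通信; 交通信息网站

Japanese
  Keyword: 京都市 (Kyoto City)
  Matches on server with token validation ON: 京都府京都市左京区 (Sakyo Ward, Kyoto City, Kyoto Prefecture)
  Matches on server with token validation OFF: 京都府京都市左京区; 東京都市部の人口 (population of urban Tokyo)
  Matches on endpoint: 京都府京都市左京区; 東京都市部の人口

Korean
  Keyword: 정부 (government)
  Matches on server with token validation ON: 정부의 방침 (government policy)
  Matches on server with token validation OFF: 정부의 방침; 의정부 경전철 (Uijeongbu light rail)
  Matches on endpoint: 정부의 방침; 의정부 경전철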
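The difference between the ON and OFF columns of Table 32-5 is whether a keyword must coincide with complete words (tokens) or may appear anywhere as a substring. The following Python sketch illustrates only that idea: it does not tokenize text itself. The segmentations in SEGMENTED are hand-made assumptions standing in for the server's token validator, and the names substring_match and token_match are hypothetical.

```python
# Hand-segmented example messages from Table 32-5 (assumed segmentation,
# standing in for a real CJK tokenizer).
SEGMENTED = {
    "数字无线通信": ["数字", "无线", "通信"],
    "交通信息网站": ["交通", "信息", "网站"],
    "京都府京都市左京区": ["京都府", "京都市", "左京区"],
    "東京都市部の人口": ["東京都", "市部", "の", "人口"],
    "정부의 방침": ["정부", "의", "방침"],
    "의정부 경전철": ["의정부", "경전철"],
}


def substring_match(keyword: str, message: str) -> bool:
    """Token validation OFF: the keyword may appear anywhere in the text."""
    return keyword in message


def token_match(keyword: str, message: str) -> bool:
    """Token validation ON: the keyword must be one complete token."""
    return keyword in SEGMENTED[message]


for keyword, message in [("通信", "数字无线通信"), ("通信", "交通信息网站"),
                         ("京都市", "京都府京都市左京区"), ("京都市", "東京都市部の人口"),
                         ("정부", "정부의 방침"), ("정부", "의정부 경전철")]:
    print(keyword, message, substring_match(keyword, message), token_match(keyword, message))
```

In 交通信息网站, the characters 通信 straddle the boundary between 交通 and 信息, so substring matching reports a hit while token matching does not; that is the false positive token validation is meant to remove.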

About updates to the Drug, Disease, and Treatment keyword lists

The Drug, Disease, and Treatment keyword lists are updated with current terminology based
on information from the U.S. Food and Drug Administration (FDA) and other sources. These
keyword lists are used by the HIPAA and HITECH (including PHI) and Caldicott Report policy
templates.
When you upgrade your Data Loss Prevention system, the generic, system-defined HIPAA
and Caldicott policy templates are updated with the latest Drug, Disease, and Treatment
keyword lists. However, policies that you have created from the HIPAA or Caldicott policy
templates are not updated automatically. This behavior is by design: it ensures that any changes
or customizations you have made to your HIPAA or Caldicott policies are not overwritten by
updates to the system-defined templates. Updating the Drug, Disease, and Treatment keyword
lists in your HIPAA and Caldicott policies is a manual process that you should perform to keep
these policies up to date.
See “Updating the Drug, Disease, and Treatment keyword lists for your HIPAA and Caldicott
policies” on page 848.

See “Keep the keyword lists for your HIPAA and Caldicott policies up to date” on page 850.
See “HIPAA and HITECH (including PHI) policy template” on page 1690.
See “Caldicott Report policy template” on page 1561.

Configuring keyword matching

Table 32-6 describes the components for implementing keyword matching.

Table 32-6 Implementing keyword matching

Match on whole or partial keywords and key phrases
  Separate each keyword or phrase with a newline or comma.
  See “Keyword matching examples” on page 841.

Match on the wildcard asterisk (*) character
  Match the wildcard at the end of a keyword, in whole word mode only.
  See “Keyword matching examples” on page 841.

Keyword proximity matching
  Match pairs of keywords that occur within a specified number of words of each other.
  See “About keyword proximity” on page 840.

Find keywords
  Implement one or more keywords in data identifiers to refine the scope of detection.
  See “Introducing data identifiers” on page 717.

Policy rules and exceptions
  You can implement keyword matching conditions in policy rules and exceptions.
  See “Configuring the Content Matches Keyword condition” on page 844.

Cross-component matching
  Keyword matching detects content in one or more message components.
  See “Detection messages and message components” on page 391.

Keyword dictionary
  If you have a large dictionary of keywords, you can index the keyword list.
  See “Use VML to generate and maintain large keyword dictionaries” on page 851.

CJK token verification
  Enable token verification on the detection server for CJK languages and match on whole words only.
  See Table 32-2 on page 839.

Configuring the Content Matches Keyword condition

The Content Matches Keyword condition lets you match content using keywords and key
phrases.

See “Introducing keyword matching” on page 838.

You can implement keyword matching conditions in policy rules and exceptions.
See “Configuring policies” on page 413.
To configure the Content Matches Keyword condition
1 Add a new keyword condition to a policy rule or exception, or modify an existing one.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the keyword matching parameters.
See Table 32-7 on page 845.
See “Keyword matching syntax” on page 840.
3 Save the policy.

Table 32-7 Configure the Content Matches Keyword condition

Enter the match type.
  Select whether the keyword match is Case Sensitive or Case Insensitive.
  Case Insensitive is the default.

Choose the keyword separator.
  Select the separator you want to use to delimit multiple keywords: Newline or Comma.
  Newline is the default.

Match any keyword.
  Enter the keyword(s) or key phrase(s) you want to match. Use the separator you selected
  (newline or comma) to delimit multiple keyword or key phrase entries.
  You can use the asterisk (*) wildcard character at the end of any keyword to match one or more
  suffix characters in that keyword. If you use the asterisk wildcard character, you must match
  on whole words only. For example, a keyword entry of confid* would match "confidential"
  and "confide," but not "confine." As long as the keyword prefix matches, the detection engine
  matches the remaining characters using the wildcard.
  See “Keyword matching syntax” on page 840.
  See “Keyword matching examples” on page 841.

Configure keyword proximity matching (optional).
  Keyword proximity matching lets you detect pairs of keywords that occur within a specified
  number of words of each other.
  See “About keyword proximity” on page 840.
  To implement keyword proximity matching:
  ■ Select (check) the Keyword Proximity matching option in the "Conditions" section of the
  rule builder interface.
  ■ Click Add Pair of Keywords.
  ■ Enter a pair of keywords.
  ■ Specify the Word distance.
  The maximum distance between keywords is 999, as limited by the three-digit length of the
  “Word distance” field. The word distance excludes the detected keywords themselves. For
  example, a word distance of 10 allows a range of 12 words, including the two words of the
  keyword pair.
  ■ Repeat the process to add additional keyword pairs.
  The system connects multiple keyword pair entries with the OR Boolean operator, meaning that
  the detection engine evaluates each keyword pair independently.

Match on whole or partial keywords.
  Select the option On whole words only to match on whole keywords only. This option is
  selected by default.
  You must match on whole words only if you use the asterisk (*) wildcard character in any
  keyword you enter in the list.
  See “Keyword matching examples” on page 841.
  You must also match on whole words only if you have enabled token validation for the server.
  See “Keyword matching examples for CJK languages” on page 842.

Configure match conditions.
  Keyword matching lets you specify how you want to count condition matches.
  Select one of the following options:
  ■ Check for existence
  The system reports one incident for all matches.
  ■ Count all matches and only report incidents with at least 1 matches (default)
  With the default setting, the system reports one incident for each match. Alternatively, you
  can configure the match threshold by changing the default value from 1 to another value.
  See “Configuring match counting” on page 421.

Select components to match on.
  Keyword matching detection supports matching across message components.
  See “Selecting components to match on” on page 423.
  Select one or more message components to match on:
  ■ Envelope – Header metadata used to transport the message
  ■ Subject – Email subject of the message (only applies to SMTP)
  ■ Body – The content of the message
  ■ Attachments – Any files attached to or transferred by the message
  Note: On the endpoint, the DLP Agent matches on the entire message, not individual
  components.
  See “Detection messages and message components” on page 391.

Also match one or more additional conditions.
  Select this option to create a compound condition. All conditions must be met to report a match.
  You can Add any available condition from the list.
  See “Configuring compound match conditions” on page 429.
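The Word distance rule in Table 32-7 (the distance counts only the words between the two keywords, so a distance of 10 spans 12 words) and the OR combination of keyword pairs can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's algorithm: it assumes single-word keywords, case-insensitive comparison, pairs matched in either order, and a simple word-character tokenizer. The function names are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens (assumption: a simple word-character tokenizer)."""
    return re.findall(r"\w+", text.lower())


def pair_within_distance(text: str, first: str, second: str,
                         word_distance: int) -> bool:
    """True if `first` and `second` occur, in either order, with at most
    `word_distance` other words between them."""
    if not 0 <= word_distance <= 999:
        raise ValueError("word distance must be between 0 and 999")
    tokens = words(text)
    first_positions = [i for i, w in enumerate(tokens) if w == first.lower()]
    second_positions = [i for i, w in enumerate(tokens) if w == second.lower()]
    # The distance excludes the two keywords: only the words between them count.
    return any(abs(i - j) - 1 <= word_distance
               for i in first_positions for j in second_positions if i != j)


def any_pair_matches(text: str, pairs: list[tuple[str, str, int]]) -> bool:
    """Keyword pairs are OR'd: one pair within its distance is enough."""
    return any(pair_within_distance(text, a, b, d) for a, b, d in pairs)


sample = "project " + "filler " * 10 + "budget"  # 12 words in total
print(pair_within_distance(sample, "project", "budget", 10))  # True
print(pair_within_distance(sample, "project", "budget", 9))   # False
```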

Enabling and using CJK token verification for server keyword matching
To use token verification for Chinese, Japanese, and Korean (CJK) languages you must enable
it on the server and you must use whole word matching for the keyword condition. In addition,
there must be a sufficient amount of message text for the system to recognize the language.
See “Keyword matching examples for CJK languages” on page 842.
Table 32-8 lists and describes the detection server parameter that lets you enable token
verification for CJK languages.

Table 32-8 Keyword token verification parameter

Setting: Keyword.TokenVerifierEnabled
  Default: false
  Description: Default is disabled ("false"). If enabled ("true"), the server validates tokens for
  Chinese, Japanese, and Korean language keywords.

The following procedure describes how to enable and use token verification for CJK keywords.

Enable keyword token verification for CJK

1 Log on to the Enforce Server as an administrative user.
2 Navigate to the System > Servers and Detectors > Overview > Server/Detector Detail - Advanced
Settings screen for the detection server or detector you want to configure.
See “Advanced server settings” on page 285.
3 Locate the parameter Keyword.TokenVerifierEnabled.
4 Change the value from false (the default) to true.
Setting the server parameter Keyword.TokenVerifierEnabled = true enables token
validation for CJK keyword detection.
5 Save the detection server configuration.
6 Recycle the detection server.
7 Configure a keyword condition using whole word matching.
In the condition, make sure the On whole words only option is checked.
See “Configuring the Content Matches Keyword condition” on page 844.

Updating the Drug, Disease, and Treatment keyword lists for your HIPAA and Caldicott policies
If you have created a policy derived from the HIPAA or Caldicott template and have not made
any changes or customizations to it, after upgrade you can create a new policy from the
appropriate template and remove the old policy from production. If you have made changes
to a policy derived from either the HIPAA or Caldicott policy template and you want to preserve
these changes, copy the updated keyword lists from the HIPAA or Caldicott policy template
and use the copied keyword lists to update your HIPAA or Caldicott policies.
See “About updates to the Drug, Disease, and Treatment keyword lists” on page 843.
See “Keep the keyword lists for your HIPAA and Caldicott policies up to date” on page 850.
The following procedure describes how to update the keyword lists for your HIPAA and
Caldicott policies.
To update the Drug, Disease, and Treatment keyword lists for HIPAA and Caldicott policies
1 Create a new policy from a template and choose either the HIPAA or Caldicott template.
See “Creating a policy from a template” on page 397.
2 Edit the detection rules for the policy.
See “Configuring policy rules” on page 417.

3 Select the Patient Data and Drug Keywords (Keyword Match) rule.
4 Select the Content Matches Keyword condition.
5 Select all the keywords in the Match any Keyword data field and copy them to the
Clipboard.
6 Paste the copied keywords into a text file named Drug Keywords.txt.
7 Cancel the rule edit operation to return to the policy Detection tab.
8 Repeat the same process for the Patient Data and Treatment Keywords (Keyword
Match) rule.
9 Copy and paste the keywords from the condition into a text file named Treatment Keywords.txt.
10 Repeat the same process for the Patient Data and Disease Keywords (Keyword Match)
rule.
11 Copy and paste the keywords from the condition into a text file named Disease Keywords.txt.

12 Update your HIPAA and Caldicott policies derived from the HIPAA or Caldicott templates
using the keyword *.txt files you created.
13 Test your updated HIPAA and Caldicott policies.
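Before you merge the copied lists into a customized policy in step 12, it can help to see which terms were added or dropped. The following Python sketch compares two keyword lists as plain text. It assumes that you have also copied your existing policy's list into a text file, and that the condition is case insensitive (the default), so terms are compared in lowercase. The function names and sample terms are illustrative only, not part of the product.

```python
import re


def parse_keywords(text: str) -> set[str]:
    """Split a keyword list on newlines or commas, normalizing case and spacing."""
    return {" ".join(k.split()).lower() for k in re.split(r"[,\n]", text) if k.strip()}


def diff_keyword_lists(current_text: str, updated_text: str) -> tuple[list[str], list[str]]:
    """Return (terms only in the updated list, terms only in the current list)."""
    current = parse_keywords(current_text)
    updated = parse_keywords(updated_text)
    return sorted(updated - current), sorted(current - updated)


added, removed = diff_keyword_lists("aspirin\nibuprofen", "aspirin, ibuprofen, naproxen")
print(added, removed)  # ['naproxen'] []
```

To compare real lists, pass the contents of your exported files (for example, read with Python's pathlib.Path.read_text) to diff_keyword_lists. Because each entry is re-split on whitespace, stray carriage returns from Windows line endings are discarded.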

Best practices for using keyword matching

The Content Matches Keyword condition lets you match content using keywords, key phrases,
and keyword lists or dictionaries. On the server, the keyword rule matches on the header,
subject, body and attachment message components, and it supports cross-component matching.
On the endpoint the keyword condition matches on the entire message.
Table 32-9 summarizes the keyword matching best practices in this section.

Table 32-9 Summary of keyword matching best practices

Enable linguistic validation for CJK keyword detection on the server.
  See “Enable token verification on the server to reduce false positives for CJK keyword detection” on page 850.

Update keyword lists for your Caldicott and HIPAA policies.
  See “Keep the keyword lists for your HIPAA and Caldicott policies up to date” on page 850.

Tune keyword validators to improve data identifier accuracy.
  See “Tune keywords lists for data identifiers to improve match accuracy” on page 851.

Use VML to profile long keyword lists and dictionaries.
  See “Use VML to generate and maintain large keyword dictionaries” on page 851.

Use keyword matching for metadata detection.
  See “Use keyword matching to detect document metadata” on page 851.

Enable token verification on the server to reduce false positives for CJK keyword detection
Symantec Data Loss Prevention provides token validation for Chinese, Japanese, and Korean
(CJK) languages. Token validation is supported on detection servers only and is disabled by
default, so you must enable it.
See “About keyword matching for Chinese, Japanese, and Korean (CJK) languages”
on page 839.
Token validation lets you match CJK keywords using whole word matching and improves
overall match accuracy for CJK languages. Although it may slightly reduce performance,
you should enable token verification on each detection server where CJK keyword conditions
are deployed. Once it is enabled, use whole word matching for your CJK keywords.
See “Enabling and using CJK token verification for server keyword matching” on page 847.

Keep the keyword lists for your HIPAA and Caldicott policies up to date
For each Symantec Data Loss Prevention release, the Drug, Disease, and Treatment keyword
lists are updated based on information from the U.S. Food and Drug Administration (FDA) and
other sources. These keyword lists are used in the HIPAA and HITECH (including PHI) and
Caldicott Report policy templates.
See “About updates to the Drug, Disease, and Treatment keyword lists” on page 843.
If you have upgraded to the latest Data Loss Prevention version and you have existing policies
derived from either the HIPAA or Caldicott policy template, consider updating these policies
to use the Drug, Disease, and Treatment keyword lists provided with this Data Loss Prevention
version.
See “Updating the Drug, Disease, and Treatment keyword lists for your HIPAA and Caldicott
policies” on page 848.

Tune keywords lists for data identifiers to improve match accuracy

Many data identifier definitions contain required keyword validators with pre-populated keyword
lists. In addition, you can add your own list of keywords to a data identifier rule. The best
practice is to tune the keyword list using a keyword matching condition before you add the
keyword list to the data identifier condition as a required or optional validator.
See “Using pattern validators” on page 818.
To tune the keyword list, put the keywords you want to use for the validator into a separate
keyword matching rule condition and policy. Then test the policy using data that should and
should not match the keywords. The keyword rule lets you see match highlighting so that you
can tune the keyword list. Once the list is tested, add the keywords to the data identifier and
then test the data identifier policy to ensure accuracy.
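The "should match / should not match" testing described above can be rehearsed offline before you build the test policy. The following Python sketch reports which positive samples no keyword caught and which keywords fired on negative samples. It assumes whole-word, case-insensitive matching (the defaults for the keyword condition) and is not a substitute for testing in Data Loss Prevention itself. The function names and sample data are hypothetical.

```python
import re


def keyword_hits(keywords: list[str], text: str) -> list[str]:
    """Keywords that occur in the text as whole words, ignoring case."""
    hits = []
    for keyword in keywords:
        phrase = r"\s+".join(re.escape(word) for word in keyword.split())
        if re.search(rf"(?<!\w){phrase}(?!\w)", text, re.IGNORECASE):
            hits.append(keyword)
    return hits


def tune_report(keywords: list[str], should_match: list[str],
                should_not_match: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Return (positive samples no keyword caught,
    {negative sample: keywords that wrongly matched it})."""
    missed = [s for s in should_match if not keyword_hits(keywords, s)]
    false_positives = {}
    for sample in should_not_match:
        hits = keyword_hits(keywords, sample)
        if hits:
            false_positives[sample] = hits
    return missed, false_positives


missed, noisy = tune_report(
    ["account number", "acct", "routing"],
    should_match=["Please confirm the account number.", "Acct: 1234"],
    should_not_match=["Cable routing diagram"],
)
print(missed, noisy)  # [] {'Cable routing diagram': ['routing']}
```

In this sample run, "routing" is the keyword to reconsider before adding the list to a data identifier validator.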

Use keyword matching to detect document metadata

Symantec Data Loss Prevention supports metadata detection for certain document formats,
such as DOCX and PDF. Detection servers and DLP Agents support metadata detection.
If you want to detect document metadata, enable metadata detection for the server or endpoint
and use the Content Matches Keyword condition to match metadata tags.

Use VML to generate and maintain large keyword dictionaries

Sometimes you may want to protect a long list or dictionary of keywords. An example might
be a list of project code names. You can use Vector Machine Learning (VML) to automate the
detection of long keyword lists that are difficult to generate, tune, and maintain. For example,
you could generate a VML profile based on a collection of documents containing the keywords
you want to detect. If you want to detect common words, remove them from the VML stopword
file.
See “Best practices for using VML” on page 687.
Chapter 33
Detecting content using regular expressions
This chapter includes the following topics:

■ Introducing regular expression matching

■ About the updated regular expression engine

■ About writing regular expressions

■ Configuring the Content Matches Regular Expression condition

■ Best practices for using regular expression matching

Introducing regular expression matching

Data Loss Prevention provides the Content Matches Regular Expression policy match
condition to match message content using the regular expression pattern language.
Regular expressions provide a mechanism for identifying strings of text, such as particular
characters, words, or patterns of characters. You can use the regular expression condition to
match (or exclude from matching) characters, patterns, and strings. Unique match counting
is supported for regular expressions.
See “Using unique match counting” on page 775.
See “Configuring the Content Matches Regular Expression condition” on page 854.
See “Best practices for using regular expression matching” on page 855.

About the updated regular expression engine

Detection servers and endpoint agents use a common regular expression engine. This engine
evaluates regular expressions faster than previous engines. You will also notice performance
improvements in DLP policy sets with many regex rules, because adding more rules does not
incur much of a performance cost.

About writing regular expressions

Symantec Data Loss Prevention implements the PCRE-compatible regular expression syntax
for policy condition matching. Table 33-1 provides some reference constructs for writing regular
expressions to match or exclude characters in messages or message components.
See “Introducing regular expression matching” on page 852.

Note: Data Identifier pattern matching is based on the regular expression syntax. However,
not all regular expression constructs listed in the table below are supported by Data Identifier
patterns. See “About data identifier patterns” on page 732.

Table 33-1 Regular expression constructs

.
  Any single character (except for newline characters)
  Note: The use of the dot (.) character is not supported for data identifier patterns.

\d
  Any digit (0-9)

\s
  Any white space character

\w
  Any word character (a-z, A-Z, 0-9, _)
  Note: The \w construct does not match the underscore (_) character when implemented in a
  data identifier pattern.

\D
  Any character other than a digit

\S
  Any character other than white space

[]
  Elements inside brackets form a character class. For example, [abc] matches one character:
  a, b, or c.

^
  At the beginning of a character class, negates it. For example, [^abc] matches any character
  except a, b, or c.

+
  Following a regular expression, means 1 or more. For example, \d+ means one or more digits.

?
  Following a regular expression, means 0 or 1. For example, \d? means zero or one digit.

*
  Following a regular expression, means any number (0 or more). For example, \d* means zero,
  one, or more digits.

(?i)
  At the beginning of a regular expression, makes the expression case-insensitive. Regular
  expressions are case-sensitive by default.

(?: )
  Groups regular expressions together without capturing the matched text. The ?: is a slight
  performance enhancement.

(?u)
  Makes a period (.) match even newline characters

|
  Means OR. For example, A|B means regular expression A or regular expression B.
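The constructs in Table 33-1 can be tried out quickly in a scripting language. The following Python sketch uses Python's re module, whose syntax is similar to PCRE but not identical, so treat it as a way to build intuition rather than as a test of the Data Loss Prevention engine. It leaves out (?u) because in Python that flag selects Unicode matching (the Python flag that lets a period match newlines is (?s)), and it does not model the data identifier restrictions noted in the table.

```python
import re

# (construct, pattern, sample, expected: does re.search find a match?)
EXAMPLES = [
    (".", r"a.c", "abc", True),
    (".", r"a.c", "a\nc", False),          # a period does not match a newline
    (r"\d", r"\d", "x7", True),
    (r"\s", r"a\sb", "a b", True),
    (r"\w", r"^\w+$", "user_01", True),
    (r"\D", r"^\D+$", "abc", True),
    (r"\S", r"^\S+$", "a b", False),
    ("[]", r"^[abc]$", "b", True),
    ("^", r"^[^abc]$", "b", False),        # negated character class
    ("+", r"^\d+$", "2019", True),
    ("?", r"^\d?$", "77", False),          # zero or one digit only
    ("*", r"^\d*$", "", True),             # zero digits is allowed
    ("(?i)", r"(?i)secret", "SECRET", True),
    ("(?: )", r"^(?:ab)+$", "ababab", True),
    ("|", r"^(?:cat|dog)$", "dog", True),
]

for construct, pattern, sample, expected in EXAMPLES:
    result = re.search(pattern, sample) is not None
    print(f"{construct:<6} {pattern!r:<18} {sample!r:<10} {result}")
    assert result == expected
```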

Configuring the Content Matches Regular Expression condition
You use the Content Matches Regular Expression condition to match (or exclude from
matching) characters, patterns, and strings using regular expressions.
See “Introducing regular expression matching” on page 852.
To configure the Content Matches Regular Expression condition
1 Add a Content Matches Regular Expression condition to a policy, or edit an existing
one.
See “Configuring policies” on page 413.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the Content Matches Regular Expression condition parameters.
See Table 33-2 on page 855.
3 Save the policy configuration.

Table 33-2 Content Matches Regular Expression parameters

Match regex.
  Specify a regular expression to be matched.
  See “About writing regular expressions” on page 853.

Configure match counting.
  Configure how you want to count matches.
  See “Configuring match counting” on page 421.
  Check for existence reports a match count of 1 if there are one or more matches. For
  compound rules or exceptions, all conditions must be configured this way.
  Count all matches reports the sum of all matches. In a compound rule or exception, this applies
  if any condition uses this parameter.

Match on one or more message components.
  Configure cross-component matching by selecting one or more message components to match on:
  ■ Envelope – The header of the message, transport metadata.
  ■ Subject – The email subject (only applies to email messages).
  ■ Body – The content of the message.
  ■ Attachments – The content of any files that are attached to or transported by the message.
  See “Selecting components to match on” on page 423.

Also match one or more additional conditions.
  Select this option to create a compound condition. All conditions must match to trigger or
  except an incident.
  You can Add any available condition from the list.
  See “Configuring compound match conditions” on page 429.
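For a single condition, the two counting options in Table 33-2 differ only in what they report: Check for existence reports 1 when there is at least one match, while Count all matches reports every match. The following Python sketch shows that difference. It does not model compound rules or unique match counting, and meets_threshold is a hypothetical stand-in for a minimum-match setting like the "at least 1 matches" option described for keyword conditions.

```python
import re


def condition_match_count(pattern: str, text: str, mode: str = "count_all") -> int:
    """Match count reported by one regex condition (hypothetical helper).

    mode="existence": 1 if there are one or more matches, otherwise 0.
    mode="count_all": the number of non-overlapping matches.
    """
    total = sum(1 for _ in re.finditer(pattern, text))
    if mode == "existence":
        return 1 if total else 0
    if mode == "count_all":
        return total
    raise ValueError(f"unknown counting mode: {mode}")


def meets_threshold(count: int, minimum: int = 1) -> bool:
    """Report an incident only if the count reaches the configured minimum."""
    return count >= minimum


text = "Refs: EMP-1001, EMP-1002, EMP-1003"
print(condition_match_count(r"EMP-\d{4}", text))               # 3
print(condition_match_count(r"EMP-\d{4}", text, "existence"))  # 1
```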

Best practices for using regular expression matching

This section provides considerations for implementing the Content Matches Regular
Expression match condition in your Data Loss Prevention policies.
See “Introducing regular expression matching” on page 852.
Table 33-3 summarizes the regular expression matching best practices in this section.

Table 33-3 Regular expressions best practices

Use Data Identifiers instead of regular expressions where possible.
  See “When to use regular expression matching” on page 856.

Use regular expressions sparingly to support efficient policy performance.
  See “Use regular expressions sparingly to support efficient performance” on page 857.

Use look ahead and look behind characters to improve regular expression performance.
  See “Use look ahead and look behind characters to improve regular expression accuracy” on page 856.

Test regular expressions for accuracy and performance.
  See “Test regular expressions before deployment to improve accuracy” on page 857.

When to use regular expression matching

Data Identifiers are more efficient than regular expressions because the Data Identifier patterns
are tuned for accuracy and the data is validated. For example, if you want to search for social
security numbers, use the US Social Security Number (SSN) Data Identifier instead of a regular
expression.
The regular expression condition is useful for matching or excepting unique data types for
which there are no system-provided Data Identifiers. Examples of these include internal account
numbers and data types that can vary greatly in length, such as email addresses.

Use look ahead and look behind characters to improve regular expression accuracy
Symantec Data Loss Prevention implements a significant enhancement to improve the
performance of regular expressions. To benefit from this improvement, the look-ahead and
look-behind sections of a regular expression must exactly match one of the supported standard
sections.
Table 33-4 lists the standard look-ahead and look-behind sections that this performance
improvement supports. If either section differs even slightly, that section is executed as part
of the regular expression without the performance improvement.
See “About writing regular expressions” on page 853.

Table 33-4 Look ahead and look behind standard sections

Look ahead
  (?=(?:[^-\w])|$)

Look behind
  (?<=(^|(?:[^)+\d][^-\w+])))
  and
  (?<=(^|(?:[^)+\d][^-\w+])|\t))
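To see what the standard sections in Table 33-4 accept and reject, the following Python sketch wraps them around a hypothetical 9-digit identifier. The look-ahead is copied exactly. Python's re module only accepts fixed-width look-behinds, so the first look-behind is rewritten as an equivalent alternation; that rewritten text is not one of the standard sections, so Data Loss Prevention would not apply its optimization to it. Use the exact text from Table 33-4 in your policies.

```python
import re

# Table 33-4 look-ahead, copied exactly: the next character must not be a
# hyphen or a word character, or the text must end here.
LOOK_AHEAD = r"(?=(?:[^-\w])|$)"

# Python-only rewrite of (?<=(^|(?:[^)+\d][^-\w+]))): either the start of the
# text, or two preceding characters where the first is not ")", "+", or a
# digit, and the second is not "-", "+", or a word character.
LOOK_BEHIND = r"(?:^|(?<=[^)+\d][^-\w+]))"

# Hypothetical 9-digit identifier, used only for illustration.
NINE_DIGIT_ID = re.compile(LOOK_BEHIND + r"\d{9}" + LOOK_AHEAD)

for sample in ["ID: 123456789.", "ab123456789", "ID: 1234567890", "123456789-0"]:
    print(repr(sample), NINE_DIGIT_ID.findall(sample))
```

The boundaries keep the pattern from firing inside longer digit runs, after letters or hyphens, or before a trailing hyphen, which is how these sections reduce false positives.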

Use regular expressions sparingly to support efficient performance

Regular expressions can be computationally expensive. If you add a regular expression
condition, observe the system for one hour. Make sure that the system does not slow down
and that there are no false positives.

Test regular expressions before deployment to improve accuracy

If you implement regular expression matching, consider using a third-party tool to test the
regular expressions before you deploy the policy rules to production. The recommended tool
is RegexBuddy. Another good tool for testing your regular expressions is RegExr.
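Alongside an interactive tool, a small script can keep your sample data with the pattern so that you can rerun the checks whenever the expression changes. The following Python sketch reports missed positives and false positives, and gives a rough per-search timing for comparing candidate patterns on the same machine. Python's re module is not the Data Loss Prevention engine, so results can differ; the ACCT- account-number format is hypothetical.

```python
import re
import time


def check_regex(pattern: str, should_match: list[str], should_not_match: list[str],
                flags: int = 0) -> tuple[list[str], list[str]]:
    """Return (missed samples, false-positive samples) for a candidate regex."""
    compiled = re.compile(pattern, flags)
    missed = [s for s in should_match if not compiled.search(s)]
    false_positives = [s for s in should_not_match if compiled.search(s)]
    return missed, false_positives


def seconds_per_search(pattern: str, sample: str, repeat: int = 1000) -> float:
    """Rough average time of one search; only meaningful for comparisons."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    for _ in range(repeat):
        compiled.search(sample)
    return (time.perf_counter() - start) / repeat


positives = ["Ref ACCT-123456 attached"]
negatives = ["ACCT-12345", "XACCT-123456"]
print(check_regex(r"\bACCT-\d{6}\b", positives, negatives))  # ([], [])
print(check_regex(r"ACCT-\d+", positives, negatives))  # ([], ['ACCT-12345', 'XACCT-123456'])
```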
Chapter 34
Detecting content using classification matching
This chapter includes the following topics:

■ Introducing classification matching

■ Supported file types

■ How tag matching works

■ Configuring the Content Matches Classification condition

Introducing classification matching

Symantec Data Loss Prevention provides the Content Matches Classification condition to
detect Information Centric Tagging tags that have been applied to various files and email
content.
A tag comprises three components: organization, scope, and sensitivity level. For example,
Symantec-Marketing-Confidential could be written in tag form as SYMC-MKTG-CONF. An
organization can be the entire company or a logical division within the company. Scope is
typically a functional group, such as Payroll or Engineering. The sensitivity level of the tagged
data ranges from 1 through 9, with 1 being the least sensitive. An ICT administrator defines
these tags and can use terms that make sense to the organization. For example, the
administrator might call Level 1 PUBLIC, Level 4 CONFIDENTIAL, and Level 9 TOPSECRET. The
collection of all of the tags makes up the classification taxonomy.
To make use of this ICT taxonomy in Data Loss Prevention, you must import it into the Data
Loss Prevention database. The taxonomy is then available to you as you define your detection
rule with the Content Matches Classification option.
In the Conditions area for this rule option, you have three choices for detection criteria:
Content is classified, Content is not classified, and Content matches. If you choose
Content matches, you can select values from the taxonomy in the drop-down menus under
Organization, Scope, and Level. You can also select Any organization or scope. To complete
the detection formula, choose the search Operator, such as Not Equals or Is Less Than
or Equals. Multiple operators can be combined ("OR'd" together).

Note: The Content is classified expression is triggered only if the classified file or email
message has been classified within the imported taxonomy. If a file or email message has
been classified using some other taxonomy that has not been imported into Enforce, then this
expression does not evaluate as true. Similarly, something that has been classified within
another Information Centric Tagging taxonomy that is not known to Enforce evaluates as
Content is not classified.

To detect these tags, the Data Loss Prevention detection engine searches the metadata of
supported emails and files. End users must have applied the tags to those emails and files
before the search runs.
See “About integrating Information Centric Tagging with Data Loss Prevention” on page 226.

Supported file types

Data Loss Prevention searches for tags only in supported file types and email messages. For
the supported file types, see Table 34-1.

Table 34-1 Supported file types for classification matching

Microsoft Office
  ■ Pre-Office 2007 (CFB)
  ■ Office 2007 and later (XML)

Images
  .png, .gif

PDF
  .pdf

Files include email attachments.

No tag detection takes place:

■ On file types natively supported by Information Centric Tagging, but unreadable by Data
Loss Prevention (.jpg, .tiff).
■ Against file types not natively supported by Information Centric Tagging where the
classification tag resides in the Alternate Data Stream.
■ On encrypted data, unless DLP is configured to inspect Microsoft Rights Management
protected files (Microsoft Office and PDF documents) for policy evaluation.

Note: Even though tags can be detected in unencrypted metadata, a common scenario for
the Content Matches Classification option is to combine it with other conditions, such as
keyword matching or regular expressions that detect sensitive content like Social Security
Numbers. Then, for example, if a file carries a Level 1 (PUBLIC) tag but its content is
sensitive, an incident can be generated. If the content is encrypted, this type of policy using
compound rules fails.

How tag matching works

For the Content Matches Classification option, you have three choices:
■ Content is classified
■ Content is not classified
■ Content matches (Select Operator, Select Organization, Select Scope, Select Level)
To understand how tag matching works when supported email or file types are searched, see
the appropriate table below.

Note: In the tables, the term this taxonomy refers to the taxonomy that's been
imported/synchronized on this Enforce Server.

Table 34-2 Search results for the Content is classified condition

Incidents are generated when:
■ The tag belongs to this taxonomy.

Incidents are not generated when:
■ The tag belongs to a different taxonomy.
■ There is no classification tag applied to the content.
■ The tag is in the wrong format.

Table 34-3 Search results for the Content is not classified condition

Incidents are generated when:
■ The tag belongs to a different taxonomy.
■ There is no classification tag applied to the content.
■ The tag is in the wrong format.

Incidents are not generated when:
■ The tag belongs to this taxonomy.

Table 34-4 Search results for the Content matches [specific operator and selected tags] condition

Incidents are generated when:
■ The ICT tag matches the criteria.

Incidents are not generated when:
■ The tag in this taxonomy does not match the criteria.
■ The tag belongs to a different taxonomy.
■ There is no classification tag applied to the content.
■ The tag is in the wrong format.
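The three tables can be condensed into a short Python sketch. It is illustrative only. The ORGANIZATION:SCOPE:LEVEL tag string, the parse_tag helper, and the other function names are hypothetical, because Information Centric Tagging does not store tags in this form. The sketch also assumes that a tag belongs to this taxonomy when its organization, scope, and level were imported on this Enforce Server.

```python
from typing import Callable, Optional, Set, Tuple

Tag = Tuple[str, str, int]  # (organization, scope, level)


def parse_tag(raw: Optional[str]) -> Optional[Tag]:
    """Parse a hypothetical 'ORG:SCOPE:LEVEL' tag string; None means absent or malformed."""
    if raw is None:
        return None  # no classification tag applied to the content
    parts = raw.split(":")
    if len(parts) != 3 or not parts[2].isdigit():
        return None  # tag is in the wrong format
    return parts[0], parts[1], int(parts[2])


def in_this_taxonomy(raw: Optional[str], taxonomy: Set[Tag]) -> bool:
    tag = parse_tag(raw)
    return tag is not None and tag in taxonomy


def content_is_classified(raw: Optional[str], taxonomy: Set[Tag]) -> bool:
    # Table 34-2: an incident only when the tag belongs to this taxonomy.
    return in_this_taxonomy(raw, taxonomy)


def content_is_not_classified(raw: Optional[str], taxonomy: Set[Tag]) -> bool:
    # Table 34-3: the exact complement of Table 34-2.
    return not in_this_taxonomy(raw, taxonomy)


def content_matches(raw: Optional[str], taxonomy: Set[Tag],
                    criteria: Callable[[Tag], bool]) -> bool:
    # Table 34-4: the tag must belong to this taxonomy and also meet the criteria.
    tag = parse_tag(raw)
    return tag is not None and tag in taxonomy and criteria(tag)


this_taxonomy = {("CLOUD", "ENG", 4), ("CLOUD", "ENG", 2)}
print(content_is_classified("CLOUD:ENG:2", this_taxonomy))                 # True
print(content_is_not_classified("CLOUD-ENG-2", this_taxonomy))             # True (wrong format)
print(content_matches("CLOUD:ENG:2", this_taxonomy, lambda t: t[2] >= 3))  # False
```

A missing tag, a malformed tag, and a tag from another taxonomy all behave the same way. Each one falls on the "not classified" side.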

Table 34-5 lists an example of an imported classification taxonomy, displayed on the System
> Settings > Information Centric Tagging page.
Table 34-6 shows the results of running various combinations of operators and tag selections
against that taxonomy, either from the Configure Policy - Add Rule page or the Configure
Policy - Edit Rule page, when defining a detection rule of Content Matches Classification
type.

Table 34-5 Sample imported ICT classification taxonomy

Organization   Scope   Sensitivity Level
CLOUD          ENG     CONFID 4
CLOUD          ENG     RESTRICT 3
CLOUD          ENG     INTERNAL 2
CORE           FIN     SECRET 5
CORE           FIN     PUB 1
CORE           MKTG    CONFID 4
CORE           MKTG    PUB 1
OUTSRC         ENG     SECRET 4
OUTSRC         ENG     CONFID 3
OUTSRC         ENG     DEPTONLY 2

Table 34-6 Incidents that evaluate to true, based on operator and matching requirements

Operator: Equals; Organization: CLOUD; Scope: Any; Level: 2
Evaluates to true if content is classified as:
■ (2) INTERNAL

Operator: Equals; Organization: CORE; Scope: Any; Level: (4) CONFID
Evaluates to true if content is classified as:
■ CORE MKTG (4) CONFID

Operator: Not Equals; Organization: CORE; Scope: MKTG; Level: 1
Evaluates to true if content is classified as:
■ CLOUD ENG (4) CONFID
■ CLOUD ENG (3) RESTRICT
■ CLOUD ENG (2) INTERNAL
■ CORE FIN (5) SECRET
■ OUTSRC ENG (4) SECRET
■ OUTSRC ENG (3) CONFID
■ OUTSRC ENG (2) DEPTONLY

Operator: Is Less Than or Equals; Organization: CORE; Scope: FIN; Level: (5) SECRET
Evaluates to true if content is classified as:
■ CORE FIN (5) SECRET

Operator: Is Greater Than or Equals; Organization: OUTSRC; Scope: ENG; Level: (3) CONFID
Evaluates to true if content is classified as:
■ OUTSRC ENG (4) SECRET
■ OUTSRC ENG (3) CONFID
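Several of the operator rows in Table 34-6 can be reproduced with a short Python sketch that encodes the Table 34-5 taxonomy. The sketch rests on one assumption: Organization and Scope are checked first, with Any accepting every value, and only then is the numeric level compared. The Select Level parameter describes this ordering. Not Equals is left out because the table does not make clear how it combines the Organization, Scope, and Level comparisons. Under this assumed interpretation, Is Less Than or Equals with CORE, FIN, and level 5 would also return CORE FIN (1) PUB, which Table 34-6 does not list. Treat the level comparison as an approximation of the product's behavior, and treat all names in the code as hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tag = Tuple[str, str, int]  # (organization, scope, level)

# Sample taxonomy from Table 34-5: (organization, scope, level) -> level name
TAXONOMY: Dict[Tag, str] = {
    ("CLOUD", "ENG", 4): "CONFID",
    ("CLOUD", "ENG", 3): "RESTRICT",
    ("CLOUD", "ENG", 2): "INTERNAL",
    ("CORE", "FIN", 5): "SECRET",
    ("CORE", "FIN", 1): "PUB",
    ("CORE", "MKTG", 4): "CONFID",
    ("CORE", "MKTG", 1): "PUB",
    ("OUTSRC", "ENG", 4): "SECRET",
    ("OUTSRC", "ENG", 3): "CONFID",
    ("OUTSRC", "ENG", 2): "DEPTONLY",
}

ANY = "Any"

# Numeric level comparisons; "Not Equals" is deliberately not modeled.
LEVEL_TESTS: Dict[str, Callable[[int, int], bool]] = {
    "Equals": lambda tag_level, level: tag_level == level,
    "Is Less Than or Equals": lambda tag_level, level: tag_level <= level,
    "Is Greater Than or Equals": lambda tag_level, level: tag_level >= level,
}


def clause_matches(tag: Tag, operator: str, org: str, scope: str, level: int) -> bool:
    tag_org, tag_scope, tag_level = tag
    # Organization and Scope must be satisfied first ("Any" accepts every value) ...
    if org != ANY and tag_org != org:
        return False
    if scope != ANY and tag_scope != scope:
        return False
    # ... and only then is the numeric level compared.
    return LEVEL_TESTS[operator](tag_level, level)


def evaluate(operator: str, org: str, scope: str, level: int) -> List[str]:
    """List the taxonomy entries for which the clause evaluates to true."""
    return [
        f"{o} {s} ({lvl}) {name}"
        for (o, s, lvl), name in TAXONOMY.items()
        if clause_matches((o, s, lvl), operator, org, scope, level)
    ]


print(evaluate("Equals", "CORE", ANY, 4))
# ['CORE MKTG (4) CONFID']
print(evaluate("Is Greater Than or Equals", "OUTSRC", "ENG", 3))
# ['OUTSRC ENG (4) SECRET', 'OUTSRC ENG (3) CONFID']
```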

Configuring the Content Matches Classification condition
To configure the Content Matches Classification condition
1 Add a Content Matches Classification condition to a policy, or edit an existing one.
2 In the Conditions area, set the parameters:
■ Configure the Content Matches Classification condition (Table 34-7).
■ For the Matches on parameter, Envelope and Attachments are always selected;
Subject and Body are never selected.

3 Save the policy.


Table 34-7 Content Matches Classification parameters

Content is classified
See “How tag matching works” on page 860.

Content is not classified
See “How tag matching works” on page 860.

Content matches
See “How tag matching works” on page 860.

Select Operator
Choose an operator: Equals, Not Equals, Is Less Than or Equals, or Is Greater Than or Equals.
Note: As your ICT classification taxonomy evolves, using "Is less than..." or "Is greater than..." makes your detection rule more durable. These comparative terms allow for searching current and future taxonomies. If you write every rule using "Equals," you may have to revise your rules often.

Select Organization
Choose an Organization from the drop-down menu, which contains the organizations imported in the ICT taxonomy. You can also choose Any.
Note that the term Organization in Data Loss Prevention is called Company in Information Centric Tagging.

Select Scope
Choose a Scope from the drop-down menu, which contains the scopes imported in the ICT taxonomy. You can also choose Any.

Select Level
Choose a Level from the drop-down menu, which contains the sensitivity levels imported in the ICT taxonomy.
■ If you selected a specific Organization and Scope, the Level menu includes the sensitivity level and name, such as (1) PUBLIC, (4) CONF, and (9) TOPSECRET.
■ If you selected Any Organization and Scope, the Level menu displays only the level numbers, 1 through 9, because the level names could differ among scopes.
■ The Level is compared only if the Organization and Scope requirements are met.

OR
Click OR to add another selection of Operator, Organization, Scope, and Level. You can add multiple OR statements for one rule. OR statements are evaluated individually. They do not all need to be true; any one true statement is enough to create an incident.
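The OR behavior in Table 34-7 can be sketched in Python as a list of clauses combined with any(). The Clause class and the rule_matches function are hypothetical. Each clause checks Organization and Scope before it compares the Level, following the Select Level description. Not Equals is omitted because this section does not spell out its exact semantics.

```python
from dataclasses import dataclass
from typing import List, Tuple

Tag = Tuple[str, str, int]  # (organization, scope, level)
ANY = "Any"


@dataclass(frozen=True)
class Clause:
    """One Operator/Organization/Scope/Level selection in a rule (hypothetical model)."""
    operator: str      # "Equals", "Is Less Than or Equals", or "Is Greater Than or Equals"
    organization: str  # an imported organization (ICT "Company"), or "Any"
    scope: str         # an imported scope, or "Any"
    level: int         # sensitivity level number, 1-9

    def matches(self, tag: Tag) -> bool:
        org, scope, level = tag
        if self.organization not in (ANY, org) or self.scope not in (ANY, scope):
            return False  # Level is compared only after Organization and Scope match
        if self.operator == "Equals":
            return level == self.level
        if self.operator == "Is Less Than or Equals":
            return level <= self.level
        if self.operator == "Is Greater Than or Equals":
            return level >= self.level
        raise ValueError(f"unsupported operator: {self.operator}")


def rule_matches(clauses: List[Clause], tag: Tag) -> bool:
    # OR statements are evaluated individually; one true clause is enough.
    return any(clause.matches(tag) for clause in clauses)


rule = [
    Clause("Equals", "CORE", "FIN", 5),
    Clause("Is Greater Than or Equals", "OUTSRC", "ENG", 3),
]
print(rule_matches(rule, ("OUTSRC", "ENG", 4)))  # True: the second clause matches
print(rule_matches(rule, ("OUTSRC", "ENG", 2)))  # False: neither clause matches
```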
Chapter 35
Detecting international language content
This chapter includes the following topics:

■ Detecting non-English language content

■ Best practices for detecting non-English language content

Detecting non-English language content

Symantec Data Loss Prevention detection features support many localized versions of Microsoft
Windows operating systems. To use international character sets, the Windows system on
which you view the Enforce Server administration console must have the appropriate
capabilities.
See “About support for character sets, languages, and locales” on page 91.
See “Working with international characters” on page 93.
You can create policies and detect violations using any supported language. You can use
localized keywords, regular expressions, and Data Profiles to detect data loss. In addition,
Symantec Data Loss Prevention offers several international data identifiers and policy templates
for protecting confidential data.
See “Supported languages for detection” on page 92.
See “Use international policy templates for policy creation” on page 867.
See “Use custom keywords for system data identifiers” on page 869.
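Localized keywords raise two practical issues. First, accented characters can be encoded in more than one way. Second, the same keyword may appear in different letter case; compare REISEPASS and reisepass in Table 35-2. The following Python sketch shows one generic way to make keyword matching tolerant of both. It normalizes text to Unicode NFC and matches case-insensitively on word boundaries. It is an illustration only. The keyword_pattern and find_keywords helpers are hypothetical, and this section does not state whether the Data Loss Prevention detection engine normalizes or case-folds keywords in this way.

```python
import re
import unicodedata
from typing import Iterable, List


def keyword_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Hypothetical helper: compile localized keywords into one case-insensitive pattern."""
    # NFC turns decomposed sequences such as "O" + U+0308 into the single character "Ö".
    normalized = [unicodedata.normalize("NFC", k) for k in keywords]
    # Try longer keywords first so that a keyword is not shadowed by a shorter prefix of it.
    alternation = "|".join(re.escape(k) for k in sorted(normalized, key=len, reverse=True))
    # (?<!\w) and (?!\w) act as Unicode-aware word boundaries around the whole keyword.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def find_keywords(pattern: re.Pattern, text: str) -> List[str]:
    """Return every keyword occurrence found in the NFC-normalized text."""
    return [m.group(0) for m in pattern.finditer(unicodedata.normalize("NFC", text))]


pattern = keyword_pattern(["Österreich", "Steuernummer"])
# The text spells Ö as "O" plus a combining diaeresis and uses different letter case.
text = unicodedata.normalize("NFD", "ÖSTERREICH, steuernummer: 12")
print(find_keywords(pattern, text))  # ['ÖSTERREICH', 'steuernummer']
```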
Detecting international language content 867
Best practices for detecting non-English language content

Best practices for detecting non-English language content
This section provides some best practices for implementing non-English language content
detection.

Use international policy templates for policy creation

Symantec Data Loss Prevention provides several international policy templates that you can
quickly deploy in your enterprise.
See “Creating a policy from a template” on page 397.

Table 35-1 International policy templates

Policy template Description

Canadian Social Insurance Numbers This policy detects patterns indicating Canadian social insurance numbers.

See “Canadian Social Insurance Numbers policy template” on page 1562.

Caldicott Report This policy protects UK patient information.

See “Caldicott Report policy template” on page 1561.

Data Protection Act 1998 This policy protects personally identifiable information.

See “Data Protection Act 1998 policy template” on page 1568.

EU Data Protection Directives This policy detects personal data specific to the EU directives.
See “Data Protection Directives (EU) policy template” on page 1570.

General Data Protection Regulation (Banking and Finance) This policy protects personally identifiable information related to banking and finance.

See “General Data Protection Regulation (Banking and Finance)” on page 1583.

General Data Protection Regulation (Digital Identity) This policy protects personally identifiable information related to digital identity.

See “General Data Protection Regulation (Digital Identity)” on page 1617.

General Data Protection Regulation (Government Identification) This policy protects personally identifiable information related to government identification.

See “General Data Protection Regulation (Government Identification)” on page 1618.

General Data Protection Regulation (Healthcare and Insurance) This policy protects personally identifiable information related to healthcare and insurance.

See “General Data Protection Regulation (Healthcare and Insurance)” on page 1656.

General Data Protection Regulation (Personal Profile) This policy protects personally identifiable information related to personal profile data.

See “General Data Protection Regulation (Personal Profile)” on page 1672.

General Data Protection Regulation (Travel) This policy protects personally identifiable information related to travel.

See “General Data Protection Regulation (Travel)” on page 1675.

Human Rights Act 1998 This policy enforces Article 8 of the act for UK citizens.

See “Human Rights Act 1998 policy template” on page 1694.

PIPEDA (Canada) This policy detects Canadian citizen customer data.

See “PIPEDA policy template” on page 1711.

SWIFT Codes (International banking) This policy detects codes that banks use to transfer money across international borders.

See “SWIFT Codes policy template” on page 1726.

UK Drivers License Numbers This policy detects UK Drivers License Numbers.

See “UK Drivers License Numbers policy template” on page 1727.

UK Electoral Roll Numbers This policy detects UK Electoral Roll Numbers.

See “UK Electoral Roll Numbers policy template” on page 1727.

UK National Insurance Numbers This policy detects UK National Insurance Numbers.

See “UK National Insurance Numbers policy template” on page 1728.

UK National Health Service Number This policy detects personal identification numbers issued by the NHS.

See “UK National Health Service (NHS) Number policy template” on page 1728.

UK Passport Numbers This policy detects valid UK passport numbers.

See “UK Passport Numbers policy template” on page 1728.

UK Tax ID Numbers This policy detects UK Tax ID Numbers.

See “UK Tax ID Numbers policy template” on page 1729.


Use custom keywords for system data identifiers

Data identifiers offer broad support for detecting international content.
See “Introducing data identifiers” on page 717.
Some international data identifiers support only the wide breadth of detection. For these identifiers, you can add the Find Keywords optional validator to narrow the scope of detection. This validator can help eliminate false positives that your policy would otherwise match.
See “Selecting a data identifier breadth” on page 739.
The following table provides keywords for several international data identifiers.
To use international keywords for system data identifiers
1 Create a policy using one of the system-provided international data identifiers that are listed in Table 35-2.
2 Select the Find Keywords optional validator.
See “Configuring the Content Matches data identifier condition” on page 737.
3 Copy and paste the appropriate comma-separated keywords from Table 35-2 into the Find Keywords optional validator field.
See “Configuring optional validators” on page 763.
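The effect of the Find Keywords validator can be approximated in Python. This is a sketch under stated assumptions, not the product's implementation. The functions parse_keyword_field, keyword_validator_passes, and validated_matches are hypothetical, and the candidate value K1234567 is made up. The sketch assumes that a data identifier match is kept only when at least one keyword appears as a whole word anywhere in the message, compared case-insensitively. The real validator's matching rules may differ; see “Configuring optional validators” on page 763.

```python
import re
from typing import List


def parse_keyword_field(field: str) -> List[str]:
    """Split a comma-separated keyword list, as pasted from Table 35-2, into keywords."""
    return [k.strip() for k in field.split(",") if k.strip()]


def keyword_validator_passes(message: str, keywords: List[str]) -> bool:
    """Assumed behavior: pass if any keyword occurs as a whole word (case-insensitive)."""
    return any(
        re.search(rf"(?<!\w){re.escape(k)}(?!\w)", message, re.IGNORECASE)
        for k in keywords
    )


def validated_matches(candidates: List[str], message: str, keywords: List[str]) -> List[str]:
    """Keep a wide-breadth identifier's candidate matches only if the validator passes."""
    return candidates if keyword_validator_passes(message, keywords) else []


estonia = parse_keyword_field(
    "Pass, pass, passi number, pass nr, pass#, Pass nr, Eesti passi number"
)
print(validated_matches(["K1234567"], "Eesti passi number: K1234567", estonia))  # ['K1234567']
print(validated_matches(["K1234567"], "Order K1234567 shipped", estonia))       # []
```

The word-boundary check keeps a short keyword such as pass from firing inside unrelated words like password.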

Table 35-2 International data identifiers and keyword lists

Each entry lists the data identifier, the language of its keywords, the keywords, and their English translation.

Argentina Tax
Language: Spanish
Keywords: Número de Identificación Fiscal,
English translation: Tax identification number,

Austria Passport Number
Language: German
Keywords: REISEPASS, ÖSTERREICHISCH REISEPASS, reisepass
English translation: Passport, Austrian passport

Austria Tax Identification Number
Language: German
Keywords: Österreich, Steuernummer
English translation: Austria, tax number

Austria Value Added Tax (VAT) Number
Language: German
Keywords: MwSt, Umsatzsteuernummer, MwSt Nummer, Ust.-Identifikationsnummer, umsatzsteuer, Umsatzsteuer-Identifikationsnummer
English translation: VAT, sales tax number, VAT number, VAT identification number, sales tax, UID number

Austrian Social Security Number
Language: German
Keywords: sozialversicherungsnummer, soziale sicherheit kein, Versicherungsnummer, Österreichischen SSN, Österreichischen Sozialversicherungs
English translation: Social insurance number, social security number, insurance number, Austrian SSN, Austrian social insurance

Belgium Passport Number
Language: Dutch, German, French
Keywords: Paspoort, paspoort, paspoortnummer, Reisepass kein, Reisepass, Passnummer, Passeport, Passeport livre, Passeport carte, numéro passeport
English translation: Passport, passport number, passport book, passport card

Belgium Value Added Tax (VAT) Number
Language: German, French
Keywords: Numéro T.V.A, Umsatzsteuer-Identifikationsnummer, Umsatzsteuernummer
English translation: VAT number, tax identification number

Brazilian Election
Language: Brazilian
Keywords: número identificação,
English translation: Identification number, voter

Brazilian National Registry of Legal Entities Number
Language: Brazilian Portuguese
Keywords: Brasileira ID Legal, entidades jurídicas ID, Registro Nacional de Pessoas Jurídicas n º, BrasileiraIDLegal#

Brazilian Natural Person Registry Number
Language: Brazilian Portuguese
Keywords: Cadastro de Pessoas Físicas, Brasileiro Pessoa Natural Número de Registro, pessoa natural número de registro, pessoas singulares registro NO

Burgerservicenummer
Language: Dutch
Keywords: Persoonsnummer, sofinummer, sociaal-fiscaal nummer, persoonsgebonden
English translation: person number, social-fiscal number (abbreviation), social-fiscal number, person-related number

Canada Driver's License Number
Language: French
Keywords: permis de conduire
English translation: Driver's license

Canada Passport Number
Language: French
Keywords: numéro passeport, No passeport, passeport#
English translation: Passport number, passport no., passport#

Canada Permanent
Language: French
Keywords: numéro résident permanent,
English translation: permanent resident number,

Chilean National
Language: Spanish
Keywords: Chilena número identificación,
English translation: Chilean identification number,

China Passport Number
Language: Chinese
Keywords: 中国护照, 护照, 护照本
English translation: Chinese passport, passport, passport book

Codice Fiscale
Language: Italian
Keywords: codice fiscal, dati anagrafici, partita I.V.A., p. iva
English translation: tax code, personal data, VAT number, VAT number

Colombian Addresses
Language: Spanish
Keywords: Calle, Cll, Carrera, Cra, Cr, Avenida, Av, Dg, Diagonal, Diag, Tv, Trans, Transversal, vereda
English translation: Street, St, Career, Avenue, Diagonal, Transversal, sidewalk

Colombian Cell Phone Number
Language: Spanish
Keywords: numero celular, número de teléfono, teléfono celular no., numero celular#
English translation: Cellular number, telephone number, cellular telephone number

Colombian Tax Identification Number
Language: Spanish
Keywords: NIT., NIT, nit., nit, Nit.
English translation: TIN (tax identification number)

Cyprus Value Added Tax (VAT) Number
Language: Turkish, Greek
Keywords: KDV, kdv#, KDV numarası, Katma değer Vergisi, Φόρος Προστιθέμενης Αξίας
English translation: VAT, VAT number, value added tax

Czech Republic Value Added Tax (VAT) Number
Language: Czech
Keywords: číslo DPH, Daň z přidané hodnoty, Dan z pridané hodnoty, Daň přidané hodnoty, Dan pridané hodnoty, DPH, DIC, DIČ
English translation: VAT number, value added tax, VAT

Denmark Personal
Language: Danish
Keywords: Nationalt identifikationsnummer,
English translation: National identification number,

Denmark Value Added Tax (VAT) Number
Language: Danish
Keywords: moms, momsnummer, moms identifikationsnummer, merværdiafgift
English translation: VAT number, vat, value added tax number, vat identification number

Estonia Passport Number
Language: Estonian
Keywords: Pass, pass, passi number, pass nr, pass#, Pass nr, Eesti passi number
English translation: Passport, passport number, Estonian passport number

Estonia Value Added Tax (VAT) Number
Language: Estonian
Keywords: käibemaksu registreerimisnumber, käibemaksu, Käibemaksu number, käibemaks, käibemaks#, käibemaksu#
English translation: VAT registration number, VAT, VAT number

European Health Insurance Card Number
Language: Croatian, Danish, Estonian, Finnish, French, German, Irish, Italian, Luxembourgish, Polish, Slovenian, Spanish
Keywords: numero conto medico, tessera sanitaria assicurazione numero, carta assicurazione numero, Krankenversicherungsnummer, assicurazione sanitaria numero, medisch rekeningnummer, ziekteverzekeringskaartnummer, verzekerings kaart nummer, gezondheidskaart nummer, gezondheidskaart, medizinische Kontonummer, Krankenversicherungskarte Nummer, Versicherungsnummer, Gesundheitskarte Nummer, Gesundheitskarte, arstliku konto number, ravikindlustuse kaardi number, tervisekaart, tervisekaardi number, Uimhir ehic, tarjeta salud, broj kartice zdravstvenog osiguranja, kartice osiguranja broj, zdravstvenu karticu, zdravstvene kartice broj, ehic broj, numero tessera sanitaria, numero carta di assicurazione, tessera sanitaria, numero ehic, Gesondheetskaart, ehic nummer, numer rachunku medycznego, numer karty ubezpieczenia zdrowotne, numer karty ubezpieczenia, karta zdrowia, numer karty zdrowia, numer ehic, sairausvakuutuskortin numero, vakuutuskortin numero, terveyskortti, terveyskortin numero, medicinsk kontonummer, ehic numeris, medizinescher Konto Nummer, zdravstvena izkaznica
English translation: Medical account number, health insurance card number, insurance card number, health insurance number, medical account number, health card number, health card, insurance number, EHIC number

Finland European
Language: Finnish
Keywords: Suomi EHIC-numero,
English translation: Finland EHIC number, sickness

Finland Tax Identification Number
Language: Finnish
Keywords: verotunniste, verokortti, verotunnus, veronumero
English translation: Tax identification number, tax card, tax ID, tax number

Finland Value Added Tax (VAT) Number
Language: Finnish
Keywords: arvonlisäveronumero, ALV, arvonlisäverotunniste, ALV nro, ALV numero, alv
English translation: VAT number, VAT, VAT identification number

Finnish Personal
Language: Finnish
Keywords: tunnistenumero, henkilötunnus,
English translation: Identification number, personal

France Driver's License Number
Language: French
Keywords: permis de conduire
English translation: Driver's license

France Health Insurance Number
Language: French
Keywords: carte vitale, carte d'assuré social
English translation: Health card, social insurance card

France Tax Identification Number
Language: French
Keywords: numéro d'identification fiscale
English translation: Tax identification number

French INSEE Code
Language: French
Keywords: INSEE, numéro de sécu, code sécu
English translation: INSEE, social security number, social security code

French Passport Number
Language: French
Keywords: Passeport français, Passeport, Passeport livre, Passeport carte, numéro passeport
English translation: French passport, passport, passport book, passport card, passport number

German Passport Number
Language: German
Keywords: Reisepass kein, Reisepass, Deutsch Passnummer, Passnummer, Reisepasskein#, Passnummer#
English translation: Passport number, passport, German passport number, passport number

German Personal ID Number
Language: German
Keywords: persönliche identifikationsnummer, ID-Nummer, Deutsch persönliche-ID-Nummer, persönliche ID Nummer, eindeutige ID-Nummer, persönliche Nummer, identität nummer, Versicherungsnummer, persönlicheNummer#, IDNummer#
English translation: Personal identification number, ID number, German personal ID number, personal ID number, clear ID number, personal number, identity number, insurance number

Germany Driver's
Language: German
Keywords: Führerschein, Fuhrerschein,
English translation: Driver's license, driver's license

Greece Social Security Number (AMKA)
Language: Greek
Keywords: Αριθμού Μητρώου Κοινωνικής Ασφάλισης
English translation: Social security number

Hong Kong ID
Language: Chinese (Traditional)
Keywords: 身份證, 三顆星
English translation: Identity card, Hong Kong permanent resident ID Card

Hungary Driver's Licence Number
Language: Hungarian
Keywords: jogosítvány, Illesztőprogramok Lic, jogsi, licencszám, vezetői engedély, VEZETŐI ENGEDÉLY, vezető engedély, VEZETŐ ENGEDÉLY
English translation: License, driver's lic, driver's license, number of licenses, driving license

Hungary Passport Number
Language: French, Hungarian
Keywords: útlevél, Magyar útlevélszám, útlevél könyv, nombre, numéro de passeport, hongrois, numéro de passeport hongrois
English translation: Passport, Hungarian passport number, passport book, number, passport number

Hungarian Social
Language: Hungarian
Keywords: Magyar társadalombiztosítási
English translation: Hungarian social security number,

Iceland Passport Number
Language: Icelandic
Keywords: vegabréf, vegabréfs númer, Vegabréf Nei, vegabréf#
English translation: Passport, passport number, passport no.

Iceland Value Added Tax (VAT) Number
Language: Icelandic
Keywords: virðisaukaskattsnúmer, vsk númer
English translation: VAT number

International Bank Account Number (IBAN) Central
Language: French
Keywords: Code IBAN, numéro IBAN
English translation: IBAN Code, IBAN number

International Bank Account Number (IBAN) East
Language: French
Keywords: Code IBAN, numéro IBAN
English translation: IBAN Code, IBAN number

International Bank Account Number (IBAN) West
Language: French
Keywords: Code IBAN, numéro IBAN
English translation: IBAN Code, IBAN number

Ireland Passport Number
Language: Irish
Keywords: irelande passeport, Éire pas, no de passeport, pas uimh, uimhir pas, numéro de passeport
English translation: Ireland passport, passport number, passport

Ireland Tax
Language: Irish
Keywords: uimhir carthanachta, Uimhir
English translation: Charity number, charity

Irish Personal Public Service Number
Language: Gaelic
Keywords: Gaeilge Uimhir Phearsanta Seirbhíse Poiblí, PPS Uimh., uimhir phearsanta seirbhíse poiblí, seirbhíse Uimh, PPS Uimh, PPS seirbhís aon
English translation: Irish personal public service number, PPS no., personal public service number, service no., PPS no., PPS service one

Italy Driver's License Number
Language: Italian
Keywords: patente guida numero, patente di guida numero, patente di guida, patente guida
English translation: Driver's license number, driver's license

Italy Health Insurance Number
Language: Italian
Keywords: TESSERA SANITARIA, tessera sanitaria, tessera sanitaria italiana
English translation: Health insurance card, Italian health insurance card

Italian Passport
Language: Italian
Keywords: Repubblica Italiana Passaporto,
English translation: Italian Republic passport,

Italy Value Added Tax (VAT) Number
Language: Italian
Keywords: IVA, numero partita IVA, IVA#, numero IVA
English translation: VAT, VAT number, VAT#, VAT number

Japanese Juki-Net ID Number
Language: Japanese
Keywords: 住基ネット識別番号, 住基ネット番号, 識別番号, 個人識別番号
English translation: Juki-Net identification number, Juki-Net number, identification number, personal identification number

Japanese My Number - Corporate
Language: Japanese
Keywords: マイナンバー, 共通番号
English translation: My number, common number

Japanese My Number - Personal
Language: Japanese
Keywords: マイナンバー, 個人番号, 共通番号
English translation: My number, personal number, common number

Japan Passport Number
Language: Japanese
Keywords: 日本国旅券, パスポート, パスポート数
English translation: Japanese passport, passport, passport number

Kazakhstan Passport Number
Language: Kazakh
Keywords: төлқұжат, төлқұжат нөмірі, номер паспорта, заграничный пасспорт, национальный паспорт
English translation: Passport, passport number, passport ID, international passport, national passport

Korea Passport Number
Language: Korean
Keywords: 한국어 여권, 여권, 여권 번호, 대한민국
English translation: Korean passport, passport, passport number, Republic of Korea

Korea Residence Registration Number for Foreigners
Language: Korean
Keywords: 외국인 등록 번호, 주민번호
English translation: Foreigner registration number, social security number

Korean Residence Registration Number for Korean
Language: Korean
Keywords: 주민등록번호, 주민번호
English translation: Resident registration number, social security number

Liechtenstein Passport Number
Language: German
Keywords: Reisepass, Pass Nr, Pass Nr., Reisepass#, Pass Nr#
English translation: Passport, passport no.

Lithuania Personal Identification Number
Language: Lithuanian
Keywords: Nacionalinis ID, Nacionalinis identifikavimo numeris, asmens kodas
English translation: National ID, national identification number, personal ID

Luxembourg Passport Number
Language: French and German
Keywords: passnummer, ausweisnummer, passeport, reisepass, pass, pass net, pass nr, no de passeport, passeport nombre, numéro de passeport
English translation: Passport number, passport, Luxembourg pass, Luxembourg passport

Luxembourg Tax Identification Number
Language: French, German
Keywords: Zinn, Zinn Nummer, Tax Identifikatiounsnummer, Steier Nummer, Steier ID, Sozialversicherungsausweis, Zinnzahl, Zinn nein, Zinn#, luxemburgische steueridentifikationsnummer, Steuernummer, Steuer ID, sécurité sociale, carte de sécurité sociale, étain, numéro d'étain, étain non, étain#, Numéro d'identification fiscal luxembourgeois, numéro d'identification fiscale
English translation: Luxembourg TIN, TIN number, Luxembourg tax identification number, tax number, tax ID, social security ID, Luxembourg tax identification number, Social Security, Social Security Card, tax identification number

Macau National
Language: Chinese,
Keywords: 身份证号码, 唯一的识别号码
English translation: ID number, unique identification

Malaysia Passport Number
Language: Malay
Keywords: pasport, nombor pasport, pasport#
English translation: Passport, passport number, passport #

Malta National Identification Number
Language: Maltese
Keywords: numru identifikazzjoni nazzjonali, ID nazzjonali, numru identifikazzjoni personali, ID personali, IDnazzjonali#, IDpersonali#
English translation: national identification number, national ID, personal identification number, personal ID

Malta Tax Identification Number
Language: Maltese
Keywords: kodiċi tat-taxxa, numru tat-taxxa, numru identifikazzjoni tat-taxxa, taxxaid#, numru identifikazzjoni kontribwent, kodiċi kontribwent, landa, landa nru
English translation: Tax code, tax number, tax identification number, taxid#, taxpayer identification number, taxpayer code, tin, tin no

Mexican Tax
Language: Spanish
Keywords: Registro Federal de
English translation: Federal taxpayer registry, tax

Netherlands Bank Account Number
Language: Dutch, Papiamento
Keywords: bancu aklarashon number, aklarashon number, bankrekeningnummer, rekeningnummer
English translation: Bank account number, account number

Netherlands Driver's License Number
Language: Dutch
Keywords: RIJMEWIJS, permis de conduire, rijbewijs, Rijbewijsnummer, RIJBEWIJSNUMMER
English translation: Driver's license, driving permit, driver's license number

Netherlands Passport Number
Language: Dutch
Keywords: Nederlanden paspoort nummer, Paspoort, paspoort, Nederlanden paspoortnummer, paspoortnummer
English translation: Dutch passport number, passport, passport number

Netherlands Tax
Language: Dutch,
Keywords: Nederlands belasting
English translation: Dutch tax identification number,

Netherlands Value Added Tax (VAT) Number
Language: Dutch, Frisian
Keywords: wearde tafoege tax getal, BTW nûmer, BTW-nummer
English translation: Value added tax number, VAT number

New Zealand Driver's Licence Number
Language: Maori
Keywords: raihana taraiwa
English translation: Driving license

New Zealand Passport Number
Language: Maori
Keywords: uruwhenua, tau uruwhenua, uruwhenua no, uruwhenua no.
English translation: Passport, passport no.

Norway Driver's Licence Number
Language: Norwegian
Keywords: førerkort, førerkortnummer
English translation: Driver's license, driver's license number

Norway Value Added Tax Number
Language: Norwegian
Keywords: mva, MVA, momsnummer, Momsnummer, momsregistreringsnummer
English translation: VAT, VAT number, VAT registration number

Norwegian Birth Number
Language: Norwegian
Keywords: fødsel nummer, Fødsel nr, fødsel nei, fødselnei#, fødselnummer#
English translation: Birth number

People's Republic of China ID
Language: Chinese (Simplified)
Keywords: 身份证, 居民信息, 居民身份信息
English translation: Identity Card, Information of resident, Information of resident identification

Poland Driver's Licence Number
Language: Polish
Keywords: Kierowcy Lic., prawo jazdy, numer licencyjny, zezwolenie na prowadzenie, PRAWO JAZDY
English translation: Drivers license number, driving license, license number

Poland Passport Number
Language: French, Polish
Keywords: paszport#, numer paszportu, Nr paszportu, paszport, książka paszportowa, passeport, nombre, numéro de passeport, passeport#, No de passeport
English translation: Passport #, passport number, passport number, passport, passport book, Passport, number, passport number, passport #, passport number

Polish Identification Number
Language: Polish
Keywords: owód osobisty, Tożsamości narodowej, osobisty numer identyfikacyjny, niepowtarzalny numer, numer
English translation: Identification card, national identity, identification card number, unique number, number

Polish REGON Number
Language: Polish
Keywords: numer statystyczny, REGON, numeru REGON, numerstatystyczny#, numeruREGON#
English translation: Statistical number, REGON number

Portugal Passport Number
Language: French and Portuguese
Keywords: passaporte, passeport, portuguese passport, portuguese passeport, portuguese passaporte, passaporte nº, passeport nº
English translation: Passport number, passport, Portuguese passport

Portugal Tax Identification Number
Language: Portuguese
Keywords: número identificação fiscal
English translation: Tax identification number

Portugal Value Added Tax (VAT) Number
Language: Portuguese
Keywords: imposto sobre valor acrescentado, VAT nº, número iva, vat não, código iva
English translation: Value added tax, VAT, VAT number, VAT code

Russian Taxpayer Identification Number
Language: Russian
Keywords: НДС, номер налогоплательщика, Налогоплательщика ИД, налог число, налогчисло#, ИНН#, НДС#
English translation: TIN (tax identification number), taxpayer number, taxpayer ID, tax number

SEPA Creditor Identifier
Language: Bulgarian,
Keywords: SEPA-Gläubiger-Identifikator, ID Creidiúnaí, Aithnitheoir Creidiúnaí ID, SEPA ID, ID del creditore, Identificatore del creditore, Identificador de acreedor SEPA, ID del acreedor, ID de SEPA, Identificador del acreedor, Identificador Credor SEPA, Identificador do Credor
English translation: SEPA creditor identifier, creditor Creditor Identifier SEPA, Creditor Creditor Identifier SEPA Creditor Identifier, Creditor Identifier

Serbia Value Added Tax (VAT) Number | Serbian | poreski identifikacioni broj, PORESKI IDENTIFIKACIONI BROJ, Poreski br., ПДВ број, Порез на додату вредност, PDV broj, Porez na dodatu vrednost, porez na dodatu vrednost, PDV, pdv, ПДВ, порески идентификациони број, PIB, pib, пиб, poreski broj, порески број | Tax identification number, VAT number, value added tax, VAT, identification number, tax number

Slovakia Passport Number | French, Slovak | PASSEPORT, passeport, cestovný pas, číslo pasu, pas č, Číslo pasu, PAS, CESTOVNÝ PAS, Passeport n° | Passport, passport number, passport no

Slovenia Passport Number | French, Slovenian | številka potnega lista, potni list, knjiga potnega lista, potni list #, passeport, Passeport | Passport number, passport, passport book, passport #

Slovenia Tax Identification Number | Slovenian | identifikacijska številka davka, Slovenska davčna številka, Davčna številka | Tax identification number, Slovenian tax number, tax number

Slovenia Value Added Tax (VAT) Number | Slovenian | številka davka na dodano vrednost, DDV št, slovenia vat št | Value added tax number, VAT no, Slovenia vat no

South Korea Resident Registration Number | Korean | 주민등록번호, 주민번호 | Resident Registration Number, Resident Number

Spain Driver's License Number | Spanish | permiso de conducción, permiso conducción, Número licencia conducir, Número de carnet de conducir, Número carnet conducir, licencia conducir, Número de permiso de conducir, Número de permiso conducir, Número permiso conducir, permiso conducir, licencia de manejo, el carnet de conducir, carnet conducir | Driver's license, driver's license number, driving license, driving permit, driving permit number

Spanish Passport Number | Spanish | libreta pasaporte, número pasaporte, Número Pasaporte, España pasaporte, pasaporte | passport book, passport number, Spanish passport, passport

Spanish Social Security Number | Spanish | Número de la Seguridad Social, número de la seguridad social | Social security number

Sri Lanka National Identity Number | Sinhala | See user interface | ID, national identity number, personal identification number, National Identity Card number

Sweden Driver's License Number | Finnish, Romanian, Swedish, Yiddish | ajokortti, permis de conducere, ajokortin numero, kuljettajat lic., drivere lic., körkort, numărul permisului de conducere, שאָפער דערלויבעניש נומער, körkort nummer, förare lic., דריווערס דערלויבעניש, körkortsnummer | Driver's license, driver's license number, driving license number

Sweden Personal Identification Number | Swedish | personnummer ID, personligt id-nummer, unikt id-nummer, personnummer, identifikationsnumret, personnummer#, identifikationsnumret# | ID number, personal ID number, unique ID number, personal, identification number

Sweden Tax Identification Number | Swedish | skattebetalarens identifikationsnummer, Sverige TIN, TIN-nummer | Tax identification number, Swedish TIN, TIN number

Swedish Passport Number | Swedish | Passnummer, pass, sverige pass, SVERIGE PASS, sverige Passnummer | Passport number, passport, Swedish passport, Swedish passport number

Switzerland Health Insurance Card Number | German, Italian | medizinische Kontonummer, Krankenversicherungskarte Nummer, numero conto medico, tessera sanitaria assicurazione numero, assicurazione sanitaria numero | Medical account number, health insurance card number, health insurance number

Pass, Passnummer, Pass#, Pass Passport, passport #

Nr., Pass Nr, PASS

Passaporto, Numero di
passaporto, passaporto,
Passaporto n,Passaporto n.,
passaporto#, Passaport, numero
passaporto, numero di
passaporto, numero passaporto,
passaporto n, PASSAPORTO

Reisepass, Reisepass#,
REISEPASS

MwSt,
Umsatzsteuer-Identifikationsnummer,
MwSt#, Mehrwertsteuer-Nummer,
Mehrwertsteuer, VAT
Registrierungsnummer,
Umsatzsteuer-Identifikationsnummer

Swiss AHV Number | French, German, Italian | Numéro AVS, numéro d'assuré, identifiant national, numéro d'assurance vieillesse, numéro de sécurité soclale, Numéro AVH, AHV-Nummer, Matrikelnumme, Personenidentifikationsnummer, AVS, AVH | AVS number, insurance number, national identifier, national insurance number, social security number, AVH number, AHV number, Swiss Registration number, PIN, AVS, AVH

Swiss Social Security French, German, Identifikationsnummer, Identification number, social

Taiwan ROC ID | Chinese (Traditional) | 中華民國國民身分證 | Taiwan ID

Thailand Passport Number | Thai | หนังสือเดินทาง, หมายเลขหนังสือเดินทาง | Passport, passport number

Thailand Personal ID Thai ประกันภัยจำนวน, Insurance number, personal

Turkish Identification Turkish Kimlik Numarası, Türkiye Identification number, Turkish

Ukraine Identity Card | Ukrainian | посвідчення особи України | Ukraine identity card

Ukraine Passport Number (Domestic) | Ukrainian | паспорт, паспорт України, номер паспорта, персональний | Passport, Ukraine passport, passport number

Ukraine Passport Number (International) | Ukrainian | паспорт, паспорт України, номер паспорта | Passport, Ukraine passport, passport number

Venezuela National ID Number | Spanish | cédula de identidad número, clave única de identidad, personal de identidad clave, personal de identidad, número de identificación nacional, número ID nacional | National ID number, national identification number, personal ID number, personal identification, unique identification number

Enable token validation to match Chinese, Japanese, and Korean keywords on the server
The Content Matches Keyword condition supports both whole word and partial word matching.
Symantec Data Loss Prevention detection servers support natural language processing for
Chinese, Japanese, and Korean (CJK) language keywords. To detect CJK keywords, the
recommendation is to enable token validation on the detection server and to use whole word
matching for the keyword condition.
The DLP Agent does not support token validation for CJK. On the endpoint, consider using
partial word matching for CJK and mixed-language keywords.
With whole word matching, keywords match only at word boundaries (\W in regular expression
syntax). Any character other than A-Z, a-z, and 0-9 is interpreted as a word boundary. With
whole word matching, a keyword must contain at least one alphanumeric character (a letter or
a number). A keyword that consists only of non-alphanumeric characters, such as "..", is ignored.
See “About keyword matching for Chinese, Japanese, and Korean (CJK) languages”
on page 839.
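The whole word and partial word behavior described above can be approximated with regular expressions. The following Python sketch is illustrative only and does not reflect how the detection server or the DLP Agent is implemented: the function names are hypothetical, word characters are limited to ASCII A-Z, a-z, and 0-9 as stated above, matching is assumed to be case-insensitive, and CJK token validation is not modeled.

```python
import re

# Word characters as described above; anything else acts as a word boundary.
WORD_CHAR = r"[A-Za-z0-9]"


def whole_word_match(keyword: str, text: str) -> bool:
    # A keyword with no alphanumeric character is ignored under whole word matching.
    if not re.search(WORD_CHAR, keyword):
        return False
    # The keyword must not be preceded or followed by a word character.
    pattern = rf"(?<!{WORD_CHAR}){re.escape(keyword)}(?!{WORD_CHAR})"
    # Assumption: keyword matching is case-insensitive.
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def partial_word_match(keyword: str, text: str) -> bool:
    # Partial word matching finds the keyword anywhere, even inside a longer word.
    if not keyword.strip():
        return False
    return keyword.lower() in text.lower()


if __name__ == "__main__":
    print(whole_word_match("confidential", "Status: CONFIDENTIAL."))  # True
    print(whole_word_match("confid", "confidential"))                # False
    print(partial_word_match("confid", "confidential"))              # True
    print(whole_word_match("..", "a..b"))                            # False
```

The contrast between the second and third calls is the practical difference between the two modes: whole word matching rejects a keyword embedded in a longer word, while partial word matching accepts it.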
Chapter 36
Detecting file properties
This chapter includes the following topics:

■ Introducing file property detection

■ Configuring file property matching

■ Best practices for using file property matching

Introducing file property detection

Symantec Data Loss Prevention provides various methods for detecting the context of
messages, files, and attachments. You can detect the type, size, and name of files and
attachments. You can also use these conditions to except files and attachments from matching.
See “About file type matching” on page 900.
See “About file size matching” on page 902.
See “About file name matching” on page 903.
See “Configuring file property matching” on page 903.

About file type matching

You use the Message Attachment or File Type Match condition to match the file type of a
message attachment. Symantec Data Loss Prevention supports the identification of over 300
file types.
See “Supported formats for file type identification” on page 964.
Example uses of message attachment and file type matching are as follows:
■ A certain type of document should never leave the organization (such as a PGP document
or AutoCAD file).
■ A certain type of match is likely to occur only in a document of a certain type, such as a
Word document.
The detection engine does not rely on the file name extension to identify the file format.
Instead, it checks the binary signature of supported file formats. For example, if a user changes
a .doc file's extension to .txt and emails the file, the detection engine can still register a match
because the file's binary signature identifies it as a DOC file.
See “Supported formats for file type identification” on page 964.
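To illustrate the idea of identifying a format by its binary signature rather than its extension, the following Python sketch compares the first bytes of a file against a few well-known public signatures ("magic numbers"). It is a simplified illustration, not the Symantec detection engine: the signature table is a small hypothetical subset of the 300+ supported formats, and the function names are assumptions.

```python
# Small illustrative subset of well-known file signatures.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc, .xls, .ppt)"),
    (b"PK\x03\x04", "ZIP container (.zip, .docx, .xlsx, .pptx)"),
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def identify(data: bytes) -> str:
    # Compare the start of the content against each known signature.
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    # Only the first bytes are read; the extension (.txt, .doc, ...) is never consulted.
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

With this approach, a .doc file renamed to report.txt is still reported as an OLE2 compound file, because only its content is inspected. The OLE2 signature is shared by several legacy Office formats, so distinguishing a Word document from an Excel workbook requires looking deeper into the file than this sketch does.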

Note: File type matching does not detect the content of the file; it only detects the file type
based on its binary signature. To detect content, use a content matching condition.

See “Configuring the Message Attachment or File Type Match condition” on page 904.
See “About custom file type identification” on page 901.

About file format support for file type matching

Symantec Data Loss Prevention supports over 300 file formats for file type identification using
the Message Attachment or File Type Match policy condition.
For a complete list of the file formats that this policy condition recognizes, see the following
topic.
See “Supported formats for file type identification” on page 964.

About custom file type identification

If the type of file you want to detect is not supported as a system default file type, Symantec
Data Loss Prevention lets you identify custom file types using scripts.
To detect a custom file type, you use the Symantec Data Loss Prevention Scripting Language
to write a custom script that detects the binary signature of the file format that you want to
protect. Before you can use this match condition, you must enable it on the Enforce Server.
See “Enabling the Custom File Type Signature condition in the policy console” on page 908.
See “Configuring the Custom File Type Signature condition” on page 908.
Refer to the Symantec Data Loss Prevention Detection Customization Guide for the language
syntax and examples.

Note: The Symantec Data Loss Prevention Scripting Language only identifies custom file
formats; it does not extract content from custom file types.
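The DLP Scripting Language syntax is documented in the Detection Customization Guide and is not reproduced here. As a plain-Python illustration of the idea behind such a script, checking for a known binary signature at a known offset, the following sketch tests a hypothetical custom format whose files carry the bytes ACMEDATA at offset 4. The format, signature, offset, and function name are all assumptions for illustration.

```python
# Hypothetical custom format: the bytes b"ACMEDATA" always appear at offset 4.
CUSTOM_SIGNATURE = b"ACMEDATA"
SIGNATURE_OFFSET = 4


def is_custom_type(data: bytes, signature: bytes = CUSTOM_SIGNATURE,
                   offset: int = SIGNATURE_OFFSET) -> bool:
    # One comparison at one known position; the rest of the file is not scanned.
    # Slicing beyond the end of a short file returns fewer bytes, so the
    # comparison fails instead of raising an error.
    return data[offset:offset + len(signature)] == signature
```

Like a custom file type script, this only answers "is this file of the custom type?"; it does not extract any content from the file.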

About file size matching

Use the Message Attachment or File Size Match condition to detect messages based on the size
of particular email message components.
See “Detection messages and message components” on page 391.
For SMTP, you can also detect matches based on the number of files attached to an email message.
The option you choose when you configure this condition determines how a match is detected.
Choose from these options:
■ Single – This condition detects a match when the body of an email message or an email
attachment meets or exceeds the file size you specify. Detection is based on each
component individually.
For example, you could specify a condition where the single file size is more than 50 KB
(kilobytes). An email message with a 20 KB body and a single 51 KB attachment matches
because the attachment exceeds 50 KB. However, an email message with a 20 KB body and
two 20 KB attachments does not match. Even though the entire message is more than 50 KB,
each component is less than 50 KB. This option does not add together the sizes of the body
and the attachments.
■ Total Attachment File Size – This condition, for SMTP only, detects a match when the
size of a single attachment, or the combined size of all attachments, meets or exceeds the
file size you specify. Detection is based solely on the email attachments and does not factor
in the body of the email message.
For example, you could specify a condition where the total file size is more than 50 KB
(kilobytes). An email message with a 20 KB body and a single 40 KB attachment does not
match: although the total message exceeds 50 KB, the condition does not factor in the body
of the email message. However, an email message with a 20 KB body and two 30 KB
attachments does match, because the two attachments together exceed 50 KB. In addition,
an email with a 40 KB ZIP archive attached does not match, even if the extracted size of the
files in that archive exceeds 50 KB.
The default value for the Total Attachment File Size condition is zero. This condition accepts
at most four digits. You will encounter validation errors if you include decimal points or other
characters when specifying this value.
■ Total Attachment File Count – This condition, for SMTP only, detects a match when the
number of direct email attachments meets or exceeds the file count you specify. Detection is
based solely on the number of direct email attachments. For example, you could specify a
condition where the total file count is more than five files. An email with six files attached
matches this condition, but an email with a single ZIP archive attachment does not match,
even if the ZIP archive contains 20 files.
The default value for the Total Attachment File Count condition is zero. This condition accepts
at most seven digits. You will encounter validation errors if you include decimal points or
other characters when specifying this value.
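The three options can be modeled as follows. This Python sketch is illustrative only: sizes are in kilobytes, the comparisons follow the "meets or exceeds" wording above (exact boundary behavior in the product may differ), only the More Than variant of Single File Size is modeled, and the class and function names are hypothetical. Attachment sizes are those of the direct attachments, so a ZIP archive counts once, at its own size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    body_kb: float
    # Sizes of direct attachments only. A ZIP archive is one entry at its own
    # size; the files inside it are not counted or measured.
    attachment_kb: List[float] = field(default_factory=list)


def single_file_size_match(msg: Message, threshold_kb: float) -> bool:
    # Body and each attachment are compared separately; sizes are never summed.
    components = [msg.body_kb, *msg.attachment_kb]
    return any(size >= threshold_kb for size in components)


def total_attachment_size_match(msg: Message, threshold_kb: float) -> bool:
    # Attachment sizes are summed; the message body is ignored.
    return sum(msg.attachment_kb) >= threshold_kb


def total_attachment_count_match(msg: Message, threshold: int) -> bool:
    # Only direct attachments are counted.
    return len(msg.attachment_kb) >= threshold


# Examples from the text, with a 50 KB threshold.
print(single_file_size_match(Message(20, [51]), 50))           # True
print(single_file_size_match(Message(20, [20, 20]), 50))       # False
print(total_attachment_size_match(Message(20, [40]), 50))      # False
print(total_attachment_size_match(Message(20, [30, 30]), 50))  # True
```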

Note: If the Total Attachment File Size and Total Attachment File Count conditions are
ANDed together with a content matching rule, the rules will be applied to all message
components. Components will only match one condition in an incident, even if they violate
more than one of the conditions.

The Total Attachment File Size and Total Attachment File Count rules are available on
both Windows and Mac endpoints. On Windows, they apply to Microsoft Outlook and IBM
(Lotus) Notes events. On Mac, they apply to Outlook for Mac events.
See “Configuring the Message Attachment or File Size Match condition” on page 905.

About file name matching

You use the Message Attachment or File Name Match condition to detect the names of files
and attachments.
See “File name matching syntax” on page 907.
See “File name matching examples” on page 907.
See “Configuring the Message Attachment or File Name Match condition” on page 906.

Configuring file property matching

Table 36-1 lists the conditions available for implementing file property matching.

Table 36-1 File Properties match conditions

Match condition | Description

Message Attachment or File Type Match | Detect or except specific files and attachments by type.
See “About file type matching” on page 900.
See “Configuring the Message Attachment or File Type Match condition” on page 904.

Message Attachment or File Size Match | Detect or except specific files and attachments by size.
See “About file size matching” on page 902.
See “Configuring the Message Attachment or File Size Match condition” on page 905.

Message Attachment or File Name Match | Detect or except specific files and attachments by name.
See “About file name matching” on page 903.
See “Configuring the Message Attachment or File Name Match condition” on page 906.

Custom File Type Signature | Detect or except custom file types.

Configuring the Message Attachment or File Type Match condition

The Message Attachment or File Type Match condition matches the file type of a message
attachment. You can configure an instance of this condition in policy rules and
exceptions.
See “About file type matching” on page 900.
To configure the Message Attachment or File Type Match condition
1 Add a Message Attachment or File Type Match condition to a policy rule or exception,
or edit an existing one.
See “Configuring policies” on page 413.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the Message Attachment or File Type Match condition parameters.
See Table 36-2 on page 904.
3 Click Save to save the policy.

Table 36-2 Message Attachment or File Type Match condition parameters

Action Description

Select the file type or types Select all of the formats you want to match.
to match.
See “Supported formats for file type identification” on page 964.

Click select all or deselect all to select or deselect all formats.

To select all formats within a certain category (for example, all word-processing formats),
click the section heading.

The system implies an OR operator among all file types you select. For example, if you
select Microsoft Word and Microsoft Excel file types, the system detects all messages
that have a Word or an Excel document attached; a message does not need both
attachment types to match.

Match on attachments only. This condition only matches on the Message Attachments component.

See “Detection messages and message components” on page 391.

Also match on one or more Select this option to create a compound condition. All conditions must match to trigger
additional conditions. or except an incident.

You can Add any condition available from the list.

See “Configuring compound match conditions” on page 429.
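The OR behavior among selected file types can be sketched in a few lines of Python. The type labels are hypothetical strings standing in for the formats you select in the condition, and the sketch assumes each attachment has already been identified by its binary signature.

```python
from typing import Iterable, Set


def file_type_match(attachment_types: Iterable[str], selected: Set[str]) -> bool:
    # OR semantics: one attachment of any selected type is enough to match;
    # the message does not need to contain every selected type.
    return any(kind in selected for kind in attachment_types)


selected_types = {"word", "excel"}
print(file_type_match(["word"], selected_types))          # True
print(file_type_match(["pdf", "excel"], selected_types))  # True
print(file_type_match(["pdf"], selected_types))           # False
```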

Configuring the Message Attachment or File Size Match condition

The Message Attachment or File Size Match condition matches or excludes from matching
files of a specified size. You can configure an instance of this condition in policy rules and
exceptions.
See “About file size matching” on page 902.
To configure the Message Attachment or File Size Match condition
1 Add Message Attachment or File Size Match to a policy, or edit a policy that already
contains this rule.
See “Configuring policies” on page 413.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the Message Attachment or File Size Match condition parameters.
See Table 36-3 on page 905.
3 Click Save to save the policy.

Table 36-3 Message Attachment or File Size Match parameters

Action Description

Single File Size Select More Than to specify the minimum size a file must have to match, or Less Than to
specify the maximum size a file can have to match.

Enter a number, and select the unit of measure: bytes, kilobytes (KB), megabytes (MB),
or gigabytes (GB).

Total Attachment File Size Enter a number, and select the unit of measure: bytes, kilobytes (KB), megabytes (MB),
or gigabytes (GB) to qualify a match.

Total Attachment File Enter a number to specify the number of files required to qualify a match.
Count

Match on. Select one or both of the following message components on which to base the match:

■ Envelope – Not applicable to this condition.

■ Subject – Not applicable to this condition.
■ Body – The content of the message (This option applies only to Single File Size).
■ Attachments – Any files that are attached to or transferred by the message.

See “Selecting components to match on” on page 423.

Also match one or more Select this option to create a compound condition. All conditions must match to trigger or
additional conditions. except an incident.

You can Add any condition available from the list.

See “Configuring compound match conditions” on page 429.

Configuring the Message Attachment or File Name Match condition

The Message Attachment or File Name Match condition matches based on the name of a
file attached to the message. You can configure an instance of this condition in policy rules
and exceptions.
See “About file name matching” on page 903.
To configure the Message Attachment or File Name Match condition
1 Add a Message Attachment or File Name Match condition to a policy, or edit an existing
one.
See “Configuring policies” on page 413.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the Message Attachment or File Name Match condition parameters.
See Table 36-4 on page 906.
3 Click Save to save the policy.

Table 36-4 Message Attachment or File Name Match parameters

Action Description

Specify the File Name. Specify the file name to match using the DOS pattern matching language to represent
patterns in the file name.

Separate multiple matching patterns with commas or by placing them on separate lines.

See “File name matching syntax” on page 907.

See “File name matching examples” on page 907.

Match on attachments. This condition only matches on the Message Attachments component.

See “Detection messages and message components” on page 391.

Also match one or more Select this option to create a compound condition. All conditions must match to trigger or
additional conditions. except an incident.

You can Add any condition available from the list.

See “Configuring compound match conditions” on page 429.

File name matching syntax

For file name matching, the system supports the DOS pattern matching syntax to detect file
names, including wildcards.
See “About file name matching” on page 903.
Any characters you enter (other than the DOS operators) match exactly. To enter multiple file
name patterns, separate them with commas or place each pattern on its own line.
Table 36-5 describes the syntax for the Message Attachment or File Name Match condition.

Table 36-5 DOS Operators for file name detection

Operator Description

. Use a dot to separate the file name and the extension.

* Use an asterisk as a wild card to match any number of characters (including none).

? Use a question mark to match a single character.
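Python's standard fnmatch module also understands [seq] character sets, which Table 36-5 does not list, so the following sketch translates only the operators in the table into a regular expression. It is illustrative: the function names are hypothetical, and case-insensitive comparison is an assumption based on DOS file name conventions, not a documented product behavior.

```python
import re
from typing import List


def dos_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # Translate only the operators from Table 36-5; everything else is literal.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any number of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal, including the "." separator
    # Assumption: names are compared case-insensitively, as in DOS/Windows.
    return re.compile("".join(parts), re.IGNORECASE)


def parse_patterns(value: str) -> List[str]:
    # Patterns may be separated by commas or placed on separate lines.
    return [p.strip() for p in re.split(r"[,\n]", value) if p.strip()]


def file_name_match(file_name: str, value: str) -> bool:
    # The whole file name must match at least one pattern.
    return any(dos_pattern_to_regex(p).fullmatch(file_name)
               for p in parse_patterns(value))
```

For example, file_name_match("ENG-12345678.doc", "ENG-????????.doc") is true, while "ENG-1234.doc" does not match because each ? stands for exactly one character.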

File name matching examples

Table 36-6 lists some examples for matching file names using the Message Attachment or
File Name Match condition.
See “About file name matching” on page 903.

Table 36-6 File name matching examples

Match objective | Example

To match a Word file name that begins with ENG- followed by any eight characters | ENG-????????.doc

If you are not sure that it is a Word document | ENG-????????.*

If you are not sure how many characters are in the name | ENG-*.*

To match all file names that begin with ENG- and all file names that begin with ITA- | Enter the patterns as comma-separated values:

ENG-*.*,ITA-*

Or place each pattern on a separate line:

ENG-*.*

ITA-*

Enabling the Custom File Type Signature condition in the policy console
By default, the Custom File Type Signature policy condition is not enabled. To implement the
Custom File Type Signature condition, you must first enable it.
See “About custom file type identification” on page 901.
To enable the Custom File Type Signature condition
1 Using a text editor, open the file \Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\config\Manager.properties

2 Set the value of the following parameter to "true":

com.vontu.manager.policy.showcustomscriptrule=true

3 Stop and then restart the Symantec DLP Manager service.

4 Log back on to the Enforce Server Administration Console and add a new blank policy.
5 Add a new detection rule or exception and beneath the File Properties heading you should
see the Custom File Type Signature condition.
6 Configure the condition with your custom script.
See “Configuring the Custom File Type Signature condition” on page 908.
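If you manage several Enforce Server installations, step 2 can be scripted. The following Python sketch is a hypothetical helper, not a Symantec tool: it assumes the Manager.properties location from step 1 (adjust it for your installation), writes a .bak copy before changing the file, and only sets the property. You still need to restart the Symantec DLP Manager service as described in step 3.

```python
import re
from pathlib import Path

# Property named in step 2 above.
KEY = "com.vontu.manager.policy.showcustomscriptrule"
# Matches an active (not commented-out) assignment such as "key=value",
# "key = value", or "key:value".
KEY_LINE = re.compile(r"^\s*" + re.escape(KEY) + r"\s*[=:]")


def set_custom_script_rule(text: str) -> str:
    """Return the properties text with the key set to true, appending it if absent."""
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if KEY_LINE.match(line):
            lines[i] = f"{KEY}=true"
            found = True
    if not found:
        lines.append(f"{KEY}=true")
    return "\n".join(lines) + "\n"


def enable_custom_script_rule(properties_path: str) -> None:
    # Hypothetical helper. latin-1 round-trips any byte, so characters that
    # the script does not touch are preserved.
    path = Path(properties_path)
    original = path.read_text(encoding="latin-1")
    path.with_name(path.name + ".bak").write_text(original, encoding="latin-1")
    path.write_text(set_custom_script_rule(original), encoding="latin-1")
```

Commented-out lines (starting with #) are left untouched, so any existing documentation in the file survives the change.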

Configuring the Custom File Type Signature condition

The Custom File Type Signature condition matches custom file types that you have scripted.
You can implement the Custom File Type Signature condition in policy rules and exceptions.
See “About custom file type identification” on page 901.
See “Enabling the Custom File Type Signature condition in the policy console” on page 908.

To configure a Custom File Type Signature condition

1 Add a Custom File Type Signature condition to a policy rule or exception, or edit an
existing one.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
2 Configure the Custom File Type Signature condition parameters.
See Table 36-7 on page 909.
3 Click Save to save the policy.

Table 36-7 Custom File Type Signature parameters

Action Description

Enter the Script Name. Specify the name of the script. The name must be unique across policies.

Enter the custom file Enter the File Type Matches Signature script for detecting the binary signature of the custom
type script. file type.

See the Symantec Data Loss Prevention Detection Customization Guide for details on
writing custom scripts.

Match only on This condition only matches on the Message Attachments component.
attachments.
See “Detection messages and message components” on page 391.

Also match one or more Select this option to create a compound condition. All conditions must match to trigger or
additional conditions. except an incident.
You can Add any condition available from the list.

See “Configuring compound match conditions” on page 429.

Best practices for using file property matching

This section provides best practices for using file property matching conditions to match file
formats, file size, and file name.

Use compound file property rules to protect design and multimedia files
You can use IDM to protect files, or you can use file property rules. Unless you must protect
an exact file, the general recommendation is to use the file property rules because there is
less overhead in setting up the rules.

For example, if you want to detect CAD files that contain IP diagrams, you could index these
files and apply IDM rules to detect them. Alternatively, you could create a policy that contains
a file type rule that detects on the CAD file format plus a file size rule that specifies a threshold
size. The file property approach is preferred because in this scenario all you really care about
is protecting large CAD files potentially leaving the company. There is no need to gather and
index these files for IDM if you can simply create rules that will detect on the file type and the
size.
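A minimal sketch of such a compound rule in Python, assuming DWG as the CAD format and a hypothetical 10 MB threshold. DWG files begin with an ASCII version string such as AC1015 or AC1032, so checking the AC10 prefix is a simplification; the function names are illustrative and do not correspond to any product API.

```python
# Hypothetical values for illustration.
DWG_PREFIX = b"AC10"                # DWG version strings such as AC1015, AC1032
THRESHOLD_BYTES = 10 * 1024 * 1024  # 10 MB


def is_dwg(data: bytes) -> bool:
    # File type condition: binary signature at the start of the file.
    return data.startswith(DWG_PREFIX)


def large_cad_rule(data: bytes, threshold: int = THRESHOLD_BYTES) -> bool:
    # Compound rule: the file type AND the file size condition must both match.
    return is_dwg(data) and len(data) > threshold
```

Nothing has to be indexed in advance: the rule needs only the file's first bytes and its size, which is the reduced setup overhead the text describes.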

Do not use file type matching to detect content

File type recognition does not crack the file and detect content; it only detects the file type
based on the file's binary signature. To detect content, use a content detection rule such as
EDM, IDM, Data Identifiers, or Keyword matching.
For custom file type detection, use the DLP Scripting Language. Refer to the Symantec Data
Loss Prevention Detection Customization Guide.

Calculate file size properly to improve match accuracy

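The Single File Size option evaluates the message body and each attachment separately and never adds their sizes together. The Total Attachment File Size option adds up the sizes of the direct attachments only and ignores the message body. Choose the option and the size threshold with this difference in mind.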

Use expression patterns to match file names

The following DOS pattern matching expressions are provided as examples for configuring
the Message Attachment or File Name Match condition.

Table 36-8 File name detection examples

Example

Any characters you enter (other than the DOS operators) match exactly.

For example, to match a Word file name that begins with ENG- followed by any eight characters, enter:
ENG-????????.doc

If you are not sure that it is a Word document, enter: ENG-????????.*

If you are not sure how many characters follow ENG-, enter: ENG-*.*

To match all file names that begin with ENG- and all file names that begin with ITA-, enter: ENG-*.*,ITA-* (comma-separated),
or place each pattern on a separate line.

Use scripts and plugins to detect custom file types

Symantec Data Loss Prevention provides two mechanisms for detecting custom file types: the
DLP Scripting Language and the Content Extraction SPI. If the only requirement is file type
recognition, a script is usually easier to write than an SPI plugin. However, a script is not
always adequate.
The scripting language does not support loops, so you cannot iterate over the bytes of a file
and process them. It is designed to detect a known signature at a relatively fixed offset. You
also cannot use the scripting language to detect subtypes of the same document type. For
example, to detect password-protected PDF files, or only Word documents with track changes
enabled, you must write a plugin. On the other hand, you can deploy a script to the endpoint,
whereas plugins currently run on the server only.
For more information on writing custom plugins and scripts, refer to the Symantec Data Loss
Prevention Content Extraction Plugin Developers Guide and the Symantec Data Loss Prevention
Detection Customization Guide, respectively.
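To make the distinction concrete, the following Python sketch contrasts a fixed-offset signature check, which is what the text says a script is designed for, with a heuristic for encrypted (for example, password-protected) PDF files, which has to search the whole file for an /Encrypt entry in the trailer. The in operator hides exactly the kind of loop over the file bytes that the scripting language lacks. This is plain Python, not DLP Scripting Language or plugin code, and the /Encrypt search is a heuristic rather than a full PDF parser.

```python
def has_signature_at(data: bytes, signature: bytes, offset: int) -> bool:
    # What a script is designed for: a known signature at a known offset.
    return data[offset:offset + len(signature)] == signature


def looks_like_encrypted_pdf(data: bytes) -> bool:
    # Subtype detection needs more than a fixed-offset check: an encrypted PDF
    # carries an /Encrypt entry in its trailer, which can appear almost
    # anywhere in the file, so the whole byte string has to be searched.
    return has_signature_at(data, b"%PDF-", 0) and b"/Encrypt" in data
```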
Chapter 37
Detecting network incidents
This chapter includes the following topics:

■ Introducing protocol monitoring for network

■ Configuring the Protocol Monitoring condition for network detection

■ Best practices for using network protocol matching

Introducing protocol monitoring for network

Symantec Data Loss Prevention provides the Protocol Monitoring condition, which lets you
detect network messages based on the communications transport method.
Table 37-1 lists the protocols that Data Loss Prevention supports for network detection.

Table 37-1 Supported protocols for network monitoring

Protocol Description

Email/SMTP Simple Mail Transfer Protocol (SMTP) is a protocol for sending email messages between servers.

FTP The file transfer protocol (FTP) is used on the Internet for transferring files from one computer
to another.

HTTP The hypertext transfer protocol (HTTP) is the underlying protocol that supports the World Wide
Web. HTTP defines how messages are formatted and transmitted, and what actions Web servers
and browsers should take in response to various commands.

HTTP/SSL Hypertext transfer protocol over Secure Sockets Layer (HTTPS) is a protocol for sending data
securely between a client and server.

NNTP Network News Transfer Protocol (NNTP) is used to send, distribute, and retrieve USENET
messages.

TCP:custom_protocol The Transmission Control Protocol (TCP) is used to reliably exchange data between computers
across the Internet. This option is only available if you have defined a custom TCP port.

See “Configuring the Protocol Monitoring condition for network detection” on page 913.

Configuring the Protocol Monitoring condition for network detection

You use the Protocol Monitoring condition to detect network incidents. You can implement an
instance of the Protocol Monitoring condition in one or more policy detection rules and
exceptions.

Table 37-2 Protocol Monitoring condition parameters for Network

Action Description

Add or modify the Protocol or Endpoint Monitoring condition.
Add a new Protocol or Endpoint Monitoring condition to a policy rule or exception, or modify an existing rule or exception condition.
See “Configuring policies” on page 413.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.

Select one or more protocols to match.
To detect Network incidents, select one or more Protocols:
■ Email/SMTP
■ FTP
■ HTTP
■ HTTPS/SSL
■ NNTP

Configure a custom network protocol.
Select one or more custom protocols: TCP:custom_protocol.

Configure endpoint monitoring.
See “Configuring the Endpoint Monitoring condition” on page 918.

Match on the entire message.
The Protocol Monitoring condition matches on the entire message, not individual message components.
The Envelope option is selected by default. You cannot select individual message components.
See “Detection messages and message components” on page 391.

Also match one or more additional conditions.
Select this option to create a compound condition. All conditions must match to trigger or except an incident.
You can Add any condition available from the list.
See “Configuring compound match conditions” on page 429.
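The following Python sketch models two ideas from Table 37-2: a Protocol Monitoring condition that matches the whole message by its transport, and a compound condition in which every member condition must match. It is an illustration only; the Message type, the protocol strings, and the function names are hypothetical and do not reflect how the detection server is implemented.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Message:
    # Hypothetical message model: the detected transport and the body text.
    protocol: str  # for example "HTTP" or "TCP:custom_protocol"
    body: str


Condition = Callable[[Message], bool]


def protocol_condition(selected: FrozenSet[str]) -> Condition:
    """Match when the message arrived over one of the selected protocols.

    The check uses only the message's transport (the envelope), not
    individual components such as the body or attachments.
    """
    return lambda msg: msg.protocol in selected


def compound(*conditions: Condition) -> Condition:
    """A compound condition matches only if every member condition matches."""
    return lambda msg: all(cond(msg) for cond in conditions)


if __name__ == "__main__":
    rule = compound(
        protocol_condition(frozenset({"HTTP", "HTTPS/SSL"})),
        lambda msg: "CONFIDENTIAL" in msg.body,  # stand-in for a content condition
    )
    print(rule(Message("HTTP", "CONFIDENTIAL roadmap")))
    print(rule(Message("FTP", "CONFIDENTIAL roadmap")))
    print(rule(Message("HTTPS/SSL", "lunch menu")))
```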

Best practices for using network protocol matching

This section provides best practices for using network protocol matching conditions.

Use separate policies for specific protocols

You can use protocol matching detection to detect network traffic, such as Web mail, social
networking, and specific protocols. For protocol monitoring, consider implementing different
policies for each type of protocol, such as SMTP, TCP, HTTP, FTP, etc. Creating separate
policies for specific protocols may ease remediation and help you tune the policies.

Consider detection server network placement to support IP address matching

You can detect senders/users and recipients based on one or more IP addresses. However, to
do so you must carefully consider the placement of the detection server on your network.
If the detection server is installed between the Web proxy and the Internet, all Web traffic
from individuals in your organization appears to come from the IP address of the Web proxy. If
the detection server is installed between the Web proxy and the internal corporate network,
all Web traffic to destinations outside your organization appears to go to the Web proxy.
In either case, IP address matching may not identify the actual sender or recipient.
The best practice is to match on domain names instead of IP addresses.
Chapter 38
Detecting endpoint events
This chapter includes the following topics:

■ Introducing endpoint event detection

■ Configuring endpoint event detection conditions

■ Best practices for using endpoint detection

Introducing endpoint event detection

Endpoint detection matches events on endpoints where the Symantec DLP Agent is installed.
See “About Endpoint Prevent monitoring” on page 2296.
Symantec Data Loss Prevention provides several methods for detecting and excepting endpoint
events, and a collection of response rules for responding to them.
See “Response rule actions for endpoint detection” on page 1740.

About endpoint protocol monitoring

On the endpoint you can detect data loss based on the transport protocol, such as email
(SMTP), Web (HTTP), and file transfer (FTP).
See “Configuring the Endpoint Monitoring condition” on page 918.

Table 38-1 Supported protocols for endpoint monitoring

Protocol Description

Email/SMTP Simple Mail Transfer Protocol (SMTP) is a protocol for sending email messages between servers.

FTP The file transfer protocol (FTP) is used on the Internet for transferring files from one computer
to another.

HTTP/SSL Hypertext transfer protocol over Secure Sockets Layer (HTTPS) is a protocol for sending data
securely between a client and server.

About endpoint destination monitoring

You can also detect endpoint data loss based on the destination to which data is copied or moved,
such as a CD/DVD drive, a USB device, or the clipboard.
See “Configuring the Endpoint Monitoring condition” on page 918.

Table 38-2 Supported destinations for endpoint monitoring

Destination Description

Local Drive Monitor the local disk.

CD/DVD The CD/DVD burner on the endpoint computer. This destination can be any type of
third-party CD/DVD burning software.

Removable Storage Device Detect data that is transferred to any eSATA, FireWire, or USB connected storage
device.

Copy to Network Share Detect data that is transferred to any network share or remote file access.

Printer/Fax Detect data that is transferred to a printer or to a fax that is connected to the endpoint
computer. This destination also includes print-to-file documents.

Clipboard The Windows Clipboard used to copy and paste data between Windows applications.

About endpoint global application monitoring

The DLP Agent monitors applications when they access sensitive files. The DLP Agent monitors
any third-party application you add and configure at the System > Agents > Global Application
Monitoring screen.
You can create exceptions for allowable use scenarios.
See “Adding a Windows application” on page 2468.
See “Configuring the Endpoint Monitoring condition” on page 918.
See “Changing global application monitoring settings” on page 2462.

About endpoint location detection

You can detect or except events based on the location of the endpoint.
Using the Endpoint Location detection method, you can choose to detect incidents only when
the endpoint is on or off the network.
For example, you might configure this condition to match only when users are off the corporate
network because you have other rules in place for detecting network incidents. In this case
implementing the Endpoint Location detection method would achieve this result.
See “Configuring the Endpoint Location condition” on page 919.

About endpoint device detection

Symantec Data Loss Prevention lets you detect or except specific endpoint devices based on
described device metadata. You can configure a condition to allow endpoint users to copy
files to a specific device class, such as USB drives from a single manufacturer.
For example, a policy administrator has a set of USB flash drives with serial numbers that range
from 001 to 010. These are the only flash drives that should be allowed to access the company’s
endpoints. The policy administrator adds the serial number metadata to an exception of a
policy so that the policy applies to all USB flash drives except the drives whose serial
numbers fall in the 001 to 010 range. In this way, the device metadata allows only
“trusted devices” to carry company data.
See “Creating and modifying endpoint device configurations” on page 922.
The Endpoint Device Class or ID condition detects specific removable storage devices based
on their definitions. The Endpoint Destination parameters in the Endpoint Monitoring condition
detect any removable storage device on the endpoint.
See “Configuring the Endpoint Device Class or ID condition” on page 920.
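The serial-number example is a detect-all, except-trusted pair. The following Python sketch shows only that logic and is hypothetical throughout: the vendor ACME, the product FLASH, and the SN-prefixed serial numbers are invented, and it assumes that a device definition must match the entire Device Instance ID. In the product, you express the same idea as a policy rule plus an exception that uses the Endpoint Device Class or ID condition.

```python
import re

# Hypothetical device definition for the trusted drives: vendor ACME,
# product FLASH, serial numbers SN001 through SN010. Real Device Instance
# IDs must come from Device Manager or the DeviceID utility.
TRUSTED_DEVICES = re.compile(
    r"USBSTOR\\DISK&VEN_ACME&PROD_FLASH&REV_[0-9.]+\\SN(00[1-9]|010)"
)


def is_trusted(device_instance_id: str) -> bool:
    # Assumption: the whole ID must match the definition (full match).
    return TRUSTED_DEVICES.fullmatch(device_instance_id) is not None


def policy_applies(is_removable_storage: bool, device_instance_id: str) -> bool:
    """Detection: any removable storage device.
    Exception: devices that match the trusted definition."""
    return is_removable_storage and not is_trusted(device_instance_id)


if __name__ == "__main__":
    for serial in ["SN001", "SN010", "SN011", "SN100"]:
        dev = r"USBSTOR\DISK&VEN_ACME&PROD_FLASH&REV_1.00" + "\\" + serial
        print(serial, policy_applies(True, dev))
```

The alternation (00[1-9]|010) is one way to express the 001 to 010 range in a regular expression, because regular expressions match characters rather than numeric ranges.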

Configuring endpoint event detection conditions

Table 38-3 describes the various methods for implementing endpoint event monitoring.

Table 38-3 Detecting endpoint events

Endpoint match conditions Details

Endpoint Protocol Monitoring
Detect endpoint data based on the protocol.
See “About endpoint protocol monitoring” on page 915.
See “Configuring the Endpoint Monitoring condition” on page 918.

Endpoint Destination Monitoring
Detect endpoint data based on the destination.
See “About endpoint destination monitoring” on page 916.
See “Configuring the Endpoint Monitoring condition” on page 918.

Endpoint Application Monitoring
Detect endpoint data based on the application.
See “About endpoint global application monitoring” on page 916.
See “Configuring the Endpoint Monitoring condition” on page 918.

Endpoint Device Class or ID
Detect when users move endpoint data to a specific device.
See “About endpoint device detection” on page 917.
See “Configuring the Endpoint Device Class or ID condition” on page 920.

Endpoint Location
Detect when the endpoint is on or off the corporate network.
See “About endpoint location detection” on page 917.
See “Configuring the Endpoint Location condition” on page 919.

Configuring the Endpoint Monitoring condition

The Endpoint Monitoring condition matches on endpoint message protocols, destinations, and
applications.
You can implement an instance of the Endpoint Monitoring condition in one or more policy
detection rules and exceptions.

Note: This topic does not address network protocol monitoring configuration.
See “Configuring the Protocol Monitoring condition for network detection” on page 913.

Table 38-4 Configure the Endpoint Monitoring condition

Action Description

Add or modify the Endpoint Monitoring condition.
Add a new Protocol or Endpoint Monitoring condition to a policy rule or exception, or modify an existing rule or exception condition.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
See “Configuring policies” on page 413.

Select one or more endpoint protocols to match.
To detect Endpoint incidents, select one or more Endpoint Protocols:
■ Email/SMTP
■ HTTP
■ HTTPS/SSL
■ FTP
See “About endpoint protocol monitoring” on page 915.

Select one or more endpoint destinations.
To detect when users move data on the endpoint, select one or more Endpoint Destinations:
■ Local Drive
■ CD/DVD
■ Removable Storage Device
■ Copy to Network Share
■ Printer/Fax
■ Clipboard
See “About endpoint destination monitoring” on page 916.

Monitor endpoint applications.
To detect when endpoint applications access files, select the Application File Access option.
See “About global application monitoring” on page 2461.

Match on the entire message.
The DLP Agent evaluates the entire message, not individual message components.
The Envelope option is selected by default. You cannot select the other message components.
See “Detection messages and message components” on page 391.

Also match one or more additional conditions.
Select this option to create a compound condition. All conditions must match to trigger or except an incident.
You can Add any condition available from the list.
See “Configuring compound match conditions” on page 429.

Configuring the Endpoint Location condition

The Endpoint Location condition matches endpoint events based on the location of the endpoint
computer where the DLP Agent is installed.
You can implement an instance of the Endpoint Location condition in one or more policy
detection rules and exceptions.

See “Configuring policies” on page 413.

Table 38-5 Configure the Endpoint Location detection condition

Action Description

Add or modify the Endpoint Location condition.
Add a new Endpoint Location detection condition to a policy rule or exception, or modify an existing policy rule or exception.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.

Select the location to monitor.
Select one of the following endpoint locations to monitor:
■ Off the corporate network
Select this option to detect or except events when the endpoint computer is off the corporate network.
■ On the corporate network
Select this option to detect or except events when the endpoint computer is on the corporate network.
This option is the default selection.
See “About endpoint location detection” on page 917.

Match on the entire message.
The DLP Agent evaluates the entire message, not individual message components.
The Envelope option is selected by default. The other message components are not selectable.
See “Detection messages and message components” on page 391.

Also match one or more additional conditions.
Select this option to create a compound condition. All conditions must match to trigger or except an incident.
You can Add any condition available from the list.
See “Configuring compound match conditions” on page 429.

See “About endpoint location detection” on page 917.

See “Configuring the Endpoint Location condition” on page 919.

Configuring the Endpoint Device Class or ID condition

The Endpoint Device Class or ID condition lets you detect when users move endpoint data to
specific devices.
You can implement the Endpoint Device Class or ID condition in one or more policy detection
rules or exceptions.
See “Configuring policies” on page 413.

Table 38-6 Configuring the Endpoint Device Class or ID condition

Action Description

Add or modify an Endpoint Device condition.
Add a new Endpoint Device Class or ID condition to a policy rule or exception, or modify an existing one.
See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.

Select one or more devices.
The condition matches when users move data from an endpoint computer to the selected device(s).
Click Create an endpoint device to define one or more devices.
See “Creating and modifying endpoint device configurations” on page 922.

Match on the entire message.
The DLP Agent matches on the entire message, not individual message components.
The Envelope option is selected by default. You cannot select other components.
See “Detection messages and message components” on page 391.
See “Configuring compound match conditions” on page 429.

See “About endpoint device detection” on page 917.

Gathering endpoint device IDs for removable devices

You add device metadata information to the Enforce Server and create one or more policy
detection methods that detect or except the specific device instance or class of device. The
system supports the regular expression syntax for defining the metadata. The system displays
the device metadata at the Incident Snapshot screen during remediation.
See “Creating and modifying endpoint device configurations” on page 922.
The metadata the system requires to define the device instance or device class is the Device
Instance ID. On Windows you can obtain the "Device Instance Id" from the Device Manager.
In addition, Symantec Data Loss Prevention provides DeviceID.exe for devices attached to
Windows endpoints and DeviceID for devices attached to Mac endpoints. You can use these
utilities to extract Device Instance ID strings and device regex information. These utilities also
report what devices the system can recognize for detection. These utilities are available with
the Enforce Server installation files.
See “About the Device ID utilities” on page 2496.

Note: The Device Instance ID is also used by Symantec Endpoint Protection.

To obtain the Device Instance ID (on Windows)

1 Right-click My Computer.
2 Select Manage.
3 Select the Device Manager.
4 Click the plus sign beside any device to expand its list of device instances.
5 Double-click the device instance. Or, right-click the device instance and select Properties.
6 Look in the Details tab for the Device Instance Id.
7 Use the ID to create device metadata expressions.
See “Creating and modifying endpoint device configurations” on page 922.
See “About endpoint device detection” on page 917.
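After you copy a Device Instance ID, you can derive an expression from it. The following Python sketch is an illustration under stated assumptions: it assumes the USBSTOR layout shown in the Windows examples in Table 38-7 (an enumerator, then TYPE&VEN_...&PROD_...&REV_..., then a serial string), and the function names are hypothetical. The DeviceID utilities remain the supported way to produce device regex information.

```python
import re
from typing import Dict


def parse_usbstor_id(instance_id: str) -> Dict[str, str]:
    """Split a USBSTOR-style Device Instance ID into its parts.

    Assumed layout: ENUMERATOR\\TYPE&VEN_x&PROD_y&REV_z\\SERIAL
    """
    enumerator, hardware, serial = instance_id.split("\\", 2)
    parts = hardware.split("&")
    fields = {"enumerator": enumerator, "type": parts[0], "serial": serial}
    for part in parts[1:]:
        key, sep, value = part.partition("_")
        if sep and key in ("VEN", "PROD", "REV"):
            fields[key.lower()] = value
    return fields


def vendor_regex(instance_id: str) -> str:
    """Build a vendor-wide expression in the style of 'Lexar generic'."""
    f = parse_usbstor_id(instance_id)
    # r"\\" is a regex-escaped literal backslash; re.escape protects any
    # regex metacharacters inside the copied values.
    return (re.escape(f["enumerator"]) + r"\\" + re.escape(f["type"])
            + "&VEN_" + re.escape(f["ven"]) + ".*")


if __name__ == "__main__":
    sample = r"USBSTOR\DISK&VEN_SANDISK&PROD_ULTRA_BACKUP&REV_8.32\3485731392112B52"
    print(parse_usbstor_id(sample))
    print(vendor_regex(sample))
```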

Creating and modifying endpoint device configurations

You can configure one or more devices for specific endpoint detection. Once the device
expressions are configured, you implement the Endpoint Device Class or ID condition in one
or more policy rules or exceptions to deny or allow the use of the specific devices.
You might deny or allow the use of devices if endpoint users must copy sensitive information
to company-provided USB drives or SD cards.
See “Gathering endpoint device IDs for removable devices” on page 921.

Note: You can use the DeviceID utility for Windows and Mac endpoints to generate removable
storage device information. See “About the Device ID utilities” on page 2496.

To create and modify endpoint device ID expressions

1 Go to the System > Agent > Endpoint Devices screen.
2 Click Add Device.
3 Enter the Device Name.
4 Enter a Device Description.
5 Enter the Device Definition expression.
The device definition must conform to the regular expression syntax.
See Table 38-7 on page 923.
See “About writing regular expressions” on page 853.
6 Click Save to save the device configuration.

7 Implement the Endpoint Device Class or ID condition in a detection rule or exception.
See “Configuring the Endpoint Device Class or ID condition” on page 920.

Table 38-7 Example Windows endpoint device regular expressions

Example device class Expression example

Generic USB Device USBSTOR\\DISK&VEN_SANDISK&PROD_ULTRA_BACKUP&REV_8\.32\\3485731392112B52

iPod generic USBSTOR\\DISK&VEN_APPLE&PROD_IPOD&.*

Lexar generic USBSTOR\\DISK&VEN_LEXAR.*

CD Drive IDE\\DISKST9160412ASG__________________0002SDM1\\4&F4ACADA&0&0\.0\.0

Hard drive USBSTOR\\DISK&VEN_MAXTOR&PROD_ONETOUCH_II&REV_023D\\B60899082H____&0

Blackberry generic USBSTOR\\DISK&VEN_RIM&PROD_BLACKBERRY...&REV.*

Cell phone USBSTOR\\DISK&VEN_PALM&PROD_PRE&REV_000\\FBB4B8FF4CAEFEC1124DED689&0

Table 38-8 Example Mac endpoint regex information

Example device class Regex information example

SanDisk USB SanDisk&Cruzer Blade&20051535820CF1302C2E

SD Card SDC&346128262

External hard drive External&RAID&0000000000702293

See “About endpoint device detection” on page 917.
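In the Windows expressions in Table 38-7, \\ stands for one literal backslash and \. for one literal period; an unescaped period matches any character. The following Python sketch checks three of the table's expressions against sample IDs. The sample IDs other than the SanDisk one are hypothetical, and the sketch assumes the whole ID must match (Python's `re.fullmatch`); the product's matching semantics may differ.

```python
import re
from typing import List

# Expressions copied from Table 38-7.
EXPRESSIONS = {
    "Generic USB Device":
        r"USBSTOR\\DISK&VEN_SANDISK&PROD_ULTRA_BACKUP&REV_8\.32\\3485731392112B52",
    "iPod generic": r"USBSTOR\\DISK&VEN_APPLE&PROD_IPOD&.*",
    "Lexar generic": r"USBSTOR\\DISK&VEN_LEXAR.*",
}


def matching_classes(instance_id: str) -> List[str]:
    """Return the names of the expressions that match the entire ID."""
    return [name for name, rx in EXPRESSIONS.items()
            if re.fullmatch(rx, instance_id)]


if __name__ == "__main__":
    samples = [
        r"USBSTOR\DISK&VEN_SANDISK&PROD_ULTRA_BACKUP&REV_8.32\3485731392112B52",
        r"USBSTOR\DISK&VEN_LEXAR&PROD_JUMPDRIVE&REV_1100\0A4F1007&0",
    ]
    for s in samples:
        print(s, matching_classes(s))
```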

Best practices for using endpoint detection

When implementing endpoint match conditions, keep in mind the following considerations:
■ Any detection method that executes on the endpoint matches on the entire message, not
individual message components.
See “Detection messages and message components” on page 391.
■ The Endpoint Destination and Endpoint Location methods are specific to the endpoint
computer and are not user-based.
See “Distinguish synchronized DGM from other types endpoint detection” on page 941.
■ You will often combine group and detection methods on the endpoint. Keep in mind that
the policy language ANDs detection and group methods, whereas methods of the same
type (two rules, for example) are ORed.
See “Policy detection execution” on page 394.
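The following Python sketch restates that combination rule: rules of the same type are ORed, and the detection result is ANDed with the group result. It is an illustration, not product code. The event fields and rule functions are hypothetical, and it assumes that a policy with no group rules applies to every user.

```python
from typing import Callable, List

Rule = Callable[[dict], bool]


def policy_matches(event: dict,
                   detection_rules: List[Rule],
                   group_rules: List[Rule]) -> bool:
    """OR within each rule type, AND across detection and group results.

    Assumption: with no group rules, only the detection rules decide.
    """
    detected = any(rule(event) for rule in detection_rules)
    in_group = any(rule(event) for rule in group_rules) if group_rules else True
    return detected and in_group


if __name__ == "__main__":
    usb = lambda e: e["destination"] == "Removable Storage Device"
    clip = lambda e: e["destination"] == "Clipboard"
    finance = lambda e: e["group"] == "Finance"
    event = {"destination": "Clipboard", "group": "Finance"}
    print(policy_matches(event, [usb, clip], [finance]))
```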
Chapter 39
Detecting described identities
This chapter includes the following topics:

■ Introducing described identity matching

■ Described identity matching examples

■ Configuring described identity matching policy conditions

■ Best practices for using described identity matching

Introducing described identity matching

Described identity detection matches patterns in messages from email senders and recipients,
Windows users, IM users, URL domains, and IP addresses.
See “Configuring described identity matching policy conditions” on page 926.
See “Configuring the Sender/User Matches Pattern condition” on page 927.
See “Configuring the Recipient Matches Pattern condition” on page 930.

Described identity matching examples

Table 39-1 lists some examples of described identity matching.

Table 39-1 Pattern identity matching examples

Example pattern: fr, cu
Matches: All SMTP email that is addressed to .fr (France) or .cu (Cuba) addresses.
Does Not Match: Any email that is addressed to a French company with the .com extension instead of .fr. Any HTTP post to a .fr address through a Web-based mail application, such as Yahoo mail.

Example pattern: company.com
Matches: All SMTP email that is addressed to the specific domain URL, such as symantec.com.
Does Not Match: Any SMTP email that is not addressed to the specific domain URL.

Example pattern: 3rdlevel.company.com
Matches: All SMTP email that is addressed to the specific 3rd level domain, such as dlp.symantec.com.
Does Not Match: Any SMTP email that is not addressed to the specific 3rd level domain.

Example pattern: [email protected]
Matches: All SMTP email that is addressed to [email protected]. All SMTP email that is addressed to [email protected] (the pattern is not case-sensitive).
Does Not Match: Any email not specifically addressed to [email protected], such as:
■ [email protected]
■ [email protected]
■ [email protected]

Example pattern: 192.168.0.*
Matches: All email, Web, or URL traffic specifically addressed to 192.168.0.[0-255]. This result assumes that the IP address maps to the desired domain, such as web.company.com.
Note: If the IP address does not match, use one or more domain URLs instead.

Example patterns: */local/dom1/dom/dom2/Sym, */Sym*, */dlp/qa/test/local/Sym*
These are Lotus Notes example email addresses.
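The following Python sketch approximates the email and domain rows of Table 39-1. It is not the product's matcher, and its rules are assumptions: matching is case-insensitive, a pattern containing @ must equal the whole address, and any other pattern is a domain suffix that must end on a label boundary. Under that assumption company.com also matches subdomains such as dlp.company.com, which the product may handle differently. The function names and sample addresses are hypothetical.

```python
from typing import List


def split_patterns(field: str) -> List[str]:
    """Split a comma-separated field; lowercase, drop a leading '.' and blanks."""
    cleaned = (p.strip().lower().lstrip(".") for p in field.split(","))
    return [p for p in cleaned if p]


def address_matches(address: str, field: str) -> bool:
    address = address.strip().lower()
    domain = address.rpartition("@")[2]
    for pattern in split_patterns(field):
        if "@" in pattern:
            # Full-address pattern: the whole address must be equal.
            if address == pattern:
                return True
        elif domain == pattern or domain.endswith("." + pattern):
            # Domain pattern: suffix match on a label boundary.
            return True
    return False


if __name__ == "__main__":
    for addr in ["buyer@shop.fr", "buyer@shop.com",
                 "ops@dlp.company.com", "ops@mycompany.com"]:
        print(addr, address_matches(addr, "fr, cu, company.com"))
```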

Configuring described identity matching policy conditions

Table 39-2 lists and describes the two conditions that Symantec Data Loss Prevention provides
for matching described identities.

See “Described identity matching examples” on page 925.

Table 39-2 Implementing described identity matching

Match condition Description

Sender/User Matches Pattern Matches on an email address, domain address, IP address, Windows user
name, or IM screen name/handle.

See “Configuring the Sender/User Matches Pattern condition” on page 927.

Recipient Matches Pattern Matches on an email address, domain address, IP address, or newsgroup.

See “Configuring the Recipient Matches Pattern condition” on page 930.

About Reusable Sender/Recipient Patterns

You can create Reusable Sender/User and Recipient Patterns for use in your policies. Reusable
Sender/Recipient Patterns make policy creation and management easier for policies using
such patterns. For details about creating and using Reusable Sender/Recipient Patterns, refer
to the following topics.
See “Configuring a Reusable Sender Pattern” on page 929.
See “Configuring a Reusable Recipient Pattern” on page 931.

Configuring the Sender/User Matches Pattern condition

The Sender/User Matches Pattern condition matches described user and message sender
identities. You can use this condition in a policy detection rule or exception.
See “Introducing described identity matching” on page 925.
See “Best practices for using described identity matching” on page 932.
Table 39-3 describes the process for configuring the Sender/User Matches Pattern condition.

Table 39-3 Configuring the Sender/User Matches Pattern condition

Action Description

Enter one or more Patterns to match one or more message senders.
Note: The Pattern field allows unlimited data (only limited by the browser).
Sender Email Address Pattern:
■ To match a specific email address, enter the full email address:
[email protected]
■ To match multiple exact email addresses, enter a comma-separated list:
[email protected], [email protected], [email protected]
■ To match partial email addresses, enter one or more domain patterns:
  ■ Enter one or more top-level domain extensions, for example:
  .fr, .cu, .in, .jp
  ■ Enter one or more domain names, for example:
  company.com, symantec.com
  ■ Enter one or more third-level (or lower) domain names:
  web.company.com, mail.yahoo.com, smtp.gmail.com, dlp.security.symantec.com
Windows User Names
Enter the names of one or more Windows users, for example:
john.smith, jsmith
IM Screen Name
Enter one or more IM screen names that are used in instant messaging systems, for example:
john_smith, jsmith
IP Address
Enter one or more IP addresses that map to the domain you want to match, for example:
■ Exact IP address match, for example:
192.168.1.1 or for IPv6 fdda:c450:e808:3020:abcd:abcd:0000:5000
■ Wildcard match: The asterisk (*) character can substitute for one or more fields, for example:
192.168.1.* or 192.*.168.* or for IPv6 fdda:c450:e808:3:*:*:*:*
Note: For IPv6, use only long format addresses.

Select a Reusable Sender Pattern.
You can select a Sender Pattern that you have saved for reuse in your policies. Select Reusable Sender Pattern, then choose the pattern you want from the dropdown list.

Match on the entire message.
This condition matches on the entire message. The Envelope option is selected by default. You cannot select any other message component.
See “Detection messages and message components” on page 391.

Also match additional conditions.
Select this option to create a compound condition. All conditions must match to trigger an incident.
You can Add any available condition from the list.
See “Configuring compound match conditions” on page 429.
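The following Python sketch shows one reading of the IP Address wildcard rules: each asterisk stands for exactly one field, other fields must be equal, and IPv6 addresses use the long colon-separated form without :: compression. Fields are compared numerically, so 3 and 0003 are treated as the same IPv6 field. These semantics are assumptions rather than documented product behavior, and the function names are hypothetical.

```python
from typing import List, Tuple


def _split(address: str) -> Tuple[List[str], int, int]:
    """Return (fields, numeric base, expected field count)."""
    if ":" in address:
        if "::" in address:
            raise ValueError("use long-format IPv6 addresses without '::'")
        return address.split(":"), 16, 8
    return address.split("."), 10, 4


def ip_matches(address: str, pattern: str) -> bool:
    """True if every non-wildcard pattern field equals the address field."""
    addr_fields, addr_base, count = _split(address)
    pat_fields, pat_base, _ = _split(pattern)
    if addr_base != pat_base or len(addr_fields) != count or len(pat_fields) != count:
        return False  # different families or malformed input
    for a, p in zip(addr_fields, pat_fields):
        if p == "*":
            continue  # '*' stands in for exactly one field
        if int(a, addr_base) != int(p, addr_base):
            return False  # numeric compare, so "3" equals "0003"
    return True


if __name__ == "__main__":
    print(ip_matches("192.10.168.7", "192.*.168.*"))
    print(ip_matches("fdda:c450:e808:0003:0000:0000:0000:0001",
                     "fdda:c450:e808:3:*:*:*:*"))
```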

Configuring a Reusable Sender Pattern

If you want to use a Sender Pattern in multiple policies, configure a Reusable Sender Pattern.
Reusable Sender Patterns can be selected for use in your policies from the Configure Policy
- Edit Rule page. You can create, edit, and manage your Reusable Sender Patterns from the
Sender/Recipient Patterns page. For example, if you use a Sender Pattern in 50 policies,
using a Reusable Sender Pattern lets you enter the Sender Pattern a single time, then select
it for each policy. In addition, if you need to update the Sender Pattern for those 50 policies,
you can edit it from the Configure Reusable Sender Pattern page and your changes will be
applied automatically to each policy using that pattern.
To configure a Reusable Sender Pattern
1 Take one of the following actions:
■ If you are configuring a policy with a Sender/User Matches Pattern rule, from the
Manage > Policies > Policy List > Configure Policy - Edit Rule page, click Create
Reusable Sender Pattern.
■ In the Enforce Server administration console, navigate to Manage > Policies >
Sender/Recipient Patterns, then click Add > Sender Pattern.

2 In the General section on the Configure Reusable Sender Pattern page, enter a Name
and Description for your Reusable Sender Pattern.
3 In the Sender Pattern section, enter the User Patterns and IP Addresses as described
in the "Configuring the Sender/User Matches Pattern condition table".
See Table 39-3 on page 928.
4 Click Save.
5 To edit a saved Reusable Sender Pattern, on the Manage > Policies > Sender/Recipient
Patterns page, click the dropdown arrow next to the name of the pattern you want to edit,
then select Edit.
6 To delete a saved Reusable Sender Pattern, on the Manage > Policies >
Sender/Recipient Patterns page, click the dropdown arrow next to the name of the
pattern you want to delete, then select Delete.

Note: You cannot delete a Reusable Sender Pattern that is currently in use in any policy.
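The reuse behavior described above (enter once, reference from many policies, edits propagate, no deletion while in use) amounts to policies referencing a pattern by name. The following Python sketch is a toy model of that indirection, not the Enforce Server's data model; the class, its methods, and the one-pattern-per-policy simplification are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PatternRegistry:
    """Toy model: policies refer to a reusable pattern by its name."""
    patterns: Dict[str, List[str]] = field(default_factory=dict)
    policies: Dict[str, str] = field(default_factory=dict)  # policy -> pattern name

    def save(self, name: str, values: List[str]) -> None:
        # Creating or editing a pattern replaces its values in one place.
        self.patterns[name] = list(values)

    def use(self, policy: str, name: str) -> None:
        if name not in self.patterns:
            raise KeyError(name)
        self.policies[policy] = name

    def effective(self, policy: str) -> List[str]:
        # Resolved by name at lookup time, so an edit reaches every policy.
        return self.patterns[self.policies[policy]]

    def delete(self, name: str) -> None:
        in_use = [p for p, n in self.policies.items() if n == name]
        if in_use:
            raise ValueError(f"pattern {name!r} is in use by {in_use}")
        del self.patterns[name]


if __name__ == "__main__":
    reg = PatternRegistry()
    reg.save("partners", ["partner.com"])
    reg.use("Policy A", "partners")
    reg.save("partners", ["partner.com", "partner.fr"])
    print(reg.effective("Policy A"))
```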

Configuring the Recipient Matches Pattern condition

The Recipient Matches Pattern condition matches the described identity of message recipients.
You can use this condition in a policy detection rule or exception.
See “Introducing described identity matching” on page 925.
See “Define precise identity patterns to match users” on page 932.
Table 39-4 describes the parameters for configuring the Recipient Matches Pattern condition.

Table 39-4 Recipient Matches Pattern condition parameters

Action Description

Enter one or more Recipient Email Address/Newsgroup Pattern

Patterns to match one or more
Enter one or more email or newsgroup addresses to match the desired recipients.
message recipients. Separate
multiple entries with commas. To match specific email addresses, enter the full address, such as
[email protected]. To match email addresses from a specific domain, enter
Note: The Pattern field allows
the domain name only, such as symantec.com.
unlimited data (only limited by
the browser). IP Address

Enter one or more IP address patterns that resolve to the domain that you want to
match. You can use the asterisk (*) wildcard character for one or more fields. You can
enter both IPv4 and IPv6 addresses separated by commas.

URL Domain

Enter one or more URL Domains to match Web-based traffic, including Web-based
email and postings to a Web site. For example, if you want to prohibit the receipt of
certain types of data using Hotmail, enter hotmail.com.
Detecting described identities 931
Configuring described identity matching policy conditions

Table 39-4 Recipient Matches Pattern condition parameters (continued)

Action Description

Select a Reusable Recipient You can select a Recipient Pattern that you have saved for reuse in your policies.
Pattern Select Reusable Recipient Pattern, then choose the pattern you want from the
dropdown list.

Configure match counting. Select one of the following options to specify the number of email recipients that must
match:

■ All recipients must match (Email Only) does not count a match unless ALL email
message recipients match the specified pattern.
■ At least _ recipients must match (Email Only) lets you specify the minimum
number of email message recipients that must match to be counted.
Select one of the following options to specify how you want to count the matches:

■ Check for existence

Reports a match count of 1 if there are one or more matches.
■ Count all matches
Reports the sum of all matches.

See “Configuring match counting” on page 421.

Match on the entire message. This condition matches on the entire message. The Envelope option is selected by
default. You cannot select any other message component.

See “Detection messages and message components” on page 391.

Also match additional conditions. Select this option to create a compound condition. All conditions in a rule or exception must match to trigger an incident.
You can Add any available condition from the list.

See “Configuring compound match conditions” on page 429.
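The match-counting options in the table above can be modeled in a few lines. The following Python sketch is illustrative only and is not Symantec's implementation: the function name `recipient_match_count`, its parameters, and the `matches` callback (a stand-in for whatever pattern test the condition applies) are hypothetical, and combining the recipient options with the counting options this way is an assumption.

```python
from typing import Callable, Iterable


def recipient_match_count(
    recipients: Iterable[str],
    matches: Callable[[str], bool],
    *,
    all_must_match: bool = False,
    at_least: int = 1,
    count_all: bool = True,
) -> int:
    """Return the match count reported for one message; 0 means no match.

    Hypothetical model of the options in Table 39-4:
      all_must_match -> "All recipients must match (Email Only)"
      at_least       -> "At least _ recipients must match (Email Only)"
      count_all      -> "Count all matches" (True) or "Check for existence" (False)
    """
    recipients = list(recipients)
    hits = [r for r in recipients if matches(r)]
    if all_must_match and len(hits) != len(recipients):
        return 0  # at least one recipient fell outside the pattern
    if len(hits) < at_least:
        return 0  # not enough matching recipients
    # "Check for existence" reports 1; "Count all matches" reports the total.
    return len(hits) if count_all else 1


def in_example_domain(address: str) -> bool:
    # Stand-in pattern test for the usage example below.
    return address.lower().endswith("@example.com")


recipients = ["ann@example.com", "bob@example.org", "cal@example.com"]
print(recipient_match_count(recipients, in_example_domain))                       # 2
print(recipient_match_count(recipients, in_example_domain, count_all=False))      # 1
print(recipient_match_count(recipients, in_example_domain, all_must_match=True))  # 0
print(recipient_match_count(recipients, in_example_domain, at_least=3))           # 0
```

With "Check for existence", a message with any number of matching recipients contributes a count of 1, which is why the second call reports 1 rather than 2.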

Configuring a Reusable Recipient Pattern

If you want to use a Recipient Pattern in multiple policies, configure a Reusable Recipient
Pattern. Reusable Recipient Patterns can be selected for use in your policies from the
Configure Policy - Edit Rule page. You can create, edit, and manage your Reusable Recipient
Patterns from the Sender/Recipient Patterns page. For example, if you use a Recipient
Pattern in 50 policies, using a Reusable Recipient Pattern lets you enter the Recipient Pattern
a single time, then select it for each policy. In addition, if you need to update the Recipient
Pattern for those 50 policies, you can edit it from the Configure Reusable Recipient Pattern
page and your changes will be applied automatically to each policy using that pattern.
To configure a Reusable Recipient Pattern
1 Take one of the following actions:
■ If you are configuring a policy with a Recipient Matches Pattern rule, from the Manage
> Policies > Policy List > Configure Policy - Edit Rule page, click Create Reusable
Recipient Pattern.
■ In the Enforce Server administration console, navigate to Manage > Policies >
Sender/Recipient Patterns, then click Add > Recipient Pattern.

2 In the General section on the Configure Reusable Recipient Pattern page, enter a
Name and Description for your Reusable Recipient Pattern.
3 In the Recipient Pattern section, enter the Email Addresses, IP Addresses, and URL
Domains as described in the "Recipient Matches Pattern condition table".
See Table 39-4 on page 930.
4 Click Save.
5 To edit a saved Reusable Recipient Pattern, on the Manage > Policies >
Sender/Recipient Patterns page, click the dropdown arrow next to the name of the
pattern you want to edit, then select Edit.
6 To delete a saved Reusable Recipient Pattern, on the Manage > Policies >
Sender/Recipient Patterns page, click the dropdown arrow next to the name of the
pattern you want to delete, then select Delete.

Note: You cannot delete a Reusable Recipient Pattern that is currently in use in any policy.
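Reusable Recipient Patterns behave like shared references: policies point to one named definition, so a single edit reaches every policy, and a pattern that is still referenced cannot be deleted. The toy Python model below illustrates that design under those assumptions; the class and method names are hypothetical and do not correspond to any Symantec API.

```python
class ReusablePatternRegistry:
    """Toy model: policies store a pattern name, never a copy of its contents."""

    def __init__(self) -> None:
        self._patterns: dict[str, str] = {}
        self._used_by: dict[str, set[str]] = {}

    def save(self, name: str, pattern: str) -> None:
        # Creating or editing a pattern replaces the single shared definition.
        self._patterns[name] = pattern
        self._used_by.setdefault(name, set())

    def attach(self, policy: str, name: str) -> None:
        # Selecting the pattern in a rule records a reference to it.
        if name not in self._patterns:
            raise KeyError(f"no Reusable Recipient Pattern named {name!r}")
        self._used_by[name].add(policy)

    def detach(self, policy: str, name: str) -> None:
        self._used_by.get(name, set()).discard(policy)

    def pattern_for(self, name: str) -> str:
        # Every policy resolves the name at use time, so edits apply everywhere.
        return self._patterns[name]

    def delete(self, name: str) -> None:
        in_use = self._used_by.get(name)
        if in_use:
            raise ValueError(f"{name!r} is in use by: {', '.join(sorted(in_use))}")
        self._patterns.pop(name, None)
        self._used_by.pop(name, None)


registry = ReusablePatternRegistry()
registry.save("Partner domains", "example.org")
for policy in ("Policy A", "Policy B"):
    registry.attach(policy, "Partner domains")
registry.save("Partner domains", "example.org, example.net")  # one edit...
print(registry.pattern_for("Partner domains"))  # ...seen by both policies
```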

Best practices for using described identity matching

This section provides considerations for implementing the Sender/User Matches Pattern and Recipient Matches Pattern conditions in policy detection rules or exceptions.

Define precise identity patterns to match users

Both the Sender/User and Recipient conditions match on the entire message, not individual
message components. If either condition is used as an exception, a match excludes the entire
message, not only the header.
See “Policy detection execution” on page 394.
For both described identity matching rules, the system implies an OR between all
comma-separated list items and between all fields. For example, if any single email address
among a list of email addresses matches, the condition reports (or excepts) an incident. Or,
if either an email address, a domain name, or an IP address matches, the condition reports
(or excepts) an incident.
See “Detection messages and message components” on page 391.
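The OR semantics described above, together with the exact-address and domain rules for the Email Address and IP Address fields, can be sketched as follows. This Python model illustrates the behavior under stated assumptions and is not the product's matching engine. The helper names are hypothetical. The sketch also assumes that an entry without "@" is a domain that also matches its subdomains, that comparisons ignore case, and that the * wildcard replaces whole IPv4 fields while IPv6 entries are compared exactly. Wildcards in IPv6 entries are not modeled.

```python
import ipaddress


def split_entries(field: str) -> list[str]:
    """Split a comma-separated pattern field into trimmed, non-empty entries."""
    return [entry.strip() for entry in field.split(",") if entry.strip()]


def email_entry_matches(entry: str, address: str) -> bool:
    entry, address = entry.lower(), address.lower()  # case-insensitive (assumption)
    if "@" in entry:
        return entry == address  # a full address must match exactly
    domain = address.rpartition("@")[2]
    # A domain entry matches that domain and any subdomain of it.
    return domain == entry or domain.endswith("." + entry)


def ip_entry_matches(entry: str, address: str) -> bool:
    if "*" in entry:
        # IPv4 only in this sketch: "*" stands for one whole dotted field.
        want, have = entry.split("."), address.split(".")
        return len(want) == len(have) == 4 and all(
            w == "*" or w == h for w, h in zip(want, have)
        )
    try:
        return ipaddress.ip_address(entry) == ipaddress.ip_address(address)
    except ValueError:
        return False  # a malformed entry or address never matches


def identity_matches(email_field: str, ip_field: str, *, email: str = "", ip: str = "") -> bool:
    """OR across every entry in a field and across the fields themselves."""
    email_hit = bool(email) and any(
        email_entry_matches(e, email) for e in split_entries(email_field)
    )
    ip_hit = bool(ip) and any(
        ip_entry_matches(e, ip) for e in split_entries(ip_field)
    )
    return email_hit or ip_hit


print(identity_matches("bob@example.com, example.org", "10.1.*.*", email="Bob@Example.com"))  # True
print(identity_matches("bob@example.com", "", email="bobby@example.com"))  # False: not exact
print(identity_matches("example.org", "", email="ann@mail.example.org"))   # True: subdomain
print(identity_matches("", "10.1.*.*", ip="10.1.20.5"))                    # True: wildcard fields
```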

Table 39-5 describes the types of patterns you can use for described identity matching.

Table 39-5 Patterns for identity matching

Pattern | Sender/User Matches Pattern | Recipient Matches Pattern

Email address: full and partial | Matches | Matches
Domain address: top-level and subdomains | Matches | Matches
IP address | Matches | Matches
Windows user name | Matches | Does not match
IM screen name / handle | Matches | Does not match
Newsgroup patterns | Does not match | Matches

Specify email addresses exactly to improve accuracy

An email address must match exactly. For example, [email protected] does not match
[email protected]. However, a domain name pattern such as company.com or
something.company.com matches [email protected].

The email address field does not match the sender or recipient of a Web post. For example,
the email address [email protected] does not match if Bob uses a Web browser to send or
receive email. In this case, you must use the domain pattern mail.yahoo.com to match
[email protected].

Match domains instead of IP addresses to improve accuracy

The URL Domain pattern matches HTTP traffic to particular URL domains. You do not enter the entire URL. For example, you enter mail.yahoo.com, not http://www.mail.yahoo.com.
The system does not resolve URL domains to IP addresses. For example, suppose you specify the IP address 192.168.1.1 for a specific domain. If users access that domain through its URL in a Web browser, the IP address pattern does not match that traffic, because the system does not translate the domain name into 192.168.1.1. In this case, use a domain pattern instead of an IP address, such as internalmemos.com.
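The following Python sketch shows why you enter only the domain. The host name is taken from the request URL and compared with the configured URL Domain, and a subdomain such as www.mail.yahoo.com is treated as part of mail.yahoo.com. Treating subdomains as matches is an assumption drawn from the example above, and `url_domain_matches` is a hypothetical helper. Because the sketch never performs a DNS lookup, an IP address entry cannot match a request made by host name, which mirrors the behavior described in this section.

```python
from urllib.parse import urlsplit


def url_domain_matches(url_domain: str, request_url: str) -> bool:
    """True if the request's host is the URL Domain entry or a subdomain of it.

    No DNS resolution happens here: names are compared as text only.
    """
    host = (urlsplit(request_url).hostname or "").lower()  # hostname excludes any port
    domain = url_domain.strip().lower()
    return bool(host) and (host == domain or host.endswith("." + domain))


print(url_domain_matches("mail.yahoo.com", "http://www.mail.yahoo.com/inbox"))  # True
print(url_domain_matches("mail.yahoo.com", "https://yahoo.com/"))               # False
print(url_domain_matches("192.168.1.1", "http://internalmemos.com/memo"))       # False
```

Note that `urlsplit` only finds the host when the URL includes a scheme (or starts with //); a bare string such as mail.yahoo.com/inbox is parsed as a path, so this sketch returns False for it.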
You can detect senders/users and recipients based on one or more IP addresses. However, to do so you must carefully consider the placement of the detection server on your network. If the detection server is installed between the Web proxy and the Internet, the IP address of all Web traffic from individuals in your organization appears to come from the Web proxy. If the detection server is installed between the Web proxy and the internal corporate network, the IP address of all Web traffic from outside your organization appears to go to the Web proxy.
The best practice is to match on domain names instead of IP addresses.
Chapter 40
Detecting synchronized
identities
This chapter includes the following topics:

■ Introducing synchronized Directory Group Matching (DGM)

■ About two-tier detection for synchronized DGM

■ Configuring User Groups

■ Configuring synchronized DGM policy conditions

■ Best practices for using synchronized DGM

Introducing synchronized Directory Group Matching (DGM)
Symantec Data Loss Prevention provides synchronized Directory Group Matching (DGM) to
detect data based on the exact identities of users, senders, and recipients of that data. Using
synchronized DGM, you can connect the Enforce Server to a group directory server such as
Microsoft Active Directory and detect users based on their directory group affiliation. For
example, you may want to apply policies to staff only in the engineering department of your
company, but not to staff in the human resources department. Synchronized DGM enables
you to do this.
Synchronized DGM is based on a User Group configuration that you populate with users synchronized from your directory server. When you create a synchronized DGM policy, you reference the User Group in the policy. At runtime the synchronized DGM policy applies only to identities in the User Group referenced by the policy. As another example, suppose you want to create a policy that applies to everyone in your organization except the CEO. In this case you can create a User Group that contains the CEO's identity as its sole member. You then define a policy exception that references the CEO User Group. At runtime the policy ignores messages sent or received by the CEO.
See “User Groups” on page 376.
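As a simplified illustration of the two examples above, the following Python sketch treats a synchronized User Group as a set of identities. A rule applies the policy only to members of the referenced groups, and an exception removes members of the excepted groups. It is a hypothetical model: the `UserGroup` type and the `in_policy_scope` function are invented for illustration. A real policy must also satisfy its other detection conditions, and group membership comes from the directory synchronization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserGroup:
    name: str
    members: frozenset[str]  # identities synchronized from the directory server


def in_policy_scope(
    identity: str, rule_groups: list[UserGroup], exception_groups: list[UserGroup]
) -> bool:
    """Decide whether a synchronized DGM policy still applies to an identity."""
    if any(identity in group.members for group in exception_groups):
        return False  # in this model, an exception always wins
    if rule_groups:
        return any(identity in group.members for group in rule_groups)
    return True  # no group rule: the policy applies to everyone not excepted


engineering = UserGroup("Engineering", frozenset({"dev1@example.com", "dev2@example.com"}))
ceo = UserGroup("CEO", frozenset({"ceo@example.com"}))

# Policy for engineering staff only.
print(in_policy_scope("dev1@example.com", [engineering], []))  # True
print(in_policy_scope("hr1@example.com", [engineering], []))   # False
# Policy for everyone except the CEO.
print(in_policy_scope("hr1@example.com", [], [ceo]))           # True
print(in_policy_scope("ceo@example.com", [], [ceo]))           # False
```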

About two-tier detection for synchronized DGM

On the endpoint, the Recipient based on a Directory Server Group condition requires two-tier
detection for DLP Agents. The corresponding Sender/User based on a Directory Server
Group condition does not require two-tier detection.
Be sure to understand the implications of two-tier detection before you deploy the synchronized DGM Recipient rule to one or more endpoints.
See “Two-tier detection for DLP Agents” on page 395.
To find out whether two-tier detection is being used, check the c:\ProgramData\Symantec\DataLossPrevention\DetectionServer\15.5\Protect\logs\debug\FileReader.log file (Windows) or the /var/log/Symantec/DataLossPrevention/DetectionServer/15.5/debug directory (Linux) on the Endpoint Server.
See “Troubleshooting policies” on page 445.

Configuring User Groups

The Manage > Policies > User Groups screen displays configured User Groups and is the
starting point for creating a new User Group. User Groups are used for implementing
synchronized DGM.
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.

Note: DLP Agents installed on Mac endpoints support User Groups that use Active Directory
(AD) group conditions in policies.

To create or modify a User Group

1 Establish a connection to the Active Directory server you want to synchronize with.
See “Configuring directory server connections” on page 156.
2 At the Manage > Policies > User Groups screen, click Create New Group.
Or, to edit an existing user group, select the group in the User Groups screen.
3 Configure the User Group parameters as required.

See Table 40-1 on page 937.

Note: If this is the first time you are configuring the User Group, you must select the option
Refresh the group directory index on Save to populate the User Group.

4 After you locate the users you want, use the Add and Remove options to include or
exclude them in the User Group.
5 Click Save.

Table 40-1 Configure a User Group

Action Description

Enter the group name. The Group Name is the name that you want to use to identify this group.
Use a descriptive name so that you can easily identify it later on.

Enter the group description. Enter a short Description of the group.

View which policies use the group. Initially, when you create a new User Group, the Used in Policy field displays None.
If the User Group already exists and you modify it, the system displays a list of the policies that implement the User Group, provided that one or more group-based policies have been created for this User Group.

Refresh the group directory index on Save. Select (check) the Refresh the group directory index on Save option to synchronize the User Group profile with the most recent directory server index as soon as you save the profile. If you leave this box unselected (unchecked), the profile is synchronized with the directory server index based on the Directory Connection setting.

See “Scheduling directory server indexing” on page 158.

If this is the first time you are configuring the User Group profile, you must select the Refresh
the group directory index on Save option to populate the profile with the latest directory server
index replication.

Select the directory server. Select the directory server you want to use from the Directory Server list.
You must establish a connection to the directory server before you create the User Group profile.

See “Configuring directory server connections” on page 156.

Include email aliases. Check the Include Mail Aliases box to index user email aliases along with primary email addresses. For example, if a user has the primary email address "[email protected]" and an email alias "[email protected]," checking this box will index both email addresses. Be aware that indexing email aliases will increase your index size.
Search the directory for specific users. Enter the search string in the search field and click Search to search the directory for specific users. You can search using literal text or wildcard characters (*).

The search results display the Common Name (CN) and the Distinguished Name (DN) of the
directory server that contains the user. These names give you the specific user identity. Results
are limited to 1000 entries.

Click Clear to clear the results and begin a new search of the directory.
Literal text search criteria options:

■ Name of individual node, such as "engineering" or "accounting"

■ Email address, such as "[email protected]"
Wildcard character search criteria options:

■ The supported wildcard character is an asterisk (*)

■ Proper wildcard search examples:
■ Gabriel *akha* returns "Gabriel Oakham"
■ j* jop* returns "Janice Joplin"
■ Improper wildcard search:
■ Do not begin the search string with a wildcard; this will hinder directory server search
performance.
■ For example, the following search is not recommended: *Gabriel Oakham.

Browse the directory for user groups. You can browse the directory tree for groups and users by clicking on the individual nodes and expanding them until you see the group or node that you want.

The browse results display the name of each node. These names give you the specific user
identity.

The results are limited to 20 entries by default. Click See More to view up to 1000 results.

Add a user group to the profile. To add a group or user to the User Group profile, select it from the tree and click Add.
After you select and add the node to the Added Groups column, the system displays the
Common Name (CN) and the Distinguished Name (DN).

Save the user group. Click Save to save the User Group profile you have configured.
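Active Directory is an LDAP directory. The Python sketch below assumes that the console translates a search string into an LDAP substring filter on the cn (Common Name) attribute. That translation, the attribute choice, and both function names are assumptions. The sketch escapes the characters that RFC 4515 reserves in filter values (keeping * as the wildcard) and rejects a leading wildcard, as the table above advises. It also includes a local matcher that reproduces the example results. A filter with a leading wildcard gives the server no initial substring to look up in an index, which is typically why such searches are slower.

```python
import re

# Characters that RFC 4515 requires to be escaped in a filter value. "*" is left
# unescaped on purpose so that it keeps its wildcard meaning.
_LDAP_ESCAPES = {"\\": r"\5c", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def build_name_filter(search: str, attribute: str = "cn") -> str:
    """Turn a console-style search string into an LDAP substring filter."""
    search = search.strip()
    if not search:
        raise ValueError("empty search string")
    if search.startswith("*"):
        raise ValueError("do not begin the search string with a wildcard")
    escaped = "".join(_LDAP_ESCAPES.get(ch, ch) for ch in search)
    return f"({attribute}={escaped})"


def wildcard_matches(search: str, name: str) -> bool:
    """Local check: each "*" matches any run of characters; case is ignored."""
    pattern = ".*".join(re.escape(part) for part in search.split("*"))
    return re.fullmatch(pattern, name, flags=re.IGNORECASE) is not None


print(build_name_filter("j* jop*"))                          # (cn=j* jop*)
print(wildcard_matches("Gabriel *akha*", "Gabriel Oakham"))  # True
print(wildcard_matches("j* jop*", "Janice Joplin"))          # True
```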

Configuring synchronized DGM policy conditions

To implement synchronized DGM policies, you define a Directory Connection using the Enforce Server administration console. The Directory Connection specifies the directory server you want to use as source information for defining exact identity User Groups. Next, you define one or more User Groups in the Enforce Server administration console and populate each group by synchronizing it with the directory server. Finally, you associate the User Groups with the Sender/User based on a Directory Server Group rule or the Recipient based on a Directory Server Group rule.
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.
Table 40-2 describes the process for implementing synchronized DGM.

Table 40-2 Workflow for implementing synchronized DGM

Step Action Description

1 Create the connection to the directory server. Establish the connection from the Enforce Server to a directory server such as Microsoft Active Directory.

See “Configuring directory server connections” on page 156.

2 Create the User Group. Create one or more User Groups on the Enforce Server and populate the User Groups with the exact identities from the users, groups, and business units that are defined in the directory server.

See “Configuring User Groups” on page 936.

3 Configure a new policy or edit an existing one. See “Configuring policies” on page 413.

4 Configure one or more group rules or exceptions. Choose the type of synchronized DGM rule you want to implement and reference the User Group. After the policy and the group are linked, the policy applies only to those identities in the referenced User Group.

See “Configuring the Sender/User based on a Directory Server Group condition” on page 939.
See “Configuring the Recipient based on a Directory Server Group condition” on page 940.

Configuring the Sender/User based on a Directory Server Group condition
The condition Sender/User based on a Directory Server Group matches policy violations
based on message senders and endpoint users synchronized from a directory group server.
You can implement this condition in a policy group (identity) rule or exception.
See “Configuring policies” on page 413.

Note: If the identity being detected is a user, the user must be actively logged on to a DLP
Agent-enabled system for the policy to match.
Table 40-3 Sender/User matches User Group condition parameters

Parameter Description

Select User Groups to include in this policy. Select one or more User Groups that you want this policy to detect.

If you have not created a User Group, click Create a new User Group.

See “Configuring User Groups” on page 936.

Match On This condition matches on the entire message. The Envelope option is selected by default.
You cannot select any other message component.

See “Detection messages and message components” on page 391.

Also Match Select this option to create a compound condition. All conditions in a rule or exception
must match to trigger an incident.

You can Add any available condition from the list.

See “Configuring compound match conditions” on page 429.

See “Introducing synchronized Directory Group Matching (DGM)” on page 935.

Configuring the Recipient based on a Directory Server Group condition
The Recipient based on a Directory Server Group condition matches policy violations based
on specific message recipients synchronized from a directory server. You can implement this
condition in a policy group rule or exception.
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.

Note: The Recipient based on a Directory Server Group condition requires two-tier detection.
See “About two-tier detection for synchronized DGM” on page 936.

Table 40-4 Configuring the Recipient based on a Directory Server Group condition

Step Action Description

1 Select User Groups to include in this policy. Select the User Group(s) that you want this policy to match on.

If you have not created a User Group, click the Create a new Endpoint User Group option.

See “Configuring User Groups” on page 936.

2 Match On This rule detects the entire message, not individual components. The Envelope
option is selected by default. You cannot select any other message component.

See “Detection messages and message components” on page 391.

3 Also Match Select this option to create a compound condition. All conditions in a rule or
exception must match to trigger an incident.

You can Add any available condition from the list.

See “Configuring compound match conditions” on page 429.

Best practices for using synchronized DGM

This section contains a few considerations to keep in mind when implementing synchronized
DGM conditions in your policies.

Refresh the directory on initial save of the User Group

To execute a policy rule based on an Active Directory group, the index that you define on the Enforce Server must first be populated. When you first define the User Group, the recommended practice is to select the option "Refresh the group directory index on Save." This ensures proper synchronization of Active Directory with the Enforce Server. Once the User Group is populated, you can set up scheduling to keep the User Group on the Enforce Server in sync with the Active Directory server.
One use case for not indexing immediately is where you are creating multiple User Groups
and you want to index after you have defined all the groups. In this case you can use scheduling,
but keep in mind that any policies based on these indices will not execute until they are
populated.
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.
See “Configuring User Groups” on page 936.

Distinguish synchronized DGM from other types of endpoint detection

When synchronized DGM policies are deployed to endpoint servers, identity-based detection
applies to the users in a configured group of DLP Agent-based endpoints. With endpoint-based
user groups, many different users can log on to the same computer depending on business
practices. The response that each user sees on that endpoint varies depending on how the
users are grouped. Contrast this style of endpoint detection with the Endpoint Protocol
Destination or Endpoint Location methods, which are specific to the endpoint and are not
user-based.
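The contrast can be sketched in Python. In this hypothetical model, two people use the same computer. The `EndpointEvent` type, its fields, and the location values are invented for illustration. The synchronized DGM check depends on who is logged on, while a location-style check gives the same answer for every user of that endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointEvent:
    hostname: str
    logged_on_user: str
    location: str  # hypothetical values such as "on-network" / "off-network"


def user_group_condition(event: EndpointEvent, group_members: set[str]) -> bool:
    # Synchronized DGM: evaluated against the user who is logged on.
    return event.logged_on_user in group_members


def location_condition(event: EndpointEvent, location: str) -> bool:
    # Endpoint-specific: evaluated against the machine, whoever uses it.
    return event.location == location


finance = {"CORP\\alee"}
morning = EndpointEvent("kiosk-07", "CORP\\alee", "on-network")
evening = EndpointEvent("kiosk-07", "CORP\\bkim", "on-network")

print(user_group_condition(morning, finance), user_group_condition(evening, finance))  # True False
print(location_condition(morning, "on-network"), location_condition(evening, "on-network"))  # True True
```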
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.
Chapter 41
Detecting profiled identities
This chapter includes the following topics:

■ Introducing profiled Directory Group Matching (DGM)

■ About two-tier detection for profiled DGM

■ Configuring Exact Data profiles for DGM

■ Configuring profiled DGM policy conditions

■ Best practices for using profiled DGM

Introducing profiled Directory Group Matching (DGM)

Profiled Directory Group Matching (DGM) leverages Exact Data Matching (EDM) technology to detect identities that you have indexed from your database or directory server using an Exact Data Profile. For example, you can use profiled DGM to identify network user activity or to analyze content associated with particular users, senders, or recipients. You can also exclude certain email addresses from analysis, or prevent certain people from sending confidential information by email.
See “Configuring Exact Data profiles for DGM” on page 943.
Profiled DGM is distinguished from synchronized DGM, which uses a connection to a directory
server (such as Microsoft Active Directory) to match identities.
See “Introducing synchronized Directory Group Matching (DGM)” on page 935.

About two-tier detection for profiled DGM

Profiled DGM relies on an EDM index, which is server-based. Profiled DGM requires two-tier detection for DLP Agents on the endpoint.
See “About two-tier detection for EDM on the endpoint” on page 533.
You cannot combine either type of profiled DGM condition with an Endpoint: Block or
Endpoint: Notify response rule in a policy. If you do, the system reports that the policy is
misconfigured.
See “Troubleshooting policies” on page 445.

Configuring Exact Data profiles for DGM

To implement profiled DGM, you export identity records from a directory server or database,
index the data, and create an Exact Data Profile. You then reference this profile in the
corresponding Sender/User or Recipient condition.
See “Introducing profiled Directory Group Matching (DGM)” on page 942.
Table 41-1 describes the procedure for configuring Exact Data profiles for DGM policies.

Table 41-1 Workflow for implementing profiled DGM

Step Action Description

1 Create the data source file. Create a data source file from the directory server or database you want to profile. Make sure the data source file contains the appropriate fields.
The following fields are supported for profiled DGM:

■ Email address
■ IP address
■ Windows user name (in the format domain\user)
■ IM screen name

See “Creating the exact data source file for profiled DGM for EDM” on page 537.

2 Prepare the data source file for indexing. See “Configuring Exact Data profiles for EDM” on page 534.

See “Preparing the exact data source file for indexing for EDM” on page 537.

3 Create the Exact Data Profile. This includes uploading the data source file to the Enforce Server, mapping the data fields, and indexing the data source.

See “Uploading exact data source files for EDM to the Enforce Server”
on page 539.

See “Creating and modifying Exact Data Profiles for EDM” on page 541.

See “Mapping Exact Data Profile fields for EDM” on page 545.

See “Scheduling Exact Data Profile indexing for EDM” on page 548.
4 Define the profiled DGM condition. See “Configuring the Sender/User based on a Profiled Directory condition” on page 944.

See “Configuring the Recipient based on a Profiled Directory condition” on page 945.

5 Test the profiled DGM policy. Use a test policy group and verify that the matches the policy generates are accurate.

See “Test and tune policies to improve match accuracy” on page 453.
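Step 1 of Table 41-1 can be scripted. The following Python sketch writes identity records to a delimited data source file and rejects rows whose values are obviously not in the supported formats. It assumes a pipe-delimited file with a header row and uses hypothetical column names. Confirm the delimiter, header, and encoding requirements in “Preparing the exact data source file for indexing for EDM” before indexing.

```python
import csv
import io
import ipaddress
from typing import TextIO

# Hypothetical column names; map them to the data fields when you create the profile.
FIELDS = ["email", "ip_address", "windows_user", "im_screen_name"]


def row_problems(row: dict[str, str]) -> list[str]:
    """Return basic format problems for one identity record (empty list = OK)."""
    problems = []
    if row.get("email") and "@" not in row["email"]:
        problems.append("email address has no '@'")
    if row.get("ip_address"):
        try:
            ipaddress.ip_address(row["ip_address"])
        except ValueError:
            problems.append("IP address is not valid")
    if row.get("windows_user") and "\\" not in row["windows_user"]:
        problems.append("Windows user name is not in domain\\user format")
    return problems


def write_data_source(rows: list[dict[str, str]], out: TextIO) -> None:
    """Write a pipe-delimited data source file with a header row (assumed format)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="|", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        problems = row_problems(row)
        if problems:
            raise ValueError(f"bad record {row}: {'; '.join(problems)}")
        writer.writerow(row)


buffer = io.StringIO()
write_data_source(
    [{"email": "jdoe@example.com", "ip_address": "10.0.0.5",
      "windows_user": "CORP\\jdoe", "im_screen_name": "jdoe_im"}],
    buffer,
)
print(buffer.getvalue(), end="")
```

To write a real file instead of the in-memory buffer, open it with `open("identities.txt", "w", newline="", encoding="utf-8")` (the file name is only an example) and pass the file object as `out`.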

Configuring profiled DGM policy conditions

Symantec Data Loss Prevention provides two match conditions for profiled DGM: sender/user and recipient. Both conditions can be used as policy rules or exceptions. For example, consider a scenario where you index a list of email addresses and author profiled DGM policies based on this indexed data. You could write a rule under which a message violates the policy only if its sender is on the indexed list. Or, you could write an exception so that the policy is not violated if the recipient of an email is on the indexed list.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.

Table 41-2 Profiled DGM conditions

Group rule Description

Sender/User based on a Directory from <EDM Profile>. If this condition is implemented as a policy rule, a match occurs only if the sender or user of the data is contained in the index profile. If this condition is implemented as a policy exception, the data is excepted from matching if it is sent by a sender/user listed in the index profile.

Recipient based on a Directory from <EDM Profile>. If this condition is implemented as a policy rule, a match occurs only if the recipient of the data is contained in the index profile. If this condition is implemented as a policy exception, the data is excepted from matching if it is received by a recipient listed in the index profile.
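To make the rule and exception behavior concrete, the following Python sketch combines the two conditions in one hypothetical policy. It uses a Sender/User rule against an indexed list of email addresses and a Recipient exception against the same list. The function and parameter names are invented. The indexed list stands in for the Exact Data Profile, and `content_matches` stands in for the policy's other detection conditions. Treating the exception as satisfied by any single listed recipient is an assumption.

```python
def profiled_dgm_incident(
    sender: str,
    recipients: list[str],
    content_matches: bool,
    indexed_emails: list[str],
) -> bool:
    """Hypothetical policy: the sender must be in the index (rule), and the
    message is excepted if any recipient is in the index (exception)."""
    indexed = {address.lower() for address in indexed_emails}
    if not content_matches:
        return False  # the other detection conditions were not met
    if sender.lower() not in indexed:
        return False  # rule: sender/user must be contained in the index profile
    if any(recipient.lower() in indexed for recipient in recipients):
        return False  # exception: a listed recipient excepts the message
    return True


employees = ["ann@example.com", "bob@example.com"]
print(profiled_dgm_incident("ann@example.com", ["buyer@partner.test"], True, employees))  # True
print(profiled_dgm_incident("ann@example.com", ["bob@example.com"], True, employees))     # False
print(profiled_dgm_incident("eve@other.test", ["buyer@partner.test"], True, employees))   # False
```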

Configuring the Sender/User based on a Profiled Directory condition

The Sender/User based on a Directory from condition lets you create detection rules based on sender identity or (for endpoint incidents) user identity. This condition requires an Exact Data Profile.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
After you select the Exact Data Profile, when you configure the rule, the directory you selected
and the sender identifier(s) appear at the top of the page.
Table 41-3 describes the parameters for configuring the Sender/User based on a Directory from an EDM Profile condition.

Table 41-3 Configuring the Sender/User based on a Directory from an EDM Profile condition

Parameter Description

Configuring the Recipient based on a Profiled Directory condition

The Recipient based on a Directory from condition lets you create detection rules based on the identity of the recipient. This condition requires an Exact Data Profile.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
After you select the Exact Data Profile, when you configure the rule, the directory you selected
and the recipient identifier(s) appear at the top of the page.
Table 41-4 describes the parameters for configuring the Recipient based on a Directory from an EDM Profile condition.
Table 41-4 Configuring the Recipient based on a Directory from an EDM profile condition

Parameter Description

For example, for an Employees directory group profile that includes a Department field, you would
select Where, select Department from the drop-down list, and enter Marketing, Sales in the text
box. For a detection rule, this example causes the system to capture an incident only if at least one
recipient works in Marketing or Sales (as long as the input content meets all other detection criteria).
For an exception, this example prevents the system from capturing an incident if at least one recipient
works in Marketing or Sales.
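The Where clause in this example can be modeled as follows. The Python sketch is hypothetical. It represents the indexed profile as a dictionary keyed by email address, splits the values in the text box on commas, and compares values without regard to case or surrounding spaces; all three are assumptions. A single qualifying recipient is enough, as the example states.

```python
def where_clause_matches(
    recipients: list[str],
    profile: dict[str, dict[str, str]],
    field: str,
    values: str,
) -> bool:
    """True if at least one recipient's indexed record has `field` in `values`."""
    wanted = {v.strip().lower() for v in values.split(",") if v.strip()}
    for recipient in recipients:
        record = profile.get(recipient.lower())
        if record and record.get(field, "").strip().lower() in wanted:
            return True
    return False


employees = {
    "ann@example.com": {"Department": "Marketing"},
    "bob@example.com": {"Department": "Engineering"},
}
print(where_clause_matches(["bob@example.com", "ann@example.com"], employees, "Department", "Marketing, Sales"))  # True
print(where_clause_matches(["bob@example.com"], employees, "Department", "Marketing, Sales"))  # False
```

In a detection rule, a True result lets the rule capture an incident (if the other criteria are met); in an exception, a True result prevents the incident.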

Best practices for using profiled DGM

Keep in mind the considerations in this section when implementing profiled Directory Group Matching (DGM).

Follow EDM best practices when implementing profiled DGM

Profiled DGM leverages EDM technology. Follow the EDM procedures and best practices
when implementing profiled DGM.
See “About two-tier detection for profiled DGM” on page 942.

Include an email address field in the Exact Data Profile for profiled DGM
You must include the appropriate fields in the Exact Data Profile to implement profiled DGM.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
If you include the email address field in the Exact Data Profile for profiled DGM and map it to the email data validator, the email address appears in the Directory EDM drop-down list (on the remediation page).
Use profiled DGM for Network Prevent for Web identity detection
If you want to implement DGM for Network Prevent for Web, use one of the profiled DGM conditions to implement identity matching. For example, you may want to use identity matching to block all web traffic for specific users. For Network Prevent for Web, you cannot use synchronized DGM conditions for this use case.
See “Creating the exact data source file for profiled DGM for EDM” on page 537.
See “Configuring the Sender/User based on a Profiled Directory condition” on page 944.
Chapter 42
Using contextual attributes
for Application Detection
This chapter includes the following topics:

■ Introducing contextual attributes for cloud applications

■ Configuring contextual attribute conditions

Introducing contextual attributes for cloud applications
You can include contextual attribute conditions in policy detection rules for Application Detection
incidents. These contextual attributes specify the attributes that are associated with cloud
applications monitored or inspected by the Cloud Detection Service. For example, you can
create a policy detection rule that includes the Application Name: Gatelet > Salesforce
condition to specify that the detection rule applies to incidents that are associated with the
Symantec CloudSOC Salesforce Gatelet.
Contextual attributes are organized by category: General, User, Data Exposure, Data
Transfer, and Custom.
See “Contextual attribute categories” on page 949.
See “Configuring contextual attribute conditions” on page 948.

Configuring contextual attribute conditions

You configure contextual attribute conditions as part of a policy rule or exception. The following
procedure presumes that you are familiar with policy configuration. Refer to the following topics
for detailed information about policy configuration:
See “Configuring policies” on page 413.

See “Configuring policy rules” on page 417.
See “Configuring policy exceptions” on page 426.
To configure a policy rule with a contextual attribute condition, follow this procedure:
To configure contextual attribute conditions
1 Add a Contextual Attributes (Cloud Applications and API Detection Appliance only)
condition to a policy rule or exception, or edit an existing one.
2 Select a contextual attribute condition from the Attributes drop-down list.
See “Contextual attribute categories” on page 949.
3 Configure the appropriate contextual attribute values.
4 Click OK.

Contextual attribute categories

Contextual attributes are grouped into categories: General, User, Data Exposure, Data
Transfer, and Custom.
The following tables provide more details about the attributes and attribute values available
in each category.

General attributes
General attributes apply to all data types and applications.
Table 42-1 General attributes

Attribute Value Description

Application Name  Specifies the name of the cloud web proxy, Gatelet, or Securlet.
Securlets:

■ Amazon S3
■ Amazon Web Services
■ Box
■ Cisco Spark
■ Dropbox
■ Facebook Workplace
■ Google Calendar
■ Google Drive
■ Gmail
■ Microsoft Azure
■ Microsoft Teams
■ Office 365 Email
■ Office 365 OneDrive
■ Office 365 SharePoint
■ Salesforce
■ SAP
■ ServiceNow
■ Slack
■ Workday
■ Yammer
Gatelets:

■ 4Shared
■ 4Sync
■ Acrobat.com
■ AIM Mail
■ Alfresco
■ Amazon CloudDrive
■ Amazon Web Services
■ Amazon WorkDocs
■ BitCasa
■ Box
■ BV ShareX
■ cCloud
■ CentralDesktop
■ CloudMe
■ CloudProvider
■ Confluence
■ Copy
■ Cubby
■ DigitalBucket
■ Digital Ocean
■ DocuSign
■ Dropbox
■ Dynamics
■ Egnyte
■ FilesAnywhere
■ Flow
■ Ftopia
■ Gmail
■ GroupDocs
■ Hightail
■ Huddle
■ IBM Connections
■ iCloud
■ iDrive
■ Intralinks
■ Jive
■ Joyent
■ Just Cloud
■ MailerLite
■ MediaFire
■ Microsoft Azure
■ Office 365
■ OneDrive
■ OneHub
■ OneUbuntu
■ Outlook.com
■ OwnCloud
■ Oxygen
■ Podio
■ Rackspace
■ RapidShare
■ SafeSync
■ Salesforce
■ SeaCloud
■ ShareFile
■ Sites
■ Slack
■ SmartFile
■ Soonr
■ SugarSync
■ SurveyMonkey
■ Syncplicity
■ Uploaded
■ WatchDocs
■ WebCargo
■ Workshare
■ Wuala
■ Xero
■ Yahoo Mail
■ Yammer
■ Zoho Docs
Bluecoat WSS:

■ Bluecoat WSS (Symantec Web Security Service)
Custom:

■ Custom

Application Type
■ Web Security Services (Cloud Proxy)
■ Gatelet
■ Securlet
■ Custom
Specifies the type of application: Symantec Web Security Service, Symantec CloudSOC Gatelets, Symantec CloudSOC Securlets, or a custom application.

Data Type
■ Data-at-Rest
■ Data-in-Motion
■ Custom
Specifies the data type: data at rest (stored in a cloud repository), data in motion (data traveling over the network), or custom.

User attributes
User attributes address specific information about the user that is associated with an incident.

Table 42-2 User attributes

Attribute Value Description

Activity Type
■ Create
■ Edit
■ Rename
■ Upload
■ Download
■ Custom
Specifies the type of action that the user took on the data of the incident.
Symantec Web Security Service does not use this attribute.

Client Tenant Domain
Enter the name in the Match field.
Specifies the client tenant domain of the user. You can match exactly with or without case sensitivity, or match on a regular expression.

Client Tenant User ID
Enter the user identifier in the Match field.
Specifies the client tenant identifier of the user. You can match exactly with or without case sensitivity, or match on a regular expression.

Exposed Document Count
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies users whose number of exposed documents is above or below a certain value, or within a range that you specify.
Symantec Web Security Service does not use this attribute.

User ID
■ Match
■ Match Type
Specifies a user identifier that you provide. You can match exactly with or without case sensitivity, or match on a regular expression.

User Name
■ Match
■ Match Type
Specifies a user name that you provide. You can match exactly with or without case sensitivity, or match on a regular expression.
Symantec Web Security Service does not use this attribute.

User Threat Score
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies the Shadow IT threat score of the user, above or below a certain value, or within a range that you specify. This attribute applies only to Securlet policies.

User is Internal
■ True
■ False
Specifies whether the user is part of your organization.
Symantec Web Security Service does not use this attribute.
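Many of the string attributes above (Client Tenant Domain, User ID, User Name, and others) offer the same three ways to match: exactly with case sensitivity, exactly without case sensitivity, or on a regular expression. The following Python sketch illustrates those semantics only; it is not the product's implementation, and the `MatchType` and `string_attribute_matches` names are hypothetical. It also assumes that a regular expression must match the entire value, which may differ from how Symantec Data Loss Prevention applies patterns.

```python
import re
from enum import Enum


class MatchType(Enum):
    """Hypothetical names for the three string match modes."""
    EXACT_CASE_SENSITIVE = "exact"
    EXACT_CASE_INSENSITIVE = "exact-ignore-case"
    REGEX = "regex"


def string_attribute_matches(value: str, pattern: str, match_type: MatchType) -> bool:
    """Return True if an attribute value satisfies the configured match."""
    if match_type is MatchType.EXACT_CASE_SENSITIVE:
        return value == pattern
    if match_type is MatchType.EXACT_CASE_INSENSITIVE:
        # casefold() is a stricter lower() that also handles non-ASCII case rules.
        return value.casefold() == pattern.casefold()
    if match_type is MatchType.REGEX:
        # Assumption: the pattern must match the whole value, not just a substring.
        return re.fullmatch(pattern, value) is not None
    raise ValueError(f"unsupported match type: {match_type}")
```

For example, a User ID of `Alice@Example.com` matches `alice@example.com` only with the case-insensitive mode, and the pattern `.*@contoso\.com` matches any user in that (example) domain.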

Data exposure attributes

Data exposure attributes specify information about the documents that are stored in cloud data
repositories ("data at rest"). Symantec Web Security Service does not use any data exposure
attributes.

Table 42-3 Data exposure attributes

Attribute Value Description

Document Creation Date
■ After
■ Before
■ On or After
■ On or Before
■ On
■ Range
Specifies the date the document was created.

Document Last Accessed
■ After
■ Before
■ On or After
■ On or Before
■ On
■ Range
Specifies the date the document was last accessed.

Document Last Modified
■ After
■ Before
■ On or After
■ On or Before
■ On
■ Range
Specifies the date the document was last modified.

Document Owner
■ Match
■ Match Type
Specifies the name of the document owner. You can match exactly with or without case sensitivity, or match on a regular expression.

Document Tag
■ Match
■ Match Type
Specifies the metadata tag of the document. You can match exactly with or without case sensitivity, or match on a regular expression.

Document Type
■ Match
■ Match Type
Specifies the type of document. You can match exactly with or without case sensitivity, or match on a regular expression.

Document is Exposed
■ True
■ False
Specifies whether the document is shared or accessible. A document is "exposed" when it is shared with or accessible to everyone within your organization, or shared with or accessible to anyone outside of your organization. A document that is shared only with certain members of your organization is not considered exposed.

Document is Internal
■ True
■ False
Specifies whether the document is "internal." A document is considered internal if a member of your organization created it.

Document is Internally Shared
■ True
■ False
Specifies whether the document is shared with or accessible to everyone within your organization.

Document is Publically Exposed
■ True
■ False
Specifies whether the document is shared with or accessible to everyone outside your organization. Such documents are available to everyone on the Internet.

Job ID
■ Match
■ Match Type
Specifies the job identifier that is associated with the document. You can match exactly with or without case sensitivity, or match on a regular expression.

Service Classification
■ Match
■ Match Type
Specifies the Shadow IT service classification. You can match exactly with or without case sensitivity, or match on a regular expression.
Symantec Web Security Service does not use this attribute.

Service Rating
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies the Shadow IT service score rating, above or below a certain value, or within a range that you specify.
Symantec Web Security Service does not use this attribute.

SharePoint Site Name
■ Match
■ Match Type
Specifies the name of a SharePoint site. You can match exactly with or without case sensitivity, or match on a regular expression.
Symantec Web Security Service does not use this attribute.
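The date operators used by attributes such as Document Creation Date and Document Last Modified can be read as simple comparisons against the attribute value. The Python sketch below is a hypothetical illustration, not product code. In particular, it assumes that Range includes both end dates, which this guide does not state.

```python
from datetime import date
from typing import Optional


def date_attribute_matches(value: date, operator: str, start: date,
                           end: Optional[date] = None) -> bool:
    """Compare a document date against one of the date operators."""
    if operator == "After":
        return value > start
    if operator == "Before":
        return value < start
    if operator == "On or After":
        return value >= start
    if operator == "On or Before":
        return value <= start
    if operator == "On":
        return value == start
    if operator == "Range":
        if end is None:
            raise ValueError("Range requires both a start and an end date")
        # Assumption: both ends of the range are inclusive.
        return start <= value <= end
    raise ValueError(f"unknown date operator: {operator}")
```

Note the difference between Before and On or Before: a document last modified on the boundary date itself satisfies only the second.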

Data transfer attributes

Data transfer attributes specify information about data moving over the network ("data in
motion").

Table 42-4 Data transfer attributes

Attribute Value Description

Browser
■ Match
■ Match Type
Specifies the name of the web browser that is associated with the detection request. You can match exactly with or without case sensitivity, or match on a regular expression.

Country
Select a country from the drop-down list of country names.
Specifies the name of the country that is associated with the detection request.
Symantec Web Security Service does not use this attribute.

Device Inside Office
■ True
■ False
Specifies whether the device that is associated with the detection request is located within your office.
Symantec Web Security Service does not use this attribute.

Device OS
■ Match
■ Match Type
Specifies the operating system of the device that is associated with the detection request. You can match exactly with or without case sensitivity, or match on a regular expression.
Symantec Web Security Service does not use this attribute.

Device Type
■ Match
■ Match Type
Specifies the type of device that is associated with the detection request. You can match exactly with or without case sensitivity, or match on a regular expression.
Symantec Web Security Service does not use this attribute.

Device is Compliant
■ True
■ False
Specifies whether the device is compliant, based on information from your mobile device management system.
Symantec Web Security Service does not use this attribute.

Device is Managed
■ True
■ False
Specifies whether your organization manages the device, based on information from your mobile device management system.
Symantec Web Security Service does not use this attribute.

Device is Personal
■ True
■ False
Specifies whether the user owns the device, based on information from your mobile device management system.
Symantec Web Security Service does not use this attribute.

Device is Trusted
■ True
■ False
Specifies whether the device is trusted, based on information from your mobile device management system.
Symantec Web Security Service does not use this attribute.

HTTP Method
■ GET
■ PUT
■ DELETE
■ POST
■ Custom
Specifies the method that is used in the HTTP traffic that is submitted for inspection.

Network Direction
■ Upload
■ Download
■ Custom
Specifies the network direction of the message that is submitted for inspection.

Recipient IP
■ Match
■ Match Type
Specifies the IP address of the message recipient. You can match exactly with or without case sensitivity, or match on a regular expression.

Recipient Port
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies the network port of the message recipient.

Sender IP
■ Match
■ Match Type
Specifies the IP address of the message sender. You can match exactly with or without case sensitivity, or match on a regular expression.

Sender Port
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies the network port of the message sender.
Symantec Web Security Service does not use this attribute.

Site Classification
■ Match
■ Match Type
Specifies the type of site that is associated with the detection request, such as "Social Media." You can match exactly with or without case sensitivity, or match on a regular expression.

Site Risk Score
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies a numeric value that indicates the risk level of the target site.

Source Protocol
■ Match
■ Match Type
Specifies the OSI Layer 7 (application layer) network protocol for the detection request, for example SMTP, HTTP, or FTP. You can match exactly with or without case sensitivity, or match on a regular expression.

User Agent
■ Match
■ Match Type
Specifies the user agent for the detection request that is related to HTTP traffic. You can match exactly with or without case sensitivity, or match on a regular expression.
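Numeric attributes such as Recipient Port, Sender Port, and Site Risk Score share the operators Is Greater Than through Range. The following minimal Python sketch shows those comparisons, assuming (this is not documented here) that Range is inclusive at both ends. The function name is hypothetical and is not part of any Symantec API.

```python
import operator
from typing import Callable, Dict, Optional, Union

Number = Union[int, float]

# Operator labels from the tables, mapped to Python comparison functions.
_COMPARATORS: Dict[str, Callable[[Number, Number], bool]] = {
    "Is Greater Than": operator.gt,
    "Is Less Than": operator.lt,
    "Is Greater Than or Equals": operator.ge,
    "Is Less Than or Equals": operator.le,
    "Equals": operator.eq,
}


def numeric_attribute_matches(value: Number, operator_name: str, bound: Number,
                              upper: Optional[Number] = None) -> bool:
    """Compare a numeric attribute (for example, a port or risk score) to a bound."""
    if operator_name == "Range":
        if upper is None:
            raise ValueError("Range requires a lower and an upper bound")
        # Assumption: both ends of the range are inclusive.
        return bound <= value <= upper
    comparator = _COMPARATORS.get(operator_name)
    if comparator is None:
        raise ValueError(f"unknown numeric operator: {operator_name}")
    return comparator(value, bound)
```

For example, a Recipient Port condition of Range 1 to 1023 would match traffic to port 443 but not to port 8080.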

Custom attributes
Custom attributes let you enter any attributes for your Application Detection policies that are
not provided by default.

Table 42-5 Custom attributes

Attribute Value Description

String Attribute
■ Name
■ Match
■ Match Type
Specifies a custom string attribute. Name your attribute, then specify the match and match type for your string. You can match exactly with or without case sensitivity, or match on a regular expression.

Numeric Attribute
■ Name
■ Is Greater Than
■ Is Less Than
■ Is Greater Than or Equals
■ Is Less Than or Equals
■ Equals
■ Range
Specifies a custom numeric attribute. Name your attribute, then specify the numeric property and value.

Boolean Attribute
■ Name
■ True
■ False
Specifies a custom Boolean attribute. Name your attribute, then specify the Boolean value.

Date Attribute
■ Name
■ After
■ Before
■ On or After
■ On or Before
■ On
■ Range
Specifies a custom date attribute. Name your attribute, then specify the date property and value.
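A custom attribute condition pairs a name that you choose with a value test. As a hypothetical Python sketch, a Boolean custom attribute condition might look like the following. The class name, the example attribute name `is_contractor`, and the idea that custom attributes arrive with a request as a name-to-value map are all assumptions, not documented behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Union

# Hypothetical representation of the custom attributes attached to one detection request.
AttrValue = Union[str, int, float, bool]


@dataclass(frozen=True)
class BooleanAttributeCondition:
    name: str       # the name you give the custom attribute in the condition
    expected: bool  # the True/False value selected in the condition

    def matches(self, attributes: Mapping[str, AttrValue]) -> bool:
        value = attributes.get(self.name)
        # Assumption: a missing or non-Boolean value never satisfies the condition.
        return isinstance(value, bool) and value == self.expected
```

The same shape (a name plus a test) extends to the string, numeric, and date custom attributes by swapping the value test.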
Chapter 43
Supported file formats for detection
This chapter includes the following topics:

■ Overview of detection file format support

■ Supported formats for file type identification

■ Supported formats for content extraction

■ Supported encapsulation formats for subfile extraction

■ Supported file formats for metadata extraction

Overview of detection file format support

Symantec Data Loss Prevention detection supports various file formats for performing the
following operations:
■ File type identification
■ File contents extraction
■ Subfile extraction
■ Document metadata extraction
Table 43-1 summarizes the file formats that Symantec Data Loss Prevention supports for file
type identification and content, subfile and metadata extraction.
You configure the system to identify individual file formats using the Message Attachment
or File Type Match condition. This condition performs a context-based match that only identifies
the file format type; it does not extract file contents. In addition, you must explicitly select the
individual file format(s) you want to detect.
Supported file formats for detection 963
Overview of detection file format support

See “About file type matching” on page 900.

When you use a content-based detection condition in a policy (such as Content Matches
Keyword), the system automatically extracts file contents for supported file formats (such as
DOCX, PPTX, XLSX, PDF). In addition, the system automatically extracts subfiles from
supported encapsulation file formats (such as ZIP, RAR, TAR).
See “Content matching conditions” on page 387.
Lastly, you can enable metadata extraction for a limited number of document formats (such
as DOCX), and use keyword matching to detect document metadata.
See “About document metadata detection” on page 989.
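For DOCX files, document metadata such as the author and keywords is stored in the `docProps/core.xml` part of the Office Open XML package. The Python sketch below reads a few of those core properties with the standard library to illustrate what metadata extraction yields. It is not how Symantec Data Loss Prevention extracts metadata, and the function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

CORE_PROPERTIES_PART = "docProps/core.xml"
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}
# Friendly names mapped to the core-property elements that hold them.
FIELDS = {"author": "dc:creator", "title": "dc:title", "keywords": "cp:keywords"}


def docx_core_metadata(data: bytes) -> Dict[str, str]:
    """Return selected core properties from a DOCX package, or {} if absent."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if CORE_PROPERTIES_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CORE_PROPERTIES_PART))
    metadata = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        if element is not None and element.text:
            metadata[name] = element.text
    return metadata
```

A keyword check on metadata could then be as simple as `"confidential" in docx_core_metadata(data).get("keywords", "").lower()`.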

Note: Although some file types are supported for both extraction and identification (if the
system can crack a file, it must be able to identify its type), the supported formats for each
operation are distinct and are implemented using different match conditions. Many more file
formats are supported for type identification than for content extraction.

Table 43-1 File format support for detection operations

File type identification
Description: Symantec Data Loss Prevention does not rely on file extensions to identify the format. File type is identified by the unique binary signature of the file format.
Configuration: Explicitly, using the Message Attachment or File Type Match file property condition.
Supported formats: See “Supported formats for file type identification” on page 964.

File contents extraction
Description: File contents are any text-based content that can be viewed through the native or source application.
Configuration: Implicitly, using one or more content match conditions, including EDM, IDM, VML, data identifiers, keywords, and regular expressions.
Supported formats: See “Supported formats for content extraction” on page 980.

Subfile extraction (Subfile)
Description: Subfiles are files encapsulated in a parent file. Subfiles are extracted and processed individually for identification and content extraction. If the subfile format is not supported by default, a custom method can be used to detect and crack the file.
Configuration: Implicitly, using one or more content match conditions, including EDM, IDM, VML, data identifiers, keywords, and regular expressions.
Supported formats: See “Supported encapsulation formats for subfile extraction” on page 987.

Metadata extraction (Metadata)
Description: Metadata is information about the file, such as author, version, or user-defined tags. Metadata extraction is generally limited to Microsoft Office documents (OLE-enabled) and Adobe PDF files, and metadata support may differ between agent and server. Metadata includes data-security tags that were created in Information Centric Tagging (ICT).
Configuration: Available for content-based match conditions. Must be enabled.
Supported formats: See “Supported file formats for metadata extraction” on page 989.
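Subfile extraction walks an encapsulation format and processes each embedded file on its own. As a rough illustration only, the Python sketch below recursively unpacks nested ZIP archives with the standard `zipfile` module and yields each leaf file with its path inside the parent. It treats every ZIP container as an archive (so it would also unpack a DOCX package), and it caps recursion with a hypothetical `max_depth` parameter; a real engine identifies the type of each subfile first.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_subfiles(data: bytes, path: str = "", max_depth: int = 5) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for every non-archive file, unpacking nested ZIPs.

    max_depth is a hypothetical guard against deeply nested or malicious archives.
    """
    if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                child_path = f"{path}/{info.filename}" if path else info.filename
                # Each member is processed on its own, recursing if it is itself a ZIP.
                yield from iter_subfiles(archive.read(info), child_path, max_depth - 1)
    else:
        # Not an archive (or depth limit reached): treat it as a leaf subfile.
        yield path, data
```

Each yielded subfile would then go through type identification and content extraction individually, as the table describes.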

Supported formats for file type identification

Table 43-2 lists the file types you can identify using the Message Attachment or File Type
Match policy condition.
See “About file type matching” on page 900.
The Unknown file format matches any file whose format Symantec Data Loss Prevention does not
recognize, so you can use it in a file type rule to block such files. The Unknown file format is
supported only for file type identification.
If the file format you want to identify is not supported, you can use the Symantec Data Loss
Prevention Scripting Language to identify custom file types.
See “About custom file type identification” on page 901.

Note: The Message Attachment or File Type Match condition is a context-based match
condition that only supports file type identification. This condition does not support file contents
extraction. To extract file contents for policy evaluation you must use a content-based detection
rule. See “Supported formats for content extraction” on page 980.

See “Overview of detection file format support” on page 962.
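Identification by binary signature means examining the leading bytes of a file (its magic number) rather than its file name or extension. The following Python sketch checks a handful of well-known signatures and falls back to UNKNOWN. The signature list is a small assumed subset labeled with format names from Table 43-2, not the product's rule set. The sketch does not detect plain-text formats such as ASCII, and it cannot distinguish a DOCX from any other ZIP-based file without inspecting the package contents.

```python
# A small, assumed subset of well-known file signatures (magic numbers),
# labeled with format names from Table 43-2.
SIGNATURES = [
    (b"%PDF-", "Adobe PDF"),
    (b"PK\x03\x04", "PKZIP"),
    (b"7z\xbc\xaf\x27\x1c", "7-Zip Compressed File (7Z)"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"\x1f\x8b", "GZIP"),
    (b"\x89PNG\r\n\x1a\n", "Portable Network Graphics"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "Microsoft Windows OLE 2 Encapsulation"),
]


def identify_file_type(data: bytes) -> str:
    """Identify a file by its leading bytes; the file name is never consulted."""
    for magic, format_name in SIGNATURES:
        if data.startswith(magic):
            return format_name
    return "UNKNOWN"
```

Because only content is inspected, a PDF renamed to `report.txt` is still identified as Adobe PDF.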

Table 43-2 Formats supported for file type identification

Message Attachment or File Type Match formats

7-Zip Compressed File (7Z)

Ability Office (SS)

Ability Office (DB)

Ability Office (GR)

Ability Office (WP)

Ability Office (COM)

ACT

Adobe FrameMaker

Adobe Maker Interchange Format (FrameMaker)

Adobe FrameMaker Markup Language

Adobe PDF

AES Multiplus Comm

Aldus Freehand (Macintosh)

Aldus PageMaker (DOS)

Aldus PageMaker (Macintosh)

Amiga IFF-8SVX sound

Amiga MOD sound

ANSI

Apple Double

Apple Single

Applix Alis

Applix Asterix

Applix Graphics

Applix Presents

Applix Spreadsheets

Applix Words

ARC/PAK Archive

ASCII

ASCII-armored PGP encoded

ASCII-armored PGP Public Keyring

ASCII-armored PGP signed

Audio Interchange File Format

AutoCAD Drawing

AutoCAD Drawing Exchange

AutoDesk Animator FLIC Animation

AutoDesk Animator Pro FLIC Animation

AutoDesk WHIP

AutoShade Rendering

BinHex

CADAM Drawing (CDD) (server only)

CADAM Drawing Overlay

CATIA Drawing (CAT) (server only)

CCITT Group 3 1-Dimensional (G31D)

COMET TOP Word

Comma Separated Values

Compactor/Compact Pro Archive

Computer Graphics Metafile

Convergent Tech DEF Comm.

Corel Draw CMX

Corel Presentations

Corel Quattro Pro (WB2)

Corel Quattro Pro (WB3)

Corel WordPerfect Linux

Corel WordPerfect Macintosh

Corel WordPerfect Windows (WO)

Corel WordPerfect Windows (WPD)

CorelDRAW

cpio Archive (UNIX)

cpio Archive (VAX)

cpio Archive (SUN)

CPT Communication

Creative Voice (VOC) sound

Curses Screen Image (UNIX)

Curses Screen Image (VAX)

Curses Screen Image (SUN)

Data Interchange Format

Data Point VISTAWORD

dBase Database

DCX Fax

DCX Fax System

DEC WPS PLUS

DECdx

Desktop Color Separation (DCS)

Device Independent file (DVI)

DG CEOwrite

DG Common Data Stream (CDS)

DIF Spreadsheet

Digital Document Interchange Format (DDIF)

Disk Doubler Compression

DisplayWrite

Domino XML Language

EMC EmailXtender Container File (EMX)

ENABLE

ENABLE Spreadsheet (SSF)

Encapsulated PostScript (raster)

Enhanced Metafile

Envoy (EVY)

Executable- Other

Executable- UNIX

Executable- VAX

Executable- SUN

FileMaker (Macintosh)

File Share Encryption

Folio Flat File

Framework

Framework II

FTP Session Data

Fujitsu Oasys

GEM Bit Image

GIF

Graphics Environment Manager (GEM VDI)

GZIP

Haansoft Hangul (Hangul 2010 SE+)

Harvard Graphics

Hewlett-Packard

Honey Bull DSA101

HP Graphics Language (HPG) (server only)

HP Printer Control Language (PCL)

HTML

IBM 1403 Line Printer

IBM DCA/RFT (Revisable Form Text)

IBM DCA-FFT

IBM DCF Script

iCalendar

Informix SmartWare II

Informix SmartWare II Communication File

Informix SmartWare II Database

Informix SmartWare Spreadsheet

Interleaf

Java Archive

JPEG

JPEG File Interchange Format (JFIF)

JustSystems Ichitaro

KW ODA G31D (G31)

KW ODA G4 (G4)

KW ODA Internal G32D (G32)

KW ODA Internal Raw Bitmap (RBM)

Lasergraphics Language

Legato Extender

Link Library- Other

Link Library UNIX

Link Library VAX

Link Library SUN

Lotus 1-2-3 (123)

Lotus 1-2-3 (WK4)

Lotus 1-2-3 Charts

Lotus AMI Pro

Lotus AMI Professional Write Plus

Lotus AMIDraw Graphics

Lotus Freelance Graphics

Lotus Freelance Graphics 2

Lotus Notes Bitmap

Lotus Notes CDF

Lotus Notes database

Lotus Pic

Lotus Screen Cam

Lotus SmartMaster

Lotus Word Pro

Lyrix MacBinary

MacBinary

Macintosh Raster

MacPaint

Macromedia (Adobe) Director

Macromedia (Adobe) Flash

MacWrite

MacWrite II

MASS-11

Micrografx Designer

Microsoft Access

Microsoft Advanced Systems Format (ASF)

Microsoft Compressed Folder (LZH)

Microsoft Compressed Folder (LHA)

Microsoft Device Independent Bitmap

Microsoft Excel Charts

Microsoft Excel Macintosh

Microsoft Excel Windows

Microsoft Excel Windows XML

Microsoft Office Access (ACCDB)

Microsoft Office Drawing

Microsoft OneNote

Microsoft Outlook Personal Folder

Microsoft Outlook

Microsoft Outlook Express

Microsoft PowerPoint Macintosh

Microsoft PowerPoint PC

Microsoft PowerPoint Windows

Microsoft PowerPoint Windows XML

Microsoft PowerPoint Windows Macro-Enabled XML

Microsoft PowerPoint Windows XML Template

Microsoft PowerPoint Windows Macro-Enabled XML Template

Microsoft PowerPoint Windows XML Show

Microsoft PowerPoint Windows Macro-Enabled Show

Microsoft Project

Microsoft Publisher

Microsoft RMS Encrypted Office Binary File

Microsoft RMS Encrypted Open Packaging Conventions File

Microsoft Visio

Microsoft Visio 2013

Microsoft Visio 2013_Macro Format

Microsoft Visio 2013_Stencil Format

Microsoft Visio 2013_Stencil_Macro Format

Microsoft Visio 2013_Template Format

Microsoft Visio _Template_Macro

Microsoft Visio XML

Microsoft Wave Sound

Microsoft Windows Cursor (CUR) Graphics

Microsoft Windows Group File

Microsoft Windows Help File

Microsoft Windows Icon (ICO)

Microsoft Windows OLE 2 Encapsulation

Microsoft Windows Write

Microsoft Word (UNIX)

Microsoft Word Macintosh

Microsoft Word PC

Microsoft Word Windows

Microsoft Word Windows XML

Microsoft Word Windows Template XML

Microsoft Word Windows Macro-Enabled Template XML

Microsoft Works (Macintosh)

Microsoft Works

Microsoft Works Communication (Macintosh)

Microsoft Works Communication (Windows)

Microsoft Works Database (Macintosh)

Microsoft Works Database (PC)

Microsoft Works Database (Windows)

Microsoft Works Spreadsheet (S30)

Microsoft Works Spreadsheet (S40)

Microsoft Works Spreadsheet (Macintosh)

Microstation

MIDI

MORE Database Outliner (Macintosh)

MPEG-1 Audio layer 3

MPEG-1 Video

MPEG-2 Audio

MS DOS Batch File format

MS DOS Device Driver

MultiMate 4.0

Multiplan Spreadsheet

Navy DIF

NBI Async Archive Format

NBI Net Archive Format

Netscape Bookmark file

NeWS font file (SUN)

NeXT/Sun Audio

NIOS TOP

Nota Bene

Nurestor Drawing (NUR) (server only)

Oasis Open Document Format (ODT)

Oasis Open Document Format (ODS)

Oasis Open Document Format (ODP)

Object Module UNIX

Object Module VAX

Object Module SUN

ODA/ODIF

ODA/ODIF (FOD 26)

Office Writer

OLE DIB object

OLIDIF

OmniOutliner (OO3)

OpenOffice Calc (SXC)

OpenOffice Calc (ODS)

OpenOffice Impress (SXI)

OpenOffice Impress (SXP)

OpenOffice Impress (ODP)

OpenOffice Writer (SXW)

OpenOffice Writer (ODT)

Open PGP

OS/2 PM Metafile Graphics

Paradox (PC) Database

PC COM executable

PC Library Module

PC Object Module

PC PaintBrush

PC True Type Font

PCD Image

PeachCalc Spreadsheet

Persuasion Presentation

PEX Binary Archive (SUN)

PGP Compressed Data

PGP Encrypted Data

PGP Public Keyring

PGP Secret Keyring

PGP Signature Certificate

PGP Signed and Encrypted Data

PGP Signed Data

Philips Script

PKZIP

Plan Perfect

Portable Bitmap Utilities (PBM)

Portable Greymap Utilities (PGM)

Portable Network Graphics

Portable Pixmap Utilities (PPM)

PostScript File

PRIMEWORD

Program Information File

Q & A for DOS

Q & A for Windows

Quadratron Q-One (V1.93J)

Quadratron Q-One (V2.0)

Quark Express (Macintosh)

QuickDraw 3D Metafile (3DMF)

QuickTime Movie

RAR archive

Real Audio

Reflex Database

Rich Text Format

RIFF Device Independent Bitmap

RIFF MIDI

RIFF Multimedia Movie

SAMNA Word IV

Serialized Object Format (SOF) Encapsulation

SGI RGB Image

SGML

Simple Vector Format (SVF)

SMTP document

SolidWorks Drawing (SLDASM, SLDPRT, SLDDRW)

StarOffice Calc (SXC)

StarOffice Calc (ODS)

StarOffice Impress (SXI)

StarOffice Impress (SXP)

StarOffice Impress (ODP)

StarOffice Writer (SXW)

StarOffice Writer (ODT)

Stuff It Archive (Macintosh)

Sun Raster Image

SUN vfont definition

Supercalc Spreadsheet

SYLK Spreadsheet

Symphony Spreadsheet

Tagged Image File

Tape Archive

Targon Word (V 2.0)

Text Mail (MIME)

Transmission Neutral Encapsulation Format

Truevision Targa

Ultracalc Spreadsheet

Unicode Text

Uniplex (V6.01)

Uniplex Ucalc Spreadsheet

UNIX Compress

UNIX SHAR Encapsulation

UNKNOWN

Usenet format

UUEncoding

Vcard

VCF

Volkswriter

VRML

Wang Office GDL Header Encapsulation

WANG PC

Wang WITA

WANG WPS Comm.

Windows Animated Cursor

Windows Bitmap

Windows C++ Object Storage

Windows Icon Cursor

Windows Metafile

Windows Micrografx Draw (DRW)

Windows Palette

Windows Media Video (WMV)

Windows Media Audio (WMA)

Windows Video (AVI)

WinZip (unzip reader)

WinZip

Word Connection

WordERA (V 1.0)

WordMARC word processor

WordPad

WordPerfect General File

WordPerfect Graphics 1

WordPerfect Graphics 2

WordStar

WordStar 2000

WordStar 6.0

WriteNow

Writing Assistant word processor

X Bitmap (XBM)

X Image

X Pixmap (XPM)

Xerox 860 Comm.

Xerox Writer word processor

XHTML

XML (generic)

XML Paper Specification

XyWrite

Supported formats for content extraction

Symantec Data Loss Prevention cracks more than 100 file formats for performing content
extraction. You use content-based detection conditions to crack a file and extract its contents.
See “Content matching conditions” on page 387.
Table 43-3 lists the various file format categories whose content Symantec Data Loss Prevention
can extract. Refer to the associated link for the individual file formats supported for that category.
See “Overview of detection file format support” on page 962.

Table 43-3 Supported file format categories for content extraction

File format category Default support list

Word-processing file formats See “Supported word-processing formats for content extraction” on page 980.

Presentation file formats See “Supported presentation formats for content extraction” on page 982.

Spreadsheet file formats See “Supported spreadsheet formats for content extraction” on page 983.

Text and markup file formats See “Supported text and markup formats for content extraction” on page 984.

Email file formats See “Supported email formats for content extraction” on page 985.

CAD file formats See “Supported CAD formats for content extraction” on page 985.

Graphics file formats See “Supported graphics formats for content extraction” on page 986.

Database file formats See “Supported database formats for content extraction” on page 986.

Microsoft Office Open XML formats See “About high-performance content extraction for Office Open XML formats”
on page 996.

Other file formats See “Other file formats supported for content extraction” on page 986.

Encapsulation file formats See “Supported encapsulation formats for subfile extraction” on page 987.
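Office Open XML documents are ZIP packages of XML parts; in a DOCX file, the body text lives in `w:t` elements inside `word/document.xml`. The following Python sketch extracts that text with the standard library so that, for example, a keyword check can run against it. It illustrates the general technique only: it ignores headers, footers, comments, and embedded objects, and the function name is hypothetical rather than part of any Symantec interface.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(data: bytes) -> str:
    """Return the body text of a DOCX file, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t fragments in order.
        runs = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Joining runs matters: Word often splits a single word or number across several runs, so matching run by run could miss a keyword that a reader sees as one string.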

Supported word-processing formats for content extraction

Table 43-4 lists the word-processing file formats whose content Symantec Data Loss Prevention
can extract for policy evaluation.

Table 43-4 Supported word-processing file formats for content extraction

Format Name Format Extension

Adobe Maker Interchange Format (FrameMaker) MIF

Apple iWork Pages PAGES

ApplixWords AW

Corel WordPerfect Linux WPS

Corel WordPerfect Macintosh WPS

Corel WordPerfect Windows WO

Corel WordPerfect Windows WPD

DisplayWrite IP

Folio Flat file FFF

Fujitsu Oasys OA2

Haansoft Hangul HWP

IBM DCA/RFT (Revisable Form Text) DC

JustSystems Ichitaro JTD

Lotus AMI Pro SAM

Lotus AMI ProfessionalWrite Plus AMI

Lotus Word Pro LWP

Lotus SmartMaster MWP

Microsoft Word PC DOC

Microsoft Word Windows DOC

Microsoft Word Windows XML DOCX

Microsoft Word Windows Template XML DOTX

Microsoft Word Windows Macro-Enabled Template XML DOTM

Microsoft Word Macintosh DOC

Microsoft Works WPS

Microsoft Windows Write WRI

Microsoft OneNote ONE

OpenOffice Writer SXW
OpenOffice Writer ODT

StarOffice Writer SXW

StarOffice Writer ODT

WordPad RTF

XML Paper Specification XPS

XyWrite XY4

Supported presentation formats for content extraction

Table 43-5 lists the presentation file formats whose content Symantec Data Loss Prevention
can extract for policy evaluation.

Table 43-5 Supported presentation file formats for content extraction

Format Name Format Extension

Apple iWork Keynote KEYNOTE

Applix Presents AG

Corel Presentations SHW

Lotus Freelance Graphics PRZ

Lotus Freelance Graphics 2 PRE

Macromedia Flash SWF

Microsoft PowerPoint Windows PPT

Microsoft PowerPoint PC PPT

Microsoft PowerPoint Windows XML PPTX

Microsoft PowerPoint Windows Macro-Enabled XML PPTM

Microsoft PowerPoint Windows XML Template POTX

Microsoft PowerPoint Windows Macro-Enabled XML Template POTM

Microsoft PowerPoint Windows XML Show PPSX

Microsoft PowerPoint Windows Macro-Enabled Show PPSM

Microsoft PowerPoint Macintosh PPT

OpenOffice Impress SXI

OpenOffice Impress SXP

OpenOffice Impress ODP

StarOffice Impress SXI

StarOffice Impress SXP

StarOffice Impress ODP

Supported spreadsheet formats for content extraction

Table 43-6 lists the spreadsheet file formats whose content Symantec Data Loss Prevention
can extract for policy evaluation.

Table 43-6 Supported spreadsheet file formats for content extraction

Format Name Format Extension

Apple iWork Numbers NUMBERS

Applix Spreadsheets AS

Comma Separated Values CSV

Corel Quattro Pro WB2

Corel Quattro Pro WB3

Data Interchange Format DIF

Lotus 1-2-3 123

Lotus 1-2-3 WK4

Lotus 1-2-3 Charts 123

Microsoft Excel Windows XLS

Microsoft Excel Windows XML XLSX

Microsoft Excel Charts XLS

Microsoft Excel 2007 Binary XLSB

Microsoft Excel Macintosh XLS

Microsoft Works Spreadsheet S30

Microsoft Works Spreadsheet S40

OpenOffice Calc SXC

OpenOffice Calc ODS

StarOffice Calc SXC

StarOffice Calc ODS

Supported text and markup formats for content extraction

Table 43-7 lists the text and markup file formats whose content Symantec Data Loss Prevention
can extract for policy evaluation.

Table 43-7 Supported text and markup file formats for content extraction

Format Name Format Extension

ANSI TXT

ASCII TXT

HTML HTM

Microsoft Excel Windows XML XML

Microsoft Word Windows XML XML

Microsoft Visio XML VDX

Oasis Open Document Format ODT

Oasis Open Document Format ODS

Oasis Open Document Format ODP

Rich Text Format RTF

Unicode Text TXT

XHTML HTM

XML (generic) XML

Supported email formats for content extraction

Table 43-8 lists the email file formats whose content Symantec Data Loss Prevention can
extract for evaluation.

Table 43-8 Supported email file formats for content extraction

Format Name Format Extension

Domino XML Language DXL

EMC EmailXtender Native Message ONM

Microsoft Outlook MSG

Microsoft Outlook Express EML

Text Mail (MIME) various

Transport Neutral Encapsulation Format various

Supported CAD formats for content extraction

Table 43-9 lists the computer-aided design (CAD) file formats whose content Symantec Data
Loss Prevention can extract for evaluation.

Table 43-9 Supported CAD file formats

Format Name Format Extension

AutoCAD Drawing DWG

AutoCAD Drawing Exchange DXF

Microsoft Visio 2013 VSD

Microsoft Visio XML VSDX

Microsoft Visio 2013_Macro VSDM

Microsoft Visio 2013_Stencil VSSX

Microsoft Visio 2013_Stencil_Macro VSSM

Microsoft Visio 2013_Template VSTX

Microsoft Visio 2013_Template_Macro VSTM

Microstation DGN

Supported graphics formats for content extraction

Table 43-10 lists the graphics file formats whose content Symantec Data Loss Prevention can
extract for evaluation.

Table 43-10 Supported graphics file formats for content extraction

Format Name Format Extension

Enhanced Metafile EMF

Lotus Pic PIC

Tagged Image File (metadata only) TIFF

Windows Metafile WMF

Supported database formats for content extraction

The following table lists the database file formats whose content Symantec Data Loss Prevention
can extract for policy evaluation.

Table 43-11 Supported database file formats for content extraction

Format Name Format Extension

Microsoft Access MDB

Microsoft Project MPP

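Other file formats supported for content extraction

Table 43-12 lists other file formats whose content Symantec Data Loss Prevention can extract
for policy evaluation.

Table 43-12 Other supported formats for content extraction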

Format name Format extension

Adobe PDF PDF

iCalendar ICS

MPEG-1 Audio layer 3 (metadata only) MP3

Microsoft Windows Backup Utility File BKF

Microsoft Rights Management protected files
■ PFILE
■ Microsoft Office 2003 and older
■ Files that use Open Packaging Conventions (OPC) file technology, including
Office Open XML (Office 2007 and later) and XML Paper Specification (XPS)
Note: This type of content extraction is supported only on detection servers that run on
Windows.

File Share Encryption (PGP NetShare)
You can decrypt Symantec File Share encrypted files and extract file contents for policy
evaluation using the File Share plug-in. Refer to the Symantec Data Loss Prevention
Encryption Insight Implementation Guide.
Note: Encryption Insight is only available with Network Discover.

Custom You can write a plug-in to perform content, subfile, and metadata extraction
operations on custom file formats. Refer to the Symantec Data Loss Prevention
Content Extraction Plug-in Developers Guide.
Note: Content extraction plug-ins are limited to detection servers.

Virtual Card File VCF and VCARD electronic business card files

Supported encapsulation formats for subfile extraction
Symantec Data Loss Prevention supports various encapsulation formats for subfile extraction,
such as ZIP, RAR, and TAR. The system automatically performs subfile extraction for supported
formats when you use content-based match conditions. Subfile extraction is a subset of content
extraction: if the system successfully extracts a subfile from a supported encapsulated file, it
automatically extracts the text content of that subfile, provided that the subfile format is
supported for content extraction.
See “Overview of detection file format support” on page 962.
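The following Python sketch illustrates the subfile-then-content sequence described above. It is not how Symantec Data Loss Prevention is implemented: it handles only ZIP archives (through the standard zipfile module), and TEXT_EXTENSIONS is a hypothetical stand-in for the list of formats supported for content extraction. Nested archives are walked recursively up to an assumed depth limit.

```python
import io
import os
import zipfile

# Hypothetical stand-in for "formats supported for content extraction".
TEXT_EXTENSIONS = {".txt", ".csv"}


def extract_subfiles(name, data, depth=0, max_depth=5):
    """Yield (path, text) pairs, descending into nested ZIP archives."""
    if depth > max_depth:
        return  # guard against deeply nested or self-referencing archives
    if zipfile.is_zipfile(io.BytesIO(data)):
        # Encapsulation format: extract each subfile and process it in turn.
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                yield from extract_subfiles(
                    f"{name}/{info.filename}", archive.read(info), depth + 1, max_depth
                )
    elif os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
        # Subfile format is "supported": extract its text for detection.
        yield name, data.decode("utf-8", errors="replace")
    # Any other subfile format yields no text in this sketch.
```

Because Office Open XML files such as DOCX are also ZIP packages, this sketch would descend into them as well; a real extractor identifies the file type before deciding how to crack it.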

Table 43-13 lists the encapsulation formats from which Symantec Data Loss Prevention can
extract subfiles for policy evaluation.

Table 43-13 Supported encapsulation formats for subfile extraction

Format Name Format Extension

7-Zip 7Z

BinHex HQX

GZIP GZ

iCalendar ICS

Java Archive JAR

Microsoft Cabinet CAB

Microsoft Compressed Folder LZH

Microsoft Compressed Folder LHA

Microsoft Visio 2013 VSD

Microsoft Visio 2013 XML VSDX

Microsoft Visio 2013_Macro VSDM

Microsoft Visio 2013_Stencil VSSX

Microsoft Visio 2013_Stencil_Macro VSSM

Microsoft Visio 2013_Template VSTX

Microsoft Visio 2013_Template_Macro VSTM

PKZIP ZIP

WinZip ZIP

RAR archive RAR

Tape Archive TAR

UNIX Compress Z

UUEncoding UUE

Virtual Card File VCF and VCARD electronic business card files

YENC YENC (server only)

Supported file formats for metadata extraction

Table 43-14 lists some of the file formats that Symantec Data Loss Prevention supports for
metadata detection, and provides some example metadata fields returned for those formats.
This list is not exhaustive and is provided for quick reference only. Other file formats may be
supported, and other custom fields may be returned. The best practice is to always use the
filter utility to verify metadata support for each file format you want to detect.
See “Always use the filter utility to verify file format metadata support” on page 991.

Table 43-14 Supported file formats for metadata detection

File formats Metadata Description

Microsoft Office documents, for example:
■ Word (DOC, DOCX)
■ Excel (XLS, XLSX)
■ PowerPoint (PPT, PPTX)
Metadata: Example fields:
■ Title
■ Subject
■ Author
■ Keywords
■ Other custom fields
Description: For Microsoft Office documents, the system extracts Object Linking and
Embedding (OLE) metadata.

Adobe PDF files
Metadata: Example fields:
■ Author
■ Title
■ Subject
■ Creation
■ Update dates
Description: For Adobe PDF files, the system extracts Document Information Dictionary (DID)
metadata. The system does not support Adobe Extensible Metadata Platform (XMP) metadata
extraction.

Microsoft Visio Supported format extensions

Other file formats (including binary and text)
Metadata: Use the filter utility to verify metadata extraction for other file formats.
Description: See “Always use the filter utility to verify file format metadata support”
on page 991.

Custom file formats
Metadata: Custom file type metadata
Description: Content extraction plug-in that supports the metadata extraction operation.

About document metadata detection

In addition to file content and subfile extraction, Symantec Data Loss Prevention supports
metadata extraction for many file formats. File format metadata is data about a file that is
stored as file properties. By default, metadata extraction is disabled because it can lead to false
positives. Used properly, metadata detection can enhance the accuracy of your content-based
policy rules.

For example, consider a business that uses Microsoft Office templates for its Word, Excel,
and PowerPoint documents. The business applies Microsoft OLE metadata properties in the
form of keywords to each template. The business has enabled metadata extraction and
deployed keyword policies to match on metadata keywords. These policies can detect keywords
in documents that are derived from the templates. The business also has the flexibility to use
policy exceptions to avoid generating incidents if certain metadata keywords are present.
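As a rough illustration of the rule-plus-exception pattern in this example, the Python sketch below evaluates a Keywords metadata value that has already been extracted. The helper names, the semicolon-or-comma separator, and the keyword values are hypothetical. The product's keyword conditions match against normalized text rather than a token set, so treat this only as a model of the decision logic.

```python
import re


def metadata_keywords(keywords_property):
    """Split a Keywords property such as 'TPL-Finance; Q3' into lowercase tokens."""
    return {t.strip().lower() for t in re.split(r"[;,]", keywords_property) if t.strip()}


def generates_incident(keywords_property, rule_keywords, exception_keywords):
    """Hypothetical rule: match any rule keyword unless an exception keyword is present."""
    tokens = metadata_keywords(keywords_property)
    if tokens & {k.lower() for k in exception_keywords}:
        return False  # a policy exception suppresses the incident
    return bool(tokens & {k.lower() for k in rule_keywords})
```

For example, a document derived from a template tagged "TPL-Finance" matches a rule on "tpl-finance", but no incident is generated if the same document also carries an exception keyword such as "tpl-public".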

Enabling server metadata detection

By default metadata extraction is disabled for detection servers.
To enable server metadata extraction
1 Log on to the Enforce Server administration console as a system administrator.
2 Navigate to the System > Servers and Detectors > Overview > Server/Detector Detail
- Advanced Settings screen for the detection server or cloud detector on which you want to
enable metadata extraction.
3 Click the Server Settings button.
4 Locate property ContentExtraction.EnableMetaData in the list.
5 Enter the value on for this property to enable metadata extraction.
6 Click Save to save the configuration.
7 On the Server Detail screen, click Recycle to restart the server.
8 On the Server Detail screen, click Done to complete the process.

Enabling endpoint metadata detection

By default metadata extraction is disabled for endpoints.
To enable endpoint metadata extraction
1 Log on to the Enforce Server administration console as a system administrator.
2 Navigate to the System > Agents > Agent Configuration screen for the endpoint server
on which you want to enable metadata extraction.
3 Create a new endpoint configuration for metadata detection, or select the default
configuration.
See “Create a separate endpoint configuration for metadata detection” on page 995.
4 Select the Advanced Agent Settings tab.
5 Locate property Detection.ENABLE_METADATA.str in the list.
6 Enter the value on for this property to enable metadata extraction.

7 Click Save and Apply to save the configuration change.

Best practices for using metadata detection

Table 43-15 lists best practices for implementing metadata detection, with links to
corresponding topics for detailed considerations.

Table 43-15 Considerations for implementing metadata detection

Consideration Topic

Always use the filter utility to verify file format metadata support. See “Always use the filter
utility to verify file format metadata support” on page 991.

Enable metadata detection only if it is necessary. See “Distinguish metadata from file content
and application data” on page 993.

Avoid generating false positives by selecting keywords carefully. See “Use and tune keyword
lists to avoid false positives on metadata” on page 995.

Understand the resource implications of endpoint metadata extraction. See “Understand
performance implications of enabling endpoint metadata detection” on page 995.

Create a separate endpoint configuration for metadata detection. See “Create a separate
endpoint configuration for metadata detection” on page 995.

Use response rules to add metadata tags to incidents. See “Use response rules to tag incidents
with metadata” on page 995.

Always use the filter utility to verify file format metadata support
To help you create policies that detect file format metadata, use the filter utility that is available
with any Symantec Data Loss Prevention detection or Endpoint Server installation. This utility
provides an easy way to determine which metadata fields the system returns for a given file
format. The utility generates output that contains the metadata the system will extract at runtime
for each file format you test using filter.
The following procedure, To verify file format metadata extraction support using filter,
describes how to use the filter utility. It is recommended that you always follow this process so
that you can create and tune policies that accurately detect file format metadata.

Note: The data output by the filter utility is in ASCII format. Symantec Data Loss Prevention
processes data in Unicode format. Therefore, you may rely on the existence of the fields
returned by the filter utility, but the metadata detected by Symantec Data Loss Prevention may
not look identical to the filter output.

To verify file format metadata extraction support using filter

1 On the file system where a detection server is installed, start a command prompt session.
2 Change directory to where the filter utility is located.
For example, on a default 64-bit Windows installation you would issue the following
command:
cd \Program Files\Symantec\DataLossPrevention\EnforceServer\15.5\Protect\plugins\contentextraction\Verity\x64

3 Issue the following command to run the filter program and display its syntax and optional
parameters.
filter -help

As indicated by the help, you use the following syntax to execute the filter utility:
filter [options] inputfile outputfile

The inputfile is an instance of the file format you want to verify. The outputfile is the
file to which the filter utility writes the extracted data.
Note the following extraction options:
■ To verify metadata extraction, use the "get doc summary info" option: -i
■ To verify content extraction, use no options: filter inputfile outputfile

4 Execute filter against an instance of the file format to verify metadata extraction.
For example, on Windows you would issue the following command:
filter -i \temp\myfile.doc \temp\metadata_output.txt

Where myfile.doc is a file that contains the metadata you want to verify and that you have
copied to the \temp directory, and metadata_output.txt is the name of the file to which you want
the system to write the extracted data.
5 Review the filter output. The output data should be similar to the following:

1 2 1252 CodePage 1 1 "S" Title 0 0 (null) 1 1 "P" Author 0 0 (null)

0 0 (null) 0 1 "" (null) 1 1 "m" LastAuthor 1 1 "1" RevNumber
1 3 6300 Minutes EditTime 1 3 Mon Aug 27 11:53:07 2007 LastPrinted

6 Refer to the following tables for an explanation of each metadata extraction field output
by the filter utility.
Table 43-16 repeats the output from Step 5, formatted for readability.
Table 43-17 explains each column field.

Table 43-16 Example filter metadata output

Column 1 Column 2 Column 3 Column 4

1 2 1252 CodePage

1 1 "S" Title

0 0 (null)

1 1 "P" Author

0 0 (null)

0 1 "" (null)

1 1 "m" LastAuthor

1 1 "1" RevNumber

1 3 6300 Minutes EditTime

1 3 Mon Aug 27 11:53:07 2007 LastPrinted

Table 43-17 Metadata fields generated by the filter utility

Column Description

Column 1 1 = valid field; 0 = invalid field.
Note: You may ignore rows where the first column is 0.

Column 2 The type of data: 1 = String, 2 = Integer, 3 = Date/Time, 5 = Boolean.

Column 3 The data payload for the field.

Column 4 The name of the field (empty or null if the field is invalid).
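If you verify many file formats, a small script can run filter and decode its output according to Table 43-17. The Python sketch below is hypothetical. It expects you to pass in the path to the filter executable, and it assumes one record per line, as laid out in Table 43-16. The raw output shown in Step 5 wraps records differently, so you may need to adjust the line splitting. The sketch reads the output as ASCII, per the note above, and skips rows whose first column is 0.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path

# Column 2 type codes, as listed in Table 43-17.
TYPE_NAMES = {"1": "String", "2": "Integer", "3": "Date/Time", "5": "Boolean"}


@dataclass
class MetadataField:
    name: str       # column 4
    type_name: str  # column 2, decoded
    payload: str    # column 3, with surrounding quotes removed


def build_filter_command(filter_exe, input_file, output_file, metadata=True):
    """filter -i inputfile outputfile for metadata; no option for content."""
    command = [str(filter_exe)]
    if metadata:
        command.append("-i")
    return command + [str(input_file), str(output_file)]


def parse_filter_records(lines):
    """Decode records laid out one per line, as in Table 43-16."""
    fields = []
    for line in lines:
        parts = line.split()
        # A usable row has a valid flag of 1, a type, a payload, and a name.
        if len(parts) < 4 or parts[0] != "1":
            continue
        payload = " ".join(parts[2:-1]).strip('"')
        fields.append(MetadataField(parts[-1], TYPE_NAMES.get(parts[1], "Unknown"), payload))
    return fields


def run_filter(filter_exe, input_file, output_file):
    """Run filter in metadata mode and parse what it wrote to output_file."""
    subprocess.run(build_filter_command(filter_exe, input_file, output_file), check=True)
    text = Path(output_file).read_text(encoding="ascii", errors="replace")
    return parse_filter_records(text.splitlines())


# The records from Table 43-16.
SAMPLE_RECORDS = [
    '1 2 1252 CodePage', '1 1 "S" Title', '0 0 (null)', '1 1 "P" Author',
    '0 0 (null)', '0 1 "" (null)', '1 1 "m" LastAuthor', '1 1 "1" RevNumber',
    '1 3 6300 Minutes EditTime', '1 3 Mon Aug 27 11:53:07 2007 LastPrinted',
]
```

Parsing SAMPLE_RECORDS returns seven valid fields (CodePage through LastPrinted). The name is taken from the last token so that payloads containing spaces, such as the LastPrinted date, stay intact.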

Distinguish metadata from file content and application data

Do not confuse metadata extraction with content extraction or application data. Some text that
may appear to be metadata is extracted as content or application data. Table 43-18 describes
some types of data that are not extracted as file format metadata, to help you determine whether
and when you need to enable metadata detection.

Note: This list is not exhaustive and is provided for quick reference only. There may be other
types of data that are not extracted as metadata. The best practice is to use the filter utility to
verify file format metadata support. See “Always use the filter utility to verify file format metadata
support” on page 991.

Table 43-18 Data not extracted as metadata

Content type Extraction method

Application data Application data including message transport information is extracted separately from
file format extraction. For all inbound messages, the system extracts message envelope
(header) and subject information as text at the application layer. The type of application
data that is extracted depends on the channels supported by the detection server or
endpoint.

Headers and footers Document header and footer text is extracted as content, not metadata. To avoid false
positives, it is recommended that you remove or whitelist headers and footers from
documents.

See “Use white listing to exclude non-sensitive content from partial matching”
on page 651.

See the Indexed Document Matching (IDM) chapter in the Symantec Data Loss
Prevention Administration Guide for details.

Markup text Markup text is extracted as content, not metadata. Markup text extraction is supported
for HTML, XML, SGML, and more. Markup text extraction is disabled by default.

See “Advanced server settings” on page 285.

See “Advanced agent settings” on page 2372.

See the "Advanced Server Settings" topic in the Symantec Data Loss Prevention
Administration Guide to enable it.

Hidden text Hidden text is extracted as content, not metadata. Hidden text extraction in the form
of tracked changes is supported for some Microsoft Office file formats. Hidden text
extraction is disabled by default.

See “Advanced server settings” on page 285.

See “Advanced agent settings” on page 2372.

See the "Advanced Server Settings" topic in the Symantec Data Loss Prevention
Administration Guide to enable it.

Watermarks Text-based watermarks are extracted as content, not metadata. Text-based watermark
detection is supported for Microsoft Word documents (versions 2003 and 2007). It is
not supported for other file formats.

Use and tune keyword lists to avoid false positives on metadata

Enabling metadata extraction can cause false positives because more text is checked for a
match. For example, if you have a policy that detects keywords and metadata extraction is
enabled, the policy reports a match if a keyword is present in the content or in the metadata.
Once the system has extracted the content and the metadata, the text is normalized and
streamed to the detection component for matching. The detection component has no knowledge
of the source of the text, whether it is application data, content, or metadata.
To detect file format metadata, you define keyword conditions for rules or exceptions that
contain keywords that are specific to one or more file formats. To avoid generating false
positives, clearly define the keyword lists in your policies. The keywords you use to detect
metadata should be unique and distinct from keywords or phrases you use to detect content.
Test and tune keyword lists to improve metadata detection accuracy.
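The Python sketch below shows why a keyword hit cannot be attributed to metadata once content and metadata are merged into one stream. It also checks that a metadata keyword list does not overlap a content keyword list. The helper names are hypothetical, and the normalization (NFKC, lowercase, collapsed whitespace) is a simplified assumption, not the product's normalizer.

```python
import re
import unicodedata


def normalize(text):
    """Simplified normalization: NFKC, lowercase, and collapsed whitespace."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFKC", text)).strip().lower()


def matching_keywords(keywords, content_text, metadata_text):
    """Match keywords against one merged stream, so the source of a hit is lost."""
    stream = normalize(content_text + " " + metadata_text)
    return sorted(
        k for k in keywords
        if re.search(r"\b" + re.escape(normalize(k)) + r"\b", stream)
    )


def overlapping_keywords(metadata_keywords, content_keywords):
    """Keywords present in both lists; these blur metadata hits and content hits."""
    return sorted(
        {normalize(k) for k in metadata_keywords} & {normalize(k) for k in content_keywords}
    )
```

An empty result from overlapping_keywords is a quick check that the metadata keywords are distinct from the content keywords, as recommended above.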

Understand performance implications of enabling endpoint metadata detection
On the endpoint, enabling metadata extraction does not add overhead if no content rules are
deployed. If content rules are deployed to the endpoint, enabling metadata extraction may
introduce minor overhead because there is extra data to inspect. Test and tune your endpoint
policy keyword lists to ensure that metadata detection is efficient.

Create a separate endpoint configuration for metadata detection

When you enable endpoint metadata detection, consider creating a custom endpoint
configuration specifically for metadata detection. By doing so you can easily revert to the
default configuration if necessary.

Use response rules to tag incidents with metadata

You cannot use metadata detection to apply tags to inbound files or documents that generate
incidents. If you need this capability, consider using a FlexResponse plug-in.
See “About response rules” on page 1738.
See the Symantec Data Loss Prevention Administration Guide for details.
Chapter 44
Supported Office Open XML formats for high-performance content extraction
This chapter includes the following topics:

■ About high-performance content extraction for Office Open XML formats

■ Enabling high-performance content extraction for Office Open XML files

■ About metadata extraction for Office Open XML files

■ About subfile extraction for Office Open XML files

About high-performance content extraction for Office Open XML formats
High-performance content extraction for Office Open XML formats is enabled by default on
Symantec Data Loss Prevention cloud detectors. You can enable Office Open XML
high-performance content extraction on your on-premises detection servers. Office Open XML
content extraction is not available on the endpoint DLP Agent.
Enabling Office Open XML high-performance content extraction on your on-premises detection
servers significantly improves content extraction performance for Office Open XML files.

Warning: Do not enable Office Open XML high-performance content extraction on detection
servers using Indexed Document Matching (IDM) policies.

Table 44-1 Office Open XML formats for high-performance content extraction

Format name Format extension

Office Open XML Word Processing DOCX

Office Open XML Word Processing Template DOTX

Office Open XML Macro-enabled Word Processing DOCM

Office Open XML Macro-enabled Word Processing Template DOTM

Office Open XML Spreadsheet XLSX

Office Open XML Spreadsheet Template XLTX

Office Open XML Macro-enabled Spreadsheet XLSM

Office Open XML Macro-enabled Spreadsheet Template XLTM

Office Open XML Spreadsheet Add-in XLAM

Office Open XML Presentation PPTX

Office Open XML Presentation Template POTX

Office Open XML Presentation Slide Show PPSX

Office Open XML Macro-enabled Presentation PPTM

Office Open XML Macro-enabled Presentation Template POTM

Office Open XML Presentation Macro-enabled Slide Show PPSM

Office Open XML Presentation Add-in PPAM

Enabling high-performance content extraction for Office Open XML files
Warning: Do not enable Office Open XML high-performance content extraction on detection
servers using Indexed Document Matching (IDM) policies.

The following procedure describes how to enable Office Open XML high-performance content
extraction on your on-premises detection servers. Note that PowerPoint content extraction is
not enabled by default. If you want to extract content from PowerPoint files, follow the optional
third step in this procedure.
To enable Office Open XML high-performance content extraction
1 On your detection server, open the manifest.xml file, located in one of these locations:
■ Linux:
/opt/Symantec/DataLossPrevention/ContentExtractionService/15.5/Plugins/Protect/plugins/contentextraction/OfficeOpenXMLPlugin

■ Windows:
\Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Plugins\Protect\plugins\contentextraction\OfficeOpenXMLPlugin

2 Locate the plugin id="OfficeOpenXMLPlugin" line, and set the disabled value to
false. The resulting line should read as follows (line breaks added for legibility):

3 (Optional): To enable PowerPoint content extraction, add the following lines to the
manifest.xml file:

<documentType type="pptx">
<supportedOperations>
<operation type="FileTypeIdentification"/>
<operation type="TextExtraction"/>
<operation type="SubFileExtraction"/>
<operation type="MetadataExtraction"/>
</supportedOperations>
</documentType>

4 Save and close the manifest.xml file.

5 Restart your detection server to apply the change.
6 Repeat steps 1-5 on all detection servers on which you want to enable Office Open XML
content extraction.
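If you prefer to script step 2 across many servers, the following Python sketch sets the disabled attribute using the standard xml.etree.ElementTree module. It assumes two things you should verify against your own manifest.xml: the attribute belongs to a plugin element with id="OfficeOpenXMLPlugin", and the file uses no XML namespace. ElementTree discards XML comments when it rewrites a file, so the sketch saves a .bak copy first. It does not perform the optional PowerPoint step, and you still need to restart the detection server.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_ooxml_plugin(manifest_path, plugin_id="OfficeOpenXMLPlugin"):
    """Set disabled="false" on the matching <plugin> element.

    Returns True if the file was rewritten and False if it was already enabled.
    Assumes an un-namespaced <plugin id=...> element (hypothetical layout).
    """
    path = Path(manifest_path)
    tree = ET.parse(path)
    matches = [p for p in tree.getroot().iter("plugin") if p.get("id") == plugin_id]
    if not matches:
        raise ValueError(f"no <plugin id={plugin_id!r}> element in {path}")
    if all(p.get("disabled") == "false" for p in matches):
        return False  # already enabled; leave the file untouched
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup before writing
    for p in matches:
        p.set("disabled", "false")
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return True
```

Editing the file by hand, as the procedure describes, remains the safer choice if your manifest.xml contains comments you want to keep.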

About metadata extraction for Office Open XML files

High-performance content extraction for Office Open XML formats supports metadata extraction
in all localized languages. The following table lists the extracted metadata properties:

Table 44-2 Office Open XML metadata

Property type Property

Core properties Author

Application properties AppName

AppVersion

CharCount

CharactersWithSpaces

Company

EditTime

HyperlinkBase

HyperlinksChanged

LineCount

LinksDirty

Manager

PageCount

Parcount

ScaleCrop

Security

SharedDoc

Template

TitleOfParts

WordCount

Custom properties RightsWatchMark, used by Symantec Information Centric Tagging

All other custom properties
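To see which properties a sample Office Open XML file actually carries, you can read its property parts directly. The Python sketch below assumes the conventional part names docProps/core.xml, docProps/app.xml, and docProps/custom.xml; a package can locate these parts elsewhere through its relationships. The returned names come from the file format itself (for example, the core author is stored as dc:creator and is returned as creator). They do not necessarily match the property names in Table 44-2.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_ooxml_properties(package):
    """Return core, extended (app), and custom properties as a flat dict."""
    props = {}
    with zipfile.ZipFile(package) as pkg:
        names = set(pkg.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part in names:
                for child in ET.fromstring(pkg.read(part)):
                    # Skip structured elements such as HeadingPairs (no direct text).
                    if child.text and child.text.strip():
                        props[_local(child.tag)] = child.text.strip()
        if "docProps/custom.xml" in names:
            for prop in ET.fromstring(pkg.read("docProps/custom.xml")):
                value = next(iter(prop), None)  # the single vt:* value element
                if prop.get("name") and value is not None and value.text:
                    props["custom:" + prop.get("name")] = value.text
    return props
```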

About subfile extraction for Office Open XML files

High-performance content extraction for Office Open XML formats supports subfile extraction
for image files, Object Linking and Embedding (OLE) Compound Files, and Open Packaging
Convention (OPC) container files.

Image file extraction

Image file extraction supports Symantec Data Loss Prevention's Form Recognition and Optical
Character Recognition (OCR) Sensitive Image Recognition features.
See “About Form Recognition detection” on page 695.
See “Server configuration—basic” on page 705.
Symantec Data Loss Prevention supports content extraction for the following image formats:
■ Bitmap (BMP)
■ Portable Network Graphics (PNG)
■ Joint Photographic Experts Group (JPEG or JPG extensions)
■ Enhanced Metafile (EMF)
■ Windows Metafile (WMF)
There are two categories of EMF/WMF files:
■ Files attached by users directly to Office Open XML documents.
■ Thumbnail or icon files created by Office applications to represent files attached to Office
Open XML documents.
All EMF and WMF files are counted as images, and therefore count against the maximum
image extraction limit. If you find that you are reaching the maximum image extraction limit
due to a large number of EMF/WMF files, you may want to disable EMF/WMF file extraction.
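To check whether EMF and WMF files make up most of a document's images, you can count the image parts in its Office Open XML package. This Python sketch assumes that images are stored in a media folder (such as word/media), which is the usual layout. It inspects only the top-level package, not images inside embedded OLE or OPC objects, so its counts may be lower than what the content extractor sees.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Image types listed in the "Image file extraction" section above.
IMAGE_EXTENSIONS = {".bmp", ".png", ".jpg", ".jpeg", ".emf", ".wmf"}


def count_media_images(package):
    """Count image parts stored under a .../media/ folder, keyed by extension."""
    counts = Counter()
    with zipfile.ZipFile(package) as pkg:
        for name in pkg.namelist():
            part = PurePosixPath(name)
            suffix = part.suffix.lower()
            if "media" in part.parts[:-1] and suffix in IMAGE_EXTENSIONS:
                counts[suffix.lstrip(".")] += 1
    return counts


def emf_wmf_share(counts):
    """Fraction of counted images that are EMF or WMF (0.0 when there are none)."""
    total = sum(counts.values())
    return (counts["emf"] + counts["wmf"]) / total if total else 0.0
```

A high share suggests that setting extractEmfWmf=off in plugin_settings.txt would free up most of the image extraction limit for other images.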

OLE and OPC file extraction

Symantec Data Loss Prevention can extract files embedded in Office Open XML documents.
The following table lists the supported file formats and embedding types:

Table 44-3
File format Embedding type

Adobe PDF OLE

Bitmap OLE

Excel 97 Worksheet OLE/OPC

Excel Binary OLE/OPC

Excel Chart OLE/OPC

Excel Macro-enabled Worksheet OLE/OPC

Excel Worksheet OLE/OPC

Graph Chart OLE

OpenDocument Presentation OLE

OpenDocument Slide OLE

OpenDocument Text OLE

Package 1 (Non-Office files, all formats) OLE

Package 2 (Non-Office files, all formats) OLE

PowerPoint 97 Presentation OLE/OPC

PowerPoint 97 Slide OLE/OPC

PowerPoint Macro-enabled Presentation OLE/OPC

PowerPoint Macro-enabled Slide OLE/OPC

PowerPoint Presentation OLE/OPC

PowerPoint Slide OLE/OPC

Visio OLE

Word OLE/OPC

Word 97 OLE/OPC

Word Macro OLE/OPC

WordPad OLE

Configuring plug-in settings

Symantec recommends using the default settings for high-performance Office Open XML
content extraction. You may encounter situations in which you want to adjust some settings,
however. This section documents the plugin_settings.txt configuration file, available in
one of the following locations on your detection server:
■ On Linux: /opt/Symantec/DataLossPrevention/ContentExtractionService/15.5/Plugins/Protect/plugins/contentextraction/OfficeOpenXMLPlugin

■ On Windows: \Program Files\Symantec\DataLossPrevention\ContentExtractionService\15.5\Plugins\Protect\plugins\contentextraction\OfficeOpenXMLPlugin

The plugin_settings.txt file contains these settings (line breaks added for legibility):

dotnetcoreDir=/publish
extractEmfWmf=on
streamConfiguration=EmbeddedOdf,false,false;
CONTENTS,false,false;
Package,false,false;
AttachContents,false,false;
skipFilesWithSignatures=0x38,0x42,0x50,0x53;
imageSignatures=0x42,0x4d;
0xff,0xd8,0xff,0xe0;
0xff,0xd8,0xff,0xe1;
0xff,0xd8,0xff,0xe8;
0xff,0xd8,0xff,0xe2;
0xff,0xd8,0xff,0xe3;
0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a;
0xd7,0xcd,0xc6,0x9a;

To disable EMF/WMF extraction, set extractEmfWmf=off.

The streamConfiguration settings specify the following:
■ The name of the stream in the OLE files that includes file content, such as EmbeddedOdf.
■ Whether to continue to the next stream if content is found in the current stream. This is
false by default, meaning that after finding the first valid content stream, the content
extractor will not continue evaluating the subsequent streams.
■ Whether to include the original OLE file as a subfile. This is also false by default.
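The following Python sketch shows how the three comma-separated fields of each streamConfiguration entry can be read, and how the "continue" flag limits which streams are evaluated. It is an illustration only: parse_stream_configuration, streams_to_extract, and StreamRule are hypothetical names, and the assumption that entries are evaluated in the order listed is not taken from the content extractor's documented internals.

```python
from typing import List, NamedTuple, Set


class StreamRule(NamedTuple):
    """One streamConfiguration entry: <stream name>,<continue>,<include OLE file>."""
    stream_name: str
    continue_after_match: bool
    include_original_ole: bool


def parse_stream_configuration(value: str) -> List[StreamRule]:
    """Parse a value such as 'EmbeddedOdf,false,false;CONTENTS,false,false;'."""
    rules = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # ignore the empty item left by a trailing ';'
        name, cont, include = (part.strip() for part in entry.split(","))
        rules.append(StreamRule(name, cont.lower() == "true", include.lower() == "true"))
    return rules


def streams_to_extract(rules: List[StreamRule], streams_with_content: Set[str]) -> List[str]:
    """Walk the rules in order (assumed) and stop after the first stream that
    has content, unless that rule's continue flag is true."""
    selected = []
    for rule in rules:
        if rule.stream_name in streams_with_content:
            selected.append(rule.stream_name)
            if not rule.continue_after_match:
                break
    return selected


# The default value from plugin_settings.txt, joined onto one line.
DEFAULT = "EmbeddedOdf,false,false;CONTENTS,false,false;Package,false,false;AttachContents,false,false;"
rules = parse_stream_configuration(DEFAULT)
print(streams_to_extract(rules, {"CONTENTS", "Package"}))  # ['CONTENTS']
```

With the default false flags, only the first stream that contains content is used; setting the second field of an entry to true lets evaluation move on to the next stream.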
The skipFilesWithSignatures setting specifies which file types to skip based on their hex
file signature. By default, the content extractor skips Photoshop Document (PSD) files, because
Symantec Data Loss Prevention cannot perform detection on these files. 0x38,0x42,0x50,0x53
is the hex file signature for PSD files.
The imageSignatures setting specifies which files should be treated as images based on their
hex file signature. By default, the list includes the signatures of BMP, JPG, JPEG, PNG, and
WMF files.
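The following Python sketch illustrates how a signature list in the plugin_settings.txt format can be turned into byte prefixes and compared with the first bytes of a file. The helper names are hypothetical, and checking the skip list before the image list is an assumption about ordering, not documented product behavior.

```python
from typing import List


def parse_signature_list(value: str) -> List[bytes]:
    """Convert '0x42,0x4d;0x89,0x50,...;' into a list of byte-string prefixes."""
    signatures = []
    for entry in value.split(";"):
        entry = entry.strip()
        if entry:
            # int(..., 16) accepts the '0x' prefix used in plugin_settings.txt
            signatures.append(bytes(int(b, 16) for b in entry.split(",")))
    return signatures


def classify(header: bytes, skip: List[bytes], images: List[bytes]) -> str:
    """Return 'skip', 'image', or 'extract' for the first bytes of a file."""
    if any(header.startswith(sig) for sig in skip):
        return "skip"
    if any(header.startswith(sig) for sig in images):
        return "image"
    return "extract"


SKIP = parse_signature_list("0x38,0x42,0x50,0x53;")  # PSD: b"8BPS"
IMAGES = parse_signature_list(
    "0x42,0x4d;0xff,0xd8,0xff,0xe0;0xff,0xd8,0xff,0xe1;0xff,0xd8,0xff,0xe8;"
    "0xff,0xd8,0xff,0xe2;0xff,0xd8,0xff,0xe3;"
    "0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a;0xd7,0xcd,0xc6,0x9a;"
)
print(classify(b"8BPS\x00\x01", SKIP, IMAGES))       # skip
print(classify(b"\x89PNG\r\n\x1a\n", SKIP, IMAGES))  # image
```

To classify a real file, read its first eight bytes (for example, open(path, "rb").read(8)), because the longest default signature (PNG) is eight bytes.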
Restart your detection server after editing the plugin_settings.txt file to apply your changes.
Chapter 45
Library of system data identifiers
This chapter includes the following topics:

■ Library of system data identifiers

■ ABA Routing Number

■ Argentina Tax Identification Number

■ Australia Driver's License Number

■ Australian Business Number

■ Australian Company Number

■ Australian Medicare Number

■ Australian Passport Number

■ Australian Tax File Number

■ Austria Passport Number

■ Austria Tax Identification Number

■ Austria Value Added Tax (VAT) Number

■ Austrian Social Security Number

■ Belgian National Number

■ Belgium Driver's Licence Number

■ Belgium Passport Number

■ Belgium Tax Identification Number

■ Belgium Value Added Tax (VAT) Number

■ Brazilian Election Identification Number

■ Brazilian National Registry of Legal Entities Number

■ Brazilian Natural Person Registry Number (CPF)

■ British Columbia Personal Healthcare Number

■ Bulgaria Value Added Tax (VAT) Number

■ Bulgarian Uniform Civil Number - EGN

■ Burgerservicenummer

■ Canada Driver's License Number

■ Canada Passport Number

■ Canada Permanent Residence (PR) Number

■ Canadian Social Insurance Number

■ Chilean National Identification Number

■ China Passport Number

■ Codice Fiscale

■ Colombian Addresses

■ Colombian Cell Phone Number

■ Colombian Personal Identification Number

■ Colombian Tax Identification Number

■ Credit Card Magnetic Stripe Data

■ Credit Card Number

■ Croatia National Identification Number

■ CUSIP Number

■ Cyprus Tax Identification Number

■ Cyprus Value Added Tax (VAT) Number

■ Czech Republic Driver's Licence Number

■ Czech Republic Personal Identification Number

■ Czech Republic Tax Identification Number

■ Czech Republic Value Added Tax (VAT) Number

■ Denmark Personal Identification Number

■ Denmark Tax Identification Number

■ Denmark Value Added Tax (VAT) Number

■ Driver's License Number – CA State

■ Driver's License Number - FL, MI, MN States

■ Driver's License Number - IL State

■ Driver's License Number - NJ State

■ Driver's License Number - NY State

■ Driver's License Number - WA State

■ Driver's License Number - WI State

■ Drug Enforcement Agency (DEA) Number

■ Estonia Driver's Licence Number

■ Estonia Passport Number

■ Estonia Personal Identification Code

■ Estonia Value Added Tax (VAT) Number

■ European Health Insurance Card Number

■ Finland Driver's Licence Number

■ Finland European Health Insurance Number

■ Finland Passport Number

■ Finland Tax Identification Number

■ Finland Value Added Tax (VAT) Number

■ Finnish Personal Identification Number

■ France Driver's License Number

■ France Health Insurance Number

■ France Tax Identification Number

■ France Value Added Tax (VAT) Number

■ French INSEE Code

■ French Passport Number

■ French Social Security Number

■ German Passport Number

■ German Personal ID Number

■ Germany Driver's License Number

■ Germany Value Added Tax (VAT) Number

■ Germany Tax Identification Number

■ Greece Passport Number

■ Greece Social Security Number (AMKA)

■ Greek Tax Identification Number

■ Greece Value Added Tax (VAT) Number

■ Healthcare Common Procedure Coding System (HCPCS CPT Code)

■ Health Insurance Claim Number

■ Hong Kong ID

■ Hungary Driver's Licence Number

■ Hungary Passport Number

■ Hungarian Social Security Number

■ Hungarian Tax Identification Number

■ Hungarian VAT Number

■ IBAN Central

■ IBAN East

■ IBAN West

■ Iceland National Identification Number

■ Iceland Passport Number

■ Iceland Value Added Tax (VAT) Number

■ Indian Aadhaar Card Number

■ Indian Permanent Account Number

■ India RuPay Card Number

■ Indonesian Identity Card Number

■ International Mobile Equipment Identity Number

■ International Securities Identification Number

■ IP Address

■ IPv6 Address

■ Ireland Passport Number

■ Ireland Tax Identification Number

■ Ireland Value Added Tax (VAT) Number

■ Irish Personal Public Service Number

■ Israel Personal Identification Number

■ Italy Driver's Licence Number

■ Italy Health Insurance Number

■ Italy Passport Number

■ Italy Value Added Tax (VAT) Number

■ Japan Driver's License Number

■ Japan Passport Number

■ Japanese Juki-Net Identification Number

■ Japanese My Number - Corporate

■ Japanese My Number - Personal

■ Kazakhstan Passport Number

■ Korea Passport Number

■ Korea Residence Registration Number for Foreigners

■ Korea Residence Registration Number for Korean

■ Latvia Driver's Licence Number

■ Latvia Passport Number

■ Latvia Personal Identification Number

■ Latvia Value Added Tax (VAT) Number

■ Liechtenstein Passport Number

■ Lithuania Personal Identification Number

■ Lithuania Tax Identification Number

■ Lithuania Value Added Tax (VAT) Number

■ Luxembourg National Register of Individuals Number

■ Luxembourg Passport Number

■ Luxembourg Tax Identification Number

■ Luxembourg Value Added Tax (VAT) Number

■ Macau National Identification Number

■ Malaysia Passport Number

■ Malaysian MyKad Number (MyKad)

■ Malta National Identification Number

■ Malta Tax Identification Number

■ Malta Value Added Tax (VAT) Number

■ Medicare Beneficiary Identifier

■ Mexican Personal Registration and Identification Number

■ Mexican Tax Identification Number

■ Mexican Unique Population Registry Code

■ Mexico CLABE Number

■ National Drug Code (NDC)

■ National Provider Identifier Number

■ Netherlands Bank Account Number

■ Netherlands Driver's License Number

■ Netherlands Passport Number

■ Netherlands Tax Identification Number

■ Netherlands Value Added Tax (VAT) Number

■ New Zealand Driver's Licence Number

■ New Zealand National Health Index Number

■ New Zealand Passport Number

■ Norway Driver's Licence Number

■ Norway National Identification Number

■ Norway Value Added Tax Number

■ Norwegian Birth Number

■ People's Republic of China ID

■ Poland Driver's Licence Number

■ Poland European Health Insurance Number

■ Poland Passport Number

■ Poland Value Added Tax (VAT) Number

■ Polish Identification Number

■ Polish REGON Number

■ Polish Social Security Number (PESEL)

■ Polish Tax Identification Number

■ Portugal Driver's Licence Number

■ Portugal National Identification Number

■ Portugal Passport Number

■ Portugal Tax Identification Number

■ Portugal Value Added Tax (VAT) Number

■ Randomized US Social Security Number (SSN)

■ Romania Driver's Licence Number

■ Romania National Identification Number

■ Romania Value Added Tax (VAT) Number

■ Romanian Numerical Personal Code

■ Russian Passport Identification Number

■ Russian Taxpayer Identification Number

■ SEPA Creditor Identifier Number North

■ SEPA Creditor Identifier Number South

■ SEPA Creditor Identifier Number West

■ Serbia Unique Master Citizen Number

■ Serbia Value Added Tax (VAT) Number

■ Singapore NRIC data identifier

■ Slovakia Driver's Licence Number

■ Slovakia National Identification Number

■ Slovakia Passport Number

■ Slovakia Value Added Tax (VAT) Number

■ Slovenia Passport Number

■ Slovenia Tax Identification Number

■ Slovenia Unique Master Citizen Number

■ Slovenia Value Added Tax (VAT) Number

■ South African Personal Identification Number

■ South Korea Resident Registration Number

■ Spain Value Added Tax (VAT) Number

■ Spain Driver's Licence Number

■ Spanish Customer Account Number

■ Spanish DNI ID

■ Spanish Passport Number

■ Spanish Social Security Number

■ Spanish Tax Identification (CIF)

■ Sri Lanka National Identity Number

■ Sweden Driver's Licence Number

■ Sweden Tax Identification Number

■ Sweden Value Added Tax (VAT) Number

■ Swedish Passport Number

■ Sweden Personal Identification Number

■ SWIFT Code

■ Swiss AHV Number

■ Swiss Social Security Number (AHV)

■ Switzerland Health Insurance Card Number

■ Switzerland Passport Number

■ Switzerland Value Added Tax (VAT) Number

■ Taiwan ROC ID

■ Thailand Passport Number

■ Thailand Personal Identification Number

■ Turkish Identification Number

■ UK Bank Account Number Sort Code

■ UK Drivers Licence Number

■ UK Electoral Roll Number

■ UK National Health Service (NHS) Number

■ UK National Insurance Number

■ UK Passport Number

■ UK Tax ID Number

■ UK Value Added Tax (VAT) Number

■ Ukraine Identity Card

■ Ukraine Passport (Domestic)

■ Ukraine Passport (International)

■ United Arab Emirates Personal Number

■ US Individual Tax Identification Number (ITIN)

■ US Passport Number

■ US Social Security Number (SSN)

■ US ZIP+4 Postal Codes

■ Venezuela National Identification Number

Library of system data identifiers

This section lists all data identifiers provided by the Data Loss Prevention system.

ABA Routing Number

The American Bankers Association (ABA) routing number, also known as a routing transit
number (RTN), identifies financial institutions and is used to process transactions.
The ABA Routing Number data identifier detects a nine-digit number that matches the ABA
Routing Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a nine-digit number with checksum validation.
See “ABA Routing Number wide breadth” on page 1013.
■ The medium breadth detects a nine-digit number with checksum validation, and eliminates
common test numbers.
See “ABA Routing Number medium breadth” on page 1014.
■ The narrow breadth detects a nine-digit number with checksum validation, eliminates
common test numbers, and requires the presence of related keywords.
See “ABA Routing Number narrow breadth” on page 1014.

ABA Routing Number wide breadth

The wide breadth detects a nine-digit number with checksum validation.

Table 45-1 ABA Routing Number wide-breadth patterns

Pattern

[0123678]\d{8}

[0123678]\d{3}-\d{4}-\d

Table 45-2 ABA Routing Number wide-breadth validators

Mandatory validator Description

ABA Checksum Every ABA routing number must start with two
digits in one of the ranges 00-15, 21-32, 61-72, or 80, and
must pass an ABA-specific, position-weighted checksum.
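As an illustration of the ABA Checksum validator, the following Python sketch applies the prefix ranges from Table 45-2 and the publicly documented 3-7-1 position weighting for routing numbers. The function name is hypothetical, and it is an assumption that the product's validator uses exactly this weighting.

```python
def aba_checksum_ok(routing: str) -> bool:
    """Sketch of the ABA prefix check plus the 3-7-1 weighted checksum."""
    digits = [int(c) for c in routing if c in "0123456789"]  # drops '-' separators
    if len(digits) != 9:
        return False
    prefix = digits[0] * 10 + digits[1]
    # Prefix ranges as listed in Table 45-2
    if not (prefix <= 15 or 21 <= prefix <= 32 or 61 <= prefix <= 72 or prefix == 80):
        return False
    weights = (3, 7, 1) * 3
    # Weighted sum must be a multiple of 10
    return sum(w * d for w, d in zip(weights, digits)) % 10 == 0


print(aba_checksum_ok("0110-0001-5"))  # True
print(aba_checksum_ok("011000016"))    # False
```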

ABA Routing Number medium breadth

The medium breadth detects a nine-digit number with checksum validation, and eliminates
common test numbers.

Table 45-3 ABA Routing Number medium-breadth patterns

Pattern

[0123678]\d{8}

[0123678]\d{3}-\d{4}-\d

Table 45-4 ABA Routing Number medium-breadth validators

Mandatory validator Description

ABA Checksum Computes the checksum and validates the pattern against
it.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

123456789

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.
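The following Python sketch shows how the medium-breadth patterns and the non-checksum validators could be layered (the checksum would then be applied to each surviving candidate). The digit lookarounds are only an assumed approximation of the Number delimiter validator, and medium_breadth_candidates is a hypothetical helper, not a product API.

```python
import re
from typing import List

# Patterns from Table 45-3. The (?<!\d) / (?!\d) lookarounds are an assumed
# approximation of the "Number delimiter" validator: a match must not be part
# of a longer run of digits.
ABA_PATTERNS = [
    re.compile(r"(?<!\d)[0123678]\d{8}(?!\d)"),
    re.compile(r"(?<!\d)[0123678]\d{3}-\d{4}-\d(?!\d)"),
]
EXCLUDED_BEGINNINGS = ("123456789",)


def medium_breadth_candidates(text: str) -> List[str]:
    """Return normalized candidates that survive the non-checksum validators."""
    found = []
    for pattern in ABA_PATTERNS:
        for match in pattern.finditer(text):
            digits = match.group().replace("-", "")
            if digits.startswith(EXCLUDED_BEGINNINGS):
                continue  # "Exclude beginning characters"
            if len(set(digits)) == 1:
                continue  # "Duplicate digits": every digit is the same
            found.append(digits)
    return found


sample = "Routing 011000015, test 123456789, zeros 000000000, id 90110000155, alt 1210-0035-8"
print(medium_breadth_candidates(sample))  # ['011000015', '121000358']
```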

ABA Routing Number narrow breadth

The narrow breadth detects a nine-digit number with checksum validation, eliminates common
test numbers, and requires the presence of related keywords.

Table 45-5 ABA Routing Number narrow-breadth patterns

Pattern

[0123678]\d{8}

[0123678]\d{3}-\d{4}-\d

Table 45-6 ABA Routing Number narrow-breadth validators

Mandatory validator Description

ABA Checksum Computes the checksum and validates the pattern against
it.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

123456789

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

aba, aba #, aba routing #, aba routing number, aba#, abarouting#,
abaroutingnumber, american bank association routing #,
american bank association routing number, americanbankassociationrouting#,
americanbankassociationroutingnumber, bank routing #, bank routing number,
bankrouting#, bankroutingnumber

Number delimiter Validates a match by checking the surrounding numbers.
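The Find keywords validator can be pictured with the following Python sketch, which checks the keyword list from Table 45-6 case-insensitively and avoids matching inside longer words (so "database" does not count as "aba"). This is an assumption for illustration: the product's exact keyword rules, such as any proximity window around the match, are not described here, and the helper names are hypothetical.

```python
import re
from typing import List

ABA_KEYWORDS: List[str] = [
    "aba", "aba #", "aba routing #", "aba routing number", "aba#", "abarouting#",
    "abaroutingnumber", "american bank association routing #",
    "american bank association routing number", "americanbankassociationrouting#",
    "americanbankassociationroutingnumber", "bank routing #", "bank routing number",
    "bankrouting#", "bankroutingnumber",
]


def keyword_regex(keyword: str) -> re.Pattern:
    """Case-insensitive pattern that does not fire inside a longer word."""
    left = r"(?<!\w)" if keyword[0].isalnum() else ""
    right = r"(?!\w)" if keyword[-1].isalnum() else ""
    return re.compile(left + re.escape(keyword) + right, re.IGNORECASE)


KEYWORD_PATTERNS = [keyword_regex(k) for k in ABA_KEYWORDS]


def has_required_keyword(text: str) -> bool:
    """True if at least one keyword or key phrase appears anywhere in text."""
    return any(p.search(text) for p in KEYWORD_PATTERNS)


print(has_required_keyword("ABA routing number: 011000015"))  # True
print(has_required_keyword("database row 011000015"))         # False
```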

Argentina Tax Identification Number

Argentina issues the DNI (Documento Nacional de Identidad) as its national form of identification.
The DNI is assigned at birth by the National Registry of Persons. For tax purposes, Argentina
issues CUIT and CUIL numbers, which are based on the DNI.
The Argentina Tax Identification Number data identifier detects an 11-digit number that matches
the Argentina Tax Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Argentina Tax Identification Number wide breadth” on page 1016.
■ The medium breadth detects an 11-digit number with checksum validation. It also checks
for common test numbers and duplicate digits.
See “Argentina Tax Identification Number medium breadth” on page 1016.
■ The narrow breadth detects an 11-digit number that passes checksum validation. It also
checks for common test numbers, duplicate digits, and requires the presence of related
keywords.
See “Argentina Tax Identification Number narrow breadth” on page 1017.

Argentina Tax Identification Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-7 Argentina Tax Identification Number wide-breadth patterns

Pattern

20-\d{8}-\d

23-\d{8}-\d

27-\d{8}-\d

30-\d{8}-\d

33-\d{8}-\d

34-\d{8}-\d

Table 45-8 Argentina Tax Identification Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Argentina Tax Identification Number medium breadth

The medium breadth detects an 11-digit number with checksum validation. It also checks for
common test numbers and duplicate digits.

Table 45-9 Argentina Tax Identification Number medium-breadth patterns

Pattern

20-\d{8}-\d

23-\d{8}-\d

27-\d{8}-\d

30-\d{8}-\d

33-\d{8}-\d

34-\d{8}-\d

Table 45-10 Argentina Tax Identification Number medium breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Argentinian Tax Identity Number Validation Check Computes the checksum and validates the pattern against
it.
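The following Python sketch shows the modulo-11 check digit that is commonly published for CUIT and CUIL numbers (weights 5, 4, 3, 2, 7, 6, 5, 4, 3, 2 over the first ten digits). Whether the Argentinian Tax Identity Number Validation Check uses exactly this scheme is an assumption, and the function names are hypothetical.

```python
import re
from typing import Optional

# Patterns from Table 45-9: prefix 20, 23, 27, 30, 33, or 34, eight digits, one check digit.
CUIT_PATTERN = re.compile(r"(20|23|27|30|33|34)-(\d{8})-(\d)")
CUIT_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def cuit_check_digit(first_ten: str) -> Optional[int]:
    """Modulo-11 check digit commonly published for CUIT/CUIL numbers."""
    remainder = sum(w * int(d) for w, d in zip(CUIT_WEIGHTS, first_ten)) % 11
    check = 11 - remainder
    if check == 11:
        return 0
    if check == 10:
        return None  # no single check digit exists for this prefix and body
    return check


def is_valid_cuit(value: str) -> bool:
    match = CUIT_PATTERN.fullmatch(value)
    if match is None:
        return False
    expected = cuit_check_digit(match.group(1) + match.group(2))
    return expected is not None and expected == int(match.group(3))


print(is_valid_cuit("20-12345678-6"))  # True
print(is_valid_cuit("20-12345678-5"))  # False
```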

Argentina Tax Identification Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also checks
for common test numbers, duplicate digits, and requires the presence of related keywords.

Table 45-11 Argentina Tax Identification Number narrow-breadth patterns

Pattern

20-\d{8}-\d

23-\d{8}-\d

27-\d{8}-\d

30-\d{8}-\d

33-\d{8}-\d

34-\d{8}-\d

Table 45-12 Argentina Tax Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Argentinian Tax Identity Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

Tax ID, tax number, Tax No., taxpayer ID, tax identity
number, tax identification no, tax identification number,
TaxID#, taxidnumber#, taxpayer number, Argentina
taxpayer ID

Número de Identificación Fiscal, número de contribuyente

Australia Driver's License Number

In Australia, a person must hold a driver's license before driving a motor vehicle of any
description on a road.
The Australia Driver's License Number data identifier detects an eight-, nine-, or 10-digit
number, or a six-character alphanumeric pattern, that matches the Australia Driver's License
Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-, nine-, or 10-digit number, or a six-character alphanumeric
pattern, that matches the Australia Driver's License Number format. It also checks for common
test numbers.
See “Australia Driver's License Number wide breadth” on page 1018.
■ The narrow breadth detects an eight-, nine-, or 10-digit number, or a six-character alphanumeric
pattern, that matches the Australia Driver's License Number format. It also checks for common
test numbers and requires the presence of related keywords.
See “Australia Driver's License Number narrow breadth” on page 1019.

Australia Driver's License Number wide breadth

The wide breadth detects an eight-, nine-, or 10-digit number, or a six-character alphanumeric
pattern, that matches the Australia Driver's License Number format. It also checks for common
test numbers.

Table 45-13 Australia Driver's License Number wide-breadth patterns

Pattern

\d\d\d \d\d\d \d\d\d

\d\d \d\d\d \d\d\d

[A-Za-z]\d\d\d\d\d

\d\d\d[-]\d\d\d[-]\d\d\d\d

Table 45-14 Australia Driver's License Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000, 11111, 22222, 33333, 44444, 55555, 66666,
77777, 88888, 99999
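The following Python sketch applies the wide-breadth patterns from Table 45-13 and the Exclude ending characters values from Table 45-14 to a single candidate string. Comparing the excluded endings against the digits only (ignoring spaces and hyphens) is an assumption, and wide_breadth_match is a hypothetical helper.

```python
import re

# Wide-breadth patterns from Table 45-13.
DL_PATTERNS = [
    re.compile(r"\d\d\d \d\d\d \d\d\d"),
    re.compile(r"\d\d \d\d\d \d\d\d"),
    re.compile(r"[A-Za-z]\d\d\d\d\d"),
    re.compile(r"\d\d\d[-]\d\d\d[-]\d\d\d\d"),
]
# "Exclude ending characters" values from Table 45-14: 00000 through 99999.
EXCLUDED_ENDINGS = tuple(str(d) * 5 for d in range(10))


def wide_breadth_match(candidate: str) -> bool:
    """True if candidate fits one pattern and does not end in five repeated digits."""
    if not any(p.fullmatch(candidate) for p in DL_PATTERNS):
        return False
    digits = re.sub(r"\D", "", candidate)  # assumption: separators are ignored
    return not digits.endswith(EXCLUDED_ENDINGS)


print(wide_breadth_match("123 456 789"))  # True
print(wide_breadth_match("123 411 111"))  # False: ends in 11111
```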

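Australia Driver's License Number narrow breadth

The narrow breadth detects an eight-, nine-, or 10-digit number, or a six-character alphanumeric
pattern, that matches the Australia Driver's License Number format. It also checks for common
test numbers and requires the presence of related keywords.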

Table 45-15 Australia Driver's License Number narrow-breadth patterns

Pattern

\d\d\d \d\d\d \d\d\d

\d\d \d\d\d \d\d\d

[A-Za-z]\d\d\d\d\d

\d\d\d[-]\d\d\d[-]\d\d\d\d

Table 45-16 Australia Driver's License Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000, 11111, 22222, 33333, 44444, 55555, 66666,
77777, 88888, 99999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, drivers license, driving license,
driver license number, drivers license number, driving license number,
dlno#, drivers lic., driver's license number, driver licence,
drivers licence, driving licence, driver permit, drivers permit,
driving permit, license number, licence number

Australian Business Number

The Australian Business Number, or ABN, is a unique identifier issued by the Australian
Business Register (ABR), operated by the Australian Taxation Office (ATO).
The Australian Business Number data identifier detects an 11-digit number that matches the
Australian Business Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Australian Business Number wide breadth” on page 1020.
■ The medium breadth detects an 11-digit number with checksum validation. It also eliminates
common test numbers and ranges reserved for future use.
See “Australian Business Number medium breadth” on page 1021.
■ The narrow breadth detects an 11-digit number that passes checksum validation. It also
eliminates common test numbers, ranges reserved for future use, duplicate digits, and
requires the presence of ABN-related keywords.
See “Australian Business Number narrow breadth” on page 1021.

Australian Business Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-17 Australian Business Number wide-breadth patterns

Pattern

\d{11}

\d{2}[ -]\d{3}[ -]\d{3}[ -]\d{3}

Table 45-18 Australian Business Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Australian Business Number medium breadth

The medium breadth detects an 11-digit number with checksum validation. It also eliminates
common test numbers, such as 123456789, and ranges reserved for future use.

Table 45-19 Australian Business Number medium-breadth patterns

Pattern

\d{11}

\d{2}[ -]\d{3}[ -]\d{3}[ -]\d{3}

Table 45-20 Australian Business Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Australian Business Number Validation Check Computes the checksum and validates the pattern against
it.
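For illustration, the following Python sketch implements the modulo-89 check that is publicly documented for ABNs: subtract 1 from the first digit, apply the weights 10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and require the weighted sum to be divisible by 89. That the Australian Business Number Validation Check matches this exactly is an assumption; the function name is hypothetical.

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def abn_checksum_ok(abn: str) -> bool:
    """Modulo-89 check publicly documented for Australian Business Numbers."""
    digits = [int(c) for c in abn if c in "0123456789"]  # drop spaces and hyphens
    if len(digits) != 11:
        return False
    digits[0] -= 1  # subtract 1 from the leading digit before weighting
    return sum(w * d for w, d in zip(ABN_WEIGHTS, digits)) % 89 == 0


print(abn_checksum_ok("51 824 753 556"))  # True
print(abn_checksum_ok("51 824 753 557"))  # False
```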

Australian Business Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also
eliminates common test numbers, such as 123456789, ranges reserved for future use, duplicate
digits, and requires the presence of ABN-related keywords.

Table 45-21 Australian Business Number narrow-breadth patterns

Pattern

\d{11}

\d{2}[ -]\d{3}[ -]\d{3}[ -]\d{3}

Table 45-22 Australian Business Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Australian Business Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

Australia Business No, Business No, BusinessNo#, Business Number,
Australia Business No., ABN, abn#, businessID#, business ID, abn,
ABN#, business number, businessno#

Australian Company Number

An Australian Company Number (ACN) is a unique nine-digit number issued by the Australian
Securities and Investments Commission to every company registered under the Commonwealth
Corporations Act 2001.
The Australian Company Number data identifier detects a nine-digit number that matches the
Australian Company Number format.
The Australian Company Number data identifier provides three breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Australian Company Number wide breadth” on page 1023.
■ The medium breadth detects a nine-digit number with checksum validation.
See “Australian Company Number medium breadth” on page 1023.
■ The narrow breadth detects a nine-digit number with checksum validation. It also requires
the presence of ACN-related keywords.
See “Australian Company Number narrow breadth” on page 1023.

Australian Company Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-23 Australian Company Number wide-breadth pattern

Pattern

\d{3} \d{3} \d{3}

Table 45-24 Australian Company Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Australian Company Number medium breadth

The medium breadth detects a nine-digit number with checksum validation.

Table 45-25 Australian Company Number medium-breadth pattern

Pattern

\d{3} \d{3} \d{3}

Table 45-26 Australian Company Number medium-breadth validators

Mandatory validator Description

Australian Company Number Validation Check Computes the checksum and validates the pattern against
it.
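The following Python sketch shows the commonly published ACN check: weight the first eight digits by 8 down to 1, take the sum modulo 10, and compare its complement (modulo 10) with the ninth digit. Treating this as the product's Australian Company Number Validation Check is an assumption, and acn_checksum_ok is a hypothetical name.

```python
def acn_checksum_ok(acn: str) -> bool:
    """Weights 8..1 on the first eight digits; complement mod 10 is the check digit."""
    digits = [int(c) for c in acn if c in "0123456789"]
    if len(digits) != 9:
        return False
    weighted = sum(w * d for w, d in zip(range(8, 0, -1), digits[:8]))
    return (10 - weighted % 10) % 10 == digits[8]


print(acn_checksum_ok("004 085 616"))  # True
print(acn_checksum_ok("004 085 617"))  # False
```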

Australian Company Number narrow breadth

The narrow breadth detects a nine-digit number with checksum validation. It also requires
the presence of ACN-related keywords.

Table 45-27 Australian Company Number narrow-breadth pattern

Pattern

\d{3} \d{3} \d{3}

Table 45-28 Australian Company Number narrow-breadth validators

Mandatory validator Description

Australian Company Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Australia Company Number, ACN, Australia Company No., ACN No,
ACN No#, Australia Company No#, ACN Number

Australian Medicare Number

The Australian Medicare Number is a personal identifier allocated by the Australian Health
Insurance Commission to eligible persons under the Medicare scheme. This number appears
on the Australian Medicare card.
The Australian Medicare Number data identifier detects a 10- or 11-digit number that
matches the Australian Medicare Number format.
The Australian Medicare Number data identifier provides three breadths of detection:
■ The wide breadth detects a 10- or 11-digit number without checksum validation.
See “Australian Medicare Number wide breadth” on page 1024.
■ The medium breadth detects a 10- or 11-digit number with checksum validation.
See “Australian Medicare Number medium breadth” on page 1025.
■ The narrow breadth detects a 10- or 11-digit number with checksum validation. It also
requires the presence of related keywords.
See “Australian Medicare Number narrow breadth” on page 1026.

Australian Medicare Number wide breadth

The wide breadth detects a 10- or 11-digit number without checksum validation.

Table 45-29 Australian Medicare Number wide-breadth patterns

Pattern

[2-6]\d{10}

[2-6]\d{9}

[2-6]\d{3} \d{5} \d{1}

[2-6]\d{3}-\d{5}-\d{1}

[2-6]\d{9}[ -/]\d{1}

[2-6]\d{3} \d{5} \d{1}[ -/]\d{1}

[2-6]\d{3}-\d{5}-\d{1}[ -/]\d{1}

[2-6]\d{3} \d{5} \d \d

[2-6]\d{3}-\d{5}-\d-\d

Table 45-30 Australian Medicare Number wide-breadth validator

Validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Australian Medicare Number medium breadth

The medium breadth detects a 10- or 11-digit number with checksum validation.

Table 45-31 Australian Medicare Number medium breadth patterns

Pattern

[2-6]\d{10}

[2-6]\d{9}

[2-6]\d{3} \d{5} \d{1}

[2-6]\d{3}-\d{5}-\d{1}

[2-6]\d{9}[ -/]\d{1}

[2-6]\d{3} \d{5} \d{1}[ -/]\d{1}

[2-6]\d{3}-\d{5}-\d{1}[ -/]\d{1}

[2-6]\d{3} \d{5} \d \d

[2-6]\d{3}-\d{5}-\d-\d

Table 45-32 Australian Medicare Number medium breadth validators

Validator Description

Australian Medicare Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.
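The following Python sketch shows the commonly published Medicare card check: the first digit is 2 to 6, the first eight digits are weighted 1, 3, 7, 9, 1, 3, 7, 9, and the sum modulo 10 must equal the ninth digit; the tenth digit (and optional eleventh) is not part of the check. That the Australian Medicare Number Validation Check follows this scheme is an assumption, and the function name is hypothetical.

```python
MEDICARE_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9)


def medicare_checksum_ok(number: str) -> bool:
    """Weighted check over the first eight digits, compared with the ninth."""
    digits = [int(c) for c in number if c in "0123456789"]  # drop ' ', '-', '/'
    if len(digits) not in (10, 11) or not 2 <= digits[0] <= 6:
        return False
    return sum(w * d for w, d in zip(MEDICARE_WEIGHTS, digits)) % 10 == digits[8]


print(medicare_checksum_ok("2123 45670 1"))    # True
print(medicare_checksum_ok("2123-45671-1/1"))  # False
```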

Australian Medicare Number narrow breadth

The narrow breadth detects a 10- or 11-digit number with checksum validation. It also
requires the presence of related keywords.

Table 45-33 Australian Medicare Number narrow breadth patterns

Pattern

[2-6]\d{10}

[2-6]\d{9}

[2-6]\d{3} \d{5} \d{1}

[2-6]\d{3}-\d{5}-\d{1}

[2-6]\d{9}[ -/]\d{1}

[2-6]\d{3} \d{5} \d{1}[ -/]\d{1}

[2-6]\d{3}-\d{5}-\d{1}[ -/]\d{1}

[2-6]\d{3} \d{5} \d \d

[2-6]\d{3}-\d{5}-\d-\d

Table 45-34 Australian Medicare Number narrow breadth validators

Validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Australian Medicare Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

Australian Medicare Number, Medicare Number, Medicare No.,
Medicare No#, Australian Medicare No., Australian Medicare No#

Australian Passport Number

Australian passports are travel documents issued to Australian citizens by the Australian
Passport Office of the Department of Foreign Affairs and Trade.
The Australian Passport Number data identifier detects an eight-character alphanumeric pattern
that matches the Australian Passport Number format.
The Australian Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern without checksum
validation.
See “Australian Passport Number wide breadth” on page 1027.
■ The narrow breadth detects an eight-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.
See “Australian Passport Number narrow breadth” on page 1028.

Australian Passport Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern without checksum validation.

Table 45-35 Australian Passport Number wide-breadth patterns

Pattern

[XBCEGTHJLMNP]\d{7}

[XBCEGTHJLMNP] \d{7}

Table 45-36 Australian Passport Number wide-breadth validator

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,
5555555, 6666666, 7777777, 8888888, 9999999

Australian Passport Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.

Table 45-37 Australian Passport Number narrow-breadth patterns

Pattern

[XBCEGTHJLMNP]\d{7}

[XBCEGTHJLMNP] \d{7}

Table 45-38 Australian Passport Number narrow-breadth validators

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,
5555555, 6666666, 7777777, 8888888, 9999999

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Australian passport no., Australian Passport Number,
Australian passport number, Passport number, passport number,
passport#, passportno, passportnumber#, australianpassportnumber,
passportno#

Australian Tax File Number

The Australian Tax File Number (TFN) is an eight- or nine-digit number issued by the Australian
Taxation Office (ATO) to taxpayers (individual, company, superannuation fund, partnership,
or trust) to identify their Australian tax dealings.
The Australian Tax File Number data identifier detects an eight- or nine-digit number that
matches the Australian Tax File Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-digit number with checksum validation.
See “Australian Tax File Number wide breadth” on page 1029.
■ The narrow breadth detects an eight- or nine-digit number with checksum validation. It also
requires the presence of related keywords.
See “Australian Tax File Number narrow breadth” on page 1029.

Australian Tax File Number wide breadth

The wide breadth detects an eight- or nine-digit number with checksum validation.

Table 45-39 Australian Tax File Number wide-breadth patterns

Patterns

\d{8}

\d{9}

Table 45-40 Australian Tax File Number wide-breadth validators

Mandatory validator Description

Australian Tax File validation check Computes the checksum and validates the pattern against
it.
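The guide does not spell out the checksum. A widely published rule for nine-digit TFNs multiplies the digits by the weights 1, 4, 3, 7, 5, 8, 6, 9, 10 and requires the sum to be divisible by 11. The Python sketch below assumes that rule. It is not taken from Symantec's validator, and it deliberately rejects eight-digit numbers because their weighting is not covered here. The function name `is_valid_tfn` is hypothetical.

```python
# Assumed weights for a nine-digit TFN (publicly documented modulus-11 scheme).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn(candidate: str) -> bool:
    """Hypothetical check: the weighted digit sum of a nine-digit TFN must be divisible by 11."""
    digits = [int(ch) for ch in candidate if ch in "0123456789"]
    if len(digits) != 9:
        return False  # eight-digit TFNs are out of scope for this sketch
    weighted_sum = sum(d * w for d, w in zip(digits, TFN_WEIGHTS))
    return weighted_sum % 11 == 0


print(is_valid_tfn("123 456 782"))  # True  (weighted sum 253 = 11 * 23)
print(is_valid_tfn("123456789"))    # False (weighted sum 323 leaves remainder 4)
```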

Australian Tax File Number narrow breadth

The narrow breadth detects an eight- or nine-digit number with checksum validation. It also
requires the presence of related keywords.

Table 45-41 Australian Tax File Number narrow-breadth patterns

Patterns

\d{8}

\d{9}

Table 45-42 Australian Tax File Number narrow-breadth validators

Mandatory validators Description

Australian Tax File validation check Computes the checksum and validates the pattern
against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

TFN, Tax File Number, Australia TFN, Australia

Tax File Number, ATO, ATO TFN, ATO tax file
number

Austria Passport Number

Austrian passports are travel documents issued to Austrian citizens, both in Austria and overseas,
that enable the passport holder to travel internationally.
The Austria Passport Number data identifier detects an eight-character alphanumeric pattern
that matches the Austria Passport Number format.
The Austria Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern without checksum
validation.
See “Austria Passport Number wide breadth” on page 1030.
■ The narrow breadth detects an eight-character alphanumeric pattern. It also requires the
presence of passport-related keywords.
See “Austria Passport Number narrow breadth” on page 1031.

Austria Passport Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern without checksum validation.

Table 45-43 Austria Passport Number wide-breadth patterns

Patterns

\l[ ]\d{7}

\l\d{7}

Table 45-44 Austria Passport Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
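The Austria patterns use \l, which this section does not define. The Python sketch below assumes that \l stands for a single letter and translates it to [A-Za-z]. It also models the Duplicate digits validator as rejecting a candidate whose digits are all identical. Both translations are assumptions for illustration, and the helper names are hypothetical.

```python
import re

# Assumed translation of the Table 45-43 patterns: \l -> [A-Za-z].
PATTERNS = [re.compile(r"[A-Za-z] \d{7}"), re.compile(r"[A-Za-z]\d{7}")]


def has_duplicate_digits(candidate: str) -> bool:
    """True when every digit in the candidate is the same, for example P0000000."""
    digits = {ch for ch in candidate if ch in "0123456789"}
    return len(digits) == 1


def austria_passport_wide(text: str) -> list[str]:
    """Hypothetical helper: pattern matches that pass the duplicate-digits check."""
    return [
        match.group()
        for pattern in PATTERNS
        for match in pattern.finditer(text)
        if not has_duplicate_digits(match.group())
    ]


print(austria_passport_wide("P1234567 and P0000000"))  # ['P1234567']
```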

Austria Passport Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern. It also requires the
presence of passport-related keywords.

Table 45-45 Austria Passport Number narrow-breadth patterns

Pattern

\l[ ]\d{7}

\l\d{7}

Table 45-46 Austria Passport Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

REISEPASS, passport, ÖSTERREICHISCH REISEPASS,

reisepass

Austria Tax Identification Number

Austria issues nine-digit tax identification numbers to individuals based on their area of residence
to identify taxpayers and administer national taxes.
The Austria Tax Identification Number data identifier detects a nine-digit number that matches
the Austria Tax Identification Number format.
The Austria Tax Identification Number data identifier provides two breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Austria Tax Identification Number wide breadth” on page 1032.
■ The narrow breadth detects a nine-digit number. It also requires the presence of related
keywords.
See “Austria Tax Identification Number narrow breadth” on page 1032.

Austria Tax Identification Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-47 Austria Tax Identification Number wide-breadth patterns

Pattern

\d{2}-\d{3}/\d{4}

\d{2} \d{3} \d{4}

\d{9}

Table 45-48 Austria Tax Identification Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.
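The Number delimiter validator is described only as checking the surrounding characters. The Python sketch below assumes one plausible reading: a match is rejected when the character immediately before or after it is a letter or digit. It combines that rule with the Duplicate digits check for the three patterns in Table 45-47. The delimiter rule and the helper names are hypothetical.

```python
import re

# Table 45-47 patterns as Python regular expressions.
TIN_PATTERNS = [
    re.compile(r"\d{2}-\d{3}/\d{4}"),
    re.compile(r"\d{2} \d{3} \d{4}"),
    re.compile(r"\d{9}"),
]


def is_delimited(text: str, start: int, end: int) -> bool:
    """Assumed rule: the characters just outside the match must not be letters or digits."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return not before.isalnum() and not after.isalnum()


def austria_tin_wide(text: str) -> list[str]:
    """Hypothetical helper applying the patterns, duplicate-digits, and delimiter checks."""
    results = []
    for pattern in TIN_PATTERNS:
        for match in pattern.finditer(text):
            digits = {ch for ch in match.group() if ch in "0123456789"}
            if len(digits) > 1 and is_delimited(text, match.start(), match.end()):
                results.append(match.group())
    return results


print(austria_tin_wide("TIN 12-345/6789."))  # ['12-345/6789']
print(austria_tin_wide("ID1234567890"))      # []
```

Under this reading, a nine-digit run embedded in a longer code such as ID1234567890 is rejected because a letter sits directly before it.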

Austria Tax Identification Number narrow breadth

The narrow breadth detects a nine-digit number. It also requires the presence of related
keywords.

Table 45-49 Austria Tax Identification Number narrow-breadth patterns

Patterns

\d{2}-\d{3}/\d{4}

\d{2} \d{3} \d{4}

\d{9}

Table 45-50 Austria Tax Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Austria, TIN, tax identification number, tax number,

Austrian Tax Number, Österreich, Steuernummer

Austria Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Austria, the VAT number
is issued by the tax office for the region in which the business is established.
The Austria Value Added Tax (VAT) Number data identifier detects an 11-character
alphanumeric pattern that matches the Austria Value Added Tax (VAT) Number format.
The Austria Value Added Tax (VAT) Number data identifier provides three breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern beginning with ATU without
checksum validation.
See “Austria Value Added Tax (VAT) Number wide breadth” on page 1033.
■ The medium breadth detects an 11-character alphanumeric pattern beginning with ATU with
checksum validation.
See “Austria Value Added Tax (VAT) Number medium breadth” on page 1034.
■ The narrow breadth detects an 11-character alphanumeric pattern beginning with ATU with
checksum validation. It also requires the presence of related keywords.
See “Austria Value Added Tax (VAT) Number narrow breadth” on page 1035.

Austria Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern beginning with ATU without
checksum validation.

Table 45-51 Austria Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Aa][Tt][Uu]\d{8}

[Aa][Tt] [Uu]\d{8}

[Aa][Tt][Uu] \d{8}

[Aa][Tt][Uu]\d{3} \d{4} \d

[Aa][Tt][Uu]\d{2} \d{4} \d{2}

Table 45-52 Austria Value Added Tax (VAT) Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Austria Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern beginning with ATU with
checksum validation.

Table 45-53 Austria Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Aa][Tt][Uu]\d{8}

[Aa][Tt] [Uu]\d{8}

[Aa][Tt][Uu] \d{8}

[Aa][Tt][Uu]\d{3} \d{4} \d

[Aa][Tt][Uu]\d{2} \d{4} \d{2}

Table 45-54 Austria Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Austria VAT Number Validation Check Computes the checksum and validates the pattern against
it.
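The validation check is not detailed here. Publicly available descriptions of the Austrian UID (VAT) check digit work on the eight digits after ATU. The 2nd, 4th, and 6th digits are doubled and replaced by the sum of their digits. The first seven values are then summed, and the eighth digit must equal (10 - (sum + 4) mod 10) mod 10. The Python sketch below assumes that rule and strips spaces so that it accepts every pattern in Table 45-53. It is not Symantec's code, and `austria_vat_ok` is a hypothetical name.

```python
def austria_vat_ok(candidate: str) -> bool:
    """Hypothetical check-digit test for an ATU number, assuming the public UID scheme."""
    compact = candidate.replace(" ", "").upper()
    body = compact[3:]
    if not compact.startswith("ATU") or len(body) != 8:
        return False
    if not all(ch in "0123456789" for ch in body):
        return False
    digits = [int(ch) for ch in body]
    total = 0
    for position, digit in enumerate(digits[:7]):
        if position % 2 == 1:  # 2nd, 4th and 6th digits are doubled
            doubled = digit * 2
            total += doubled // 10 + doubled % 10  # digit sum of the product
        else:
            total += digit
    expected = (10 - (total + 4) % 10) % 10
    return expected == digits[7]


print(austria_vat_ok("ATU135 8562 7"))  # True
print(austria_vat_ok("ATU13585628"))    # False
```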

Austria Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern beginning with ATU with
checksum validation. It also requires the presence of VAT-related keywords.

Table 45-55 Austria Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Aa][Tt][Uu]\d{8}

[Aa][Tt] [Uu]\d{8}

[Aa][Tt][Uu] \d{8}

[Aa][Tt][Uu]\d{3} \d{4} \d

[Aa][Tt][Uu]\d{2} \d{4} \d{2}

Table 45-56 Austria Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Austria VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

vat number, vat, vat#, austrian vat number, vat no.,

vatno#, value added tax number, austrian vat, MwSt,
Umsatzsteuernummer, MwStNummer,
Ust.-Identifikationsnummer, umsatzsteuer,
Umsatzsteuer-Identifikationsnummer, vat identification
number, atu number, uid number

Austrian Social Security Number

A 10-digit social security number is allocated to Austrian citizens who receive available social
security benefits. It is allocated by the umbrella association of the Austrian social security
authorities.
The Austrian Social Security Number data identifier detects a 10-digit number that matches
the Austrian Social Security Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Austrian Social Security Number wide breadth” on page 1036.
■ The medium breadth detects a 10-digit number that passes checksum validation. It also
eliminates common test numbers and ranges reserved for future use.
See “Austrian Social Security Number medium breadth” on page 1037.
■ The narrow breadth detects a 10-digit number that passes checksum validation. It also
eliminates common test numbers, ranges reserved for future use, and duplicate digits, and it
requires the presence of Austrian Social Security Number-related keywords.
See “Austrian Social Security Number narrow breadth” on page 1037.

Austrian Social Security Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-57 Austrian Social Security Number wide-breadth patterns

Pattern

\d{10}

\d{4}-\d{6}

\d{4} \d{6}

Table 45-58 Austrian Social Security Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Austrian Social Security Number medium breadth

The medium breadth detects a 10-digit number that passes checksum validation. It also
eliminates common test numbers, such as 123456789, and ranges reserved for future use.

Table 45-59 Austrian Social Security Number medium-breadth patterns

Pattern

\d{10}

\d{4}-\d{6}

\d{4} \d{6}

Table 45-60 Austrian Social Security Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Austrian Social Security Number Validation Check Computes the checksum and validates the pattern against
it.
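The publicly documented structure of the Austrian social security number is a three-digit serial number, a check digit in the fourth position, and a six-digit date of birth (DDMMYY). The check digit is the weighted sum of the other nine digits, with weights 3, 7, 9, 5, 8, 4, 2, 1, 6, taken modulo 11. A remainder of 10 means the serial number is not issued. The Python sketch below assumes that rule and accepts the hyphen and space separators in Table 45-59. It is not Symantec's implementation, it does not model the test-number or reserved-range exclusions, and `austrian_ssn_ok` is a hypothetical name.

```python
# Assumed weights; the fourth position holds the check digit, so its weight is 0.
SSN_WEIGHTS = (3, 7, 9, 0, 5, 8, 4, 2, 1, 6)


def austrian_ssn_ok(candidate: str) -> bool:
    """Hypothetical modulus-11 check for a 10-digit Austrian social security number."""
    digits = [int(ch) for ch in candidate if ch in "0123456789"]
    if len(digits) != 10:
        return False
    remainder = sum(d * w for d, w in zip(digits, SSN_WEIGHTS)) % 11
    # A remainder of 10 can never equal a single check digit, so such numbers fail here.
    return remainder == digits[3]


print(austrian_ssn_ok("1237 010180"))  # True (weighted sum 62, 62 mod 11 = 7)
print(austrian_ssn_ok("1238-010180"))  # False
```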

Austrian Social Security Number narrow breadth

The narrow breadth detects a 10-digit number that passes checksum validation. It also
eliminates common test numbers, ranges reserved for future use, and duplicate digits, and it
requires the presence of Austrian Social Security Number-related keywords.

Table 45-61 Austrian Social Security Number narrow-breadth patterns

Pattern

\d{10}

\d{4}-\d{6}

\d{4} \d{6}

Table 45-62 Austrian Social Security Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Austrian Social Security Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

social security no, social security number, social

security code, Austrian SSN, SSN#, ssn#, SSN, ssn,
socialsecurityno#,

sozialversicherungsnummer, soziale sicherheit kein,

sozialversicherungsnummer#, sozialesicherheitkein#

insurance number, insurance code, insurancecode#,

national insurance number, insurance no, health
insurance number, health insurance, health insurance
no, EHIC number, EHIC no

versicherungsnummer, versicherungscode, nationale

versicherungsnummer, krankenkassennummer,
krankenversicherung

zdravstveno zavarovanje

EHIC Nummer, Österreichischen SSN,

Österreichischen Sozialversicherungs kein

številka zavarovanja, biztosítási szám, zavarovalna

šifra, biztosítási kód, társadalombiztosítási azonosító
jel, nacionalna številka zavarovanja,
egészségbiztosítási szám, številka zdravstvenega
zavarovanja, egészségbiztosítás, EHIC szám, Številka
EHIC

Belgian National Number

All citizens of Belgium have a National Number. Belgians 12 years of age and older are issued
a Belgian identity card. The Belgian National Number is also used as the Belgian Social Security
Number for citizens.
The Belgian National Number data identifier detects an 11-digit number that matches the
Belgian National Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Belgian National Number wide breadth” on page 1040.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Belgian National Number medium breadth” on page 1040.
■ The narrow breadth detects an 11-digit number with checksum validation. It also requires
the presence of related keywords.
See “Belgian National Number narrow breadth” on page 1041.

Belgian National Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-63 Belgian National Number wide-breadth patterns

Pattern

\d{11}

\d{6} \d{3} \d{2}

\d{2}.\d{2}.\d{2}-\d{3}.\d{2}

\d{2}[ .][012345]\d[ .][0123]\d[ -.]\d{3}[ .-]\d{2}

Table 45-64 Belgian National Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
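If you reuse the fourth pattern in Table 45-63 with a standard regular expression engine such as Python's re module, note how the character class [ -.] is parsed. Because the hyphen sits between two characters, Python reads it as the range from space (0x20) to period (0x2E), which also includes characters such as + and #. The short Python sketch below shows the difference. It says nothing about how the Symantec Data Loss Prevention pattern engine interprets the same class.

```python
import re

# Fourth wide-breadth pattern from Table 45-63, copied as printed.
AS_PRINTED = re.compile(r"\d{2}[ .][012345]\d[ .][0123]\d[ -.]\d{3}[ .-]\d{2}")
# Same pattern with the hyphen moved to the start of the class, where it is literal.
LITERAL_HYPHEN = re.compile(r"\d{2}[ .][012345]\d[ .][0123]\d[- .]\d{3}[ .-]\d{2}")

sample = "85.07.30+033.28"
print(bool(AS_PRINTED.fullmatch(sample)))                 # True: '+' (0x2B) is in the range
print(bool(LITERAL_HYPHEN.fullmatch(sample)))             # False
print(bool(LITERAL_HYPHEN.fullmatch("85.07.30-033.28")))  # True
```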

Belgian National Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-65 Belgian National Number medium-breadth patterns

Pattern

\d{11}

\d{6} \d{3} \d{2}

\d{2}.\d{2}.\d{2}-\d{3}.\d{2}

\d{2}[ .][012345]\d[ .][0123]\d[ -.]\d{3}[ .-]\d{2}

Table 45-66 Belgian National Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Belgian National Number Validation Check Computes the checksum and validates the pattern against
it.
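The Belgian National Number is commonly described as YYMMDD, a three-digit sequence number, and a two-digit check value equal to 97 minus the first nine digits modulo 97. For people born in 2000 or later, the digit 2 is prefixed to those nine digits before the calculation. The Python sketch below assumes that public rule and ignores separators so that it accepts all four patterns in Table 45-65. It is an illustration rather than Symantec's validator, and `belgian_nn_ok` is a hypothetical name.

```python
def belgian_nn_ok(candidate: str) -> bool:
    """Hypothetical modulus-97 check for an 11-digit Belgian National Number."""
    digits = "".join(ch for ch in candidate if ch in "0123456789")
    if len(digits) != 11:
        return False
    first_nine, check = digits[:9], int(digits[9:])
    born_before_2000 = 97 - int(first_nine) % 97
    born_2000_or_later = 97 - int("2" + first_nine) % 97
    return check in (born_before_2000, born_2000_or_later)


print(belgian_nn_ok("85.07.30-033.28"))  # True
print(belgian_nn_ok("85073003329"))      # False
```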

Belgian National Number narrow breadth

The narrow breadth detects an 11-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-67 Belgian National Number narrow-breadth patterns

Pattern

\d{11}

\d{6} \d{3} \d{2}

\d{2}.\d{2}.\d{2}-\d{3}.\d{2}

\d{2}[ .][012345]\d[ .][0123]\d[ -.]\d{3}[ .-]\d{2}

Table 45-68 Belgian National Number narrow-breadth validators

Mandatory validator Description

Belgian National Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Belgian national number, national number, social

security number, nationalnumber#, ssn#, ssn,
nationalnumber, bnn#, bnn, personal ID number,
personalIDnumber#

Numéro national, numéro de sécurité, numéro

d'assuré, identifiant national, identifiantnational#,
Numéronational#

Belgium Driver's Licence Number

The Belgium Driver's Licence Number is the identification number on an individual's driver's
licence issued in Belgium.
The Belgium Driver's Licence Number data identifier detects a 10-digit number that matches
the Belgium Driver's Licence Number format.
The Belgium Driver's Licence Number data identifier provides two breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Belgium Driver's Licence Number wide breadth” on page 1042.
■ The narrow breadth detects a 10-digit number without checksum validation. It requires the
presence of related keywords.
See “Belgium Driver's Licence Number narrow breadth” on page 1043.

Belgium Driver's Licence Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-69 Belgium Driver's Licence Number wide-breadth pattern

Pattern

\d{10}

Table 45-70 Belgium Driver's Licence Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Belgium Driver's Licence Number narrow breadth

The narrow breadth detects a 10-digit number without checksum validation. It requires the
presence of related keywords.

Table 45-71 Belgium Driver's Licence Number narrow-breadth pattern

Pattern

\d{10}

Table 45-72 Belgium Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Führerschein, Fuhrerschein, Fuehrerschein,

Führerscheinnummer, Fuhrerscheinnummer,
Fuehrerscheinnummer, Führerscheinnummer,
Fuhrerscheinnummer, Fuehrerscheinnummer,
Führerschein- Nr, Fuhrerschein- Nr, Fuehrerschein-
Nr

DL#, Driver License, Driver License Number, driver

license number, Driver Licence, Drivers Lic., Drivers
License, Drivers Licence, Driver's License, Driver's
License Number, driver's license number, Driver's
Licence Number, Driving License number, driving
license number, DLNo#, dlno#

permis de conduire, rijbewijs, Rijbewijsnummer,

Numéro permis conduire

Belgium Passport Number

Belgian passports are issued by the Belgian state to its citizens to facilitate international travel.
The Federal Public Service Foreign Affairs, formerly known as the Ministry of Foreign Affairs,
is responsible for issuing and renewing Belgian passports.
The Belgium Passport Number data identifier detects an eight-character alphanumeric pattern
that matches the Belgium Passport Number format.
The Belgium Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern without checksum
validation.
See “Belgium Passport Number wide breadth” on page 1044.
■ The narrow breadth detects an eight-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.
See “Belgium Passport Number narrow breadth” on page 1044.

Belgium Passport Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern without checksum validation.

Table 45-73 Belgium Passport Number wide-breadth pattern

Pattern

\l{2}\d{6}

Table 45-74 Belgium Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Belgium Passport Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.

Table 45-75 Belgium Passport Number narrow-breadth patterns

Patterns

\l{2}\d{6}

Table 45-76 Belgium Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

passport number

Paspoort, paspoort, paspoortnummer, Reisepass kein,

Reisepass, Passnummer, Passeport, Passeport livre,
Passeport carte, numéro passeport

Belgian Passport Number, belgian passport number,

passport no

Belgium Tax Identification Number

Belgium issues a tax identification number to persons who have an obligation to declare taxes
in Belgium.
The Belgium Tax Identification Number data identifier detects an 11-digit number that matches
the Belgium Tax Identification Number format.
The Belgium Tax Identification Number data identifier provides two breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation. It also requires
the presence of related keywords.
See “Belgium Tax Identification Number wide breadth” on page 1045.
■ The narrow breadth detects an 11-digit number with checksum validation. It also requires
the presence of related keywords.
See “Belgium Tax Identification Number narrow breadth” on page 1046.

Belgium Tax Identification Number wide breadth

The wide breadth detects an 11-digit number without checksum validation. It also requires the
presence of related keywords.

Table 45-77 Belgium Tax Identification Number wide-breadth patterns

Patterns

\d{2}[01]\d[0123]\d{6}

\d{2}[01]\d[0123]\d \d{3} \d{2}

\d{2}.[01]\d.[0123]\d-\d{3}.\d{2}

\d{2}[ .][01]\d[ .][0123]\d[ -.]\d{3}[ .-]\d{2}

Table 45-78 Belgium Tax Identification Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

tax number, national registration number, National

Registration Number, tax registration number, tax id,
Tax ID, TAX Number

Numéro de registre national, numéro d'identification

fiscale, belasting aantal, Steuernummer, NIF, nif, NIF#,
nif#

Belgium Tax Identification Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.

Table 45-79 Belgium Tax Identification Number narrow-breadth patterns

Patterns

\d{2}[01]\d[0123]\d{6}

\d{2}[01]\d[0123]\d \d{3} \d{2}

\d{2}.[01]\d.[0123]\d-\d{3}.\d{2}

\d{2}[ .][01]\d[ .][0123]\d[ -.]\d{3}[ .-]\d{2}

Table 45-80 Belgium Tax Identification Number narrow-breadth validators

Mandatory validator Description

Belgian Tax Identification Number Validation Check Checksum validator for Belgium Tax Identification Number.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

tax number, national registration number, National

Registration Number, tax registration number, tax id,
Tax ID, TAX Number

Numéro de registre national, numéro d'identification

fiscale, belasting aantal, Steuernummer, NIF, nif, NIF#,
nif#

Belgium Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Belgium, the VAT number
is issued by the VAT office for the region in which the business is established.
The Belgium Value Added Tax (VAT) Number data identifier detects a 12-character alphanumeric pattern
that matches the Belgium Value Added Tax (VAT) Number format.
The Belgium Value Added Tax (VAT) Number data identifier provides three breadths of
detection:
■ The wide breadth detects a 12-character alphanumeric pattern beginning with BE without
checksum validation.
See “Belgium Value Added Tax (VAT) Number wide breadth” on page 1048.
■ The medium breadth detects a 12-character alphanumeric pattern beginning with BE with
checksum validation.
See “Belgium Value Added Tax (VAT) Number medium breadth” on page 1048.
■ The narrow breadth detects a 12-character alphanumeric pattern beginning with BE with
checksum validation. It also requires the presence of related keywords.
See “Belgium Value Added Tax (VAT) Number narrow breadth” on page 1049.

Belgium Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 12-character alphanumeric pattern beginning with BE without
checksum validation.

Table 45-81 Belgium Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Bb][Ee][0][123456789]\d{8}

[Bb][Ee][0][123456789].\d{4}.\d{4}

[Bb][Ee][0][123456789]-\d{4}-\d{4}

[Bb][Ee][0][123456789] \d{4} \d{4}

Table 45-82 Belgium Value Added Tax (VAT) Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Belgium Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 12-character alphanumeric pattern beginning with BE with
checksum validation.

Table 45-83 Belgium Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Bb][Ee][0][123456789]\d{8}

[Bb][Ee][0][123456789].\d{4}.\d{4}

[Bb][Ee][0][123456789]-\d{4}-\d{4}

[Bb][Ee][0][123456789] \d{4} \d{4}

Table 45-84 Belgium Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Belgium VAT Number Validation Check Checksum validator for the Belgian Value Added Tax (VAT)
Number.
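A commonly published rule for Belgian VAT numbers treats the ten digits after BE as an eight-digit base followed by a two-digit check value. The check value equals 97 minus the base modulo 97. The Python sketch below assumes that rule and removes the dot, hyphen, and space separators allowed by Table 45-83. It is not the product's validator, and `belgium_vat_ok` is a hypothetical name.

```python
def belgium_vat_ok(candidate: str) -> bool:
    """Hypothetical modulus-97 check for a BE VAT number such as BE0403170701."""
    compact = "".join(ch for ch in candidate if ch not in " .-").upper()
    if not compact.startswith("BE"):
        return False
    digits = compact[2:]
    if len(digits) != 10 or not all(ch in "0123456789" for ch in digits):
        return False
    base, check = int(digits[:8]), int(digits[8:])
    return 97 - base % 97 == check


print(belgium_vat_ok("BE04.0317.0701"))  # True
print(belgium_vat_ok("BE0403170702"))    # False
```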

Belgium Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 12-character alphanumeric pattern beginning with BE with
checksum validation. It also requires the presence of related keywords.

Table 45-85 Belgium Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Bb][Ee][0][123456789]\d{8}

[Bb][Ee][0][123456789].\d{4}.\d{4}

[Bb][Ee][0][123456789]-\d{4}-\d{4}

[Bb][Ee][0][123456789] \d{4} \d{4}

Table 45-86 Belgium Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Belgium VAT Number Validation Check Checksum validator for the Belgian Value Added Tax (VAT)
Number.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Numéro T.V.A., BTW number, Nº TVA, BTW NR, VAT

Number, vat no, vat number, Numéro T.V.A,
Umsatzsteuer-Identifikationsnummer,
Umsatzsteuernummer, BTW, BTW#, VAT#, vat#

Brazilian Election Identification Number

In Brazil, voting is compulsory for citizens between 18 and 70 years old. To vote, citizens
must be registered and should present an official identity document, usually the election
identification number card.
The Brazilian Election Identification Number data identifier detects a 9- to 14-digit number that
matches the Brazilian Election Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 9- to 14-digit number without checksum validation.
See “Brazilian Election Identification Number wide breadth” on page 1050.
■ The medium breadth detects a 9- to 14-digit number that passes checksum validation.
See “Brazilian Election Identification Number medium breadth” on page 1051.
■ The narrow breadth detects a 9- to 14-digit number that passes checksum validation, and
requires the presence of related keywords.
See “Brazilian Election Identification Number narrow breadth” on page 1052.

Brazilian Election Identification Number wide breadth

The wide breadth detects a 9- to 14-digit number without checksum validation.

Table 45-87 Brazilian Election Identification Number wide-breadth patterns

Patterns

\d{5}[0]\d{3}

\d{5}[12]\d\d{2}

\d{6}[0]\d{3}

\d{6}[0]\d[/]\d{2}

\d{6}[12]\d\d{2}

\d{6}[12]\d[/]\d{2}

\d{7}[0]\d{3}

\d{7}[0]\d[/]\d{2}

\d{7}[12]\d[/]\d{2}

\d{7}[12]\d\d{2}

\d{8}[0]\d{3}

\d{8}[0]\d[/]\d{2}

\d{8}[0]\d{3}[/]\d{2}

\d{8}[12]\d[/]\d{2}

\d{8}[12]\d\d{2}

\d{8}[12]\d\d{2}[/]\d{2}

Table 45-88 Brazilian Election Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Brazilian Election Identification Number medium breadth

The medium breadth detects a 9- to 14-digit number that passes checksum validation.

Table 45-89 Brazilian Election Identification Number medium-breadth patterns

Patterns

\d{5}[0]\d{3}

\d{5}[12]\d\d{2}

\d{6}[0]\d{3}

\d{6}[0]\d[/]\d{2}

\d{6}[12]\d\d{2}

\d{6}[12]\d[/]\d{2}

\d{7}[0]\d{3}

\d{7}[0]\d[/]\d{2}

\d{7}[12]\d[/]\d{2}

\d{7}[12]\d\d{2}

\d{8}[0]\d{3}

\d{8}[0]\d[/]\d{2}

\d{8}[0]\d{3}[/]\d{2}

\d{8}[12]\d[/]\d{2}

\d{8}[12]\d\d{2}

\d{8}[12]\d\d{2}[/]\d{2}

Table 45-90 Brazilian Election Identification Number medium-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Brazil Election Identification Number Validation Check Computes the checksum that every
Brazil Election Identification Number must pass.

Brazilian Election Identification Number narrow breadth

The narrow breadth detects a 9- to 14-digit number that passes checksum validation. It also
requires the presence of related keywords.

Table 45-91 Brazilian Election Identification Number narrow-breadth patterns

Patterns

\d{5}[0]\d{3}

\d{5}[12]\d\d{2}

\d{6}[0]\d{3}

\d{6}[0]\d[/]\d{2}

\d{6}[12]\d\d{2}

\d{6}[12]\d[/]\d{2}

\d{7}[0]\d{3}

\d{7}[0]\d[/]\d{2}

\d{7}[12]\d[/]\d{2}

\d{7}[12]\d\d{2}

\d{8}[0]\d{3}

\d{8}[0]\d[/]\d{2}

\d{8}[0]\d{3}[/]\d{2}

\d{8}[12]\d[/]\d{2}

\d{8}[12]\d\d{2}

\d{8}[12]\d\d{2}[/]\d{2}

Table 45-92 Brazilian Election Identification Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Brazil Election Identification Number Validation Check Computes the Brazil Election Identification Number checksum that
every Brazil Election Identification Number must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

election ID, identification number, electrol no., voter ID, electrol identification number,
Voter ID, electrol number, election voter ID, Electrol Number, Electrol No.,
Identification Number, Election Identification No.

número de identificação, identificação do eleitor, número de identificação eleitoral,
ID eleitor eleição, Número identificação eleitoral brasileira

Brazilian National Registry of Legal Entities Number

The Brazilian National Registry of Legal Entities (CNPJ) Number is a unique number that
identifies a legal entity, or another legal arrangement without legal personality, to the
Brazilian IRS (an agency of the Ministry of Finance).
The Brazilian National Registry of Legal Entities (CNPJ) Number data identifier detects a
14-digit number that matches the Brazilian National Registry of Legal Entities (CNPJ) Number
format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 14-digit number without checksum validation.
See “Brazilian National Registry of Legal Entities Number wide breadth” on page 1054.
■ The medium breadth detects a 14-digit number with checksum validation.
See “Brazilian National Registry of Legal Entities Number medium breadth” on page 1054.
■ The narrow breadth detects a 14-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Brazilian National Registry of Legal Entities Number narrow breadth” on page 1055.

Brazilian National Registry of Legal Entities Number wide breadth

The wide breadth detects a 14-digit number without checksum validation.

Table 45-93 Brazilian National Registry of Legal Entities Number wide-breadth patterns

Pattern

\d{14}

\d{8}[/]\d{6}

\d{8}[/]\d{4}-\d{2}

\d{2}.\d{3}.\d{3}[/]\d{4}-\d{2}

Table 45-94 Brazilian National Registry of Legal Entities Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Brazilian National Registry of Legal Entities Number medium breadth

The medium breadth detects a 14-digit number with checksum validation.

Table 45-95 Brazilian National Registry of Legal Entities Number medium-breadth patterns

Pattern

\d{14}

\d{8}[/]\d{6}

\d{8}[/]\d{4}-\d{2}

\d{2}.\d{3}.\d{3}[/]\d{4}-\d{2}

Table 45-96 Brazilian National Registry of Legal Entities Number medium-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Brazilian National Registry of Legal Entities Number Validation Check Computes the checksum and
validates the pattern against it.
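
The guide does not spell out the checksum that the Validation Check computes. As a hedged illustration, the following Python sketch implements the publicly documented CNPJ mod-11 check-digit scheme: two check digits, computed with the weights 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2 and then 6, 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2. It assumes, without confirmation from this guide, that the product's check is equivalent. The function names are hypothetical.

```python
def _cnpj_digit(digits: list[int], weights: list[int]) -> int:
    """Mod-11 check digit: 0 if the remainder is below 2, else 11 minus the remainder."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(candidate: str) -> bool:
    """Drop separators such as '.', '/', and '-', then verify both CNPJ check digits."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) != 14:
        return False
    first = _cnpj_digit(digits[:12], [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    second = _cnpj_digit(digits[:12] + [first], [6, 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    return digits[12:] == [first, second]
```

For example, 11.222.333/0001-81 passes both checks. An all-zero value such as 00000000000000 also satisfies the arithmetic, which shows why a separate duplicate-digits check can be useful.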

Brazilian National Registry of Legal Entities Number narrow breadth

The narrow breadth detects a 14-digit number that passes checksum validation. It also requires
the presence of related keywords.

Table 45-97 Brazilian National Registry of Legal Entities Number narrow-breadth patterns

Pattern

\d{14}

\d{8}[/]\d{6}

\d{8}[/]\d{4}-\d{2}

\d{2}.\d{3}.\d{3}[/]\d{4}-\d{2}

Table 45-98 Brazilian National Registry of Legal Entities Number narrow-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Brazilian National Registry of Legal Entities Number Validation Check Computes the checksum and
validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Brazil legal entities number, legalnumber#, legal ID, legal no., Brazilianlegalno#,
legalnumber#, legal no., legal entities number, CNPJ, CNPJ:, CNPJ#, cnpj#, cnpj CNPJ n º,
Registro Nacional de Pessoas Jurídicas n º, entidades jurídicas ID

Brazilian Natural Person Registry Number (CPF)

The Cadastro de Pessoas Físicas (CPF, "Natural Person Register") is a number assigned by
the Brazilian Federal Revenue to both Brazilians and resident aliens who pay taxes or take
part, directly or indirectly, in activities that provide revenue for any of the dozens of different
types of taxes existing in Brazil.
The Brazilian Natural Person Registry Number (CPF) data identifier detects an 11-digit number
that matches the Brazilian Natural Person Registry Number (CPF) format.

This data identifier provides the following breadths of detection:

■ The wide breadth detects an 11-digit number without checksum validation.
See “Brazilian Natural Person Registry Number wide breadth” on page 1056.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Brazilian Natural Person Registry Number medium breadth” on page 1056.
■ The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Brazilian Natural Person Registry Number narrow breadth” on page 1057.

Brazilian Natural Person Registry Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-99 Brazilian Natural Person Registry Number wide-breadth patterns

Pattern

\d{11}

\d{9}[-]\d{2}

\d{3}[.]\d{3}[.]\d{3}[-]\d{2}

Table 45-100 Brazilian Natural Person Registry Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Brazilian Natural Person Registry Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-101 Brazilian Natural Person Registry Number medium-breadth patterns

Pattern

\d{11}

\d{9}[-]\d{2}

\d{3}[.]\d{3}[.]\d{3}[-]\d{2}

Table 45-102 Brazilian Natural Person Registry Number medium-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Brazilian Natural Person Registry Number Validation Check Computes the Brazilian Natural Person
Registry Number checksum that every Brazilian Natural Person Registry Number must pass.
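
The checksum behind this validator is not described in the guide. The following Python sketch implements the publicly documented CPF scheme (two mod-11 check digits, with weights running from 10 down to 2 and then from 11 down to 2), on the assumption that the product's check is equivalent. The function names are hypothetical.

```python
def _cpf_digit(digits: list[int]) -> int:
    """Weights run from len(digits) + 1 down to 2; 0 if the remainder is below 2, else 11 - remainder."""
    weights = range(len(digits) + 1, 1, -1)
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(candidate: str) -> bool:
    """Drop '.' and '-' separators, then verify both CPF check digits."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) != 11:
        return False
    first = _cpf_digit(digits[:9])
    second = _cpf_digit(digits[:9] + [first])
    return digits[9:] == [first, second]
```

For example, 111.444.777-35 passes. So does 111.111.111-11, which is why the Duplicate digits validator in the wide and narrow breadths matters for CPF values.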

Brazilian Natural Person Registry Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.

Table 45-103 Brazilian Natural Person Registry Number narrow-breadth patterns

Pattern

\d{11}

\d{9}[-]\d{2}

\d{3}[.]\d{3}[.]\d{3}[-]\d{2}

Table 45-104 Brazilian Natural Person Registry Number narrow-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Brazilian Natural Person Registry Number Validation Check Computes the Brazilian Natural Person
Registry Number checksum that every Brazilian Natural Person Registry Number must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

registry of individuals, CPF#, cpf no, CPF no, Registration number, natural persons registry no,
cpf no, natural persons record no, cpfno#, CPFno#

Cadastro de Pessoas Físicas, pessoas singulares registro NO pessoa natural número de registro

British Columbia Personal Healthcare Number

British Columbia (BC) residents are required by law to enroll in the Medical Services Plan (MSP)
to access basic medical care facilities.
The MSP membership card is called a Care Card and the MSP number is called a Personal
Healthcare Number.
The British Columbia Personal Healthcare Number data identifier detects a 10-digit number
that matches the format of the British Columbia Personal Healthcare Number.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “British Columbia Personal Healthcare Number wide breadth” on page 1058.
■ The medium breadth detects a 10-digit number that passes checksum validation.
See “British Columbia Personal Healthcare Number medium breadth” on page 1058.
■ The narrow breadth detects a 10-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “British Columbia Personal Healthcare Number narrow breadth” on page 1059.

British Columbia Personal Healthcare Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-105 British Columbia Personal Healthcare Number wide-breadth patterns

Pattern

[9]\d{9}

[9]\d{3} \d{3} \d{3}

Table 45-106 British Columbia Personal Healthcare Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

British Columbia Personal Healthcare Number medium breadth

The medium breadth detects a 10-digit number that passes checksum validation.

Table 45-107 British Columbia Personal Healthcare Number medium-breadth patterns

Pattern

[9]\d{9}

[9]\d{3} \d{3} \d{3}

Table 45-108 British Columbia Personal Healthcare Number medium-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

British Columbia Personal Healthcare Number Validation Check Computes the British Columbia
Personal Healthcare Number checksum that every British Columbia Personal Healthcare Number
must pass.
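
The guide does not describe this checksum. The following Python sketch assumes the mod-11 scheme commonly described for BC PHNs: digits 2 to 9 are multiplied by 2, 4, 8, 5, 10, 9, 7, 3, the products are summed, and the check digit is 11 minus the sum modulo 11. Treat both the weights and the rule that rejects a computed value of 10 or 11 as assumptions rather than the product's documented behavior. The function name is hypothetical.

```python
# Assumed weights for digits 2 to 9 (these are the powers of 2 modulo 11).
PHN_WEIGHTS = [2, 4, 8, 5, 10, 9, 7, 3]


def is_valid_bc_phn(candidate: str) -> bool:
    """Drop spaces, require a leading 9, and compare the last digit with the assumed check digit."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) != 10 or digits[0] != 9:
        return False
    total = sum(d * w for d, w in zip(digits[1:9], PHN_WEIGHTS))
    check = 11 - (total % 11)
    # A computed value of 10 or 11 is not a single digit; treat it as invalid (assumption).
    if check >= 10:
        return False
    return digits[9] == check
```

Under these assumptions, 9876543218 (or 9876 543 218, matching the second pattern) passes, and 9876543217 does not.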

British Columbia Personal Healthcare Number narrow breadth

The narrow breadth detects a 10-digit number that passes checksum validation. It also requires
the presence of related keywords.

Table 45-109 British Columbia Personal Healthcare Number narrow-breadth patterns

Pattern

[9]\d{9}

[9]\d{3} \d{3} \d{3}

Table 45-110 British Columbia Personal Healthcare Number narrow-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

British Columbia Personal Healthcare Number Validation Check Computes the British Columbia
Personal Healthcare Number checksum that every British Columbia Personal Healthcare Number
must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

MSP Number, msp number, MSP no, personal healthcare number, healthcare no, Healthcare No,
PHN, phn, phn#, msp#, mspno#, PHN#, healthcare number

MSP nombre, soins de santé no, soins de santé personnels nombre, MSPNombre#, soinsdesanténo#

Bulgaria Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Bulgaria, VAT is
administered by the National Revenue Agency, which is overseen by the Bulgarian Ministry
of Finance.
The Bulgaria Value Added Tax (VAT) Number data identifier detects the letters BG (in either
case, optionally followed by a space) and a 9- or 10-digit number that matches the Bulgaria VAT
Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects the letters BG followed by a 9- or 10-digit number, without
checksum validation. It checks for common test numbers.
See “Bulgaria Value Added Tax (VAT) Number wide breadth” on page 1061.
■ The medium breadth detects the letters BG followed by a 9- or 10-digit number, with
checksum validation.
See “Bulgaria Value Added Tax (VAT) Number medium breadth” on page 1061.
■ The narrow breadth detects the letters BG followed by a 9- or 10-digit number, with
checksum validation. It also requires the presence of related keywords and checks for
common test numbers.
See “Bulgaria Value Added Tax (VAT) Number narrow breadth” on page 1062.

Bulgaria Value Added Tax (VAT) Number wide breadth

The wide breadth detects the letters BG followed by a 9- or 10-digit number, without checksum
validation. It checks for common test numbers.

Table 45-111 Bulgaria Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[bB][gG]\d{9}

[bB][gG] \d{9}

[bB][gG]\d{10}

[bB][gG] \d{10}

Table 45-112 Bulgaria Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,

444444444, 555555555, 666666666, 777777777,
888888888, 999999999, 0000000000, 1111111111,
2222222222, 3333333333, 4444444444, 5555555555,
6666666666, 7777777777, 8888888888, 9999999999
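
The following Python sketch combines the wide-breadth patterns with the Exclude ending characters list for one already-isolated token. It assumes that a pattern must match the whole token and does not model the Number delimiter validator, whose exact rules the guide does not give. The function and constant names are hypothetical.

```python
import re

# Wide-breadth patterns copied verbatim from Table 45-111.
VAT_PATTERNS = [
    re.compile(p)
    for p in (r"[bB][gG]\d{9}", r"[bB][gG] \d{9}", r"[bB][gG]\d{10}", r"[bB][gG] \d{10}")
]

# Exclude ending characters: each digit repeated 9 or 10 times.
EXCLUDED_ENDINGS = tuple(str(d) * n for n in (9, 10) for d in range(10))


def wide_breadth_match(token: str) -> bool:
    """True if the token matches a wide-breadth pattern and does not end in an excluded value."""
    if not any(p.fullmatch(token) for p in VAT_PATTERNS):
        return False
    # str.endswith accepts a tuple and returns True if any of the suffixes matches.
    return not token.endswith(EXCLUDED_ENDINGS)
```

BG123456789 and bg 1234567890 match. BG999999999 is rejected, and so is BG1999999999, because the validator tests the ending of the data and its last nine digits are all 9.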

Bulgaria Value Added Tax (VAT) Number medium breadth

The medium breadth detects the letters BG followed by a 9- or 10-digit number, with checksum
validation.

Table 45-113 Bulgaria Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[bB][gG]\d{9}

[bB][gG] \d{9}

[bB][gG]\d{10}

[bB][gG] \d{10}

Table 45-114 Bulgaria Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Bulgaria Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the
pattern against it.

Bulgaria Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects the letters BG followed by a 9- or 10-digit number, with checksum
validation. It also requires the presence of related keywords and checks for common test
numbers.

Table 45-115 Bulgaria Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[bB][gG]\d{9}

[bB][gG] \d{9}

[bB][gG]\d{10}

[bB][gG] \d{10}

Table 45-116 Bulgaria Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,

444444444, 555555555, 666666666, 777777777,
888888888, 999999999, 0000000000, 1111111111,
2222222222, 3333333333, 4444444444, 5555555555,
6666666666, 7777777777, 8888888888, 9999999999

Bulgaria Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the
pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat number, vat, VAT, vat#, VAT#, vat no., vatno#, value added tax number, vatin, VATIN,
value added tax, vat no

номер на таксата, ДДС, ДДС#, ДДС номер., ДДС номер.#, номер на данъка върху добавената
стойност, данък върху добавената стойност, ДДС номер

Bulgarian Uniform Civil Number - EGN

The uniform civil number (EGN) is a unique number assigned to each Bulgarian citizen or resident
foreign national. It serves as a national identification number. An EGN is assigned to Bulgarians
at birth, or when a birth certificate is issued.
The Bulgarian Uniform Civil Number - EGN data identifier detects a 10-digit number that
matches the Bulgarian Uniform Civil Number - EGN format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Bulgarian Uniform Civil Number - EGN wide breadth” on page 1063.
■ The medium breadth detects a 10-digit number that passes checksum validation.
See “Bulgarian Uniform Civil Number - EGN medium breadth” on page 1064.
■ The narrow breadth detects a 10-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Bulgarian Uniform Civil Number - EGN narrow breadth” on page 1065.

Bulgarian Uniform Civil Number - EGN wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-117 Bulgarian Uniform Civil Number - EGN wide-breadth pattern

Pattern

\d\d[024][123456789]0[123456789]\d{4}

\d\d[135][012]0[123456789]\d{4}

\d\d[024][123456789][12]\d{5}

\d\d[135][012][12]\d{5}

\d\d[024][123456789]3[01]\d{4}

\d\d[135][012]3[01]\d{4}

Table 45-118 Bulgarian Uniform Civil Number - EGN wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Bulgarian Uniform Civil Number - EGN medium breadth

The medium breadth detects a 10-digit number that passes checksum validation.

Table 45-119 Bulgarian Uniform Civil Number - EGN medium-breadth pattern

Pattern

\d\d[024][123456789]0[123456789]\d{4}

\d\d[135][012]0[123456789]\d{4}

\d\d[024][123456789][12]\d{5}

\d\d[135][012][12]\d{5}

\d\d[024][123456789]3[01]\d{4}

\d\d[135][012]3[01]\d{4}

Table 45-120 Bulgarian Uniform Civil Number - EGN medium-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Bulgarian Uniform Civil Number Validation Check Computes the checksum and validates the pattern against
it.
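
The guide does not describe the checksum behind the Bulgarian Uniform Civil Number Validation Check. The following Python sketch implements the publicly documented EGN check digit (weights 2, 4, 8, 5, 10, 9, 7, 3, 6 over the first nine digits, remainder modulo 11, with a remainder of 10 mapped to 0), on the assumption that the product's check is equivalent. The function name is hypothetical.

```python
# Weights for the first nine digits (the powers of 2 modulo 11).
EGN_WEIGHTS = [2, 4, 8, 5, 10, 9, 7, 3, 6]


def is_valid_egn(candidate: str) -> bool:
    """True if the 10-digit candidate's last digit equals the EGN check digit."""
    if len(candidate) != 10 or not all("0" <= ch <= "9" for ch in candidate):
        return False
    digits = [int(ch) for ch in candidate]
    remainder = sum(d * w for d, w in zip(digits, EGN_WEIGHTS)) % 11
    expected = 0 if remainder == 10 else remainder
    return digits[9] == expected
```

For example, 7523169263 passes. In the patterns above, the third digit ranges from 0 to 5 because the EGN stores the birth month with 20 added for births in the 1800s and 40 added for births from 2000 onward.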

Bulgarian Uniform Civil Number - EGN narrow breadth

The narrow breadth detects a 10-digit number that passes checksum validation. It also requires
the presence of related keywords.

Table 45-121 Bulgarian Uniform Civil Number - EGN narrow-breadth pattern

Pattern

\d\d[024][123456789]0[123456789]\d{4}

\d\d[135][012]0[123456789]\d{4}

\d\d[024][123456789][12]\d{5}

\d\d[135][012][12]\d{5}

\d\d[024][123456789]3[01]\d{4}

\d\d[135][012]3[01]\d{4}

Table 45-122 Bulgarian Uniform Civil Number - EGN narrow-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Bulgarian Uniform Civil Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

BUCN, uniform civil number, uniform civil ID, uniform civil no, EGN, Bulgarian uniform civil
number, uniformcivilno#, BUCN#, EGN#, bucn, egn#, bucn#, uniformcivilnumber#, personal number,
personal no, identification number, personal id, national id

Униформ граждански номер, Униформ ID, Униформ граждански ID, Униформ граждански не.,
български Униформ граждански номер, УниформгражданскиID#, Униформгражданскине.#, личен номер,
лично не, идентификационен номер, лична идентификация, национален номер

Burgerservicenummer
In the Netherlands, the Burgerservicenummer is used to uniquely identify citizens and is printed
on driving licenses, passports and international ID cards under the header Personal Number.
The Burgerservicenummer data identifier detects an eight- or nine-digit number that matches
the Burgerservicenummer format and passes checksum validation.
The Burgerservicenummer data identifier provides two breadths of detection:
■ The wide breadth detects an eight- or nine-digit number that passes checksum validation.
See “Burgerservicenummer wide breadth” on page 1066.
■ The narrow breadth detects an eight- or nine-digit number that passes checksum validation.
It also requires the presence of related keywords.
See “Burgerservicenummer narrow breadth” on page 1066.

Burgerservicenummer wide breadth

The wide breadth detects an eight- or nine-digit number that passes checksum validation.

Table 45-123 Burgerservicenummer wide-breadth pattern

Pattern

\d{9}

Table 45-124 Burgerservicenummer wide-breadth validator

Mandatory validator Description

Burgerservicenummer Check Computes the checksum and validates the pattern against
it.
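
The Burgerservicenummer Check is not spelled out in the guide. The Dutch BSN is usually validated with a variant of the 11-test (elfproef): the first eight digits are weighted 9 down to 2, the last digit is weighted -1, and the weighted sum must be divisible by 11. The following Python sketch implements that test, assuming it matches the product's check. Because the description mentions eight-digit numbers while the pattern is \d{9}, the sketch also accepts eight digits by padding them with a leading zero; that padding is an assumption, and the function name is hypothetical.

```python
# Weights for the BSN 11-test; the last digit is weighted -1.
BSN_WEIGHTS = [9, 8, 7, 6, 5, 4, 3, 2, -1]


def is_valid_bsn(candidate: str) -> bool:
    """True if the 8- or 9-digit candidate passes the BSN 11-test."""
    if len(candidate) not in (8, 9) or not all("0" <= ch <= "9" for ch in candidate):
        return False
    digits = [int(ch) for ch in candidate.zfill(9)]  # pad 8-digit numbers with a leading zero
    total = sum(d * w for d, w in zip(digits, BSN_WEIGHTS))
    return total % 11 == 0
```

For example, 111222333 and 123456782 pass, and 123456789 does not.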

Burgerservicenummer narrow breadth

The narrow breadth detects an eight- or nine-digit number that passes checksum validation.
It also requires the presence of related keywords.

Table 45-125 Burgerservicenummer narrow-breadth pattern

Pattern

\d{9}

Table 45-126 Burgerservicenummer narrow-breadth validators

Mandatory validator Description

Burgerservicenummer Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Persoonsnummer, sofinummer, sociaal-fiscaal nummer, persoonsgebonden, person number,
social-fiscal number, person-related number

Canada Driver's License Number

In Canada, driver's licenses are issued by the government of the province or territory in which
the driver resides. Specific regulations relating to driver's licenses vary from province to
province, although they are broadly similar.
The Canada Driver's License Number data identifier detects a 9-, 10-, 12-, 13-, 14-, or
15-character alphanumeric pattern that matches the Canada Driver's License Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 9-, 10-, 12-, 13-, 14-, or 15-character alphanumeric pattern
that matches the Canada Driver's License Number format without checksum validation. It
checks for common test numbers.
See “Canada Driver's License Number wide breadth” on page 1067.
■ The medium breadth detects a 9-, 10-, 12-, 13-, 14-, or 15-character alphanumeric pattern
that matches the Canada Driver's License Number format with checksum validation.
See “Canada Driver's License Number medium breadth” on page 1068.
■ The narrow breadth detects a 9-, 10-, 12-, 13-, 14-, or 15-character alphanumeric pattern
that matches the Canada Driver's License Number format with checksum validation. It also
requires the presence of related keywords and checks for common test numbers.
See “Canada Driver's License Number narrow breadth” on page 1069.

Canada Driver's License Number wide breadth

The wide breadth detects a 9-, 10-, 12-, 13-, 14-, or 15-character alphanumeric pattern that
matches the Canada Driver's License Number format without checksum validation. It checks
for common test numbers.

Table 45-127 Canada Driver's License Number wide-breadth patterns

Pattern

\d\d\d\d\d\d-\d\d\d

[Dd]\d\d\d\d\d\d\d\d\d

[A-Za-z]{2}-[A-Za-z]{2}-[A-Za-z]{2}-[A-Za-z]\d\d\d[A-Za-z]{2}

[A-Za-z]\d\d\d\d-\d\d\d\d\d-\d\d\d\d\d

[A-Za-z]{5}\d\d\d\d\d\d\d\d\d

[A-Za-z]\d\d\d\d-\d\d\d\d\d\d-\d\d

Table 45-128 Canada Driver's License Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000000000, 11111111111111, 22222222222222,

33333333333333, 44444444444444, 55555555555555,
66666666666666, 77777777777777, 88888888888888,
99999999999999, 000000000000, 111111111111,
222222222222, 333333333333, 444444444444,
555555555555, 666666666666, 777777777777,
888888888888, 999999999999, 000000000, 111111111,
222222222, 333333333, 444444444, 555555555,
666666666, 777777777, 888888888, 999999999

Canada Driver's License Number medium breadth

The medium breadth detects a 9-, 10-, 12-, 13-, 14-, or 15-character alphanumeric pattern
that matches the Canada Driver's License Number format with checksum validation.

Table 45-129 Canada Driver's License Number medium-breadth patterns

Pattern

\d\d\d\d\d\d-\d\d\d

[Dd]\d\d\d\d\d\d\d\d\d

[A-Za-z]{2}-[A-Za-z]{2}-[A-Za-z]{2}-[A-Za-z]\d\d\d[A-Za-z]{2}

[A-Za-z]\d\d\d\d-\d\d\d\d\d-\d\d\d\d\d

[A-Za-z]{5}\d\d\d\d\d\d\d\d\d

[A-Za-z]\d\d\d\d-\d\d\d\d\d\d-\d\d

Table 45-130 Canada Driver's License Number medium-breadth validators

Mandatory validator Description

Canada Driver's License Number Check Computes the checksum and validates the pattern against
it.

Canada Driver's License Number narrow breadth

The narrow breadth detects a 9-, 10-, 12-, 13-, 14-, or 15-character alphanumeric pattern that
matches the Canada Driver's License Number format with checksum validation. It also requires
the presence of related keywords and checks for common test numbers.

Table 45-131 Canada Driver's License Number narrow-breadth patterns

Pattern

\d\d\d\d\d\d-\d\d\d

[Dd]\d\d\d\d\d\d\d\d\d

[A-Za-z]{2}-[A-Za-z]{2}-[A-Za-z]{2}-[A-Za-z]\d\d\d[A-Za-z]{2}

[A-Za-z]\d\d\d\d-\d\d\d\d\d-\d\d\d\d\d

[A-Za-z]{5}\d\d\d\d\d\d\d\d\d

[A-Za-z]\d\d\d\d-\d\d\d\d\d\d-\d\d

Table 45-132 Canada Driver's License Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000000000, 11111111111111, 22222222222222,

Canada Driver's License Number Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, drivers license, driving license, driver license number,
drivers license number, driving license number, dlno#, drivers lic., driver's license number,
driver licence, drivers licence, driving licence, driver permit, drivers permit, driving permit,
license number, licence number, drivers permit number, dl#

permis de conduire

Canada Passport Number

The Canadian passport is issued to citizens of Canada for the purposes of international travel.
The Canada Passport Number data identifier detects an eight- or nine-character alphanumeric
pattern that matches the Canada Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-character alphanumeric pattern that matches
the Canada Passport Number format. It checks for common test numbers.
See “Canada Passport Number wide breadth” on page 1071.
■ The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Canada Passport Number format. It checks for common test numbers, and also requires
the presence of related keywords.
See “Canada Passport Number narrow breadth” on page 1071.

Canada Passport Number wide breadth

The wide breadth detects an eight- or nine-character alphanumeric pattern that matches the
Canada Passport Number format. It checks for common test numbers.

Table 45-133 Canada Passport Number wide-breadth patterns

Pattern

[a-zA-Z]{2}\d{6}

[a-zA-Z]{2}\d{7}

Table 45-134 Canada Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999, 0000000, 1111111,
2222222, 3333333, 4444444, 5555555, 6666666,
7777777, 8888888, 9999999

Canada Passport Number narrow breadth

The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Canada Passport Number format. It checks for common test numbers, and also requires
the presence of related keywords.

Table 45-135 Canada Passport Number narrow-breadth patterns

Pattern

[a-zA-Z]{2}\d{6}

[a-zA-Z]{2}\d{7}

Table 45-136 Canada Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999, 0000000, 1111111,
2222222, 3333333, 4444444, 5555555, 6666666,
7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no, passportno, passport no., passport#, passportno#

passeport, numéro passeport, No passeport, passeport#
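
The narrow breadth chains three tests: a pattern match, the Exclude ending characters list, and the Find keywords requirement. The following Python sketch applies them to one candidate token within a message. It assumes that a pattern must match the whole token, that a keyword counts if it appears anywhere in the message as a case-sensitive substring, and it does not model the Number delimiter validator. The guide does not state how close a keyword must be or whether matching is case-sensitive. The function name and the keyword subset are illustrative.

```python
import re

# Narrow-breadth patterns copied verbatim from Table 45-135.
PASSPORT_PATTERNS = [re.compile(r"[a-zA-Z]{2}\d{6}"), re.compile(r"[a-zA-Z]{2}\d{7}")]

# Exclude ending characters: each digit repeated six or seven times.
EXCLUDED_ENDINGS = tuple(str(d) * n for n in (6, 7) for d in range(10))

# A small subset of the English and French keywords listed above.
KEYWORDS = ("passport", "passeport")


def narrow_breadth_match(token: str, message: str) -> bool:
    """True if the token passes the pattern and ending checks and a keyword occurs in the message."""
    if not any(p.fullmatch(token) for p in PASSPORT_PATTERNS):
        return False
    if token.endswith(EXCLUDED_ENDINGS):
        return False
    # Assumption: plain case-sensitive substring search anywhere in the message.
    return any(keyword in message for keyword in KEYWORDS)
```

With these assumptions, AB123456 counts as a narrow-breadth match in "Renew passport AB123456 before May" but not in "Order AB123456 shipped", and AB111111 is rejected by the ending-characters check.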

Canada Permanent Residence (PR) Number

The Canadian Permanent Resident card is an identification document for permanent residents
of Canada who are not Canadian citizens. This document is required for permanent residents
returning to Canada by air.
The Canada Permanent Residence (PR) Number data identifier detects a 9- or 12-character
alphanumeric pattern that matches the Canada Permanent Residence (PR) Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 9- or 12-character alphanumeric pattern that matches the
Canada Permanent Residence (PR) Number format. It checks for common test numbers.
See “Canada Permanent Residence (PR) Number wide breadth” on page 1072.
■ The narrow breadth detects a 9- or 12-character alphanumeric pattern that matches the
Canada Permanent Residence (PR) Number format. It checks for common test numbers,
and also requires the presence of related keywords.
See “Canada Permanent Residence (PR) Number narrow breadth” on page 1073.

Canada Permanent Residence (PR) Number wide breadth

The wide breadth detects a 9- or 12-character alphanumeric pattern that matches the Canada
Permanent Residence (PR) Number format. It checks for common test numbers.

Table 45-137 Canada Permanent Residence (PR) Number wide-breadth patterns

Pattern

[a-zA-Z]{2}\d{7}

[a-zA-Z]{2}\d{10}

Table 45-138 Canada Permanent Residence (PR) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999,
0000000000, 1111111111, 2222222222, 3333333333,
4444444444, 5555555555, 6666666666, 7777777777,
8888888888, 9999999999
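The wide-breadth rules for this identifier come down to a pattern match plus a list of rejected endings. The following Python sketch shows one way to express them. It treats each candidate as a standalone token and approximates the Number delimiter validator with a full-string match. That is an assumption, not documented product behavior, and the function name `is_pr_number_wide` is hypothetical.

```python
import re

# Wide-breadth patterns from Table 45-137, written as Python regular
# expressions: two letters followed by 7 or 10 digits (9 or 12 characters).
PR_PATTERNS = [
    re.compile(r"[a-zA-Z]{2}\d{7}"),
    re.compile(r"[a-zA-Z]{2}\d{10}"),
]

# Exclude-ending values from Table 45-138: one digit repeated 7 or 10 times.
EXCLUDED_ENDINGS = tuple(d * 7 for d in "0123456789") + tuple(
    d * 10 for d in "0123456789"
)


def is_pr_number_wide(candidate: str) -> bool:
    """Return True if a single token passes this sketch of the wide breadth."""
    # fullmatch stands in for the Number delimiter check (an assumption):
    # the whole token must be the identifier, with nothing attached to it.
    if not any(p.fullmatch(candidate) for p in PR_PATTERNS):
        return False
    # Reject common test numbers that end in a run of one repeated digit.
    return not candidate.endswith(EXCLUDED_ENDINGS)


print(is_pr_number_wide("AB1234567"))  # True
print(is_pr_number_wide("AB7777777"))  # False (excluded ending)
print(is_pr_number_wide("AB123456"))   # False (too short)
```

Any ending of ten identical digits also ends in seven identical digits, so the 10-digit entries add nothing in this sketch. They are kept only to mirror the table.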

Canada Permanent Residence (PR) Number narrow breadth

The narrow breadth detects a 9- or 12-character alphanumeric pattern that matches the Canada
Permanent Residence (PR) Number format. It checks for common test numbers, and also
requires the presence of related keywords.

Table 45-139 Canada Permanent Residence (PR) Number narrow-breadth patterns

Pattern

[a-zA-Z]{2}\d{7}

[a-zA-Z]{2}\d{10}

Table 45-140 Canada Permanent Residence (PR) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999,
0000000000, 1111111111, 2222222222, 3333333333,
4444444444, 5555555555, 6666666666, 7777777777,
8888888888, 9999999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

permanent resident number, permanent resident no,

permanent resident no., permanent resident card,
permanent resident card number, pr number, pr no,
pr no.

numéro résident permanent, résident permanent non,

résident permanent no., carte résident permanent,
numéro carte résident permanent, pr non

Canadian Social Insurance Number

The Canadian Social Insurance Number (SIN) is a personal identification number issued by
Human Resources and Skills Development Canada primarily for administering national pension
and employment plans.
The Canadian Social Insurance Number data identifier detects a nine-digit number that matches
the Canadian Social Insurance Number format.
The Canadian Social Insurance Number data identifier provides three breadths of detection:
■ The wide breadth detects nine-digit numbers with the format DDD-DDD-DDD separated
by dashes, spaces, periods, slashes, or without separators. It also performs Luhn-check
validation.
See “Canadian Social Insurance Number wide breadth” on page 1075.
■ The medium breadth detects nine-digit numbers with the format DDD-DDD-DDD separated
by dashes, spaces, or periods. It also performs Luhn-check validation and eliminates
non-assigned numbers and common test numbers.
See “Canadian Social Insurance Number medium breadth” on page 1075.
■ The narrow breadth detects nine-digit numbers with the format DDD-DDD-DDD separated
by dashes or spaces. It also performs Luhn-check validation; eliminates non-assigned
numbers, fictitiously assigned numbers, and common test numbers; and requires the
presence of related keywords.
See “Canadian Social Insurance Number narrow breadth” on page 1076.

Canadian Social Insurance Number wide breadth

The wide breadth detects nine-digit numbers with the format DDD-DDD-DDD separated by
dashes, spaces, periods, slashes, or without separators. It also performs Luhn-check validation.

Table 45-141 Canadian Social Insurance Number wide-breadth patterns

Patterns

\d{3} \d{3} \d{3}

\d{9}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

\d{3}-\d{3}-\d{3}

Table 45-142 Canadian Social Insurance Number wide-breadth validator

Mandatory validator Description

Luhn Check Validator computes the Luhn checksum, which every Canadian Social Insurance Number must pass.
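The Luhn check referenced above is the standard mod-10 checksum. The following Python sketch is a generic implementation, not the product's code, applied to a candidate after it strips the separators that the wide-breadth patterns allow. The function names are hypothetical. For a nine-digit SIN, the digits in the second, fourth, sixth, and eighth positions are the ones that get doubled.

```python
def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit; a doubled value
    # above 9 contributes its digit sum (equivalently, value - 9).
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def sin_passes_luhn(candidate: str) -> bool:
    """Strip the separators the wide-breadth patterns allow, then Luhn-check."""
    digits = "".join(ch for ch in candidate if ch in "0123456789")
    return len(digits) == 9 and luhn_valid(digits)


print(sin_passes_luhn("046 454 286"))  # True
print(sin_passes_luhn("046-454-287"))  # False
```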

Canadian Social Insurance Number medium breadth

The medium breadth detects nine-digit numbers with the format DDD-DDD-DDD separated
by dashes, spaces, or periods. It also performs Luhn-check validation and eliminates
non-assigned numbers and common test numbers.

Table 45-143 Canadian Social Insurance Number medium-breadth patterns

Patterns

\d{3} \d{3} \d{3}

\d{3}.\d{3}.\d{3}

\d{3}-\d{3}-\d{3}

Table 45-144 Canadian Social Insurance Number medium-breadth validators

Mandatory validators Description

Luhn Check Validator computes the Luhn checksum, which every Canadian Social Insurance Number must pass.

Number delimiter Validates a match by checking the surrounding numbers.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

8, 123456789

Canadian Social Insurance Number narrow breadth

The narrow breadth detects nine-digit numbers with the format DDD-DDD-DDD separated by
dashes or spaces. It also performs Luhn-check validation; eliminates non-assigned numbers,
fictitiously assigned numbers, and common test numbers; and requires the presence of related
keywords.

Table 45-145 Canadian Social Insurance Number narrow-breadth patterns

Patterns

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

Table 45-146 Canadian Social Insurance Number narrow-breadth validators

Mandatory validators Description

Luhn Check Validator computes the Luhn checksum, which every Canadian Social Insurance Number must pass.

Number delimiter Validates a match by checking the surrounding numbers.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:
0, 8, 123456789

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

pension, pensions, soc ins, ins #, social ins, CSIN,

SSN, social security, social insurance, Canada,
Canadian
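The narrow breadth combines the stricter patterns, the Luhn check, the excluded beginnings, and the keyword requirement. The following Python sketch chains these steps under stated assumptions. The lookarounds approximate the Number delimiter validator. A keyword anywhere in the text is treated as sufficient, although the product may use a proximity window. The function name `find_sins_narrow` is hypothetical.

```python
import re

# Narrow-breadth patterns from Table 45-145: DDD DDD DDD or DDD-DDD-DDD.
# The backreference keeps both separators identical. The lookarounds are an
# assumed stand-in for the Number delimiter validator.
SIN_NARROW = re.compile(r"(?<!\d)\d{3}([ -])\d{3}\1\d{3}(?!\d)")

# Exclude-beginning values and keywords copied from Table 45-146.
EXCLUDED_BEGINNINGS = ("0", "8", "123456789")
KEYWORDS = ["pension", "pensions", "soc ins", "ins #", "social ins", "CSIN",
            "SSN", "social security", "social insurance", "Canada", "Canadian"]


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n = n * 2 - 9 if n > 4 else n * 2
        total += n
    return total % 10 == 0


def find_sins_narrow(text: str) -> list:
    """Return SIN-like matches that pass this sketch of the narrow breadth."""
    # Assumption: one keyword anywhere in the text satisfies Find keywords,
    # compared case-insensitively.
    lowered = text.lower()
    if not any(kw.lower() in lowered for kw in KEYWORDS):
        return []
    hits = []
    for m in SIN_NARROW.finditer(text):
        digits = re.sub(r"[^0-9]", "", m.group(0))
        if digits.startswith(EXCLUDED_BEGINNINGS):
            continue  # excluded beginning characters
        if luhn_valid(digits):
            hits.append(m.group(0))
    return hits


print(find_sins_narrow("Social insurance: 130 692 544"))  # ['130 692 544']
```

The sample value 046 454 286 passes the Luhn check, but this sketch rejects it because the narrow breadth excludes numbers that begin with 0.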

Chilean National Identification Number

The Chilean National Identification Number, or National Unique Role (RUN), is the identifying
number assigned to all Chileans, whether they live in Chile or abroad, and to foreign nationals
residing temporarily or permanently in the country.
The Chilean National Identification Number data identifier detects an eight- or nine-digit number
that matches the Chilean National Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-digit number without checksum validation.
See “Chilean National Identification Number wide breadth” on page 1077.
■ The medium breadth detects an eight- or nine-digit number with checksum validation.
See “Chilean National Identification Number medium breadth” on page 1078.
■ The narrow breadth detects an eight- or nine-digit number that passes checksum validation.
It also requires the presence of related keywords.
See “Chilean National Identification Number narrow breadth” on page 1078.

Chilean National Identification Number wide breadth

The wide breadth detects an eight- or nine-digit number without checksum validation.

Table 45-147 Chilean National Identification Number wide-breadth patterns

Patterns

\d{7}[0123456789Kk]

\d{7}[-][0123456789Kk]

\d[.]\d{3}[.]\d{3}[-][0123456789Kk]

\d{8}[0123456789Kk]

\d{8}[-][0123456789Kk]

\d{2}[.]\d{3}[.]\d{3}[-][0123456789Kk]

Table 45-148 Chilean National Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
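The Duplicate digits validator rejects strings that consist of a single repeated digit, such as 11.111.111-1. The following Python sketch illustrates the idea. The documentation does not say how the product treats separators or a trailing K check character, so this sketch ignores both. That choice is an assumption, and the function name is hypothetical.

```python
def passes_duplicate_digits(candidate: str) -> bool:
    """Return False when every decimal digit in the candidate is the same.

    Assumption: separators and a trailing K check character are ignored,
    and only the decimal digits are compared.
    """
    digits = [ch for ch in candidate if ch in "0123456789"]
    return len(set(digits)) > 1


print(passes_duplicate_digits("11.111.111-1"))  # False
print(passes_duplicate_digits("12.345.678-5"))  # True
```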

Chilean National Identification Number medium breadth

The medium breadth detects an eight- or nine-digit number with checksum validation.

Table 45-149 Chilean National Identification Number medium-breadth patterns

Patterns

\d{7}[0123456789Kk]

\d{7}[-][0123456789Kk]

\d[.]\d{3}[.]\d{3}[-][0123456789Kk]

\d{8}[0123456789Kk]

\d{8}[-][0123456789Kk]

\d{2}[.]\d{3}[.]\d{3}[-][0123456789Kk]

Table 45-150 Chilean National Identification Number medium-breadth validator

Mandatory validator Description

Chilean National Identification Number Validation Check Computes the checksum and validates the pattern against it.
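The table does not spell out the checksum. The Chilean RUN is commonly validated with the modulo-11 scheme sketched below in Python. Weights 2 through 7 are applied cyclically starting from the rightmost digit, a result of 11 maps to 0, and a result of 10 maps to K. Treat this as an assumption about what the Chilean National Identification Number Validation Check computes, not as the product's implementation. The function names are hypothetical.

```python
import re


def run_check_character(body: str) -> str:
    """Modulo-11 check character for the numeric part of a RUN."""
    weights = (2, 3, 4, 5, 6, 7)
    # Weight the digits from the right, cycling through 2, 3, 4, 5, 6, 7.
    total = sum(int(ch) * weights[i % 6] for i, ch in enumerate(reversed(body)))
    result = 11 - total % 11
    if result == 11:
        return "0"
    if result == 10:
        return "K"
    return str(result)


def run_is_valid(candidate: str) -> bool:
    """Normalize a candidate such as 12.345.678-5 and verify its check character."""
    cleaned = candidate.replace(".", "").replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{7,8}[0-9K]", cleaned):
        return False
    return run_check_character(cleaned[:-1]) == cleaned[-1]


print(run_is_valid("12.345.678-5"))  # True
print(run_is_valid("12345678-4"))    # False
```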

Chilean National Identification Number narrow breadth

The narrow breadth detects an eight- or nine-digit number that passes checksum validation.
It also requires the presence of related keywords.

Table 45-151 Chilean National Identification Number narrow-breadth patterns

Patterns

\d{7}[0123456789Kk]

\d{7}[-][0123456789Kk]

\d[.]\d{3}[.]\d{3}[-][0123456789Kk]

\d{8}[0123456789Kk]

\d{8}[-][0123456789Kk]

\d{2}[.]\d{3}[.]\d{3}[-][0123456789Kk]

Table 45-152 Chilean National Identification Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Chilean National Identification Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:
RUT, RUN, national identification number, Chilean
identity no., national unique role, rut#, run#,
identificationnumber, identityno.#, identity number

nationaluniqueroleID#, nacional identidad, número identificación, número identificación nacional,
identidad número

China Passport Number

The People's Republic of China passport, commonly referred to as the Chinese passport, is
issued for international travel to nationals of the People's Republic of China who do not
permanently reside in Hong Kong or Macau.
The China Passport Number data identifier detects a 9- to 10-character identifier that matches
the China Passport Number format.
The China Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a 9- to 10-character identifier.
See “China Passport Number wide breadth” on page 1080.
■ The narrow breadth detects a 9- to 10-character identifier. It also requires the presence of
related keywords.
See “China Passport Number narrow breadth” on page 1080.

China Passport Number wide breadth

The wide breadth detects a 9- to 10-character identifier.

Table 45-153 China Passport Number wide-breadth patterns

Patterns

\d{9}

\l\d{8}

\l{2}\d{8}

Table 45-154 China Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.
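The following Python sketch illustrates the patterns above together with one plausible reading of the Number delimiter validator: a match is rejected if another digit sits immediately before or after it. Both that reading and the interpretation of `\l` as a single letter are assumptions, not documented product behavior. The function name is hypothetical.

```python
import re

# Patterns from Table 45-153. Assumption: "\l" in the product's pattern
# syntax denotes one letter, written here as [A-Za-z].
PATTERNS = [r"\d{9}", r"[A-Za-z]\d{8}", r"[A-Za-z]{2}\d{8}"]

# Assumed approximation of the Number delimiter validator: a match may not be
# immediately preceded or followed by another digit.
PASSPORT_RE = re.compile(r"(?<!\d)(?:%s)(?!\d)" % "|".join(PATTERNS))


def find_passport_candidates(text: str) -> list:
    """Return every substring that matches one of the wide-breadth patterns."""
    return PASSPORT_RE.findall(text)


print(find_passport_candidates("Passport: E12345678"))  # ['E12345678']
print(find_passport_candidates("Ref 1234567890"))       # []
```

In the second example, the first nine digits match `\d{9}`, but the match is discarded because a tenth digit follows it.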

China Passport Number narrow breadth

The narrow breadth detects a 9- to 10-character identifier. It also requires the presence of related
keywords.

Table 45-155 China Passport Number narrow-breadth patterns

Patterns

\d{9}

\l\d{8}

\l{2}\d{8}

Table 45-156 China Passport Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

中国护照, 护照, 护照本

passport, Passport, CHINA PASSPORT, China Passport, china passport, Passport Book, passport book

Codice Fiscale
The Codice Fiscale uniquely identifies an Italian citizen or permanent resident alien. Issuance
of the code is centralized in the Ministry of the Treasury. The Codice Fiscale is issued
to every Italian at birth.
The Codice Fiscale data identifier detects a 16-character identifier that matches the Codice
Fiscale format.
The Codice Fiscale data identifier provides two breadths of detection:
■ The wide breadth detects a 16-character identifier with checksum validation.
See “Codice Fiscale wide breadth” on page 1081.
■ The narrow breadth detects a 16-character identifier with checksum validation. It also
requires the presence of related keywords.
See “Codice Fiscale narrow breadth” on page 1082.

Codice Fiscale wide breadth

The wide breadth detects a 16-character identifier that passes checksum validation.

Table 45-157 Codice Fiscale wide-breadth patterns

Patterns

[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z] [0-9LMNPQRSTUV]{3}[A-Z]

[A-Z]{3} [A-Z]{3} [0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2} [A-Z][0-9LMNPQRSTUV]{3}[A-Z]

Table 45-158 Codice Fiscale wide-breadth validator

Mandatory validator Description

Codice Fiscale Control Key Check Computes the control key and checks if it is valid.
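The documentation does not describe the control key computation. The publicly documented algorithm for the 16th character uses one value table for characters in odd positions and another for characters in even positions. It sums the values and maps the sum modulo 26 to a letter. The following Python sketch implements that algorithm. Whether the product's Codice Fiscale Control Key Check works the same way is an assumption. RSSMRA85T10A562S is a widely used specimen code, and the function names are hypothetical.

```python
import re

# Values for characters in odd positions (1st, 3rd, ..., 15th).
ODD_VALUES = {
    "0": 1, "1": 0, "2": 5, "3": 7, "4": 9, "5": 13, "6": 15, "7": 17,
    "8": 19, "9": 21, "A": 1, "B": 0, "C": 5, "D": 7, "E": 9, "F": 13,
    "G": 15, "H": 17, "I": 19, "J": 21, "K": 2, "L": 4, "M": 18, "N": 20,
    "O": 11, "P": 3, "Q": 6, "R": 8, "S": 12, "T": 14, "U": 16, "V": 10,
    "W": 22, "X": 25, "Y": 24, "Z": 23,
}


def even_value(ch: str) -> int:
    # Even positions: digits keep their value, and letters map A=0 ... Z=25.
    return int(ch) if ch.isdigit() else ord(ch) - ord("A")


def codice_fiscale_valid(candidate: str) -> bool:
    """Verify the control letter of a 16-character code (spaces allowed)."""
    code = candidate.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z0-9]{16}", code):
        return False
    total = 0
    for pos, ch in enumerate(code[:15], start=1):
        total += ODD_VALUES[ch] if pos % 2 == 1 else even_value(ch)
    # The remainder modulo 26 selects the control letter (0 = A).
    return chr(ord("A") + total % 26) == code[15]


print(codice_fiscale_valid("RSSMRA85T10A562S"))     # True
print(codice_fiscale_valid("RSS MRA 85T10 A562S"))  # True
print(codice_fiscale_valid("RSSMRA85T10A562T"))     # False
```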

Codice Fiscale narrow breadth

The narrow breadth detects a 16-character identifier that passes checksum validation. It also
requires the presence of related keywords.

Table 45-159 Codice Fiscale narrow-breadth patterns

Patterns

[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z] [0-9LMNPQRSTUV]{3}[A-Z]

[A-Z]{3} [A-Z]{3} [0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2} [A-Z][0-9LMNPQRSTUV]{3}[A-Z]

Table 45-160 Codice Fiscale narrow-breadth validators

Mandatory validators Description

Codice Fiscale Control Key Check Computes the control key and checks if it is valid.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

codice fiscal, dati anagrafici, partita I.V.A., p. iva, tax code, personal data, VAT number

Colombian Addresses
The Colombian Addresses data identifier detects home addresses and physical locations in
Colombia.
The Colombian Addresses data identifier provides two breadths of detection:
■ The wide breadth detects an address without validation.
See “Colombian Addresses wide breadth” on page 1083.
■ The narrow breadth detects an address with keyword validation.
See “Colombian Addresses narrow breadth” on page 1084.

Colombian Addresses wide breadth

The wide breadth detects an address without validation.

Table 45-161 Colombian Addresses wide-breadth patterns

Patterns

\d{1,3} No. \d{1,3}-\d{1,3}

\d{1,3} \d{1,3}-\d{1,3}

\d{1,3} Bis \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] \d{1,3}-\d{1,3}

\d{1,3} Bis No \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} Bis No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis No \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] No \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} # \d{1,3}-\d{1,3}

\d{1,3}[A-Za-z] # \d{1,3}-\d{1,3}

\d{1,3} No \d{1,3}-\d{1,3}

\d{1,3}[A-Za-z] No. \d{1,3}-\d{1,3}

\d{1,3}[A-Za-z] No \d{1,3}-\d{1,3}

\d{1,3} Bis # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} No \d{1,3}[A-Za-z]-\d{1,3}

The wide breadth of the Colombian Addresses data identifier does not include a validator.

Colombian Addresses narrow breadth

The narrow breadth detects an address with keyword validation.

Table 45-162 Colombian Addresses narrow-breadth patterns

Patterns

\d{1,3} No. \d{1,3}-\d{1,3}

\d{1,3} \d{1,3}-\d{1,3}

\d{1,3} Bis \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] \d{1,3}-\d{1,3}

\d{1,3} Bis No \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} Bis No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} No. \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] Bis No \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] No \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} # \d{1,3}-\d{1,3}

\d{1,3}[A-Za-z] # \d{1,3}-\d{1,3}

\d{1,3} No \d{1,3}-\d{1,3}

\d{1,3}[A-Za-z] No. \d{1,3}-\d{1,3}

\d{1,3}[A-Za-z] No \d{1,3}-\d{1,3}

\d{1,3} Bis # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3}[A-Za-z] # \d{1,3}[A-Za-z]-\d{1,3}

\d{1,3} No \d{1,3}[A-Za-z]-\d{1,3}

Table 45-163 Colombian Addresses narrow-breadth validator

Mandatory validator Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Calle, Cll, Carrera, Cra, Cr, Avenida, Av, Dg, Diagonal,

Diag, Tv, Trans, Transversal, vereda
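The narrow breadth reports an address match only when a street-type keyword is also present. The following Python sketch uses a few of the patterns above. It assumes that the period in "No." is literal and that keywords are matched as whole words, ignoring case. Neither assumption is confirmed by this guide, and the function name is hypothetical.

```python
import re

# A few of the patterns from Tables 45-161 and 45-162, rewritten for Python.
# Assumption: the period in "No." is literal, so it is escaped here.
ADDRESS_PATTERNS = [
    r"\d{1,3} No\. \d{1,3}-\d{1,3}",
    r"\d{1,3} # \d{1,3}-\d{1,3}",
    r"\d{1,3}[A-Za-z] # \d{1,3}[A-Za-z]-\d{1,3}",
    r"\d{1,3} Bis # \d{1,3}[A-Za-z]-\d{1,3}",
]
ADDRESS_RE = re.compile("|".join(ADDRESS_PATTERNS))

# Keywords from Table 45-163, matched as whole words, ignoring case (assumption).
KEYWORDS = ["Calle", "Cll", "Carrera", "Cra", "Cr", "Avenida", "Av", "Dg",
            "Diagonal", "Diag", "Tv", "Trans", "Transversal", "vereda"]
KEYWORD_RE = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, KEYWORDS)),
                        re.IGNORECASE)


def find_addresses_narrow(text: str) -> list:
    """Return address matches, but only if a street-type keyword is present."""
    if not KEYWORD_RE.search(text):
        return []
    return ADDRESS_RE.findall(text)


print(find_addresses_narrow("Calle 45 # 12-34, Bogotá"))  # ['45 # 12-34']
```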

Colombian Cell Phone Number

The Colombian Cell Phone Number data identifier detects Colombian cell phone numbers.
The Colombian Cell Phone Number data identifier provides two breadths of detection:
■ The wide breadth detects an 8- to 10-digit number with duplicate digit validation.
See “Colombian Cell Phone Number wide breadth” on page 1085.
■ The narrow breadth detects an 8- to 10-digit number with required characters at the
beginning. It also checks for duplicate digits, and it requires the presence of related
keywords.
See “Colombian Cell Phone Number narrow breadth” on page 1086.

Colombian Cell Phone Number wide breadth

The wide breadth detects an 8- to 10-digit number with duplicate digit validation.

Table 45-164 Colombian Cell Phone Number wide-breadth patterns

Patterns

\d{8}

\d{2}.\d{3}.\d{3}

\d{2} \d{3} \d{3}

\d{2}/\d{3}/\d{3}

\d{2}-\d{3}-\d{3}

\d{2},\d{3},\d{3}

\d{9}

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

\d{3},\d{3},\d{3}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

\d{10}

\d{1}/\d{3}/\d{3}/\d{3}

\d{1},\d{3},\d{3},\d{3}

\d{1}.\d{3}.\d{3}.\d{3}

\d{1}-\d{3}-\d{3}-\d{3}

\d{1} \d{3} \d{3} \d{3}

Table 45-165 Colombian Cell Phone Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Colombian Cell Phone Number narrow breadth

The narrow breadth detects an 8- to 10-digit number with required characters at the beginning.
It also checks for duplicate digits, and it requires the presence of related keywords.

Table 45-166 Colombian Cell Phone Number narrow-breadth patterns

Patterns

\d{8}

\d{2}.\d{3}.\d{3}

\d{2} \d{3} \d{3}

\d{2}/\d{3}/\d{3}

\d{2}-\d{3}-\d{3}

\d{2},\d{3},\d{3}

\d{9}

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

\d{3},\d{3},\d{3}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

\d{10}

\d{1}/\d{3}/\d{3}/\d{3}

\d{1},\d{3},\d{3},\d{3}

\d{1}.\d{3}.\d{3}.\d{3}

\d{1}-\d{3}-\d{3}-\d{3}

\d{1} \d{3} \d{3} \d{3}

Table 45-167 Colombian Cell Phone Number narrow-breadth validators

Mandatory validators Description

Require beginning characters This validator requires the following characters at the
beginning of the number:

300, 301, 302, 310, 311, 312, 313, 314, 315, 316, 317,
318, 319, 320, 321, 350

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

numero celular, número de teléfono, teléfono celular no., numero celular#
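The following Python sketch applies two of the narrow-breadth validators (Require beginning characters and Duplicate digits) to one candidate. It assumes that the beginning characters are checked after the separators are removed. The keyword and Number delimiter checks are omitted for brevity, and the function name is hypothetical.

```python
# Required beginning characters from Table 45-167.
REQUIRED_BEGINNINGS = ("300", "301", "302", "310", "311", "312", "313", "314",
                       "315", "316", "317", "318", "319", "320", "321", "350")


def cell_number_passes_narrow(candidate: str) -> bool:
    """Check one pattern match against the narrow-breadth validators sketched here."""
    # Remove the separators the patterns allow (space . / - ,).
    digits = "".join(ch for ch in candidate if ch in "0123456789")
    if not 8 <= len(digits) <= 10:
        return False
    if not digits.startswith(REQUIRED_BEGINNINGS):
        return False
    # Duplicate digits: the digits must not all be the same.
    return len(set(digits)) > 1


print(cell_number_passes_narrow("3001234567"))  # True
print(cell_number_passes_narrow("6011234567"))  # False
```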

Colombian Personal Identification Number

The Colombian Personal Identification Number is a unique 8- or 10-digit number assigned to
Colombian citizens at birth.
The Colombian Personal Identification Number data identifier detects an 8- or 10-digit number
that matches the Colombian Personal Identification Number format.
The Colombian Personal Identification Number data identifier provides two breadths of detection:
■ The wide breadth detects an 8- or 10-digit number with duplicate digit validation.
See “Colombian Personal Identification Number wide breadth” on page 1088.
■ The narrow breadth detects an 8- or 10-digit number with duplicate digit validation, prefix
and suffix exclusion, and beginning character exclusion. It also requires the presence of
related keywords.
See “Colombian Personal Identification Number narrow breadth” on page 1089.

Colombian Personal Identification Number wide breadth

The wide breadth detects an 8- or 10-digit number with duplicate digit validation.

Table 45-168 Colombian Personal Identification Number wide-breadth patterns

Patterns

\d{9}

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

\d{3},\d{3},\d{3}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

Table 45-169 Colombian Personal Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Colombian Personal Identification Number narrow breadth

The narrow breadth detects an 8- or 10-digit number with duplicate digit validation, prefix and
suffix exclusion, and beginning character exclusion. It also requires the presence of related
keywords.

Table 45-170 Colombian Personal Identification Number narrow-breadth patterns

Patterns

\d{9}

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

\d{3},\d{3},\d{3}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

Table 45-171 Colombian Personal Identification Number narrow-breadth validators

Mandatory validators Description

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

300, 301, 302, 310, 311, 312, 313, 314, 315, 316, 317,
318, 319, 320, 321, 350

Exclude prefix Excludes the following prefixes:

$ ,$

Exclude suffix Excludes the following suffix:

.00

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

cedula, cédula, c.c., c.c, C.C., C.C, cc, CC, NIE., NIE,
nie., nie, cedula de ciudadania, cédula de ciudadanía,
cc#, CC #, documento de identificacion, documento
de identificación, Nit.
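The exclusion validators for this breadth appear designed to filter out money amounts and cell phone numbers that share the nine-digit shape. The following Python sketch applies them in sequence to each pattern match. The exact prefix list is ambiguous in the table, so this sketch assumes that a "$" immediately before the match, optionally followed by a space, disqualifies it. The lookarounds approximate the Number delimiter validator, the keyword check is omitted, and the function name is hypothetical.

```python
import re

# Nine-digit patterns from Table 45-170 with an optional but consistent
# separator. Assumptions: "." is a literal period, and the lookarounds
# approximate the Number delimiter validator.
CANDIDATE_RE = re.compile(r"(?<!\d)\d{3}([ ,./-]?)\d{3}\1\d{3}(?!\d)")

# Exclude-beginning values: the same prefixes as the cell phone list in
# Table 45-167.
EXCLUDED_BEGINNINGS = ("300", "301", "302", "310", "311", "312", "313", "314",
                       "315", "316", "317", "318", "319", "320", "321", "350")


def find_personal_ids(text: str) -> list:
    """Apply sketches of the exclusion validators; keywords are omitted."""
    hits = []
    for m in CANDIDATE_RE.finditer(text):
        digits = re.sub(r"[^0-9]", "", m.group(0))
        # Exclude prefix (assumed: "$" just before the match, optional space).
        if text[:m.start()].rstrip(" ").endswith("$"):
            continue
        # Exclude suffix: ".00" immediately after the match.
        if text[m.end():].startswith(".00"):
            continue
        # Exclude beginning characters, then Duplicate digits.
        if digits.startswith(EXCLUDED_BEGINNINGS) or len(set(digits)) == 1:
            continue
        hits.append(m.group(0))
    return hits


print(find_personal_ids("Cédula 123 456 789"))    # ['123 456 789']
print(find_personal_ids("Total: $ 123,456,789"))  # []
```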

Colombian Tax Identification Number

The Colombian Tax Identification Number is a nine-digit number assigned to persons who
must pay taxes in Colombia.
The Colombian Tax Identification Number data identifier detects a nine-digit number that
matches the Colombian Tax Identification Number format.
The Colombian Tax Identification Number data identifier provides two breadths of detection:
■ The wide breadth detects a nine-digit number with duplicate digit validation.
See “Colombian Tax Identification Number wide breadth” on page 1090.
■ The narrow breadth detects a nine-digit number with duplicate digit validation, required
beginning characters, and prefix exclusion. It also requires the presence of related keywords.
See “Colombian Tax Identification Number narrow breadth” on page 1091.

Colombian Tax Identification Number wide breadth

The wide breadth detects a nine-digit number with duplicate digit validation.

Table 45-172 Colombian Tax Identification Number wide-breadth patterns

Patterns

\d{9}

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

\d{3},\d{3},\d{3}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

Table 45-173 Colombian Tax Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Colombian Tax Identification Number narrow breadth

The narrow breadth detects a nine-digit number with duplicate digit validation, required beginning
characters, and prefix exclusion. It also requires the presence of related keywords.

Table 45-174 Colombian Tax Identification Number narrow-breadth patterns

Patterns

\d{9}

\d{3} \d{3} \d{3}

\d{3}-\d{3}-\d{3}

\d{3},\d{3},\d{3}

\d{3}/\d{3}/\d{3}

\d{3}.\d{3}.\d{3}

Table 45-175 Colombian Tax Identification Number narrow-breadth validators

Mandatory validators Description

Require beginning characters Requires these characters at the beginning of the number:

800, 860, 890, 900

Exclude prefix Excludes the following prefix:

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

NIT., NIT, nit., nit, Nit.

Credit Card Magnetic Stripe Data

The magnetic stripe of a credit card contains information about the card. Storage of the complete
version of this data is a violation of the Payment Card Industry (PCI) Data Security Standard.
The Credit Card Magnetic Stripe Data data identifier detects the following raw data taken from
the credit card magnetic stripe:
■ Data from track one, format B, which typically contains account number, name, expiration
date, and possibly Card Verification Value or Card Verification Code 1 (CVV1/CVC1).
■ Data from track two, which typically contains account number and possibly expiration date,
service code, and Card Verification Value or Card Verification Code 1 (CVV1/CVC1).
The Credit Card Magnetic Stripe Data data identifier detects the characteristic data patterns for
track one and track two data. Depending on the track, this data contains the start sentinel, format
code, primary account number, name, expiration date, service code, discretionary data, and the
end sentinel. It also includes standard field separators. It validates the data using a Luhn-check
validator.

Table 45-176 Credit Card Magnetic Stripe Data medium-breadth patterns

Patterns

%B3[068]\d{12}^[A-Z]{1}

%B3[068]\d{2} \d{6} \d{4}^[A-Z]{1}

%B3[068]\d{2}-\d{6}-\d{4}^[A-Z]{1}

%B4\d{12}^[A-Z]{1}

%B3[47]\d{2}-\d{6}-\d{5}^[A-Z]{1}

%B4\d{3} \d{4} \d{4} \d{4}^[A-Z]{1}

%B3[47]\d{2} \d{6} \d{5}^[A-Z]{1}

%B4\d{15}^[A-Z]{1}

%B3[47]\d{13}^[A-Z]{1}

%B5[1-5]\d{2}-\d{4}-\d{4}-\d{4}^[A-Z]{1}

%B4\d{3}-\d{4}-\d{4}-\d{4}^[A-Z]{1}

%B5[1-5]\d{2} \d{4} \d{4} \d{4}^[A-Z]{1}

%B5[1-5]\d{14}^[A-Z]{1}

%B2131\d{11}^[A-Z]{1}

%B3\d{3}-\d{4}-\d{4}-\d{4}^[A-Z]{1}

%B3\d{3} \d{4} \d{4} \d{4}^[A-Z]{1}

%B3\d{15}^[A-Z]{1}

%B2149\d{11}^[A-Z]{1}

%B2149 \d{6} \d{5}^[A-Z]{1}

%B2149-\d{6}-\d{5}^[A-Z]{1}

%B2014\d{11}^[A-Z]{1}

%B2014 \d{6} \d{5}^[A-Z]{1}

%B2014-\d{6}-\d{5}^[A-Z]{1}

;1800\d{11}=

;6011-\d{4}-\d{4}-\d{4}=

;6011 \d{4} \d{4} \d{4}=

;6011\d{12}=

;3[068]\d{12}=

;3[068]\d{2} \d{6} \d{4}=

;3[068]\d{2}-\d{6}-\d{4}=

;4\d{12}=

;3[47]\d{2}-\d{6}-\d{5}=

;4\d{3} \d{4} \d{4} \d{4}=

;3[47]\d{2} \d{6} \d{5}=

;4\d{15}=

;3[47]\d{13}=

;5[1-5]\d{2}-\d{4}-\d{4}-\d{4}=

;4\d{3}-\d{4}-\d{4}-\d{4}=

;5[1-5]\d{2} \d{4} \d{4} \d{4}=

;5[1-5]\d{14}=

;2131\d{11}=

;3\d{3}-\d{4}-\d{4}-\d{4}=

;3\d{3} \d{4} \d{4} \d{4}=

;3\d{15}=

;2149\d{11}=

;2149 \d{6} \d{5}=

;2149-\d{6}-\d{5}=

;2014\d{11}=

;2014 \d{6} \d{5}=

;2014-\d{6}-\d{5}=

%B1800\d{11}^[A-Z]{1}

%B6011-\d{4}-\d{4}-\d{4}^[A-Z]{1}

%B6011 \d{4} \d{4} \d{4}^[A-Z]{1}

%B6011\d{12}^[A-Z]{1}

Table 45-177 Credit Card Magnetic Stripe Data medium-breadth validator

Validator Description

Luhn Check Computes the Luhn checksum which every instance must
pass.
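The following Python sketch extracts the primary account number from track one (format B) and track two data and keeps only numbers that pass the Luhn check. It is a simplification. It does not reproduce the issuer prefixes or the space and dash separators in Table 45-176, and the function name is hypothetical. 4111111111111111 is a widely used Luhn-valid test card number.

```python
import re

# Simplified track layouts (assumption; the product's patterns in Table 45-176
# also restrict issuer prefixes and allow spaces or dashes in the number):
#   track one, format B:  %B<PAN>^<NAME>^...
#   track two:            ;<PAN>=<expiration, service code, ...>
TRACK1_RE = re.compile(r"%B(\d{13,19})\^")
TRACK2_RE = re.compile(r";(\d{13,19})=")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n = n * 2 - 9 if n > 4 else n * 2
        total += n
    return total % 10 == 0


def find_stripe_pans(text: str) -> list:
    """Return Luhn-valid account numbers found in raw track data."""
    pans = [m.group(1) for m in TRACK1_RE.finditer(text)]
    pans += [m.group(1) for m in TRACK2_RE.finditer(text)]
    return [pan for pan in pans if luhn_valid(pan)]


print(find_stripe_pans(";4111111111111111=25121010000000000?"))  # ['4111111111111111']
```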

Credit Card Number

A credit card number, often abbreviated as CCN, is the account number needed to process credit
card transactions. It is also known as a Primary Account Number (PAN).
The Credit Card Number data identifier detects valid credit card numbers that are separated
by spaces, dashes, or periods, or that have no separators.
The Credit Card Number data identifier offers three breadths of detection:
■ The wide breadth detects valid credit card numbers that are separated by spaces, dashes,
periods, or without separators. It also performs Luhn-check validation.
See “Credit Card Number wide breadth” on page 1095.
■ The medium breadth detects valid credit card numbers that are separated by spaces,
dashes, periods, or without separators. It also checks for common test numbers and
performs Luhn-check validation.
See “Credit Card Number medium breadth” on page 1096.
■ The narrow breadth detects valid credit card numbers that are separated by spaces, dashes,
periods, or without separators. It also checks for common test numbers, performs
Luhn-check validation, and requires the presence of credit card number-related keywords.
See “Credit Card Number narrow breadth” on page 1100.

Credit Card Number wide breadth

The wide breadth detects valid credit card numbers that are separated by spaces, dashes,
periods, or without separators.
The wide breadth includes formats for American Express, Diners Club, Discover, Japan Credit
Bureau (JCB), MasterCard, and Visa, and it performs Luhn-check validation.

Table 45-178 Credit Card Number wide-breadth patterns

Patterns

2149 \d{6} \d{5}

4\d{12}

2149-\d{6}-\d{5}

\d{16}

2014\d{11}

\d{4}.\d{4}.\d{4}.\d{4}

2014.\d{6}.\d{5}

\d{4}-\d{4}-\d{4}-\d{4}

2014 \d{6} \d{5}

\d{4} \d{4} \d{4} \d{4}

2014-\d{6}-\d{5}

1800\d{11}

3[47]\d{2}.\d{6}.\d{5}

2131\d{11}

3[068]\d{2}.\d{6}.\d{4}

2149\d{11}

3[47]\d{2}-\d{6}-\d{5}

2149.\d{6}.\d{5}

3[068]\d{2}-\d{6}-\d{4}

3[47]\d{13}

3[068]\d{2} \d{6} \d{4}

3[47]\d{2} \d{6} \d{5}

3[068]\d{12}

Table 45-179 Credit Card Number wide-breadth validator

Mandatory validator Description

Luhn Check Computes the Luhn checksum, which every credit card number must pass.

Credit Card Number medium breadth

The medium breadth detects valid credit card numbers that are separated by spaces, dashes,
periods, or without separators. It includes formats for American Express, Diners Club, Discover,
Japan Credit Bureau (JCB), MasterCard, and Visa. It performs Luhn-check validation and
eliminates common test numbers, including those reserved for testing by credit card issuers.
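The medium breadth adds an exact-match exclusion list on top of the Luhn check. The following Python sketch shows that ordering. It does not reproduce the issuer-prefix patterns of Table 45-180, and it uses only a few of the excluded values listed in Table 45-181. It also assumes that exclusions are compared after separators are removed. The function and constant names are hypothetical.

```python
import re

# Illustrative subset of the exact-match exclusions in Table 45-181.
EXCLUDED_NUMBERS = {"1234567812345670", "180025848680889", "180026939516875"}


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n = n * 2 - 9 if n > 4 else n * 2
        total += n
    return total % 10 == 0


def cc_passes_medium(candidate: str) -> bool:
    """Normalize one candidate, reject excluded test numbers, then Luhn-check."""
    # Remove the separators the patterns allow: spaces, periods, and dashes.
    digits = re.sub(r"[ .-]", "", candidate)
    if not re.fullmatch(r"[0-9]{13,19}", digits):
        return False
    # Assumption: exclusions are compared after separators are removed.
    if digits in EXCLUDED_NUMBERS:
        return False
    return luhn_valid(digits)


print(cc_passes_medium("4111 1111 1111 1111"))  # True
print(cc_passes_medium("1234-5678-1234-5670"))  # False (excluded test number)
```

1234567812345670 passes the Luhn check, so the exclusion list, not the checksum, is what rejects it.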

Table 45-180 Credit Card Number medium-breadth patterns

Patterns

1800\d{11}

2720.\d{4}.\d{4}.\d{4}

2131\d{11}

2720-\d{4}-\d{4}-\d{4}

3\d{3}.\d{4}.\d{4}.\d{4}

2720 \d{4} \d{4} \d{4}

3\d{3}-\d{4}-\d{4}-\d{4}

2720\d{12}

3\d{3} \d{4} \d{4} \d{4}

6221[2][6-8]\d{10}

3\d{15}

6221.[2][6-8]\d{2}.\d{4}.\d{4}

4\d{3}.\d{4}.\d{4}.\d{4}

6221-[2][6-8]\d{2}-\d{4}-\d{4}

4\d{3}-\d{4}-\d{4}-\d{4}

6221 [2][6-8]\d{2} \d{4} \d{4}

4\d{3} \d{4} \d{4} \d{4}

622[2-8]\d{12}

4\d{15}

622[2-8].\d{4}.\d{4}.\d{4}

4\d{12}

622[2-8]-\d{4}-\d{4}-\d{4}

5[1-5]\d{2}.\d{4}.\d{4}.\d{4}

622[2-8] \d{4} \d{4} \d{4}

5[1-5]\d{2}-\d{4}-\d{4}-\d{4}

6229[2][0-5]\d{10}

2149.\d{6}.\d{5}

6229.[2][0-5]\d{2}.\d{4}.\d{4}

5[1-5]\d{2} \d{4} \d{4} \d{4}

6229-[2][0-5]\d{2}-\d{4}-\d{4}

6229 [2][0-5]\d{2} \d{4} \d{4}

2149 \d{6} \d{5}

2014 \d{6} \d{5}

5[1-5]\d{14}

2014-\d{6}-\d{5}

2149-\d{6}-\d{5}

2014\d{11}

2149\d{11}

6011.\d{4}.\d{4}.\d{4}

2014.\d{6}.\d{5}

6011-\d{4}-\d{4}-\d{4}

222[1-9]\d{12}

6011 \d{4} \d{4} \d{4}

222[1-9][.-]\d{4}[.-]\d{4}[.-]\d{4}

6011\d{12}

22[3-9]\d{13}

3[068]\d{2}.\d{6}.\d{4}

22[3-9]\d[.-]\d{4}[.-]\d{4}[.-]\d{4}

3[068]\d{2}-\d{6}-\d{4}

2[3-6]\d{14}

3[068]\d{2} \d{6} \d{4}

2[3-6]\d{2}.\d{4}.\d{4}.\d{4}

3[068]\d{12}

2[3-6]\d{2}-\d{4}-\d{4}-\d{4}

3[47]\d{13}

2[3-6]\d{2} \d{4} \d{4} \d{4}

3[47]\d{2}.\d{6}.\d{5}

27[0-1]\d{13}

3[47]\d{2} \d{6} \d{5}

27[0-1]\d.\d{4}.\d{4}.\d{4}

3[47]\d{2}-\d{6}-\d{5}

27[0-1]\d-\d{4}-\d{4}-\d{4}

27[0-1]\d \d{4} \d{4} \d{4}

Table 45-181 Credit Card Number medium-breadth validators

Mandatory validators Description

Exclude exact match Excludes anything that matches the specified text.

Inputs:

0111111111111111, 1234567812345670, 180025848680889, 180026939516875,

201400000000009, 201411032364438, 201431736711288, 210002956344412,
214906110040367, 30000000000004, 30175572836108, 30203642658706,
30374367304832, 30569309025904, 3088000000000000, 3088000000000009,
3088272824427380, 3096666928988980, 3158060990195830, 340000000000009,
341019464477148, 341111111111111, 341132368578216, 343510064010360,
344400377306201, 3530111333300000, 3566002020360500, 370000000000002,
371449635398431, 374395534374782, 378282246310005, 378282246310005,
378282246310005, 378734493671000, 38520000023237, 4007000000027,
4012888888881880, 4024007116284, 4111111111111110, 4111111111111111,
4222222222222, 4242424242424242, 4485249610564758, 4539399050593,
4539475158333170, 4539603277651940, 4539687075612974, 4539890911376230,
4556657397647250, 4716733846619930, 4716976758661, 4916437046413,
4916451936094420, 4916491104658550, 4916603544909870, 4916759155933,
5105105105105100, 5119301340696760, 5263386793750340, 5268196752489640,
5283145597742620, 5424000000000015, 5429800397359070, 5431111111111111,
5455780586062610, 5472715456453270, 5500000000000004, 5539878514522540,
5547392938355060, 5555555555554440, 5555555555554444, 5556722757422205,
6011000000000000, 6011000000000004, 6011000000000012, 6011000990139420,
6011111111111110, 6011111111111117, 6011312054074430, 6011354276117410,
6011601160116611, 6011905056260500, 869908581608894, 869933317208876,
869989278167071

Luhn Check Validator computes the Luhn checksum, which every credit card number must pass.

Number delimiter Validates a match by checking the surrounding number.
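To illustrate how the mandatory validators narrow the pattern matches, the following Python sketch chains them. It is a hypothetical model, not the product's implementation: it uses only a few of the patterns and exclusion inputs, and it omits the patterns that contain ".", because the table does not say whether "." matches a literal period or any character. It also assumes that the number delimiter rejects a match with a digit directly before or after it, and that exact-match exclusion compares the digits with separators removed.

```python
import re

# Hypothetical subset of the Table 45-180 patterns.
PATTERNS = [
    r"4\d{15}",
    r"4\d{3}-\d{4}-\d{4}-\d{4}",
    r"4\d{3} \d{4} \d{4} \d{4}",
    r"3[47]\d{13}",
]

# Hypothetical subset of the "Exclude exact match" inputs.
EXCLUDED = {"4111111111111111", "378282246310005", "4242424242424242"}

# Assumed number-delimiter rule: no digit directly before or after a match.
CANDIDATE = re.compile(r"(?<!\d)(?:" + "|".join(PATTERNS) + r")(?!\d)")


def luhn_valid(digits: str) -> bool:
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def medium_breadth_matches(text: str) -> list:
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group(0))
        if digits in EXCLUDED:      # Exclude exact match (compared on digits)
            continue
        if not luhn_valid(digits):  # Luhn Check
            continue
        hits.append(m.group(0))
    return hits


sample = ("ref 4000 1234 5678 9017, test 4111111111111111, "
          "typo 4000123456789018, too long 40001234567890170")
print(medium_breadth_matches(sample))  # ['4000 1234 5678 9017']
```

The test number is dropped by the exclusion list, the mistyped number fails the Luhn check, and the 17-digit run never matches because of the delimiter lookarounds.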

Credit Card Number narrow breadth

The narrow breadth detects valid credit card numbers that are separated by spaces, dashes,
or periods, or that have no separators. It performs Luhn-check validation and includes formats
for American Express, Diners Club, Discover, Japan Credit Bureau (JCB), MasterCard, and Visa.
It eliminates common test numbers, including those reserved for testing by credit card issuers.
It also requires the presence of a credit card-related keyword.

Table 45-182 Credit Card Number narrow-breadth patterns

Patterns

222[1-9]\d{12}

222[1-9][.-]\d{4}[.-]\d{4}[.-]\d{4}

22[3-9]\d{13}

22[3-9]\d[.-]\d{4}[.-]\d{4}[.-]\d{4}

2[3-6]\d{14}

2[3-6]\d{2}.\d{4}.\d{4}.\d{4}

2[3-6]\d{2}-\d{4}-\d{4}-\d{4}

2[3-6]\d{2} \d{4} \d{4} \d{4}

27[0-1]\d{13}

27[0-1]\d.\d{4}.\d{4}.\d{4}

27[0-1]\d-\d{4}-\d{4}-\d{4}

27[0-1]\d \d{4} \d{4} \d{4}

2720.\d{4}.\d{4}.\d{4}

2720-\d{4}-\d{4}-\d{4}

2720 \d{4} \d{4} \d{4}

2720\d{12}

6221[2][6-8]\d{10}

6221.[2][6-8]\d{2}.\d{4}.\d{4}

6221-[2][6-8]\d{2}-\d{4}-\d{4}

6221 [2][6-8]\d{2} \d{4} \d{4}

622[2-8]\d{12}

622[2-8].\d{4}.\d{4}.\d{4}

622[2-8]-\d{4}-\d{4}-\d{4}

622[2-8] \d{4} \d{4} \d{4}

6229[2][0-5]\d{10}

6229.[2][0-5]\d{2}.\d{4}.\d{4}

6229-[2][0-5]\d{2}-\d{4}-\d{4}

6229 [2][0-5]\d{2} \d{4} \d{4}

2149 \d{6} \d{5}

2149-\d{6}-\d{5}

2014\d{11}

2014 \d{6} \d{5}

2014-\d{6}-\d{5}

6011-\d{4}-\d{4}-\d{4}

6011 \d{4} \d{4} \d{4}

6011\d{12}

3[068]\d{12}

3[068]\d{2} \d{6} \d{4}

3[068]\d{2}-\d{6}-\d{4}

3[47]\d{2}-\d{6}-\d{5}

3[47]\d{2} \d{6} \d{5}

3[47]\d{13}

4\d{3}-\d{4}-\d{4}-\d{4}

3\d{3}.\d{4}.\d{4}.\d{4}

2149.\d{6}.\d{5}

2014.\d{6}.\d{5}

6011.\d{4}.\d{4}.\d{4}

3[068]\d{2}.\d{6}.\d{4}

3[47]\d{2}.\d{6}.\d{5}

4\d{3}.\d{4}.\d{4}.\d{4}

1800\d{11}

4\d{12}

4\d{3} \d{4} \d{4} \d{4}

4\d{15}

5[1-5]\d{2}-\d{4}-\d{4}-\d{4}

5[1-5]\d{2} \d{4} \d{4} \d{4}

5[1-5]\d{14}

5[1-5]\d{2}.\d{4}.\d{4}.\d{4}

2131\d{11}

3\d{3}-\d{4}-\d{4}-\d{4}

3\d{3} \d{4} \d{4} \d{4}

3\d{15}

2149\d{11}

Table 45-183 Credit Card Number narrow-breadth validators

Mandatory validators Description

Exclude exact match Excludes anything that matches the specified text.

Inputs:

0111111111111111, 1234567812345670, 180025848680889, 180026939516875,

Luhn Check Validator computes the Luhn checksum, which every credit card number must
pass.

Number delimiter Validates a match by checking the surrounding number.

Find keywords With this option selected, at least one of the following keywords or key phrases
must be present for the data to be matched.

Inputs:

account number, account ps, american express, americanexpress, amex,

bank card, bankcard, card num, card number, cc #, cc#, ccn, check card,
checkcard, credit card, credit card #, credit card number, credit card#, debit
card, debitcard, diners club, dinersclub, discover, enroute, japanese card
bureau, jcb, mastercard, mc, visa
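A minimal keyword check might look like the following Python sketch. The table does not say how close a keyword must be to the number or how letter case is treated, so this sketch assumes a case-insensitive, whole-word search anywhere in the text; `has_keyword` and the shortened keyword list are hypothetical.

```python
import re

# Hypothetical subset of the narrow-breadth "Find keywords" inputs.
KEYWORDS = ["credit card", "card number", "amex", "visa", "mastercard", "cc#", "jcb"]


def has_keyword(text: str, keywords=KEYWORDS) -> bool:
    """Return True if any keyword appears as a whole word or phrase.

    Matching is case-insensitive. A keyword must not be directly attached to
    other word characters (letters, digits, or underscores), so "visa" does
    not match inside "advisable".
    """
    for kw in keywords:
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


print(has_keyword("Please charge my VISA card"))  # True
print(has_keyword("It is advisable to wait"))     # False
```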

Croatia National Identification Number

The Croatian National Identification number (Osobni identifikacijski broj or OIB) is the permanent
personal and tax identifier for Croatian citizens and residents.
The Croatia National Identification Number data identifier detects an 11-digit number, optionally
preceded by the letters HR or hr, that matches the Croatia National Identification Number
format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number, optionally preceded by the letters HR or hr,
that matches the Croatia National Identification Number format. It checks for duplicate
digits and common test numbers.
See “Croatia National Identification Number wide breadth” on page 1105.
■ The medium breadth detects an 11-digit number, optionally preceded by the letters HR or
hr, that matches the Croatia National Identification Number format with checksum validation.
See “Croatia National Identification Number medium breadth” on page 1105.
■ The narrow breadth detects an 11-digit number, optionally preceded by the letters HR or
hr, that matches the Croatia National Identification Number format with checksum validation.
It checks for duplicate digits and common test numbers, and requires the presence of
related keywords.
See “Croatia National Identification Number narrow breadth” on page 1105.

Croatia National Identification Number wide breadth

The wide breadth detects an 11-digit number, optionally preceded by the letters HR or hr, that
matches the Croatia National Identification Number format. It checks for duplicate digits and
common test numbers.

Table 45-184 Croatia National Identification Number wide-breadth patterns

Pattern

\d{11}

[Hh][Rr]\d{11}

Table 45-185 Croatia National Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.
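The following Python sketch shows one way the wide-breadth patterns and their two validators could fit together. It assumes that the number delimiter means no letter or digit may touch either end of the match; the regular expression and function names are illustrative, not the product's.

```python
import re

# Table 45-184 patterns: 11 digits, optionally preceded by HR (any letter case).
# Assumed delimiter rule: no letter or digit may touch either end of the match.
OIB_CANDIDATE = re.compile(r"(?<![A-Za-z0-9])(?:[Hh][Rr])?(\d{11})(?![A-Za-z0-9])")


def is_duplicate_digits(digits: str) -> bool:
    """Duplicate digits validator: True when every digit is the same."""
    return len(set(digits)) == 1


def croatia_wide_matches(text: str) -> list:
    """Return the 11-digit part of each candidate that survives both validators."""
    return [
        m.group(1)
        for m in OIB_CANDIDATE.finditer(text)
        if not is_duplicate_digits(m.group(1))
    ]


print(croatia_wide_matches("OIB: HR69435151530, fake 11111111111, long 123456789012"))
# ['69435151530']
```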

Croatia National Identification Number medium breadth

The medium breadth detects an 11-digit number, optionally preceded by the letters HR or hr,
that matches the Croatia National Identification Number format with checksum validation.

Table 45-186 Croatia National Identification Number medium-breadth patterns

Pattern

\d{11}

[Hh][Rr]\d{11}

Table 45-187 Croatia National Identification Number medium-breadth validators

Mandatory validator Description

Croatia National Identification Number Validation Check Computes the checksum and validates the pattern against it.
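The guide does not name the checksum algorithm. The OIB check digit is defined by ISO 7064, MOD 11,10, so the following Python sketch assumes that algorithm; it is not Symantec's code, and `oib_check_digit_ok` is a hypothetical name.

```python
def oib_check_digit_ok(value: str) -> bool:
    """ISO 7064 MOD 11,10 check on an OIB, with an optional HR/hr prefix."""
    if value[:2].upper() == "HR":
        value = value[2:]
    if len(value) != 11 or any(c not in "0123456789" for c in value):
        return False
    a = 10
    for ch in value[:10]:
        a = (a + int(ch)) % 10
        if a == 0:
            a = 10
        a = (a * 2) % 11
    check = 11 - a
    if check == 10:
        check = 0
    return check == int(value[10])


print(oib_check_digit_ok("69435151530"))  # True
print(oib_check_digit_ok("69435151531"))  # False
```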

Croatia National Identification Number narrow breadth

The narrow breadth detects an 11-digit number, optionally preceded by the letters HR or hr,
that matches the Croatia National Identification Number format with checksum validation. It
checks for duplicate digits and common test numbers, and requires the presence of related
keywords.

Table 45-188 Croatia National Identification Number narrow-breadth patterns

Pattern

\d{11}

[Hh][Rr]\d{11}

Table 45-189 Croatia National Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Croatia National Identification Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

national ID, Osobna iskaznica, national identification
number, Nacionalni identifikacijski broj, personal ID,
osobni ID, personal identification number, osobni
identifikacijski broj, OIB, OIB#, nationalid#,
personalid#, tax id, tax number, tax identification
number, tax code, taxpayer code, taxpayer id, taxpayer
identification code, porez iskaznica, porezni broj,
porezni identifikacijski broj, porez kod, šifra poreznog
obveznika

CUSIP Number
The CUSIP number is a unique identifier assigned to North American stock or other securities.
This number is issued by the Committee on Uniform Security Identification Procedures (CUSIP)
to assist in clearing and settling trades. CINS is an extension of CUSIP used to identify securities
outside of North America.
The CUSIP Number data identifier detects a 9-character alphanumeric pattern that matches
the CUSIP Number format.
This data identifier provides three breadths of detection:
■ The wide breadth detects a 9-character alphanumeric pattern with checksum validation.
See “CUSIP Number wide breadth” on page 1107.
■ The medium breadth detects a 9-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “CUSIP Number medium breadth” on page 1107.
■ The narrow breadth detects a 9-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords, excluding the NNA keyword.
See “CUSIP Number narrow breadth” on page 1108.

CUSIP Number wide breadth

The wide breadth detects a 9-character alphanumeric pattern with checksum validation. The
5th, 6th, 7th, and 8th characters can be letters or digits, and all others are digits.

Table 45-190 CUSIP Number wide-breadth pattern

Pattern

\w\d\w{6}\d

\w\d\w{4} \w{2} \d

Table 45-191 CUSIP Number wide-breadth validator

Mandatory validator Description

Cusip Validation Validator checks for invalid CUSIP ranges and computes the CUSIP checksum
(Modulus 10 Double Add Double algorithm).
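The Modulus 10 Double Add Double algorithm can be sketched in Python as follows. The sketch covers only the checksum; the guide does not list the invalid CUSIP ranges, so that part of the validator is not modeled. Character values follow the usual CUSIP convention (digits keep their value, A through Z map to 10 through 35, and *, @, and # map to 36, 37, and 38), and the function names are hypothetical.

```python
def cusip_check_digit(first8: str) -> int:
    """Modulus 10 Double Add Double check digit for the first 8 CUSIP characters."""
    total = 0
    for i, ch in enumerate(first8.upper()):
        if "0" <= ch <= "9":
            v = int(ch)
        elif "A" <= ch <= "Z":
            v = ord(ch) - ord("A") + 10
        elif ch == "*":
            v = 36
        elif ch == "@":
            v = 37
        elif ch == "#":
            v = 38
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if i % 2 == 1:  # double the 2nd, 4th, 6th, and 8th characters
            v *= 2
        total += v // 10 + v % 10  # add the digits of the (possibly doubled) value
    return (10 - total % 10) % 10


def cusip_valid(cusip: str) -> bool:
    s = cusip.replace(" ", "")  # the second pattern allows embedded spaces
    if len(s) != 9 or s[8] not in "0123456789":
        return False
    try:
        return cusip_check_digit(s[:8]) == int(s[8])
    except ValueError:
        return False


print(cusip_valid("037833100"))  # True
print(cusip_valid("17275R102"))  # True
```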

CUSIP Number medium breadth

The medium breadth detects a 9-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords. The 5th, 6th, 7th, and 8th characters can be
letters or digits, and all others are digits.

Table 45-192 CUSIP Number medium-breadth pattern

Pattern

\w\d\w{6}\d

\w\d\w{4} \w{2} \d

Table 45-193 CUSIP Number medium-breadth validator

Mandatory validator Description

Cusip Validation Validator checks for invalid CUSIP ranges and computes the CUSIP
checksum (Modulus 10 Double Add Double algorithm).

Find keywords With this option selected, at least one of the following keywords or key
phrases must be present for the data to be matched.

Inputs:

cusip, c.u.s.i.p., Committee on Uniform Security Identification
Procedures, American Bankers Association, Standard & Poor's, S&P,
National Numbering Association, NNA, National Securities
Identification Number

CUSIP Number narrow breadth

The narrow breadth detects a 9-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords, excluding the NNA keyword. The 5th, 6th,
7th, and 8th characters can be letters or digits, and all others are digits.

Table 45-194 CUSIP Number narrow-breadth pattern

Pattern

\w\d\w{6}\d

\w\d\w{4} \w{2} \d

Table 45-195 CUSIP Number narrow-breadth validators

Mandatory validator Description

Cusip Validation Validator checks for invalid CUSIP ranges and computes the CUSIP checksum
(Modulus 10 Double Add Double algorithm).

Find keywords With this option selected, at least one of the following keywords or key phrases
must be present for the data to be matched.
Inputs:

cusip, c.u.s.i.p., Committee on Uniform Security Identification Procedures,

American Bankers Association, Standard & Poor's, S&P, National Numbering
Association, National Securities Identification Number

Cyprus Tax Identification Number

The Cyprus Tax Identification Number is a unique identifier for Cypriot taxpayers.
The Cyprus Tax Identification Number data identifier detects a nine-character alphanumeric
pattern that matches the Cyprus Tax Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern that matches the Cyprus
Tax Identification Number format without checksum validation.
See “Cyprus Tax Identification Number wide breadth” on page 1109.
■ The medium breadth detects a nine-character alphanumeric pattern that matches the
Cyprus Tax Identification Number format with checksum validation.
See “Cyprus Tax Identification Number medium breadth” on page 1109.
■ The narrow breadth detects a nine-character alphanumeric pattern that matches the Cyprus
Tax Identification Number format with checksum validation. It also requires the presence
of related keywords.
See “Cyprus Tax Identification Number narrow breadth” on page 1110.

Cyprus Tax Identification Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern that matches the Cyprus Tax
Identification Number format without checksum validation.

Table 45-196 Cyprus Tax Identification Number wide-breadth patterns

Pattern

\d{8}[A-Za-z]

Table 45-197 Cyprus Tax Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
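Because the pattern ends in a letter, the number delimiter has to look at letters as well as digits; otherwise a piece of a longer alphanumeric token could match. The following Python sketch assumes that no letter or digit may touch either end of a match; the names are hypothetical.

```python
import re

# Table 45-196 pattern: eight digits followed by one letter.
# Assumed delimiter rule: no letter or digit may touch either end of the match.
CY_TIN = re.compile(r"(?<![A-Za-z0-9])\d{8}[A-Za-z](?![A-Za-z0-9])")


def find_cy_tin(text: str) -> list:
    return CY_TIN.findall(text)


print(find_cy_tin("TIN 12345678L; not 12345678LX or 912345678L"))  # ['12345678L']
```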

Cyprus Tax Identification Number medium breadth

The medium breadth detects a nine-character alphanumeric pattern that matches the Cyprus
Tax Identification Number format with checksum validation.

Table 45-198 Cyprus Tax Identification Number medium-breadth patterns

Pattern

\d{8}[A-Za-z]

Table 45-199 Cyprus Tax Identification Number medium-breadth validators

Mandatory validator Description

Cyprus Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Cyprus Tax Identification Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern that matches the Cyprus
Tax Identification Number format with checksum validation. It also requires the presence of
related keywords.

Table 45-200 Cyprus Tax Identification Number narrow-breadth patterns

Pattern

\d{8}[A-Za-z]

Table 45-201 Cyprus Tax Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Cyprus Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

tax identification number, tax number, tax id, cyprus
TIN number, taxid#, taxnumber#, αριθμός φορολογικού
μητρώου, Vergi Kimlik Numarası, vergi numarası,
Kıbrıs TIN numarası, tin, TIN, tin#, TIN#, tin no

Cyprus Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Cyprus, VAT is
administered by the tax office for the region in which the business is established.
The Cyprus Value Added Tax (VAT) Number data identifier detects an 11-character
alphanumeric pattern that matches the Cyprus VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern that matches the Cyprus
VAT Number format without checksum validation.
See “Cyprus Value Added Tax (VAT) Number wide breadth” on page 1111.
■ The medium breadth detects an 11-character alphanumeric pattern that matches the Cyprus
VAT Number format with checksum validation.
See “Cyprus Value Added Tax (VAT) Number medium breadth” on page 1111.
■ The narrow breadth detects an 11-character alphanumeric pattern that matches the Cyprus
VAT Number format with checksum validation. It also requires the presence of related
keywords.
See “Cyprus Value Added Tax (VAT) Number narrow breadth” on page 1112.

Cyprus Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern that matches the Cyprus VAT
Number format without checksum validation.

Table 45-202 Cyprus Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Cc][Yy]\d{8}[A-Za-z]

Table 45-203 Cyprus Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Cyprus Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern that matches the Cyprus
VAT Number format with checksum validation.

Table 45-204 Cyprus Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Cc][Yy]\d{8}[A-Za-z]

Table 45-205 Cyprus Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Cyprus Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.

Cyprus Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern that matches the Cyprus
VAT Number format with checksum validation. It also requires the presence of related keywords.

Table 45-206 Cyprus Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Cc][Yy]\d{8}[A-Za-z]

Table 45-207 Cyprus Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Cyprus Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat no, vat, vat number, vat#, VAT, VAT#, value added
tax, vatin, VATIN, KDV, kdv#, KDV numarası, Katma
değer Vergisi, Φόρος Προστιθέμενης Αξίας

Czech Republic Driver's Licence Number

The Czech Republic Ministry of Transport issues driver's licenses in the Czech Republic, which
confirm the holder's right to drive motor vehicles.
The Czech Republic Driver's Licence Number data identifier detects an eight-character
alphanumeric pattern that matches the Czech Republic Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the Czech
Republic Driver's Licence Number format. It checks for common test patterns.
See “Czech Republic Driver's License Number wide breadth” on page 1113.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the
Czech Republic Driver's Licence Number format. It checks for common test patterns, and
also requires the presence of related keywords.
See “Czech Republic Driver's License Number narrow breadth” on page 1113.

Czech Republic Driver's License Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the Czech
Republic Driver's Licence Number format. It checks for common test patterns.

Table 45-208 Czech Republic Driver's License Number wide-breadth patterns

Pattern

[Ee][A-Za-z] \d{6}

Table 45-209 Czech Republic Driver's License Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999
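The following Python sketch combines the pattern with the "Exclude ending characters" list. The delimiter rule (no letter or digit touching the match) is an assumption, and the names are hypothetical.

```python
import re

# Table 45-208 pattern: E or e, one letter, a space, then six digits.
# Assumed delimiter rule: no letter or digit may touch either end of the match.
CZ_DL = re.compile(r"(?<![A-Za-z0-9])[Ee][A-Za-z] \d{6}(?![A-Za-z0-9])")

# "Exclude ending characters" inputs: 000000, 111111, ..., 999999.
EXCLUDED_ENDINGS = tuple(str(d) * 6 for d in range(10))


def find_cz_dl(text: str) -> list:
    return [m for m in CZ_DL.findall(text) if not m.endswith(EXCLUDED_ENDINGS)]


print(find_cz_dl("EA 123456, EB 777777, XE 123456"))  # ['EA 123456']
```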

Czech Republic Driver's License Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the Czech
Republic Driver's Licence Number format. It checks for common test patterns, and also requires
the presence of related keywords.

Table 45-210 Czech Republic Driver's License Number narrow-breadth patterns

Pattern

[Ee][A-Za-z] \d{6}

Table 45-211 Czech Republic Driver's License Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, drivers license, driving license, driver
license number, drivers license number, driving license
number, DLNo#, dlno#, drivers lic., Driver's License,
Driver's License Number, driver's license number,
Driver's Licence Number, driver licence, drivers
licence, driving licence, Driver's Licence, driver permit,
drivers permit, driving permit, license number, licence
number

řidičský průkaz, řidičský prúkaz, číslo řidičského
průkazu, řidičské číslo řidičů, ovladače lic., Číslo
licence řidiče, Řidičský průkaz, povolení řidiče, řidiči
povolení, povolení k jízdě, číslo licence

Czech Republic Personal Identification Number

All citizens of the Czech Republic are issued a unique personal identification number by the
Ministry of Interior.
The Czech Republic Personal Identification Number data identifier detects a 9- or 10-digit
number that matches the Czech Personal Identification Number format.
This data identifier provides three breadths of detection:
■ The wide breadth detects a 9- or 10-digit number without checksum validation.
See “Czech Republic Personal Identification Number wide breadth” on page 1115.
■ The medium breadth detects a 9- or 10-digit number with checksum validation.
See “Czech Republic Personal Identification Number medium breadth” on page 1115.
■ The narrow breadth detects a 9- or 10-digit number with checksum validation. It also requires
the presence of related keywords.
See “Czech Republic Personal Identification Number narrow breadth” on page 1116.

Czech Republic Personal Identification Number wide breadth

The wide breadth detects a 9- or 10-digit number without checksum validation.

Table 45-212 Czech Republic Personal Identification Number wide-breadth patterns

Pattern

\d\d[0156]\d[0123]\d[/]\d\d\d

\d\d[0156]\d[0123]\d[/]\d\d\d\d

\d\d[0156]\d[0123]\d\d\d\d

\d\d[0156]\d[0123]\d\d\d\d\d

\d\d[0156]\d[012345678]\d[/]\d\d\d

\d\d[0156]\d[012345678]\d[/]\d\d\d\d

\d\d[0156]\d[012345678]\d\d\d\d

\d\d[0156]\d[012345678]\d\d\d\d\d

Table 45-213 Czech Republic Personal Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Czech Republic Personal Identification Number medium breadth

The medium breadth detects a 9- or 10-digit number with checksum validation.

Table 45-214 Czech Republic Personal Identification Number medium-breadth pattern

Pattern

\d\d[0156]\d[0123]\d[/]\d\d\d

\d\d[0156]\d[0123]\d[/]\d\d\d\d

\d\d[0156]\d[0123]\d\d\d\d

\d\d[0156]\d[0123]\d\d\d\d\d

\d\d[0156]\d[012345678]\d[/]\d\d\d

\d\d[0156]\d[012345678]\d[/]\d\d\d\d

\d\d[0156]\d[012345678]\d\d\d\d

\d\d[0156]\d[012345678]\d\d\d\d\d

Table 45-215 Czech Republic Personal Identification Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Czech Personal Identity Number Validation Check Computes the checksum and validates the pattern against it.

Exclude beginning characters 5555555555, 1111111111, 111111111
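The guide does not describe the checksum. The following Python sketch assumes the commonly described rule for Czech birth numbers (rodné číslo): 9-digit numbers carry no check digit, and 10-digit numbers must be divisible by 11, with a legacy exception that pairs a remainder of 10 on the first nine digits with check digit 0. It also applies the "Exclude beginning characters" inputs. It is an illustration, not Symantec's validator, and the function name is hypothetical.

```python
# "Exclude beginning characters" inputs from Table 45-215.
EXCLUDED_PREFIXES = ("5555555555", "1111111111", "111111111")


def czech_pin_valid(value: str) -> bool:
    """Hypothetical check for a Czech personal identification number."""
    digits = value.replace("/", "")
    if len(digits) not in (9, 10) or any(c not in "0123456789" for c in digits):
        return False
    if digits.startswith(EXCLUDED_PREFIXES):
        return False
    if len(digits) == 9:
        return True  # assumed: no check digit to verify
    remainder = int(digits[:9]) % 11
    check = int(digits[9])
    if remainder == 10:
        return check == 0  # assumed legacy exception
    # Equivalent to requiring the full 10-digit number to be divisible by 11.
    return check == remainder


print(czech_pin_valid("800101/1238"))  # True
print(czech_pin_valid("800101/1239"))  # False
```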

Czech Republic Personal Identification Number narrow breadth

The narrow breadth detects a 9- or 10-digit number with checksum validation. It also requires
the presence of related keywords.

Table 45-216 Czech Republic Personal Identification Number narrow-breadth patterns

Pattern

\d\d[0156]\d[0123]\d[/]\d\d\d

\d\d[0156]\d[0123]\d[/]\d\d\d\d

\d\d[0156]\d[0123]\d\d\d\d

\d\d[0156]\d[0123]\d\d\d\d\d

\d\d[0156]\d[012345678]\d[/]\d\d\d

\d\d[0156]\d[012345678]\d[/]\d\d\d\d

\d\d[0156]\d[012345678]\d\d\d\d

\d\d[0156]\d[012345678]\d\d\d\d\d

Table 45-217 Czech Republic Personal Identification Number narrow-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Czech Personal Identity Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

personal ID number, PID, personal identity number,

Czech Personal ID Number, identity no, Czech Republic
ID, republic identity number, national number,
insurance number, unique identification number, PID#,
Czechidno#, identityno#

Osobní identifikační číslo, Pojištění číslo, unikátní
identifikační číslo, Osobní identifikační číslo,
identifikační číslo

Czech Republic Tax Identification Number

The Czech Republic Tax Identification Number is a unique identifier for taxpayers in the Czech
Republic.
The Czech Republic Tax Identification Number data identifier detects a 9- to 10-character
alphanumeric pattern that matches the Czech Tax Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 9- to 10-character alphanumeric pattern that matches the Czech
Tax Identification Number format without checksum validation. It checks for common test
patterns.
See “Czech Republic Tax Identification Number wide breadth” on page 1118.
■ The medium breadth detects a 9- to 10-character alphanumeric pattern that matches the
Czech Tax Identification Number format with checksum validation.
See “Czech Republic Tax Identification Number medium breadth” on page 1119.
■ The narrow breadth detects a 9- to 10-character alphanumeric pattern that matches the
Czech Tax Identification Number format with checksum validation. It checks for common
test patterns, and also requires the presence of related keywords.
See “Czech Republic Tax Identification Number narrow breadth” on page 1120.

Czech Republic Tax Identification Number wide breadth

The wide breadth detects a 9- to 10-character alphanumeric pattern that matches the Czech
Republic Tax Identification Number format without checksum validation. It checks for common
test patterns.

Table 45-218 Czech Republic Tax Identification Number wide-breadth patterns

Pattern

\d{2}[05][1-9][012]\d{4,5}

\d{2}[05][1-9]3[01]\d{3,4}

\d{2}[05][1-9][012]\d[/]\d{3,4}

\d{2}[05][1-9]3[01][/]\d{3,4}

\d{2}[16][012]{2}\d{4,5}

\d{2}[16][012]3[01]\d{3,4}

\d{2}[16][012]{2}\d[/]\d{3,4}

\d{2}[16][012]3[01][/]\d{3,4}

\d{2}[27][1-9][012]\d{5}

\d{2}[27][1-9]3[01]\d{4}

\d{2}[27][1-9][012]\d[/]\d{4}

\d{2}[27][1-9]3[01][/]\d{4}

\d{2}[38][012]{2}\d{5}

\d{2}[38][012]3[01]\d{4}

\d{2}[38][012]{2}\d[/]\d{4}

\d{2}[38][012]3[01][/]\d{4}

Table 45-219 Czech Republic Tax Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following values is not
matched:

000000, 1111111, 2222222, 33333, 44444, 55555, 66666,

77777, 88888, 99999

Czech Republic Tax Identification Number medium breadth

The medium breadth detects a 9- to 10-character alphanumeric pattern that matches the Czech
Republic Tax Identification Number format with checksum validation.

Table 45-220 Czech Republic Tax Identification Number medium-breadth patterns

Pattern

\d{2}[05][1-9][012]\d{4,5}

\d{2}[05][1-9]3[01]\d{3,4}

\d{2}[05][1-9][012]\d[/]\d{3,4}

\d{2}[05][1-9]3[01][/]\d{3,4}

\d{2}[16][012]{2}\d{4,5}

\d{2}[16][012]3[01]\d{3,4}

\d{2}[16][012]{2}\d[/]\d{3,4}

\d{2}[16][012]3[01][/]\d{3,4}

\d{2}[27][1-9][012]\d{5}

\d{2}[27][1-9]3[01]\d{4}

\d{2}[27][1-9][012]\d[/]\d{4}

\d{2}[27][1-9]3[01][/]\d{4}

\d{2}[38][012]{2}\d{5}

\d{2}[38][012]3[01]\d{4}

\d{2}[38][012]{2}\d[/]\d{4}

\d{2}[38][012]3[01][/]\d{4}

Table 45-221 Czech Republic Tax Identification Number medium-breadth validators

Mandatory validator Description

Czech Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.
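The guide does not say which checksum the Czech Tax Identification Number Validation Check computes. The listed patterns follow the date-based layout of the Czech birth number (rodné číslo). Its publicly documented rule is that a 10-digit number must be divisible by 11, with one historical exception: when the first nine digits leave a remainder of 10 modulo 11, the check digit is 0. The Python sketch below assumes that rule and is not the product's implementation. Its handling of 9-digit numbers, which are accepted because they carry no check digit, is also an assumption. The function name is hypothetical.

```python
def cz_birth_number_checksum_ok(match: str) -> bool:
    """Assumed rule for Czech birth numbers; the slash separator is optional."""
    digits = match.replace("/", "")
    if not all(ch in "0123456789" for ch in digits):
        return False
    if len(digits) == 9:
        # Numbers issued before 1954 have no check digit; accepted here (assumption).
        return True
    if len(digits) != 10:
        return False
    remainder = int(digits[:9]) % 11
    check_digit = int(digits[9])
    if remainder == 10:
        # Historical exception: a remainder of 10 is written as check digit 0.
        return check_digit == 0
    # Equivalent to "the whole 10-digit number is divisible by 11".
    return check_digit == remainder
```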

Czech Republic Tax Identification Number narrow breadth

The narrow breadth detects a 9- to 10-character alphanumeric pattern that matches the Czech
Republic Tax Identification Number format with checksum validation. It checks for common
test patterns, and also requires the presence of related keywords.

Table 45-222 Czech Republic Tax Identification Number narrow-breadth patterns

Pattern

\d{2}[05][1-9][012]\d{4,5}

\d{2}[05][1-9]3[01]\d{3,4}

\d{2}[05][1-9][012]\d[/]\d{3,4}

\d{2}[05][1-9]3[01][/]\d{3,4}

\d{2}[16][012]{2}\d{4,5}

\d{2}[16][012]3[01]\d{3,4}

\d{2}[16][012]{2}\d[/]\d{3,4}

\d{2}[16][012]3[01][/]\d{3,4}

\d{2}[27][1-9][012]\d{5}

\d{2}[27][1-9]3[01]\d{4}

\d{2}[27][1-9][012]\d[/]\d{4}

\d{2}[27][1-9]3[01][/]\d{4}

\d{2}[38][012]{2}\d{5}

\d{2}[38][012]3[01]\d{4}

\d{2}[38][012]{2}\d[/]\d{4}

\d{2}[38][012]3[01][/]\d{4}

Table 45-223 Czech Republic Tax Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following values is not
matched:

000000, 1111111, 2222222, 33333, 44444, 55555, 66666,
77777, 88888, 99999

Czech Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

personal code, personal id, national identification number, national ID,
personal ID, personal identification number, nationalid#, personalid#,
personal identification code, PID#, tin, tax identification number, tin#,
tax id, tin no, tin number, tax number, tax code, taxpayer id,
taxpayer identification number

osobní kód, Národní identifikační číslo, osobní identifikační číslo,
cínové číslo, daňové identifikačné číslo, daňový poplatník id
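The guide does not specify how keywords are matched: case sensitivity, word boundaries, and how close a keyword must be to the number are all undocumented. This Python sketch assumes case-insensitive, whole-word matching anywhere in the text and does not model proximity. The names `keyword_pattern` and `has_keyword` are hypothetical, and only a subset of the listed inputs is used.

```python
import re

# A subset of the inputs from Table 45-223, copied as listed.
KEYWORDS = ["tin", "tin#", "tax id", "taxpayer id", "osobní kód"]


def keyword_pattern(keyword: str):
    """Build a case-insensitive pattern that matches the keyword as a whole word.

    A trailing word boundary is required only when the keyword ends in a
    letter or digit, so "tin#" still matches "tin#8501011233".
    """
    tail = r"(?!\w)" if keyword[-1].isalnum() else ""
    return re.compile(r"(?<!\w)" + re.escape(keyword) + tail, re.IGNORECASE)


def has_keyword(text: str, keywords: list) -> bool:
    """True if at least one keyword or key phrase occurs in the text."""
    return any(keyword_pattern(k).search(text) for k in keywords)
```

Short keywords such as `tin` show why some word-boundary rule is needed. A plain substring search would fire on words such as "continue".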

Czech Republic Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In the Czech Republic, it
is also called DPH.
The Czech Republic Value Added Tax (VAT) Number data identifier detects a 10- to
15-character alphanumeric pattern that matches the Czech Republic Value Added Tax (VAT)
Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 10- to 15-character alphanumeric pattern that matches the
Czech Value Added Tax (VAT) Number format without checksum validation. It checks for
common test patterns.
See “Czech Republic Value Added Tax (VAT) Number wide breadth” on page 1122.
■ The medium breadth detects a 10- to 15-character alphanumeric pattern that matches the
Czech Value Added Tax (VAT) Number format with checksum validation.

See “Czech Republic Value Added Tax (VAT) Number medium breadth” on page 1123.
■ The narrow breadth detects a 10- to 15-character alphanumeric pattern that matches the
Czech Value Added Tax (VAT) Number format with checksum validation. It checks for
common test patterns, and also requires the presence of related keywords.
See “Czech Republic Value Added Tax (VAT) Number narrow breadth” on page 1124.

Czech Republic Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 10- to 15-character alphanumeric pattern that matches the Czech
Value Added Tax (VAT) Number format without checksum validation. It checks for common
test numbers.

Table 45-224 Czech Republic Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Cc][Zz]\d{8,13}

[Cc][Zz] \d{8,13}

[Cc][Zz] \d{2} \d{2} \d{2} \d{2}

[Cc][Zz] \d{2} \d{2} \d{2} \d{2} \d{2}

[Cc][Zz]\d{3} \d{2} \d{3}

[Cc][Zz]\d{3} \d{2} \d{2} \d{3}

[Cc][Zz] \d{3} \d{2} \d{3}

[Cc][Zz] \d{3} \d{2} \d{2} \d{3}
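The VAT patterns can be tried in Python in the same way. The spaces in the patterns are literal, so `CZ 12 34 56 79` matches but `CZ-12345679` does not. The helper below also removes the prefix and separators so that the digits could be passed to a checksum routine. That normalization step is an assumption about how such a validator might be fed, and the function name is hypothetical.

```python
import re

# Patterns copied verbatim from Table 45-224; the spaces are literal.
CZ_VAT_PATTERNS = [
    r"[Cc][Zz]\d{8,13}",
    r"[Cc][Zz] \d{8,13}",
    r"[Cc][Zz] \d{2} \d{2} \d{2} \d{2}",
    r"[Cc][Zz] \d{2} \d{2} \d{2} \d{2} \d{2}",
    r"[Cc][Zz]\d{3} \d{2} \d{3}",
    r"[Cc][Zz]\d{3} \d{2} \d{2} \d{3}",
    r"[Cc][Zz] \d{3} \d{2} \d{3}",
    r"[Cc][Zz] \d{3} \d{2} \d{2} \d{3}",
]
CZ_VAT_RE = re.compile("|".join(f"(?:{p})" for p in CZ_VAT_PATTERNS))


def cz_vat_digits(token: str):
    """Return the digits after the CZ prefix if the whole token fits a pattern."""
    if CZ_VAT_RE.fullmatch(token) is None:
        return None
    return "".join(ch for ch in token if ch in "0123456789")
```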

Table 45-225 Czech Republic Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.


Table 45-225 Czech Republic Value Added Tax (VAT) Number wide-breadth validators
(continued)

Mandatory validator Description

Exclude ending characters Data ending with any of the following values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

0000000000, 1111111111, 2222222222, 3333333333,
4444444444, 5555555555, 6666666666, 7777777777,
8888888888, 9999999999

00000000000, 11111111111, 22222222222,
33333333333, 44444444444, 55555555555,
66666666666, 77777777777, 88888888888, 99999999999

000000000000, 111111111111, 222222222222,
333333333333, 444444444444, 555555555555,
666666666666, 777777777777, 888888888888,
999999999999

0000000000000, 1111111111111, 2222222222222,
3333333333333, 4444444444444, 5555555555555,
6666666666666, 7777777777777, 8888888888888,
9999999999999

Czech Republic Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 10- to 15-character alphanumeric pattern that matches the
Czech Value Added Tax (VAT) Number format with checksum validation.

Table 45-226 Czech Republic Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Cc][Zz]\d{8,13}

[Cc][Zz] \d{8,13}

[Cc][Zz] \d{2} \d{2} \d{2} \d{2}

[Cc][Zz] \d{2} \d{2} \d{2} \d{2} \d{2}

[Cc][Zz]\d{3} \d{2} \d{3}


Table 45-226 Czech Republic Value Added Tax (VAT) Number medium-breadth patterns
(continued)

Pattern

[Cc][Zz]\d{3} \d{2} \d{2} \d{3}

[Cc][Zz] \d{3} \d{2} \d{3}

[Cc][Zz] \d{3} \d{2} \d{2} \d{3}

Table 45-227 Czech Republic Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Czech Republic VAT Number Validation Check Computes the checksum and validates the pattern against
it.
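The guide does not document the Czech Republic VAT Number Validation Check. For Czech legal entities, the publicly documented VAT number (DIČ) is CZ followed by eight digits. The last digit is a check digit computed from the first seven digits with weights 8 down to 2: c = 11 - (sum mod 11), where 10 maps to 0 and 11 maps to 1. Individuals use 9- or 10-digit numbers derived from the birth number, and this sketch does not handle them. The sketch assumes the rule above, and `cz_legal_entity_vat_ok` is a hypothetical name.

```python
# Weights applied to the first seven digits (assumed public rule).
DIC_WEIGHTS = (8, 7, 6, 5, 4, 3, 2)


def cz_legal_entity_vat_ok(digits: str) -> bool:
    """Check an 8-digit Czech legal-entity VAT number (digits only, no CZ)."""
    if len(digits) != 8 or not all(ch in "0123456789" for ch in digits):
        return False
    # zip() stops after seven pairs, so the check digit is not weighted.
    total = sum(w * int(d) for w, d in zip(DIC_WEIGHTS, digits))
    candidate = 11 - total % 11  # value in the range 1..11
    expected = {10: 0, 11: 1}.get(candidate, candidate)
    return int(digits[7]) == expected
```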

Czech Republic Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 10- to 15-character alphanumeric pattern that matches the
Czech Value Added Tax (VAT) Number format with checksum validation. It checks for common
test numbers, and also requires the presence of related keywords.

Table 45-228 Czech Republic Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Cc][Zz]\d{8,13}

[Cc][Zz] \d{8,13}

[Cc][Zz] \d{2} \d{2} \d{2} \d{2}

[Cc][Zz] \d{2} \d{2} \d{2} \d{2} \d{2}

[Cc][Zz]\d{3} \d{2} \d{3}

[Cc][Zz]\d{3} \d{2} \d{2} \d{3}

[Cc][Zz] \d{3} \d{2} \d{3}

[Cc][Zz] \d{3} \d{2} \d{2} \d{3}

Table 45-229 Czech Republic Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.


Table 45-229 Czech Republic Value Added Tax (VAT) Number narrow-breadth validators
(continued)

Mandatory validator Description

Exclude ending characters Data ending with any of the following values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

0000000000, 1111111111, 2222222222, 3333333333,
4444444444, 5555555555, 6666666666, 7777777777,
8888888888, 9999999999

00000000000, 11111111111, 22222222222,
33333333333, 44444444444, 55555555555,
66666666666, 77777777777, 88888888888, 99999999999

000000000000, 111111111111, 222222222222,
333333333333, 444444444444, 555555555555,
666666666666, 777777777777, 888888888888,
999999999999

0000000000000, 1111111111111, 2222222222222,
3333333333333, 4444444444444, 5555555555555,
6666666666666, 7777777777777, 8888888888888,
9999999999999

Czech Republic VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat number, value added tax, vat, VAT, VAT#, vat#, VATIN, vatin

číslo DPH, Daň z přidané hodnoty, Dan z pridané hodnoty,
Daň přidané hodnoty, Dan pridané hodnoty, DPH, DIC, DIČ

Denmark Personal Identification Number

In Denmark, every citizen has a national identification number. The number serves as proof
of identification for most purposes.
The Denmark Personal Identification Number data identifier detects a 10-digit number that
matches the Denmark Personal Identification Number format.
The Denmark Personal Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Denmark Personal Identification Number wide breadth” on page 1126.
■ The medium breadth detects a 10-digit number with checksum validation.
See “Denmark Personal Identification Number medium breadth” on page 1126.
■ The narrow breadth detects a 10-digit number with checksum validation. It also requires
the presence of related keywords.
See “Denmark Personal Identification Number narrow breadth” on page 1127.

Denmark Personal Identification Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-230 Denmark Personal Identification Number wide-breadth patterns

Patterns

\d{6}[ -]\d{4}

\d{6}[ -]\l{4}

\d{10}

Table 45-231 Denmark Personal Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
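The following is a minimal sketch of the Duplicate digits validator. It assumes the validator ignores separators such as `-` and space and rejects a match only when every digit is identical. The function name is hypothetical.

```python
def passes_duplicate_digits(match: str) -> bool:
    """Reject a match whose digits are all the same, such as 111111-1111.

    Non-digit characters are ignored (assumption). A match with no digits
    at all is also rejected by this sketch.
    """
    distinct_digits = {ch for ch in match if ch in "0123456789"}
    return len(distinct_digits) > 1
```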

Denmark Personal Identification Number medium breadth

The medium breadth detects a 10-digit number with checksum validation.

Table 45-232 Denmark Personal Identification Number medium-breadth patterns

Patterns

\d{6}[ -]\d{4}

\d{6}[ -]\l{4}

\d{10}

Table 45-233 Denmark Personal Identification Number medium-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Denmark Personal Identification Number Validation Check Checksum validator for the Denmark
Personal Identification Number.
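The guide does not describe the checksum itself. The traditional rule for the Danish CPR number (DDMMYY-SSSS) is modulus 11: each of the 10 digits is multiplied by the weights 4, 3, 2, 7, 6, 5, 4, 3, 2, 1, and the sum must be divisible by 11. Public CPR documentation notes that, since 2007, some numbers have been issued that do not satisfy this check, so the rule should be treated as an approximation. This Python sketch assumes the traditional rule, and `dk_cpr_mod11_ok` is a hypothetical name.

```python
# Weights for the traditional CPR modulus-11 check (assumed rule).
CPR_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def dk_cpr_mod11_ok(match: str) -> bool:
    """Return True if the 10 digits of a DDMMYY-SSSS match pass modulus 11.

    Separators ("-" or " ") are removed first. This sketch rejects matches
    that contain letters, such as those found by the \\l{4} pattern.
    """
    digits = match.replace("-", "").replace(" ", "")
    if len(digits) != 10 or not all(ch in "0123456789" for ch in digits):
        return False
    total = sum(w * int(d) for w, d in zip(CPR_WEIGHTS, digits))
    return total % 11 == 0
```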

Denmark Personal Identification Number narrow breadth

The narrow breadth detects a 10-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-234 Denmark Personal Identification Number narrow-breadth patterns

Patterns

\d{6}[ -]\d{4}

\d{6}[ -]\l{4}

\d{10}

Table 45-235 Denmark Personal Identification Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Denmark Personal Identification Number Validation Check Checksum validator for the Denmark
Personal Identification Number.

Table 45-235 Denmark Personal Identification Number narrow-breadth validators (continued)

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

national identification number, national identity number, personal identity number,
personal identification number, nationalid#, personalidentityno#,
unique identity number, uniqueidentityno#

Nationalt identifikationsnummer, personnummer, unikt identifikationsnummer,
identifikationsnummer, centrale personregister, cpr, cpr-nummer, cpr#,
cpr-nummer#, identifikationsnummer#, personnummer#

Denmark Tax Identification Number

Denmark issues a tax identification number for persons who have obligations to declare taxes
in Denmark. The tax identification number also serves as a personal health insurance number.
The Denmark Tax Identification Number data identifier detects a 10-digit number that matches
the Denmark Tax Identification Number format.
The Denmark Tax Identification Number data identifier offers three breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Denmark Tax Identification Number wide breadth” on page 1128.
■ The medium breadth detects a 10-digit number with checksum validation.
See “Denmark Tax Identification Number medium breadth” on page 1129.
■ The narrow breadth detects a 10-digit number with checksum validation. It also requires
the presence of related keywords.
See “Denmark Tax Identification Number narrow breadth” on page 1129.

Denmark Tax Identification Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-236 Denmark Tax Identification Number wide-breadth pattern

Pattern

\d{6}-\d{4}

Table 45-237 Denmark Tax Identification Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Denmark Tax Identification Number medium breadth

The medium breadth detects a 10-digit number with checksum validation.

Table 45-238 Denmark Tax Identification Number medium-breadth pattern

Pattern

\d{6}-\d{4}

Table 45-239 Denmark Tax Identification Number medium-breadth validator

Mandatory validator Description

Denmark Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Denmark Tax Identification Number narrow breadth

The narrow breadth detects a 10-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-240 Denmark Tax Identification Number narrow-breadth pattern

Pattern

\d{6}-\d{4}

Table 45-241 Denmark Tax Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Denmark Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Table 45-241 Denmark Tax Identification Number narrow-breadth validators (continued)

Mandatory validator Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

tax id, tax number, tax identification number, tax code

skat id, skattenummer, skat identifikationsnummer, skat kode

cpr number, cpr#, taxid#, cpr, CPR, health insurance, health insurance number,
health card number, health card, travel health insurance card,
health insurance card number

sygesikring, Sundhedsforsikringsnummer,
sundhedskortnummer, sundhedskort,
REJSESYGESIKRINGSKORT,
Sundhedsforsikringskort, sygesikringkortnummer,
Krankenkassennummer, Gesundheitskarte Nummer,
ReisekrankenversicherungskarteNummer,
GesundheitsVersicherungkarte Nummer

Denmark Value Added Tax (VAT) Number

VAT is a consumption tax that is borne by the end consumer. VAT is paid for each transaction
in the manufacturing and distribution process. For Denmark, the VAT number is issued by the
tax office for the region in which the business is established.
The Denmark Value Added Tax (VAT) Number data identifier detects a 10-character alphanumeric
pattern that matches the Denmark Value Added Tax (VAT) Number format.
The Denmark Value Added Tax (VAT) Number data identifier provides three breadths of
detection:
■ The wide breadth detects a 10-character alphanumeric pattern beginning with DK, without
checksum validation.
See “Denmark Value Added Tax (VAT) Number wide breadth” on page 1131.
■ The medium breadth detects a 10-character alphanumeric pattern beginning with DK, with
checksum validation.
See “Denmark Value Added Tax (VAT) Number medium breadth” on page 1131.
■ The narrow breadth detects a 10-character alphanumeric pattern beginning with DK, with
checksum validation. It also requires the presence of related keywords.

See “Denmark Value Added Tax (VAT) Number narrow breadth” on page 1132.

Denmark Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 10-character alphanumeric pattern beginning with DK, without
checksum validation.

Table 45-242 Denmark Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Dd][Kk]\d{8}

[Dd][Kk] \d{8}

[Dd][Kk] \d{3} \d{3} \d{2}

[Dd][Kk] \d{3}-\d{3}-\d{2}

[Dd][Kk] \d{3}.\d{3}.\d{2}

[Dd][Kk]-\d{8}

[Dd][Kk] \d{3},\d{3},\d{2}

Table 45-243 Denmark Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999
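The Denmark VAT patterns can be tried in Python as well. One detail is worth noting when reading Table 45-242. In standard regular-expression syntax, an unescaped `.` matches any character except a newline, so the pattern `[Dd][Kk] \d{3}.\d{3}.\d{2}` as written also accepts separators other than a period. The Python sketch below keeps the patterns verbatim to show this behavior. It says nothing about how the product's own engine interprets `.`, and the function name is hypothetical.

```python
import re

# Patterns copied verbatim from Table 45-242.
DK_VAT_PATTERNS = [
    r"[Dd][Kk]\d{8}",
    r"[Dd][Kk] \d{8}",
    r"[Dd][Kk] \d{3} \d{3} \d{2}",
    r"[Dd][Kk] \d{3}-\d{3}-\d{2}",
    r"[Dd][Kk] \d{3}.\d{3}.\d{2}",  # unescaped "." matches any non-newline character
    r"[Dd][Kk]-\d{8}",
    r"[Dd][Kk] \d{3},\d{3},\d{2}",
]
DK_VAT_RE = re.compile("|".join(f"(?:{p})" for p in DK_VAT_PATTERNS))


def matches_dk_vat_format(token: str) -> bool:
    """True if the whole token fits one of the listed patterns."""
    return DK_VAT_RE.fullmatch(token) is not None
```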

Denmark Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 10-character alphanumeric pattern beginning with DK, with checksum
validation.

Table 45-244 Denmark Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Dd][Kk]\d{8}

[Dd][Kk] \d{8}

Table 45-244 Denmark Value Added Tax (VAT) Number medium-breadth patterns (continued)

Patterns

[Dd][Kk] \d{3} \d{3} \d{2}

[Dd][Kk] \d{3}-\d{3}-\d{2}

[Dd][Kk] \d{3}.\d{3}.\d{2}

[Dd][Kk]-\d{8}

[Dd][Kk] \d{3},\d{3},\d{2}

Table 45-245 Denmark Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Denmark VAT Number Validation Check Computes the checksum and validates the pattern against
it.
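The guide does not document the Denmark VAT Number Validation Check. A Danish VAT number is normally the 8-digit CVR number with a DK prefix. The publicly described CVR rule is a modulus-11 check: the digits are multiplied by the weights 2, 7, 6, 5, 4, 3, 2, 1, and the sum must be divisible by 11. This sketch assumes that rule. `dk_vat_checksum_ok` is a hypothetical name, and the example numbers are made up.

```python
# CVR modulus-11 weights (assumed public rule).
CVR_WEIGHTS = (2, 7, 6, 5, 4, 3, 2, 1)


def dk_vat_checksum_ok(token: str) -> bool:
    """Drop the DK prefix and any separators, then apply the modulus-11 check."""
    digits = [ch for ch in token if ch in "0123456789"]
    if len(digits) != 8:
        return False
    total = sum(w * int(d) for w, d in zip(CVR_WEIGHTS, digits))
    return total % 11 == 0
```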

Denmark Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 10-character alphanumeric pattern beginning with DK, with checksum
validation. It also requires the presence of related keywords.

Table 45-246 Denmark Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Dd][Kk]\d{8}

[Dd][Kk] \d{8}

[Dd][Kk] \d{3} \d{3} \d{2}

[Dd][Kk] \d{3}-\d{3}-\d{2}

[Dd][Kk] \d{3}.\d{3}.\d{2}

[Dd][Kk]-\d{8}

[Dd][Kk] \d{3},\d{3},\d{2}

Table 45-247 Denmark Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.


Table 45-247 Denmark Value Added Tax (VAT) Number narrow-breadth validators (continued)

Mandatory validators Description

Exclude ending characters Data ending with any of the following values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999

Denmark VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

vat number, vat, vat#, vat no., value added tax number,
vat identification number

moms, momsnummer, moms identifikationsnummer, merværdiafgift

Driver's License Number – CA State

The California (CA) state driver's license number is the identifier for an individual's driver's
license issued by the US state of California.
The Driver's License Number – CA State data identifier detects an eight-character
alphanumeric pattern that matches the Driver's License Number – CA State format.
This data identifier provides two breadths of validation:
■ The wide breadth detects an eight-character alphanumeric pattern beginning with a letter
followed by seven numerals.
See “Driver's License Number – CA State wide breadth” on page 1133.
■ The medium breadth validates a detected number against keywords.
See “Driver's License Number – CA State medium breadth” on page 1134.

Driver's License Number – CA State wide breadth

The wide breadth detects an eight-character alphanumeric pattern beginning with a letter
followed by seven numerals.

Note: This breadth option does not include any validators.


Table 45-248 Driver's License Number wide-breadth pattern

Pattern

\l\d{7}

Driver's License Number – CA State medium breadth

The medium breadth detects an eight-character alphanumeric pattern beginning with a letter
followed by seven numerals. It validates a detected number by requiring a driver's license
keyword AND a California-related keyword.

Table 45-249 Driver's License Number – CA State medium-breadth pattern

Pattern

\l\d{7}

Table 45-250 Driver's License Number – CA State medium-breadth validators

Mandatory validators Description

Find keywords With this option selected, at least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, drivers license, driver's license, driver licenses, drivers licenses,
driver's licenses, dl#, dls#, lic#, lics#

Find keywords With this option selected, at least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

ca, calif, california
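The medium breadth combines one pattern with two keyword groups that must both be satisfied. The Python sketch below illustrates that AND logic. It rests on three assumptions. First, `\l` is taken to mean one ASCII letter, because Python's `re` has no `\l` escape. Second, keywords are matched case-insensitively as whole words anywhere in the text. Third, a candidate must not touch other letters or digits; this boundary rule is the sketch's own choice. All names are hypothetical.

```python
import re

# Inputs copied from Table 45-250. Both groups must be satisfied.
DL_KEYWORDS = ["driver license", "drivers license", "driver's license",
               "driver licenses", "drivers licenses", "driver's licenses",
               "dl#", "dls#", "lic#", "lics#"]
CA_KEYWORDS = ["ca", "calif", "california"]

# Table 45-249 pattern \l\d{7}, with \l assumed to mean [A-Za-z].
CA_DL_RE = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]\d{7}(?![A-Za-z0-9])")


def has_any(text: str, keywords) -> bool:
    """Case-insensitive whole-word search (the matching rules are an assumption)."""
    for kw in keywords:
        tail = r"(?!\w)" if kw[-1].isalnum() else ""
        if re.search(r"(?<!\w)" + re.escape(kw) + tail, text, re.IGNORECASE):
            return True
    return False


def find_ca_dl_medium(text: str) -> list:
    """Return candidate numbers only when a driver's license keyword AND a
    California keyword both occur somewhere in the text."""
    if not (has_any(text, DL_KEYWORDS) and has_any(text, CA_KEYWORDS)):
        return []
    return CA_DL_RE.findall(text)
```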

Driver's License Number - FL, MI, MN States

The driver's license numbers for the states of Florida (FL), Michigan (MI), and Minnesota (MN)
are the identifiers for an individual's driver's license issued by one of those US states. These
states are grouped together because they share a common pattern for this number.
This data identifier detects a 13-character alphanumeric pattern that matches the Driver's
License Number - FL, MI, MN States format.
This data identifier provides two breadths of validation:

■ The wide breadth detects any 13-character alphanumeric pattern with a letter followed by
12 digits.
See “Driver's License Number- FL, MI, MN States wide breadth” on page 1135.
■ The medium breadth narrows the scope by requiring the presence of keywords.
See “Driver's License Number- FL, MI, MN States medium breadth” on page 1135.

Driver's License Number- FL, MI, MN States wide breadth

The wide breadth of this data identifier detects any 13-character string with a letter followed
by 12 digits.
For the MN license number, the following format is matched: L-DDD-DDD-DDD-DDD.

Note: This breadth option does not include any validators.

Table 45-251 Driver's License Number- FL, MI, MN States wide-breadth patterns

Patterns

\l \d{3} \d{3} \d{3} \d{3}

\l\d{12}

\l\d{3}-\d{3}-\d{2}-\d{3}-\d

\l-\d{3}-\d{3}-\d{3}-\d{3}
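These patterns use `\l`, which the surrounding text suggests stands for a single letter. Python's `re` rejects `\l` as a bad escape, so this sketch translates it to `[A-Za-z]` before compiling. That translation is an assumption, `to_python_regex` is a hypothetical helper, and the sample numbers are made up.

```python
import re

# Patterns copied verbatim from Table 45-251 (Symantec syntax).
FL_MI_MN_PATTERNS = [
    r"\l \d{3} \d{3} \d{3} \d{3}",
    r"\l\d{12}",
    r"\l\d{3}-\d{3}-\d{2}-\d{3}-\d",
    r"\l-\d{3}-\d{3}-\d{3}-\d{3}",
]


def to_python_regex(pattern: str) -> str:
    """Translate Symantec's \\l (assumed: one letter) into a Python class."""
    return pattern.replace(r"\l", "[A-Za-z]")


FL_MI_MN_RE = re.compile(
    "|".join(f"(?:{to_python_regex(p)})" for p in FL_MI_MN_PATTERNS)
)


def matches_fl_mi_mn_format(token: str) -> bool:
    """True if the whole token fits one of the translated patterns."""
    return FL_MI_MN_RE.fullmatch(token) is not None
```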

Driver's License Number- FL, MI, MN States medium breadth

The medium breadth of this data identifier uses patterns to detect any 13-character string
with a letter followed by 12 digits. For the MN license number, the following format is
matched: L-DDD-DDD-DDD-DDD.
This data identifier validates the number by requiring the presence of a driver's license keyword
AND a state-related keyword.

Table 45-252 Driver's License Number- FL, MI, MN States medium-breadth patterns

Patterns

\l \d{3} \d{3} \d{3} \d{3}

\l\d{12}

\l\d{3}-\d{3}-\d{2}-\d{3}-\d

Table 45-252 Driver's License Number- FL, MI, MN States medium-breadth patterns
(continued)

Patterns

\l-\d{3}-\d{3}-\d{3}-\d{3}

Table 45-253 Driver's License Number- FL, MI, MN States medium-breadth validators

Mandatory validators Description

Find keywords Requires at least one of the input keywords or key phrases to be present for the
data to be matched.

Inputs:

driver license, drivers license, driver's license, driver licenses, drivers licenses,
driver's licenses, dl#, dls#, lic#, lics#

Find keywords Requires at least one of the input keywords or key phrases to be present for the
data to be matched.

Inputs:

fla, fl, florida, michigan, mi, minnesota, mn

Driver's License Number - IL State

The Illinois (IL) state driver's license number is a 12-character alphanumeric string that identifies
an individual's driver's license issued by the US state of Illinois.
The Driver's License Number - IL State data identifier detects a 12-character alphanumeric
pattern that matches the Driver's License Number - IL State format.
This data identifier provides two breadths of validation:
■ The wide breadth detects the presence of a 12-character alphanumeric pattern without
validation.
See “Driver's License Number- IL State wide breadth” on page 1136.
■ The medium breadth narrows the scope by requiring the presence of keywords.
See “Driver's License Number- IL State medium breadth” on page 1137.

Driver's License Number- IL State wide breadth

The wide breadth detects a 12-character alphanumeric pattern, beginning with a letter (the
first letter of the person's last name) followed by 11 digits.

Note: This breadth option does not include any validators.

Table 45-254 Driver's License Number- IL State wide-breadth patterns

Patterns

\l\d{3}-\d{4}-\d{4}

\l\d{11}

Driver's License Number- IL State medium breadth

The medium breadth detects a 12-character string, beginning with a letter (the first letter of
the person's last name) followed by 11 digits.
This breadth also requires the presence of both a driver's license keyword AND an
Illinois-related keyword.

Table 45-255 Driver's License Number- IL State medium-breadth patterns

Patterns

\l\d{3}-\d{4}-\d{4}

\l\d{11}

Table 45-256 Driver's License Number- IL State medium-breadth validators

Mandatory validators Description

Find keywords Requires at least one of the input keywords or key phrases
to be present for the data to be matched.

Inputs:

driver license, drivers license, driver's license, driver licenses,
drivers licenses, driver's licenses, dl#, dls#, lic#, lics#

Find keywords Requires at least one of the input keywords or key phrases
to be present for the data to be matched.
Inputs:

il, illinois

Driver's License Number - NJ State

The New Jersey (NJ) state driver's license number is a 15-character alphanumeric pattern
that identifies an individual's driver's license issued by the US state of New Jersey.
The Driver's License Number - NJ State data identifier detects a 15-character alphanumeric
pattern that matches the Driver's License Number - NJ State format.
This data identifier provides two breadths of validation:
■ The wide breadth detects a 15-character alphanumeric pattern without validation.
See “Driver's License Number- NJ State wide breadth” on page 1138.
■ The medium breadth narrows the scope by requiring the presence of related keywords.
See “Driver's License Number- NJ State medium breadth” on page 1138.

Driver's License Number- NJ State wide breadth

The wide breadth detects a 15-character alphanumeric pattern, beginning with a letter (the
first letter of the person's last name) followed by 14 digits.

Note: The wide breadth option does not include any validators.

Table 45-257 Driver's License Number- NJ State wide-breadth patterns

Patterns

\l\d{4} \d{5} \d{5}

\l\d{14}

Driver's License Number- NJ State medium breadth

The medium breadth detects a 15-character alphanumeric pattern, beginning with a letter (the
first letter of the person's last name) followed by 14 digits.
This breadth also requires the presence of both a driver's license keyword AND a New
Jersey-related keyword.

Table 45-258 Driver's License Number- NJ State medium-breadth patterns

Patterns

\l\d{4} \d{5} \d{5}

\l\d{14}

Table 45-259 Driver's License Number- NJ State medium-breadth validators

Mandatory validators Description

Find keywords Requires at least one of the input keywords or key phrases
to be present for the data to be matched.

Inputs:

driver license, drivers license, driver's license, driver licenses,
drivers licenses, driver's licenses, dl#, dls#, lic#, lics#

Find keywords Requires at least one of the input keywords or key phrases
to be present for the data to be matched.

Inputs:

nj, new jersey, newjersey

Driver's License Number - NY State

The New York (NY) state driver's license number is a nine-digit identifier for an individual's
driver's license issued by the US state of New York.
The Driver's License Number - NY State data identifier detects a nine-digit number that matches
the Driver's License Number - NY State format.
This data identifier provides two breadths of validation:
■ The wide breadth detects a string of nine digits without validation.
See “Driver's License Number- NY State wide breadth” on page 1139.
■ The medium breadth narrows the scope by requiring the presence of related keywords.
See “Driver's License Number - NY State medium breadth” on page 1140.

Driver's License Number- NY State wide breadth

The wide breadth detects a nine-digit string without validation.

Note: The wide breadth option does not include any validators.

Table 45-260 Driver's License Number- NY State wide-breadth patterns

Patterns

\d{3} \d{3} \d{3}


Table 45-260 Driver's License Number- NY State wide-breadth patterns (continued)

Patterns

\d{9}

Driver's License Number - NY State medium breadth

The medium breadth detects a nine-digit number.
This breadth also requires the presence of both a driver's license keyword AND a New
York–related keyword.

Table 45-261 Driver's License Number- NY State medium-breadth patterns

Patterns

\d{3} \d{3} \d{3}

\d{9}

Table 45-262 Driver's License Number- NY State medium-breadth validators

Mandatory validators Description

Find keywords Requires at least one of the input keywords or key phrases to be present for the
data to be matched.

Inputs:

driver license, drivers license, driver's license, driver licenses, drivers licenses,
driver's licenses, dl#, dls#, lic#, lics#

Find keywords Requires at least one of the input keywords or key phrases to be present for the
data to be matched.

Inputs:

new york, ny, newyork

Driver's License Number - WA State

The Washington (WA) state driver's license number is the identifier for an individual's driver's
license issued by the US state of Washington.
The Driver's License Number - WA State data identifier detects alphanumeric patterns that
match the Driver's License Number - WA State format.
The Driver's License Number - WA State data identifier provides three breadths of detection:
■ The wide breadth detects a Washington State driver's license with no validation.

See “Driver's License Number - WA State wide breadth” on page 1141.

■ The medium breadth detects a Washington State driver's license with checksum validation.
See “Driver's License Number - WA State medium breadth” on page 1141.
■ The narrow breadth detects a Washington State driver's license with checksum validation.
It also requires the presence of related keywords.
See “Driver's License Number - WA State narrow breadth” on page 1142.

Driver's License Number - WA State wide breadth

The wide breadth detects a Washington State driver's license with no validation.

Table 45-263 Driver's License Number - WA State wide-breadth patterns

Pattern

\l{5}\l[A-Za-z*]\d{3}\w{2}

\l{4}[*]\l[A-Za-z*]\d{3}\w{2}

\l{3}[*]{2}\l[A-Za-z*]\d{3}\w{2}

\l{2}[*]{3}\l[A-Za-z*]\d{3}\w{2}

\l{1}[*]{4}\l[A-Za-z*]\d{3}\w{2}

The wide breadth of the Driver's License Number - WA State data identifier does not include
a validator.
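The five WA State patterns differ only in how many of the first five positions are `*` padding characters. The sketch below translates them into Python regular expressions; the translation is necessary because Python's `re` module rejects `\l` as a bad escape. It assumes that `\l` means one ASCII letter and that `\w` means one letter or digit. The function names are hypothetical, and no checksum is attempted because the WA State validation algorithm is not described in this guide.

```python
import re

# Patterns copied from the WA State pattern tables.
WA_PATTERNS = [
    r"\l{5}\l[A-Za-z*]\d{3}\w{2}",
    r"\l{4}[*]\l[A-Za-z*]\d{3}\w{2}",
    r"\l{3}[*]{2}\l[A-Za-z*]\d{3}\w{2}",
    r"\l{2}[*]{3}\l[A-Za-z*]\d{3}\w{2}",
    r"\l{1}[*]{4}\l[A-Za-z*]\d{3}\w{2}",
]


def to_python_regex(pattern: str) -> str:
    r"""Assumption: \l is one ASCII letter and \w is one letter or digit."""
    return pattern.replace(r"\l", "[A-Za-z]").replace(r"\w", "[A-Za-z0-9]")


# One alternation of all five translated patterns.
WA_WIDE = re.compile("|".join(f"(?:{to_python_regex(p)})" for p in WA_PATTERNS))


def wa_wide_matches(text: str) -> list[str]:
    # Wide breadth: pattern match only, no validator.
    return WA_WIDE.findall(text)
```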

Driver's License Number - WA State medium breadth

The medium breadth detects a Washington State driver's license with checksum validation.

Table 45-264 Driver's License Number - WA State medium-breadth patterns

Pattern

\l{5}\l[A-Za-z*]\d{3}\w{2}

\l{4}[*]\l[A-Za-z*]\d{3}\w{2}

\l{3}[*]{2}\l[A-Za-z*]\d{3}\w{2}

\l{2}[*]{3}\l[A-Za-z*]\d{3}\w{2}

\l{1}[*]{4}\l[A-Za-z*]\d{3}\w{2}

Table 45-265 Driver's License Number - WA State medium-breadth validators

Mandatory validator Description

Driver's License Number - WA State Validation Check Computes the checksum and validates the pattern against it.

Driver's License Number - WA State narrow breadth

The narrow breadth detects a Washington State driver's license with checksum validation. It
also requires the presence of related keywords.

Table 45-266 Driver's License Number - WA State narrow-breadth patterns

Pattern

\l{5}\l[A-Za-z*]\d{3}\w{2}

\l{4}[*]\l[A-Za-z*]\d{3}\w{2}

\l{3}[*]{2}\l[A-Za-z*]\d{3}\w{2}

\l{2}[*]{3}\l[A-Za-z*]\d{3}\w{2}

\l{1}[*]{4}\l[A-Za-z*]\d{3}\w{2}

Table 45-267 Driver's License Number - WA State narrow-breadth validators

Mandatory validator Description

Driver's License Number - WA State Validation Check Computes the checksum and validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

driver license, drivers license, driver licenses, drivers
licenses, dl#, dls#, lic#, lics#, wash, washington, wa

Driver's License Number - WI State

The Driver's License Number - WI State is an identification number for an individual driver's
license issued by the US state of Wisconsin.
The Driver's License Number - WI State data identifier detects a 13-digit number that matches
the Driver's License Number - WI State format.
The Driver's License Number - WI State data identifier provides three breadths of detection.
■ The wide breadth detects a 13-digit number with ending-character exclusion validation.
See “Driver's License Number - WI State wide breadth” on page 1143.
■ The medium breadth detects a 13-digit number with ending-character exclusion and checksum
validation.
See “Driver's License Number - WI State medium breadth” on page 1143.
■ The narrow breadth detects a 13-digit number with ending-character exclusion and checksum
validation. It also requires the presence of related keywords.
See “Driver's License Number - WI State narrow breadth” on page 1144.

Driver's License Number - WI State wide breadth

The wide breadth detects a 13-digit number with ending-character exclusion validation.

Table 45-268 Driver's License Number - WI State wide-breadth patterns

Pattern

\l\d{3}-\d{4}-\d{4}-\d{2}

\l\d{13}

Table 45-269 Driver's License Number - WI State wide-breadth validator

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000000000, 1111111111111, 2222222222222,
3333333333333, 4444444444444, 5555555555555,
6666666666666, 7777777777777, 8888888888888,
9999999999999
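A minimal sketch of the wide breadth, pairing the two WI State patterns with the Exclude ending characters validator. It assumes that `\l` means one ASCII letter and that dashes are removed before the ending check, so that `W000-0000-0000-00` is treated like `W0000000000000`. The guide does not say how the product normalizes separators, and the function name is hypothetical.

```python
import re

# Wide-breadth patterns, with \l rewritten as [A-Za-z] (an assumption).
WI_WIDE = re.compile(r"[A-Za-z]\d{3}-\d{4}-\d{4}-\d{2}|[A-Za-z]\d{13}")

# The ten values from the "Exclude ending characters" validator.
EXCLUDED_ENDINGS = tuple(str(d) * 13 for d in range(10))


def wi_wide_matches(text: str) -> list[str]:
    results = []
    for match in WI_WIDE.finditer(text):
        candidate = match.group(0)
        # Assumption: dashes are ignored when comparing against the endings.
        compact = candidate.replace("-", "")
        if not compact.endswith(EXCLUDED_ENDINGS):
            results.append(candidate)
    return results
```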

Driver's License Number - WI State medium breadth

The medium breadth detects a 13-digit number with ending-character exclusion and checksum
validation.

Table 45-270 Driver's License Number - WI State medium-breadth patterns

Pattern

\l\d{3}-\d{4}-\d{4}-\d{2}

\l\d{13}

Table 45-271 Driver's License Number - WI State medium-breadth validators

Mandatory validator Description

Driver's License Number - WI State Validation Check Computes the checksum and validates the pattern against it.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000000000, 1111111111111, 2222222222222,
3333333333333, 4444444444444, 5555555555555,
6666666666666, 7777777777777, 8888888888888,
9999999999999

Driver's License Number - WI State narrow breadth

The narrow breadth detects a 13-digit number with ending-character exclusion and checksum
validation. It also requires the presence of related keywords.

Table 45-272 Driver's License Number - WI State narrow-breadth patterns

Pattern

\l\d{3}-\d{4}-\d{4}-\d{2}

\l\d{13}

Table 45-273 Driver's License Number - WI State narrow-breadth validators

Mandatory validator Description

Driver's License Number - WI State Validation Check Computes the checksum and validates the pattern against it.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000000000, 1111111111111, 2222222222222,
3333333333333, 4444444444444, 5555555555555,
6666666666666, 7777777777777, 8888888888888,
9999999999999

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

driver license, drivers license, driver licenses, drivers
licenses, dl#, dls#, lic#, lics#, wisc., wisconsin, wi

Drug Enforcement Agency (DEA) Number

A DEA number is a number assigned to a health care provider (such as a medical practitioner,
dentist, or veterinarian) by the U.S. Drug Enforcement Administration allowing them to write
prescriptions for controlled substances.
The Drug Enforcement Agency (DEA) Number data identifier detects an eight- or nine-character
alphanumeric pattern that matches the Drug Enforcement Agency (DEA) Number format.
The Drug Enforcement Agency (DEA) Number data identifier provides three breadths of
detection:
■ The wide breadth detects an eight- or nine-character alphanumeric pattern without validation.
See “Drug Enforcement Agency (DEA) Number wide breadth” on page 1145.
■ The medium breadth detects an eight- or nine-character alphanumeric pattern with ending
character exclusion and checksum validation.
See “Drug Enforcement Agency (DEA) Number medium breadth” on page 1146.
■ The narrow breadth detects an eight- or nine-character alphanumeric pattern with ending
character exclusion and checksum validation. It also requires the presence of related
keywords.
See “Drug Enforcement Agency (DEA) Number narrow breadth” on page 1146.

Drug Enforcement Agency (DEA) Number wide breadth

The wide breadth detects an eight- or nine-character alphanumeric pattern without validation.

Table 45-274 Drug Enforcement Agency (DEA) Number wide-breadth patterns

Pattern

[ABFGMPR]\l\d{7}

[ABFGMPR]\d{8}

The wide breadth of the Drug Enforcement Agency (DEA) Number data identifier includes no
validators.

Drug Enforcement Agency (DEA) Number medium breadth

The medium breadth detects an eight- or nine-character alphanumeric pattern with ending
character exclusion and checksum validation.

Table 45-275 Drug Enforcement Agency (DEA) Number medium-breadth patterns

Pattern

[ABFGMPR]\l\d{7}

[ABFGMPR]\d{8}

Table 45-276 Drug Enforcement Agency (DEA) Number medium-breadth validators

Mandatory validator Description

Drug Enforcement Agency Number Validation Check Computes the checksum and validates the pattern against it.

Exclude ending characters Data ending with any of the following list of values is not
matched:

5555555, 55555555
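The sketch below shows one way the medium breadth could combine the patterns, the ending exclusion, and a checksum. The checksum is the publicly documented DEA check-digit rule (add the 1st, 3rd, and 5th digits to twice the sum of the 2nd, 4th, and 6th; the last digit of the total must equal the 7th digit), applied here to the last seven digits of the match. Whether the product's Drug Enforcement Agency Number Validation Check works exactly this way is an assumption, as is reading `\l` as one ASCII letter. The function names are hypothetical.

```python
import re

# [ABFGMPR]\l\d{7} and [ABFGMPR]\d{8}, with \l rewritten as [A-Za-z] (an assumption).
DEA_PATTERN = re.compile(r"[ABFGMPR](?:[A-Za-z]\d{7}|\d{8})")
EXCLUDED_ENDINGS = ("5555555", "55555555")


def dea_checksum_ok(candidate: str) -> bool:
    """Public DEA check-digit rule, applied to the last seven digits (assumed)."""
    d = [int(c) for c in candidate[-7:]]
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]


def is_dea_medium_match(candidate: str) -> bool:
    if not DEA_PATTERN.fullmatch(candidate):
        return False
    if candidate.endswith(EXCLUDED_ENDINGS):
        return False
    return dea_checksum_ok(candidate)
```

The exclusion list matters because a run of fives passes this checksum: for AB5555555, (5 + 5 + 5) + 2 × (5 + 5 + 5) = 45, and its last digit equals the final 5.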

Drug Enforcement Agency (DEA) Number narrow breadth

The narrow breadth detects an eight- or nine-character alphanumeric pattern with ending
character exclusion and checksum validation. It also requires the presence of related keywords.

Table 45-277 Drug Enforcement Agency (DEA) Number narrow-breadth patterns

Pattern

[ABFGMPR]\l\d{7}

[ABFGMPR]\d{8}

Table 45-278 Drug Enforcement Agency (DEA) Number narrow-breadth validators

Mandatory validator Description

Drug Enforcement Agency Number Validation Check Computes the checksum and validates the pattern against it.

Exclude ending characters Data ending with any of the following list of values is not
matched:

5555555, 55555555

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

dea number, DEA, DEA no., DEA Registration Number,
DEA registration no., DEA#, DEA No#, Drug
Enforcement Agency Number, Drug Enforcement
Agency No.

Estonia Driver's Licence Number

The Estonian Road Administration issues driving licenses in Estonia, confirming the rights of
the holder to drive motor vehicles.
The Estonia Driver's Licence Number data identifier detects an eight-character alphanumeric
pattern that matches the Estonia Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the Estonia
Driver's Licence Number format. It checks for common test patterns.
See “Estonia Driver's Licence Number wide breadth” on page 1147.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the
Estonia Driver's Licence Number format. It checks for common test patterns, and also
requires the presence of related keywords.
See “Estonia Driver's Licence Number narrow breadth” on page 1148.

Estonia Driver's Licence Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the Estonia
Driver's Licence Number format. It checks for common test patterns.

Table 45-279 Estonia Driver's Licence Number wide-breadth patterns

Pattern

[Ee][A-Za-z]\d{6}

Table 45-280 Estonia Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,
666666, 777777, 888888, 999999
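The Number delimiter validator is described only as checking the surrounding characters. The sketch below assumes it means that no letter or digit may touch either end of the match, and expresses that with regular-expression lookarounds; the exclusion tuple mirrors the table above. The function name is hypothetical.

```python
import re

# Pattern [Ee][A-Za-z]\d{6}; the lookarounds model "Number delimiter" as
# "no letter or digit immediately before or after the match" (an assumption).
EE_DL_WIDE = re.compile(r"(?<![A-Za-z0-9])[Ee][A-Za-z]\d{6}(?![A-Za-z0-9])")

# The ten values from the "Exclude ending characters" validator.
EXCLUDED_ENDINGS = tuple(str(d) * 6 for d in range(10))


def ee_dl_wide_matches(text: str) -> list[str]:
    return [
        m.group(0)
        for m in EE_DL_WIDE.finditer(text)
        if not m.group(0).endswith(EXCLUDED_ENDINGS)
    ]
```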

Estonia Driver's Licence Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the Estonia
Driver's Licence Number format. It checks for common test patterns, and also requires the
presence of related keywords.

Table 45-281 Estonia Driver's Licence Number narrow-breadth patterns

Pattern

[Ee][A-Za-z]\d{6}

Table 45-282 Estonia Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,
666666, 777777, 888888, 999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, driver licence, drivers license, drivers
licence, driving license, driving licence, driver license
number, driver licence number, drivers license number,
drivers licence number, driving license number, driving
licence number, driver's license, driver's licence,
Driver's License, Driver's Licence, driver's license
number, driver's licence number, Driver's License
Number, Driver's Licence Number, DLNo#, dlno#,
drivers lic., driver permit, drivers permit, driving permit,
license number, licence number, licence

juhiluba, JUHILUBA, juhiluba number, juhiloa number,
Juhiluba, juhi litsentsi number

Estonia Passport Number

The Estonian passport is an international travel document issued to citizens of Estonia that
also serves as proof of Estonian citizenship. The Border Guard Board in Estonia and Estonian
foreign representations abroad are responsible for issuing Estonian passports.
The Estonia Passport Number data identifier detects an eight- or nine-character alphanumeric
pattern that matches the Estonia Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-character alphanumeric pattern that matches
the Estonia Passport Number format. It checks for common test patterns.
See “Estonia Passport Number wide breadth” on page 1149.
■ The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Estonia Passport Number format. It checks for common test patterns, and also requires
the presence of related keywords.
See “Estonia Passport Number narrow breadth” on page 1150.

Estonia Passport Number wide breadth

The wide breadth detects an eight- or nine-character alphanumeric pattern that matches the
Estonia Passport Number format. It checks for common test patterns.

Table 45-283 Estonia Passport Number wide-breadth patterns

Pattern

[Kk][A-Za-z]\d{7}

[Kk]\d{7}

[Vv][A-Za-z]\d{7}

[Vv]\d{7}

Table 45-284 Estonia Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,
5555555, 6666666, 7777777, 8888888, 9999999

Estonia Passport Number narrow breadth

The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Estonia Passport Number format. It checks for common test patterns, and also requires
the presence of related keywords.

Table 45-285 Estonia Passport Number narrow-breadth patterns

Pattern

[Kk][A-Za-z]\d{7}

[Kk]\d{7}

[Vv][A-Za-z]\d{7}

[Vv]\d{7}

Table 45-286 Estonia Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,
5555555, 6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

Passport, passport number, passport, passport no,
passport#, passportno, Passport No., Passport No,
PASSPORT, Pass, pass, passi number, pass nr, pass#,
Pass nr, Eesti passi number

Estonia Personal Identification Code

In Estonia, the personal identification code is a number based on the sex and birth date of a
person. This code is used as a unique personal identifier by governmental and other systems
where identification is required, as well as for digital signatures using the national identity card
and its associated certificates. It also serves as a tax identification number.
The Estonia Personal Identification Code data identifier detects an 11-digit number that matches
the Estonia Personal Identification Code format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number that matches the Estonia Personal Identification
Code format without checksum validation. It checks for common test numbers.
See “Estonia Personal Identification Code wide breadth” on page 1152.
■ The medium breadth detects an 11-digit number that matches the Estonia Personal
Identification Code format with checksum validation.
See “Estonia Personal Identification Code medium breadth” on page 1152.
■ The narrow breadth detects an 11-digit number that matches the Estonia Personal
Identification Code format with checksum validation. It also requires the presence of related
keywords.
See “Estonia Personal Identification Code narrow breadth” on page 1153.

Estonia Personal Identification Code wide breadth

The wide breadth detects an 11-digit number that matches the Estonia Personal Identification
Code format without checksum validation. It checks for common test numbers.

Table 45-287 Estonia Personal Identification Code wide-breadth patterns

Pattern

\d{3}[01]\d[0123]\d{5}

\d \d{2}[01]\d[0123]\d \d{4}

\d \d{2}[01]\d[0123]\d{4} \d

Table 45-288 Estonia Personal Identification Code wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of numbers is not all the same.
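A minimal sketch of the wide breadth: the three patterns (with and without embedded spaces), digit guards standing in for the Number delimiter validator (an assumption), and a Duplicate digits check that rejects candidates whose digits are all identical. Spaces are ignored when comparing digits. The function names are hypothetical.

```python
import re

# The three wide-breadth patterns; the digit guards are an assumed stand-in
# for the Number delimiter validator.
EE_PIC_PATTERNS = [
    r"\d{3}[01]\d[0123]\d{5}",
    r"\d \d{2}[01]\d[0123]\d \d{4}",
    r"\d \d{2}[01]\d[0123]\d{4} \d",
]
EE_PIC_WIDE = re.compile(r"(?<!\d)(?:" + "|".join(EE_PIC_PATTERNS) + r")(?!\d)")


def has_varied_digits(candidate: str) -> bool:
    """Duplicate digits: reject candidates whose digits are all the same."""
    digits = {c for c in candidate if c.isdigit()}
    return len(digits) > 1


def ee_pic_wide_matches(text: str) -> list[str]:
    return [
        m.group(0)
        for m in EE_PIC_WIDE.finditer(text)
        if has_varied_digits(m.group(0))
    ]
```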

Estonia Personal Identification Code medium breadth

The medium breadth detects an 11-digit number that matches the Estonia Personal Identification
Code format with checksum validation.

Table 45-289 Estonia Personal Identification Code medium-breadth patterns

Pattern

\d{3}[01]\d[0123]\d{5}

\d \d{2}[01]\d[0123]\d \d{4}

\d \d{2}[01]\d[0123]\d{4} \d

Table 45-290 Estonia Personal Identification Code medium-breadth validators

Mandatory validator Description

Estonia Personal Identification Number Check Computes the checksum and validates the pattern against it.
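The guide does not spell out the Estonia Personal Identification Number Check. The sketch below implements the publicly documented check-digit rule for Estonian personal codes, assuming that the product uses the same rule: multiply the first ten digits by the weights 1 to 9 and then 1, and take the sum modulo 11. If the remainder is 10, repeat with the weights 3 to 9 and then 1 to 3; a second remainder of 10 gives a check digit of 0. The sample codes in the test are synthetic, and the function name is hypothetical.

```python
WEIGHTS_STAGE_1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
WEIGHTS_STAGE_2 = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)


def ee_personal_code_checksum_ok(candidate: str) -> bool:
    """Two-stage modulo-11 check; spaces (as in the spaced patterns) are ignored."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 11:
        return False
    # zip() stops after ten pairs, so the check digit itself is not weighted.
    check = sum(w * d for w, d in zip(WEIGHTS_STAGE_1, digits)) % 11
    if check == 10:
        # Stage 2 uses shifted weights; a second remainder of 10 maps to 0.
        check = sum(w * d for w, d in zip(WEIGHTS_STAGE_2, digits)) % 11
        if check == 10:
            check = 0
    return check == digits[10]
```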

Estonia Personal Identification Code narrow breadth

The narrow breadth detects an 11-digit number that matches the Estonia Personal Identification
Code format with checksum validation. It checks for common test numbers, and also requires
the presence of related keywords.

Table 45-291 Estonia Personal Identification Code narrow-breadth patterns

Pattern

\d{3}[01]\d[0123]\d{5}

\d \d{2}[01]\d[0123]\d \d{4}

\d \d{2}[01]\d[0123]\d{4} \d

Table 45-292 Estonia Personal Identification Code narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of numbers is not all the same.

Estonia Personal Identification Number Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

national ID, national identification number, personal
ID, personal identification number, nationalid#,
personalid#, isikukood, isikukood#, IK, IK#, personal
identification code, PID#, maksu ID,
maksukohustuslase identifitseerimisnumber,
maksukood, tax id, tax number, tax identification
number, tax code, maksukood#, maksuID#, taxpayer
id, taxpayer identification number, maksumaksja kood,
maksumaksja identifitseerimisnumber

Estonia Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Estonia, VAT is
administered by the tax office for the region in which the business is established.

The Estonia Value Added Tax (VAT) Number data identifier detects an 11-character
alphanumeric pattern that matches the Estonia VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern that matches the Estonia
VAT Number format without checksum validation. It checks for common test patterns.
See “Estonia Value Added Tax (VAT) Number wide breadth” on page 1154.
■ The medium breadth detects an 11-character alphanumeric pattern that matches the Estonia
VAT Number format with checksum validation.
See “Estonia Value Added Tax (VAT) Number medium breadth” on page 1154.
■ The narrow breadth detects an 11-character alphanumeric pattern that matches the Estonia
VAT Number format with checksum validation. It checks for common test patterns, and
also requires the presence of related keywords.
See “Estonia Value Added Tax (VAT) Number narrow breadth” on page 1155.

Estonia Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern that matches the Estonia VAT
Number format without checksum validation. It checks for common test patterns.

Table 45-293 Estonia Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Ee][Ee]\d{9}

[Ee][Ee] \d{9}

Table 45-294 Estonia Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Estonia Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern that matches the Estonia
VAT Number format with checksum validation.

Table 45-295 Estonia Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Ee][Ee]\d{9}

[Ee][Ee] \d{9}

Table 45-296 Estonia Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Estonia Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.
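The Estonia Value Added Tax (VAT) Number Validation Check is not described in detail either. This sketch assumes the commonly published rule for Estonian VAT numbers: the nine digits after the EE prefix, weighted 3, 7, 1, 3, 7, 1, 3, 7, 1, must sum to a multiple of 10. Whether the product applies exactly this rule is an assumption; the sample number in the test is synthetic, and the function name is hypothetical.

```python
import re

# [Ee][Ee]\d{9} and [Ee][Ee] \d{9}, folded into one expression.
EE_VAT = re.compile(r"[Ee][Ee] ?(\d{9})")
WEIGHTS = (3, 7, 1, 3, 7, 1, 3, 7, 1)


def ee_vat_checksum_ok(candidate: str) -> bool:
    """Assumed rule: the weighted sum of the nine digits is a multiple of 10."""
    match = EE_VAT.fullmatch(candidate)
    if match is None:
        return False
    total = sum(w * int(d) for w, d in zip(WEIGHTS, match.group(1)))
    return total % 10 == 0
```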

Estonia Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern that matches the Estonia
VAT Number format with checksum validation. It checks for common test patterns, and also
requires the presence of related keywords.

Table 45-297 Estonia Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Ee][Ee]\d{9}

[Ee][Ee] \d{9}

Table 45-298 Estonia Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Estonia Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat, vat number, vat#, käibemaksu
registreerimisnumber, käibemaksu, value added tax
number, Käibemaksu number, käibemaks, käibemaks#,
käibemaksu#, vat registration number

European Health Insurance Card Number

The European Health Insurance Card (EHIC) allows anyone insured by or covered by a
statutory social security scheme of the European Economic Area countries and Switzerland
to receive medical treatment in another member state free or at a reduced cost.
The European Health Insurance Card Number data identifier detects a 20-digit number that
matches the European Health Insurance Card Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 20-digit number that matches the European Health Insurance
Card Number format. It checks for common test numbers.
See “European Health Insurance Card Number wide breadth” on page 1156.
■ The narrow breadth detects a 20-digit number that matches the European Health Insurance
Card Number format. It checks for common test numbers, and also requires the presence
of related keywords.
See “European Health Insurance Card Number narrow breadth” on page 1160.

European Health Insurance Card Number wide breadth

The wide breadth detects a 20-digit number that matches the European Health Insurance Card
Number format. It checks for common test numbers.

Table 45-299 European Health Insurance Card Number wide-breadth patterns

Pattern

80040\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80826\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

38500\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80203\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

60189\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80246\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80276\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80300\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80021\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80380\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80440\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80442\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

30066\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80620\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80703\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80724\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80752\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80756\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80616\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

Table 45-300 European Health Insurance Card Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

80040000000000000000, 80040111111111111111,
80040222222222222222, 80040333333333333333,
80040444444444444444, 80040555555555555555,
80040666666666666666, 80040777777777777777,
80040888888888888888, 80040999999999999999

80826000000000000000, 80826111111111111111,
80826222222222222222, 80826333333333333333,
80826444444444444444, 80826555555555555555,
80826666666666666666, 80826777777777777777,
80826888888888888888, 80826999999999999999

38500000000000000000, 38500111111111111111,
38500222222222222222, 38500333333333333333,
38500444444444444444, 38500555555555555555,
38500666666666666666, 38500777777777777777,
38500888888888888888, 38500999999999999999

80203000000000000000, 80203111111111111111,
80203222222222222222, 80203333333333333333,
80203444444444444444, 80203555555555555555,
80203666666666666666, 80203777777777777777,
80203888888888888888, 80203999999999999999

60189000000000000000, 60189111111111111111,
60189222222222222222, 60189333333333333333,
60189444444444444444, 60189555555555555555,
60189666666666666666, 60189777777777777777,
60189888888888888888, 60189999999999999999

80246000000000000000, 80246111111111111111,
80246222222222222222, 80246333333333333333,
80246444444444444444, 80246555555555555555,
80246666666666666666, 80246777777777777777,
80246888888888888888, 80246999999999999999

80276000000000000000, 80276111111111111111,
80276222222222222222, 80276333333333333333,
80276444444444444444, 80276555555555555555,
80276666666666666666, 80276777777777777777,
80276888888888888888, 80276999999999999999

80300000000000000000, 80300111111111111111,
80300222222222222222, 80300333333333333333,
80300444444444444444, 80300555555555555555,
80300666666666666666, 80300777777777777777,
80300888888888888888, 80300999999999999999

80021000000000000000, 80021111111111111111,
80021222222222222222, 80021333333333333333,
80021444444444444444, 80021555555555555555,
80021666666666666666, 80021777777777777777,
80021888888888888888, 80021999999999999999

80380000000000000000, 80380111111111111111,
80380222222222222222, 80380333333333333333,
80380444444444444444, 80380555555555555555,
80380666666666666666, 80380777777777777777,
80380888888888888888, 80380999999999999999

80440000000000000000, 80440111111111111111,
80440222222222222222, 80440333333333333333,
80440444444444444444, 80440555555555555555,
80440666666666666666, 80440777777777777777,
80440888888888888888, 80440999999999999999

80442000000000000000, 80442111111111111111,
80442222222222222222, 80442333333333333333,
80442444444444444444, 80442555555555555555,
80442666666666666666, 80442777777777777777,
80442888888888888888, 80442999999999999999

30066000000000000000, 30066111111111111111,
30066222222222222222, 30066333333333333333,
30066444444444444444, 30066555555555555555,
30066666666666666666, 30066777777777777777,
30066888888888888888, 30066999999999999999
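Rather than listing 130 literal values, the wide-breadth exclusion list can be generated: each entry is one of 13 five-digit prefixes followed by 15 copies of the same digit. The sketch below does that, using the prefix lists exactly as they appear in the tables above (the exclusion list covers only 13 of the 19 pattern prefixes). Because every match is exactly 20 digits, "ends with an excluded value" reduces to equality here. The digit guards stand in for the Number delimiter validator, which is an assumption, and the names are hypothetical.

```python
import re

# Five-digit prefixes from the wide-breadth pattern table, in table order.
EHIC_PREFIXES = [
    "80040", "80826", "38500", "80203", "60189", "80246", "80276",
    "80300", "80021", "80380", "80440", "80442", "30066", "80620",
    "80703", "80724", "80752", "80756", "80616",
]

# Only the first 13 prefixes appear in the "Exclude ending characters" list.
EXCLUDED_VALUES = {
    prefix + str(digit) * 15
    for prefix in EHIC_PREFIXES[:13]
    for digit in range(10)
}

# Each pattern is a prefix followed by 15 digits; the digit guards are an
# assumed stand-in for the Number delimiter validator.
EHIC_WIDE = re.compile(
    r"(?<!\d)(?:" + "|".join(EHIC_PREFIXES) + r")\d{15}(?!\d)"
)


def ehic_wide_matches(text: str) -> list[str]:
    return [
        m.group(0)
        for m in EHIC_WIDE.finditer(text)
        # A match is always 20 digits, so "ends with" is plain equality here.
        if m.group(0) not in EXCLUDED_VALUES
    ]
```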

European Health Insurance Card Number narrow breadth

The narrow breadth detects a 20-digit number that matches the European Health Insurance
Card Number format. It checks for common test numbers, and also requires the presence of
related keywords.

Table 45-301 European Health Insurance Card Number narrow-breadth patterns

Pattern

80040\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80826\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

38500\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80203\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

60189\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80246\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80276\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80300\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80021\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80380\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80440\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80442\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

30066\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80620\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80703\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80724\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80752\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80756\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

80616\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

Table 45-302 European Health Insurance Card Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

80300000000000000000, 80300111111111111111,
80300222222222222222, 80300333333333333333,
80300444444444444444, 80300555555555555555,
80300666666666666666, 80300777777777777777,
80300888888888888888, 80300999999999999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

medical account number, health insurance card
number, insurance card number, health card, health
card number, ehic number, ehic, ehic#, numero conto
medico, tessera sanitaria assicurazione numero, carta
assicurazione numero, Krankenversicherungsnummer,
assicurazione sanitaria numero, medisch
rekeningnummer, ziekteverzekeringskaartnummer,
verzekerings kaart nummer, gezondheidskaart
nummer, gezondheidskaart, medizinische
Kontonummer, Krankenversicherungskarte Nummer,
Versicherungsnummer, Gesundheitskarte Nummer,
Gesundheitskarte, arstliku konto number,
ravikindlustuse kaardi number, tervisekaart,
tervisekaardi number, Uimhir ehic, tarjeta salud, broj
kartice zdravstvenog osiguranja, kartice osiguranja
broj, zdravstvenu karticu, zdravstvene kartice broj,
ehic broj, numero tessera sanitaria, numero carta di
assicurazione, tessera sanitaria, numero ehic,
Gesondheetskaart, ehic nummer, numer rachunku
medycznego, numer karty ubezpieczenia zdrowotne,
numer karty ubezpieczenia, karta zdrowia, numer karty
zdrowia, numer ehic, sairausvakuutuskortin numero,
vakuutuskortin numero, terveyskortti, terveyskortin
numero, medicinsk kontonummer, ehic numeris,
medizinescher Konto Nummer, zdravstvena izkaznica
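The Find keywords validator can be pictured as a search of the surrounding text for any one of the listed inputs. The Python sketch below uses a small subset of the inputs above. The function name `has_keyword`, the case-insensitive comparison, and the rule that a keyword must not be glued to adjacent letters or digits are all assumptions for illustration, not the product's documented matching behavior.

```python
import re

# A small subset of the Find keywords inputs listed above.
EHIC_KEYWORDS = [
    "health insurance card",
    "ehic",
    "ehic#",
    "tessera sanitaria",
    "Krankenversicherungsnummer",
]


def has_keyword(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword appears in text as a whole word or phrase.

    Assumption: matching is case-insensitive and a keyword may not be glued
    to surrounding letters or digits (so "ehic" does not match "vehicle").
    """
    folded = text.casefold()
    for kw in keywords:
        pattern = r"(?<!\w)" + re.escape(kw.casefold()) + r"(?!\w)"
        if re.search(pattern, folded):
            return True
    return False
```

Under this rule, `ehic` is found in "Please send your EHIC number" but not inside the word "vehicle".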

Finland Driver's Licence Number

The Finland Driver's Licence Number is the 10-character alphanumeric pattern that identifies
an individual Finnish driver's licence.
The Finland Driver's Licence Number data identifier detects a 10-character alphanumeric
pattern that matches the Finland Driver's Licence Number format.
The Finland Driver's Licence Number data identifier offers three breadths of detection:
■ The wide breadth detects a 10-character alphanumeric pattern without checksum validation.
See “Finland Driver's Licence Number wide breadth” on page 1166.
■ The medium breadth detects a 10-character alphanumeric pattern with checksum validation.
See “Finland Driver's Licence Number medium breadth” on page 1166.
■ The narrow breadth detects a 10-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “Finland Driver's Licence Number narrow breadth” on page 1166.

Finland Driver's Licence Number wide breadth

The wide breadth detects a 10-character alphanumeric pattern without checksum validation.

Table 45-303 Finland Driver's Licence Number wide-breadth patterns

Patterns

\d{6}-\d{4}

\d{6}-\d{3}\l

Table 45-304 Finland Driver's Licence Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Finland Driver's Licence Number medium breadth

The medium breadth detects a 10-character alphanumeric pattern with checksum validation.

Table 45-305 Finland Driver's Licence Number medium-breadth patterns

Patterns

\d{6}-\d{4}

\d{6}-\d{3}\l

Table 45-306 Finland Driver's Licence Number medium-breadth validator

Mandatory validator Description

Finland Driver's Licence Number Validation Check Computes the checksum and validates the pattern against
it.

Finland Driver's Licence Number narrow breadth

The narrow breadth detects a 10-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords.

Table 45-307 Finland Driver's Licence Number narrow-breadth patterns

Patterns

\d{6}-\d{4}

\d{6}-\d{3}\l

Table 45-308 Finland Driver's Licence Number narrow-breadth validators

Mandatory validators Description

Finland Driver's Licence Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

driver license, driver license number, drivers lic.,
drivers license, drivers license number, driving license
number, DLNo#, dlno#, driving license

permis de conduire, ajokortti, ajokortin numero,
kuljettaja lic., körkort, körkort nummer, förare lic.

Finland European Health Insurance Number

The Finland European Health Insurance Number is a unique 20-digit numeric identifier that is
assigned to every person who uses health services in Finland.
The Finland European Health Insurance Number data identifier detects a 20-digit number that
matches the Finland European Health Insurance Number format.
The Finland European Health Insurance Number data identifier provides two breadths of
detection:
■ The wide breadth detects a 20-digit number without checksum validation.
See “Finland European Health Insurance Number wide breadth” on page 1168.
■ The narrow breadth detects a 20-digit number without checksum validation. It requires the
presence of related keywords.
See “Finland European Health Insurance Number narrow breadth” on page 1168.

Finland European Health Insurance Number wide breadth

The wide breadth detects a 20-digit number without checksum validation.

Table 45-309 Finland European Health Insurance Number wide-breadth patterns

Patterns

8024680246\d{10}

8024680246[- ]\d{10}

Table 45-310 Finland European Health Insurance Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

80246802460000000000, 80246802461111111111,
80246802462222222222, 80246802463333333333,
80246802464444444444, 80246802465555555555,
80246802466666666666, 80246802467777777777,
80246802468888888888, 80246802469999999999
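The wide-breadth validators above can be read as a two-step filter: match one of the Table 45-309 patterns, then discard candidates that begin with an excluded value. The following is a minimal Python sketch under two stated assumptions: the Number delimiter check is modeled as "no digit directly before or after the match", and the hyphen or space is ignored when comparing with the exclusion list. `find_fi_ehic_wide` is a hypothetical name.

```python
import re

# Both Table 45-309 patterns in one regex: the optional [- ] covers both forms.
# Assumption: "Number delimiter" is modeled as "no digit directly before or after".
FI_EHIC = re.compile(r"(?<!\d)8024680246[- ]?\d{10}(?!\d)")

# Exclusion list from Table 45-310: the prefix followed by ten repeated digits.
EXCLUDED_BEGINNINGS = [f"8024680246{str(d) * 10}" for d in range(10)]


def find_fi_ehic_wide(text: str) -> list[str]:
    """Return wide-breadth candidates that survive the exclusion list."""
    hits = []
    for m in FI_EHIC.finditer(text):
        # Assumption: separators are ignored when comparing with the exclusions.
        digits = m.group().replace("-", "").replace(" ", "")
        if any(digits.startswith(v) for v in EXCLUDED_BEGINNINGS):
            continue
        hits.append(m.group())
    return hits
```

For example, a 21-digit run such as 802468024612345678901 produces no match, because the delimiter check rejects the trailing digit.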

Finland European Health Insurance Number narrow breadth

The narrow breadth detects a 20-digit number without checksum validation. It requires the
presence of related keywords.

Table 45-311 Finland European Health Insurance Number narrow-breadth patterns

Patterns

8024680246\d{10}

8024680246[- ]\d{10}

Table 45-312 Finland European Health Insurance Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Suomi EHIC-numero, health insurance card,
Sairausvakuutuskortti, sairaanhoitokortin,
Sjukförsäkringskort, ehic, sairaanhoitokortin, Finland
health insurance card, Suomen sairausvakuutuskortti,
Finska sjukförsäkringskort, health card number,
Terveyskortti, Hälsokort, health card,
FinlandEHICNumber#, ehic#, EHIC,
sairausvakuutusnumero, health insurance number,
sjukförsäkring nummer, EHIC#

Finland Passport Number

Finnish passports are issued to nationals of Finland for the purpose of international travel.
They also facilitate the process of securing assistance from Finnish consular officials abroad.
The Finland Passport Number data identifier detects a nine-character alphanumeric pattern that
matches the Finland Passport Number format.
The Finland Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern without checksum validation.
See “Finland Passport Number wide breadth” on page 1170.
■ The narrow breadth detects a nine-character alphanumeric pattern without checksum validation.
It requires the presence of related keywords.
See “Finland Passport Number narrow breadth” on page 1170.

Finland Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern without checksum validation.

Table 45-313 Finland Passport Number wide-breadth pattern

Pattern

[A-Za-z]{2}\d{7}

Table 45-314 Finland Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
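The tables describe the Number delimiter validator only as checking the surrounding characters. One plausible reading, assumed here rather than taken from the product, is that a match must not run into an adjacent letter or digit. The Python sketch applies that reading to the Table 45-313 pattern; `find_fi_passport` is a hypothetical helper.

```python
import re

# Table 45-313 pattern wrapped in lookarounds.
# Assumption: the Number delimiter check rejects a match that touches an
# adjacent ASCII letter or digit; the product's exact rule may differ.
FI_PASSPORT = re.compile(r"(?<![0-9A-Za-z])[A-Za-z]{2}\d{7}(?![0-9A-Za-z])")


def find_fi_passport(text: str) -> list[str]:
    """Return every delimited two-letter, seven-digit candidate in text."""
    return FI_PASSPORT.findall(text)
```

With this reading, XP1234567 is found in "Passport no: XP1234567." but not in "REF-ZXP1234567" or "XP12345678".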

Finland Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern without checksum validation. It
requires the presence of related keywords.

Table 45-315 Finland Passport Number narrow-breadth pattern

Pattern

[A-Za-z]{2}\d{7}

Table 45-316 Finland Passport Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

finland passport number, finland passport no., finland
passport no#, finland passport#, finland passport
number#

Suomen passin numero, suomalainen passi, passin
numero, passin numero.#, passin numero#

passport number, passport no., passport no#,
passport#, passport number#

passin numero, passin numero., passin numero#,
passi#

Finland Tax Identification Number

Finland issues a tax identification number for persons who have obligations to declare taxes
in Finland.
The Finland Tax Identification Number data identifier detects an 8- or 11-character alphanumeric
pattern that matches the Finland Tax Identification Number format.
The Finland Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects an 8- or 11-character alphanumeric pattern without checksum
validation.
See “Finland Tax Identification Number wide breadth” on page 1171.
■ The medium breadth detects an 8- or 11-character alphanumeric pattern with checksum
validation.
See “Finland Tax Identification Number medium breadth” on page 1172.
■ The narrow breadth detects an 8- or 11-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Finland Tax Identification Number narrow breadth” on page 1172.

Finland Tax Identification Number wide breadth

The wide breadth detects an 8- or 11-character alphanumeric pattern without checksum
validation.

Table 45-317 Finland Tax Identification Number wide-breadth patterns

Patterns

\d{6}[Aa+-]\d{3}\w

\d{7}[-]\d

Table 45-318 Finland Tax Identification Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Finland Tax Identification Number medium breadth

The medium breadth detects an 8- or 11-character alphanumeric pattern with checksum
validation.

Table 45-319 Finland Tax Identification Number medium-breadth patterns

Patterns

\d{6}[Aa+-]\d{3}\w

\d{7}[-]\d

Table 45-320 Finland Tax Identification Number medium-breadth validator

Mandatory validator Description

Finland Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Finland Tax Identification Number narrow breadth

The narrow breadth detects an 8- or 11-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.

Table 45-321 Finland Tax Identification Number narrow-breadth patterns

Patterns

\d{6}[Aa+-]\d{3}\w

\d{7}[-]\d

Table 45-322 Finland Tax Identification Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Finland Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

tax identification number, tax number, tax id, taxid#,
taxnumber#

verotunniste, verokortti, verotunnus, veronumero

Finland Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process.
The Finland Value Added Tax (VAT) Number data identifier detects a 10-character alphanumeric
pattern that matches the Finland Value Added Tax (VAT) Number format.
The Finland Value Added Tax (VAT) Number data identifier provides three breadths of detection:
■ The wide breadth detects a 10-character alphanumeric pattern beginning with FI without
checksum validation.
See “Finland Value Added Tax (VAT) Number wide breadth” on page 1173.
■ The medium breadth detects a 10-character alphanumeric pattern beginning with FI with
checksum validation.
See “Finland Value Added Tax (VAT) Number medium breadth” on page 1174.
■ The narrow breadth detects a 10-character alphanumeric pattern beginning with FI with
checksum validation. It also requires the presence of related keywords.
See “Finland Value Added Tax (VAT) Number narrow breadth” on page 1175.

Finland Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 10-character alphanumeric pattern beginning with FI without
checksum validation.

Table 45-323 Finland Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Ff][Ii]\d{8}

[Ff][Ii] \d{8}

[Ff][Ii]\d{7}-\d

[Ff][Ii] \d{7}-\d

Table 45-324 Finland Value Added Tax (VAT) Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Finland Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 10-character alphanumeric pattern beginning with FI with
checksum validation.

Table 45-325 Finland Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Ff][Ii]\d{8}

[Ff][Ii] \d{8}

[Ff][Ii]\d{7}-\d

[Ff][Ii] \d{7}-\d

Table 45-326 Finland Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Finland VAT Number Validation Check Computes the checksum and validates the pattern against
it.
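A Finnish VAT number is FI followed by the eight digits of the company's Business ID (Y-tunnus) without its hyphen, and the publicly documented Business ID check digit applies the weights 7, 9, 10, 5, 8, 4, 2 to the first seven digits and works modulo 11. The Python sketch below assumes that the Finland VAT Number Validation Check implements this rule; the product's internal implementation is not described in this guide, and the function names are hypothetical.

```python
import re
from typing import Optional

# Weights for the first seven digits of a Finnish Business ID (Y-tunnus).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2)


def business_id_check_digit(first7: str) -> Optional[int]:
    """Return the expected check digit, or None when no valid digit exists."""
    remainder = sum(int(d) * w for d, w in zip(first7, WEIGHTS)) % 11
    if remainder == 0:
        return 0
    if remainder == 1:
        return None  # such numbers are not assigned, so nothing can match
    return 11 - remainder


def is_valid_fi_vat(candidate: str) -> bool:
    """Check a string already matched by one of the Table 45-325 patterns."""
    digits = re.sub(r"\D", "", candidate)  # drops "FI", spaces, and the hyphen
    if len(digits) != 8:
        return False
    return business_id_check_digit(digits[:7]) == int(digits[7])
```

For example, FI01120389 and FI 0112038-9 pass (weighted sum 57, remainder 2, check digit 9), while FI01120388 fails.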

Finland Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 10-character alphanumeric pattern beginning with FI with
checksum validation. It also requires the presence of related keywords.

Table 45-327 Finland Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Ff][Ii]\d{8}

[Ff][Ii] \d{8}

[Ff][Ii]\d{7}-\d

[Ff][Ii] \d{7}-\d

Table 45-328 Finland Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Finland VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

vat, vat number, vat#

arvonlisäveronumero, ARVONLISÄVERO, ALV,
arvonlisäverotunniste, ALV nro, ALV numero, alv

Finnish Personal Identification Number

The Finnish Personal Identification Number or Personal Identity Code is a unique personal
identifier used for identifying citizens in government and many other transactions.
The Finnish Personal Identification Number data identifier detects an alphanumeric pattern
that matches the Finnish Personal Identification Number format.
The Finnish Personal Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a Finnish Personal Identification Number without validation.
See “Finnish Personal Identification Number wide breadth” on page 1176.
■ The medium breadth detects a Finnish Personal Identification Number with checksum
validation.
See “Finnish Personal Identification Number medium breadth” on page 1176.
■ The narrow breadth detects a Finnish Personal Identification Number with checksum
validation. It also requires the presence of related keywords.
See “Finnish Personal Identification Number narrow breadth” on page 1176.

Finnish Personal Identification Number wide breadth

The wide breadth detects a Finnish Personal Identification Number without validation.

Table 45-329 Finnish Personal Identification Number wide-breadth pattern

Pattern

\d{6}[-+Aa]\d{3}\w

The Finnish Personal Identification Number wide breadth includes no validators.

Finnish Personal Identification Number medium breadth

The medium breadth detects a Finnish Personal Identification Number with checksum validation.

Table 45-330 Finnish Personal Identification Number medium-breadth pattern

Pattern

\d{6}[-+Aa]\d{3}\w

Table 45-331 Finnish Personal Identification Number medium-breadth validators

Mandatory validator Description

Finnish Personal Identification Number Validation Check Computes the checksum and validates the pattern against
it.
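The publicly documented checksum for the Finnish Personal Identity Code reads the nine digits (DDMMYY plus the three-digit individual number) as one integer, divides it by 31, and uses the remainder to select the check character. Assuming that the Finnish Personal Identification Number Validation Check follows this rule, it can be sketched in Python as follows. `is_valid_fi_pin` is a hypothetical name, and 131052-308T is a commonly cited sample value.

```python
import re

# The 31 check characters, indexed by remainder (G, I, O, Q, Z are omitted).
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"

# Same shape as the Table 45-330 pattern, with groups for the numeric parts.
FI_PIN = re.compile(r"(\d{6})[-+Aa](\d{3})(\w)")


def is_valid_fi_pin(candidate: str) -> bool:
    """Validate DDMMYY, century sign, NNN, and the mod-31 check character."""
    m = FI_PIN.fullmatch(candidate)
    if m is None:
        return False
    number = int(m.group(1) + m.group(2))  # the nine digits read as one integer
    return CHECK_CHARS[number % 31] == m.group(3).upper()
```

For 131052-308T, 131052308 mod 31 is 25, which selects T. The century sign does not enter the calculation, so 131052A308T also passes.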

Finnish Personal Identification Number narrow breadth

The narrow breadth detects a Finnish Personal Identification Number with checksum validation.
It also requires the presence of related keywords.

Table 45-332 Finnish Personal Identification Number narrow-breadth pattern

Pattern

\d{6}[-+Aa]\d{3}\w

Table 45-333 Finnish Personal Identification Number narrow-breadth validators

Mandatory validator Description

Finnish Personal Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

identification number, personal ID, identity number,
Finnish national ID number, personalIDnumber#,
National Identification Number, id number, National id
no., National id number, id no

tunnistenumero, henkilötunnus, yksilöllinen
henkilökohtainen tunnistenumero, Ainutlaatuinen
henkilökohtainen tunnus, identiteetti numero, Suomen
kansallinen henkilötunnus, henkilötunnusnumero#,
kansallisen tunnistenumero, tunnusnumero,
kansallinen tunnus numero

France Driver's License Number

The France Driver's License Number is the 12-digit identifier for an individual's driver's license
issued in France.
The France Driver's License Number data identifier detects a 12-digit number that matches
the France Driver's License Number format.
The France Driver's License Number data identifier provides two breadths of detection:
■ The wide breadth detects a 12-digit number without checksum validation.
See “France Driver's License Number wide breadth” on page 1178.
■ The narrow breadth detects a 12-digit number without checksum validation. It also requires
the presence of related keywords.
See “France Driver's License Number narrow breadth” on page 1178.

France Driver's License Number wide breadth

The wide breadth detects a 12-digit number without checksum validation.

Table 45-334 France Driver's License Number wide-breadth pattern

Pattern

\d{12}

Table 45-335 France Driver's License Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.
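The Duplicate digits validator rejects candidates such as 000000000000 that match the pattern but are clearly placeholders. The Python sketch below combines it with the Table 45-334 pattern. Modeling Number delimiter as "no adjacent digit" and ignoring non-digit separators when counting digits are assumptions, and the function names are hypothetical.

```python
import re

# Table 45-334 pattern with an assumed Number delimiter (no adjacent digits).
FR_DL = re.compile(r"(?<!\d)\d{12}(?!\d)")


def passes_duplicate_digits(candidate: str) -> bool:
    """Duplicate digits: reject a candidate whose digits are all the same."""
    digits = [c for c in candidate if c in "0123456789"]
    return len(set(digits)) > 1


def find_fr_driver_license_wide(text: str) -> list[str]:
    """Apply the pattern, then the Duplicate digits validator."""
    return [m.group() for m in FR_DL.finditer(text)
            if passes_duplicate_digits(m.group())]
```

The same `passes_duplicate_digits` check applies unchanged to the other identifiers in this section that list Duplicate digits, such as the spaced France Health Insurance Number form.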

France Driver's License Number narrow breadth

The narrow breadth detects a 12-digit number without checksum validation. It also requires
the presence of related keywords.

Table 45-336 France Driver's License Number narrow-breadth pattern

Pattern

\d{12}

Table 45-337 France Driver's License Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

drivers licence number, drivers license number, driving
licence number, driving license number

permis de conduire

licence number, license number, licence numbers,
license numbers, drivers license, driving licence,
driving license, DL#, dl#, DLNO#, dlno#, Driver License,
Driver License Number, Drivers Lic., Drivers Licence,
Driver's License, Driver's License Number, driver's
license number, Driver's Licence Number

France Health Insurance Number

A Carte Vitale is a social insurance card used in France that contains medical information for
the cardholder. It has a unique 21-digit serial number.
The France Health Insurance Number data identifier detects a 21-digit number that matches
the France Health Insurance Number format.
The France Health Insurance Number data identifier provides two breadths of detection:
■ The wide breadth detects a 21-digit number without checksum validation.
See “France Health Insurance Number wide breadth” on page 1179.
■ The narrow breadth detects a 21-digit number without checksum validation. It also requires
the presence of related keywords.
See “France Health Insurance Number narrow breadth” on page 1180.

France Health Insurance Number wide breadth

The wide breadth detects a 21-digit number without checksum validation.

Table 45-338 France Health Insurance Number wide-breadth patterns

Patterns

\d{10} \d{10} \d

\d{21}

Table 45-339 France Health Insurance Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

France Health Insurance Number narrow breadth

The narrow breadth detects a 21-digit number without checksum validation. It also requires
the presence of related keywords.

Table 45-340 France Health Insurance Number narrow-breadth patterns

Patterns

\d{10} \d{10} \d

\d{21}

Table 45-341 France Health Insurance Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

insurance card, social insurance card,

carte vitale, carte d'assuré social

France Tax Identification Number

France issues a tax identification number to anyone who has obligations to declare taxes in
France.
The France Tax Identification Number data identifier detects a 13-digit number that matches
the France Tax Identification Number format.
The France Tax Identification Number data identifier provides two breadths of detection:
■ The wide breadth detects a 13-digit number without checksum validation.
See “France Tax Identification Number wide breadth” on page 1181.
■ The narrow breadth detects a 13-digit number without checksum validation. It also requires
the presence of related keywords.
See “France Tax Identification Number narrow breadth” on page 1181.

France Tax Identification Number wide breadth

The wide breadth detects a 13-digit number without checksum validation.

Table 45-342 France Tax Identification Number wide-breadth patterns

Patterns

[0123]\d{12}

[0123]\d{1} \d{2} \d{3} \d{3} \d{3}

Table 45-343 France Tax Identification Number wide-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

France Tax Identification Number narrow breadth

The narrow breadth detects a 13-digit number without checksum validation. It also requires
the presence of related keywords.

Table 45-344 France Tax Identification Number narrow-breadth patterns

Patterns

[0123]\d{12}

[0123]\d{1} \d{2} \d{3} \d{3} \d{3}

Table 45-345 France Tax Identification Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

tax identification number, tax number, tax id

numéro d'identification fiscale

France Value Added Tax (VAT) Number

The Value Added Tax (VAT) is a tax levied on goods and services provided in France and is
collected from the final customer. Companies must register with the Register of Commerce
and Companies in France to get a VAT number allocated.
The France Value Added Tax (VAT) Number data identifier detects a 13-character alphanumeric
pattern that matches the France Value Added Tax (VAT) Number format.
The France Value Added Tax (VAT) Number data identifier provides three breadths of detection:
■ The wide breadth detects a 13-character alphanumeric pattern without checksum validation.
See “France Value Added Tax (VAT) Number wide breadth” on page 1182.
■ The medium breadth detects a 13-character alphanumeric pattern with checksum validation.
See “France Value Added Tax (VAT) Number medium breadth” on page 1183.
■ The narrow breadth detects a 13-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “France Value Added Tax (VAT) Number narrow breadth” on page 1184.

France Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 13-character alphanumeric pattern without checksum validation.

Table 45-346 France Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Ff][Rr][0-9A-Za-z]{2}\d{9}

[Ff][Rr][0-9A-Za-z]{2} \d{9}

[Ff][Rr] [0-9A-Za-z]{2}\d{9}

[Ff][Rr]-[0-9A-Za-z]{2}\d{9}

[Ff][Rr][0-9A-Za-z]{2} \d{3}-\d{3}-\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3}.\d{3}.\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3},\d{3},\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3} \d{3} \d{3}

Table 45-347 France Value Added Tax (VAT) Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

France Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 13-character alphanumeric pattern with checksum validation.

Table 45-348 France Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Ff][Rr][0-9A-Za-z]{2}\d{9}

[Ff][Rr][0-9A-Za-z]{2} \d{9}

[Ff][Rr] [0-9A-Za-z]{2}\d{9}

[Ff][Rr]-[0-9A-Za-z]{2}\d{9}

[Ff][Rr][0-9A-Za-z]{2} \d{3}-\d{3}-\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3}.\d{3}.\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3},\d{3},\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3} \d{3} \d{3}

Table 45-349 France Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

France VAT Number Validation Check Checksum validator for the France Value Added Tax (VAT)
Number.
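A France VAT number consists of FR, a two-character key, and the nine-digit SIREN company number. For the common all-numeric key, the publicly documented rule is key = (12 + 3 * (SIREN mod 97)) mod 97. The Python sketch below assumes that the France VAT Number Validation Check applies this rule after the separators shown in Table 45-348 are stripped. Keys that contain letters follow a different scheme that the sketch does not implement, so it returns None for them. The function names are hypothetical.

```python
import re
from typing import Optional


def fr_vat_key(siren: str) -> int:
    """Numeric key for a nine-digit SIREN: (12 + 3 * (SIREN mod 97)) mod 97."""
    return (12 + 3 * (int(siren) % 97)) % 97


def check_fr_vat(candidate: str) -> Optional[bool]:
    """True/False for numeric keys; None when the key contains letters."""
    # Strip the space, hyphen, dot, and comma separators used in the patterns.
    compact = re.sub(r"[^0-9A-Za-z]", "", candidate).upper()
    if len(compact) != 13 or not compact.startswith("FR"):
        return False
    key, siren = compact[2:4], compact[4:]
    if not siren.isdigit():
        return False
    if not key.isdigit():
        return None  # alphanumeric keys are outside this sketch
    return int(key) == fr_vat_key(siren)
```

For SIREN 404833048, the remainder modulo 97 is 56, so the key is (12 + 168) mod 97 = 83, and FR83404833048 passes.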

France Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 13-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords.

Table 45-350 France Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Ff][Rr][0-9A-Za-z]{2}\d{9}

[Ff][Rr][0-9A-Za-z]{2} \d{9}

[Ff][Rr] [0-9A-Za-z]{2}\d{9}

[Ff][Rr]-[0-9A-Za-z]{2}\d{9}

[Ff][Rr][0-9A-Za-z]{2} \d{3}-\d{3}-\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3}.\d{3}.\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3},\d{3},\d{3}

[Ff][Rr][0-9A-Za-z]{2} \d{3} \d{3} \d{3}

Table 45-351 France Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

France VAT Number Validation Check Checksum validator for the France Value Added Tax (VAT)
Number.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

france vat number, French vat number, VAT Number,
vat no, VAT#, value added tax number, value added
tax, SIREN identification no

Numéro d'identification taxe sur valeur ajoutée,
Numéro taxe valeur ajoutée, taxe valeur ajoutée, Taxe
sur la valeur ajoutée, Numéro de TVA
intracommunautaire, n° TVA, numéro de TVA, Numéro
de TVA en France, français numéro de TVA, Numéro
d'identification SIREN

French INSEE Code

The INSEE code in France is used as a social insurance number, a national identification
number, and for taxation and employment purposes.
The French INSEE Code data identifier detects a 15-digit number that matches the French
INSEE Code format.
The French INSEE Code data identifier provides two breadths of detection:
■ The wide breadth detects a 15-digit number that passes checksum validation.
■ The narrow breadth detects a 15-digit number that passes checksum validation. It also
requires the presence of related keywords.

French INSEE Code wide breadth

The wide breadth detects a 15-digit number that encodes the sex, year and month of birth,
department of origin, commune of origin, and an order number. A space delimiter after the first
13 digits is optional. The last two digits of the INSEE code are a control key that is used to
validate the first 13 digits.

Table 45-352 French INSEE Code wide-breadth patterns

Patterns

\d{13} \d{2}

\d{15}

Table 45-353 French INSEE Code wide-breadth validator

Mandatory validator Description

INSEE Control Key This validator computes the INSEE control key and compares it to the last 2 digits
of the pattern.
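The control key described above is the complement of the first 13 digits modulo 97, that is, key = 97 - (N mod 97), where N is the 13-digit number. The following is a minimal Python sketch, assuming that the INSEE Control Key validator applies exactly this rule to the digit-only forms matched by the patterns above. Corsican codes that contain 2A or 2B cannot match these digit-only patterns and are not handled. The function names are hypothetical.

```python
def insee_control_key(first13: str) -> int:
    """Return 97 - (N mod 97) for the 13-digit number N."""
    return 97 - int(first13) % 97


def is_valid_insee(candidate: str) -> bool:
    """Accept both pattern forms: 15 digits, or 13 digits, a space, and 2 digits."""
    digits = candidate.replace(" ", "")
    if len(digits) != 15 or not digits.isdigit():
        return False
    return int(digits[13:]) == insee_control_key(digits[:13])
```

For N = 2690549588157, N mod 97 is 17, so the expected key is 80, and both 269054958815780 and 2690549588157 80 pass.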

French INSEE Code narrow breadth

The narrow breadth detects a 15-digit number that encodes the sex, year and month of birth,
department of origin, commune of origin, and an order number. A space delimiter after the first
13 digits is optional. The last two digits of the INSEE code are a control key that is used to
validate the first 13 digits. It also requires the presence of related keywords.

Table 45-354 French INSEE Code narrow-breadth patterns

Patterns

\d{13} \d{2}

\d{15}

Table 45-355 French INSEE Code narrow-breadth validators

Mandatory validator Description

INSEE Control Key This validator computes the INSEE control key and
compares it to the last 2 digits of the pattern.

Find keywords With this option selected, at least one of the
following keywords or key phrases must be present
for the data to be matched.

Inputs:

INSEE, numéro de sécu, code sécu

social security number, social security code

French Passport Number

The French passport is an identity document issued to French citizens. Besides enabling the
bearer to travel internationally and serving as proof of French citizenship, the passport
makes it easier to obtain assistance abroad from French consular officials or, where no French
consulate is present, from the consular officials of other European Union member states.
The French Passport Number data identifier detects a nine-character alphanumeric pattern
that matches the French Passport Number format.
The French Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern without checksum
validation.
See “French Passport Number wide breadth” on page 1187.
■ The narrow breadth detects a nine-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.
See “French Passport Number narrow breadth” on page 1187.

French Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern without checksum validation.

Table 45-356 French Passport Number wide-breadth pattern

Pattern

\d\d[A-Za-z][A-Za-z]\d\d\d\d\d

Table 45-357 French Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.
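The guide does not define exactly which surrounding characters the Number delimiter validator inspects. As a hypothetical approximation only, the sketch below treats a match as delimited when no letter or digit directly touches it, and applies that rule to the wide-breadth shape. FR_PASSPORT and find_french_passports are illustrative names. It also shows why the letter class should be written [A-Za-z]: the range A-z additionally spans the ASCII punctuation between Z and a.

```python
import re

# Two digits, two letters, five digits. The lookarounds are a hypothetical
# stand-in for the Number delimiter validator: a match touching another
# letter or digit on either side is rejected.
FR_PASSPORT = re.compile(r"(?<![A-Za-z0-9])\d\d[A-Za-z][A-Za-z]\d{5}(?![A-Za-z0-9])")


def find_french_passports(text: str) -> list[str]:
    """Return every delimited candidate found in the text."""
    return FR_PASSPORT.findall(text)


print(find_french_passports("Passport: 12AB34567."))    # ['12AB34567']
print(find_french_passports("ref 912AB34567 ignored"))  # []

# The range A-z also covers [ \ ] ^ _ and the backtick.
print(re.fullmatch(r"[A-za-z]", "_") is not None)  # True
print(re.fullmatch(r"[A-Za-z]", "_") is not None)  # False
```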

French Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern without checksum validation.
It also requires the presence of related keywords.

Table 45-358 French Passport Number narrow-breadth pattern

Pattern

\d\d[A-Za-z][A-Za-z]\d\d\d\d\d

Table 45-359 French Passport Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

passport, Passport, French Passport, french passport,
Passport Card, Passport Book, passport card, passport
book, passport number, passport no, Passport Number

Passeport français, Passeport, Passeport livre,
Passeport carte, numéro passeport

French Social Security Number

The French Social Security Number (FSSN) is a unique number assigned to each French
citizen or resident foreign national. It serves as a national identification number.
The French Social Security Number data identifier detects a 15-character alphanumeric pattern
that matches the French Social Security Number format.
The French Social Security Number system data identifier provides three breadths of detection:
■ The wide breadth detects a 15-character alphanumeric pattern without checksum validation.
See “French Social Security Number wide breadth” on page 1188.
■ The medium breadth detects a 15-character alphanumeric pattern with checksum validation.
See “French Social Security Number medium breadth” on page 1189.
■ The narrow breadth detects a 15-character alphanumeric pattern that passes checksum
validation. It also requires the presence of related keywords.
See “French Social Security Number narrow breadth” on page 1189.

French Social Security Number wide breadth

The wide breadth detects a 15-character alphanumeric pattern without checksum validation.

Table 45-360 French Social Security Number wide-breadth pattern

Pattern

[12]\d{2}[012]\d{2}[AB1234567890]\d{8}

Table 45-361 French Social Security Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
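The Duplicate digits validator is described only as ensuring that the digits are not all the same. A minimal sketch of that rule follows. It assumes that non-digit characters, such as the A or B allowed in the pattern, are simply ignored; the function name is hypothetical.

```python
def passes_duplicate_digits(candidate: str) -> bool:
    """Reject candidates whose digits are all identical (e.g. 111111111111111)."""
    digits = [ch for ch in candidate if ch in "0123456789"]
    # Fewer than two distinct digits means every digit is the same
    # (or there are no digits at all).
    return len(set(digits)) > 1


print(passes_duplicate_digits("185057800608491"))  # True
print(passes_duplicate_digits("111111111111111"))  # False
```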

French Social Security Number medium breadth

The medium breadth detects a 15-character alphanumeric pattern with checksum validation.

Table 45-362 French Social Security Number medium-breadth pattern

Pattern

[12]\d{2}[012]\d{2}[AB1234567890]\d{8}

Table 45-363 French Social Security Number medium-breadth validator

Mandatory validator Description

French Social Security Number Validation Check Computes the checksum and validates the pattern against
it.
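The guide does not spell out the French Social Security Number checksum. Assuming it follows the published NIR rule (the two-digit key equals 97 minus the first 13 characters modulo 97, with the Corsican department codes 2A and 2B replaced by 19 and 18 before the division), a sketch might look like this. FSSN_PATTERN, CORSICA, and is_valid_fssn are hypothetical names.

```python
import re

# The pattern from the tables: sex, year, month, department (may be 2A/2B),
# then commune, order number and the two-digit key.
FSSN_PATTERN = re.compile(r"[12]\d{2}[012]\d{2}[AB1234567890]\d{8}")

# Assumed substitutions for the Corsican departments before computing mod 97.
CORSICA = {"2A": "19", "2B": "18"}


def is_valid_fssn(candidate: str) -> bool:
    if FSSN_PATTERN.fullmatch(candidate) is None:
        return False
    body, key = candidate[:13], candidate[13:]
    department = body[5:7]
    if department in CORSICA:
        body = body[:5] + CORSICA[department] + body[7:]
    elif not department.isdigit():
        return False  # A or B is only meaningful after a leading 2
    return 97 - int(body) % 97 == int(key)


print(is_valid_fssn("185057800608491"))  # True
print(is_valid_fssn("185052A00608435"))  # True (Corsica, 2A)
print(is_valid_fssn("185052A00608436"))  # False
```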

French Social Security Number narrow breadth

The narrow breadth detects a 15-character alphanumeric pattern that passes checksum
validation. It also requires the presence of related keywords.

Table 45-364 French Social Security Number narrow-breadth pattern

Pattern

[12]\d{2}[012]\d{2}[AB1234567890]\d{8}

Table 45-365 French Social Security Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

French Social Security Number Validation Check Computes the checksum and validates the pattern against
it.

Table 45-365 French Social Security Number narrow-breadth validators (continued)

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

French social security number, social security number,
FSSN#, SSN#, ssn, ssn#, socialsecuritynumber,
insurance number, national ID number, nationalid#

sécurité sociale non., sécurité sociale numéro, code
sécurité sociale, numéro d'assurance

German Passport Number

The German passport is issued to German nationals for the purpose of international
travel. A German passport is an officially recognized document that German authorities accept
as proof of identity from German citizens.
The German Passport Number data identifier detects an 11-character alphanumeric pattern
that matches the German Passport Number format.
The German Passport Number system data identifier provides three breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern ending with the letter "D"
without checksum validation.
See “German Passport Number wide breadth” on page 1190.
■ The medium breadth detects an 11-character alphanumeric pattern ending with the letter
"D" with checksum validation.
See “German Passport Number medium breadth” on page 1191.
■ The narrow breadth detects an 11-character alphanumeric pattern ending with the letter
"D" with checksum validation. It also requires the presence of related keywords.
See “German Passport Number narrow breadth” on page 1191.

German Passport Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern ending with the letter "D"
without checksum validation.

Table 45-366 German Passport Number wide-breadth patterns

Patterns

\w{9}\dD

\w{10}[dD]

Table 45-367 German Passport Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

German Passport Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern ending with the letter "D"
with checksum validation.

Table 45-368 German Passport Number medium-breadth patterns

Patterns

\w{9}\dD

\w{10}[dD]

Table 45-369 German Passport Number medium-breadth validator

Mandatory validator Description

German Passport Number Validation Check Computes the checksum every German Passport Number
must pass.
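The 11 characters matched here line up with the start of a passport's machine-readable zone: a nine-character document number, one check digit, and the issuing-state code D. Assuming the German Passport Number Validation Check applies the ICAO Doc 9303 check digit (digits keep their value, A to Z count as 10 to 35, weights 7, 3, 1 repeating, sum modulo 10), a sketch follows. That mapping to the product's validator is an assumption, the function names are hypothetical, and the sample number is illustrative.

```python
def icao_check_digit(field: str) -> int:
    """ICAO Doc 9303 check digit: values times weights 7, 3, 1 (repeating), mod 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # MRZ filler character
        else:
            raise ValueError(f"character {ch!r} is not allowed in an MRZ field")
        total += value * weights[i % 3]
    return total % 10


def is_valid_german_passport(candidate: str) -> bool:
    """Check an 11-character candidate: 9-character number, check digit, then D."""
    candidate = candidate.upper()  # the patterns also accept a lowercase d
    if len(candidate) != 11 or candidate[10] != "D" or candidate[9] not in "0123456789":
        return False
    try:
        return icao_check_digit(candidate[:9]) == int(candidate[9])
    except ValueError:
        return False  # e.g. an underscore, which \w in the pattern would allow


print(is_valid_german_passport("C01X00T478D"))  # True
print(is_valid_german_passport("C01X00T479D"))  # False
```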

German Passport Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern ending with the letter "D"
with checksum validation. It also requires the presence of related keywords.

Table 45-370 German Passport Number narrow-breadth patterns

Patterns

\w{9}\dD

\w{10}[dD]

Table 45-371 German Passport Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

German Passport Number Validation Check Computes the checksum every German Passport Number
must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

German passport number, passport number, passport
no, passportno#, passportnumber#

Reisepass kein, Reisepass, Passnummer

German Personal ID Number

The German Personal ID Number is issued to all German citizens.
The German Personal ID Number data identifier detects an 11-character alphanumeric pattern
that matches the German Personal ID Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern ending with the letter "D"
without checksum validation.
See “German Personal ID Number wide breadth” on page 1192.
■ The medium breadth detects an 11-character alphanumeric pattern ending with the letter
"D" with checksum validation.
See “German Personal ID Number medium breadth” on page 1193.
■ The narrow breadth detects an 11-character alphanumeric pattern ending with the letter
"D" with checksum validation. It also requires the presence of related keywords.
See “German Personal ID Number narrow breadth” on page 1193.

German Personal ID Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern ending with the letter "D"
without checksum validation.

Table 45-372 German Personal ID Number wide-breadth pattern

Pattern

\w{9}\dD

Table 45-373 German Personal ID Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

German Personal ID Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern ending with the letter "D"
with checksum validation.

Table 45-374 German Personal ID Number medium-breadth pattern

Pattern

\w{9}\dD

Table 45-375 German Personal ID Number medium-breadth validator

Mandatory validator Description

German ID Number Validation Check Computes the checksum and validates the pattern against
it.

German Personal ID Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern ending with the letter "D"
with checksum validation. It also requires the presence of related keywords.

Table 45-376 German Personal ID Number narrow-breadth pattern

Pattern

\w{9}\dD

Table 45-377 German Personal ID Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

German ID Number Validation Check Computes the checksum and validates the pattern against
it.

Table 45-377 German Personal ID Number narrow-breadth validators (continued)

Mandatory validators Description

Find keywords If you select this option, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

ID number, identification number, personal ID number,
perosnal ID, GPID, GPID#, unique personal ID number,
unique personal ID, insurance number, identity number,
German personal ID number

persönliche identifikationsnummer, ID-Nummer,
Deutsch persönliche-ID-Nummer, persönliche ID
Nummer, eindeutige ID-Nummer, persönliche Nummer,
identität nummer, Versicherungsnummer

Germany Driver's License Number

The Germany Driver's License Number identifies an individual's driver's licence, which is issued
by the local driver licensing authority in Germany.
The Germany Driver's License Number data identifier detects an 11-character alphanumeric
pattern that matches the Germany Driver's License Number format.
The Germany Driver's License Number data identifier provides two breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern without checksum validation.
See “Germany Driver's License Number wide breadth” on page 1194.
■ The narrow breadth detects an 11-character alphanumeric pattern without checksum
validation. It also requires the presence of related keywords.
See “Germany Driver's License Number narrow breadth” on page 1195.

Germany Driver's License Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern without checksum validation.

Table 45-378 Germany Driver's License Number wide-breadth pattern

Pattern

\w\d{2}\w{6}\d\w

Table 45-379 Germany Driver's License Number wide-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Germany Driver's License Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern without checksum validation.
It also requires the presence of related keywords.

Table 45-380 Germany Driver's License Number narrow-breadth pattern

Pattern

\w\d{2}\w{6}\d\w

Table 45-381 Germany Driver's License Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Führerschein, Fuhrerschein, Fuehrerschein,
Führerscheinnummer, Fuhrerscheinnummer,
Fuehrerscheinnummer, Führerscheinnummer,
Fuhrerscheinnummer, Fuehrerscheinnummer,
Führerschein- Nr, Fuhrerschein- Nr, Fuehrerschein- Nr

Driver License, Driver License Number, driver license
number, Driver Licence, Drivers Lic., Drivers License,
Drivers Licence, Driver's License, Driver's License
Number, driver's license number, Driver's Licence
Number, Driving License number, driving license
number, DL#, dl#, DLNO#, dlno#, driving licence,
driving license
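The narrow breadth adds a keyword requirement on top of the pattern and validators. The guide does not say how close a keyword must be to the number, so the sketch below assumes that a case-insensitive keyword anywhere in the same text is enough, and it uses only a few of the listed keywords. DL_PATTERN, KEYWORDS, and narrow_matches are hypothetical; this illustrates the idea and is not the product's matching logic. Note that Python's \w also matches non-ASCII letters and the underscore.

```python
import re

# The pattern from the tables above: 11 characters in total.
DL_PATTERN = re.compile(r"\w\d{2}\w{6}\d\w")

# A small subset of the listed keywords, for illustration only.
KEYWORDS = ("führerschein", "fuehrerschein", "driver's license", "dl#")


def narrow_matches(text: str) -> list[str]:
    """Return pattern matches, but only if at least one keyword occurs in the text."""
    lowered = text.casefold()
    if not any(keyword in lowered for keyword in KEYWORDS):
        return []
    return DL_PATTERN.findall(text)


print(narrow_matches("Führerschein: B072RRE2I55"))  # ['B072RRE2I55']
print(narrow_matches("Reference B072RRE2I55"))      # []
```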

Germany Value Added Tax (VAT) Number

Value Added Tax (VAT) is a tax levied on goods and services supplied in Germany and is
ultimately borne by the final customer.
The Germany Value Added Tax (VAT) Number data identifier detects an 11-character
alphanumeric pattern that matches the Germany Value Added Tax (VAT) Number format.
The Germany Value Added Tax (VAT) Number data identifier provides three breadths of
detection:
■ The wide breadth detects an 11-character alphanumeric pattern without checksum validation.
See “Germany Value Added Tax (VAT) Number wide breadth” on page 1196.
■ The medium breadth detects an 11-character alphanumeric pattern with checksum
validation.
See “Germany Value Added Tax (VAT) Number medium breadth” on page 1196.
■ The narrow breadth detects an 11-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “Germany Value Added Tax (VAT) Number narrow breadth” on page 1197.

Germany Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern without checksum validation.

Table 45-382 Germany Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Dd][Ee]\d{9}

[Dd][Ee] \d{9}

[Dd][Ee]\d{3}[, ]\d{3}[, ]\d{3}

[Dd][Ee] \d{3}[, ]\d{3}[, ]\d{3}

Table 45-383 Germany Value Added Tax (VAT) Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Germany Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern with checksum validation.

Table 45-384 Germany Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Dd][Ee]\d{9}

[Dd][Ee] \d{9}

[Dd][Ee]\d{3}[, ]\d{3}[, ]\d{3}

[Dd][Ee] \d{3}[, ]\d{3}[, ]\d{3}

Table 45-385 Germany Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Germany VAT Number Validation Check Checksum validator for the Germany Value Added Tax
(VAT) Number.
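The guide names the Germany VAT Number Validation Check but does not describe it. German VAT numbers are commonly validated with the ISO/IEC 7064 MOD 11,10 check digit computed over the first eight digits and compared with the ninth. Assuming that is what the validator does, the sketch below first accepts only the four pattern shapes, then strips the prefix and separators. DE_VAT, mod11_10_check_digit, and is_valid_de_vat are hypothetical names.

```python
import re

# DE prefix, optional space, then nine digits either run together or
# split 3-3-3 by a comma or space, mirroring the four patterns.
DE_VAT = re.compile(r"[Dd][Ee] ?(?:\d{9}|\d{3}[, ]\d{3}[, ]\d{3})")


def mod11_10_check_digit(digits: str) -> int:
    """ISO/IEC 7064 MOD 11,10 check digit over a string of decimal digits."""
    product = 10
    for ch in digits:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (2 * total) % 11
    check = 11 - product
    return 0 if check == 10 else check


def is_valid_de_vat(candidate: str) -> bool:
    """Shape check, then compare the ninth digit with the MOD 11,10 check digit."""
    if DE_VAT.fullmatch(candidate) is None:
        return False
    digits = re.sub(r"\D", "", candidate)  # drop D, E, spaces and commas
    return mod11_10_check_digit(digits[:8]) == int(digits[8])


print(is_valid_de_vat("DE136695976"))     # True
print(is_valid_de_vat("DE 136,695,976"))  # True
print(is_valid_de_vat("DE136695977"))     # False
```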

Germany Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.

Table 45-386 Germany Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Dd][Ee]\d{9}

[Dd][Ee] \d{9}

[Dd][Ee]\d{3}[, ]\d{3}[, ]\d{3}

[Dd][Ee] \d{3}[, ]\d{3}[, ]\d{3}

Table 45-387 Germany Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Germany VAT Number Validation Check Checksum validator for the Germany Value Added Tax
(VAT) Number.

Table 45-387 Germany Value Added Tax (VAT) Number narrow-breadth validators (continued)

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

VAT Number, vat no, vat number, VAT#, vat#

Mehrwertsteuer, MwSt, Mehrwertsteuer
Identifikationsnummer, Mehrwertsteuer nummer

Germany Tax Identification Number

Germany issues an 11-digit tax identification number for persons who have obligations to
declare taxes in Germany.
The Germany Tax Identification Number data identifier detects an 11-digit number that matches
the Germany Tax Identification Number format.
The Germany Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Germany Tax Identification Number wide breadth” on page 1198.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Germany Tax Identification Number medium breadth” on page 1199.
■ The narrow breadth detects an 11-digit number with checksum validation. It also requires
the presence of related keywords.
See “Germany Tax Identification Number narrow breadth” on page 1199.

Germany Tax Identification Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-388 Germany Tax Identification Number wide-breadth patterns

Patterns

\d{11}

\d{2} \d{3} \d{3} \d{3}

\d{2}-\d{3}-\d{3}-\d{3}

\d{2}.\d{3}.\d{3}.\d{3}

Table 45-388 Germany Tax Identification Number wide-breadth patterns (continued)

Patterns

\d{2},\d{3},\d{3},\d{3}
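The five patterns differ only in the separator (none, space, hyphen, period, or comma), which is the same at all three positions. A single Python regular expression with a backreference can express that set. In a standard regular expression an unescaped . outside a character class matches any character; this sketch assumes a literal period is intended, and the product's own pattern syntax may treat it differently. DE_TAX_ID and normalize_tax_id are hypothetical names, and no checksum is computed here.

```python
import re
from typing import Optional

# Two digits, then three groups of three digits. The separator is captured
# once and reused via \1, so all three positions must use the same one.
# Inside [...] the period is a literal character.
DE_TAX_ID = re.compile(r"\d{2}([ \-.,]?)\d{3}\1\d{3}\1\d{3}")


def normalize_tax_id(candidate: str) -> Optional[str]:
    """Return the bare 11 digits if the candidate fits one of the five forms."""
    if DE_TAX_ID.fullmatch(candidate) is None:
        return None
    return "".join(ch for ch in candidate if ch in "0123456789")


print(normalize_tax_id("12 345 678 901"))  # 12345678901
print(normalize_tax_id("12-345.678-901"))  # None (mixed separators)
```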

Table 45-389 Germany Tax Identification Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Germany Tax Identification Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-390 Germany Tax Identification Number medium-breadth patterns

Patterns

\d{11}

\d{2} \d{3} \d{3} \d{3}

\d{2}-\d{3}-\d{3}-\d{3}

\d{2}.\d{3}.\d{3}.\d{3}

\d{2},\d{3},\d{3},\d{3}

Table 45-391 Germany Tax Identification Number medium-breadth validator

Mandatory validator Description

Germany Tax Number Validation Check Computes the checksum and validates the pattern against
it.

Germany Tax Identification Number narrow breadth

The narrow breadth detects an 11-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-392 Germany Tax Identification Number narrow-breadth patterns

Patterns

\d{11}

\d{2} \d{3} \d{3} \d{3}

\d{2}-\d{3}-\d{3}-\d{3}

\d{2}.\d{3}.\d{3}.\d{3}

\d{2},\d{3},\d{3},\d{3}

Table 45-393 Germany Tax Identification Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Germany Tax Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

tin, tin number, tin no, tin#, german tax identification
number, germany tax identification number, tax
number, tax id

Zinn, Zinnnummer, Zinn Nr, Zinn#,
Steueridentifikationsnummer, Steuer
Identifikationsnummer, Steuernummer, Steuer ID,
Identifikationsnummer

Greece Passport Number

Greek passports are issued to Greek citizens for the purpose of international travel. The
passport, as well as the national identity card, gives the holder the right of free movement and
residence in any state of the European Union and the European Economic Area.
The Greece Passport Number data identifier detects a nine-character alphanumeric pattern
that matches the Greece Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern that matches the Greece
Passport Number format. It excludes common test patterns.
See “Greece Passport Number wide breadth” on page 1201.
■ The narrow breadth detects a nine-character alphanumeric pattern that matches the Greece
Passport Number format. It excludes common test patterns and also requires the presence
of related keywords.
See “Greece Passport Number narrow breadth” on page 1201.

Greece Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern that matches the Greece
Passport Number format. It excludes common test patterns.

Table 45-394 Greece Passport Number wide-breadth pattern

Pattern

[a-zA-Z]{2}\d{7}

Table 45-395 Greece Passport Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:
0000000, 1111111, 2222222, 3333333, 4444444,
5555555, 6666666, 7777777, 8888888, 9999999
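A minimal sketch of the wide-breadth shape combined with the Exclude ending characters rule listed above, where the ten excluded endings are seven copies of the same digit. GR_PASSPORT, EXCLUDED_ENDINGS, and is_greek_passport_candidate are hypothetical names, and the Number delimiter check is left out for brevity.

```python
import re

GR_PASSPORT = re.compile(r"[a-zA-Z]{2}\d{7}")

# The excluded endings from the table: 0000000, 1111111, ..., 9999999.
EXCLUDED_ENDINGS = tuple(digit * 7 for digit in "0123456789")


def is_greek_passport_candidate(candidate: str) -> bool:
    """Pattern match that is not one of the excluded test values."""
    if GR_PASSPORT.fullmatch(candidate) is None:
        return False
    return not candidate.endswith(EXCLUDED_ENDINGS)


print(is_greek_passport_candidate("AE1234567"))  # True
print(is_greek_passport_candidate("AE0000000"))  # False
```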

Greece Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern that matches the Greece
Passport Number format. It excludes common test patterns and also requires the presence
of related keywords.

Table 45-396 Greece Passport Number narrow-breadth pattern

Pattern

[a-zA-Z]{2}\d{7}

Table 45-397 Greece Passport Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,
5555555, 6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no., passport
no, passport#, Passport No., PASSPORT, Ελλάδα
pasport αριθμός, Greece passport no., Ελλάδα pasport
όχι., Ελλάδα Αριθμός Διαβατηρίου, διαβατήριο,
Διαβατήριο, ΕΛΛΑΔΑ ΔΙΑΒΑΤΗΡΙΟ, Ελλάδα
Διαβατήριο, ελλάδα διαβατήριο, Διαβατήριο Βιβλίο,
βιβλίο διαβατηρίου

Greece Social Security Number (AMKA)

The Greek social security number (AMKA) is the 11-digit work and insurance identification
number of every worker, retired person, and protected family member in Greece.
The Greece Social Security Number (AMKA) data identifier detects an 11-digit number that
matches the Greece Social Security Number (AMKA) format.
The Greece Social Security Number (AMKA) data identifier provides three breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Greece Social Security Number (AMKA) wide breadth” on page 1202.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Greece Social Security Number (AMKA) medium breadth” on page 1203.
■ The narrow breadth detects an 11-digit number with checksum validation. It also requires
the presence of related keywords.
See “Greece Social Security Number (AMKA) narrow breadth” on page 1203.

Greece Social Security Number (AMKA) wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-398 Greece Social Security Number (AMKA) wide-breadth pattern

Pattern

\d{11}

Table 45-399 Greece Social Security Number (AMKA) wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Greece Social Security Number (AMKA) medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-400 Greece Social Security Number (AMKA) medium-breadth pattern

Pattern

\d{11}

Table 45-401 Greece Social Security Number (AMKA) medium-breadth validator

Mandatory validator Description

Greece Social Security Number (AMKA) Computes the checksum and validates the pattern against
it.
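The guide does not describe the AMKA checksum. AMKA numbers are generally understood to end in a Luhn check digit; assuming the validator applies the Luhn algorithm to all 11 digits, a sketch follows. The function names are hypothetical and the sample number is synthetic.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    if not number or any(ch not in "0123456789" for ch in number):
        return False
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:  # every second digit, starting left of the check digit
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_valid_amka(candidate: str) -> bool:
    """Assumed AMKA check: exactly 11 digits that pass the Luhn test."""
    return len(candidate) == 11 and luhn_valid(candidate)


print(is_valid_amka("01019012341"))  # True
print(is_valid_amka("01019012342"))  # False
```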

Greece Social Security Number (AMKA) narrow breadth

The narrow breadth detects an 11-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-402 Greece Social Security Number (AMKA) narrow-breadth pattern

Pattern

\d{11}

Table 45-403 Greece Social Security Number (AMKA) narrow-breadth validators

Mandatory validators Description

Greece Social Security Number (AMKA) Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

greece social security number, greece ssn, greece
ssn#, greece social security no., social security no.,
ssn#, amka, greece amka

Αριθμού Μητρώου Κοινωνικής Ασφάλισης

Greek Tax Identification Number

The Arithmos Forologikou Mitroou (AFM) is a unique personal tax identification number assigned
to every individual resident in Greece and to anyone who owns property in Greece.
The Greek Tax Identification Number data identifier detects a nine-digit number that matches
the Greek Tax Identification Number format.
The Greek Tax Identification Number system data identifier provides three breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Greek Tax Identification Number wide breadth” on page 1204.
■ The medium breadth detects a nine-digit number with checksum validation.
See “Greek Tax Identification Number medium breadth” on page 1205.
■ The narrow breadth detects a nine-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Greek Tax Identification Number narrow breadth” on page 1205.

Greek Tax Identification Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-404 Greek Tax Identification Number wide-breadth pattern

Pattern

\d{9}

Table 45-405 Greek Tax Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Greek Tax Identification Number medium breadth

The medium breadth detects a nine-digit number with checksum validation.

Table 45-406 Greek Tax Identification Number medium-breadth pattern

Pattern

\d{9}

Table 45-407 Greek Tax Identification Number medium-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Greek Tax Identification Number Validation Check Computes Greek Tax Identification Number checksum
every Greek Tax Identification Number must pass.
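The Greek Tax Identification Number Validation Check is not described in detail. The commonly documented AFM rule multiplies each of the first eight digits by a descending power of two (256 down to 2), takes the sum modulo 11 and then modulo 10, and compares the result with the ninth digit. Assuming the validator implements that rule, a sketch is shown below; the function names are hypothetical.

```python
def afm_check_digit(first8: str) -> int:
    """Sum of digit * 2**(8 - i) for the first eight digits, then mod 11, mod 10."""
    total = sum(int(ch) * 2 ** (8 - i) for i, ch in enumerate(first8))
    return total % 11 % 10


def is_valid_afm(candidate: str) -> bool:
    """Nine digits whose last digit equals the check digit of the first eight."""
    if len(candidate) != 9 or any(ch not in "0123456789" for ch in candidate):
        return False
    return afm_check_digit(candidate[:8]) == int(candidate[8])


print(is_valid_afm("094259216"))  # True
print(is_valid_afm("094259217"))  # False
```

Greek VAT numbers are generally the AFM prefixed with EL, so the same check is often applied after removing that prefix.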

Greek Tax Identification Number narrow breadth

The narrow breadth detects a nine-digit number that passes checksum validation. It also
requires the presence of related keywords.

Table 45-408 Greek Tax Identification Number narrow-breadth pattern

Pattern

\d{9}

Table 45-409 Greek Tax Identification Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Table 45-409 Greek Tax Identification Number narrow-breadth validators (continued)

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Greek Tax Identification Number Validation Check Computes Greek Tax Identification Number checksum
every Greek Tax Identification Number must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

AFM, TIN, tax ID No., Tax id no, tax identification
number, tax id no., Tax Registry Number, Tax Registry
No., AFM#, TIN#, Tax Identification Number, TaxIDNo#,
taxregistryno#

Αριθμός Φορολογικού Μητρώου, AΦΜ, AΦΜ αριθμός,
Φορολογικού Μητρώου Νο., τον αριθμό φορολογικού
μητρώου

Greece Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Greece, VAT is
administered by the VAT office for the region in which the business is established.
The Greece Value Added Tax (VAT) Number data identifier detects an 11-character
alphanumeric pattern that matches the Greece VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern that matches the Greece
VAT Number format without checksum validation. It excludes common test patterns.
See “Greece Value Added Tax (VAT) Number wide breadth” on page 1207.
■ The medium breadth detects an 11-character alphanumeric pattern that matches the Greece
VAT Number format with checksum validation.
See “Greece Value Added Tax (VAT) Number medium breadth” on page 1207.
■ The narrow breadth detects an 11-character alphanumeric pattern that matches the Greece
VAT Number format with checksum validation. It excludes common test patterns and
also requires the presence of related keywords.
See “Greece Value Added Tax (VAT) Number narrow breadth” on page 1208.

Greece Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern that matches the Greece VAT
Number format without checksum validation. It excludes common test patterns.

Table 45-410 Greece Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Ee][Ll]\d{9}

[Ee][Ll] \d{9}

Table 45-411 Greece Value Added Tax (VAT) Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Greece Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern that matches the Greece
VAT Number format with checksum validation.

Table 45-412 Greece Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Ee][Ll]\d{9}

[Ee][Ll] \d{9}

Table 45-413 Greece Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Greece VAT Number Validation Check Computes the checksum and validates the pattern against
it.
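This section does not describe what the Greece VAT Number Validation Check computes. The sketch below assumes it uses the publicly described check-digit rule for the Greek tax number (AFM). Under that rule, the first eight digits are multiplied by descending powers of two (256 down to 2). The sum, taken modulo 11 and then modulo 10, must equal the ninth digit. The function name is hypothetical.

```python
def greece_vat_checksum_ok(candidate: str) -> bool:
    """ASSUMED AFM check-digit rule; accepts "EL123456789" or "EL 123456789"."""
    digits = candidate.replace(" ", "")[2:]  # drop the "EL"/"el" prefix
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights 2**8, 2**7, ..., 2**1 for the first eight digits.
    total = sum(int(d) * 2 ** (8 - i) for i, d in enumerate(digits[:8]))
    return (total % 11) % 10 == int(digits[8])


print(greece_vat_checksum_ok("EL094259216"))  # True
print(greece_vat_checksum_ok("EL094259215"))  # False
```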

Greece Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern that matches the Greece
VAT Number format with checksum validation. It checks for common test patterns, and also
requires the presence of related keywords.

Table 45-414 Greece Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Ee][Ll]\d{9}

[Ee][Ll] \d{9}

Table 45-415 Greece Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Greece VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat number, value added tax, vat, VAT, VAT#, vat#,
FPA, fpa, VATIN, vatin, Foros Prostithemenis Axias,
arithmós dexamenís, Fóros Prostithémenis Axías,
μέγας κάδος, ΦΠΑ, Φ Π Α, Φόρος Προστιθέμενης
Αξίας, ΦΟΡΟΣ ΠΡΟΣΤΙΘΕΜΕΝΗΣ ΑΞΙΑΣ, φόρος
προστιθέμενης αξίας, Arithmos Forologikou Mitroou,
Α.Φ.Μ, ΑΦΜ
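The narrow breadth adds the Find keywords validator to the medium-breadth checks. The sketch below shows that gating step, with two assumptions. First, it searches the whole text for a keyword with a case-sensitive substring match. The list above contains both "vat" and "VAT", which suggests case sensitivity, but that is only an inference. Second, it reuses the assumed AFM check-digit rule. The keyword subset and function names are illustrative, and the Number delimiter validator is not modeled.

```python
import re

# Illustrative subset of the Table 45-415 keywords (matched case-sensitively here).
KEYWORDS = ("vat number", "VAT", "vat", "FPA", "ΦΠΑ", "ΑΦΜ")
GREECE_VAT_RE = re.compile(r"[Ee][Ll] ?\d{9}")


def afm_checksum_ok(digits: str) -> bool:
    # ASSUMED AFM rule: weights 256..2, then (sum mod 11) mod 10 == last digit.
    total = sum(int(d) * 2 ** (8 - i) for i, d in enumerate(digits[:8]))
    return (total % 11) % 10 == int(digits[8])


def find_greece_vat_narrow(text: str) -> list[str]:
    # Find keywords: with no keyword anywhere in the text, nothing is reported.
    if not any(keyword in text for keyword in KEYWORDS):
        return []
    hits = []
    for m in GREECE_VAT_RE.finditer(text):
        digits = m.group(0)[-9:]
        # Exclude ending characters (nine identical digits), then the checksum.
        if len(set(digits)) > 1 and afm_checksum_ok(digits):
            hits.append(m.group(0))
    return hits


print(find_greece_vat_narrow("VAT no.: EL094259216"))  # ['EL094259216']
print(find_greece_vat_narrow("Ref: EL094259216"))      # []
```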

Healthcare Common Procedure Coding System (HCPCS CPT Code)
The Healthcare Common Procedure Coding System (HCPCS) is a set of health care procedure
codes based on the American Medical Association's Current Procedural Terminology (CPT).

The Healthcare Common Procedure Coding System (HCPCS CPT Code) data identifier detects
a two- or five-character alphanumeric pattern that matches the HCPCS CPT Code format.
The Healthcare Common Procedure Coding System (HCPCS CPT Code) data identifier provides
two breadths of detection:
■ The medium breadth detects a two- or five-character alphanumeric pattern with checksum
validation.
See “Healthcare Common Procedure Coding System (HCPCS CPT Code) medium breadth”
on page 1209.
■ The narrow breadth detects a two- or five-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Healthcare Common Procedure Coding System (HCPCS CPT Code) narrow breadth”
on page 1210.

Healthcare Common Procedure Coding System (HCPCS CPT Code) medium breadth

The medium breadth detects a two- or five-character alphanumeric pattern with checksum
validation.

Table 45-416 Healthcare Common Procedure Coding System (HCPCS CPT Code)
medium-breadth patterns

Patterns Patterns (continued)

[A][AD-KMO-Z1-9] [V][1-35-9P]

[B][ALOPRU] [X][EPSU]

[C][A-NPR-T] [Z][AB]

[D][A] [L]\d{4}

[E][1-4A-EJMPTXY] [A][04-9]\d{3}

[F][1-9A-CPX] [B][459][0-29]\d{2}

[G][1-9A-HJ-Z] [C][12589]\d{3}

[H][9A-Z] [E][0128]\d{3}

[J][1-4A-FW] [G][03689]\d{3}

[K][1-4A-Z] [H][0-2]0[0-5]\d

[Q][1-9C-HJ-NPSTW-Z] [J][0-37-9]\d{3}

[QK]0 [K][0][0-14-9]\d{2}

[L][1CDLMR-T] [M]0[013][067][01456]

[M][2S] [P][2379][06][0-7]\d

[N][BRU] [Q][0-59][01459]\d{2}

[P][1-6A-DIL-OST] [R]007[056]

[R][A-EIRT] [S][0-589]\d{3}

[S][A-HJ-NQS-Z] [T][1245][0159][0-49]\d

[T][1-9AC-HJ-NP-W] [V][25][0-7]\d{2}

[U][1-9A-HJKNP-S]
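Each entry in Table 45-416 is an independent pattern. The left column holds two-character codes and the right column holds five-character codes. The sketch below assumes each pattern must match a whole token, and uses only a handful of rows from the table. It joins those rows into a single alternation. It does not implement the HCPCS CPT Code Validation Check, whose logic this section does not describe. The names are hypothetical.

```python
import re

# A few rows copied from Table 45-416; the full table would be listed the same way.
HCPCS_PATTERNS = [
    r"[A][AD-KMO-Z1-9]",  # two-character codes
    r"[G][1-9A-HJ-Z]",
    r"[QK]0",
    r"[L]\d{4}",          # five-character codes
    r"[J][0-37-9]\d{3}",
    r"[R]007[056]",
]
HCPCS_RE = re.compile("|".join(f"(?:{p})" for p in HCPCS_PATTERNS))


def is_hcpcs_token(token: str) -> bool:
    """True if the whole token matches one of the listed patterns."""
    return HCPCS_RE.fullmatch(token) is not None


print(is_hcpcs_token("J7050"))  # True
print(is_hcpcs_token("J5123"))  # False: the second character must be 0-3 or 7-9
print(is_hcpcs_token("GI"))     # False: the class [1-9A-HJ-Z] skips the letter I
```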

Table 45-417 Healthcare Common Procedure Coding System (HCPCS CPT Code)
medium-breadth validator

Mandatory validator Description

HCPCS CPT Code Validation Check Computes the checksum and validates the pattern against
it.

Healthcare Common Procedure Coding System (HCPCS CPT Code) narrow breadth

The narrow breadth detects a two- or five-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.

Table 45-418 Healthcare Common Procedure Coding System (HCPCS CPT Code)
narrow-breadth patterns

Patterns Patterns (continued)

[A][AD-KMO-Z1-9] [V][1-35-9P]

[B][ALOPRU] [X][EPSU]

[C][A-NPR-T] [Z][AB]

[D][A] [L]\d{4}

[E][1-4A-EJMPTXY] [A][04-9]\d{3}

[F][1-9A-CPX] [B][459][0-29]\d{2}

[G][1-9A-HJ-Z] [C][12589]\d{3}

[H][9A-Z] [E][0128]\d{3}

[J][1-4A-FW] [G][03689]\d{3}

[K][1-4A-Z] [H][0-2]0[0-5]\d

[Q][1-9C-HJ-NPSTW-Z] [J][0-37-9]\d{3}

[QK]0 [K][0][0-14-9]\d{2}

[L][1CDLMR-T] [M]0[013][067][01456]

[M][2S] [P][2379][06][0-7]\d

[N][BRU] [Q][0-59][01459]\d{2}

[P][1-6A-DIL-OST] [R]007[056]

[R][A-EIRT] [S][0-589]\d{3}

[S][A-HJ-NQS-Z] [T][1245][0159][0-49]\d

[T][1-9AC-HJ-NP-W] [V][25][0-7]\d{2}

[U][1-9A-HJKNP-S]

Table 45-419 Healthcare Common Procedure Coding System (HCPCS CPT Code)
narrow-breadth validators

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

hcpcs cpt code, HCPCS, hcpcs, cpt, CPT, healthcare common
procedure coding system, current procedural terminology

Number delimiter Validates a match by checking the surrounding characters.

HCPCS CPT Code Validation Check Computes the checksum and validates the pattern against
it.

Health Insurance Claim Number

The Health Insurance Claim Number (HICN) is assigned by the United States Social Security
Administration to identify an individual as a Medicare beneficiary.
The Health Insurance Claim Number data identifier detects a 7- to 12-character alphanumeric
pattern that matches the Health Insurance Claim Number format.
The Health Insurance Claim Number data identifier provides three breadths of detection:
■ The wide breadth detects a 7- to 12-character alphanumeric pattern without checksum
validation.
See “Health Insurance Claim Number wide breadth” on page 1212.
■ The medium breadth detects a 7- to 12-character alphanumeric pattern with checksum
validation.
See “Health Insurance Claim Number medium breadth” on page 1213.
■ The narrow breadth detects a 7- to 12-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Health Insurance Claim Number narrow breadth” on page 1214.

Health Insurance Claim Number wide breadth

The wide breadth detects a 7- to 12-character alphanumeric pattern without checksum
validation.

Table 45-420 Health Insurance Claim Number wide-breadth patterns

Patterns

[a-zA-Z]{1,3}-\d{6}

[a-zA-Z]{1,3}-[0-8]\d{2} \d{1}[1-9] \d{4}

[a-zA-Z]{1,3}-[0-8]\d{3}[1-9]\d{4}

[a-zA-Z]{1,3}-[0-8]\d{2}[1-9]\d{5}

[a-zA-Z]{1,3}-[0-8]\d{2}-\d{1}[1-9]-\d{4}

[a-zA-Z]{1,3}-[0-8]\d{2} [1-9]\d{1} \d{4}

[a-zA-Z]{1,3}-[0-8]\d{2}-[1-9]\d{1}-\d{4}

[0-8]\d{2} \d{1}[1-9] \d{4}-[a-zA-Z]{1,3}

[0-8]\d{3}[1-9]\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}[1-9]\d{5}-[a-zA-Z]{1,3}

[0-8]\d{2}-\d{1}[1-9]-\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2} [1-9]\d{1} \d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}-[1-9]\d{1}-\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}[1-9]\d{1}\d{4}-[a-zA-Z][0-9]
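Most patterns in Table 45-420 pair a one- to three-letter code with a nine-digit number whose first digit is limited to 0 through 8. The sketch below tests a string against a few of the listed patterns, copied verbatim. It assumes each pattern must match the whole candidate, which this section does not state. It does not model the Number delimiter validator or the Health Care Insurance Number Check. The function name is hypothetical.

```python
import re
from typing import Optional

# A few Table 45-420 patterns, copied verbatim; the full table lists more.
HICN_PATTERNS = [
    r"[a-zA-Z]{1,3}-\d{6}",
    r"[a-zA-Z]{1,3}-[0-8]\d{2}-\d{1}[1-9]-\d{4}",
    r"[0-8]\d{2}-\d{1}[1-9]-\d{4}-[a-zA-Z]{1,3}",
    r"[0-8]\d{2}[1-9]\d{1}\d{4}-[a-zA-Z][0-9]",
]
HICN_REGEXES = [re.compile(p) for p in HICN_PATTERNS]


def matching_hicn_pattern(candidate: str) -> Optional[str]:
    """Return the first listed pattern that matches the whole candidate, or None."""
    for regex in HICN_REGEXES:
        if regex.fullmatch(candidate):
            return regex.pattern
    return None


print(matching_hicn_pattern("123-45-6789-A"))  # the third pattern
print(matching_hicn_pattern("923-45-6789-A"))  # None: the first digit must be 0-8
print(matching_hicn_pattern("A-123-40-6789"))  # None: the fifth digit must be 1-9
```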

Table 45-421 Health Insurance Claim Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Health Insurance Claim Number medium breadth

The medium breadth detects a 7- to 12-character alphanumeric pattern with checksum
validation.

Table 45-422 Health Insurance Claim Number medium-breadth patterns

Patterns

[a-zA-Z]{1,3}-\d{6}

[a-zA-Z]{1,3}-[0-8]\d{2} \d{1}[1-9] \d{4}

[a-zA-Z]{1,3}-[0-8]\d{3}[1-9]\d{4}

[a-zA-Z]{1,3}-[0-8]\d{2}[1-9]\d{5}

[a-zA-Z]{1,3}-[0-8]\d{2}-\d{1}[1-9]-\d{4}

[a-zA-Z]{1,3}-[0-8]\d{2} [1-9]\d{1} \d{4}

[a-zA-Z]{1,3}-[0-8]\d{2}-[1-9]\d{1}-\d{4}

[0-8]\d{2} \d{1}[1-9] \d{4}-[a-zA-Z]{1,3}

[0-8]\d{3}[1-9]\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}[1-9]\d{5}-[a-zA-Z]{1,3}

[0-8]\d{2}-\d{1}[1-9]-\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2} [1-9]\d{1} \d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}-[1-9]\d{1}-\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}[1-9]\d{1}\d{4}-[a-zA-Z][0-9]

Table 45-423 Health Insurance Claim Number medium-breadth validator

Mandatory validator Description

Health Care Insurance Number Check Computes the checksum and validates the pattern against
it.

Health Insurance Claim Number narrow breadth

The narrow breadth detects a 7- to 12-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.

Table 45-424 Health Insurance Claim Number narrow-breadth patterns

Patterns

[a-zA-Z]{1,3}-\d{6}

[a-zA-Z]{1,3}-[0-8]\d{2} \d{1}[1-9] \d{4}

[a-zA-Z]{1,3}-[0-8]\d{3}[1-9]\d{4}

[a-zA-Z]{1,3}-[0-8]\d{2}[1-9]\d{5}

[a-zA-Z]{1,3}-[0-8]\d{2}-\d{1}[1-9]-\d{4}

[a-zA-Z]{1,3}-[0-8]\d{2} [1-9]\d{1} \d{4}

[a-zA-Z]{1,3}-[0-8]\d{2}-[1-9]\d{1}-\d{4}

[0-8]\d{2} \d{1}[1-9] \d{4}-[a-zA-Z]{1,3}

[0-8]\d{3}[1-9]\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}[1-9]\d{5}-[a-zA-Z]{1,3}

[0-8]\d{2}-\d{1}[1-9]-\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2} [1-9]\d{1} \d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}-[1-9]\d{1}-\d{4}-[a-zA-Z]{1,3}

[0-8]\d{2}[1-9]\d{1}\d{4}-[a-zA-Z][0-9]

Table 45-425 Health Insurance Claim Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Health Care Insurance Number Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

health insurance claim number, hicn, hic number, hic no,
hic#, hic no., hicn#, hicno#

Hong Kong ID
The Hong Kong ID is the unique identifier for all residents of Hong Kong that appears on the
Hong Kong Identity Card.
The Hong Kong ID data identifier detects eight-character patterns that match the Hong Kong
ID format.
The Hong Kong ID data identifier provides two breadths of detection:
■ The wide breadth detects eight characters in the form LDDDDDD(D) or LDDDDDD(A). The
last character in the detected string is used to validate a checksum.
See “Hong Kong ID wide breadth” on page 1216.
■ The narrow breadth detects eight characters in the form LDDDDDD(D) or LDDDDDD(A).
The last character in the detected string is used to validate a checksum. It also requires
the presence of Hong Kong ID-related keywords.
See “Hong Kong ID narrow breadth” on page 1216.

Hong Kong ID wide breadth

The wide breadth detects eight characters in the form LDDDDDD(D) or LDDDDDD(A). The
last character in the detected string is used to validate a checksum.

Table 45-426 Hong Kong ID wide-breadth patterns

Patterns

[A-Za-z]\d{6}(\d)

[A-Za-z][A-Za-z]\d{6}(\d)

[A-Za-z]\d{6}(A)

[A-Za-z]\d{6}(a)

[A-Za-z][A-Za-z]\d{6}(A)

[A-Za-z][A-Za-z]\d{6}(a)

[A-Za-z]\d{7}

[A-Za-z][A-Za-z]\d{7}

[A-Za-z]\d{6}[Aa]

[A-Za-z][A-Za-z]\d{6}[Aa]

Table 45-427 Hong Kong ID wide-breadth validator

Mandatory validator Description

Hong Kong ID Computes the checksum and validates the pattern against it.
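This section does not describe the Hong Kong ID checksum. The sketch below assumes the widely published HKID check-digit rule. Under that rule, letters count as A=10 through Z=35, and a one-letter prefix is padded with a space worth 36. The eight characters are weighted 9 down to 2. The check digit is (11 - sum mod 11) mod 11, and a result of 10 is written as "A". In standard regular-expression syntax, the parentheses in patterns such as [A-Za-z]\d{6}(\d) form a group rather than literal characters, so the sketch accepts the check digit with or without parentheses. The names are hypothetical.

```python
import re

HKID_RE = re.compile(r"([A-Z]{1,2})(\d{6})(?:\(([0-9A])\)|([0-9A]))")


def hkid_is_valid(hkid: str) -> bool:
    """ASSUMED HKID check-digit rule, as described in the lead-in."""
    m = HKID_RE.fullmatch(hkid.strip().upper())
    if not m:
        return False
    prefix, digits = m.group(1), m.group(2)
    check = m.group(3) or m.group(4)
    # A one-letter prefix is padded with a space worth 36; letters are A=10 ... Z=35.
    values = [36] if len(prefix) == 1 else []
    values += [ord(c) - ord("A") + 10 for c in prefix]
    values += [int(d) for d in digits]
    # Weights 9, 8, ..., 2 for the eight characters before the check digit.
    total = sum(v * w for v, w in zip(values, range(9, 1, -1)))
    expected = (11 - total % 11) % 11
    return check == ("A" if expected == 10 else str(expected))


print(hkid_is_valid("A123456(3)"))  # True
print(hkid_is_valid("A123456(4)"))  # False
```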

Hong Kong ID narrow breadth

The narrow breadth detects eight characters in the form LDDDDDD(D) or LDDDDDD(A). The
last character in the detected string is used to validate a checksum. It also requires the presence
of Hong Kong ID-related keywords.

Table 45-428 Hong Kong ID narrow-breadth patterns

Patterns

[A-Za-z]\d{6}(\d)

[A-Za-z][A-Za-z]\d{6}(\d)

[A-Za-z]\d{6}(A)

[A-Za-z]\d{6}(a)

[A-Za-z][A-Za-z]\d{6}(A)

[A-Za-z][A-Za-z]\d{6}(a)

[A-Za-z]\d{7}

[A-Za-z][A-Za-z]\d{7}

[A-Za-z]\d{6}[Aa]

[A-Za-z][A-Za-z]\d{6}[Aa]

Table 45-429 Hong Kong ID narrow-breadth validators

Mandatory validators Description

Hong Kong ID Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

身份證, 三顆星, Identity card, Hong Kong permanent resident ID Card, HKID

Hungary Driver's Licence Number

A driving license in Hungary is a document issued by the Ministry of Economics and Transport
that confirms the holder's right to drive motor vehicles.
The Hungary Driver's Licence Number data identifier detects an eight-character alphanumeric
pattern that matches the Hungary Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the Hungary
Driver's Licence Number format. It checks for common test patterns.
See “Hungary Driver's Licence Number wide breadth” on page 1218.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the
Hungary Driver's Licence Number format. It checks for common test patterns, and it requires
the presence of related keywords.
See “Hungary Driver's Licence Number narrow breadth” on page 1218.

Hungary Driver's Licence Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the Hungary
Driver's Licence Number format. It checks for common test patterns.

Table 45-430 Hungary Driver's Licence Number wide-breadth patterns

Pattern

[Cc][A-Za-z]\d{6}

Table 45-431 Hungary Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:
000000, 111111, 222222, 333333, 444444, 555555,
666666, 777777, 888888, 999999

Hungary Driver's Licence Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the Hungary
Driver's Licence Number format. It checks for common test patterns, and it requires the presence
of related keywords.

Table 45-432 Hungary Driver's Licence Number narrow-breadth patterns

Pattern

[Cc][A-Za-z]\d{6}

Table 45-433 Hungary Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,
666666, 777777, 888888, 999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

DLNo#, dlno#, DL#, Drivers Lic., driver licence, driver license, drivers licence,
drivers license, driver's licence, driver's license, driving licence, driving
license, licence number, license number, driving permit

jogosítvány, Illesztőprogramok Lic, jogsi, licencszám, vezetői engedély,
VEZETŐI ENGEDÉLY, vezető engedély, VEZETŐ ENGEDÉLY

Hungary Passport Number

Hungarian passports are issued to Hungarian citizens for international travel by the Central
Data Processing, Registration, and Election Office of the Hungarian Ministry of the Interior.
The Hungary Passport Number data identifier detects an eight- or nine-character alphanumeric
pattern that matches the Hungary Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-character alphanumeric pattern that matches
the Hungary Passport Number format without checksum validation.
See “Hungary Passport Number wide breadth” on page 1220.
■ The medium breadth detects an eight- or nine-character alphanumeric pattern that matches
the Hungary Passport Number format with checksum validation.
See “Hungary Passport Number medium breadth” on page 1220.
■ The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Hungary Passport Number format with checksum validation. It also requires the presence
of related keywords.
See “Hungary Passport Number narrow breadth” on page 1220.

Hungary Passport Number wide breadth

The wide breadth detects an eight- or nine-character alphanumeric pattern that matches the
Hungary Passport Number format without checksum validation.

Table 45-434 Hungary Passport Number wide-breadth patterns

Pattern

[A-Za-z]{2}[0-9]{6}

[A-Za-z]{2}[0-9]{7}

Table 45-435 Hungary Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Hungary Passport Number medium breadth

The medium breadth detects an eight- or nine-character alphanumeric pattern that matches
the Hungary Passport Number format with checksum validation.

Table 45-436 Hungary Passport Number medium-breadth patterns

Pattern

[A-Za-z]{2}[0-9]{6}

[A-Za-z]{2}[0-9]{7}

Table 45-437 Hungary Passport Number medium-breadth validators

Mandatory validator Description

Hungary Passport Number Validation Check Computes the checksum and validates the pattern against
it.

Hungary Passport Number narrow breadth

The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Hungary Passport Number format with checksum validation. It also requires the presence
of related keywords.

Table 45-438 Hungary Passport Number narrow-breadth patterns

Pattern

[A-Za-z]{2}[0-9]{6}

[A-Za-z]{2}[0-9]{7}

Table 45-439 Hungary Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Hungary Passport Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, útlevél, hungarian passport number, Magyar útlevélszám,
hungarianpassportnumber, passport book, útlevél könyv, passeport, nombre,
numéro de passeport, hongrois, numéro de passeport hongrois

Hungarian Social Security Number

The Hungarian Social Security Number (TAJ) is a unique identifier issued by the Hungarian
government.
The Hungarian Social Security Number data identifier detects a nine-digit number that matches
the Hungarian Social Security Number format.
The Hungarian Social Security Number system data identifier provides three breadths of
detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Hungarian Social Security Number wide breadth” on page 1222.
■ The medium breadth detects a nine-digit number with checksum validation.
See “Hungarian Social Security Number medium breadth” on page 1222.
■ The narrow breadth detects a nine-digit number that passes checksum validation. It also
requires related keywords.
See “Hungarian Social Security Number narrow breadth” on page 1222.

Hungarian Social Security Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-440 Hungarian Social Security Number wide-breadth pattern

Pattern

\d{9}

Table 45-441 Hungarian Social Security Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Hungarian Social Security Number medium breadth

The medium breadth detects a nine-digit number with checksum validation.

Table 45-442 Hungarian Social Security Number medium-breadth pattern

Pattern

\d{9}

Table 45-443 Hungarian Social Security Number medium-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Hungarian Social Security Validation Check Computes the checksum and validates the pattern against
it.
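This section does not spell out the Hungarian Social Security Validation Check. The sketch below assumes the publicly documented TAJ check-digit rule. Under that rule, digits 1, 3, 5, and 7 are weighted 3, digits 2, 4, 6, and 8 are weighted 7, and the weighted sum modulo 10 must equal the ninth digit. The function name is hypothetical.

```python
def taj_checksum_ok(number: str) -> bool:
    """ASSUMED TAJ rule: odd positions weighted 3, even positions weighted 7;
    the weighted sum of the first eight digits modulo 10 equals the ninth digit."""
    if len(number) != 9 or not (number.isascii() and number.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 == 0 else 7) for i, d in enumerate(number[:8]))
    return total % 10 == int(number[8])


print(taj_checksum_ok("123456788"))  # True
print(taj_checksum_ok("123456789"))  # False
print(taj_checksum_ok("000000000"))  # True: the checksum alone accepts repeated digits
```

The last call shows why the narrow breadth pairs the checksum with the Duplicate digits validator: under this rule, an all-zero number passes the checksum.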

Hungarian Social Security Number narrow breadth

The narrow breadth detects a nine-digit number that passes checksum validation. It also
requires related keywords.

Table 45-444 Hungarian Social Security Number narrow-breadth pattern

Pattern

\d{9}

Table 45-445 Hungarian Social Security Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Hungarian Social Security Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Hungarian social security number, social security number, socialsecuritynumber#,
hssn#, HSSN#, socialsecuritynno, HSSN, TAJ, TAJ#, SSN, SSN#, social security no

ÁFA, Közösségi adószám, Általános forgalmi adó szám, hozzáadottérték adó,
ÁFA szám, magyar ÁFA szám

Hungarian Tax Identification Number

The Hungarian Tax Identification Number is a 10-digit number that always begins with the digit
"8."
The Hungarian Tax Identification Number data identifier detects a 10-digit number that matches
the Hungarian Tax Identification Number format.
The Hungarian Tax Identification Number system data identifier provides three breadths of
detection:
■ The wide breadth detects a 10-digit number beginning with the digit "8" without checksum
validation.
See “Hungarian Tax Identification Number wide breadth” on page 1224.
■ The medium breadth detects a 10-digit number beginning with the digit "8" with checksum
validation.
See “Hungarian Tax Identification Number medium breadth” on page 1224.
■ The narrow breadth detects a 10-digit number beginning with the digit "8" that passes
checksum validation. It also requires the presence of related keywords.
See “Hungarian Tax Identification Number narrow breadth” on page 1224.

Hungarian Tax Identification Number wide breadth

The wide breadth detects a 10-digit number beginning with the digit "8" without checksum
validation.

Table 45-446 Hungarian Tax Identification Number wide-breadth pattern

Pattern

[8]\d{9}

Table 45-447 Hungarian Tax Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Hungarian Tax Identification Number medium breadth

The medium breadth detects a 10-digit number beginning with the digit "8" with checksum
validation.

Table 45-448 Hungarian Tax Identification Number medium-breadth pattern

Pattern

[8]\d{9}

Table 45-449 Hungarian Tax Identification Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Hungarian Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.
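This section does not describe how the Hungarian Tax Identification Number Validation Check works. The sketch below assumes the publicly documented rule for the Hungarian tax identification number. The first digit is 8. Each of the first nine digits is multiplied by its position (1 through 9), and the sum modulo 11 must equal the tenth digit. A remainder of 10 therefore never validates. The function name is hypothetical.

```python
def hungarian_tin_is_valid(number: str) -> bool:
    """ASSUMED rule: ten digits, starting with 8, and the sum of
    digit * position over positions 1-9, modulo 11, equals the tenth digit."""
    if len(number) != 10 or not (number.isascii() and number.isdigit()):
        return False
    if number[0] != "8":
        return False
    total = sum(int(d) * pos for pos, d in enumerate(number[:9], start=1))
    # A remainder of 10 can never equal a single digit, so such numbers are rejected.
    return total % 11 == int(number[9])


print(hungarian_tin_is_valid("8123456786"))  # True
print(hungarian_tin_is_valid("8123456787"))  # False
```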

Hungarian Tax Identification Number narrow breadth

The narrow breadth detects a 10-digit number beginning with the digit "8" that passes checksum
validation. It also requires the presence of related keywords.

Table 45-450 Hungarian Tax Identification Number narrow-breadth pattern

Pattern

[8]\d{9}

Table 45-451 Hungarian Tax Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Hungarian Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Hungarian tax identification number, Hungarian TIN, tax ID number, VAT number,
tax authority no, tax ID, tax identity number, taxidnumber#, tin#, TIN#,
Hungatiantin#, tax identification no, taxIDno#, adóazonosító szám, adószám,
adóhatóság szám

Hungarian VAT Number

All Hungarian businesses, including non-profit organizations, are granted a value-added tax
(VAT) number upon registration at the Court of Registry.
The Hungarian VAT Number data identifier detects an eight-digit number prefixed with the
letters "HU" or "hu" that matches the Hungarian VAT Number format.
The Hungarian VAT Number system data identifier provides three breadths of detection:
■ The wide breadth detects an eight-digit number prefixed with "HU" or "hu" without checksum
validation.
See “Hungarian VAT Number wide breadth” on page 1226.
■ The medium breadth detects an eight-digit number prefixed with "HU" or "hu" with checksum
validation.
See “Hungarian VAT Number medium breadth” on page 1226.
■ The narrow breadth detects an eight-digit number prefixed with "HU" or "hu" that passes
checksum validation. It also requires the presence of related keywords.
See “Hungarian VAT Number narrow breadth” on page 1226.

Hungarian VAT Number wide breadth

The wide breadth detects an eight-digit number prefixed with the letters "HU" or "hu" without
checksum validation.

Table 45-452 Hungarian VAT Number wide-breadth patterns

Patterns

HU\d{8}

hu\d{8}

Table 45-453 Hungarian VAT Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Hungarian VAT Number medium breadth

The medium breadth detects an eight-digit number prefixed with the letters "HU" or "hu" with
checksum validation.

Table 45-454 Hungarian VAT Number medium-breadth patterns

Patterns

HU\d{8}

hu\d{8}

Table 45-455 Hungarian VAT Number medium-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Hungarian VAT Number Validation Check Computes the checksum and validates the pattern against
it.
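This section does not describe the Hungarian VAT Number Validation Check. The sketch below assumes the commonly published rule for the eight-digit Hungarian VAT number. The digits are weighted 9, 7, 3, 1, 9, 7, 3, 1, and the weighted sum must be divisible by 10. It accepts exactly the two prefixes in Table 45-454. The names are hypothetical.

```python
import re

# Table 45-454 patterns: "HU" or "hu" followed by eight digits.
HU_VAT_RE = re.compile(r"(?:HU|hu)(\d{8})")


def hungarian_vat_is_valid(candidate: str) -> bool:
    """ASSUMED rule: weights 9,7,3,1,9,7,3,1; weighted sum divisible by 10."""
    m = HU_VAT_RE.fullmatch(candidate)
    if not m:
        return False
    weights = (9, 7, 3, 1, 9, 7, 3, 1)
    total = sum(int(d) * w for d, w in zip(m.group(1), weights))
    return total % 10 == 0


print(hungarian_vat_is_valid("HU12892312"))  # True
print(hungarian_vat_is_valid("HU12892313"))  # False
print(hungarian_vat_is_valid("Hu12892312"))  # False: only "HU" and "hu" are listed
```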

Hungarian VAT Number narrow breadth

The narrow breadth detects an eight-digit number prefixed with the letters "HU" or "hu" that
passes checksum validation. It also requires the presence of related keywords.

Table 45-456 Hungarian VAT Number narrow-breadth patterns

Patterns

HU\d{8}

hu\d{8}

Table 45-457 Hungarian VAT Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Hungarian VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

VAT, VAT No., Value Added Tax Number, vat#, vatno#, hungarianvatno#, tax no.,
VAT number, value added tax

ÁFA, Közösségi adószám, Általános forgalmi adó szám, hozzáadottérték adó,
ÁFA szám, magyar ÁFA szám

IBAN Central
The International Bank Account Number (IBAN) is an international standard for identifying
bank accounts across national borders.
The IBAN Central data identifier detects IBAN numbers for Andorra, Austria, Belgium, Germany,
Italy, Liechtenstein, Luxembourg, Malta, Monaco, San Marino, and Switzerland.
The IBAN Central data identifier provides two breadths of detection:
■ The wide breadth detects a country-specific IBAN number with checksum validation.
See “IBAN Central wide breadth” on page 1228.
■ The narrow breadth detects a country-specific IBAN number with checksum validation. It
also requires the presence of related keywords.
See “IBAN Central narrow breadth” on page 1229.

Note: Do not add the NIB validation to any IBAN data identifiers that apply to DLP Agents. The
NIB validator is only for use with server-side detection.

IBAN Central wide breadth

The wide breadth detects a country-specific IBAN number with checksum validation. IBAN
numbers can include space delimiters, dash delimiters, or no delimiters.

Table 45-458 IBAN Central wide-breadth patterns

Patterns Description

AD\d{2}\d{4}\d{4}\w{4}\w{4}\w{4} Andorra patterns

AD\d{2} \d{4} \d{4} \w{4} \w{4} \w{4}

AD\d{2}-\d{4}-\d{4}-\w{4}-\w{4}-\w{4}

AT\d{2}\d{4}\d{4}\d{4}\d{4} Austria patterns

AT\d{2} \d{4} \d{4} \d{4} \d{4}

AT\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

BE\d{2}\d{4}\d{4}\d{4} Belgium patterns

BE\d{2} \d{4} \d{4} \d{4}

BE\d{2}-\d{4}-\d{4}-\d{4}

CH\d{2}\d{4}\d\w{3}\w{4}\w{4}\w Switzerland patterns

CH\d{2} \d{4} \d\w{3} \w{4} \w{4} \w

CH\d{2}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w

DE\d{2}\d{4}\d{4}\d{4}\d{4}\d{2} Germany patterns

DE\d{2} \d{4} \d{4} \d{4} \d{4} \d{2}

DE\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

IT\d{2}[A-Z]\d{3}\d{4}\d{3}\w\w{4}\w{4}\w{3} Italy patterns

IT\d{2} [A-Z]\d{3} \d{4} \d{3}\w \w{4} \w{4} \w{3}

IT\d{2}-[A-Z]\d{3}-\d{4}-\d{3}\w-\w{4}-\w{4}-\w{3}

LI\d{2}\d{4}\d\w{3}\w{4}\w{4}\w Liechtenstein patterns

LI\d{2} \d{4} \d\w{3} \w{4} \w{4} \w

LI\d{2}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w

LU\d{2}\d{3}\w\w{4}\w{4}\w{4} Luxembourg patterns

LU\d{2} \d{3}\w \w{4} \w{4} \w{4}

LU\d{2}-\d{3}\w-\w{4}-\w{4}-\w{4}

MC\d{2}\d{4}\d{4}\d{2}\w{2}\w{4}\w{4}\w\d{2} Monaco patterns

MC\d{2} \d{4} \d{4} \d{2}\w{2} \w{4} \w{4} \w\d{2}

MC\d{2}-\d{4}-\d{4}-\d{2}\w{2}-\w{4}-\w{4}-\w\d{2}

MT\d{2}[A-Z]{4}\d{4}\d\w{3}\w{4}\w{4}\w{4}\w{3} Malta patterns

MT\d{2} [A-Z]{4} \d{4} \d\w{3} \w{4} \w{4} \w{4} \w{3}

MT\d{2}-[A-Z]{4}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w{4}-\w{3}

SM\d{2}[A-Z]\d{3}\d{4}\d{3}\w\w{4}\w{4}\w{3} San Marino patterns

SM\d{2} [A-Z]\d{3} \d{4} \d{3}\w \w{4} \w{4} \w{3}

SM\d{2}-[A-Z]\d{3}-\d{4}-\d{3}\w-\w{4}-\w{4}-\w{3}

Table 45-459 IBAN Central wide-breadth validator

Validator Description

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the
complete match.
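For illustration, the following Python sketch applies the standard ISO 7064 MOD 97-10 check that is used for IBANs. It first removes the space and dash delimiters that the patterns allow. It then moves the first four characters to the end, replaces each letter with a number from 10 to 35, and requires the resulting integer to leave a remainder of 1 when divided by 97. The product's own implementation is not shown in this guide, so treat this as a sketch of the technique. The function name is hypothetical.

```python
import re


def iban_mod97_ok(match: str) -> bool:
    """ISO 7064 MOD 97-10 check as conventionally applied to IBANs."""
    compact = re.sub(r"[ -]", "", match).upper()  # strip space/dash delimiters
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]+", compact):
        return False
    # Move country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_mod97_ok("DE89 3704 0044 0532 0130 00"))  # True
print(iban_mod97_ok("DE88370400440532013000"))       # False
```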

IBAN Central narrow breadth

The narrow breadth detects a country-specific IBAN number with checksum validation. It also
requires the presence of related keywords.

Table 45-460 IBAN Central narrow-breadth patterns

Patterns Description

AD\d{2}\d{4}\d{4}\w{4}\w{4}\w{4} Andorra patterns

AD\d{2} \d{4} \d{4} \w{4} \w{4} \w{4}

AD\d{2}-\d{4}-\d{4}-\w{4}-\w{4}-\w{4}

AT\d{2}\d{4}\d{4}\d{4}\d{4} Austria patterns

AT\d{2} \d{4} \d{4} \d{4} \d{4}

AT\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

BE\d{2}\d{4}\d{4}\d{4} Belgium patterns

BE\d{2} \d{4} \d{4} \d{4}

BE\d{2}-\d{4}-\d{4}-\d{4}

CH\d{2}\d{4}\d\w{3}\w{4}\w{4}\w Switzerland patterns

CH\d{2} \d{4} \d\w{3} \w{4} \w{4} \w

CH\d{2}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w

DE\d{2}\d{4}\d{4}\d{4}\d{4}\d{2} Germany patterns

DE\d{2} \d{4} \d{4} \d{4} \d{4} \d{2}

DE\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

IT\d{2}[A-Z]\d{3}\d{4}\d{3}\w\w{4}\w{4}\w{3} Italy patterns

IT\d{2} [A-Z]\d{3} \d{4} \d{3}\w \w{4} \w{4} \w{3}

IT\d{2}-[A-Z]\d{3}-\d{4}-\d{3}\w-\w{4}-\w{4}-\w{3}

LI\d{2}\d{4}\d\w{3}\w{4}\w{4}\w Liechtenstein patterns

LI\d{2} \d{4} \d\w{3} \w{4} \w{4} \w

LI\d{2}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w

LU\d{2}\d{3}\w\w{4}\w{4}\w{4} Luxembourg patterns

LU\d{2} \d{3}\w \w{4} \w{4} \w{4}

LU\d{2}-\d{3}\w-\w{4}-\w{4}-\w{4}

MC\d{2}\d{4}\d{4}\d{2}\w{2}\w{4}\w{4}\w\d{2} Monaco patterns

MC\d{2} \d{4} \d{4} \d{2}\w{2} \w{4} \w{4} \w\d{2}

MC\d{2}-\d{4}-\d{4}-\d{2}\w{2}-\w{4}-\w{4}-\w\d{2}

MT\d{2}[A-Z]{4}\d{4}\d\w{3}\w{4}\w{4}\w{4}\w{3} Malta patterns

MT\d{2} [A-Z]{4} \d{4} \d\w{3} \w{4} \w{4} \w{4} \w{3}

MT\d{2}-[A-Z]{4}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w{4}-\w{3}

Table 45-460 IBAN Central narrow-breadth patterns (continued)

Patterns Description

SM\d{2}[A-Z]\d{3}\d{4}\d{3}\w\w{4}\w{4}\w{3} San Marino patterns

SM\d{2} [A-Z]\d{3} \d{4} \d{3}\w \w{4} \w{4} \w{3}

SM\d{2}-[A-Z]\d{3}-\d{4}-\d{3}\w-\w{4}-\w{4}-\w{3}

Table 45-461 IBAN Central narrow-breadth validators

Validators Description

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the
complete match.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

Code IBAN, numéro IBAN, IBAN Code, IBAN number
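The narrow breadth adds a keyword requirement on top of the pattern and checksum. The following Python sketch shows that combination in its simplest form. It assumes that a case-insensitive occurrence of any listed keyword anywhere in the same text is enough. The product may apply its own proximity or case rules, which this section does not describe. The helper names are hypothetical, and `validator` stands in for a checksum function such as a Mod 97 check.

```python
import re
from typing import Callable

# Keyword list copied from the narrow-breadth "Find keywords" inputs above.
IBAN_KEYWORDS = ["Code IBAN", "numéro IBAN", "IBAN Code", "IBAN number"]


def has_keyword(text: str, keywords: list[str] = IBAN_KEYWORDS) -> bool:
    """Assumption: any keyword, anywhere in the text, compared case-insensitively."""
    folded = text.casefold()
    return any(k.casefold() in folded for k in keywords)


def narrow_matches(text: str, pattern: str,
                   validator: Callable[[str], bool]) -> list[str]:
    """Return pattern matches that pass `validator`, but only if a keyword is present."""
    if not has_keyword(text):
        return []
    return [m.group(0) for m in re.finditer(pattern, text) if validator(m.group(0))]
```

Passing `lambda s: True` as the validator isolates the keyword rule when experimenting with a single pattern, such as the space-delimited Germany pattern from the IBAN Central table.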

IBAN East
The International Bank Account Number (IBAN) is an international standard for identifying
bank accounts across national borders.
The IBAN East data identifier detects IBAN numbers for Bosnia, Bulgaria, Croatia, Cyprus,
Czech Republic, Estonia, Greece, Hungary, Israel, Latvia, Lithuania, Macedonia, Montenegro,
Poland, Romania, Serbia, Slovakia, Slovenia, Turkey, and Tunisia.
The IBAN East data identifier provides two breadths of detection:
■ The wide breadth detects a country-specific IBAN number with checksum validation.
See “IBAN East wide breadth” on page 1232.
■ The narrow breadth detects a country-specific IBAN number with checksum validation. It
also requires the presence of related keywords.
See “IBAN East narrow-breadth” on page 1234.

Note: Do not add the NIB validation to any IBAN data identifiers that apply to DLP Agents. The
NIB validator is only for use with server-side detection.

IBAN East wide breadth

The wide breadth detects a country-specific IBAN number with checksum validation. IBAN
numbers can include space delimiters, dash delimiters, or no delimiters.
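Each country in these tables appears as three regular expressions that differ only in the separator between groups: none, a single space, or a single dash. The following Python sketch regenerates the three forms from one list of groups, which can make a table entry easier to check. `delimiter_variants` is a hypothetical helper, not a product feature.

```python
import re


def delimiter_variants(prefix: str, groups: list[str]) -> list[str]:
    """Join IBAN pattern groups with no separator, a space, and a dash."""
    # The first group is appended directly to the country prefix, as in the tables.
    return [prefix + sep.join(groups) for sep in ("", " ", "-")]


# Bosnia, as listed in the wide-breadth table.
bosnia = delimiter_variants("BA", [r"\d{2}", r"\d{4}", r"\d{4}", r"\d{4}", r"\d{4}"])
for p in bosnia:
    print(p)
```

Because each variant uses one separator throughout, a value with mixed separators such as `BA39 1290-0794 0102 8494` matches none of them.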

Table 45-462 IBAN East wide-breadth patterns

Patterns Description

BA\d{2}\d{4}\d{4}\d{4}\d{4} Bosnia patterns

BA\d{2} \d{4} \d{4} \d{4} \d{4}

BA\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

BG\d{2}[A-Z]{4}\d{4}\d{2}\w{2}\w{4}\w{2} Bulgaria patterns

BG\d{2} [A-Z]{4} \d{4} \d{2}\w{2} \w{4} \w{2}

BG\d{2}-[A-Z]{4}-\d{4}-\d{2}\w{2}-\w{4}-\w{2}

CY\d{2}\d{4}\d{4}\w{4}\w{4}\w{4}\w{4} Cyprus patterns

CY\d{2} \d{4} \d{4} \w{4} \w{4} \w{4} \w{4}

CY\d{2}-\d{4}-\d{4}-\w{4}-\w{4}-\w{4}-\w{4}

CZ\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Czech Republic patterns

CZ\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

CZ\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

EE\d{2}\d{4}\d{4}\d{4}\d{4} Estonia patterns

EE\d{2} \d{4} \d{4} \d{4} \d{4}

EE\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

GR\d{2}\d{4}\d{3}\w\w{4}\w{4}\w{4}\w{3} Greece patterns

GR\d{2} \d{4} \d{3}\w \w{4} \w{4} \w{4} \w{3}

GR\d{2}-\d{4}-\d{3}\w-\w{4}-\w{4}-\w{4}-\w{3}

HR\d{2}\d{4}\d{4}\d{4}\d{4}\d Croatia patterns

HR\d{2} \d{4} \d{4} \d{4} \d{4} \d

HR\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d

HU\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d{4} Hungary patterns

HU\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d{4}

HU\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

Table 45-462 IBAN East wide-breadth patterns (continued)

Patterns Description

IL\d{2}\d{4}\d{4}\d{4}\d{4}\d{3} Israel patterns

IL\d{2} \d{4} \d{4} \d{4} \d{4} \d{3}

IL\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{3}

LT\d{2}\d{4}\d{4}\d{4}\d{4} Lithuania patterns

LT\d{2} \d{4} \d{4} \d{4} \d{4}

LT\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

LV\d{2}[A-Z]{4}\w{4}\w{4}\w{4}\w Latvia patterns

LV\d{2} [A-Z]{4} \w{4} \w{4} \w{4} \w

LV\d{2}-[A-Z]{4}-\w{4}-\w{4}-\w{4}-\w

ME\d{2}\d{4}\d{4}\d{4}\d{4}\d{2} Montenegro patterns

ME\d{2} \d{4} \d{4} \d{4} \d{4} \d{2}

ME\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

MK\d{2}\d{3}\w\w{4}\w{4}\w\d{2} Macedonia patterns

MK\d{2} \d{3}\w \w{4} \w{4} \w\d{2}

MK\d{2}-\d{3}\w-\w{4}-\w{4}-\w\d{2}

PL\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d{4} Poland patterns

PL\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d{4}

PL\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

RO\d{2}[A-Z]{4}\w{4}\w{4}\w{4}\w{4} Romania patterns

RO\d{2} [A-Z]{4} \w{4} \w{4} \w{4} \w{4}

RO\d{2}-[A-Z]{4}-\w{4}-\w{4}-\w{4}-\w{4}

RS\d{2}\d{4}\d{4}\d{4}\d{4}\d{2} Serbia patterns

RS\d{2} \d{4} \d{4} \d{4} \d{4} \d{2}

RS\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

SI\d{2}\d{4}\d{4}\d{4}\d{3} Slovenia patterns

SI\d{2} \d{4} \d{4} \d{4} \d{3}

SI\d{2}-\d{4}-\d{4}-\d{4}-\d{3}

Table 45-462 IBAN East wide-breadth patterns (continued)

Patterns Description

SK\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Slovak Republic patterns

SK\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

SK\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

TN59\d{4}\d{4}\d{4}\d{4}\d{4} Tunisia patterns

TN59 \d{4} \d{4} \d{4} \d{4} \d{4}

TN59-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

TR\d{2}\d{4}\d\w{3}\w{4}\w{4}\w{4}\w{2} Turkey patterns

TR\d{2} \d{4} \d\w{3} \w{4} \w{4} \w{4} \w{2}

TR\d{2}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w{4}-\w{2}

Table 45-463 IBAN East wide-breadth validator

Validator Description

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the
complete match.

IBAN East narrow-breadth

The narrow breadth detects a country-specific IBAN number with checksum validation. It also
requires the presence of related keywords.

Table 45-464 IBAN East narrow-breadth patterns

Patterns Description

BA\d{2}\d{4}\d{4}\d{4}\d{4} Bosnia patterns

BA\d{2} \d{4} \d{4} \d{4} \d{4}

BA\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

BG\d{2}[A-Z]{4}\d{4}\d{2}\w{2}\w{4}\w{2} Bulgaria patterns

BG\d{2} [A-Z]{4} \d{4} \d{2}\w{2} \w{4} \w{2}

BG\d{2}-[A-Z]{4}-\d{4}-\d{2}\w{2}-\w{4}-\w{2}

Table 45-464 IBAN East narrow-breadth patterns (continued)

Patterns Description

CY\d{2}\d{4}\d{4}\w{4}\w{4}\w{4}\w{4} Cyprus patterns

CY\d{2} \d{4} \d{4} \w{4} \w{4} \w{4} \w{4}

CY\d{2}-\d{4}-\d{4}-\w{4}-\w{4}-\w{4}-\w{4}

CZ\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Czech Republic patterns

CZ\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

CZ\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

EE\d{2}\d{4}\d{4}\d{4}\d{4} Estonia patterns

EE\d{2} \d{4} \d{4} \d{4} \d{4}

EE\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

GR\d{2}\d{4}\d{3}\w\w{4}\w{4}\w{4}\w{3} Greece patterns

GR\d{2} \d{4} \d{3}\w \w{4} \w{4} \w{4} \w{3}

GR\d{2}-\d{4}-\d{3}\w-\w{4}-\w{4}-\w{4}-\w{3}

HR\d{2}\d{4}\d{4}\d{4}\d{4}\d Croatia patterns

HR\d{2} \d{4} \d{4} \d{4} \d{4} \d

HR\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d

HU\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d{4} Hungary patterns

HU\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d{4}

HU\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

IL\d{2}\d{4}\d{4}\d{4}\d{4}\d{3} Israel patterns

IL\d{2} \d{4} \d{4} \d{4} \d{4} \d{3}

IL\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{3}

LT\d{2}\d{4}\d{4}\d{4}\d{4} Lithuania patterns

LT\d{2} \d{4} \d{4} \d{4} \d{4}

LT\d{2}-\d{4}-\d{4}-\d{4}-\d{4}

LV\d{2}[A-Z]{4}\w{4}\w{4}\w{4}\w Latvia patterns

LV\d{2} [A-Z]{4} \w{4} \w{4} \w{4} \w

LV\d{2}-[A-Z]{4}-\w{4}-\w{4}-\w{4}-\w

Table 45-464 IBAN East narrow-breadth patterns (continued)

Patterns Description

ME\d{2}\d{4}\d{4}\d{4}\d{4}\d{2} Montenegro patterns

ME\d{2} \d{4} \d{4} \d{4} \d{4} \d{2}

ME\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

MK\d{2}\d{3}\w\w{4}\w{4}\w\d{2} Macedonia patterns

MK\d{2} \d{3}\w \w{4} \w{4} \w\d{2}

MK\d{2}-\d{3}\w-\w{4}-\w{4}-\w\d{2}

PL\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d{4} Poland patterns

PL\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d{4}

PL\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

RO\d{2}[A-Z]{4}\w{4}\w{4}\w{4}\w{4} Romania patterns

RO\d{2} [A-Z]{4} \w{4} \w{4} \w{4} \w{4}

RO\d{2}-[A-Z]{4}-\w{4}-\w{4}-\w{4}-\w{4}

RS\d{2}\d{4}\d{4}\d{4}\d{4}\d{2} Serbia patterns

RS\d{2} \d{4} \d{4} \d{4} \d{4} \d{2}

RS\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

SI\d{2}\d{4}\d{4}\d{4}\d{3} Slovenia patterns

SI\d{2} \d{4} \d{4} \d{4} \d{3}

SI\d{2}-\d{4}-\d{4}-\d{4}-\d{3}

SK\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Slovak Republic patterns

SK\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

SK\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

TN59\d{4}\d{4}\d{4}\d{4}\d{4} Tunisia patterns

TN59 \d{4} \d{4} \d{4} \d{4} \d{4}

TN59-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

TR\d{2}\d{4}\d\w{3}\w{4}\w{4}\w{4}\w{2} Turkey patterns

TR\d{2} \d{4} \d\w{3} \w{4} \w{4} \w{4} \w{2}

TR\d{2}-\d{4}-\d\w{3}-\w{4}-\w{4}-\w{4}-\w{2}

Table 45-465 IBAN East narrow-breadth validators

Validators Description

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the
complete match.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

Code IBAN, numéro IBAN, IBAN Code, IBAN number

IBAN West
The International Bank Account Number (IBAN) is an international standard for identifying
bank accounts across national borders.
The IBAN West data identifier detects IBAN numbers for Denmark, Faroe Islands, Finland,
France, Gibraltar, Greenland, Iceland, Ireland, Netherlands, Norway, Portugal, Spain, Sweden,
and the United Kingdom.
The IBAN West data identifier provides two breadths of detection:
■ The wide breadth detects a country-specific IBAN number with checksum validation.
See “IBAN West wide breadth” on page 1237.
■ The narrow breadth detects a country-specific IBAN number with checksum validation. It
also requires the presence of related keywords.
See “IBAN West narrow-breadth” on page 1239.

Note: Do not add the NIB validation to any IBAN data identifiers that apply to DLP Agents. The
NIB validator is only for use with server-side detection.

IBAN West wide breadth

The wide breadth detects a country-specific IBAN number that passes a checksum. IBAN
numbers can include space delimiters, dash delimiters, or no delimiters.

Table 45-466 IBAN West wide-breadth patterns

Patterns Description

DK\d{2}\d{4}\d{4}\d{4}\d{2} Denmark patterns

DK\d{2} \d{4} \d{4} \d{4} \d{2}

DK\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

ES\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Spain patterns

ES\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

ES\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

FI\d{2}\d{4}\d{4}\d{4}\d{2} Finland patterns

FI\d{2} \d{4} \d{4} \d{4} \d{2}

FI\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

FO\d{2}\d{4}\d{4}\d{4}\d{2} Faroe Islands patterns

FO\d{2} \d{4} \d{4} \d{4} \d{2}

FO\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

FR\d{2}\d{4}\d{4}\d{2}\w{2}\w{4}\w{4}\w\d{2} France patterns

FR\d{2} \d{4} \d{4} \d{2}\w{2} \w{4} \w{4} \w\d{2}

FR\d{2}-\d{4}-\d{4}-\d{2}\w{2}-\w{4}-\w{4}-\w\d{2}

GB\d{2}[A-Z]{4}\d{4}\d{4}\d{4}\d{2} United Kingdom patterns

GB\d{2} [A-Z]{4} \d{4} \d{4} \d{4} \d{2}

GB\d{2}-[A-Z]{4}-\d{4}-\d{4}-\d{4}-\d{2}

GI\d{2}[A-Z]{4}\w{4}\w{4}\w{4}\w{3} Gibraltar patterns

GI\d{2} [A-Z]{4} \w{4} \w{4} \w{4} \w{3}

GI\d{2}-[A-Z]{4}-\w{4}-\w{4}-\w{4}-\w{3}

GL\d{2}\d{4}\d{4}\d{4}\d{2} Greenland patterns

GL\d{2} \d{4} \d{4} \d{4} \d{2}

GL\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

IE\d{2}[A-Z]{4}\d{4}\d{4}\d{4}\d{2} Ireland patterns

IE\d{2} [A-Z]{4} \d{4} \d{4} \d{4} \d{2}

IE\d{2}-[A-Z]{4}-\d{4}-\d{4}-\d{4}-\d{2}

Table 45-466 IBAN West wide-breadth patterns (continued)

Patterns Description

IS\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d{2} Iceland patterns

IS\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d{2}

IS\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

NL\d{2}[A-Z]{4}\d{4}\d{4}\d{2} Netherlands patterns

NL\d{2} [A-Z]{4} \d{4} \d{4} \d{2}

NL\d{2}-[A-Z]{4}-\d{4}-\d{4}-\d{2}

NO\d{2}\d{4}\d{4}\d{3} Norway patterns

NO\d{2} \d{4} \d{4} \d{3}

NO\d{2}-\d{4}-\d{4}-\d{3}

PT\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d Portugal patterns

PT\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d

PT\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d

SE\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Sweden patterns

SE\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

SE\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

Table 45-467 IBAN West wide-breadth validator

Validator Description

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the
complete match.

IBAN West narrow-breadth

The narrow breadth detects a country-specific IBAN number that passes a checksum. It also
requires the presence of IBAN-related keywords.

Table 45-468 IBAN West narrow-breadth patterns

Patterns Description

DK\d{2}\d{4}\d{4}\d{4}\d{2} Denmark patterns

DK\d{2} \d{4} \d{4} \d{4} \d{2}

DK\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

ES\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Spain patterns

ES\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

ES\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

FI\d{2}\d{4}\d{4}\d{4}\d{2} Finland patterns

FI\d{2} \d{4} \d{4} \d{4} \d{2}

FI\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

FO\d{2}\d{4}\d{4}\d{4}\d{2} Faroe Islands patterns

FO\d{2} \d{4} \d{4} \d{4} \d{2}

FO\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

FR\d{2}\d{4}\d{4}\d{2}\w{2}\w{4}\w{4}\w\d{2} France patterns

FR\d{2} \d{4} \d{4} \d{2}\w{2} \w{4} \w{4} \w\d{2}

FR\d{2}-\d{4}-\d{4}-\d{2}\w{2}-\w{4}-\w{4}-\w\d{2}

GB\d{2}[A-Z]{4}\d{4}\d{4}\d{4}\d{2} United Kingdom patterns

GB\d{2} [A-Z]{4} \d{4} \d{4} \d{4} \d{2}

GB\d{2}-[A-Z]{4}-\d{4}-\d{4}-\d{4}-\d{2}

GI\d{2}[A-Z]{4}\w{4}\w{4}\w{4}\w{3} Gibraltar patterns

GI\d{2} [A-Z]{4} \w{4} \w{4} \w{4} \w{3}

GI\d{2}-[A-Z]{4}-\w{4}-\w{4}-\w{4}-\w{3}

GL\d{2}\d{4}\d{4}\d{4}\d{2} Greenland patterns

GL\d{2} \d{4} \d{4} \d{4} \d{2}

GL\d{2}-\d{4}-\d{4}-\d{4}-\d{2}

IE\d{2}[A-Z]{4}\d{4}\d{4}\d{4}\d{2} Ireland patterns

IE\d{2} [A-Z]{4} \d{4} \d{4} \d{4} \d{2}

IE\d{2}-[A-Z]{4}-\d{4}-\d{4}-\d{4}-\d{2}

Table 45-468 IBAN West narrow-breadth patterns (continued)

Patterns Description

IS\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d{2} Iceland patterns

IS\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d{2}

IS\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d{2}

NL\d{2}[A-Z]{4}\d{4}\d{4}\d{2} Netherlands patterns

NL\d{2} [A-Z]{4} \d{4} \d{4} \d{2}

NL\d{2}-[A-Z]{4}-\d{4}-\d{4}-\d{2}

NO\d{2}\d{4}\d{4}\d{3} Norway patterns

NO\d{2} \d{4} \d{4} \d{3}

NO\d{2}-\d{4}-\d{4}-\d{3}

PT\d{2}\d{4}\d{4}\d{4}\d{4}\d{4}\d Portugal patterns

PT\d{2} \d{4} \d{4} \d{4} \d{4} \d{4} \d

PT\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}-\d

SE\d{2}\d{4}\d{4}\d{4}\d{4}\d{4} Sweden patterns

SE\d{2} \d{4} \d{4} \d{4} \d{4} \d{4}

SE\d{2}-\d{4}-\d{4}-\d{4}-\d{4}-\d{4}

Table 45-469 IBAN West narrow-breadth validators

Validators Description

Mod 97 Validator Computes the ISO 7064 Mod 97-10 checksum of the
complete match.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

Code IBAN, numéro IBAN, IBAN Code, IBAN number

Iceland National Identification Number

The Iceland National Identification Number (kennitala) is a unique national identifier used by the
Icelandic government to identify individuals and organizations. It is administered by Registers Iceland.

Icelandic national identification numbers are issued to Icelandic citizens at birth and to foreign
nationals resident in Iceland upon registration. They are also issued to corporations and
institutions.
The Iceland National Identification Number data identifier detects a 10-digit number that
matches the Iceland National Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 10-digit number that matches the Iceland National Identification
Number format without checksum validation. It checks for common test numbers.
See “Iceland National Identification Number wide breadth” on page 1242.
■ The medium breadth detects a 10-digit number that matches the Iceland National
Identification Number format with checksum validation.
See “Iceland National Identification Number medium breadth” on page 1243.
■ The narrow breadth detects a 10-digit number that matches the Iceland National
Identification Number format with checksum validation. It checks for common test numbers,
and also requires the presence of related keywords.
See “Iceland National Identification Number narrow breadth” on page 1244.

Iceland National Identification Number wide breadth

The wide breadth detects a 10-digit number that matches the Iceland National Identification
Number format without checksum validation. It checks for common test numbers.

Table 45-470 Iceland National Identification Number wide-breadth patterns

Pattern

[04][1-9]0[1-9]\d{2}-\d{3}[09]

[1256][0-9]0[1-9]\d{2}-\d{3}[09]

[37][01]0[1-9]\d{2}-\d{3}[09]

[04][1-9]1[012]\d{2}-\d{3}[09]

[1256][0-9]1[012]\d{2}-\d{3}[09]

[37][01]1[012]\d{2}-\d{3}[09]

[04][1-9]0[1-9]\d{5}[09]

[1256][0-9]0[1-9]\d{5}[09]

[37][01]0[1-9]\d{5}[09]

[04][1-9]1[012]\d{5}[09]

Table 45-470 Iceland National Identification Number wide-breadth patterns (continued)

Pattern

[1256][0-9]1[012]\d{5}[09]

[37][01]1[012]\d{5}[09]

Table 45-471 Iceland National Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of numbers is not all the same.
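The two mandatory validators are described only briefly. The following Python sketch shows one plausible reading of each, using the second no-dash pattern from the wide-breadth table. The boundary rule in `is_delimited` (no letter or digit immediately before or after the match) is an assumption about what checking the surrounding characters means; the product's exact rule may differ. All helper names are hypothetical.

```python
import re

# Second no-dash pattern from the wide-breadth table.
PATTERN = re.compile(r"[1256][0-9]0[1-9]\d{5}[09]")


def is_delimited(text: str, start: int, end: int) -> bool:
    """Assumed number-delimiter rule: no letter or digit touches the match."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return not before.isalnum() and not after.isalnum()


def has_varied_digits(value: str) -> bool:
    """Duplicate-digits rule: reject strings whose digits are all identical."""
    digits = [ch for ch in value if ch.isdigit()]
    return len(set(digits)) > 1


def find_candidates(text: str) -> list[str]:
    """Pattern matches that pass both validators."""
    return [
        m.group(0)
        for m in PATTERN.finditer(text)
        if is_delimited(text, m.start(), m.end()) and has_varied_digits(m.group(0))
    ]
```

In this reading, a number embedded in a longer token (for example, prefixed by a letter) is not reported.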

Iceland National Identification Number medium breadth

The medium breadth detects a 10-digit number that matches the Iceland National Identification
Number format with checksum validation.
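The guide does not spell out the checksum behind the Iceland National Identification Number Validation Check. The following Python sketch assumes the publicly documented kennitala rule. It multiplies the first eight digits by the weights 3, 2, 7, 6, 5, 4, 3, 2, takes the sum modulo 11, and subtracts the remainder from 11. A result of 11 becomes 0, and a result of 10 never occurs in an issued number. The ninth digit must equal that value. The helper name is hypothetical.

```python
import re

# Weights for the first eight digits under the assumed kennitala rule.
WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)


def kennitala_checksum_ok(value: str) -> bool:
    """Assumed kennitala check: weighted mod-11 check digit in position 9."""
    digits = value.replace("-", "")
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:8], WEIGHTS))
    expected = (11 - total % 11) % 11
    # An expected value of 10 can never equal one digit, so such numbers fail.
    return expected == int(digits[8])
```

In the usual scheme, the tenth digit is not part of the checksum. It records the century of birth (9 for the 1900s, 0 for the 2000s), which matches the `[09]` that ends every pattern in these tables.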

Table 45-472 Iceland National Identification Number medium-breadth patterns

Pattern

[04][1-9]0[1-9]\d{2}-\d{3}[09]

[1256][0-9]0[1-9]\d{2}-\d{3}[09]

[37][01]0[1-9]\d{2}-\d{3}[09]

[04][1-9]1[012]\d{2}-\d{3}[09]

[1256][0-9]1[012]\d{2}-\d{3}[09]

[37][01]1[012]\d{2}-\d{3}[09]

[04][1-9]0[1-9]\d{5}[09]

[1256][0-9]0[1-9]\d{5}[09]

[37][01]0[1-9]\d{5}[09]

[04][1-9]1[012]\d{5}[09]

[1256][0-9]1[012]\d{5}[09]

[37][01]1[012]\d{5}[09]

Table 45-473 Iceland National Identification Number medium-breadth validators

Mandatory validator Description

Iceland National Identification Number Validation Check Computes the checksum and validates the pattern against it.

Iceland National Identification Number narrow breadth

The narrow breadth detects a 10-digit number that matches the Iceland National Identification
Number format with checksum validation. It checks for common test numbers, and also requires
the presence of related keywords.

Table 45-474 Iceland National Identification Number narrow-breadth patterns

Pattern

[04][1-9]0[1-9]\d{2}-\d{3}[09]

[1256][0-9]0[1-9]\d{2}-\d{3}[09]

[37][01]0[1-9]\d{2}-\d{3}[09]

[04][1-9]1[012]\d{2}-\d{3}[09]

[1256][0-9]1[012]\d{2}-\d{3}[09]

[37][01]1[012]\d{2}-\d{3}[09]

[04][1-9]0[1-9]\d{5}[09]

[1256][0-9]0[1-9]\d{5}[09]

[37][01]0[1-9]\d{5}[09]

[04][1-9]1[012]\d{5}[09]

[1256][0-9]1[012]\d{5}[09]

[37][01]1[012]\d{5}[09]

Table 45-475 Iceland National Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of numbers is not all the same.

Table 45-475 Iceland National Identification Number narrow-breadth validators (continued)

Mandatory validator Description

Iceland National Identification Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

personal code, national ID, national identification number, personal ID,
personal identification number, personal identification code, nationalid#,
personalid#, KH, KH#, magic number, magicnumber#, magicno#, magic no.,
social security number, ssn, ssn#, social security no., kennitala, kennitala#,
tin, tax identification number, tin#, tax id, tin no, tin number, tax number,
tax code, taxpayer id, taxpayer identification number, persónuleg kennitala,
galdur númer, skattanúmer, skattgreiðenda kóða, kennitala skattgreiðenda

Iceland Passport Number

Icelandic passports are issued to citizens of Iceland for international travel and can also serve
as proof of Icelandic citizenship.
The Iceland Passport Number data identifier detects an eight-character alphanumeric pattern
that matches the Iceland Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the Iceland
Passport Number format. It checks for common test patterns.
See “Iceland Passport Number wide breadth” on page 1245.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the Iceland
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.
See “Iceland Passport Number narrow breadth” on page 1246.

Iceland Passport Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the Iceland
Passport Number format. It checks for common test patterns.

Table 45-476 Iceland Passport Number wide-breadth patterns

Pattern

[A-Za-z]\d{7}

Table 45-477 Iceland Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999
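The wide-breadth rules for this identifier are simple enough to express directly. The following Python sketch applies the pattern and the Exclude ending characters list. It approximates the Number delimiter validator with lookarounds that forbid an adjacent letter or digit, which is an assumption rather than documented product behavior. `passport_candidates` is a hypothetical name.

```python
import re

# Pattern from the table, wrapped in assumed "number delimiter" lookarounds.
PASSPORT_RE = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]\d{7}(?![A-Za-z0-9])")
# 0000000 through 9999999, as listed under Exclude ending characters.
EXCLUDED_ENDINGS = tuple(str(d) * 7 for d in range(10))


def passport_candidates(text: str) -> list[str]:
    """Matches of the passport pattern that do not end in a repeated-digit run."""
    return [
        m.group(0)
        for m in PASSPORT_RE.finditer(text)
        if not m.group(0).endswith(EXCLUDED_ENDINGS)
    ]
```

The exclusion list removes obvious placeholder values such as `B0000000` while keeping realistic numbers.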

Iceland Passport Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the Iceland Passport
Number format. It checks for common test patterns, and also requires the presence of related
keywords.

Table 45-478 Iceland Passport Number narrow-breadth patterns

Pattern

[A-Za-z]\d{7}

Table 45-479 Iceland Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Table 45-479 Iceland Passport Number narrow-breadth validators (continued)

Mandatory validator Description

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no, passportno,

passport no., passport#, passportno#

vegabréf, vegabréfs númer, Vegabréf Nei, vegabréf#

Iceland Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Iceland, VAT is
administered by the VAT office for the region in which the business is established.
The Iceland Value Added Tax (VAT) Number data identifier detects a seven- or eight-character
alphanumeric pattern that matches the Iceland VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a seven- or eight-character alphanumeric pattern that matches
the Iceland VAT Number format. It checks for common test patterns.
See “Iceland Value Added Tax (VAT) Number wide breadth” on page 1247.
■ The narrow breadth detects a seven- or eight-character alphanumeric pattern that matches
the Iceland VAT Number format. It checks for common test patterns, and also requires the
presence of related keywords.
See “Iceland Value Added Tax (VAT) Number narrow breadth” on page 1248.

Iceland Value Added Tax (VAT) Number wide breadth

The wide breadth detects a seven- or eight-character alphanumeric pattern that matches the
Iceland VAT Number format. It checks for common test patterns.

Table 45-480 Iceland Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Ii][Ss] \d\d\d\d\d

[Ii][Ss] \d\d\d\d\d\d

Table 45-481 Iceland Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000, 11111, 22222, 33333, 44444, 55555, 66666,

77777, 88888, 99999, 000000, 111111, 222222, 333333,
444444, 555555, 666666, 777777, 888888, 999999

Iceland Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a seven- or eight-character alphanumeric pattern that matches
the Iceland VAT Number format. It checks for common test patterns, and also requires the
presence of related keywords.

Table 45-482 Iceland Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Ii][Ss] \d\d\d\d\d

[Ii][Ss] \d\d\d\d\d\d

Table 45-483 Iceland Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000, 11111, 22222, 33333, 44444, 55555, 66666,

77777, 88888, 99999, 000000, 111111, 222222, 333333,
444444, 555555, 666666, 777777, 888888, 999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat, vat number, value added tax number

virðisaukaskattsnúmer, vsk númer


Indian Aadhaar Card Number

The Unique Identification Authority of India (UIDAI) is mandated to assign a 12-digit unique identity
(UID) number, termed Aadhaar, to all residents of India. The Aadhaar number is designed to eliminate
duplicate and fake identities, and it can be verified and authenticated online in a cost-effective way.
The Indian Aadhaar Card Number data identifier detects a 12-digit number that matches the
Indian Aadhaar Card Number format.
The Indian Aadhaar Card Number data identifier provides three breadths of detection:
■ The wide breadth detects a 12-digit number without checksum validation.
See “Indian Aadhaar Card Number wide breadth” on page 1249.
■ The medium breadth detects a 12-digit number with checksum validation.
See “Indian Aadhaar Card Number medium breadth” on page 1249.
■ The narrow breadth detects a 12-digit number with checksum validation. It also requires
the presence of related keywords.
See “Indian Aadhaar Card Number narrow breadth” on page 1250.

Indian Aadhaar Card Number wide breadth

The wide breadth detects a 12-digit number without checksum validation.

Table 45-484 Indian Aadhaar Card Number wide-breadth patterns

Patterns

[2-9]\d{11}

[2-9]\d{3} \d{4} \d{4}

Table 45-485 Indian Aadhaar Card Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Indian Aadhaar Card Number medium breadth

The medium breadth detects a 12-digit number with checksum validation.
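The medium and narrow breadths rely on a Verhoeff validation check, listed in the validator tables of this section. The following Python sketch implements the standard Verhoeff algorithm (a dihedral-group multiplication table, a position permutation, and an inverse table) and combines it with the 12-digit pattern. It assumes, as is widely documented, that the last Aadhaar digit is a Verhoeff check digit. The function names are hypothetical, and the product's implementation may differ in details such as delimiter handling.

```python
import re

# Verhoeff multiplication table for the dihedral group D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Permutation applied to a digit according to its position (repeats every 8).
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# Inverse of each element of D5 under D.
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(number: str) -> bool:
    """True if the full digit string (check digit included) passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> str:
    """Compute the Verhoeff check digit to append to `payload`."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def aadhaar_candidate_ok(value: str) -> bool:
    """ASCII form of the [2-9]\\d{11} pattern plus the Verhoeff check."""
    digits = value.replace(" ", "")
    return re.fullmatch(r"[2-9][0-9]{11}", digits) is not None and verhoeff_valid(digits)
```

Verhoeff detects every single-digit error and every swap of adjacent digits, which is the usual reason for choosing it over a simple modulus check.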

Table 45-486 Indian Aadhaar Card Number medium-breadth patterns

Patterns

[2-9]\d{11}

Table 45-486 Indian Aadhaar Card Number medium-breadth patterns (continued)

Patterns

[2-9]\d{3} \d{4} \d{4}

Table 45-487 Indian Aadhaar Card Number medium-breadth validators

Mandatory validators Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

333333333333, 666666666666, 999999999999

Number delimiter Validates a match by checking the surrounding numbers.

Verhoeff validation check Computes the checksum and validates the pattern against
it.

Indian Aadhaar Card Number narrow breadth

The narrow breadth detects a 12-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-488 Indian Aadhaar Card Number narrow-breadth patterns

Patterns

[2-9]\d{11}

[2-9]\d{3} \d{4} \d{4}

Table 45-489 Indian Aadhaar Card Number narrow-breadth validators

Mandatory validators Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

333333333333, 666666666666, 999999999999

Number delimiter Validates a match by checking the surrounding numbers.

Verhoeff validation check Computes the checksum and validates the pattern against
it.

Table 45-489 Indian Aadhaar Card Number narrow-breadth validators (continued)

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

aadhar card no., uidai, aadhar no., Aadhar Number, Aadhar#, Aadhar Card#

Indian Permanent Account Number

The Indian Permanent Account Number (PAN) is a unique 10-character alphanumeric identifier
issued by the Indian Income Tax Department to individuals and to other taxpayers such as
companies and trusts.
The Indian Permanent Account Number data identifier detects a 10-character alphanumeric
pattern that matches the Indian Permanent Account Number format.
This data identifier provides two breadths of detection:
■ The wide breadth detects a 10-character alphanumeric pattern without checksum validation.
See “Indian Permanent Account Number wide breadth” on page 1251.
■ The narrow breadth detects a 10-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.
See “Indian Permanent Account Number narrow breadth” on page 1252.

Indian Permanent Account Number wide breadth

The wide breadth detects a 10-character alphanumeric pattern without checksum validation.

Table 45-490 Indian Permanent Account Number wide-breadth pattern

Pattern

[A-Za-z]{3}[CPHFATBLJGcphfatbljg][A-Za-z]\d{4}[A-Za-z]

Table 45-491 Indian Permanent Account Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
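The PAN pattern fixes the character class of each position: five letters, four digits, and a final letter. The fourth letter is restricted to C, P, H, F, A, T, B, L, J, or G, which identifies the holder type (for example, P for an individual and C for a company). The following Python sketch checks a value against that pattern and applies one reading of the Duplicate digits validator, namely that the four-digit run must not be a single repeated digit. That reading is an assumption, and `pan_candidate_ok` is a hypothetical name.

```python
import re

# Pattern copied from the PAN pattern table.
PAN_RE = re.compile(r"[A-Za-z]{3}[CPHFATBLJGcphfatbljg][A-Za-z]\d{4}[A-Za-z]")


def pan_candidate_ok(value: str) -> bool:
    """Pattern match plus an assumed duplicate-digits check on the digit run."""
    if PAN_RE.fullmatch(value) is None:
        return False
    # Indexes 5-8 hold the four digits.
    return len(set(value[5:9])) > 1
```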

Indian Permanent Account Number narrow breadth

The narrow breadth detects a 10-character alphanumeric pattern without checksum validation.
It requires the presence of related keywords.

Table 45-492 Indian Permanent Account Number narrow-breadth pattern

Pattern

[A-Za-z]{3}[CPHFATBLJGcphfatbljg][A-Za-z]\d{4}[A-Za-z]

Table 45-493 Indian Permanent Account Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

PAN, permanent account number, pan, pan#, PAN#, PAN Card Number, pan card no,
pancardno#, PAN card no, pan#, PANID#

India RuPay Card Number

RuPay is a card payment network, similar to Mastercard and Visa, that was created by the
National Payments Corporation of India.
The India RuPay Card Number data identifier detects a 16-digit number that matches the
RuPay Card Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 16-digit number that matches the RuPay Card Number format
without checksum validation. It checks for common test numbers.
See “India RuPay Card Number wide breadth” on page 1253.
■ The medium breadth detects a 16-digit number that matches the RuPay Card Number
format with checksum validation.
See “India RuPay Card Number medium breadth” on page 1253.
■ The narrow breadth detects a 16-digit number that matches the RuPay Card Number format
with checksum validation. It checks for common test numbers, and also requires the
presence of related keywords.
See “India RuPay Card Number narrow breadth” on page 1254.

India RuPay Card Number wide breadth

The wide breadth detects a 16-digit number that matches the RuPay Card Number format
without checksum validation. It checks for common test numbers.

Table 45-494 India RuPay Card Number wide-breadth patterns

Pattern

508[5-9]\d\d\d\d\d\d\d\d\d\d\d\d

607[0-8]\d\d\d\d\d\d\d\d\d\d\d\d

6079[0-8]\d\d\d\d\d\d\d\d\d\d\d

6069[89]\d\d\d\d\d\d\d\d\d\d\d

6521[5-9]\d\d\d\d\d\d\d\d\d\d\d

652[2345]\d\d\d\d\d\d\d\d\d\d\d\d

6531[0-4]\d\d\d\d\d\d\d\d\d\d\d

6530\d\d\d\d\d\d\d\d\d\d\d\d

608[0123]\d\d\d\d\d\d\d\d\d\d\d\d

6950\d\d\d\d\d\d\d\d\d\d\d\d

Table 45-495 India RuPay Card Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of numbers is not all the same.

India RuPay Card Number medium breadth

The medium breadth detects a 16-digit number that matches the RuPay Card Number format
with checksum validation.

Table 45-496 India RuPay Card Number medium-breadth patterns

Pattern

508[5-9]\d\d\d\d\d\d\d\d\d\d\d\d

607[0-8]\d\d\d\d\d\d\d\d\d\d\d\d

6079[0-8]\d\d\d\d\d\d\d\d\d\d\d

6069[89]\d\d\d\d\d\d\d\d\d\d\d

6521[5-9]\d\d\d\d\d\d\d\d\d\d\d

652[2345]\d\d\d\d\d\d\d\d\d\d\d\d

6531[0-4]\d\d\d\d\d\d\d\d\d\d\d

6530\d\d\d\d\d\d\d\d\d\d\d\d

608[0123]\d\d\d\d\d\d\d\d\d\d\d\d

6950\d\d\d\d\d\d\d\d\d\d\d\d

Table 45-497 India RuPay Card Number medium-breadth validators

Mandatory validator Description

Luhn Check Computes the checksum and validates the pattern against
it.
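The medium breadth combines the prefix patterns above with the Luhn (mod 10) check. The following Python sketch shows that combination. The pattern list is transcribed from the table using `\d{n}` quantifiers in place of repeated `\d`, and the function names are hypothetical. The sketch does not reproduce the number delimiter, duplicate-digits, or test-number checks that the other breadths use.

```python
import re

# RuPay prefix patterns from the table, rewritten with \d{n} quantifiers.
RUPAY_PATTERNS = [
    r"508[5-9]\d{12}",
    r"607[0-8]\d{12}",
    r"6079[0-8]\d{11}",
    r"6069[89]\d{11}",
    r"6521[5-9]\d{11}",
    r"652[2345]\d{12}",
    r"6531[0-4]\d{11}",
    r"6530\d{12}",
    r"608[0123]\d{12}",
    r"6950\d{12}",
]
RUPAY_RE = re.compile("|".join(f"(?:{p})" for p in RUPAY_PATTERNS), re.ASCII)


def luhn_valid(number: str) -> bool:
    """Standard Luhn (mod 10) check over a string of ASCII digits."""
    if re.fullmatch(r"[0-9]+", number) is None:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def rupay_medium(number: str) -> bool:
    """Approximate the medium breadth: a RuPay prefix pattern plus a passing Luhn check."""
    return RUPAY_RE.fullmatch(number) is not None and luhn_valid(number)
```

For example, 6521500000000006 passes both checks. Changing the last digit to 7 breaks the Luhn checksum, and 4111111111111111 passes Luhn but matches no RuPay prefix.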

India RuPay Card Number narrow breadth

The narrow breadth detects a 16-digit number that matches the RuPay Card Number format
with checksum validation. It checks for common test numbers, and also requires the presence
of related keywords.

Table 45-498 India RuPay Card Number narrow-breadth patterns

Pattern

508[5-9]\d\d\d\d\d\d\d\d\d\d\d\d

607[0-8]\d\d\d\d\d\d\d\d\d\d\d\d

6079[0-8]\d\d\d\d\d\d\d\d\d\d\d

6069[89]\d\d\d\d\d\d\d\d\d\d\d

6521[5-9]\d\d\d\d\d\d\d\d\d\d\d

652[2345]\d\d\d\d\d\d\d\d\d\d\d\d

6531[0-4]\d\d\d\d\d\d\d\d\d\d\d

6530\d\d\d\d\d\d\d\d\d\d\d\d

608[0123]\d\d\d\d\d\d\d\d\d\d\d\d

6950\d\d\d\d\d\d\d\d\d\d\d\d

Table 45-499 India RuPay Card Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of numbers is not all the same.

Luhn Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

bank card, bankcard, card number, cc#, ccn, check
card, checkcard, credit card, credit card number,
creditcard#, debit card, debitcard, debit card number,
rupay card, rupay, rupaycard#, ccn#, debitcard#,
rupay#

Indonesian Identity Card Number

The Indonesian identity card (Kartu Tanda Penduduk, or KTP) number is used as the basis
for issuing passports, driving licenses, taxpayer identification numbers, insurance policies,
certificates of land rights, and other identity documents.
The Indonesian Identity Card Number data identifier detects a 16-digit number that matches
the Indonesian Identity Card Number format.
The Indonesian Identity Card Number system data identifier provides three breadths of
detection:
■ The wide breadth detects a 16-digit number without checksum validation.
See “Indonesian Identity Card Number wide breadth” on page 1256.
■ The medium breadth detects a 16-digit number with checksum validation.
See “Indonesian Identity Card Number medium breadth” on page 1256.
■ The narrow breadth detects a 16-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Indonesian Identity Card Number narrow breadth” on page 1256.

Indonesian Identity Card Number wide breadth

The wide breadth detects a 16-digit number without checksum validation.

Table 45-500 Indonesian Identity Card Number wide-breadth pattern

Pattern

\d{2}[01237]\d{3}[01234567]\d[01]\d{7}

Table 45-501 Indonesian Identity Card Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
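The wide-breadth pattern restricts only three positions: the third, seventh, and ninth digits. Under the commonly described NIK layout, the number consists of a six-digit region code, a DDMMYY birth date (with 40 added to the day for women), and a four-digit serial number. In that layout, the seventh and ninth digits are the tens digits of the day and month. The sketch below uses hypothetical function names and applies the pattern and a duplicate-digits check. It does not attempt the Indonesian Kartu Tanda Penduduk Validation Check, whose algorithm is not described here.

```python
import re

# Wide-breadth pattern from the table (ASCII digits only).
KTP_PATTERN = re.compile(r"\d{2}[01237]\d{3}[01234567]\d[01]\d{7}", re.ASCII)


def not_all_same_digit(number: str) -> bool:
    """Duplicate-digits check: reject a string made of one repeated digit."""
    return len(set(number)) > 1


def ktp_wide(number: str) -> bool:
    """Approximate the wide breadth: whole-string pattern match plus duplicate-digits check."""
    return KTP_PATTERN.fullmatch(number) is not None and not_all_same_digit(number)
```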

Indonesian Identity Card Number medium breadth

The medium breadth detects a 16-digit number with checksum validation.

Table 45-502 Indonesian Identity Card Number medium-breadth pattern

Pattern

\d{2}[01237]\d{3}[01234567]\d[01]\d{7}

Table 45-503 Indonesian Identity Card Number medium-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Indonesian Kartu Tanda Penduduk Validation Check Validator computes the checksum that every Indonesian
Kartu Tanda Penduduk must pass.

Indonesian Identity Card Number narrow breadth

The narrow breadth detects a 16-digit number that passes checksum validation. It also requires
the presence of related keywords.

Table 45-504 Indonesian Identity Card Number narrow-breadth pattern

Pattern

\d{2}[01237]\d{3}[01234567]\d[01]\d{7}

Table 45-505 Indonesian Identity Card Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Indonesian Kartu Tanda Penduduk Validation Check Validator computes the checksum that every Indonesian
Kartu Tanda Penduduk must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

identity card number, Indonesian identity card no,
Indonesian identity card number, NIK, KTP, unique ID,
unique identity number, national identification number,
national identity no, identity number

kartu tanda penduduk nomor, nomor Induk
Kependudukan, tanda penduduk nomor, kartu identitas
Indonesia no, kartu identitas Indonesia nomor, nomor
identitas unik

International Mobile Equipment Identity Number

The International Mobile Station Equipment Identity (IMEI) is a unique identifier for 3GPP
(GSM, UMTS, and LTE) and iDEN mobile phones and some satellite phones.
The International Mobile Equipment Identity Number data identifier detects a 15-digit number
that matches the International Mobile Equipment Identity Number format.
The International Mobile Equipment Identity Number data identifier provides three breadths
of detection:
■ The wide breadth detects a 15-digit number with duplicate digit validation.
See “International Mobile Equipment Identity Number wide breadth” on page 1258.
■ The medium breadth detects a 15-digit number with Luhn check validation and beginning
character exclusion.
See “International Mobile Equipment Identity Number medium breadth” on page 1258.
■ The narrow breadth detects a 15-digit number with duplicate digit and Luhn check validation.
It also requires the presence of related keywords.
See “International Mobile Equipment Identity Number narrow breadth” on page 1259.

International Mobile Equipment Identity Number wide breadth

The wide breadth detects a 15-digit number with duplicate digit validation.

Table 45-506 International Mobile Equipment Identity Number wide-breadth patterns

Patterns

\d{15}

\d{2}-\d{6}-\d{6}-\d

Table 45-507 International Mobile Equipment Identity Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

International Mobile Equipment Identity Number medium breadth

The medium breadth detects a 15-digit number with Luhn check validation and beginning
character exclusion.

Table 45-508 International Mobile Equipment Identity Number medium-breadth patterns

Patterns

\d{15}

\d{2}-\d{6}-\d{6}-\d

Table 45-509 International Mobile Equipment Identity Number medium-breadth validators

Mandatory validators Description

Luhn Check Computes the Luhn checksum and validates the pattern
against it.

Number delimiter Validates a match by checking the surrounding numbers.

Exclude beginning characters Data beginning with any of the following values is
not matched:

000000000000000
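The medium breadth needs the beginning-character exclusion because an all-zero string passes the Luhn check (its digit sum is 0). The following Python sketch uses hypothetical names. It accepts both pattern layouts, strips the hyphens, applies the exclusion list from the table, and then runs the Luhn check. It approximates these validators and omits the number delimiter check.

```python
import re

# The two IMEI layouts from the pattern tables.
IMEI_RE = re.compile(r"\d{15}|\d{2}-\d{6}-\d{6}-\d", re.ASCII)
# Values listed under "Exclude beginning characters".
EXCLUDED_PREFIXES = ("000000000000000",)


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def imei_medium(candidate: str) -> bool:
    """Approximate the medium breadth: pattern, exclusion list, then Luhn check."""
    if IMEI_RE.fullmatch(candidate) is None:
        return False
    digits = candidate.replace("-", "")
    if digits.startswith(EXCLUDED_PREFIXES):
        return False
    return luhn_valid(digits)
```

For example, 490154203237518 and its hyphenated form 49-015420-323751-8 both pass. The string 000000000000000 passes `luhn_valid` but is rejected by the exclusion list.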

International Mobile Equipment Identity Number narrow breadth

The narrow breadth detects a 15-digit number with duplicate digit and Luhn check validation.
It also requires the presence of related keywords.

Table 45-510 International Mobile Equipment Identity Number narrow-breadth patterns

Patterns

\d{15}

\d{2}-\d{6}-\d{6}-\d

Table 45-511 International Mobile Equipment Identity Number narrow-breadth validators

Mandatory validators Description

Luhn Check Computes the Luhn checksum and validates the pattern
against it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

imei, IMEI, imei no, IMEI No, IMEI Number, imei number,
International Mobile Station Equipment Identity
Number, International Mobile Station Equipment
Identity

International Securities Identification Number

An International Securities Identification Number (ISIN) is a 12-character alphanumeric code
that uniquely identifies a security. Securities for which ISINs are issued include bonds,
commercial paper, stocks, and warrants.
The International Securities Identification Number data identifier detects a 12-character
alphanumeric pattern that matches the International Securities Identification Number format.
The International Securities Identification Number data identifier provides three breadths of
detection:
■ The wide breadth detects a 12-character alphanumeric pattern without validation.
See “International Securities Identification Number wide breadth” on page 1260.
■ The medium breadth detects a 12-character alphanumeric pattern with checksum validation.
See “International Securities Identification Number medium breadth” on page 1260.
■ The narrow breadth detects a 12-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “International Securities Identification Number narrow breadth” on page 1260.

International Securities Identification Number wide breadth

The wide breadth detects a 12-character alphanumeric pattern without validation.

Table 45-512 International Securities Identification Number wide-breadth pattern

Pattern

\l{2}\w{9}\d

The wide breadth of the International Securities Identification Number includes no validators.

International Securities Identification Number medium breadth

The medium breadth detects a 12-character alphanumeric pattern with checksum validation.

Table 45-513 International Securities Identification Number medium-breadth pattern

Pattern

\l{2}\w{9}\d

Table 45-514 International Securities Identification Number medium-breadth validator

Mandatory validator Description

International Securities Identification Number Validation Check Computes the checksum and
validates the pattern against it.
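The standard ISIN check-digit scheme (ISO 6166) replaces each letter with a two-digit number (A=10 through Z=35) and then applies the Luhn algorithm to the resulting digit string. The sketch below assumes that the International Securities Identification Number Validation Check follows this standard scheme. It also translates the Symantec-specific `\l` class as an ASCII letter and `\w` as an ASCII letter or digit. Both translations, and the function name, are assumptions.

```python
import re

# Assumed translation of \l{2}\w{9}\d: two letters, nine letters or digits, one digit.
ISIN_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def isin_check_valid(candidate: str) -> bool:
    """Standard ISIN check: expand letters to numbers, then apply the Luhn check."""
    isin = candidate.upper()
    if ISIN_RE.fullmatch(isin) is None:
        return False
    # int(ch, 36) maps "0"-"9" to 0-9 and "A"-"Z" to 10-35.
    expanded = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(expanded)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

For example, US0378331005 passes, while changing its final digit to 6 fails the check.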

International Securities Identification Number narrow breadth

The narrow breadth detects a 12-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords.

Table 45-515 International Securities Identification Number narrow-breadth pattern

Pattern

\l{2}\w{9}\d

Table 45-516 International Securities Identification Number narrow-breadth validators

Mandatory validators Description

International Securities Identification Number Validation Check Computes the checksum and
validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

isin, i.s.i.n, International Securities Identification
Number, Standard & Poor's, S&P, National Numbering
Association, NNA ID, ID number, identification number,
Id no., international securities ID no., International
securities ID number

IP Address
An IP address is a numerical label that identifies a device on a computer network and
facilitates communication with it.
The IP Address data identifier detects IPv4 addresses.
This data identifier offers three breadths of detection:
■ The wide breadth detects IP addresses and validates their format.
See “IP Address wide breadth” on page 1261.
■ The medium breadth detects IP addresses, validates their format, and eliminates fictitious
addresses.
See “IP Address medium breadth” on page 1262.
■ The narrow breadth detects IP addresses, validates their format, and eliminates fictitious
and unassigned addresses.
See “IP Address narrow breadth” on page 1263.

IP Address wide breadth

The wide breadth of the IP Address data identifier detects numbers in the format
DDD.DDD.DDD.DDD with an optional /DD suffix. Each group of up to three digits must be between
0 and 255 inclusive, and the /DD suffix must be between 0 and 32. The address 0.0.0.0 is not
allowed.

Table 45-517 IP Address wide-breadth patterns

Patterns

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[0-9]

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[1-2][0-9]?

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[3][0-2]?

Table 45-518 IP Address wide-breadth validator

Mandatory validator Description

IP Basic Check Every IP address must match the format x.x.x.x and every
number must be less than 256.

IP Address medium breadth

The medium breadth of the IP Address data identifier detects numbers in the format
DDD.DDD.DDD.DDD with an optional /DD suffix. Each group of up to three digits must be between
0 and 255 inclusive, and the /DD suffix must be between 0 and 32. The address 0.0.0.0 is not
allowed. The medium breadth also eliminates common fictitious examples in which every group
is a single digit, such as 1.1.1.2.

Table 45-519 IP Address medium-breadth patterns

Patterns

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[0-9]

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[1-2][0-9]?

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[3][0-2]?

Table 45-520 IP Address medium-breadth validator

Mandatory Validator Description

IP Octet Check Every IP address must match the format x.x.x.x, every number must be less than 256,
and no IP address can contain only single-digit numbers (1.1.1.2).
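The IP Basic Check and IP Octet Check rules described above can be sketched in Python as follows. This regular expression escapes the dots. The table patterns instead use an unescaped `.` and rely on the validator to enforce the x.x.x.x format. The prefix handling only approximates the `/[0-9]`, `/[1-2][0-9]?`, and `/[3][0-2]?` alternatives. The function names are hypothetical.

```python
import re

# Dotted quad with an optional /prefix; the dots are escaped here.
IPV4_RE = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?:/(\d{1,2}))?", re.ASCII)


def ip_basic_check(candidate: str) -> bool:
    """Each group 0-255, not 0.0.0.0, optional prefix 0-32."""
    m = IPV4_RE.fullmatch(candidate)
    if m is None:
        return False
    octets = [int(g) for g in m.groups()[:4]]
    if any(o > 255 for o in octets) or all(o == 0 for o in octets):
        return False
    prefix = m.group(5)
    return prefix is None or int(prefix) <= 32


def ip_octet_check(candidate: str) -> bool:
    """Basic check plus: reject addresses whose groups are all single digits."""
    if not ip_basic_check(candidate):
        return False
    address = candidate.split("/")[0]
    return any(len(group) > 1 for group in address.split("."))
```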

IP Address narrow breadth

The narrow breadth of the IP Address data identifier detects numbers in the format
DDD.DDD.DDD.DDD with an optional /DD suffix. Each group of up to three digits must be between
0 and 255 inclusive, and the /DD suffix must be between 0 and 32. The address 0.0.0.0 is not
allowed. The narrow breadth also eliminates common fictitious examples in which every group
is a single digit, such as 1.1.1.2, and it eliminates unassigned IP addresses ("bogons").

Table 45-521 IP Address narrow-breadth patterns

Patterns

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[0-9]

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[1-2][0-9]?

\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}/[3][0-2]?

Table 45-522 IP Address narrow-breadth validators

Mandatory Validators Description

IP Octet Check Every IP address must match the format x.x.x.x, every number must be less than 256,
and no IP address can contain only single-digit numbers (1.1.1.2).

IP Reserved Range Check Checks whether the IP address falls into any of the "Bogons" ranges. If so, the match
is invalid.
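The IP Reserved Range Check compares addresses against the bogon ranges, which are not listed here. The following sketch therefore substitutes Python's standard `ipaddress` module and treats any address whose `is_global` flag is false as reserved. This covers private, loopback, and other special-purpose ranges from the IANA registry, but not unallocated public space. It is an approximation rather than a full bogon list, and the function name is hypothetical.

```python
import ipaddress


def is_reserved_approx(candidate: str) -> bool:
    """Approximate the IP Reserved Range Check with the stdlib's is_global flag."""
    address = ipaddress.IPv4Address(candidate.split("/")[0])
    # is_global is False for private, loopback, link-local, and other
    # special-purpose ranges; unallocated public space is not detected.
    return not address.is_global
```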

IPv6 Address
Internet Protocol version 6 (IPv6) is the latest version of the Internet Protocol (IP), the
communications protocol that provides an identification and location system for computers on
networks and routes traffic across the Internet.
The IPv6 Address data identifier detects IPv6 addresses.
This data identifier offers three breadths of detection:
■ The wide breadth detects IPv6 addresses and validates their format.
See “IPv6 Address wide breadth” on page 1264.
■ The medium breadth detects IPv6 addresses and validates their format. It also validates
that they do not begin with the numeral 0.
See “IPv6 Address medium breadth” on page 1264.
■ The narrow breadth detects IPv6 addresses and validates their format. It also validates
that they do not begin with the numeral 0 and that they are fully compressed. Address strings
are not normalized.
See “IPv6 Address narrow breadth” on page 1265.

IPv6 Address wide breadth

The wide breadth detects IPv6 addresses and validates that they match the format
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx.

Table 45-523 IPv6 Address wide-breadth patterns

Patterns

[0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

Pattern continues to 44 repetitions.

Table 45-524 IPv6 Address wide-breadth validator

Validator Description

IPv6 Address Basic Validation Check Checks every IPv6 address and verifies that it matches
the xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx format.

IPv6 Address medium breadth

The medium breadth detects IPv6 addresses and validates that they match the format
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx. It also validates that they do not begin with the
numeral 0.

Table 45-525 IPv6 Address medium-breadth patterns

Patterns

[0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

Pattern continues to 44 repetitions.

Table 45-526 IPv6 Address medium-breadth validator

Mandatory Validator Description

IPv6 Address Medium Validation Check Checks every IPv6 address and verifies that it matches the
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx format and does not start with the numeral 0.

IPv6 Address narrow breadth

The narrow breadth detects IPv6 addresses and validates that they match the format
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx. It also validates that they do not begin with the
numeral 0 and that they are fully compressed. Address strings are not normalized.

Table 45-527 IPv6 Address narrow-breadth patterns

Patterns

[0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

[0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%][0-9A-Fa-f:./%]

Pattern continues to 44 repetitions.

Table 45-528 IPv6 Address narrow-breadth validator

Mandatory Validator Description

IPv6 Address Reserved Validation Check Checks every IPv6 address and verifies that it matches the
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx format, does not start with the numeral 0, and is
fully compressed.

Table 45-529 IPv6 Address narrow-breadth normalizer

Normalizer Description

Noop (No operation) String is passed as it is without normalizing.
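The narrow-breadth rules (no leading 0, fully compressed) can be approximated with Python's `ipaddress` module, which produces the canonical compressed form of an address. In the sketch below, a candidate passes only if it meets three conditions. It must parse as an IPv6 address, it must not start with the character 0, and it must already equal its compressed form when compared case-insensitively. The function name is hypothetical, and the sketch does not handle the `/` prefix suffixes that the patterns also allow.

```python
import ipaddress


def ipv6_narrow_check(candidate: str) -> bool:
    """Approximate the narrow breadth: valid IPv6, no leading 0, already fully compressed."""
    try:
        address = ipaddress.IPv6Address(candidate)
    except ValueError:
        return False
    if candidate.startswith("0"):
        return False
    # ipaddress emits lowercase hex, so compare case-insensitively.
    return candidate.lower() == address.compressed
```

For example, "2001:db8::1" passes, while the expanded form "2001:0db8:0000:0000:0000:0000:0000:0001" fails because it is not fully compressed.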

Ireland Passport Number

An Irish passport is the passport issued to citizens of Ireland. An Irish passport enables the
bearer to travel internationally and serves as evidence of Irish citizenship and citizenship of
the European Union. It also facilitates access to consular assistance from Irish embassies and
from the embassies of other European Union member states while abroad.
The Ireland Passport Number data identifier detects a seven- or nine-character alphanumeric
pattern that matches the Ireland Passport Number format.
The Ireland Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a seven- or nine-character alphanumeric pattern without checksum
validation.
See “Ireland Passport Number wide breadth” on page 1266.
■ The narrow breadth detects a seven- or nine-character alphanumeric pattern without
checksum validation. It requires the presence of related keywords.
See “Ireland Passport Number narrow breadth” on page 1267.

Ireland Passport Number wide breadth

The wide breadth detects a seven- or nine-character alphanumeric pattern without checksum
validation.

Table 45-530 Ireland Passport Number wide-breadth patterns

Patterns

[a-zA-Z]{2}\d{7}

[a-zA-Z]\d{6}

[a-zA-Z]\d{8}

Table 45-531 Ireland Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
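The number delimiter validator checks the characters around a match. The following sketch approximates it with regular-expression lookarounds that reject a match directly preceded or followed by a letter or digit. That interpretation is an assumption, and so is the function name. The sketch combines the three table patterns into one alternation.

```python
import re
from typing import List

# Table patterns combined into one alternation, wrapped in lookarounds that
# approximate the number delimiter validator (no letter or digit on either side).
PASSPORT_RE = re.compile(
    r"(?<![A-Za-z0-9])(?:[A-Za-z]{2}\d{7}|[A-Za-z]\d{8}|[A-Za-z]\d{6})(?![A-Za-z0-9])",
    re.ASCII,
)


def find_passport_numbers(text: str) -> List[str]:
    """Return every substring that matches a passport pattern with clean delimiters."""
    return PASSPORT_RE.findall(text)
```

For example, "Passport no PA1234567 issued" yields ["PA1234567"]. "A1234567" yields nothing because it is eight characters long and no alternative matches it with clean delimiters.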

Ireland Passport Number narrow breadth

The narrow breadth detects a seven- or nine-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.

Table 45-532 Ireland Passport Number narrow-breadth patterns

Patterns

[a-zA-Z]{2}\d{7}

[a-zA-Z]\d{6}

[a-zA-Z]\d{8}

Table 45-533 Ireland Passport Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

passport number, passport, passport no, pas,
passeport, ireland passport, irelande passeport, Éire
pas, no de passeport, pas uimh, uimhir pas, numéro
de passeport

Ireland Tax Identification Number

The Ireland Tax Identification Number is issued by the Department of Social Protection for
natural persons and by the Revenue Commissioners for non-natural persons. Non-natural persons
can be companies, partnerships, trusts, and unincorporated bodies.
The Ireland Tax Identification Number data identifier detects a six- to nine-character
alphanumeric pattern that matches the Ireland Tax Identification Number format.
The Ireland Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a six- to nine-character alphanumeric pattern without checksum
validation.
See “Ireland Tax Identification Number wide breadth” on page 1268.
■ The medium breadth detects a six- to nine-character alphanumeric pattern with checksum
validation.
See “Ireland Tax Identification Number medium breadth” on page 1269.
■ The narrow breadth detects a six- to nine-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Ireland Tax Identification Number narrow breadth” on page 1270.

Ireland Tax Identification Number wide breadth

The wide breadth detects a six- to nine-character alphanumeric pattern without checksum
validation.

Table 45-534 Ireland Tax Identification Number wide-breadth patterns

Patterns

\d{7}[A-Wa-w]

\d{7} [A-Wa-w]

\d{3} \d{2} \d{2}[A-Wa-w]

\d{3} \d{2} \d{2} [A-Wa-w]

\d{7}[A-Wa-w][A-Ia-iWw]

\d{7} [A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2}[A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2} [A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2} [A-Wa-w] [A-Ia-iWw]

[Cc][Hh][Yy]\d{3}

[Cc][Hh][Yy] \d{3}

[Cc][Hh][Yy]\d{4}

[Cc][Hh][Yy] \d{4}

[Cc][Hh][Yy]\d{5}

[Cc][Hh][Yy] \d{5}

Table 45-535 Ireland Tax Identification Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Ireland Tax Identification Number medium breadth

The medium breadth detects a six- to nine-character alphanumeric pattern with checksum
validation.

Table 45-536 Ireland Tax Identification Number medium-breadth patterns

Patterns

\d{7}[A-Wa-w]

\d{7} [A-Wa-w]

\d{3} \d{2} \d{2}[A-Wa-w]

\d{3} \d{2} \d{2} [A-Wa-w]

\d{7}[A-Wa-w][A-Ia-iWw]

\d{7} [A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2}[A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2} [A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2} [A-Wa-w] [A-Ia-iWw]

[Cc][Hh][Yy]\d{3}

[Cc][Hh][Yy] \d{3}

[Cc][Hh][Yy]\d{4}

[Cc][Hh][Yy] \d{4}

[Cc][Hh][Yy]\d{5}

[Cc][Hh][Yy] \d{5}

Table 45-537 Ireland Tax Identification Number medium-breadth validator

Mandatory validator Description

Ireland Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.
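For individuals, the Irish tax reference number is the PPS number, and the digit-and-letter patterns above follow the PPS layout. The widely published PPS check-character scheme works in three steps:

- It weights the seven digits from 8 down to 2.
- It adds 9 times the value of an optional second letter (A to I count as 1 to 9, and W counts as 0).
- It maps the sum modulo 23 to a letter (0 is W, 1 is A, and so on up to 22, which is V).

Whether the Ireland Tax Identification Number Validation Check uses exactly this scheme is an assumption. The sketch does not cover CHY charity numbers, and the function name is hypothetical.

```python
import re

# Remainder 0 maps to W, 1 to A, ..., 22 to V.
ALPHABET = "WABCDEFGHIJKLMNOPQRSTUV"
PPSN_RE = re.compile(r"(\d{7})([A-W])([A-IW]?)", re.ASCII)


def ppsn_check_valid(candidate: str) -> bool:
    """Mod-23 check character for the digit-and-letter patterns (spaces ignored)."""
    m = PPSN_RE.fullmatch(candidate.replace(" ", "").upper())
    if m is None:
        return False
    digits, check, second = m.groups()
    # Weights 8, 7, ..., 2 for the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if second:
        # Optional second letter: A-I count as 1-9 and W as 0, with weight 9.
        total += 9 * ALPHABET.index(second)
    return ALPHABET[total % 23] == check
```

For example, 1234567T passes: the weighted sum is 112, and 112 modulo 23 is 20, which maps to T.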

Ireland Tax Identification Number narrow breadth

The narrow breadth detects a six- to nine-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.

Table 45-538 Ireland Tax Identification Number narrow-breadth patterns

Patterns

\d{7}[A-Wa-w]

\d{7} [A-Wa-w]

\d{3} \d{2} \d{2}[A-Wa-w]

\d{3} \d{2} \d{2} [A-Wa-w]

\d{7}[A-Wa-w][A-Ia-iWw]

\d{7} [A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2}[A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2} [A-Wa-w][A-Ia-iWw]

\d{3} \d{2} \d{2} [A-Wa-w] [A-Ia-iWw]

[Cc][Hh][Yy]\d{3}

[Cc][Hh][Yy] \d{3}

[Cc][Hh][Yy]\d{4}

[Cc][Hh][Yy] \d{4}

[Cc][Hh][Yy]\d{5}

[Cc][Hh][Yy] \d{5}

Table 45-539 Ireland Tax Identification Number narrow-breadth validators

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

CHY, charity number, charity registration number, CHY
number, CHY#, CHY no., CHY no, TRN, TRN#, tax
reference number, ireland tax identification number,
irish tax identification, tax identification number, tax
id, taxid, taxid#, tax number, tax no, taxno#, tax#, TIN,
TIN#, ireland tin, tax id no, tax id no.

uimhir carthanachta, Uimhir chláraithe charthanais,
uimhir CHY, CHY uimh., uimhir thagartha cánach,
uimhir aitheantais cánach ireland, aitheantais cánach
irish, uimhir aitheantais cánach, id cánach, uimhir
chánach, cáin #, STÁIN, cáin id uimh.

Number delimiter Validates a match by checking the surrounding characters.

Ireland Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Ireland Value Added Tax (VAT) Number

VAT is a consumption tax that is borne by the end consumer. VAT is paid on each transaction
in the manufacturing and distribution process. In Ireland, the VAT number is issued by the
Irish tax authority.

The Ireland Value Added Tax (VAT) Number data identifier detects a 9- to 11-character
alphanumeric pattern that matches the Ireland Value Added Tax (VAT) Number format.
The Ireland Value Added Tax (VAT) Number data identifier provides three breadths of detection:
■ The wide breadth detects a 9- to 11-character alphanumeric pattern without checksum
validation.
See “Ireland Value Added Tax (VAT) Number wide breadth” on page 1272.
■ The medium breadth detects a 9- to 11-character alphanumeric pattern with checksum
validation.
See “Ireland Value Added Tax (VAT) Number medium breadth” on page 1273.
■ The narrow breadth detects a 9- to 11-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Ireland Value Added Tax (VAT) Number narrow breadth” on page 1273.

Ireland Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 9- to 11-character alphanumeric pattern without checksum
validation.

Table 45-540 Ireland Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Ii][Ee]\d{7}[A-Wa-w]

[Ii][Ee] \d{7}[A-Wa-w]

[Ii][Ee] \d{7} [A-Wa-w]

[Ii][Ee]\d{7}[A-Wa-w][HhAa]

[Ii][Ee] \d{7}[A-Wa-w][HhAa]

[Ii][Ee] \d{7} [A-Wa-w][HhAa]

[Ii][Ee][0-9][A-Za-z+*]\d{5}[A-Wa-w]

[Ii][Ee] [0-9][A-Za-z+*]\d{5}[A-Wa-w]

[Ii][Ee] [0-9] [A-Za-z+*]\d{5}[A-Wa-w]

Table 45-541 Ireland Value Added Tax (VAT) Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Ireland Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 9- to 11-character alphanumeric pattern with checksum
validation.

Table 45-542 Ireland Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Ii][Ee]\d{7}[A-Wa-w]

[Ii][Ee] \d{7}[A-Wa-w]

[Ii][Ee] \d{7} [A-Wa-w]

[Ii][Ee]\d{7}[A-Wa-w][HhAa]

[Ii][Ee] \d{7}[A-Wa-w][HhAa]

[Ii][Ee] \d{7} [A-Wa-w][HhAa]

[Ii][Ee][0-9][A-Za-z+*]\d{5}[A-Wa-w]

[Ii][Ee] [0-9][A-Za-z+*]\d{5}[A-Wa-w]

[Ii][Ee] [0-9] [A-Za-z+*]\d{5}[A-Wa-w]

Table 45-543 Ireland Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Ireland VAT Number Validation Check Computes the checksum and validates the pattern against
it.
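Irish VAT numbers are commonly documented as using the same modulo-23 check letter as PPS numbers. In the sketch below, the new format adds 9 times the value of the trailing A or H (1 or 8, respectively). The old format consists of a digit, a letter or + or *, five digits, and a check letter. The sketch first rearranges it into seven digits by prefixing a 0 to the five digits and appending the leading digit. This follows the publicly documented scheme. It is an assumption that the Ireland VAT Number Validation Check behaves the same way, and the function name is hypothetical.

```python
import re

# Remainder 0 maps to W, 1 to A, ..., 22 to V.
ALPHABET = "WABCDEFGHIJKLMNOPQRSTUV"
NEW_FORMAT = re.compile(r"(\d{7})([A-W])([AH]?)", re.ASCII)
OLD_FORMAT = re.compile(r"(\d)[A-Z+*](\d{5})([A-W])", re.ASCII)


def ie_vat_valid(candidate: str) -> bool:
    """Check an IE-prefixed VAT number's mod-23 check letter (spaces ignored)."""
    number = candidate.replace(" ", "").upper()
    if not number.startswith("IE"):
        return False
    number = number[2:]
    old = OLD_FORMAT.fullmatch(number)
    if old is not None:
        # Old format: "0" + the five digits + the leading digit gives seven digits.
        digits, check, extra = "0" + old.group(2) + old.group(1), old.group(3), ""
    else:
        new = NEW_FORMAT.fullmatch(number)
        if new is None:
            return False
        digits, check, extra = new.groups()
    # Weights 8, 7, ..., 2 for the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if extra:
        total += 9 * ALPHABET.index(extra)  # A counts as 1, H as 8
    return ALPHABET[total % 23] == check
```

For example, IE6433435F passes and IE6433435E fails. The old-format number IE8Z49289F is rearranged to 0492898 before the check.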

Ireland Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 9- to 11-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.

Table 45-544 Ireland Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Ii][Ee]\d{7}[A-Wa-w]

[Ii][Ee] \d{7}[A-Wa-w]

[Ii][Ee] \d{7} [A-Wa-w]

[Ii][Ee]\d{7}[A-Wa-w][HhAa]

[Ii][Ee] \d{7}[A-Wa-w][HhAa]

[Ii][Ee] \d{7} [A-Wa-w][HhAa]

[Ii][Ee][0-9][A-Za-z+*]\d{5}[A-Wa-w]

[Ii][Ee] [0-9][A-Za-z+*]\d{5}[A-Wa-w]

[Ii][Ee] [0-9] [A-Za-z+*]\d{5}[A-Wa-w]

Table 45-545 Ireland Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

ireland vat number, vat number, vat no, VAT#, VAT,
value added tax number, value added tax, irish vat

cáin bhreisluacha, CBL, CBL aon, Uimhir CBL, Uimhir
CBL hÉireann, bhreisluacha uimhir chánach

Number delimiter Validates a match by checking the surrounding characters.

Ireland VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Irish Personal Public Service Number

The Irish Personal Public Service Number is a unique eight-character alphanumeric pattern ending
with a letter, such as 8765432A. The number is assigned when a child's birth is registered, is
issued on a Public Services Card, and is unique to every person.
The Irish Personal Public Service Number data identifier detects an eight-character alphanumeric
pattern that matches the Irish Personal Public Service Number format.
The Irish Personal Public Service Number system data identifier provides three breadths of
detection:
■ The wide breadth detects an eight-character alphanumeric pattern ending with a letter
without checksum validation.
See “Irish Personal Public Service Number wide breadth” on page 1275.
■ The medium breadth detects an eight-character alphanumeric pattern ending with a letter
with checksum validation.
See “Irish Personal Public Service Number medium breadth” on page 1275.
■ The narrow breadth detects an eight-character alphanumeric pattern ending with a letter
that passes checksum validation. It also requires the presence of related keywords.
See “Irish Personal Public Service Number narrow breadth” on page 1276.
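The following Python sketch illustrates the kind of checksum that the medium and narrow breadths apply. It assumes the publicly documented mod-23 PPSN scheme: weights 8 through 2 are applied to the seven digits, and the remainder selects the check letter (0 is W, 1 is A, and so on up to 22, which is V). Symantec does not document the internals of its Irish Personal Public Service Number Validation Check, so treat this as an approximation. The helper names are hypothetical, and only the eight-character form matched by the documented pattern is handled.

```python
import re

# Hypothetical helpers; not part of any Symantec API.
# Assumption: the product's check follows the public mod-23 PPSN scheme.
PPSN_SHAPE = re.compile(r"[0-9]{7}[A-Wa-w]")  # seven digits, then a check letter A-W
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"  # remainder 0 -> W, 1 -> A, ..., 22 -> V


def ppsn_check_letter(digits: str) -> str:
    """Return the expected check letter for the seven-digit body of a PPSN."""
    # Weights 8, 7, 6, 5, 4, 3, 2 are applied from left to right.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    return CHECK_LETTERS[total % 23]


def is_valid_ppsn(candidate: str) -> bool:
    """True when the candidate has the documented shape and a matching check letter."""
    if not PPSN_SHAPE.fullmatch(candidate):
        return False
    return ppsn_check_letter(candidate[:7]) == candidate[7].upper()


if __name__ == "__main__":
    for value in ("1234567T", "8765432A"):
        print(value, is_valid_ppsn(value))
```

Under this assumed scheme, 1234567T passes. The format example 8765432A does not pass, because its computed check letter is S. A value that merely has the right shape can therefore still be rejected by the medium and narrow breadths.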

Irish Personal Public Service Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern ending with a letter without
checksum validation.

Table 45-546 Irish Personal Public Service Number wide-breadth pattern

Pattern

\d{7}[a-wA-W]

Table 45-547 Irish Personal Public Service Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Irish Personal Public Service Number medium breadth

The medium breadth detects an eight-character alphanumeric pattern ending with a letter with
checksum validation.

Table 45-548 Irish Personal Public Service Number medium-breadth pattern

Pattern

\d{7}[a-wA-W]

Table 45-549 Irish Personal Public Service Number medium-breadth validator

Mandatory validator Description

Irish Personal Public Service Number Validation Check Computes the checksum and validates the pattern against
it.

Irish Personal Public Service Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern ending with a letter with
checksum validation. It also requires the presence of related keywords.

Table 45-550 Irish Personal Public Service Number narrow-breadth pattern

Pattern

\d{7}[a-wA-W]

Table 45-551 Irish Personal Public Service Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Irish Personal Public Service Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

public service no, personal public service no, pps no, PPS No, personal service no,
PPS service no, ppsno#, Irish PPS No, Irish pps no, PPSNO#, publicserviceno#,
personal public service number

uimhir phearsanta seirbhíse poiblí, pps uimh, Uimhir aitheantais phearsanta

Israel Personal Identification Number

The Israel Personal Identification Number is a nine-digit number issued to all Israeli citizens
at birth by the Ministry of the Interior. Personal identification numbers are also issued to all
residents over 16 years old who have legal temporary or permanent residence status.
The Israel Personal Identification Number data identifier detects a nine-digit number that
matches the Israel Personal Identification Number format.
The Israel Personal Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Israel Personal Identification Number wide breadth” on page 1277.
■ The medium breadth detects a nine-digit number with checksum validation.
See “Israel Personal Identification Number medium breadth” on page 1277.
■ The narrow breadth detects a nine-digit number with checksum validation. It also requires
the presence of related keywords.
See “Israel Personal Identification Number narrow breadth” on page 1277.
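The medium and narrow breadths add a checksum to the bare nine-digit pattern. The Python sketch below assumes that the product's validation check uses the widely published Israeli ID rule: weight the digits 1, 2, 1, 2, ... from the left, add the digits of each product, and require a total that is divisible by 10. The function name is hypothetical, and the product's implementation is not documented here.

```python
import re


def is_valid_israeli_id(candidate: str) -> bool:
    """Hypothetical check for a nine-digit Israeli ID number.

    Assumes the widely published rule: weight the digits 1, 2, 1, 2, ...
    from the left, add the digits of each product, and require the total
    to be divisible by 10.
    """
    if not re.fullmatch(r"[0-9]{9}", candidate):
        return False
    total = 0
    for index, char in enumerate(candidate):
        product = int(char) * (2 if index % 2 else 1)
        # A two-digit product (10-18) contributes the sum of its digits.
        total += product - 9 if product > 9 else product
    return total % 10 == 0
```

Under this rule 123456782 passes and 123456789 fails. The value 000000000 also satisfies the checksum, which is one reason the narrow breadth also applies the Duplicate digits validator.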

Israel Personal Identification Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-552 Israel Personal Identification Number wide-breadth pattern

Pattern

\d{9}

Table 45-553 Israel Personal Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
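The wide breadth pairs the bare nine-digit pattern with the Duplicate digits validator. The Python sketch below approximates that combination. The assumption that non-digit characters are dropped before the digits are compared is ours, and the function names are hypothetical. The wide breadth lists no Number delimiter validator, so the sketch does not inspect the surrounding characters either.

```python
import re

WIDE_PATTERN = re.compile(r"[0-9]{9}")  # the documented \d{9}, restricted to ASCII digits


def passes_duplicate_digits(candidate: str) -> bool:
    """Hypothetical stand-in for the Duplicate digits validator.

    Rejects a match whose digits are all identical, such as 111111111.
    Non-digit characters are ignored (an assumption).
    """
    digits = re.sub(r"[^0-9]", "", candidate)
    return len(set(digits)) > 1


def wide_breadth_matches(text: str) -> list[str]:
    """Return nine-digit runs in text that survive the Duplicate digits check."""
    return [m.group() for m in WIDE_PATTERN.finditer(text) if passes_duplicate_digits(m.group())]
```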

Israel Personal Identification Number medium breadth

The medium breadth detects a nine-digit number with checksum validation.

Table 45-554 Israel Personal Identification Number medium-breadth pattern

Pattern

\d{9}

Table 45-555 Israel Personal Identification Number medium-breadth validators

Mandatory validator Description

Israeli Identity Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding numbers.

Israel Personal Identification Number narrow breadth

The narrow breadth detects a nine-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-556 Israel Personal Identification Number narrow-breadth pattern

Pattern

\d{9}

Table 45-557 Israel Personal Identification Number narrow-breadth validators

Mandatory validators Description

Israel Personal Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

identity number, IDnumber#, israeliidentitynumber, identitynumber#, identity no,
Israeli identity number, unique personal ID, personal ID, unique ID, unique identity number

هوية اسرائيلية, זהות ישראלית, מספר זיהוי ישראלי, מספר זיהוי
عدد هوية فريدة من نوعها, رقم الهوية, هوية إسرائيلية, عدد

Italy Driver's Licence Number

The Italy Driver's Licence Number is the identifier for an individual driver's license issued by
the Italian driver licensing authority, the Motorizzazione Civile.
The Italy Driver's Licence Number data identifier detects a 10-character alphanumeric pattern
that matches the Italy Driver's Licence Number format.
The Italy Driver's Licence Number data identifier provides two breadths of detection:
■ The wide breadth detects a 10-character alphanumeric pattern without checksum validation.
See “Italy Driver's Licence Number wide breadth” on page 1279.
■ The narrow breadth detects a 10-character alphanumeric pattern without checksum
validation. It also requires the presence of related keywords.
See “Italy Driver's Licence Number narrow breadth” on page 1279.

Italy Driver's Licence Number wide breadth

The wide breadth detects a 10-character alphanumeric pattern without checksum validation.

Table 45-558 Italy Driver's Licence Number wide-breadth pattern

Pattern

[A-Za-z][A-Za-z]\d{7}[A-Za-z]

Table 45-559 Italy Driver's Licence Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
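The Python sketch below combines the wide-breadth pattern (two letters, seven digits, one letter) with an approximation of the Number delimiter validator. This document does not specify the product's exact delimiter rule. The sketch assumes that the rule rejects any match touching an adjacent letter or digit, and it implements this with lookaround assertions. The helper name is hypothetical.

```python
import re

# Documented wide-breadth pattern with the [A-Za-z] class written correctly.
# The lookarounds are an assumed approximation of the Number delimiter
# validator: a match may not touch another letter or digit on either side.
ITALY_DL = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]{2}[0-9]{7}[A-Za-z](?![A-Za-z0-9])")


def find_italy_dl(text: str) -> list[str]:
    """Return candidate Italy driver's licence numbers found in text (hypothetical helper)."""
    return ITALY_DL.findall(text)
```

With this approximation, "licence AB1234567C issued" yields one match. "XAB1234567C" and "AB1234567C9" yield none, because the candidate touches another letter or digit.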

Italy Driver's Licence Number narrow breadth

The narrow breadth detects a 10-character alphanumeric pattern without checksum validation.
It also requires the presence of related keywords.

Table 45-560 Italy Driver's Licence Number narrow-breadth patterns

Pattern

[A-Za-z][A-Za-z]\d{7}[A-Za-z]

Table 45-561 Italy Driver's Licence Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

drivers licence number, drivers license number, driving licence number, driving license number,
drivers license, driving licence, driving license

patente guida numero, patente di guida numero, patente di guida, patente guida

Driver's License, Driver's License Number, driver's license number, Driver's Licence Number

Italy Health Insurance Number

The Italian Health Insurance Card is issued to every Italian citizen by the Italian Ministry of
Economy and Finance in cooperation with the Italian Revenue Agency. The objective of the
card is to improve social security services through expenditure control and performance,
and to optimize citizens' use of health services.
The Italy Health Insurance Number data identifier detects a 16-character alphanumeric pattern
that matches the Italy Health Insurance Number format.
The Italy Health Insurance Number data identifier provides two breadths of detection:
■ The wide breadth detects a 16-character alphanumeric pattern without checksum validation.
It also requires the presence of related keywords.
See “Italy Health Insurance Number wide breadth” on page 1280.
■ The narrow breadth detects a 16-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “Italy Health Insurance Number narrow breadth” on page 1281.

Italy Health Insurance Number wide breadth

The wide breadth detects a 16-character alphanumeric pattern without checksum validation.
It also requires the presence of related keywords.

Table 45-562 Italy Health Insurance Number wide-breadth pattern

Pattern

[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z][0-9LMNPQRSTUV]{3}[A-Z]

[A-Z]{3} [A-Z]{3} [0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2} [A-Z][0-9LMNPQRSTUV]{3}[A-Z]

Table 45-563 Italy Health Insurance Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

TESSERA SANITARIA, tessera sanitaria, tessera sanitaria italiana, Health Insurance Card,
Italian health insurance card, health insurance card, EHIC, health card, ehic, Health Card

Italy Health Insurance Number narrow breadth

The narrow breadth detects a 16-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords.

Table 45-564 Italy Health Insurance Number narrow-breadth patterns

Pattern

[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z][0-9LMNPQRSTUV]{3}[A-Z]

[A-Z]{3} [A-Z]{3} [0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2} [A-Z][0-9LMNPQRSTUV]{3}[A-Z]

Table 45-565 Italy Health Insurance Number narrow-breadth validators

Mandatory validator Description

Codice Fiscale Control Key Check Computes the control key and checks if it is valid.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

TESSERA SANITARIA, tessera sanitaria, tessera sanitaria italiana, Health Insurance Card,
Italian health insurance card, health insurance card, EHIC, health card, ehic, Health Card
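The Codice Fiscale Control Key Check can be approximated with the public Codice Fiscale check-character algorithm. Characters in odd positions are converted through a fixed lookup table. Characters in even positions use their plain value: digits count as 0 to 9, and letters count as A=0 to Z=25. The sum modulo 26 then selects the sixteenth character, where 0 is A. This Python sketch assumes that the product follows that algorithm, and its function names are hypothetical. Spaces are removed before validation, so the spaced 3-3-5-5 layout is checked the same way as the compact one. This makes the sketch somewhat more lenient than the documented spaced pattern.

```python
import re

# Hypothetical helpers; assumes the product follows the public Codice Fiscale algorithm.
# Values for characters in odd positions (1st, 3rd, ..., 15th).
ODD_VALUES = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    [1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18, 20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23],
))
# Digits 0-9 take the odd-position values of A-J respectively.
ODD_VALUES.update({str(d): ODD_VALUES[chr(ord("A") + d)] for d in range(10)})

# Compact form of the documented pattern (letters L-V may stand in for digits).
CF_SHAPE = re.compile(
    r"[A-Z]{6}[0-9LMNPQRSTUV]{2}[ABCDEHLMPRST][0-9LMNPQRSTUV]{2}[A-Z][0-9LMNPQRSTUV]{3}[A-Z]"
)


def even_value(char: str) -> int:
    """Even positions: digits count as 0-9, letters as A=0 ... Z=25."""
    return int(char) if char.isdigit() else ord(char) - ord("A")


def cf_control_char(body: str) -> str:
    """Expected 16th character for a 15-character Codice Fiscale body."""
    total = 0
    for index, char in enumerate(body):
        # enumerate() counts from 0, so even indexes are the odd (1st, 3rd, ...) positions.
        total += ODD_VALUES[char] if index % 2 == 0 else even_value(char)
    return chr(ord("A") + total % 26)


def is_valid_codice_fiscale(candidate: str) -> bool:
    """Documented shape plus the assumed control-character check."""
    compact = candidate.replace(" ", "")  # also accepts the spaced 3-3-5-5 layout
    if not CF_SHAPE.fullmatch(compact):
        return False
    return cf_control_char(compact[:15]) == compact[15]
```

For example, RSSMRA85T10A562S (or RSS MRA 85T10 A562S) passes. Changing its final character to T makes it fail.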

Italy Passport Number

Italian passports are issued to Italian citizens for the purpose of international travel.
The Italy Passport Number data identifier detects a nine-character alphanumeric pattern that
matches the Italy Passport Number format.
The Italy Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern without checksum
validation.
See “Italy Passport Number wide breadth” on page 1282.
■ The narrow breadth detects a nine-character alphanumeric pattern without checksum
validation. It also requires the presence of related keywords.
See “Italy Passport Number narrow breadth” on page 1282.

Italy Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern without checksum validation.

Table 45-566 Italy Passport Number wide-breadth pattern

Pattern

\l{2}\d{7}

Table 45-567 Italy Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Italy Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern without checksum validation.
It also requires the presence of related keywords.

Table 45-568 Italy Passport Number narrow-breadth patterns

Pattern

\l{2}\d{7}

Table 45-569 Italy Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Repubblica Italiana Passaporto, Passaporto, Passaporto Italiana, passport number,
Italiana Passaporto numero, Passaporto numero, Numéro passeport italien, numéro passeport,
Italian passport number

Italy Value Added Tax (VAT) Number

Value-Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Italy, the VAT number is
issued by the VAT office for the region in which the business is established.
The Italy Value Added Tax (VAT) Number data identifier detects a 13-character alphanumeric
pattern (the country prefix IT followed by 11 digits) that matches the Italy Value Added Tax (VAT)
Number format.
The Italy Value Added Tax (VAT) Number data identifier provides three breadths of detection:
■ The wide breadth detects a 13-character alphanumeric pattern preceded by IT, without
checksum validation.
See “Italy Value Added Tax (VAT) Number wide breadth” on page 1283.
■ The medium breadth detects a 13-character alphanumeric pattern preceded by IT, with
checksum validation.
See “Italy Value Added Tax (VAT) Number medium breadth” on page 1284.
■ The narrow breadth detects a 13-character alphanumeric pattern preceded by IT, with
checksum validation. It also requires the presence of related keywords.
See “Italy Value Added Tax (VAT) Number narrow breadth” on page 1285.

Italy Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 13-character alphanumeric pattern preceded by IT, without
checksum validation.

Table 45-570 Italy Value Added Tax (VAT) Number wide-breadth pattern

Pattern

[Ii][Tt]\d{11}

[Ii][Tt] \d{11}

[Ii][Tt].\d{11}

[Ii][Tt]-\d{11}

[Ii][Tt],\d{11}

Table 45-571 Italy Value Added Tax (VAT) Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Italy Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 13-character alphanumeric pattern preceded by IT, with
checksum validation.

Table 45-572 Italy Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Ii][Tt]\d{11}

[Ii][Tt] \d{11}

[Ii][Tt].\d{11}

[Ii][Tt]-\d{11}

[Ii][Tt],\d{11}

Table 45-573 Italy Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Italy VAT Number Validation Check Checksum validator for the Italy Value Added Tax
(VAT) Number.
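For the Italy VAT Number Validation Check, the public check-digit rule for an Italian partita IVA is equivalent to the Luhn algorithm applied to the 11 digits. The Python sketch below assumes that the product uses that rule, and its function names are hypothetical. It accepts the five documented prefix forms. In the documented pattern [Ii][Tt].\d{11}, the unescaped period matches any single character. This sketch is stricter and accepts only a literal space, period, hyphen, or comma, or no separator at all.

```python
import re

# Documented prefix forms: IT, then an optional space, period, hyphen, or comma, then 11 digits.
# The documented "." is unescaped (any character); this sketch accepts only a literal period.
IT_VAT = re.compile(r"[Ii][Tt][ .,\-]?([0-9]{11})")


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right and require a total divisible by 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_valid_it_vat(candidate: str) -> bool:
    """Hypothetical helper: documented shape plus the assumed Luhn check digit."""
    match = IT_VAT.fullmatch(candidate)
    return bool(match) and luhn_ok(match.group(1))
```

For example, IT00743110157, it-00743110157, and IT 00743110157 pass, while IT00743110158 fails the check digit.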

Italy Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 13-character alphanumeric pattern preceded by IT, with
checksum validation. It also requires the presence of related keywords.

Table 45-574 Italy Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Ii][Tt]\d{11}

[Ii][Tt] \d{11}

[Ii][Tt].\d{11}

[Ii][Tt]-\d{11}

[Ii][Tt],\d{11}

Table 45-575 Italy Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Italy VAT Number Validation Check Checksum validator for the Italy Value Added Tax (VAT)
Number.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

VAT Number, vat no, VAT#, IVA, numero partita IVA, IVA#, numero IVA

Japan Driver's License Number

In Japan, a driving license is required when operating a car, motorcycle or moped on public
roads. Driving licenses are issued by the prefectural governments' public safety commissions
and are overseen on a nationwide basis by the National Police Agency.
The Japan Driver's License Number data identifier detects a 12-digit number that matches the
Japan Driver's License Number format.
The Japan Driver's License Number data identifier provides three breadths of detection:
■ The wide breadth detects a 12-digit number without checksum validation.
See “Japan Driver's License Number wide breadth” on page 1286.
■ The medium breadth detects a 12-digit number with checksum validation.
See “Japan Driver's License Number medium breadth” on page 1286.
■ The narrow breadth detects a 12-digit number with checksum validation. It also requires
the presence of related keywords.
See “Japan Driver's License Number narrow breadth” on page 1286.

Japan Driver's License Number wide breadth

The wide breadth detects a 12-digit number without checksum validation.

Table 45-576 Japan Driver's License Number wide-breadth pattern

Pattern

\d{12}

Table 45-577 Japan Driver's License Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Japan Driver's License Number medium breadth

The medium breadth detects a 12-digit number with checksum validation.

Table 45-578 Japan Driver's License Number medium-breadth pattern

Pattern

\d{12}

Table 45-579 Japan Driver's License Number medium-breadth validator

Mandatory validator Description

Japan Driver's License Number Validation Check Computes the checksum and validates the pattern against
it.

Japan Driver's License Number narrow breadth

The narrow breadth detects a 12-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-580 Japan Driver's License Number narrow-breadth pattern

Pattern

\d{12}

Table 45-581 Japan Driver's License Number narrow-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Japan Driver's License Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

公安委員会,番号,免許,交付,運転免許,運転免許証,ドライバライセンス,ドライバーズライセンス,ライセンス,運転免許証番号

driver's license,driving license,driver license,driver's license number,driving license number,
driver license number,license

Japan Passport Number

Japan Passport Numbers are issued to Japanese citizens for international travel.
The Japan Passport Number data identifier detects a valid Japanese passport number pattern.
The Japan Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a valid Japanese passport number pattern.
See “Japan Passport Number wide breadth” on page 1287.
■ The narrow breadth detects a valid Japanese passport number pattern. It also requires the
presence of related keywords.
See “Japan Passport Number narrow breadth” on page 1288.

Japan Passport Number wide breadth

The wide breadth detects a valid Japanese passport number pattern.

Table 45-582 Japan Passport Number wide-breadth patterns

Patterns

\l{2}\d{3}\l\d{2}\l\d

\l{2}\d{4}\l\d\l\d

\l\d{4}\l\d{2}\l\d

\l\d{4}\l\d{2}\l{2}\d

\l{2}\d{3}\l\d{2}\l{2}\d

\l{2}\d{8}

\l{2}\d{7}

\l\d{8}

Table 45-583 Japan Passport Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
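The Japan Passport Number patterns use \l. This sketch assumes that \l stands for a single letter. Python's re module has no such escape, so the sketch substitutes [A-Za-z] and restricts \d to ASCII digits. It then joins the eight documented patterns into one alternation and tests them against a whole candidate string. The helper name is hypothetical, and the Duplicate digits validator is not included.

```python
import re

# The documented Japan Passport Number patterns. Assumption: "\l" means one letter.
DOCUMENTED_PATTERNS = [
    r"\l{2}\d{3}\l\d{2}\l\d",
    r"\l{2}\d{4}\l\d\l\d",
    r"\l\d{4}\l\d{2}\l\d",
    r"\l\d{4}\l\d{2}\l{2}\d",
    r"\l{2}\d{3}\l\d{2}\l{2}\d",
    r"\l{2}\d{8}",
    r"\l{2}\d{7}",
    r"\l\d{8}",
]

# Python's re has no \l escape, so substitute an explicit letter class and
# restrict \d to ASCII digits before joining the alternatives.
JAPAN_PASSPORT = re.compile(
    "|".join(
        "(?:" + p.replace(r"\l", "[A-Za-z]").replace(r"\d", "[0-9]") + ")"
        for p in DOCUMENTED_PATTERNS
    )
)


def matches_japan_passport(candidate: str) -> bool:
    """True if the whole candidate fits one of the documented shapes (hypothetical helper)."""
    return JAPAN_PASSPORT.fullmatch(candidate) is not None
```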

Japan Passport Number narrow breadth

The narrow breadth detects a valid Japanese passport number pattern. It also requires the
presence of related keywords.

Table 45-584 Japan Passport Number narrow-breadth patterns

Patterns

\l{2}\d{3}\l\d{2}\l\d

\l{2}\d{4}\l\d\l\d

\l\d{4}\l\d{2}\l\d

\l\d{4}\l\d{2}\l{2}\d

\l{2}\d{3}\l\d{2}\l{2}\d

\l{2}\d{8}

\l{2}\d{7}

\l\d{8}

Table 45-585 Japan Passport Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

日本国旅券, パスポート, パスポート数, passport, Passport, JAPAN PASSPORT, Japan Passport,
japan passport, Passport Book, passport book

Japanese Juki-Net Identification Number

The Juki-Net Identification Number is a unique number assigned to both Japanese and foreign
residents to confirm their identity.
The Japanese Juki-Net Identification Number data identifier detects an 11-digit number that
matches the Japanese Juki-Net Identification Number format.
The Juki-Net Identification Number system data identifier provides three breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Japanese Juki-Net Identification Number wide breadth” on page 1289.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Japanese Juki-Net Identification Number medium breadth” on page 1290.
■ The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Japanese Juki-Net Identification Number narrow breadth” on page 1290.

Japanese Juki-Net Identification Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-586 Japanese Juki-Net Identification Number wide-breadth pattern

Pattern

\d{11}

Table 45-587 Japanese Juki-Net Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Japanese Juki-Net Identification Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-588 Japanese Juki-Net Identification Number medium-breadth pattern

Pattern

\d{11}

Table 45-589 Japanese Juki-Net Identification Number medium-breadth validator

Mandatory validator Description

Japanese Juki-Net Id Validation Check Computes the checksum that every Japanese Juki-Net card
number must pass.

Number delimiter Validates a match by checking the surrounding characters.

Japanese Juki-Net Identification Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.

Table 45-590 Japanese Juki-Net Identification Number narrow-breadth pattern

Pattern

\d{11}

Table 45-591 Japanese Juki-Net Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Japanese Juki-Net Id Validation Check Computes the checksum that every Japanese Juki-Net card
number must pass.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

juki net identity number, juki net number, identification number, Juki Net No, jukinetno#,
personal identification number, juki net no, jukinetnumber#, unique jukinet ID

住基ネット識別番号, 住基ネット番号, 識別番号, 個人識別番号, ID番号, ユニークID番号

Japanese My Number - Corporate

The Japanese My Number - Corporate is a unique identifier for Japanese corporations used
for tax administration, social security administration, and disaster response.
The Japanese My Number - Corporate data identifier detects a 13-digit number that matches
the My Number - Corporate format.
The Japanese My Number - Corporate data identifier provides two breadths of detection:
■ The wide breadth detects a 13-digit number with checksum validation.
See “Japanese My Number - Corporate wide breadth” on page 1291.
■ The narrow breadth detects a 13-digit number with checksum validation. It also requires
the presence of related keywords.
See “Japanese My Number - Corporate narrow breadth” on page 1292.
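The Japanese My Number Validation Check for corporate numbers can be illustrated with the check-digit formula published for the 13-digit Corporate Number, where the check digit is the leading digit. The formula takes the remaining 12 digits and weights them 1 and 2 alternately, starting from the rightmost digit. The check digit is 9 minus the weighted sum modulo 9. The Python sketch assumes that the product applies this formula, and its function names are hypothetical.

```python
import re


def corporate_check_digit(base: str) -> int:
    """Check digit for the 12 digits that follow the leading check digit."""
    total = 0
    # n counts from the rightmost digit (n = 1); odd n weighs 1, even n weighs 2.
    for n, char in enumerate(reversed(base), start=1):
        total += int(char) * (1 if n % 2 == 1 else 2)
    return 9 - total % 9


def is_valid_corporate_number(candidate: str) -> bool:
    """Hypothetical helper: 13 ASCII digits whose first digit matches the computed check digit."""
    if not re.fullmatch(r"[0-9]{13}", candidate):
        return False
    return int(candidate[0]) == corporate_check_digit(candidate[1:])
```

For example, 7000012050002 passes this check. Changing its leading digit to 8 makes it fail.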

Japanese My Number - Corporate wide breadth

The wide breadth detects a 13-digit number with checksum validation.

Table 45-592 Japanese My Number - Corporate wide-breadth pattern

Pattern

\d{13}

Table 45-593 Japanese My Number - Corporate wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Japanese My Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding numbers.

Japanese My Number - Corporate narrow breadth

The narrow breadth detects a 13-digit number with checksum validation. It also requires the
presence of a Japanese My Number-related keyword.

Table 45-594 Japanese My Number - Corporate narrow-breadth pattern

Pattern

\d{13}

Table 45-595 Japanese My Number - Corporate narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Japanese My Number Validation Check Computes the checksum and validates the pattern against
it.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

000000000000

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

マイナンバー, 共通番号

Japanese My Number - Personal

The Japanese My Number - Personal is a unique identifier for Japanese citizens and residents
used for tax administration, social security administration, and disaster response.
The Japanese My Number - Personal data identifier detects a 12-digit number that matches
the My Number - Personal format.
The Japanese My Number - Personal data identifier provides three breadths of detection:
■ The wide breadth detects a 12-digit number with checksum validation.
See “Japanese My Number - Personal wide breadth” on page 1293.
■ The medium breadth detects a 12-digit number with checksum validation.
See “Japanese My Number - Personal medium breadth” on page 1293.
■ The narrow breadth detects a 12-digit number with checksum validation. It also requires
the presence of related keywords.
See “Japanese My Number - Personal narrow breadth” on page 1294.
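The sketch below illustrates the check-digit formula published for the 12-digit individual My Number, assuming that the product's validation check follows it. The twelfth digit is the check digit. The other digits are counted from right to left as n = 1 to 11, and each is weighted n + 1 for n up to 6 and n - 5 above that. With r as the weighted sum modulo 11, the check digit is 0 when r is 0 or 1, and 11 - r otherwise. The sketch also accepts the grouped layouts listed for the medium and narrow breadths, with the same separator required in both places, and it applies the documented 000000000000 exclusion. In the documented \d{4}.\d{4}.\d{4} pattern, the unescaped period matches any character. This sketch accepts only a space, hyphen, or period. The function names are hypothetical.

```python
import re

# Documented layouts: 12 digits, or 4-4-4 groups split by a space, hyphen, or period.
# The backreference \1 requires the same separator (or none) in both places.
PERSONAL_SHAPE = re.compile(r"[0-9]{4}([ .-]?)[0-9]{4}\1[0-9]{4}")


def personal_check_digit(first11: str) -> int:
    """Expected 12th digit, given the first 11 digits."""
    total = 0
    # n = 1 is the digit just left of the check digit; weights are n + 1 (n <= 6) or n - 5 (n >= 7).
    for n, char in enumerate(reversed(first11), start=1):
        total += int(char) * (n + 1 if n <= 6 else n - 5)
    remainder = total % 11
    return 0 if remainder <= 1 else 11 - remainder


def is_valid_personal_number(candidate: str) -> bool:
    """Hypothetical helper combining shape, the documented exclusion, and the checksum."""
    if not PERSONAL_SHAPE.fullmatch(candidate):
        return False
    digits = re.sub(r"[^0-9]", "", candidate)
    if digits.startswith("000000000000"):  # mirrors "Exclude beginning characters"
        return False
    return personal_check_digit(digits[:11]) == int(digits[11])
```

On its own, the all-zero value 000000000000 would pass the checksum, because its computed check digit is 0. This is why the Exclude beginning characters entry is needed. By contrast, 123456789018 passes, and 123456789019 fails.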

Japanese My Number - Personal wide breadth

The wide breadth detects a 12-digit number with checksum validation.

Table 45-596 Japanese My Number - Personal wide-breadth pattern

Pattern

\d{12}

Table 45-597 Japanese My Number - Personal wide-breadth validators

Mandatory validator Description

Japanese My Number Validation Check Computes the checksum and validates the pattern against
it.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

000000000000

Japanese My Number - Personal medium breadth

The medium breadth detects a 12-digit number with checksum validation.

Table 45-598 Japanese My Number - Personal medium-breadth patterns

Pattern

\d{12}

\d{4} \d{4} \d{4}

\d{4}-\d{4}-\d{4}

\d{4}.\d{4}.\d{4}

Table 45-599 Japanese My Number - Personal medium-breadth validators

Mandatory validator Description

Japanese My Number Validation Check Computes the checksum and validates the pattern against
it.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

000000000000

Japanese My Number - Personal narrow breadth

The narrow breadth detects a 12-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-600 Japanese My Number - Personal narrow-breadth patterns

Pattern

\d{12}

\d{4} \d{4} \d{4}

\d{4}-\d{4}-\d{4}

\d{4}.\d{4}.\d{4}

Table 45-601 Japanese My Number - Personal narrow-breadth validators

Mandatory validator Description

Japanese My Number Validation Check Computes the checksum and validates the pattern against
it.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

000000000000

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

マイナンバー, 個人番号, 共通番号

Kazakhstan Passport Number

Kazakhstani passports are issued to citizens of the Republic of Kazakhstan to facilitate
international travel.
The Kazakhstan Passport Number data identifier detects an eight-character alphanumeric
pattern that matches the Kazakhstan Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the
Kazakhstan Passport Number format. It checks for common test patterns.
See “Kazakhstan Passport Number wide breadth” on page 1295.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the
Kazakhstan Passport Number format. It checks for common test patterns, and also requires
the presence of related keywords.
See “Kazakhstan Passport Number narrow breadth” on page 1296.

Kazakhstan Passport Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the Kazakhstan
Passport Number format. It checks for common test patterns.

Table 45-602 Kazakhstan Passport Number wide-breadth patterns

Pattern

[A-Za-z]\d\d\d\d\d\d\d

Table 45-603 Kazakhstan Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444, 5555555, 6666666, 7777777, 8888888, 9999999

Kazakhstan Passport Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the
Kazakhstan Passport Number format. It checks for common test patterns, and also requires
the presence of related keywords.

Table 45-604 Kazakhstan Passport Number narrow-breadth patterns

Pattern

[A-Za-z]\d\d\d\d\d\d\d

Table 45-605 Kazakhstan Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no, passportno,

passport no., passport#, passportno#

төлқұжат, төлқұжат нөмірі, номер паспорта,

заграничный пасспорт, национальный паспорт
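
The narrow breadth adds the Find keywords requirement on top of the wide-breadth checks. The Python sketch below assumes that a keyword anywhere in the text satisfies the requirement, compared case-insensitively with str.casefold(). The product may apply stricter rules (for example, word boundaries or proximity to the number), so treat this only as an approximation. The delimiter rule is the same assumption as in the wide-breadth sketch, and the function name narrow_matches is hypothetical.

```python
import re

# Wide-breadth pattern with an assumed delimiter rule (no adjacent ASCII letter or digit).
PATTERN = re.compile(r"(?<![A-Za-z0-9])[A-Za-z]\d{7}(?![A-Za-z0-9])")
EXCLUDED_ENDINGS = tuple(str(d) * 7 for d in range(10))

# Keyword inputs copied from Table 45-605.
KEYWORDS = (
    "passport", "passport number", "passport no", "passportno",
    "passport no.", "passport#", "passportno#",
    "төлқұжат", "төлқұжат нөмірі", "номер паспорта",
    "заграничный пасспорт", "национальный паспорт",
)


def narrow_matches(text: str) -> list[str]:
    """Return wide-breadth matches, but only if a keyword occurs in the text."""
    folded = text.casefold()
    # Assumption: a keyword anywhere in the text satisfies Find keywords.
    if not any(keyword.casefold() in folded for keyword in KEYWORDS):
        return []
    return [
        m.group(0)
        for m in PATTERN.finditer(text)
        if not m.group(0).endswith(EXCLUDED_ENDINGS)
    ]
```

For example, narrow_matches("Номер паспорта: N1234567") returns ["N1234567"], while narrow_matches("Invoice N1234567") returns an empty list because no keyword is present.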

Korea Passport Number

Korean passports are issued to Korean citizens to facilitate international travel.
The Korea Passport Number data identifier detects a valid Korean passport number.
The Korea Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a valid Korean Passport Number pattern.
See “Korea Passport Number wide breadth” on page 1297.
■ The narrow breadth detects a valid Korean Passport Number pattern. It also requires the
presence of related keywords.
See “Korea Passport Number narrow breadth” on page 1297.

Korea Passport Number wide breadth

The wide breadth detects a valid Korean Passport Number pattern.

Table 45-606 Korea Passport Number wide-breadth patterns

Patterns

\l{2}\d{7}

\l\d{8}

\d{9}

Table 45-607 Korea Passport Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
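
The Python sketch below approximates the wide breadth for the Korea Passport Number. The \l token does not exist in Python's re module. The sketch assumes that it denotes a single letter and writes it as [A-Za-z]. Duplicate digits is modeled as rejecting any candidate whose digits are all the same. Both the translation of \l and the function names are hypothetical.

```python
import re

# Assumption: \l in the product's pattern syntax denotes one letter, so it is
# written as [A-Za-z] here. Patterns follow Table 45-606.
PATTERNS = tuple(
    re.compile(p) for p in (r"[A-Za-z]{2}\d{7}", r"[A-Za-z]\d{8}", r"\d{9}")
)


def passes_duplicate_digits(candidate: str) -> bool:
    """False when every digit in the candidate is the same digit."""
    distinct_digits = {c for c in candidate if c.isdigit()}
    return len(distinct_digits) > 1


def is_wide_match(candidate: str) -> bool:
    """True if the whole candidate matches a pattern and passes Duplicate digits."""
    return any(p.fullmatch(candidate) for p in PATTERNS) and passes_duplicate_digits(candidate)
```

With these assumptions, M12345678 is accepted, and 111111111 and AB1111111 are rejected by Duplicate digits.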

Korea Passport Number narrow breadth

The narrow breadth detects a valid Korean Passport Number pattern. It also requires the
presence of related keywords.

Table 45-608 Korea Passport Number narrow-breadth patterns

Patterns

\l{2}\d{7}

\l\d{8}

\d{9}

Table 45-609 Korea Passport Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

한국어 여권, 여권, 여권 번호, 조선 민주주의 인민 공화국,

대한민국

passport, Passport, KOREA PASSPORT, Korea Passport,
korea passport, Book, passport book, South Korea,
Republic of Korea

Korea Residence Registration Number for Foreigners

A foreign resident registration number is a 13-digit number issued to all foreign residents of
the Republic of Korea. It is used to identify people in various private transactions, such as
banking and employment, and for online identification.
The Korea Residence Registration Number for Foreigners data identifier detects a 13-digit
number that matches the Korea Residence Registration Number for Foreigners format.
The Korea Residence Registration Number for Foreigners data identifier provides three breadths
of detection:
■ The wide breadth detects a 13-digit number without checksum validation.
See “Korea Residence Registration Number for Foreigners wide breadth” on page 1298.
■ The medium breadth detects a 13-digit number with checksum validation.
See “Korea Residence Registration Number for Foreigners medium breadth” on page 1299.
■ The narrow breadth detects a 13-digit number with checksum validation. It also requires
the presence of related keywords.
See “Korea Residence Registration Number for Foreigners narrow breadth” on page 1299.

Korea Residence Registration Number for Foreigners wide breadth

The wide breadth detects a 13-digit number without checksum validation.

Table 45-610 Korea Residence Registration Number for Foreigners wide-breadth patterns

Patterns

\d{2}[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d{8}

\d\d[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d[ ]\d{7}

Table 45-611 Korea Residence Registration Number for Foreigners wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
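
The four wide-breadth patterns accept a hyphen, a single space, or no separator after the sixth digit. The first and third patterns are equivalent, because \d{2} and \d\d both match exactly two digits. The following Python sketch combines the patterns, applies Duplicate digits, and returns the 13 bare digits. It is an illustration only, and the function name wide_candidate is hypothetical.

```python
import re
from typing import Optional

# Wide-breadth patterns from Table 45-610.
PATTERNS = (
    r"\d{2}[01]\d[0123]\d-\d{7}",
    r"\d{2}[01]\d[0123]\d{8}",
    r"\d\d[01]\d[0123]\d-\d{7}",
    r"\d{2}[01]\d[0123]\d[ ]\d{7}",
)
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def wide_candidate(candidate: str) -> Optional[str]:
    """Return the 13 bare digits if the candidate passes the wide breadth, else None."""
    if COMBINED.fullmatch(candidate) is None:
        return None
    digits = candidate.replace("-", "").replace(" ", "")
    # Duplicate digits: reject strings in which every digit is the same.
    if len(set(digits)) == 1:
        return None
    return digits
```

For example, 850315-5123456, 850315 5123456, and 8503155123456 all normalize to 8503155123456. 852315-5123456 fails the [01] constraint on the third digit.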

Korea Residence Registration Number for Foreigners medium breadth

The medium breadth detects a 13-digit number with checksum validation.

Table 45-612 Korea Residence Registration Number for Foreigners medium-breadth patterns

Patterns

\d{2}[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d{8}

\d\d[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d[ ]\d{7}

Table 45-613 Korea Residence Registration Number for Foreigners medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

KRRN Foreign Validation Check Computes the checksum and validates the pattern against
it.

Korea Residence Registration Number for Foreigners narrow breadth

The narrow breadth detects a 13-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-614 Korea Residence Registration Number for Foreigners narrow-breadth patterns

Patterns

\d{2}[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d{8}

\d\d[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d[ ]\d{7}

Table 45-615 Korea Residence Registration Number for Foreigners narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

KRRN Foreign Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

외국인 등록 번호, 주민번호, Foreign Registration Number,
Foreign Resident Number

Korea Residence Registration Number for Korean

A resident registration number is a 13-digit number issued to all residents of the Republic of
Korea. Like national identification numbers in other countries, it is used to identify people
in various private transactions, such as banking and employment. It is also used extensively
for online identification.
The Korea Residence Registration Number for Korean data identifier detects a 13-digit number
that matches the residence registration number format.
The Korea Residence Registration Number for Korean data identifier provides three breadths
of detection:
■ The wide breadth detects a 13-digit number without checksum validation.
See “Korea Residence Registration Number for Korean wide breadth” on page 1301.
■ The medium breadth detects a 13-digit number with checksum validation.
See “Korea Residence Registration Number for Korean medium breadth” on page 1301.
■ The narrow breadth detects a 13-digit number with checksum validation. It also requires
the presence of related keywords.
See “Korea Residence Registration Number for Korean narrow breadth” on page 1302.

Korea Residence Registration Number for Korean wide breadth

The wide breadth detects a 13-digit number without checksum validation.

Table 45-616 Korea Residence Registration Number for Korean wide-breadth patterns

Patterns

\d{2}[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d{8}

\d\d[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d[ ]\d{7}

Table 45-617 Korea Residence Registration Number for Korean wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Korea Residence Registration Number for Korean medium breadth

The medium breadth detects a 13-digit number with checksum validation.

Table 45-618 Korea Residence Registration Number for Korean medium-breadth patterns

Patterns

\d{2}[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d{8}

\d\d[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d[ ]\d{7}

Table 45-619 Korea Residence Registration Number for Korean medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Advanced KRRN Validation Validates that the third and fourth digits represent a valid
month, and that the fifth and sixth digits represent a valid
day. Validates the checksum of the pattern.
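
This guide does not document the checksum that Advanced KRRN Validation uses. The Python sketch below assumes the commonly published resident registration number check digit. Under that scheme, the first 12 digits are multiplied by the weights 2 through 9 and then 2 through 5, and the check digit is (11 - (sum mod 11)) mod 10. The sketch does not derive the century from the seventh digit, so it accepts 29 February for any year. The function name advanced_krrn_check is hypothetical.

```python
import calendar

# Assumed weights for the first 12 digits of a resident registration number.
WEIGHTS = (2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5)


def advanced_krrn_check(candidate: str) -> bool:
    """Validate month (digits 3-4), day (digits 5-6), and the assumed check digit."""
    digits = candidate.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdecimal():
        return False
    month, day = int(digits[2:4]), int(digits[4:6])
    if not 1 <= month <= 12:
        return False
    # Leap year 2000 is used so that 29 February is always accepted,
    # because this sketch does not derive the century.
    if not 1 <= day <= calendar.monthrange(2000, month)[1]:
        return False
    total = sum(int(d) * w for d, w in zip(digits[:12], WEIGHTS))
    return (11 - total % 11) % 10 == int(digits[12])
```

Under these assumptions, 800101-1234560 passes: the weighted sum is 122, 122 mod 11 is 1, and (11 - 1) mod 10 is 0. 800101-1234567 fails the checksum, and 800230-1234560 fails the day check.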

Korea Residence Registration Number for Korean narrow breadth

The narrow breadth detects a 13-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-620 Korea Residence Registration Number for Korean narrow-breadth patterns

Patterns

\d{2}[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d{8}

\d\d[01]\d[0123]\d-\d{7}

\d{2}[01]\d[0123]\d[ ]\d{7}

Table 45-621 Korea Residence Registration Number for Korean narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Advanced KRRN Validation Validates that the third and fourth digits represent a valid
month, and that the fifth and sixth digits represent a valid
day. Validates the checksum of the pattern.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

주민등록번호, 주민번호

Resident Registration Number, Resident Number

Latvia Driver's Licence Number

A driver's license in Latvia is a document issued by the Road Traffic Safety Directorate that
confirms the holder's right to drive motor vehicles.
The Latvia Driver's Licence Number data identifier detects an eight- or nine-character
alphanumeric pattern that matches the Latvia Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-character alphanumeric pattern that matches
the Latvia Driver's Licence Number format. It checks for common test numbers.
See “Latvia Driver's Licence Number wide breadth” on page 1303.
■ The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Latvia Driver's Licence Number format. It checks for common test numbers, and also
requires the presence of related keywords.
See “Latvia Driver's Licence Number narrow breadth” on page 1304.

Latvia Driver's Licence Number wide breadth

The wide breadth detects an eight- or nine-character alphanumeric pattern that matches the
Latvia Driver's Licence Number format. It checks for common test numbers.

Table 45-622 Latvia Driver's Licence Number wide-breadth patterns

Pattern

[a-zA-Z]{2}\d{6}

[a-zA-Z]{2}\d{7}

[a-zA-Z]{3}\d{6}

Table 45-623 Latvia Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999, 0000000, 1111111,
2222222, 3333333, 4444444, 5555555, 6666666,
7777777, 8888888, 9999999
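
The exclusion list mixes six- and seven-digit endings. Any value that ends in seven repeated digits also ends in six repeated digits, so the six-digit entries alone reject the same candidates. The Python sketch below checks that property and applies the patterns and exclusions to whole candidate strings. It leaves out the Number delimiter validator, and the function name wide_candidate is hypothetical.

```python
import re

# Wide-breadth patterns from Table 45-622.
PATTERNS = tuple(
    re.compile(p)
    for p in (r"[a-zA-Z]{2}\d{6}", r"[a-zA-Z]{2}\d{7}", r"[a-zA-Z]{3}\d{6}")
)

# Exclude ending characters from Table 45-623: six and seven repeated digits.
SIX = tuple(str(d) * 6 for d in range(10))
SEVEN = tuple(str(d) * 7 for d in range(10))
EXCLUDED_ENDINGS = SIX + SEVEN


def wide_candidate(candidate: str) -> bool:
    """True if the candidate matches a pattern and has no excluded ending."""
    matches = any(p.fullmatch(candidate) for p in PATTERNS)
    return matches and not candidate.endswith(EXCLUDED_ENDINGS)


# Every seven-digit ending already ends with one of the six-digit endings.
assert all(ending.endswith(SIX) for ending in SEVEN)
```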

Latvia Driver's Licence Number narrow breadth

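The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches the
Latvia Driver's Licence Number format. It checks for common test numbers, and also requires
the presence of related keywords.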

Table 45-624 Latvia Driver's Licence Number narrow-breadth patterns

Pattern

[a-zA-Z]{2}\d{6}

[a-zA-Z]{2}\d{7}

[a-zA-Z]{3}\d{6}

Table 45-625 Latvia Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999, 0000000, 1111111,
2222222, 3333333, 4444444, 5555555, 6666666,
7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

licence number, driver license, driver licence, drivers license,
drivers licence, driving license, driving licence, driver license number,
driver licence number, drivers license number, drivers licence number,
driving license number, driving licence number, driver's license,
driver's licence, Driver's License, Driver's Licence,
driver's license number, driver's licence number,
Driver's License Number, Driver's Licence Number,
DLNo#, dlno#, drivers lic., driver permit, drivers permit,
driving permit, license number

licences numurs, vadītāja apliecība, autovadītāja apliecība,
vadītāja apliecības numurs, Vadītāja licences numurs,
vadītāji lic., vadītāja atļauja

Latvia Passport Number

Latvian passports are issued to citizens of Latvia for identity and international travel purposes.
Passports are issued by the territorial sections of the Office of Citizenship and Migration Affairs.
The Latvia Passport Number data identifier detects a nine-character alphanumeric pattern that
matches the Latvia Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern that matches the Latvia
Passport Number format. It checks for common test patterns.
See “Latvia Passport Number wide breadth” on page 1305.
■ The narrow breadth detects a nine-character alphanumeric pattern that matches the Latvia
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.
See “Latvia Passport Number narrow breadth” on page 1305.

Latvia Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern that matches the Latvia
Passport Number format. It checks for common test patterns.

Table 45-626 Latvia Passport Number wide-breadth patterns

Pattern

[Ll][A-Za-z]\d{7}

Table 45-627 Latvia Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Latvia Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern that matches the Latvia
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.

Table 45-628 Latvia Passport Number narrow-breadth patterns

Pattern

[Ll][A-Za-z]\d{7}

Table 45-629 Latvia Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

Passport, passport number, passport, passport no,

passport book, passport#, passportno, passport card,
LATVIJA, LETTONIE, Pases Nr., Pases Nr, Passport
No., Passport No, Passeport No., Passeport No, Pase,
pase, PASSPORT, PASSEPORT, pases numurs, Pases
Nr, pases grāmata, pase#, pases karte

Latvia Personal Identification Number

The Latvian personal identification number is used for national identity and as a tax identification
number for financial purposes. It is issued by the Office of Citizenship and Migration Affairs of
the Ministry of the Interior.
The Latvia Personal Identification Number data identifier detects an 11-digit number that
matches the Latvia Personal Identification Number format.
The Latvia Personal Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Latvia Personal Identification Number wide breadth” on page 1307.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Latvia Personal Identification Number medium breadth” on page 1307.
■ The narrow breadth detects an 11-digit number with checksum validation. It also requires
the presence of related keywords.
See “Latvia Personal Identification Number narrow breadth” on page 1307.

Latvia Personal Identification Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-630 Latvia Personal Identification Number wide-breadth patterns

Patterns

\d{2}[01]\d{3}-[012]\d{4}

\d{2}[01]\d{3}[012]\d{4}

32\d{9}

Table 45-631 Latvia Personal Identification Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Latvia Personal Identification Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-632 Latvia Personal Identification Number medium-breadth patterns

Patterns

\d{2}[01]\d{3}-[012]\d{4}

\d{2}[01]\d{3}[012]\d{4}

32\d{9}

Table 45-633 Latvia Personal Identification Number medium-breadth validator

Mandatory validator Description

Latvia Personal Code Check Computes the checksum and validates the pattern against
it.

Latvia Personal Identification Number narrow breadth

The narrow breadth detects an 11-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-634 Latvia Personal Identification Number narrow-breadth patterns

Patterns

\d{2}[01]\d{3}-[012]\d{4}

\d{2}[01]\d{3}[012]\d{4}

32\d{9}

Table 45-635 Latvia Personal Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Latvia Personal Code Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

latvia personal code, personal code, national identification number,
identification number, national id, id#, latvia tin, tin,
tax identification number, tin#, tax id, tin no, tin number, tax number

Personas kods, personas kods, latvijas personas kods,

Valsts identifikācijas numurs, valsts identifikācijas
numurs, identifikācijas numurs, nacionālais id, latvija
alva, alva, nodokļu identifikācijas numurs, nodokļu id,
alvas nē, nodokļa numurs

Latvia Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Latvia, VAT is administered
by the State Revenue Service.
The Latvia Value Added Tax (VAT) Number data identifier detects a 13-character alphanumeric
pattern beginning with LV that matches the Latvia VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 13-character alphanumeric pattern beginning with LV that
matches the Latvia VAT Number format without checksum validation. It checks for common
test patterns.
See “Latvia Value Added Tax (VAT) Number wide breadth” on page 1309.
■ The medium breadth detects a 13-character alphanumeric pattern beginning with LV that
matches the Latvia VAT Number format with checksum validation.
See “Latvia Value Added Tax (VAT) Number medium breadth” on page 1309.
■ The narrow breadth detects a 13-character alphanumeric pattern beginning with LV that
matches the Latvia VAT Number format with checksum validation. It checks for common
test patterns, and also requires the presence of related keywords.
See “Latvia Value Added Tax (VAT) Number narrow breadth” on page 1310.

Latvia Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 13-character alphanumeric pattern beginning with LV that matches
the Latvia VAT Number format without checksum validation. It checks for common test patterns.

Table 45-636 Latvia Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Ll][Vv]\d{11}

[Ll][Vv] \d{11}

Table 45-637 Latvia Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000000, 11111111111, 22222222222,

33333333333, 44444444444, 55555555555,
66666666666, 77777777777, 88888888888, 99999999999
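
The two wide-breadth patterns differ only in the optional space after LV. The Python sketch below merges them into a single expression with " ?" and applies the exclusion list to the end of each match. As in the other sketches, Number delimiter is approximated by rejecting an adjacent ASCII letter or digit. This approximation and the function name wide_matches are assumptions.

```python
import re

# Tables 45-636 and 45-637: "LV" (any case), an optional space, then 11 digits.
# Assumption: Number delimiter means no ASCII letter or digit directly adjacent.
VAT_PATTERN = re.compile(r"(?<![A-Za-z0-9])[Ll][Vv] ?\d{11}(?![A-Za-z0-9])")
EXCLUDED_ENDINGS = tuple(str(d) * 11 for d in range(10))


def wide_matches(text: str) -> list[str]:
    """Return candidate Latvian VAT numbers that pass the wide-breadth checks."""
    return [
        m.group(0)
        for m in VAT_PATTERN.finditer(text)
        if not m.group(0).endswith(EXCLUDED_ENDINGS)
    ]


print(wide_matches("VAT LV12345678901; test lv 11111111111"))
# ['LV12345678901']
```

Because the exclusion is tested against the end of the match, lv 11111111111 is rejected whether or not a space follows the prefix.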

Latvia Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 13-character alphanumeric pattern beginning with LV that
matches the Latvia VAT Number format with checksum validation.

Table 45-638 Latvia Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Ll][Vv]\d{11}

[Ll][Vv] \d{11}

Table 45-639 Latvia Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Latvia Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.

Latvia Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 13-character alphanumeric pattern beginning with LV that matches
the Latvia VAT Number format with checksum validation. It checks for common test patterns,
and also requires the presence of related keywords.

Table 45-640 Latvia Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Ll][Vv]\d{11}

[Ll][Vv] \d{11}

Table 45-641 Latvia Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000000, 11111111111, 22222222222,

33333333333, 44444444444, 55555555555,
66666666666, 77777777777, 88888888888, 99999999999

Latvia Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat, vat number, value added tax, value added tax number,
vat identification number, vat#, vat no, VAT, VAT#, vatin, VATIN

PVN Nr, PVN maksātāja numurs, PVN numurs, Vat Nr,

PVN#, pievienotās vērtības nodoklis, pievienotās
vērtības nodokļa numurs

Liechtenstein Passport Number

Liechtenstein passports are issued to nationals of Liechtenstein for the purpose of international
travel. The passport may also serve as proof of Liechtensteiner citizenship.
The Liechtenstein Passport Number data identifier detects a six-character alphanumeric pattern
that matches the Liechtenstein Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a six-character alphanumeric pattern that matches the
Liechtenstein Passport Number format. It checks for common test patterns.
See “Liechtenstein Passport Number wide breadth” on page 1311.
■ The narrow breadth detects a six-character alphanumeric pattern that matches the
Liechtenstein Passport Number format. It checks for common test patterns, and also requires
the presence of related keywords.
See “Liechtenstein Passport Number narrow breadth” on page 1312.

Liechtenstein Passport Number wide breadth

The wide breadth detects a six-character alphanumeric pattern that matches the Liechtenstein
Passport Number format. It checks for common test patterns.

Table 45-642 Liechtenstein Passport Number wide-breadth patterns

Pattern

[a-zA-Z]\d\d\d\d\d

Table 45-643 Liechtenstein Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000, 11111, 22222, 33333, 44444, 55555, 66666,

77777, 88888, 99999

Liechtenstein Passport Number narrow breadth

The narrow breadth detects a six-character alphanumeric pattern that matches the Liechtenstein
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.

Table 45-644 Liechtenstein Passport Number narrow-breadth patterns

Pattern

[a-zA-Z]\d\d\d\d\d

Table 45-645 Liechtenstein Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000, 11111, 22222, 33333, 44444, 55555, 66666,

77777, 88888, 99999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no, passportno,

passport no., passport#, passportno#, Reisepass, Pass
Nr, Pass Nr., Reisepass#, Pass Nr#

Lithuania Personal Identification Number

In Lithuania, the personal identification code is a number based on the sex and birth date of
a person. This code is used as a unique personal identifier by governmental and other systems
where identification is required, as well as for digital signatures using the national identity card
and its associated certificates.
The Lithuania Personal Identification Number data identifier detects an 11-digit number that
matches the Lithuania Personal Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number that matches the Lithuania Personal
Identification Number format without checksum validation. It checks for common test
numbers.
See “Lithuania Personal Identification Number wide breadth” on page 1313.
■ The medium breadth detects an 11-digit number that matches the Lithuania Personal
Identification Number format with checksum validation.
See “Lithuania Personal Identification Number medium breadth” on page 1314.
■ The narrow breadth detects an 11-digit number that matches the Lithuania Personal
Identification Number format with checksum validation. It checks for common test numbers,
and also requires the presence of related keywords.
See “Lithuania Personal Identification Number narrow breadth” on page 1314.

Lithuania Personal Identification Number wide breadth

The wide breadth detects an 11-digit number that matches the Lithuania Personal Identification
Number format without checksum validation. It checks for common test numbers.

Table 45-646 Lithuania Personal Identification Number wide-breadth patterns

Pattern

\d{3}[01]\d[0123]\d{5}

\d \d{2}[01]\d[0123]\d \d{4}

\d{3}[01]\d[0123]\d{4} \d

Table 45-647 Lithuania Personal Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Lithuania Personal Identification Number medium breadth

The medium breadth detects an 11-digit number that matches the Lithuania Personal
Identification Number format with checksum validation.

Table 45-648 Lithuania Personal Identification Number medium-breadth patterns

Pattern

\d{3}[01]\d[0123]\d{5}

\d \d{2}[01]\d[0123]\d \d{4}

\d{3}[01]\d[0123]\d{4} \d

Table 45-649 Lithuania Personal Identification Number medium-breadth validators

Mandatory validator Description

Estonia Personal Identification Number Check Computes the checksum and validates the pattern against
it.
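
This validator carries the Estonia name. Estonian and Lithuanian personal codes use the same check-digit scheme, which would explain why the same check is reused for Lithuania. The Python sketch below implements that published scheme and assumes that the product validator follows it. The first ten digits are weighted 1 through 9 and then 1, and the sum is taken modulo 11. If the remainder is 10, the sum is recomputed with the weights 3 through 9 and then 1, 2, 3. If the remainder is 10 again, the check digit is 0. The function name personal_code_check is hypothetical.

```python
# Weights of the shared Estonian/Lithuanian personal code check-digit scheme.
FIRST_WEIGHTS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
SECOND_WEIGHTS = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)


def personal_code_check(candidate: str) -> bool:
    """Validate the 11th digit of a personal code against its check digit."""
    digits = candidate.replace(" ", "")
    if len(digits) != 11 or not digits.isdecimal():
        return False
    values = [int(d) for d in digits]
    # First pass: weights over the first ten digits, modulo 11.
    remainder = sum(v * w for v, w in zip(values, FIRST_WEIGHTS)) % 11
    if remainder == 10:
        # Second pass with shifted weights; a second 10 means check digit 0.
        remainder = sum(v * w for v, w in zip(values, SECOND_WEIGHTS)) % 11
        if remainder == 10:
            remainder = 0
    return remainder == values[10]
```

For example, 33309240064 exercises the second pass. Its first weighted sum is 109, which leaves remainder 10. The second sum, 169, leaves remainder 4, which matches the last digit.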

Lithuania Personal Identification Number narrow breadth

The narrow breadth detects an 11-digit number that matches the Lithuania Personal
Identification Number format with checksum validation. It checks for common test numbers,
and also requires the presence of related keywords.

Table 45-650 Lithuania Personal Identification Number narrow-breadth patterns

Pattern

\d{3}[01]\d[0123]\d{5}

\d \d{2}[01]\d[0123]\d \d{4}

\d{3}[01]\d[0123]\d{4} \d

Table 45-651 Lithuania Personal Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Estonia Personal Identification Number Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

national ID, national identification number, personal ID,
personal identification number, nationalid#, personalid#,
personal identification code, PID#

Nacionalinis ID, Nacionalinis identifikavimo numeris,

asmens kodas

Lithuania Tax Identification Number

The Lithuanian Taxpayer Identification Number is used to identify taxpayers and facilitate the
administration of their national tax affairs.
The Lithuania Tax Identification Number data identifier detects an 11-digit number that matches
the Lithuania Tax Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number that matches the Lithuania Tax Identification
Number format without checksum validation. It checks for common test numbers.
See “Lithuania Tax Identification Number wide breadth” on page 1315.
■ The medium breadth detects an 11-digit number that matches the Lithuania Tax
Identification Number format with checksum validation.
See “Lithuania Tax Identification Number medium breadth” on page 1316.
■ The narrow breadth detects an 11-digit number that matches the Lithuania Tax Identification
Number format with checksum validation. It checks for common test numbers, and also
requires the presence of related keywords.
See “Lithuania Tax Identification Number narrow breadth” on page 1316.

Lithuania Tax Identification Number wide breadth

The wide breadth detects an 11-digit number that matches the Lithuania Tax Identification
Number format without checksum validation. It checks for common test numbers.

Table 45-652 Lithuania Tax Identification Number wide breadth pattern

Pattern

[1-6]\d{2}[01]\d[0123]\d{5}

Table 45-653 Lithuania Tax Identification Number wide breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.
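The wide-breadth pattern and its two validators can be approximated in Python as follows. This is a sketch, not the product's implementation: the guide does not say exactly which surrounding characters the Number delimiter validator accepts, so the code assumes that a match must not touch another digit on either side. The names `LT_TIN_RE`, `is_duplicate_digits`, and `find_lt_tin_wide` are hypothetical.

```python
import re

# Wide-breadth pattern copied from Table 45-652 (11 digits).
LT_TIN_PATTERN = r"[1-6]\d{2}[01]\d[0123]\d{5}"

# Assumption: the "Number delimiter" validator is approximated by lookarounds
# that reject a match directly preceded or followed by another digit.
LT_TIN_RE = re.compile(r"(?<!\d)" + LT_TIN_PATTERN + r"(?!\d)")


def is_duplicate_digits(candidate: str) -> bool:
    """True if all digits in the candidate are the same, e.g. 11111111111."""
    digits = {c for c in candidate if c.isdigit()}
    return len(digits) <= 1


def find_lt_tin_wide(text: str) -> list[str]:
    """Hypothetical helper: pattern match, delimiter check, duplicate-digit check."""
    return [
        m.group()
        for m in LT_TIN_RE.finditer(text)
        if not is_duplicate_digits(m.group())
    ]


print(find_lt_tin_wide("ID 33309240064, ref 123330924006400, test 11111111111"))
# ['33309240064']
```

The 15-digit reference number yields no match because every 11-digit run inside it touches another digit, and `11111111111` fits the pattern but is dropped by the duplicate-digit test.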

Lithuania Tax Identification Number medium breadth

The medium breadth detects an 11-digit number that matches the Lithuania Tax Identification
Number format with checksum validation.

Table 45-654 Lithuania Tax Identification Number medium breadth pattern

Pattern

[1-6]\d{2}[01]\d[0123]\d{5}

Table 45-655 Lithuania Tax Identification Number medium breadth validator

Mandatory validator Description

Lithuania Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.
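The guide does not document the checksum that the Lithuania Tax Identification Number Validation Check computes. The sketch below assumes the publicly described mod-11 check digit of the Lithuanian personal code (asmens kodas), which has the same 11-digit layout as the pattern above: the first ten digits are weighted 1 to 9 and then 1, and if the remainder modulo 11 is 10, a second pass uses weights 3 to 9 and then 1 to 3. The helper names are hypothetical.

```python
import re

LT_TIN_RE = re.compile(r"[1-6]\d{2}[01]\d[0123]\d{5}")


def lt_check_digit(first_ten: str) -> int:
    """Assumed mod-11 check digit of the Lithuanian personal code (asmens kodas)."""
    digits = [int(c) for c in first_ten]
    # First pass: weights 1, 2, ..., 9, 1.
    remainder = sum((1 + i % 9) * d for i, d in enumerate(digits)) % 11
    if remainder == 10:
        # Second pass: weights 3, 4, ..., 9, 1, 2, 3.
        remainder = sum((1 + (i + 2) % 9) * d for i, d in enumerate(digits)) % 11
        if remainder == 10:
            remainder = 0
    return remainder


def is_valid_lt_tin(candidate: str) -> bool:
    """Hypothetical medium-breadth check: pattern match plus check digit."""
    if not LT_TIN_RE.fullmatch(candidate):
        return False
    return lt_check_digit(candidate[:10]) == int(candidate[10])


print(is_valid_lt_tin("33309240064"))  # True (second pass applies)
print(is_valid_lt_tin("33309240065"))  # False (wrong check digit)
```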

Lithuania Tax Identification Number narrow breadth

The narrow breadth detects an 11-digit number that matches the Lithuania Tax Identification
Number format with checksum validation. It checks for common test numbers, and also requires
the presence of related keywords.

Table 45-656 Lithuania Tax Identification Number narrow breadth pattern

Pattern

[1-6]\d{2}[01]\d[0123]\d{5}

Table 45-657 Lithuania Tax Identification Number narrow breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Lithuania Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

tax identification no., TIN, tin, TIN#, tin#, tin no., tax
identification number, tin no, tax id, tax id no, tax id
no., taxid, taxid#, tax number, tax no, tax#, Tax
Identification Number

mokesčių identifikavimo Nr., mokesčių identifikavimo
numeris, mokesčių ID, mokesčių id nr, mokesčių id
nr., mokesčių ID#, mokesčių numeris, mokestis Nr,
mokestis #, Mokesčių identifikavimo numeris

Lithuania Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Lithuania, VAT is
administered by the State Tax Inspectorate.
The Lithuania Value Added Tax (VAT) Number data identifier detects an 11- or 14-character
alphanumeric pattern beginning with LT.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11- or 14-character alphanumeric pattern beginning with LT
without checksum validation. It checks for common test patterns.
See “Lithuania Value Added Tax (VAT) Number wide breadth” on page 1318.
■ The medium breadth detects an 11- or 14-character alphanumeric pattern beginning with LT
with checksum validation.
See “Lithuania Value Added Tax (VAT) Number medium breadth” on page 1318.
■ The narrow breadth detects an 11- or 14-character alphanumeric pattern beginning with
LT with checksum validation. It checks for common test patterns, and also requires the
presence of related keywords.
See “Lithuania Value Added Tax (VAT) Number narrow breadth” on page 1319.

Lithuania Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11- or 14-character alphanumeric pattern beginning with LT
without checksum validation. It checks for common test patterns.

Table 45-658 Lithuania Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Ll][Tt]\d{7}[1]\d

[Ll][Tt] \d{7}[1]\d

[Ll][Tt]\d{10}[1]\d

[Ll][Tt] \d{10}[1]\d

Table 45-659 Lithuania Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000010, 111111111, 222222212, 333333313,
444444414, 555555515, 666666616, 777777717,
888888818, 999999919, 000000000000, 111111111111,
222222222212, 333333333313, 444444444414,
555555555515, 666666666616, 777777777717,
888888888818, 999999999919

Lithuania Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11- or 14-character alphanumeric pattern beginning with LT
with checksum validation.

Table 45-660 Lithuania Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Ll][Tt]\d{7}[1]\d

[Ll][Tt] \d{7}[1]\d

[Ll][Tt]\d{10}[1]\d

[Ll][Tt] \d{10}[1]\d

Table 45-661 Lithuania Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Lithuania Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.
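The checksum behind the Lithuania Value Added Tax (VAT) Number Validation Check is not described in this guide. The following sketch assumes the commonly published PVM scheme: the digits before the last are weighted 1, 2, ..., 9, 1, 2, the sum is taken modulo 11, a remainder of 10 triggers a second pass with the weights shifted by two, and a final remainder of 10 becomes 0. All names are hypothetical.

```python
import re

# Patterns from Table 45-660: "LT", an optional space, then 9 or 12 digits
# whose second-to-last digit is 1.
LT_VAT_RE = re.compile(r"[Ll][Tt] ?(\d{7}1\d|\d{10}1\d)")


def lt_vat_check_digit(body: str) -> int:
    """Assumed PVM check digit computed from the digits before the last one."""
    digits = [int(c) for c in body]
    # First pass: weights 1, 2, ..., 9, 1, 2, truncated to the number of digits.
    remainder = sum((1 + i % 9) * d for i, d in enumerate(digits)) % 11
    if remainder == 10:
        # Second pass: weights shifted by two (3, 4, ..., 9, 1, 2, ...).
        remainder = sum((1 + (i + 2) % 9) * d for i, d in enumerate(digits)) % 11
    return remainder % 10  # a remainder of 10 becomes 0


def is_valid_lt_vat(candidate: str) -> bool:
    """Hypothetical medium-breadth check: pattern match plus check digit."""
    match = LT_VAT_RE.fullmatch(candidate)
    if not match:
        return False
    number = match.group(1)
    return lt_vat_check_digit(number[:-1]) == int(number[-1])


print(is_valid_lt_vat("LT 119511515"))    # True
print(is_valid_lt_vat("LT100001919017"))  # True
print(is_valid_lt_vat("LT100001919018"))  # False
```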

Lithuania Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11- or 14-character alphanumeric pattern beginning with LT
with checksum validation. It checks for common test patterns, and also requires the presence
of related keywords.

Table 45-662 Lithuania Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Ll][Tt]\d{7}[1]\d

[Ll][Tt] \d{7}[1]\d

[Ll][Tt]\d{10}[1]\d

[Ll][Tt] \d{10}[1]\d

Table 45-663 Lithuania Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000010, 111111111, 222222212, 333333313,

Lithuania Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat, vat number, vat#, value added tax number, VAT,
VAT#

pridėtinės vertės mokesčio numeris, PVM, PVM#,
pridėtinės vertės mokestis, PVM numeris, PVM
registracijos numeris

Luxembourg National Register of Individuals Number

The Luxembourg National Register of Individuals Number is an 11-digit identification number
issued to all Luxembourg citizens at age 15.
The Luxembourg National Register of Individuals Number data identifier detects an 11-digit
number that matches the Luxembourg National Register of Individuals Number format.
The Luxembourg National Register of Individuals Number system data identifier provides three
breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Luxembourg National Register of Individuals Number wide breadth” on page 1320.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Luxembourg National Register of Individuals Number medium breadth” on page 1321.
■ The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Luxembourg National Register of Individuals Number narrow breadth” on page 1321.

Luxembourg National Register of Individuals Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-664 Luxembourg National Register of Individuals Number wide-breadth pattern

Pattern

\d{11}

Table 45-665 Luxembourg National Register of Individuals Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Luxembourg National Register of Individuals Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-666 Luxembourg National Register of Individuals Number medium breadth patterns

Pattern

\d{11}

Table 45-667 Luxembourg National Register of Individuals Number medium breadth validator

Mandatory validator Description

Luxembourg National Register of Individuals Number Validation Check Computes the checksum and validates the pattern against it.

Number delimiter Validates a match by checking the surrounding characters.

Luxembourg National Register of Individuals Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.

Table 45-668 Luxembourg National Register of Individuals Number narrow breadth patterns

Pattern

\d{11}

Table 45-669 Luxembourg National Register of Individuals Number narrow breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Luxembourg National Register of Individuals Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Personal ID, personal ID number, personalidno#,
unique ID number, personalidnumber#, unique ID key,
Personal ID Code, uniqueidkey#, individual code,
individual ID

Eindeutige ID-Nummer, Eindeutige ID, ID personnelle,
Numéro d'identification personnel, IDpersonnelle#,
Persönliche Identifikationsnummer, EindeutigeID#

Luxembourg Passport Number

A Luxembourg passport is an international travel document issued to nationals of the Grand
Duchy of Luxembourg. It may also serve as proof of Luxembourgish citizenship.
The Luxembourg Passport Number data identifier detects a seven- or eight-character
alphanumeric pattern that matches the Luxembourg Passport Number format.
The Luxembourg Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a seven- or eight-character alphanumeric pattern without
checksum validation.
See “Luxembourg Passport Number wide breadth” on page 1322.
■ The narrow breadth detects a seven- or eight-character alphanumeric pattern without
checksum validation. It requires the presence of related keywords.
See “Luxembourg Passport Number narrow breadth” on page 1323.

Luxembourg Passport Number wide breadth

The wide breadth detects a seven- or eight-character alphanumeric pattern without checksum
validation.

Table 45-670 Luxembourg Passport Number wide-breadth patterns

Patterns

\l\w{5}[0-9]

\l\w{5}[0-9][0-9A-Za-z]

Table 45-671 Luxembourg Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
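The `\l` token in these patterns is not standard regular-expression syntax; Python's `re` module, for example, rejects it as a bad escape. The sketch below assumes `\l` stands for a single letter, folds the seven- and eight-character patterns into one expression with an optional final character, and approximates the Number delimiter validator by requiring that no letter or digit touches the match. The helper name and sample values are hypothetical.

```python
import re

# Assumptions: "\l" in the product syntax means one letter, and the
# "Number delimiter" validator is approximated by requiring no letter or
# digit directly before or after the match.
LU_PASSPORT_RE = re.compile(
    r"(?<![A-Za-z0-9])[A-Za-z]\w{5}[0-9][0-9A-Za-z]?(?![A-Za-z0-9])",
    re.ASCII,  # restrict \w to [A-Za-z0-9_]
)


def find_lu_passports(text: str) -> list[str]:
    """Hypothetical helper covering both the 7- and 8-character patterns."""
    return LU_PASSPORT_RE.findall(text)


print(find_lu_passports("Passport no JR5T8K4, old AB12345C"))
# ['JR5T8K4', 'AB12345C']
```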

Luxembourg Passport Number narrow breadth

The narrow breadth detects a seven- or eight-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.

Table 45-672 Luxembourg Passport Number narrow-breadth patterns

Patterns

\l\w{5}[0-9]

\l\w{5}[0-9][0-9A-Za-z]

Table 45-673 Luxembourg Passport Number narrow-breadth validators

Mandatory validators Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

passport number, passport, passport no, luxembourg
pass, luxembourg passeport, luxembourg passport

passnummer, ausweisnummer, passeport, reisepass,
pass, pass net, pass nr, no de passeport, passeport
nombre, numéro de passeport

Number delimiter Validates a match by checking the surrounding characters.

Luxembourg Tax Identification Number

This number is issued by the Luxembourg direct tax administration (Administration des
contributions directes, ACD) and is used for the tax affairs of natural persons and legal
entities.
The Luxembourg Tax Identification Number data identifier detects an 11- or 13-digit number
that matches the Luxembourg Tax Identification Number format.
The Luxembourg Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects an 11- or 13-digit number without checksum validation.
See “Luxembourg Tax Identification Number wide breadth” on page 1324.
■ The medium breadth detects an 11- or 13-digit number with checksum validation.
See “Luxembourg Tax Identification Number medium breadth” on page 1325.
■ The narrow breadth detects an 11- or 13-digit number with checksum validation. It also
requires the presence of related keywords.
See “Luxembourg Tax Identification Number narrow breadth” on page 1326.

Luxembourg Tax Identification Number wide breadth

The wide breadth detects an 11- or 13-digit number without checksum validation.

Table 45-674 Luxembourg Tax Identification Number wide-breadth patterns

Patterns

[1][89]\d{2}[01]\d[0123]\d\d{5}

[1][89]\d{2}[01]\d[0123]\d \d{5}

[1][89]\d{2}[01]\d[0123]\d-\d{5}

[1][89]\d{2}[01]\d[0123]\d,\d{5}

[1][89]\d{2}[01]\d[0123]\d.\d{5}

[2][0]\d{2}[01]\d[0123]\d\d{5}

[2][0]\d{2}[01]\d[0123]\d \d{5}

[2][0]\d{2}[01]\d[0123]\d-\d{5}

[2][0]\d{2}[01]\d[0123]\d,\d{5}

[2][0]\d{2}[01]\d[0123]\d.\d{5}

\d{11}

\d{2} \d{3} \d{3} \d{3}

\d{2}-\d{3}-\d{3}-\d{3}

\d{2}.\d{3}.\d{3}.\d{3}

\d{2},\d{3},\d{3},\d{3}

Table 45-675 Luxembourg Tax Identification Number wide-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.
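In most regular-expression dialects, including Python's, an unescaped `.` matches any character, so the period-separated patterns above would also match strings such as `19800101x12345` if read as standard regex. The sketch below assumes the period is meant as a literal separator, combines the 13-digit and 11-digit forms, applies the Duplicate digits validator to the digits only, and omits the Number delimiter validator. The helper name and sample values are hypothetical.

```python
import re

# 13-digit forms: YYYYMMDD (years 18xx to 20xx), an optional separator, 5 digits.
# Assumption: "." in the tables is a literal period, so it is placed inside a
# character class, where "." matches only a period.
LU_TIN_13_RE = re.compile(r"(?:1[89]|20)\d{2}[01]\d[0123]\d[ \-,.]?\d{5}")

# 11-digit forms: \d{11}, or 2-3-3-3 groups using one separator throughout.
LU_TIN_11_RE = re.compile(r"\d{11}|\d{2}([ \-,.])\d{3}\1\d{3}\1\d{3}")


def is_wide_lu_tin(candidate: str) -> bool:
    """Hypothetical wide-breadth check: pattern match plus duplicate-digit test."""
    if not (LU_TIN_13_RE.fullmatch(candidate) or LU_TIN_11_RE.fullmatch(candidate)):
        return False
    digits = re.sub(r"\D", "", candidate)  # ignore separators
    return len(set(digits)) > 1


print(is_wide_lu_tin("19800101.12345"))  # True
print(is_wide_lu_tin("19800101x12345"))  # False
```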

Luxembourg Tax Identification Number medium breadth

The medium breadth detects an 11- or 13-digit number with checksum validation.

Table 45-676 Luxembourg Tax Identification Number medium-breadth patterns

Patterns

[1][89]\d{2}[01]\d[0123]\d\d{5}

[1][89]\d{2}[01]\d[0123]\d \d{5}

[1][89]\d{2}[01]\d[0123]\d-\d{5}

[1][89]\d{2}[01]\d[0123]\d,\d{5}

[1][89]\d{2}[01]\d[0123]\d.\d{5}

[2][0]\d{2}[01]\d[0123]\d\d{5}

[2][0]\d{2}[01]\d[0123]\d \d{5}

[2][0]\d{2}[01]\d[0123]\d-\d{5}

[2][0]\d{2}[01]\d[0123]\d,\d{5}

[2][0]\d{2}[01]\d[0123]\d.\d{5}

\d{11}

\d{2} \d{3} \d{3} \d{3}

\d{2}-\d{3}-\d{3}-\d{3}

\d{2}.\d{3}.\d{3}.\d{3}

\d{2},\d{3},\d{3},\d{3}

Table 45-677 Luxembourg Tax Identification Number medium-breadth validator

Mandatory validator Description

Luxembourg Tax Identification Number Validation Check Computes the checksum and validates the pattern against it.

Luxembourg Tax Identification Number narrow breadth

The narrow breadth detects an 11- or 13-digit number with checksum validation. It also requires
the presence of related keywords.

Table 45-678 Luxembourg Tax Identification Number narrow-breadth patterns

Patterns

[1][89]\d{2}[01]\d[0123]\d\d{5}

[1][89]\d{2}[01]\d[0123]\d \d{5}

[1][89]\d{2}[01]\d[0123]\d-\d{5}

[1][89]\d{2}[01]\d[0123]\d,\d{5}

[1][89]\d{2}[01]\d[0123]\d.\d{5}

[2][0]\d{2}[01]\d[0123]\d\d{5}

[2][0]\d{2}[01]\d[0123]\d \d{5}

[2][0]\d{2}[01]\d[0123]\d-\d{5}

[2][0]\d{2}[01]\d[0123]\d,\d{5}

[2][0]\d{2}[01]\d[0123]\d.\d{5}

\d{11}

\d{2} \d{3} \d{3} \d{3}

\d{2}-\d{3}-\d{3}-\d{3}

\d{2}.\d{3}.\d{3}.\d{3}

\d{2},\d{3},\d{3},\d{3}

Table 45-679 Luxembourg Tax Identification Number narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Luxembourg Tax Identification Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

social security, tin, tin number, tin no, tin#, luxembourg
tax identification number, tax number, tax id

Zinn, Zinn Nummer, Luxembourg Tax
Identifikatiounsnummer, Steier Nummer, Steier ID,
Sozialversicherungsausweis, Zinnzahl, Zinn nein,
Zinn#, luxemburgische steueridentifikationsnummer,
Steuernummer, Steuer ID

sécurité sociale, carte de sécurité sociale, étain,
numéro d'étain, étain non, étain#, Numéro
d'identification fiscal luxembourgeois, numéro
d'identification fiscale, identifiant d'impôt,
Sozialunterstützung, Sozialversécherung

Number delimiter Validates a match by checking the surrounding characters.

Luxembourg Value Added Tax (VAT) Number

VAT is a consumption tax that is borne by the end consumer. VAT is paid for each transaction
in the manufacturing and distribution process.
The Luxembourg Value Added Tax (VAT) Number data identifier detects an eight-character
alphanumeric pattern that matches the Luxembourg Value Added Tax (VAT) Number format.
The Luxembourg Value Added Tax (VAT) Number data identifier provides three breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern beginning with LU without
checksum validation.
See “Luxembourg Value Added Tax (VAT) Number wide breadth” on page 1328.
■ The medium breadth detects an eight-character alphanumeric pattern beginning with LU
with checksum validation.
See “Luxembourg Value Added Tax (VAT) Number medium breadth” on page 1329.
■ The narrow breadth detects an eight-character alphanumeric pattern beginning with LU
with checksum validation. It also requires the presence of related keywords.
See “Luxembourg Value Added Tax (VAT) Number narrow breadth” on page 1329.

Luxembourg Value Added Tax (VAT) Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern beginning with LU without
checksum validation.

Table 45-680 Luxembourg Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Lu][Uu]\d{8}

[Lu][Uu] \d{8}

[Lu][Uu]-\d{8}

[Lu][Uu] \d{3} \d{3} \d{2}

[Lu][Uu] \d{4} \d{4}

[Lu][Uu] \d{4}-\d{4}

[Lu][Uu] \d{4}.\d{4}

[Lu][Uu] \d{4},\d{4}

Table 45-681 Luxembourg Value Added Tax (VAT) Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999

Luxembourg Value Added Tax (VAT) Number medium breadth

The medium breadth detects an eight-character alphanumeric pattern beginning with LU with
checksum validation.

Table 45-682 Luxembourg Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Lu][Uu]\d{8}

[Lu][Uu] \d{8}

[Lu][Uu]-\d{8}

[Lu][Uu] \d{3} \d{3} \d{2}

[Lu][Uu] \d{4} \d{4}

[Lu][Uu] \d{4}-\d{4}

[Lu][Uu] \d{4}.\d{4}

[Lu][Uu] \d{4},\d{4}

Table 45-683 Luxembourg Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Luxembourg VAT Number Validation Check Computes the checksum and validates the pattern against
it.
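The Luxembourg VAT check is commonly published as a mod-89 test: the first six digits modulo 89 must equal the last two digits. The sketch below assumes that rule, which this guide does not state. It also writes the prefix as `[Ll][Uu]`, on the assumption that `[Lu]` in the tables is meant to accept both cases; read as standard regex, `[Lu]` matches only `L` or `u`. The helper name is hypothetical.

```python
import re

# Assumption: prefix written as [Ll][Uu]; the separators follow Table 45-682.
LU_VAT_RE = re.compile(
    r"[Ll][Uu](?:\d{8}| \d{8}|-\d{8}| \d{3} \d{3} \d{2}| \d{4}[ \-.,]\d{4})"
)


def is_valid_lu_vat(candidate: str) -> bool:
    """Hypothetical medium-breadth check: pattern match plus mod-89 check."""
    if not LU_VAT_RE.fullmatch(candidate):
        return False
    digits = re.sub(r"\D", "", candidate)  # drop the prefix and separators
    # Assumed check: the first six digits modulo 89 equal the last two digits.
    return int(digits[:6]) % 89 == int(digits[6:])


print(is_valid_lu_vat("LU 150 274 42"))  # True
print(is_valid_lu_vat("LU15027443"))     # False
```

Note that an all-zero body such as `LU00000000` passes a pure mod-89 test, which is one reason the wide and narrow breadths also apply the Exclude ending characters list.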

Luxembourg Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern beginning with LU with
checksum validation. It also requires the presence of related keywords.

Table 45-684 Luxembourg Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Lu][Uu]\d{8}

[Lu][Uu] \d{8}

[Lu][Uu]-\d{8}

[Lu][Uu] \d{3} \d{3} \d{2}

[Lu][Uu] \d{4} \d{4}

[Lu][Uu] \d{4}-\d{4}

[Lu][Uu] \d{4}.\d{4}

[Lu][Uu] \d{4},\d{4}

Table 45-685 Luxembourg Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Luxembourg VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:
00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

luxembourg vat number, Luxembourg vat no, vat
number, vat no, vat, VAT#, value added tax number,
vat id, vat registration number, value added tax

TVA kee, TVA#, TVA Aschreiwung kee, T.V.A,
stammnummer, bleiwen, geheescht, gitt id,
mehrwertsteuer, vat registrierungsnummer,
umsatzsteuer-id, wat, umsatzsteuernummer,
umsatzsteuer-identifikationsnummer

id de la batterie, lëtzebuerg vat nee, registréierung
nummer, numéro de TVA, numéro de enregistrement
vat

Macau National Identification Number

The Macau resident identification card is for permanent and non-permanent residents of Macau.
The identity card is issued by the Identification Services Directorate of Macau.
The Macau National Identification Number data identifier detects an eight-digit number that
matches the Macau National Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-digit number that matches the Macau National
Identification Number format. It checks for common test numbers.
See “Macau National Identification Number wide breadth” on page 1331.
■ The narrow breadth detects an eight-digit number that matches the Macau National
Identification Number format. It checks for common test numbers, and also requires the
presence of related keywords.
See “Macau National Identification Number narrow breadth” on page 1332.

Macau National Identification Number wide breadth

The wide breadth detects an eight-digit number that matches the Macau National Identification
Number format. It checks for common test numbers.

Table 45-686 Macau National Identification Number wide-breadth patterns

Pattern

1\d\d\d\d\d\d(\d)

5\d\d\d\d\d\d(\d)

7\d\d\d\d\d\d(\d)

Table 45-687 Macau National Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.
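The three Macau patterns differ only in their first digit, so they can be folded into one character class. In the sketch below, `(\d)` is kept as a capture group around the final digit, the Number delimiter validator is approximated by rejecting matches that touch another digit, and the Duplicate digits validator drops values such as `77777777`. This is an illustration under those assumptions; the helper name and sample values are hypothetical.

```python
import re

# Patterns 1..., 5..., 7... from Table 45-686 combined into [157].
# Assumption: "Number delimiter" approximated by lookarounds on digits.
MO_ID_RE = re.compile(r"(?<!\d)[157]\d{6}(\d)(?!\d)")


def find_macau_ids(text: str) -> list[str]:
    """Hypothetical wide-breadth helper with the duplicate-digit test."""
    return [
        m.group()
        for m in MO_ID_RE.finditer(text)
        if len(set(m.group())) > 1  # drop 11111111, 55555555, 77777777
    ]


print(find_macau_ids("BIR 51234567, 77777777, 212345678"))  # ['51234567']
```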

Macau National Identification Number narrow breadth

The narrow breadth detects an eight-digit number that matches the Macau National Identification
Number format. It checks for common test numbers, and also requires the presence of related
keywords.

Table 45-688 Macau National Identification Number narrow-breadth patterns

Pattern

1\d\d\d\d\d\d(\d)

5\d\d\d\d\d\d(\d)

7\d\d\d\d\d\d(\d)

Table 45-689 Macau National Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

identification number, id number, identity card number,
identity card no, national identity card number, national
identity card no, national identification number,
personal identification number, personal ID no, unique
identification number, unique id no, nationalid#,
perosonalid#, uniqueid#

身份证号码, 唯一的识别号码

número de identificação, número cartão identidade,
número cartão identidade nacional, número
identificação pessoal, número identificação único, id
único não, ID único#

Malaysia Passport Number

The Malaysian passport is issued to citizens of Malaysia by the Immigration Department of
Malaysia.
The Malaysia Passport Number data identifier detects a nine-character alphanumeric pattern
that matches the Malaysia Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern that matches the Malaysia
Passport Number format. It checks for common test patterns.
See “Malaysia Passport Number wide breadth” on page 1333.
■ The narrow breadth detects a nine-character alphanumeric pattern that matches the
Malaysia Passport Number format. It checks for common test patterns, and also requires
the presence of related keywords.
See “Malaysia Passport Number narrow breadth” on page 1334.

Malaysia Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern that matches the Malaysia
Passport Number format. It checks for common test patterns.

Table 45-690 Malaysia Passport Number wide-breadth patterns

Pattern

[AaHhKk]\d\d\d\d\d\d\d\d

Table 45-691 Malaysia Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999
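Because the pattern is one letter followed by eight digits, the Exclude ending characters list above removes exactly the candidates whose eight digits are all the same. A sketch of the wide breadth under that reading, with the Number delimiter validator approximated by lookarounds (an assumption) and a hypothetical helper name:

```python
import re

# Assumption: "Number delimiter" approximated by requiring no letter or digit
# directly before or after the match.
MY_PASSPORT_RE = re.compile(r"(?<![A-Za-z0-9])[AaHhKk]\d{8}(?![A-Za-z0-9])")

# "Exclude ending characters" values from Table 45-691: eight repeated digits.
EXCLUDED_ENDINGS = tuple(str(d) * 8 for d in range(10))


def find_my_passports(text: str) -> list[str]:
    """Hypothetical wide-breadth helper."""
    return [
        m.group()
        for m in MY_PASSPORT_RE.finditer(text)
        if not m.group().endswith(EXCLUDED_ENDINGS)
    ]


print(find_my_passports("A12345678, K00000000, H87654321"))
# ['A12345678', 'H87654321']
```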

Malaysia Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern that matches the Malaysia
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.

Table 45-692 Malaysia Passport Number narrow-breadth patterns

Pattern

[AaHhKk]\d\d\d\d\d\d\d\d

Table 45-693 Malaysia Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,
55555555, 66666666, 77777777, 88888888, 99999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no, passportno,
passport no., passport#, passportno#

pasport, nombor pasport, pasport#

Malaysian MyKad Number (MyKad)

The Malaysian National Registration Identity Card Number (NRIC No.) is a unique 12-digit
number issued to Malaysian citizens and permanent residents for identification, indexing, and
tracking purposes.
The Malaysian MyKad Number (MyKad) data identifier detects a 12-digit number that matches
the MyKad format.
The Malaysian MyKad Number (MyKad) system data identifier provides three breadths of
detection:
■ The wide breadth detects a 12-digit number without checksum validation.
See “Malaysian MyKad Number (MyKad) wide breadth” on page 1335.
■ The medium breadth detects a 12-digit number with checksum validation.
See “Malaysian MyKad Number (MyKad) medium breadth” on page 1336.
■ The narrow breadth detects a 12-digit number that passes checksum validation. It also
requires the presence of MyKad-related keywords.
See “Malaysian MyKad Number (MyKad) narrow breadth” on page 1336.

Malaysian MyKad Number (MyKad) wide breadth

The wide breadth detects a 12-digit number without checksum validation.

Table 45-694 Malaysian MyKad Number (MyKad) wide-breadth patterns

Patterns

\d{12}

\d{6}-\d{2}-\d{4}

Table 45-695 Malaysian MyKad Number (MyKad) wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
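A sketch of the MyKad wide breadth: either pattern must match the whole candidate, and the Duplicate digits validator is assumed to ignore the hyphens in the `\d{6}-\d{2}-\d{4}` form. The checksum used by the medium and narrow breadths is not documented in this guide, so it is not reproduced. The helper name and sample values are hypothetical.

```python
import re

# Wide-breadth patterns from Table 45-694.
MYKAD_RE = re.compile(r"\d{12}|\d{6}-\d{2}-\d{4}")


def is_wide_mykad(candidate: str) -> bool:
    """Hypothetical wide-breadth check: pattern match plus duplicate-digit test."""
    if not MYKAD_RE.fullmatch(candidate):
        return False
    digits = candidate.replace("-", "")  # assumption: hyphens are ignored
    return len(set(digits)) > 1  # reject e.g. 111111-11-1111


print(is_wide_mykad("850101-14-5123"))  # True
print(is_wide_mykad("111111-11-1111"))  # False
```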

Malaysian MyKad Number (MyKad) medium breadth

The medium breadth detects a 12-digit number with checksum validation.

Table 45-696 Malaysian MyKad Number (MyKad) medium-breadth patterns

Patterns

\d{12}

\d{6}-\d{2}-\d{4}

Table 45-697 Malaysian MyKad Number (MyKad) medium-breadth validators

Mandatory validators Description

Malaysian MyKad Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Malaysian MyKad Number (MyKad) narrow breadth

The narrow breadth detects a 12-digit number that passes checksum validation. It also requires
the presence of MyKad-related keywords.

Table 45-698 Malaysian MyKad Number (MyKad) narrow-breadth patterns

Patterns

\d{12}

\d{6}-\d{2}-\d{4}

Table 45-699 Malaysian MyKad Number (MyKad) narrow-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Malaysian MyKad Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

NRIC No, nricno#, MyKad Number, mykad no,
mykadnumber#, identity card no, MyKadno#, mykad,
mykad#, identity card number, nric no

nombor kad pengenalan, kad pengenalan no, kad
pengenalan Malaysia, bilangan identiti unik, nombor
peribadi, nomborperibadi#, kadpengenalanno#

Malta National Identification Number

Every resident of Malta is assigned a national number. National numbers for foreigners who are
authorized to reside in Malta end with the letter A. National numbers for Maltese citizens end
with M, G, L, H, or P.
The Malta National Identification Number data identifier detects an eight-character alphanumeric
pattern that matches the Malta National Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the Malta
National Identification Number format. It checks for common test patterns.
See “Malta National Identification Number wide breadth” on page 1338.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the
Malta National Identification Number format. It checks for common test patterns, and also
requires the presence of related keywords.
See “Malta National Identification Number narrow breadth” on page 1338.

Malta National Identification Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the Malta
National Identification Number format. It checks for common test patterns.

Table 45-700 Malta National Identification Number wide-breadth patterns

Pattern

\d{6}[1-9][APap]

[012]\d{6}[MGLHBZmglhbz]

[3][01]\d{5}[MGLHBZmglhbz]

32000\d{2}[MGLHBZmglhbz]

Table 45-701 Malta National Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999
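The wide-breadth logic above can be sketched in Python as a regular-expression scan followed by the exclude-beginning filter. This is an illustrative sketch, not the product's implementation: the function name find_malta_nin_wide is hypothetical, and the Number delimiter validator is approximated (an assumption) by requiring that no letter or digit touches either end of the match.

```python
import re

# Wide-breadth patterns copied from Table 45-700.
MALTA_NIN_PATTERNS = [
    r"\d{6}[1-9][APap]",
    r"[012]\d{6}[MGLHBZmglhbz]",
    r"[3][01]\d{5}[MGLHBZmglhbz]",
    r"32000\d{2}[MGLHBZmglhbz]",
]

# Assumed stand-in for "Number delimiter": no letter or digit may sit
# directly before or after the eight-character match.
MALTA_NIN_RE = re.compile(
    r"(?<![0-9A-Za-z])(?:" + "|".join(MALTA_NIN_PATTERNS) + r")(?![0-9A-Za-z])"
)

# "Exclude beginning characters": seven repeats of the same digit.
EXCLUDED_PREFIXES = tuple(str(d) * 7 for d in range(10))


def find_malta_nin_wide(text: str) -> list[str]:
    """Return wide-breadth candidates that do not start with a test prefix."""
    return [
        m.group(0)
        for m in MALTA_NIN_RE.finditer(text)
        if not m.group(0).startswith(EXCLUDED_PREFIXES)
    ]
```

For example, find_malta_nin_wide("ID 1234567M and 0000000M") returns ['1234567M']: the second value matches a pattern but begins with an excluded test prefix.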

Malta National Identification Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the Malta
National Identification Number format. It checks for common test patterns, and also requires
the presence of related keywords.

Table 45-702 Malta National Identification Number narrow-breadth patterns

Pattern

\d{6}[1-9][APap]

[012]\d{6}[MGLHBZmglhbz]

[3][01]\d{5}[MGLHBZmglhbz]

32000\d{2}[MGLHBZmglhbz]

Table 45-703 Malta National Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

national ID, national identification number, personal ID, personal identification number, nationalid#, personalid#

numru identifikazzjoni nazzjonali, ID nazzjonali, numru identifikazzjoni personali, ID personali, IDnazzjonali#, IDpersonali#
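A narrow-breadth check adds the keyword requirement to the same patterns. The sketch below assumes, without confirmation from this guide, that a keyword matches case-insensitively anywhere in the same text; the product may apply proximity or case rules that are not described here. The function names are hypothetical.

```python
import re

# Keywords from the narrow-breadth "Find keywords" validator (Table 45-703).
KEYWORDS = [
    "national ID", "national identification number", "personal ID",
    "personal identification number", "nationalid#", "personalid#",
    "numru identifikazzjoni nazzjonali", "ID nazzjonali",
    "numru identifikazzjoni personali", "ID personali",
    "IDnazzjonali#", "IDpersonali#",
]

# Patterns from Table 45-702, wrapped in an assumed alphanumeric-boundary
# check that stands in for the "Number delimiter" validator.
NIN_RE = re.compile(
    r"(?<![0-9A-Za-z])(?:\d{6}[1-9][APap]|[012]\d{6}[MGLHBZmglhbz]"
    r"|[3][01]\d{5}[MGLHBZmglhbz]|32000\d{2}[MGLHBZmglhbz])(?![0-9A-Za-z])"
)
EXCLUDED_PREFIXES = tuple(str(d) * 7 for d in range(10))


def has_keyword(text: str) -> bool:
    """Assumed semantics: case-insensitive substring match anywhere in text."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in KEYWORDS)


def find_malta_nin_narrow(text: str) -> list[str]:
    """Return pattern matches only when at least one keyword is present."""
    if not has_keyword(text):
        return []
    return [
        m.group(0)
        for m in NIN_RE.finditer(text)
        if not m.group(0).startswith(EXCLUDED_PREFIXES)
    ]
```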

Malta Tax Identification Number

The Malta Tax Identification Number is assigned by the Inland Revenue Department as a
means of identification for income tax purposes.
The Malta Tax Identification Number data identifier detects an eight- or nine-character
alphanumeric pattern that matches the Malta Tax Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight- or nine-character alphanumeric pattern that matches
the Malta Tax Identification Number format. It checks for common test patterns.
See “Malta Tax Identification Number wide breadth” on page 1339.
■ The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Malta Tax Identification Number format. It checks for common test patterns, and also
requires the presence of related keywords.
See “Malta Tax Identification Number narrow breadth” on page 1340.

Malta Tax Identification Number wide breadth

The wide breadth detects an eight- or nine-character alphanumeric pattern that matches the
Malta Tax Identification Number format. It checks for common test patterns.

Table 45-704 Malta Tax Identification Number wide-breadth patterns

Pattern

\d{6}[1-9][APap]

[012]\d{6}[MGLHBZmglhbz]

[3][01]\d{5}[MGLHBZmglhbz]

32000\d{2}[MGLHBZmglhbz]

[1]{2}\d{7}

[2]{2}\d{7}

[3]{2}\d{7}

[4]{2}\d{7}

[5]{2}\d{7}

[6]{2}\d{7}

[7]{2}\d{7}

[8]{2}\d{7}

Table 45-705 Malta Tax Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

000000000, 111111111, 222222222, 333333333,

444444444, 555555555, 666666666, 777777777,
888888888, 999999999

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Malta Tax Identification Number narrow breadth

The narrow breadth detects an eight- or nine-character alphanumeric pattern that matches
the Malta Tax Identification Number format. It checks for common test patterns, and also
requires the presence of related keywords.

Table 45-706 Malta Tax Identification Number narrow-breadth patterns

Pattern

\d{6}[1-9][APap]

[012]\d{6}[MGLHBZmglhbz]

[3][01]\d{5}[MGLHBZmglhbz]

32000\d{2}[MGLHBZmglhbz]

[1]{2}\d{7}

[2]{2}\d{7}

[3]{2}\d{7}

[4]{2}\d{7}

[5]{2}\d{7}

[6]{2}\d{7}

[7]{2}\d{7}

[8]{2}\d{7}

Table 45-707 Malta Tax Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

000000000, 111111111, 222222222, 333333333,

444444444, 555555555, 666666666, 777777777,
888888888, 999999999

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

tax code, tax number, tax identification number, kodiċi tat-taxxa, numru tat-taxxa, numru identifikazzjoni tat-taxxa, tax id, taxxaid#, taxid#, numru identifikazzjoni kontribwent, taxpayer identification number, kodiċi kontribwent, taxpayer code, tin, TIN, tin#, TIN#, tin no, landa, landa nru

Malta Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Malta, VAT is administered
by the tax office for the region in which the business is established.
The Malta Value Added Tax (VAT) Number data identifier detects an 8- or 10-character
alphanumeric pattern that matches the Malta Value Added Tax (VAT) Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 8- or 10-character alphanumeric pattern that matches the
Malta Value Added Tax (VAT) Number format without checksum validation. It checks for
common test patterns.
See “Malta Value Added Tax (VAT) Number wide breadth” on page 1342.
■ The medium breadth detects an 8- or 10-character alphanumeric pattern that matches the
Malta Value Added Tax (VAT) Number format with checksum validation.
See “Malta Value Added Tax (VAT) Number medium breadth” on page 1343.
■ The narrow breadth detects an 8- or 10-character alphanumeric pattern that matches the
Malta Value Added Tax (VAT) Number format with checksum validation. It checks for
common test patterns, and also requires the presence of related keywords.
See “Malta Value Added Tax (VAT) Number narrow breadth” on page 1343.

Malta Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 8- or 10-character alphanumeric pattern that matches the Malta
Value Added Tax (VAT) Number format without checksum validation. It checks for common
test patterns.

Table 45-708 Malta Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Mm][Tt]\d{8}

\d{4}[-]\d{4}

Table 45-709 Malta Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Malta Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 8- or 10-character alphanumeric pattern that matches the
Malta Value Added Tax (VAT) Number format with checksum validation.

Table 45-710 Malta Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Mm][Tt]\d{8}

\d{4}[-]\d{4}

Table 45-711 Malta Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Malta Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against
it.
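The guide does not describe how the Malta Value Added Tax (VAT) Number Validation Check computes its checksum. The following sketch assumes the mod-37 rule used by open-source validators such as python-stdnum: the eight digits are weighted 3, 4, 6, 7, 8, 9, 10, 1, and the weighted sum must be divisible by 37. The function name is hypothetical.

```python
import re

# Both table patterns: "MT" + 8 digits, or 4 digits, a hyphen, 4 digits.
MT_VAT_RE = re.compile(r"[Mm][Tt](\d{8})|(\d{4})-(\d{4})")

# Assumed weights; the last two digits act as a two-digit check value.
WEIGHTS = (3, 4, 6, 7, 8, 9, 10, 1)


def is_valid_mt_vat(candidate: str) -> bool:
    m = MT_VAT_RE.fullmatch(candidate.strip())
    if not m:
        return False
    digits = m.group(1) or (m.group(2) + m.group(3))
    # Mirrors the "Exclude ending characters" validator: reject 00000000 etc.
    if len(set(digits)) == 1:
        return False
    total = sum(w * int(d) for w, d in zip(WEIGHTS, digits))
    return total % 37 == 0
```

The all-zero value 00000000 passes the mod-37 test on its own, which is why the sketch also rejects repeated digits, in the spirit of the Exclude ending characters validator used by the wide and narrow breadths.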

Malta Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 8- or 10-character alphanumeric pattern that matches the Malta
Value Added Tax (VAT) Number format with checksum validation. It checks for common test
patterns, and also requires the presence of related keywords.

Table 45-712 Malta Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Mm][Tt]\d{8}

\d{4}[-]\d{4}

Table 45-713 Malta Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Malta Value Added Tax (VAT) Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat number, VAT number, vat, vat#, malta vat number, vatno#, value added tax number, malta vat, vat identification number

Numru tal-VAT, numru tal-VAT, bettija,valur miżjud taxxa in-numru, bettija identifikazzjoni in-numru

Medicare Beneficiary Identifier

The Medicare Beneficiary Identifier (MBI) is assigned to an individual to identify that person
as a Medicare beneficiary. The MBI was scheduled to replace the Health Insurance Claim
Number (HICN) on all Medicare cards by April 2019.
The Medicare Beneficiary Identifier data identifier detects an 11-character alphanumeric
pattern that matches the Medicare Beneficiary Identifier format.
The Medicare Beneficiary Identifier data identifier provides three breadths of detection:
■ The wide breadth detects an 11-character alphanumeric pattern without checksum validation.
See “Medicare Beneficiary Identifier wide breadth” on page 1345.
■ The medium breadth detects an 11-character alphanumeric pattern with checksum
validation.
See “Medicare Beneficiary Identifier medium breadth” on page 1345.

■ The narrow breadth detects an 11-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “Medicare Beneficiary Identifier narrow breadth” on page 1345.

Medicare Beneficiary Identifier wide breadth

The wide breadth detects an 11-character alphanumeric pattern without checksum validation.

Table 45-714 Medicare Beneficiary Identifier wide-breadth pattern

Pattern

[1-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z]{2}[0-9]{2}

Table 45-715 Medicare Beneficiary Identifier wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
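The wide-breadth pattern can be exercised directly with Python's re module. As in the other sketches, the Number delimiter validator is approximated, as an assumption, by rejecting matches that touch another letter or digit; find_mbi_candidates is a hypothetical name.

```python
import re

# Wide-breadth pattern from Table 45-714 (11 characters, no separators).
MBI_PATTERN = (
    r"[1-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z][0-9A-Za-z]"
    r"[0-9][A-Za-z]{2}[0-9]{2}"
)

# Assumed stand-in for "Number delimiter": no letter or digit directly
# before or after the match.
MBI_RE = re.compile(r"(?<![0-9A-Za-z])" + MBI_PATTERN + r"(?![0-9A-Za-z])")


def find_mbi_candidates(text: str) -> list[str]:
    """Return every wide-breadth MBI candidate in text (no checksum)."""
    return MBI_RE.findall(text)
```

Because the listed pattern contains no separators, a dashed rendering such as 1EG4-TE5-MK73 is not matched by this sketch.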

Medicare Beneficiary Identifier medium breadth

The medium breadth detects an 11-character alphanumeric pattern with checksum validation.

Table 45-716 Medicare Beneficiary Identifier medium-breadth pattern

Pattern

[1-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z]{2}[0-9]{2}

Table 45-717 Medicare Beneficiary Identifier medium-breadth validator

Mandatory validator Description

Medicare Beneficiary Identifier Number Validation Check Computes the checksum and validates the pattern against
it.

Medicare Beneficiary Identifier narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.

Table 45-718 Medicare Beneficiary Identifier narrow-breadth pattern

Pattern

[1-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z][0-9A-Za-z][0-9][A-Za-z]{2}[0-9]{2}

Table 45-719 Medicare Beneficiary Identifier narrow breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Medicare Beneficiary Identifier Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Medicare Beneficiary Identifier, medicare beneficiary identifier, mbi number, mbi no, mbi number#, mbi number#, mbi no#, medicare beneficiary number, medicare beneficiary no, medicare beneficiary#

Mexican Personal Registration and Identification Number

The Mexican Personal Registration and Identification Number is a number used in Mexican
states (with the exception of Mexico City) as a personal identification code.
The Mexican Personal Registration and Identification Number data identifier detects a
15-character alphanumeric pattern that matches the Mexican Personal Registration and
Identification Number format.
The Mexican Personal Registration and Identification Number data identifier provides three
breadths of detection:
■ The wide breadth detects a 15-character alphanumeric pattern without checksum validation.
See “Mexican Personal Registration and Identification Number wide breadth” on page 1347.
■ The medium breadth detects a 15-character alphanumeric pattern with checksum validation.
See “Mexican Personal Registration and Identification Number medium breadth” on page 1347.
■ The narrow breadth detects a 15-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “Mexican Personal Registration and Identification Number narrow breadth” on page 1348.

Mexican Personal Registration and Identification Number wide breadth

The wide breadth detects a 15-character alphanumeric pattern without checksum validation.

Table 45-720 Mexican Personal Registration and Identification Number wide-breadth pattern

Pattern

\d{2}-\d{3}-\d{2}-\d{7}-\w

Table 45-721 Mexican Personal Registration and Identification Number wide-breadth validator

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000000000, 11111111111111, 22222222222222,

33333333333333, 44444444444444, 55555555555555,
66666666666666, 77777777777777, 88888888888888,
99999999999999

Mexican Personal Registration and Identification Number medium breadth

The medium breadth detects a 15-character alphanumeric pattern with checksum validation.

Table 45-722 Mexican Personal Registration and Identification Number medium-breadth pattern

Pattern

\d{2}-\d{3}-\d{2}-\d{7}-\w

Table 45-723 Mexican Personal Registration and Identification Number medium-breadth validator

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000000000, 11111111111111, 22222222222222,

33333333333333, 44444444444444, 55555555555555,
66666666666666, 77777777777777, 88888888888888,
99999999999999

Mexican CRIP Validation Check Computes the checksum for the match and validates the
pattern against it.

Mexican Personal Registration and Identification Number narrow breadth

The narrow breadth detects a 15-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords.

Table 45-724 Mexican Personal Registration and Identification Number narrow-breadth pattern

Pattern

\d{2}-\d{3}-\d{2}-\d{7}-\w

Table 45-725 Mexican Personal Registration and Identification Number narrow-breadth validator

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:
00000000000000, 11111111111111, 22222222222222,
33333333333333, 44444444444444, 55555555555555,
66666666666666, 77777777777777, 88888888888888,
99999999999999

Mexican CRIP Validation Check Computes the checksum for every number matched and
validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Personal Registration and Identification Code, CRIP, crip, CRIP#, crip#, Mexican Personal ID Code, Mexican personal identification number

Clave de Registro de Identidad Personal, Código de Identificación Personal mexicana, número de identificación personal mexicana

Mexican Tax Identification Number

In Mexico, every taxpayer, whether a company or an individual, is assigned a tax identification
number. A company's tax identification number has 12 characters, while an individual's has
13 characters.
The Mexican Tax Identification Number data identifier detects a 12- or 13-character
alphanumeric pattern that matches the Mexican Tax Identification Number format.
The Mexican Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a 12- or 13-character alphanumeric pattern without validation.
See “Mexican Tax Identification Number wide breadth” on page 1349.
■ The medium breadth detects a 12- or 13-character alphanumeric pattern with checksum
validation.
See “Mexican Tax Identification Number medium breadth” on page 1350.
■ The narrow breadth detects a 12- or 13-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Mexican Tax Identification Number narrow breadth” on page 1350.

Mexican Tax Identification Number wide breadth

The wide breadth detects a 12- or 13-character alphanumeric pattern without validation.

Table 45-726 Mexican Tax Identification Number wide-breadth patterns

Patterns

[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w
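The four wide-breadth patterns can be combined into one expression. The sketch below, built around the hypothetical helper rfc_kind, matches a whole string against them and uses the character count, ignoring any separator, to tell a 12-character company number from a 13-character individual one. Like the wide breadth, it performs no checksum validation.

```python
import re
from typing import Optional

# Wide-breadth patterns from Table 45-726: four letters (individual) or
# three letters (company), an optional hyphen or space, YYMMDD, and three
# trailing characters.
RFC_PATTERNS = [
    r"[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w",
    r"[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w",
    r"[a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w",
    r"[a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w",
]
RFC_RE = re.compile("|".join(f"(?:{p})" for p in RFC_PATTERNS))


def rfc_kind(candidate: str) -> Optional[str]:
    """Classify a whole string as an individual or company RFC, else None."""
    if not RFC_RE.fullmatch(candidate):
        return None
    core = candidate.replace("-", "").replace(" ", "")
    return "individual" if len(core) == 13 else "company"
```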

Mexican Tax Identification Number medium breadth

The medium breadth detects a 12- or 13-character alphanumeric pattern with checksum
validation.

Table 45-727 Mexican Tax Identification Number medium-breadth patterns

Patterns

[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w

Table 45-728 Mexican Tax Identification Number medium-breadth validator

Mandatory validator Description

Mexican TAX ID Validation Check Computes the checksum and validates the pattern against
it.

Mexican Tax Identification Number narrow breadth

The narrow breadth detects a 12- or 13-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.

Table 45-729 Mexican Tax Identification Number narrow-breadth patterns

Patterns

[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z]\d\d[01]\d[0-3]\d\w\w\w

[a-zA-Z][a-zA-Z][a-zA-Z][- ]\d\d[01]\d[0-3]\d\w\w\w

Table 45-730 Mexican Tax Identification Number narrow-breadth validators

Mandatory validator Description

Mexican TAX ID Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Tax Identification Number, Tax ID, Tax ID No., RFC Number, TIN, TIN#, Federal Taxpayer Registry Code

Registro Federal de Contribuyentes, número de identificación de impuestos, Código del Registro Federal de Contribuyentes, Número RFC, Clave del RFC

Mexican Unique Population Registry Code

The Mexican Unique Population Registry Code (Clave Única de Registro de Población, or
CURP) is the unique alphanumeric identifier assigned to every person who lives in Mexico,
whether a national or a foreigner, and to Mexican nationals who live in other countries.
The Mexican Unique Population Registry Code data identifier detects an 18-character
alphanumeric pattern that matches the CURP format.
The Mexican Unique Population Registry Code system data identifier provides three breadths
of detection:
■ The wide breadth detects an 18-character alphanumeric pattern without validation.
See “Mexican Unique Population Registry Code wide breadth” on page 1352.
■ The medium breadth detects an 18-character alphanumeric pattern with checksum
validation.
See “Mexican Unique Population Registry Code medium breadth” on page 1352.
■ The narrow breadth detects an 18-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.
See “Mexican Unique Population Registry Code narrow breadth” on page 1352.

Mexican Unique Population Registry Code wide breadth

The wide breadth detects an 18-character alphanumeric pattern without validation.

Table 45-731 Mexican Unique Population Registry Code wide-breadth pattern

Pattern

\w[AEIOUaeiou]\w{2}\d{2}[0-1]\d[0-3]\d[HMhm]\w{7}

Mexican Unique Population Registry Code medium breadth

The medium breadth detects an 18-character alphanumeric pattern with checksum validation.

Table 45-732 Mexican Unique Population Registry Code medium-breadth pattern

Pattern

\w[AEIOUaeiou]\w{2}\d{2}[0-1]\d[0-3]\d[HMhm]\w{7}

Table 45-733 Mexican Unique Population Registry Code medium-breadth validator

Mandatory validator Description

Mexican Personal ID Code Number Validation Check Computes the checksum and validates the pattern against
it.
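This guide does not spell out the Mexican Personal ID Code Number Validation Check. A minimal sketch, assuming it follows the publicly described CURP check-digit scheme: each of the first 17 characters is mapped to a value (digits 0 to 9, then A to Z with Ñ after N), multiplied by a weight that runs from 18 down to 2, and the eighteenth character must equal (10 - sum mod 10) mod 10. The function names are hypothetical.

```python
import re

# Pattern from Table 45-732.
CURP_RE = re.compile(r"\w[AEIOUaeiou]\w{2}\d{2}[0-1]\d[0-3]\d[HMhm]\w{7}")

# Assumed character values: index in this string (Ñ sits between N and O).
ALPHABET = "0123456789ABCDEFGHIJKLMNÑOPQRSTUVWXYZ"


def curp_check_digit(first17: str) -> str:
    """Weighted sum with weights 18, 17, ..., 2, reduced to one digit."""
    total = sum(ALPHABET.index(c) * (18 - i) for i, c in enumerate(first17))
    return str((10 - total % 10) % 10)


def is_valid_curp(candidate: str) -> bool:
    if not CURP_RE.fullmatch(candidate):
        return False
    code = candidate.upper()
    # \w also accepts "_" and non-Latin characters; reject them here.
    if any(c not in ALPHABET for c in code[:17]):
        return False
    return curp_check_digit(code[:17]) == code[17]
```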

Mexican Unique Population Registry Code narrow breadth

The narrow breadth detects an 18-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.

Table 45-734 Mexican Unique Population Registry Code narrow-breadth pattern

Pattern

\w[AEIOUaeiou]\w{2}\d{2}[0-1]\d[0-3]\d[HMhm]\w{7}

Table 45-735 Mexican Unique Population Registry Code narrow-breadth validators

Mandatory validator Description

Mexican Personal ID Code Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Personal ID, personal ID number, personal ID, unique ID number, unique ID key, personal ID code, unique population registry code, unique population code, personalid#, personalidnumber#, uniqueidkey#

CURP, curp#, clave Única de registro de Población, clave única, clave única de identidad, clave personal Identidad, personal Identidad Clave, ClaveÚnica#, clavepersonalIdentidad#

Mexico CLABE Number

The Mexico CLABE (Clave Bancaria Estandarizada) Number is an 18-digit number used as
a banking standard for the numbering of bank accounts in Mexico.
The Mexico CLABE Number data identifier detects an 18-digit number that matches the CLABE
Number format.
The Mexico CLABE Number data identifier provides three breadths of detection:
■ The wide breadth detects an 18-digit number without checksum validation.
See “Mexico CLABE Number wide breadth” on page 1353.
■ The medium breadth detects an 18-digit number with checksum validation.
See “Mexico CLABE Number medium breadth” on page 1354.
■ The narrow breadth detects an 18-digit number with checksum validation. It also requires
the presence of related keywords.
See “Mexico CLABE Number narrow breadth” on page 1354.

Mexico CLABE Number wide breadth

The wide breadth detects an 18-digit number without checksum validation.

Table 45-736 Mexico CLABE Number wide-breadth patterns

Pattern

\d{18}

Table 45-737 Mexico CLABE Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Mexico CLABE Number medium breadth

The medium breadth detects an 18-digit number with checksum validation.

Table 45-738 Mexico CLABE Number medium-breadth patterns

Pattern

\d{18}

Table 45-739 Mexico CLABE Number medium-breadth validators

Mandatory validator Description

Mexico CLABE Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding numbers.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

555555555555555555
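A minimal sketch of the medium breadth, assuming the Mexico CLABE Number Validation Check uses the standard CLABE control digit: the first 17 digits are weighted 3, 7, 1 repeating, only the last digit of each product is summed, and the 18th digit must equal (10 - sum mod 10) mod 10. The sample number in the test is illustrative, and the function names are hypothetical.

```python
import re

CLABE_RE = re.compile(r"[0-9]{18}")
WEIGHTS = (3, 7, 1)


def clabe_check_digit(first17: str) -> int:
    # Weight digits 3, 7, 1, 3, 7, 1, ... and keep each product's last digit.
    total = sum((int(d) * WEIGHTS[i % 3]) % 10 for i, d in enumerate(first17))
    return (10 - total % 10) % 10


def is_valid_clabe(candidate: str) -> bool:
    if not CLABE_RE.fullmatch(candidate):
        return False
    # "Exclude beginning characters" value from Table 45-739.
    if candidate.startswith("555555555555555555"):
        return False
    return clabe_check_digit(candidate[:17]) == int(candidate[17])
```

The excluded value 555555555555555555 happens to pass this checksum, which may explain why the medium breadth lists it as a test pattern to exclude.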

Mexico CLABE Number narrow breadth

The narrow breadth detects an 18-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-740 Mexico CLABE Number narrow-breadth patterns

Pattern

\d{18}

Table 45-741 Mexico CLABE Number narrow-breadth validators

Mandatory validator Description

Mexico CLABE Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Mexico CLABE Number, mexico clabe number, clabe number, clabe no., Mexico CLABE No., mexico clabe no., CLABE No#, clabe no#

Clave Bancaria Estandarizada, Estandarizado Banco número de clave, número de clave, clave número, clave#

National Drug Code (NDC)

The National Drug Code (NDC) is an identifier issued by the Food and Drug Administration
(FDA) for an individual drug in the United States. An alternate format is defined by HIPAA
regulations.
The National Drug Code data identifier detects the existence of an NDC as well as the HIPAA
version.
This data identifier provides three breadths of detection:
■ The wide breadth checks for the existence of an NDC number or its HIPAA version.
See “National Drug Code (NDC) wide breadth” on page 1355.
■ The medium breadth detects the same formats, but only when the numbers are separated by dashes.
See “National Drug Code (NDC) medium breadth” on page 1356.
■ The narrow breadth detects the dash-separated formats and also requires a keyword match.
See “National Drug Code (NDC) narrow breadth” on page 1356.

National Drug Code (NDC) wide breadth

The wide breadth detects the standard FDA format, which is a 10-digit number in the format
4-4-2, 5-4-1 or 5-3-2, with the numbers separated by dashes or spaces.
This data identifier also detects the HIPAA format, an 11-digit number in the format 5-4-2. The
HIPAA format may include a single asterisk to represent a missing digit.

Table 45-742 National Drug Code (NDC) wide breadth patterns

Patterns

*?\d{4} \d{4} \d{2}

*?\d{4}-\d{4}-\d{2}

\d{5} *?\d{3} \d{2}

\d{5}-*?\d{3}-\d{2}

\d{5} \d{4} *?\d

\d{5}-\d{4}-*?\d

\d{5} \d{4} \d{2}

\d{5}-\d{4}-\d{2}

National Drug Code (NDC) medium breadth

The medium breadth detects the standard FDA format, which is a 10-digit number in the format
4-4-2, 5-4-1 or 5-3-2, with the numbers separated by dashes.
This data identifier also detects the HIPAA format, an 11-digit number in the format 5-4-2. The
HIPAA format may include a single asterisk to represent a missing digit.

Note: The medium breadth of this data identifier does not include any validators.

Table 45-743 National Drug Code (NDC) medium breadth patterns

Pattern

*?\d{4}-\d{4}-\d{2}

\d{5}-*?\d{3}-\d{2}

\d{5}-\d{4}-*?\d

\d{5}-\d{4}-\d{2}
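The 10-digit layouts and the 11-digit HIPAA layout differ only in where a zero is padded. The sketch below validates a string against the medium-breadth patterns and converts it to the 5-4-2 HIPAA layout. Two assumptions apply: the *? in the tables is read as an optional literal asterisk (written \*? in Python regex syntax), and the asterisk is treated as the placeholder for the padded zero. The function name to_hipaa_11 is hypothetical.

```python
import re

# Medium-breadth patterns from Table 45-743, with "*?" read as \*?.
NDC_PATTERNS = [
    r"\*?\d{4}-\d{4}-\d{2}",   # 4-4-2
    r"\d{5}-\*?\d{3}-\d{2}",   # 5-3-2
    r"\d{5}-\d{4}-\*?\d",      # 5-4-1
    r"\d{5}-\d{4}-\d{2}",      # 5-4-2 (HIPAA)
]
NDC_RE = re.compile("|".join(f"(?:{p})" for p in NDC_PATTERNS))

# Which segment receives the leading zero for each 10-digit layout.
PAD_SEGMENT = {(4, 4, 2): 0, (5, 3, 2): 1, (5, 4, 1): 2}


def to_hipaa_11(ndc: str) -> str:
    if not NDC_RE.fullmatch(ndc):
        raise ValueError(f"not a medium-breadth NDC: {ndc!r}")
    # Drop the asterisk; its position is filled by the padded zero.
    parts = ndc.replace("*", "").split("-")
    layout = tuple(len(p) for p in parts)
    if layout in PAD_SEGMENT:
        i = PAD_SEGMENT[layout]
        parts[i] = "0" + parts[i]
    return "-".join(parts)
```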

National Drug Code (NDC) narrow breadth

The narrow breadth detects the standard FDA format, which is a 10-digit number in the format
4-4-2, 5-4-1 or 5-3-2, with the numbers separated by dashes.
This data identifier also detects the HIPAA format, an 11-digit number in the format 5-4-2. The
HIPAA format may include a single asterisk to represent a missing digit. This data identifier
also requires the presence of an NDC-related keyword.

Table 45-744 National Drug Code (NDC) narrow breadth patterns

Pattern

*?\d{4}-\d{4}-\d{2}

\d{5}-*?\d{3}-\d{2}

\d{5}-\d{4}-*?\d

\d{5}-\d{4}-\d{2}

Table 45-745 National Drug Code (NDC) narrow breadth validators

Mandatory validator Description

Find keywords With this option selected, at least one of the following keywords or key phrases
must be present for the data to be matched.

Find keywords input ndc, national drug code

National Provider Identifier Number

The National Provider Identifier (NPI) is a unique 10-digit identification number issued to health
care providers in the United States by the Centers for Medicare and Medicaid Services.
The National Provider Identifier Number data identifier detects a 10-digit number that matches
the National Provider Identifier Number format.
The National Provider Identifier Number data identifier provides three breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “National Provider Identifier Number wide breadth” on page 1357.
■ The medium breadth detects a 10-digit number with checksum validation.
See “National Provider Identifier Number medium breadth” on page 1358.
■ The narrow breadth detects a 10-digit number with checksum validation. It also requires
the presence of related keywords.
See “National Provider Identifier Number narrow breadth” on page 1358.

National Provider Identifier Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-746 National Provider Identifier Number wide-breadth patterns

Pattern

\d{10}

80840\d{10}

Table 45-747 National Provider Identifier Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

National Provider Identifier Number medium breadth

The medium breadth detects a 10-digit number with checksum validation.

Table 45-748 National Provider Identifier Number medium-breadth patterns

Pattern

\d{10}

80840\d{10}

Table 45-749 National Provider Identifier Number medium-breadth validators

Mandatory validator Description

National Provider Identifier Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding numbers.
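The NPI check digit is a Luhn check digit computed with the prefix 80840 placed in front of the number, which matches the 15-digit 80840\d{10} pattern listed in the tables. The sketch below assumes that the product's National Provider Identifier Number Validation Check is equivalent to this Luhn test; the function names are hypothetical.

```python
import re

# Optional 80840 prefix followed by the 10-digit NPI (Table 45-748).
NPI_RE = re.compile(r"(80840)?([0-9]{10})")


def luhn_ok(digits: str) -> bool:
    total = 0
    # Double every second digit counting from the right; the check digit
    # itself (rightmost) is not doubled.
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def is_valid_npi(candidate: str) -> bool:
    m = NPI_RE.fullmatch(candidate)
    if not m:
        return False
    npi = m.group(2)
    # Mirrors the "Duplicate digits" validator of the wide and narrow breadths.
    if len(set(npi)) == 1:
        return False
    return luhn_ok("80840" + npi)
```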

National Provider Identifier Number narrow breadth

The narrow breadth detects a 10-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-750 National Provider Identifier Number narrow-breadth patterns

Pattern

\d{10}

80840\d{10}

Table 45-751 National Provider Identifier Number narrow-breadth validators

Mandatory validator Description

National Provider Identifier Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

National Provider Identifier, NPI, npi, n.p.i, hipaa,

National Provider ID, npiid, national provider ID
number, NPI ID
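The Find keywords requirement of the narrow breadth can be approximated as follows. This minimal Python sketch assumes that a keyword anywhere in the same text satisfies the requirement and that matching ignores case; the product may apply proximity or case rules that this guide does not describe. It combines only the keyword step with a delimited 10-digit search and leaves out the checksum. The names `has_keyword` and `narrow_candidates` are hypothetical.

```python
import re
from typing import List

# Keywords copied from the National Provider Identifier Number narrow-breadth table.
NPI_KEYWORDS = [
    "National Provider Identifier", "NPI", "npi", "n.p.i", "hipaa",
    "National Provider ID", "npiid", "national provider ID number", "NPI ID",
]


def has_keyword(text: str, keywords: List[str]) -> bool:
    """True if any keyword appears as a standalone phrase, ignoring case."""
    for kw in keywords:
        # Lookarounds act as word boundaries that also work for keywords
        # ending in punctuation, such as "n.p.i".
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def narrow_candidates(text: str) -> List[str]:
    """Return delimited 10-digit candidates only when a keyword is present."""
    if not has_keyword(text, NPI_KEYWORDS):
        return []
    return re.findall(r"(?<!\d)\d{10}(?!\d)", text)


print(narrow_candidates("Provider NPI: 1234567893"))  # ['1234567893']
print(narrow_candidates("Order 1234567893 shipped"))  # []
```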

Netherlands Bank Account Number

The Netherlands bank account number is the standard bank account number used across the
Netherlands.
The Netherlands Bank Account Number data identifier detects an 8-, 9-, or 10-character
alphanumeric pattern that matches the Netherlands Bank Account Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 8-, 9-, or 10-character alphanumeric pattern that matches the
Netherlands Bank Account Number format without checksum validation. It checks for
common test patterns.
See “Netherlands Bank Account Number wide breadth” on page 1360.
■ The medium breadth detects an 8-, 9-, or 10-character alphanumeric pattern that matches
the Netherlands Bank Account Number format with checksum validation.
See “Netherlands Bank Account Number medium breadth” on page 1360.
■ The narrow breadth detects an 8-, 9-, or 10-character alphanumeric pattern that matches
the Netherlands Bank Account Number format with checksum validation. It checks for
common test patterns, and also requires the presence of related keywords.
See “Netherlands Bank Account Number narrow breadth” on page 1361.

Netherlands Bank Account Number wide breadth

The wide breadth detects an 8-, 9-, or 10-character alphanumeric pattern that matches the
Netherlands Bank Account Number format without checksum validation. It checks for common
test patterns.

Table 45-752 Netherlands Bank Account Number wide-breadth patterns

Pattern

[PpGg]\d\d\d\d\d\d\d

\d\d\d\d\d\d\d\d\d\d

\d\d\d\d\d\d\d\d\d

Table 45-753 Netherlands Bank Account Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999
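To show how the wide-breadth validators combine with the patterns, the following Python sketch rewrites the three Netherlands Bank Account Number patterns as Python regular expressions and applies Number delimiter, Duplicate digits, and Exclude ending characters in turn. The lookaround approximation of Number delimiter (no letter, digit, or underscore directly before or after a match) is an assumption rather than documented product behavior, and `wide_matches` is a hypothetical name.

```python
import re
from typing import List

# Wide-breadth patterns from the table, rewritten in Python regex syntax.
PATTERNS = [r"[PpGg]\d{7}", r"\d{10}", r"\d{9}"]
# Exclude ending characters: seven repeated digits, 0000000 through 9999999.
EXCLUDED_ENDINGS = [str(d) * 7 for d in range(10)]


def wide_matches(text: str) -> List[str]:
    found = []
    for pat in PATTERNS:
        # Assumed Number delimiter: no word character directly before or after.
        for m in re.finditer(r"(?<!\w)" + pat + r"(?!\w)", text):
            value = m.group()
            digits = re.sub(r"\D", "", value)
            # Duplicate digits: reject a value whose digits are all the same.
            if len(set(digits)) == 1:
                continue
            # Exclude ending characters: reject values ending in a listed run.
            if any(value.endswith(end) for end in EXCLUDED_ENDINGS):
                continue
            found.append(value)
    return found


print(wide_matches("acct 123456789, giro P1234567, test G7777777"))
# ['P1234567', '123456789']
```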

Netherlands Bank Account Number medium breadth

The medium breadth detects an 8-, 9-, or 10-character alphanumeric pattern that matches the
Netherlands Bank Account Number format with checksum validation.

Table 45-754 Netherlands Bank Account Number medium-breadth patterns

Pattern

[PpGg]\d\d\d\d\d\d\d

\d\d\d\d\d\d\d\d\d\d

\d\d\d\d\d\d\d\d\d

Table 45-755 Netherlands Bank Account Number medium-breadth validators

Mandatory validator Description

Netherlands Bank Account Number Validation Check Computes the checksum and validates the pattern against
it.
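This guide does not describe how the Netherlands Bank Account Number Validation Check computes its checksum. Assuming it applies the classic Dutch 11-proof (elfproef) used for pre-IBAN 9- and 10-digit account numbers, a sketch looks like this. The P/G-prefixed form is left out because this sketch assumes it carries no check digit; `elfproef_valid` is a hypothetical name.

```python
def elfproef_valid(account: str) -> bool:
    """Classic Dutch 11-proof for 9- or 10-digit bank account numbers."""
    if not (account.isascii() and account.isdigit()) or len(account) not in (9, 10):
        return False
    # Treat a 9-digit number as a 10-digit number with a leading zero.
    padded = account.zfill(10)
    # Weights run 10, 9, ..., 1 from left to right; the sum must divide by 11.
    total = sum(int(d) * w for d, w in zip(padded, range(10, 0, -1)))
    return total % 11 == 0


print(elfproef_valid("123456789"))  # True
print(elfproef_valid("123456788"))  # False
```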

Netherlands Bank Account Number narrow breadth

The narrow breadth detects an 8-, 9-, or 10-character alphanumeric pattern that matches the
Netherlands Bank Account Number format with checksum validation. It checks for common
test patterns, and also requires the presence of related keywords.

Table 45-756 Netherlands Bank Account Number narrow-breadth patterns

Pattern

[PpGg]\d\d\d\d\d\d\d

\d\d\d\d\d\d\d\d\d\d

\d\d\d\d\d\d\d\d\d

Table 45-757 Netherlands Bank Account Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Exclude ending characters Data ending with any of the following list of values is not
matched:

0000000, 1111111, 2222222, 3333333, 4444444,

5555555, 6666666, 7777777, 8888888, 9999999

Netherlands Bank Account Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

bank account number, account number, bancu
aklarashon number, aklarashon number

bancu aklarashon number, aklarashon number,
bankrekeningnummer, rekeningnummer

Netherlands Driver's License Number

The Netherlands Driver's License Number is the identification number of an individual driver's
license issued by the Netherlands RDW agency.
The Netherlands Driver's License Number data identifier detects a 10-digit number that matches
the Netherlands Driver's License Number format.
The Netherlands Driver's License Number data identifier provides two breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Netherlands Driver's License Number wide breadth” on page 1362.
■ The narrow breadth detects a 10-digit number without checksum validation. It also requires
the presence of related keywords.
See “Netherlands Driver's License Number narrow breadth” on page 1362.

Netherlands Driver's License Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-758 Netherlands Driver's License Number wide-breadth pattern

Pattern

\d{10}

Table 45-759 Netherlands Driver's License Number wide-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.
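The Number delimiter validator is described only as checking the surrounding characters. A common interpretation, assumed in this Python sketch, is that a match must not be immediately preceded or followed by another digit, so that a \d{10} pattern does not fire on a slice of a longer number. The name `find_ten_digit` is hypothetical.

```python
import re
from typing import List


def find_ten_digit(text: str, delimited: bool = True) -> List[str]:
    """Find 10-digit runs; with delimited=True, skip runs inside longer numbers."""
    if delimited:
        # Lookarounds reject a match that touches another digit on either side.
        pattern = r"(?<!\d)\d{10}(?!\d)"
    else:
        pattern = r"\d{10}"
    return re.findall(pattern, text)


sample = "licence 4829103746, order 123456789012"
print(find_ten_digit(sample, delimited=False))  # ['4829103746', '1234567890']
print(find_ten_digit(sample))                   # ['4829103746']
```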

Netherlands Driver's License Number narrow breadth

The narrow breadth detects a 10-digit number without checksum validation. It also requires
the presence of related keywords.

Table 45-760 Netherlands Driver's License Number narrow-breadth pattern

Pattern

\d{10}

Table 45-761 Netherlands Driver's License Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

RIJMEWIJS, Driver License, Driver License Number,

driver license number, Driver Licence, Drivers Lic.,
Drivers License, Drivers Licence, Driver's License,
Driver's License Number, driver's license number,
Driver's Licence Number, Driving License number,
driving license number, DLNo#, dlno#

permis de conduire, rijbewijs, Rijbewijsnummer, DL#,

RIJBEWIJSNUMMER

Netherlands Passport Number

Dutch passports are issued to Netherlands citizens for the purpose of international travel.
The Netherlands Passport Number data identifier detects a nine-digit number that matches
the Netherlands Passport Number format.
The Netherlands Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Netherlands Passport Number wide breadth” on page 1363.
■ The narrow breadth detects a nine-digit number. It also requires the presence of related
keywords.
See “Netherlands Passport Number narrow breadth” on page 1364.

Netherlands Passport Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-762 Netherlands Passport Number wide-breadth pattern

Pattern

\w{9}

Table 45-763 Netherlands Passport Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Netherlands Passport Number narrow breadth

The narrow breadth detects a nine-digit number. It also requires the presence of related
keywords.

Table 45-764 Netherlands Passport Number narrow-breadth pattern

Pattern

\w{9}

Table 45-765 Netherlands Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Dutch Passport Number, Dutch passport number,

passport number, Netherlands passport number

Nederlanden paspoort nummer, Paspoort, paspoort,

Nederlanden paspoortnummer, paspoortnummer

Netherlands Tax Identification Number

The Netherlands issues a tax identification number at birth or upon registration with a municipality.
The Netherlands Tax Identification Number data identifier detects a nine-digit number that
matches the Netherlands Tax Identification Number format.
The Netherlands Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Netherlands Tax Identification Number wide breadth” on page 1365.
■ The medium breadth detects a nine-digit number with checksum validation.
See “Netherlands Tax Identification Number medium breadth” on page 1365.
■ The narrow breadth detects a nine-digit number with checksum validation. It also requires
the presence of related keywords.
See “Netherlands Tax Identification Number narrow breadth” on page 1366.

Netherlands Tax Identification Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-766 Netherlands Tax Identification Number wide-breadth patterns

Pattern

\d{9}

\d{3}-\d{3}-\d{3}

\d{3}.\d{3}.\d{3}

\d{3} \d{3} \d{3}

Table 45-767 Netherlands Tax Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Duplicate digits Ensures that a string of digits is not all the same.

Netherlands Tax Identification Number medium breadth

The medium breadth detects a nine-digit number with checksum validation.

Table 45-768 Netherlands Tax Identification Number medium-breadth patterns

Pattern

\d{9}

\d{3}-\d{3}-\d{3}

\d{3}.\d{3}.\d{3}

\d{3} \d{3} \d{3}

Table 45-769 Netherlands Tax Identification Number medium-breadth validator

Mandatory validator Description

Dutch Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.
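The Dutch Tax Identification Number Validation Check is not specified further in this guide. The sketch below assumes it is the standard 11-proof for the Dutch citizen service number (BSN): the first eight digits are weighted 9 down to 2, the ninth digit is weighted -1, and the total must be divisible by 11. It first strips the hyphen, period, or space separators shown in the patterns, treating the period as a literal period (also an assumption). The name `dutch_tin_valid` is hypothetical.

```python
import re

# Assumed BSN 11-proof weights: 9 down to 2, then -1 for the final digit.
WEIGHTS = [9, 8, 7, 6, 5, 4, 3, 2, -1]


def dutch_tin_valid(candidate: str) -> bool:
    # Remove the separators used in the patterns (period assumed literal).
    digits = re.sub(r"[-. ]", "", candidate)
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(digits, WEIGHTS))
    return total % 11 == 0


print(dutch_tin_valid("123-456-782"))  # True
print(dutch_tin_valid("123.456.789"))  # False
```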

Netherlands Tax Identification Number narrow breadth

The narrow breadth detects a nine-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-770 Netherlands Tax Identification Number narrow-breadth patterns

Pattern

\d{9}

\d{3}-\d{3}-\d{3}

\d{3}.\d{3}.\d{3}

\d{3} \d{3} \d{3}

Table 45-771 Netherlands Tax Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Duplicate digits Ensures that a string of digits is not all the same.

Dutch Tax Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

netherlands tax identification number, netherlands tax
identification, netherland's tax identification number,
netherland's tax identification, tax identification
number, dutch tax id, dutch tax identification number,
tax id, tax id#, tax number, tax no#, tax#, TIN, TIN#,
tin#, tin, netherlands tin, netherland's tin

Nederlands belasting identificatienummer,
identificatienummer van belasting, identificatienummer
belasting, Nederlands belasting identificatie,
Nederlands belasting id nummer, Nederlands
belastingnummer, btw nummer, Nederlandse belasting
identificatie, Nederlands belastingnummer

netherlands tax identification tal, netherland's tax
identification tal, tax identification tal, tax tal,
Nederlânske tax identification tal, Hollânske tax
identification, Nederlânsk tax tal, Hollânske tax id tal

netherlands impuesto identification number,
netherland's impuesto identification number, impuesto
identification number, impuesto number, hulandes
impuesto identification number, hulandes impuesto
identification, hulandes impuesto number, hulandes
impuesto id number

Netherlands Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In the Netherlands, the VAT
number is issued by the VAT office for the region in which the business is established.
The Netherlands Value Added Tax (VAT) Number data identifier detects a 14-character
alphanumeric pattern that matches the Netherlands VAT Number format.
The Netherlands Value Added Tax (VAT) Number data identifier provides three breadths of
detection:
■ The wide breadth detects a 14-character alphanumeric pattern beginning with NL, without
checksum validation.
See “Netherlands Value Added Tax (VAT) Number wide breadth” on page 1368.
■ The medium breadth detects a 14-character alphanumeric pattern beginning with NL, with
checksum validation.
See “Netherlands Value Added Tax (VAT) Number medium breadth” on page 1368.
■ The narrow breadth detects a 14-character alphanumeric pattern beginning with NL, with
checksum validation. It also requires the presence of related keywords.
See “Netherlands Value Added Tax (VAT) Number narrow breadth” on page 1369.

Netherlands Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 14-character alphanumeric pattern beginning with NL, without
checksum validation.

Table 45-772 Netherlands Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Nn][Ll]\d{9}[Bb]\d{2}

[Nn][Ll]-\d{9}-[Bb]\d{2}

[Nn][Ll] \d{9} [Bb]\d{2}

[Nn][Ll].\d{9}.[Bb]\d{2}

Table 45-773 Netherlands Value Added Tax (VAT) Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Netherlands Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 14-character alphanumeric pattern beginning with NL, with
checksum validation.

Table 45-774 Netherlands Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Nn][Ll]\d{9}[Bb]\d{2}

[Nn][Ll]-\d{9}-[Bb]\d{2}

[Nn][Ll] \d{9} [Bb]\d{2}

[Nn][Ll].\d{9}.[Bb]\d{2}

Table 45-775 Netherlands Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Netherlands VAT Number Validation Check Computes the checksum and validates the pattern against
it.
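This guide gives no algorithm for the Netherlands VAT Number Validation Check. One plausible reading, assumed here, is the traditional scheme in which the last of the nine digits between NL and B equals the weighted sum of the first eight digits (weights 9 down to 2) modulo 11. The regular expression accepts the same four layouts as the patterns, using a backreference so that both separators must be the same. The name `nl_vat_valid` is hypothetical.

```python
import re

# NL, nine digits, B, two digits; group 1 captures the separator so the
# same separator (or none) must appear on both sides of the nine digits.
VAT_RE = re.compile(r"[Nn][Ll]([-. ]?)(\d{9})\1[Bb](\d{2})")


def nl_vat_valid(candidate: str) -> bool:
    m = VAT_RE.fullmatch(candidate)
    if not m:
        return False
    digits = m.group(2)
    # Weights 9 down to 2 on the first eight digits (assumed traditional scheme).
    total = sum(int(d) * w for d, w in zip(digits[:8], range(9, 1, -1)))
    check = total % 11
    # A remainder of 10 cannot be written as one digit, so it never validates.
    return check != 10 and check == int(digits[8])


print(nl_vat_valid("NL123456782B01"))    # True
print(nl_vat_valid("NL-123456789-B01"))  # False
```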

Netherlands Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 14-character alphanumeric pattern beginning with NL, with
checksum validation. It also requires the presence of related keywords.

Table 45-776 Netherlands Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Nn][Ll]\d{9}[Bb]\d{2}

[Nn][Ll]-\d{9}-[Bb]\d{2}

[Nn][Ll] \d{9} [Bb]\d{2}

[Nn][Ll].\d{9}.[Bb]\d{2}

Table 45-777 Netherlands Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding numbers.

Netherlands VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

VAT Number, vat no, vat number, VAT#, vat#

BTW, wearde tafoege tax getal, BTW nûmer,

BTW-nummer

New Zealand Driver's Licence Number

The New Zealand driver's licence allows the holder to drive specified vehicles on public roads,
with or without restrictions. New Zealand driver's licences are issued by the NZ Transport
Agency.
The New Zealand Driver's Licence Number data identifier detects an eight-character alphanumeric
pattern that matches the New Zealand Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an eight-character alphanumeric pattern that matches the New
Zealand Driver's Licence format. It checks for common test patterns.
See “New Zealand Driver's Licence Number wide breadth” on page 1370.
■ The narrow breadth detects an eight-character alphanumeric pattern that matches the New
Zealand Driver's Licence format. It checks for common test patterns, and also requires the
presence of related keywords.
See “New Zealand Driver's Licence Number narrow breadth” on page 1370.

New Zealand Driver's Licence Number wide breadth

The wide breadth detects an eight-character alphanumeric pattern that matches the New
Zealand Driver's Licence format. It checks for common test patterns.

Table 45-778 New Zealand Driver's Licence Number wide-breadth patterns

Pattern

[a-zA-Z][a-zA-Z]\d\d\d\d\d\d

Table 45-779 New Zealand Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999

New Zealand Driver's Licence Number narrow breadth

The narrow breadth detects an eight-character alphanumeric pattern that matches the New
Zealand Driver's Licence format. It checks for common test patterns, and also requires the
presence of related keywords.

Table 45-780 New Zealand Driver's Licence Number narrow-breadth patterns

Pattern

[a-zA-Z][a-zA-Z]\d\d\d\d\d\d

Table 45-781 New Zealand Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, driver license number, drivers license
number, dlno#, driver's license number, driver permit,
drivers permit, driving permit, license number, licence
number, drivers permit number, dl#

raihana taraiwa

New Zealand National Health Index Number

The National Health Index number (NHI number) is a unique seven-character alphanumeric
identifier that is assigned to every person who uses health and disability support services in
New Zealand.
The New Zealand National Health Index Number data identifier detects a seven-character
alphanumeric pattern that matches the NHI number format.
The New Zealand National Health Index Number data identifier provides three breadths of
detection:
■ The wide breadth detects a seven-character alphanumeric pattern with no validation.
See “New Zealand National Health Index Number wide breadth” on page 1372.
■ The medium breadth detects a seven-character alphanumeric pattern with checksum
validation.
See “New Zealand National Health Index Number medium breadth” on page 1372.
■ The narrow breadth detects a seven-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “New Zealand National Health Index Number narrow breadth” on page 1372.

New Zealand National Health Index Number wide breadth

The wide breadth detects a seven-character alphanumeric pattern with no validation.

Table 45-782 New Zealand National Health Index Number wide-breadth pattern

Pattern

\l{3}\d{4}

The wide breadth does not include any validators.

New Zealand National Health Index Number medium breadth

The medium breadth detects a seven-character alphanumeric pattern with checksum validation.

Table 45-783 New Zealand National Health Index Number medium-breadth pattern

Pattern

\l{3}\d{4}

Table 45-784 New Zealand National Health Index Number medium-breadth validators

Mandatory validator Description

New Zealand National Health Index Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding numbers.
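The New Zealand National Health Index Number Validation Check is assumed here to follow the check-digit rule that the New Zealand Ministry of Health published for the three-letter, four-digit NHI format: letters map to 1 through 24 (skipping I and O), the first six characters are weighted 7 down to 2, a remainder of 0 modulo 11 is invalid, and the check digit is 11 minus the remainder, with 10 becoming 0. The sketch covers only that format, which is what the \l{3}\d{4} pattern matches. The name `nhi_valid` is hypothetical.

```python
# Letter values used by the check: A=1 ... Z=24, with I and O skipped.
LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"
LETTER_VALUE = {ch: i + 1 for i, ch in enumerate(LETTERS)}
DIGITS = "0123456789"


def nhi_valid(candidate: str) -> bool:
    nhi = candidate.upper()
    if len(nhi) != 7:
        return False
    if not all(c in LETTER_VALUE for c in nhi[:3]) or not all(c in DIGITS for c in nhi[3:]):
        return False
    values = [LETTER_VALUE[c] for c in nhi[:3]] + [int(c) for c in nhi[3:6]]
    # Weights 7 down to 2 on the first six characters.
    remainder = sum(v * w for v, w in zip(values, [7, 6, 5, 4, 3, 2])) % 11
    if remainder == 0:
        return False  # a zero remainder marks the number as invalid
    check = (11 - remainder) % 10  # a check value of 10 becomes 0
    return check == int(nhi[6])


print(nhi_valid("ZZZ0016"))  # True
print(nhi_valid("ZZZ0017"))  # False
```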

New Zealand National Health Index Number narrow breadth

The narrow breadth detects a seven-character alphanumeric pattern with checksum validation.
It also requires the presence of related keywords.

Table 45-785 New Zealand National Health Index Number narrow-breadth patterns

Pattern

\l{3}\d{4}

Table 45-786 New Zealand National Health Index Number narrow-breadth validators

Mandatory validator Description

New Zealand National Health Index Number Validation Check Computes the checksum and validates the pattern against
it.

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

National Health Index Number, nhi number, NHI Number, nhi no., NHI number,
National Health Index No., National Health Index Id

New Zealand Passport Number

The Department of Internal Affairs issues New Zealand passports to New Zealand citizens for the
purpose of international travel.
The New Zealand Passport Number data identifier detects a seven- or eight-character
alphanumeric pattern that matches the New Zealand Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a seven- or eight-character alphanumeric pattern that matches
the New Zealand Passport Number format. It checks for common test numbers.
See “New Zealand Passport Number wide breadth” on page 1373.
■ The narrow breadth detects a seven- or eight-character alphanumeric pattern that matches
the New Zealand Passport Number format. It checks for common test numbers, and also
requires the presence of related keywords.
See “New Zealand Passport Number narrow breadth” on page 1374.

New Zealand Passport Number wide breadth

The wide breadth detects a seven- or eight-character alphanumeric pattern that matches the
New Zealand Passport Number format. It checks for common test numbers.

Table 45-787 New Zealand Passport Number wide-breadth patterns

Pattern

[Ll][Aa]\d\d\d\d\d\d

[Ll][Dd]\d\d\d\d\d\d

[Ll][Ff]\d\d\d\d\d\d

[Nn]\d\d\d\d\d\d

[Ee][Aa]\d\d\d\d\d\d

[Ll][Hh]\d\d\d\d\d\d

[Ee][Pp]\d\d\d\d\d\d

Table 45-788 New Zealand Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999

New Zealand Passport Number narrow breadth

The narrow breadth detects a seven- or eight-character alphanumeric pattern that matches
the New Zealand Passport Number format. It checks for common test numbers, and also
requires the presence of related keywords.

Table 45-789 New Zealand Passport Number narrow-breadth patterns

Pattern

[Ll][Aa]\d\d\d\d\d\d

[Ll][Dd]\d\d\d\d\d\d

[Ll][Ff]\d\d\d\d\d\d

[Nn]\d\d\d\d\d\d

[Ee][Aa]\d\d\d\d\d\d

[Ll][Hh]\d\d\d\d\d\d

[Ee][Pp]\d\d\d\d\d\d

Table 45-790 New Zealand Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000, 111111, 222222, 333333, 444444, 555555,

666666, 777777, 888888, 999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passport no, passportno,
passport no., passport#, passportno#

uruwhenua, tau uruwhenua, uruwhenua no, uruwhenua
no.

Norway Driver's Licence Number

In Norway, a driver's licence is required before a person is permitted to drive a motor vehicle
of any kind on a road.
The Norway Driver's Licence Number data identifier detects an 11-digit number that matches
the Norway Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number that matches the Norway Driver's Licence
Number format. It checks for common test numbers.
See “Norway Driver's Licence Number wide breadth” on page 1376.
■ The narrow breadth detects an 11-digit number that matches the Norway Driver's Licence
Number format. It checks for common test numbers, and also requires the presence of
related keywords.
See “Norway Driver's Licence Number narrow breadth” on page 1376.

Norway Driver's Licence Number wide breadth

The wide breadth detects an 11-digit number that matches the Norway Driver's Licence Number
format. It checks for common test numbers.

Table 45-791 Norway Driver's Licence Number wide-breadth patterns

Pattern

\d\d \d\d \d\d\d\d\d\d \d

Table 45-792 Norway Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Norway Driver's Licence Number narrow breadth

The narrow breadth detects an 11-digit number that matches the Norway Driver's Licence
Number format. It checks for common test numbers, and also requires the presence of related
keywords.

Table 45-793 Norway Driver's Licence Number narrow-breadth patterns

Pattern

\d\d \d\d \d\d\d\d\d\d \d

Table 45-794 Norway Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, drivers license, driving license, driver

førerkort, førerkortnummer

Norway National Identification Number

The Norway National Identification Number is assigned by the Norwegian state to all citizens
of the country and is administered by the Tax Administration.
The Norway National Identification Number data identifier detects a 9- or 11-digit number that
matches the Norway National Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 9- or 11-digit number that matches the Norway National
Identification Number format without checksum validation. It checks for common test
numbers.
See “Norway National Identification Number wide breadth” on page 1377.
■ The medium breadth detects a 9- or 11-digit number that matches the Norway National
Identification Number format with checksum validation.
See “Norway National Identification Number medium breadth” on page 1378.
■ The narrow breadth detects a 9- or 11-digit number that matches the Norway National
Identification Number format with checksum validation. It checks for common test numbers,
and also requires the presence of related keywords.
See “Norway National Identification Number narrow breadth” on page 1379.

Norway National Identification Number wide breadth

The wide breadth detects a 9- or 11-digit number that matches the Norway National Identification
Number format without checksum validation. It checks for common test numbers.

Table 45-795 Norway National Identification Number wide-breadth patterns

Pattern

[0123]\d[01]\d\d\d\d\d\d\d\d

[89]\d\d\d\d\d\d\d\d\d\d

[89]\d\d\d\d\d\d\d\d

[89]\d\d \d\d\d \d\d\d

Table 45-796 Norway National Identification Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Norway National Identification Number medium breadth

The medium breadth detects a 9- or 11-digit number that matches the Norway National
Identification Number format with checksum validation.
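The Norway National Identification Number Validation Check is not described in detail. For the 11-digit pattern that begins with a day and month ([0123]\d[01]...), a reasonable assumption is the standard Norwegian birth number (fødselsnummer) rule: two mod-11 check digits, the first computed over digits 1 to 9 with weights 3, 7, 6, 1, 8, 9, 4, 5, 2 and the second over digits 1 to 10 with weights 5, 4, 3, 2, 7, 6, 5, 4, 3, 2. The sketch does not cover the patterns that begin with 8 or 9, and the sample number is synthetic. The names `_mod11_check` and `norway_fnr_valid` are hypothetical.

```python
from typing import List, Optional


def _mod11_check(digits: List[int], weights: List[int]) -> Optional[int]:
    """Return the check digit, or None when no valid check digit exists."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    check = 11 - remainder
    if check == 11:
        return 0
    if check == 10:
        return None
    return check


def norway_fnr_valid(candidate: str) -> bool:
    if len(candidate) != 11 or not all(c in "0123456789" for c in candidate):
        return False
    d = [int(c) for c in candidate]
    # First check digit (position 10) from the first nine digits.
    k1 = _mod11_check(d[:9], [3, 7, 6, 1, 8, 9, 4, 5, 2])
    if k1 is None or k1 != d[9]:
        return False
    # Second check digit (position 11) from the first ten digits.
    k2 = _mod11_check(d[:10], [5, 4, 3, 2, 7, 6, 5, 4, 3, 2])
    return k2 is not None and k2 == d[10]


print(norway_fnr_valid("01019012480"))  # True
print(norway_fnr_valid("01019012481"))  # False
```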

Table 45-797 Norway National Identification Number medium-breadth patterns

Pattern

[0123]\d[01]\d\d\d\d\d\d\d\d

[89]\d\d\d\d\d\d\d\d\d\d

[89]\d\d\d\d\d\d\d\d

[89]\d\d \d\d\d \d\d\d

Table 45-798 Norway National Identification Number medium-breadth validators

Mandatory validator Description

Norway National Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Norway National Identification Number narrow breadth

The narrow breadth detects a 9- or 11-digit number that matches the Norway National
Identification Number format with checksum validation. It checks for common test numbers,
and also requires the presence of related keywords.

Table 45-799 Norway National Identification Number narrow-breadth patterns

Pattern

[0123]\d[01]\d\d\d\d\d\d\d\d

[89]\d\d\d\d\d\d\d\d\d\d

[89]\d\d\d\d\d\d\d\d

[89]\d\d \d\d\d \d\d\d

Table 45-800 Norway National Identification Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Norway National Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

national ID, national identification number, personal
ID, personal identification number, nationalid#,
personalid#, Nasjonalt ID, personlig ID, Nasjonalt ID#,
personlig ID#, tax id, tax number, tax identification
number, tax code, taxpayer id, taxpayer identification
number, skatt id, skattenummer, skattekode,
skattebetalers id, skattebetalers identifikasjonsnummer

Norway Value Added Tax Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Norway, VAT is
administered by the VAT office for the region in which the business is established.

The Norway Value Added Tax Number data identifier detects an 11- or 14-character
alphanumeric pattern that matches the Norway Value Added Tax Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11- or 14-character alphanumeric pattern that matches the
Norway Value Added Tax Number format without checksum validation. It checks for common
test patterns.
See “Norway Value Added Tax Number wide breadth” on page 1380.
■ The medium breadth detects an 11- or 14-character alphanumeric pattern that matches
the Norway Value Added Tax Number format with checksum validation.
See “Norway Value Added Tax Number medium breadth” on page 1381.
■ The narrow breadth detects an 11- or 14-character alphanumeric pattern that matches the
Norway Value Added Tax Number format with checksum validation. It checks for common
test patterns, and also requires the presence of related keywords.
See “Norway Value Added Tax Number narrow breadth” on page 1381.

Norway Value Added Tax Number wide breadth

The wide breadth detects an 11- or 14-character alphanumeric pattern that matches the Norway
Value Added Tax Number format without checksum validation. It checks for common test
patterns.

Table 45-801 Norway Value Added Tax Number wide-breadth patterns

Pattern

[Nn][Oo]\d\d\d-\d\d\d-\d\d\d

[Nn][Oo]\d\d\d\d\d\d\d\d\d[Mm][Vv][Aa]

Table 45-802 Norway Value Added Tax Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999
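The "Exclude ending characters" validator drops test values such as NO999-999-999. Taken literally, none of the listed nine-digit endings could ever match the MVA-suffixed or hyphenated forms. The following Python sketch therefore assumes that the ending check runs on the nine digits left after the NO prefix, hyphens, and MVA suffix are removed. That normalization step and the function names are assumptions, not documented product behavior.

```python
import re

# Patterns from Table 45-801.
PATTERNS = [
    re.compile(r"[Nn][Oo]\d\d\d-\d\d\d-\d\d\d"),
    re.compile(r"[Nn][Oo]\d\d\d\d\d\d\d\d\d[Mm][Vv][Aa]"),
]
# 000000000 through 999999999, as listed for the validator above.
EXCLUDED_ENDINGS = tuple(str(d) * 9 for d in range(10))


def digits_of(candidate: str) -> str:
    """Assumed normalization: keep only the digits of the candidate."""
    return "".join(c for c in candidate if c.isdigit())


def wide_match(candidate: str) -> bool:
    """Pattern match without checksum, minus the excluded test values."""
    if not any(p.fullmatch(candidate) for p in PATTERNS):
        return False
    return not digits_of(candidate).endswith(EXCLUDED_ENDINGS)
```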

Norway Value Added Tax Number medium breadth

The medium breadth detects an 11- or 14-character alphanumeric pattern that matches the
Norway Value Added Tax Number format with checksum validation.

Table 45-803 Norway Value Added Tax Number medium-breadth patterns

Pattern

[Nn][Oo]\d\d\d-\d\d\d-\d\d\d

[Nn][Oo]\d\d\d\d\d\d\d\d\d[Mm][Vv][Aa]

Table 45-804 Norway Value Added Tax Number medium-breadth validators

Mandatory validator Description

Norway Value Added Tax (VAT) Number Check Computes the checksum and validates the pattern against
it.
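A Norwegian VAT number is built from the nine-digit organization number, and the last of those nine digits is a modulus 11 check digit. The following Python sketch applies the commonly published weights 3, 2, 7, 6, 5, 4, 3, 2. This guide does not describe the algorithm, so the assumption that the Norway Value Added Tax (VAT) Number Check uses this rule is unconfirmed. The function names are hypothetical.

```python
WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)


def org_number_is_valid(org: str) -> bool:
    """Modulus 11 check on a nine-digit Norwegian organization number."""
    if len(org) != 9 or not org.isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(org[:8], WEIGHTS))
    check = (11 - total % 11) % 11  # a remainder of 0 gives check digit 0
    if check == 10:
        return False  # no single check digit exists for this prefix
    return check == int(org[8])


def norway_vat_is_valid(candidate: str) -> bool:
    """Assumes the candidate already matched a Table 45-803 pattern."""
    digits = "".join(c for c in candidate if c.isdigit())
    return org_number_is_valid(digits)
```

For example, `norway_vat_is_valid("NO974760673MVA")` returns `True` under this rule, and changing the final digit to 4 makes it return `False`.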

Norway Value Added Tax Number narrow breadth

The narrow breadth detects an 11- or 14-character alphanumeric pattern that matches the
Norway Value Added Tax Number format with checksum validation. It checks for common test
patterns, and also requires the presence of related keywords.

Table 45-805 Norway Value Added Tax Number narrow-breadth patterns

Pattern

[Nn][Oo]\d\d\d-\d\d\d-\d\d\d

[Nn][Oo]\d\d\d\d\d\d\d\d\d[Mm][Vv][Aa]

Table 45-806 Norway Value Added Tax Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333,
444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Norway Value Added Tax (VAT) Number Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat, vat number, vat#, value added tax number, VAT,
VAT#, vat registration number, VAT Number

mva, MVA, momsnummer, Momsnummer,
momsregistreringsnummer

Norwegian Birth Number

The Norwegian Birth Number is assigned at birth or registration with the National Population
Register. The birth number is written on identity documents, making it possible to match a
bank account or authority document to a person.
The Norwegian Birth Number data identifier detects an 11-digit number that matches the
Norwegian Birth Number format.
The Norwegian Birth Number system data identifier provides three breadths of detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Norwegian Birth Number wide breadth” on page 1382.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Norwegian Birth Number medium breadth” on page 1383.
■ The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of related keywords.
See “Norwegian Birth Number narrow breadth” on page 1383.

Norwegian Birth Number wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-807 Norwegian Birth Number wide breadth patterns

Pattern

[01234567]\d[012345]\d[56789]\d[567]\d{4}

[01234567]\d[012345]\d\d\d[01234]\d{4}

[01234567]\d[012345]\d[456789]\d[9]\d{4}

[01234567]\d[012345]\d[0123]\d[56789]\d{4}

Table 45-808 Norwegian Birth Number wide breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Norwegian Birth Number medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-809 Norwegian Birth Number medium breadth patterns

Pattern

[01234567]\d[012345]\d[56789]\d[567]\d{4}

[01234567]\d[012345]\d\d\d[01234]\d{4}

[01234567]\d[012345]\d[456789]\d[9]\d{4}

[01234567]\d[012345]\d[0123]\d[56789]\d{4}

Table 45-810 Norwegian Birth Number medium breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Norwegian Birth Number Validation Check Computes the checksum and validates the pattern against
it.
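A Norwegian birth number ends with two modulus 11 check digits. The following Python sketch applies the commonly published weights: 3, 7, 6, 1, 8, 9, 4, 5, 2 for the first check digit and 5, 4, 3, 2, 7, 6, 5, 4, 3, 2 for the second. The guide does not state that the Norwegian Birth Number Validation Check follows this rule; that is an assumption. The sample value below was constructed to satisfy the rule and is not a real person's number.

```python
from typing import Optional, Tuple

K1_WEIGHTS = (3, 7, 6, 1, 8, 9, 4, 5, 2)
K2_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def _check_digit(digits: str, weights: Tuple[int, ...]) -> Optional[int]:
    """Return the modulus 11 check digit, or None if the remainder forbids one."""
    total = sum(int(d) * w for d, w in zip(digits, weights))
    check = (11 - total % 11) % 11  # a result of 11 becomes 0
    return None if check == 10 else check


def birth_number_is_valid(number: str) -> bool:
    if len(number) != 11 or not number.isdigit():
        return False
    k1 = _check_digit(number[:9], K1_WEIGHTS)
    if k1 is None or k1 != int(number[9]):
        return False
    # The second check digit also covers the first check digit.
    k2 = _check_digit(number[:10], K2_WEIGHTS)
    return k2 is not None and k2 == int(number[10])
```

For example, `birth_number_is_valid("01019012480")` returns `True`, and `"01019012481"` fails the second check digit.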

Norwegian Birth Number narrow breadth

The narrow breadth detects an 11-digit number that passes checksum validation. It also
requires the presence of Norwegian Birth Number-related keywords.

Table 45-811 Norwegian Birth Number narrow breadth patterns

Pattern

[01234567]\d[012345]\d[56789]\d[567]\d{4}

[01234567]\d[012345]\d\d\d[01234]\d{4}

[01234567]\d[012345]\d[456789]\d[9]\d{4}

[01234567]\d[012345]\d[0123]\d[56789]\d{4}

Table 45-812 Norwegian Birth Number narrow breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Norwegian Birth Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Norwegian birth number, birth number, birth no,
birthnumber#, birthbo#

fødselsnummer#, fødsel nummer, Fødsel nr, fødsel
nei, fødselnei#

People's Republic of China ID

The People's Republic of China ID is used for residential registration, army enrollment
registration, registration of marriage/divorce, traveling abroad, taking part in various national
exams, and other social or civil matters in China.
The People's Republic of China ID data identifier detects an 18-character ID number (17 digits
followed by a final digit or the letter X) that matches the People's Republic of China ID format.
The People's Republic of China ID data identifier provides two breadths of detection:
■ The wide breadth detects an 18-character ID number with checksum validation.
See “People's Republic of China ID wide breadth” on page 1385.
■ The narrow breadth detects an 18-character ID number with checksum validation. It also requires
the presence of People's Republic of China ID-related keywords.
See “People's Republic of China ID narrow breadth” on page 1385.

People's Republic of China ID wide breadth

The wide breadth detects an 18-character ID number with checksum validation.

Table 45-813 People's Republic of China ID wide-breadth pattern

Pattern

\d{18}

\d{17}[Xx]

Table 45-814 People's Republic of China ID wide-breadth validator

Mandatory validator Description

China ID checksum validator Computes the checksum and validates the pattern against
it.
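The final character of a People's Republic of China ID is a check character. It is commonly documented as the ISO 7064 MOD 11-2 value defined by GB 11643-1999, with X standing for 10. The following Python sketch implements that published rule. It assumes, without confirmation from this guide, that the China ID checksum validator is equivalent, and the function name is hypothetical.

```python
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
# Expected check character, indexed by (weighted sum mod 11).
CHECK_CHARS = "10X98765432"


def china_id_is_valid(candidate: str) -> bool:
    """Accepts both \\d{18} and \\d{17}[Xx] forms from Table 45-813."""
    if len(candidate) != 18 or not candidate[:17].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(candidate[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == candidate[17].upper()
```

For the commonly cited sample `11010519491231002X`, the weighted sum is 167. The remainder mod 11 is 2, which maps to X, so the function returns `True`.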

People's Republic of China ID narrow breadth

The narrow breadth detects an 18-character ID number with checksum validation. It also requires
the presence of People's Republic of China ID-related keywords.

Table 45-815 People's Republic of China ID narrow-breadth patterns
Pattern

\d{18}

\d{17}[Xx]

Table 45-816 People's Republic of China ID narrow-breadth validators
Mandatory validator Description

China ID checksum validator Computes the checksum and validates the pattern
against it.

Find keywords At least one of the following keywords or key
phrases must be present for the data to be matched
when you use this option.

Inputs:

身份证,居民信息,居民身份信息

Identity Card, Information of resident,
Information of resident identification

Poland Driver's Licence Number

Poland issues driving licences confirming the holder's right to drive motor vehicles.
The Poland Driver's Licence Number data identifier detects an 11-digit number (written as five,
two, and four digits separated by slashes) that matches the Poland Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects an 11-digit number that matches the Poland Driver's Licence
Number format. It checks for common test numbers.
See “Poland Driver's Licence Number wide breadth” on page 1386.
■ The narrow breadth detects an 11-digit number that matches the Poland Driver's Licence
Number format. It checks for common test numbers, and also requires the presence of
related keywords.
See “Poland Driver's Licence Number narrow breadth” on page 1387.

Poland Driver's Licence Number wide breadth

The wide breadth detects an 11-digit number that matches the Poland Driver's Licence Number
format. It checks for common test numbers.

Table 45-817 Poland Driver's Licence Number wide-breadth patterns

Pattern

\d{5}\/\d{2}\/\d{4}

Table 45-818 Poland Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.
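The guide does not define exactly which surrounding characters the Number delimiter validator accepts. The following Python sketch shows one plausible reading for the pattern in Table 45-817: a match is rejected when a letter or digit touches either end, or when all eleven digits are identical. Both interpretations, and the function name, are assumptions. The slash is written unescaped because Python's `re` treats `/` and `\/` identically.

```python
import re

# Pattern from Table 45-817. The lookarounds approximate "Number delimiter"
# (assumption): the match may not be glued to a letter or a digit.
LICENCE = re.compile(r"(?<![0-9A-Za-z])\d{5}/\d{2}/\d{4}(?![0-9A-Za-z])")


def find_licence_numbers(text: str) -> list:
    hits = []
    for m in LICENCE.finditer(text):
        digits = m.group(0).replace("/", "")
        # Duplicate digits validator: skip values such as 11111/11/1111.
        if len(set(digits)) > 1:
            hits.append(m.group(0))
    return hits
```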

Poland Driver's Licence Number narrow breadth

The narrow breadth detects an 11-digit number that matches the Poland Driver's Licence
Number format. It checks for common test numbers, and also requires the presence of related
keywords.

Table 45-819 Poland Driver's Licence Number narrow-breadth patterns

Pattern

\d{5}\/\d{2}\/\d{4}

Table 45-820 Poland Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

DLNo#, dlno#, DL#, Drivers Lic., driver licence, driver
license, drivers licence, drivers license, driver's
licence, driver's license, driving licence, driving
license, licence number, license number, driving permit

Kierowcy Lic., prawo jazdy, numer licencyjny,
zezwolenie na prowadzenie, PRAWO JAZDY

Poland European Health Insurance Number

The Polish European Health Insurance Number is a unique 20-digit identifier assigned to each
person using Polish health services.
The Poland European Health Insurance Number data identifier detects a 20-digit number that
matches the Polish European Health Insurance Number format.
This data identifier provides the following breadths of detection:

■ The wide breadth detects a 20-digit number that matches the Polish European Health
Insurance Number format. It checks for common test numbers.
See “Poland European Health Insurance Number wide breadth” on page 1388.
■ The narrow breadth detects a 20-digit number that matches the Polish European Health
Insurance Number format. It checks for common test numbers, and also requires the
presence of related keywords.
See “Poland European Health Insurance Number narrow breadth” on page 1388.

Poland European Health Insurance Number wide breadth

The wide breadth detects a 20-digit number that matches the Polish European Health Insurance
Number format. It checks for common test numbers.

Table 45-821 Poland European Health Insurance Number wide-breadth pattern

Pattern

80616000\d{2}\d{10}

Table 45-822 Poland European Health Insurance Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

80616000000000000000, 80616000111111111111,
80616000222222222222, 80616000333333333333,
80616000444444444444, 80616000555555555555,
80616000666666666666, 80616000777777777777,
80616000888888888888, 80616000999999999999

Poland European Health Insurance Number narrow breadth

The narrow breadth detects a 20-digit number that matches the Polish European Health
Insurance Number format. It checks for common test numbers, and also requires the presence
of related keywords.

Table 45-823 Poland European Health Insurance Number narrow-breadth pattern

Pattern

80616000\d{2}\d{10}

Table 45-824 Poland European Health Insurance Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

poland medical account number, health insurance
number, health insurance card, EHIC number, Numer
EHIC, Karta Ubezpieczenia Zdrowotnego, Europejska
Karta Ubezpieczenia Zdrowotnego, numer
ubezpieczenia zdrowotnego, numer rachunku
medycznego, ehic, ehic#, EHIC, EHIC#, medical
account number, medical account no, numer rachunku
medycznego, medical account#, health insurance no,
health insurance#

Poland Passport Number

A Polish passport is an international travel document issued to nationals of Poland. It may also
serve as proof of Polish citizenship.
The Poland Passport Number data identifier detects a nine-character alphanumeric pattern
that matches the Poland Passport Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern that matches the Poland
Passport Number format. It checks for common test patterns.
See “Poland Passport Number wide breadth” on page 1390.
■ The narrow breadth detects a nine-character alphanumeric pattern that matches the Poland
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.
See “Poland Passport Number narrow breadth” on page 1390.

Poland Passport Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern that matches the Poland
Passport Number format. It checks for common test patterns.

Table 45-825 Poland Passport Number wide-breadth patterns

Pattern

[a-zA-Z]{2}\d{7}

Table 45-826 Poland Passport Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

1111111, 2222222, 3333333, 4444444, 5555555,
6666666, 7777777, 8888888, 9999999

Poland Passport Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern that matches the Poland
Passport Number format. It checks for common test patterns, and also requires the presence
of related keywords.

Table 45-827 Poland Passport Number narrow-breadth patterns

Pattern

[a-zA-Z]{2}\d{7}

Table 45-828 Poland Passport Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

1111111, 2222222, 3333333, 4444444, 5555555,
6666666, 7777777, 8888888, 9999999

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

passport, passport number, passportnumber,
passport#, passport no, passport book

paszport#, numer paszportu, Nr paszportu, paszport,
książka paszportowa, passeport, nombre, numéro de
passeport, passeport#, No de passeport

Poland Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. For Poland, VAT is
administered by the VAT office for the region in which the business is established.
The Poland Value Added Tax (VAT) Number data identifier detects a 12-character alphanumeric
pattern that matches the Poland VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 12-character alphanumeric pattern that matches the Poland
VAT Number format without checksum validation. It checks for common test patterns.
See “Poland Value Added Tax (VAT) Number wide breadth” on page 1392.
■ The medium breadth detects a 12-character alphanumeric pattern that matches the Poland
VAT Number format with checksum validation.
See “Poland Value Added Tax (VAT) Number medium breadth” on page 1392.
■ The narrow breadth detects a 12-character alphanumeric pattern that matches the Poland
VAT Number format with checksum validation. It checks for common test patterns, and
also requires the presence of related keywords.
See “Poland Value Added Tax (VAT) Number narrow breadth” on page 1393.

Poland Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 12-character alphanumeric pattern that matches the Poland VAT
Number format without checksum validation. It checks for common test patterns.

Table 45-829 Poland Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Pp][Ll]\d{10}

[Pp][Ll] \d{10}

[Pp][Ll]\d{3}-\d{3}-\d{2}-\d{2}

[Pp][Ll]\d{3}-\d{2}-\d{2}-\d{3}

[Pp][Ll] \d{3}-\d{3}-\d{2}-\d{2}

[Pp][Ll] \d{3}-\d{2}-\d{2}-\d{3}

Table 45-830 Poland Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

1111111111, 2222222222, 3333333333, 4444444444,
5555555555, 6666666666, 7777777777, 8888888888,
9999999999
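A Polish VAT number consists of the prefix PL followed by the ten-digit tax identification number (NIP). The six patterns in Table 45-829 differ only in the optional space after PL and in how hyphens group the digits, so every match reduces to the same ten digits before any exclusion or checksum step. The following Python sketch performs that reduction. The extraction function is a hypothetical helper, not a product API.

```python
import re

# The six patterns listed in Table 45-829.
PATTERNS = [
    r"[Pp][Ll]\d{10}",
    r"[Pp][Ll] \d{10}",
    r"[Pp][Ll]\d{3}-\d{3}-\d{2}-\d{2}",
    r"[Pp][Ll]\d{3}-\d{2}-\d{2}-\d{3}",
    r"[Pp][Ll] \d{3}-\d{3}-\d{2}-\d{2}",
    r"[Pp][Ll] \d{3}-\d{2}-\d{2}-\d{3}",
]
PL_VAT = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def extract_vat_digits(text: str) -> list:
    """Return the ten NIP digits behind each PL VAT match found in text."""
    return ["".join(c for c in m.group(0) if c.isdigit()) for m in PL_VAT.finditer(text)]
```

For example, `PL 123-456-32-18` and `pl1234563218` both reduce to `1234563218`.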

Poland Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 12-character alphanumeric pattern that matches the Poland
VAT Number format with checksum validation.

Table 45-831 Poland Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Pp][Ll]\d{10}

[Pp][Ll] \d{10}

[Pp][Ll]\d{3}-\d{3}-\d{2}-\d{2}

[Pp][Ll]\d{3}-\d{2}-\d{2}-\d{3}

[Pp][Ll] \d{3}-\d{3}-\d{2}-\d{2}

[Pp][Ll] \d{3}-\d{2}-\d{2}-\d{3}

Table 45-832 Poland Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Poland VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Poland Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 12-character alphanumeric pattern that matches the Poland
VAT Number format with checksum validation. It checks for common test patterns, and also
requires the presence of related keywords.

Table 45-833 Poland Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Pp][Ll]\d{10}

[Pp][Ll] \d{10}

[Pp][Ll]\d{3}-\d{3}-\d{2}-\d{2}

[Pp][Ll]\d{3}-\d{2}-\d{2}-\d{3}

[Pp][Ll] \d{3}-\d{3}-\d{2}-\d{2}

[Pp][Ll] \d{3}-\d{2}-\d{2}-\d{3}

Table 45-834 Poland Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

1111111111, 2222222222, 3333333333, 4444444444,
5555555555, 6666666666, 7777777777, 8888888888,
9999999999

Poland VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat number, value added tax, vat, VAT, VAT#, vat#
VATIN, vatin

Numer Identyfikacji Podatkowej, NIP, nip, Liczba VAT,
podatek od wartosci dodanej, faktura VAT, faktura
VAT#

Polish Identification Number

Every Polish citizen 18 years of age or older residing permanently in Poland must have an
Identity Card, with a unique personal number. The number is used as identification for almost
all purposes.
The Polish Identification Number data identifier detects a nine-character alphanumeric pattern
(three letters followed by six digits) that matches the Polish Identification Number format.
The Polish ID Number system data identifier provides three breadths of detection:
■ The wide breadth detects a nine-character alphanumeric pattern without checksum validation.
See “Polish Identification Number wide breadth” on page 1394.
■ The medium breadth detects a nine-character alphanumeric pattern with checksum validation.
See “Polish Identification Number medium breadth” on page 1395.
■ The narrow breadth detects a nine-character alphanumeric pattern with checksum validation. It
also requires the presence of related keywords.
See “Polish Identification Number narrow breadth” on page 1395.

Polish Identification Number wide breadth

The wide breadth detects a nine-character alphanumeric pattern without checksum validation.

Table 45-835 Polish Identification Number wide-breadth pattern

Pattern

[A-Z]{3}\d{6}

Table 45-836 Polish Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Polish Identification Number medium breadth

The medium breadth detects a nine-character alphanumeric pattern with checksum validation.

Table 45-837 Polish Identification Number medium-breadth pattern

Pattern

[A-Z]{3}\d{6}

Table 45-838 Polish Identification Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Polish ID Number Validation Check Computes the checksum and validates the pattern against
it.
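In the Polish identity card number, the first digit after the three-letter series is a check digit. The following Python sketch applies the commonly published rule: letters A to Z count as 10 to 35, the nine characters are weighted 7, 3, 1, 9, 7, 3, 1, 7, 3, and the weighted sum must be divisible by 10. It is an illustration of that standard rule. It is not confirmed to be the Polish ID Number Validation Check itself, and the function names are hypothetical.

```python
import re

WEIGHTS = (7, 3, 1, 9, 7, 3, 1, 7, 3)


def _value(ch: str) -> int:
    """Letters A-Z count as 10-35; digits count as themselves."""
    return ord(ch) - ord("A") + 10 if ch.isalpha() else int(ch)


def polish_id_is_valid(candidate: str) -> bool:
    # Same shape as the [A-Z]{3}\d{6} pattern, restricted to ASCII digits.
    if re.fullmatch(r"[A-Z]{3}[0-9]{6}", candidate) is None:
        return False
    total = sum(_value(c) * w for c, w in zip(candidate, WEIGHTS))
    # Weight 9 on the check digit makes this equivalent to
    # "check digit == weighted sum of the other characters mod 10".
    return total % 10 == 0
```

For the frequently quoted specimen value `ABA300000`, the weighted sum is 140, so the function returns `True`.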

Polish Identification Number narrow breadth

The narrow breadth detects a nine-character alphanumeric pattern with checksum validation. It also
requires the presence of related keywords.

Table 45-839 Polish ID Number narrow-breadth pattern

Pattern

[A-Z]{3}\d{6}

Table 45-840 Polish Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Polish ID Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

national identification number, personal identification
number, personal identity no, unique identity number,
nationalidno#, personal ID, personal identity,
personalidentityno#, uniqueid#, nationalid#,
natioanlidentity#

Dowód osobisty, Tożsamości narodowej, osobisty
numer identyfikacyjny, niepowtarzalny numer, numer
identyfikacyjny, Dowódosobisty#,
niepowtarzalnynumer#

Polish REGON Number

In Poland, every national economy entity must register in the register of business entities
called REGON. REGON is the only integrated register in Poland that covers all national economy
entities, and each company has a unique REGON number.
The Polish REGON Number data identifier detects a 14-digit number that matches the REGON
Number format.
The Polish REGON Number system data identifier provides three breadths of detection:
■ The wide breadth detects a 14-digit number without checksum validation.
See “Polish REGON Number wide breadth” on page 1396.
■ The medium breadth detects a 14-digit number with checksum validation.
See “Polish REGON Number medium breadth” on page 1397.
■ The narrow breadth detects a 14-digit number with checksum validation. It also requires
the presence of related keywords.
See “Polish REGON Number narrow breadth” on page 1397.

Polish REGON Number wide breadth

The wide breadth detects a 14-digit number without checksum validation.

Table 45-841 Polish REGON Number wide-breadth patterns

Patterns

\d{14}

\d{9}-\d{5}

Table 45-842 Polish REGON Number wide breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Polish REGON Number medium breadth

The medium breadth detects a 14-digit number with checksum validation.

Table 45-843 Polish REGON Number medium-breadth patterns

Patterns

\d{14}

\d{9}-\d{5}

Table 45-844 Polish REGON Number medium-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Polish REGON Number Validation Check Computes the checksum and validates the pattern against
it.
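The last digit of a 14-digit REGON number is a modulus 11 check digit, and a remainder of 10 is written as 0. The following Python sketch applies the commonly published weights 2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8 and accepts both layouts from Table 45-843. The guide does not say that the Polish REGON Number Validation Check uses this rule; that is an assumption, and the function name is hypothetical.

```python
import re

WEIGHTS_14 = (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8)


def regon14_is_valid(candidate: str) -> bool:
    digits = candidate.replace("-", "")  # accept the \d{9}-\d{5} layout
    if re.fullmatch(r"[0-9]{14}", digits) is None:
        return False
    total = sum(int(d) * w for d, w in zip(digits, WEIGHTS_14))
    check = total % 11
    if check == 10:
        check = 0  # a remainder of 10 is written as 0
    return check == int(digits[13])
```

For example, `12345678512347` has a weighted sum of 260 and a remainder of 7, which matches its last digit, so both `12345678512347` and `123456785-12347` pass.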

Polish REGON Number narrow breadth

The narrow breadth detects a 14-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-845 Polish REGON Number narrow-breadth patterns

Patterns

\d{14}

\d{9}-\d{5}

Table 45-846 Polish REGON Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Polish REGON Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

REGON ID, statistical number, statistical ID, statistical
no, REGON number, regonid#, REGONID#, regonno#,
company ID, companyID#, company ID no, company
ID number, companyIDno#

numer statystyczny, REGON, numeru REGON,
numerstatystyczny#, numeruREGON#

Polish Social Security Number (PESEL)

The Polish Social Security Number (PESEL) is the national identification number used in
Poland. The PESEL number is mandatory for all permanent residents of Poland and for
temporary residents living in Poland. It uniquely identifies a person and cannot be transferred
to another person.
The Polish Social Security Number (PESEL) data identifier detects an 11-digit number that
matches the PESEL format.
The Polish Social Security Number (PESEL) system data identifier provides three breadths of
detection:
■ The wide breadth detects an 11-digit number without checksum validation.
See “Polish Social Security Number (PESEL) wide breadth” on page 1399.
■ The medium breadth detects an 11-digit number with checksum validation.
See “Polish Social Security Number (PESEL) medium breadth” on page 1399.
■ The narrow breadth detects an 11-digit number with checksum validation. It also requires
the presence of related keywords.
See “Polish Social Security Number (PESEL) narrow breadth” on page 1399.

Polish Social Security Number (PESEL) wide breadth

The wide breadth detects an 11-digit number without checksum validation.

Table 45-847 Polish Social Security Number (PESEL) wide-breadth pattern

Pattern

\d{2}[012389]\d[0-3]\d{6}

Table 45-848 Polish Social Security Number (PESEL) wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Polish Social Security Number (PESEL) medium breadth

The medium breadth detects an 11-digit number with checksum validation.

Table 45-849 Polish Social Security Number (PESEL) medium breadth pattern

Pattern

\d{2}[012389]\d[0-3]\d{6}

Table 45-850 Polish Social Security Number (PESEL) medium breadth validators

Mandatory validator Description

Polish Social Security Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.
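A PESEL number ends with a check digit computed from the first ten digits using the commonly published weights 1, 3, 7, 9, 1, 3, 7, 9, 1, 3. The month field also encodes the century, which explains the `[012389]` class in Table 45-849: months 01-12 indicate the 1900s, 21-32 the 2000s, and 81-92 the 1800s. The following Python sketch shows both ideas. It assumes, without confirmation from this guide, that the Polish Social Security Number Validation Check uses the same checksum. The function names are hypothetical.

```python
import re

WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_is_valid(candidate: str) -> bool:
    if re.fullmatch(r"[0-9]{11}", candidate) is None:
        return False
    total = sum(int(d) * w for d, w in zip(candidate, WEIGHTS))
    return (10 - total % 10) % 10 == int(candidate[10])


def pesel_century(candidate: str) -> int:
    """Century implied by the month field (offset 80: 1800s, 20: 2000s).

    Months 41-72 (2100s and 2200s) fall outside the [012389] class of the
    pattern above and are not handled here.
    """
    month = int(candidate[2:4])
    if month > 80:
        return 1800
    if month > 20:
        return 2000
    return 1900
```

For example, the often-quoted sample `44051401359` has a weighted sum of 101, so its check digit is 9 and `pesel_is_valid` returns `True`. Its month field, 05, places it in the 1900s.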

Polish Social Security Number (PESEL) narrow breadth

The narrow breadth detects an 11-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-851 Polish Social Security Number (PESEL) narrow breadth patterns

Pattern

\d{2}[012389]\d[0-3]\d{6}

Table 45-852 Polish Social Security Number (PESEL) narrow breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Polish Social Security Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

PESEL ID, polish SSN, social security number, social
security no, SSN#, PESELID#, peselno#, pesel number,
social security code

PESEL Liczba, społeczny bezpieczeństwo liczba,
społeczny bezpieczeństwo ID, społeczny
bezpieczeństwo kod, PESELliczba#,
społecznybezpieczeństwoliczba#

Polish Tax Identification Number

The Polish Tax Identification Number (NIP) is a number that the government assigns to every
Polish citizen who works or does business in Poland. All taxpayers have a tax identification
number called NIP.
The Polish Tax Identification Number data identifier detects a 10-digit number that matches
the NIP format.
The Polish Tax Identification Number (NIP) system data identifier provides three breadths of
detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Polish Tax Identification Number wide breadth” on page 1401.
■ The medium breadth detects a 10-digit number with checksum validation.
See “Polish Tax Identification Number medium breadth” on page 1401.
■ The narrow breadth detects a 10-digit number with checksum validation. It also requires
the presence of related keywords.
See “Polish Tax Identification Number narrow breadth” on page 1401.

Polish Tax Identification Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-853 Polish Tax Identification Number wide-breadth patterns

Pattern

\d{10}

\d{3}[ -]\d{3}[ -]\d{2}[ -]\d{2}

\d{3}[ -]\d{2}[ -]\d{2}[ -]\d{3}

Table 45-854 Polish Tax Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Polish Tax Identification Number medium breadth

The medium breadth detects a 10-digit number with checksum validation.

Table 45-855 Polish Tax Identification Number medium-breadth patterns

Pattern

\d{10}

\d{3}[ -]\d{3}[ -]\d{2}[ -]\d{2}

\d{3}[ -]\d{2}[ -]\d{2}[ -]\d{3}

Table 45-856 Polish Tax Identification Number medium-breadth validators

Mandatory validator Description

Polish Social Security Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Polish Tax Identification Number narrow breadth

The narrow breadth detects a 10-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-857 Polish Tax Identification Number narrow-breadth patterns

Pattern

\d{10}

\d{3}[ -]\d{3}[ -]\d{2}[ -]\d{2}

\d{3}[ -]\d{2}[ -]\d{2}[ -]\d{3}

Table 45-858 Polish Tax Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Polish Tax ID Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Tax Number, tax number, tax no., taxno#, taxnumber#, taxnumber, NIP, NIP#, Tax ID, taxid#,
TAXID#, NIP ID, NIPID#, nip#, tax identification number, tax identification no., VAT Number,
VAT No., vatno#, VAT ID, VATID#

Numer Identyfikacji Podatkowej, Polski numer identyfikacji podatkowej,
NumerIdentyfikacjiPodatkowej#, NIP
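The narrow breadth stacks several checks on top of the patterns. This hypothetical Python sketch shows one way to combine them. It assumes that the Number delimiter validator rejects matches that touch other letters, digits, or hyphens, it treats a keyword as present if it appears anywhere in the same text, and it uses only a subset of the keyword list above. None of these choices is taken from the product, and `narrow_nip_matches` is an illustrative name.

```python
import re

# Narrow-breadth patterns from Table 45-857 combined into one expression.
# The lookarounds are an assumed stand-in for the Number delimiter validator.
NIP_PATTERN = re.compile(
    r"(?<![\w-])"
    r"(?:\d{10}|\d{3}[ -]\d{3}[ -]\d{2}[ -]\d{2}|\d{3}[ -]\d{2}[ -]\d{2}[ -]\d{3})"
    r"(?![\w-])"
)
# A few of the documented keywords, compared case-insensitively (assumed).
KEYWORDS = ("nip", "tax id", "tax number", "vat number", "numer identyfikacji podatkowej")
WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def narrow_nip_matches(text: str) -> list[str]:
    """Hypothetical sketch of narrow-breadth NIP detection."""
    lowered = text.lower()
    if not any(keyword in lowered for keyword in KEYWORDS):  # Find keywords
        return []
    hits = []
    for match in NIP_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if len(set(digits)) == 1:  # Duplicate digits
            continue
        remainder = sum(int(d) * w for d, w in zip(digits, WEIGHTS)) % 11
        if remainder != 10 and remainder == int(digits[9]):  # checksum
            hits.append(match.group())
    return hits


print(narrow_nip_matches("NIP: 123-456-32-18, ref 1111111111"))  # ['123-456-32-18']
print(narrow_nip_matches("Invoice 1234563218"))                  # []
```

Plain substring matching means that a word such as "snip" would also satisfy the keyword step in this sketch; a stricter version would match keywords on word boundaries.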

Portugal Driver's Licence Number

The Institute for Mobility and Land Transport (IMTT) issues driver's licenses in Portugal.
The Portugal Driver's Licence Number data identifier detects an 8- to 10-character alphanumeric
pattern that matches the Portugal Driver's Licence Number format.
The Portugal Driver's Licence Number data identifier provides two breadths of detection:
■ The wide breadth detects an 8- to 10-character alphanumeric pattern without checksum
validation.
See “Portugal Driver's Licence Number wide breadth” on page 1403.
■ The narrow breadth detects an 8- to 10-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.
See “Portugal Driver's Licence Number narrow breadth” on page 1403.

Portugal Driver's Licence Number wide breadth

The wide breadth detects an 8- to 10-character alphanumeric pattern without checksum
validation.

Table 45-859 Portugal Driver's Licence Number wide-breadth patterns

Patterns

[A-Za-z]{2}-\d{5,6} \d

[A-Za-z]-\d{6,8} \d

Table 45-860 Portugal Driver's Licence Number wide-breadth validator

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
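The wide-breadth patterns use standard regular-expression syntax, so they can be tried out with Python's `re` module. In the sketch below, the lookarounds are an assumed approximation of the Number delimiter validator (a match may not be directly preceded or followed by a letter or digit); the guide does not define the exact rule, and `find_pt_licences` is a hypothetical name.

```python
import re

# Wide-breadth patterns from Table 45-859. The lookarounds approximate the
# Number delimiter validator; this is an assumption, not the product's rule.
PT_DL_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])"
    r"(?:[A-Za-z]{2}-\d{5,6} \d|[A-Za-z]-\d{6,8} \d)"
    r"(?![A-Za-z0-9])"
)


def find_pt_licences(text: str) -> list[str]:
    """Hypothetical helper: return wide-breadth Portugal licence candidates."""
    return PT_DL_PATTERN.findall(text)


print(find_pt_licences("Licence AV-123456 7 issued"))  # ['AV-123456 7']
print(find_pt_licences("XAV-123456 7"))               # []
```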

Portugal Driver's Licence Number narrow breadth

The narrow breadth detects an 8- to 10-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.

Table 45-861 Portugal Driver's Licence Number narrow-breadth patterns

Patterns

[A-Za-z]{2}-\d{5,6} \d

[A-Za-z]-\d{6,8} \d

Table 45-862 Portugal Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

DLNo#, dlno#, DL#, Drivers Lic., driver licence, driver license, drivers licence, drivers
license, driver's licence, driver's license, driving licence, driving license, licence number,
license number, driving permit, portugal driving license

carteira de motorista, carteira motorista, carteira de habilitação, carteira habilitação,
número de licença, número licença, permissão de condução, permissão condução, Licença
condução Portugal, carta de condução

Portugal National Identification Number

The national identification number is a unique identification number that usually appears on
documents such as the citizen card, which the Portuguese government issues to its citizens. The
citizen card can be used as a travel document within the EU and some other European countries.
The Portugal National Identification Number data identifier detects a seven- to nine-character
alphanumeric pattern that matches the Portugal National Identification Number format.
The Portugal National Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a seven- to nine-character alphanumeric pattern without checksum
validation.
See “Portugal National Identification Number wide breadth” on page 1405.
■ The medium breadth detects a seven- to nine-character alphanumeric pattern with checksum
validation.
See “Portugal National Identification Number medium breadth” on page 1405.
■ The narrow breadth detects a seven- to nine-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.
See “Portugal National Identification Number narrow breadth” on page 1406.

Portugal National Identification Number wide breadth

The wide breadth detects a seven- to nine-character alphanumeric pattern without checksum
validation.

Table 45-863 Portugal National Identification Number wide-breadth patterns

Patterns

\d{8}

\d{7} \d

\d{7}-\d

\d{9}

\d{9}\l{2}\d

\d{8} \d

\d{8}-\d

\d{8} \d \l{2}\d

\d{8}-\d-\l{2}\d

Table 45-864 Portugal National Identification Number wide-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.
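Several of these patterns use `\l`. Judging from the citizen card layout (for example, `\d{8} \d \l{2}\d`), this sketch assumes that `\l` stands for a single letter. Python's `re` module rejects `\l` as a bad escape, so the sketch rewrites it as `[A-Za-z]` before testing a whole candidate against each pattern. The helper names are hypothetical.

```python
import re

# Wide-breadth patterns from Table 45-863, in the guide's notation.
NIN_PATTERNS = [
    r"\d{8}", r"\d{7} \d", r"\d{7}-\d", r"\d{9}", r"\d{9}\l{2}\d",
    r"\d{8} \d", r"\d{8}-\d", r"\d{8} \d \l{2}\d", r"\d{8}-\d-\l{2}\d",
]


def to_python_regex(pattern: str) -> str:
    # Assumption: the guide's letter class is rewritten as an ASCII letter class.
    return pattern.replace(r"\l", "[A-Za-z]")


COMPILED = [re.compile(to_python_regex(p)) for p in NIN_PATTERNS]


def matches_nin_pattern(candidate: str) -> bool:
    """Hypothetical check: does the whole candidate fit one of the patterns?"""
    return any(rx.fullmatch(candidate) for rx in COMPILED)


print(matches_nin_pattern("12345678 9 ZZ4"))  # True
print(matches_nin_pattern("12345678 9 Z4"))   # False
```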

Portugal National Identification Number medium breadth

The medium breadth detects a seven- to nine-character alphanumeric pattern with checksum
validation.

Table 45-865 Portugal National Identification Number medium-breadth patterns

Patterns

\d{8}

\d{7} \d

\d{7}-\d

\d{9}

\d{9}\l{2}\d

\d{8} \d

\d{8}-\d

\d{8} \d \l{2}\d

\d{8}-\d-\l{2}\d

Table 45-866 Portugal National Identification Number medium-breadth validator

Mandatory validator Description

Portugal National Identification Number Validation Check Computes the checksum and validates the pattern against it.

Portugal National Identification Number narrow breadth

The narrow breadth detects a seven- to nine-character alphanumeric pattern with checksum
validation. It also requires the presence of related keywords.

Table 45-867 Portugal National Identification Number narrow-breadth patterns

Patterns

\d{8}

\d{7} \d

\d{7}-\d

\d{9}

\d{9}\l{2}\d

\d{8} \d

\d{8}-\d

\d{8} \d \l{2}\d

\d{8}-\d-\l{2}\d

Table 45-868 Portugal National Identification Number narrow-breadth validators

Mandatory validators Description

Portugal National Identification Number Validation Check Computes the checksum and validates the pattern against it.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

id number, portugal bi number, NIC, nic, document number, citizen card, identity card number,
identity card no, national identity card number, national identity card no, national
identification number, national identification no, identification number, identification no

bilhete de identidade, número de identificação civil, número de cartão de cidadão, documento
de identificação, cartão de cidadão, número bi de portugal, número do documento

Number delimiter Validates a match by checking the surrounding characters.

Portugal Passport Number

Portuguese passports are issued to citizens of Portugal for international travel. The passport,
along with the national identity card, allows free movement and residence in any state of the
European Union and the European Economic Area.
The Portugal Passport Number data identifier detects a seven-character alphanumeric pattern
that matches the Portugal Passport Number format.
The Portugal Passport Number data identifier provides two breadths of detection:
■ The wide breadth detects a seven-character alphanumeric pattern without validation.
See “Portugal Passport Number wide breadth” on page 1408.
■ The narrow breadth detects a seven-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.
See “Portugal Passport Number narrow breadth” on page 1408.

Portugal Passport Number wide breadth

The wide breadth detects a seven-character alphanumeric pattern without validation.

Table 45-869 Portugal Passport Number wide-breadth pattern

Pattern

[a-zA-Z]\d{6}

Portugal Passport Number narrow breadth

The narrow breadth detects a seven-character alphanumeric pattern without checksum
validation. It requires the presence of related keywords.

Table 45-870 Portugal Passport Number narrow-breadth pattern

Pattern

[a-zA-Z]\d{6}

Table 45-871 Portugal Passport Number narrow-breadth validators

Mandatory validator Description

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

passport number, passport, passport no, passaporte, passeport, portuguese passport,
portuguese passeport, portuguese passaporte, passaporte nº, passeport nº

Number delimiter Validates a match by checking the surrounding characters.
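The narrow breadth reports a pattern match only when a related keyword is also present. In this hypothetical Python sketch, "present" is read as "anywhere in the same text", keywords are compared case-insensitively, and only a subset of the list above is used; the guide does not state whether the product limits keywords to a window around the match. The lookarounds again stand in for the Number delimiter validator by assumption.

```python
import re

# Table 45-870 pattern; the lookarounds approximate the Number delimiter
# validator (an assumption, since its exact rule is not documented here).
PASSPORT = re.compile(r"(?<![A-Za-z0-9])[a-zA-Z]\d{6}(?![A-Za-z0-9])")
# A subset of the documented keywords, compared case-insensitively (assumed).
KEYWORDS = ("passport", "passaporte", "passeport")


def narrow_pt_passports(text: str) -> list[str]:
    """Hypothetical narrow-breadth sketch: report matches only if a keyword occurs."""
    lowered = text.lower()
    if not any(keyword in lowered for keyword in KEYWORDS):
        return []
    return PASSPORT.findall(text)


print(narrow_pt_passports("Passaporte nº N123456"))  # ['N123456']
print(narrow_pt_passports("Order N123456"))          # []
```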

Portugal Tax Identification Number

A fiscal number (NIF, número de identificação fiscal) is a tax identification number that
Portugal issues to anyone who needs to conduct official business in Portugal.
The Portugal Tax Identification Number data identifier detects a nine-digit number in the
Portugal Tax Identification Number format.
The Portugal Tax Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a nine-digit number without checksum validation.
See “Portugal Tax Identification Number wide breadth” on page 1409.
■ The medium breadth detects a nine-digit number with checksum validation.
See “Portugal Tax Identification Number medium breadth” on page 1409.
■ The narrow breadth detects a nine-digit number with checksum validation. It also requires
the presence of related keywords.
See “Portugal Tax Identification Number narrow breadth” on page 1410.

Portugal Tax Identification Number wide breadth

The wide breadth detects a nine-digit number without checksum validation.

Table 45-872 Portugal Tax Identification Number wide-breadth patterns

Patterns

\d{9}

\d{3}-\d{3}-\d{3}

\d{3} \d{3} \d{3}

\d{3}.\d{3}.\d{3}

Table 45-873 Portugal Tax Identification Number wide-breadth validators

Mandatory validators Description

Duplicate digits Ensures that a string of digits is not all the same.

Portugal Tax Identification Number medium breadth

The medium breadth detects a nine-digit number with checksum validation.

Table 45-874 Portugal Tax Identification Number medium-breadth patterns

Patterns

\d{9}

\d{3}-\d{3}-\d{3}

\d{3} \d{3} \d{3}

\d{3}.\d{3}.\d{3}

\d{3}.\d{3}.\d{3}

Table 45-875 Portugal Tax Identification Number medium-breadth validator

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Portugal Tax and VAT Identification Number Validation Check Computes the checksum and validates the match against it.

Number delimiter Validates a match by checking the surrounding characters.
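As a hedged sketch, the medium-breadth checks for a Portuguese fiscal number can be approximated with the public mod-11 rule: the weights 9 down to 2 apply to the first eight digits, and the check digit is 0 when the remainder is 0 or 1, and 11 minus the remainder otherwise. The guide does not document the algorithm behind the Portugal Tax and VAT Identification Number Validation Check, so treat this as an assumption; `pt_nif_ok` is a hypothetical name. Because every excluded value is nine identical digits, the exclusion step reduces to rejecting repeated-digit numbers.

```python
import re


def pt_nif_ok(candidate: str) -> bool:
    """Hypothetical sketch of the medium-breadth checks for a Portuguese NIF."""
    digits = re.sub(r"\D", "", candidate)  # drop hyphen, space, or dot separators
    if len(digits) != 9:
        return False
    if len(set(digits)) == 1:  # same effect as the Exclude ending characters list
        return False
    # Weights 9, 8, 7, 6, 5, 4, 3, 2 over the first eight digits (public rule, assumed).
    total = sum(int(d) * w for d, w in zip(digits[:8], range(9, 1, -1)))
    remainder = total % 11
    check = 0 if remainder < 2 else 11 - remainder
    return check == int(digits[8])


print(pt_nif_ok("123 456 789"))  # True
print(pt_nif_ok("123456780"))    # False
```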

Portugal Tax Identification Number narrow breadth

The narrow breadth detects a nine-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-876 Portugal Tax Identification Number narrow-breadth patterns

Patterns

\d{9}

\d{3}-\d{3}-\d{3}

\d{3} \d{3} \d{3}

\d{3}.\d{3}.\d{3}

\d{3}.\d{3}.\d{3}

Table 45-877 Portugal Tax Identification Number narrow-breadth validators

Mandatory validator Description

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Portugal Tax and VAT Identification Number Validation Check Computes the checksum and validates the match against it.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

TIN#, NIF#, tax identification number, taxpayer identification number, tax id number, tax id
no, tax id

CPF, CPF#, NIF, número identificação fiscal

Number delimiter Validates a match by checking the surrounding characters.

Portugal Value Added Tax (VAT) Number

VAT is a consumption tax that is borne by the end consumer. VAT is paid for each transaction
in the manufacturing and distribution process.
The Portugal Value Added Tax (VAT) Number data identifier detects an 11-character
alphanumeric pattern that matches the Portugal Value Added Tax (VAT) Number format.
The Portugal Value Added Tax (VAT) Number data identifier provides three breadths of
detection:
■ The wide breadth detects an 11-character alphanumeric pattern starting with PT and followed
by nine digits without checksum validation.
See “Portugal Value Added Tax (VAT) Number wide breadth” on page 1412.
■ The medium breadth detects an 11-character alphanumeric pattern starting with PT and
followed by nine digits with checksum validation.
See “Portugal Value Added Tax (VAT) Number medium breadth” on page 1412.
■ The narrow breadth detects an 11-character alphanumeric pattern starting with PT and
followed by nine digits with checksum validation. It also requires the presence of related
keywords.
See “Portugal Value Added Tax (VAT) Number narrow breadth” on page 1413.

Portugal Value Added Tax (VAT) Number wide breadth

The wide breadth detects an 11-character alphanumeric pattern starting with PT and followed
by nine digits without checksum validation.

Table 45-878 Portugal Value Added Tax (VAT) Number wide-breadth patterns

Patterns

[Pp][Tt]\d{9}

[Pp][Tt] \d{9}

[Pp][Tt]-\d{9}

[Pp][Tt] \d{3} \d{4} \d{2}

[Pp][Tt] \d{3}-\d{3}-\d{3}

[Pp][Tt] \d{3} \d{3} \d{3}

Table 45-879 Portugal Value Added Tax (VAT) Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Portugal Value Added Tax (VAT) Number medium breadth

The medium breadth detects an 11-character alphanumeric pattern starting with PT and followed
by nine digits with checksum validation.

Table 45-880 Portugal Value Added Tax (VAT) Number medium-breadth patterns

Patterns

[Pp][Tt]\d{9}

[Pp][Tt] \d{9}

[Pp][Tt]-\d{9}

[Pp][Tt] \d{3} \d{4} \d{2}

[Pp][Tt] \d{3}-\d{3}-\d{3}

[Pp][Tt] \d{3} \d{3} \d{3}

Table 45-881 Portugal Value Added Tax (VAT) Number medium-breadth validator

Mandatory validator Description

Portugal Tax and VAT Identification Number Validation Check Computes the checksum and validates the pattern against it.
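Because the VAT medium breadth uses the same validator as the tax identification number, one hedged way to sketch it is to accept only the six listed layouts, strip the PT prefix and separators, and reuse the public NIF mod-11 rule. That reuse, and the name `pt_vat_ok`, are assumptions for illustration rather than a description of the product's internals.

```python
import re

# The six medium-breadth layouts from Table 45-880.
PT_VAT_PATTERNS = [
    r"[Pp][Tt]\d{9}",
    r"[Pp][Tt] \d{9}",
    r"[Pp][Tt]-\d{9}",
    r"[Pp][Tt] \d{3} \d{4} \d{2}",
    r"[Pp][Tt] \d{3}-\d{3}-\d{3}",
    r"[Pp][Tt] \d{3} \d{3} \d{3}",
]


def pt_vat_ok(candidate: str) -> bool:
    """Hypothetical sketch: layout check followed by the assumed NIF checksum."""
    if not any(re.fullmatch(p, candidate) for p in PT_VAT_PATTERNS):
        return False
    digits = re.sub(r"\D", "", candidate)  # drops "PT" and any separators
    total = sum(int(d) * w for d, w in zip(digits[:8], range(9, 1, -1)))
    remainder = total % 11
    check = 0 if remainder < 2 else 11 - remainder
    return check == int(digits[8])


print(pt_vat_ok("PT501964843"))     # True
print(pt_vat_ok("PT-123 456 789"))  # False: not one of the listed layouts
```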

Portugal Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects an 11-character alphanumeric pattern starting with PT and followed
by nine digits with checksum validation. It also requires the presence of related keywords.

Table 45-882 Portugal Value Added Tax (VAT) Number narrow-breadth patterns

Patterns

[Pp][Tt]\d{9}

[Pp][Tt] \d{9}

[Pp][Tt]-\d{9}

[Pp][Tt] \d{3} \d{4} \d{2}

[Pp][Tt] \d{3}-\d{3}-\d{3}

[Pp][Tt] \d{3} \d{3} \d{3}


Table 45-883 Portugal Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validators Description

Portugal Tax and VAT Identification Number Validation Check Computes the checksum and validates the pattern against it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

portugal vat number, portugal vat no, vat number, NUPC, vat no, vat, VAT#, vat code, value
added tax number, vat id, vat registration number, value added tax, vat reg no

imposto sobre valor acrescentado, VAT nº, número iva, vat não, cuba, código iva

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

000000000, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777,
888888888, 999999999

Randomized US Social Security Number (SSN)

The Randomized US Social Security Number (SSN) data identifier detects 9-digit numbers
with the pattern DDD-DD-DDDD, separated with dashes or spaces or without separators. The
number must be in valid assigned number ranges. Pattern validators eliminate common test
numbers, such as 123456789 or numbers that repeat the same digit. The narrow breadth also
requires the presence of a Social Security-related keyword.
See “Updating policies to use the Randomized US SSN data identifier” on page 810.
See “Use the Randomized US SSN data identifier to detect SSNs” on page 836.
The Randomized US SSN data identifier provides two breadths of detection:
■ The medium breadth detects a 9-digit number in the format DDD-DD-DDDD. The digits
must be in assigned number ranges.
See “Randomized US Social Security Number (SSN) medium breadth” on page 1415.
■ The narrow breadth detects a 9-digit number in the format DDD-DD-DDDD. The digits must
be in assigned number ranges. It also requires the presence of SSN-related keywords.
See “Randomized US Social Security Number (SSN) narrow breadth” on page 1415.

Randomized US Social Security Number (SSN) medium breadth

The medium breadth detects a 9-digit number in the format DDD-DD-DDDD. The digits must
be in assigned number ranges.

Table 45-884 Randomized US SSN medium-breadth patterns and normalizer

Patterns
Value:
[0-8]\d{2} \d{1}[1-9] \d{4}
[0-8]\d{3}[1-9]\d{4}
[0-8]\d{2}[1-9]\d{5}
[0-8]\d{2}-\d{1}[1-9]-\d{4}
[0-8]\d{2} [1-9]\d{1} \d{4}
[0-8]\d{2}-[1-9]\d{1}-\d{4}
Description: Detects 9-digit numbers with the pattern DDD-DD-DDDD, separated with dashes,
spaces, or none. The number must be in valid assigned number ranges.

Data normalizer
Value: Digits
Description: See “About data normalizers” on page 733.

Table 45-885 Randomized US SSN medium-breadth validators and input

Exclude beginning characters
Input: 666, 000, 123456789, 111111111, 222222222, 333333333, 444444444, 555555555,
666666666, 77777777, 888888888
Description: See “Using pattern validators” on page 818.

Number delimiter

Exclude ending characters
Input: 0000

Randomized US Social Security Number Validation Check
Description: Computes the checksum and validates the pattern against it.
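The pattern, normalizer, and exclusion steps in Tables 45-884 and 45-885 can be sketched as follows. This is not the product's implementation: the algorithm behind the Randomized US Social Security Number Validation Check is not described in this guide, so the sketch leaves it out, and `ssn_candidate_ok` is a hypothetical name. The exclusion values are copied from the table as printed, including the eight-digit 77777777 entry.

```python
import re

# Medium-breadth patterns from Table 45-884, matched against a whole candidate.
SSN_PATTERNS = [
    r"[0-8]\d{2} \d{1}[1-9] \d{4}",
    r"[0-8]\d{3}[1-9]\d{4}",
    r"[0-8]\d{2}[1-9]\d{5}",
    r"[0-8]\d{2}-\d{1}[1-9]-\d{4}",
    r"[0-8]\d{2} [1-9]\d{1} \d{4}",
    r"[0-8]\d{2}-[1-9]\d{1}-\d{4}",
]
# Values from the Exclude beginning characters row of Table 45-885.
EXCLUDE_BEGINNING = ("666", "000", "123456789", "111111111", "222222222",
                     "333333333", "444444444", "555555555", "666666666",
                     "77777777", "888888888")


def ssn_candidate_ok(candidate: str) -> bool:
    """Hypothetical sketch: patterns, digit normalization, and exclusions only."""
    if not any(re.fullmatch(p, candidate) for p in SSN_PATTERNS):
        return False
    digits = re.sub(r"\D", "", candidate)  # the Digits data normalizer
    if digits.startswith(EXCLUDE_BEGINNING):
        return False
    return not digits.endswith("0000")  # Exclude ending characters


print(ssn_candidate_ok("234-56-7891"))  # True
print(ssn_candidate_ok("123-45-6789"))  # False
```

The two group patterns `\d{1}[1-9]` and `[1-9]\d{1}` together accept any group except 00, which is why the sketch needs no separate check for that case.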

Randomized US Social Security Number (SSN) narrow breadth

The narrow breadth detects a 9-digit number in the format DDD-DD-DDDD. The digits must
be in assigned number ranges. It also requires the presence of SSN-related keywords.

Table 45-886 Randomized US Social Security Number (SSN) narrow-breadth patterns

Pattern

[0-8]\d{2} \d{1}[1-9] \d{4}

[0-8]\d{3}[1-9]\d{4}

[0-8]\d{2}[1-9]\d{5}

[0-8]\d{2}-\d{1}[1-9]-\d{4}

[0-8]\d{2} [1-9]\d{1} \d{4}

[0-8]\d{2}-[1-9]\d{1}-\d{4}

Table 45-887 Randomized US Social Security Number (SSN) narrow-breadth validators

Validator Description

Number Delimiter Validates a match by checking the surrounding characters.

Exclude beginning characters Data beginning with any of the following list of values is
not matched:

666, 000, 123456789, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666,
77777777, 888888888

Exclude ending characters Data ending with any of the following list of values is not
matched:
0000

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched when you use this
option.

Inputs:

social security number, ssn, ss#

Randomized US Social Security Number Validation Check Computes the checksum and validates the pattern against it.

Romania Driver's Licence Number

A driving license in Romania is a document confirming the holder's right to drive motor
vehicles.
The Romania Driver's Licence Number data identifier detects a 9-, 10-, or 11-character
alphanumeric pattern that matches the Romania Driver's Licence Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 9-, 10-, or 11-character alphanumeric pattern that matches the
Romania Driver's Licence Number format with checksum validation. It checks for common
test patterns.
See “Romania Driver's Licence Number wide breadth” on page 1417.
■ The narrow breadth detects a 9-, 10-, or 11-character alphanumeric pattern that matches
the Romania Driver's Licence Number format with checksum validation. It checks for
common test patterns, and also requires the presence of related keywords.
See “Romania Driver's Licence Number narrow breadth” on page 1418.

Romania Driver's Licence Number wide breadth

The wide breadth detects a 9-, 10-, or 11-character alphanumeric pattern that matches the
Romania Driver's Licence Number format with checksum validation. It checks for common test
patterns.

Table 45-888 Romania Driver's Licence Number wide-breadth patterns

Pattern

[Ii][Gg][Pp]\d{8}

[A-Za-z]\d{8}

[A-Za-z]\d{8}[A-Za-z]

Table 45-889 Romania Driver's Licence Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Romania Driver's Licence Number Validation Check Computes the checksum and validates the pattern against
it.
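The pattern and exclusion steps of the wide breadth can be sketched in Python. The Romania Driver's Licence Number Validation Check is not described in this guide, so this hypothetical sketch (`ro_dl_candidate_ok` is an illustrative name) deliberately stops before the checksum and applies the exclusion list literally to the last characters of the candidate.

```python
import re

# Patterns from Table 45-888, matched against a whole candidate.
RO_DL_PATTERNS = [r"[Ii][Gg][Pp]\d{8}", r"[A-Za-z]\d{8}", r"[A-Za-z]\d{8}[A-Za-z]"]
# Exclude ending characters: eight repetitions of the same digit.
EXCLUDED_ENDINGS = tuple(str(digit) * 8 for digit in range(10))


def ro_dl_candidate_ok(candidate: str) -> bool:
    """Hypothetical sketch of the pattern and exclusion steps only.

    The product's checksum validator is intentionally not reproduced here.
    """
    if not any(re.fullmatch(p, candidate) for p in RO_DL_PATTERNS):
        return False
    return not candidate.endswith(EXCLUDED_ENDINGS)


print(ro_dl_candidate_ok("IGP12345678"))  # True
print(ro_dl_candidate_ok("B00000000"))    # False
```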

Romania Driver's Licence Number narrow breadth

The narrow breadth detects a 9-, 10-, or 11-character alphanumeric pattern that matches the
Romania Driver's Licence Number format with checksum validation. It checks for common test
patterns, and also requires the presence of related keywords.

Table 45-890 Romania Driver's Licence Number narrow-breadth patterns

Pattern

[Ii][Gg][Pp]\d{8}

[A-Za-z]\d{8}

[A-Za-z]\d{8}[A-Za-z]

Table 45-891 Romania Driver's Licence Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00000000, 11111111, 22222222, 33333333, 44444444,

55555555, 66666666, 77777777, 88888888, 99999999

Romania Driver's Licence Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

driver license, drivers license, driving license, driver

permis de conducere, PERMIS DE CONDUCERE, Permis de conducere, numărul permisului de
conducere, Numărul permisului de conducere

Romania National Identification Number

In Romania, each citizen has a personal numeric code (Cod Numeric Personal, CNP) that serves as
a unique national identification number. This number is also used as a tax identification number
for financial purposes.
The Romania National Identification Number data identifier detects a 13-digit number that
matches the CNP format.
The Romania National Identification Number data identifier provides three breadths of detection:
■ The wide breadth detects a 13-digit number without checksum validation.
See “Romania National Identification Number wide breadth” on page 1419.
■ The medium breadth detects a 13-digit number with checksum validation.
See “Romania National Identification Number medium breadth” on page 1419.
■ The narrow breadth detects a 13-digit number with checksum validation. It also requires
the presence of related keywords.
See “Romania National Identification Number narrow breadth” on page 1420.

Romania National Identification Number wide breadth

The wide breadth detects a 13-digit number without checksum validation.

Table 45-892 Romania National Identification Number wide-breadth pattern

Pattern

\d{13}

Table 45-893 Romania National Identification Number wide-breadth validators

Mandatory validators Description

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Romania National Identification Number medium breadth

The medium breadth detects a 13-digit number with checksum validation.

Table 45-894 Romania National Identification Number medium-breadth pattern

Pattern

\d{13}

Table 45-895 Romania National Identification Number medium-breadth validator

Mandatory validator Description

Romania National Identification Number Check Computes the checksum and validates the pattern against
it.
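The guide does not describe the Romania National Identification Number Check. A common public description of the CNP control digit multiplies the first twelve digits by the key 279146358279, takes the sum modulo 11, and uses 1 when the remainder is 10. The Python sketch below assumes that the product applies the same rule; `cnp_checksum_ok` is a hypothetical name.

```python
# Public CNP control key (assumed to match the product's validator).
CNP_KEY = "279146358279"


def cnp_checksum_ok(candidate: str) -> bool:
    """Hypothetical sketch of a 13-digit CNP control-digit check."""
    if len(candidate) != 13 or not candidate.isdigit():
        return False
    # zip stops after the twelve key digits, so the control digit is excluded.
    total = sum(int(d) * int(k) for d, k in zip(candidate, CNP_KEY))
    remainder = total % 11
    check = 1 if remainder == 10 else remainder
    return check == int(candidate[12])


print(cnp_checksum_ok("1800101221144"))  # True
print(cnp_checksum_ok("1800101221145"))  # False
```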

Romania National Identification Number narrow breadth

The narrow breadth detects a 13-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-896 Romania National Identification Number narrow-breadth pattern

Pattern

\d{13}

Table 45-897 Romania National Identification Number narrow-breadth validators

Mandatory validators Description

Romania National Identification Number Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.

Duplicate digits Ensures that a string of digits is not all the same.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

romania tax identification number, tax identification number, tin, tin#, tin number, tin no,
numărul de identificare fiscală, identificarea fiscală nr #, codul fiscal nr.

national ID, national ID#, ID#, national identification number, Cod Numeric Personal, cnp, CNP

Romania Value Added Tax (VAT) Number

Value Added Tax (VAT) is a consumption tax that is borne by the end consumer. VAT is paid
for each transaction in the manufacturing and distribution process. In Romania, VAT is called
TVA, and the VAT number is also known as the CIF (fiscal identification code).
The Romania Value Added Tax (VAT) Number data identifier detects a 4- to 12-character
alphanumeric pattern that matches the Romania VAT Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 4- to 12-character alphanumeric pattern that matches the
Romania VAT Number format without checksum validation. It checks for common test
patterns.
See “Romania Value Added Tax (VAT) Number wide breadth” on page 1421.
■ The medium breadth detects a 4- to 12-character alphanumeric pattern that matches the
Romania VAT Number format with checksum validation.
See “Romania Value Added Tax (VAT) Number medium breadth” on page 1422.
■ The narrow breadth detects a 4- to 12-character alphanumeric pattern that matches the
Romania VAT Number format with checksum validation. It checks for common test patterns,
and also requires the presence of related keywords.
See “Romania Value Added Tax (VAT) Number narrow breadth” on page 1423.

Romania Value Added Tax (VAT) Number wide breadth

The wide breadth detects a 4- to 12-character alphanumeric pattern that matches the Romania
VAT Number format without checksum validation. It checks for common test patterns.

Table 45-898 Romania Value Added Tax (VAT) Number wide-breadth patterns

Pattern

[Rr][Oo][1-9]\d{1,9}

[Rr][Oo] [1-9]\d{1,9}

Table 45-899 Romania Value Added Tax (VAT) Number wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following list of values is not
matched:

00, 11, 22, 33, 44, 55, 66, 77, 88, 99

000, 111, 222, 333, 444, 555, 666, 777, 888, 999

0000, 1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999

00000, 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999

000000, 111111, 222222, 333333, 444444, 555555, 666666, 777777, 888888, 999999

0000000, 1111111, 2222222, 3333333, 4444444, 5555555, 6666666, 7777777, 8888888, 9999999

00000000, 11111111, 22222222, 33333333, 44444444, 55555555, 66666666, 77777777, 88888888,
99999999

000000000, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777,
888888888, 999999999

0000000000, 1111111111, 2222222222, 3333333333, 4444444444, 5555555555, 6666666666,
7777777777, 8888888888, 9999999999

Romania Value Added Tax (VAT) Number medium breadth

The medium breadth detects a 4- to 12-character alphanumeric pattern that matches the
Romania VAT Number format with checksum validation.

Table 45-900 Romania Value Added Tax (VAT) Number medium-breadth patterns

Pattern

[Rr][Oo][1-9]\d{1,9}

[Rr][Oo] [1-9]\d{1,9}

Table 45-901 Romania Value Added Tax (VAT) Number medium-breadth validators

Mandatory validator Description

Romania VAT Number Validation Check Computes the checksum and validates the pattern against
it.
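The guide does not document the Romania VAT Number Validation Check. A widely published rule for the Romanian fiscal code (CIF) right-aligns the digits before the control digit against the key 753217532, multiplies and sums them, multiplies the sum by 10, and takes the result modulo 11, with 10 mapped to 0. The Python sketch below assumes that rule and accepts the two medium-breadth layouts (with or without one space after RO); `ro_vat_ok` is a hypothetical name.

```python
import re

# Public CIF control key (assumed to match the product's validator).
CIF_KEY = "753217532"


def ro_vat_ok(candidate: str) -> bool:
    """Hypothetical sketch: RO prefix plus 2 to 10 digits, then the CIF control digit."""
    match = re.fullmatch(r"[Rr][Oo] ?([1-9]\d{1,9})", candidate)
    if match is None:
        return False
    number = match.group(1)
    body, control = number[:-1], int(number[-1])
    padded = body.rjust(9, "0")  # right-align the body against the 9-digit key
    total = sum(int(d) * int(k) for d, k in zip(padded, CIF_KEY))
    expected = total * 10 % 11
    return control == (0 if expected == 10 else expected)


print(ro_vat_ok("RO18547290"))  # True
print(ro_vat_ok("RO123454"))    # False
```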

Romania Value Added Tax (VAT) Number narrow breadth

The narrow breadth detects a 4- to 12-character alphanumeric pattern that matches the
Romania VAT Number format with checksum validation. It checks for common test patterns,
and also requires the presence of related keywords.

Table 45-902 Romania Value Added Tax (VAT) Number narrow-breadth patterns

Pattern

[Rr][Oo][1-9]\d{1,9}

[Rr][Oo] [1-9]\d{1,9}

Table 45-903 Romania Value Added Tax (VAT) Number narrow-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.

Exclude ending characters Data ending with any of the following values is not
matched:

00, 11, 22, 33, 44, 55, 66, 77, 88, 99

000, 111, 222, 333, 444, 555, 666, 777, 888, 999

0000, 1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999

00000, 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999

000000, 111111, 222222, 333333, 444444, 555555, 666666, 777777, 888888, 999999

0000000, 1111111, 2222222, 3333333, 4444444, 5555555, 6666666, 7777777, 8888888, 9999999

00000000, 11111111, 22222222, 33333333, 44444444, 55555555, 66666666, 77777777, 88888888, 99999999

000000000, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777, 888888888, 999999999

0000000000, 1111111111, 2222222222, 3333333333, 4444444444, 5555555555, 6666666666, 7777777777, 8888888888, 9999999999

Romania VAT Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords At least one of the following keywords or key phrases must
be present for the data to be matched.

Inputs:

vat number, value added tax, vat, VAT, VAT#, vat#, VATIN, vatin,
fiscal identification code, fiscal code, unique identification code,
unique registration code

CIF, cif, CUI, cui, TVA, tva, TVA#, tva#, taxa pe valoare adaugata,
cod fiscal, cod fiscal de identificare, cod fiscal identificare,
Cod Unic de Înregistrare, cod unic de identificare, cod unic identificare,
cod unic de înregistrare, cod unic înregistrare
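This guide does not say how keywords are matched (case sensitivity, word boundaries, or proximity to the number). The sketch below makes three assumptions, labeled in the code: matching is case-sensitive (the list carries both "vat" and "VAT", which hints at this but does not confirm it), a keyword must not be embedded in a longer word, and the whole text is searched with no proximity window. The keyword subset and function name are hypothetical.

```python
import re

# Hypothetical subset of the Romania VAT keyword list above.
KEYWORDS = ["vat number", "VAT", "VAT#", "cod fiscal", "CUI"]


def has_keyword(text: str, keywords=KEYWORDS) -> bool:
    """Assumptions: case-sensitive, not inside a longer word,
    searched anywhere in the text (no proximity window)."""
    for keyword in keywords:
        # (?<!\w) and (?!\w) also work for keywords ending in "#".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text):
            return True
    return False


print(has_keyword("CUI: RO18547290"))          # True
print(has_keyword("private note RO18547290"))  # False
```

Without the boundary check, a short lowercase keyword such as "vat" from the full list would also match inside "private".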

Romanian Numerical Personal Code

In Romania, each citizen has a unique numerical personal code. The number is used by
authorities, health care, schools, universities, banks, and insurance companies for customer
identification.
The Romanian Numerical Personal Code data identifier detects a 13-digit number that matches
the CNP (Cod Numeric Personal) format.
The Romanian Numerical Personal Code system data identifier provides three breadths of
detection:
■ The wide breadth detects a 13-digit number without checksum validation.
See “Romanian Numerical Personal Code wide breadth” on page 1425.
■ The medium breadth detects a 13-digit number with checksum validation.
See “Romanian Numerical Personal Code medium breadth” on page 1425.
■ The narrow breadth detects a 13-digit number that passes checksum validation. It also
requires the presence of CNP-related keywords.
See “Romanian Numerical Personal Code narrow breadth” on page 1426.

Romanian Numerical Personal Code wide breadth

The wide breadth detects a 13-digit number without checksum validation.

Table 45-904 Romanian Numerical Personal Code wide-breadth pattern

Pattern

[1-9]\d\d[0-1]\d[0-3]\d{7}

Table 45-905 Romanian Numerical Personal Code wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
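The wide-breadth pattern follows the CNP layout: one digit for sex and century of birth, two digits each for year, month, and day of birth, then two county digits, a three-digit serial number, and a check digit. The pattern constrains only the first digit of the month (0-1) and of the day (0-3), so a string such as 1111111111111 still matches it; the Duplicate digits validator is what filters that out. A minimal Python sketch with hypothetical function names:

```python
import re

# Wide-breadth CNP pattern from Table 45-904:
# S, YY, M(0-1)M, D(0-3)D, then county, serial, and check digits.
CNP_PATTERN = re.compile(r"[1-9]\d\d[0-1]\d[0-3]\d{7}")


def not_all_same_digit(digits: str) -> bool:
    """Sketch of the Duplicate digits validator."""
    return len(set(digits)) > 1


def find_cnp_wide(text: str) -> list:
    return [m.group(0) for m in CNP_PATTERN.finditer(text)
            if not_all_same_digit(m.group(0))]


print(find_cnp_wide("CNP 1630615123457; placeholder 1111111111111"))
```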

Romanian Numerical Personal Code medium breadth

The medium breadth detects a 13-digit number with checksum validation.

Table 45-906 Romanian Numerical Personal Code medium-breadth pattern

Pattern

[1-9]\d\d[0-1]\d[0-3]\d{7}

Table 45-907 Romanian Numerical Personal Code medium-breadth validators

Mandatory validator Description

Romanian Numerical Personal Code Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding characters.
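This guide does not describe the Romanian Numerical Personal Code Check further. Assuming it follows the publicly documented CNP rule, the check digit is the weighted sum of the first 12 digits (weights 2, 7, 9, 1, 4, 6, 3, 5, 8, 2, 7, 9) modulo 11, with a remainder of 10 replaced by 1. A sketch with a hypothetical function name:

```python
# Assumed public CNP control key, applied to the first 12 digits.
CNP_WEIGHTS = (2, 7, 9, 1, 4, 6, 3, 5, 8, 2, 7, 9)


def cnp_checksum_ok(cnp: str) -> bool:
    """Return True if the 13th digit matches the weighted checksum."""
    if len(cnp) != 13 or not (cnp.isascii() and cnp.isdigit()):
        return False
    # zip() stops after the 12 weights, so the check digit is excluded.
    remainder = sum(w * int(d) for w, d in zip(CNP_WEIGHTS, cnp)) % 11
    expected = 1 if remainder == 10 else remainder
    return expected == int(cnp[12])


print(cnp_checksum_ok("1630615123457"))  # True
print(cnp_checksum_ok("1630615123458"))  # False
```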

Romanian Numerical Personal Code narrow breadth

The narrow breadth detects a 13-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-908 Romanian Numerical Personal Code narrow-breadth pattern

Pattern

[1-9]\d\d[0-1]\d[0-3]\d{7}

Table 45-909 Romanian Numerical Personal Code narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding characters.

Romanian Numerical Personal Code Check Computes the checksum and validates the pattern against
it.

Find keywords With this option selected, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

Personal Numeric Code, unique identification number, CNP, CNP#, PIN, PIN#,
Insurance Number, insurancenumber#, unique identity number, uniqueidentityno#,
Cod Numeric Personal, cod identificare personal, cod unic identificare,
număr personal unic, număr identitate, număr identificare personal,
număridentitate#, CodNumericPersonal#, numărpersonalunic#

Russian Passport Identification Number

Russia issues two types of passports: domestic and international. Every Russian citizen has
a domestic passport. It is the main document used for personal identification.
The Russian Passport Identification Number data identifier detects a 10-digit number that
matches the Russian Passport Identification Number format.
The Russian Passport Identification Number data identifier provides two breadths of detection:
■ The wide breadth detects a 10-digit number without checksum validation.
See “Russian Passport Identification Number wide breadth” on page 1427.
■ The narrow breadth detects a 10-digit number with checksum validation. It also requires
the presence of related keywords.
See “Russian Passport Identification Number narrow breadth” on page 1427.

Russian Passport Identification Number wide breadth

The wide breadth detects a 10-digit number without checksum validation.

Table 45-910 Russian Passport Identification Number wide-breadth patterns

Pattern

\d{10}

\d{4}[ ]\d{6}

\d{2}[- ]\d{2}[ ]\d{6}

Table 45-911 Russian Passport Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.
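The three wide-breadth patterns in Table 45-910 describe the same ten digits (a four-digit series and a six-digit number) written as one block, with a space between series and number, or with the series also split into two pairs. The sketch below combines them into one alternation that tries the separated layouts first, strips the separators, and applies the Duplicate digits rule. The function name and sample numbers are hypothetical.

```python
import re

# The three layouts from Table 45-910, separated forms tried first.
PASSPORT_PATTERN = re.compile(r"\d{2}[- ]\d{2} \d{6}|\d{4} \d{6}|\d{10}")


def find_passport_wide(text: str) -> list:
    found = []
    for m in PASSPORT_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group(0))  # drop spaces and hyphens
        if len(set(digits)) > 1:                # Duplicate digits rule
            found.append(digits)
    return found


print(find_passport_wide("passport 45 08 123456, 4508 654321, 0000000000"))
```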

Russian Passport Identification Number narrow breadth

The narrow breadth detects a 10-digit number with checksum validation. It also requires the
presence of related keywords.

Table 45-912 Russian Passport Identification Number narrow-breadth patterns

Pattern

\d{10}

\d{4}[ ]\d{6}

\d{2}[- ]\d{2}[ ]\d{6}

Table 45-913 Russian Passport Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Number delimiter Validates a match by checking the surrounding numbers.

Find keywords If you select this option, at least one of the following
keywords or key phrases must be present for the data to
be matched.

Inputs:

passport number, passport no, passport ID, passportnumber#, passportno#,
russian passport ID, паспорт нет, паспорт, номер паспорта, паспорт ID,
Российской паспорт, Русский номер паспорта, паспорт#, паспортID#,
номерпаспорта#

Russian Taxpayer Identification Number

The Russian Taxpayer Identification Number (TIN or INN) is a multi-digit number that enables
the tax inspectorate to identify the tax status of legal entities and individuals.
The Russian Taxpayer Identification Number data identifier detects a 10- or 12-digit number
that matches the Russian Taxpayer Identification Number format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a 10- or 12-digit number without checksum validation.
See “Russian Taxpayer Identification Number wide breadth” on page 1429.
■ The medium breadth validates the detected number using the final check digit and eliminates
common test numbers.
See “Russian Taxpayer Identification Number medium breadth” on page 1429.
■ The narrow breadth detects a 10- or 12-digit number with checksum validation. It also
requires the presence of related keywords.
See “Russian Taxpayer Identification Number narrow breadth” on page 1429.

Russian Taxpayer Identification Number wide breadth

The wide breadth detects a 10- or 12-digit number without checksum validation.

Table 45-914 Russian Taxpayer Identification Number wide-breadth patterns

Pattern

\d{10}

\d{12}

\d{3}[ -]\d{3}[ -]\d{3}[ -]\d{3}

Table 45-915 Russian Taxpayer Identification Number wide-breadth validator

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Russian Taxpayer Identification Number medium breadth

The medium breadth detects a 10- or 12-digit number with checksum validation.

Table 45-916 Russian Taxpayer Identification Number medium-breadth patterns

Pattern

\d{10}

\d{12}

\d{3}[ -]\d{3}[ -]\d{3}[ -]\d{3}

Table 45-917 Russian Taxpayer Identification Number medium-breadth validators

Mandatory validator Description

Russian Taxpayer Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Number delimiter Validates a match by checking the surrounding numbers.
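This guide does not state the Russian Taxpayer Identification Number checksum. The sketch below assumes the validator follows the published INN rule. A 10-digit INN has one check digit, computed from the first nine digits with weights 2, 4, 10, 3, 5, 9, 4, 6, 8. A 12-digit INN has two, computed from the first ten and the first eleven digits with the weight lists shown in the code. In each case the check digit is the weighted sum modulo 11, then modulo 10. The sketch accepts the grouped form by removing spaces and hyphens first; the names are hypothetical.

```python
# Assumed published INN weights (10-digit and 12-digit forms).
W10 = (2, 4, 10, 3, 5, 9, 4, 6, 8)
W12_FIRST = (7, 2, 4, 10, 3, 5, 9, 4, 6, 8)
W12_SECOND = (3, 7, 2, 4, 10, 3, 5, 9, 4, 6, 8)


def _check_digit(digits, weights):
    # zip() stops at the shorter sequence, so only len(weights) digits count.
    return sum(w * d for w, d in zip(weights, digits)) % 11 % 10


def inn_checksum_ok(inn: str) -> bool:
    """Validate a 10- or 12-digit INN; spaces and hyphens are ignored."""
    compact = inn.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(c) for c in compact]
    if len(digits) == 10:
        return _check_digit(digits, W10) == digits[9]
    if len(digits) == 12:
        return (_check_digit(digits, W12_FIRST) == digits[10]
                and _check_digit(digits, W12_SECOND) == digits[11])
    return False


for value in ["1234567894", "123 456 789 047", "123456789048"]:
    print(value, inn_checksum_ok(value))
```

The first two values pass; the third fails because its second check digit should be 7.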

Russian Taxpayer Identification Number narrow breadth

The narrow breadth detects a 10- or 12-digit number with checksum validation. It also requires
the presence of related keywords.

Table 45-918 Russian Taxpayer Identification Number narrow-breadth patterns

Pattern

\d{10}

\d{12}

\d{3}[ -]\d{3}[ -]\d{3}[ -]\d{3}

Table 45-919 Russian Taxpayer Identification Number narrow-breadth validators

Mandatory validator Description

Duplicate digits Ensures that a string of digits is not all the same.

Russian Taxpayer Identification Number Validation Check Computes the checksum and validates the pattern against
it.

Find keywords If you select this option, at least one of the following keywords or key
phrases must be present for the data to be matched.

Inputs:

TIN, taxpayer number, taxpayer ID, taxpayer no, tax ID, tin, tinno#, inn, inn#,
taxpayerno#, taxid#, taxpayeridno#, taxpayerid#, НДС, номер налогоплательщика,
Налогоплательщика ИД, налог число, налогчисло#, ИНН#, НДС#

SEPA Creditor Identifier Number North

The Single Euro Payments Area (SEPA) is a payments system created by the European Union
that harmonizes the way cashless payments are transacted between participating European
countries. SEPA North covers the United Kingdom, Sweden, Denmark, Finland, and Ireland.
European consumers, businesses, and government agencies that make payments by direct debit,
credit card, or credit transfer use the SEPA architecture. The Single Euro Payments Area is
approved and regulated by the European Commission.
The SEPA Creditor Identifier Number North data identifier detects a unique alphanumeric
string that matches the SEPA Creditor Identifier North format.
This data identifier provides the following breadths of detection:
■ The wide breadth detects a unique alphanumeric string that matches the SEPA Creditor
Identifier North format without checksum validation.
See “SEPA Creditor Identifier Number North wide breadth” on page 1431.
■ The medium breadth detects a unique alphanumeric string that matches the SEPA Creditor
Identifier North format with checksum validation.
See “SEPA Creditor Identifier Number North medium breadth” on page 1433.
■ The narrow breadth detects a unique alphanumeric string that matches the SEPA Creditor
Identifier North format with checksum validation. It also requires the presence of related
keywords.
See “SEPA Creditor Identifier Number North narrow breadth” on page 1435.

SEPA Creditor Identifier Number North wide breadth

The wide breadth detects a unique alphanumeric string that matches the SEPA Creditor Identifier
North format without checksum validation.

Table 45-920 SEPA Creditor Identifier Number North wide-breadth patterns

Pattern

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d[Zz][Zz][Zz]\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w\w\w

[Gg][Bb]\d\d\d\d\d\w\w\w\w\w\w\w\d\d\d\d\d\d\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w

[Ss][Ee]\d\d[Zz][Zz][Zz]\d\d\d\d\d\d\d\d\d\d

[Ss][Ee]\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d

[Ii][Ee]\d\d[Zz][Zz][Zz]\d\d\d\d\d\d

[Ii][Ee]\d\d\d\d\d\d\d\d\d\d\d

[Ff][Ii]\d\d[Zz][Zz][Zz]\d\d\d\d\d\d\d\d

[Ff][Ii]\d\d\d\d\d\d\d\d\d\d\d\d\d

[Dd][Kk]\d\d[Zz][Zz][Zz]\d\d\d\d\d\d\d\d\d\d\d\d

Table 45-921 SEPA Creditor Identifier Number North wide-breadth validators

Mandatory validator Description

Number delimiter Validates a match by checking the surrounding characters.
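Each GB family in Table 45-920 repeats the same prefix followed by 1 to 15 trailing word characters, so the 30 GB rows can be written as two patterns with a {1,15} quantifier. The Python sketch below combines them with the Swedish, Irish, Finnish, and Danish rows into one expression that accepts the same set of strings when applied with fullmatch. It uses re.ASCII so that \w means [A-Za-z0-9_], as in most regex engines; whether the product's engine treats \w the same way is an assumption. The sample strings are made up.

```python
import re

# Condensed form of Table 45-920. re.ASCII limits \w to [A-Za-z0-9_].
SEPA_NORTH_WIDE = re.compile(
    r"[Gg][Bb]\d\d[Zz][Zz][Zz]\w{7}\d{6}\w{1,15}"  # GB with "ZZZ"
    r"|[Gg][Bb]\d{5}\w{7}\d{6}\w{1,15}"            # GB, digits in place of ZZZ
    r"|[Ss][Ee]\d\d[Zz][Zz][Zz]\d{10}|[Ss][Ee]\d{15}"
    r"|[Ii][Ee]\d\d[Zz][Zz][Zz]\d{6}|[Ii][Ee]\d{11}"
    r"|[Ff][Ii]\d\d[Zz][Zz][Zz]\d{8}|[Ff][Ii]\d{13}"
    r"|[Dd][Kk]\d\d[Zz][Zz][Zz]\d{12}",
    re.ASCII,
)

for candidate in ["GB12ZZZSDDBARC0000007495895", "SE12ZZZ1234567890",
                  "DK12ZZZ123"]:
    print(candidate, bool(SEPA_NORTH_WIDE.fullmatch(candidate)))
```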

SEPA Creditor Identifier Number North medium breadth

The medium breadth detects a unique alphanumeric string that matches the SEPA Creditor
Identifier North format with checksum validation.
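This guide does not spell out the SEPA checksum. The sketch below assumes it follows the European Payments Council rule for creditor identifiers: the two check digits (positions 3 and 4) are an ISO 7064 MOD 97-10 value computed over the national identifier followed by the country code, with the creditor business code (positions 5 to 7, often ZZZ) left out. Letters map to the numbers A=10 through Z=35, and a valid identifier leaves a remainder of 1. The function names and the sample national identifier are hypothetical.

```python
def _to_digits(text: str) -> str:
    # int(ch, 36) maps 0-9 to 0-9 and A-Z (either case) to 10-35.
    return "".join(str(int(ch, 36)) for ch in text)


def sepa_ci_check_digits(country: str, national_id: str) -> str:
    """Compute the two check digits for a creditor identifier."""
    value = int(_to_digits(national_id + country + "00"))
    return f"{98 - value % 97:02d}"


def sepa_ci_check_ok(ci: str) -> bool:
    """Validate check digits; the business code ci[4:7] is skipped."""
    ci = ci.replace(" ", "").upper()
    if (len(ci) < 8 or not (ci.isascii() and ci.isalnum())
            or not ci[2:4].isdigit()):
        return False
    country, check, national_id = ci[:2], ci[2:4], ci[7:]
    return int(_to_digits(national_id + country + check)) % 97 == 1


nid = "SDDBARC0000007495895"  # made-up national identifier
ci = "GB" + sepa_ci_check_digits("GB", nid) + "ZZZ" + nid
print(ci, sepa_ci_check_ok(ci))  # the identifier, then True
```

Because the business code is excluded, replacing ZZZ with any other three characters does not change the computed check digits.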

Table 45-922 SEPA Creditor Identifier Number North medium-breadth patterns