The Role Mining Problem: Finding a Minimal Descriptive Set of Roles ∗

Jaideep Vaidya, Vijayalakshmi Atluri, Qi Guo
MSIS Department and CIMIC, Rutgers University
180 University Ave, Newark, NJ 07102
[email protected] [email protected] [email protected]

ABSTRACT

Devising a complete and correct set of roles has been recognized as one of the most important and challenging tasks in implementing role based access control. A key problem related to this is the notion of goodness/interestingness: when is a role good/interesting? In this paper, we define the role mining problem (RMP) as the problem of discovering an optimal set of roles from existing user permissions. The main contribution of this paper is to formally define RMP, and analyze its theoretical bounds. In addition to the above basic RMP, we introduce two different variations of the RMP, called the δ-approx RMP and the Minimal Noise RMP, that have pragmatic implications. We reduce the known "set basis problem" to RMP to show that RMP is an NP-complete problem. An important contribution of this paper is also to show the relation of the role mining problem to several problems already identified in the data mining and data analysis literature. By showing that the RMP is in essence reducible to these known problems, we can directly borrow the existing implementation solutions and guide further research in this direction.

Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection – Access controls; H.2.8 [Database Management]: Database Applications – Data Mining

General Terms
Security

Keywords
RBAC, role engineering, role mining

∗ The work is supported in part by the National Science Foundation under grant IIS-0306838.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SACMAT'07, June 20-22, 2007, Sophia Antipolis, France.
Copyright 2007 ACM 978-1-59593-745-2/07/0006 ...$5.00.

1. INTRODUCTION

Role-based access control (RBAC) has been adopted successfully by a variety of commercial systems. As a result, RBAC has become the norm in many of today's organizations for enforcing security. Basically, a role is nothing but a set of permissions. Roles represent organizational agents that perform certain job functions within the organization. Users, in turn, are assigned appropriate roles based on their qualifications [18, 3].

However, one of the major challenges in implementing RBAC is to define a complete and correct set of roles. This process, known as role engineering [2], has been identified as one of the costliest components in realizing RBAC [4]. Essentially, role engineering is the process of defining roles and assigning permissions to them.

There are two basic approaches towards role engineering: top-down and bottom-up. Under the top-down approach, roles are defined by carefully analyzing and decomposing business processes into smaller units in a functionally independent manner. These functional units are then associated with permissions on information systems. In other words, this approach begins with defining a particular job function and then creating a role for this job function by associating needed permissions. Often, this is a cooperative process where various authorities from different disciplines understand the semantics of business processes of one another and then incorporate them in the form of roles. Since there are often dozens of business processes, tens of thousands of users and millions of authorizations, this is rather a difficult task. Therefore, relying solely on a top-down approach in most cases is not viable, although some case studies [19] indicate that it has been done successfully by some organizations (though at a high cost).

In contrast, since organizations do not exist in a vacuum, the bottom-up approach utilizes the existing permission assignments to formulate roles. Starting from the existing permissions before RBAC is implemented, the bottom-up approach aggregates these into roles. It may also be advantageous to use a mixture of the top-down and the bottom-up approaches to conduct role engineering. While the top-down model is likely to ignore the existing permissions, a bottom-up model may not consider business functions of an organization [9]. However, the bottom-up approach excels in the fact that much of the role engineering process can be automated. Role mining can be used as a tool, in conjunction with a top-down approach, to identify potential or candidate
roles which can then be examined to determine if they are appropriate given existing functions and business processes.

There have been several attempts to propose good bottom-up techniques for finding roles. Kuhlmann et al. [10] present a clustering technique similar to the well-known k-means clustering, which requires pre-defining the number of clusters. In [20], Schlegelmilch and Steffens propose an agglomerative clustering based approach to role mining (called ORCA), which discovers roles by merging permissions appropriately. However, in ORCA, the order in which permissions are merged determines the outcome of roles. Moreover, it does not allow overlapping roles (i.e., a user cannot play multiple roles), which is a significant drawback. More recently, Vaidya et al. [21] propose an approach based on subset enumeration, called RoleMiner, which eliminates the above limitations.

An inherent problem with all of the above approaches is that there is no formal notion of goodness/interestingness of a role. All of the algorithms above present heuristic ways to find a set of candidate roles. While offering justifications for the identified roles, there is no integrative view of the entire set of roles. For insightful bottom-up analysis, we need to define interestingness metrics for roles. Vaidya et al. [21] take a first step towards this by ordering candidate roles on the basis of their support (i.e., roles that are more prevalent are ordered higher). However, this metric is still quite ad hoc and preliminary. Also, while one may come up with interestingness metrics for a role by itself, this does not directly lead to the notion of a good collection of roles. Indeed, there is no formal definition of what is a good collection of roles. Defining this is critical for the security administrator to gain confidence and be able to fully utilize the output of any role mining algorithm beyond a piecemeal fashion.

The main contribution of this paper is to formally define the role mining problem, and analyze its theoretical bounds. Assuming that we can represent the user permissions as a binary matrix, informally, we define the basic role mining problem as follows: Given an m × n binary matrix A representing the user-permissions, decompose A into two matrices B and C, where B is an m × k matrix representing the user-role assignment and C is a k × n matrix representing the role-permission association, such that k is minimal.

It is important to note that it is quite easy to come up with some decomposition of matrix A. For example, two extreme cases are 1) where each user is placed in a role by itself (i.e., k = m, B = I, the identity matrix, and C = A), and 2) where each permission is placed in a role by itself (i.e., k = n, B = A, and C = I, the identity matrix). Both of these decompositions are accurate, but are not necessarily minimal. An alternative decomposition is to place all users in a single role (i.e., k = 1). However, this decomposition is likely to be very inaccurate unless all of the users indeed have the same set of permissions. In none of these cases are the roles likely to be accurate (i.e., close to reality). What we are really interested in is a fairly accurate set of roles. In this regard, we consider the minimal set of roles as the accurate set. Minimality is a good notion, since it allows us to formally define the problem. Without semantics (i.e., human expert knowledge), minimality serves as a best approximation for realizing good descriptive roles.

One may note that, in a specific implementation of RBAC, the most useful set of roles may be different from the minimal set. It is analogous to employing the best normalization in designing a database schema. However, from the practical point of view, one may denormalize the database for improving the query response. As in this analogy, the minimal set of roles gives us a good set of roles to begin with. At least, it shows the bare minimum required to accurately describe the current state of the organization. We argue that this is likely to be of immense help to the security administrator. In this paper, we formally define the basic RMP and show that the decision version is NP-complete by reducing the known NP-complete set basis problem to this.

We also consider several interesting variations of the basic RMP, including the δ-approx Role Mining Problem (δ-approx RMP) and the Minimal Noise Role Mining Problem (MinNoise RMP). These are of practical importance. Both the δ-approx RMP and the MinNoise RMP are likely to result in a lower number of roles than the basic RMP, and might more accurately model the dynamic state of the organization. We describe these individually below. While solving the basic RMP, the goal is to identify the minimal set of roles such that the original user-permission assignment matrix is decomposed. However, if we allow a slight inaccuracy in the decomposition such that when multiplied it does not generate the original matrix, it may still be acceptable. It is this variation of the basic RMP that we recognize as the δ-approx RMP. Moreover, when discovering roles, one may state the number of roles to be identified. Given the specified number of roles, one may come up with a decomposition of the user-permission assignment. Note that, in this process, since we rigidly set the order of the matrices to be decomposed, they may not generate the original matrix when multiplied. This discrepancy, denoted as noise, should be kept to a minimum. We recognize this problem as the MinNoise RMP. We show that the decision versions of both the δ-approx RMP and the MinNoise RMP are NP-complete.

We have discovered that our basic role mining problem is identical to the problem of database tiling recently proposed by Geerts et al. [6]. We show how our basic RMP can be mapped to the database tiling and present an algorithm to use tiling to discover roles. Similarly, the recently proposed discrete basis problem [14] is identical to the MinNoise RMP, and we show the mapping from our MinNoise RMP to the discrete basis problem.

This paper is organized as follows. In section 2 we review the RBAC model and some preliminary definitions employed in the paper. In section 3, we define our basic role mining problem as well as its variations and prove results about their complexity. In sections 4 and 5 we show the mappings of our RMP to the database tiling and the discrete basis problem, respectively. Finally, section 6 provides some insight into our ongoing and future research.

2. PRELIMINARIES

We adopt the NIST standard of the Role Based Access Control (RBAC) model [3]. For the sake of simplicity, we do not consider sessions, role hierarchies or separation of duties constraints in this paper. In other words, we restrict ourselves to RBAC₀ without considering sessions.

Definition 1 (RBAC).
• U, ROLES, OPS, and OBJ are the sets of users, roles, operations, and objects.
• UA ⊆ U × ROLES, a many-to-many mapping user-to-role assignment relation.
• PRMS (the set of permissions) ⊆ {(op, obj) | op ∈ OPS ∧ obj ∈ OBJ}
• PA ⊆ ROLES × PRMS, a many-to-many mapping of role-to-permission assignments.¹
• UPA ⊆ U × PRMS, a many-to-many mapping of user-to-permission assignments.
• assigned_users(R) = {u ∈ U | (u, R) ∈ UA}, the mapping of role R onto a set of users.
• assigned_permissions(R) = {p ∈ PRMS | (R, p) ∈ PA}, the mapping of role R onto a set of permissions.

¹ Note that in the original NIST standard [3], PA was defined as PA ⊆ PRMS × ROLES, a many-to-many mapping of permission-to-role assignments.

We now need some additional definitions from [13] pertaining to boolean matrix multiplication:

Definition 2 (Boolean matrix multiplication). A Boolean matrix multiplication between Boolean matrices $A \in \{0,1\}^{m \times k}$ and $B \in \{0,1\}^{k \times n}$ is $A \otimes B = C$ where $C$ is in space $\{0,1\}^{m \times n}$ and

$$c_{ij} = \bigvee_{l=1}^{k} (a_{il} \wedge b_{lj}).$$

Definition 3 (L1 norm). The L1 norm of a d-dimensional vector $v \in X^d$, for some set X, is

$$\| v \|_1 = \sum_{i=1}^{d} |v_i|.$$

The L1-norm also defines a distance metric between vectors, referred to as L1-metric and defined as

$$\| v - w \|_1 = \sum_{i=1}^{d} |v_i - w_i|.$$

Finally, the L1-metric between vectors is expanded to matrices in a natural way, i.e., if A and B are matrices in $X^{n \times m}$, for some set X, then

$$\| A - B \|_1 = \sum_{i=1}^{n} \| a_i - b_i \|_1 = \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij} - b_{ij}|.$$

The L1-metric allows us to count the difference between two matrices, i.e., to figure out how good an approximation one is of the other. When the L1-metric is 0, the two matrices are identical. Other metrics (and distances) can also be used; [13] discusses some alternatives and their implications.

3. THE ROLE MINING PROBLEM

Given m users, n permissions and k roles (i.e., |U| = m, |PRMS| = n, |ROLES| = k), the user-to-role mapping can be represented as an m × k boolean matrix where a 1 in cell {ij} indicates the assignment of role j to user i. Similarly, the role-to-permission mapping can be represented as a k × n boolean matrix where a 1 in cell {ij} indicates the assignment of permission j to role i. Finally, the user-to-permission mapping can be represented as an m × n boolean matrix where a 1 in cell {ij} indicates the assignment of permission j to user i.

We now introduce the notion of δ-consistency between UA, PA, and UPA, which is critical to the notion of accuracy of the roles. The L1 norm defined above is useful in defining this.

Definition 4 (δ-Consistency). A given user-to-role assignment UA, role-to-permission assignment PA and user-to-permission assignment UPA are δ-consistent if and only if

$$\| M(UA) \otimes M(PA) - M(UPA) \|_1 \le \delta$$

where M(UA), M(PA), and M(UPA) denote the matrix representation of UA, PA and UPA respectively.

Essentially, the notion of δ-consistency allows us to bound the degree of difference between the user-to-role assignment UA, role-to-permission assignment PA and user-to-permission assignment UPA. For UA, PA, and UPA to be δ-consistent, the user-permission matrix generated from UA and PA should be within δ of UPA.

3.1 RMP and its variants

In this section, we present the basic RMP and two of its variants, δ-approx RMP and the MinNoise RMP.

Definition 5 (Role Mining Problem (RMP)). Given a set of users U, a set of permissions PRMS, and a user-permission assignment UPA, find a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA 0-consistent with UPA and minimizing the number of roles, k.

Given the user-permission matrix, the basic Role Mining problem asks us to find a user-to-role assignment UA and a role-to-permission assignment PA such that UA and PA exactly describe UPA while minimizing the number of roles. Put another way, it asks us what is the minimum number of roles necessary to fully describe the given data (and what are those roles, and the corresponding user assignments)?

While exact match is a good thing to have, at times we may be satisfied with an approximate match. For example, consider a case where we have 1000 users and 100 permissions. The size of UPA is 5000 (i.e., 5000 user-permission assignments are allowed out of the possible 100,000). Now, suppose 100 roles are required to exactly match the given user-permission data. However, if we allow approximate matching, i.e., if it is good enough to match 99% of the matrix (4950 of the user-permission assignments), assume that the minimum number of roles required is only 60. As long as we do not add any spurious permissions (i.e., no extra 1s are added), the second case is clearly better than the first, since we significantly reduce the number of roles. This significantly reduces the burden of maintenance on the security administrator while leaving only a few user-permission assignments uncovered. Also, any given user-permission assignment is only a snapshot of the current state of the organization. Permissions (and, to a lesser extent, roles) are dynamic. Thus while exact match may be the best descriptor in the static case, it is probably not good for the dynamic case. Approximate match might be a prudent choice for dynamic data. The notion of δ-consistency is useful, since it
helps to bound the degree of approximation. Therefore, we now define the approximate Role Mining Problem using δ-consistency.

Definition 6 (δ-approx RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, and a threshold δ, find a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, δ-consistent with UPA and minimizing the number of roles, k.

It should be clear that the basic Role Mining Problem defined earlier is simply a special case of the δ-approx RMP (with δ set to 0). Instead of bounding the approximation, and minimizing the number of roles, it might be interesting to do the reverse: bound the number of roles, and minimize the approximation. We call this the Minimal Noise Role Mining Problem (MinNoise RMP). Thus, we fix the number of roles that we would like to find, but now we want to find those roles that incur minimal difference with respect to the original user-permission matrix (UPA). The security administrator might want to do this when looking for the top-k roles that describe the problem space well enough, and are still (in some sense) robust to noise.

Definition 7 (Minimal Noise RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, and the number of roles k, find a set of k roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, minimizing

$$\| M(UA) \otimes M(PA) - M(UPA) \|_1$$

where M(UA), M(PA), and M(UPA) denote the matrix representation of UA, PA and UPA respectively.

We can clarify these problems further by means of an example. Table 1 shows a sample user-privilege assignment (UPA), for 4 users and 5 privileges. Tables 2(a) and 2(b) depict a user-role assignment (UA) and role-privilege assignment (PA) that completely describe the given user-privilege assignment (i.e., M(UA) ⊗ M(PA) = M(UPA)). Indeed, the given UA, PA, and ROLES are optimal. It is not possible to completely describe the given UPA with fewer than 3 roles. Tables 3(a) and 3(b) depict the optimal user-role assignment (UA) and role-privilege assignment (PA) 2-consistent, 3-consistent, as well as 4-consistent with UPA. Tables 3(c) and 3(d) show the optimal user-role assignment (UA) and role-privilege assignment (PA) 5-consistent with UPA. Similarly, if we set k = 2, Tables 3(a) and 3(b) depict one possible optimal minimal noise UA and PA. Tables 4(a) and 4(b) depict another optimal UA and PA for the MinNoise RMP. Both represent correct solutions to the MinNoise RMP, though the second one does not incorrectly cover any 0s with 1s.

Table 1: User-privilege assignment

      p1  p2  p3  p4  p5
u1     0   1   0   0   1
u2     1   1   1   0   1
u3     1   1   0   1   1
u4     1   1   1   0   0

Table 2: Basic Role Mining Problem

(a) User-role assignment

      r1  r2  r3
u1     0   0   1
u2     1   0   1
u3     0   1   1
u4     1   0   0

(b) Role-permission assignment

      p1  p2  p3  p4  p5
r1     1   1   1   0   0
r2     1   1   0   1   0
r3     0   1   0   0   1

Table 3: δ-approx RMP

(a) User-role assignment

      r1  r2
u1     0   1
u2     1   1
u3     1   1
u4     1   0

(b) Role-permission assignment

      p1  p2  p3  p4  p5
r1     1   1   1   0   0
r2     0   1   0   0   1

(c) User-role assignment

      r1
u1     0
u2     1
u3     1
u4     1

(d) Role-permission assignment

      p1  p2  p3  p4  p5
r1     1   1   1   0   1

Table 4: MinNoise RMP

(a) User-role assignment

      r1  r2
u1     0   1
u2     1   1
u3     0   1
u4     1   0

(b) Role-permission assignment

      p1  p2  p3  p4  p5
r1     1   1   1   0   0
r2     0   1   0   0   1
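The numbers quoted in this example can be checked mechanically. The following Python sketch is not part of the original paper: it implements the Boolean product of Definition 2 and the L1 metric of Definition 3 on plain lists, then evaluates the decompositions of Tables 2, 3, and 4 against the UPA of Table 1. All function and variable names (`bool_product`, `l1_distance`, `noise`, `spurious_ones`, `UA2`, and so on) are illustrative assumptions.

```python
# Illustrative sketch only; names are assumptions, not taken from the paper.

def bool_product(ua, pa):
    """Boolean product (Definition 2): c[i][j] = OR over l of (ua[i][l] AND pa[l][j])."""
    k, n = len(pa), len(pa[0])
    return [[int(any(row[l] and pa[l][j] for l in range(k))) for j in range(n)]
            for row in ua]

def l1_distance(a, b):
    """L1 metric between two matrices of equal shape (Definition 3)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def noise(ua, pa, upa):
    """|| M(UA) (x) M(PA) - M(UPA) ||_1, the quantity bounded by delta."""
    return l1_distance(bool_product(ua, pa), upa)

def is_delta_consistent(ua, pa, upa, delta):
    """Definition 4: the reconstructed matrix is within delta of UPA."""
    return noise(ua, pa, upa) <= delta

def spurious_ones(ua, pa, upa):
    """Count cells that are 0 in UPA but 1 in the reconstructed matrix."""
    rec = bool_product(ua, pa)
    return sum(1 for rr, ru in zip(rec, upa) for x, y in zip(rr, ru) if x == 1 and y == 0)

# Table 1
UPA = [[0, 1, 0, 0, 1], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1], [1, 1, 1, 0, 0]]
# Table 2(a), 2(b)
UA2 = [[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
PA2 = [[1, 1, 1, 0, 0], [1, 1, 0, 1, 0], [0, 1, 0, 0, 1]]
# Table 3(a), 3(b)
UA3a = [[0, 1], [1, 1], [1, 1], [1, 0]]
PA3b = [[1, 1, 1, 0, 0], [0, 1, 0, 0, 1]]
# Table 3(c), 3(d)
UA3c = [[0], [1], [1], [1]]
PA3d = [[1, 1, 1, 0, 1]]
# Table 4(a); Table 4(b) has the same entries as Table 3(b)
UA4a = [[0, 1], [1, 1], [0, 1], [1, 0]]

print(noise(UA2, PA2, UPA))            # 0: exact decomposition with 3 roles
print(noise(UA3a, PA3b, UPA))          # 2: hence 2-, 3-, and 4-consistent
print(noise(UA3c, PA3d, UPA))          # 5: 5-consistent with a single role
print(noise(UA4a, PA3b, UPA))          # 2: same noise as Table 3(a)/(b)
print(spurious_ones(UA3a, PA3b, UPA))  # 1: one 0 of UPA is covered by a 1
print(spurious_ones(UA4a, PA3b, UPA))  # 0: no 0 of UPA is covered
```

The last two lines make the closing remark of the example concrete: both two-role solutions have noise 2, but only the Table 4 solution avoids granting a permission that the user does not hold in UPA.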

3.2 Complexity

Before proceeding any further, we would like to establish some results on the complexity of these problems. The Role Mining Problem, the δ-approx RMP, and the MinNoise RMP are all optimization problems. The theory of NP-completeness applies to decision problems. Therefore, in order to consider the complexity of the problems, we now frame the decision version of these problems.

Definition 8 (decision RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, and k ≥ 0, are there a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA 0-consistent with UPA such that |ROLES| ≤ k?

Definition 9 (decision δ-approx RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, a threshold δ ≥ 0, and k ≥ 0, are there a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, δ-consistent with UPA such that |ROLES| ≤ k?

Definition 10 (decision MinNoise RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, the number of roles k, and a noise threshold θ, are there a set of k roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, such that

$$\| M(UA) \otimes M(PA) - M(UPA) \|_1 \le \theta$$

where M(UA), M(PA), and M(UPA) denote the matrix representation of UA, PA and UPA respectively?

We can now prove that decision RMP, decision δ-approx RMP, and decision MinNoise RMP are all NP-complete (indeed, some of these results have already been obtained in the literature [13, 6]). Proving that a problem π is NP-complete consists of four main steps [5]:

1. showing that π is in NP
2. selecting a known NP-complete problem π′
3. constructing a transformation f from π′ to π, and
4. proving that f is a (polynomial) transformation

The problem π′ used to reduce from is the "set basis problem" defined below:

Definition 11 (Set basis Problem). Given a collection C of subsets of a finite set S, and a positive integer K ≤ |C|, is there a collection B of subsets of S with |B| = K such that, for each c ∈ C, there is a sub-collection of B whose union is exactly c?

Theorem 1. The decision RMP is NP-complete.

Proof.
• The decision Role Mining Problem is in NP. The set of roles ROLES, the user-to-role assignment UA, and the role-to-permission assignment PA together form the polynomial certificate/witness.
• We select the set basis problem as π′.
• The transformation is quite simple. Given an instance of the set basis problem, here is how we transform it to an instance of the decision Role Mining Problem: S denotes the permissions PRMS. C denotes UPA. Thus, every set c ∈ C stands for one user u. The integer K becomes the bound k. Now, the answer to the decision role mining problem directly provides the answer to the set basis problem.
• The transformation is clearly polynomial (since it is a direct one-to-one mapping).

Theorem 2. The decision δ-approx RMP is NP-complete.

Proof.
• The decision δ-approx RMP is in NP. The set of roles ROLES, the user-to-role assignment UA, and the role-to-permission assignment PA together form the polynomial certificate/witness. It only takes polynomial time to compute

$$\| M(UPA) - (M(UA) \otimes M(PA)) \|_1$$

and ensure that it is less than or equal to δ, and that |ROLES| ≤ k.
• We select the set basis problem as π′.
• The transformation is quite simple. Given an instance of the set basis problem, here is how we transform it to an instance of the decision δ-approx RMP: S denotes the permissions PRMS. C denotes UPA. Thus, every set c ∈ C stands for one user u. The integer K becomes the bound k. δ is set to 0. Now, the answer to the decision δ-approx RMP directly provides the answer to the set basis problem.
• The transformation is clearly polynomial.

Theorem 3. The decision MinNoise RMP is NP-complete.

Proof.
• The decision MinNoise RMP is in NP. The set of roles ROLES, the user-to-role assignment UA, and the role-to-permission assignment PA together form the polynomial certificate/witness. It only takes polynomial time to compute

$$\| M(UPA) - (M(UA) \otimes M(PA)) \|_1$$

and ensure that it is less than or equal to θ, and |ROLES| = k.
• We select the set basis problem as π′.
• The transformation is quite simple. Given an instance of the set basis problem, here is how we transform it to an instance of the decision MinNoise RMP: S denotes the permissions PRMS. C denotes UPA. Thus, every set c ∈ C stands for one user u. The integer K becomes the number of roles k. Set θ = 0. Now, the answer to the decision MinNoise RMP directly provides the answer to the set basis problem.
• The transformation is clearly polynomial.
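Two steps of these proofs are easy to make concrete: the transformation from a set basis instance to a UPA matrix, and the polynomial-time check of a certificate (ROLES, UA, PA). The Python sketch below is an illustration written for this purpose, not code from the paper. The function names `set_basis_to_rmp` and `verify_certificate`, the tiny instance, and the choice to order permissions by sorting S are all assumptions.

```python
# Illustrative sketch only; names and the example instance are assumptions.

def set_basis_to_rmp(S, C, K):
    """Map a set basis instance (S, C, K) to a decision RMP instance.

    Each element of S becomes one permission (column), each set c in C
    becomes one user (row), and K becomes the bound k on the number of roles.
    """
    perms = sorted(S)
    upa = [[int(p in c) for p in perms] for c in C]
    return perms, upa, K

def verify_certificate(upa, ua, pa, k, delta=0):
    """Check a claimed solution in O(m * n * k) time.

    Accepts if there are at most k roles and the Boolean product of UA and PA
    is within L1 distance delta of UPA (delta = 0 gives the basic decision RMP).
    """
    if len(pa) > k or len(ua) != len(upa):
        return False
    if any(len(row) != len(pa) for row in ua):
        return False
    diff = 0
    for i, target in enumerate(upa):
        for j, cell in enumerate(target):
            granted = int(any(ua[i][l] and pa[l][j] for l in range(len(pa))))
            diff += abs(granted - cell)
    return diff <= delta

# A small set basis instance: can {a,b}, {b,c}, {a,b,c} be built from 2 sets?
S = {"a", "b", "c"}
C = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
perms, upa, k = set_basis_to_rmp(S, C, K=2)

# Candidate basis {a,b}, {b,c}, written as roles over the columns a, b, c.
pa = [[1, 1, 0], [0, 1, 1]]
ua = [[1, 0], [0, 1], [1, 1]]
print(verify_certificate(upa, ua, pa, k))                       # True

# A single role {a,b,c} cannot reproduce the first two sets exactly.
print(verify_certificate(upa, [[1], [1], [1]], [[1, 1, 1]], 1))  # False
```

The verifier only confirms or rejects a given certificate; finding one is the hard part, which is what the NP-completeness results above are about.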

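For very small instances the optimization problems can still be solved by exhaustive search, which gives a feel for why they become intractable as the matrix grows. The sketch below is an assumption-laden illustration rather than an algorithm from the paper: `min_noise` tries every multiset of k candidate roles over n permissions and, for each user, picks the best union of a subset of those roles, so its running time is exponential in both n and k. The matrix `UPA` is the 4 × 5 user-privilege assignment of Table 1.

```python
# Illustrative brute force for tiny inputs only; names are assumptions.
from itertools import combinations_with_replacement

def min_noise(upa, k):
    """Smallest L1 noise achievable with k roles (MinNoise RMP), by exhaustion."""
    n = len(upa[0])
    # Encode each user's permission row as an integer bitmask.
    users = [int("".join(str(bit) for bit in row), 2) for row in upa]
    best = None
    for roles in combinations_with_replacement(range(2 ** n), k):
        # All permission sets a user can obtain: unions of subsets of the roles.
        unions = {0}
        for role in roles:
            unions |= {existing | role for existing in unions}
        # Each user independently takes the closest reachable permission set.
        cost = sum(min(bin(user ^ reach).count("1") for reach in unions)
                   for user in users)
        if best is None or cost < best:
            best = cost
    return best

def min_roles(upa, delta=0):
    """Smallest k whose best noise is at most delta (basic RMP when delta = 0)."""
    k = 0
    while min_noise(upa, k) > delta:
        k += 1
    return k

UPA = [[0, 1, 0, 0, 1], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1], [1, 1, 1, 0, 0]]

print(min_noise(UPA, 1))  # 5
print(min_noise(UPA, 2))  # 2
print(min_noise(UPA, 3))  # 0
print(min_roles(UPA))     # 3
```

On this instance the search agrees with the worked example: three roles are needed for an exact description, two roles leave a noise of 2, and one role leaves a noise of 5. Even here the number of role sets examined grows quickly with k, so the approach does not scale beyond toy matrices.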
Figure 1: An example of mapping basic RMP to Minimum Tiling Problem

(a) A 4 × 7 user-to-permission assignment (UPA)

      p1  p2  p3  p4  p5  p6  p7
u1     1   1   0   0   1   1   1
u2     0   0   0   1   1   1   1
u3     1   1   0   1   1   0   0
u4     1   1   0   0   0   0   0

(b) Shaded areas indicate tiles, the 3 identified roles

Tile 1: rows u1, u3, u4; columns p1, p2
Tile 2: rows u2, u3; columns p4, p5
Tile 3: rows u1, u2; columns p5, p6, p7

(c) user-to-role assignment (UA)

      R1  R2  R3
u1     1   0   1
u2     0   1   1
u3     1   1   0
u4     1   0   0

(d) role-to-permission assignment (PA)

      p1  p2  p3  p4  p5  p6  p7
R1     1   1   0   0   0   0   0
R2     0   0   0   1   1   0   0
R3     0   0   0   0   1   1   1

Instead of asking for the user-role assignment, UA, as well as the role-permission assignment PA, we could consider the problem of obtaining each individually. For exact cover, [14, 13] show that given the set of roles, and the role-permission assignment, one can determine the user-role assignment in polynomial time. However, when an approximate answer is required, such as in the MinNoise RMP, determining the user-role assignment requires $O(2^k mn)$ time. This is known as fixed parameter tractable since the solution is exponential only in terms of a fixed parameter. Unfortunately, k refers to the number of roles, which is likely to be quite large in practice, making this quite infeasible. It remains to be seen if finding the user-role assignment in the case of the δ-approx RMP is any easier.

In the following sections, we show that the RMP along with several variants can be mapped to other problems already studied in the data mining and data analysis literature. We discuss the complexity for each variant along with suggested methods for solving the problem.

4. MAPPING THE RMP TO THE TILING PROBLEM

In this section, we demonstrate the equivalence of the Role Mining Problem with the Tiling Databases problem. This mapping allows us to directly borrow existing implementation solutions to RMP. In fact, the original Database Tiling paper by Geerts et al. [6] looked at a set of five problems, one of which exactly matches the role mining problem. We now describe the relevant problems studied and then discuss their implications.

4.1 Tiling Databases

Consider a binary matrix of size m × n where the number of rows, m, can be viewed as the number of objects and the number of columns, n, can be viewed as the number of attributes. A 1 in cell {ij} denotes that object i has/owns attribute j (i.e., some relationship exists between object i and attribute j). Now, let an itemset I denote a collection (subset) of the attributes. Then a tile t corresponding to an itemset I consists of the columns in itemset I as well as all the rows that have 1s in all the columns in I. The area of a tile is defined as the cardinality of the tile (i.e., the number of 1s in the tile).

Informally, a tile consists of a block of ones in a boolean database as shown in Figure 1(b). A collection of (possibly overlapping) tiles constitutes a tiling. Among the collection of 5 related problems defined in [6], the Minimum Tiling problem is of the most interest to us, which is defined below.

Definition 12 (Minimum Tiling). Given a boolean matrix, find a tiling of the matrix with area equal to the total number of 1s in the matrix and consisting of the least possible number of tiles.

4.2 Mapping Basic RMP to Minimum Tiling

To see that the Minimum Tiling problem corresponds exactly to the basic RMP, one must first see how a tile corresponds to a role. As defined above, a tile is just a block of 1s, i.e., a collection of rows and columns that all have 1s. Remember that without semantics, a role is simply a collection of permissions. Thus, inherently, in any tile, the collection of the columns provides the role-to-permission assignment (PA) for that role. At the same time, the collection of rows denotes those users/entities that have that role; thus the collection of rows corresponds to the user-to-role assignment (UA) for that role. As such, any tiling corresponds to a set of roles and their role/permission and user/role assignments. If the tiling completely covers the entire matrix, then all 1s have been covered, meaning that all user/permission assignments have been covered. Since each tile corresponds to a role, if the tiling is minimal and covers the entire matrix, this means that we have found a set of minimal roles such that they completely describe the given user-permission assignment.

The following example clearly demonstrates this mapping. In the context of tiling databases, Figure 1(a) shows the boolean matrix representing a transactional database consisting of 4 transactions and 7 items. Rows denote the transactions and columns denote the items. We may order transactions from top to bottom sequentially as 1 to 4 and items from left to right as 1 to 7. A 1 in cell {ij} represents that transaction i contains item j. Figure 1(b) shows a tiling of the matrix consisting of 3 tiles. The shaded region represents a tile. Thus, Tile 1 = {(1,1), (1,2), (3,1), (3,2), (4,1), (4,2)}, Tile 2 = {(2,4), (2,5), (3,4), (3,5)} and Tile 3 = {(1,5), (1,6), (1,7), (2,5), (2,6), (2,7)}. As one can see, Tiles 2 and 3 overlap on cell (2,5). Figure 1(b) also gives the minimum tiling of the matrix. It is not possible to find a tiling that covers the entire matrix with fewer than 3 tiles. We can view the same problem from the role mining perspective. As described before, each tile corresponds to a role. Figures 1(c) and 1(d) show an optimal UA and PA, such that M(UA) ⊗ M(PA) = M(UPA). Again, the decomposition is optimal in the sense that it is impossible to find only two roles such that UA and PA will be 0-consistent with UPA.

Formally, we can reduce the Minimum Tiling problem to the basic RMP as follows.

Theorem 4. The Minimum Tiling problem is identical to the basic Role Mining Problem.

Proof. To show that the two problems are identical we show that their inputs and outputs exactly match. Thus, for every input instance, the outputs of both problems have a direct one-to-one mapping.

• The input to both problems is an m × n boolean matrix.
• For any problem instance, the Minimum Tiling problem returns a set of tiles that completely cover the input while minimizing the number of tiles. Each tile corresponds to a role, R. For each tile, we extract the set of columns C, in the tile. For each column c ∈ C, add the assignment {c, R} to PA. Similarly, for each row i, belonging to the tile, add the assignment {i, R} to UA. Add R to ROLES.

4.3 Algorithm to Discover Minimal Roles

Since the Minimum Tiling problem is equivalent to the basic RMP, the algorithms developed for Minimum Tiling now directly apply. Geerts et al. [6] propose a greedy approximation algorithm to find the minimum tiling of any given database. This algorithm depends on finding all maximal tiles having an area over a given threshold. A depth first search strategy is used to find all large tiles. They prove that the Minimum Tiling problem can be approximated within the factor O(log mn), given an oracle that finds, for any database D and tiling T, the tile t such that area(T ∪ t) is the maximum (i.e., the oracle returns the tile which covers as much as possible of the remaining uncovered part of the database). Such an oracle can be implemented reasonably efficiently by adapting the maximal tile algorithm; [6] provides more detail on this. We now briefly present the adapted algorithm for the basic RMP.

Algorithm 1 presents the basic RMP algorithm. It consists of two phases. In the first phase, we find a minimum tiling for the given UPA. In the second phase, we convert the tiling into ROLES, UA, and PA. As described earlier, phase 1 uses a simple greedy strategy of adding the largest uncovered tile to the current tiling, until UPA is completely covered (i.e., the largest uncovered tile remaining is empty). Algorithm 2 describes the procedure for finding the largest uncovered tile from UPA.

Algorithm 1 RMP(UPA)
1: {Find the minimum tiling for UPA}
2: T ← {}
3: while (T′ ← LUTM(UPA, T)) ≠ {} do
4:    T ← T ∪ T′
5: end while
6: {Convert the minimum tiling into UA and PA}
7: ROLES ← {}, UA ← {}, PA ← {}
• The resulting set of roles (ROLES), user-role assign- 8: for each tile t ∈ T do
ment (U A), and permission-role assignment (P A) are 9: Create a new role R and add it to ROLES
guaranteed to be a solution to the basic RMP. (i.e., 10: Extract the set of permissions P in the tile
U A and P A are 0-consistent with the corresponding 11: For each permission p ∈ P , add the assignment {p, R}
U P A, and the number of roles is minimal). To prove to P A
the 0-consistency, it is sufficient to note that U A ⊗ P A 12: Extract the set of users U in the tile
gives us the original tiling of the input matrix which is 13: For each user u ∈ U , add the assignment {u, R} to
equivalent to the original U P A. We can prove the min- UA
imality by contradiction. Assume that a different solu- 14: end for
tion to the RMP exists – consisting of ROLES  , U A
and P A where |ROLES  | < |ROLES|. In this case, The LUTM algorithm (Algorithm 2) is a depth-first re-
we can transform this solution into a corresponding cursive algorithm that finds the largest uncovered tile. In
solution for tiling. For each role r ∈ ROLES  , create order to do a depth-first search, we simply assume some
the corresponding tile tR consisting of the permissions canonical order over the permissions. The key idea behind
 
given by P A and the users given by U A . The union the algorithm is that all large tiles containing a permis-
of all tiles R TR gives a tiling of the matrix. This sion i ∈ P RM S, but not containing any permission lower
tiling covers the entire matrix since U A and P A are than i (according to the canonical order) can be found in
0-consistent with U P A. However, the number of tiles the so-called i-conditional database [7]. In our context,
is the same as |ROLES  | which is less than |ROLES|. the P -conditional database U P AP consists of all user-to-
But that means that the earlier solution is not min- permission assignments that contain P , but from which all
imal – and we have a contradiction. Therefore, the permissions before the last permission in P and that last
solution to the tiling databases problem directly maps permission itself would have been removed. Now, any large
to a solution for the role mining problem. tile that is found in this conditional database, at once implies
a corresponding large tile including P . Therefore, whenever
we want to compute an area associated with a set of permis-
Thus, the Minimum Tiling problem exactly corresponds sions P  in U P AP , we simply need to add |P | to the width
to the basic RMP. of the area (|P  |) and multiply this with |U (P  )| [6]. We

modify the original LTM algorithm [6] to return the largest uncovered tile. For this, we keep track of the current largest uncovered tile, LT, and its uncovered area, LTarea (written AreaLT in Algorithm 2). The main steps of the algorithm are as follows:

Step 1: Originally, LT and LTarea are initialized to the empty set and 0, respectively. The current set of permissions being considered, P, is also initialized to the empty set. Lines 1 and 2 perform this initialization.

Step 2: Line 3 starts the main loop of the algorithm, and iterates over each permission separately. On lines 4-7, if the uncovered area of the current tile being considered is larger than the current known best, the best is updated to it; i.e., LT and LTarea always refer to the largest uncovered tile seen so far. Here, we need to clarify what we mean by uncovered area. For any tile, the uncovered area is the number of 1s that the tile covers that are not already covered in the existing tiling, i.e., the uncovered area refers to that part of the tile that is new and not seen before.
   Referring back to Figure 1(b), assume that the current tiling consists of Tile 1 and Tile 2. Now, the covered area is simply the number of distinct 1s included in the tiling. In our case, since the tiles do not overlap, the overall covered area is equal to 10 (6 for Tile 1 and 4 for Tile 2).
   Now, suppose we are considering Tile 3. The uncovered area of Tile 3 is 5 (since the total number of 1s in Tile 3 is 6, and one of those 1s, at position {u2, p5}, is already covered in the current tiling). Thus, given a database and an existing tiling, whenever a new tile is considered, it is easy to compute the uncovered area by simply removing the already covered area from the area of the tile.

Step 3: Lines 8-12 create the conditional database UPA_(P∪{p}).

Step 4: Finally, line 13 invokes the algorithm recursively to calculate the largest uncovered area in the smaller conditional database. Since the conditional database progressively shrinks, the algorithm is guaranteed to finish after all the permissions have been considered. The algorithm shown here is quite simple. However, its efficiency can be significantly improved by using several pruning techniques; more details can be found in [6].

Algorithm 2 LUTM(UPA, T)
 1: P ← {}
 2: LT ← {}, AreaLT ← 0
 3: for ∀p ∈ PRMS do
 4:   if uncovered area of t(P ∪ {p}) > AreaLT then
 5:     LT ← t(P ∪ {p})
 6:     Update AreaLT to have uncovered area of t(P ∪ {p})
 7:   end if
 8:   {Create the conditional database for recursion}
 9:   UPA_(P∪{p}) ← {}
10:   for (∀q | (q ∈ PRMS) ∧ (q > p)) do
11:     Add (q, U({p}) ∩ U({q})) to UPA_(P∪{p})
12:   end for
13:   Compute T((P ∪ {p}), UPA_(P∪{p})) recursively
14: end for

5. MAPPING THE MINNOISE RMP TO THE DISCRETE BASIS PROBLEM

In this section, we demonstrate the direct equivalence of the MinNoise RMP to the Discrete Basis problem. This mapping again allows us to directly borrow existing implementation solutions. Miettinen, in his thesis [13], studies a set of three related problems and shows that these are NP-complete. We now describe the relevant problems studied and then discuss their implications.

The Discrete Basis problem [14] studies the problem of finding a basis from given data. Similar to Principal Component Analysis (PCA), the discrete basis problem is a technique for simplifying a dataset, by reducing multidimensional datasets to lower dimensions for summarization, analysis, and/or compression. Unlike PCA, the discrete basis problem only considers boolean data, and finds boolean bases.

         p1  p2  p3
    u1    1   1   1
    u2    1   0   0
    u3    1   1   1
    u4    1   1   1

(a) A 4 × 3 user-to-permission assignment (UPA).

         R1  R2               p1  p2  p3
    u1    0   1          R1    1   0   0
    u2    1   0          R2    0   1   1
    u3    0   1
    u4    1   1

(b) Decomposition of UPA into UA and PA with k=2.

         R1  R2               p1  p2  p3
    u1    0   1          R1    1   0   0
    u2    1   0          R2    1   1   1
    u3    1   1
    u4    1   1

(c) Optimal decomposition of UPA into UA and PA with k=2.

Figure 2: An example of mapping MinNoise RMP to DBP
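The two decompositions shown in Figure 2 can be checked mechanically. The following is a minimal sketch in Python, not code from the paper: the matrices are transcribed from Figure 2, the function names `bool_product` and `l1_error` are illustrative, and the error is assumed to be the number of cells in which UPA and the Boolean product UA ⊗ PA disagree (ordinary matrix product with OR in place of addition).

```python
def bool_product(S, B):
    """Boolean matrix product: cell (i, j) is 1 if some role r has S[i][r] = B[r][j] = 1."""
    k, d = len(B), len(B[0])
    return [[int(any(row[r] and B[r][j] for r in range(k))) for j in range(d)]
            for row in S]


def l1_error(C, S, B):
    """Number of cells in which C and the Boolean product of S and B disagree."""
    approx = bool_product(S, B)
    return sum(abs(c - a)
               for c_row, a_row in zip(C, approx)
               for c, a in zip(c_row, a_row))


# Figure 2(a): user-to-permission assignment (rows u1..u4, columns p1..p3).
UPA = [[1, 1, 1], [1, 0, 0], [1, 1, 1], [1, 1, 1]]

# Figure 2(b): one possible decomposition with k = 2.
UA_b = [[0, 1], [1, 0], [0, 1], [1, 1]]
PA_b = [[1, 0, 0], [0, 1, 1]]

# Figure 2(c): the optimal decomposition with k = 2.
UA_c = [[0, 1], [1, 0], [1, 1], [1, 1]]
PA_c = [[1, 0, 0], [1, 1, 1]]

print(l1_error(UPA, UA_b, PA_b))  # 2 (u1 and u3 both lose p1)
print(l1_error(UPA, UA_c, PA_c))  # 0 (exact reconstruction)
```

Plain nested lists are used here only to keep the sketch dependency-free; a bit-vector or array representation would be a natural choice for larger matrices.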

We have already introduced some of the notation used for defining the discrete basis problem from [14]. Formally, the discrete basis problem is defined as follows:

Definition 13 (Discrete Basis Problem). Given a matrix $C \in \{0,1\}^{n \times d}$ and a positive integer $k \le \min\{n, d\}$, find a matrix $B \in \{0,1\}^{k \times d}$ minimizing

$$\ell_{\otimes}(C, B) = \min_{S \in \{0,1\}^{n \times k}} \| C - S \otimes B \|_1$$

The Discrete Basis Problem only asks for a discrete basis. A related problem is the Basis Usage problem:

Definition 14 (Basis Usage Problem). Given a matrix $C \in \{0,1\}^{n \times d}$ and a matrix $B \in \{0,1\}^{k \times d}$, find a matrix $S \in \{0,1\}^{n \times k}$ minimizing

$$\| C - S \otimes B \|_1$$

Together, the Discrete Basis Problem and the Basis Usage Problem correspond to the MinNoise RMP. C represents the user-permission assignment, UPA. B represents the role-permission assignment, PA. S represents the user-role assignment, UA. The following example clearly demonstrates this equivalence.

In the context of the discrete basis problem, the input is a boolean matrix, where the rows and columns might stand for anything: users and permissions, or documents and words. For now, we assume that these show the user-permission assignment, UPA. Thus, Figure 2(a) is an n × m input binary matrix where n = 4, m = 3. Given the positive integer k = 2 (k < min{m, n}), Figure 2(b) shows one possible decomposition into a usage matrix S and basis vector matrix B. As we can see, in this case |C − S ⊗ B| = 2 (see footnote 2). Figure 2(c) shows a better decomposition since |C − S ⊗ B| = 0. Indeed, this is the best (optimal) decomposition possible for the given input matrix. Note that the discrete basis problem only asks for the optimal basis B (i.e., role-permission assignment PA). Given B, the basis usage problem asks for the optimal usage matrix S (i.e., user-role assignment UA). In our case, the MinNoise RMP asks for both PA and UA together. The difference is semantic: in either case, the problem (as stated) is NP-complete [13].

Footnote 2: We keep the notation for the matrix product and the L1 norm as in the original DBP paper [14], even though it differs slightly from that used in the RMP.

However, splitting the problem into two parts (i.e., finding optimal PA, and then finding optimal UA given PA) does help in the case of the basic RMP. For the basic RMP, we wish to exactly match the given UPA. In this case, while the discrete basis problem (finding optimal PA) remains NP-hard, the basis usage problem (finding UA given PA) becomes polynomial. A simple algorithm for the basis usage problem in this case is as follows: for each user and for each role, if the set of permissions of the role is a subset of the permissions of the user, then assign that role to that user. Since we only assign a role to a user when all of its permissions are owned by the user, there are no mistakes (and we have an exact match). Obviously, this assumes that the provided basis is complete (i.e., that each user can be exactly described using some subset of the roles), and thus all of the required roles are assigned to the user. Thus, after going through the entire set of users and roles, we automatically come up with the optimal UA. The running time of this algorithm is clearly polynomial in the size of the input [13].

Miettinen [13] also shows that the discrete basis problem cannot be approximated in polynomial time within any constant factor unless P = NP. This essentially shuts the door on any attempt to find an approximation algorithm for the problem. However, heuristic solutions based on association rule mining have been proposed and seem to give fairly good results on simulated data. Again, [13] provides further details on this. Other heuristics can also be used. One possibility is to extend the RoleMiner algorithm [21] to find the best candidates to describe the dataset. As part of future work, we intend to comprehensively test a set of heuristics (including the one in [13]) to determine what really works well in our domain.

6. CONCLUSIONS AND FUTURE RESEARCH

In this paper, we have formally defined the role mining problem (RMP) for conducting bottom-up role engineering. In addition to the basic RMP, we also define the δ-approx RMP and the MinNoise RMP, which are useful when performing role mining in real-world settings. We have analyzed the theoretical bounds of the basic RMP as well as its variants and have shown that all of them are NP-complete problems. We have mapped these problems to recently proposed problems in the area of data mining and data analysis: database tiling and the discrete basis problem. As a result, we could borrow the implementation solutions proposed for these problems and directly apply them to solve the basic RMP and the MinNoise RMP. We are currently working towards a solution to the δ-approx RMP variant.

Also, in mathematics, the problem of finding the boolean rank (Schein rank) of a matrix is exactly the same as the basic RMP. It has been proven earlier that finding the Schein rank is NP-complete [11]. This matches our results. Other properties of the boolean rank have also been studied [1]. It would be interesting to investigate what other results are directly applicable to our problem and see if they offer new insight into our domain. Bipartite graphs and bicliques are another way of defining the RMP and its variants. Several papers have looked at different variants of this (e.g., [8] and [17]), though most concentrate on finding one biclique from a bipartite or general graph. Conjunctive clustering [15] generalizes this to finding multiple bicliques, which is more relevant to our problem. We also need to see which solutions among this work can be utilized for our problem.

Since the RMP and its variants are NP-complete, it is important to come up with heuristic strategies for achieving implementations with reasonable complexity. In fact, the recently proposed RoleMiner solution [21] could also serve as a heuristic strategy for the basic RMP. We intend to investigate this and other possibilities in real settings to create a set of tools for the security administrator. Moreover, most of the role mining approaches employ clustering techniques or their variants to discover roles. We are currently investigating other data mining techniques, including association rule mining (specifically closed itemset mining [16, 12]), for role discovery.
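As a concrete illustration of the polynomial-time basis usage step described in Section 5 for the basic RMP (assign a role to a user whenever all of the role's permissions are held by that user), the following minimal Python sketch may help. It is not code from the paper: the names `assign_roles` and `reconstruct` and the dictionary-of-sets encoding are assumptions made for illustration, and the data reuse the UPA of Figure 2(a) with the PA of Figure 2(c).

```python
def assign_roles(upa, pa):
    """Assign to each user every role whose permission set is a subset of the user's permissions."""
    return {user: {role for role, perms in pa.items() if perms <= user_perms}
            for user, user_perms in upa.items()}


def reconstruct(ua, pa):
    """Permissions each user obtains through the assigned roles (the Boolean product UA x PA)."""
    return {user: set().union(*(pa[role] for role in roles))
            for user, roles in ua.items()}


# UPA from Figure 2(a) and PA from Figure 2(c), written as sets.
upa = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1"},
    "u3": {"p1", "p2", "p3"},
    "u4": {"p1", "p2", "p3"},
}
pa = {"R1": {"p1"}, "R2": {"p1", "p2", "p3"}}

ua = assign_roles(upa, pa)
print(sorted(ua["u2"]))             # ['R1']
print(reconstruct(ua, pa) == upa)   # True: the match is exact because the basis is complete
```

Under this rule u1 receives R1 in addition to R2, unlike the UA drawn in Figure 2(c); the extra assignment is harmless because R1 adds no permission that u1 does not already hold. The loop visits every (user, role) pair once, so the running time is polynomial in the size of the input, as stated in the text.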

Acknowledgments

We would like to gratefully acknowledge the help of Pauli Miettinen and Taneli Mielikäinen.

7. REFERENCES

[1] C. Damm, K. H. Kim, and F. Roush. On covering and rank problems for boolean matrices and their applications. In Computing and Combinatorics: 5th Annual International Conference, COCOON’99, volume 1627 of Lecture Notes in Computer Science, pages 123–133. Springer-Verlag, 1999.
[2] E. J. Coyne. Role-engineering. In 1st ACM Workshop on Role-Based Access Control, 1995.
[3] D. Ferraiolo, R. Sandhu, S. Gavrila, D. Kuhn, and R. Chandramouli. Proposed NIST standard for role-based access control. TISSEC, 2001.
[4] M. P. Gallagher, A. O’Connor, and B. Kropp. The economic impact of role-based access control. Planning report 02-1, National Institute of Standards and Technology, March 2002.
[5] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, chapter 3. W. H. Freeman, 1979.
[6] F. Geerts, B. Goethals, and T. Mielikäinen. Tiling databases. In Discovery Science, Lecture Notes in Computer Science, pages 278–289. Springer-Verlag, 2004.
[7] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In W. Chen, J. Naughton, and P. A. Bernstein, editors, 2000 ACM SIGMOD Intl. Conference on Management of Data, pages 1–12. ACM Press, May 2000.
[8] D. S. Hochbaum. Approximating clique and biclique problems. J. Algorithms, 29(1):174–200, 1998.
[9] A. Kern, M. Kuhlmann, A. Schaad, and J. Moffett. Observations on the role life-cycle in the context of enterprise security management. In 7th ACM Symposium on Access Control Models and Technologies, June 2002.
[10] M. Kuhlmann, D. Shohat, and G. Schimpf. Role mining - revealing business roles for security administration using data mining technology. In Symposium on Access Control Models and Technologies (SACMAT). ACM, June 2003.
[11] G. Markowsky. Ordering D-classes and computing Schein rank is hard. Semigroup Forum, 44:373–375, 1992.
[12] T. Mielikäinen. Intersecting data to closed sets with constraints. In B. Goethals and M. J. Zaki, editors, FIMI, volume 90 of CEUR Workshop Proceedings. CEUR-WS.org, 2003.
[13] P. Miettinen. The discrete basis problem. Master’s thesis, University of Helsinki, 2006.
[14] P. Miettinen, T. Mielikäinen, A. Gionis, G. Das, and H. Mannila. The discrete basis problem. In Knowledge Discovery in Databases: PKDD 2006, Lecture Notes in Artificial Intelligence, pages 335–346, 2006.
[15] N. Mishra, D. Ron, and R. Swaminathan. On finding large conjunctive clusters. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, volume 2777 of Lecture Notes in Computer Science, pages 448–462. Springer, 2003.
[16] F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. J. Zaki. Carpenter: finding closed patterns in long biological datasets. In KDD, pages 637–642, 2003.
[17] R. Peeters. The maximum edge biclique problem is NP-complete. Discrete Appl. Math., 131(3):651–654, 2003.
[18] R. S. Sandhu et al. Role-based Access Control Models. IEEE Computer, pages 38–47, February 1996.
[19] A. Schaad, J. Moffett, and J. Jacob. The role-based access control system of a European bank: A case study and discussion. In Proceedings of ACM Symposium on Access Control Models and Technologies, pages 3–9, May 2001.
[20] J. Schlegelmilch and U. Steffens. Role mining with ORCA. In Symposium on Access Control Models and Technologies (SACMAT). ACM, June 2005.
[21] J. Vaidya, V. Atluri, and J. Warner. RoleMiner: mining roles using subset enumeration. In CCS ’06: Proceedings of the 13th ACM conference on Computer and communications security, pages 144–153, 2006.
