The Role Mining Problem: Finding A Minimal Descriptive Set of Roles ∗
ABSTRACT
Devising a complete and correct set of roles has been recognized as one of the most important and challenging tasks in implementing role based access control. A key problem related to this is the notion of goodness/interestingness – when is a role good/interesting? In this paper, we define the role mining problem (RMP) as the problem of discovering an optimal set of roles from existing user permissions. The main contribution of this paper is to formally define RMP, and analyze its theoretical bounds. In addition to the above basic RMP, we introduce two different variations of the RMP, called the δ-approx RMP and the Minimal Noise RMP that have pragmatic implications. We reduce the known “set basis problem” to RMP to show that RMP is an NP-complete problem. An important contribution of this paper is also to show the relation of the role mining problem to several problems already identified in the data mining and data analysis literature. By showing that the RMP is in essence reducible to these known problems, we can directly borrow the existing implementation solutions and guide further research in this direction.

Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection - Access controls; H.2.8 [Database Management]: Database Applications - Data Mining

General Terms
Security

Keywords
RBAC, role engineering, role mining

∗ The work is supported in part by the National Science Foundation under grant IIS-0306838.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SACMAT’07, June 20-22, 2007, Sophia Antipolis, France.
Copyright 2007 ACM 978-1-59593-745-2/07/0006 ...$5.00.

1. INTRODUCTION
Role-based access control (RBAC) has been adopted successfully by a variety of commercial systems. As a result, RBAC has become the norm in many of today’s organizations for enforcing security. Basically, a role is nothing but a set of permissions. Roles represent organizational agents that perform certain job functions within the organization. Users, in turn, are assigned appropriate roles based on their qualifications [18, 3].

However, one of the major challenges in implementing RBAC is to define a complete and correct set of roles. This process, known as role engineering [2], has been identified as one of the costliest components in realizing RBAC [4]. Essentially, role engineering is the process of defining roles and assigning permissions to them.

There are two basic approaches towards role engineering: top-down and bottom-up. Under the top-down approach, roles are defined by carefully analyzing and decomposing business processes into smaller units in a functionally independent manner. These functional units are then associated with permissions on information systems. In other words, this approach begins with defining a particular job function and then creating a role for this job function by associating needed permissions. Often, this is a cooperative process where various authorities from different disciplines understand the semantics of business processes of one another and then incorporate them in the form of roles. Since there are often dozens of business processes, tens of thousands of users and millions of authorizations, this is rather a difficult task. Therefore, relying solely on a top-down approach in most cases is not viable, although some case studies [19] indicate that it has been done successfully by some organizations (though at a high cost).

In contrast, since organizations do not exist in a vacuum, the bottom-up approach utilizes the existing permission assignments to formulate roles. Starting from the existing permissions before RBAC is implemented, the bottom-up approach aggregates these into roles. It may also be advantageous to use a mixture of the top-down and the bottom-up approaches to conduct role engineering. While the top-down model is likely to ignore the existing permissions, a bottom-up model may not consider business functions of an organization [9]. However, the bottom-up approach excels in the fact that much of the role engineering process can be automated. Role mining can be used as a tool, in conjunction with a top-down approach, to identify potential or candidate
roles which can then be examined to determine if they are appropriate given existing functions and business processes.

There have been several attempts to propose good bottom-up techniques for finding roles. Kuhlmann et al. [10] present a clustering technique similar to the well known k-means clustering, which requires pre-defining the number of clusters. In [20], Schlegelmilch and Steffens propose an agglomerative clustering based approach to role mining (called ORCA), which discovers roles by merging permissions appropriately. However, in ORCA, the order in which permissions are merged determines the outcome of roles. Moreover, it does not allow overlapping roles (i.e., a user cannot play multiple roles), which is a significant drawback. More recently, Vaidya et al. [21] propose an approach based on subset enumeration, called RoleMiner, which eliminates the above limitations.

An inherent problem with all of the above approaches is that there is no formal notion of goodness/interestingness of a role. All of the algorithms above present heuristic ways to find a set of candidate roles. While offering justifications for the identified roles, there is no integrative view of the entire set of roles. For insightful bottom-up analysis, we need to define interestingness metrics for roles. [21] takes a first step towards this by ordering candidate roles on the basis of their support (i.e., roles that are more prevalent are ordered higher). However, this metric is still quite ad hoc and preliminary. Also, while one may come up with interestingness metrics for a role by itself, this does not directly lead to the notion of a good collection of roles. Indeed, there is no formal definition of what is a good collection of roles. Defining this is critical for the security administrator to gain confidence and be able to fully utilize the output of any role mining algorithm beyond a piecemeal fashion.

The main contribution of this paper is to formally define the role mining problem, and analyze its theoretical bounds. Assuming that we can represent the user permissions as a binary matrix, informally, we define the basic role mining problem as follows: Given an m × n binary matrix A representing the user-permissions, decompose A into two matrices B and C, where B is an m × k matrix representing the user-role assignment and C is a k × n matrix representing the role-permission association, such that k is minimal.

It is important to note that it is quite easy to come up with some decomposition of matrix A. For example, two extreme cases are 1) where each user is placed in a role by itself (i.e., k = m, B = I, the identity matrix, and C = A), and 2) where each permission is placed in a role by itself (i.e., k = n, B = A, and C = I, the identity matrix). Both of these decompositions are accurate, but are not necessarily minimal. An alternative decomposition is to place all users in a single role (i.e., k = 1). However, this decomposition is likely to be very inaccurate unless all of the users indeed have the same set of permissions. In none of these cases are the roles likely to be accurate (i.e., close to reality). What we are really interested in is a fairly accurate set of roles. In this regard, we consider the minimal set of roles as the accurate set. Minimality is a good notion, since it allows us to formally define the problem. Without semantics (i.e., human expert knowledge), minimality serves as a best approximation for realizing good descriptive roles.

One may note that, in a specific implementation of RBAC, the most useful set of roles may be different from the minimal set. It is analogous to employing the best normalization in designing a database schema. However, from the practical point of view, one may denormalize the database for improving the query response. Similar to our analogous example, the minimal set of roles gives us a good set of roles to begin with. At least, it shows the bare minimum required to accurately describe the current state of the organization. We argue that this is likely to be of immense help to the security administrator. In this paper, we formally define the basic RMP and show that the decision version is NP-complete by reducing the known NP-complete set basis problem to this.

We also consider several interesting variations of the basic RMP, including the δ-approx Role Mining Problem (δ-approx RMP) and the Minimal Noise Role Mining Problem (MinNoise RMP). These are of practical importance. Both the δ-approx RMP and the MinNoise RMP are likely to result in a lower number of roles than the basic RMP – and might more accurately model the dynamic state of the organization. We describe these individually below. While solving the basic RMP, the goal is to identify the minimal set of roles such that the original user-permission assignment matrix is decomposed. However, if we allow a slight inaccuracy in the decomposition such that when multiplied it does not generate the original matrix, it may still be acceptable. It is this variation of the basic RMP that we recognize as the δ-approx RMP. Moreover, when discovering roles, one may state the number of roles to be identified. Given the specified number of roles, one may come up with a decomposition of the user-permission assignment. Note that, in this process, since we rigidly set the order of the matrices to be decomposed, they may not generate the original matrix when multiplied. This discrepancy, denoted as noise, should be at a minimum. We recognize this problem as the MinNoise RMP. We show that the decision versions of both the δ-approx RMP and the MinNoise RMP are NP-complete.

We have discovered that our basic role mining problem is identical to the problem of database tiling recently proposed by Geerts et al. [6]. We show how our basic RMP can be mapped to the database tiling and present an algorithm to use tiling to discover roles. Similarly, the recently proposed discrete basis problem [14] is identical to the MinNoise RMP, and we show the mapping between our MinNoise RMP and the discrete basis problem.

This paper is organized as follows. In section 2 we review the RBAC model and some preliminary definitions employed in the paper. In section 3, we define our basic role mining problem as well as its variations and prove results about their complexity. In sections 4 and 5 we show the mappings of our RMP to the database tiling and the discrete basis problem, respectively. Finally, section 6 provides some insight into our ongoing and future research.

2. PRELIMINARIES

We adopt the NIST standard of the Role Based Access Control (RBAC) model [3]. For the sake of simplicity, we do not consider sessions, role hierarchies or separation of duties constraints in this paper. In other words, we restrict ourselves to RBAC0 without considering sessions.

Definition 1 (RBAC).

• U, ROLES, OPS, and OBJ are the set of users, roles, operations, and objects.
• UA ⊆ U × ROLES, a many-to-many mapping user-to-role assignment relation.
• PRMS (the set of permissions) ⊆ {(op, obj) | op ∈ OPS ∧ obj ∈ OBJ}
• PA ⊆ ROLES × PRMS, a many-to-many mapping of role-to-permission assignments.
• UPA ⊆ U × PRMS, a many-to-many mapping of user-to-permission assignments.
• assigned_users(R) = {u ∈ U | (u, R) ∈ UA}, the mapping of role R onto a set of users.

assignment of permission j to role i. Finally, the user-to-permission mapping can be represented as an m × n boolean matrix where a 1 in cell {ij} indicates the assignment of permission j to user i.

We now introduce the notion of δ-consistency between UA, PA, and UPA which is critical to the notion of accuracy of the roles. The L1 norm defined above is useful in defining this.

Definition 4 (δ-Consistency). A given user-to-role assignment UA, role-to-permission assignment PA and user-to-permission assignment UPA are δ-consistent if and only if

$\lVert M(UA) \otimes M(PA) - M(UPA) \rVert_1 \le \delta$

where M(UA), M(PA), and M(UPA) denote the matrix representation of UA, PA and UPA respectively.
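The δ-consistency test of Definition 4 can be sketched in a few lines. The sketch assumes that the condition is the L1-norm bound on M(UA) ⊗ M(PA) − M(UPA) that the proofs in Section 3.2 check, and that ⊗ is the boolean matrix product. The helper names (`to_matrix`, `bool_product`, `l1_distance`, `is_delta_consistent`) and the toy users, roles, and permissions are hypothetical and chosen only for illustration.

```python
def to_matrix(pairs, rows, cols):
    """Boolean matrix M(X) of a relation X given as a set of (row, col) pairs."""
    return [[1 if (r, c) in pairs else 0 for c in cols] for r in rows]


def bool_product(a, b):
    """Boolean matrix product a ⊗ b: OR of ANDs instead of sum of products."""
    return [[int(any(a[i][l] and b[l][j] for l in range(len(b))))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def l1_distance(a, b):
    """L1 norm of a - b, i.e. the number of differing cells for 0/1 matrices."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_delta_consistent(ua, pa, upa, users, roles, perms, delta):
    """True if UA, PA and UPA are delta-consistent under the assumptions above."""
    m_ua = to_matrix(ua, users, roles)
    m_pa = to_matrix(pa, roles, perms)
    m_upa = to_matrix(upa, users, perms)
    return l1_distance(bool_product(m_ua, m_pa), m_upa) <= delta


# Toy instance (hypothetical data): alice is missing the "write" permission
# that UPA grants her, so the assignments differ from UPA in exactly one cell.
users = ["alice", "bob"]
roles = ["reader", "editor"]
perms = ["read", "write"]
ua = {("alice", "reader"), ("bob", "reader"), ("bob", "editor")}
pa = {("reader", "read"), ("editor", "write")}
upa = {("alice", "read"), ("alice", "write"), ("bob", "read"), ("bob", "write")}

print(is_delta_consistent(ua, pa, upa, users, roles, perms, 0))  # False
print(is_delta_consistent(ua, pa, upa, users, roles, perms, 1))  # True
```

With δ = 0 the check demands an exact reconstruction of UPA, which is the setting of the basic RMP; larger δ tolerates that many mismatched cells.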
      p1  p2  p3  p4  p5
u1     0   1   0   0   1
u2     1   1   1   0   1
u3     1   1   0   1   1
u4     1   1   1   0   0

Table 1: User-privilege assignment

      r1  r2  r3
u1     0   0   1
u2     1   0   1
u3     0   1   1
u4     1   0   0

      p1  p2  p3  p4  p5
r1     1   1   1   0   0
r2     1   1   0   1   0
r3     0   1   0   0   1

Table 2: Basic Role Mining Problem

Definition 6 (δ-approx RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, and a threshold δ, find a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, δ-consistent with UPA and minimizing the number of roles, k.
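Tables 1 and 2 and Definition 6 can be checked mechanically. The sketch below assumes that ⊗ is the boolean matrix product and counts mismatched cells as the L1 distance. The three-role matrices are copied from Table 2; the two-role alternative (keeping only r1 and r3) is a hypothetical example constructed here to illustrate a δ-approximate decomposition and does not appear in the paper.

```python
def bool_product(ua, pa):
    """Boolean matrix product: user i gets permission j if some assigned role has it."""
    return [[int(any(ua[i][l] and pa[l][j] for l in range(len(pa))))
             for j in range(len(pa[0]))]
            for i in range(len(ua))]


def mismatches(a, b):
    """Number of cells in which two 0/1 matrices differ (their L1 distance)."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Table 1: user-privilege assignment (rows u1..u4, columns p1..p5).
UPA = [[0, 1, 0, 0, 1],
       [1, 1, 1, 0, 1],
       [1, 1, 0, 1, 1],
       [1, 1, 1, 0, 0]]

# Table 2: the three-role decomposition (UA rows u1..u4, PA rows r1..r3).
UA3 = [[0, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [1, 0, 0]]
PA3 = [[1, 1, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 1, 0, 0, 1]]

# Hypothetical two-role alternative: keep r1 and r3, drop r2.
UA2 = [[0, 1],
       [1, 1],
       [0, 1],
       [1, 0]]
PA2 = [[1, 1, 1, 0, 0],
       [0, 1, 0, 0, 1]]

print(mismatches(bool_product(UA3, PA3), UPA))  # 0: Table 2 reproduces Table 1 exactly
print(mismatches(bool_product(UA2, PA2), UPA))  # 2: u3 loses p1 and p4
```

Under these assumptions the two-role alternative is 2-consistent with UPA, so for any threshold δ ≥ 2 the δ-approx RMP on Table 1 admits a solution with fewer roles than the exact decomposition of Table 2.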
3.2 Complexity

Before proceeding any further, we would like to establish some results on the complexity of these problems. The Role Mining Problem, the δ-approx RMP, and the MinNoise RMP are all optimization problems. The theory of NP-completeness applies to decision problems. Therefore, in order to consider the complexity of the problems, we now frame the decision version of these problems.

Definition 8 (decision RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, and k ≥ 0, are there a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA 0-consistent with UPA such that |ROLES| ≤ k?

Definition 9 (decision δ-approx RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, a threshold δ ≥ 0, and k ≥ 0, are there a set of roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, δ-consistent with UPA such that |ROLES| ≤ k?

Definition 10 (decision MinNoise RMP). Given a set of users U, a set of permissions PRMS, a user-permission assignment UPA, the number of roles k, and a noise threshold θ, are there a set of k roles, ROLES, a user-to-role assignment UA, and a role-to-permission assignment PA, such that

$\lVert M(UA) \otimes M(PA) - M(UPA) \rVert_1 \le \theta$

where M(UA), M(PA), and M(UPA) denote the matrix representation of UA, PA and UPA respectively?

We can now prove that decision RMP, decision δ-approx RMP, and decision MinNoise RMP are all NP-complete (indeed, some of these results have already been obtained in the literature [13, 6]). Proving that a problem π is NP-complete consists of four main steps [5]:

1. showing that π is in NP
2. selecting a known NP-complete problem π′
3. constructing a transformation f from π′ to π, and
4. proving that f is a (polynomial) transformation

The problem π′ used to reduce from is the “set basis problem” defined below:

Definition 11 (Set basis Problem). Given a collection C of subsets of a finite set S, and a positive integer K ≤ |C|, is there a collection B of subsets of S with |B| = K such that, for each c ∈ C, there is a sub-collection of B whose union is exactly c?

Theorem 1. The decision RMP is NP-complete.

Proof.
• The decision Role Mining Problem is in NP. The set of roles ROLES, the user-to-role assignment UA, and the role-to-permission assignment PA together form the polynomial certificate/witness.
• The transformation is quite simple. Given an instance of the set basis problem, here is how we transform it to an instance of the decision Role Mining Problem: S denotes the permissions PRMS. C denotes UPA. Thus, every set c ∈ C stands for one user u, and k is set to K. Now, the answer to the decision role mining problem directly provides the answer to the set basis problem.
• The transformation is clearly polynomial (since it is a direct one-to-one mapping).

Theorem 2. The decision δ-approx RMP is NP-complete.

Proof.
• The decision δ-approx RMP is in NP. The set of roles ROLES, the user-to-role assignment UA, and the role-to-permission assignment PA together form the polynomial certificate/witness. It only takes polynomial time to compute

$\lVert M(UPA) - (M(UA) \otimes M(PA)) \rVert_1$

and ensure that it is less than or equal to δ, and that |ROLES| ≤ k.
• We select the set basis problem as π′.
• The transformation is quite simple. Given an instance of the set basis problem, here is how we transform it to an instance of the decision δ-approx RMP: S denotes the permissions PRMS. C denotes UPA. Thus, every set c ∈ C stands for one user u. δ is set to 0 and k is set to K. Now, the answer to the decision δ-approx RMP directly provides the answer to the set basis problem.
• The transformation is clearly polynomial.

Theorem 3. The decision MinNoise RMP is NP-complete.

Proof.
• The decision MinNoise RMP is in NP. The set of roles ROLES, the user-to-role assignment UA, and the role-to-permission assignment PA together form the polynomial certificate/witness. It only takes polynomial time to compute

$\lVert M(UPA) - (M(UA) \otimes M(PA)) \rVert_1$

and ensure that it is less than or equal to θ, and |ROLES| = k.
• We select the set basis problem as π′.
• The transformation is quite simple. Given an instance of the set basis problem, here is how we transform it to an instance of the decision MinNoise RMP: S denotes the permissions PRMS. C denotes UPA. Thus, every set c ∈ C stands for one user u. Set θ = 0 and k = K. Now, the answer to the decision MinNoise RMP directly provides the answer to the set basis problem.
• The transformation is clearly polynomial.
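The transformation used in the three proofs above can be sketched as follows. This is an illustrative sketch under stated assumptions, not part of the proofs: `set_basis_to_rmp` and `decision_rmp` are hypothetical names, the users are simply numbered in the order of the sets in C, and the brute-force decision procedure is exponential and only meant for tiny instances. The example instance reuses the permission sets of Table 1.

```python
from itertools import combinations


def set_basis_to_rmp(S, C, K):
    """Map a set basis instance (S, C, K) to a decision RMP instance.

    Each element of S becomes a permission and each set c in C becomes the
    permission set of one user; the bound k on the number of roles is K.
    """
    perms = sorted(S)
    users = [f"u{i + 1}" for i in range(len(C))]
    upa = [[1 if p in c else 0 for p in perms] for c in C]
    return users, perms, upa, K


def decision_rmp(upa, k):
    """Brute force: do at most k roles exist that are 0-consistent with upa?

    For a fixed set of roles, a user can be covered exactly if and only if the
    union of all roles contained in the user's permission set equals that set.
    """
    n = len(upa[0])
    user_sets = [frozenset(j for j in range(n) if row[j]) for row in upa]
    candidates = [frozenset(c) for size in range(1, n + 1)
                  for c in combinations(range(n), size)]
    for count in range(k + 1):
        for roles in combinations(candidates, count):
            if all(frozenset().union(*(r for r in roles if r <= u)) == u
                   for u in user_sets):
                return True
    return False


# Set basis instance built from the rows of Table 1.
S = {"p1", "p2", "p3", "p4", "p5"}
C = [{"p2", "p5"},
     {"p1", "p2", "p3", "p5"},
     {"p1", "p2", "p4", "p5"},
     {"p1", "p2", "p3"}]

users, perms, upa, k = set_basis_to_rmp(S, C, 3)
print(upa == [[0, 1, 0, 0, 1], [1, 1, 1, 0, 1], [1, 1, 0, 1, 1], [1, 1, 1, 0, 0]])  # True
print(decision_rmp(upa, 2))  # False: two roles cannot describe Table 1 exactly
print(decision_rmp(upa, 3))  # True: Table 2 is a witness
```

The transformation itself only copies the input into matrix form, which is why it is polynomial; all of the difficulty sits in the decision procedure.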
Figure 1

(a) A 4 × 7 user-to-permission assignment (UPA)

      p1  p2  p3  p4  p5  p6  p7
u1     1   1   0   0   1   1   1
u2     0   0   0   1   1   1   1
u3     1   1   0   1   1   0   0
u4     1   1   0   0   0   0   0

(b) Shaded areas indicate tiles, the 3 identified roles

      p1  p2  p3  p4  p5  p6  p7
u1     1   1   0   0   1   1   1
u2     0   0   0   1   1   1   1
u3     1   1   0   1   1   0   0
u4     1   1   0   0   0   0   0

(c) User-to-role assignment (UA)

      R1  R2  R3
u1     1   0   1
u2     0   1   1
u3     1   1   0
u4     1   0   0

(d) Role-to-permission assignment (PA)

      p1  p2  p3  p4  p5  p6  p7
R1     1   1   0   0   0   0   0
R2     0   0   0   1   1   0   0
R3     0   0   0   0   1   1   1
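The tiling of Figure 1 can be reproduced with a small sketch of the greedy strategy described in Section 4.3: repeatedly add the tile with the largest uncovered area, then turn each tile into a role. The sketch makes two simplifying assumptions that are not taken from [6]: the largest uncovered tile is found by exhaustively enumerating column subsets (exponential, fine for 7 columns) instead of the depth-first LTM search, and ties are broken by enumeration order. All function names are hypothetical.

```python
from itertools import combinations


def largest_uncovered_tile(matrix, covered):
    """Return (rows, cols) of the tile with the largest uncovered area, or None."""
    n = len(matrix[0])
    best, best_area = None, 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            # A tile for an itemset: all rows that have 1s in every chosen column.
            rows = [i for i, row in enumerate(matrix) if all(row[j] for j in cols)]
            area = sum((i, j) not in covered for i in rows for j in cols)
            if area > best_area:  # strict: the first tile found wins ties
                best, best_area = (rows, list(cols)), area
    return best


def greedy_tiling(matrix):
    """Phase 1: add largest uncovered tiles until every 1 is covered."""
    covered, tiles = set(), []
    while True:
        tile = largest_uncovered_tile(matrix, covered)
        if tile is None:
            return tiles
        rows, cols = tile
        tiles.append(tile)
        covered.update((i, j) for i in rows for j in cols)


def tiles_to_roles(tiles, m, n):
    """Phase 2: each tile becomes a role; rows give UA, columns give PA."""
    ua = [[0] * len(tiles) for _ in range(m)]
    pa = [[0] * n for _ in tiles]
    for r, (rows, cols) in enumerate(tiles):
        for i in rows:
            ua[i][r] = 1
        for j in cols:
            pa[r][j] = 1
    return ua, pa


def bool_product(ua, pa):
    """Boolean matrix product, used to check 0-consistency with the input."""
    return [[int(any(ua[i][l] and pa[l][j] for l in range(len(pa))))
             for j in range(len(pa[0]))]
            for i in range(len(ua))]


# Figure 1(a), rows u1..u4 and columns p1..p7 (0-indexed in the code).
UPA = [[1, 1, 0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 1],
       [1, 1, 0, 1, 1, 0, 0],
       [1, 1, 0, 0, 0, 0, 0]]

tiles = greedy_tiling(UPA)
ua, pa = tiles_to_roles(tiles, len(UPA), len(UPA[0]))
print(len(tiles))                    # 3
print(bool_product(ua, pa) == UPA)   # True
```

On this input the sketch finds the same three tiles as Figure 1(b), although in a different order (the tile over p5 to p7 is found before the tile over p4 and p5). Greedy selection is an approximation, so on other inputs or with other tie-breaking it may return more tiles than the minimum.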
Instead of asking for the user-role assignment, UA, as well as the role-permission assignment PA, we could consider the problem of obtaining each individually. For exact cover, [14, 13] show that given the set of roles, and the role-permission assignment, one can determine the user-role assignment in polynomial time. However, when an approximate answer is required, such as in the MinNoise RMP, determining the user-role assignment requires O(2^k mn) time – this is known as fixed parameter tractable since the solution is exponential only in terms of a fixed parameter. Unfortunately, k refers to the number of roles which is likely to be quite large in practice, making this quite infeasible. It remains to be seen if finding the user-role assignment in the case of the δ-approx RMP is any easier.

In the following sections, we show that the RMP along with several variants can be mapped to other problems already studied in the data mining and data analysis literature. We discuss the complexity for each variant along with suggested methods for solving the problem.

4. MAPPING THE RMP TO THE TILING PROBLEM

In this section, we demonstrate the equivalence of the Role Mining Problem with the Tiling Databases problem. This mapping allows us to directly borrow existing implementation solutions to RMP. In fact, the original Database Tiling paper by Geerts et al. [6] looked at a set of five problems, one of which exactly matches the role mining problem. We now describe the relevant problems studied and then discuss their implications.

4.1 Tiling Databases

Consider a binary matrix of size m × n where the number of rows, m, can be viewed as the number of objects and the number of columns, n, can be viewed as the number of attributes. A 1 in cell {ij} denotes that object i has/owns attribute j (i.e., some relationship exists between object i and attribute j). Now, let an itemset I denote a collection (subset) of the attributes. Then a tile t corresponding to an itemset I consists of the columns in itemset I as well as all the rows that have 1s in all the columns in I. The area of a tile is defined as the cardinality of the tile (i.e., the number of 1s in the tile).

Informally, a tile consists of a block of ones in a boolean database as shown in Figure 1(b). A collection of (possibly overlapping) tiles constitutes a tiling. Among the collection of 5 related problems defined in [6], the Minimum Tiling problem is of the most interest to us, which is defined below.

Definition 12 (Minimum Tiling). Given a boolean matrix, find a tiling of the matrix with area equal to the total number of 1s in the matrix and consisting of the least possible number of tiles.

4.2 Mapping Basic RMP to Minimum Tiling

To see that the Minimum Tiling problem corresponds exactly to the basic RMP, one must first see how a tile corresponds to a role. As defined above, a tile is just a block of 1s – i.e., a collection of rows and columns that all have 1s. Remember that without semantics, a role is simply a collection of permissions. Thus, inherently, in any tile, the collection of the columns provides the role-to-permission assignment (PA) for that role. At the same time, the collection of rows denotes those users/entities that have that role – thus the collection of rows corresponds to the user-to-role assignment (UA) for that role. As such, any tiling corresponds to a set of roles and their role/permission and user/role assignments. If the tiling completely covers the entire matrix, then all 1s have been covered, meaning that all user/permission assignments have been covered. Since each tile corresponds to a role, if the tiling is minimal and covers the entire matrix, this means that we have found a minimal set of roles such that they completely describe the given user-permission assignment.

The following example clearly demonstrates this mapping. In the context of tiling databases, Figure 1(a) shows the boolean matrix representing a transactional database consisting of 4 transactions and 7 items. Rows denote the transactions and columns denote the items. We may order transactions from top to bottom sequentially as 1 – 4 and items from left to right as 1 – 7. A 1 in cell {ij} represents that transaction i contains item j. Figure 1(b) shows a tiling of the matrix consisting of 3 tiles. The shaded region represents a tile. Thus, Tile 1={(1,1), (1,2), (3,1), (3,2), (4,1), (4,2)}, Tile 2={(2,4), (2,5), (3,4), (3,5)} and Tile 3={(1,5), (1,6), (1,7), (2,5), (2,6), (2,7)}. As one can see, Tiles 2 and 3 overlap on cell (2, 5). Figure 1(b) also gives the minimum tiling of the matrix. It is not possible to find a tiling that covers the entire matrix with fewer than 3 tiles. We can view the same problem from the role mining perspective. As described before, each tile corresponds to a role. Figure 1(c) and 1(d) show an optimal UA and PA, such that M(UA) ⊗ M(PA) = M(UPA). Again, the decomposition is optimal in the sense that it is impossible to find only two roles such that UA and PA will be 0-consistent with UPA.

Formally, we can reduce the Minimum Tiling problem to the basic RMP as follows.

Theorem 4. The Minimum Tiling problem is identical to the basic Role Mining Problem.

Proof. To show that the two problems are identical we show that their inputs and outputs exactly match. Thus, for every input instance, the outputs of both problems have a direct one-to-one mapping.

• The input to both problems is an m × n boolean matrix.
• For any problem instance, the Minimum Tiling problem returns a set of tiles that completely cover the input while minimizing the number of tiles. Each tile corresponds to a role, R. For each tile, we extract the set of columns C, in the tile. For each column c ∈ C, add the assignment {c, R} to PA. Similarly, for each row i, belonging to the tile, add the assignment {i, R} to UA. Add R to ROLES.
• The resulting set of roles (ROLES), user-role assignment (UA), and permission-role assignment (PA) are guaranteed to be a solution to the basic RMP (i.e., UA and PA are 0-consistent with the corresponding UPA, and the number of roles is minimal). To prove the 0-consistency, it is sufficient to note that UA ⊗ PA gives us the original tiling of the input matrix which is equivalent to the original UPA. We can prove the minimality by contradiction. Assume that a different solution to the RMP exists – consisting of ROLES′, UA′ and PA′ where |ROLES′| < |ROLES|. In this case, we can transform this solution into a corresponding solution for tiling. For each role R ∈ ROLES′, create the corresponding tile t_R consisting of the permissions given by PA′ and the users given by UA′. The union of all tiles ∪_R t_R gives a tiling of the matrix. This tiling covers the entire matrix since UA′ and PA′ are 0-consistent with UPA. However, the number of tiles is the same as |ROLES′| which is less than |ROLES|. But that means that the earlier solution is not minimal – and we have a contradiction. Therefore, the solution to the tiling databases problem directly maps to a solution for the role mining problem.

Thus, the Minimum Tiling problem exactly corresponds to the basic RMP.

4.3 Algorithm to Discover Minimal Roles

Since the Minimum Tiling problem is equivalent to the basic RMP, the algorithms developed for Minimum Tiling now directly apply. [6] proposes a greedy approximation algorithm to find the minimum tiling of any given database. This algorithm depends on finding all maximal tiles having an area over a given threshold. A depth first search strategy is used to find all large tiles. [6] proves that the Minimum Tiling problem can be approximated within the factor O(log mn), given an oracle that finds for any database D and tiling T, the tile t such that the area(T ∪ t) is the maximum (i.e., the oracle returns the tile which covers as much of the remaining uncovered part of the database). Such an oracle can be implemented reasonably efficiently by adapting the maximal tile algorithm. [6] provides more detail on this. We now briefly present the adapted algorithm for the basic RMP.

Algorithm 1 presents the basic RMP algorithm. It consists of two phases. In the first phase, we find a minimum tiling for the given UPA. In the second phase, we convert the tiling into ROLES, UA, and PA. As described earlier, phase 1 uses a simple greedy strategy of adding the largest uncovered tile to the current tiling, until UPA is completely covered (i.e., the largest uncovered tile remaining is empty). Algorithm 2 describes the procedure for finding the largest uncovered tile from UPA.

Algorithm 1 RMP(UPA)
 1: {Find the minimum tiling for UPA}
 2: T ← {}
 3: while (T′ ← LUTM(UPA, T)) ≠ {} do
 4:   T ← T ∪ T′
 5: end while
 6: {Convert the minimum tiling into UA and PA}
 7: ROLES ← {}, UA ← {}, PA ← {}
 8: for each tile t ∈ T do
 9:   Create a new role R and add it to ROLES
10:   Extract the set of permissions P in the tile
11:   For each permission p ∈ P, add the assignment {p, R} to PA
12:   Extract the set of users U in the tile
13:   For each user u ∈ U, add the assignment {u, R} to UA
14: end for

The LUTM algorithm (Algorithm 2) is a depth-first recursive algorithm that finds the largest uncovered tile. In order to do a depth-first search, we simply assume some canonical order over the permissions. The key idea behind the algorithm is that all large tiles containing a permission i ∈ PRMS, but not containing any permission lower than i (according to the canonical order) can be found in the so-called i-conditional database [7]. In our context, the P-conditional database UPA_P consists of all user-to-permission assignments that contain P, but from which all permissions before the last permission in P and that last permission itself would have been removed. Now, any large tile that is found in this conditional database at once implies a corresponding large tile including P. Therefore, whenever we want to compute an area associated with a set of permissions P′ in UPA_P, we simply need to add |P| to the width of the area (|P′|) and multiply this with |U(P′)| [6]. We
p1 p2 p3 R1 R2 p1 p2 p3
u1 1 1 1 u1 0 1
R1 1 0 0
u2 1 0 0 u2 1 0
u3 1 1 1 u3 0 1 R2 0 1 1
u4 1 1 1 u4 1 1
(a) A 4 × 3 user-to-permission
(b) Decomposition of UPA into UA
assignment (UPA).
and PA with k=2.
R1 R2
u1 0 1 p1 p2 p3
u2 1 0 R1 1 0 0
u3 1 1
R2 1 1 1
u4 1 1
modify the original LTM algorithm [6] to return the largest algorithm shown here is quite simple. However, its ef-
uncovered tile. For this, we keep track of the current largest ficiency can be significantly improved by using several
uncovered tile, LT, and its uncovered area, LTarea. The pruning techniques – more details can be found in [6].
main steps of the algorithm are as follows:
Step 1: Originally, LT and LTarea are initialized to the empty Algorithm 2 LUTM(UPA,T)
set and 0, respectively. The current set of permissions
being considered, P is also initialized to the empty set. 1: P ← {}
Lines 1 and 2 perform this initialization. 2: LT ← {}, AreaLT ← 0
3: for ∀p ∈ P RM S do
Step 2: Line 3 starts the main loop of the algorithm, and iter- 4: if uncovered area of t(P ∪ {p}) > AreaLT then
ates over each permission separately. On lines 4-7, if 5: LT ← t(P ∪ {p})
the uncovered area of the current tile being considered 6: Update AreaLT to have uncovered area of t(P ∪
is larger than the current known best, the best is up- {p})
dated to this. i.e., LT and LTarea always refer to the 7: end if
largest uncovered tile seen so far. Over here, we need 8: {Create the conditional database for recursion}
to clarify what we mean by uncovered area. For any 9: U P A(P ∪{p}) ← {}
tile, the uncovered area is the number of 1s that the 10: for (∀q|(q ∈ P RM S) ∩ (q > p)) do
11: Add (q, U({p}) ∩ U({q})) to UPA_{P∪{p}}
12: end for
13: Compute T((P ∪ {p}), UPA_{P∪{p}}) recursively
14: end for

tile covers that are not already covered in the existing tiling, i.e., the uncovered area is the part of the tile that is new and not seen before.

Referring back to Figure 1(b), assume that the current tiling consists of Tile 1 and Tile 2. The covered area is simply the number of distinct 1s included in the tiling. In our case, since the tiles do not overlap, the overall covered area is equal to 10 (6 for Tile 1 and 4 for Tile 2).

Now, suppose we are considering Tile 3. The uncovered area of Tile 3 is 5 (since the total number of 1s in Tile 3 is 6, and one of those 1s, at position {u2, p5}, is already covered in the current tiling). Thus, given a database and an existing tiling, whenever a new tile is considered, it is easy to compute the uncovered area by simply removing the already covered area from the area of the tile.

Step 3: Lines 8-12 create the conditional database UPA_P.

Step 4: Finally, line 13 invokes the algorithm recursively to calculate the largest uncovered area in the smaller conditional database. Since the conditional database progressively shrinks, the algorithm is guaranteed to finish after all the permissions have been considered. The

5. MAPPING THE MINNOISE RMP TO THE DISCRETE BASIS PROBLEM

In this section, we demonstrate the direct equivalence of the MinNoise RMP to the Discrete Basis problem. This mapping again allows us to directly borrow existing implementation solutions. Miettinen, in his thesis [13], studies a set of three related problems and shows that these are NP-complete. We now describe the relevant problems studied and then discuss their implications.

The Discrete Basis problem [14] studies the problem of finding a basis from given data. Similar to Principal Component Analysis (PCA), the discrete basis problem is a technique for simplifying a dataset by reducing multidimensional datasets to lower dimensions for summarization, analysis, and/or compression. Unlike PCA, the discrete basis problem only considers boolean data, and finds boolean bases.
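As a small illustration of what "boolean" means in this setting, the sketch below contrasts the Boolean matrix product (the ⊗ operation used in this section's definitions) with the ordinary integer product. The toy matrices, the function names, and the reading of S as a user-to-role assignment and B as a role-to-permission assignment are assumptions made for illustration only; they are not taken from [13] or [14].

```python
def boolean_product(S, B):
    """Boolean matrix product: entry (i, j) is 1 iff some l has S[i][l] = B[l][j] = 1."""
    n, k, d = len(S), len(B), len(B[0])
    return [[int(any(S[i][l] and B[l][j] for l in range(k))) for j in range(d)]
            for i in range(n)]


def integer_product(S, B):
    """Ordinary matrix product over the integers, shown only for contrast."""
    n, k, d = len(S), len(B), len(B[0])
    return [[sum(S[i][l] * B[l][j] for l in range(k)) for j in range(d)]
            for i in range(n)]


# Hypothetical toy data: 3 users, 2 roles, 4 permissions.
S = [[1, 0],
     [0, 1],
     [1, 1]]            # assumed user-to-role assignment (n x k)
B = [[1, 1, 0, 0],
     [0, 1, 1, 0]]      # assumed role-to-permission assignment (k x d)

C_bool = boolean_product(S, B)   # stays in {0, 1}: overlapping roles are OR-ed
C_int = integer_product(S, B)    # overlap shows up as an entry greater than 1

for row_bool, row_int in zip(C_bool, C_int):
    print(row_bool, row_int)
```

The third user holds both roles, and both roles grant the second permission. Under the Boolean product that entry is simply 1, whereas the integer product counts it twice, which is why real-valued techniques such as PCA do not directly yield a boolean basis.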
We have already introduced some of the notation used for defining the discrete basis problem from [14]. Formally, the discrete basis problem is defined as follows:

Definition 13 (Discrete Basis Problem). Given a matrix $C \in \{0,1\}^{n \times d}$ and a positive integer $k \le \min\{n, d\}$, find a matrix $B \in \{0,1\}^{k \times d}$ minimizing

$$\ell_\otimes(C, B) = \min_{S \in \{0,1\}^{n \times k}} \lVert C - S \otimes B \rVert_1$$

The Discrete Basis Problem only asks for a discrete basis. A related problem is the Basis Usage problem:

Definition 14 (Basis Usage Problem). Given a matrix $C \in \{0,1\}^{n \times d}$ and a matrix $B \in \{0,1\}^{k \times d}$, find a matrix $S \in \{0,1\}^{n \times k}$ minimizing

$$\lVert C - S \otimes B \rVert_1$$

time of this algorithm is clearly polynomial in the size of the input [13].

Miettinen [13] also shows that the discrete basis problem cannot be approximated in polynomial time to within any constant factor unless P = NP. This essentially shuts the door on any attempt to find a constant-factor approximation algorithm for the problem. However, heuristic solutions based on association rule mining have been proposed and seem to give fairly good results on simulated data; [13] provides further details. Other heuristics can also be used. One possibility is to extend the RoleMiner algorithm [21] to find the best candidates to describe the dataset. As part of future work, we intend to comprehensively test a set of heuristics (including the one in [13]) to determine what really works well in our domain.
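To make Definition 14 concrete, the following sketch solves the Basis Usage problem exactly on a toy instance. It relies on the observation that row i of S affects only row i of C − S ⊗ B, so each row can be optimized independently by trying all 2^k subsets of basis vectors. This brute-force search is exponential in k and is meant purely as an illustration; it is not the algorithm or the heuristic from [13] or [14], and the function names and data are hypothetical.

```python
from itertools import product


def boolean_row(s, B):
    """Boolean combination (OR) of the basis rows of B selected by the 0/1 vector s."""
    return [int(any(s[l] and B[l][j] for l in range(len(B))))
            for j in range(len(B[0]))]


def l1_distance(x, y):
    """Number of positions where two 0/1 vectors differ."""
    return sum(abs(a - b) for a, b in zip(x, y))


def basis_usage_bruteforce(C, B):
    """Return (S, error), where S minimizes ||C - S (x) B||_1 for the given C and B.

    Each row is solved on its own by enumerating all 2^k usage vectors.
    """
    k = len(B)
    S, total_error = [], 0
    for row in C:
        best = min(product((0, 1), repeat=k),
                   key=lambda s: l1_distance(row, boolean_row(s, B)))
        S.append(list(best))
        total_error += l1_distance(row, boolean_row(best, B))
    return S, total_error


# Hypothetical toy instance: 4 users, 4 permissions, 2 basis vectors (roles).
B = [[1, 1, 0, 0],
     [0, 1, 1, 0]]
C = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 1, 1, 1],
     [0, 0, 0, 1]]

S, error = basis_usage_bruteforce(C, B)
print(S, error)
```

In this instance no combination of the two basis vectors covers the fourth permission, so the best achievable error is 2: one uncovered 1 in each of the last two rows. That residual is the kind of noise the MinNoise RMP seeks to minimize.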
Acknowledgments
We would like to gratefully acknowledge the help of Pauli Miettinen and Taneli Mielikäinen.

7. REFERENCES
[1] C. Damm, K. H. Kim, and F. Roush. On covering and rank problems for boolean matrices and their applications. In Computing and Combinatorics: 5th Annual International Conference, COCOON'99, volume 1627 of Lecture Notes in Computer Science, pages 123–133. Springer-Verlag, 1999.
[2] E. J. Coyne. Role-engineering. In 1st ACM Workshop on Role-Based Access Control, 1995.
[3] D. Ferraiolo, R. Sandhu, S. Gavrila, D. Kuhn, and R. Chandramouli. Proposed NIST standard for role-based access control. TISSEC, 2001.
[4] M. P. Gallagher, A. O'Connor, and B. Kropp. The economic impact of role-based access control. Planning report 02-1, National Institute of Standards and Technology, March 2002.
[5] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, chapter 3. W. H. Freeman, 1979.
[6] F. Geerts, B. Goethals, and T. Mielikäinen. Tiling databases. In Discovery Science, Lecture Notes in Computer Science, pages 278–289. Springer-Verlag, 2004.
[7] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In W. Chen, J. Naughton, and P. A. Bernstein, editors, 2000 ACM SIGMOD Intl. Conference on Management of Data, pages 1–12. ACM Press, May 2000.
[8] D. S. Hochbaum. Approximating clique and biclique problems. J. Algorithms, 29(1):174–200, 1998.
[9] A. Kern, M. Kuhlmann, A. Schaad, and J. Moffett. Observations on the role life-cycle in the context of enterprise security management. In 7th ACM Symposium on Access Control Models and Technologies, June 2002.
[10] M. Kuhlmann, D. Shohat, and G. Schimpf. Role mining - revealing business roles for security administration using data mining technology. In Symposium on Access Control Models and Technologies (SACMAT). ACM, June 2003.
[11] G. Markowsky. Ordering D-classes and computing Schein rank is hard. Semigroup Forum, 44:373–375, 1992.
[12] T. Mielikäinen. Intersecting data to closed sets with constraints. In B. Goethals and M. J. Zaki, editors, FIMI, volume 90 of CEUR Workshop Proceedings. CEUR-WS.org, 2003.
[13] P. Miettinen. The discrete basis problem. Master's thesis, University of Helsinki, 2006.
[14] P. Miettinen, T. Mielikäinen, A. Gionis, G. Das, and H. Mannila. The discrete basis problem. In Knowledge Discovery in Databases: PKDD 2006, Lecture Notes in Artificial Intelligence, pages 335–346, 2006.
[15] N. Mishra, D. Ron, and R. Swaminathan. On finding large conjunctive clusters. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, volume 2777 of Lecture Notes in Computer Science, pages 448–462. Springer, 2003.
[16] F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. J. Zaki. Carpenter: finding closed patterns in long biological datasets. In KDD, pages 637–642, 2003.
[17] R. Peeters. The maximum edge biclique problem is NP-complete. Discrete Appl. Math., 131(3):651–654, 2003.
[18] R. S. Sandhu et al. Role-based Access Control Models. IEEE Computer, pages 38–47, February 1996.
[19] A. Schaad, J. Moffett, and J. Jacob. The role-based access control system of a European bank: A case study and discussion. In Proceedings of ACM Symposium on Access Control Models and Technologies, pages 3–9, May 2001.
[20] J. Schlegelmilch and U. Steffens. Role mining with ORCA. In Symposium on Access Control Models and Technologies (SACMAT). ACM, June 2005.
[21] J. Vaidya, V. Atluri, and J. Warner. RoleMiner: mining roles using subset enumeration. In CCS '06: Proceedings of the 13th ACM conference on Computer and communications security, pages 144–153, 2006.