Distributed and Parallel Computation of the Canonical Direct Basis

J-F. Viaud (1), K. Bertet (1), C. Demko (1), R. Missaoui (2)

(1) Laboratoire L3i, Université de La Rochelle, France
(2) Université du Québec en Outaouais, Canada

2017-06-14

1. General Overview
   Introduction
   Technical points
2. Main contribution
   Overview
   Dual Transversal
   Computing the CDB with Minimal Dual Transversals
3. Conclusion
4. References


Context

Mining association rules, including implications, is a key topic [ZP02] in the Knowledge Discovery (KD) research area and in Formal Concept Analysis (FCA) [GW99]. Parallel and distributed architectures are easily accessible, and the increasing size of data motivates their use.


Previous work

Many efficient algorithms for implication basis computation have been designed [VA14]. For the canonical basis (also called the Guigues-Duquenne basis or stem basis), sequential [OD07] and parallel algorithms exist [FN03, KOV10, KB15]. Distributed computation in FCA remains scarce [KV09, TBK11, XdFROF12]. The computation of the canonical direct basis [BM10] or the D-basis [ANR12], in a parallel or distributed manner, has been less studied.


Notation

Definition. A formal context (G, M, I) (see Table 1) is defined by a set G of objects, a set M of attributes, and a binary relation I ⊆ G × M between G and M.

Table 1: A binary context

      a  b  c  d  e
  1   x  x
  2      x  x  x
  3            x  x
  4         x
  5   x     x
  6   x  x  x  x  x

Two operators are derived:
- for each subset X ⊆ G, we define X' = {m ∈ M | g I m for all g ∈ X}, and dually,
- for each subset Y ⊆ M, we define Y' = {g ∈ G | g I m for all m ∈ Y}.

For instance, 1' = ab and (cd)' = 26.
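The two derivation operators can be sketched in a few lines of Python. This is only an illustration: the dictionary `CONTEXT` is our transcription of the running example of Table 1, and the function names `intent` and `extent` are chosen here for convenience, not taken from the slides.

```python
# Running example (Table 1), transcribed as object -> set of attributes.
CONTEXT = {
    1: set("ab"),
    2: set("bcd"),
    3: set("de"),
    4: set("c"),
    5: set("ac"),
    6: set("abcde"),
}
ATTRIBUTES = set("abcde")


def intent(objects, context=CONTEXT, attributes=ATTRIBUTES):
    """X' : the attributes shared by every object of X."""
    shared = set(attributes)
    for g in objects:
        shared &= context[g]
    return shared


def extent(attrs, context=CONTEXT):
    """Y' : the objects having every attribute of Y."""
    wanted = set(attrs)
    return {g for g, row in context.items() if wanted <= row}


print(sorted(intent({1})))   # ['a', 'b']
print(sorted(extent("cd")))  # [2, 6]
```

The two printed values correspond to the instances 1' = ab and (cd)' = 26 given above.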

Implications

Definition. Let K = (G, M, I) be a formal context. An implication is a binary relation between two sets A, B ⊆ M, usually written A → B, which holds in K when A' ⊆ B'. Given an implication A → B, A is the premise and B is the conclusion.

When A → B holds, every object having all attributes in A also has all attributes in B. In the running example, bd → c holds, since (bd)' = 26 ⊆ 2456 = c'.

Table 2: Running example (same context as Table 1).
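As a small illustration, validity of an implication can be checked object by object. The sketch below assumes the running example is encoded as a dictionary (our transcription of Table 1); `holds` is a hypothetical helper name.

```python
# Running example (Table 1), transcribed as object -> set of attributes.
CONTEXT = {
    1: set("ab"), 2: set("bcd"), 3: set("de"),
    4: set("c"), 5: set("ac"), 6: set("abcde"),
}


def holds(premise, conclusion, context=CONTEXT):
    """True when every object having all of `premise` also has all of `conclusion`."""
    premise, conclusion = set(premise), set(conclusion)
    return all(conclusion <= row for row in context.values() if premise <= row)


print(holds("bd", "c"))  # True: objects 2 and 6 have b and d, and both have c
print(holds("b", "c"))   # False: object 1 has b but not c
```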

Basis

Definition. Given a context, a set of implications is a basis if every implication that holds in the context can be deduced from it.
- A basis is direct if the closure of any subset of attributes is computed through a single pass over its implications.
- A basis is unitary if the conclusions of its implications are singletons.
Implications are deduced from each other using an inference system such as Armstrong's axioms or Simplification Logic [RLB15].
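Directness can be illustrated with a minimal sketch: with a direct basis, the closure of a set of attributes is obtained by firing, once, every implication whose premise lies in the input set. The list `CDB` below transcribes the canonical direct basis of the running example (Table 4); the function name `direct_closure` is ours.

```python
# Canonical direct basis of the running example, as (premise, conclusion) pairs.
CDB = [
    ("bd", "c"), ("cd", "b"), ("abc", "e"), ("ae", "bc"), ("ce", "abd"),
    ("bc", "d"), ("e", "d"), ("ad", "bce"), ("be", "acd"),
]


def direct_closure(attrs, basis=CDB):
    """Single pass: premises are tested against the input set only."""
    start = frozenset(attrs)
    closed = set(start)
    for premise, conclusion in basis:
        if set(premise) <= start:
            closed |= set(conclusion)
    return closed


print("".join(sorted(direct_closure("bd"))))  # bcd
print("".join(sorted(direct_closure("ad"))))  # abcde
print("".join(sorted(direct_closure("ab"))))  # ab
```

With a basis that is not direct, the loop would have to be repeated until no new attribute is added.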


Generators

Definition. Given a closed set of attributes T = T'', a subset Y ⊆ M is a generator of T if Y'' = T. Among all generators of a given set, some are minimal for inclusion and are called minimal generators. Premises of the implications in the canonical direct basis are minimal generators. The canonical direct basis has been discovered many times, with many names, including the generic basis. The last item can be considered as a definition.
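For a small context, minimal generators can be found by exhaustive search. The sketch below is a brute-force illustration only (exponential in the size of the closed set), using our transcription of the running example of Table 1; `closure` and `minimal_generators` are names introduced here.

```python
from itertools import combinations

# Running example (Table 1), transcribed as object -> set of attributes.
CONTEXT = {
    1: set("ab"), 2: set("bcd"), 3: set("de"),
    4: set("c"), 5: set("ac"), 6: set("abcde"),
}
ATTRIBUTES = frozenset("abcde")


def closure(attrs):
    """Y'' : attributes shared by all objects having every attribute of Y."""
    attrs = set(attrs)
    closed = set(ATTRIBUTES)
    for row in CONTEXT.values():
        if attrs <= row:
            closed &= row
    return frozenset(closed)


def minimal_generators(closed_set):
    """Inclusion-minimal subsets Y of the closed set T with Y'' = T."""
    target = frozenset(closed_set)
    found = []
    # Candidates are visited by increasing size, so any generator that has
    # a smaller generator as a proper subset is rejected.
    for size in range(len(target) + 1):
        for cand in map(frozenset, combinations(sorted(target), size)):
            if closure(cand) == target and not any(g < cand for g in found):
                found.append(cand)
    return found


print(sorted("".join(sorted(g)) for g in minimal_generators("bcd")))
# ['bc', 'bd', 'cd']
```

The three minimal generators of bcd are the premises of bc → d, bd → c and cd → b in the canonical direct basis of the running example.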


Canonical Direct Basis

Considering our previous example, whose context is given in Table 1, the canonical direct basis is given in Table 4.

Table 3: Running example (same context as Table 1).

Table 4: The canonical direct basis of the context given in Table 1.

  bd → c        bc → d
  cd → b        e → d
  abc → e       ad → bce
  ae → bc       be → acd
  ce → abd


Steps

The main steps of our procedure are as follows:
1. Split the initial context K = (G, M, I) into sub-contexts $K_j = (G_j, M, I \cap (G_j \times M))$, such that $G = \bigcup_j G_j$ and K is the subposition of the $K_j$ for 1 ≤ j ≤ m.
2. Compute the canonical direct unit basis of each sub-context.
3. For each unary-attribute conclusion, merge the implications of the canonical direct unit bases using minimal dual transversals.
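The first two steps can be sketched as follows on the running example. This is a brute-force illustration, not the authors' implementation: `split` and `unit_premises` are hypothetical helpers, the context is our transcription of Table 1, and premises are assumed to be non-empty, as in the bases listed for the sub-contexts.

```python
from itertools import combinations

# Running example (Table 1), transcribed as object -> set of attributes.
CONTEXT = {
    1: set("ab"), 2: set("bcd"), 3: set("de"),
    4: set("c"), 5: set("ac"), 6: set("abcde"),
}
ATTRIBUTES = "abcde"


def split(context, parts):
    """Horizontal split: each part keeps some objects and all attributes."""
    return [{g: context[g] for g in part} for part in parts]


def unit_premises(context, target, attributes=ATTRIBUTES):
    """Minimal non-empty premises X (target not in X) such that X -> target holds."""
    others = [m for m in attributes if m != target]
    found = []
    for size in range(1, len(others) + 1):
        for cand in map(frozenset, combinations(others, size)):
            if any(p <= cand for p in found):
                continue  # a smaller premise already does the job
            if all(target in row for row in context.values() if cand <= row):
                found.append(cand)
    return found


upper, lower = split(CONTEXT, [(1, 2, 3), (4, 5, 6)])
show = lambda premises: sorted("".join(sorted(p)) for p in premises)
print(show(unit_premises(upper, "e")))  # ['ac', 'ad']
print(show(unit_premises(lower, "e")))  # ['b', 'd']
```

Each sub-context is processed independently, which is what makes step 2 suitable for parallel or distributed execution.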


Definition

The merge process is based on the following definition:

Definition. Let $(M_i^j)_{1 \le j \le m,\ 1 \le i \le n_j}$ be a finite family of sets corresponding to the premises of implications (with the same unary-attribute conclusion) related to the m generated subcontexts, where $n_j$ is the number of premises coming from the j-th subcontext. The set τ is a dual transversal if, for all 1 ≤ j ≤ m, there exists $1 \le i_j \le n_j$ such that

$$\tau = \bigcup_{1 \le j \le m} M_{i_j}^j .$$

This means that τ contains at least one member from each family $(M_i^j)_{1 \le i \le n_j}$, j being given. A dual transversal τ is minimal if any dual transversal τ' ⊆ τ is such that τ' = τ.


Example

Consider the following example with two set families (i.e., m = 2). Take M1 = {b, e} and M2 = {c, e, ac, ae, be, ce}. Then e and bc are dual transversals. However, ace is not minimal since ae is also a dual transversal.

Consider the context in Table 1 and let us split it horizontally into the two subcontexts shown in Tables 5 and 6. The generated bases from the two subcontexts are given in Table 7.

Table 5: Upper part of the initial context.

      a  b  c  d  e
  1   x  x
  2      x  x  x
  3            x  x

Table 6: Lower part of the initial context.

      a  b  c  d  e
  4         x
  5   x     x
  6   x  x  x  x  x
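The two-family example (M1 = {b, e}, M2 = {c, e, ac, ae, be, ce}) can be checked by enumerating every choice of one member per family. The sketch below is a naive enumeration for illustration; the function names are ours.

```python
from itertools import product


def dual_transversals(families):
    """All unions obtained by picking one member from each family."""
    return {frozenset().union(*choice) for choice in product(*families)}


def minimal(sets):
    """Keep only the sets that contain no other set of the collection."""
    return {s for s in sets if not any(t < s for t in sets)}


M1 = [frozenset(x) for x in ("b", "e")]
M2 = [frozenset(x) for x in ("c", "e", "ac", "ae", "be", "ce")]

all_dt = dual_transversals([M1, M2])
show = lambda sets: sorted("".join(sorted(s)) for s in sets)

print(frozenset("ace") in all_dt, frozenset("ae") in all_dt)  # True True
print(show(minimal(all_dt)))                                  # ['bc', 'e']
```

The enumeration grows with the product of the family sizes, which is why an incremental merge that prunes non-minimal unions early is preferable in practice.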

Basis

  Conclusion   Upper context                                   Lower context
  a            be → a, ce → a                                  b → a, d → a, e → a
  b            a → b, c → b, ac → b, ad → b, ae → b, ce → b    d → b, e → b
  c            bd → c, ad → c, ae → c, be → c                  a → c, b → c, d → c, e → c
  d            c → d, e → d, ac → d, ae → d, be → d, ce → d    b → d, e → d
  e            ac → e, ad → e                                  b → e, d → e

Table 7: The canonical direct unit basis of each of the sub-contexts of the initial context given in Table 1.

Theorem

Theorem. Let K be the context split into m sub-contexts $K_j$ with 1 ≤ j ≤ m. Let a be a unary-attribute conclusion, and $M_i^j \to a$, $i \in \{1, \dots, n_j\}$, the implications of the canonical direct unit basis computed for each $K_j$. Let $\{\tau_k \mid 1 \le k \le p\}$ be the set of minimal dual transversals of $(M_i^j)_{1 \le j \le m,\ 1 \le i \le n_j}$. Then each $\tau_k \to a$ is an implication of the canonical direct unit basis of K with a as its conclusion.
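The statement can be checked experimentally on the running example. The sketch below is a brute-force verification, not a proof: for each attribute it compares the minimal dual transversals built from the upper and lower sub-contexts with the minimal premises computed directly on the whole context. The context is our transcription of Table 1, premises are assumed non-empty, and all function names are ours.

```python
from itertools import combinations, product

CONTEXT = {
    1: set("ab"), 2: set("bcd"), 3: set("de"),
    4: set("c"), 5: set("ac"), 6: set("abcde"),
}
ATTRIBUTES = "abcde"


def unit_premises(context, target):
    """Minimal non-empty premises X (target not in X) such that X -> target holds."""
    others = [m for m in ATTRIBUTES if m != target]
    found = []
    for size in range(1, len(others) + 1):
        for cand in map(frozenset, combinations(others, size)):
            if any(p <= cand for p in found):
                continue
            if all(target in row for row in context.values() if cand <= row):
                found.append(cand)
    return found


def minimal_dual_transversals(families):
    """Inclusion-minimal unions of one premise per family (naive enumeration)."""
    unions = {frozenset().union(*choice) for choice in product(*families)}
    return {u for u in unions if not any(v < u for v in unions)}


upper = {g: CONTEXT[g] for g in (1, 2, 3)}
lower = {g: CONTEXT[g] for g in (4, 5, 6)}

results = {}
for a in ATTRIBUTES:
    merged = minimal_dual_transversals(
        [unit_premises(upper, a), unit_premises(lower, a)]
    )
    direct = set(unit_premises(CONTEXT, a))
    results[a] = (merged, direct)
    print(a, sorted("".join(sorted(t)) for t in merged), merged == direct)
```

On this example the two sets coincide for every attribute; for instance, the conclusion e yields the premises abc and ad.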


Algorithm to compute minimal dual transversals

- First, transform implications into implications with a unary-attribute conclusion.
- Use Algorithm 1 for each attribute appearing in a conclusion.
- Computing the canonical direct basis by means of minimal transversals is NP-complete [BDVG16].
- Consider $M_i^j$, the premises of the canonical direct unit basis of the $K_j$.
- Minimal dual transversals are computed iteratively, considering successively each j.


Algorithm

Algorithm 1: Merging algorithm to compute minimal dual transversals

Data: K, the initial formal context split into m subcontexts $K_j$ with 1 ≤ j ≤ m; $M_i^j$, the premises of the CDB of $K_j$.
Output: $L_m$, containing the premises of the canonical direct basis of K.

    L_1 ← set of all the M_i^1              /* special treatment for j = 1 */
    for j ← 2 to m do
        L_j ← ∅
        forall α ∈ L_{j−1} and β ∈ (M_i^j)_{1≤i≤n_j} do
            if L_j contains a set included in α ∪ β then
                discard α ∪ β
            else
                add α ∪ β to L_j
                remove every set of L_j strictly containing α ∪ β
    return L_m
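A possible Python rendering of the merge is sketched below, for one fixed conclusion. It is our reading of Algorithm 1, under the assumption that a newly added union evicts the previously kept unions that strictly contain it; the name `merge_premises` and the list-of-families input format are ours.

```python
def merge_premises(families):
    """Minimal dual transversals of `families`.

    families[j] lists the premises obtained from sub-context j for one
    fixed conclusion; premises may be given as strings or sets.
    """
    current = {frozenset(p) for p in families[0]}          # L_1
    for family in families[1:]:
        kept = set()                                       # L_j starts empty
        for alpha in current:
            for beta in family:
                cand = alpha | frozenset(beta)
                if any(k <= cand for k in kept):
                    continue                               # not minimal: discard
                # cand is new and minimal so far: drop its strict supersets
                kept = {k for k in kept if not cand < k}
                kept.add(cand)
        current = kept
    return current


show = lambda sets: sorted("".join(sorted(s)) for s in sets)

# Conclusion e of the running example: upper premises ac, ad; lower premises b, d.
print(show(merge_premises([["ac", "ad"], ["b", "d"]])))   # ['abc', 'ad']
# Conclusion b: upper premises a, c; lower premises d, e.
print(show(merge_premises([["a", "c"], ["d", "e"]])))     # ['ad', 'ae', 'cd', 'ce']
```

Since each conclusion is merged independently of the others, calls to such a function can be distributed over the attributes of M.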



Algorithm

Step j = 1: The unique context is $K_1$, so the minimal dual transversals are the sets $M_i^1$. Let $L_1$ be the set of these minimal dual transversals.

Further steps: Suppose that the merge process has been done up to Step j − 1. The set $L_{j-1}$ of minimal dual transversals covering contexts $K_1$ to $K_{j-1}$ is computed. $L_{j-1}$ contains all the premises of the canonical direct unit basis of the subposition of $K_1, \dots, K_{j-1}$ having a as a conclusion.


Algorithm

Step j: Consider successively all the $M_i^j$ and all the α ∈ $L_{j-1}$. At the beginning, $L_j$ is empty. Now consider α ∪ $M_i^j$:
- If $L_j$ contains a set included in α ∪ $M_i^j$, then α ∪ $M_i^j$ is discarded.
- If not, α ∪ $M_i^j$ is added to $L_j$, and every set of $L_j$ strictly containing α ∪ $M_i^j$ is removed.
Notice that, if α ∪ $M_i^j$ ∈ $L_j$, it is not removed at this step.

At the end of the process, we get all the minimal dual transversals, since minimality is maintained at each step.
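The bookkeeping of one step can be traced with a short sketch that records, at each iteration, the candidate union and what happens to it. The helper names are ours, and the input is the data of the running example for the conclusion e ($L_1$ = {ac, ad}, lower premises b and d).

```python
def name(s):
    """Compact display of a set of attributes, e.g. {'a', 'c'} -> 'ac'."""
    return "".join(sorted(s))


def trace_step(previous, family):
    """One step j of the merge; returns L_j and a log with one row per iteration."""
    kept = []   # L_j, empty at the beginning of the step
    log = []
    for alpha in previous:
        for beta in family:
            cand = frozenset(alpha) | frozenset(beta)
            row = (len(log) + 1, name(alpha), name(beta), name(cand))
            if any(k <= cand for k in kept):
                log.append(row + ("discarded",))
                continue
            removed = sorted(name(k) for k in kept if cand < k)
            kept = [k for k in kept if not cand < k] + [cand]
            action = "added, removes " + ", ".join(removed) if removed else "added"
            log.append(row + (action,))
    return [name(k) for k in kept], log


L2, log = trace_step(["ac", "ad"], ["b", "d"])
for row in log:
    print(row)
print(L2)  # ['abc', 'ad']
```

The fourth iteration adds ad and removes the two unions abd and acd that strictly contain it.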


Algorithm

Table 8: Running example (same context as Table 1).

Table 9: The canonical direct basis of the running example (same as Table 4).

Let us focus on implications with conclusion e. At the first step, we immediately get $L_1 = \{ac, ad\}$. Iterations of the second step are given in Table 10.

Algorithm

  Iteration   Element of L1   M_i^2   Dual transversal   Removed at iteration
  1           ac              b       abc
  2           ac              d       acd                4
  3           ad              b       abd                4
  4           ad              d       ad

Table 10: Second step of the computation of minimal dual transversals with the conclusion e.

At the end of the four iterations of Table 10, $L_2 = \{abc, ad\}$, and these sets are minimal generators for the implications abc → e and ad → e respectively. We can then repeat this process for the remaining attributes in M and thus get the canonical direct unit basis.




Conclusion

We presented a new algorithm to compute the canonical direct unit basis by exploiting the notion of minimal dual transversal. These minimal dual transversals are computed by means of a merge process from minimal generators of subcontexts. This point of view allows the computation of the canonical direct unit basis in a parallel and distributed way. Preliminary experiments with a parallelized algorithm suggest that the execution time of the merging process is too high for an efficient computation. Further distributed experiments using MapReduce will be conducted.


Perspective

Our algorithm can also be generalized to association rules. Indeed, rules that are valid in a subcontext having $n_1$ rows, out of the whole set of n objects, have a confidence greater than or equal to $n_1 / n$.


The End

Thanks. Any questions?



Bibliography

[ANR12] Kira V. Adaricheva, James B. Nation, and Robert Rand. Ordered direct implicational basis of a finite closure system. In ISAIM, 2012.

[BDVG16] Karell Bertet, Christophe Demko, Jean-François Viaud, and Clément Guérin. Lattices, closures systems and implication bases: A survey of structural aspects and algorithms. Theoretical Computer Science, 2016.

[BM10] Karell Bertet and Bernard Monjardet. The multiple facets of the canonical direct unit implicational basis. Theoretical Computer Science, 411(22–24):2155–2166, 2010.

[FN03] Huaiguo Fu and E.M. Nguifo. Partitioning large data to scale up lattice-based algorithm. In Tools with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on, pages 537–541, Nov 2003.

[GW99] Bernhard Ganter and Rudolf Wille. Formal concept analysis - mathematical foundations. Springer, 1999.

[KB15] Francesco Kriegel and Daniel Borchmann. Nextclosures: Parallel computation of the canonical base. In Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, Clermont-Ferrand, France, October 13-16, 2015, pages 181–192, 2015.

[KOV10] Petr Krajca, Jan Outrata, and Vilem Vychodil. Parallel algorithm for computing fixpoints of galois connections. Annals of Mathematics and Artificial Intelligence, 59(2):257–272, 2010.

[KV09] Petr Krajca and Vilem Vychodil. Distributed algorithm for computing formal concepts using map-reduce framework. In Niall M. Adams, Céline Robardet, Arno Siebes, and Jean-François Boulicaut, editors, Advances in Intelligent Data Analysis VIII, volume 5772 of Lecture Notes in Computer Science, pages 333–344. Springer Berlin Heidelberg, 2009.

[OD07] S. Obiedkov and V. Duquenne. Attribute-incremental construction of the canonical implication basis. Annals of Mathematics and Artificial Intelligence, 49(1-4):77–99, April 2007.

[RLB15] Estrella Rodríguez-Lorenzo and Karell Bertet. From implicational systems to direct-optimal bases: A logic-based approach. Applied Mathematics & Information Sciences, 9(305):305–317, 2015.

[TBK11] Elena Tsiporkova, Veselka Boeva, and Elena Kostadinova. Mapreduce and fca approach for clustering of multiple-experiment data compendium. In Patrick De Causmaecker, Joris Maervoet, Tommy Messelis, Katja Verbeeck, and Tim Vermeulen, editors, Proceedings of the 23rd Benelux Conference on Artificial Intelligence, 2011.

[VA14] Lan Vu and Gita Alaghband. Novel parallel method for association rule mining on multi-core shared memory systems. Parallel Computing, 40(10):768–785, 2014.

[XdFROF12] Biao Xu, Ruairi de Frein, Eric Robson, and Micheal O Foghlu. Distributed formal concept analysis algorithms based on an iterative mapreduce framework. In Florent Domenach, Dmitry I. Ignatov, and Jonas Poelmans, editors, Formal Concept Analysis, volume 7278 of Lecture Notes in Computer Science, pages 292–308. Springer Berlin Heidelberg, 2012.

[ZP02] Mohammed Javeed Zaki and Yi Pan. Introduction: Recent developments in parallel and distributed data mining. Distributed and Parallel Databases, 11(2):123–127, 2002.