Distributed Algorithms for Computing Closed Itemsets Based on an Iterative MapReduce Framework

Xu, Biao (2011) Distributed Algorithms for Computing Closed Itemsets Based on an Iterative MapReduce Framework. Masters thesis, Waterford Institute of Technology.

[img]
Preview
PDF
thesis_BiaoXu_TSSG.pdf

Download (1MB) | Preview

Abstract

Closed Itemset mining is a major task both in Data Mining and Formal Concept Analysis. It is an efficient way to determine patterns which are hidden in raw data. These patterns may be expressed as association rules which capture relations between variables in databases. In the case of Formal Concept Analysis, Closed Itemsets form the basis of formal concepts. A concept consists of an extent and intent. It turns out that the intent is exactly a closed itemset whose support is the cardinality of extent. With the increasing application of Data Mining and Formal Concept Analysis in knowledge discovery, Closed Itemset mining is fast becoming an attractive research area. While many prominent algorithms have been developed to mine closed itemsets in both Formal Concept Analysis and Frequent Closed Itemset mining area, they are typically unsuitable for distributed implementation. The root cause is that the theoretical foundation of closed itemset mining is based on a centralized data representation. By examining the features of closed itemset, we propose distributed closed itemset computing theories for Formal Concept Analysis and Frequent Closed Itemset mining in a distributed environment. Additionally, taking the MapReduce framework as our inspiration we propose a general distributed approach for performing closed itemset mining. We then provide representative exemplars of how the classic centralized algorithms in Formal Concept Analysis and Frequent Closed Itemset mining can be implemented in a distributed fashion using our methodology and present a family of MR* algorithms, where the abbreviation stands for MapReduce and * stands for the underlying algorithm that has been adapted. To analyse the factors which impact a distributed algorithm's performance, we compare our MR* algorithms with the state-of-the-art. Experiments conducted on real datasets demonstrate that the MR* algorithms for Formal Concept Analysis are efficient and scalable. The MR* algorithm and theory for Frequent Closed Itemset mining of this work set the scene for future developments. We analyze our experimental results and discuss the bottlenecks which we will focus on in future work.

Item Type: Thesis (Masters)
Uncontrolled Keywords: Distributed algorithms
Departments or Groups: *NONE OF THESE*
Divisions: School of Science > Department of Computing, Maths and Physics
Depositing User: Derek Langford
Date Deposited: 07 Nov 2012 11:13
Last Modified: 22 Aug 2016 10:26
URI: http://repository.wit.ie/id/eprint/1788

Actions (login required)

View Item View Item