Abstract
Subgroup discovery is a popular form of supervised rule learning, applicable to both descriptive and predictive tasks. In this work we study two natural extensions of classical subgroup discovery to distributed settings. In the first variant, the goal is to efficiently identify global subgroups, i.e., the rules an analysis would yield if all the data were collected in a single central database. In contrast, the second variant explicitly takes the locality of the data into account: the aim is to find patterns that point out major differences between the individual databases with respect to a specific property of interest (the target attribute). We point out substantial differences between these novel learning problems and other kinds of distributed data mining tasks. These differences motivate new search and communication strategies aimed at minimizing computation time and communication costs. We present and empirically evaluate new algorithms for both variants.
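The two variants can be made concrete with a small sketch. It assumes weighted relative accuracy (WRAcc, coverage times the difference between the subgroup's target share and the overall target share) as the quality function. WRAcc is a common choice in subgroup discovery but is not necessarily the measure used in the paper. For a count-based measure like this, the "global" score of a candidate subgroup depends only on counts summed over the sites, so it equals the score obtained on the pooled database. The local score below, which compares one site's target share inside the subgroup with the pooled share, is a hypothetical illustration of the second variant and not the authors' definition. SiteCounts, wracc, global_wracc and local_deviation are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Hypothetical per-site counts for one candidate subgroup."""
    total: int        # examples stored at the site (N_i)
    positives: int    # examples with the target value (P_i)
    covered: int      # examples covered by the subgroup description (n_i)
    covered_pos: int  # covered examples that also have the target value (p_i)


def wracc(total: int, positives: int, covered: int, covered_pos: int) -> float:
    """Weighted relative accuracy: coverage * (subgroup target share - default share)."""
    if total == 0 or covered == 0:
        return 0.0
    return (covered / total) * (covered_pos / covered - positives / total)


def global_wracc(sites: list[SiteCounts]) -> float:
    """Score as if all data sat in one central database; only summed counts are needed."""
    return wracc(
        sum(s.total for s in sites),
        sum(s.positives for s in sites),
        sum(s.covered for s in sites),
        sum(s.covered_pos for s in sites),
    )


def local_deviation(sites: list[SiteCounts], i: int) -> float:
    """Illustrative (assumed) score: how far site i's target share inside the
    subgroup departs from the pooled share, weighted by site i's part of the coverage."""
    s = sites[i]
    pooled_cov = sum(t.covered for t in sites)
    pooled_pos = sum(t.covered_pos for t in sites)
    if s.covered == 0 or pooled_cov == 0:
        return 0.0
    return (s.covered / pooled_cov) * (s.covered_pos / s.covered - pooled_pos / pooled_cov)


# Usage with two made-up sites.
sites = [SiteCounts(100, 40, 20, 15), SiteCounts(200, 60, 50, 20)]
print(global_wracc(sites))
print([local_deviation(sites, i) for i in range(len(sites))])
```

Each site only needs to send four integers per candidate subgroup, not its raw records. The local deviations always sum to zero across sites, so a strongly positive value at one site is balanced by negative values elsewhere.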
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wurst, M., Scholz, M. (2006). Distributed Subgroup Mining. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_40
DOI: https://doi.org/10.1007/11871637_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)