Knowledge-Based Sampling for Subgroup Discovery

  • Conference paper
Local Pattern Detection

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3539)

Abstract

Subgroup discovery aims at finding interesting subsets of a classified example set that deviate from the overall distribution. The search is guided by a so-called utility function, which trades the size of subsets (coverage) against their statistical unusualness. With a suitably chosen utility function, subgroup discovery is well suited to finding interesting rules with much smaller coverage and bias than is possible with standard classifier induction algorithms. Smaller subsets can be considered local patterns, but this work uses a different definition: global patterns consist of all patterns reflecting the prior knowledge available to a learner, including all previously found patterns, and all further unexpected regularities in the data are referred to as local patterns. To address local pattern mining in this scenario, an extension of subgroup discovery by the knowledge-based sampling approach to iterative model refinement is presented. It is a general, cheap way of incorporating prior probabilistic knowledge in arbitrary form into data mining algorithms that address supervised learning tasks.
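
To make the coverage-versus-unusualness trade-off concrete, the following is a minimal sketch of one utility function commonly used in the subgroup discovery literature, weighted relative accuracy (WRAcc), computed as coverage times the difference between the subgroup's class frequency and the overall class frequency. The abstract does not say which utility function the paper uses, so the choice of WRAcc is an assumption, as are the function name `wracc` and its parameters. The optional `weights` argument is likewise only an illustrative assumption about how example reweighting could enter such a score; it is not the paper's knowledge-based sampling procedure.

```python
from typing import Optional, Sequence


def wracc(covered: Sequence[bool],
          positive: Sequence[bool],
          weights: Optional[Sequence[float]] = None) -> float:
    """Weighted relative accuracy: coverage * (precision - prior).

    covered[i]  -- True if example i falls into the candidate subgroup
    positive[i] -- True if example i carries the target class
    weights[i]  -- optional non-negative example weight (hypothetical hook
                   for reweighted data; defaults to uniform weights)
    """
    if len(covered) != len(positive):
        raise ValueError("covered and positive must have the same length")
    if weights is None:
        weights = [1.0] * len(covered)
    if len(weights) != len(covered):
        raise ValueError("weights must match the number of examples")

    total = sum(weights)
    if total <= 0:
        raise ValueError("total example weight must be positive")

    covered_w = sum(w for w, c in zip(weights, covered) if c)
    positive_w = sum(w for w, p in zip(weights, positive) if p)
    both_w = sum(w for w, c, p in zip(weights, covered, positive) if c and p)

    if covered_w == 0:
        return 0.0  # an empty subgroup is scored as uninteresting

    coverage = covered_w / total      # relative size of the subgroup
    prior = positive_w / total        # overall frequency of the target class
    precision = both_w / covered_w    # target frequency inside the subgroup
    return coverage * (precision - prior)


if __name__ == "__main__":
    # Toy data: 8 examples, the subgroup covers the first 3.
    covered = [True, True, True, False, False, False, False, False]
    positive = [True, True, False, True, False, False, False, False]
    print(wracc(covered, positive))
```

A subgroup that covers every example, or whose class distribution matches the overall one, scores zero under this measure, so only subsets that are both reasonably large and distributionally unusual rank highly.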




© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scholz, M. (2005). Knowledge-Based Sampling for Subgroup Discovery. In: Morik, K., Boulicaut, JF., Siebes, A. (eds) Local Pattern Detection. Lecture Notes in Computer Science, vol 3539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11504245_11

  • DOI: https://doi.org/10.1007/11504245_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26543-6

  • Online ISBN: 978-3-540-31894-1

  • eBook Packages: Computer Science, Computer Science (R0)
