In:
電腦學刊, Angle Publishing Co., Ltd., Vol. 33, No. 2 (2022-04), p. 105-114
Abstract:
Text filtering in traditional anti-spam systems relies mainly on keyword matching and text fingerprint analysis, which makes it difficult to identify and classify spam accurately. This paper therefore proposes an integrated (ensemble) learning algorithm based on stacking. The algorithm takes manually labeled text data from several categories as samples and uses the TF-IDF algorithm to train a word vector space model. It then uses linear SVC, XGBoost and logistic regression as base classifiers and a random forest as the meta-classifier, and combines them through stacking ensemble learning to build the classification model. The model sorts e-mail into five categories: illegal, advertisement, news, bill and recruitment. In the simulation results, the AUC values of the stacking classification algorithm for these categories are 0.92, 0.95, 1.00, 0.93 and 0.97 respectively, and the AP values are 0.86, 0.88, 1.00, 0.88 and 0.94 respectively, indicating high performance and high precision in text classification.
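The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn is available, swaps scikit-learn's `GradientBoostingClassifier` in for XGBoost to avoid an extra dependency, and uses a tiny hypothetical corpus invented only so the example runs. The texts, hyperparameters and the `model` name are all assumptions.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (invented for illustration), four messages per category.
samples = {
    "illegal": [
        "buy fake passports and stolen credit cards here",
        "cheap counterfeit documents shipped discreetly",
        "stolen card numbers for sale contact now",
        "fake passports and counterfeit licenses available",
    ],
    "advertisement": [
        "huge summer sale fifty percent off all shoes",
        "limited offer discount coupons for new customers",
        "sale ends sunday get your discount today",
        "new customers get free shipping on every offer",
    ],
    "news": [
        "government announces new budget for next year",
        "local team wins championship after close match",
        "weather report heavy rain expected this weekend",
        "election results announced by the government today",
    ],
    "bill": [
        "your electricity bill for march is now due",
        "invoice attached please pay the amount due",
        "monthly phone bill payment reminder",
        "payment received for your invoice thank you",
    ],
    "recruitment": [
        "we are hiring software engineers apply now",
        "job opening for sales manager send your resume",
        "interview invitation for the engineer position",
        "hiring interns send resume before friday",
    ],
}
texts = [t for msgs in samples.values() for t in msgs]
labels = [cat for cat, msgs in samples.items() for _ in msgs]

# Base classifiers; gradient boosting stands in for XGBoost in this sketch.
base_classifiers = [
    ("linear_svc", LinearSVC(random_state=0)),
    ("boosting", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# A random forest meta-classifier is trained on the cross-validated
# outputs of the base classifiers (cv=2 only because the corpus is tiny).
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    cv=2,
)

# TF-IDF turns raw text into the vector space the classifiers consume.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)

print(model.predict(["please pay the attached invoice"]))
```

With real data, per-category AUC and AP of the kind reported in the abstract could be estimated on a held-out set from `model.predict_proba` using `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` in a one-vs-rest fashion.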
Type of Medium:
Online Resource
ISSN:
1991-1599
Uniform Title:
An E-mail Classification Algorithm based on Stacking Integrated Learning
DOI:
10.53106/199115992022043302009
Language:
Unknown
Publisher:
Angle Publishing Co., Ltd.
Publication Date:
2022