Author/Editor     Mladenić, Dunja; Grobelnik, Marko
Title     Word sequences as features in text-learning
Type     article
Source     In: Zajc B, editor. Zbornik 7. elektrotehniške in računalniške konference ERK'98. Zvezek B. Računalništvo in informatika, umetna inteligenca, robotika, razpoznavanje vzorcev, biomedicinska tehnika, močnostna elektrotehnika, didaktika, študentski članki; 1998 sep 24-26; Portorož. Ljubljana: Slovenska sekcija IEEE,
Publication year     1998
Volume     pp. 145-8
Language     eng
Abstract     This paper proposes an efficient algorithm for generating new features that enrich the known bag-of-words document representation. New features are generated based on word sequences of different lengths. Learning is performed using a Naive Bayesian classifier on feature vectors, where only highly scored features are used. The performance of the enriched document representation is evaluated on the problem of automatic document categorization using the Yahoo text hierarchy. Our experiments show that using word sequences of length up to 3 instead of only single words improves performance, while longer sequences on average have no influence on performance.
Descriptors     LEARNING
ARTIFICIAL INTELLIGENCE
ALGORITHMS