Tokenization and Filtering Process in RapidMiner

Tanu Verma; Renu; Deepti Gaur

Research Article

International Journal of Applied Information Systems

Foundation of Computer Science (FCS), NY, USA

Volume 7 - Issue 2

Published: April 2014

10.5120/ijais14-451139

Tanu Verma, Renu, Deepti Gaur. Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems. 7, 2 (April 2014), 16-18. DOI=10.5120/ijais14-451139

@article{10.5120/ijais14-451139,
  author    = { Tanu Verma and Renu and Deepti Gaur },
  title     = { Tokenization and Filtering Process in RapidMiner },
  journal   = { International Journal of Applied Information Systems },
  year      = { 2014 },
  volume    = { 7 },
  number    = { 2 },
  pages     = { 16-18 },
  doi       = { 10.5120/ijais14-451139 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2014
%A Tanu Verma
%A Renu
%A Deepti Gaur
%T Tokenization and Filtering Process in RapidMiner
%J International Journal of Applied Information Systems
%V 7
%N 2
%P 16-18
%R 10.5120/ijais14-451139
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text mining is defined as a knowledge-intensive process in which a user interacts with a document collection. As in data mining [2,4,9], text mining seeks to extract useful information from data sources through the identification and exploration of interesting patterns. A key element of text mining is its focus on the document collection, which can be any grouping of text-based documents. Most text mining solutions aim to discover patterns across very large document collections, ranging from many thousands to millions of documents. In this paper, we show how text mining is implemented in RapidMiner, focusing on the tokenization and filtering steps.
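To make the tokenization and filtering steps concrete, here is a minimal Python sketch. It is not the authors' RapidMiner process. It assumes a tiny hand-picked stop-word list and splits text on runs of non-letter (ASCII) characters. The names `tokenize`, `filter_tokens`, `document_frequencies` and `STOP_WORDS` are hypothetical. The chain roughly parallels RapidMiner Text Processing operators such as Tokenize, Transform Cases, Filter Stopwords (English) and Filter Tokens (by Length). It then counts how many documents in a small collection contain each remaining term, which is a simple starting point for finding patterns across a collection.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# real stop-word lists (such as RapidMiner's English list) are much larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "as", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Split text into tokens on every run of non-letter (ASCII) characters."""
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


def filter_tokens(tokens: list[str], min_len: int = 3, max_len: int = 25) -> list[str]:
    """Lower-case tokens, then drop stop words and tokens outside [min_len, max_len]."""
    lowered = [tok.lower() for tok in tokens]
    return [
        tok for tok in lowered
        if tok not in STOP_WORDS and min_len <= len(tok) <= max_len
    ]


def document_frequencies(docs: list[str]) -> Counter:
    """Count in how many documents each surviving term occurs (each doc counted once)."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(filter_tokens(tokenize(doc))))
    return df


if __name__ == "__main__":
    collection = [
        "Text mining extracts patterns from a document collection.",
        "Tokenization splits the text of each document into tokens.",
        "Filtering removes stop words and very short tokens.",
    ]
    print(filter_tokens(tokenize(collection[0])))
    print(document_frequencies(collection).most_common(3))
```

Running the script prints the filtered tokens of the first document and the three terms that occur in more than one document ("text", "document" and "tokens"). The order among these tied terms can vary between runs, because each document's terms are first collected into a set.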

References

[1] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules", in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB-94), Chile, Sept. 1994.
[2] M. H. Dunham, "Data Mining: Introductory and Advanced Topics".
[3] R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval", ACM Press, New York, 1999.
[4] R. Agrawal, T. Imielinski and A. Swami, "Database mining: A performance perspective", IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6.
[5] M. E. Califf, editor, Papers from the Sixteenth National Conference on Artificial Intelligence (AAAI-99) Workshop on Machine Learning for Information Extraction, Orlando, FL, 1999. AAAI Press.
[6] M. E. Califf and R. J. Mooney, "Relational learning of pattern-match rules for information extraction", in Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), pp. 328–334, Orlando, FL, July 1999.
[7] C. Cardie, "Empirical methods in information extraction", AI Magazine, vol. 18, no. 4, pp. 65–79, 1997.
[8] C. Cardie and R. J. Mooney, "Machine learning and natural language (Introduction to special issue on natural language learning)", Machine Learning, vol. 34, pp. 5–9, 1999.
[9] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 722.
[10] Y. M. Yang, "An evaluation of statistical approaches to text categorization", Technical Report CMU-CS-97-127, Computer Science Department, Carnegie Mellon University, 1997.
[11] C. Choi and Y. Park, "R&D proposal screening system based on text-mining approach", Int. J. Technol. Intell. Plan., vol. 2, no. 1, pp. 61–72, 2006.
[12] H. C. Yang and C. H. Lee, "A text mining approach for automatic construction of hypertexts", Expert Syst. Appl., vol. 29, no. 4, pp. 723–734, 2005.
[13] R. Agrawal, T. Imielinski and A. Swami, "Mining association rules between sets of items in large databases", in Proceedings of ACM SIGMOD, Washington, DC, 1993, pp. 207–216.

Index Terms

Computer Science

Information Sciences

Keywords

Text mining, Tokenize, Filtering, Stop words, Stemming