MACHINE READABLE GRAMMAR FOR OPTIMIZING AUTOMATIC RETRIEVAL IN TEXT CORPUS: A Comparison of Regular Expression and Local Grammar Graph

PRIHANTORO, PRIHANTORO. MACHINE READABLE GRAMMAR FOR OPTIMIZING AUTOMATIC RETRIEVAL IN TEXT CORPUS: A Comparison of Regular Expression and Local Grammar Graph. Proceeding of International Linguistic Congress (held by MLI: Indonesian Linguists Society). ISBN 9786021716113

Abstract

Machine Readable Grammar (MRG) is intended to help computers perform Natural Language Processing (NLP) tasks. This paper discusses one of the essential uses of MRG: automatic retrieval in a text corpus. In automatic retrieval, the MRG plays a crucial role in querying the target expressions. In this research, I use Unitex, a Java-based corpus processing software. It can handle texts in languages with their own writing systems, such as Chinese, Greek, Japanese, and Korean. The software offers two query methods. The first is regular expressions, a method that resembles the queries used in well-known search engines such as Google, Yahoo, and Naver. The second is Local Grammar Graphs (LGGs), which are representations of finite state transducers (FSTs). When queries are performed with LGGs, users can set various constraints and perform more complex retrievals. In terms of speed, regular expressions are faster. The advantage of LGGs, however, is that they allow users to perform multiple retrievals and generate outputs at the same time. Ultimately, it is up to users to decide which MRG suits their goal.

Keywords: Machine Readable Grammar, Corpus, Regular Expression, Local Grammar Graphs
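
The contrast between the two query methods can be illustrated with a minimal sketch in Python. This is not Unitex code and does not use Unitex's query syntax or graph format; the sample sentence, the tags `NUM` and `NOUN`, and the names `regex_retrieve`, `fst_retrieve`, and `TRANSITIONS` are all hypothetical, chosen only for illustration. The regular expression returns the matching strings, while the toy finite state transducer retrieves the same sequences and emits an output tag for each token it consumes.

```python
import re

# Hypothetical sample text, not taken from the paper's corpus.
corpus = "She sent 3 emails on Monday and 12 emails on Friday."
tokens = re.findall(r"\w+", corpus)

def regex_retrieve(text):
    """Method 1: a regular expression returns the matching strings only."""
    return re.findall(r"\b\d+ emails\b", text)

# Method 2: a toy transducer over tokens.
# Each state maps to a list of (test on token, output tag, next state).
TRANSITIONS = {
    0: [(str.isdigit, "NUM", 1)],
    1: [(lambda tok: tok == "emails", "NOUN", 2)],
}
FINAL = {2}

def fst_retrieve(toks):
    """Retrieve token sequences accepted by the transducer, with output tags."""
    results = []
    for start in range(len(toks)):
        state, out, i = 0, [], start
        while state not in FINAL and i < len(toks):
            for test, tag, nxt in TRANSITIONS.get(state, []):
                if test(toks[i]):
                    out.append(f"{toks[i]}/{tag}")  # emit output while matching
                    state, i = nxt, i + 1
                    break
            else:
                break  # no transition accepts this token: abandon this start
        if state in FINAL:
            results.append(" ".join(out))
    return results

print(regex_retrieve(corpus))
print(fst_retrieve(tokens))
```

In this sketch the transducer does more work per match than the regular expression, which is loosely in line with the trade-off the abstract describes: the graph-based query is slower but produces annotated output during retrieval.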

Item Type: Article
Subjects: P Language and Literature > PB Modern European Languages
P Language and Literature > P Philology. Linguistics
Divisions: Faculty of Humanities > Department of English
ID Code: 42927
Deposited By: INVALID USER
Deposited On: 21 Apr 2014 12:05
Last Modified: 21 Apr 2014 12:08
