Niam, Muhammad Jazilun and Hidayatno, Achmad and Isnanto, R. Rizal (2011) PEMBENTUKAN DERET PENDEKATAN UNTUK BAHASA INDONESIA MENGGUNAKAN METODE SHANNON [Forming a Series of Approximations to Bahasa Indonesia Using the Shannon Method]. Undergraduate thesis, Diponegoro University.
| PDF - Published Version 229Kb |
Abstract
In the early days of information theory, Shannon illustrated how a series of processes can approximate a language at the letter and word level, using English as the example. This raises the question of whether the same method can be applied to other languages, such as Bahasa Indonesia. This research therefore constructs a series of approximations to Bahasa Indonesia using the method Shannon used to form his series of approximations to English.

The research proceeded in three steps. The first step was collecting letter probabilities, digram probabilities, trigram probabilities, word probabilities, and word-digram probabilities for Bahasa Indonesia from samples taken from 2 books and 2 articles. For the articles, the letter, digram, and trigram probabilities were collected from three source sizes: one paragraph, half a page, and one page. The second step was designing a program with 6 levels: zeroth-, first-, second-, and third-order letter approximation, and first- and second-order word approximation. The final step was analyzing the results obtained.

In the resulting series of approximations to Bahasa Indonesia, at each series length the occurrence of Indonesian words increases as the approximation order increases. The occurrence of Indonesian words also increases with every increase in series length, except for the zeroth-order approximation. In the simulation of the influence of series length, the minimum occurrence of Indonesian words is 0% and the maximum is 44.70899471%. Using a different database changes both the percentage of Indonesian words and the generated words themselves: the generated Indonesian words reflect the source database used. When the source database is varied, the occurrence of Indonesian words increases as the source grows from one paragraph to half a page to one page; this holds for both articles.

Keywords: Shannon Method, Series Approximation, Bahasa Indonesia.
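The six approximation levels described in the abstract can be illustrated with a short Python sketch. This is not the program from the thesis: the function names (`normalize`, `build_model`, `generate`, `word_percentage`), the 27-symbol alphabet (26 letters plus space, as in Shannon's English experiments), the one-sentence sample text, the restart rule for unseen contexts, and the definition of the word percentage (share of space-separated tokens found in a reference vocabulary) are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

# Assumption: 26 letters plus space, as in Shannon's English experiments.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text, keep ASCII letters, collapse everything else to single spaces."""
    kept = "".join(ch if ch.isascii() and ch.isalpha() else " " for ch in text.lower())
    return " ".join(kept.split())


def build_model(tokens, order):
    """Count which token follows each context of (order - 1) preceding tokens."""
    model = defaultdict(Counter)
    ctx_len = order - 1
    for i in range(ctx_len, len(tokens)):
        model[tuple(tokens[i - ctx_len:i])][tokens[i]] += 1
    return model


def generate(tokens, order, length, rng, alphabet=None):
    """Generate `length` tokens; `tokens` is a string (letters) or a list of words.

    Order 0 draws uniformly from `alphabet`; order k >= 1 conditions each
    token on the k - 1 tokens before it, using counts from the sample.
    """
    if order == 0:
        return [rng.choice(alphabet) for _ in range(length)]
    model = build_model(tokens, order)
    ctx_len = order - 1
    out = list(tokens[:ctx_len])  # seed with the sample's first context
    while len(out) < length:
        counts = model.get(tuple(out[len(out) - ctx_len:]))
        if not counts:
            # Assumption: on an unseen context, restart from a random seen one.
            out.extend(rng.choice(list(model)))
            continue
        out.append(rng.choices(list(counts), weights=list(counts.values()))[0])
    return out[:length]


def word_percentage(generated_text, vocabulary):
    """Percentage of space-separated tokens that appear in `vocabulary` (assumed metric)."""
    words = generated_text.split()
    if not words:
        return 0.0
    return 100.0 * sum(w in vocabulary for w in words) / len(words)


# Illustrative sample only; the thesis used 2 books and 2 articles.
sample = normalize(
    "Bahasa Indonesia adalah bahasa resmi negara Indonesia "
    "dan bahasa persatuan bangsa Indonesia"
)
vocabulary = set(sample.split())
rng = random.Random(0)

# Zeroth- to third-order letter approximations.
for order in range(4):
    letters = "".join(generate(sample, order, 60, rng, ALPHABET))
    print("letter order", order, repr(letters), word_percentage(letters, vocabulary))

# First- and second-order word approximations.
sample_words = sample.split()
for order in (1, 2):
    print("word order", order, " ".join(generate(sample_words, order, 10, rng)))
```

Treating both letters and words as generic token sequences lets one `generate` function cover all six levels. With a sample this small the higher orders mostly reproduce the source text, which is consistent with the abstract's observation that the generated words reflect the database used.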
Item Type: | Thesis (Undergraduate) |
---|---|
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
Divisions: | Faculty of Engineering > Department of Electrical Engineering |
ID Code: | 32069 |
Deposited By: | INVALID USER |
Deposited On: | 20 Dec 2011 14:19 |
Last Modified: | 20 Dec 2011 14:19 |