Hoax Detection System on Indonesian News Sites Based on Text Classification using SVM and SGD

Prasetijo, Agung B. and Isnanto, R.Rizal and Eridani, Dania and Soetrisno, Yosua A.A. and Arfan, Muhammad and Sofwan, Aghus. Hoax Detection System on Indonesian News Sites Based on Text Classification using SVM and SGD. Proc. of 2017 4th Int. Conf. on Information Tech., Computer, and Electrical Engineering (ICITACEE).

PDF - Published Version
599Kb

Abstract

A hoax (from hocus, "to trick") is a deliberate falsehood fabricated to appear as the truth, and the number of hoaxes has been increasing at an alarming rate. This situation may cause anxiety and panic in society. Even though hoaxes pose no direct threat, they can spread new perceptions that affect both social and political conditions. Impressions created by hoaxes can have negative effects and interfere with state policies, which may harm the economy. Early detection of hoaxes helps the Government to reduce and even eliminate their spread. Some existing systems filter hoaxes based on the title, or on voting over the results of search-engine queries. This research develops an Indonesian hoax filter based on a text vector representation that uses Term Frequency and Document Frequency, combined with classification techniques. Of the several classification techniques available, Support Vector Machine (SVM) and Stochastic Gradient Descent (SGD) are chosen for this research. Support Vector Machine divides a word vector using a linear function and Stochastic Gradient Descent divides a word vector using a nonlinear function. SVM and SGD are chosen because text classification involves multidimensional matrices. Each word in a news article can be modeled as a feature, and with Linear SVC and SGD the features of the word vector can be reduced to two dimensions and separated using linear and non-linear lines. The highest accuracy, 86%, is obtained from the SGD classifier using the modified Huber loss, evaluated on 100 hoax and 100 non-hoax websites chosen at random from outside the dataset used in the training process.
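The pipeline outlined in the abstract (weighting words by term and document frequency, then training a classifier by stochastic gradient descent with the modified Huber loss) can be illustrated with a minimal sketch. This is not the authors' implementation. The smoothed TF-IDF formula, the function names (fit_idf, vectorize, train_sgd, predict), the hyperparameters and the toy English tokens are all assumptions made for illustration, and only the Python standard library is used. In this sketch the decision function is linear in the TF-IDF features; the modified Huber loss changes how errors are penalised, not the shape of the boundary.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency per term (an assumed, common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def modified_huber_grad(score, y):
    """Derivative of the modified Huber loss with respect to the raw score."""
    margin = y * score
    if margin >= 1.0:
        return 0.0                        # correctly classified with margin
    if margin >= -1.0:
        return -2.0 * y * (1.0 - margin)  # quadratic region
    return -4.0 * y                       # linear region for large errors


def train_sgd(vectors, labels, epochs=50, lr=0.1, alpha=1e-4, seed=0):
    """Per-example SGD on a linear model; labels are +1 (hoax) or -1 (non-hoax)."""
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            score = sum(w.get(t, 0.0) * v for t, v in x.items()) + b
            g = modified_huber_grad(score, y)
            # Simplification: L2 shrinkage touches only terms present in x.
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * (g * v + alpha * w.get(t, 0.0))
            b -= lr * g
    return w, b


def predict(x, w, b):
    """Return +1 (hoax) or -1 (non-hoax) for a sparse TF-IDF vector."""
    score = sum(w.get(t, 0.0) * v for t, v in x.items()) + b
    return 1 if score >= 0.0 else -1


# Hypothetical toy corpus, already tokenised; real input would be news articles.
TRAIN_DOCS = [
    ["shocking", "secret", "cure", "share", "now"],
    ["viral", "shocking", "share", "before", "deleted"],
    ["ministry", "announces", "budget", "report"],
    ["court", "publishes", "official", "report"],
]
TRAIN_LABELS = [1, 1, -1, -1]

idf = fit_idf(TRAIN_DOCS)
train_vectors = [vectorize(doc, idf) for doc in TRAIN_DOCS]
weights, bias = train_sgd(train_vectors, TRAIN_LABELS)

for tokens in (["shocking", "viral", "share"], ["official", "budget", "report"]):
    print(tokens, predict(vectorize(tokens, idf), weights, bias))
```

A library-based version would likely swap these helpers for an off-the-shelf TF-IDF vectorizer and SGD classifier. The hand-written form is used here only to make the weighting and the loss gradient visible.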

Item Type: Article
Subjects: T Technology > Computer engineering. Embedded system. Network. Softwares. Robotics. Multimedia
Divisions: Faculty of Engineering > Department of Electrical Engineering
Faculty of Engineering > Department of Computer System
ID Code: 69088
Deposited By: Ms Melati mt
Deposited On: 29 Jan 2019 08:36
Last Modified: 15 May 2019 15:28
