Relevance Mapping of Wikipedia Edits Using Semantic Web Concepts
Abstract
The Semantic Web gives information on the Web a well-defined meaning, enabling computers and people to work together efficiently [1]. It provides enhanced information access based on machine-processable metadata and represents a new kind of hierarchy and standardization intended to replace the current "web of links" with a "web of meaning" [2]. Wikipedia is a massive, free online encyclopedia built on the "wiki" concept: anyone with an Internet connection can become an editor, which also increases the frequency of vandalism. Given the volume of edits, it is crucial to examine the nature of these changes and help maintain the trustworthiness of Wikipedia articles. This thesis presents an effective way to categorize the nature of these changes. Both simple and ontology-based web crawlers were studied. The use of an ontological hierarchy, cognitive synonym sets (synsets), and concept-based ranking supported the calculation of edit relevance [3]. DBpedia, a shallow, cross-domain ontology manually created from the most commonly used infoboxes within Wikipedia, was chosen for this study. This empirical study focuses on calculating the relevance of a Wikipedia edit by crawling the web while taking into account the edit's metadata, the article's ontological hierarchy, and synonyms of the extracted keyword. The goal of this thesis is to increase the accuracy of the information provided by Wikipedia and make it more reliable by following a multithreaded ontological approach that models a domain of interest and guides the crawler to relevant information on the Semantic Web.