GRAPH-BASED CONCEPT DISCOVERY IN MULTI-RELATIONAL DATA


Kavurucu Y., Mutlu A., Ensari T.

6th International Conference on Cloud System and Big Data Engineering (Confluence), Noida, India, 14 - 15 January 2016, pp. 274-278

Abstract

Developments in technology, especially in computer science, have created the need to store data in a variety of areas. This need gave rise to the database, in which data is stored in a useful form and logically integrated into one or more files according to the relations among the data. An important issue is extracting knowledge from these databases, which hold data in a useful and complete form; this process is called data mining. The main objective of data mining is to extract implicit and useful knowledge from the huge mass of data stored in one or more databases, data that at first glance appears meaningless. Multi-relational databases store data in multiple tables (relations), and the relationships between those tables are also stored as tables (relations) in the database. The most effective and commonly known approaches to Multi-Relational Data Mining (MRDM) are based on Inductive Logic Programming (ILP), which combines concepts from Inductive Learning and Logic Programming. From this perspective, the main purpose of MRDM is to extract implicit, non-trivial knowledge from relational databases using ILP approaches and techniques. In graph-based approaches, data is represented as graph structures and graph mining techniques are used for knowledge discovery.
Concept discovery in multi-relational data mining aims to find relational rules that best describe a relation, called the target relation, in terms of the other relations in the database, called background knowledge. In this study, a graph-based method for concept discovery is presented. The proposed method, G-CDS (Graph-based Concept Discovery System), uses techniques from both substructure-based and path-finding-based approaches and can therefore be considered a hybrid method.
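To make the notions of target relation and background knowledge concrete, the following minimal Python sketch scores a few candidate rules for a textbook-style toy target `daughter(X, Y)` against the background relations `parent` and `female`. The facts, the hand-written candidate rules, and the simple "positives covered minus negatives covered" score are illustrative assumptions, not data or code from the G-CDS study.

```python
from typing import Callable

Fact = tuple[str, ...]

# Background knowledge: each relation is a set of ground tuples (toy data only).
parent: set[Fact] = {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")}
female: set[Fact] = {("ann",), ("mary",), ("eve",)}

# Target relation daughter(X, Y), read as "X is a daughter of Y".
positives: set[Fact] = {("mary", "ann"), ("eve", "tom")}
negatives: set[Fact] = {("tom", "ann"), ("ian", "tom"), ("ann", "tom")}

# Hypothetical candidate concept descriptors, written as predicates over (X, Y).
candidates: dict[str, Callable[[str, str], bool]] = {
    "daughter(X,Y) :- parent(Y,X)": lambda x, y: (y, x) in parent,
    "daughter(X,Y) :- female(X)": lambda x, y: (x,) in female,
    "daughter(X,Y) :- parent(Y,X), female(X)":
        lambda x, y: (y, x) in parent and (x,) in female,
}


def score(rule: Callable[[str, str], bool]) -> tuple[int, int]:
    """Return (positive examples covered, negative examples covered)."""
    pos = sum(rule(x, y) for x, y in positives)
    neg = sum(rule(x, y) for x, y in negatives)
    return pos, neg


def best_rule() -> str:
    """Pick the candidate maximizing positives covered minus negatives covered."""
    def gain(name: str) -> int:
        pos, neg = score(candidates[name])
        return pos - neg
    return max(candidates, key=gain)


if __name__ == "__main__":
    for name, rule in candidates.items():
        print(name, score(rule))
    print("best:", best_rule())
```

On this toy data the combined rule covers both positive examples and none of the negative ones, so it is selected. A concept discovery system must search for such rule bodies itself rather than receive them as a fixed list.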
For each target relation and its related background knowledge, both initially stored in a relational database, G-CDS generates disconnected graph structures and uses them to guide the generation of a summary graph. The summary graph is then traversed to find concept descriptors. Experiments were conducted on datasets belonging to different learning problems, and the results show that G-CDS is capable of learning definitions of target relations across these problems.
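The pipeline of per-example disconnected graphs, a summary graph, and a traversal can be illustrated with a deliberately simplified, hypothetical Python sketch. Here each per-example graph links a target fact to the background facts that share a constant with it; the summary graph counts, across examples, how often each generalized literal (constants replaced by the target's variables, `_` for unbound ones) is linked to the target; and the "traversal" is only one hop from the target node, keeping fully bound literals whose support reaches a threshold. All names (`example_graph`, `summary_graph`, `traverse`, `min_support`) and the toy data are assumptions for illustration and do not reproduce G-CDS itself.

```python
from collections import Counter

Fact = tuple[str, tuple[str, ...]]  # (relation name, constant arguments)

# Toy background knowledge and positive target examples (illustrative only).
BACKGROUND: list[Fact] = [
    ("parent", ("ann", "mary")), ("parent", ("ann", "tom")),
    ("parent", ("tom", "eve")), ("parent", ("tom", "ian")),
    ("female", ("ann",)), ("female", ("mary",)), ("female", ("eve",)),
]
TARGETS: list[Fact] = [("daughter", ("mary", "ann")), ("daughter", ("eve", "tom"))]
VARS = "XYZW"  # variable names for target argument positions (arity <= 4 here)


def example_graph(target: Fact) -> list[tuple[Fact, Fact, str]]:
    """One disconnected component: edges (target, background fact, shared constant)."""
    edges = []
    for fact in BACKGROUND:
        for const in set(target[1]) & set(fact[1]):
            edges.append((target, fact, const))
    return edges


def generalize(target: Fact, fact: Fact) -> str:
    """Rewrite a background fact as a literal over the target's variables ('_' = unbound)."""
    var_of = {c: VARS[i] for i, c in enumerate(target[1])}
    return f"{fact[0]}({','.join(var_of.get(c, '_') for c in fact[1])})"


def summary_graph(targets: list[Fact]) -> Counter[str]:
    """Merge per-example graphs: weight = number of examples linked to each literal."""
    weights: Counter[str] = Counter()
    for target in targets:
        # Count each generalized literal at most once per example.
        literals = {generalize(target, fact) for _, fact, _ in example_graph(target)}
        weights.update(literals)
    return weights


def traverse(weights: Counter[str], n_examples: int, min_support: float) -> list[str]:
    """One-hop walk from the target node, keeping fully bound, well-supported literals."""
    return sorted(
        lit for lit, w in weights.items()
        if w / n_examples >= min_support and "_" not in lit
    )


def descriptor(targets: list[Fact], min_support: float = 1.0) -> str:
    """Assemble a Prolog-style clause from the literals kept by the traversal."""
    rel, args = targets[0]
    head = f"{rel}({','.join(VARS[:len(args)])})"
    body = traverse(summary_graph(targets), len(targets), min_support)
    return f"{head} :- {', '.join(body)}"


if __name__ == "__main__":
    print(descriptor(TARGETS))
```

With `min_support=1.0` only literals linked to every positive example survive, giving `daughter(X,Y) :- female(X), parent(Y,X)`. The sketch ignores negative examples and multi-hop paths, both of which a practical concept discovery system would need to handle.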