Kaufman, K. and Michalski, R.S., "The Development of the Inductive Database System VINLEN: A Review of Current Research," International Intelligent Information Processing and Web Mining Conference, Zakopane, Poland, 2003.
Cervone, G., Michalski, R.S. and Kaufman, K., "CAG1--A Program for Generating Concept Association Graphs," Reports of the Machine Learning and Inference Laboratory, George Mason University, Fairfax, VA, 2003 (to appear).
Michalski, R.S. and Kaufman, K., "Learning Patterns in Noisy Data: The AQ Approach," in Paliouras, G., Karkaletsis, V. and Spyropoulos, C. (eds.), Machine Learning and its Applications, Springer-Verlag, pp. 22-38, 2001.
Glowinsky, C. and Michalski, R.S., "Discovering Multi-head Attributional Rules in Large Databases," Tenth International Symposium on Intelligent Information Systems, Zakopane, Poland, 2001.
Michalski, R.S. and Kaufman, K.A., "The AQ19 System for Machine Learning and Pattern Discovery: A General Description and User's Guide," Reports of the Machine Learning and Inference Laboratory, MLI 01-2, George Mason University, Fairfax, VA, 2001.
Kaufman, K.A. and Michalski, R.S., "A Knowledge Scout for Discovering Medical Patterns: Methodology and System SCAMP," Proceedings of the Fourth International Conference on Flexible Query Answering Systems, FQAS'2000, Warsaw, Poland, pp. 485-496, 2000.
Michalski, R.S. and Kaufman, K., "Building Knowledge Scouts Using KGL Metalanguage," Fundamenta Informaticae, Vol. 40, pp. 433-447, 2000.
Current research is concerned with the development of a methodology and the implementation of an experimental system, VINLEN, that will seamlessly integrate advanced inductive learning capabilities with an SQL-accessible database. A user can invoke VINLEN’s capabilities via a sophisticated visual interface and a knowledge query language (KQL) that integrates SQL with knowledge generation operators. The aim of the VINLEN project is to provide the user with an integrated system for pattern discovery, knowledge mining, inference, and decision support. VINLEN will also serve as an educational aid for teaching the principles of data mining and the development of modern advisory systems.
This project has also motivated us to introduce a new Ph.D. concentration, "Computational Intelligence and Knowledge Mining," for which research and development of advanced inductive databases is one of the major topics (see http://www.mli.gmu.edu/cikm.html). The new concentration has been approved by the faculty of the School of Computational Sciences, and has already attracted the interest of over 40 students from diverse academic backgrounds who submitted applications to conduct Ph.D. research in this area. To serve the needs of students in this concentration area, the PI has developed two new courses, "Data Mining and Knowledge Discovery" and "Principles of Knowledge Mining." Students taking these courses are offered, among others, class projects on topics related to inductive databases. Recently, the PI has developed one more related course, "Computational Learning and Discovery," which is currently under review by the Curriculum Committee.
This project has also attracted researchers at the Institute of Computer Science of the Polish Academy of Sciences, who have collaborated with us on this project, culminating in their sending a student to GMU to participate in VINLEN development.
We are exploring the prospects of applying this technology in social sciences and areas of computer security.
To this end, we are developing a new type of database operator, called a knowledge generation operator (KGO), and integrating such operators within a database language. A KGO takes a selection of data from the database, and possibly some prior knowledge or constraints represented in an associated knowledge base, and generates new knowledge. Knowledge generation operators (plus conventional database operators and deductive inference operators) are invoked through a knowledge query language (KQL) that is used to define knowledge scouts. Knowledge scouts take the form of KQL scripts that guide the process of deriving knowledge of interest to a given user, or a class of users. A script includes a plan of operations to be performed on a database (or multiple databases), and a target knowledge specification, which abstractly characterizes the knowledge of interest to the user. A script can be a program to be executed upon a user's request, or a "live" software agent that runs continuously in the background, and outputs its findings whenever an alert-user criterion is satisfied.
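The relationship between a KGO, a knowledge scout, and an alert-user criterion can be illustrated with a minimal Python sketch. It is not VINLEN code and does not use KQL syntax, which is not shown here. The names Rule, rule_kgo, run_scout, and demo, the patients table, and the single-condition rule counter are all assumptions made for illustration; the counter is a deliberately simple stand-in for the AQ-based learning programs used in the project.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    condition: str     # e.g. "smoker = yes"
    decision: str      # e.g. "risk = high"
    support: int       # rows matching both condition and decision
    confidence: float  # support / rows matching the condition


def rule_kgo(conn: sqlite3.Connection, query: str, target: str) -> List[Rule]:
    """Hypothetical KGO: takes a data selection (an SQL query) and
    returns single-condition rules about the target column."""
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]

    cond_counts = {}   # (column, value) -> number of rows
    joint_counts = {}  # (column, value, target value) -> number of rows
    for row in rows:
        for col in cols:
            if col == target:
                continue
            key = (col, row[col])
            cond_counts[key] = cond_counts.get(key, 0) + 1
            jkey = (col, row[col], row[target])
            joint_counts[jkey] = joint_counts.get(jkey, 0) + 1

    rules = [
        Rule(f"{col} = {val}", f"{target} = {tval}", n, n / cond_counts[(col, val)])
        for (col, val, tval), n in joint_counts.items()
    ]
    # Deterministic order: strongest rules first, ties broken by condition text.
    rules.sort(key=lambda r: (-r.confidence, -r.support, r.condition))
    return rules


def run_scout(conn: sqlite3.Connection, query: str, target: str,
              alert: Callable[[Rule], bool]) -> List[Rule]:
    """Hypothetical scout: the plan is one selection followed by one KGO;
    the alert predicate plays the role of the alert-user criterion."""
    return [r for r in rule_kgo(conn, query, target) if alert(r)]


def demo() -> List[Rule]:
    # Toy in-memory database; the data are invented for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (smoker TEXT, exercise TEXT, risk TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", [
        ("yes", "low", "high"),
        ("yes", "low", "high"),
        ("yes", "high", "high"),
        ("no", "high", "low"),
        ("no", "low", "low"),
        ("no", "high", "low"),
    ])
    return run_scout(
        conn,
        "SELECT smoker, exercise, risk FROM patients",
        target="risk",
        alert=lambda r: r.confidence == 1.0 and r.support >= 3,
    )
```

Calling demo() reports only the two rules that satisfy the alert predicate, those relating smoker to risk. A "live" scout of the kind described above would rerun the same plan as the database changes, instead of running once on request.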
Among the knowledge generation operators that have been studied and developed for incorporation into VINLEN are operators for generating multihead rules, that is, rules with multiple consequents, and for visualizing knowledge using concept association graphs. See the references for further information on these topics. We have also enhanced rule learning and conceptual clustering programs for use within VINLEN.
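To make the notion of a multihead rule concrete, the following Python sketch collects every attribute value that holds for all records satisfying a given condition, yielding one condition with several consequents. This is only an illustration of the rule form under simple assumptions; it is not the discovery method described in the references, and the function name multihead_rule and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def multihead_rule(rows: List[Record], condition: Tuple[str, str],
                   min_confidence: float = 1.0) -> List[Tuple[str, str]]:
    """Return the consequents (attribute, value) that hold in at least
    min_confidence of the rows satisfying the condition."""
    attr, value = condition
    matching = [r for r in rows if r[attr] == value]
    if not matching:
        return []

    heads = []
    for col in matching[0]:
        if col == attr:
            continue
        # Count how often each value of this attribute occurs.
        counts: Dict[str, int] = {}
        for r in matching:
            counts[r[col]] = counts.get(r[col], 0) + 1
        best_val, best_n = max(counts.items(), key=lambda kv: kv[1])
        if best_n / len(matching) >= min_confidence:
            heads.append((col, best_val))
    return heads


# Invented sample records for the example.
records = [
    {"season": "winter", "heating": "on", "windows": "closed", "lights": "on"},
    {"season": "winter", "heating": "on", "windows": "closed", "lights": "off"},
    {"season": "summer", "heating": "off", "windows": "open", "lights": "off"},
]
heads = multihead_rule(records, ("season", "winter"))
```

Here the condition season = winter has two consequents, heating = on and windows = closed, while lights is excluded because its value varies among the matching records.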
Outcomes of this research will include a methodology for building inductive databases, and a prototype inductive database (IDB) system. The IDB system will integrate a standard database language (SQL) with a knowledge query language, and will work with several widely available relational database systems (ORACLE, Access and Paradox are currently supported). The system will support, in a seamless fashion, all standard database operations, as well as several novel operators for knowledge discovery, manipulation, inference, and visualization.
Michalski, R.S. and Kaufman, K.A., "Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach," in Michalski, R.S., Bratko, I. and Kubat, M. (Eds.), Machine Learning and Data Mining: Methods and Applications, London: John Wiley & Sons, pp. 71-112, 1998.