Inductive Databases and Knowledge Scouts

Supported by National Science Foundation Grant No. IIS-9906858

(Michalski, Kaufman, Pietrzykowski, Sniezynski, Wojtusiak, Sharma, Seeman, Fischthal, Alkharouf, White, Draminski, Glowinski)

The objectives of this research are to develop, implement, and test a methodology for building inductive databases, which extend conventional databases by integrating inductive inference capabilities into them. These capabilities allow a database to answer queries that require synthesizing plausible knowledge. Such knowledge is not directly or deductively obtainable from the database, but can be hypothesized through inductive inference. This knowledge may take the form of hypotheses about future data points, likely consequences of the data, generalized data summaries, emerging global patterns, exceptions to hypothesized patterns, suspected errors and implied inconsistencies, hypothetical plans synthesized from the data, etc.

These capabilities are obtained by implementing a new type of database operator based on methods for inductive inference developed in the fields of machine learning and approximate reasoning. These operators, together with conventional DB operators, are integrated into a knowledge generation language, which allows a user to create scripts for synthesizing desired knowledge (target knowledge). A script includes a plan of operations to be performed on a database and an abstract definition of the target knowledge. A script can run continuously in the background, and it outputs its findings when an alert criterion is satisfied or at the user’s request. Because inductively derived knowledge normally has lower certainty than directly or deductively obtained knowledge, results of inductive queries are annotated with a certainty measure.
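The script idea described above can be illustrated with a minimal Python sketch. It is not KGL syntax and not the project's implementation: the names `Hypothesis`, `most_common_value_rule`, and `run_script` are hypothetical, the "inductive operator" is a deliberately trivial frequency-based generalization, the certainty measure is assumed to be the fraction of matching records that support the hypothesis, and the records are made-up toy data.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Hypothesis:
    statement: str
    certainty: float  # assumed measure: share of matching records that agree


def most_common_value_rule(rows, cond_attr, cond_val, target_attr) -> Optional[Hypothesis]:
    """Toy inductive operator: among rows matching the condition,
    hypothesize that the target attribute takes its most frequent value."""
    matching = [r for r in rows if r[cond_attr] == cond_val]
    if not matching:
        return None
    counts = {}
    for r in matching:
        counts[r[target_attr]] = counts.get(r[target_attr], 0) + 1
    best, n = max(counts.items(), key=lambda kv: kv[1])
    statement = f"IF {cond_attr}={cond_val} THEN {target_attr}={best}"
    return Hypothesis(statement, n / len(matching))


def run_script(rows, plan, alert: Callable[[Hypothesis], bool]):
    """Run each step of the plan and report only the findings
    that satisfy the alert criterion."""
    findings = []
    for step in plan:
        hypothesis = step(rows)
        if hypothesis is not None and alert(hypothesis):
            findings.append(hypothesis)
    return findings


# Made-up toy records, for illustration only.
rows = [
    {"smoker": "yes", "condition": "cough"},
    {"smoker": "yes", "condition": "cough"},
    {"smoker": "yes", "condition": "none"},
    {"smoker": "no", "condition": "none"},
]

# The plan of operations: two applications of the toy operator.
plan = [
    lambda rs: most_common_value_rule(rs, "smoker", "yes", "condition"),
    lambda rs: most_common_value_rule(rs, "smoker", "no", "condition"),
]

# Alert criterion: report only hypotheses with certainty of at least 0.9.
findings = run_script(rows, plan, alert=lambda h: h.certainty >= 0.9)
for h in findings:
    print(f"{h.statement}  (certainty {h.certainty:.2f})")
```

In this sketch the first hypothesis is supported by only two of three matching records, so it falls below the alert threshold and is not reported. A background-running script would repeat `run_script` as the data changes.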

An inductive database can be used to build knowledge scouts, which are specialized agents operating on a system of databases (e.g., one or more distributed temporal databases, web, etc.). Their function is to synthesize and manage knowledge that is tailored to a specific user or a defined group of users. During the course of its existence, a knowledge scout builds a model of interests and experiences of the user, and employs that model in synthesizing the target knowledge (e.g., builds a data summary on a specific topic, generates a personal travel plan, etc.). Our initial efforts toward the development of the concept of an inductive database have resulted in a preliminary system, which integrates a database with a simple knowledge base and several machine learning and inference programs. The system includes a preliminary knowledge generation language (KGL-1) for creating simple scripts for building knowledge scouts.

A simple knowledge scout based on these principles was experimentally implemented for determining multidimensional patterns in a medical database; a paper describing this work is available for download.

Current research has focused on methodologies for building an inductive database system that includes domain-specific knowledge systems, access to a relational DBMS through SQL, an addressable knowledge base stored in relational tables, a knowledge query language (KQL) for creating and applying knowledge scouts, and a functionally oriented graphical interface for ease of use. One distinguishing feature of this approach is the storage of knowledge in relational tables along with the data. The hierarchical storage scheme allows for easy access, querying, and manipulation of the knowledge, either manually or through KQL.
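As an illustration of storing knowledge in relational tables alongside the data, the following sketch uses Python's built-in `sqlite3` module as a stand-in for a relational DBMS. The schema is an assumption made for this example (a ruleset contains rules, and a rule contains conditions); it is not the actual VINLEN storage scheme, and the table names, column names, and rule contents are all hypothetical. Plain SQL is used here in place of KQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: a data table plus a three-level knowledge hierarchy
# (ruleset -> rule -> condition) kept in the same database.
conn.executescript("""
CREATE TABLE patients (
    id INTEGER PRIMARY KEY, age INTEGER, smoker TEXT
);
CREATE TABLE rulesets (
    ruleset_id INTEGER PRIMARY KEY, name TEXT
);
CREATE TABLE rules (
    rule_id INTEGER PRIMARY KEY,
    ruleset_id INTEGER REFERENCES rulesets(ruleset_id),
    conclusion TEXT,
    certainty REAL
);
CREATE TABLE conditions (
    condition_id INTEGER PRIMARY KEY,
    rule_id INTEGER REFERENCES rules(rule_id),
    attribute TEXT, operator TEXT, value TEXT
);
""")

# Made-up knowledge, for illustration only.
conn.execute("INSERT INTO rulesets VALUES (1, 'risk_profile')")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [(1, 1, "risk=high", 0.85), (2, 1, "risk=low", 0.60)],
)
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "smoker", "=", "yes"),
        (2, 1, "age", ">", "50"),
        (3, 2, "smoker", "=", "no"),
    ],
)


def rules_above(connection, threshold):
    """Return the conditions of every stored rule whose certainty
    is at least `threshold`, one row per condition."""
    cursor = connection.execute(
        """
        SELECT r.rule_id, c.attribute, c.operator, c.value,
               r.conclusion, r.certainty
        FROM rules AS r
        JOIN conditions AS c ON c.rule_id = r.rule_id
        WHERE r.certainty >= ?
        ORDER BY r.rule_id, c.condition_id
        """,
        (threshold,),
    )
    return cursor.fetchall()


confident = rules_above(conn, 0.8)
for row in confident:
    print(row)
```

Because the rules live in ordinary tables, they can be filtered, joined with the data, or updated using the same SQL facilities as any other table.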

In conjunction with this research, we are developing a system, VINLEN, that makes these capabilities available through an understandable graphical interface representing the knowledge system and its available operators. Among the areas to which VINLEN is to be applied are computer intrusion and misuse detection through user profiling, and the discovery of climatological patterns and relationships.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

References

Wojtusiak, J., Michalski, R. S., Kaufman, K. and Pietrzykowski, J., “The AQ21 Natural Induction Program for Pattern Discovery: Initial Version and its Novel Features,” Proceedings of The 18th IEEE International Conference on Tools with Artificial Intelligence, Washington D.C., November 13-15, 2006.

Kaufman, K., Michalski, R. S., Pietrzykowski, J. and Wojtusiak, J., “An Integrated Multi-task Inductive Database and Decision Support System VINLEN: An Initial Implementation and First Results,” Presented at the 5th International Workshop on Knowledge Discovery in Inductive Databases, KDID’06, in conjunction with ECML/PKDD, Berlin, Germany, September 18, 2006.

Michalski, R. S. and Wojtusiak, J., “Semantic and Syntactic Attribute Types in AQ Learning,” Reports of the Machine Learning and Inference Laboratory, MLI 06-6, George Mason University, Fairfax, VA, 2006.

Kaufman, K., Michalski, R. S., Pietrzykowski, J. and Wojtusiak, J., “The VINLEN Multi-task Inductive Database and Decision Support System: Current Status,” Reports of the Machine Learning and Inference Laboratory, MLI 06-4, George Mason University, Fairfax, VA, 2006.

Seeman, W.D. and Michalski, R. S., “The CLUSTER3 System for Goal-oriented Conceptual Clustering: Method and Preliminary Results,” Proceedings of The Data Mining and Information Engineering 2006 Conference, Prague, Czech Republic, July 11-13, 2006.

Michalski, R.S., Kaufman, K., Pietrzykowski, J., Sniezynski, B. and Wojtusiak, J., “Learning Symbolic User Models for Intrusion Detection: A Method and Initial Results,” Proceedings of the Intelligent Information Processing and Web Mining Conference, IIPWM 06, Ustron, Poland, June 19-22, 2006.

Wojtusiak, J. and Michalski, R.S., “The Use of Compound Attributes in AQ Learning,” Proceedings of the Intelligent Information Processing and Web Mining Conference, IIPWM 06, Ustron, Poland, June 19-22, 2006.

Michalski, R.S., Kaufman, K., Pietrzykowski, J., Wojtusiak, J., Mitchell, S. and Seeman, W.D., “Natural Induction and Conceptual Clustering: A Review of Applications,” Reports of the Machine Learning and Inference Laboratory, MLI 06-3, George Mason University, Fairfax, VA, June, 2006.

Wojtusiak, J., Michalski, R.S., Kaufman, K. and Pietrzykowski, J., “Multitype Pattern Discovery Via AQ21: A Brief Description of the Method and Its Novel Features,” Reports of the Machine Learning and Inference Laboratory, MLI 06-2, George Mason University, Fairfax, VA, 2006.

Michalski, R.S. and Wojtusiak, J., “Reasoning with Missing, Not-applicable and Irrelevant Meta-values in Concept Learning and Pattern Discovery,” Technical Report 2005-02, Collaborative Research Center 637, University of Bremen, Germany, July 2005.

Szydlo, T., Sniezynski, B. and Michalski, R.S., “A Rules-to-Trees Conversion in the Inductive Database System VINLEN,” Proceedings of the Intelligent Information Processing and Web Mining Conference, Gdansk, Poland, June 13-16, 2005.

Sniezynski, B., Szymacha, R. and Michalski, R.S., “Knowledge Visualization Using Optimized General Logic Diagrams,” Proceedings of the Intelligent Information Processing and Web Mining Conference, Gdansk, Poland, June 13-16, 2005.

Michalski, R.S. and Wojtusiak, J., “Reasoning with Meta-values in AQ Learning,” Reports of the Machine Learning and Inference Laboratory, MLI 05-1, George Mason University, Fairfax, VA, June, 2005.

Kaufman, K. and Michalski, R.S., “Initial Considerations toward Knowledge Mining,” Reports of the Machine Learning and Inference Laboratory, MLI 04-4, George Mason University, Fairfax, VA, October, 2004.

Michalski, R.S., “Attributional Calculus: A Logic and Representation Language for Natural Induction,” Reports of the Machine Learning and Inference Laboratory, MLI 04-2, George Mason University, Fairfax, VA, April, 2004.

Michalski, R.S. and Kaufman, K., “Report on the Project Inductive Databases and Knowledge Scouts, presented at the IDM 2003 Workshop,” Seattle, WA, September 14-16, 2003.

Kaufman, K. and Michalski, R.S., “The Development of the Inductive Database System VINLEN: A Review of Current Research,” International Intelligent Information Processing and Web Mining Conference, Zakopane, Poland, 2003.

Michalski, R.S. and Kaufman, K., “Report on the Project Inductive Databases and Knowledge Scouts, presented at the IDM 2002 Workshop,” Arlington, VA, May 5-7, 2002.

Michalski, R.S. and Kaufman, K., “Learning Patterns in Noisy Data: The AQ Approach,” in Paliouras, G., Karkaletsis, V. and Spyropoulos, C. (eds.), Machine Learning and Applications, Springer-Verlag, pp. 22-38, 2001.

Michalski, R.S. and Kaufman, K., “Report on the Project Inductive Databases and Knowledge Scouts, presented at the IDM 2001 Workshop,” Fort Worth, TX, April 29 – May 1, 2001.

Michalski, R.S. and Kaufman, K.A., “The AQ19 System for Machine Learning and Pattern Discovery: A General Description and User’s Guide,” Reports of the Machine Learning and Inference Laboratory, MLI 01-2, George Mason University, Fairfax, VA, 2001.

Kaufman, K.A. and Michalski, R.S., “A Knowledge Scout for Discovering Medical Patterns: Methodology and System SCAMP,” Proceedings of the Fourth International Conference on Flexible Query Answering Systems, FQAS’2000, Warsaw, Poland, pp. 485-496, October 25-28, 2000.

Michalski, R.S. and Kaufman, K., “Building Knowledge Scouts Using KGL Metalanguage,” Fundamenta Informaticae 40, pp. 433-447, 2000.

Michalski, R.S. and Kaufman, K.A., “A Measure of Description Quality for Data Mining and its Implementation in the AQ18 Learning System,” Proceedings of the ICSC Congress on Computational Intelligence Methods and Applications (CIMA-99), Rochester, NY, pp. 369-375, June, 1999.

Michalski, R.S. and Kaufman, K.A., “Discovering Multidimensional Patterns in Large Datasets Using Knowledge Scouts,” Reports of the Machine Learning and Inference Laboratory, MLI 99-7, George Mason University, Fairfax, VA, June, 1999.

Kaufman, K.A. and Michalski, R.S., “Learning from Inconsistent and Noisy Data: The AQ18 Approach,” Proceedings of the Eleventh International Symposium on Methodologies for Intelligent Systems, Warsaw, pp. 411-419, June 8-11, 1999.

Kaufman, K.A. and Michalski, R.S., “Multistrategy Data Mining via the KGL Metalanguage,” Proceedings of the Seventh Symposium on Intelligent Information Systems (IIS’98), Malbork, Poland, pp. 39-48, June 15-19, 1998.

Kaufman, K.A. and Michalski, R.S., “Discovery Planning: Multistrategy Learning in Data Mining,” Proceedings of the Fourth International Workshop on Multistrategy Learning (MSL’98), Desenzano del Garda, Italy, June 11-13, 1998.

Michalski, R.S. and Kaufman, K.A., “Data Mining and Knowledge Discovery: A Review of Issues and a Multistrategy Approach,” in Michalski, R.S., Bratko, I. and Kubat, M. (Eds.), Machine Learning and Data Mining: Methods and Applications, London: John Wiley & Sons, pp. 71-112, 1998.

Michalski, R.S., “Seeking Knowledge in the Deluge of Facts,” Fundamenta Informaticae, Vol. 30, pp. 283-297, 1997.

Kaufman, K.A. and Michalski, R.S., “KGL: A Language for Learning,” Reports of the Machine Learning and Inference Laboratory, MLI 97-3, George Mason University, Fairfax, VA, 1997.

Kaufman, K. and Michalski, R.S., “A Method for Reasoning with Structured and Continuous Attributes in the INLEN-2 Knowledge Discovery System,” Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, August, 1996, pp. 232-237.

Michalski, R.S., Kerschberg, L., Kaufman, K.A. and Ribeiro, J.S., “Mining For Knowledge in Databases: The INLEN Architecture, Initial Implementation and First Results,” Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, Vol. 1, No. 1, pp. 85-113, August 1992.

Michalski, R.S., “Searching for Knowledge in a World Flooded with Facts,” an invited talk, Proceedings of the Fifth International Symposium on Applied Stochastic Models and Data Analysis, Granada, Spain, April 23-26, 1991.

For more references, see the publications section.