Approved by the CSI Executive Council on April 28, 2000

An Interdisciplinary Ph.D. Concentration Area
Computational Intelligence and Knowledge Mining
in the School of Computational Sciences

 

I Motivation and Objectives

Due to the phenomenal growth of computer-based information systems and their global interconnectivity, all areas of science and technology are experiencing a severe information overload. The accelerating expansion of the Internet and related technologies will undoubtedly make this problem more severe in the future. In this context, the development of an educational and research program on methods and tools that assist scientists and other professionals in effectively extracting problem-oriented knowledge from diverse and massive information sources, and in using this knowledge in problem solving, emerges as one of the most fundamental research directions for computational sciences.

To illustrate the significance of this problem, note, for example, that the amount of data to be generated throughout this decade by NASA and Earth observing platforms, such as Terra, is expected to be several terabytes per day. This amount of data will occupy several petabytes in governmental data repositories, and its analysis will constitute an immense challenge for current data mining and knowledge extraction technologies. In the area of physics, CERN's particle collider alone generates a petabyte of data each year. Similar challenges are emerging in many other areas, such as earth sciences, biochemistry, bioinformatics, economics, business, finance, medicine, astronomy, agriculture, and defense. Addressing these challenges in their full scope will require a synergistic application of ideas and methods from multiple disciplines, including data mining, machine learning and inference, data science and statistics, computational intelligence (computational aspects of artificial intelligence), databases and information systems, visualization, natural language processing, image understanding, and others.

To educate students and train experts in modern methods in this interdisciplinary area, the School of Computational Sciences has developed a new Ph.D. concentration entitled "Computational Intelligence and Knowledge Mining," along with corresponding research activities and projects. The name of the track was chosen to reflect the interdisciplinary and novel character of this program. Its aim is to integrate different methods and technologies, such as those based on analytic algorithms, computational statistics, and data visualization, with those based on machine learning, symbolic reasoning, and other relevant areas of computational intelligence. The term "knowledge mining," which is closely related to the currently popular term "data mining and knowledge discovery," is used to emphasize future-oriented methodologies that generate knowledge through inference from data and knowledge, rather than from data only, and present the generated knowledge in user-oriented forms. That is, up to the limits of the inherent complexity, the knowledge representations are designed to be in the forms in which experts might create and use them, thereby facilitating human understanding and cognitive interpretation.

There is a rapidly growing interest in the topics covered by the proposed program and a great demand for experts from industry as well as from academic institutions. Universities around the world are currently developing courses, degrees, and research centers in this area. For example, Carnegie Mellon University has recently established a new Center for Automated Learning and Discovery, and the University of Wisconsin has established a Data Mining Institute. Three major conferences are now being held annually on these topics, one in the US, one in Europe, and one in Asia. It is believed that this track will be attractive to many students, will help to strengthen research activities and interdisciplinary collaborations among GMU faculty, and will facilitate links to industry, governmental laboratories, and other academic institutions.

II The Program of Study

The SCS Ph.D. program requires a student to complete 12 hours of core courses, 12 hours in one of the science areas, 12 hours in electives from science courses, 9 hours from general electives, 3 hours of colloquia or seminars, and 24 hours of dissertation research. A Ph.D. student doing thesis research in the area of computational intelligence and knowledge mining may concentrate primarily on the development and testing of new methodologies and tools in this area, or on the application of such methodologies and tools to one of the scientific areas of interest to the School of Computational Sciences.

Core courses (12 credit hours):

    CSI 700: Numerical Methods

      CSI 701: Foundations of Computational Sciences

      CSI 703: Scientific and Statistical Visualization

      CSI 710: Scientific Databases

Scientific core (15 credit hours):

     CSI 709: Selection: Data Mining and Knowledge Discovery

     CSI 771/STAT 571: Computational Statistics

      CSI 773/STAT 663: Statistical Graphics and Data Exploration

      CSI 777: Principles of Knowledge Mining

      CSI 873: Computational Learning and Discovery

Subject electives:

    CSI Bioinformatics courses: Genomics, Protein-folding

      CSI 753: Observations of the Earth and its Climate

      CSI 754: Earth Observation-remote sensing Data and Data Systems

      CSI 763: Statistical Methods in Space Sciences

      INFT 842: Alternative Systems of Probabilistic Reasoning

      CSI 854: Computing and Communications Systems for Earth Observing

      CSI 979/INFT 910: Advanced Topics in Artificial Intelligence (new)

      INFT 867: Intelligent Databases

      INFT 944: The Process of Discovery and its Enhancement in Engineering

Supporting courses:

      CS 580: Introduction to Artificial Intelligence

      INFS 614: Database Management

      SCS 601-607: Computational Tools

      SCS 610: Introduction to Computational Sciences

      SCS 672: Foundations of Computational Intelligence (new)

      INFT 819: Computational Models of Probabilistic Inference

This program may be modified in special cases to reflect the background and the professional or educational experience of a particular student.

III Research Goals and Topics in this Area

Research problems in the proposed track concern the development of theories, methods, and systems that support scientists and other computer users in analyzing large volumes of data in order to derive new, useful knowledge from them, and in judiciously employing the obtained knowledge in solving problems of interest. The data may be available in various forms, such as numerical and symbolic databases, web sites, image databases, document information systems, human testimony, or multimedia.

Such a process involves not only the data being analyzed but also domain and commonsense knowledge related to the topic of interest. The results of such knowledge mining should be presented in user-oriented forms that are easy to understand and interpret. Therefore, relevant research topics include not only those dealing directly with knowledge extraction from data, but also those concerned with developing and maintaining large databases and knowledge bases, reasoning about data and knowledge, developing methodologies for machine learning and inference, and representing knowledge in forms that are simple and easy to interpret and understand (visualization and generation of natural language descriptions).

Efforts in this direction will draw upon ideas and methods from a variety of disciplines, such as data mining and knowledge discovery, machine learning and inference, statistical data analysis, databases and information systems, data and knowledge visualization, evolutionary computation, computer inference, image processing, and related areas of computational intelligence. The information science-related disciplines need to be interfaced with scientific knowledge in various sciences, such as biochemistry, physics, biology, business, economics, finance, and others. A Ph.D. student conducting thesis research in this area may concentrate primarily on the development of new methodologies and tools or on the application of such methodologies and tools to a selected scientific area.

The development of this area of research involves fostering collaboration among GMU research units already engaged in closely related activities, in particular the Center for Computational Statistics, the Center for Earth Observing and Space Research, and the Machine Learning and Inference Laboratory, as well as interested faculty members in IT&E, IB3, the Krasnow Institute, and other GMU units.

IV Relationship to Other Ph.D. Programs at GMU

This interdisciplinary area is significantly different from other SCS Ph.D. concentration areas, as well as other GMU Ph.D. programs. It strengthens and complements the other concentration areas in the SCS doctoral program.

V Related Programs at Other Universities

 

VI Administration of this Concentration Area

This track has been established within the School of Computational Sciences, and its general administrative management is provided by the Office of the School Dean. A technical coordinator for the track is PRC Professor R. S. Michalski (michalski@mli.gmu.edu; http://www.mli.gmu.edu/michalski).