I Motivation and Objectives
Due to the phenomenal growth of computer-based information systems and their global interconnectivity, all areas of science and technology are experiencing a severe information overload. The accelerating expansion of the Internet and related technologies will undoubtedly make this problem more severe in the future. In this context, the development of an educational and research program on methods and tools that help scientists and other professionals effectively extract problem-oriented knowledge from diverse and massive information sources, and use this knowledge in problem solving, emerges as one of the most fundamental research directions for computational sciences.
To illustrate the significance of this problem, note, for example, that the amount of data to be generated throughout this decade by NASA and Earth observing platforms, such as Terra, is expected to be several terabytes per day. This amount of data will occupy several petabytes in governmental data repositories, and its analysis will constitute an immense challenge for current data mining and knowledge extraction technologies. In the area of physics, CERN’s particle collider alone generates a petabyte of data each year. Similar challenges are emerging in many other areas, such as earth sciences, biochemistry, bioinformatics, economics, business, finance, medicine, astronomy, agriculture, and defense. Addressing these challenges in their full scope will require a synergistic application of ideas and methods from multiple disciplines, including data mining, machine learning and inference, data science and statistics, computational intelligence (computational aspects of artificial intelligence), databases and information systems, visualization, natural language processing, image understanding, and others.
To educate students and train experts in modern methods in this interdisciplinary area, the School of Computational Sciences has developed a new Ph.D. concentration entitled "Computational Intelligence and Knowledge Mining," along with corresponding research activities and projects. The name of the track was chosen to reflect the interdisciplinary and novel character of this program. Its aim is to integrate different methods and technologies, such as those based on analytic algorithms, computational statistics, and data visualization, with those based on machine learning, symbolic reasoning, and other relevant areas of computational intelligence. The term "knowledge mining," which is closely related to the currently popular term "data mining and knowledge discovery," is used to emphasize future-oriented methodologies that generate knowledge through inference from data and knowledge, rather than from data only, and present the generated knowledge in user-oriented forms. That is, up to the limits of the inherent complexity, the knowledge representations are designed to be in the forms in which experts might create and use them, thereby facilitating human understanding and cognitive interpretation.
There is a rapidly growing interest in the topics covered by the proposed program and a great demand for experts from industry as well as from academic institutions. Universities around the world are currently developing courses, degrees, and research centers concerning this area. For example, Carnegie Mellon University has recently established a new Center for Automated Learning and Discovery, and the University of Wisconsin has established a Data Mining Institute. Three major conferences are now being held annually on these topics: one in the US, one in Europe, and one in Asia. It is believed that this track will be attractive to many students, will help to strengthen research activities and interdisciplinary collaborations among GMU faculty, and will facilitate links to industry, governmental laboratories, and other academic institutions.
II The Program of Study
The SCS Ph.D. program requires a student to complete 12 hours of core courses, 12 hours in one of the science areas, 12 hours in electives from science courses, 9 hours from general electives, 3 hours of colloquia or seminars, and 24 hours of dissertation research. A Ph.D. student doing thesis research in the area of computational intelligence and knowledge mining may concentrate primarily on the development and testing of new methodologies and tools in this area, or on the application of such methodologies and tools to one of the scientific areas of interest to the School of Computational Sciences.
CSI 763: Statistical Methods in Space Sciences
A modification of this program may be made in special cases to reflect the background and the professional or educational experience of a particular student.
III Research Goals and Topics in this Area
Research problems in the proposed track concern the development of theories, methods, and systems that support scientists and other computer users in analyzing large volumes of data to derive new, useful knowledge, and in judiciously employing the obtained knowledge to solve problems of interest. The data may be available in various forms, such as numerical and symbolic databases, web sites, image databases, document information systems, human testimony, or multimedia.
Such a process involves not only the data being analyzed but also domain and commonsense knowledge related to the topic of interest. The results of such knowledge mining should be presented in user-oriented forms that are easy to understand and interpret. Therefore, relevant research topics include not only those dealing directly with knowledge extraction from data, but also those concerned with developing and maintaining large databases and knowledge bases, reasoning about data and knowledge, developing methodologies for machine learning and inference, and representing knowledge in forms that are simple and easy to interpret and understand (visualization and generation of natural language descriptions).
Efforts in this direction will draw upon ideas and methods from a variety of disciplines, such as data mining and knowledge discovery, machine learning and inference, statistical data analysis, databases and information systems, data and knowledge visualization, evolutionary computation, computer inference, image processing, and related areas of computational intelligence. These information science-related disciplines need to be interfaced with domain knowledge from various fields, such as biochemistry, physics, biology, business, economics, finance, and others. A Ph.D. student conducting thesis research in this area may concentrate primarily on the development of new methodologies and tools or on the application of such methodologies and tools to a selected scientific area.
The development of this area of research involves fostering collaboration among GMU research units already engaged in closely related activities, in particular the Center for Computational Statistics, the Center for Earth Observing and Space Research, and the Machine Learning and Inference Laboratory, as well as interested faculty members in IT&E, IB3, the Krasnow Institute, and other GMU units.
IV Relationship to Other Ph.D. Programs at GMU
This interdisciplinary area is significantly different from other SCS Ph.D. concentration areas, as well as other GMU Ph.D. programs. It strengthens and complements the other concentration areas in the SCS doctoral program.
VI Administration of this Concentration Area
This track has been established within the School of Computational Sciences and its general administrative management is provided by the Office of the School Dean. A technical coordinator for the track is PRC Professor R. S. Michalski (michalski@mli.gmu.edu; http://www.mli.gmu.edu/michalski).