
The Intelligent Data Analysis and Graphical Models Research Unit investigates and develops methods for intelligent data analysis and probabilistic and fuzzy reasoning. It focuses its research on probabilistic methods (for example, probabilistic graphical models and Bayesian and expectation maximization clustering), possibilistic and fuzzy methods (for example, possibilistic graphical models and fuzzy clustering), and frequent pattern mining methods (for example, frequent item set and sequence mining as well as frequent subgraph mining).
Strong emphasis is placed on open source implementations of the developed methods, with the objective of integrating them under a graphical user interface. This user interface also provides preprocessing and visualization modules, so that the methods are easy to use in industrial applications and easy to extend with user-programmed functionality.
The computer-aided analysis of molecular databases plays an increasingly important role in drug discovery and in compound synthesis prediction. One of its most prominent goals is to find discriminative substructures of molecules, that is, substructures that are frequent in the set of (already known) active molecules but rare in the set of (already known) inactive molecules, and which therefore discriminate between the two classes. The rationale underlying this approach is that the discriminative fragments may be the key structures that determine whether a molecule is active or not (or whether it can be synthesized or not).
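The selection criterion described above can be sketched in a few lines. As a simplifying assumption, each molecule is reduced to the set of fragment identifiers it contains (in practice these would have to be produced by a substructure search); the fragment names, the two thresholds, and the function `discriminative_fragments` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def support(fragment: str, molecules: List[FrozenSet[str]]) -> float:
    """Fraction of molecules that contain the given fragment."""
    if not molecules:
        return 0.0
    return sum(fragment in mol for mol in molecules) / len(molecules)


def discriminative_fragments(
    actives: List[FrozenSet[str]],
    inactives: List[FrozenSet[str]],
    min_active: float = 0.5,    # hypothetical: "frequent" = in >= 50 % of actives
    max_inactive: float = 0.1,  # hypothetical: "rare" = in <= 10 % of inactives
) -> Dict[str, tuple]:
    """Return fragments that are frequent among actives but rare among inactives."""
    candidates: Set[str] = set().union(*actives) if actives else set()
    result = {}
    for frag in sorted(candidates):
        s_act = support(frag, actives)
        s_inact = support(frag, inactives)
        if s_act >= min_active and s_inact <= max_inactive:
            result[frag] = (s_act, s_inact)
    return result


# Hypothetical toy data: each molecule is the set of fragments it contains.
actives = [
    frozenset({"benzene", "amide", "hydroxyl"}),
    frozenset({"benzene", "amide"}),
    frozenset({"amide", "nitro"}),
]
inactives = [
    frozenset({"benzene", "hydroxyl"}),
    frozenset({"benzene", "nitro"}),
    frozenset({"hydroxyl"}),
]

print(discriminative_fragments(actives, inactives))  # → {'amide': (1.0, 0.0)}
```

Here "benzene" is frequent among the actives but equally frequent among the inactives, so it is not reported; only "amide" separates the two classes.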
The research unit works on methods for finding frequent (approximate) substructures in molecules, in order to help biochemists identify promising drug candidates and effective substructures. Earlier work in this direction led to the MoSS/MoFa algorithm and its various extensions, which have been implemented in Java and applied successfully to several freely available molecular data sets.
Future research is planned to extend the handling of wildcard atoms and thus to allow for more flexible approximate matching. In addition, properties of molecules other than their connection structure (for example, the 3D structure and bond angles, charge distribution, solubility, etc.) have to be taken into account in order to make the output more useful to (bio)chemists.
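To illustrate what wildcard atoms mean for matching, the following sketch tests whether a small fragment occurs in a molecule graph using plain backtracking (a subgraph monomorphism test, not the search strategy of MoSS). Atoms are labeled by element and bonds by type; the label syntax is a hypothetical assumption: a pattern atom labeled `"*"` matches any element, and a label such as `"O|N"` matches any of the listed elements. All function names are hypothetical.

```python
from typing import Dict, Optional


def make_graph(atoms, bonds):
    """Build a graph from {atom id: element} and a list of (a, b, bond type) triples."""
    adj = {a: {} for a in atoms}
    for a, b, bond in bonds:
        adj[a][b] = bond
        adj[b][a] = bond
    return {"atoms": dict(atoms), "adj": adj}


def atom_matches(pattern_label: str, element: str) -> bool:
    """Hypothetical wildcard syntax: '*' matches anything, 'O|N' matches O or N."""
    if pattern_label == "*":
        return True
    return element in pattern_label.split("|")


def find_embedding(pattern, molecule) -> Optional[Dict[int, int]]:
    """Return one mapping pattern atom -> molecule atom, or None if there is no match."""
    p_atoms = list(pattern["atoms"])
    mapping: Dict[int, int] = {}
    used = set()

    def extend(i: int) -> bool:
        if i == len(p_atoms):
            return True
        p = p_atoms[i]
        for m, element in molecule["atoms"].items():
            if m in used or not atom_matches(pattern["atoms"][p], element):
                continue
            # Every pattern bond to an already mapped atom must exist with the same type.
            ok = all(
                molecule["adj"][m].get(mapping[q]) == bond
                for q, bond in pattern["adj"][p].items()
                if q in mapping
            )
            if ok:
                mapping[p] = m
                used.add(m)
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack
                used.discard(m)
        return False

    return dict(mapping) if extend(0) else None


# Heavy atoms of ethanol: C-C-O (hydrogens omitted).
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "single")])
# Fragment "C bonded to O or N" using the hypothetical wildcard syntax.
c_o_or_n = make_graph({0: "C", 1: "O|N"}, [(0, 1, "single")])

print(find_embedding(c_o_or_n, ethanol))  # → {0: 1, 1: 2}
```

With a strict label `"N"` instead of `"O|N"` the same fragment would not be found in ethanol; the wildcard class lets one pattern cover both variants.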
Frequent item set mining is an active area of research in which a large number of algorithms have been developed. The research unit focuses on finding approximate frequent item sets and sequences in noisy and unreliable data. Algorithms for this task have applications in analyzing alarm sequences in telecommunication networks.
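For reference, the exact (non-approximate) problem can be sketched with a minimal level-wise search in the style of Apriori: an item set is frequent if it is contained in at least `min_support` transactions. This is a textbook illustration, not one of the research unit's algorithms, and the alarm codes in the example are made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_item_sets(transactions: List[FrozenSet[str]],
                       min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise (Apriori-style) search for all item sets with support >= min_support."""
    def count(candidate: FrozenSet[str]) -> int:
        return sum(candidate <= t for t in transactions)

    items = sorted(set().union(*transactions)) if transactions else []
    level = [frozenset([i]) for i in items]
    result: Dict[FrozenSet[str], int] = {}
    k = 1
    while level:
        frequent = {}
        for cand in level:
            s = count(cand)
            if s >= min_support:
                frequent[cand] = s
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-sets into k-sets and keep only candidates whose
        # (k-1)-subsets are all frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return result


# Hypothetical alarm transactions (one set of alarm codes per time window).
alarms = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"A", "B", "C", "D"}),
]
result = frequent_item_sets(alarms, min_support=3)
# {A}, {B}, {C} have support 4 and {A,B}, {A,C}, {B,C} support 3;
# {A,B,C} occurs in only 2 transactions and is therefore not frequent.
```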
The core idea of approximate frequent item set mining is to allow certain edit operations on the transactions of the database being mined (for example, insertion, replacement, or reordering of items). In this way, frequent patterns can be found that would otherwise be lost due to noise or lossy transmission of the transaction data. Earlier work along these lines has already yielded the relx algorithm, which allows for insertions at user-specified costs.
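The following sketch illustrates one possible cost model for approximate support with insertions. It is an assumption made for illustration and not a description of how relx works: each item has a hypothetical user-specified insertion cost, every item missing from a transaction is "inserted" and its cost subtracted from the transaction's weight of 1, and transactions whose weight falls below `min_weight` contribute nothing.

```python
from typing import Dict, FrozenSet, List


def approximate_support(item_set: FrozenSet[str],
                        transactions: List[FrozenSet[str]],
                        insert_cost: Dict[str, float],
                        min_weight: float = 0.5) -> float:
    """Weighted support of item_set when missing items may be inserted at a cost."""
    total = 0.0
    for t in transactions:
        missing = item_set - t
        # Each inserted item lowers the transaction's contribution by its cost.
        weight = 1.0 - sum(insert_cost[i] for i in missing)
        if weight >= min_weight:
            total += weight
    return total


# Same hypothetical alarm data as a frequent item set example would use.
alarms = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"A", "B", "C", "D"}),
]
costs = {"A": 0.5, "B": 0.25, "C": 0.25, "D": 0.25}  # hypothetical insertion costs

print(approximate_support(frozenset("ABC"), alarms, costs))  # → 4.0
```

The exact support of {A, B, C} is only 2, but three further transactions lack just one of the items and contribute a reduced weight. Setting every cost to `float("inf")` disables insertions and recovers the exact support.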
Fuzzy clustering and expectation maximization often outperform classical crisp clustering algorithms. In particular, the more sophisticated variants, which include shape and size parameters, can find cluster structures that are difficult to capture with classical methods, which are restricted to spherical clusters of equal size. However, this comes at the price of longer execution times and lower robustness.
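As an illustration of the difference from crisp clustering, here is a minimal sketch of standard fuzzy c-means (the basic variant without shape and size parameters) for one-dimensional data: every point receives a membership degree in every cluster instead of a single hard assignment. The fuzzifier `m = 2`, the initial centers, and the fixed iteration count are illustrative choices, not recommendations.

```python
from typing import List, Tuple


def fuzzy_c_means(xs: List[float], centers: List[float], m: float = 2.0,
                  iterations: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Standard fuzzy c-means for 1-D data.

    Returns the final centers and the memberships u[i][j] (cluster i, point j)
    computed in the last iteration.
    """
    c = list(centers)
    u: List[List[float]] = []
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        u = [[0.0] * len(xs) for _ in c]
        for j, x in enumerate(xs):
            d = [abs(x - ci) for ci in c]
            if 0.0 in d:  # point lies exactly on a center: full membership there
                u[d.index(0.0)][j] = 1.0
                continue
            for i in range(len(c)):
                u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(len(c)))
        # Center update: mean of the data weighted by u_ij^m.
        c = [
            sum(u[i][j] ** m * x for j, x in enumerate(xs))
            / sum(u[i][j] ** m for j in range(len(xs)))
            for i in range(len(c))
        ]
    return c, u


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, memberships = fuzzy_c_means(data, [0.0, 6.0])
# The centers end up close to 1.0 and 5.0; each point's memberships sum to 1.
```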
The research unit works on accelerating the clustering process; on improving the robustness of the more sophisticated algorithms (still allowing for cluster shapes and sizes, but constraining them with different regularization approaches); and on methods to determine the number of clusters (especially resampling-based methods, as they are currently the most promising approach).
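One way to constrain cluster shapes, sketched here as an assumption rather than as the unit's specific method, is to regularize each cluster's covariance matrix towards a sphere while keeping its size (determinant) fixed: the matrix is normalized to unit determinant, a multiple h² of the identity is added, and the result is rescaled to the original determinant. For 2×2 matrices this only needs closed-form determinants and eigenvalues; the parameter name `h` and the helper names are hypothetical.

```python
import math
from typing import Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def det2(a) -> float:
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def eigenvalues2(a) -> Tuple[float, float]:
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    half_trace = (a[0][0] + a[1][1]) / 2
    disc = math.sqrt(max((a[0][0] - a[1][1]) ** 2 / 4 + a[0][1] ** 2, 0.0))
    return half_trace + disc, half_trace - disc


def regularize_shape(cov: Matrix2, h: float) -> Matrix2:
    """Push a 2-D covariance matrix towards a sphere while preserving its determinant."""
    size = math.sqrt(det2(cov))  # |cov|^(1/2) for 2 dimensions
    s = [[cov[i][j] / size for j in range(2)] for i in range(2)]  # unit determinant
    s[0][0] += h * h
    s[1][1] += h * h
    norm = math.sqrt(det2(s))  # rescale back to unit determinant
    return tuple(tuple(size * s[i][j] / norm for j in range(2)) for i in range(2))


cov = ((4.0, 0.0), (0.0, 1.0))   # elongated cluster, eigenvalue ratio 4
reg = regularize_shape(cov, 1.0)
# det2(reg) is still 4.0, but the eigenvalue ratio drops from 4 to 2.
```

With `h = 0` the matrix is returned unchanged; larger values of `h` make the cluster rounder, which limits how degenerate (and thus how unstable) a cluster shape can become.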
Graphical models provide an excellent means of structuring and representing the knowledge needed for planning purposes, for example, planning the demand for a part in production when there are technical interactions between parts. In this area the research unit works together with ISC Gebhardt, which is responsible for the development and implementation of the Volkswagen demand planning system.
The research unit works on learning (parts of) graphical models from historical data (under constraints specified by technical and marketing rules), as well as on knowledge revision and on the identification and elimination of inconsistencies in graphical models.
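The following sketch illustrates, under strongly simplified assumptions, how a conditional probability table of a two-node network (engine → gearbox) could be estimated from historical orders while respecting a technical rule, how historical orders that contradict the rule can be reported as inconsistencies, and how the resulting table yields an expected part demand. The variables, the rule, the order counts, and the planned volumes are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def learn_cpt(history: List[Tuple[str, str]], gearboxes: List[str],
              forbidden: Set[Tuple[str, str]]
              ) -> Tuple[Dict[str, Dict[str, float]], List[Tuple[str, str]]]:
    """Estimate P(gearbox | engine) from counts, excluding rule-forbidden combinations."""
    counts = Counter(history)
    # Historical orders that contradict a technical rule are reported, not learned from.
    violations = sorted(combo for combo in counts if combo in forbidden)
    cpt: Dict[str, Dict[str, float]] = {}
    for e in sorted({e for e, _ in history}):
        allowed = {g: counts[(e, g)] for g in gearboxes if (e, g) not in forbidden}
        total = sum(allowed.values())
        if total > 0:  # engines without any rule-compatible order get no row
            cpt[e] = {g: allowed.get(g, 0) / total for g in gearboxes}
    return cpt, violations


def expected_demand(cpt: Dict[str, Dict[str, float]],
                    planned_volume: Dict[str, int], gearbox: str) -> float:
    """Expected number of planned vehicles that need the given gearbox."""
    return sum(volume * cpt[e][gearbox] for e, volume in planned_volume.items())


history = ([("diesel", "manual")] * 6 + [("diesel", "automatic")] * 2
           + [("electric", "automatic")] * 5 + [("electric", "manual")] * 1)
gearboxes = ["manual", "automatic"]
forbidden = {("electric", "manual")}  # hypothetical rule: no manual gearbox for electric
planned = {"diesel": 1000, "electric": 400}  # hypothetical production plan

cpt, violations = learn_cpt(history, gearboxes, forbidden)
print(violations)                                   # → [('electric', 'manual')]
print(expected_demand(cpt, planned, "automatic"))   # → 650.0
```

The one recorded order of an electric vehicle with a manual gearbox is flagged instead of being learned, so the rule and the learned table remain consistent.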
Graphical models have a long tradition in diagnosis, because they are one of the best-founded and most consistent approaches to handling uncertainty about system states and their dependences. However, methods and tools for the (semi-)automatic construction of a graphical-model-based diagnosis system from a technical description of the system are still missing.
The research unit focuses on applying graphical models to identify so-called "soft faults" (that is, deviations from nominal values) in analog electrical circuits and other technical devices. Earlier work in this direction produced some very preliminary ideas for the diagnosis of analog electrical circuits, but several challenging problems remain to be solved.
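To make the idea of soft-fault diagnosis concrete, the sketch below applies Bayes' rule by enumeration to a tiny hypothetical model: a voltage divider whose two resistors are each either nominal or drifted 20 % above nominal, observed through one noisy measurement of the output voltage. The component values, the drift size, the prior fault probability, and the Gaussian noise level are assumptions for illustration; they are not derived from any real circuit description.

```python
import math
from itertools import product
from typing import Dict, Tuple

V_IN = 10.0         # hypothetical supply voltage (V)
R_NOMINAL = 1000.0  # hypothetical nominal resistance of R1 and R2 (ohm)
P_FAULT = 0.05      # hypothetical prior probability that a resistor has drifted
SIGMA = 0.05        # hypothetical measurement noise (V)

# Each resistor is either nominal or 20 % above nominal (a "soft fault").
STATES = {"nominal": 1.0, "high": 1.2}


def predicted_vout(s1: str, s2: str) -> float:
    """Output voltage of the divider V_out = V_in * R2 / (R1 + R2)."""
    r1 = R_NOMINAL * STATES[s1]
    r2 = R_NOMINAL * STATES[s2]
    return V_IN * r2 / (r1 + r2)


def diagnose(measured: float) -> Dict[Tuple[str, str], float]:
    """Posterior over (R1 state, R2 state) given one noisy output-voltage reading."""
    joint = {}
    for s1, s2 in product(STATES, repeat=2):
        prior = ((P_FAULT if s1 == "high" else 1 - P_FAULT)
                 * (P_FAULT if s2 == "high" else 1 - P_FAULT))
        # Gaussian likelihood; its normalizing constant cancels in the posterior.
        likelihood = math.exp(-(measured - predicted_vout(s1, s2)) ** 2 / (2 * SIGMA ** 2))
        joint[(s1, s2)] = prior * likelihood
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}


posterior = diagnose(5.45)
# Almost all probability mass falls on ("nominal", "high"): R2 has drifted.
```

A reading of 5.0 V is explained equally well by "both nominal" and "both drifted", so only the prior separates those two hypotheses here, which illustrates why additional measurement points can be necessary.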
In order to make the developed methods easy to use for non-experts, the research unit strives to make at least all of its data analysis methods available under a graphical user interface that is based on data streams (a pipes-and-filters architecture). A prototype implementation already exists and will be improved and extended by the members of the research team.
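A pipes-and-filters architecture can be sketched with Python generators: each filter consumes a stream of records and yields a transformed stream, and filters are chained into a pipeline. The filter names and the record format below are hypothetical and are not taken from the unit's prototype.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, Optional[float]]
Filter = Callable[[Iterable[Record]], Iterator[Record]]


def drop_missing(records: Iterable[Record]) -> Iterator[Record]:
    """Preprocessing filter: skip records that contain a missing (None) value."""
    for r in records:
        if all(v is not None for v in r.values()):
            yield r


def scale(field: str, factor: float) -> Filter:
    """Return a filter that multiplies one field of every record by a constant."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, field: r[field] * factor}
    return run


def pipeline(source: Iterable[Record], *filters: Filter) -> Iterator[Record]:
    """Connect the source and the filters by lazily chaining their iterators (the pipes)."""
    stream: Iterable[Record] = source
    for f in filters:
        stream = f(stream)
    return iter(stream)


data = [{"x": 1.0, "y": 2.0}, {"x": None, "y": 3.0}, {"x": 4.0, "y": 5.0}]
result = list(pipeline(data, drop_missing, scale("x", 10.0)))
# → [{'x': 10.0, 'y': 2.0}, {'x': 40.0, 'y': 5.0}]
```

Because generators are lazy, records flow through the filters one at a time, and a user-programmed step is simply another function with the same signature that can be inserted anywhere in the chain.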
European Centre for Soft Computing (c) 2009
Edificio Científico-Tecnológico, 3rd floor.
Calle Gonzalo Gutiérrez Quirós S/N 33600 Mieres, Asturias.
Phone: +34 985456545