Basic Services

Import Service: This service transforms external data into information entities and their properties, which are then made accessible to other services through the common data access mentioned above.

Information Extraction Service: Information extraction makes implicit information stored in unstructured text machine-processable. Examples of such information include the names of people, places, or organizations. Information extraction results provide additional facets for search results and can be used to discover implicit relationships between information entities.

Feature Extraction Service: The task of this service is to generate meaningful features from annotations and metadata. It supports several common weighting and feature selection schemes. Features can be real-valued, nominal, or Boolean and are managed as a set of vectors.
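As an illustration of one possible feature selection scheme, the following minimal sketch keeps only terms that occur in a minimum number of documents and then builds Boolean feature vectors over them. The function names, the threshold min_df, and the use of plain token lists are assumptions made for this example; the text above does not state which schemes the service actually implements.

```python
from collections import Counter


def select_by_document_frequency(documents, min_df=2):
    """Keep only terms that occur in at least `min_df` documents (hypothetical scheme)."""
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term for term, df in doc_freq.items() if df >= min_df}


def to_boolean_vector(tokens, selected_terms):
    """Map a document to {term: True/False} over the selected terms."""
    present = set(tokens)
    return {term: term in present for term in sorted(selected_terms)}


documents = [
    "search index query".split(),
    "search index".split(),
    "cluster vector search".split(),
]
selected = select_by_document_frequency(documents, min_df=2)
vector = to_boolean_vector(documents[2], selected)
```

Here only "search" and "index" survive the selection, and the third document is represented by a Boolean vector over those two terms.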

Information Retrieval Services: Finding information is an important task in projects. The results of information retrieval queries are usually shown as lists ranked by relevance. Each result is accompanied by metadata such as a snippet. This metadata gives users the option of multiple views on the search result set. In our framework, an index service stores the content of information entities in a structure suited to searching.
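To make the role of the index service more concrete, here is a minimal sketch of an in-memory inverted index that returns results ranked by a simple term-count score, each with a snippet. The class InvertedIndex, its methods, and the scoring rule are hypothetical and chosen only for illustration; the framework's actual index structure and ranking function are not described here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory index: term -> {entity id -> term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.texts = {}

    def add(self, entity_id, text):
        """Store the text of one information entity and index its terms."""
        self.texts[entity_id] = text
        for term in tokenize(text):
            self.postings[term][entity_id] = self.postings[term].get(entity_id, 0) + 1

    def search(self, query, snippet_len=40):
        """Return (entity id, score, snippet) tuples, best match first."""
        scores = defaultdict(int)
        for term in tokenize(query):
            for entity_id, count in self.postings.get(term, {}).items():
                scores[entity_id] += count  # assumed score: summed term counts
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(eid, score, self.texts[eid][:snippet_len]) for eid, score in ranked]


index = InvertedIndex()
index.add("doc1", "Project plan for the search framework")
index.add("doc2", "Search results and search snippets")
index.add("doc3", "Meeting notes")
results = index.search("search snippets")
```

The query matches doc2 three times and doc1 once, so doc2 is ranked first; doc3 contains no query term and is not returned.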

Association Service: An association service is used to retrieve synonyms for a given term. These synonyms can be used during query specification to suggest keywords to users. For instance, with one search field for countries and one for cities, a user can search for a country and retrieve the associated cities.

Summarization Service: This service extracts metadata from a set of information entities by examining the coherence between the entities; a keyword extraction service is one example. First, a feature vector is generated for each document. These vectors are then weighted using a TF-IDF scheme, which favours terms that occur frequently in the examined document and down-weights terms that occur in many documents, since such terms carry little information.
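The weighting step can be sketched as follows. This example uses one common TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency); the exact variant used by the service is not specified, and the function names tf_idf and top_keywords are assumptions made for this illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # number of documents containing each term
    weighted = []
    for tokens in documents:
        counts = Counter(tokens)
        weighted.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def top_keywords(weights, k=2):
    """Pick the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "the index stores the content".split(),
    "the cluster groups the documents".split(),
    "the classifier labels the documents".split(),
]
weights = tf_idf(docs)
keywords = top_keywords(weights[0])
```

In this toy corpus the term "the" occurs in every document and therefore receives a weight of zero, while "documents", which occurs in two of three documents, is weighted lower than terms unique to a single document.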

Clustering Service: The clustering service identifies groups of related information entities, such as documents, which are represented by high-dimensional vectors. Relatedness between any pair of entities is expressed by the distance between the corresponding vectors. The entities are also projected into a low-dimensional space, and the resulting coordinates are used by visualization components designed for visual analysis and interactive exploration of large, high-dimensional data sets. As of 2008, the service has been successfully applied to large, high-dimensional corpora containing up to one million text documents.
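As a small illustration of how relatedness can be derived from vectors, the sketch below computes the cosine distance between sparse document vectors. Cosine distance is an assumption made for this example, since the text does not name the distance measure the service uses, and the function name and toy vectors are likewise hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors given as {feature: weight}."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen here: an empty vector is unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


vectors = {
    "doc1": {"search": 1.0, "index": 1.0},
    "doc2": {"search": 1.0, "index": 1.0, "query": 1.0},
    "doc3": {"cluster": 1.0, "vector": 1.0},
}
d12 = cosine_distance(vectors["doc1"], vectors["doc2"])
d13 = cosine_distance(vectors["doc1"], vectors["doc3"])
```

Documents doc1 and doc2 share two of their features and end up close together, whereas doc1 and doc3 share none and are at the maximum distance of 1 for non-negative weights. A clustering algorithm would group entities based on such pairwise distances.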

Classification Service: Classification is a supervised learning approach for assigning documents to a given set of concepts. For example, classification can be used to distinguish between spam and non-spam mail. First, the classifier is trained on a set of training documents annotated with the concepts to which they belong. Afterwards, new documents can be assigned to the concepts automatically. The classifier takes features from the feature extraction service as input and returns suggested concept associations together with a confidence value.
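The train-then-assign workflow can be sketched with a toy multinomial naive Bayes classifier. Naive Bayes is used here only as a familiar stand-in, since the text does not say which learning algorithm the service uses; the class and method names are hypothetical, plain token lists replace the feature vectors of the feature extraction service, and the confidence value is assumed to be the normalized posterior probability.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Toy multinomial naive Bayes over token lists (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()              # concept -> number of training documents
        self.term_counts = defaultdict(Counter)  # concept -> term -> count
        self.vocabulary = set()

    def train(self, labelled_documents):
        """Learn from (tokens, concept) pairs."""
        for tokens, concept in labelled_documents:
            self.doc_counts[concept] += 1
            self.term_counts[concept].update(tokens)
            self.vocabulary.update(tokens)

    def classify(self, tokens):
        """Return (concept, confidence); confidence is the normalized posterior."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for concept, n_docs in self.doc_counts.items():
            counts = self.term_counts[concept]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for term in tokens:
                score += math.log((counts[term] + 1) / denom)  # Laplace smoothing
            log_scores[concept] = score
        best = max(log_scores.values())
        exp_scores = {c: math.exp(s - best) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        concept = max(exp_scores, key=exp_scores.get)
        return concept, exp_scores[concept] / norm


classifier = NaiveBayesClassifier()
classifier.train([
    ("win money now".split(), "spam"),
    ("cheap money offer".split(), "spam"),
    ("project meeting agenda".split(), "non-spam"),
    ("meeting notes attached".split(), "non-spam"),
])
concept, confidence = classifier.classify("cheap money".split())
```

With this training set, the new document "cheap money" is assigned to the spam concept, and the returned confidence reflects how strongly the evidence favours that concept over the alternative.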