Research

Grants


Multimedia Digital Libraries

The overall aim of the project is to facilitate visual and text-based access to digital libraries through novel browsing, search and visualisation paradigms in addition to the traditional text search approach.

The project's interactive approaches will focus on general users, with a straightforward retrieval paradigm that requires the minimum of retrieval effort from the user. As the majority of text searches are composed of 2-3 keywords, this is an important consideration.

The digital content used in the project is composed of television news, newspaper archives, museum photos and personal digital photos. Material is supplied by the British Library, the Victoria & Albert Museum, the BBC, the University of Waikato and Imperial College London.

Project Objectives

The project will showcase novel search and browsing engines that allow users to search multimedia collections in alternative ways, eg, by image similarity, as well as by textual metadata, which may or may not be present.




The UK Multimedia Knowledge Management Network

The UK Multimedia Knowledge Management Network consists of research teams from seven UK universities who work in this new interdisciplinary field. The aim of this network is to enhance communication between the experts in both academia and industry, and to maintain shared resources for the direct benefit of the research community. The network is hosted at and maintained by Imperial College London.

Project Objectives


Relevant research topics within Multimedia Knowledge Management include, but are not limited to, multimedia analysis, indexing, storage and delivery, needs elicitation and analysis, retrieval, summarisation, presentation and personalisation, crafting appropriate access environments, capitalisation and evaluation.




Finished grants


NSF-EU Grant: Cultural Heritage Language Technologies

This is a collaborative project to create computational tools for the study of Ancient Greek, Early Modern Latin, and Old Norse texts in a network of affiliated digital libraries. Our contribution at the Department of Computing, Imperial College London, will be the creation of generic document and information visualisation tools.

This research is jointly sponsored by the NSF and the EU from 2002 to 2004.

References:

M Carey, D Heesch and S Rüger: Info Navigator: A visualization tool for document searching and browsing. Proc of Intl Conf on Distributed Multimedia Systems (DMS Sep 2003), 2003

P Au, M Carey, S Sewraz, Y Guo and S Rüger: New paradigms in information visualization. Int'l ACM Information Retrieval Conf (SIGIR, Athens, Greece), pp 307–309, ISBN 1-58113-226-3, Jul 2000

S Sewraz and S Rüger: A visual information-retrieval navigator. European Colloquium on Information Retrieval Research (ECIR, Cambridge, UK), pp 222–231, Apr 2000


The Freedom to Forget: Multimedia Knowledge Management

The research carried out during this fellowship was centred on the theme of Multimedia Information Retrieval, ie, video search engines, sketch databases, image databases, spoken document retrieval, music retrieval, query languages and query mediation. The main focus was to explore ways of content-based search, eg, search by image example (not by words matching the associated metadata or library cards) or finding music pieces by humming. A related challenge is the extent to which automated annotation and classification of multimedia objects can be made possible.

This project has made a number of contributions to the area of multimedia information retrieval ranging from i) the development and evaluation of simple image features such as texture, colour and shape for images based on psychological, signal-processing and statistical methods; ii) a novel polyphonic symbolic music representation that allows the use of ordinary text search technology such as Google's to index and search music repositories by humming; iii) the introduction of novel automated structuring principles such as lateral similarity and search-result clustering that allow the user to browse (sub)collections intuitively; to iv) novel video summarisation schemes that are suitable, eg, for news search engines.

The overriding principle in this research has been the ability to create an easy, intuitive and user-friendly content-based multimedia search engine. To that end a number of research prototypes of music, image and video search engines were successfully developed and integrated into a multimedia search platform. This platform has undergone extensive metric-based evaluation in international collaborative evaluation conferences (such as TRECVID and ImageCLEF) where it has consistently proven to be amongst the top systems worldwide.

The research we have carried out so far during this fellowship has resulted in a well-designed and robust general framework for multimedia searches which lends itself to deployment in specific application areas. Ultimately, those results are bound to improve searching, browsing, discovery and access in areas such as arts and media through imaginative navigation modes; crime prevention through automated analysis of CCTV footage; intellectual property through detection of trademark duplication or copyright infringement; journalism through content-based image searches and resource discovery; medical diagnosis through finding similar images from a database; and, in general, web repositories, cultural heritage collections and multimedia digital libraries.

This research is sponsored through the award of an EPSRC Advanced Research Fellowship from Oct 1999 to Sept 2004.


Low-cost, efficient, parallel algorithms for musical electronic learning aids

The EPSRC project GR/L 18273 Low-cost, efficient parallel algorithms for musical electronic learning aids was proposed to research, develop, implement and evaluate monophonic and polyphonic music recognition algorithms for use in computerised interactive musical learning systems. Specifically the aim was to develop real-time algorithms for note recognition in monophonic (task 1) and polyphonic (task 3) music. Further and essentially independent tasks were the development of a real-time tune recognition algorithm (task 2) and of an interactive electronic music tutor for a monophonic instrument (task 4).

We are pleased to report that considerable progress has been made, if along lines slightly different from the ones originally outlined in the proposal. Task 1 was completed early in the project and it was shown that this algorithm coped well with monophonic signals. However, the methods suggested in the proposal to extend this method to handle polyphony proved impractical. We were thus forced to return to more fundamental studies of pitch detection algorithms. Substantial theoretical and experimental investigations were carried out into existing algorithms and novel algorithms were developed and implemented that are capable of detecting notes in polyphonic music and which, we believe, represent significant advances over the current state-of-the-art in many aspects. Thus task 3, which was the most difficult fundamental part of the project, was successfully completed.

A two-step approach was adopted which divides the task of note recognition into two subtasks: (A) short-time spectral estimation of the musical signal, resulting in a time-frequency spectrum, and (B) note extraction based on the resulting spectra. Novel approaches have been developed for both the spectral analysis and the pattern recognition part of the note identification problem. For the former, the main novelty lies in the use of auto-regressive as opposed to conventional Fourier spectral estimators; for the latter, it lies in a combination of data classification methods and a topological approach to note identification which emphasises connectivity patterns in both time and pitch.
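As a rough illustration of subtask (A), the sketch below fits an auto-regressive model to a single signal frame and evaluates the resulting spectrum. It is not the project's Mathematica code: the Yule-Walker fit via the Levinson-Durbin recursion, the model order, the test tone and the names autocorrelation, levinson_durbin and ar_spectrum are all assumptions made for this example.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one signal frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err): the prediction-error filter a (with a[0] == 1)
    and the residual prediction-error power err.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for k in range(1, order + 1):
        acc = r[k] + sum(a[j] * r[k - j] for j in range(1, k))
        refl = -acc / err  # reflection coefficient of stage k
        new_a = a[:]
        for j in range(1, k):
            new_a[j] = a[j] + refl * a[k - j]
        new_a[k] = refl
        a = new_a
        err *= 1.0 - refl * refl
    return a, err


def ar_spectrum(x, order, freqs):
    """AR power spectrum at normalised frequencies (cycles per sample)."""
    a, err = levinson_durbin(autocorrelation(x, order), order)
    spectrum = []
    for f in freqs:
        denom = sum(a[k] * cmath.exp(-2j * math.pi * f * k)
                    for k in range(order + 1))
        spectrum.append(err / abs(denom) ** 2)
    return spectrum


# Hypothetical test frame: a pure tone at 0.1 cycles per sample.
frame = [math.sin(2 * math.pi * 0.1 * n) for n in range(200)]
freqs = [i / 1000 for i in range(501)]
spectrum = ar_spectrum(frame, order=2, freqs=freqs)
peak = freqs[spectrum.index(max(spectrum))]
```

For this toy frame the spectral peak falls close to the tone's frequency even though the model has only two coefficients, which hints at why auto-regressive estimators can resolve sharp spectral lines from short frames.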

The resulting algorithms were coded in Mathematica and successfully tested with digitised recordings of both mono- and polyphonic piano music with up to 3 tones occurring simultaneously. At the time of writing one paper has been published [1], a second one is in preparation which will contain the major part of our results [2], and more technical issues are contained in an as yet unpublished report [3].

References:

[1] T von Schroeter (1998): Frequency Warping with Arbitrary Allpass Maps. IEEE Signal Processing Letters, 6, pp 116-118

[2] T von Schroeter and J Darlington (in preparation): Connectivity in auto-regressive spectra of polyphonic piano music - a topological approach to automated transcription.

[3] T von Schroeter: Auto-regressive spectral line analysis of piano tones, Technical report.



PhD research


João Magalhães (2004-now): Semantic Multimedia Information

The aim of this research is to enhance multimedia retrieval applications by combining both knowledge and statistical data in a learning framework to extract semantic information from multimedia. We will approach the problem as a Bayesian learning problem divided into three parts:

  1. Multimedia mining: mines the feature space for the problem’s most common patterns and learns the causality relations between the occurrence of these patterns and keywords.
  2. Multi-modal information fusion: combines multi-modal features in a statistical framework to increase the prediction accuracy of keywords in new unseen content.
  3. Semantic information extraction: improves the inference results obtained in the previous steps by using knowledge about keyword co-occurrences.

Peter Howarth (2003-now): Indexing structures for multimedia information retrieval

This project is focussed on issues around the scalability of content-based image retrieval systems. Specifically I am looking at how to efficiently index high dimensional feature vectors.


Alexei Yavlinsky (2003-now): Automated Image Annotation using Invariant Image Statistics

This project is concerned with learning robust statistical models of invariant image properties for automatically annotating unseen images with relevant keywords. These annotations are intended for providing text-based search access to large collections of unlabelled images and videos. We have built a prototype search engine based on these principles.

Keywords: automatic image annotation, automated image annotation, learning image captions, statistics of natural images



Finished PhD projects


Edward Schofield (2002-2006): Fitting maximum-entropy models on large sample spaces

This PhD project investigated the application of iterative Monte Carlo methods to the problem of parameter estimation for models of maximum entropy, minimum divergence, and maximum likelihood among the class of exponential-family densities. It showed how to apply such models to large domains in which exact computation is not practically possible.



Daniel Heesch (2001-2005): The NNk technique for image searching and browsing

My PhD thesis, which I successfully defended in mid-November 2005, was in the area of image analysis and image retrieval. It addressed the specific problem of how to represent the semantic richness of images and how to learn combinations of visual features that optimally model human perception of similarity. The abstract is given below:

Retrieval of images from large image archives based solely on their visual similarity to a query image provides an exciting alternative to conventional text-based search. For content-based retrieval images are represented in terms of visual features. The question of how to combine these for similarity computation is typically addressed by eliciting relevance feedback from the user on the retrieved images. We argue in this thesis that the prevailing approach to relevance feedback suffers from three significant shortcomings: firstly, it leaves unsolved the question of how to combine features for the first retrieval; secondly, the advantage of automated content-extraction over manual annotation is greatest for large collections but if the query image is not constrained to come from the indexed collection, content-based retrieval entails imagewise comparisons leading to prohibitive response times; thirdly, users may only have vaguely defined information needs or may change their needs in the course of the interaction. The large majority of relevance feedback techniques are ill-suited for such undirected exploration. We propose a new framework of user interaction that addresses these limitations. It is centred on what we call the NNk idea. The NNk of an image are all those images that are most similar to it under some combination of features. They can be viewed as representatives of the possible semantic facets an image may exhibit to different users. The NNk idea is first applied to the problem of automated retrieval where it suggests a two-step method of relevance feedback that is shown to outperform existing techniques. In the continuation of the thesis we broaden the view and introduce NNk Networks as static structures for browsing image collections. NNk Networks are directed graphs in which every image is connected to all its NNk. NNk Networks obviate the need to articulate an information need pictorially. Moreover, by being entirely precomputed we achieve interaction times that are independent of the collection size. We investigate topological properties of the networks and analyse how well they capture the semantic structure of a collection. These formal analyses are complemented by a large-scale quantitative evaluation on 32,000 images and a set of realistic search tasks. Both approaches suggest that NNk Networks provide a very effective alternative to automated retrieval both for directed searching and undirected browsing.
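The NNk definition in the abstract can be illustrated with a small sketch. It assumes that a "combination of features" means a convex combination of per-feature distances, and it approximates the set of all such combinations by a regular grid of weights; both choices, as well as the names simplex_grid and nnk and the toy distances, are assumptions for this example rather than details taken from the thesis.

```python
from itertools import product


def simplex_grid(n_features, steps):
    """Yield weight vectors with non-negative entries that sum to 1."""
    for combo in product(range(steps + 1), repeat=n_features):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def nnk(distances, steps=10):
    """Return every image that is the nearest neighbour of the query
    under at least one sampled weighting of the features.

    distances maps an image id to a tuple of per-feature distances
    between that image and the query image.
    """
    n_features = len(next(iter(distances.values())))
    neighbours = set()
    for weights in simplex_grid(n_features, steps):
        best = min(
            distances,
            key=lambda img: sum(w * d for w, d in zip(weights, distances[img])),
        )
        neighbours.add(best)
    return neighbours


# Hypothetical (colour, texture) distances of four images to a query.
toy_distances = {
    "A": (0.1, 0.9),  # close in colour only
    "B": (0.9, 0.1),  # close in texture only
    "C": (0.6, 0.6),  # never the closest under any weighting
    "D": (0.4, 0.4),  # closest when both features matter about equally
}
neighbours = nnk(toy_distances)
```

In this toy setting images A, B and D are each the top match for some weighting, while C is not the top match for any of them. Connecting the query to A, B and D and repeating this for every image in a collection would give a directed graph of the kind the abstract calls an NNk Network.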


Marcus Pickering (2000-2004): Video Retrieval and Summarisation

This PhD project in the area of Multimedia Information Retrieval will focus in particular on Video Retrieval.

The work is supported by AT&T Laboratories, Cambridge.


Shyamala Doraisamy (2000-2004): Polyphonic Music Retrieval: The N-gram approach

This PhD project involves the study of content-based retrieval techniques for Music Information Retrieval systems, with a focus on polyphonic music data. Multimedia indexing and retrieval systems, together with standard principles of IR, are amongst the technologies to be used with this medium for system development and evaluation. One clear challenge is addressing human musical perception.
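To give a flavour of how n-grams can turn music into "words" that a text retrieval system can index, here is a deliberately simplified sketch. It handles a single monophonic line of MIDI pitch numbers and uses pitch intervals only; the polyphonic representation studied in the project is not described on this page, so the encoding, the token format and the name interval_ngrams are all assumptions.

```python
def interval_ngrams(pitches, n=3):
    """Encode a pitch sequence as overlapping n-grams of pitch intervals.

    Each n-gram becomes one text token such as "+2_+2_-4", so the token
    list can be fed to an ordinary text indexer.
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [
        "_".join(f"{step:+d}" for step in intervals[k:k + n])
        for k in range(len(intervals) - n + 1)
    ]


# Hypothetical melody (C, D, E, C) and the same melody a fourth higher.
melody = [60, 62, 64, 60]
transposed = [65, 67, 69, 65]
tokens = interval_ngrams(melody, n=2)
```

Because the tokens are built from intervals rather than absolute pitches, a transposed query yields the same tokens as the original melody.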



MIR Design Studies

We are currently building a multimedia information retrieval system as a framework for research and demonstrator for applications.

Jonas Wolf 2005: GeoBrowser

Every day, around the world, events take place that change history. Because people are curious, technologies have evolved that make this information available for immediate consumption, and because people are also sentimental, for eternity. First, newspapers dominated the scene, then television was invented, and now, the Internet is ubiquitous, embedded in our lives like the very air we breathe. As a result, information is available always and everywhere. Thus, a shift has occurred - the question is no longer "Where can I find information?", the question has evolved to "How can I find the information I want?".

This report introduces GeoBrowser, a web-based graphical user interface which implements a new method of navigating through large amounts of news material.


A May 2004: Image and Video Browser over the Web

The objective of this project is to build a "photo album" or "image gallery" viewable over the web. Although it is relatively easy to generate a simple static browser of an image database, this project is more challenging.

The Image Browser is expected to be screen-aware (ie, utilise the full physical screen of the user), bandwidth-aware (ie, decrease resolution for slow connections and pre-load/cache the images which are likely to be viewed next), dynamic (so that results from an image search engine can be viewed) and context-aware (so that annotations, if any, can be displayed along with the images). We have several pure image collections of up to 32,000 images which are partially annotated. There exist backend image search engines which can be integrated into the browsing process.

Video Browsing should initially be the same as image browsing, ie, one can expect that a video has been dissected into "shots" each of which is represented by a key-frame image. When one clicks on a key-frame, the clip should be played in this window. We have some 100 hours of videos dissected into shots and key-frames with annotations from speech recognition or teletext subtitles for these collections.


Z Huang 2004: Index Structures for High-Dimensional Image Features

This individual project is about the design and implementation of a high-dimensional indexing application to facilitate speedy searching in feature-based image retrieval.

We first investigate the principles of high-dimensional indexing, choose an index structure to implement and verify our choice. We then present our design and implementation of the indexing structure. Finally, the indexing system is evaluated through intensive performance tests.


Z Huang 2004, ISO: Scalable Multimedia Database Indexing

This individual study option investigates the field of high-dimensional indexing. Unlike conventional indexing, high-dimensional indexing suffers from the so-called curse of dimensionality. We investigate different approaches to high-dimensional indexing and summarize their strengths and weaknesses.

A Yavlinsky 2002, ISO: Support Vector Machines and Feature Selection for CBIR

This individual study option is a study into the application of Support Vector Machines (SVMs) and a number of feature selection algorithms to Content Based Image Retrieval (CBIR). We investigate the effects of different feature selection algorithms on an image representation designed by Tieu and Viola and observe differences in retrieval performance using SVMs.

This project has a theoretical and a practical section. The theoretical section investigates different approaches to CBIR; the practical section presents the experimental results of this study.


L Wong 2002: ANSES - Automatic News Summarization and Extraction System

This project proposes to build a system which addresses summarization at a multimedia level. In one short sentence this system could be described as:

"Watch the news while I was away and tell me what happened."

This project combines a video scene-change algorithm with current text segmentation and summarization techniques to build an automatic news summarization and extraction system.

Television broadcast news is captured both in video/audio format and with the accompanying subtitles in text format. News stories are identified, extracted from the video, and summarized in a short paragraph which reduces the amount of information to a manageable size. Individual news video clips can be retrieved effectively by a combination of video and text, using a reverse-indexed search engine to provide distilled information such as a summarized version of the original text and highlighted key words in the text.
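The retrieval step can be pictured with a minimal sketch, assuming that the reverse-indexed search engine mentioned above is what is commonly called an inverted index: a map from each word to the stories whose subtitles contain it. The tokenisation, the AND semantics of multi-word queries, the function names and the sample stories are all hypothetical and are not taken from ANSES itself.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(stories):
    """Map every word to the set of story ids whose text contains it."""
    index = defaultdict(set)
    for story_id, text in stories.items():
        for word in tokenize(text):
            index[word].add(story_id)
    return index


def search(index, query):
    """Return the ids of stories that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical subtitle snippets for three news stories.
stories = {
    1: "Floods hit the north",
    2: "Election results in the north",
    3: "Floods recede",
}
index = build_inverted_index(stories)
hits = search(index, "north floods")
```

Looking up a word costs one dictionary access regardless of how many stories are stored, which is the main reason such an index suits a growing news archive.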


P Lal 2002: Summarisation - Conquering Information Overload

The availability of vast amounts of cheap storage has led to a situation in which more information exists than we can process effectively. Summarization techniques enable users to digest large amounts of textual information more quickly, for example the search results from a web search engine.

The aim of this project is to develop an extractive text summarization system, and to examine how such a system might personalize its output to the level of knowledge of its user.

This work was done as a final-year project for an MEng Degree in Computing. It was recognised as a distinguished project.


P Techasith 2002: Image Search Engine

This project implements feature extraction algorithms from Images and builds an image retrieval system. Simple, non-semantic features are, eg, colour histograms or texture statistics. Other features to be considered may be the extraction of salient geometric features, wavelet coefficients or Fourier coefficients. A browsing tool for quickly visualising a set of images should also be implemented as well as an appealing query mask.


D Heesch 2001: Content-Based Sketch Retrieval

This project develops a Sketch Retrieval system with relevance feedback and evaluates various shape features.

M Pickering 2000: Video Search Engine

This project implements a video search engine prototype.

Its overall aim is to allow full content search and retrieval of video. The system performs a number of functions. It records TV material and the respective subtitles of the teletext system, identifies video scenes from analysis of colour histograms and motion vectors, and then automatically indexes these 'video paragraphs' according to significant words detected in the subtitles. A query is typically submitted as text input. Thumbnails of keyframes are then displayed with the option to show a sentence describing the content of each shot, extracted from the subtitles, or to play back the shot itself.
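The histogram part of the scene identification can be sketched as follows. This is only an illustration under simplifying assumptions: frames are flat lists of grey levels (0 to 255) rather than colour images, motion vectors are ignored, and the bin count, the L1 distance, the threshold and the function names are hypothetical choices rather than the prototype's actual settings.

```python
def histogram(frame, bins=4):
    """Normalised intensity histogram of one frame (values 0..255)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in counts]


def shot_boundaries(frames, threshold=0.5, bins=4):
    """Indices of frames whose histogram differs sharply from the
    histogram of the preceding frame."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = histogram(frames[index], bins)
        # L1 distance between consecutive histograms, between 0 and 2.
        distance = sum(abs(p - c) for p, c in zip(previous, current))
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


# Hypothetical clip: three dark frames followed by three bright frames.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 250]
frames = [dark, dark, dark, bright, bright, bright]
cuts = shot_boundaries(frames)
```

A fixed threshold like this one reacts to abrupt cuts but would miss gradual transitions, which is one reason a real system might combine histograms with other cues such as motion.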


MSc group 2001: Relevance Feedback for Image Databases

This group project implements feature extraction methods for images submitted in jpeg format. These features are used to search a database of pre-computed features for images in a data bank. The database was implemented using R-trees to facilitate multi-dimensional searches.


M Carey 2000: Retrieved-Document Visualisation

This project is to visually present a large set of documents (returned by a search engine) in a way that the users can easily spot subsets of documents they are interested in. This is to be integrated into a search engine and implemented as a web-based distributed application.


MSc group 2000: Information Visualisation

This project is to visually present a large set of documents (returned by a search engine) in a way that the users can easily spot subsets of documents they are interested in. The search engine returns a list of documents, a list of keywords in these documents and an occurrence matrix recording which keywords occur in which document.


M Sukthankar 2000: Spoken-Document Retrieval

A Spoken Document Retrieval System automatically indexes and then retrieves relevant items from a large collection of speech recordings in response to a user query.

This project involved producing such a system. The speech recordings took the form of BBC News broadcasts. To allow the indexing and retrieval of the broadcasts, they first needed to be transcribed into text form, so that a conventional text retrieval system could be used for the indexing and retrieval. Secondly, the news broadcasts needed to be segmented into individual stories, which allows the results to be retrieved in a more manageable form for the user.


J Gevrey 2001: Hubs and Authorities

We assess a family of ranking mechanisms for search engines based on linkage analysis using a carefully engineered subset of the World Wide Web, WT10g, and a set of relevance judgements for 50 different queries from TREC-9 to evaluate the performance of several link-based ranking techniques. Among these link-based algorithms, Kleinberg's HITS and Larry Page and Sergey Brin's PageRank are assessed. Link analysis seems to yield poor results in TREC's Web Ad Hoc Task. We suggest some alternative algorithms which reuse both text-based search similarity measures and linkage analysis. Although these algorithms yield better results, improving text-only search recall-precision curves in the Web Ad Hoc Task remains elusive; only a certain category of queries seems to benefit from linkage analysis. Among these queries, homepage searches may be good candidates.
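For readers unfamiliar with link-based ranking, the following sketch shows PageRank in its common power-iteration form. It is not the code evaluated in the project: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links and the three-page toy graph are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not links.get(node))
        new_ranks = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * ranks[source] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical toy web: a and b both link to c, and c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

In the toy graph page c collects rank from both a and b and ends up ranked highest, while b, which nobody links to, keeps only the baseline share. Note that this score depends on the link structure alone and ignores the query text.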

R Cooper 2000: High-Precision Information Retrieval

How many Calories has a Big Mac? A good information retrieval system should be able to answer this question with one sentence. Traditional search engines direct the user to documents that may or may not answer this question. This project designs an "Information Search Engine" that preprocesses the document repository with regard to entities and extracts a database of facts that can be queried with concrete questions like the one above. Ideally, a prototype of such a system is created working with a repository of 500,000 newspaper articles. This project will deploy natural language processing techniques.


S Sewraz 1999: A Visual Information-Retrieval Navigator

This project addresses the problem of low precision that plagues most conventional search engines by designing a visualisation front-end that aids navigation through the set of documents returned by a search engine.

The project is based on identifying relevant words in the set of hit documents and on a clustering of the hit-document set; it shall make use of the clustering to visually group the documents returned from the search and label the groups with their respective related words. Also, the navigator shall be able to browse cluster information as well as drill up or down in one or more clusters and refine the search using one or more of the suggested related keywords.