Training the next generation of technology leaders

The Prosperity Partnership will provide an excellent training opportunity for PhD students, combining experience of leading creative industry production with expertise at the forefront of AI for content understanding and of software-defined networks that optimise resources for data processing and distribution. The Partnership will fund a cohort of 8 PhD students and offer an enhanced training programme that includes opportunities for industry secondment.

The learning objective of the training programme is to ensure that all PhD researchers develop the scientific, engineering, collaborative teamwork and leadership skills required for successful careers as future leaders in industry and academia. The training programme builds on the strengths of existing models at the BBC and the University partners, including the Surrey/CVSSP CDT in AI, which is co-funded with industry.

Current funding opportunities

Explore our studentships.

PhD studentships at CVSSP

Six fully-funded PhD studentships, including fees and stipend, are available for outstanding candidates to join the BBC partnership 'AI4ME' at the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey.

This is an exciting opportunity for nascent researchers to fulfil their potential in Computer Vision, Audio or Audio-Visual AI, joining CVSSP as part of a major new five-year research partnership with the BBC to realise Future Personalised Media Experiences through scientific and technological research.

The goal of the research partnership is to enable future personalised content creation and delivery at scale for the public, at home or on the move. CVSSP research will address the key challenges for personalised content creation and rendering by advancing computer vision and audio-visual AI to transform captured 2D video to object-based media. Research will advance automatic online understanding, reconstruction and neural rendering of complex dynamic real-world scenes and events. This will enable a new generation of personalised media content which adapts to user requirements and interests. The new partnership with the BBC and creative industry partners will position the UK to lead future personalised media experiences. To this end, we propose these topics that could form the basis of your PhD project:

  • Audio-visual object-based dynamic scene representation from monocular video (Prof. Adrian Hilton): to investigate the transformation of monocular audio and visual video into a spatially localised object-based audio-visual representation
  • Temporally consistent tracking and segmentation of people in video (Prof. Adrian Hilton): to address the live (video-rate) temporally consistent segmentation of non-rigid objects such as people in video of general scenes to enable video-augmentation
  • Deep learning for audio-visual object separation (Prof. Wenwu Wang): to build deep models to characterise the coherence of audio and visual modalities, which allows the detection of activities of audio objects in video (e.g. speech, music, or other environmental sounds) and their separation from sound mixtures guided by video
  • Audio-visual room acoustics and reverberation (Dr Philip Jackson): to use machine learning with audio-visual data to generate the audio reverb as a sound moves within a real room to convey its position within the scene and an immersive room impression
  • Audio-visual sound localisation and tracking (Dr Philip Jackson): to exploit the combination of camera and microphones to find and follow sound sources, handling observation problems like occlusion, clutter, poor illumination and sparse spatio-temporal views
  • Object-based intrinsic video decomposition for audio-visual content manipulation and adaption (Dr Jean-Yves Guillemaut): to develop algorithms and representations to allow the decomposition of a video into objects and their intrinsic components (appearance, shading, etc.) to allow editing tasks such as insertion or removal of objects, relighting or adaptation to the user’s accessibility requirements
  • Audio-visual neural rendering for general scenes (Dr Marco Volino): to explore the relationship between audio and visual signals within a neural rendering framework. Previous work has explored this relationship for single object classes, e.g. synthesis of a human face given audio speech data. In contrast, this project will explore more complex and natural audio-visual scenes

About Us

The Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey is ranked first in the UK for computer vision. The centre leads ground-breaking research in audio-visual AI and machine perception for the benefit of people and society through technological innovations in healthcare, security, entertainment, robotics and communications. Over the past two decades CVSSP has pioneered advances in 3D and 4D computer vision, machine listening, spatial audio and audio-visual AI, which have enabled award winning technologies for content production in TV, film, games and VR/XR entertainment.

BBC R&D has a worldwide reputation for developments in media technology going back over 90 years and has worked closely with CVSSP for over 20 years. It has pioneered the development of object-based media, working closely with programme-makers and technology teams across the BBC. Recent work has included object-based audio delivery across multiple synchronised devices for sports and drama, and AI for recognising wildlife for natural history.

We are committed to equal opportunities and inclusion, and recognise the value of diversity in science for innovation. We would therefore particularly welcome applications from any under-represented groups and communities. This is an opportunity for outstanding students to join a world-renowned research centre at the start of a major new five-year research partnership and to take advantage of this vibrant collaborative research environment.

About You

You will have a strong interest in media, whether it is audio or video, and demonstrate a high level of academic achievement in relevant subject areas and a clear aptitude for engineering research. We will need to be convinced that you have the necessary background knowledge and research skills to begin your doctoral training. You will have a 1st or 2:1 BSc/BEng degree (or equivalent) and either an MSc/MEng in a relevant engineering or scientific discipline or equivalent specialist experience. You will be able to demonstrate excellent mathematical, analytical and computer programming skills. Advantage will be given to applicants with experience in one or more of the following: statistical analysis, software development, signal/image processing, machine learning, computer vision, acoustics, spatial audio. You will have advanced research skills, evidenced by a significant Bachelors/Masters project, for example, involving experimental research, appropriate use of the literature, computer-based simulations and a formal dissertation-style report.


These PhD studentships cover UK university tuition fees for 3 years, a tax-free enhanced maintenance grant of up to £18,609 per annum and a contribution to travel costs to present your research at national and international conferences. Studentships are open to international students, from the EU and overseas, provided they can support the difference between home and overseas fees (£18,200).

Next Steps

General enquiries are welcomed by Professor Adrian Hilton by email; for queries on specific PhD topics, please contact the supervisors listed above. Otherwise, you may apply directly.

Please use your research statement to identify the topic(s) of greatest interest to you and to explain how your skills meet the person specification above. If you are in doubt about any aspects of PhD studies or your application, please contact us. Applications for October 2021 have a closing date of 30th July 2021.

The Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey. GUILDFORD, UK.