Showcasing AI4ME at IBC2024

Members of our AI teams at PAI and CVSSP showcased our AI in media work at IBC2024, held 13–16 September in Amsterdam, under the AI4ME project umbrella, alongside AI4ME colleagues from BBC R&D and Lancaster University. We presented our latest advances in AI and how they are shaping the future of media. Our renowned research centre, CVSSP, spearheads the AI4ME project with support from PAI, focusing on developing AI capable of transforming content to deliver personalised media experiences.

Showcasing AI4ME at IBC2024

The AI4ME team, including representatives from the University of Surrey, Lancaster University and the BBC, will be exhibiting some of our work in AI for media at the prestigious International Broadcasting Convention (IBC2024) in Amsterdam from 13 to 16 September. This year, IBC is hosting an entire subsection of the exhibition and conference on AI and its impact on the media and broadcasting industries. Find us at Stand 14.AIB14 in the IBC AI Tech Zone, Hall 14.

New AI Tool Improves Audio-Visual Video Parsing

Researchers at AI4ME have developed a new AI tool that can more accurately identify and categorise events in videos using both audio and visual information. The tool, called CoLeaF, targets weakly supervised audio-visual video parsing (AVVP), meaning it can learn to identify events even with limited training labels.

Existing AVVP methods often struggle to distinguish between audible-only, visible-only and audible-visible events, especially when the audio and visual information do not perfectly align. CoLeaF addresses this by learning to combine cross-modal (audio and visual) information only when it is relevant, which stops irrelevant information from hindering performance. It also models complex class relationships to improve accuracy without increasing computational costs.

In extensive experiments on the LLP and UnAV-100 datasets, CoLeaF significantly outperformed existing methods, with average F-score improvements of 1.9% and 2.4% respectively. By understanding video content more accurately, CoLeaF could improve applications such as video analysis, content creation and accessibility, helping developers create more personalised and engaging experiences for users.
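
To make the core idea concrete, here is a minimal sketch in PyTorch of relevance-gated cross-modal fusion. It is not the authors' CoLeaF implementation: the module, gate design and dimensions are assumptions, chosen only to show how a learned gate can hold back cross-modal context when the audio and visual streams do not align.

```python
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    """Illustrative sketch (not CoLeaF itself): fuse visual context into
    audio features only to the degree a learned gate deems it relevant."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # Audio segments attend to visual segments to gather cross-modal context.
        context, _ = self.cross_attn(audio, visual, visual)
        # Per-segment relevance gate in [0, 1]; values near 0 stop irrelevant
        # visual context from leaking into the audio representation.
        relevance = self.gate(torch.cat([audio, context], dim=-1))
        return audio + relevance * context

# Toy usage: a batch of 2 clips, 10 one-second segments, 256-dim features.
audio = torch.randn(2, 10, 256)
visual = torch.randn(2, 10, 256)
fused = GatedCrossModalFusion(256)(audio, visual)
print(fused.shape)  # torch.Size([2, 10, 256])
```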

The researchers were: Faegheh Sardari, Armin Mustafa, Philip JB Jackson, Adrian Hilton. Proceedings of the European Conference on Computer Vision (ECCV), 2024.

Surrey researchers working on AI4ME have released a new dataset designed to accelerate the development of personalised media experiences.

The dataset, called ForecasterFlexOBM, features a weather forecast presented by three actors in a variety of settings, delivered in both English and British Sign Language. It was captured using a 16-camera array and high-quality audio equipment, giving researchers a rich source of data for machine learning experiments.

ForecasterFlexOBM includes scenes relevant to both production and research tasks, such as neural radiance fields, shadow casting, action/event detection, speaker source tracking and video captioning. Using this data, researchers can develop and test new techniques for personalising media content to meet the specific needs and preferences of individual viewers.

The release of the dataset is a significant step forward for personalised media production. By providing a high-quality, publicly available resource, it enables researchers to collaborate and accelerate the development of innovative solutions that will benefit audiences around the world.
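
As a sketch of how such a multi-view capture might be consumed in an experiment, the snippet below gathers per-camera frame paths for one scene. The directory layout, file names, scene name and even the file format here are hypothetical; consult the dataset's own documentation for the actual structure.

```python
from pathlib import Path

# Hypothetical layout: <root>/<scene>/cam00 ... cam15/<frame>.png.
# The real ForecasterFlexOBM release may organise its files differently.
ROOT = Path("ForecasterFlexOBM")

def list_views(scene: str, num_cameras: int = 16) -> dict[int, list[Path]]:
    """Collect sorted frame paths for each camera in one scene."""
    views = {}
    for cam in range(num_cameras):
        cam_dir = ROOT / scene / f"cam{cam:02d}"
        views[cam] = sorted(cam_dir.glob("*.png"))
    return views

views = list_views("weather_forecast_bsl")  # hypothetical scene name
print({cam: len(frames) for cam, frames in views.items()})
```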

The researchers were: Berghi, D., Cieciura, C., Einabadi, F., Glancy, M., Camilleri, O. C., Foster, P. A., Nadeem, A., Sardari, F., Zhao, J., & Volino, M. IEEE International Conference on Multimedia and Expo (ICME), 2024.

A Breakthrough in Audio-Visual Question Answering

A team of AI4ME researchers has developed a new AI model, CAD, that significantly outperforms existing methods in answering questions based on audio and visual information. By aligning audio and visual data at the spatial, temporal and semantic levels, CAD achieves a remarkable 9.4% improvement on the MUSIC-AVQA dataset. This breakthrough could have far-reaching implications for applications such as video captioning, content search, and accessibility for the visually impaired.
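
As a hedged illustration of semantic-level alignment (one of the three levels mentioned above), the sketch below computes a symmetric contrastive loss that pulls paired audio and visual embeddings together. This is a generic objective of that family, not the CAD authors' exact formulation; the embedding dimension and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(audio_emb: torch.Tensor,
                            visual_emb: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: matched audio/visual pairs score high,
    mismatched pairs score low (illustrative, not CAD's exact objective)."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = a @ v.t() / temperature        # (B, B) pairwise similarities
    targets = torch.arange(len(a))          # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: a batch of 8 paired 512-dim clip embeddings.
loss = semantic_alignment_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```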

The researchers were: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024.

AI Tool Improves Video Captioning Accuracy

Researchers on AI4ME, a research programme led by CVSSP, have developed a new AI tool that generates more accurate and grammatically correct video captions.

The tool, called SEM-POS, uses a novel global-local fusion network to combine visual and linguistic features. This approach helps align the visual information with the language description, resulting in more accurate and coherent captions. By using different parts-of-speech components for supervision, SEM-POS generates captions that are more grammatically correct and capture key information from the video.

Extensive testing on benchmark datasets has shown that SEM-POS significantly outperforms existing methods in caption accuracy. The tool has the potential to improve the accessibility and usability of video content for people with hearing impairments, or for those who simply prefer to read captions. It could also enhance search engine optimisation and improve the discoverability of online videos.
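
The sketch below shows one plausible reading of a global-local fusion step: a clip-level "global" feature attends over frame-level "local" features before caption decoding. It is illustrative only; the layer choices and dimensions are assumptions, not the SEM-POS architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Illustrative global-local fusion (not the SEM-POS implementation):
    enrich a clip-level feature with attended frame-level detail."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, global_feat: torch.Tensor,
                local_feats: torch.Tensor) -> torch.Tensor:
        # global_feat: (B, D) clip feature; local_feats: (B, T, D) frame features.
        query = global_feat.unsqueeze(1)
        context, _ = self.attn(query, local_feats, local_feats)
        # Concatenate global and attended local context, project back to D.
        return self.proj(torch.cat([global_feat, context.squeeze(1)], dim=-1))

# Toy usage: 2 clips, 20 frames each, 256-dim features.
fused = GlobalLocalFusion(256)(torch.randn(2, 256), torch.randn(2, 20, 256))
print(fused.shape)  # torch.Size([2, 256])
```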

The researchers were: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 2606-2616.

CVSSP's Expertise Drives Progress in AI4ME Research

The Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey played a pivotal role in the recent AI4ME Independent Advisory Board (IAB) meeting, held at BBC MediaCity in Salford. As a key contributor to the AI4ME research group, CVSSP's expertise has been instrumental in advancing the development of personalised media experiences. The IAB meeting, which took place on 26 January 2023, brought together leading figures from the creative industry to provide valuable insights and guidance to the AI4ME project. The event showcased the significant progress made over the first nine months, with live demonstrations highlighting the innovative research that is laying the groundwork for the creation, production and delivery of personalised media content.

A New Era of Content Creation and Consumption

The University of Surrey, in collaboration with the BBC, EPSRC and Lancaster University, is spearheading a groundbreaking initiative to revolutionise the UK media industry through personalised media experiences. Leveraging the power of AI and Object-Based Media (OBM), this partnership aims to create media content that seamlessly adapts to individual preferences, accessibility needs, devices and location. Building upon the BBC's extensive experience in OBM and its ability to conduct large-scale audience trials, the project seeks to redefine how we interact with media. The University of Surrey's expertise in audio-visual AI will enable the efficient creation of personalised OBM experiences, while Lancaster University's proficiency in software-defined networking will ensure the scalable and cost-effective delivery of those experiences to millions of users. The goal of this ambitious £15m UK Research and Innovation (UKRI) Prosperity Partnership is to develop technologies that give the UK creative sector unrivalled capabilities and create a step-change in audience experiences at scale.

Independent Advisory Board meeting

The first AI4ME Independent Advisory Board meeting was hosted at BBC MediaCity in Salford on 26 January 2023. The IAB comprises creative industry leaders who act as a critical friend to the project, reviewing and advising AI4ME. The event showcased progress over the first nine months, with live demonstrations of research advances that provide the foundations for the creation, production and delivery of personalised media experiences.

Shaping The Algorithms That Personalise Your Media

Joined by BBC R&D, this panel explores digital rights, human values, trust and decentralisation as directions for advancing society's ability, our ability, to shape the algorithms that feed our media. The session includes presentations from speakers researching these themes and provides an opportunity for questions and discussion. It is hoped that their insights will enrich and inform the debate, as part of society's ongoing conversation about designing technologies for media personalisation that better serve the needs of society. A video recording of the session is available.

Surrey to lead AI research Prosperity Partnership to enable future personalised media for all

The University of Surrey has teamed up with the BBC and Lancaster University on a new five-year ‘Prosperity Partnership’ to develop state-of-the-art artificial intelligence (AI) and cutting-edge technologies that will allow future media experiences to be hyper-personalised by adapting to individual users’ interests, devices, location and accessibility requirements.

Delivering the future of interactive and personalised media at scale with partners

We are delighted to announce the start of a new five-year Prosperity Partnership with the universities of Surrey and Lancaster to develop and trial new ways to create and deliver object-based media at scale. Through groundbreaking collaborative research, we aim to transform how our audiences can enjoy personalised content and services in future. The goal is to enable scaled delivery of a wide range of content experiences - efficiently and sustainably - to mainstream audiences via multiple platforms and devices.