Video accounts for about 70% of today's Internet traffic, and its share is still growing. Researchers in both industry and academia have designed numerous compression algorithms to manage the tradeoff between bitrate and quality. Most existing work assumes that the original (pristine) video always has perfect quality, but this assumption no longer holds for User Generated Content (UGC). Most videos uploaded to YouTube are produced by non-professional creators with consumer-level devices, which means the originals may already contain artifacts (e.g., noise, blur, jerkiness) or may even be heavily edited. It is also possible to enable filters (such as denoising) that make the transcoded video look better than the original. Commonly used quality metrics are all reference metrics, which cannot fairly evaluate "positive" quality changes. In this talk, we will address new challenges for UGC and introduce our recent work.
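The limitation of reference metrics can be illustrated with a small sketch. It is not the speakers' method: the 1-D "scene", the noise level, and the moving-average filter (`box_denoise`) are all hypothetical stand-ins chosen for illustration. PSNR measured against a noisy upload scores an untouched copy as perfect and penalizes a denoised version, even though the denoised version is closer to the clean scene that the metric never sees.

```python
import math
import random


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def box_denoise(signal, radius=2):
    """Moving-average filter; a hypothetical stand-in for a real denoiser."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


random.seed(0)

# Assumed clean scene (a smooth ramp). A real pipeline never has access to it.
clean = [100.0 + 0.1 * i for i in range(1000)]

# The "original" upload: the clean scene plus simulated sensor noise.
uploaded = [c + random.gauss(0.0, 10.0) for c in clean]

# A transcoding pipeline that applies a denoising filter.
denoised = box_denoise(uploaded)

# Reference metric as used in practice: the noisy upload is the reference.
psnr_upload_vs_copy = psnr(uploaded, uploaded)      # untouched copy
psnr_upload_vs_denoised = psnr(uploaded, denoised)  # denoised output

# Oracle view, available only in this synthetic setup.
psnr_clean_vs_upload = psnr(clean, uploaded)
psnr_clean_vs_denoised = psnr(clean, denoised)

print("vs upload, untouched copy:", psnr_upload_vs_copy)
print("vs upload, denoised      :", round(psnr_upload_vs_denoised, 2))
print("vs clean,  upload        :", round(psnr_clean_vs_upload, 2))
print("vs clean,  denoised      :", round(psnr_clean_vs_denoised, 2))
```

Measured against the upload, the denoised output can only lose score. Measured against the clean scene, it gains. The second comparison is possible only because this sketch synthesizes the clean signal.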
Yilin Wang received his PhD from the University of North Carolina at Chapel Hill in 2014, working on topics in computer vision and image processing, with a special focus on 3D reconstruction. After graduation, he joined the Media Algorithm team at YouTube/Google. His research fields include video processing infrastructure, video quality assessment, and video compression.
Sasi Inguva completed his undergraduate degree in Computer Science at IIT Madras in 2012, with a special focus on computer vision. After graduation, he joined the Media Algorithm team at YouTube/Google. His research fields include video processing infrastructure, 3D reconstruction from videos, and video quality assessment.
Neil Birkbeck obtained his PhD from the University of Alberta in 2011, working on topics in computer vision, graphics, and robotics, with a specific focus on image-based modeling and rendering. He went on to become a Research Scientist at Siemens Corporate Research, working on automatic detection and segmentation of anatomical structures in full-body medical images. He is now a software engineer on the transcoding team at YouTube/Google, with an interest in the video processing aspects of new technologies such as 360/VR/omnidirectional video and HDR video.
The Alliance for Open Media (AOM) is an industry consortium founded in 2016 by leading Internet companies to develop next-generation open codecs and technologies. AV1, the first video codec from AOM, was finalized in June 2018. It achieves a bandwidth reduction of more than 30% over today's state-of-the-art codecs, HEVC and VP9, making it the most advanced royalty-free video codec available today. This talk will provide a glimpse of the most innovative tools in AV1, present current coding results, and update the community on ongoing AV1 adoption in the industry, as well as on plans for AV2.
Debargha Mukherjee received his M.S./Ph.D. degrees in ECE from the University of California, Santa Barbara in 1999. Thereafter, through 2009, he was with Hewlett Packard Laboratories, conducting research on video/image coding and processing. Since 2010 he has been with Google Inc., where he is currently involved in open-source video codec research and development. Prior to that, he was responsible for video quality control and 2D-3D conversion on YouTube. Debargha has authored or co-authored more than 100 papers on various signal processing topics and holds more than 60 US patents, with many more pending. He has delivered many workshops and talks in the last few years on Google's royalty-free line of codecs and, more recently, AV1. He currently serves as an Associate Editor of the IEEE Trans. on Circuits and Systems for Video Technology and has previously served as an Associate Editor of the IEEE Trans. on Image Processing. He is also a member of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP TC).
Learning-based methods for image compression have made significant progress over the past three years primarily due to the application of deep networks and end-to-end optimized models. This talk covers the latest results for such methods, which yield rate-distortion performance comparable to the best deployed lossy image codecs (including HEIF) across several image quality metrics. We will present a brief overview covering several image compression architectures and highlight the primary modeling insights that drove rate-distortion improvement. Currently, the best models use a hierarchical structure and jointly optimize the deep networks used for both nonlinear transform coding and for predicting the parameters of the entropy model. We’ll cover this architecture in more detail and show example images comparing distortion artifacts across several codecs. Additionally, we’ll present random images sampled from the optimized models, which helps visualize the distribution over natural images learned by the networks.
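The joint optimization described above is commonly written in the learned-compression literature as a single rate-distortion objective. The form below is a sketch of that standard formulation rather than a statement of the exact loss used in the talk. The symbols are assumptions introduced here: $x$ is the input image, $\hat{x}$ its reconstruction, $\hat{y}$ the quantized latents produced by the nonlinear transform, $\hat{z}$ the quantized side information from the hierarchical (hyperprior) network that predicts the entropy-model parameters, $d$ a distortion measure, and $\lambda$ the weight that sets the rate-distortion tradeoff.

```latex
\[
\mathcal{L}
  = \underbrace{\mathbb{E}_{x}\!\left[-\log_2 p_{\hat{y}\mid\hat{z}}(\hat{y}\mid\hat{z})\right]}_{\text{rate of the latents}}
  + \underbrace{\mathbb{E}_{x}\!\left[-\log_2 p_{\hat{z}}(\hat{z})\right]}_{\text{rate of the side information}}
  + \lambda\,\underbrace{\mathbb{E}_{x}\!\left[d(x,\hat{x})\right]}_{\text{distortion}}
\]
```

Because all three terms depend on the same network parameters, minimizing $\mathcal{L}$ trains the transforms and the entropy model together. Varying $\lambda$ traces out different points on the rate-distortion curve.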
David Minnen is a Senior Software Engineer at Google where he focuses on deep learning for image and video compression. Previously, he developed user preference models for real-time frame analysis for "smart" camera applications on Android, and he created a vision-based finger tracking and classification system for interactive gestural interfaces at Oblong Industries. David received his Ph.D. at the Georgia Institute of Technology in 2008 where his research on unsupervised time series analysis was funded by an NSF Graduate Research Fellowship.
Johannes Ballé defended his master's and doctoral theses on image and video compression at RWTH Aachen University in 2007 and 2012, respectively. This was followed by a short visit to CSIC in Madrid, Spain, and a four-year stay as a postdoctoral fellow at New York University, where he studied models of the physiology of the human visual system and of visual perception, probabilistic models of natural images, and the connections between the three. While there, he pioneered the use of variational Bayesian modeling and machine learning techniques for end-to-end optimized image compression. He joined Google in early 2017 to continue this line of research. Johannes has served as a reviewer for conferences in the fields of both data compression and machine learning, and for several academic journals, including IEEE Transactions on Circuits and Systems for Video Technology and IEEE Transactions on Image Processing.