Acceleration of H.266 Encoding Using OpenCL and Vectorization with Block Size Variation
Date
2025-06
Authors
Michael Girma
Publisher
Addis Ababa University
Abstract
Versatile Video Coding (H.266/VVC) achieves a bitrate reduction of approximately 50%
compared to its predecessor, H.265/HEVC. However, this improvement in compression
efficiency comes with a significant increase in computational complexity, presenting
major challenges for real-time encoding on general-purpose processors. Most
existing H.266 (VVC) implementations rely heavily on CPU-only processing or
on vendor-specific GPU solutions such as CUDA, which limits portability and
cross-platform compatibility. Moreover, these approaches often fail to fully utilize
modern heterogeneous CPU-GPU architectures, leaving substantial performance
potential unexploited. This work proposes an OpenCL-based H.266 encoding solution
aimed at delivering high performance, broad cross-platform support, and
efficient hardware utilization. Key encoding modules, including block partitioning,
prediction, transform and quantization, loop filtering, and entropy coding, are implemented
as OpenCL kernels to leverage task-level parallelism across both CPUs
and GPUs. Additionally, AVX and SSE vectorization techniques are applied on
the CPU side to enhance per-core throughput, particularly in compute-intensive
operations such as transform and quantization. Experimental results across various
platforms demonstrate significant performance improvements. On an NVIDIA
V100 GPU, the OpenCL-accelerated encoder achieves speedups of up to 7500×
compared to a sequential implementation running on an Intel Xeon E5-2698 v4,
with peak efficiency observed at a block size of 512×512. Tests conducted on an
Intel UHD 620 GPU and an Intel i5-8265U CPU reveal speedups ranging from
15.5× to 370×, depending on the block size. The findings suggest that medium
block sizes (64×64 to 256×256) strike the best balance between computational
efficiency and workload distribution. AVX provides only modest gains over
SSE, which suggests that the primary performance bottleneck is memory access speed rather than
computational power. Overall, the proposed OpenCL-based implementation significantly
accelerates H.266 encoding while maintaining high compression quality.
Keywords
H.266/VVC, OpenCL, GPU Acceleration, CPU Optimization, Video Encoding