Image Quality Assessment
Dual-Branch Vision Transformer for Blind Image Quality Assessment

Blind image quality assessment (BIQA) aims to predict the perceptual quality of an image without access to a reference image. We propose a dual-branch vision transformer that jointly considers local distortions and global semantic information. Dual-scale features (S-Feature and L-Feature) are extracted from a ResNet-50 backbone and fed into separate transformer encoder branches. Each branch captures scale-variant local distortions through local feature embeddings and models global distortion context through content-aware IQA (CA-IQA) embeddings. The outputs of both branches are combined through feed-forward blocks to predict the final image quality score.
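
The data flow described above can be sketched in PyTorch. This is a minimal illustration under several assumptions, not the authors' implementation: the class names `Branch` and `DualBranchIQA`, the channel counts and spatial sizes of the two feature maps, the embedding width, and the use of a mean-pooled token as a stand-in for the CA-IQA embedding are all hypothetical. The ResNet-50 backbone is replaced by random tensors, and positional embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn


class Branch(nn.Module):
    """One transformer encoder branch over a single-scale feature map."""

    def __init__(self, in_channels: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # 1x1 convolution maps backbone channels to the token width.
        self.proj = nn.Conv2d(in_channels, d_model, kernel_size=1)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = self.proj(feat)                             # (B, d, H, W)
        local = x.flatten(2).transpose(1, 2)            # (B, H*W, d) local tokens
        # Assumption: a mean-pooled token stands in for the CA-IQA embedding.
        global_tok = local.mean(dim=1, keepdim=True)    # (B, 1, d)
        tokens = torch.cat([global_tok, local], dim=1)  # (B, 1 + H*W, d)
        return self.encoder(tokens)[:, 0]               # (B, d) global token output


class DualBranchIQA(nn.Module):
    """Two branches (S-Feature, L-Feature) fused by a feed-forward head."""

    def __init__(self, s_channels: int = 1024, l_channels: int = 2048, d_model: int = 64):
        super().__init__()
        self.s_branch = Branch(s_channels, d_model)
        self.l_branch = Branch(l_channels, d_model)
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, s_feat: torch.Tensor, l_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.s_branch(s_feat), self.l_branch(l_feat)], dim=1)
        return self.head(fused).squeeze(-1)             # (B,) one score per image


torch.manual_seed(0)
model = DualBranchIQA().eval()
# Placeholder feature maps in place of real ResNet-50 outputs (hypothetical shapes).
s_feat = torch.randn(2, 1024, 14, 14)
l_feat = torch.randn(2, 2048, 7, 7)
with torch.no_grad():
    scores = model(s_feat, l_feat)
```

Here `scores` holds one untrained quality prediction per image in the batch. In practice the two inputs would come from intermediate stages of a pretrained backbone, and the model would be trained by regressing against subjective quality scores.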

SRCC (Spearman rank-order correlation coefficient) on each dataset:

| Method | LIVEC | TID2013 | LIVE | CSIQ | LIVE MD | KADID-10k |
|---|---|---|---|---|---|---|
| BRISQUE | 0.608 | 0.604 | 0.939 | 0.746 | 0.886 | 0.528 |
| M3 | 0.607 | 0.689 | 0.951 | 0.795 | 0.892 | - |
| FRIQUEE | 0.682 | 0.680 | 0.940 | 0.835 | 0.923 | - |
| CORNIA | 0.629 | 0.678 | 0.947 | 0.678 | 0.899 | - |
| HOSA | 0.640 | 0.735 | 0.946 | 0.741 | 0.913 | - |
| Le-CNN | - | - | 0.956 | - | - | - |
| BIECON | 0.595 | 0.717 | 0.961 | 0.815 | 0.909 | 0.623 |
| DIQaM-NR | 0.606 | 0.835 | 0.960 | - | - | - |
| WaDIQaM-NR | 0.671 | 0.761 | 0.954 | - | - | 0.739 |
| ResNet-ft | 0.819 | 0.712 | 0.950 | 0.876 | 0.909 | - |
| IW-CNN | 0.663 | 0.800 | 0.963 | 0.812 | 0.914 | - |
| DBCNN | 0.851 | 0.816 | 0.968 | 0.946 | 0.927 | 0.851 |
| HyperIQA | 0.859 | 0.797 | 0.962 | 0.923 | 0.898 | 0.852 |
| TReS | 0.846 | 0.863 | 0.969 | 0.922 | 0.916 | 0.859 |
| BIQA, M.D. | - | 0.835 | 0.969 | 0.903 | - | - |
| RNSA | 0.871 | 0.849 | 0.969 | 0.931 | - | 0.855 |
| Proposed | 0.862 | 0.877 | 0.976 | 0.942 | 0.935 | 0.970 |

PLCC (Pearson linear correlation coefficient) on each dataset:

| Method | LIVEC | TID2013 | LIVE | CSIQ | LIVE MD | KADID-10k |
|---|---|---|---|---|---|---|
| BRISQUE | 0.629 | 0.694 | 0.935 | 0.829 | 0.917 | 0.567 |
| M3 | 0.630 | 0.771 | 0.950 | 0.839 | 0.919 | - |
| FRIQUEE | 0.705 | 0.753 | 0.944 | 0.874 | 0.934 | - |
| CORNIA | 0.671 | 0.768 | 0.950 | 0.776 | 0.921 | - |
| HOSA | 0.678 | 0.815 | 0.947 | 0.823 | 0.926 | - |
| Le-CNN | - | - | 0.953 | - | - | - |
| BIECON | 0.613 | 0.762 | 0.962 | 0.823 | 0.933 | 0.648 |
| DIQaM-NR | 0.601 | 0.855 | 0.972 | - | - | - |
| WaDIQaM-NR | 0.680 | 0.787 | 0.963 | - | - | 0.752 |
| ResNet-ft | 0.849 | 0.756 | 0.954 | 0.905 | 0.920 | - |
| IW-CNN | 0.705 | 0.802 | 0.964 | 0.791 | 0.929 | - |
| DBCNN | 0.869 | 0.865 | 0.971 | 0.959 | 0.869 | 0.856 |
| HyperIQA | 0.882 | 0.823 | 0.966 | 0.942 | 0.924 | 0.845 |
| TReS | 0.877 | 0.883 | 0.968 | 0.942 | 0.921 | 0.858 |
| BIQA, M.D. | - | 0.859 | 0.978 | 0.925 | - | - |
| RNSA | 0.883 | 0.861 | 0.972 | 0.959 | - | 0.859 |
| Proposed | 0.882 | 0.894 | 0.976 | 0.952 | 0.935 | 0.971 |
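
For readers unfamiliar with the two metrics reported above, the following pure-Python sketch shows how SRCC and PLCC can be computed from predicted scores and subjective scores. The function names and the sample values are illustrative assumptions, not data from the paper. Many IQA studies fit a nonlinear logistic mapping before computing PLCC; that step is omitted here, and this page does not state whether it was applied.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predictions and mean opinion scores for five images.
predicted = [0.82, 0.40, 0.65, 0.10, 0.93]
mos = [70.0, 35.0, 50.0, 20.0, 95.0]

srcc = spearman(predicted, mos)  # both lists have the same ordering
plcc = pearson(predicted, mos)
```

SRCC depends only on the ordering of the scores, so it measures prediction monotonicity, while PLCC measures how linearly the predictions track the subjective scores. Both range from -1 to 1, with higher values indicating better agreement.
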
Publications
- Se-Ho Lee and Seung-Wook Kim, “Dual-branch vision transformer for blind image quality assessment,” Journal of Visual Communication and Image Representation, vol. 94, Art. no. 103850, Jun. 2023. [DOI]