HMOO Reading Notes: Solving Stereo Matching with Deep Learning

The previous post, Stereo Matching: Semi-Global Matching, listed some references on traditional stereo matching. This post covers several papers that tackle stereo matching with deep learning.

GC-Net

The GC-Net paper [1] is titled End-to-End Learning of Geometry and Context for Deep Stereo Regression. Let's go straight to the architecture diagram:

Some notes:

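The input is a pair of stereo images. Each image passes through 2D convolutions and residual blocks to produce a feature map.
The two feature maps are concatenated into a four-dimensional cost volume of size H × W × D × F, where D is the number of candidate disparity values and F is the feature dimension.
3D convolutions learn the mapping from the concatenated features to matching costs, operating jointly over height, width, and disparity.
3D deconvolutions (transposed convolutions) upsample the cost volume.
Soft argmax picks out the most likely disparity. Its formula is \(\sum_{d=0}^{D_{max}} d \times \sigma(-c_{d})\): negate the values in the cost volume so that elements with lower cost get larger values, apply softmax (the \(\sigma\) function), and then take the weighted average over disparities to get the final disparity.
The loss function is the L1 norm between the ground-truth disparity and the estimated disparity.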
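
The cost volume construction, soft argmax, and L1 loss from the notes above can be sketched in a few lines. This is a minimal illustration in plain Python lists for a single image row, not the paper's implementation. The function names, the zero-filling of positions with no valid match, and the convention that the last dimension holds 2F values (F channels from each image) are all assumptions of this sketch; a real network would use a tensor library.

```python
import math


def concat_cost_volume_row(left_row, right_row, max_disp):
    """Build a (max_disp + 1) x W x 2F concatenation volume for one image row.

    left_row / right_row: W feature vectors, each a list of F floats.
    At disparity d, left pixel x is paired with right pixel x - d.
    Pixels with no valid partner are zero-filled (an assumption of this sketch).
    """
    width, feat = len(left_row), len(left_row[0])
    volume = []
    for d in range(max_disp + 1):
        plane = []
        for x in range(width):
            if x - d >= 0:
                plane.append(left_row[x] + right_row[x - d])  # list concatenation
            else:
                plane.append([0.0] * (2 * feat))
        volume.append(plane)
    return volume


def soft_argmax(costs):
    """Expected disparity sum_d d * softmax(-c)_d for one pixel's cost vector."""
    scores = [-c for c in costs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum(d * e / total for d, e in enumerate(exps))


def l1_loss(predicted, target):
    """Mean absolute disparity error over a list of pixels."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)
```

For a cost vector with one clear minimum at d = 2, such as [10, 10, 0, 10, 10], soft_argmax returns a value very close to 2. Unlike a hard argmin, it is differentiable and can produce sub-pixel estimates.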

GA-Net

The GA-Net paper [2] is titled GA-Net: Guided Aggregation Net for End-to-end Stereo Matching. The method combines GC-Net with SGM. Here is the architecture diagram:

Some notes:

The feature map and cost volume stages are the same as in GC-Net.
The SGA layer (Semi-Global Guided Aggregation) is modeled on the SGM algorithm. The formula for this layer is:
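
Written out as an equation, the SGA recurrence looks roughly as follows. This is reconstructed from the line-by-line description in these notes and from my reading of [2]; the symbol names (\(C\) for the input cost, \(C^{A}_{r}\) for the cost aggregated along direction \(r\), \(w_0 \dots w_4\) for the weights) should be checked against the paper.

\[
\begin{aligned}
C^{A}_{r}(p, d) = \mathrm{sum}\Big\{\; & w_0(p, r)\, C(p, d), \\
& w_1(p, r)\, C^{A}_{r}(p - r, d), \\
& w_2(p, r)\, C^{A}_{r}(p - r, d - 1), \\
& w_3(p, r)\, C^{A}_{r}(p - r, d + 1), \\
& w_4(p, r)\, \max_{i} C^{A}_{r}(p - r, i) \;\Big\}, \\
\text{s.t.}\quad & \sum_{i=0}^{4} w_i(p, r) = 1 .
\end{aligned}
\]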
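The first line is the cost from the original cost volume. The second line is the cost at point p - r for disparity d, where r is one of the four directions. The third and fourth lines are the costs for disparities that differ by 1, and the fifth line covers all remaining cases, where the disparity differs by more than 1. To keep the cost from growing without bound, the five weights are constrained to sum to 1.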
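Where do these weights come from? They are learned by the upper branch, called the Guidance Branch. If the cost volume has size H × W × D × F, the guidance branch must supply weights of size 4 (directions) × 5 (weights) × H × W × F; the same weights are shared across all D disparity slices.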
The LGA layer (Local Guided Aggregation) can be viewed as a refinement of the disparity map. The formula for the LGA layer is:
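
As an equation, the LGA aggregation can be written roughly as below. This is reconstructed from the description of the K × K × 3 filter in these notes and from my reading of [2]; the symbols (\(N_p\) for the K × K neighborhood of \(p\), \(\omega_0, \omega_1, \omega_2\) for the three weight planes) are assumptions to be checked against the paper.

\[
\begin{aligned}
C^{A}(p, d) = \mathrm{sum}\Big\{\; & \sum_{q \in N_p} \omega_0(p, q)\, C(q, d), \\
& \sum_{q \in N_p} \omega_1(p, q)\, C(q, d - 1), \\
& \sum_{q \in N_p} \omega_2(p, q)\, C(q, d + 1) \;\Big\}, \\
\text{s.t.}\quad & \sum_{q \in N_p} \big( \omega_0(p, q) + \omega_1(p, q) + \omega_2(p, q) \big) = 1 .
\end{aligned}
\]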
This local filter has size K × K × 3. In the paper's example K is 5, so aggregation runs over a local 5 × 5 window, and the 3 stands for aggregation over three disparities (d, d - 1, d + 1). The weight layer therefore has size H × W × 75. These weights also come from the Guidance Branch.
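
To make the SGA step concrete, here is a minimal sketch of one aggregation pass along a single direction (left to right) on one scanline, in plain Python. It is an illustration under stated assumptions, not GA-Net's implementation: the function name is hypothetical, the cost has a single channel, the weights are taken as already normalized to sum to 1, the fifth term uses the maximum over the previous pixel's aggregated costs (my reading of [2]), and out-of-range disparities at the boundary are clamped.

```python
def sga_left_to_right(cost, weights):
    """Aggregate one scanline along direction r = +1 (left to right).

    cost[x][d]: matching cost at pixel x, disparity d.
    weights[x]: five weights (w0..w4) for pixel x, assumed to sum to 1
                and shared by every disparity at that pixel.
    """
    width, num_disp = len(cost), len(cost[0])
    # The first pixel has no predecessor: keep its raw cost (assumption).
    agg = [list(cost[0])]
    for x in range(1, width):
        w0, w1, w2, w3, w4 = weights[x]
        prev = agg[x - 1]
        prev_max = max(prev)
        row = []
        for d in range(num_disp):
            # Clamp d - 1 and d + 1 at the disparity boundary (assumption).
            minus = prev[d - 1] if d > 0 else prev[d]
            plus = prev[d + 1] if d + 1 < num_disp else prev[d]
            row.append(
                w0 * cost[x][d]
                + w1 * prev[d]
                + w2 * minus
                + w3 * plus
                + w4 * prev_max
            )
        agg.append(row)
    return agg
```

Because each output is a convex combination of the raw cost and previously aggregated costs, the aggregated values never exceed the largest input cost, which is the point of the sum-to-1 constraint. A full layer would repeat this pass in all four directions.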

ACVNet

The ACVNet paper [3] is titled Attention Concatenation Volume for Accurate and Efficient Stereo Matching. It designs an attention concatenation volume to replace earlier cost volume designs. First, here is how it compares with other methods:

Here is the architecture diagram:

The attention mainly comes from MAPM (multi-level adaptive patch matching), which uses atrous convolution (dilated convolution) to compute the feature map, so that a separate attention is obtained at each scale. Once the attention map has been computed, it is multiplied element-wise with the concatenated cost volume to give the attention concatenation volume. The layers after that closely resemble those in other papers: cost aggregation and disparity prediction. Finally, here is an illustration of atrous convolution:
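
The two operations mentioned above, atrous convolution and element-wise attention filtering, can be illustrated with a small plain-Python sketch. This is a simplified 1-D illustration, not ACVNet's code: the function names are hypothetical, the convolution is a "valid" 1-D correlation, and the attention is assumed to hold one weight per (disparity, pixel) position that scales every feature channel there.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D atrous correlation: kernel taps sit `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def apply_attention(concat_volume, attention):
    """Scale every feature channel at (d, x) by the attention weight at (d, x).

    concat_volume[d][x]: list of feature values; attention[d][x]: one float.
    """
    return [
        [[a * f for f in feats] for feats, a in zip(plane, att_plane)]
        for plane, att_plane in zip(concat_volume, attention)
    ]
```

With dilation 1 this reduces to an ordinary sliding window. Increasing the dilation widens the receptive field without adding kernel weights, which is what lets the same small kernel look at several scales.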

References

[1] End-to-End Learning of Geometry and Context for Deep Stereo Regression

[2] GA-Net: Guided Aggregation Net for End-to-end Stereo Matching

[3] Attention Concatenation Volume for Accurate and Efficient Stereo Matching

Saturday, April 16, 2022
