The Spatial-Aware Self-Attention Mechanism in LayoutLMv2

date: Jan 27, 2022
slug: LayoutLMv2
status: Published
tags: Multi-Modal, Deep Learning
summary: It doesn't seem to help much
type: Post
The paper's stated motivation for introducing the Spatial-Aware Self-Attention Mechanism (SASAM):
However, the original self-attention mechanism can only implicitly capture the relationship between the input tokens with the absolute position hints. In order to efficiently model local invariance in the document layout, it is necessary to insert relative position information explicitly.
The pre-softmax attention weight in the original self-attention is computed as:

$$\alpha_{ij} = \frac{1}{\sqrt{d_{head}}}\,(\mathbf{x}_i \mathbf{W}^Q)(\mathbf{x}_j \mathbf{W}^K)^{\top}$$
SASAM adds three relative position encodings on top of this score as bias terms:
  • semantic relative position (b^(1D)): relative position encoding of the semantic (1D) information, generated from the absolute positions via relative_position_bucket
  • spatial relative position (b^(2D_x), b^(2D_y)): relative position encodings of the spatial information along the x and y directions, generated from the normalized x/y coordinates (0–1000) via relative_position_bucket
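Putting these together, the full SASAM score adds the three biases to the original score; this is a reconstruction following the notation of the LayoutLMv2 paper:

```latex
\alpha'_{ij} = \alpha_{ij}
             + \mathbf{b}^{(1D)}_{j-i}
             + \mathbf{b}^{(2D_x)}_{x_j - x_i}
             + \mathbf{b}^{(2D_y)}_{y_j - y_i}
```

Each bias is a learned per-head scalar indexed by the bucketed relative distance, so tokens at the same relative offset share the same bias regardless of their absolute positions.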
Taking `_cal_1d_pos_emb` as an example, here is the code with some tensor-dimension annotations added to help understanding:
```python
def _cal_1d_pos_emb(self, hidden_states, position_ids):
    # method of the self-attention module; F is torch.nn.functional
    """
    position_ids: [batch_size, 512 + 49 = 561]
        512 text positions   -> [0, 1, 2, ..., 511]
        49 visual positions  -> [0, 1, 2, ..., 48]   (7x7 visual feature map)
    position_ids.unsqueeze(-2): [batch_size, 1, 561]
    position_ids.unsqueeze(-1): [batch_size, 561, 1]
    rel_pos_mat: [batch_size, 561, 561]
        [
          [  0,   1,   2, ..., 511,   0,   1, ..., 48],
          [ -1,   0,   1, ..., 510,  -1,   0, ..., 47],
          ...
          [-48, -47, -46, ..., 463, -48, -47, ...,  0],
        ]
    """
    rel_pos_mat = position_ids.unsqueeze(-2) - position_ids.unsqueeze(-1)
    """
    rel_pos: [batch_size, 561, 561]
    # the first 512 rows look like
    [
      [ 0, 17, 18, ..., 30, 31, 31, ..., 31],
      [ 1,  0, 17, 18, ..., 30, 31, 31, ..., 31],
      ...
      [13, 12, 11, 10, ..., 0, 17, 18, ..., 30, 31, 31],
    ]
    """
    rel_pos = relative_position_bucket(
        rel_pos_mat,
        num_buckets=self.rel_pos_bins,
        max_distance=self.max_rel_pos,
    )
    # one-hot bucket indices, then a linear layer produces one bias per head
    rel_pos = F.one_hot(rel_pos, num_classes=self.rel_pos_onehot_size).type_as(hidden_states)
    # [batch_size, num_attention_heads, 561, 561]
    rel_pos = self.rel_pos_bias(rel_pos).permute(0, 3, 1, 2)
    rel_pos = rel_pos.contiguous()
    return rel_pos
```
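The broadcasting step that builds `rel_pos_mat` can be spot-checked with a minimal pure-Python sketch (a toy sequence length instead of the real 561):

```python
# Toy illustration of rel_pos_mat[i][j] = position_ids[j] - position_ids[i],
# mirroring position_ids.unsqueeze(-2) - position_ids.unsqueeze(-1) above.
position_ids = [0, 1, 2, 3]  # toy positions; the real model uses 561

rel_pos_mat = [[pj - pi for pj in position_ids] for pi in position_ids]

print(rel_pos_mat[0])  # [0, 1, 2, 3]    distances seen from token 0
print(rel_pos_mat[2])  # [-2, -1, 0, 1]  distances seen from token 2
```

Row i holds the signed distance from token i to every other token, which is exactly what the bucketing function consumes next.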
```python
import math

import torch


def relative_position_bucket(relative_position, bidirectional=True, num_buckets=32, max_distance=128):
    """
    Args:
        relative_position: [batch_size, 512+49, 512+49]
        bidirectional: use separate bucket ranges for positive and negative offsets
        num_buckets: total number of buckets
        max_distance: offsets at or beyond this all share the last bucket
    Returns:
        bucket indices with the same shape as relative_position
    """
    ret = 0
    if bidirectional:
        num_buckets //= 2  # 16
        # ret: 16 wherever the relative position is positive (upper triangle), else 0
        ret += (relative_position > 0).long() * num_buckets
        # n: symmetric matrix of absolute distances
        n = torch.abs(relative_position)
    else:
        n = torch.max(-relative_position, torch.zeros_like(relative_position))
    # now n is in the range [0, inf)

    # half of the buckets are for exact increments in positions:
    # an "attention window" of up to max_exact - 1 tokens on each side
    max_exact = num_buckets // 2
    is_small = n < max_exact

    # the other half of the buckets are for logarithmically bigger bins up to max_distance
    val_if_large = max_exact + (
        torch.log(n.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).to(torch.long)
    val_if_large = torch.min(val_if_large, torch.full_like(val_if_large, num_buckets - 1))

    ret += torch.where(is_small, n, val_if_large)
    return ret
```
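To get a feel for the bucketing, here is a scalar pure-Python rendition of the same logic for the bidirectional case (`bucket` is a hypothetical helper name for illustration, not part of the original code):

```python
import math


def bucket(rel_pos, num_buckets=32, max_distance=128):
    # Scalar rendition of relative_position_bucket (bidirectional=True)
    # for spot-checking individual values.
    num_buckets //= 2                     # 16: one half per sign direction
    ret = num_buckets if rel_pos > 0 else 0
    n = abs(rel_pos)
    max_exact = num_buckets // 2          # 8: exact buckets for small distances
    if n < max_exact:
        return ret + n
    # logarithmic buckets for distances in [max_exact, max_distance)
    val = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact) * (num_buckets - max_exact)
    )
    return ret + min(val, num_buckets - 1)


print(bucket(0))     # 0  -> same position
print(bucket(5))     # 21 -> small positive offset keeps an exact bucket: 16 + 5
print(bucket(-5))    # 5  -> small negative offset: no sign offset added
print(bucket(100))   # 31 -> large positive offsets saturate at 16 + 15
```

Distances below 8 each keep their own bucket, while larger distances are squashed logarithmically, so distant tokens share increasingly coarse bias values.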
In LayoutXLM, although the authors draw SASAM in Figure 1, the configuration of the officially released LayoutXLM pre-trained model does not actually enable the parameters related to the Spatial-Aware Self-Attention Mechanism, and someone has raised an issue about this on the official repository:

Mismatches between paper descriptions and codes (updated Sep 28, 2021)

In a fine-tuning test on business data (an NER task), enabling the `has_spatial_attention_bias` and `has_relative_attention_bias` parameters actually dropped F1 by 1%; it is unclear whether this is related to the pre-trained model not having these two parameters enabled.
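For reference, these are the SASAM-related fields as they would appear in a LayoutLMv2-style `config.json`. The values shown are the LayoutLMv2-base defaults; treat the exact names and numbers as an assumption to verify against the `config.json` of the checkpoint you actually load:

```json
{
  "has_relative_attention_bias": true,
  "has_spatial_attention_bias": true,
  "rel_pos_bins": 32,
  "max_rel_pos": 128,
  "rel_2d_pos_bins": 64,
  "max_rel_2d_pos": 256
}
```

If the released checkpoint was pre-trained with both flags set to `false`, flipping them on at fine-tuning time initializes the bias layers from scratch, which may explain the F1 drop observed above.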

© PanicByte 2021 - 2022