iBERT: Interpretable BERT

Interpretable Embeddings via Sense Decomposition

Vishal Anand, Milad Alshomary, Kathleen McKeown

EACL 2026 (Oral)

[Figure: Overview. An input sequence with variation in expression is tokenized and passed through iBERT encoder blocks with <Key, Query> attention, yielding sparse k senses (ℓ₀) per token. The interpretable activations support controlled edits, giving an interpretable and editable embedding. Pooling variants by temperature τ: top-sense pooling (τ → 0), mean pooling (τ → ∞), softmax blend (τ = 1), sharpened blend (τ = 10).]
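
The pooling variants named in the overview figure can be illustrated with a small sketch. This is not the authors' implementation: the function name `pool_senses`, the per-sense scores, and the choice of `softmax(scores / tau)` are assumptions made here because that form reproduces the two stated limits (top-sense pooling as τ → 0, mean pooling as τ → ∞). Under this convention a larger τ flattens the blend, so the figure's "sharpened" label for τ = 10 may correspond to a different parameterization in the paper.

```python
import numpy as np

def pool_senses(sense_vectors, sense_scores, tau):
    """Blend k sense vectors (k x d) with weights softmax(sense_scores / tau).

    Hypothetical helper: the score source and the temperature convention
    are assumptions, not taken from the paper.
    """
    sense_vectors = np.asarray(sense_vectors, dtype=float)
    sense_scores = np.asarray(sense_scores, dtype=float)
    k = sense_scores.shape[0]

    if tau == 0:
        # Limit tau -> 0: all weight on the highest-scoring sense.
        weights = np.zeros(k)
        weights[np.argmax(sense_scores)] = 1.0
    elif np.isinf(tau):
        # Limit tau -> infinity: uniform weights, i.e. mean pooling.
        weights = np.full(k, 1.0 / k)
    else:
        z = sense_scores / tau
        z = z - z.max()  # subtract the max for numerical stability
        weights = np.exp(z) / np.exp(z).sum()

    return weights @ sense_vectors, weights
```

Calling `pool_senses(vectors, scores, tau=1.0)` gives the plain softmax blend; passing `tau=0` or `tau=float("inf")` gives the two limiting cases directly.
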
[Figure: iBERT architecture. Tokens x_{1:n} ∈ 𝒱^n are embedded as e_i = E x_i (e_{1:n}, n × d) and passed through L standard Transformer encoder blocks (or decoder blocks without autoregressive masks), giving the hidden-layer output H = h_{1:n} (n × d). For each sense ℓ = 1, ..., k, H is multiplied by the corresponding matrices Q^{(ℓ)} and K^{(ℓ)} (each d × d/k): Q_ℓ = H (Q^{(ℓ)})^T and K_ℓ = H (K^{(ℓ)})^T, each n × (d/k). The k pre-normalized activations Q_ℓ K_ℓ^T (each n × n) are softmax-normalized per sense to give α (k × n × n). Sense-construction blocks compute C(x_{1:n}) (n × k × d) independently per token. The final sense representations are

o_i = \sum_{j=1}^{n} \sum_{\ell=1}^{k} \alpha_{\ell,i,j} \, C(x_j)_\ell    (o: n × d),

where α_{ℓ,i,j} is how much sense ℓ of the j-th token contributes to the representation at the i-th position. Each o_i is projected into vocabulary space as σ(E^T o_i). Objective: masked language modeling with 15% of tokens masked.]
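
The computation in the architecture figure can be traced with a minimal NumPy sketch. It is an illustration under stated assumptions, not the released model: the function names, the weight layout `(k, d, d/k)` (chosen so each per-sense projection yields an n × d/k matrix), the softmax axis (over source positions j), the absence of any score scaling, and reading σ as a softmax over the vocabulary are all choices made here for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sense_mixture(H, Wq, Wk, C):
    """Combine per-token sense vectors with per-sense attention weights.

    H:      (n, d)       hidden-layer output of the encoder stack
    Wq, Wk: (k, d, d//k) per-sense query/key projections (assumed layout)
    C:      (n, k, d)    sense vectors C(x_j), computed independently per token
    Returns o with shape (n, d) and alpha with shape (k, n, n).
    """
    Q = np.einsum("nd,kdm->knm", H, Wq)   # (k, n, d/k)
    K = np.einsum("nd,kdm->knm", H, Wk)   # (k, n, d/k)
    scores = Q @ K.transpose(0, 2, 1)     # (k, n, n) pre-normalized activations
    alpha = softmax(scores, axis=-1)      # assumed: normalize over j for each sense
    # o_i = sum_j sum_l alpha[l, i, j] * C[j, l]
    o = np.einsum("lij,jld->id", alpha, C)
    return o, alpha

def vocab_probs(o, E):
    """Project o (n, d) into vocabulary space with E of shape (d, |V|)."""
    return softmax(o @ E, axis=-1)        # assumed: sigma is a softmax
```

Because `alpha[l, i, j]` is an explicit array, the contribution of sense ℓ of token j to position i can be read off or modified directly before the final sum.
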

iBERT vs BERT Embedding Distribution



Results

N.B. BERT (gray) in the WG+SD setup was the prior state of the art (Patel et al., StyleDistance, ACL 2025).

Interpretable editability

Citation

To appear in Proceedings of EACL (Main) 2026 (Oral)

@misc{anand2026ibertinterpretableembeddingssense,
      title={iBERT: Interpretable Embeddings via Sense Decomposition}, 
      author={Vishal Anand and Milad Alshomary and Kathleen McKeown},
      year={2026},
      eprint={2510.09882},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.09882}, 
}