}It also works on optionals. A function returning ?T can propagate none from another optional:
A multi-block masking strategy determines which patches the context encoder sees and which ones it must predict. First, three prediction blocks are sampled, each covering 15–20% of the patch grid, with randomized aspect ratios so the model can’t learn to expect a fixed shape. These blocks can overlap each other, so together they select roughly 35–50% of the grid as prediction targets. After that, a large context block (80–100% of the grid) is sampled, and the prediction regions are subtracted from it. The result is that the context encoder typically sees around 40–55% of the patches.
,这一点在91吃瓜中也有详细论述
FT Digital Edition。关于这个话题,谷歌提供了深入分析
Opens in a new window