This is a series of posts about the RWKV architecture (from v4 to v7).
From MLP-Mixer to MetaFormer and ConvMixer
Double descent is a phenomenon in which a model's test error first decreases, then increases, and then decreases again as model complexity increases.
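
As a rough illustration of that curve (not taken from the post itself), here is a minimal sketch of double descent using minimum-norm least squares on random ReLU features: test error typically worsens as the number of features approaches the number of training samples, then improves again in the overparameterized regime. All names, sizes, and parameters below are illustrative assumptions.

```python
# Minimal double-descent sketch with random-feature regression.
# Assumption: a simple linear teacher plus noise; all sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d = 100, 1000, 10
w_true = rng.normal(size=d)

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

def random_relu_features(X, W):
    """Project inputs through fixed random weights and apply a ReLU."""
    return np.maximum(X @ W, 0.0)

# Sweep model complexity (number of random features) across the
# interpolation threshold at n_features == n_train.
for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Phi_train = random_relu_features(X_train, W)
    Phi_test = random_relu_features(X_test, W)

    # lstsq returns the minimum-norm solution when the system is
    # underdetermined (n_features > n_train); that implicit regularization
    # is what produces the second descent.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"features={n_features:5d}  test MSE={test_mse:.3f}")
```

Printing the test MSE across the sweep should show it falling, spiking near the interpolation threshold, and falling again, which is the double-descent shape described above.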