Bootstrap Sampling / Repeatable Sampling / Sampling With Replacement
Consider a training set containing m samples. In a single random draw, the probability that a given sample is collected is $\frac{1}{m}$, and the probability that it is not collected is $1 - \frac{1}{m}$.
Since the draws are independent, the probability that a given sample is never collected in m draws is $\left(1 - \frac{1}{m}\right)^m$. As $m \to \infty$, $\left(1 - \frac{1}{m}\right)^m \to \frac{1}{e} \approx 0.368$. That is, in each round of random sampling, approximately 36.8% of the data in the training set does not appear in the sampling set.
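The limit can be checked numerically. The following is a minimal sketch, assuming a training set represented only by its indices 0..m-1; the function names `never_drawn_probability` and `out_of_bag_fraction` are hypothetical and chosen for illustration.

```python
import math
import random


def never_drawn_probability(m):
    """Closed form: probability that one given sample is missed by all m draws."""
    return (1 - 1 / m) ** m


def out_of_bag_fraction(m, seed=0):
    """Draw m indices with replacement and return the fraction never drawn.

    Hypothetical helper: the training set is modelled as indices 0..m-1.
    """
    rng = random.Random(seed)
    drawn = {rng.randrange(m) for _ in range(m)}  # distinct indices collected
    return 1 - len(drawn) / m


if __name__ == "__main__":
    for m in (10, 100, 10_000):
        print(m, never_drawn_probability(m), out_of_bag_fraction(m))
    print("1/e =", math.exp(-1))
```

For large m, both the closed form and the simulated fraction should land close to 1/e; the simulated value fluctuates from seed to seed.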
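After sampling with replacement, the sampling set contains some duplicated data and is missing some of the original data. If K samples are drawn with replacement from N samples, each original sample is missed by all K draws with probability $\left(\frac{N-1}{N}\right)^K$, so the expected number of distinct samples is $U(K) = N\left(1 - \left(\frac{N-1}{N}\right)^K\right)$.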
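The formula for U(K) can be compared against a simple simulation. This is an illustrative sketch only: `expected_distinct` and `simulated_distinct` are hypothetical names, and the N samples are again modelled as indices 0..N-1.

```python
import random


def expected_distinct(n, k):
    """U(K) = N * (1 - ((N - 1) / N) ** K), the expected number of distinct samples."""
    return n * (1 - ((n - 1) / n) ** k)


def simulated_distinct(n, k, trials=2000, seed=0):
    """Average number of distinct indices seen when drawing k times from n with replacement.

    Hypothetical helper used only to sanity-check the closed form.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n) for _ in range(k)})
    return total / trials


if __name__ == "__main__":
    n, k = 100, 100
    print("closed form:", expected_distinct(n, k))
    print("simulation: ", simulated_distinct(n, k))
```

With K = N the closed form reduces to N(1 - (1 - 1/N)^N), which is roughly 0.632 N for large N, consistent with about 36.8% of the data being left out.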