HyperAI

Bootstrap Sampling / Repeatable Sampling / Sampling With Replacement

For a given sample, the probability of being selected in a single random draw from a training set containing $m$ samples is $\frac{1}{m}$, and the probability of not being selected is $1 - \frac{1}{m}$.

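Because the draws are independent, the probability that a given sample is never selected in $m$ draws with replacement is $\left(1 - \frac{1}{m}\right)^m$. As $m \to \infty$, $\left(1 - \frac{1}{m}\right)^m \to \frac{1}{e} \approx 0.368$; that is, in each round of bootstrap sampling, approximately 36.8% of the data in the training set does not appear in the sampling set.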
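
The 36.8% figure can be checked numerically. The following is a minimal sketch, assuming the training set is represented by the indices 0 to m-1; the function name `out_of_bag_fraction`, the seed, and the value of `m` are illustrative choices, not part of the original entry.

```python
import math
import random


def out_of_bag_fraction(m: int, seed: int = 0) -> float:
    """Draw m indices with replacement from range(m) and
    return the fraction of indices that were never drawn."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    drawn = {rng.randrange(m) for _ in range(m)}  # distinct indices seen
    return 1 - len(drawn) / m


m = 100_000                      # assumed training-set size
simulated = out_of_bag_fraction(m)
exact = (1 - 1 / m) ** m         # probability a given sample is never drawn
limit = 1 / math.e               # limiting value as m grows

print(f"simulated={simulated:.4f}  exact={exact:.4f}  1/e={limit:.4f}")
```

For a large `m`, all three values should agree closely, which is consistent with roughly 36.8% of the training set being left out of each bootstrap sample.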

After sampling with replacement, the resulting data set contains some duplicated samples and omits others. When $K$ samples are drawn with replacement from $N$ samples, the expected number of distinct samples is $U(K) = N\left(1 - \left(\frac{N-1}{N}\right)^K\right)$.
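
The expectation follows from the fact that each of the N samples is missed by all K draws with probability ((N-1)/N)^K, so it appears at least once with probability 1 - ((N-1)/N)^K; summing over the N samples gives U(K). The sketch below compares the formula with a simple simulation. The function names, the seed, the number of trials, and the values of `n` and `k` are assumptions made for illustration.

```python
import random


def expected_distinct(n: int, k: int) -> float:
    """Closed-form expectation U(K) = N * (1 - ((N - 1) / N) ** K)."""
    return n * (1 - ((n - 1) / n) ** k)


def simulated_distinct(n: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Average number of distinct indices seen when drawing k times
    with replacement from range(n), over the given number of trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(n) for _ in range(k)})
    return total / trials


n, k = 50, 30  # assumed population size and number of draws
formula_value = expected_distinct(n, k)
simulated_value = simulated_distinct(n, k)

print(f"formula={formula_value:.3f}  simulated={simulated_value:.3f}")
```

Setting `k` equal to `n` and letting `n` grow recovers the earlier result: the expected share of distinct samples approaches 1 - 1/e, or about 63.2%.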