Holdout Method and Random Subsampling
These are techniques used to validate the performance of a model on unseen data:
- Holdout Method: The dataset is divided into two separate sets, typically a training set (e.g., 70-80% of the data) and a test set (20-30%), with the aim of splitting the data between training and testing as effectively as possible. The model is trained on the training set and evaluated on the test set. This method is straightforward but can lead to high variance in the evaluation metric due to the arbitrary split; see the first sketch after this list.
- Random Subsampling (Repeated Holdout Method): The holdout method is repeated several times, shuffling (randomising) the data before each split. The performance metrics are averaged over all iterations to provide a more robust evaluation, and the run with the minimum error is sometimes reported as well; see the second sketch after this list.