Machine-learned classification and ranking techniques often use ensembles that aggregate partial scores over feature vectors to achieve high accuracy, but runtime score computation can become expensive when the ensemble contains many models. Exploiting the memory hierarchy of a modern CPU architecture can effectively shorten score computation time. However, different data access patterns and blocking parameter settings exhibit different cache and cost behavior depending on data and architectural characteristics. This project provides a detailed theoretical analysis and comparison of cache blocking methods with respect to their data access performance. The project evaluates three datasets and several applications, demonstrating the effectiveness of our theoretical analysis and proposed approach.
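To make the idea of cache blocking for ensemble scoring concrete, the following is a minimal sketch, not the project's actual method: it assumes a hypothetical ensemble of decision stumps and blocks over documents, so that each block of feature vectors stays cache-resident while every ensemble member's partial score is accumulated against it. The stump representation, block size, and function names are illustrative assumptions.

```python
import numpy as np

def blocked_ensemble_scores(X, stumps, block_docs=256):
    """Score documents with a stump ensemble, blocking over documents.

    X: (n_docs, n_features) feature vectors.
    stumps: list of (feature_index, threshold, left_value, right_value);
            a hypothetical stand-in for the ensemble members.
    block_docs: blocking parameter; a block of documents is reused by
            every stump before moving on, improving cache locality
            versus scoring one document against all stumps at a time.
    """
    n_docs = X.shape[0]
    scores = np.zeros(n_docs)
    for start in range(0, n_docs, block_docs):
        blk = X[start:start + block_docs]        # block stays cache-resident
        acc = np.zeros(blk.shape[0])
        for feat, thresh, left, right in stumps:
            # accumulate each ensemble member's partial score for the block
            acc += np.where(blk[:, feat] <= thresh, left, right)
        scores[start:start + block_docs] = acc
    return scores
```

Varying `block_docs` (and, symmetrically, blocking over ensemble members instead of documents) yields the different access patterns whose cache behavior the analysis compares.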