The bag-of-systems (BoS) representation is a descriptor of motion in a video, where dynamic texture (DT) codewords represent the typical motion patterns in spatio-temporal patches extracted from the video. The efficacy of the BoS descriptor depends on the richness of the codebook, which in turn depends on the number of codewords in the codebook. However, even for modest-sized codebooks, mapping videos onto the codebook results in a heavy computational load. In this paper we propose the BoS Tree, which constructs a bottom-up hierarchy of codewords that enables efficient mapping of videos to the BoS codebook. By leveraging the tree structure to efficiently index the codewords, the BoS Tree allows for fast look-ups in the codebook and enables the practical use of larger, richer codebooks. We demonstrate the effectiveness of BoS Trees on classification of four video datasets, as well as on annotation of a video dataset and a music dataset. Finally, we show that, although the fast look-ups of the BoS Tree result in different descriptors than BoS for the same video, the distance (and kernel) matrices are highly correlated, resulting in similar classification performance.
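
The idea of indexing codewords through a bottom-up hierarchy can be illustrated with a minimal sketch. It rests on strong simplifying assumptions: each codeword and each patch is a plain numeric vector standing in for a DT model, squared Euclidean distance stands in for whatever dissimilarity is used between DTs, and the parent nodes are formed by a simple greedy grouping rather than by the clustering procedure of the paper. All names (`Node`, `build_tree`, `tree_lookup`, `flat_lookup`, `bos_histogram`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed stand-in for a DT dissimilarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


@dataclass
class Node:
    center: Vector
    children: List["Node"] = field(default_factory=list)
    codeword_id: Optional[int] = None  # set only on leaves


def build_tree(codewords: List[Vector], branching: int = 2) -> Node:
    """Bottom-up construction: leaves are the codewords, and each pass
    greedily groups nearby nodes under a parent until one root remains."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    if not codewords:
        raise ValueError("codebook is empty")
    level = [Node(center=list(c), codeword_id=i) for i, c in enumerate(codewords)]
    while len(level) > 1:
        remaining = list(level)
        parents = []
        while remaining:
            seed = remaining.pop(0)
            # Illustrative greedy grouping: seed plus its nearest neighbours.
            remaining.sort(key=lambda n: sq_dist(n.center, seed.center))
            group = [seed] + remaining[: branching - 1]
            remaining = remaining[branching - 1:]
            parents.append(Node(center=mean([g.center for g in group]), children=group))
        level = parents
    return level[0]


def tree_lookup(root: Node, patch: Vector) -> int:
    """Descend from the root, following the closest child at each level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: sq_dist(c.center, patch))
    return node.codeword_id


def flat_lookup(codewords: List[Vector], patch: Vector) -> int:
    """Baseline: compare the patch against every codeword."""
    return min(range(len(codewords)), key=lambda i: sq_dist(codewords[i], patch))


def bos_histogram(root: Node, patches: List[Vector], n_codewords: int) -> List[float]:
    """Normalized codeword-occurrence histogram using tree look-ups."""
    counts = [0] * n_codewords
    for p in patches:
        counts[tree_lookup(root, p)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_codewords
```

In this sketch a flat look-up compares a patch against every codeword, whereas the tree look-up compares it only against the children along one root-to-leaf path. Because the descent is greedy, it may return a different codeword than the exhaustive search for some patches, which is consistent with the observation that the two approaches can yield different descriptors for the same video.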

