Neural Architecture Transformer: Towards Accurate and Compact Architectures

PROJECT TITLE: Towards Accurate and Compact Architectures via Neural Architecture Transformer

ABSTRACT: Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either designed manually or searched automatically by Neural Architecture Search (NAS) methods. However, even a well-designed or well-searched architecture may still contain many insignificant or redundant modules or operations (e.g., some intermediate convolution or pooling layers). Such redundancy may not only incur substantial memory consumption and computational cost but also degrade performance. It is therefore necessary to optimize the operations inside an architecture so that performance improves without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method, which casts the optimization problem as a Markov Decision Process (MDP) and seeks to replace redundant operations with more efficient ones, such as skip or null connections. Note that NAT considers only a small number of possible replacements (transitions) and therefore has a limited search space. Such a limited search space may hamper the performance of architecture optimization. To address this issue, we propose a Neural Architecture Transformer++ (NAT++) method, which further enlarges the set of candidate transitions to improve the performance of architecture optimization. Specifically, we present a two-level transition rule to obtain valid transitions.
This rule allows an operation to take a more efficient type (for example, convolution $\to$ separable convolution) or a smaller kernel size (for example, $5\times 5 \to 3\times 3$). Note that different operations may have different sets of valid transitions. We further propose a Binary-Masked Softmax (BMSoftmax) layer to exclude invalid transitions. Finally, based on the MDP formulation, we apply policy gradient to learn an optimal policy, which is then used to infer the optimized architectures. Extensive experiments show that the transformed architectures significantly outperform both their original counterparts and the architectures optimized by existing methods.
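
The two-level transition rule and the BMSoftmax layer described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The operation list `OPS`, the cost ranking `TYPE_RANK`, and the helpers `parse`, `valid_mask`, and `bm_softmax` are all hypothetical names chosen for illustration, and the rule is a simplified assumption: a transition is treated as valid when it increases neither the operation-type cost nor the kernel size.

```python
import math

# Hypothetical operation vocabulary; the abstract does not list the exact set.
OPS = ["conv_5x5", "conv_3x3", "sep_conv_5x5", "sep_conv_3x3", "skip", "null"]

# Assumed cost ranking of operation types (higher = more expensive).
TYPE_RANK = {"conv": 3, "sep_conv": 2, "skip": 1, "null": 0}


def parse(op):
    """Split an op name into (type, kernel size); skip/null get kernel size 0."""
    if op in ("skip", "null"):
        return op, 0
    kind, kernel = op.rsplit("_", 1)
    return kind, int(kernel.split("x")[0])


def valid_mask(src):
    """Binary mask over OPS under the assumed two-level rule.

    Level 1: the target type must not be more expensive than the source type.
    Level 2: the target kernel must not be larger than the source kernel.
    """
    s_type, s_kernel = parse(src)
    mask = []
    for dst in OPS:
        d_type, d_kernel = parse(dst)
        ok = TYPE_RANK[d_type] <= TYPE_RANK[s_type] and d_kernel <= s_kernel
        mask.append(1 if ok else 0)
    return mask


def bm_softmax(logits, mask):
    """Softmax restricted to entries with mask == 1; masked entries get probability 0."""
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, b in zip(logits, mask) if b)
    exps = [math.exp(l - top) if b else 0.0 for l, b in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a 3x3 convolution may keep its type, become separable, or be
# replaced by skip/null, but may not grow to a 5x5 kernel.
mask = valid_mask("conv_3x3")
probs = bm_softmax([0.0] * len(OPS), mask)
```

In a policy-gradient setup, the logits would come from the policy network and the next operation would be sampled from `probs`, so invalid transitions are never selected. Masking inside the softmax, rather than rejecting invalid samples afterwards, keeps the distribution normalized over valid transitions only.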