NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints, Yu Ji (Tsinghua University), YouHui Zhang (Tsinghua University), ShuangChen Li (University of California, Santa Barbara), Ping Chi (University of California, Santa Barbara), CiHang Jiang (Tsinghua University), Peng Qu (Tsinghua University), Yuan Xie (University of California, Santa Barbara), WenGuang Chen (Tsinghua University)

Cambricon-X: An Accelerator for Sparse Neural Networks, Shijin Zhang (Chinese Academy of Sciences), Zidong Du (Chinese Academy of Sciences), Lei Zhang (Chinese Academy of Sciences), Huiying Lan (Chinese Academy of Sciences), Shaoli Liu (Chinese Academy of Sciences), Ling Li (Chinese Academy of Sciences), Qi Guo (Chinese Academy of Sciences), Tianshi Chen (Chinese Academy of Sciences), Yunji Chen (Chinese Academy of Sciences)

From High-Level Deep Neural Models to FPGAs, Hardik Sharma (Georgia Institute of Technology), Jongse Park (Georgia Institute of Technology), Divya Mahajan (Georgia Institute of Technology), Emmanuel Amaro (Georgia Institute of Technology), Joon Kyung Kim (Georgia Institute of Technology), Chenkai Shao (Georgia Institute of Technology), Asit Mishra (Intel), Hadi Esmaeilzadeh (Georgia Institute of Technology)

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design, Minsoo Rhu (NVIDIA), Natalia Gimelshein (NVIDIA), Jason Clemons (NVIDIA), Arslan Zulfiqar (NVIDIA), Stephen W. Keckler (NVIDIA)

Stripes: Bit-Serial Deep Neural Network Computing, Patrick Judd (University of Toronto), Jorge Albericio (University of Toronto), Tayler Hetherington (University of British Columbia), Tor M. Aamodt (University of British Columbia), Andreas Moshovos (University of Toronto)

Fused-Layer CNN Accelerators, Manoj Alwani (Stony Brook University), Han Chen (Stony Brook University), Michael Ferdman (Stony Brook University), Peter Milder (Stony Brook University)

1 NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints

Summary

"NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints" is joint work by Prof. YouHui Zhang's group at Tsinghua University and Prof. Yuan Xie's group at the University of California, Santa Barbara. Its goal is to bridge complex neural network algorithms and efficient neuromorphic chips. Neuromorphic chips, in particular ReRAM-based ones, fuse data storage with computation and can deliver high computational performance at low power. However, because the ReRAM process is still immature, these chips and hardware designs carry many constraints, such as limited computation and storage precision and limited ReRAM array size (i.e., a limited length for the dot-product vectors an array can compute). The paper transforms and retrains a neural network given as a high-level description so that the target network maps cleanly onto such hardware while the impact of the hardware constraints is minimized. It does so in two steps: model transformation and hardware mapping. During model transformation, to respect the array-size constraint, the original network is sparsified and partitioned into subnetworks sized to fit individual ReRAM arrays; the data is quantized to cope with the limited hardware precision; finally, new layers are added and the network is retrained to reduce the accuracy loss caused by the hardware-oriented pruning (a toy sketch of the partitioning and quantization steps appears at the end of this article). The hardware mapping step then uses a Kernighan-Lin partitioning strategy. The work is evaluated on two targets: Tianji, an accelerator-style chip for SNNs, and PRIME, a processing-in-memory architecture for CNNs.

Commentary: Neural network accelerators built on emerging devices attract researchers with their very high energy efficiency, but process immaturity saddles this new hardware with constraints that have blocked large-scale adoption. Researchers have proposed remedies at both the device and the architecture level; this paper takes a higher vantage point and attacks the problem from the software side, to correspondingly greater effect.

2 Cambricon-X: An Accelerator for Sparse Neural Networks

Summary

"Cambricon-X: An Accelerator for Sparse Neural Networks" comes from the groups of Yunji Chen and Tianshi Chen at the Institute of Computing Technology, Chinese Academy of Sciences. Network pruning can remove most of a network's synaptic weights without hurting prediction accuracy, cutting redundant computation. Mainstream deep learning accelerators (such as DianNao and DaDianNao) lack effective support for the resulting pruned, sparse networks: the pruned weights must be filled back in with zeros and the computation carried out in the ordinary dense fashion, so these designs gain nothing from pruning.
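To make the NEUTRAMS model-transformation step from Section 1 concrete, here is a minimal Python sketch, not the paper's actual code: it uniformly quantizes a weight matrix to a small number of levels and splits it into array-sized tiles. The function names, the 16-level precision, and the 256x256 array size are illustrative assumptions.

```python
import numpy as np

def quantize(weights, levels=16):
    """Uniformly quantize weights to a limited number of levels,
    mimicking limited ReRAM cell precision (assumes w_max > w_min)."""
    w_min, w_max = weights.min(), weights.max()
    step = (w_max - w_min) / (levels - 1)
    return np.round((weights - w_min) / step) * step + w_min

def partition(weights, crossbar=256):
    """Split a large weight matrix into crossbar-sized tiles so each
    tile's dot products fit one ReRAM array's limited vector length."""
    rows, cols = weights.shape
    tiles = []
    for r in range(0, rows, crossbar):
        for c in range(0, cols, crossbar):
            tiles.append(((r, c), weights[r:r + crossbar, c:c + crossbar]))
    return tiles

# Toy layer: a 1000x800 weight matrix mapped onto 256x256 arrays.
w = np.random.randn(1000, 800).astype(np.float32)
tiles = partition(quantize(w), crossbar=256)
print(len(tiles), "tiles")  # ceil(1000/256) * ceil(800/256) = 16
```

Each tile would then be programmed onto one array; the sparsification and retraining that NEUTRAMS uses to recover accuracy are omitted here.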
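The gap Cambricon-X targets can likewise be sketched in a few lines, again as an illustration rather than the accelerator's actual mechanism: a dense dot product over a zero-padded weight vector performs every multiply, while an index-based sparse dot product stores only the nonzero weights plus their positions and skips the rest. All names and values below are assumptions for the example.

```python
import numpy as np

def dense_dot(padded_weights, activations):
    """What a DianNao-style dense accelerator effectively does: pruned
    weights are stored back as zeros, yet every multiply still runs."""
    return sum(w * a for w, a in zip(padded_weights, activations))

def sparse_dot(values, indices, activations):
    """What a sparsity-aware design can exploit: keep only nonzero
    weights and their indices, and fetch/multiply only those terms."""
    return sum(v * activations[i] for v, i in zip(values, indices))

acts = np.arange(8, dtype=np.float32)
padded = np.array([0, 0, 1.5, 0, 0, -2.0, 0, 0.5],
                  dtype=np.float32)          # 5 of 8 weights pruned
nz_idx = np.flatnonzero(padded)              # [2, 5, 7]
assert dense_dot(padded, acts) == sparse_dot(padded[nz_idx], nz_idx, acts)
```

What the sketch hides is the index bookkeeping and the irregular activation fetches it implies, which is precisely the part a sparse accelerator must handle efficiently in hardware.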