Faculty Profiles - AWANO Hiromitsu

AWANO Hiromitsu

Organization

Graduate School of Engineering Information and Communication Engineering 2 Professor

Undergraduate School

School of Engineering Electrical Engineering, Electronics, and Information Engineering

Papers 75

Frieren: A Fault-Tolerant Reconfigurable Energy-Efficient Computing Architecture With Enhanced Reliability in Harsh Environments Reviewed

Cheng Q., Li Q., Dong W., Zhang M., Zhang R., Huang M., Yu H., Shi Y., Awano H., Sato T., Saligane M., Lin L., Hashimoto M.

IEEE Transactions on Computers Vol. 75 ( 7 ) page： 2589 - 2603 2026.7

Publisher：IEEE Transactions on Computers

In harsh environments such as space, strong radiation effects often induce single-event effects that threaten the reliability of computing systems. Meanwhile, edge artificial intelligence (AI) processors deployed in these conditions must not only tolerate faults but also operate under stringent resource constraints, while still ensuring efficient task execution. Achieving high-performance and energy-efficient computation with adaptive reliability in such harsh conditions is therefore of great importance. This work presents Frieren, a fault-tolerant and reconfigurable computing architecture for reliable operation in harsh environments. A 22 nm system-on-chip (SoC) prototype is implemented to validate Frieren and evaluate its resilience to soft errors. Frieren operates in three primary modes: (1) a high-throughput computation engine mode, (2) a multi-core mode featuring adaptive dual-core lockstep (DCLS) for fault tolerance and programmable parallel computing, and (3) a JTAG-assisted scan-chain-based fault injection (FI) mode. The first two modes fully share processing elements and memory resources, ensuring zero data movement during mode transitions, while the third mode supports pre-deployment reliability evaluation by emulating transient faults. Both irradiation and hardware-level FI experiments are conducted to verify reliability, confirming the robustness of Frieren. Radiation tests of the SoC indicate that DCLS can correct up to about 83% of RISC-V errors, while customized parallel computing in multi-core mode achieves a 17.77× latency reduction. Moreover, the SoC delivers up to 17.18 TOPS/W in computation engine mode and 1.92 TOPS/W in multi-core mode, demonstrating an energy-efficient and resilient platform for AI deployment under harsh conditions. In real workloads, the SoC achieves peak energy efficiencies of 14.72 TOPS/W on SuperYOLO and 12.33 TOPS/W on DROID-SLAM.

DOI： 10.1109/TC.2026.3688989

Scopus
Analog In-Memory Computing from a Memory-Agnostic Perspective: Theory, Nonidealities, and Hardware-Aware Training Reviewed Open Access

SAKEMI Yusuke, AWANO Hiromitsu, MORIE Takashi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E109.A ( 5 ) page： 840 - 859 2026.5

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

Analog in-memory computing (AIMC) executes matrix-vector multiplications (MVMs) inside memory to alleviate the von Neumann bottleneck and improve energy efficiency. This tutorial classifies AIMC circuits in a memory-agnostic way, namely, current-domain, charge-domain, charge-redistribution, capacitive-division, resistive-division, and time-domain IMC. We explain each type of AIMC circuit with simple mathematical models. Furthermore, we review key device and circuit nonidealities (e.g., process variation, IR drop, sneak paths, and I/O quantization/nonlinearity) with practical mitigation strategies in circuitry and peripherals. Finally, we organize hardware-aware training into three complementary families — probabilistic/precise modeling, physical modeling, and hardware-in-the-loop techniques — providing a mathematically grounded bridge between circuits and learning for robust, scalable AIMC accelerators.

DOI： 10.1587/transfun.2025gci0001

Web of Science

Scopus

CiNii Research
Biologically Constrained DNA Encoding With Triplet Networks for Similarity Image Retrieval Reviewed Open Access

Koike T., Awano H., Sato T.

IEEE Transactions on Computational Biology and Bioinformatics Vol. 23 ( 3 ) page： 1240 - 1252 2026.5

Publisher：IEEE Transactions on Computational Biology and Bioinformatics

As the volume of digital data continues to grow exponentially, DNA has emerged as a promising medium for long-term data storage due to its high density and durability. For enabling data retrieval via DNA's biochemical reactions, the encoding strategy plays a critical role. This paper proposes a training framework for a DNA encoder that improves both accuracy and training efficiency in content-based image retrieval by incorporating deep metric learning. In addition, we introduce loss functions that enforce biological constraints, specifically homopolymer length and GC content, thereby improving the biochemical stability of the generated DNA sequences. To evaluate the effectiveness of the proposed method, we conduct quantitative assessments based on image classification performance. Simulations on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method achieves classification accuracy comparable to CNN-based baselines and a 20-fold speedup over the training time of the existing method. Moreover, the generated DNA sequences enable strict control of homopolymer length and maintain GC content within the optimal 40-60% range, significantly improving biological feasibility compared to baseline methods.
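The two biological constraints named in this abstract, homopolymer length and GC content, can be illustrated with a small check on a DNA string. This is only a sketch: the function names and the run-length limit of 3 are illustrative assumptions rather than values from the paper, and the paper enforces the constraints through loss functions during training, not through a check like this one.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty string)."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_low: float = 0.40, gc_high: float = 0.60) -> bool:
    # max_run is a hypothetical limit; 40-60% is the GC range quoted in the abstract.
    return (max_homopolymer_run(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)
```

For example, `satisfies_constraints("ACGTACGT")` passes both checks, while a sequence that starts with five consecutive `A` bases fails the run-length check.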

DOI： 10.1109/TCBBIO.2026.3673740

Open Access

Scopus
Improving Robustness of Leakage-Based MOSFET Reservoir Computing Using Adaptive Pulse-Width Control Reviewed

Seki R., Utsunomiya M., Chen Y.G., Awano H., Sato T.

IEEE International Conference on Microelectronic Test Structures 2026

Publisher：IEEE International Conference on Microelectronic Test Structures

This paper proposes a method to enhance the robustness of Leakage-based MOSFET Echo State Network (LMESN) against environmental variations. LMESN is a hardware reservoir computing architecture that exploits MOSFET subthreshold leakage currents. The proposed method consists of two components: adaptive tuning of the minimum input pulse width based on temperature to compensate for leakage-current change, and the use of Lasso regression for output-weight training to suppress errors arising from temperature-coefficient variations. Simulation results on a time-series classification task confirm that the inference accuracy is maintained across temperatures ranging from 5 to 75°C without requiring retraining over this temperature range.

DOI： 10.1109/ICMTS69943.2026.11471724

Scopus
Online Training and Inference System on Edge FPGA Using Delayed Feedback Reservoir Reviewed Open Access

Ikeda, S; Awano, H; Sato, T

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS Vol. 44 ( 9 ) page： 3323 - 3335 2025.9

Publisher：IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems

A delayed feedback reservoir (DFR) is a hardware-friendly reservoir computing system. Implementing DFRs in embedded hardware requires efficient online training. However, two main challenges prevent this: 1) hyperparameter selection, which is typically done by offline grid search, and 2) training of the output linear layer, which is memory-intensive. This article introduces a fast and accurate parameter optimization method for the reservoir layer utilizing backpropagation and gradient descent by adopting a modular DFR model. A truncated backpropagation strategy is proposed to reduce memory consumption associated with the expansion of the recursive structure while maintaining accuracy. The computation time is significantly reduced compared to grid search. In addition, an in-place Ridge regression for the output layer via 1-D Cholesky decomposition is presented, reducing memory usage to 1/4. These methods enable the realization of an online edge training and inference system of DFR on an FPGA, reducing computation time by about 1/13 and power consumption by about 1/27 compared to software implementation on the same board.
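For readers unfamiliar with the output-layer step mentioned in this abstract, the sketch below solves a ridge regression through a Cholesky factorization in plain Python. It is the textbook dense formulation, shown under the assumption that the output weights w satisfy (XᵀX + λI) w = Xᵀy; it does not reproduce the paper's in-place 1-D memory layout, and the names `cholesky` and `ridge_fit` are hypothetical.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y by Cholesky and two triangular solves."""
    n, d = len(X), len(X[0])
    # Normal-equation matrix with the ridge term on the diagonal.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(d)]
    L = cholesky(A)
    # Forward substitution: L z = b
    z = [0.0] * d
    for i in range(d):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T w = z
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (z[i] - sum(L[k][i] * w[k] for k in range(i + 1, d))) / L[i][i]
    return w
```

Because the matrix being factorized is symmetric, only its lower triangle is ever read, which is the property that compact storage schemes for this step can exploit.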

DOI： 10.1109/TCAD.2025.3541565

Web of Science

Scopus
A 22nm Resource-Frugal Hyper-Heterogeneous Multi-Modal System-on-Chip Towards In-Orbit Computing Reviewed

Cheng Q., Li Q., Dong W., Zhang M., Zhang R., Huang M., Yu H., Shi Y., Awano H., Sato T., Lin L., Hashimoto M.

Proceedings of the Custom Integrated Circuits Conference 2025

Publisher：Proceedings of the Custom Integrated Circuits Conference

Integrating artificial intelligence (AI) into in-orbit computing offers significant benefits, but current satellites face challenges in processing large sensor data volumes due to limited communication and computing resources, resulting in high latency [1]. Intelligent Early Discard (IED) [2] addresses this by filtering irrelevant data early, optimizing bandwidth and data usage. However, this demands high-performance onboard computing for efficient data preprocessing and AI acceleration [3], [4]. Additionally, space radiation, including solar energetic particles and cosmic rays, can cause Single Event Upsets (SEUs) in satellite systems [5], risking mission failure and increasing reliability demands [6], [7]. To tackle these challenges, we propose a resource-frugal hyper-heterogeneous System-on-Chip (SoC) architecture for in-orbit computing. The SoC features two modes: (1) a specialized computation engine for AI acceleration, and (2) a multicore mode with dual-core lock-step (DCLS) and vector computing for efficient, fault-tolerant data processing (Fig. 1). This resource-frugal architecture enables full sharing of Processing Elements (PEs) and memories for dynamic workload allocation, enhancing in-orbit performance by processing IED data directly on the satellite and reducing costly data transmission to Earth.

DOI： 10.1109/CICC63670.2025.10983627

Scopus
SOME: Symmetric One-Hot Matching Elector - A Lightweight Microsecond Decoder for Quantum Error Correction Reviewed

Guo X., Miao G., Nishizawa S., Awano H., Kimura S., Sato T.

IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers ICCAD 2025

Publisher：IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers ICCAD

Conventional quantum error correction (QEC) decoders such as Minimum-Weight Perfect Matching (MWPM) and Union-Find (UF) offer high thresholds and fast decoding, respectively, but both suffer from high topological complexity. In contrast, Ising model-based decoders reduce topological complexity but demand considerable decoding time. We propose the Symmetric One-Hot Matching Elector (SOME), a novel decoder that reformulates the QEC decoding task as a Quadratic Unconstrained Binary Optimization (QUBO) problem - termed the One-Hot QUBO (OHQ). Each variable in the QUBO represents whether a given pair of flipped syndromes is matched, while the error probabilities between the pair are encoded as interaction coefficients (weight). Constraints ensure that each flipped syndrome is matched exactly once. Valid solutions of OHQ correspond to self-inverse permutation matrices, characterized by symmetric one-hot encoding. To solve the OHQ efficiently, SOME reformulates the decoding task as the construction of permutation matrices that minimize the total weight. It initializes each candidate matrix from one of the minimum-weight syndrome pairs, then iteratively appends additional pairs in ascending order of weight, and finally selects the permutation matrix with the lowest total energy. SOME achieves up to a 99.9x reduction in variable count and reduces decoding times from milliseconds to microseconds on a single-threaded commodity CPU. OHQ also maintains performance up to a 10.5% physical error rate, surpassing the highest known threshold of MWPM.
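The "symmetric one-hot" property described in this abstract can be made concrete with a small validity check. The sketch below assumes a matching is given as a square 0/1 matrix M, where M[i][j] = 1 means syndromes i and j are paired, and that a syndrome is never paired with itself (the zero-diagonal rule is an assumption made here, not a statement from the paper). The function names and the weight matrix W are hypothetical, and this is not the SOME decoder itself.

```python
def is_valid_matching(M):
    """True if M is a square, symmetric 0/1 matrix with exactly one 1 per row
    and a zero diagonal, i.e. a self-inverse permutation without fixed points."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False
    for i in range(n):
        row = M[i]
        if any(v not in (0, 1) for v in row) or sum(row) != 1 or row[i] != 0:
            return False
        if any(row[j] != M[j][i] for j in range(n)):
            return False
    return True


def matching_weight(M, W):
    """Total weight of matched pairs, counting each pair (i, j) with i < j once."""
    n = len(M)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n) if M[i][j])
```

Symmetry together with one 1 per row also guarantees one 1 per column, so no separate column check is needed.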

DOI： 10.1109/ICCAD66269.2025.11240965

Scopus
Random Telegraph Noise Observed on 65-nm Bulk pMOS Transistors at 3.8K Reviewed Open Access

Kawakami, T; Sato, T; Awano, H

30TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2025 page： 1438 - 1443 2025

Publisher：Proceedings of the Asia and South Pacific Design Automation Conference ASP DAC

This paper presents a detailed study on Random Telegraph Noise (RTN) behavior under cryogenic conditions. The study leverages a device array, BTIarray, to statistically measure RTN in a temperature range from room temperature down to 3.8 K. The measurement results indicate that while RTN's impact decreases in the low-temperature region at about 100 K, it becomes more pronounced at lower temperatures, especially in transistors with shorter channel lengths. This research advances the understanding of RTN in cryogenic environments, offering essential insights for future integrated circuit (IC) design.

DOI： 10.1145/3658617.3703140

Open Access

Web of Science

Scopus
Lookup Table-based Multiplication-free All-digital DNN Accelerator Featuring Self-Synchronous Pipeline Accumulation Reviewed

Tagata, H; Sato, T; Awano, H

PROCEEDINGS OF THE 62ND ANNUAL ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2025 2025

Publisher：Proceedings Design Automation Conference

Deep neural networks (DNNs) have been widely applied in our society, yet reducing power consumption due to large-scale matrix computations remains a critical challenge. MADDNESS is a known approach to improving energy efficiency by substituting matrix multiplication with table lookup operations. Previous research has employed large analog computing circuits to convert inputs into LUT addresses, which presents challenges to area efficiency and computational accuracy. This paper proposes a novel MADDNESS-based all-digital accelerator featuring a self-synchronous pipeline accumulator, resulting in a compact, energy-efficient, and PVT-invariant computation. Post-layout simulation using a commercial 22nm process showed that 2.5× higher energy efficiency (174 TOPS/W) and 5× higher area efficiency (2.01 TOPS/mm²) can be achieved compared to the conventional accelerator.

DOI： 10.1109/DAC63849.2025.11132097

Web of Science

Scopus
GaitCloud: Leveraging Spatial-temporal Information for LiDAR-based Gait Recognition with A True-3D Gait Representation Reviewed

Zhang, SX; Awano, H; Sato, T

2025 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV page： 2849 - 2858 2025

Publisher：Proceedings 2025 IEEE Winter Conference on Applications of Computer Vision WACV 2025

Gait recognition using point clouds captured by LiDAR (Light Detection And Ranging) sensors offers better adaptability to variations in walking conditions compared to camera-based methods, due to the precise spatial information captured. However, existing methods typically project the point clouds into a sequence of 2D depth images extended along the time dimension and adopt gait recognition networks optimized for camera-based approaches. This planar projection compromises the integrity of the 3D coordinates (length, width, and depth) and results in severe silhouette deformations with varied observation viewpoints, similar to the camera-based methods. To better utilize the spatial information in gait point clouds, we propose a true 3D gait representation using efficient point cloud voxelization, termed GaitCloud. Additionally, we explore the unique nature of LiDAR-captured point clouds and present two improved modules adapted to our method, called Layer Encoder (LE) and Horizontal Convolutional Pooling (HCP). Evaluation results using the open-access gait dataset SUSTech1K show that our method outperforms the state-of-the-art, achieving recognition accuracies of 93.1% and 89.2% in cross-view and variance experiments, respectively. These results demonstrate that 3D gait representation based on point cloud voxelization more effectively utilizes spatial information than depth images, offering new possibilities for high-performance LiDAR-based gait recognition. The source code is available at https://github.com/seagrgz/GaitCloud-master.git.

DOI： 10.1109/WACV61041.2025.00282

Web of Science

Scopus
Beamforming Feedback-Based Respiration and Heart Rate Estimation Toward Firmware-Agnostic WiFi Sensing Reviewed Open Access

Kanda, T; Kondo, S; Shimomura, H; Sato, T; Awano, H; Yamamoto, K

IEEE ACCESS Vol. 13 page： 146008 - 146019 2025

Publisher：IEEE Access

WiFi-based vital sign monitoring has attracted growing attention for its potential applications in contactless healthcare. However, most existing techniques rely on channel state information (CSI), which typically requires custom firmware and specific chipsets. To address this issue, this study explores firmware-agnostic respiration and heart rate estimation using beamforming feedback (BFF), a compressed representation of CSI. This eliminates the need for custom firmware or chipset support, enabling broader applicability using off-the-shelf devices. However, it is not trivial to apply CSI-based estimation techniques to BFF-based estimation because the information content and data structure of BFF differ from those of CSI. The proposed BFF-based estimation algorithm addresses this issue by adapting the CSI-based estimation techniques to work with BFF. The algorithm consists of four key components: subcarrier selection, data calibration, signal extraction, and respiration and heart rate estimation. The performance of the BFF-based estimation algorithm is experimentally validated in several indoor environments using commodity IEEE 802.11ac devices. Results show that respiration rate and heart rate can be estimated with average errors below 1 breaths/min and 10 beats/min, respectively. Furthermore, accuracy comparisons between BFF-based and CSI-based estimations are provided to investigate the impact of lossy compression from CSI to BFF, specifically singular value decomposition (SVD) calculation and quantization. Comparisons reveal that the accuracy degradation of the BFF-based estimation compared to CSI-based estimation is primarily caused by the quantization rather than the SVD calculation.

DOI： 10.1109/ACCESS.2025.3600278

Open Access

Web of Science

Scopus
A Radiation-Hardened Neuromorphic Imager with Self-Healing Spiking Pixels and Unified Spiking Neural Network for Space Robotics Reviewed

Cheng Q., Li Q., Yang Z., Kong Z., Niu G., Liang Y., Li J., Park J.H., Liao W., Awano H., Sato T., Lin L., Hashimoto M.

Digest of Technical Papers Symposium on VLSI Technology 2025

Publisher：Digest of Technical Papers Symposium on VLSI Technology

A radiation-hardened neuromorphic imager prototype is developed for space exploration, featuring a fully spike-based neuromorphic vision system architecture, in-pixel self-healing against radiation-induced damage, and integrated unified spiking neural network (USNN) with adaptive neurons and synapses and contrast enhancement at low-contrast conditions. Self-healing reduces dark current by 6.25× at 14 kGy cumulative dose, recovering recognition accuracy by 27.8%. USNN consumes 0.0529 pJ/SOP at 5,000 events/s.

DOI： 10.23919/VLSITechnologyandCir65189.2025.11075180

Scopus
Weighted Range-Constrained Ising-Model Decoder for Quantum Error Correction Reviewed

Guo, XY; Awano, H; Sato, T

PROCEEDINGS OF THE 62ND ANNUAL ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2025 2025

Publisher：Proceedings Design Automation Conference

Ising model-based Quantum Error Correction decoders reduce topological complexity compared to classical decoders. However, the SOTA Ising decoder has a higher time complexity than union-find (UF) and a lower threshold than minimum-weight perfect-matching (MWPM). We propose the Weighted Range-Constrained Ising Model-Based (WRIM) decoder. WRIM uses a polygonal region to enclose flipped syndromes, ensuring the coverage of all potential error chains while optimizing coupling and external field coefficients. WRIM reduces the variable count by 97.8x, achieves microsecond-level decoding, and has a worst-case time complexity of O(n), outperforming UF. WRIM exhibits threshold behavior up to 10.7-11.0%, surpassing the MWPM's highest reported threshold.

DOI： 10.1109/DAC63849.2025.11133309

Web of Science

Scopus
Zero-Aware Regularization for Energy-Efficient Inference on Akida Neuromorphic Processor Reviewed

Habara, T; Sato, T; Awano, H

2025 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2025

Publisher：Proceedings IEEE International Symposium on Circuits and Systems

Spiking Neural Networks (SNNs) and their hardware accelerators have emerged as promising systems for advanced cognitive processing with low power consumption. Although the development of SNN hardware accelerators is particularly active, research on the intelligent use of these accelerators remains limited. This study focuses on the SNN accelerator Akida, a commercially available neuromorphic processor, and presents a novel training method designed to reduce inference energy by leveraging the unique architecture of the hardware. Specifically, we apply sparse constraints on neuron activations and synaptic connection weights, aiming to minimize the number of firing neurons by considering Akida's batch spike processing feature. Our proposed method was applied to a network consisting of three convolutional layers and two fully connected layers. In the MNIST image classification task, the activations became 76.1% sparser, and the weights became 22.1% sparser, resulting in a 13.8% reduction in energy consumption per image.

DOI： 10.1109/ISCAS56072.2025.11044086

Web of Science

Scopus
Window Function-less DFT with Reduced Noise and Latency for Real-Time Music Analysis Reviewed

Biesinger C., Awano H., Hashimoto M.

European Signal Processing Conference page： 431 - 435 2025

Publisher：European Signal Processing Conference

Music analysis applications demand algorithms that can provide both high time and frequency resolution while minimizing noise in an already-noisy signal. Real-time analysis additionally demands low latency and low computational requirements. We propose a DFT-based algorithm that accomplishes all these requirements by extending a method that post-processes DFT output without the use of window functions. Our approach yields greatly reduced sidelobes and noise, and improves time resolution without sacrificing frequency resolution. We use exponentially spaced output bins which directly map to notes in music. The resulting improved performance, compared to existing FFT and DFT-based approaches, creates possibilities for improved real-time visualizations, and contributes to improved analysis quality in other applications such as automatic transcription.

DOI： 10.23919/EUSIPCO63237.2025.11226525

Scopus
A Robust and Energy Efficient Hyperdimensional Computing System for Voltage-scaled Circuits Reviewed Open Access

Liang, DH; Awano, H; Miura, N; Shiomi, J

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS Vol. 23 ( 6 ) 2024.11

Publisher：ACM Transactions on Embedded Computing Systems

Voltage scaling is one of the most promising approaches for energy efficiency improvement but also brings challenges to fully guaranteeing stable operation in modern VLSI. To tackle such issues, we further extend the DependableHD to the second version DependableHDv2, a HyperDimensional Computing (HDC) system that can tolerate bit-level memory failure in the low voltage region with high robustness. DependableHDv2 introduces the concept of margin enhancement for model retraining and utilizes noise injection to improve the robustness, which is capable of application in most state-of-the-art HDC algorithms. We additionally propose the dimension-swapping technique, which aims at handling the stuck-at errors induced by aggressive voltage scaling in the memory cells. Our experiment shows that under 8% memory stuck-at error, DependableHDv2 exhibits a 2.42% accuracy loss on average, which achieves a 14.1× robustness improvement compared to the baseline HDC solution. The hardware evaluation shows that DependableHDv2 supports the systems to reduce the supply voltage from 430 mV to 340 mV for both item Memory and Associative Memory, which provides a 41.8% energy consumption reduction while maintaining competitive accuracy performance.

DOI： 10.1145/3620671

Web of Science

Scopus
BayesianSpikeFusion: accelerating spiking neural network inference via Bayesian fusion of early prediction Reviewed Open Access

Habara, T; Sato, T; Awano, H

FRONTIERS IN NEUROSCIENCE Vol. 18 page： 1420119 2024.8

Language：English Publisher：Frontiers in Neuroscience

Spiking neural networks (SNNs) have garnered significant attention due to their notable energy efficiency. However, conventional SNNs rely on spike firing frequency to encode information, necessitating a fixed sampling time and leaving room for further optimization. This study presents a novel approach to reduce sampling time and conserve energy by extracting early prediction results from the intermediate layer of the network and integrating them with the final layer's predictions in a Bayesian fashion. Experimental evaluations conducted on image classification tasks using MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate the efficacy of our proposed method when applied to VGGNets and ResNets models. Results indicate a substantial energy reduction of 38.8% in VGGNets and 48.0% in ResNets, illustrating the potential for achieving significant efficiency gains in spiking neural networks. These findings contribute to the ongoing research in enhancing the performance of SNNs, facilitating their deployment in resource-constrained environments. Our code is available on GitHub: https://github.com/hanebarla/BayesianSpikeFusion.

DOI： 10.3389/fnins.2024.1420119

Open Access

Web of Science

Scopus

PubMed
DNA-based Similar Image Retrieval via Triplet Network-driven Encoder Reviewed

Koike, T; Awano, H; Sato, T

2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE 2024

DOI： 10.23919/DATE58400.2024.10546567

Web of Science
S3M: Static Semi-Segmented Multipliers for Energy-efficient DNN Inference Accelerators Reviewed Open Access

Zhang, MT; Cheng, Q; Awano, H; Lin, LY; Hashimoto, M

2024 IEEE 42ND INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD page： 16 - 23 2024

Publisher：Proceedings IEEE International Conference on Computer Design VLSI in Computers and Processors

Approximate multipliers offer an efficient approach to reduce power consumption in compute-intensive applications, such as Deep Neural Networks (DNNs). However, current 8-bit approximate multipliers struggle to maintain high accuracy across various DNN applications. In this paper, we highlight challenges in 8-bit multiplier designs with body approximation strategies and evaluate the effectiveness of input approximation methods. Recognizing that exact multipliers with quantization bit-widths below 8 bits have demonstrated superior performance, we aim to explore whether alternative input approximation methods can provide an even better tradeoff between accuracy and energy consumption. To this end, by exploiting the fact that weight operand values are smaller than activations and prepared offline in DNNs, we simplify a static segmented multiplier (SSM) into a static semi-segmented multiplier (S3M), achieving a 31.58% reduction in power-delay product (PDP) compared to the original SSM, with similar classification accuracy. Additionally, we propose Coded S3M with optimized memory usage and implement various multipliers on a systolic array-based accelerator. Experimental results show that the proposed S3M and Coded S3M outperform existing 8-bit approximate multipliers in DNN applications, effectively bridging the PDP and inference accuracy tradeoff observed across exact commercial IP multipliers of varied bit-widths without requiring time-consuming retraining. Consequently, the proposed multiplier designs provide enhanced computational solutions for energy-efficient DNN inference accelerators.

DOI： 10.1109/ICCD63220.2024.00014

Web of Science

Scopus
S3M: Static Semi-Segmented Multipliers for Energy-efficient DNN Inference Accelerators Reviewed

Zhang Mingtao, Cheng Quan, Awano Hiromitsu, Lin Longyang, Hashimoto Masanori

IEEE International Conference on Computer Design: VLSI in Computers and Processors, (ICCD) page： 16 - 23 2024

Language：English

Approximate multipliers offer an efficient approach to reduce power consumption in compute-intensive applications, such as Deep Neural Networks (DNNs). However, current 8-bit approximate multipliers struggle to maintain high accuracy across various DNN applications. In this paper, we highlight challenges in 8-bit multiplier designs with body approximation strategies and evaluate the effectiveness of input approximation methods. Recognizing that exact multipliers with quantization bit-widths below 8 bits have demonstrated superior performance, we aim to explore whether alternative input approximation methods can provide an even better trade-off between accuracy and energy consumption. To this end, by exploiting the fact that weight operand values are smaller than activations and prepared offline in DNNs, we simplify a static segmented multiplier (SSM) into a static semi-segmented multiplier (S³M), achieving a 31.58% reduction in power-delay product (PDP) compared to the original SSM, with similar classification accuracy. Additionally, we propose Coded S³M with optimized memory usage and implement various multipliers on a systolic array-based accelerator. Experimental results show that the proposed S³M and Coded S³M outperform existing 8-bit approximate multipliers in DNN applications, effectively bridging the PDP and inference accuracy trade-off observed across exact commercial IP multipliers of varied bit-widths without requiring time-consuming retraining. Consequently, the proposed multiplier designs provide enhanced computational solutions for energy-efficient DNN inference accelerators.

CiNii Research
Fast Parameter Optimization of Delayed Feedback Reservoir with Backpropagation and Gradient Descent Reviewed Open Access

Ikeda, S; Awano, H; Sato, T

2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE 2024

DOI： 10.23919/DATE58400.2024.10546773

Web of Science
Double MAC on a Cell: A 22-nm 8T-SRAM-Based Analog In-Memory Accelerator for Binary/Ternary Neural Networks Featuring Split Wordline Reviewed Open Access

Tagata, H; Sato, T; Awano, H

IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS Vol. 5 page： 328 - 340 2024

Publisher：IEEE Open Journal of Circuits and Systems

This paper proposes a novel 8T-SRAM based computing-in-memory (CIM) accelerator for the Binary/Ternary neural networks. The proposed split dual-port 8T-SRAM cell has two input ports, simultaneously performing two binary multiply-and-accumulate (MAC) operations on left and right bitlines. This approach enables a twofold increase in throughput without significantly increasing area or power consumption, since the area overhead for doubling throughput is only two additional WL wires compared to the conventional 8T-SRAM. In addition, the proposed circuit supports binary and ternary activation input, allowing flexible adjustment of high energy efficiency and high inference accuracy depending on the application. The proposed SRAM macro consists of a 128 × 128 SRAM array that outputs the MAC operation results of 96 binary/ternary inputs and 96 × 128 binary weights as 1-5 bit digital values. The proposed circuit performance was evaluated by post-layout simulation with the 22-nm process layout of the overall CIM macro. The proposed circuit is capable of high-speed operation at 1 GHz. It achieves a maximum area efficiency of 3320 TOPS/mm², which is 3.4× higher compared to existing research with a reasonable energy efficiency of 1471 TOPS/W. The simulated inference accuracies of the proposed circuit are 96.45%/97.67% for MNIST dataset with binary/ternary MLP model, and 86.32%/88.56% for CIFAR-10 dataset with binary/ternary VGG-like CNN model.

DOI： 10.1109/OJCAS.2024.3482469

Open Access

Web of Science

Scopus
StrideHD: A Binary Hyperdimensional Computing System Utilizing Window Striding for Image Classification Reviewed Open Access

Liang, DH; Shiomi, J; Miura, N; Awano, H

IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS Vol. 5 page： 211 - 223 2024

Publisher：IEEE Open Journal of Circuits and Systems

Hyper-Dimensional (HD) computing is a brain-inspired learning approach for efficient and fast learning on today's embedded devices. HD computing (HDC) first encodes all data points to high-dimensional vectors called hypervectors and then efficiently performs the classification task using a well-defined set of operations. Although HDC has achieved reasonable performance in several practical tasks, it comes with huge memory requirements since each data point must be stored in a very long vector having thousands of bits. To alleviate this problem, we propose a novel HDC architecture called StrideHD. By utilizing window striding in image classification, StrideHD enables an HDC system to be trained and tested using binary hypervectors and achieves high accuracy with fast training speed and significantly lower hardware resources. StrideHD encodes data points to distributed binary hypervectors and eliminates the expensive Channel item Memory (CiM) and item Memory (iM) in the encoder, which significantly reduces the required hardware cost for inference. Our evaluation also shows that, compared with two popular HD algorithms, the single-pass StrideHD model achieves 27.6× and 8.2× reductions in inference memory cost without hurting the classification accuracy, while the iterative mode further provides 8.7× memory efficiency. Under the same inference memory cost, our single-pass StrideHD achieves a 13.56% accuracy improvement on average over the single-pass baseline HD, which is comparable even to the costly iterative baseline HD models. As an extension, the iterative retraining mode of StrideHD provides an 11.33% average accuracy improvement over its single-pass mode, and does so in fewer iterations than the baseline HD algorithms. The hardware implementation also demonstrates that StrideHD achieves over 9.9× and 28.8× reductions compared with the baseline in area and power, respectively.

DOI： 10.1109/OJCAS.2024.3401028

Open Access

Web of Science

Scopus
Square-wave defined pulse generator for high fidelity gate operation of superconducting qubits Reviewed

Matsuo, R.; Ogawa, K.; Shiomi, H.; Negoro, M.; Ohira, R.; Miyoshi, T.; Shintani, M.; Awano, H.; Sato, T.; Shiomi, J.

2024 IEEE International Conference on Quantum Computing and Engineering (QCE) 2024

Language：English
Exploring Surface Code Decoding via Cryo-CMOS for Fault-Tolerant Quantum Computers Reviewed

Wang, R.T.; Sato, T.; Awano, H.

2024 IEEE International Conference on Quantum Computing and Engineering (QCE) 2024

Authorship：Last author Language：English
Triplet Network-Based DNA Encoding for Enhanced Similarity Image Retrieval Reviewed

Koike, T; Awano, H; Sato, T

PROCEEDINGS OF THE 61ST ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2024 2024

Publisher：Proceedings Design Automation Conference

With the exponential growth of digital data, DNA is emerging as an attractive medium for storage and computing. Thus, design methods for encoding, storing, and searching digital data within DNA storage are of utmost importance. This paper introduces image classification as a measurable task for evaluating the performance of DNA encoders in similar image searches. Furthermore, we propose a novel triplet network-based DNA encoder to improve the accuracy and efficiency. The evaluation using the CIFAR-100 dataset demonstrates that the proposed encoder outperforms existing encoders in retrieving similar images, with an accuracy of 0.77, which is equivalent to 94% of the practical upper limit, and 16 times faster training time.

DOI： 10.1145/3649329.3657320

Web of Science

Scopus
Modular DFR: Digital Delayed Feedback Reservoir Model for Enhancing Design Flexibility Reviewed

Ikeda, S; Awano, H; Sato, T

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS Vol. 22 ( 5 ) 2023.10

Publisher：ACM Transactions on Embedded Computing Systems

A delayed feedback reservoir (DFR) is a type of reservoir computing system well-suited for hardware implementations owing to its simple structure. Most existing DFR implementations use analog circuits that require both digital-to-analog and analog-to-digital converters for interfacing. However, digital DFRs emulate analog nonlinear components in the digital domain, resulting in a lack of design flexibility and higher power consumption. In this paper, we propose a novel modular DFR model that is suitable for fully digital implementations. The proposed model reduces the number of hyperparameters and allows flexibility in the selection of the nonlinear function, which improves the accuracy while reducing the power consumption. We further present two DFR realizations with different nonlinear functions, achieving 10× power reduction and 5.3× throughput improvement while maintaining equal or better accuracy.

DOI： 10.1145/3609105

Web of Science

Scopus
Uncertainty-Aware Haptic Shared Control With Humanoid Robots for Flexible Object Manipulation Reviewed

Hara, T; Sato, T; Ogata, T; Awano, H

IEEE ROBOTICS AND AUTOMATION LETTERS Vol. 8 ( 10 ) page： 6435 - 6442 2023.10

Publisher：IEEE Robotics and Automation Letters

We propose a haptic shared control system that predicts human manipulation intentions using a neural network and adaptively presents haptic guidance to achieve smooth remote robot control. Although haptic shared control has garnered increasing attention as a method to improve operability in remote operations, incorrect guidance can worsen operability. In this study, we dynamically switch the strength of haptic guidance depending on the uncertainty of the neural network's inference results. Thus, we weaken the haptic guidance for predictions in which the neural network lacks confidence and strengthen it for those with high confidence, thereby achieving guidance that does not impede human manipulation. In experiments using the Nextage OPEN upper-body humanoid robot on a task involving folding a flexible object, we reduced task execution time by 17.1% compared to an existing method that determines the strength of haptic guidance without considering the confidence of the neural network.

DOI： 10.1109/LRA.2023.3306668

Web of Science

Scopus
BayesianPUFNet: Training Sample Efficient Modeling Attack for Physically Unclonable Functions Reviewed Open Access

AWANO Hiromitsu, IKEDA Makoto

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E106.A ( 5 ) page： 840 - 850 2023.5

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

This paper proposes a deep neural network named BayesianPUFNet that can achieve high prediction accuracy even with few challenge-response pairs (CRPs) available for training. Generally, modeling attacks are a vulnerability that could compromise the authenticity of physically unclonable functions (PUFs); thus, various machine learning methods, including deep neural networks, have been proposed to assess the vulnerability of PUFs. However, conventional modeling attacks have not considered the cost of CRP collection and have been analyzed under the assumption that sufficient CRPs are available for training; therefore, previous studies may have underestimated the vulnerability of PUFs. Herein, we show that Bayesian deep neural networks, which incorporate Bayesian statistics, can provide accurate response prediction even in situations where sufficient CRPs are not available for learning. Numerical experiments show that the proposed model needs only half the CRPs to achieve the same response prediction accuracy as the conventional methods. Our code is openly available at https://github.com/bayesian-puf-net/bayesian-puf-net.git.

DOI： 10.1587/transfun.2022eap1061

Web of Science

Scopus

CiNii Research
B2N2: Resource efficient Bayesian neural network accelerator using Bernoulli sampler on FPGA Reviewed Open Access

Awano H., Hashimoto M.

Integration Vol. 89 page： 1 - 8 2023.3

Publisher：Integration

A resource-efficient hardware accelerator for Bayesian neural networks (BNNs) named B2N2, a Bernoulli random number based Bayesian neural network accelerator, is proposed. As neural networks expand their application into risk-sensitive domains where mispredictions may cause serious social and economic losses, evaluating an NN's confidence in its prediction has emerged as a critical concern. Among many uncertainty evaluation methods, a BNN provides a theoretically grounded way to evaluate the uncertainty of an NN's output by treating network parameters as random variables. By exploiting the central limit theorem, we propose to replace costly Gaussian random number generators (RNGs) with Bernoulli RNGs, which can be efficiently implemented in hardware since the possible outcomes of a Bernoulli distribution are binary. We demonstrate that B2N2 implemented on a Xilinx ZCU104 FPGA board consumes only 465 DSPs and 81661 LUTs, which corresponds to 50.9% and 14.3% reductions compared to Gaussian-BNN (Hirayama et al., 2020) implemented on the same FPGA board for fair comparison. We further compare B2N2 with VIBNN (Cai et al., 2018), which shows that B2N2 reduces DSP and LUT usage by 50.9% and 57.9%, respectively. Owing to the reduced hardware resources, B2N2 improves energy efficiency by 7.50% and 57.5% compared to Gaussian-BNN (Hirayama et al., 2020) and VIBNN (Cai et al., 2018), respectively.
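
The central-limit-theorem idea mentioned in the abstract above can be illustrated with a minimal software sketch. This is not the B2N2 hardware sampler, whose details are not given here; the function name `bernoulli_gaussian`, the choice of 64 bits per sample, and the success probability `p = 0.5` are all assumptions made for illustration. The sketch only shows that a normalized sum of Bernoulli draws behaves approximately like a standard Gaussian sample.

```python
import math
import random
import statistics


def bernoulli_gaussian(n_bits, rng, p=0.5):
    """Approximate one standard-normal sample from n_bits Bernoulli(p) draws.

    By the central limit theorem, the count of ones is roughly Gaussian with
    mean n_bits * p and variance n_bits * p * (1 - p); we normalize it.
    """
    ones = sum(1 for _ in range(n_bits) if rng.random() < p)
    mean = n_bits * p
    std = math.sqrt(n_bits * p * (1.0 - p))
    return (ones - mean) / std


# Hypothetical settings: 64 Bernoulli bits per sample, 20000 samples.
rng = random.Random(0)
samples = [bernoulli_gaussian(64, rng) for _ in range(20000)]
sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(f"mean={sample_mean:.3f}, std={sample_std:.3f}")
```

In hardware terms, the appeal is that each draw is a single bit and the sum is a popcount, which is far cheaper than a Gaussian generator; how B2N2 actually arranges this is not stated in the abstract.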

DOI： 10.1016/j.vlsi.2022.11.005

Open Access

Scopus
DependableHD: A Hyperdimensional Learning Framework for Edge-oriented Voltage-scaled Circuits Reviewed

Liang, D; Awano, H; Miura, N; Shiomi, J

2023 28TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC page： 416 - 422 2023

Publisher：Proceedings of the Asia and South Pacific Design Automation Conference ASP DAC

Voltage scaling is one of the most promising approaches for improving energy efficiency, but it also makes it challenging to fully guarantee stable operation in modern VLSI. To tackle such issues, we propose DependableHD, a learning framework based on HyperDimensional Computing (HDC) that enables systems to tolerate bit-level memory failure in the low-voltage region with high robustness. For the first time, DependableHD introduces the concept of margin enhancement for model retraining and utilizes noise injection to improve robustness, and it can be applied to most state-of-the-art HDC algorithms. Our experiment shows that under 10% memory error, DependableHD exhibits a 1.22% accuracy loss on average, which is an 11.2× improvement compared to the baseline HDC solution. The hardware evaluation shows that DependableHD allows systems to reduce the supply voltage from 400 mV to 300 mV, which provides a 50.41% energy consumption reduction while maintaining competitive accuracy.
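
As a rough illustration of why binary hypervector classification can tolerate bit-level memory failure, the sketch below corrupts stored class hypervectors with a 10% bit-flip rate and classifies noisy queries by Hamming distance. It does not implement DependableHD's margin enhancement or noise-injection retraining; the dimension (2048), number of classes, query noise rate, and all function names are assumptions chosen for the example.

```python
import random


def random_hypervector(dim, rng):
    """Draw a random binary hypervector of length dim."""
    return [rng.getrandbits(1) for _ in range(dim)]


def flip_bits(vector, error_rate, rng):
    """Flip each bit independently with probability error_rate."""
    return [bit ^ 1 if rng.random() < error_rate else bit for bit in vector]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(query, class_memory):
    """Return the index of the class hypervector closest to the query."""
    return min(range(len(class_memory)),
               key=lambda c: hamming(query, class_memory[c]))


rng = random.Random(42)
dim, num_classes = 2048, 5  # hypothetical sizes
clean_memory = [random_hypervector(dim, rng) for _ in range(num_classes)]
# Emulate low-voltage memory failure: 10% of stored bits are flipped.
faulty_memory = [flip_bits(hv, 0.10, rng) for hv in clean_memory]
# Queries are noisy copies (20% flipped bits) of the clean class vectors.
queries = [(label, flip_bits(clean_memory[label], 0.20, rng))
           for label in range(num_classes) for _ in range(20)]
accuracy = sum(classify(q, faulty_memory) == label
               for label, q in queries) / len(queries)
print(f"accuracy with faulty memory: {accuracy:.2f}")
```

Because unrelated random hypervectors differ in about half of their bits, a corrupted class vector still sits much closer to its own queries than to any other class, which is the redundancy such frameworks build on.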

DOI： 10.1145/3566097.3567886

Web of Science

Scopus
Pay Attention via Quantization: Enhancing Explainability of Neural Networks via Quantized Activation Reviewed Open Access

Tashiro, Y; Awano, H

IEEE ACCESS Vol. 11 page： 34431 - 34439 2023

Publisher：IEEE Access

Modern deep learning algorithms comprise highly complex artificial neural networks, making it extremely difficult for humans to track their inference processes. As the social implementation of deep learning progresses, the human and economic losses caused by inference errors are becoming increasingly problematic, making it necessary to develop methods to explain the basis for the decisions of deep learning algorithms. Although an attention mechanism-based method to visualize the regions that contribute to steering angle prediction in an automated driving task has been proposed, its explanatory capability is low. In this paper, we focus on the fact that the importance of each bit in the activation value of a network is biased (i.e., the sign and exponent bits are weighted more heavily than the mantissa bits), which has been overlooked in previous studies. Specifically, this paper quantizes network activations, encouraging important information to be aggregated to the sign bit. Further, we introduce an attention mechanism restricted to the sign bit to improve the explanatory power. Our numerical experiment using the Udacity dataset revealed that the proposed method achieves a 1.14× higher area under curve (AUC) in terms of the deletion metric.

DOI： 10.1109/ACCESS.2023.3264855

Open Access

Web of Science

Scopus
Introducing Transfer Learning Framework on Device Modeling by Machine Learning Reviewed

Niiyama K., Awano H., Sato T.

IEEE International Conference on Microelectronic Test Structures Vol. 2023-March 2023

Publisher：IEEE International Conference on Microelectronic Test Structures

In this study, we propose a novel transistor modeling method using machine learning techniques, with a focus on extrapolation performance. Our method leverages knowledge from a base model that is related to the target model, instead of relying solely on device-specific information. The results show that our approach outperforms other transistor modeling methods based on machine learning, particularly in modeling similar but different transistors that belong to the same device family. Our method was able to reduce the root mean squared error (RMSE) by up to 80.0% compared to other methods.

DOI： 10.1109/ICMTS55420.2023.10094067

Scopus
Hardware-Friendly Delayed-Feedback Reservoir for Multivariate Time-Series Classification Reviewed Open Access

Ikeda, S; Awano, H; Sato, T

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS Vol. 41 ( 11 ) page： 3650 - 3660 2022.11

Publisher：IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems

Reservoir computing (RC) is attracting attention as a machine-learning technique for edge computing. In time-series classification tasks, the number of features obtained using a reservoir depends on the length of the input series. Therefore, the features must be converted to a constant-length intermediate representation (IR), such that they can be processed by an output layer. Existing conversion methods involve computationally expensive matrix inversion that significantly increases the circuit size and requires processing power when implemented in hardware. In this article, we propose a simple but effective IR, namely, dot-product-based reservoir representation (DPRR), for RC based on the dot product of data features. Additionally, we propose a hardware-friendly delayed-feedback reservoir (DFR) consisting of a nonlinear element and delayed feedback loop with DPRR. The proposed DFR successfully classified multivariate time series data that has been considered particularly difficult to implement efficiently in hardware. In contrast to conventional DFR models that require analog circuits, the proposed model can be implemented in a fully digital manner suitable for high-level syntheses. A comparison with existing machine-learning methods via field-programmable gate array implementation using 12 multivariate time-series classification tasks confirmed the superior accuracy and small circuit size of the proposed method.

DOI： 10.1109/TCAD.2022.3197488

Web of Science

Scopus
A Hardware Efficient Reservoir Computing System Using Cellular Automata and Ensemble Bloom Filter Reviewed Open Access

LIANG Dehua, SHIOMI Jun, MIURA Noriyuki, HASHIMOTO Masanori, AWANO Hiromitsu

IEICE Transactions on Information and Systems Vol. E105.D ( 7 ) page： 1273 - 1282 2022.7

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

Reservoir computing (RC) is an attractive alternative to machine learning models owing to its computationally inexpensive training process and simplicity. In this work, we propose EnsembleBloomCA, which utilizes cellular automata (CA) and an ensemble Bloom filter to organize an RC system. In contrast to most existing RC systems, EnsembleBloomCA eliminates all floating-point calculation and integer multiplication. EnsembleBloomCA adopts CA as the reservoir in the RC system because it can be implemented using only binary operations and is thus energy efficient. The rich pattern dynamics created by CA can map the original input into a high-dimensional space and provide more features for the classifier. Utilizing an ensemble Bloom filter as the classifier, the features provided by the reservoir can be effectively memorized. Our experiment revealed that applying the ensemble mechanism to the Bloom filter resulted in a significant reduction in memory cost during the inference phase. In comparison with Bloom WiSARD, one of the state-of-the-art reference works, the EnsembleBloomCA model achieves a 43× reduction in memory cost while maintaining the same accuracy. Our hardware implementation also demonstrated that EnsembleBloomCA achieved over 23× and 8.5× reductions in area and power, respectively.

DOI： 10.1587/transinf.2021edp7203

Open Access

Web of Science

Scopus

CiNii Research
Temporal Ensemble SSDLite: Exploiting Temporal Correlation in Video for Accurate Object Detection Reviewed

NAKAMURA Lukas, AWANO Hiromitsu

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E105.A ( 7 ) page： 1082 - 1090 2022.7

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

We propose “Temporal Ensemble SSDLite,” a new method for video object detection that boosts accuracy while maintaining detection speed and energy consumption. Object detection for video is becoming increasingly important as a core part of applications in robotics, autonomous driving, and many other promising fields. Many of these applications require high accuracy and speed to be viable, but are used in compute- and energy-restricted environments. Therefore, new methods that increase the overall performance of video object detection, i.e., accuracy and speed, have to be developed. To increase accuracy, we use ensembling, the machine learning method of combining the predictions of multiple different models. The drawback of ensembling is the increased computational cost, which is proportional to the number of models used. We overcome this drawback by deploying our ensemble temporally: we run inference with only a single model at each frame, cycling through our ensemble of models from frame to frame. Then, we combine the predictions for the last N frames, where N is the number of models in our ensemble, through non-maximum suppression. This is possible because neighboring frames in a video are extremely similar due to temporal correlation. As a result, we increase accuracy through the ensemble while running only a single model at each frame, thereby preserving the detection speed. To evaluate the proposal, we measure the accuracy, detection speed, and energy consumption on the Google Edge TPU, a machine learning inference accelerator, with the ImageNet VID dataset. Our results demonstrate an accuracy boost of up to 4.9% while maintaining real-time detection speed and an energy consumption of 181 mJ per image.
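
The temporal scheduling described in the abstract above (one model per frame, predictions from the last N frames merged by non-maximum suppression) can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the stub detectors `model_a` and `model_b`, the box format `(x1, y1, x2, y2)`, the class-agnostic suppression, and the 0.5 IoU threshold are all assumptions.

```python
from collections import deque


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, score) pairs."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


def temporal_ensemble(frames, models, iou_threshold=0.5):
    """Run one model per frame, then merge the last len(models) predictions."""
    history = deque(maxlen=len(models))  # predictions of the last N frames
    outputs = []
    for idx, frame in enumerate(frames):
        model = models[idx % len(models)]  # cycle through the ensemble
        history.append(model(frame))
        merged = [det for dets in history for det in dets]
        outputs.append(nms(merged, iou_threshold))
    return outputs


# Hypothetical stub detectors standing in for trained models.
def model_a(frame):
    return [((10, 10, 50, 50), 0.9)]


def model_b(frame):
    return [((12, 11, 52, 49), 0.8), ((100, 100, 140, 140), 0.6)]


outputs = temporal_ensemble(frames=[0, 1, 2], models=[model_a, model_b])
print([len(dets) for dets in outputs])
```

Only one model runs per frame, so the per-frame cost matches a single detector; the merge step reuses cached predictions from the preceding frames.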

DOI： 10.1587/transfun.2021eap1068

Web of Science

Scopus

CiNii Research
DistriHD: A Memory Efficient Distributed Binary Hyperdimensional Computing Architecture for Image Classification Reviewed

Liang D., Shiomi J., Miura N., Awano H.

Proceedings of the Asia and South Pacific Design Automation Conference ASP DAC Vol. 2022-January page： 43 - 49 2022

Publisher：Proceedings of the Asia and South Pacific Design Automation Conference ASP DAC

Hyper-Dimensional (HD) computing is a brain-inspired learning approach for efficient and fast learning on today's embedded devices. HD computing first encodes all data points to high-dimensional vectors called hypervectors and then efficiently performs the classification task using a well-defined set of operations. Although HD computing has achieved reasonable performance in several practical tasks, it comes with huge memory requirements since each data point must be stored in a very long vector having thousands of bits. To alleviate this problem, we propose a novel HD computing architecture called DistriHD, which enables HD computing to be trained and tested using binary hypervectors and achieves high accuracy in single-pass training mode with significantly lower hardware resources. DistriHD encodes data points to distributed binary hypervectors and eliminates the expensive item memory in the encoder, which significantly reduces the required hardware cost for inference. Our evaluation also shows that our model can achieve a 27.6× reduction in memory cost without hurting the classification accuracy. The hardware implementation also demonstrates that DistriHD achieves over 9.9× and 28.8× reductions in area and power, respectively.

DOI： 10.1109/ASP-DAC52403.2022.9712589

Scopus
Respiratory Rate Estimation Based on WiFi Frame Capture Reviewed Open Access

Kanda T., Sato T., Awano H., Kondo S., Yamamoto K.

Proceedings IEEE Consumer Communications and Networking Conference Ccnc page： 881 - 884 2022

Publisher：Proceedings IEEE Consumer Communications and Networking Conference Ccnc

This paper presents a method that estimates the respiratory rate based on frame capture in wireless local area networks. The method uses beamforming feedback matrices (BFMs) contained in the captured frames, which are rotation matrices derived from channel state information (CSI). BFMs are transmitted unencrypted and are easily obtained using frame capture, requiring no specific firmware or WiFi chipsets, unlike methods that use CSI. Such properties of BFMs allow us to apply frame capture to various sensing tasks, e.g., vital sensing. In the proposed method, principal component analysis is applied to BFMs to isolate the effect of the chest movement of the subject, and then a discrete Fourier transform is performed to extract respiratory rates in the frequency domain. Experimental evaluation results confirm that the frame-capture-based respiratory rate estimation can achieve an estimation error lower than 3.5 breaths/minute.
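
The two processing steps named in the abstract above, principal component analysis followed by a discrete Fourier transform, can be sketched on synthetic data. Nothing here models real BFM extraction: the four-channel signal, its gains, the 10 Hz frame rate, the 0.25 Hz breathing frequency, the 0.1 to 0.5 Hz search band, and all function names are assumptions made for illustration.

```python
import cmath
import math
import random


def first_principal_component(rows):
    """Project mean-centred rows onto their leading principal axis.

    The axis is found by power iteration on the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(100):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in centred]


def dominant_rate_bpm(signal, fs, lo_hz=0.1, hi_hz=0.5):
    """Return the strongest DFT frequency inside the band, in cycles/minute."""
    n = len(signal)
    best_k, best_mag = None, -1.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if not lo_hz <= freq <= hi_hz:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return 60.0 * best_k * fs / n


# Synthetic stand-in for per-frame features: 4 channels sharing one
# 0.25 Hz (15 breaths/minute) oscillation plus Gaussian noise.
rng = random.Random(1)
fs, seconds, true_hz = 10.0, 60, 0.25
gains = [1.0, -0.6, 0.8, 0.3]
frames = [[g * math.sin(2 * math.pi * true_hz * (i / fs)) + rng.gauss(0, 0.5)
           for g in gains] for i in range(int(fs * seconds))]
estimate = dominant_rate_bpm(first_principal_component(frames), fs)
print(f"estimated respiratory rate: {estimate:.1f} breaths/minute")
```

With a 60-second window the DFT bins are spaced one breath per minute apart, so the window length sets the resolution of an estimate obtained this way.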

DOI： 10.1109/CCNC49033.2022.9700721

Scopus
Pay Attention via Binarization: Enhancing Explainability of Neural Networks via Binarization of Activation Reviewed

Tashiro, Y; Awano, H

2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22) Vol. 2022-May page： 3160 - 3164 2022

Publisher：Proceedings IEEE International Symposium on Circuits and Systems

Modern deep learning algorithms consist of highly complex artificial neural networks, making it extremely difficult for humans to track the inference process. While the social implementation of deep learning is progressing, the human and economic losses caused by inference errors are becoming more and more problematic, and there is a need for methods to explain the basis for the decisions of deep learning algorithms. Although, in an automated driving task, a method to visualize the regions that contribute to steering angle prediction using an attention mechanism has been proposed, its explanatory capability is still low. In this paper, we focus on the difference in the importance of each bit in the activation (i.e., the LSBs have the lowest weight while the MSBs have the highest weight), and propose a method to add attention only to the sign bits to further enhance the explanation. Our numerical experiment using the Udacity dataset revealed that the proposed method achieves 33% higher area under curve (AUC) in terms of the deletion metric.

DOI： 10.1109/ISCAS48785.2022.9937289

Web of Science

Scopus
Psyche Navigation System Concept Open Access

熊谷誠慈, 三浦典之, 粟野皓光, 上田祥行

Journal of the Japanese Society for Artificial Intelligence (人工知能) Vol. 36 ( 6 ) page： 684 - 694 2021.11

Language：Japanese Publisher：The Japanese Society for Artificial Intelligence (一般社団法人人工知能学会)

DOI： 10.11517/jjsai.36.6_684

Open Access

CiNii Research
Visualization of a chorus structure in multiple frog species by a sound discrimination device Reviewed

Awano, H; Shirasaka, M; Mizumoto, T; Okuno, HG; Aihara, I

JOURNAL OF COMPARATIVE PHYSIOLOGY A-NEUROETHOLOGY SENSORY NEURAL AND BEHAVIORAL PHYSIOLOGY Vol. 207 ( 1 ) page： 87 - 98 2021.1

Language：English Publisher：Journal of Comparative Physiology A Neuroethology Sensory Neural and Behavioral Physiology

We developed a sound discrimination device to identify and localize the species of nocturnal animals in their natural habitat. The sound discrimination device is equipped with a microphone, a light-emitting diode, and a band-pass filter. By tuning the center frequency of the filter to include a dominant frequency of the calls of a focal species, we enable the device to be illuminated only when detecting the calls of the focal species. In experiments in a laboratory room, we tuned the sound discrimination devices to detect the calls of Hyla japonica or Rhacophorus schlegelii and broadcast the frog calls from loudspeakers. By analyzing the illumination pattern of the devices, we successfully identified and localized the two kinds of sound sources. Next, we placed the sound discrimination devices in a field site where actual male frogs (H. japonica and R. schlegelii) produced sounds. The analysis of the illumination pattern demonstrates the efficacy of the developed devices in a natural environment and also enables us to extract pairs of male frogs that significantly overlapped or alternated their calls.

DOI： 10.1007/s00359-021-01463-9

Web of Science

Scopus

PubMed
Binary Neural Network in Robotic Manipulation: Flexible Object Manipulation for Humanoid Robot Using Partially Binarized Auto-Encoder on FPGA Reviewed

Ohara, S; Ogata, T; Awano, H

2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) page： 6010 - 6015 2021

Publisher：IEEE International Conference on Intelligent Robots and Systems

A neural-network-based flexible object manipulation system for a humanoid robot on an FPGA is proposed. Although the manipulation of flexible objects by robots has attracted ever-increasing attention, since these tasks are basic and essential activities in our daily life, it has been put into practice only recently with the help of deep neural networks. However, such systems have relied on GPU accelerators, which cannot be fitted into a space-limited robotic body. Although field-programmable gate arrays (FPGAs) are known to be energy efficient and suitable for embedded systems, the model size must be drastically reduced since FPGAs have limited on-chip memory. To this end, we propose a partially binarized deep convolutional auto-encoder technique, where only the encoder part is binarized to compress the model size without degrading the inference accuracy. The model implemented on a Xilinx ZCU102 achieves 41.1 frames per second with a power consumption of 3.1 W, which corresponds to 10× and 3.7× improvements over the systems implemented on a Core i7 6700K and an RTX 2080 Ti, respectively.

DOI： 10.1109/IROS51168.2021.9636825

Web of Science

Scopus
BloomCA: A Memory Efficient Reservoir Computing Hardware Implementation Using Cellular Automata and Ensemble Bloom Filter Reviewed

Liang, DH; Hashimoto, M; Awano, H

PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021) page： 587 - 590 2021

DOI： 10.23919/DATE51398.2021.9474047

Web of Science
Ising-PUF: A machine learning attack resistant PUF featuring lattice like arrangement of Arbiter-PUFs Reviewed

Awano H., Sato T.

Proceedings of the 2018 Design Automation and Test in Europe Conference and Exhibition Date 2018 Vol. 2018-January page： 1447 - 1452 2018.4

Publisher：Proceedings of the 2018 Design Automation and Test in Europe Conference and Exhibition Date 2018

The concept of Ising-PUF, a novel PUF structure that utilizes the chaotic behavior of mutually interacting small PUFs, is proposed. Ising-PUF consists of a lattice-like arrangement of small PUFs, each of which contains a spin register that stores the response of the small PUF and also serves as a challenge for its neighbors. The spin patterns that develop over time determine the 1-bit response of the Ising-PUF. Utilizing the state-memorizing nature of the spin registers, Ising-PUF attains challenge hysteresis, i.e., it allows a sequence of challenge inputs that continuously stimulate its chaotic behavior, which provides a drastically larger challenge-to-response space. Experimental results demonstrate nearly ideal metrics: an inter-chip Hamming distance (HD) of 50.1% and an inter-environment HD of 2.26%. Further, Ising-PUF is remarkably tolerant to machine learning attacks: even with a deep neural network trained on 50k CRPs, the prediction accuracy remains 50%, which is comparable to a random guess.

DOI： 10.23919/DATE.2018.8342239

Scopus
Visualizing Phonotactic Behavior of Female Frogs in Darkness Reviewed Open Access

Aihara, I; Bishop, PJ; Ohmer, MEB; Awano, H; Mizumoto, T; Okuno, HG; Narins, PM; Hero, JM

SCIENTIFIC REPORTS Vol. 7 ( 1 ) page： 10539 2017.9

Language：English Publisher：Scientific Reports

Many animals use sounds produced by conspecifics for mate identification. Female insects and anuran amphibians, for instance, use acoustic cues to localize, orient toward and approach conspecific males prior to mating. Here we present a novel technique that utilizes multiple, distributed sound-indication devices and a miniature LED backpack to visualize and record the nocturnal phonotactic approach of females of the Australian orange-eyed tree frog (Litoria chloris) both in a laboratory arena and in the animal's natural habitat. Continuous high-definition digital recording of the LED coordinates provides automatic tracking of the female's position, and the illumination patterns of the sound-indication devices allow us to discriminate multiple sound sources including loudspeakers broadcasting calls as well as calls emitted by individual male frogs. This innovative methodology is widely applicable for the study of phonotaxis and spatial structures of acoustically communicating nocturnal animals.

DOI： 10.1038/s41598-017-11150-y

Open Access

Web of Science

Scopus

PubMed
RTN in Scaled Transistors for On-Chip Random Seed Generation Reviewed

Mohanty, A; Sutaria, KB; Awano, H; Sato, T; Cao, Y

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Vol. 25 ( 8 ) page： 2248 - 2257 2017.8

Publisher：IEEE Transactions on Very Large Scale Integration VLSI Systems

Random numbers play a vital role in cryptography, where they are used to generate keys, nonces, one-time pads, and initialization vectors for symmetric encryption. The quality of a random number generator (RNG) has significant implications for the vulnerability and performance of these algorithms. A pseudo-RNG uses a deterministic algorithm to produce numbers with a distribution very similar to uniform. True RNGs (TRNGs), on the other hand, use some natural phenomenon/process to generate random bits. They are nondeterministic, because the next number to be generated cannot be determined in advance. In this paper, a novel on-chip noise source, random telegraph noise (RTN), is exploited for a simple and reliable TRNG. RTN, a microscopic process of stochastic trapping/detrapping of charges, is usually considered as noise and mitigated in design. Through physical modeling and silicon measurement, we demonstrate that RTN is appropriate for TRNG, especially in highly scaled MOSFETs. Due to the slow speed of RTN, we propose the system for on-chip seed generation for random numbers. Our contributions are: 1) physical model calibration of RTN with comprehensive 65- and 180-nm transistor measurements; 2) the scaling trend of RTN, validated with silicon data down to 28 nm; 3) design principles to achieve 50% signal probability by using intrinsic RTN physical properties, so that, without traditional postprocessing algorithms, the generated sequence passes the National Institute of Standards and Technology (NIST) tests; and 4) solutions to manage realistic issues in practice, including multilevel RTN signals, robustness to voltage and temperature fluctuations, and the operation speed.

DOI： 10.1109/TVLSI.2017.2687762

Web of Science

Scopus
Scalable Device Array for Statistical Characterization of BTI-Related Parameters Reviewed

Awano, H; Morita, S; Sato, T

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Vol. 25 ( 4 ) page： 1455 - 1466 2017.4

　More details

Publisher：IEEE Transactions on Very Large Scale Integration VLSI Systems

A device array circuit, scalable in terms of the number of transistors used, is proposed. The proposed array facilitates accurate and simultaneous bias voltage application to a large number of devices, making it suitable for the measurement-based statistical characterization of device degradation, known as bias temperature instability. Using the proposed array, the degradation measurement of thousands of transistors is made possible in a practical amount of time. The experimental results show that the defect-centric model can approximate the statistical variation in magnitudes of threshold voltage shifts ($\Delta V_{\mathrm{TH}}$) and that the variance of $\Delta V_{\mathrm{TH}}$ bears an inverse relationship to the channel areas of transistors. The degradation variability under ac stress conditions is also presented for the first time.
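
The inverse relationship between the variance of $\Delta V_{\mathrm{TH}}$ and channel area is what a defect-centric picture would predict. The short derivation below is a sketch under common textbook assumptions, not the paper's own formulation: the number of charged defects $N$ is Poisson distributed with a mean proportional to the channel area $A$, and each defect contributes an exponentially distributed shift with mean $\eta$ inversely proportional to $A$. The constants $n_0$ and $\eta_0$ are hypothetical.

```latex
\Delta V_{\mathrm{TH}} = \sum_{i=1}^{N} v_i, \qquad
N \sim \mathrm{Poisson}(\langle N \rangle), \qquad
v_i \sim \mathrm{Exp}(\text{mean } \eta)

\mathrm{E}[\Delta V_{\mathrm{TH}}] = \langle N \rangle \, \eta, \qquad
\mathrm{Var}[\Delta V_{\mathrm{TH}}] = \langle N \rangle \, \mathrm{E}[v_i^2] = 2 \langle N \rangle \, \eta^2

\langle N \rangle = n_0 A, \quad \eta = \frac{\eta_0}{A}
\;\Longrightarrow\;
\mathrm{E}[\Delta V_{\mathrm{TH}}] = n_0 \eta_0, \qquad
\mathrm{Var}[\Delta V_{\mathrm{TH}}] = \frac{2 n_0 \eta_0^2}{A}
```

Under these assumptions the mean shift does not depend on area, while the variance falls as $1/A$.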

DOI： 10.1109/TVLSI.2016.2638021

Web of Science

Scopus
Efficient circuit failure probability calculation along product lifetime considering device aging Reviewed

Awano H., Hiromoto M., Sato T.

Proceedings of the Asia and South Pacific Design Automation Conference ASP-DAC page： 93 - 98 2017.2

　More details

Publisher：Proceedings of the Asia and South Pacific Design Automation Conference ASP-DAC

A device-aging simulation that efficiently estimates temporal degradation of failure probability of a circuit is proposed. As the size of transistors shrinks, consideration of device aging in addition to manufacturing variability has become an urgent issue for maintaining reliability of LSIs. Contrary to existing techniques that separately handle manufacturing variability and the device aging, we propose a simultaneous evaluation approach using an augmented reliability and subset simulation. By eliminating the repetitive failure-probability calculations at each device-age, the proposed method reduces the number of required circuit simulations to about 1/6 of that of the conventional method without compromising accuracy.

DOI： 10.1109/ASPDAC.2017.7858302

Scopus
Swarm of sound-to-light conversion devices to monitor acoustic communication among small nocturnal animals Reviewed Open Access

Mizumoto T., Aihara I., Otsuka T., Awano H., Okuno H.G.

Journal of Robotics and Mechatronics Vol. 29 ( 1 ) page： 255 - 267 2017.2

　More details

Publisher：Journal of Robotics and Mechatronics

While many robots have been developed to monitor environments, most studies are dedicated to navigation and locomotion and use off-the-shelf sensors. We focus on a novel acoustic device and its processing software, which is designed for a swarm of environmental monitoring robots equipped with the device. This paper demonstrates that a swarm of monitoring devices is useful for biological field studies, i.e., understanding the spatio-temporal structure of acoustic communication among animals in their natural habitat. The following processes are required in monitoring acoustic communication to analyze the natural behavior in the field: (1) working in their habitat, (2) automatically detecting multiple and simultaneous calls, (3) minimizing the effect on the animals and their habitat, and (4) working with various distributions of animals. We present a sound-imaging system using sound-to-light conversion devices called “Fireflies” and their data analysis method that satisfies the requirements. We can easily collect data by placing a swarm (dozens) of Fireflies and record their light intensities using an off-the-shelf video camera. Because each Firefly converts sound in its vicinity into light, we can easily obtain when, how long, and where animals call using temporal analysis of the Firefly light intensities. The device is evaluated in terms of three aspects: volume to light-intensity characteristics, battery life through indoor experiments, and water resistance via field experiments. We also present the visualization of a chorus of Japanese tree frogs (Hyla japonica) recorded in their habitat, that is, paddy fields.

DOI： 10.20965/jrm.2017.p0255

Open Access

Scopus
Efficient Aging-Aware Failure Probability Estimation Using Augmented Reliability and Subset Simulation Reviewed Open Access

AWANO Hiromitsu, SATO Takashi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E100.A ( 12 ) page： 2807 - 2815 2017

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

A circuit-aging simulation that efficiently calculates the temporal change of rare circuit-failure probability is proposed. While conventional methods require a long computational time because failure probability must be calculated separately at each device age, the proposed Monte Carlo based method requires running only a single set of simulations. By applying the augmented reliability and subset simulation framework, the change of failure probability along the lifetime of the device can be evaluated through the analysis of the Monte Carlo samples. Combined with the two-step sample generation technique, the proposed method reduces the computational time to about 1/6 of that of the conventional method while maintaining sufficient estimation accuracy.

DOI： 10.1587/transfun.e100.a.2807

Web of Science

Scopus

CiNii Research
Identification and Application of Invariant Critical Paths under NBTI Degradation Reviewed Open Access

BIAN Song, MORITA Shumpei, SHINTANI Michihiro, AWANO Hiromitsu, HIROMOTO Masayuki, SATO Takashi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E100.A ( 12 ) page： 2797 - 2806 2017

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

As technology further scales semiconductor devices, aging-induced device degradation has become one of the major threats to device reliability. In addition, aging mechanisms like the negative bias temperature instability (NBTI) are known to be sensitive to workload (i.e., signal probability), which is hard to assume at the design phase. In this work, we analyze the workload dependence of NBTI degradation using a processor, and propose a novel technique to estimate the worst-case paths. In our approach, we exploit the fact that the deterministic nature of circuit structure limits the amount of NBTI degradation on different paths, and propose a two-stage path extraction algorithm to identify the invariant critical paths (ICPs) in the processor. Utilizing these paths, we also propose an optimization technique for the replacement of internal node control logic that mitigates the NBTI degradation in the design. Through numerical experiments on two processor designs, we achieved a nearly 300x reduction in the sheer number of paths on both designs. Utilizing the extracted ICPs, we achieved a 96x-197x speedup without loss in mitigation gain.

DOI： 10.1587/transfun.e100.a.2797

Web of Science

Scopus

CiNii Research
Physically unclonable function using RTN-induced delay fluctuation in ring oscillators Reviewed

Yoshinaga M., Awano H., Hiromoto M., Sato T.

Proceedings IEEE International Symposium on Circuits and Systems Vol. 2016-July page： 2619 - 2622 2016.7

　More details

Publisher：Proceedings IEEE International Symposium on Circuits and Systems

This paper proposes RTN-PUF, a novel PUF that utilizes random telegraph noise (RTN) of transistors as the physical uniqueness of individual devices. Our proposed RTN-PUF generates a response from a pair of ring oscillators (ROs) by comparing the numbers of frequency changes, which depend on the time constants of RTN. Due to the log-uniform distribution of the time constants, our RTN-PUF provides more stable responses than the existing manufacturing-variation-based PUFs. The numerical experiments show that the RTN-PUF reduces false negative errors by about 60 times compared to the conventional RO-based PUF. This facilitates the use of PUFs for security purposes.
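
The response generation described in the abstract above, comparing the numbers of frequency changes of two ring oscillators, can be sketched roughly as follows. The function names, the sampled-frequency input format, and the change-detection `threshold` are hypothetical choices for illustration; the abstract does not specify how frequency changes are detected or counted.

```python
def count_frequency_changes(frequencies, threshold):
    """Count how often consecutive frequency samples differ by more than threshold."""
    return sum(
        1
        for prev, curr in zip(frequencies, frequencies[1:])
        if abs(curr - prev) > threshold
    )


def puf_response_bit(freqs_a, freqs_b, threshold):
    """Return 1 if ring oscillator A shows more frequency changes than B, else 0."""
    changes_a = count_frequency_changes(freqs_a, threshold)
    changes_b = count_frequency_changes(freqs_b, threshold)
    return int(changes_a > changes_b)
```

For example, `puf_response_bit([100, 100, 101, 101, 100, 101], [100, 100, 100, 101, 101, 101], 0.5)` compares 3 changes against 1 and yields the bit 1.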

DOI： 10.1109/ISCAS.2016.7539130

Scopus
Call Alternation between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural Habitat Reviewed Open Access

Aihara, I; Mizumoto, T; Awano, H; Okuno, HG

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5 Vol. 08-12-September-2016 page： 2597 - 2601 2016

　More details

Publisher：Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

Male frogs vocalize calls to attract conspecific females as well as to announce their own territories to other male frogs. In the choruses, acoustic interaction allows the male frogs to alternate their calls with each other. Such call alternation is reported in various species of frogs including Japanese tree frogs (Hyla japonica). During call alternation, both male and female frogs are likely to discriminate calls of the male frogs because of the small amount of call overlap. Here, we show that call alternation is observed in natural choruses of male Japanese tree frogs, especially between neighboring pairs. First, we demonstrate that caller positions and call timings can be estimated by a sound-imaging method. Second, the occurrence of call alternation is detected on the basis of statistical tests on phase differences of calls between respective pairs. Although our previous study revealed a global synchronization pattern in natural choruses of the male frogs, local chorus structures were not examined well. Through the observation of call alternation between specific pairs, this study suggests the existence of selective attention in the frog choruses.
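
The phase differences mentioned in the abstract above can be computed from call timings along the following lines. This is a minimal sketch assuming sorted call-onset times for two frogs; the function names are hypothetical, and the mean resultant length used here is a generic circular statistic, not necessarily the statistical test used in the paper. A mean phase near pi with a concentration near 1 would indicate alternation.

```python
import math


def phase_differences(calls_a, calls_b):
    """Phase (0 to 2*pi) of each call of frog B within the enclosing call cycle of frog A.
    Both inputs are assumed to be sorted lists of call-onset times."""
    phases = []
    for t_b in calls_b:
        for start, end in zip(calls_a, calls_a[1:]):
            if start <= t_b < end:
                phases.append(2.0 * math.pi * (t_b - start) / (end - start))
                break
    return phases


def circular_mean_and_concentration(phases):
    """Mean direction (0 to 2*pi) and mean resultant length R (0 to 1) of the phases."""
    c = sum(math.cos(p) for p in phases) / len(phases)
    s = sum(math.sin(p) for p in phases) / len(phases)
    return math.atan2(s, c) % (2.0 * math.pi), math.hypot(c, s)
```

Calls of B that fall outside every cycle of A are simply skipped in this sketch.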

DOI： 10.21437/Interspeech.2016-336

Open Access

Web of Science

Scopus
Efficient Aging-Aware SRAM Failure Probability Calculation via Particle Filter-Based Importance Sampling Reviewed Open Access

AWANO Hiromitsu, HIROMOTO Masayuki, SATO Takashi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E99.A ( 7 ) page： 1390 - 1399 2016

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

An efficient Monte Carlo (MC) method for the calculation of failure probability degradation of an SRAM cell due to negative bias temperature instability (NBTI) is proposed. In the proposed method, a particle filter is utilized to incrementally track temporal performance changes in an SRAM cell. The number of simulations required to obtain a stable particle distribution is greatly reduced by reusing the final distribution of the particles in the last time step as the initial distribution. Combined with a binary classifier that quickly judges whether an MC sample causes a malfunction of the cell, this significantly reduces the total number of simulations needed to capture the temporal change of failure probability. The proposed method achieves a 13.4× speed-up over the state-of-the-art method.

DOI： 10.1587/transfun.e99.a.1390

Web of Science

Scopus

CiNii Research
Workload-Aware Worst Path Analysis of Processor-Scale NBTI Degradation Reviewed

Bian, S; Shintani, M; Morita, S; Awano, H; Hiromoto, M; Sato, T

2016 INTERNATIONAL GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI) Vol. 18-20-May-2016 page： 203 - 208 2016

　More details

Publisher：Proceedings of the ACM Great Lakes Symposium on VLSI GLSVLSI

As technology further scales semiconductor devices, aging-induced device degradation has become one of the major threats to device reliability. In addition, aging mechanisms like the negative bias temperature instability (NBTI) are known to be sensitive to workload (i.e., signal probability), which is hard to assume at the design phase. In this work, we analyze the workload dependence of NBTI degradation using a processor, and propose a novel technique to estimate the worst-case paths. In our approach, with careful examination, we exploit the fact that the deterministic nature of circuit structure limits the amount of NBTI degradation on different paths, and propose a two-stage path extraction algorithm to identify the invariable critical paths in the processor. Through numerical experiments on a MIPS32 processor, we performed a detailed signal probability analysis, and successfully extracted 85 invariable critical paths out of the 24,978 path candidates, achieving a nearly 300x reduction in the sheer number of paths.

DOI： 10.1145/2902961.2903013

Web of Science

Scopus
Efficient Transistor-level Timing Yield Estimation via Line Sampling Reviewed

Awano, H.; Sato, T.

2016 ACM/EDAC/IEEE Design Automation Conference (DAC) 2016

　More details

Authorship：Lead author Language：English
ECRIPSE: An efficient method for calculating RTN-induced failure probability of an SRAM cell Reviewed

Awano H., Hiromoto M., Sato T.

Proceedings Design Automation and Test in Europe DATE Vol. 2015-April page： 549 - 554 2015.4

　More details

Publisher：Proceedings Design Automation and Test in Europe DATE

Failure rate degradation of an SRAM cell due to random telegraph noise (RTN) is calculated for the first time. ECRIPSE, an efficient method for calculating the RTN-induced failure probability of an SRAM cell, has been developed to exhaustively cover a large number of possible bias-voltage combinations on which RTN statistics strongly depend. In order to shorten computational time, the Monte Carlo calculation of a single gate-bias condition is accelerated by incorporating two techniques: 1) construction of an optimal importance sampling using particles that move about the 'important' regions in a variability space, and 2) a classifier that quickly judges whether the random samples are in failure regions or not. We show that the proposed method achieves at least a 15.6× speed-up over the state-of-the-art method. We then integrate an RTN model to modulate failure probability. In our experiment, RTN makes the failure probability six times worse than that calculated without the effect of RTN.
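
To illustrate why importance sampling helps with rare failure probabilities, the sketch below estimates a Gaussian tail probability by sampling from a proposal distribution shifted into the failure region. This is plain mean-shifted importance sampling on a one-dimensional toy problem, not the particle-based construction or the classifier of ECRIPSE; the function names and the toy failure criterion `x > threshold` are assumptions made for illustration.

```python
import math
import random


def failure_probability_is(threshold, n_samples=20000, seed=0):
    """Estimate P(x > threshold) for x ~ N(0, 1) by sampling from N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the shifted proposal
        if x > threshold:  # toy "failure" indicator
            # Likelihood ratio p(x)/q(x) between N(0, 1) and N(threshold, 1).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def exact_tail(threshold):
    """Exact Gaussian tail probability, used only to check the estimate."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

For a threshold of 4, the true probability is on the order of 3e-5, so naive Monte Carlo with the same 20,000 samples would usually see no failures at all, whereas about half of the shifted samples land in the failure region.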

DOI： 10.7873/date.2015.0731

Scopus
A-3-5 On Stochastic modeling of NBTI induced threshold voltage variation

Sato Masahiro, Izuka Syoichi, Awano Hiromitsu, Hashimoto Masanori, Onoye Takao

Proceedings of the IEICE General Conference Vol. 2015 page： 84 2015.2

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Research
Recognition of in-field frog chorusing using Bayesian nonparametric microphone array processing Reviewed

Bandog Y., Otsuka T., Aihara I., Awano H., Itoyama K., Yoshii K., Okuno H.G.

AAAI Workshop Technical Report Vol. WS-15-06 page： 2 - 6 2015

　More details

Publisher：AAAI Workshop Technical Report

In this paper, we exploit Bayesian nonparametric microphone array processing (BNP-MAP) for analyzing the spatio-temporal patterns of the frog chorus. Such analysis in real environments is made more difficult by unpredictable sound sources, including calls of various species of animals. Applying conventional signal processing algorithms has been difficult because these algorithms usually require the number of sound sources in advance. BNP-MAP is developed to cope with auditory uncertainties such as reverberation or an unknown number of sounds by using a unified model based on Bayesian nonparametrics. We apply BNP-MAP to 20 minutes of sound data captured by a 7-channel microphone array in a paddy rice field on Oki Island, Japan, and reveal that two individuals of Schlegel's green tree frog (Rhacophorus schlegelii) called alternately in anti-phase. This result is compared with the video data captured by a video camera with 18 units of sound-imaging devices called Firefly deployed along the bank of the rice field. The auditory result provides more detailed patterns of the frog chorus at higher temporal resolution. This higher resolution enables analysis of fine temporal structures of the frog calls. For example, BNP-MAP reveals the trill-like calling pattern of R. schlegelii.

Scopus
Variability in device degradations: Statistical observation of NBTI for 3996 transistors Reviewed

Awano H., Hiromoto M., Sato T.

European Solid State Device Research Conference page： 218 - 221 2014.11

　More details

Publisher：European Solid State Device Research Conference

Degradations of thousands of transistors have been observed in a practical time. A novel device array circuit suitable for measurement-based statistical characterization has been devised to facilitate parallel stress bias application to capture negative bias temperature instability (NBTI). The experimental results show that log-normal distributions approximate the distribution of power-law exponents very well and that the variation in magnitude of threshold voltage shifts bears an inverse relation to the channel areas of transistors. The variability in degradations under an AC-stress condition is also presented for the first time.

DOI： 10.1109/ESSDERC.2014.6948799

Scopus
A-7-1 A Study of Chip Identification Using Random Telegraph Noise

Yoshinaga Motoki, Awano Hiromitsu, Hiromoto Masayuki, Sato Takashi

Proceedings of the Society Conference of IEICE Vol. 2014 page： 95 2014.9

　More details

Language：Japanese Publisher：The Institute of Electronics, Information and Communication Engineers

CiNii Research
BTIarray: A Time-Overlapping Transistor Array for Efficient Statistical Characterization of Bias Temperature Instability Reviewed

Awano, H; Hiromoto, M; Sato, T

IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY Vol. 14 ( 3 ) page： 833 - 843 2014.9

　More details

Publisher：IEEE Transactions on Device and Materials Reliability

A transistor array has been developed that is capable of efficiently collecting parametric data for a statistical model of bias-temperature instability (BTI) degradation. This BTIarray uses a time-overlapping technique, in which all transistors in the array undergo BTI stress or recovery bias in parallel, which greatly reduces the measurement time for a large number of transistors. An implementation using 65-nm technology validated the time-overlapping concept. The use of this array reduces the time to measure the statistical threshold voltage shifts of 128 transistors from a month to within a day while retaining precision as high as 50 μV (rms). Experiments showed that the statistical distribution of the time exponent for the degradation model of the pMOS transistor was log-normal.

DOI： 10.1109/TDMR.2014.2327164

Web of Science

Scopus
Spatio-Temporal Dynamics in Collective Frog Choruses Examined by Mathematical Modeling and Field Observations Reviewed Open Access

Aihara, I; Mizumoto, T; Otsuka, T; Awano, H; Nagira, K; Okuno, HG; Aihara, K

SCIENTIFIC REPORTS Vol. 4 page： 3891 2014.1

　More details

Language：English Publisher：Scientific Reports

This paper reports theoretical and experimental studies on spatio-temporal dynamics in the choruses of male Japanese tree frogs. First, we theoretically model their calling times and positions as a system of coupled mobile oscillators. Numerical simulation of the model as well as calculation of the order parameters show that the spatio-temporal dynamics exhibits bistability between two-cluster antisynchronization and wavy antisynchronization, by assuming that the frogs are attracted to the edge of a simple circular breeding site. Second, we change the shape of the breeding site from the circle to rectangles including a straight line, and evaluate the stability of two-cluster and wavy antisynchronization. Numerical simulation shows that two-cluster antisynchronization is more frequently observed than wavy antisynchronization. Finally, we recorded frog choruses at an actual paddy field using our sound-imaging method. Analysis of the video demonstrated a consistent result with the aforementioned simulation: namely, two-cluster antisynchronization was more frequently realized.
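
As a much-reduced illustration of antisynchronization, consider two identical phase oscillators that repel each other's phase: dθ1/dt = ω − K sin(θ2 − θ1) and dθ2/dt = ω − K sin(θ1 − θ2). Their phase difference φ = θ2 − θ1 then obeys dφ/dt = 2K sin φ, for which φ = π is the stable fixed point. The sketch below integrates this equation with the Euler method. It is a toy with two fixed oscillators and hypothetical parameter values, not the coupled mobile-oscillator model of the paper.

```python
import math


def simulate_phase_difference(phi0=0.3, coupling=1.0, dt=0.01, steps=5000):
    """Euler-integrate d(phi)/dt = 2*K*sin(phi), the phase-difference dynamics of
    two identical oscillators with mutually repulsive phase coupling."""
    phi = phi0
    for _ in range(steps):
        phi += dt * 2.0 * coupling * math.sin(phi)
    return phi % (2.0 * math.pi)
```

Starting from a small initial phase difference, the returned value approaches pi, i.e., the two oscillators end up calling in anti-phase.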

DOI： 10.1038/srep03891

Open Access

Web of Science

Scopus

PubMed
A scalable device array for statistical device-aging characterization Reviewed

Sato T., Awano H., Hiromoto M.

Proceedings 2014 IEEE 12th International Conference on Solid State and Integrated Circuit Technology ICSICT 2014 2014.1

　More details

Publisher：Proceedings 2014 IEEE 12th International Conference on Solid State and Integrated Circuit Technology ICSICT 2014

A device array circuit that is suitable for efficiently characterizing device parameter degradation due to bias temperature instability (BTI) is reviewed. The device array facilitates parallel application of stress and recovery bias voltages to multiple devices, reducing total measurement time significantly. The device count in the array is easily scalable to meet the necessary statistical confidence level. Measurement examples of the two implementations containing 128 and 3,996 devices are also presented.

DOI： 10.1109/ICSICT.2014.7021224

Scopus
Automation of Model Parameter Estimation for Random Telegraph Noise Reviewed Open Access

SHIMIZU Hirofumi, AWANO Hiromitsu, HIROMOTO Masayuki, SATO Takashi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E97.A ( 12 ) page： 2383 - 2392 2014

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

The modeling of random telegraph noise (RTN) of MOS transistors is becoming increasingly important. In this paper, a novel method is proposed for realizing automated estimation of two important RTN-model parameters: the number of interface-states and the corresponding threshold voltage shift. The proposed method utilizes a Gaussian mixture model (GMM) to represent the voltage distributions, and estimates their parameters using the expectation-maximization (EM) algorithm. Using information criteria, the optimal estimation is automatically obtained while avoiding overfitting. In addition, we use a shared variance for all the Gaussian components in the GMM to deal with the noise in RTN signals. The proposed method improves estimation accuracy when large measurement noise is present.
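
The idea of fitting a GMM with one variance shared by all components can be sketched with a small one-dimensional EM loop. This is an illustrative implementation under simplifying assumptions: the number of components is given rather than selected by an information criterion, initial means are spread evenly over the data range, and there are no safeguards against empty components or zero variance. The function name is hypothetical and the code is not the paper's implementation.

```python
import math


def fit_shared_variance_gmm(data, n_components, n_iter=50):
    """Fit a 1-D Gaussian mixture whose components share one variance, using EM.
    Returns (weights, means, shared_variance)."""
    lo, hi = min(data), max(data)
    # Spread the initial means evenly over the data range.
    if n_components == 1:
        means = [(lo + hi) / 2.0]
    else:
        means = [lo + (hi - lo) * k / (n_components - 1) for k in range(n_components)]
    weights = [1.0 / n_components] * n_components
    mean_all = sum(data) / len(data)
    variance = sum((x - mean_all) ** 2 for x in data) / len(data)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            # The Gaussian normalization constant is shared, so it cancels out.
            p = [w * math.exp(-(x - m) ** 2 / (2.0 * variance))
                 for w, m in zip(weights, means)]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update the weights, the means, and the single shared variance.
        counts = [sum(r[k] for r in resp) for k in range(n_components)]
        weights = [c / len(data) for c in counts]
        means = [sum(r[k] * x for r, x in zip(resp, data)) / counts[k]
                 for k in range(n_components)]
        variance = sum(r[k] * (x - means[k]) ** 2
                       for r, x in zip(resp, data)
                       for k in range(n_components)) / len(data)
    return weights, means, variance
```

For a two-level RTN-like signal, the spacing between the fitted means plays the role of the threshold voltage shift, and the shared variance absorbs the measurement noise common to both levels.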

DOI： 10.1587/transfun.e97.a.2383

Web of Science

Scopus

CiNii Research
Compact Modeling of Statistical BTI under Trapping/Detrapping Reviewed

Velamala, JB; Sutaria, KB; Shimizu, H; Awano, H; Sato, T; Wirth, G; Cao, Y

IEEE TRANSACTIONS ON ELECTRON DEVICES Vol. 60 ( 11 ) page： 3645 - 3654 2013.11

　More details

Publisher：IEEE Transactions on Electron Devices

The aging process due to negative bias temperature instability (NBTI) is a key limiting factor of circuit lifetimes in CMOS design. Recent NBTI data exhibit an excessive amount of randomness and fast recovery, which are difficult to handle with the conventional power-law model ($t^n$). Such discrepancies further complicate long-term reliability prediction under statistical variations and dynamic voltage scaling (DVS) in real circuit operation. To overcome these barriers, this paper: 1) practically explains the aging statistics due to randomness in the number of traps with the log(t) model, accurately predicting the mean and variance shift; 2) proposes a cycle-to-cycle model (from the first principles of trapping) to handle aging under multiple supply voltages, predicting the nonmonotonic behavior under DVS; 3) presents a long-term model to estimate a tight upper bound of dynamic aging over multiple cycles; and 4) comprehensively validates the new set of aging models with 65-nm statistical silicon data. Compared with previous models, the new set of aging models captures the aging variability and the essential role of the recovery phase under DVS, reducing unnecessary guard banding during the design stage.

DOI： 10.1109/TED.2013.2281986

Web of Science

Scopus
Logarithmic modeling of BTI under dynamic circuit operation: Static, dynamic and long-term prediction Reviewed

Velamala J.B., Sutaria K.B., Shimizu H., Awano H., Sato T., Wirth G., Cao Y.

IEEE International Reliability Physics Symposium Proceedings 2013.8

　More details

Publisher：IEEE International Reliability Physics Symposium Proceedings

Bias temperature instability (BTI) is the dominant source of aging in nanoscale transistors. Recent works show the role of charge trapping/de-trapping (T-D) in BTI through discrete $V_{\mathrm{th}}$ shifts, with the degradation exhibiting an excessive amount of randomness. Furthermore, modern circuits employ dynamic voltage scaling (DVS) where $V_{\mathrm{dd}}$ is tuned, complicating the aging effect. It becomes challenging to predict long-term aging in an actual circuit under statistical variation and DVS. To accurately predict the degradation in these circumstances, this work (1) examines the principles of T-D, thereby proposing static and cycle-to-cycle (dynamic) models under voltage tuning in DVS; (2) presents a long-term model, estimating a tight upper bound of dynamic aging; (3) comprehensively validates the new set of models with 65nm silicon data. The proposed aging models accurately capture the recovery behavior in dynamic operations, reducing the unnecessary margin and enhancing the simulation efficiency for aging estimation during the design stage.

DOI： 10.1109/IRPS.2013.6532063

Scopus
Multi-trap RTN parameter extraction based on Bayesian inference Reviewed

Awano H., Tsutsui H., Ochi H., Sato T.

Proceedings International Symposium on Quality Electronic Design ISQED page： 597 - 602 2013.7

　More details

Publisher：Proceedings International Symposium on Quality Electronic Design ISQED

This paper presents a new analysis method for estimating the statistical parameters of random telegraph noise (RTN). RTN is characterized by the time constants of carrier capture and emission, and the associated changes of threshold voltage. Because trap activities are projected onto the threshold voltage, the separation of time constants and amplitude for each trap is an ill-posed problem. The proposed method solves this problem with a statistical method that can reflect the physical generation process of RTN. Using the Gibbs sampling algorithm developed in the statistical machine learning community, we decompose the measured threshold voltage sequence into the time constants and amplitude of each trap. We also demonstrate that the proposed method estimates time constants about 2.1 times more accurately than the existing work that uses a hidden Markov model, which helps enhance the accuracy of reliability-aware circuit simulation.

DOI： 10.1109/ISQED.2013.6523672

Scopus
Statistical aging under dynamic voltage scaling: A logarithmic model approach Reviewed

Velamala J.B., Sutaria K., Shimizu H., Awano H., Sato T., Cao Y.

Proceedings of the Custom Integrated Circuits Conference 2012.11

　More details

Publisher：Proceedings of the Custom Integrated Circuits Conference

Aging mechanisms, such as Negative Bias Temperature Instability (NBTI), limit the lifetime of CMOS designs. Recent NBTI data exhibit an excessive amount of randomness and fast recovery, which are difficult to handle with the conventional power-law model ($t^n$). Such discrepancies further complicate long-term reliability prediction in real circuit operation. To overcome these barriers, this work (1) proposes a logarithmic model (log(t)) that is derived from the trapping/de-trapping assumptions; (2) practically explains the aging statistics and the non-monotonic behavior under dynamic voltage scaling (DVS); and (3) comprehensively validates the new model with 65nm silicon data. Compared to previous models, the new result captures the essential role of the recovery phase under DVS, reducing unnecessary guard-banding in reliability protection.

DOI： 10.1109/CICC.2012.6330572

Scopus
Statistical observations of NBTI-induced threshold voltage shifts on small channel-area devices Reviewed

Sato T., Awano H., Shimizu H., Tsutsui H., Ochi H.

Proceedings International Symposium on Quality Electronic Design ISQED page： 306 - 311 2012.7

　More details

Publisher：Proceedings International Symposium on Quality Electronic Design ISQED

Performance variability of miniaturized devices has become a major obstacle for designing electronic systems. Temporal degradation of threshold voltages and its variation are becoming additional concerns in ensuring their reliability. In this paper, based on measurement results on a large number of devices, we present statistical properties of device degradation and recovery. The measurement data are obtained by using a device-array circuit suitable for efficiently collecting statistical data on degradations and recoveries of very small channel-area devices. The stair-like change of threshold voltages found in our measurement suggests that charge trapping and emission may play a key role in the device degradation process.

DOI： 10.1109/ISQED.2012.6187510

Scopus
Bayesian Estimation of Multi-Trap RTN Parameters Using Markov Chain Monte Carlo Method Reviewed Open Access

AWANO Hiromitsu, TSUTSUI Hiroshi, OCHI Hiroyuki, SATO Takashi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E95.A ( 12 ) page： 2272 - 2283 2012

　More details

Language：English Publisher：The Institute of Electronics, Information and Communication Engineers

Random telegraph noise (RTN) is a phenomenon that is considered to limit the reliability and performance of circuits using advanced devices. The time constants of carrier capture and emission and the associated change in the threshold voltage are important parameters commonly included in various models, but their extraction from time-domain observations has been a difficult task. In this study, we propose a statistical method for simultaneously estimating interrelated parameters: the time constants and magnitude of the threshold voltage shift. Our method is based on a graphical network representation, and the parameters are estimated using the Markov chain Monte Carlo method. Experimental application of the proposed method to synthetic and measured time-domain RTN signals was successful. The proposed method can handle interrelated parameters of multiple traps and thereby contributes to the construction of more accurate RTN models.

DOI： 10.1587/transfun.e95.a.2272

Web of Science

Scopus

CiNii Research
Use of a sparse structure to improve learning performance of recurrent neural networks Reviewed

Awano H., Nishide S., Arie H., Tani J., Takahashi T., Okuno H.G., Ogata T.

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics Vol. 7064 LNCS ( PART 3 ) page： 323 - 331 2011.11

　More details

Publisher：Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

The objective of our study is to find out how a sparse structure affects the performance of a recurrent neural network (RNN). Only a few existing studies have dealt with the sparse structure of RNNs with learning methods like Back Propagation Through Time (BPTT). In this paper, we propose an RNN with sparse connection and BPTT called Multiple time scale RNN (MTRNN). Then, we investigate how sparse connection affects generalization performance and noise robustness. In the experiments using data composed of alphabetic sequences, the MTRNN showed the best generalization performance when the connection rate was 40%. We also measured sparseness of neural activity and found that sparseness of neural activity corresponds to generalization performance. These results mean that sparse connection improved learning performance and that sparseness of neural activity could be used as a metric of generalization performance.
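
A connection rate such as the 40% mentioned in the abstract above can be imposed on a recurrent weight matrix with a fixed binary mask. The sketch below assumes connections are chosen uniformly at random; the function names are hypothetical, and how the paper actually selects which connections to keep is not stated in the abstract.

```python
import random


def sparse_connection_mask(n_units, connection_rate, seed=0):
    """Return an n_units x n_units 0/1 mask with round(rate * n_units^2) ones."""
    rng = random.Random(seed)
    n_total = n_units * n_units
    n_keep = round(connection_rate * n_total)
    kept = set(rng.sample(range(n_total), n_keep))
    return [[1 if i * n_units + j in kept else 0 for j in range(n_units)]
            for i in range(n_units)]


def apply_mask(weights, mask):
    """Zero out the recurrent weights whose connection is absent in the mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]
```

During training, the same mask would be reapplied after every weight update so that pruned connections stay at zero.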

DOI： 10.1007/978-3-642-24965-5_36

A study on parameter estimation for modeling of random-telegraph noise

AWANO Hiromitsu, SHIMIZU Hirofumi, TSUTSUI Hiroshi, OCHI Hiroyuki, SATO Takashi

Vol. 111 ( 324 ) page： 85 - 90 2011.11

Language：Japanese

Human-robot cooperation in arrangement of objects using confidence measure of neuro-dynamical system Reviewed

Awano H., Ogata T., Nishide S., Takahashi T., Komatani K., Okuno H.

Conference Proceedings IEEE International Conference on Systems Man and Cybernetics page： 2533 - 2538 2010.12

Publisher：Conference Proceedings IEEE International Conference on Systems Man and Cybernetics

The objective of our study was to develop dynamic collaboration between a human and a robot. Most conventional studies have created pre-designed rule-based collaboration systems to determine when and how robots participate in tasks. Our aim is to introduce the confidence of the task as a criterion for robots to determine their timing and behavior. In this paper, we report the effectiveness of applying reproduction accuracy as a measure for quantitatively evaluating confidence in an object arrangement task. Our method comprises three phases. First, we obtain human-robot interaction data through the Wizard of Oz method. Second, a neuro-dynamical system, namely the Multiple Time-scales Recurrent Neural Network (MTRNN), is trained on the obtained data. Finally, the prediction error in MTRNN is applied as a confidence measure to determine the robot's behavior. The robot participated in the task when its confidence was high, while it just observed when its confidence was low. Training data were acquired using an actual robot platform, Hiro. The method was evaluated using a robot simulator. The results revealed that motion trajectories could be precisely reproduced with a high degree of confidence, demonstrating the effectiveness of the method. ©2010 IEEE.

DOI： 10.1109/ICSMC.2010.5641924

Human and Robot Cooperation for Arrangement of Objects by Prediction using Recurrent Neural Network

AWANO Hiromitsu, OGATA Tetsuya, KOMATANI Kazunori, TAKAHASHI Toru, OKUNO Hiroshi G.

Vol. 72 ( 0 ) page： 395 - 396 2010.3

Language：Japanese

Books 1

On-chip characterization of statistical device degradation

Sato T., Awano H.

Circuit Design for Reliability 2015.1 （ ISBN:9781461440772, 9781461440789 ）

Bias temperature instability (BTI) is one of the most critical degradation mechanisms that occur in modern semiconductor devices. The degradation due to BTI is transient and is known to be greatly influenced by bias voltages and temperature, making it very difficult to detect possible BTI-related failures during manufacturing test. Characterization and modeling of BTI is hence extremely important to protect a chip from BTI-related failures. In this chapter, an array structure that accelerates the statistical characterization of BTI is described. By overlapping the stress-application period for each device, measurements on hundreds or thousands of devices can be conducted concurrently. Test chip measurement results that provide statistical insight into the parameters of the BTI-related degradation process are also presented.

DOI： 10.1007/978-1-4614-4078-9_5
