Faculty Profiles - IRINO Toshio


IRINO Toshio

Name of department

Faculty of Systems Engineering, Media Design

Job title

Professor

Concurrent post

Informatics Division (Professor)

Education

1982 - 1987 Tokyo Institute of Technology Graduate School of Science and Engineering Department of Electrical and Electronic Engineering
1978 - 1982 Tokyo Institute of Technology School of Engineering Department of Electrical and Electronic Engineering

Degree

Doctor of Engineering 1987

Academic & Professional Experience

2005 - 2007 The Institute of Statistical Mathematics, Visiting Professor
2002 - Present Wakayama University Faculty of Systems Engineering, Professor
2000 - 2002 NTT Communication Science Laboratories, Senior Researcher
1997 - 2000 Advanced Telecommunications Research Institute International, Senior Researcher
1993 - 1994 Medical Research Council, Applied Psychology Unit, Visiting Researcher
1987 - 1997 NTT Basic Research Laboratories, Research Specialist to Senior Researcher


Association Memberships

Acoustical Society of America (ASA)
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS.
IEEE
ACOUSTICAL SOCIETY OF JAPAN
International Speech and Communication Association
ARO


Research Areas

Humanities & social sciences / Experimental psychology
Informatics / Perceptual information processing
Life sciences / Cognitive neuroscience
Humanities & social sciences / Clinical psychology
Humanities & social sciences / Linguistics
Informatics / Statistical science
Informatics / Intelligent robotics


Classes (including Experimental Classes, Seminars, Graduation Thesis Guidance, Graduation Research, and Topical Research)

2024 Graduation Research Specialized Subjects
2024 Sound Design Methods Specialized Subjects
2024 Graduation Research Specialized Subjects
2024 Media Design Seminar 1B Specialized Subjects
2024 Media Design Seminar 1A Specialized Subjects
2024 Media Signal Processing Basics Specialized Subjects
2023 Media Signal Processing Basics Specialized Subjects
2023 Sound Design Methods Specialized Subjects
2023 Graduation Research Specialized Subjects
2023 Introduction to Latest Information Technology Specialized Subjects
2023 Media Design Seminar 2A Specialized Subjects
2023 Media Design Seminar 1B Specialized Subjects
2023 Media Design Seminar 2B Specialized Subjects
2023 Media Design Seminar 1A Specialized Subjects
2023 Graduation Research Specialized Subjects
2023 Graduation Research Specialized Subjects
2023 Media Design Seminar 2A Specialized Subjects
2023 Media Design Seminar 2B Specialized Subjects
2023 Media Design Seminar 1A Specialized Subjects
2023 Media Design Seminar 1B Specialized Subjects
2023 Sound Design Methods Specialized Subjects
2023 Media Signal Processing Basics Specialized Subjects
2022 Fundamentals of Robotics Liberal Arts and Sciences Subjects
2022 Graduation Research Specialized Subjects
2022 Sound Design Methods Specialized Subjects
2022 Media Signal Processing Basics Specialized Subjects
2022 Media Design Seminar 2B Specialized Subjects
2022 Media Design Seminar 2A Specialized Subjects
2022 Media Design Seminar 1B Specialized Subjects
2022 Media Design Seminar 1A Specialized Subjects
2022 Introductory Seminar in Systems Engineering Specialized Subjects
2021 Sound Design Methods Specialized Subjects
2021 Graduation Research Specialized Subjects
2021 Fundamentals of Robotics Liberal Arts and Sciences Subjects
2021 Media Design Seminar 2B Specialized Subjects
2021 Media Design Seminar 2A Specialized Subjects
2021 Media Design Seminar 1B Specialized Subjects
2021 Media Design Seminar 1A Specialized Subjects
2021 Media Signal Processing Basics Specialized Subjects
2020 Graduation Research Specialized Subjects
2020 Media Design Seminar Ⅱ Specialized Subjects
2020 Media Signal Processing Basics Specialized Subjects
2020 Sound Design Methods Specialized Subjects
2020 Graduation Research Specialized Subjects
2020 Graduation Research Specialized Subjects
2020 Graduation Research Specialized Subjects
2020 Media Design Seminar 2B Specialized Subjects
2020 Media Design Seminar 2A Specialized Subjects
2020 Media Design Seminar 1B Specialized Subjects
2020 Media Design Seminar 1A Specialized Subjects
2020 Media Signal Processing Basics Specialized Subjects
2019 Media Design Seminar Ⅱ Specialized Subjects
2019 Media Design Seminar Ⅰ Specialized Subjects
2019 Exercises in Sound Programming Specialized Subjects
2019 Media Signal Processing Basics Specialized Subjects
2019 Introductory Seminar in Systems Engineering Specialized Subjects
2019 Sound Design Methods Specialized Subjects
2019 Introductory Seminar in Systems Engineering Specialized Subjects
2019 Sound Design Methods Specialized Subjects
2019 Media Signal Processing Basics Specialized Subjects
2019 Graduation Research Specialized Subjects
2019 Media Design Seminar Ⅱ Specialized Subjects
2019 Exercises in Sound Programming Specialized Subjects
2019 Sound Design Methods Specialized Subjects
2019 Media Design Seminar Ⅰ Specialized Subjects
2018 Media Signal Processing Basics Specialized Subjects
2018 Graduation Research Specialized Subjects
2018 Media Design Seminar Ⅱ Specialized Subjects
2018 Media Design Seminar Ⅰ Specialized Subjects
2018 Exercises in Sound Programming Specialized Subjects
2018 Sound Design Methods Specialized Subjects
2018 Sound Design Methods Specialized Subjects
2018 NA Specialized Subjects
2018 Graduation Research Specialized Subjects
2018 Exercises in Sound Programming Specialized Subjects
2018 Media Signal Processing Basics Specialized Subjects
2018 Media Design Seminar Ⅰ Specialized Subjects
2017 NA Specialized Subjects
2017 Voluntary Study on Systems Engineering Ⅳ Specialized Subjects
2017 Media Design Seminar Ⅰ Specialized Subjects
2017 Exercises in Sound Programming Specialized Subjects
2017 Media Signal Processing Basics Specialized Subjects
2017 Introductory Seminar in Systems Engineering Specialized Subjects
2017 Graduation Research Specialized Subjects
2017 Introductory Seminar in Systems Engineering Liberal Arts and Sciences Subjects
2017 Exercises in Sound Programming Specialized Subjects
2017 Media Signal Processing Basics Specialized Subjects
2017 Media Design Seminar Ⅱ Specialized Subjects
2017 Media Design Seminar Ⅰ Specialized Subjects
2016 Sound Design Methods Specialized Subjects
2016 Graduation Research Specialized Subjects
2016 Design and Information Sciences Seminar Ⅱ Specialized Subjects
2016 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2016 Voluntary Study on Systems Engineering Ⅴ Specialized Subjects
2016 Voluntary Study on Systems Engineering Ⅳ Specialized Subjects
2016 Exercises in Sound Programming Specialized Subjects
2016 Sound Design Methods Specialized Subjects
2016 Media Signal Processing Basics Specialized Subjects
2015 Applicable Mathematics to Computing Specialized Subjects
2015 Media Information Processing Specialized Subjects
2015 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2015 Digital Signal Processing Specialized Subjects
2015 Introductory Seminar in Systems Engineering Specialized Subjects
2015 Voluntary Study on Systems Engineering Ⅴ Specialized Subjects
2015 Voluntary Study on Systems Engineering Ⅲ Specialized Subjects
2015 Media Science (Basic Course) Specialized Subjects
2015 Design and Information Sciences Seminar Ⅱ Specialized Subjects
2015 Voluntary Study on Systems Engineering Ⅱ Specialized Subjects
2015 Media Science (Basic Course) Specialized Subjects
2015 Media Information Processing Specialized Subjects
2015 Introductory Seminar in Systems Engineering Liberal Arts and Sciences Subjects
2015 Digital Signal Processing Specialized Subjects
2015 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2015 Applicable Mathematics to Computing Specialized Subjects
2014 Design and Information Sciences Seminar Ⅱ Specialized Subjects
2014 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2014 Applicable Mathematics to Computing Specialized Subjects
2014 Media Science (Basic Course) Specialized Subjects
2014 Introduction to Design and Information Sciences Specialized Subjects
2014 Digital Signal Processing Specialized Subjects
2014 Introduction to Design and Information Sciences Specialized Subjects
2014 Information systems in everyday life Liberal Arts and Sciences Subjects
2014 Digital Signal Processing Specialized Subjects
2014 Media Science (Basic Course) Specialized Subjects
2014 Applicable Mathematics to Computing Specialized Subjects
2013 Design and Information Sciences Seminar Ⅱ Specialized Subjects
2013 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2013 Applicable Mathematics to Computing Specialized Subjects
2013 Media Science (Basic Course) Specialized Subjects
2013 Introduction to Design and Information Sciences Specialized Subjects
2013 Digital Signal Processing Specialized Subjects
2013 Information systems in everyday life Liberal Arts and Sciences Subjects
2013 Introductory Seminar Liberal Arts and Sciences Subjects
2013 Media Science (Basic Course) Specialized Subjects
2013 Information systems in everyday life Liberal Arts and Sciences Subjects
2013 Introduction to Design and Information Sciences Specialized Subjects
2013 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2013 Digital Signal Processing Specialized Subjects
2013 Media Science (Basic Course) Specialized Subjects
2013 Graduation Research Specialized Subjects
2013 Design and Information Sciences Seminar Ⅱ Specialized Subjects
2013 Applicable Mathematics to Computing Specialized Subjects
2013 Introductory Seminar Liberal Arts and Sciences Subjects
2012 Graduation Research Specialized Subjects
2012 Applicable Mathematics to Computing Specialized Subjects
2012 Introduction to Design and Information Sciences Specialized Subjects
2012 Design and Information Sciences Seminar Ⅰ Specialized Subjects
2012 Digital Signal Processing Specialized Subjects
2012 Voluntary Study on Systems Engineering Ⅴ Specialized Subjects
2012 Voluntary Study on Systems Engineering Ⅲ Specialized Subjects
2012 Information systems in everyday life Liberal Arts and Sciences Subjects
2012 Media Science (Basic Course) Specialized Subjects
2012 Design and Information Sciences Seminar Ⅱ Specialized Subjects
2011 Voluntary Study on Systems Engineering Ⅳ Specialized Subjects
2011 Voluntary Study on Systems Engineering Ⅲ Specialized Subjects
2011 Voluntary Study on Systems Engineering Ⅱ Specialized Subjects
2011 Voluntary Study on Systems Engineering Ⅰ Specialized Subjects
2011 Media Science (Basic Course) Specialized Subjects
2011 Information systems in everyday life Liberal Arts and Sciences Subjects
2011 Graduation Research Specialized Subjects
2011 Introduction to Design and Information Sciences Specialized Subjects
2011 Digital Signal Processing Specialized Subjects
2011 Applicable Mathematics to Computing Specialized Subjects
2011 NA Specialized Subjects
2011 NA Specialized Subjects
2010 Information systems in everyday life Liberal Arts and Sciences Subjects
2010 Graduation Research Specialized Subjects
2010 Introduction to Design and Information Sciences Specialized Subjects
2010 Media Science (Basic Course) Specialized Subjects
2010 Digital Signal Processing Specialized Subjects
2010 Applicable Mathematics to Computing Specialized Subjects
2010 NA Specialized Subjects
2010 NA Specialized Subjects
2009 NA Specialized Subjects
2009 NA Specialized Subjects
2009 Applicable Mathematics to Computing Specialized Subjects
2009 Digital Signal Processing Specialized Subjects
2009 Media Science (Basic Course) Specialized Subjects
2009 Introduction to Design and Information Sciences Specialized Subjects
2009 Graduation Research Specialized Subjects
2009 Information systems in everyday life Liberal Arts and Sciences Subjects
2008 NA Specialized Subjects
2008 NA Specialized Subjects
2008 Applicable Mathematics to Computing Specialized Subjects
2008 Digital Signal Processing Specialized Subjects
2008 Media Science (Basic Course) Specialized Subjects
2008 Introduction to Design and Information Sciences Specialized Subjects
2008 Graduation Research Specialized Subjects
2008 Information systems in everyday life Liberal Arts and Sciences Subjects
2007 NA Specialized Subjects
2007 NA Specialized Subjects
2007 Applicable Mathematics to Computing Specialized Subjects
2007 Digital Signal Processing Specialized Subjects
2007 Media Science (Basic Course) Specialized Subjects
2007 Introduction to Design and Information Sciences Specialized Subjects
2007 Graduation Research Specialized Subjects
2007 Information systems in everyday life Liberal Arts and Sciences Subjects


Independent study

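2016 Building loudspeakers and acquiring basic knowledge of sound and audio equipment
2015 Understanding how sound is produced through loudspeakers
2015 Comparing the characteristics of drums and vocal percussion
2011 Building a stereo system comparable to high-end audio
2011 Building an audio amplifier
2010 How hearing and loudspeakers work
2010 Building a sound presentation device
2010 Basic study of the relationship between body movement and hearing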


Classes

2024 Systems Engineering Advanced Research Doctoral Course
2024 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2024 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2024 Systems Engineering SeminarⅠB Master's Course
2024 Systems Engineering SeminarⅡB Master's Course
2024 Systems Engineering SeminarⅠB Master's Course
2024 Systems Engineering SeminarⅡB Master's Course
2024 Systems Engineering Project SeminarⅠA Master's Course
2024 Systems Engineering Project SeminarⅠB Master's Course
2024 Systems Engineering Project SeminarⅡA Master's Course
2024 Systems Engineering Project SeminarⅡB Master's Course
2024 Systems Engineering Project SeminarⅡB Master's Course
2023 Systems Engineering Project SeminarⅡB Master's Course
2023 Systems Engineering Project SeminarⅡA Master's Course
2023 Systems Engineering Project SeminarⅠB Master's Course
2023 Systems Engineering Project SeminarⅠA Master's Course
2023 Systems Engineering SeminarⅡB Master's Course
2023 Systems Engineering SeminarⅡA Master's Course
2023 Systems Engineering SeminarⅠB Master's Course
2023 Systems Engineering SeminarⅠA Master's Course
2023 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2023 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2023 Systems Engineering Advanced Research Doctoral Course
2023 Systems Engineering Global Seminar Ⅰ Doctoral Course
2023 Systems Engineering Global Seminar Ⅰ Doctoral Course
2023 Systems Engineering Global Seminar Ⅱ Doctoral Course
2023 Systems Engineering Global Seminar Ⅱ Doctoral Course
2023 Systems Engineering SeminarⅠB Master's Course
2023 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2023 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2023 Systems Engineering Advanced Research Doctoral Course
2023 Systems Engineering SeminarⅠA Master's Course
2023 Systems Engineering SeminarⅡA Master's Course
2023 Systems Engineering SeminarⅡB Master's Course
2023 Systems Engineering Project SeminarⅠA Master's Course
2023 Systems Engineering Project SeminarⅠB Master's Course
2023 Systems Engineering Project SeminarⅡA Master's Course
2023 Systems Engineering Project SeminarⅡB Master's Course
2022 Systems Engineering Global Seminar Ⅱ Doctoral Course
2022 Systems Engineering Global Seminar Ⅰ Doctoral Course
2022 Systems Engineering Advanced Research Doctoral Course
2022 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2022 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2022 Systems Engineering Project SeminarⅡB Master's Course
2022 Systems Engineering Project SeminarⅡA Master's Course
2022 Systems Engineering Project SeminarⅠB Master's Course
2022 Systems Engineering Project SeminarⅠA Master's Course
2022 Systems Engineering SeminarⅡB Master's Course
2022 Systems Engineering SeminarⅡA Master's Course
2022 Systems Engineering SeminarⅠB Master's Course
2022 Systems Engineering SeminarⅠA Master's Course
2021 Systems Engineering Global Seminar Ⅱ Doctoral Course
2021 Systems Engineering SeminarⅠA Master's Course
2021 Systems Engineering SeminarⅠB Master's Course
2021 Systems Engineering SeminarⅡA Master's Course
2021 Systems Engineering SeminarⅡB Master's Course
2021 Systems Engineering Project SeminarⅠA Master's Course
2021 Systems Engineering Project SeminarⅠB Master's Course
2021 Systems Engineering Project SeminarⅡA Master's Course
2021 Systems Engineering Project SeminarⅡB Master's Course
2021 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2021 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2021 Systems Engineering Advanced Research Doctoral Course
2021 Systems Engineering Global Seminar Ⅰ Doctoral Course
2021 Systems Engineering Global Seminar Ⅱ Doctoral Course
2020 NA Master's Course
2020 NA Master's Course
2020 NA Master's Course
2020 NA Master's Course
2020 Systems Engineering Project SeminarⅠA Master's Course
2020 Systems Engineering Project SeminarⅡA Master's Course
2020 Systems Engineering SeminarⅠA Master's Course
2020 NA Master's Course
2020 Systems Engineering Global Seminar Ⅱ Doctoral Course
2020 Systems Engineering Global Seminar Ⅰ Doctoral Course
2020 Systems Engineering Advanced Research Doctoral Course
2020 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2020 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2020 Systems Engineering Project SeminarⅡB Master's Course
2020 Systems Engineering Project SeminarⅡA Master's Course
2020 Systems Engineering Project SeminarⅠB Master's Course
2020 Systems Engineering Project SeminarⅠA Master's Course
2020 Systems Engineering SeminarⅡB Master's Course
2020 Systems Engineering SeminarⅡA Master's Course
2020 Systems Engineering SeminarⅠB Master's Course
2020 Systems Engineering SeminarⅠA Master's Course
2019 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2019 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2019 Systems Engineering Advanced Research Doctoral Course
2019 Systems Engineering Advanced Research Doctoral Course
2019 Systems Engineering SeminarⅡB Master's Course
2019 Systems Engineering SeminarⅡA Master's Course
2019 Systems Engineering SeminarⅠB Master's Course
2019 Systems Engineering SeminarⅠA Master's Course
2019 Systems Engineering Global Seminar Ⅱ Doctoral Course
2019 Systems Engineering Global Seminar Ⅱ Doctoral Course
2019 Systems Engineering Project SeminarⅡB Master's Course
2019 Systems Engineering Project SeminarⅡA Master's Course
2019 Systems Engineering Project SeminarⅠB Master's Course
2019 Systems Engineering Project SeminarⅠA Master's Course
2019 Systems Engineering Project SeminarⅡA Master's Course
2019 Systems Engineering Project SeminarⅠB Master's Course
2019 Systems Engineering Project SeminarⅠA Master's Course
2019 Systems Engineering SeminarⅡB Master's Course
2019 Systems Engineering SeminarⅡA Master's Course
2019 Systems Engineering SeminarⅠB Master's Course
2019 Systems Engineering Project SeminarⅡB Master's Course
2018 Systems Engineering Global Seminar Ⅰ Doctoral Course
2018 Systems Engineering Advanced Research Doctoral Course
2018 Systems Engineering Advanced Seminar Ⅰ Doctoral Course
2018 Systems Engineering Project SeminarⅡB Master's Course
2018 Systems Engineering Project SeminarⅡA Master's Course
2018 Systems Engineering Project SeminarⅠB Master's Course
2018 Systems Engineering Project SeminarⅠA Master's Course
2018 Systems Engineering SeminarⅡB Master's Course
2018 Systems Engineering SeminarⅡA Master's Course
2018 Systems Engineering SeminarⅠB Master's Course
2018 Systems Engineering SeminarⅠA Master's Course
2018 Systems Engineering Advanced Research Doctoral Course
2018 Systems Engineering Global Seminar Ⅰ Doctoral Course
2018 Systems Engineering SeminarⅡA Master's Course
2018 Systems Engineering SeminarⅡB Master's Course
2018 Systems Engineering Project SeminarⅡB Master's Course
2018 Systems Engineering Project SeminarⅠB Master's Course
2018 Systems Engineering Project SeminarⅠA Master's Course
2018 Systems Engineering Project SeminarⅡA Master's Course
2018 Systems Engineering SeminarⅡB Master's Course
2018 Systems Engineering Global Seminar Ⅱ Doctoral Course
2018 Systems Engineering SeminarⅠA Master's Course
2017 Systems Engineering Global Seminar Ⅰ Doctoral Course
2017 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2017 Systems Engineering Global Seminar Ⅱ Doctoral Course
2017 Systems Engineering Advanced Research Doctoral Course
2017 Systems Engineering Project SeminarⅡB Master's Course
2017 Systems Engineering Project SeminarⅡA Master's Course
2017 Systems Engineering Project SeminarⅠB Master's Course
2017 Systems Engineering Project SeminarⅠA Master's Course
2017 Systems Engineering SeminarⅡB Master's Course
2017 Systems Engineering SeminarⅡA Master's Course
2017 Systems Engineering SeminarⅠB Master's Course
2016 Systems Engineering Global Seminar Ⅱ Doctoral Course
2016 Systems Engineering Advanced Research Doctoral Course
2016 Systems Engineering Project SeminarⅡB Master's Course
2016 Systems Engineering Project SeminarⅠB Master's Course
2016 Systems Engineering Project SeminarⅠA Master's Course
2016 Systems Engineering SeminarⅡB Master's Course
2016 Systems Engineering SeminarⅠB Master's Course
2016 NA Master's Course
2016 NA Master's Course
2016 NA Master's Course
2016 Systems Engineering Advanced Seminar Ⅱ Doctoral Course
2015 Systems Engineering Advanced Seminar Ⅰ
2015 Systems Engineering SeminarⅡA
2015 Systems Engineering SeminarⅠA Master's Course
2015 Systems Engineering Project SeminarⅡA
2015 Systems Engineering Project SeminarⅠA
2015 Systems Engineering Advanced Seminar Ⅰ
2015 Systems Engineering Advanced Research
2015 Systems Engineering SeminarⅡB
2015 Systems Engineering SeminarⅠB Master's Course
2015 Systems Engineering Project SeminarⅡB
2015 Systems Engineering Project SeminarⅠB
2015 Systems Engineering Global Seminar Ⅰ
2015 NA
2015 NA
2015 NA
2014 Systems Engineering Advanced Research
2014 Systems Engineering Advanced Research
2014 Systems Engineering Advanced Seminar Ⅱ
2014 Systems Engineering Advanced Seminar Ⅱ
2014 Systems Engineering Advanced Seminar Ⅰ
2014 Systems Engineering Advanced Seminar Ⅰ
2014 Systems Engineering Project SeminarⅡB
2014 Systems Engineering Project SeminarⅡA
2014 Systems Engineering Project SeminarⅠB
2014 Systems Engineering Project SeminarⅠA
2014 Systems Engineering SeminarⅡB
2014 Systems Engineering SeminarⅡA
2014 Systems Engineering SeminarⅠB
2014 Systems Engineering SeminarⅠA
2014 NA
2014 NA
2014 NA
2014 NA
2014 NA
2014 NA
2014 Systems Engineering SeminarⅠB
2014 NA
2014 Systems Engineering Project SeminarⅡB
2014 Systems Engineering Advanced Research
2014 Systems Engineering SeminarⅠA
2013 Systems Engineering Advanced Research
2013 Systems Engineering Advanced Research
2013 Systems Engineering Advanced Seminar Ⅱ
2013 Systems Engineering Advanced Seminar Ⅱ
2013 Systems Engineering Advanced Seminar Ⅰ
2013 Systems Engineering Advanced Seminar Ⅰ
2013 Systems Engineering Project SeminarⅡB
2013 Systems Engineering Project SeminarⅡA
2013 Systems Engineering Project SeminarⅠB
2013 Systems Engineering Project SeminarⅠA
2013 Systems Engineering SeminarⅡB
2013 Systems Engineering SeminarⅡA
2013 Systems Engineering SeminarⅠB
2013 Systems Engineering SeminarⅠA
2012 Systems Engineering Advanced Seminar Ⅱ
2012 Systems Engineering Advanced Seminar Ⅰ
2012 Systems Engineering Advanced Research
2012 Systems Engineering SeminarⅡA
2012 Systems Engineering SeminarⅠA
2012 Systems Engineering Project SeminarⅡA
2012 Systems Engineering Project SeminarⅠA
2012 Systems Engineering Advanced Seminar Ⅱ
2012 Systems Engineering Advanced Seminar Ⅰ
2012 Systems Engineering Advanced Research
2012 Systems Engineering SeminarⅡB
2012 Systems Engineering SeminarⅠB
2012 Systems Engineering Project SeminarⅡB
2012 Systems Engineering Project SeminarⅠB
2011 Systems Engineering Project SeminarⅡB
2011 Systems Engineering Project SeminarⅡA
2011 Systems Engineering Project SeminarⅠB
2011 Systems Engineering Project SeminarⅠA
2011 Systems Engineering Advanced Research
2011 Systems Engineering Advanced Research
2011 NA
2011 NA
2011 Systems Engineering Advanced Seminar Ⅱ
2011 Systems Engineering Advanced Seminar Ⅱ
2011 Systems Engineering Advanced Seminar Ⅰ
2011 Systems Engineering Advanced Seminar Ⅰ
2010 NA Master's Course
2010 NA Master's Course
2010 NA Master's Course
2010 NA Master's Course
2009 NA Master's Course
2009 NA Master's Course
2009 NA Master's Course
2009 NA Master's Course
2008 NA Master's Course
2008 NA Master's Course
2008 NA Master's Course
2008 NA Master's Course
2007 NA Master's Course
2007 NA Master's Course
2007 NA Master's Course
2007 NA Master's Course
2005 NA
2005 Systems Engineering SeminarⅠA


Research Interests

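Simulated hearing loss
Psychoacoustic experiments
Size perception
Speech signal processing
Computational theory of hearing
Gammachirp auditory filter
Speech perception
Age-related hearing loss
Objective speech intelligibility measures
Nonlinear time axis
Auditory model
STRAIGHT high-quality speech analysis-synthesis system
Vowel normalization
Auditory information processing
Auditory stream segregation
Temporal tracking ability
Nonlinear time warping
Time-frequency representation
Auditory vocoder
Wavelet-Mellin transform
Speech enhancement and separation
Size normalization
Word perception
Scale theory
Kernel machines
Ecological constraints
Acoustic measurement
Ecology of sound
Auditory information representation
Auditory scene analysis
Signal processing
Sound source localization
Speech recognizer
Sound source information
Learning machines
Emotion perception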


Published Papers

Effects of age and hearing loss on speech emotion discrimination

Toshio Irino, Yukiho Hanatani, Kazuma Kishida, Shuri Naito, Hideki Kawahara （Part： Lead author,　Corresponding author )

Scientific Reports ( Springer Science and Business Media LLC ) 14 ( 1 ) 2024.08 [Refereed]

Improving Auditory Filter Estimation by Incorporating Absolute Threshold and a Level-dependent Internal Noise

Toshio Irino, Kenji Yokota, Roy D. Patterson （Part： Lead author,　Corresponding author )

Trends in Hearing ( SAGE Publications ) 27 2023.10 [Refereed]

Auditory filter (AF) shape has traditionally been estimated with a combination of a notched-noise (NN) masking experiment and a power spectrum model (PSM) of masking. However, there are several challenges that remain in both the simultaneous and forward masking paradigms. We hypothesized that AF shape estimation would be improved if absolute threshold (AT) and a level-dependent internal noise were explicitly represented in the PSM. To document the interaction between NN threshold and AT in normal hearing (NH) listeners, a large set of NN thresholds was measured at four center frequencies (500, 1000, 2000, and 4000 Hz) with the emphasis on low-level maskers. The proposed PSM, consisting of the compressive gammachirp (cGC) filter and three nonfilter parameters, allowed AF estimation over a wide range of frequencies and levels with fewer coefficients and less error than previous models. The results also provided new insights into the nonfilter parameters. The detector signal-to-noise ratio ([Formula: see text]) was found to be constant across signal frequencies, suggesting that no frequency dependence hypothesis is required in the postfiltering process. The ANSI standard “Hearing Level-0dB” function, i.e., AT of NH listeners, could be applied to the frequency distribution of the noise floor for the best AF estimation. The introduction of a level-dependent internal noise could mitigate the nonlinear effects that occur in the simultaneous NN masking paradigm. The new PSM improves the applicability of the model, particularly when the sound pressure level of the NN threshold is close to AT.

Hearing Impairment Simulator Based on Auditory Excitation Pattern Playback: WHIS

Toshio Irino （Part： Lead author,　Corresponding author )

IEEE Access ( Institute of Electrical and Electronics Engineers (IEEE) ) 11 78419 - 78430 2023.07 [Refereed]

Speech intelligibility of simulated hearing loss sounds and its prediction using the Gammachirp Envelope Similarity Index (GESI)

Toshio Irino, Honoka Tamaru, Ayako Yamamoto （Part： Lead author,　Corresponding author )

Proc. Interspeech 2022 2022.09 [Refereed]
Improving auditory filter estimation with level-dependent cochlear noise floor

Toshio Irino, Kenji Yokota, Roy Patterson （Part： Lead author,　Corresponding author )

International Symposium on Hearing 2022 10.5281/zenodo.6576893 2022.06 [Refereed]

Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift

Toshie Matsui, Toshio Irino, Ryo Uemura, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson （Part： Corresponding author )

Speech Communication ( Elsevier BV ) 136 23 - 41 2022.01 [Refereed]

GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech

Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Speech Communication 123 43 - 58 2020.10 [Refereed]
The gammachirp auditory filter and its application to speech perception

Toshio Irino, Roy D. Patterson （Part： Lead author,　Corresponding author )

Acoust. Sci. & Tech. 41 ( 1 ) 99 - 107 2020.01 [Refereed] [Invited]

Auditory Representation Effective for Estimating Vocal Tract Information

Toshio Irino, Shintaro Doan （Part： Lead author,　Corresponding author )

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ( IEEE ) 2023.10 [Refereed]

Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine

Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino （Part： Last author )

INTERSPEECH 2023 ( ISCA ) 2023.08 [Refereed]

Effective data screening technique for crowdsourced speech intelligibility experiments: Evaluation with IRM-based speech enhancement

Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Proc. APSIPA ASC 2022 2022.11 [Refereed]
Intelligibility Prediction of Enhanced Speech Using Recognition Accuracy of End-To-End ASR System

Kenichi Arai, Atsunori Ogawa, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, Naoyuki Kamo, Toshio Irino （Part： Last author )

Proc. APSIPA ASC 2022 2022.11 [Refereed]
Speech Intelligibility Prediction Through Direct Estimation of Word Accuracy Using Conformer

Naoyuki Kamo, Kenichi Arai, Atsunori Ogawa, Shoko Araki, Tomohiro Nakatani, Keisuke Kinoshita, Marc Delcroix, Tsubasa Ochiai, Toshio Irino （Part： Last author )

Proc. APSIPA ASC 2022 2022.11 [Refereed]
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility

Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Interspeech 2021 ( ISCA ) 2021.08 [Refereed]

Observational and accelerometer analysis of head movement patterns in psychotherapeutic dialogue

Masashi Inoue, Toshio Irino, Nobuhiro Furuyama, Ryoko Hanada

Sensors 21 ( 9 ) 2021.05 [Refereed]
Interactive and real-time acoustic measurement tools for speech data acquisition and presentation: Application of an extended member of time stretched pulses

Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Masanori Morise, Hideki Banno, Toshio Irino （Part： Last author )

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 3 2197 - 2198 2021

Objective measurements of speech data acquisition and presentation processes are crucial for assuring reproducibility and reusability of experimental results and acquired materials. We introduce setting and measurement examples of those conditions using an interactive and real-time acoustic measurement tool based on an extended time-stretched pulse. We also introduce supporting tools.
Mixture of Orthogonal Sequences Made from Extended Time-Stretched Pulses Enables Measurement of Involuntary Voice Fundamental Frequency Response to Pitch Perturbation.

Hideki Kawahara, Toshie Matsui, Kohei Yatabe, Ken-Ichi Sakakibara, Minoru Tsuzaki, Masanori Morise, Toshio Irino （Part： Last author )

Interspeech ( ISCA ) 4 3206 - 3210 2021 [Refereed]

Auditory feedback plays an essential role in the regulation of the fundamental frequency of voiced sounds. The fundamental frequency also responds to auditory stimulation other than the speaker’s voice. We propose to use this response of the fundamental frequency of sustained vowels to frequency-modulated test signals for investigating involuntary control of voice pitch. This involuntary response is difficult to identify and isolate by the conventional paradigm, which uses step-shaped pitch perturbation. We recently developed a versatile measurement method using a mixture of orthogonal sequences made from a set of extended time-stretched pulses (TSP). In this article, we extended our approach and designed a set of test signals using the mixture to modulate the fundamental frequency of artificial signals. For testing the response, the experimenter presents the modulated signal aurally while the subject is voicing sustained vowels. We developed a tool for conducting this test quickly and interactively. We make the tool available as open source and also provide executable GUI-based applications. Preliminary tests revealed that the proposed method consistently provides compensatory responses with about 100 ms latency, representing involuntary control. Finally, we discuss future applications of the proposed method for objective and non-invasive auditory response measurements.

DOI
Implementation of Interactive Tools for Investigating Fundamental Frequency Response of Voiced Sounds to Auditory Stimulation

Hideki Kawahara, Toshie Matsui, Kohei Yatabe, Ken Ichi Sakakibara, Minoru Tsuzaki, Masanori Morise, Toshio Irino （Part： Last author )

2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings abs/2109.11594 897 - 903 2021 [Refereed]

We introduced a measurement procedure for the involuntary response of voice fundamental frequency to frequency-modulated auditory stimulation. This involuntary response plays an essential role in voice fundamental frequency control but has been less investigated because of technical difficulties. This article introduces an interactive and real-time tool for investigating this response, together with supporting tools that adopt our new measurement method. The method enables simultaneous measurement of multiple system properties based on a novel set of extended time-stretched pulses combined with orthogonalization. We made MATLAB implementations of these tools available in an open-source repository. This article also provides the detailed measurement procedure using the interactive tool, followed by offline measurement tools for conducting subjective experiments and statistical analyses. It also provides technical descriptions of the constituent signal processing subsystems as appendices. This application serves as an example of adopting our method for biological system analysis.
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-based ASR System

Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino （Part： Last author )

Interspeech 2020 2020.10 [Refereed]
Speech clarity improvement by vocal self-training using a hearing impairment simulator and its correlation with an auditory modulation index

Toshio Irino, Soichi Higashiyama, Hanako Yoshigi （Part： Lead author,　Corresponding author )

Interspeech 2020 2020.10 [Refereed]
Speech intelligibility prediction using a multi-resolution gammachirp envelope distortion index with common parameters for different noise conditions

Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Acoust. Sci. & Tech. 41 ( 1 ) 396 - 399 2020.01 [Refereed]

DOI
Frequency domain variant of Velvet noise and its application to acoustic measurements,

Hideki Kawahara, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Hideki Banno, Masanori Morise, Toshio Irino （Part： Last author )

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference(APSIPA) ( IEEE ) 1523 - 1532 2019.11 [Refereed]

APSIPA ASC 2019, Lanzhou, China, 18-21 Nov. 2019

DOI
Predicting speech intelligibility of enhanced speech using phone accuracy of DNN-based ASR systems,

Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino （Part： Last author )

Interspeech 2019 ( ISCA ) 4275 - 4279 2019.09 [Refereed]

Graz, Austria, 15-19 Sep. 2019

DOI
Modification of piano performance by simulated hearing loss: Analyses on the key velocities and output powers,

Minoru Tsuzaki, Noriko Maegawa, Chie Ohsawa, Hideki Banno, Toshio Irino （Part： Last author )

International Symposium on Performance Science 2019 2019.07 [Refereed]

(ISPS2019), 16-20 July 2019.
Rising-frequency chirp stimulus to effectively enhance wave-I amplitude of auditory brainstem response,

Takashi Morimoto, Yoh-ichi Fujisaka, Yasuhide Okamoto, Toshio Irino （Part： Last author )

Hear. Res 377 104 - 108 2019.06 [Refereed]

(Short communication)
Proposal of a continuous time-series evaluation approach toward reconsidering "active listening" in clinical psychology interviews

花田里欧子, 入野俊夫, 古山宣洋, 井上雅史, 門田圭祐

東京女子大学心理臨床センター紀要 9 41 - 62 2019.03
Speech intelligibility prediction with the dynamic compressive gammachirp filterbank and modulation power spectrum,

Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Acoust. Sci. & Tech 40 ( 2 ) 84 - 92 2019.03 [Refereed]

DOI
Two-Point Method for Measuring the Temporal Modulation Transfer Function.

Takashi Morimoto, Toshio Irino, Kouta Harada, Takeshi Nakaichi, Yasuhide Okamoto, Ayako Kanno, Sho Kanzaki, Kaoru Ogawa （Part： Corresponding author )

Ear and hearing 40 ( 1 ) 55 - 62 2019.01 [Refereed]

OBJECTIVE: The temporal modulation transfer function (TMTF) has been proposed to estimate the temporal resolution abilities of listeners with normal hearing and listeners with hearing loss. The TMTF data of patients would be useful for clinical diagnosis and for adjusting the hearing instruments at clinical and fitting sites. However, practical application is precluded by the long measurement time of the conventional method, which requires several measurement points. This article presents a new method to measure the TMTF that requires only two measurement points. DESIGN: Experiments were performed to estimate the TMTF of normal listeners and listeners with hearing loss to demonstrate that the two-point method can estimate the TMTF parameter and the conventional method. Sixteen normal hearing and 21 subjects with hearing loss participated, and the difference between the estimated TMTF parameters and measurement time were compared. RESULTS: The TMTF parameters (the peak sensitivity Lps and cutoff frequency fcutoff) estimated by the conventional and two-point methods showed significantly high correlations: the correlation coefficient for Lps was 0.91 (t(45) = 14.3; p < 10) and that for fcutoff was 0.89 (t(45) = 13.2; p < 10). There were no fixed and proportional biases. Therefore, the estimated values were in good agreement. Moreover, there was no systematic bias depending on the subject's profile. The measurement time of the two-point method was approximately 10 min, which is approximately one-third that of the conventional method. CONCLUSION: The two-point method enables the introduction of TMTF measurement in clinical diagnosis.

DOI
A real time hearing loss simulator

Nicolas Grimault, Toshio Irino, Samar Dimachki, Alexandra Corneyllie, Roy D. Patterson, Samuel Garcia

Acta Acustica united with Acustica 104 ( 5 ) 904 - 908 2018.10 [Refereed]

DOI
Auditory filter derivation at low levels where masked threshold interacts with absolute threshold

Toshio Irino, Kenji Yokota, Toshie Matsui, Roy D. Patterson （Part： Lead author,　Corresponding author )

Proc. International Symposium on Hearing (ISH2018) 104 ( 5 ) 887 - 890 2018.10 [Refereed]

DOI
Speech intelligibility prediction using a multi-resolution gammachirp envelope distortion index with common parameters for different noise conditions,

Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Seminar on brain, hearing and speech sciences for universal speech communication 2018.10 [Refereed]

Tohoku Univ., Sendai, Japan, 25 - 26 Oct 2018 (presented 25 Oct 2018)
Multi-resolution Gammachirp Envelope Distortion Index for Intelligibility Prediction of Noisy Speech

Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Proc. Interspeech 2018 1863 - 1867 2018.08 [Refereed]

DOI
Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis

Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino （Part： Last author )

Proc. Interspeech 2018 ( ISCA ) 2027 - 2031 2018.08 [Refereed]

DOI
Effectiveness of inter-phrase pausing for sentence intelligibility in the elderly with hard of hearing-A simulation study using a hearing impairment simulator-

畑山春菜, 長谷川純, 吐師道子, 松井淑恵, 入野俊夫（Part： Last author )

人間と科学 ( 県立広島大学保健福祉学部学術誌編集委員会 ) 18 ( 1 ) 19 - 26 2018.03

When speaking to elderly people who are hard of hearing, it is commonly recommended to insert short pauses between the phrases of a sentence to assist their listening comprehension. In this study, the effectiveness of inter-phrase pausing for listening comprehension of sentences was investigated by simulating elderly people's hearing with a hearing impairment simulator. Young adults with normal hearing participated in an experiment in which they listened to sentences through the hearing impairment simulator and were asked to repeat the sentences as they heard them. The results showed that the correct answer rate was highest with 0.6-second pauses, followed by 0.1-second pauses, and lowest when sentences were presented without pauses. It can be concluded that inter-phrase pausing facilitates listening comprehension of sentences and that even a very short pause of 0.1 second is effective. (Original article)
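Effect of inter-phrase pausing on sentence listening in elderly listeners with hearing loss: a simulation study using a hearing impairment simulator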

畑山春菜, 長谷川純, 吐師道子, 松井淑恵, 入野俊夫

人間と科学: 県立広島大学保健福祉学部誌 ( 県立広島大学保健福祉学部学術誌編集委員会 ) 18 ( 1 ) 19 - 26 2018.03
An Auditory Model of Speaker Size Perception for Voiced Speech Sounds.

Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson （Part： Lead author,　Corresponding author )

Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 ( ISCA ) 2017- 1153 - 1157 2017.08 [Refereed]

An auditory model was developed to explain the results of behavioral experiments on perception of speaker size with voiced speech sounds. It is based on the dynamic, compressive gammachirp (dcGC) filterbank and a weighting function (SSI weight) derived from a theory of size-shape segregation in the auditory system. Voiced words with and without high-frequency emphasis (+6 dB/octave) were produced using a speech vocoder (STRAIGHT). The SSI weighting function reduces the effect of glottal pulse excitation in voiced speech, which, in turn, makes it possible for the model to explain the individual subject variability in the data.

DOI
A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis.

Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino （Part： Last author )

Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 ( ISCA ) abs/1702.06724 1358 - 1362 2017.08 [Refereed]

We formulated and implemented a procedure to generate aliasing-free excitation source signals. It uses a new antialiasing filter in the continuous time domain followed by an IIR digital filter for response equalization. We introduced a cosine-series-based general design procedure for the new antialiasing function. We applied this new procedure to implement the antialiased Fujisaki-Ljungqvist model. We also applied it to revise our previous implementation of the antialiased Fant-Liljencrants model. A combination of these signals and a lattice implementation of the time-varying vocal tract model provides a reliable and flexible basis to test F0 extractors and source aperiodicity analysis methods. MATLAB implementations of these antialiased excitation source models are available as part of our open source tools for speech science.

DOI
The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds.

Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson （Part： Corresponding author )

Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 ( ISCA ) 2017- 601 - 605 2017.08 [Refereed]

A number of studies, with either voiced or unvoiced speech, have demonstrated that a speaker's geometric mean formant frequency (MFF) has a large effect on the perception of the speaker's size, as would be expected. One study with unvoiced speech showed that lifting the slope of the speech spectrum by 6 dB/octave also led to a reduction in the perceived size of the speaker. This paper reports an analogous experiment to determine whether lifting the slope of the speech spectrum by 6 dB/octave affects the perception of speaker size with voiced speech (words). The results showed that voiced speech with high-frequency enhancement was perceived to arise from smaller speakers. On average, the point of subjective equality in MFF discrimination was reduced by about 5%. However, there were large individual differences: some listeners were effectively insensitive to spectral enhancement of 6 dB/octave, whereas others showed a consistent effect of the same enhancement. The results suggest that models of speaker size perception will need to include a listener-specific parameter for the effect of spectral slope.

DOI
Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio.

Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 ( ISCA ) 2017- 2949 - 2953 2017.08 [Refereed]

A new intelligibility prediction measure, called the "Gammachirp Envelope Distortion Index (GEDI)", is proposed for the evaluation of speech enhancement algorithms. This model calculates the signal-to-distortion ratio (SDR) in envelope responses, SDRenv, derived from the gammachirp filterbank outputs of clean and enhanced speech, and is an extension of the speech-based envelope power spectrum model (sEPSM) to improve prediction and usability. An evaluation was performed by comparing human subjective results and model predictions for the speech intelligibility of noise-reduced sounds processed by spectral subtraction and a recent Wiener filtering technique. The proposed GEDI predicted the subjective results of the Wiener filtering better than the original sEPSM and well-known conventional measures, i.e., STOI, CSII, and HASPI.

DOI
Pitch: The perceptual ends of the periodicity; but of what periodicity?

Minoru Tsuzaki, Sawa Hanada, Junko Sonoda, Satomi Tanaka, Toshio Irino

Proceedings of the INTER-NOISE 2016 - 45th International Congress and Exposition on Noise Control Engineering: Towards a Quieter Future ( German Acoustical Society (DEGA) ) 6687 - 6698 2016.08 [Refereed] [Invited]

The model for pitch assumes that pitch is based on the periodicity in the neural activities after cochlear filtering. One could argue that the auditory system "uses" pitch as a cue for stream segregation. A question, however, is whether pitch is a cause or an end of such grouping. We investigated the case where two pulse trains with an identical periodicity are added with variable temporal disparities. The second pulse train, with the identical IPI, was added with various phase delays. When the phase delay was 50 %, the pitch rose by an octave. This impression of the octave shift appeared to be continuous as a function of the degree of the phase delay, except that a hump was observed at the 25 % point. The auditory model could not provide any corresponding peak in the time interval histogram of the neural activities. Another series of experiments by the authors indicated that aged absolute pitch (AP) possessors tended to perceive pitches higher than young AP possessors. An additional experiment using experimental sounds indicated that similar results could be obtained only for sounds having temporal information in the lower order region.
The Effect of Peripheral Compression on Syllable Perception Measured with a Hearing Impairment Simulator

Toshie Matsui, Toshio Irino, Misaki Nagae, Hideki Kawahara, Roy D. Patterson （Part： Corresponding author )

PHYSIOLOGY, PSYCHOACOUSTICS AND COGNITION IN NORMAL AND IMPAIRED HEARING ( SPRINGER-VERLAG BERLIN ) 894 307 - 314 2016 [Refereed]

Hearing impaired (HI) people often have difficulty understanding speech in multi-speaker or noisy environments. With HI listeners, however, it is often difficult to specify which stage, or stages, of auditory processing are responsible for the deficit. There might also be cognitive problems associated with age. In this paper, a HI simulator, based on the dynamic, compressive gammachirp (dcGC) filterbank, was used to measure the effect of a loss of compression on syllable recognition. The HI simulator can counteract the cochlear compression in normal hearing (NH) listeners and, thereby, isolate the deficit associated with a loss of compression in speech perception. Listeners were required to identify the second syllable in a three-syllable "nonsense word", and between trials, the relative level of the second syllable was varied, or the level of the entire sequence was varied. The difference between the Speech Reception Threshold (SRT) in these two conditions reveals the effect of compression on speech perception. The HI simulator adjusted a NH listener's compression to that of the "average 80-year old" with either normal compression or complete loss of compression. A reference condition was included where the HI simulator applied a simple 30-dB reduction in stimulus level. The results show that the loss of compression has its largest effect on recognition when the second syllable is attenuated relative to the first and third syllables. This is probably because the internal level of the second syllable is attenuated proportionately more when there is a loss of compression.

DOI
Speech intelligibility prediction based on the envelope power spectrum model with the dynamic compressive gammachirp auditory filterbank

Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5 ( ISCA-INT SPEECH COMMUNICATION ASSOC ) 2885 - 2889 2016 [Refereed]

In this study, we develop a new method to realize speech intelligibility prediction of synthetic sounds processed by nonlinear speech enhancement algorithms. A speech envelope power spectrum model (sEPSM) was proposed to account for subjective results on a spectral subtraction, but it is untested by recent state-of-the-art speech enhancement algorithms. We introduce a dynamic compressive gammachirp auditory filterbank as the front-end of the sEPSM (dcGC-sEPSM) to improve the predictability. We perform subjective experiments on speech intelligibility (SI) of noise-reduced sounds processed by the spectral subtraction, and a recently developed Wiener filter algorithm. We compare the subjective SI scores with the objective SI scores predicted by the proposed dcGC-sEPSM, the original GT-sEPSM, the three-level coherence SII (CSII), and the short time objective intelligibility (STOI). The results show that the proposed dcGC-sEPSM performs better than the conventional models.

DOI
Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation.

Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2015, Hong Kong, December 16-19, 2015 ( IEEE ) 520 - 529 2015.12 [Refereed]

A closed-form representation of anti-aliased L-F model is derived for a LPF function family based on cosine series. The Matlab based implementation of the derived form provides virtually aliasing-free source signal, which is applicable to speech synthesis and F0 extractor evaluation. This aliasing-free representation is also suitable for testing perceptual effects of wave shape parameters in the L-F model, since possible artifacts caused by spurious component are completely removed. A post processing procedure for fine tuning spectral shape is also introduced. An interactive tool for investigating speech production model parameters is designed using this Matlab implementation and will be made freely available.

DOI
How the slope of the speech spectrum affects the perception of speaker size.

Kodai Yamamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson （Part： Corresponding author )

INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 ( ISCA ) 1556 - 1560 2015.09 [Refereed]

We performed a behavioral experiment to demonstrate the effect of spectral slope on the perception of speaker size, and we developed an auditory model based on the dynamic compressive gammachirp filterbank (dcGC-FB) to explain the results. STRAIGHT was used to generate "unvoiced" and "whispered" versions of naturally recorded words; the only difference was that the spectral slope of the whispered words was tilted up 6 dB/octave with respect to that of the unvoiced words. The experiment confirmed that the whispered words are heard to come from smaller speakers. The auditory model uses the tonotopic excitation pattern, Ep, as the internal representation of speech sounds. The model is found to be much more effective when the gradient of the excitation pattern, del Ep, is included in the size discrimination process. It is particularly useful for explaining individual subject variability.
Evaluation of the effects of acoustic continuity of words and mora transition information on the recognition of degraded speech

森本隆司, 入野俊夫, 西村竜一, 河原英紀（Part： Corresponding author )

日本音響学会誌 ( 一般社団法人日本音響学会 ) 70 ( 11 ) 578 - 588 2014.11 [Refereed]

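Degraded speech has been considered as one means of realizing simulated hearing loss. To investigate listening characteristics in everyday conversation, it is desirable to use speech units of word length or longer rather than single syllables. However, it is not known to what extent the continuity of articulation and prosody that accompanies speech utterance, or the mora transition information in the mental lexicon, affects the results. In this study, we therefore attempted to evaluate the effects of acoustic continuity and mora transition information in degraded speech of low-familiarity words from the word intelligibility test list FW03. First, for comparison with the results of listening experiments on degraded speech of naturally uttered words, we conducted listening experiments using degraded monosyllable-sequence speech in which monosyllables were arranged into meaningful or meaningless sequences. Furthermore, we conducted recognition experiments on degraded speech of naturally uttered words using an automatic speech recognizer and discussed the results in comparison with those of the human listening experiments. The results suggest that humans, too, recognize degraded speech with the support of acoustic continuity and mora transition information that can be extracted by an automatic speech recognizer.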

DOI
Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation.

Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino （Part： Last author )

INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014 ( ISCA ) 2243 - 2247 2014.09 [Refereed]

(Presented 17 Sept.)
Proposal for an Interactive 3D Sound Playback Interface Controlled by User behavior.

Ryuichi Nisimura, Kazuki Hashimoto, Hideki Kawahara, Toshio Irino （Part： Last author )

HCI International 2014 - Posters' Extended Abstracts - International Conference, HCI International 2014, Heraklion, Crete, Greece, June 22-27, 2014. Proceedings, Part I ( Springer ) 434 446 - 450 2014.06 [Refereed]

Springer International Publishing, (2014), presented at HCI International 2014 (Poster), Heraklion, Crete, Greece,

DOI
Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals

Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( IEEE ) 1 - 10 2014 [Refereed]

We introduce a new group delay representation that yields a value of zero for periodic signals irrespective of the initial phase and the relative level of each harmonic component. This new group delay representation provides a unified basis for defining "aperiodicity" in speech sounds. For example, the periodic-to-noise ratio or harmonic-to-noise ratio is directly derived from the deviation of this group delay representation from zero, after removing the FM effects of harmonic frequencies and the AM effects of harmonic component levels. The derived deviation is combined with estimated excitation duration information and used to design the aperiodic components of the excitation source for high-quality synthetic speech. The proposed group delay representation is based on an F0-adaptive weighted average of frequency-shifted and temporally shifted versions of group delays with power spectral weighting.

DOI
Hearing Impairment Simulator Based on Compressive Gammachirp Filter

Misaki Nagae, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( IEEE ) 1 - 4 2014 [Refereed]

This paper describes a simulator for presenting normal hearing (NH) listeners with the experience of a hearing impaired (HI) listener. The simulator is based on the compressive gammachirp (cGC) filter, which is used to derive level-dependent filter shapes and the cochlear compression function from notched-noise masking data. The level dependence of the cGC is reversed to produce inverse compression, which is used to resynthesize sounds that cancel the compression applied by the auditory system of the NH listener. A frame-based analysis/synthesis procedure is newly introduced to improve processing speed for a graphical user interface (GUI) that allows users to control the degree of compression within the range of the audiogram of the HI person. The simulator is intended for speech-language-hearing therapists (ST) and patients' families.

DOI
Development of a Mobile Application for Crowdsourcing the Data Collection of Environmental Sounds

Minori Matsuyama, Ryuichi Nisimura, Hideki Kawahara, Junnosuke Yamada, Toshio Irino

HUMAN INTERFACE AND THE MANAGEMENT OF INFORMATION: INFORMATION AND KNOWLEDGE DESIGN AND EVALUATION, PT I ( SPRINGER-VERLAG BERLIN ) 8521 514 - 524 2014 [Refereed]

Our study introduces a mobile navigation system with a sound input interface. To realize a high-performance environmental sound recognition system using Android devices, we organized a database of environmental sounds collected in our daily lives. Crowdsourcing is a useful approach for organizing a database based on the collaborative work of many people. We recruited trial users to test our system via a web-based crowdsourcing service provider in Japan. However, we found that improving the system is important for maintaining users' motivation to continue collecting sounds. We believe that the improved user interface (UI) design we introduced facilitates the annotation task. This paper gives an overview of our system, focusing on a method for applying the crowdsourcing approach with Android devices and on its UI design. We developed a touch-panel UI for the annotation task, in which the user selects the appropriate class of a sound source.

DOI
Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information

Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 ( ISCA-INT SPEECH COMMUNICATION ASSOC ) 870 - 874 2014 [Refereed]

A highly reproducible method for estimating vocal tract length (VTL) and a text-independent VTL estimation method are proposed, based on a Japanese vowel database spoken by 385 male and female speakers aged 6 to 56 and another vowel database with MRI-based vocal tract shape information. The proposed methods are based on an interference-free power spectral representation and systematic suppression of biasing factors. MRI data are used to calibrate the VTL estimates so that they are expressed in physically meaningful units. These databases are normalized based on the estimated VTL information to provide a reference template, which is used to implement the text-independent VTL estimation method. A prototype system for text-independent estimation of VTL is implemented in Matlab and runs faster than real time on a PC.
Continuous Annotations for Dialogue Status and Their Change Points

Masashi Inoue, Toshio Irino, Ryoko Hanada, Nobuhiro Furuyama, Hiroyasu Massaki

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION ( EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA ) 2014 [Refereed]

This paper presents an attempt to continuously annotate the emotion and status of multimodal corpora for understanding psychotherapeutic interviews. The collected continuous annotations are then used as signal data to find change points in the dialogues. Our target dialogues are carried out between clients with some psychological problems and their therapists. We measured two values, namely the degree of dialogue progress and the degree to which clients are being listened to. The first value reflects the goal-oriented nature of the target dialogues. The second value corresponds to the idea of active listening, which is considered an important aspect of psychotherapy. We modified an existing continuous emotion annotation toolkit that was created for tracking the generic emotion of dialogues. By applying a change point detection algorithm to the obtained annotations, we evaluated the validity and utility of the annotations collected with our method.
Spectrally estimated vocal tract lengths of singing voices and their contributing factors,

Toshio Irino

Proc. MAVEBA 2013 , Firenze, Italy, 16 - 18 Dec. 2013. 2013.12 [Refereed]

(Presented 17 Dec. 2013)
Vocal tract length estimation for voiced and whispered speech using gammachirp filterbank.

Toshio Irino, Erika Okamoto, Ryuichi Nisimura

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013, Kaohsiung, Taiwan, October 29 - November 1, 2013 ( IEEE ) 1 - 4 2013.10 [Refereed]

(Presented 30 Oct. 2013)

DOI
Controlling linguistic information and filtered sound identity for a new cross-synthesis vocoder.

Taiki Nishi, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

Acoust. Sci. & Tech. (ed. by the Acoustical Society of Japan) 34 ( 4 ) 287 - 288 2013.07 [Refereed]

A study was conducted to propose a new cross-synthesis framework based on an interference-free representation of a power spectrum combined with normalization and modulation transfer function design for spectral envelope preprocessing of speech sounds. The proposed cross-synthesis enabled control of the linguistic information and the timbre identity. The spectral envelope of speech was extracted in the proposed method using a F0-adaptive procedure called TANDEM-STRAIGHT. It was demonstrated that the procedure effectively removed interference caused by periodic excitation from the spectrogram of the speech and yielded a smooth representation. A two-staged procedure was also introduced to remove the timbre-modifying components from the speech spectral envelope. The primary procedure involved the approximation of the global spectral shape and the secondary one was the filtering of temporal modulations.

DOI
The role of size normalization in vowel recognition and speaker identification,

Roy D. Patterson, Toshio Irino

The 21st International Congress on Acoustics, ICA2013 , 1pSCb7, ASA Proceedings of Meetings on Acoustics (POMA) 19, 060038, Montreal, Canada, 2 - 7, June, 2013. 2013.06 [Refereed]

(Presented 3 June 2013)

DOI
Estimated relative vocal tract lengths from vowel spectra based on fundamental frequency adaptive analyses and their relations to relevant physical data of speakers,

Mayuko Kobayashi, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

ICA2013 , 5aCb44, ASA Proceedings of Meetings on Acoustics (POMA) 19, 060288, Montreal, Canada, 2 - 7, June, 2013. 19 2013.06 [Refereed]

(Presented 7 June 2013)

Optimizing the simultaneous estimation of frequency selectivity and compression using notched-noise maskers with asymmetric levels,

Tomofumi Fukawatase, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson

The 21st International Congress on Acoustics, ICA2013 , 1aPP3, ASA Proceedings of Meetings on Acoustics (POMA) 19, 050022, Montreal, Canada, 2 - 7, June, 2013. 19 2013.06 [Refereed]

It is important for the development of hearing aids and other audio devices to estimate the frequency selectivity and compression of the auditory filter accurately. Previously, we reported a technique for estimating the compression of the auditory filter that combined data from a simultaneous notched-noise experiment and a temporal masking curve (TMC) experiment. Unfortunately, the TMC data derived for individual listeners in forward masking is not stable; the cue to the presence of the signal is not entirely clear in forward masking. In this paper, we report attempts to make the traditional simultaneous notched-noise technique more sensitive to the effects of cochlear compression by varying the relative levels of the noise bands. Asymmetric-level maskers (ALMs) make it possible to estimate the filter shape and compression of the auditory filter simultaneously and reliably; the slope of the input-output function is substantially lower than with symmetric-level maskers. We also describe a procedure for incorporating a sensitivity analysis into the filter-fitting process to determine the minimum number of notched-noise conditions required to produce reliable estimates of selectivity and compression, in hopes of being able to employ the technique with hearing-impaired listeners. © 2013 Acoustical Society of America.

Higher order waveform symmetry measure and its application to periodicity detectors for speech and singing with fine temporal resolution.

Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino

IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013 ( IEEE ) 6797 - 6801 2013.05 [Refereed]

(Presented 30 May 2013)

Accurate estimation of compression in simultaneous masking enables the simulation of hearing impairment for normal-hearing listeners.

Irino T, Fukawatase T, Sakaguchi M, Nisimura R, Kawahara H, Patterson RD

Advances in experimental medicine and biology ( SPRINGER ) 787 73 - 80 2013 [Refereed]

This chapter presents a unified gammachirp framework for estimating cochlear compression and synthesizing sounds with inverse compression that cancels the compression of a normal-hearing (NH) listener to simulate the experience of a hearing-impaired (HI) listener. The compressive gammachirp (cGC) filter was fitted to notched-noise masking data to derive level-dependent filter shapes and the cochlear compression function (e.g., Patterson et al., J Acoust Soc Am 114:1529-1542, 2003). The procedure is based on the analysis/synthesis technique of Irino and Patterson (IEEE Trans Audio Speech Lang Process 14:2222-2232, 2006) using a dynamic cGC filterbank (dcGC-FB). The level dependency of the dcGC-FB can be reversed to produce inverse compression and resynthesize sounds in a form that cancels the compression applied by the auditory system of the NH listener. The chapter shows that the estimation of compression in simultaneous masking is improved if the notched-noise procedure for the derivation of auditory filter shape includes noise bands with different levels. Since both the estimation and resynthesis are performed within the gammachirp framework, it is possible for a specific NH listener to experience the loss of a specific HI listener.

Perceptual outcomes by rapid alternation of the resonant scaling and its relation to the fundamental frequency.

Minoru Tsuzaki, Chihiro Takeshima, Toshie Matsui, Toshio Irino

Proceedings of Meetings on Acoustics 19 2013 [Refereed]

Timbre provided by the resonant characteristics of the vibrating body can be represented as spectral envelope patterns and can contribute as one of the important cues for sound source identification. However, its concept is not as strictly established as those of loudness and pitch. Recently, the fact that the spectral pattern can be decomposed into two factors, i.e., the shape and size of the resonant body, has been reconsidered. Several psychophysical findings have successfully suggested that a "bottom-up" perceptual mechanism of the decomposition might be implemented. Manipulating the scaling factor of resonance can change the perceptual size of the sound source. By concatenating synthesized vowel segments whose resonant scale (RS) alternates between two values in an "ABA-ABA-" fashion, one can generate series of test stimuli for stream segregation with the galloping rhythm paradigm. The experimental results revealed that the RS factor could provide a reliable cue for streaming. As an extreme variation of this RS alternation, scale alternating wavelet sequences (SAWSs) have been proposed. In the SAWS, the RS alternates at every regular time grid. When the difference between the two RS factors exceeded a certain limit, perceived pitch shifted downwards by an octave. © 2013 Acoustical Society of America.

Controlling "shout" expression in a Japanese POP singing performance: analysis and suppression study.

Yuri Nishigaki, Ken-Ichi Sakakibara, Masanori Morise, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013 ( ISCA ) 2905 - 2909 2013 [Refereed]

The degree of "shout" in a singing performance is effectively controlled by combining global spectral shape equalization, peak cancellation in the frequency modulation spectrum of the F0 trajectory, and synchronized shape modulation of the voice spectral envelope. This "shout-reduction" processing is based on a symmetry-based F0 extractor with fine temporal resolution, a temporally stable representation of the instantaneous frequency of periodic signals, and TANDEM-STRAIGHT, a speech analysis, modification, and resynthesis framework. The proposed procedure successfully converted an expressive Japanese POP song performance with "shout" into a plain performance without damaging the original naturalness. The possibility of adding artificial "shout" to a plain performance is also discussed.

Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds.

Hideki Kawahara, Masanori Morise, Tomoki Toda, Ryuichi Nisimura, Toshio Irino

INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013 ( ISCA ) 34 - 38 2013 [Refereed]

A new spectral envelope estimation procedure is proposed to recover details beyond the band limitation imposed by Shannon's sampling theory when the periodic excitation of voiced sounds is interpreted as a sampling operation in the frequency domain. The proposed procedure is a hybrid of STRAIGHT, an F0-adaptive spectral envelope estimation method, and autoregressive model parameter estimation. Wavelet analyses of these spectral models in the frequency domain enabled objective evaluation of this recovery procedure. The proposed procedure provides better speech quality, especially when parameter manipulation is introduced.

Comparison of performance with voiced and whispered speech in word recognition and mean-formant-frequency discrimination

Toshio Irino, Yoshie Aoki, Hideki Kawahara, Roy D. Patterson

SPEECH COMMUNICATION ( ELSEVIER SCIENCE BV ) 54 ( 9 ) 998 - 1013 2012.11 [Refereed]

There has recently been a series of studies concerning the interaction of glottal pulse rate (GPR) and mean-formant-frequency (MFF) in the perception of speaker characteristics and speech recognition. This paper extends the research by comparing the recognition and discrimination performance achieved with voiced words to that achieved with whispered words. The recognition experiment shows that performance with whispered words is slightly worse than with voiced words at all MFFs when the GPR of the voiced words is in the middle of the normal range. But, as GPR decreases below this range, voiced-word performance decreases and eventually becomes worse than whispered-word performance. The discrimination experiment shows that the just noticeable difference (JND) for MFF is essentially independent of the mode of vocal excitation; the JND is close to 5% for both voiced and voiceless words for all speaker types. The interaction between GPR and VTL is interpreted in terms of the stability of the internal representation of speech which improves with GPR across the range of values used in these experiments. (c) 2012 Elsevier B.V. All rights reserved.

Accurate estimation of compression in simultaneous masking enables the simulation of hearing impairment for normal hearing listeners,

Toshio Irino, Tomofumi Fukawatase, Makoto Sakaguchi, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson

16th International Symposium on Hearing (ISH2012) , St John's College, Cambridge UK, 23-27 July, 2012 ( SPRINGER ) 787 73 - 80 2012.07 [Refereed]

(Presented 23 July)

Multimodal corpus for psychotherapeutic situation,

Masashi Inoue, Ryoko Hanada, Nobuhiro Furuyama, Toshio Irino, Takako Ichinomiya, Hiroyasu Massaki

Workshop on Multimodal corpora: How Should Multimodal corpora Deal with the Situation? , (Pre-conference workshop of LREC 2012 ), Istanbul, Turkey, 22 May 2012. 2012.05 [Refereed]

(Presented 22 May)

Modulation transfer function design for a flexible cross synthesis VOCODER based on F0 adaptive spectral envelope recovery

Taiki Nishi, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( IEEE ) 1 - 7 2012 [Refereed]

A new design procedure for a flexible cross-synthesis VOCODER is proposed based on the TANDEM-STRAIGHT framework, an F0-adaptive spectral envelope estimator, and modulation transfer function design. The proposed design procedure enables control of speech intelligibility and the timbre identity of musical instruments or animal voices. Removing the averaged and smoothed logarithmic spectrum of speech from the filter reduced the timbre modification effect on filtered sounds, and manipulating the cut-off frequencies of the modulation transfer function used to design the filter enabled control of the trade-off between intelligibility and timbre preservation.

Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation

Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 ( ISCA-INT SPEECH COMMUNICATION ASSOC ) 386 - 389 2012 [Refereed]

A simple and high-speed F0 extractor with high temporal resolution is proposed based on a waveform symmetry measure. Strictly speaking, it is not an F0 extractor. Instead, it is a detector of the lowest prominent sinusoidal component with a salience measure. It can make use of an F0 refinement procedure, when the signal under investigation is a sum of harmonic sinusoidal components. The refinement procedure is based on a stable representation of instantaneous frequency of periodic signals. Application of the proposed algorithm revealed that rapid temporal modulations in both F0 trajectory and spectral envelope exist typically in expressive voices such as lively singing performance. Manipulation of these temporal fine structures (texture) effectively modified perceptual expressiveness, while somewhat preserving perceptual vocal effort and register.
Detecting child speaker based on auditory feature vectors for VTL estimation

Ryuichi Nisimura, Shoko Miyamori, Erika Okamoto, Hideki Kawahara, Toshio Irino

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( IEEE ) 1 - 5 2012 [Refereed]

We introduce novel auditory features in the hidden Markov model (HMM) system for detecting child speakers. The features derived by the gammachirp auditory filterbank (GCFB) have been demonstrated to be suitable for vocal tract length (VTL) estimation, both theoretically and experimentally. We performed numerical experiments to distinguish between child and adult speakers using HMMs trained on 2,360 speech samples collected through a web-based query interface, and we compared the performance of the common mel-frequency cepstral coefficients (MFCC) and the GCFB-based feature vectors. We also introduced modulation features as a substitute for delta parameters. The results clearly show that GCFB reduces the error rate in distinguishing a child from an adult. To enhance our method for use as a web application, we applied our original voice-enabled web framework to the front-end interface of the proposed system.

An interference-free representation of group delay for periodic signals

Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( IEEE ) 1 - 4 2012 [Refereed]

This article introduces a new group delay representation for periodic signals. The proposed method yields a group delay representation that is free from interferences due to repetitive excitation. Power spectrum-weighted averaged group delay using shifted copies of the weighted group delay separated by a half fundamental frequency is proven to have the desired property.
Developing a method to build Japanese speech recognition system based on 3-gram language model expansion with Google database,

Toshiaki Shimada, Ryuichi Nisimura, Masayasu Tanaka, Hideki Kawahara, Toshio Irino

IEEE International Conference on Intelligent Computing and Integrated Systems ICISS2011 , Guilin, China, 24-26 Oct 2011. ( IEEE Computer Society ) 2011.10 [Refereed]

We have developed a method to build a Japanese automatic speech recognition (ASR) system based on 3-gram language model expansion with the Google database. Our aim is to enhance the recognition accuracy of ASR systems based on the 3-gram language model, even in cases where the language model is trained using short text segments. We investigate a practical approach to expanding language models by using 3-gram information from external web documents. In addition, we filter 3-gram entries on the basis of term frequency-inverse document frequency (TF-IDF) scores and the output of the Yahoo! web API to prevent the unnecessary addition of redundant or irrelevant 3-gram entries. In the experiments, we achieved an improvement of 0.71% in the word error rate and proved that the recognition accuracy can be improved by combining the proposed method and the traditional back-off smoothing technique without any costs being incurred in collecting additional text for training the model. © 2013 IEEE.

Evaluation and Optimization of F0-Adaptive Spectral Envelope Extraction Based on Spectral Smoothing with Peak Emphasis

AKAGIRI Hayato, MORISE Masanori, IRINO Toshio, KAWAHARA Hideki

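The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese edition) A ( The Institute of Electronics, Information and Communication Engineers ) 94 ( 8 ) 557 - 567 2011.08 [Refereed]

We propose a spectral envelope extraction method that improves approximation accuracy around spectral peaks, which are perceptually important, by combining a method for computing the power spectrum of periodic signals that does not depend on window position with F0-adaptive spectral smoothing and compensation on the log spectrum. The proposed method is implemented as a cepstral lifter and has a single tuning parameter. In this study, the parameter is numerically optimized through simulations whose targets are spectra derived from MRI-based vocal tract shapes, the piriform fossa, and a glottal source waveform model. Accuracy for the optimization is evaluated using the Itakura-Saito distance with a frequency-axis weighting that reflects auditory characteristics. The results show that the numerically optimized method achieves higher approximation accuracy than the short-time power spectrum, smoothing by cepstral order truncation, linear predictive analysis, and the conventional STRAIGHT implementation.
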
Evaluation of voice morphing using vocal tract length normalization based on auditory filterbank,

Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara

J. Signal Processing ( Research Institute of Signal Processing ) 15 ( 4 ) 283 - 286 2011.07

A proposal of expanding language model using web data resources for Japanese automatic speech recognition systems,

Ryuichi Nisimura, Toshiaki Shimada, Yuuki Nagai, Hideki Kawahara, Toshio Irino

2011 International Conference on Data Engineering and Internet Technology ( DEIT 2011 ),429-432, Bali Dynasty Resort, Bali, Indonesia, 15-17 March 2011. 429 - 432 2011.03 [Refereed]

(Presented 16 Mar.)

Evaluation of Voice Morphing Using Vocal Tract Length Normalization Based on Auditory Filterbank,

Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara

2011 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing ( NCSP'11 ), Tianjin SaiXiang Hotel, Tianjin, China, 1-3 March, 2011. 187 - 190 2011.03 [Refereed]

(Presented 2 Mar.)

A New Formulation of a Multiple Periodicity Extractor for Expressive and Pathological Voices,

Yoshika Wada, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

2011 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing ( NCSP'11 ),Tianjin SaiXiang Hotel, Tianjin, China, 1-3 March , 2011. 336 - 339 2011.03 [Refereed]

(Presented 3 Mar.)

AN INTERFERENCE-FREE REPRESENTATION OF INSTANTANEOUS FREQUENCY OF PERIODIC SIGNALS AND ITS APPLICATION TO F0 EXTRACTION

H. Kawahara, T. Irino, M. Morise

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING ( IEEE ) 5420 - 5423 2011 [Refereed]

An interference-free representation of the instantaneous frequency of constituent harmonic components of periodic signals is introduced. The power weighted average instantaneous frequency of a band-pass filter yields this property when the effective passband of the filter covers up to two harmonic components and the two windows used in averaging are separated by a half pitch period. The proposed representation eliminates the abrupt changes found in usual instantaneous frequency representations and is applicable to any periodic signals consisting of multiple harmonic components. An F0 extractor of voiced sounds based on this representation is introduced as an example of prospective applications.

Development of Web-Based Voice Interface to Identify Child Users Based on Automatic Speech Recognition System

Ryuichi Nisimura, Shoko Miyamori, Lisa Kurihara, Hideki Kawahara, Toshio Irino

HUMAN-COMPUTER INTERACTION: USERS AND APPLICATIONS, PT IV ( SPRINGER-VERLAG BERLIN ) 6764 607 - 616 2011 [Refereed]

We propose a method to identify child speakers, which can be adopted in Web filtering systems to protect children from the dangers of the Internet. The proposed child identification method relies on an automatic speech recognition (ASR) algorithm that uses an acoustic hidden Markov model (HMM) and a support vector machine (SVM). To extend the proposed method for use in a Web application, we used our voice-enabled Web system (the w3voice system) as a front-end interface for a prototype system. In this paper, we present an overview of the prototype system to elucidate our proposal. We also evaluate the efficacy of the proposed method in identifying child speakers by using voices captured from real Web users.

Manual and Accelerometer Analysis of Head Nodding Patterns in Goal-oriented Dialogues

Masashi Inoue, Toshio Irino, Nobuhiro Furuyama, Ryoko Hanada, Takako Ichinomiya, Hiroyasu Massaki

HUMAN-COMPUTER INTERACTION: INTERACTION TECHNIQUES AND ENVIRONMENTS, PT II ( SPRINGER-VERLAG BERLIN ) 6762 259 - 267 2011 [Refereed]

We studied communication patterns in face-to-face dialogues between people for the purpose of identifying conversation features that can be exploited to improve human-computer interactions. We chose to study the psychological counseling setting as it provides good examples of task-oriented dialogues. The dialogues between two participants, therapist and client, were video recorded. The participants' head movements were measured by using head-mounted accelerometers. The relationship between the dialogue process and head nodding frequency was analyzed on the basis of manual annotations. The segments where nods of the two participants correlated were identified on the basis of the accelerometer data. Our analysis suggests that there are characteristic nodding patterns in different dialogue stages.

Auditory Filterbank Improves Voice Morphing

Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 ( ISCA-INT SPEECH COMMUNICATION ASSOC ) 2528 - 2531 2011 [Refereed]

This paper presents a new method for vocal tract length (VTL) estimation and normalization based on a gammachirp auditory filterbank (GCFB) to improve the sound quality in voice morphing. VTL ratios between 28 speakers were estimated based on the spectral distances for all ordered speaker pairs (756 = 28 × 27). The VTL estimation using the mel-frequency filterbank (MFFB), which is a preprocessor for calculating MFCCs commonly used in ASR, was also evaluated for comparison. The results of subjective listening tests of morphed voice sounds with and without VTL normalization are also reported. The objective and subjective results indicate that VTL normalization is essential for voice morphing, and the proposed GCFB-based method outperforms the MFCC-based method.

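The figure of 756 comparisons in the summary above is the number of ordered pairs of distinct speakers among 28. A minimal check, assuming hypothetical speaker labels (`spk00` to `spk27` are placeholders, not identifiers from the paper):

```python
from itertools import permutations

# Hypothetical labels for the 28 speakers; the paper does not name them.
speakers = [f"spk{i:02d}" for i in range(28)]

# Every ordered (source, target) pair of distinct speakers.
pairs = list(permutations(speakers, 2))

print(len(pairs))  # 28 * 27
```

Ordered pairs are used here because a VTL ratio from speaker A to speaker B is treated separately from the ratio from B to A.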
Comparing Abilities of Humans and Machine for Child Speaker Identification based on Web Utterances Collection,

Shoko Miyamori, Ryuichi Nisimura, Lisa Kurihara, Toshio Irino, Hideki Kawahara

Proceedings of the Second APSIPA Annual Summit and Conference (APSIPA 2010)(Student Symposium) 9 2010.12 [Refereed]

Biopolis, Singapore, 14-17 Dec. 2010. (Presented 14 Dec.)

Optimization of a multiple local periodicity detector for vocal excitation structure analysis

Yoshika Wada, Masanori Morise, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Biopolis, Singapore, 14-17 Dec. 2010 518 - 521 2010.12 [Refereed]

Non-periodic voices play indispensable roles in expressive speech, traditional theatrical performance, various types of singing and other vocal activities. Such voices usually have complex excitation structures, which are not readily represented by a single number, F0. This article introduces optimization of system parameters and evaluation of our new analysis procedure called XSX (eXcitation Structure eXtractor), designed for such complex excitation signals. The proposed method, XSX, consists of two subsystems: an integrated periodicity detector, which extracts simultaneous multiple periodicity candidates, and a frequency refinement procedure based on the instantaneous frequency of F0 and harmonic components. First, the candidate detector is optimized, followed by optimization of the refinement procedure. Second, comparative tests with conventional F0 extractors were conducted and revealed that the proposed method outperforms those procedures in terms of accuracy and tracking speed.

Real world utterance collection using voice-enabled web system for child speaker identification,

Shoko Miyamori, Ryuichi Nisimura, Lisa Kurihara, Toshio Irino, Hideki Kawahara

13th Oriental COCOSDA Workshop, O-COCOSDA 2010, 2010.11 [Refereed]

Kathmandu, Nepal, 24-25 Nov. 2010. (Presented 25 Nov.)

An introduction to auditory filter

IRINO Toshio

The Journal of the Acoustical Society of Japan ( Acoustical Society of Japan ) 66 ( 10 ) 506 - 512 2010.10 [Invited]

Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems.

Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino

INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010 ( ISCA ) 38 - 41 2010.09 [Refereed]

Makuhari, Japan, 26-30 Sep., 2010. (Presented 27 Sep.)

Evaluation and optimization of F0-adaptive spectral envelope estimation based on spectral smoothing with peak emphasis,

Hayato Akagiri, Masanori Morise, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

20th International Congress on Acoustics, ICA2010, 2010.08 [Refereed]

Sydney, Australia, 23-27 Aug., 2010. (Presented 24 Aug.)

Analysis and synthesis of singing with hoarse vocal expressions,

Hideki Kawahara, Hanae Itagaki, Yoshika Wada, Masanori Morise, Ryuichi Nisimura, Toshio Irino

20th International Congress on Acoustics, ICA2010 2010.08 [Refereed]

Sydney, Australia, 23-27 Aug., 2010. (Presented 26 Aug.)

Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition.

Roy D. Patterson, Thomas C. Walters, Jessica Monaghan, Christian Feldbauer, Toshio Irino

International Symposium on Circuits and Systems (ISCAS 2010), May 30 - June 2, 2010, Paris, France ( IEEE ) 3813 - 3816 2010.05 [Refereed]

(Presented 2 Jun 2010)

High-quality and light-weight voice transformation enabling extrapolation without perceptual and objective breakdown.

Hideki Kawahara, Ryuichi Nisimura, Toshio Irino, Masanori Morise, Toru Takahashi, Hideki Banno

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, 14-19 March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA ( IEEE ) 4818 - 4821 2010.03 [Refereed]

(Presented 19 Mar 2010)

Perception of vowel sequence with varying speaker size

Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino

Acoustical Science and Technology 31 ( 2 ) 156 - 164 2010.03 [Refereed]

Speech sounds convey information about the size of the speaker. Several studies have demonstrated that human vowel recognition is possible even for an unnatural size range, and have revealed that size factor normalization can be achieved automatically in the auditory system. In this study, we further investigated the characteristics of the size normalization process, using vowel sequences with temporal changes in the speaker size. In the current experiments, listeners were presented with six-vowel sequences in which the vocal-tract length was alternated vowel by vowel. The experimental results for the identification of the vowel sequence showed that it was increasingly difficult for listeners to identify vowels in the correct order as size alternation was applied with a higher speed and to a larger degree. However, they showed the high performance of vowel recognition when serial order judgment between vowels was not required, and in this case the performance deterioration caused by size alternation became small. The observed deterioration of sequence identification is likely to have been caused not by a failure in size normalization in the auditory system but because of a difficulty in judging the serial order between vowels in the sequence with rapid size changes. The results suggest that the auditory system has a fast process for normalizing speaker-size information and that it operates appropriately even when a sequence contains the temporal alternation of vocal-tract length. © 2010 The Acoustical Society of Japan.

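Effects of sound duration on the perception of sound-source size: An investigation using vowel stimuli (Abstracts of the 28th Annual Meeting of the Japanese Psychonomic Society)

Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino

The Japanese Journal of Psychonomic Science ( The Japanese Psychonomic Society ) 28 ( 2 ) 278 - 278 2010
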
A bottom-up procedure to extract periodicity structure of voiced sounds and its application to represent and restoration of pathological voices.

Hanae Itagaki, Masanori Morise, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

Sixth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2009, Florence, Italy, December 12-14, 2009 ( Firenze University Press / ISCA ) 115 - 118 2009.12 [Refereed]

(Presented 15 Dec.)

Development of speech input method for interactive voiceweb systems

Ryuichi Nisimura, Jumpei Miyake, Hideki Kawahara, Toshio Irino

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ( SPRINGER-VERLAG BERLIN ) 5611 LNCS 710 - 719 2009.10

We have developed a speech input method called "w3voice" to build practical and handy voice-enabled Web applications. It is constructed using a simple Java applet and CGI programs comprising free software. In our website (http://w3voice.jp/), we have released automatic speech recognition and spoken dialogue applications that are suitable for practical use. The mechanism of voice-based interaction is developed on the basis of raw audio signal transmissions via the POST method and the redirection response of HTTP. The system also aims at organizing a voice database collected from home and office environments over the Internet. The purpose of the work is to observe actual voice interactions of human-machine and human-human. We have succeeded in acquiring 8,412 inputs (47.9 inputs per day) captured by using normal PCs over a period of seven months. The experiments confirmed the user-friendliness of our system in human-machine dialogues with trial users. © 2009 Springer Berlin Heidelberg.

Topic-Dependent Language Modeling for VoiceWeb Systems

Kentaro Suzuta, Ryuichi Nisimura, Hideki Kawahara, Toshio Irino

WESPAC X 2009 , Beijing, China, 21-23 Sept. 2009 2009.09 [Refereed]

(Presented 23 Sept.)

Influences of vowel duration on speaker-size estimation and discrimination.

Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino

INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009 ( ISCA ) 128 - 131 2009.09 [Refereed]

(Presented 7 Sept.)

Observation of empirical cumulative distribution of vowel spectral distances and its application to vowel based voice conversion.

Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino

INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009 ( ISCA ) 2647 - 2650 2009.09 [Refereed]

(Presented 10 Sept.)

Brain regions for auditory size processing of speech sounds,

Toshio Irino, Yuki Tsukada, Yoshikazu Oya, Hideki Kawahara, Roy D. Patterson

Auditory Cortex 2009, Magdeburg, Germany, 29 Aug. - 2 Sept. 2009 2009.08 [Refereed]

(Presented 30-31 Aug.)

Size Perception for acoustically scaled sounds of naturally pronounced and whispered words,

Toshio Irino, Yoshie Aoki, Hideki Kawahara, Roy D. Patterson

15th International Symposium on Hearing (ISH2009) , Salamanca, Spain, 1 - 5 Jun. 2009 ( SPRINGER ) 235 - + 2009.06 [Refereed]

(Presented 2 Jun.)

Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown.

Hideki Kawahara, Ryuichi Nisimura, Toshio Irino, Masanori Morise, Toru Takahashi, Hideki Banno

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, 19-24 April 2009, Taipei, Taiwan ( IEEE ) 3905 - 3908 2009.04 [Refereed]

(Presented 23 Apr. 2009)

Perception of size modulated vowel sequence : Can we normalize the size of continuously changing vocal tract?

TSUZAKI Minoru, TAKESHIMA Chihiro, IRINO Toshio

Journal of the Acoustical Society of Japan (E) ( ACOUSTICAL SOCIETY OF JAPAN ) 30 ( 2 ) 83 - 88 2009.03 [Refereed]

Changes in vocal tract size vary the formant frequencies, even when the shape of vocal tracts is the same and the spoken vowels are categorized to be the same. Several studies have demonstrated that the normalization of vocal tract size can be achieved in a bottom-up manner. To investigate how fast this process works, the identification of vowel sequences was examined under conditions where the size was sinusoidally modulated with several frequencies (0.24–62.50 Hz). The performance level changed slightly, but significantly depending on the modulation frequency, and the dependence was not monotonic. The performance dropped for modulation around 4 Hz. The nonmonotonic function could not be predicted by a simple assumption of usage of a single size-estimator that requires a certain processing time. Mismatches were prominent for high frequencies: a deterioration was predicted because of the limited processing time, while the actual performance showed a recovery. This indicates that a switching of the process mode for modulation occurs at around 4 Hz. Below 4 Hz, the auditory system can successfully normalize the size change. Above 4 Hz, the auditory system segregates the sounds using the size cue and the recognition of each vowel is not critically affected.

Speech Analysis Using Temporally Stable Power Spectrum Estimation Method for Periodic Signals

MORISE Masanori, TAKAHASHI Toru, KAWAHARA Hideki, IRINO Toshio

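The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese edition) A ( The Institute of Electronics, Information and Communication Engineers ) J92-A ( 3 ) 163 - 171 2009.03 [Refereed]

This paper evaluates TANDEM, a power spectrum estimation method that removes analysis-time-dependent components from periodic signals, when it is applied to speech analysis. TANDEM was proposed as a method that estimates a power spectrum independent of the analysis time by averaging the power spectra of a periodic signal extracted with two window functions placed half a fundamental period apart. The derivation of TANDEM assumed that the effects of side lobes and of temporal variation in the fundamental frequency are negligible. However, finite-length window functions have side lobes, and the fundamental frequency of speech varies over time. In this paper, using the amount of analysis-time-dependent variation in the power spectrum as the index, we run computer simulations under the condition that the fundamental frequency of the analyzed signal is known, and select a TANDEM window suited to analyzing speech whose fundamental frequency changes over time and which is mixed with noise. We show that the spectral envelope obtained with the selected TANDEM window varies less with analysis time than that of the conventional method and is also superior in temporal resolution and noise robustness.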
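The half-period averaging described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the two-harmonic test signal, the window length (two fundamental periods of a periodic Hann window), the probe frequency, and every function name are assumptions chosen so that the harmonics fall exactly on DFT bins. Under those idealized conditions the interference term between adjacent harmonics flips sign when the window moves by half a period, so the average of the two power spectra no longer depends on the analysis position. With real speech, side lobes and F0 movement leave some residual variation, as the summary notes.

```python
import cmath
import math

def hann(length):
    """Periodic Hann window (assumed window shape for this sketch)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def power_at(x, start, window, freq):
    """Power of the windowed segment beginning at `start`, at `freq` cycles/sample."""
    acc = 0j
    for n, w in enumerate(window):
        acc += w * x[start + n] * cmath.exp(-2j * math.pi * freq * n)
    return abs(acc) ** 2

def tandem_power(x, start, window, freq, period):
    """Average of two power spectra taken half a fundamental period apart."""
    half = period // 2
    return 0.5 * (power_at(x, start, window, freq)
                  + power_at(x, start + half, window, freq))

def relative_range(values):
    """(max - min) / mean, used here as a simple measure of temporal variation."""
    return (max(values) - min(values)) / (sum(values) / len(values))

PERIOD = 40                      # hypothetical fundamental period in samples
window = hann(2 * PERIOD)        # window spanning two periods (assumption)

# Hypothetical periodic test signal: first and second harmonics only.
x = [math.cos(2 * math.pi * n / PERIOD)
     + 0.5 * math.cos(4 * math.pi * n / PERIOD + 0.3)
     for n in range(6 * PERIOD)]

freq = 1.5 / PERIOD              # probe midway between the two harmonics
starts = range(PERIOD)           # slide the analysis position over one period

single = [power_at(x, s, window, freq) for s in starts]
tandem = [tandem_power(x, s, window, freq, PERIOD) for s in starts]

print("single-window variation:", relative_range(single))
print("half-period-average variation:", relative_range(tandem))
```

The probe frequency sits between two harmonics on purpose, because that is where the position-dependent interference of a single window is largest.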
Vowel-based voice conversion and its application to singing-voice manipulation

Yuri Yoshida, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

AES 35th Int. Conf. Audio for Games, 11-13 Feb. 2009, London, UK. 2009.02 [Refereed]

(Presented 13 Feb. 2009)
Vowel-based frequency alignment function design and recognition-based time alignment for automatic speech morphing.

Masato Onishi, Toru Takahashi, Toshio Irino, Hideki Kawahara

2008 IEEE Spoken Language Technology Workshop, SLT 2008, Goa, India, December 15-19, 2008 ( IEEE ) 25 - 28 2008.12 [Refereed]

(Presented 15 Dec. 2008)
Speech-to-text input method for web system using JavaScript.

Ryuichi Nisimura, Jumpei Miyake, Hideki Kawahara, Toshio Irino

2008 IEEE Spoken Language Technology Workshop, SLT 2008, Goa, India, December 15-19, 2008 ( IEEE ) 209 - 212 2008.12 [Refereed]

(Presented 17 Dec. 2008)
Spectral envelope recovery beyond the Nyquist limit for high-quality manipulation of speech sounds.

Hideki Kawahara, Masanori Morise, Hideki Banno, Toru Takahashi, Ryuichi Nisimura, Toshio Irino

INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008 ( ISCA ) 650 - 653 2008.09 [Refereed]

(Presented 24 Sept.)
A unified approach for F0 extraction and aperiodicity estimation based on a temporally stable power spectral representation

Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Hideki Banno, Toshio Irino

ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery" Aalborg University 2008.06 [Refereed]

Aalborg, Denmark, 4-6 Jun. 2008 (Presented 4 Jun.)
A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments

Tomohiro Nakatani, Shigeaki Amano, Toshio Irino, Kentaro Ishizuka, Tadahisa Kondo

Speech Communication ( ELSEVIER SCIENCE BV ) 50 ( 3 ) 203 - 214 2008.03 [Refereed]

This paper proposes a method for fundamental frequency (F0) estimation and voicing decision that can handle wide-ranging speech signals including adult and infant utterances recorded in real noisy environments. In particular, infant utterances have unique characteristics that are different from those of adults, such as a wide F0 range, F0 abrupt transitions, and unique energy distribution patterns over frequencies. Therefore, conventional methods that were developed mainly for adult utterances do not necessarily work well for infant utterances especially when the signals are contaminated by background noise. Several techniques are introduced into the proposed method to cope with this problem. We show that the ripple-enhanced power spectrum based method (REPS) can estimate the F0s robustly, and that the use of instantaneous frequency (IF) enables us to refine the accuracy of the F0 estimates. In addition, the degree of dominance defined based on the IF is introduced as a robust voicing decision measure. The effectiveness of the proposed method is confirmed in terms of gross pitch errors and voicing decision errors in comparison with the recently proposed methods, Praat and YIN, using both longitudinal recordings of Japanese infant utterances and adult utterances. © 2007 Elsevier B.V. All rights reserved.

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation.

Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Toshio Irino, Hideki Banno

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2008, March 30 - April 4, 2008, Caesars Palace, Las Vegas, Nevada, USA ( IEEE ) 3933 - 3936 2008.03 [Refereed]

(Presented 1 Apr.)
Vowel-based voice conversion and its objective evaluation

Masato Onishi, Toru Takahashi, Masanori Morise, Toshio Irino, Hideki Kawahara

2008 RISP International Workshop on Nonlinear Circuits and Signal Processing (NCSP'08), pp.275-278, Gold Coast, Australia, 6-8 Mar. 2008 2008.03 [Refereed]

(Presented 7 Mar.)
Power Spectrum Estimation Method for Periodic Signals Virtually Irrespective of Time Window Positioning

MORISE Masanori, TAKAHASHI Toru, KAWAHARA Hideki, IRINO Toshio

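The IEICE transactions on information and systems ( The Institute of Electronics, Information and Communication Engineers ) 90 ( 12 ) 3265 - 3267 2007.12 [Refereed]

The short-time Fourier transform is widely used for signal analysis. For periodic signals, however, the estimated power spectrum varies with the analysis time. This paper shows that an analysis method using two Hanning windows can practically eliminate this problem.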
Detection of temporal modulation of "size" in vowel sequences

TAKESHIMA Chihiro, TSUZAKI Minoru, IRINO Toshio

Journal of the Acoustical Society of Japan (E) ( ACOUSTICAL SOCIETY OF JAPAN ) 28 ( 5 ) 349 - 351 2007.09 [Refereed]

Keywords: Size extraction, Resonance characteristics, Size modulation detection, Timbre perception. Experiments were performed with listeners to detect the STSM in a vowel sequence. The measured characteristics appeared to be high-pass. The observed high-pass tendency suggested that a more efficient cue was available based on the differences in fine temporal structures caused by the resonance change within a vowel. This indicated that the current experimental paradigm was not appropriate to measure the limit of tracking speed of the VTL extraction process. Therefore, further study will be required by using stimuli that cannot be judged as STSM on the basis of the fine structural cues.
Continuous time-frequency coordinate mapping with sparse anchoring templates and its application to auditory morphing

Toru Takahashi, Toshio Irino, Hideki Kawahara

19th International Congress on Acoustics (ICA2007), Madrid, Spain, 2-7 Sept. 2007 2007.09 [Refereed]

(Presented 2 Sept.)
Group delay for acoustic event representation and its application for speech aperiodicity analysis.

Hideki Kawahara, Masanori Morise, Toru Takahashi, Toshio Irino, Hideki Banno, Osamu Fujimura

15th European Signal Processing Conference, EUSIPCO 2007, Poznan, Poland, September 3-7, 2007 ( IEEE ) 2219 - 2223 2007.09 [Refereed]

(Presented 7 Sept.)
Proposal and evaluation of a timbre control method based on statistical analysis of singing voice spectra of single vowels

森勢将雅, 田原佳代子, 高橋徹, 入野俊夫, 河原英紀

6th Forum on Information Technology (Information Technology Letters), FIT 2007 119 - 122 2007.09 [Refereed]

Chukyo University, Aichi, 5-7 Sept. 2007 (Presented 6 Sept.)
Error Evaluation of Impulse Response Estimation by Cross Spectral Method Using Speech Signal

MORISE Masanori, IRINO Toshio, KAWAHARA Hideki

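The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese edition) A ( The Institute of Electronics, Information and Communication Engineers ) J90-A ( 7 ) 559 - 566 2007.07 [Refereed]

We investigate the error of impulse response estimation when speech is used as the measurement signal for the cross-spectral method. Previous work showed the relationship between the type of time window and the estimation error and proposed a time window suited to impulse response estimation. However, that conclusion holds for white noise as the measurement signal. No suitable time window has been shown for impulse response estimation when the measurement signal, like speech, is periodic and has a non-flat amplitude-frequency characteristic. This paper identifies the causes of the estimation error that arises when speech is the measurement signal for the cross-spectral method and clarifies which time windows give small estimation errors. The relationship between the measurement signal and the estimation error was investigated using various measurement signals. The estimation error of the impulse response was found to depend on the dynamic range of the amplitude-frequency characteristic of the measurement signal. For signals such as speech whose dynamic range exceeds 40 dB, the time window considered optimal for white noise gives a large estimation error, whereas time windows with small side lobes, such as the Hanning and Blackman windows, give small estimation errors.
Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation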

BANNO Hideki, HATA Hiroaki, MORISE Masanori, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

Journal of the Acoustical Society of Japan (E) ( ACOUSTICAL SOCIETY OF JAPAN ) 28 ( 3 ) 140 - 146 2007.05 [Refereed]

A very high quality speech analysis, modification and synthesis system—STRAIGHT—has now been implemented in C language and operated in realtime. This article first provides a brief summary of STRAIGHT components and then introduces the underlying principles that enabled realtime operation. In STRAIGHT, the built-in extended pitch synchronous analysis, which does not require analysis window alignment, plays an important role in realtime implementation. A detailed description of the processing steps, which are based on the so-called "just-in-time" architecture, is presented. Further, discussions on other issues related to realtime implementation and performance measures are also provided. The software will be available to researchers upon request.

Auditory stream segregation based on speaker size, and identification of size-modulated vowel sequences

Minoru Tsuzaki, Chihiro Takeshima, Toshio Irino, Roy D. Patterson

HEARING - FROM SENSORY PROCESSING TO PERCEPTION ( SPRINGER-VERLAG BERLIN ) 285 - + 2007 [Refereed]
Discrimination and Recognition of Scaled Word Sounds

Toshio Irino, Yoshie Aoki, Yoshie Hayashi, Hideki Kawahara, Roy D. Patterson

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 ( ISCA-INT SPEECH COMMUNICATION ASSOC ) 321 - + 2007 [Refereed]

Smith et al. [2] and Ives et al. [3] demonstrated that humans could extract information about the size of a speaker's vocal tract from speech sounds (vowels and syllables, respectively). We have extended their discrimination and recognition experiments to naturally pronounced words. The Just Noticeable Difference (JND) for size discrimination was between 5.5% and 19% depending on the listener. The smallest JND is comparable to that of the syllable experiments; the average JND is comparable to that of the vowel experiments. The word recognition scores remain above 50% for speaker sizes beyond the normal range for humans. The fact that good performance extends over such a large range of acoustic scales supports Irino and Patterson's hypothesis [1] that the auditory system segregates size and shape information at an early stage in the processing.
Warped-TSP: An acoustic measurement signal robust to background noise and harmonic distortion

Masanori Morise, Toshio Irino, Hideki Banno, Hideki Kawahara

ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE ( SCRIPTA TECHNICA-JOHN WILEY & SONS ) 90 ( 4 ) 18 - 26 2007 [Refereed]

TSP (Time-Stretched Pulse, lin-TSP afterwards) and logarithmic TSP (log-TSP) are commonly used in impulse response measurements of audio systems and room acoustics. But the optimal test signal for each environment is different. It is necessary to choose an appropriate test signal for each environment to achieve a better SNR in the measured impulse response. A new acoustic measurement signal that is a hybrid signal of lin-TSP and log-TSP is proposed. The proposed signal, called "warped-TSP," achieves an SNR higher than that obtained by lin-TSP and log-TSP. It also provides a means to eliminate harmonic distortion due to the reproduction system. In this paper, the definition and features of warped-TSP are introduced in comparison with lin-TSP and log-TSP. We also present the relations between the parameters of warped-TSP, the amplitude frequency characteristics of warped-TSP, and the effects on the representation components due to harmonic distortion. Based on these discussions, a method of selecting the optimal parameters of warped-TSP for a specific measuring environment is given. A series of impulse response measurements performed under different ambient noise conditions revealed that the proposed method outperformed lin-TSP and log-TSP under all conditions in terms of the SNR of the measured impulse response. (C) 2006 Wiley Periodicals, Inc.

A Dynamic Compressive Gammachirp Auditory Filterbank.

Irino T, Patterson RD

IEEE Transactions on Audio, Speech, and Language Processing 14 ( 6 ) 2222 - 2232 2006.11 [Refereed]

It is now common to use knowledge about human auditory processing in the development of audio signal processors. Until recently, however, such systems were limited by their linearity. The auditory filter system is known to be level-dependent as evidenced by psychophysical data on masking, compression, and two-tone suppression. However, there were no analysis/synthesis schemes with nonlinear filterbanks. This paper describes such a scheme based on the compressive gammachirp (cGC) auditory filter. It was developed to extend the gammatone filter concept to accommodate the changes in psychophysical filter shape that are observed to occur with changes in stimulus level in simultaneous, tone-in-noise masking. In models of simultaneous noise masking, the temporal dynamics of the filtering can be ignored. Analysis/synthesis systems, however, are intended for use with speech sounds where the glottal cycle can be long with respect to auditory time constants, and so they require specification of the temporal dynamics of the auditory filter. In this paper, we describe a fast-acting level control circuit for the cGC filter and show how psychophysical data involving two-tone suppression and compression can be used to estimate the parameter values for this dynamic version of the cGC filter (referred to as the "dcGC" filter). One important advantage of analysis/synthesis systems with a dcGC filterbank is that they can inherit previously refined signal processing algorithms developed with conventional short-time Fourier transforms (STFTs) and linear filterbanks.
Speech Segregation Using an Auditory Vocoder With Event-Synchronous Enhancements.

Irino T, Patterson RD, Kawahara H

IEEE Transactions on Audio, Speech, and Language Processing 14 ( 6 ) 2212 - 2221 2006.11 [Refereed]

We propose a new method to segregate concurrent speech sounds using an auditory version of a channel vocoder. The auditory representation of sound, referred to as an "auditory image," preserves fine temporal information, unlike conventional window-based processing systems. This makes it possible to segregate speech sources with an event-synchronous procedure. Fundamental frequency information is used to estimate the sequence of glottal pulse times for a target speaker, and to repress the glottal events of other speakers. The procedure leads to robust extraction of the target speech and effective segregation even when the signal-to-noise ratio is as low as 0 dB. Moreover, the segregation performance remains high when the speech contains jitter, or when the estimate of the fundamental frequency F0 is inaccurate. This contrasts with conventional comb-filter methods where errors in F0 estimation produce a marked reduction in performance. We compared the new method to a comb-filter method using a cross-correlation measure and perceptual recognition experiments. The results suggest that the new method has the potential to supplant comb-filter and harmonic-selection methods for speech enhancement.
Speech style conversion based on the statistics of vowel spectrograms and nonlinear frequency mapping.

Toru Takahashi, Hideki Banno, Toshio Irino, Hideki Kawahara

14th European Signal Processing Conference, EUSIPCO 2006, Florence, Italy, September 4-8, 2006 ( IEEE ) 1 - 5 2006.09 [Refereed]

(Presented 8 Sept.)
Analyzing dialogue data for real-world emotional speech classification.

Ryuichi Nisimura, Souji Omae, Hideki Kawahara, Toshio Irino

INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 17-21, 2006 ( ISCA ) 1822 - 1825 2006.09 [Refereed]

In order to obtain an understanding of the user's emotion in human-machine dialogues, an analysis of dialogical utterances in the real world was performed. This work comprises three major steps. (1) The actual conditions of 16 basic emotions were evaluated using Japanese child voices, which were collected through the field test of the public spoken dialogue system. (2) Two factors were derived by a factor analysis. The factors were defined as fundamental psychological factors representing "delightful" and "hateable" emotions. (3) The relationships between the factors and the physical acoustic features were investigated to establish a capability to sense a user's mental state for the dialogue system. In the experimental discriminations between the delightful and hateable emotions, a correct rate of 98.8% was achieved in classifying child's utterances by the SVM (Support Vector Machine) with 11 acoustic features.
Logarithmic temporal processing applied to accurate empirical transfer function measurements in vocal sound propagation.

Masanori Morise, Toshio Irino, Hideki Kawahara

14th European Signal Processing Conference, EUSIPCO 2006, Florence, Italy, September 4-8, 2006 ( IEEE ) 1 - 5 2006.09 [Refereed]

(Presented 8 Sept.)
Comparison of the roex and gammachirp filters as representations of the auditory filter.

Unoki M, Irino T, Glasberg B, Moore BC, Patterson RD

The Journal of the Acoustical Society of America 120 ( 3 ) 1474 - 1492 2006.09 [Refereed]

Although the rounded-exponential (roex) filter has been successfully used to represent the magnitude response of the auditory filter, recent studies with the roex(p,w,t) filter reveal two serious problems: the fits to notched-noise masking data are somewhat unstable unless the filter is reduced to a physically unrealizable form, and there is no time-domain version of the roex(p,w,t) filter to support modeling of the perception of complex sounds. This paper describes a compressive gammachirp (cGC) filter with the same architecture as the roex(p,w,t) which can be implemented in the time domain. The gain and asymmetry of this parallel cGC filter are shown to be comparable to those of the roex(p,w,t) filter, but the fits to masking data are still somewhat unstable. The roex(p,w,t) and parallel cGC filters were also compared with the cascade cGC filter [Patterson et al., J. Acoust. Soc. Am. 114, 1529-1542 (2003)], which was found to provide an equivalent fit with 25% fewer coefficients. Moreover, the fits were stable. The advantage of the cascade cGC filter appears to derive from its parsimonious representation of the high-frequency side of the filter. It is concluded that cGC filters offer better prospects than roex filters for the representation of the auditory filter. (c) 2006 Acoustical Society of America.
Automatic assignment of anchoring points on vowel templates for defining correspondence between time-frequency representations of speech samples.

Toru Takahashi, Masashi Nishi, Toshio Irino, Hideki Kawahara

INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 17-21, 2006 ( ISCA ) 2514 - 2517 2006.09 [Refereed]

(Presented 21 Sept.)
Auditory stream segregation by size and identification of size-modulated vowel sequences

Minoru Tsuzaki, Chihiro Takeshima, Toshio Irino, Roy D. Patterson

14th International Symposium on Hearing (ISH2006) 220 - 226 2006.08 [Refereed]

(Presented 20 Aug.)
Human-robot interaction interface using GMM-based noise recognition

Ryuichi Nisimura, Aki Hashizume, Toshio Irino, Hideki Kawahara

WESPAC IX 2006, (9th Western Pacific Acoustics Conference) 347 - 352 2006.06 [Refereed]

Seoul, Korea, 26-28 June 2006
General framework for flexible speech style manipulation and synthesis

Tohru Takahashi, Toshio Irino, Hideki Kawahara

WESPAC IX 2006, (9th Western Pacific Acoustics Conference), pp.254-259, Seoul, Korea, 26-28 June 2006 2006.06 [Refereed]

(Presented 26 Sept.)
Dynamic, Compressive Gammachirp Auditory Filterbank for Perceptual Signal Processing.

Toshio Irino, Roy D. Patterson

2006 IEEE International Conference on Acoustics Speech and Signal Processing, ICASSP 2006, Toulouse, France, May 14-19, 2006 ( IEEE ) 133 - 136 2006.05 [Refereed]

(Presented 17 May)
Warped-TSP: An Acoustic Measurement Signal Robust against Background Noise and Harmonics Distortion

MORISE Masanori, IRINO Toshio, BANNO Hideki, KAWAHARA Hideki

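The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese edition) A ( The Institute of Electronics, Information and Communication Engineers ) 89 ( 1 ) 7 - 14 2006.01 [Refereed]

The linear time-stretched pulse (Lin-TSP) and the logarithmic TSP (Log-TSP) have long been widely used to measure the impulse responses of audio equipment and room transfer functions. However, the optimal measurement signal depends on the characteristics of the background noise, and obtaining a high SNR requires choosing between the two signals according to the measurement environment. This paper therefore proposes "Warped-TSP," a measurement signal that joins the two signals. With Warped-TSP, the impulse responses of audio equipment and room transfer functions can be measured at a higher SNR than with Lin-TSP or Log-TSP. It also allows harmonic distortion in the reproduction system to be removed with a simple operation. We first define Warped-TSP in contrast with Lin-TSP and Log-TSP. We then describe its features and clarify the relationship between the parameters of Warped-TSP and its characteristics, as well as the influence of harmonic distortion. We further describe how to set the parameters to suit the measurement environment. Impulse response measurements in two environments with different background noise characteristics show that the SNR can be improved.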
Dynamic, compressive gammachirp auditory filterbank for perceptual signal processing

Toshio Irino, Roy D. Patterson

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 ( IEEE ) 4991 - 4994 2006 [Refereed]

A gammachirp auditory filter was developed 1) to extend the domain of the gammatone auditory filter, 2) to simulate the changes in filter shape that occur with changes in stimulus level, 3) to explain a large body of simultaneous masking data, 4) to explain the compressive characteristics of the auditory filter system, and 5) to facilitate the development of a nonlinear, analysis/synthesis framework. What remains is to specify the dynamics of how the stimulus level controls the filter parameters. In this paper, we use psychophysical data involving compression to derive the details of the level control circuit for the dynamic version of the cGC (dcGC) filter and filterbank. The dcGC filterbank enhances spectral contrasts and reduces the dynamic range. This property with the analysis/synthesis framework should be useful in various forms of perceptual signal processing.
Spectral fluctuation mapping model for Japanese speech style conversion based on statistics in emotional speech database

Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

Oriental COCOSDA 2005, Indonesia, 6-8 Dec. 2005 111 - 116 2005.12 [Refereed]
Speech intelligibility derived from time-frequency and source smearing.

Toshio Irino, Satoru Satou, Shunsuke Nomura, Hideki Banno, Hideki Kawahara

INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005 ( ISCA ) 1737 - 1740 2005.09 [Refereed]
Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT.

Hideki Kawahara, Alain de Cheveigné, Hideki Banno, Toru Takahashi, Toshio Irino

INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005 ( ISCA ) 537 - 540 2005.09 [Refereed]

(Presented 5 Sept.)
Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database.

Toru Takahashi, Takeshi Fujii, Masashi Nishi, Hideki Banno, Toshio Irino, Hideki Kawahara

INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005 ( ISCA ) 1853 - 1856 2005.09 [Refereed]

(Presented 7 Sept.)
A test signal robust against background noise in the measurement of acoustic impulse responses: Warped-TSP

Masanori Morise, Toshio Irino, Hideki Banno, Hideki Kawahara

The 34th International Congress and Exposition on Noise Control Engineering (Internoise 2005), Rio de Janeiro, Brazil 2005.08 [Refereed]

7-10 Aug. 2005 (Presented 8 Aug.)
A Study of Talker Localization Based on Subband CSP Analysis in Real Noisy Environments

Yuki Denda, Takanobu Nishiura, Hideki Kawahara, Toshio Irino

IEEE International Workshop on Nonlinear Signal and Image Processing 2005 (NISP 05) 320 - 323 2005.05 [Refereed]

Sapporo, Japan, 18-20 May 2005.
The processing and perception of size information in speech sounds

DRR Smith, RD Patterson, R Turner, H Kawahara, T Irino

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 117 ( 1 ) 305 - 318 2005.01 [Refereed]

There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds. (C) 2005 Acoustical Society of America.
Comparison of the compressive-gammachirp and double-roex auditory filters

RD Patterson, M Unoki, T Irino

AUDITORY SIGNAL PROCESSING: PHYSIOLOGY, PSYCHOACOUSTICS, AND MODELS ( SPRINGER ) 21 - 29 2005 [Refereed]

(To appear in "Auditory signal processing: physiology, psychoacoustics, and models," Pressnitzer, D., de Cheveigné, A., McAdams, S., Collet, L., Eds., Springer Verlag, New York, 2004.)
Underlying principles of a high-quality speech manipulation system STRAIGHT and its application to speech segregation

H Kawahara, T Irino

SPEECH SEPARATION BY HUMANS AND MACHINES ( SPRINGER ) 167 - 180 2005 [Refereed]
Speech segregation using an event-synchronous auditory image and STRAIGHT

T Irino, RD Patterson, H Kawahara

SPEECH SEPARATION BY HUMANS AND MACHINES ( SPRINGER ) 155 - 165 2005 [Refereed]
Robust and accurate fundamental frequency estimation based on dominant harmonic components

T Nakatani, T Irino

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 116 ( 6 ) 3690 - 3700 2004.12 [Refereed]

This paper presents a new method for robust and accurate fundamental frequency (F0) estimation in the presence of background noise and spectral distortion. Degree of dominance and dominance spectrum are defined based on instantaneous frequencies. The degree of dominance allows one to evaluate the magnitude of individual harmonic components of the speech signals relative to background noise while reducing the influence of spectral distortion. The fundamental frequency is more accurately estimated from reliable harmonic components which are easy to select given the dominance spectra. Experiments are performed using white and babble background noise with and without spectral distortion as produced by a SRAEN filter. The results show that the present method is better than previously reported methods in terms of both gross and fine F0 errors. (C) 2004 Acoustical Society of America.
Intelligibility of degraded speech from smeared STRAIGHT spectrum.

Hideki Kawahara, Hideki Banno, Toshio Irino, Jiang Jin

INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004 ( ISCA ) 2004.10 [Refereed]
An evaluation of in-car speech enhancement techniques with microphone array steering

Masato Nakayama, Yuki Denda, Takanobu Nishiura, Hideki Kawahara, Toshio Irino

18th International Congress on Acoustics (ICA2004) 4 3041 - 3044 2004.04 [Refereed]

Kyoto, Japan, 4-9 Apr. 2004 (abstract review)
Speech segregation using an auditory vocoder with event-synchronous enhancements

Toshio Irino, Roy D. Patterson, Hideki Kawahara

18th International Congress on Acoustics 4 3025 - 3028 2004.04 [Refereed]

Kyoto, Japan, 4-9 Apr. 2004 (abstract review)
Algorithm amalgam: Morphing waveform based methods, sinusoidal models and STRAIGHT

H Kawahara, H Banno, T Irino, P Zolfaghari

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS ( IEEE ) I 13 - 16 2004 [Refereed]

A tool to investigate an important fundamental question in speech processing is proposed, aiming to promote research on voice quality and para- and non-linguistic aspects of speech. The proposed method effectively emulates waveform-based methods, sinusoidal models and the high quality source filter model STRAIGHT. The key idea that enables blending these seemingly disjoint algorithms is a group delay based representation of signal excitation. By using a STRAIGHT-based smoothed time-frequency representation that is shared by these three types of speech processing methods, a unified source representation is used to implement the proposed system. Informal listening tests using the proposed system indicated that phase manipulation introduces different timbre, but it does not need to reproduce the exact waveform to reproduce the same timbre. This may suggest that the possibility of further information reduction exists in synthesizing close to natural quality speech.
A design of audio-visual talker tracking system based on CSP analysis and frame difference in real noisy environments

Y Denda, T Nishiura, H Kawahara, T Irino

2004 IEEE 6TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING ( IEEE ) 63 - 66 2004 [Refereed]

Capturing distant-talking speech with high quality is very important for voice-controlled systems and teleconferencing systems. Microphone array steering is an ideal candidate for this purpose, but it requires tracking the target talker. Conventional talker tracking algorithms that use only the audio signal (e.g., CSP (Cross-power Spectrum Phase) analysis) have difficulty estimating the target talker direction accurately in very noisy environments. To overcome this problem, we propose a new target talker tracking algorithm that utilizes not only the audio signal but also the visual signal. The proposed algorithm integrates CSP analysis of the audio signal with frame differences of the visual signal. Evaluation experiments in a real room confirmed that the proposed algorithm could track the target talker more accurately than the conventional algorithm.
Speech recognition with wavelet spectral subtraction in real noisy environment

N Denda, T Nishiura, H Kawahara, T Irino

2004 7TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS 1-3 ( PUBLISHING HOUSE ELECTRONICS INDUSTRY ) 638 - 641 2004 [Refereed]

In this paper, we focus on the effectiveness of wavelet spectral subtraction in noisy speech recognition. Fourier spectral subtraction is a conventional, effective technique for this purpose. It is suitable for reducing stationary noise (e.g., white Gaussian-like noise) because the short-time Fourier transform provides a uniform time-frequency resolution in each frequency band. However, it cannot effectively reduce sudden noise. The wavelet transform, on the other hand, may be suitable for analyzing sudden (non-stationary) signals because it admits a non-uniform time-frequency resolution in each frequency band. We therefore reported at EUROSPEECH 2003 on the noise reduction performance obtained with Fourier spectral subtraction, wavelet spectral subtraction, and microphone array steering in real noisy environments. However, it was not clear what kinds of noise characteristics could be reduced with wavelet spectral subtraction. To address this problem, we evaluate the performance of wavelet spectral subtraction and Fourier spectral subtraction in various noisy environments. The evaluation experiments confirmed that wavelet spectral subtraction could reduce sudden noise and higher-frequency noise more effectively than Fourier spectral subtraction.
Speech segregation based on fundamental event information using an auditory vocoder.

Toshio Irino, Roy D. Patterson, Hideki Kawahara

8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003 ( ISCA ) 2003.09 [Refereed]
Dominance spectrum based V/UV classification and F0 estimation.

Tomohiro Nakatani, Toshio Irino, Parham Zolfaghari

8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003 ( ISCA ) 2313 - 2316 2003.09 [Refereed]
Extending the domain of center frequencies for the compressive gammachirp auditory filter

RD Patterson, M Unoki, T Irino

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 114 ( 3 ) 1529 - 1542 2003.09 [Refereed]

The gammatone filter was imported from auditory physiology to provide a time-domain version of the roex auditory filter and enable the development of a realistic auditory filterbank for models of auditory perception [Patterson et al., J. Acoust. Soc. Am. 98, 1890-1894 (1995)]. The gammachirp auditory filter was developed to extend the domain of the gammatone auditory filter and simulate the changes in filter shape that occur with changes in stimulus level. Initially, the gammachirp filter was limited to center frequencies in the 2.0-kHz region where there were sufficient "notched-noise" masking data to define its parameters accurately. Recently, however, the range of the masking data has been extended in two massive studies. This paper reports how a compressive version of the gammachirp auditory filter was fitted to these new data sets to define the filter parameters over the extended frequency range. The results show that the shape of the filter can be specified for the entire domain of the data using just six constants (center frequencies from 0.25 to 6.0 kHz and levels from 30 to 80 dB SPL). The compressive, gammachirp auditory filter also has the advantage of being consistent with physiological studies of cochlear filtering insofar as the compression of the filter is mainly limited to the passband and the form of the chirp in the impulse response is largely independent of level. (C) 2003 Acoustical Society of America.

Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis.

Parham Zolfaghari, Tomohiro Nakatani, Toshio Irino, Hideki Kawahara, Fumitada Itakura

8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003 ( ISCA ) 2441 - 2444 2003.09 [Refereed]
Speech segregation using event synchronous auditory vocoder

T Irino, RD Patterson, H Kawahara

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS ( IEEE ) 525 - 528 2003 [Refereed]

We present a new auditory method to segregate concurrent speech sounds. The system is based on an auditory vocoder developed to resynthesize speech from an auditory Mellin representation using the vocoder STRAIGHT. The quality of the transmitted sound is improved by introducing an event synchronous procedure to estimate glottal pulse times. The auditory representation preserves fine temporal information, unlike conventional window-based processing, which makes it possible to segregate the speech synchronously. The results show that the segregation is good even when the SNR is 0 dB; the extracted target speech was a little distorted but entirely intelligible (like telephone speech), whereas the distracter speech was reduced to a non-speech sound that was not perceptually disturbing. So, this auditory vocoder has potential for speech enhancement in applications such as hearing aids.

Robust fundamental frequency estimation against background noise and spectral distortion.

Tomohiro Nakatani, Toshio Irino

7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002 ( ISCA ) 3 1733 - 1736 2002.09 [Refereed]
Evaluation of a speech recognition / generation method based on HMM and straight.

Toshio Irino, Yasuhiro Minami, Tomohiro Nakatani, Minoru Tsuzaki, H. Tagawa

7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002 ( ISCA ) 2545 - 2548 2002.09 [Refereed]
Auditory vocoder to playback sound from an auditory Mellin representation

Toshio Irino, Roy D. Patterson, Hideki Kawahara

Dynamics of Speech Production and Perception, NATO Advanced Study Institute, Il Ciocco, Italy, 24 June - 6 July, 2002. 2002.06 [Refereed]
Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform

Toshio Irino, Roy D. Patterson

Speech Communication 36 ( 3-4 ) 181 - 203 2002.01 [Refereed]

We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal tract expands or contracts as the length of the vocal tract increases or decreases. There is a transform, the Mellin transform, that is immune to the effects of time dilation; it maps impulse responses that differ in temporal scale onto a single distribution and encodes the size information separately as a scalar constant. In this paper we investigate the use of the Mellin transform for vowel normalisation. In the auditory system, sounds are initially subjected to a form of wavelet analysis in the cochlea and then, in each frequency channel, the repeating patterns produced by periodic sounds appear to be stabilised by a form of time-interval calculation. The result is like a two-dimensional array of interval histograms and it is referred to as an auditory image. In this paper, we show that there is a two-dimensional form of the Mellin transform that can convert the auditory images of vowel sounds from vocal tracts with different sizes into an invariant Mellin image (MI) and, thereby, facilitate the extraction and separation of the size and shape information associated with a given vowel type. In signal processing terms, the MI of a sound is the Mellin transform of a stabilised wavelet transform of the sound. We suggest that the MI provides a good model of auditory vowel normalisation, and that this provides a good framework for auditory processing from cochlea to cortex. © 2002 Elsevier Science B.V. All rights reserved.
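
The scale property this summary relies on can be written out for the textbook one-dimensional Mellin transform. This is a sketch under that assumption, not the two-dimensional Mellin image defined in the paper; the symbols $f$, $a$, $s$, $\sigma$, and $c$ are introduced here for illustration only.

```latex
\begin{align}
M_f(s) &= \int_0^\infty f(t)\, t^{s-1}\, dt, \\
f_a(t) &= f(at), \qquad a > 0, \\
M_{f_a}(s) &= \int_0^\infty f(at)\, t^{s-1}\, dt
            = a^{-s} \int_0^\infty f(u)\, u^{s-1}\, du
            = a^{-s} M_f(s), \\
\left| M_{f_a}(\sigma + jc) \right| &= a^{-\sigma} \left| M_f(\sigma + jc) \right| .
\end{align}
```

For fixed $\sigma$, dilating the impulse response by $a$ leaves the magnitude distribution over $c$ unchanged apart from the constant factor $a^{-\sigma}$, which is consistent with the statement that responses differing in temporal scale map onto a single distribution while the size information is carried by a scalar.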

Auditory VOCODER: Speech resynthesis from an auditory Mellin representation

T Irino, RD Patterson, H Kawahara

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS ( IEEE ) II 1921 - 1924 2002 [Refereed]

We assume that speech morphing, noise suppression, and speech segregation would improve if they were more accurately based on human perception. Accordingly, an Auditory VOCODER was developed to resynthesize speech from an auditory Mellin representation used to explain human perception. The Auditory VOCODER has three modules: an Auditory Mellin Image model [9,10], a STRAIGHT VOCODER [2], and a mapping module consisting of warped-frequency cepstral analysis and nonlinear, multivariate regression analysis (MRA). We describe the modules and an evaluation of the system. Informal listening indicates that the sound quality is reasonable.

Improvement of an IIR asymmetric compensation gammachirp filter

UNOKI Masashi, IRINO Toshio, PATTERSON Roy D

Journal of the Acoustical Society of Japan (E) ( ACOUSTICAL SOCIETY OF JAPAN ) 22 ( 6 ) 426 - 430 2001.11 [Refereed]

An IIR implementation of the gammachirp filter has been proposed to simulate basilar membrane motion efficiently (Irino and Unoki, 1999). A reasonable filter response was provided by a combination of a gammatone filter and an IIR asymmetric compensation (AC) filter. It was noted, however, that the rms error was high when the absolute values of the parameters were large, probably because the coefficients of the IIR-AC filter were selected heuristically. In this report, we show that this is due to the sign inversion of the phase of poles and zeros in the conventional model. We propose a new definition of the IIR-AC filter and describe a method of systematically determining the optimum coefficients and the number of cascades for the second-order filter. This reduces the error to about 1/3 of that produced by the conventional model.

Sound resynthesis from Auditory Mellin Image using STRAIGHT

Toshio Irino, Roy D. Patterson, Hideki Kawahara

CRAC (Consistent and Reliable Acoustic Cues for sound analysis) workshop , Aalborg, Denmark, 2nd Sept. 2001 2001.09 [Refereed]
A compressive gammachirp auditory filter for both physiological and psychophysical data

Toshio Irino, Roy D. Patterson

Journal of the Acoustical Society of America ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 109 ( 5,Pt.1 ) 2008 - 2022 2001.05 [Refereed]

A gammachirp auditory filter was developed by Irino and Patterson [J. Acoust. Soc. Am. 101, 412-419 (1997)] to provide a level-dependent version of the linear, gammatone auditory filter, with which to explain the level-dependent changes in cochlear filtering observed in psychophysical masking experiments. In this 'analytical' gammachirp filter, the chirp varied with level and there was no explicit representation of the change in filter gain or compression with level. Subsequently, Carney et al. [J. Acoust. Soc. Am. 105, 2384-2391 (1999)] reviewed Carney and Yin's [J. Neurophysiol. 60, 1653-1677 (1988)] reverse-correlation (revcor) data and showed that the frequency glide of the chirp does not vary with level in their data. In this article, the architecture of the analytical gammachirp is reviewed with respect to cochlear physiology and a new form of gammachirp filter is described in which the magnitude response, the gain, and the compression vary with level but the chirp does not. This new 'compressive' gammachirp filter is used to fit the level-dependent revcor data reported by Carney et al. (1999) and the level-dependent masking data reported by Rosen and Baker [Hear. Res. 73, 231-243 (1994)] . © 2001 Acoustical Society of America.

Topic1 Auditory filtering/Cochlear frequency analysis

IRINO Toshio

The Journal of the Acoustical Society of Japan ( Acoustical Society of Japan ) 57 ( 1 ) 56 - 56 2001.01 [Invited]
An analysis/synthesis auditory filterbank based on an IIR gammachirp filter

T Irino, M Unoki

COMPUTATIONAL MODELS OF AUDITORY FUNCTION ( I O S PRESS ) 312 49 - 64 2001 [Refereed]
Robust Estimation of Fundamental Frequency Using Instantaneous Frequencies of Harmonic Components

ATAKE Yoshinori, IRINO Toshio, KAWAHARA Hideki, LU Jinlin, NAKAMURA Satoshi, SHIKANO Kiyohiro

The Transactions of the Institute of Electronics, Information and Communication Engineers ( The Institute of Electronics, Information and Communication Engineers ) 83 ( 11 ) 2077 - 2086 2000.11 [Refereed]

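STRAIGHT, developed by Kawahara et al., can produce analysis/synthesis speech with high naturalness approaching that of the original sound, even though it is a VOCODER-type analysis/synthesis method. However, it has a weakness: its noise robustness is low, and the quality of the synthesized speech degrades severely in noisy environments. This is thought to be because STRAIGHT actively uses processing based on the fundamental period at each stage, so it is strongly affected when the fundamental frequency estimated under noise contains errors. To overcome this drawback, this paper proposes a noise-robust fundamental frequency estimation method. The new method uses not only the fundamental component used in the conventional TEMPO method but also its harmonic components, and integrates them using Cohen's bandwidth equation. To evaluate the proposed method, we also created a database of simultaneously recorded speech and EGG data. A comparison of estimation accuracy between the proposed method and conventional methods such as TEMPO on this database showed that the proposed method performs as well as or better than the conventional methods without noise and greatly improves estimation accuracy when noise is added.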
Auditory imaging for segregating size and shape information of sound sources

IRINO Toshio

The Journal of the Acoustical Society of Japan ( Acoustical Society of Japan ) 56 ( 7 ) 505 - 508 2000.07 [Invited]

Auditory images: How complex sounds are represented in the auditory system

PATTERSON Roy D, IRINO Toshio

The Journal of the Acoustical Society of Japan ( Acoustical Society of Japan ) 56 ( 7 ) 503 - 504 2000.07 [Invited]

Mellin images of vowel sounds and the phonological distinctiveness of multi-formant vowels

RD Patterson, S Uppenkamp, T Irino

BRITISH JOURNAL OF AUDIOLOGY ( WHURR PUBLISHERS LTD ) 34 ( 2 ) 118 - 118 2000.04 [Refereed]
Robust fundamental frequency estimation using instantaneous frequencies of harmonic components

Yoshinori Atake, Toshio Irino, Hideki Kawahara, Jinlin Lu, Satoshi Nakamura, Kiyohiro Shikano

6th International Conference on Spoken Language Processing, ICSLP 2000 2 907 - 910 2000.01

This paper proposes a noise-tolerant method for fundamental frequency (F0) extraction. This method includes several new ideas, including the estimation of the instantaneous frequencies of the higher harmonic components, and the design of an adaptive weighting function based on a bandwidth equation that combines the F0 information in the harmonic components. To evaluate the proposed method, we constructed a relatively large database of simultaneous recordings of speech waveforms and EGG (electroglottography) signals. The database consists of 30 sentences pronounced by 14 male and 14 female normal subjects, i.e., 840 sentences in total. The duration of the sound is about 35 minutes, including about 20 minutes of voicing. The experiments were performed with additive noise for four pitch extraction methods, i.e., the proposed method, the original TEMPO, an improved cepstrum method, and a common F0 extraction program in ESPS. The results were as follows: 1) the proposed method is always better than any of the other methods when the SNR is greater than about 2 dB; 2) for high SNR values (> 15 dB), the correct rates of the proposed method and the original TEMPO are about 95% and much better than the improved cepstrum method (92%) and the ESPS function (89%); and 3) all of the methods degrade to less than 62% when the SNR is 0 dB. As a result, the proposed method improves the performance for low SNR values and also maintains the high accuracy inherited from the original TEMPO for high SNR values.
A gammachirp perspective of cochlear mechanics that can also explain human auditory masking quantitatively

T Irino, RD Patterson

PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON RECENT DEVELOPMENTS IN AUDITORY MECHANICS ( WORLD SCIENTIFIC PUBL CO PTE LTD ) 230 - 236 2000 [Refereed]

Recently, the gammachirp function was proposed as an auditory filter for explaining psychoacoustical masking data [7]. It can also account for some basic physiological observations such as the frequency glide in basilar membrane motion (BMM), but it cannot readily account for other observations such as the nonlinear compressive relationship between signal level and BMM. In this paper, the gammachirp filter is extended to include an extra stage of filtering as suggested by the NonLinear Resonant Tectorial Membrane (NL-RTM) hypothesis [1,2]. The extra filter was initially proposed for an IIR implementation of the gammachirp [8]. The new gammachirp filter provides excellent fits to human masking data, and it enables us to unify physiological and psychoacoustical data within a common modelling framework.
Analysis/synthesis auditory filterbank based on an IIR implementation of the gammachirp

Toshio Irino, Masashi Unoki

Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) ( Acoustical Soc Jpn ) 20 ( 6 ) 397 - 406 1999.11 [Refereed]

This paper proposes a new auditory filterbank that enables signal resynthesis from dynamic representations produced by a level-dependent auditory filterbank. The filterbank is based on a new IIR implementation of the gammachirp, which has been shown to be an excellent candidate for asymmetric, level-dependent auditory filters. Initially, the gammachirp filter is shown to be decomposed into a combination of a gammatone filter and an asymmetric function. The asymmetric function is excellently simulated with a minimum-phase IIR filter, named the 'asymmetric compensation filter'. Then, two filterbank structures are presented, each based on the combination of a gammatone filterbank and a bank of asymmetric compensation filters controlled by a signal level estimation mechanism. The inverse filter of the asymmetric compensation filter is always stable because the minimum-phase condition is satisfied. When a bank of inverse filters is utilized after the gammachirp analysis filterbank and the idea of the wavelet transform is applied, it is possible to resynthesize signals with small time-invariant errors and achieve a guaranteed precision. This feature has never been accomplished by conventional active auditory filterbanks. The proposed analysis/synthesis gammachirp filterbank is expected to be useful in various applications where human auditory filtering has to be modeled.
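
The role of the minimum-phase condition can be illustrated with a small sketch. It uses a generic second-order IIR section whose zeros and poles all lie inside the unit circle, so swapping numerator and denominator gives a stable inverse that undoes the filter. The coefficients are illustrative assumptions, not the asymmetric compensation filter coefficients from the paper, and `iir_filter` is a hypothetical helper.

```python
def iir_filter(b, a, x):
    """Direct-form IIR filter: sum_k a[k] y[n-k] = sum_k b[k] x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# Illustrative minimum-phase section (assumed values):
# zeros have magnitude sqrt(0.2), poles have magnitude sqrt(0.5), both < 1.
b = [1.0, -0.5, 0.2]
a = [1.0, -0.9, 0.5]

signal = [1.0, 0.0, -2.0, 0.5, 3.0, 0.0, 0.0, 1.0]
analysed = iir_filter(b, a, signal)       # forward (analysis) filter
recovered = iir_filter(a, b, analysed)    # inverse filter: roles of b and a swapped
```

Because the zeros of the forward filter become the poles of the inverse, the inverse is stable only when those zeros lie inside the unit circle, which is the property the summary attributes to the asymmetric compensation filter.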

Stabilised wavelet mellin transform: an auditory strategy for normalising sound-source size.

Toshio Irino, Roy D. Patterson

Sixth European Conference on Speech Communication and Technology, EUROSPEECH 1999, Budapest, Hungary, September 5-9, 1999 ( ISCA ) 1899 - 1902 1999.09 [Refereed]
Extracting size and shape information of sound source in an optimal auditory processing model

Toshio Irino, Roy D. Patterson

Workshop on Computational Auditory Scene Analysis (CASA), International Joint Conference on Artificial Intelligence (IJCAI'99) , Stockholm, Sweden, 1st August 1999. 1999.08 [Refereed]
Noise suppression using a time-varying, analysis/synthesis gammachirp filterbank

T Irino

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI ( IEEE ) 97 - 100 1999 [Refereed]

Spectral subtraction has been cited most often as a noise suppression method for speech signals in steady background noise, because it is basically a non-parametric method and simple enough to implement for various applications using FFT. It has also been well known, however, that spectral subtraction produces so called "musical noise" in synthetic sounds. Since such musical noise, even at low levels, can often bother humans in speech perception, spectral subtraction has not been very successful in signal processing applications for human listeners. To suppress noise without producing musical noise, an alternative method has been developed using a time-varying, analysis/synthesis gammachirp filterbank; this was initially proposed as an auditory filterbank. The present method achieves about the same SNR improvement as spectral subtraction when using the same information on the non-speech interval. Moreover, the synthetic sounds only contain steady white-like noise at reduced levels when the original noise is white. This method is, therefore, advantageous in various applications for human listeners.

Modeling temporal asymmetry in the auditory system

RD Patterson, T Irino

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 104 ( 5 ) 2967 - 2979 1998.11 [Refereed]

Sound sources in the environment produce waves that are almost invariably asymmetric in time, and human listeners are highly sensitive to temporal asymmetry. The spectral analysis and neural transduction processes in the cochlea enhance temporal asymmetry, as do time-domain models of cochlear processes, but it appears that the resulting asymmetry is not sufficient to explain the observed perceptual asymmetry. In the auditory image model (AIM) of hearing, the temporal asymmetry in the neural activity produced by the cochlea is further enhanced by the "strobed" temporal integration that converts the neural activity pattern into an auditory image, and the temporal asymmetry in the auditory image is sufficient to explain the perceptual asymmetry. Modern versions of the "duplex model" of pitch have time-domain cochlea simulations that produce neural activity with temporal asymmetry similar to that produced by AIM. In the final stage, however, they apply autocorrelation to the neural pattern and autocorrelation is a symmetric process in time. In this paper the effect of autocorrelation on temporal asymmetry is examined in a range of auditory models with varying forms of auditory filterbank, compression, and neural transduction. It is concluded that autocorrelation does not enhance temporal asymmetry and often reduces it, and that autocorrelogram models cannot explain the magnitude of the perceptual asymmetry in their current form. Then, the original version of strobed-temporal-integration is reviewed with regard to temporal asymmetry, and the delta-gamma theory of temporal asymmetry [Irino and Patterson, J. Acoust. Soc. Am. 99, 2316-2331 (1996)] is used to develop a new version of strobed-temporal-integration that is more robust and physiologically more plausible. (C) 1998 Acoustical Society of America. [S0001-4966(98)05711-7]
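
The statement that autocorrelation is a symmetric process in time can be checked numerically with a small sketch. It builds an exponentially decaying sinusoid and its time reversal and shows that both have the same autocorrelation function. The function names and parameter values are illustrative assumptions, not the stimuli or models used in the paper.

```python
import math


def decaying_tone(n, half_life, period):
    """Exponentially decaying sinusoid of n samples (time-asymmetric signal)."""
    decay = math.log(2) / half_life
    return [math.exp(-decay * t) * math.sin(2 * math.pi * t / period) for t in range(n)]


def autocorr(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


damped = decaying_tone(200, half_life=20.0, period=16.0)
ramped = damped[::-1]  # same envelope reversed in time

ac_damped = autocorr(damped, 50)
ac_ramped = autocorr(ramped, 50)
max_difference = max(abs(p - q) for p, q in zip(ac_damped, ac_ramped))
```

The two waveforms differ sample by sample, yet `max_difference` is zero up to rounding error, so a stage that keeps only the autocorrelation cannot by itself distinguish a signal from its time reversal.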

A time-varying analysis/synthesis auditory filterbank based on an IIR gammachirp filter

Toshio Irino, Masashi Unoki

NATO Advanced Study Institute, Computational Hearing 205 - 210 1998.07 [Refereed]

Il Ciocco (Tuscany), Italy, July 1 - July 12, 1998.
The gammachirp for optimal auditory filtering

T Irino, RD Patterson

ICONIP'98: THE FIFTH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING JOINTLY WITH JNNS'98: THE 1998 ANNUAL CONFERENCE OF THE JAPANESE NEURAL NETWORK SOCIETY - PROCEEDINGS, VOLS 1-3 ( OHMSHA LTD ) 1322 - 1326 1998 [Refereed]

This paper reviews the "gammachirp" auditory filter based on physical theory and supported by psychoacoustical and physiological observations. Various studies have demonstrated that the auditory filter cannot be simulated by the Gabor function that is well-known as an optimal function in terms of minimal uncertainty in a time-frequency representation. This seems to suggest that the auditory system is non-optimal. However, for a time-scale representation, the function minimizing uncertainty is the gammachirp. With a frequency-modulation term, the gammachirp is an extension of the gammatone filter that is often used in functional auditory filterbanks. The gammachirp is found to provide an excellent fit to human masking data that show level-dependent asymmetry in the frequency characteristic. Moreover, it is consistent with recent physiological observations of the frequency-modulation in the impulse response of the basilar membrane.
A time-varying, analysis/synthesis auditory filterbank using the gammachirp

T Irino, M Unoki

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 ( IEEE ) VI 3653 - 3656 1998 [Refereed]

A time-varying, analysis/synthesis auditory filterbank has been developed using a new implementation of the "gammachirp", which has been shown to be an excellent function for the asymmetric, level-dependent auditory filter. The gammachirp filter is shown to be implemented through a combination of a gammatone filter and an IIR asymmetric compensation filter, which greatly reduces the computational cost of time-varying filtering. The gammachirp filterbank is designed using a linear gammatone filterbank and a bank of time-varying asymmetric compensation filters controlled by the sound pressure level estimated at the output of the filterbank. Since the inverse filter of the asymmetric compensation filter is always stable, it is possible to resynthesize signals from time-varying, level-dependent auditory representations. The resynthesis error is only determined by the linear analysis/synthesis gammatone filterbank. The proposed filterbank is applicable to various types of signal processing required to model human auditory filtering.

A time-domain, level-dependent auditory filter: The gammachirp

Toshio Irino, Roy D. Patterson

Journal of the Acoustical Society of America ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 101 ( 1 ) 412 - 419 1997.01 [Refereed]

A frequency modulation term has been added to the gammatone auditory filter to produce a filter with an asymmetric amplitude spectrum. When the degree of asymmetry in this 'gammachirp' auditory filter is associated with stimulus level, the gammachirp is found to provide an excellent fit to 12 sets of notched-noise masking data from three different studies. The gammachirp has a well-defined impulse response, unlike the conventional roex auditory filter, and so it is an excellent candidate for an asymmetric, level-dependent auditory filterbank in time-domain models of auditory processing.
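
The impulse response described here can be sketched in code. The sketch uses the commonly cited form g(t) = a t^(n-1) exp(-2π b ERB(fr) t) cos(2π fr t + c ln t + φ), where the c ln t term is the added frequency modulation and c = 0 gives the gammatone. The default parameter values and the function names are illustrative assumptions, not fitted values from the paper; the ERB formula is the Glasberg and Moore approximation.

```python
import math


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def gammachirp(t, fr, n=4, b=1.019, c=0.0, a=1.0, phi=0.0):
    """Gammachirp impulse response at time t (seconds) for frequency fr (Hz).

    n, b: order and bandwidth of the gamma envelope (assumed defaults)
    c:    chirp coefficient; c = 0 reduces the filter to a gammatone
    a:    amplitude; phi: initial phase
    """
    if t <= 0:
        return 0.0
    envelope = a * t ** (n - 1) * math.exp(-2 * math.pi * b * erb(fr) * t)
    return envelope * math.cos(2 * math.pi * fr * t + c * math.log(t) + phi)


# Usage: sample 10 ms of a gammatone (c = 0) and a gammachirp (c = -2) at 16 kHz.
fs = 16000
gammatone_ir = [gammachirp(i / fs, 1000.0) for i in range(160)]
gammachirp_ir = [gammachirp(i / fs, 1000.0, c=-2.0) for i in range(160)]
```

Both responses share the same gamma envelope; only the phase trajectory differs, which is what produces the asymmetric amplitude spectrum.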

Temporal asymmetry in the auditory system

T Irino, RD Patterson

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA ( ACOUSTICAL SOC AMER AMER INST PHYSICS ) 99 ( 4 ) 2316 - 2331 1996.04 [Refereed]

When a damped exponential with a half-life of 4-8 ms is repeated every 25-50 ms and used to modulate a sinusoid or a wideband noise, it suppresses the sound quality typically associated with the carrier. When the envelopes of these "damped" sounds are reversed in time, producing "ramped" sounds, a continuous component with the sound quality of the carrier is restored to the perception. This paper presents an experiment that measures the temporal asymmetry revealed by this perceptual contrast. A ramped sinusoid or noise with a given half-life was presented with a damped sinusoid or noise having the same or greater half-life, to determine the damped half-life required to produce a continuous component with the equivalent relative strength in the two sounds. The results with sinusoidal carriers show that the half-life of the damped sound has to be, on average, about five times the half-life of the ramped sound if the tonal component of the two perceptions is to have the same relative strength. The asymmetry for the noise carrier is about half that of the sinusoidal carrier and, again, the damped sound has the greater matching half-life. Several multichannel auditory models based on a gammatone filterbank are used to try to explain the data in terms of traditional leaky integration, but they produce neither sufficient asymmetry nor the correct pattern of asymmetry. A "delta-gamma" theory is then developed to provide a framework for understanding temporal asymmetry in the auditory system. The theory is used to compare the temporal asymmetry produced by several auditory models and to explain when and how they can accommodate the perceptual asymmetry observed in the experiments. (C) 1996 Acoustical Society of America.

A 'gammachirp' function as an optimal auditory filter with the Mellin transform

Toshio Irino

1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6 ( IEEE ) II 981 - 984 1996 [Refereed]

Atlanta, Georgia, May 7-10, 1996.

An Optimal Auditory Filter

Toshio Irino

IEEE SP 1995 Workshop on Applications of Signal Processing to Audio and Acoustics , IEEE Signal Processing Society, Mohonk, New Paltz, NY, October 15-18, 1995. 1995.10 [Refereed]
A theory of asymmetric intensity enhancement around acoustic transients.

Toshio Irino, Roy D. Patterson

The 3rd International Conference on Spoken Language Processing, ICSLP 1994, Yokohama, Japan, September 18-22, 1994 ( ISCA ) 4 1955 - 1958 1994.09 [Refereed]
SIGNAL RECONSTRUCTION FROM MODIFIED AUDITORY WAVELET TRANSFORM

T IRINO, H KAWAHARA

IEEE TRANSACTIONS ON SIGNAL PROCESSING ( IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC ) 41 ( 12 ) 3549 - 3554 1993.12 [Refereed]

We propose a new method for signal modification in auditory peripheral representation: an auditory wavelet transform and algorithms for reconstructing a signal from a modified wavelet transform. We present the characteristics of signal analysis, synthesis, and reconstruction and also the data reduction criteria for signal modification.

SIGNAL RECONSTRUCTION FROM MODIFIED WAVELET TRANSFORM - AN APPLICATION TO AUDITORY SIGNAL-PROCESSING

T IRINO, H KAWAHARA

ICASSP-92 - 1992 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 ( I E E E ) 1 A85 - A88 1992 [Refereed]

© 1992 IEEE. A new method of signal reconstruction from a modified auditory representation is presented. This consists of four parts: 1) an algorithm to reconstruct a signal from its modified wavelet transform with a general wavelet; 2) obtaining an auditory representation using an auditory wavelet transform whose analyzing wavelet is the impulse response of an auditory peripheral model; 3) estimating the reconstruction algorithm both with and without data reduction; 4) an example of its application to the time-scale modification of speech. This wavelet reconstruction algorithm is the counterpart of the signal reconstruction algorithm which uses the short-time Fourier transform. High-quality speech successfully generated by time-scale modification shows that the reconstruction method is suitable for various applications as well as making experimental auditory stimuli.

A method for designing neural networks using nonlinear multivariate analysis—application to speaker‐independent vowel recognition

Toshio Irino, Hideki Kawahara

Systems and Computers in Japan 21 ( 9 ) 80 - 88 1990.01 [Refereed]

This paper proposes a method of constructing a multilayered neural network, using the multiple logistic model (MLM). The model is a nonlinear multivariate analysis considering the output logistic function of each unit, which is used in the back‐propagation method (BP). The idea can be applied directly to the determination of the multilayered neural network structure. The model can also be utilized as a systematic method to introduce such information as pattern distribution into the neural network structure. Considering the speaker‐independent vowel recognition as the problem, this paper compares the results by the proposed method (MLM), the construction by the linear multiple regression analysis (MRA), the learning by BP with the weight being defined at random as the initial value, and the learning by BP with the initial weight determined by MLM or MRA. It is seen as a result that the recognition rate is the best when BP is applied after introducing the speaker distribution information by the proposed method. It is seen also that the computation time is reduced compared with the BP, with the initial weight being defined at random. Copyright © 1990 Wiley Periodicals, Inc., A Wiley Company

A Method for Designing Neural Networks Using Nonlinear Multivariate Analysis: Application to Speaker-Independent Vowel Recognition.

Toshio Irino, Hideki Kawahara

Neural Computation 2 ( 3 ) 386 - 397 1990 [Refereed]

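A method for constructing multilayer neural networks using nonlinear multivariate analysis: application to speaker-independent vowel recognition (Special issue on new speech processing technologies)

Toshio Irino, Hideki Kawahara

IEICE Transactions D-II, Information and Systems (in Japanese) ( IEICE Information and Systems Society ) 72 ( 8 ) 1187 - 1193 1989.08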
Theoretical analysis of Stoneley waves propagating along an interface between two substrates of the same piezoelectric material

Toshio Irino, Yasutaka Shimizu

Electronics and Communications in Japan, Part III: Fundamental Electronic Science (English translation of Denshi Tsushin Gakkai Ronbunshi) 72 1 - 12 1989.04 [Refereed]

A theoretical investigation of Stoneley waves propagating along an interface between two substrates of the same piezoelectric material is presented. A method of determining the upper cutoff velocity of Stoneley waves is described. Stoneley waves can also occur in trigonal LiNbO3 and LiTaO3, and also with one of the substrates turned over, even without a short-circuit plate. The degree of energy concentration, velocity, and the electromechanical coupling coefficient k² of Stoneley waves in LiNbO3 are calculated for various cuts and propagation velocities. The occurrence or nonoccurrence of Stoneley waves when two substrates of different cuts are joined and when two substrates of the same cut are joined with different orientations in the plane is investigated.
OPTIMIZED STONELEY WAVE DEVICE BY PROPER CHOICE OF GLASS OVERCOAT

T IRINO, Y SHIMIZU

IEEE TRANSACTIONS ON ULTRASONICS FERROELECTRICS AND FREQUENCY CONTROL ( IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC ) 36 ( 2 ) 159 - 167 1989.03 [Refereed]

The characteristics of Stoneley waves propagating along an interface between a piezoelectric material and an isotropic material were investigated both theoretically and experimentally. First, the condition for existence of Stoneley waves was shown for various piezoelectric materials. A rule of thumb for selecting the combination of the two materials was obtained. Then, LiTaO3 was selected as the piezoelectric material and SiO2 as the isotropic material. After the calculation of the Stoneley wave characteristics, actual devices were fabricated and measured. The experimental results were found to be in good agreement with the theory; zero slope temperature (TCD = 0) and high electromechanical coupling coefficient (K² = 1.5 percent) were obtained for Stoneley wave propagation between SiO2/X-148° LiTaO3. As a result, future surface-acoustic-wave (SAW) devices can be made without any package. © 1989 IEEE

Propagation of Boundary Acoustic Waves Along a ZnO Layer between Two Materials

Toshio Irino, Yoshimasa Shirosaki, Yasutaka Shimizu

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control ( IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC ) 35 ( 6 ) 701 - 707 1988.11

Theoretical and experimental results on boundary acoustic waves (BAW) propagated along a ZnO layer sandwiched between two materials are presented. The dispersion curve of the propagation velocity, the electromechanical coupling coefficient (K²) and the displacements were obtained theoretically as a function of the normalized thickness of the ZnO layer. The temperature coefficients of delay time (TCD) were also calculated and become zero at a particular thickness. Boundary acoustic waves can exist only when the material constants of three materials satisfy the particular conditions obtained in the work. The existence regions are larger than those of the Stoneley waves the authors presented elsewhere. Experiments on SiO2/ZnO/SiO2 were also performed to verify the theoretical prediction of the existence of boundary waves. A ZnO film and a thick SiO2 layer were fabricated on a fused quartz substrate by a sputtering technique. Then the boundary waves were excited and received by interdigital transducers and propagated along the ZnO layer. Propagation loss was practically the same value as for Rayleigh waves, indicating a proper mode of the system. These results lead us to expect that future SAW devices can be made without any package. © 1988 IEEE

Vowel-feature extraction from cochlear vibration using neural networks.

Toshio Irino, Hideki Kawahara

Neural Networks 1 ( Supplement-1 ) 300 - 301 1988.09 [Refereed]

First annual conference of International Neural Network Society (INNS), Boston, Sept. 1988.

Propagation of boundary acoustic waves along a ZnO layer between two materials

Toshio Irino, Yoshimasa Shirosaki, Yasutaka Shimizu

Electronics and Communications in Japan (Part II: Electronics) 71 ( 5 ) 1 - 12 1988.01 [Refereed]

This paper describes the theoretical and experimental results on the propagation of boundary acoustic waves along a ZnO layer between two materials. It was proven theoretically that the boundary acoustic waves propagate in SiO2/ZnO/SiO2, SiO2/ZnO/PYREX and SiO2/ZnO/(Z-X)Si structures. The propagation velocity, electromechanical coupling coefficient K², and the concentration of energy in the mid-layer were calculated as a function of the ZnO film thickness. The thermal coefficient of delay time TCD was also calculated for the SiO2/ZnO/SiO2 and SiO2/ZnO/(Z-X)Si structures, showing that a certain ZnO film thickness provides zero TCD. Next, the requirements of a glass substrate for propagation of boundary acoustic waves along the ZnO film sandwiched by SiO2 and glass substrate or glass film and glass substrate are discussed. As a result, as the thickness of the ZnO film and the second velocity increase, the boundary acoustic wave has a better chance to exist. Finally, the device with SiO2/ZnO/SiO2 structure was actually fabricated and it was confirmed that the boundary acoustic wave was excited and propagated in the device. If the Rayleigh wave characteristic is taken into account, the experimental and theoretical results agree. Copyright © 1988 Wiley Periodicals, Inc., A Wiley Company

Zero slope temperature SiO2/LiTaO3 structure substrate for Stoneley waves

Toshio Irino, Yasutaka Shimizu, Takaya Watanabe

Electronics and Communications in Japan (Part II: Electronics) 71 ( 6 ) 55 - 62 1988.01 [Refereed]

A theoretical and experimental study has been conducted on Stoneley waves propagating along the interface between LiTaO3 and SiO2. First, it is shown that Stoneley waves can exist for specific cuts and propagation directions. The velocity, electromechanical coupling coefficient, energy concentration, delay time temperature coefficient, and delay time temperature characteristics are calculated. In an SiO2/X-148°Y LiTaO3 structure, a zero temperature coefficient which is not available for a Rayleigh wave on an LiTaO3 substrate has been realized. In addition, the electromechanical coupling coefficient is larger. Next, a device of this structure has been fabricated. It is confirmed that Stoneley waves can be excited and received by interdigital electrodes. The characteristics observed have been found to agree well with the theoretical predictions. Also, a zero temperature coefficient is obtained with an SiO2/X-148.5°Y LiTaO3 structure and the quadratic temperature coefficient is about the same as in an ST cut quartz Rayleigh wave substrate. Copyright © 1988 Wiley Periodicals, Inc., A Wiley Company

Zero slope temperature SiC/SiO2/LiTaO3 substrate for boundary acoustic waves

Toshio Irino, Takaya Watanabe, Yasutaka Shimizu

Japanese Journal of Applied Physics ( JAPAN J APPLIED PHYSICS ) 27-1 154 - 156 1988.01 [Refereed]

A zero slope temperature SiO2/X-148°Y LiTaO3 substrate has been proposed for use in packageless SAW devices. However, the SiO2 film is required to be about three times the wavelength and, therefore, is easily removed by temperature variation. In this paper, a SiC overcoat on SiO2 is proposed to reduce the film thickness. The calculated energy concentration in the middle layer is better than in the two-media structure. The experimental result agreed with the theory and zero slope temperature was obtained when the total thickness of SiC and SiO2 was about 2.5 times the wavelength. © 1988 The Japan Society of Applied Physics.

ZERO SLOPE TEMPERATURE SiO2/LiTaO3 STRUCTURE SUBSTRATE FOR STONELEY WAVES.

Toshio Irino, Takaya Watanabe, Yasutaka Shimizu

Ultrasonics Symposium Proceedings 257 - 260 1987.12 [Refereed]

Theoretical and experimental results on Stoneley waves along an interface between LiTaO3 and SiO2 are presented. Stoneley waves can exist only when the material constants of a piezoelectric material and an isotropic material satisfy particular conditions. After the cut angle and propagation direction of LiTaO3 with SiO2 were determined from the calculated characteristics, an experiment was performed showing the measured values to be in good agreement with the theory. Zero slope temperature (TCD = 0) and higher coupling coefficient (K² = 1.5%) were obtained with Stoneley waves in the SiO2/X-148°Y LiTaO3 structure. These results indicate that future SAW devices could be made without a package.
Zero slope temperature SiO2/LiTaO3 structure substrate for Stoneley waves.

入野俊夫, 渡辺隆弥, 清水康敬

電子情報通信学会論文誌 C ( 電子情報通信学会 ) 70 ( 7 ) p1070 - 1075 1987.07 [Refereed]
PROPAGATION OF BOUNDARY ACOUSTIC-WAVES ALONG A ZNO LAYER BETWEEN 2 MATERIALS

T IRINO, Y SHIROSAKI, Y SHIMIZU

IEEE TRANSACTIONS ON ULTRASONICS FERROELECTRICS AND FREQUENCY CONTROL ( IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC ) 34 ( 3 ) 390 - 390 1987.05 [Refereed]
Propagation of boundary acoustic waves along a ZnO layer between two materials.

入野俊夫, 白崎良昌, 清水康敬

電子情報通信学会論文誌 C ( 電子情報通信学会 ) 70 ( 1 ) p59 - 68 1987.01 [Refereed]
PROPAGATION OF BOUNDARY ACOUSTIC WAVES ALONG A ZnO LAYER BETWEEN TWO MATERIALS.

Toshio Irino, Yoshimasa Shirosaki, Yasutaka Shimizu

Ultrasonics Symposium Proceedings 195 - 200 1986.12 [Refereed]

Theoretical and experimental results are presented for boundary acoustic waves propagated along a ZnO layer sandwiched between two materials. The dispersion curve of the propagation velocity, the electromechanical coupling coefficient and the displacements were obtained theoretically as a function of the normalized thickness of the ZnO layer. The temperature coefficients of delay time were also calculated and found to become zero at a particular thickness. Boundary acoustic waves can exist only when the material constants of three materials satisfy the particular conditions obtained here. Experiments on SiO2/ZnO/SiO2 were also performed to verify theoretical prediction of the existence of boundary waves. Propagation loss was practically the same value as for Rayleigh waves.
Theoretical analysis of Stoneley waves propagating along an interface between two substrates of the same piezoelectric materials.

入野俊夫, 清水康敬

電子通信学会論文誌 A ( 電子通信学会 ) 69 ( 9 ) 1144 - 1153 1986.09 [Refereed]
Acoustic boundary waves propagating along a thin layer between two bonded substrates

Toshio Irino, Yasutaka Shimizu

Japanese Journal of Applied Physics 25 ( 1 ) 130 - 132 1986.01 [Refereed]

The characteristics of boundary waves propagating along a thin layer between two bonded substrates were investigated both theoretically and experimentally. The structures are PZT/ADHESIVE/PZT and PZT/ADHESIVE/GLASS. It was found that the propagation loss of the devices is greater than theoretical results because of a non-uniform adhesive layer. Therefore, the two substrates must be carefully and accurately bonded to decrease the propagation loss. © 1986 The Japan Society of Applied Physics.

Theoretical analysis of Stoneley waves propagating along an interface between piezoelectric material and isotropic material

Toshio Irino, Yasutaka Shimizu

Electronics and Communications in Japan (Part II: Electronics) 68 ( 3 ) 29 - 36 1985.01 [Refereed]

Conventional surface acoustic wave (SAW) devices mainly use Rayleigh waves that propagate on the substrate surface. Therefore, they require protective packaging and are expensive as well as unreliable. This paper reports an effort to develop SAW devices that do not require packaging. To this end we study the Stoneley waves propagating along the interface between the piezoelectric and isotropic materials. A range of material constants of isotropic materials is obtained which allows the Stoneley wave to exist when combined with piezoelectric materials with various cuts and propagation directions. We obtain the relation of the allowable range to the maximum velocity of the Stoneley wave and the velocity of the Rayleigh wave. It is found that the Stoneley wave can be supported with a combination of glass and LiTaO3, PZT-4 and ZnO. It is not possible to concentrate the energy near the interface if LiNbO3 and Bi12GeO20 are used. Copyright © 1985 Wiley Periodicals, Inc., A Wiley Company

STONELEY WAVES PROPAGATING ALONG AN INTERFACE BETWEEN PIEZOELECTRIC MATERIAL AND GLASS

Y SHIMIZU, T IRINO

IEEE TRANSACTIONS ON SONICS AND ULTRASONICS ( IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC ) 32 ( 1 ) 105 - 105 1985 [Refereed]
Theoretical analysis of Stoneley waves propagating along an interface between piezoelectric material and isotropic material.

入野俊夫, 清水康敬

電子通信学会論文誌 C ( 電子通信学会 ) 67 ( 10 ) 727 - 732 1984.10 [Refereed]
STONELEY WAVES PROPAGATING ALONG AN INTERFACE BETWEEN PIEZOELECTRIC MATERIAL AND ISOTROPIC MATERIAL.

Yasutaka Shimizu, Toshio Irino

Ultrasonics Symposium Proceedings 1 373 - 376 1983.12 [Refereed]

IEEE Ultrasonics Symposium, Atlanta, GA, Nov, 1983.
Stoneley Waves Propagating along an Interface between Piezoelectric Material and Glass : Surface Acoustic Waves and Devices

SHIMIZU Yasutaka, IRINO Toshio

Japanese journal of applied physics. Supplement ( 社団法人応用物理学会 ) 22 ( 3 ) 145 - 147 1983.07 [Refereed]
Theoretical study of Stoneley waves propagating along the interface between ZnO and glass

清水康敬, 入野俊夫

電子通信学会論文誌 C ( 電子通信学会 ) 65 ( 11 ) 883 - 890 1982.11 [Refereed]
The theoretical analysis of Stoneley waves propagating along an interface between ZnO and glass

Yasutaka Shimizu, Toshio Irino

Electronics and Communications in Japan (Part I: Communications) 65 ( 11 ) 108 - 117 1982.01 [Refereed]

Conventional surface acoustic wave devices mainly use a Rayleigh wave, propagating along the surface of the substrate. Therefore, they require packaging. However, the cost of packaging is high. Also, if the packaging quality is poor, water drops accumulate on the substrate surface at low temperature and the device may malfunction. This paper describes devices that do not require packaging and examines Stoneley waves propagating along the interface between a piezoelectric ZnO layer, which can excite a surface wave, and a glass layer, in which material constants can be changed relatively easily. We find the range of the material constants of the glass which, in combination with ZnO, can generate Stoneley waves. We obtain the velocity, electromechanical coupling coefficient and energy concentration at the interface within this range. The effect of the material constants on these parameters is also considered. It is found that there are glasses that support Stoneley waves and others that do not. Copyright © 1982 Wiley Periodicals, Inc., A Wiley Company

Books etc

Hearing

Shigeto Furukawa, Junsei Horikawa, Toshio Irino( Part： Joint author, Work： Section 2 Functions of frequency analysis)

Corona Publishing 2021.03
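Encyclopedia of Artificial Intelligence

The Japanese Society for Artificial Intelligence (ed.)( Part： Joint author, Work： Toshio Irino, "Models of the auditory system,")

Kyoritsu Shuppan 2017.07 ISBN: 9784320124202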
Perspectives on Auditory Research

A. N. Popper, R. R. Fay( Part： Joint author, Work： Roy D. Patterson and Toshio Irino, "Size Matters in Hearing: How the Auditory System Normalizes the Sounds of Speech and Music for Source Size,")

Springer 2014 ISBN: 9781461491019

Springer Handbook of Auditory Research Vol. 50
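Auditory Models

森周司, 香田徹, 日比野浩, 任書晃, 倉智嘉久, 入野俊夫, 鵜木祐史, 鈴木陽一, 牧勝弘, 津崎実( Part： Joint author, Work： Chapter 4 "Psychophysical experiments and models of auditory filters," Chapter 7 "Internal representations and features computed by simulators,")

Corona Publishing 2011 ISBN: 9784339013238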

Acoustical Science Series, edited by the Acoustical Society of Japan
Neurophysiological Bases of Auditory Perception

Enrique A. Lopez-Poveda, Alan R. Palmer, Ray Meddis( Part： Joint author, Work： Toshio Irino, Yoshie Aoki, Hideki Kawahara, and Roy D. Patterson, "Size Perception for acoustically scaled sounds of naturally pronounced and whispered words,")

Springer, LaVergne, TN USA 2010.04 ISBN: 9781441956859
Computer Processing of Asian Spoken Languages

Shuichi Itahashi, Chiu-yu Tseng( Part： Joint author, Work： Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nishimura, Hideki Banno, Toshio Irino, "STRAIGHT, a framework for speech analysis, modification and synthesis,")

Consideration Books, Los Angeles, USA 2010.03 ISBN: 9780935047721
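Encyclopedia of Modern Mathematical Sciences (2nd edition)

広中平祐 et al.( Part： Joint author, Work： Toshio Irino, Hideki Kawahara, "Mathematics of auditory cognitive processes,")

Maruzen, Tokyo 2009.12 ISBN: 9784621081259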
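New Handbook of Sensory and Perceptual Psychology, Part 2 (contribution: "Functional models of early auditory processing")

大山正, 今井省吾, 和氣典二, 菊池正 (eds.)( Part： Joint author, Work： Toshio Irino, Minoru Tsuzaki, Part III Hearing, "Functional models of early auditory processing,")

Seishin Shobo 2007.09 ISBN: 9784414305043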
The Dynamics of Speech Production and Perception (contribution: "Vowel normalisation: Time-domain processing of the internal dynamics of speech")

Pierre Divenyi, Steven Greenberg, George Meyer( Part： Joint author, Work： Richard E. Turner, Marc A. Al-Hames, David R. R. Smith, Hideki Kawahara, Toshio Irino, and Roy D. Patterson "Vowel normalisation: Time-domain processing of the internal dynamics of speech,")

IOS press, Amsterdam 2006 ISBN: 1586036661

NATO Science Series, Series A: Life Sciences,
Speech Separation by Humans and Machines

Pierre Divenyi( Part： Joint author, Work： "Speech Segregation Using an Event-Synchronous Auditory Image and STRAIGHT," "Underlying Principles of a High-quality Speech Manipulation System STRAIGHT and Its Application to Speech Segregation,")

Kluwer Academic Publishers, Dordrecht (The Netherlands) 2005 ISBN: 1402080018
Auditory Signal Processing: Physiology, Psychoacoustics, and Models

Pressnitzer, D, de Cheveigne A, McAdams, S, Collet, L( Part： Joint author, Work： Roy D. Patterson, Masashi Unoki, and Toshio Irino, "Comparison of the compressive-gammachirp and double-roex auditory filters,")

Springer, New York 2005 ISBN: 0387219153
Computational Models of Auditory Function NATO Science Series, Series A: Life Sciences, Vol. 312

Greenberg, S, Slaney, M( Part： Joint author, Work： Toshio Irino and Masashi Unoki, "An analysis/synthesis auditory filterbank based on an IIR gammachirp filter")

IOS Press, Amsterdam 2001 ISBN: 9051994575
Physiological and Psychophysical Bases of Auditory Function,

Breebaart, D.J, Houtsma, A.J.M, Kohlrausch, A, Prijs, V.F, Schoonhoven, R( Part： Joint author, Work： Toshio Irino and Roy D. Patterson, "A gammachirp framework of auditory filtering: Unification of cochlear frequency-glide data and psychoacoustical masking data,")

Shaker Publishing, The Netherlands 2001 ISBN: 9042301155
Recent Developments in Auditory Mechanics

Wada, H, Takasaka, T, Ikeda, K, Ohyama, K, Koike, T( Part： Joint author, Work： Toshio Irino and Roy D. Patterson , "A gammachirp perspective of cochlear mechanics that can also explain human auditory masking quantitatively,")

World Scientific, Singapore 2000 ISBN: 9810241704
Psychophysical and Physiological Advances in Hearing

A.R.Palmer, A.Rees, A.Q.Summerfield, R.Meddis( Part： Joint author, Work： Roy D. Patterson and Toshio Irino "Auditory temporal asymmetry and autocorrelation")

Whurr Publishers, London 1998 ISBN: 1861560699
Mathematics Applied to Biology and Medicine

J. Demongeot, V. Capasso( Part： Joint author, Work： Thierry Herve, Toshio Irino, Hideki Kawahara, "How synaptic delays change the response of a massively parallel post-cochlear neural network,")

Wuerz Publishing Ltd., Winnipeg, Canada 1993 ISBN: 0920063632

Misc

Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening,

Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

arXiv arXiv:2203.16760 2022.03
GESI: Gammachirp Envelope Similarity Index for Predicting Intelligibility of Simulated Hearing Loss Sounds

Ayako Yamamoto, Toshio Irino, Fuki Miyazaki, Honoka Tamaru （Part： Corresponding author )

arXiv.2310.15399 preprint 2023.12

Speech intelligibility of simulated hearing loss sounds and its prediction using the Gammachirp Envelope Similarity Index (GESI)

Toshio Irino, Honoka Tamaru, Ayako Yamamoto （Part： Lead author,　Corresponding author )

arXiv.2206.06573 preprint --- accepted to Interspeech2022 2022.06

Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility

Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani （Part： Corresponding author )

Interspeech 2021 ( ISCA ) 2104.10001 2021.08

Toward handy understanding of environment for speech material acquisition and presentation: Application of a cascaded all-pass filters with randomized center frequencies and phase polarities.

河原英紀, 矢田部浩平, 榊原健一, 水町光徳, 森勢将雅, 坂野秀樹, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2021 2021
Application of frequency domain variant of Velvet noise to the measurement of auditory effects on the voice fundamental frequency

河原英紀, 榊原健一, 津崎実, 松井淑恵, 森勢将雅, 入野俊夫

電子情報通信学会技術研究報告 119 ( 440(SIP2019 103-169) ) 2020
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech

Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani

arXiv 1904.02096 2019.04
Application of frequency domain variants of velvet noise to multi-aspect measurement of acoustic systems

河原英紀, 榊原健一, 水町光徳, 森勢将雅, 坂野秀樹, 入野俊夫

電子情報通信学会技術研究報告 119 ( 253(EA2019 36-49) ) 2019
Annotating Compliments

井上雅史, 中島隆太郎, 花田里欧子, 古山宣洋, 入野俊夫

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 ( 電子情報通信学会 ) 117 ( 509 ) 11 - 15 2018.03
Evaluation of active listening in psychotherapy: Comparison of clinical psychologists and students

HANADA Ryoko, NAKAJIMA Ryutaro, INOUE Masashi, FURUYAMA Nobuhiro, IRINO Toshio

Proceedings of the Annual Conference of JSAI ( The Japanese Society for Artificial Intelligence ) 2018 3C1OS14a02 - 3C1OS14a02 2018

Active listening is one of the indispensable axes in evaluating the dialogue of psychotherapy. Although there have been discussions about it in the area of clinical psychology, the method for evaluating active listening has been missing. It is thus necessary to establish it to improve the quality of listening in the interview. The authors have proposed a measurement method of the degree of active listening with a device we originally developed to evaluate emotion (EMO system). This paper reports on the experiment conducted to compare the evaluations of a psychotherapy by expert clinical psychologists with those by undergraduate students as one of the coursework tasks for a clinical psychology course. A new experimental setup was proposed including a multiresolutional analysis to detect the change of active listening evaluation.

Hearing impairment simulator for training course of speech therapists and development of its web application

米満麻弥, 入野俊夫, 松井淑恵, 西村竜一, 吐師道子, 長谷川純

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 ( 電子情報通信学会 ) 117 ( 29 ) 277 - 282 2017.05
Aliasing-free Fujisaki-Ljungqvist model and its application to voice quality perception

KAWAHARA Hideki, TSUZAKI Minoru, MATSUI Toshie, IRINO Toshio, SAKAKIBARA Ken-Ichi

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 47 ( 2 ) 71 - 76 2017.03
Correspondence between tags assigned via microcounseling and the listening assessment of evaluators as determined by the EMOtional MOvement Observation system (EMO system)

花田里欧子, 入野俊夫, 古山宣洋, 井上雅史, 中島隆太郎

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 ( 電子情報通信学会 ) 116 ( 524 ) 113 - 118 2017.03
Active listening learning support for counselors by employing a psychological counseling corpus and the EMOtional MOvement Observation system (EMO system)

花田里欧子, 入野俊夫, 古山宣洋, 井上雅史, 中島隆太郎

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 ( 電子情報通信学会 ) 116 ( 436 ) 5 - 10 2017.01
An improvement of the predicting method for speech intelligibility using the dynamic compressive gammachirp filterbank

山本克彦, 入野俊夫, 松井淑恵

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 46 ( 1 ) 35 - 40 2016.02
The enhancing high-frequency components of unvoiced sounds impacts on the size perception

山本航大, 入野俊夫, 岡本江美

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 45 ( 8 ) 681 - 686 2015.11
Study on predicting speech intelligibility of enhanced speech sounds using the dynamic compressive gammachirp auditory filterbank and modulation filterbank

YAMAMOTO Katsuhiko, IRINO Toshio, ARAKI Shoko

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 45 ( 7 ) 569 - 574 2015.10
Statistical modelling of an F0 estimation method based on higher-order waveform symmetry and its application to filled pause analysis

KAWAHARA Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 475 ) 307 - 312 2015.03

A robust method for tracking F0 trajectory as an initial estimate, followed by a refinement procedure based on a temporally static instantaneous frequency, is proposed. The proposed initial estimation method is based on a higher-order waveform symmetry measure, which is computationally efficient and has finer temporal resolution. This proposal aims at analysing filled pauses, which are frequently observed in spontaneous speech used in everyday situations. Instabilities of vocal fold vibration usually found in filled pauses, which cause commonly used F0 extractors to fail, motivated the development of this new F0 extraction method.
Syllable identification of speech sounds processed by a hearing impairment simulator which cancels auditory peripheral compression

松井淑恵, 入野俊夫, 永江美沙貴

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 45 ( 2 ) 93 - 98 2015.03
Change of size perception when enhancing high-frequency components of unvoiced sounds and its computational theory

山本航大, 入野俊夫, 西村竜一

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 45 ( 2 ) 99 - 104 2015.03
Realtime singing voice conversion to growl-like singing based on vocal tract shape and glottal source characteristics

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2015 ( 12 ) 1 - 6 2015.02

An outline of a system to convert a usual singing voice to growl-like performance in realtime is introduced. Relatively high-speed periodic variations (around 70 Hz) in spectral shapes and fundamental frequency trajectories were found to be dominant features of growl-like singing in our previous investigations. A set of simulations revealed that these spectral shape variations can be closely replicated by introducing vocal tract shape variations around supra-glottal structures and shape variations in the glottal source waveform using the LF-model. Despite the fact that realtime extraction of LF parameters from input voice is not feasible, the simulation results indicated that the net effect of the variation can be represented by simple spectral slope variations. For vocal tract shape variation, several sets of spectral models for approximating simulated variations can be suggested. These indicate that by using these approximated models, it is possible to design a realtime system for converting usual singing voices to growl-like voices.
Improving voice attractiveness by speech parameter modification for interactive voice training applications

吉元照貴, 西村竜一, 入野俊夫, 河原英紀

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2015 ( 25 ) 1 - 6 2015.02

A simple voice training system for improving attractiveness is introduced, with descriptions of the set of procedures that constitute the system. Those procedures are based on findings drawn from our investigations on voice attractiveness using a new voice morphing method. They are summarized as follows. a) The physical factors contributing most to attractiveness are fundamental frequency and spectral information. b) Attractiveness judgement differs among listeners. c) A change in the perceived talker of the modified voice, caused by physical parameter manipulation for improving voice attractiveness, disturbs the listener's judgement and adjustment. To overcome the last disturbing factor, physical parameter changes within each talker for improving attractiveness were acquired by recruiting student actors in our university. Several sets of physical parameter changes were applied to improve the attractiveness of voices with lower attractiveness scores. The attractiveness of the voices modified using these sets of physical parameter changes was tested for all possible combinations of the source actors, talkers of manipulated voices and the listeners. The proposed voice training system is introduced based on the results of the tests.
Realtime singing voice conversion to growl-like singing based on vocal tract shape and glottal source characteristics

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

IPSJ SIG technical reports ( Information Processing Society of Japan (IPSJ) ) 2015 ( 12 ) 1 - 6 2015.02

An outline of a system to convert a usual singing voice to growl-like performance in realtime is introduced. Relatively high-speed periodic variations (around 70 Hz) in spectral shapes and fundamental frequency trajectories were found to be dominant features of growl-like singing in our previous investigations. A set of simulations revealed that these spectral shape variations can be closely replicated by introducing vocal tract shape variations around supra-glottal structures and shape variations in the glottal source waveform using the LF-model. Despite the fact that realtime extraction of LF parameters from input voice is not feasible, the simulation results indicated that the net effect of the variation can be represented by simple spectral slope variations. For vocal tract shape variation, several sets of spectral models for approximating simulated variations can be suggested. These indicate that by using these approximated models, it is possible to design a realtime system for converting usual singing voices to growl-like voices.
Improving voice attractiveness by speech parameter modification for interactive voice training applications

吉元照貴, 西村竜一, 入野俊夫, 河原英紀

IPSJ SIG technical reports ( Information Processing Society of Japan (IPSJ) ) 2015 ( 25 ) 1 - 6 2015.02

A simple voice training system for improving attractiveness is introduced, with descriptions of the set of procedures that constitute the system. Those procedures are based on findings drawn from our investigations on voice attractiveness using a new voice morphing method. They are summarized as follows. a) The physical factors contributing most to attractiveness are fundamental frequency and spectral information. b) Attractiveness judgement differs among listeners. c) A change in the perceived talker of the modified voice, caused by physical parameter manipulation for improving voice attractiveness, disturbs the listener's judgement and adjustment. To overcome the last disturbing factor, physical parameter changes within each talker for improving attractiveness were acquired by recruiting student actors in our university. Several sets of physical parameter changes were applied to improve the attractiveness of voices with lower attractiveness scores. The attractiveness of the voices modified using these sets of physical parameter changes was tested for all possible combinations of the source actors, talkers of manipulated voices and the listeners. The proposed voice training system is introduced based on the results of the tests.
Nonlinearity and Wavelet property of the auditory filterbank suitable for scale analysis in the auditory system (Wavelet analysis and sampling theory)

Irino Toshio, Kawahara Hideki, Patterson Roy D.

RIMS Kokyuroku ( Kyoto University ) 1928 27 - 57 2014.12
The role of STRAIGHT in research on the perception of size in speech and music(Invited talk)

PATTERSON Roy D., IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 272 ) 71 - 75 2014.10

Fifteen years ago, while working on the mathematics of the gammachirp auditory filter, we realized that the perception of speech and music is largely scale invariant. People understand the speech of other people no matter what their average voice pitch or their mean formant frequency. People also know the family of an instrument (brass, string or woodwind) independent of its size and register. We illustrated how the auditory system could use a form of "stabilized wavelet-Mellin transform" to normalize the sounds of speech and music, and we decided to do some research on the perceptual invariance of speech and musical sounds. This was easier said than done, as it requires the manipulation of the acoustic scale variables in natural sounds. Fortunately, at about the same time, Kawahara-sensei released STRAIGHT which provided high-fidelity manipulation of the pitch and vocal tract length of speech sounds and musical tones. This paper describes a sequence of experiments on the perception of size using sounds in which the scale parameters were manipulated by STRAIGHT, and how the resynthesis element of STRAIGHT was adapted for musical sounds. The research provides one extended example of how STRAIGHT has empowered research on the perception of natural sounds.
Invited talk : The role of STRAIGHT in research on the perception of size in speech and music

PATTERSON Roy D., 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 44 ( 7 ) 473 - 477 2014.10
Pre-processing for vocal tract area function estimation using linear prediction analysis

ISA Kinuyo, YOSHIMOTO Shoki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 272 ) 27 - 28 2014.10

Estimation of the vocal tract area function based on linear predictive analysis suffers from biasing factors such as the glottal waveform and radiation from the mouth. Preprocessing procedures for compensating for these effects, consisting of high-frequency emphasis and spectrum flattening, were investigated. Analysis results using these procedures on a vowel database are also introduced.
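As a rough illustration of the kind of processing chain described above, the sketch below applies first-order pre-emphasis (one common form of high-frequency emphasis), computes reflection coefficients with the autocorrelation method and the Levinson-Durbin recursion, and converts them to relative section areas of a lossless tube model. This is the textbook technique, not the authors' implementation: the function names, the pre-emphasis coefficient 0.97, the LPC order, the synthetic test signal, and the sign convention of the area ratio are all assumptions, and the spectrum flattening step is omitted.

```python
import random


def pre_emphasis(x, coeff=0.97):
    """First-order high-frequency emphasis: y[n] = x[n] - coeff * x[n-1]."""
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]


def reflection_coefficients(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order   # coefficients of the prediction polynomial
    err = r[0]                  # prediction error energy
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        ks.append(k)
    return ks


def relative_areas(ks, first_area=1.0):
    """Relative tube-section areas; the sign convention here is an assumption."""
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


# Hypothetical test signal: noise shaped by a single resonance (not real speech).
random.seed(0)
signal = [0.0, 0.0]
for _ in range(400):
    signal.append(1.3 * signal[-1] - 0.8 * signal[-2] + random.gauss(0.0, 1.0))

ks = reflection_coefficients(pre_emphasis(signal), order=8)
areas = relative_areas(ks)
```

With the autocorrelation method every reflection coefficient has magnitude below one, so all relative areas stay positive. Only the ratios between neighbouring sections are meaningful, since `first_area` is an arbitrary reference.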
Investigations on estimated vocal tract area functions of growl-like singing voices

MIZOBUCHI Shohei, ISA Kinuyo, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 272 ) 29 - 30 2014.10

Behavior of vocal tract area functions estimated from growl-like singing voices was investigated to introduce a simple model for generating synthetic growl-like singing. Our previous study revealed that a fast modulation of spectral shape around 2 to 4 kHz is the most significant feature of growl-like singing. LPC-based vocal tract shape estimation with relevant preprocessing procedures was applied to growl-like singing and normal singing voices.
Acquisition and retention of perceptual cue for size judgment using whispered speech

YAMAMOTO Koudai, IRINO Toshio, NISHIMURA Ryuichi, KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 52 ) 237 - 242 2014.05

We have suggested that the auditory system can extract and separate information about vocal tract shape from information about vocal tract length (VTL) (strictly speaking, acoustic scale). Previous research shows that the just noticeable difference (JND) for speech stimuli is about 5%. This holds when the subjects have acquired the perceptual cue for size. The JND is not necessarily small, particularly for naive subjects. This paper presents a series of experiments surveying the acquisition and retention of the perceptual cue in a size discrimination task. We performed a pretest, a training session, a posttest, and a retention test using whispered words, following the same procedure as reported previously. Based on the results of the first posttest, the eight subjects were grouped into a high performance (HP) group and a low performance (LP) group. The HP group performed the retention test after one month, which confirmed that the JND values were almost the same. The LP group was trained again, which improved the JND values to a level similar to those of the HP group. As a result, given sufficient acquisition of the size perception cue, the JND values become the same as those reported in previous studies.
ROCKON : Environmental sound collection and recognition system using smartphones

MATSUYAMA Minori, TSUDA Takahiko, NISHIMURA Ryuichi, KAWAHARA Hideki, YAMADA Junnosuke, IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 52 ) 181 - 186 2014.05

We have been developing an Android mobile application that provides useful information to users by recognizing the environmental sounds around them. This paper evaluates environmental sound recognition methods by comparing AdaBoost with HMMs (Hidden Markov Models). The experimental results showed that AdaBoost performed better in terms of both accuracy and processing speed. Further collection of environmental sounds through crowdsourcing requires an Android app with an improved user interface (UI) for annotating the source type of a sound. Crowdsourcing proved useful for easily developing the sound database. However, we found that improvements to the system were necessary to maintain the motivation of trial users so that they would continue the sound collection activity. We developed a new UI that lets users simply select an appropriate sound source class from a list prepared in advance. In experiments evaluating two types of UI, a hierarchical type and a list view type, we found no significant difference between them in terms of convenience. To utilize the advantages of both types, we implemented an annotation UI that can be switched between the two.
Design of voice-enabled web test system for eliminating users' impatience

TAFUJI Chihiro, NISHIMURA Ryuichi, KAWAHARA Hideki, IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 52 ) 337 - 342 2014.05

We have investigated the user interface (UI) design of a web-based test system with a voice input function. As visual feedback to the examinee, a time gauge indicating the remaining answer time and a level meter for checking the speech input state are placed on the screen that displays the questions. In the previous UI, the similarity of these two visual presentations often confused examinees. To present the questions appropriately on the web screen, we improved the design of the voice-enabled UI. To evaluate the improved UI, we developed a system in which users answer calculation questions through the speech web interface. Focusing on the time gauge, we investigated the perceived speed of the time gauge and the impatience that users feel while using the system. As a result, we confirmed the suitability of a brick-type time gauge that displays elapsed time with discrete indicators dividing the time into 1-second units. From an investigation of the relationship between examinees' speaking styles and speech recognition rates, we found that accuracy tended to be low for people who were not aware that they were interacting with a machine. Because we adopted HTML5 as the implementation language of the voice-enabled UI, the improved system can run on both Android mobile devices and PCs.
A GUI for manipulating growl-like taste in singing voice

MIZOBUCHI Shohei, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 114 ( 52 ) 279 - 284 2014.05

A set of GUIs is designed to add and manipulate growl-like taste in singing voice based on a set of simple signal processing procedures, proposed in our previous report. It consists of a temporal axis modulator for simulating rapid F0 variations, an equalizer to modify global spectral shape, and an approximate time varying filter for simulating rapid spectral modulation around F3 area. The proposed set of procedures is potentially applicable to realtime applications, such as live performance. This set of GUIs will be presented in the poster session for demonstrating possibilities of the proposed procedures and acquiring feedback and comments from prospective participants.
A GUI for manipulating growl-like taste in singing voice

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

研究報告音楽情報科学（MUS） 2014 ( 55 ) 1 - 6 2014.05

This study examines a system that converts normal singing into a voice that gives the impression of growl-type singing. A previous report proposed a method for adding growl-like taste to singing voices using simple signal processing. This report introduces a set of GUIs for interactively manipulating the parameters of that method. The method consists of three components: a temporal axis modulator that adds rapid oscillation of the fundamental frequency (F0), an FIR equalizer that applies a band emphasis common to the whole processed range to modify the global spectral shape, and an approximate time-varying filter that adds rapid spectral modulation around the third formant (F3). Because the conversion requires no analysis and synthesis stage, it can run in real time and can be used as a kind of effector in live performance. The GUIs were developed mainly to promote an intuitive understanding of the processing and its effects at demonstrations and poster sessions. They will be presented and operated at the poster session to demonstrate the possibilities of the proposed procedures and to gather feedback and comments on operability and design from participants.
ROCKON: Environmental sound collection and recognition system using smartphones

松山みのり, 津田貴彦, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

研究報告音楽情報科学（MUS） 2014 ( 37 ) 1 - 6 2014.05

We have been developing an Android mobile application that provides useful information to users by recognizing the environmental sounds around them. This paper evaluates environmental sound recognition methods by comparing AdaBoost with HMMs (Hidden Markov Models), and describes a method for collecting environmental sound samples using crowdsourcing. The experimental results, obtained on real-world environmental sound samples collected with Android devices, showed that AdaBoost performed better than HMMs in terms of both accuracy and processing speed. Supporting more kinds of sound sources will require many more environmental sound samples, and further collection through crowdsourcing requires an Android app with an improved user interface (UI) for annotating the source type of a sound. Crowdsourcing proved useful for easily developing the sound database. However, we found that improvements to the system were necessary to reduce the burden on trial users and maintain their motivation to continue the sound collection activity. We developed a new UI that lets users simply select an appropriate sound source class from a list prepared in advance. In experiments evaluating two types of UI, a hierarchical type and a list view type, we found no significant difference between them in terms of convenience and concluded that using both together is appropriate. To utilize the advantages of both types, we implemented an annotation UI that can be switched between the two.
Acquisition and retention of perceptual cue for size judgment using whispered speech

山本航大, 入野俊夫, 西村竜一, 河原英紀

研究報告音楽情報科学（MUS） 2014 ( 47 ) 1 - 6 2014.05

We have suggested that the auditory system can extract and separate information about vocal tract shape from information about vocal tract length (VTL) (strictly speaking, acoustic scale). Previous research shows that the just noticeable difference (JND) for speech stimuli is about 5%. This holds when the subjects have acquired the perceptual cue for size. The JND is not necessarily small, particularly for naive subjects. This paper presents a series of experiments surveying the acquisition and retention of the perceptual cue in a size discrimination task. We performed a pretest, a training session, a posttest, and a retention test using whispered words, following the same procedure as reported previously. The posttest showed a training effect, and based on the results of the first posttest, the eight subjects were grouped into a high performance (HP) group and a low performance (LP) group. The HP group performed the retention test after one month, which confirmed that the JND values were almost the same. The LP group was trained again, which improved the JND values to a level similar to those of the HP group. As a result, given sufficient acquisition of the size perception cue, the JND values become the same as those reported in previous studies.
Design of voice-enabled web test system for eliminating users' impatience

田藤千弘, 西村竜一, 河原英紀, 入野俊夫

研究報告音楽情報科学（MUS） 2014 ( 65 ) 1 - 6 2014.05

We have investigated the user interface (UI) design of a web-based test system with a voice input function. As visual feedback to the examinee, a time gauge indicating the remaining answer time and a level meter for checking the speech input state are placed on the screen that displays the questions. In the previous UI, the similarity of these two visual presentations often confused examinees. To present the questions appropriately on the web screen, we improved the design of the voice-enabled UI. To evaluate the improved UI, we developed a system in which users answer calculation questions through the speech web interface. Focusing on the time gauge, we investigated the perceived speed of the time gauge and the impatience that users feel while using the system. As a result, we confirmed the suitability of a brick-type time gauge that displays elapsed time with discrete indicators dividing the time into 1-second units. From an investigation of the relationship between examinees' speaking styles and speech recognition rates, we found that accuracy tended to be low for people who were not aware that they were interacting with a machine. Because we adopted HTML5 as the implementation language of the voice-enabled UI, the improved system can run on both Android mobile devices and PCs.
Shifts of Absolute Pitch Judgment by Aging : Effects of Pitch Registers

津崎実, 松井淑恵, 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 44 ( 2 ) 81 - 86 2014.03
D-9-25 DEVELOPMENT OF ANDROID APP FOR COLLECTING ENVIRONMENTAL SOUNDS BASED ON CROWDSOURCING APPROACH

Matsuyama Minori, Tsuda Takahiko, Nishimura Ryuichi, Yamada Junnosuke, Irino Toshio, Kawahara Hideki

Proceedings of the IEICE General Conference ( The Institute of Electronics, Information and Communication Engineers ) 2014 ( 1 ) 109 - 109 2014.03
Realtime conversion of growl-type voice qualities based on modulation and approximate time-varying filtering driven by a non-linear oscillator: Formulation

Hideki Kawahara, Shohei Mizobuchi, Masanori Morise, Ken-ichi Sakakibara, Ryuichi Nisimura, Toshio Irino

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2014 ( 14 ) 1 - 6 2014.02

A formulation of voice conversion that adds growl-like voice qualities to singing voices is proposed based on our findings on the features of such singing performances. The proposed method does not include any analysis or synthesis stage. A preliminary implementation in Matlab demonstrated that it runs faster than real time. The proposed formulation provides not only post-processing capabilities for applying the rendering styles of existing performances to recorded materials but also realtime capabilities for adding growl-like voice qualities in live performances.
Processing of Inverse compression and user-interface for hearing impairment simulator

永江美沙貴, 入野俊夫, 西村竜一

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 44 ( 1 ) 13 - 18 2014.02
Contrast of an asymmetrical level notch masking method and a temporal masking curve method in estimating compression

深渡瀬智史, 入野俊夫, 西村竜一

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 44 ( 1 ) 7 - 12 2014.02
Contributing factors in preference judgement in read sentences using morphing of individual attributes

YOSHIMOTO Shoki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 113 ( 404 ) 49 - 54 2014.01

A new research strategy based on a recently proposed morphing algorithm, the time-varying multi-aspect N-way speech morphing algorithm, is applied to investigate the evaluation and control of speech (voice) "attractiveness." The new algorithm generates morphed speech from an arbitrary number of speech samples in a single-stage procedure. In this formulation, the morphing rates can be manipulated independently using a time series for each of five physical parameters and can, in addition, take negative values. In the current report, a set of representative utterances of spoken sentences with different "attractiveness" was selected to generate a set of stimulus continua using the morphing procedure. Preliminary tests indicated that morphing the physical parameters actually morphs "attractiveness" monotonically. Using independent control of the physical attributes, morphed speech stimuli corresponding to the vertices of a five-dimensional hypercube in the attribute space were generated. Their "attractiveness" was evaluated in subjective paired-comparison tests to investigate the contribution of each physical attribute. Finally, exploratory research using speech neutralization and caricaturization, made feasible by the new algorithm, is discussed as a prospective direction for further study.
A study of a Japanese speaking test using speech recognition and its user interface design

田藤千弘, 西村竜一, 河原英紀, 入野俊夫, 今井新悟

教育システム情報学会全国大会講演論文集(CD-ROM) 39th 2014
Design of the voice input interface for the Japanese speaking test S-CAT

田藤千弘, 西村竜一, 河原英紀, 入野俊夫, 今井新悟

日本音響学会研究発表会講演論文集(CD-ROM) 2014 2014
On a static representation of the group delay of periodic signals and its application to aperiodic components of speech

河原英紀, 森勢将雅, 榊原健一, 戸田智基, 坂野秀樹, 西村竜一, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2014 2014
Shifts in the absolute pitch judgment by aging : An investigation of the relationship to the hearing loss

津崎実, 松井淑恵, 入野俊夫

日本音響学会研究発表会講演論文集日本音響学会編 ( 日本音響学会 ) 549 - 552 2014
Shifts in the absolute pitch judgment by aging and its relation to the otoacoustic emissions

津崎実, 松井淑恵, 入野俊夫

日本音響学会研究発表会講演論文集日本音響学会編 ( 日本音響学会 ) 479 - 482 2014
Psychophysical measurement of cochlear compression and its application to a hearing impairment simulator

入野俊夫

日本音響学会研究発表会講演論文集日本音響学会編 ( 日本音響学会 ) 1579 - 1582 2014
Prediction by the Spectro-Temporal Receptive Field Model for Pitch Shifts in SAWSs (Scale Alternating Wavelet Sequences) : Comparison with a Fourier Power Spectral Model

津崎実, 入野俊夫, 竹島千尋

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 43 ( 8 ) 631 - 638 2013.11
An analysis of the relationship between prosodic information, head motion, and estimated emotional state in explanatory dialogue

YAGI Miyuki, MORITA Reiko, NAKAI Masato, NISHIMURA Ryuichi, KAWAHARA Hideki, IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 113 ( 220 ) 15 - 20 2013.09

There is a long history of research on the relationship between the paralinguistic information of speech and emotional state. The dynamics of emotion in dialogues, however, have not been well studied, since information about emotional state has usually been given as static annotations on individual utterances. In this paper, we analyze the dynamics of emotional state in a goal-oriented dialogue, evaluated using a new GUI, the Emotional Movement Observation (EMO) system. We also model the relationship between the emotional state and paralinguistic quantities, such as fundamental frequency and speech power, as well as the acceleration of head nodding, using a stepwise approximation of a linear regression model.
Frequency-proportional dilation and compression in singing voice spectra and contributing factors

SAKAGUCHI Makoto, KOBAYASHI Mayuko, IRINO Toshio, NISIMURA Ryuichi, KAWAHARA Hideki

Technical report of IEICE. EA ( The Institute of Electronics, Information and Communication Engineers ) 113 ( 134 ) 9 - 14 2013.07

A new method for estimating relative vocal tract length was proposed based on short-time Fourier analysis, and its high reproducibility was demonstrated. The proposed method is based on an interference-free power spectral representation of periodic signals. The interference-free envelope spectrum is preprocessed by removing the global spectral shape, which depends on the glottal source waveform and the radiation characteristic of the mouth opening. It is also preprocessed by smoothing excessive spectral details, such as those due to differences in formant peak bandwidths, spectral dips caused by vocal tract branching, the closing phase of the vocal folds, and the three-dimensional vocal tract shape. Spectral distance calculation on the preprocessed spectra using only the relevant frequency region is introduced to alleviate disturbing factors other than vocal tract length differences. This article reports an application of the proposed method to singing voices to investigate the effects of singers' individual differences and voice pitch on the estimated relative vocal tract lengths. It also discusses possible applications to computer-assisted voice training.
Voice tells your body information

小林真優子, 西村竜一, 入野俊夫, 河原英紀

研究報告音楽情報科学（MUS） 2013 ( 47 ) 1 - 6 2013.05

When we hear a voice, we can somehow tell the speaker's body type. In this article, we propose a method for estimating relative vocal tract length using only vowels. The proposed method consists of procedures that alleviate spectral deformation caused by factors other than vocal tract length: selecting the spectral region used to calculate spectral distance, removing the global spectral shape, and smoothing excessive spectral detail. The parameters of the method were tuned using a speech database with associated physical data, which consists of the five Japanese vowels spoken by 284 male, female and adolescent talkers ranging from 6 to 56 years old. Despite its simplicity, this vowel-based method was found to provide more accurate vocal tract length estimates than our previously reported method based on an auditory model. The proposed method also provides estimates of talkers' height and weight from vowels alone, using the physical data stored in the database.
Development of Collection and Recognition Method for Environmental Sound Samples using Android Mobile Devices

津田貴彦, 中西恭介, 松山みのり, 西村竜一, 山田順之介, 河原英紀, 入野俊夫

研究報告音楽情報科学（MUS） 2013 ( 18 ) 1 - 6 2013.05

We have been developing an Android mobile application with an interface that takes environmental sounds as input. Realizing it requires developing an environmental sound recognition method, collecting environmental sound samples, and implementing a client application. A preliminary evaluation of the recognition system confirmed that the algorithm needs to be improved and the training data expanded. To address this, we created an Android application for data collection and succeeded in collecting 29 hours 24 minutes of sounds on campus, accompanying activities such as student clubs, and 10 hours 36 minutes of sounds off campus, such as running trains and ambulance sirens. This report discusses the classification of the collected data and the recognition method, and gives an overview of the prototype system.
A study of a mobile application with an information-providing function based on environmental sound recognition

中西恭介, 津田貴彦, 西村竜一, 河原英紀, 入野俊夫

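全国大会講演論文集 ( 一般社団法人情報処理学会 ) 2013 ( 1 ) 463 - 465 2013.03

In recent years, voice navigation functions available on smartphones have attracted attention. In daily life, a great deal of information can also be obtained from environmental sounds. This study therefore aims to develop a guide system that applies environmental sound recognition to judge the situation at a given location. Specifically, we develop a guide system for Wakayama University. The system adopts a server-client architecture, in which acoustic signals recorded on an Android device are recognized on the server side. Realizing it requires developing an environmental sound recognition program, collecting acoustic signal samples, and implementing the application. So far, we have conducted recognition experiments using the collected environmental sounds, and we report the results.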
On the introduction of latent variable dynamics into a fundamental frequency extraction method based on higher-order waveform symmetry

河原英紀, 森勢将雅, 榊原健一, 西村竜一, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2013 2013
Introduction of asymmetric level maskers in notched noise masking and reduction of measurement points

深渡瀬智史, 入野俊夫, 西村竜一

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 42 ( 7 ) 547 - 552 2012.10
A stable representation of group delay for periodic signals

KAWAHARA Hideki, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio

Technical report of IEICE. EA ( The Institute of Electronics, Information and Communication Engineers ) 112 ( 125 ) 1 - 6 2012.07

Instantaneous frequency and group delay, defined as the temporal derivative and the frequency derivative of phase respectively, are better representations than phase itself, because they are physically meaningful and do not require unwrapping, which is a fragile operation. However, abrupt changes and discontinuities caused by interference between constituent components have prevented their use in potential applications. As the final piece of the authors' investigations into interference-free representations of power spectrum and instantaneous frequency, an interference-free representation of group delay is introduced. It is derived from a group delay representation analogous to Flanagan's instantaneous frequency representation. The interference-free group delay is the power-spectrum-weighted average of a pair of group delays shifted 1/2 of the fundamental frequency apart.
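To make the definitions in this abstract concrete, one way to write them is given below. The symbols are illustrative assumptions rather than the paper's notation: $\phi(\omega, t)$ is the phase of a short-time spectrum, $P(\omega)$ its power spectrum, and $\omega_0$ the fundamental angular frequency. The minus sign on the group delay follows the common signal-processing convention, and placing the shifted pair symmetrically at $\pm\omega_0/4$ is only one possible reading of "1/2 fundamental frequency apart".

```latex
\omega_i(\omega, t) = \frac{\partial \phi(\omega, t)}{\partial t},
\qquad
\tau_g(\omega, t) = -\frac{\partial \phi(\omega, t)}{\partial \omega}

\tilde{\tau}_g(\omega) =
\frac{P\!\left(\omega - \tfrac{\omega_0}{4}\right)\tau_g\!\left(\omega - \tfrac{\omega_0}{4}\right)
    + P\!\left(\omega + \tfrac{\omega_0}{4}\right)\tau_g\!\left(\omega + \tfrac{\omega_0}{4}\right)}
     {P\!\left(\omega - \tfrac{\omega_0}{4}\right) + P\!\left(\omega + \tfrac{\omega_0}{4}\right)}
```

The second expression is simply a weighted mean, so where one member of the pair carries little power the estimate is dominated by the other member.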
Speaker Size Discrimination and Vowel Identification for Acoustically Scaled Vowels : Dependence of Vowel Duration

竹島千尋, 津崎実, 入野俊夫

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 42 ( 4 ) 369 - 374 2012.06
Speaker Size Discrimination and Vowel Identification for Acoustically Scaled Vowels : Dependence of Vowel Duration

TAKESHIMA Chihiro, TSUZAKI Minoru, IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 112 ( 81 ) 39 - 44 2012.06

This study investigates the characteristics of temporal integration in the auditory processing of size information. In this paper, we measured listeners' speaker-size discrimination using acoustically scaled vowels. The experimental results showed that discrimination performance improved considerably when the vowel duration increased from 16 ms to 32 ms, whereas duration had no large effect on performance beyond 32 ms. This finding suggests that an integration window of around 32 ms influences size processing in the auditory system. A similar performance deterioration for 16-ms vowels was observed in a vowel identification experiment, although the degree of deterioration differed depending on the driving source and the frequency of the vowels.
Cross synthesis VOCODER which preserves linguistic information and characteristic timbre of musical instruments and animal voices

西大輝, 西村竜一, 入野俊夫, 河原英紀

研究報告音楽情報科学（MUS） 2012 ( 3 ) 1 - 6 2012.05

A new design method for a cross synthesis VOCODER, which synthesizes sounds by mixing the features of two input sounds, such as speech and musical instruments or animal voices, is proposed. The cross synthesis VOCODER originated from a narrow-band speech transmission technology and is now widely used as an effector in music performance and production. However, current cross synthesis effects tend to degrade the original character of the musical instrument, so that the source instrument becomes unclear, and the linguistic information of the processed sound is not always intelligible. The proposed method alleviates these difficulties using two techniques. One is removal of the global spectral shape from the speech spectral envelope, and the other is band-pass filtering in the modulation frequency domain, which leaves a spectrum carrying only the linguistic information of the speech. The design of the cutoff frequencies of the modulation-domain filter was also examined. Subjective test results indicated the relevance of the proposed techniques and provide design guidelines for new flexible cross synthesis VOCODERs.
Manipulation of temporal fine structures on excitation source and spectral envelope of singing voices and their effects on perceived impression

河原英紀, 森勢将雅, 西村竜一, 入野俊夫

研究報告音楽情報科学（MUS） 2012 ( 4 ) 1 - 6 2012.05

Strong expressions such as "shout" and "death voice" are common in popular singing. However, current singing synthesis systems do not handle these strong expressions well and cannot use them to extend their expressive range. Clarifying how to analyze, reproduce, and control such expressions is the topic this article addresses. A set of singing voice analysis tests was conducted using our newly developed F0 extraction method, which has high temporal resolution and is lightweight, together with TANDEM-STRAIGHT for spectral envelope analysis. The tests revealed that expressive singing voices contain high-speed frequency modulation in F0 and high-speed amplitude modulation in the spectral envelope. In one typical case, the modulation frequency spectrum of an expressive performance had a peak around 70 Hz that was about 20 dB higher than that of a normal performance. The existence of such high-speed modulation has not been clearly reported before. Preliminary tests suggested that selective control of "expressiveness" can be implemented by manipulating these high-speed modulations while keeping the vocal register and effort intact.
Analysis of Japanese utterances by international students collected with a web test system for measuring Japanese speaking ability

栗原理沙, 西村竜一, 和田芳佳, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2012 ROMBUNNO.3-11-19 2012.03
Simple adaptation of language models for speech recognition using a web database

西村竜一, 島田敏明, 田中雅康, 河原英紀, 入野俊夫

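全国大会講演論文集 ( 一般社団法人情報処理学会 ) 2012 ( 1 ) 5 - 7 2012.03

To improve the accuracy of large-vocabulary continuous speech recognition, we are investigating a method for extending a 3-gram language model using a web database. Based on the entries registered in Google's Japanese N-gram database, the method estimates the occurrence probabilities of 3-grams that were unobserved in the training corpus. It also extracts important words using information content as the criterion and selects the 3-grams to be added. Following last year's report, we applied the proposed method to task adaptation of language models. In the experiment, we applied the method to lecture speech extracted from the Corpus of Spontaneous Japanese (CSJ) and evaluated the recognition accuracy. We also plan to build a web application service that implements the proposed method, and we report its outline.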
Introducing an auditory filterbank into the acoustic features of a young-speaker identification method

宮森翔子, 西村竜一, 岡本恵里香, 入野俊夫, 河原英紀

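全国大会講演論文集 ( 一般社団法人情報処理学会 ) 2012 ( 1 ) 613 - 615 2012.03

To provide child-friendly behavior in dialogue interfaces, this study examines a technique for identifying young speakers using speech recognition. Here we combined MFCCs (mel-frequency cepstral coefficients), the acoustic features we have used so far, with features extracted from a gammachirp auditory filterbank (GCFB), and investigated the identification performance. MFCCs are features commonly used in speech recognition. The auditory filterbank, on the other hand, simulates human auditory characteristics, and previous work has shown that it is effective for vocal tract length normalization in voice morphing. Since vocal tract length correlates with body height, introducing the auditory filterbank is expected to be effective for young-speaker identification as well.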
A study of automatic scoring using parallel segmentation in the Japanese speaking test S-CAT

西村竜一, 栗原理沙, 篠崎隆宏, 石塚賢吉, 山田武志, 今井新悟, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2012 2012
RJ-005 An improvement of an adult and child identification method for spoken dialog systems

Miyamori Shoko, Nisimura Ryuichi, Irino Toshio, Kawahara Hideki

情報科学技術フォーラム講演論文集 ( Forum on Information Technology ) 10 ( 3 ) 37 - 40 2011.09
Cross synthesis vocoder that preserves both speech intelligibility and instruments' timbre

西大輝, 西村竜一, 入野俊夫

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 41 ( 6 ) 463 - 468 2011.08
An excitation structure extraction for voiced sounds with multiple periodicity and its application to pathological voices

WADA Yoshika, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 111 ( 175 ) 81 - 86 2011.08

A new method for analyzing excitation source information, called XSX (eXcitation Structure extractor), has been investigated for analyzing voices with complex excitation behavior, such as singing voices, pathological voices, and emotional voices. This article illustrates the advantages of XSX over existing PDAs (pitch determination algorithms) and introduces prospective applications. A comparative study with YIN and SWIPE, two well-known PDAs, was conducted using harmonic multiple sinusoids with a common frequency-modulated fundamental component, and it revealed that XSX has a superior response to the modulation frequency. Detailed analyses using XSX were also conducted for pathological voices, which showed large discrepancies between the results of XSX and those of the other PDAs. The analyses by XSX clearly indicated that subharmonics produced by coupling multiple basic periods are sometimes more prominent than the usual fundamental components. These results and advantages illustrate that XSX is useful for analyzing voices with complex behavior, for which analysis by existing PDAs is impractical.
Cross synthesis vocoder that preserves both speech intelligibility and instruments' timbre

NISHI Taiki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 111 ( 175 ) 87 - 92 2011.08

TANDEM-STRAIGHT, an F0-adaptive spectral envelope extraction procedure, was applied to a cross-synthesis VOCODER, which synthesizes sounds by mixing features of two input sounds, such as speech and musical instruments or animal voices. A set of tests with an FIR implementation of the time-varying filter illustrated potential improvements in intelligibility from using the STRAIGHT spectrum of speech sounds, but at the same time revealed deterioration of the instruments' characteristic timbre. A new cross-synthesis framework using the deviation spectrum of speech sounds and a minimum-phase implementation of the time-varying filter was proposed to solve this problem. Preliminary tests suggested that the proposed method reduces this deterioration while preserving intelligibility.
Estimation of vocal tract length ratio using auditory filterbank

OKAMOTO Erika, IRINO Toshio, NISIMURA Ryuichi, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 111 ( 153 ) 11 - 16 2011.07

Vocal tract length normalization (VTLN) is an important issue in speech applications, such as automatic speech recognition and high-quality voice morphing. Individual spectral differences are primarily dependent on vocal tract length differences. They are also dependent on the glottal source signal and the shape of the pyriform fossa. This paper proposes a new method for vocal tract length (VTL) estimation and normalization based on a gammachirp auditory filterbank (GCFB). VTL ratios were estimated based on spectral distances between the same sentence spoken by two speakers. The calculation was carried out for all ordered pairs of 28 speakers (28 × 27 = 756). The estimation error was then calculated by regression analysis. VTL estimation was performed using the mel-frequency filterbank (MFFB), which is a preprocessor for calculating the MFCCs commonly used in ASR, the gammatone filterbank (GTFB), and the gammachirp filterbank (GCFB). The results indicated that the proposed GCFB-based VTL estimation outperforms the MFCC-based and the GTFB-based methods in the objective evaluations.
A study of a 3-gram language model extension method using the Web as external knowledge

西村竜一, 島田敏明, 田中雅康, 河原英紀, 入野俊夫

第73回全国大会講演論文集 2011 ( 1 ) 75 - 76 2011.03

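To improve the accuracy of large-vocabulary continuous speech recognition, we report on a method for extending a 3-gram language model using the Web. In 3-gram models, back-off has conventionally been used to estimate the probabilities of unobserved 3-grams that do not occur in the training corpus. While back-off, an internal probability estimation method, is widely used, there are also methods, such as the one in this study, that estimate the probabilities of unobserved 3-grams using an external database. In this presentation, we report on a method for estimating unobserved 3-gram probabilities using the Google database as the external database, focusing on a comparison with the conventional back-off method.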
Temporally static representation of phase related quantity for periodic signals

KAWAHARA Hideki, MORISE Masanori, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 110 ( 297 ) 47 - 51 2010.11

An averaged power spectrum, which is calculated from two power spectra using two time windows a half pitch-period apart, does not depend on the relative phase between the analyzed signal and the windows. This article introduces a procedure for calculating instantaneous frequency that yields a temporally static representation of the instantaneous frequency of periodic signals. The proposed method is derived from Flanagan's well-known equation. Specifically, a power-weighted average of instantaneous frequencies calculated using Flanagan's equation yields this temporally static representation. A proof that the proposed representation is independent of the relative phase between the analyzed signal and the windows is presented, assuming weak conditions on the windowing function. Performance evaluation tests are conducted for popular windowing functions, and their results are discussed.
E-012 Investigations of Real Environmental Child Speech Collected by Voice Web System

Kurihara Lisa, Nisimura Ryuichi, Miyamori Shoko, Kawahara Hideki, Irino Toshio

情報科学技術フォーラム講演論文集 ( Forum on Information Technology ) 9 ( 2 ) 229 - 230 2010.08
J-006 An investigation of child user identification based on speech recognition of a short sentence

Miyamori Shoko, Nisimura Ryuichi, Kurihara Risa, Irino Toshio, Kawahara Hideki

情報科学技術フォーラム講演論文集 ( Forum on Information Technology ) 9 ( 3 ) 469 - 472 2010.08
Complementing 3-gram information using the Google Japanese N-gram database and term weighting

SHIMADA TOSHIAKI, NISIMURA RYUICHI, KAWAHARA HIDEKI, IRINO TOSHIO

研究報告音声言語情報処理（SLP） ( 情報処理学会 ) 2010 ( 19 ) 1 - 6 2010.07

Word 3-gram models are built from text corpora by statistical methods. However, when the amount of text is small, the statistics cannot be computed correctly. In this study, we therefore complemented the 3-gram information using 3-gram entries contained in the Google N-gram data. Complementing without selecting 3-gram entries causes the number of 3-gram entries to grow explosively. The proposed method therefore selects the 3-gram entries to add on the basis of word importance calculated from the TF-IDF measure and Yahoo! related keywords. This prevented both the addition of unimportant 3-gram entries and the explosive growth in the number of entries. For evaluation, we conducted recognition experiments using the CSJ corpus. As a result, word accuracy improved by 1.64% compared with before complementing. We have developed a method that utilizes the Google N-gram database to complement 3-gram entries in a language model. Our aim was to improve the accuracy of LVSR systems even when a 3-gram model trained on short texts is used. This method is based on 3-gram occurrence information in external web documents and consists of three main steps. First, 3-gram entries are searched for in the Google database. Second, 3-gram appearance counts are normalized on the basis of the ratio of the total number of 3-gram entries. Last, 3-gram entries are selected on the basis of keywords. To prevent the addition of redundant or irrelevant entries, 3-gram entries without a keyword are excluded when calculating 3-gram probabilities. The keywords were composed by measuring TF-IDF weights and employing the web API of Yahoo! Japan. Experimental results confirmed a 1.64% improvement in recognition accuracy using the CSJ Corpus.
Optimization of excitation structure extraction based on objective evaluation using speech-like test signals

WADA Yoshika, ITAGAKI Hanae, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 110 ( 71 ) 77 - 82 2010.06

Investigations on the analysis and synthesis of expressive voices, such as "husky" and "hoarse" voices, which are typically found in emotional speech and singing, are presented. Such voices usually have complex excitation structures that are not readily represented by a single number, F0. This article introduces the optimization of system parameters and the evaluation of our new analysis procedure called XSX (eXcitation Structure eXtractor), designed for such complex excitation signals. Pseudo-speech signals are made from complex tones with FM and/or AM, depending on the experimental design. They have a spectral slope similar to that of natural voiced sounds and do not have formant structure. The proposed method, XSX, consists of two subsystems: an integrated periodicity detector, which extracts simultaneous multiple periodicity candidates, and a frequency refinement procedure based on the instantaneous frequency of F0 and harmonic components. First, the candidate detector is optimized, followed by optimization of the refinement procedure. Second, comparative tests with conventional F0 extractors were conducted and revealed that the proposed method outperforms those procedures in terms of accuracy and tracking speed.
Relevant frequency band for vocal tract length normalization based on spectral distance

OKAMOTO Erika, ASAKA Yoshiki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 110 ( 71 ) 83 - 88 2010.06

Normalization of speaker-dependent spectral differences is an important issue in speech applications, such as automatic speech recognition and high-quality voice morphing. Individual spectral differences are primarily dependent on vocal tract length differences. They are also dependent on the glottal source signal and the shape of the pyriform fossa. This article investigates the effects of frequency range selection on spectral distance-based vocal tract length normalization (VTLN). It is based on the idea that the best VTLN performance can be attained by selecting the frequency region where spectral differences are determined virtually exclusively by differences in vocal tract length. All combinations of utterances spoken by 28 subjects were used to calculate estimates of their relative vocal tract lengths, which were used as the tentative "true" lengths to evaluate the deviation of each VTL ratio estimate based on spectral distances. The test results revealed that the best performance is obtained by selecting the frequency region spanning 400 Hz to 4000 Hz and using an integrated logarithmic spectral distance computed from the outputs of the MFCC filter bank and their frequency derivatives.
Auditory filter shape from temporal masking curves and notched-noise data

Toshio Irino, Nozomi Shimoshio, Hiroki Takahashi, Hideki Kawahara, Roy Patterson

Auditory Features Workshop, Equipe Audition, DEC, Ecole normale supérieure, France 2010.06

1 & 3 Jun., 2010 (presentation date: 3 Jun)
A proposal of children discrimination based on web collected utterances

MIYAMORI Shoko, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

全国大会講演論文集 72 285 - 286 2010.03
Representation and estimation of aperiodic components in voiced sounds for high-quality analysis-synthesis systems

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 109 ( 451 ) 99 - 104 2010.02

Mixed-mode excitation is crucially important and effective for high-quality speech analysis, modification, and resynthesis systems. However, there are several incompatible constraints on the representation and estimation of the aperiodic component in mixed-mode excitation. The current implementation of the aperiodic component provides an answer to the estimation problem at the expense of a complicated representation, which hinders ease of application. This article proposes an aperiodic component spectral model that consists of an exponential nonlinearity and a sigmoid. Although the proposed model is still in a preliminary phase and needs verification based on a variety of speech sounds, it seems to represent aperiodic components in a highly efficient manner. Informal listening tests also suggested that the proposed model provides better synthesized speech quality.
fMRI study on brain regions for scale and pitch processing for speech signal

塚田裕樹, 入野俊夫, 大屋義和

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 39 ( 7 ) 531 - 536 2009.11
Invited lecture: Measurement and formulation of the auditory filter

入野俊夫

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 39 ( 6 ) 413 - 418 2009.10
E-038 Proposal of safety web systems using adult and child voice discriminations

Miyamori Shoko, Nishimura Ryuichi, Suzuta Kentaro, Kawahara Hideki, Irino Toshio

情報科学技術フォーラム講演論文集 ( Forum on Information Technology ) 8 ( 2 ) 343 - 344 2009.08
Web-based adult and child voice collection to develop a voice-oriented web filtering service

NISIMURA RYUICHI, MIYAMORI SHOKO, SUZUTA KENTARO, KAWAHARA HIDEKI, IRINO TOSHIO

研究報告音声言語情報処理（SLP） ( 情報処理学会 ) 2009 ( 19 ) 1 - 6 2009.07

This study aims to develop a web filtering service that automatically estimates the user's age group from spoken utterances and restricts children's access. Toward realizing the proposed system, we conducted (1) a network-based collection experiment of adult and child utterances using the voice web system w3voice, and (2) a preliminary experiment on automatic identification of young speakers using GMM acoustic models. In the utterance collection experiment, we succeeded in collecting 1,109 real-environment utterances from 389 subjects. Analysis of the utterances confirmed that adults and children show different linguistic tendencies in utterance content. In the detection experiment for children aged 14 or younger using GMM acoustic models, an accuracy of 65.9% was obtained (82.6% when detection of adults is included). This study aims at developing a voice-based web filtering service to restrict children's access to harmful websites. It is based on automatic estimation of a user's age group from their voice. To realize it, we performed (1) a collection of adult and child voices using the voice-enabled web system "w3voice", and (2) an experiment on young-voice detection based on GMM-based acoustic recognition. In the utterance collection experiment, we succeeded in collecting 1,109 real-environment utterances from 389 participants. Analysis of the utterances confirmed that language tendencies differ between adults and children. In the experiment on detecting children aged 14 or younger, a 65.9% correct rate was obtained.
Simultaneous fitting to notched noise and compression data using the compressive gammachirp auditory filter

入野俊夫, 高橋弘樹, 河原英紀

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 39 ( 4 ) 283 - 288 2009.06
Representation of repetitive structures in speech and its application to F0 and aperiodicity extraction

ITAGAKI Hanae, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 109 ( 100 ) 91 - 96 2009.06

A bottom-up procedure for extracting repetitive structures in speech sounds is proposed, based on a temporally stable representation of periodic sounds (TANDEM) and adaptive spectral smoothing for normalization (STRAIGHT). The proposed method evaluates local periodic structure in the frequency domain to detect repetition in the time domain. A group of dedicated periodicity detectors is combined to construct the proposed repetitive-structure extractor, called XSX (eXcitation Structure eXtractor). The proposed procedure is tested using a set of stylized test signals with artificial shimmer and jitter to investigate its applicability to such aperiodic signals. The test results indicated that the proposed procedure outperformed existing F0 detectors in its power to describe those complex excitation modes. Finally, the proposed procedure is applied to pathological voice examples to investigate the feasibility of voice quality restoration applications.
Simultaneous fitting to notched noise and compression data using the compressive gammachirp auditory filter

IRINO Toshio, TAKAHASHI Hiroki, KAWAHARA Hideki, PATTERSON Roy D.

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 109 ( 100 ) 67 - 72 2009.06

Precise estimation of the frequency selectivity (filter shape) and the compression characteristics of the human auditory filter is important in developing perceptual models for speech and acoustic signals. In the current study, we measured both masked thresholds in notched-noise experiments and an input-output function in a forward-masking experiment for each individual normal-hearing listener. The compressive gammachirp (cGC) filter was used for simultaneous fitting to the notched-noise data and the input-output function. We demonstrated that it is possible to distinguish the characteristics common across listeners from individual differences in a set of parameters of the cGC filter.
Non-linguistic subjective evaluation of timbre based on audio-visual integration

NISHIDA Saori, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 109 ( 100 ) 49 - 54 2009.06

Perceptually relevant representations of timbre using two-dimensional shapes are investigated, with the aim of establishing a framework for sound visualization based on human perceptual characteristics. A preference test of matching shapes to sounds was conducted using eleven sound stimuli with different prototypical power spectra and nine shapes. The results indicated that the matching shapes were clearly divided into two classes depending on the periodicity of the presented sounds. Perceptual correlates of shape selection were seemingly based on complexity and sharpness, although these are only subjectively defined. A set of objective descriptors of shapes, based on a complex-number representation of their contours, was introduced for further investigation of the physical correlates of the MDS results. These investigations indicated that the normalized square root of the ratio of area to contour length and the kurtosis have reasonable correlations with the MDS axes.
Effects of spectral envelope representations on resynthesized speech quality

AKAGIRI Hayato, ONISHI Masato, MORISE Masatoshi, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 109 ( 99 ) 63 - 68 2009.06

TANDEM-STRAIGHT, a speech analysis, modification, and synthesis method, consists of two key components: a) temporally independent power spectral estimation for periodic signals (TANDEM) and b) F0-adaptive spectral smoothing based on consistent sampling theory. The second component employs two approximations to implement its function. The first approximation is truncation of the theoretically infinite number of compensating digital filter coefficients. The second approximation is to use log(1+x) instead of x, because they are virtually the same provided |x|≪1 holds. This assures positivity of the spectral envelope. This report investigates the effects of these approximations using subjective tests of resynthesized voiced sounds as well as objective tests based on a spectral distance measure. The tests indicated that the sounds resynthesized by both methods have an equivalent quality of 40 to 50 in the Q value of MNRU, which is reasonably high. The tests also indicated that the sounds resynthesized by legacy-STRAIGHT tend to have higher sound quality than those by TANDEM-STRAIGHT. These subjective results are consistent with the objective results based on the peak-weighted spectral distance measure with frequency weighting, suggesting that there is room for further quality improvement of TANDEM-STRAIGHT.
Sound quality improvement based on vocal tract length normalization in simplified speech morphing

ASAKA Yoshiki, NISHIDA Saori, AKAGIRI Hayato, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 109 ( 99 ) 69 - 74 2009.06

The need for careful manual placement of anchoring points is the major obstacle to applying current STRAIGHT-based speech morphing. This obstacle can be partially removed by normalizing the vocal tract lengths (VTLs) of the speakers involved in morphing. Auditory-inspired spectral distance measures are used to find the best normalizing ratio of VTLs. Preliminary subjective tests indicated that the proposed method improves the perceptual quality of the morphed speech sounds. The tests also suggested that introducing an additional vocal tract shape parameter may be useful for improving quality further.
Interface design for TANDEM-STRAIGHT and temporally variable speech morphing study

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 108 ( 465 ) 51 - 56 2009.02

This article introduces the background and design principles of a set of graphical user interfaces intended to promote research on various aspects of speech processing frameworks, which were made possible by our new algorithms based on TANDEM-STRAIGHT. The interfaces are also intended to make the new algorithms accessible to researchers with a wider range of backgrounds, to acquire their feedback, and to accelerate algorithm development itself. Speech morphing capable of handling temporally variable multi-aspect morphing rates and vowel-based speech conversion are representative examples of such new processing frameworks. These algorithms take advantage of the theoretical transparency and computational efficiency of TANDEM-STRAIGHT, which completely replaced the internal algorithms of legacy-STRAIGHT.
Effects of time-frequency parameters of auditory stimuli and shape parameters of visual stimuli on audio-visual integration - Toward music visualization system based on perceptual structure -

NISHIDA Saori, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2009 ( 13 ) 65 - 70 2009.02

An audio-visual integration test was conducted to investigate innate correspondence between sounds and shapes. Seven typical sound stimuli, including periodic and aperiodic sounds as well as musical instrument sounds, were each presented followed by a pair of shapes. Subjects were asked to select the shape that better fit the preceding sound stimulus. MDS analyses of the results suggested that there seems to be a common perceptual structure between vision and audition.
Singing morphing extension to temporally varying parameters for realtime morphing control interface

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2008 ( 127 ) 91 - 96 2008.12

Reuse of performance design in singing requires temporally localized manipulation of singing style, voice quality, and expression. Such manipulation can be done in real time, as in live concert scenes, or off-line, as in post-production editing of recorded materials. A new framework is introduced and formulated to extend TANDEM-STRAIGHT-based morphing with a temporally variable multi-dimensional morphing rate. This formulation provides a solid basis for implementing five morphing parameters (fundamental frequency, aperiodicity, STRAIGHT spectrogram, and the time and frequency axes) on each time series independently. The formulation is based on interpolation of the logarithmic derivative of transformation functions and enables extrapolative morphing without the quality breakdown found in our previous formulations. The proposed method is easily extended to multiple-exemplar morphing because the formulation is symmetric with respect to each exemplar utterance.
Comparison between perception of degraded sound and result of speech recognition

森本隆司, 入野俊夫, 西村竜一

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 38 ( 8 ) 803 - 808 2008.12
Parameter optimization for a fundamental frequency extractor based on TANDEM-STRAIGHT

ITAGAKI Hanae, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 108 ( 337 ) 155 - 160 2008.12

A fundamental frequency extractor is proposed that is based on a temporally stable power spectral representation for periodic signals (TANDEM spectrum) and a spectral envelope derived from that representation (STRAIGHT spectrum). This article describes the roles of the system parameters of the proposed method and their effects on system performance, and reports the results of their preliminary optimization. The system parameters investigated are the number of harmonic components used for detecting a hypothesized periodicity peak and the weighting width on the log-lag domain used for integrating specialized individual F0 detectors. Detailed descriptions of these parameters and their impact on F0 extraction performance are presented, and they were further investigated using a database consisting of simultaneous recordings of speech and EGG (electroglottogram) signals. Test results indicated that the proposed method has performance comparable to that of the F0 extractors used in STRAIGHT and other popular F0 extractors such as YIN when three harmonic components are used and weighting with a width of 1/√2 of the center lag is used.
Aperiodicity extraction based on linear prediction and temporal axis warping using fundamental frequency information

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 108 ( 337 ) 85 - 90 2008.12

A reliable aperiodicity extractor is crucial for high-quality speech manipulation systems. This article proposes a new extractor based on a critical review of conventional methods (mainly our previous proposals) and fundamental issues. The proposed method uses forward and backward linear predictors with lags around the fundamental period and includes temporal axis warping based on the instantaneous fundamental frequency. The extractor also includes a quadrature mirror filter for frequency band division to control the TB (time-bandwidth) product for reliable estimates. Combining multiple clues extracted using the original and the manipulated time axes yields reliable and efficient estimates of the aperiodicity spectrogram.
Parameter optimization for a fundamental frequency extractor based on TANDEM-STRAIGHT

ITAGAKI Hanae, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2008 ( 123 ) 155 - 160 2008.12

A fundamental frequency extractor is proposed that is based on a temporally stable power spectral representation for periodic signals (TANDEM spectrum) and a spectral envelope derived from that representation (STRAIGHT spectrum). This article describes the roles of the system parameters of the proposed method and their effects on system performance, and reports the results of their preliminary optimization. The system parameters investigated are the number of harmonic components used for detecting a hypothesized periodicity peak and the weighting width on the log-lag domain used for integrating specialized individual F0 detectors. Detailed descriptions of these parameters and their impact on F0 extraction performance are presented, and they were further investigated using a database consisting of simultaneous recordings of speech and EGG (electroglottogram) signals. Test results indicated that the proposed method has performance comparable to that of the F0 extractors used in STRAIGHT and other popular F0 extractors such as YIN when three harmonic components are used and weighting with a width of 1/√2 of the center lag is used.
Aperiodicity extraction based on linear prediction and temporal axis warping using fundamental frequency information

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2008 ( 123 ) 85 - 90 2008.12

A reliable aperiodicity extractor is crucial for high-quality speech manipulation systems. This article proposes a new extractor based on a critical review of conventional methods (mainly our previous proposals) and fundamental issues. The proposed method uses forward and backward linear predictors with lags around the fundamental period and includes temporal axis warping based on the instantaneous fundamental frequency. The extractor also includes a quadrature mirror filter for frequency band division to control the TB (time-bandwidth) product for reliable estimates. Combining multiple clues extracted using the original and the manipulated time axes yields reliable and efficient estimates of the aperiodicity spectrogram.
Investigation of temporal factors affecting speaker-size discrimination using isolated vowels with size scaling

竹島千尋, 津崎実, 入野俊夫

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 38 ( 6 ) 633 - 637 2008.10
Comparison of the cortex for CV and VC syllables in Japanese and English subjects

大屋義和, 入野俊夫, Hervais-Adelman Alexis G.

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 38 ( 6 ) 597 - 602 2008.10
E-023 A method to update ASR lexical information using Web resources

Suzuta Kentaro, Nisimura Ryuichi, Kawahara Hideki, Irino Toshio

情報科学技術フォーラム講演論文集 ( Forum on Information Technology ) 7 ( 2 ) 189 - 190 2008.08
Size discrimination and recognition for acoustically scaled versions of naturally pronounced and whispered speech words

青木良枝, 入野俊夫, Patterson Roy D.

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 38 ( 5 ) 507 - 512 2008.08
Review article: A basic study of simulated hearing loss using degraded synthesized speech

Toshio Irino

Telecom Frontier ( テレコム先端技術研究センター刊) ( 60 ) 4 - 13 2008.08
F0 extraction based on the zero frequency filtered signal method and its application to TANDEM-STRAIGHT

KAWAHARA Hideki, MORISE Masanori, BANNO Hideki, ITAGAKI Hanae, ONISHI Masato, NISIMURA Ryuichi, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2008 ( 78 ) 97 - 102 2008.07

An event-based F0 extraction method based on the so-called zero frequency filtering method was proposed by Yegnanarayana for representing Indian stop consonants. The method uses unstable IIR filters that place four poles at zero frequency and at the same time employs local mean subtracting filters to stabilize its output. This simple method was reported to run extremely fast and to have performance comparable to that of existing F0 extractors. This article reports on a follow-up implementation of the method and on evaluations and investigations of its performance and characteristics, with its applicability to TANDEM-STRAIGHT and real-time STRAIGHT in mind. The results indicated that the method runs 7 times faster than real time with a Matlab implementation on a standard laptop PC. It was also found that the gross error rate was 0.55%, which is somewhat worse than that of the most recent methods but still adequate for practical applications. Finally, temporal resolution finer (namely 1/3) than that of instantaneous frequency-based methods was also demonstrated.
Size discrimination and recognition for acoustically scaled versions of naturally pronounced and whispered speech words

AOKI Yoshie, IRINO Toshio, PATTERSON Roy D., KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 108 ( 179 ) 35 - 40 2008.07

We have suggested that the auditory system can extract size information and separate it from vocal-tract shape information. For example, humans can extract the message from the voices of adults and children without being confused by the size information, and they can extract the size information without being confused by the message. There have been several size perception experiments on acoustically scaled vowels, syllables, musical instruments, and animal voices. In this paper, we extended the size perception experiments to naturally spoken and whispered speech words to demonstrate that size perception is robust to variation in the utterance (voiced and whispered). The results show that the size discrimination JND is almost the same for voiced and whispered speech and that recognition performance remains good beyond the normal range.
Improving accuracy in spectral envelope estimation based on TANDEM-STRAIGHT: Recovery of higher spatial frequency components exceeding Nyquist limit posed by the fundamental frequency

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

電子情報通信学会技術研究報告. SP, 音声 108 ( 116 ) 19 - 24 2008.06
Effects on perceived impression of manipulated speech using a simplified morphing procedure based on STRAIGHT

NISHIDA Saori, ONISHI Masato, YOSHIDA Yuri, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2008 ( 50 ) 43 - 48 2008.05

A morphing procedure that relies only on temporal axis alignment was tested subjectively in terms of naturalness and speaker identity. The effects of contributing factors were investigated with regard to test words, morphing rates, and the speakers used. Naturalness of the morphed speech deteriorated as the morphing rate neared 50%. Identification of the mixing rate of two speakers was about 60% when the morphing rate was 25%, 50%, or 75%. Naturalness of the morphed speech sounds was found to be higher when the speakers' sex was identical, while mixing rate identification was lower. These results suggest that the proposed simplified procedure is practically usable for morphing between speakers of the same sex.
A New implementation technique for building ASR applications based on voice-enabled Web systems

NISIMURA Ryuichi, MIYAKE Jumpei, KAWAHARA Hideki, IRINO Toshio

全国大会講演論文集 70 343 - 344 2008.03
Improvement of real-time STRAIGHT and implementation of STRAIGHT library

BANNO Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 107 ( 551 ) 157 - 162 2008.03

This paper describes improvements to real-time STRAIGHT and the implementation of a STRAIGHT library. STRAIGHT is a high-quality speech analysis, modification, and synthesis system based on a VOCODER-type representation. STRAIGHT is currently finding wide application in areas such as speech synthesis systems and tools for auditory experiments. However, the current MATLAB implementation of STRAIGHT is not suited to real-time applications. We have therefore been porting the source code to C, and have now finished porting the latest MATLAB version. Real-time STRAIGHT using the ported functions was subjectively evaluated by mean opinion score (MOS). The MOS of the improved real-time STRAIGHT is approximately 0.7 points higher than that of the previous version. We have also implemented the STRAIGHT library, which includes a STRAIGHT API for C.
Development of versatile speech synthesis technology based on STRAIGHT

KAWAHARA Hideki, ONISHI Masato, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, BANNO Hideki, IRINO Toshio

全国大会講演論文集 70 357 - 358 2008.03
AS-5-1 THE POWER SPECTRUM ESTIMATION FOR PERIODIC SIGNAL BASED ON TIME AVERAGING

Morise Masanori, Takahashi Toru, Kawahara Hideki, Irino Toshio

Proceedings of the IEICE General Conference ( The Institute of Electronics, Information and Communication Engineers ) 2008 S-48 - S-49 2008.03
F0 trajectory deviations from nominal musical transcription in Pop singing

YOSHIDA Yuri, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2008 ( 12 ) 13 - 18 2008.02

A reformulation of the STRAIGHT F0 extractor, based on a new power spectrum estimation method for periodic signals called TANDEM, made it practical to extract the whole F0 trajectory of a singing voice in an actual performance. This article reports a first attempt to represent the effects of singing style in terms of deviations from a nominal musical transcription, using a singing database that consists of various types of singing performances by professional pop singers. F0 extraction issues caused by the fast F0 transitions commonly found in singing voices are also discussed.
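Temporal tracking of resonator "size" perception in the auditory system : An investigation using size-modulated speech (Abstracts of Presentation, The 26th Annual Meeting of the Japanese Psychonomic Society)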

竹島千尋, 津崎実, 入野俊夫

基礎心理学研究 ( 日本基礎心理学会 ) 26 ( 2 ) 213 - 214 2008

Fundamental frequency estimation based on TANDEM-STRAIGHT and its evaluation

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, BANNO Hideki, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2007 ( 129 ) 259 - 264 2007.12

The TANDEM method, a power spectrum estimation method for periodic signals, was proposed to provide a temporally stable representation and has been applied to reformulate STRAIGHT, a system for speech analysis, modification, and synthesis. This article proposes a fundamental period estimation method based on the ratio between the TANDEM spectrum and the STRAIGHT spectrum. By providing specialized F0 detectors for multiple F0 candidates and integrating the individual cues, the proposed method selectively detects fundamental components and yields a probability measure for each estimate. It also provides a method to estimate aperiodicity in each frequency band, using the estimated fundamental frequency to design a quadrature signal on the frequency axis for filtering the periodic spectral components caused by signal periodicity. The proposed method is capable of representing pathological speech signals more precisely than conventional methods.
Fundamental frequency estimation based on TANDEM-STRAIGHT and its evaluation

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, BANNO Hideki, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 107 ( 406 ) 259 - 264 2007.12

The TANDEM method, a power spectrum estimation method for periodic signals, was proposed to provide a temporally stable representation and has been applied to reformulate STRAIGHT, a system for speech analysis, modification, and synthesis. This article proposes a fundamental period estimation method based on the ratio between the TANDEM spectrum and the STRAIGHT spectrum. By providing specialized F0 detectors for multiple F0 candidates and integrating the individual cues, the proposed method selectively detects fundamental components and yields a probability measure for each estimate. It also provides a method to estimate aperiodicity in each frequency band, using the estimated fundamental frequency to design a quadrature signal on the frequency axis for filtering the periodic spectral components caused by signal periodicity. The proposed method is capable of representing pathological speech signals more precisely than conventional methods.
Speaker size discrimination for acoustically scaled versions of naturally spoken words

青木良枝, 入野俊夫, Patterson Roy D.

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 37 ( 10 ) 787 - 792 2007.12
Perception of degraded word sounds from the monosyllable sequence

森本隆司, 入野俊夫, 河原英紀

聴覚研究会資料 ( 日本音響学会聴覚研究委員会 ) 37 ( 10 ) 775 - 780 2007.12
w3voice: Development of Speech Input Method for Voice-enabled Web Applications

NISIMURA Ryuichi, MIYAKE Jumpei, KAWAHARA Hideki, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2007 ( 103 ) 13 - 18 2007.10

We have developed a speech input method called "w3voice" for building practical and handy voice-enabled Web applications. It is constructed using a simple Java applet and CGI programs built from free software. The mechanism of voice-based interaction is based on raw audio signal transmission via the HTTP POST method and the HTTP redirection response. We have released a number of w3voice applications on our website for public use. The system also aims at building a voice database collected in home and office environments. We have succeeded in acquiring 8,412 inputs (47.9 inputs / day) over a period of seven months. This report gives an overview of the proposed system and the results of analyzing the collected inputs in terms of utterance length and SNR.
Vowel-based speech conversion using generalized inverse

ONISHI Masato, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 107 ( 282 ) 75 - 80 2007.10

A vowel-based voice conversion method using a generalized inverse was proposed. The proposed method uses only vowel information to design a spectrum conversion function for each frame. The conversion function is generated by mixing the functions designed for each vowel, based on the similarity of the current frame to each vowel template. The proposed method was compared with our previous proposal, in which a Gaussian potential function of the distance to each template was used to calculate similarity. The proposed method enables a geometrical interpretation of the mixing weight as a minimum norm to the subspace spanned by the vowel templates. Preliminary test results using objective as well as subjective measures were presented.
E-041 Automatic mapping function designing method modeled by segmental linear function for auditory morphing

Takahashi Toru, Ohnishi Masato, Morise Masanori, Banno Hideki, Kawahara Hideki, Irino Toshio

情報科学技術フォーラム一般講演論文集 ( Forum on Information Technology ) 6 ( 2 ) 233 - 236 2007.08
E-072 Public Open Tests of Interactive Speech-oriented Web applications

Nishimura Ryuichi, Miyake Junpei, Kawahara Hideki, Irino Toshio

情報科学技術フォーラム一般講演論文集 ( Forum on Information Technology ) 6 ( 2 ) 319 - 322 2007.08
A temporal and frequency interference-free power spectral representation of periodic signals : Toward STRAIGHT spectral estimation without tunable component

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, IRINO Toshio, BANNO Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 107 ( 165 ) 13 - 18 2007.07

A new spectral estimation procedure that is free from interference due to periodicity in both the time and the frequency domain is proposed. The basic form of the proposed method has only a few tunable parameters once the fundamental frequency of the signal under inspection is given. This is in strong contrast to the current implementation of STRAIGHT, where many parameters were tuned numerically or in an ad hoc manner. Time domain interference is eliminated by adding power spectra calculated with a pair of windows separated by one half of the fundamental period. Frequency domain interference is eliminated by combining power spectrum integration and linear interpolation based on an approximation-based interpretation of the sampling theory. The proposed method can be used to replace the current spectral estimation subsystem of STRAIGHT and is suitable for real-time processing.
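The time-domain part of the idea above can be illustrated with a small, hedged sketch. It is not the authors' implementation: the sampling rate, F0, periodic Hann window, probe frequency, test signal, and all function names are assumptions chosen for the demonstration. It measures how much the power at one frequency fluctuates as the analysis window slides across one period, first with a single window and then with the average of two windows half a fundamental period apart.

```python
import cmath
import math

FS = 8000        # sampling rate in Hz (assumed for this demo)
F0 = 100         # fundamental frequency in Hz (assumed to be known)
T0 = FS // F0    # fundamental period in samples
N = 2 * T0       # analysis window length: two fundamental periods (assumed)


def signal(n):
    """Periodic test signal: harmonics 1..10 of F0 with arbitrary fixed phases."""
    return sum(math.cos(2 * math.pi * k * F0 * n / FS + 0.7 * k * k)
               for k in range(1, 11))


def power_at(freq_hz, start):
    """Power at freq_hz of the Hann-windowed segment beginning at sample `start`."""
    acc = 0j
    for n in range(N):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / N)  # periodic Hann window
        acc += w * signal(start + n) * cmath.exp(-2j * math.pi * freq_hz * n / FS)
    return abs(acc) ** 2


def tandem_power_at(freq_hz, start):
    """Average of two power spectra whose windows are T0/2 samples apart."""
    return 0.5 * (power_at(freq_hz, start) + power_at(freq_hz, start + T0 // 2))


def relative_ripple(values):
    """Peak-to-peak variation divided by the mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


probe = 250                 # Hz, midway between the 2nd and 3rd harmonics
starts = range(0, T0, 4)    # slide the window across one fundamental period
single = relative_ripple([power_at(probe, s) for s in starts])
tandem = relative_ripple([tandem_power_at(probe, s) for s in starts])
print(f"single-window ripple: {single:.3f}, paired-window ripple: {tandem:.2e}")
```

With these particular settings only two adjacent harmonics fall inside the window's main lobe, so their beat at F0 dominates the single-window ripple, and shifting by half a period flips its sign. Other window shapes or lengths would leave some residual variation.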
Computational theory of auditory size-shape information extraction and the localization in the brain

IRINO Toshio, OOYA Yoshikazu, KAWAHARA Hideki, PATTERSON Roy D.

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 107 ( 92 ) 11 - 16 2007.06

Although the perception of size and shape from visual stimuli has been studied intensively as an important topic, the perception of size and shape from auditory stimuli has received almost no attention in the auditory research field. In this report, we describe the size and shape information in acoustic signals and a computational theory of how the auditory system extracts this information. We also present experimental studies that support the theory, the optimality of the auditory filters based on the theory, and the ecological perspectives implied by the theory. We performed fMRI experiments to identify the location of size-shape perception in the brain. We report the preliminary results and issues.
Applying Speech Transformation function derived from Speech Texture Mapping to Automatic Speech Morphing : An application of voice texture mapping

TAKAHASHI Toru, MORISE Masanori, OHNISHI Masato, NISIMURA Ryuichi, IRINO Toshio, BANNO Hideki, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 107 ( 77 ) 31 - 34 2007.05

A general framework for speech morphing is proposed based on a concept called speech texture mapping. The proposed method eliminates anchoring point assignment, which is a severe obstacle to adopting STRAIGHT-based morphing in a wide range of applications. Instead of using anchoring points to design the frequency axis mapping, proximity to prototypical spectrum templates is used to calculate weighting coefficients for mixing prototypical mapping functions. This framework is an extension of our previous vowel-based speech conversion method. Several alternative temporal axis alignment methods are discussed to show how the proposed frequency axis design procedure can be integrated into a morphing procedure that does not rely on anchoring point assignment.
Speaker conversion system based on vowels : An implementation of voice texture mapping

TAKAHASHI Toru, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, BANNO Hideki, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 613 ) 13 - 18 2007.03

A simple and high-quality voice conversion procedure that depends only on vowel information is proposed. It is based on a framewise conversion of the frequency axis, fundamental frequency, and global spectral and aperiodicity information, using posterior probability as the weighting function for calculating the mapping function. The proposed method is an implementation of a concept called "speech texture mapping" that was proposed by one of the authors. The key idea behind the advantages of the proposed method is that the roles and the relevant mapping functions of the detailed structure (referred to as "texture") and the global structure (referred to as "framework") are different from each other. This clear distinction between "texture" and "framework" enabled high-quality voice conversion requiring only a very small amount of training data. This distinction also provides a way to alleviate the degradations due to "averaging" or "learning" processes, which are indispensable in conventional voice conversion methods.
Acoustic event detection based on bandwise duration and its application to location estimation

MORISE Masanori, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 371 ) 19 - 24 2006.11

A highly accurate acoustic event detection method was proposed based on bandwise group delay parameters and minimum phase compensation. These bandwise parameters make it possible to select the band that maximizes the reliability of the estimates. This is very useful in practice because, even in a low signal-to-noise condition, it is usually possible to select a band that has a much better signal-to-noise ratio and yields far better estimates. This local improvement in signal-to-noise ratio enables accurate event detection. In this paper, an index representing the amount of energy concentration was proposed as the parameter for event detection. A series of simulations provides the relations between bandwidth and the risk of detection errors for each time window length when using the proposed index. Relations between signal-to-noise ratio and the accuracy of event timing estimates were also provided. Finally, applications of the proposed method to three-dimensional sound source localization were briefly discussed in terms of distributed acoustic sensors.
On perceptually relevant impulse response compensation : Discrimination threshold of group delay manipulation and its frequency dependency

MORISE Masanori, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 371 ) 13 - 18 2006.11

Discrimination thresholds for group delay modifications in different frequency regions were measured using a 2AFC paradigm. The tests were conducted to clarify the acceptable errors in temporal structure in impulse response compensation. Taking advantage of this error tolerance, regularization algorithms that do not suffer from erroneous zeroes in measured transfer functions are being investigated. In this report, as the first step toward this goal, a series of tests using a pulse train was designed and conducted. The shape of the group delay manipulation has a constant relative bandwidth in terms of ERB_N and various maximum delay values. The test results indicated that discrimination was poor in the lower frequency region, namely below 1000 Hz. For the higher frequency region, the results indicated that the discrimination threshold is inversely proportional to the center frequency of the group delay manipulation. It was also found that the threshold is smaller when the group delay manipulation has a negative peak value than otherwise.
Application of auditory model based evaluations for parameter adjustments

FUKUDA Shunsuke, MORISE Masanori, KAWAHARA Hideki, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 371 ) 43 - 48 2006.11

A new framework was proposed for adjusting adaptive multiband equalizers based on a gammachirp filter bank (GCBF) that closely simulates the nonlinear and adaptive frequency analysis of the human auditory system. The proposed framework aims at establishing a method for objective evaluation and optimization of sound reproduction inside a car. The goal of the adjustment is to provide a musical experience comparable to that in ordinary listening room conditions. GCBF analysis results for background noise, reproduced musical sounds, and these sounds after filtering and mixing are presented and discussed.
Commentary article: From speech (音声) research to speech (音聲) research (A Short Essay, Coffee Break)

入野俊夫

日本音響学会誌 ( 一般社団法人日本音響学会 ) 62 ( 11 ) 834 - 834 2006.11
A study of analysis window for high-quality speech analysis, modification and synthesis system STRAIGHT

TAKAHASHI Toru, MORISE Masanori, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 222 ) 1 - 5 2006.08

A Blackman window-based complementary set of time windows is proposed as a replacement for the one in the current implementation of the STRAIGHT speech analysis, modification, and resynthesis system, where a complementary set of pitch-synchronized Gaussian windows is used to eliminate temporal variations in the power spectral calculation. A Gaussian window was used in the original implementation of STRAIGHT because it has the identical form in the frequency domain and has the minimum uncertainty. However, those theoretical advantages are destroyed in the process of pitch synchronization, where a pitch-synchronous Bartlett window is convolved with the original Gaussian window. It is more straightforward to use cosine-based time windows instead of the pitch-synchronized Gaussian window because they are intrinsically pitch synchronous. Evaluations of the proposed window set, using test signals consisting of multiple harmonic components with random phases and amplitudes, revealed that the proposed Blackman-based window yields the smallest temporal variations in the resulting power spectra.
Identification of size-modulated vowel sequences and temporal characteristics of the size extraction process

竹島千尋, 津崎実, 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 36 ( 5 ) 439 - 443 2006.07
Identification of size-modulated vowel sequences and temporal characteristics of the size extraction process

TAKESHIMA Chihiro, TSUZAKI Minoru, IRINO Toshio

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 178 ) 13 - 17 2006.07

We can identify vowels pronounced by any speaker, regardless of the length of his or her vocal tract. At the same time, we can discriminate differences in vocal tract length. To simulate these abilities, a computational model has been proposed in which the size information is extracted and separated from the shape information. In this paper, we investigated the temporal characteristics of this size extraction process. In the first experiment, listeners were required to identify size-modulated vowel sequences. The results showed that performance deteriorated for rapid modulation. This deterioration could be explained by hypothesizing that a rapid change in vocal tract size causes stream segregation. In the second experiment, listeners judged whether or not a target vowel was present in the sequences. The observed tendency also supported the segregation hypothesis.
Speech texture mapping : a general framework for flexible speech style conversion and synthesis

TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 105 ( 571 ) 31 - 36 2006.01

Speech texture mapping is proposed as a unified framework for speech manipulations such as speaker conversion, speech morphing, and emotional speech synthesis. The proposed framework decomposes parametric representations of speech into an underlying structure (wireframe) and detailed features and deviations (texture). For example, linguistic information may be attributed to the wireframe, and individuality and emotional expression may be attributed to the texture. In this interpretation, emotional speech conversion is represented as the mapping of a different texture onto a common wireframe. This article also provides methods and examples for applying the proposed framework to a variety of speech conversion tasks.
Automatic assignment of anchoring points on vowel templates for speech morphing

NISHI Masashi, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 105 ( 571 ) 19 - 24 2006.01

A method for automatically assigning anchoring points for speech morphing is proposed. The original morphing procedure interpolates linearly transformed parameters of two speech samples on a common time-frequency coordinate system by deforming one of the coordinate systems. This time-frequency deformation, which aligns the coordinates, has significant effects on the quality of the morphed speech sounds. The deforming function was defined by manually allocating anchoring points on the time-frequency representation of each speech sample. This manual allocation was a huge obstacle to using the morphing method in various applications because it is time-consuming and tedious. This article describes methods to replace this process with an objective procedure. Each anchoring point is composed of frequency coordinates and temporal coordinates. The central idea is to prepare vowel templates with pre-assigned anchoring points in advance and to deform one of the templates to match the input speech spectrum. As a result of this deformation, the coordinates of the frequency anchoring points are obtained from those of the points on the deformed template. The optimum deformation is calculated using the DP (dynamic programming) procedure. The temporal coordinates of the anchoring points are defined using the phoneme labels annotated on the speech sample. A subjective test of the naturalness of the morphed speech sounds was conducted and revealed that the proposed method effectively provides highly natural morphed sounds.
A study on implementation of real-time STRAIGHT and the effect of parameter reduction

BANNO Hideki, HATA Hiroaki, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 105 ( 571 ) 7 - 11 2006.01

This paper describes the implementation of real-time STRAIGHT, a high-quality speech analysis, modification, and synthesis system based on a VOCODER-type representation. STRAIGHT is currently finding wide application in areas such as speech synthesis systems and tools for auditory experiments. However, the current MATLAB implementation of STRAIGHT is not suited to real-time applications. Thus, several changes have been introduced for real-time processing, including porting the source code to C, replacing the F0 extraction algorithm with a cepstrum-based method, and omitting the control of the short-time phase in synthesis. A preliminary experiment confirmed that real-time STRAIGHT can be executed on recent personal computers and has higher quality than the cepstrum vocoder.
Perceptually weighted spectral distortions of STRAIGHT parameter interpolation for high quality speech processing

HATA Hiroaki, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 105 ( 571 ) 1 - 6 2006.01

A high-quality speech analysis, modification, and resynthesis procedure referred to as STRAIGHT employs an excessively redundant speech parameter representation, which has been the major obstacle to using STRAIGHT in various applications. This article provides basic information for redundancy reduction by reducing the frame rate of spectral analysis. Two interpolation methods (nearest-neighbor and linear) for the STRAIGHT spectral parameters were investigated using a spectral distance measure on both the linear frequency axis and the nonlinear perceptual frequency axis (ERB_N rate). The investigation was conducted using a speech database developed for speech conversion, and 190 sentences from four male and four female speakers were used to evaluate those spectral distances. The results indicated that, using linear interpolation, the default frame interval of 1 ms can be increased up to 5 ms for male speech and 4 ms for female speech samples.
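The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "spectra" are synthetic, smoothly varying dB values rather than STRAIGHT output, the distance is a plain RMS difference on a linear bin axis (no ERB_N weighting), and the names `reconstruct` and `rms_distance` are hypothetical.

```python
import math


def reconstruct(kept, step, total, method):
    """Rebuild `total` frames from every `step`-th frame by interpolation."""
    out = []
    for i in range(total):
        pos = i / step
        lo = int(pos)
        hi = min(lo + 1, len(kept) - 1)
        frac = pos - lo
        if method == "nearest":
            # Copy whichever retained frame is closer in time.
            out.append(list(kept[hi] if frac >= 0.5 else kept[lo]))
        else:
            # Linear interpolation between the two surrounding retained frames.
            out.append([(1 - frac) * a + frac * b
                        for a, b in zip(kept[lo], kept[hi])])
    return out


def rms_distance(reference, estimate):
    """RMS difference over all frames and bins (in dB for this demo)."""
    squared = [(a - b) ** 2
               for ref, est in zip(reference, estimate)
               for a, b in zip(ref, est)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic 1 ms frames: 41 frames x 8 bins of slowly varying dB values (assumed).
frames = [[20 * math.sin(0.05 * t + 0.3 * k) for k in range(8)] for t in range(41)]
step = 5                      # keep one frame in five, i.e. a 5 ms interval
kept = frames[::step]

nearest_err = rms_distance(frames, reconstruct(kept, step, len(frames), "nearest"))
linear_err = rms_distance(frames, reconstruct(kept, step, len(frames), "linear"))
print(f"nearest-neighbor: {nearest_err:.3f} dB, linear: {linear_err:.3f} dB")
```

For smoothly varying data like this, linear interpolation gives the smaller distance; how large a frame interval is acceptable for real speech depends on the data and the perceptual criterion used.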
(Abstracts of Presentation,The 24th Annual Meeting)

津崎実, 竹島千尋, 入野俊夫

The Japanese Journal of Psychonomic Science ( The Japanese Psychonomic Society ) 24 ( 2 ) 221 - 221 2006

Comparison of auditory filters with cascade and parallel architectures for simultaneous notched-noise masking

鵜木祐史, 入野俊夫, Glasberg Brian

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 35 ( 11 ) 727 - 732 2005.12
Accuracy improvement in speech sound propagation measurement using logarithmic temporal manipulation

MORISE Masanori, IRINO Toshio, KAWAHARA Hideki

Technical report of IEICE. EA ( The Institute of Electronics, Information and Communication Engineers ) 105 ( 348 ) 43 - 48 2005.10

A new procedure to improve the accuracy of an empirical transfer function estimation method is proposed for investigating speech sound propagation. In our previous work based on the cross-spectrum method, vowel dependencies were found in the empirical transfer functions from a lip reference point to observation points around the speaker's head. The accuracy of the method was also evaluated against references obtained using a HATS and an M-sequence, which revealed significant variations in the higher frequency range (namely 4 kHz or more) due to low speech energy. The proposed method alleviates this problem by introducing a logarithmic temporal manipulation and lowpass filtering in the manipulated domain. The proposed method was tested using 128 vocalizations of sustained Japanese vowels with roving fundamental frequency. The test results indicated that the proposed method reduced standard deviations to 53% in gain estimation, 18% in group delay estimation, and 17% in duration estimation in the frequency region above 10 kHz. Detailed aspects of the implementation are also discussed.
Analysis for emotion understanding with utterance collection in spoken dialogue system

OMAE Souji, NISIMURA Ryuichi, KAWAHARA Hideki, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2005 ( 69 ) 99 - 104 2005.07

Understanding the emotions of users is becoming important for realizing smooth conversations in spoken dialogue systems. This study discusses the practical realities of automatic emotion understanding by analyzing actual user utterances collected through field testing of our spoken dialogue system "Takemaru-kun". Two testers rated the collected utterances on a five-grade scale for 16 basic emotions. Factor analysis of the rating results indicated the existence of two factors, concerning negative and positive emotions. To realize emotion understanding, we have been investigating the correlation between these factors and acoustic features of the users' voices. The results in this paper showed that the factors have no remarkable correlation with fundamental frequency or power.
Voice and Emotion Conversion based on Statistics of Vowel Parameters in an Emotional Speech Database

FUJII T., NISHI M., TAKAHASHI T., BANNO H., IRINO T., KAWAHARA H.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 299 - 300 2005.03
A study of talker localization based on subband CSP analysis with an average speech spectrum

DENDA Y., NISHIURA T., KAWAHARA H., IRINO T.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 521 - 522 2005.03
Statistical properties of vibrato based on STRAIGHT analysis

MORISE M., HIRACHI Y., BANNO H., IRINO T., KAWAHARA H.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 269 - 270 2005.03
User's emotion analysis by using actual utterances for speech-oriented information system

OMAE S., NISIMURA R., KAWAHARA H., IRINO T.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 63 - 64 2005.03
Distance measure for emotional mapping method using Time-frequency warping based on STRAIGHT

TAKAHASHI T., BANNO H., NISIMURA R., IRINO T., KAWAHARA H.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 213 - 214 2005.03
Perception of degraded speech sounds synthesized from restricted spectral modulation

SATOU S., IRINO T., BANNO H., KAWAHARA H.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 251 - 252 2005.03
On Spectral Deformation Analysis due to Intensity Variations in Singing Voice

TAHARA K., MORISE M., BANNO H., IRINO T., KAWAHARA H.

日本音響学会研究発表会講演論文集 2005 ( 1 ) 271 - 272 2005.03
A Study of Talker Localization Based on Subband CSP Analysis

DENDA Yuki, NISHIURA Takanobu, KAWAHARA Hideki, IRINO Toshio

IEICE technical report. Natural language understanding and models of communication ( The Institute of Electronics, Information and Communication Engineers ) 104 ( 539 ) 79 - 84 2004.12

It is very important to capture distant-talking speech with high quality for voice-controlled systems and teleconferencing systems. Microphone array steering is an ideal candidate as an effective method for capturing distant-talking speech with high quality. However, it requires localizing the target talker before capturing the distant-talking speech. For this purpose, a talker localization method based on CSP (Cross-power Spectrum Phase) analysis, for example, has been proposed. However, the talker localization performance of CSP analysis is degraded in highly noisy environments. To deal with this problem, in this paper we propose a subband CSP analysis weighted by the average speech spectrum, as a localization method specialized for speech. In addition, we evaluate the ASR (Automatic Speech Recognition) performance when the microphone array is steered to the talker direction estimated by the proposed method. Evaluation experiments in a real room confirmed that the proposed method provides better talker localization performance than the conventional method.
A Study of Talker Localization Based on Subband CSP Analysis

DENDA Yuki, NISHIURA Takanobu, KAWAHARA Hideki, IRINO Toshio

IPSJ SIG Notes ( Information Processing Society of Japan (IPSJ) ) 2004 ( 131 ) 169 - 174 2004.12

It is very important to capture distant-talking speech with high quality for voice-controlled systems and teleconferencing systems. Microphone array steering is an ideal candidate as an effective method for capturing distant-talking speech with high quality. However, it requires localizing the target talker before capturing the distant-talking speech. For this purpose, a talker localization method based on CSP (Cross-power Spectrum Phase) analysis, for example, has been proposed. However, the talker localization performance of CSP analysis is degraded in highly noisy environments. To deal with this problem, in this paper we propose a subband CSP analysis weighted by the average speech spectrum, as a localization method specialized for speech. In addition, we evaluate the ASR (Automatic Speech Recognition) performance when the microphone array is steered to the talker direction estimated by the proposed method. Evaluation experiments in a real room confirmed that the proposed method provides better talker localization performance than the conventional method.
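For readers unfamiliar with CSP analysis, the following minimal Python sketch shows the conventional full-band form: the cross-spectrum of two microphone signals is normalized to unit magnitude so that only phase remains, and the peak of its inverse transform gives the inter-channel delay. It is not the subband method of the paper. The optional `weights` argument is only a hypothetical stand-in for a per-frequency weighting such as an average speech spectrum, and the naive DFT, the function names, and the noise test signal are assumptions made to keep the example self-contained.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for a short demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def csp_delay(x1, x2, weights=None):
    """Estimate the delay (in samples) of x2 relative to x1 from the CSP peak."""
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # Keep only the phase of the cross-spectrum (whitening).
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    if weights is not None:
        # Hypothetical per-bin weighting, not taken from the paper.
        whitened = [w * v for w, v in zip(weights, whitened)]
    csp = [v.real for v in dft(whitened, inverse=True)]
    n = len(csp)
    peak = max(range(n), key=lambda i: csp[i])
    # Indices above n/2 correspond to negative delays (circular correlation).
    return peak if peak <= n // 2 else peak - n


random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(64)]   # signal at the first microphone
x2 = x1[-5:] + x1[:-5]                         # same signal, circularly delayed by 5
estimated = csp_delay(x1, x2)
print("estimated delay in samples:", estimated)
```

Given the microphone spacing and the speed of sound, such a delay can be converted to a direction of arrival; that step is omitted here.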
Perception of "size-modulated" speech : The relation between the modulation period and the vowel identification

Tsuzaki Minoru, Irino Toshio

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 34 ( 10 ) 713 - 718 2004.12
A method for designing acoustic measurement signals robust against background noise

MORISE Masanori, IRINO Toshio, BANNO Hideki, KAWAHARA Hideki

Technical report of IEICE. EA ( The Institute of Electronics, Information and Communication Engineers ) 104 ( 247 ) 37 - 42 2004.08

We propose a new signal for measuring the acoustic impulse responses of rooms and audio equipment. The signal, named "warped TSP", combines the TSP (time-stretched pulse) and the logarithmic TSP to improve accuracy in both the low- and high-frequency regions, where the ambient or background noise is relatively high. An optimal signal can be defined in accordance with the spectral distribution of the background noise. In this report, we describe the method for designing the warped TSP and show the dependency between its parameter and the spectral distribution. Moreover, we demonstrate the effectiveness of the warped TSP by measuring room acoustics and headphone characteristics in real environments.
Speech Segregation using Auditory Vocoder with Event-Synchronous procedure

入野俊夫, Patterson Roy D., 河原英紀

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 33 ( 9 ) 603 - 608 2003.11
Source signal extraction and aperiodicity evaluation based on STRAIGHT spectrum

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, IRINO Toshio, BANNO Hideki, FUJIMURA Osamu

IEICE technical report ( The Institute of Electronics, Information and Communication Engineers ) 106 ( 333 ) 43 - 48 2003.11

A new procedure for extracting the aperiodic component is proposed, based on a fundamental discussion of how aperiodicity should be defined. This investigation is part of ongoing research to provide high-quality speech processing methods consisting of analysis, modification, and synthesis. The discussion clarifies the roles of, and relations between, the frequency-domain representation of signal duration based on group delay, the bandwise durations of the source signal extracted using a minimum-phase inverse filter derived from a STRAIGHT spectrum, the prediction residuals using flanking segments that are one pitch period apart, and the apparent residuals due to temporal spectral variations.
Speech segregation based on fundamental periodicity using auditory vocoder

IRINO Toshio, PATTERSON Roy D., KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 103 ( 155 ) 55 - 60 2003.06

We have developed a method for speech segregation based on the Auditory Image Model (AIM) and a scheme of event-synchronous processing. AIM was developed to provide a reasonable representation of the "auditory image" we perceive in response to sounds. We have also developed an "auditory vocoder" for resynthesizing speech from the auditory image using an existing, high-quality vocoder, STRAIGHT. The auditory representation preserves fine temporal information, unlike conventional window-based processing, which makes it possible to segregate the speech synchronously. We have also developed a method to convert the F0 to event times. We have shown that the segregation from the concurrent speech is good even when the SNR is 0 dB, and the glottal-event times of the target speaker are perfectly estimated. The extracted target speech was distorted but entirely intelligible, whereas the distracter speech was reduced to a non-speech sound that was not perceptually disturbing. This system may explain how the auditory system segregates speakers inasmuch as the playback is resynthesized from the output of a reasonable auditory model.
Speech segregation based on fundamental periodicity using auditory vocoder

入野俊夫, パターソンロイ D., 河原英紀

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 33 ( 4 ) 311 - 316 2003.06
Speech segregation from speech mixture using event-synchronous auditory vocoder

IRINO T., PATTERSON R. D., KAWAHARA H.

日本音響学会研究発表会講演論文集 2003 ( 1 ) 343 - 344 2003.03
Scale theory in the early auditory system

IRINO Toshio

日本音響学会研究発表会講演論文集 2003 ( 1 ) 511 - 514 2003.03
A computational theory of the early auditory system : optimality, explaining experimental data, and ecological point of view

入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 32 ( 7 ) 455 - 460 2002.09
Fundamental Frequency Estimation Based on Dominance Spectrum

中谷智広, 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 32 ( 2 ) 105 - 112 2002.03
Fundamental Frequency Estimation Based on Dominance Spectrum

NAKATANI Tomohiro, IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 101 ( 744 ) 21 - 28 2002.03

This paper presents a new method for robust and accurate fundamental frequency (F_0) estimation in the presence of background noise and spectral distortion. For this purpose, degree of dominance and a dominance spectrum are defined based on instantaneous frequencies of the STFT spectra. The degree of dominance is a measure for evaluating the magnitude of individual harmonic components relative to the background noise. The fundamental frequency is correctly estimated from reliable harmonic components easily selected in the dominance spectra. Experiments are performed using white and multi-talker background noise under the conditions with and without spectral distortion produced by a SRAEN filter. Results show that the present method is better than the commonly-used conventional methods in terms of both the F_0 correct rates and fine F_0 errors.
Acoustic feature and three types of fixed points in time-frequency representations

KAWAHARA Hideki, ZOLFAGHARI Parham, IRINO Toshio

日本音響学会研究発表会講演論文集 2002 ( 1 ) 497 - 498 2002.03
Fundamental frequency estimation based on dominant harmonic components

NAKATANI T., IRINO T.

日本音響学会研究発表会講演論文集 2002 ( 1 ) 323 - 324 2002.03
Parameter estimation of the compressive gammachirp in notched-noise masking data for various frequencies

UNOKI Masashi, PATTERSON Roy D., IRINO Toshio

日本音響学会研究発表会講演論文集 2002 ( 1 ) 495 - 496 2002.03
Fitting the compressive gammachirp auditory filter to human notched-noise masking data for various frequencies

鵜木祐史, Patterson Roy D., 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 32 ( 1 ) 41 - 48 2002.01
幼児音声の基本周波数および有声区間の推定法

中谷智広, 天野成昭, 入野俊夫

日本音響学会研究発表会講演論文集 2002 2002
Application of F_0 extraction method based on instantaneous frequency to co-channel speech

NAKATANI T., IRINO T.

日本音響学会研究発表会講演論文集 2001 ( 2 ) 211 - 212 2001.10
Multiscale computing

Mei Kobayashi, Toshio Irino, Wim Sweldens

Proceedings of the National Academy of Sciences of the United States of America ( NATL ACAD SCIENCES ) 98 ( 22 ) 12344 - 12345 2001.10

Multiscale computing (MSC) involves the computation, manipulation, and analysis of information at different resolution levels. Widespread use of MSC algorithms and the discovery of important relationships between different approaches to implementation were catalyzed, in part, by the recent interest in wavelets. We present two examples that demonstrate how MSC can help scientists understand complex data. The first is from acoustical signal processing and the second is from computer graphics.

解説記事私のすすめるこの1冊 : 「相対性理論」アインシュタイン著, 内山龍雄訳・解説, 岩波文庫, 1988

入野俊夫

日本音響学会誌 ( 一般社団法人日本音響学会 ) 57 ( 8 ) 565 - 566 2001.08
Signal resynthesis from Auditory Mellin Image using a high-quality VOCODER, STRAIGHT

入野俊夫, パターソンロイ D., 河原英紀

聴覚研究会資料 = Proceedings of the auditory research meeting ( 日本音響学会 ) 31 ( 5 ) 315 - 322 2001.07
Signal resynthesis from Auditory Mellin Image using a high-quality VOCODER, STRAIGHT

IRINO Toshio, PATTERSON Roy D., KAWAHARA Hideki

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 101 ( 232 ) 31 - 38 2001.07

We propose a method for resynthesizing sounds from auditory representations, Auditory Mellin Images, by using a high-quality VOCODER, STRAIGHT. Analysis/synthesis systems for speech sounds have been studied extensively since the VOCODER system was developed in 1939. There is, however, no system involving a realistic auditory model, even though human sound perception is known to be an important factor in developing such a system. We combined the Auditory Mellin Image model and STRAIGHT into a new "auditory" VOCODER system by introducing a mapping function that includes a frequency-warping Discrete Cosine Transform and nonlinear multivariate analysis. With this system, we expect to include auditory functions such as noise robustness and sound-source separation, which have been problems for conventional VOCODERs.
A new pitch extraction method using instantaneous frequencies of harmonic components

ATAKE Yoshinori, IRINO Toshio, KAWAHARA Hideki, LU Jinlin, NAKAMURA Satoshi, SHIKANO Kiyohiro

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 99 ( 679 ) 25 - 32 2000.03

STRAIGHT, developed by Hideki Kawahara et al. in 1996, can produce very natural re-synthesized speech, although it is basically a VOCODER method. However, STRAIGHT has a weakness in noise tolerance: the quality of the re-synthesized sounds degrades considerably when it is used in noisy environments. This is because STRAIGHT uses pitch-adaptive analysis to produce the time-frequency representation and is sensitive to errors in the estimated pitch frequency. To solve this problem, a new pitch extraction method is proposed in this paper. The method extracts the harmonic components of the glottal pulses and combines them using a bandwidth equation adapted from Cohen's equation (1995). A large database of simultaneous recordings of speech waveforms and EGG (electroglottograph) signals was constructed to evaluate the proposed method, STRAIGHT-TEMPO, and other methods. As a result, the precision of the proposed method is much better than that of the other methods when the signal-to-noise ratio is low, and it is very accurate and comparable to TEMPO in the clean condition.
Fitting the physiological gammachirp filter to impulse-response data from the cat cochlea

IRINO Toshio, PATTERSON Roy D.

日本音響学会研究発表会講演論文集 2000 ( 1 ) 397 - 398 2000.03
A new pitch extraction method using instantaneous frequencies of several harmonic components

ATAKE Yoshinori, IRINO Toshio, KAWAHARA Hideki, LU Jinlin, NAKAMURA Satoshi, SHIKANO Kiyohiro

日本音響学会研究発表会講演論文集 2000 ( 1 ) 251 - 252 2000.03
Steady-state noise suppression using a gammachirp auditory filterbank

IRINO Toshio

Technical report of IEICE. DSP ( The Institute of Electronics, Information and Communication Engineers ) 99 ( 504 ) 59 - 66 1999.12

Spectral subtraction has been the most frequently cited noise suppression method for speech signals with steady background noise because it is basically a non-parametric method and simple enough to be implemented with the FFT. However, it is well known that spectral subtraction produces so-called "musical noise" in the synthetic sounds. Since musical noise, even at a low level, often disturbs human speech perception, spectral subtraction has not been used successfully in applications that need to reproduce sounds for human listeners. To overcome this problem, this paper proposes an alternative method using a time-varying analysis/synthesis gammachirp filterbank, which was initially proposed as an auditory filterbank. The present method is shown to achieve about the same SNR improvement as spectral subtraction under the same conditions on the non-speech interval. Moreover, the synthetic sounds contain no musical noise, only steady white-like noise at a reduced level, when the original background is white noise. This method is advantageous in various applications for human listeners since it uses the gammachirp, which is also suitable for approximating human auditory filter shapes. (This paper is based on Tech. Rep. of Acoust. Soc. Jpn. H-98-98 (Sept. 1998) with minor modifications.)
Imaging of sound source shape: Auditory strategy for optimal signal processing

IRINO Toshio, PATTERSON Roy D.

日本音響学会研究発表会講演論文集 1999 ( 2 ) 1177 - 1178 1999.09
Application of bandwidth equation to fundamental frequency extraction in STRAIGHT

ATAKE Yoshinori, IRINO Toshio, KAWAHARA Hideki

日本音響学会研究発表会講演論文集 1999 ( 1 ) 199 - 200 1999.03
Parameter determination of a gammachirp filter with physiological constraints

IRINO Toshio, PATTERSON Roy D.

日本音響学会研究発表会講演論文集 1999 ( 1 ) 381 - 382 1999.03
On normalization of sound source size by the Mellin transform in model of the auditory pathway

IRINO Toshio, PATTERSON Roy D.

日本音響学会研究発表会講演論文集 1999 ( 1 ) 383 - 384 1999.03
Noise suppression using an analysis/synthesis gammachirp filterbank

IRINO Toshio

日本音響学会研究発表会講演論文集 1998 ( 2 ) 241 - 242 1998.09
ガンマチャープフィルタバンクの構築

鵜木祐史, 入野俊夫, 下平博

Research report ( 北陸先端科学技術大学院大学 ) 98 1 - 11 1998.03
A time-varying, analysis/synthesis auditory model using the gammachirp filterbank

IRINO Toshio, UNOKI Masashi

日本音響学会研究発表会講演論文集 1998 ( 1 ) 413 - 414 1998.03
A method for controlling the asymmetric parameters in the gammachirp filterbank

UNOKI Masashi, IRINO Toshio

日本音響学会研究発表会講演論文集 1998 ( 1 ) 415 - 416 1998.03
Report on the 11th International Symposium on Hearing and Computational Auditory Scene Analysis '97

TSUZAKI Minoru, IRINO Toshio

The Journal of the Acoustical Society of Japan ( 一般社団法人日本音響学会 ) 54 ( 2 ) 162 - 163 1998.02
An implementation of the gammachirp filter using an asymmetric IIR filter

IRINO Toshio, UNOKI Masashi

日本音響学会研究発表会講演論文集 1997 ( 2 ) 421 - 422 1997.09
Explaining perceptual temporal asymmetry with autocorrelation vs. strobed temporal integration

IRINO Toshio, PATTERSON Roy D.

日本音響学会研究発表会講演論文集 1997 ( 1 ) 455 - 456 1997.03
On approximation of the auditory filter shape using a gammachirp function

IRINO Toshio, PATTERSON Roy D.

日本音響学会研究発表会講演論文集 1996 ( 2 ) 385 - 386 1996.09
Minimal uncertainty of a gammachirp function in Mellin transform

IRINO Toshio

日本音響学会研究発表会講演論文集 1995 ( 2 ) 421 - 422 1995.09
A Computational Theory of the Peripheral Auditory System

IRINO Toshio

IEICE technical report. Speech ( The Institute of Electronics, Information and Communication Engineers ) 95 ( 140 ) 23 - 30 1995.07

A computational theory of the peripheral auditory system is discussed in the manner of D. Marr. A 'gammachirp' function is found to be the optimal auditory filter in terms of minimal uncertainty if the time-scale representation is calculated in the auditory system. A wavelet configuration is optimal for the auditory filterbank above 800 Hz in terms of invariability in the scale representation. A 'delta-gamma' theory is introduced to explain temporal asymmetry in auditory perception. The theory can also explain the physiological firing patterns of an inner hair cell and some neurons in the cochlear nucleus.
On optimality of gammatone filter

IRINO Toshio

日本音響学会研究発表会講演論文集 1995 ( 1 ) 449 - 450 1995.03
C-4 Zero slope temperature SiC/SiO_2/LiTaO_3 substrate for boundary acoustic waves

Irino Toshio, Watanabe Takaya, Shimizu Yasutaka

Symposium on ultrasonic electronics ( Steering committee of symposium on ultrasonic electronics ) ( 8 ) 69 - 70 1987.12
E-3 Acoustic boundary waves propagating along a thin layer between two bonded substrates

IRINO Toshio, SHIMIZU Yasutaka

Symposium on ultrasonic electronics ( Steering committee of symposium on ultrasonic electronics ) ( 6 ) 119 - 120 1985.12
C-1 Stoneley waves propagating along an interface between piezoelectric material and Glass

SHIMIZU Yasutaka, IRINO Toshio

Symposium on ultrasonic electronics ( Steering committee of symposium on ultrasonic electronics ) ( 3 ) 79 - 80 1982.12


Awards & Honors

Fellow

Winner： Toshio Irino

2010.04 The Acoustical Society of America
IEEE Kansai支部メダル（IEEE senior member)

2004.06 IEEE Kansai chapter
第40回佐藤論文賞

2000 日本音響学会
粟屋潔学術奨励賞

1989 日本音響学会

Conference Activities & Talks

Gammachirp Envelope Similarity Index (GESI)による模擬難聴音声の了解度予測～防音室実験とクラウドソーシング遠隔実験の主観評価データを用いて～

入野俊夫, 田丸萌夏, 山本絢子

音学シンポジウム2022 2022.06.18
A new implementation of hearing impairment simulator WHIS based on the gammachirp auditory filterbank

Toshio Irino

The 3rd Japan-Taiwan Symposium on Psychological and Physiological Acoustics 2021.12.11
模擬難聴を用いた補聴処方式の評価

時政和征, 土庵晋太郎, 川⻄真樹, 入野俊夫

日本音響学会関西支部,第27回関西支部若手研究者交流研究発表会 2024.12.14
感情音声の弁別特性における模擬難聴処理の有無の違い − 落着きと怒り・悲しみ・喜びとの間の弁別 −

山崎花梨, 花谷幸歩, 黑谷悠太, 入野俊夫

日本音響学会関西支部,第27回関西支部若手研究者交流研究発表会 2024.12.14
高齢者の聞こえを模擬した音声を用いた健聴者了解度実験

宮﨑芙紀, 國中敬太, 森本隆司, 入野俊夫

日本音響学会関西支部,第27回関西支部若手研究者交流研究発表会 2024.12.14
感情音声の弁別特性における健聴者と高齢者との違い − 落着きと怒り・悲しみ・喜びとの間の弁別 −

黑谷悠太, 花谷幸歩, 山崎花梨, 入野俊夫

日本音響学会関西支部,第27回関西支部若手研究者交流研究発表会 2024.12.14
高齢者を対象とした音声了解度実験と客観評価指標 GESI を用いた予測

宮﨑芙紀, 馬野颯太, 山本絢子, 森本隆司, 入野俊夫

日本音響学会第152回（2024年秋季）研究発表会 2024.09.05
模擬難聴システムの非線形歪み評価とアルゴリズム

土庵晋太郎, 入野俊夫, 石川美波

音学シンポジウム2024 2024.06.14
音声からの感情弁別に対する難聴の影響 -模擬難聴処理を用いた健聴者実験-

花谷幸歩, 岸田一馬, 内藤朱里, 河原英紀, 入野俊夫

日本音響学会第151回（2024年春季）研究発表会 2024.03.06
音声からの感情弁別に対する難聴の影響 -高齢難聴者と模擬難聴者の実験-

花谷幸歩, 岸田一馬, 内藤朱里, 河原英紀, 入野俊夫

日本音響学会聴覚研究会 2024.02.23
模擬難聴システムの音声歪み比較ーケンブリッジ対和歌山ー

土庵晋太郎, 石川美波, 入野俊夫

日本音響学会関西支部,第26回関西支部若手研究者交流研究発表会 2023.12.09
高齢者を対象とした IRM 強調処理音声の了解度主観評価

宮﨑芙紀, 馬野颯太, 森本隆司, 入野俊夫

日本音響学会関西支部,第26回関西支部若手研究者交流研究発表会 2023.12.09
高齢者の聞こえの模擬による音声感情知覚実験

花谷幸歩, 岸田一馬, 内藤朱里, 河原英紀, 入野俊夫

日本音響学会関西支部,第26回関西支部若手研究者交流研究発表会 2023.12.09
音声情報抽出に有効な聴覚表現: 理論・測定・推定・応用

入野俊夫 [Invited]

日本音響学会聴覚研究会 2023.11.23
What is an Effective Auditory Representation for Estimating Vocal Tract Information? - Effectiveness of "Auditory Motivated" Models -

Toshio Irino, Shintaro Doan [Invited]

Mini-workshop "Engineering the Future of Hearing Science and Speech Technologies" 2023.11.06
A First Step in Predicting Speech Intelligibility for Elderly Listeners with Hearing Loss: Gammachirp Envelope Similarity Index (GESI)

Ayako Yamamoto, Toshio Irino, Fuki Miyazaki, Honoka Tamaru [Invited]

Mini-workshop "Engineering the Future of Hearing Science and Speech Technologies" 2023.11.06
GESI による実拡声環境下での低親密度単語了解度の推定

渡邊健太郎, 小林洋介, 入野俊夫

日本音響学会第150回（2023年秋季）研究発表会 2023.09.26
客観評価指標 GESI による模擬難聴音声了解度の個人別予測

山本絢子, 宮﨑芙紀, 田丸萌夏, 入野俊夫

日本音響学会春季研究発表会 2023.03.17
クラウドソーシング聴取実験のための効果的な事前参加者スクリーニング

宮﨑芙紀, 山本絢子, 土庵晋太郎, 入野俊夫

日本音響学会春季研究発表会 2023.03.17
基本周波数適応型聴覚表現による声道長推定

入野俊夫, 土庵晋太郎

電子情報通信学会, 音声研究会 2023.02.28
客観評価指標 GESI による模擬難聴音声の了解度予測 – 健聴者による原音声の主観評価値のみを用いて –

山本絢子, 宮﨑芙紀, 田丸萌夏, 入野俊夫

日本音響学会聴覚研究会 12月九州大学大橋キャンパス 2022.12.18
クラウドソーシング聴取実験のための効果的な事前参加者スクリーニングの検討

宮﨑芙紀, 山本絢子, 土庵晋太郎, 入野俊夫

日本音響学会関西支部,第25回関西支部若手研究者交流研究発表会 2022.11.26
模擬難聴音声了解度の主観評価実験とGESIによる予測

山本絢子, 宮﨑芙紀, 田丸萌夏, 入野俊夫

日本音響学会関西支部,第25回関西支部若手研究者交流研究発表会 2022.11.26
高齢難聴者の音声了解度客観評価を目指したGESI の開発 - 強調音声と模擬難聴音声による評価 -

山本絢子, 入野俊夫, 荒木章子, 田丸萌夏, 新井賢一, 小川厚徳, 木下慶介, 中谷智広

日本音響学会：秋季研究発表会 2022.09.16
拡声環境を想定した音声了解度指標GESIと従来手法との比較

渡邊健太郎, 小林洋介, 入野俊夫

日本音響学会：秋季研究発表会 2022.09.16
Speech intelligibility prediction by objective intelligibility measure, GESI - Enhanced speech and level reduced speech -

山本絢子, 入野俊夫, 荒木章子, 田丸萌夏, 新井賢一, 小川厚徳, 木下慶介, 中谷智広

日本音響学会聴覚研究会 2022.07.08
Conformer-based fusion of text, audio, and listener characteristics for predicting speech intelligibility of hearing aid users

Naoyuki Kamo, Kenichi Arai, Atsunori Ogawa, Shoko Araki, Tomohiro Nakatani, Keisuke Kinoshita, Marc Delcroix, Tsubasa Ochiai, Toshio Irino

The 2nd Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2022) 2022.06.29
異なる身長の小学生の音声を用いた寸法知覚実験

上野朱音, 入野俊夫, 山本絢子

日本音響学会春季研究発表会 2022.03.11
模擬難聴システムWHISの新実装と末梢系特性の音声了解度への影響

入野俊夫, 田丸萌夏, 山本絢子

日本音響学会春季研究発表会 2022.03.10
MVDRビームフォーマーによる音声強調処理の了解度評価ー防音室実験とクラウドソーシング実験の対比ー

山本絢子, 入野俊夫, 新井賢一, 荒木章子, 小川厚徳, 木下慶介, 中谷智広

日本音響学会, 2022 春季研究発表会 2022.03.09
IRMを用いた音声強調処理の主観了解度の上限評価 - 防音室実験とクラウドソーシング実験の対比

山本絢子, 入野俊夫, 新井賢一, 荒木章子, 小川厚徳, 木下慶介, 中谷智広

日本音響学会／電子情報通信学会 2022年3月音声研究会 2022.03
マルチチャンネル音声強調処理の主観評価

山本絢子, 入野俊夫, 新井賢一, 荒木章子, 小川厚徳, 木下慶介, 中谷智広

日本音響学会関西支部,第24回関西支部若手研究者交流研究発表会 2021.12.04
利用価値の高い音声データの録音手順の実際と支援ツールについて～オールパスフィルタの従属接続に基づく拡張された時間伸長パルスの応用～

河原英紀, 矢田部浩平, 榊原健一, 水町光徳, 森勢将雅, 坂野秀樹, 入野俊夫

音学シンポジウム2021 2021.06
クラウドソーシングを利用した音声了解度実験ーウェッブページ制作からデータスクリーニングー

山本絢子, 入野俊夫, 新井賢一, 荒木章子, 小川厚徳, 木下慶介, 中谷智広

音学シンポジウム2021 2021.06
音声資料の収録・再生環境の簡易な把握に向けて: オールパスフィルタの従属接続に基づく拡張された時間伸長パルスの応用

河原英紀, 矢田部浩平, 榊原健一, 水町光徳, 森勢将雅, 坂野秀樹, 入野俊夫

日本音響学会春季研究発表会 2021.03.12
クラウドソーシングと防音室における音声了解度実験の対比

山本絢子, 入野俊夫, 新井賢一, 荒木章子, 小川厚徳, 木下慶介, 中谷智広

電子情報通信学会, 音声研究会 2021.03.03
オンライン実験のためのWebページ制作と聴取条件統制へ向けた検討

山本絢子, 入野俊夫

日本音響学会関西支部,第23回関西支部若手研究者交流研究発表会 2020.12.05
音声収集と聴取における対話的実時間音響計測ツールの応用について

河原英紀, 榊原健一, 水町光徳, 入野俊夫

日本音響学会聴覚研究会 2020.11.20
非侵襲心理物理実験による聴覚末梢系の特性推定とその応用

入野俊夫 [Invited]

第30回日本耳科学会 2020.11.12
模擬難聴システムWHIS を用いた発声訓練が明瞭性に与える効果とその持続性

東山宗一, 吉木華子, 入野俊夫

日本音響学会：春季研究発表会 2020.03.16
ささやき声の寸法知覚におけるピッチ感を導入したモデル化

上村怜央, 入野俊夫, ロイ D. パターソン

日本音響学会：春季研究発表会 2020.03.16
音声の基本周波数に対する聴覚の影響の測定への周波数領域ベルベットノイズの応用について

河原英紀, 榊原健一, 津崎実, 松井淑恵, 森勢将雅, 入野俊夫

電子情報通信学会, 音声研究会 2020.03.02
模擬難聴システムWHISを用いた発声訓練音声の発声特徴量と聴覚特徴量

東山宗一, 吉木華子, 河原英紀, 入野俊夫

電子情報通信学会, 音声研究会 2020.03.02
レベル依存蝸牛雑音フロアを導入した聴覚フィルタ特性推定

横田健治, 入野俊夫, 松浦弘樹, 仲間杏, Roy Patterson

日本音響学会聴覚研究会 2020.02.15 (琉球大学（沖縄県中頭郡）) 日本音響学会

50 (1), pp.29-34, H-2020-6
聴力低下が音声からの男女判別に与える影響ー高齢者と模擬難聴システムWHISによる実験ー

小森理子, 奥谷友梨, 入野俊夫

日本音響学会聴覚研究会 2020.02.15 (琉球大学（沖縄県中頭郡）) 日本音響学会

Vol. 50 (1), pp. 17-22, H-2020-4
感情推移観測システムによるスキーマ療法における感情表出の定量化に関する予備的検討

仁田雄介, 入野俊夫, 古山宣洋, 花田里欧子, 井上雅史, 門田圭祐, 熊野宏昭

早稲田大学応用脳科学研究所応用脳科学カンファレンス 2020.02.10
Effects of modified auditory feedback simulating age related hearing loss on piano performances

Minoru Tsuzaki, Noriko Maegawa, Chie Ohsawa, Hideki Banno, Toshio Irino

ARO 43rd MidWinter Meeting 2020.01.25 (San Jose, CA, USA) Association for Research in Otolaryngology
Extending the gammachirp model of notched-noise masking to include absolute threshold: Exploring improvements in the fit provided by assuming an internal, level-dependent, cochlear noise floor

Kenji Yokota, Toshio Irino, Roy D. Patterson

ARO 43rd MidWinter Meeting 2020.01.25 (San Jose, CA, USA) Association for Research in Otolaryngology
模擬難聴システム WHIS を用いた拡張聴覚心理実験と演習

野崎航, 小森理子, 吉木華子, 松井淑恵, 入野俊夫

第22回関西支部若手研究者交流研究発表会 2019.11.30 (大阪産業大(大阪市)) 日本音響学会関西支部

#14 (poster)
ささやき声のピッチ感は寸法知覚に影響を与えるか? ー計算モデルによる検討ー

上村怜央, 入野俊夫, Roy D. Patterson

第22回関西支部若手研究者交流研究発表会 2019.11.30 (大阪産業大(大阪市)) 日本音響学会関西支部

#15 (poster) (first author 上村怜央 received the 優秀奨励賞 (Excellence Encouragement Award), top 4 of 39 presentations)
模擬難聴システム WHIS を用いた発声訓練音声の韻律特徴分析

東山宗一, 吉木華子, 入野俊夫

第22回関西支部若手研究者交流研究発表会 2019.11.30 (大阪産業大(大阪市)) 日本音響学会関西支部

#15 (poster) (first author 東山宗一 received the 奨励賞 (Encouragement Award), top 6 of 39 presentations)
音響システムの各種特性の計測における周波数領域velvet noiseの応用について

河原英紀, 榊原健一, 水町光徳, 森勢将雅, 坂野秀樹, 入野俊夫

音響研究会（EA）/聴覚研究会 2019.10.28 (東京 (EA, ASJ-H)) NHK放送技術研究所

28-29 October 2019
加齢性難聴によりピアノ奏者は何か変わるか

津崎実, 前川典子, 大澤智恵, 坂野秀樹, 入野俊夫 [Invited]

日本音響学会春季研究発表会 2019.09.06 (立命館大学びわこ・くさつキャンパス,滋賀県草津市) 日本音響学会

春季研究発表会講演論文集, 3-2-6, pp. 1333-1336, 4-6 Sep 2019
模擬難聴システムと聴覚・音声実験への応用

入野俊夫 [Invited]

日本音響学会春季研究発表会 2019.09.06 (立命館大学びわこ・くさつキャンパス,滋賀県草津市,) 日本音響学会

春季研究発表会講演論文集, 3-2-4, pp. 1329-1330, 4-6 Sep 2019
通常発声とささやき声を対比した寸法知覚の計算モデル

上村怜央, 入野俊夫, Patterson Roy D

日本音響学会：春季研究発表会講演論文集 2019.09.04 (立命館大学びわこ・くさつキャンパス,滋賀県草津市,) 日本音響学会

1-R-2, pp. 579-582
聴覚フィルタ推定における蝸牛雑音フロアの設定法について

横田健治, 入野俊夫, Patterson Roy D

日本音響学会：春季研究発表会講演論文集 2019.09.04 (立命館大学びわこ・くさつキャンパス,滋賀県草津市,) 日本音響学会

1-R-16, pp. 615-616
ＤＮＮ音声認識システムによる単語了解度予測

新井賢一, 荒木章子, 小川厚徳, 木下慶介, 中谷智広, 山本克彦, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2019.09.04 (立命館大学びわこ・くさつキャンパス,滋賀県草津市,) 日本音響学会

3-P-34, pp. 703-706
Modification of piano performance by simulated hearing loss: Analyses on the key velocities and output powers

Minoru Tsuzaki, Noriko Maegawa, Chie Ohsawa, Hideki Banno, Toshio Irino

International Symposium on Performance Science 2019 (ISPS2019) 2019.07.16
模擬難聴システムの教育・臨床・研究への適用と言語聴覚士による評価

長谷川純, 吐師道子, 松井淑恵, 入野俊夫

第20回日本言語聴覚学会 2019.06.28 (iichiko総合文化センター他, 大分) 日本言語聴覚学会

1-P03-4, 28-29 Jun 2019, http://www.congre.co.jp/jaslht20/
Hearing impairment simulator: its background and applications

Toshio Irino [Invited]

2019 The 2nd Japan-Taiwan Symposium Psychological and Physiological Acoustics — Inclusive Sound Design 2019.05.17 (National Yang Ming University, Taipei)

https://2019-jptw-symp.github.io
言語聴覚士教育における模擬難聴システムを使用した演習の効果

長谷川純, 吐師道子, 松井淑恵, 入野俊夫

第45回日本コミュニケーション障害学会 2019.05.12 (川崎医療福祉大学, 倉敷)

http://jacd45.umin.jp/program.html
模擬難聴システムを用いた発声訓練が発話長に与える効果とその持続性

東山宗一, 入野俊夫, 山内悠記

日本音響学会：春季研究発表会講演論文集,2-3-1 2019.03.05 (東京都調布市) 電気通信大学
通常発声とささやき声を対比した場合の寸法知覚

上村怜央, 入野俊夫, Roy D. Patterson

日本音響学会：春季研究発表会講演論文集,3-P-24 2019.03.05 (東京都調布市) 電気通信大学
ノッチ雑音レベルに依存した蝸牛雑音を考慮した聴覚フィルタ特性推

横田健治, 入野俊夫, 松浦弘樹, Roy D. Patterson

日本音響学会：春季研究発表会講演論文集,3-P-40, 2019.03.05 (東京都調布市) 電気通信大学
模擬難聴を使った聴力低下による音声寸法弁別特性への影響

米満麻弥, 入野俊夫, 上村怜央, Roy D. Patterson

日本音響学会：春季研究発表会講演論文集,3-P-23 2019.03.05 (東京都調布市) 電気通信大学
レベル依存性のある蝸牛雑音フロアを考慮した聴覚フィルタ特性の推定

横田健治, 入野俊夫, 松浦弘樹, Roy D. Patterson

聴覚研究会 2018.12.14 (福岡市) 九州大

14-15 December 2018
ガンマチャープ聴覚フィルタバンクに基づく模擬難聴システムの実装と教育応用

松井淑恵, 坂野秀樹, 西村竜一, 入野俊夫

電子情報通信学会, 音声研究会/福祉工学研究会 2018.10.27 (九州工大(北九州市)) 電子情報通信学会, 音声研究会/福祉工学研究会

vol. 118, no. 269, SP2018-38, pp. 31-36
The gammachirp auditory filter and its application to speech perception

Toshio Irino, Roy D. Patterson [Invited]

International Symposium on Universal Acoustical Communication 2018 2018.10.24 (Tohoku University, Sendai)
複数の雑音条件下における共通パラメータを用いた音声了解度予測

山本克彦, 入野俊夫, 荒木章子, 木下慶介, 中谷智広

秋季研究発表会講演論文集 2018.09.12 (大分大学旦野原キャンパス(大分県大分市)) 日本音響学会

2-P-42, pp. 897-898, 12-14 Sep 2018
敵対的生成ネットワークを用いた楽曲の自動コード推定法の検討

納庄貴大, 西村竜一, 入野俊夫

第120回音楽情報科学研究会(夏のシンポジウム) 2018.08.22 (広島工業大学五日市キャンパス講義棟「三宅の森 Nexus21」 9F(広島県広島市佐伯区)) 情報処理学会

Presentation no. 6, 研究報告音楽情報科学 (MUS), 2018-MUS-120(6), 1-6, 21-23 August 2018
通常発声とささやき声を比較した時の寸法知覚-どちらが小さい話者に聞こえる？

上村怜央, 入野俊夫, Roy D. Patterson

情報処理学会, 音学シンポジウム2018 2018.06.17 (東京大学本郷キャンパス(東京都文京区))

Presentation no. 57, 研究報告音楽情報科学 (MUS), 2018-MUS-119(57), 1-6 (2018-06-09), 16-17 June 2018
蝸牛雑音を導入した絶対閾値と聴覚フィルタ特性の同時推定

横田健治, 入野俊夫, 松井淑恵, Roy D. Patterson

情報処理学会, 音学シンポジウム2018 2018.06.17 (東京大学本郷キャンパス(東京都文京区))

Presentation no. 59, 研究報告音楽情報科学 (MUS), 2018-MUS-119(59), 1-5 (2018-06-09), 16-17 June 2018
模擬難聴システムを用いた発話訓練による音声の明瞭性向上の評価

東山宗一, 入野俊夫

情報処理学会, 音学シンポジウム2018 2018.06.17 (東京大学本郷キャンパス(東京都文京区))

Presentation no. 55, 研究報告音楽情報科学 (MUS), 2018-MUS-119(55), 1-6 (2018-06-09), 16-17 June 2018
臨床心理面接における傾聴度変化の評価−臨床心理士と初学者の比較

花田里欧子, 中島隆太郎, 井上雅史, 古山宣洋, 入野俊夫

人工知能学会全国大会(第28回) 2018.06.05 (城山観光ホテル(鹿児島市))

3C1-OS-14a-02, 5-8 June 2018
Effet différencié d’un simulateur de perte auditive sur la compression cochléaire et la sélectivité fréquentielle

Nicolas Grimault, Toshio Irino, Samar Dimachki, Alexandra Corneyllie, Roy D. Patterson, Samuel Garcia

CFA 18 - French Congress of Acoustics, Le Havre, 23-27 April 2018. 2018.04
Intelligibility of speech with additive bubble noise and enhancement under hearing impairment simulation

大橋成美, 余村直子, 山本克彦, 荒木章子, 木下慶介, 中谷智広, 入野俊夫

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 2018.03.19 (ホテルミヤヒラ(沖縄石垣市))

電子情報通信学会音声研究会, 信学技報, vol. 117, no. 517, SP2017-99, pp. 87-92
低雑音レベルを含めたノッチ雑音マスキング閾値と聴覚フィルタ推定

横田健治, 入野俊夫, ロイ D. パターソン

日本音響学会 2018.03.13 (日本工業大学宮代キャンパス（埼玉県南埼玉郡))

春季研究発表会講演論文集,2-P-17, pp.691-692, 13-15 Mar 2018
Annotating Compliments

井上雅史, 中島隆太郎, 花田里欧子, 古山宣洋, 入野俊夫

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 2018.03.13 東北大学電気通信研究所(宮城県,仙台市)

電子情報通信学会ヒューマンコミュニケーション基礎研究会 (HCS), vol. 117, no. 509, HCS2017-95, pp. 11-15, 13-14 March 2018
振幅包絡歪み指標に基づくバブル雑音下の音声明瞭予測

山本克彦, 大橋成美, 入野俊夫, 荒木章子, 木下慶介, 中谷智広

日本音響学会 2018.03.13 (日本工業大学宮代キャンパス（埼玉県南埼玉郡))

春季研究発表会講演論文集,3-P-7, pp.1305-1308, 13-15 Mar 2018
小型ボードコンピュータ Raspberry Piを用いた笑い声の収集

Toshio Irino

日本音響学会 2018.03.13 (日本工業大学宮代キャンパス（埼玉県南埼玉郡))

春季研究発表会講演論文集,2-Q-22, pp.199-200, 13-15 Mar 2018
Possible application of velvet noise and its variant in psychology and physiology of hearing

河原英紀, 津崎実, 坂野秀樹, 森勢将雅, 松井淑恵, 入野俊夫

日本音響学会聴覚研究会 2018.03.03 (沖縄産業支援センター(沖縄県那覇市))

信学技報, vol. 117, no. 470, HIP2017-113, pp. 99-104, 3-4 March 2018
Enhancing wave-I of auditory brainstem response by choosing the latency of rising-frequency chirp

Takashi Morimoto, Yoh-ichi Fujisaka, Yasuhide Okamoto, Toshio Irino

ARO 41st midwinter meeting, Abstract PS-33 San Diego, CA, USA, 9-14 Feb., 2018. 2018.02

(presented 10 Feb.)
Incorporating absolute threshold and a cochlear noise floor into the GammaChirp model of masking

Toshio Irino, Kenji Yokota, Toshie Matsui, Roy D. Patterson

ARO 41st midwinter meeting, Abstract PS-800 San Diego, CA, USA, 9-14 Feb., 2018. 2018.02

(presented 12 Feb.)
Evaluation of active listening in psychotherapy: Comparison of clinical psychologists and students

HANADA Ryoko, NAKAJIMA Ryutaro, INOUE Masashi, FURUYAMA Nobuhiro, IRINO Toshio

Proceedings of the Annual Conference of JSAI 2018 The Japanese Society for Artificial Intelligence

Active listening is one of the indispensable axes in evaluating psychotherapy dialogue. Although it has been discussed in clinical psychology, a method for evaluating active listening has been missing, and one needs to be established to improve the quality of listening in interviews. The authors have proposed a method for measuring the degree of active listening with a device originally developed to evaluate emotion (the EMO system). This paper reports on an experiment that compared evaluations of a psychotherapy session by expert clinical psychologists with those made by undergraduate students as a coursework task in a clinical psychology course. A new experimental setup is proposed that includes a multiresolution analysis to detect changes in the active listening evaluation.
聴覚モデル適合の改良のための低レベルノッチ雑音も含めた閾値

横田健治, 入野俊夫, 松井淑恵, Roy D. Patterson

日本音響学会関西支部,第20回関西支部若手研究者交流研究発表会 2017.12.16 (同志社大学(京田辺市))

#17 (poster)
音響教育のためのスピーカ及び簡易音圧確認治具

岩城龍之介, 松浦弘樹, 櫻井梨七, 中川望己, 奥谷友梨, 山内悠記, 上村怜央, 東山宗一, 横田健治, 入野俊夫

日本音響学会関西支部,第20回関西支部若手研究者交流研究発表会 2017.12.16 (同志社大学(京田辺市))

#1 (demo presentation)
雑音抑圧で音声は聴き取りやすくなる？ーバブル vs ピンクお邪魔対決ー

大橋成美, 山本克彦, 入野俊夫, 荒木章子, 木下慶介, 中谷智広

日本音響学会関西支部,第20回関西支部若手研究者交流研究発表会 2017.12.16 (同志社大学(京田辺市))

#18 (poster)
加齢によるピッチ・シフト現象とピッチ・モデル:モデルで見落とされてきた側面

津崎実, 牧勝弘, 入野俊夫

日本基礎心理学会第 36 回大会 2017.12.01 (立命館大学・大阪いばらきキャンパス(大阪府茨木市）)

1-3 Dec 2017
変調スペクトル領域の信号対歪み比に基づく音声明瞭度予測法の提案

山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広

信号処理シンポジウム講演論文集(CD-ROM) 2017.11.08

B8-4, pp.372-377, マリオス盛岡地域交流センター(岩手県盛岡市), 8-10 Nov. 2017
「風力発電所計画厳しい意見続々県環境影響審査会」

入野俊夫

2017.10.05 朝日新聞（p.22 和歌山面）
グループワーク対話の分析を通じた盛り上がりの定量化の検討

三上菜穂, 西村竜一, 入野俊夫

日本音響学会 2017.09.25 (愛媛大学(愛媛県松山市))

日本音響学会：秋季研究発表会講演論文集,1-R-21, pp.113-114, 25 - 27 Sep 2017
高齢難聴者の文聴取における文節休止の効果―模擬難聴システムによる検討―

長谷川純, 畑山春菜, 吐師道子, 松井淑恵, 入野俊夫

第18回日本言語聴覚学会 2017.06.23 (くにびきメッセ－島根県立産業交流会館－(島根県松江市))

23-24 June 2017
A computational model of speaker size perception for voiced speech sounds

瀧本恵理, 入野俊夫, 松井淑恵, PATTERSON Roy D

情報処理学会, 音学シンポジウム2017 2017.06.18 (お茶の水女子大(東京都文京区))

Presentation no. 55, 情報処理学会研究報告, Vol. 2017-MUS-115, No. 55, pp. 1-6, 17-18 June 2017
Enhancing high-frequency components affects size discrimination of voiced speech sounds

松井淑恵, 入野俊夫, 山本航大, 河原英紀, PATTERSON Roy D

情報処理学会, 音学シンポジウム2017 2017.06.18 (お茶の水女子大(東京都文京区))

Presentation no. 44, 情報処理学会研究報告, Vol. 2017-MUS-115, No. 44, pp. 1-6, 17-18 June 2017
模擬難聴システムの教育・臨床・研究への適用

長谷川純, 吐師道子, 山下祐季, 畑山春菜, 松井淑恵, 入野俊夫

広島県言語聴覚士会学術集会 2017.06.04 (県立広島大(広島県三原市))
Hearing impairment simulator for training course of speech therapists and development of its web application

米満麻弥, 入野俊夫, 松井淑恵, 西村竜一, 吐師道子, 長谷川純

電子情報通信学会ヒューマン情報処理研究会 (HIP) ,ヒューマンコミュニケーション基礎研究会 (HCS) 合同研究会 2017.05.16 (沖縄産業支援センター(沖縄県那覇市))

信学技報, vol. 117, no. 30, HIP2017-42, pp. 277-282, 16-17 May 2017
Hearing impairment simulator using the dynamic compressive gammachirp filterbank and its application

Toshio Irino

日本音響学会関西支部, 聴覚基礎理論談話会／ (科研A)^2 合同ミーティング 2017.03.28 (京都市芸術大学(京都府京都市))
「映画・ゲームの「足音」リアルに−和歌山大、歩行データから自動合成」

入野俊夫

2017.03.28 日刊工業新聞
ユーザ訂正情報に基づいた音声認識API出力の並び替え法の開発

遠山智明, 西村竜一, 入野俊夫

日本音響学会：春季研究発表会講演論文集,1-Q-12, pp. 113-114 2017.03.15 (明治大学(神奈川県川崎市))

15-17 Mar 2017
Correspondence between tags assigned via microcounseling and the listening assessment of evaluators as determined by the EMOtional MOvement Observation system (EMO system)

花田里欧子, 入野俊夫, 古山宣洋, 井上雅史, 中島隆太郎

電子情報通信学会ヒューマンコミュニケーション基礎(HCS)研究会 2017.03.15 (東北大学(宮城県仙台市))

信学技報, vol. 116, no. 524, HCS2016-110, pp. 113-118, 2017年3月15-16日
非対称レベルノッチ雑音マスキング法による高齢者の聴覚フィルタ形状と圧縮特性の推定

稲部葉月, 松井淑恵, 西村友里, PATTERSON Roy D, 入野俊夫

日本音響学会：春季研究発表会講演論文集,2-Q-29, pp.705-706 2017.03.15 (明治大学(神奈川県川崎市))

15-17 Mar 2017 (筆頭著者稲部葉月、「学生優秀発表賞 (第15回)」受賞)
Correspondence between tags assigned via microcounseling and the listening assessment of evaluators as determined by the EMOtional MOvement Observation system (EMO system)

花田里欧子, 入野俊夫, 古山宣洋, 井上雅史, 中島隆太郎

電子情報通信学会技術研究報告 2017.03.08
Active listening learning support for counselors by employing a psychological counseling corpus and the EMOtional MOvement Observation system (EMO system)

花田里欧子, 入野俊夫, 古山宣洋, 井上雅史, 中島隆太郎

電子情報通信学会ヒューマンコミュニケーション基礎(HCS)研究会 2017.01.27 (なみきスクウェア (福岡県福岡市))

信学技報, vol. 116, no. 436, HCS2016-60, pp. 5-10, 2017年1月27-28日
難聴者に聞こえやすい音声特徴 ~模擬難聴を用いた発声の振幅変調分析~

吉田駿, 山本克彦, 西村竜一, 松井淑恵, 入野俊夫

日本音響学会関西支部,第19回関西支部若手研究者交流研究発表会 2016.12.18 (関西大学100周年記念会館(大阪府吹田市))

#44 筆頭著者吉田駿、「奨励賞」受賞
深層学習を用いたゲームコンテンツのための効果音自動生成手法の検討

吉田赳, 入野俊夫, 西村竜一

日本音響学会関西支部,第19回関西支部若手研究者交流研究発表会 2016.12.18 (関西大学100周年記念会館(大阪府吹田市))

#34
非対称レベルノッチ雑音マスキング法における測定点削減検討

西村友里, 入野俊夫, 松井淑恵, Roy D. Patterson

日本音響学会関西支部,第19回関西支部若手研究者交流研究発表会 2016.12.18 (関西大学100周年記念会館(大阪府吹田市))

#51
オージオグラムを動かして聞く! ~Web アプリケーションとしての模擬難聴システムを目指して~

松井淑恵, 米満麻弥, 西村竜一, 入野俊夫

日本音響学会関西支部,第19回関西支部若手研究者交流研究発表会 2016.12.18 (関西大学100周年記念会館(大阪府吹田市))

#52
Estimation of auditory compression and filter shape of elderly listeners using notched noise masking

Toshie Matsui, Toshio Irino, Hazuki Inabe, Yuri Nishimura, Roy D. Patterson

Presented at ASA-ASJ joint meeting 2016, J. Acoust. Soc. Am., 140 Hilton Hawaiian Village Waikiki Beach Resort, Honolulu, Hawaii, 28 Nov. - 2 Dec.2016 2016.12

(発表：1 Dec 2016)
招待講演 Characterizing impairments in compression and filter shape to establish their role in hidden hearing loss

Toshio Irino, Toshie Matsui, Roy D. Patterson [Invited]

ASA-ASJ joint meeting 2016 2016.11.30 (Hilton Hawaiian Village Waikiki Beach Resort, Honolulu, Hawaii,)

28 Nov. - 2 Dec.2016
Analysis of acoustic features for speech intelligibility prediction models

Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani

J. Acoust. Soc. Am., 140,ASA-ASJ joint meeting 2016, Hilton Hawaiian Village Waikiki Beach Resort, Honolulu, Hawaii, 28 Nov. - 2 Dec.2016 2016.11

(発表：29 Nov 2016)
招待講演 Acoustic Scale Processing in the Auditory System

Toshio Irino [Invited]

RIMS Joint Research & CoopMath 2016, Wavelet analysis and signal processing, 2016.10.24 (Kyoto Univ., Kyoto,)

2016 RIMS 共同研究「ウェーブレット解析と信号処理」 , 24-25, Oct 2016.
音声明瞭度予測法dcGC‐sEPSMの諸検討:評価用雑音の特性と予測精度への影響

山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広

日本音響学会研究発表会講演論文集(CD-ROM) 2016.09.14 (富山大学（富山県富山市))

2-P-44, pp. 663-666 2016年9月14日-16日
ユーザ訂正情報を用いた音声認識APIのカスタマイズ手法の検討

遠山智明, 西村竜一, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2016.09.14 (富山大学（富山県富山市))

3-Q-14, pp. 125-126 2016年9月14日-16日
招待講演 The perceptual ends of the periodicity; but of what periodicity?

Minoru Tsuzaki, Sawa Hanada, Junko Sonoda, Satomi Tanaka, Toshio Irino [Invited]

Internoise 2016 2016.08.24 (Hamburg, Germany,)

21-24, Aug 2016.
Predicting speech intelligibility using the dynamic compressive gammachirp filterbank: comparison with the result for enhanced speech

山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広

音学シンポジウム2016 学会研究報告(Web) 2016.05.21 (東海大学(東京都港区))

発表番号20,Vol.2016-MUS-111, No.20,pp.1-6, 2016年5月21日-22日
招待講演 聴覚心理実験に基づいたモデルとその実践応用

入野俊夫, 松井淑恵, 津崎実, 吐師道子 [Invited]

日本音響学会 2016.03.11 (桐蔭横浜大, 横浜,)

春季研究発表会講演論文集, 3-6-2, pp. 1445-1446, 9--11 Mar 2016.
強調音声のための明瞭度予測法の検証:聴取実験結果との比較

山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広

日本音響学会研究発表会講演論文集(CD-ROM) 2016.03.09 (桐蔭横浜大, 横浜)

春季研究発表会講演論文集, 2-P-23, pp. 823-826, 9--11 Mar 2016
スペクトル傾斜の異なる音声の寸法知覚と聴覚モデルによる説明

山本航大, 入野俊夫, 岡本江美, 松井淑恵, 西村竜一, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2016.03.09 (桐蔭横浜大, 横浜)

春季研究発表会講演論文集, 2-Q-13 pp. 481-484, 9--11 Mar 2016
GetWild:音声生成過程を考慮したグロウルの印象付与システム

溝渕翔平, 入野俊夫, 西村竜一, 松井淑恵, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2016.03.09 (桐蔭横浜大, 横浜)

春季研究発表会講演論文集, 2-2-9, pp. 249-252, 9--11 Mar 2016.
ウェブ試験向け音声入力UI設計における不要語の扱いについて

西村竜一, 牧野さやか, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2016.03.09

春季研究発表会講演論文集, 3-1-5 pp. 81-82, 9--11 Mar 2016
言語聴覚士養成課程における模擬難聴の教育利用に向けた試み

永江美沙貴, 入野俊夫, 松井淑恵, 長谷川純, 吐師道子, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2016.03.09 (桐蔭横浜大, 横浜)

春季研究発表会講演論文集, 3-6-12 pp. 1471-1472, 9--11 Mar 2016
非対称レベルノッチ雑音マスキング法を用いた圧縮特性推定と測定点削減の検討

西村友里, 入野俊夫, 松井淑恵, 河原英紀, PATTERSON Roy D

日本音響学会研究発表会講演論文集(CD-ROM) 2016.03.09 (桐蔭横浜大, 横浜)

春季研究発表会講演論文集, 3-6-8, pp. 1459-1462 9--11 Mar 2016
声道形状と声帯音源特性の操作に基づいたグロウル系歌唱の印象付与法

溝渕翔平, 西村竜一, 松井淑恵, 入野俊夫, 河原英紀

電子情報通信学会論文誌 D(Web) 2016.03
An improvement of the predicting method for speech intelligibility using the dynamic compressive gammachirp filterbank

山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広

聴覚研究会資料 = Proceedings of the auditory research meeting 2016.02.20 (那覇市IT創造館, 沖縄)

Vol.46, No.1, H-2016-9, pp.25--40, 2016年2月20日-21日
招待講演 模擬難聴とそれを支える聴覚心理実験

Toshio Irino [Invited]

県立広島大保健福祉学部コミュニケーション障害学科セミナー 2016.02.17 (県立広島大保健福祉,三原, 広島,)
音声生成過程を考慮したグロウルの印象付与システム~あなたの声にこぶし、効かせます~

溝渕翔平, 入野俊夫, 西村竜一, 松井淑恵, 河原英紀

第18回関西支部若手研究者交流研究発表会 2015.12.13 (関西大学100周年記念会館,大阪) 日本音響学会関西支部

#36
強調音声の明瞭度 -計算機は人の聞こえを予測できる？-

山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広

第18回関西支部若手研究者交流研究発表会 2015.12.13 (関西大学100周年記念会館,大阪) 日本音響学会関西支部

#42 筆頭著者山本克彦、「最優秀奨励賞」受賞
The enhancing high-frequency components of unvoiced sounds impacts on the size perception

山本航大, 入野俊夫, 岡本江美, 松井淑恵, 西村竜一, 河原英紀

日本音響学会聴覚研究会資料 = Proceedings of the auditory research meeting 2015.11.13 (甲州市勝沼ぶどうの丘, 山梨)

Vol.45, No.8, H-2015-120, pp.681--686 2015年11月13日-14日
脳波を用いた時間分解能測定

森本隆司, 藪下岳, 藤坂洋一, 中市健志, 入野俊夫, 岡本康秀, 貫野彩子, 神崎晶, 小川郁

日本音響学会聴覚研究会資料 2015.11.13 (甲州市勝沼ぶどうの丘, 山梨)

Vol.45, No.8, H-2015-119, pp.675--680
招待講演 A perceptual continuum for pitch transition with no chromatic change: A challenge for a new model of pitch

Minoru Tsuzaki, Sawa Hanada, Katsuhiro Maki, Toshio Irino, Toshie Matsui, Chihiro Takeshima [Invited]

Taiwan/Japan Joint Auditory Research Meeting, National Tsing Hua University, Taiwan, 2015.10.23 (国立清華大学, 台湾)

日本音響学会聴覚研究会資料, Vol. 45, No.7, H-2015-105, pp.--, 23--24 Oct. 2015. (発表：23 Oct 2015)
Study on predicting speech intelligibility of enhanced speech sounds using the dynamic compressive gammachirp auditory filterbank and modulation filterbank

Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani

presented at Taiwan/Japan Joint Auditory Research Meeting, National Tsing Hua University, Taiwan, 音響学会聴覚研究会資料 2015.10

国立清華大学, 台湾, 23--24 Oct. 2015 (発表：23 Oct 2015). Proc. Auditory Res. Meeting, Acoust. Soc. Japan,
位相差を伴った同一周期のパルス列が加算される場合の音の知覚について

津崎実, 花田沙和, 牧勝弘, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2015.09.18 (会津大, 会津若松,)

秋季研究発表会講演論文集,3-3-5,pp.1309-1312, 2015年9月16日～18日,
Raspberry Piを用いた笑い声検知システムの提案

三上菜穂, 西村竜一, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2015.09.18 (会津大, 会津若松)

秋季研究発表会講演論文集,3-Q-4,pp.149-150, 2015年9月16日～18日
動的圧縮型ガンマチャープフィルタバンクを用いた強調音声の明瞭度予測法の提案

山本克彦, 入野俊夫, 荒木章子, 木下慶介, 中谷智広

日本音響学会研究発表会講演論文集(CD-ROM) 2015.09.17 (会津大, 会津若松,)

秋季研究発表会講演論文集,2-P-36,pp. 473-474, 2015年9月16日～18日, 筆頭著者山本克彦、「学生優秀発表賞」受賞
非対称レベルノッチ雑音マスキング法を用いた1kHzにおける圧縮特性推定

西村友里, 入野俊夫, 松井淑恵, 河原英紀, PATTERSON Roy D

日本音響学会研究発表会講演論文集(CD-ROM) 2015.09.17 (会津大, 会津若松,)

秋季研究発表会講演論文集,2-P-33,pp.467-468, 2015年9月16日～18日
言語聴覚士養成教育への模擬難聴の導入の試みについて

永江美沙貴, 入野俊夫, 松井淑恵, 長谷川純, 吐師道子, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2015.09.17 (会津大, 会津若松,)

秋季研究発表会講演論文集,2-5-7, pp.1229-1230, 2015年9月16日～18日,
大人・子ども話者識別システムにおける性能改善の検討

西村竜一, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2015.09.16 (会津大, 会津若松)

秋季研究発表会講演論文集,1-2-12, pp.29-30, 2015年9月16日～18日
音声科学教育用対話的ツールのためのエリアシングの無い L-F モデルの実装について

河原英紀, 榊原健一, 坂野秀樹, 森勢将雅, 戸田智基, 入野俊夫

日本音響学会聴覚研究会, 電子情報通信学会／音響学会電気音響研究会, 電子情報通信学会技術研究報告、EA2015-08, 2015.08.03 (東北大学, 仙台)

2015年8月3日-4日
Hearing Impairment Simulator with Inverse Compression based on the Compressive Gammachirp Filter

Toshio Irino, Misaki Nagae, Toshie Matsui, Hideki Kawahara, Roy D. Patterson

Auditory Model Workshop Universität Oldenburg, Oldenburg, 12-13 Jun, 2015 2015.06
声道形状と声帯音源特性の操作に基づくグロウル系歌唱音声の印象付与法の評価について

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

音学シンポジウム2015 2015.05.24 (電気通信大学, 東京) 情報処理学会

発表番号60, 情報処理学会研究報告,Vol.2015-MUS-107,No.60,pp.1-6, 2015年5月23日-24日
Evaluation of singing voice conversion to growl-like singing based on vocal tract shape and glottal source characteristics

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

情報処理学会研究報告(Web) 2015.05
声道形状と声帯音源特性を利用したグロウル系歌唱音声への変換について

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2015.03.18 (中央大, 東京)

3-2-7,pp.289-290 2015年3月16日～18日
スマホを用いた環境音認識アプリに対するDNNの導入

松山みのり, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2015.03.17 (中央大, 東京)

2-1-14,pp.79-80 2015年3月16日～18日
非対称レベルノッチ雑音マスキング法による4kHzにおける圧縮特性推定

金内由紀, 入野俊夫, 西村竜一, 河原英紀, PATTERSON Roy D

日本音響学会：春季研究発表会講演論文集 2015.03.17 (中央大, 東京)

2-Q-12,pp.505-506 2015年3月16日～18日
聴覚の圧縮特性のキャンセル処理による模擬難聴―語音明瞭度による検討―

永江美沙貴, 松井淑恵, 西村竜一, 河原英紀, PATTERSON Roy D, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2015.03.17 (中央大, 東京)

2-Q-20,pp.523-524, 2015年3月16日～18日
無声音の高域強調処理による寸法知覚特性シフト

山本航大, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：春季研究発表会講演論文集 2015.03.17 (中央大, 東京)

2-Q-18,pp.517-518 2015年3月16日～18日
声道断面積関数推定における声帯音源特性の補償について

伊佐衣代, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2015.03.16 (中央大, 東京)

1-2-4,pp.231-232 2015年3月16日～18日
音声の好感度に対する声道形状および音源情報操作の効果について

吉元照貴, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2015.03.16 (中央大, 東京)

1-R-32,pp. 351-332 2015年3月16日～18日
ウェブアプリケーションにおける音声入力UIの設計と評価について

田藤千弘, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2015.03.16 (中央大, 東京)

1-P-33,pp. 191-192 2015年3月16日～18日
周期信号の短時間Fourier変換に基づく静的表現と音声分析合成系への応用について

河原英紀, 森勢将雅, 坂野秀樹, 戸田智基, 榊原健一, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2015.03.16 (中央大, 東京)

1-R-18,pp. 313-314 2015年3月16日～18日
SEANA: 利用者の動作を強調する音の拡張現実アプリの開発

吉田赳, 西村竜一, 入野俊夫, 河原英紀

情報処理学会, インタラクション2015 2015.03.07 (東京国際交流館)

pp.972--977 2015年3月5日〜7日
Statistical modelling of an F0 estimation method based on higher-order waveform symmetry and its application to filled pause analysis

KAWAHARA Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report. Speech 2015.03.03 (南の美ら花ホテルミヤヒラ, 沖縄)

A robust method for tracking the F0 trajectory is proposed, in which an initial estimate is followed by a refinement procedure based on a temporally static instantaneous frequency. The proposed initial estimation method is based on a higher-order waveform symmetry measure, which is computationally efficient and has finer temporal resolution. This proposal aims at analysing filled pauses, which are frequently observed in spontaneous speech used in everyday situations. Instabilities of vocal fold vibration usually found in filled pauses, which cause commonly used F0 extractors to fail, motivated the development of this new F0 extraction method.
Improving voice attractiveness by speech parameter modification for interactive voice training applications

吉元照貴, 西村竜一, 入野俊夫, 河原英紀

IPSJ SIG Notes 2015.03.03 (甲府富士屋ホテル, 山梨)

A simple voice training system for improving attractiveness is introduced, together with a description of the set of procedures that make up the system. Those procedures are based on findings drawn from our investigations on voice attractiveness using a new voice morphing method. They are summarized as follows. a) The physical factors contributing most to attractiveness are fundamental frequency and spectral information. b) Attractiveness judgement differs among listeners. c) A change in the perceived talker of the modified voice, caused by physical parameter manipulation for improving voice attractiveness, disturbs the listener's judgement and adjustment. To overcome the last disturbing factor, within-talker changes in physical parameters that improve attractiveness were acquired by recruiting student actors at our university. Several sets of physical parameter changes were applied to improve the attractiveness of voices with lower attractiveness scores. The attractiveness of the voices modified using these sets of physical parameter changes was tested for all possible combinations of the source actor, the talkers of the manipulated voices, and the listeners. The proposed voice training system is introduced based on the results of the tests.
Realtime singing voice conversion to growl-like singing based on vocal tract shape and glottal source characteristics

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

IPSJ SIG Notes 2015.03.03 (甲府富士屋ホテル, 山梨)

An outline of a system to convert a usual singing voice to a growl-like performance in realtime is introduced. Relatively high-speed periodic variations (around 70 Hz) in spectral shapes and fundamental frequency trajectories were found to be dominant features of growl-like singing in our previous investigations. A set of simulations revealed that these spectral shape variations can be closely replicated by introducing vocal tract shape variations around supra-glottal structures and shape variations in the glottal source waveform using the LF-model. Although realtime extraction of LF parameters from the input voice is not feasible, the simulation results indicated that the net effect of the variation can be represented by simple spectral slope variations. For vocal tract shape variation, several sets of spectral models for approximating the simulated variations can be suggested. These results indicate that, by using these approximated models, it is possible to design a realtime system for converting usual singing voices to growl-like voices.
Change of size perception when enhancing high-frequency components of unvoiced sounds and its computational theory

山本航大, 入野俊夫, 西村竜一

日本音響学会聴覚研究会資料 2015.03.02 (北海道医療大学札幌サテライトキャンパス, 北海道)

Vol.45, No.2, H-2015-21, pp.99-104
Syllable identification of speech sounds processed by a hearing impairment simulator which cancels auditory peripheral compression

松井淑恵, 入野俊夫, 永江美沙貴, 河原英紀, Roy D. Patterson

日本音響学会聴覚研究会資料 2015.03.02 (北海道医療大学札幌サテライトキャンパス, 北海道)

Vol.45, No.2, H-2015-20, pp.93-98
Age Related Shifts of Absolute Pitch Judgment and Their Relation to the Auditory Filter Bandwidths.

Minoru Tsuzaki, Toshie Matsui, Toshio Irino, Chihiro Takeshima

ARO 38th midwinter meeting 2015 Abstract PS-319, 2015.02

Baltimore, MD, USA, 21-25 Feb., 2015. (発表日 22 Feb.)
声道断面積関数推定における音源情報の利用の効果について

伊佐衣代, 西村竜一, 入野俊夫, 河原英紀

日本音響学会関西支部, 第17回関西若手研究者交流研究発表会, #17 2014.12.14 (関西大学100周年記念会館,大阪)
音声の発話方法による聴き取りやすさの違いの検討〜一人芝居の声で比べてみた〜

吉田駿, 入野俊夫, 河原英紀, 西村竜一

日本音響学会関西支部,第17回関西支部若手研究者交流研究発表会,#34 2014.12.14 (関西大学100周年記念会館,大阪)
DNNを用いたスマホ収集環境音の認識について

松山みのり, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

日本音響学会関西支部,第17回関西支部若手研究者交流研究発表会,#18 2014.12.14 (関西大学100周年記念会館,大阪)
声道形状を利用したグロウル系歌唱音声への変換について

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

日本音響学会関西支部,第17回関西支部若手研究者交流研究発表会,#33 2014.12.14 (関西大学100周年記念会館,大阪)
聴覚系の寸法知覚における手がかり情報に関する検討ー聴覚心理実験の側面よりー

山本航大, 入野俊夫, 西村竜一, 河原英紀

日本音響学会関西支部，第17回関西支部若手研究者交流研究発表会，#42 2014.12.14 (関西大学100周年記念会館,大阪)
Nonlinearity and Wavelet property of the auditory filterbank suitable for scale analysis in the auditory system (Wavelet analysis and sampling theory)

Irino Toshio, Kawahara Hideki, Patterson Roy D

RIMS Kokyuroku 2014.12
招待講演 The role of STRAIGHT in research on the perception of size in speech and music

Roy D. Patterson, Toshio Irino [Invited]

[聴覚/音声研究会招待講演], 和歌山, 2014.10.24 (ホテルシーモア（白浜）, 和歌山,)

Fifteen years ago, while working on the mathematics of the gammachirp auditory filter, we realized that the perception of speech and music is largely scale invariant. People understand the speech of other people no matter what their average voice pitch or their mean formant frequency. People also know the family of an instrument (brass, string or woodwind) independent of its size and register. We illustrated how the auditory system could use a form of "stabilized wavelet-Mellin transform" to normalize the sounds of speech and music, and we decided to do some research on the perceptual invariance of speech and musical sounds. This was easier said than done, as it requires the manipulation of the acoustic scale variables in natural sounds. Fortunately, at about the same time, Kawahara-sensei released STRAIGHT which provided high-fidelity manipulation of the pitch and vocal tract length of speech sounds and musical tones. This paper describes a sequence of experiments on the perception of size using sounds in which the scale parameters were manipulated by STRAIGHT, and how the resynthesis element of STRAIGHT was adapted for musical sounds. The research provides one extended example of how STRAIGHT has empowered research on the perception of natural sounds.
Invited talk : The role of STRAIGHT in research on the perception of size in speech and music

PATTERSON Roy D, 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting 2014.10.23
Pre-processing for vocal tract area function estimation using linear prediction analysis

ISA Kinuyo, YOSHIMOTO Shoki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech 2014.10.23 (南紀白浜温泉ホテルシーモア, 和歌山)

Estimation of the vocal tract area function based on linear predictive analysis suffers from biasing factors such as the glottal waveform and radiation from the mouth. Preprocessing procedures for compensating these effects, consisting of high-frequency emphasis and spectrum flattening, were investigated. Analysis results using these procedures on a vowel database are also introduced.
Investigations on estimated vocal tract area functions of growl-like singing voices

MIZOBUCHI Shohei, ISA Kinuyo, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech 2014.10.23 (南紀白浜温泉ホテルシーモア, 和歌山)

Behavior of vocal tract area functions estimated from growl-like singing voices was investigated to introduce a simple model for generating synthetic growl-like singing. Our previous study revealed that a fast modulation of spectral shape around 2 to 4 kHz is the most significant feature of growl-like singing. LPC-based vocal tract shape estimation with relevant preprocessing procedures was applied to growl-like singing and normal singing voices.
音声認識を用いた日本語スピーキングテストとそのユーザインタフェースデザインの検討

田藤千弘, 西村竜一, 河原英紀, 入野俊夫, 今井新悟

教育システム情報学会全国大会講演論文集(CD-ROM) 2014.09.10 (和歌山大学, 和歌山,)

発表番号I1-32, pp.63-64, 2014年9月10日-12日
聴覚の圧縮特性の逆処理による模擬難聴とその特性

永江美沙貴, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2014.09.05 (北海学園大, 札幌,)

3-Q-27,pp.457-458, 2014年9月3日〜5日
うっかり者を手助けする環境音認識アプリの開発について

松山みのり, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2014.09.05 (北海学園大, 札幌,)

3-8-14,pp.1559-1560, 2014年9月3日〜5日
Shifts in the absolute pitch judgment by aging and its relation to the otoacoustic emissions

津崎実, 松井淑恵, 入野俊夫

日本音響学会研究発表会講演論文集日本音響学会編 2014.09.05 (北海学園大, 札幌,)

3-Q-37,pp.489-482, 2014年9月3日〜5日
招待講演 聴覚末梢系の圧縮特性の心理物理測定と模擬難聴への応用

Toshio Irino [Invited]

日本音響学会 2014.09.04 (北海学園大, 札幌,)

秋季研究発表会講演論文集, 2-2-8, pp.1579-1582, 2014年9月3日～5日,
声道形状と音源情報に注目した音声の好感度改善システムの検討について

吉元照貴, 伊佐衣代, 溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2014.09.04 (北海学園大, 札幌,)

2-Q-46,pp. 373-375, 2014年9月3日〜5日
周期信号の群遅延の静的表現と音声の非周期成分への応用について

河原英紀, 森勢将雅, 榊原健一, 戸田智基, 坂野秀樹, 西村竜一, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2014.09.03 (北海学園大, 札幌,)

1-R-30,pp.273-276, 2014年9月3日〜5日
線形予測分析を用いた声道断面積関数推定のための前処理について

伊佐衣代, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2014.09.03 (北海学園大, 札幌,)

1-R-34,pp.283-284, 2014年9月3日〜5日
時間分解能の低下を模擬した劣化音声の知覚

森本隆司, 中市健志, 原田耕太, 岡本康秀, 神崎晶, 小川郁, 入野俊夫

第11回日本聴覚医学会内耳ひずみ研究会 2014.07.04 (慶應大病院,東京)
A GUI for manipulating growl-like taste in singing voice

MIZOBUCHI SHOHEI, NISHIMURA RYUICHI, IRINO TOSHIO, KAWAHARA HIDEKI

IEICE technical report. Speech 2014.05.25 (日本大学文理学部, 東京,)

A set of GUIs is designed to add and manipulate growl-like taste in singing voice based on a set of simple signal processing procedures, proposed in our previous report. It consists of a temporal axis modulator for simulating rapid F0 variations, an equalizer to modify global spectral shape, and an approximate time varying filter for simulating rapid spectral modulation around F3 area. The proposed set of procedures is potentially applicable to realtime applications, such as live performance. This set of GUIs will be presented in the poster session for demonstrating possibilities of the proposed procedures and acquiring feedback and comments from prospective participants.
Design of voice-enabled web test system for eliminating users' impatience

TAFUJI Chihiro, NISHIMURA Ryuichi, KAWAHARA Hideki, IRINO Toshio

IEICE technical report. Speech 2014.05.25 (日本大学文理学部, 東京,)

We have investigated the user interface (UI) design of a web-based test system with a voice input function. As visual feedback to the examinee, a time gauge indicating the remaining answer time and a level meter for checking the input state of the speech are located on the screen of our system that displays the questions. In the previous UI, the similarity of the two visual presentations often confused the examinees. In order to provide appropriate presentations of the questions on the web screen, we improved the design of the voice-enabled UI. In the experiment for evaluating the improved UI, we developed a system for answering computational questions via the speech web interface. Focusing on the time gauge, we investigated the relationship between time gauge speed and the impatience that users feel when using the system. As a result, we confirmed the suitability of a brick-type time gauge that displays elapsed time with discrete indicators dividing the time into 1-second units. Based on investigations of the relationship between examinees' speaking styles and speech recognition rates, we found a tendency for accuracy to be low for users who were not aware of interacting with a machine. Because we adopted HTML5 as the implementation language of the voice-enabled UI, the improved system could run on Android mobile devices and PCs.
ROCKON : Environmental sound collection and recognition system using smartphones

MATSUYAMA Minori, Tsuda TAKAHIKO, NISHIMURA Ryuichi, KAWAHARA Hideki, YAMADA Junnosuke, IRINO Toshio

IEICE technical report. Speech 2014.05.24 (日本大学文理学部, 東京)

We have been developing an Android mobile application that can provide useful information to users by recognizing environmental sounds around them. This paper evaluates environmental sound recognition methods by comparing AdaBoost with HMMs (Hidden Markov Models). The experimental results showed that AdaBoost obtained better performance in terms of accuracy and processing speed. Further collection of environmental sounds based on the crowdsourcing approach requires an Android app with an improved user interface (UI) for annotating the source type of a sound. Crowdsourcing proved useful for easily developing the sound database. However, we discovered that improvements to the system were necessary to maintain the motivation of trial users so that they would continue the sound collection activity. We developed a new UI that enables users to simply select an appropriate sound source class from a list prepared in advance. In experiments evaluating two types of UIs, a hierarchical type and a list view type, we concluded that there is no significant difference between the two UIs in terms of convenience. In order to utilize the advantages of both types, we implemented an annotation UI that can be switched between the two.
Acquisition and retention of perceptual cue for size judgment using whispered speech

YAMAMOTO Koudai, IRINO Toshio, NISHIMURA Ryuichi, KAWAHARA Hideki

IEICE technical report. Speech 2014.05.24 (日本大学文理学部, 東京)

We have suggested that the auditory system can extract and separate information about vocal tract shape from information about vocal tract length (VTL) (strictly speaking, acoustic scale). Previous research shows that the just noticeable difference (JND) for speech stimuli is about 5%. This is the case when the subjects have acquired the size perception cue. The JND values are not necessarily small, particularly for naive subjects. This paper presents a series of experiments to survey the characteristics of acquisition and retention of the perceptual cue for the size discrimination task. We performed a pretest, training session, posttest, and retention test using whispered words in the same procedure as reported previously. From the results of the first posttest, eight subjects were grouped into a high performance (HP) group and a low performance (LP) group. The HP group performed the retention test after one month to confirm that the JND values remained almost the same. The LP group was trained again to improve the JND values to a level similar to the HP group's values. As a result, given sufficient acquisition of the size perception cue, the JND values become the same as the values reported in the previous studies.
招待講演 The relationship between speaker size perception and the auditory filter

Toshio Irino, Roy D. Patterson [Invited]

J. Acoust. Soc. Am., Vol.135(4), Pt.2, p.2347, May 2014, ASA meeting, 5-9 May 2014. Special session: "Cambridge Contributions to Auditory Science: Moore-Patterson Legacy" (4aPP) 2014.05.08 (Rhode Island, RI, USA,)
クラウドソーシングによる環境音収集に向けたスマホアプリの開発

松山みのり, 津田貴彦, 西村竜一, 山田順之介, 入野俊夫, 河原英紀

電子情報通信学会 2014年総合大会 2014.03.19 (新潟大, 新潟)

D-9-25, pp.15-20, 2014年3月18日〜21日, 筆頭著者松山みのり、「電子情報通信学会H26年度学術奨励賞」受賞
幅広い年齢層の母音データベースを利用した声道長推定法による簡易発声評価システム開発の検討

坂口諒, 小林真優子, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：春季研究発表会講演論文集 2014.03.11 (日本大, 東京)

2-6-5, pp.303-304, 2014年3月10日〜12日
グロウル系統の歌唱音声にみられるスペクトルの時間変動に注目した分析と再現の検討

溝渕翔平, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2014.03.11 (日本大, 東京)

2-Q5-20, pp.499-500, 2014年3月10日〜12日
日本語母音データベースを用いた声道長推定法の校正について

小林真優子, 坂口諒, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2014.03.11 (日本大, 東京)

2-6-6, pp.305-306, 2014年3月10日〜12日
ピーク形状と調波構造に注目したスペクトル包絡の近似精度の改善に関する検討

齊藤啓介, 山口貴史, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2014.03.11 (日本大, 東京)

2-6-8, pp.311-312, 2014年3月10日〜12日
日本語スピーキングテストS‐CATの音声入力インタフェース設計

田藤千弘, 西村竜一, 河原英紀, 入野俊夫, 今井新悟

日本音響学会：春季研究発表会講演論文集 2014.03.11 (日本大, 東京)

2-Q4-11, pp.141-142, 2014年3月10日〜12日
スマートフォンを用いた環境音の収集と認識方法の検討

津田貴彦, 松山みのり, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2014.03.10 (日本大, 東京)

1-P5-14,pp.847-848 2014年3月10日〜12日
STRAIGHTスペクトルを用いた線形予測分析の改良の検討

山口貴史, 齊藤啓介, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2014.03.10 (日本大, 東京)

1-R5-25, pp.437-438, 2014年3月10日〜12日
外挿が可能な時変多属性任意事例数モーフィングを用いた文章音声好感度の改善について

吉元照貴, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2014.03.10 (日本大, 東京)

1-R5-22, pp.429-430, 2014年3月10日〜12日
加齢に伴う絶対音感のシフト―気導聴力検査結果との関係―

津崎実, 松井淑恵, 入野俊夫, 竹島千尋

日本音響学会：春季研究発表会講演論文集 2014.03.10 (日本大, 東京)

2-3-1, pp.549-552, 2014年3月10日〜12日
Shifts of Absolute Pitch Judgment by Aging : Effects of Pitch Registers

津崎実, 松井淑恵, 入野俊夫

日本音響学会聴覚研究会資料 2014.03.05 (愛知淑徳大, 名古屋)

Vol.44, No.2, H-2014-??, pp.81-86 2014年3月5日〜6日
加齢に伴う絶対音感のシフト―音域の影響―

津崎実, 松井淑恵, 入野俊夫, 竹島千尋

日本音響学会聴覚研究会資料 2014.02.27
Realtime conversion of growl-type voice qualities based on modulation and approximate time-varying filtering driven by a non-linear oscillator: Formulation

河原英紀, 溝渕翔平, 森勢将雅, 榊原健一, 西村竜一, 入野俊夫

情報処理学会, 第102回音楽情報科学研究会 2014.02.23 (筑波大学東京キャンパス, 東京)

2014-MUS-102, No.14, 2014年2月23日-24日
Age Related Shifts Of Absolute Pitch Judgment And Their Relation To The Hearing Impairment

Minoru Tsuzaki, Toshie Matsui, Toshio Irino, Chihiro Takeshima

Proceedings of 37th ARO MidWinter Meeting 2014.02.21
圧縮特性推定における非対称レベルノッチマスキング法と時間マスキング曲線法の対比

深渡瀬智史, 入野俊夫, 西村竜一, 河原英紀, PATTERSON Roy D

日本音響学会聴覚研究会資料 2014.02.08 (那覇市IT創造館, 那覇)

Vol.44, No.1, H-2014-2, pp.7-12, 2014年2月8日〜9日
模擬難聴実現のための逆圧縮特性処理とユーザインタフェース

永江美沙貴, 入野俊夫, 西村竜一, 河原英紀

日本音響学会聴覚研究会資料 2014.02.08 (那覇市IT創造館, 那覇)

Vol.44, No.1, H-2014-3, pp.13-18, 2014年2月8日〜9日
Age related shifts of absolute pitch judgment and their relation to the hearing impairment

Minoru Tsuzaki, Toshie Matsui, Toshio Irino, Chihiro Takeshima

ARO 37th midwinter meeting 2014, Abstract PS-784, 2014.02

San Diego, California, USA, 22-26 Feb., 2014. (発表日 25 Feb.)
Contributing factors in preference judgement in read sentences using morphing of individual attributes

YOSHIMOTO Shoki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report. Speech 2014.01.24 (名城大, 名古屋)

A new research strategy based on a recently proposed morphing algorithm, the time-varying multi-aspect N-way speech morphing algorithm, is applied to investigate the evaluation and control of speech (voice) "attractiveness." The new algorithm generates morphed speech from an arbitrary number of speech samples in a one-stage procedure. The morphing rates in this formulation can be manipulated independently using a time series for each of five physical parameters and, in addition, can take negative values. In the current report, a set of representative utterances of spoken sentences having different "attractiveness" was selected to generate a set of stimulus continua using the morphing procedure. Preliminary tests indicated that morphing the physical parameters actually morphs "attractiveness" in a monotonic way. Using independent control of physical attributes, morphed speech stimuli corresponding to the vertices of a five-dimensional hypercube in the attribute space were generated. Their "attractiveness" was evaluated by subjective paired-comparison tests to investigate the contribution of each physical attribute. Finally, exploratory research using speech neutralization and caricaturization, made feasible by the new algorithm, is discussed as a prospective direction for further study.
聴覚における寸法知覚の練習効果に関する検討

山本航大, 入野俊夫, 河原英紀, 西村竜一

日本音響学会関西支部,第16回関西支部若手研究者交流研究発表会#42 2013.12.08 (産総研関西支部,大阪)
留学生向け日本語能力測定システムのためのUI設計〜HTML5を用いた音声入力インタフェース〜

田藤千弘, 西村竜一, 河原英紀, 入野俊夫, 今井新悟

日本音響学会関西支部第16回関西支部若手研究者交流研究発表会,#41 2013.12.08 (産総研関西支部, 大阪)

（筆頭著者田藤千弘,「優秀奨励賞」受賞）
環境音収集アプリのためのUI設計 ~クラウドソーシング型データ集積サービスの提案~

松山みのり, 津田貴彦, 西村竜一, 河原英紀, 入野俊夫

日本音響学会関西支部第16回関西支部若手研究者交流研究発表会,#36 2013.12.08 (産総研関西支部, 大阪)
オージオグラムから難聴者の聞こえを再現するシステムの開発

永江美沙貴, 入野俊夫, 西村竜一, 河原英紀

日本音響学会関西支部第16回関西支部若手研究者交流研究発表会,#35 2013.12.08 (産総研関西支部, 大阪)

（筆頭著者永江美沙貴,「奨励賞」受賞）
近似時変フィルターを用いたグロウル系統の歌唱音声合成の検討

溝渕翔平, 西垣友理, 西村竜一, 入野俊夫, 河原英紀

日本音響学会関西支部,第16回関西支部若手研究者交流研究発表会,#31 2013.12.08 (産総研関西支部,大阪)
SAWS(スケール交替ウェーブレット系列)刺激のピッチ移動に対するスペクトル‐時間受容野モデルからの検討―フーリエ分析による検討も交えて―

津崎実, 入野俊夫, 竹島千尋, 松井淑恵

日本音響学会聴覚研究会資料 2013.11.28 (豊橋技科大, 豊橋,)

Vol.43, No.8, H-2013-109, pp.631-638, 2013年11月28日〜29 日
招待講演 聴覚におけるスケール分析のための末梢系フィルタバンクのウェーブレット性と非線形性

Toshio Irino [Invited]

2013 RIMS 共同研究「ウェーブレット解析とサンプリング理論」 2013.10.24 (京都大学数理解析研究所, 京都,)

2013年10月24日〜25日
日本語母音データベースを用いた任意発声の相対的声道長の推定について

小林真優子, 坂口諒, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2013.09.27 (豊橋技科大, 豊橋,)

3-P-17, pp.435-436, 2013年9月25日〜27日
SAWS(スケール交替ウェーブレット系列)刺激の支配的ピッチに関する聴覚モデルによる検討―SAIとSTRFとの比較―

津崎実, 入野俊夫, 竹島千尋, 松井淑恵

日本音響学会：秋季研究発表会講演論文集 2013.09.26 (豊橋技科大, 豊橋,)

2-9-5, pp.501-504, 2013年9月25日〜27日
スペクトル距離に基づく声道長推定における歌い手および基本周波数の影響について

坂口諒, 小林真優子, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2013.09.25 (豊橋技科大, 豊橋,)

1-P-44a, pp.381-382, 2013年9月25日〜27日
対数Swept‐Sineで変調した帯域雑音によるMTF測定

苔口祐樹, 金内由紀, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2013.09.25 (豊橋技科大, 豊橋,)

1-6-7, pp.1005-1006, 2013年9月25日〜27日
基本周波数操作による音声の好感度改善に関連する物理的特徴の検討

吉元照貴, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2013.09.25 (豊橋技科大, 豊橋,)

1-P-11c, pp.335-336, 2013年9月25日〜27日
環境音分類結果に基づく収録アプリのインターフェース設計

松山みのり, 津田貴彦, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2013.09.25 (豊橋技科大, 豊橋,)

1-2-5, pp.1387-1388, 2013年9月25日〜27日
波形の高次対称性に基づく基本周波数抽出法における潜在変数ダイナミクスの導入について

河原英紀, 森勢将雅, 榊原健一, 西村竜一, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2013.09.25 (豊橋技科大, 豊橋,)

1-7-12, pp.279-282, 2013年9月25日〜27日
An analysis of the relationship between prosodic information, head motion, and estimated emotional state in explanatory dialogue

YAGI Miyuki, MORITA Reiko, NAKAI Masato, NISHIMURA Ryuichi, KAWAHARA Hideki, IRINO Toshio

IEICE technical report. Speech 2013.09.18 (千葉大, 千葉)

The relationship between paralinguistic information in speech and emotional state has long been studied. However, the dynamics of emotion in dialogues have not been well studied, because information about emotional state has usually been given as static annotations on individual utterances. In this paper, we analyze the dynamics of emotional state in a goal-oriented dialogue, evaluated using a new GUI, the emotional movement Observation (EMO) system. We also model the relationship between the emotional state and paralinguistic quantities, such as fundamental frequency and speech power, together with the acceleration of head nodding, using a stepwise approximation of a linear regression model.
An analysis of the relationship between prosodic information, head motion, and estimated emotional state in explanatory dialogue

八木みゆき, 森田礼子, 中井正人, 西村竜一, 河原英紀, 入野俊夫

電子情報通信学会技術研究報告 2013.09.11
A Gammachirp Auditory Filterbank for Reliable Estimation of Vocal Tract Length from both Voiced and Whispered Speech,

Toshio Irino, Erika Okamoto, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson

The 4th Annual Conference of the British Society of Audiology, Abstract #81, 2013.09

Keele, UK, 4-6 Sept. 2013 (poster presentation, 4-6 Sept.)
Frequency-proportional dilation and compression in singing voice spectra and contributing factors

SAKAGUCHI Makoto, KOBAYASHI Mayuko, IRINO Toshio, NISIMURA Ryuichi, KAWAHARA Hideki

Technical report of IEICE. EA 2013.07.18 (北海道医療大学札幌)

A new method for estimating relative vocal tract length based on short-time Fourier analysis was proposed, and its high reproducibility was demonstrated. The proposed method is based on an interference-free power spectral representation of periodic signals. The interference-free envelope spectrum is preprocessed by removing the global spectral shape, which depends on the glottal source waveform and the radiation characteristic of the mouth opening. It is also preprocessed by smoothing excessive spectral details, such as differences in formant peak bandwidths, spectral dips caused by vocal tract branching, the existence of a closing phase of the vocal folds, and the three-dimensional vocal tract shape. Spectral distance calculation on the preprocessed spectra using only the relevant frequency region is introduced to alleviate disturbing factors other than vocal tract length differences. This article reports the application of the proposed method to singing voices to investigate the effects of singers' individual differences and voice pitch on the estimated relative vocal tract lengths. It also discusses possible applications to computer-assisted voice training.
招待講演 Perceptual outcomes by rapid alternation of the resonant scaling and its relation to the fundamental frequency,

Minoru Tsuzaki, Chihiro Takeshima, Toshie Matsui, Toshio Irino [Invited]

The 21st International Congress on Acoustics, ICA2013 , 5pPP4, ASA Proceedings of Meetings on Acoustics (POMA) 19, 050199, 2013.06.07 (Montreal, Canada,)

2 - 7, June, 2013.
Voice tells your body information

小林真優子, 西村竜一, 入野俊夫, 河原英紀

第99回音楽情報科学研究会, 音学シンポジウム2013 2013.05.12 (お茶の水女子大, 東京, 2013年5月11日-12日) 情報処理学会

声を聴くと，何となくその人の体型が分かる．ここでは，母音だけを用いて相対的な声道長を推定する方法を提案する．この方法では，声道長以外の要因によるスペクトル形状変化の影響を軽減するために，スペクトル距離の計算に用いる帯域を制限し，スペクトルの大局的な平坦化と形状の過度な詳細の平滑化とを組合せている．6歳から56歳までの284名の男女が発声した母音と身体情報からなるデータベースを用いることで，これらの処理に用いるパラメタを決定した．母音だけを用いた簡易な方法にも関わらず，以前報告した聴覚モデルを用いた方法を凌駕する精度での声道長推定が可能であることを確認した．また，このデータベースに付与された身体情報を母音だけから推定できることを示した．
When we hear a voice, we can somehow picture the speaker's body type. In this article, we propose a method for estimating relative vocal tract length using only vowels. The proposed method consists of procedures to alleviate spectral deformations caused by factors other than vocal tract length: selection of the spectral region used for calculating spectral distance, removal of the global spectral shape, and smoothing of excessive spectral detail. The parameters of the proposed method were tuned using a speech database with relevant physical data, which consists of the five Japanese vowels spoken by 284 male, female and adolescent talkers ranging from 6 to 56 years old. This simple vowel-based method was found to provide better estimates than our previously proposed method. The proposed method also provides estimates of talkers' height and weight from vowels alone, using the relevant physical data stored in the database.
Development of Collection and Recognition Method for Environmental Sound Samples using Android Mobile Devices

津田貴彦, 中西恭介, 松山みのり, 西村竜一, 山田順之介, 河原英紀, 入野俊夫

第99回音楽情報科学研究会, 音学シンポジウム2013 2013.05.11 (お茶の水女子大, 東京) 情報処理学会

本研究では、環境音を入力とするインターフェースを有するモバイルアプリケーションの開発を行っている。実現に必要なのは、環境音認識手法の開発と、環境音サンプルの収集及び、クライアントアプリケーションの実装である。認識システムを予備評価した結果、アルゴリズムの改良と学習用データの拡充が必要であることを確認した。この問題に対し、データ収集用のAndroidアプリケーションを作成し、学内ではサークル等の活動に伴う音を29時間24分、学外では電車の走行音や救急車のサイレン等の音を10時間36分にわたって集めることに成功した。本発表では、収集データの分類と、その認識手法について議論する。
We have been developing an Android mobile application that can recognize environmental sound signals. This report describes the environmental sound recognition method, our collection of environmental sounds, and an overview of the prototype system. To collect further samples of environmental sounds, an Android application for data collection was developed.
招待講演 聴覚における寸法知覚と最適末梢系

Toshio Irino [Invited]

第99回音楽情報科学研究会, 音学シンポジウム2013 2013.05.11 (お茶の水女子大, 東京,) 情報処理学会

2013年5月11日-12日
ウェブ集合知に基づいた語彙獲得と3‐gram確率推定による言語モデル自動生成ツール

田中雅康, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2013.03.15 (東京工科大, 八王子,)

3-P-3c, pp.197-198, 2013年3月13日〜15日
ノッチ雑音マスキング法の測定点削減のための感度解析の改良

深渡瀬智史, 入野俊夫, 西村竜一, 河原英紀, PATTERSON Roy D

日本音響学会：春季研究発表会講演論文集 2013.03.14 (東京工科大, 八王子,)

2-Q-4, pp.609-610, 2013年3月13日〜15日
高い時間分解能を有するスペクトルおよび基本周波数抽出法に基づくシャウト歌唱の分析について

西垣友理, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2013.03.13 (東京工科大, 八王子,)

1-Q-3c, pp.389-390, 2013年3月13日〜15日
環境音認識を応用した情報提供機能を有するモバイルアプリケーションの検討

中西恭介, 津田貴彦, 西村竜一, 河原英紀, 入野俊夫

情報処理学会第75回全国大会 2013. Vol.3,pp.463-464 2013.03.07 (東北大,仙台)

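Voice navigation functions available on smartphones have attracted attention in recent years. In everyday life, a great deal of information can also be obtained from environmental sounds. This study therefore aims to develop a guide system that applies environmental sound recognition to judge the situation at the user's location. Specifically, we are developing a guide system for Wakayama University. The system adopts a client-server architecture in which acoustic signals recorded on an Android device are recognized on the server side. Realizing it requires developing an environmental sound recognition program, collecting acoustic signal samples, and implementing the application. So far, we have carried out recognition experiments using the collected environmental sounds, and we report the results.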
Matching of the Dominant Pitch of Scale Alternating Wavelet Sequences against Complex Tones with Odd Harmonics,

Minoru Tsuzaki, Toshio Irino, Chihiro Takeshima, Toshie Matsui

ARO midwinter research meeting, Abstract #491 2013.02

Baltimore, Maryland, 16-20 Feb. 2013 (presented 17 Feb.)
非対称レベルマスカを導入したノッチ雑音マスキング法の測定点の感度解析による削減

深渡瀬智史, 入野俊夫, 西村竜一, 河原英紀, Roy D. Patterson

第15回関西支部若手研究者交流研究発表会 2012.12.09 (産総研関西支部,大阪) 日本音響学会関西支部
携帯型ガイドシステムのための環境音認識を応用したZoneRecognitionの提案

中西恭介, 津田貴彦, 西村竜一, 河原英紀, 入野俊夫

第15回関西支部若手研究者交流研究発表会 2012.12.09 (産総研関西支部,大阪) 日本音響学会関西支部
TANDEM-STRAIGHTを用いた歌唱技法「シャウト」の再現

西垣友理, 西村竜一, 入野俊夫, 河原英紀

第15回関西支部若手研究者交流研究発表会 2012.12.09 (産総研関西支部,大阪) 日本音響学会関西支部
ウェブ上の言語情報で拡張した語彙に基づく3-gramモデル自動生成ツール

田中雅康, 西村竜一, 河原英紀, 入野俊夫

第15回関西支部若手研究者交流研究発表会 2012.12.09 (産総研関西支部,大阪) 日本音響学会関西支部
母音区間だけを用いた声道長推定と身体情報との関連〜あいうえおでBMIがわかる?〜

小林真優子, 西村竜一, 入野俊夫, 河原英紀

第15回関西支部若手研究者交流研究発表会 2012.12.09 (産総研関西支部,大阪) 日本音響学会関西支部
コミュニケーションの環を紡ぐ情報処理原理の解明と応用

Toshio Irino

工学研究シーズ合同発表会 2012.11.12 (大阪府立大学, 大阪) 大阪府立大学・和歌山大学
非対称レベルマスカを導入したノッチ雑音マスキング法の測定点の削減

深渡瀬智史, 入野俊夫, 西村竜一, 河原英紀, PATTERSON Roy D

日本音響学会聴覚研究会資料 2012.10.13 (いこいの村岩手,岩手)

Vol.42, No.7, H-2012-99, pp.547-552, 2012
2012年10月13日〜14 日筆頭著者深渡瀬智史,「聴覚研究会, 研究奨励賞」受賞
Introduction to the dynamic compressive gammachirp filterbank -- How can we implement aging effect with it?

Toshio Irino

Workshop on "Shift of the absolute pitch in elderly listener" (Organizer: Prof. Minoru Tsuzaki) 2012.09.23 (Campus plaza Kyoto, Kyoto)
日本語スピーキングテストS‐CATにおける並列セグメンテーションを用いた自動採点の検討

西村竜一, 栗原理沙, 篠崎隆宏, 石塚賢吉, 山田武志, 今井新悟, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2012.09.21 (信州大, 長野,)

3-Q-17, pp.397-398, 2012年9月19日〜21日
言語モデルの簡易構築に向けたGoogleデータからの必要単語抽出方法の検討

田中雅康, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2012.09.21 (信州大, 長野)

3-P-20, pp.173-174,2012年9月19日〜21日
母音区間情報に基づく声道長正規化と身体情報の基礎的検討

小林真優子, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2012.09.21 (信州大, 長野)

3-Q-28, pp.423-424, 2012年9月19日〜21日
スピーカー特性の簡易補正と主観評価実験

苔口祐樹, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2012.09.20 (信州大, 長野)

2-Q-a9, pp.533-534, 2012年9月19日〜21日
周期信号の瞬時周波数および群遅延の安定な表現について

河原英紀, 森勢将雅, 西村竜一, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2012.09.20 (信州大, 長野,)

2-2-6, pp.283-286, 2012年9月19日〜21日
感度解析を用いたノッチ雑音マスキング法の測定点の削減に関する研究

深渡瀬智史, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2012.09.19 (信州大, 長野)

2-Q-a11, pp.537-538, 2012年9月19日〜21日
携帯型端末で収録した音サインやサイレンなどの環境音認識の検討

津田貴彦, 西村竜一, 河原英紀, 山田順之介, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2012.09.19 (信州大, 長野)

1-4-5, pp.1515-1516, 2012年9月19日〜21日
周期信号の群遅延の安定な表現について

河原英紀, 森勢将雅, 西村竜一, 入野俊夫

音楽音響研究会資料 2012.07.12

Instantaneous frequency and group delay, which are defined as the temporal derivative and the frequency derivative of phase respectively, are better representations than phase itself, because they are physically meaningful and do not require unwrapping, which is a fragile operation. However, abrupt changes and discontinuities caused by interference between constituent components have prevented their use in potential applications. As the final piece of the authors' investigations into interference-free representations of power spectrum and instantaneous frequency, an interference-free representation of group delay is introduced. It is derived from a group delay representation analogous to Flanagan's instantaneous frequency representation. The interference-free group delay is the power-spectrum-weighted average of a shifted pair of group delays that are half the fundamental frequency apart.
心理カウンセリング来談者の問題表現時の視点構造とマイクロスリップ — 問題の所在が遷移した事例に関する質的検討

末崎裕康, 古山宣洋, 花田里欧子, 井上雅史, 有久亘, 入野俊夫

日本生態心理学会第4回大会 2012.07.07 (函館、北海道)
招待講演 内耳における圧縮特性とフィルタ特性の同時推定手法とその応用

Toshio Irino [Invited]

第9回内耳ひずみ研究会 2012.07.06 (慶應大学病院, 東京,) 日本聴覚医学会
Speaker Size Discrimination and Vowel Identification for Acoustically Scaled Vowels : Dependence of Vowel Duration

TAKESHIMA Chihiro, TSUZAKI Minoru, IRINO Toshio

IEICE technical report. Speech 2012.06.07

This study investigates the characteristics of temporal integration in the auditory processing of size information. In this paper, we measured listeners' speaker-size discrimination using acoustically scaled vowels. The experimental results showed that discrimination performance improved greatly when the vowel duration increased from 16 ms to 32 ms, whereas duration had no large effect for durations longer than 32 ms. This finding suggests that an integration window of around 32 ms influences size processing in the auditory system. A similar performance deterioration for 16-ms vowels was observed in a vowel identification experiment, although the degree of deterioration differed depending on the driving source and the frequency of the vowels.
Cross synthesis VOCODER which preserves linguistic information and characteristic timbre of musical instruments and animal voices

西大輝, 西村竜一, 入野俊夫, 河原英紀

第95回音楽情報科学研究会,MUS95-3 2012.06.02 (東京大, 東京, 2012年6月2~3日) 情報処理学会

楽器音や動物の鳴声と，音声の２つの音源の特徴を併せ持つ合成音を作るクロス合成 VOCODER の検討をしている．クロス合成は，音声の狭帯域伝送技術である VOCODER を応用した技術で，現在では楽曲制作や Vocal エフェクター等，音楽の分野で広く用いられる．しかし，クロス合成でつくられる合成音は，楽器音等の音色の特徴が失われ，元の楽器の音が何か不明確になるという問題がある．本報告では，この問題を解決するため，変調周波数領域を帯域制限することにより，音声の言語情報だけを残したスペクトルを用いる新たなクロス合成を提案する．さらに，変調周波数領域を処理するフィルタにおける遮断周波数の設計を検討し，その効果を主観評価実験により明らかにした．
A new design method for a cross synthesis VOCODER, which synthesizes sounds by mixing features of two input sounds, such as speech and musical instruments or animal voices, is proposed. The cross synthesis VOCODER originated from a narrow-band transmission technology and is now widely used as an effector for musical performance and production. However, current cross synthesis effects tend to degrade the original character of musical instruments, and the linguistic information of the processed sound is not always intelligible. The proposed method provides ways to alleviate these difficulties using two techniques. One is removal of the global spectral shape from the speech spectral envelope, and the other is band-pass filtering in the modulation frequency domain. Subjective test results indicated the relevance of the proposed techniques and provide design guidelines for new, flexible cross synthesis VOCODERs.
Manipulation of temporal fine structures on excitation source and spectral envelope of singing voices and their effects on perceived impression

河原英紀, 森勢将雅, 西村竜一, 入野俊夫

第95回音楽情報科学研究会,MUS95-4 2012.06.02 (東京大, 東京, 2012年6月2~3日) 情報処理学会

シャウトやデスボイスなどの激しい表現は、ポピュラー歌唱で広く用いられている。これらを適切に分析、再現、制御する方法を明らかにすることは、歌唱合成システムに豊かな表現力を与えるために解決すべき重要な課題である。本報告では、まず、新たに開発した高い時間分解能を有する基本周波数抽出法とそれに基づく TANDEM-STRAIGHT により、様々な歌唱音声を分析した結果について報告する。分析結果は、激しい表現において、70 Hz付近に 20 dB程度の高さのピークを有する高速の（基本周波数の）周波数変調と、同様に、高速の（スペクトル包絡の）振幅変調が存在することを示した。このような高速の変調の存在は、これまでにはっきりとは報告されていない。予備的な実験により、それらの高速の変調を加工することにより、発声の声区と努力の印象を保ったまま、シャウトなどの歌唱表現の強さ（生々しさ）を制御できる可能性が示された。
Strong expressions such as "shout" and "death voice" are common in popular singing. However, current singing synthesis systems are not good at handling these strong expressions and cannot use them to extend the limits of their expressiveness. This is the topic this article addresses. A set of singing voice analysis tests was conducted using our newly developed F0 extraction method, which has high temporal resolution and is lightweight, and TANDEM-STRAIGHT for spectral envelope analysis. The tests revealed that expressive singing voices contain high-speed frequency modulation in F0 and high-speed amplitude modulation in the spectral envelope. In one typical case, a modulation frequency spectral peak about 20 dB higher than that of a normal performance was found around 70 Hz for an expressive performance. Preliminary tests suggested that selective control of "expressiveness" can be implemented by manipulating these high-speed modulations while keeping vocal register and effort intact.
聴覚フィルタバンクを導入した音響特徴量による若年者判別手法

宮森翔子, 西村竜一, 岡本恵里香, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-7-3, pp.87-88, 2012年3月13日〜15日
若年話者判別法の音響特徴に対する聴覚フィルタバンクの導入

宮森翔子, 西村竜一, 岡本恵里香, 河原英紀, 入野俊夫

情報処理学会第74回全国大会 2012 Vol.2, pp.613-614 2012.03.15 (名古屋工大,名古屋, 2012年3月6日～8日)

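This study investigates techniques for identifying young speakers using speech recognition, with the aim of providing child-friendly behavior in dialogue interfaces. Here we combined MFCCs (mel-frequency cepstral coefficients), the acoustic features we have used so far, with features extracted from a gammachirp auditory filterbank (GCFB), and examined the discrimination performance. MFCCs are features commonly used in speech recognition. The auditory filterbank, in contrast, simulates human auditory characteristics, and previous work has shown it to be effective for vocal tract length normalization in speech morphing. Because vocal tract length correlates with body height, introducing the auditory filterbank is expected to be effective for young-speaker discrimination as well.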
Googleデータを用いた3‐gramモデル構築における品詞情報に基づいた語彙制限

田中雅康, 西村竜一, 島田敏明, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-P-9, pp.233-234, 2012年3月13日〜15日
Googleデータベースを用いた3‐gram拡張法による言語モデル構築の自動化ツール

島田敏明, 田中雅康, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-P-10, pp.235-236, 2012年3月13日〜15日
日本語発話能力測定ウェブテストシステムを用いて収集した留学生の日本語発話の分析

栗原理沙, 西村竜一, 和田芳佳, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-11-19, pp.421-422, 2012年3月13日〜15日
異なった原理に基づく周期性検出器のアンサンブルによる音源情報の分析について

河原英紀, 森勢将雅, 西村竜一, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-11-4, pp.385-388, 2012年3月13日〜15日
楽器音や動物の鳴声の音色を保持した音声とのクロス合成VOCODERの検討

西大輝, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-11-10, pp.401-402, 2012年3月13日〜15日
（筆頭著者西, 「学生優秀発表賞（第５回),」受賞）
模擬難聴のための動的圧縮型ガンマチャープによる圧縮特性の制御

坂口諒, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-Q-6, pp.605-606, 2012年3月13日〜15日
非対称レベルマスカを導入したノッチ雑音マスキング法による圧縮特性推定法の提案

深渡瀬智史, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-Q-25, pp.647-648, 2012年3月13日〜15日
母音の持続時間が話者寸法の弁別能力に与える影響

竹島千尋, 津崎実, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-Q-9, pp.611-614, 2012年3月13日〜15日
スケール変換したインパルス応答が交替する系列に対するピッチ知覚

津崎実, 竹島千尋, 松井淑恵, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-10-5, pp.583-586, 2012年3月13日〜15日
障害音声および歌唱音声における音声の周期構造分析について

和田芳佳, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2012.03.15 (神奈川大, 神奈川)

3-11-1, pp.375-376, 2012年3月13日〜15日
聴覚フィルタバンクに基づく声道長推定と発話様式や身長との関係

岡本恵里香, 北出晴香, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2012.03.14 (神奈川大, 神奈川,)

2-11-3, pp.339-340, 2012年3月13日〜15日
ウェブデータベースを用いた音声認識用言語モデルの簡易適応

西村竜一, 島田敏明, 田中雅康, 河原英紀, 入野俊夫

情報処理学会第74回全国大会 2012. Vol.2,pp.5-6 2012.03.07 (名古屋工大,名古屋, 2012年3月6日～8日)

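To improve the accuracy of large-vocabulary continuous speech recognition, we are investigating a method for extending 3-gram language models using a web database. The method estimates the occurrence probabilities of 3-grams that were unobserved in the training corpus, based on the entries registered in Google's Japanese N-gram database. It also extracts important words using an information-content criterion and selects the 3-grams to be added. Following last year's report, we applied the proposed method to task adaptation of language models. In the experiments, we applied the method to lecture speech extracted from the Corpus of Spontaneous Japanese (CSJ) and evaluated the recognition accuracy. We also plan to build a web application service implementing the proposed method, and we outline it here.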
ウェブデータベースを用いた音声認識用言語モデルの簡易適応

西村竜一, 島田敏明, 田中雅康, 河原英紀, 入野俊夫

情報処理学会全国大会講演論文集 2012.03.06

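To improve the accuracy of large-vocabulary continuous speech recognition, we are investigating a method for extending 3-gram language models using a web database. The method estimates the occurrence probabilities of 3-grams that were unobserved in the training corpus, based on the entries registered in Google's Japanese N-gram database. It also extracts important words using an information-content criterion and selects the 3-grams to be added. Following last year's report, we applied the proposed method to task adaptation of language models. In the experiments, we applied the method to lecture speech extracted from the Corpus of Spontaneous Japanese (CSJ) and evaluated the recognition accuracy. We also plan to build a web application service implementing the proposed method, and we outline it here.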
聴覚フィルタバンクによる声道長推定と身長との相関および発話様式の影響

岡本恵里香, 北出晴香, 西村竜一, 河原英紀, 入野俊夫

日本音響学会聴覚研究会資料 2012.02.04 (那覇市IT創造館,沖縄)

Vol.42, No.1, H-2012-7, pp.35-40, 2012年2月4日〜5 日
スケーリングした2種のインパルス応答が交替する音系列に対するピッチ知覚―調整法による心理物理実験―

津崎実, 竹島千尋, 松井淑恵, 入野俊夫

日本音響学会聴覚研究会資料 2012.02.04 (那覇市IT創造館,沖縄)

Vol.42, No.1,H-2012-6, pp.29-34, 2012年2月4日〜5 日
Effects of the Correlation Between the Fundamental Frequencies and Resonance Scales as a Cue for the Auditory Stream Segregation,

Minoru Tsuzaki, Toshio Irino, Chihiro Takeshima, Toshie Matsui

ARO midwinter research meeting, Abstract #1079 2012.02

San Diego, California, USA, 25-29 Feb. 2012 (presented 29 Feb.)
Discrimination of Speaker Sizes Through Speech Sounds: Dependence on Sound Duration,

Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino

ARO midwinter research meeting, Abstract #417 2012.02

San Diego, California, USA, 25-29 Feb. 2012 (presented 26 Feb.)
音声の周期構造分析法とその障害音声分析への応用

和田芳佳, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

第14回関西支部若手研究者交流研究発表会 2011.12.18 (産総研関西支部,大阪) 日本音響学会関西支部
和歌山大学のゆるキャラ『わだにゃん』が登場する子どもにやさしい対話システムの開発

吉本勇希, 西村竜一, 宮森翔子, 河原英紀, 入野俊夫

第14回関西支部若手研究者交流研究発表会 2011.12.18 (産総研関西支部,大阪) 日本音響学会関西支部
聴覚フィルタバンクに基づく声道長正規化を用いた音声モーフィングの改良

岡本恵里香, 入野俊夫, 西村竜一, 河原英紀

第14回関西支部若手研究者交流研究発表会 2011.12.18 (産総研関西支部,大阪) 日本音響学会関西支部
Googleデータを用いた音声認識用辞書のクイック構築技術

田中雅康, 西村竜一, 島田敏明, 河原英紀, 入野俊夫

第14回関西支部若手研究者交流研究発表会 2011.12.18 (産総研関西支部,大阪) 日本音響学会関西支部
pandaPhone:人と動物を混ぜ合わせた声の iPhoneアプリ

西大輝, 西村竜一, 入野俊夫, 河原英紀

第14回関西支部若手研究者交流研究発表会 2011.12.18 (産総研関西支部,大阪) 日本音響学会関西支部

（筆頭著者西, 「若手奨励賞」受賞）
基本波のFMとAM成分に基づく高速な基本周波数推定法について

河原英紀, 森勢将雅, 西村竜一, 入野俊夫

日本音響学会聴覚研究会資料 2011.12.10 (熊本県立大, 熊本)

Vol.41, No.9, pp.679-684 2011年 12月10日～11 日
音声分析変換合成系における時変フィルタの実装と駆動情報の表現について

河原英紀, 和田芳佳, 西大輝, 森勢将雅, 西村竜一, 入野俊夫

日本音響学会聴覚研究会資料 2011.10.01 (富山)

Vol.41, No.7, pp.561-566, 2011年10月1日～2日
Experimental results on size perception in voiced and whispered speech,

Toshio Irino

Wakayama Auditory and Visual Exploring Workshop (WAVE workshop) 2011.09.27
招待講演 安定な声道長推定のための聴覚フィルタバンクとその理論

入野俊夫, 河原英紀 [Invited]

日本音響学会 2011.09.22 (島根大, 島根)

秋季研究発表会講演論文集, pp.505-508,2011年9月20日～22日,
障害音声の分析における基本周波数抽出法の評価について

和田芳佳, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2011.09.21 (島根大, 島根)

pp.423-434, 2011年9月20日～22日
語彙で認識対象を制御するGoogleデータを用いた3‐gramモデル構築法の検討

田中雅康, 西村竜一, 島田敏明, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2011.09.21 (島根大, 島根)

pp.161-162, 2011年9月20日～22日
聴覚フィルタバンクを用いた声道長推定法の比較

岡本恵里香, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2011.09.21 (島根大, 島根)

pp.389-390, 2011年9月20日～22日
情報量を基準とした3‐gram拡張に基づく言語モデルの適応手法

島田敏明, 田中雅康, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2011.09.21 (島根大, 島根)

pp.167-168, 2011年9月20日～22日
招待講演 寸法知覚を中心とした聴覚情景分析－物理世界と心理世界をつなぐ聴覚－

津崎実, 入野俊夫, 竹島千尋, 松井淑恵 [Invited]

日本音響学会 2011.09.21 (島根大, 島根,)

秋季研究発表会講演論文集, pp.1437-1440,2011年9月20日～22日,
言葉の明瞭度と楽器等の特徴を保持したクロス合成の評価について

西大輝, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2011.09.20 (島根大, 島根)

pp.587-588, 2011年9月20日～22日
聴覚フィルタバンクを用いた若年話者判別の検討

宮森翔子, 岡本恵里香, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2011.09.20 (島根大, 島根)

pp.59-62, 2011年9月20日～22日
安定な声道長推定のための聴覚フィルタバンクとその理論

入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2011.09.13
招待講演 音声からの声道長推定における聴覚的ウェーブレット変換について

Toshio Irino [Invited]

平成23年度数学•数理科学と諸科学•産業との連携研究ワークショップ「ウェーブレット理論と工学への応用」 2011.09.12 (大阪教育大, 大阪,) 文部科学省•大阪教育大

大阪, 2011年9月12〜13日
対話型音声インタフェースのための大人・子ども判別技術の改良

宮森翔子, 西村竜一, 入野俊夫, 河原英紀

FIT2011 第10回情報科学技術フォーラム 2011.09.07 (函館大学・函館短期大学, 北海道)

Vol 3. pp.37 - 40, 2011年9月7日～9日
寸法知覚を中心とした聴覚情景分析―物理世界と心理世界をつなぐ聴覚―

津崎実, 入野俊夫, 竹島千尋, 松井淑恵

日本音響学会研究発表会講演論文集(CD-ROM) 2011.09
An excitation structure extraction for voiced sounds with multiple periodicity and its application to pathological voices

WADA Yoshika, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2011.08.10 (東北大, 仙台, 宮城) 電子情報通信学会電気／応用音響研究会

A new excitation source information analysis method called XSX (eXcitation Structure extractor) has been investigated for analyzing voices with complex excitation behavior, such as singing voices, pathological voices, and emotional voices. This article illustrates the advantages of XSX over existing PDAs (pitch determination algorithms) and introduces prospective applications. A comparative study with YIN and SWIPE, two well-known PDAs, was conducted using harmonically related multiple sinusoids with a common frequency-modulated fundamental component; it revealed that XSX has a superior response to the modulation frequency. Detailed analyses using XSX were also conducted for pathological voices, which showed large discrepancies between the results of XSX and those of the other PDAs. The analyses by XSX clearly indicated that subharmonics produced by coupling of multiple basic periods are sometimes more prominent than the usual fundamental components. These results and advantages illustrate that XSX is useful for analyzing voices with complex behavior, for which analysis by existing PDAs is impractical.
Cross synthesis vocoder that preserves both speech intelligibility and instruments' timbre

NISHI Taiki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2011.08.10 (東北大, 仙台, 宮城) 電子情報通信学会電気／応用音響研究会

TANDEM-STRAIGHT, an F0-adaptive spectral envelope extraction procedure, was applied to a cross synthesis VOCODER, which synthesizes sounds by mixing features of two input sounds, such as speech and musical instruments or animal voices. A set of tests with an FIR implementation of the time-varying filter illustrated potential improvements in intelligibility from using the STRAIGHT spectrum of speech sounds but, at the same time, introduced deterioration of the instruments' characteristic timbre. A new cross synthesis framework using the deviation spectrum of speech sounds and a minimum-phase implementation of the time-varying filter was proposed to solve this problem. Preliminary tests suggested that the proposed method reduces this deterioration while preserving intelligibility.
Estimation of vocal tract length ratio using auditory filterbank

OKAMOTO Erika, IRINO Toshio, NISIMURA Ryuichi, KAWAHARA Hideki

IEICE technical report 2011.07.22 (定山渓, 北海道, 2011年7月21日〜23 日) 電子情報通信学会音声研究会

Vocal tract length normalization (VTLN) is an important issue in speech applications such as automatic speech recognition and high-quality voice morphing. Individual spectral differences depend primarily on differences in vocal tract length. They also depend on the glottal source signal and the shape of the pyriform fossa. This paper proposes a new method for vocal tract length (VTL) estimation and normalization based on a gammachirp auditory filterbank (GCFB). VTL ratios were estimated from spectral distances between the same sentence spoken by two speakers. The calculation was carried out for all ordered pairs of 28 speakers (28 × 27 = 756). The estimation error was then calculated by regression analysis. VTL estimation was compared across the mel-frequency filterbank (MFFB), which is a preprocessor for calculating the MFCCs commonly used in ASR, the gammatone filterbank (GTFB), and the gammachirp filterbank (GCFB). The results indicated that the proposed GCFB-based VTL estimation outperforms the MFCC-based and GTFB-based methods in the objective evaluations.
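The count of 756 speaker pairs in the abstract above can be reproduced with a short sketch. The speaker labels are hypothetical placeholders; the abstract states only that there were 28 speakers and that every ordered pair was evaluated.

```python
from itertools import permutations

# Hypothetical speaker labels; only the number of speakers (28) comes from the abstract.
speakers = [f"spk{i:02d}" for i in range(28)]

# Ordered (reference, target) pairs: every speaker is compared against every other
# speaker, and (A, B) is counted separately from (B, A).
pairs = list(permutations(speakers, 2))
print(len(pairs))
```

Treating (A, B) and (B, A) as distinct is what gives 28 × 27 pairs rather than the 378 unordered combinations.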
Pitch perception for sequences of glottal pulses alternating different resonance scales,

Minoru Tsuzaki, Toshie Matsui, Chihiro Takeshima, Toshio Irino

J. Acoust. Soc. Am. , 129 (4), Pt.2 2011.05

Presented at the ASA meeting, Seattle, USA, 23-27 May 2011 (presented 26 May)
話者寸法の弁別における母音の持続時間の効果―雑音駆動母音を用いた検討―

竹島千尋, 津崎実, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2011.03.11 (早稲田大, 東京)

pp.589-592, 2011年3月9日〜11日
単語音声の連続性と音節遷移情報を担う脳領域のfMRIによる検討

塚田裕樹, 能田由紀子, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2011.03.11

pp.483-486, 2011年3月9日〜11日
滑舌の良いCross synthesis VOCODER

西大輝, 赤桐隼人, 西村竜一, 入野俊夫, 河原英紀

情報処理学会シンポジウム論文集,インタラクション2011 2011.03.11 (日本科学未来館)

2011年3月10日〜12日
ピーク強調を含んだF0適応型スペクトル包絡抽出法による再合成音声の品質評価について

赤桐隼人, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2011.03.11 (早稲田大, 東京)

pp.327-328, 2011年3月9日〜11日
成分位相の制御により声の肌触りを変える

河原英紀, PATTERSON Roy D, 森勢将雅, 坂野秀樹, 津崎実, 高橋徹, 西村竜一, 入野俊夫

情報処理学会シンポジウム論文集,インタラクション2011 2011.03.11 (日本科学未来館)

2011年3月9日〜11日
実環境発話を用いた子ども判別アルゴリズムの検討

宮森翔子, 西村竜一, 栗原理沙, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2011.03.10 (早稲田大, 東京)

pp.55-56, 2011年3月9日〜11日
ウェブを用いたトピック関連N‐gramエントリ抽出手法の検討

島田敏明, 田中雅康, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2011.03.10 (早稲田大, 東京)

pp.199-200, 2011年3月9日〜11日
聴覚フィルタバンクに基づく声道長正規化と音声モーフィングへの応用について

岡本恵里香, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：春季研究発表会講演論文集 2011.03.09 (早稲田大, 東京)

pp.419-420, 2011年3月9日〜11日
音声の駆動構造分析における周期性検出器の応答特性の整形と統合について

和田芳佳, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2011.03.09

pp.395-396, 2011年3月9日〜11日
Revisiting VTLN based on auditory filterbank

Toshio Irino

WAVE workshop on augmentation of speech communication 2011.03.07 (Sophia University, Tokyo, Japan)
外部知識としてウェブを用いた3‐gram言語モデル拡張手法の検討

西村竜一, 島田敏明, 田中雅康, 河原英紀, 入野俊夫

情報処理学会第73全国大会講演論文集,vol. 2,pp. 75-76 2011.03.02 (東京工大,東京)

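We report on a method for extending 3-gram language models using the web, with the aim of improving the accuracy of large-vocabulary continuous speech recognition. In 3-gram models, back-off has conventionally been used to estimate the probabilities of unobserved 3-grams that do not occur in the training corpus. While back-off, an internal probability estimation technique, is widely used, there are also methods, like the one in this study, that estimate the probabilities of unobserved 3-grams using an external database. This presentation reports on unobserved 3-gram probability estimation using the Google database as the external database, focusing on a comparison with the conventional back-off method.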
対話の流れと頷きパターン変化

井上雅史, 入野俊夫, 古山宣洋, 花田里欧子, 一宮貴子, 末崎裕康

HAIシンポジウム2010 2010.12.12 (慶應義塾大, 神奈川)

2010年12月12日〜14日
単語の音節遷移情報の処理を担う脳領域のfMRIによる検討

塚田裕樹, 能田由紀子, 河原英紀, 入野俊夫

日本音響学会: 聴覚研究会資料 2010.12.11 (かんぽの宿柳川, 福岡)

H-2010-154, Vol. 40, No.10, pp.851-856, 2010年12月10日〜11日
聴覚フィルタバンクを用いたスペクトル距離に基づく声道長比推定について

岡本恵里香, 入野俊夫, 西村竜一, 河原英紀

第13回関西支部若手研究者交流研究発表会 2010.12.05 (同志社大学,京都) 日本音響学会関西支部
音声の周期構造分析法の設計パラメタの検討および性能評価について

和田芳佳, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

第13回関西支部若手研究者交流研究発表会 2010.12.05 (同志社大学,京都) 日本音響学会関西支部
トピック関連単語を用いた N-gram エントリ拡張法の音声認識性能調査

島田敏明, 田中雅康, 西村竜一, 河原英紀, 入野俊夫

第13回関西支部若手研究者交流研究発表会 2010.12.05 (同志社大学,京都) 日本音響学会関西支部
Analysis and synthesis of singing with hoarse vocal expressions

Hideki Kawahara, Hanae Itagaki, Yoshika Wada, Masanori Morise, Ryuichi Nisimura, Toshio Irino

20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society 2010.12.01

Strong vocal expressions in singing use hoarse voice effectively in various ways. However, analysis and synthesis of such voice quality have been challenging, with very little success so far. An excitation structure extraction framework called XSX was introduced to represent such complexly structured vocal excitation, with various types of aperiodicity, as an integral component of TANDEM-STRAIGHT, a widely used speech analysis, modification and resynthesis framework. TANDEM-STRAIGHT is basically a source-filter model extended by introducing a temporally stable power spectral representation for periodic signals and F0-adaptive spectral envelope estimation based on consistent sampling theory. The excitation source signal used in TANDEM-STRAIGHT is a mixture of pulses and colored random signals. The source signal parameters are extracted by XSX and an aperiodicity extraction procedure. XSX is based on division of power spectra by their spectral envelopes, which are calculated for a set of periodicity candidates, and inverse Fourier transform of the result. Combining the salience scores for each candidate yields an integrated measure for detecting locally periodic components. The aperiodicity extraction procedure is based on long-range linear prediction of band-pass signals obtained by a set of quadrature mirror filters applied to the original and the time-warped signals. This data-driven approach makes it possible to extract and represent complex excitation structures such as diplophonia. The analysis results are used to design a voice excitation source that is capable of adding or modifying hoarse vocal expressions and enables morphing between two or more expressive performance examples.
Evaluation and optimization of F0-adaptive spectral envelope estimation based on spectral smoothing with peak emphasis

Hayato Akagiri, Masanori Morise, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society 2010.12.01

A new spectral estimation method is proposed that improves the processed sound quality of STRAIGHT, a speech analysis, modification and re-synthesis framework widely used for high-quality speech and singing manipulation. Applying the proposed method to TANDEM-STRAIGHT, a completely reformulated version of STRAIGHT, yielded a better spectral envelope approximation than conventional methods such as LPC, cepstrum and legacy-STRAIGHT. TANDEM-STRAIGHT consists of two parts: a temporally stable power spectrum estimation method for periodic signals (TANDEM) and a spectral envelope calculation method based on consistent sampling theory. The proposed method uses F0-adaptive smoothing and compensation of the logarithmic power spectrum to improve the approximation accuracy of spectral peaks, which affects the quality of the re-synthesized sound. A series of simulations was conducted to optimize the internal parameters of the proposed method. The optimized system was evaluated and compared with conventional methods using stylized spectra and simulated speech spectra. The evaluation was based on the spectral distance measure proposed by Itakura and Saito, modified to use the perceptually relevant ERB_N-number frequency axis. The following set of spectra was used. Power spectra calculated from vocal tract area functions measured from MRI data, combined with LF-model excitation spectra, were used as the ground truth, and the spectral distances between this target and the estimated spectra were evaluated. A set of periodic pulse trains was used as the excitation signal in this case. These evaluation results indicated that the proposed method yields a smaller spectral distance than conventional methods such as LPC, cepstrum and legacy-STRAIGHT.
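For readers unfamiliar with the distance measure named in the abstract above, a minimal sketch of the standard Itakura-Saito divergence between two power spectra follows. It uses the plain textbook form on an arbitrary frequency grid; the modification to the ERB_N-number axis described in the abstract is not reproduced, and the function name is a placeholder.

```python
import math

def itakura_saito(p_ref, p_est):
    """Mean Itakura-Saito divergence between two power spectra on the same grid.

    Both inputs are sequences of strictly positive power values.
    """
    if len(p_ref) != len(p_est):
        raise ValueError("spectra must have the same length")
    total = 0.0
    for p, q in zip(p_ref, p_est):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(p_ref)

# The measure is zero for identical spectra and is not symmetric in its arguments.
same = itakura_saito([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
over = itakura_saito([2.0], [1.0])
under = itakura_saito([1.0], [2.0])
print(same, over, under)
```

The asymmetry means that underestimating a spectral peak is penalized more heavily than overestimating it by the same factor.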
VTL estimation using dynamic compressive gammachirp filterbank (dcGCFB)

Toshio Irino, Erika Okamoto, Ryuichi Nisimura, Hideki Kawahara

WAVE workshop on "Roles of voice periodicity," 2010.11.28 (Miraku community center of arts, Ikoma, Nara)

　View Summary

27-28, Nov. 2010
周期信号における時間的変動の影響を受けない位相関連情報の表現について

河原英紀, 森勢将雅, 入野俊夫

電子情報通信学会技術研究報告 2010.11.18 (愛知県立大, 愛知) 電子情報通信学会音声研究会

　View Summary

Vol.110, No.297, SP2010-77, pp.47-52, 2010年11月18日〜19日
Temporally static representation of phase related quantity for periodic signals

KAWAHARA Hideki, MORISE Masanori, IRINO Toshio

IEICE technical report 2010.11.11

　View Summary

An averaged power spectrum, calculated from two power spectra using two time windows half a pitch period apart, does not depend on the relative phase between the analyzed signal and the windows. This article introduces a procedure for calculating instantaneous frequency that yields a temporally static representation of the instantaneous frequency of periodic signals. The proposed method is derived from Flanagan's well-known equation. Specifically, a power-weighted average of instantaneous frequencies calculated using Flanagan's equation yields this temporally static representation. A proof that the proposed representation is independent of the relative phase between the analyzed signal and the windows is presented, assuming weak conditions on the windowing function. Performance evaluation tests are conducted for popular windowing functions and their results are discussed.
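The first claim of the abstract above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the test signal, the Hann window of two pitch periods, and the names `power_spectrum` and `relative_variation` are all assumptions made for illustration. It slides an analysis window over one pitch period and compares how much a single-window power spectrum fluctuates with how much the average of two spectra taken half a period apart fluctuates.

```python
import numpy as np

def power_spectrum(x, start, window, nfft):
    """Power spectrum of one windowed segment beginning at sample `start`."""
    segment = x[start:start + len(window)] * window
    return np.abs(np.fft.rfft(segment, nfft)) ** 2

def relative_variation(spectra):
    """Temporal standard deviation summed over frequency, relative to mean power."""
    return spectra.std(axis=0).sum() / spectra.mean(axis=0).sum()

period = 100                                   # pitch period in samples (assumed)
n = np.arange(6 * period)
rng = np.random.default_rng(0)
# Periodic test signal: 20 harmonics with random phases (hypothetical example).
x = sum(np.cos(2 * np.pi * k * n / period + rng.uniform(0, 2 * np.pi))
        for k in range(1, 21))

window = np.hanning(2 * period)                # assumed window choice
nfft = 2048
starts = range(period)                         # slide the window over one full period

# Spectrum from a single window at each position.
single = np.array([power_spectrum(x, s, window, nfft) for s in starts])
# Average of two spectra whose windows are half a pitch period apart.
tandem = np.array([
    0.5 * (power_spectrum(x, s, window, nfft)
           + power_spectrum(x, s + period // 2, window, nfft))
    for s in starts
])

single_var = relative_variation(single)
tandem_var = relative_variation(tandem)
print(f"single window: {single_var:.3f}, averaged pair: {tandem_var:.3f}")
```

With this window the averaged pair should fluctuate far less than the single-window spectrum. The small residual comes from window sidelobes, which is presumably why the proof in the abstract is stated under conditions on the windowing function.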
The dynamic, compressive GammaChirp filterbank (dcGC) and its applications

Toshio Irino, Roy Patterson

Workshop on "Machine Hearing in the Internet Age: Auditory models in MIR, SIR and AIS," Google, Mountain View, 2010.11

　View Summary

CA, USA, 19 Nov., 2010
実環境発話を入力とする子ども利用者判別技術の開発

宮森翔子, 西村竜一, 栗原理沙, 河原英紀, 入野俊夫

日本ロボット学会第28回学術講演会 2010.09.22 (名古屋工大, 名古屋)

　View Summary

RSJ2010AC1H2-1, 2010年9月22日～24日
音声の周期構造検出法の設計パラメタの調整と性能評価指標の検討について

和田芳佳, 板垣英恵, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2010.09.16 (関西大学, 大阪)

　View Summary

pp. 333 - 334, 2010年9月14日～16日
F0適応型スペクトル包絡推定法のケプストラムを用いた実装によるピーク形状近似誤差の改善

赤桐隼人, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2010.09.16 (関西大学, 大阪)

　View Summary

pp. 331 - 331, 2010年9月14日～16日
招待講演 はじめての聴覚フィルタ ―心理物理実験デモで学ぶ聴覚フィルタ特性―

Toshio Irino [Invited]

日本音響学会 2010.09.16 (関西大学, 大阪)

　View Summary

秋季研究発表会講演論文集, pp.1347 - 1348, 2010年9月14日～16日
ウェブ収集発話に基づく子ども向け対話インタフェースの開発

宮森翔子, 西村竜一, 栗原理沙, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2010.09.16 (関西大学, 大阪)

　View Summary

pp.89 - 90, 2010年9月14日～16日
声道長比に基づくスペクトル正規化のためのスペクトル距離および周波数帯域の検討

岡本恵里香, 浅香佳希, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2010.09.15 (関西大学, 大阪)

　View Summary

pp.323 - 324, 2010年9月14日～16日
講演発話を用いたN-gram補完手法の音声認識性能評価

島田敏明, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2010.09.14 (関西大学, 大阪)

　View Summary

pp.147 - 148, 2010年9月14日～16日
講演発話を用いたN‐gram補完手法が与える音声認識性能の調査

島田敏明, 西村竜一, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2010.09.07
E-012 Investigations of Real Environmental Child Speech Collected by Voice Web System

Kurihara Lisa, Nisimura Ryuichi, Miyamori Shoko, Kawahara Hideki, Irino Toshio

FIT2010 第9回情報科学技術フォーラム 2010.09.07 (九州大学, 福岡)

　View Summary

pp.229 - 230, 2010年9月7日～9日
ちょっとした一言の音声認識による子ども利用者判別法の検討

宮森翔子, 西村竜一, 栗原理沙, 入野俊夫, 河原英紀

FIT2010 第9回情報科学技術フォーラム 2010.09.07 (九州大学, 福岡)

　View Summary

pp.469 - 472, 2010年9月7日～9日（筆頭著者宮森、「 FITヤングリサーチャー賞」受賞）
はじめての聴覚フィルタ―心理物理実験デモで学ぶ聴覚フィルタ特性―

入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2010.09
Complementing 3-gram information using the Google Japanese N-gram database and term weighting

SHIMADA TOSHIAKI, NISIMURA RYUICHI, KAWAHARA HIDEKI, IRINO TOSHIO

情報処理学会研究報告, 2010-SLP-82-20, 電子情報通信学会音声研究会, 電子情報通信学会技術研究報告 2010.07.24 (秋保温泉, 仙台, 2009年7月22日～24日)

　View Summary

単語 3-gram モデルは，テキストコーパスから統計的手法に基づいて構築される．しかし，テキスト量が少ないと統計量を正しく算出できない．そこで本研究では，Google N-gram データに含まれる 3-gram エントリを用いて，3-gram 情報の補完を行った．3-gram エントリを選別せず補完すると，3-gram エントリ数が爆発的に増加する問題が発生する．そこで，提案手法では TF・IDF 指標と Yahoo! 関連キーワードから算出した単語重要度に基づき，追加する 3-gram エントリを選別した．これにより，重要性の低い 3-gram エントリの追加と，エントリ数の爆発的増加を防ぐ事が出来た．評価では，CSJ コーパスを用いて認識実験を行った．その結果，補完前より単語正解精度において 1.64% の向上が得られた．

We have developed a method that utilizes the Google N-gram database to complement 3-gram entries in a language model. Our aim was to improve the accuracy of LVSR systems even when a 3-gram model trained on short texts is used. This method is based on 3-gram occurrence information in external web documents and consists of three main steps. First, 3-gram entries are searched for in the Google database. Second, 3-gram appearance counts are normalized on the basis of the ratio of the total numbers of 3-gram entries. Last, 3-gram entries are selected on the basis of keywords. To prevent the addition of redundant or irrelevant entries, 3-gram entries without a keyword are excluded when calculating 3-gram probabilities. The keywords were chosen by measuring TF-IDF weights and employing the web API of Yahoo! Japan. Experimental results with the CSJ Corpus confirmed a 1.64% improvement in recognition accuracy.
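The normalization and keyword-selection steps described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function name `complement_trigrams`, the toy counts, and the exact normalization (rescaling web counts by the ratio of total counts) are hypothetical, and the keyword set is given directly rather than derived from TF-IDF weights or a web API.

```python
from collections import Counter

def complement_trigrams(local_counts, web_counts, keywords):
    """Add keyword-bearing web 3-grams to a small local 3-gram table."""
    # Assumed normalization: rescale web counts by the ratio of total counts.
    scale = sum(local_counts.values()) / sum(web_counts.values())
    merged = Counter(local_counts)
    for trigram, count in web_counts.items():
        if trigram in local_counts:
            continue                      # keep counts observed in the local corpus
        if not any(word in keywords for word in trigram):
            continue                      # skip entries that contain no keyword
        merged[trigram] = count * scale
    return merged

# Hypothetical toy data, for illustration only.
local = {("speech", "recognition", "system"): 3, ("the", "speech", "signal"): 1}
web = {
    ("speech", "recognition", "accuracy"): 600,
    ("speech", "recognition", "system"): 400,
    ("of", "the", "year"): 9000,
}
keywords = {"recognition"}

merged = complement_trigrams(local, web, keywords)
for trigram, count in merged.items():
    print(" ".join(trigram), round(count, 3))
```

In this toy run the frequent but keyword-free entry is not added, which mirrors the stated goal of avoiding an explosion in the number of 3-gram entries.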
音源構造抽出法の初期推定値のバイアス除去と高速化について

河原英紀, 和田芳佳, 森勢将雅, 西村竜一, 入野俊夫

日本音響学会: 聴覚研究会資料 2010.07.17 (広島県立大, 広島)

　View Summary

H-2010-87, Vol. 40, No.6, pp.477-482, 2010年7月17日〜18日
Successful head-nodding movements in psychotherapeutic process - when and how

Masashi Inoue, Nobuhiro Furuyama, Ryoko Hanada, Toshio Irino, Hiroyasu Massaki, Takako Ichinomiya

4th Conference of the International Society for Gesture Studies (ISGS) 2010.07

　View Summary

25 -30, July, 2010, Frankfurt Oder, Germany. (発表29 Jul 2010)
Optimization of excitation structure extraction based on objective evaluation using speech-like test signals

WADA Yoshika, ITAGAKI Hanae, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2010.06.11 (北海道医療大学, 北海道, 2010年6月10日～11日)

　View Summary

Investigations on the analysis and synthesis of expressive voices, such as "husky" and "hoarse" voices, which are typically found in emotional speech and singing, are presented. Such voices usually have complex excitation structures that are not readily represented by a single number, F0. This article introduces the optimization of system parameters and the evaluation of our new analysis procedure called XSX (eXcitation Structure eXtractor), designed for such complex excitation signals. Pseudo speech signals are made from complex tones with FM and/or AM, depending on the experimental design. They have a spectral slope similar to natural voiced sounds and do not have formant structure. The proposed method, XSX, consists of two subsystems: an integrated periodicity detector that extracts multiple simultaneous periodicity candidates, and a frequency refinement procedure based on the instantaneous frequency of F0 and harmonic components. First, the candidate detector is optimized, followed by the optimization of the refinement procedure. Second, comparative tests with conventional F0 extractors were conducted and revealed that the proposed method outperforms those procedures in terms of accuracy and tracking speed.
Relevant frequency band for vocal tract length normalization based on spectral distance

OKAMOTO Erika, ASAKA Yoshiki, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2010.06.11 (北海道医療大学, 北海道, 2010年6月10日～11日)

　View Summary

Normalization of speaker-dependent spectral differences is an important issue in speech applications such as automatic speech recognition and high-quality voice morphing. Individual spectral differences depend primarily on vocal tract length differences. They also depend on the glottal source signal and the shape of the pyriform fossa. This article investigates the effects of frequency range selection on spectral distance-based vocal tract length normalization (VTLN). It is based on the idea that the best VTLN performance can be attained by selecting the frequency region where spectral differences are determined almost exclusively by differences in vocal tract length. All combinations of utterances spoken by 28 subjects were used to calculate estimates of their relative vocal tract lengths, which were used as the tentative "true" lengths to evaluate the deviation of each VTL ratio estimate based on spectral distances. The test results revealed that the best performance is obtained by selecting the frequency region from 400 Hz to 4000 Hz and using an integrated logarithmic spectral distance computed from the outputs of an MFCC filter bank and their frequency derivatives.
Demonstration of a C-implementation of the dynamic compressive gammachirp for machine hearing

Toshio Irino, Toru Takahashi, Hideki Kawahara

Auditory Features Workshop, Equipe Audition, DEC, Ecole normale supérieure, France, 2010.06

　View Summary

1 & 3 Jun., 2010 (発表日 1 Jun)
Auditory filter shape from temporal masking curves and notched-noise data

Toshio Irino, Nozomi Shimoshio, Hiroki Takahashi, Hideki Kawahara, Roy Patterson

Auditory Features Workshop, Equipe Audition, DEC, Ecole normale supérieure, France 2010.06

　View Summary

1 & 3 Jun., 2010 (発表日 3 Jun)
ウェブ収集発話を対象とした若年者判別の検討

宮森翔子, 西村竜一, 入野俊夫, 河原英紀

情報処理学会創立50周年記念（第72回)全国大会講演論文集 2010.03.11 (東大, 東京)

　View Summary

vol.2 pp.285-286, 5U-7, 2010年3月8日〜12日 (発表日 3月11日). （筆頭著者宮森、「学生奨励賞」受賞）
fMRIによる音声からの音源寸法情報とピッチ情報の処理とその交互作用の脳領域の検討

塚田裕樹, 入野俊夫, 大屋義和, PATTERSON Roy D, 河原英紀

日本音響学会：春季研究発表会講演論文集 2010.03.09 (電通大, 東京)

　View Summary

pp.599-602, 2010年3月8日〜10日
スペクトルピークを強調した平滑化を含むF0適応型スペクトル包絡推定法の最適化

赤桐隼人, 森勢将雅, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2010.03.09 (電通大, 東京)

　View Summary

pp.507-508, 2010年3月8日〜10日
音声からの複数の周期成分抽出および歌唱音声の周期構造分析への応用

和田芳佳, 板垣英恵, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2010.03.09 (電通大, 東京)

　View Summary

pp.505-506, 2010年3月8日〜10日
尖度に基づく音響的イベントの検出と音声分析変換合成システムへの応用について

河原英紀, 森勢将雅, 高橋徹, 坂野秀樹, 西村竜一, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2010.03.08 (電通大, 東京)

　View Summary

pp.315-316, 2010年3月8日〜10日
Google DBを用いたトピック特化型N‐gramモデル補完の検討

島田敏明, 鈴田健太郎, 永井裕貴, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2010.03.08 (電通大, 東京)

　View Summary

pp.177-178, 2010年3月8日〜10日
時変モーフィングに基づく歌唱音声の操作と声質および歌い回しの評価について

岡本恵里香, 和田芳佳, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2010.03.08 (電通大, 東京)

　View Summary

pp.463-464, 2010年3月8日〜10日
Representation and estimation of aperiodic components in voiced sounds for high-quality analysis-synthesis systems

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report 2010.03.05 (芝浦工大, 東京, 2010年3月4日～5日)

　View Summary

Mixed-mode excitation is crucially important and effective for high-quality speech analysis, modification and resynthesis systems. However, there are several incompatible constraints in the representation and estimation of the aperiodic component in mixed-mode excitation. The current implementation of the aperiodic component provides an answer to the estimation problem at the expense of a complicated representation, which makes it hard to apply. This article proposes an aperiodic component spectral model that consists of an exponential nonlinearity and a sigmoid. Although the proposed model is still in a preliminary phase and needs verification on a variety of speech sounds, it seems to represent aperiodic components in a highly efficient manner. Informal listening tests also suggested that the proposed model provides better synthesized speech quality.
音の持続時間が音源の大きさ知覚に及ぼす影響 : 母音刺激を用いた検討(日本基礎心理学会第28回大会,大会発表要旨)

竹島千尋, 津崎実, 入野俊夫

基礎心理学研究 2010.03
Constraining the derivation of auditory filter shape with temporal masking curves

Toshio Irino, Hiroki Takahashi, Hideki Kawahara, Roy D. Patterson

ARO 33rd Midwinter meeting, Abstract #329, 2010.02

　View Summary

Anaheim, CA, USA, 6-10 Feb. 2010. (発表日 6 Feb., poster, abstract )
部分時変モーフィングによる母音情報に注目した歌声の転写実験と評価

岡本恵里香, 西村竜一, 入野俊夫, 河原英紀

第12回関西支部若手研究者交流研究発表会 2009.12.05 (関西大学,大阪) 日本音響学会関西支部
圧縮型ガンマチャープ適合による聴覚フィルタの周波数特性と圧縮特性の推定

下塩望, 入野俊夫, 河原英紀, 西村竜一

第12回関西支部若手研究者交流研究発表会 2009.12.05 (関西大学,大阪) 日本音響学会関西支部
部分時変モーフィングに基づく歌唱音声の歌い回しの転写実験と評価

和田芳佳, 西村竜一, 入野俊夫, 河原英紀

第12回関西支部若手研究者交流研究発表会 2009.12.05 (関西大学,大阪) 日本音響学会関西支部
TANDEM-STRAIGHT スペクトル包絡推定法の改良及び最適化に関する検討

赤桐隼人, 浅香佳希, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

第12回関西支部若手研究者交流研究発表会 2009.12.05 (関西大学,大阪) 日本音響学会関西支部

　View Summary

（筆頭著者赤桐、「若手奨励賞」受賞）
ウェブ収集発話を対象とした人間と機械の大人・子ども識別能力の比較

宮森翔子, 西村竜一, 入野俊夫, 河原英紀

第12回関西支部若手研究者交流研究発表会 2009.12.05 (関西大学,大阪) 日本音響学会関西支部

　View Summary

（筆頭著者宮森、「若手奨励賞」受賞）
音声による寸法情報とピッチ情報の処理とその交互作用のfMRI による脳領域の検討

塚田裕樹, 入野俊夫, 大屋義和, Roy D. Patterson, 河原英紀

第12回関西支部若手研究者交流研究発表会 2009.12.05 (関西大学,大阪) 日本音響学会関西支部
Vowel-based voice conversion and its application to singing-voice manipulation

Yuri Yoshida, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara

Proceedings of the AES International Conference 2009.12.01

　View Summary

A novel and light-weight voice conversion method is applied to manipulate a singer's identity and singing style in real time. The proposed method is based on a non-linear spectral morphing method that uses proximity information for vowel templates of the source and the target singing materials. It is built on the STRAIGHT speech analysis, modification and resynthesis system, and it yields highly natural manipulated sounds. To deal with the difficulties in applying our vowel-based voice conversion method to singing voices, singular-value decomposition and robust statistical measures are introduced to handle the huge variability of vowel spectra and fundamental frequencies in singing voices. Distance measures for preparing vowel templates and calculating proximity information are designed based on a psychophysical frequency scale, the equivalent rectangular bandwidth rate (ERB_N rate).
fMRI study on brain regions for scale and pitch processing for speech signal

塚田裕樹, 入野俊夫, 大屋義和

日本音響学会聴覚研究会資料 2009.11.14 (豊橋技科大, 豊橋)

　View Summary

H-2010-44, Vol. 40, No. 3, pp.231-236, 2009年11月13日〜14日
スペクトル距離に基づくTANDEM-STRAIGHTスペクトル包絡推定の最適化に関する検討

赤桐隼人, 浅香佳希, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会聴覚研究会資料 2009.10.09 (加太国民休暇村, 和歌山)

　View Summary

H-2009-81, Vol. 39, No. 6, pp.459 - 464, 2009年10月9〜10日
Invited lecture: Measurement and formulation of the auditory filter

入野俊夫

聴覚研究会資料 2009.10.09
招待講演聴覚フィルタの測定と定式化について

Toshio Irino [Invited]

聴覚研究会、レクチャー招待講演 2009.10.09 (加太国民休暇村, 和歌山)

　View Summary

日本音響学会聴覚研究会資料, H-2009-73, Vol. 39, No. 6, pp.413 - 418, 2009年10月9〜10日
二話者の発声した音声に基づく声道長比の推定法と実測された身長比との関係について

河原英紀, 宮森翔子, 浅香佳希, 西村竜一, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2009.09.17 (日本大学, 郡山, 福島)

　View Summary

pp.365-366, 2009年9月15日〜17日
声道形状データを利用したTANDEM‐STRAIGHTスペクトル包絡推定の最適化に関する検討

赤桐隼人, 浅香佳希, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2009.09.17 (日本大学, 郡山, 福島)

　View Summary

pp.391-392 , 2009年9月15日〜17日
TANDEM‐STRAIGHTに基づく周期構造検出器の性能評価指標と最適化について

板垣英恵, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2009.09.17 (日本大学, 郡山, 福島)

　View Summary

pp.363-364. 2009年9月15日〜17日
音声Webインタフェースを用いて収集した実環境発話の分析

鈴田健太郎, 宮森翔子, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2009.09.17 (日本大学, 郡山, 福島)

　View Summary

pp.125-126, 2009年9月15日〜17日
音声からの寸法情報処理の脳内部位のfMRIによる検討

塚田裕樹, 入野俊夫, 大屋義和, PATTERSON Roy D, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2009.09.16 (日本大学, 郡山, 福島)

　View Summary

pp.571-572, 2009年9月15日〜17日
Size perception in voiced and whispered speech

Toshio Irino

CNBH 12th Anniversary Meeting on "The Role of Perception in Hearing and Speech Research Processing," CNBH, Dept. of Physiology, Development, and Neuroscience, Univ. of Cambridge, 3 - 4 Sept. 2009. 2009.09

　View Summary

(発表 3 Sept. )
E-038 Proposal of safety web systems using adult and child voice discriminations

Miyamori Shoko, Nishimura Ryuichi, Suzuta Kentaro, Kawahara Hideki, Irino Toshio

情報科学技術フォーラム講演論文集 2009.08.20
Vocoder-based morphing tool demonstrations for flexible voice manipulations

Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino

Proc. 14th Regional Convention, Audio Eng. Soc. 2009.07.23 (Tokyo)

　View Summary

23 - 25, July, 2009
Web-based adult and child voice collection to develop a voice-oriented web filtering service

NISIMURA RYUICHI, MIYAMORI SHOKO, SUZUTA KENTARO, KAWAHARA HIDEKI, IRINO TOSHIO

情報処理学会研究報告, 2009-SLP-77-19, 電子情報通信学会音声研究会, 電子情報通信学会技術研究報告 2009.07.18 (飯坂温泉, 福島)

　View Summary

本研究では，利用者の年齢層を発話音声から自動推定し，子どものアクセスを制限するウェブフィルタリングサービスの開発を目指す．今回，提案システムの実現に向けて，(1) 音声ウェブシステム w3voice を用いた大人・子ども発話のネットワーク収集実験，(2) GMM 音響モデルを用いた若年者自動判別の予備的実験を行った．発話収集の実験では，389 名の被験者の実環境発話 1,109 を集めることに成功した．発話を分析した結果，大人と子どもで，発話内容に異なる言語的傾向があることを確認した．また，GMM 音響モデルを用いた 14 歳以下の子どもの検出実験では正解率 65.9% を得た（大人の検出も含めると正解率 82.6%）．

This study aims to develop a voice-based web filtering service that restricts children's access to harmful websites. It is based on automatic estimation of the user's age group from the voice. Toward this goal, we performed (1) a collection of adult and child voices using the voice-enabled web system "w3voice", and (2) a preliminary experiment on young-voice detection based on GMM acoustic models. In the collection experiment, we gathered 1,109 real-environment utterances from 389 participants. Analysis of the utterances confirmed that adults and children show different linguistic tendencies. In the experiment on detecting children aged 14 or younger, a correct rate of 65.9% was obtained.
Representation of repetitive structures in speech and its application to F0 and aperiodicity extraction

板垣英恵, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会聴覚研究会資料, H-2009-55, Vol. 39, No. 4, pp.307 - 312, 電子情報通信学会応用音響研究会,電子情報通信学会技術研究報告, Vol.109, No. 100, EA2009-33, pp.91-96 2009.06.26 (北海道医療大学, 札幌,2009年6月25日〜26日)

　View Summary

A bottom-up procedure for extracting repetitive structures in speech sounds is proposed, based on a temporally stable representation of periodic sounds (TANDEM) and adaptive spectral smoothing for normalization (STRAIGHT). The proposed method evaluates local periodic structure in the frequency domain to detect repetition in the time domain. A group of dedicated periodicity detectors is combined to construct the proposed repetitive structure extractor, called XSX (eXcitation Structure eXtractor). The proposed procedure is tested using a set of stylized test signals with artificial shimmer and jitter to investigate its applicability to such aperiodic signals. The test results indicated that the proposed procedure outperformed existing F0 detectors in its power to describe those complex excitation modes. Finally, the proposed procedure is applied to pathological voice examples to investigate the feasibility of voice quality restoration applications.
Simultaneous fitting to notched noise and compression data using the compressive gammachirp auditory filter

入野俊夫, 高橋弘樹, 河原英紀, PATTERSON Roy D

日本音響学会聴覚研究会資料, H-2009-51, Vol. 39, No. 4, pp.283-288, 電子情報通信学会応用音響研究会,電子情報通信学会技術研究報告, Vol.109, No. 100, EA2009-29, pp.67-72 2009.06.26 (北海道医療大学, 札幌)

　View Summary

It is important to estimate precisely the frequency selectivity (filter shape) and the compression characteristics of the human auditory filter when developing perceptual models for speech and acoustic signals. In the current study, we measured both masked thresholds, using notched-noise experiments, and an input-output function, using a forward-masking experiment, for each individual normal-hearing listener. The compressive gammachirp (cGC) filter was fitted simultaneously to the notched-noise data and the input-output function. We demonstrated that it is possible to distinguish characteristics common across listeners from individual differences in a set of parameters of the cGC filter.
Non-linguistic subjective evaluation of timbre based on audio-visual integration

西田沙織, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会聴覚研究会資料, H-2009-48, Vol. 39, No. 4, pp.265-270, 電子情報通信学会応用音響研究会,電子情報通信学会技術研究報告, Vol.109, No. 100, EA2009-26, pp.49-54 2009.06.26 (北海道医療大学, 札幌)

　View Summary

Perceptually relevant representations of timbre using two dimensional shapes are investigated aiming at establishing a framework for sound visualization based on human perception characteristics. A preference test of matching shapes to sounds was conducted using eleven sound stimuli having different prototypical power spectra and nine shapes. The results indicated that matching shapes were clearly divided into two classes depending on periodicity of the presented sounds. Perceptual correlates of shape selection were seemingly based on complexity and sharpness, while they are only subjectively defined. A set of objective descriptors of shapes based on complex number representation of their contours were introduced for further investigations on physical correlates of MDS results. These investigations indicated that normalized square root of area ratio to contour length and kurtosis have reasonable correlations with MDS axes.
声道長の正規化に基づく簡易モーフィング音声の品質改良について

浅香佳希, 西田沙織, 赤桐隼人, 西村竜一, 入野俊夫, 河原英紀

電子情報通信学会音声研究会, SP2009-34, 電子情報通信学会技術研究報告, Vol.109, No.99, pp.63-68, 2009.06.25 (北海道大学, 北海道)

　View Summary

2009年6月24日〜25日
再合成音声の品質に対する音声スペクトル包絡推定法の影響について

赤桐隼人, 大西壮登, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

電子情報通信学会音声研究会, SP2009-35, 電子情報通信学会技術研究報告, Vol.109, No.99, pp.69-74 2009.06.25 (北海道大学, 北海道)

　View Summary

2009年6月24日〜25日
Effects of spectral envelope representations on resynthesized speech quality

AKAGIRI Hayato, ONISHI Masato, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2009.06.17

　View Summary

TANDEM-STRAIGHT, a speech analysis, modification and synthesis method, consists of two key components: a) temporally independent power spectral estimation for periodic signals (TANDEM) and b) F0-adaptive spectral smoothing based on consistent sampling theory. The second component employs two approximations to implement its function. The first approximation is truncation of the theoretically infinite number of compensating digital filter coefficients. The second approximation is to use log(1+x) instead of x, because they are virtually the same provided |x|≪1 holds. This assures positivity of the spectral envelope. This report investigates the effects of these approximations using subjective tests of resynthesized voiced sounds as well as objective tests based on a spectral distance measure. The tests indicated that the sounds resynthesized by both methods have an equivalent quality of 40 to 50 in terms of the Q value of MNRU, which is reasonably high. The tests also indicated that sounds resynthesized by legacy-STRAIGHT tend to have higher sound quality than those by TANDEM-STRAIGHT. These subjective results are consistent with the objective results based on the peak-weighted spectral distance measure with frequency weighting, suggesting that there is room for further quality improvement of TANDEM-STRAIGHT.
Sound quality improvement based on vocal tract length normalization in simplified speech morphing

ASAKA Yoshiki, NISHIDA Saori, AKAGIRI Hayato, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2009.06.17

　View Summary

The need for careful manual placement of anchoring points is the major obstacle to applying current speech morphing based on STRAIGHT. This obstacle can be partially removed by normalizing the vocal tract lengths (VTL) of the speakers involved in morphing. Auditory-inspired spectral distance measures are used to find the best normalizing ratio of VTLs. Preliminary subjective tests indicated that the proposed method improves the perceptual quality of the morphed speech sounds. It was also suggested that introducing an additional vocal tract shape parameter may be useful for improving quality further.
カスタマイズ性を重視した小規模N‐gramの融合に関する検討

鈴田健太郎, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2009.03.19 (東京工大, 東京)

　View Summary

pp.245-246, 2009年3月17日〜19日
Google N‐gramを用いたN‐gram確率補完の検討

西村竜一, 中井理沙, 鈴田健太郎, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2009.03.18 (東京工大, 東京)

　View Summary

pp.55-56, 2009年3月17日〜19日
声道断面積関数の補間によるモーフィング音声作成について―スペクトル概形の補償法の検討―

浅香佳希, 大西壮登, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2009.03.17 (東京工大, 東京)

　View Summary

pp.425-426, 2009年3月17日〜19日
TANDEM‐STRAIGHTにおけるスペクトル包絡推定精度の改善について

赤桐隼人, 森勢将雅, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2009.03.17 (東京工大, 東京)

　View Summary

pp.381-382, 2009年3月17日〜19日
音響的イベントの持続時間に基づいた非周期成分の時間構造の制御について

河原英紀, 森勢将雅, 高橋徹, 坂野秀樹, 西村竜一, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2009.03.17 (東京工大, 東京)

　View Summary

pp.439-440, 2009年3月17日〜19日
TANDEM‐STRAIGHTを用いたF0推定法の最適化及び性能評価―F0検出器の設計パラメタに関する検討―

板垣英恵, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2009.03.17 (東京工大, 東京)

　View Summary

pp.379-380, 2009年3月17日〜19日
Interface design for TANDEM-STRAIGHT and temporally variable speech morphing study

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report 2009.03.06 (東京工科大, 八王子, 東京, 2009年3月5日〜6日) 電子情報通信学会音声研究会

　View Summary

This article introduces the background and design principles of a set of graphical user interfaces intended to promote research on various aspects of speech processing frameworks made possible by our new algorithms based on TANDEM-STRAIGHT. It is also intended to make the new algorithms accessible to researchers with a wider range of backgrounds, to acquire their feedback, and to accelerate algorithm development itself. Speech morphing capable of handling temporally variable multi-aspect morphing rates, and vowel-based speech conversion, are representative examples of such new processing frameworks. These algorithms take advantage of the theoretical transparency and computational efficiency of TANDEM-STRAIGHT, which completely replaced the internal algorithms of legacy-STRAIGHT.
Effects of time-frequency parameters of auditory stimuli and shape parameters of visual stimuli on audio-visual integration - Toward music visualization system based on perceptual structure -

NISHIDA Saori, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IPSJ SIG Notes 2009.02.19 (産業技術総合研究所, 東京, 2009年2月18日〜19日)

　View Summary

An audio-visual integration test was conducted to investigate innate correspondence between sounds and shapes. Seven typical sound stimuli, including periodic and aperiodic sounds as well as musical instrument sounds, were each presented followed by a pair of shapes. Subjects were asked to select the shape that fit better with the preceding sound stimulus. MDS analyses of the results suggested that there seems to exist a common perceptual structure between vision and audition.
Development of Speech Input Method for Interactive VoiceWeb Systems.

Ryuichi Nisimura, Jumpei Miyake, Hideki Kawahara, Toshio Irino

Human-Computer Interaction. Novel Interaction Methods and Techniques, 13th International Conference, HCI International 2009, San Diego, CA, USA, July 19-24, 2009, Proceedings, Part II 2009
実時間操作インタフェースへの応用を目的とした歌唱モーフィング操作パラメタの時系列への拡張について

河原英紀, 森勢将雅, 高橋徹, 坂野秀樹, 西村竜一, 入野俊夫

音楽音響研究会資料 2008.12.20

　View Summary

第78回音楽情報科学研究会, 龍谷大学, 大津, 2008年12月19日〜20日(発表日12月20日)
寸法変形した順応刺激音による寸法・形状知覚への影響

林芳恵, 入野俊夫, 青木良枝, 河原英紀

第11回関西支部若手研究者交流研究発表会 2008.12.17 (キャンパスプラザ京都, 京都) 日本音響学会関西支部
劣化音声の知覚特性と音声認識器の認識傾向の比較

森本隆司, 入野俊夫, 西村竜一, 河原英紀

日本音響学会聴覚研究会資料 2008.12.13 (虹の松原ホテル, 佐賀県唐津市)

　View Summary

H-2008-142, Vol. 38, No. 8, pp.803-808, 2008年12月12日〜13日
Singing morphing extension to temporally varying parameters for realtime morphing control interface

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IPSJ SIG Notes 2008.12.12

　View Summary

Reuse of performance design in singing requires temporally localized manipulations of singing style, voice quality and expressions. They can be done in real time, as in live concert scenes, or off-line, as in post-production editing of recorded materials. A new framework is introduced and formulated that extends TANDEM-STRAIGHT-based morphing with a temporally variable multi-dimensional morphing rate. This formulation provides a solid basis for controlling five morphing parameters (fundamental frequency, aperiodicity, STRAIGHT spectrogram, time and frequency axes) independently as time series. It is based on interpolation of the logarithmic derivative of transformation functions and enables extrapolative morphing without the quality breakdown found in our previous formulations. The proposed method is easily extended to multiple-exemplar morphing because the formulation is symmetric for each exemplar utterance.
TANDEM-STRAIGHTに基づく基本周波数抽出法に関する一検討

板垣英恵, 森勢将雅, 西村竜一, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告,第10回音声言語シンポジウム 2008.12.10 (早稲田大学, 東京) 電子情報通信学会音声研究会

　View Summary

Vol.108, No.338, SP2008-105 (NLC2008-50), pp.155-160, 2008年12月9日〜10日
基本周波数情報に基づく線形予測と時間軸伸縮を利用した非周期成分の抽出について

河原英紀, 森勢将雅, 高橋徹, 坂野秀樹, 西村竜一, 入野俊夫

電子情報通信学会技術研究報告, 第10回音声言語シンポジウム 2008.12.10 (早稲田大学, 東京) 電子情報通信学会音声研究会

　View Summary

Vol.108, No.338, SP2008-93 (NLC2008-38), pp.85-90, 2008年12月9日〜10日
Parameter optimization for a fundamental frequency extractor based on TANDEM-STRAIGHT

ITAGAKI Hanae, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2008.12.02

　View Summary

A fundamental frequency extractor based on a temporally stable power spectral representation for periodic signals (TANDEM spectrum) and a spectral envelope derived from the representation (STRAIGHT spectrum) is proposed. This article describes the roles of the system parameters of the proposed method and their effects on system performance, and reports the results of their preliminary optimization. The system parameters investigated are the number of harmonic components used for detecting a hypothesized periodicity peak and the weighting width on the log-lag domain used for integrating specialized individual F0 detectors. Detailed descriptions of these parameters and their impact on F0 extraction performance are presented, and they were further investigated using a database consisting of simultaneous recordings of speech and EGG (electroglottogram) signals. Test results indicated that the proposed method has performance comparable with the F0 extractors used in STRAIGHT and other popular F0 extractors such as YIN when three harmonic components are used and weighting with a width of 1/√2 of the center lag is used.
Aperiodicity extraction based on linear prediction and temporal axis warping using fundamental frequency information

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

IEICE technical report 2008.12.02

　View Summary

A reliable aperiodicity extractor is crucial for high-quality speech manipulation systems. This article proposes a new extractor based on a critical review on conventional methods (mainly on our previous proposals) and fundamental issues. The proposed method uses forward and backward linear predictors with lags around fundamental period and consists of an instantaneous fundamental frequency-based temporal axis warping. The extractor also consists of Quadrature Mirror Filter for frequency band division to control TB (time-bandwidth) product for reliable estimates. Combination of multiple clues extracted using the original and the manipulated time axes yields reliable and efficient estimates of aperiodicity spectrogram.
日英母国語話者における子音/音節処理の脳内部位の対比 − CV・VC音節を用いたfMRI実験 −

入野俊夫, 大屋義和, 河原英紀, Alexis G. Hervais-Adelman, D. Timothy Ives, Roy D. Patterson

2008年度第4回研究会 2008.11.17 (上智大学,東京) 上智大学オープン・リサーチ・センター「人間情報科学研究プロジェクト」ヒューマンコミュニケーショングループ
Comparison of the cortex for CV and VC syllables in Japanese and English subjects

大屋義和, 入野俊夫, Hervais-Adelman Alexis G

日本音響学会聴覚研究会資料 2008.10.17 (神戸セミナーハウス, 神戸)

　View Summary

H-2008-104, Vol. 38, No. 6, pp.597-602, 2008年10月17日〜18日
Investigation of temporal factors affecting speaker-size discrimination using isolated vowels with size scaling

竹島千尋, 津崎実, 入野俊夫

日本音響学会聴覚研究会資料 2008.10.17 (神戸セミナーハウス, 神戸)

　View Summary

H-2008-110, Vol. 38, No. 6, pp.633-637, 2008年10月17日〜18日
音声認識Webシステムにおける単語辞書構築技術

西村竜一, 鈴田健太郎, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2008.09.12 (九州大学, 福岡市)

　View Summary

pp.197-198,2008年9月10日〜12日
零周波数フィルタ信号に基づく基本周波数抽出法の評価と応用について

河原英紀, 森勢将雅, 高橋徹, 坂野秀樹, 大西壮登, 板垣英恵, 西村竜一, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2008.09.11 (九州大学, 福岡市)

　View Summary

pp.423-424 ,2008年9月10日〜12日
2母音の寸法弁別に対する刺激音の時間特性と基本周波数の影響

竹島千尋, 津崎実, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2008.09.10 (九州大学, 福岡市)

　View Summary

pp.553-555 , 2008年9月10日〜12日
母音情報を用いた自動化音声モーフィングの方式パラメータの評価について

大西壮登, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2008.09.10 (九州大学, 福岡市)

　View Summary

pp.361-362 , 2008年9月10日〜12日
E-023 A method to update ASR lexical information using Web resources

Suzuta Kentaro, Nisimura Ryuichi, Kawahara Hideki, Irino Toshio

FIT2008 第7回情報科学技術フォーラム 2008.09.03 (慶應大学, 藤沢)

　View Summary

pp.189-190, 2008年9月2日〜4日
F0 extraction based on the zero frequency filtered signal method and its application to TANDEM-STRAIGHT

KAWAHARA Hideki, MORISE Masanori, BANNO Hideki, ITAGAKI Hanae, ONISHI Masato, NISIMURA Ryuichi, IRINO Toshio

情報処理学会研究報告, 2008-MUS-76 (17), pp.97-102, 情報処理学会, 第76回音楽情報科学研究会 2008.08.07 (名古屋大学, 名古屋, 2008年8月6日〜8日)

　View Summary

An event-based F0 extraction method based on the so-called zero frequency filtering method was proposed by Yegnanarayana for representing Indian stop consonants. The method uses unstable IIR filters that place four poles at zero frequency and at the same time employs local mean subtracting filters to stabilize its output. This simple method was reported to run extremely fast and to have performance comparable to existing F0 extractors. This article reports on a follow-up implementation of the method, together with evaluations and investigations of its performance and characteristics, with its applicability to TANDEM-STRAIGHT and real-time STRAIGHT in mind. The results indicated that the method runs 7 times faster than real time with a Matlab implementation on a standard laptop PC. It was also found that the gross error rate was 0.55%, which is somewhat worse than the most recent methods but still acceptable for practical applications. Finally, temporal resolution finer (namely 1/3) than that of instantaneous frequency-based methods was also demonstrated.
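The filtering idea summarized above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the report: the function names, the window of about 1.5 fundamental periods, the three mean-subtraction passes, and the synthetic pulse-train input are all assumptions made for illustration.

```python
import numpy as np

def zero_frequency_filter(x, fs, f0_guess=100.0, passes=3):
    """Return a zero-frequency-filtered signal with contaminated edges removed."""
    x = np.asarray(x, dtype=np.float64)
    # Differencing removes any DC offset before the unstable filter.
    y = np.diff(x, prepend=x[0])
    # Four poles at z = 1, realized as four cascaded running sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Local mean subtraction removes the polynomial growth of the output.
    half = int(round(0.75 * fs / f0_guess))  # assumed window: about 1.5 periods
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Samples near both ends are distorted by zero padding, so drop them.
    trim = passes * (2 * half + 1)
    return y[trim:-trim]

def estimate_f0(x, fs, f0_guess=100.0):
    """Estimate F0 from the spacing of negative-to-positive zero crossings."""
    y = zero_frequency_filter(x, fs, f0_guess)
    epochs = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0))
    return fs / np.median(np.diff(epochs))

# Synthetic excitation: one pulse every 80 samples at 8 kHz (100 Hz).
fs = 8000
x = np.zeros(fs // 2)
x[40::80] = 1.0
f0 = estimate_f0(x, fs)
```

In this sketch the window length depends on a rough prior guess of the fundamental period, so a poor guess would likely degrade the result. Only short signals are used here because the running sums grow quickly with signal length.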
Size discrimination and recognition for acoustically scaled versions of naturally pronounced and whispered speech words

AOKI Yoshie, IRINO Toshio, PATTERSON Roy D, KAWAHARA Hideki

日本音響学会聴覚研究会資料, H-2008-89, Vol. 38, No. 5, pp.507-512, 電子情報通信学会応用音響研究会,電子情報通信学会技術研究報告, EA2008-52, pp.35-40 2008.08.04 (東北大, 仙台, 2008年8月4日〜5日)

　View Summary

We have suggested that the auditory system can extract size information and separate it from vocal-tract shape information. For example, humans can extract the message from the voices of adults and children without being confused by the size information, and they can extract the size information without being confused by the message. Several size perception experiments have been conducted with acoustically scaled vowels, syllables, musical instruments and animal voices. In this paper, we extended the size perception experiments to naturally spoken and whispered words to demonstrate that size perception is robust to variation in the utterance (voiced and whispered). The results show that the size discrimination JND is almost the same for voiced and whispered speech and that recognition performance remains good beyond the normal range.
Improving accuracy in spectral envelope estimation based on TANDEM-STRAIGHT: recovery of higher spatial frequency components exceeding Nyquist limit posed by the fundamental frequency

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, BANNO Hideki, NISIMURA Ryuichi, IRINO Toshio

電子情報通信学会音声研究会, 電子情報通信学会技術研究報告, Vol.108, No.116, SP2008-28, pp.19-24 2008.06.27 (北海道医療大, 北海道, 2008年6月27日〜28日)

　View Summary

A simple new method to recover details in a spectral envelope is proposed based on a speech analysis, modification and resynthesis framework called TANDEM-STRAIGHT. Spectral envelope recovery of voiced sounds is a discrete-to-analog conversion in the frequency domain. However, there is a fundamental problem because the spatial frequency contents of vocal tract functions generally exceed the Nyquist limit of the equivalent sampling rate determined by the fundamental frequency. TANDEM-STRAIGHT yields a method to recover a spectral envelope based on the consistent sampling theory and provides base information for exceeding this limit. At the final stage, the AR spectral envelope estimated from the TANDEM-STRAIGHT spectrum is divided by the F0 adaptively smoothed version of itself to supply the missing high-spatial-frequency details of the envelope.
Effects on perceived impression of manipulated speech using a simplified morphing procedure based on STRAIGHT

NISHIDA Saori, ONISHI Masato, YOSHIDA Yuri, MORISE Masanori, NISHIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

情報処理学会研究報告, 2008-MUS-75(8), 2008-HCI-128(8), pp. 43-48, ( 第75回音楽情報科学研究会, 第128回ヒューマンコンピュータインタラクション研究会) 2008.05.28 (臨床研究情報センター, 神戸, 2008年5月28日〜29日)

　View Summary

A morphing procedure that relies only on temporal axis alignment was tested subjectively in terms of naturalness and speaker identity. The effects of contributing factors were investigated with regard to test words, morphing rates and the speakers used. The naturalness of the morphed speech deteriorated as the morphing rate approached 50%. Identification of the mixing rate of two speakers was about 60% correct when the morphing rate was 25%, 50% or 75%. The naturalness of the morphed speech sounds was found to be higher when the speakers' sex was identical, while mixing rate identification was lower. These results suggest that the proposed simplified procedure is practically usable for morphing between speakers of the same sex.
日英母国語話者における音節処理を担う脳内部位の比較

大屋義和, 入野俊夫, エルベ-アデルマン, アレクシー, イブスティム, 河原英紀, パターソンロイ

ブレインコミュニケーション時限研究専門委員会 2008.05.16 (けいはんなATR, 京都)

　View Summary

pp.38-43, 2008年5月15日〜16日
Comparison of the brain regions for consonant processing in Japanese and English subjects,

Yoshikazu Oya, Toshio Irino, Alexis G. Hervais-Adelman, D. Tim Ives, Hideki Kawahara, Roy D. Patterson

J. Acoust. Soc. Am. , 123(5), Pt.2, 2008.05

　View Summary

(Acoustics'08 (ASA joint meeting), Paris, France, 29 June - 4 July 2008.) (発表日 3 Jul.)
Speaker size discrimination for acoustically scaled versions of whispered words,

Yoshie Aoki, Toshio Irino, Hideki Kawahara, Roy D. Patterson

J. Acoust. Soc. Am. , 123(5), Pt.2, 2008.05

　View Summary

(Acoustics'08 (ASA joint meeting), Paris, France, 29 June - 4 July 2008.) (発表日 3 Jul.)
時間平均に基づく周期信号のパワースペクトル推定法

森勢将雅, 高橋徹, 河原英紀, 入野俊夫

電子情報通信学会, 2008年総合大会 2008.03.21 (九州工大, 北九州)

　View Summary

AS-5-1, 2008年3月18日〜21日
Improvement of real-time STRAIGHT and implementation of STRAIGHT library

BANNO Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

電子情報通信学会技術研究報告(日本音響学会・聴覚研究会/ 電子情報通信学会音声研究会) SP2007-213, pp.157-162, (聴覚研究会資料 38(2), pp.193-198) 2008.03.21 (東京大学, 東京, 2008年3月20日〜21日)

　View Summary

This paper describes improvements to real-time STRAIGHT and the implementation of a STRAIGHT library. STRAIGHT is a high-quality speech analysis, modification and synthesis system based on a VOCODER-type representation. STRAIGHT is currently finding wide application, for example in speech synthesis systems and as a tool for auditory experiments. However, the current MATLAB implementation of STRAIGHT is not suited to real-time applications. We have therefore been porting the source code to C, and have now finished porting the latest MATLAB version to C. The real-time STRAIGHT using the ported functions was subjectively evaluated by mean opinion scores (MOS). The MOS of the improved real-time STRAIGHT is approximately 0.7 points better than that of the previous version of real-time STRAIGHT. We have also implemented the STRAIGHT library, including a STRAIGHT API for the C language.
Web知識を二段階利用した単語辞書更新手法

鈴田健太郎, 西村竜一, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2008.03.19 (千葉工業大学, 習志野市)

　View Summary

pp.123-124, 2008年3月17日〜19日
母音情報に基づく声質変換法における連続発話音声からの母音テンプレートの設計

大西壮登, 高橋徹, 森勢将雅, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2008.03.19 (千葉工業大学, 習志野市)

　View Summary

pp.429-430, 2008年3月17日〜19日
TANDEMおよびSTRAIGHTスペクトルに基づく基本周波数および非周期性の表現について

河原英紀, 森勢将雅, 高橋徹, 西村竜一, 坂野秀樹, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2008.03.18 (千葉工業大学, 習志野市)

　View Summary

pp.563-564, 2008年3月17日〜19日
音声処理の初期段階を担う脳内部位の検討

大屋義和, 入野俊夫, HERVAIS‐ADELMAN Alexis, IVES Tim, 河原英紀, PATTERSON Roy D

日本音響学会：春季研究発表会講演論文集 2008.03.18 (千葉工業大学, 習志野市)

　View Summary

pp.539-540, 2008年3月17日〜19日
無声化した単語音声を用いた音源寸法知覚の弁別閾

青木良枝, 入野俊夫, PATTERSON Roy D, 河原英紀

日本音響学会：春季研究発表会講演論文集 2008.03.18 (千葉工業大学, 習志野市)

　View Summary

pp.569-570, 2008年3月17日〜19日
聴覚フィルタの形状と圧縮特性の測定とパラメータ推定

中家諒, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2008.03.18 (千葉工業大学, 習志野市)

　View Summary

pp.567-568, 2008年3月17日〜19日
音声入力Webシステムw3voiceにおける音声認識手法の検討

西村竜一, 三宅純平, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2008.03.17 (千葉工業大学, 習志野市)

　View Summary

pp.51-52, 2008年3月17日〜19日
歌唱音声と会話音声のSTRAIGHTによる分析と母音部におけるスペクトル変動の統計的性質の比較

吉田有里, 森勢将雅, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2008.03.17 (千葉工業大学, 習志野市)

　View Summary

pp.279-280, 2008年3月17日〜19日
時間窓と入力信号の持続時間に基づく音響イベント検出を利用した音源位置推定法の一検討

小林憲昭, 森勢将雅, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2008.03.17 (千葉工業大学, 習志野市)

　View Summary

pp.775-776, 2008年3月17日〜19日
音声入力Webシステムによる音声認識アプリケーションの構築技術

西村竜一, 三宅純平, 河原英紀, 入野俊夫

情報処理学会第70回全国大会講演論文集 2008.03.14 (筑波大学,つくば市)

　View Summary

3L-5, Vol.5, pp.343-344, 2008年3月13日〜15日
STRAIGHTに基づく柔軟な音声合成技術の開発

河原英紀, 大西壮登, 森勢将雅, 高橋徹, 西村竜一, 坂野秀樹, 入野俊夫

情報処理学会第70回全国大会講演論文集 2008.03.14 (筑波大学,つくば市)

　View Summary

4L-5, Vol.5,pp.357-358, 2008年3月13日〜15日
Development of versatile speech synthesis technology based on STRAIGHT

KAWAHARA Hideki, ONISHI Masato, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, BANNO Hideki, IRINO Toshio

全国大会講演論文集 2008.03.13
A New implementation technique for building ASR applications based on voice-enabled Web systems

NISIMURA Ryuichi, MIYAKE Jumpei, KAWAHARA Hideki, IRINO Toshio

全国大会講演論文集 2008.03.13
AS-5-1 THE POWER SPECTRUM ESTIMATION FOR PERIODIC SIGNAL BASED ON TIME AVERAGING

Morise Masanori, Takahashi Toru, Kawahara Hideki, Irino Toshio

Proceedings of the IEICE General Conference 2008.03.05
F0 trajectory deviations from nominal musical transcription in Pop singing

YOSHIDA Yuri, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, IRINO Toshio, KAWAHARA Hideki

情報処理学会, 音声言語情報処理研究会(第70回)、音楽情報科学研究会(第74回), 情報処理学会研究報告, 2008-MUS-74-3, 2008-SLP-70-3, pp.13-18 2008.02.08 (伊東温泉, 伊東市(静岡県), 2008年2月8日〜9日)

　View Summary

A reformulation of the STRAIGHT F0 extractor, based on a new power spectrum estimation method for periodic signals called TANDEM, has made it practical to extract the whole F0 trajectory of a singing voice from an actual performance. This article reports a first attempt at representing the effects of singing style in terms of deviations from a nominal musical transcription, using a singing database that consists of various types of singing performances by professional pop singers. F0 extraction issues caused by the fast F0 transitions commonly found in singing voices are also discussed.
Speaker size discrimination for acoustically scaled versions of naturally spoken words,

Yoshie Aoki, Toshio Irino, Hideki Kawahara, Roy D. Patterson

ARO 31st Midwinter meeting,Abstract #508, 2008.02

　View Summary

Phoenix, AZ, USA, 16-21 Feb. 2008. (発表日 19 Feb. )
Fundamental frequency estimation based on TANDEM-STRAIGHT and its evaluation

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, BANNO Hideki, IRINO Toshio

第9回音声言語シンポジウム, (電子情報通信学会音声研究会・言語理解とコミュニケーション研究会),情報処理学会研究報告, 2007-SLP-69-45, pp.259-264, 信学技報 Vol.107(406), NLC2007-77, SP2007-140 2007.12.21 (NTTけいはんな,京都, 2007年12月20日〜21日)

　View Summary

The TANDEM method, a power spectrum estimation method for periodic signals, was proposed to provide a temporally stable representation and has been applied to reformulate STRAIGHT, a system for speech analysis, modification and synthesis. This article proposes a fundamental period estimation method based on the ratio between the TANDEM spectrum and the STRAIGHT spectrum. By providing specialized F0 detectors for multiple F0 candidates and integrating individual clues, the proposed method selectively detects fundamental components and yields a probability measure for each estimate. It also provides a method to estimate aperiodicity in each frequency band, making use of the estimated fundamental frequency information to design a quadrature signal on the frequency axis for filtering the periodic spectral component caused by signal periodicity. The proposed method is capable of representing pathological speech signals more precisely than conventional methods.
聴覚系における共鳴体の「大きさ」知覚の時間追従性 − 寸法変調音声を用いた検討 −

竹島千尋, 津崎実, 入野俊夫

日本基礎心理学会第26回大会 2007.12.09 (上智大学, 東京)

　View Summary

p.54, 2007年12月8日〜9日
Perception of degraded word sounds from the monosyllable sequence

森本隆司, 入野俊夫, 河原英紀

聴覚研究会資料 2007.12.06 (熊本大学, 熊本) 日本音響学会

　View Summary

H-2007-135, 37 (10), pp.775-780 2007年12月6日〜7日
Speaker size discrimination for acoustically scaled versions of naturally spoken words

青木良枝, 入野俊夫, Patterson Roy D

聴覚研究会資料 2007.12.06 (熊本大学, 熊本) 日本音響学会

　View Summary

H-2007-137, 37 (10), pp.787-792 2007年12月6日〜7日
単音節系列の知覚に関する検討〜調音結合と日本語特有の音節遷移情報の影響があるか〜

森本隆司, 入野俊夫, 河原英紀

第10回若手研究者交流研究発表会 2007.11.29 (甲南大学, 神戸) 日本音響学会関西支部
ボイスチェンジャー5.0〜日本語５母音に基づく声質変換〜

大西壮登, 高橋徹, 入野俊夫, 河原英紀

第10回若手研究者交流研究発表会 2007.11.29 (甲南大学, 神戸) 日本音響学会関西支部
自然発話された単語による音源寸法知覚の弁別閾- 巨人と小人の声の共通点を探る -

青木良枝, 入野俊夫, Roy D.Patterson, 河原英紀

第10回若手研究者交流研究発表会 2007.11.29 (甲南大学, 神戸) 日本音響学会関西支部
双方向変換により共通化された時間周波数軸上でのパラメタ混合に基づく音声モーフィング

高橋徹, 大西壮登, 森勢将雅, 河原英紀, 入野俊夫

第22回信号処理シンポジウム 2007.11.08 (東北大学, 仙台)

　View Summary

pp. 316-321 2007年11月7日〜9日
分析位置に依存しない周期信号のパワースペクトル推定法に基づく音声分変換合成法STRAIGHTの再構成について

河原英紀, 森勢将雅, 高橋徹, 西村竜一, 坂野秀樹, 入野俊夫

第22回信号処理シンポジウム 2007.11.08 (東北大学, 仙台)

　View Summary

pp. 310-315 2007年11月7日〜9日
周期信号の分析時刻に依存しないパワースペクトル推定法における対雑音性の評価

森勢将雅, 高橋徹, 河原英紀, 入野俊夫

第22回信号処理シンポジウム 2007.11.07 (東北大学, 仙台)

　View Summary

pp. 581-586 2007年11月7日〜9日
Vowel-based speech conversion using generalized inverse

ONISHI Masato, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2007.10.26 (長崎大学, 長崎, 2007年10月25日〜26日)

　View Summary

A vowel-based voice conversion method using the generalized inverse was proposed. The proposed method uses only vowel information to design a spectrum conversion function for each frame. The conversion function is generated by mixing the functions designed for the individual vowels, based on the similarity of the current frame to each vowel template. The proposed method was compared with our previous proposal, in which a Gaussian potential function of the distance to each template was used to calculate similarity. The proposed method enables a geometrical interpretation of the mixing weights as a minimum norm to the subspace spanned by the vowel templates. Preliminary test results using objective as well as subjective measures were presented.
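As a rough illustration of how a generalized inverse can supply mixing weights, the Python sketch below projects a frame onto the subspace spanned by five templates and mixes per-vowel conversion functions with the resulting weights. The random templates, the gain-vector form of the conversion functions, and all names are hypothetical; the abstract does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_vowels = 64, 5

# Hypothetical vowel templates: one spectral vector per column.
templates = rng.standard_normal((n_bins, n_vowels))
# Hypothetical per-vowel conversion functions, here simple gain vectors.
vowel_gains = rng.uniform(0.5, 2.0, size=(n_vowels, n_bins))

def mixing_weights(frame, templates):
    """Least-squares weights of the frame on the template subspace."""
    # The Moore-Penrose pseudoinverse gives the weights whose template
    # combination is closest (in Euclidean norm) to the frame.
    return np.linalg.pinv(templates) @ frame

def convert(frame, templates, vowel_gains):
    """Mix the per-vowel gains with the frame's weights and apply them."""
    w = mixing_weights(frame, templates)
    return (w @ vowel_gains) * frame

# A frame identical to the third template should select only that vowel.
weights = mixing_weights(templates[:, 2], templates)
converted = convert(templates[:, 2], templates, vowel_gains)
```

With linearly independent templates, a frame that coincides with one template receives a weight of one for that vowel and zero for the others, so its conversion reduces to that vowel's own function.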
w3voice: Development of Speech Input Method for Voice-enabled Web Applications

西村竜一, 三宅純平, 河原英紀, 入野俊夫

情報処理学会研究報告, 2007-SLP-68-3, 情報処理学会,第3回音声言語情報処理技術デッベロッパーズフォーラム 2007.10.19 (早稲田大,東京)

　View Summary

We have developed a speech input method called "w3voice" to build practical and handy voice-enabled Web applications. It is constructed using a simple Java applet and CGI programs comprising free software. The mechanism of voice-based interaction is built on raw audio signal transmission via the POST method and the redirection response of HTTP. We have released a number of w3voice applications on our website for public use. The system also aims at organizing a voice database obtained from home and office environments. We have succeeded in acquiring 8,412 inputs (47.9 inputs / day) over a period of seven months. This report gives an overview of the proposed system and the results of analyzing the collected inputs to observe utterance lengths and SNR.
周期信号の分析時刻に依存しないパワースペクトル推定法に適した窓関数の検討

森勢将雅, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2007.09.21 (山梨大学, 甲府)

　View Summary

pp.349-350, 2007年9月19日〜21日
聴覚フィルタを評価に用いた逆フィルタ設計法に関する一考察

森勢将雅, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2007.09.21 (山梨大学, 甲府)

　View Summary

pp.737-738, 2007年9月19日〜21日
STRAIGHTにおける時間周波数分析の新しい定式化と実装について

河原英紀, 森勢将雅, 高橋徹, 西村竜一, 入野俊夫, 坂野秀樹

日本音響学会：秋季研究発表会講演論文集 2007.09.21 (山梨大学, 甲府)

　View Summary

pp.347-348 , 2007年9月19日〜21日
劣化処理した単音節系列の知覚に関する検討

森本隆司, 入野俊夫, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2007.09.20 (山梨大学, 甲府)

　View Summary

pp.595-596, 2007年9月19日〜21日
有声/無声(ささやき)母音系列における寸法変調の検知閾

竹島千尋, 津崎実, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2007.09.20 (山梨大学, 甲府)

　View Summary

pp.539-542, 2007年9月19日〜21日
単語音声を用いた寸法弁別実験の改善

青木良枝, 入野俊夫, PATTERSON Roy D, 河原英紀

日本音響学会：秋季研究発表会講演論文集 2007.09.20 (山梨大学, 甲府)

　View Summary

pp.549-550, 2007年9月19日〜21日
STRAIGHTを用いた反復分析再合成音声の評価

高橋徹, 河原英紀, 入野俊夫

日本音響学会：秋季研究発表会講演論文集 2007.09.19 (山梨大学, 甲府)

　View Summary

pp.289-290, 2007年9月19日〜21日
母音情報に基づく声質変換法のためのスペクトル伸縮について

大西壮登, 高橋徹, 入野俊夫, 河原英紀

2007.09.19 (山梨大学, 甲府)

　View Summary

pp.397-398, 2007年9月19日〜21日
Public Open Tests of Interactive Speech-oriented Web applications

Nishimura Ryuichi, Miyake Junpei, Kawahara Hideki, Irino Toshio

FIT2007 第6回情報科学技術フォーラム 2007.09.07 (中京大学, 愛知)

　View Summary

pp.319-322, 2007年9月5日〜7日 (筆頭著者西村、「FITヤングリサーチャー賞」受賞)
Automatic mapping function designing method modeled by segmental linear function for auditory morphing

Takahashi Toru, Ohnishi Masato, Morise Masanori, Banno Hideaki, Kawahara Hideki, Irino Toshio

FIT2007 第6回情報科学技術フォーラム 2007.09.06 (中京大学, 愛知)

　View Summary

pp.233-236, 2007年9月5日〜7日
招待講演 The robustness of bio-acoustic communication and the role of normalization,

Roy D. Patterson, Ralph van Dinther, Toshio Irino [Invited]

19th International Congress on Acoustics (ICA2007) 2007.09.03 (Madrid)

　View Summary

2-7 Sept., 2007.
招待講演 A computational auditory model with a nonlinear cochlea and acoustic scale normalization,

Toshio Irino, Tom C. Walter, Roy D. Patterson [Invited]

19th International Congress on Acoustics (ICA2007) 2007.09.03 (Madrid)

　View Summary

2-7 Sept., 2007.
LE-004 Timbre control of singing voice based on statistical analysis of singing vowel spectra and its evaluation

Morise Masanori, Tahara Kayoko, Takahashi Toru, Irino Toshio, Kawahara Hideki

情報科学技術レターズ 2007.08.22
A temporal and frequency interference-free power spectral representation of periodic signals : Toward STRAIGHT spectral estimation without tunable component

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, NISIMURA Ryuichi, IRINO Toshio, BANNO Hideki

IEICE technical report 2007.07.26 (富山県立大, 富山, 2007年7月26日〜27日)

　View Summary

A new spectral estimation procedure that is free of interference due to periodicity in both the time and the frequency domain is proposed. The basic form of the proposed method has only a few tunable parameters once the fundamental frequency of the signal under inspection is given. This is in strong contrast to the current implementation of STRAIGHT, where many parameters were tuned numerically or in an ad hoc manner. Time domain interference is eliminated by adding the power spectra calculated with a pair of windows separated by one half of the fundamental period. Frequency domain interference is eliminated by combining power spectrum integration and linear interpolation, based on an approximation-based interpretation of the sampling theory. The proposed method can be used to replace the current spectral estimation subsystem of STRAIGHT and is suitable for realtime processing.
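The time-domain step described above, adding power spectra from two windows half a fundamental period apart, can be sketched in Python as follows. This is only an illustration under assumed settings: the Hann window, its length of 2.5 periods, the probe frequency, the test signal, and the function names are not taken from the report.

```python
import numpy as np

def power_at(x, fs, center, freq, win_len):
    """Power at `freq` of a Hann-windowed segment centred at sample `center`."""
    start = center - win_len // 2
    seg = x[start:start + win_len]
    n = np.arange(win_len)
    basis = np.exp(-2j * np.pi * freq * n / fs)
    return np.abs(np.sum(np.hanning(win_len) * seg * basis)) ** 2

def tandem_power_at(x, fs, center, freq, win_len, f0):
    """Average of two power spectra taken half a fundamental period apart."""
    half_period = int(round(fs / f0 / 2))
    return 0.5 * (power_at(x, fs, center, freq, win_len)
                  + power_at(x, fs, center + half_period, freq, win_len))

fs, f0 = 8000, 100.0
t = np.arange(fs) / fs
# Test signal: six equal-amplitude harmonics with arbitrary fixed phases.
x = sum(np.cos(2 * np.pi * k * f0 * t + 0.7 * k * k) for k in range(1, 7))

win_len = 200                        # assumed: 2.5 fundamental periods
probe = 150.0                        # midway between the first two harmonics
centers = np.arange(1000, 1080, 4)   # analysis positions over one period
single = np.array([power_at(x, fs, c, probe, win_len) for c in centers])
tandem = np.array([tandem_power_at(x, fs, c, probe, win_len, f0)
                   for c in centers])
ratio = tandem.std() / single.std()  # relative temporal variation
```

The single-window power fluctuates with the analysis position because adjacent harmonics interfere, and the interference term changes sign when the window moves by half a period. Averaging the two power spectra therefore removes most of that fluctuation in this example.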
A unified design criteria for noise adaptive sound reproduction system based on an auditory model

森勢将雅, 福田俊介, 高橋徹, 入野俊夫, 河原英紀

13th Regional Convention, Audio Eng. Soc., 2007.07.20 (Tokyo)

　View Summary

19 - 21, July, 2007 (日本語)
Computational theory of auditory size-shape information extraction and the localization in the brain

IRINO Toshio, OOYA Yoshikazu, KAWAHARA Hideki, PATTERSON Roy D

IEICE technical report 2007.06.14 (沖縄科学技術大学院大学(OIST),沖縄 2007年6月14日〜15日)

　View Summary

Although the perception of size and shape from visual stimuli has been studied intensively as an important topic, the perception of size and shape from auditory stimuli has received almost no attention in the auditory research field. In this report, we describe the size and shape information in acoustic signals and a computational theory of how the auditory system extracts this information. We also present experimental studies that support the theory, the optimality of the auditory filters based on the theory, and the ecological perspectives implied by the theory. We performed fMRI experiments to identify the location of size-shape perception in the brain. We report the preliminary results and issues.
Applying Speech Transformation function derived from Speech Texture Mapping to Automatic Speech Morphing An application of voice texture mapping

高橋徹, 森勢将雅, 大西壮登, 西村竜一, 入野俊夫, 坂野秀樹, 河原英紀

電子情報通信学会技術研究報告(音声研究会), SP2007-6, Vol.107, No.77, pp.31-34 2007.05.31 (けいはんなATR, 京都)

　View Summary

A general framework for speech morphing is proposed based on a concept called speech texture mapping. The proposed method eliminates anchoring point assignment, which is a severe obstacle to adopting STRAIGHT-based morphing in a wide range of applications. Instead of using anchoring points to design the frequency axis mapping, proximity to prototypical spectrum templates is used to calculate weighting coefficients for mixing prototypical mapping functions. This framework is an extension of our previous vowel-based speech conversion method. Several alternative temporal axis alignment methods are discussed, showing how the proposed frequency axis design procedure is integrated into a morphing procedure that does not rely on anchoring point assignment.
Speaker conversion system based on vowels : An implementation of voice texture mapping

TAKAHASHI Toru, MORISE Masanori, NISIMURA Ryuichi, IRINO Toshio, BANNO Hideki, KAWAHARA Hideki

IEICE technical report 2007.03.26 (東京大学, 東京, 2007年3月26日-27日)

　View Summary

A simple and high-quality voice conversion procedure that depends only on vowel information is proposed. It is based on a framewise conversion of the frequency axis, fundamental frequency, and global spectral and aperiodicity information, using posterior probability as the weighting function for calculating the mapping function. The proposed method is an implementation of a concept called "speech texture mapping" that was proposed by one of the authors. The key idea behind the advantages of the proposed method is that the roles and the relevant mapping functions of detailed structure (referred to as "texture") and global structure (referred to as "framework") are different from each other. This clear distinction between "texture" and "framework" enabled high-quality voice conversion requiring only a very small amount of training data. This distinction also provides a way to alleviate the degradations due to the "averaging" or "learning" processes that are indispensable in conventional voice conversion methods.
単語音声における寸法の弁別閾の測定

青木良枝, 入野俊夫, Roy D. Patterson, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.14 (芝浦工大, 東京)

　View Summary

pp.471-472, 2007年3月13日-15日
音声モーフィングにおける周波数座標変換関数の設計と知覚への影響について

河原英紀, 森勢将雅, 高橋徹, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2007.03.14 (芝浦工大, 東京)

　View Summary

pp.477-478, 2007年3月13日-15日
fMRIによるスケール変形に対する脳内活動部位の検討

大屋義和, 入野俊夫, Roy D. Patterson, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.14 (芝浦工大, 東京)

　View Summary

pp.425-426, 2007年3月13日-15日
話者の寸法を変化させた時の母音と単語の知覚特性の比較

林芳恵, 入野俊夫, Roy D. Patterson, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.14 (芝浦工大, 東京)

　View Summary

pp.473-474, 2007年3月13日-15日
音声認識を用いた劣化音声に含まれる情報の検討

松村勇作, 入野俊夫, 西村竜一, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.14 (芝浦工大, 東京)

　View Summary

pp.475-476, 2007年3月13日-15日
STRAIGHTスペクトルにおける周波数方向の冗長性の削減の検討

吉田有里, 畑宏明, 坂野秀樹, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.289-290, 2007年3月13日-15日
自動音素セグメンテーションと自動特徴点設定手法を用いた音声モーフィング

大西壮登, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.273-274, 2007年3月13日-15日
ネットワーク公開試験に向けた音声対話Webアプリケーションの開発

西村竜一, 三宅純平, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.17-18, 2007年3月13日-15日
モーフィング率独立操作による部分モーフィング音声の品質評価

高橋徹, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.211-212, 2007年3月13日-15日
パルス列を用いた高域における群遅延操作の弁別閾推定

森勢将雅, 高橋徹, 入野俊夫, 河原英紀

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.517-518, 2007年3月13日-15日
STRAIGHTを用いた歌唱合成における母音スペクトル形状制御の効果について

森勢将雅, 田原佳代子, 高橋徹, 入野俊夫, 河原英記

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.219-220, 2007年3月13日-15日
低周波数領域での区分線形補間の弊害についての一検討

鈴田健太郎, 森勢将雅, 高橋徹, 河原英紀, 入野俊夫

日本音響学会：春季研究発表会講演論文集 2007.03.13 (芝浦工大, 東京)

　View Summary

pp.275-276, 2007年3月13日-15日
Auditory stream segregation based on speaker size, and identification of size-modulated vowel sequences

Tsuzaki Minoru, Takeshima Chihiro, Irino Toshio, Patterson Roy D

HEARING - FROM SENSORY PROCESSING TO PERCEPTION 2007
招待講演 Warped-time-stretched pulse: An acoustic test signal robust against ambient noise,

Masanori Morise, Toshio Irino, Hideki Banno, Hideki Kawahara [Invited]

4th Joint Meeting of the ASA and ASJ 2006.12.01 (Honolulu, Hawaii)

　View Summary

J. Acoust. Soc. Am. , 120(5), Pt.2, p.3223, 28 Nov. - 2 Dec. 2006,
On perceptually relevant impulse response compensation : Discrimination threshold of group delay manipulation and its frequency dependency

MORISE Masanori, TAKAHASHI Toru, IRINO Toshio, KAWAHARA Hideki

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会, 応用音響研究会), EA2006-72, 106(371), pp.13-18 2006.11.23 (九州大学・大橋キャンパス,福岡, 2006年11月23日-24日)

　View Summary

Discrimination thresholds of group delay modifications in different frequency regions were measured using a 2AFC paradigm. The tests were conducted to clarify the acceptable errors in temporal structure in impulse response compensation. Taking advantage of this error tolerance, regularization algorithms that do not suffer from erroneous zeros in measured transfer functions are being investigated. In this report, as the first step toward this goal, a series of tests using a pulse train was designed and conducted. The shape of the group delay manipulation has a constant relative bandwidth in terms of ERB_N and has various maximum delay values. The test results indicated that discrimination was poor in the lower frequency region, namely below 1000 Hz. For the higher frequency region, the results indicated that the discrimination threshold is inversely proportional to the center frequency of the group delay manipulation. It was also found that the threshold is smaller when the group delay manipulation has a negative peak value than in the other case.
Application of auditory model based evaluations for parameter adjustments

福田俊介, 森勢将雅, 河原英紀, 入野俊夫

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会, 応用音響研究会), EA2006-77, 106(371), pp.43-48 2006.11.23 (九州大学・大橋キャンパス ,福岡, 2006年11月23日-24日)

　View Summary

A new framework for adjusting adaptive multiband equalizers was proposed, based on a gammachirp filter bank (GCBF) that closely simulates the nonlinear and adaptive frequency analysis in the human auditory system. The proposed framework aims at establishing a method for objective evaluation and optimization of sound reproduction inside a car. The goal of the adjustment is to provide a musical experience comparable to that under ordinary listening room conditions. GCBF analysis results for background noise, reproduced musical sounds, and filtered mixtures of these sounds are presented with discussions.
Acoustic event detection based on bandwise duration and its application to location estimation

森勢将雅, 高橋徹, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会, 応用音響研究会), EA2006-73, 106(371), pp.19-24 2006.11.23 (九州大学・大橋キャンパス,福岡, 2006年11月23日-24日)

　View Summary

A highly accurate acoustic event detection method was proposed based on bandwise group delay parameters and minimum phase compensation. These bandwise parameters make it possible to select the best band to maximize the reliability of the estimates. This is practically very useful because, even in a low signal-to-noise condition, it is usually possible to select a band that has a much better signal-to-noise ratio and yields far better estimates. This local improvement in signal-to-noise ratio enables accurate event detection. In this paper, an index representing the amount of energy concentration was proposed as the parameter for event detection. A series of simulations provides the relations between bandwidth and the risk of detection errors for each time window length when using the proposed index. Relations between signal-to-noise ratio and the accuracy of event timing estimates were also provided. Finally, application of the proposed method to three-dimensional sound source localization was briefly discussed in terms of distributed acoustic sensors.
Source signal extraction and aperiodicity evaluation based on STRAIGHT spectrum

KAWAHARA Hideki, MORISE Masanori, TAKAHASHI Toru, IRINO Toshio, BANNO Hideki, FUJIMURA Osamu

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会音声研究会), SP2006-83, 106(333), pp.43-48 2006.11.10 (産業技術総合研究所, つくば, 2006年11月9日-10日)

　View Summary

A new procedure to extract the aperiodic component was proposed based on a fundamental discussion of how aperiodicity should be defined. This investigation is part of ongoing research to provide high-quality speech processing methods consisting of analysis, modification and synthesis. The discussion clarifies the roles of, and relations between, the frequency domain representation of signal duration based on group delay, the bandwise durations of the source signal extracted using a minimum phase inverse filter derived from a STRAIGHT spectrum, the prediction residuals obtained using flanking segments that are one pitch period apart, and the apparent residuals due to temporal spectral variations.
Evaluating naturalness of speech sounds morphed by independently using the interpolation ratios of the time-frequency axes and amplitude,

Toru Takahashi, Masanori Morise, Toshio Irino

J. Acoust. Soc. Am. , 120(5), Pt.2, 2006.11

　View Summary

(4th Joint Meeting of the ASA and ASJ: 28 Nov. - 2 Dec. 2006, Honolulu, Hawaii). (発表日 28 Nov.)
Temporal characteristics of extraction of size information in speech sounds,

Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino

J. Acoust. Soc. Am., 120(5), Pt.2, 2006.11

　View Summary

(4th Joint Meeting of the ASA and ASJ: 28 Nov. - 2 Dec. 2006, Honolulu, Hawaii) (発表日 29 Nov.)
弁別素性に基づく異聴表による健聴者と難聴者の音声知覚の対比

中家諒, 入野俊夫, 中市健志, 坂本真一, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.15 (金沢大学, 石川) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.369-370, 2006年9月13日-15日
ロボット対話のための雑音認識手法に関する検討

橋爪亜希, 西村竜一, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.14 (金沢大学, 石川) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.135-136, 2006年9月13日-15日
聴覚特性を考慮したSTRAIGHTスペクトル補間特性とその主観評価について

畑宏明, 坂野秀樹, 高橋徹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.13 (金沢大学, 石川) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.271-272, 2006年9月13日-15日
母音テンプレートスペクトルを用いた音声テクスチャマッピングのための特徴点自動設定における距離尺度の検討

大西壮登, 高橋徹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.13 (金沢大学, 石川) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.269-270, 2006年9月13日-15日
対数時間軸伸縮を用いたインパルス応答測定における直接音・反射音成分の分離について

森勢将雅, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.13 (金沢大学, 石川) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.429-430,2006年9月13日-15日
歌唱音声中の母音スペクトル形状の変動要因と歌唱合成への応用について

田原佳代子, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.13 (金沢大学, 石川) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.267-268, 2006年9月13日-15日
招待講演音色に潜む寸法と形状情報 −混沌から紡ぎだす秩序−

津崎実, 入野俊夫 [Invited]

日本音響学会 2006.09.13 (金沢大学, 石川)

　View Summary

秋季研究発表会講演論文集, pp.619-622, 2006年9月13日-15日.
音色に潜む寸法と形状情報―混沌から紡ぎだす秩序―

津崎実, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.06
ブラックマン窓を用いたSTRAIGHTスペクトル分析

高橋徹, 森勢将雅, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.09.06
Can humans perceive size differences in the calls of cats, dogs, and cows?

Toshio Irino, Atsuhi Ban, Hideki Kawahara, Roy D. Patterson

presented at the British Society of Audiology (BSA) , Short Papers Meeting on Experimental Studies of Hearing and Deafness, 2006.09

　View Summary

Cambridge Univ., UK, 14-15 Sept. 2006
The dynamic compressive gammachirp, dcGC, and development plans,

Toshio Irino

presented in the meeting on Auditory representations of size/shape, CNBH, Dept. Physiol., Devel. and NeuroSci. 2006.09

　View Summary

Cambridge Univ., UK., 11-12, Sept. 2006.
高品質音声分析変換合成システム STRAIGHT における分析窓の検討

高橋徹, 森勢将雅, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告 2006.08.31 (はこだて未来大, 函館)

　View Summary

(日本音響学会・電子情報通信学会, 音声研究会), SP2006-42, 106(222), pp.1-5, 2006年8月30日-31日
A study of analysis window for high-quality speech analysis, modification and synthesis system STRAIGHT

TAKAHASHI Toru, MORISE Masanori, IRINO Toshio, KAWAHARA Hideki

IEICE technical report 2006.08

　View Summary

A Blackman window-based complementary set of time windows is proposed to replace the one in the current implementation of the STRAIGHT speech analysis, modification and resynthesis system, where a complementary set of pitch-synchronized Gaussian windows is used to eliminate temporal variations in power spectral calculation. A Gaussian window was used in the original implementation of STRAIGHT because it has the identical form in the frequency domain and has the minimum uncertainty. However, those theoretical advantages are destroyed in the process of pitch synchronization, where a pitch-synchronous Bartlett window is convolved with the original Gaussian window. It is more straightforward to use cosine-based time windows instead of the pitch-synchronized Gaussian window because they are intrinsically pitch synchronous. Evaluations of the proposed window set using test signals consisting of multiple harmonic components with random phases and amplitudes revealed that the proposed Blackman-based window performs best at suppressing temporal variations in the resulting power spectra.
寸法変調母音系列の同定成績と寸法正規化の時間的追随性との関連性

竹島千尋, 津崎実, 入野俊夫

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会音声研究会), SP2006-29 2006.07.21 (北陸先端大, 石川) 日本音響学会

　View Summary

聴覚研究会資料 H-2006-80, 36 (5), pp.439-443, 2006年7月20日〜21日
Identification of size-modulated vowel sequences and temporal characteristics of the size extraction process

TAKESHIMA Chihiro, TSUZAKI Minoru, IRINO Toshio

IEICE technical report 2006.07

　View Summary

We can identify vowels pronounced by any speaker, even though speakers differ in vocal tract length. At the same time, we can discriminate differences in vocal tract length. To simulate these abilities, a computational model has been proposed in which the size information is extracted and separated from the shape information. In this paper, we investigated the temporal characteristics of this size-extraction process. In the first experiment, listeners were required to identify size-modulated vowel sequences. The results showed that performance deteriorated for rapid modulation. This deterioration could be explained by hypothesizing that a rapid change in vocal tract size causes stream segregation. In the second experiment, listeners judged whether or not a target vowel was present in the sequences. The observed tendency also supported the segregation hypothesis.
招待講演 Size Matters: How the auditory system produces its scale invariant representation of the message in a sound

Roy D. Patterson, Toshio Irino [Invited]

Workshop on New Ideas in Hearing 2006.05.12 (Paris)

　View Summary

Equipe Audition, ENS,12-13 May, 2006.
(Abstracts of Presentation,The 24th Annual Meeting)

津崎実, 竹島千尋, 入野俊夫

The Japanese Journal of Psychonomic Science 2006.03.31
TSPを用いた音響測定におけるPC用AD/DA変換システムの選定について

森勢将雅, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.16 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.653-654, 2006年3月14日-16日
健聴者の劣化音声知覚と難聴者の通常音声知覚の対比

中家諒, 綿貫敬介, 坂本真一, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.16 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.483-484, 2006年3月14日-16日
知覚信号処理のための動的圧縮型ガンマチャープ聴覚フィルタバンク

入野俊夫, PATTERSON Roy D

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.16 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.471-472, 2006年3月14日-16日
STRAIGHTに基づくモーフィングのオブジェクト化による拡張と部分モーフィングの応用について

河原英紀, 西雅史, 森勢将雅, 野口美咲, 高橋徹, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.16 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.505-506, 2006年3月14日-16日
寸法変調母音の同定に対する寸法正規化の時間的追随性について

竹島千尋, 津崎実, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.16 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.473-474, 2006年3月14日-16日
雑音認識能力を持つロボット対話インタフェース

西村竜一, 橋爪亜希, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.14 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.203-204, 2006年3月14日-16日
多重音声モーフィングに基く平均声合成の検討

高橋徹, 西雅史, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.14 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.229-230, 2006年3月14日-16日
音声テクスチャマッピング表現による音声適応・変換手法

高橋徹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.14 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.231-232, 2006年3月14日-16日
STRAIGHTスペクトルの時間方向補間におけるERBN周波数尺度上でのスペクトル距離の性質について

畑宏明, 坂野秀樹, 高橋徹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.14 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.313-314, 2006年3月14日-16日
音量とF0による歌唱母音STRAIGHTスペクトルの形状変化と全極近似について

田原佳代子, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.14 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.371-372, 2006年3月14日-16日
回帰分析による実環境対話音声の快・不快感情識別

大前壮司, 西村竜一, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2006.03.14 (日本大学, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, pp.359-360, 2006年3月14日-16日
Speech texture mapping: a general framework for flexible speech style conversion and synthesis

高橋徹, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会音声研究会), SP2006-144, 105 (571), pp.31-36 2006.01.26 (和歌山大学, 和歌山, 2006年1月26日-27日)

　View Summary

Speech texture mapping is proposed as a unified framework for speech manipulations such as speaker conversion, speech morphing, and emotional speech synthesis. The proposed framework decomposes parametric speech representations into an underlying structure (wireframe) and detailed features and deviations (texture). For example, linguistic information may be attributed to the wireframe, and individuality and emotional expressions may be attributed to the texture. In this interpretation, emotional speech conversion is represented as the mapping of different textures onto a common wireframe. This article also provides methods and examples for applying the proposed framework to a variety of speech conversion tasks.
Automatic assignment of anchoring points on vowel templates for speech morphing

西雅史, 高橋徹, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会音声研究会), SP2006-142, 105 (571), pp.19-24 2006.01.26 (和歌山大学, 和歌山, 2006年1月26日-27日)

　View Summary

An automatic assignment of anchoring points for speech morphing is proposed. The original morphing procedure interpolates linearly transformed parameters of two speech samples on a common time-frequency coordinate system by deforming one of the coordinates. This time-frequency deformation to align the coordinates has significant effects on the quality of the morphed speech sounds. The deforming function was defined by manually allocating anchoring points on the time-frequency representations of each speech sample. This manual allocation was a huge obstacle to using the method in various applications because it is time-consuming and tedious. This article describes methods to replace this process with an objective procedure. The anchoring points are composed of frequency coordinates and temporal coordinates. The central idea is to prepare vowel templates with pre-assigned anchoring points in advance and to deform one of the templates to match the input speech spectrum. As a result of this deformation, the coordinates of the frequency anchoring points are obtained from those of the points on the deformed template. The optimum deformation is calculated using the DP (dynamic programming) procedure. The temporal coordinates of the anchoring points are defined using the phoneme labels annotated on the speech sample. A subjective test on the naturalness of the morphed speech sounds was conducted and revealed that the proposed method effectively provides highly natural morphed sounds.
A study on implementation of real-time STRAIGHT and the effect of parameter reduction

坂野秀樹, 畑宏明, 高橋徹, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会, 音声研究会), SP2006-140, 105 (571), pp.7-12 2006.01.26 (和歌山大学, 和歌山, 2006年1月26日-27日)

　View Summary

This paper describes an implementation of real-time STRAIGHT, a high-quality speech analysis, modification and synthesis system based on the VOCODER-type representation. STRAIGHT is currently finding wide application in areas such as speech synthesis systems and tools for auditory experiments. However, the current MATLAB implementation of STRAIGHT is not suited to real-time applications. Thus, several changes were introduced for real-time processing, including porting the source code to the C language, replacing the F0 extraction algorithm with a cepstrum-based method, and omitting the control of the short-time phase in synthesis. A preliminary experiment confirmed that the real-time STRAIGHT can be executed on recent personal computers and has higher quality than the cepstrum vocoder.
Perceptually weighted spectral distortions of STRAIGHT parameter interpolation for high quality speech processing

畑宏明, 高橋徹, 入野俊夫, 河原英紀

電子情報通信学会技術研究報告, (日本音響学会・電子情報通信学会, 音声研究会), SP2006-139, 105 (571), pp.1-6 2006.01.26 (和歌山大学, 和歌山, 2006年1月26日-27日)

　View Summary

A high-quality speech analysis, modification and resynthesis procedure referred to as STRAIGHT employs an excessively redundant speech parameter representation, which has been the major obstacle to using STRAIGHT in various applications. This article provides basic information for redundancy reduction by reducing the frame rate of spectral analysis. Two interpolation methods (nearest-neighbor and linear) for STRAIGHT spectral parameters were investigated using spectral distance measures on both the linear frequency axis and the nonlinear perceptual frequency axis (ERB_N rate). The investigation was conducted using a speech database developed for speech conversion, and 190 sentences for four male and four female speakers were used to evaluate those spectral distances. The results indicated that, with linear interpolation, the default frame interval of 1 ms can be increased up to 5 ms for male speech and 4 ms for female speech samples.
Temporal characteristics of extraction of size information in speech sounds

Takeshima, C, M. Tsuzaki, T. Irino

Journal of the Acoustical Society of America 2006
多重音声モーフィングを用いた新しい平均声作成法

西雅史, 高橋徹, 入野俊夫, 河原英紀

第８回日本音響学会関西支部若手研究者交流研究発表会, 8(B) 2005.12.15 (京都)
音声劣化を気づかせない STRAIGHT 合成パラメタ圧縮手法と評価

畑宏明, 高橋徹, 入野俊夫, 河原英紀

第８回日本音響学会関西支部若手研究者交流研究発表会, 15(A) 2005.12.15
知覚的距離に基づく自動音声モーフィングのための母音テンプレートの検討

西雅史, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

聴覚研究会資料 2005.12.08 (熊本大学, 熊本) 日本音響学会

　View Summary

H-2005-120, 35 (11), pp.705-710, 2005年12月8日〜9日
Comparison of auditory filters with cascade and parallel architectures for simultaneous notched-noise masking

鵜木祐史, 入野俊夫, Glasberg Brian

聴覚研究会資料 = Proceedings of the auditory research meeting 2005.12.08 (熊本大学, 熊本) 日本音響学会

　View Summary

聴覚研究会資料 H-2005-124, 35 (11), pp.727-732
聴覚における「形」の恒常性と寸法正規化について

津崎実, 竹島千尋, 入野俊夫

日本基礎心理学学会・第24回大会, 1P31 2005.12.03 (立教大学, 東京)

　View Summary

2005年12月3〜4日
Speech segregation using an event-synchronous auditory image and STRAIGHT

Toshio Irino, Roy D. Patterson, Hideki Kawahara

Speech Separation by Humans and Machines 2005.12.01

　View Summary

We have presented methods to segregate concurrent speech sounds using an auditory model and a vocoder. Specifically, the method involves the Auditory Image Model (AIM), a robust F0 estimator, and a synthesis module based either on STRAIGHT or an auditory synthesis filterbank. The event-synchronous procedure enhances the intelligibility of the target speaker in the presence of concurrent background speech. The resulting segregation performance is better than with conventional comb-filter methods whenever there are errors in fundamental frequency estimation as there always are in real concurrent speech. Test results suggest that this auditory segregation method has potential for speech enhancement in applications such as hearing aids. © 2005 Springer Science + Business Media, Inc.
Underlying principles of a high-quality speech manipulation system STRAIGHT and its application to speech segregation

Hideki Kawahara, Toshio Irino

Speech Separation by Humans and Machines 2005.12.01

　View Summary

Testing human performance using ecologically relevant stimuli is crucial. STRAIGHT provides powerful means and strategies for doing this. This article outlines the underlying principles of STRAIGHT and the morphing procedure to provide a general understanding for potential users of a new research strategy, systematic downgrading. The strategy seems to open up new research possibilities for testing human performance without disturbing their natural conditions. © 2005 Springer Science + Business Media, Inc.
A test signal robust against background noise in the measurement of acoustic impulse responses: Warped-TSP

Masanori Morise, Toshio Irino, Hideki Banno, Hideki Kawahara

International Congress on Noise Control Engineering 2005, INTERNOISE 2005 2005.12.01

　View Summary

We propose a new test signal to improve the accuracy of the measurement of acoustic impulse responses. Linear Time-Stretched Pulse (TSP) signals have been widely used for acoustic measurements. They are useful signals robust to time-varying acoustic environments due to the concentration of energy as a chirp signal. However, they require multiple repetitions, particularly in low-SNR conditions, since their energy distribution is flat while the energy of ambient noise tends to be concentrated in low-frequency regions. Multiple repetitions preclude the measurement of time-varying environments. Recently, "log-TSP" or "log swept-sine" signals were defined on the logarithmic time axis to improve tolerance to noise and harmonic distortion. They improve relative SNR in low-frequency regions at the cost of reducing relative SNR in high-frequency regions. It is desirable to develop a signal that combines the merits of both linear-TSP and log-TSP signals. We propose a new TSP signal, referred to as "warped-TSP," which gradually combines the two signals in a transitional frequency region. The warped-TSP enables us to choose an optimal parameter for the transition in accordance with the spectral distribution of noise in the environment under measurement. In this paper, we describe warped-TSP in terms of design, principle, and effectiveness. We describe the definition and the relationship between the parameters and the spectral distribution. We show the principle behind its robustness to background noise and harmonic distortion, and a method for the optimal choice of parameters using simple measurement and calculation. We show the results of a series of measurement tests under different environments and clearly demonstrate that warped-TSP performs better than conventional linear-TSP and log-TSP. Since the definition of warped-TSP is simple, it can replace conventional TSPs without additional computational cost.
Accuracy improvement in speech sound propagation measurement using logarithmic temporal manipulation

MORISE Masanori, IRINO Toshio, KAWAHARA Hideki

電子情報通信学会技術研究報告, EA2005-64, pp.43-48 2005.10.21 (金沢大学, 金沢, 2005年10月20日-21日) 電子情報通信学会：電気音響研究会, 日本音響学会：聴覚研究会・電気音響研究会

　View Summary

A new procedure to improve the accuracy of the empirical transfer function estimation method is proposed for investigating speech sound propagation. In our previous work based on the cross-spectrum method, vowel dependencies of empirical transfer functions from a lip reference point to observation points around the speaker's head were found. The accuracy of the method was also evaluated against references obtained using a HATS and an M-sequence, which revealed significant variations in the higher frequency range (namely 4 kHz or more) due to low speech energy. The proposed method alleviates this problem by introducing a logarithmic temporal manipulation and lowpass filtering in the manipulated domain. The proposed method was tested using 128 vocalizations of sustained Japanese vowels with roving fundamental frequency. The test results indicated that the proposed method reduced standard deviations to 53% in gain estimation, 18% in group delay estimation, and 17% in duration estimation in the frequency region above 10 kHz. Detailed aspects of the implementation are also discussed.
PC用AD/DA変換器における折り返し歪について

森勢将雅, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.29 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp.679-680, 2005年9月27日-29日
両耳間相関関数を用いない音源方向推定

松井知子, 田辺国士, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.29 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp.713-714, 2005年9月27日-29日
寸法変調母音の同定成績と聴覚メリン・イメージに基づく決定統計量の関連

津崎実, 竹島千尋, 入野俊夫

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.28 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp.493-494, 2005年9月27日-29日
主成分分析を用いた感情表現による母音部における音色変化のモデル化と感情マッピング

高橋徹, 坂野秀樹, 西村竜一, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.28 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp. 293-294, 2005年9月27日-29日
音声モーフィングにおける対応点設定の自動化に関する研究

西雅史, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.27 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp.397-398, 2005年9月27日-29日
母音スペクトル形状における音高・音量依存成分の分析について―RWC研究用音楽データベース中の歌唱音声の分析―

田原佳代子, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.27 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp.405-406, 2005年9月27日-29日
有声音部分におけるSTRAIGHTスペクトルの補間特性の検討

畑宏明, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集(CD-ROM) 2005.09.27 (東北大学, 仙台) 日本音響学会

　View Summary

秋季研究発表会講演論文集, pp. 407-408, 2005年9月27日-29日
Analysis for emotion understanding with utterance collection in spoken dialogue system

OMAE Souji, NISIMURA Ryuichi, KAWAHARA Hideki, IRINO Toshio

第57回音声言語情報処理研究会 (SIG-SLP) 2005.07.16 (湯の川温泉, 函館 2005年7月15日-16日) 情報処理学会

　View Summary

Understanding users' emotions is becoming important for realizing smooth conversations in spoken dialogue systems. This study discusses the realities of automatic emotion understanding by analyzing actual user utterances collected through field testing of our spoken dialogue system "Takemaru-kun". Two raters rated the collected utterances on a five-grade scale for 16 basic emotions. Factor analysis of the ratings indicated the existence of two factors concerning negative or positive emotions. To realize emotion understanding, we have been investigating the correlation between the factors and acoustic features of users' voices. The results reported in this paper showed that the factors have no remarkable correlation with the fundamental frequency or the power.
招待講演 Extracting a carrier-independent version of the syllabic message: The principles

Roy D. Patterson, Thomas C. Walters, Toshio Irino [Invited]

149th meeting 2005.05.16

　View Summary

J. Acoust. Soc. Am. , 117(4), Pt.2, p.2373, April 2005 (149th meeting: 16-20 May 2005)
招待講演 The stabilized, wavelet-Mellin transform for analyzing the size and shape information of vocalized sounds

Toshio Irino, Roy D. Patterson [Invited]

149th meeting 2005.05.16

　View Summary

J. Acoust. Soc. Am. , 117(4), Pt.2, p.2373, April 2005 (149th meeting: 16-20 May 2005)
招待講演 Identification of size-modulated vowels sequences: Effects of modulation periods and speaking rates

Minoru Tsuzaki, Toshio Irino, Roy D. Patterson [Invited]

149th meeting: 16-20 May 2005 2005.05.16

　View Summary

J. Acoust. Soc. Am. , 117(4), Pt.2, p.2374, April 2005
Explaining two-tone suppression and forward masking data using a compressive gammachirp auditory filterbank

Toshio Irino, Roy D. Patterson

J. Acoust. Soc. Am. , 117(4), April 2005 (ASA meeting: May 2005) 2005.04
感情音声データベースにおける母音特徴に注目したSTRAIGHTによる声質・感情変換について

藤井岳史, 西雅史, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 2005.03.17 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.299-300, 2005年3月15日-17日
Statistical properties of vibrato based on STRAIGHT analysis

MORISE M, HIRACHI Y, BANNO H, IRINO T, KAWAHARA H

日本音響学会研究発表会講演論文集 2005.03.17 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.269-270, 2005年3月15日-17日
スペクトル時間変化を制限して合成した劣化音声の知覚

佐藤諭, 入野俊夫, 坂野秀樹, 河原英紀

日本音響学会研究発表会講演論文集 2005.03.17 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.251-252, 2005年3月15日-17日
音声の平均スペクトルを用いた帯域分割型CSP法に基づく話者位置推定法に関する検討

伝田遊亀, 西浦敬信, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集 2005.03.17 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.521-522, 2005年3月15日-17日
歌唱音声の音量変化に伴うスペクトル変形の分析について

田原佳代子, 森勢将雅, 坂野秀樹, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 2005.03.17 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.271-272, 2005年3月15日-17日
STRAIGHTに基く周波数・時間伸縮を用いた感情マッピングのための距離尺度

高橋徹, 坂野秀樹, 西村竜一, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 2005.03.16 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.213-214, 2005年3月15日-17日
User's emotion analysis by using actual utterances for speech-oriented information system

OMAE S, NISIMURA R, KAWAHARA H, IRINO T

日本音響学会研究発表会講演論文集 2005.03.16 (東京農工大, 東京) 日本音響学会

　View Summary

春季研究発表会講演論文集, I, pp.63-64, 2005年3月15日-17日
Identification of "size-modulated" vowel sequences: Effects of modulation periods and speaking rates

Tsuzaki, M, T. Irino, R.D. Patterson

Journal of the Acoustical Society of America 2005
Speech recognition with wavelet spectral subtraction in real noisy environment

Yuki Denda, Takanobu Nishiura, Hideki Kawahara, Toshio Irino

2004 7th International Conference on Signal Processing Proceedings, ICSP 2004.12.27

　View Summary

In this paper, we focus on the effectiveness of wavelet spectral subtraction in noisy speech recognition. Fourier spectral subtraction is a conventional, effective technique for this purpose. It is suitable for stationary noise reduction (e.g., white Gaussian-like noise), because the short-time Fourier transform provides a uniform time-frequency resolution in each frequency band. However, it cannot effectively reduce sudden noise. On the other hand, the wavelet transform may be suitable for analyzing sudden (non-stationary) signals, because it admits a non-uniform time-frequency resolution across frequency bands. We previously reported at EUROSPEECH2003 that noise reduction using Fourier spectral subtraction, wavelet spectral subtraction, and microphone array steering performs effectively in real noisy environments. However, it was not clear what kinds of noise characteristics could be reduced with wavelet spectral subtraction. To address this problem, in this paper we evaluated the performance of wavelet spectral subtraction and Fourier spectral subtraction in various noisy environments. The evaluation experiments confirmed that wavelet spectral subtraction reduces sudden noise and higher-frequency noise more effectively than Fourier spectral subtraction.
巨人と赤ちゃんのおしゃべりは同じ言葉にきこえる？- 音源の寸法を変化させた母音の知覚特性 -

青木美和, 入野俊夫, 河原英紀

第７回日本音響学会関西支部若手研究者交流研究発表会, 15(A) 2004.12.16 (京都)

　View Summary

（筆頭著者青木、「若手奨励賞」受賞）
感情音声データベースにおける母音重心および基本周波数の分布について

藤井岳史, 高橋徹, 坂野秀樹, 入野俊夫, 河原英紀

第７回日本音響学会関西支部若手研究者交流研究発表会, 8(B) 2004.12.16 (京都)
帯域分割型CSP法に基づいた話者位置推定法の性能評価

傳田遊亀, 西浦敬信, 河原英紀, 入野俊夫

第７回日本音響学会関西支部若手研究者交流研究発表会, 23(A) 2004.12.16 (京都)
STRAIGHT を用いたビブラート歌唱法のF0、スペクトルの特徴抽出および合成について

森勢将雅, 平地由美, 坂野秀樹, 入野俊夫, 河原英紀

第７回日本音響学会関西支部若手研究者交流研究発表会, 16(B) 2004.12.16 (京都)
Perception of "size-modulated" speech: The relation between the modulation period and the vowel identification

Minoru Tsuzaki, Toshio Irino

聴覚研究会資料 2004.12.04 (九州大学, 福岡) 日本音響学会

　View Summary

H-2004-125, 34(10), pp. 713-718, 2004年12月4日-5日
A Study of Talker Localization Based on Subband CSP Analysis

DENDA Yuki, NISHIURA Takanobu, KAWAHARA Hideki, IRINO Toshio

IEICE technical report. Speech 2004.12

　View Summary

It is very important to capture distant-talking speech with high quality for voice-controlled systems or teleconferencing systems. Microphone array steering is an ideal candidate as an effective method for capturing distant-talking speech with high quality. However, it requires localizing the target talker before capturing distant-talking speech. For this purpose, talker localization methods such as one based on CSP (Cross-power Spectrum Phase) analysis have been proposed. However, the talker localization performance of CSP analysis is degraded in noisier environments. To deal with this problem, in this paper we propose a subband CSP analysis weighted by the average speech spectrum, and with it a localization method specialized for speech. In addition, we evaluate the ASR (Automatic Speech Recognition) performance when the microphone array is steered to the talker direction estimated by the proposed method. Evaluation experiments in a real room confirmed that the proposed method provides better talker localization performance than the conventional method.
Speech recognition with wavelet spectral subtraction in real noisy environment

Yuki Denda, Takanobu Nishiura, Hideki Kawahara, Toshio Irino

International Conference on Signal Processing Proceedings, ICSP 2004.11.17

　View Summary

In this paper, we focus on the effectiveness of wavelet spectral subtraction in noisy speech recognition. Fourier spectral subtraction is a conventional, effective technique for this purpose. It is suitable for stationary noise reduction (e.g., white Gaussian-like noise), because the short-time Fourier transform provides a uniform time-frequency resolution in each frequency band. However, it cannot effectively reduce sudden noise. On the other hand, the wavelet transform may be suitable for analyzing sudden (non-stationary) signals, because it admits a non-uniform time-frequency resolution across frequency bands. We previously reported at EUROSPEECH2003 that noise reduction using Fourier spectral subtraction, wavelet spectral subtraction, and microphone array steering performs effectively in real noisy environments. However, it was not clear what kinds of noise characteristics could be reduced with wavelet spectral subtraction. To address this problem, in this paper we evaluated the performance of wavelet spectral subtraction and Fourier spectral subtraction in various noisy environments. The evaluation experiments confirmed that wavelet spectral subtraction reduces sudden noise and higher-frequency noise more effectively than Fourier spectral subtraction.
高品質音声分析変換合成のための音源情報の抽出について

河原英紀, 高橋徹, 坂野秀樹, 入野俊夫

聴覚研究会資料 2004.11.13 (はこだて未来大学, 北海道) 日本音響学会

　View Summary

H-2004-109, 34(9), pp.615-620, 2004年11月13日
"脳は音の何を聞いているのか," 特別展示 in "脳！大いなるフロンティアに挑む"

河原, 入野研究室

科学技術振興機構(JST) CREST脳４領域合同イベント 2004.10.09 (日本科学未来館, 東京)
STRAIGHTスペクトルの平滑化による劣化音声合成方式の提案

坂野秀樹, 入野俊夫, JIN J, 河原英紀

日本音響学会研究発表会講演論文集 2004.09.28 (琉球大学, 沖縄) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.375-376, 2004年9月28日-30日
Algorithm amalgam: Morphing waveform based methods, sinusoidal models and STRAIGHT

Hideki Kawahara, Hideki Banno, Toshio Irino, Parham Zolfaghari

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2004.09.28

　View Summary

A tool to investigate an important fundamental question in speech processing is proposed, aiming to promote research on voice quality and on para- and non-linguistic aspects of speech. The proposed method effectively emulates waveform-based methods, sinusoidal models and the high-quality source-filter model STRAIGHT. The key idea that enables blending these seemingly disjoint algorithms is a group-delay-based representation of signal excitation. By using a STRAIGHT-based smoothed time-frequency representation that is shared by these three types of speech processing methods, a unified source representation is used to implement the proposed system. Informal listening tests using the proposed system indicated that phase manipulation introduces different timbre, but that the exact waveform need not be reproduced to reproduce the same timbre. This suggests that further information reduction may be possible in synthesizing speech of close-to-natural quality.
招待講演 スケール変調音声に対する聴覚的追随性と聴覚的情景

津崎実, 入野俊夫 [Invited]

日本音響学会 2004.09.28 (琉球大学, 沖縄)

　View Summary

秋季研究発表会講演論文集, I, pp.521-524, 2004年9月28日-30日
高品質音声分析変換合成のための音源情報抽出法の改良について

河原英紀, 高橋徹, 坂野秀樹, 入野俊夫

日本音響学会研究発表会講演論文集 2004.09.28 (琉球大学, 沖縄) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.225-226, 2004年9月28日-30日
実環境音声情報案内システムにおける発話感情理解についての検討

大前壮司, 西村竜一, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集 2004.09.28 (琉球大学, 沖縄) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.205-206, 2004年9月28日-30日
暗騒音に基づいたインパルス応答測定用信号の設計手法

森勢将雅, 入野俊夫, 坂野秀樹, 河原英紀

日本音響学会研究発表会講演論文集 2004.09.28 (琉球大学, 沖縄) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.593-594, 2004年9月28日-30日
STRAIGHTに基づく周波数・時間伸縮を用いた感情マッピング手法の検討

高橋徹, 坂野秀樹, 西村竜一, 入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 2004.09.28 (琉球大学, 沖縄) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.407-408, 2004年9月28日-30日
スケール変形した日本語5母音の知覚特性

青木美和, 入野俊夫, PATTERSON R D, 河原英紀

日本音響学会研究発表会講演論文集 2004.09.28 (琉球大学, 沖縄) 日本音響学会

　View Summary

秋季研究発表会講演論文集, I, pp.373-374, 2004年9月28日-30日
スケール変調音声に対する聴覚的追随性と聴覚的情景

津崎実, 入野俊夫

日本音響学会研究発表会講演論文集 2004.09.21
A comparison of auditory filters with cascade and parallel architectures in models of auditory masking

Masashi Unoki, Roy D. Patterson, Toshio Irino

presented at the British Society of Audiology (BSA) , Short Papers Meeting on Experimental Studies of Hearing and Deafness, Univ. Essex, UK, 16-17 Sept. 2004. 2004.09
A method for designing acoustic measurement signals robust against background noise

MORISE Masanori, IRINO Toshio, BANNO Hideki, KAWAHARA Hideki

電子情報通信学会技術研究報告, (電子情報通信学会：電気音響研究会, 日本音響学会：聴覚研究会・電気音響研究会), EA2004-44, pp.37-42 2004.08.19 (東北大学, 仙台, 2004年8月19 日-20日)

　View Summary

We propose a new signal for measuring the acoustic impulse responses of rooms and audio equipment. The signal, named "warped TSP", is a combination of TSP (time-stretched pulse) and logarithmic TSP signals that improves accuracy in both the low- and high-frequency regions where the ambient or background noise is relatively high. An optimal signal can be defined in accordance with the spectral distribution of the background noise. In this report, we describe the method for designing the warped TSP and show the dependency between the parameter and the spectral distribution. Moreover, we demonstrate the effectiveness of the warped TSP by measuring room acoustics and headphone characteristics in real environments.
招待講演 Processing of scale information in the auditory system - Analogy to visual processing

Toshio Irino [Invited]

Summer school of the international graduate school neurosensory science and systems: "Object formation in audition and vision: Bottom-up and top-down processing" 2004.08.18 (Bad Zwischenahn, Germany)

　View Summary

18-22, August,2004
招待講演 聴覚による音源の寸法・形状推定

Toshio Irino [Invited]

日本心理学会聴覚心理学研究会 2004.07.31 (大阪大学)
招待講演 Analysis of scale information in the auditory system

Toshio Irino, Roy D. Patterson [Invited]

Proc. 18th International Congress on Acoustics (ICA2004) 2004.04.04 (Kyoto, Japan)

　View Summary

vol 1, pp.457-460, 4-9 Apr. 2004
GMMによる母音/子音区間検出を用いた母音/子音平均スペクトルに基づく適応形ビームフォーマの検討

中山雅人, 西浦敬信, 河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集 2004.03.19 (神奈川工大, 神奈川) 日本音響学会

　View Summary

春季研究発表会, I, pp.647-648
Performance Evaluation of Wavelet Spectral Subtraction in Noisy Speech Enhancement

Yuki Denda, Takanobu Nishiura, Hideki Kawahara, Toshio Irino

Special Workshop in MAUI (SWIM), Lectures by Masters in Speech Processing Maui , Hawaii, Jan. 12-14, 2004. 2004.01
Filling the gap between waveform coding and source filter models: lessons from source modeling based on group delay

Hideki Kawahara, Hideki Banno, Toshio Irino, Parham Zolfaghari

Special Workshop in MAUI (SWIM), Lectures by Masters in Speech Processing Maui , Hawaii, Jan. 12-14, 2004. 2004.01
自動車内での遠隔発話音声受音に対するマイクロホンアレーの効果

中山雅人, 傳田遊亀, 西浦敬信, 河原英紀, 入野俊夫

第６回日本音響学会若手研究者交流研究発表会 2003.12.11
招待講演 初期聴覚系におけるスケール理論

Toshio Irino [Invited]

特別セミナー, 統計数理研究所 2003.11.14
Speech Segregation using Auditory Vocoder with Event-Synchronous procedure

入野俊夫, Patterson Roy D, 河原英紀

聴覚研究会資料 = Proceedings of the auditory research meeting 2003.11
Underlying principles of a high-quality speech manipulation system STRAIGHT and its application to speech segregation,

Hideki Kawahara, Toshio Irino

Perspectives on Speech Separation - a Workshop, Montreal, Canada, Oct. 31 - Nov. 2, 2003. (sponsored by the National Science Foundation) 2003.10
圧縮型ガンマチャープのパラメータ推定のための音圧算出法と適合結果

鵜木祐史, PATTERSON R D, 入野俊夫

日本音響学会研究発表会講演論文集 2003.09.17 (大同工大, 名古屋) 日本音響学会

　View Summary

秋季研究発表会, I, pp.429-430
Analysis of scale information in the auditory system,

Toshio Irino

Workshop on "Source Size information in Speech and Music," CNBH, Dept. of Physiology, Univ. of Cambridge, 8-10, Sept, 2003. 2003.09
初期聴覚系におけるスケール理論

入野俊夫

第17回関西合同ゼミ日本音響学会研究発表会講演論文集 2003.07.26 (和歌山大学)
Speech segregation based on fundamental periodicity using auditory vocoder

IRINO Toshio, PATTERSON Roy D, KAWAHARA Hideki

IEICE technical report. Speech 2003.06

　View Summary

We have developed a method for speech segregation based on the Auditory Image Model (AIM) and a scheme of event-synchronous processing. AIM was developed to provide a reasonable representation of the "auditory image" we perceive in response to a sound. We have also developed an "auditory vocoder" for resynthesizing speech from the auditory image using an existing, high-quality vocoder, STRAIGHT. The auditory representation preserves fine temporal information, unlike conventional window-based processing, which makes it possible to segregate the speech synchronously. We have also developed a method to convert the F0 to event times. We have shown that the segregation from the concurrent speech is good even when the SNR is 0 dB, and the glottal-event times of the target speaker are perfectly estimated. The extracted target speech was distorted but entirely intelligible, whereas the distracter speech was reduced to a non-speech sound that was not perceptually disturbing. This system may explain how the auditory system segregates speakers inasmuch as the playback is resynthesized from the output of a reasonable auditory model.
聴覚ボコーダによる混合音声からの音声分離

入野俊夫, PATTERSON R D, 河原英紀

日本音響学会研究発表会講演論文集 2003.03.20 (早稲田大学)

　View Summary

日本音響学会春季大会講演論文集, I, pp.343-344, 2003年3月18日〜20日
Invited talk: 初期聴覚系におけるスケール理論

Toshio Irino [Invited]

日本音響学会 2003.03.18 (早稲田大学)

　View Summary

春季研会講演論文集, I, pp.511-514,
Dominance spectrum based V/UV classification and f0 estimation

Tomohiro Nakatani, Toshio Irino, Parham Zolfaghari

EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology 2003.01.01

　View Summary

This paper presents a new method for robust voiced/unvoiced segment (V/UV) classification and accurate fundamental frequency (f0) estimation in a noisy environment. For this purpose, we introduce the degree of dominance and dominance spectrum that are defined by instantaneous frequency. The degree of dominance allows us to evaluate the magnitude of individual harmonic components of speech signals relative to the background noise. The V/UV segments are robustly classified based on the capability of the dominance spectrum to extract the regularity in the harmonic structure. f0 is accurately determined based on fixed points corresponding to dominant harmonic components easily selected from the dominance spectrum. Experimental results show that the present method is better than the existing methods in terms of gross and fine f0 errors, and V/UV correct rates in the presence of background white and babble noise.
An estimation method for fundamental frequency and voiced segment in infant utterance,

Tomohiro Nakatani, Shigeaki Amano, Toshio Irino

144th Meeting of Acoust. Soc. Am., J. Acoust. Soc. Am., Cancun, Mexico, 2-6, Dec., 2002. 2002.12
イベント検出に基づいた聴覚ボコーダ

入野俊夫, PATTERSON R D, 河原英紀

日本音響学会研究発表会講演論文集 2002.09.27 (秋田大学)

　View Summary

日本音響学会秋季大会講演論文集, I, pp.321-322
幼児音声の基本周波数および有声区間の推定法

中谷智広, 天野成昭, 入野俊夫

日本音響学会研究発表会講演論文集 2002.09.26 (秋田大学) 日本音響学会

　View Summary

秋季大会講演論文集, I, pp.393-394
Invited talk: An auditory vocoder resynthesis of speech from an auditory Mellin representation

Toshio Irino, Roy D. Patterson, Hideki Kawahara [Invited]

European and Japanese Acoustic Societies Symposium (EAA-SEA-ASJ), Forum Acusticum Sevilla 2002 2002.09.16 (Sevilla, Spain)

　View Summary

HEA-02-005-IP, 16-20, Sept., 2002. (Invited Talk) (Abstract in Acta Acustica, Vol. 88, Suppl. 1, pp.S118, 2002)
A computational theory of the early auditory system : optimality, explaining experimental data, and ecological point of view

入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting 2002.09.06 (ATR) 日本音響学会

　View Summary

Vol. 32, No.7, pp.455-460, H-2002-69
Auditory vocoder by mapping auditory and Fourier representations

Toshio Irino, Roy D. Patterson, Hideki Kawahara

CREST workshop on Computational Models of Auditory Processing 2002.07.08 (Kyoto, Japan)

　View Summary

8-9, July, 2002
An auditory Mellin transform for segregating size and shape information of vocal tract

Toshio Irino, Roy D. Patterson

CREST workshop on Computational Models of Auditory Processing 2002.07.08 (Kyoto, Japan)

　View Summary

8-9, July, 2002
Invited talk: Time-domain auditory processing of the dynamic aspects of speech

Roy D. Patterson, Toshio Irino [Invited]

Dynamics of Speech Production and Perception, NATO Advanced Study Institute 2002.06.24 (Il Ciocco, Italy)

　View Summary

24 June - 6 July, 2002. (Talk as a faculty member)
聴覚計算理論は聴覚末梢系の進化を説明できるか？

Toshio Irino

科学技術振興事業団CREST「脳を創る」第3回全体シンポジウム 2002.05.22 (日本科学未来館, 東京)
Invited talk: 聴覚系を理解し応用するための計算理論

Toshio Irino [Invited]

第15回回路とシステム（軽井沢）ワークショップ 2002.04.22 (軽井沢)

　View Summary

pp. 269-274
Invited talk: 聴覚メリン表現からの信号再合成

Toshio Irino [Invited]

名古屋大学統合音響情報研究拠点、CIAIR音声信号処理ワークショップ 2002.03.27 (名古屋大学)
時間周波数表現における3種類の不動点と音響的特徴について

河原英紀, ZOLFAGHARI P, 入野俊夫

日本音響学会研究発表会講演論文集 2002.03 (神奈川大学)

　View Summary

日本音響学会春季大会講演論文集, I, pp.325-326
Fundamental Frequency Estimation Based on Dominance Spectrum.

中谷智広, 入野俊夫

聴覚音声研究会, Vol.32, No.2, pp. 105-112, H-2002-14 2002.03 (東京大) 日本音響学会

　View Summary

This paper presents a new method for robust and accurate fundamental frequency (F_0) estimation in the presence of background noise and spectral distortion. For this purpose, degree of dominance and a dominance spectrum are defined based on instantaneous frequencies of the STFT spectra. The degree of dominance is a measure for evaluating the magnitude of individual harmonic components relative to the background noise. The fundamental frequency is correctly estimated from reliable harmonic components easily selected in the dominance spectra. Experiments are performed using white and multi-talker background noise under the conditions with and without spectral distortion produced by a SRAEN filter. Results show that the present method is better than the commonly-used conventional methods in terms of both the F_0 correct rates and fine F_0 errors.
Parameter estimation of the compressive gammachirp in notched-noise masking data for various frequencies

UNOKI Masashi, PATTERSON Roy D, IRINO Toshio

日本音響学会研究発表会講演論文集 2002.03 (神奈川大学) 日本音響学会

　View Summary

春季大会講演論文集, I, pp.496-496
Fundamental frequency estimation based on dominant harmonic components

NAKATANI T, IRINO T

日本音響学会研究発表会講演論文集 2002.03 (神奈川大学)

　View Summary

日本音響学会春季大会講演論文集, I, pp.323-324 (first author Nakatani received the Poster Award)
Auditory Vocoder: Speech resynthesis from an auditory Mellin model

Toshio Irino, Roy D. Patterson, Hideki Kawahara

2002 NTT workshop on Communication Scene Analysis 2002.01.21 (Kanagawa, Japan)

　View Summary

Jan. 21-23, 2002
Fitting the compressive gammachirp auditory filter to human notched-noise masking data for various frequencies

鵜木祐史, Patterson Roy D, 入野俊夫

聴覚研究会資料 = Proceedings of the auditory research meeting 2002.01 (岩手県立大) 日本音響学会

　View Summary

聴覚研究会資料,Vol. 32, No.1, pp.41-48, H-2002-06
Sound resynthesis from Auditory Mellin Image.

IRINO T, PATTERSON Roy D, KAWAHARA H

日本音響学会秋季大会講演論文集 2001.10.02 (大分大学)

　View Summary

1, pp.247-248, 2001年10月2日〜 4日
Application of F_0 extraction method based on instantaneous frequency to co-channel speech

NAKATANI T, IRINO T

日本音響学会秋季大会講演論文集 2001.10.02 (大分大学)

　View Summary

1, pp.211-212,2001年10月2日〜 4日
初期聴覚系の計算理論：最適性理論・実験データとの整合性・生態学的観点

入野俊夫, Roy D. Patterson, 河原英紀

神経回路学会第１１回全国大会講演論文集 2001.09.27

　View Summary

pp.17-18, 奈良, 2001年9月27日〜 29日
Signal resynthesis from Auditory Mellin Image using a high-quality VOCODER, STRAIGHT

IRINO Toshio, PATTERSON Roy D, KAWAHARA Hideki

聴覚研究会資料, Vol. 31 (5), 315-322 (H-2001-43), 音声研究会資料(SP2001-40) 2001.07 (金沢工大) 日本音響学会

　View Summary

We propose a method for resynthesizing sounds from auditory representations, Auditory Mellin Images, by using a high-quality VOCODER, STRAIGHT. Analysis/synthesis systems for speech sounds have been studied extensively since the VOCODER was developed in 1939. There is, however, no system that involves a realistic auditory model, even though human sound perception is known to be an important factor in developing such systems. We combined the Auditory Mellin Image model and STRAIGHT into a new "auditory" VOCODER system by introducing a mapping function that includes a frequency-warping Discrete Cosine Transform and nonlinear multivariate analysis. With this system, we expect to incorporate auditory functions such as noise robustness and sound-source separation, which have been problems for conventional VOCODERs.
初期聴覚系の計算理論：最適性理論・生理/心理物理データへの整合性・生態学的観点

入野俊夫, Roy D. Patterson

科学技術振興事業団CREST「脳を創る」第2回全体シンポジウム 2001.06.05 (コクヨホール, 品川/東京)
初期聴覚系の計算理論：安定化ウェーブレットとガンマチャープ

Toshio Irino

北陸先端大、情報科学研究科, 講演 2001.03.07 (石川)
The mathematical requirement for stabilization in the wavelet-Mellin transform and its implication

Toshio Irino

科学技術振興事業団CREST「脳を創る」河原プロジェクト Workshop "Stable representation of periodic sounds," 2000.11 (名古屋大学, 名古屋)
Robust fundamental frequency estimation using instantaneous frequencies of harmonic components.

Yoshinori Atake, Toshio Irino, Hideki Kawahara, Jinlin Lu, Satoshi Nakamura, Kiyohiro Shikano

Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000 2000.10
Invited talk: The wavelet-Mellin transform for auditory processing

Toshio Irino [Invited]

Japan-America Frontiers of Science (JAFoS) 2000, held by the National Academy of Sciences (USA) and 科学技術振興事業団 2000.09.21 (Irvine, CA, USA)

　View Summary

Sept. 21-24, 2000
紹介記事：磯崎・高橋他「日米若手研究者のドリームチーム対決、第3回JAFoSシンポジウム報告」科学, Vol.71. No.2, pp.191-196, 岩波書店, 2001.
A physiologically motivated gammachirp auditory filterbank

Toshio Irino, Masashi Unoki, Roy D. Patterson

presented at the British Society of Audiology, Short Papers Meeting on Experimental Studies of Hearing and Deafness, Keele, Sept. 21-23, 2000. 2000.09
Segregating size and shape information of the vocal tract in the auditory system using a stabilized wavelet-Mellin transform

Toshio Irino

Ear Club: Berkeley's Weekly Hearing Sciences Colloquium Series, Univ. of California, Berkeley, Sept. 25, 2000. 2000.09
非対称性補償形ガンマチャープフィルタの近似精度の改善

鵜木祐史, 入野俊夫

聴覚研究会, H-2000-42 2000.06 (北大, 北海道) 日本音響学会
初期聴覚系の計算理論：音源の寸法情報と形状情報の分離抽出

入野俊夫, Roy D. Patterson

科学技術振興事業団CREST「脳を創る」第1回全体シンポジウム 2000.04.12 (コクヨホール, 品川/東京)
ガンマチャープによるネコの基底膜インパルス応答への適合

入野俊夫, PATTERSON R D

日本音響学会研究発表会講演論文集 2000.03.01 (日本大, 千葉) 日本音響学会

　View Summary

春季研究発表会, I, pp.397-398
調波成分の瞬時周波数を利用したピッチ推定方法の提案

阿竹義徳, 入野俊夫, 河原英紀, LU J, 中村哲, 鹿野清宏

日本音響学会研究発表会講演論文集 2000.03 (日本大, 千葉) 日本音響学会

　View Summary

春季研究発表会, I, pp.251-252
A new pitch extraction method using instantaneous frequencies of harmonic components.

阿竹義徳, 入野俊夫, 河原英紀, LU J, 中村哲, 鹿野清宏

音声・聴覚研究会, SP99-170, H-2000-25 2000.03 (東京大, 東京) 日本音響学会

　View Summary

STRAIGHT, developed by Hideki Kawahara et al. in 1996, can produce very natural re-synthesized speech, although it is basically a VOCODER method. STRAIGHT is, however, weak in noise tolerance: the quality of the re-synthesized sounds degrades considerably in noisy environments. This is because STRAIGHT uses pitch-adaptive analysis to produce the time-frequency representation and is sensitive to errors in the estimated pitch frequency. To solve this problem, a new pitch extraction method is proposed in this paper. This method extracts the harmonic components of the glottal pulses and combines them using the bandwidth equation adapted from Cohen's equation (1995). A large database of simultaneous recordings of speech waveforms and EGG (electroglottograph) signals was constructed to evaluate the proposed method, STRAIGHT-TEMPO, and other methods. The results show that the proposed method is much more precise than the other methods when the signal-to-noise ratio is low, and is very accurate and comparable to TEMPO in the clean condition.
A neurobiological framework for auditory images and the segregation of information about source size and shape,

Roy D. Patterson, Toshio Irino

Association for Research in Otolaryngology (ARO), Midwinter meeting, Florida, USA, 20-24 Feb. 2000. 2000.02
ネコの基底膜インパルス応答に対するガンマチャープの適合

入野俊夫, Roy D. Patterson

聴覚研究会, H-2000-14 2000.02 (和歌山大学, 和歌山) 日本音響学会
Invited talk: ガンマチャープ聴覚フィルタバンクによる定常雑音抑圧

Toshio Irino [Invited]

電子情報通信学会, ディジタル信号処理研究会・DSP研究会 1999.12.16 (宮島)

　View Summary

DSP99-120, vol.99, no. 504, pp.59-66,
Steady-state noise suppression using a gammachirp auditory filterbank

IRINO Toshio

Technical report of IEICE. DSP 1999.12.16

　View Summary

Spectral subtraction is the most widely cited noise suppression method for speech signals with steady background noise because it is basically a non-parametric method and simple enough to be implemented with the FFT. It is well known, however, that spectral subtraction produces so-called "musical noise" in the synthetic sounds. Since musical noise, even at a low level, often disturbs human speech perception, spectral subtraction has not been used successfully in applications that must reproduce sounds for human listeners. To overcome this problem at its root, this paper proposes an alternative method using a time-varying analysis/synthesis gammachirp filterbank, which was initially proposed as an auditory filterbank. The present method is shown to achieve about the same SNR improvement as spectral subtraction under the same conditions for the non-speech interval. Moreover, the synthetic sounds contain no musical noise, only steady white-like noise at a reduced level, when the original background is white noise. This method is advantageous in various applications for human listeners since it uses the gammachirp, which is also suitable for approximating human auditory filter shapes. (This paper is based on Tech. Rep. of Acoust. Soc. Jpn. H-98-98 (Sept. 1998) with minor modifications.)
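For readers unfamiliar with the baseline discussed in this summary, the following is a minimal sketch of generic magnitude-domain spectral subtraction. It is not the gammachirp filterbank method proposed in the report. The function names, the over-subtraction factor `over`, and the spectral floor `floor` are illustrative assumptions, and the magnitude spectra are assumed to have been computed elsewhere (for example with an FFT).

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames from a non-speech interval."""
    count = len(noise_frames)
    return [sum(bins) / count for bins in zip(*noise_frames)]


def spectral_subtract(frame_mag, noise_mag, over=1.0, floor=0.01):
    """Subtract an average noise magnitude from one frame's magnitude spectrum.

    Bins that would become negative are clamped to floor * frame_mag.
    Which bins are clamped changes from frame to frame, and the isolated
    bins that survive are what listeners hear as "musical noise".
    `over` and `floor` are hypothetical tuning parameters for this sketch.
    """
    if len(frame_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(x - over * n, floor * x) for x, n in zip(frame_mag, noise_mag)]


# Usage: estimate the noise from two non-speech frames, then clean one frame.
noise = estimate_noise([[1.0, 1.2, 0.8], [1.0, 0.8, 1.2]])
cleaned = spectral_subtract([5.0, 0.9, 1.3], noise)
```

In this toy frame the second bin falls below the noise estimate and is clamped to the floor, while the neighbouring bins survive; that frame-to-frame pattern of clamped and surviving bins is the source of the artefact the summary describes.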
An auditory strategy for separating size and shape information of sound sources

Toshio Irino, Roy D. Patterson

人工知能学会, AIチャレンジ研究会 1999.11 (青山学院大, 東京)

　View Summary

Jpn. Soc. Artificial Intell., Tech. Rep., SIG-Challenge-9907-6, pp.33-38
Stabilised wavelet Mellin transform: An auditory strategy for segregating size and shape information of sound sources

Toshio Irino, Roy D. Patterson

応用ウェーブレット研究会,pp.43-50, 日本機械学会 1999.11 (東京)
Imaging of sound source shape: Auditory strategy for optimal signal processing.

入野俊夫, PATTERSON R D

日本音響学会研究発表会講演論文集 1999.09 (鳥取大, 松江) 日本音響学会

　View Summary

秋季研究発表会, II, pp.1177-1178
Mellin images of vowel sounds and phonological distinctiveness of multi-formant vowels,

Roy D. Patterson, Stefan Uppenkamp, Toshio Irino

presented at the British Society of Audiology (BSA), Short Papers Meeting on Experimental Studies of Hearing and Deafness, Univ. Essex, UK, 21-22 Sept. 1999. 1999.09
生理学的制約をいれたガンマチャープの心理物理データへの適合

入野俊夫, Roy D. Patterson

聴覚研究会, H-99-36 1999.05 (東京医科歯科大, 東京) 日本音響学会
On normalization of sound source size by the Mellin transform in model of the auditory pathway

IRINO Toshio, PATTERSON Roy D

日本音響学会研究発表会講演論文集 1999.03 (明治大, 川崎) 日本音響学会

　View Summary

春季研究発表会, I, pp.383-384
Application of bandwidth equation to fundamental frequency extraction in STRAIGHT

ATAKE Yoshinori, IRINO Toshio, KAWAHARA Hideki

日本音響学会研究発表会講演論文集 1999.03 (明治大, 川崎) 日本音響学会

　View Summary

春季研究発表会, I, pp. 199-200
Parameter determination of a gammachirp filter with physiological constraints

IRINO Toshio, PATTERSON Roy D

日本音響学会研究発表会講演論文集 1999.03 (明治大, 川崎) 日本音響学会

　View Summary

春季研究発表会, I, pp.382-383
A Mathematical Framework for Auditory Processing: A Mellin Transform of a Stabilised Wavelet Transform?

Toshio Irino, Roy D. Patterson

ATR Technical Report : TR-H-264 1999.01.29
聴覚経路におけるメリン変換の計算

入野俊夫, Roy D. Patterson

聴覚研究会, H-99-5 1999.01 (岩手県立大, 岩手) 日本音響学会
Background noise suppression using a gammachirp auditory filterbank.

入野俊夫

電気関係学会関西支部連合大会人工知能学会AIチャレンジ研究会 1998.11.07 (大阪府立大, 大阪)

　View Summary

SIG-Challenge-9801, pp. 33-40
Wavelet-Mellin変換の意味で最適な聴覚フィルタ：ガンマチャープ

Toshio Irino

応用ウェーヴレット解析研究会 1998.10.29 (大学生協会館, 東京)

　View Summary

1998年10月29・30日
Noise suppression using an analysis/synthesis gammachirp filterbank

IRINO Toshio

日本音響学会研究発表会講演論文集 1998.10.29 (大学生協会館, 東京) 日本音響学会

　View Summary

秋季研究発表会, I, pp.241-242 1998年10月29・30日
時変分析合成ガンマチャープ聴覚フィルタバンクと雑音抑圧

Toshio Irino

1998.09 (ATR, 京都) 日本音響学会

　View Summary

聴覚研究会, H-98-98
In audition the optimum time-frequency trading function is Gamma not Gauss,

Toshio Irino

Kenneth Craik Club, Cambridge Univ., UK., 14 July 1998. 1998.07
A time-varying, analysis/synthesis auditory model using the gammachirp filterbank

IRINO Toshio, UNOKI Masashi

日本音響学会研究発表会講演論文集 1998.03 (慶應大, 神奈川) 日本音響学会

　View Summary

春季研究発表会, I, pp.413-414
A method for controlling the asymmetric parameters in the gammachirp filterbank

UNOKI Masashi, IRINO Toshio

日本音響学会研究発表会講演論文集 1998.03 (慶應大, 神奈川) 日本音響学会

　View Summary

春季研究発表会, I, pp.415-416
ガンマチャープフィルタとフィルタバンクの効率的な構成

入野俊夫, 鵜木祐史

聴覚研究会(H-97-69) 1997.10 (NTT 厚木, 神奈川) 日本音響学会
An implementation of the gammachirp filter using an asymmetric IIR filter

IRINO Toshio, UNOKI Masashi

日本音響学会研究発表会講演論文集 1997.09 (北海道大,札幌) 日本音響学会

　View Summary

秋季研究発表会, I, pp.421-422
An efficient implementation of the gammachirp filter and its filterbank design

入野俊夫, 鵜木祐史

ATR Technical Report, ATR-H-225 1997.07.14
Explaining perceptual temporal asymmetry with autocorrelation vs. strobed temporal integration.

入野俊夫, PATTERSON R D

日本音響学会研究発表会講演論文集 1997.03 (同志社大,京都) 日本音響学会

　View Summary

春季研究発表会, I, pp.455-456
レベル依存聴覚フィルタとしてのガンマチャープ

入野俊夫, Roy D. Patterson

聴覚研究会(H-96-73) 1996.10 (NTT 厚木, 神奈川)
On approximation of the auditory filter shape using a gammachirp function

IRINO Toshio, PATTERSON Roy D

日本音響学会研究発表会講演論文集 1996.09 (岡山大, 岡山) 日本音響学会

　View Summary

秋季研究発表会, I, pp.385-386
An asymmetric extension of the gammatone filter function

T. Irino

British Journal of Audiology 1996.01.01
'Gammachirp' function as an optimal auditory filter with the Mellin transform

Toshio Irino

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1996.01.01

　View Summary

Recently, a 'gammachirp' function has been derived as an optimal auditory filter function in terms of minimal uncertainty in a joint time and modified-scale representation if the scale transform defined by Cohen is used in the auditory system. The gammatone function, which is widely used as the impulse response of a linear auditory filter, is a first-order approximation of the 'gammachirp' function consisting of a chirp carrier with an envelope that is a gamma distribution function. In this paper, the optimality of the 'gammachirp' function is argued for the general Mellin transform since Cohen's scale transform is a specific example of the Mellin transform. A sample speech signal is analyzed to demonstrate the properties of a joint time and scale distribution derived with a short-time Mellin transform in comparison with a short-time Fourier spectrum.
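The relationship between the two filter functions in this summary can be sketched numerically. The code below uses the commonly cited form of the gammachirp impulse response, a gamma-distribution envelope multiplied by a carrier whose phase contains an extra `c * ln(t)` term; setting `c = 0` gives the gammatone. The parameter values (`n = 4`, `b = 1.019`, `c = -2.0`) and the Glasberg and Moore ERB formula are assumptions chosen for illustration and are not taken from this page.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore form, assumed)."""
    return 24.7 + 0.108 * f_hz


def gammachirp(t, fr=1000.0, n=4, b=1.019, c=-2.0, a=1.0, phi=0.0):
    """Gammachirp impulse response at time t in seconds (zero for t <= 0).

    envelope: a * t**(n-1) * exp(-2*pi*b*ERB(fr)*t)   (gamma distribution)
    carrier:  cos(2*pi*fr*t + c*ln(t) + phi)          (chirp when c != 0)
    With c = 0 the carrier is a pure tone and the result is a gammatone.
    """
    if t <= 0.0:
        return 0.0
    envelope = a * t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb_hz(fr) * t)
    return envelope * math.cos(2.0 * math.pi * fr * t + c * math.log(t) + phi)


def instantaneous_frequency(t, fr=1000.0, c=-2.0):
    """Carrier phase derivative divided by 2*pi: fr + c / (2*pi*t)."""
    return fr + c / (2.0 * math.pi * t)


# Usage: sample 10 ms of the response at 16 kHz (sampling rate is arbitrary here).
fs = 16000
response = [gammachirp(k / fs) for k in range(int(0.010 * fs))]
```

With a negative `c`, the instantaneous frequency starts below `fr` and converges towards it as `t` grows, which is the chirp that distinguishes the gammachirp from the gammatone.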
理論的に最適な聴覚フィルタ関数

Toshio Irino

岡崎生理研究所研究会 1995.12.04

　View Summary

1995年12月4日〜5日
An asymmetric extension of the gammatone filter function,

Toshio Irino

presented at the British Society of Audiology, Short Papers Meeting on Experimental Studies of Hearing and Deafness, Oxford, September 27-28, 1995. 1995.09
Minimal uncertainty of a gammachirp function in Mellin transform

IRINO Toshio

日本音響学会研究発表会講演論文集 1995.09

　View Summary

日本音響学会秋季大会講演論文集, 1, 421-422
A Computational Theory of the Peripheral Auditory System

IRINO Toshio

IEICE technical report. Speech 1995.07 (北陸先端大) 日本音響学会

　View Summary

A computational theory of the peripheral auditory system is discussed in the manner of D. Marr. A 'gammachirp' function is found to be the optimal auditory filter in terms of minimal uncertainty if the time-scale representation is calculated in the auditory system. A wavelet configuration is optimal for the auditory filterbank above 800 Hz in terms of invariance in the scale representation. A 'delta-gamma' theory is introduced to explain temporal asymmetry in auditory perception. The theory can also explain the physiological firing patterns of an inner hair cell and some neurons in the cochlear nucleus.
On optimality of gammatone filter

IRINO Toshio

日本音響学会研究発表会講演論文集 1995.03

　View Summary

日本音響学会春季大会講演論文集, 1, 449-450
Optimal Auditory Filter and Scale Representation

Toshio Irino

Research Report, NTT Basic Research Labs., ISRL-94-6 1995.02
音響事象検出・強調の計算理論

入野俊夫, Patterson, R.D

日本音響学会聴覚研究会資料, H-94-64 1994.11
A computational theory of asymmetric intensity enhancement around acoustic transients

Irino, T, Patterson, R. D

NTT Basic Research Labs. Technical Report, ISRL-93-9 1994
A computational theory of auditory event detection

Toshio Irino, Roy D. Patterson

ASA meeting, J. Acoust. Soc. Am., 95, 2943, 1994. 1994
Data Reduction Characteristics of the Auditory Wavelet Transform with the Reconstruction Algorithm.

入野俊夫

日本音響学会研究発表会講演論文集 1993.03

　View Summary

pp.257-258
Signal reconstruction from modified auditory wavelet transform

Irino, T, Kawahara, H

NTT Basic Research Labs. Technical Report, ISRL-93-2 1993
The effect of the auditory filter response on voicing judgements for intervocal stop consonant

Toshio Irino

British Soc. Audiology(BSA), Short Papers Meeting on Experimental Studies of Hearing and Deafness, Bristol, 1993. 1993
Modeling of the Head Related Transfer Function to extract features usable in sound localization

Toshio Irino

NTT Basic Research Labs. Technical Report, ISRL-93-7 1993
Speech Signal Processing Using Wavelet Transform.

入野俊夫

電子情報通信学会技術研究報告 1992.10.21
Effects of auditory filter response on voicing judgment in intervocalic stop consonants.

入野俊夫

日本音響学会研究発表会講演論文集 1992.10

　View Summary

日本音響学会秋季大会講演論文集, pp.369-370
A Comparative Study on the Subjective Assessment Measure and the Distance Measure of the Signal Reconstructed by Auditory Wavelet Transform.

入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 1992.03

　View Summary

日本音響学会春季大会講演論文集, pp.391-392
聴覚wavelet変換による聴覚末梢系表現からの信号再構成

Toshio Irino

AVIRG,92年1月例会waveletセミナー 1992.01

　View Summary

Materials are the same as H-91-44.
Invited talk: Wavelet変換による音声信号処理

Toshio Irino

電子情報通信学会, 音声研究会・ディジタル信号処理研究会技術報告 1992

　View Summary

SP-92-81, DSP92-6.
聴覚末梢系表現からの信号再構成

入野俊夫, 河原英紀

日本音響学会聴覚研究会資料, H-91-44 1991.11
Extracting speech excitation information using wavelet transformation.

河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集 1991.10

　View Summary

日本音響学会秋季大会講演論文集, 3-7-8
Signal reconstruction and modification using auditory wavelet transform.

入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 1991.10

　View Summary

pp.411-412
Evaluation of speech excitation events by wavelet transform.

河原英紀, 入野俊夫

電子情報通信学会技術研究報告 1991.07

　View Summary

電子情報通信学会音声研究会資料, SP91-46, H-91-24
Wavelet analysis and synthesis using the impulse response of an auditory peripheral model.

入野俊夫, 河原英紀

日本音響学会研究発表会講演論文集 1991.03

　View Summary

日本音響学会春季大会講演論文集, 1-8-1
Representing temporal information in auditory periphery based on random field theory

Herve, T, Irino, T, Kawahara, H

日本音響学会聴覚研究会資料, H-90-41 1990.09
Wavelet transform and its application to auditory modelling based on neural networks.

河原英紀, 入野俊夫

日本音響学会研究発表会講演論文集 1990.09

　View Summary

日本音響学会春季大会講演論文集, 1-7-15
聴覚モデルによる音声の時間的変動検出能力の検討

河原英紀, 入野俊夫

日本音響学会春季大会講演論文集, 2-5-2 1990.03
多変量解析により構成した多層神経回路網による不特定話者母音の特徴抽出

入野俊夫, 河原英紀

日本音響学会秋季大会講演論文集, 1-1-15 1989.10
A method for designing neural networks using non-linear multivariate analysis. Application to speaker-independent vowel recognition.

入野俊夫, 河原英紀

電子情報通信学会論文誌 D-2 1989.08

　View Summary

(Materials are the same as the IEICE paper.)
An analysis on the neural networks designed using multivariate analysis. Example on speaker-independent vowel recognition.

入野俊夫, 河原英紀

電子情報通信学会技術研究報告 1989.05.19
状態縮約表現により形成された神経回路網の解析

河原英紀, 入野俊夫

日本音響学会聴覚研究会資料, H-89-11 1989.05
多変量解析によるニューラルネットワークの構成法 - 不特定話者母音認識への適用 -

入野俊夫, 河原英紀

日本音響学会春季大会講演論文集, 2-8-2 1989.03
聴覚モデルによる音声の時間的変動検出能力の検討

河原英紀, 入野俊夫

日本音響学会春季大会講演論文集, 2-5-2 1989.03
状態縮約表現を用いた神経回路網による破裂音の識別の検討

河原英紀, 入野俊夫

日本音響学会春季大会講演論文集, 2-8-4 1989.03
A method for designing neural networks using non-linear multivariate analysis. Application to speaker-independent vowel recognition.

入野俊夫, 河原英紀

電子情報通信学会技術研究報告 1989.01

　View Summary

電子情報通信学会音声研究会資料, SP88-123
Simulation of ear using a fluid dynamics model of cochlea

Irino, T, Kawahara, H

NTT Basic Research Labs. Technical Report, ISRL-89-1 1989
A procedure for designing 3-layer neural networks for pattern recognition applications.

河原英紀, 入野俊夫

電子情報通信学会技術研究報告 1988.10.28
多層ニューラルネットワークを用いた不特定話者母音知覚モデルの解析

入野俊夫, 河原英紀

日本音響学会秋季大会講演論文集, 2-P-10 1988.10

　View Summary

(Received the Acoustical Society of Japan Academic Encouragement Award)
A procedure for designing 3-layer neural networks which approximate arbitrary continuous mapping: Applications to pattern processing.

河原英紀, 入野俊夫

電子情報通信学会技術研究報告 1988.09.16
基底膜振動を入力としたニューラルネットワークによる母音特徴抽出の検討

入野俊夫, 河原英紀

第10回神経情報科学研究会資料 1988.08
Exploring temporal feature representations of speech using neural networks.

河原英紀, 入野俊夫

電子情報通信学会技術研究報告 1988.07.28
Speaker independent feature extraction of Japanese vowels using neural networks

Irino, T, Kawahara, H

ATR Workshop on Neural Networks and Parallel Distributed Processing 1988.07 (Kyoto)
A study on the speaker independent feature extraction of Japanese vowels by neural networks,

Toshio Irino, Hideki Kawahara

115th Meeting of the Acoust. Soc. Amer, May, 1988. 1988.05
基底膜振動を入力とした母音特徴抽出の検討 - 神経回路網による表現の探索 -

入野俊夫, 河原英紀

日本音響学会春季大会講演論文集, 3-P-15 1988.03
零温度係数を持つSiC/SiO2/LiTaO3構造弾性境界波基板

入野俊夫, 渡辺隆弥, 清水康敬

日本音響学会講演論文集,2-2-3,pp.799-800 1987.10.03
神経回路網アプローチに基づく母音特徴要素抽出の検討

入野俊夫, 河原英紀

日本音響学会秋季大会講演論文集, 1-3-6 1987.10
Vowel recognition by neural network - A study on the ability of the feature extraction.

入野俊夫, 河原英紀

電子情報通信学会技術研究報告 1987.10

　View Summary

日本音響学会聴覚研究会, EA87-55, H-87-52
SiO2/LiTaO3構造中に伝搬するストンリー波の温度特性

入野俊夫, 渡辺隆弥, 清水康敬

日本音響学会講演論文集, 1-7-2, pp.591-592 1987.03.26
弾性境界波の特徴と特性

入野俊夫, 清水康敬

日本音響学会講演論文集, 1-7-3, pp.593-594 1987.03.26
C-4 Zero slope temperature SiC/SiO_2/LiTaO_3 substrate for boundary acoustic waves

Irino Toshio, Watanabe Takaya, Shimizu Yasutaka

Symposium on ultrasonic electronics 1987

　View Summary

超音波シンポジウム,pp.69-70
弾性境界波の特徴とその特性

入野俊夫, 清水康敬

日本学術振興会弾性波素子技術第150委員会,第9回研究 1987
SiO2/LiTaO3構造中に伝搬するストンリー波の実験的検討

入野俊夫, 渡辺隆弥, 清水康敬

日本音響学会講演論文集 1986.10.03

　View Summary

3-2-1, pp.811-812
SiO2/ZnO/SiO2構造中に伝搬する弾性境界波の実験的検討

入野俊夫, 清水康敬

日本音響学会講演論文集 1986.10.03

　View Summary

3-2-4,pp.817-818
Propagation of boundary acoustic waves along a ZnO layer between two materials.

入野俊夫, 白崎良昌, 清水康敬

電子通信学会技術研究報告 1986.09.29

　View Summary

電子通信学会超音波研究会, US86-39, pp.47-54
SiO2／ZnO／ガラス三層構造中に伝搬する弾性境界波の理論的検討

入野俊夫, 白崎良昌, 清水康敬

日本音響学会講演論文集 1986.03.28

　View Summary

2-7-5,pp.645-646
Theoretical analysis of Stoneley waves propagating along an interface between two substrates of same piezoelectric material.

入野俊夫, 清水康敬

電子通信学会技術研究報告 1986.03.20

　View Summary

電子通信学会超音波研究会資料, US.85-69, Vol.85, No.3
E-3 Acoustic boundary waves propagating along a thin layer between two bonded substrates

IRINO Toshio, SHIMIZU Yasutaka

Symposium on ultrasonic electronics 1985.12.10

　View Summary

pp.119-118,1985
圧電体を含む三層構造に伝搬する弾性境界波

入野俊夫, 清水康敬

日本音響学会講演論文集 1985.10.01

　View Summary

2-5-19,pp.757-758
圧電体＝媒質間に伝搬する境界波の検討

入野俊夫, 清水康敬

日本音響学会講演論文集 1984.10.04

　View Summary

1-7-9,1984
圧電体二媒質境界面を伝搬するストンリー波の理論的検討

入野俊夫, 清水康敬

電子通信学会技術研究報告 1984.05.23

　View Summary

電子通信学会マイクロ波研究会資料, MW84-11, 1984
圧電体二媒質境界面を伝搬するストンリー波の理論的検討

入野俊夫, 清水康敬

日本学術振興会薄膜第131委員会 1984.05.18
圧電体二媒質構造におけるストンリー波の理論的検討

入野俊夫, 清水康敬

日本音響学会研究発表会講演論文集 1984.03.31

　View Summary

3-5-10,1984
２枚の PZT基板を接着した境界面に伝搬する境界波

入野俊夫, 清水康敬

日本音響学会講演論文集 1984.03.28

　View Summary

2-2-7,pp.635-636
圧電体と等方体の境界を伝搬するストンリー波の存在条件

清水康敬, 入野俊夫

日本音響学会研究発表会講演論文集 1983.10.04

　View Summary

2-8-17,1983
任意方向に分極した圧電セラミック基板を伝搬する表面波特性

清水康敬, 清水徹, 入野俊夫

電子通信学会技術研究報告 1983.01.27

　View Summary

電子通信学会超音波研究会資料, US82-72,1983
圧電体とガラスの境界面を伝搬するストンリー波の特性

清水康敬, 入野俊夫

日本学術振興会薄膜第131委員会 1983.01.26
C-1 Stoneley waves propagating along an interface between piezoelectric material and Glass

SHIMIZU Yasutaka, IRINO Toshio

Symposium on ultrasonic electronics 1982.12.07
LiNbO3とガラスの境界面を伝搬するストンリー波の理論的検討

清水康敬, 入野俊夫

日本音響学会研究発表会講演論文集 1982.10.20

　View Summary

1-4-9,1982
圧電体とガラスの境界面を伝搬するストンリー波について

清水康敬, 入野俊夫

電気学会エレクトロメカニカル機能部品調査委員会 1982.09.28

　View Summary

26-73, 1982
ＺｎＯとガラスの境界面を伝搬するストンリー波の理論的検討

清水康敬, 入野俊夫

日本音響学会研究発表会講演論文集 1982.03.03

　View Summary

1-6-8,1982
圧電体とガラスとの境界面を伝搬するストンリー波

清水康敬, 入野俊夫

超音波シンポジウム 1982.03

　View Summary

pp.79-80,1982
ZnOとガラス境界面を伝搬するストンリー波の理論的検討

清水康敬, 入野俊夫

電子通信学会技術研究報告 1982.01.29

　View Summary

電子通信学会超音波研究会資料,US81-63,1982
Interactive and real-time acoustic measurement tools for speech data acquisition and presentation: Application of an extended member of time stretched pulses

Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Masanori Morise, Hideki Banno, Toshio Irino

Interspeech 2021

Patents

学習装置、学習方法、推定装置、推定方法及びプログラム

Patent no：特許第7424587号

Date registered： 2024.01.22

Date applied： 2020.08.27 （特願2020-143955 ） Publication date： 2022.03.10 （ 2022-39104 ）

Inventor(s)/Creator(s)：新井賢一、中谷智広、木下慶介、荒木章子、小川厚徳、入野俊夫 Applicant：日本電信電話株式会社、和歌山大学
予測装置、予測方法及び予測プログラム

Patent no：特許第7306626号

Date registered： 2023.07.03

Date applied： 2019.08.13 （特願2019-148529 ） Publication date： 2021.03.01 （特開2021-32909 ）

Inventor(s)/Creator(s)：入野俊夫、山本克彦、新井賢一、中谷智広、木下慶介、荒木章子、小川厚徳 Applicant：日本電信電話株式会社、和歌山大学
音声明瞭度計算方法、音声明瞭度計算装置及び音声明瞭度計算プログラム

Patent no： 11462228

Date registered： 2022.10.04 (United States)

Date applied： 2018.08.03 （ 16/636032 ）

Inventor(s)/Creator(s)：入野俊夫、松井淑恵、荒木章子、木下慶介、中谷智広、山本克彦 Applicant：国立大学法人和歌山大学、日本電信電話株式会社
音声明瞭度計算方法、音声明瞭度計算装置及び音声明瞭度計算プログラム

Patent no：特許第6849978号

Date registered： 2021.03.09

Date applied： 2018.08.03 （特願2019-534607 ） Public disclosure date： 2020.07.09 （再表2019/027053 ）

Inventor(s)/Creator(s)：入野俊夫、松井淑恵、荒木章子、木下慶介、中谷智広、山本克彦 Applicant：国立大学法人和歌山大学、日本電信電話株式会社
信号処理装置及び方法並びに補聴特性の調整方法

Patent no： 6482117

Date registered： 2019.02.22

Date applied： 2015.02.16 （特願2015-27305 ） Publication date： 2016.08.22 （特開2016-152433 ）

Inventor(s)/Creator(s)：入野俊夫、河原英紀 Applicant：国立大学法人和歌山大学
周期信号処理方法、周期信号変換方法、周期信号処理装置および周期信号の分析方法

Patent no： 2178082

Date registered： 2016.08.17 (France)

Date applied： 2010.01.18 （ 8778299.1 ）

Inventor(s)/Creator(s)：河原英紀、森勢将雅、高橋徹、入野俊夫 Applicant：国立大学法人和歌山大学
周期信号処理方法、周期信号変換方法、周期信号処理装置および周期信号の分析方法

Patent no： 2178082

Date registered： 2016.08.17 (Germany)

Date applied： 2010.01.18 （ 8778299.1 ）

Inventor(s)/Creator(s)：河原英紀、森勢将雅、高橋徹、入野俊夫 Applicant：国立大学法人和歌山大学
周期信号処理方法、周期信号変換方法、周期信号処理装置および周期信号の分析方法

Patent no： 2178082

Date registered： 2016.08.17 (United Kingdom)

Date applied： 2010.01.18 （ 8778299.1 ）

Inventor(s)/Creator(s)：河原英紀、森勢将雅、高橋徹、入野俊夫 Applicant：国立大学法人和歌山大学
周期信号処理方法、周期信号変換方法、周期信号処理装置および周期信号の分析方法

Patent no： 8781819

Date registered： 2014.07.15 (United States)

Date applied： 2010.01.18 （ 12/669533 ）

Inventor(s)/Creator(s)：河原英紀、森勢将雅、高橋徹、入野俊夫 Applicant：国立大学法人和歌山大学
周期信号処理方法、周期信号変換方法および周期信号処理装置ならびに周期信号の分析方法

Patent no： 5275612

Date registered： 2013.05.24

Date applied： 2007.11.06 （特願2007-289006 ） Publication date： 2009.02.26 （特開2009-42716 ）

Inventor(s)/Creator(s)：河原英紀、森勢将雅、高橋徹、入野俊夫 Applicant：国立大学法人和歌山大学
周期信号処理方法、周期信号変換方法、周期信号処理装置および周期信号の分析方法

Patent no： 10-1110141

Date registered： 2012.01.19 (South Korea)

Date applied： 2010.02.18 （ 2010-7003580 ）

Inventor(s)/Creator(s)：河原英紀、森勢将雅、高橋徹、入野俊夫 Applicant：国立大学法人和歌山大学
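Impulse response measurement method and device

Patent no： 4552016

Date registered： 2010.07.23

Date applied： 2005.07.12 （Japanese Patent Application No. 2006-529052 ） Public disclosure date： 2008.05.01 （Domestic Re-publication of PCT Publication No. 2006/011356 ）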

Inventor(s)/Creator(s)：入野俊夫、河原英紀、坂野秀雄、森勢将雅 Applicant：国立大学法人和歌山大学
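Method for calculating a sound evaluation index, method for generating evaluation data, sound evaluation device, and computer program

Date applied： 2022.06.07 （Japanese Patent Application No. 2022-092345 ） Publication date： 2023.12.19 （Japanese Unexamined Patent Application Publication No. 2023-179189 ）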

Inventor(s)/Creator(s)：入野俊夫 Applicant：和歌山大学
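Method for calculating a sound evaluation index, method for generating evaluation data, sound evaluation device, and computer program

Date applied： 2022.06.07 （Japanese Patent Application No. 2022-092345 ） Publication date： 2023.12.19 （Japanese Unexamined Patent Application Publication No. 2023-179189 ）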

Inventor(s)/Creator(s)：入野俊夫 Applicant：国立大学法人和歌山大学

Research Exchange

Research on efficient crowdsourced collection of speech intelligibility data and on advancing objective prediction methods

2022.06

-

2023.02

　Joint research
KAKENHI Exploratory Research: general project meeting

2022.05

　Joint research
Activities as chair of the hearing committee of the Acoustical Society of Japan

2022.04

-

2024.04
KAKENHI (B): general project meeting

2021.09

　Joint research
KAKENHI Exploratory Research: general project meeting

2021.09

　Joint research
Research on efficient crowdsourced collection of speech intelligibility data and on advancing objective prediction methods

2021.06

-

2022.02

　Joint research
Research on efficient crowdsourced collection of speech intelligibility data and on advancing objective prediction methods

2020.06

-

2021.02

　Joint research
Evaluating the effect of note-taking on interaction during interviews

2020.04

-

2023.03

　Joint research
KAKENHI (A) and Exploratory Research joint meeting

2020.03
Extraction of clear-speech features using simulated hearing loss based on estimated auditory characteristics

2019.09

-

2020.03

　Joint research
KAKENHI briefing session (gave a talk and also moderated the panel discussion)

2019.07
Clinical-psychological study to clarify superficial active listening in novices and how to avoid it

2018.06

-

2021.03

　Joint research
Development of a hearing-loss simulation system to promote understanding of hearing loss, and its application to education

2018.06

-

2021.03

　Joint research
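Elucidating hearing-loss pathology in terms of temporal resolution in the complex-sound ABR (cABR), and development of next-generation hearing aids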

2017.04

-

2019.03

　Joint research
Research support environment for speech communication based on interactive visualization

2017.04

-

2018.03

　Joint research
KAKENHI Exploratory Research: research support environment for speech communication based on interactive visualization and sonification

2017.04

-

2018.03

　Joint research
CREST application briefing session

2017.04
Collaboration on Hearing Impairment simulator

2016.09

-

2018.03

　Joint research
Development of objective measures of clear speech based on auditory characteristics, and of speech and hearing support methods

2016.06

-

2020.03

　Joint research
When praise resonates: an investigation through multi-layered labeling of counseling dialogues

2016.04

-

2018.03

　Joint research
KAKENHI Exploratory Research: research support environment for speech communication based on interactive visualization and sonification

2016.04

-

2017.03

　Joint research
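Elucidating hearing-loss pathology in terms of temporal resolution in the complex-sound ABR (cABR), and development of next-generation hearing aids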

2016.04

-

2017.03

　Joint research
Building an advanced speech processing platform based on static representations of auditory information

2016.04

-

2017.03

　Joint research
Research support environment for speech communication based on interactive visualization and sonification

2016.04

-

2017.03

　Joint research
(KAKENHI A)^2 joint results presentation meeting

2016.03
Lecture meeting: Dr. Grimault (CNRS, France), Prof. Tsuzaki (Kyoto City University of Arts), Prof. Irino (Wakayama University)

2016.03
Research on speech evaluation using auditory models

2015.06

-

2016.03

　Joint research
Building an advanced speech processing platform based on static representations of auditory information

2015.04

-

2017.03

　Joint research
Research on a functional speech design mechanism based on auditory information representation

2015.04

-

2016.03

　Joint research
Development of continuous time-series evaluation and objective quantification methods for clinical psychology interviews

2015.04

-

2016.03

　Joint research
1st Kyoto University and Inamori Foundation Joint Kyoto Prize Symposium

2014.07
Research on speech evaluation using auditory models

2014.06

-

2015.03

　Joint research
Research on recognition of non-linguistic sounds

2014.06

-

2015.03

　Joint research
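Supporting listening that clients can feel is effective: rethinking the concept of active listening and developing an active-listening training program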

2014.04

-

2015.03

　Joint research
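Supporting listening that clients can feel is effective: rethinking the concept of active listening and developing an active-listening training program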

2014.04

-

2015.03

　Joint research
Research on recognition of non-linguistic sounds

2013.08

-

2014.03

　Joint research
Psychophysical experiments on age-related shifts in absolute pitch and construction of a computational model

2013.04

-

2017.03

　Joint research
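Development of an instrument for measuring temporal resolution in sensorineural hearing loss, and of a hearing aid that enhances temporal resolution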

2013.04

-

2016.03

　Joint research
Clarifying auditory perceptual characteristics and developing signal processing for hearing and speech support

2013.04

-

2016.03

　Joint research
Elucidating exploration and adjustment mechanisms in the progress of face-to-face dialogue, with a focus on counseling settings

2013.04

-

2015.03

　Joint research
KAKENHI briefing session

2012.09
Psychophysical experiments on age-related shifts in absolute pitch and construction of a computational model

2012.04

-

2013.03

　Joint research
Psychophysical experiments on age-related shifts in absolute pitch and construction of a computational model

2012.04

-

2013.03

　Joint research
CREST symposium on Human-Harmonized Information Technology

2012.04
ICASSP 2012, Kyoto

2012.03
Research on recognition of non-linguistic sounds

2012.02

-

2013.01

　Joint research
Research on how hearing-impaired listeners perceive speech

2011.04

-

2013.03

　Joint research
Research on size perception and auditory stream segregation in hearing

2009.04

-

2012.03

　Joint research
Research on the roles of speech and nodding in clinical psychology interview dialogue and their practical application

2007.04

-

2013.03

　Joint research
Research on discovering invariant information in multimodal data and on the associated methodology

2005.04

-

2010.03

　Joint research

KAKENHI

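Modeling the auditory perceptual characteristics of elderly hearing-impaired listeners and building a speech and hearing support platform based on the model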

2024.04

-

2027.03

Grant-in-Aid for Scientific Research(B) Principal investigator
Can conversation between care workers serve as a communication medium? Conveying information unobtrusively to care recipients

2023.04

-

2026.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Clarifying how vocal emotion is conveyed to elderly hearing-impaired listeners and developing innovative speech morphing methods

2021.04

-

2024.03

Grant-in-Aid for Challenging Research(Exploratory) Principal investigator
Building a speech communication support platform for elderly hearing-impaired listeners based on auditory perceptual characteristics

2021.04

-

2024.03

Grant-in-Aid for Scientific Research(B) Principal investigator
Evaluating the effect of note-taking on interaction during interviews

2020.04

-

2023.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Clinical-psychological study to clarify superficial active listening in novices and how to avoid it

2018.04

-

2023.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Development of objective measures of clear speech based on auditory characteristics, and of speech and hearing support methods

2016.04

-

2020.03

Grant-in-Aid for Scientific Research(A) Principal investigator
When praise resonates: an investigation through multi-layered labeling of counseling dialogues

2016.04

-

2019.03

Grant-in-Aid for Scientific Research(C) Co-investigator
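Elucidating hearing-loss pathology in terms of temporal resolution in the complex-sound ABR (cABR), and development of next-generation hearing aids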

2016.04

-

2019.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Research support environment for speech communication based on interactive visualization and sonification

2016.04

-

2018.03

Grant-in-Aid for Challenging Exploratory Research Co-investigator
Research on a functional speech design mechanism based on auditory information representation

2015.04

-

2016.03

Grant-in-Aid for Challenging Exploratory Research Co-investigator
Building an advanced speech processing platform based on static representations of auditory information

2015.04

-

2018.03

Grant-in-Aid for Scientific Research(B) Co-investigator
Development of continuous time-series evaluation and objective quantification methods for clinical psychology interviews

2015.04

-

2018.03

Grant-in-Aid for Challenging Exploratory Research Principal investigator
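Supporting listening that clients can feel is effective: rethinking the concept of active listening and developing an active-listening training program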

2014.04

-

2017.03

Grant-in-Aid for Scientific Research(C) Co-investigator
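Development of an instrument for measuring temporal resolution in sensorineural hearing loss, and of a hearing aid that enhances temporal resolution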

2013.04

-

2016.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Clarifying auditory perceptual characteristics and developing signal processing for hearing and speech support

2013.04

-

2016.03

Grant-in-Aid for Scientific Research(B) Principal investigator
Psychophysical experiments on age-related shifts in absolute pitch and construction of a computational model

2012.04

-

2017.03

Grant-in-Aid for Scientific Research(A) Co-investigator
Research on advanced speech analysis, conversion, and synthesis methods based on auditory information representation

2012.04

-

2015.03

Grant-in-Aid for Scientific Research(B) Co-investigator
Elucidating exploration and adjustment mechanisms in the progress of face-to-face dialogue, with a focus on counseling settings

2012.04

-

2015.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Measurement and analysis of paralinguistic information and body movement for counseling that appeals to sensibility

2011.04

-

2014.03

Grant-in-Aid for Challenging Exploratory Research Principal investigator
Understanding dialogue mismatches in clinical psychology interviews from the viewpoint of speech and nodding

2010.04

-

2013.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Research on the auditory characteristics and computational theory underlying speech perception

2009.04

-

2013.03

Grant-in-Aid for Scientific Research(B) Principal investigator
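Auditory stream segregation based on the size constancy inherent in speech and acoustic signals, and the temporal tracking of timbre perception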

2009.04

-

2012.03

Grant-in-Aid for Scientific Research(B) Co-investigator
Comprehensive research on technologies for supporting and extending hearing and speech functions

2007.04

-

2011.03

Grant-in-Aid for Scientific Research(A) Co-investigator
Research on verifying and applying a theory of size and shape perception in the early auditory system

2006.04

-

2009.03

Grant-in-Aid for Scientific Research(B) Principal investigator
Research on spatial sound source localization using inductive learning machines

2006.04

-

2009.03

Grant-in-Aid for Exploratory Research Co-investigator
Applying speech recognition technology to the analysis of speech perception characteristics

2006.04

-

2009.03

Grant-in-Aid for Exploratory Research Principal investigator
Experimental re-examination of the perceptual attributes of sound on the basis of acoustic ecology

2005.04

-

2008.03

Grant-in-Aid for Scientific Research(C) Co-investigator
Research on building acoustic systems that do not radiate sound

2005.04

-

2007.03

Grant-in-Aid for Exploratory Research Co-investigator
Constructing a computational theory of hearing and research on sound signal processing based on it

2003.04

-

2006.03

Grant-in-Aid for Scientific Research(B) Principal investigator
Constructing an auditory scene analysis space based on fixed points of auditory information representation

2003.04

-

2005.03

Grant-in-Aid for Exploratory Research Co-investigator

Public Funding (other government agencies or their auxiliary organs, local governments, etc.)

Temporal processing in the auditory system from cochlea to cortex

2004.04

-

2010.03

Co-investigator
Natural spoken-dialogue processing technology that achieves speaker and environment adaptation without burdening the user

2003.04

-

2008.03

Co-investigator
Speech and audio processing system based on auditory scene analysis

1997.04

-

2003.03

Co-investigator

Competitive funding, donations, etc. from foundation, company, etc.

Donation to the Faculty of Systems Engineering (The Daiwa Anglo-Japanese Foundation (大和日英基金))

2006.04

-

2007.03

Research subsidy Principal investigator
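Donation to the Faculty of Systems Engineering (research grant from the Support Center for Advanced Telecommunications Technology Research, Foundation)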

2005.04

-

2008.03

Research subsidy Principal investigator

Joint or Subcontracted Research with foundation, company, etc.

Study of efficient crowdsourced collection of intelligibility data and of extending the scope of objective intelligibility prediction methods

2023.07

-

2024.02

Joint research Principal investigator
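Joint research on efficient crowdsourced collection of intelligibility data and on extending the scope of objective intelligibility prediction methods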

2022.07

-

2023.02

Joint research Principal investigator
Research on the frequency resolution and temporal resolution of human hearing

2021.11

-

2022.10

Joint research Principal investigator
Research on crowdsourced evaluation of state-of-the-art speech enhancement and on advancing objective prediction methods

2021.07

-

2022.02

Joint research Principal investigator

Instructor for open lecture, peer review for academic journal, media appearances, etc.

InterSpeech reviewer

2023.05

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
IEEE ICASSP reviewer

2022.10

-

2022.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
InterSpeech reviewer

2022.05

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
IEEE ICASSP reviewer

2021.10

-

2021.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
InterSpeech reviewer

2021.06

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
IEEE ICASSP reviewer

2020.10

-

2020.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
InterSpeech reviewer

2020.06

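International Speech Communication Association

Speech science / engineering

Reviewer for the InterSpeech international conference
Mainichi Broadcasting System, News Mint: program recorded in a February interview, broadcast on July 2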

2020.02.19

-

2020.07.02

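Mainichi Broadcasting System

Mainichi Broadcasting System, News Mint, speech synthesis, AI

Mainichi Broadcasting System, News Mint, special feature: "'AI voices' ever closer to the real person: evolving 'synthetic speech' technology helps patients with intractable diseases".
Interviewed in February 2020; the program was broadcast on July 2 and included comments from the interview.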
IEEE ICASSP reviewer

2019.10

-

2019.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
Guidance for teacher 福田雅文 and students of Osaka Prefectural Tondabayashi High School

2019.06

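Other

Trial enrollment, outreach lectures, etc. for elementary, junior high, and high school students

Provided guidance in the laboratory on a Super Science High School (SSH) research project, with a demonstration on loudspeaker sound and instruction in research methods. High school blog: https://www.osaka-c.ed.jp/blog/tondabayashi/koutyou/2019/06/13-143099.html, date: June 22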
InterSpeech reviewer

2019.05

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
IEEE ICASSP reviewer

2018.10

-

2018.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
InterSpeech reviewer

2018.05

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
IEEE ICASSP reviewer

2017.10

-

2017.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
InterSpeech reviewer

2017.05

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
Media appearances, etc.

2017.03.28

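Nikkan Kogyo Shimbun

Newspaper coverage and TV/radio appearances related to research results

[Science, Technology & Universities] Realistic "footsteps" for films and games: Wakayama University synthesizes them automatically from walking data (tags: Wakayama University, motion capture, accelerometer, gyro sensor)
Hosting of foreign researchers, etc.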

2017.03

CNRS Lyon France

Hosting of foreign researchers, etc.

Hosting of foreign researchers, etc., name: Nicolas Grimault
IEEE ICASSP reviewer

2016.10

-

2016.11

IEEE

Signal processing

Peer review for the IEEE ICASSP international conference
Lecturer (invited talk)

2016.10

Research Organization of Information and Systems

Lecturer, etc.

Lecturer (invited talk), term: once
InterSpeech reviewer

2016.05

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: once
InterSpeech reviewer

2015.05

-

2016.03

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: 2015
ASLP reviewer

2015.04

-

2021.03

IEEE ASLP (Audio, Speech, and Language Processing)

Editorial board member, reviewer, or referee for academic journals, etc.

ASLP reviewer, term: 2015 to 2020
Committee member, Itakura Prize Innovative Young Researcher Award

2015.04

-

2017.03

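Acoustical Society of Japan

Editorial board member, reviewer, or referee for academic journals, etc.

Committee member, Itakura Prize Innovative Young Researcher Award, term: from 2015.4
Hosting of foreign researchers, etc.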

2015.04

France ENS

Hosting of foreign researchers, etc.

Hosting of foreign researchers, etc., name: Alain de Cheveigné
Reviewer

2014.10

Speech Communication

Editorial board member, reviewer, or referee for academic journals, etc.

Reviewer, term: 2014.10
Paper award selection committee member

2014.09

-

2015.05

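Acoustical Society of Japan

Editorial board member, reviewer, or referee for academic journals, etc.

Paper award selection committee member, term: from 2014.5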
EUSIPCO Reviewer

2014.04

EUSIPCO (European Signal Processing Conference)

Editorial board member, reviewer, or referee for academic journals, etc.

EUSIPCO Reviewer, term: 2014.4
InterSpeech reviewer

2013.05

-

2020.10

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: from 2013
Editorial board reviewer

2013.05

-

2020.05

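Acoustical Society of Japan

Editorial board member, reviewer, or referee for academic journals, etc.

Editorial board reviewer, term: from 2013.5, multiple years (term undetermined)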
ICASSP reviewer

2013.01

-

2015.02

IEEE ICASSP (International Conference on Acoustics, Speech, and Signal Processing)

Editorial board member, reviewer, or referee for academic journals, etc.

ICASSP reviewer, term: 2013 to 2015
Hosting of foreign researchers, etc.

2012.11

Yahoo Inc.

Hosting of foreign researchers, etc.

Hosting of foreign researchers, etc., name: Malcolm Slaney
Hosting of foreign researchers, etc.

2012.09

University of Minnesota

Hosting of foreign researchers, etc.

Hosting of foreign researchers, etc., name: Andrew Oxenham
JASP Reviewer

2012.08

Journal of Advances in Signal Processing

Editorial board member, reviewer, or referee for academic journals, etc.

JASP Reviewer, term: 2012.8
Hosting of foreign researchers, etc.

2011.09

Google Inc., University of Maryland

Hosting of foreign researchers, etc.

Hosting of foreign researchers, etc., names: Dick Lyon, Shihab Shamma
Secretary, journal subcommittee of the editorial board

2011.05

-

2013.05

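Acoustical Society of Japan

Editorial board member, reviewer, or referee for academic journals, etc.

Secretary, journal subcommittee of the editorial board, term: 2011.5 to 2013.5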
InterSpeech reviewer

2009.05

-

2013.10

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech reviewer, term: 2009.5 to 2013.10
InterSpeech 2010 Area Coordinator

2009.05

-

2010.10

International Speech Communication Association

Editorial board member, reviewer, or referee for academic journals, etc.

InterSpeech 2010 Area Coordinator, term: 2009.5 to 2010.10
Editorial board reviewer

2005.09

-

2013.05

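Acoustical Society of Japan

Editorial board member, reviewer, or referee for academic journals, etc.

Editorial board reviewer, term: 2005.9 to 2013.5
Member, journal subcommittee of the editorial board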

2005.05

-

2011.05

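Acoustical Society of Japan

Editorial board member, reviewer, or referee for academic journals, etc.

Member, journal subcommittee of the editorial board, term: 2005.5 to 2011.5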
ASLP reviewer

2005.04

-

2015.10

IEEE ASLP (Audio, Speech, and Language Processing)

Editorial board member, reviewer, or referee for academic journals, etc.

ASLP reviewer, term: 2005 to 2015
JASA Reviewer

2000.04

-

2020.04

Acoustical Society of America (ASA)

Editorial board member, reviewer, or referee for academic journals, etc.

JASA Reviewer, term: from 2000.4 (term undetermined)

Committee member history in academic associations, government agencies, municipalities, etc.

Member, Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

2024.06.01

-

2025.03.31

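Wakayama Prefecture

Revitalization of local industry

Appointed as a member of the Wakayama Prefecture Council on the Location of Large-Scale Retail Stores to provide opinions from an expert standpoint.
Environmental Verification Committee on the Proposed New Flight Routes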

2024.04

-

2025.03.31

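Osaka Prefecture, Wakayama Prefecture, Hyogo Prefecture

Kansai International Airport, Kobe Airport, noise assessment, aircraft arrival and departure slots

Conducts assessments as a committee member for the environmental verification of the proposed new flight routes associated with the increase in aircraft arrivals and departures at Kansai International Airport and Kobe Airport.
Member, Hearing Research Committee, Acoustical Society of Japan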

2024.04

-

2025.03

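Acoustical Society of Japan

Acoustics

As a member of the Hearing Research Committee of the Acoustical Society of Japan, works to advance and popularize acoustics.
Environmental Verification Committee on the Proposed New Flight Routes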

2023.07.05

-

2024.03.31

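Osaka Prefecture, Hyogo Prefecture, Wakayama Prefecture

Noise, environmental assessment, regional revitalization

Appointed as a member of the Environmental Verification Committee on the Proposed New Flight Routes to give opinions from an expert standpoint.
Councilor, Acoustical Society of Japan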

2023.05.22

-

2025.05

Acoustical Society of Japan

Acoustics

Works to promote and popularize acoustics in Japan.
Candidate for Wakayama Prefecture pollution review commissioner

2023.04.27

-

2025.03.31

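Wakayama Prefecture

Public committee member for academic societies, government, municipalities, etc.

Candidate for pollution review commissioner as provided in Article 18 of the Act on the Settlement of Environmental Pollution Disputes
Delegate, Acoustical Society of Japan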

2023.02

-

2025.02

Acoustical Society of Japan

Acoustics

Works to promote and popularize acoustics in Japan.
Auditor, Kansai Chapter, Acoustical Society of Japan

2022.04.23

-

2024.04

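Kansai Chapter, Acoustical Society of Japan

Acoustics

As auditor of the Kansai Chapter of the Acoustical Society of Japan, works to advance and popularize acoustics.
Chair, Hearing Research Committee, Acoustical Society of Japan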

2022.04

-

2024.03

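Acoustical Society of Japan

Acoustics

As chair of the Hearing Research Committee of the Acoustical Society of Japan, works to advance and popularize acoustics.
Candidate for Wakayama Prefecture pollution review commissioner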

2022.04

-

2023.03

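Wakayama Prefecture

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: April 2020 to April 2021
Councilor, Acoustical Society of Japan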

2021.05.22

-

2023.05

Acoustical Society of Japan

Acoustics

Works to promote and popularize acoustics in Japan.
Candidate for pollution review commissioner

2021.04.27

-

2024.04.26

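Wakayama Prefecture

Public committee member for academic societies, government, municipalities, etc.

Candidate for pollution review commissioner as provided in Article 18 of the Act on the Settlement of Environmental Pollution Disputes
President, Kansai Chapter, Acoustical Society of Japan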

2021.04.22

-

2022.04

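Kansai Chapter, Acoustical Society of Japan

Acoustics

As president of the Kansai Chapter of the Acoustical Society of Japan, works to advance and popularize acoustics.
Candidate for Wakayama Prefecture pollution review commissioner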

2021.04

-

2022.03

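Wakayama Prefecture

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: April 2020 to April 2021
Delegate, Acoustical Society of Japan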

2021.02

-

2023.02

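Acoustical Society of Japan

Acoustics

Works to promote and popularize acoustics in Japan.
Member, Wakayama Prefecture Council on the Location of Large-Scale Retail Stores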

2020.06.01

-

2024.05.31

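Wakayama Prefecture

Revitalization of local industry

Appointed as a member of the Wakayama Prefecture Council on the Location of Large-Scale Retail Stores to provide opinions from an expert standpoint.
Candidate for Wakayama Prefecture pollution review commissioner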

2020.04.27

-

2021.04.26

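Wakayama Prefecture

Public committee member for academic societies, government, municipalities, etc.

Serves as a candidate for Wakayama Prefecture pollution review commissioner; when residents of the prefecture apply for mediation or similar handling of a pollution dispute, the commissioners who conduct it are appointed from among the candidates.
Member, Wakayama Prefecture Environmental Impact Assessment Review Board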

2020.04.10

-

2024.04.09

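Wakayama Prefecture

Environmental protection

Appointed as a member of the Wakayama Prefecture Environmental Impact Assessment Review Board to provide opinions from an expert standpoint.
Vice-president, Kansai Chapter, Acoustical Society of Japan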

2020.04

-

2021.03

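Kansai Chapter, Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Vice-president, term: 1 year
Candidate for Wakayama Prefecture pollution review commissioner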

2020.04

-

2021.03

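Wakayama Prefecture

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: April 2020 to April 2021
Candidate for Wakayama Prefecture pollution review commissioner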

2019.04

-

2020.03

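Wakayama Prefecture

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: April 2019 to March 2020
Member, Wakayama Prefecture Study Group on Zoning for Offshore Wind Power Generation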

2019.02

-

2021.03

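Wakayama Prefecture

Committee member for national or local government, other universities, research institutions, etc.

Member, term: February 2019 to March 2021
Delegate / Councilor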

2019.02

-

2021.02

Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Delegate / Councilor, term: 2 years
Member

2018.06

-

2020.05

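Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: June 2018 to May 2020
Member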

2018.06

-

2020.05

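Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: June 2018 - May 2020
Member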

2018.04

-

2020.04

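Wakayama Prefecture Environmental Impact Assessment Review Board

Committee member for national or local government, other universities, research institutions, etc.

Member, term: April 2018 to April 2020
Member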

2018.04

-

2020.04

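Member, Wakayama Prefecture Environmental Impact Assessment Review Board

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2018/04 to 2020/04
Member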

2018.04

-

2019.04

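Wakayama Prefecture pollution review

Committee member for national or local government, other universities, research institutions, etc.

Member, term: April 2018 to April 2019
Candidate member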

2017.04

-

2018.03

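Wakayama Prefecture pollution review commissioner

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: April 2017 to March 2018
Delegate / Councilor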

2017.02

-

2019.02

Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Delegate / Councilor, term: 2 years
Member

2016.06

-

2018.05

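Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: June 2016 - May 2018
Member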

2016.04

-

2017.04

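Member, Wakayama Prefecture Environmental Impact Assessment Review Board

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2016/04 to 2018/04
Candidate member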

2016.04

-

2017.04

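Candidate for Wakayama Prefecture pollution review commissioner

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: 2016/04/27 to 2017/04/26
Member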

2015.06

-

2017.03

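Japan Science and Technology Agency, Matching Planner Program

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2015/06/17 to 2017/03/31
Member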

2015.05

-

2016.04

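Wakayama Prefecture pollution review, mediation committee member for Case No. 1

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2015/05/12 to 2016/04/30
Member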

2015.04

-

2017.03

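Japan Science and Technology Agency, Adaptable and Seamless Technology Transfer Program (A-STEP)

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2015/04/21 to 2017/03/31
Candidate member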

2015.04

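Candidate for Wakayama Prefecture pollution review commissioner

Committee member for national or local government, other universities, research institutions, etc.

Candidate member, term: 2015/04/27 to 2016/04/26
Delegate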

2015.02

-

2017.02

Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Delegate, term: 2015/2 - 2017/2
Member

2014.06

-

2016.05

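Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: June 2014 - May 2016
Member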

2014.06

-

2016.05

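Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2014/06/01 to 2016/05/31
Member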

2013.06

-

2014.05

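Wakayama Prefecture Council on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2013/04/02 to 2014/05/31
Member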

2013.05

-

2015.03

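Japan Science and Technology Agency, Adaptable and Seamless Technology Transfer Program (A-STEP), expert committee member

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2013/05/27 to 2015/03/31
Vice-chair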

2013.04

-

2015.05

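Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Vice-chair, Hearing Research Committee, term: 2013/5 - 2015/5
Delegate / Councilor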

2013.04

-

2015.05

Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Delegate / Councilor, term: 2013/5 - 2015/5
Member

2012.06

-

2013.03

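Wakayama Prefecture Review Board on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2012/06/01 to 2013/03/31
Member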

2011.05

-

2013.03

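Japan Science and Technology Agency (Independent Administrative Institution), Adaptable and Seamless Technology Transfer Program (A-STEP), expert committee member

Committee member for national or local government, other universities, research institutions, etc.

Committee member for national or local government, other universities, research institutions, etc., term: 2011/05/16 to 2013/03/31
Delegate / Councilor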

2005.04

-

2013.05

Acoustical Society of Japan

Public committee member for academic societies, government, municipalities, etc.

Delegate / Councilor, term: 2005/5 - 2013/5
Member

2004.06

-

2012.05

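Wakayama Prefecture Review Board on the Location of Large-Scale Retail Stores

Committee member for national or local government, other universities, research institutions, etc.

Member, term: 2004.6 to 2012.5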

Other Social Activities

Consultation on new business development at 新居紙器

2020.02

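Other

Joint research, creation of new technology, consulting, etc. with industry and government bodies

Consulted by 新居紙器 on creating sound-related added value, conducted by: Wakayama University Industry-Academia Collaboration Innovation Center
Hosted laboratory tours for the FY2019 (Reiwa 1) Wakayama University industry-academia-government tour and exchange meeting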

2019.07

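Other

Joint research, creation of new technology, consulting, etc. with industry and government bodies

Showed participants the latest research results and research environment to support research exchange, conducted by: Wakayama University Industry-Academia Collaboration Innovation Center
Tajiri Town, Sennan District, Osaka Prefecture: study on improving the speech intelligibility of the disaster-prevention radio system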

2011.04

-

2012.03

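Other

Volunteer activities, etc.

Consulted with town officials on improving the speech intelligibility of the disaster-prevention radio system in Tajiri Town, providing support that draws on research expertise, conducted by: Toshio Irino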