IPSJ-SLP112

The 120th Meeting of the Special Interest Group on Spoken Language Processing (SIG-SLP)

—————————————————————————————————
   The 120th Meeting of the SIG on Spoken Language Processing (SIG-SLP) http://sig-slp.jp/
   The 118th Meeting of the SIG on Music and Computer (SIG-MUS)    http://www.sigmus.jp/
   Announcement of the Joint Meeting
—————————————————————————————————

Dates: Tuesday, February 20 and Wednesday, February 21, 2018

Venue: 江戸屋旅館 (Edoya Ryokan), Mt. Tsukuba
   728 Tsukuba, Tsukuba City, Ibaraki 300-4352, Japan
   TEL 029-866-0321
   URL http://www.tsukubasan.co.jp

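Access:
   Kanto Railway (関東鉄道) Mt. Tsukuba Shuttle Bus: Tsukuba Center to Tsukubasan-jinja-iriguchi (筑波山神社入口)
   Extra buses are scheduled to run: 36 min, 720 yen
   Feb. 20: Tsukuba Center dep. 11:00 → Tsukubasan-jinja-iriguchi arr. 11:36
   Feb. 21: Tsukubasan-jinja-iriguchi dep. 18:00 → Tsukuba Center arr. 18:36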
   https://kantetsu.co.jp/bus/mt_tsukuba.html

Advance registration:
   This meeting is held as an overnight retreat. Advance registration is required to attend.

——————————

●Invited Talk 1

Recent Advances in our Neural Parametric Singing Synthesizer
   Dr. Jordi Bonada (Universitat Pompeu Fabra, Spain)

Abstract: We recently presented a new model for singing synthesis based on a modified version
of the WaveNet architecture. Instead of modeling the raw waveform, we model features produced by a
parametric vocoder that separates the influence of pitch and timbre. This allows conveniently modifying
pitch to match any target melody, facilitates training on more modest dataset sizes, and significantly
reduces training and generation times. Nonetheless, compared to modeling the waveform directly, effectively
handling higher-dimensional outputs, multiple feature streams, and regularization becomes
more important with our approach. In this work, we extend our proposed system to include additional
components for predicting F0 and phonetic timings from a musical score with lyrics. These expression-related
features are learned together with timbral features from a single set of natural songs. We compare our
method to existing statistical parametric, concatenative, and neural network-based approaches using
quantitative metrics as well as listening tests.

Biography: Jordi Bonada received the Ph.D. degree in Computer Science and Digital Communications from
the Universitat Pompeu Fabra (UPF). Since 1996 he has been a researcher at the Music Technology Group of
the UPF while leading several projects funded by public and private institutions. Dr. Bonada has long
research experience, supported by more than 80 scientific publications and over 50 patents. Some of the
algorithms he has proposed have been integrated into successful commercial products such as Vocaloid.

●Invited Talk 2

Tacotron: End-to-end high quality speech synthesis 
   Dr. Yuxuan Wang (Google, USA)

Abstract: A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis
frontend, an acoustic model, and an audio synthesis module. Building these components often requires
extensive domain expertise and may involve brittle design choices. In this talk, I will describe recent
advances in end-to-end neural speech synthesis modeling at Google.

I will start by introducing Tacotron, our first-generation end-to-end model that synthesizes speech
directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with
random initialization. Tacotron greatly simplifies the TTS pipeline and outperforms a production parametric
system in terms of mean opinion score (MOS). To further improve audio quality, I will describe Tacotron 2,
which combines Tacotron with a modified WaveNet model acting as a vocoder. Tacotron 2 achieves a MOS
of 4.53, comparable to the MOS of 4.58 for professionally recorded speech. In addition to audio quality, prosody
modeling is also a core problem for speech synthesis. Finally, I will discuss “style tokens”, an unsupervised
method for style modeling and control with end-to-end models like Tacotron.

Biography: Yuxuan Wang completed his Ph.D. in computer science at the Ohio State University as a Presidential
Fellow. During his Ph.D., he pioneered the use of deep learning techniques in speech separation. Notably, his
work led to the first-ever demonstration of improved speech intelligibility for hearing-impaired listeners in
background noise. Yuxuan Wang joined Google Research in 2015, where he is currently a Senior Research
Scientist. His research interests include far-field speech recognition, generative models for speech, and sequence
learning in general. Most recently, his research has focused on developing an end-to-end neural speech synthesis
model known as Tacotron.

●Program

2018/02/20 (Tue)
   ◇ [12:30-13:45] General Session: Speech Production and Synthesis ◇
   Chair: SLP
   (1) 12:30-12:55 Distilling Knowledge from a Multi-scale Deep CNN Ensemble for Robust and Light-weight Acoustic Modeling 
   ◯Heck Michael (Nara Institute of Science and Technology), Suzuki Masayuki, Fukuda Takashi, Kurata Gakuto (IBM Research AI), 
   Nakamura Satoshi (Nara Institute of Science and Technology)

(2) 12:55-13:20 Positive Emotion Elicitation in an Example-Based Dialogue System 
   ◯Lubis Nurul, Sakti Sakriani, Yoshino Koichiro, Nakamura Satoshi (Nara Institute of Science and Technology)

(3) 13:20-13:45 Application of the velvet noise and its variant for synthetic speech and singing 
   ◯Hideki Kawahara (Wakayama University)

◇ [13:55-15:55] Invited Talks ◇
   Chair: SLP
   (4) 13:55-14:55 Recent Advances in Our Neural Parametric Singing Synthesizer
   ◯Jordi Bonada, Merlijn Blaauw (Universitat Pompeu Fabra, Spain)

(5) 14:55-15:55 Tacotron: End-to-end high quality speech synthesis 
   ◯Yuxuan Wang (Google, USA)

◇ [16:05-17:20] General Session: Speech Synthesis, Recognition, and Dialogue ◇
   Chair: SLP
   (6) 16:05-16:30 Investigation of WaveNet for Text-to-Speech Synthesis 
   ◯Xin Wang, Shinji Takaki, Junichi Yamagishi (National Institute of Informatics)

(7) 16:30-16:55 Stealing your vocal identity from the internet: cloning Obama's voice from found data using GAN and Wavenet 
   ◯Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi (National Institute of Informatics)

(8) 16:55-17:20 Generating segment-level foreign-accented synthetic speech with natural speech prosody
   ◯Gustav Henter, Jaime Lorenzo-Trueba, Xin Wang (National Institute of Informatics), Kondo Mariko (Waseda University), 
   Junichi Yamagishi (National Institute of Informatics)

◇ [17:30-19:10] General Session: [MUS Generation and Recognition] ◇
   Chair: MUS
   (9) 17:30-17:55 Songle Sync: A Platform for Large-Scale Control of Diverse Devices in Sync with Music
   ◯尾形 正泰, 井上 隆広, 加藤 淳, 後藤 真孝 (産業技術総合研究所)

(10) 17:55-18:20 MIDI2Pose: Online keyboard performance motion generation from performance data
   Li Bochen (ロチェスター大学), ◯前澤 陽 (ヤマハ株式会社)

(11) 18:20-18:45 Towards a "realtime musical composing" system for autonomous vehicles
   ◯長嶋 洋一(静岡文化芸術大学)

(12) 18:45-19:10 Pitch Recognition of Polyphonic Signals Using a Two-Dimensional Convolutional Network with Gates in All Layers
   ◯生田目 敬弘(東京工業大学), 亀岡 弘和(日本電信電話株式会社 NTTコミュニケーション科学基礎研究所), 篠田 浩一(東京工業大学)

◇ [21:00-22:30] Demonstrations ◇
   (13) 21:00-22:30 Research Introductions in Speech and Music Information Processing
   a) Introduction to VoiceText
   ◯虫鹿弘二，小沼海（HOYA）

 b) しゃべってらくらく 洲浜商店
   ◯田中翔平（名古屋工業大学），寺本裕成（愛知工業大学），李晃伸（名古屋工業大学）

 c) Sneak Preview of the 2nd Voice Conversion Challenge 2018
   ◯Junichi Yamagishi, Jaime Lorenzo-Trueba (National Institute of Informatics), 
   Tomoki Toda (Nagoya University), Daisuke Saito (The University of Tokyo),
   Fernando Villavicencio (ObEN), Tomi Kinnunen (University of Eastern Finland),
   Zhenhua Ling (University of Science and Technology of China)

 d) Lyric Jumper: A Lyrics Exploration Service Based on a Topic Model That Accounts for Each Artist's Lyrical Tendencies
   ◯佃 洸摂，石田 啓介，後藤 真孝（産業技術総合研究所）

 e) TextAlive: An Integrated Production Environment for Kinetic Typography Videos of Song Lyrics
   ◯加藤 淳，中野 倫靖，後藤 真孝（産業技術総合研究所）

 f) Songle Sync: A Platform for Large-Scale Control of Diverse Devices in Sync with Music
   ◯尾形 正泰，井上 隆広，加藤 淳，後藤 真孝（産業技術総合研究所）

 g) An Accompaniment-Following Karaoke System Based on Real-Time Alignment of the Singing Voice in a Song with the User's Singing
   ◯和田 雄介，坂東 宜昭，中村 栄太，糸山 克寿，吉井 和佳（京都大学）

 h) A Performance Expression Support System for Surveying Phrase Expressions from Multiple Perspectives
   ◯橋田光代（相愛大学）

 i) Songrium: A Music Listening Support Service for Bird's-Eye Appreciation of Web-Native Music
   ◯濱崎 雅弘、石田 啓介、佃 洸摂、深山 覚、中野 倫靖、後藤 真孝（産業技術総合研究所）

 j) A Method for Visualizing Music Preference Tendencies across Listeners
   ◯吉久怜子, 大矢 隼士（レコチョク）, 伊藤貴之（お茶の水女子大学）, 山内 和樹（レコチョク）

 k) Note Estimation for Singing-Voice F0 Trajectories Based on a Hierarchical Hidden Semi-Markov Model Considering Key and Rhythm
   ◯錦見 亮，中村 栄太，後藤真孝，糸山 克寿，吉井 和佳（京都大学）

2018/02/21 (Wed)
   ◇ [09:00-10:50] Special and General Session: Speech Recognition and Dialogue ◇
   Chair: SLP
   (14) 09:00-10:00 Report on Interspeech 2017
   高木 信二(国立情報学研究所), 倉田 岳人(日本IBM 東京基礎研究所), 郡山 知樹(東京工業大学), 塩田 さやか(首都大学東京), 鈴木 雅之(日本IBM 東京基礎研究所),
   玉森 聡(名古屋大学), 俵 直弘(早稲田大学), 中鹿 亘(電気通信大学), 福田 隆(日本IBM 東京基礎研究所), 増村 亮(NTTメディアインテリジェンス研究所), 
   森勢 将雅(山梨大学), 山岸 順一(国立情報学研究所), 山本 克彦(和歌山大学)

(15) 10:00-10:25 Knowledge Distillation from a Set of Wideband Neural Network Acoustic Models to a Narrowband Acoustic Model
   ◯福田 隆, 鈴木 雅之, 倉田 岳人(日本IBM 東京基礎研究所), Thomas Samuel, Ramabhadran Bhuvana (IBMワトソンリサーチセンター)

(16) 10:25-10:50 Word-Level Attention-Based End-to-End Speech Recognition Jointly Using a Character-Level CTC Model
   ◯上乃 聖, 稲熊 寛文, 三村 正人, 河原 達也(京都大学)

◇ [11:00-12:40] General Session: Structural Analysis ◇
   Chair: MUS
   (17) 11:00-11:25 A Method for Detecting Song Segments in Medleys by Applying Cover Song Identification
   Detection Method of Musical Segments based on Cross Recurrence Quantification for Cover Song Identification
   ◯佐藤 僚太, 竹川 佳成, 平田 圭二(公立はこだて未来大学)

(18) 11:25-11:50 On an HMM That Performs Composer Identification Based on the Implication-Realization Model
   ◯能登 楓, 平田 圭二, 竹川 佳成(公立はこだて未来大学)

(19) 11:50-12:15 A Proposed Score Analysis Method Based on Evolutionary Linguistics
   ◯須藤 洸基(北陸先端科学技術大学院大学/日本学術振興会特別研究員), 東条 敏(北陸先端科学技術大学院大学)

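(20) 12:15-12:40 Adjusting the Segmentation of Audio Signals Considering the Global Structure of GTTM, toward an Interaction Framework between Symbolic and Signal Processing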
   ◯澤田 隼, 竹川 佳成, 平田 圭二(公立はこだて未来大学)

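◇ [13:40-15:20] General Session: Voice Conversion, Speaker Adaptation, and Dialogue ◇
   Chair: SLP
   (21) 13:40-14:05 Complex Spectral Sequence Modeling Using a Complex-Valued Restricted Boltzmann Machine with a Recurrent Structure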
   ◯中鹿 亘(電気通信大学), 高木 信二, 山岸 順一(国立情報学研究所)

(22) 14:05-14:30 Speaker-Similarity-Based Unsupervised Speaker Adaptation for DNN Speech Synthesis Using Degraded Speech
   ◯高木 信二(国立情報学研究所), 西村 祥一(オルツ), 山岸 順一(国立情報学研究所)

(23) 14:30-14:55 Cross-Lingual Voice Conversion Using CycleGAN
   ◯房 福明, Jaime Lorenzo-Trueba, 山岸 順一, 越前 功(国立情報学研究所)

(24) 14:55-15:20 Detection of Dialogue Breakdown Focusing on System and User Utterances
   ◯阿部 元樹, 栂井 良太, 綱川 隆司, 西田 昌史, 西村 雅史(静岡大学)

◇ [15:40-17:20] General Session: Analysis ◇
   Chair: MUS
   (25) 15:40-16:05 A Study of Tempo Control Models for Accompaniment Systems
   ◯堀内 靖雄, 足立 亜里紗, 黒岩 眞吾(千葉大学)

(26) 16:05-16:30 Per-Vowel Distributions of Acoustic Features Related to Singing Proficiency
   ◯山下 泰樹(長野県工科短期大学校), 香山 瑞恵, 池田 京子, 吉田 祥, 平林 花菜, 伊東 一典(信州大学), 浅沼 和志(国立高専機構長野高専)

(27) 16:30-16:55 A Study of Human Recognition of Choruses in Popular Music
   ◯宮澤 響, 平賀 譲(筑波大学)

(28) 16:55-17:20 Verification of a Rhythmic Factor Common to Music and Speech Using Short Sound-Sequence Speech
   ◯吉田 友敬, 原 史恵, 梅原 綾花, 棚橋 紀幸, 田添 詩奈, 行村 涼(名古屋文理大学), 武田 昌一(高野山大学)

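●Participation Fees
　As follows (please pay on the day):
・Registered members of SIG-SLP or SIG-MUS: free
・General, IPSJ regular member: 2,000 yen
・General, IPSJ non-member: 3,000 yen
・Student, IPSJ student member: 500 yen
・Student, IPSJ non-member: 1,000 yen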
http://www.ipsj.or.jp/kenkyukai/sanka.html

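●Live Video Streaming
   　At this meeting, we plan to trial recording and live streaming of research presentations over the Internet.
   The stream will be broadcast on the official IPSJ Niconico channel.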
   http://ch.nicovideo.jp/ipsj/live

●Contact for this meeting and for SIG-SLP
   SIG-SLP: 山岸順一 (国立情報学研究所)
   E-mail: jyamagis@nii.ac.jp
   Contact for SIG-MUS and the venue:
   中野倫靖 (産業技術総合研究所)
   E-mail: t.nakano@aist.go.jp
   松原正樹 (筑波大学)
   E-mail: masaki@slis.tsukuba.ac.jp

★SIG-SLP Steering Committee
   Chair: 峯松信明 (東大)
   Secretaries: 篠崎隆宏 (東工大), 山岸順一 (国立情報学研究所), 福田隆 (日本IBM)

★SIG-MUS Steering Committee
   Chair: 吉井和佳 (京都大学)
   Secretaries: 中野倫靖 (産業技術総合研究所), 亀岡弘和 (NTTコミュニケーション科学基礎研究所),
   伊藤彰則 (東北大学), 平田圭二 (公立はこだて未来大学), 齋藤大輔 (東京大学)