Pre-decisional Draft
EVALUATING IDENTITY LEAKAGE IN SPEAKER DE-IDENTIFICATION SYSTEMS
Seungmin Seo, Oleg Aulov, Afzal Godil, Kevin Mangold
National Institute of Standards and Technology, Gaithersburg, MD, USA
ABSTRACT
Speaker de-identification aims to conceal a speaker’s identity while preserving intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary metrics: equal error rate, cumulative match characteristic hit rate, and embedding-space similarity measured via canonical correlation analysis and Procrustes analysis. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information. The highest performing system in our evaluation performs only slightly better than random guessing, while the lowest performing system allows a 45% hit rate within the top 50 candidates based on CMC. These findings highlight persistent privacy risks in current speaker de-identification technologies.
Index Terms— speaker de-identification, voice privacy, identity leakage
1. INTRODUCTION
The speech we stream through videoconferencing platforms, voice assistants, and call-center recorders conveys far more than lexical content: it embeds biometric signatures that can single out an individual. Recent privacy statutes—most prominently the EU’s General Data Protection Regulation (GDPR) and California’s Consumer Privacy Act (CCPA)—explicitly classify these signatures as personally identifiable information [1, 2].
Consequently, speaker de-identification (SDID) systems that operate on live, spontaneous speech have become a research priority. Unlike offline voice-conversion or text-to-speech pipelines, real-time SDID must satisfy millisecond-scale latency budgets and preserve intelligibility and naturalness, while withstanding attacks from state-of-the-art speaker-recognition models [3].
Individual components—e.g., disentangled speaker–content representation learning [4] and neural audio codecs [5]—have shown promise, yet the field still lacks a rigorous answer to a central question: how much identity information “leaks” through today’s end-to-end SDID pipelines?
Prior studies are difficult to compare [6, 7, 8, 9, 10, 11, 12]; most rely on a single speaker-recognition back-end and a solitary metric such as equal error rate (EER). To advance beyond this fragmented landscape, we introduce a multi-view identity-leakage evaluation suite that integrates EER, cumulative match characteristic (CMC) analysis, and embedding-space similarity measured with canonical correlation analysis (CCA) followed by Procrustes alignment [13].
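For concreteness, a minimal sketch of the first two views is given below, assuming an attacker produces genuine/impostor verification scores and a probe-versus-gallery score matrix; the NumPy implementation and variable names are illustrative assumptions, not the evaluation's actual code.

```python
# Minimal sketch (not the paper's code): EER and CMC hit rate from attacker scores.
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: operating point where false-accept and false-reject rates meet.
    `genuine` are scores for same-speaker trials, `impostor` for different speakers."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # false-accept rate
    frr = np.array([np.mean(genuine < t) for t in thresholds])    # false-reject rate
    i = np.argmin(np.abs(far - frr))                              # closest crossing point
    return (far[i] + frr[i]) / 2.0

def cmc_hit_rate(score_matrix, true_gallery_index, k=50):
    """CMC hit rate at rank k: fraction of probes whose true speaker appears
    among the k highest-scoring gallery candidates.
    `score_matrix` has one row per probe and one column per gallery speaker."""
    order = np.argsort(-score_matrix, axis=1)   # best-scoring candidates first
    hits = [true_gallery_index[p] in order[p, :k]
            for p in range(score_matrix.shape[0])]
    return float(np.mean(hits))
```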
Each perspective exposes a distinct facet of residual speaker information: EER quantifies binary verification risk, CMC reflects search-rank leakage, and the embedding analysis localizes where representations converge in latent space. Each SDID system was also required to meet a real-time processing budget, evaluated independently by another test-and-evaluation agency; the present paper concentrates on privacy metrics. Under this protocol, every system leaks identity: against the best system, an attacker exceeds random guessing only marginally yet still significantly, whereas the weakest system allows a 45% hit rate among the top 50 candidates on the CMC curve. These findings underscore the persistent challenge of robust, privacy-preserving speaker de-identification.
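The embedding-space view can also be illustrated in a few lines. The sketch below applies the general CCA-then-Procrustes recipe with scikit-learn and SciPy; `X_orig` and `X_deid` are assumed to be row-aligned (utterance by dimension) embedding matrices from the same speaker-recognition back-end, and the code illustrates the technique rather than the evaluation's implementation.

```python
# Illustrative CCA + Procrustes probe of embedding-space leakage (assumed setup).
import numpy as np
from sklearn.cross_decomposition import CCA
from scipy.spatial import procrustes

def embedding_leakage(X_orig, X_deid, n_components=16):
    # Project both embedding sets onto their most correlated shared directions.
    cca = CCA(n_components=n_components)
    Z_orig, Z_deid = cca.fit_transform(X_orig, X_deid)

    # Mean canonical correlation: near 1.0 means de-identified embeddings still
    # co-vary with the originals; near 0.0 means little linear relation remains.
    corrs = [np.corrcoef(Z_orig[:, i], Z_deid[:, i])[0, 1]
             for i in range(n_components)]

    # Procrustes disparity after optimal translation, scaling, and rotation:
    # a small disparity means the two point clouds converge in latent space.
    _, _, disparity = procrustes(Z_orig, Z_deid)
    return float(np.mean(corrs)), float(disparity)
```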
2. SPEAKER DE-IDENTIFICATION SYSTEMS
The five SDID systems in this study were submitted to NIST for evaluation—all developed under the IARPA ARTS program (www.iarpa.gov/research-programs/arts)—including four performer systems and one baseline built by a Test & Evaluation partner. Note that no system descriptions are publicly available at the time of this writing, so the references reflect relevant work by the same researchers [14, 15, 16, 17].
Each system takes as input a streaming speech segment and outputs a streaming, modified version designed to conceal the speaker’s identity. The primary goals are (1) to prevent speaker-recognition models from linking original and de-identified segments, and (2) to ensure that de-identified segments generated for the same speaker (under the same or different anonymization profiles) are either consistent or distinct, as appropriate.
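As a rough illustration, both goals can be probed with a generic speaker-embedding back-end as sketched below; `sdid`, `embed`, the cosine scoring, and the profile arguments are assumed interfaces for illustration and are not specified in the paper.

```python
# Illustrative probe of the two SDID goals (assumed interfaces, not NIST's code):
# `sdid(wav, profile)` returns de-identified audio, `embed(wav)` a speaker embedding.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def probe_speaker(sdid, embed, utterances, profile_a, profile_b):
    """`utterances` is a list of waveforms from a single speaker."""
    link, cross_profile, embs_a = [], [], []
    for wav in utterances:
        e_orig = embed(wav)
        e_a = embed(sdid(wav, profile=profile_a))
        e_b = embed(sdid(wav, profile=profile_b))
        link.append(cosine(e_orig, e_a))        # goal 1: original vs. de-identified should not link
        cross_profile.append(cosine(e_a, e_b))  # goal 2: different profiles should stay distinct
        embs_a.append(e_a)
    # goal 2: the same profile should yield a consistent pseudo-voice across utterances
    same_profile = [cosine(embs_a[i], embs_a[j])
                    for i in range(len(embs_a)) for j in range(i + 1, len(embs_a))]
    return np.mean(link), np.mean(same_profile), np.mean(cross_profile)
```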
3. EVALUATION
3.1. Data
The evaluation set is derived from the Mixer 3 corpus [18]. We retained only native American English speakers with at least five recording sessions, 223 speakers in total.
Resource description:

“Evaluating Identity Leakage in Speaker De-Identification Systems” introduces a benchmark for quantifying residual identity leakage in speaker de-identification systems using three complementary metrics. The results show that all existing systems leak identity information: even the best-performing system is only slightly better than random guessing, while the worst-performing system reaches a 45% hit rate within the top 50 candidates. This highlights the persistent privacy risks in current speaker de-identification technology.

1. **Speaker de-identification systems**: five systems were submitted to NIST for evaluation; the goals are to prevent speaker-recognition models from linking original and de-identified speech, and to ensure that de-identified segments generated for the same speaker are consistent or distinct as appropriate.
2. **Evaluation**
   - **Data**: the evaluation set comes from the Mixer 3 corpus, retaining native American English speakers with at least five recording sessions, 223 speakers in total.
   - **Trials**: different trial scenarios were designed, including target and non-target trials, to assess performance under varying conditions.
   - **Speaker-recognition systems**: three speaker-recognition models with different architectures and training strategies were used.
   - **De-identification effectiveness**: the ability of each system to break the link between original and de-identified speech was evaluated by comparing the two.
   - **Anonymization stability and profile collisions**: whether a system maintains a consistent pseudo-voice across utterances was assessed.
   - **Distinctness of anonymization profiles for the same speaker**: whether de-identified segments generated with different anonymization profiles are distinguishable was tested.
   - **Measuring identity leakage**: three metrics were used, including CMC hit rate, AUC-CMC, and embedding-space similarity.
3. **Conclusion**: the multi-view analysis shows that identity leakage is ubiquitous but heterogeneous; single-metric evaluation can misjudge risk, and every system exhibits detectable traces of identity leakage.