
EVALUATING IDENTITY LEAKAGE IN SPEAKER DE-IDENTIFICATION SYSTEMS
Seungmin Seo, Oleg Aulov, Afzal Godil, Kevin Mangold
National Institute of Standards and Technology, Gaithersburg, MD, USA
ABSTRACT
Speaker de-identification aims to conceal a speaker's identity while preserving the intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary metrics: equal error rate (EER), cumulative match characteristic (CMC) hit rate, and embedding-space similarity measured via canonical correlation analysis and Procrustes analysis. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information. The highest-performing system in our evaluation performs only slightly better than random guessing, while the lowest-performing system yields a 45% hit rate within the top 50 candidates on the CMC curve. These findings highlight persistent privacy risks in current speaker de-identification technologies.
Index Terms— speaker de-identification, voice privacy, identity leakage
1. INTRODUCTION
The speech we stream through videoconferencing platforms, voice assistants, and call-center recorders conveys far more than lexical content: it embeds biometric signatures that can single out an individual. Recent privacy statutes—most prominently the EU's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA)—explicitly classify these signatures as personally identifiable information [1, 2].
Consequently, speaker de-identification (SDID) systems that operate on live, spontaneous speech have become a research priority. Unlike offline voice-conversion or text-to-speech pipelines, real-time SDID must satisfy millisecond-scale latency budgets and preserve intelligibility and naturalness, while withstanding attacks from state-of-the-art speaker-recognition models [3].
Individual components—e.g., disentangled speaker–content representation learning [4] and neural audio codecs [5]—have shown promise, yet the field still lacks a rigorous answer to a central question: How much identity information "leaks" through today's end-to-end SDID pipelines?
Prior studies are difficult to compare [6, 7, 8, 9, 10, 11, 12]; most rely on a single speaker-recognition back-end and a solitary metric such as equal error rate (EER). To advance beyond this fragmented landscape, we introduce a multi-view identity-leakage evaluation suite that integrates EER, cumulative match characteristic (CMC) analysis, and embedding-space similarity measured with canonical correlation analysis (CCA) followed by Procrustes alignment [13].
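
For concreteness, the sketch below shows how the first two metrics can be computed from a probe-by-gallery similarity matrix produced by an attacking speaker-recognition back-end. This is a minimal illustrative sketch, not the evaluation code used in this study: the function names are ours, scores are assumed to be higher-is-more-similar, and scikit-learn's roc_curve supplies the ROC operating points.

import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores, labels):
    # EER: the operating point where the false-accept rate equals the
    # false-reject rate. labels: 1 = same-speaker trial, 0 = different.
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

def cmc_hit_rate(score_matrix, gallery_ids, probe_ids, k=50):
    # CMC hit rate: fraction of probes whose true speaker appears among
    # the k gallery speakers ranked most similar to the probe.
    order = np.argsort(-score_matrix, axis=1)   # gallery ranks, descending
    ranked = np.asarray(gallery_ids)[order]     # speaker IDs by rank
    hits = (ranked[:, :k] == np.asarray(probe_ids)[:, None]).any(axis=1)
    return float(hits.mean())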
Each perspective exposes a distinct facet of residual speaker information: EER quantifies binary verification risk, CMC reflects search-rank leakage, and the embedding analysis localizes where representations converge in latent space. Each SDID system was required to meet the real-time processing budget, evaluated independently by a separate test-and-evaluation agency; the present paper concentrates on privacy metrics. Under this protocol, every system leaks identity: the best-performing system exceeds random guessing only marginally, yet still significantly, whereas the weakest reaches a 45% hit rate among the top-50 candidates on the CMC curve. These findings underscore the persistent challenge of robust, privacy-preserving speaker de-identification.
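
The embedding-space analysis can be approximated as follows. This is a hedged sketch rather than the exact procedure: it assumes row-paired per-speaker mean embeddings and uses scikit-learn's CCA together with SciPy's orthogonal_procrustes; the function name and component count are illustrative.

import numpy as np
from sklearn.cross_decomposition import CCA
from scipy.linalg import orthogonal_procrustes

def embedding_leakage(orig_emb, deid_emb, n_components=10):
    # orig_emb, deid_emb: (n_speakers, dim) mean embeddings, with row i
    # of each matrix belonging to the same speaker.
    # Canonical correlations: strength of shared directions across spaces.
    cca = CCA(n_components=n_components)
    x_c, y_c = cca.fit_transform(orig_emb, deid_emb)
    corrs = [np.corrcoef(x_c[:, i], y_c[:, i])[0, 1]
             for i in range(n_components)]

    # Procrustes: best orthogonal rotation mapping the de-identified
    # embeddings onto the originals after centering; a small residual
    # means the two spaces nearly coincide, i.e., identity leaks.
    x = orig_emb - orig_emb.mean(axis=0)
    y = deid_emb - deid_emb.mean(axis=0)
    rot, _ = orthogonal_procrustes(y, x)
    residual = np.linalg.norm(y @ rot - x) / np.linalg.norm(x)
    return float(np.mean(corrs)), float(residual)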
2. SPEAKER DE-IDENTIFICATION SYSTEMS
The five SDID systems in this study were submitted to NIST for evaluation. All were developed under the IARPA ARTS program (www.iarpa.gov/research-programs/arts) and comprise four performer systems and one baseline built by a Test & Evaluation partner. Note that no system descriptions are publicly available at the time of this writing, so the references reflect relevant work by the same researchers [14, 15, 16, 17].
Each system takes as input a streaming speech segment and outputs a streaming modified version designed to conceal the speaker's identity. The primary goals are (1) to prevent speaker-recognition models from linking original and de-identified segments, and (2) to ensure that de-identified segments generated for the same speaker (under the same or different anonymization profiles) are either consistent or distinct, as appropriate.
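
Goal (1) corresponds to a linkage attack of roughly the following shape. In this sketch the embedding extractor is left abstract and supplied by the caller (e.g., an x-vector or ECAPA-TDNN encoder); the function name is illustrative, not part of the evaluation protocol.

import numpy as np

def linkage_score(embed, original_wav, deid_wav):
    # Cosine similarity between speaker embeddings of the original segment
    # and its de-identified counterpart; values near 1 indicate that an
    # attacker could still link the two, i.e., identity leakage.
    e_orig = embed(original_wav)
    e_deid = embed(deid_wav)
    return float(np.dot(e_orig, e_deid)
                 / (np.linalg.norm(e_orig) * np.linalg.norm(e_deid)))

The same primitive, applied to pairs of de-identified segments from one speaker, can probe goal (2): similarities should be high under a shared anonymization profile and low across distinct profiles.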
3. EVALUATION
3.1. Data
The evaluation set is derived from the Mixer 3 corpus [18].
We retained only native American English speakers with at