Ava Speaker Diarization
Revolutionizing accessibility and inclusivity in communication methods for Deaf and hard of hearing individuals



AI-based research for a captioning and speaker identification feature for Deaf and hard-of-hearing individuals on iOS.
Impact
Competitive advantage over other captioning software.
Role
AI model discovery, Product Designer
Duration
4 months
Tools
FigJam, Figma, Notion
Year
2022
Challenge
In solo mode, Deaf and hard-of-hearing users of the Ava app could not tell "who said what" when multiple people were talking. Their only workaround was to switch to a group conversation, which required every speaker to be connected to Ava on their own device.



Goals
• Let users easily identify speakers with a single device (in a solo conversation), with as little disruption to the conversation flow as possible.
• Let users easily rename a speaker, or reassign a transcript to themselves or another participant when the backend attributes it to the wrong speaker.
• Express the backend's uncertainty about a speaker's identity during a conversation. Additionally, let users easily invite an identified speaker to the conversation for better accuracy.
Technical constraints
• The backend cannot immediately recognize a new speaker during a conversation. It initially guesses the speaker's identity until it has received enough audio to meet the confidence threshold assigned to that speaker.
• For the first version, the backend will not retain the voice ID of an identified speaker, even after the user has named them.
• The design only needs to support the iOS app for the first version.
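To make the first constraint concrete: threshold-gated identification means the system assigns a tentative label to a new voice and only "confirms" it after enough matching audio has accumulated. The sketch below is purely illustrative, not Ava's actual backend; the class, the cosine-similarity matching, and the thresholds are all assumptions chosen to show the tentative-then-confirmed behavior the design had to express in the UI.

```python
import math

def cosine_similarity(a, b):
    # Similarity between two voice embeddings (1.0 = identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class DiarizationTracker:
    """Hypothetical tracker: labels stay tentative until enough audio arrives."""

    def __init__(self, similarity_threshold=0.75, min_audio_seconds=3.0):
        self.similarity_threshold = similarity_threshold
        self.min_audio_seconds = min_audio_seconds
        self.speakers = {}  # label -> {"centroid": embedding, "seconds": total audio}
        self._next_id = 1

    def observe(self, embedding, duration_seconds):
        # Find the closest known speaker.
        best_label, best_sim = None, -1.0
        for label, info in self.speakers.items():
            sim = cosine_similarity(embedding, info["centroid"])
            if sim > best_sim:
                best_label, best_sim = label, sim

        # Too dissimilar from everyone known: start a new tentative speaker.
        if best_label is None or best_sim < self.similarity_threshold:
            best_label = f"Speaker {self._next_id}"
            self._next_id += 1
            self.speakers[best_label] = {"centroid": list(embedding), "seconds": 0.0}

        info = self.speakers[best_label]
        info["seconds"] += duration_seconds
        # Confirmed only once accumulated audio passes the threshold.
        confirmed = info["seconds"] >= self.min_audio_seconds
        return best_label, confirmed
```

The `confirmed` flag is the piece the UI has to surface: while it is `False`, the transcript attribution is a guess and the design should communicate that uncertainty rather than present the label as fact.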
Plan & Product Specification
This research and discovery project involved close collaboration with the Product and AI teams to draft the product specification and test the underlying AI model.
Due to legal constraints, I've omitted much of the design process, but I've compiled a detailed article and guide on designing intuitive real-time experiences for AI and ML systems.
Product Showcase
Assign speaker experience
Here is a GIF of the product demo illustrating the user's experience when an identified speaker is named during a conversation.



Summary and Learnings
I gained valuable insights from this project, honing my skills in iteration, refining product specifications, and collaborating closely with the AI team.
My primary challenge was designing the conversation experience to intuitively express speaker uncertainty in real time.
To tackle this, I collaborated closely with the AI team to test the prototype, which helped me better understand both user needs and what the AI side of the project could and could not deliver. I've documented my approach and learnings on designing intuitive experiences for AI and ML systems in my latest article, Design for Uncertainty.