Publication Type : Conference Paper
Publisher : IEEE
Source : 2025 3rd International Conference on Inventive Computing and Informatics (ICICI)
Url : https://doi.org/10.1109/icici65870.2025.11069581
Campus : Bengaluru
School : School of Engineering
Department : Electronics and Communication
Year : 2025
Abstract : This paper presents a system that integrates advanced speech processing and natural language understanding to enable seamless and empathetic human-computer interaction. Using Wav2Vec2 for speech-to-text conversion and SpeechBrain's emotion recognition model, the system classifies user emotions directly from audio input with high accuracy. The model performs well across emotion classes, achieving 89.27% accuracy for neutral, 86.25% for disgust, and 85.96% for anger. F1 scores indicate reliable classification, with notable scores for disgust (0.85), anger (0.84), and surprise (0.81), and the system maintains an average Word Error Rate (WER) of 12.5%, ensuring effective speech-to-text conversion. The detected emotions dynamically shape conversations with the Gemma bot via the Ollama API, yielding intelligent and emotionally adaptive responses. To enhance realism, the Bark TTS model generates expressive speech that aligns chatbot responses with the user's emotional state. This work uniquely integrates emotion recognition, transcription, and emotion-aware text-to-speech synthesis, enabling real-time adaptation to emotional cues. By improving authenticity and empathy in conversational AI, this approach advances applications in virtual assistants, customer support, and mental health care.
Cite this Research Publication : Namratha B., Pranav Venkata Rama Chintalapudi, Susmitha Vekkot, Sreeja Kochuvila, Emotion-Driven Conversational AI: Speech Recognition and Response with Emotional Intonation, 2025 3rd International Conference on Inventive Computing and Informatics (ICICI), IEEE, 2025, https://doi.org/10.1109/icici65870.2025.11069581
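The abstract describes a four-stage pipeline: transcription (Wav2Vec2), emotion classification (SpeechBrain), emotion-conditioned response generation (Gemma via the Ollama API), and expressive synthesis (Bark TTS). The orchestration can be sketched as below; this is a minimal illustration, not the authors' implementation, and every model call is stubbed with a hypothetical placeholder so the control flow is visible without the heavyweight dependencies.

```python
# Hypothetical sketch of the emotion-driven conversational pipeline.
# All four stages are stubs standing in for the real components named
# in the abstract: Wav2Vec2 (ASR), a SpeechBrain emotion classifier,
# Gemma served through the Ollama API, and Bark TTS.

def transcribe(audio: bytes) -> str:
    # Stub for Wav2Vec2 speech-to-text (reported average WER: 12.5%).
    return "i am having a terrible day"

def classify_emotion(audio: bytes) -> str:
    # Stub for SpeechBrain emotion recognition on the raw audio
    # (reported per-class accuracy, e.g. 85.96% for anger).
    return "anger"

def generate_reply(text: str, emotion: str) -> str:
    # Stub for Gemma via Ollama; the detected emotion conditions
    # the prompt so the reply adapts to the user's emotional state.
    _prompt = f"The user sounds {emotion}. They said: {text}"
    return "I'm sorry things are rough. Let's work through it together."

def synthesize(reply: str, emotion: str) -> dict:
    # Stub for Bark TTS; the real system would select prosody/style
    # matching the detected emotion.
    return {"text": reply, "style": emotion}

def respond(audio: bytes) -> dict:
    # End-to-end pass: audio in, emotion-styled spoken reply out.
    text = transcribe(audio)
    emotion = classify_emotion(audio)
    reply = generate_reply(text, emotion)
    return synthesize(reply, emotion)

print(respond(b"\x00" * 16000)["style"])  # → anger
```

Note that classification runs on the audio itself rather than on the transcript, which is what lets the system pick up emotional cues (prosody, intensity) that the text alone would miss.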