The Audio and Speech Processing With Deep Learning (ASPDL) Research group aims to develop state-of-the-art technologies for audio and speech signal analysis and processing, with many applications in audio content analysis, context-aware devices, language learning, human-computer interaction, acoustic monitoring, multimedia information retrieval, hearing aids, and other assistive technologies.
- Audio and Speech Processing With Deep Learning.
- Audio-Only Speech Enhancement With Deep Learning.
- Audio-Visual Speech Enhancement With Deep Learning.
- Speech Emotion Recognition With Deep Learning.
- Source Separation With Deep Learning.
Research: Audio-Only Speech Enhancement with E2E Neural Models.
Research: Visually Driven E2E Multimodal Neural Speech Enhancement.
Research: Audio-Video Speech Processing with E2E Neural Models.
Fazal E Wahab
Research: Real-Time Speech Enhancement for Resource Limited Devices.
Research Publications (2023)
- Peracha, Fahad Khalil, Muhammad Irfan Khattak, Nema Salem, and Nasir Saleem. “Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network.” PLoS ONE 18, no. 5 (2023): e0285629.
- Wahab, Fazal E, Zhongfu Ye, Nasir Saleem, and Hamza Ali. “Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement.” International Journal of Interactive Multimedia and Artificial Intelligence (2023).
- Wahab, Fazal E, Zhongfu Ye, Nasir Saleem, and Rizwan Ullah. “Real-Time Speech Enhancement in Frequency Domain.” Speech Communication (Under Review).
- Ali, Jawad, and Nasir Saleem. “Modeling Spatio-Temporal Features Representation With Recurrent Capsule Networks for Monaural Speech Enhancement.” Applied Acoustics (Under Review).
- Ali, Jawad, Junaid Mushtaq, Sher Muhammad, Iqra Batool, and Nasir Saleem. “Waveform-based Lightweight Neural Models for Speech Enhancement.” IEEE ICET 23 Conference (Under Review).
- Mushtaq, Junaid, Jawad Ali, Sher Muhammad, Iqra Batool, and Nasir Saleem. “A Deep Neural Framework to Estimate Spectral Magnitude and Phase for Single-Channel Speech Enhancement.” IEEE ICIT Conference (Accepted for Oral Presentation).
ASPDL expresses its gratitude to Professor Dr. Muhammad Irfan Khattak, University of Engineering & Technology, Peshawar, for his continuous support.
The Audio and Speech Processing With Deep Learning (ASPDL) Research Group collaborates with researchers from various national and international universities in a non-profit research partnership aimed at conducting advanced research in audio, speech, image, and video processing.
By conducting progressive research in audio, speech, image, and video processing, ASPDL, in collaboration with UET Peshawar, the University of Virginia (USA), the International University of La Rioja (Spain), and the International Islamic University Malaysia (IIUM), can contribute to the development of cutting-edge algorithms and techniques for speech and audio processing.
Dr. Muhammad Irfan Khattak (University of Engineering & Technology), Dr. Elena Verdu (International University of La Rioja), and Dr. Jiechao Gao (Columbia University) are collaborating with ASPDL. We aim to increase the number of collaborators both nationally and internationally.
Research Publications from Collaborations
- Almadhor, Ahmad, Rizwana Irfan, Jiechao Gao, Nasir Saleem, Hafiz Tayyab Rauf, and Seifedine Kadry. “E2E-DASR: End-to-end deep learning-based dysarthric automatic speech recognition.” Expert Systems with Applications 222 (2023): 119797.
- Saleem, Nasir, Teddy Surya Gunawan, Mira Kartiwi, Bambang Setia Nugroho, and Inung Wijayanto. “NSE-CATNet: Deep Neural Speech Enhancement using Convolutional Attention Transformer Network.” IEEE Access (2023).
- Saleem, Nasir, Jiechao Gao, Rizwana Irfan, Ahmad Almadhor, Hafiz Tayyab Rauf, Yudong Zhang, and Seifedine Kadry. “DeepCNN: Spectro-temporal feature representation for speech emotion recognition.” CAAI Transactions on Intelligence Technology (2023).
- Saleem, Nasir, Muhammad Irfan Khattak, Salman A. AlQahtani, Atif Jan, Irshad Hussain, Muhammad Naeem Khan, and Mostafa Dahshan. “U-Shaped Low-Complexity Type-2 Fuzzy LSTM Neural Network for Speech Enhancement.” IEEE Access 11 (2023): 20814-20826.
- Khattak, Muhammad Irfan, Nasir Saleem, Jiechao Gao, Elena Verdu, and Javier Parra Fuente. “Regularized sparse features for noisy speech enhancement using deep neural networks.” Computers and Electrical Engineering 100 (2022): 107887.
- Saleem, Nasir, Jiechao Gao, Muhammad Irfan, Elena Verdu, and Javier Parra Fuente. “E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis.” Image and Vision Computing 119 (2022): 104389.
- Saleem, Nasir, Jiechao Gao, Muhammad Irfan Khattak, Hafiz Tayyab Rauf, Seifedine Kadry, and Muhammad Shafi. “DeepResGRU: Residual gated recurrent neural network-augmented Kalman filtering for speech enhancement and recognition.” Knowledge-Based Systems 238 (2022): 107914.