Audio and Speech Processing With Deep Learning (ASPDL) Research Group

The group aims to develop state-of-the-art technologies for audio and speech signal analysis and processing, with many applications in audio content analysis, context-aware devices, language learning, human-computer interaction, acoustic monitoring, multimedia information retrieval, hearing aids, and other assistive technologies.

Research Directions

  • Audio and Speech Processing With Deep Learning.
  • Audio-Only Speech Enhancement With Deep Learning.
  • Audio-Visual Speech Enhancement With Deep Learning.
  • Speech Emotion Recognition With Deep Learning.
  • Source Separation With Deep Learning.
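
A minimal illustrative sketch of the first of these directions is given below: a masking-based speech enhancement model, in which a neural network estimates a time-frequency mask that suppresses noise in a noisy spectrogram. This is a generic baseline for illustration only, not the group's published architecture; PyTorch is assumed, and the names (MaskingEnhancer, the dimensions) are hypothetical.

    # Minimal masking-based speech enhancement sketch (illustrative only).
    # Assumes PyTorch; MaskingEnhancer and all dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class MaskingEnhancer(nn.Module):
        """Estimates a [0, 1] time-frequency mask from a noisy magnitude spectrogram."""
        def __init__(self, n_freq=257, hidden=256):
            super().__init__()
            self.rnn = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
            self.fc = nn.Linear(hidden, n_freq)

        def forward(self, noisy_mag):
            # noisy_mag: (batch, frames, n_freq) magnitude spectrogram
            h, _ = self.rnn(noisy_mag)
            mask = torch.sigmoid(self.fc(h))  # per-bin mask in [0, 1]
            return mask * noisy_mag           # enhanced magnitude estimate

    # Usage with dummy data: one utterance of 100 frames, 257 frequency bins.
    model = MaskingEnhancer()
    noisy = torch.rand(1, 100, 257)
    enhanced = model(noisy)                   # shape: (1, 100, 257)

Such a model would typically be trained on paired noisy/clean spectrograms with a mean-squared-error loss on the enhanced magnitude; audio-visual variants additionally condition the mask on visual features.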

Research Profile: Google Scholar, ORCID, Web of Science

Email: nasirsaleem@gu.edu.pk, Contact: +92-333-0613347
Department of Electrical Engineering, Faculty of Engineering & Technology,
Gomal University, Dera Ismail Khan-29050, KPK, PAKISTAN.

Dr. NASIR SALEEM received the B.S. degree in Telecommunication Engineering from the University of Engineering and Technology, Peshawar, Pakistan, in 2008, the M.S. degree in Electrical Engineering from CECOS University, Peshawar, Pakistan, in 2012, and the Ph.D. degree in Electrical Engineering, with a specialization in digital speech processing and deep learning, from the University of Engineering and Technology, Peshawar, Pakistan, in 2021. He is currently a Postdoctoral Fellow at the International Islamic University Malaysia (IIUM), researching modern artificial intelligence-based speech processing algorithms. From 2008 to 2012, he was a senior lecturer at the Institute of Engineering Technology (IET), Gomal University, where he was involved in teaching and research. He is now an Assistant Professor in the Department of Electrical Engineering, Faculty of Engineering and Technology (FET), and Deputy Director of the Quality Assurance Directorate at Gomal University. His current research interests include human-machine interaction, speech enhancement, speech processing, and machine learning applications. He has published a number of research papers with well-known publishers, including Elsevier, Springer, and IEEE. He is also active in academic service as a guest editor and as a reviewer for several well-known venues, including IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE Transactions on Human-Machine Systems, IEEE Transactions on Artificial Intelligence, IEEE Signal Processing, IEEE MultiMedia, IEEE Access, Expert Systems with Applications, Applied Acoustics, and Neural Networks.

Ph.D. Students

Jawad ALI
Research: Audio-Only Speech Enhancement with E2E Neural Models.
Junaid MUSHTAQ
Research: Visually Driven E2E Multimodal Neural Speech Enhancement.
Fahad KHALIL
Research: Deep Audio-Visual Speech Enhancement and Recognition.
Fazal E WAHAB
Research: Real-Time Deep Speech Enhancement for Low-Resource Devices.

M.Sc. Students

Sher MUHAMMAD
Muhammad YOUSAF
Farman SAEED
Muhammad NOMAN
Muhammad SAMI

Research Publications (2023)
  1. Peracha, Fahad Khalil, Muhammad Irfan Khattak, Nema Salem, and Nasir Saleem. “Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network.” PLOS ONE 18, no. 5 (2023): e0285629.
  2. Wahab, Fazal E, Zhongfu Ye, Nasir Saleem, and Hamza Ali. “Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement.” International Journal of Interactive Multimedia and Artificial Intelligence (2023).
  3. Wahab, Fazal E, Zhongfu Ye, Nasir Saleem, and Rizwan Ullah. “Real-Time Speech Enhancement in Frequency Domain.” Speech Communication (Under Review).
  4. Ali, Jawad, and Nasir Saleem. “Modeling Spatio-Temporal Features Representation With Recurrent Capsule Networks for Monaural Speech Enhancement.” Applied Acoustics (Under Review).
  5. Ali, Jawad, Junaid Mushtaq, Sher Muhammad, Iqra Batool, and Nasir Saleem. “End-to-End Waveform-Based Lightweight Neural Models for Speech Enhancement.” IEEE ICIT Conference (Under Review).
  6. Mushtaq, Junaid, Jawad Ali, Sher Muhammad, Iqra Batool, and Nasir Saleem. “A Deep Neural Framework to Estimate Spectral Magnitude and Phase for Single-Channel Speech Enhancement.” IEEE ICIT Conference (Under Review).

Research Collaborations

ASPDL expresses its gratitude to Professor Dr. Muhammad Irfan Khattak, University of Engineering & Technology, Peshawar, for his continuous support.

The Audio and Speech Processing With Deep Learning (ASPDL) Research Group collaborates with researchers from national and international universities in a non-profit research partnership. Its aim is to conduct advanced research in audio, speech, image, and video processing, and the group is strongly motivated to take part in collaborative research activities.

By conducting progressive research in audio, speech, image, and video processing, ASPDL, in collaboration with UET Peshawar, the University of Virginia (USA), the International University of La Rioja (Spain), and the International Islamic University Malaysia (IIUM), can contribute to the development of cutting-edge algorithms and techniques for speech and audio processing.

Research Collaborators

Dr. Muhammad Irfan Khattak (University of Engineering & Technology, Peshawar), Dr. Elena Verdú (International University of La Rioja), and Dr. Jiechao Gao (Columbia University) are collaborating with ASPDL. We aim to increase the number of collaborators both nationally and internationally.

Research Publications from Collaborations

  1. Almadhor, Ahmad, Rizwana Irfan, Jiechao Gao, Nasir Saleem, Hafiz Tayyab Rauf, and Seifedine Kadry. “E2E-DASR: End-to-end deep learning-based dysarthric automatic speech recognition.” Expert Systems with Applications 222 (2023): 119797.
  2. Saleem, Nasir, Teddy Surya Gunawan, Mira Kartiwi, Bambang Setia Nugroho, and Inung Wijayanto. “NSE-CATNet: Deep Neural Speech Enhancement using Convolutional Attention Transformer Network.” IEEE Access (2023).
  3. Saleem, Nasir, Jiechao Gao, Rizwana Irfan, Ahmad Almadhor, Hafiz Tayyab Rauf, Yudong Zhang, and Seifedine Kadry. “DeepCNN: Spectro‐temporal feature representation for speech emotion recognition.” CAAI Transactions on Intelligence Technology (2023).
  4. Saleem, Nasir, Muhammad Irfan Khattak, Salman A. AlQahtani, Atif Jan, Irshad Hussain, Muhammad Naeem Khan, and Mostafa Dahshan. “U-Shaped Low-Complexity Type-2 Fuzzy LSTM Neural Network for Speech Enhancement.” IEEE Access 11 (2023): 20814-20826.
  5. Khattak, Muhammad Irfan, Nasir Saleem, Jiechao Gao, Elena Verdu, and Javier Parra Fuente. “Regularized sparse features for noisy speech enhancement using deep neural networks.” Computers and Electrical Engineering 100 (2022): 107887.
  6. Saleem, Nasir, Jiechao Gao, Muhammad Irfan, Elena Verdu, and Javier Parra Fuente. “E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis.” Image and Vision Computing 119 (2022): 104389.
  7. Saleem, Nasir, Jiechao Gao, Muhammad Irfan Khattak, Hafiz Tayyab Rauf, Seifedine Kadry, and Muhammad Shafi. “DeepResGRU: Residual gated recurrent neural network-augmented Kalman filtering for speech enhancement and recognition.” Knowledge-Based Systems 238 (2022): 107914.