Research & Projects

Publications

BaCP: Backbone Contrastive Pruning for Preserving Representations in Extremely Sparse Neural Networks
M. H. Khawaja, M. Haseeb, M. F. Shoaib, M. Tahir. (Submitted to AAAI 2025)

As a contributor to this work, I tackled the problem of representational collapse in highly sparse networks. I designed and implemented the multi-objective loss function that aligns sparse embeddings against three reference representations (pretrained, fine-tuned, and historical), and evaluated the framework across CNNs, Vision Transformers, and language models to verify that the method preserves accuracy at extreme sparsity.
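
A minimal sketch of the alignment idea: score a batch of sparse-model embeddings against the three frozen references. The cosine-alignment form and the loss weights are illustrative assumptions here, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def multi_reference_alignment_loss(z_sparse, z_pre, z_ft, z_hist,
                                   w_pre=1.0, w_ft=1.0, w_hist=1.0):
    """Align sparse-model embeddings with three frozen references.

    z_*: (batch, dim) embeddings from the sparse model and from the
    pretrained, fine-tuned, and historical (e.g. EMA) checkpoints.
    A simple cosine-alignment term stands in for the contrastive
    formulation used in the actual work.
    """
    def align(z, ref):
        # 1 - mean cosine similarity between matched embedding pairs
        return 1.0 - F.cosine_similarity(z, ref.detach(), dim=-1).mean()

    return (w_pre * align(z_sparse, z_pre)
            + w_ft * align(z_sparse, z_ft)
            + w_hist * align(z_sparse, z_hist))
```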

Research Projects

Interpretability & Domain Generalization
Advisor: Dr. Muhammad Tahir, CITY at LUMS

I am researching domain generalization by moving beyond traditional training methods and instead reverse-engineering a model's internal logic. I use mechanistic interpretability to map a model's functional circuits, distinguishing the generalizable "core" pathways from the unreliable "shortcut" pathways that cause failures on new data. My key technique uses cross-domain data as a "contrastive probe" to isolate the core circuits that remain stable across domains. In an earlier line of this work, I trained learnable visual queries for ViTs via Group Relative Query Optimization, achieving a 3% generalization improvement on the PACS dataset.
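
A sketch of the contrastive-probe idea, assuming class-conditional mean activations have already been collected from two domains; the correlation-based scoring is an illustrative stand-in for the actual circuit-isolation procedure.

```python
import torch

@torch.no_grad()
def cross_domain_stability(acts_a, acts_b, eps=1e-8):
    """Score each hidden unit by the stability of its class-conditional
    response across two domains.

    acts_a, acts_b: (num_classes, num_units) mean activations per class
    from domains A and B. High cross-domain correlation flags candidate
    'core' units; low correlation flags candidate 'shortcut' units.
    """
    a = acts_a - acts_a.mean(dim=0, keepdim=True)
    b = acts_b - acts_b.mean(dim=0, keepdim=True)
    num = (a * b).sum(dim=0)
    den = a.norm(dim=0) * b.norm(dim=0) + eps
    return num / den  # (num_units,) per-unit correlation in [-1, 1]
```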

Mechanistic Analysis of Circuit Preservation in Federated Learning
Advisor: Dr. Muhammad Tahir, CITY at LUMS

I led a research project analyzing the internal failure modes of Federated Learning under Non-IID data using mechanistic interpretability. We investigated how class-specific circuits learned by different clients diverge during training and interact during aggregation. By analyzing sparse models with circuit extraction, sparse autoencoders, and linear probes, we showed that informative latent features remain present even when global model accuracy degrades, suggesting that aggregation disrupts internal circuit organization rather than eliminating learned representations.
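
A minimal sketch of the linear-probe step, assuming features have already been extracted from a frozen layer of the aggregated global model; the training loop and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def fit_linear_probe(features, labels, num_classes, epochs=200, lr=1e-2):
    """Train a linear probe on frozen intermediate features.

    features: (N, D) activations from the global model; labels: (N,).
    Probe accuracy staying high while end-to-end accuracy drops is
    evidence that aggregation disrupts routing rather than erasing
    the learned representations.
    """
    probe = nn.Linear(features.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(probe(features), labels).backward()
        opt.step()
    with torch.no_grad():
        acc = (probe(features).argmax(dim=1) == labels).float().mean()
    return probe, acc.item()
```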

Quantization of Diffusion Models
Advisors: Dr. Zubair Khalid, CITY at LUMS & 10x Engineers

Researched quantization techniques to reduce the compute and memory cost of Diffusion Transformers. For timestep-dependent dynamic activations, I developed a novel method that conditions a predictor network on the physical log-signal-to-noise ratio (log-SNR) at a given timestep to predict quantization parameters. This approach accounts for non-stationary signal dynamics and achieved a 55% reduction in perceptual error (LPIPS). For static weights, I implemented the Hessian-based optimization methods GPTQ and Qronos, leveraging second-order curvature information to compress the model to a 4-bit format while compensating for quantization error.
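
A minimal sketch of the log-SNR-conditioned predictor paired with symmetric per-channel fake quantization; the MLP size and the softplus parameterization are assumptions of this illustration, not the exact method.

```python
import torch
import torch.nn as nn

class LogSNRScalePredictor(nn.Module):
    """Map the scalar log-SNR of the current diffusion timestep to
    positive per-channel quantization scales for the activations."""

    def __init__(self, num_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(),
            nn.Linear(hidden, num_channels),
        )

    def forward(self, log_snr):
        # log_snr: (batch, 1); softplus keeps scales strictly positive
        return nn.functional.softplus(self.net(log_snr))

def fake_quantize(x, scale, bits=8):
    """Symmetric uniform fake quantization; scale must broadcast
    against x (e.g. x: (batch, channels), scale: (batch, channels))."""
    qmax = 2 ** (bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q * scale
```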

KL-Aware Quantization (KLAWQ)
Advisor: Dr. Murtaza Taj, CVGL

Developed an augmented GPTQ framework that integrates knowledge distillation and supervised fine-tuning directly into the post-training quantization pipeline. My key technical innovation was combining the standard GPTQ Hessian with second-order curvature information from KL-divergence and cross-entropy objectives. This better preserves the teacher model's output distribution, and my implementation achieved a 30% reduction in perplexity over baseline GPTQ at equivalent bit-widths.
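
A rough sketch of the Hessian blending, using the standard GPTQ input-covariance proxy for the layer Hessian; the additive combination and the weight lam are assumptions of this illustration rather than the exact KLAWQ formulation.

```python
import torch

def blended_hessian(X, H_kl, lam=0.1, damp=0.01):
    """Combine the GPTQ proxy Hessian with KL-derived curvature.

    X: (n_samples, d_in) calibration inputs to the layer, giving the
    usual proxy H = 2 X^T X / n. H_kl: (d_in, d_in) curvature from the
    KL / cross-entropy objective (e.g. a Fisher-style estimate).
    """
    H = 2.0 * X.T @ X / X.shape[0]
    H = H + lam * H_kl
    # GPTQ-style diagonal damping for numerical stability
    eye = torch.eye(H.shape[0], dtype=H.dtype, device=H.device)
    return H + damp * H.diagonal().mean() * eye
```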

Single-Image Camera Calibration (SOFI-UGCL)
Advisor: Dr. Murtaza Taj, CVGL

Engineered a hybrid method for camera calibration that combines a Transformer front-end with geometric post-processing. I used a pretrained Multi-Scale Deformable Transformer (SOFI) to predict geometric primitives such as the zenith point and horizon line. From these predictions, I recovered the camera's intrinsic (K) and extrinsic (R, t) parameters by enforcing mathematical constraints in the loss function, enabling recovery of the full projection matrix from a single image.
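
The sketch below shows the core geometric step under common assumptions (square pixels, zero skew, known principal point): the pole-polar relation between the zenith point and the horizon line yields the focal length, and the zenith direction anchors the rotation up to yaw, which these two primitives alone do not constrain.

```python
import numpy as np

def calibrate_from_zenith_horizon(zenith, horizon, principal_point):
    """Recover K and a rotation (up to yaw) from a zenith point and a
    horizon line; translation needs additional cues and is omitted.

    zenith: (zx, zy) vertical vanishing point in pixels.
    horizon: (a, b, c) homogeneous line a*x + b*y + c = 0.
    """
    c0 = np.asarray(principal_point, dtype=float)
    z = np.asarray(zenith, dtype=float)

    # Foot of the perpendicular from the principal point to the horizon
    a, b, d = horizon
    n = np.array([a, b], dtype=float)
    h = c0 - ((n @ c0 + d) / (n @ n)) * n

    # Pole-polar constraint w.r.t. the image of the absolute conic:
    # f^2 = -(zenith - c0) . (h - c0)
    f = np.sqrt(max(-np.dot(z - c0, h - c0), 1e-8))
    K = np.array([[f, 0, c0[0]], [0, f, c0[1]], [0, 0, 1.0]])

    # Vertical direction in camera coordinates: r3 ~ K^-1 * zenith
    r3 = np.linalg.solve(K, np.array([z[0], z[1], 1.0]))
    r3 /= np.linalg.norm(r3)
    # Complete an orthonormal basis (degenerate if r3 is parallel to y)
    r1 = np.cross([0.0, 1.0, 0.0], r3)
    r1 /= np.linalg.norm(r1)
    r2 = np.cross(r3, r1)
    return K, np.stack([r1, r2, r3], axis=1)
```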