|
Understanding Counting Mechanisms in Large Language and Vision-Language Models
Hosein Hasani*, Amirmohammad Izadi*, Fatemeh Askari*, Mobin Bagherian,
Sadegh Mohammadian, Mohammad Izadi, Mahdieh Soleymani Baghshah
Under Review, 2025
arXiv
|
|
Uncovering Grounding IDs: How External Cues Shape Multi-Modal Binding
Hosein Hasani*, Amirmohammad Izadi*, Fatemeh Askari*, Mobin Bagherian*,
Sadegh Mohammadian*, Mohammad Izadi, Mahdieh Soleymani Baghshah
Under Review, 2025
arXiv
|
|
Sharif ML Lab
Vision-Language Models • Apr. 2025 to Present
Advisor: Prof. Mahdieh Soleymani
|
|
|
Image Colorization
Implemented a U-Net without skip connections in PyTorch for image colorization, then trained and
evaluated it on the CIFAR-10 dataset to assess the effect of skip connections.
Code
|
|
MNIST GAN
Implemented MLP-based generator and discriminator networks in PyTorch.
Trained and evaluated the model on the MNIST dataset and visualized the results at each step.
Code
|
|
MobileNet V1 & V2
Implemented MobileNet V1 and V2 in PyTorch and compared their training and evaluation times against
a standard CNN.
Code
|
|
Semantic Segmentation Using U-Net
Applied a U-Net segmentation model to distinguish between different parts of the road, with applications
in downstream tasks such as self-driving vehicles. The model was trained on the Cityscapes dataset.
Code
|
|
MLP Digit Classifier from Scratch
Implemented an MLP from scratch in NumPy to classify handwritten digits, trained and evaluated on the
MNIST dataset.
Code
|
|
Video Background Removal
Used SVD matrix factorization for image compression, background removal in videos, and foreground
detection, implemented in NumPy.
Code
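The idea behind SVD-based background removal can be illustrated with a minimal sketch. This is not the
project's code: the function name split_background, the rank parameter, and the synthetic clip are
hypothetical, and the sketch assumes grayscale frames from a fixed camera. Each frame is flattened into
one column of a matrix; a low-rank approximation of that matrix captures the static background, and the
residual is treated as the foreground.

```python
import numpy as np


def split_background(frames, rank=1):
    """Split a grayscale clip of shape (T, H, W) into background and foreground.

    Hypothetical helper for illustration: the background is the rank-`rank`
    truncated SVD of the frame matrix, the foreground is the residual.
    """
    t, h, w = frames.shape
    # Columns of m are the flattened frames: shape (H*W, T).
    m = frames.reshape(t, h * w).T.astype(np.float64)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    # Keep only the leading singular components (the low-rank part).
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    background = low_rank.T.reshape(t, h, w)
    foreground = frames - background
    return background, foreground


# Synthetic clip: a fixed random scene plus one bright pixel that moves.
rng = np.random.default_rng(0)
scene = rng.random((8, 8))
frames = np.repeat(scene[None, :, :], 10, axis=0)
for i in range(10):
    frames[i, 2, i % 8] += 5.0

background, foreground = split_background(frames, rank=1)
```

A rank of 1 assumes the background barely changes across frames; slow lighting changes may need a
slightly higher rank, at the cost of some foreground leaking into the background estimate.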
|
|