I love to build. Currently, most of my time is spent working on PremSQL and contributing to existing open-source projects.
During my college days (mostly during my 2nd to 3rd year), I worked on Graph Machine Learning. Here are two key projects from that time that are worth mentioning:
💡 YogaPoseGNN
(Last major update: Jan 23, 2023)
The idea was simple: I used Graph Neural Networks (GNN) on human pose estimation to classify different yoga poses in real time.

The motivation behind this project came from the fact that Convolutional Neural Networks (CNNs) are highly accurate for image and video (which can be thought of as a stream of frames) tasks. However, they are resource-intensive, especially when running on a CPU with 4-8GB RAM.
So, I developed YogaPoseGNN, where I used MediaPipe to perform pose estimation and extracted body keypoints. These keypoints can be represented as a set of nodes and edges, where the nodes are the spatial points (x, y coordinates), and the edges have properties like angle and length. I trained a Graph Attention Network on a set of training data images.
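The keypoints-to-graph step can be sketched in plain Python. This is an illustrative sketch, not the actual YogaPoseGNN code: the keypoint coordinates and skeleton edges below are made up, and the real project fed these features into a Graph Attention Network.

```python
import math

def build_pose_graph(keypoints, edges):
    """Turn 2D pose keypoints into node and edge features.

    keypoints: list of (x, y) coordinates (e.g. from MediaPipe Pose)
    edges: list of (i, j) index pairs connecting keypoints
    """
    node_features = [[x, y] for x, y in keypoints]
    edge_features = []
    for i, j in edges:
        (x1, y1), (x2, y2) = keypoints[i], keypoints[j]
        length = math.hypot(x2 - x1, y2 - y1)   # "bone" length
        angle = math.atan2(y2 - y1, x2 - x1)    # "bone" orientation
        edge_features.append([length, angle])
    return node_features, edge_features

# Toy skeleton: shoulder -> elbow -> wrist
kps = [(0.5, 0.5), (0.6, 0.6), (0.7, 0.6)]
skeleton = [(0, 1), (1, 2)]
nodes, feats = build_pose_graph(kps, skeleton)
```

The two feature matrices (`nodes` and `feats`) are exactly the compact 2D input space mentioned above, versus a full Height × Width × Channels image tensor.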
I achieved an accuracy of 89% on the test set, and the system had low latency, working in real time with very limited computational resources. The key reason behind this performance was that the input space became a simple 2D matrix (node and edge features) rather than a 3-dimensional (Height x Width x Channels) input space, which significantly reduced the memory footprint while still preserving accuracy through the use of GNNs.
The best part? This approach can be transferred to other image or video-frame classification or regression tasks as well. Special mentions go to JavlinGNN (my BTech minor project) and SignLangGNN, which predicted javelin throw distances and classified American Sign Language gestures, respectively.
🪭 MultiHead VGAEs
(Last major update: Aug 4, 2022)

This was a research project I worked on with Dr. Rahee Walambe. The goal was to improve existing architectures for link prediction in citation graphs.
Link prediction is essentially determining the probability of two nodes being connected by a link. To excel in this task, you need powerful and representative embeddings.
We approached this using a generative model—specifically, a Graph Variational Autoencoder (VGAE). Here's how the architecture was designed:
Encoder: It had two parallel heads. One head used a Graph Convolutional Network (GCN) block, and the other used a Graph Attention Network (GAT) block. GCNs are excellent at capturing the spectral features of a graph, while GATs are great at capturing spatial relationships. After concatenating the features from the two heads, we passed the result into a GCN block to compute the mean (μ) and into a GAT block to compute the standard deviation (σ). These values were used to generate the latent vector through the reparameterization trick.
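The reparameterization trick itself is small enough to sketch in plain Python. In the real model, μ and log σ² are computed per node by the GCN/GAT blocks; here they are just illustrative vectors, and ε is passed in explicitly (during training it is sampled from a standard normal):

```python
import math

def reparameterize(mu, logvar, eps):
    """z = mu + sigma * eps, with sigma = exp(0.5 * logvar).

    Sampling eps separately (instead of sampling z directly) is what
    keeps the latent vector differentiable w.r.t. mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]

# With logvar = 0, sigma = 1, so z is simply mu + eps
z = reparameterize([1.0, 2.0], [0.0, 0.0], [0.5, -0.5])
```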
Decoder: The decoder was a simple inner product, which reconstructed the input graph G into a new graph G'. We calculated the reconstruction loss between G and G'. During training, some edges were masked (used as the validation set), and during validation, we checked how well the model predicted those masked edges.
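The inner-product decoder can be sketched as follows: each entry of the reconstructed adjacency matrix is the sigmoid of the dot product of two nodes' latent vectors. The latent vectors below are made-up toy values, not outputs of the trained encoder:

```python
import math

def inner_product_decoder(Z):
    """Reconstruct edge probabilities: A'[i][j] = sigmoid(z_i . z_j)."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    n = len(Z)
    return [[sigmoid(sum(a * b for a, b in zip(Z[i], Z[j])))
             for j in range(n)] for i in range(n)]

# Nodes 0 and 1 have similar latent vectors -> high link probability;
# node 2 points the opposite way -> low link probability with node 0.
Z = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
A = inner_product_decoder(Z)
```

Link prediction then amounts to reading off A'[i][j] for the masked node pairs and comparing against the held-out edges.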
The model achieved excellent results, outperforming previous models for link prediction tasks. We achieved AUC and AP scores of:
- 99.1%, 99.2% for Cora
- 98.5%, 98.3% for Citeseer
- 99.1%, 98.9% for PubMed
This represented an average improvement of 2% in AUC and 3% in AP scores.
Note: One limitation of this approach is that it's transductive, meaning it works only for a single graph structure where validation nodes still pass messages to training nodes. In contrast, an inductive approach would use a completely different graph structure for the validation set. That wasn't the focus of this project, though.
PS: Due to some issues during the COVID period, we faced delays in submitting this to a journal, so the paper hasn't been published yet.
Projects that were fun but didn't work out
Well, these two were not the ONLY things I built that became successful. There were lots of things I built (and still build), but eventually they either went unmaintained or we discontinued them.
🕯️ Rewards
Something I still regret and am hugely proud of. I collaborated closely on this project with my friend Pratyush (its main creator). The idea was to make a very simple Kaggle-like platform for RL (or, more accurately, a lightweight Amazon DeepRacer).
We simulated an RL-controlled car-racing game and built a platform where users could write a simple reward function (consisting of simple if-else statements, loops, etc.) and adjust the neural network's hyperparameters to make the car learn to drive a given path.
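A user-written reward function might have looked like the sketch below. The `params` dict and its keys are hypothetical, chosen here only to illustrate the style of simple if-else reward shaping users wrote; they are not the platform's real API:

```python
def reward_function(params):
    """A toy reward function of the kind users wrote on the platform.

    `params` is a hypothetical dict the simulator would pass each step;
    the keys below are illustrative, not the platform's actual interface.
    """
    reward = 1e-3  # small baseline so the reward is never exactly zero
    if params["all_wheels_on_track"]:
        reward += 1.0
        # Encourage staying close to the center of the track
        if params["distance_from_center"] < 0.1 * params["track_width"]:
            reward += 0.5
    return reward

r = reward_function({"all_wheels_on_track": True,
                     "distance_from_center": 0.05,
                     "track_width": 1.0})
```

The RL agent then maximizes the cumulative value of whatever function the user writes, which is what made the competition interesting: different reward designs led to visibly different driving behavior.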
We also ran a hackathon for freshers (1st and 2nd years) at our college, where more than 250 people competed in real time. They learned the basics of RL and neural networks, and it was a super fun experience overall. Being the creators of this makes us super proud.
The regret? Well, there were two: we didn't get the time to film it, since we (along with some juniors) were busy managing people, and we didn't continue the project afterwards.
🔬 Some more honourable mentions:
GPTRec (June 2024): We simulated a movie website and added a semantic search layer (using simple RAG and GPT) to recommend movies aligned with the user's preferences and search query, making the experience more personalized.
PhiSys (March 2024): The main idea was to combine LLMs and RecSys models: candidate generation would be done by two-tower candidate generator models, with Small Language Models re-ranking the recommendations. The project is still paused due to other commitments.
Closette AI (13th Sep 2024): Another LLM x RecSys project, where we combined RAG and Text-to-SQL for focused, personalized semantic product search. Here is a working demo video for the same. It was a nice all-nighter hackathon project; unfortunately, we didn't win, but it was fun.
