I graduated in 2024 and have been working full time at PremAI for almost a year now. During my college days I also did a couple of internships. I am sharing all those great experiences here.
Prem AI
(December 2023 - August 2025) | Lugano, Switzerland (Remote, India)
I have worked at PremAI for almost two years now. Here are some notable experiences I would like to share.
💎 PremSQL

I created and currently maintain PremSQL, an open-source, local-first library for building end-to-end text-to-SQL pipelines. Users can either run our ready-made pipelines or customise and build on top of the tools the library provides for a more personalised, use-case-specific experience. It ships with pre-curated datasets, evaluation tools, DB connectors, models, fine-tuning utilities, and more.
PremSQL also comes with Prem-1B-SQL (600+ Hugging Face downloads), the library's first 1B-parameter model focused on SQL generation from natural language. It was built by fully fine-tuning DeepSeek 1.3B on several carefully curated text-to-SQL datasets (which are also available through the PremSQL library).
Prem-1B-SQL is also the first 1B-parameter model to achieve 51.54% accuracy on the BirdBench private test set, which puts it on par with closed-source models and ahead of some 7B open-source models such as Qwen 2.5-Coder 7B. Read more about it in the release blog post.
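To give a feel for what text-to-SQL generation involves, here is a minimal, hypothetical sketch of the kind of prompt such a model consumes: the database schema serialised as DDL, followed by the natural-language question. The template and function name here are illustrative assumptions, not PremSQL's actual API.

```python
# Hypothetical prompt-construction sketch for a text-to-SQL model.
# PremSQL's real prompt template and pipeline API may differ.

def build_text2sql_prompt(schema_ddl: str, question: str) -> str:
    """Combine a database schema and a question into one model prompt."""
    return (
        "### Database schema:\n"
        f"{schema_ddl.strip()}\n\n"
        "### Question:\n"
        f"{question.strip()}\n\n"
        "### SQL:\n"
    )

schema = """
CREATE TABLE singers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
"""
prompt = build_text2sql_prompt(schema, "How many singers are from France?")
```

The model completes the prompt after the `### SQL:` marker; the generated query can then be executed against the database through one of the library's DB connectors.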
PremSQL is actively maintained. The future focus is on synthetic data generation, better multi-agent pipelines, and a small playground for trying out the models and pipelines.
📊 Benchmarks
Benchmarks is another open-source library that compares different inference engines in terms of throughput, memory usage, and generation quality.
I ran performance benchmarks on an A100 GPU for Mistral v0.1 Instruct and Llama 2 7B Chat across 13 inference engines (for example: DeepSpeed MII, vLLM, LlamaCPP, CTranslate2), at four different precisions (fp32, fp16, int8, and int4).
I used throughput (tokens/sec) and GPU memory consumption as the key performance metrics. I also ran inference on different types of prompts and compared generation quality against a float32 PyTorch implementation (the source of truth) to see how quality changes when the underlying engine or precision changes.
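The throughput metric above reduces to generated tokens divided by wall-clock time. A minimal sketch of that measurement, using a stand-in generation callable since each real engine (vLLM, LlamaCPP, ...) exposes its own API:

```python
import time

def measure_throughput(generate, prompt: str) -> tuple[int, float]:
    """Time one generation call and return (num_tokens, tokens_per_sec).

    `generate` is a stand-in for any engine's generation call that
    returns the list of output tokens; it is an assumption for this
    sketch, not a real engine interface.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens), len(tokens) / elapsed

# Dummy "engine" that produces 128 tokens instantly.
dummy_engine = lambda prompt: ["tok"] * 128
n, tps = measure_throughput(dummy_engine, "Hello")
```

For the memory side, the PyTorch-based engines can be measured with `torch.cuda.max_memory_allocated()` around the same call; engines with their own allocators need their own reporting.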
All the interesting observations are shared in this blog.
Benchmarks is not actively maintained at the moment: we have covered almost all the popular inference engines, there have not been many breakthroughs in this space since, and the numbers scale roughly proportionally for other model sizes. The last benchmark was run on 30th April 2024.
⛰️ More Open Source
PremAI comes with its own SDK, so I open-sourced integrations of the Prem SDK with LangChain, LlamaIndex, DSPy, and Qdrant.
I created a cookbook for PremAI showcasing different use cases that can be built with the Prem platform and the PremAI SDK.
I wrote several educational and release blog posts.
🎢 Additional works
Additionally, I have worked on research and on serverless deployment of models using Modal Labs. Currently I am working on the platform itself and on the Prem API (using Django REST Framework).
CorridorPlatforms
Internship: (June 2023 - November 2023) | Bangalore
I worked on the founding Generative AI team at Corridor, building different PoCs such as NER extraction and question answering using different LLMs.
Fine-tuned LLMs and SLMs, including Flan-T5, Llama, and Falcon, on domain-specific datasets, and conducted in-depth analysis of how data quality affects fine-tuning and how different models adapt to different tasks. Our journey and learnings are reflected in this blog.
Performed extensive research on local deployment of LLMs, experimenting with early versions of LlamaCPP, GPT4All, and similar tools. The objective was to understand the various ways to deploy large/small language models on-premise.
Voxela AI
Internship: (October 2022 - January 2023) | Santa Clara (Remote India)
Automated the logging of fall/no-fall inferences in a hospital setting into a single platform using Google Cloud APIs, improving efficiency and enabling third-party reviewers to mark false positives for retraining the fall-detection model.
Incorporated custom image/video benchmarking into the existing ML model development pipeline to automatically benchmark newly developed or experimental models, model formats (TensorRT, ONNX), and optimization methods such as quantization and pruning. This drastically reduced the time spent on manual benchmarking, resulting in a smoother ML development process.
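The automation described above boils down to running every candidate model (or export format) over the same inputs and reporting a comparable latency number. A simplified, stand-in sketch of such a harness — the callables here are placeholders for the real TensorRT/ONNX runtimes:

```python
import time
from statistics import mean

def benchmark(models: dict, inputs: list, runs: int = 3) -> dict:
    """Run every candidate model over the same inputs `runs` times
    and return mean per-input latency in milliseconds.

    `models` maps a label (e.g. "onnx", "tensorrt-int8") to a predict
    callable; in a real pipeline these would wrap the exported runtimes.
    """
    results = {}
    for name, predict in models.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            for x in inputs:
                predict(x)
            timings.append((time.perf_counter() - start) / len(inputs))
        results[name] = mean(timings) * 1000.0
    return results

# Stand-in "models": trivial callables in place of real inference runtimes.
candidates = {"baseline": lambda x: x, "quantized": lambda x: x}
report = benchmark(candidates, inputs=list(range(10)))
```

Hooking a harness like this into CI means every new model or optimization method gets a comparable number automatically, instead of an ad-hoc manual run.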