I enjoy contributing to open-source projects, and I am lucky that my current role aligns closely with open-source work. Here I share some of my notable open-source contributions.
Currently working on:
These days I do not have much time to work on open source. Most of my "open-source" time goes into building PremSQL. However, here are some of the WIPs that I want to finish:
- Refactor existing HuggingFace support to a more robust one #1044 (DSPy Stanford).
This PR makes major enhancements to DSPy's HuggingFace model connector so that it can support different model types and inference strategies.
Previously
2023 was the year I first started contributing to open source. Here are the notable contributions (the ones I am kinda proud of) in chronological order.
⚡️ Lightning AI
- Add script to prepare dataset from csv #462 (Sep 14, 2023)
In the early days of LitGPT, this PR made it possible to load fine-tuning datasets directly from CSV files.
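To give a feel for the idea, here is a minimal sketch of turning a CSV into instruction-tuning records. It is not the actual LitGPT script: the function name `csv_to_records` and the column names `instruction`, `input`, and `output` are assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_records(csv_text, columns=("instruction", "input", "output")):
    """Parse CSV text into a list of instruction-tuning records.

    The column names are hypothetical; adjust them to match your file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    records = []
    for row in reader:
        # Treat missing cells as empty strings and trim stray whitespace.
        records.append({c: (row[c] or "").strip() for c in columns})
    return records


# Tiny in-memory CSV so the sketch runs without any file on disk.
sample = (
    "instruction,input,output\n"
    "Translate to French,Hello,Bonjour\n"
    "Add the numbers,2 and 3,5\n"
)
records = csv_to_records(sample)
print(json.dumps(records, indent=2))
```

For a real file you would read its contents with `open(path, newline="", encoding="utf-8")` and pass the text in the same way.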
🎢 Prompt2Model (CMU)
- Optuna Integration for automated Hyper parameter search #315 (Oct 30, 2023)
Prompt2Model was an amazing research implementation for creating and training deep learning models from just a natural language instruction. It automatically finds datasets and a suitable base model from HF and trains the model. This PR added Optuna integration to automatically run hyperparameter search and optimize the model.
🧐 DeepEval
I am currently the third contributor to DeepEval. There was a time when I was contributing to DeepEval full time. It was an amazing experience. I mostly worked on adding different types of metrics and scoring mechanisms.
Normalize Text function for comparing generations #237 (Oct 26 2023)
Feature: Add a metrics class to organize atomic set of re-usable metrics #239 (Oct 26, 2023)
Feature: Exact match metric #252 (Oct 27, 2023)
Tests for metrics #257 (Oct 29, 2023)
Refactoring of file structure and adding new scoring function and tests. #260 (Oct 29, 2023)
This one added a bunch of score functions such as exact match, quasi-exact match, ROUGE score, sentence_bleu_score, etc. It also refactored the file structure for better DevEx.
Feat: Added BertScore metrics #261 (Oct 29, 2023)
Feature: Added Faithfulness score. #264 (Nov 4, 2023)
Feat: Add toxic score function #273 (Nov 15, 2023)
Refactor DeepEval Scoring. #305 (Nov 20, 2023)
Another major refactor of DeepEval models and scorer modules with better structuring and DevEx.
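For readers unfamiliar with the exact-match and quasi-exact-match scores mentioned above, here is a minimal sketch of the idea. This is not DeepEval's actual code: the function names are hypothetical, and the normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace) is an assumption borrowed from common QA evaluation practice.

```python
import re
import string


def normalize_text(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_score(target, prediction):
    """Return 1 only if the strings are identical after trimming outer whitespace."""
    return int(target.strip() == prediction.strip())


def quasi_exact_match_score(target, prediction):
    """Return 1 if the strings are identical after normalization."""
    return int(normalize_text(target) == normalize_text(prediction))


print(exact_match_score("Paris", "paris."))
print(quasi_exact_match_score("The Eiffel Tower", "eiffel tower!"))
```

Exact match is strict about casing and punctuation, while the quasi variant forgives surface differences that usually do not change the answer.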
🤗 HuggingFace community courses
I wrote the course chapter on fine-tuning ViT for object detection. You can check out the chapter here.
🦜⛓️ Langchain
(Maintained from March 26 to July 24 2024)
🗂️ LlamaIndex
(Maintained from March 15 to June 15 2024)
🏫 Stanford DSPy
(Maintained from May 21 to June 27 2024)
This was an interesting experience in which I learned a lot about how TensorRT LLM works, how to run inference with it, and the DSPy ecosystem.