
My Projects
ChatBot
Project Title: Chatbot for Question Answering using Seq2Seq and Attention
Status: In Progress
Tech Stack:
Python
BART-base Transformer
Seq2Seq with Luong Attention
Google Natural Questions
BLEU and ROUGE for evaluation
Developed on Kaggle
This project marks my first exploration into Natural Language Processing (NLP). The objective is to build a chatbot capable of answering user queries by training on Google’s Natural Questions dataset.
The first phase involved extensive data cleaning and preprocessing, which turned out to be one of the most challenging yet insightful parts of the project. Using regex and LanguageTool, I removed HTML tags and special characters and corrected grammatical errors.
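The regex portion of that cleaning step can be sketched roughly as follows. The grammar-correction step is shown only as a comment, since it assumes the language_tool_python package rather than anything confirmed by this write-up:

```python
import re

def clean_text(text: str) -> str:
    """Strip HTML tags and special-character noise from a raw passage."""
    text = re.sub(r"<[^>]+>", " ", text)              # drop HTML tags
    text = re.sub(r"[^a-zA-Z0-9.,?'\s]", " ", text)   # drop special characters
    text = re.sub(r"\s+", " ", text).strip()          # collapse whitespace
    return text

# Grammar correction (assumption: the language_tool_python wrapper for LanguageTool):
# tool = language_tool_python.LanguageTool("en-US")
# text = tool.correct(text)

print(clean_text("<p>Where is the Eiffel Tower?</p>"))  # Where is the Eiffel Tower?
```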
Next, I structured the dataset into suitable input-output pairs for training and performed tokenization and padding. The core of the model is built on a Seq2Seq architecture with Luong Attention, enabling the chatbot to focus on the most relevant parts of the input during response generation.
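The core idea of Luong attention is that, at each decoding step, the model scores every encoder state against the current decoder state and attends to the highest-scoring parts of the input. A minimal NumPy sketch of the dot-score variant (shapes and names are illustrative, not taken from the actual model):

```python
import numpy as np

def luong_dot_attention(decoder_state, encoder_states):
    """Dot-score Luong attention: score each encoder time step against the
    current decoder state, softmax into weights, return the context vector."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over time steps
    context = weights @ encoder_states             # weighted sum, shape (H,)
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))   # 5 encoder steps, hidden size 8
dec = rng.normal(size=8)        # current decoder hidden state
context, weights = luong_dot_attention(dec, enc)
print(weights.shape, context.shape)
```

The weights form a probability distribution over the input steps, which is what lets the decoder focus on the most relevant parts of the question while generating each word of the answer.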
The model is currently under training, and performance evaluation will be conducted using BLEU and ROUGE scores.
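As a rough intuition for what BLEU measures, its basic building block is clipped n-gram precision between a generated answer and a reference. A hand-rolled unigram version (real evaluation would use a library such as NLTK or sacrebleu, which also handle higher-order n-grams and the brevity penalty):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: fraction of candidate words that appear
    in the reference, with counts clipped to the reference counts."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    matched = sum(min(count, ref[word]) for word, count in Counter(cand).items())
    return matched / len(cand) if cand else 0.0

print(unigram_precision("the cat sat", "the cat sat on the mat"))  # 1.0
```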
Key Learnings:
Importance of clean and structured data in NLP workflows
Practical implementation of attention mechanisms
Working with real-world QA datasets at scale
Platform: Kaggle
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicting House Price
As part of my journey into AI/ML, I took on a fully self-guided project to predict house prices using real-world housing data. This project gave me end-to-end experience, from understanding the problem to building, evaluating, and optimizing models.
Problem Statement
Develop a machine learning model that can predict house prices based on features such as location, size, number of rooms, and other attributes.
Tools & Technologies
Language: Python
Platform: Kaggle
Libraries: Scikit-learn, Pandas, NumPy, Matplotlib
Solution Approach
Performed feature engineering to add derived variables.
Built a preprocessing pipeline using SimpleImputer, StandardScaler, and encoders.
Trained two models: Multiple Linear Regression and Lasso Regression.
Compared performance using RMSE and R² metrics.
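The approach above can be sketched end to end with Scikit-learn. Column names and data values here are purely illustrative placeholders, not the actual dataset:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature columns, standing in for the real ones
numeric = ["size_sqft", "num_rooms"]
categorical = ["location"]

# Preprocessing: impute + scale numerics, one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Tiny synthetic sample, for illustration only
X = pd.DataFrame({"size_sqft": [1200, 1500, 900, 2000],
                  "num_rooms": [3, 4, 2, 5],
                  "location": ["north", "south", "north", "east"]})
y = np.array([250_000, 310_000, 180_000, 420_000])

for name, est in [("Linear", LinearRegression()), ("Lasso", Lasso(alpha=1.0))]:
    model = Pipeline([("prep", preprocess), ("model", est)]).fit(X, y)
    pred = model.predict(X)
    rmse = mean_squared_error(y, pred) ** 0.5
    print(name, round(rmse, 2), round(r2_score(y, pred), 4))
```

Wrapping preprocessing and the estimator in one Pipeline keeps the same transformations applied consistently at training and prediction time, which is the main point of building the pipeline before comparing models.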
Model Performance
Linear Regression
RMSE Score - 22,296.42
R² Score - 0.9456
Lasso Regression
RMSE Score - 21,407.10
R² Score - 0.9498
Challenges & Learnings
Skewed data: Handled via log transformations to improve model stability.
Hyperparameter tuning: Carefully tuned the alpha value in Lasso to avoid underfitting/overfitting.
Feature scaling: Standardized features due to Lasso’s scale sensitivity.
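The log-transformation idea for skewed targets can be shown in a couple of lines. A common choice is log1p, inverted with expm1 after prediction (illustrative values, not the project's data):

```python
import numpy as np

prices = np.array([150_000, 220_000, 350_000, 1_200_000])  # right-skewed target
log_prices = np.log1p(prices)       # compress the long tail before fitting
recovered = np.expm1(log_prices)    # invert after predicting in log space
print(np.allclose(recovered, prices))  # True
```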
This project was a valuable learning milestone in my AI/ML journey. It helped me build confidence in model building, pipeline creation, and evaluation. I’m now working on more advanced topics and applying these learnings in new domains.
