My Projects
  • ChatBot

    Project Title: Chatbot for Question Answering using Seq2Seq and Attention

    Status: In Progress

    Tech Stack:

    • Python

    • BART-base Transformer

    • Seq2Seq with Luong Attention

    • Google Natural Questions dataset

    • BLEU and ROUGE for evaluation

    • Developed on Kaggle

    This project marks my first exploration into Natural Language Processing (NLP). The objective is to build a chatbot capable of answering user queries by training on Google’s Natural Questions dataset.

    The first phase involved extensive data cleaning and preprocessing, which turned out to be one of the most challenging yet insightful parts of the project. Using regex and LanguageTool, I removed HTML tags and special characters and corrected grammatical errors.
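    The regex portion of that cleaning step can be sketched roughly as below (a minimal illustration, not the project's actual code; the real pipeline also ran LanguageTool over the result for grammar correction):

```python
import re

def clean_text(text: str) -> str:
    """Strip HTML tags and stray special characters from raw text.

    A simplified sketch of the cleaning step. The full pipeline would
    additionally pass the result through LanguageTool (e.g. the
    language_tool_python package) to fix grammatical errors.
    """
    text = re.sub(r"<[^>]+>", " ", text)                # remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9.,?!'\s]", " ", text)    # drop special characters
    text = re.sub(r"\s+", " ", text).strip()            # collapse whitespace
    return text

print(clean_text("<p>Hello,   world!</p> <b>NLP</b> @ 100%"))
# prints "Hello, world! NLP 100"
```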

    Next, I structured the dataset into suitable input-output pairs for training and performed tokenization and padding. The core of the model is built on a Seq2Seq architecture with Luong Attention, enabling the chatbot to focus on the most relevant parts of the input during response generation.
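    The idea behind Luong attention can be shown in a few lines of NumPy (a conceptual sketch of the dot-product variant, not the model's actual implementation): the decoder state is scored against every encoder output, the scores are softmaxed into weights, and the context vector is the weighted sum.

```python
import numpy as np

def luong_attention(decoder_state, encoder_outputs):
    """Dot-product (Luong-style) attention sketch.

    decoder_state:   shape (hidden,), the current decoder hidden state
    encoder_outputs: shape (src_len, hidden), one vector per source token
    Returns the context vector and the attention weights.
    """
    # Alignment scores: dot product of the decoder state with each encoder output
    scores = encoder_outputs @ decoder_state             # (src_len,)
    # Softmax over source positions (subtract max for numerical stability)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted sum of encoder outputs
    context = weights @ encoder_outputs                  # (hidden,)
    return context, weights

rng = np.random.default_rng(0)
ctx, w = luong_attention(rng.normal(size=4), rng.normal(size=(6, 4)))
print(ctx.shape, w.sum())   # context has hidden size; weights sum to 1
```

    At decoding time, the weights tell the model which source tokens to focus on for the current output step.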

    The model is currently being trained; once training completes, performance will be evaluated using BLEU and ROUGE scores.
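    At their core, both metrics count n-gram overlap between a generated answer and a reference. The unigram case can be sketched in pure Python (an illustrative simplification; full BLEU uses up to 4-grams plus a brevity penalty, and in practice one would use a library such as NLTK or sacrebleu):

```python
from collections import Counter

def unigram_overlap(candidate, reference):
    """Clipped unigram overlap between candidate and reference tokens."""
    cand, ref = Counter(candidate), Counter(reference)
    return sum(min(count, ref[tok]) for tok, count in cand.items())

def bleu1(candidate, reference):
    """BLEU-1: overlap / candidate length (precision-oriented)."""
    return unigram_overlap(candidate, reference) / len(candidate)

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: overlap / reference length (recall-oriented)."""
    return unigram_overlap(candidate, reference) / len(reference)

ref = "the model answers the question".split()
hyp = "the model answers a question".split()
print(bleu1(hyp, ref), rouge1_recall(hyp, ref))   # prints 0.8 0.8
```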

    Key Learnings:

    • Importance of clean and structured data in NLP workflows

    • Practical implementation of attention mechanisms

    • Working with real-world QA datasets at scale

    Platform: Kaggle

    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  • Predicting House Price

    As part of my journey into AI/ML, I took on a fully self-guided project to predict house prices using real-world housing data. This project gave me end-to-end experience—from understanding the problem to building, evaluating, and optimizing models.

    Problem Statement
    Develop a machine learning model that can predict house prices based on features such as location, size, number of rooms, and other attributes.

    Tools & Technologies

    • Language: Python

    • Platform: Kaggle

    • Libraries: Scikit-learn, Pandas, NumPy, Matplotlib

    Solution Approach

    • Performed feature engineering to add derived variables.

    • Built a preprocessing pipeline using SimpleImputer, StandardScaler, and encoders.

    • Trained two models: Multiple Linear Regression and Lasso Regression.

    • Compared performance using RMSE and R² metrics.
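    The approach above can be sketched with Scikit-learn on toy data (a simplified stand-in for the housing features; the real pipeline also included encoders for categorical columns, and the hypothetical alpha value here is for illustration only):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy numeric data standing in for the housing features (hypothetical).
rng = np.random.default_rng(42)
X_clean = rng.normal(size=(500, 5))
y = X_clean @ np.array([30.0, -20.0, 15.0, 0.0, 40.0]) + rng.normal(scale=5.0, size=500)
X = X_clean.copy()
X[rng.random(X.shape) < 0.05] = np.nan        # sprinkle in missing values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for name, model in [("Linear", LinearRegression()), ("Lasso", Lasso(alpha=0.1))]:
    # Impute missing values, standardize, then fit the regressor
    pipe = Pipeline([("impute", SimpleImputer(strategy="median")),
                     ("scale", StandardScaler()),
                     ("model", model)])
    pipe.fit(X_tr, y_tr)
    pred = pipe.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    results[name] = (rmse, r2_score(y_te, pred))
    print(name, round(rmse, 2), round(results[name][1], 4))
```

    Wrapping the imputer, scaler, and model in a single Pipeline keeps the preprocessing identical between training and prediction, which avoids leakage when comparing the two regressors.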

    Model Performance

    • Linear Regression

      • RMSE Score - 22,296.42

      • R² Score - 0.9456

    • Lasso Regression

      • RMSE Score - 21,407.10

      • R² Score - 0.9498

    Challenges & Learnings

    • Skewed data: Handled via log transformations to improve model stability.

    • Hyperparameter tuning: Carefully tuned the alpha value in Lasso to avoid underfitting/overfitting.

    • Feature scaling: Standardized features due to Lasso’s scale sensitivity.
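    The log-transform fix for skew can be illustrated in a few lines (a sketch on synthetic right-skewed data, not the project's actual target column): `log1p` compresses the long tail so the distribution becomes roughly symmetric, and `expm1` maps predictions back to the original scale.

```python
import numpy as np

# Hypothetical right-skewed target (e.g. sale prices drawn from a lognormal).
rng = np.random.default_rng(1)
prices = rng.lognormal(mean=12, sigma=0.5, size=1000)

log_prices = np.log1p(prices)        # train the model on this
recovered = np.expm1(log_prices)     # invert the transform after prediction

def skew(x):
    """Simple moment-based skewness estimate."""
    return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

# Skewness drops sharply after the transform
print(round(skew(prices), 2), round(skew(log_prices), 2))
```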

    This project was a valuable learning milestone in my AI/ML journey. It helped me build confidence in model building, pipeline creation, and evaluation. I’m now working on more advanced topics and applying these learnings in new domains.