Explaining Articles using LLMs [Writing]

Natural Language Processing underpins language models, which are used in a variety of tasks such as machine translation, text summarization, and question answering. This piece explores how pre-trained LLMs exposed through APIs, such as OpenAI's GPT-3.5, can be used to explain news articles through tasks such as article summarization and keyword generation.

Steamgestion - A Data Ingestion Pipeline [Writing] [Github]

The gaming industry is one of the most prominent industries in the market, and reviews are of paramount importance in determining a game's popularity. This project analyses the Steam reviews dataset using a distributed system design: an asynchronous Flask backend with an Elasticsearch engine, deployed in a Docker-Kubernetes environment, where data ingestion queues are handled by RabbitMQ, tasks are processed by Celery, and data is cached in Redis.

Processing Image Advertisements for Contextual Analysis [Writing] [Github]

Image-based advertisements are still one of the best ways to promote products, but it is difficult to personalize the content for the audience and convey the context. This study compares three backbone deep learning architectures, namely ResNet-50, MobileNetV3-Large, and EfficientNet-B3, on an image advertisement dataset to classify the underlying contexts or sentiments perceived by consumers. Transfer learning is applied to mitigate the small-dataset problem.

Credit Risk Modelling [Writing] [Github]

Credit risk is the risk of loss that may occur when a party fails to abide by the terms and conditions of a financial contract, principally by failing to make required payments on loans. This project aims to predict the credit risk of a customer. The data is cleaned, preprocessed, and visualised, and then used to train various machine learning algorithms, with oversampling and undersampling applied to the dataset.
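
As a rough illustration of the oversampling and undersampling mentioned above, here is a minimal sketch of random resampling in plain Python. The function names `random_oversample` and `random_undersample` are hypothetical and are not taken from the project, which may well use a library or a different resampling strategy.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. (Illustrative helper.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class has as
    many rows as the smallest class. (Illustrative helper.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 8 "good" customers (label 0) and 2 "defaulting" ones (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

Resampling of this kind is usually applied to the training split only, so that evaluation still reflects the original class balance.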

Lung Disease Classification using Chest X-rays [Writing] [Github]

Chest X-ray scans are among the most accessible ways to diagnose lung diseases. This study compares the detection of lung diseases in X-ray scans from three different datasets using three different neural network architectures implemented in PyTorch, and performs an ablation study by varying the learning rate. The learned representations are visualized in low dimensions using t-SNE, and the detection of thorax diseases in X-ray scans is visualized using Grad-CAM.

Spotify Data Engineering and Analysis [Github]

Companies like Facebook, Netflix, and Airbnb pioneered "data engineering" to manage vast amounts of real-time data, crafting custom solutions along the way. This project uses Apache Airflow to build an ETL pipeline from the Spotify API, focusing on user activity analysis. It covers data extraction via the Spotipy library, storage via SQL Server Management Studio using a snowflake schema, and visualization in Power BI. Apache Airflow ties these steps together into a complete on-premise solution.

News Recommendation System [Github]

Nowadays, online news is accessible to millions, with articles available from multiple sources. To help users find relevant content, news recommender systems suggest articles that might interest readers. In this project, Beautiful Soup is used to scrape news articles along with their categories and descriptions to build a textual corpus. Text vectorization techniques such as TF-IDF and word2vec are then used for content-based news recommender models, and LightRF and LightFM are used to explore collaborative-filtering-based recommender models.
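
To make the content-based idea concrete, here is a minimal sketch of TF-IDF weighting with cosine similarity, written in plain Python. The helper names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) and the toy headlines are assumptions for illustration; the project itself may rely on a library implementation and a different weighting variant.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic tokens (a deliberately crude tokenizer)."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, using
    term frequency times a smoothed inverse document frequency."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(docs, index, top_k=2):
    """Return indices of the top_k documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], v), i)
              for i, v in enumerate(vectors) if i != index]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]


# Hypothetical headlines standing in for a scraped corpus.
articles = [
    "football team wins league title",
    "football club signs new striker",
    "central bank raises interest rates",
    "stock markets fall as interest rates rise",
]
similar_to_first = recommend(articles, 0, top_k=1)
```

Here an article is recommended purely from word overlap with the one being read, which is the defining trait of a content-based model; collaborative filtering would instead use reader interaction data.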

Depression Data Collection Portal [Writing] [Github]

A backend application built with Java Spring Boot for collecting data from patients with depression. Patients can register themselves and complete a self-assessment in order to get help from a counsellor or doctor. The self-assessment data is shared with a counsellor or doctor, who then takes appropriate action.

Football Player Market Value Prediction [Github]

You must have seen "Moneyball", where Peter Brand explains to Billy Beane: "It's about getting things down to one number. Using stats the way we read them (players), we find value in players nobody else can see." To predict the market price of players in this spirit, data from a CSV file is ingested and then fed to machine learning algorithms.

Bird Call Audio Classification [Github]

With proper sound detection and classification, researchers can understand which call signals birds use to communicate with each other or to warn others about impending dangers in the vicinity. Identifying bird species from sound can therefore help detect these dangers early. Mel spectrograms of the bird audio recordings are used as input features to a CNN model that classifies bird species.
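
For readers unfamiliar with mel spectrograms, the sketch below shows the frequency warping behind them: band centres are spaced evenly on the mel scale rather than in hertz. It uses one common form of the Hz-to-mel formula; the function names and the chosen frequency range are illustrative assumptions, and audio libraries may use a slightly different variant.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (one widely used formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Return n_bands centre frequencies (in Hz) spaced evenly on the mel scale."""
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (n_bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_bands)]


# Four illustrative bands between 0 Hz and 8 kHz.
centers = mel_band_centers(4, 0.0, 8000.0)
```

The centres bunch together at low frequencies and spread out at high ones, so a mel spectrogram devotes more resolution to the lower part of the spectrum before the image-like result is passed to a CNN.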

DonorFu [Devpost]

There are many groups on Facebook catering to blood requests. Some groups are highly active and thus cluttered with requests, while others are dormant and their requests go unanswered. DonorFu, a Facebook Messenger bot, uses admin-approved access to these groups to match posts with potential donors and help manage the groups. It won $17,500 at the FB Developer Community Challenge.

Projects - Harman Jolly