Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
Updated Jun 9, 2024 - Python
Up to 90% accuracy with just 5 features, using the KNN algorithm and PCA for feature engineering. The dataset contained fewer than 1,000 observations. The model's accuracy could be improved with more observations, further hyperparameter optimization, and additional feature engineering.
This project develops a machine learning model to predict the salaries of baseball players based on their past performance.
HomePriceXpert is a web application that estimates home prices in Bangalore based on the property's features. (PS: The dataset used for this project is outdated, so the estimated prices might not reflect current market trends.)
Comprehensive notes and code on Python, data analysis, visualization, machine learning, and deep learning from my data science learning journey.
Personal analytics project on daily-frequency ridership data for various public transport services across the country. Sourced from data.gov.my (Prasarana/MyRapid).
Heart failure is a severe condition in which the heart is unable to pump blood effectively. Early prediction of heart failure can significantly improve patient outcomes. This project aims to build a predictive model using machine learning techniques to identify patients at risk of heart failure.
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Predicting employee salaries based on job type, degree, major, industry, years of experience, and distance from a metropolis using machine learning techniques.
Sales Analysis using SQL
This project analyzes tumor cell data from 550 patients using Python. It involves data cleaning, exploratory analysis, feature engineering, and machine learning to classify tumors as malignant or benign. Techniques include PCA, logistic regression, and k-fold cross-validation to ensure model accuracy and reliability.
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
This repository contains the notebooks used for my experimental thesis, "Experimental Study of the Steel Market Through CNN-LSTM Deep Learning Models: Practical Applications for Cost Reduction in Industries".
Smart Meter Analytics Python - A Python implementation for analyzing energy consumption data (electricity, gas, water) at different measurement intervals. The package provides feature extraction methods and algorithms to prepare data for data mining and machine learning applications.
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Hopsworks - Data-Intensive AI platform with a Feature Store