Collections of data science, machine learning and data visualization projects with pandas, sklearn, matplotlib, tensorflow2, Keras, and various ML algorithms such as random forest classifier, boosting, etc.

ptyadana/Data-Science-and-Machine-Learning-Projects-Dojo

Data Science, Machine Learning & Visualization Dojo

A collection of Data Science & ML projects, and a dojo where I practice Data Science, Machine Learning, Deep Learning and Data Visualization skills along with the related theory, probability, statistics, etc.

Built with

Machine Learning, Deep Learning, Data Science libraries

  • NumPy - package for scientific computing with Python
  • Pandas - fast, powerful, flexible and easy to use open source data analysis and manipulation tool
  • Pandas Profiling - generate reports from dataframe
  • GeoPandas - adds support for geographic data to pandas objects
  • Scikit-learn - Simple and efficient tools for predictive data analysis
  • TensorFlow - An end-to-end open source machine learning platform
  • Keras - Deep Learning framework
  • NLTK - Natural Language Toolkit
  • dlib - A toolkit for making real world machine learning and data analysis applications in C++
  • Face Recognition - The world's simplest facial recognition api for Python and the command line

Data Visualization libraries

  • Matplotlib - a comprehensive library for creating static, animated, and interactive visualizations in Python
  • Seaborn - statistical data visualization
  • Bokeh - interactive visualization library for modern web browsers
  • Plotly - The front-end for ML and data science models
  • Cufflinks - Productivity Tools for Plotly + Pandas

Turning into Web applications

  • Streamlit - The fastest way to build and share data apps
  • Flask - a micro web framework written in Python

Spark

  • Apache Spark - a unified analytics engine for large-scale data processing.
  • Spark with pyspark - PySpark is the collaboration of Apache Spark and Python
  • Databricks - Unified Data Analytics Platform - One cloud platform for massive scale data engineering and collaborative data science.

Tools and Datasources


Projects

Data Analysis and Visualization Capstone project from Machine Learning and Datascience Masterclass Course.

  • This is the data behind the story Be Suspicious Of Online Movie Ratings, Especially Fandango’s
  • using data from 538
  • If you are planning on going out to see a movie, how well can you trust online reviews and ratings? Especially if the same company showing the rating also makes money by selling movie tickets.
  • Do they have a bias towards rating movies higher than they should be rated?
  • etc..
  • This project builds a machine learning model to predict whether or not a customer will churn.
  • Includes cohort analysis based on Telco subscribers' contract type, etc.
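
The cohort analysis by contract type mentioned above can be sketched with a pandas group-by. This is only an illustration, not the project's actual notebook: the rows are made up, and the column names `Contract` and `Churn` are assumptions about how the Telco data is laid out.

```python
import pandas as pd

# Made-up rows for illustration; "Contract" and "Churn" are assumed column names.
subscribers = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Turn the Yes/No label into booleans so the mean is the churn rate.
churned = subscribers["Churn"].eq("Yes")

# Churn rate per contract cohort (fraction of subscribers who left).
churn_rate_by_contract = churned.groupby(subscribers["Contract"]).mean()
print(churn_rate_by_contract)
```

Comparing the rates across cohorts is a quick way to see whether contract type is worth keeping as a feature before training a churn model.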

Milestone project from Complete Machine Learning and Data Science - Zero to Mastery course.

Milestone project from Complete Machine Learning and Data Science - Zero to Mastery course.

Project from Complete Machine Learning and Data Science - Zero to Mastery course.

Data Analysis and Visualization Capstone project from Data Science and Machine Learning Bootcamp Course.

  • analyzing 911 calls data from kaggle
  • top 5 zip codes for 911 calls
  • top 5 townships for 911 calls
  • most common reason for a 911 call
  • different types of visualizations based on the findings
  • etc..
  • Machine learning app using streamlit, for building a regression model using the Random Forest algorithm.
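
The "top 5" questions in the 911 calls analysis above map naturally onto `value_counts` in pandas. The sketch below uses made-up rows, and the column names `zip`, `twp` and `title` (with the reason as the prefix before the colon) are assumptions rather than something confirmed by this README.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions about the dataset.
calls = pd.DataFrame({
    "zip": ["11111", "22222", "11111", "33333", "11111", "22222"],
    "twp": ["TOWN_A", "TOWN_B", "TOWN_A", "TOWN_C", "TOWN_A", "TOWN_B"],
    "title": ["EMS: FALL VICTIM", "Fire: FIRE ALARM", "EMS: CARDIAC EMERGENCY",
              "Traffic: VEHICLE ACCIDENT", "EMS: HEAD INJURY",
              "Traffic: DISABLED VEHICLE"],
})

# Top 5 zip codes and townships by number of calls.
top_zips = calls["zip"].value_counts().head(5)
top_townships = calls["twp"].value_counts().head(5)

# Derive the reason from the text before the colon, then find the most common one.
calls["reason"] = calls["title"].str.split(":").str[0]
most_common_reason = calls["reason"].value_counts().idxmax()

print(top_zips)
print(top_townships)
print(most_common_reason)
```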

Machine Learning & Data Science Projects

Masterclass Projects

Other Projects

Deep Learning Projects

Data Analysis and Visualization Projects

  • Data Visualization with Python - Project: Data analysis and data visualization using Pandas and Matplotlib for countries' GDP, life expectancy comparison across continents, GDP per capita relative growth, population relative growth comparison, etc.
  • Fuel Economy Case Study - Project: Analyzing fuel economy data provided by EPA for distributions of greenhouse gas score and combined mpg in 2008 and 2018, and the correlations between displacement and combined mpg and between greenhouse gas score and combined mpg. Are more unique models using alternative fuels in 2018 compared to 2008? By how much? How much have vehicle classes improved in fuel economy (increased in mpg)? What are the characteristics of SmartWay vehicles? Have they changed over time? (mpg, greenhouse gas) What features are associated with better fuel economy (mpg)? What is the top vehicle which improved the most in terms of combined mpg from 2008 to 2018?
  • Wine Quality Case Study - Project: Analyzing wine data for the following points for wine businesses to model better wine. Is a certain type of wine (red or white) associated with higher quality? What level of acidity (pH value) receives the highest average rating? Do wines with higher alcoholic content receive better ratings? Do sweeter wines (more residual sugar) receive better ratings? White Vs Red Wine Proportions by Color & Quality
  • TV, Halftime Shows, and the Big Game - Project: Analyzing Super Bowl data and answering questions like - What are the most extreme game outcomes? How does the game affect television viewership? How have viewership, TV ratings, and ad cost evolved over time? Who are the most prolific musicians in terms of halftime show performances?
  • Weather Trend - Project: Analyzing Global weather trends, Singapore weather trends, Comparing Global vs Singapore 10 years Moving Average trends
  • Real-time Insights from Social Media Data - Project: Analyzing Twitter data and answering questions like: What are the global trends and the local trends? Which trends are common to both?
  • frequency analysis on tweets and hashtags, etc.
  • Statistics From Stock Data: Analyzing Google, Apple and Amazon stock prices and checking the rolling mean.
  • Android Play Store App Data Analysis - Project: Analyzing Android Play Store data and answering questions like - How many apps are paid? How much money are they making? When were these apps released?
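
The moving averages used in the weather trend and stock data projects above can be computed with a rolling window in pandas. This is a minimal sketch on made-up numbers with a 3-point window; the projects themselves use their own data and window sizes (for example, 10 years for the weather trends).

```python
import pandas as pd

# Made-up yearly values for illustration only.
temps = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0],
                  index=[2001, 2002, 2003, 2004, 2005])

# Each value is the mean of the current and the two preceding observations.
# The first two entries are NaN because the window is not yet full.
moving_avg = temps.rolling(window=3).mean()
print(moving_avg)
```

A wider window smooths out more short-term noise at the cost of reacting more slowly to real changes in the trend.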

Bootcamps

  • 00. NumPy Crash Course
  • 01. Matplotlib Visualization
  • 02. Pandas and Scikit-learn
  • 03. ANNs
  • 04. CNNs
  • 05. Introduction to gym
  • 06. Classical Q Learning
  • 07. Deep Q Learning
  • 08. Deep Q Learning on Images
  • 09. Creating Custom Open AI Gym Environment
  • Section 2 - Google Colab
  • Section 3 - Machine Learning and Neurons
  • Section 4 - Feedforward Artificial Neural Networks
  • Section 5 - CNN Convolutional Neural Networks
  • Section 6 - RNN - Recurrent Neural Networks, Time Series, Sequence Data
  • Section 7 - NLP
  • Section 8 - Recommender Systems
  • Section 9 - Transfer Learning for Computer Vision
  • Section 10 - GANs
  • Section 11 - Deep Reinforcement Learning (Theory)
  • Section 12 - Stock Trading Project with DL
  • Section 13: Advanced Tensorflow Usage
  • Section 14: Low - Level Tensorflow
  • Section 15: In-Depth: Loss Functions
  • Section 16: In-Depth: Gradient Descent
  • Section 17 - 21: Misc
  • Week 01 - Sequences and Prediction
  • Week 02 - Deep Neural Networks for Time Series
  • Week 03 - Recurrent Neural Networks for Time Series
  • Week 04 - Real-world time series data
  • Week 01 - Sentiment in Text
  • Week 02 - Word Embeddings
  • Week 03 - Sequence Models
  • Week 04 - Sequence Models and Literature
  • Week 01 - Exploring a Larger Dataset
  • Week 02 - Augmentation: A technique to avoid overfitting
  • Week 03 - Transfer Learning
  • Week 04 - Multiclass Classification
  • Week 01 - A New Programming Paradigm
  • Week 02 - Introduction to Computer Vision
  • Week 03 - Enhancing Vision with CNN
  • Week 04 - Using Real-world images
  • 01. Introduction
  • 02. Deep Learning and Tensorflow Fundamentals
  • 03. Neural Network Regression with Tensorflow
  • 04. Neural Network Classification with Tensorflow
  • 05. Computer Vision and Convolutional Neural Networks in Tensorflow
  • 06. Transfer Learning - Feature Extraction
  • 07. Transfer Learning - Fine Tuning
  • 08. Transfer Learning - Scaling up
  • 09. Milestone Project 1 - Food Vision Big
  • 10. NLP Fundamentals in Tensorflow
  • 11. Milestone Project 2 - SkimLit
  • 12. Time Series Fundamentals + Milestone Project 3 - BitPredict
  • 13. Passing Tensorflow Certificate Exam
  • 15. Appendix - Machine Learning Primer
  • 16. Appendix - Machine Learning Framework
  • 14, 17-19. Misc
  • 03. Preprocessing
  • 04. Machine Learning Types
  • 05. Supervised Learning - Classification
  • 06. Supervised Learning - Regression
  • 07. Unsupervised Learning - Clustering
  • 08. Hyper Parameters Optimization

Complete Data Science Bootcamp - 365

  • Part 1 - The Field of Data Science
  • Part 2 - Probability
  • Part 3 - Statistics (Descriptive & Inferential)
  • Part 4 - Python
  • Part 5 - Advanced Statistical Methods in Python / Machine Learning in Python
  • Part 6 - Mathematics
  • Part 7 - Deep Learning
  • Software Integration
  • Case Study - Absenteeism

Books

  • The Fundamentals of Machine Learning
  • The Machine Learning Landscape
  • End-to-End Machine Learning Project
  • Classification
  • Training Models

The Hundred-Page Machine Learning Book

  • Introduction
  • Notation and Definitions
  • Fundamental Algorithms
  • Anatomy of a Learning Algorithm
  • Basic Practice
  • Neural Networks and Deep Learning
  • Problems and Solutions
  • Advanced Practice
  • Unsupervised Learning
  • Unsupervised Learning - in-depth material
  • Other Forms of Learning
  • Conclusion

Advancing Machine Learning & Data Science Journey - (In Progress)

To sharpen my ML & DS skills in specific areas and topics:

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Preparing the Data
  • 03.Ensemble Learning
  • 04.Boosting
  • 05.Bagging
  • 06.Stacking
  • 07.Evaluation and Selection of Models
  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Intro to Feature Engineering
  • 03.Explore Data
  • 04.Create and Clean Features
  • 05.Prepare Features for Modelling
  • 06.Compare and Evaluate Models
  • Project: Titanic dataset
  • 01.Review of Foundation
  • 02.Logistic Regression
  • 03.Support Vector Machine
  • 04.Multi-layer Perceptron
  • 05.Random Forest
  • 06.Boosting
  • 07.Final Model Selection and Evaluation
  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Exploratory Data Analysis and Data Cleaning
  • 03.Evaluation - Measuring Success
  • 04.Optimizing a Model
  • 05.End to End Pipeline
  • Assuming Data is good to go
  • Neglecting to consult subject matter experts
  • Overfitting your models
  • Not standardizing your data
  • Focusing on Wrong Factors
  • Data Leakage
  • Forgetting traditional statistics tools
  • Assuming Deployment is a breeze
  • Assuming Machine Learning is the answer
  • Developing in a silo
  • Not treating for imbalanced sampling
  • Interpreting your coefficients without properly treating for multicollinearity
  • Evaluating by accuracy alone
  • Giving overly technical presentations
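
Two of the pitfalls listed above, "Not treating for imbalanced sampling" and "Evaluating by accuracy alone", are easy to see on a toy example. The sketch below uses made-up labels and hypothetical helper functions written in plain Python (not a library API): a model that always predicts the majority class looks accurate but never finds a positive case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Made-up imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the majority (negative) class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

High accuracy alongside zero recall is the signal that accuracy alone is hiding the problem, so it is worth reporting recall, precision or similar metrics next to it.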

Deep Learning, Machine Learning, AI & Data Science

Data Analysis, Manipulation & Data Visualization

Apache Spark & PySpark

Data Scientist Reading Materials

  • Supervised Learning
    • Lesson 01: Machine Learning Bird's Eye View
    • Lesson 02: Linear Regression
    • Lesson 03: Perceptron Algorithm
    • Lesson 04: Decision Trees
    • Lesson 05: Naive Bayes
    • Lesson 06: Support Vector Machines
    • Lesson 07: Ensemble Methods
    • Lesson 08: Model Evaluation Metrics
    • Lesson 09: Training and Tuning
    • Lesson 10: Finding Donors Project
  • Python
  • Pandas
  • Data Cleaning
  • Introduction to Machine Learning
  • Machine Learning Intermediate
  • Feature Engineering
  • Machine Learning Explainability
  • Data Visualization
  • Intro to Deep Learning
  • Intro to Game AI and Reinforcement Learning
  • Natural Language Processing
  • Micro-challenges
  • Computer Vision
  • Intro to SQL
  • Advanced SQL
  • ML Crash Course
  • Problem Framing
  • Data Prep
  • Clustering
  • Recommendation
  • Testing and Debugging
  • GANs
  • Linear Regression Analysis
  • Multi Regression Analysis
  • Practical Statistics
  • Excel Data Manipulation, Analysis and Visualization

Topics include:

  • Set theory, including Venn diagrams
  • Properties of the real number line
  • etc

License

This project is licensed under the MIT License - see the LICENSE.md file for details
