
Predicting a user's age from Facebook posts, comparing traditional text analytics with an SVM against a convolutional neural network


HuyTu7/cnn_age_prediction


Facebook User's Age Prediction

Description:

Mined and investigated unstructured Facebook user posts using Vietnamese language processing, and compared traditional ML methods with deep learning methods (convolutional neural networks and LSTM) for predicting the user's age. Achieved accuracy = 81.4%.

Dataset Overview:

Supervised classification problem: Text -> Age Category (A: 18-23, B: 24-30, C: 30-40, D: 40+)

Overall: 22694 entries
A: 5465 entries (24.08%)
B: 7837 entries (34.53%)
C: 3957 entries (17.44%)
D: 896 entries (3.95%)
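
The class imbalance above (class D is under 4% of the entries) can be made concrete with a small sketch. It recomputes the listed percentages and derives inverse-frequency class weights. The function names are hypothetical, and the weighting formula is the common "balanced" heuristic, not necessarily what `class_weights.py` in this repository does. Note that the four listed counts sum to 18155, not 22694, so the weights here are computed from the listed counts only.

```python
# Class counts and overall total as listed in this README.
COUNTS = {"A": 5465, "B": 7837, "C": 3957, "D": 896}
OVERALL = 22694


def class_shares(counts, total):
    """Return each class's share of `total` as a fraction."""
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is the usual 'balanced' heuristic, used here only
    for illustration.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


shares = class_shares(COUNTS, OVERALL)
weights = balanced_class_weights(COUNTS)

for label in COUNTS:
    print(f"{label}: share={shares[label]:.2%}, weight={weights[label]:.3f}")
```

Rarer classes receive larger weights, so a learner that supports per-class weights would penalize mistakes on class D more heavily than mistakes on class B.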

Steps:

  1. Mined unstructured FB user posts (the user's age must be present)
  2. Preprocessed ages into the classes listed above (A-D)
  3. Investigated the class distribution and preprocessed the posts:
    a) Replace emojis with " emoji_icon " to remove bias toward a specific emoji
    b) Tokenize Vietnamese words
    c) Remove Vietnamese stop-words
    d) Remove numbers and punctuation
    e) Collapse all posts into one vector
  4. Applied learners:
    a) traditional machine learning model:
        i. vectorize the words by frequency
        ii. apply max-absolute scaling
        iii. apply SVM - accuracy: 50%
    b) deep learning model:
        i. keep only vectors with more than 200 items
        ii. pad vectors up to length 800
        iii. apply CNN model:
            - embedding layer: 71% (I remember the accuracy being higher back in the summer, around 80%; need to check again)
            - word2vec (200 features and 15 contexts): 60%
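
The preprocessing and input-preparation steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in `text_analysis.py`: the emoji ranges, the three-word stop list, lowercasing, whitespace tokenization (a real pipeline would use a dedicated Vietnamese word segmenter), truncation of vectors longer than 800, and padding at the end with id 0 are all assumptions. All function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a rough emoji range, not the repository's actual pattern.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")
# Assumption: a tiny placeholder stop-word list for illustration only.
STOP_WORDS = {"và", "là", "của"}


def preprocess(posts):
    """Collapse a user's posts into one cleaned token list."""
    text = unicodedata.normalize("NFC", " ".join(posts))
    text = EMOJI_RE.sub(" emoji_icon ", text)   # step 3a
    text = re.sub(r"\d+", " ", text)            # step 3d: numbers
    text = re.sub(r"[^\w\s]", " ", text)        # step 3d: punctuation
    # Assumption: lowercase + whitespace split stands in for a real
    # Vietnamese tokenizer (step 3b).
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]  # step 3c


def frequency_vector(tokens, vocabulary):
    """Step 4a-i: count how often each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def max_abs_scale(rows):
    """Step 4a-ii: divide each column by its largest absolute value."""
    n_cols = len(rows[0])
    col_max = [max(abs(r[j]) for r in rows) or 1 for j in range(n_cols)]
    return [[r[j] / col_max[j] for j in range(n_cols)] for r in rows]


def filter_and_pad(sequences, min_len=200, max_len=800, pad_id=0):
    """Steps 4b-i and 4b-ii: keep sequences longer than `min_len`,
    then pad (or, as an assumption, truncate) each to `max_len`."""
    kept = [s for s in sequences if len(s) > min_len]
    return [s[:max_len] + [pad_id] * (max_len - len(s)) for s in kept]
```

The scaled frequency rows would feed the SVM, while the padded id sequences would feed the CNN's embedding layer. Fixing the length at 800 gives every input the same shape, which convolutional layers over batched inputs require.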

Files:

├── text_analysis\
|   ├── data_processing.ipynb             
|   ├── text_analysis.py          
|   ├── cnn_age_predict.ipynb      
|   ├── utils\  
|       ├── utils.py
|       ├── smote.py
|       ├── class_weights.py
