Feature Engineering
Easy conversion of Jupyter notebooks
1. Jupyter Notebook-based development
Intuition: develop and test all logic in a `.ipynb` file, then run `jupyter nbconvert --to script train_model.ipynb` to convert the `.ipynb` file directly to a `.py` file. This is a very handy trick!
Handling Missing Values
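- Cause of missing
  - The value itself (missing not at random, MNAR), e.g. certain groups of people do not report it
  - Another variable (missing at random, MAR)
  - No reason (missing completely at random, MCAR)
- Handling
  - Deletion
    - By row
    - By column
  - Imputation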
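The handling options above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the toy `rows` data, the helper names (`drop_rows`, `drop_columns`, `impute_mean`), and the `max_missing_frac` threshold are all assumptions made up for the example, and mean imputation is just one of several possible imputation strategies.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

# Toy data (hypothetical): None marks a missing value.
rows: List[Row] = [
    {"age": 25.0, "income": 50000.0},
    {"age": None, "income": 62000.0},
    {"age": None, "income": None},
    {"age": 31.0, "income": 58000.0},
]


def drop_rows(data: List[Row]) -> List[Row]:
    """Deletion by row: keep only rows with no missing values."""
    return [r for r in data if all(v is not None for v in r.values())]


def drop_columns(data: List[Row], max_missing_frac: float) -> List[Row]:
    """Deletion by column: drop columns whose missing fraction exceeds the threshold."""
    n = len(data)
    keep = [
        col for col in data[0]
        if sum(r[col] is None for r in data) / n <= max_missing_frac
    ]
    return [{col: r[col] for col in keep} for r in data]


def impute_mean(data: List[Row]) -> List[Row]:
    """Imputation: replace each missing value with the mean of its column."""
    means = {
        col: mean(r[col] for r in data if r[col] is not None)
        for col in data[0]
    }
    return [
        {col: (means[col] if r[col] is None else r[col]) for col in r}
        for r in data
    ]


complete_rows = drop_rows(rows)          # 2 of the 4 rows survive
fewer_cols = drop_columns(rows, 0.3)     # "age" is 50% missing, so it is dropped
imputed = impute_mean(rows)              # missing ages become 28.0
```

Row deletion loses half of this toy dataset, which is why imputation is often preferred when missing values are common. Deletion is also risky when the cause of missing is the value itself, since it can bias the remaining data.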
Scaling
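Two common ways to scale a numeric feature are min-max scaling to the range [0, 1] and standardization to zero mean and unit variance. The sketch below shows both in plain Python; the function names and the toy `ages` list are assumptions for illustration only.

```python
from statistics import mean, pstdev
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values: List[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20.0, 30.0, 40.0]        # toy data (hypothetical)
scaled = min_max_scale(ages)     # [0.0, 0.5, 1.0]
z_scores = standardize(ages)     # symmetric around 0.0
```

In practice the statistics (`lo`, `hi`, `mu`, `sigma`) should be computed on the training split only and reused for validation and test data, to avoid leakage.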
Discretization
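Discretization turns a continuous feature into a small number of buckets. A minimal sketch using equal-width bins follows; the helper names, the choice of equal-width binning (quantile-based bins are an equally valid choice), and the toy `ages` list are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List


def equal_width_edges(values: List[float], num_bins: int) -> List[float]:
    """Return the interior bin edges that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(1, num_bins)]


def discretize(value: float, edges: List[float]) -> int:
    """Map a value to its bin index; values beyond the last edge share the last bin."""
    return bisect_right(edges, value)


ages = [18, 25, 37, 52, 78]               # toy data (hypothetical)
edges = equal_width_edges(ages, 3)        # [38.0, 58.0]
bins = [discretize(a, edges) for a in ages]
```

Because `discretize` only compares against the stored edges, values outside the training range still fall into the first or last bin instead of raising an error.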
Categorical Feature Encoding
[IMPT] Encoding trick widely adopted in industry: hashing
- Hash each category to an index in a fixed-size index space
- A new incoming category gets hashed to one of the existing indices
- Random collisions are usually not too harmful
- Largely resolves the "unknown category" problem
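The hashing trick above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the bucket count of 16, and the example categories are hypothetical, and MD5 is used only because it gives a hash that is stable across processes (Python's built-in `hash()` for strings is randomized per process by default).

```python
import hashlib
from typing import List


def hash_category(value: str, num_buckets: int = 16) -> int:
    """Map a category string to a stable index in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def one_hot_hashed(value: str, num_buckets: int = 16) -> List[int]:
    """Encode a category as a one-hot vector over the hash buckets."""
    vec = [0] * num_buckets
    vec[hash_category(value, num_buckets)] = 1
    return vec


# Categories seen during training (hypothetical examples).
train_indices = {c: hash_category(c) for c in ["red", "green", "blue"]}

# A category never seen in training still maps to a valid index,
# possibly colliding with an existing category.
unseen_index = hash_category("magenta")
unseen_vector = one_hot_hashed("magenta")
```

The number of buckets is the main design choice: more buckets mean fewer collisions but a wider feature vector. No vocabulary has to be stored or updated, which is what makes new categories at serving time unproblematic.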