02 Jul 2020
→ Categorical variables contain label values rather than numeric values.
→ Categorical variables without an inherent order are also called “nominal”
→ e.g. a “pet” variable with the values “dog” and “cat”
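Most machine learning algorithms need numeric input, so categorical data must be converted to a numerical form. Two common ways to do this are integer encoding and one-hot encoding.
Integer encoding, also known as label encoding, assigns each category a unique integer.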
colour | integer |
---|---|
red | 1 |
blue | 2 |
orange | 3 |
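As a minimal sketch of integer encoding in plain Python, the snippet below reproduces the mapping in the table above. The sample list `colours` and the choice to number categories from 1 in order of first appearance are assumptions made for illustration. Library encoders such as scikit-learn's `LabelEncoder` may number categories differently (for example starting at 0).

```python
# Hypothetical sample data; only the three colours come from the table above.
colours = ["red", "blue", "orange", "blue", "red"]

# Assign each new category the next integer, starting at 1,
# in order of first appearance.
mapping = {}
for colour in colours:
    if colour not in mapping:
        mapping[colour] = len(mapping) + 1

# Replace every label with its integer code.
encoded = [mapping[colour] for colour in colours]

print(mapping)  # {'red': 1, 'blue': 2, 'orange': 3}
print(encoded)  # [1, 2, 3, 2, 1]
```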
The problem is that integers have a natural order, and a machine learning algorithm may treat that order as meaningful even when the categories have none (as with colours).
For categorical variables where no ordinal relationship exists, it’s better to use one-hot encoding, which adds one binary column per category and sets it to 1 only for the matching category.
red | green | blue |
---|---|---|
1 | 0 | 0 |
0 | 1 | 0 |
0 | 0 | 1 |
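The table above can be reproduced with a short sketch in plain Python. The helper `one_hot` and the `samples` list are hypothetical names introduced for illustration. In practice a library encoder such as scikit-learn's `OneHotEncoder` or pandas' `get_dummies` would usually be used instead.

```python
# Column order taken from the table above.
categories = ["red", "green", "blue"]

# Hypothetical input: one sample per category.
samples = ["red", "green", "blue"]


def one_hot(value, categories):
    """Return a binary vector with a 1 in the column matching `value`."""
    return [1 if value == category else 0 for category in categories]


rows = [one_hot(sample, categories) for sample in samples]

for row in rows:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```

Each row contains exactly one 1, so no category is numerically "larger" than another, which avoids the ordering problem of integer encoding.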