02 Jul 2020

One-Hot Encoding

What is categorical data?

→ Variables that contain label values rather than numeric values.

→ Categorical variables whose values have no natural order are also called “nominal”

→ e.g. a “pet” variable with the values “dog” and “cat”

Most machine learning algorithms expect numeric input, so categorical data must be converted to a numerical form. We can do this with integer encoding or one-hot encoding.

Integer Encoding

Also known as label encoding.

red 1
blue 2
orange 3

The problem is that integers have a natural order, and a machine learning algorithm can mistakenly learn that order (for example, orange > blue > red) in applications where there isn’t one, like colours.
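
As a rough illustration of integer encoding and the ordering issue, here is a minimal Python sketch. The dictionary name `color_to_int` and the sample list `colors` are made up for this example; the mapping itself is the one from the table above.

```python
# Hypothetical mapping, taken from the red/blue/orange table above.
color_to_int = {"red": 1, "blue": 2, "orange": 3}

# Sample data invented for this illustration.
colors = ["red", "orange", "blue", "red"]
encoded = [color_to_int[c] for c in colors]
print(encoded)  # [1, 3, 2, 1]

# The integers carry an order and distances that the colours do not have:
# numerically, "blue" sits exactly halfway between "red" and "orange".
midpoint = (color_to_int["red"] + color_to_int["orange"]) / 2
print(midpoint == color_to_int["blue"])  # True
```

A model that treats these numbers as quantities could pick up on that "halfway" relationship, even though it means nothing for colours.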

One-Hot Encoding

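For categorical variables where no ordinal relationship exists, it’s better to use one-hot encoding. Each category gets its own binary column, and every row has a 1 in the column of its category and 0 everywhere else.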

red  green  blue
1    0      0
0    1      0
0    0      1
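
The table above can be reproduced with a few lines of plain Python. This is only a sketch: the helper `one_hot` and its parameters are hypothetical names chosen for this example, and it assumes every value appears in the given list of categories.

```python
def one_hot(values, categories):
    """Return one row per value, with a 1 in the column of its category."""
    # Map each category to its column position.
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # raises KeyError for an unknown category
        rows.append(row)
    return rows


categories = ["red", "green", "blue"]
rows = one_hot(["red", "green", "blue"], categories)
for row in rows:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```

In practice you would probably reach for an existing tool such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` rather than writing this by hand.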