ONE_HOT_ENCODING

Create a one-hot encoding from a dataframe containing categorical features.

Params:
data : DataFrame
    The input dataframe containing the categorical features.
feature_col : DataFrame, optional
    A dataframe whose columns are used to create the one-hot encoding. For example, if 'data' has columns ['a', 'b', 'c'] and 'feature_col' has columns ['a', 'b'], then only columns ['a', 'b'] of 'data' are one-hot encoded. Defaults to None, meaning that all columns of dtype object or category are encoded.

Returns:
out : DataFrame
    The one-hot encoding of the input features.
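As a rough illustration of the default behaviour (no 'feature_col' supplied), the sketch below uses plain pandas rather than the Flojoy block itself. The toy frame and its values are hypothetical; only the column names 'a', 'b', 'c' come from the description above.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration only.
df = pd.DataFrame(
    {
        "a": pd.Series(["x", "y", "x"], dtype=object),  # object dtype: encoded
        "b": pd.Categorical(["u", "u", "v"]),           # category dtype: encoded
        "c": [1.0, 2.0, 3.0],                           # numeric: left unchanged
    }
)

# Pick the columns whose dtype is object or category.
cat_cols = df.select_dtypes(include=["object", "category"]).columns.to_list()

# Replace each selected column with one indicator column per distinct value.
encoded = pd.get_dummies(df, columns=cat_cols)
print(encoded.columns.to_list())
```

Under these assumptions, 'a' and 'b' are replaced by indicator columns named with the original column as a prefix (for example 'a_x' and 'b_u'), while the numeric column 'c' is kept as it is.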
Python Code
from typing import Optional

import pandas as pd
from flojoy import DataFrame, flojoy


@flojoy
def ONE_HOT_ENCODING(
    data: DataFrame,
    feature_col: Optional[DataFrame] = None,
) -> DataFrame:
    """Create a one-hot encoding from a dataframe containing categorical features.

    Parameters
    ----------
    data : DataFrame
        The input dataframe containing the categorical features.
    feature_col : DataFrame, optional
        A dataframe whose columns are used to create the one-hot encoding.
        For example, if 'data' has columns ['a', 'b', 'c'] and 'feature_col' has columns ['a', 'b'],
        then only columns ['a', 'b'] of 'data' are one-hot encoded.
        Defaults to None, meaning that all columns of dtype object or category are encoded.

    Returns
    -------
    DataFrame
        The one hot encoding of the input features.
    """

    df = data.m
    if feature_col is not None:
        # Encode only the columns named in feature_col.
        encoded = pd.get_dummies(df, columns=feature_col.m.columns.to_list())
    else:
        # Default: encode every column of dtype object or category.
        cat_cols = df.select_dtypes(include=["object", "category"]).columns.to_list()
        encoded = pd.get_dummies(df, columns=cat_cols)

    return DataFrame(df=encoded)


Example


In this example, ONE_HOT_ENCODING is passed the tips dataset from the PLOTLY_DATASET node.

The smoker and day columns are selected as the columns to encode, so the output dataframe contains one-hot encodings for only those two columns. The remaining columns of the input dataset are passed through unchanged.
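The effect of restricting the encoding to smoker and day can be approximated outside Flojoy with plain pandas. This is a minimal sketch, not the example app itself: the rows below are made up and only borrow column names from the tips dataset, and 'feature_col' here is an ordinary pandas frame standing in for the block's input of the same name.

```python
import pandas as pd

# Made-up rows standing in for the tips dataset (not the real data).
tips = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 15.0],
        "sex": ["Female", "Male", "Male"],
        "smoker": ["No", "Yes", "No"],
        "day": ["Sun", "Sat", "Sun"],
    }
)

# Stand-in for the feature_col input: only its column names matter.
feature_col = tips[["smoker", "day"]]

# Encode just those columns; every other column is carried over untouched.
encoded = pd.get_dummies(tips, columns=feature_col.columns.to_list())
print(encoded.columns.to_list())
```

Note that 'sex' is also categorical but stays unencoded here, because it is not among the selected columns.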