CyclicalTransformer
API Reference
- class feature_engine.creation.CyclicalTransformer(variables=None, max_values=None, drop_original=False)
The CyclicalTransformer() applies cyclical transformations to numerical variables. The transformation returns 2 new features per variable, according to:
var_sin = sin(variable * (2. * pi / max_value))
var_cos = cos(variable * (2. * pi / max_value))
where max_value is the maximum value in the variable, and pi is 3.14…
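As a minimal sketch of the two formulas above, written in plain Python rather than with the transformer itself. The helper name `cyclical_encode` is an assumption made for illustration and is not part of feature_engine.

```python
import math


def cyclical_encode(value, max_value):
    """Return (sin, cos) of value scaled onto a full circle of length max_value.

    Hypothetical helper that mirrors the var_sin / var_cos formulas.
    """
    angle = value * (2.0 * math.pi / max_value)
    return math.sin(angle), math.cos(angle)


# Month 3 of 12 sits a quarter of the way around the circle.
month_sin, month_cos = cyclical_encode(3, 12)
print(round(month_sin, 5), round(month_cos, 5))  # prints: 1.0 0.0
```

Both outputs are needed to identify a position on the circle: the sine alone cannot tell month 3 from month 9 apart from sign, and the cosine alone gives the same value for months 3 and 9.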
Motivation: Some features are cyclic by nature, for example the hours of a day or the months of a year. In these cases, the highest values of the variable are close to the lowest values. For example, December (12) is closer to January (1) than to June (6). Applying a cyclical transformation captures this proximity between values.
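The December/January/June claim can be checked numerically. The sketch below is plain Python under the assumption that months are encoded with max_value = 12. The function name `encode` is hypothetical, and Euclidean distance is used only as one convenient way to compare the encoded points.

```python
import math


def encode(month, max_value=12):
    """Map a month number to its (sin, cos) point on the unit circle."""
    angle = month * (2.0 * math.pi / max_value)
    return (math.sin(angle), math.cos(angle))


december, january, june = encode(12), encode(1), encode(6)

# On the raw scale December looks farther from January than from June:
# |12 - 1| = 11 versus |12 - 6| = 6.
dist_dec_jan = math.dist(december, january)
dist_dec_jun = math.dist(december, june)

# After the transformation the ordering is reversed.
print(dist_dec_jan < dist_dec_jun)  # prints: True
```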
The CyclicalTransformer() works only with numerical variables. Missing data should be imputed before applying this transformer.
A list of variables can be passed as an argument. Alternatively, the transformer will automatically select and transform all numerical variables.
- Parameters
- variables: list, default=None
The list of numerical variables to transform. If None, the transformer will automatically find and select all numerical variables.
- max_values: dict, default=None
A dictionary with the maximum value of each variable to transform. Useful when the maximum value is not present in the dataset. If None, the transformer will automatically find the maximum value of each variable.
- drop_original: bool, default=False
If True, the original variables to transform will be dropped from the dataframe.
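To illustrate why max_values can matter, here is a hedged sketch in plain Python, not a call into the transformer. It assumes a sample of months in which no observation falls after September. The names `angle`, `observed_months`, `inferred_max` and `true_max` are made up for this example.

```python
import math


def angle(value, max_value):
    """Angle in radians used inside the sin/cos formulas."""
    return value * (2.0 * math.pi / max_value)


observed_months = [1, 3, 6, 9]       # assumed sample: nothing after September
inferred_max = max(observed_months)  # 9, what would be learned from the data
true_max = 12                        # what a max_values entry could supply

# With the inferred maximum, September lands at a full turn (end of the cycle).
sep_inferred = (math.sin(angle(9, inferred_max)), math.cos(angle(9, inferred_max)))

# With the true maximum, September lands three quarters of the way around.
sep_true = (math.sin(angle(9, true_max)), math.cos(angle(9, true_max)))

print([round(v, 5) for v in sep_inferred])
print([round(v, 5) for v in sep_true])
```

In this assumed scenario, a dictionary such as {'months': 12} passed as max_values would correspond to the second case.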
Attributes
max_values_:
The maximum value of the cyclical feature.
variables_:
The group of variables that will be transformed.
n_features_in_:
The number of features in the train set used in fit.
References
http://blog.davidkaleko.com/feature-engineering-cyclical-features.html
Methods
fit:
Learns the maximum values of the cyclical features.
transform:
Applies the cyclical transformation, creating 2 new features per variable.
fit_transform:
Fit to data, then transform it.
- fit(X, y=None)
Learns the maximum value of each of the cyclical variables.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training input samples. Can be the entire dataframe, not just the variables to transform.
- y: pandas Series, default=None
It is not needed in this transformer. You can pass y or None.
- Returns
- self
- Raises
- TypeError
If the input is not a Pandas DataFrame.
- ValueError
If some of the columns contain NaNs.
If some of the keys in max_values are not present in variables.
- transform(X)
Creates new features using the cyclical transformation.
- Parameters
- X: Pandas DataFrame of shape = [n_samples, n_features]
The data to be transformed.
- Returns
- X: Pandas dataframe.
The dataframe with the additional new features. The original variables are dropped if drop_original is True, and retained otherwise.
- Raises
- TypeError
If the input is not a Pandas DataFrame.
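The fit and transform steps described above can be sketched roughly as follows. This is a simplified stand-in that works on a plain dict of columns rather than a DataFrame, and it is not the library's implementation. The function names `fit_max_values` and `transform_columns` are assumptions made for illustration, and the input checks (type, NaNs, key validation) are omitted.

```python
import math


def fit_max_values(columns, max_values=None):
    """Learn the maximum of each column unless one is supplied by the caller."""
    supplied = max_values or {}
    return {name: supplied.get(name, max(values)) for name, values in columns.items()}


def transform_columns(columns, max_values, drop_original=False):
    """Add <name>_sin and <name>_cos columns; optionally drop the originals."""
    out = {} if drop_original else dict(columns)
    for name, values in columns.items():
        scale = 2.0 * math.pi / max_values[name]
        out[f"{name}_sin"] = [math.sin(v * scale) for v in values]
        out[f"{name}_cos"] = [math.cos(v * scale) for v in values]
    return out


data = {"day": [6, 7, 5, 3, 1, 2, 4], "months": [3, 7, 9, 12, 4, 6, 12]}

learned = fit_max_values(data)
encoded = transform_columns(data, learned, drop_original=True)

print(learned)        # prints: {'day': 7, 'months': 12}
print(list(encoded))  # prints: ['day_sin', 'day_cos', 'months_sin', 'months_cos']
```

With drop_original=False the result would also keep the 'day' and 'months' columns alongside the four new ones.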
Example
import pandas as pd
from feature_engine.creation import CyclicalTransformer
df = pd.DataFrame({
'day': [6, 7, 5, 3, 1, 2, 4],
'months': [3, 7, 9, 12, 4, 6, 12],
})
cyclical = CyclicalTransformer(variables=None, drop_original=True)
X = cyclical.fit_transform(df)
print(cyclical.max_values_)
{'day': 7, 'months': 12}
print(X)
day_sin day_cos months_sin months_cos
0 -0.78183 0.62349 1.0 0.0
1 0.0 1.0 -0.5 -0.86603
2 -0.97493 -0.222521 -1.0 -0.0
3 0.43388 -0.900969 0.0 1.0
4 0.78183 0.62349 0.86603 -0.5
5 0.97493 -0.222521 0.0 -1.0
6 -0.43388 -0.900969 0.0 1.0