transforms - Transforming variables, scales and coordinates¶
"The Grammar of Graphics (2005)" by Wilkinson, Anand and Grossman describes three types of transformations.
Variable transformations - Used to make statistical operations on variables appropriate and meaningful. They are also used to create new variables.
Scale transformations - Used to make statistical objects displayed on dimensions appropriate and meaningful.
Coordinate transformations - Used to manipulate the geometry of graphics to help perceive relationships and find meaningful structures for representing variations.
Variable and scale transformations are similar in that they lead to
plotted objects that are indistinguishable. Typically, variable
transformation is done outside the graphics system, so the system
cannot provide transformation-specific guides and decorations for the
plot. The trans class is aimed at being useful for scale and
coordinate transformations.
- class mizani.transforms.asn_trans(*, domain: DomainType = (-inf, inf), transform_is_linear: bool = True, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Arc-sin square-root Transformation
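As a rough illustration of what an arc-sine square-root mapping does to proportions in [0, 1], here is a standard-library sketch. It does not call mizani, the helper names `asn` and `asn_inverse` are hypothetical, and a library implementation may scale the result by a constant factor.

```python
import math


def asn(p: float) -> float:
    """Arc-sine of the square root of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def asn_inverse(z: float) -> float:
    """Undo asn(): the square of the sine."""
    return math.sin(z) ** 2


proportions = [0.0, 0.25, 0.5, 1.0]
# Transforming and then inverting should recover the proportions.
round_trip = [asn_inverse(asn(p)) for p in proportions]
```

The mapping stretches values near 0 and 1, which is why it is often applied to proportions before plotting or modelling.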
- class mizani.transforms.atanh_trans(*, domain: DomainType = (-inf, inf), transform_is_linear: bool = True, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Inverse hyperbolic tangent (arctanh) Transformation
- class mizani.transforms.boxcox_trans(p: float, offset: int = 0, *, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Boxcox Transformation
The Box-Cox transformation is a flexible transformation, often used to transform data towards normality.
The Box-Cox power transformation (type 1) requires strictly positive values and takes the following form for \(y \gt 0\):
\[y^{(\lambda)} = \frac{y^\lambda - 1}{\lambda}\]
When \(\lambda = 0\), the natural log transform is used.
- Parameters:
- p
float
Transformation exponent \(\lambda\).
- offset
int
Constant offset. 0 for Box-Cox type 1, otherwise any non-negative constant (Box-Cox type 2). The default is 0. modulus_trans() sets the default to 1.
See also
References
Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 211-252. https://www.jstor.org/stable/2984418
John, J. A., & Draper, N. R. (1980). An alternative family of transformations. Applied Statistics, 190-197. http://www.jstor.org/stable/2986305
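The Box-Cox formula above can be sketched with the standard library alone. This is an illustration under stated assumptions, not mizani's implementation: the function names `boxcox` and `boxcox_inverse` are hypothetical, and the offset is assumed to be added to the value before the power is applied (the type 2 form).

```python
import math


def boxcox(y: float, p: float, offset: float = 0.0) -> float:
    """Box-Cox transform of y with exponent p (lambda)."""
    shifted = y + offset
    if shifted <= 0:
        raise ValueError("y + offset must be strictly positive")
    if p == 0:
        # Limiting case of (shifted**p - 1) / p as p approaches 0.
        return math.log(shifted)
    return (shifted**p - 1) / p


def boxcox_inverse(z: float, p: float, offset: float = 0.0) -> float:
    """Map a transformed value back to the original scale."""
    if p == 0:
        return math.exp(z) - offset
    return (z * p + 1) ** (1 / p) - offset


values = [0.5, 1.0, 4.0, 9.0]
transformed = [boxcox(v, p=0.5) for v in values]
recovered = [boxcox_inverse(z, p=0.5) for z in transformed]
```

With `p=1` the transform is a shift by one unit, and with `p=0` it is the natural logarithm, so the exponent moves smoothly between a linear and a logarithmic scale.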
- class mizani.transforms.modulus_trans(p: float, offset: int = 1, *, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Modulus Transformation
The modulus transformation generalises Box-Cox to work with both positive and negative values.
When \(\lambda \neq 0\)
\[y^{(\lambda)} = \operatorname{sign}(y) \cdot \frac{(|y| + 1)^\lambda - 1}{\lambda}\]
and when \(\lambda = 0\)
\[y^{(\lambda)} = \operatorname{sign}(y) \cdot \ln{(|y| + 1)}\]
- Parameters:
- p
float
Transformation exponent \(\lambda\).
- offset
int
Constant offset. 0 for Box-Cox type 1, otherwise any non-negative constant (Box-Cox type 2). The default is 1. boxcox_trans() sets the default to 0.
See also
References
Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 211-252. https://www.jstor.org/stable/2984418
John, J. A., & Draper, N. R. (1980). An alternative family of transformations. Applied Statistics, 190-197. http://www.jstor.org/stable/2986305
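A minimal standard-library sketch of the modulus formula, assuming an offset of 1 as in the equations above. The names `modulus` and `modulus_inverse` are hypothetical and this is not mizani's code.

```python
import math


def modulus(y: float, p: float, offset: float = 1.0) -> float:
    """Modulus transform: Box-Cox applied to |y| + offset, with the sign of y restored."""
    sign = math.copysign(1.0, y)
    if p == 0:
        return sign * math.log(abs(y) + offset)
    return sign * ((abs(y) + offset) ** p - 1) / p


def modulus_inverse(z: float, p: float, offset: float = 1.0) -> float:
    """Map a transformed value back to the original scale."""
    sign = math.copysign(1.0, z)
    if p == 0:
        return sign * (math.exp(abs(z)) - offset)
    return sign * ((abs(z) * p + 1) ** (1 / p) - offset)


values = [-8.0, -3.0, 0.0, 3.0, 8.0]
transformed = [modulus(v, p=0.5) for v in values]
recovered = [modulus_inverse(z, p=0.5) for z in transformed]
```

Because the sign is applied after the power, the transform is odd-symmetric: negative inputs mirror positive ones, and zero maps to zero.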
- class mizani.transforms.datetime_trans(tz: tzinfo | str | None = None, *, domain: DomainType = (datetime.datetime(1, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC')), datetime.datetime(9999, 12, 31, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Datetime Transformation
- Parameters:
- tz
str | ZoneInfo
Timezone information
Examples
>>> from datetime import datetime
>>> from zoneinfo import ZoneInfo
>>> from mizani.transforms import datetime_trans
>>> UTC = ZoneInfo("UTC")
>>> EST = ZoneInfo("EST")
>>> t = datetime_trans(EST)
>>> x = [datetime(2022, 1, 20, tzinfo=UTC)]
>>> x2 = t.inverse(t.transform(x))
>>> list(x) == list(x2)
True
>>> x[0].tzinfo == x2[0].tzinfo
False
>>> x[0].tzinfo.key
'UTC'
>>> x2[0].tzinfo.key
'EST'
- breaks_func: BreaksFunction¶
Callable to calculate breaks
- format_func: FormatFunction¶
Function to format breaks
- transform(x: DatetimeArrayLike) NDArrayFloat [source]¶
Transform from date to a numerical format
The transformed values have a unit of [days].
- property tzinfo¶
Alias of tz
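The idea of mapping datetimes to a number of days, and mapping back into a chosen timezone, can be sketched with the standard library. This sketch does not use mizani; the reference point (the Unix epoch) and the helper names `to_days` and `from_days` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference point for day 0; the library may use a different one.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_DAY = 86_400


def to_days(d: datetime) -> float:
    """Timezone-aware datetime -> days elapsed since EPOCH."""
    return (d - EPOCH).total_seconds() / SECONDS_PER_DAY


def from_days(days: float, tz: timezone) -> datetime:
    """Days since EPOCH -> datetime expressed in the timezone tz."""
    return (EPOCH + timedelta(days=days)).astimezone(tz)


utc = timezone.utc
est = timezone(timedelta(hours=-5), "EST")  # fixed-offset stand-in for EST

x = datetime(2022, 1, 20, tzinfo=utc)
x2 = from_days(to_days(x), est)

same_instant = x == x2                # aware datetimes compare by instant
same_offset = x.utcoffset() == x2.utcoffset()
```

As in the example above, the round trip preserves the instant in time while the result carries the timezone of the transform rather than that of the input.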
- class mizani.transforms.exp_trans(base: float = np.float64(2.718281828459045), *, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Create an exponential transform class for base
This is the inverse of the log transform.
- Parameters:
- base
float
Base of the exponential
- Returns:
- out
type
Exponential transform class
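To make the relationship with the log transform concrete, here is a small standard-library sketch of an exponential mapping and its inverse for a given base. The function names are hypothetical and the code is not taken from mizani.

```python
import math


def exp_transform(x: float, base: float = math.e) -> float:
    """Raise base to the power x."""
    return base**x


def exp_inverse(y: float, base: float = math.e) -> float:
    """Logarithm of y in the given base, which undoes exp_transform()."""
    return math.log(y) / math.log(base)


exponents = [-2.0, 0.0, 1.0, 3.0]
powers_of_two = [exp_transform(x, base=2) for x in exponents]
recovered = [exp_inverse(y, base=2) for y in powers_of_two]
```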
- class mizani.transforms.identity_trans(transform_is_linear: bool = True, *, domain: DomainType = (-inf, inf), breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Identity Transformation
Examples
The default trans returns one minor break between every pair of major breaks
>>> major = [0, 1, 2]
>>> t = identity_trans()
>>> t.minor_breaks(major)
array([0.5, 1.5])
Create a trans that returns 4 minor breaks
>>> t = identity_trans(minor_breaks_func=minor_breaks(4))
>>> t.minor_breaks(major)
array([0.2, 0.4, 0.6, 0.8, 1.2, 1.4, 1.6, 1.8])
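The arithmetic behind these minor breaks can be sketched in plain Python, assuming the minor breaks are evenly spaced between consecutive major breaks. The helper `minor_breaks_between` is hypothetical and is not part of mizani.

```python
def minor_breaks_between(major: list[float], n: int = 1) -> list[float]:
    """Place n evenly spaced minor breaks inside each interval of major breaks."""
    minor: list[float] = []
    for low, high in zip(major, major[1:]):
        step = (high - low) / (n + 1)
        minor.extend(low + step * k for k in range(1, n + 1))
    return minor


major = [0, 1, 2]
one_each = minor_breaks_between(major)         # one minor break per interval
four_each = minor_breaks_between(major, n=4)   # four minor breaks per interval
```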
- class mizani.transforms.log10_trans(base: float = 10, *, domain: DomainType = (2.2250738585072014e-308, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Log 10 Transformation
- class mizani.transforms.log1p_trans(*, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Log plus one Transformation
- class mizani.transforms.log2_trans(base: float = 2, *, domain: DomainType = (2.2250738585072014e-308, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Log 2 Transformation
- class mizani.transforms.log_trans(base: float = np.float64(2.718281828459045), *, domain: DomainType = (2.2250738585072014e-308, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Create a log transform class for base
- Parameters:
- base
float
Base for the logarithm. If None, then the natural log is used.
- Returns:
- out
type
Log transform class
- class mizani.transforms.probability_trans(distribution: str, *args, **kwargs)[source]¶
Probability Transformation
- Parameters:
- distribution
str
Name of the distribution. Valid distributions are listed at scipy.stats. Any of the continuous or discrete distributions.
- args
tuple
Arguments passed to the distribution functions.
- kwargs
dict
Keyword arguments passed to the distribution functions.
Notes
Make sure that the distribution is a good enough approximation for the data. When this is not the case, computations may run into errors. Absence of any errors does not imply that the distribution fits the data.
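A probability transformation pairs a distribution's quantile function with its cumulative distribution function. The sketch below uses `statistics.NormalDist` from the standard library instead of scipy, and it assumes the forward direction maps probabilities to quantiles; both the direction and the helper names are assumptions made for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def prob_transform(p: float) -> float:
    """Probability in (0, 1) -> quantile of the distribution."""
    return dist.inv_cdf(p)


def prob_inverse(q: float) -> float:
    """Quantile -> cumulative probability."""
    return dist.cdf(q)


probabilities = [0.025, 0.5, 0.975]
quantiles = [prob_transform(p) for p in probabilities]
recovered = [prob_inverse(q) for q in quantiles]
```

The quantile function is undefined at exactly 0 and 1 for an unbounded distribution, which is one way the computations mentioned in the note above can run into errors.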
- class mizani.transforms.reverse_trans(*, domain: DomainType = (-inf, inf), transform_is_linear: bool = True, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Reverse Transformation
- class mizani.transforms.sqrt_trans(*, domain: DomainType = (0, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Square-root Transformation
- class mizani.transforms.symlog_trans(*, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <mizani.breaks.breaks_symlog object>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Symmetric Log Transformation
The symmetric logarithmic transformation is defined as
f(x) = log(x+1) for x >= 0
       -log(-x+1) for x < 0
It can be useful for data that has a wide range of both positive and negative values (including zero).
- breaks_func: BreaksFunction = <mizani.breaks.breaks_symlog object>¶
Callable to calculate breaks
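The piecewise definition of the symmetric log given above can be written compactly with `math.log1p`. This is a standard-library sketch that assumes the natural logarithm; the function names are hypothetical and the code does not call mizani.

```python
import math


def symlog(x: float) -> float:
    """log(x + 1) for x >= 0 and -log(-x + 1) for x < 0."""
    return math.copysign(math.log1p(abs(x)), x)


def symlog_inverse(y: float) -> float:
    """Undo symlog(): exp(|y|) - 1 with the sign of y restored."""
    return math.copysign(math.expm1(abs(y)), y)


values = [-1000.0, -1.0, 0.0, 1.0, 1000.0]
transformed = [symlog(v) for v in values]
recovered = [symlog_inverse(t) for t in transformed]
```

Zero maps to zero and the function is odd, so large positive and negative values are compressed by the same amount.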
- class mizani.transforms.timedelta_trans(*, domain: DomainType = (datetime.timedelta(days=-999999999), datetime.timedelta(days=999999999, seconds=86399, microseconds=999999)), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Timedelta Transformation
- breaks_func: BreaksFunction¶
Callable to calculate breaks
- format_func: FormatFunction¶
Function to format breaks
- transform(x: TimedeltaArrayLike) NDArrayFloat [source]¶
Transform from Timedelta to numerical format
The transform values have a unit of [days]
- class mizani.transforms.pd_timedelta_trans(*, domain: DomainType = (pd.Timedelta.min, pd.Timedelta.max), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Pandas timedelta Transformation
- class mizani.transforms.pseudo_log_trans(sigma: float = 1, base: float = np.float64(2.718281828459045), *, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Pseudo-log transformation
A transformation mapping numbers to a signed logarithmic scale with a smooth transition to linear scale around 0.
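One common way to get a signed logarithmic scale with a smooth linear region around 0 is the inverse hyperbolic sine. The sketch below uses the formula from the pseudo-log transform in R's scales package; that this is the formula used here is an assumption, and the function names are hypothetical.

```python
import math


def pseudo_log(x: float, sigma: float = 1.0, base: float = math.e) -> float:
    """Signed log-like mapping that is approximately linear near 0."""
    return math.asinh(x / (2 * sigma)) / math.log(base)


def pseudo_log_inverse(y: float, sigma: float = 1.0, base: float = math.e) -> float:
    """Undo pseudo_log()."""
    return 2 * sigma * math.sinh(y * math.log(base))


small = pseudo_log(1e-6)    # close to 1e-6 / 2: the linear regime
large = pseudo_log(1e6)     # close to log(1e6): the logarithmic regime
negative = pseudo_log(-1e6) # mirror image of the positive case
```

Under this formula, sigma controls the width of the linear region: values much smaller than sigma in magnitude are scaled linearly, and values much larger behave like a logarithm.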
- Parameters:
- class mizani.transforms.reciprocal_trans(*, domain: DomainType = (-inf, inf), transform_is_linear: bool = False, breaks_func: BreaksFunction = <factory>, format_func: FormatFunction = <factory>, minor_breaks_func: MinorBreaksFunction | None = None)[source]¶
Reciprocal Transformation
- class mizani.transforms.trans(*, domain: 'DomainType' = (-inf, inf), transform_is_linear: 'bool' = False, breaks_func: 'BreaksFunction' = <factory>, format_func: 'FormatFunction' = <factory>, minor_breaks_func: 'MinorBreaksFunction | None' = None)[source]¶
- transform_is_linear: bool = False¶
Whether the transformation over the whole domain is linear. e.g. 2x is linear while 1/x and log(x) are not.
- breaks_func: BreaksFunction¶
Callable to calculate breaks
- format_func: FormatFunction¶
Function to format breaks
- property domain_is_numerical: bool¶
Return True if transformation acts on numerical data. e.g. int, float, and imag are numerical but datetime is not.
- minor_breaks(major: FloatArrayLike, limits: tuple[float, float] | None = None, n: int | None = None) NDArrayFloat [source]¶
Calculate minor_breaks
- breaks(limits: DomainType) NDArrayFloat [source]¶
Calculate breaks in data space and return them in transformed space.
Expects limits to be in transform space, which is the same space in which the domain is specified.
This method wraps around breaks_() to ensure that the calculated breaks are within the domain of the transform. This is helpful in cases where an aesthetic requests breaks with limits expanded for some padding, yet the expansion goes beyond the domain of the transform. e.g. for a probability transform the breaks will be in the domain [0, 1] despite any outward limits.
- Parameters:
- limits
tuple
The scale limits. Size 2.
- Returns:
- out
array_like
Major breaks
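One way to keep breaks inside a domain when the limits have been padded is to clip the limits before computing the breaks. The sketch below illustrates that idea in plain Python; the helpers are hypothetical, the evenly spaced breaks stand in for whatever breaks function is configured, and this is not necessarily how the method is implemented.

```python
def clip_limits(
    limits: tuple[float, float], domain: tuple[float, float]
) -> tuple[float, float]:
    """Restrict limits to the part that lies inside the domain."""
    low, high = limits
    return (max(low, domain[0]), min(high, domain[1]))


def breaks_within_domain(
    limits: tuple[float, float], domain: tuple[float, float], n: int = 5
) -> list[float]:
    """Evenly spaced breaks computed from limits clipped to the domain."""
    low, high = clip_limits(limits, domain)
    step = (high - low) / (n - 1)
    return [low + step * i for i in range(n)]


# Limits padded beyond the [0, 1] domain of a probability scale.
padded_limits = (-0.05, 1.05)
breaks = breaks_within_domain(padded_limits, domain=(0.0, 1.0))
```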
- format(x: Any) Sequence[str] [source]¶
Format breaks
When subclassing, you can override this function, or you can just define format_func.
- diff_type_to_num(x: Any) FloatArrayLike [source]¶
Convert the difference between two points in the domain to a numeric value
This function is necessary for some arithmetic operations in the transform space of a domain when the difference between any two points in that domain is not numeric.
For example, for a domain of datetime value types, the difference on the domain is of type timedelta. In this case this function should expect timedeltas and convert them to float values that are compatible with (have the same units as) the transformed values of datetimes.
- Parameters:
- x
Differences
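The datetime case described above can be sketched with the standard library: if datetimes are transformed to days, a timedelta must also be converted to days so that the two can be combined arithmetically. The epoch and the helper names below are assumptions for illustration and are not mizani's code.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed reference point


def datetime_to_days(d: datetime) -> float:
    """Transformed value of a datetime: days since EPOCH."""
    return (d - EPOCH).total_seconds() / SECONDS_PER_DAY


def timedelta_to_days(delta: timedelta) -> float:
    """Numeric value of a difference, in the same unit as datetime_to_days()."""
    return delta.total_seconds() / SECONDS_PER_DAY


start = datetime(2022, 1, 20, tzinfo=timezone.utc)
end = datetime(2022, 1, 21, 12, tzinfo=timezone.utc)

# The difference in the domain is a timedelta, not a number.
width = timedelta_to_days(end - start)
# The same width, computed from the transformed values.
width_in_transform_space = datetime_to_days(end) - datetime_to_days(start)
```

Because both helpers use days as the unit, the converted difference agrees with the difference of the transformed endpoints.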