pachyderm package #

Returns:

Binned chi squared calculated for each x value.

pachyderm.fit.cost_function.unravel_simultaneous_fits(functions)#

Unravel the cost functions from possible simultaneous fit objects.

The functions are unraveled by recursively retrieving the functions from existing SimultaneousFit objects that may be in the list of passed functions. The cost functions store their fit data, so they are fully self-contained. Consequently, we can fully unravel the functions without worrying about the intermediate SimultaneousFit objects.

Parameters:: functions (Iterable[CostFunctionBase | SimultaneousFit]) – Functions to unravel.
Return type:: Iterator[CostFunctionBase]
Returns:: Iterator of the base cost functions.
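The recursive unraveling described above can be sketched as follows. This is a minimal illustration, not the pachyderm implementation: the CostFunction and SimultaneousFit classes below are hypothetical stand-ins, and the cost_functions attribute name is an assumption.

```python
from typing import Iterable, Iterator, Union


class CostFunction:
    """Hypothetical stand-in for a base cost function."""

    def __init__(self, name: str) -> None:
        self.name = name


class SimultaneousFit:
    """Hypothetical stand-in for a container of cost functions."""

    def __init__(self, *cost_functions: "Union[CostFunction, SimultaneousFit]") -> None:
        # Assumed attribute name; the real class may store these differently.
        self.cost_functions = list(cost_functions)


def unravel(functions: Iterable["Union[CostFunction, SimultaneousFit]"]) -> Iterator[CostFunction]:
    """Yield base cost functions, descending into nested containers."""
    for f in functions:
        if isinstance(f, SimultaneousFit):
            # A simultaneous fit may itself hold simultaneous fits, so recurse.
            yield from unravel(f.cost_functions)
        else:
            yield f


nested = [
    CostFunction("a"),
    SimultaneousFit(CostFunction("b"), SimultaneousFit(CostFunction("c"))),
]
names = [f.name for f in unravel(nested)]
```

Because each cost function carries its own data, the flattened order is all that needs to be preserved.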

pachyderm.fit.function module#

Functions for use with fitting.

class pachyderm.fit.function.AddPDF(*functions, prefixes=None, skip_prefixes=None)#

Bases: CombinePDF

Add functions (PDFs) together.

Parameters:

functions (Callable[..., float]) – Functions to be added.
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.

functions#: List of functions that are combined in the PDF.

func_code#: Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.

argument_positions#: Map of merged arguments to the arguments for each individual function.

class pachyderm.fit.function.CombinePDF(*functions, prefixes=None, skip_prefixes=None)#

Bases: EqualityMixin, ABC

Combine functions (PDFs) together.

Parameters:

functions (Callable[..., float]) – Functions to be combined.
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.

functions#: List of functions that are combined in the PDF.

func_code#: Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.

argument_positions#: Map of merged arguments to the arguments for each individual function.

class pachyderm.fit.function.DividePDF(*functions, prefixes=None, skip_prefixes=None)#

Bases: CombinePDF

Divide functions (PDFs) by one another.

Parameters:

functions (Callable[..., float]) – Functions to be divided.
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.

functions#: List of functions that are combined in the PDF.

func_code#: Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.

argument_positions#: Map of merged arguments to the arguments for each individual function.

class pachyderm.fit.function.MultiplyPDF(*functions, prefixes=None, skip_prefixes=None)#

Bases: CombinePDF

Multiply functions (PDFs) together.

Parameters:

functions (Callable[..., float]) – Functions to be multiplied.
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.

functions#: List of functions that are combined in the PDF.

func_code#: Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.

argument_positions#: Map of merged arguments to the arguments for each individual function.

class pachyderm.fit.function.SubtractPDF(*functions, prefixes=None, skip_prefixes=None)#

Bases: CombinePDF

Subtract functions (PDFs) from one another.

Parameters:

functions (Callable[..., float]) – Functions to be subtracted.
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.

functions#: List of functions that are combined in the PDF.

func_code#: Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.

argument_positions#: Map of merged arguments to the arguments for each individual function.

pachyderm.fit.function.extended_gaussian(x, mean, sigma, amplitude)#

Extended gaussian.

\[f = \frac{A}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x - \mu)^{2}}{2 \sigma^{2}}\right)\]

Parameters:

x (ndarray[Any, dtype[float64]] | float) – Value(s) where the gaussian should be evaluated.
mean (float) – Mean of the gaussian distribution.
sigma (float) – Width of the gaussian distribution.
amplitude (float) – Amplitude of the gaussian.

Return type:

ndarray[Any, dtype[float64]] | float

Returns:

Calculated gaussian value(s).

pachyderm.fit.function.gaussian(x, mean, sigma)#

Normalized gaussian.

\[f = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x - \mu)^{2}}{2 \sigma^{2}}\right)\]

Parameters:

x (ndarray[Any, dtype[float64]] | float) – Value(s) where the gaussian should be evaluated.
mean (float) – Mean of the gaussian distribution.
sigma (float) – Width of the gaussian distribution.

Return type:

ndarray[Any, dtype[float64]] | float

Returns:

Calculated gaussian value(s).
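As a rough check of the two formulas, here is a scalar-only sketch using the standard library. The function names mirror the documented ones, but this is an illustration rather than the library code, which also accepts numpy arrays.

```python
import math


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Normalized gaussian evaluated at a single point."""
    norm = 1.0 / math.sqrt(2 * math.pi * sigma**2)
    return norm * math.exp(-((x - mean) ** 2) / (2 * sigma**2))


def extended_gaussian(x: float, mean: float, sigma: float, amplitude: float) -> float:
    """Gaussian scaled by an amplitude, so its integral equals the amplitude."""
    return amplitude * gaussian(x, mean, sigma)


peak = gaussian(0.0, 0.0, 1.0)

# Riemann sum over [-10, 10]; for a normalized gaussian this should be close to 1.
step = 0.01
total = sum(gaussian(-10.0 + i * step, 0.0, 1.0) * step for i in range(2001))
```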

pachyderm.generic_class module#

Contains generic classes.

class pachyderm.generic_class.EqualityMixin#

Bases: object

Mixin generic comparison operations using __dict__.

Can then be mixed into any other class using multiple inheritance.

Inspired by: https://stackoverflow.com/a/390511.
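A minimal sketch of a __dict__-based equality mixin, in the spirit of the linked answer. The exact methods defined by pachyderm's EqualityMixin are not shown in this reference, so the body below is an assumption, and Point is a hypothetical example class.

```python
class EqualityMixin:
    """Compare instances of the same class by their attribute dictionaries."""

    def __eq__(self, other: object) -> bool:
        if type(other) is type(self):
            return self.__dict__ == other.__dict__
        # Let Python fall back to its default handling for unrelated types.
        return NotImplemented


class Point(EqualityMixin):
    """Hypothetical class that gains equality through the mixin."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


same = Point(1, 2) == Point(1, 2)
different = Point(1, 2) == Point(1, 3)
```

Note that defining __eq__ without __hash__ makes instances unhashable, which is usually acceptable for mutable configuration-style objects.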

pachyderm.generic_config module#

Analysis configuration base module.

For usage information, see jet_hadron.base.analysis_config.

pachyderm.generic_config.apply_formatting_dict(obj, formatting)#

Recursively apply a formatting dict to all strings in a configuration.

Note that it skips applying the formatting if the string appears to contain latex (specifically, if it contains an “$”), since the formatting fails on nested brackets.

Parameters:

obj (Any) – Some configuration object to recursively apply the formatting to.
formatting (dict) – String formatting options to apply to each configuration field.

Returns:

Configuration with formatting applied to every field.

Return type:

dict
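The recursion and the LaTeX skip rule can be illustrated with plain dicts and lists. This is a simplified sketch under the assumption that only str, dict, and list values need handling; the real function operates on ruamel.yaml objects and may cover more types.

```python
from typing import Any, Dict


def apply_formatting(obj: Any, formatting: Dict[str, Any]) -> Any:
    """Recursively format every string found in a nested configuration."""
    if isinstance(obj, str):
        # Strings containing "$" are treated as LaTeX and left untouched,
        # since nested braces such as "$p_{T}$" would break str.format.
        if "$" in obj:
            return obj
        return obj.format(**formatting)
    if isinstance(obj, dict):
        return {key: apply_formatting(value, formatting) for key, value in obj.items()}
    if isinstance(obj, list):
        return [apply_formatting(value, formatting) for value in obj]
    # Numbers, booleans, None, etc. pass through unchanged.
    return obj


config = {"name": "hist_{label}", "title": "$p_{T}$", "bins": [1, "{label}"]}
formatted = apply_formatting(config, {"label": "a"})
```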

pachyderm.generic_config.create_key_index_object(key_index_name, iterables)#

Create a KeyIndex class based on the passed attributes.

This is wrapped into a helper function to allow for the __iter__ to be specified for the object. Further, this allows it to be called outside the package when it is needed in analysis tasks.

Parameters:

key_index_name (str) – Name of the iterable key index.
iterables (Mapping[str, Any]) – Iterables which will be specified by this KeyIndex. The keys should be the names of the values, while the values should be the iterables themselves.

Return type:

Returns:

A KeyIndex class which can be used to specify an object. The keys and values will be iterable.

Raises:

TypeError – If one of the iterables which is passed is an iterator that can be exhausted. The iterables must all be passed within containers which can recreate the iterator each time it is called to iterate.
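One way to build such a class is with dataclasses.make_dataclass, attaching __iter__ through the namespace argument. The sketch below is an assumption about the approach, not the pachyderm source; make_key_index is a hypothetical name. It also shows a simple way to reject exhaustible iterators.

```python
import dataclasses
from typing import Any, Iterator, Mapping, Tuple


def make_key_index(name: str, iterables: Mapping[str, Any]) -> type:
    """Create a frozen dataclass with one field per iterable name."""
    for key, values in iterables.items():
        # An iterator returns itself from iter(); a container returns a new iterator.
        if iter(values) is values:
            raise TypeError(f"Iterable '{key}' is an exhaustible iterator; pass a container.")

    def _iter(self: Any) -> Iterator[Tuple[str, Any]]:
        # Yield (field name, value) pairs so the object can be turned into a dict.
        for field in dataclasses.fields(self):
            yield field.name, getattr(self, field.name)

    return dataclasses.make_dataclass(
        name,
        [(key, Any) for key in iterables],
        frozen=True,  # frozen + eq makes instances hashable, so they can be dict keys
        namespace={"__iter__": _iter},
    )


KeyIndex = make_key_index("KeyIndex", {"a": ["a1", "a2"], "b": ["b1", "b2"]})
index = KeyIndex(a="a1", b="b2")
```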

pachyderm.generic_config.create_objects_from_iterables(obj, args, iterables, formatting_options, key_index_name='KeyIndex')#

Create objects for each set of values based on the given arguments.

The iterable values are available under a key index dataclass which is used to index the returned dictionary. The names of the fields are determined by the keys of the iterables dictionary. The values are the newly created objects. Note that the iterable values must be convertible to a str() so they can be included in the formatting dictionary.

Each set of values is also included in the object args.

As a basic example,

```pycon
>>> create_objects_from_iterables(
...     obj=obj,
...     args={},
...     iterables={"a": ["a1", "a2"], "b": ["b1", "b2"]},
...     formatting_options={},
... )
(
    KeyIndex,
    {"a": ["a1", "a2"], "b": ["b1", "b2"]},
    {
        KeyIndex(a = "a1", b = "b1"): obj(a = "a1", b = "b1"),
        KeyIndex(a = "a1", b = "b2"): obj(a = "a1", b = "b2"),
        KeyIndex(a = "a2", b = "b1"): obj(a = "a2", b = "b1"),
        KeyIndex(a = "a2", b = "b2"): obj(a = "a2", b = "b2"),
    }
)
```

Parameters:

obj – The object to be constructed.
args – Arguments to be passed to the object to create it.
iterables (Mapping[str, Any]) – Iterables to be used to create the objects, with entries of the form "name_of_iterable": iterable.
formatting_options – Values to be used in formatting strings in the arguments.
key_index_name – Name of the iterable key index.

Returns:

Roughly, (KeyIndex, iterables, objects). Specifically, KeyIndex is a new dataclass which defines the parameters used to create the objects; iterables are the iterables used to create the objects, with names as keys and the iterables as values; and objects is a dictionary whose keys are KeyIndex objects describing the iterable arguments passed to each object, while the values are the newly constructed objects. See the example above.

Return type:

(object, list, dict, dict)
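The core of this function is a Cartesian product over the iterables. The sketch below shows only that step, under simplifying assumptions: a namedtuple stands in for the KeyIndex dataclass, a plain dict stands in for the constructed object, and the formatting of arguments is omitted.

```python
import itertools
from collections import namedtuple

iterables = {"a": ["a1", "a2"], "b": ["b1", "b2"]}

# Stand-in for the generated key index class (the library uses a dataclass).
KeyIndex = namedtuple("KeyIndex", list(iterables))

objects = {}
for values in itertools.product(*iterables.values()):
    key = KeyIndex(*values)
    # Stand-in for obj(**args, a=..., b=...): just record the arguments used.
    objects[key] = dict(key._asdict())
```

The resulting dictionary has one entry per combination, in the same order as the example above.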

pachyderm.generic_config.determine_override_options(selected_options, override_opts, set_of_possible_options=None)#

Recursively extract the dict described in override_options().

In particular, this searches for selected options in the override_opts dict. It stores only the override options that are selected.

Parameters:

selected_options (tuple[Any, ...]) – The options selected for this analysis, in the order used with override_options() and in the configuration file.
override_opts (CommentedMap) – dict-like object returned by ruamel.yaml which contains the options that should be used to override the configuration options.
set_of_possible_options (tuple of enums) – Possible options for the override value categories.

Return type:

pachyderm.generic_config.determine_selection_of_iterable_values_from_config(config, possible_iterables)#

Determine iterable values to use to create objects for a given configuration.

All values of an iterable can be included by setting the value to True (not as a single-value list, but as the only value). Alternatively, an iterator can be disabled by setting the value to False.

Parameters:

config (CommentedMap) – The dict-like configuration from ruamel.yaml which should be overridden.
possible_iterables (Mapping[str, type[Enum]]) – Key value pairs of names of enumerations and their values.

Returns:

Iterables values that were requested in the config.

Return type:

dict

class pachyderm.generic_config.formatting_dict#

Bases: dict[Any, Any]

Dict to handle missing keys when formatting a string.

It returns the missing key for later use in formatting. See: https://stackoverflow.com/a/17215533
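The technique from the linked answer relies on dict.__missing__ together with str.format_map. A minimal sketch follows; MissingKeyDict is a hypothetical name, and the library class may differ in detail.

```python
class MissingKeyDict(dict):
    """Return the placeholder itself when a formatting key is absent."""

    def __missing__(self, key: str) -> str:
        # Re-emit "{key}" so the string can be formatted again later.
        return "{" + key + "}"


# format_map passes the mapping through directly, so __missing__ is honored.
# (str.format(**mapping) would copy it into a plain dict and raise KeyError.)
partial = "{a}_{b}".format_map(MissingKeyDict(a="x"))
full = partial.format(b="y")
```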

pachyderm.generic_config.iterate_with_selected_objects(analysis_objects, **selections)#

Iterate over an analysis dictionary with selected attributes.

Parameters:

analysis_objects (Mapping[TypeVar(_T_Key), TypeVar(_T_Analysis)]) – Analysis objects dictionary.
selections (Any) – Keyword arguments used to select attributes from the analysis dictionary.

Yields:

object – Matching analysis object.

Return type:

Iterator[tuple[TypeVar(_T_Key), TypeVar(_T_Analysis)]]
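The selection can be pictured as a filter on key index attributes. The following sketch assumes that a selection matches when every keyword equals the corresponding attribute of the key; KeyIndex here is a namedtuple stand-in and iterate_selected is a hypothetical name.

```python
from collections import namedtuple
from typing import Any, Iterator, Mapping, Tuple

KeyIndex = namedtuple("KeyIndex", ["a", "b"])  # stand-in for the key index dataclass


def iterate_selected(
    analysis_objects: Mapping[Any, Any], **selections: Any
) -> Iterator[Tuple[Any, Any]]:
    """Yield (key_index, object) pairs whose key matches every selection."""
    for key_index, obj in analysis_objects.items():
        if all(getattr(key_index, name) == value for name, value in selections.items()):
            yield key_index, obj


analysis_objects = {
    KeyIndex(a, b): f"obj_{a}_{b}" for a in ["a1", "a2"] for b in ["b1", "b2"]
}
selected = list(iterate_selected(analysis_objects, a="a1"))
```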

pachyderm.generic_config.iterate_with_selected_objects_in_order(analysis_objects, analysis_iterables, selection)#

Iterate over an analysis dictionary, yielding the selected attributes in order.

So if there are three iterables, a, b, and c, if we selected c, then we iterate over a and b, and return c in the same order each time for each set of values of a and b. As an example, consider the set of iterables:

```python
a = ["a1", "a2"]
b = ["b1", "b2"]
c = ["c1", "c2"]
```

then it will effectively return:

```pycon
>>> for a_val in a:
...     for b_val in b:
...         for c_val in c:
...             obj(a_val, b_val, c_val)
```

This will yield:

```pycon
>>> output = list(iterate_with_selected_objects_in_order(..., selection = ["a"]))
[[("a1", "b1", "c1"), ("a2", "b1", "c1")], [("a1", "b2", "c1"), ("a2", "b2", "c1")], ...]
```

This is particularly nice because we can then select on a set of iterables to be returned without having to specify the rest of the iterables that we don’t really care about.

Parameters:

analysis_objects (Mapping[TypeVar(_T_Key), TypeVar(_T_Analysis)]) – Analysis objects dictionary.
analysis_iterables (Mapping[str, Sequence[Any]]) – Iterables used in constructing the analysis objects.
selection (str | Sequence[str]) – Selection of analysis selections to return. Can be either a string or a sequence of selections.

Yields:

object – Matching analysis object.

Return type:

Iterator[list[tuple[TypeVar(_T_Key), TypeVar(_T_Analysis)]]]

pachyderm.generic_config.load_configuration(yaml, filename)#

Load an analysis configuration from a file.

Parameters:

yaml (YAML) – YAML object to use in loading the configuration.
filename (str | Path) – Filename of the YAML configuration file.

Return type:

CommentedMap

Returns:

dict-like object containing the loaded configuration

pachyderm.generic_config.override_options(config, selected_options, set_of_possible_options, config_containing_override=None)#

Determine override options for a particular configuration.

The options are determined by searching in the order specified in selected_options.

For the example config,

```yaml
config:
    value: 3
    override:
        2.76:
            track:
                value: 5
```

value will be assigned the value 5 if we are at 2.76 TeV with a track bias, regardless of the event activity or leading hadron bias. The order of this configuration is specified by the order of the selected_options passed. The above example configuration is from the jet-hadron analysis.

Since anchors aren’t kept for scalar values, if you want to override an anchored value, you need to specify it as a single value in a list (or dict, but list is easier). After the anchor values propagate, single element lists can be converted into scalar values using simplify_data_representations().

Parameters:

config (CommentedMap) – The dict-like configuration from ruamel.yaml which should be overridden.
selected_options (tuple[Any, ...]) – The selected analysis options. They will be checked in the order with which they are passed, so make certain that it matches the order in the configuration file!
set_of_possible_options (tuple of enums) – Possible options for the override value categories.
config_containing_override (CommentedMap | None) – The dict-like config containing the override options in a map called “override”. If it is not specified, it will look for it in the main config.

Returns:

The updated configuration

Return type:

dict-like object
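The search through the override block can be sketched as a recursive walk. This is a simplified illustration under stated assumptions: options are plain strings rather than enums, a key counts as an option if it appears in a flat set of possible options, and determine_overrides is a hypothetical name. The library implementation works on ruamel.yaml maps and enum values.

```python
from typing import Any, Dict, Set, Tuple


def determine_overrides(
    selected: Tuple[str, ...], override_opts: Dict[str, Any], possible: Set[str]
) -> Dict[str, Any]:
    """Collect the override values that apply to the selected options."""
    result: Dict[str, Any] = {}
    for key, value in override_opts.items():
        if key in possible:
            # The key names an option: descend only if that option was selected.
            if key in selected:
                result.update(determine_overrides(selected, value, possible))
        else:
            # The key names a configuration field: store it as an override.
            result[key] = value
    return result


override = {
    "2.76": {"track": {"value": 5}, "cluster": {"value": 7}},
    "5.02": {"value": 9},
}
possible = {"2.76", "5.02", "track", "cluster"}
overrides = determine_overrides(("2.76", "track"), override, possible)
```

With the example configuration above, only the value nested under 2.76 and track is picked up.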

pachyderm.generic_config.simplify_data_representations(config)#

Convert single-entry lists to their scalar value.

This step is necessary because anchors are not kept for scalar values - just for lists and dictionaries. Now that we are done with all of our anchor references, we can convert these single entry lists to just the scalar entry, which is more usable.

Some notes on anchors in ruamel.yaml are here: https://stackoverflow.com/a/48559644

Parameters:: config (CommentedMap) – The dict-like configuration from ruamel.yaml which should be simplified.
Return type:: CommentedMap
Returns:: The updated configuration.
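The simplification step can be sketched on plain dicts. This is a minimal illustration that assumes only nested dicts and lists need handling; simplify is a hypothetical name and the library function may treat additional cases.

```python
from typing import Any, Dict


def simplify(config: Dict[str, Any]) -> Dict[str, Any]:
    """Replace single-entry lists by their only element, in place."""
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections of the configuration.
            simplify(value)
        elif isinstance(value, list) and len(value) == 1:
            # Reassigning an existing key is safe while iterating over items().
            config[key] = value[0]
    return config


simplified = simplify({"a": [1], "b": [1, 2], "c": {"d": ["x"]}})
```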

pachyderm.histogram module#

Histogram related classes and functionality.

class pachyderm.histogram.Histogram1D(bin_edges, y, errors_squared, metadata=<factory>)#

Bases: object

Contains histogram data.

Note

Underflow and overflow bins are excluded!

When converting from a TH1 (either from ROOT or uproot), additional statistical information will be extracted from the hist to enable the calculation of additional properties. The information available is:

Total sum of weights (equal to np.sum(self.y), which we store)
Total sum of weights squared (equal to np.sum(self.errors_squared), which we store)
Total sum of weights * x
Total sum of weights * x * x

Each is a single float value. Since the latter two values are unique, they are stored in the metadata.

Parameters:

bin_edges (np.ndarray) – The histogram bin edges.
y (np.ndarray) – The histogram bin values.
errors_squared (np.ndarray) – The bin sum weight squared errors.

x#

The bin centers.

Type:: np.ndarray

y#

The bin values.

Type:: np.ndarray

bin_edges#

The bin edges.

Type:: np.ndarray

errors#

The bin errors.

Type:: np.ndarray

errors_squared#

The bin sum weight squared errors.

Type:: np.ndarray

metadata#

Any additional metadata that should be stored with the histogram. Keys are expected to be strings, while the values can be anything. For example, could contain systematic errors, etc.

Type:: dict

Parameters:: metadata (dict[str, Any]) –

bin_edges: ndarray[Any, dtype[Any]]#

property bin_widths: ndarray[Any, dtype[Any]]#

Bin widths calculated from the bin edges.

Returns:: Array of the bin widths.

copy()#

Copies the object.

In principle, this should be the same as copy.deepcopy(...), at least when this was written in Feb 2019. But deepcopy(...) often seems to have very bad performance (and perhaps does additional implicit copying), so we copy these numpy arrays by hand.

Parameters:: self (TypeVar(_T, bound= Histogram1D)) –
Return type:: TypeVar(_T, bound= Histogram1D)

counts_in_interval(min_value=None, max_value=None, min_bin=None, max_bin=None)#

Count the number of counts within bins in an interval.

Note

The integration limits could be described as inclusive. This matches the ROOT convention. See histogram1D._integral(...) for further details on how these limits are determined.

Note

The arguments can be mixed (ie. a min bin and a max value), so be careful!

Parameters:

min_value (float | None) – Minimum value for the integral (we will find the bin which contains this value).
max_value (float | None) – Maximum value for the integral (we will find the bin which contains this value).
min_bin (int | None) – Minimum bin for the integral.
max_bin (int | None) – Maximum bin for the integral.

Returns:

Integral value, error

Return type:

(value, error)

property errors: ndarray[Any, dtype[Any]]#

errors_squared: ndarray[Any, dtype[Any]]#

find_bin(value)#

Find the bin corresponding to the specified value.

For further information, see find_bin(...) in this module.

Note

Bins are 0-indexed here, while in ROOT they are 1-indexed.

Parameters:: value (float) – Value for which we want the corresponding bin.
Return type:: int
Returns:: Bin corresponding to the value.

classmethod from_existing_hist(hist)#

Convert an existing histogram.

Note

Underflow and overflow bins are excluded! Bins are assumed to be fixed size.

Parameters:: hist (uproot.rootio.TH1* or ROOT.TH1) – Histogram to be converted.
Returns:: Dataclass with x, y, and errors
Return type:: Histogram

classmethod from_hepdata(hist, extraction_function=<function _extract_values_from_hepdata_dependent_variable>)#

Convert (a set of) HEPdata histogram(s) to a Histogram1D.

Will include any information that the extraction function extracts and returns.

Note

It only uses the first independent variable to determine the x axis.

Parameters:

hist (Mapping[str, Any]) – HEPdata input histogram(s).
extraction_function (Callable[[Mapping[str, Any]], tuple[list[float] | ndarray[Any, dtype[Any]], list[float] | ndarray[Any, dtype[Any]], dict[str, Any]]]) – Extract values from HEPdata dict to be used to construct a histogram. Default: Retrieves y values and symmetric statistical errors. Symmetric systematic errors are stored in the metadata.

Return type:

list[TypeVar(_T, bound= Histogram1D)]

Returns:

List of Histogram1D constructed from the input HEPdata.

integral(min_value=None, max_value=None, min_bin=None, max_bin=None)#

Integrate the histogram over the given range.

Note

Be very careful here! The equivalent of TH1::Integral(...) is counts_in_interval(...). That’s because when we multiply by the bin width, we implicitly should be resetting the stats. We will still get the right answer in terms of y and errors_squared, but if this result is used to normalize the hist, the stats will be wrong. We can’t just reset them here because the integral doesn’t modify the histogram.

Note

The integration limits could be described as inclusive. This matches the ROOT convention. See histogram1D._integral(...) for further details on how these limits are determined.

Note

The arguments can be mixed (ie. a min bin and a max value), so be careful!

Parameters:

min_value (float | None) – Minimum value for the integral (we will find the bin which contains this value).
max_value (float | None) – Maximum value for the integral (we will find the bin which contains this value).
min_bin (int | None) – Minimum bin for the integral.
max_bin (int | None) – Maximum bin for the integral.

Returns:

Integral value, error

Return type:

(value, error)
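The difference between counts_in_interval(...) and integral(...) comes down to whether each bin is weighted by its width. The sketch below works on plain lists with explicit bin indices and inclusive limits. The error treatment (summing errors squared, scaled by the width squared for the integral) is standard uncorrelated error propagation and is an assumption about the library's behavior.

```python
import math
from typing import Sequence, Tuple


def counts_in_interval(
    y: Sequence[float], errors_squared: Sequence[float], min_bin: int, max_bin: int
) -> Tuple[float, float]:
    """Sum bin contents over [min_bin, max_bin], both ends inclusive."""
    selected = slice(min_bin, max_bin + 1)
    return sum(y[selected]), math.sqrt(sum(errors_squared[selected]))


def integral(
    bin_edges: Sequence[float],
    y: Sequence[float],
    errors_squared: Sequence[float],
    min_bin: int,
    max_bin: int,
) -> Tuple[float, float]:
    """Sum bin contents multiplied by bin width over the same inclusive range."""
    widths = [high - low for low, high in zip(bin_edges[:-1], bin_edges[1:])]
    selected = slice(min_bin, max_bin + 1)
    value = sum(v * w for v, w in zip(y[selected], widths[selected]))
    error_squared = sum(e * w**2 for e, w in zip(errors_squared[selected], widths[selected]))
    return value, math.sqrt(error_squared)


bin_edges = [0.0, 1.0, 3.0]
y = [2.0, 4.0]
errors_squared = [2.0, 4.0]
counts, counts_error = counts_in_interval(y, errors_squared, 0, 1)
area, area_error = integral(bin_edges, y, errors_squared, 0, 1)
```

For variable-width bins, as in this example, the two results differ, which is why the choice matters when normalizing.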

property mean: float#

Mean of values filled into the histogram.

Calculated in the same way as ROOT and physt.

Parameters:: None.
Returns:: Mean of the histogram.

metadata: dict[str, Any]#

property n_entries: float#: The number of entries in the hist.

Note

This value is dependent on the weight. We don’t have a weight independent measure like a ROOT hist, so this value won’t agree with the number of entries from a weighted ROOT hist.

property std_dev: float#

Standard deviation of the values filled into the histogram.

Calculated in the same way as ROOT and physt.

Parameters:: None.
Returns:: Standard deviation of the histogram.

property variance: float#

Variance of the values filled into the histogram.

Calculated in the same way as ROOT and physt.

Parameters:: None.
Returns:: Variance of the histogram.

property x: ndarray[Any, dtype[Any]]#

The histogram bin centers (x).

This property caches the x value so we don’t have to calculate it every time.

Parameters:: None.
Returns:: Array of center of bins.

y: ndarray[Any, dtype[Any]]#

class pachyderm.histogram.RootOpen(filename, mode='read')#

Bases: AbstractContextManager[_T_ContextManager]

Very simple helper to open root files.

Parameters:

filename (Path | str) –
mode (str) –

pachyderm.histogram.binned_mean(stats)#

Mean of values stored in the histogram.

Calculated in the same way as ROOT and physt.

Parameters:: stats (dict[str, float]) – The histogram statistical properties.
Return type:: float
Returns:: Mean of the histogram.

pachyderm.histogram.binned_standard_deviation(stats)#

Standard deviation of the values filled into the histogram.

Calculated in the same way as ROOT and physt.

Parameters:: stats (dict[str, float]) – The histogram statistical properties.
Return type:: float
Returns:: Standard deviation of the histogram.

pachyderm.histogram.binned_variance(stats)#

Variance of the values filled into the histogram.

Calculated in the same way as ROOT and physt.

Parameters:: stats (dict[str, float]) – The histogram statistical properties.
Return type:: float
Returns:: Variance of the histogram.

pachyderm.histogram.calculate_binned_stats(bin_edges, y, weights_squared)#

Calculate the stats needed to fully determine histogram properties.

The values are calculated the same way as in ROOT.TH1.GetStats(...). Recalculating the statistics is not ideal because information is lost compared to the information available when filling the histogram. In particular, the actual passed x value is used to calculate these values when filling, but we can only approximate this with the bin center when calculating these values later. Calculating them here is equivalent to calling hist.ResetStats() before retrieving the stats.

These results are accessible from the ROOT hist using ctypes via:

>>> stats = np.array([0, 0, 0, 0], dtype = np.float64)
>>> hist.GetStats(np.ctypeslib.as_ctypes(stats))

Note

sum_w and sum_w2 calculated here are _not_ equal to the ROOT values if the histogram has been scaled. This is because the weights don’t change even if the histogram has been scaled. If the hist stats are reset, it loses this piece of information and has to reconstruct the stats from the current frequencies, such that it will then agree with this function.

Parameters:

bin_edges (ndarray[Any, dtype[Any]]) – Histogram bin edges.
y (ndarray[Any, dtype[Any]]) – Histogram bin frequencies.
weights_squared (ndarray[Any, dtype[Any]]) – Filling weights squared (ie. this is the Sumw2 array).

Return type:

dict[str, float]

Returns:

Stats dict containing the newly calculated statistics.
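The four sums, and the mean and variance derived from them, can be sketched with plain lists. The dictionary keys below are assumed names chosen for illustration, not necessarily those used by the library. The formulas follow the ROOT convention of mean = sum(w x) / sum(w) and variance = sum(w x^2) / sum(w) - mean^2, with bin centers standing in for x.

```python
import math
from typing import Dict, Sequence


def binned_stats(
    bin_edges: Sequence[float], y: Sequence[float], weights_squared: Sequence[float]
) -> Dict[str, float]:
    """Approximate the fill-time statistics using bin centers for x."""
    centers = [(low + high) / 2 for low, high in zip(bin_edges[:-1], bin_edges[1:])]
    return {
        "sum_w": sum(y),
        "sum_w2": sum(weights_squared),
        "sum_wx": sum(w * x for w, x in zip(y, centers)),
        "sum_wx2": sum(w * x * x for w, x in zip(y, centers)),
    }


def binned_mean(stats: Dict[str, float]) -> float:
    return stats["sum_wx"] / stats["sum_w"]


def binned_variance(stats: Dict[str, float]) -> float:
    return stats["sum_wx2"] / stats["sum_w"] - binned_mean(stats) ** 2


def binned_standard_deviation(stats: Dict[str, float]) -> float:
    return math.sqrt(binned_variance(stats))


stats = binned_stats([0.0, 1.0, 2.0], [1.0, 3.0], [1.0, 3.0])
```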

pachyderm.histogram.find_bin(bin_edges, value)#

Determine the index position where the value should be inserted.

This is basically ROOT.TH1.FindBin(value), but it can be used for any set of bin_edges.

Note

Bins are 0-indexed here, while in ROOT they are 1-indexed.

Parameters:

bin_edges (ndarray[Any, dtype[Any]]) – Bin edges of the histogram.
value (float) – Value to find within those bin edges.

Return type:

int

Returns:

Index of the bin where that value would reside in the histogram.
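A standard-library sketch of the lookup, assuming the ROOT convention that a bin includes its lower edge and excludes its upper edge. Here bisect stands in for whatever search routine the library uses, and the out-of-range return values are a property of this sketch only.

```python
import bisect
from typing import Sequence


def find_bin(bin_edges: Sequence[float], value: float) -> int:
    """Return the 0-indexed bin such that bin_edges[i] <= value < bin_edges[i + 1]."""
    # bisect_right counts the edges that are <= value; subtracting one gives the bin.
    return bisect.bisect_right(bin_edges, value) - 1


edges = [0.0, 1.0, 2.0, 3.0]
inside = find_bin(edges, 0.5)
on_edge = find_bin(edges, 1.0)
below = find_bin(edges, -1.0)  # -1 signals a value below the first edge
above = find_bin(edges, 3.0)   # len(edges) - 1 signals a value at or past the last edge
```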

pachyderm.histogram.get_array_from_hist2D(hist, set_zero_to_NaN=True, return_bin_edges=False)#

Extract x, y, and bin values from a 2D ROOT histogram.

Converts the histogram into a numpy array, and suitably processes it for a surface plot by removing 0s (which can cause problems when taking logs), and returning a set of (x, y) mesh values utilizing either the bin edges or bin centers.

Note

This is a different format than the 1D version!

Parameters:

hist (ROOT.TH2) – Histogram to be converted.
set_zero_to_NaN (bool) – If true, set 0 in the array to NaN. Useful with matplotlib so that it will ignore the values when plotting. See comments in this function for more details. Default: True.
return_bin_edges (bool) – Return x and y using bin edges instead of bin centers.

Return type:

tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Returns:

Contains (x values, y values, numpy array of hist data) where (x, y) are values on a grid (from np.meshgrid) using the selected bin values.

pachyderm.histogram.get_bin_edges_from_axis(axis)#

Get bin edges from a ROOT hist axis.

Note

Doesn’t include over- or underflow bins!

Parameters:: axis (ROOT.TAxis) – Axis from which the bin edges should be extracted.
Return type:: ndarray[Any, dtype[Any]]
Returns:: Array containing the bin edges.

pachyderm.histogram.get_histograms_in_file(filename)#

Helper function which gets all histograms in a file.

Parameters:

filename (Path | str) – Filename of the ROOT file containing the list.

Return type:

Returns:

Contains hists with keys as their names. Lists are recursively added, mirroring the structure under which the hists were stored.

pachyderm.histogram.get_histograms_in_list(filename, list_name=None)#

Get histograms from the file and make them available in a dict.

Lists are recursively explored, with all lists converted to dictionaries, such that the returned dictionary contains only hists and dictionaries of hists (i.e. there are no ROOT TCollection derived objects).

Parameters:

filename (Path | str) – Filename of the ROOT file containing the list.
list_name (str | None) – Name of the list to retrieve.

Return type:

Returns:

Contains hists with keys as their names. Lists are recursively added, mirroring the structure under which the hists were stored.

Raises:

ValueError – If the list could not be found in the given file.

pachyderm.plot module#

Plotting styling and utilities.

class pachyderm.plot.AxisConfig(axis, label='', log=False, range=None, font_size=None, tick_font_size=None, use_major_axis_multiple_locator_with_base=None)#

Bases: object

Configuration for an axis.

Parameters:

axis (str) –
label (str) –
log (bool) –
range (tuple[float | None, float | None] | None) –
font_size (float | None) –
tick_font_size (float | None) –
use_major_axis_multiple_locator_with_base (float | bool | None) –

apply(ax)#

Apply the axis configuration to the given axis.

Parameters:: ax (Axes) –
Return type:: None

axis: str#

font_size: float | None#

label: str#

log: bool#

range: tuple[float | None, float | None] | None#

tick_font_size: float | None#

use_major_axis_multiple_locator_with_base: float | None#

class pachyderm.plot.Figure(edge_padding=_Nothing.NOTHING, text=_Nothing.NOTHING)#

Bases: object

Configuration for a MPL figure.

Parameters:

edge_padding (Mapping[str, float]) –
text (TextConfig | Sequence[TextConfig]) –

apply(fig)#: Apply the figure configuration to the given figure.

edge_padding: Mapping[str, float]#

text: Sequence[TextConfig]#

class pachyderm.plot.LegendConfig(location=None, anchor=None, font_size=None, ncol=1, marker_label_spacing=None, label_spacing=None, column_spacing=None, handle_height=None, handler_map=_Nothing.NOTHING)#

Bases: object

Configuration for a legend on a plot.

Parameters:

location (str) –
anchor (tuple[float, float] | None) –
font_size (float | None) –
ncol (float | None) –
marker_label_spacing (float | None) –
label_spacing (float | None) –
column_spacing (float | None) –
handle_height (float | None) –
handler_map (dict[str, Any]) –

anchor: tuple[float, float] | None#

apply(ax, legend_handles=None, legend_labels=None)#

Apply the legend configuration to the given axis.

Note

If provided, we’ll use the given legend_handles and legend_labels to create the legend rather than those already associated with the legend.

Parameters:

ax (Axes) –
legend_handles (Sequence[ErrorbarContainer] | None) –
legend_labels (Sequence[str] | None) –

Return type:

Legend | None

column_spacing: float | None#

font_size: float | None#

handle_height: float | None#

handler_map: dict[str, Any]#

label_spacing: float | None#

location: str#

marker_label_spacing: float | None#

ncol: float | None#

class pachyderm.plot.Panel(axes, text=_Nothing.NOTHING, legend=None, title=None)#

Bases: object

Configuration for a panel within a plot.

The Panel is a configuration for an ax object.

axes#: Configuration of the MPL axis. We allow for multiple AxisConfig because each config specifies a single axis (ie. x or y). Careful not to confuse with the actual ax object provided by MPL.

Parameters:

axes (AxisConfig | Sequence[AxisConfig]) –
text (TextConfig | Sequence[TextConfig]) –
legend (LegendConfig | None) –
title (TitleConfig | None) –

apply(ax, legend_handles=None, legend_labels=None)#

Apply the panel configuration to the given axis.

Parameters:

ax (Axes) –
legend_handles (Sequence[ErrorbarContainer] | None) –
legend_labels (Sequence[str] | None) –

Return type:

None

axes: Sequence[AxisConfig]#

legend: LegendConfig | None#

text: Sequence[TextConfig]#

title: TitleConfig | None#

class pachyderm.plot.PlotConfig(name, panels, figure=_Nothing.NOTHING)#

Bases: object

Configuration for an overall plot.

A plot consists of some number of panels, which are each configured with their own axes, text, etc. These axes are on a figure.

name#: Name of the plot. Usually used for the filename.

panels#: Configuration for the panels of the plot.

figure#: Configuration for the figure of the plot.

Parameters:

name (str) –
panels (Panel | Sequence[Panel]) –
figure (Figure) –

apply(fig, ax=None, axes=None, legend_handles=None, legend_labels=None)#: Apply the plot configuration to the given figure and axes.

figure: Figure#

name: str#

panels: Sequence[Panel]#

class pachyderm.plot.TextConfig(text, x, y, alignment=None, color='black', font_size=None, text_kwargs=_Nothing.NOTHING)#

Bases: object

Configuration for text on a plot.

Parameters:

text (str) –
x (float) –
y (float) –
alignment (str | None) –
color (str | None) –
font_size (float | None) –
text_kwargs (dict[str, Any]) –

alignment: str | None#

apply(ax_or_fig)#: Apply the text configuration to the given axis or figure.

color: str | None#

font_size: float | None#

text: str#

text_kwargs: dict[str, Any]#

x: float#

y: float#

class pachyderm.plot.TitleConfig(text, size=None)#

Bases: object

Configuration for a title of a plot.

Parameters:

text (str) –
size (float | None) –

apply(ax)#

Apply the title configuration to the given axis.

Parameters:: ax (Axes) –
Return type:: None

size: float | None#

text: str#

pachyderm.plot.configure(disable_interactive_backend=False)#

Configure matplotlib according to my (biased) specification.

As a high level summary, this is a combination of a number of seaborn settings, along with my own tweaks. By calling this function, the matplotlib rcParams will be modified according to these settings.

Up to this point, the settings have been configured by importing the jet_hadron.plot.base module, which set a variety of parameters on import. This included some options which were set by seaborn. Additional modifications were made to the fonts to ensure that they are the same in labels and latex. Lastly, it tweaked smaller visual settings. The differences between the default matplotlib and these settings are:

```pycon
>>> pprint.pprint(diff)
{'axes.axisbelow': 'original: line, new: True',
 'axes.edgecolor': 'original: black, new: .15',
 'axes.labelcolor': 'original: black, new: .15',
 'axes.labelsize': 'original: medium, new: 12.0',
 'axes.linewidth': 'original: 0.8, new: 1.25',
 'axes.prop_cycle': "original: cycler('color', ['#1f77b4', '#ff7f0e', "
                    "'#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', "
                    "'#7f7f7f', '#bcbd22', '#17becf']), new: cycler('color', "
                    '[(0.2980392156862745, 0.4470588235294118, '
                    '0.6901960784313725), (0.8666666666666667, '
                    '0.5176470588235295, 0.3215686274509804), '
                    '(0.3333333333333333, 0.6588235294117647, '
                    '0.40784313725490196), (0.7686274509803922, '
                    '0.3058823529411765, 0.3215686274509804), '
                    '(0.5058823529411764, 0.4470588235294118, '
                    '0.7019607843137254), (0.5764705882352941, '
                    '0.47058823529411764, 0.3764705882352941), '
                    '(0.8549019607843137, 0.5450980392156862, '
                    '0.7647058823529411), (0.5490196078431373, '
                    '0.5490196078431373, 0.5490196078431373), (0.8, '
                    '0.7254901960784313, 0.4549019607843137), '
                    '(0.39215686274509803, 0.7098039215686275, '
                    '0.803921568627451)])',
 'axes.titlesize': 'original: large, new: 12.0',
 'font.sans-serif': "original: ['DejaVu Sans', 'Bitstream Vera Sans', "
                    "'Computer Modern Sans Serif', 'Lucida Grande', 'Verdana', "
                    "'Geneva', 'Lucid', 'Arial', 'Helvetica', 'Avant Garde', "
                    "'sans-serif'], new: ['Arial', 'DejaVu Sans', 'Liberation "
                    "Sans', 'Bitstream Vera Sans', 'sans-serif']",
 'font.size': 'original: 10.0, new: 12.0',
 'grid.color': 'original: #b0b0b0, new: .8',
 'grid.linewidth': 'original: 0.8, new: 1.0',
 'image.cmap': 'original: viridis, new: rocket',
 'legend.fontsize': 'original: medium, new: 11.0',
 'lines.solid_capstyle': 'original: projecting, new: round',
 'mathtext.bf': 'original: sans:bold, new: Bitstream Vera Sans:bold',
 'mathtext.fontset': 'original: dejavusans, new: custom',
 'mathtext.it': 'original: sans:italic, new: Bitstream Vera Sans:italic',
 'mathtext.rm': 'original: sans, new: Bitstream Vera Sans',
 'patch.edgecolor': 'original: black, new: w',
 'patch.facecolor': 'original: C0, new: (0.2980392156862745, '
                    '0.4470588235294118, 0.6901960784313725)',
 'patch.force_edgecolor': 'original: False, new: True',
 'text.color': 'original: black, new: .15',
 'text.usetex': 'original: False, new: True',
 'xtick.color': 'original: black, new: .15',
 'xtick.direction': 'original: out, new: in',
 'xtick.labelsize': 'original: medium, new: 11.0',
 'xtick.major.size': 'original: 3.5, new: 6.0',
 'xtick.major.width': 'original: 0.8, new: 1.25',
 'xtick.minor.size': 'original: 2.0, new: 4.0',
 #'xtick.minor.top': 'original: True, new: False',
 'xtick.minor.visible': 'original: False, new: True',
 'xtick.minor.width': 'original: 0.6, new: 1.0',
 'ytick.color': 'original: black, new: .15',
 'ytick.direction': 'original: out, new: in',
 'ytick.labelsize': 'original: medium, new: 11.0',
 'ytick.major.size': 'original: 3.5, new: 6.0',
 'ytick.major.width': 'original: 0.8, new: 1.25',
 'ytick.minor.right': 'original: True, new: False',
 'ytick.minor.size': 'original: 2.0, new: 4.0',
 #'ytick.minor.visible': 'original: False, new: True',
 'ytick.minor.width': 'original: 0.6, new: 1.0'}
```

I implemented most of these below (although I left out a few color options).

For more on the non-interactive mode, see: https://gist.github.com/matthewfeickert/84245837f09673b2e7afea929c016904

Parameters:: disable_interactive_backend (bool) – If True, configure the MPL backend to be non-interactive. This should make loading a bit more efficient, since I rarely use the GUI.
Return type:: None
Returns:: None. The current matplotlib rcParams are modified.

pachyderm.plot.convert_mpl_color_scheme_to_ROOT(name=None, cmap=None, reverse_cmap=False, n_values_to_cut_from_top=0)#

Convert matplotlib color scheme to ROOT.

Parameters:

name (str | None) – Name of the matplotlib color scheme.
cmap (ListedColormap | LinearSegmentedColormap | None) –
reverse_cmap (bool) – True if the color scheme should be reversed.
n_values_to_cut_from_top (int) – Number of values to cut from the top of the color scheme.

Return type:

Inspired by: https://matplotlib.org/gallery/statistics/errorbars_and_boxes.html and https://github.com/HDembinski/pyik/blob/217ae25bbc316c7a209a1a4a1ce084f6ca34276b/pyik/mplext.py#L138

Returns:

Snippet to add the color scheme to ROOT.

pachyderm.plot.error_boxes(ax, x_data, y_data, y_errors, x_errors=None, **kwargs)#

Plot error boxes for the given data.

Note

The errors are distances from the central value, i.e. for a 10% error on a value of 1, the two-entry version should be [0.1, 0.1].

Parameters:

ax (Axes) – Axis onto which the rectangles will be drawn.
x_data (ndarray[Any, dtype[Any]]) – x location of the data.
y_data (ndarray[Any, dtype[Any]]) – y location of the data.
y_errors (ndarray[Any, dtype[Any]]) – y errors of the data. The array can either be of length n, or of length (n, 2) for asymmetric errors.
x_errors (ndarray[Any, dtype[Any]] | None) – x errors of the data. The array can either be of length n, or of shape (n, 2) for asymmetric errors. Default: None. This corresponds to boxes that are 10% of the distance between the given point and the previous one.
kwargs (str | float) –

Return type:

PatchCollection

pachyderm.plot.restore_defaults()#

Restore the default matplotlib settings.

Return type:: None

pachyderm.projectors module#

Handle generic TH1 and THn projections.

class pachyderm.projectors.HistAxisRange(axis_range_name, axis_type, min_val, max_val)#

Bases: EqualityMixin

Represents the restriction of a range of an axis of a histogram.

An axis can be restricted by multiple HistAxisRange elements, although separate projections are needed to apply more than one. This would be accomplished with separate entries in HistProjector.projection_dependent_cut_axes.

Note

A single axis which has multiple ranges could be represented by multiple HistAxisRange objects!

Parameters:

axis_range_name (str) – Name of the axis range. Usually some combination of the axis name and some sort of description of the range.
axis_type (enum.Enum) – Enumeration corresponding to the axis to be restricted. The numerical value of the enum should be axis number (for a THnBase).
min_val (function) – Minimum range value for the axis. Usually set via apply_func_to_find_bin().
max_val (Union[float, Callable[[Any], float]]) – Maximum range value for the axis. Usually set via apply_func_to_find_bin().

static apply_func_to_find_bin(func, values=None)#

Closure to determine the bin associated with a value on an axis.

It can apply a function to an axis if necessary to determine the proper bin. Otherwise, it can just return a stored value.

Note

To properly determine the value, carefully note the information below. In many cases, such as when we want values [2, 5), the values need to be shifted by a small epsilon to retrieve the proper bin. This is done automatically in SetRangeUser().

>>> hist = ROOT.TH1D("test", "test", 10, 0, 10)
>>> x, y = 2, 5
>>> hist.FindBin(x)
3
>>> hist.FindBin(x+epsilon)
3
>>> hist.FindBin(y)
6
>>> hist.FindBin(y-epsilon)
5

Note that adding epsilon to the lower bound is not strictly necessary, but it is done for consistency with the upper bound.

Parameters:

func (Callable) – Function to apply to the histogram axis. If it is None, the value will be returned.
values (int or float) – Value to pass to the function. Default: None (in which case, it won’t be passed).

Return type:

Callable[[Any], float | int]

Returns:

Function to be called with an axis to determine the desired bin on that axis.
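
The closure pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `FakeAxis` is a hypothetical stand-in for a ROOT axis with uniform bins (so the example runs without ROOT), and `find_bin_closure` is an assumed name that mirrors the documented behavior of applying `func` to the axis, or returning the stored value when `func` is None.

```python
from typing import Any, Callable, Optional


class FakeAxis:
    """Hypothetical stand-in for a ROOT axis with uniform bins."""

    def __init__(self, n_bins: int, low: float, high: float) -> None:
        self.n_bins, self.low, self.high = n_bins, low, high

    def FindBin(self, value: float) -> int:
        # Bins are numbered from 1; 0 is underflow and n_bins + 1 is overflow.
        if value < self.low:
            return 0
        if value >= self.high:
            return self.n_bins + 1
        return 1 + int(self.n_bins * (value - self.low) / (self.high - self.low))


def find_bin_closure(
    func: Optional[Callable[..., Any]], values: Any = None
) -> Callable[[Any], Any]:
    """Defer the bin lookup until an axis is available."""

    def return_func(axis: Any) -> Any:
        if func is None:
            # No function: just hand back the stored value.
            return values
        if values is None:
            # Function without an argument: apply it to the axis alone.
            return func(axis)
        return func(axis, values)

    return return_func


epsilon = 1e-5
axis = FakeAxis(10, 0.0, 10.0)
# Select [2, 5): shift both edges by epsilon so the upper edge stays in range.
lower = find_bin_closure(FakeAxis.FindBin, 2 + epsilon)
upper = find_bin_closure(FakeAxis.FindBin, 5 - epsilon)
stored = find_bin_closure(None, 7)
selected_bins = (lower(axis), upper(axis))
```

Deferring the lookup this way lets the same range definition be reused for histograms with different binning, since the bin is only resolved once the axis is known.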

apply_range_set(hist)#

Apply the associated range set to the axis of a given hist.

Note

The min and max values should be bins, not user ranges! For more, see the binning explanation in apply_func_to_find_bin(...).

Parameters:: hist (Any) – Histogram to which the axis range restriction should be applied.
Return type:: None
Returns:: None. The range is set on the axis.

property axis: Callable[[Any], Any]#: Determine the axis to return based on the hist type.

class pachyderm.projectors.HistProjector(observable_to_project_from, output_observable, projection_name_format, output_attribute_name=None, projection_information=None)#

Bases: object

Handles generic ROOT THn and TH1 projections.

There are three types of cuts which can be specified:

additional_axis_cuts: Axis cuts which do not change based on the projection axis.
projection_dependent_cut_axes: Axis cuts which change based on the projection axis.
projection_axes: Axes onto which the projection will be performed.

For a full description of each type of cut and the necessary details, see their descriptions in the attributes.

Note

The TH1 projections have not been tested as extensively as the THn projections.

Note

input_key, input_hist, input_observable, projection_name, and output_hist are all reserved keys, so they will be overwritten by predefined information when passed to the various functions. Thus, they should be avoided by the user when storing projection information.

Parameters:

observable_to_project_from (dict[str, Any] | Any) – The observables which should be used to project from. The dict key is passed to projection_name(...) as input_key.
output_observable (dict[str, Any] | Any) – Object or dict where the projected hist will be stored.
projection_name_format (str) – Format string to determine the projected hist name.
output_attribute_name (str | None) – Name of the attribute under which the single observable projection will be stored in the output_observable object. Must not be specified if projecting with multiple objects. Default: None.
projection_information (dict[str, Any] | None) – Keyword arguments to be passed to projection_name(...) to determine the name of the projected histogram. Default: None.

single_observable_projection#: True if the projector is only performing a single observable projection.

output_attribute_name#: Name of the attribute under which the single observable projection will be stored in the output_observable object.

observable_to_project_from#: The observable(s) which should be used to project from. The dict key is passed to projection_name(...) as input_key.

output_observable#: Where the projected hist(s) will be stored. They will be stored under the dict key determined by output_key_name(...).

projection_name_format#: Format string to determine the projected hist name.

projection_information#: Keyword arguments to be passed to projection_name(...) to determine the name of the projected histogram.

additional_axis_cuts#

List of axis cuts which are neither projected nor depend on the axis being projected.

Type:: list

projection_dependent_cut_axes#

List of lists of axis cuts which depend on the projected axis. For example, this is needed if we want to project non-continuous ranges of a non-projection axis (say, dEta when projecting dPhi). It is a list of lists to allow for groups of cuts to be specified together if necessary.

Type:: list

projection_axes#

List of axes which should be projected.

Type:: list

call_projection_function(hist)#

Calls the actual projection function for the hist.

Parameters:: hist (Any) – Histogram from which the projections should be performed.
Return type:: Any
Returns:: The projected histogram.

cleanup_cuts(hist, cut_axes)#

Cleanup applied cuts by resetting the axis to the full range.

Inspired by: https://github.com/matplo/rootutils/blob/master/python/2.7/THnSparseWrapper.py

Parameters:

hist (Any) – Histogram for which the axes should be reset.
cut_axes (Iterable[HistAxisRange]) – List of axis cuts, which correspond to axes that should be reset.

Return type:

None

get_hist(observable, **kwargs)#

Get the histogram that may be stored in some object.

This histogram is used to project from.

Note

The output object could just be the raw ROOT histogram.

Note

This function is just a basic placeholder and likely should be overridden.

Parameters:

observable (object) – The input object. It could be a histogram or something more complex
kwargs (dict[str, Any]) – Additional arguments passed to the projection function

Return type:

Returns:

ROOT.TH1 or ROOT.THnBase histogram which should be projected. By default, it returns the observable (input object).

output_hist(output_hist, input_observable, **kwargs)#

Return an output object. It should store the output_hist.

Note

The output object could just be the raw histogram.

Note

This function is just a basic placeholder which returns the given output object (a histogram) and likely should be overridden.

Parameters:

output_hist (Any) – The output histogram
input_observable (object) – The corresponding input object. It could be a histogram or something more complex.
kwargs (Any) – Projection information dict combined with additional arguments passed to the projection function

Return type:

Returns:

The output object which should be stored in the output dict. By default, it returns the output hist.

output_key_name(input_key, output_hist, projection_name, **kwargs)#

Returns the key under which the output object should be stored.

Note

This function is just a basic placeholder which returns the projection name and likely should be overridden.

Parameters:

input_key (str) – Key of the input hist in the input dict
output_hist (Any) – The output histogram
projection_name (str) – Projection name for the output histogram
kwargs (str) – Projection information dict combined with additional arguments passed to the projection function.

Return type:

Returns:

Key under which the output object should be stored. By default, it returns the projection name.

project(**kwargs)#

Perform the requested projection(s).

Note

All cuts on the original histograms will be reset when this function is completed.

Parameters:: kwargs (dict) – Additional named args to be passed to projection_name(…) and output_key_name(…)
Return type:: Any | dict[str, Any]
Returns:: The projected histogram(s). The projected histograms are also stored in output_observable.

projection_name(**kwargs)#

Define the projection name for this projector.

Note

This function is just a basic placeholder and likely should be overridden.

Parameters:

kwargs (dict[str, Any]) – Projection information dict combined with additional arguments passed to the projection function.

Return type:

Returns:

Projection name string formatted with the passed options. By default, it returns: projection_name_format formatted with the arguments to this function.
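
As a rough illustration of the default behavior described here, the name can be produced with `str.format`. The function below is a standalone sketch, not the method itself, and the format string and keyword names (`pt_bin`, `unused`) are made up for the example.

```python
from typing import Any


def projection_name_sketch(projection_name_format: str, **kwargs: Any) -> str:
    """Format the name with whatever keyword arguments are available."""
    # str.format ignores keyword arguments that the format string does not use,
    # so extra projection information can be passed along harmlessly.
    return projection_name_format.format(**kwargs)


name = projection_name_sketch(
    "{input_key}_dPhi_ptBin{pt_bin}",
    input_key="signal",  # one of the reserved keys noted in the class docs
    pt_bin=3,            # hypothetical projection information
    unused="ignored",
)
```

A subclass that overrides the method can apply the same idea with its own naming scheme.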

class pachyderm.projectors.TH1AxisType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#

Bases: Enum

Map from (x,y,z) axis to the axis number.

Other enumerations that refer to this enum should refer to the _values_ to ensure consistency in .value pointing to the axis value.

x_axis = 0#

y_axis = 1#

z_axis = 2#

pachyderm.projectors.hist_axis_func(axis_type)#

Wrapper to retrieve the axis of a given histogram.

This can be convenient outside of just projections, so it’s made available in the API.

Parameters:: axis_type (Enum) – The type of axis to retrieve.
Return type:: Callable[[Any], Any]
Returns:: Callable to retrieve the specified axis when given a hist.
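
A minimal sketch of such a wrapper is shown below, assuming a TH1-like object that exposes `GetXaxis()`, `GetYaxis()`, and `GetZaxis()`. `FakeHist`, `AxisType`, and `axis_func_sketch` are hypothetical names used so the example runs without ROOT, and the THn case is not covered here.

```python
import enum
from typing import Any, Callable


class AxisType(enum.Enum):
    """Stand-in enum whose values follow the x, y, z axis numbering."""

    x_axis = 0
    y_axis = 1
    z_axis = 2


class FakeHist:
    """Hypothetical TH1-like object returning labels instead of axis objects."""

    def GetXaxis(self) -> str:
        return "x axis"

    def GetYaxis(self) -> str:
        return "y axis"

    def GetZaxis(self) -> str:
        return "z axis"


def axis_func_sketch(axis_type: enum.Enum) -> Callable[[Any], Any]:
    """Return a callable that retrieves the requested axis from a hist."""

    def get_axis(hist: Any) -> Any:
        # Select the getter based on the numerical value of the enum.
        getters = {0: hist.GetXaxis, 1: hist.GetYaxis, 2: hist.GetZaxis}
        return getters[axis_type.value]()

    return get_axis


get_y = axis_func_sketch(AxisType.y_axis)
y_axis = get_y(FakeHist())
```

Because only the enum value is used for the lookup, any enumeration whose values follow the same numbering would work with this sketch.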

pachyderm.remove_outliers module#

Provides outliers removal methods.

class pachyderm.remove_outliers.OutliersRemovalManager(moving_average_threshold=1.0)#

Bases: object

Manage the removal of outliers from histograms.

Parameters:: moving_average_threshold (float) –

moving_average_threshold: float = 1.0#

run(outliers_removal_axis, hist=None, hists=None, mean_fractional_difference_limit=0.01, median_fractional_difference_limit=0.01)#

Remove outliers from the given histogram(s).

Parameters:

outliers_removal_axis (Union[TH1AxisType, Enum]) – Axis along which outliers removal will be performed. Usually the particle level axis.
hist (Any | None) – Histogram to check for outliers. Either this or hists must be specified.
hists (Mapping[str, Any] | None) – Histograms to check for outliers. Either this or hist must be specified.
mean_fractional_difference_limit (float) – Max fractional difference of mean after outliers removal. Default: 0.01.
median_fractional_difference_limit (float) – Max fractional difference of median after outliers removal. Default: 0.01.

Return type:

int

Returns:

Bin index value from which the outliers were removed. The histogram(s) are modified in place.

pachyderm.utils module#

Broad collection of utility functions and constants.

pachyderm.utils.moving_average(arr, n=3)#

Calculate the moving average over an array.

Algorithm from: https://stackoverflow.com/a/14314054

Parameters:

arr (np.ndarray) – Array over which to calculate the moving average.
n (int) – Number of elements over which to calculate the moving average. Default: 3

Returns:

Moving average calculated over n.

Return type:

np.ndarray
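
For intuition, a moving average can be computed from cumulative sums, so each window costs one subtraction instead of a fresh sum. The sketch below uses only the standard library and plain lists, whereas the documented function operates on numpy arrays. The name `moving_average_sketch` is hypothetical, and the cumulative-sum approach is assumed here to match the referenced algorithm.

```python
from itertools import accumulate
from typing import List, Sequence


def moving_average_sketch(values: Sequence[float], n: int = 3) -> List[float]:
    """Average each window of n consecutive values using cumulative sums."""
    # cumulative[i] holds the sum of the first i values.
    cumulative = [0.0, *accumulate(values)]
    # The sum of values[i:i + n] is cumulative[i + n] - cumulative[i].
    return [
        (cumulative[i + n] - cumulative[i]) / n
        for i in range(len(values) - n + 1)
    ]


averages = moving_average_sketch([1, 2, 3, 4, 5], n=3)
print(averages)  # [2.0, 3.0, 4.0]
```

The output has `len(values) - n + 1` entries, since only complete windows are averaged.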

pachyderm.utils.recursive_getattr(obj, attr, *args)#

Recursive getattr.

This can be used as a drop in for the standard getattr(...). Credit to: https://stackoverflow.com/a/31174427

Parameters:

obj (Any) – Object to retrieve the attribute from.
attr (str) – Name of the attribute, with each successive attribute separated by a “.”.
args (Any) –

Return type:

Returns:

The requested attribute. (Same as getattr).

Raises:

AttributeError – If the attribute was not found and no default was provided. (Same as getattr).
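
A minimal sketch of the dotted lookup, assuming the common `functools.reduce` approach. `recursive_getattr_sketch` is a hypothetical name, and the library's own implementation may differ in detail.

```python
import functools
from types import SimpleNamespace
from typing import Any


def recursive_getattr_sketch(obj: Any, attr: str, *args: Any) -> Any:
    """Follow a dotted attribute path, optionally with a getattr-style default."""

    def _getattr(current: Any, name: str) -> Any:
        # Forward the optional default to each getattr call.
        return getattr(current, name, *args)

    return functools.reduce(_getattr, attr.split("."), obj)


config = SimpleNamespace(plot=SimpleNamespace(legend=SimpleNamespace(font_size=12.0)))
font_size = recursive_getattr_sketch(config, "plot.legend.font_size")
missing = recursive_getattr_sketch(config, "plot.title.text", None)
```

Without a default, a missing attribute anywhere along the path raises AttributeError, as with the standard getattr.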

pachyderm.utils.recursive_getitem(d, keys)#

Recursively retrieve an item from a nested dict.

Credit to: https://stackoverflow.com/a/52260663

Parameters:

d (Mapping[str, Any]) – Mapping of strings to objects.
keys (str | Sequence[str]) – Names of the keys under which the object is stored. Can also just be a single string.

Return type:

Returns:

The object stored under the keys.

Raises:

KeyError – If one of the keys isn’t found.
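
The nested lookup can be sketched with `functools.reduce` and `operator.getitem`. This is an illustration, not the library's code, and `recursive_getitem_sketch` is a hypothetical name.

```python
import functools
import operator
from typing import Any, Mapping, Sequence, Union


def recursive_getitem_sketch(d: Mapping[str, Any], keys: Union[str, Sequence[str]]) -> Any:
    """Retrieve a value from nested mappings by following the keys in order."""
    if isinstance(keys, str):
        # A single string is treated as one key, not as a sequence of characters.
        return d[keys]
    return functools.reduce(operator.getitem, keys, d)


settings = {"plot": {"legend": {"location": "upper right"}}, "name": "example"}
location = recursive_getitem_sketch(settings, ["plot", "legend", "location"])
name = recursive_getitem_sketch(settings, "name")
```

The string check matters because a string is itself a sequence, so without it a key such as "name" would be split into its characters.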

pachyderm.utils.recursive_setattr(obj, attr, val)#

Recursive setattr.

This can be used as a drop in for the standard setattr(...). Credit to: https://stackoverflow.com/a/31174427

Parameters:

obj (Any) – Object on which to set the attribute.
attr (str) – Name of the attribute, with each successive attribute separated by a “.”.
val (Any) – Value to set the attribute to.

Return type:

Returns:

None. The attribute is set on the object. (Same as setattr).

Raises:

AttributeError – If the attribute was not found and no default was provided. (Same as getattr).
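
A minimal sketch of the dotted assignment: resolve everything before the last "." to find the parent object, then call the standard setattr on it. `recursive_setattr_sketch` is a hypothetical name, and the actual implementation may differ.

```python
import functools
from types import SimpleNamespace
from typing import Any


def recursive_setattr_sketch(obj: Any, attr: str, val: Any) -> None:
    """Set an attribute addressed by a dotted path."""
    # Split "a.b.c" into the parent path "a.b" and the final name "c".
    parent_path, _, name = attr.rpartition(".")
    parent = (
        functools.reduce(getattr, parent_path.split("."), obj)
        if parent_path
        else obj
    )
    setattr(parent, name, val)


config = SimpleNamespace(plot=SimpleNamespace(legend=SimpleNamespace(font_size=10.0)))
recursive_setattr_sketch(config, "plot.legend.font_size", 12.0)
recursive_setattr_sketch(config, "name", "example")
```

In this sketch, a missing intermediate attribute raises AttributeError during the parent lookup, while the final attribute is created if it does not yet exist.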

pachyderm.version module#

pachyderm.yaml module#

Module related to YAML.

Contains a way to construct the main YAML object, as well as relevant mixins and classes.

Note

The YAML to/from enum values would be much better as a mixin. However, such an approach causes substantial issues. In particular, although we don’t explicitly pickle the values, calling copy.copy implicitly calls pickle, so we must maintain compatibility. However, enum mixins preclude pickling the enum value (see cpython/enum.py line 177). The problem basically comes down to the fact that we are assigning a bound staticmethod to the class when we mix it in, and it doesn’t seem to be able to resolve pickling the object (perhaps due to name resolution issues). For a bit more, see the comments on this stackoverflow post. Practically, I believe that we could also resolve this by implementing __reduce_ex__, but that appears as if it will be more work than our implemented workaround. Our workaround can be implemented as:

```python
class TestEnum(enum.Enum):
    a = 1
    b = 2

    def __str__(self):
        return self.name

    to_yaml = staticmethod(generic_class.enum_to_yaml)
    from_yaml = staticmethod(generic_class.enum_from_yaml)
```

This enum object will pickle properly. Note that rather strangely, this issue showed up during tests on Debian Stretch, but not the exact same version of python on macOS. I don’t know why that’s the case, but the workaround seems to be fine on both systems, so we’ll just continue to use it.

pachyderm.yaml.enum_from_yaml(cls, constructor, node)#

Decode YAML representation.

This is a mixin method for reading enum values from YAML. It needs to be added to the enum as a classmethod. See the module docstring for further information on this approach and how to implement it.

Note

This method assumes that the name of the enumeration value was stored as a scalar node.

Parameters:

constructor (BaseConstructor) – Constructor from the YAML object.
node (ScalarNode) – Scalar node extracted from the YAML being read.
cls (type[TypeVar(T_EnumFromYAML, bound= Enum)]) –

Return type:

TypeVar(T_EnumFromYAML, bound= Enum)

Returns:

The constructed YAML value from the name of the enumerated value.

pachyderm.yaml.enum_to_yaml(cls, representer, data)#

Encodes YAML representation.

This is a mixin method for writing enum values to YAML. It needs to be added to the enum as a classmethod. See the module docstring for further information on this approach and how to implement it.

This method writes whatever is used in the string representation of the YAML value. Usually, this will be the unique name of the enumeration value. If the name is used, the corresponding EnumFromYAML mixin can be used to recreate the value. If the name isn’t used, more care may be necessary, so a from_yaml method for that particular enumeration may be necessary.

Note

This method assumes that the name of the enumeration value should be stored as a scalar node.

Parameters:

representer (BaseRepresenter) – Representation from YAML.
data (TypeVar(T_EnumToYAML, bound= Enum)) – Enumeration value to be encoded.
cls (type[TypeVar(T_EnumToYAML, bound= Enum)]) –

Return type:

ScalarNode

Returns:

Scalar representation of the name of the enumeration value.
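
The round trip that these two functions rely on can be sketched without any YAML machinery: the value is written via its string representation (its name) and recovered by looking that name up on the enum class. `encode_name` and `decode_name` are hypothetical helpers standing in for the representer and constructor steps, which are not reproduced here.

```python
import enum
from typing import Type, TypeVar

T_Enum = TypeVar("T_Enum", bound=enum.Enum)


class TestEnum(enum.Enum):
    a = 1
    b = 2

    def __str__(self) -> str:
        # Use the unique name as the string representation.
        return self.name


def encode_name(value: enum.Enum) -> str:
    """Produce the scalar text that would be stored in the YAML node."""
    return str(value)


def decode_name(cls: Type[T_Enum], scalar: str) -> T_Enum:
    """Recover the enumeration value from its stored name."""
    return cls[scalar]


stored = encode_name(TestEnum.b)
restored = decode_name(TestEnum, stored)
```

If `__str__` returned something other than the name, the lookup by name would no longer apply, which is why a custom from_yaml method may be needed in that case.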

pachyderm.yaml.numpy_array_from_yaml(constructor, data)#

Read an array from YAML to numpy.

It reads arrays registered under the tag !numpy_array.

Use with:

```python
yaml = ruamel.yaml.YAML()
yaml.constructor.add_constructor("!numpy_array", yaml.numpy_array_from_yaml)
```

Note

We cannot use yaml.register_class because it won’t register the proper type. (It would register the type of the class, rather than of numpy.ndarray). Instead, we use the above approach to register this method explicitly with the constructor.

Note

In order to allow users to write an array by hand, we check the data given. If it’s a list, we convert the values and put them into an array. If it’s binary encoded, we decode and load it.

Parameters:

constructor (BaseConstructor) – YAML constructor being used to read and create the objects specified in the YAML.
data (SequenceNode) – Data stored in the YAML node currently being processed.

Return type:

ndarray[Any, dtype[Any]]

Returns:

numpy array containing the data in the current YAML node.

pachyderm.yaml.numpy_array_to_yaml(representer, data)#

Write a numpy array to YAML.

It registers the array under the tag !numpy_array.

Use with:

```python
yaml = ruamel.yaml.YAML()
yaml.representer.add_representer(np.ndarray, yaml.numpy_array_to_yaml)
```

Note

Parameters:

representer (BaseRepresenter) –
data (ndarray[Any, dtype[Any]]) –

Return type:

pachyderm.yaml.numpy_float64_from_yaml(constructor, data)#

Read a float64 from YAML to numpy.

It reads the float64 registered under the tag !numpy_float64.

Use with:

```python
yaml = ruamel.yaml.YAML()
yaml.constructor.add_constructor("!numpy_float64", yaml.numpy_float64_from_yaml)
```

Note

We cannot use yaml.register_class because it won’t register the proper type. (It would register the type of the class, rather than of numpy.float64). Instead, we use the above approach to register this method explicitly with the constructor.

Note

In order to allow users to write a float by hand, we check the data given. If it’s a raw float, we put it into a float64. If it’s binary encoded, we decode and load it.

Parameters:

constructor (BaseConstructor) – YAML constructor being used to read and create the objects specified in the YAML.
data (ScalarNode) – Data stored in the YAML node currently being processed.

Return type:

float64

Returns:

numpy float64 containing the data in the current YAML node.

pachyderm.yaml.numpy_float64_to_yaml(representer, data)#

Write a numpy float64 to YAML.

It registers the float under the tag !numpy_float64.

Use with:

```python
yaml = ruamel.yaml.YAML()
yaml.representer.add_representer(np.float64, yaml.numpy_float64_to_yaml)
```

Note

Parameters:

representer (BaseRepresenter) –
data (float64) –

Return type: