pachyderm package#
pachyderm.binned_data module#
Functionality related to binned data.
- class pachyderm.binned_data.AxesTuple(iterable=(), /)#
-
- classmethod from_axes(axes)#
- classmethod from_yaml(constructor, data)#
Decode YAML representation.
For some reason, YAML doesn’t encode this object properly, so we have to tell it how to do so.
- Parameters:
constructor (BaseConstructor) – Constructor from the YAML object.
data (MappingNode) – YAML mapping node representing the AxesTuple object.
- Return type:
- Returns:
The AxesTuple object constructed from the YAML specified values.
- classmethod to_yaml(representer, obj)#
Encode YAML representation.
For some reason, YAML doesn’t encode this object properly, so we have to tell it how to do so.
We encode a mapping with the tuple stored in a sequence, as well as the serialization version.
- Parameters:
representer (BaseRepresenter) – Representation from YAML.
obj (AxesTuple) – AxesTuple to be converted to YAML.
- Return type:
MappingNode
- Returns:
YAML representation of the AxesTuple object.
- class pachyderm.binned_data.Axis(bin_edges)#
Bases:
object
- Parameters:
bin_edges (
Any
) –
- property bin_centers: ndarray[Any, dtype[Any]]#
The axis bin centers (x for 1D).
This property caches the values so we don’t have to calculate it every time.
- Parameters:
None –
- Returns:
Array of center of bins.
- property bin_widths: ndarray[Any, dtype[Any]]#
Bin widths calculated from the bin edges.
- Returns:
Array of the bin widths.
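As a rough illustration of how these two properties relate to the bin edges, the sketch below computes widths and centers with plain NumPy. The helper names are hypothetical and are not part of pachyderm; the real properties live on `Axis`, and `bin_centers` additionally caches its result.

```python
import numpy as np


def bin_widths_from_edges(bin_edges):
    """Width of each bin: upper edge minus lower edge."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    return bin_edges[1:] - bin_edges[:-1]


def bin_centers_from_edges(bin_edges):
    """Center of each bin: lower edge plus half of the bin width."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    return bin_edges[:-1] + bin_widths_from_edges(bin_edges) / 2


# Non-uniform binning is handled the same way as uniform binning.
edges = np.array([0.0, 1.0, 2.0, 4.0])
widths = bin_widths_from_edges(edges)
centers = bin_centers_from_edges(edges)
```

For N + 1 edges, both arrays have N entries.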
- copy()#
Copies the object.
In principle, this should be the same as copy.deepcopy(...), at least when this was written in Feb 2020. But deepcopy(...) often seems to have very bad performance (and perhaps does additional implicit copying), so we copy these numpy arrays by hand.
- find_bin(value)#
Find the bin corresponding to the specified value.
For further information, see find_bin(...) in this module.
Note
Bins are 0-indexed here, while in ROOT they are 1-indexed.
- classmethod from_yaml(constructor, data)#
Decode YAML representation.
For some reason, YAML doesn’t encode this object properly, so we have to tell it how to do so.
- Parameters:
constructor (BaseConstructor) – Constructor from the YAML object.
data (MappingNode) – YAML mapping node representing the Axis object.
- Return type:
- Returns:
The Axis object constructed from the YAML specified values.
- classmethod to_yaml(representer, obj)#
Encode YAML representation.
For some reason, YAML doesn’t encode this object properly, so we have to tell it how to do so.
- Parameters:
representer (BaseRepresenter) – Representation from YAML.
obj (Axis) – Axis to be converted to YAML.
- Return type:
MappingNode
- Returns:
YAML representation of the Axis object.
- class pachyderm.binned_data.BinnedData(axes, values, variances, metadata=_Nothing.NOTHING)#
Bases:
object
- Parameters:
- property axis: Axis#
Returns the single axis when the binned data is 1D.
This is just a helper function, but can be nice for one dimensional data.
- Returns:
The axis.
- copy()#
Copies the object.
In principle, this should be the same as copy.deepcopy(...), at least when this was written in Feb 2020. But deepcopy(...) often seems to have very bad performance (and perhaps does additional implicit copying), so we copy these numpy arrays by hand.
- Parameters:
self (BinnedData) –
- Return type:
- classmethod from_existing_data(binned_data, return_copy_if_already_converted=True)#
Convert an existing histogram.
Note
Underflow and overflow bins are excluded!
- classmethod from_hepdata(hist)#
Convert (a set of) HEPdata histogram(s) to BinnedData objects.
Will include any information that the extraction function extracts and returns.
Note
This is not included in the from_existing_hist(...) function because HEPdata files are oriented towards potentially containing multiple histograms in a single object. So we just return all of them and let the user sort it out.
Note
It only grabs the first independent variable to determine the x axis.
- Parameters:
- Return type:
- Returns:
List of BinnedData constructed from the input HEPdata.
- classmethod from_yaml(constructor, data)#
Decode YAML representation.
For some reason, YAML doesn’t encode this object properly, so we have to tell it how to do so.
- Parameters:
constructor (BaseConstructor) – Constructor from the YAML object.
data (MappingNode) – YAML mapping node representing the BinnedData object.
- Return type:
- Returns:
The BinnedData object constructed from the YAML specified values.
- to_ROOT(copy=True)#
Convert into a ROOT histogram.
Note
This is a lossy operation because there is nowhere to store the metadata in the ROOT hist.
- Parameters:
copy (bool) – Copy the arrays before assigning them. The ROOT hist may be able to view the array memory, such that modifications in one would affect the other. Be extremely careful, as that can have unexpected side effects! So only disable with a very good reason. Default: True.
- Return type:
- Returns:
ROOT histogram containing the data.
- to_boost_histogram()#
Convert into a boost-histogram.
Note
This is a lossy operation. The metadata is not preserved.
- Return type:
- Returns:
Boost histogram containing the data.
- to_histogram1D()#
Convert to a Histogram 1D.
This is entirely a convenience function. Generally, it’s best to stay with BinnedData, but a Histogram1D is required in some cases, such as for fitting.
- Return type:
- Returns:
Histogram1D containing the data.
- to_numpy()#
Convert to a numpy histogram.
- classmethod to_yaml(representer, obj)#
Encode YAML representation.
For some reason, YAML doesn’t encode this object properly, so we have to tell it how to do so.
We encode a mapping with the tuple stored in a sequence, as well as the serialization version.
- Parameters:
representer (BaseRepresenter) – Representation from YAML.
obj (BinnedData) – BinnedData to be converted to YAML.
- Return type:
MappingNode
- Returns:
YAML representation of the BinnedData object.
- pachyderm.binned_data.find_bin(bin_edges, value)#
Determine the index position where the value should be inserted.
This is basically ROOT.TH1.FindBin(value), but it can be used for any set of bin_edges.
Note
Bins are 0-indexed here, while in ROOT they are 1-indexed.
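A minimal sketch of the 0-indexed lookup, assuming it is built on `numpy.searchsorted`. The name `find_bin_sketch` is hypothetical, and how pachyderm treats values outside the edges is not specified here, so the sketch makes no attempt to handle them.

```python
import numpy as np


def find_bin_sketch(bin_edges, value):
    """Return the 0-indexed bin containing ``value`` (hypothetical helper)."""
    # side="right" places a value sitting exactly on a lower edge into the
    # bin that starts at that edge. Subtracting 1 converts the insertion
    # position into a 0-indexed bin number.
    return int(np.searchsorted(bin_edges, value, side="right")) - 1


edges = np.array([0.0, 1.0, 2.0, 3.0])
first = find_bin_sketch(edges, 0.5)    # inside the first bin
on_edge = find_bin_sketch(edges, 1.0)  # exactly on the edge between bins 0 and 1
last = find_bin_sketch(edges, 2.5)     # inside the last bin
```

Adding 1 to the returned index gives the corresponding 1-indexed ROOT bin number for an in-range value.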
pachyderm.fit.base module#
Base module for performing fits with Minuit.
- class pachyderm.fit.base.BaseFitResult(parameters, free_parameters, fixed_parameters, values_at_minimum, errors_on_parameters, covariance_matrix, errors)#
Bases:
object
Base fit result.
This represents the most basic fit result.
- parameters#
Names of the parameters used in the fit.
- free_parameters#
Names of the free parameters used in the fit.
- fixed_parameters#
Names of the fixed parameters used in the fit.
- values_at_minimum#
Contains the values of the full RP fit function at the minimum. Keys are the names of parameters, while values are the numerical values at convergence.
- errors_on_parameters#
Contains the values of the errors associated with the parameters determined via the fit.
- covariance_matrix#
Contains the values of the covariance matrix. Keys are tuples with (param_name_a, param_name_b), and the values are covariance between the specified parameters. Note that fixed parameters are _not_ included in this matrix.
- errors#
Store the errors associated with the component fit function.
- Parameters:
- property correlation_matrix: dict[tuple[str, str], float]#
The correlation matrix of the free parameters.
These values are derived from the covariance matrix values stored in the fit.
Note
This property caches the correlation matrix value so we don’t have to calculate it every time.
- Parameters:
None –
- Returns:
The correlation matrix of the fit result.
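The relation between the two matrices is the standard one, corr(a, b) = cov(a, b) / (sigma_a * sigma_b). The sketch below applies it to a dict keyed by parameter-name tuples, matching the layout described for `covariance_matrix`. The function name is hypothetical, and caching is omitted.

```python
import math


def correlation_from_covariance(covariance):
    """Convert a {(name_a, name_b): covariance} dict into a correlation dict."""
    correlation = {}
    for (a, b), cov_ab in covariance.items():
        # The diagonal entries hold the variances, so their square roots
        # are the standard deviations of the individual parameters.
        sigma_a = math.sqrt(covariance[(a, a)])
        sigma_b = math.sqrt(covariance[(b, b)])
        correlation[(a, b)] = cov_ab / (sigma_a * sigma_b)
    return correlation


# Only free parameters appear in the covariance matrix.
cov = {("m", "m"): 4.0, ("m", "s"): 1.0, ("s", "m"): 1.0, ("s", "s"): 1.0}
corr = correlation_from_covariance(cov)
```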
- exception pachyderm.fit.base.FitFailed#
Bases:
Exception
Raised if the fit failed. The message will include further details.
- class pachyderm.fit.base.FitResult(parameters, free_parameters, fixed_parameters, values_at_minimum, errors_on_parameters, covariance_matrix, errors, x, n_fit_data_points, minimum_val)#
Bases:
BaseFitResult
Main fit result class.
Note
free_parameters + fixed_parameters == parameters
- parameters#
Names of the parameters used in the fit.
- free_parameters#
Names of the free parameters used in the fit.
- fixed_parameters#
Names of the fixed parameters used in the fit.
- values_at_minimum#
Contains the values of the full RP fit function at the minimum. Keys are the names of parameters, while values are the numerical values at convergence.
- errors_on_parameters#
Contains the values of the errors associated with the parameters determined via the fit.
- covariance_matrix#
Contains the values of the covariance matrix. Keys are tuples with (param_name_a, param_name_b), and the values are covariance between the specified parameters. Note that fixed parameters are _not_ included in this matrix.
- errors#
Store the errors associated with the component fit function.
- x#
x values where the fit result should be evaluated.
- n_fit_data_points#
Number of data points used in the fit.
- minimum_val#
Minimum value of the fit when it converges. This is the chi squared value for a chi squared minimization fit.
- effective_chi_squared(cost_func)#
Calculate the effective chi squared value.
If the fit was performed using a chi squared cost function, it’s just equal to the minimum_val. If it’s log likelihood, one must calculate the effective chi squared.
Note
We attempt to cache this value so we don’t have to calculate it every time.
- Parameters:
cost_function – Cost function used to create the fit function.
data – Data to be used to calculate the chi squared.
cost_func (
DataComparisonCostFunction
) –
- Return type:
- Returns:
The effective chi squared value.
- classmethod from_minuit(minuit, cost_func, x)#
Create a fit result from the Minuit fit object.
- class pachyderm.fit.base.FuncCode(args)#
Bases:
EqualityMixin
Minimal class to describe function arguments.
Same approach as is taken in iminuit. Note that the precise name of the parameters is extremely important.
- co_varnames#
Name of the function arguments.
- co_argcount#
Number of function arguments.
- co_argcount#
- co_varnames#
- classmethod from_function(func, leading_parameters_to_remove=1)#
Create a func_code from a function.
- pachyderm.fit.base.calculate_function_errors(func, fit_result, x)#
Calculate the errors of the given function based on values from the fit.
Note
We don’t take the x values for the fit_result as it may be desirable to calculate the errors for only a subset of x values. Plus, the component fit result doesn’t store the x values, so it would complicate the validation. It’s much easier to just require the user to pass the x values (and it takes little effort to do so).
- Parameters:
- Return type:
- Returns:
The calculated error values.
- pachyderm.fit.base.call_list_of_callables_with_operation(operation, functions, argument_positions, *args)#
Call and add a list of callables with the given args.
- Parameters:
- Return type:
- Returns:
Sum of the values of the functions.
- pachyderm.fit.base.chi_squared_probability(chi_2, ndf)#
Calculate the probability that the
This is just a thin wrapper around scipy.stats, but it’s convenient.
- pachyderm.fit.base.evaluate_gradient(func, fit_result, x)#
Evaluate the gradient of the given function based on the fit values.
For a function of 5 free parameters (7 total) and 10 x values, the returned result would be of the shape (10, 5).
- Parameters:
- Return type:
- Returns:
For each x value, the gradient is evaluated for each free parameter. It will be of the shape (len(x_values), len(free_parameters)).
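To make the shape convention concrete, here is a hedged sketch that evaluates a gradient by central finite differences and then propagates the parameter covariance to a per-point error. The differentiation method pachyderm actually uses is not stated here, the helper names are hypothetical, and the covariance is passed as a plain matrix rather than the dict stored on the fit result.

```python
import numpy as np


def numerical_gradient(func, x, values, free_parameters, step=1e-6):
    """Central-difference gradient of func(x, **values) for each free parameter.

    Returns an array of shape (len(x), len(free_parameters)).
    """
    x = np.asarray(x, dtype=float)
    gradient = np.zeros((len(x), len(free_parameters)))
    for i, name in enumerate(free_parameters):
        up = dict(values, **{name: values[name] + step})
        down = dict(values, **{name: values[name] - step})
        gradient[:, i] = (func(x, **up) - func(x, **down)) / (2 * step)
    return gradient


def propagate_errors(gradient, covariance):
    """Per-point error: sqrt(g^T C g) for each row g of the gradient."""
    return np.sqrt(np.einsum("ni,ij,nj->n", gradient, covariance, gradient))


def line(x, slope, offset):
    return slope * x + offset


x = np.array([0.0, 1.0, 2.0])
values = {"slope": 2.0, "offset": 1.0}
grad = numerical_gradient(line, x, values, ["slope", "offset"])
covariance = np.array([[0.04, 0.0], [0.0, 0.01]])
errors = propagate_errors(grad, covariance)
```

With three x values and two free parameters, `grad` has shape (3, 2), which mirrors the (10, 5) example in the description.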
- pachyderm.fit.base.extract_function_values(func, fit_result)#
Extract the parameters relevant to the given function from a fit result.
Note
The fit result may have more arguments at minimum and free parameters than the fit function that we’ve passed (for example, if we’re calculating the background parameters for the inclusive signal fit), so we need to determine the free parameters here.
- pachyderm.fit.base.merge_func_codes(functions, prefixes=None, skip_prefixes=None)#
Merge the arguments of the given functions into one func_code.
Note
This has very similar functionality and is heavily inspired by Probfit.merge_func_code(...).
- Parameters:
functions (Iterable[Callable[..., float]]) – Functions whose arguments are to be merged.
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.
- Return type:
- Returns:
Merged list of arguments, map from merged arguments to arguments for each individual function.
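The idea of merging arguments can be sketched as follows. This is an illustration only: `merge_arguments` is a hypothetical helper, it reads names with `inspect.signature` rather than a func_code object, it leaves the first argument (taken to be the independent variable) unprefixed as an assumption, and it does not implement `skip_prefixes`.

```python
import inspect


def merge_arguments(functions, prefixes=None):
    """Merge argument names of several functions, sharing identical names."""
    merged = []
    positions = []
    for i, func in enumerate(functions):
        names = list(inspect.signature(func).parameters)
        if prefixes is not None:
            # Assumption: the first argument is the independent variable
            # and stays unprefixed so that all functions share it.
            names = names[:1] + [f"{prefixes[i]}{name}" for name in names[1:]]
        for name in names:
            if name not in merged:
                merged.append(name)
        # Where each of this function's arguments sits in the merged list.
        positions.append([merged.index(name) for name in names])
    return merged, positions


def signal(x, mean, width):
    return 0.0


def background(x, slope, width):
    return 0.0


merged, positions = merge_arguments([signal, background])
prefixed, _ = merge_arguments([signal, background], prefixes=["s_", "b_"])
```

Without prefixes, `width` is shared between the two functions; with prefixes, each function gets its own copy.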
pachyderm.fit.cost_function module#
Models for fitting.
- class pachyderm.fit.cost_function.BinnedChiSquared(*args, **kwargs)#
Bases:
DataComparisonCostFunction
Binned chi^2 cost function.
Calling this class will calculate the chi squared. Implemented with some help from …
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
Data to be used for fitting.
- _cost_function#
Function to be used to calculate the actual cost function.
- class pachyderm.fit.cost_function.BinnedLogLikelihood(use_weights=False, *args, **kwargs)#
Bases:
DataComparisonCostFunction
Binned log likelihood cost function.
Calling this class will calculate the chi squared. Implemented with some help from …
- Parameters:
f – The fit function.
data – Data to be used for fitting.
use_weights (bool) – Whether to use the data weights when calculating the cost function. This is equivalent to the “WL” option in ROOT. Default: False.
additional_call_options – Additional keyword options to be passed when calling the cost function.
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
Data to be used for fitting.
- _cost_function#
Function to be used to calculate the actual cost function.
- class pachyderm.fit.cost_function.ChiSquared(*args, **kwargs)#
Bases:
DataComparisonCostFunction
chi^2 cost function.
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
Data to be used for fitting.
- _cost_function#
Function to be used to calculate the actual cost function.
- class pachyderm.fit.cost_function.CostFunctionBase(f, data, **additional_call_options)#
Bases:
ABC
Base cost function.
- Parameters:
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
Data to be used for fitting.
- additional_call_options#
Additional keyword options to be passed when calling the cost function.
- _cost_function#
Function to be used to calculate the actual cost function.
- class pachyderm.fit.cost_function.DataComparisonCostFunction(*args, **kwargs)#
Bases:
CostFunctionBase
Cost function which needs comparison data, the points where it was evaluated, and the errors.
This is in contrast to those which only need the input data. Examples of cost functions needing comparison data include the chi squared (both unbinned and binned), as well as the binned log likelihood.
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
numpy array of all input values (not binned in any way). It’s just a list of the values.
- _cost_function#
Function to be used to calculate the actual cost function.
- class pachyderm.fit.cost_function.LogLikelihood(*args, **kwargs)#
Bases:
StandaloneCostFunction
Log likelihood cost function.
Calling this class will calculate the chi squared. Implemented with some help from …
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
Data to be used for fitting.
- _cost_function#
Function to be used to calculate the actual cost function.
- class pachyderm.fit.cost_function.SimultaneousFit(*cost_functions)#
Bases:
EqualityMixin
Cost function for the simultaneous fit of the given cost functions.
- Parameters:
cost_functions (Union[TypeVar(T_CostFunction, bound=CostFunctionBase), SimultaneousFit]) – The cost functions.
- functions#
The cost functions.
- func_code#
Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.
- argument_positions#
Map of merged arguments to the arguments for each individual function.
- class pachyderm.fit.cost_function.StandaloneCostFunction(*args, **kwargs)#
Bases:
CostFunctionBase
Cost function which only needs a list of input data.
This is in contrast to those which need data to compare against at each point. One example of a cost function which only needs the input data is the unbinned log likelihood.
- f#
The fit function.
- func_code#
Function arguments derived from the fit function. They need to be separately specified to allow iminuit to determine the proper arguments.
- data#
numpy array of all input values (not binned in any way). It’s just a list of the values.
- _cost_function#
Function to be used to calculate the actual cost function.
- pachyderm.fit.cost_function.binned_chi_squared_safe_for_zeros(x, y, errors, bin_edges, f, *args)#
Actual implementation of the binned chi squared.
See _binned_chi_squared for further information. This function is just the standard binned chi squared, but the division is protected from divide by 0. This allows safe use when calculating a binned chi squared.
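The protected division can be sketched with `numpy.divide` and its `where` argument. This is a simplified point-wise version under stated assumptions: the expected values are passed in directly instead of being computed from `f` and `bin_edges`, and the function name is hypothetical.

```python
import numpy as np


def chi_squared_safe_for_zeros(y, expected, errors):
    """Sum of squared pulls, skipping points whose error is exactly zero."""
    y, expected, errors = (np.asarray(a, dtype=float) for a in (y, expected, errors))
    residuals = y - expected
    # Only divide where the error is non-zero. Entries with zero error keep
    # the value from ``out`` (0), so they do not contribute to the sum.
    pulls = np.divide(residuals, errors, out=np.zeros_like(residuals), where=errors != 0)
    return float(np.sum(pulls**2))


chi2 = chi_squared_safe_for_zeros(
    y=[1.0, 2.0, 5.0],
    expected=[1.5, 2.0, 3.0],
    errors=[0.5, 1.0, 0.0],
)
```

The third point has zero error and is skipped, so only the first point contributes to the total.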
- pachyderm.fit.cost_function.unravel_simultaneous_fits(functions)#
Unravel the cost functions from possible simultaneous fit objects.
The functions are unraveled by recursively retrieving the functions from existing SimultaneousFit objects that may be in the list of passed functions. The cost functions store their fit data, so they are fully self contained. Consequently, we are okay to fully unravel the functions without worrying about the intermediate SimultaneousFit objects.
- Parameters:
functions (Iterable[CostFunctionBase | SimultaneousFit]) – Functions to unravel.
- Return type:
- Returns:
Iterator of the base cost functions.
pachyderm.fit.function module#
Functions for use with fitting.
- class pachyderm.fit.function.AddPDF(*functions, prefixes=None, skip_prefixes=None)#
Bases:
CombinePDF
Add functions (PDFs) together.
- Parameters:
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.
- functions#
List of functions that are combined in the PDF.
- func_code#
Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.
- argument_positions#
Map of merged arguments to the arguments for each individual function.
- class pachyderm.fit.function.CombinePDF(*functions, prefixes=None, skip_prefixes=None)#
Bases:
EqualityMixin, ABC
Combine functions (PDFs) together.
- Parameters:
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.
- functions#
List of functions that are combined in the PDF.
- func_code#
Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.
- argument_positions#
Map of merged arguments to the arguments for each individual function.
- class pachyderm.fit.function.DividePDF(*functions, prefixes=None, skip_prefixes=None)#
Bases:
CombinePDF
Divide functions (PDFs) together.
- Parameters:
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.
- functions#
List of functions that are combined in the PDF.
- func_code#
Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.
- argument_positions#
Map of merged arguments to the arguments for each individual function.
- class pachyderm.fit.function.MultiplyPDF(*functions, prefixes=None, skip_prefixes=None)#
Bases:
CombinePDF
Multiply functions (PDFs) together.
- Parameters:
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.
- functions#
List of functions that are combined in the PDF.
- func_code#
Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.
- argument_positions#
Map of merged arguments to the arguments for each individual function.
- class pachyderm.fit.function.SubtractPDF(*functions, prefixes=None, skip_prefixes=None)#
Bases:
CombinePDF
Subtract functions (PDFs) together.
- Parameters:
prefixes (Sequence[str] | None) – Prefix for arguments of each function. Default: None. If specified, there must be one prefix for each function.
skip_prefixes (Sequence[str] | None) – Prefixes to skip when assigning prefixes. As noted in probfit, this can be useful to mix prefixed and non-prefixed arguments. Default: None.
- functions#
List of functions that are combined in the PDF.
- func_code#
Function arguments derived from the fit functions. They need to be separately specified to allow iminuit to determine the proper arguments.
- argument_positions#
Map of merged arguments to the arguments for each individual function.
- pachyderm.fit.function.extended_gaussian(x, mean, sigma, amplitude)#
Extended gaussian.
\[f = \frac{A}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x - \mu)^{2}}{2 \sigma^{2}}\right)\]
- Parameters:
- Return type:
- Returns:
Calculated gaussian value(s).
- pachyderm.fit.function.gaussian(x, mean, sigma)#
Normalized gaussian.
\[f = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(x - \mu)^{2}}{2 \sigma^{2}}\right)\]
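The two formulas translate directly into NumPy. The sketch below reuses the documented signatures, but it is a re-derivation from the formulas (with `mean` playing the role of mu), not a copy of the library source.

```python
import numpy as np


def gaussian(x, mean, sigma):
    """Normalized gaussian: integrates to 1 over the real line."""
    return 1.0 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-((x - mean) ** 2) / (2 * sigma**2))


def extended_gaussian(x, mean, sigma, amplitude):
    """Gaussian scaled by an amplitude, so it integrates to ``amplitude``."""
    return amplitude * gaussian(x, mean, sigma)


peak = gaussian(0.0, mean=0.0, sigma=1.0)
scaled = extended_gaussian(0.0, mean=0.0, sigma=1.0, amplitude=3.0)

# Numerical check of the normalization with a simple Riemann sum.
xs = np.linspace(-10.0, 10.0, 2001)
area = float(np.sum(gaussian(xs, 0.0, 1.0)) * (xs[1] - xs[0]))
```

Both functions accept scalars or arrays for `x`, since every operation broadcasts.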
pachyderm.generic_class module#
Contains generic classes
- class pachyderm.generic_class.EqualityMixin#
Bases:
object
Mixin generic comparison operations using __dict__.
Can then be mixed into any other class using multiple inheritance.
Inspired by: https://stackoverflow.com/a/390511.
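A minimal sketch of a `__dict__`-based equality mixin, in the spirit of the linked answer. The class names are hypothetical and the real `EqualityMixin` may differ in detail.

```python
class EqualityMixinSketch:
    """Compare instances by their attribute dictionaries."""

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        # Let Python try the reflected comparison or fall back to identity.
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


class Point(EqualityMixinSketch):
    def __init__(self, x, y):
        self.x = x
        self.y = y


same = Point(1, 2) == Point(1, 2)
different = Point(1, 2) == Point(1, 3)
```

Note that defining `__eq__` without `__hash__` makes instances unhashable, so classes using such a mixin cannot be used as dict keys unless they define `__hash__` themselves.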
pachyderm.generic_config module#
Analysis configuration base module.
For usage information, see jet_hadron.base.analysis_config.
- pachyderm.generic_config.apply_formatting_dict(obj, formatting)#
Recursively apply a formatting dict to all strings in a configuration.
Note that it skips applying the formatting if the string appears to contain latex (specifically, if it contains an “$”), since the formatting fails on nested brackets.
- pachyderm.generic_config.create_key_index_object(key_index_name, iterables)#
Create a KeyIndex class based on the passed attributes.
This is wrapped into a helper function to allow for the __iter__ to be specified for the object. Further, this allows it to be called outside the package when it is needed in analysis tasks.
- Parameters:
- Return type:
- Returns:
A KeyIndex class which can be used to specify an object. The keys and values will be iterable.
- Raises:
TypeError – If one of the iterables which is passed is an iterator that can be exhausted. The iterables must all be passed within containers which can recreate the iterator each time it is called to iterate.
- pachyderm.generic_config.create_objects_from_iterables(obj, args, iterables, formatting_options, key_index_name='KeyIndex')#
Create objects for each set of values based on the given arguments.
The iterable values are available under a key index dataclass which is used to index the returned dictionary. The names of the fields are determined by the keys of iterables dictionary. The values are the newly created object. Note that the iterable values must be convertible to a str() so they can be included in the formatting dictionary.
Each set of values is also included in the object args.
As a basic example,
```pycon
>>> create_objects_from_iterables(
...     obj=obj,
...     args={},
...     iterables={"a": ["a1", "a2"], "b": ["b1", "b2"]},
...     formatting_options={},
... )
(
    KeyIndex,
    {"a": ["a1", "a2"], "b": ["b1", "b2"]},
    {
        KeyIndex(a = "a1", b = "b1"): obj(a = "a1", b = "b1"),
        KeyIndex(a = "a1", b = "b2"): obj(a = "a1", b = "b2"),
        KeyIndex(a = "a2", b = "b1"): obj(a = "a2", b = "b1"),
        KeyIndex(a = "a2", b = "b2"): obj(a = "a2", b = "b2"),
    }
)
```
- Parameters:
obj – The object to be constructed.
args – Arguments to be passed to the object to create it.
iterables – Iterables to be used to create the objects, with entries of the form "name_of_iterable": iterable.
formatting_options – Values to be used in formatting strings in the arguments.
key_index_name – Name of the iterable key index.
- Returns:
Roughly, (KeyIndex, iterables, objects). Specifically, the key_index is a new dataclass which defines the parameters used to create the object, and iterables are the iterables used to create the objects, with names as keys and the iterables as values. The objects dictionary keys are KeyIndex objects which describe the iterable arguments passed to the object, while the values are the newly constructed objects. See the example above.
- Return type:
(object, list, dict, dict)
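The core of this pattern, one object per combination of iterable values indexed by a generated dataclass, can be sketched with `itertools.product` and `dataclasses.make_dataclass`. The function and class names below are hypothetical, and the string formatting step driven by `formatting_options` is left out.

```python
import dataclasses
import itertools


def create_objects_sketch(obj, args, iterables, key_index_name="KeyIndex"):
    """Build one object per combination of the iterable values."""
    # A frozen dataclass is hashable, so its instances can be dict keys.
    KeyIndex = dataclasses.make_dataclass(key_index_name, list(iterables), frozen=True)
    names = list(iterables)
    objects = {}
    for values in itertools.product(*iterables.values()):
        selected = dict(zip(names, values))
        # Each set of values is also passed to the object being created.
        objects[KeyIndex(**selected)] = obj(**args, **selected)
    return KeyIndex, iterables, objects


@dataclasses.dataclass
class Analysis:
    a: str
    b: str


KeyIndex, iterables, objects = create_objects_sketch(
    obj=Analysis,
    args={},
    iterables={"a": ["a1", "a2"], "b": ["b1", "b2"]},
)
```

Two iterables with two values each give four objects, which can be looked up as `objects[KeyIndex(a="a1", b="b2")]`.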
- pachyderm.generic_config.determine_override_options(selected_options, override_opts, set_of_possible_options=None)#
Recursively extract the dict described in override_options().
In particular, this searches for selected options in the override_opts dict. It stores only the override options that are selected.
- Parameters:
selected_options (tuple[Any, ...]) – The options selected for this analysis, in the order used with override_options() and in the configuration file.
override_opts (CommentedMap) – dict-like object returned by ruamel.yaml which contains the options that should be used to override the configuration options.
set_of_possible_options (tuple of enums) – Possible options for the override value categories.
- Return type:
- pachyderm.generic_config.determine_selection_of_iterable_values_from_config(config, possible_iterables)#
Determine iterable values to use to create objects for a given configuration.
All values of an iterable can be included by setting the value to True (not as a single value list, but as the only value). Alternatively, an iterator can be disabled by setting the value to False.
- class pachyderm.generic_config.formatting_dict#
-
Dict to handle missing keys when formatting a string.
It returns the missing key for later use in formatting. See: https://stackoverflow.com/a/17215533
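The technique from the linked answer relies on `dict.__missing__` together with `str.format_map`. A minimal sketch, with a hypothetical class name:

```python
class FormattingDictSketch(dict):
    """Dict that returns the placeholder itself for any missing key."""

    def __missing__(self, key):
        # Re-emit "{key}" so the placeholder survives for a later format call.
        return "{" + key + "}"


# Only "a" is known in the first pass; "{b}" is left untouched.
partial = "{a}_{b}".format_map(FormattingDictSketch(a="first"))
# A second pass can fill in the remaining placeholder.
complete = partial.format(b="second")
```

`format_map` is used instead of `format(**mapping)` because the latter copies the mapping into a plain dict, which would bypass `__missing__`.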
- pachyderm.generic_config.iterate_with_selected_objects(analysis_objects, **selections)#
Iterate over an analysis dictionary with selected attributes.
- Parameters:
- Yields:
object – Matching analysis object.
- Return type:
- pachyderm.generic_config.iterate_with_selected_objects_in_order(analysis_objects, analysis_iterables, selection)#
Iterate over an analysis dictionary, yielding the selected attributes in order.
So if there are three iterables, a, b, and c, if we selected c, then we iterate over a and b, and return c in the same order each time for each set of values of a and b. As an example, consider the set of iterables:
```python
a = ["a1", "a2"]
b = ["b1", "b2"]
c = ["c1", "c2"]
```
then it will effectively return:
```pycon
>>> for a_val in a:
...     for b_val in b:
...         for c_val in c:
...             obj(a_val, b_val, c_val)
```
This will yield:
```pycon
>>> output = list(iterate_with_selected_objects_in_order(..., selection = ["a"]))
[[("a1", "b1", "c1"), ("a2", "b1", "c1")], [("a1", "b2", "c1"), ("a2", "b2", "c1")], ...]
```
This is particularly nice because we can then select on a set of iterables to be returned without having to specify the rest of the iterables that we don’t really care about.
- Parameters:
analysis_objects (Mapping[TypeVar(_T_Key), TypeVar(_T_Analysis)]) – Analysis objects dictionary.
analysis_iterables (Mapping[str, Sequence[Any]]) – Iterables used in constructing the analysis objects.
selection (str | Sequence[str]) – Selection of analysis selections to return. Can be either a string or a sequence of selections.
- Yields:
object – Matching analysis object.
- Return type:
Iterator[list[tuple[TypeVar(_T_Key), TypeVar(_T_Analysis)]]]
- pachyderm.generic_config.load_configuration(yaml, filename)#
Load an analysis configuration from a file.
- pachyderm.generic_config.override_options(config, selected_options, set_of_possible_options, config_containing_override=None)#
Determine override options for a particular configuration.
The options are determined by searching following the order specified in selected_options.
For the example config,
value: 3
override:
    2.76:
        track:
            value: 5
value will be assigned the value 5 if we are at 2.76 TeV with a track bias, regardless of the event activity or leading hadron bias. The order of this configuration is specified by the order of the selected_options passed. The above example configuration is from the jet-hadron analysis.
Since anchors aren’t kept for scalar values, if you want to override an anchored value, you need to specify it as a single value in a list (or dict, but list is easier). After the anchor values propagate, single element lists can be converted into scalar values using simplify_data_representations().
- Parameters:
config (CommentedMap) – The dict-like configuration from ruamel.yaml which should be overridden.
selected_options (tuple[Any, ...]) – The selected analysis options. They will be checked in the order with which they are passed, so make certain that it matches the order in the configuration file!
set_of_possible_options (tuple of enums) – Possible options for the override value categories.
config_containing_override (CommentedMap | None) – The dict-like config containing the override options in a map called “override”. If it is not specified, it will look for it in the main config.
- Returns:
The updated configuration
- Return type:
dict-like object
- pachyderm.generic_config.simplify_data_representations(config)#
Convert single-entry lists to their scalar value.
This step is necessary because anchors are not kept for scalar values - just for lists and dictionaries. Now that we are done with all of our anchor references, we can convert these single entry lists to just the scalar entry, which is more usable.
Some notes on anchors in ruamel.yaml are here: https://stackoverflow.com/a/48559644
- Parameters:
config (CommentedMap) – The dict-like configuration from ruamel.yaml which should be simplified.
- Return type:
CommentedMap
- Returns:
The updated configuration.
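The conversion can be sketched as a short recursive walk. This version works on plain dicts rather than a ruamel.yaml `CommentedMap`, the function name is hypothetical, and the real function may treat other container types differently.

```python
def simplify_sketch(config):
    """Replace single-entry lists with their only element, recursing into dicts."""
    for key, value in config.items():
        if isinstance(value, dict):
            simplify_sketch(value)
        elif isinstance(value, list) and len(value) == 1:
            # Reassigning an existing key while iterating is safe because
            # the set of keys does not change.
            config[key] = value[0]
    return config


config = {
    "value": [3],
    "names": ["a", "b"],
    "nested": {"threshold": [0.5], "label": "jets"},
}
simplified = simplify_sketch(config)
```

Lists with more than one entry are left as they are, so only the values that were wrapped to preserve an anchor get unwrapped.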
pachyderm.histogram module#
Histogram related classes and functionality.
- class pachyderm.histogram.Histogram1D(bin_edges, y, errors_squared, metadata=<factory>)#
Bases:
object
Contains histogram data.
Note
Underflow and overflow bins are excluded!
When converting from a TH1 (either from ROOT or uproot), additional statistical information will be extracted from the hist to enable the calculation of additional properties. The information available is:
Total sum of weights (equal to np.sum(self.y), which we store)
Total sum of weights squared (equal to np.sum(self.errors_squared), which we store)
Total sum of weights * x
Total sum of weights * x * x
Each is a single float value. Since the latter two values are unique, they are stored in the metadata.
- Parameters:
bin_edges (np.ndarray) – The histogram bin edges.
y (np.ndarray) – The histogram bin values.
errors_squared (np.ndarray) – The bin sum weight squared errors.
- x#
The bin centers.
- Type:
np.ndarray
- y#
The bin values.
- Type:
np.ndarray
- bin_edges#
The bin edges.
- Type:
np.ndarray
- errors#
The bin errors.
- Type:
np.ndarray
- errors_squared#
The bin sum weight squared errors.
- Type:
np.ndarray
- metadata#
Any additional metadata that should be stored with the histogram. Keys are expected to be strings, while the values can be anything. For example, could contain systematic errors, etc.
- Type:
- property bin_widths: ndarray[Any, dtype[Any]]#
Bin widths calculated from the bin edges.
- Returns:
Array of the bin widths.
- copy()#
Copies the object.
In principle, this should be the same as copy.deepcopy(...), at least when this was written in Feb 2019. But deepcopy(...) often seems to have very bad performance (and perhaps does additional implicit copying), so we copy these numpy arrays by hand.
- counts_in_interval(min_value=None, max_value=None, min_bin=None, max_bin=None)#
Count the number of counts within bins in an interval.
Note
The integration limits could be described as inclusive. This matches the ROOT convention. See histogram1D._integral(...) for further details on how these limits are determined.
Note
The arguments can be mixed (ie. a min bin and a max value), so be careful!
- find_bin(value)#
Find the bin corresponding to the specified value.
For further information, see find_bin(...) in this module.
Note
Bins are 0-indexed here, while in ROOT they are 1-indexed.
- classmethod from_existing_hist(hist)#
Convert an existing histogram.
Note
Underflow and overflow bins are excluded! Bins are assumed to be fixed size.
- Parameters:
hist (uproot.rootio.TH1* or ROOT.TH1) – Histogram to be converted.
- Returns:
Dataclass with x, y, and errors
- Return type:
Histogram
- classmethod from_hepdata(hist, extraction_function=<function _extract_values_from_hepdata_dependent_variable>)#
Convert (a set of) HEPdata histogram(s) to a Histogram1D.
Will include any information that the extraction function extracts and returns.
Note
This is not included in the from_existing_hist(...) function because HEPdata files are oriented towards potentially containing multiple histograms in a single object. So we just return all of them and let the user sort it out.
Note
It only grabs the first independent variable to determine the x axis.
- Parameters:
extraction_function (Callable[[Mapping[str, Any]], tuple[list[float] | ndarray[Any, dtype[Any]], list[float] | ndarray[Any, dtype[Any]], dict[str, Any]]]) – Extract values from HEPdata dict to be used to construct a histogram. Default: Retrieves y values and symmetric statistical errors. Symmetric systematic errors are stored in the metadata.
- Return type:
- Returns:
List of Histogram1D constructed from the input HEPdata.
- integral(min_value=None, max_value=None, min_bin=None, max_bin=None)#
Integrate the histogram over the given range.
Note
Be very careful here! The equivalent of TH1::Integral(…) is counts_in_interval(..). That’s because when we multiply by the bin width, we implicitly should be resetting the stats. We will still get the right answer in terms of y and errors_squared, but if this result is used to normalize the hist, the stats will be wrong. We can’t just reset them here because the integral doesn’t modify the histogram.
Note
The integration limits could be described as inclusive. This matches the ROOT convention. See histogram1D._integral(...) for further details on how these limits are determined.
Note
The arguments can be mixed (ie. a min bin and a max value), so be careful!
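To make the distinction between the two methods concrete, here is a small dependency-free sketch. The helper names are hypothetical and it ignores the min/max selection entirely; it only shows that counting sums the bin contents, while integrating additionally multiplies each bin by its width.

```python
def counts_in_bins(y: list[float]) -> float:
    """Sum of the bin contents, with no bin width factor."""
    return sum(y)


def integral_over_bins(bin_edges: list[float], y: list[float]) -> float:
    """Sum of the bin contents multiplied by the corresponding bin widths."""
    widths = [high - low for low, high in zip(bin_edges[:-1], bin_edges[1:])]
    return sum(value * width for value, width in zip(y, widths))


bin_edges = [0.0, 0.5, 1.0, 2.0]
y = [4.0, 2.0, 1.0]
counts = counts_in_bins(y)
integral = integral_over_bins(bin_edges, y)
```

The two results agree only when every bin has unit width, which is why normalizing by one when the other was intended gives a different scale.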
- property mean: float#
Mean of values filled into the histogram.
Calculated in the same way as ROOT and physt.
- Parameters:
None. –
- Returns:
Mean of the histogram.
- property n_entries: float#
The number of entries in the hist.
Note
This value is dependent on the weight. We don’t have a weight independent measure like a ROOT hist, so this value won’t agree with the number of entries from a weighted ROOT hist.
- property std_dev: float#
Standard deviation of the values filled into the histogram.
Calculated in the same way as ROOT and physt.
- Parameters:
None. –
- Returns:
Standard deviation of the histogram.
- property variance: float#
Variance of the values filled into the histogram.
Calculated in the same way as ROOT and physt.
- Parameters:
None. –
- Returns:
Variance of the histogram.
- class pachyderm.histogram.RootOpen(filename, mode='read')#
Bases:
AbstractContextManager[_T_ContextManager]
Very simple helper to open ROOT files.
- pachyderm.histogram.binned_mean(stats)#
Mean of values stored in the histogram.
Calculated in the same way as ROOT and physt.
- pachyderm.histogram.binned_standard_deviation(stats)#
Standard deviation of the values filled into the histogram.
Calculated in the same way as ROOT and physt.
- pachyderm.histogram.binned_variance(stats)#
Variance of the values filled into the histogram.
Calculated in the same way as ROOT and physt.
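For orientation, the sketch below computes these three quantities from the weighted sums listed in the Histogram1D description. The dictionary keys (sum_w, sum_wx, sum_wx2) and the function names are assumptions made for this example, not necessarily what the stats object uses. The formulas are the usual weighted moments, which as far as we understand is also how ROOT derives its mean and standard deviation.

```python
import math


def sketch_mean(stats: dict[str, float]) -> float:
    """Weighted mean: sum(w * x) / sum(w)."""
    return stats["sum_wx"] / stats["sum_w"]


def sketch_variance(stats: dict[str, float]) -> float:
    """Weighted variance: sum(w * x^2) / sum(w) - mean^2."""
    return stats["sum_wx2"] / stats["sum_w"] - sketch_mean(stats) ** 2


def sketch_standard_deviation(stats: dict[str, float]) -> float:
    """Square root of the variance; abs() guards against tiny negative round-off."""
    return math.sqrt(abs(sketch_variance(stats)))


# Hypothetical stats values chosen so that the arithmetic is easy to follow.
stats = {"sum_w": 4.0, "sum_wx": 8.0, "sum_wx2": 20.0}
mean = sketch_mean(stats)
variance = sketch_variance(stats)
std_dev = sketch_standard_deviation(stats)
```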
- pachyderm.histogram.calculate_binned_stats(bin_edges, y, weights_squared)#
Calculate the stats needed to fully determine histogram properties.
The values are calculated the same way as in ROOT.TH1.GetStats(...). Recalculating the statistics is not ideal because information is lost compared to the information available when filling the histogram. In particular, the actual passed x value is used to calculate these values when filling, but we can only approximate this with the bin center when calculating these values later. Calculating them here is equivalent to calling hist.ResetStats() before retrieving the stats.
These results are accessible from the ROOT hist using ctypes via:
>>> stats = np.array([0, 0, 0, 0], dtype = np.float64)
>>> hist.GetStats(np.ctypeslib.as_ctypes(stats))
Note
sum_w and sum_w2 calculated here are _not_ equal to the ROOT values if the histogram has been scaled. This is because the weights don’t change even if the histogram has been scaled. If the hist stats are reset, it loses this piece of information and has to reconstruct the stats from the current frequencies, such that it will then agree with this function.
- Parameters:
- Return type:
- Returns:
Stats dict containing the newly calculated statistics.
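The bin center approximation described above can be sketched without any dependencies as follows. The function name and the dictionary keys are hypothetical, and plain lists stand in for numpy arrays; it is meant to show the arithmetic, not to reproduce the library's return type.

```python
def sketch_binned_stats(
    bin_edges: list[float], y: list[float], weights_squared: list[float]
) -> dict[str, float]:
    """Approximate the four histogram sums using the bin centers as the x values."""
    centers = [(low + high) / 2 for low, high in zip(bin_edges[:-1], bin_edges[1:])]
    return {
        "sum_w": sum(y),
        "sum_w2": sum(weights_squared),
        "sum_wx": sum(w * x for w, x in zip(y, centers)),
        "sum_wx2": sum(w * x * x for w, x in zip(y, centers)),
    }


stats = sketch_binned_stats(
    bin_edges=[0.0, 1.0, 2.0, 3.0],
    y=[1.0, 2.0, 1.0],
    weights_squared=[1.0, 2.0, 1.0],
)
```

Because every entry in a bin is treated as if it sat at the bin center, the last two sums are only as precise as the binning allows.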
- pachyderm.histogram.find_bin(bin_edges, value)#
Determine the index position where the value should be inserted.
This is basically ROOT.TH1.FindBin(value), but it can be used for any set of bin_edges.
Note
Bins are 0-indexed here, while in ROOT they are 1-indexed.
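One way to picture the lookup is with the standard library bisect module, as in the sketch below. The function name is hypothetical, and we assume the ROOT-like convention that the lower bin edge is inclusive; how the library treats values outside of the edges is not covered here.

```python
import bisect


def sketch_find_bin(bin_edges: list[float], value: float) -> int:
    """Return the 0-indexed bin containing value, with the lower edge inclusive."""
    # bisect_right returns the insertion point after any edge equal to value,
    # so subtracting one gives the index of the bin that starts at or below it.
    return bisect.bisect_right(bin_edges, value) - 1


edges = [0.0, 1.0, 2.0, 3.0]
inside = sketch_find_bin(edges, 1.5)
on_edge = sketch_find_bin(edges, 2.0)
```

Here inside is 1 and on_edge is 2, which correspond to ROOT bins 2 and 3 respectively once the 1-indexing is taken into account.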
- pachyderm.histogram.get_array_from_hist2D(hist, set_zero_to_NaN=True, return_bin_edges=False)#
Extract x, y, and bin values from a 2D ROOT histogram.
Converts the histogram into a numpy array, and suitably processes it for a surface plot by removing 0s (which can cause problems when taking logs), and returning a set of (x, y) mesh values utilizing either the bin edges or bin centers.
Note
This is a different format than the 1D version!
- Parameters:
hist (ROOT.TH2) – Histogram to be converted.
set_zero_to_NaN (bool) – If true, set 0 in the array to NaN. Useful with matplotlib so that it will ignore the values when plotting. See comments in this function for more details. Default: True.
return_bin_edges (bool) – Return x and y using bin edges instead of bin centers.
- Return type:
tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]
- Returns:
Contains (x values, y values, numpy array of hist data) where (x, y) are values on a grid (from np.meshgrid) using the selected bin values.
- pachyderm.histogram.get_bin_edges_from_axis(axis)#
Get bin edges from a ROOT hist axis.
Note
Doesn’t include over- or underflow bins!
- pachyderm.histogram.get_histograms_in_file(filename)#
Helper function which gets all histograms in a file.
- pachyderm.histogram.get_histograms_in_list(filename, list_name=None)#
Get histograms from the file and make them available in a dict.
Lists are recursively explored, with all lists converted to dictionaries, such that the returned dictionary contains only hists and dictionaries of hists (ie. there are no ROOT TCollection derived objects).
- Parameters:
- Return type:
- Returns:
- Contains hists with keys as their names. Lists are recursively added, mirroring
the structure under which the hists were stored.
- Raises:
ValueError – If the list could not be found in the given file.
pachyderm.plot module#
Plotting styling and utilities.
- class pachyderm.plot.AxisConfig(axis, label='', log=False, range=None, font_size=None, tick_font_size=None, use_major_axis_multiple_locator_with_base=None)#
Bases:
object
Configuration for an axis.
- Parameters:
- apply(ax)#
Apply the axis configuration to the given axis.
- Parameters:
ax (Axes) –
- Return type:
- class pachyderm.plot.Figure(edge_padding=_Nothing.NOTHING, text=_Nothing.NOTHING)#
Bases:
object
Configuration for a MPL figure.
- Parameters:
text (TextConfig | Sequence[TextConfig]) –
- apply(fig)#
Apply the figure configuration to the given figure.
- text: Sequence[TextConfig]#
- class pachyderm.plot.LegendConfig(location=None, anchor=None, font_size=None, ncol=1, marker_label_spacing=None, label_spacing=None, column_spacing=None, handle_height=None, handler_map=_Nothing.NOTHING)#
Bases:
object
Configuration for a legend on a plot.
- Parameters:
- apply(ax, legend_handles=None, legend_labels=None)#
Apply the legend configuration to the given axis.
Note
If provided, we’ll use the given legend_handles and legend_labels to create the legend rather than those already associated with the legend.
- class pachyderm.plot.Panel(axes, text=_Nothing.NOTHING, legend=None, title=None)#
Bases:
object
Configuration for a panel within a plot.
The Panel is a configuration for an ax object.
- axes#
Configuration of the MPL axis. We allow for multiple AxisConfig because each config specifies a single axis (ie. x or y). Careful not to confuse with the actual ax object provided by MPL.
- Parameters:
axes (AxisConfig | Sequence[AxisConfig]) –
text (TextConfig | Sequence[TextConfig]) –
legend (LegendConfig | None) –
title (TitleConfig | None) –
- apply(ax, legend_handles=None, legend_labels=None)#
Apply the panel configuration to the given axis.
- axes: Sequence[AxisConfig]#
- legend: LegendConfig | None#
- text: Sequence[TextConfig]#
- title: TitleConfig | None#
- class pachyderm.plot.PlotConfig(name, panels, figure=_Nothing.NOTHING)#
Bases:
object
Configuration for an overall plot.
A plot consists of some number of panels, which are each configured with their own axes, text, etc. These axes are on a figure.
- name#
Name of the plot. Usually used for the filename.
- panels#
Configuration for the panels of the plot.
- figure#
Configuration for the figure of the plot.
- apply(fig, ax=None, axes=None, legend_handles=None, legend_labels=None)#
Apply the plot configuration to the given figure and axes.
- class pachyderm.plot.TextConfig(text, x, y, alignment=None, color='black', font_size=None, text_kwargs=_Nothing.NOTHING)#
Bases:
object
Configuration for text on a plot.
- Parameters:
- apply(ax_or_fig)#
Apply the text configuration to the given axis or figure.
- class pachyderm.plot.TitleConfig(text, size=None)#
Bases:
object
Configuration for a title of a plot.
- apply(ax)#
Apply the title configuration to the given axis.
- Parameters:
ax (Axes) –
- Return type:
- pachyderm.plot.configure(disable_interactive_backend=False)#
Configure matplotlib according to my (biased) specification.
As a high level summary, this is a combination of a number of seaborn settings, along with my own tweaks. By calling this function, the matplotlib rcParams will be modified according to these settings.
Up to this point, the settings have been configured by importing the jet_hadron.plot.base module, which set a variety of parameters on import. This included some options which were set by seaborn. Additional modifications were made to the fonts to ensure that they are the same in labels and latex. Lastly, it tweaked smaller visual settings. The differences between the default matplotlib and these settings are:
```pycon
>>> pprint.pprint(diff)
{'axes.axisbelow': 'original: line, new: True',
 'axes.edgecolor': 'original: black, new: .15',
 'axes.labelcolor': 'original: black, new: .15',
 'axes.labelsize': 'original: medium, new: 12.0',
 'axes.linewidth': 'original: 0.8, new: 1.25',
 'axes.prop_cycle': "original: cycler('color', ['#1f77b4', '#ff7f0e', "
                    "'#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', "
                    "'#7f7f7f', '#bcbd22', '#17becf']), new: cycler('color', "
                    '[(0.2980392156862745, 0.4470588235294118, '
                    '0.6901960784313725), (0.8666666666666667, '
                    '0.5176470588235295, 0.3215686274509804), '
                    '(0.3333333333333333, 0.6588235294117647, '
                    '0.40784313725490196), (0.7686274509803922, '
                    '0.3058823529411765, 0.3215686274509804), '
                    '(0.5058823529411764, 0.4470588235294118, '
                    '0.7019607843137254), (0.5764705882352941, '
                    '0.47058823529411764, 0.3764705882352941), '
                    '(0.8549019607843137, 0.5450980392156862, '
                    '0.7647058823529411), (0.5490196078431373, '
                    '0.5490196078431373, 0.5490196078431373), (0.8, '
                    '0.7254901960784313, 0.4549019607843137), '
                    '(0.39215686274509803, 0.7098039215686275, '
                    '0.803921568627451)])',
 'axes.titlesize': 'original: large, new: 12.0',
 'font.sans-serif': "original: ['DejaVu Sans', 'Bitstream Vera Sans', "
                    "'Computer Modern Sans Serif', 'Lucida Grande', 'Verdana', "
                    "'Geneva', 'Lucid', 'Arial', 'Helvetica', 'Avant Garde', "
                    "'sans-serif'], new: ['Arial', 'DejaVu Sans', 'Liberation "
                    "Sans', 'Bitstream Vera Sans', 'sans-serif']",
 'font.size': 'original: 10.0, new: 12.0',
 'grid.color': 'original: #b0b0b0, new: .8',
 'grid.linewidth': 'original: 0.8, new: 1.0',
 'image.cmap': 'original: viridis, new: rocket',
 'legend.fontsize': 'original: medium, new: 11.0',
 'lines.solid_capstyle': 'original: projecting, new: round',
 'mathtext.bf': 'original: sans:bold, new: Bitstream Vera Sans:bold',
 'mathtext.fontset': 'original: dejavusans, new: custom',
 'mathtext.it': 'original: sans:italic, new: Bitstream Vera Sans:italic',
 'mathtext.rm': 'original: sans, new: Bitstream Vera Sans',
 'patch.edgecolor': 'original: black, new: w',
 'patch.facecolor': 'original: C0, new: (0.2980392156862745, '
                    '0.4470588235294118, 0.6901960784313725)',
 'patch.force_edgecolor': 'original: False, new: True',
 'text.color': 'original: black, new: .15',
 'text.usetex': 'original: False, new: True',
 'xtick.color': 'original: black, new: .15',
 'xtick.direction': 'original: out, new: in',
 'xtick.labelsize': 'original: medium, new: 11.0',
 'xtick.major.size': 'original: 3.5, new: 6.0',
 'xtick.major.width': 'original: 0.8, new: 1.25',
 'xtick.minor.size': 'original: 2.0, new: 4.0',
 #'xtick.minor.top': 'original: True, new: False',
 'xtick.minor.visible': 'original: False, new: True',
 'xtick.minor.width': 'original: 0.6, new: 1.0',
 'ytick.color': 'original: black, new: .15',
 'ytick.direction': 'original: out, new: in',
 'ytick.labelsize': 'original: medium, new: 11.0',
 'ytick.major.size': 'original: 3.5, new: 6.0',
 'ytick.major.width': 'original: 0.8, new: 1.25',
 'ytick.minor.right': 'original: True, new: False',
 'ytick.minor.size': 'original: 2.0, new: 4.0',
 #'ytick.minor.visible': 'original: False, new: True',
 'ytick.minor.width': 'original: 0.6, new: 1.0'}
```
I implemented most of these below (although I left out a few color options).
For more on the non-interactive mode, see: https://gist.github.com/matthewfeickert/84245837f09673b2e7afea929c016904
- pachyderm.plot.convert_mpl_color_scheme_to_ROOT(name=None, cmap=None, reverse_cmap=False, n_values_to_cut_from_top=0)#
Convert matplotlib color scheme to ROOT.
- Parameters:
- Return type:
- Returns:
Snippet to add the color scheme to ROOT.
- pachyderm.plot.error_boxes(ax, x_data, y_data, y_errors, x_errors=None, **kwargs)#
Plot error boxes for the given data.
Inspired by: https://matplotlib.org/gallery/statistics/errorbars_and_boxes.html and https://github.com/HDembinski/pyik/blob/217ae25bbc316c7a209a1a4a1ce084f6ca34276b/pyik/mplext.py#L138
Note
The errors are distances from the central value. ie. for 10% error on 1, the two entry version should be [0.1, 0.1].
- Parameters:
ax (Axes) – Axis onto which the rectangles will be drawn.
y_errors (ndarray[Any, dtype[Any]]) – y errors of the data. The array can either be of length n, or of shape (n, 2) for asymmetric errors.
x_errors (ndarray[Any, dtype[Any]] | None) – x errors of the data. The array can either be of length n, or of shape (n, 2) for asymmetric errors. Default: None. This corresponds to boxes that are 10% of the distance between the given point and the previous one.
- Return type:
PatchCollection
pachyderm.projectors module#
Handle generic TH1 and THn projections.
- class pachyderm.projectors.HistAxisRange(axis_range_name, axis_type, min_val, max_val)#
Bases:
EqualityMixin
Represents the restriction of a range of an axis of a histogram.
An axis can be restricted by multiple HistAxisRange elements (although separate projections are needed to apply more than one). This would be accomplished with separate entries to the HistProjector.projection_dependent_cut_axes.
Note
A single axis which has multiple ranges could be represented by multiple HistAxisRange objects!
- Parameters:
axis_range_name (str) – Name of the axis range. Usually some combination of the axis name and some sort of description of the range.
axis_type (enum.Enum) – Enumeration corresponding to the axis to be restricted. The numerical value of the enum should be the axis number (for a THnBase).
min_val (function) – Minimum range value for the axis. Usually set via apply_func_to_find_bin().
max_val (function) – Maximum range value for the axis. Usually set via apply_func_to_find_bin().
- static apply_func_to_find_bin(func, values=None)#
Closure to determine the bin associated with a value on an axis.
It can apply a function to an axis if necessary to determine the proper bin. Otherwise, it can just return a stored value.
Note
To properly determine the value, carefully note the information below. In many cases, such as when we want values [2, 5), the values need to be shifted by a small epsilon to retrieve the proper bin. This is done automatically in SetRangeUser().
>>> hist = ROOT.TH1D("test", "test", 10, 0, 10)
>>> x, y = 2, 5
>>> hist.FindBin(x)
3
>>> hist.FindBin(x+epsilon)
3
>>> hist.FindBin(y)
6
>>> hist.FindBin(y-epsilon)
5
Note that the bin + epsilon on the lower bin is not strictly necessary, but it is used for consistency with the upper bound.
- Parameters:
- Return type:
- Returns:
Function to be called with an axis to determine the desired bin on that axis.
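The closure pattern described here can be sketched without ROOT as follows. FakeAxis is a hypothetical stand-in for a ROOT axis with unit-width bins starting at 0, and sketch_apply_func_to_find_bin is our own illustrative name; the real method may differ in its details.

```python
from typing import Any, Callable, Optional


class FakeAxis:
    """Hypothetical stand-in for a ROOT axis with unit-width bins starting at 0."""

    def FindBin(self, value: float) -> int:
        # ROOT bins are 1-indexed, so the bin covering [0, 1) is bin 1.
        return int(value) + 1


def sketch_apply_func_to_find_bin(
    func: Optional[Callable[[Any, Any], Any]], values: Any = None
) -> Callable[[Any], Any]:
    """Return a function which determines a bin once it is given an axis."""

    def find_bin(axis: Any) -> Any:
        if func is not None:
            # Defer the lookup until the axis is known.
            return func(axis, values)
        # No function was given, so just return the stored value.
        return values

    return find_bin


epsilon = 1e-9
lower = sketch_apply_func_to_find_bin(FakeAxis.FindBin, 2 + epsilon)
upper = sketch_apply_func_to_find_bin(FakeAxis.FindBin, 5 - epsilon)
stored = sketch_apply_func_to_find_bin(None, 3)
axis = FakeAxis()
upper_bin = upper(axis)
```

With the epsilon shift, upper_bin is the bin just below 5 rather than the bin starting at 5, which is the behavior needed for a half-open range such as [2, 5).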
- apply_range_set(hist)#
Apply the associated range set to the axis of a given hist.
Note
The min and max values should be bins, not user ranges! For more, see the binning explanation in apply_func_to_find_bin(...).
- class pachyderm.projectors.HistProjector(observable_to_project_from, output_observable, projection_name_format, output_attribute_name=None, projection_information=None)#
Bases:
object
Handles generic ROOT THn and TH1 projections.
There are three types of cuts which can be specified:
- additional_axis_cuts: Axis cuts which do not change based on the projection axis.
- projection_dependent_cut_axes: Axis cuts which change based on the projection axis.
- projection_axes: Axes onto which the projection will be performed.
For a full description of each type of cut and the necessary details, see their descriptions in the attributes.
Note
The TH1 projections have not been tested as extensively as the THn projections.
Note
input_key, input_hist, input_observable, projection_name, and output_hist are all reserved keys, such that they will be overwritten by predefined information when passed to the various functions. Thus, they should be avoided by the user when storing projection information.
- Parameters:
observable_to_project_from (dict[str, Any] | Any) – The observables which should be used to project from. The dict key is passed to projection_name(...) as input_key.
output_observable (dict[str, Any] | Any) – Object or dict where the projected hist will be stored.
projection_name_format (str) – Format string to determine the projected hist name.
output_attribute_name (str | None) – Name of the attribute under which the single observable projection will be stored in the output_observable object. Must not be specified if projecting with multiple objects. Default: None.
projection_information (dict[str, Any] | None) – Keyword arguments to be passed to projection_name(...) to determine the name of the projected histogram. Default: None.
- single_observable_projection#
True if the projector is only performing a single observable projection.
- output_attribute_name#
Name of the attribute under which the single observable projection will be stored in the output_observable object.
- observable_to_project_from#
The observable(s) which should be used to project from. The dict key is passed to projection_name(...) as input_key.
- output_observable#
Where the projected hist(s) will be stored. They will be stored under the dict key determined by output_key_name(...).
- projection_name_format#
Format string to determine the projected hist name.
- projection_information#
Keyword arguments to be passed to projection_name(...) to determine the name of the projected histogram.
- additional_axis_cuts#
List of axis cuts which are neither projected nor depend on the axis being projected.
- Type:
- projection_dependent_cut_axes#
List of list of axis cuts which depend on the projected axis. For example, if we want to project non-continuous ranges of a non-projection axis (say, dEta when projecting dPhi). It is a list of list to allow for groups of cuts to be specified together if necessary.
- Type:
- call_projection_function(hist)#
Calls the actual projection function for the hist.
- cleanup_cuts(hist, cut_axes)#
Cleanup applied cuts by resetting the axis to the full range.
Inspired by: https://github.com/matplo/rootutils/blob/master/python/2.7/THnSparseWrapper.py
- Parameters:
hist (Any) – Histogram for which the axes should be reset.
cut_axes (Iterable[HistAxisRange]) – List of axis cuts, which correspond to axes that should be reset.
- Return type:
- get_hist(observable, **kwargs)#
Get the histogram that may be stored in some object.
This histogram is used to project from.
Note
The output object could just be the raw ROOT histogram.
Note
This function is just a basic placeholder and likely should be overridden.
- Parameters:
- Return type:
- Returns:
- ROOT.TH1 or ROOT.THnBase histogram which should be projected. By default, it returns the
observable (input object).
- output_hist(output_hist, input_observable, **kwargs)#
Return an output object. It should store the output_hist.
Note
The output object could just be the raw histogram.
Note
This function is just a basic placeholder which returns the given output object (a histogram) and likely should be overridden.
- Parameters:
- Return type:
- Returns:
- The output object which should be stored in the output dict. By default, it returns the
output hist.
- output_key_name(input_key, output_hist, projection_name, **kwargs)#
Returns the key under which the output object should be stored.
Note
This function is just a basic placeholder which returns the projection name and likely should be overridden.
- Parameters:
- Return type:
- Returns:
- Key under which the output object should be stored. By default, it returns the
projection name.
- project(**kwargs)#
Perform the requested projection(s).
Note
All cuts on the original histograms will be reset when this function is completed.
- projection_name(**kwargs)#
Define the projection name for this projector.
Note
This function is just a basic placeholder and likely should be overridden.
- Parameters:
kwargs (dict[str, Any]) – Projection information dict combined with additional arguments passed to the projection function.
- Return type:
- Returns:
Projection name string formatted with the passed options. By default, it returns projection_name_format formatted with the arguments to this function.
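The default behavior described above amounts to standard string formatting with keyword arguments. In the sketch below, the function name, the format string and the keyword values are all made up for illustration.

```python
def sketch_projection_name(projection_name_format: str, **kwargs: object) -> str:
    """Format the name template with whichever keyword arguments it references."""
    # str.format ignores keyword arguments that the template does not use.
    return projection_name_format.format(**kwargs)


name = sketch_projection_name(
    "{input_key}_{axis}_projection",
    input_key="jetH",
    axis="dPhi",
    unused_option="ignored",
)
```

Since unused keyword arguments are silently ignored, the same projection information dict can be passed along with extra arguments without needing every key to appear in the format string.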
- class pachyderm.projectors.TH1AxisType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)#
Bases:
Enum
Map from (x,y,z) axis to the axis number.
Other enumerations that refer to this enum should refer to the _values_ to ensure consistency in .value pointing to the axis value.
- x_axis = 0#
- y_axis = 1#
- z_axis = 2#
- pachyderm.projectors.hist_axis_func(axis_type)#
Wrapper to retrieve the axis of a given histogram.
This can be convenient outside of just projections, so it’s made available in the API.
pachyderm.remove_outliers module#
Provides outliers removal methods.
- class pachyderm.remove_outliers.OutliersRemovalManager(moving_average_threshold=1.0)#
Bases:
object
Manage the removal of outliers from histograms.
- Parameters:
moving_average_threshold (
float
) –
- run(outliers_removal_axis, hist=None, hists=None, mean_fractional_difference_limit=0.01, median_fractional_difference_limit=0.01)#
Remove outliers from the given histogram(s).
- Parameters:
outliers_removal_axis (Union[TH1AxisType, Enum]) – Axis along which outliers removal will be performed. Usually the particle level axis.
hist (Any | None) – Histogram to check for outliers. Either this or hists must be specified.
hists (Mapping[str, Any] | None) – Histograms to check for outliers. Either this or hist must be specified.
mean_fractional_difference_limit (float) – Max fractional difference of mean after outliers removal. Default: 0.01.
median_fractional_difference_limit (float) – Max fractional difference of median after outliers removal. Default: 0.01.
- Return type:
- Returns:
Bin index value from which the outliers were removed. The histogram(s) is modified in place.
pachyderm.utils module#
Broad collection of utility functions and constants.
- pachyderm.utils.moving_average(arr, n=3)#
Calculate the moving average over an array.
Algorithm from: https://stackoverflow.com/a/14314054
- Parameters:
arr (np.ndarray) – Array over which to calculate the moving average.
n (int) – Number of elements over which to calculate the moving average. Default: 3
- Returns:
Moving average calculated over n.
- Return type:
np.ndarray
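A common way to compute a moving average is via a cumulative sum, where each window total is the difference of two running totals. The sketch below shows that idea with the standard library instead of numpy; the function name is hypothetical, and whether the library follows exactly this approach is an assumption on our part.

```python
from itertools import accumulate


def sketch_moving_average(arr: list[float], n: int = 3) -> list[float]:
    """Moving average over windows of n consecutive elements."""
    # cumulative[i] holds the sum of the first i elements of arr.
    cumulative = [0.0, *accumulate(arr)]
    # Each window sum is the difference between two cumulative sums.
    return [(cumulative[i + n] - cumulative[i]) / n for i in range(len(arr) - n + 1)]


averaged = sketch_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], n=3)
```

Note that the output is shorter than the input by n - 1 elements, since only complete windows are averaged.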
- pachyderm.utils.recursive_getattr(obj, attr, *args)#
Recursive getattr.
This can be used as a drop in for the standard getattr(...). Credit to: https://stackoverflow.com/a/31174427
- Parameters:
- Return type:
- Returns:
The requested attribute. (Same as getattr).
- Raises:
AttributeError – If the attribute was not found and no default was provided. (Same as getattr).
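A dotted attribute lookup of this kind can be written with functools.reduce, as in the sketch below. The function name and the example objects are hypothetical; the optional default is forwarded to every getattr call, mirroring the (obj, attr, *args) signature.

```python
import functools
from types import SimpleNamespace
from typing import Any


def sketch_recursive_getattr(obj: Any, attr: str, *args: Any) -> Any:
    """Follow a dotted attribute path such as "a.b.c" starting from obj."""

    def _getattr(current: Any, name: str) -> Any:
        # *args optionally carries a default value, just as for getattr.
        return getattr(current, name, *args)

    return functools.reduce(_getattr, attr.split("."), obj)


config = SimpleNamespace(axis=SimpleNamespace(range=SimpleNamespace(low=0.0)))
low = sketch_recursive_getattr(config, "axis.range.low")
fallback = sketch_recursive_getattr(config, "axis.missing", None)
```

Without a default, a missing attribute anywhere along the path raises AttributeError, the same as the built-in getattr.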
- pachyderm.utils.recursive_getitem(d, keys)#
Recursively retrieve an item from a nested dict.
Credit to: https://stackoverflow.com/a/52260663
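For nested dictionaries, the same reduce pattern works with operator.getitem. This is a sketch under our own naming; treating a bare string as a single key is an assumption added here for convenience.

```python
import functools
import operator
from typing import Any, Sequence, Union


def sketch_recursive_getitem(d: Any, keys: Union[str, Sequence[str]]) -> Any:
    """Retrieve d[k0][k1]... for a sequence of keys."""
    if isinstance(keys, str):
        # Assumption: a bare string is treated as one key rather than a sequence of characters.
        keys = [keys]
    return functools.reduce(operator.getitem, keys, d)


nested = {"a": {"b": {"c": 1}}}
leaf = sketch_recursive_getitem(nested, ["a", "b", "c"])
branch = sketch_recursive_getitem(nested, "a")
```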
- pachyderm.utils.recursive_setattr(obj, attr, val)#
Recursive setattr.
This can be used as a drop in for the standard setattr(...). Credit to: https://stackoverflow.com/a/31174427
- Parameters:
- Return type:
- Returns:
The requested attribute. (Same as getattr).
- Raises:
AttributeError – If the attribute was not found and no default was provided. (Same as getattr).
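Setting a nested attribute can be sketched by resolving everything before the final dot and then calling setattr on that parent object. The function name and the example objects below are hypothetical.

```python
import functools
from types import SimpleNamespace
from typing import Any


def sketch_recursive_setattr(obj: Any, attr: str, val: Any) -> None:
    """Set a dotted attribute path such as "a.b.c" on obj."""
    # Split off the final attribute name from the path leading to its parent.
    parent_path, _, name = attr.rpartition(".")
    parent = functools.reduce(getattr, parent_path.split("."), obj) if parent_path else obj
    setattr(parent, name, val)


plot = SimpleNamespace(legend=SimpleNamespace(font_size=10))
sketch_recursive_setattr(plot, "legend.font_size", 12)
sketch_recursive_setattr(plot, "name", "example")
```

Intermediate objects must already exist; only the final attribute is created or overwritten.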
pachyderm.version module#
pachyderm.yaml module#
Module related to YAML.
Contains a way to construct the main YAML object, as well as relevant mixins and classes.
Note
The YAML to/from enum values would be much better as a mixin. However, such an approach causes substantial issues.
In particular, although we don’t explicitly pickle the values, calling copy.copy implicitly calls pickle, so we must maintain compatibility. However, enum mixins preclude pickling the enum value (see cpython/enum.py line 177). The problem basically comes down to the fact that we are assigning a bound staticmethod to the class when we mix it in, and it doesn’t seem to be able to resolve pickling the object (perhaps due to name resolution issues). For a bit more, see the comments on this stackoverflow post.
Practically, I believe that we could also resolve this by implementing __reduce_ex__, but that appears as if it will be more work than our implemented workaround. Our workaround can be implemented as:
```python
class TestEnum(enum.Enum):
    a = 1
    b = 2

    def __str__(self):
        return self.name

    to_yaml = staticmethod(generic_class.enum_to_yaml)
    from_yaml = staticmethod(generic_class.enum_from_yaml)
```
This enum object will pickle properly. Note that rather strangely, this issue showed up during tests on Debian Stretch, but not the exact same version of python on macOS. I don’t know why that’s the case, but the workaround seems to be fine on both systems, so we’ll just continue to use it.
- pachyderm.yaml.enum_from_yaml(cls, constructor, node)#
Decode YAML representation.
This is a mixin method for reading enum values from YAML. It needs to be added to the enum as a classmethod. See the module docstring for further information on this approach and how to implement it.
Note
This method assumes that the name of the enumeration value was stored as a scalar node.
- Parameters:
- Return type:
- Returns:
The constructed YAML value from the name of the enumerated value.
- pachyderm.yaml.enum_to_yaml(cls, representer, data)#
Encodes YAML representation.
This is a mixin method for writing enum values to YAML. It needs to be added to the enum as a classmethod. See the module docstring for further information on this approach and how to implement it.
This method writes whatever is used in the string representation of the YAML value. Usually, this will be the unique name of the enumeration value. If the name is used, the corresponding EnumFromYAML mixin can be used to recreate the value. If the name isn’t used, more care may be necessary, so a from_yaml method for that particular enumeration may be necessary.
Note
This method assumes that the name of the enumeration value should be stored as a scalar node.
- pachyderm.yaml.numpy_array_from_yaml(constructor, data)#
Read an array from YAML to numpy.
It reads arrays registered under the tag !numpy_array.
Use with:
```python
yaml = ruamel.yaml.YAML()
yaml.constructor.add_constructor("!numpy_array", yaml.numpy_array_from_yaml)
```
Note
We cannot use yaml.register_class because it won’t register the proper type. (It would register the type of the class, rather than of numpy.ndarray). Instead, we use the above approach to register this method explicitly with the constructor.
Note
In order to allow users to write an array by hand, we check the data given. If it’s a list, we convert the values and put them into an array. If it’s binary encoded, we decode and load it.
- Parameters:
constructor (BaseConstructor) – YAML constructor being used to read and create the objects specified in the YAML.
data (SequenceNode) – Data stored in the YAML node currently being processed.
- Return type:
- Returns:
numpy array containing the data in the current YAML node.
- pachyderm.yaml.numpy_array_to_yaml(representer, data)#
Write a numpy array to YAML.
It registers the array under the tag !numpy_array.
Use with:
```python
yaml = ruamel.yaml.YAML()
yaml.representer.add_representer(np.ndarray, yaml.numpy_array_to_yaml)
```
Note
We cannot use yaml.register_class because it won’t register the proper type. (It would register the type of the class, rather than of numpy.ndarray). Instead, we use the above approach to register this method explicitly with the representer.
- pachyderm.yaml.numpy_float64_from_yaml(constructor, data)#
Read a float64 from YAML to numpy.
It reads the float64 registered under the tag !numpy_float64. Use with:
```python
import pachyderm.yaml
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.constructor.add_constructor("!numpy_float64", pachyderm.yaml.numpy_float64_from_yaml)
```
Note
We cannot use yaml.register_class because it won’t register the proper type. (It would register the type of the class, rather than of numpy.float64.) Instead, we use the above approach to register this method explicitly with the constructor.
Note
In order to allow users to write a float by hand, we check the data given. If it’s a raw float, we put it into a float64. If it’s binary encoded, we decode and load it.
- Parameters:
constructor (BaseConstructor) – YAML constructor being used to read and create the objects specified in the YAML.
data (ScalarNode) – Data stored in the YAML node currently being processed.
- Return type:
float64
- Returns:
numpy float64 containing the data in the current YAML node.
- pachyderm.yaml.numpy_float64_to_yaml(representer, data)#
Write a numpy float64 to YAML.
It registers the float under the tag !numpy_float64. Use with:
```python
import numpy as np
import pachyderm.yaml
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.representer.add_representer(np.float64, pachyderm.yaml.numpy_float64_to_yaml)
```
Note
We cannot use yaml.register_class because it won’t register the proper type. (It would register the type of the class, rather than of numpy.float64.) Instead, we use the above approach to register this method explicitly with the representer.
- Parameters:
representer (BaseRepresenter) –
data (float64) –
- Return type:
- pachyderm.yaml.register_classes(yaml, classes=None)#
Register externally defined classes.
- pachyderm.yaml.register_module_classes(yaml, modules=None)#
Register all classes in the given modules with the YAML object.
This is a simple helper function.
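As a rough illustration of what such a helper might look like, the sketch below collects every class visible in each module with `inspect.getmembers` and passes it to `register_class`. It is not pachyderm's implementation: `register_module_classes_sketch`, `RecordingYAML` (a stand-in for a real YAML object that only records what it is given), and the `demo` module are hypothetical.

```python
import inspect
import types


def register_module_classes_sketch(yaml, modules=None):
    # Hypothetical helper: register every class found in the given modules.
    for module in modules or []:
        for _name, cls in inspect.getmembers(module, inspect.isclass):
            yaml.register_class(cls)
    return yaml


class RecordingYAML:
    """Stand-in for a YAML object; it only records the registered classes."""

    def __init__(self):
        self.registered = []

    def register_class(self, cls):
        self.registered.append(cls)
        return cls


class Axis:
    pass


class Histogram:
    pass


# Build a throwaway module so the example is self-contained.
demo = types.ModuleType("demo")
demo.Axis = Axis
demo.Histogram = Histogram

recorder = register_module_classes_sketch(RecordingYAML(), modules=[demo])
registered_names = [cls.__name__ for cls in recorder.registered]
```

Note that `inspect.getmembers` also returns classes a module merely imports, so a helper written this way registers those as well.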
- pachyderm.yaml.yaml(modules_to_register=None, classes_to_register=None)#
Create a YAML object for loading a YAML configuration.
- Parameters:
- Return type:
YAML
- Returns:
A newly created YAML object, configured as appropriate.