Module ml4opf.parsers
ML4OPF Parsers
Sub-modules
ml4opf.parsers.pglearn
Parser for the PGLearn datasets.
ml4opf.parsers.read_hdf5
This file gives an example implementation of how to read an entire HDF5 file into a Python dictionary. Note that if only using a subset of the data, it is …
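As a rough sketch of that pattern (not the module's exact code), reading every dataset in an HDF5 file into a flat dictionary with h5py might look like:

    import h5py

    def read_hdf5_sketch(path: str) -> dict:
        # Walk the file and load every dataset, keyed by its full
        # "group/subgroup/name" path.
        data = {}
        with h5py.File(path, "r") as f:
            def visitor(name, obj):
                if isinstance(obj, h5py.Dataset):
                    data[name] = obj[()]  # reads the entire dataset into memory
            f.visititems(visitor)
        return data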
Classes
class PGLearnParser(data_path: str | pathlib.Path)
Parser for PGLearn dataset.
Initialize the parser by validating and setting the path.
Class variables
var padval
Static methods
def convert_to_float32(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_])
Convert all float64 data to float32 in-place.
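A minimal sketch of the in-place downcast, assuming the flat dictionary holds torch tensors or numpy arrays (entries that are not float64 are left untouched):

    import numpy as np
    import torch

    def convert_to_float32_sketch(dat: dict) -> None:
        # Downcast every float64 entry to float32, modifying `dat` in place.
        for key, value in dat.items():
            if isinstance(value, torch.Tensor) and value.dtype == torch.float64:
                dat[key] = value.to(torch.float32)
            elif isinstance(value, np.ndarray) and value.dtype == np.float64:
                dat[key] = value.astype(np.float32)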
def make_tree(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_], delimiter: str = '/')
Convert a flat dictionary to a tree. Note that the keys of dat must have a tree structure where data is only at the leaves. Assumes keys are delimited by "/", e.g. "solution/primal/pg".
Args
dat : dict - Flat dictionary of data.
delimiter : str, optional - Delimiter to use for splitting keys. Defaults to "/".
Returns
dict - Tree dictionary of data from dat.
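For illustration (the import path and tensor values are assumptions for this sketch):

    import torch
    from ml4opf.parsers.pglearn import PGLearnParser  # assumed import path

    flat = {
        "solution/primal/pg": torch.zeros(3),
        "solution/primal/vm": torch.ones(3),
    }
    tree = PGLearnParser.make_tree(flat)
    # tree == {"solution": {"primal": {"pg": ..., "vm": ...}}}
    assert torch.equal(tree["solution"]["primal"]["pg"], flat["solution/primal/pg"])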
def pad_to_dense(array, padval, dtype=builtins.int)
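The docstring is missing upstream; judging only from the name, the signature, and the padval class variable, a plausible behavior is padding a ragged nested sequence into a dense array. A hypothetical re-implementation, not the library's actual code:

    import numpy as np

    def pad_to_dense_sketch(array, padval, dtype=int):
        # Pad each row of a ragged 2-D sequence with `padval` so that
        # every row matches the length of the longest row.
        width = max(len(row) for row in array)
        out = np.full((len(array), width), padval, dtype=dtype)
        for i, row in enumerate(array):
            out[i, : len(row)] = row
        return out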
Methods
def open_json(self)
Open the JSON file, supporting gzip and bz2 compression based on the file suffix.
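A minimal sketch of suffix-based decompression, assuming ".gz" maps to gzip and ".bz2" to bz2 (the method's exact signature and return value are not documented here):

    import bz2
    import gzip
    import json
    from pathlib import Path

    def open_json_sketch(path: Path) -> dict:
        # Pick an opener from the file suffix; fall back to plain open().
        opener = {".gz": gzip.open, ".bz2": bz2.open}.get(path.suffix, open)
        with opener(path, "rt") as f:
            return json.load(f)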
def parse_h5(self,
dataset_name: str,
split: str = 'train',
primal: bool = True,
dual: bool = False,
convert_to_float32: bool = True) -> dict[str, torch.Tensor | numpy.ndarray | numpy.str_] | tuple[dict[str, torch.Tensor | numpy.ndarray | numpy.str_], dict[str, torch.Tensor | numpy.ndarray | numpy.str_]]
Parse the HDF5 file.
Args
dataset_name : str - The name of the dataset. Typically the formulation ("ACOPF", "DCOPF", etc.).
split : str, optional - The split to return. Defaults to "train".
primal : bool, optional - If True, parse the primal file. Defaults to True.
dual : bool, optional - If True, parse the dual file. Defaults to False.
convert_to_float32 : bool, optional - If True, convert all float64 data to torch.float32. Defaults to True.
Returns
dict - Flattened dictionary of HDF5 data, with PyTorch tensors for numerical data and NumPy arrays for string/object data.
If make_test_set is True, this function returns a tuple of two dictionaries: the first is the training set and the second is the test set. The test set is a random 10% sample of the training set.
This parser returns a single-level dictionary whose keys have the form solution/primal/pg, where solution is the group, primal is the subgroup, and pg is the dataset in the HDF5 file. The values are PyTorch tensors. This parser uses h5py.File.visititems to iterate over the HDF5 file quickly.
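For example (the dataset path, formulation name, and split below are illustrative assumptions, not part of the documented API):

    parser = PGLearnParser("path/to/pglearn/case")   # hypothetical dataset path
    train = parser.parse_h5("ACOPF", split="train", primal=True)
    pg = train["solution/primal/pg"]                 # torch.float32 tensor, flat key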
def parse_json(self, model_type: str | Sequence[str] = None)
Parse the JSON file from PGLearn.
Args
model_type : Union[str, Sequence[str]] - The reference solutions to save. Default: [] (no reference solutions saved).
Returns
dict - Dictionary containing the parsed data.
In the JSON file, the data is stored by individual component. So to get generator 1's upper bound on active generation, you would look at raw_json['data']['gen']['1']['pmax'] and get a float.
In the parsed version, each component's attributes are aggregated into torch.Tensor arrays. So to get generator 1's upper bound on active generation, you would look at dat['gen']['pmax'][0] and get a float. Note that the index is 0-based and an integer, not 1-based and a string.
To access the reference solution, pass a model_type (or multiple) and then access dat["ref_solutions"][model_type].
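Putting the above together, a hedged usage sketch (the dataset path and model type are hypothetical):

    parser = PGLearnParser("path/to/pglearn/case")  # hypothetical dataset path
    dat = parser.parse_json(model_type="ACOPF")
    pmax_gen1 = dat["gen"]["pmax"][0]               # generator 1 (0-based index)
    ref = dat["ref_solutions"]["ACOPF"]             # reference solution for ACOPF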
def validate_path(self, path: str | pathlib.Path) -> pathlib.Path
Validate the path to the HDF5 file.
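A minimal sketch of what such validation typically involves (the exact checks performed by the library are assumptions here):

    from pathlib import Path

    def validate_path_sketch(path: str | Path) -> Path:
        # Assumed behavior: coerce to Path, resolve it, and fail fast if missing.
        p = Path(path).resolve()
        if not p.exists():
            raise FileNotFoundError(f"Dataset path does not exist: {p}")
        return p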