Module ml4opf.parsers
ML4OPF Parsers
Sub-modules
ml4opf.parsers.pglearn
-
Parser for the PGLearn datasets
ml4opf.parsers.read_hdf5
-
This file gives an example implementation of how to read an entire HDF5 file into a Python dictionary. Note that if only using a subset of the data, it is …
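For orientation, here is a generic sketch, not the module's actual code, of reading an entire HDF5 file into a flat Python dictionary using only the standard h5py API:

    import h5py

    def read_hdf5_sketch(path: str) -> dict:
        """Collect every dataset into a flat dict keyed by its full HDF5 path."""
        data = {}

        def visitor(name, obj):
            if isinstance(obj, h5py.Dataset):
                data[name] = obj[()]  # read the whole dataset into memory

        with h5py.File(path, "r") as f:
            f.visititems(visitor)
        return data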
Classes
class PGLearnParser (data_path: str | pathlib.Path)
-
Parser for PGLearn dataset.
Initialize the parser by validating and setting the path.
Class variables
var padval
Static methods
def convert_to_float32(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_])
-
Convert all float64 data to float32 in-place.
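A minimal usage sketch of the in-place conversion, with hypothetical data (the import path follows this module's layout):

    import numpy as np
    import torch
    from ml4opf.parsers import PGLearnParser

    dat = {
        "solution/primal/pg": torch.zeros(10, dtype=torch.float64),
        "meta/config": np.str_("case14"),  # non-float entries are left untouched
    }
    PGLearnParser.convert_to_float32(dat)
    assert dat["solution/primal/pg"].dtype == torch.float32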
def make_tree(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_], delimiter: str = '/')
-
Convert a flat dictionary to a tree. Note that the keys of dat must have a tree structure where data is only at the leaves. Assumes keys are delimited by "/", e.g. "solution/primal/pg".
Args
dat : dict
- Flat dictionary of data.
delimiter : str, optional
- Delimiter to use for splitting keys. Defaults to "/".
Returns
dict
- Tree dictionary of data from dat.
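A small illustration of make_tree with hypothetical tensors:

    import torch
    from ml4opf.parsers import PGLearnParser

    flat = {
        "solution/primal/pg": torch.rand(3),
        "solution/primal/qg": torch.rand(3),
        "solution/dual/lam": torch.rand(3),
    }
    tree = PGLearnParser.make_tree(flat)
    # Data lives only at the leaves of the resulting nested dict.
    assert torch.equal(tree["solution"]["primal"]["pg"], flat["solution/primal/pg"])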
def pad_to_dense(array, padval, dtype=builtins.int)
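pad_to_dense is not documented above; judging from the signature alone, it presumably pads a ragged collection of rows to a dense rectangular array filled with padval. A hypothetical stand-in for illustration only:

    import numpy as np

    def pad_to_dense_sketch(array, padval, dtype=int):
        """Pad a list of 1-D sequences of unequal length to one dense 2-D array."""
        width = max(len(row) for row in array)
        out = np.full((len(array), width), padval, dtype=dtype)
        for i, row in enumerate(array):
            out[i, : len(row)] = row
        return out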
Methods
def open_json(self)
-
Open the JSON file, supporting gzip and bz2 compression based on the file suffix.
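A sketch of the suffix-based dispatch this describes, assuming standard-library gzip/bz2 handling rather than the actual implementation:

    import bz2
    import gzip
    import json
    from pathlib import Path

    def open_json_sketch(path: Path) -> dict:
        """Pick an opener from the file suffix, then decode the JSON payload."""
        openers = {".gz": gzip.open, ".bz2": bz2.open}
        opener = openers.get(path.suffix, open)
        with opener(path, "rt") as f:
            return json.load(f)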
def parse_h5(self,
dataset_name: str,
split: str = 'train',
primal: bool = True,
dual: bool = False,
convert_to_float32: bool = True) -> dict[str, torch.Tensor | numpy.ndarray | numpy.str_] | tuple[dict[str, torch.Tensor | numpy.ndarray | numpy.str_], dict[str, torch.Tensor | numpy.ndarray | numpy.str_]]
-
Parse the HDF5 file.
Args
dataset_name : str
- The name of the dataset. Typically the formulation ("ACOPF", "DCOPF", etc.).
split : str, optional
- The split to return. Defaults to "train".
primal : bool, optional
- If True, parse the primal file. Defaults to True.
dual : bool, optional
- If True, parse the dual file. Defaults to False.
convert_to_float32 : bool, optional
- If True, convert all float64 data to torch.float32. Defaults to True.
Returns
dict
- Flattened dictionary of HDF5 data with PyTorch tensors for numerical data and NumPy arrays for string/object data.
If make_test_set is True, then this function will return a tuple of two dictionaries. The first dictionary is the training set and the second dictionary is the test set. The test set is a random 10% sample of the training set.
This parser will return a single-level dictionary where the keys are in the form of solution/primal/pg, where solution is the group, primal is the subgroup, and pg is the dataset from the HDF5 file. The values are PyTorch tensors. This parser uses h5py.File.visititems to iterate over the HDF5 file quickly.
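A hypothetical usage sketch of parse_h5, following the key pattern described above (the dataset path is illustrative):

    from ml4opf.parsers import PGLearnParser

    parser = PGLearnParser("path/to/pglearn_case")  # hypothetical location
    data = parser.parse_h5("ACOPF", split="train", primal=True)

    pg = data["solution/primal/pg"]  # torch.Tensor of primal pg values
    # Rebuild the nested group/subgroup layout if preferred:
    tree = PGLearnParser.make_tree(data)
    pg_again = tree["solution"]["primal"]["pg"]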
def parse_json(self, model_type: str | Sequence[str] = None)
-
Parse the JSON file from PGLearn.
Args
model_type : Union[str, Sequence[str]]
- The reference solutions to save. Default: [] (no reference solutions saved).
Returns
dict
- Dictionary containing the parsed data.
In the JSON file, the data is stored per individual component, so to get generator 1's upper bound on active generation you'd look at raw_json['data']['gen']['1']['pmax'] and get a float.
In the parsed version, each component's attributes are aggregated into torch.Tensor arrays, so to get generator 1's upper bound on active generation you'd look at dat['gen']['pmax'][0] and get a float. Note that the index is 0-based and an integer, not 1-based and a string.
To access the reference solution, pass a model_type (or multiple) and then access dat["ref_solutions"][model_type].
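Putting the above together, a hypothetical usage sketch (the data path and model type are illustrative):

    from ml4opf.parsers import PGLearnParser

    parser = PGLearnParser("path/to/pglearn_case")  # hypothetical location
    dat = parser.parse_json(model_type="ACOPF")

    pmax_gen1 = dat["gen"]["pmax"][0]    # generator 1, 0-based integer index
    ref = dat["ref_solutions"]["ACOPF"]  # reference solution for the model type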
def validate_path(self, path: str | pathlib.Path) -> pathlib.Path
-
Validate the path to the HDF5 file.
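A minimal sketch, under the assumption that validation means resolving the path and checking that it exists (the actual checks may differ):

    from pathlib import Path

    def validate_path_sketch(path) -> Path:
        """Resolve the input to an absolute Path and fail early if it is missing."""
        p = Path(path).resolve()
        if not p.exists():
            raise FileNotFoundError(f"Data path not found: {p}")
        return p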