Module ml4opf.parsers

ML4OPF Parsers

Sub-modules

ml4opf.parsers.pglearn

Parser for the PGLearn datasets

ml4opf.parsers.read_hdf5

This module gives an example implementation of how to read an entire HDF5 file into a Python dictionary. Note that if only using a subset of the data, it is …

Classes

class PGLearnParser (data_path: str | pathlib.Path)

Parser for PGLearn dataset.

Initialize the parser by validating and setting the path.

Class variables

var padval

Static methods

def convert_to_float32(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_])

Convert all float64 data to float32 in-place.
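The in-place narrowing can be sketched as follows. This is a minimal, hypothetical illustration using NumPy arrays only (the actual parser also handles torch tensors); it is not the library's implementation.

```python
import numpy as np

def convert_to_float32(dat):
    """Narrow every float64 entry of a flat dict to float32, in place (hypothetical sketch)."""
    for key, value in dat.items():
        # Only float64 numeric data is narrowed; strings and integer data are left untouched.
        if isinstance(value, np.ndarray) and value.dtype == np.float64:
            dat[key] = value.astype(np.float32)

data = {"solution/primal/pg": np.array([1.0, 2.0]), "meta/ref": np.array("case14")}
convert_to_float32(data)
print(data["solution/primal/pg"].dtype)  # float32
```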

def make_tree(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_],
delimiter: str = '/')

Convert a flat dictionary to a tree. Note that the keys of dat must have a tree structure where data is stored only at the leaves. Keys are assumed to be delimited by "/", e.g. "solution/primal/pg".

Args

dat : dict
Flat dictionary of data.
delimiter : str, optional
Delimiter to use for splitting keys. Defaults to "/".

Returns

dict
Tree dictionary of data from dat.
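The conversion described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not the library's code:

```python
def make_tree(dat, delimiter="/"):
    """Convert a flat dict with delimited keys into a nested dict (sketch)."""
    tree = {}
    for key, value in dat.items():
        *groups, leaf = key.split(delimiter)
        node = tree
        for group in groups:
            node = node.setdefault(group, {})  # descend, creating subtrees as needed
        node[leaf] = value  # data lives only at the leaves
    return tree

flat = {"solution/primal/pg": [1.0, 2.0], "solution/primal/qg": [0.1, 0.2]}
tree = make_tree(flat)
print(tree["solution"]["primal"]["pg"])  # [1.0, 2.0]
```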
def pad_to_dense(array, padval, dtype=builtins.int)

Pad a ragged array to a dense 2-D array, filling missing entries with padval.
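Padding a ragged array to a dense one can be sketched with NumPy as below. This is a hypothetical illustration inferred from the signature, not the library's implementation:

```python
import numpy as np

def pad_to_dense(array, padval, dtype=int):
    """Pad a ragged list of sequences into a dense 2-D array (hypothetical sketch)."""
    width = max(len(row) for row in array)
    dense = np.full((len(array), width), padval, dtype=dtype)  # start fully padded
    for i, row in enumerate(array):
        dense[i, : len(row)] = row  # overwrite the leading entries with real data
    return dense

ragged = [[1, 2, 3], [4], [5, 6]]
print(pad_to_dense(ragged, padval=-1))
```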

Methods

def open_json(self)

Open the JSON file, supporting gzip and bz2 compression based on the file suffix.

def parse_h5(self,
dataset_name: str,
split: str = 'train',
primal: bool = True,
dual: bool = False,
convert_to_float32: bool = True) ‑> dict[str, torch.Tensor | numpy.ndarray | numpy.str_] | tuple[dict[str, torch.Tensor | numpy.ndarray | numpy.str_], dict[str, torch.Tensor | numpy.ndarray | numpy.str_]]

Parse the HDF5 file.

Args

dataset_name : str
The name of the dataset. Typically the formulation ("ACOPF", "DCOPF", etc.).
split : str, optional
The split to return. Defaults to "train".
primal : bool, optional
If True, parse the primal file. Defaults to True.
dual : bool, optional
If True, parse the dual file. Defaults to False.
convert_to_float32 : bool, optional
If True, convert all float64 data to torch.float32. Defaults to True.

Returns

dict
Flattened dictionary of HDF5 data with PyTorch tensors for numerical data and NumPy arrays for string/object data.

If both primal and dual are True, this function returns a tuple of two dictionaries: the first containing the primal data and the second containing the dual data.

This parser returns a single-level dictionary whose keys have the form solution/primal/pg, where solution is the group, primal is the subgroup, and pg is the dataset in the HDF5 file. The values are PyTorch tensors (NumPy arrays for string/object data). The parser uses h5py.File.visititems to iterate over the HDF5 file quickly.
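The flat-key layout that visititems produces is the inverse of make_tree; it can be mimicked in plain Python as below. This flatten_tree helper is hypothetical, shown only to illustrate the key structure:

```python
def flatten_tree(tree, delimiter="/", prefix=""):
    """Flatten a nested dict into delimiter-joined keys, mirroring the parser's flat layout (sketch)."""
    flat = {}
    for key, value in tree.items():
        full_key = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_tree(value, delimiter, full_key))  # recurse into subgroups
        else:
            flat[full_key] = value  # leaf dataset
    return flat

nested = {"solution": {"primal": {"pg": [1.0]}}}
print(flatten_tree(nested))  # {'solution/primal/pg': [1.0]}
```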

def parse_json(self, model_type: str | Sequence[str] = None)

Parse the JSON file from PGLearn.

Args

model_type : str | Sequence[str], optional
The reference solutions to save. Defaults to None (no reference solutions saved).

Returns

dict
Dictionary containing the parsed data.

In the JSON file, the data is stored by each individual component. So to get generator 1's upper bound on active generation, you'd look at: raw_json['data']['gen']['1']['pmax'] and get a float.

In the parsed version, each component's attributes are aggregated into torch.Tensor arrays. So to get generator 1's upper bound on active generation, you'd look at: dat['gen']['pmax'][0] and get a float. Note that the index is 0-based and an integer, not 1-based and a string.

To access the reference solution, pass a model_type (or multiple) and then access dat["ref_solutions"][model_type].
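The per-component aggregation described above can be sketched as follows. This is a hypothetical illustration using NumPy and scalar attributes; it assumes the raw JSON keys are 1-based string ids, as in the example above:

```python
import numpy as np

def aggregate_components(raw_components):
    """Stack per-component attribute scalars into arrays, ordered by 1-based string id (sketch)."""
    ids = sorted(raw_components, key=int)  # "1", "2", ... -> numeric order
    attrs = raw_components[ids[0]].keys()
    return {attr: np.array([raw_components[i][attr] for i in ids]) for attr in attrs}

raw_gen = {"2": {"pmax": 0.5, "pmin": 0.0}, "1": {"pmax": 3.2, "pmin": 0.1}}
gen = aggregate_components(raw_gen)
print(gen["pmax"][0])  # 3.2  (generator "1" maps to index 0)
```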

def validate_path(self, path: str | pathlib.Path) ‑> pathlib.Path

Validate the path to the HDF5 file.