Module ml4opf.parsers

ML4OPF Parsers

Sub-modules

ml4opf.parsers.pglearn

Parser for the PGLearn datasets

ml4opf.parsers.read_hdf5

This module gives an example implementation of how to read an entire HDF5 file into a Python dictionary. Note that if only using a subset of the data, it is …

Classes

class PGLearnParser (data_path: str | pathlib.Path)

Parser for PGLearn dataset.

Initialize the parser by validating and setting the path.

Class variables

var padval

Static methods

def convert_to_float32(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_])

Convert all float64 data to float32 in-place.
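The in-place narrowing can be sketched as follows. This is a minimal, hypothetical illustration using NumPy arrays only (the actual parser also handles torch tensors); it is not the library's implementation.

```python
import numpy as np

def convert_to_float32(dat):
    """Narrow every float64 entry of a flat dict to float32, in place (hypothetical sketch)."""
    for key, value in dat.items():
        # Only float64 numeric data is narrowed; strings and integer data are left untouched.
        if isinstance(value, np.ndarray) and value.dtype == np.float64:
            dat[key] = value.astype(np.float32)

data = {"solution/primal/pg": np.array([1.0, 2.0]), "meta/ref": np.array("case14")}
convert_to_float32(data)
print(data["solution/primal/pg"].dtype)  # float32
```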

def make_tree(dat: dict[str, torch.Tensor | numpy.ndarray | numpy.str_],
delimiter: str = '/')

Convert a flat dictionary to a tree. Note that the keys of dat must have a tree structure where data is stored only at the leaves. Keys are assumed to be delimited by "/", e.g. "solution/primal/pg".

Args

dat : dict
Flat dictionary of data.
delimiter : str, optional
Delimiter to use for splitting keys. Defaults to "/".

Returns

dict
Tree dictionary of data from dat.
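The conversion described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not the library's code:

```python
def make_tree(dat, delimiter="/"):
    """Convert a flat dict with delimited keys into a nested dict (sketch)."""
    tree = {}
    for key, value in dat.items():
        *groups, leaf = key.split(delimiter)
        node = tree
        for group in groups:
            node = node.setdefault(group, {})  # descend, creating subtrees as needed
        node[leaf] = value  # data lives only at the leaves
    return tree

flat = {"solution/primal/pg": [1.0, 2.0], "solution/primal/qg": [0.1, 0.2]}
tree = make_tree(flat)
print(tree["solution"]["primal"]["pg"])  # [1.0, 2.0]
```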
def pad_to_dense(array, padval, dtype=builtins.int)

Pad a ragged array to a dense 2-D array, filling missing entries with padval.
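Padding a ragged array to a dense one can be sketched with NumPy as below. This is a hypothetical illustration inferred from the signature, not the library's implementation:

```python
import numpy as np

def pad_to_dense(array, padval, dtype=int):
    """Pad a ragged list of sequences into a dense 2-D array (hypothetical sketch)."""
    width = max(len(row) for row in array)
    dense = np.full((len(array), width), padval, dtype=dtype)  # start fully padded
    for i, row in enumerate(array):
        dense[i, : len(row)] = row  # overwrite the leading entries with real data
    return dense

ragged = [[1, 2, 3], [4], [5, 6]]
print(pad_to_dense(ragged, padval=-1))
```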

Methods

def open_json(self)

Open the JSON file, supporting gzip and bz2 compression based on the file suffix.

def parse_h5(self,
dataset_name: str,
split: str = 'train',
primal: bool = True,
dual: bool = False,
convert_to_float32: bool = True) ‑> dict[str, torch.Tensor | numpy.ndarray | numpy.str_] | tuple[dict[str, torch.Tensor | numpy.ndarray | numpy.str_], dict[str, torch.Tensor | numpy.ndarray | numpy.str_]]

Parse the HDF5 file.

Args

dataset_name : str
The name of the dataset. Typically the formulation ("ACOPF", "DCOPF", etc.).
split : str, optional
The split to return. Defaults to "train".
primal : bool, optional
If True, parse the primal file. Defaults to True.
dual : bool, optional
If True, parse the dual file. Defaults to False.
convert_to_float32 : bool, optional
If True, convert all float64 data to torch.float32. Defaults to True.

Returns

dict
Flattened dictionary of HDF5 data with PyTorch tensors for numerical data and NumPy arrays for string/object data.

If both primal and dual are True, this function returns a tuple of two dictionaries: the first containing the primal data and the second containing the dual data.

This parser returns a single-level dictionary whose keys have the form solution/primal/pg, where solution is the group, primal is the subgroup, and pg is the dataset in the HDF5 file. The values are PyTorch tensors (NumPy arrays for string/object data). The parser uses h5py.File.visititems to iterate over the HDF5 file quickly.
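The flat-key layout that visititems produces is the inverse of make_tree; it can be mimicked in plain Python as below. This flatten_tree helper is hypothetical, shown only to illustrate the key structure:

```python
def flatten_tree(tree, delimiter="/", prefix=""):
    """Flatten a nested dict into delimiter-joined keys, mirroring the parser's flat layout (sketch)."""
    flat = {}
    for key, value in tree.items():
        full_key = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_tree(value, delimiter, full_key))  # recurse into subgroups
        else:
            flat[full_key] = value  # leaf dataset
    return flat

nested = {"solution": {"primal": {"pg": [1.0]}}}
print(flatten_tree(nested))  # {'solution/primal/pg': [1.0]}
```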

def parse_json(self, model_type: str | Sequence[str] = None)

Parse the JSON file from PGLearn.

Args

model_type : str | Sequence[str], optional
The reference solutions to save. Defaults to None (no reference solutions saved).

Returns

dict
Dictionary containing the parsed data.

In the JSON file, the data is stored by each individual component. So to get generator 1's upper bound on active generation, you'd look at: raw_json['data']['gen']['1']['pmax'] and get a float.

In the parsed version, each component's attributes are aggregated into torch.Tensor arrays. So to get generator 1's upper bound on active generation, you'd look at: dat['gen']['pmax'][0] and get a float. Note that the index is 0-based and an integer, not 1-based and a string.

To access the reference solution, pass a model_type (or multiple) and then access dat["ref_solutions"][model_type].
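The per-component aggregation described above can be sketched as follows. This is a hypothetical illustration using NumPy and scalar attributes; it assumes the raw JSON keys are 1-based string ids, as in the example above:

```python
import numpy as np

def aggregate_components(raw_components):
    """Stack per-component attribute scalars into arrays, ordered by 1-based string id (sketch)."""
    ids = sorted(raw_components, key=int)  # "1", "2", ... -> numeric order
    attrs = raw_components[ids[0]].keys()
    return {attr: np.array([raw_components[i][attr] for i in ids]) for attr in attrs}

raw_gen = {"2": {"pmax": 0.5, "pmin": 0.0}, "1": {"pmax": 3.2, "pmin": 0.1}}
gen = aggregate_components(raw_gen)
print(gen["pmax"][0])  # 3.2  (generator "1" maps to index 0)
```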

def validate_path(self, path: str | pathlib.Path) ‑> pathlib.Path

Validate the path to the HDF5 file.