PyEGRO DOE Module API Reference¶
Overview¶
The Design of Experiments (DOE) module in PyEGRO provides tools for generating initial design samples with support for multiple sampling methods and variable types. This reference details the classes, methods, and parameters available in the module.
Classes¶
InitialDesign¶
The main class for creating and managing design samples.
Constructor¶
InitialDesign(
output_dir='DATA_PREPARATION',
show_progress=True,
show_result=True,
sampling_method='lhs',
sampling_criterion='maximin',
results_filename='training_data'
)
Parameters¶
output_dir(str, optional): Directory for saving files. Default: 'DATA_PREPARATION'show_progress(bool, optional): Whether to show progress bars during execution. Default: Trueshow_result(bool, optional): Whether to show result summary after execution. Default: Truesampling_method(str, optional): Sampling method to use ('lhs', 'sobol', 'halton', or 'random'). Default: 'lhs'sampling_criterion(str, optional): Criterion for LHS (only used whensampling_methodis 'lhs'). Default: 'maximin'results_filename(str, optional): Base filename for saving results CSV. Default: 'training_data'
Methods¶
add_design_variable¶
Add a design variable to the experiment.
add_design_variable(
name,
range_bounds,
cov=None,
std=None,
delta=None,
distribution='normal',
description=''
)
Parameters¶
name(str): Name of the variablerange_bounds(List[float]): List containing [min, max] boundscov(float, optional): Coefficient of variation (uncertainty as a proportion of the mean)std(float, optional): Standard deviation (fixed uncertainty value)delta(float, optional): Half-width for uniform uncertainty around design pointsdistribution(str, optional): Distribution type ('normal', 'uniform', 'lognormal'). Default: 'normal'description(str, optional): Optional description of the variable. Default: ''
Notes¶
- Specify exactly one of
cov,std, ordeltafor uncertain variables. - For deterministic variables, don't specify any uncertainty parameter.
- If
deltais specified, using 'uniform' distribution is recommended.
Returns¶
- None
add_env_variable¶
Add an environmental variable to the experiment.
add_env_variable(
name,
distribution,
description='',
mean=None,
cov=None,
std=None,
low=None,
high=None,
min=None,
max=None
)
Parameters¶
name(str): Name of the variabledistribution(str): Distribution type (see Supported Distributions)description(str, optional): Optional description of the variable. Default: ''mean(float, optional): Mean value (for normal, lognormal, etc.). Default: Nonecov(float, optional): Coefficient of variation (alternative to std). Default: Nonestd(float, optional): Standard deviation (alternative to cov). Default: Nonelow(float, optional): Lower bound (for uniform, triangular, etc.). Default: Nonehigh(float, optional): Upper bound (for uniform, triangular, etc.). Default: Nonemin(float, optional): Lower bound (alternative to low). Default: Nonemax(float, optional): Upper bound (alternative to high). Default: None
Notes¶
- For normal/lognormal distributions, specify
meanand exactly one ofcovorstd. - For uniform distributions, specify either
low/highormin/max. - The parameters
min/maxare provided for backward compatibility withlow/high.
Returns¶
- None
save¶
Save design configuration to a JSON file.
Parameters¶
filename(str, optional): Base name for the JSON file (without extension). Default: 'data_info'
Returns¶
- None
load¶
Load design configuration from a previously saved JSON file.
Parameters¶
filename(str): Base name of the JSON file to load (without extension)
Returns¶
- None
run¶
Run the sampling process and evaluate the objective function.
Parameters¶
objective_function(Callable): Function to evaluate samplesnum_samples(int): Number of samples to generatesave_results(bool, optional): Whether to save results to file. Default: True
Returns¶
- pandas.DataFrame containing samples and objective values
print_summary¶
Print summary statistics of the sampling results.
Parameters¶
df(pandas.DataFrame): DataFrame containing the sampling results
Returns¶
- None
AdaptiveDistributionSampler¶
A class for generating samples from different distributions.
Class Attributes¶
SAMPLING_METHODS(list): List of supported sampling methods: ['random', 'lhs', 'sobol', 'halton']
Methods¶
generate_samples¶
Generate samples for a given distribution type.
generate_samples(
distribution,
mean=None,
cov=None,
std=None,
delta=None,
lower=None,
upper=None,
size=1000,
**kwargs
)
Parameters¶
distribution(str): Distribution type (see Supported Distributions)mean(float, optional): Mean value. Default: Nonecov(float, optional): Coefficient of variation. Default: Nonestd(float, optional): Standard deviation (alternative to cov). Default: Nonedelta(float, optional): Half-width for uniform distribution. Default: Nonelower(float, optional): Lower bound. Default: Noneupper(float, optional): Upper bound. Default: Nonesize(int, optional): Number of samples to generate. Default: 1000**kwargs(dict): Additional parameters for specific distributions
Returns¶
- numpy.ndarray of samples
generate_design_samples¶
Generate samples for design variables using specified method.
Parameters¶
design_vars(List[Variable]): List of design variablesnum_samples(int): Number of samples to generatemethod(str, optional): Sampling method. Default: 'lhs'criterion(str, optional): Criterion for LHS. Default: 'maximin'
Returns¶
- numpy.ndarray of samples scaled to variable bounds
generate_env_samples¶
Generate samples for an environmental variable.
Parameters¶
var(Variable or dict): Environmental variable to samplenum_samples(int): Number of samples to generate
Returns¶
- numpy.ndarray of samples
generate_all_samples¶
Generate samples for all variables.
Parameters¶
variables(List[Variable]): List of Variable objectsnum_samples(int): Number of samples to generatemethod(str, optional): Sampling method. Default: 'lhs'criterion(str, optional): Criterion for LHS. Default: None
Returns¶
- numpy.ndarray of samples
calculate_input_bounds¶
Calculate input bounds for all variables.
Parameters¶
variables(List[Variable]): List of variables
Returns¶
- List[List[float]] containing [min, max] bounds for each variable
Variable¶
Attributes¶
cov(float, optional): Coefficient of variationstd(float, optional): Standard deviation (alternative to cov)delta(float, optional): Half-width for uniform uncertaintylow(float, optional): Lower bound (for various distributions)high(float, optional): Upper bound (for various distributions)min(float, optional): Lower bound (alternative to low)max(float, optional): Upper bound (alternative to high)
Supported Distributions¶
The following distributions are supported for design and environment variables:
Uniform: Constant probability within interval
-
Required parameters
env variables:
low/highormin/maxdesign variables:
delta
Normal: Gaussian distribution
-
Required parameters
env variables:
meanplus eithercovorstddesign variables:
covorstdOptional parameters:
lower,upper(for truncation)
Lognormal: Natural logarithm follows normal distribution
-
Required parameters
env variables:
meanplus eithercovorstddesign variables:
covorstdOptional parameters:
lower,upper(for truncation)
Module Functions¶
get_version¶
Return the current version of the module.
Returns¶
- str - Version string
get_citation¶
Return citation information for the module.
Returns¶
- str - Citation text
print_usage¶
Print detailed usage instructions for the module.
Returns¶
- None
Examples¶
Basic Example¶
import numpy as np
from PyEGRO.doe import InitialDesign
# Define objective function
def objective_func_2D(x):
X1, X2 = x[:, 0], x[:, 1]
return -((X1 - 2)**2 + (X2 - 3)**2)
# Create initial design
design = InitialDesign(
output_dir='my_experiment',
sampling_method='lhs',
sampling_criterion='maximin'
)
# Add variables
design.add_design_variable(name='x1', range_bounds=[-5, 5], cov=0.1)
design.add_design_variable(name='x2', range_bounds=[-5, 5], cov=0.1)
# Run sampling
results = design.run(objective_function=objective_func_2D, num_samples=50)
Using Standard Deviation Instead of CoV¶
# Create design
design = InitialDesign(sampling_method='lhs')
# Add design variables with standard deviation
design.add_design_variable(
name='x1',
range_bounds=[-5, 5],
std=0.2 # Fixed standard deviation
)
design.add_design_variable(
name='x2',
range_bounds=[-5, 5],
std=0.3
)
# Run sampling
results = design.run(objective_function=objective_func_2D, num_samples=50)
Using Delta for Uniform Uncertainty¶
# Create design
design = InitialDesign(sampling_method='sobol')
# Add design variables with uniform uncertainty
design.add_design_variable(
name='x1',
range_bounds=[-5, 5],
delta=0.1, # Half-width of uniform distribution
distribution='uniform'
)
design.add_design_variable(
name='x2',
range_bounds=[-5, 5],
delta=0.1,
distribution='uniform'
)
# Run sampling
results = design.run(objective_function=objective_func_2D, num_samples=50)
With Multiple Distribution Types¶
# Create design
design = InitialDesign(sampling_method='sobol')
# Add design variables
design.add_design_variable(name='x1', range_bounds=[-5, 5], std=0.2)
# Add environmental variables with different distributions
design.add_env_variable(
name='normal_var',
distribution='normal',
mean=5,
std=2.0 # Using std instead of cov
)
design.add_env_variable(
name='uniform_var',
distribution='uniform',
low=0.0,
high=1.0
)
design.add_env_variable(
name='lognormal_var',
distribution='lognormal',
mean=10,
std=2.0
)
# Define objective function that uses all variables
def objective_func(x):
design_var = x[:, 0]
normal_var = x[:, 1]
uniform_var = x[:, 2]
lognormal_var = x[:, 3]
return design_var**2 + 10*normal_var + 5*uniform_var + lognormal_var
# Run sampling
results = design.run(objective_function=objective_func, num_samples=100)
Output File Formats¶
data_info.json¶
Contains configuration information including: - Variable definitions (name, type, distribution, bounds, etc.) - Sampling configuration (method, criterion) - Metadata (creation time, variable counts)
Example structure:
{
"variables": [
{
"name": "x1",
"vars_type": "design_vars",
"range_bounds": [-5, 5],
"std": 0.2,
"distribution": "normal",
"description": "Design variable with std uncertainty"
},
{
"name": "noise",
"vars_type": "env_vars",
"distribution": "normal",
"mean": 0,
"std": 0.5,
"description": "Environmental noise"
}
],
"variable_names": ["x1", "noise"],
"input_bound": [[-5, 5], [-1.5, 1.5]],
"sampling_config": {
"method": "lhs",
"criterion": "maximin"
},
"metadata": {
"created_at": "2025-03-31T12:34:56.789Z",
"total_variables": 2
}
}
training_data.csv¶
Contains the sample data: - Columns for each variable (design and environmental) - Final column 'y' with objective function values
Example:
Best Practices¶
When to use each uncertainty specification method¶
- Use
cov: When uncertainty is proportional to the design value (e.g., manufacturing tolerance as a percentage) - Use
std: When uncertainty is fixed regardless of design value or near zero regions exist - Use
delta: When uncertainty follows a uniform distribution with fixed width
Choosing a sampling method¶
- LHS: Good for general-purpose sampling with even coverage
- Sobol/Halton: Good for progressive sampling where you might need more points later
- Random: Only use when randomness is specifically required
Handling environmental variables¶
- Use
stdinstead ofcovfor normal environmental variables when the mean is near zero - For uniform distributions, both
low/highandmin/maxare supported for backward compatibility