Hi. First of all, thanks a lot for developing Optuna, it is amazing software! I'm doing hyperparameter optimization using a vanilla model:
study_name = 'wd_dr_hidden_lr_e3'
storage = 'sqlite:///e3_%s.db' % field
n_trials = 50

objective = Objective(device, seed, f_maps, f_params, batch_size, splits,
                      arch, min_lr, beta1, beta2, epochs, root_out, field)

sampler = optuna.samplers.TPESampler(n_startup_trials=10)
study = optuna.create_study(study_name=study_name, sampler=sampler, storage=storage,
                            load_if_exists=True)
study.optimize(objective, n_trials)
I'm using 2 GPUs (each run from a different terminal), and after a few trials I get this error:
Traceback (most recent call last):
File "main_hyperparams.py", line 222, in <module>
study.optimize(objective, n_trials)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/study.py", line 315, in optimize
show_progress_bar=show_progress_bar,
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/_optimize.py", line 65, in _optimize
progress_bar=progress_bar,
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/_optimize.py", line 156, in _optimize_sequential
trial = _run_trial(study, func, catch)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/_optimize.py", line 238, in _run_trial
study._tell(trial, TrialState.COMPLETE, value)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/study.py", line 603, in _tell
self._storage.set_trial_state(trial._trial_id, state)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/storages/_cached_storage.py", line 200, in set_trial_state
return self._flush_trial(trial_id)
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/storages/_cached_storage.py", line 404, in _flush_trial
datetime_complete=updates.datetime_complete,
File "/mnt/home/fvillaescusa/.local/lib/python3.7/site-packages/optuna/storages/_rdb/storage.py", line 604, in _update_trial
raise RuntimeError("Cannot change attributes of finished trial.")
RuntimeError: Cannot change attributes of finished trial.
I was wondering if I'm doing something wrong here. Thanks for your help!
Happy New Year! Kicking off with the release of v2.4.0, thanks to everyone who was involved. This is a minor version but still a big release. Please check out the highlights and release notes at https://github.com/optuna/optuna/releases/tag/v2.4.0 or via the tweet https://twitter.com/OptunaAutoML/status/1348897690545840135?s=20. In short, it contains:
Python 3.9 support (with the exclusion of integration modules)
Multi-objective optimization that’s now stable as a first-class citizen
Sampler that wraps BoTorch for Bayesian optimization. This sampler opens up Optuna for constrained optimization using slack variables, i.e. outcome constraints such as x0 + x1 < y. See https://github.com/optuna/optuna/blob/release-v2.4.0/examples/botorch_simple.py
Richer and more easily extensible tutorial https://optuna.readthedocs.io/en/v2.4.0/tutorial/index.html
May I ask whether there is any example of CmaEsSampler usage?
I am wondering whether CMA-ES is only used to sample the relative parameters, while a random sampler is used to sample the independent parameters. Is the relative search space determined by the backend if we follow the demo code below?
import optuna

def objective(trial):
    x = trial.suggest_uniform("x", -1, 1)
    y = trial.suggest_int("y", -1, 1)
    return x ** 2 + y

sampler = optuna.samplers.CmaEsSampler()
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=20)
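For what it's worth, my current understanding (please correct me if this is wrong) is that the fallback sampler for independent parameters can be made explicit. A minimal sketch, assuming CmaEsSampler accepts the independent_sampler and warn_independent_sampling arguments:

import optuna

# Parameters that CMA-ES cannot handle (e.g. categorical ones, or suggestions made
# before the relative search space is built) fall back to this independent sampler.
sampler = optuna.samplers.CmaEsSampler(
    independent_sampler=optuna.samplers.RandomSampler(seed=42),
    warn_independent_sampling=True,
)
study = optuna.create_study(sampler=sampler)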
Hi Team, I am using a dataset of about 140,000 rows and 300 features (after categorical encoding), together with the Optuna integration for XGBoost (via xgb.cv()). First, I tried XGBoost 1.3.3 with Optuna 2.4.0: the program ran for 18 hours on 32 CPUs and not a single trial completed. I then tried XGBoost 1.3.1 with Optuna 2.4.0: the program ran for 3 hours on 60 CPUs and, again, not a single trial completed. I am now trying XGBoost 1.2.1 with Optuna 2.3.0 on 60 CPUs. Can anyone help me understand whether there are any compatibility issues?
The code that I am using is given below:
import optuna
import xgboost as xgb
from optuna.samplers import TPESampler

def objective(trial):
    # Define the search space
    param_sp = {
        'base_score'       : 0.5,
        'booster'          : 'gbtree',
        'colsample_bytree' : trial.suggest_categorical('colsample_bytree', [0.7, 0.8, 0.9, 1.0]),
        'learning_rate'    : trial.suggest_categorical('learning_rate', [0.1]),
        'max_depth'        : trial.suggest_categorical('max_depth', [6, 8, 10]),
        'objective'        : 'binary:logistic',
        'scale_pos_weight' : trial.suggest_categorical('scale_pos_weight', [ratio1, ratio2, ratio3, ratio4, 1, 10, 30, 50, 75, 99, 100]),
        'subsample'        : trial.suggest_categorical('subsample', [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'verbosity'        : 1,
        'tree_method'      : 'auto',
        'predictor'        : 'cpu_predictor',
        'eval_metric'      : 'aucpr',
    }

    # Add the pruning callback
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "test-aucpr")

    # Perform native-API cross validation
    xgb_cv_results = xgb.cv(param_sp, dtrain, stratified=True, folds=skfolds, metrics='aucpr',
                            num_boost_round=500, early_stopping_rounds=50, as_pandas=True,
                            verbose_eval=False, seed=42, shuffle=True, callbacks=[pruning_callback])

    # Set n_estimators as a trial attribute
    trial.set_user_attr("n_estimators", len(xgb_cv_results))

    # Extract the best score
    best_score = xgb_cv_results["test-aucpr-mean"].values[-1]
    return best_score

pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=20, interval_steps=10)
study = optuna.create_study(study_name='XGB_Optuna_0.1_Iter1', direction='maximize',
                            sampler=TPESampler(consider_magic_clip=True, seed=42, multivariate=False),
                            pruner=pruner)

# Perform the search
print('\nPerforming Bayesian hyperparameter optimization..')
study.optimize(objective, n_trials=100, n_jobs=-1)
Hi Team, I am using XGBoost 1.3.3 and Optuna 2.4.0. My dataset has 138k rows and 300 columns (after categorical encoding). I am trying to replicate the example in https://github.com/optuna/optuna/blob/master/examples/pruning/xgboost_integration.py (but only for booster='gbtree'). When I run the code, I get the message 'segmentation fault' and the program returns to the $ prompt (I am using Amazon Linux). Can anyone please help me understand why I am getting this 'segmentation fault'?
The code that I am using is given below:
import numpy as np
import optuna
import xgboost as xgb
from optuna.samplers import TPESampler
from sklearn import metrics

# Import data into xgb.DMatrix form
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Define the search space and the objective function
def objective(trial):
    param_sp = {
        'base_score'        : 0.5,
        'booster'           : 'gbtree',
        'colsample_bylevel' : trial.suggest_categorical('colsample_bylevel', [0.7, 0.8, 0.9]),
        'colsample_bynode'  : trial.suggest_categorical('colsample_bynode', [0.7, 0.8, 0.9]),
        'colsample_bytree'  : trial.suggest_categorical('colsample_bytree', [0.7, 0.8, 0.9]),
        'gamma'             : trial.suggest_categorical('gamma', [0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
        'learning_rate'     : trial.suggest_categorical('learning_rate', [0.1]),
        'max_delta_step'    : trial.suggest_categorical('max_delta_step', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
        'max_depth'         : trial.suggest_categorical('max_depth', [10]),
        'min_child_weight'  : trial.suggest_categorical('min_child_weight', [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]),
        'objective'         : 'binary:logistic',
        'reg_alpha'         : trial.suggest_categorical('reg_alpha', [0.000000001, 0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]),
        'reg_lambda'        : trial.suggest_categorical('reg_lambda', [0.000000001, 0.00000001, 0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]),
        'scale_pos_weight'  : trial.suggest_categorical('scale_pos_weight', [ratio1, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 1000]),
        'seed'              : 42,
        'subsample'         : trial.suggest_categorical('subsample', [0.5, 0.6, 0.7, 0.8, 0.9]),
        'verbosity'         : 1,
        'tree_method'       : 'auto',
        'predictor'         : 'cpu_predictor',
        'eval_metric'       : 'error',
    }

    # Add the pruning callback
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "validation-error")

    # Train with a validation set for early stopping and pruning
    xgb_bst = xgb.train(param_sp, dtrain, num_boost_round=1000, evals=[(dtest, "validation")],
                        early_stopping_rounds=100, verbose_eval=False, callbacks=[pruning_callback])

    # Set n_estimators as a trial attribute
    trial.set_user_attr("n_estimators", xgb_bst.best_ntree_limit)

    # Extract the best score
    preds = xgb_bst.predict(dtest)
    pred_labels = np.rint(preds)
    f1 = metrics.f1_score(y_test, pred_labels)
    return f1

pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=20, interval_steps=10)
study = optuna.create_study(study_name='XGB_Optuna_0.1_max_depth_10_Error_Val_500_trials',
                            direction='minimize',
                            sampler=TPESampler(consider_magic_clip=True, seed=42, multivariate=False),
                            pruner=pruner)

# Perform the search
print('\nPerforming Bayesian hyperparameter optimization..')
study.optimize(objective, n_trials=500, n_jobs=16)
Hello
I'm trying to visualize the study output in a Jupyter notebook:
optuna.visualization.plot_optimization_history(study)
optuna.visualization.plot_slice(study)
optuna.visualization.plot_contour(study, params=['epochs', 'learning_rate'])
Nothing happens when I run these commands.
Has anybody tried doing visualizations in a similar environment?
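In case it is relevant, here is a minimal sketch of what I have tried, assuming the plot_* functions return Plotly figure objects that need to be displayed explicitly in the notebook:

import optuna

# The figure is returned rather than shown, so render it explicitly
# (or make it the last expression of the notebook cell).
fig = optuna.visualization.plot_optimization_history(study)
fig.show()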
My pruning metric is val_loss, which is supposed to be minimised. How can I be sure that the pruner is minimising val_loss while still having a maximise optimisation step?
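One workaround I can think of (only a sketch of my understanding, with n_epochs and train_one_epoch as hypothetical placeholders): since the pruner interprets intermediate values according to the study direction, report a value whose direction matches the study, e.g. the negated validation loss when the study maximises:

import optuna

def objective(trial):
    # With direction="maximize", reporting -val_loss keeps "larger is better"
    # consistent for both the pruner and the final objective value.
    for epoch in range(n_epochs):              # hypothetical epoch count
        val_loss = train_one_epoch(trial)      # hypothetical training step
        trial.report(-val_loss, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return -val_loss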
My loss has the form loss = weight_1 * loss_component_1 + weight_2 * loss_component_2 + ..., with the constraints sum(weight_i) = 1 and 0 < weight_i < 1 for every i. I want to find the optimal combination of weight_i, so essentially I need to sample the parameters from a multinoulli-like (simplex) distribution. Of course, I can sample the parameters from a uniform distribution and then normalize them, but I don't feel this is the right way.
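For reference, this is the normalisation approach I was describing, as a minimal sketch (N_COMPONENTS and compute_loss_component are placeholders):

import optuna

N_COMPONENTS = 3  # placeholder for the number of loss components

def objective(trial):
    # Suggest raw weights, then project them onto the simplex by normalising.
    raw = [trial.suggest_float(f"w_{i}", 1e-6, 1.0) for i in range(N_COMPONENTS)]
    total = sum(raw)
    weights = [r / total for r in raw]  # sum(weights) == 1 and 0 < w_i < 1
    # Record the normalised weights; only the raw suggestions are stored as params.
    for i, w in enumerate(weights):
        trial.set_user_attr(f"weight_{i}", w)
    # compute_loss_component(i) stands in for evaluating loss_component_i.
    loss = sum(w * compute_loss_component(i) for i, w in enumerate(weights))
    return loss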
Hi everyone. I am going to use Optuna for hyperparameter optimization of an iterative process in which the number of samples increases with each iteration. I start Optuna from scratch at iteration 0, but for the following iterations I reuse the accumulated trials from all previous iterations. With this warm-start scheme, after some iterations the search space becomes very narrow and concentrates on a tiny region of the parameter space. Now I need to give it the chance to look into other regions of the parameter space after a few iterations. One idea I have is to force it to forget trials from long ago; for example, when it starts iteration 5, I want it to ignore the trials from iterations 0 and 1, and so on. To do so, I use the following piece of code to manually change the state of those trials from 'COMPLETE' to 'FAIL'; with this, when the study is loaded, only the trials with state='COMPLETE' are taken into account.
import sqlite3
import pandas as pd

def makefailSqliteTable(storage):
    sqliteConnection = None
    try:
        sqliteConnection = sqlite3.connect(storage)
        cursor = sqliteConnection.cursor()
        # Mark every trial as failed so that it is ignored when the study is loaded.
        sql_update_query = """UPDATE trials SET state = 'FAIL'"""
        cursor.execute(sql_update_query)
        sqliteConnection.commit()
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to update sqlite table", error)
    finally:
        if sqliteConnection:
            sqliteConnection.close()
            print("The SQLite connection is closed")

def updateSqliteTable(storage, N):
    sqliteConnection = None
    try:
        sqliteConnection = sqlite3.connect(storage)
        cursor = sqliteConnection.cursor()
        df = pd.read_sql_query("SELECT * FROM trials", sqliteConnection)
        # Restore the state of the last N trials to 'COMPLETE'.
        sql_update_query = """UPDATE trials SET state = 'COMPLETE' WHERE number > """ + str(len(df) - N)
        cursor.execute(sql_update_query)
        sqliteConnection.commit()
        cursor.close()
    except sqlite3.Error as error:
        print("Failed to update sqlite table", error)
    finally:
        if sqliteConnection:
            sqliteConnection.close()
            print("The SQLite connection is closed")
I would like to know whether this procedure does what I want. I mean, does it really make Optuna forget the history from long ago?
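An alternative I have been considering at the Optuna level, instead of editing the SQLite file directly (just a sketch, assuming Study.add_trial is available in the installed version; the study and storage names and N_KEEP are placeholders):

import optuna
from optuna.trial import TrialState

N_KEEP = 100  # placeholder: number of recent trials to keep

# Copy only the most recent completed trials into a fresh study, so the old
# history is simply absent instead of being relabelled as failed.
old_study = optuna.load_study(study_name="old_study", storage="sqlite:///old.db")
recent = [t for t in old_study.trials if t.state == TrialState.COMPLETE][-N_KEEP:]

new_study = optuna.create_study(study_name="warm_start", storage="sqlite:///new.db")
for t in recent:
    new_study.add_trial(t)  # experimental API in recent Optuna versions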
import numpy as np
import optuna

def objective(trial):
    # Instantiate the model and evaluate it (pseudocode; the suggest_* calls
    # for "D" and "L" are omitted here).
    metric_1 = ...
    metric_2 = ...
    # Get additional metrics.
    metric_3 = ...
    return [metric_1, metric_2, metric_3]

d_space = np.linspace(0.05, 0.95, 2)
l_space = np.linspace(0.0001, 0.0002, 2)
search_space = {"D": d_space, "L": l_space}

study = optuna.create_study(sampler=optuna.samplers.GridSampler(search_space))
study.optimize(objective,
               n_trials=d_space.shape[0] * l_space.shape[0],
               show_progress_bar=True)
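If I understand the multi-objective API correctly, returning three metrics also requires declaring one optimization direction per metric when the study is created. A minimal sketch of what I mean (the "maximize" choices are placeholders, and search_space is the grid defined above):

import optuna

study = optuna.create_study(
    directions=["maximize", "maximize", "maximize"],  # one direction per returned metric
    sampler=optuna.samplers.GridSampler(search_space),
)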
Hi :-)
Is there a predefined way to nest trial parameters?
I would like to pass a trial object into a function, and all the parameters that get added inside this function should automatically be prefixed by a string that I specify.
I imagine the interface to look something like this, but I did not find anything similar in the API:
def configure_subsystem_a(trial: optuna.Trial) -> SubSystemA:
    n_params = trial.suggest_int("n_params", 1, 3)
    return SubSystemA(n_params)

trial = ...
subsystem_a = configure_subsystem_a(trial.withPrefix('subsystem_a'))
This should result in a config like this:
{
    'subsystem_a.n_params': 3
}
It would be quite easy to build this functionality myself by wrapping the trial object, but if functionality like this is already provided, I would prefer to use that.
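For context, this is roughly the wrapper I would otherwise build myself (just a sketch; PrefixedTrial is my own name, not an existing Optuna class):

import optuna

class PrefixedTrial:
    # Thin wrapper that prepends a prefix to every suggested parameter name (sketch).
    def __init__(self, trial: optuna.Trial, prefix: str):
        self._trial = trial
        self._prefix = prefix

    def suggest_int(self, name: str, low: int, high: int, **kwargs) -> int:
        return self._trial.suggest_int(f"{self._prefix}.{name}", low, high, **kwargs)

    def suggest_float(self, name: str, low: float, high: float, **kwargs) -> float:
        return self._trial.suggest_float(f"{self._prefix}.{name}", low, high, **kwargs)

# Usage sketch:
# subsystem_a = configure_subsystem_a(PrefixedTrial(trial, "subsystem_a"))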
Hi,
What is the preferred way of dealing with trials/suggestions whose ranges depend on each other? See the __call__ method below.
Is the preferred way to do it as below, or is it better to set something like a central value and penalize values outside of the range?
Does this method even work with Optuna (the sampler being used is TPE)? Are certain samplers better at this?
Any tips on literature?
import torch

class Objectiveoptim(object):
    def __init__(self, idmodelsdict, value):
        self.idmodelsdict = idmodelsdict
        self.value = value

    def __call__(self, trial):
        totalfactor = 0
        totalvalueused = 0
        valuedict = dict()
        for id_, model in self.idmodelsdict.items():
            model.eval()
            # The upper bound of each suggestion shrinks by the amount already allocated.
            valuedict[id_] = trial.suggest_float(id_, 0, self.value - totalvalueused)
            totalvalueused += valuedict[id_]
            totalfactor += model(torch.tensor([valuedict[id_]], dtype=torch.float32))
        return totalfactor
Hi,
I'm having difficulties understanding how I can use the command
optuna.visualization.plot_param_importances(study)
to visualize hyperparameter importances while performing multi-objective optimization. I understand that I should specify the metric with respect to which I want the importances to be computed, but I don't understand how to do so.
Thanks in advance!
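For what it's worth, my current guess (a sketch only, assuming this Optuna version's plot_param_importances accepts the target and target_name arguments):

import optuna

# Select one objective via the `target` callable; index 0 and the label
# "objective 0" are placeholders for whichever metric is of interest.
fig = optuna.visualization.plot_param_importances(
    study,
    target=lambda t: t.values[0],
    target_name="objective 0",
)
fig.show()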