##### Activity
hitlixuan
@hitlixuan
A rallying call: let's set up a WeChat group!
wzy461143268
@wzy461143268
igor17400
@igor17400

I have created a folder br_index inside data_collector to implement code that tracks the historical composition of Brazil's largest index (Ibovespa), very similar to what has been done in us_index and cn_index.

I have created issue #956 and would like to upload the code I've written so it can be reviewed and, if approved, merged into the main repository. However, when I tried to git push my branch, I found that I don't have permission.
How should I proceed?

8 replies
hotwind2015
@hotwind2015
@zhupr
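The calendar function raises an error. The code is as follows:
D.calendar(start_time='2021-12-30', end_time='2021-12-31', freq='week', future = False)
The error message is:
ValueError: calendar not exists: K:\stock\qlib-data\cn_data_dump\calendars\1week.txt
When the program looks up the calendar file, an extra "1" is prepended to the file name. From the code it looks like this prefix is added on purpose, but the files in the calendars directory are generated automatically by dump_bin, which does not prepend "1" to the file name. Is there still a bug in calendar handling in qlib 0.8.4? (Renaming the file by hand makes the error go away, but I shouldn't have to rename the file after every dump.)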
3 replies
realamd
@realamd
Could you please post the WeChat group QR code again?
XianfengJiao
@XianfengJiao
Has anyone run into this problem when running the example?
ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[ValueError: instrument: {'__DEFAULT_FREQ': 'C:\Users\Alphonse\.qlib\qlib_data\cn_data'} does not contain data for day]
workflow_config_lightgbm_Alpha158.yaml
XianfengJiao
@XianfengJiao
Could you post the WeChat group again?~
Also, for these risk evaluation metrics, is smaller better?
                                                  risk
excess_return_without_cost mean               0.000692
                           std                0.005374
                           annualized_return  0.174495
                           information_ratio  2.045576
                           max_drawdown      -0.079103
excess_return_with_cost    mean               0.000499
                           std                0.005372
                           annualized_return  0.125625
                           information_ratio  1.473152
                           max_drawdown      -0.088263
I mean these metrics.
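For readers wondering how the numbers in a table like this relate to each other, here is a minimal sketch. It assumes daily excess returns, 252 trading periods per year, and additive (cumulative-sum) accumulation of returns; the function name `risk_summary` and the sample returns are hypothetical and are not Qlib's API or data. Under these assumptions, a higher mean, annualized_return and information_ratio are better, a lower std is better, and a max_drawdown closer to zero is better.

```python
import math
import statistics


def risk_summary(daily_returns, periods_per_year=252):
    """Summarize a list of daily (excess) returns.

    Assumptions: additive accumulation of returns and 252 periods per year.
    """
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample standard deviation
    annualized_return = mean * periods_per_year
    information_ratio = mean / std * math.sqrt(periods_per_year)

    # Max drawdown: the largest drop of the cumulative return below its running peak.
    cumulative = 0.0
    peak = float("-inf")
    max_drawdown = 0.0
    for r in daily_returns:
        cumulative += r
        peak = max(peak, cumulative)
        max_drawdown = min(max_drawdown, cumulative - peak)

    return {
        "mean": mean,
        "std": std,
        "annualized_return": annualized_return,
        "information_ratio": information_ratio,
        "max_drawdown": max_drawdown,
    }


# Made-up daily excess returns, for illustration only.
summary = risk_summary([0.01, -0.02, 0.015, -0.005, 0.02])
```

As a sanity check against the table, 0.000692 × 252 ≈ 0.1744, which is close to the reported annualized_return of 0.174495 (the small gap is consistent with the mean being rounded).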
igor17400
@igor17400

While trying to submit PR #990, one test failed: Test MacOS / build (macos-latest, 3.7) (pull_request).

The error results from executing the command python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --interval 1d --region cn:

zipfile.BadZipFile: File is not a zip file
Error: Process completed with exit code 1.

However, when I create a virtual environment with Python 3.7 on my own macOS machine, which is the same version used to test the code, my script runs successfully.

2022-03-21 16:18:00.530 | WARNING  | qlib.tests.data:_download_data:57 - The data for the example is collected from Yahoo Finance. Please be aware that the quality of the data might not be perfect. (You can refer to the original data source: https://finance.yahoo.com/lookup.)
216677376it [01:30, 2393653.69it/s]
2022-03-21 16:19:31.063 | WARNING  | qlib.tests.data:_unzip:82 - will delete the old qlib data directory(features, instruments, calendars, features_cache, dataset_cache): /Users/igorlimarochaazevedo/.qlib/qlib_data/cn_data
2022-03-21 16:19:31.064 | INFO     | qlib.tests.data:_unzip:85 - /Users/.qlib/qlib_data/cn_data/20220321161759_qlib_data_cn_1d_latest.zip unzipping......
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 43788/43788 [00:07<00:00, 5577.22it/s]

How could I resolve this error?

It is worth mentioning that, at least in my understanding, the code changes submitted in the PR do not interfere with the execution of scripts/get_data.py.

igor17400
@igor17400

In the Data Layer: Data Framework & Usage documentation, under the Multiple Stock Modes section, it says:

The trade unit defines the unit number of stocks can be used in a trade, and the limit threshold defines the bound set to the percentage of ups and downs of a stock.

Can anyone give me a further explanation of, or references on, what trade unit and limit threshold are?

I don't understand exactly how they work in a stock market and, as a consequence, how they affect qlib's execution when it is initialized with one region or another, as in the given example qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
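As a rough illustration of the two concepts in the quoted sentence: a trade unit is the lot size that an order quantity must be a multiple of, and a limit threshold is a bound on the daily percentage price move. The sketch below is only a toy model. The values used (100 shares and 9.5% for a China-like market, 1 share and no limit for a US-like market) and both function names are assumptions for illustration, not a statement of what Qlib does internally.

```python
from typing import Optional


def round_to_trade_unit(shares: float, trade_unit: int) -> int:
    """Round an order size down to a whole number of lots."""
    return int(shares // trade_unit) * trade_unit


def hits_limit(prev_close: float, close: float, limit_threshold: Optional[float]) -> bool:
    """Return True if the daily price change reaches the limit threshold.

    A threshold of None models a market without daily price limits.
    """
    if limit_threshold is None:
        return False
    change = close / prev_close - 1
    return abs(change) >= limit_threshold


# China-like settings (assumed): lots of 100 shares, 9.5% limit threshold.
cn_order = round_to_trade_unit(1234, trade_unit=100)
cn_limited = hits_limit(prev_close=10.0, close=11.0, limit_threshold=0.095)

# US-like settings (assumed): single shares, no price limit.
us_order = round_to_trade_unit(1234, trade_unit=1)
us_limited = hits_limit(prev_close=10.0, close=11.0, limit_threshold=None)
```

In this toy model, an order for 1234 shares becomes 1200 shares under the 100-share trade unit, and a 10% daily move counts as hitting the limit only when a threshold is set.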

igor17400
@igor17400

I'm executing XGBoost with my own data downloaded from Brazil's stock exchange. To better understand the training data that the DataHandler module provides to XGBoost, I exported it to .csv and opened it with pandas.

It has a column named label which, if I understood correctly, comes from the label expression the user defines in the workflow config. In other words, this label is label: ["Ref($close, -2) / Ref($close, -1) - 1"], as in the qlib sample.

However, when I use qlib's Data Retrieval to evaluate the same formula "Ref($close, -2) / Ref($close, -1) - 1", it returns a different value.
Why is that?

Dataframe used for training in XGBoost:

Date: 2008-01-02
Symbol: bbas3.sa
Ref($close, -2) / Ref($close, -1) - 1: 1.08125

Value returned by qlib’s data retrieval:

Date: 2008-01-02
Symbol: bbas3.sa
Ref($close, -2) / Ref($close, -1) - 1: -0.011585

Code used for qlib’s data retrieval:

df_ = D.features(D.instruments('ibov'), ['Ref($close, -2)/Ref($close, -1)-1'], '2008-01-02', '2014-12-30')
df_.loc[(['BBAS3.SA'], '2008-01-02'), :]
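One possible explanation, offered as an assumption and not a confirmed diagnosis: if the handler's learn-time processors normalize the label cross-sectionally (for example with a per-day z-score across instruments), the label column in the training frame will no longer equal the raw expression value. The sketch below uses made-up returns for a single day and a hypothetical helper `cross_sectional_zscore` to show how a raw return of about -1.2% can become a value above 1 after such a normalization.

```python
import statistics


def cross_sectional_zscore(values_by_symbol):
    """Z-score the values of all symbols on a single day.

    Uses the sample standard deviation; this choice is an assumption.
    """
    values = list(values_by_symbol.values())
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return {symbol: (v - mean) / std for symbol, v in values_by_symbol.items()}


# Made-up raw labels for one day (not real market data).
raw_labels = {"A": -0.0116, "B": -0.03, "C": -0.04, "D": -0.035}
normalized = cross_sectional_zscore(raw_labels)

# "A" has a negative raw label but a positive normalized one,
# because it fell less than the other symbols on that day.
print(raw_labels["A"], normalized["A"])
```

If this is what is happening, comparing the D.features output with the handler's raw (unprocessed) data instead of the learn-time data should show matching values.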
gawinghe
@gawinghe
麻烦再发下微信群二维码吧
igor17400
@igor17400

Qlib comes with some benchmark example models for the Chinese stock market, such as XGBoost. Those models already come with predefined parameters which, if I understood correctly, are the optimal parameters.

However, if I modify the China dataset or use a dataset from a different stock market, such as the US or BR (Brazilian) market, how can I obtain optimal parameters like the ones shown below?

model:
    class: XGBModel
    module_path: qlib.contrib.model.xgboost
    kwargs:
        eval_metric: rmse
        colsample_bytree: 0.8879
        eta: 0.0421
        max_depth: 8
        n_estimators: 647
        subsample: 0.8789
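Parameters like these are usually found by running a hyperparameter search on the target dataset. The following is a minimal random-search sketch under stated assumptions: `evaluate` is a hypothetical callback that would train the model with the given kwargs and return a validation score (higher is better), `toy_evaluate` is a stand-in objective for demonstration only, and the search ranges are illustrative, not the ones used for Qlib's benchmarks.

```python
import random

# Illustrative search ranges (assumptions, not Qlib's benchmark settings).
SEARCH_SPACE = {
    "colsample_bytree": lambda rng: rng.uniform(0.5, 1.0),
    "eta": lambda rng: 10 ** rng.uniform(-3, -0.5),  # log-uniform learning rate
    "max_depth": lambda rng: rng.randint(3, 10),
    "n_estimators": lambda rng: rng.randint(100, 1000),
    "subsample": lambda rng: rng.uniform(0.5, 1.0),
}


def random_search(evaluate, n_trials=50, seed=0):
    """Sample n_trials parameter sets and keep the one with the best score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in SEARCH_SPACE.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Stand-in objective; replace with code that trains on the train segment
    # and scores on the valid segment of your own dataset.
    return -abs(params["max_depth"] - 8) - abs(params["eta"] - 0.04)


best_params, best_score = random_search(toy_evaluate, n_trials=200)
```

In practice the callback would be the expensive part, so a dedicated tuning library may be a better fit than plain random search; the loop above only shows the idea.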
LUS8806
@LUS8806
Hi there, how do I get the label data when the dataset is a TSDatasetH?
I am using dataset.prepare('test', col_set='label'), but it returns a TSDataSampler. I need a dataframe.
igor17400
@igor17400

The documentation says the following:

In the Alpha158, Qlib uses the label Ref($close, -2)/Ref($close, -1) - 1 that means the change from T+1 to T+2, rather than Ref($close, -1)/$close - 1, of which the reason is that when getting the T day close price of a china stock, the stock can be bought on T+1 day and sold on T+2 day.

However, in Alpha360, should we use the same expression, Ref($close, -2)/Ref($close, -1) - 1, for the label? Or should we use Ref($close, -1)/$close - 1?

In this other page from the documentation it says,

• For ic

The Pearson correlation coefficient series between label and prediction score. In the above example, the label is formulated as Ref($close, -1)/$close - 1. Please refer to Data Feature for more details.

Are those two formulas related? Shouldn't they be the same?

Issue(Label: Question) #1060
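To make the two label expressions concrete, here is a small sketch on a made-up list of daily close prices. It assumes that Ref($close, -k) refers to the close k trading days after day T, as the quoted documentation describes; the helper names are hypothetical.

```python
# Made-up closes for days T, T+1, T+2, T+3 (not real market data).
closes = [10.0, 11.0, 11.55, 11.55]


def label_t1_to_t2(closes, t):
    """Ref($close, -2) / Ref($close, -1) - 1: return from the T+1 close to the T+2 close."""
    return closes[t + 2] / closes[t + 1] - 1


def label_t_to_t1(closes, t):
    """Ref($close, -1) / $close - 1: return from the T close to the T+1 close."""
    return closes[t + 1] / closes[t] - 1


at_t_first = label_t1_to_t2(closes, 0)   # about 0.05
at_t_second = label_t_to_t1(closes, 0)   # about 0.10
at_t1_second = label_t_to_t1(closes, 1)  # same one-day return as at_t_first
```

Under this reading, the two expressions measure the same kind of one-day return, shifted by one day: the first label on day T equals the second label on day T+1.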

gxxuej
@gxxuej
@LUS8806 Try this: (dataset.prepare('test', col_set='label')).data
igor17400
@igor17400

Why do some models require an infer_processors / learn_processors definition while others don't?

I've read the documentation available at link and the code, but I couldn't understand why some models require infer_processors and others don't. CatBoost doesn't define any infer_processors, while MLP does define some. Why is that?

I've found this explanation of the difference between inference and learning; however, I still can't understand why some models need inference processors and others don't ;-;

lauht
@lauht
Hi everyone, could you please tell me whether any US stock index data is available? For example, SP500?
CyberPlayerOne
@CyberPlayerOne
@wzy461143268 Hi, could you post the WeChat group again? Thanks.
qianyongjun895
@qianyongjun895
Please post the WeChat group.
LUS8806
@LUS8806
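The ALSTM implemented in qlib is different from the implementation in the original paper, isn't it? The differences look quite large to me.
I'd like to know the reason for these changes.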
qianyongjun895
@qianyongjun895
The datasets are different.
路旁的叶修
@ChengzhenDu
Is there a WeChat group?
lukekingca
@lukekingca
Could you post the WeChat group again? Thanks!
lukekingca
@lukekingca
What training environment was used for the DDG-DA model in the paper? Was it trained on CPU?
pizi kuan
@pizikuan_gitlab
Is there a WeChat group?
SITONGRUC
@SITONGRUC
How about setting up another WeChat group?
Quentin168
@Quentin168
While trying detailed_workflow.ipynb with a crypto dataset, I can get Alpha158 features, and then I run the following:
dataset_conf = {
    "class": "DatasetH",
    "module_path": "qlib.data.dataset",
    "kwargs": {
        "handler": hd,
        "segments": {
            "train": ("2018-01-01", "2018-12-31"),
            "valid": ("2019-01-01", "2019-12-31"),
            "test": ("2020-01-01", "2020-12-31"),
        },
    },
}
model = init_instance_by_config({
    "class": "LGBModel",
    "module_path": "qlib.contrib.model.gbdt",
    "kwargs": {
        "loss": "mse",
        "colsample_bytree": 0.8879,
        "learning_rate": 0.0421,
        "subsample": 0.8789,
        "lambda_l1": 205.6999,
        "lambda_l2": 580.9768,
        "max_depth": 8,
        "num_leaves": 210,
    },
})

# start exp to train model

with R.start(experiment_name=EXP_NAME):
    model.fit(dataset)
    R.save_objects(trained_model=model)
    rec = R.get_recorder()
    rid = rec.id  # save the record id

    # Inference and saving signal
    sr = SignalRecord(model, dataset, rec)
    sr.generate()
13678:MainThread INFO - qlib.workflow - [expm.py:315] - <mlflow.tracking.client.MlflowClient object at 0x7f83f32ff3a0>
13678:MainThread INFO - qlib.workflow - [exp.py:257] - Experiment 1 starts running ...
13678:MainThread INFO - qlib.workflow - [recorder.py:293] - Recorder 575dc51dbd674d35a88d21e2e2815093 starts running under Experiment 1 ...
Training until validation scores don't improve for 50 rounds
[20] train's l2: 0.763141 valid's l2: 0.824096
[40] train's l2: 0.763141 valid's l2: 0.824096
[60] train's l2: 0.763141 valid's l2: 0.824096
Early stopping, best iteration is:
[17] train's l2: 0.763141 valid's l2: 0.824096
13678:MainThread INFO - qlib.workflow - [record_temp.py:194] - Signal record 'pred.pkl' has been saved as the artifact of the Experiment 1
'The following are prediction results of the LGBModel model.'
                                        score
datetime   instrument
2020-01-01 BNBUSDT_BINANCE_1D    3.969982e-09
           BTCUSDT_BINANCE_1D    3.969982e-09
           ETHUSDT_BINANCE_1D    3.969982e-09
           MATICUSDT_BINANCE_1D  3.969982e-09
           TRXUSDT_BINANCE_1D    3.969982e-09
13678:MainThread INFO - qlib.timer - [log.py:117] - Time cost: 5.316s | waiting async_log Done
Quentin168
@Quentin168
It seems that training and validation are stuck with no improvement, e.g. [20] train's l2: 0.763141 valid's l2: 0.824096. Can any expert help me figure out where the problem is? Thanks.
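One thing that may be worth checking, stated as a hypothesis and not a diagnosis: with lambda_l1 around 205 and lambda_l2 around 581, gradient-boosted trees trained on a small dataset can end up with leaf values of essentially zero, which would match a loss that never changes and identical scores for every instrument. The sketch below uses the standard regularized leaf-value formula (soft-threshold the gradient sum by the L1 term, then divide by the hessian sum plus the L2 term). The gradient and hessian sums are made-up numbers, and `leaf_value` is a hypothetical helper, not a LightGBM function.

```python
def soft_threshold(value, l1):
    """Shrink value toward zero by l1; return 0.0 if |value| <= l1."""
    if value > l1:
        return value - l1
    if value < -l1:
        return value + l1
    return 0.0


def leaf_value(sum_grad, sum_hess, lambda_l1, lambda_l2):
    """Regularized leaf output for one leaf of a gradient-boosted tree."""
    return -soft_threshold(sum_grad, lambda_l1) / (sum_hess + lambda_l2)


# Made-up statistics for a leaf holding a few hundred samples.
strong_reg = leaf_value(sum_grad=-40.0, sum_hess=300.0,
                        lambda_l1=205.6999, lambda_l2=580.9768)
no_reg = leaf_value(sum_grad=-40.0, sum_hess=300.0,
                    lambda_l1=0.0, lambda_l2=0.0)

# With the strong L1 term the gradient sum is thresholded to zero,
# so the leaf contributes nothing to the prediction.
print(strong_reg, no_reg)
```

If this is the cause, retrying with much smaller lambda_l1 and lambda_l2 values would be a cheap experiment.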
Wing Light
@winglight
Has DatasetH not been updated since 2020/9/26? I can fetch daily trading data through D without any problem, but no data is fetched when I qrun any model that loads data through DatasetH. How can I update the dataset, as with dump_bin?
wony
@wony-zheng
Roi Mallo
Hi all. I just opened issue microsoft/qlib#1196. I was using the collector's update_data_to_bin to update previously downloaded prices, but 1) it more or less ignored the date range I provided and 2) it crashed at the end.
Is there a reliable way to update the prices efficiently?
Another question: I already have a universe of stocks created, but I've spent days trying to figure out the easiest way to include it in the data acquisition pipeline. My current approach is to patch the code... any suggestions?
wangzhen
@huasir
Could someone post the WeChat group QR code?
xphynance
@xphynance
@wzy461143268 Could you please post the WeChat group QR code again? Thanks a lot 😊
pirsoz
@pirsoz
Hi, I am a newbie. Is there a step-by-step tutorial I can use to learn Qlib?
Lishowie
@howie1013
Is there a WeChat group?
Aben