    Chica
    @ufosky
    image.png
Does anyone know the difference between start_time and fit_start_time here, and how they should be set?
    Pengrong Zhu
    @zhupr

    @ufosky

• start_time: the start date of the whole data range that is loaded
• fit_start_time: the start date of the data used to fit the model

  start_time <= fit_start_time
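For example, a handler can be given an overall data range that starts earlier than the fitting range. A minimal sketch, assuming the built-in Alpha158 handler and purely illustrative dates:

    from qlib.contrib.data.handler import Alpha158

    # Load data from 2008 onwards, but fit the learnable processors
    # (e.g. normalization) only on 2010-2014; start_time <= fit_start_time.
    handler = Alpha158(
        instruments="csi300",
        start_time="2008-01-01",
        end_time="2020-08-01",
        fit_start_time="2010-01-01",
        fit_end_time="2014-12-31",
    )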

    Pengrong Zhu
    @zhupr

    @ufosky
If there are network problems, you can add the max_collector_count parameter to retry failed downloads several times: python collector.py xxxx --max_collector_count 5

The default is 2.

    gongkui
    @gongkui
Does qlib come with ready-made test code and data that can be used directly?
    Chica
    @ufosky
How should categorical features be handled in qlib?
    Jacky
    @yesme
Hey - does anyone know how to import fund data into qlib so it can be processed alongside the other stock data? Fund data only has the C of "OHLCV". Thanks!
Also - 申银万国 has produced industry indices since 2000 - those are not stock tickers, but they could be very informative for data mining.
Those industry indices do support OHLC.
    rowkingrow
    @rowkingrow
Rank(feature, N) in ops can only rank a feature over its own last N days of history. Is there an operator that ranks a given day's values across all stocks?
    Pengrong Zhu
    @zhupr
    @gongkui Hi,
    Pengrong Zhu
    @zhupr

    @yesme Hi,
    The "fund data" or "industry indices" is in this format:

    datetime, fund_code/index_code, C, field1, field2
    2021-09-01, 161725, 1.2095, 111, 222

    The data can be dumped to a format supported by qlib via scripts/dump_bin.py: https://qlib.readthedocs.io/en/latest/component/data.html#converting-csv-format-into-qlib-format, which can be used in qlib.
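For reference, such a csv could then be dumped along these lines (a rough sketch of the dump_all entry point from the linked docs; the paths, the symbol column and the field list are only illustrative):

    python scripts/dump_bin.py dump_all --csv_path ~/csv_data --qlib_dir ~/.qlib/qlib_data/fund_data --date_field_name datetime --symbol_field_name fund_code --include_fields C,field1,field2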

    you-n-g
    @you-n-g

How should categorical features be handled in qlib?

Hi, could you give more details about your requirements?

Rank(feature, N) in ops can only rank a feature over its own last N days of history. Is there an operator that ranks a given day's values across all stocks?

    It is not supported in the current version.
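A possible workaround outside the expression engine is to fetch the feature first and rank it per day with pandas (a rough sketch, not a built-in feature; it assumes D.features returns a dataframe indexed by (instrument, datetime), and the instrument pool, field and dates below are only illustrative):

    from qlib.data import D

    # rank each day's returns across all instruments in the pool
    df = D.features(
        D.instruments("csi300"),
        ["Ref($close, 1)/$close - 1"],
        start_time="2020-01-01",
        end_time="2020-12-31",
    )
    daily_rank = df.groupby(level="datetime").rank(pct=True)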

    Chica
    @ufosky

How should categorical features be handled in qlib?

Hi, could you give more details about your requirements?

@you-n-g For example, how can the industry a stock belongs to be used as a feature? It looks like only numeric features can be handled at the moment.

    1 reply
    tangzhenjie25
    @tangzhenjie25
    ValueError: need at most 63 handles, got a sequence of length 64
I get this error on a multi-core CPU - how can it be fixed?
    1 reply
    bbbzhai
    @bbbzhai
Hi, is this a better place to ask questions, or the issue page in the qlib repo?
    1 reply
    bbbzhai
    @bbbzhai
I would like to implement some technical indicators. Instead of rewriting the whole formula as expressions, I would like to use libraries such as ta-lib. Is there a module in qlib where I can do data cleaning manually after querying the basic OHLC dataframe?
    2 replies
Currently I have a separate process that handles my data and converts all the needed features to qlib bin files, and then I simply fetch the needed features directly. But sometimes I wish to do some data handling on the fly. Is that supported now?
    1 reply
    Noah Wöhler
    @NoahWoehler_twitter
    Hi, can I post a call for participants in an interview study on open source projects here? If any mod wants more details via DM first, then I'm happy to oblige :)
    Arthur Cui
    @b4thesunrise
Hi, I am trying to add a new dataset of crypto data and I am looking for an example of adding a new dataset. Is the code in qlib/scripts/data_collector/fund/ enough to go on for adding a new dataset?
    4 replies
    Jeff
    @JeffQuantFin
Hi. I have two questions:
1. Which API dumps a dataframe directly into .bin?
2. How well does qlib work on a MacBook (M1 Max chip, ARM architecture)? My new machine is arriving soon and I'm a bit worried.
    Jeff
    @JeffQuantFin
3. What is DumpDataFix for? DumpDataUpdate includes de-duplication; how is the abnormal case of the same date but different data handled?
Thanks - this is urgent.
    Jeff
    @JeffQuantFin
4. I plan to bring in data at my own frequency, e.g. 5min, converting it from both dataframe and csv and storing it as .bin. Which API should I use? Could you give some guidance?
    Jeff
    @JeffQuantFin
5. How can custom-generated features be persisted as .bin?
    Jeff
    @JeffQuantFin
6. When DumpDataAll stores data, can the freq be customized, e.g. 5min or tick?
    5 replies
    WangYang
    @boundles
A question:
    WangYang
    @boundles
Is there a complete example on US data?
    2 replies
    Jeff
    @JeffQuantFin

A question: how can computed indicators be filtered and retrieved through an expression filter? (It seems that right now only raw data can be pulled out via D.instruments; computed indicators can only be fetched and then narrowed by time and instrument name.) Filtering afterwards with ordinary dataframe methods is inefficient.
from qlib.data import D
from qlib.data.filter import ExpressionDFilter

# keep only instruments whose 30-day mean of Ref($close, 1)/$close - 1 exceeds 1%
filters = [
    ExpressionDFilter("Mean(Ref($close, 1)/$close - 1, 30) > 0.01"),
]
inst = D.instruments("all", filter_pipe=filters)
df = D.features(inst, ["$close", "$volume"])

    5 replies
2. After new raw data has been added, how do I update the cache of computed indicators? Right now the only option seems to be "re-initializing" the DataHandler. (It seems setup can load data - please confirm, and please also explain the difference between data setup and prepare.)
    1 reply
    Jeff
    @JeffQuantFin
@zhupr Hi, Pengrong. Could you help provide an implementation of dumping data to bin? It matters a lot for building up the qlib community. Qlib has many nested classes and nested cache/index mechanisms; although I roughly understand the logic, I'm afraid that writing it myself would corrupt the data store - I'm not the original author of the source, and there are probably many tricky pitfalls I'm not yet aware of.
    4 replies
4. In some material (I can't remember the source) I saw screenshots suggesting that qlib also has a visual-programming module/interface, following Microsoft's low-code engineering experience. Could that be opened up?
    Jeff
    @JeffQuantFin
5. I just got a Mac Pro M1 Max (the chip is said to have 100+ cores and a strong integrated GPU). A few questions about using qlib on it:
① What is a reasonable setting for max workers (the number of processes)? (Also, has Microsoft optimized the native multiprocessing as well?)
② How can the Mac's own integrated GPU be used for compute? (It seems the M1 Mac has no CUDA-like acceleration tooling yet.)
    1 reply
    Arthur Cui
    @b4thesunrise
While catching up with the latest changes on the main branch today, I found that a missing version.txt in the qlib folder makes a plain clone + pip install . fail. It turns out version.txt is gitignored; adding version.txt back removes the error. I have already submitted a PR and an issue to fix this.
    Jeff
    @JeffQuantFin
On the Mac M1 Max the previous version installed fine, but after rebuilding my environment the new version fails to install with all three methods: pip install pyqlib, python setup.py install, and pip install .
    1 reply
    kingovern
    @kingovern

    import qlib
    from qlib.data import D
qlib.init(provider_uri='C:/stockbase/Astocks')
    instruments = ['sh600030', 'sh600000']
    fields = ['$preclose']
    features_df = D.features(instruments, fields, start_time='2016-12-08', end_time='2016-12-18', freq='day')

    print(features_df)

    File "C:\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "C:\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "c:\Users\kinger.vscode\extensions\ms-python.python-2021.12.1559732655\pythonFiles\lib\python\debugpy__main.py", line 45, in <module>
    cli.main()
    File "c:\Users\kinger.vscode\extensions\ms-python.python-2021.12.1559732655\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
    File "c:\Users\kinger.vscode\extensions\ms-python.python-2021.12.1559732655\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("
    main"))
    File "C:\Python39\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
    File "C:\Python39\lib\site-packages\qlib\data\data.py", line 1036, in features
    return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)
    File "C:\Python39\lib\site-packages\qlib\data\data.py", line 771, in dataset
    data = self.dataset_processor(
    File "C:\Python39\lib\site-packages\qlib\data\data.py", line 554, in dataset_processor
    ParallelExt(n_jobs=workers, backend=C.joblib_backend, maxtasksperchild=C.maxtasksperchild)(task_l),
    File "C:\Python39\lib\site-packages\joblib\parallel.py", line 966, in
    call
    n_jobs = self._initialize_backend()
    File "C:\Python39\lib\site-packages\joblib\parallel.py", line 733, in _initialize_backend
    n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
    File "C:\Python39\lib\site-packages\joblib_parallel_backends.py", line 470, in configure
    self._pool = MemmappingPool(n_jobs, **memmappingpool_args)
    File "C:\Python39\lib\site-packages\joblib\pool.py", line 303, in
    init
    manager = TemporaryResourcesManager(temp_folder)
    File "C:\Python39\lib\site-packages\joblib_memmapping_reducer.py", line 531, in
    init
    self.set_current_context(context_id)
    File "C:\Python39\lib\site-packages\joblib_memmapping_reducer.py", line 535, in set_current_context
    self.register_new_context(context_id)
    File "C:\Python39\lib\site-packages\joblib_memmapping_reducer.py", line 560, in register_new_context
    self.register_folder_finalizer(new_folder_path, context_id)
    File "C:\Python39\lib\site-packages\joblib_memmapping_reducer.py", line 590, in register_folder_finalizer
    resource_tracker.register(pool_subfolder, "folder")
    File "C:\Python39\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 191, in register
    self._send('REGISTER', name, rtype)
    File "C:\Python39\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)
    Exception ignored in: <function Pool.
    del at 0x00000245E12561F0>
    Traceback (most recent call last):
    File "C:\Python39\lib\multiprocessing\pool.py", line 264, in
    del__
    if self._state == RUN:
    AttributeError: 'MemmappingPool' object has no attribute '_state'
qlib initializes fine and data for a single stock can be read, but the error above appears as soon as multiple stocks are read - where could the problem be?

    2 replies
    Jeff
    @JeffQuantFin
    image.png
    Jeff
    @JeffQuantFin
I see that for high-frequency data the storage has switched to .pkl (no longer .bin or the persistent cache). A few questions:
1. With this storage format, does qlib still keep the fast-IO advantages of its index + cache + binary format?
2. The data is read out as a whole; if the data volume is large and memory is insufficient, does the task need to be split across multiple threads to limit the overall memory consumption of a query?
3. If the .bin persistent-cache storage is used instead, the cost is roughly 5x the disk space but reads are faster - is that the right way to understand it?
    1 reply
    Jeff
    @JeffQuantFin

DumpDataAll, DumpDataFix and DumpDataUpdate seem to have a bug - could an expert take a look?

With the same csv, I ran the different dump_bin subclasses in sequence (the corresponding commands are sketched below):

Step 1: starting from an empty folder, run DumpDataAll

Step 2: then run DumpDataFix - the number of files stays the same, but the size grows by about 5 MB

Step 3: then run DumpDataFix again - no change

Step 4: run DumpDataUpdate - the total size nearly doubles, and it keeps growing every time this is repeated
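For reference, the sequence above corresponds roughly to these commands (a sketch assuming the standard dump_bin.py entry points; the paths are only illustrative):

    # Step 1: initial dump into an empty qlib_dir
    python scripts/dump_bin.py dump_all --csv_path ~/csv_data --qlib_dir ~/qlib_data
    # Steps 2-3: run the "fix" dump again over the same csv
    python scripts/dump_bin.py dump_fix --csv_path ~/csv_data --qlib_dir ~/qlib_data
    # Step 4: run the incremental update with the same csv
    python scripts/dump_bin.py dump_update --csv_path ~/csv_data --qlib_dir ~/qlib_data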

    1 reply
39,709,728 bytes (40.7 MB on disk), 510 items.png
39,709,728 bytes (45.4 MB on disk), 510 items.png
39,709,728 bytes (45.4 MB on disk), 510 items.png
71,448,048 bytes (88 MB on disk), 510 items.png
103,186,368 bytes (115.5 MB on disk), 510 items.png
    郭乾有
    @guoqianyou
    image.png
Hello, when I define a custom operator I get a multiprocessing error saying it cannot be serialized (pickled). How should this be handled?
    郭乾有
    @guoqianyou
    image.png