    Raubsau
    @raubsau_gitlab
    Nope...
    vbox@bionic-test-vbox:~$ export LD_LIBRARY_PATH=/opt/arrayfire/lib64:$LD_LIBRARY_PATH
    vbox@bionic-test-vbox:~$ python3
    Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from khiva.library import *
    >>> set_backend(KHIVABackend.KHIVA_BACKEND_CPU)
    >>> set_device(0)
    >>> from khiva.array import *
    >>> a = Array([1, 2, 3, 4, 5, 6, 7, 8])
    >>> a.display()
    array
    >>> a = a.to_pandas()
    >>> print(a)
         0
    0  1.0
    1  2.0
    2  3.0
    3  4.0
    4  5.0
    5  6.0
    6  7.0
    7  8.0
    >>> from khiva.matrix import *
    >>> stomp_result = stomp(Array(np.array([11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11])),
    ...                          Array(np.array([9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9])),
    ...                          3)
    Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
    vbox@bionic-test-vbox:~$
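A quick way to see what the dynamic loader sees, independently of Khiva, is to try loading the two libraries named in the error from the same shell, so that the exported LD_LIBRARY_PATH applies. This is a minimal sketch using only Python's standard `ctypes` module. The helper names `try_load` and `report` are hypothetical. On failure it returns the loader's own message, which may help tell "cannot open shared object file" (not on the search path) apart from other failures such as unresolved symbols.

```python
import ctypes

# Library names copied from the "Intel MKL FATAL ERROR" message above.
MKL_LIBS = ["libmkl_avx2.so", "libmkl_def.so"]


def try_load(name):
    """dlopen `name` via ctypes; return None on success, else the loader's error text."""
    try:
        ctypes.CDLL(name)
    except OSError as exc:
        return str(exc)
    return None


def report(names=MKL_LIBS):
    """Map each library name to "ok" or to the loader's error message."""
    result = {}
    for name in names:
        status = try_load(name)
        result[name] = "ok" if status is None else status
    return result


if __name__ == "__main__":
    # Run this from the shell where LD_LIBRARY_PATH was exported.
    for name, status in report().items():
        print(name, "->", status)
```

If both report "ok" here but Khiva still aborts, the problem is more likely in how MKL is loaded at runtime than in the search path itself.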
    Antonio Vilches
    @avilchess
    It's weird, because they are in that path, right?
    Raubsau
    @raubsau_gitlab
    yes :/
    Antonio Vilches
    @avilchess
    I think that's the last thing I can think of
    Raubsau
    @raubsau_gitlab
    Doing this, it works
    wait
    Raubsau
    @raubsau_gitlab
    sudo bash ./ArrayFire-v3.6.4_Linux_x86_64.sh --include-subdir --prefix=/opt
    ArrayFire Installer Version: 3.6.4, Copyright (c) ArrayFire
    This is a self-extracting archive.
    The archive will be extracted to: /opt
    
    Using target directory: /opt/arrayfire
    Extracting, please wait...
    
    Unpacking finished successfully
    
    vbox@bionic-test-vbox:~$ sudo su
    root@bionic-test-vbox:/home/vbox# echo /opt/arrayfire/lib64 > /etc/ld.so.conf.d/arrayfire.conf
    root@bionic-test-vbox:/home/vbox# exit
    vbox@bionic-test-vbox:~$ sudo ldconfig
    
    vbox@bionic-test-vbox:~$ sudo dpkg -i khiva-khiva_0.3.0-1_amd64.deb 
    # ( Succeeds )
    
    vbox@bionic-test-vbox:~$ pip3 install khiva
    Collecting khiva
      Downloading https://files.pythonhosted.org/packages/af/c5/b854648d7c8ef89b2eb109f74063570b0716a955151e44f34dff8f59b3f2/khiva-0.3.0.tar.gz
    Building wheels for collected packages: khiva
      Running setup.py bdist_wheel for khiva ... done
      Stored in directory: /home/vbox/.cache/pip/wheels/d2/9a/bd/b4f725186f0ec4a8793766aa9ed81db8577ef40fc8ebd49241
    Successfully built khiva
    Installing collected packages: khiva
    Successfully installed khiva-0.3.0
    
    vbox@bionic-test-vbox:~$ pip3 install arrayfire pandas
    Collecting arrayfire
    Collecting pandas
      Downloading https://files.pythonhosted.org/packages/86/12/08b092f6fc9e4c2552e37add0861d0e0e0d743f78f1318973caad970b3fc/pandas-0.25.2-cp36-cp36m-manylinux1_x86_64.whl (10.4MB)
        100% |████████████████████████████████| 10.4MB 140kB/s 
    Collecting python-dateutil>=2.6.1 (from pandas)
      Downloading https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl (226kB)
        100% |████████████████████████████████| 235kB 1.2MB/s 
    Collecting numpy>=1.13.3 (from pandas)
      Downloading https://files.pythonhosted.org/packages/0e/46/ae6773894f7eacf53308086287897ec568eac9768918d913d5b9d366c5db/numpy-1.17.3-cp36-cp36m-manylinux1_x86_64.whl (20.0MB)
        100% |████████████████████████████████| 20.0MB 74kB/s 
    Collecting pytz>=2017.2 (from pandas)
      Downloading https://files.pythonhosted.org/packages/e7/f9/f0b53f88060247251bf481fa6ea62cd0d25bf1b11a87888e53ce5b7c8ad2/pytz-2019.3-py2.py3-none-any.whl (509kB)
        100% |████████████████████████████████| 512kB 1.8MB/s 
    Collecting six>=1.5 (from python-dateutil>=2.6.1->pandas)
      Downloading https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
    Installing collected packages: arrayfire, six, python-dateutil, numpy, pytz, pandas
    Successfully installed arrayfire-3.6.20181017 numpy-1.17.3 pandas-0.25.2 python-dateutil-2.8.0 pytz-2019.3 six-1.12.0
    Then it fails again with the same missing MKL libraries.
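Since the install above registers /opt/arrayfire/lib64 through /etc/ld.so.conf.d and `ldconfig`, one sanity check is to list which cached libraries actually resolve into that directory. This sketch parses the output of `ldconfig -p` (soname, then `=>`, then the resolved path). The helper names are hypothetical, and it only shows what the cache contains, not why MKL fails to load.

```python
import subprocess


def parse_ldconfig(text):
    """Parse `ldconfig -p` output into {soname: [resolved paths]}."""
    cache = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # e.g. the "N libs found in cache" header line
        left, path = line.split("=>", 1)
        soname = left.split()[0]
        cache.setdefault(soname, []).append(path.strip())
    return cache


def registered_under(cache, directory):
    """Sorted sonames whose cached path lies inside `directory`."""
    prefix = directory.rstrip("/") + "/"
    return sorted(name for name, paths in cache.items()
                  if any(p.startswith(prefix) for p in paths))


def ldconfig_cache():
    """Run `ldconfig -p` (it may live in /sbin) and parse the result."""
    result = subprocess.run(["ldconfig", "-p"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_ldconfig(result.stdout)
```

For example, `registered_under(ldconfig_cache(), "/opt/arrayfire/lib64")` lists the ArrayFire libraries the cache knows about, and filtering the cache keys for names starting with `libmkl` shows whether any MKL libraries are registered at all.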
    Antonio Vilches
    @avilchess
    Not sure what to suggest now… the problem must be between Intel and ArrayFire.
    Raubsau
    @raubsau_gitlab
    I'll try my luck compiling ArrayFire from their git repo
    at least in the VM, it worked with khiva in the end
    Thank you for your patience :)
    Antonio Vilches
    @avilchess
    If you want, we have a Docker image that fires up a Jupyter notebook
    if it helps
    Raubsau
    @raubsau_gitlab
    but will this make use of a GPU?
    Antonio Vilches
    @avilchess
    It will run on the CPU
    Raubsau
    @raubsau_gitlab
    I'll come back to you about the docker image should I fail with the compilation. Thanks again!
    Antonio Vilches
    @avilchess
    Take a look at the Intel part; it is probably an integration problem.
    Raubsau
    @raubsau_gitlab
    Found a link: at least ArrayFire can be compiled with MKL, as shown here: https://github.com/eddelbuettel/mkl4deb
    Antonio Vilches
    @avilchess
    have you tried that?
    Raubsau
    @raubsau_gitlab
    Yes, ArrayFire is compiling with these libraries installed. I'm now stuck on some ArrayFire/CUDA issues: some of the ArrayFire tests cannot be completed.
    But that is for another time ;)
    Raubsau
    @raubsau_gitlab

    Hey, so far I got this running:

    import pandas as pd
    import khiva
    
    # Read in big CSV
    all_sites = pd.read_csv("all_sites.csv")
    
    # Subset to values
    energy_values = all_sites['value']
    
    # # Print energy_values
    # energy_values
    # 0            52.1147
    # 1            50.9517
    # 2            49.8164
    # 3            49.1795
    # 4            47.6288
    #               ...   
    # 10531283    127.1858
    # 10531284    124.8942
    # 10531285    124.8942
    # 10531286    126.6129
    # 10531287    126.0400
    # Name: value, Length: 10531288, dtype: float64
    
    # # Convert to numpy
    # energy_values[0:99999].to_numpy()
    # array([52.1147, 50.9517, 49.8164, ..., 32.4817, 30.8479, 31.7894])
    
    # Convert to khiva array
    energy_kv_array = khiva.array.Array(energy_values[0:99999].to_numpy())
    
    # Get matrix profile with window length 1024
    mp = khiva.matrix.stomp_self_join(energy_kv_array, 1024)
    
    # 1min 30s ± 76.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    This is with CUDA: I can see the GPU's RAM getting populated, and its energy consumption rises as well.

    With R on 32 CPU cores, it's only slightly slower: 2min 6s.

    What am I doing wrong?

    While STOMP is running, there is always one CPU core at 100%.
    Is energy_kv_array not in the GPU's RAM? Is each segment being transferred to the GPU on its own?
    khiva.array.Array() will create an array in the GPU's RAM, right?
    Raubsau
    @raubsau_gitlab
    Also, Python and R results differ when taking the mean of the matrix profile's values :/
    Antonio Vilches
    @avilchess
    Hi @raubsau_gitlab I’m happy you got Khiva running. Let me go through your comments.
    You can check which backend is active; by default, ArrayFire selects the CUDA backend if it is available.
    Every time you create a new Array, it is allocated in the active backend.
    Raubsau
    @raubsau_gitlab
    khiva.get_backend()
    Out[42]: <KHIVABackend.KHIVA_BACKEND_CUDA: 2>
    Antonio Vilches
    @avilchess
    In that case, the Array is allocated directly on the GPU,
    and the code runs on the GPU as well.
    If you are seeing a CPU core at 100%, the ArrayFire backend is probably polling while it waits for the GPU results. However, I'm not 100% sure of this.
    What is the difference between the R and Python values?
    Raubsau
    @raubsau_gitlab
    mp[0].to_numpy().mean()
    Out[40]: 8.823958126770863

    R:

    mean(mp$mp)
    [1] 8.759127
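When two implementations disagree, a brute-force reference on a small slice can show which one to trust. This sketch is a naive O(n^2 * m) self-join matrix profile using z-normalized Euclidean distance in plain NumPy; it is not Khiva's STOMP or the R code. The exclusion-zone width (`m // 4` by default here) and the handling of flat windows are assumptions, and both change the values, so comparing means is only meaningful when window length and exclusion zone match across tools.

```python
import numpy as np


def naive_matrix_profile(ts, m, exclusion=None):
    """Brute-force self-join matrix profile with z-normalized Euclidean distance.

    O(n^2 * m) time and O(n^2) memory, so only use it on small slices.
    Returns (profile, index) where index[i] is the nearest neighbour of window i.
    """
    ts = np.asarray(ts, dtype=np.float64)
    n_sub = len(ts) - m + 1
    if exclusion is None:
        exclusion = m // 4  # ASSUMPTION: width of the trivial-match exclusion zone
    # One row per sliding window of length m.
    subs = np.array([ts[i:i + m] for i in range(n_sub)])
    mu = subs.mean(axis=1, keepdims=True)
    sigma = subs.std(axis=1, keepdims=True)
    sigma = np.where(sigma == 0, 1.0, sigma)  # hypothetical handling of flat windows
    z = (subs - mu) / sigma
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b for every pair of windows.
    sq = (z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    dist = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative rounding errors
    # Ignore trivial matches: windows starting within `exclusion` steps of each other.
    idx = np.arange(n_sub)
    dist[np.abs(idx[:, None] - idx[None, :]) <= exclusion] = np.inf
    return dist.min(axis=1), dist.argmin(axis=1)
```

For example, `mp, mpi = naive_matrix_profile(energy_values[:4000].to_numpy(), 1024)` followed by `mp.mean()` can be compared against the Khiva and R results computed on the same 4,000 points.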

    Antonio Vilches
    @avilchess
    I see that we are using numpy.float64 values; I'm not sure whether R uses floats (32 bits) or doubles (64 bits)
    Probably the difference comes from there
    Raubsau
    @raubsau_gitlab
    I am pretty sure R uses float64;
    but f32 vs. f64 was my first thought as well
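To put the two means in proportion: the gap is about 0.74% of the R value. This sketch computes that, and also measures how much a mean moves when the same values are merely stored as float32 instead of float64, using synthetic data (a uniform random stand-in, not the real profile). Rounding stored values to float32 changes each by at most about 6e-8 relative, so float32 storage of the final profile cannot explain the gap by itself; precision effects inside the computation are not tested here.

```python
import numpy as np

# Means of the matrix profile quoted above.
PY_MEAN = 8.823958126770863  # Khiva (Python)
R_MEAN = 8.759127            # R

rel_gap = abs(PY_MEAN - R_MEAN) / R_MEAN  # roughly 0.0074


def float32_storage_effect(values):
    """Relative shift of the mean when the same values are stored as float32."""
    v64 = np.asarray(values, dtype=np.float64)
    mean64 = v64.mean()
    # Round each value to float32, then average in float64 to isolate storage rounding.
    mean32 = v64.astype(np.float32).astype(np.float64).mean()
    return abs(mean32 - mean64) / abs(mean64)


rng = np.random.default_rng(0)
# Synthetic stand-in for a profile (NOT the real data): ~99k non-negative distances.
fake_profile = rng.uniform(0.0, 18.0, size=98_976)

print(f"observed gap between means: {rel_gap:.2%}")
print(f"float32 storage effect:     {float32_storage_effect(fake_profile):.1e}")
```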
    Antonio Vilches
    @avilchess
    By the way, I see you are interested in the matrix profile. Just for your information, in version 0.4.0 the matrix_profile method takes around 15 seconds with the input size you are using.
    The Khiva core is already published, but we still have to publish the Python part.
    Raubsau
    @raubsau_gitlab
    Ok, is there a quick way to test 0.4.0 without compiling etc?
    It was only yesterday that I seriously had to deal with cmake for the first time...
    Antonio Vilches
    @avilchess
    Yep, you can install Khiva (core) 0.4.0 with the package in the releases section of our GitHub repo.
    For the Python part, you need to clone this branch: https://github.com/shapelets/khiva-python/tree/feature/errorHandling
    and execute python3 setup.py install from the root folder of the Python repo.
    With that you can test our latest improvements. Once we finally merge this branch to master, everything will be easier.
    Raubsau
    @raubsau_gitlab

    Just for giggles, I tried this: https://github.com/zpzim/SCAMP

    SCAMP-test/SCAMP/build/SCAMP --window=1024 --input_a_file_name=energy_values 
    Reading data from energy_values
    Read 1000000 values from file energy_values
    using all devices
    Starting SCAMP
    Num workers = 1
    num_tile_rows = 1, cols = 1
    Performing join with 1 tiles.
    Starting tile with starting row of 0 starting column of 0 with height 1000000 and width 1000000
    Finished 1 SCAMP tiles to generate  matrix profile in 29.615434 seconds on 1 devices and 0 threads

    Note that this is 10x the amount of data
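Because a self-join compares every subsequence with every other one, the work grows roughly with the square of the number of subsequences, so 10x the data is about 100x the work. This sketch normalizes the runtimes reported in this thread by that pairwise count. It assumes window 1024 in every run and the same 99,999-point slice for the R and Khiva runs, ignores the exclusion zone, and does not account for differing hardware. Relative to Khiva 0.3.0, R comes out at about 0.7x, the quoted 0.4.0 figure at 6x, and SCAMP at roughly 310x.

```python
# Runtimes reported in this thread: (series length, seconds).
# ASSUMPTIONS: window 1024 in every run; R and Khiva used the same 99,999-point slice;
# the ~15 s for Khiva 0.4.0 is the approximate figure quoted, not a measurement here.
WINDOW = 1024

RUNS = {
    "Khiva 0.3.0 (CUDA)": (99_999, 90.0),
    "R (32 CPU cores)": (99_999, 126.0),
    "Khiva 0.4.0 (quoted)": (99_999, 15.0),
    "SCAMP (CUDA)": (1_000_000, 29.615434),
}


def pairwise_work(n, m=WINDOW):
    """Approximate self-join work: every subsequence against every other one."""
    n_sub = n - m + 1
    return n_sub * n_sub


def throughput(n, seconds):
    """Subsequence pairs processed per second (exclusion zone ignored)."""
    return pairwise_work(n) / seconds


BASE = throughput(*RUNS["Khiva 0.3.0 (CUDA)"])

for name, (n, seconds) in RUNS.items():
    print(f"{name:<22} {throughput(n, seconds) / BASE:7.1f}x")
```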

    Antonio Vilches
    @avilchess
    We have integrated that CUDA implementation in Khiva 0.4.0 hehehe
    Raubsau
    @raubsau_gitlab
    I'll give the git Python a try tomorrow, thanks again for your work and patience :)
    Antonio Vilches
    @avilchess
    All right, you should expect the same performance with khiva 0.4.0, as we added that implementation. You are welcome.