    Joe M
    @jgmarce

    I see:

     ps -ef | grep elastic | grep Xm
    ubuntu     36207   36187 17 15:43 ?        00:00:57 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-11788785404325033913 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Des.cgroups.hierarchy.override=/ -Xmx8g -XX:MaxDirectMemorySize=4294967296 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.flavor=default -Des.distribution.type=docker -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -Ecluster.name=pelias-dev -Ediscovery.type=single-node -Ebootstrap.memory_lock=true

    Which has the Java heap settings specified twice: "1g" first and "8g" later.
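A note on the duplicated flags: the JVM honors the last occurrence of a repeated option, and the docker entrypoint appears to append ES_JAVA_OPTS after the jvm.options defaults, which is why the later value wins. A minimal sketch for extracting the effective value from a captured command line (the sample string below is abbreviated, not the full ps output):

```shell
# The JVM applies the LAST occurrence of a repeated flag, so with
# "-Xmx1g ... -Xmx8g" the effective max heap is 8g.
cmdline='java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -Xmx8g org.elasticsearch.bootstrap.Elasticsearch'
# grep -o prints each -Xmx flag on its own line; tail keeps the last one.
effective_xmx=$(printf '%s\n' "$cmdline" | grep -o -e '-Xmx[^ ]*' | tail -n 1)
echo "effective max heap flag: $effective_xmx"   # -Xmx8g
```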

    Joe M
    @jgmarce

    Though I'd still like to better understand how to "adjust" the elasticsearch instance without losing the data. My original test was flawed, as demonstrated here...

     $ docker run --rm  img-elasticdump --input=http://10.11.32.108:9200 --output=$ --type=data --size 2000 | pv --line-mode --average-rate > /dev/null
    [76.7 /s]
    $ docker run --rm  img-elasticdump --input=http://10.11.32.108:9200 --output=$ --type=data --limit 8192 --size 2000 | pv --line-mode --average-rate > /dev/null
    [1.1k/s]
    $ docker run --rm  img-elasticdump --input=http://10.11.32.108:9200 --output=$ --type=data --limit 8192 --size 100000 | pv --line-mode --average-rate > /dev/null
    [4.1k/s]
    $

    The option --limit <val>, which I assumed applied only to import/upload chunking, seems to influence export as well.
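That would make sense if --limit is elasticdump's per-request batch size for both reads and writes (its README describes it as the number of objects moved per operation, with a default of 100). The speedup is then mostly fewer HTTP round-trips; a rough sketch of the request-count math:

```shell
# Approximate request count for exporting `docs` documents in batches
# of `limit` (elasticdump's default --limit is 100).
docs=100000
for limit in 100 8192; do
  trips=$(( (docs + limit - 1) / limit ))   # ceil(docs / limit)
  echo "limit=$limit -> ~$trips scroll requests"
done
```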

    Tom Erik Støwer
    @testower
    @jgmarce I don't understand why you're afraid of losing your data if you have it as a mounted volume from your host. That would be the whole point of mounting it as a volume, wouldn't it?
    @jgmarce As long as it's the same version of elasticsearch, it shouldn't be a problem
    Joe M
    @jgmarce
    Hope some of you will find this useful:
    $ cat runner.sh
    docker run --rm  img-elasticdump --input=http://<yourIP>:9200 --limit=8192 --output=$ --type=data |
    pv -f -i 1800 --average-rate --line-mode 2> pv.err |
    split -l 120000000 --filter 'gzip | aws s3 cp \
    --storage-class STANDARD_IA - s3://<bucket>/geodata/pelias-dump.${FILE}.gz' -
    Joe M
    @jgmarce
    @testower As one works their way up to a planet import via docker, they may wish to tune elastic, or at least understand how it is getting its tuning values. I see both -Xmx512m and, later in the same command line, -Xmx8g in the process status on a 64G (RAM) system. So it seems there are a few places attempting to set the JAVA_OPTS. I'm looking for the "best practice" location to set JAVA_OPTS after the elasticsearch container has already been created and after some resources have been expended on imports. I'll admit I've only recently switched over to the pelias-docker workflow.
    Julian Simioni
    @orangejulius
    for what it's worth, we've never used elasticdump, but from what I understand of the architecture it's always going to be a lot slower than Elasticsearch's built-in snapshot functionality
    for Geocode Earth we can save/restore planet-sized snapshots in about 5 minutes to S3. with the right filesystem I'm sure you can do it without S3 as well in a similar time
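For reference, the built-in snapshot workflow looks roughly like this. This is a sketch only, not tested here: the repository and bucket names are hypothetical, and the S3 repository type requires the repository-s3 plugin plus AWS credentials in the Elasticsearch keystore.

```shell
# 1) Register an S3 snapshot repository (names are placeholders):
curl -X PUT 'localhost:9200/_snapshot/my_s3_repo' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "s3", "settings": { "bucket": "my-pelias-snapshots" } }'

# 2) Snapshot the cluster and wait for completion:
curl -X PUT 'localhost:9200/_snapshot/my_s3_repo/snapshot_1?wait_for_completion=true'

# 3) Restore later (the target index must not already be open):
curl -X POST 'localhost:9200/_snapshot/my_s3_repo/snapshot_1/_restore' \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "pelias" }'
```

Because snapshots copy segment files directly rather than re-serializing documents over HTTP, they avoid the per-document overhead that makes a dump-based export slow.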
    Joe M
    @jgmarce
    @orangejulius Very good to know... thank you.
    Tom Erik Støwer
    @testower
    Sorry if I misread your question then. We routinely build up an index in one container, pick up the data and use it to roll out in another container; that's why I said you can reuse the data
    Joe M
    @jgmarce
    Thanks Tom... I still have my documents...
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic stats | jq .aggregations.sources.buckets[].layers.buckets[].doc_count
    1099847
    98304
    56599
    965256
    998
    272
    211
    9
    5
    5
    1
    1
    1
    
    mint@mint:/data/pelias/docker/projects/new-york-city$ ps -ef | grep 512
    root        5512    2321  0 12:01 ?        00:00:45 /sbin/mount.ntfs /dev/nvme0n1p3 /disk0 -o rw
    root        7373       1  0 12:02 ?        00:00:03 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 5f17e0be91663a7356c169912a7d0512dc90bd969a787ac33f6475aa1b7c7c09 -address /run/containerd/containerd.sock
    mint      110973  110952  1 15:46 ?        00:01:36 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-12659972906387098448 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Des.cgroups.hierarchy.override=/ -Xms512m -Xmx512m -XX:MaxDirectMemorySize=268435456 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.flavor=default -Des.distribution.type=docker -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -Ecluster.name=pelias-dev -Ediscovery.type=single-node -Ebootstrap.memory_lock=true
    mint      190035   73839  0 17:58 pts/1    00:00:00 grep --color=auto 512
    mint@mint:/data/pelias/docker/projects/new-york-city$ echo $ES_JAVA_OPTS
    -Xms3g -Xmx3g
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic stop
    Killing pelias_elasticsearch ... done
    mint@mint:/data/pelias/docker/projects/new-york-city$ docker ps -a | grep elasitc
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic start
    Starting pelias_elasticsearch ... done
    mint@mint:/data/pelias/docker/projects/new-york-city$ docker ps | grep 9200
    a6d4c8308831   pelias/elasticsearch:7.5.1    "/usr/local/bin/dock…"   5 days ago   Up 14 seconds   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp   pelias_elasticsearch
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic stop
    Killing pelias_elasticsearch ... done
    mint@mint:/data/pelias/docker/projects/new-york-city$ docker rm a6d4c8308831
    a6d4c8308831
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic start
    Creating pelias_elasticsearch ... done
    
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic stats | jq .aggregations.sources.buckets[].layers.buckets[].doc_count
    1099847
    98304
    56599
    965256
    998
    272
    211
    9
    5
    5
    1
    1
    1
    Very cool... Thanks Tom... I still don't know where to set ES_JAVA_OPTS, but knowing I can remove the container makes the exploration more practical.
    Tom Erik Støwer
    @testower
    Yes, I can't answer that part of your question, sorry :)
    So are you using docker-compose?
    I suppose in the elasticsearch section of your docker-compose.yml you could simply set environment: [ "ES_JAVA_OPTS=<your values>" ]
    Joe M
    @jgmarce
    Through pelias compose, yes. Thanks for the hint... I'll try it.
    Joe M
    @jgmarce
    Perfect!
    mint@mint:/data/pelias/docker/projects/new-york-city$ grep JAVA_OPTS docker-compose.yml
        environment: [ "ES_JAVA_OPTS=-Xms2g -Xmx2g" ]
    mint@mint:/data/pelias/docker/projects/new-york-city$ docker ps | grep 9200
    6d1517c597ab   pelias/elasticsearch:7.5.1    "/usr/local/bin/dock…"   37 minutes ago   Up 37 minutes   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp   pelias_elasticsearch
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic stop
    Killing pelias_elasticsearch ... done
    mint@mint:/data/pelias/docker/projects/new-york-city$ docker rm 6d1517c597ab
    6d1517c597ab
    mint@mint:/data/pelias/docker/projects/new-york-city$ pelias elastic start
    Creating pelias_elasticsearch ... done
    mint@mint:/data/pelias/docker/projects/new-york-city$ ps -ef | grep elastic | grep Xm
    mint      217840  217820 99 18:40 ?        00:00:26 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-8514003410735345715 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Des.cgroups.hierarchy.override=/ -Xms2g -Xmx2g -XX:MaxDirectMemorySize=1073741824 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.flavor=default -Des.distribution.type=docker -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -Ecluster.name=pelias-dev -Ediscovery.type=single-node -Ebootstrap.memory_lock=true
    mint@mint:/data/pelias/docker/projects/new-york-city$
    Tom Erik Støwer
    @testower
    :slight_smile:
    Joe M
    @jgmarce
    Ah, I had not... I'm doing development using new-york-city, and that docker-compose.yml makes no attempt to set ES_JAVA_OPTS, but now I understand where Xmx8g came from in my "planet" environment. It's all becoming clear now. Thanks Julian.
    Julian Simioni
    @orangejulius
    we're considering a bit of a reorganization of the docker repository that would combine all the docker-compose.yml files into one. That probably would have made it clearer that different settings are recommended for the planet. That said, what we've discovered over the years is that heap size is not nearly as important these days with ES6+ as it was back in the early days
    I don't think 0.5GB is enough, but anything beyond that should be fine in general
    Joe M
    @jgmarce

    Elasticdump is useful if you need to change the documents... Back in 7/2019 I had to change my addendum style to conform to pelias's addendum feature.

    I hope "jewels of wisdom" from this elasticsearch transformation will make it to the "considerations for planet builds" document.

    Some additional questions I'm left with after reviewing that page... It isn't clear what can be accomplished given RAM restrictions. It is fine to know 32 hours given 64G, but can the planet import succeed on a 16G RAM system given more time? If working with 16G, will someone eventually succeed simply by taking more granular steps to build?

    Is the PIP service always a single hit to RAM? In the past I thought I had observed a PIP server per OA import thread, but I may be wrong on that...

    It all seems to work and that is amazing, but it would be good to know if any part of a planet import will fail on a 16G RAM host and have that in the "considerations" document.

    Julian Simioni
    @orangejulius
    each importer running will need about 8GB of RAM for the PIP service, yes. So if you want to run a fast planet build with several importers in parallel, you do want a lot of RAM. The OA importer can parallelize the work it does itself; same concept, but handled for you. Elasticsearch itself doesn't seem to need a ton of RAM, but it's pretty CPU heavy. So all in all you wind up needing a lot of CPU and RAM :)
    Joe M
    @jgmarce
    Great... Can someone do the full planet with 16GB of RAM, or should I try it and report back?
    Julian Simioni
    @orangejulius
    Let's see. With 16GB of RAM I'm guessing you're going to need to do it with no parallelization, which means it will probably take at least 3 days. If you want to try that, go for it
    a good way to get a ballpark estimate is to see what import rate you're getting from an importer. We normally see 8000 records per second in our logs, even when running 3-4 importers in parallel. it might be worth making sure you're getting close to that number before you let a build run to completion
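That rate translates directly into wall-clock time. A back-of-envelope sketch (the record count below is a placeholder for illustration, not an official planet figure):

```shell
# Rough import-time estimate from an observed importer rate.
records=550000000   # hypothetical planet-scale document count; use your own
rate=8000           # records/second observed in importer logs
hours=$(( records / rate / 3600 ))
echo "~$hours hours of pure import time"
```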
    Joe M
    @jgmarce
    This is even a question on AWS, as there might be a spot instance at a very low price with 16G RAM, but no deal to be found on a spot instance with 64-128G RAM. I'll see what I can do and report back (captured times) if I can.
    And it's great that I now see where to reduce the elasticsearch "-Xmx" value, as I'll have to steal back RAM where I can... 8g->4g, I imagine.
    Julian Simioni
    @orangejulius
    another thing that takes some RAM is the pbf2json component of the OpenStreetMap importer
    Joe M
    @jgmarce
    I won't start just now, I'd be happy if someone in this group would say, I've already done it, don't waste the effort....
    jeff
    @jeff36476865_twitter
    How do I perform a cross-street search in pelias docker?
    something like /v1/search?text=broadway+and+alvernon+Tucson
    I have all the North American data loaded from the project
    cbayerlein
    @cbayerlein
    Hi! I'm trying to run my own instance of Pelias using Docker, covering only Baden-Württemberg (a federal state of Germany). To do so, I cloned the directory germany in projects to a new one called germany-bawue and edited the pelias.json in it like this: https://gist.github.com/cbayerlein/9b7af1f8d7b26d93e1f1785c8d9484ea . Apparently everything went fine and Pelias is running. However, I have problems with streets. For example, when I query "korber str waibling" I get a bunch of results including addresses with house numbers in this street, but the actual "Korber Straße, Waiblingen, BW, Germany" is missing, although it exists in OSM.
    toton6868
    @toton6868
    Hello, I am trying to search for a keyword near me. I am using something like url/search?text=pizza hut&source=my_source&focus.point.lat=23.768510&focus.point.lon=90.354267. I am looking for the Pizza Hut near me, but in the response I am getting a huge list of other pizza stores with confidence 1 along with Pizza Hut, such as Pizza and Pizza, Pizza Lovers, Pizza Inn, etc. Is there any way to search for the exact keyword "Pizza Hut"? I want an exact text match. Thanks in advance.
    bdmapman
    @bdmapman
    @jgmarce I have an idle VPS with 64 GB of RAM, though it is not AWS. If you want, I can start the test. I saw my elasticsearch current heap size is 223.4 MB and max heap is 1GB. I can increase it using ES_JAVA_OPTS="-Xms2g -Xmx2g" ./bin/elasticsearch, but in that document they strongly recommend not changing the heap size. Should I start the planet OSM import using the same heap size?
    Joe M
    @jgmarce
    With 64GB I'd increase elasticsearch to 4g; others could share their opinion. The 20 days to build "interpolation" zapped me of some motivation to build and time my builds of everything, but if you contact me directly I could be convinced to continue. Together we could just post the summary to this channel rather than the play-by-play.
    bdmapman
    @bdmapman
    @jgmarce You can contact me directly also. What is the best way to contact you?
    bdmapman
    @bdmapman

    Hello, I am updating a single entry of my pelias index by gid; the entry was inserted through csv-importer. The gid is something like custompoi:restaurant:45632. I am updating the layer type of this entry with:

    curl -X POST "localhost:9200/pelias/_update/custompoi:restaurant:45632?refresh=true&pretty=true" -H 'Content-Type: application/json' -d'
    {
      "script" : "ctx._source.layer = \u0027shop\u0027"
    }
    '

    Unfortunately, after updating successfully (curl returns the updated data in the terminal), this specific entry is not found by the search endpoint, but it can be found by reverse. Is there any specific reason behind this? I need to update data dynamically.

    Julian Simioni
    @orangejulius
    @bdmapman I think that behavior won't work out of the box unless you handle the phrase.* fields (making changes will erase them and break queries for that document in the search endpoint). You can read more about it here. It's been something we've wanted to fix for a while pelias/schema#285
    bdmapman
    @bdmapman
    @orangejulius ... It seems like I am going to be in big trouble. I have already built a pipeline for live data updates. Is there any other way or shortcut for doing this? Also, I didn't understand what you meant by "unless you handle the phrase.*". If there is any way, it will save me.
    kbeetz
    @kbeetz
    Hello, I am quite new to pelias and local installation of the server. My question is: where do I put the extra file with the optional variables? And if I don't have an API key, do I remove it from queryParams.json?
    Also, I am not quite sure where to put the script-batch-search folder. For now I put it in docker -> project -> Germany, where I downloaded the OSM data before. Is that correct? Thank you!
    Bimbol
    @Bimbol_gitlab
    Hello, does Pelias support incremental updates? Or do I have to import all the data at once and replace the index in elasticsearch?
    Sachin Sagoo
    @gohansagoo_twitter
    Hi Pelias Team!
    Looking for some help understanding the result of text=Newyork, USA with the search API
    The result is confusing to me; here is a link to run the same search: https://pelias.github.io/compare/#/v1/search?text=Newyork%2C+USA&debug=1
    [image attachment: search result screenshot]
    Sachin Sagoo
    @gohansagoo_twitter
    Attaching the result snapshot as well. The expected result should be Newyork in the USA; instead I got a different set of results.
    Patrick Wilson
    @patdevinwilson
    Hi, folks - I'm looking for some clarity on whether component address input is supported. This link shows it is a milestone; however, I could not determine that from the roadmap on GitHub. Thanks in advance! [https://pelias.io/milestones/component_geocoding/]
    bdmapman
    @bdmapman
    Hi, I don't know what I am missing, but with the autocomplete endpoint, if there are two keywords it is not returning the expected results. I am trying to match the exact "pizza hut" keyword, but it does not return the expected result. I saw this issue (pelias/placeholder#88), but it is related to a space at the end. I can't use the search API due to a few restrictions. Another interesting finding: if I use a hyphen between the keywords it returns close to the expected result, but for more than two keywords the hyphen does not work.
    Stefán Baxter
    @acmeguy
    hello all, I'm a newbie.
    As such I have two questions.
    1. Is there a docker image available that has been preloaded with data (using Elastic/Solr, I assume)?
    2. What is done in Pelias to merge/link/consolidate data from multiple sources?