    A Sadeghioon
    @Babamon_gitlab
    Hi everyone, I have migrated from PostgreSQL to MongoDB and started using RESTHeart, and I have to say I love it. However, I have a question that I cannot find the answer to: is it possible to request the data as CSV? I know it is possible to upload CSV, but is it possible to also download it? Thank you for your help in advance.
    Andrea Di Cesare
    @ujibang
    Hi @Babamon_gitlab, you can transform the response by developing an Interceptor, see https://restheart.org/docs/plugins/core-plugins/#interceptors
    Andrea Di Cesare
    @ujibang
    I'm posting here a possible implementation
    // assumed imports for this sketch (RESTHeart 6+ plugin API package names)
    import java.util.stream.Collectors;

    import org.bson.BsonValue;
    import org.restheart.exchange.MongoRequest;
    import org.restheart.exchange.MongoResponse;
    import org.restheart.plugins.InterceptPoint;
    import org.restheart.plugins.MongoInterceptor;
    import org.restheart.plugins.RegisterPlugin;
    import org.restheart.utils.BsonUtils;

    @RegisterPlugin(name = "csvTransformer",
        interceptPoint = InterceptPoint.RESPONSE,
        description = "transform the response to CSV format",
        enabledByDefault = true)
    public class CsvTransformer implements MongoInterceptor {
        @Override
        public void handle(MongoRequest request, MongoResponse response) {
            var docs = response.getContent().asArray();
            var sb = new StringBuilder();
    
            // add the header
            if (docs.size() > 0) {
                docs.get(0).asDocument().keySet().forEach(k -> sb.append(k).append(","));
                sb.append("\n");
            }
    
            // add rows
            docs.stream()
                .map(BsonValue::asDocument)
                .forEach(fdoc -> {
                    sb.append(fdoc.entrySet().stream()
                        .map(e -> e.getValue())
                        .map(v -> BsonUtils.toJson(v))
                        .collect(Collectors.joining(",")));
    
                    sb.append("\n");
                });
    
            response.setContentType("text/csv");
    
            response.setCustomSender(() ->  response.getExchange().getResponseSender().send(sb.toString()));
        }
    
        @Override
        public boolean resolve(MongoRequest request, MongoResponse response) {
            return request.isGet()
                && request.isCollection()
                && response.getContent() != null
                && request.getQueryParameterOfDefault("csv", null) != null;
        }
    }
    With it:
    $ http -b -a admin:secret :8080/coll\?csv
    _id,a,_etag,
    {"$oid":"6202562ce5078606d08b79e2"},1,{"$oid":"6202562ce5078606d08b79e1"}
    {"$oid":"62025626e5078606d08b79df"},1,{"$oid":"62025662e5078606d08b79e5"}
    I'm going to add it to the restheart examples repo https://github.com/softInstigate/restheart-examples in the following days
    A Sadeghioon
    @Babamon_gitlab
    @ujibang Thank you so much for your help!
    @ujibang I will try to learn how to use the plugin. One more question: if I want to GET the full collection without using CSV (it can be very large), a normal GET only returns a small section of it. Is there a limit on the response size, and how can I get back the whole thing?
    Andrea Di Cesare
    @ujibang
    Hi @Babamon_gitlab, you cannot get back the whole collection in one request. The responses are paginated, see https://restheart.org/docs/mongodb-rest/read-docs/#paging
    You need to keep sending requests GET /coll?page=1, GET /coll?page=2, … until you reach a page that returns an empty array.
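    Something like this quick shell sketch would do it (collection name, credentials and pagesize are just examples, and it assumes an empty page prints as []):
    # fetch pages of /coll one by one until an empty array comes back
    page=1
    while true; do
      res=$(http -b -a admin:secret ":8080/coll?page=$page&pagesize=100")
      [ "$res" = "[]" ] && break
      echo "$res" >> coll-dump.json
      page=$((page + 1))
    done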
    Ming Fang
    @mingfang
    I like to share a my set of Terraform modules to run RESTHeart and MongoDB inside Kubernetes.
    https://github.com/mingfang/terraform-k8s-modules/tree/master/examples/mongodb
    A Sadeghioon
    @Babamon_gitlab
    @ujibang Thank you for the CSV output modification. I think it's a great addition to RESTHeart for larger datasets with fixed structures, as the transfer sizes are significantly smaller (almost half) than JSON, since you don't need to transfer the keys every time.
    A Sadeghioon
    @Babamon_gitlab
    Just curious, are there any downsides to using large page sizes (e.g. 1000) compared to the default 100?
    Andrea Di Cesare
    @ujibang
    Hi @mingfang, thanks for the contribution. It looks very interesting; we will give it a look and link it in the documentation.
    Andrea Di Cesare
    @ujibang
    @Babamon_gitlab regarding your question about the pagesize, it depends on the size of your documents. MongoDB allows documents up to 16 MB. If you have large documents, a big pagesize can result in significant network and BSON-to-JSON conversion overhead, so you need to adjust the pagesize according to your use case.
    In the configuration file you'll find a few options that allow you to tune read performance:
    ## Read Performance
    
    # default-pagesize is the number of documents returned when the pagesize query
    # parameter is not specified
    # see https://restheart.org/docs/mongodb-rest/read-docs#paging
    default-pagesize: 100
    
    # max-pagesize sets the maximum allowed value of the pagesize query parameter
    # generally, the greater the pagesize, the more json serialization overhead occurs
    # the rule of thumb is not exceeding 1000
    max-pagesize: 1000
    
    # cursor-batch-size sets the mongodb cursor batchSize
    # see https://docs.mongodb.com/manual/reference/method/cursor.batchSize/
    # cursor-batch-size should be smaller or equal to the max-pagesize
    # the rule of thumb is setting cursor-batch-size equal to max-pagesize
    # a small cursor-batch-size (e.g. 101, the default mongodb batchSize)
    # speeds up requests with small pagesize
    cursor-batch-size: 1000
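    For example (the collection name is made up), with these settings you can ask for up to 1000 documents per page:
    # pagesize must not exceed max-pagesize (1000 here)
    $ http -b -a admin:secret ':8080/coll?pagesize=1000&page=1'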
    A Sadeghioon
    @Babamon_gitlab
    @ujibang Thank you, I would really like to support the project as I think it's great. Is there a link where I can make a payment towards the project?
    Andrea Di Cesare
    @ujibang
    Wow @Babamon_gitlab, thank you. We are actually enabling GitHub Sponsors. I'll let you know when it is active for RESTHeart!
    A Sadeghioon
    @Babamon_gitlab
    @ujibang is there any way to pass allowDiskUse:true for a request? I have a very large collection; the documents are fairly small but there are millions of them (they are sensor data). When I run a request with a hint I get an out-of-memory error (for sorting):
    ERROR o.r.mongodb.handlers.ErrorHandler - Error handling the request com.mongodb.MongoQueryException: Query failed with error code 292 and error message 'Executor error during find command :: caused by :: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server
    A Sadeghioon
    @Babamon_gitlab
    I found some information on aggregations, but I mean: is it possible to pass it as a query parameter? If yes, how does it need to be formatted?
    A Sadeghioon
    @Babamon_gitlab
    I also have a problem when I am reading a collection with a filter. Let's say the result is 1000 pages long: as I request the pages one by one, the query gets slower and slower. Is this normal?
    A Sadeghioon
    @Babamon_gitlab
    It's almost the same performance (good) for the first 200K documents and then it hits a wall, regardless of page size.
    Andrea Di Cesare
    @ujibang
    Hi @Babamon_gitlab, in this use case you need to use aggregations https://restheart.org/docs/mongodb-rest/aggregations/ which support allowDiskUse
    The aggregation is defined in the collection metadata and can use parameters passed via ?avars={"var1": 1, "var2": {"an": "object"}}
    Regarding the degrading performance on far pages, this is normal and depends on how MongoDB works: skipping many results in a find (this is how paging is implemented) on a large collection is not very performant.
    In this case you need to define an aggregation that paginates the results on some query condition, for instance on a time-based interval. This aggregation can also use the ?page and ?pagesize values as parameters, see https://restheart.org/docs/mongodb-rest/aggregations/#predefined-variables
    This approach is referred to as "range queries"; see the MongoDB documentation https://docs.mongodb.com/manual/reference/method/cursor.skip/#using-range-queries
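    As a rough sketch (the aggregation uri by-time, the ts field and the avars value are all made up, check the docs above for the exact options):
    # made-up names: define an aggregation "by-time" on /coll that opts in to allowDiskUse,
    # sorts on a ts field and pages by a time range passed via avars
    $ http -a admin:secret PUT :8080/coll aggrs:='[{"type": "pipeline", "uri": "by-time", "allowDiskUse": true, "stages": [{"$match": {"ts": {"$gt": {"$var": "from"}}}}, {"$sort": {"ts": 1}}, {"$limit": 100}]}]'

    # then call it passing the range start as an aggregation variable
    $ http -b -a admin:secret ':8080/coll/_aggrs/by-time?avars={"from": {"$date": "2022-02-01T00:00:00Z"}}'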
    Andrea Di Cesare
    @ujibang
    Range queries can use indexes to avoid scanning unwanted documents, typically yielding better performance as the offset grows compared to using skip() for pagination.
    Andrea Di Cesare
    @ujibang
    🔥 We just enabled github sponsors for RESTHeart https://github.com/sponsors/SoftInstigate, any help to improve our beloved piece of code would be much appreciated! @Babamon_gitlab
    A Sadeghioon
    @Babamon_gitlab
    Thank you @ujibang, I am glad to be the first sponsor. I also noticed that the problem only happens when the query requests the last page: no matter how many pages there are in total, the performance suddenly drops on the last one.
    Andrea Di Cesare
    @ujibang
    Thanks @Babamon_gitlab for your sponsorship! Much appreciated
    Have you tried defining an aggregation with a range query? If needed, I can assist you with it.
    Andrewzz
    @Andrewzz
    Hey team! Quick question. Is restheart compatible with swagger-ui? Or is there any way to create a swagger file from restheart natively?
    Andrea Di Cesare
    @ujibang
    Hi @Andrewzz, of course you can create a swagger file for the RESTHeart API. I have personally done it several times; it is a matter of defining a YAML file as in https://editor.swagger.io
    Maybe I didn't get your question… please elaborate.
    Hussam Qasem
    @hussam-qasem
    I uploaded many files into a file bucket with PUT. I would like to clone the bucket into a different MongoDB server. What's the easiest way to accomplish the task? An intelligent HTTPie Script? MongoDB dump & restore? Thank you!
    Andrea Di Cesare
    @ujibang
    I would say mongodump/restore
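    Something along these lines (hosts and database name are made up; remember a bucket is backed by the .files and .chunks collection pair):
    # dump both GridFS collections of the bucket from the source server, then restore on the target
    $ mongodump --host source-host --db mydb --collection mybucket.files --out dump/
    $ mongodump --host source-host --db mydb --collection mybucket.chunks --out dump/
    $ mongorestore --host target-host dump/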
    The first Docker Community All Hands of 2022 is coming up this Thursday! I’ll be speaking about “Running RESTHeart with Docker”. Sign up here for JavaScript, Python, and Java tracks; workshops, Docker news and updates; and more: dockr.ly/3D9HTjr
    Hussam Qasem
    @hussam-qasem
    Thank you @ujibang for the prompt response. One more question: when I do a GET on a bucket (e.g. http://localhost:8080/mybucket.files), with or without a filter, is there a way to limit (or make unlimited) the number of returned documents? I noticed that it returns only 100. How do I get all of them? Or limit to only 10? I know RESTHeart supports pagination, but I can't figure out how to use it. Thank you!!
    Andrea Di Cesare
    @ujibang
    You use ?pagesize=n to ask for n documents. However n has a limit, by default 1000 (in the conf file you have max-pagesize: 1000).
    Then you use ?page=x to ask for page number x.
    So GET /mybucket.files?pagesize=100&page=3 will give you the 201st to 300th files.
    And yes, ?filter={ <mongo query> } is the way to limit the result set.
    If you call GET /mybucket.files/_size?filter={ <mongo query> } you'll get the count of the files that match the query.
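    For example (the filter here is just a placeholder):
    # third page of 100 matching files, then the count of all matches
    $ http -b -a admin:secret ':8080/mybucket.files?filter={"metadata.contentType":"image/jpeg"}&pagesize=100&page=3'
    $ http -b -a admin:secret ':8080/mybucket.files/_size?filter={"metadata.contentType":"image/jpeg"}'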
    Hussam Qasem
    @hussam-qasem
    Thank you Andrea. Much appreciated. Have a wonderful day!
    Hussam Qasem
    @hussam-qasem

    Greetings! I am testing retrieving a binary file from a bucket, but realized many of the files were empty, and RESTHeart returns a 500 http status code:

    % http --verify=no -a admin:secret -f GET https://localhost/storage/mybucket.files/myfile.jpg/binary
    
    HTTP/1.1 500 Internal Server Error
    Access-Control-Allow-Credentials: true
    Access-Control-Allow-Origin: *
    Access-Control-Expose-Headers: Location, ETag, X-Powered-By, Auth-Token, Auth-Token-Valid-Until, Auth-Token-Location
    Auth-Token: 3ixg98kbwzxso77wqpwt11y8z65a08icn27ssncbs2nlm085i0
    Auth-Token-Location: /tokens/admin
    Auth-Token-Valid-Until: 2022-04-04T18:28:26.530537652Z
    Connection: close
    Content-Disposition: inline; filename="file"
    Content-Length: 0
    Content-Transfer-Encoding: binary
    Content-Type: image/jpeg
    Date: Mon, 04 Apr 2022 18:13:26 GMT
    ETag: 6204a40e9bf8cb3fb5a0a642
    Server: Apache
    Set-Cookie: ROUTEID=.route1; path=/
    X-Powered-By: restheart.org

    Meanwhile, RESTHeart logs print:

    18:13:26.533 [XNIO-1 task-3] ERROR org.restheart.handlers.ErrorHandler - Error handling the request
     com.mongodb.MongoGridFSException: Unexpected Exception when reading GridFS and writing to the Stream
        at com.mongodb.client.gridfs.GridFSBucketImpl.downloadToStream(GridFSBucketImpl.java:578)
    Caused by: com.mongodb.MongoGridFSException: Could not find file chunk for file_id: BsonString{value='myfile.jpg'} at chunk index 0.
        at com.mongodb.client.gridfs.GridFSDownloadStreamImpl.getBufferFromChunk(GridFSDownloadStreamImpl.java:246)
    
    18:13:26.535 [XNIO-1 task-3] ERROR io.undertow.request - UT005071: Undertow request failed HttpServerExchange{ GET /mybucket.files/myfile.jpg/binary}
     com.mongodb.MongoGridFSException: Unexpected Exception when reading GridFS and writing to the Stream
        at com.mongodb.client.gridfs.GridFSBucketImpl.downloadToStream(GridFSBucketImpl.java:578)
    Caused by: com.mongodb.MongoGridFSException: Could not find file chunk for file_id: BsonString{value='myfile.jpg'} at chunk index 0.
        at com.mongodb.client.gridfs.GridFSDownloadStreamImpl.getBufferFromChunk(GridFSDownloadStreamImpl.java:246)
    
    18:13:26.537 [XNIO-1 task-3] INFO  org.restheart.handlers.RequestLogger - GET http://localhost/mybucket.files/myfile.jpg/binary from /127.0.0.1:34524 => status=500 elapsed=10ms contentLength=0 username=admin roles=[admin]

    Would you kindly help me decode the message and figure out how to solve it?

    1) Retrieving myfile.jpg metadata (without /binary) works fine.

    2) I did delete a few documents from the mybucket.files collection using MongoDB Compass and didn't delete the corresponding documents in mybucket.chunks. I'm assuming MongoDB Compass does that automatically, or it doesn't really matter.

    Andrea Di Cesare
    @ujibang

    From https://www.mongodb.com/docs/manual/core/gridfs/

    GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata. The section GridFS Collections describes each collection in detail.

    You should access your files via the GridFS API

    To store and retrieve files using GridFS, use either of the following:

    A MongoDB driver. See the drivers documentation for information on using GridFS with your driver.
    The mongofiles command-line tool. See the mongofiles reference for documentation.
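    For example with the mongofiles tool (the database name is made up; --prefix selects the bucket):
    # list and fetch files of the mybucket bucket via the GridFS API
    $ mongofiles --db mydb --prefix mybucket list
    $ mongofiles --db mydb --prefix mybucket get myfile.jpg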