    Jeremy Barnes
    @jeremybarnes
    @SivaRamamurthy1_twitter please reach out on info@mldb.ai
    François Maillet
    @mailletf
    @robomotic exactly. We decided to leave ACLs out when building MLDB. We typically put another very thin service/firewall in front to deal with authentication when required
    datascienceit
    @datascienceit
    Has anyone had this:
    Aborting boot: directory mapped to /mldb_data owned by root
    Expected owner uid from MLDB_IDS is: 1003
    If the directory mapped to /mldb_data did not exist before launching MLDB,
    it is automatically created and owned by root. If this is what happened,
    please change the owner of the directory and relaunch.
    Jeremy Barnes
    @jeremybarnes
    @datascienceit in order to avoid problems with files being owned by root, MLDB needs the mldb_data directory to be owned by the current user. You can normally fix it with sudo chown $UID.$UID mldb_data in the current directory
    datascienceit
    @datascienceit
    Thanks
    sniper0110
    @sniper0110
    Hello guys,
    So I have a question to ask. I trained a model using Inception from Tensorflow (to classify food from images), and at the end I have a graph file (.pb). I want to use that graph to predict food from images, but as an API; how can I do that exactly? I saw the MLDB example where they used the Inception model as-is (they had a URL for the Inception model and pointed to the graph file in the Inception folder), but I want to use my own re-trained graph. How can I turn that into an API that can be used by, say, websites to classify images? I am also wondering whether the graph file will affect my API: it is relatively big (~80 MB), and the API needs to use the graph each time it predicts a food from an image. I am sorry if this question is silly; I am not really a software engineer, I do computer vision stuff and I need to make this API as part of an assignment.
    Any help is greatly appreciated :)
    Thanks
    Jeremy Barnes
    @jeremybarnes
    It's very similar. The easiest way is to follow the same structure as in the example here: https://docs.mldb.ai/doc/#/v1/plugins/tensorflow/doc/TensorflowGraph.md.html Create a zip file with your model.pb and a text file with the label names, then change the name of the file to match the path of the zip file and the other filenames to match those inside. The graph will only be loaded once, and will then stay in memory across API calls. If you follow that example, you can get a prediction out by calling the /v1/query route just like in the example. Note that making an API is something that you kind of need to be a developer for, as you need to know about REST and URL encoding and all that stuff.
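For concreteness, a rough sketch of the function configuration that setup implies. The archive name my_model.zip, the file model.pb, the layer name softmax, and the function name imageEmbedding are all assumptions; they must be changed to match your own graph and the linked example's config:

```python
import json

# Rough sketch of the function config from the TensorflowGraph example,
# adapted for a retrained graph. "my_model.zip", "model.pb", the layer
# name "softmax", and the function name are placeholders: change them
# to match your own archive and graph.
config = {
    "type": "tensorflow.graph",
    "params": {
        # archive+ tells MLDB to read model.pb from inside the zip
        "modelFileUrl": "archive+file://my_model.zip#model.pb",
        "inputs": 'fetch({url})[content] AS "DecodeJpeg/contents"',
        "outputs": "flatten(softmax)",
    },
}

# With pymldb this would be registered with something like:
#   mldb.put("/v1/functions/imageEmbedding", config)
print(json.dumps(config, indent=2))
```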
    sniper0110
    @sniper0110
    Thank you so much @jeremybarnes , I will definitely try this out!
    sniper0110
    @sniper0110

    Hello again
    So I went with your suggestion @jeremybarnes , but I am stuck at the part where they create a filename (from a URL) and call the function. When I do that in a notebook on my machine (not live) I get an error saying: Connection has no attributes called 'log' and 'sqlEscape'. I should mention that I started my code with:

    from pymldb import Connection
    mldb = Connection()

    What is the problem exactly?

    sniper0110
    @sniper0110
    So, all the code mentioned in that example is working except for that last part. The only thing I want to do is to call that function with a given URL and have the prediction values displayed as outputs.
    sniper0110
    @sniper0110

    So instead of using the last part of the code I decided to query the model directly using:

    mldb.query("SELECT imageEmbedding({url: '%s'}) as *" % filename)

    Where filename is a URL to a given image. I am getting the following error:

    ResourceErrorTraceback (most recent call last)

    <ipython-input-25-b0780527bbb0> in <module>()
    ----> 1 mldb.query("SELECT imageEmbedding({url: '%s'}) as *" % filename)

    /usr/local/lib/python2.7/dist-packages/pymldb/__init__.pyc in query(self, sql, **kwargs)
    81 """
    82 if 'format' not in kwargs or kwargs['format'] == 'dataframe':
    ---> 83 resp = self.get('/v1/query', data={'q': sql, 'format': 'table'}).json()
    84 if len(resp) == 0:
    85 return pd.DataFrame()

    /usr/local/lib/python2.7/dist-packages/pymldb/__init__.pyc in inner(*args, **kwargs)
    21 result = add_repr_html_to_response(fn(*args, **kwargs))
    22 if result.status_code < 200 or result.status_code >= 400:
    ---> 23 raise ResourceError(result)
    24 return result
    25 return inner

    ResourceError: '400 Bad Request' response to 'GET http://localhost/v1/query'

    {
    "httpCode": 400,
    "error": "Cannot read column \"softmax\" with no FROM clause."
    }

    I'm guessing the problem is in my 'imageEmbedding' function that has the softmax layer as output. Is this due to the fact that I am not using the inception model as it is but rather a re-trained version of it?
    Jeremy Barnes
    @jeremybarnes
    Yes, that's likely it. The "variables" in the tensorflow.model query need to correspond with layer names, otherwise MLDB will look in an outer scope for the variable and not find it. You can either dump your graph with Tensorflow to understand the layer names, or use GET /v1/functions/<tfmodel function>/details to have a JSON dump, and look up the name of your layer there.
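As a rough sketch of that lookup, here is a way to pull candidate layer names out of a GraphDef-style text dump such as the one /v1/functions/&lt;name&gt;/details returns. The assumed line shape `name = Op[attrs]` may differ from your dump, in which case the pattern needs adjusting:

```python
import re

# Hedged sketch: scan a GraphDef-style text dump for node names.
# Assumes each node appears on a line shaped like `name = Op[attrs]`;
# adjust the pattern if your dump differs.
NODE_RE = re.compile(r'^\s*(\S+)\s*=\s*(\w+)\[')

def node_names(dump):
    """Return (name, op) pairs for every node found in the dump."""
    pairs = []
    for line in dump.splitlines():
        match = NODE_RE.match(line)
        if match:
            pairs.append(match.groups())
    return pairs

sample = 'final_result = Softmax[T=DT_FLOAT, _device="/cpu:0"];'
print(node_names(sample))  # -> [('final_result', 'Softmax')]
```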
    sniper0110
    @sniper0110

    So as you suggested @jeremybarnes , I used mldb.get('/v1/functions/imageEmbedding/details') and I got a huge chunk of details, but at the end there was:
    final_result = Softmax[T=DT_FLOAT, _device="/cpu:0"];

    So what I did was to use 'final_result' as the output of my 'imageEmbedding' function, and I got something different (which is kind of good!); it was the following error:

    "httpCode": 400,
    "error": "Unable to run model: NodeDef mentions attr 'dct_method' not in Op<name=DecodeJpeg; signature=contents:string -> image:uint8; attr=channels:int,default=0; attr=ratio:int,default=1; attr=fancy_upscaling:bool,default=true; attr=try_recover_truncated:bool,default=false; attr=acceptable_fraction:float,default=1>; NodeDef: DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method=\"\", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device=\"/job:localhost/replica:0/task:0/cpu:0\"]\n\t [[Node: DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method=\"\", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device=\"/job:localhost/replica:0/task:0/cpu:0\"]]]"

    Any ideas on what this means exactly? It's very confusing

    sniper0110
    @sniper0110

    This is my code if anyone wants to take a look at it, maybe you'll have some ideas ;)

    retrained_model_food.ipynb

    Jeremy Barnes
    @jeremybarnes
    That error looks like it comes from a Tensorflow version mismatch between the model you trained and the version that's in MLDB.
    sniper0110
    @sniper0110
    Oh I see, which version is MLDB using exactly?
    Jeremy Barnes
    @jeremybarnes
    I think it's 0.10. We are moving to 1.1, but that work isn't finished yet.
    sniper0110
    @sniper0110
    What do you suggest @jeremybarnes ? I need to make an API before next Thursday and MLDB seemed the only relatively easy way to make it
    Jeremy Barnes
    @jeremybarnes
    You could either a) retrain your model using Tensorflow 0.10 (which should then be loadable by MLDB), or b) use a Python REST API framework to expose your model (which should be fine so long as performance isn't an issue)
    We won't have a release with Tensorflow 1.1 support before Thursday
    sniper0110
    @sniper0110
    Ok, I see. I think I will first explore the second option; if I get lost I'll go back to option (a). Do you have any link to something similar to what I am trying to do, a tutorial-like kind of thing?
    sniper0110
    @sniper0110
    Hello again, so I was working on both options (in parallel) suggested by @jeremybarnes and it finally paid off. I retrained my model using Tensorflow 0.10 and it's working perfectly :D Thank you again for the help @jeremybarnes ;) I will be posting some theoretical questions soon to understand the working principles behind all of this.
    sniper0110
    @sniper0110

    So I gathered some questions regarding the working principles of MLDB. I hope you can help me answer them :)

    1) What is (DecodeJpeg/contents) node?

    2) The fetcher function downloads an image from a URL and turns it into a blob. What is a blob exactly?

    3) The procedure 'imagenetLabels' reads the labels from a .txt file and puts them in a dataset. What kind of dataset is this?

    4) Is the function 'lookupLabels' only used to assign the probabilities of the predictions to the correct labels?

    5) In the main function 'imageEmbedding' there are two things that I don't quite understand:

    a) What does ('fetch({url})[content] AS "DecodeJpeg/contents"') mean exactly? I mean, we are turning an image into a blob using the fetch function, but what does the second part (AS "DecodeJpeg/contents") do exactly?

    b) In the output we are using a function called 'flatten', what is it doing?

    sniper0110
    @sniper0110
    Also, if someone wants to use this API to make predictions for types of food in a web application, a) how can he do that? b) Should I give him my retrained model or just the code?
    Jeremy Barnes
    @jeremybarnes
    1) The DecodeJpeg/contents node is the name of the variable that comes out of the JPEG decoder that's in the Tensorflow graph.
    2) The Fetcher function returns a blob which is the full contents (HTTP body) of the URL that is fetched
    3) The dataset type can be interrogated by issuing GET /v1/datasets/imagenetLabels to MLDB. The default dataset type is sparse.mutable, which is a very general-purpose dataset that can hold any data type, but is not particularly efficient at any operation.
    4) The lookupLabels function, as you suggest, associates labels with predictions based upon the index in the prediction vector.
    Jeremy Barnes
    @jeremybarnes
    5.a) When you call a Tensorflow graph, you provide inputs using the names in the Tensorflow graph. So in this case, we want to fetch the URL, take just the content (the binary blob with the JPEG encoded data), and feed it in to the DecodeJpeg/contents variable of the Tensorflow graph. In other words, we're taking something which looks like { content: <BLOB>, error: null } and turning it into { 'DecodeJpeg/contents': <BLOB> }. The AS operator is just like SQL: it renames columns from their default name into another name.
    5.b The output of the Tensorflow function is a 1x1008 matrix. By flattening it, we remove the first dimension and turn it into a simple 1008 element vector.
    If you want to look at how to turn an API into a webapp, you could look at the DeepTeach plugin or the handwriting recognition demo in the MLDB tutorials. You would normally need to give a person the retrained model so that they could load it up as they deploy the API.
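As a tiny illustration of that flatten step (pure Python, not MLDB's actual implementation):

```python
# Sketch of what flattening a 1x1008 matrix amounts to: drop the
# leading singleton dimension, leaving a plain vector.
def flatten(matrix):
    """Concatenate the rows of a 2-D matrix into one flat list."""
    return [value for row in matrix for value in row]

scores = [[0.1, 0.7, 0.2]]     # a 1x3 stand-in for the 1x1008 output
print(flatten(scores))         # -> [0.1, 0.7, 0.2]
```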
    sniper0110
    @sniper0110
    Hello again,
    So I've been asked to deploy my API to a remote machine: an Ubuntu machine that I have remote access to. How can I do that exactly? I need to deploy it so that someone else can test it.
    Since I am using Docker to run MLDB, can I somehow deploy the container? Sorry if this sounds stupid, because as I said before I'm completely new to these things.
    sniper0110
    @sniper0110
    Any hints guys? I'm kinda stuck
    Jeremy Barnes
    @jeremybarnes
    That's a pretty heavy developer topic. Broadly, you'll have to create a docker container with MLDB as a base and an extra part on top that packages up your plugin, and then make the default command launch your plugin. The easiest way to do that is to make your plugin autoload.
    Or otherwise, start up the normal Docker container on the remote host (removing the localhost-only port binding so you can access it from other machines), then load up and run your notebook remotely from the host. That way the other person can connect to the same port on the remote machine to access the API.
    sniper0110
    @sniper0110
    Ok, I see. Well, I've been trying something myself: I was thinking of pushing the Docker image (containing all the resources that I used to make the API) to Docker Hub and then pulling it from the remote machine. Can this be a solution?
    But I am having a problem when trying to push the image to Docker Hub; it says: (denied: requested access to the resource is denied)
    sniper0110
    @sniper0110

    @jeremybarnes , your second suggestion seems like an easy way out. I have ssh access to the remote host and I have installed Docker on it. I believe you are talking about establishing an SSH tunnel, like on the MLDB website:

    ssh -f -o ExitOnForwardFailure=yes <user>@<remotehost> -L <localport>:127.0.0.1:<mldbport> -N

    So now someone who has access to the remote host will also have access to the MLDB instance that is on my local machine, right?

    sniper0110
    @sniper0110

    Now, that I have established a tunnel using ssh, should I use just :

    docker run --rm=true \
    -v </absolute/path/to/mldb_data>:/mldb_data \
    -e MLDB_IDS="id" \
    -p 127.0.0.1:<mldbport>:80 \
    quay.io/mldb/mldb:latest

    To access the container from the remote host?

    Jeremy Barnes
    @jeremybarnes
    Yes, that should do it
    If you want to allow anyone to connect (note: not recommended) you can use -p <mldbport>:80 instead
    that way port 80 on the receiving machine will be open to the internet, and they won't need to connect via the SSH tunnel
    sniper0110
    @sniper0110

    I did exactly that (I even allowed anyone to connect) and I got the following on my shell prompt:

    Aborting boot: directory mapped to /mldb_data owned by root

    Expected owner uid from MLDB_IDS is: 1001

    If the directory mapped to /mldb_data did not exist before launching MLDB,
    it is automatically created and owned by root. If this is what happened,
    please change the owner of the directory and relaunch.

    * /etc/my_init.d/05-mldb-id-mapping.sh failed with status 1

    PS : I tried to launch the container from the remote server.
    Jeremy Barnes
    @jeremybarnes
    You need to change to the directory from which you launched, and change the owner of mldb_data
    In order to avoid problems with files being owned by root, MLDB needs the mldb_data directory to be owned by the current user. You can normally fix it with sudo chown $UID.$UID mldb_data in the current directory
    Gustavo Brian
    @gbrian
    Hi there!
    Nice tool! I'm trying to run the KDNuggets Transfer Learning Blog Post with my own set of images.
    After preparing the images I'm still getting an exception during the call to inception; it seems that one of the images fails.
    How can I filter out the failing images without breaking the whole process? Thanks
    François Maillet
    @mailletf
    You can wrap the call to inception in a try() SQL statement. check the "Handling errors line by line" section here: https://docs.mldb.ai/doc/#builtin/sql/ValueExpression.md.html
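A sketch of what that try() wrapper could look like from pymldb. The dataset name images, the column location, and the function name inception are assumptions; the two-argument form of try(), where the second argument is returned on error, follows the linked docs:

```python
# Hedged sketch: wrapping the inception call in try() so a row whose
# image fails to decode produces 'failed' instead of aborting the
# whole query. Dataset/column/function names are placeholders.
sql = (
    "SELECT try(inception({url: location}), 'failed') AS * "
    "FROM images"
)

# With pymldb this would be run as:
#   mldb.query(sql)
print(sql)
```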