    Martin Durant
    @martindurant
    Oh, updating the value of a dataset in memory on a periodic callback.
    Actually, streamz does totally do exactly that with very little overhead (if you’re asyncio already)
    Dan Allan
    @danielballan
    Yeah, it will be nice to get a standalone streamz example in that directory to show the integration working. If I can "nerd snipe" you into trying, that would be fun.
    Martin Durant
    @martindurant
    Do!
    Trying to describe a whole streamz DAG in a YAML catalog is definitely inferior to defining it on a server and streaming the output
    Dan Allan
    @danielballan
    Agreed! Some things are clearer in straight Python code than in a descriptive encoding.
    Martin Durant
    @martindurant
    Reminder to self: streamz needs a “sink to websocket” node :) Or maybe to TCP.
    Dan Allan
    @danielballan
    The BNL folks are all stuck in an all-day management meeting tomorrow. Should we do a one-off meeting next week, perhaps, to follow up on tiled-as-intake-server stuff?
    Martin Durant
    @martindurant
    Sounds good to me
    Dan Allan
    @danielballan
    Oh, hey, they got permission to tweet publicly about it! Cool.
    Martin Durant
    @martindurant
    @danielballan , can we still have your zoom room tomorrow?
    Dan Allan
    @danielballan
    Oops, just saw this. All good.
    Dan Allan
    @danielballan
    :-D
    I'm having bad luck with the intake dev call time slot lately... got a personal conflict tomorrow I can't move. Hopefully @tacaswell can rep Team BNL. This week and next are very meeting heavy, but should I send out a Doodle poll for some times the week of Dec 13 to chat about the intake-tiled integration PR?
    Martin Durant
    @martindurant
    yes, please do!
    Martin Durant
    @martindurant
    OK, we’ll figure out a slot that works better for more people :)
    JoranDox
    @JoranDox
    Hey guys, we're looking into why our own drivers stopped working after the update to 0.6, and we noticed that all drivers __init__ twice now, is this intended? if yes, what is the reasoning behind this?
    especially if the driver expects storage options (like mycatalog.someentry(storage_options=my_storage_options)) it's inited once without those storage options first, and we throw errors in that case then
    Martin Durant
    @martindurant

    all drivers __init__ twice now

    Er, no. How do you see this?

    JoranDox
    @JoranDox
    by putting a print in the __init__ function :D
    in that case I'll make a minimal example and open a ticket on intake?
    Martin Durant
    @martindurant
    I’ll have a look. Note that init should be cheap, and no real work ought to happen until discover or some form of read, but this still should be fixed.
    Yes, an issue would be appreciated. A fix too! :)
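To illustrate the "init should be cheap" point above, here is a minimal sketch in plain Python. The class `LazySource`, its `read` method and the `load_count` attribute are hypothetical names made up for this example; this is not intake's driver API, only the general pattern of deferring work until the first read.

```python
class LazySource:
    """Hypothetical data source; NOT intake's actual driver API."""

    def __init__(self, path, storage_options=None):
        # Only record the arguments here: no I/O and no checks that
        # would fail if the object is constructed more than once.
        self.path = path
        self.storage_options = storage_options or {}
        self._data = None
        self.load_count = 0  # counts how often the expensive step ran

    def _load(self):
        # Stand-in for the expensive step (opening files, network calls).
        self.load_count += 1
        return f"contents of {self.path}"

    def read(self):
        # The real work happens on the first read and is cached afterwards.
        if self._data is None:
            self._data = self._load()
        return self._data
```

With this shape, constructing the object twice (with or without `storage_options`) costs almost nothing, and any error about missing options would only surface when `read` is actually called.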
    JoranDox
    @JoranDox
    minimal example fabricated, ticket incoming, fix maybe if we understand what is actually happening :D
    Martin Durant
    @martindurant
    Thanks.
    JoranDox
    @JoranDox
    we made the minimal example in a jupyter notebook and we can't add that as-is, so we're converting it to a python file now
    %%writefile won't work then but I'm sure you'll understand :P
    JoranDox
    @JoranDox
    intake/intake#634 all right, we did it
    Martin Durant
    @martindurant
    Let’s skip the monthly this week, or set another date next week. I just ask, if anyone has the time, for thoughts on intake/intake#636
    Dan Allan
    @danielballan
    Cool. I'll take a look. Catching up on github notifications after 2 weeks of strictly ignoring computers. :-)
    Martin Durant
    @martindurant
    Good for you!
    Jeremy Delahanty
    @jmdelahanty

    Hello everyone! I have a question about grabbing something from an xml file efficiently/properly. I have an xml file and I'm using lxml to parse it. I need to get the last instance of a tag without using the "findall" method because it takes a while to get. There's around 47k elements in the tag I'm getting information from. So far, here's what I do:

    import lxml.etree

    def determine_num_images(data_dir):

        xpath = "Sequence/Frame[last()]"
        parser = lxml.etree.XMLParser(recover=True) # the xml I'm working with has unsupported characters for XML v1.0
        root = lxml.etree.parse(str(data_dir), parser).getroot()

        # xpath() returns a list of matching elements, even for one match
        last_frame = root.xpath(xpath)

        for element in last_frame:
            num_images = element.attrib["index"]

        return num_images

    This works and gives me what I need, but for some reason I can't figure out why I need to use a for loop to access the xml element. If I don't use the for loop and instead do:

    num_images = last_frame.attrib["index"]

    I get an error: AttributeError: 'list' object has no attribute 'attrib'

    It makes sense I can't use attrib since the xpath search gives me a list, but I feel like there's gotta be a way to access the items inside the list without invoking a for loop. For some reason it just feels weird to me. I can also do:

    num_images = last_frame.items()[2][1]

    But that doesn't seem the best way to do it either. Any advice?

    Edit:

    Can also do:

    num_images = last_frame[0].attrib["index"]

    Which seems best for what I'm up to so far, but wondering if there's anything wrong with the way I'm doing it here...

    Martin Durant
    @martindurant
    You said in another channel you fixed this? I don’t know much about XML parsing or lxml in particular, but it does indeed seem as if, at the nesting level you are interested in, it generated a normal python list of xml instances, so you have to do python list operations on it, such as iteration or indexing. Note that large XML parsing is something that the awkward project aims to do and give vectorised processing over, but other formats come first.
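A small sketch of the list-versus-element distinction discussed above. As an assumption for portability, it uses the standard library's `xml.etree.ElementTree` rather than lxml, and the sample document and the function name `last_frame_index` are made up for illustration. In this API, `findall` returns a list, while `find` returns the first matching element itself (or `None`), so no loop is needed.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the real file has around 47k Frame elements.
SAMPLE = """<Root><Sequence>
<Frame index="1"/><Frame index="2"/><Frame index="3"/>
</Sequence></Root>"""


def last_frame_index(xml_text):
    root = ET.fromstring(xml_text)
    # find() returns a single element (or None), not a list,
    # so the attribute can be read directly without a for loop.
    last = root.find("Sequence/Frame[last()]")
    if last is None:
        raise ValueError("no Frame elements found")
    return last.attrib["index"]
```

Unlike the lxml parser with `recover=True`, the standard library parser does not tolerate malformed input, so this is only a sketch of the access pattern, not a drop-in replacement.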
    Dylan McReynolds
    @dylanmcreynolds
    @jmdelahanty I don't know much at all about lxml, but for your use case I would consider using its SAX API: https://lxml.de/sax.html. SAX parsers tend to be more efficient in cases where you have a large number of elements and don't want to load the entire tree into memory (like DOM does). Having said this, I admit that I don't know the memory characteristics of `lxml.etree.XMLParser`, but it looks to me like a DOM-compliant parser.
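A rough sketch of the SAX idea, assuming the standard library's `xml.sax` module instead of lxml's SAX bridge. The handler class `LastFrameHandler` and the helper `last_frame_index_sax` are hypothetical names for this example. The handler sees each element as it streams past and only remembers the most recent `Frame` index, so no tree is kept in memory.

```python
import xml.sax


class LastFrameHandler(xml.sax.ContentHandler):
    """Remembers the 'index' attribute of the most recent Frame element."""

    def __init__(self):
        super().__init__()
        self.last_index = None

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams the input.
        if name == "Frame":
            self.last_index = attrs.get("index")


def last_frame_index_sax(xml_bytes):
    handler = LastFrameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.last_index
```

This still reads the whole file once, and the standard library parser will raise on characters that are invalid in XML 1.0, so it may not suit the file described above without cleaning it first.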
    Jeremy Delahanty
    @jmdelahanty
    It looks like things work like they're supposed to and I get the info I need @martindurant , but I was curious if there were better ways to do it. I'll have to look into this SAX API @dylanmcreynolds ! Thank you for the resource!
    Jeremy Delahanty
    @jmdelahanty

    Hello again everyone! I have a set of Docker questions if anyone has the time. Here's what I'm up to. I'm trying to start a container by using python subprocessing to run a shell script that builds a container in interactive mode. It looks like the container is built when I call it and it stays up, but I get an error that says the following:

    unable to setup input stream: unable to set IO streams as raw terminal: input/output error

    I've been searching online for how to solve this but haven't had much luck. Any advice? Here's my Dockerfile:

    FROM ubuntu:latest
    RUN /bin/bash

    As well as my shell script:

    #!/bin/bash
    
    sudo docker run \
           -it \
           --rm \
           test:test

    Edit:
    Ideally, this container would use variables given to it by the shell script for automatically performing a task (so without interactive mode being necessary). I'm not sure how to tell the container to immediately start doing something when it starts up using variables that have been pre-determined for it.

    Martin Durant
    @martindurant

    I don’t know that we have any docker experts here…
    For the error, you probably want to make sure to set the stdin, stdout and stderr arguments to Popen; but actually, python’s subprocess isn’t great for truly interactive use, it will tend to block on communication.
    You can set environment variables in the script by passing env= to Popen (should be a dict), and bash can process those into command arguments to pass to docker - -e sets variables for the process inside the container. https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file

    All of this is a little esoteric since we are not sure what you are wanting to achieve!
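The `env=` and `-e` suggestions above might look roughly like this. The helper names `build_docker_run_command` and `run_with_env`, and the variable `N_FRAMES`, are made up for illustration; only `docker run`, `--rm` and `-e NAME=value` are real Docker CLI options. The sketch leaves out `-it` and closes stdin, since the stated goal is a non-interactive run.

```python
import os
import subprocess
import sys


def build_docker_run_command(image, variables):
    """Build a non-interactive `docker run` command (no -it)."""
    cmd = ["docker", "run", "--rm"]
    for name, value in variables.items():
        # -e NAME=value sets the variable for the process in the container.
        cmd += ["-e", f"{name}={value}"]
    cmd.append(image)
    return cmd


def run_with_env(cmd, variables):
    """Run cmd with extra environment variables and capture its output."""
    env = {**os.environ, **variables}  # env= must be a dict of strings
    return subprocess.run(
        cmd,
        env=env,
        stdin=subprocess.DEVNULL,  # nothing interactive is expected
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_with_env([sys.executable, "-c", "import os; print(os.environ['N_FRAMES'])"], {"N_FRAMES": "47000"})` passes the variable to a child Python process. Passing the list from `build_docker_run_command("test:test", {"N_FRAMES": "47000"})` to `subprocess.run` would need a working Docker installation and an image whose default command does the processing.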

    Dylan McReynolds
    @dylanmcreynolds
    I think that if you're running -it, you need to tell it a command to run: docker run -it --rm test:test /bin/bash. But if you just want to run test:test once, remove the -it.
    Martin Durant
    @martindurant
    Specifically, -it connects the in-container running process’s stdin and stdout to those of the process calling it, so if this is bash, it’ll be waiting for commands to run. bash is not much of a “command” to run by itself.
    Jeremy Delahanty
    @jmdelahanty
    Thanks everyone! My overall goal is to not have the interactive part be necessary, I'm hoping to just have the container run a processing script automatically without the user being involved since some people in my lab aren't so comfortable with using command line tools. The interactive portion was an attempt to start the container so I can troubleshoot while inside it, but I think that it's unnecessary.
    Jeremy Delahanty
    @jmdelahanty
    Hello everyone! I recently ran into a problem with an anaconda channel being invalid. Whenever I try to use the https://conda.binstar.org/cyclus channel, I'm met with an error message that says the channel is invalid. When I navigate to that address, though, I'm directed to the appropriate channel it looks like. Any advice for this? I didn't write this environment file or the program that is using it so I'm not sure if it's strictly needed by the repo yet...
    Martin Durant
    @martindurant
    Isn’t the channel just “cyclus”? https://anaconda.org/cyclus/ The “binstar” in the name suggests that this is a very old reference, that word was phased out by Anaconda I think even before it became Anaconda.
    Jeremy Delahanty
    @jmdelahanty
    Lol okay that's good to know, I didn't realize that this repo is using things that way. I'll see if using the correct channel solves things and report back. Thanks Martin!