    Mark Henderson
    @markhend
    Me too (Ubuntu 19.10)
    A. R. Shajii
    @arshajii
    (btw this extension should probably be .fa or .fasta rather than .txt)
    Mark Henderson
    @markhend
    for r in FASTA('Salmonella_enterica.fa'):
       print r.name
       print r.seq
    mhenders@mhenders-750-247c ~/dna-seqs $ seqc open_fa.seq 
    assertion failed on line 48 (/home/mhenders/.seq/stdlib/bio/fasta.seq)
    John Leung
    @fuzzthink
    I installed samtools w/o errors, but it's nowhere to be found, even after specifying --prefix for the install path.
    Now back to the FASTQ question. Yes, maybe FASTQ isn't the right tool, but I'm having trouble understanding why r.read is a seq whose printed representation matches the literal, yet the two are not equal.
    A. R. Shajii
    @arshajii
    Ok, this file uses \r and it throws off the file parser
    This is probably something we should handle (although, AFAIK, \r is very uncommon to have in FASTA/FASTQ files)
    Mark Henderson
    @markhend
    For me samtools is /usr/bin/samtools
    A. R. Shajii
    @arshajii
    There's actually a \r in that seq you don't see when you print
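A minimal sketch of the effect described above, written in plain Python rather than Seq (the string values are made up for illustration): a trailing carriage return is usually invisible when printed, but it still takes part in equality checks.

```python
# Hypothetical values: a sequence as read from a CRLF file vs. a typed literal.
raw = "ACGT\r"      # the line kept its carriage return
literal = "ACGT"

print(raw)              # on most terminals this looks identical to the literal
print(repr(raw))        # repr() exposes the hidden character
print(raw == literal)   # False: the trailing \r is part of the string

cleaned = raw.rstrip("\r\n")
print(cleaned == literal)  # True once the line ending is stripped
```

Printing the repr of a suspicious value is a quick way to spot this kind of hidden character.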
    Mark Henderson
    @markhend
    Good sleuthing
    A. R. Shajii
    @arshajii
    But thanks for finding this :D
    The new parsers raise a validation error
    John Leung
    @fuzzthink
    @arshajii Thanks! @markhend I reinstalled w/o a prefix, but it's not in /usr/bin for me (OSX), so I'm giving up. I'll just work line by line via seq(l.replace('\r', '')) since samtools won't be needed in the future
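The line-by-line workaround can be sketched as a small FASTA reader that strips line endings before using each line. This is plain Python, not Seq's FASTA parser; read_fasta and the sample records are hypothetical names and data used only for illustration.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA text stream.

    Each line is stripped of both \\r and \\n, so CRLF files are handled
    the same way as LF files.
    """
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

# Made-up sample with Windows-style (CRLF) line endings.
sample = io.StringIO(">rec1\r\nACGT\r\nTTGA\r\n>rec2\r\nGGCC\r\n")
records = list(read_fasta(sample))
print(records)
```

Stripping at read time keeps the stray character out of every later comparison, which is the same idea as the l.replace('\r', '') workaround.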
    John Leung
    @fuzzthink
    Why does list(generator) error here?
    John Leung
    @fuzzthink
    
    ptn = s'GTCGGGTAG'
    dna = s'TAATACACCATTCAGGGAAATACGGATTCCCACCGTCTCTCGGTAGCCTAAC'
    def diff(s1:seq, s2:seq):
       return len([1 for i in range(len(s1)) if s1[i] != s2[i]])
    
    def forWhere(dna:seq, ptn:seq, d:int):
       for i, k in enumerate(dna.split(len(ptn), 1)):
          if diff(ptn, k) <= d:
             yield i
    def allWhere(dna:seq, ptn:seq, d:int) -> list[int]:
       return list(forWhere(dna, ptn, d))
    
    def allWhere2(dna:seq, ptn:seq, d:int) -> list[int]:
       return [i for i, k in enumerate(dna.split(len(ptn), 1)) if diff(ptn, k) <= d]
    
    print allWhere2(dna, ptn, 6) # ok
    print allWhere(dna, ptn, 6)  # zsh: bus error
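For comparison, here is a rough plain-Python rendering of the snippet above. It assumes that dna.split(k, 1) in Seq yields every length-k subsequence with step 1; the windows helper is a hypothetical stand-in for that call. In Python the generator-based and comprehension-based versions return the same list, which is the behaviour the Seq code was expected to have.

```python
def diff(s1, s2):
    """Hamming distance between two equal-length strings."""
    return sum(1 for a, b in zip(s1, s2) if a != b)

def windows(dna, k):
    """Every length-k substring of dna, moving one base at a time.

    Assumed stand-in for Seq's dna.split(k, 1).
    """
    return (dna[i:i + k] for i in range(len(dna) - k + 1))

def for_where(dna, ptn, d):
    """Lazily yield start positions where ptn matches within d mismatches."""
    for i, kmer in enumerate(windows(dna, len(ptn))):
        if diff(ptn, kmer) <= d:
            yield i

def all_where(dna, ptn, d):
    # Materialise the generator, as allWhere does above.
    return list(for_where(dna, ptn, d))

def all_where2(dna, ptn, d):
    # Same filter written as a list comprehension, as allWhere2 does above.
    return [i for i, kmer in enumerate(windows(dna, len(ptn))) if diff(ptn, kmer) <= d]

ptn = "GTCGGGTAG"
dna = "TAATACACCATTCAGGGAAATACGGATTCCCACCGTCTCTCGGTAGCCTAAC"
print(all_where(dna, ptn, 6) == all_where2(dna, ptn, 6))
```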
    John Leung
    @fuzzthink
    Also, how to use itertools.combinations?
    seq/test/stdlib/itertools_test.seq calls it with combinations('ABCD', 2)
    When I do that, it gives me error: cannot find method '__str__' for type 'generator[list[str]]' with specified argument types ()
    When I pass it a generator, it gives me error: expected function input type 'generator[generator[seq]]', but got 'function[generator[seq]]'
    Mark Henderson
    @markhend
    @fuzzthink I'll dig in more when I get a chance, but your 2 functions work for d up to 5
    A. R. Shajii
    @arshajii
    Hm, that's really odd. I managed to transform it to a very simple case with enumerate() that fails
    Will keep looking into it, but it definitely looks like a bug
    For combinations, it returns a generator, so e.g. list(combinations('ABCD', 2)) --> [[A, B], [A, C], [A, D], [B, C], [B, D], [C, D]]
    A. R. Shajii
    @arshajii
    It seems to be an issue with LLVM coroutines and chaining them; I think I have a fix -- will keep you posted!
    John Leung
    @fuzzthink
    @arshajii Thanks! For combinations(), I'm getting an error using the same code as in itertools_test.seq. Please see the error I mentioned above.
    A. R. Shajii
    @arshajii
    I think that's because you're trying to print combinations(...)? Since generator type doesn't have __str__
    e.g. that test does assert list(itertools.combinations('ABCD', 2)) == [['A','B'], ['A','C'], ['A','D'], ['B','C'], ['B','D'], ['C','D']]
    This one works
    John Leung
    @fuzzthink
    import itertools
    print itertools.combinations('ACGT', 2)
    error: cannot find method '__str__' for type 'generator[list[str]]' with specified argument types ()
    Ah, list() of that does not error.
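The same laziness exists in standard Python, shown here as a small sketch (plain Python, not Seq): combinations returns an iterator that has to be consumed, for example with list(), before its contents can be seen. Python prints an opaque object instead of raising an error, and it yields tuples where the Seq output quoted above shows lists.

```python
import itertools

pairs = itertools.combinations("ABCD", 2)  # lazy iterator, nothing produced yet

materialised = list(pairs)  # consuming the iterator produces the pairs
print(materialised)
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

# An iterator can only be consumed once; a second pass yields nothing.
print(list(pairs))
# []
```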
    John Leung
    @fuzzthink
    What is the plan for anonymous functions? I hope it's not lambda. It's so poorly thought out that it's one of my biggest dislikes about Python.
    Ibrahim Numanagić
    @inumanag
    In what sense is it poorly designed? Really curious, as we are currently brainstorming how to do it
    A. R. Shajii
    @arshajii
    I agree with it
    lambda x is so cumbersome IMO :D
    But we'll probably add it just for Python compatibility
    Mark Henderson
    @markhend
    Probably makes the most sense. Similar as well with no statements or type annotations?
    I thought you might go with something like hypot = (:($1**2 + $2**2)**0.5) :)
    Ibrahim Numanagić
    @inumanag
    Is there anything else besides verbosity? The fact that it only accepts a single expression? Or is there something more serious that I've missed so far? Would x, y => x + y be acceptable (my vote :) )?
    John Leung
    @fuzzthink
    @inumanag Yes, everything you just mentioned.
    1. An anonymous function is supposed to be quick to type; "lambda" defeats the purpose.
    2. A colon as the separator between parameters (LHS) and result (RHS) does not work visually when both are on the same line. This gets worse if we add typing to it, resulting in multiple colons, with the last one having an entirely different meaning. E.g. f(a, b, lambda c: int, d: int : c + d). Most (modern?) languages use arrows, for good reason.
    3. Using "lambda" and ":" instead of (optional) parens to delimit the parameters (LHS) makes code hard to read. E.g. f(a, b, lambda c, d: c + d) takes so much more mental processing to read than f(a, b, (c, d) => c + d).
    4. Multi-line functions are not possible. Here, we can either introduce block delimiters or, to keep it pythonic, parse multi-line anonymous functions as:
      result = f(a, b, (c, d) =>
       x = c ** 2
       y = c + d
       x + y # `return` should be optional, keeping consistency with single-line variant
      )
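For reference, this is how the points above look in today's Python, as a small sketch (apply and square_plus_sum are hypothetical names). A Python lambda is limited to a single expression and cannot carry type annotations, so the multi-statement case needs a named def.

```python
def apply(a, b, fn):
    """Call fn on the two arguments; stands in for f(a, b, ...) above."""
    return fn(a, b)

# Single-expression case: works as a lambda.
total = apply(2, 3, lambda c, d: c + d)

# The hypot example from earlier in the chat, written as a Python lambda.
hypot = lambda a, b: (a**2 + b**2) ** 0.5

# Multi-statement case: not expressible as a lambda, so it becomes a named
# function with an explicit return.
def square_plus_sum(c, d):
    x = c ** 2
    y = c + d
    return x + y

result = apply(2, 3, square_plus_sum)
print(total, hypot(3, 4), result)
# 5 5.0 9
```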
    John Leung
    @fuzzthink
    There are reasons why anonymous functions aren't used as much in Python as in other languages. The poor design of lambda is a big one.
    Mark Henderson
    @markhend
    https://wiki.python.org/moin/AlternateLambdaSyntax
    It's interesting to dig through some of the past discussions.
    Mark Henderson
    @markhend
    Noticed that 2 if statements can now be consecutive in a generator expression. Thanks!
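In standard Python the construct looks like the sketch below (plain Python; the variable names are illustrative). Consecutive if clauses in a comprehension or generator expression are applied in order and behave like a single condition joined with and.

```python
# Two consecutive `if` clauses: keep x only if it passes both filters.
chained = [x for x in range(20) if x % 2 == 0 if x % 3 == 0]

# Equivalent form with one combined condition.
combined = [x for x in range(20) if x % 2 == 0 and x % 3 == 0]

print(chained)   # [0, 6, 12, 18]
print(chained == combined)  # True
```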
    AminemS
    @AminemS_twitter
    I have a simple question about Seq: is it similar to Halide, but applied to genomics?
    A. R. Shajii
    @arshajii
    Hi @AminemS_twitter ; Seq is similar to Halide in that it is a domain-specific language (although for genomics), but it's a much lower-level language than Halide is.
    Actually, it's essentially a Python implementation, so much of what you can do in Python (even aside from genomics) you can do directly in Seq
    But the focus is on genomics and bioinformatics applications, so we have numerous optimizations/language features specifically for that domain
    AminemS
    @AminemS_twitter
    Thank you @arshajii