    Enrico Seiler
    @eseiler
    I think we can do a merge of release into master (we do this regularly)
    8 replies
    Simon Gene Gottlieb
    @SGSSGene
    the CI is stuck on my PR: seqan/seqan3#2869
    how can I request a re-run of the CI?
    Enrico Seiler
    @eseiler
    In this case not really, as it didn't trigger. But you changed the base branch, so you need to force-push or add a new commit; otherwise CI will run on the old base branch anyway.
    You can probably just rebase onto current master
    Simon Gene Gottlieb
    @SGSSGene

    You can probably just rebase onto current master

    Thank you, this did the trick :-)

    crusoe
    @mr-c:matrix.org
    [m]
    is this range-v3 bug fixed upstream? https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=997289 If so, can we get a new release?
    Svenja Mehringer
    @smehringer
    SeqAn 3.1.0 is out :tada: for the 10 pages description check out github :D
    2 replies
    crusoe
    @mr-c:matrix.org
    [m]
    However, due to a bug in range-v3 + gcc 10.3.0, it is scheduled to be removed on December 6th ☹️
    https://bugs.debian.org/997289
    1 reply
    crusoe
    @mr-c:matrix.org
    [m]
    Both, from the Debian testing distribution, until the range-v3 bug is fixed
    Svenja Mehringer
    @smehringer
    :(
    Enrico Seiler
    @eseiler
    Is this also reported upstream? Would a patch be enough?
    crusoe
    @mr-c:matrix.org
    [m]
    I am happy to carry a patch to fix this; it won't require a new upstream release for us to fix it
    2 replies
    Yes, it was reported upstream ericniebler/range-v3#1672
    Hannes Hauswedell
    @h-2
    A good reason to move to GCC10 and remove the dependency on range-v3 :wink:
    Marcel
    @marehr

    A good reason to move to GCC10 and remove the dependency on range-v3 :wink:

    We are happy to welcome patches for this :wink:

    1 reply
    Joshua Kim
    @joshuak94

    Hellooooo. I have a question about how to implement something. I have a function which takes a sam_file_input and filters for reads matching certain criteria. I want to output those reads, either as a vector of records or as a separate sam_file_input. Here's the signature of my function:

    template <typename traits_type, typename fields_type, typename format_type>
    inline void filter_file(seqan3::sam_file_input<traits_type, fields_type, format_type> & input)

    I did the templates this way so that the function takes any sam_file_input regardless of how it was created (e.g. sam_file_input my_input{file_path} vs requesting certain fields).

    How can I make it so that I store the records of choice (either in a vector or in a second sam_file_input)? I wasn't sure how to figure out the type of the records, and I wasn't sure how to create a sam_file_input and then push back records into that.

    Hannes Hauswedell
    @h-2
    return input | std::views::filter(FILTER) | seqan3::views::to<std::vector>;
    and make the return type auto
    replace FILTER with a lambda function that does your filtering
    Joshua Kim
    @joshuak94
    Ah thanks! But in that case, I would read through the entire input file, even if I only wanted a few reads right? To elaborate on my use case, I am using the file_position to extract reads in specific regions of a BAM file. So the function actually looks more like this:
    template <typename traits_type, typename fields_type, typename format_type>
    inline void filter_file(seqan3::sam_file_input<traits_type, fields_type, format_type> & input, std::streampos const & file_position)
    And then I use seek_to to jump to the position. Then I want to output records for the next 100 base pairs, for example.
    Hannes Hauswedell
    @h-2
    You mean that you seek on the istream before passing that to sam_file? You can use std::views::take_while(CONDITION) instead of or in addition to std::views::filter(). Make the condition that the position is inside your region. Then it will stop processing input as soon as you leave that region.
    But in general this should be provided by the file of course.
    Joshua Kim
    @joshuak94
    Ahhh take while sounds perfect!
    Svenja Mehringer
    @smehringer
    Hi, does anybody know a small library (preferably header only) that writes/parses e.g. YAML, XML or JSON (or else)?
    I would like to write out something like a config file from my command line arguments in my application. And if it turns out to work well we might be able to include it in the seqan3 argument parser.
    13 replies
    Michael Crusoe
    @mr-c:matrix.org
    [m]
    1 reply
    Joshua Kim
    @joshuak94
    Hello, I'm still facing an issue with my previous code and I was hoping someone knew what the problem is. Here is the relevant code:
    template <typename traits_type, typename fields_type, typename format_type>
    inline auto get_overlap_records(seqan3::sam_file_input<traits_type, fields_type, format_type> & input,...)
    {
        ...
        auto results_list = input | std::views::take_while([file_position](auto & rec) {return file_position != -1;})
                                  | std::views::take_while([end](auto & rec) {return std::make_tuple(rec.reference_id().value(), rec.reference_position().value()) < end;})
                                  | std::views::filter([](auto & rec) {return !unmapped(rec);})
                                  | seqan3::views::to<std::vector>;
        ...
        return results_list;
    
    }
    The issue is: I can write results_list to an output file just fine (sam_file_output out{out_file}; results_list | out;), but if I try and access individual elements within results_list, I get errors. Specifically, I tried doing debug_stream << results_list, and I tried using EXPECT_EQ(results_list1, results_list2) for two separate calls to the function.
    Here's an example of the error I get:
    [(CTTTGGGAGGCCAGGAGTTCAACATCAGCCTGGGCAACATGGTGAAACCACGTCTCTACCAAAAATACAAAAATTAGATGGGCATGGTGGCATGTGCCTGT,simulated_mult_chr_small-chr2-65,0,1,6,(unknown file: Failure
    C++ exception with description "Access is not allowed because there is no sequence information." thrown in the test body.
    "
    Joshua Kim
    @joshuak94
    I guess writing to an output file deals with these optional fields but the debug_stream and doing the comparison tries to access fields which don't exist, and it throws instead of returning maybe a null value?
    Looking at the error and matching what was printed with the default fields, it seems it's seqan3::field::alignment which maybe is not present in my SAM file but the debug_stream/comparison functions try and access.
    18 replies
    Joshua Kim
    @joshuak94
    On another note, does anybody have any ideas on how I can improve performance for parsing a SAM file? When parsing a 122 GB file, just reading records takes about 1 hour (with seqan3::sam_file_input and default fields), whereas samtools parses & indexes in ~10 minutes. If I specify a subset of fields, it takes 30 minutes to parse with seqan3. Both of these times are using 16 threads.
    61 replies
    Joshua Kim
    @joshuak94
    Yeah I've asked the MPI IT for where that stdlib is located..
    7 replies
    Joshua Kim
    @joshuak94
    What is the best way to handle the case where I have a project which includes seqan3 as a submodule, and includes another submodule which also includes seqan3 as a submodule?
    If I do git clone --recurse-submodules, I will end up with duplicate seqan3 libraries right?
    Enrico Seiler
    @eseiler
    I'd think so, you can do git submodule update --init, but then you would also have to do the same command in the seqan3 submodule
    Also depends on how much control you have over the projects.
    E.g., raptor uses chopper and both use seqan3. So I just have seqan3 as submodule and use it in both:
    https://github.com/seqan/raptor/blob/f4fbf45a65b590015472485fa7d4fca4ce263c07/CMakeLists.txt#L60
    https://github.com/seqan/raptor/blob/master/CMakeLists.txt#L100-L101
    https://github.com/seqan/raptor/blob/master/CMakeLists.txt#L123-L125
    But it requires that chopper also exposes means to set the path where seqan3 is.
    And then I have a flat submodule hierarchy: https://github.com/seqan/raptor/tree/master/lib
    So I always just use git submodule update --init. Doing it recursively shouldn't break anything, but it's unnecessary.
    Remy Schwab
    @remyschwab
    I'm adapting the contributing guide for the sharg-parser. Are Rene and Hannes still the project owners?
    4 replies
    Joshua Kim
    @joshuak94
    Hello, I was wondering if any more thought had been given to the performance issues of BAM IO. Current master branch performance is quite poor (one to two orders of magnitude slower than seqan2, and somehow gets slower with more threads?).
    1 reply
    Hannes Hauswedell
    @h-2
    In case some of you don't know these yet, I find them quite helpful:
    https://hackingcpp.com/cpp/cheat_sheets.html
    1 reply
    Ahmad lutfi
    @ah_ol_twitter
    Nice thanks
    Michael R. Crusoe
    @mr-c:matrix.org
    [m]
    Is there a new release of seqan3 coming? This bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1008352 will cause it to be removed from the next version of Debian if I don't get a fix in by April 24th
    1 reply
    Michael R. Crusoe
    @mr-c:matrix.org
    [m]
    Does this mean that GCC 12 will then be required for the new release?
    2 replies
    Michael R. Crusoe
    @mr-c:matrix.org
    [m]
    Okay, that makes sense. I don't know if there is a fix already, but I'll be happy to test a release candidate. GCC 11 is already in Debian "testing"
    Svenja Mehringer
    @smehringer
    @h-2 I just saw that you posted into the announcement channel. Questions like that can go here. The announcement channel is reserved for important and short announcements.
    1 reply
    Michael R. Crusoe
    @mr-c:matrix.org
    [m]
    GCC 12.1 release candidate 1 is out! https://gcc.gnu.org/pipermail/gcc/2022-April/238628.html
    1 reply