    Joshua Kim
    @joshuak94
    Hello, I'm still facing an issue with my previous code and I was hoping someone knew what the problem is. Here is the relevant code:
    template <typename traits_type, typename fields_type, typename format_type>
    inline auto get_overlap_records(seqan3::sam_file_input<traits_type, fields_type, format_type> & input,...)
    {
        ...
        auto results_list = input | std::views::take_while([file_position](auto & rec) {return file_position != -1;})
                                  | std::views::take_while([end](auto & rec) {return std::make_tuple(rec.reference_id().value(), rec.reference_position().value()) < end;})
                                  | std::views::filter([](auto & rec) {return !unmapped(rec);})
                                  | seqan3::views::to<std::vector>;
        ...
        return results_list;
    }
    The issue is: I can write results_list to an output file just fine (sam_file_output out{out_file}; results_list | out;), but if I try to access individual elements within results_list, I get errors. Specifically, I tried debug_stream << results_list, and I tried EXPECT_EQ(results_list1, results_list2) on the results of two separate calls to the function.
    Here's an example of the error I get:
    [(CTTTGGGAGGCCAGGAGTTCAACATCAGCCTGGGCAACATGGTGAAACCACGTCTCTACCAAAAATACAAAAATTAGATGGGCATGGTGGCATGTGCCTGT,simulated_mult_chr_small-chr2-65,0,1,6,(unknown file: Failure
    C++ exception with description "Access is not allowed because there is no sequence information." thrown in the test body.
    "
    Joshua Kim
    @joshuak94
    I guess writing to an output file handles these optional fields, but debug_stream and the comparison try to access fields which don't exist, and the access throws instead of returning something like a null value?
    Looking at the error and matching what was printed against the default fields, it seems to be seqan3::field::alignment, which may not be present in my SAM file but which the debug_stream/comparison functions still try to access.
    18 replies
    Joshua Kim
    @joshuak94
    On another note, does anybody have any ideas on how I can improve performance for parsing a SAM file? When parsing a 122 GB file, just reading records takes about 1 hour (with seqan3::sam_file_input and default fields), whereas samtools parses & indexes in ~10 minutes. If I specify a subset of fields, it takes 30 minutes to parse with seqan3. Both of these times are using 16 threads.
    61 replies
    Joshua Kim
    @joshuak94
    Yeah, I've asked the MPI IT where that stdlib is located.
    7 replies
    Joshua Kim
    @joshuak94
    What is the best way to handle the case where I have a project which includes seqan3 as a submodule, and includes another submodule which also includes seqan3 as a submodule?
    If I do git clone --recurse-submodules, I will end up with duplicate seqan3 libraries, right?
    Enrico Seiler
    @eseiler
    I'd think so. You can do git submodule update --init instead, but then you would also have to run the same command inside the seqan3 submodule.
    Also depends on how much control you have over the projects.
    E.g., raptor uses chopper and both use seqan3. So I just have seqan3 as submodule and use it in both:
    https://github.com/seqan/raptor/blob/f4fbf45a65b590015472485fa7d4fca4ce263c07/CMakeLists.txt#L60
    https://github.com/seqan/raptor/blob/master/CMakeLists.txt#L100-L101
    https://github.com/seqan/raptor/blob/master/CMakeLists.txt#L123-L125
    But it requires that chopper also exposes means to set the path where seqan3 is.
    And then I have a flat submodule hierarchy: https://github.com/seqan/raptor/tree/master/lib
    So I always just use git submodule update --init. --recursive shouldn't break anything, but it's unnecessary.
    Remy Schwab
    @remyschwab
    I'm adapting the contributing guide for the sharg-parser. Are Rene and Hannes still the project owners?
    4 replies
    Joshua Kim
    @joshuak94
    Hello, I was wondering if any more thought had been given to the performance issues of BAM IO. Current master branch performance is quite poor (one to two orders of magnitude slower than seqan2, and somehow gets slower with more threads?).
    1 reply
    Hannes Hauswedell
    @h-2
    In case some of you don't know these yet, I find them quite helpful:
    https://hackingcpp.com/cpp/cheat_sheets.html
    1 reply
    Ahmad lutfi
    @ah_ol_twitter
    Nice thanks
    Michael R. Crusoe
    @mr-c:matrix.org
    Is there a new release of seqan3 coming? This bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1008352 will cause it to be removed from the next version of Debian if I don't get a fix in by April 24th
    1 reply
    Michael R. Crusoe
    @mr-c:matrix.org
    Does this mean that GCC 12 will be required for the new release?
    2 replies
    Michael R. Crusoe
    @mr-c:matrix.org
    Okay, that makes sense. I don't know if there is a fix already, but I'll be happy to test a release candidate. GCC 11 is already in Debian "testing"
    Svenja Mehringer
    @smehringer
    @h-2 I just saw that you posted in the announcement channel. Questions like that can go here. The announcement channel is reserved for important and short announcements.
    1 reply
    Michael R. Crusoe
    @mr-c:matrix.org
    GCC 12.1 release candidate 1 is out! https://gcc.gnu.org/pipermail/gcc/2022-April/238628.html
    1 reply
    Evelin Aasna
    @eaasna
    Hi, do you know if the FM index search function is thread safe? I mean searching one shared index with separate queries.
    2 replies
    Michael R. Crusoe
    @mr-c:matrix.org
    Signups are open!
    Svenja Mehringer
    @smehringer
    SeqAn 3.2.0 is out! :tada: Check it out on GitHub, but make sure you have a recent GCC version :rocket: