    Clark C. Evans
    @clarkevans
    Hello. I just wanted to say that an NIH SBIR grant proposal went out today. Thanks to those who helped.
    The week of the 15th, I'm down at the OHDSI conference where we'll be giving a poster on the use of Query Combinators for their cohort queries.
    I think in another week or so, I'll be done with the paper & conference & grant-writing and be able to get back to documentation and the sort.
    Robert Schwarz
    @rschwarz
    :+1:
    Clark C. Evans
    @clarkevans
    Poster for Monday @ OHDSI is up.
    Clark C. Evans
    @clarkevans
    Hi. So, I just got back from the OHDSI conference. The poster session was interesting; it was one of the first times I've had a chance to engage with potential users of DataKnots.
    One of the things I've learned is that the description of DataKnots as a combinator language, i.e., where combinators are functions taking query functions as arguments and building a query function, works quite well.
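As a rough illustration of that description, here is a minimal Python sketch in which a "query" is a function from one input to a list of outputs, and a combinator is a function that takes queries and returns a new query. All names here (`field`, `compose`, `count`, `filter_by`, `greater_than`) and the sample data are invented for the sketch; this is not the DataKnots API.

```python
# Toy model, not DataKnots: a query maps one input value to a list of outputs.

def field(name):
    """Primitive query: navigate to a named field of a record."""
    def query(record):
        value = record[name]
        return value if isinstance(value, list) else [value]
    return query

def compose(*queries):
    """Combinator: feed every output of one query into the next one."""
    def query(value):
        values = [value]
        for q in queries:
            values = [out for v in values for out in q(v)]
        return values
    return query

def count(q):
    """Combinator: build a query that yields how many outputs q produces."""
    def query(value):
        return [len(q(value))]
    return query

def greater_than(q, threshold):
    """Combinator: build a query that yields a boolean per output of q."""
    def query(value):
        return [x > threshold for x in q(value)]
    return query

def filter_by(pred):
    """Combinator: pass the input through only if pred yields a true value."""
    def query(value):
        return [value] if any(pred(value)) else []
    return query

# Hypothetical sample data.
db = {"department": [
    {"name": "POLICE", "employee": [{"name": "ANN", "salary": 72000},
                                    {"name": "BOB", "salary": 101000}]},
    {"name": "FIRE", "employee": [{"name": "CAT", "salary": 95000}]},
]}

high_paid = compose(field("department"), field("employee"),
                    filter_by(greater_than(field("salary"), 100000)),
                    field("name"))
per_department = compose(field("department"), count(field("employee")))
total = count(compose(field("department"), field("employee")))
```

Here `high_paid(db)` yields the names of employees earning over 100000, `per_department(db)` yields one count per department, and `total(db)` yields a single overall count. The same `count` combinator gives different results depending on where it sits in the composition.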
    Second, I've realized that navigation + keep is probably analogous to table join in SQL. That is, the principal things you do are navigate down the tree doing things, and keeping things you've made along the way.
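One way to picture the "navigate and keep" idea is the toy Python model below, where each query receives a value plus a context of kept names and returns a list of (value, context) pairs. The helper names (`field`, `keep`, `kept`, `record`, `compose`) and the data are assumptions made for this sketch and do not reflect how DataKnots implements `Keep`.

```python
# Toy model, not DataKnots: queries map (value, context) to a list of
# (value, context) pairs, so values kept higher up travel down the tree.

def field(name):
    """Navigate to a named field, carrying the context along unchanged."""
    def query(value, ctx):
        out = value[name]
        outs = out if isinstance(out, list) else [out]
        return [(o, ctx) for o in outs]
    return query

def keep(name, q):
    """Evaluate q at this point and remember its first result under `name`."""
    def query(value, ctx):
        result = q(value, ctx)[0][0]
        return [(value, {**ctx, name: result})]
    return query

def kept(name):
    """Look up a value remembered earlier by keep()."""
    def query(value, ctx):
        return [(ctx[name], ctx)]
    return query

def record(**fields):
    """Build one output row from several single-valued queries."""
    def query(value, ctx):
        row = {label: q(value, ctx)[0][0] for label, q in fields.items()}
        return [(row, ctx)]
    return query

def compose(*queries):
    """Chain queries, threading each output and its context to the next."""
    def query(value, ctx):
        flows = [(value, ctx)]
        for q in queries:
            flows = [out for v, c in flows for out in q(v, c)]
        return flows
    return query

# Hypothetical sample data.
db = {"department": [
    {"name": "POLICE", "employee": [{"name": "ANN"}, {"name": "BOB"}]},
    {"name": "FIRE", "employee": [{"name": "CAT"}]},
]}

pairs = compose(
    field("department"),
    keep("dept_name", field("name")),   # remember the parent's name
    field("employee"),                  # navigate down to the children
    record(dept=kept("dept_name"), employee=field("name")),
)
rows = [row for row, _ in pairs(db, {})]
```

The resulting `rows` pair each employee with the department name kept on the way down, which is roughly what a SQL join of a department table with an employee table on a department key would return.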
    So, a better operational description of DataKnots along these lines would help us.
    Anyway. We may have a few more that join us. I met some very lovely people at the conference.
    At this time, I've got one more conference in October, and then the next one will probably be PgCon in March.
    So... what we're going to do is buckle down and get SQL generation working fluidly.
    clarkevans @clarkevans cheers.
    Robert Schwarz
    @rschwarz
    :clap:
    Clark C. Evans
    @clarkevans
    I expect to make a big splash at PgCon. Lots of people were supportive of HTSQL, but it was GPL licensed and had an unspecified model w/ quirks. DataKnots fixes both of those issues. I expect that with solid SQL generation of complex queries, we'll have something that will break through. Further, at that time, I hope we could become the de facto SQL library for Julia. Of course, we'll work with other data sources, but solid SQL generation will provide the bridge needed for broad adoption.
    Anyway. Thank you all for your support & encouragement.
    Robert Schwarz
    @rschwarz
    That is for Postgres?
    Clark C. Evans
    @clarkevans
    So, https://github.com/rbt-lang/DataKnots4Postgres.jl has the PostgreSQL adapter.
    The adapter is non-performant.
    The next step is to develop an "optimization" system for DataKnots. This will then be used to "push-down" queries to SQL.
    Robert Schwarz
    @rschwarz
    Ah, yes, I meant PgCon :-)
    Clark C. Evans
    @clarkevans
    https://www.pgcon.org/ is where all the old timers go.
    I'm a fixture there, although I've been absent for a few years.
    But, yea, PgCon is the main conference where the core developers & contracting companies meet up annually.
    There are more commercial ones in the U.S. -- but this one in Ottawa, following BSDCan, has been around for 15+ years.
    I hope this helps.
    David L Denton
    @davidldenton
    Just introducing myself. I met Clark at the OHDSI conf and he got my attention very quickly. This is a fascinating and important project. Still new to Julia myself, but once my skills are up to snuff, I hope to contribute. Keep up the great work.
    Clark C. Evans
    @clarkevans
    Welcome!
    Brent Halonen
    @bhalonen
    Clark, met you at JuliaCon, looking at potentially using DataKnots in a project we are currently working on. Any thoughts?
    Clark C. Evans
    @clarkevans
    Hello.
    We could chat by phone and discuss.
    Or type it out here I guess.
    Brent Halonen
    @bhalonen
    I got a meeting in 15 minutes, but I can call later.
    Adam Black
    @ablack3
    Hi!
    Clark C. Evans
    @clarkevans
    Hello. So, it's been 6 months. DataKnots isn't dead; I'm busy working on applications of it to health informatics. In particular, I'm writing a FHIR adapter. As part of this work we had to make it work with Julia; this is in a branch. Once more progress on that happens, we'll update here.
    Adam Black
    @ablack3
    Nice Clark!
    Clark C. Evans
    @clarkevans
    This is also a proof of concept that our proposed JSON support, while ugly as sin, works lovely. To make it nicer, we'll need "just in time" handling of the Any datatype via incremental compilation, but that's a non-trivial work product.
    Clark C. Evans
    @clarkevans
    https://querycombinators.org/dist/clinical-20200806.pdf was submitted to AMIA 2021 as a system demonstration.
    Clark C. Evans
    @clarkevans
    Hello. We're moving along; slowly. While we've created a FHIR adapter (and it's fast), I don't have the business contacts to use it productively in a production setting; and I don't really feel like developing them at this time. So, it's nice to know we can easily process JSON data.
    Our focus for the next 6-9 months will turn back to SQL generation, so that we could convert DataKnots queries to PostgreSQL. If you look at the minimal DataKnots4Postgres repository, you'll see that it has a primitive data pipeline which loads a set of columns from the database, but then all processing is done in memory. Moreover, joins are on a cell-by-cell basis, which is extremely inefficient. But they work: last August we had OHDSI cohort queries working against a SQL database (they just took minutes for even a small number of records). Hence, we know that there's no deep issue with this working. The objective is to treat the conversion as an optimization problem, pushing down native DataKnots pipelines into the corresponding SQL fragments.
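To make the contrast concrete, here is a small Python sketch using the standard-library `sqlite3` module as a stand-in for PostgreSQL. The `person` and `visit` tables, their columns, and both function names are invented for the example. `in_memory` reflects one reading of "load columns, then join cell by cell", while `pushed_down` expresses the same filter and join as a single SQL statement. It does not show how DataKnots4Postgres works internally.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
    CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1, 1950), (2, 1985), (3, 1990);
    INSERT INTO visit VALUES (10, 1), (11, 2), (12, 2), (13, 3);
""")

def in_memory(conn, born_after):
    """Load whole columns, then filter and join in the client."""
    people = conn.execute(
        "SELECT person_id, year_of_birth FROM person").fetchall()
    visits = conn.execute(
        "SELECT visit_id, person_id FROM visit").fetchall()
    result = []
    for person_id, year in people:
        if year > born_after:
            # For each surviving person, scan every visit row.
            for visit_id, visit_person in visits:
                if visit_person == person_id:
                    result.append((person_id, visit_id))
    return sorted(result)

def pushed_down(conn, born_after):
    """Let the database do the filter and the join in one statement."""
    sql = """
        SELECT p.person_id, v.visit_id
        FROM person p
        JOIN visit v ON v.person_id = p.person_id
        WHERE p.year_of_birth > ?
        ORDER BY p.person_id, v.visit_id
    """
    return conn.execute(sql, (born_after,)).fetchall()

slow = in_memory(conn, 1980)
fast = pushed_down(conn, 1980)
```

Both calls return the same pairs. The difference is where the work happens: the first version moves every row to the client and loops over it, and the second lets the database plan the join and return only the matching rows.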
    Clark C. Evans
    @clarkevans
    To do this, we're looking at how we might optimize queries. The current query optimizer has local rewrites and common sub-expression elimination, https://github.com/rbt-lang/DataKnots.jl/blob/master/src/rewrites.jl
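For readers unfamiliar with the term, a "local rewrite" replaces a small pattern in the query tree with a simpler equivalent, looking only at a node and its immediate children. The Python sketch below shows the general technique on a made-up tuple representation. The node kinds (`compose`, `filter`, `const`, `pass`, `field`) and the two rules are assumptions for illustration and are not taken from the linked `rewrites.jl`.

```python
# Toy query trees as nested tuples, e.g. ("compose", left, right).

def rewrite_node(node):
    """Apply one local rule to a node whose children are already simplified."""
    kind = node[0]
    if kind == "filter" and node[1] == ("const", True):
        # Filtering on a constant true condition changes nothing.
        return ("pass",)
    if kind == "compose":
        _, left, right = node
        # Composing with the identity query is a no-op.
        if left == ("pass",):
            return right
        if right == ("pass",):
            return left
    return node

def simplify(node):
    """Bottom-up pass: simplify the children first, then the node itself."""
    children = tuple(simplify(c) if isinstance(c, tuple) else c
                     for c in node[1:])
    return rewrite_node((node[0],) + children)

query = ("compose",
         ("compose", ("field", "employee"), ("filter", ("const", True))),
         ("field", "name"))
simplified = simplify(query)
```

In this example the always-true filter becomes the identity query, which the composition rule then drops, leaving a plain two-step navigation.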
    One possible way forward is to "linearize" queries, removing nesting, and see if this might offer up ideas for how to do more global optimizations; luckily the code to linearize is relatively easy https://github.com/rbt-lang/DataKnots.jl/blob/linearize/src/rewrites.jl#L9-L30 ; rewrites can now be shown visually https://github.com/rbt-lang/diagrams/blob/gh-pages/rewrites.drawio.pdf
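As a hedged illustration of what "linearizing" might mean in the simplest case, the Python sketch below flattens nested composition nodes into a flat list of steps. The tuple representation and the `linearize` function are invented here. This is not the code in the linked `linearize` branch, and it only removes nesting of composition; arguments of other combinators are left as they are.

```python
# Toy query trees as nested tuples, e.g. ("compose", left, right).

def linearize(node):
    """Flatten nested ("compose", ...) nodes into a flat list of steps."""
    if node[0] == "compose":
        steps = []
        for child in node[1:]:
            steps.extend(linearize(child))
        return steps
    return [node]

nested = ("compose",
          ("field", "department"),
          ("compose",
           ("compose", ("field", "employee"),
                       ("filter", ("field", "active"))),
           ("field", "name")))
steps = linearize(nested)
```

With the pipeline laid out as a flat sequence, an optimizer can look at runs of adjacent steps instead of walking a tree, which may make more global rewrites easier to spot.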
    The first goal for SQL optimization is shown in this diagram: https://github.com/rbt-lang/diagrams/blob/gh-pages/sql-pipeline.drawio.pdf
    Anyway, that's it for now. If anyone sees any applications of DataKnots w/ JSON, let us know.
    Clark C. Evans
    @clarkevans
    On discourse.julialang.org, Lincoln Hannah asked if we support XML. We did support XML in an earlier version of DataKnots, but it requires "IndexVector" which we removed to make a minimal, documented release. One could still query XML, but you have to first load it into an in-memory dictionary. Moreover, you have to define your schema via a query. Here is an example of how one could do this with FpML, a financial product markup language. https://gist.github.com/clarkevans/780c46f58e09342e8343be2d5ed4d25f
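The "load it into an in-memory dictionary first" step can be sketched generically. The Python example below uses the standard-library `xml.etree.ElementTree` to turn a small XML document into nested dictionaries and lists. The document is invented (it is not real FpML), attributes are ignored, and `element_to_dict` is a hypothetical helper. The linked gist shows the actual Julia approach.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert an element to nested dicts; each child tag maps to a list."""
    children = list(elem)
    if not children:
        # Leaf element: keep its text content as a string.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        result.setdefault(child.tag, []).append(element_to_dict(child))
    return result

# Hypothetical document; tag names are made up for the example.
xml_text = """
<trades>
  <trade><id>T1</id><notional>1000000</notional></trade>
  <trade><id>T2</id><notional>250000</notional></trade>
</trades>
"""

root = ET.fromstring(xml_text)
data = {root.tag: element_to_dict(root)}

# Every leaf is still a string, so types must be imposed when querying.
ids = [t["id"][0] for t in data["trades"]["trade"]]
total_notional = sum(int(t["notional"][0]) for t in data["trades"]["trade"])
```

Because every leaf comes out as a string and every child as a list, the structure carries no schema of its own. The types and cardinalities have to be supplied when querying, which is the role the schema-defining query plays in the message above.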