    Geert-Jan Brits
    2) price/isAvail differs from the last recorded ProductOfferHistory record for productId/storeId. In that case: set the toDt of that last record to event.dt, and create a new record with price/isAvail/storeId/productId and fromDt = event.dt.
    Would it be possible using PipelineDB to have OfferEvents come in as streams, and update the ProductOfferHistory with above described logic?
    Or in more general terms: can PipelineDB be used to automate the updating of a Slowly Changing Dimension table? (https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row)
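    A plain-PostgreSQL sketch of the Type-2 logic described above (the table, columns, and function name are assumed from the message; PipelineDB has no built-in for this):

```sql
-- Hypothetical helper: close the current history row and open a new one
-- whenever price/isAvail changes for a given productId/storeId.
CREATE OR REPLACE FUNCTION apply_offer_event(
    p_product_id int, p_store_id int,
    p_price numeric, p_is_avail boolean, p_dt timestamptz
) RETURNS void AS $$
BEGIN
    UPDATE ProductOfferHistory
       SET toDt = p_dt
     WHERE productId = p_product_id
       AND storeId   = p_store_id
       AND toDt IS NULL
       AND (price <> p_price OR isAvail <> p_is_avail);

    -- Only open a new row if the old one was actually closed,
    -- i.e. the incoming event changed something.
    IF FOUND THEN
        INSERT INTO ProductOfferHistory
            (productId, storeId, price, isAvail, fromDt, toDt)
        VALUES (p_product_id, p_store_id, p_price, p_is_avail, p_dt, NULL);
    END IF;
END;
$$ LANGUAGE plpgsql;
```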
    May I ask which PostgreSQL versions PipelineDB supports?
    What would be the performance difference between one CV which involves, say, 5 grouping sets, and just making 5 separate CVs, each with its own GROUP BY?
    This doesn't support window functions anymore :(. And, joining streams isn't possible. What technology does do those?
    PipelineDB can't support PostgreSQL 10.7?

    Doc: http://docs.pipelinedb.com/installation.html#install-postgresql

    PipelineDB currently supports PostgreSQL versions 10.1, 10.2, 10.3, 10.4, 10.5, and 11.0 on 64-bit architectures.


    I don't know of any system similar to PipelineDB.
    The utility of PipelineDB is that it lets you set up an ETL-like system without needing a large infrastructure.
    In other words, PipelineDB lets you start with data in one form and transform it into another form with SQL queries (when I say "other forms", that also includes adding/deleting data with some joins, etc.).

    • Easy to write (SQL)
    • Easy to set up (PostgreSQL extension)
    • Simple infrastructure (just one database)
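    A minimal sketch of that transform idea, in PipelineDB 1.x syntax (the stream and column names here are made up for illustration):

```sql
-- Events arrive on a stream (a foreign table backed by PipelineDB)...
CREATE FOREIGN TABLE purchases (user_id int, amount numeric)
    SERVER pipelinedb;

-- ...and are continuously transformed into a per-user rollup.
CREATE VIEW spend_per_user WITH (action=materialize) AS
SELECT user_id, sum(amount) AS total_spent
FROM purchases
GROUP BY user_id;
```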

    As for the risk of double insertion, that's a more general problem than PipelineDB... When using a streaming database you will always have a risk of double insertions. Either the program that sends the data manages this, or you can fix the problem with a view (and the GROUP BY operator).
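    A minimal sketch of the GROUP BY deduplication idea (the stream, columns, and the event_id field are assumptions for illustration):

```sql
-- Hypothetical: events may be delivered more than once, but each carries
-- a unique event_id. Grouping on it collapses duplicates, since every
-- copy of the same event has identical values.
CREATE VIEW deduped_offers WITH (action=materialize) AS
SELECT event_id,
       min(price)        AS price,    -- any aggregate works on identical duplicates
       bool_or(is_avail) AS is_avail
FROM offer_events
GROUP BY event_id;
```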

    I'm not sure I follow you....
    My PipelineDB tables are also defined in my public schemas.

    Doc: http://docs.pipelinedb.com/continuous-views.html#examples
    Example query:

    CREATE VIEW avg_of_forever AS SELECT AVG(x) FROM my_stream;

    You can directly perform this task in your public schemas.

    Important update - the PipelineDB team is joining Confluent, the company behind Apache Kafka. Please read today’s announcement for more details: https://www.pipelinedb.com/blog
    I am building a product that uses PipelineDB, do you think I should migrate over to KSQL? I don't know much about that Confluent stack; it looks to be much more complex than just PostgreSQL with the PipelineDB extension. If I understand correctly, KSQL doesn't store aggregated data anywhere by default, right? I need to set up some connector to Elasticsearch for example, is that right? @DidacticTactic
    Allen Lai
    @derekjn @DidacticTactic It's indeed an important update! Good fortune to you guys, anyway!
    Confluent is more complex than PipelineDB as it’s based on Apache Kafka, an application development framework, so it’s also more versatile. If you want something simpler, check out TimescaleDB
    @cailin-lai - thank you!
    It definitely is! But I must say it looks more appealing than TimescaleDB, which doesn't support continuous views yet
    it looks like the upcoming TimescaleDB feature doesn't come with really live continuous views like PipelineDB does; it only updates them on a specified interval. Unfortunately my system heavily relies on that feature
    excuse me, I wonder how this issue can be fixed:

    CREATE FOREIGN TABLE wiki_stream (
        hour timestamp,
        project text,
        title text,
        view_count bigint,
        size bigint
    ) SERVER pipelinedb;

    CREATE VIEW wiki_stats WITH (action=materialize) AS
    SELECT hour, project,
           count(*) AS total_pages,
           sum(view_count) AS total_views,
           min(view_count) AS min_views,
           max(view_count) AS max_views,
           avg(view_count) AS avg_views,
           percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views,
           sum(size) AS total_bytes_served
    FROM wiki_stream
    GROUP BY hour, project;

    INSERT INTO wiki_stream VALUES (now(), 'a', 'a', 1, 1);

    SELECT * INTO wiki_stats_tmp FROM wiki_stats_mrel;

    SELECT pipelinedb.combine_table('wiki_stats', 'wiki_stats_tmp');

    got error like this:

    postgres=# select pipelinedb.combine_table('wiki_stats', 'wiki_stats_tmp');
    ERROR: invalid string enlargement request size: -4

    do you think replacing PipelineDB with Kafka and KSQL and then using Kafka Connect to pipe it to Postgres for querying might be a viable solution?
    I am looking for something that can be plugged in in place of PipelineDB
    unfortunately timescale turned out not to have the features I need yet
    Allen Lai
    Indeed, TimescaleDB may not be a good choice to replace PP with its current feature set, IMO.
    Allen Lai
    The new auto-aggregation feature isn't like PP's, probably because of the TSL licence gap... Otherwise, for the case where the aggregates are generated from raw data on disk, I'd prefer ClickHouse with materialized views; it's super fast and we have it in production.
    @cailin-lai I am looking into KSQL with Kafka as a replacement, did you compare it to Clickhouse?
    Allen Lai
    @bartlomiej-korpus I haven't looked into the details of KSQL yet, but early research showed the memory costs would be high. Basically, the reasons we choose SQL-based solutions are the costs (people; hardware, e.g. memory vs. SSD drives) and the maintenance gap. We had a bad experience before: an ES cluster grew bigger and bigger. Finally we decided to replace it, and that cut costs 6x.
    oh I see. I'm not really going to use ES with KSQL though, just piping the results with Kafka Connect back to a relational store for querying
    does ClickHouse do real-time rollups like PipelineDB?! @cailin-lai
    Allen Lai
    @bartlomiej-korpus Yes, it does, with materialized views. The workflow goes like this: 1) both raw data and aggregate data are flushed to disk in bulk inserts; 2) background workers merge the aggregate data generated by the bulk inserts. Since the aggregate data is stored with -State types (also the HLL scenario), the query result isn't affected by whether or not those parts have been merged.
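    A minimal ClickHouse sketch of that workflow (table and column names are made up for illustration):

```sql
-- Raw data lands here via bulk inserts.
CREATE TABLE page_views (
    dt    DateTime,
    page  String,
    views UInt64
) ENGINE = MergeTree ORDER BY (page, dt);

-- Aggregates are kept as partial -State values; background merges
-- combine parts without changing query results.
CREATE TABLE page_views_agg (
    day       Date,
    page      String,
    views_sum AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree ORDER BY (page, day);

CREATE MATERIALIZED VIEW page_views_mv TO page_views_agg AS
SELECT toDate(dt) AS day, page, sumState(views) AS views_sum
FROM page_views
GROUP BY day, page;

-- Query with the -Merge combinator; the result is correct whether or
-- not the background merges have run yet.
SELECT day, page, sumMerge(views_sum) AS total_views
FROM page_views_agg
GROUP BY day, page;
```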
    Allen Lai
    The disks need to be very good to keep up with the background part-merge operations... I actually much prefer PP for balancing OLTP and OLAP cases. Most likely ClickHouse is designed for OLAP with a columnar store and big-data cases: big bulk inserts are required for performance, frequent small inserts are discouraged, and query performance goes down linearly with the number of columns...
    Patrick Wunderlin
    As you're working with Confluent now, are you going to release PipelineDB Cluster to the open-source community? @DidacticTactic
    Eric Satterwhite
    Is there a path forward for this project? Are there any people with commit/merge access outside of the people who work(ed) at the PipelineDB company?
    Would the company heads consider moving it to something like the Apache Incubator?
    Or perhaps handing the core code base + existing knowledge over to the folks at TimescaleDB? These two projects were destined to be together, and Timescale is already on the same path.
    Thomas Delnoij
    @esatterwhite Those are good questions that I would like to know the answers to as well.
    what do you guys think about Druid as a replacement for PipelineDB?
    hello, may I ask a question about TTL? I'm sure I already altered the TTL of a CV, but it doesn't work. Should I do something more?
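    For reference, the PipelineDB 1.x TTL API looks roughly like this (the CV and stream names here are illustrative; note that expired rows are removed lazily by a background reaper process, so they may not disappear immediately):

```sql
-- TTL can be declared when the CV is created...
CREATE VIEW minute_counts WITH (ttl = '1 day', ttl_column = 'minute') AS
SELECT minute(arrival_timestamp) AS minute, count(*)
FROM some_stream
GROUP BY minute;

-- ...or changed later on an existing CV.
SELECT pipelinedb.set_ttl('minute_counts', '1 hour', 'minute');
```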
    when postgres stops, the scheduler has called nn_term() and it calls nn_close() to close all the sockets, but my worker process still blocks in nn_send().
    why is that?
    Is there any database as a service provider (as for example heroku postgres) supporting pipelinedb extension out of the box?
    Hello everyone
    Hi, I am trying to install the PipelineDB extension on top of Postgres. When I try to create the extension with psql -c "CREATE EXTENSION pipelinedb", I get the following error:
    ERROR: could not open extension control file "/Applications/Postgres.app/Contents/Versions/12/share/postgresql/extension/pipelinedb.control": No such file or directory. Can someone please help with what the issue is here?
    I would like to use TimescaleDB, PipelineDB and normal Postgres features simultaneously
    Is it possible?
    and also, can I write triggers on pipelinedb streams?
    Ravi Songa
    does PipelineDB work with Postgres v11.10 or just 11.0?
    Hi, can we have a cumulative-sum continuous aggregation based on a field, say date? An example would be account-balance records aggregated for various users on a daily basis. Note that I want this calculated in the continuous view itself, not outside while querying the view.
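    One common pattern, sketched below, does the daily aggregation in the CV but computes the running sum at read time rather than inside the CV itself (the stream and column names are assumptions; PipelineDB CVs don't support window functions, as noted earlier in this room):

```sql
-- Daily balance deltas per user, maintained continuously.
CREATE VIEW daily_deltas WITH (action=materialize) AS
SELECT user_id, date_trunc('day', dt) AS day, sum(amount) AS delta
FROM balance_events
GROUP BY user_id, day;

-- Running balance computed over the CV's output when querying,
-- since the CV cannot carry state across groups.
SELECT user_id, day,
       sum(delta) OVER (PARTITION BY user_id ORDER BY day) AS balance
FROM daily_deltas;
```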