    gelbal
    @gelbal
    I would like to generate a table that counts the unique devices connected on each day. The problem is how to keep generating new summaries for the devices with no end timestamp (yet), since I'd like to count the devices with active connections on every date since they first opened the connection.
    gelbal
    @gelbal
    Essentially it seems like I need to transform my stateful tables into streams that PipelineDB could work with. I had the idea of listening to insert/update events on this table and transforming those events into a stream that's fed to a continuous view. Then the same problem stands: I need a scalable way of counting all the devices with open connections on the new days too, so our CV is always up to date.
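    A minimal sketch of the basic continuous view for this (stream and column names are hypothetical, not from the chat); by itself it only counts devices that emit an event on a given day, so connections spanning multiple days would still need an event re-emitted per day, as described above:

    -- Hypothetical stream of connection events
    CREATE FOREIGN TABLE connection_events (
        device_id  text,
        event_time timestamptz
    ) SERVER pipelinedb;

    -- Distinct devices per day; count(DISTINCT ...) in continuous views is approximate (HyperLogLog)
    CREATE VIEW daily_unique_devices WITH (action=materialize) AS
    SELECT date_trunc('day', event_time) AS day,
           count(DISTINCT device_id)     AS unique_devices
    FROM connection_events
    GROUP BY day;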
    Thomas Delnoij
    @mvcatsifma
    What is the difference between the pipelinedb.commit_interval and pipelinedb.max_wait configuration properties?
    Thomas Delnoij
    @mvcatsifma
    PS. the context of my question is that from the docs I take it that both properties affect continuous view update frequency, but how exactly is not clear to me.
    deklanw
    @deklanw
    For my application there is a risk of inserting duplicates. I could store the ids in a table. Or, store the ids in memory and dump them into a table periodically. Has anyone else solved this problem?
    (Or even use a bloom filter and then persist it periodically...)
    Geert-Jan Brits
    @gebrits
    Say I've got a table called ProductOfferHistory, which tracks product offers for a particular product/store/daterange combination. (i.e.: price, isAvail, storeId, productId, fromDt, toDt). If a new OfferEvent comes in (productId, StoreId, price, isAvail, dt) one of 2 things should happen:
    1) price/isAvail is the same as the last recorded ProductOfferHistory record for productId/storeId. In that case: skip
    2) price/isAvail is different from the last recorded ProductOfferHistory record for productId/storeId. In that case: set the toDt of said last record to event.dt and create a new record with price/isAvail/storeId/productId and fromDt = event.dt
    Would it be possible using PipelineDB to have OfferEvents come in as streams, and update the ProductOfferHistory with above described logic?
    Geert-Jan Brits
    @gebrits
    Or in more general terms: can PipelineDB be used to automate the updating of a Slowly Changing Dimension table? (https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row)
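    Purely as an illustration of the two cases described above, and not a claim that PipelineDB itself can drive this from a stream, here is a sketch of the logic in plain PostgreSQL/PLpgSQL with hypothetical table and column names:

    -- Hypothetical SCD Type 2 history table
    CREATE TABLE IF NOT EXISTS product_offer_history (
        id         bigserial PRIMARY KEY,
        product_id bigint,
        store_id   bigint,
        price      numeric,
        is_avail   boolean,
        from_dt    timestamptz,
        to_dt      timestamptz
    );

    CREATE OR REPLACE FUNCTION apply_offer_event(
        p_product_id bigint, p_store_id bigint,
        p_price numeric, p_is_avail boolean, p_dt timestamptz
    ) RETURNS void AS $$
    DECLARE
        last_rec product_offer_history%ROWTYPE;
    BEGIN
        -- Find the currently open record for this product/store
        SELECT * INTO last_rec
        FROM product_offer_history
        WHERE product_id = p_product_id AND store_id = p_store_id AND to_dt IS NULL
        ORDER BY from_dt DESC
        LIMIT 1;

        -- Case 1: same price/availability as the open record -> skip
        IF FOUND AND last_rec.price = p_price AND last_rec.is_avail = p_is_avail THEN
            RETURN;
        END IF;

        -- Case 2: close the open record and start a new one at event.dt
        IF FOUND THEN
            UPDATE product_offer_history SET to_dt = p_dt WHERE id = last_rec.id;
        END IF;

        INSERT INTO product_offer_history (product_id, store_id, price, is_avail, from_dt, to_dt)
        VALUES (p_product_id, p_store_id, p_price, p_is_avail, p_dt, NULL);
    END;
    $$ LANGUAGE plpgsql;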
    mandy
    @mandyqcx_twitter
    May I ask which PostgreSQL versions PipelineDB supports?
    deklanw
    @deklanw
    What would be the performance difference between doing one CV which involves, say, 5 grouping sets, and just making 5 separate CVs, each with its own GROUP BY?
    deklanw
    @deklanw
    This doesn't support window functions anymore :(. And joining streams isn't possible. What technology does support those?
    LSang
    @sangli00
    Does PipelineDB not support PostgreSQL 10.7?
    Detobel36
    @Detobel36_twitter

    @sangli00
    Doc: http://docs.pipelinedb.com/installation.html#install-postgresql

    PipelineDB currently supports PostgreSQL versions 10.1, 10.2, 10.3, 10.4, 10.5, and 11.0 on 64-bit architectures.

    Detobel36
    @Detobel36_twitter

    @deklanw
    I don't know of any system similar to PipelineDB.
    The utility of PipelineDB is that it lets you set up an ETL-like system without the need for a large infrastructure.
    In other words, PipelineDB lets you take data in one form and transform it into another form with SQL queries (when I talk about "other forms", that also includes adding/removing data with some joins, etc.).

    • Easy to write (SQL)
    • Easy to set up (PostgreSQL extension)
    • Simple infrastructure (just one database)

    For the risk of double insertion, it is a more general problem than PipelineDB... When using a stream database, you will always have a risk of double insertions. It is either up to the program that sends the data to handle this, or you can work around the problem with a view (and a GROUP BY).
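    A minimal sketch of that GROUP BY approach, with hypothetical stream and column names: grouping on a client-generated event id means a duplicate insert just lands in the same group instead of being counted twice (at the cost of the materialization table keeping one row per id).

    -- Hypothetical stream carrying a client-generated unique event_id
    CREATE FOREIGN TABLE events (
        event_id text,
        value    numeric
    ) SERVER pipelinedb;

    -- Duplicate inserts of the same event_id collapse into one group
    CREATE VIEW deduped_events WITH (action=materialize) AS
    SELECT event_id,
           min(value) AS value,
           count(*)   AS times_seen   -- > 1 reveals duplicates
    FROM events
    GROUP BY event_id;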

    @dimitar-nikovski
    I'm not sure I follow you....
    My PipelineDB tables are also defined in my public schemas.

    Doc: http://docs.pipelinedb.com/continuous-views.html#examples
    Example query:

    CREATE VIEW avg_of_forever AS SELECT AVG(x) FROM my_stream;

    You can directly perform this task in your public schemas.

    DidacticTactic
    @DidacticTactic
    Important update - the PipelineDB team is joining Confluent, the company behind Apache Kafka. Please read today’s announcement for more details: https://www.pipelinedb.com/blog
    bartlomiej-korpus
    @bartlomiej-korpus
    I am building a product that uses PipelineDB, do you think I should migrate over to KSQL? I don't know much about the Confluent stack; it looks to be much more complex than just PostgreSQL with the PipelineDB extension. If I understand correctly, KSQL doesn't store aggregated data anywhere by default, right? I'd need to set up some connector, to Elasticsearch for example, is that right? @DidacticTactic
    Allen Lai
    @cailin-lai
    @derekjn @DidacticTactic It's indeed an important update! Good fortune to you guys, anyway!
    DidacticTactic
    @DidacticTactic
    @bartlomiej-korpus
    Confluent is more complex than PipelineDB as it's based on Apache Kafka, an application development framework, so it's also more versatile. If you want something simpler, check out TimescaleDB
    @cailin-lai - thank you!
    bartlomiej-korpus
    @bartlomiej-korpus
    It definitely is! But I must say it looks more appealing than TimescaleDB which doesn't support continuous views yet
    bartlomiej-korpus
    @bartlomiej-korpus
    it looks like the upcoming TimescaleDB feature doesn't come with truly live continuous views like PipelineDB's; it only updates them at a specified interval. Unfortunately, my system relies heavily on that feature
    anemo
    @swengineer
    excuse me, I wonder how this issue can be fixed:

    CREATE FOREIGN TABLE wiki_stream (
    hour timestamp,
    project text,
    title text,
    view_count bigint,
    size bigint)
    SERVER pipelinedb;

    CREATE VIEW wiki_stats WITH (action=materialize) AS
    SELECT hour, project,
    count(*) AS total_pages,
    sum(view_count) AS total_views,
    min(view_count) AS min_views,
    max(view_count) AS max_views,
    avg(view_count) AS avg_views,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views,
    sum(size) AS total_bytes_served
    FROM wiki_stream
    GROUP BY hour, project;

    insert into wiki_stream values (now(), 'a', 'a', 1, 1);

    select * into wiki_stats_tmp from wiki_stats_mrel ;

    select pipelinedb.combine_table('wiki_stats', 'wiki_stats_tmp');

    got error like this:

    postgres=# select pipelinedb.combine_table('wiki_stats', 'wiki_stats_tmp');
    ERROR: invalid string enlargement request size: -4

    bartlomiej-korpus
    @bartlomiej-korpus
    do you think replacing PipelineDB with Kafka and KSQL, and then using Kafka Connect to pipe the results to Postgres for querying, might be a viable solution?
    I am looking for something that can be plugged in in place of PipelineDB
    unfortunately, Timescale turned out not to have the features I need yet
    Allen Lai
    @cailin-lai
    Indeed, TimescaleDB may not be a good choice to replace PP with its current features, IMO.
    Allen Lai
    @cailin-lai
    The new auto-aggregation feature is unlike PP's, probably because of the TSL licence gap.... Otherwise, when the aggregates are generated from raw data on disk, I'd prefer to have ClickHouse with materialized views; it's super fast and we have it in production.
    bartlomiej-korpus
    @bartlomiej-korpus
    @cailin-lai I am looking into KSQL with Kafka as a replacement, did you compare it to Clickhouse?
    Allen Lai
    @cailin-lai
    @bartlomiej-korpus I haven't looked into the details of KSQL yet, but early research showed the memory costs would be high. Basically, the reasons we choose SQL-based solutions are the costs (people; hardware, i.e. memory vs SSD drives, etc.) and the maintenance gaps. We had a bad experience before: an ES cluster grew bigger and bigger. Finally, we decided to replace it, and that cut our costs 6x.
    bartlomiej-korpus
    @bartlomiej-korpus
    oh I see, I'm not really going to use ES with KSQL though, just piping the results back to a relational store with Kafka Connect for querying
    does ClickHouse do real-time rollups like PipelineDB?! @cailin-lai
    Allen Lai
    @cailin-lai
    @bartlomiej-korpus Yes, it does, with materialized views. The workflow goes like this: 1) both the raw data and the aggregate data are flushed to disk in a bulk insert; 2) background work merges the aggregate data generated by the bulk inserts. Since the aggregate data is stored with -State types (which also covers the HLL scenario), the query result isn't affected by whether that data has been merged or not.
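    A minimal sketch of that workflow in ClickHouse SQL (table and column names are hypothetical): the materialized view stores partial aggregate states, and the -Merge combinators at query time combine them correctly whether or not the background merges have run yet.

    -- Raw events; big bulk inserts land here
    CREATE TABLE events_raw (
        ts      DateTime,
        project String,
        user_id UInt64
    ) ENGINE = MergeTree
    ORDER BY (project, ts);

    -- Stores partial aggregate states; background merges combine parts later
    CREATE MATERIALIZED VIEW events_hourly
    ENGINE = AggregatingMergeTree
    ORDER BY (project, hour)
    AS SELECT
        project,
        toStartOfHour(ts)  AS hour,
        countState()       AS hits_state,
        uniqState(user_id) AS users_state
    FROM events_raw
    GROUP BY project, hour;

    -- Query time: -Merge combinators finalize the partial states
    SELECT
        project,
        hour,
        countMerge(hits_state) AS hits,
        uniqMerge(users_state) AS users
    FROM events_hourly
    GROUP BY project, hour;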
    Allen Lai
    @cailin-lai
    The disk needs to be very good to keep up with the background part-merge operations.... I actually much prefer PP for balancing the OLTP and OLAP cases. Most likely, ClickHouse is designed for OLAP with a columnar store and big-data cases: big bulk inserts are required for performance, frequent small inserts are discouraged, and query performance degrades roughly linearly with the number of columns read...
    Patrick Wunderlin
    @ligoo
    As you're working with Confluent now, are you going to release PipelineDB Cluster to the open-source community? @DidacticTactic
    esatterwhite
    @esatterwhite
    Is there a path forward for this project? Are there any people with commit/merge access outside of the people that work(ed) at the PipelineDB company?
    Would the company heads consider moving it to something like the Apache Incubator?
    Or perhaps handing the core code base + existing knowledge over to the folks at TimescaleDB? These two projects were destined to be together. And Timescale is already on the same path.
    Thomas Delnoij
    @mvcatsifma
    @esatterwhite Those are good questions that I would like to know the answers to as well.
    bartlomiej-korpus
    @bartlomiej-korpus
    what do you guys think about Druid as a replacement for PipelineDB?
    mandy
    @mandyqcx_twitter
    Hello, may I ask a question about TTL? I am sure that I already altered the TTL of a CV, but it does not work. Should I do something more?
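    In case it helps, a sketch based on the TTL section of the docs (the view and column names here are hypothetical); note that expired rows are only removed lazily by the background reaper, so an altered TTL may not appear to take effect immediately:

    -- Change the TTL of an existing continuous view (see docs.pipelinedb.com/ttl.html)
    SELECT pipelinedb.set_ttl('my_cv', '1 day', 'minute');

    -- Force expired rows to be removed now instead of waiting for the reaper
    SELECT pipelinedb.ttl_expire('my_cv');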
    zhuhuangxi1
    @zhuhuangxi1
    When Postgres stops, the scheduler has called nn_term(), which calls nn_close() to close all the sockets, but my worker process still blocks in nn_send().
    Why is that?
    joschne
    @joschne
    Is there any database-as-a-service provider (such as Heroku Postgres, for example) that supports the PipelineDB extension out of the box?