Evan Chan
@velvia
That file and/or directory should have the proper protections; you might need to run as a particular user to use that file, for example.
The alternative is that you can pass in the password on the command line
via spark-driver-java-options
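(For reference, a minimal sketch of the command-line approach Evan mentions: passing a secret to the driver JVM via `--driver-java-options` as a system property. The property name `filodb.cassandra.password` is a hypothetical example here, not a confirmed FiloDB config key; check your actual configuration for the right one.)

```python
# Sketch: assemble a spark-submit invocation that passes a credential as a
# JVM system property via --driver-java-options, so it never has to live
# in a world-readable config file. The property name is hypothetical.
import shlex

def build_spark_submit(app_jar, password):
    driver_opts = "-Dfilodb.cassandra.password=" + password
    return [
        "spark-submit",
        "--driver-java-options", driver_opts,
        app_jar,
    ]

cmd = build_spark_submit("my-filodb-app.jar", "s3cret")
print(shlex.join(cmd))
```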
ezcocos
@ezcocos
Thanks, I managed to pass credentials.
Now I have replaced the column name "Gmt Time" with "Gmt_Time", removing the space. But when I save it in FiloDB I get this error message:
MissingColumnNames(ArrayBuffer(Gmt Time),row)
Any idea what's the problem?
Tks, E
I mean this error happens when I try to save the DataFrame in FiloDB. It's as if it remembers the df with the column name containing the space.
ezcocos
@ezcocos
I removed the space using "withColumnRenamed"
E
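(A plain-Python illustration of the pitfall in this exchange, with a dict standing in for a DataFrame: renames like Spark's `withColumnRenamed` return a new object rather than mutating the original, so writing the old reference still carries the spaced column name, which would trigger an error like `MissingColumnNames` above. This is only a sketch of the idea, not Spark or FiloDB code.)

```python
# Plain-Python stand-in for a DataFrame: a dict of column -> values.
def rename_column(df, old, new):
    # Return a NEW "DataFrame" with the column renamed, like Spark's
    # withColumnRenamed (which is also non-mutating).
    return {new if name == old else name: vals for name, vals in df.items()}

raw = {"Gmt Time": [1, 2, 3], "price": [9.5, 9.7, 9.6]}
clean = rename_column(raw, "Gmt Time", "Gmt_Time")

print(sorted(clean))       # no space left in the column names
print("Gmt Time" in raw)   # the original still has it: write `clean`, not `raw`
```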
ezcocos
@ezcocos
Hi All,
I found the mistake with the column name. It was a mistake in the write command. Tks.
E
Evan Chan
@velvia
@ezcocos glad you figured it out :)
parekuti
@parekuti
@velvia I am trying to do ingestion using the Spark 2.0 version and getting the following error:
ERROR 2017-08-31 08:36:59,791 Slf4jLogger.scala:66 - akka.actor.OneForOneStrategy: scala.FallbackArrayBuilding$$anon$1 cannot be cast to org.velvia.filo.ZeroCopyUTF8String
java.lang.ClassCastException: scala.FallbackArrayBuilding$$anon$1 cannot be cast to org.velvia.filo.ZeroCopyUTF8String
ykwzx4585168
@ykwzx4585168
Hi all,
I cannot run filo-cli in the Windows cmd prompt; what tools should I install? Thanks very much
Evan Chan
@velvia
@ykwzx4585168 I believe I answered your question over email?
jsbilgi
@jsbilgi
Hi
Looking for a FiloDB Docker image.
Please advise if it is available on Docker Hub or any other repository.
Evan Chan
@velvia
@jsbilgi sorry no Docker image. Something we can consider once we are done with bigger changes. :)
jsbilgi
@jsbilgi
@velvia Thank u
Evan Chan
@velvia
Folks, we have released FiloDB 0.8.0. There are some huge changes in the repo. Feel free to go there and check it out.
Weeco
@weeco

@velvia I watched some of your presentations about FiloDB on YouTube, and it sounds like FiloDB is a very solid time series database which, together with Telegraf & Cassandra, could replace a Prometheus cluster. Is this correct?
I am still hesitating because there's little to no information on the internet about FiloDB as a Prometheus alternative. It seems this is not FiloDB's main purpose, hence I'd better ask here again: could a FiloDB setup actually replace my Prometheus cluster, with better characteristics such as scalability, high availability, and probably a lighter footprint?
Evan Chan
@velvia
Hi @weeco So we use FiloDB as a scalable Prometheus. The reason there is no information is that we have not done any promotion of it.
The only thing is that right now there is no alerting in the open source version.
Weeco
@weeco
Thanks for your response. I don't need alerting, as I'd use Grafana alerting for this purpose anyway. As far as I remember from your presentation, metrics are scraped by Telegraf, sent to Kafka, and then ingested by FiloDB. Obviously Kafka is one of the critical infrastructure components we need to monitor as well. When our Kafka cluster is broken, we cannot investigate the issues anymore, right? So is it recommended to create another Kafka cluster for monitoring purposes?
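(To make the Telegraf -> Kafka -> FiloDB pipeline discussed here concrete, a hedged sketch: one way a scraped metric sample could be encoded as a tagged record on its way to a Kafka topic. The field names and JSON shape are illustrative only, not FiloDB's or Telegraf's actual wire format.)

```python
import json
import time

# Hypothetical record shape for a metrics pipeline: a metric name, a set
# of identifying tags, a timestamp in milliseconds, and a value.
def encode_sample(metric, tags, value, ts=None):
    record = {
        "metric": metric,
        "tags": tags,
        "timestamp_ms": int((ts if ts is not None else time.time()) * 1000),
        "value": value,
    }
    return json.dumps(record, sort_keys=True)

msg = encode_sample("cpu_usage", {"host": "web-1", "dc": "us-east"}, 0.42,
                    ts=1_500_000_000.0)
decoded = json.loads(msg)
print(decoded["metric"], decoded["tags"]["host"], decoded["value"])
```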
Weeco
@weeco
@velvia
Evan Chan
@velvia
@weeco Hard to answer without knowing more. I think having a separate cluster is a good idea in general to separate out concerns. These days Kafka cloud services are often run in shared containers though, so be careful.
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
Hey everyone. I had a question about how FiloDB integrates with an existing Cassandra cluster. We currently have an existing Cassandra cluster, and I was wondering how I go about getting those tables/keyspaces into FiloDB?
Pardon me if my understanding of FiloDB is wrong here. But @velvia I watched your talks on FiloDB, so my knowledge is pretty high level. I am in the process of building a low latency data ingestion system that can also support very fast read queries. We currently use Cassandra as the database.
Evan Chan
@velvia
@shubhamsinhabe1_twitter you could reingest the data into FiloDB. However note that currently FiloDB is oriented towards time series only and not generic data. For time series it supports much richer queries via PromQL. Maybe read the README and see if the details match what you expect?
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
Yeah, essentially I have raw activity data based on time. I run hourly and nightly batch jobs to process this time series data to populate other Cassandra tables. My goal is to a) process this incoming time-based data in real time, and b) move away from the restrictive Cassandra data models (for example, having the ability to query a large number of data points based on a start and end date, combine this with the latest incoming data, run some computations, and write it to a table I can run read queries on).
@velvia Would you say FiloDB is a good fit for this use case? And I want to use Spark for the real time data processing part.
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
Also @velvia I would like to point out that our Cassandra time series activity data table has around 10 million rows in production, and twice that number in the dev environment. If you think our use case makes sense, could you also point to a resource that explains how to go about reingesting all the existing Cassandra data into FiloDB? Would it also maintain the exact same schema for that Cassandra table? Thanks for your explanations; I know I asked a bunch of questions.
Evan Chan
@velvia
@shubhamsinhabe1_twitter so I would say it’s a good fit if you can easily divide your data into different entities - for example, devices, customers, people —and your queries tend to be oriented around groups of entities. 10 million points is nothing - Also what is your schema like?
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
Hey @velvia Yeah so let me explain what the table schemas currently look like in Cassandra. Table 1 -> process_time, student_id, classroom_id, event, and a few more columns. This table stores the raw incoming activity data. We then run batch jobs on an hourly basis to compute/aggregate data from Table 1 for the past hour (so query Table 1 where start_time = current time - 1 hour and end_time = current time) and insert that into an hourly table (Table 2). Similarly, we run a batch job every night to compute a weekly aggregated data table and a monthly aggregated data table, which grab their data from Table 2. I would like to make this whole process real time: data ingestion, aggregation, and updating of the data models. On the other hand, I also want to use the aggregated data models to serve client-triggered HTTP requests, where I can query data in FiloDB (with or without Spark) based on time and entities (entities in this case would be one of classroom, student, or school). The aggregated tables mentioned above would be stored with columns start_time, end_time, score, and classroom_id (or student_id or school_id, depending on the entity of the table).
Also, would FiloDB allow queries such as, for a given table with schema entity_1, entity_2, entity_3, start_time, end_time: SELECT * FROM table WHERE entity_1 (or entity_2 or entity_3) = 'name' AND start_time = now - 1 hour AND end_time = now?
So I could essentially query for any entity based on time. Or is creating a separate table for each entity the recommended practice?
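(A hedged sketch of the hourly roll-up described above, in plain Python: bucketing raw events by hour and entity and summing a score. The field names mirror the schema in the message; this shows only the batch-job logic, not FiloDB or Spark code.)

```python
from collections import defaultdict

# Raw events from "Table 1": (process_time in epoch seconds, classroom_id, score).
events = [
    (3600 * 10 + 5,   "c1", 2.0),
    (3600 * 10 + 900, "c1", 3.0),
    (3600 * 10 + 30,  "c2", 1.0),
    (3600 * 11 + 1,   "c1", 4.0),
]

def hourly_rollup(rows):
    # Bucket key: (hour-bucket start, entity id) -> summed score,
    # like the hourly "Table 2" described above.
    buckets = defaultdict(float)
    for ts, classroom_id, score in rows:
        hour_start = (ts // 3600) * 3600
        buckets[(hour_start, classroom_id)] += score
    return dict(buckets)

agg = hourly_rollup(events)
print(agg[(3600 * 10, "c1")])  # 5.0
```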
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
Please also let me know if you think FiloDB wouldn't be ideal for this use case. Based on my research after going through your slides and reading the README, it seems to me that it would.
Evan Chan
@velvia
The query above to select an entity could work for FiloDB, though I would model the data differently: basically, let each entity be modeled by a set of tags, or key-values. FiloDB would let you query/filter on any combination or subset of entities, even with regexes. This makes FiloDB querying much more powerful than what you can do with C* itself.
However, right now ingesting custom schemas is not very easy.
FiloDB is designed for ingesting millions of entities with a flexible data model. But please read the README first ... :)
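(A hedged sketch of the tag-based model Evan describes, in plain Python: each series carries key-value tags, and a query is a set of tag matchers, optionally regex, loosely modeled on PromQL label matchers. This only mimics the idea; it is not FiloDB's query engine.)

```python
import re

# Each series is identified by key-value tags, e.g. one set per entity.
series = [
    {"entity": "classroom", "id": "c1", "dc": "us-east"},
    {"entity": "student",   "id": "s9", "dc": "us-west"},
    {"entity": "classroom", "id": "c2", "dc": "us-west"},
]

def match(tags, matchers):
    # matchers: tag -> (op, value), where op is "=" (exact) or "=~" (regex),
    # loosely modeled on PromQL label matchers.
    for key, (op, value) in matchers.items():
        actual = tags.get(key, "")
        if op == "=" and actual != value:
            return False
        if op == "=~" and not re.fullmatch(value, actual):
            return False
    return True

hits = [s["id"] for s in series
        if match(s, {"entity": ("=", "classroom"), "dc": ("=~", "us-.*")})]
print(hits)  # ['c1', 'c2']
```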
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
@velvia thanks for your explanations, it makes a lot more sense now. I am going through the README, but I don't see any detailed explanation of the Spark integration. In addition to ingesting data from Kafka, I would also like the ability to query FiloDB to get data, combine that data with the current incoming data stream, and eventually submit it back to FiloDB. For the combination and aggregation operations I was planning to use Spark, so Kafka -> Spark -> FiloDB. Is this possible?
Evan Chan
@velvia
The Spark integration is not quite there yet.
shubham.sinha@berkeley.edu
@shubhamsinhabe1_twitter
@velvia Any workarounds? Essentially I just want the ability to load data from FiloDB into a Spark DataFrame and also write data back to FiloDB. Will this be possible?
Evan Chan
@velvia
It will need more development for this to happen; it should be possible in the future. Actually, the Spark-to-FiloDB part is possible by having Spark push data through the Gateway to Kafka.
Szymon Matejczyk
@szymonm
Are the Google Groups mentioned in the README active and open?
Evan Chan
@velvia
Some of us pay attention to the Google Groups, though they're not used much.
Szymon Matejczyk
@szymonm
I can't access the Google Groups. Seems like I'm missing access rights?