Kris Reeves
@myndzi
Got one more question, related to content transformation. For things like referer links in web logs, we use Unilog to redact data before it hits disk. I can see that the Lua stuff would maybe suffice here, or maybe some of the other transforms. However, with a big list of query string keys, we found significant performance problems, which we were able to improve by using the Aho-Corasick algorithm to match the (fixed) list of query string values efficiently against the log data. There seems to be an implementation of this in Lua, but I'm not entirely sure what to expect for performance or managing non-trivial code. Any advice on what'd be most suitable here to replace that functionality?
2 replies
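For comparison, here is a minimal Python sketch of the redaction step itself (not an answer about Lua performance in Vector): when the referer URL parses cleanly, a set lookup against the fixed key list sidesteps multi-pattern string matching entirely. The key list and function name below are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical fixed list of sensitive query-string keys; a real list
# like the one described in the question would be much larger.
SENSITIVE_KEYS = {"token", "session", "email"}

def redact_referer(url: str) -> str:
    """Replace the values of sensitive query-string keys with REDACTED."""
    parts = urlsplit(url)
    pairs = [
        (k, "REDACTED" if k in SENSITIVE_KEYS else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))

print(redact_referer("https://example.com/a?user=1&token=abc123"))
# → https://example.com/a?user=1&token=REDACTED
```

This only works when the data is structured enough to parse; for scanning raw unstructured log text against a large fixed literal set, Aho-Corasick (or an equivalent multi-pattern matcher) remains the right tool.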
Luca Palmieri
@LukeMathWalker
Hi everyone!
I have been experimenting with Vector to ship structured logs from a Rust backend service - so far so good, I managed to get a setup that satisfies all my requirements.
A dilemma comes up when I need to deploy that bad boy as a Kubernetes pod. I've checked the docs, but I haven't found a clear-cut answer: what is the recommended strategy for scraping logs from a container running in a k8s pod using Vector?
Should vector run in a separate container on the same pod, sharing a volume, using the file source? What are the other available options?
5 replies
Ana Hobden
@Hoverbear
@LukeMathWalker Hey! :) I think using the docker source will work ok? Since this is a service you made you can just drop Vector right into the docker container
24 replies
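For reference, a minimal sketch of the docker-source approach (the source options beyond `type` are omitted, and the console sink is just for illustration):

```toml
[sources.app_logs]
  type = "docker"   # collect logs from the Docker daemon on the host

[sinks.out]
  type = "console"
  inputs = ["app_logs"]
  encoding.codec = "json"
```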
Andrey Afoninsky
@afoninsky

hello

I want to implement a docker proxy to connect Vector to MQTT (as a replacement for Kafka in an event-based architecture).
Which transport is better for communication between the vector container and the proxy container?
The "vector source/sink" looks like a native solution, but the "http source/sink" has an at-least-once guarantee.

Ana Hobden
@Hoverbear
So Vector <-> Vector may change in the future to include other APIs like timberio/vector#2003
Where HTTP would be passing just the logs
So if you want a stable API into your MQTT probably HTTP?
1 reply
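A rough sketch of the HTTP option being suggested (the proxy address, source name, and batch/retry values here are made-up placeholders):

```toml
[sinks.to_mqtt_proxy]
  type = "http"
  inputs = ["my_source"]             # hypothetical source name
  uri = "http://mqtt-proxy:8080/ingest"
  encoding.codec = "json"
  batch.max_events = 500             # assumed value
  request.retry_attempts = 5         # assumed value
```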
Andrey Afoninsky
@afoninsky
true
on the other hand, I hope to replace this hack with native vector mqtt support later :)
plus, we'd need to implement additional HTTP logic: batching, retries, etc...
but got it, thank you for the information
Ana Hobden
@Hoverbear
Ah yeah, gotcha. :) That'd be nice to have for sure
Alexandre NICOLAIE
@xunleii

Hi everyone :)

I have a question about the Loki sink. Is there a way to send a single field (message, for example) instead of the complete event? Inside Grafana, the Vector event is a JSON object, and we can't currently get statistics on the message field directly through the explorer. (And since we can generate labels from Vector, sending the full event isn't really interesting.)

1 reply
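If the sink's encoding options apply here, something like this might do it (the endpoint, label, and input names are placeholders):

```toml
[sinks.loki]
  type = "loki"
  inputs = ["my_transform"]   # hypothetical transform name
  endpoint = "http://loki:3100"
  labels.app = "my-app"
  encoding.codec = "text"     # emit only the message field, not the full JSON event
```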
davewleblanc
@davewleblanc
Hi. :-) In your docs, you say "A config can have any number of transforms and it's entirely up to you how they are chained together." But there's no real example of how to do that. I have a pretty specific regex transform, but I'd like to match on more than one case, which would require another specific regex. But when I run the tests, they fail because I have more than one.
1 reply
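Chaining is expressed purely through `inputs`: each transform names the component it reads from. A hypothetical sketch with two regex_parser steps (the patterns, field names, and path are illustrative):

```toml
[sources.logs]
  type = "file"
  include = ["/var/log/app.log"]   # hypothetical path

[transforms.split_ts]
  type = "regex_parser"
  inputs = ["logs"]
  regex = '^(?P<ts>\S+) (?P<rest>.*)$'

[transforms.split_level]
  type = "regex_parser"
  inputs = ["split_ts"]            # reads the output of the first transform
  field = "rest"
  regex = '^(?P<level>\w+) (?P<message>.*)$'

[sinks.out]
  type = "console"
  inputs = ["split_level"]
  encoding.codec = "json"
```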
matrixbot
@matrixbot
@mike.cardwell:grepular.com The globbing in the file source, can you use multiple asterisks in multiple locations. E.g is this valid? include = ["/opt/nomad/data/alloc/*/alloc/logs/monitor.std*.*"]
1 reply
Slawomir Skowron
@szibis_twitter

Hi, I'd like to send internal metrics from Vector to Datadog as metrics using datadog_metrics. My config looks like this:

[sources.internal_metrics]
  type = "internal_metrics"

[transforms.tags_internal_metrics]
  # General
  type = "add_tags" # required
  inputs = ["internal_metrics"] # required

  # Tags
  tags.hostname = "${VECTOR_HOSTNAME}"
  tags.role = "${VECTOR_ROLE}"
  tags.cluster = "${VECTOR_CLUSTER}"
  tags.env = "${VECTOR_ENV}"
  tags.region = "${VECTOR_REGION}"
  tags.project = "${VECTOR_PROJECT}"
  tags.hostgroup = "${VECTOR_HOSTGROUP}"

[sinks.internal_metrics_log]
  # General
  type = "console" # required
  inputs = ["tags_internal_metrics"] # required
  target = "stdout" # optional, default

  # Encoding
  encoding.codec = "json" # required
  encoding.timestamp_format = "rfc3339" # optional, default

[sinks.datadog_metrics_internal_metrics]
  # General
  type = "datadog_metrics" # required
  inputs = ["tags_internal_metrics"] # required
  api_key = "<secret key>" # required
  healthcheck = true # optional, default
  host = "https://app.datadoghq.com"
  namespace = "vector" # required

  # Batch
  batch.max_events = 20 # optional, default, events
  batch.timeout_secs = 1 # optional, default, seconds

tags are added and the metrics show up in the logs in a format that looks valid, but the Datadog sink is getting a 404 response:

Apr 23 11:39:48 ip-172-18-99-67 vector[24857]: Apr 23 11:39:48.217 WARN sink{name=datadog_metrics_internal_metrics type=datadog_metrics}:request{request_id=1}: vector::sinks::util::retries: request is not retryable; dropping the request. reason=response status: 404 Not Found

19 replies
matrixbot
@matrixbot

@mike.cardwell:grepular.com > <@mike.cardwell:grepular.com> The globbing in the file source, can you use multiple asterisks in multiple locations. E.g is this valid? include = ["/opt/nomad/data/alloc/*/alloc/logs/monitor.std*.*"]

To answer my own question: yes. The problem I was having was that Vector could not see these files because the "alloc" dir was owned root:root with mode 0711, which meant the vector user couldn't get a directory listing, therefore the globbing failed. I feel like this should have been logged by Vector, but it wasn't.

1 reply
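The confirmed answer above (multiple `*` wildcards across multiple path components are fine) can be sanity-checked with Python's fnmatch, which uses the same wildcard syntax, with the caveat that fnmatch's `*` also crosses `/` boundaries, unlike a per-component shell glob. The sample path is made up.

```python
from fnmatch import fnmatch

pattern = "/opt/nomad/data/alloc/*/alloc/logs/monitor.std*.*"
path = "/opt/nomad/data/alloc/3f9a/alloc/logs/monitor.stdout.0"  # hypothetical file

# fnmatch's '*' matches any characters including '/', so this is a
# slightly looser check than a real glob -- but fine for this pattern.
print(fnmatch(path, pattern))  # → True
```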
Rick Richardson
@rrichardson
For Kubernetes, the logging integration is awesome. Has anyone considered also leveraging the node-exporters in some way to collect all of the container and system metrics as well?
I think all that would be needed would be a scraper that acts as a vector source
A really nice-to-have would be a way to apply a filter at that point, to cull the data a bit and remove duplicates
Binary Logic
@binarylogic
[Mikhail Novichkin, Timber] We'll most likely provide integration for this via our Helm chart so it's available via a single knob
matrixbot
@matrixbot
@mike.cardwell:grepular.com I don't know if this is a known issue, but there are no nightly rpms atm, just .debs: https://packages.timber.io/vector/nightly/latest/
1 reply
Slawomir Skowron
@szibis_twitter
Does anyone have problems with aws_ec2_metadata? All I see in the logs is:
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.004 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/placement/availability-zone
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.004 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/local-hostname
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.005 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/local-ipv4
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.005 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/mac
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.005 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:a5:31:79:0f:ac/subnet-id
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.006 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:a5:31:79:0f:ac/vpc-id
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.025 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/placement/availability-zone
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.025 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/placement/availability-zone
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.026 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/local-hostname
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.026 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/local-hostname
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.026 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/local-ipv4
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.026 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/local-ipv4
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.027 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/mac
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.027 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/mac
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.027 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:a5:31:79:0f:ac/subnet-id
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.027 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:a5:31:79:0f:ac/subnet-id
Apr 27 14:26:26 ip-10-105-195-187 vector[18040]: Apr 27 14:26:26.028 DEBUG aws_ec2_metadata: worker: vector::transforms::aws_ec2_metadata: Sending metadata request. uri=http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:a5:31:79:0f:ac/vpc-id
21 replies
gtie
@gtie
I'm not sure who is responsible for the release notes production/automation, but https://vector.dev/releases/0.9.0/ is pure joy
I'd go on a date with this page if I could ;)
amazing job!
Binary Logic
@binarylogic
[zach, Timber] Lol @Ben
Binary Logic
@binarylogic
@gtie ha ha, thanks, we worked hard on them :)
Alex
@Alexx-G
Hi everyone,
What's the recommended strategy for filtering out some logs? Is it acceptable to create a swimlane and leave one lane not connected to a sink?
1 reply
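Rather than a dangling swimlane, a filter transform feeding the sink may be cleaner; a hypothetical sketch (the field, value, and component names are placeholders):

```toml
[transforms.keep_errors]
  type = "filter"
  inputs = ["my_source"]           # hypothetical source name
  condition."level.eq" = "error"   # keep only events whose level field equals "error"

[sinks.out]
  type = "console"
  inputs = ["keep_errors"]
  encoding.codec = "json"
```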
Filippo Giunchedi
@filippog_gitlab
Hello, I'm trying out Vector and I'm wondering how to handle configurations with multiple transforms all needing to converge into a single sink. In other words, my use case is being able to add a transform and have the sink pick it up e.g. based on its name, without changing the sink's input list. Something like being able to say to the sink "use all transforms starting with 'blah' as inputs" without listing all input names explicitly, is this possible and/or planned? Thank you!
SvenMarquardt5772
@SvenMarquardt5772
I have a regex that doesn't match even though it matches in the Rust syntax checker.
the pattern is
^(?P<level>[\w\.]+) \[(?P<threadname>.*)\]: (?P<logger>[\w\.]*):(?P<linenumber>[\d\.]*) - (?P<message>.*).*E:(?P<exception>.*)?\n?(?P<stacktrace>(?s).*)?$
and the message is
INFO [Heartbeat]: SQSClientImpl:359 - Reset für Dev_SQS_Queue E:
Is there something wrong with my pattern?
10 replies
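One way to sanity-check the pattern outside Vector is a quick Python run. Note that Python rejects the mid-pattern global `(?s)` flag, so it is scoped as `(?s:…)` below, and the log line uses a cleaned-up `für`. It matches here, which suggests the problem may lie in the mangled bytes of the real message rather than in the pattern itself.

```python
import re

# The pattern from the question, with (?s) scoped for Python compatibility.
pattern = re.compile(
    r"^(?P<level>[\w.]+) \[(?P<threadname>.*)\]: "
    r"(?P<logger>[\w.]*):(?P<linenumber>[\d.]*) - "
    r"(?P<message>.*).*E:(?P<exception>.*)?\n?(?P<stacktrace>(?s:.*))?$"
)
line = "INFO [Heartbeat]: SQSClientImpl:359 - Reset f\u00fcr Dev_SQS_Queue E:"
m = pattern.match(line)
print(bool(m))  # → True
```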
Ana Hobden
@Hoverbear
@filippog_gitlab While we don't have that feature right now, I don't think it's out of the question! Can I invite you to open an issue?
Vlad Pedosyuk
@vpedosyuk
Hi! Should 0.9.1 version of Vector support both "contains" and "not_contains" predicates in the "filter" transform?
Vlad Pedosyuk
@vpedosyuk

I've got a simple vector.toml:

[sources.kafka_source]
        type = "kafka"
       ...other settings....
[transforms.filter]
        type = "filter"
        inputs = ["kafka_source"]
        condition."message.contains" = "blahblah"
[sinks.kafka_sink]
        type = "kafka"
        inputs = ["filter"]
       ....other settings...

the above works fine, however if I set condition."message.not_contains" = "blahblah", Vector throws an error:

May 06 06:39:54.062  INFO vector: Log level "debug" is enabled.
May 06 06:39:54.062  INFO vector: Loading configs. path=["/etc/vector/vector.toml"]
May 06 06:39:54.070  INFO vector: Vector is starting. version="0.9.1" git_version="v0.9.1" released="Thu, 30 Apr 2020 15:51:58 +0000" arch="x86_64"
May 06 06:39:54.174 ERROR vector::topology: Configuration error: Transform "filter": predicate type 'not_contains' not recognized

What could be wrong?

Vlad Pedosyuk
@vpedosyuk
ok it seems nightly-2020-05-05-alpine build works just fine with "not_contains"
Filippo Giunchedi
@filippog_gitlab

@filippog_gitlab While we don't have that feature right now, I don't think it's out of the question! Can I invite you to open an issue?

Certainly! I'll followup with an issue, thank you!

Bruce Guenter
@bruceg
@vpedosyuk The generic not_X condition was only recently merged and is not yet present in a released version other than nightlies.
Kevin Liu
@nivekuil
Hi, I wanted to point out that the docs here are outdated for the loki sink: https://vector.dev/docs/setup/installation/platforms/docker/ -- apparently you need an encoding key.
I'll also suggest using something like strum (https://github.com/Peternator7/strum/wiki/Derive-EnumIter) to log possible enum variants to the end user for visibility if they miss a key like that
valerypetrov
@valerypetrov

Hey, guys. I'm evaluating Vector as a replacement for Logstash. I've hit an issue: Vector can't utilize all the cores on the machine. I have the following processor: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, and the following vector configuration:

[sources.tcp_app]
  type = "socket"
  address = "0.0.0.0:1888" 
  max_length = 102400 
  mode = "tcp"



[transforms.json_parser_app]
  type = "json_parser" 
  inputs = ["tcp_app"] 
  drop_field = true
  drop_invalid = true
  field = "message" 


[transforms.add_tags]

  inputs = ["json_parser_app"] 
  type = "add_fields" 
  fields.tags = ["application" , "unified"]

[sinks.kafka]

  type = "kafka" 
  inputs = ["add_tags"] 
  bootstrap_servers = "kafka_servers" 
  compression = "gzip" 
  healthcheck = true
  topic = "app" 
  buffer.type = "memory" 
  buffer.max_events = 5000000
  buffer.when_full = "block"
  librdkafka_options."fetch.error.backoff.ms" = "1000" 
  librdkafka_options."socket.send.buffer.bytes" = "100000000" 
  librdkafka_options."partitioner" = "random"
  librdkafka_options."message.max.bytes" = "1000000000"
  librdkafka_options."request.required.acks" = "1"
  librdkafka_options."socket.keepalive.enable" = "true"
  librdkafka_options."linger.ms" = "750"
  librdkafka_options."batch.num.messages" = "30000"
  encoding.codec = "json" 

[sources.internal_metrics]
  type = "internal_metrics"

[sinks.influxdb]

  type = "influxdb_metrics" 
  inputs = ["internal_metrics"] 
  endpoint = "influxdb_url" 
  namespace = "vector" 
  database = "elk-vector-stats" 
  healthcheck = true

The avg rate with 16 cores is around 200k messages per second, and with the 4-core VM the result is the same. What would you recommend here? Thanks in advance!

[image: VM performance result]
[image: Baremetal performance result]
valerypetrov
@valerypetrov
Also, I've tried decreasing the number of threads to 1, and the performance was around 100k msg/sec. Then I started increasing it in increments of 1. I stopped at 4 threads because further increases didn't give any performance improvement. That's very strange.
Luke Steensen
@lukesteensen
@valerypetrov which version of vector are you running? we're currently working to remove a limitation where transforms can only run on a single thread and become a bottleneck
there was also a previous issue with our underlying scheduler that caused excessive contention above 4 threads, so we had to limit that. however, recent versions should no longer have that issue.
valerypetrov
@valerypetrov
@lukesteensen Vector is starting. version="0.9.1" git_version="v0.9.1" released="Thu, 30 Apr 2020 15:51:58 +0000" arch="x86_64"
I've installed the latest version from the vector docs
Luke Steensen
@lukesteensen
gotcha. that should include some of our improvements, but each transform (e.g. json parser and add_fields in your config) will still only be getting a single thread each right now.
out of curiosity, about how many concurrent incoming connections do you have on your socket source? we're working to be able to shift each of those transforms to get up to a thread per connection in the source
valerypetrov
@valerypetrov
If we are talking about lab env, it's around 3-4k per instance.
Do you plan to make transforms multithreaded?