Gaël Bréard
@gbrd
Could the fix be just to stop cleaning up the metadata collection?
Brian Scully
@scullxbones
@gbrd yes, that sounds right to me. Performing the migration should not change sequence numbers at all; it's purely about separating different pids into different collections.
Gaël Ferrachat
@gael-ft

Hello @scullxbones,

I saw that the Akka team now provides AkkaProjection to handle event-sourced processing. Did you have a look at it? If yes, do you plan to support it at some point?

Thanks,

Brian Scully
@scullxbones
hi @gael-ft - I hadn't seen that yet, thanks for bringing it to my attention. It seems like it formalizes read projections. I'll have to research it more.
That said, I don't immediately see anything extra that the plugin would need to do, beyond fixing the global sequence problem that we struggle with re: eventsByTag.
It would be useful to consolidate on one DB, so a MongoDB offset store maybe? Again, this is very new to me, so I'm not sure. I don't see a clear "offset store" plugin SPI, so that could be a bit unstable.
Gaël Ferrachat
@gael-ft

@scullxbones Looking at the CassandraProjection implementation and the Akka CQRS sample, it abstracts the offset store, how commits are performed, and some stream management.

For example we could have MongoProjection.atLeastOnce(...) or MongoProjection.grouped(...) for batches (a rough sketch follows below)...
The offset store could store documents such as:

{
  _id: ObjectId,
  projectionName: String, // e.g. PersonProjection if we handle events about a Person entity
  projectionKey: String,  // e.g. personTag-1 for tagged Person events
  offset: ObjectIdOffset  // from your lib
}

The idea is that there is some boilerplate code around event processing, and with AkkaProjection we can focus on the event processing itself.
Of course we could have our own implementation, but I think it is quite common code.

https://github.com/akka/akka-projection/blob/master/akka-projection-cassandra/src/main/scala/akka/projection/cassandra/internal/CassandraProjectionImpl.scala
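
A rough sketch of what such a factory could look like, loosely modeled on akka-projection's scaladsl; none of these MongoProjection names exist in the plugin today:

import akka.projection.{Projection, ProjectionId}
import akka.projection.scaladsl.{Handler, SourceProvider}

// Hypothetical factory mirroring CassandraProjection; a Mongo-backed
// offset store behind it would persist documents shaped like the one above.
object MongoProjection {

  // commit the offset after each handled envelope (at-least-once semantics)
  def atLeastOnce[Offset, Envelope](
      projectionId: ProjectionId,
      sourceProvider: SourceProvider[Offset, Envelope],
      handler: () => Handler[Envelope]): Projection[Envelope] = ???

  // hand envelopes to the handler in groups/batches
  def grouped[Offset, Envelope](
      projectionId: ProjectionId,
      sourceProvider: SourceProvider[Offset, Envelope],
      handler: () => Handler[Seq[Envelope]]): Projection[Envelope] = ???
}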

Gaël Ferrachat
@gael-ft

Hello @scullxbones

I read the discussion in the issue scullxbones/akka-persistence-mongo#37.
Regarding the ordering of events, there are comments about adding a timestamp field to get precision finer than 1s, but AFAIK the current eventsByTag queries still use the _id field.

Is there any ongoing development to get precision finer than 1s, or am I missing something?

Brian Scully
@scullxbones
There have been several attempts, e.g. #214, to get the sequence number approach going. Timestamps have their own problems due to clock skew between nodes. Ideally there's a single monotonic sequence to sort by, hence my push for sequence numbers.
I don't know of any ongoing development, @gael-ft...
Gaël Ferrachat
@gael-ft

Thanks for the reply @scullxbones,

I am aware of the clock problem, but AFAIK the problem is already present in current versions due to the timestamp part of the ObjectId.
You even mention it in the docs: "For offsets to operate correctly in a distributed environment, the system clocks of all journal-writing processes should be synchronized"

Regarding the global counter, I agree it would solve the problem, but it would also lead to a performance loss, which is why I think it is blocked for now.

The thing is, I can deal with clocks when deploying my app, so mentioning the issue in the docs should get the user's attention.
But I have no way to solve the ordering-precision problem, so I am kind of stuck with it.
That being said, I can sometimes detect that the ordering of events is broken and try to fall back on something, but...

If I can give my opinion, I think that having this sub-second timestamp precision in place would not create any new problems, and would solve the issue for the majority of the people using eventsByTag.
(Anyway, if you have a distributed environment persisting data with unsynchronized clocks, you will probably have problems at some point...)
Of course the clock problem remains, but if I (as a user) take care of it, then problem solved.

I am seeing it this way:

Now:

  • Problem 1: clocks
  • Problem 2: ordering precision

After:

  • Problem 1: clocks
Brian Scully
@scullxbones

@gael-ft I agree on the existing problem with clock skew. The intent was for that to be a solution for a subset of use cases: a sub-1-event-per-second rate, or ordering-insensitive/eventually consistent models (like CRDTs) come to mind. I was trying to eliminate the skew with the global counter.

That said, I see the following viable (not exclusive) options to fix sorts for EventsByTag:

  • Write timestamps always, document as you say, and make their use for sort optional. I think this is the easiest to implement. It improves on resolution assuming skew can be managed well.
  • Add global counter, document as you say (this is a performance impact due to atomic counter in mongo, either at journaled write concern or especially majority), and make its use optional
  • Add per-tag counter, again performance impact, and limited to EventsByTag use only. Less contention due to mongo document level locking (1 document = 1 sequence), again making use optional

WDYT?
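
For the third option, a minimal sketch of the per-tag atomic counter with the official Scala driver, assuming a hypothetical tag_sequences collection (one counter document per tag):

import com.mongodb.client.model.ReturnDocument
import org.mongodb.scala._
import org.mongodb.scala.model.{Filters, FindOneAndUpdateOptions, Updates}

// Atomically claim the next sequence number for a tag. The upsert creates
// the counter document on first use; since Mongo locks at document level,
// contention is limited to writers of the same tag.
def nextTagSequence(db: MongoDatabase, tag: String): SingleObservable[Long] =
  db.getCollection("tag_sequences")
    .findOneAndUpdate(
      Filters.equal("_id", tag),
      Updates.inc("seq", 1L),
      FindOneAndUpdateOptions().upsert(true).returnDocument(ReturnDocument.AFTER))
    .map(_("seq").asInt64.getValue)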

Gaël Ferrachat
@gael-ft

@scullxbones

From my perspective:

  • The first point has a real benefit and, as said before, I can't find any drawbacks compared to the use of _id, only improvements.
  • The second point is interesting if you are not so sure about your infrastructure and/or performance is not a real criterion.
  • Not so sure about the third one, because allEvents would have an ordering problem, whereas the second point does not.

So to sum up, I would have three strategies, driven by configuration for example:

  • the default: the current strategy, which uses _id (for backward compatibility etc., but it could be marked as deprecated in favor of the timestamp one)
  • the timestamp: documented as an improvement in ordering precision, but clock-dependent as before
  • the global counter: documented as clock-free, but with a performance drawback
Gaël Ferrachat
@gael-ft
PS: if performance is not a problem, as mentioned for the global counter, we can also imagine that the timestamp strategy would work the same...
Gaël Ferrachat
@gael-ft
Note also that even if the global counter is chosen, the timestamp could also be written, as it does not impact performance and would allow switching back to the timestamp strategy without any migration.
Gaël Ferrachat
@gael-ft
WDYT?
Nicholas Molenaar
@nmolenaar
Hey guys, would you prefer untyped or typed actors in a system with mostly persistent actors?
Brian Scully
@scullxbones
@gael-ft sorry, lost track of replies. I think starting with the timestamp would be fine, and we can move forward from there based on feedback. Keep it simple with zero migration, using a compound index on the timestamp, then _id for backward compat. That would also allow a user to do a custom migration as long as they supplied the correct field name.
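
A minimal sketch of such an index with the official Scala driver; the collection and field names here (journal, ts) are placeholders, not the plugin's actual names:

import org.mongodb.scala._
import org.mongodb.scala.model.Indexes

// Sort tagged queries by the new timestamp field first, tie-breaking on _id,
// which also keeps older documents (written without the field) ordered.
def createTagSortIndex(db: MongoDatabase): SingleObservable[String] =
  db.getCollection("journal")
    .createIndex(Indexes.compoundIndex(Indexes.ascending("ts"), Indexes.ascending("_id")))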
@nmolenaar I cannot speak to typed actors, but my experience is good with untyped actors & persistence.
Nicholas Molenaar
@nmolenaar
Thank you!
updatus
@updatus
@scullxbones hi Brian.
is your plugin compatible with akka.persistence.typed? (I'm still not able to figure this out)
Brian Scully
@scullxbones
hi @updatus - just from reading the documentation, I'm pretty confident that all persistence plugins support typed persistence. So it should be compatible, yes. I don't have verification in the test suite, but it's probably really easy to spike if you're testing out akka persistence typed.
updatus
@updatus
@scullxbones thank you for the help.
François Guérout
@fguerout
Hi @scullxbones
First of all, thx again for your great library!
You might have seen that I've raised an issue about "appName": scullxbones/akka-persistence-mongo#407
Am I right in thinking that appName in the URI should not be ignored, and should default to "akka-persistence-mongodb"?
Thx in advance,
Brian Scully
@scullxbones
Hi @fguerout - yes, I definitely agree this is an issue; the URI should override any default behavior.
François Guérout
@fguerout
Thx for confirming.
I've tried to switch to rxmongo instead (which does not have this issue), but I'm facing another issue with snapshot replay (the header is not correctly read).
Anyway, I would prefer to stay on the official Scala driver. @scullxbones, do you have any idea when it could be fixed?
(I cannot find any easy workaround)
François Guérout
@fguerout
(fyi I'm relying on typed persistence, and apart from that appName issue it's working like a charm!)
François Guérout
@fguerout
One possible fix would be to simply make that applicationName configurable (independently of the mongo URI)?
Gaël Ferrachat
@gael-ft

Hello @scullxbones

Some of our apps are based on a "multi-tenant" architecture, so the software is shared by clients but each one has its own database, collections...
We'll update the implementation of some persistent actors a bit, and thought: why not use your plugin?
But here is the problem:

As each client has its own database, the configuration key akka.contrib.persistence.mongodb.mongo.database is too strict for us.
I quickly thought about it and here is what I would like to suggest:

Having an akka.contrib.persistence.mongodb.mongo.database-resolver configuration key whose value is an FQCN. This class would be responsible for determining the database name when handling persistence, based on the PersistenceId for example (could be something else, like a Tag?).
So it would look like this:

// This trait would be in your library
trait DatabaseResolver {
  def databaseNameFrom(persistenceId: String): String
}

// This class would be on the application side
package a.b.c

class MyDatabaseResolver extends DatabaseResolver {
  // e.g. use the segment before the first '_' as the database name
  override def databaseNameFrom(persistenceId: String): String =
    persistenceId.split('_').head
}

// In configuration.conf
akka.contrib.persistence.mongodb.mongo.database-resolver = "a.b.c.MyDatabaseResolver"

Do you think this feature would be acceptable in the lib?

Brian Scully
@scullxbones
Hi @gael-ft - sure, I am open to adding this feature. I assume multiple plugin configurations don't solve the problem due to the cardinality of tenants.
Does it need to be an interface, or are there other options via configuration? Regex comes to mind, but I suppose you would need to support substitution on the database-name side.
There may be a way to leverage the existing split-collections-by-persistence-id feature, suffixed collection names (see the sketch below).
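
For reference, a sketch of how that suffixed-collection-names hook looks on the application side, based on the plugin's documented CanSuffixCollectionNames contract (double-check the exact signatures against the docs):

import akka.contrib.persistence.mongodb.CanSuffixCollectionNames

// Route each persistence id to its own collection suffix; a database-name
// counterpart of this hook could serve the multi-tenant case the same way.
class TenantSuffix extends CanSuffixCollectionNames {
  override def getSuffixFromPersistenceId(persistenceId: String): String = {
    val parts = persistenceId.split('|')
    if (parts.length > 1) parts(1) else ""
  }

  // strip characters MongoDB does not allow in collection names
  override def validateMongoCharacters(input: String): String =
    input.replaceAll("[^a-zA-Z0-9_]", "_")
}

Per the docs, it is wired in through the suffix-builder.class configuration key under akka.contrib.persistence.mongodb.mongo.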
Gaël Ferrachat
@gael-ft

Hi @scullxbones,

Happy to see that you are open to adding this feature to the lib!

Hmm, being an interface is not mandatory, but I think it is the most generic way of doing it, letting the application side drive the logic.
That being said, it might be useful to have some built-in strategies (like regex, as you said).
I'll describe what happens for most of our apps, to let you decide whether it could be a built-in strategy.

Considering

  • a persistent actor named PersonActor
  • a client named 'potato'

The persistence identifier will have the following format: PersonActor|potato_<objectId>
I am not sure, but in some cases it could be PersonActor|potato|<objectId>

Anyway, it could be handled by a regex.

But the database name is prefixed by our company name, so the database name becomes mycompany-potato, mycompany-tomato...

So in our specific case, having an extra configuration key to add a prefix would be perfect.

To conclude, the configuration could look like this (I omitted the akka.contrib.persistence.mongodb.mongo prefix):

// if I want to do it manually
database-resolver = "my.clazz"

// if I want to use the built-in logic described above
database-regex-from-persistence-id = "(regexForEntityTypeHint)\|(regexForClient)_(regexForEntityObjectId)" // e.g. "(\w+)\|(\w+)_([a-z0-9]+)"
database-regex-prefix = "mycompany"
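
For illustration, the built-in regex logic could boil down to something like this (all names here are made up; note the escaped literal |):

// Extract the client segment from ids like "PersonActor|potato_<objectId>"
// and prepend the configured prefix to form the database name.
val TenantPattern = """(\w+)\|(\w+)_([a-zA-Z0-9]+)""".r

def resolveDatabase(persistenceId: String, prefix: String): Option[String] =
  persistenceId match {
    case TenantPattern(_, client, _) => Some(s"$prefix-$client")
    case _                           => None
  }

// resolveDatabase("PersonActor|potato_5f1a2b3c", "mycompany") => Some("mycompany-potato")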
Brian Scully
@scullxbones
Ok. This does seem more sophisticated, and I haven't had the request before. It's probably best to run with your original approach of doing it manually. I'd like it done in the same interface as the suffixed collection names, since these things are related.
Wanting to separate events into different collections, different DBs, or different DBs + different collections all seem like valid forms of either sharding or multi-tenancy.
What are the thoughts on the read/query side? How to support multi-collection / multi-DB for the various Akka-specified queries as well as the all-events query?
Gaël Ferrachat
@gael-ft

That's ok for me.

Regarding the read/query side, my opinion is that if you configured your write side as multi-tenant, then the read side should behave the same.
In fact, if the application wants to merge different streams from different tenants, it can be done quite easily with Akka Streams (see the sketch below).
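
A minimal sketch of that application-side merge, assuming one read journal per tenant registered under hypothetical per-tenant plugin ids:

import akka.actor.ActorSystem
import akka.persistence.query.{NoOffset, PersistenceQuery}
import akka.contrib.persistence.mongodb.ScalaDslMongoReadJournal

implicit val system: ActorSystem = ActorSystem("tenants")

// One read journal per tenant; the plugin ids below are made up and would
// each point at a tenant-specific plugin configuration block.
val potato = PersistenceQuery(system)
  .readJournalFor[ScalaDslMongoReadJournal]("potato-readjournal")
val tomato = PersistenceQuery(system)
  .readJournalFor[ScalaDslMongoReadJournal]("tomato-readjournal")

// Merge the per-tenant tagged streams into one application-level stream.
val merged = potato.currentEventsByTag("person", NoOffset)
  .merge(tomato.currentEventsByTag("person", NoOffset))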

Note that the opposite could be done as well (i.e. stream all events, then the app filters them), but to me it looks way more dangerous because tenants are mixed by default...
and from a performance point of view, I think the first approach is better as well.

But I have to admit that I don't really know how to handle the tenant through the interfaces provided by Akka.
Those interfaces are bound to a plugin identifier, and then provide methods which do not have any notion of tenancy.

I will have to dig into the code to see what the solutions are. Do you have any ideas?

As I write these lines, I think we could have a method on the ScalaDslMongoReadJournal to set the database, if it was not given in the configuration (i.e. database-resolver was given).
Elmar Sonnenschein
@nleso
I noticed that on errors the full connection URL will be logged, including the credentials. This leaks sensitive information into any consumers of the log. Is there a way to specify the credentials or at least the password outside of the URL?
Brian Scully
@scullxbones

Hi @nleso -

It is not covered in the latest documentation, but there is a legacy approach to supplying connection information to the plugin that can be seen in the older documentation. The implementation is in MongoSettings.MongoUri:

  val MongoUri: String = Try(config.getString("mongouri")).toOption match {
    case Some(uri) => uri
    case None => // Use legacy approach
      val Urls = config.getStringList("urls").asScala.toList.mkString(",")
      val Username = Try(config.getString("username")).toOption
      val Password = Try(config.getString("password")).toOption
      val DbName = config.getString("db")
      (for {
        user <- Username
        password <- Password
      } yield {
        s"mongodb://$user:$password@$Urls/$DbName"
      }) getOrElse s"mongodb://$Urls/$DbName"
  }

You can see that if the mongouri configuration is not supplied, it falls back to the legacy fields urls, username, password, db... which, now that I've typed all this out, I see just generates a URI anyway. Hmm
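
Based on those legacy fields, the configuration would look something like this (placeholder values):

akka.contrib.persistence.mongodb.mongo {
  urls = ["mongo-1:27017", "mongo-2:27017"]
  username = "user"
  password = "changeit"
  db = "events"
}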

do you know where the logging is coming from? is it from the plugin or the underlying driver?
Elmar Sonnenschein
@nleso
No, I have no idea. I had just noticed the full URL appearing in the logs after a wrong configuration caused a connection error. It doesn't occur if all works well, so normally nobody will notice. But still, it's a bit uncomfortable to have credentials leaking into the cluster-wide log system on network errors... :-)
Brian Scully
@scullxbones
Can you share the log statement? Omitting the credentials of course :)
Elmar Sonnenschein
@nleso

One error message was:

Could not parse URI 'mongodb://<user>:<pw>@<host>:27017': authentication information found but no database name in URI

Another one occurred when the DB host was not reachable, but I don't have the exact error message available.