Martin Hradil
@himdel
but also... the 5-minute interval implies the user can't just be reusing the same token forever
so maybe the problem is just that their token renewal takes 5 minutes and 1 second, and uses an old token for that one extra second, or something like that
as for how to tell... on the appliance, in the rails console, you could run YAML.load(ManageIQ::Password.decrypt(x_miq_token)) to get a hash with the data in the token (x_miq_token being the actual token)
outside the appliance, you can't
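for example, something like this (the token value here is just a placeholder; paste your real X-MIQ-Token header value):

x_miq_token = "v2:{...}"   # the raw X-MIQ-Token header value
YAML.load(ManageIQ::Password.decrypt(x_miq_token))
# => a Hash with the data embedded in the token, including its timestamp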
bigo1981
@bigo1981

Hi @himdel I finally found the token error in the log

MIQ(Api::RequestsController.log_request) API Request: {:requested_at=>"2021-03-01 17:43:24 UTC"
MIQ(Api::RequestsController.rescue in authenticate_with_system_token) Authentication Failed with System Token
MIQ(Api::RequestsController.rescue in authenticate_with_system_token) X-MIQ-Token: v2:{A7WSSelfFYB2O.....
MIQ(Api::RequestsController.rescue in authenticate_with_system_token) Error: Invalid timestamp 2021-03-01 17:42:47 UTC specified

It seems the request was made 37 seconds after the timestamp. When is this v2 token created, and is there a way to increase how long it lasts?

Jason Frey
@Fryguy
with tokens it's very important that all servers have their time synced to NTP properly
Martin Hradil
@himdel

the token seems to come from the core repo, app/models/miq_region.rb api_system_auth_token
and the timeout is not 5 minutes, only 30 seconds now...

app/controllers/api/base_controller/authentication.rb
6:      SYSTEM_TOKEN_TTL = 30.seconds
136:        raise "Invalid timestamp #{timestamp} specified" if SYSTEM_TOKEN_TTL.ago.utc > timestamp

I don't think there's a toggle to change it
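applied to the log above, the arithmetic is simple (an illustrative snippet, runnable in a rails console where ActiveSupport is loaded):

token_time = Time.utc(2021, 3, 1, 17, 42, 47)   # timestamp embedded in the token
checked_at = Time.utc(2021, 3, 1, 17, 43, 24)   # :requested_at from the log
checked_at - token_time                         # => 37.0, the token is 37 seconds old
(checked_at - 30.seconds).utc > token_time      # => true, so "Invalid timestamp ... specified" is raised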

Jason Frey
@Fryguy
oh nice find @himdel - I wonder why it's so small
Martin Hradil
@himdel
yeah, sounds like it could get lost in the queue for longer
Jason Frey
@Fryguy
I guess the next question is what is taking up those 37 seconds? Is it the queue, or something on the caller side?
Martin Hradil
@himdel
^ is the change from 5 minutes to 30s
Jason Frey
@Fryguy
yeah, I can't find anything in the BZ explaining why
though that PR was a "revert"
Martin Hradil
@himdel
yeah, not sure either
but it looks like the only caller of that logic is inter_region_api_method_relay's api_client_connection_for_region,
called from process_tasks_mixin's invoke_tasks_remote
Jason Frey
@Fryguy
@abellotti do you remember?
Alberto Bellotti
@abellotti
We had shifted the System token to an OTP for security reasons, but we couldn't keep it as such (it was breaking central admin), so we brought it back as it was and shortened its TTL. Same question here: why is it taking 37 seconds?
Jason Frey
@Fryguy
usually those tokens, when used for central admin, are in an immediate request, so 37 seconds seems like a long time
for example, someone hits "power on a VM" at the global region, and while we put the request on the queue, I thought the token isn't generated until immediately before we make the remote call, so I would expect it to be used immediately (and then thrown away)
that was why I was thinking maybe the global region and the remote region have their system times drifting by more than 30 seconds
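in other words, the expected call path is roughly this shape (a hypothetical sketch, not the actual code; invoke_remote_action and remote_api_client are invented names, api_system_auth_token is the real entry point Martin found):

def invoke_remote_action(remote_region, userid)
  token = remote_region.api_system_auth_token(userid)  # minted immediately before the call, embeds the current time
  remote_api_client(remote_region, token).invoke       # used once, then thrown away
end
# with that flow, only clock drift between regions (or a slow remote side)
# should be able to push the token past its 30-second TTL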
Alberto Bellotti
@abellotti
Ahh, yep that would do it too. They should be NTP synced.
bigo1981
@bigo1981

Thanks for all the replies. The global region and remote region times are fully synced, to the second. We see that time increase as the load on the remote region db increases: if the load average on the db is below 10, the calls happen in around 15 seconds, but when the load gets to 20-30 we see the times go to 30 to 60 seconds. What it is doing during that time is a black box to us.

[----] I, [2021-03-01T15:15:14.713011 #26355:3e09950] INFO -- : MIQ(Api::ServiceCatalogsController.log_request) Response: {:completed_at=>"2021-03-01 20:15:14 UTC", :size=>"1.580 KBytes", :time_taken=>"15.014 Seconds", :status=>200}

[----] I, [2021-03-01T13:49:01.386888 #26389:3d5016c] INFO -- : MIQ(Api::ServiceCatalogsController.log_request) Response: {:completed_at=>"2021-03-01 18:49:01 UTC", :size=>"0.123 KBytes", :time_taken=>"60.334 Seconds", :status=>500}

We also see that the more active retire jobs we have in the remote region, the more load they put on the db; we're not sure why retires have that effect. If we have 20 simultaneous retires, the load average on the db goes up to 20, and it climbs faster the more we have active.
Jason Frey
@Fryguy
oh wow that is interesting
bigo1981
@bigo1981
Would appreciate any information on how these things correlate with each other and any possible solutions to help alleviate this issue
bigo1981
@bigo1981

Just to provide some more data: we updated the logs to show debug output, and we are seeing a single VM retire task perform this query 1200 times, with a different id each time. Is this expected behavior? In total we had over 70K lines for one retire.

Q-task_id([r5000000588240_vm_retire_task_5000000757649]) (85.3ms) SELECT "vms"."id" FROM "vms" WHERE (LOWER((SELECT "users"."userid" FROM "users" WHERE "users"."id" = "vms"."evm_owner_id")) = '**')

Jason Frey
@Fryguy
cc @gtanzillo ^
Nick LaMuro
@NickLaMuro
worth noting that the LOWER stuff is probably from the AuthenticationMixin, and is probably either a scope or a virtual_attribute
bigo1981
@bigo1981
Is this something that can be easily disabled, or is it because of how we have something configured? It doesn't make sense to make this call for every user when we already know the VM owner.
Jason Frey
@Fryguy
if you're somehow able to narrow down where that line is coming from that might help
It feels like an N+1 error, but in retirement I don't understand why that would happen, since we are retiring a single entity
bigo1981
@bigo1981
@Fryguy Any suggestions on ways I can try to narrow down where this is coming from? I tried to look through the appliance code to figure out what was calling it, but nothing stood out right away.
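One way to narrow it down from a rails console on the appliance (a sketch; verbose_query_logs needs Rails 5.2+):

# log the application call site of every query:
ActiveRecord::Base.verbose_query_logs = true

# or dump a backtrace only when the suspicious query runs:
ActiveSupport::Notifications.subscribe("sql.active_record") do |*_args, payload|
  if payload[:sql].include?('LOWER((SELECT "users"."userid"')
    Rails.logger.info(caller.first(10).join("\n"))
  end
end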
Evert Mulder
@evertmulder
Hi all. I am currently testing out ManageIQ Kasparov; we were using LDAP integration for authentication. Since this option was removed, I started testing with "OpenID connect with keycloak", using it for authentication and disabling local logins. I deploy ManageIQ using the manageiq operator on OpenShift. The UI is using the OpenID Connect flow now, but the API (/api/auth) is still using database authentication... Is there a way to authenticate the API when using the OpenID Connect flow?
Jason Frey
@Fryguy
@evertmulder Note that LDAP is still there, you just have to use the httpd-init container instead of the httpd container
it requires higher privileges, which is why it's not the default
Evert Mulder
@evertmulder
@Fryguy I managed to get the OpenID Connect flow working. I followed the instructions from https://www.manageiq.org/docs/reference/latest/auth/openid_connect.html, but I did need to change one setting: in the Group Membership mapper, I needed to add groups to the access token as well. Otherwise the introspection endpoint did not show the groups and I could not log in to the API. I will try the httpd-init container today.
thesystemninjaneer
@thesystemninjaneer
Happy Monday! I'm automating the deployment of a MIQ jansa-2 container on an AWS EC2 instance. Is it possible to enable the Embedded Ansible server role via an MIQ API call, appliance_console_cli, or some other IaC tooling?
Peter McGowan
@pemcg
@thesystemninjaneer You might be able to use configure_server_settings.rb in the tools directory (I’ve only ever tried this on a VM appliance rather than podified though). There’s an example playbook that uses this here: https://github.com/pemcg/configure-for-migration-analytics/blob/master/tasks/configure-server-settings.yml
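The invocation in that playbook boils down to roughly this (hypothetical values; check tools/configure_server_settings.rb --help on your build for the exact options, and note the value replaces the existing role list):

cd /var/www/miq/vmdb
ruby tools/configure_server_settings.rb --serverid=1 --path=server/role \
  --value="ems_inventory,ems_operations,event,scheduler,reporting,web_services,user_interface,embedded_ansible"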
lrhamann
@lrhamann
I am using the podified version of ManageIQ. I am trying to retire a VM using the api/vms/xxx REST API. The API returns a success response, but the VM is not being retired. Here is an example:
curl --insecure -H "Content-type: application/json" -d '{
  "action": "retire"
}' 'https://inframgmtinstall.apps.b-cp4mcm-23-1.cp.fyre.com/api/vms/702' --user "admin:smartvm"
{"success":true,"message":"VM id:702 name:'Lrh-rhcos-4.2' retiring","href":"https://inframgmtinstall.apps.b-cp4mcm-23-1.cp.fyre.com/api/vms/702"}%
Jason Frey
@Fryguy
@abellotti Is that the right action?
Alberto Bellotti
@abellotti
I think newer builds now also support "request_retire", as per https://www.manageiq.org/docs/reference/latest/api/reference/vms.html, though retire is still there. I think request_retire goes through an automate role.
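mirroring the earlier curl call, that would be (same host and credentials as your example):

curl --insecure -H "Content-type: application/json" -d '{
  "action": "request_retire"
}' 'https://inframgmtinstall.apps.b-cp4mcm-23-1.cp.fyre.com/api/vms/702' --user "admin:smartvm"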
lrhamann
@lrhamann
More info on the retire request: after the REST API call is issued, the "Retirement State" for the VM is set to "initializing", but the VM is never retired.
Alberto Bellotti
@abellotti
weird, sounds like the provider role is not enabled to get those retire events. Any errors in the logs?
Jason Frey
@Fryguy
@lrhamann you should also be able to trace the request through the automate logs (grep for the q-task-id, I think that is shared among them)
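e.g., on the appliance (the task id here is just the format from the earlier retire log; yours will differ):

grep 'r5000000588240_vm_retire_task_5000000757649' /var/www/miq/vmdb/log/automation.log
grep 'r5000000588240_vm_retire_task_5000000757649' /var/www/miq/vmdb/log/evm.log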
lrhamann
@lrhamann

I see the following in the logs.
production.log

[----] I, [2021-03-26T15:28:31.985878 #8133:2b093270cb58]  INFO -- : Started POST "/api/vms/68" for 9.211.146.90 at 2021-03-26 15:28:31 -0400
[----] I, [2021-03-26T15:28:31.995340 #8133:2b093270cb58]  INFO -- : Processing by Api::VmsController#update as JSON
[----] I, [2021-03-26T15:28:31.995492 #8133:2b093270cb58]  INFO -- :   Parameters: {"c_id"=>"68", "vm"=>{}}
[----] I, [2021-03-26T15:28:32.400886 #8133:2b093270cb58]  INFO -- : Completed 200 OK in 405ms (Views: 0.2ms | ActiveRecord: 14.2ms | Allocations: 12025)

evm.log

[----] I, [2021-03-26T15:28:32.397823 #8133:2b093270cb58]  INFO -- : MIQ(ManageIQ::Providers::Vmware::InfraManager::Vm#retire_now) Vm: retirement for [id:<68>, name:<Lrh-rhcos-4.2>] got updated while waiting to be unlocked and is now initializing

I do not find any reference to this transaction in the automate log, but I am not sure what I should be looking for.

Jason Frey
@Fryguy
was there anything in the automate logs on any generic pod, or were they all empty?
lrhamann
@lrhamann
I am currently testing using the appliance version of ManageIQ.
Jason Frey
@Fryguy
FYI, api is currently :red_circle: - I think it was ManageIQ/manageiq#21207, and @jrafanie is on it