On Wed, May 19, 2021 at 12:42 PM Vojtech Juranek <vjuranek(a)redhat.com> wrote:
...
> > > +static PyObject *
> > > +disks_to_list(struct sanlk_disk *disks, uint32_t disks_count)
> > > +{
> > > + PyObject *result = NULL;
> > > + PyObject *disk = NULL;
> > > +
> > > + result = PyList_New(disks_count);
> > > + if (result == NULL)
> > > + return NULL;
>
> A memory error was set in this case...
Thanks, I didn't realize this. This was the part that was unclear to me (or
rather, that I misunderstood).
LGTM, +1 to the patch
Thanks.
Here are some tests using ovirt:
Using inquire on an idle SPM host:
>>> pprint.pprint(sanlock.inquire(pid=1956))
[{'disks': [('/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', 1048576)],
  'flags': 1,
  'lockspace': b'8ef0cc64-77c5-49e0-a0f5-03cee9bbee20',
  'resource': b'SDM',
  'version': 30}]
After starting a cold merge task on a VM with 20 disks, each on a different
storage domain. This starts 10 concurrent cold merge jobs on the SPM host,
and every job acquires the volume lease.
>>> pprint.pprint(sanlock.inquire(pid=1956))
[{'disks': [('/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', 1048576)],
  'flags': 1,
  'lockspace': b'8ef0cc64-77c5-49e0-a0f5-03cee9bbee20',
  'resource': b'SDM',
  'version': 30},
 {'disks': [('/dev/2f8c68ef-d5c1-4f34-9a6a-f7cb2f0aef0a/leases', 108003328)],
  'flags': 1,
  'lockspace': b'2f8c68ef-d5c1-4f34-9a6a-f7cb2f0aef0a',
  'resource': b'4b7dd8cb-f61d-4840-9f84-87135d0c4b3b',
  'version': 14},
 {'disks': [('/dev/747ea907-c24c-43ff-a389-660c729703b0/leases', 108003328)],
  'flags': 1,
  'lockspace': b'747ea907-c24c-43ff-a389-660c729703b0',
  'resource': b'fc34e8b4-fe7a-4d9f-a7e9-d4e4637d24e6',
  'version': 15},
 {'disks': [('/dev/71702e1e-82d2-41d2-93df-678f2415da2a/leases', 108003328)],
  'flags': 1,
  'lockspace': b'71702e1e-82d2-41d2-93df-678f2415da2a',
  'resource': b'e181720a-9edf-46cb-b5ff-83e986cb3db3',
  'version': 14},
 {'disks': [('/dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/leases', 108003328)],
  'flags': 1,
  'lockspace': b'31e3675e-302a-4a22-ad9a-402c0c1e9765',
  'resource': b'3003f575-c38b-4034-8c54-3f000239b5c9',
  'version': 15},
 {'disks': [('/dev/f5597bd4-5c4f-4c4c-bd98-e8a33dace830/leases', 108003328)],
  'flags': 1,
  'lockspace': b'f5597bd4-5c4f-4c4c-bd98-e8a33dace830',
  'resource': b'ba28ed6c-a0d9-4f89-821a-73a992d3a074',
  'version': 15},
 {'disks': [('/dev/e673e145-d644-44ac-a416-a8fc14b029ca/leases', 108003328)],
  'flags': 1,
  'lockspace': b'e673e145-d644-44ac-a416-a8fc14b029ca',
  'resource': b'bc8556b2-095a-429c-9a61-c46f6390c022',
  'version': 16},
 {'disks': [('/dev/06e43bfc-2ffe-419e-843b-59a5d23165e1/leases', 108003328)],
  'flags': 1,
  'lockspace': b'06e43bfc-2ffe-419e-843b-59a5d23165e1',
  'resource': b'8fb95469-16cd-406b-863d-2b76c2393711',
  'version': 16},
 {'disks': [('/dev/829b9bed-1b74-4f74-9d18-1006c87ef620/leases', 108003328)],
  'flags': 1,
  'lockspace': b'829b9bed-1b74-4f74-9d18-1006c87ef620',
  'resource': b'75b3f9c9-32e0-4b1e-a367-a3a3f1ab2950',
  'version': 16},
 {'disks': [('/dev/7cf51212-8247-41b0-ab38-da5954059d7a/leases', 108003328)],
  'flags': 1,
  'lockspace': b'7cf51212-8247-41b0-ab38-da5954059d7a',
  'resource': b'350353df-0434-4668-abdf-d7d55d5f0f49',
  'version': 14}]
When all jobs finished:
>>> pprint.pprint(sanlock.inquire(pid=1956))
[{'disks': [('/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', 1048576)],
  'flags': 1,
  'lockspace': b'8ef0cc64-77c5-49e0-a0f5-03cee9bbee20',
  'resource': b'SDM',
  'version': 30}]
Timing 1000 calls to sanlock.inquire() using pid:
>>> timeit.timeit("sanlock.inquire(pid=1956)",
...               setup="import sanlock", number=1000)
0.04519722999998521
Timing 1000 calls to sanlock.inquire() using slkfd:
>>> fd = sanlock.register()
>>> fd
4
>>> pprint.pprint(sanlock.inquire(slkfd=4))
[]
>>> timeit.timeit("sanlock.inquire(slkfd=4)",
...               setup="import sanlock", number=1000)
0.04666791799991188
Testing 1,000,000 calls to see how sanlock is affected: the client uses
about 50% CPU, and sanlock about 75% CPU.
No change was seen in the memory usage of the Python process.
Testing overloading sanlock during acquire/release operations: I ran 100,000
inquires while running 10 concurrent cold merges in ovirt.
One of the inquire calls failed with EBUSY:
>>> timeit.timeit("sanlock.inquire(pid=1956)",
...               setup="import sanlock", number=100000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/timeit.py", line 233, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/lib64/python3.6/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
sanlock.SanlockException: (16, 'Inquire error', 'Device or resource busy')
This is expected when using pid=. We will synchronize the calls to sanlock
in ovirt, so we don't expect to hit this issue.
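For reference, serializing the client side is simple. Here is a minimal
sketch (not the actual ovirt code): a hypothetical `call_serialized` helper
that guards sanlock calls with a process-wide lock and retries transient
EBUSY errors; the demo below uses a fake inquire() instead of the real
binding. The only assumption about the real binding is that
SanlockException carries the errno as its first argument, as seen in the
traceback above.

```python
import errno
import threading
import time

_sanlock_lock = threading.Lock()

def call_serialized(func, *args, retries=3, delay=0.1, **kwargs):
    """Run a sanlock operation under a process-wide lock, retrying
    a few times when the error looks like EBUSY (errno 16)."""
    with _sanlock_lock:
        for attempt in range(retries):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                busy = e.args and e.args[0] == errno.EBUSY
                if not busy or attempt == retries - 1:
                    raise
                time.sleep(delay)

# Demo: a fake inquire() that fails once with EBUSY, then succeeds.
calls = {"n": 0}

def fake_inquire(pid=None):
    calls["n"] += 1
    if calls["n"] == 1:
        raise Exception(errno.EBUSY, "Inquire error",
                        "Device or resource busy")
    return []

result = call_serialized(fake_inquire, pid=1956, delay=0)
print(result)  # []
```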
The inquire calls caused one of the sanlock.release() calls in vdsm to fail:
2021-05-19 16:46:49,120+0300 INFO (tasks/1) [storage.SANLock] Releasing Lease(name='c873c3e2-5da1-4614-b564-37324fefb9a1', path='/rhev/data-center/mnt/storage:_export_00/b2ff7cd6-64a6-453d-b71e-22fa4b11f16e/images/d4fa7500-306b-489b-88e7-48a3e1ff1a9b/c873c3e2-5da1-4614-b564-37324fefb9a1.lease', offset=0) (clusterlock:602)
2021-05-19 16:46:49,121+0300 ERROR (tasks/1) [storage.TaskManager.Task] (Task='352a44c2-09a2-46ae-9327-d5ae444ea312') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1856, in prepareMerge
    merge.prepare(subchainInfo)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/merge.py", line 183, in prepare
    subchain.top_vol)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/guarded.py", line 111, in __exit__
    self._release()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/guarded.py", line 128, in _release
    raise ReleaseError(errors)
vdsm.storage.guarded.ReleaseError: [ReleaseLockFailure('b2ff7cd6-64a6-453d-b71e-22fa4b11f16e', SanlockException(16, 'Sanlock resource not released', 'Device or resource busy'))]
We don't handle this in vdsm since we don't expect other processes to inquire
our leases.
This left an unreleased lease in vdsm:
>>> pprint.pprint(sanlock.inquire(pid=1956))
[{'disks': [('/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', 1048576)],
  'flags': 1,
  'lockspace': b'8ef0cc64-77c5-49e0-a0f5-03cee9bbee20',
  'resource': b'SDM',
  'version': 30},
 {'disks': [('/rhev/data-center/mnt/storage:_export_00/b2ff7cd6-64a6-453d-b71e-22fa4b11f16e/images/d4fa7500-306b-489b-88e7-48a3e1ff1a9b/c873c3e2-5da1-4614-b564-37324fefb9a1.lease', 0)],
  'flags': 1,
  'lockspace': b'b2ff7cd6-64a6-453d-b71e-22fa4b11f16e',
  'resource': b'c873c3e2-5da1-4614-b564-37324fefb9a1',
  'version': 17}]
I could release it from the Python shell using the inquire result:
>>> res = sanlock.inquire(pid=1956)
>>> res
[{'lockspace': b'8ef0cc64-77c5-49e0-a0f5-03cee9bbee20', 'resource': b'SDM',
  'flags': 1, 'version': 30,
  'disks': [('/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', 1048576)]},
 {'lockspace': b'b2ff7cd6-64a6-453d-b71e-22fa4b11f16e',
  'resource': b'c873c3e2-5da1-4614-b564-37324fefb9a1', 'flags': 1,
  'version': 17,
  'disks': [('/rhev/data-center/mnt/storage:_export_00/b2ff7cd6-64a6-453d-b71e-22fa4b11f16e/images/d4fa7500-306b-489b-88e7-48a3e1ff1a9b/c873c3e2-5da1-4614-b564-37324fefb9a1.lease', 0)]}]
>>> sanlock.release(res[1]["lockspace"], res[1]["resource"], res[1]["disks"],
...                 pid=1956)
>>> pprint.pprint(sanlock.inquire(pid=1956))
[{'disks': [('/dev/8ef0cc64-77c5-49e0-a0f5-03cee9bbee20/leases', 1048576)],
  'flags': 1,
  'lockspace': b'8ef0cc64-77c5-49e0-a0f5-03cee9bbee20',
  'resource': b'SDM',
  'version': 30}]
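The manual cleanup above can be generalized: given the list of dicts
returned by inquire(), release every lease whose resource is not in a known
keep-set. This is only a sketch over the result structure shown above;
`release_stale_leases` is a hypothetical helper, and the release callable is
injected so the demo can run without sanlock (in real use it would be
`sanlock.release`). The demo data mirrors the session above with shortened
UUIDs.

```python
def release_stale_leases(leases, keep_resources, release, pid):
    """Release every lease whose resource name is not in keep_resources.

    leases: list of dicts as returned by sanlock.inquire()
    release: a callable with the signature of sanlock.release()
    """
    released = []
    for lease in leases:
        if lease["resource"] in keep_resources:
            continue
        release(lease["lockspace"], lease["resource"], lease["disks"],
                pid=pid)
        released.append(lease["resource"])
    return released

# Demo with fake inquire() output shaped like the session above
# (UUIDs shortened for readability).
leases = [
    {"lockspace": b"8ef0cc64", "resource": b"SDM",
     "disks": [("/dev/8ef0cc64/leases", 1048576)],
     "flags": 1, "version": 30},
    {"lockspace": b"b2ff7cd6", "resource": b"c873c3e2",
     "disks": [("/path/c873c3e2.lease", 0)],
     "flags": 1, "version": 17},
]
release_calls = []
released = release_stale_leases(
    leases,
    keep_resources={b"SDM"},
    release=lambda *a, **kw: release_calls.append((a, kw)),
    pid=1956)
print(released)  # [b'c873c3e2']
```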
Nir