Re: [vdsm] May I apply for a user account on jenkins.ovirt.org to run VDSM functional tests?
by Ewoud Kohl van Wijngaarden
On Tue, Jan 29, 2013 at 06:15:08AM -0500, Eyal Edri wrote:
> ----- Original Message -----
> > From: "Zhou Zheng Sheng" <zhshzhou(a)linux.vnet.ibm.com>
> > To: infra(a)ovirt.org
> > Cc: "ShaoHe Feng" <shaohef(a)linux.vnet.ibm.com>
> > Sent: Tuesday, January 29, 2013 12:24:27 PM
> > Subject: May I apply for a user account on jenkins.ovirt.org to run VDSM functional tests?
> >
> > Hi all,
> >
> > I notice there are no VDSM functional tests running in oVirt Jenkins.
> > Currently in VDSM we have some XML-RPC functional test cases for iSCSI,
> > localfs and glusterfs storage, as well as for creating and destroying VMs
> > on those storage types. Functional tests through JSON-RPC are under review. I
> > have also submitted a patch to Gerrit for running the tests easily
> > (http://gerrit.ovirt.org/#/c/11238/). More test cases will be added to
> > improve test coverage and reduce the chance of regression.
> >
> > Some bugs that cannot be covered by unit tests can be caught by
> > functional tests. I think it would be helpful to run these functional
> > tests continuously. We can also configure the Gerrit trigger in Jenkins
> > to run the functional tests when someone verifies a patch, or when it
> > gets approved but is not yet merged. This may be helpful to the maintainer.
> >
> > I've set up a Jenkins job for VDSM functional tests on my lab server. You
> > can refer to the job configuration of my current setup
> > (https://github.com/edwardbadboy/vdsm-jenkins/blob/master/config.xml).
> > After my patch in Gerrit is accepted, the job configuration will be
> > simpler and the hacks can be removed. May I apply for a user account for
> > creating jobs in the oVirt Jenkins?
> >
>
> Hi Zhou,
> Basically there shouldn't be any problem with that.
> We have an option for giving 'power-user' permissions to certain
> users on oVirt misc projects to add and configure jobs for their
> project.
>
> It requires knowledge of Jenkins, which it seems you have, and
> recognition from the team/other developers of the relevant project
> (in this case, VDSM) that you are an active member of the project
> (essentially just a formality).
>
> I've added the engine-devel list to this thread so anyone from the vdsm team
> can vote +1 for adding you as a power user for Jenkins.
>
> Once we receive a few +1s and no objections, I'll create a user for
> you and send you the details.
>
I think vdsm-devel is more relevant here.
Meeting minutes, Jan 28
by Dan Kenigsberg
Following are the notes that I've taken during the call.
I'm pretty sure that I'm missing something important, such as another blocker
BZ, so please reply with corrections/additions.
- Adam: going to introduce Python binding for the new Vdsm API
- Sharad has joined our call for the first time. Welcome!
Interested in backup/restore functionality in oVirt, to facilitate
integration with TSM, presumably
http://www-142.ibm.com/software/products/us/en/tivostormana/
- Federico warns that live snapshot is not enough, since we lack live
merge. After a year of daily backups, each qcow chain would be 365
hops deep.
ovirt-3.2:
- We should probably revert http://gerrit.ovirt.org/9315 (integrate
zombie reaper in supervdsmServer) as it tickles a bug in
multiprocessing. See
http://lists.ovirt.org/pipermail/users/2013-January/011857.html for
more details, in the thread starting at
http://lists.ovirt.org/pipermail/users/2013-January/011747.html
(thanks for raising the issue and finding the culprit, Dead Horse)
- Danken believes that all important el6-specific issues reported by dreyou are
now handled in the master and ovirt-3.2 branches. Please report to this list if
he's wrong.
- Federico reports that udev is going to revert to its Fedora 18alpha
behavior so that we can keep our integration with it. I'd love to see
the udev BZ listed in our tracker
https://bugzilla.redhat.com/showdependencytree.cgi?id=881006&hide_resolved=1
as I do not recall it. Federico, could you add it? We should probably
require a newer udev release.
- Federico reports that we must take a short-but-important patch from Lee
Yarwood: http://gerrit.ovirt.org/#/c/11281/
Cross-distro:
- Adam: a guy from IBM is now listing issues that block running Vdsm on
Ubuntu. Yay!
- Saggi suggests that IBM contribute an Ubuntu slave for Jenkins, so
that each and every patch is tested not to introduce Ubuntu
regressions - just as Red Hat has recently done with EL6.
- Danken: yes, we have got el6-based unit-testing running. If you are still
getting an unjustified X, you may need to rebase on current master.
- Federico: NetworkManager 9 has bridge support, which leads the way to an
NM-based implementation of Vdsm's configNetwork.
- Danken: Toni is working on a suggestion for refactoring configNetwork
in such a way that multiple implementations (e.g. ifcfg-based,
NM-based) can coexist. Stay tuned for his Wiki page on the subject.
Other refactoring going on:
- Vinzenz has begun breaking the horrible libvirtvm+vm into edible-sized
pieces. Reviews are most welcome, particularly from Mark Wu, whose
http://gerrit.ovirt.org/10054/ only awaits a verification tick.
Functional Testing:
- Adam: a guy from IBM (Zhou, I presume) is running the functional tests
via Jenkins on his laptop. Yay!
- Adam: plans to add a functional test for the getVdsCaps verb, as he
found out that an evil Vdsm developer has added values to the caps
without updating the schema.
- P.S. FlowID is now in for storage verbs as well (thanks, Douglas). I
hope Haim is happier now.
Happy coding!
A question about the SPM operation permission in VDSM
by shuming@linux.vnet.ibm.com
Hi,
Looking at the VDSM code for some SPM operations like HSM.deleteImage(), it turns
out that VDSM doesn't check whether the operation will be launched on the SPM
host or not. It only checks that the storage pool is already acquired by
some SPM host, which is not necessarily the same host the SPM operation is
delivered to. The code looks like this:
HSM.deleteImage()
{
    ...
    HSM._spmSchedule()
    {
        self.validateSPM(spUUID)  <--- only checks that the storage pool was
                                       acquired by some host, not necessarily this host
    }
    ...
}
So it really depends on the node management application, AKA ovirt-engine,
to dispatch the SPM operations to the right VDSM host, and the VDSM host
itself doesn't check whether it is the SPM host that may execute the
operation. To me, this is a bit broken. When the engine queries the VDSM
hosts for which one is the SPM host, it can get the right one. However, the
host may break for some reason after the engine decides it is the SPM host;
if it loses the SPM privilege, another host will take the SPM role. The
engine then continues to send SPM operations to the broken host, and as a
result an SPM operation will be launched on a non-SPM host. So I think there
is a small window of racing that can corrupt the VDSM hosts' metadata. I
think a VDSM host should check that it is the SPM before the SPM job is
scheduled. If the host has already lost the SPM role, it should fail the RPC
call from the engine, letting the engine retry the operation once it knows
the former call failed.
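For illustration, here is a minimal sketch of the kind of host-side guard being
proposed. The helper names used (spmRole.isSpm(), SpmStatusError, taskMng.scheduleJob)
are assumptions made up for this example, not the actual VDSM internals:

    def _spmSchedule(self, spUUID, name, func, *args):
        # Existing check: the pool is acquired by *some* SPM host.
        self.validateSPM(spUUID)
        # Proposed additional check: is *this* host still the SPM?
        if not self.spmRole.isSpm():
            # Fail the RPC fast so the engine can retry or redirect, instead
            # of letting a stale host touch shared metadata.
            raise SpmStatusError("host lost the SPM role for pool %s" % spUUID)
        self.taskMng.scheduleJob("spm", None, name, func, *args)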
--
---
舒明 Shu Ming
Open Virtualization Engineering; CSTL, IBM Corp.
Tel: 86-10-82451626 Tieline: 9051626 E-mail: shuming(a)cn.ibm.com or shuming(a)linux.vnet.ibm.com
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, Beijing 100193, PRC
Re: [vdsm] [Users] latest vdsm cannot read ib device speeds causing storage attach fail
by wudxw@linux.vnet.ibm.com
Great work!
The default action for SIGCHLD is to ignore it, so there were no problems
reported before a signal handler was installed by the zombie reaper.
But I still have one question: the python multiprocessing manager code
runs in a new thread, and according to the implementation of Python's
signal handling, only the main thread can receive signals.
So how is the signal delivered to the server thread?
On Fri 25 Jan 2013 12:30:39 PM CST, Royce Lv wrote:
>
> Hi,
> I reproduced this issue, and I believe it's a python bug.
> 1. How to reproduce:
> with the test case attached, put it under /usr/share/vdsm/tests/,
> run #./run_tests.sh superVdsmTests.py
> and this issue will be reproduced.
> 2. Log analysis:
> We notice a strange pattern in this log: connectStorageServer is
> called twice; the first supervdsm call succeeds, the second fails because of
> validateAccess().
> That is because for the first call validateAccess returns normally
> and leaves a child process behind. When the second validateAccess call arrives
> and the multiprocessing manager is receiving the method message, that is
> just when the first child exits and SIGCHLD arrives. This signal
> interrupts the multiprocessing recv() system call; python's managers.py
> should handle EINTR and retry recv() like we do in vdsm, but it does not,
> so the second call raises an error.
> >Thread-18::DEBUG::2013-01-22 10:41:03,570::misc::85::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 192.168.0.1:/ovirt/silvermoon /rhev/data-center/mnt/192.168.0.1:_ovirt_silvermoon' (cwd None)
> >Thread-18::DEBUG::2013-01-22 10:41:03,607::misc::85::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 192.168.0.1:/ovirt/undercity /rhev/data-center/mnt/192.168.0.1:_ovirt_undercity' (cwd None)
> >Thread-18::ERROR::2013-01-22 10:41:03,627::hsm::2215::Storage.HSM::(connectStorageServer) Could not connect to storageServer
> >Traceback (most recent call last):
> > File "/usr/share/vdsm/storage/hsm.py", line 2211, in connectStorageServer
> > conObj.connect()
> > File "/usr/share/vdsm/storage/storageServer.py", line 303, in connect
> > return self._mountCon.connect()
> > File "/usr/share/vdsm/storage/storageServer.py", line 209, in connect
> > fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)
> > File "/usr/share/vdsm/storage/fileSD.py", line 55, in validateDirAccess
> > (os.R_OK | os.X_OK))
> > File "/usr/share/vdsm/supervdsm.py", line 81, in __call__
> > return callMethod()
> > File "/usr/share/vdsm/supervdsm.py", line 72, in <lambda>
> > **kwargs)
> > File "<string>", line 2, in validateAccess
> > File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
> > raise convert_to_error(kind, result)
> the vdsm side receives a RemoteError because the supervdsm server's
> multiprocessing manager raises an error with KIND='TRACEBACK'
> >RemoteError:
> The upper part is the traceback from the client side; the following
> part is from the server side:
> >---------------------------------------------------------------------------
> >Traceback (most recent call last):
> > File "/usr/lib64/python2.6/multiprocessing/managers.py", line 214, in serve_client
> > request = recv()
> >IOError: [Errno 4] Interrupted system call
> >---------------------------------------------------------------------------
>
> Corresponding Python source code: managers.py (server side)
>     def serve_client(self, conn):
>         '''
>         Handle requests from the proxies in a particular process/thread
>         '''
>         util.debug('starting server thread to service %r',
>                    threading.current_thread().name)
>         recv = conn.recv
>         send = conn.send
>         id_to_obj = self.id_to_obj
>         while not self.stop:
>             try:
>                 methodname = obj = None
>                 request = recv()   <------------------ this line gets interrupted by SIGCHLD
>                 ident, methodname, args, kwds = request
>                 obj, exposed, gettypeid = id_to_obj[ident]
>                 if methodname not in exposed:
>                     raise AttributeError(
>                         'method %r of %r object is not in exposed=%r' %
>                         (methodname, type(obj), exposed)
>                         )
>                 function = getattr(obj, methodname)
>                 try:
>                     res = function(*args, **kwds)
>                 except Exception, e:
>                     msg = ('#ERROR', e)
>                 else:
>                     typeid = gettypeid and gettypeid.get(methodname, None)
>                     if typeid:
>                         rident, rexposed = self.create(conn, typeid, res)
>                         token = Token(typeid, self.address, rident)
>                         msg = ('#PROXY', (rexposed, token))
>                     else:
>                         msg = ('#RETURN', res)
>             except AttributeError:
>                 if methodname is None:
>                     msg = ('#TRACEBACK', format_exc())
>                 else:
>                     try:
>                         fallback_func = self.fallback_mapping[methodname]
>                         result = fallback_func(
>                             self, conn, ident, obj, *args, **kwds
>                             )
>                         msg = ('#RETURN', result)
>                     except Exception:
>                         msg = ('#TRACEBACK', format_exc())
>             except EOFError:
>                 util.debug('got EOF -- exiting thread serving %r',
>                            threading.current_thread().name)
>                 sys.exit(0)
>             except Exception:   <------ does not handle IOError/EINTR here; should retry recv()
>                 msg = ('#TRACEBACK', format_exc())
>
>
> 3. Actions we will take:
> (1) As a workaround we can first remove the zombie reaper from
> the supervdsm server
> (2) I'll see whether python has a fixed version for this
> (3) Yaniv is working on changing the vdsm/svdsm communication channel to
> a pipe and handling it ourselves; I believe we'll get rid of this once
> that is properly handled.
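For reference, a minimal sketch of the EINTR-safe retry idiom referred to above.
This is illustrative only (it assumes a conn.recv()-style blocking call) and is not
the actual vdsm or CPython fix:

    import errno

    def retrying_recv(conn):
        # Retry the blocking read when it is interrupted by a signal such as
        # SIGCHLD, instead of letting EINTR bubble up as a '#TRACEBACK' reply.
        while True:
            try:
                return conn.recv()
            except IOError as e:
                if e.errno == errno.EINTR:
                    continue
                raise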
>
>
> On 01/25/2013 06:00 AM, Dead Horse wrote:
>> Tried some manual edits to SD states in the dbase. The net result was
>> I was able to get a node active. However as reconstructing the master
>> storage domain kicked in it was unable to do so. It was also not able
>> to recognize the other SD with similar failure modes to the
>> unrecognized master above. Guessing the newer VDSM version borked
>> things pretty good. Since this is a test harness and the SD
>> data is not worth saving, I just smoked all the SDs. I ran
>> engine-cleanup, started fresh, and all is well now.
>>
>> - DHC
>>
>>
>> On Thu, Jan 24, 2013 at 11:53 AM, Dead Horse
>> <deadhorseconsulting(a)gmail.com
>> <mailto:deadhorseconsulting@gmail.com>> wrote:
>>
>> This test harness setup here consists of two servers tied to NFS
>> storage via IB (NFS mounts are via IPoIB, NFS over RDMA is
>> disabled) . All storage domains are NFS. The issue does occur
>> with both servers on when attempting to bring them out of
>> maintenance mode with the end result being non-operational due to
>> storage attach fail.
>>
>> The current issue is now that with a working older commit the
>> master storage domain is "stuck" in state "locked" and I see the
>> secondary issue wherein VDSM cannot seem to find or contact the
>> master storage domain even though it is there. I can mount the
>> master storage domain manually and all content appears to be
>> accounted for accordingly on either host.
>>
>> Here is the current contents of the master storage domain metadata:
>> CLASS=Data
>> DESCRIPTION=orgrimmar
>> IOOPTIMEOUTSEC=1
>> LEASERETRIES=3
>> LEASETIMESEC=5
>> LOCKPOLICY=
>> LOCKRENEWALINTERVALSEC=5
>> MASTER_VERSION=417
>> POOL_DESCRIPTION=Azeroth
>> POOL_DOMAINS=0549ee91-4498-4130-8c23-4c173b5c0959:Active,d8b55105-c90a-465d-9803-8130da9a671e:Active,67534cca-1327-462a-b455-a04464084b31:Active,c331a800-839d-4d23-9059-870a7471240a:Active,f8984825-ff8d-43d9-91db-0d0959f8bae9:Active,c434056e-96be-4702-8beb-82a408a5c8cb:Active,f7da73c7-b5fe-48b6-93a0-0c773018c94f:Active,82e3b34a-6f89-4299-8cd8-2cc8f973a3b4:Active,e615c975-6b00-469f-8fb6-ff58ae3fdb2c:Active,5bc86532-55f7-4a91-a52c-fad261f322d5:Active,1130b87a-3b34-45d6-8016-d435825c68ef:Active
>> POOL_SPM_ID=1
>> POOL_SPM_LVER=6
>> POOL_UUID=f90a0d1c-06ca-11e2-a05b-00151712f280
>> REMOTE_PATH=192.168.0.1:/ovirt/orgrimmar
>> ROLE=Master
>> SDUUID=67534cca-1327-462a-b455-a04464084b31
>> TYPE=NFS
>> VERSION=3
>> _SHA_CKSUM=1442bb078fd8c9468d241ff141e9bf53839f0721
>>
>> So now with the older working commit I now get this the
>> "StoragePoolMasterNotFound: Cannot find master domain" error
>> (prior details above when I worked backwards to that commit)
>>
>> This is odd as the nodes can definitely reach the master storage
>> domain:
>>
>> showmount from one of the el6.3 nodes:
>> [root@kezan ~]# showmount -e 192.168.0.1
>> Export list for 192.168.0.1 <http://192.168.0.1>:
>> /ovirt/orgrimmar 192.168.0.0/16 <http://192.168.0.0/16>
>>
>> mount/ls from one of the nodes:
>> [root@kezan ~]# mount 192.168.0.1:/ovirt/orgrimmar /mnt
>> [root@kezan ~]# ls -al
>> /mnt/67534cca-1327-462a-b455-a04464084b31/dom_md/
>> total 1100
>> drwxr-xr-x 2 vdsm kvm 4096 Jan 24 11:44 .
>> drwxr-xr-x 5 vdsm kvm 4096 Oct 19 16:16 ..
>> -rw-rw---- 1 vdsm kvm 1048576 Jan 19 22:09 ids
>> -rw-rw---- 1 vdsm kvm 0 Sep 25 00:46 inbox
>> -rw-rw---- 1 vdsm kvm 2097152 Jan 10 13:33 leases
>> -rw-r--r-- 1 vdsm kvm 903 Jan 10 13:39 metadata
>> -rw-rw---- 1 vdsm kvm 0 Sep 25 00:46 outbox
>>
>>
>> - DHC
>>
>>
>>
>> On Thu, Jan 24, 2013 at 7:51 AM, ybronhei <ybronhei(a)redhat.com
>> <mailto:ybronhei@redhat.com>> wrote:
>>
>> On 01/24/2013 12:44 AM, Dead Horse wrote:
>>
>> I narrowed down on the commit where the originally
>> reported issue crept in:
>> commit fc3a44f71d2ef202cff18d7203b9e4165b546621; building
>> and testing with
>> this commit or subsequent commits yields the original issue.
>>
>> Interesting.. it might be related to this commit and we're
>> trying to reproduce it.
>>
>> Did you try to remove that code and run again? Does it work
>> without the addition of zombieReaper?
>> Does the connectivity to the storage work well? When you run
>> 'ls' on the mounted folder, do you see the files without a
>> long delay? It might be related to too long a timeout when
>> validating access to this mount..
>> We work on that.. any additional info can help.
>>
>> Thanks.
>>
>>
>> - DHC
>>
>>
>> On Wed, Jan 23, 2013 at 3:56 PM, Dead Horse
>> <deadhorseconsulting(a)gmail.com
>> <mailto:deadhorseconsulting@gmail.com>>wrote:
>>
>> Indeed reverting back to an older vdsm clears up the
>> above issue. However
>> now the issue I see is:
>> Thread-18::ERROR::2013-01-23
>> 15:50:42,885::task::833::TaskManager.Task::(_setError)
>> Task=`08709e68-bcbc-40d8-843a-d69d4df40ac6`::Unexpected
>> error
>>
>> Traceback (most recent call last):
>> File "/usr/share/vdsm/storage/task.py", line 840,
>> in _run
>> return fn(*args, **kargs)
>> File "/usr/share/vdsm/logUtils.py", line 42, in
>> wrapper
>> res = f(*args, **kwargs)
>> File "/usr/share/vdsm/storage/hsm.py", line 923,
>> in connectStoragePool
>> masterVersion, options)
>> File "/usr/share/vdsm/storage/hsm.py", line 970,
>> in _connectStoragePool
>> res = pool.connect(hostID, scsiKey, msdUUID,
>> masterVersion)
>> File "/usr/share/vdsm/storage/sp.py", line 643, in
>> connect
>> self.__rebuild(msdUUID=msdUUID,
>> masterVersion=masterVersion)
>> File "/usr/share/vdsm/storage/sp.py", line 1167,
>> in __rebuild
>> self.masterDomain =
>> self.getMasterDomain(msdUUID=msdUUID,
>> masterVersion=masterVersion)
>> File "/usr/share/vdsm/storage/sp.py", line 1506,
>> in getMasterDomain
>> raise se.StoragePoolMasterNotFound(self.spUUID,
>> msdUUID)
>> StoragePoolMasterNotFound: Cannot find master domain:
>> 'spUUID=f90a0d1c-06ca-11e2-a05b-00151712f280,
>> msdUUID=67534cca-1327-462a-b455-a04464084b31'
>> Thread-18::DEBUG::2013-01-23
>> 15:50:42,887::task::852::TaskManager.Task::(_run)
>> Task=`08709e68-bcbc-40d8-843a-d69d4df40ac6`::Task._run:
>> 08709e68-bcbc-40d8-843a-d69d4df40ac6
>> ('f90a0d1c-06ca-11e2-a05b-00151712f280', 2,
>> 'f90a0d1c-06ca-11e2-a05b-00151712f280',
>> '67534cca-1327-462a-b455-a04464084b31', 433) {}
>> failed - stopping task
>>
>> This is with vdsm built from
>> commit 25a2d8572ad32352227c98a86631300fbd6523c1
>> - DHC
>>
>>
>> On Wed, Jan 23, 2013 at 10:44 AM, Dead Horse <
>> deadhorseconsulting(a)gmail.com
>> <mailto:deadhorseconsulting@gmail.com>> wrote:
>>
>> VDSM was built from:
>> commit 166138e37e75767b32227746bb671b1dab9cdd5e
>>
>> Attached is the full vdsm log
>>
>> I should also note that from engine perspective
>> it sees the master
>> storage domain as locked and the others as unknown.
>>
>>
>> On Wed, Jan 23, 2013 at 2:49 AM, Dan Kenigsberg
>> <danken(a)redhat.com <mailto:danken@redhat.com>>wrote:
>>
>> On Tue, Jan 22, 2013 at 04:02:24PM -0600,
>> Dead Horse wrote:
>>
>> Any ideas on this one? (from VDSM log):
>> Thread-25::DEBUG::2013-01-22
>> 15:35:29,065::BindingXMLRPC::914::vds::(wrapper)
>> client
>>
>> [3.57.111.30]::call
>>
>> getCapabilities with () {}
>> Thread-25::ERROR::2013-01-22
>> 15:35:29,113::netinfo::159::root::(speed)
>> cannot read ib0 speed
>> Traceback (most recent call last):
>> File
>> "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py",
>> line 155,
>>
>> in
>>
>> speed
>> s =
>> int(file('/sys/class/net/%s/speed' %
>> dev).read())
>> IOError: [Errno 22] Invalid argument
>>
>> Causes VDSM to fail to attach storage
>>
>>
>> I doubt that this is the cause of the
>> failure, as vdsm has always
>> reported "0" for ib devices, and still does.
>>
>> it happens only when you call getCapabilities.. so it
>> isn't related to the flow, and it can't affect the storage.
>> Dan: I guess this is not the issue, but why the IOError?
>>
>>
>> Does a former version works with your Engine?
>> Could you share more of your vdsm.log? I
>> suppose the culprit lies in one
>> one of the storage-related commands, not in
>> statistics retrieval.
>>
>>
>> Engine side sees:
>> ERROR
>> [org.ovirt.engine.core.bll.storage.NFSStorageHelper]
>> (QuartzScheduler_Worker-96) [553ef26e]
>> The connection with details
>> 192.168.0.1:/ovirt/ds failed because of
>> error code 100 and error
>>
>> message
>>
>> is: general exception
>> 2013-01-22 15:35:30,160 INFO
>> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
>> (QuartzScheduler_Worker-96) [1ab78378]
>> Running command:
>> SetNonOperationalVdsCommand internal:
>> true. Entities affected : ID:
>> 8970b3fe-1faf-11e2-bc1f-00151712f280
>> Type: VDS
>> 2013-01-22 15:35:30,200 INFO
>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>> (QuartzScheduler_Worker-96) [1ab78378] START,
>> SetVdsStatusVDSCommand(HostName = kezan,
>> HostId =
>> 8970b3fe-1faf-11e2-bc1f-00151712f280,
>> status=NonOperational,
>> nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE),
>> log id: 4af5c4cd
>> 2013-01-22 15:35:30,211 INFO
>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>> (QuartzScheduler_Worker-96) [1ab78378]
>> FINISH, SetVdsStatusVDSCommand,
>>
>> log
>>
>> id: 4af5c4cd
>> 2013-01-22 15:35:30,242 ERROR
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (QuartzScheduler_Worker-96) [1ab78378]
>> Try to add duplicate audit log
>> values with the same name. Type:
>> VDS_SET_NONOPERATIONAL_DOMAIN. Value:
>> storagepoolname
>>
>> Engine = latest master
>> VDSM = latest master
>>
>>
>> Since "latest master" is an unstable
>> reference by definition, I'm sure
>> that History would thank you if you post the
>> exact version (git hash?)
>> of the code.
>>
>> node = el6
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users(a)ovirt.org <mailto:Users@ovirt.org>
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>> --
>> Yaniv Bronhaim.
>> RedHat, Israel
>> 09-7692289
>> 054-7744187
>>
>>
>>
>
>
>
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
which vdsm version for 3.2?
by iheim@redhat.com
I tried running the nightly engine (well, since the 3.2 beta repo still contains
an old version).
From the beta repo I got this vdsm version:
vdsm-4.10.3-4.fc18.x86_64
It doesn't contain the latest getHardwareInfo which the engine expects [1].
I enabled the nightly repo, expecting a newer vdsm, but nightly has vdsm
versions of vdsm-4.10.3-0.87.
I assume -4 in beta isn't getting replaced by -0.87?
So which vdsm version are we using for the 3.2 beta?
[1] 2013-01-17 15:15:53,222 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-65) XML RPC error in command GetHardwareInfoVDS ( HostName = local_host ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, <type 'exceptions.Exception'>:method "getVdsHardwareInfo" is not supported
RFC: New Storage API
by smizrahi@redhat.com
I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit.
I will purposefully try to keep implementation details out and concentrate on how the API looks and how you use it.
The first major change is in terminology: there is no longer a storage domain but a storage repository.
This change is made because so many things are already called domain in the system, and this will make things less confusing for newcomers with a libvirt background.
Another change is that repositories no longer have a UUID.
The UUID was only used in the pool members manifest and is no longer needed.
connectStorageRepository(repoId, repoFormat, connectionParameters={}):
repoId - a transient name that will be used to refer to the connected repository; it is not persisted and doesn't have to be the same across the cluster.
repoFormat - Similar to what used to be the type (eg. localfs-1.0, nfs-3.4, clvm-1.2).
connectionParameters - This is format specific and will be used to tell VDSM how to connect to the repo.
disconnectStorageRepository(self, repoId):
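As an illustration of the intended usage, a hedged sketch of the connect/disconnect
pair; the repoFormat string and the NFS connection parameters shown are assumptions
based on the examples above, not a fixed contract:

    repo_id = "my-nfs-repo"   # transient, caller-chosen name
    connectStorageRepository(
        repoId=repo_id,
        repoFormat="nfs-3.4",
        connectionParameters={"server": "192.168.0.1",
                              "export": "/ovirt/data"})
    # ... create/copy/remove images against repo_id ...
    disconnectStorageRepository(repo_id)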
In the new API there are only images; some images are mutable and some are not.
Mutable images are also called VirtualDisks.
Immutable images are also called Snapshots.
There are no explicit templates; you can create as many images as you want from any snapshot.
There are 4 major image operations:
createVirtualDisk(targetRepoId, size, baseSnapshotId=None,
userData={}, options={}):
targetRepoId - ID of a connected repo where the disk will be created
size - The size of the image you wish to create
baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
userData - optional data that will be attached to the new VD, could be anything that the user desires.
options - options to modify VDSM's default behavior
Returns the ID of the new VD.
createSnapshot(targetRepoId, baseVirtualDiskId,
userData={}, options={}):
targetRepoId - The ID of a connected repo where the new snapshot will be created and where the original image exists as well.
baseVirtualDiskId - the ID of a mutable image (Virtual Disk) you want to snapshot
userData - optional data that will be attached to the new Snapshot, could be anything that the user desires.
options - options to modify VDSM's default behavior
Returns the ID of the new Snapshot.
copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={})
targetRepoId - The ID of a connected repo where the new image will be created
imageId - The image you wish to copy
baseImageId - if specified, the new image will contain only the diff between imageId and baseImageId.
If None, the new image will contain all the bits of imageId. This can be used to copy partial images for export.
userData - optional data that will be attached to the new image, could be anything that the user desires.
options - options to modify VDSM's default behavior
Returns the ID of the new image. When copying an immutable image the ID will be identical to the original image's, as they contain the same data. However, the user should not assume that and should always use the value returned from the method.
removeImage(repositoryId, imageId, options={}):
repositoryId - The ID of a connected repo where the image to delete resides
imageId - The id of the image you wish to delete.
----
getImageStatus(repositoryId, imageId)
repositoryId - The ID of a connected repo where the image to check resides
imageId - The id of the image you wish to check.
All operations return once the operation has been committed to disk, NOT when the operation actually completes.
This is done so that:
- operations come to a stable state as quickly as possible;
- in cases where there is an SDM, only a small portion of the operation actually needs to be performed on the SDM host;
- no matter how many times the operation fails, and on how many hosts, you can always resume the operation and choose when to do it;
- you can stop an operation at any time and remove the resulting object, making a distinction between "stop because the host is overloaded" and "I don't want that image".
This means that after calling any operation that creates a new image the user must then call getImageStatus() to check the status of the image.
The status of the image can be either optimized, degraded, or broken.
"Optimized" means that the image is available and you can run VMs off it.
"Degraded" means that the image is available and will run VMs, but there might be a better way for VDSM to represent the underlying data.
"Broken" means that the image can't be used at the moment, probably because not all the data has been set up on the volume.
Apart from that, VDSM will also return the last persisted status information, which will contain:
hostID - the last host to try and optimize or fix the image
stage - X/Y (eg. 1/10), the last persisted stage of the fix.
percent_complete - -1 or 0-100, the last persisted completion percentage of the aforementioned stage. -1 means that no progress is available for that operation.
last_error - This will only be filled if the operation failed because of something other than IO or a VDSM crash, for obvious reasons.
It will usually be set if the task was manually stopped.
The user can either be satisfied with that information, or ask the host specified in hostID whether it is still working on that image by checking its running tasks.
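To make the flow concrete, a hedged sketch of creating a disk and polling its status;
the exact shape of getImageStatus()'s return value is an assumption based on the
fields described above (status, hostID, stage, percent_complete, last_error):

    import time

    disk_id = createVirtualDisk(targetRepoId="my-nfs-repo", size=20 * 2**30)

    while True:
        info = getImageStatus("my-nfs-repo", disk_id)
        if info["status"] in ("optimized", "degraded"):
            break                      # usable for running VMs
        if info["status"] == "broken" and info.get("last_error"):
            raise RuntimeError(info["last_error"])
        time.sleep(5)                  # a fix is still running in the background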
checkStorageRepository(self, repositoryId, options={}):
A method to go over a storage repository and scan for any existing problems. This includes degraded\broken images and deleted images that have not yet been physically deleted\merged.
It returns a list of Fix objects.
Fix objects come in 4 types:
clean - cleans data, run them to get more space.
optimize - run them to optimize a degraded image
merge - Merges two images together. Doing this sometimes
makes more images ready for optimizing or cleaning.
The reason it is different from optimize is that
unmerged images are considered optimized.
mend - mends a broken image
The user can read these types and prioritize fixes. Fixes also contain opaque FIX data and they should be sent as received to
fixStorageRepository(self, repositoryId, fix, options={}):
That will start a fix operation.
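A hedged sketch of the intended check-and-fix loop; the attribute name fix.type on
the Fix objects is an assumption made for this example:

    fixes = checkStorageRepository("my-nfs-repo")

    # Prioritize: mend broken images first, then merge, optimize, and finally
    # reclaim space; fixes are opaque and are passed back exactly as received.
    priority = {"mend": 0, "merge": 1, "optimize": 2, "clean": 3}
    for fix in sorted(fixes, key=lambda f: priority.get(f.type, 99)):
        fixStorageRepository("my-nfs-repo", fix)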
All major operations automatically start the appropriate "Fix" to bring the created object to an optimized\degraded state (whichever is quicker), unless one of the options is
AutoFix=False. This is only useful for repos that might not be able to create volumes on all hosts (SDM) but would like to have the actual IO distributed in the cluster.
Another common option is the strategy option:
It currently has 2 possible values,
space and performance - In cases where VDSM has 2 ways of completing the same operation, this tells it which to value over the other. For example, whether to copy all the data or just create a qcow based on a snapshot.
The default is space.
You might have also noticed that it is never explicitly specified where to look for existing images. This is done purposefully; VDSM will always look in all connected repositories for existing objects.
For very large setups this might be problematic. To mitigate the problem you have these options:
participatingRepositories=[repoId, ...] which tells VDSM to narrow the search to just these repositories
and
imageHints={imgId: repoId} which will force VDSM to look for those image IDs just in those repositories and fail if it doesn't find them there.
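A small hedged example of combining these options; whether they are passed through
the generic options dict, as assumed here, is not spelled out above:

    new_disk = createVirtualDisk(
        targetRepoId="fast-repo",
        size=10 * 2**30,
        baseSnapshotId="snap-1234",
        options={"strategy": "performance",
                 "participatingRepositories": ["fast-repo", "archive-repo"],
                 "imageHints": {"snap-1234": "archive-repo"}})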
setupNetworks failure - Host non-operational
by deepakcs@linux.vnet.ibm.com
Hi All,
I have a multi-VM setup, where I have the oVirt engine on one VM and
the VDSM host on another.
Discovering the host from the engine puts the host in the Unassigned state,
with an error saying the 'ovirtmgmt' network was not found.
When I select setupNetworks and drag-drop ovirtmgmt to set it up over
eth0, I see the below error in VDSM and the host goes to the non-operational state.
I tried the steps mentioned by Alon in
http://lists.ovirt.org/pipermail/users/2012-December/011257.html
but I still see the same error.
============= dump from vdsm.log ================
MainProcess|Thread-23::ERROR::2013-01-22 18:25:53,496::configNetwork::1438::setupNetworks::(setupNetworks) Requested operation is not valid: cannot set autostart for transient network
Traceback (most recent call last):
  File "/usr/share/vdsm/configNetwork.py", line 1420, in setupNetworks
    implicitBonding=True, **d)
  File "/usr/share/vdsm/configNetwork.py", line 1030, in addNetwork
    configWriter.createLibvirtNetwork(network, bridged, iface)
  File "/usr/share/vdsm/configNetwork.py", line 208, in createLibvirtNetwork
    self._createNetwork(netXml)
  File "/usr/share/vdsm/configNetwork.py", line 192, in _createNetwork
    net.setAutostart(1)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2148, in setAutostart
    if ret == -1: raise libvirtError ('virNetworkSetAutostart() failed', net=self)
libvirtError: Requested operation is not valid: cannot set autostart for transient network
MainProcess|Thread-23::ERROR::2013-01-22 18:25:53,502::supervdsmServer::77::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer.py", line 170, in setupNetworks
    return configNetwork.setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/configNetwork.py", line 1420, in setupNetworks
    implicitBonding=True, **d)
  File "/usr/share/vdsm/configNetwork.py", line 1030, in addNetwork
    configWriter.createLibvirtNetwork(network, bridged, iface)
  File "/usr/share/vdsm/configNetwork.py", line 208, in createLibvirtNetwork
    self._createNetwork(netXml)
  File "/usr/share/vdsm/configNetwork.py", line 192, in _createNetwork
    net.setAutostart(1)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2148, in setAutostart
    if ret == -1: raise libvirtError ('virNetworkSetAutostart() failed', net=self)
libvirtError: Requested operation is not valid: cannot set autostart for transient network
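As a diagnostic aid, a hedged sketch using the libvirt Python bindings to list networks
and check which are transient (setAutostart() only works on persistent, i.e. defined,
networks); the network names printed will be whatever exists on the host, nothing is
assumed here about how vdsm named them:

    import libvirt

    conn = libvirt.open("qemu:///system")
    for name in conn.listNetworks() + conn.listDefinedNetworks():
        net = conn.networkLookupByName(name)
        print(name, "persistent" if net.isPersistent() else "transient")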
API Documentation & Since tag
by Vinzenz Feenstra
Hi everyone,
We are currently documenting the API in vdsmapi-schema.json.
I noticed that we have documented there when a certain element is newly
introduced, using the 'Since' tag.
However, I also noticed that we are not documenting when a field was
newly added, nor do we update the 'Since' tag.
We should start documenting in which version we introduced each field.
A suggestion by Saggi was to add to the comment, for example: @since: 4.10.3.
What is your point of view on this?
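For illustration, a field-level annotation might look roughly like the sketch below;
the entity and field names are made up, and the exact comment conventions of
vdsmapi-schema.json may differ:

    ##
    # @VmDeviceInfo:
    #
    # Hypothetical entity, for illustration only.
    #
    # @address:  device address
    #
    # @alias:    device alias  (@since: 4.10.3)
    #
    # Since: 4.10.0
    ##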
--
Regards,
Vinzenz Feenstra | Senior Software Engineer
RedHat Engineering Virtualization R & D
Phone: +420 532 294 625
IRC: vfeenstr or evilissimo
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
remote serial console via HTTP streaming handler
by smizrahi@redhat.com
I remember that there was a discussion about it but I
don't remember it ever converging.
In any case there is a patch upstream [1] that merits
discussion outside the scope of the patch and reviewers.
The solution is somewhat elegant (and only ~150 LOC).
That being said, I still have 2 major problems with it:
The simpler one is that it uses HTTP in a very non-standard manner; this
can easily be solved by using websockets [2]. This is very close
to what the patch already does and will make it follow some sort of
standard. This will also enable the Web UI to expose the console on
newer browsers.
The second and the real reason I didn't put it just as a comment on the
patch is that using HTTP and POST %PATH to have only one listening
socket for all VMs is completely different from the way we do VNC or SPICE.
This means it kind of bypasses ticketing and any other mechanism we want
to put on VM interfaces.
The thing is, I really like it. I would suggest that we extend this idiom
to SPICE and VNC as well, tunneling them through a single http\websocket
listener. So instead of making this work alongside the current methods, make this
the way to go.
Using headers like:
GET /VM/<VM_ID>/control HTTP/1.1
Host: server.example.com
Upgrade: websocket
Ticket: <TICKET>
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Protocol: [pty, vnc, spice]
Sec-WebSocket-Version: 13
Origin: http://example.com
I admit I have no idea whether SPICE migration would tolerate being tunneled, but I
guess there is no practical reason why that would be a problem.
[1] http://gerrit.ovirt.org/#/c/10381
[2] http://en.wikipedia.org/wiki/WebSocket