On Tue, Apr 13, 2010 at 5:22 AM, Chris Tyler <chris@tylers.info> wrote:
I've been testing iSCSI on an OpenRD-Client system (F12).
This is the configuration:
target: F12 x86_64 system running the netbsd-iscsi service, with a 100GB target available
initiator: F12 ARM OpenRD-Client running the iscsi and iscsid services, logged in to the 100GB target
On the 100GB target I have an ext3 filesystem. Under low load this works fine, but under high load it fails, leading to data corruption.
On the target, /var/log/messages shows:
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:iscsi.c:1149: ***ERROR*** Bad "Opcode": Got 1 expected 5.
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:target.c:1318: ***ERROR*** iscsi_write_data_decap() failed
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:iscsi.c:1149: ***ERROR*** Bad "Opcode": Got 49 expected 5.
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:target.c:1318: ***ERROR*** iscsi_write_data_decap() failed
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:iscsi.c:1149: ***ERROR*** Bad "Opcode": Got 13 expected 5.
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:target.c:1318: ***ERROR*** iscsi_write_data_decap() failed
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:iscsi.c:1149: ***ERROR*** Bad "Opcode": Got 38 expected 5.
Apr 11 09:53:40 hongkong iscsi-target: pid 1930:target.c:1318: ***ERROR*** iscsi_write_data_decap() failed
...snip...
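For reference when reading the target log: in iSCSI (RFC 3720) the opcode is the low six bits of the first byte of each PDU's basic header segment, and 5 is SCSI Data-Out, the PDU that carries write data. The short shell sketch below names the initiator opcodes; it assumes the target prints the values in decimal, and the function name is hypothetical. Decoded this way, "Got 1" is a SCSI Command PDU arriving where write data was expected, while 49, 13 and 38 are not opcodes an initiator sends at all, which is at least consistent with the target having lost its place in the byte stream.

```shell
#!/bin/sh
# Hypothetical helper: name an iSCSI initiator opcode per RFC 3720.
# Assumes the value is given in decimal, as the target log appears to print it.
iscsi_initiator_opcode() {
    op=$(( $1 & 0x3f ))   # opcode = low 6 bits of BHS byte 0
    case $op in
        0)  echo "NOP-Out" ;;
        1)  echo "SCSI Command" ;;
        2)  echo "SCSI Task Management Function Request" ;;
        3)  echo "Login Request" ;;
        4)  echo "Text Request" ;;
        5)  echo "SCSI Data-Out" ;;
        6)  echo "Logout Request" ;;
        16) echo "SNACK Request" ;;
        28|29|30) echo "vendor-specific" ;;
        *)  printf 'not an initiator opcode (0x%02x)\n' "$op" ;;
    esac
}

# The values reported by the target above
for v in 1 49 13 38; do
    printf '%s -> %s\n' "$v" "$(iscsi_initiator_opcode "$v")"
done
```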
On the initiator, /var/log/messages shows:
Apr 12 21:47:57 fedora-arm kernel: connection2:0: Got CHECK_CONDITION but invalid data buffer size of 0
Apr 12 21:47:57 fedora-arm kernel: connection2:0: detected conn error (1020)
Apr 12 21:47:57 fedora-arm kernel: sd 4:0:0:0: [sdb] Unhandled error code
Apr 12 21:47:57 fedora-arm kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Apr 12 21:47:57 fedora-arm kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 08 03 34 a8 00 04 00 00
Apr 12 21:47:57 fedora-arm kernel: end_request: I/O error, dev sdb, sector 134427816
Apr 12 21:47:57 fedora-arm kernel: quiet_error: 118 callbacks suppressed
Apr 12 21:47:57 fedora-arm kernel: Buffer I/O error on device sdb, logical block 16803477
Apr 12 21:47:57 fedora-arm kernel: lost page write due to I/O error on sdb
Apr 12 21:47:57 fedora-arm kernel: Buffer I/O error on device sdb, logical block 16803478
Apr 12 21:47:57 fedora-arm kernel: lost page write due to I/O error on sdb
...snip...
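The CDB in the initiator log can be decoded by hand to check that the failing request is an ordinary large write: opcode 0x2a is WRITE(10), bytes 2 to 5 hold the big-endian logical block address, and bytes 7 and 8 the transfer length. A minimal shell sketch follows; it assumes 512-byte sectors on sdb and a 4 KiB ext3 block size (neither is stated in the report), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode a WRITE(10) CDB given as space-separated hex bytes.
# Assumptions (not in the original report): 512-byte sectors, 4 KiB ext3 blocks.
decode_write10() {
    set -- $1                       # split the hex bytes into $1..$10
    [ "$1" = "2a" ] || { echo "not a WRITE(10) CDB" >&2; return 1; }
    lba=$(( 0x$3$4$5$6 ))           # bytes 2-5: logical block address
    len=$(( 0x$8$9 ))               # bytes 7-8: transfer length in blocks
    echo "lba=$lba blocks=$len bytes=$(( len * 512 )) fs_block=$(( lba / 8 ))"
}

decode_write10 "2a 00 08 03 34 a8 00 04 00 00"
```

This prints `lba=134427816 blocks=1024 bytes=524288 fs_block=16803477`: the LBA matches the sector in the end_request line, and dividing by 8 gives the logical block in the first "Buffer I/O error" line, which fits the 4 KiB block assumption. The failing request is a single 512 KiB write.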
There are no transport errors reported on the target or initiator.
Wondering whether this was an ARM issue or an iSCSI issue, I repeated the same test using an x86_64 initiator against the same target, and it succeeded. On reflection, however, that machine is cabled at 100 Mbps instead of 1 Gbps, so it may not have put the target under the same load.
To provoke these errors, one need only write a large file quickly:
dd if=/dev/zero of=/mnt/iscsi/test1 bs=1M count=1024
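A variant of the same test that may make failures easier to spot: GNU dd's conv=fsync flushes the output file before dd exits, so a failed write-back to the iSCSI disk should show up as a non-zero exit status rather than only as a "lost page write" line in the kernel log. This is a sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT MiB of zeros to FILE and fsync it,
# so that a failed write-back is reported through dd's exit status.
# Requires a dd that supports conv=fsync (GNU coreutils or BusyBox).
write_test() {
    out=$1
    count=${2:-1024}
    if dd if=/dev/zero of="$out" bs=1M count="$count" conv=fsync; then
        echo "ok: $out"
    else
        echo "FAILED: $out (check /var/log/messages)" >&2
        return 1
    fi
}
```

For example, `write_test /mnt/iscsi/test1 1024` reproduces the command above with the final flush added.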
Because the target was showing "opcode" errors, I wondered if CPU alignment on the ARM was an issue, but I have /proc/cpu/alignment set to 3 (fixup+warn).
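On ARM Linux, /proc/cpu/alignment also reports how many alignment faults the kernel has handled, so one way to test the alignment hypothesis is to compare the counters before and after the dd run. Note that the mode written to the file (3, fixup+warn) should only govern faults in user space; faults in kernel code, which is where the iSCSI initiator's data path runs, are counted under "System" and fixed up regardless. The sketch below assumes the file's "User:" and "System:" line format, and the function name is hypothetical; check it against the real file.

```shell
#!/bin/sh
# Hypothetical helper: print the User and System alignment-fault counters
# from /proc/cpu/alignment (or another file in the same format, for testing).
alignment_counts() {
    awk -F: '$1 == "User" || $1 == "System" { printf "%s=%d\n", $1, $2 + 0 }' \
        "${1:-/proc/cpu/alignment}"
}
```

Usage: run `alignment_counts` once before the dd command and once after; if neither counter moves, alignment fixups are unlikely to be involved.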
Any suggestions?
Chris, did you have any luck resolving this?
Peter