This is a second freeze break request for Fedora Tagger. I haven't actually carried out the first one yet, but Kevin's mention of the broken bodhi sqlitebuildtags url took us down a rabbit hole.
This is the traceback that tagger logs when the bodhi masher tries to pull down its sqlitebuildtags:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.6/site-packages/fedoratagger/api/api.py", line 461, in tag_pkg_sqlite
    return fedoratagger.lib.sqlitebuildtags()
  File "/usr/lib/python2.6/site-packages/fedoratagger/lib/sqlite_export.py", line 56, in sqlitebuildtags
    conn.executemany(insert_statement, rows)
IntegrityError: columns name, tag are not unique
The problem is that the schema upgrade from tagger1 to tagger2 didn't go smoothly in production, and we didn't notice. There is supposed to be a DB constraint that disallows duplicate tags on packages, but it is absent.
We can't just apply that new constraint, because there are now two duplicate tags in the DB.
We need to:
1) Remove the duplicates
2) Apply the constraint so it doesn't happen again
Here's the constraint we need to apply in postgres:
ALTER TABLE tag ADD CONSTRAINT unique_package_label UNIQUE (package_id, label);
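Before applying it, the offending rows can be listed with a query along these lines (just a sketch, assuming the tag table uses the same package_id and label columns named in the constraint above):

SELECT package_id, label, COUNT(*) AS copies
FROM tag
GROUP BY package_id, label
HAVING COUNT(*) > 1;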
Here's a script to remove the duplicates:
#!/usr/bin/env python
import os

# Point tagger at the production config before importing it.
os.environ['FEDORATAGGER_CONFIG'] = '/etc/fedoratagger/fedoratagger.cfg'

from sqlalchemy import and_

import fedoratagger
import fedoratagger.lib.model as m

print 'Looking now for package tag dupes.'

# Count how many Tag rows exist for each (label, package_id) pair.
query = fedoratagger.SESSION.query(m.Tag).all()
results = {}

for tag in query:
    results[tag.label] = results.get(tag.label, {})
    results[tag.label][tag.package_id] = \
        results[tag.label].get(tag.package_id, 0) + 1

# Any pair seen more than once is a duplicate.
dupes = []
for label, packages in results.items():
    for package, count in packages.items():
        if count > 1:
            dupes.append((label, package))

print "Found these package+tag dupes:", dupes

# For each duplicated pair, keep the first row and delete the rest.
base_query = fedoratagger.SESSION.query(m.Tag)
for label, package_id in dupes:
    query = base_query.filter(and_(
        m.Tag.label == label,
        m.Tag.package_id == package_id,
    )).all()

    keep, rest = query[0], query[1:]
    for dupe in rest:
        fedoratagger.SESSION.delete(dupe)

fedoratagger.SESSION.commit()
On Thu, 2013-05-16 at 11:27 -0400, Ralph Bean wrote:
Here's the constraint we need to apply in postgres:
ALTER TABLE tag ADD CONSTRAINT unique_package_label UNIQUE (package_id, label);
I was wondering if we want to remove the label_id field while we are at it. It should not be used by tagger anymore. Although, thinking about it, that might give us another problem down the line, so maybe remove label_id once the freeze is off; we just shouldn't forget.
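Something like this once the freeze lifts (just a sketch; it assumes nothing else in the schema or the code still references label_id):

-- Post-freeze cleanup; assumes no foreign keys or code still use label_id.
ALTER TABLE tag DROP COLUMN label_id;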
Pierre
On Thu, 16 May 2013 11:27:44 -0400 Ralph Bean <rbean@redhat.com> wrote:
This is a second freeze break request for Fedora Tagger. I haven't actually carried out the first one yet, but Kevin's mention of the broken bodhi sqlitebuildtags url took us down a rabbit hole.
This is the traceback that tagger logs when the bodhi masher tries to pull down its sqlitebuildtags:
...snip...
The problem is that the schema upgrade from tagger1 to tagger2 didn't go smoothly in production, and we didn't notice. There is supposed to be a DB constraint that disallows duplicate tags on packages, but it is absent.
We can't just apply that new constraint, because there are now two duplicate tags in the DB.
We need to:
- Remove the duplicates
- Apply the constraint so it doesn't happen again
Can we test this in stg first? Then if all goes well do it in prod?
+1 if we can do that to make sure there's no thinkos in the script, etc.
(we should be able to dump/restore the db over to stg from prod so it has the same data too, right?)
kevin
On Thu, May 16, 2013 at 10:33:38AM -0600, Kevin Fenzi wrote:
Can we test this in stg first? Then if all goes well do it in prod?
+1 if we can do that to make sure there's no thinkos in the script, etc.
(we should be able to dump/restore the db over to stg from prod so it has the same data too, right?)
Yup. I already tried without dump/restore in stg and there were no duplicates there. (There are two duplicates in production).
I'll dump/restore and try it in stg and report if there are any issues.
On Thu, May 16, 2013 at 12:49:47PM -0400, Ralph Bean wrote:
On Thu, May 16, 2013 at 10:33:38AM -0600, Kevin Fenzi wrote:
Can we test this in stg first? Then if all goes well do it in prod?
+1 if we can do that to make sure there's no thinkos in the script, etc.
(we should be able to dump/restore the db over to stg from prod so it has the same data too, right?)
Yup. I already tried without dump/restore in stg and there were no duplicates there. (There are two duplicates in production).
I'll dump/restore and try it in stg and report if there are any issues.
+1, assuming there are no issues with the dump/restore in stg.
luke
On Thu, May 16, 2013 at 11:27:44AM -0400, Ralph Bean wrote:
We need to:
- Remove the duplicates
- Apply the constraint so it doesn't happen again
We talked about the dedupe script on IRC and threebean modified it to transfer the votes on tags being deleted to the one tag that we were keeping.
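For reference, the modified deletion loop looks roughly like this (a sketch only; it assumes votes live in an m.Vote model with a tag_id column, which are guesses at the tagger schema rather than confirmed names):

# Sketch of the vote-transfer variant of the deletion loop above.
# m.Vote and its tag_id column are assumed names, not confirmed.
keep, rest = query[0], query[1:]
for dupe in rest:
    votes = fedoratagger.SESSION.query(m.Vote).filter(
        m.Vote.tag_id == dupe.id).all()
    for vote in votes:
        # Re-point each vote at the surviving tag before deleting.
        vote.tag_id = keep.id
    fedoratagger.SESSION.delete(dupe)
fedoratagger.SESSION.commit()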
+1
-Toshio