We need a Fedora search engine, especially for docs. Options:
1) Do we run our own?
2) Do we use Google?
I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
-Mike
I use Lucene, and it works pretty well. But it's written in Java....
Paulo
On Jan 30, 2008 5:57 PM, Mike McGrath mmcgrath@redhat.com wrote:
> We need a Fedora search engine, especially for docs. Options:
> 1) Do we run our own?
> 2) Do we use Google?
> I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
> So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
> -Mike
Fedora-infrastructure-list mailing list Fedora-infrastructure-list@redhat.com https://www.redhat.com/mailman/listinfo/fedora-infrastructure-list
Mike McGrath wrote:
> We need a Fedora search engine, especially for docs. Options:
> - Do we run our own?
I 110% recommend running our own.
> - Do we use Google?
I'd like to not.
> I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
> So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
I have. I'd recommend using DataparkSearch[1]. There is also Xapian[2], but it will need a robot to do the crawling (it has the Omega indexer, but that is designed for local data). I've got some Python code started, but it's not ready for production. htdig can be used to feed data to Xapian, so if we do go the route of htdig[3], I'd recommend feeding the htdig data to the Xapian framework[4]. Otherwise we'd need to run Omega on any system that has data, and it won't index data that needs to be rendered.
[1] http://www.dataparksearch.org/
[2] http://xapian.org/
[3] http://www.htdig.org/
[4] http://xapian.org/docs/omega/quickstart.html
I also offer pointing searchfedora.org and fedorasearch.org at any Fedora hosted index.
--
Jonathan Steffan daMaestro
GPG Fingerprint: 93A2 3E2F DC26 5570 3472 5B16 AD12 6CE7 0D86 AF59
Jonathan Steffan wrote:
> I have. I'd recommend using DataparkSearch[1]. There is also Xapian[2], but it will need a robot to do the crawling (it has the Omega indexer, but that is designed for local data). I've got some Python code started, but it's not ready for production. htdig can be used to feed data to Xapian, so if we do go the route of htdig[3], I'd recommend feeding the htdig data to the Xapian framework[4]. Otherwise we'd need to run Omega on any system that has data, and it won't index data that needs to be rendered.
I'm not sure if this is something that would affect the situation, but Moin 1.6 apparently has Xapian integration. So if Xapian is used in one way or another, we could maybe get better search results from the wiki as well. I've never used Xapian (with or without Moin) though, so I don't have any experience with it.
Ville-Pekka Vainio wrote:
> Jonathan Steffan wrote:
>> I have. I'd recommend using DataparkSearch[1]. There is also Xapian[2], but it will need a robot to do the crawling (it has the Omega indexer, but that is designed for local data). I've got some Python code started, but it's not ready for production. htdig can be used to feed data to Xapian, so if we do go the route of htdig[3], I'd recommend feeding the htdig data to the Xapian framework[4]. Otherwise we'd need to run Omega on any system that has data, and it won't index data that needs to be rendered.
> I'm not sure if this is something that would affect the situation, but Moin 1.6 apparently has Xapian integration. So if Xapian is used in one way or another, we could maybe get better search results from the wiki as well. I've never used Xapian (with or without Moin) though, so I don't have any experience with it.
Yes, Moin is able to use Xapian for searching. It should be somewhat easy to export and merge Xapian data from Moin into a larger search system. It might also be quicker to index data "internally" (read: just have Moin index itself) rather than asking Moin to render pages for an external indexer to read.
On Wed, Jan 30, 2008 at 11:57:23AM -0600, Mike McGrath wrote:
> We need a Fedora search engine, especially for docs. Options:
> 1) Do we run our own?
> 2) Do we use Google?
> I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
> So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
I know J5 has been working on a search controller for MyFedora, which will be responsible for scouring a bunch of our resources. I don't see why we wouldn't be able to search docs as well.
luke
On Wed, 30 Jan 2008, Luke Macken wrote:
> On Wed, Jan 30, 2008 at 11:57:23AM -0600, Mike McGrath wrote:
>> We need a Fedora search engine, especially for docs. Options:
>> 1) Do we run our own?
>> 2) Do we use Google?
>> I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
>> So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
> I know J5 has been working on a search controller for MyFedora, which will be responsible for scouring a bunch of our resources. I don't see why we wouldn't be able to search docs as well.
Is he writing his own? If so, I'd rather use one that has an upstream that isn't us for the more global stuff. I just know there are a lot of options in the search field that don't involve any code maintenance from us.
-Mike
On Wed, 2008-01-30 at 12:34 -0600, Mike McGrath wrote:
> On Wed, 30 Jan 2008, Luke Macken wrote:
>> On Wed, Jan 30, 2008 at 11:57:23AM -0600, Mike McGrath wrote:
>>> We need a Fedora search engine, especially for docs. Options:
>>> 1) Do we run our own?
>>> 2) Do we use Google?
>>> I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
>>> So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
>> I know J5 has been working on a search controller for MyFedora, which will be responsible for scouring a bunch of our resources. I don't see why we wouldn't be able to search docs as well.
> Is he writing his own? If so, I'd rather use one that has an upstream that isn't us for the more global stuff. I just know there are a lot of options in the search field that don't involve any code maintenance from us.
While we can leverage search projects like Lucene, if we really want to serve our users we need to model how they access data.
On Wed, 2008-01-30 at 13:21 -0500, Luke Macken wrote:
> On Wed, Jan 30, 2008 at 11:57:23AM -0600, Mike McGrath wrote:
>> We need a Fedora search engine, especially for docs. Options:
>> 1) Do we run our own?
>> 2) Do we use Google?
>> I love 2, it's easy. But it is non-OSS, so there are moral issues at stake here. (Though I've not used Google to search exclusively through our sites; it may suck at it, who knows :)
>> So, thoughts? Who has deployed their own search engines? I've used htdig in the past.
> I know J5 has been working on a search controller for MyFedora, which will be responsible for scouring a bunch of our resources. I don't see why we wouldn't be able to search docs as well.
> luke
So search is on my F10 schedule. It would be its own service with a pluggable backend which could farm out searches based on context (wiki, packages, web, etc.) and also give secondary searches (such as querying Google). The plugins could start out simple, such as searching for package names (I already have this implemented via koji). For a fuller search we would need to model our search criteria (what are our users really searching for?), replicate data from our resources, and index them based on the model's relationships.
I'm in a meeting right now finding out about MetaMatrix, which is slated to be open sourced at some point. It can pull data from different data sources (XML, SQL, etc.) and put them into common views (simple explanation). Might be good to use.
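To make the "pluggable backend" idea above a bit more concrete, here is a rough sketch of a service that farms a query out to whichever plugin is registered for a context. This is only an illustration, not the MyFedora code: the class name SearchService, the context names, and the in-memory PACKAGES list (standing in for a real package source such as koji) are all made up for the example.

```python
from typing import Callable, Dict, List

# A backend is any callable that takes a query and returns matching results.
Backend = Callable[[str], List[str]]


class SearchService:
    """Hypothetical dispatcher: routes a query to the backend for a context."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, context: str, backend: Backend) -> None:
        """Plug in a backend under a context name such as 'packages'."""
        self._backends[context] = backend

    def search(self, context: str, query: str) -> List[str]:
        """Search a single context; fail loudly if no plugin handles it."""
        if context not in self._backends:
            raise KeyError(f"no backend registered for context {context!r}")
        return self._backends[context](query)

    def search_all(self, query: str) -> Dict[str, List[str]]:
        """Run the query against every registered backend."""
        return {name: backend(query) for name, backend in self._backends.items()}


# Assumption: a tiny hard-coded list instead of a real package database.
PACKAGES = ["xapian-core", "htdig", "moin", "lucene"]


def package_backend(query: str) -> List[str]:
    """Simple plugin: case-insensitive substring match on package names."""
    q = query.lower()
    return [name for name in PACKAGES if q in name.lower()]


def wiki_backend(query: str) -> List[str]:
    """Placeholder plugin; a real one would query the wiki's own index."""
    return []


service = SearchService()
service.register("packages", package_backend)
service.register("wiki", wiki_backend)
```

With that in place, service.search("packages", "xap") only consults the package plugin, while service.search_all("moin") asks every plugin and groups the results by context. Starting with a plain callable per context keeps the simple plugins simple; a richer model-based index could later sit behind the same interface.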
On Jan 30, 2008, at 9:57 AM, Mike McGrath wrote:
> We need a Fedora search engine, especially for docs. Options:
> - Do we run our own?
Definitely.
> - Do we use Google?
No, I think it should be OSS.
At OSU our web group uses Nutch with pretty good results. And you don't have to use the Java interface; I believe they're just exporting via XML and using PHP for the actual interface. So it could integrate just as nicely into a TG app or what have you.
http://search.oregonstate.edu/
Ryan
--
Ryan Ordway                               E-mail: rordway@oregonstate.edu
Unix Systems Administrator                        rordway@library.oregonstate.edu
OSU Libraries, Corvallis, OR 97331        Office: Valley Library #4657
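As a rough illustration of the "export via XML, render in whatever frontend you like" approach described above, here is a small sketch that turns an XML result feed into (title, link) pairs that a web app could template. The thread does not show what the search engine actually emits, so the RSS-style SAMPLE_XML, the example.org URLs, and the parse_results helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: an RSS-like result feed; the real export format may differ.
SAMPLE_XML = """<rss version="2.0">
  <channel>
    <title>Search results: docs</title>
    <item>
      <title>Fedora Documentation</title>
      <link>http://docs.example.org/</link>
    </item>
    <item>
      <title>Installation Guide</title>
      <link>http://docs.example.org/install</link>
    </item>
  </channel>
</rss>
"""


def parse_results(xml_text: str) -> List[Tuple[str, str]]:
    """Extract (title, link) for every <item> element in the feed."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        results.append((title, link))
    return results


hits = parse_results(SAMPLE_XML)
```

The frontend then only needs the list in hits, so the search engine's own language stops mattering once the XML is fetched.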
Mike McGrath wrote:
> We need a Fedora search engine, especially for docs. Options:
With whatever solution we find, I'd be willing to look at setting up a "livesearch" like interface:
In plone:
http://svn.plone.org/svn/plone/CMFPlone/branches/3.0/skins/plone_scripts/liv...
feeds:
http://svn.plone.org/svn/plone/CMFPlone/branches/3.0/skins/plone_ecmascript/...
In the meantime, you can search many parts of the Fedora/Red Hat world with Firefox search plugins.
Another option is: