Notices tagged with yacy
-
Search engines with their own indexes: https://seirdy.one/2021/03/10/search-engines-with-own-indexes.html
#YaCy: slow, with mostly irrelevant results
Very true, but the second part, "mostly irrelevant results", is a matter of degree. #Google's results were once fantastic, but these days a big chunk of the first five pages of a search is irrelevant, often SEO'd into high positions and knocking relevant results lower.
#Bing (and #Yahoo, #DDG, and others who use Bing's results) are also described with increasing accuracy by "mostly irrelevant results".
But yes, YaCy's results are even worse. I want to set up another YaCy instance, just to crawl sites of interest in various topics. I know there's a major software change needed to get the most improvement in search results, but having the right information in the global index is a necessary precursor to producing good results.
-
AFAIK, #Brave search uses its own crawler and index, as does #Gigablast. And let us not forget #YaCy. I intend to host a YaCy peer-to-peer search instance on its own VPS (well, shared with a #Searx instance, which will use it)
-
#LowEndBox / #LEB sending out Black Friday / Cyber Monday e-mails a week late. I'd like to get something to host #Yacy + #Searx
-
@musicman I just heard of #MeiliSearch last night, so I did not do any deep diving. I know that #YaCy uses #Solr, but I do not know if it always did.
-
https://opensource.com/article/20/2/yacy-search-engine-hacks
Ways to improve #YaCy a little bit.
-
Not a general search engine, and no public instance provided, but “MeiliSearch”:{https://www.meilisearch.com/} say they are an open source self-hostable search API.
I don’t think you’d use it as a general search engine, but more as something integrated into your site. So “Dog Buddy Magazine Online” might use MeiliSearch to let visitors seek out articles by breed, puppy or adult, and other factors relevant to the site’s readers.
License: MIT. See https://blog.meilisearch.com/oss-paradigm/
Paid plan: I haven’t seen it, but I just discovered the software and site a few minutes ago; maybe they’re charging for support.
Anyway, it is interesting. They seem to compete partly with “ElasticSearch”:{https://www.elastic.co/elasticsearch/} (and maybe a little bit with #YaCy’s stand-alone mode).
-
It looks like Sarchy (search provider based first on #YaCy, then on #Searx) is gone. That page goes to a domain parking page.
-
Looking at #LEB for a possible place to put another #YaCy instance. Disposable.
-
https://searchlab.eu/t/self-hosted-s3-buckets-for-distributed-data-collection/480
A #YaCy dev has an idea: peers can use #S3 buckets to store and share their index data.
I see one flaw right away: every peer would need to run a second VPS (and possibly more) with #min.io to provide the storage backend ... or rent such storage from Amazon or other cloud vendors.
-
But frankly, my interest in #YaCy is because of its peer-to-peer nature. I understand it wasn't good enough to run a search engine business on (as Sarchy found out). Its results weren't great anyway.
-
I used to host a #YaCy p2p search node ( https://www.yacy.net/ ), and I intend to do so again in the future, but now I'm seeing "YaCy Grid" (a non-peer-to-peer search) is being developed. https://searchlab.eu/t/the-story-of-yacy-grid/48/17
-
Maybe #BING is a recursive acronym: BING Is No Good.
It’s been years since I touched Java, but someone should really be trying to improve #YaCy’s results. I’m still hoping to host a node on their FreeWorld search network again, but I cannot do so right now.
-
@mangeurdenuage I think the thing #YaCy needs most is algorithm improvements. It already crawls and indexes a significant subset of the things in the major search engines' indexes, but it fails to select correct enough results out of that index.
-
On search: I really want to host another #YaCy node, but YaCy's results are still not great. Even when my node was crawling & indexing sites in a particular field (at the time, FOSS SQL and NoSQL databases) weekly, searches for things in that field were so cluttered with unrelated results that finding the desired answer was unreliable.
.
YaCy integrates #Solr ... so its results should be improvable.
-
@musicman I only indirectly encountered #Solr when I ran a #YaCy instance, so I have no opinion about Solr 6.
-
@musicman Good. I think the current implementation of the #YaCy search engine relies on #Solr.
-
@xj9 I don't see the context, but I like #YaCy's peer-to-peer search, despite its search algorithm not being anywhere close to #Google's in the quality of results (probably close to #Bing's, though). I have hosted YaCy instances in the past, in part to help improve the peer network's results by crawling and indexing sites related to databases, Java, Tcl, Python, and PHP (the things I was searching for most often at the time).
I intend to host YaCy again (perhaps feeding a #Searx instance, so its results would not be wholly dependent on the goodwill of big corporate search engines).
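For anyone wondering what "feeding" another front end from a YaCy peer might look like, here is a minimal sketch that only builds the search URL. It assumes the peer listens on port 8090 and answers at `yacysearch.json` with `query`, `maximumRecords` and `resource` parameters; those names are my assumptions and should be checked against your YaCy version, and `yacy_search_url` is a hypothetical helper, not part of YaCy or Searx.

```python
from urllib.parse import urlencode


def yacy_search_url(base_url, query, max_records=10, resource="global"):
    """Build a search URL for a YaCy peer.

    The endpoint name (yacysearch.json) and the parameter names are
    assumptions; verify them against the peer you actually run.
    """
    params = {
        "query": query,                 # the search terms
        "maximumRecords": max_records,  # how many hits to ask for
        "resource": resource,           # "global" = peer network, "local" = own index
    }
    # Strip a trailing slash so the path is not doubled up.
    return f"{base_url.rstrip('/')}/yacysearch.json?{urlencode(params)}"


if __name__ == "__main__":
    print(yacy_search_url("http://localhost:8090", "postgresql replication"))
```

Fetching that URL and parsing the JSON is left out on purpose, since the response layout is something I have not verified.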
-
Deleting my bookmark for https://sarchy.tech/ ... it was using #YaCy (where peers crawl and index), but now it uses #Searx (which depends on the goodwill of corpocentric search providers).
Admittedly, Searx is likely to give better results, but why use them when #DDG and #Startpage do the same things, only better?
(And yes, DuckDuckGo's results have gotten much worse lately. Still not as bad as using #Bing directly, but I'm leaning more to Startpage these days.)
-
@agnelvishal @xj9 Thank you. I do appreciate that. I’m still planning on running my own #YaCy node again. If for no other reason, I like running a crawling and indexing peer, to expand and hopefully improve search results.
-
@xj9 One thing I'd like to test is hosting a #YaCy instance on a !raspi-like device + external storage, though I think it might be unresponsive to searches during a periodic crawl & index.