Solr: The Most Important Open Source Project You’ve Never Heard Of

There are many open source programs that get a lot of press. And then there are a few, like Solr, that are important to businesses but virtually unknown, even in geek circles.

When I did a story recently about in-demand open source jobs, I wasn’t surprised to hear from Dice that the job market was hot, hot, hot for OpenStack, for Big Data (Hadoop in particular), and for the LAMP stack (Linux, Apache, MySQL, PHP/Perl/Python). What did surprise me, indeed shocked me, was that another red-hot tech jobs area was Solr.

“Solr?” I wondered. “What the heck is Solr?” That was also the reaction of all of my developer friends. And, since my buddies and I, among the lot of us, have centuries in the tech business, we’ve seen a lot of programs. That “Oh, did you say money?” item set me to doing my research. Now I’ll let you in on the secrets of Solr.

One reason Solr may not have gained the attention it deserves is that it’s actually a part of a larger and much better known open source project, Apache Lucene. Lucene, as I’m sure you know, is a Java-based text search engine library.

Lucene is used by many companies and groups as the foundation for their search engines. These organizations include AOL, Disney, and Eclipse. Lucene’s chief selling point is that the indexing engine, with a footprint of a mere megabyte of RAM, can index up to 150GB of text per hour on commercial off-the-shelf hardware. That’s darn good!

Solr comes into the picture as the search platform front-end for Lucene. It provides full-text search, including the ability to handle such formats as Microsoft Word and PDF with Apache Tika; hit highlighting, which marks the matching terms in each result; and faceted search, which combines free-text searching with category-based counts and filtering of the results.
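To make faceting and highlighting a little more concrete, here is a minimal Python sketch of the request parameters involved and of how facet counts typically come back. The field names (`category`, `body`), the query, and the sample counts are made up for illustration. `facet`, `facet.field`, `hl`, and `hl.fl` are standard Solr query parameters, and the flat value/count list is the usual default JSON layout for field facets, though a given installation may be configured differently.

```python
from urllib.parse import urlencode

# Hypothetical query: free-text search plus facet counts and highlighting.
params = [
    ("q", "body:search"),          # free-text query on an assumed "body" field
    ("facet", "true"),             # turn faceting on
    ("facet.field", "category"),   # count hits per value of an assumed "category" field
    ("hl", "true"),                # turn hit highlighting on
    ("hl.fl", "body"),             # highlight matches found in "body"
    ("wt", "json"),                # ask for a JSON response
]
query_string = urlencode(params)


def facet_pairs(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Made-up excerpt of the "facet_counts" section of a JSON response.
sample_facet_counts = {
    "facet_fields": {"category": ["books", 12, "music", 5, "film", 0]}
}

counts = facet_pairs(sample_facet_counts["facet_fields"]["category"])
print(query_string)
print(counts)
```

A search page would typically render `counts` as a sidebar of clickable categories, each of which narrows the original query.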

Like Lucene, Solr is very popular (even if I didn’t know about it before now). It’s used by sites such as Reddit, Netflix, and Instagram. These are all websites whose users won’t stand for slow response time. Solr can deliver the kind of performance that cranky users demand.

Under the hood, Solr is written in Java and relies on Lucene for its core functionality. It usually runs within a Java servlet container, such as Jetty, that implements the javax.servlet API.

Solr has REST-like HTTP/XML and JavaScript Object Notation (JSON) APIs for ease of programming from almost any language. So, while you can work with Solr using its native Java, you also can use your language of choice. For example, query results can be returned in XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV, or binary formats. You can use this data with whatever package strikes your fancy.
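As a rough illustration of what “REST-like” means in practice, the Python sketch below builds a query URL and unpacks a JSON reply using only the standard library. The core name `articles`, the `title` field, and the sample response are assumptions made for this example; `localhost:8983` is the port a stock Solr install listens on, and `q`, `rows`, and `wt` are standard query parameters. No request is actually sent, so the sketch runs without a Solr server.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10, fmt="json"):
    """Build the URL for a search request against a core's /select handler."""
    params = {"q": query, "rows": rows, "wt": fmt}
    return f"{base_url}/{core}/select?{urlencode(params)}"


def extract_docs(raw_json):
    """Return (total hit count, list of documents) from a JSON reply."""
    body = json.loads(raw_json)
    response = body["response"]
    return response["numFound"], response["docs"]


# Made-up reply in the shape Solr uses for JSON results.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "title": "Lucene in Action"},
      {"id": "2", "title": "Lucene basics"}
    ]
  }
}
"""

url = build_select_url("http://localhost:8983/solr", "articles", "title:lucene", rows=2)
found, docs = extract_docs(SAMPLE_RESPONSE)
print(url)
print(found, [d["title"] for d in docs])
```

Swapping `fmt` for another response writer, such as `xml` or `csv`, changes the reply format without touching the rest of the request, which is why the same server can feed clients written in almost any language.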

While Solr is built on Lucene, it also expands upon it. For instance, it supports sharded data, geospatial search, and user-extensible caching. The end result is a very fast and flexible back end for almost any Web search job.

With its exhaustive documentation, the program promises to make it easy to get up to speed. As for administrators, with an AJAX-based administration interface and comprehensive logging facilities, Solr is simple to manage.

While Solr is clearly useful and easy to pick up, there aren’t nearly enough Solr experts out there. According to Dice, on September 11th, 2013, there were no fewer than 318 Solr jobs listed. Many of these job listings had a phrase like, “Solr experience is a must.” If you’re interested in pursuing this in-demand open source job skill, big data experience is a real plus. In particular, Hadoop and its close relative, HBase, were frequently mentioned. And, of course, it helps even more if you can do all this on a cloud architecture.

So, in short, if you’d like a programming job sooner rather than later, don’t be like me and my buddies. Learn about Solr today so your resume will look better tomorrow.

How about you?  Have you heard good things about Solr, or just nothing at all?  Do you have Solr experience, and does it help you in your career?  Let us know in the comments.

Comments

  1. shalinmangar says:

    The exhaustive documentation link in the article should point to the new Solr Reference Guide which was donated by LucidWorks.

    https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

    Also for people in Europe, Lucene Revolution EU is in November featuring beginner to expert level sessions from the best in the business. The early bird discounts end soon.

    Conference: http://www.lucenerevolution.org/

    Sessions: http://www.lucenerevolution.org/sessions

  2. *ANY* good developer knows about Solr. Actually some of us are moving away from Solr and going for another alternative (that also uses Lucene underneath) called ElasticSearch. None of those are news, though.

    • It’s funny: When we know about something we tend to believe that everyone does. That’s true whether it’s a new restaurant (“Wow, you haven’t been there?!”) or TV show (“How could you not watch that?!”).

      But not everyone knows what you know. sjvn and I really did canvass several open source developers before we decided to do this story. And nobody had heard of it.

      If you do anything with search, sure; I guess it’s in the list of usual suspects. But if your attention is on another phase of web development, it’d be easy to miss. Just as I might know a lot about mystery novels and not be able to name every recent Hugo-award winner; it doesn’t make me less of a reader, just not exactly the same kind as you. (But drat, now I’m trying to think how many recent Hugo award winners I can name.)

  3. Any developer worth their salt would be knowledgeable about the majority of Apache projects.

    “The Most Important Open Source Project *You’ve* Never Heard Of”. Please don’t tar us all with the same brush.

    • Steven Vaughan-Nichols says:

      They should, but a surprising number don’t. In the case of Solr, to paraphrase one reader, “When it was first out it was awful, so I didn’t pay attention. Boy, am I glad you brought it back into the light.”

    • Given that there are over 100 Apache projects, that’s a fairly unreasonable ask.

  4. I worked with one of the committers to Solr and was impressed with it. Now I have become involved with a search project and will be looking into it further. This was a timely reminder. Thanks
