This post covers the beginnings of riak-solr-client.
First, we need to take a look at the response from
Solr/Yokozuna/Riak-Search-2 for a single query. In this
case, the database only holds two records and we are doing a
query that matches all of them (*:*).
{"responseHeader":{"status":0,"QTime":11,"params":{"shards":"127.0.0.1:8093/solr/my_index","q":"*:*","127.0.0.1:8093":"_yz_pn:64 OR (_yz_pn:61 AND (_yz_fpn:61)) OR _yz_pn:60 OR _yz_pn:57 OR _yz_pn:54 OR _yz_pn:51 OR _yz_pn:48 OR _yz_pn:45 OR _yz_pn:42 OR _yz_pn:39 OR _yz_pn:36 OR _yz_pn:33 OR _yz_pn:30 OR _yz_pn:27 OR _yz_pn:24 OR _yz_pn:21 OR _yz_pn:18 OR _yz_pn:15 OR _yz_pn:12 OR _yz_pn:9 OR _yz_pn:6 OR _yz_pn:3","wt":"json"}},"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[{"_yz_id":"data_my_bucket_name_12","_yz_rk":"name","_yz_rt":"data","_yz_rb":"my_bucket"},{"_yz_id":"data_my_bucket_second_15","_yz_rk":"second","_yz_rt":"data","_yz_rb":"my_bucket"}]}}
As you can see, I’ve decided to retrieve JSON because of my familiarity with Data.Aeson. There is no set schema for Solr responses, so we’re going to have to test this fairly well to make a generic library that will be updated over time as Solr updates.
With the sample response in mind, we can start to create the datatypes:
newtype Params = Params (Map String String) deriving (Show)

data ResponseHeader = ResponseHeader
  { status :: Int
  , qTime :: Int
  , params :: Params
  } deriving (Show)

data Docs = Docs
  { _yz_id :: String
  , _yz_rk :: String
  , _yz_rt :: String
  , _yz_rb :: String
  } deriving (Show)

data Results = Results
  { numFound :: Int
  , start :: Int
  , maxScore :: Float
  , docs :: [Docs]
  } deriving (Show)

data SolrResponse = SolrResponse
  { responseHeader :: ResponseHeader
  , response :: Results
  } deriving (Show)
The most interesting part of this is Params. We've defined
Params as a newtype around a Data.Map because, making an
educated guess, the keys for params won't always be the
same. We can reach this conclusion by seeing that one of the
keys is an IP address (with a port number).
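To make the reasoning concrete, here is a minimal sketch of how a Map-backed Params could be queried when the key set is not known in advance. It uses only Data.Map from containers. The helper names (lookupParam, looksLikeHostPort, nodeParams) and sampleParams are hypothetical, made up for illustration, and the per-node filter value is shortened from the sample response.

```haskell
import qualified Data.Map as Map
import Data.Char (isDigit)

newtype Params = Params (Map.Map String String) deriving (Show)

-- Look up a key we expect to exist, such as "q" or "wt".
lookupParam :: String -> Params -> Maybe String
lookupParam k (Params m) = Map.lookup k m

-- Hypothetical heuristic: a key of the form "host:port", where the part
-- after the first ':' is all digits, is treated as a per-node entry.
looksLikeHostPort :: String -> Bool
looksLikeHostPort k =
  case break (== ':') k of
    (host, ':' : port) -> not (null host) && not (null port) && all isDigit port
    _                  -> False

-- Collect the entries whose keys look like node addresses.
nodeParams :: Params -> [(String, String)]
nodeParams (Params m) =
  Map.toList (Map.filterWithKey (\k _ -> looksLikeHostPort k) m)

-- Sample data taken from the response above (filter value shortened).
sampleParams :: Params
sampleParams = Params (Map.fromList
  [ ("shards", "127.0.0.1:8093/solr/my_index")
  , ("q", "*:*")
  , ("127.0.0.1:8093", "_yz_pn:64 OR _yz_pn:3")
  , ("wt", "json")
  ])
```

In ghci, lookupParam "q" sampleParams and nodeParams sampleParams show the fixed and the dynamic keys respectively. A record type with one field per key could not represent the node-address entry, which is the main argument for the Map.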
Another interesting piece of the response is Docs.
Solr seems to return the document's id in Solr (_yz_id; the
_yz prefix presumably stands for Yokozuna), the Riak key
(_yz_rk), the Riak bucket type (_yz_rt) and the Riak bucket
(_yz_rb). This is useful information because we will need to
use riak-haskell-client (or more likely an updated fork for
Riak 2.0) to retrieve the actual data.
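As a rough illustration of why those fields matter, the sketch below turns each Docs value into the bucket type, bucket and key that a later Riak fetch would need. RiakLocation and toLocation are hypothetical names invented for this example; they are not part of riak-haskell-client, and the actual fetch is left out.

```haskell
data Docs = Docs
  { _yz_id :: String
  , _yz_rk :: String
  , _yz_rt :: String
  , _yz_rb :: String
  } deriving (Show)

-- Hypothetical type: where an object lives in Riak.
data RiakLocation = RiakLocation
  { bucketType :: String
  , bucket     :: String
  , key        :: String
  } deriving (Show, Eq)

-- Keep only the fields needed to address the object in Riak.
toLocation :: Docs -> RiakLocation
toLocation d = RiakLocation
  { bucketType = _yz_rt d
  , bucket     = _yz_rb d
  , key        = _yz_rk d
  }

-- The two documents from the sample response above.
sampleDocs :: [Docs]
sampleDocs =
  [ Docs "data_my_bucket_name_12" "name" "data" "my_bucket"
  , Docs "data_my_bucket_second_15" "second" "data" "my_bucket"
  ]
```

Evaluating map toLocation sampleDocs in ghci gives one location per search hit, which is the list a client would then fetch from Riak.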
We can now write some JSON instances and basic http code to test. Here is the full file:
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Data.Map
import Control.Applicative
import Control.Monad (mzero)
import Network.HTTP.Conduit -- the main module

-- The streaming interface uses conduits
import Data.Conduit
import Data.Conduit.Binary (sinkFile)

import qualified Data.ByteString.Lazy.Char8 as L
import Control.Monad.IO.Class (liftIO)

-- The keys of "params" vary between responses, so keep them in a Map.
newtype Params = Params (Map String String) deriving (Show)

instance FromJSON Params where
  parseJSON val = Params <$> parseJSON val

data ResponseHeader = ResponseHeader
  { status :: Int
  , qTime :: Int
  , params :: Params
  } deriving (Show)

instance FromJSON ResponseHeader where
  parseJSON (Object o) = ResponseHeader <$> o .: "status"
                                        <*> o .: "QTime"
                                        <*> o .: "params"
  parseJSON _ = mzero

data Docs = Docs
  { _yz_id :: String
  , _yz_rk :: String
  , _yz_rt :: String
  , _yz_rb :: String
  } deriving (Show)

instance FromJSON Docs where
  parseJSON (Object o) = Docs <$> o .: "_yz_id"
                              <*> o .: "_yz_rk"
                              <*> o .: "_yz_rt"
                              <*> o .: "_yz_rb"
  parseJSON _ = mzero

data Results = Results
  { numFound :: Int
  , start :: Int
  , maxScore :: Float
  , docs :: [Docs]
  } deriving (Show)

instance FromJSON Results where
  parseJSON (Object o) = Results <$> o .: "numFound"
                                 <*> o .: "start"
                                 <*> o .: "maxScore"
                                 <*> o .: "docs"
  parseJSON _ = mzero

data SolrResponse = SolrResponse
  { responseHeader :: ResponseHeader
  , response :: Results
  } deriving (Show)

instance FromJSON SolrResponse where
  parseJSON (Object o) = SolrResponse <$> o .: "responseHeader"
                                      <*> o .: "response"
  parseJSON _ = mzero

main :: IO ()
main = do
Running main in ghci gives us:
Just (SolrResponse {responseHeader = ResponseHeader {status = 0, qTime = 8, params = Params (fromList [("127.0.0.1:8093","_yz_pn:63 OR (_yz_pn:60 AND (_yz_fpn:60)) OR _yz_pn:59 OR _yz_pn:56 OR _yz_pn:53 OR _yz_pn:50 OR _yz_pn:47 OR _yz_pn:44 OR _yz_pn:41 OR _yz_pn:38 OR _yz_pn:35 OR _yz_pn:32 OR _yz_pn:29 OR _yz_pn:26 OR _yz_pn:23 OR _yz_pn:20 OR _yz_pn:17 OR _yz_pn:14 OR _yz_pn:11 OR _yz_pn:8 OR _yz_pn:5 OR _yz_pn:2"),("q","*:*"),("shards","127.0.0.1:8093/solr/my_index"),("wt","json")])}, response = Results {numFound = 2, start = 0, maxScore = 1.0, docs = [Docs {_yz_id = "data_my_bucket_name_11", _yz_rk = "name", _yz_rt = "data", _yz_rb = "my_bucket"},Docs {_yz_id = "data_my_bucket_second_14", _yz_rk = "second", _yz_rt = "data", _yz_rb = "my_bucket"}]}})
Success!
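The listing above stops at main = do, so the HTTP call itself is not shown. As a rough sketch of what that step could look like, assuming simpleHttp from http-conduit and decode from aeson: the names queryUrl, decodeBody, fetchAndDecode and fetchRaw are hypothetical, and the URL is only an assumed example of a Riak Search query endpoint on a default local port, so substitute whatever your node actually serves.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, Value, decode)
import qualified Data.ByteString.Lazy.Char8 as L
import Network.HTTP.Conduit (simpleHttp)

-- Assumed query URL (host, port, path and index name are placeholders).
queryUrl :: String
queryUrl = "http://127.0.0.1:8098/search/query/my_index?wt=json&q=*:*"

-- Pure step: parse a response body into any type with a FromJSON instance.
-- Returns Nothing when the body is not valid JSON for that type.
decodeBody :: FromJSON a => L.ByteString -> Maybe a
decodeBody = decode

-- Effectful step: GET the URL and decode the body.
fetchAndDecode :: FromJSON a => String -> IO (Maybe a)
fetchAndDecode url = decodeBody <$> simpleHttp url

-- Decode to a generic aeson Value; with the instances from this post in
-- scope, the result type could be Maybe SolrResponse instead.
fetchRaw :: IO (Maybe Value)
fetchRaw = fetchAndDecode queryUrl
```

Keeping decodeBody separate from the network call means the FromJSON instances can be checked against a saved response without a running Riak node.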
This is just a small start. Hopefully I'll be able to build
this out a bit more (two changes of note will be usage of
http-streams and Lens) and write a snaplet that integrates
well with this Riak Snaplet in a more generic fashion.