Build a Reddit Search Engine in 10 minutes with RelevanceAI

Summary

This tutorial contains step-by-step instructions for building a Reddit search engine in a few minutes using the RelevanceAI Discovery Database.

The RelevanceAI database is free to use if your usage is within our Free Tier.

  1. First, we create a new Node.js project and upload data to RelevanceAI using our JavaScript SDK.
  2. Second, we retrieve and display the data with a React web app.

Prerequisites

Knowledge of these tools will help you complete this walkthrough, but is not essential.

  • npm
  • npx
  • Node.js
  • TypeScript
  • React

Uploading the data

If you don't have a RelevanceAI account yet, please create an account here. Creating an account is free, and you won't pay unless you exceed our free tier.

First, we need to grab our API credentials so we can upload data to RelevanceAI.

  1. In the RelevanceAI Dashboard, go to Settings.
  2. Click on the 'API' tab. You will see your 'Api Project' and 'Api Key' which you can use to upload data.

Download the dataset of Reddit posts that we will search through here.
Here is a direct link to download the dataset.

Next, we use Node.js to upload the data.

First, we create a new Node.js project and install the Relevance SDK using npm.

mkdir redditbackend
cd redditbackend
npm init -y
npm i @relevanceai/sdk typescript

We need to authenticate with RelevanceAI using the project and API key that we obtained earlier. As a security best practice, we recommend doing this by setting environment variables. Below is a template that you can fill out and paste into your console before you run the Node.js script.

export RELEVANCE_PROJECT=##################
export RELEVANCE_API_KEY=######################################
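
A forgotten export is an easy mistake to make here, so it can help to check the environment before uploading anything. The sketch below assumes the SDK reads exactly the two variables from the template above. `missingEnvVars` is a hypothetical helper written for this tutorial, not part of the Relevance SDK.

```typescript
// Variable names taken from the export template above.
const REQUIRED_VARS = ['RELEVANCE_PROJECT', 'RELEVANCE_API_KEY'];

// Hypothetical helper (not part of the SDK): returns the names of the
// required variables that are missing or blank in the given environment.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

// In the upload script this could be used as:
//   const missing = missingEnvVars(process.env);
//   if (missing.length) throw new Error(`Missing: ${missing.join(', ')}`);
```

Passing the environment in as an argument keeps the helper easy to test without touching real credentials.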

Next, we create a Node.js script and import some libraries. We can call this script 'uploadredditdata.ts'. We will put the following code in this file.

import fs from 'fs';
import readline from 'readline';
import {DiscoveryClient} from '@relevanceai/sdk';

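Here's a function to load the Reddit data into a list. It reads the file line by line and parses each line as a JSON object, stopping after 'linecount' lines. The file name can be customised to point to the file you downloaded earlier.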

async function processLineByLine(linecount:number) {
  const fileStream = fs.createReadStream('tifu_all_tokenized_and_filtered.json');
  const rl = readline.createInterface({input: fileStream,crlfDelay: Infinity});
  let i = 0;
  let final:any[] = [];
  for await (const line of rl) {
    if (i >= linecount) break;
    final.push(JSON.parse(line));
    i++;
  }
  return final;
}
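
The function above throws on the first line that is not valid JSON. If your copy of the file might contain blank or malformed lines, a more tolerant variant could look like the sketch below. `parseJsonLines` and `ParseResult` are hypothetical names for illustration. Unlike the original, this version counts parsed documents rather than lines read, and it works on an array of lines so it can be tried without a file.

```typescript
interface ParseResult {
  documents: unknown[];
  badLines: number[]; // 1-based numbers of the lines that failed to parse
}

// Hypothetical helper: parse up to `limit` documents from newline-delimited
// JSON, skipping blank lines and recording malformed ones instead of throwing.
function parseJsonLines(lines: string[], limit: number): ParseResult {
  const result: ParseResult = { documents: [], badLines: [] };
  let lineNumber = 0;
  for (const line of lines) {
    lineNumber++;
    if (result.documents.length >= limit) break;
    if (line.trim() === '') continue; // tolerate blank lines
    try {
      result.documents.push(JSON.parse(line));
    } catch {
      result.badLines.push(lineNumber);
    }
  }
  return result;
}
```

The same loop body could be dropped into the `for await` loop of processLineByLine if you prefer to keep streaming from disk.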

Finally, we insert the data into a RelevanceAI dataset named 'reddit-tifu'. Each document's 'id' field is copied to '_id' before insertion.

async function insertData(){
  const redditDataset = (new DiscoveryClient()).dataset('reddit-tifu');
  const items = await processLineByLine(5000);
  for (const item of items) {
    item._id = item.id;
  }
  const res = await redditDataset.insertDocuments(items);
  console.log(res);

}
insertData();
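
The script above sends all 5000 documents in a single insertDocuments call. Whether the SDK splits large uploads internally is not something this tutorial confirms, so if you raise the document count and run into request-size limits, one option is to upload in batches yourself. The sketch below shows a hypothetical `chunk` helper. The batch size of 500 in the usage comment is an arbitrary illustration, not a documented limit.

```typescript
// Hypothetical helper: split an array into consecutive batches of at most
// `size` items. The final batch may be smaller than the others.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Inside insertData this could replace the single insert call:
//   for (const batch of chunk(items, 500)) {
//     console.log(await redditDataset.insertDocuments(batch));
//   }
```

Awaiting each batch in turn keeps the uploads sequential, which makes a failing batch easier to spot in the console output.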

Putting it all together:

import fs from 'fs';
import readline from 'readline';
import {DiscoveryClient} from '@relevanceai/sdk';

async function processLineByLine(linecount:number) {
  const fileStream = fs.createReadStream('tifu_all_tokenized_and_filtered.json');
  const rl = readline.createInterface({input: fileStream,crlfDelay: Infinity});
  let i = 0;
  let final:any[] = [];
  for await (const line of rl) {
    if (i >= linecount) break;
    final.push(JSON.parse(line));
    i++;
  }
  return final;
}

async function insertData(){
  const redditDataset = (new DiscoveryClient()).dataset('reddit-tifu');
  const items = await processLineByLine(5000);
  for (const item of items) {
    item._id = item.id;
  }
  const res = await redditDataset.insertDocuments(items);
  console.log(res);

}
insertData();

Finally, we run the script using npx and ts-node. This should take under a minute.

npx ts-node uploadredditdata.ts

Build the Search App

First, we need to get a read-only API key for our dataset so that users can search it.

  1. In the RelevanceAI Dashboard, open the dataset 'reddit-tifu'.
  2. Find the 'Search' link under the 'Apps' heading and click on it.
  3. Click 'API', and copy the read API key for the dataset.

Next, we create a new TypeScript React application, similar to this tutorial.

npx create-react-app redditsearch --template typescript
cd redditsearch
npm i @relevanceai/sdk
npm start

If successful, a React application will open in your browser. We will customise this to show our Reddit post dataset.

Next, find the file 'App.tsx' under the src/ directory and replace its contents with the code below.

You need to replace 'project' with the same project value used for uploading data in Node.js, and 'api_key' with the read-only API key you obtained above.

Warning: Make sure to use the read-only API key, not the write API key, in your frontend application code.

import React from 'react';
import {DiscoveryClient,QueryBuilder,FastSearchOutput} from '@relevanceai/sdk';
const reddittifu = (new DiscoveryClient({project:'##########',api_key:'#############################'})).dataset('reddit-tifu')
function App() {
  const [redditData,setRedditData] = React.useState<{num_comments:number,score:number,created_utc:number,url:string,id:string,selftext_without_tldr:string,trimmed_title:string,selftext_html:string,title:string}[]>([]);
  const [searchResponse,setSearchResponse] = React.useState<FastSearchOutput>({results:[],resultsSize:0,aggregates:{},aggregations:{},aggregateStats:{}});
  const [searchValue,setSearchValue] = React.useState('');
  const [sort,setSort] = React.useState<{field:string,dir:'desc'|'asc'}>({field:'score',dir:'desc'});
  React.useEffect(() => {
    (async () => {
      const queryBuilder = QueryBuilder()
      // Text search when the box has input; otherwise sort by the selected field.
      if (searchValue.length) queryBuilder.query(searchValue).text();
      else queryBuilder.sort(sort.field,sort.dir);
      const res = await reddittifu.search(queryBuilder);
      setRedditData(res.results as any[]);
      console.log(res.results[0])
      setSearchResponse(res);
    })();
  },[searchValue,sort]);
  const sortSettings:{name:string,dir:'asc'|'desc',field:string}[] = [{name:'Score',field:'score',dir:'desc'},{name:'Post Date',field:'created_utc',dir:'desc'}];
  return (
    <div style={{display:'flex',flexDirection:'column',alignItems:'center'}}>
      <div style={{padding:'20px'}}>Search Posts <input onChange={(e) => setSearchValue(e.target.value)} value={searchValue} />
      </div>
      <div style={{display:'flex'}}>
        {sortSettings.map(({name,dir,field}) => {
          return <div key={name} style={{border:'1px solid black',margin:'4px',padding:'2px'}} onClick={() => {setSort({dir,field});setSearchValue('')}}>Sort By {name}</div>
        })}
      </div>
      <div>{searchResponse.resultsSize} results</div>
      {redditData.map(({id,title,selftext_without_tldr,url,score,num_comments,created_utc}) => {
        return <div key={id} style={{display:'flex',flexDirection:'column',justifyContent:'center',alignItems:'center',paddingTop:'40px'}}>
          <div>{new Date(created_utc*1000).toDateString()}</div>
          <div style={{padding:'10px',fontSize:'20px'}}>{score} <a href={url}>{title}</a></div>
          <div style={{maxWidth:'1000px',}}>{selftext_without_tldr}</div>
          <div style={{maxWidth:'1000px',}}>{num_comments} comments</div>
          </div>
      })}
    </div>
  );
}

export default App;

If the page does not open automatically, open localhost:3000 (the default Create React App development port) in your web browser.
If all goes well, you have now built a Reddit search engine!

React Code Breakdown

Here's how we initialise the Relevance SDK for search:

import React from 'react';
import {DiscoveryClient,QueryBuilder,FastSearchOutput} from '@relevanceai/sdk';
const reddittifu = (new DiscoveryClient({project:'##########',api_key:'#############################'})).dataset('reddit-tifu')

We use the Relevance SDK QueryBuilder to search and sort the Reddit posts. When the search box contains text, we run a text query; otherwise we sort by the selected field.

const queryBuilder = QueryBuilder()
if (searchValue.length) queryBuilder.query(searchValue).text();
else queryBuilder.sort(sort.field,sort.dir);
const res = await reddittifu.search(queryBuilder);
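
The branch above can also be written as a small pure function, which makes the "text query wins over sort" rule easy to test without calling the API. This is only a sketch: `chooseSearchMode` and `SearchMode` are hypothetical names, not SDK types. Note one deliberate difference from the component code: input that is only whitespace is treated as empty here.

```typescript
type SortDir = 'asc' | 'desc';

// Hypothetical description of what the component should ask the API for.
type SearchMode =
  | { kind: 'text'; query: string }
  | { kind: 'sort'; field: string; dir: SortDir };

// Decide between a text query and a sorted listing from the current UI state.
function chooseSearchMode(
  searchValue: string,
  sort: { field: string; dir: SortDir },
): SearchMode {
  const query = searchValue.trim();
  if (query.length > 0) return { kind: 'text', query };
  return { kind: 'sort', field: sort.field, dir: sort.dir };
}

// In the effect, the result could drive the QueryBuilder calls shown above:
//   const mode = chooseSearchMode(searchValue, sort);
//   if (mode.kind === 'text') queryBuilder.query(mode.query).text();
//   else queryBuilder.sort(mode.field, mode.dir);
```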
