import kagi from "../img/blog/research-tools/kagi.webp";
import perplexity from "../img/blog/research-tools/perplexity.webp";
import connectedPapers from "../img/blog/research-tools/connected-papers.webp";
import elicitColumns from "../img/blog/research-tools/elicit-columns.webp";
import elicitQuery from "../img/blog/research-tools/elicit-query.webp";
import laptop from "../img/blog/laptop-yellow.webp";

import CustomModal from "../components/CustomModal";

import Post from "../components/blog/Post";
import PostImage from "../components/blog/PostImage";

import { useModal } from "../helpers/utils";

const POSTNAME = "research-tools";

export default function ResearchTools() {
  return <Post postName={POSTNAME} rest={<Rest />} />;
}

const Rest = () => {
  const { showModal, image, alt, handleShowClick } = useModal();

  return (
    <>
      {showModal && <CustomModal image={image} alt={alt} handleShowClick={handleShowClick} />}

      <p>
        Maintaining a competitive edge in Data Management isn't just about technical acumen or leadership ability.
        Delivering value requires continuously learning how to take advantage of new tools. Staying on top of the latest
        developments is particularly advantageous given recent advancements in AI.
      </p>

      <PostImage postName={POSTNAME} />

      <p>
        Deep learning techniques are transforming the field of data analytics, and AI coding assistants such as{" "}
        <a href="https://github.com/features/copilot" target="_blank" rel="noreferrer">
          GitHub Copilot
        </a>{" "}
        are augmenting the work of software engineers. Some analysts and programmers might worry that innovations like
        these could make some of their professional skills obsolete, but the wisest professionals understand how to keep
        up with the changing landscape by using these tools to augment their skills.
      </p>
      <p>
        How can Data Management professionals do the same? While the work of a Data Strategist might be less technical
        than that of a software engineer or a data analyst, that doesn't mean you can't benefit from technical tools
        too.
      </p>
      <p>
        This article presents a curated list of tools for finding information online. Whether you're running general
        web searches or diving deep into academic research, having the right tools at your disposal can boost your
        productivity and help you succeed.
      </p>
      <br />
      {/* TODO */}
      <h4>Contents</h4>
      <ul>
        <li>
          <a href="#web-search">Web Search</a>
          <ul style={{ marginBottom: 0 }}>
            <li>
              <a href="#google-tricks">Google tricks</a>
            </li>
            <li>
              <a href="#kagi">Kagi</a>
            </li>
            <li>
              <a href="#perplexity">Perplexity</a>
            </li>
            <li>
              <a href="#metaphor">Exa</a>
            </li>
            <li>
              <a href="#finding-dead-links">Finding dead links</a>
            </li>
          </ul>
        </li>
        <li>
          <a href="#research">Research</a>
          <ul>
            <li>
              <a href="#arxiv">arXiv</a>
            </li>
            <li>
              <a href="#semantic-scholar">Semantic Scholar</a>
            </li>
            <li>
              <a href="#zotero">Zotero</a>
            </li>
            <li>
              <a href="#unpaywall">Unpaywall</a>
            </li>
            <li>
              <a href="#consensus">Consensus</a>
            </li>
          </ul>
        </li>
        <li>
          <a href="#datasets">Datasets</a>
          <ul>
            <li>
              <a href="#google-datasets-search">Google Dataset Search</a>
            </li>
            <li>
              <a href="#kaggle">Kaggle</a>
            </li>
            <li>
              <a href="#data-gov">Data.gov</a>
            </li>
          </ul>
        </li>
        <li>
          <a href="#conclusion">Conclusion</a>
        </li>
        <li>
          <a href="#related-posts">Related Posts</a>
        </li>
      </ul>

      <h2 id="web-search">Web Search</h2>
      <p>
        You may have noticed that Google's search results have become less useful over time. One likely reason is that
        Search Engine Optimization (SEO) specialists have flooded the web with mediocre content. This content serves
        little useful purpose, but because it is tuned to game the search algorithm, it shows up at the top of your
        results.
      </p>
      <p>
        The proliferation of SEO-engineered content has accelerated with the development of generative AI, which enables
        SEO engineers to scale up their efforts by an order of magnitude. Fortunately, it's still possible to cut
        through the noise if you have the right tools at your disposal.
      </p>
      <h4 id="google-tricks">Google tricks</h4>
      <p>
        For example, did you know you can filter for specific websites or keywords on Google? Prefixing a website
        domain with <code>site:</code> (e.g., <code>site:example.com</code>) will return only results from that domain,
        putting keywords in quotes (<code>"cybersecurity"</code>) will return only results containing those exact
        keywords, and prefixing keywords with a hyphen (<code>-cybersecurity</code>) will <em>exclude</em> results
        with those keywords. To learn many more small
        tricks like these, we recommend this{" "}
        <a target="_blank" rel="noreferrer" href="https://gwern.net/search">
          in-depth guide
        </a>{" "}
        by Gwern Branwen.
      </p>

      <h4 id="kagi">Kagi</h4>
      <p>
        You can also ditch Google altogether with alternatives such as{" "}
        <a target="_blank" rel="noreferrer" href="https://kagi.com/">
          Kagi
        </a>
        , a subscription-based search engine that gives you more control over your search experience. It has no ads,
        groups listicles separately from the other results, and enables you to block or boost specific domains across
        all of your searches.
      </p>

      <figure>
        <img src={kagi} alt="Kagi search engine" onClick={handleShowClick} style={{ cursor: "pointer" }} />
        <figcaption>
          Screenshot of{" "}
          <a target="_blank" rel="noreferrer" href="https://kagi.com/">
            Kagi
          </a>{" "}
          search engine
        </figcaption>
      </figure>

      <h4 id="perplexity">Perplexity</h4>
      <p>
        <a target="_blank" rel="noreferrer" href="https://www.perplexity.ai/">
          Perplexity
        </a>{" "}
        is another "smarter" search engine: it's like Google's quick answer feature on steroids. It uses AI to
        automatically read the websites most relevant to your query and write a report for you with its findings,
        complete with citations.
      </p>
      <figure>
        <img src={perplexity} alt="Perplexity search engine" onClick={handleShowClick} style={{ cursor: "pointer" }} />
        <figcaption>
          Screenshot of{" "}
          <a target="_blank" rel="noreferrer" href="https://www.perplexity.ai/">
            Perplexity
          </a>{" "}
          search engine
        </figcaption>
      </figure>

      <h4 id="metaphor">Exa</h4>
      <p>
        If you have a trickier query in mind,{" "}
        <a target="_blank" rel="noreferrer" href="https://search.metaphor.systems/">
          Exa
        </a>{" "}
        is a handy search engine for hunting down specific websites or answering niche questions. It gives you a
        paragraph box to describe what you're looking for and uses advanced{" "}
        <a target="_blank" rel="noreferrer" href="https://en.wikipedia.org/wiki/Natural_language_processing">
          natural language processing
        </a>{" "}
        to help deliver the exact results you're looking for.
      </p>

      <h4 id="finding-dead-links">Finding dead links</h4>
      <p>
        Lastly, as you explore the online jungle, it's common to run into links that don't work anymore. The next time
        you click on a promising resource and find yourself staring at a page that reads "404: Page Not Found", you can
        use the Internet Archive's{" "}
        <a target="_blank" rel="noreferrer" href="https://archive.org/web/">
          Wayback Machine
        </a>{" "}
        to retrieve archived copies of pages that have gone offline. The Internet Archive is a nonprofit organization
        that makes backups of public websites to prevent them from becoming lost to the sands of time, providing an
        invaluable service to the digital explorer.
      </p>
      <p>
        For even more convenience, you can download{" "}
        <a
          target="_blank"
          rel="noreferrer"
          href="https://chromewebstore.google.com/detail/web-archives/hkligngkgcpcolhcnkgccglchdafcnao"
        >
          this open-source browser extension
        </a>{" "}
        which gives you quick access to the Wayback Machine's mirror of the current page you're on, as well as any{" "}
        <a target="_blank" rel="noreferrer" href="https://en.wikipedia.org/wiki/Search_engine_cache">
          cached versions
        </a>{" "}
        of the page created by search engines like Google, if they exist.
      </p>
      <h2 id="research">Research</h2>
      <p>
        While the previous tools are useful for general research online, a different set of techniques is required to
        find useful information in academic literature.
      </p>
      <p>
        Reviewing papers allows data managers to benefit from the rigorous yet cutting-edge knowledge generated by the
        academic community, ultimately improving their data management practices. They can find potential sources of
        data, identify emerging trends in data management, or learn evidence-tested business techniques they can use to
        stand out from their competitors.
      </p>
      <h4 id="arxiv">arXiv</h4>
      <p>
        <a target="_blank" rel="noreferrer" href="https://arxiv.org/">
          arXiv
        </a>{" "}
        provides open access to papers, mostly from quantitative fields. You can use{" "}
        <a target="_blank" rel="noreferrer" href="https://arxivxplorer.com/">
          arXiv Xplorer
        </a>{" "}
        to find papers that are semantically similar to a search query. This search engine uses natural language
        processing to attempt to understand the semantic meaning of your query. It then uses the text in the papers
        themselves, not just the abstract and title, to unearth relevant papers.
      </p>
      <p>
        Another way to explore arXiv's database is to use{" "}
        <a target="_blank" rel="noreferrer" href="https://paperscape.org/">
          Paperscape
        </a>
        , a more visual tool. It uses a graph layout to show you connections between papers and their citations and
        references.
      </p>
      <h4 id="semantic-scholar">Semantic Scholar</h4>
      <p>
        For papers that are not in arXiv, you can use{" "}
        <a target="_blank" rel="noreferrer" href="https://www.semanticscholar.org/">
          Semantic Scholar
        </a>{" "}
        to search through a larger swath of academia. It gives you more control over your search than Google Scholar and
        provides you with an integrated paper viewer that keeps track of acronym definitions and references.
      </p>

      <p>
        <a target="_blank" rel="noreferrer" href="https://www.connectedpapers.com/">
          Connected Papers
        </a>{" "}
        is another tool that's similar to Paperscape in that it visualizes connections between related papers, but it's
        based on Semantic Scholar's database instead of arXiv's.
      </p>

      <figure>
        <img src={connectedPapers} alt="Connected Papers" onClick={handleShowClick} style={{ cursor: "pointer" }} />
        <figcaption>
          Screenshot of{" "}
          <a target="_blank" rel="noreferrer" href="https://www.connectedpapers.com/">
            Connected Papers
          </a>
        </figcaption>
      </figure>

      <p>
        Last but not least, the author's personal favorite tool for finding research papers is{" "}
        <a target="_blank" rel="noreferrer" href="https://elicit.com">
          Elicit
        </a>
        . This is an AI-powered paper search engine that makes literature reviews much easier. It extracts key elements
        of relevant papers, such as the number of participants in an experiment or the main conclusion, into a table,
        and writes a summary of the findings from across the papers for you, much like Perplexity.
      </p>

      <div style={{ display: "flex", maxWidth: 1000, gap: 12 }}>
        <figure>
          <img
            src={elicitColumns}
            alt="Elicit columns"
            style={{ maxWidth: 494, cursor: "pointer" }}
            onClick={handleShowClick}
          />
          <figcaption>
            Screenshot of{" "}
            <a target="_blank" rel="noreferrer" href="https://elicit.com">
              Elicit
            </a>{" "}
            columns
          </figcaption>
        </figure>

        <figure>
          <img
            src={elicitQuery}
            alt="Elicit query"
            style={{ maxWidth: 494, cursor: "pointer" }}
            onClick={handleShowClick}
          />
          <figcaption>
            Screenshot of{" "}
            <a target="_blank" rel="noreferrer" href="https://elicit.com">
              Elicit
            </a>{" "}
            query
          </figcaption>
        </figure>
      </div>

      <h4 id="zotero">Zotero</h4>
      <p>
        <a href="https://www.zotero.org/" target="_blank" rel="noreferrer">
          Zotero
        </a>{" "}
        is a multi-platform program that helps you collect, organize, and annotate academic papers. It's the de facto
        standard reference management tool for academics, so it integrates with many other programs, like browsers and
        note-taking applications.
      </p>
      <h4 id="unpaywall">Unpaywall</h4>
      <p>
        In the previous section, we mentioned a browser extension that enables you to quickly find backups of webpages
        that have gone offline. Academic researchers face a similar obstacle when a paywall blocks access to a paper
        that would aid their research. While it can't grant you access to papers that are available only to paying
        subscribers, the{" "}
        <a target="_blank" rel="noreferrer" href="https://unpaywall.org/">
          Unpaywall
        </a>{" "}
        browser extension can quickly take you to copies of a given paper in legal open-access repositories.
      </p>
      <h4 id="consensus">Consensus</h4>
      <p>
        <a target="_blank" rel="noreferrer" href="https://consensus.app/search/">
          Consensus
        </a>{" "}
        is an AI search engine that uses OpenAI’s GPT-4 to search across more than 200 million academic papers. It
        helps users find academic publications that address questions about the relationship between concepts, yes/no
        questions, the effects of a concept, and more.
      </p>

      <figure>
        <img src={laptop} alt="woman with a laptop" />
        <figcaption>
          Photo by <a href="https://www.pexels.com/@tatianasyrikova/">Tatiana Syrikova</a> on{" "}
          <a href="https://www.pexels.com/photo/from-above-crop-female-typing-on-keyboard-of-computer-near-tea-on-planner-at-home-3975586/">
            Pexels
          </a>
        </figcaption>
      </figure>
      <h2 id="datasets">Datasets</h2>
      <p>
        Many data practitioner roles could benefit from access to the wide range of high-quality, publicly available
        datasets out there. The following tools offer access to datasets on various subjects and from different sources.
      </p>
      <h4 id="google-datasets-search">Google Dataset Search</h4>
      <p>
        <a target="_blank" rel="noreferrer" href="https://datasetsearch.research.google.com/">
          Google Dataset Search
        </a>{" "}
        functions as a comprehensive search engine for datasets, enabling users to find datasets published across the
        web. Leveraging metadata from dataset repositories that adopt standard schema.org markup, it offers a broad view
        of available data across various subjects and disciplines.
      </p>
      <h4 id="kaggle">Kaggle</h4>
      <p>
        <a target="_blank" rel="noreferrer" href="https://www.kaggle.com/datasets/">
          Kaggle
        </a>{" "}
        is a Data Science competition platform where users can upload their own datasets for others to download and
        use. Datasets span a wide range of categories, from movie ratings to credit card fraud trends, and often come
        with descriptions of the dataset’s key features.
      </p>
      <h4 id="data-gov">Data.gov</h4>
      <p>
        Under the{" "}
        <a href="https://data.gov/open-gov/" target="_blank" rel="noreferrer">
          OPEN Government Data Act
        </a>
        , the US government is required to make its data publicly available. At the time of writing,{" "}
        <a target="_blank" rel="noreferrer" href="https://Data.gov">
          Data.gov
        </a>{" "}
        gives access to just under 300,000 datasets from across county, city, state, and federal government entities.
      </p>
      <h2 id="conclusion">Conclusion</h2>
      <p>
        Success in Data Management requires staying on top of advancements in technology, including the latest tools
        to enhance your workflow. The evolution of these tools, driven by technologies such as generative AI, presents
        both challenges and opportunities for ambitious data practitioners.
      </p>
      <p>
        The challenges lie in adapting to the ever-changing technological environment. Traditional skills risk becoming
        obsolete, new skills can be difficult to learn, and staying up-to-date with the latest developments can be
        time-consuming.
      </p>
      <p>
        The opportunities come from leveraging powerful research tools to enhance productivity. Using substantially
        faster and more reliable tools to find information increases a data practitioner’s work efficiency and quality,
        giving them a competitive edge over those not using such tools.
      </p>
      <p>
        Embracing cutting-edge research tools is not just about staying relevant; it's about enhancing one's ability to
        manage, analyze, and leverage data in innovative ways. By integrating these resources into their workflow, data
        practitioners can ensure they remain at the forefront of their field.
      </p>

      <p>
        <em>
          Note: a version of this post originally appeared on Roman's{" "}
          <a href="https://roman.computer/information/" target="_blank" rel="noreferrer">
            personal blog
          </a>
          .
        </em>
      </p>
    </>
  );
};
