Tiny Search Engine

A web crawler and search system built in C

Role

Software Engineer

Course

CS 50: Software Implementation and Design, Dartmouth College

Tools

C, Unix

Duration

3 weeks
Spring 2023

Overview

I built a search engine application that crawls webpages, indexes them, and lets users query the index through an interface, ranking the matching pages by relevance. It consists of three subsystems:

  1. Crawler: crawls the web from a seed URL to a given maxDepth and caches the content of the pages it finds in a given directory.

  2. Indexer: reads files from the given directory, builds an index that maps words to URLs, and writes that index to a given file.

  3. Querier: takes a query expressed as a set of words (optionally combined by AND, OR) and outputs a ranked list of the URLs of pages in which the given combination of words appears.
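
To make the Querier's ranking step concrete, here is a minimal sketch of scoring one page against a query. The scoring rule is an assumption, not something stated above: a run of words joined by AND scores the minimum of their counts, and OR adds the scores of those runs together. The names word_count_t, lookup, and score_query are hypothetical and are not taken from the project repo.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: how often one word occurs in one page. */
typedef struct {
    const char *word;
    int count;
} word_count_t;

/* Return the count of `word` in the page, or 0 if the page lacks it. */
static int lookup(const word_count_t *page, size_t n, const char *word)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(page[i].word, word) == 0) {
            return page[i].count;
        }
    }
    return 0;
}

/* Score one page against a tokenized, lowercase query such as
 * {"dartmouth", "and", "college", "or", "computer"}.
 * Assumed rule: AND takes the minimum count within a run of words,
 * OR sums the runs. Adjacent words with no operator are treated as AND. */
int score_query(const word_count_t *page, size_t n,
                const char *const *tokens, size_t ntokens)
{
    int total = 0;     /* sum of the AND-runs completed so far */
    int current = -1;  /* minimum of the current AND-run; -1 means empty */

    for (size_t i = 0; i < ntokens; i++) {
        if (strcmp(tokens[i], "or") == 0) {
            if (current > 0) {
                total += current;
            }
            current = -1;  /* start a new AND-run */
        } else if (strcmp(tokens[i], "and") == 0) {
            continue;      /* AND is implicit between adjacent words */
        } else {
            int c = lookup(page, n, tokens[i]);
            if (current < 0 || c < current) {
                current = c;
            }
        }
    }
    if (current > 0) {
        total += current;
    }
    return total;
}
```

Under this assumed rule, a page containing "dartmouth" 3 times, "college" once, and "computer" 4 times scores min(3, 1) + 4 = 5 for the query "dartmouth and college or computer". Sorting pages by this score in descending order would give the ranked list.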

Please contact me at emiko.rohn@gmail.com to see the GitHub repo.

Learnings from this 3-week sprint

System Architecture

Building this system from scratch taught me the importance of modular design. Breaking down the search engine into the crawler, indexer, and querier made the implementation manageable.

Memory Management

This was my first project that required manual memory allocation! I learned to use valgrind to track down memory leaks and developed consistent habits for allocating and freeing dynamic data structures.
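
One habit of this kind is to give every heap-allocated type a matching pair of constructor and destructor functions, so ownership stays easy to follow. The sketch below is illustrative only: page_t, page_new, and page_delete are hypothetical names, not the project's actual API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for a crawled page. The struct owns its url string. */
typedef struct {
    char *url;
    int depth;
} page_t;

/* Allocate a page and copy `url` into it.
 * Returns NULL on bad input or allocation failure, leaking nothing. */
page_t *page_new(const char *url, int depth)
{
    if (url == NULL) {
        return NULL;
    }
    page_t *page = malloc(sizeof *page);
    if (page == NULL) {
        return NULL;
    }
    size_t len = strlen(url) + 1;  /* +1 for the terminating '\0' */
    page->url = malloc(len);
    if (page->url == NULL) {
        free(page);                /* undo the first allocation */
        return NULL;
    }
    memcpy(page->url, url, len);
    page->depth = depth;
    return page;
}

/* Free everything page_new allocated. Safe to call with NULL. */
void page_delete(page_t *page)
{
    if (page == NULL) {
        return;
    }
    free(page->url);
    free(page);
}
```

Running a program built this way under `valgrind --leak-check=full ./program` (where ./program stands for whatever binary is being checked) reports any allocation that was never freed.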

Data Structures

I improved my understanding of data structures like hash tables and queues. Choosing the right structure for each task was crucial for performance.
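
As an illustration of why a hash table suits word lookups, here is a minimal sketch of a chained hash table that counts word occurrences. It is not the project's code: the fixed bucket count, the djb2-style hash function, and the names counts_table_t, table_new, table_increment, table_get, and table_delete are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 64  /* fixed size, chosen arbitrarily for the sketch */

typedef struct entry {
    char *key;           /* owned copy of the word */
    int count;
    struct entry *next;  /* next entry in the same bucket's chain */
} entry_t;

typedef struct {
    entry_t *buckets[NUM_BUCKETS];
} counts_table_t;

/* djb2-style string hash (an assumed choice, not the project's). */
static unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;
    for (; *s != '\0'; s++) {
        h = h * 33 + (unsigned char)*s;
    }
    return h;
}

/* Create an empty table, or return NULL if allocation fails. */
counts_table_t *table_new(void)
{
    counts_table_t *t = malloc(sizeof *t);
    if (t != NULL) {
        for (size_t i = 0; i < NUM_BUCKETS; i++) {
            t->buckets[i] = NULL;
        }
    }
    return t;
}

/* Add one to the count for `key`, inserting it if new.
 * Returns the new count, or 0 on bad input or allocation failure. */
int table_increment(counts_table_t *t, const char *key)
{
    if (t == NULL || key == NULL) {
        return 0;
    }
    entry_t **slot = &t->buckets[hash_string(key) % NUM_BUCKETS];
    for (entry_t *e = *slot; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            return ++e->count;
        }
    }
    entry_t *e = malloc(sizeof *e);
    if (e == NULL) {
        return 0;
    }
    size_t len = strlen(key) + 1;
    e->key = malloc(len);
    if (e->key == NULL) {
        free(e);
        return 0;
    }
    memcpy(e->key, key, len);
    e->count = 1;
    e->next = *slot;  /* push onto the front of the chain */
    *slot = e;
    return 1;
}

/* Return the count stored for `key`, or 0 if it is absent. */
int table_get(const counts_table_t *t, const char *key)
{
    if (t == NULL || key == NULL) {
        return 0;
    }
    const entry_t *e = t->buckets[hash_string(key) % NUM_BUCKETS];
    for (; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            return e->count;
        }
    }
    return 0;
}

/* Free every entry, every key, and the table itself. */
void table_delete(counts_table_t *t)
{
    if (t == NULL) {
        return;
    }
    for (size_t i = 0; i < NUM_BUCKETS; i++) {
        entry_t *e = t->buckets[i];
        while (e != NULL) {
            entry_t *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(t);
}
```

A lookup only walks one bucket's chain, so it stays fast as long as the words spread evenly across buckets. A plain linked list would need a scan over every stored word.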

Testing

I practiced writing test cases before implementation, which proved invaluable. Unit testing helped catch edge cases early and made the code more reliable.
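
A table of test cases written up front might look like the sketch below. The function under test, normalize_word, and its test driver are hypothetical examples; the assumed behavior (lowercase a word in place, reject anything that is not purely alphabetic) is chosen for illustration and is not taken from the project.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Lowercase `word` in place. Returns false for NULL, empty, or
 * non-alphabetic input; on failure the buffer may be partly lowercased. */
bool normalize_word(char *word)
{
    if (word == NULL || *word == '\0') {
        return false;
    }
    for (char *p = word; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p)) {
            return false;
        }
        *p = (char)tolower((unsigned char)*p);
    }
    return true;
}

/* Table-driven unit test: returns the number of failed cases. */
int test_normalize_word(void)
{
    struct {
        const char *input;
        bool ok;               /* expected return value */
        const char *expected;  /* expected buffer contents when ok */
    } cases[] = {
        {"Dartmouth", true,  "dartmouth"},
        {"search",    true,  "search"},
        {"",          false, NULL},  /* edge case: empty word */
        {"c50",       false, NULL},  /* edge case: contains digits */
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        char buf[32];
        strncpy(buf, cases[i].input, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';

        bool ok = normalize_word(buf);
        if (ok != cases[i].ok ||
            (ok && strcmp(buf, cases[i].expected) != 0)) {
            fprintf(stderr, "FAIL: input \"%s\"\n", cases[i].input);
            failures++;
        }
    }
    if (normalize_word(NULL)) {  /* edge case: NULL pointer */
        failures++;
    }
    return failures;
}
```

Listing the edge cases in the table first (empty string, digits, NULL) pins down the expected behavior before any of the implementation is written.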

emiko.rohn@gmail.com
