SEI-Example
  • Introduction
  • About These Notes
  • Syllabus
  • Development Workflow
    • Installfest
      • Mac OSX
      • Linux
      • Git Configuration
      • Sublime Packages
    • Command Line
      • The Terminal
      • Filesystem Navigation
      • File Manipulation
      • Additional Topics
    • Intro to Git
      • Version Control
      • Local Git
      • Remote Git
      • Git Recipes
    • Group Collaboration
      • Git Workflows
      • Project Roles and Tools
    • VS Code Tips & Tricks
  • HTML/CSS
    • HTML
    • CSS Selectors
    • CSS Box Model and Positioning
      • Box Model
      • Display and Positioning
      • Flexbox
      • Grid
      • Flexbox & Grid Games
      • Floats and Clears
      • Additional Topics
    • Advanced CSS
      • Responsive Design
      • Pseudo-Classes/Elements
      • Vendor Prefixes
      • Custom Properties
      • Additional Topics
    • Bootstrap
    • CSS Frameworks
    • Accessibility
  • JavaScript
    • Primitives
    • Arrays
    • Objects
    • Control Flow
      • Boolean Expressions
      • Conditionals
      • Loops
      • Promises
    • Functions
      • Callbacks
      • Timing Functions
      • Iterators
    • DOM and Events
    • DOM Manipulation
    • HTML5 Canvas
    • How To Reduce Redundancy
    • (2019) JavaScript OOP
    • (2016) OOP with Classes
    • (1995) OOP with Prototypes
      • Constructors
      • Prototypes
    • Intro to TDD
    • Scoping
    • Inheritance
      • Prototypal Inheritance
      • Call, Apply, and other Functions
      • ES6 Inheritance
      • Resources
    • Custom Node Modules
    • Additional Topics
      • AJAX, Fetch, and Async/Await
      • AJAX w/JSON and Localstorage
        • AJAX w/JSON
        • Local Storage
      • Async module
      • Data Scraping
  • jQuery
    • Intro
      • DOM Manipulation
      • Reddit Practice
      • Styling
      • Events
    • Plugins
    • AJAX
  • APIs
    • Fetch
    • AJAX w/jQuery
    • AJAX w/Fetch
  • Databases
    • Intro to SQL
    • Advanced SQL
    • MongoDB
      • Intro to NoSQL
      • CRUD in MongoDB
      • Data Modeling
      • Intermediate Mongo
  • Node/Express
    • Node
      • Intro to Node
      • Node Modules
      • Node Package Manager (NPM)
    • Express
      • Intro to Express
        • Routes
        • Views
        • Templates
        • Layouts and Controllers
        • CRUD & REST
          • Get and Post
          • Put and Delete
      • APIs with Express (request)
      • APIs with Express (axios)
    • Sequelize
      • Terminology
      • Setup
      • Using Models
      • Seeding Data
      • Validations and Migrations
      • Resources
      • 1:M Relationships
      • N:M Relationships
    • Express Authentication
      • Research Components
      • Code Components
      • Auth in Theory
        • Sessions
        • Passwords
        • Middleware
        • Hooks
      • Auth in Practice
        • Create the User
        • User Signup
        • Sessions
        • User Login
        • Authorization and Flash messages
    • Testing with Mocha and Chai
    • Mongoose
      • Mongoose Associations
    • JSON Web Tokens
      • Codealong
    • Additional Topics
      • oAuth
      • Geocoding with Mapbox
      • Geocoding and Google Maps
      • Cloudinary
      • Websockets with Socket.io
      • SASS
  • Ruby
    • Intro to Ruby
    • Ruby Exercises
    • Ruby Classes
    • Ruby Testing with Rspec
    • Ruby Inheritance
    • Ruby Data Scraping
  • Ruby on Rails
    • Intro to Rails
    • APIs with Rails
    • Asset Pipeline
    • Rails Auth and 1-M
      • Auth Components
    • Rails N:M
    • ActiveRecord Polymorphism
    • Additional Topics
      • oAuth
      • SASS
      • Rails Mailers
      • Cloudinary
      • Jekyll
  • React (Updated 2019)
    • ES6+/ESNext
      • Const and Let
      • Arrow Functions
      • Object Literals and String Interpolation
      • ES6 Recap
      • ES6 Activity
    • Intro to React
      • Create React App
      • Components and JSX
      • Virtual DOM
      • Props
      • Dino Blog Activity
      • Nested Components
      • Lab: LotR
    • React State
      • Code-Along: Mood Points
      • Code-Along: Edit Dino Blog
      • Lab: Simple Calc
      • Lifting State
    • React Router
      • Browser History/SPAs
      • React Router (lesson and full codealong)
      • Router Lab
    • Fetch and APIs
      • APIs with Fetch and Axios
      • Fetch the Weather
    • React Hooks
    • React LifeCycle
      • Lab: Component LifeCycle
    • React Deployment
    • Additional Topics
      • React Frameworks
        • Material UI Theming
      • Typescript
        • More Types and Syntax
        • Tsconfig and Declaration Files
        • Generics with Linked List
      • Redux
      • TypeScript
      • Context API
      • React Native
  • Meteor
  • Deployment and Config
    • Deploy - Github Pages
    • Deploy - Node/Sequelize
    • Deploy - Node/MongoDB
    • Deploy React
    • Deploy - Rails
      • Foreman (Environment Variables)
    • Deploy - AWS Elastic Beanstalk
    • Deploy - S3 Static Sites
    • Deploy - Django
    • Deploy - Flask
  • Data Structures and Algorithms
    • Recursion
    • Problem Solving - Array Flatten
    • Binary Search
    • Algorithm Complexity
    • Stacks and Queues
    • Bracket Matching
    • Ruby Linked Lists
      • Sample Code
      • Beginner Exercises
      • Advanced Exercises
    • JS Linked Lists
      • Sample Code
      • Beginner Exercises
      • Beginner Solutions
    • Hash Tables
    • Intro to Sorting
    • Insertion Sort
    • Bucket Sort
    • Bubble Sort
    • Merge Sort
    • Quick Sort
    • Heap Sort
    • Sorting Wrapup
    • Hashmaps
    • Trees and Other Topics
  • Python
    • Python Installation
    • Intro to Python
    • Python Lists
    • Python Loops
    • Python Dictionaries
    • Python Sets and Tuples
    • Python Cheatsheet
    • Python Functions
    • Python Classes
    • Python Class Inheritance
    • Intro to Flask
    • Intro to SQLAlchemy
      • Flask and SQLAlchemy
    • Using PyMongo
    • Intro to Django
    • CatCollector CodeAlong
      • URLs, Views, Templates
      • Models, Migrations
      • Model Form CRUD
      • One-to-Many Relations
      • Many-to-Many Relations
      • Django Auth
    • Django Cheatsheet
    • Django Auth
    • Django Polls App Tutorial
    • Django School Tool Tutorial
    • Django 1:M Relationships
    • Custom Admin Views
    • Data Structures and Algorithms
      • Recursion
      • Binary Search
      • Stacks and Queues
      • Linked Lists
      • Binary Trees
      • Bubble Sort
      • TensorFlow & Neural Networks
    • Adjacent Topics
      • Raspberry Pi
      • Scripting
  • Assorted Topics
    • History of Computer Science
    • Regular Expressions
    • Intro to WDI (Course Info)
    • Being Successful in WDI
    • Internet Fundamentals
      • Internet Lab
    • User Stories and Wireframing
      • Wireframing Exercise: Build an Idea
    • Post WDI
      • Learning Resources
      • Deliverables -> Portfolio
      • FAQ
  • Projects
    • Project 1
    • Project 2
    • Project 3
      • Project 3 Pitch Guidelines
    • Project 4
    • Past Projects
      • Project 1
      • Project 2
      • Project 3
      • Project 4
      • Portfolios
    • Post Project 2
    • MEAN Hackathon
      • Part 1: APIs
      • Part 2: Angular
    • Portfolio
  • Web Development Trends
  • Resources
    • APIs and Data
    • Tech Websites
    • PostgreSQL Cheat Sheet
    • Sequelize Cheat Sheet
    • Database Administration
  • Archived Section
    • (Archived) ReactJS
      • Intro to React
        • Todo List Codealong
        • Additional Topics
      • Deploy React
      • React with Gulp and Browserify
        • Setting up Gulp
        • Additional Gulp Tasks
      • React Router
        • OMDB Router
        • OMDB Search
        • Additional Resources
      • React Animations
        • CSS Animations
    • AngularJS
      • Intro to AngularJS
        • Components and SPA
        • Create an Angular App
      • Angular Directives and Filters
      • Angular Animation
      • Angular Bootstrap Directives
        • Bootstrap Modals
      • Angular $http
      • Angular Services
        • Service Recipes
        • ngResource
        • Star Wars Codealong
      • Angular Routing
      • Angular + Express
      • Angular Authentication
        • Additional Topics
      • Angular Components
      • Angular Custom Filters
      • Angular Custom Directives
Powered by GitBook
On this page
  • What is a hash table?
  • Use Cases
  • Hash Table Concepts
  • Sizing
  • Hashing Function
  • Selecting keys
  • Hashing
  • Collisions
  • Collision Resolution Strategies
  • Open Addressing or Probing
  • Closed Addressing or Chaining
  • Hash Table Operations
  • search(key)
  • insert(key, val)
  • remove(key)
  • Test It!
  • Additional Resources
  1. Data Structures and Algorithms

Hash Tables

What is a hash table?

We've been using a very hash-table-like structure in both JavaScript and Python. The JavaScript object and the Python dictionary are both implementations of a hash table or hash map. It is an array-like data structure in which every element is associated with a key.

A hash table always contains a hashing function that will take the key, hash and mod (%) it, and turn it into an internal index into which we will insert our item. When we want to access what we have inserted, we use the key and the hash table translates it into the index so it knows where to find the item we want.

Use Cases

Well, what have you been using objects and dictionaries for? We use them mostly in the same way as arrays but they are better for when we want a way to access the elements other than using an integer index. Arrays have inherent ordering; objects, dictionaries, and hash tables do not. Also, knowing the key always gives us constant-time search in addition to constant-time insertion and deletion - an advantage over arrays!

Hash Table Concepts

Sizing

At its heart, a hash table is yet another array. When we make a hash table, we create it with a certain size, just like an array in native languages. Typically, we choose the size to accomodate all the known data we have and then some room for growth. Resizing a hash table is a very involved operation so pick an appropriate number for your app's needs.

Hashing Function

Arrays use integers as indices but we want to use keys, so we must find a way to translate a key into an index in our hash table's internal array. To do this we use a hashing function, a hallmark feature of hash tables. This function takes a key, hashes it (a type of mathematical encryption of sorts), and moduluses the result by the number of slots in the array. This computation will give us some index within the bounds of the array. By using the key and the hashing function, we can access elements in the hash table with the same efficiency as arrays but can search them even faster!

Selecting keys

To get the keys to hash, we select some sort of key from the item we want to store in the table. For example, if we were storing users, we might use the email address as the key since it should be unique within our application. Database objects could use their id column values as the key since those are also generated to be unique. The more unique, the better!

Hashing

Once we have the key, we hash it using one of the many hash libraries out there. In Python, we will be using a library called hashlib which will generate very good, consistent hashes from our keys. It generates a long identifier of alphanumeric characters. We take this long identifier, convert it to an integer, and compute the modulus with the number of slots in the hash table. This generates an index and we insert the item into that index.

Collisions

In a perfect world, we'll always be able to hash our keys to perfectly unique hashes that map perfectly to unused indices in our hash table. Yeah, right! Even with the best hashing algorithms and key selection, given enough time and data, collisions will happen.

A collision occurs when an item is hashed to be in a slot already occupied by another item. We will learn of a couple of simple strategies to deal with this, however a collision resolution strategy can still break down under the increased collisions that come with a poor hashing function or poor key selection.

Imagine if we generated hashes based on the first letter of someone's name. Obviously there would be a ton of collisions as there are many names that start with the same letter. Even after hashing it and modding it we would still have a crushing number of collisions to deal with. The same would be true if we had an inadequate amount of space. Since extra steps must be taken to deal with a collision, if we get enough of them we lose the efficiency of the structure. Choosing a good hashing function, a good initial size, and a good collision resolution strategy are crucial for writing a good hash table.

Collision Resolution Strategies

Open Addressing or Probing

Probing is, quite simply, looking sequentially (or sometimes quadratically) for the next available open address. In linear probing, the algorithm simply iterates over the table until it finds an empty slot and then puts the item there. This can lead to clusters of elements forming which can reduce preformance if they get big enough so sometimes we look for open slots quadratically, a method where we increase the gap to the square of the next integer (e.g. first we look 1 slot ahead, then 4 (2_2), then 9 (3_3), then 16 (4*4), and so on.) This spreads out inserts over the table and helps maintain performance.

This means that our search strategy now needs to involve locating an initial slot, checking the item there to see if it is the correct one, and if not, iterating over the next few until it finds an item that has a matching key.

This does definitely extend the lookups for some items but if we ammortize all the lookups we find that our average search time is far closer to constant than it would be in an array.

Closed Addressing or Chaining

In the chaining strategy, each slot in the table is actually a bucket that holds multiple items and when a collision occurs, we simply add the new item as a new element in the bucket contained in that slot in the hash table. This makes insertions a bit faster than in the probing strategy since we don't need to iterate over an arbitrary number of elements to find an empty slot. Typically the bucket is implemented with a linked list but we will use a list (array) for simplicity. This will be our strategy.

Hash Table Operations

The basic operations we will need are search, insert, and remove.

search(key)

  1. Pass in the key as a parameter

  2. Hash the key to find the index

  3. Look in the array at that index

  4. Find the item in that bucket and return it

insert(key, val)

  1. Pass in the key and the value to be stored

  2. Hash the key to get the index

  3. Add the item to the end of the bucket at that index

remove(key)

  1. Pass in the key as a parameter

  2. Hash the key to find the index

  3. Look in the array at that index

  4. Find the item in that bucket and delete it

Test It!

"If you haven't tested it, it doesn't work."

Make sure you test every operation in every edge case you can think of. With data structures, it's good to test how functions behave when the structure contains:

  • Zero items (edge case)

  • One item (edge case)

  • Many items (normal usage)

  • Near capacity (edge case)

  • Full cpacity (edge case)

Typically our code works fine in the "normal usage" cases when we first write it. It's the edge cases that usually cause problems so make sure you thoroughly test.

Additional Resources

PreviousBeginner SolutionsNextIntro to Sorting

Last updated 3 years ago

Basic Data Structures: Hash Tables
Wikipedia's entry on Hash Tables
Stack Overflow question on Hash Tables
Generate a hash from a string in JavaScript