Here's a question I like asking people: "Imagine a website database with 1 billion rows of messages. How long do you think it would take to query the database for a particular row?"
Try to decide on your own answer before reading on. I usually hear answers ranging from 100 milliseconds to 1 minute.
Of course, the answer would depend on how well the database is indexed.
What is indexing?
An index is simply a pointer to specific data in a database. By defining rules for how a database is indexed, you can dramatically speed up lookups of specific data. You can think of indexing as deciding how rows are stored, like books in a library: if all the books are shelved alphabetically, you can find the one you want in a handful of steps, instead of scanning every single book.
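The library analogy is really binary search: a sorted shelf lets you halve the search space at each step. A minimal sketch (the "library" of book titles is made up for illustration):

```python
import bisect

# A hypothetical "library" of one million sorted book titles.
books = [f"book-{i:07d}" for i in range(1_000_000)]

def linear_find(title):
    # Unindexed: check every book, one by one, until we find it.
    for i, b in enumerate(books):
        if b == title:
            return i
    return -1

def indexed_find(title):
    # "Indexed": the shelf is sorted, so binary search needs
    # only about 20 comparisons for a million books.
    i = bisect.bisect_left(books, title)
    if i < len(books) and books[i] == title:
        return i
    return -1

# Both find the same book; the second touches ~20 entries, not ~1,000,000.
assert linear_find("book-0999999") == indexed_find("book-0999999") == 999_999
```

Database indexes (typically B-trees) generalize the same idea to data on disk.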
Our intuition has it that, as this library grows, finding anything in it gets harder and harder.
This intuition is further internalized by our day-to-day experience with computers. It's not uncommon to search for a word in a large text file and end up watching your computer become unresponsive for several seconds.
In reality, a well-indexed table with a billion rows will still return the row you're looking for in under a millisecond. But without an index, the database engine literally has to scan every row, one by one, which can take more like 30 seconds.
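You can see the gap on a much smaller table. This sketch uses SQLite (table and index names are illustrative); the `+id` trick is SQLite's documented way of preventing the planner from using the index, which forces the full scan for comparison. Absolute timings vary by machine, but the relative gap is the point:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    ((i, f"message {i}") for i in range(200_000)),
)
conn.execute("CREATE INDEX idx_messages_id ON messages (id)")

def timed(sql, *args):
    start = time.perf_counter()
    row = conn.execute(sql, args).fetchone()
    return row, time.perf_counter() - start

# Indexed lookup: the engine walks the B-tree straight to the row.
row, fast = timed("SELECT body FROM messages WHERE id = ?", 199_999)

# Unindexed lookup: '+id' disqualifies the index, forcing a full scan.
row2, slow = timed("SELECT body FROM messages WHERE +id = ?", 199_999)

print(f"indexed: {fast * 1e6:.0f} us, full scan: {slow * 1e6:.0f} us")
```

Even at 200,000 rows the full scan is typically orders of magnitude slower, and the gap grows linearly with table size.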
Estimating the performance you require
Let's say you have a website that gets 1 million page views per day. This sounds like a lot, but if traffic were consistent, it would amount to roughly 12 page loads per second. As long as each page load takes up less than about 100ms of database time, you should be good.
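The back-of-the-envelope arithmetic:

```python
# 1 million page views spread evenly over a day.
page_views_per_day = 1_000_000
seconds_per_day = 24 * 60 * 60          # 86,400
loads_per_second = page_views_per_day / seconds_per_day
print(round(loads_per_second, 1))       # 11.6 page loads per second

# If the database handles loads one at a time, each load's budget is:
budget_ms = 1000 / loads_per_second
print(round(budget_ms))                 # 86 ms of database time per load
```

Real traffic is bursty rather than uniform, so the true peak budget is tighter; treat ~100ms as a generous upper bound, not a target.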
Of course, each page load may use many SQL queries - sometimes hundreds. The good news is, if your tables are indexed well, you will be able to handle many thousands of queries per second. It is not unheard of for MySQL or PostgreSQL to process hundreds of thousands of queries per second.
How to index properly
It turns out there's no formula for proper indexing. This is because choosing the right index doesn't just depend on your table or query, but also on the data distribution present in your tables.
It is further complicated by the fact that different database engines, like PostgreSQL and MySQL, often choose indexes differently when planning your queries. Even different versions of the same engine (say, PostgreSQL 9 vs. 11) may end up choosing vastly different indexes for the exact same query.
The solution is to make a habit of using EXPLAIN ANALYZE. Prefixing your query with these two words runs it and returns the actual execution time, along with the step-by-step plan the engine used. From that plan you can see which indexes were (or weren't) used at each step, and decide how to optimize the query for better performance.
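PostgreSQL's EXPLAIN ANALYZE output is plain text and engine-specific, so as a runnable stand-in this sketch uses SQLite's analogous EXPLAIN QUERY PLAN. It shows the planner switching from a full scan to an index lookup the moment an index exists (the exact wording of the plan text varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, body TEXT)")

# No index yet: the plan's detail column reports a scan of the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT body FROM messages WHERE id = 42"
).fetchone()
print(plan[-1])   # e.g. "SCAN messages" -- every row is examined

conn.execute("CREATE INDEX idx_messages_id ON messages (id)")

# Same query, re-planned: the engine now searches the index instead.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT body FROM messages WHERE id = 42"
).fetchone()
print(plan[-1])   # e.g. "SEARCH messages USING INDEX idx_messages_id (id=?)"
```

The habit is the same in PostgreSQL: look for "Seq Scan" nodes in the plan where you expected an "Index Scan".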
You can read more about reading query plans in the PostgreSQL Documentation.
One way to keep your queries performant is to automatically run EXPLAIN ANALYZE and log the results whenever a query takes more than 3ms to execute. Adding this to your workflow lets you spot queries that start using the wrong index, and then fix or rewrite them so that they execute faster.
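A hypothetical wrapper along those lines might look like this. The threshold, names, and SQLite stand-in are all illustrative; with PostgreSQL you would log the output of EXPLAIN ANALYZE instead:

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("slow-queries")

SLOW_QUERY_THRESHOLD = 0.003  # 3 ms, as suggested above

def execute_logged(conn, sql, params=()):
    """Run a query; if it exceeds the threshold, log its plan for review."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_QUERY_THRESHOLD:
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        log.warning("slow query (%.1f ms): %s\nplan: %s",
                    elapsed * 1000, sql, plan)
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, body TEXT)")
rows = execute_logged(conn, "SELECT * FROM messages WHERE id = ?", (1,))
```

Note that re-running a query for its plan adds overhead, so in production you would sample slow queries rather than re-explain every one (PostgreSQL's auto_explain module does exactly this).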
Databases that are well-indexed have fast lookup times, ensuring that your application can quickly find the specific data it needs. This is critical for a good user experience: if your application takes a long time to find something and show the result, your users will get frustrated and stop using it.
If your database is very large and returning results slowly, the first thing to check is whether your indexes are doing their job. Use EXPLAIN ANALYZE to see how your queries are actually executed, and use that information to add indexes or rewrite the queries so lookups are faster.