Got 1Billion files? Find it with QuikFynd Enterprise Search running on Intel Xeon(R) Platinum Processors

How we doubled performance with the power of multiple cores

When we developed QuikFynd Enterprise Search, our goal was to make the solution extremely scalable so that it could serve the needs of both small and large companies when it comes to finding data within their enterprises. We were quite confident in our design because of QuikFynd’s efficient use of multiple cores available on the latest generation of server CPU. So, when Intel announced their latest Xeon(R) Platinum 8180 processors, we were eager to give it a try. We knew that our software will really shine on the latest generation CPUs because of its large number of cores and optimized cache architecture.

Intel Xeon(R) Platinum 8180 processor packs up to 28 CPU cores with 56 threads on a single CPU. On a typical 2S system used for the most enterprise application, we would have a total of 56 cores and 112 threads. Compared to previous generation system we used Intel Xeon(R) E5-2697 CPU, this was an exact doubling of the number of cores. Moore’s law is still at work at Intel. 

But could we really double our performance with double the number of cores? It appears logical but in practice, it’s not that easy. See, not all workloads are created equal. If you are running a simple word processing program on a massively multi-processor system, you can’t possibly utilize all the performance, a server platform has to offer and in most cases, you don’t even need very high compute capacity for a simple application like word processing. Searching through vast amounts of data is different though. A company could easily have tens of millions of files and hundreds or even thousands of employees searching through these files every day to find the information they are looking for to perform their work. There is always a need to return search results quicker and to enable more people to search simultaneously without increasing number of servers used. But achieving this feat is no easy task, so here we would like to share our approach to achieve maximum scalability.

A search server typically has 3 main components, a web server that responds to user queries and provides responses, a query engine that takes requests from a web server, formulates a query and transforms results and a database that stores the data that needs to be queried. 

A first obvious design consideration is to use a web server component that can serve multiple requests in parallel. Basically, as multiple requests come in, the server could start a new thread or a new process (more on that later). These threads or processes will execute in parallel and be able to offer more queries per second. If your server platform has the number of cores available then it could process more number of queries in parallel and in theory give you a linear performance increase as the number of cores are increased. Now there are some differences between threads and processes. Some software systems will not use a separate core for multiple threads. So, in such a case you need multiple processes versus multiple threads. While we don’t want to be recommending any particular web server, but QuikFynd solution uses multiple process design for its web server and each request gets executed in a separate process, hence it can scale pretty much linearly with the number of available cores.

Your second design consideration is to optimize the number of processes you use because spawning new processes is expensive. Because you don’t expect a single user to fire off many queries in the fraction of a second, it’s probably ok to send all requests from a single user to the same process. This avoids spinning up new processes if requests are coming in from the same user. More importantly, if you start a new process for every request then you can’t do caching effectively because each process will get its own memory space. Users are limited by how fast they can type a search query and process the response. To design for this scenario, QuikFynd chooses to pin a user to a process, so all requests from a user will use the same process. We can still use multiple processes for serving requests from multiple users and this is more close to a real-life scenario. This way, we can support a large number of users.

You will get some scalability by making your web server scalable but it is unlikely that you will still be using your cores effectively because after your web server design is tuned, the bottleneck has shifted. Consider this. To offer full-text search results from 100s of millions of files, the size of the database is several hundred gigabytes. This means that you can’t possibly keep all this data in system memory and it has to be on a disk. If you are designing for real life scenario, you can’t possibly count on same keywords being queries again and again either, so you are likely going to read randomly from the database. While you may have over 100 cores in your server platform, your disk I/O has become the bottleneck because over 100 processes are trying to read from the same file on the disk. There is an obvious (but not so easy to implement) solution to this. Basically, you need to segment the database into multiple smaller databases. This way, multiple processors could read from those files simultaneously. You should also consider the performance of your file system here. For running QuikFynd on Intel SkyLake CPUs, we used Lustre - a network file system that enables parallel reads and it is super fast. By using Lustre and breaking up our database into smaller chunks, now we could achieve the kind of parallelism we were aiming for. 

So, here you have it. You can really find a needle in the haystack or put another way a single word amongst 1B files using a single Intel Xeon(R) Platinum 8180 processor and using QuikFynd Enterprise Search that is designed to take full advantage of multiple cores. 

Intel published the results of QuikFynd’s performance (shown below) and also available at link on Intel web site

 

QuikFynd performance on Intel Xeon(R) Platimum 8180 Platform

We were able to achieve 79% performance improvement compared to the predecessor server platform - Intel Xeon(R) E5-2697 v3. While it is not quite 2X but it is fairly close. For enterprises, this can result in great savings. Using QuikFynd Enterprise Search you can consolidate all of your search work on a single server. And if you are a service provider offering search services to your customers, you can use a single server for your multiple customers.

I believe the number of cores will continue to increase from generation to generation and applications like QuikFynd Enterprise Search will take full advantage of it by delivering search results faster than ever.

Do you think having too many cores is an overkill or still not enough? Please share your thoughts and opinions.