Those CPU features (AVX2 and whatnot) need to be detected at runtime, too, however.
Those ifdefs only detect whether the compiler supports them, i.e. at build time only.
So your program only includes the AVX2 (and similar) code paths if the compiler supports them. You should therefore build with a compiler that supports all of those feature sets (because you want everything compiled into one executable, of course), and then use runtime checks so the program can verify that the CPU it actually runs on really supports AVX2, for example, and select the best implementation based on the available CPU features.
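For illustration, here's a minimal sketch of that dispatch in C, assuming each variant is compiled in its own translation unit with the matching -mavx2 / -msse4.2 flags; the function names are hypothetical, not krep's actual API. GCC and Clang expose the runtime check as __builtin_cpu_supports:

    #include <stddef.h>

    /* Hypothetical per-ISA implementations, each compiled in its own file. */
    size_t count_matches_avx2(const char *hay, size_t n, const char *pat, size_t m);
    size_t count_matches_sse42(const char *hay, size_t n, const char *pat, size_t m);
    size_t count_matches_scalar(const char *hay, size_t n, const char *pat, size_t m);

    typedef size_t (*count_fn)(const char *, size_t, const char *, size_t);

    static count_fn select_impl(void)
    {
        __builtin_cpu_init();  /* must run once before the checks below */
        if (__builtin_cpu_supports("avx2"))
            return count_matches_avx2;
        if (__builtin_cpu_supports("sse4.2"))
            return count_matches_sse42;
        return count_matches_scalar;  /* portable fallback, always compiled in */
    }

The selection runs once at startup; after that, dispatch costs a single indirect call.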
To make things a bit more complicated, let me quote a part from one of the projects he has: "The detection is performed at configure time through both CPUID flags and actual instruction execution tests on the host machine, verifying support in both the CPU and operating system." Currently you are covering only the "OS", or rather, compiler, side of this, since you are using only macro definitions.
Once you add runtime detection, the claim "Automatically leverages SSE4.2 and AVX2 instructions when available for maximum throughput" from the feature list on the website will actually be correct/accurate.
If interested, someone I know (or rather, follow) has a single header file for detecting CPU features at runtime (for C), and he also has a build-time detection one, but that one has many more features.
I am interested in the CPU intrinsics detection in a single header file, if you don’t mind dropping the link.
It may have SIMD features hard-coded at compile time, it might not have regular expressions (even though the "re" in grep means regular expression), and the benchmarks might be so short that they are mostly measuring process startup time (as well as mixing SIMD features and algorithms when measuring), but that landing page is slick.
You can read my blog post about the project at https://dev.to/daviducolo/introducing-krep-building-a-high-p...
Interesting. You may be interested in a more modern search algorithm to replace Boyer-Moore. I recently presented HashChain, a very fast sublinear search algorithm, at the Symposium on Experimental Algorithms (SEA).
https://drops.dagstuhl.de/storage/00lipics/lipics-vol301-sea...
It's the fastest sublinear search algorithm around in almost all cases. I also have a guaranteed worst-case linear version (which is still sublinear and much faster than Boyer Moore in the average case).
Sample code is available here:
https://github.com/nishihatapalmer/HashChain
If you're interested, let me know.
Hi David.
$ (for x in `seq 1 100000`; do echo 'I am a Test Vector HeLlO World '"$x"; done) > /dev/shm/krep_tmp

Best of three runs shown:

$ time ./krep -i hello /dev/shm/krep_tmp
Found 43721 matches
Search completed in 0.0017 seconds (2017.44 MB/s)
Search details:
- File size: 3.52 MB
- Pattern length: 5 characters
- Using AVX2 acceleration
- Case-insensitive search

real 0m0,005s
user 0m0,001s
sys 0m0,004s

$ time ./krep HeLlO /dev/shm/krep_tmp
Found 82355 matches
Search completed in 0.0014 seconds (1259.72 MB/s)
Search details:
- File size: 1.71 MB
- Pattern length: 5 characters
- Using AVX2 acceleration
- Case-sensitive search

real 0m0,004s
user 0m0,003s
sys 0m0,004s

$ time ./krep -i "HeLlO World" /dev/shm/krep_tmp
Found 99958 matches
Search completed in 0.0021 seconds (1700.54 MB/s)
Search details:
- File size: 3.52 MB
- Pattern length: 11 characters
- Using AVX2 acceleration
- Case-insensitive search

real 0m0,005s
user 0m0,002s
sys 0m0,004s

$ time ./krep "I am a Test Vector HeLlO World" /dev/shm/krep_tmp
Found 3964 matches
Search completed in 0.0149 seconds (235.83 MB/s)
Search details:
- File size: 3.52 MB
- Pattern length: 30 characters
- Using AVX2 acceleration
- Case-sensitive search

real 0m0,016s
user 0m0,015s
sys 0m0,001s

$ time ./krep -i "I am a Test Vector hello World" /dev/shm/krep_tmp
Found 3964 matches
Search completed in 0.0178 seconds (197.70 MB/s)
Search details:
- File size: 3.52 MB
- Pattern length: 30 characters
- Using AVX2 acceleration
- Case-insensitive search

real 0m0,021s
user 0m0,017s
sys 0m0,004s

Benchmark with fgrep (the first run was good enough):

$ time fgrep -ci hello /dev/shm/krep_tmp
100000

real 0m0,003s
user 0m0,003s
sys 0m0,000s

$ time fgrep -ci "I am a Test Vector hello World" /dev/shm/krep_tmp
100000

real 0m0,010s
user 0m0,009s
sys 0m0,000s

$ time fgrep -c "I am a Test Vector HeLlO World" /dev/shm/krep_tmp
100000

real 0m0,005s
user 0m0,004s
sys 0m0,001s

CPU model name: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz. There are 40 GB of RAM free and 10 cores doing nothing; the shell is in a cpuset. On commit 95ed1853b561396c8a8bcbbdd115ed6273848e3f (HEAD -> main, origin/main, origin/HEAD). gcc is 13.3.0-6ubuntu2~24.04.

tl;dr: krep produces obviously wrong results, slower than fgrep.
Consider using a bigger haystack. Your timings are so short that you're mostly just measuring the overhead of running a process.
This is relevant to krep because it spawns threads to search files (I guess for files over 1MB?).
This does not mean your benchmark is worthless. It just means you can't straightforwardly generalize from it.
The incorrect results are far more important than the times!
I agree.
That's a good point, though the readme does flatly state that krep "is designed with performance as a primary goal," so the lede's generalization that it is "blazingly fast" isn't correct, despite the later, more deeply buried caveat that "Performance may vary based on hardware, file characteristics, and search pattern" (which describes all software). And the comment you answered doesn't say just that krep is "slower" than fgrep; it says krep "produces obviously wrong results" slower.
Edit: and the fact that krep lacks regular-expression support means it's not a replacement for grep or meaningfully comparable with it.
I try my best to interpret pithy phrases describing a project as first order approximations, rather than literal statements of truth that perfectly generalize. Pithiness is important for communicating ideas quickly, but precision and pithiness are often in tension with one another. So I adjust my expectations accordingly.
Yes, I agree that the wrong results are bad. But that doesn't invalidate my point. I even went out of my way to clarify that the benchmark wasn't worthless. Benchmarking the small input case is absolutely worth it. You just can't tell much about its scaling properties when your measurement is basically "how fast does the process start and stop." Which, again, to be clear, IT MATTERS. It just probably doesn't matter as much as readers might think it matters when they see it.
So treat my comment as adding helpful context for readers that aren't experts in benchmarking grep tools from someone experienced in... benchmarking grep tools. :-) (And regexes in general. See: https://github.com/BurntSushi/rebar)
> Consider using a bigger haystack.
O_o
Absolutely not.
> Your timings are so short
Who gives a fuck? krep gives the wrong answer. Nobody needs to know that slower.
> you're mostly just measuring the overhead of running a process
Bull shit.
> This is relevant to krep because it spawns threads to search files (I guess for files over 1MB?).
Don't care.
> This does not mean your benchmark is worthless
I know that.
> It just means you can't straightforwardly generalize from it.
Don't tell me what I can't do, when I'm obviously doing it right in front of you. You'll never learn a damn thing that way.
Give the author some grace. Sheesh. You sound like a toxic senior dev.
What about opening an issue with the incorrect results?
I'm curious: why does krep run faster on large files when multithreaded?
Naively, isn't IO the bottleneck?
I.e., I'd think that loading a file would be slow enough that krep would be IO-bound.
Do you have a typical ratio of IO time to search time on a modern disk and CPU?
What about a producer-consumer model where one thread reads files and creates an in-memory queue of file contents; and a different thread handles the actual searching without pauses for IO?
Edit: If you're truly CPU-bound, another variation of producer-consumer is to have a single thread read files into a queue, and then multiple threads search through them, each thread handling a single file at a time (see the sketch below). This eliminates the shared-memory issue that you allude to with the overlap.
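A rough sketch of that second variant with POSIX threads — one reader pushing whole files onto a shared queue, workers each searching one file at a time. All names are illustrative; this is not krep's actual design:

    #include <pthread.h>
    #include <stdlib.h>
    #include <stddef.h>

    /* One job per file: the reader thread allocates and fills `data`,
     * a worker searches it and frees it. */
    typedef struct job {
        char *data;
        size_t len;
        struct job *next;
    } job_t;

    static job_t *queue_head;      /* intrusive list used as the work queue */
    static int reader_done;
    static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

    static void queue_push(job_t *j)   /* called by the single reader thread */
    {
        pthread_mutex_lock(&mu);
        j->next = queue_head;
        queue_head = j;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mu);
    }

    static void queue_finish(void)     /* reader calls this after the last file */
    {
        pthread_mutex_lock(&mu);
        reader_done = 1;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&mu);
    }

    static job_t *queue_pop(void)      /* returns NULL once everything is drained */
    {
        pthread_mutex_lock(&mu);
        while (queue_head == NULL && !reader_done)
            pthread_cond_wait(&cv, &mu);
        job_t *j = queue_head;
        if (j)
            queue_head = j->next;
        pthread_mutex_unlock(&mu);
        return j;
    }

    static void *search_worker(void *arg)
    {
        (void)arg;
        job_t *j;
        while ((j = queue_pop()) != NULL) {
            /* search j->data here: one whole file per thread, so there are
             * no chunk boundaries and no overlap bookkeeping */
            free(j->data);
            free(j);
        }
        return NULL;
    }

Because each worker owns exactly one file at a time, the searcher code stays single-threaded and simple; all the synchronization lives in the queue.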
I didn't read the source, but the description says it uses memory mapping. So my guess is that IO isn't so much of an issue, since prefetching can hide the latency if you are able to memory-map a large enough segment of the file.
If the statement about prefetching is true, though, I wonder whether the prefetching would be bamboozled by multiple threads accessing the file.
Forgot about memory mapping.
In that case it probably makes more sense to have a shared queue of files, and each thread handles a single file at a time. It'll avoid the overlap issue.
Was this entire blog post written by AI too? Honestly, this is an impressive feat if so, but the fake benchmarks are really not a good look.
The homepage shows it as significantly faster than ripgrep. Impressive. I would like to see how it compares across ripgrep's entire benchmark suite[1], which also includes a few other similar utilities.
edit: Getting an error related to madvise(). Had to insert '-D_GNU_SOURCE' into the Makefile's CFLAGS.
[1]: https://github.com/BurntSushi/ripgrep/blob/master/benchsuite...
Since krep doesn’t support regular expressions, it can’t run most of the ripgrep benchmark suite.
Thanks, I've just fixed that: https://github.com/davidesantangelo/krep/commit/95ed1853b561...
Where are the test cases?
E.g. the chunk boundary stuff in the multi-threaded file search is something that would make me nervous.
It brings new edge cases into a simple search, and those edge cases are not directly related to features in the data; just to the happenstance of where the chunk boundaries land.
Just by adding one character to a file, a character that obviously lies far outside of any match, we shift the boundaries such that a match could break megabytes away from the insertion.
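For what it's worth, the standard way to make the chunking deterministic is to overlap chunks by pattern_len - 1 bytes and only report matches that start inside a thread's own half-open range. A hedged sketch (illustrative names, not krep's actual code):

    #include <stddef.h>

    /* Each thread scans [start, start + len) but only *reports* matches whose
     * starting offset is below report_limit, so the overlap region is scanned
     * by two threads yet counted by exactly one. */
    typedef struct {
        const char *start;
        size_t len;          /* bytes to scan, including the overlap tail */
        size_t report_limit; /* report only matches starting before this offset */
    } chunk_t;

    static void make_chunks(const char *data, size_t file_len, size_t pattern_len,
                            chunk_t *out, size_t nthreads)
    {
        size_t base = file_len / nthreads;
        for (size_t t = 0; t < nthreads; t++) {
            size_t begin = t * base;
            size_t end = (t == nthreads - 1) ? file_len : begin + base;
            /* Extend the scan window by pattern_len - 1 bytes so a match that
             * straddles the boundary is fully visible to this thread. */
            size_t ext = end + pattern_len - 1;
            if (ext > file_len)
                ext = file_len;
            out[t].start = data + begin;
            out[t].len = ext - begin;
            out[t].report_limit = end - begin;
        }
    }

With this scheme, inserting a character can shift which thread reports a given match, but never whether it is reported.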
added https://github.com/davidesantangelo/krep/tree/main/test
Do those exercise the logic kazinator called out as test-worthy? To my eye, no. They don't use search_file, and their inputs are smaller than MIN_FILE_SIZE_FOR_THREADS anyway.
I'm inclined to agree with kazinator. The code here: <https://github.com/davidesantangelo/krep/blob/ac6783af42c92f...> looks wrong to me. It potentially increases `chunk_size` but doesn't reduce the number of loop iterations to be consistent with that. Maybe search_thread recognizes the boundary violation and does nothing? That would be a relatively harmless outcome, but it's strange to launch threads that do nothing. And actually it's not immediately obvious to me that it even recognizes whether end_pos is beyond file_len. The code for handling and skipping overlaps in search_thread also looks test-worthy.
The priority is probably to cover the search algorithms themselves first. These are the bits that are likely to be reused in other programs.
This wouldn't build for me, so I had to apply the patch suggested by a sibling comment.
Once I got it building, my first benchmark attempt shows it as being slower:
$ curl -LO 'https://burntsushi.net/stuff/subtitles2016-sample.en.gz'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  265M  100  265M    0     0  48.6M      0  0:00:05  0:00:05 --:--:-- 49.9M
$ gzip -d subtitles2016-sample.en.gz
$ hyperfine --ignore-failure "rg -c 'ZQZQZQZQ' subtitles2016-sample.en" "krep -c 'ZQZQZQZQ' subtitles2016-sample.en"
Benchmark 1: rg -c 'ZQZQZQZQ' subtitles2016-sample.en
  Time (mean ± σ):      80.7 ms ±   1.6 ms    [User: 57.7 ms, System: 22.7 ms]
  Range (min … max):    75.3 ms …  83.3 ms    35 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: krep -c 'ZQZQZQZQ' subtitles2016-sample.en
  Time (mean ± σ):     122.8 ms ±   1.4 ms    [User: 372.6 ms, System: 24.4 ms]
  Range (min … max):   120.2 ms … 125.5 ms    24 runs

Summary
  rg -c 'ZQZQZQZQ' subtitles2016-sample.en ran
    1.52 ± 0.03 times faster than krep -c 'ZQZQZQZQ' subtitles2016-sample.en

That's a benchmark with no matches, which is essentially the best case for throughput. Now I want to try a benchmark with a high match frequency:

$ hyperfine "rg -c 'the' subtitles2016-sample.en" "krep -c 'the' subtitles2016-sample.en"
Benchmark 1: rg -c 'the' subtitles2016-sample.en
  Time (mean ± σ):     411.8 ms ±   3.6 ms    [User: 389.7 ms, System: 21.1 ms]
  Range (min … max):   404.8 ms … 415.7 ms    10 runs

Benchmark 2: krep -c 'the' subtitles2016-sample.en
  Time (mean ± σ):     121.2 ms ±   1.9 ms    [User: 364.6 ms, System: 24.9 ms]
  Range (min … max):   113.2 ms … 123.0 ms    24 runs

Summary
  krep -c 'the' subtitles2016-sample.en ran
    3.40 ± 0.06 times faster than rg -c 'the' subtitles2016-sample.en

Which is very nice. So I decided to poke at it:

$ krep -c the subtitles2016-sample.en
Found 29794426 matches
$ rg -c the subtitles2016-sample.en
6123710
$ grep -c the subtitles2016-sample.en
6123710

The counts are way off here. At first I thought maybe it was counting every occurrence of `the` instead of every matching line, but when I ask ripgrep to do that, it gets a different answer:

$ rg -oc the subtitles2016-sample.en
7739791
$ rg -o the subtitles2016-sample.en | wc -l
7739791
$ grep -o the subtitles2016-sample.en | wc -l
7739791

So I'm not sure what's going on here, but it looks like `krep` might not be giving accurate results. Pushing it a bit more, it seems like it just kind of falls over?

$ time rg -c 'You read Sherlock Holmes to deduce that\?' subtitles2016-sample.en
10

real    0.076
user    0.049
sys     0.026
maxmem  923 MB
faults  0

$ time krep -c 'You read Sherlock Holmes to deduce that?' subtitles2016-sample.en
Found 0 matches

real    0.935
user    3.597
sys     0.029
maxmem  918 MB
faults  0
I ran the above benchmarks in `/dev/shm` on Linux with an i9-12900K.

In terms of the approach here, ripgrep is already using a pretty sophisticated substring search algorithm: https://github.com/BurntSushi/memchr?tab=readme-ov-file#algo...
And it uses memory maps (sometimes, when it thinks it will be fast, but it will do so in the single file case on Linux).
ripgrep also uses parallelism, but at inter-file level. It sounds like `krep` also uses parallelism, but will use multiple threads when searching a single file. I've considered doing the same in ripgrep, but haven't done enough experiments (or seen enough from someone else) to be convinced that it's the right way to go in general. It might edge out single threaded search in some cases for sure though.
EDIT: Looking at the timings in https://dev.to/daviducolo/introducing-krep-building-a-high-p..., I see, for example, ripgrep taking >40 seconds to search for the literal pattern `error` in a 5GB file. Even if you're reading from disk (and the OP is using an SSD), that does not seem right at all. Even for an exceptionally common word like `the` in this haystack, ripgrep can chew through a 13GB file in 5 seconds on my machine:

$ time rg -c the full.txt
83499915

real    5.404
user    5.092
sys     0.302
maxmem  12511 MB
faults  0
Even if I force reading from disk, we get nowhere near 40 seconds:

$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time rg -c the full.txt
83499915

real    10.577
user    5.191
sys     2.105
maxmem  12511 MB
faults  42

I'm not saying the benchmark results are definitely wrong, but something looks off here that I can't easily explain. OP, can you please share a way to fully reproduce your benchmark? (Like I did above for `subtitles2016-sample.en`.)

I had a similar experience, but testing by running `strings` on the Steam Deck repair image (the largest file I had handy) to create a 203 MB strings file with 34,206,436 lines, and then checking it for the string "Steam":

$ time fgrep -c "Steam" /tmp/steamstrings
241
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox,.venv,venv  0.09s user 0.03s system 99% cpu 0.112 total
$ time rg -c Steam /tmp/steamstrings
241
rg -c Steam /tmp/steamstrings  0.03s user 0.02s system 92% cpu 0.054 total
$ time ~/source/other/krep/krep "Steam" /tmp/steamstrings
Found 2226035 matches
Search completed in 0.0338 seconds (5991.67 MB/s)
Search details:
- File size: 202.56 MB
- Pattern length: 5 characters
- Using AVX2 acceleration
- Case-sensitive search
~/source/other/krep/krep "Steam" /tmp/steamstrings  0.08s user 0.02s system 225% cpu 0.045 total
So krep is:
1. Extremely fast
2. Extremely inaccurate
3. Not useful if you want to see what the lines actually are, rather than just knowing how many there aren't
Not to be facetious, but if the goal is to write a program that gives incorrect output as fast as possible I don't think you need to go as far as using AVX2.
I have tried this on a couple of different machines. On one machine it gives ridiculous answers like you found. On the other it at least works as expected, although it's kinda useless since it doesn't print the matched lines.
On the working machine it reported using SSE4.2 acceleration, while the broken one used AVX2 acceleration. However, the machine using SSE4.2 didn't see nearly as much speedup as the AVX2 machine: regular system grep on the SSE4.2 machine took 0.186 seconds to do the search, while krep needed 0.154 seconds. Then again, the biggest test file I had handy was only 123 MB, so maybe the lead will grow with a larger file?
That's probably because pcmpestri is trash for substring search. There is a good reason why ripgrep doesn't use it. :-)
I looked for an authoritative source on why pcmpestri is trash, and I couldn't find anything I was happy linking to other than Agner Fog's instruction tables: https://www.agner.org/optimize/instruction_tables.pdf You can see that the throughput and latency for pcmpestri are just awful.
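To make that concrete, here's roughly what a PCMPESTRI substring loop looks like via the _mm_cmpestri intrinsic — a sketch for illustration only (not krep's actual code), limited to needles of at most 16 bytes. Every 16-byte step is serialized through one high-latency instruction, which is why throughput suffers:

    #include <nmmintrin.h>  /* SSE4.2; build with -msse4.2 */
    #include <string.h>

    /* Sketch: find needle `nee` (1 <= m <= 16) in haystack `hay`. */
    static const char *sse42_find(const char *hay, size_t n,
                                  const char *nee, size_t m)
    {
        char padded[16] = {0};
        memcpy(padded, nee, m);  /* avoid reading past the end of the needle */
        const __m128i pat = _mm_loadu_si128((const __m128i *)padded);

        size_t i = 0;
        while (i + 16 <= n) {
            __m128i blk = _mm_loadu_si128((const __m128i *)(hay + i));
            /* Index of the first (possibly partial) ordered match in blk. */
            int idx = _mm_cmpestri(pat, (int)m, blk, 16,
                                   _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED);
            if (idx == 16) { i += 16; continue; }           /* no match starts here */
            if ((size_t)idx + m <= 16) return hay + i + idx; /* full match in block */
            i += (size_t)idx;  /* partial match at block end: realign and retry */
        }
        /* Naive scan for the final tail shorter than 16 bytes. */
        for (; i + m <= n; i++)
            if (memcmp(hay + i, nee, m) == 0)
                return hay + i;
        return NULL;
    }

Compare that to AVX2 approaches that use cheap compares (vpcmpeqb) over 32 bytes at a time and only fall back to verification on candidate positions.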
And yes, not having any code to print the matching lines means that the only code path in krep is just counting things. If that's all your tool is doing, you can totally beat ripgrep or any other tool that is more applicable to generalized use cases. It's why the `memchr` crate (what ripgrep uses for single substring search) has a specialized routine for counting occurrences of bytes (which ripgrep uses for line counting): https://github.com/BurntSushi/memchr/blob/746182171d2e886006...
Because it's faster to do that than it is to reuse the generalized `memchr` API for finding the location of matching bytes.
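As a contrast, this is roughly what counting looks like when you reuse a general "find" primitive — each occurrence pays a call and a pointer reset, which is exactly what a dedicated counting routine avoids. A plain C sketch with libc memchr, just for illustration:

    #include <string.h>
    #include <stddef.h>

    /* Count occurrences of byte c by repeatedly calling the general-purpose
     * "find" primitive. A specialized SIMD counter can instead accumulate
     * per-block population counts and never materialize match positions. */
    static size_t count_byte_generic(const char *p, size_t n, char c)
    {
        size_t count = 0;
        const char *end = p + n;
        const char *q;
        while (p < end && (q = memchr(p, c, (size_t)(end - p))) != NULL) {
            count++;
            p = q + 1;
        }
        return count;
    }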
And counting matches in a multi-threaded context is way easier than actually managing the printing of matches in the same order that you get them.
krep isn't big. You can skim its source code in a few minutes and get a good idea of how it works.
(In case it's not obvious, the parent poster is the author of ripgrep.)
Do you have an explanation for the obviously wrong answers in simple examples shown here?
The site being made by AI explains some of it.
How do you handle disk errors and file mapping?
To give more context: if there is a disk error (logical, physical, etc.), an fread would simply return an error; it won't interfere with the rest.
But with memory-mapped files, you have to deal with this in some way. For example, on Windows, through SEH (__try / __except) around blocks reading from (or writing to) memory-mapped files.
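A minimal sketch of that Windows pattern, assuming the file is already mapped at `base` with length `len` (MSVC-specific SEH; the byte-counting body is purely illustrative):

    #include <windows.h>
    #include <stddef.h>

    /* Guard reads from the mapping with SEH: if the backing store fails while
     * a page is faulted in, the read raises EXCEPTION_IN_PAGE_ERROR instead
     * of returning an fread-style error code, and would kill the process if
     * left unhandled. */
    static int count_byte_guarded(const unsigned char *base, size_t len,
                                  unsigned char needle, size_t *out)
    {
        size_t count = 0;
        __try {
            for (size_t i = 0; i < len; i++)
                if (base[i] == needle)
                    count++;
        } __except (GetExceptionCode() == EXCEPTION_IN_PAGE_ERROR
                        ? EXCEPTION_EXECUTE_HANDLER
                        : EXCEPTION_CONTINUE_SEARCH) {
            return -1;  /* I/O error while paging in part of the mapping */
        }
        *out = count;
        return 0;
    }

On POSIX systems there's no direct equivalent; a failed page-in during an mmap'd read delivers SIGBUS, which is considerably more awkward to handle.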
Just wondering...
Very cool! The repo is a reference for minimalism. Also, TIL about ifeq in Makefile. So, many thanks!
Minor nit: the "re" part of grep stands for "regular expression". That doesn't seem to be the case with krep, so it's a bit misnamed, maybe?
He said support for regular expressions is coming soon: https://news.ycombinator.com/item?id=43335300
Source Code on GitHub: https://github.com/davidesantangelo/krep
#include <snarky_comment_about_not_using_rust.h>
Seriously though... thx! this is directly applicable to current interests and the code is not a jumbled mess.
This is only for string matching? I can't find any mentions of regular expression support. Why use the "re" naming scheme?
The Story Behind the Name
The name "krep" has an interesting origin. It is inspired by the Icelandic word "kreppan," which means "to grasp quickly" or "to catch firmly." I came across this word while researching efficient techniques for pattern recognition.
Just as skilled fishers identify patterns in the water to locate fish quickly, I designed "krep" to find patterns in text with maximum efficiency. The name is also short and easy to remember—perfect for a command-line utility that users might type hundreds of times per day.
Was this comment written by an LLM? What use, if any, did you make of LLMs when writing krep?
However, support for regular expressions will come very soon!
Finally something not-Rust!
Yeah... because that's totally the same thing. All bugs are created equal! You sure showed me.
Just fix your shit. I don't know who packages ripgrep for Termux, but it never returns; you have to Ctrl-C every time. Fix that too.
Feel free to submit a patch!
I love the install process.
This is pretty much the standard for C libraries / programs.
It's weird that the_silver_searcher, also known as `ag` [1], which is also implemented in C, is not mentioned in the benchmarks.
I wonder why...
[1] https://github.com/ggreer/the_silver_searcher
Classic post where ripgrep is faster than ag, so maybe that does not matter: https://burntsushi.net/ripgrep/
The author seemed to stop working on it around 2020, and nobody seems to have taken it over.
Is ahocorasick in a different plane than this?
Aho-Corasick is for multi-pattern searching: searching for many different strings simultaneously.
Nice, but why not just do a PR against ripgrep to add your algorithm?
Because I wanted to experiment and have fun with a personal project that I will keep evolving.