Grep too slow? Use git-grep
If you need to search for text in a git repo, use `git grep` instead of grep. It takes similar options and is much, much faster.
Example, on a directory with 11K+ files:
grep -r . 'float: none' -exclude-dir=.git
56.96s user 0.70s system 99% cpu 58.127 total
git grep 'float: none'
0.42s user 1.05s system 335% cpu 0.438 total
130x+ improvement :D
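For anyone repeating the measurement: grep conventionally takes the pattern before the path, and long options such as `--exclude-dir` need two dashes. Below is a minimal sketch of a like-for-like run, assuming bash and a grep that supports `--exclude-dir`. The function name `compare_grep` is made up for illustration.

```shell
# Hypothetical helper: time plain grep and git grep on the same pattern.
# Run it from the root of a git working tree.
compare_grep() {
    local pattern="$1"
    echo "== grep -r =="
    # Pattern first (via -e), then the path; --exclude-dir takes two dashes.
    # "|| true" keeps a no-match exit status from aborting a caller.
    time grep -r --exclude-dir=.git -e "$pattern" . > /dev/null || true
    echo "== git grep =="
    time git grep -e "$pattern" > /dev/null || true
}
```

Calling `compare_grep 'float: none'` prints the two timings one after the other.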
Have you tried:
LC_ALL="C" grep -r ... ?
What does that do?
http://stackoverflow.com/questions/8138124/implications-of-lc-all-c-to-speedup-grep
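In short, `LC_ALL=C` makes grep treat the input as plain bytes instead of decoding it in the current (often UTF-8) locale, which can speed up matching. Here is a minimal sketch, with `grep_c_locale` as a made-up helper name:

```shell
# Hypothetical helper: recursive search in the C locale.
grep_c_locale() {
    # The LC_ALL assignment applies to this one grep process only;
    # the rest of the shell session keeps its locale.
    LC_ALL=C grep -r --exclude-dir=.git -e "$1" .
}
```

Whether it helps depends on the grep build and the pattern, so it is worth timing both ways.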
Did you do another grep test after the git-grep one ?
When I grep large folders the first grep takes ages but the next ones are incredibly fast. There must be some kind of cache. Or it's just zsh doing it for me, who knows ?
Yup, I did several of them, they all looked the same.
The second run will be faster always because the *filesystem* keeps an in-memory cache of recently opened files.
I'll try it again tomorrow and get some more numbers and screenshots, I'm pretty certain the numbers didn't change between runs.
Yup, you were right:
─$ time grep -r . 'float: none' -exclude-dir=.git > /dev/null
grep -r . 'float: none' -exclude-dir=.git > /dev/null 58.77s user 1.82s system 72% cpu 1:23.44 total
─$ time grep -r . 'float: none' -exclude-dir=.git > /dev/null
grep -r . 'float: none' -exclude-dir=.git > /dev/null 58.65s user 0.68s system 99% cpu 59.613 total
─$ time grep -r . 'float: none' -exclude-dir=.git > /dev/null
grep -r . 'float: none' -exclude-dir=.git > /dev/null 59.42s user 0.74s system 99% cpu 1:00.47 total
─$ time grep -r . 'float: none' -exclude-dir=.git > /dev/null
grep -r . 'float: none' -exclude-dir=.git > /dev/null 57.31s user 0.88s system 98% cpu 58.938 total
There is some variation between runs.
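To see how much the cache matters, it helps to time the same command several times in a row and compare the first run with the rest. The sketch below is written for bash; `repeat_timed` is a made-up helper name, not a standard tool.

```shell
# Hypothetical helper: run a command N times, timing each run.
# Usage: repeat_timed <runs> <command> [args...]
repeat_timed() {
    local runs="$1"
    shift
    local i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i"
        # Output is discarded so terminal rendering does not skew the timing.
        time "$@" > /dev/null
        i=$((i + 1))
    done
}
```

For example, `repeat_timed 4 git grep -e 'float: none'` prints four timings, the first of which is the one most likely to hit a cold cache.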
What about Ack?
Maybe you can provide us some numbers for comparison :)
How do you get the time counter?
time [command]
Example:
time grep -r . 'float: none' -exclude-dir=.git
Ack was 10 times slower than grep, even with --css.
Couldn't run git grep properly with the time command, as the output gets piped to a pager and I have to type q to quit (adding my finger typing to the time result).
Typing q as fast as I can gets me maybe 0.020s faster than grep. Too little project maybe. Well, git grep is faster to type anyway, saving a lot of time.
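The pager only starts when git writes to a terminal, so piping the output elsewhere, or passing git's top-level `--no-pager` option, should take the keypress out of the measurement. A small sketch follows, with `git_grep_count` as a made-up helper name:

```shell
# Hypothetical helper: run git grep with no pager and print only the
# number of matching lines, so timing it needs no keypress.
git_grep_count() {
    # --no-pager is an option of git itself, so it goes before "grep".
    git --no-pager grep -e "$1" | wc -l
}
```

In bash or zsh, `time git_grep_count 'float: none'` then times the search like any other command.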
This isn't apples to apples. The two commands are not equivalent, so of course, one is faster.
$ grep --exclude-dir=.git -r foo . | wc -l
5829
$ git grep foo | wc -l
500
In this case, I have a huge untracked logs directory in the root of this directory, and git-grep ignores anything not tracked in the repo. grep is much much faster by ignoring that log directory and binary files.
$ grep --exclude-dir=.git --exclude-dir=logs -I -r foo . | wc -l
337
But since the counts don't match, it is still not apples to apples.
Anyway, I wasn't aware of git-grep, so thank you for the tip. Definitely useful if you're not already using an aliased version of grep.
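One way to get closer to apples to apples is to hand plain grep exactly the files that git tracks. This is a rough sketch, assuming an `xargs` that supports `-0` (GNU and BSD both do); `grep_tracked` is a made-up name.

```shell
# Hypothetical helper: run plain grep over the tracked files only,
# the same file set that git grep searches by default.
grep_tracked() {
    # -z and -0 keep file names with spaces or newlines intact.
    # -I skips binary files (git grep accepts -I as well).
    # -H prints the file name even if only one file is passed.
    git ls-files -z | xargs -0 grep -I -H -e "$1"
}
```

Comparing `grep_tracked foo | wc -l` with `git grep -I -e foo | wc -l` should then give matching counts, which makes the timings easier to interpret.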
Maybe you're using a slow grep.
http://jlebar.com/2012/11/28/GNU_grep_is_10x_faster_than_Mac_grep.html
echo 3 | sudo tee /proc/vm/sys/drop_caches
Mistake above: echo 3 | sudo tee /proc/sys/vm/drop_caches
I use ack-grep http://beyondgrep.com/
The time difference seems to be solely due to what files are searched. 'git grep' only greps through the files that are tracked, so skips all binary files, while 'grep' by default also searches through compiled executables, .o files, etc.
Including only .cpp and .h files (which are the largest part of what is in there):
$ time git grep "\(TODO\|FIXME\)" > /dev/null
real 0m0.202s
user 0m0.276s
sys 0m0.060s
$ time grep --exclude-dir=.git --include=\*.{cpp,h} -r "\(TODO\|FIXME\)" . > /dev/null
real 0m0.210s
user 0m0.156s
sys 0m0.052s
$ time LC_ALL="UTF-8" grep --exclude-dir=.git --include=\*.{cpp,h} -r "\(TODO\|FIXME\)" . > /dev/null
real 0m0.252s
user 0m0.172s
sys 0m0.040s
that second one is with LC_ALL="C", btw :)
The Silver Searcher anyone ? https://github.com/ggreer/the_silver_searcher
Off topic, but if I click the "Internet Defense League" banner on the right, it opens the website in a tiny iFrame. (Safari)
It does to me too in Chrome, I'll fix it, thanks! :)
fixed :)
Been using `ag` for a while https://github.com/ggreer/the_silver_searcher. Please do give it a try and compare with these results.
Did you see the comments on HN?
Don't use the BSD grep included with OSX, use GNU grep
BSD grep is better.