*Cryptofunktik is a little pig that sniffs out goodies for you; that is what this author’s parser is called.
*It is assumed that the reader has read the article “Basic skills of working with keys and addresses, or how not to fuck up the crypto that was so close”.
*It is also very desirable that the reader is familiar with the crypto workbench (crypto-verstak), covered in the article “Bottomless barrel of private keys, crypto-verstak, go in search of treasure!”
Who is the article aimed at?
Those who collect logs hunting for crypto, and, with equal interest, those who parse such logs.
What’s in the article.
General description.
Case studies.
Big data effect.
Parsing workspace.
Running the parser, command-line keys, results.
Section for those who want to tune and modify the software.
Afterword.
Where do the legs grow from.
This article is based mostly on my experience collecting and analyzing logs on the topic of cryptocurrencies. My main task was collecting information on the client side: parsing disks, network shares, NAS, DAS, browser traffic, and memory; I never got as far as parsing FTP.
What we look for, how, and why.
Wallet files (the trivial part), crypto addresses, private keys, implicit pairs, and interesting data: hidden information and guiding information.
“Interesting data” and “hidden information” sound like something that needs explaining. Hidden information is, for example, data wrapped in base64 with the trailing “=” cunningly removed, and I found it not only in traffic but also in files. How did I guess? What prompted me to look inside? The “mass of interest”.
The term “mass of interest” here means that the object triggered events, each of which added a certain mass coefficient; when the sum of these coefficients exceeded the threshold value, the parser sent the object (file, traffic, memory fragment) to my cold table. That is how the algorithm appeared that looks inside anything recognized as base64.
Let’s look at the approach I’m using in general terms, purely conceptually.
Working with the parser is similar to an antivirus: the object is searched for something, and when something is found an event is raised. After examining the object we have the most interesting event and the combined weight of all events; by these two criteria we determine where to send the object. It is important to note that the most interesting event matters more than the total mass, that is, finding a private key is more important than finding any number of addresses, and such an object goes into the private keys.
Recently some code for searching for passphrases in files appeared here, and that code was in demand among forum users. But seed phrases can be in different languages (there seems to be support for 3 of them), different wallets use different word sets and different ways of computing checksums over those sets, and writing a universal algorithm is nearly unrealistic. What can the events-and-weights approach offer in this case? Imagine that we found the phrase “safe place” in an object and it added weight, found the word “coin” and it added weight, found a cryptocurrency address and it also added weight: such a file probably makes sense to inspect manually. File names add weight the same way. By the way, besides crypto I also searched for credentials… That is what I mean: with competently composed search masks and events, and well-chosen coefficients, the approach is universal. For example, an address by itself has a low level of interest, but the fact that it was hidden in base64 raises that level.
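In code the idea boils down to something like the following minimal sketch (the masks, interest levels and weights here are invented for illustration; this is not the parser’s real table):

import re

# each event: (mask, level of interest, mass coefficient) - values invented for the example
EVENTS = {
    "private_key_hex": (re.compile(rb"\b[0-9a-fA-F]{64}\b"), 50, 100),
    "btc_address":     (re.compile(rb"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b"), 10, 25),
    "word_coin":       (re.compile(rb"[cC]oin"), 5, 10),
}
MASS_THRESHOLD = 30

def route(data: bytes) -> str:
    hits = [(lvl, mass) for mask, lvl, mass in EVENTS.values() if mask.search(data)]
    top_level = max((lvl for lvl, _ in hits), default=0)
    total_mass = sum(mass for _, mass in hits)
    if top_level >= 50:              # the most interesting event outranks the total mass
        return "private_keys"
    if total_mass >= MASS_THRESHOLD:
        return "manual_inspection"
    return "ignore"

print(route(b"coin stash: 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN"))  # -> manual_inspection (25 + 10)
print(route(b"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"))  # -> private_keys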
What are implicit pairs? Let’s say we found 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN and e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855; this hex string fits as the private key for the address above. And even if the parser only comes across b5bd079c4d57cc7fc28ecf8213a6b791625b8183 (the raw hash160 body of that same address) and e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, it will determine that this is an address-private pair. How important is all this? It depends on the material: if you are searching where others have already searched before, then the implicit stuff is the whole calculation. I learned to look for connections between hex/binary sequences from analyzing traffic and memory, and I was surprised when these connections began to turn up in files as well; when it is in files, that is clearly a reason to pay more attention to the client as a whole. By the way, the mass gained by a client is a good indicator: say a client has no privates or wallets found but a bunch of addresses… paying attention to it is not the worst way to spend time.
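How such a pair can be confirmed, as a minimal sketch (my illustration using the ecdsa library from the requirements, not the parser’s actual code; note that ripemd160 availability in hashlib depends on your OpenSSL build):

import hashlib
from ecdsa import SigningKey, SECP256k1

def hash160_from_prv(prv_hex: str) -> str:
    # derive the compressed public key, then hash160 = ripemd160(sha256(pubkey))
    vk = SigningKey.from_string(bytes.fromhex(prv_hex), curve=SECP256k1).get_verifying_key()
    x, y = vk.pubkey.point.x(), vk.pubkey.point.y()
    pub = (b"\x03" if y & 1 else b"\x02") + x.to_bytes(32, "big")
    return hashlib.new("ripemd160", hashlib.sha256(pub).digest()).hexdigest()

prv = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
body = "b5bd079c4d57cc7fc28ecf8213a6b791625b8183"  # the body of 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN
print(hash160_from_prv(prv) == body)  # True -> an implicit address-private pair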
People are strange.
Once I decided to collect all 32-byte hex sequences from clients, make keys out of them, and search the logs for matches. When I found matches I started looking at where these hex sequences lived, and saw that someone simply kept them in a file (maybe a service or a wallet gave it to the man and told him to save it). But even more fun: I found a file with SHA-256 hashes of music tracks, and a clever musician had turned them into private keys and even moved crypto there. This is where the treasure hunting, and the correspondence search in the parser, grow their legs from. And the more events, the more wonders, and again food for thought =)
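The harvesting itself is a one-mask job; a sketch of the idea (my illustration):

import re

HEX32 = re.compile(rb"\b[0-9a-fA-F]{64}\b")  # 32 bytes = 64 hex characters

def harvest_prv_candidates(data: bytes) -> set:
    # every 64-character hex run is a candidate private key body to derive addresses from
    return {m.lower() for m in HEX32.findall(data)}

print(harvest_prv_candidates(b"track: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"))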
Big Data Effect.
Let’s return to the address 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN. It is interesting for several reasons: it was found on one client, and the private key to it on another. The parser is set up to first process each client individually; when it has processed all the clients it builds a common base and looks for key-address pairs across clients, and sometimes finds them =)
But back to this find. This is what it looks like in the parser log:
"b5bd079c4d57cc7fc28ecf8213a6b791625b8183": {  # this body comes from 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN
    "btc_type_prv_hex": [
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"  # the body of the private key from which this address can be generated
    ],
    "eth_type_prv_hex": [],
    "20p_base_xx_found": [
        "1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN"  # here we see in what form the address was found
    ],
    "20p_bech32_found": [],
    "20p_bitcoincash_found": [],
    "32p_bitcoincash_found": [],
    "32p_base_xx_found": [],
    "32p_bech32_found": [],
    "32p_found_in_hex": [
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"  # and here we see in what form the private key was found
    ],
    "20p_found_in_hex": [],
    "20p_found_in_trash": [],
    "20p_found_in_trash_src": []
}
It is always very interesting when private keys turn up in an atypical form; in this case it was found as a bare hex sequence, which means we need to look at the sources where it was found.
In the log from client A, we see in the browser cache
search 1HT7xU2Ngenf7D4yocz2SAcnNLW7rK8d4E
search 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN
search 1J35L45RYDXjvtADKgRn5G9uS7iHgQbaaE
search 1JThDFJLAJz5vsg3WpE56qiidUvpAHVeRC
in the log from client B, we see in the browser cache
ud%5Bem%5D bb6626242fcdb098df32715fef8d39ed6f9892057293fcac8177b2b5984f6422
ud%5Bfn%5D 1630036277d45c5e12c8e17915141b25c603240211e148f3c79ad3415238fa35
ud%5Bln%5D 581b935960d26c5a14cf05ae05cfcd7cf64b8449e3a98f7913f0891bdaff0ca2
ud%5Bph%5D 2e6a44fa51f86fb88997b6473bfc0b497406d12374dfd42f85ce9bd12dc1bf50
ud[em] bb6626242fcdb098df32715fef8d39ed6f9892057293fcac8177b2b5984f6422
ud[fn] 1630036277d45c5e12c8e17915141b25c603240211e148f3c79ad3415238fa35
ud[fn] e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
ud[ln] 581b935960d26c5a14cf05ae05cfcd7cf64b8449e3a98f7913f0891bdaff0ca2
ud[ln] e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
ud[ph] 2e6a44fa51f86fb88997b6473bfc0b497406d12374dfd42f85ce9bd12dc1bf50
ud[ph] e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
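A small aside: the %5B/%5D pairs are just URL-encoded square brackets, which is why the same parameters show up in the cache in two forms:

from urllib.parse import unquote
print(unquote("ud%5Bem%5D"))  # -> ud[em]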
I typed this address into a search engine and found mentions, like here =)
https://medium.com/analytics-vidhya/calculating-the-reproduction-number-ro-of-covid-19-in-india-and-visualizing-the-same-using-e9a5d35ca64c
in general, this private key holds some secrets, and I think the forum users will uncover them =)
you should start working with the found hex in the crypto workbench, like this
>b:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
retrieve
…
bc1qngw83fg8dz0k749cg7k3emc7v98wy0c74dlrkd – bech32:legacy:bech32 9a1c78a507689f6f54b847ad1cef1e614ee23f1e compressed by default
BTC:private:mainnet
5KYZdUEo39z3FPrtuX2QbbwGnNP5zTd7yyr2SC1j299sBCnWjss – WIF:base58check 80 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 not compressed 5c5bbb26
1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN – p2pkh:legacy:base58check 00 b5bd079c4d57cc7fc28ecf8213a6b791625b8183 compressed unknown 2f20e59b
3DMrVbb4GkPDYpPoGUVFetsfjEFVf9Hd3Q – p2psh:p2sh-segwit:base58check 05 8001bbf9081d5f8e0e6d96548ed0db53f2b0277c compressed unknown 3a53855b
LTC:private:mainnet
TAgaTiX4btdMhNY6eSU5N5jvc71o6hXKdhoeBzEk31AHykGDou8i – WIF:base58check b0 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 01 0e41541f
LZGpRyQPybaDjbRGoB87YH2ebFnmKYmRui – p2pkh:legacy:base58check 30 9a1c78a507689f6f54b847ad1cef1e614ee23f1e compressed unknown 4052c9e1
….
Then type in the check command
>c
and you will see the following
address body checked 5, found 1, time elapsed 0.0002538999979151413
or the following
address body checked 5, found 0, time elapsed 3.089999999998723615e-05
If you get the first variant, one of the addresses carries a balance; if the second, go look for something tasty somewhere else. Quite a lot of tasty stuff passed through this address, by the way.
Namely:
Address: 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN
Format: Base58 (P2PKH)
Transactions: 683
Total Received: 59.08944864 BTC
Total Sent: 59.08944864 BTC
Final Balance: 0.00000000 BTC
Last transaction:
Hash: e46465f6754e5b9bb8193ff3a3bc66e1fbca64022178887577d5283c88a457d1 (2021-04-05 00:59)
Input: 1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN – 0.00002378 BTC
Output: 16fDy6r9EkoMENXsDiWGEptfSdGEYJDSg2 – 0.00001189 BTC
Fee: 0.00001189 BTC (5.308 sat/B – 1.327 sat/WU – 224 bytes)
Try checking the ether for this private key =)
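If the workbench is not at hand, a minimal sketch of deriving the ETH address from this private key with ecdsa and pycryptodome from the requirements (my illustration; the ETH address is the last 20 bytes of keccak256 over the uncompressed public key):

from Crypto.Hash import keccak
from ecdsa import SigningKey, SECP256k1

prv = bytes.fromhex("e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
pub = SigningKey.from_string(prv, curve=SECP256k1).get_verifying_key().to_string()  # 64 bytes: x || y
print("0x" + keccak.new(digest_bits=256, data=pub).hexdigest()[-40:])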
This example is not without trickery; I think someone will guess what the trick with this key is.
Next comes a breakdown of working with the parser; if that does not interest you, then what follows will not be interesting for you.
Working environment.
For processing large amounts of data I highly recommend doing all of this on a RAM drive, so it is desirable to have several gigs of free RAM; it all depends on what volumes you parse. As for cores, there is no such thing as too many, so take pity on your old two-core laptop.
Parser settings out of the box.
The parser is configured for “fine-toothed combing”, that is, it is tuned not to miss anything, so it will give you a lot of information, and obviously a lot of it will be unnecessary. But the information is marked with coefficients and accompanied by explanations, so you can process it in several passes.
Operation.
Requires the following Python libraries to be installed:
https://pypi.org/project/bitcoinlib
https://pypi.org/project/pycryptodome/
https://pypi.org/project/ecdsa/
Command line parameters:
-fi folder input
points to the folder where parsing will take place; it is assumed that this folder contains the client folders. It could look like this
-fi “R:\Temp\src”
The contents of the folder usually look like this
R:\Temp\src
CA[1E……
CA[5C……
Each client gets its own thread, so it is important that the folders are organized correctly.
-fo folder output
specifies the folder where the result is stored, e.g
-fo “R:\Temp\dst”
it must be either an empty folder, or it must not exist at all; in the latter case the program will create it by itself
-th num threads. Ignored in single client mode (one client, one thread)
you should not specify more threads than your processor can handle
sc single client mode. Interprets the input folder as a single client folder; otherwise it is interpreted as a container of client folders.
you will probably not need this key.
fipt find pairs for total clients
when you are done with the client folders, it looks for key-address pairs across everything collected from all clients: the big data effect.
fip find pairs for each client separately (using ‘fip’ and ‘fipt’ makes sense if ‘icd’ is used)
looks for pairs inside each client separately; this does not prevent also searching across all of them together afterwards
icd copy interested client folders. has no effect in ‘sc’ mode
clients found interesting will be copied to a special folder inside the destination folder; it is convenient for filtering out only those that relate to crypto. Copied clients are accompanied by explanations of what caused them to be copied. The folders with client data will be very useful when you find BIP38 keys or locked wallets.
icf copy interested files
files considered interesting will be copied to a special folder in the results folder; it is convenient not to have to jump through folders while checking the log, hunting for where a given file is just to view it in an editor. Copied files are accompanied by an explanation of what caused their copying.
delsrc delete source folder after scan
removes the client source folders; this is very useful if you have enabled copying of clients and files, because your disk will most likely be limited in size, but more about that later
noclistats no save client reasons (__reasons_for_save__), works with the ‘icd’ key
do not accompany copied clients with a description of the reasons; may be needed if you are only doing a primary client-filtering pass
wchk check words
looks for interesting words; this key refers to the following structure in the source
# this is checked only if the file is not already interesting from previous checks
# see min_file_weight_for_interest for how many points are needed to mark a file as interesting
interested_words = {
    EVENT__interested_words: [
        (re.compile(rb"([sS]eed)|([wW]allet)|([cC]oin)|([sS]ecret)"), min_file_weight_for_interest - 1),  # one of these words plus any other event reaches the threshold
        (re.compile(rb"(safe place)"), min_file_weight_for_interest)  # keep it in a safe place =)
    ]
}
The structure is simple: a mask, and the weight a find adds. I want to note that Python’s re library is not the same regexp dialect as, say, awk; I did not have time to study this module, so I just added simple masks, thereby forming a template for you to extend (a sketch follows below). And yes, humanly speaking, there was also no time to move all of this into external configuration files.
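Extending it amounts to appending one more (mask, weight) tuple; a self-contained sketch (the extra mask and the threshold value here are my inventions):

import re

min_file_weight_for_interest = 30  # assumption: a stand-in for the parser's constant

interested_words = [
    (re.compile(rb"([sS]eed)|([wW]allet)|([cC]oin)|([sS]ecret)"), min_file_weight_for_interest - 1),
    (re.compile(rb"(safe place)"), min_file_weight_for_interest),
    (re.compile(rb"(recovery phrase)|(mnemonic)"), min_file_weight_for_interest),  # hypothetical addition
]

data = b"keep your recovery phrase in a safe place"
print(sum(w for mask, w in interested_words if mask.search(data)))  # 60: two masks hit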
b64chk check in base64
does a search inside whatever could be base64; a sketch of the idea follows below
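What the check amounts to, as a sketch (my illustration; the point is restoring the removed “=” padding before decoding):

import base64, re

B64_RUN = re.compile(rb"[A-Za-z0-9+/]{20,}")  # a long base64-looking run, padding possibly stripped

def peek_inside_b64(data: bytes):
    for run in B64_RUN.findall(data):
        run += b"=" * ((-len(run)) % 4)  # restore the cunningly deleted "="
        try:
            yield base64.b64decode(run, validate=True)
        except Exception:
            pass  # not real base64, skip

for inner in peek_inside_b64(b"note: MUhad2tqa2Vhb1pmVFNhSnhEdzZhS2t4cDQ1YWdEaUV6Tg"):
    print(inner)  # -> b'1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN'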
nocolors b/w mode
black and white mode
nologs no logs mode.
no logs
I run it like this
py -3 main.py -fi “R:\Temp\src” -fo “R:\Temp\dst” -th 20 fip fipt icd icf icw wchk b64chk delsrc
Orienting yourself by the stats.
stats_files_total = 61983
stats_files_processed = 57895
stats_folders_total = 27346
stats_folders_processed = 27346
stats_bytes_processed = 9965399967
stats_data_convert_times = 3235
stats_skip_files = 4088
stats_files_by_interested_words = 1581
stats_checked_20p_b58_total = 608
stats_low_interested_total = 4
stats_wallet_files = 54
stats_handy_check_files = 96
stats_handy_check_folders = 15
stats_events_in_files_total = 241226
total clients processed 3075
time elapsed 3164.98 seconds.
Results of work.
All work results go to the output folder (remember the -fo key).
You can monitor the progress of work by looking at the
temp_log_worker_0
This is the manager’s log file; the manager does not parse clients itself, it distributes tasks to the other threads, keeps statistics, and sums things up.
You can see the statistics in stats.txt
db_prv_variants.json
this file in the output is one of the most important: here is collected everything that was interpreted as private keys, and this file can be checked by the workbench for balances (see the chkdb command in the workbench). It usually comes out very bulky and is not suitable for manual processing.
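If you want to poke at this file without the workbench, a rough sketch using bitcoinlib from the requirements (I am assuming the JSON is keyed like the log excerpt earlier, with btc_type_prv_hex lists; adjust to the real layout):

import json
from bitcoinlib.keys import Key

with open("db_prv_variants.json") as f:
    db = json.load(f)

for body, rec in db.items():
    for prv_hex in rec.get("btc_type_prv_hex", []):
        print(prv_hex, "->", Key(prv_hex).address())  # derive the p2pkh address from the key

# balances can then be queried, e.g. via bitcoinlib's Service().getbalance(address);
# mind the rate limits of public providers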
the temp_log___il… files are logs by events: after that prefix comes the event’s level of interest, and then a description of what it is.
pairs_btc_like.json and pairs_btc_like_hex.json are the found pairs for bitcoin-like currencies; the _hex one holds pairs where some data was not in the usual encoded form but in hex. For ether it is pairs_eth_like.json.
the contents of the result folder may look like this
[screenshot: contents of the result folder]
Without a file manager like FAR all this will be extremely inconvenient to work with; and for those who do not know how to use FAR, well, fine =)
look at interested_word_files
[screenshot: the interested_word_files folder]
The first number in the names is the level of the most interesting event on the file, the second is how much weight it gained, the third is crc32 which helps avoid duplicates.
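For example, the 31_306_3900267033_wallet.dat.src.txt we open below decodes as level 31, weight 306, crc32 3900267033. Reproducing the last component (that it is a plain zlib crc32 over the file contents is my assumption):

import zlib

with open("wallet.dat", "rb") as f:
    print(zlib.crc32(f.read()))  # decimal crc32, as used in the result file names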
Each file is accompanied by a .src.txt description which explains why it was chosen; let’s look inside main.db.src.txt
R:\Temp\1\AA——————————————————————————————–\main.db is the path from where the file was taken
interested words(interested_words): b'safe place' – this is the event that fired in it, with the event name in brackets.
interest file by its weight(file_weight_interest): weight is 30 – and this is why it was considered interesting: in this case it gained critical weight, the second number in the file name.
The same applies to all the other folders and files; for clarity it is worth looking at the wallets
[screenshot: the wallets folder]
note that some files gained more weight than others; this usually means that a lot of addresses were found in them and that they were actively used. So start looking at the material by the most interesting event and the highest weight: these are the most promising objects.
let’s look inside 31_306_3900267033_wallet.dat.src.txt
R:\Temp\…………………………………………………………\Coins\BitcoinBitcoinQT\wallet.dat
check it file handy(check_it_file_handy): mask on file/folder name: R:\Temp\…………………………………………………………\Coins\BitcoinBitcoinQT\wallet.dat
wallet file(wallet): mask on file/folder extension: R:\Temp\…………………………………………………………\Coins\BitcoinBitcoinQT\wallet.dat
wallet file(wallet): wallet found by format 12:16 = 62310500
base58 20+ checked(base58_20p_checked): b’16cAVR3………………………’
base58 20+ checked(base58_20p_checked): b’1JaPNw……………………….’
base58 20+ checked(base58_20p_checked): b’33KF8………………………..’
….
base58 20+ checked(base58_20p_checked): b’16Q………………………….’
interest file by its weight(file_weight_interest): weight is 306
It is actually clear why this wallet gained its weight of 306 points.
Again, I have not had time to tune the parser precisely, which is why there will be obviously unnecessary files in the sample. But, for one, they are all in one folder and can easily be cut out or copied; and two, the file weight is a very good guide to what is worth reviewing first and what to push to last place.
That is basically it: download the public logs and try the parser on them.
About data redundancy.
About all these generated JSONs: whether they are all necessary, and where to apply them.
I usually try to collect as much data as possible; sometimes ideas on how to use it come after a considerable amount of time, which is why the product keeps a lot of operational databases that may be useful only to coders with rich imagination and knowledge of the subject. That is, the db_hex_… files are not needed outside of a run; I use them to check whether the code worked correctly, to search for artifacts (oddities in the software’s behavior), and to test all sorts of crazy theories.
Tuning, testing, tuning.
It is pretty obvious, but I will write it anyway: you compose test clients and data for them, run the parser on them, see what was found of the expected and what was not, tweak the settings, and round it goes. But before you parse your real clients, you had better figure out what the parser can and cannot do; otherwise it may miss what matters most to you.
Modification of the Code.
A part for those who will fix, modify, and refine the code.
Important for understanding and tuning sections:
First, let’s look at class variables
class Ripper:
pay attention to flags and structures
read the events; basically, all parsing is built on the events
# events
EVENT__pair = "pair"
EVENT__hidden_content = "hidden_content"
EVENT__base58_32p_checked = "base58_32p_checked"
…
Writing thousands of branches in tasks like this is definitely not an option, hence tables and callbacks
# event config records
internal_event_types = {
    EVENT__pair: {  # pair found - private key and addr
        # pairs are checked after all files are already parsed, so the source file of a pair is unknown at that moment
        "log": "pairs",  # log file for saving info about that event
        "log_legend": "pair",  # legend why it is interesting
        "log_interest_level": 50,  # level of interest for the log
        "client_interest_level": 50,  # level of interest for the client where it was found
        "counters_up": {"stats_pairs": 1, "stats_client_weight": 100},
    },
    EVENT__hidden_content: {  # something inside base64 or binary data
        "log": "hidden",
        "log_legend": "hidden content",
        "log_interest_level": 40,
        "file_interest_level": 40,
        "save_file_folder": "interested_files_hidden",
        "client_interest_level": 40,
        "counters_up": {"stats_hidden_content_in_file": 1, "stats_hidden_content_total": 1,
                        "stats_file_weight": 100, "stats_client_weight": 100},
    },
    EVENT__file_in: {
        "handler": __internal_event__file_in,
        "set_attributes": {"fl_in_base64": False, "fl_in_bin": False, "fl_current_file_interest_level": 0,
                           "fl_skip_current_file": False, "fl_wallet_file": False, "fl_handy_check_file": False,
                           "interested_file_reasons_log": "", "fl_current_file_save_folder": ""},
        "counters_up": {"stats_files_total": 1, "stats_files_in_folder_and_subfolders": 1},
        "counters_set": {"stats_file_weight": 0},
    },
….
# about file/folder names and extensions (file names are converted to lower case before checking)
file_name_check = {
    "extensions": {
        EVENT__archive_file: [re.compile(r".*\.(rar|tar|gz|7z|zip)$")],
        EVENT__skip_file: [re.compile(r".*\.(jpg|jpeg|gif|wav|avi|mpg|pdf)$")],
        EVENT__wallet: [re.compile(r".*\.(dat|seco|vault)$")],  # yes, all dat files count as wallets too, such is the idea
    },
    "names": {
        EVENT__check_it_file_handy: [re.compile(r"(electrum)|(wallet)|(coin)|(passphrase)")]
    }
}
folder_name_check = {
    "extensions": {
    },
    "names": {
        EVENT__mark_it_folder_to_handy_check: [re.compile(r"(electrum)|(wallet)|(coin)")]
    }
}
# this is checked only if the file is not already interesting from previous checks
# see min_file_weight_for_interest for how many points are needed to mark a file as interesting
interested_words = {
    EVENT__interested_words: [
        (re.compile(rb"([sS]eed)|([wW]allet)|([cC]oin)|([sS]ecret)"), min_file_weight_for_interest - 1),  # one of these words plus any other event reaches the threshold
        (re.compile(rb"(safe place)"), min_file_weight_for_interest)  # keep it in a safe place =)
    ]
}
….
All parsing of file contents is done here
def process_buffer(self, data: bytes) -> None:
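To give a feel for its shape, a toy sketch of what an entry point like this does conceptually (my illustration, not the actual implementation; buffer_masks and raise_event are hypothetical names):

def process_buffer(self, data: bytes) -> None:
    # conceptually: run every compiled mask over the buffer and raise an event per hit;
    # the event table (internal_event_types) then decides logging, weights and routing
    for event, masks in self.buffer_masks.items():      # hypothetical table of compiled masks
        for mask in masks:
            for match in mask.finditer(data):
                self.raise_event(event, match.group())  # hypothetical event dispatcher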
Well, that is about all on how the code is organized. Not every coder will be happy with this model, so this little code review should save you a lot of time and effort getting to know the architecture in person.
The most important thing the parser lacks.
I simply did not have time to implement extraction of keys from wallet.dat into db_prv_variants.json, and those would be very fast and convenient to check with the workbench. Maybe it is you who will add this function to the product =)
Afterword.
A one-month deadline is a very harsh condition. I had practically no prior Python projects and am not really familiar with Python itself, which is why the code quality is what it is: less was implemented than I had planned, and it is poorly formatted too. I am writing these lines on day 20, and there was simply no time left for refactoring; you could say I barely made it. My point is this: give the contest a little more time and the products will be of higher quality, and maybe there will be more of them; who knows how many coders refused to participate because they realized they would not make it with the code. And without code, material like this is, for the majority, at best a curiosity: people want something usable at once, something you just take and run.