Kemper Duplicate Finder - free program to detect duplicate amp/cab profiles

  • https://kpaduplicates.codeplex.com/


    Here's a free Windows program I made to find duplicate amp/cab profiles among groups of .kipr files. This looks at the actual profile data, not the tags. The tags can be somewhat easily changed. This should always find matches, even if someone dubiously changed the tags, or if the tags were blank/vague. It also ignores the amp/cab parameters, such as Definition, Clarity, High Shift, etc. which could be tweaked by the user. It's pretty simple to use. It's useful for a few purposes:

    • for commercial profilers, you can make sure no one else is ripping you off.
    • for those like me who make lots of tweaks and end up with tons of profiles, it can be handy to quickly find groups of profiles using the same amp or cab, especially if the profiles were tagged poorly
    • it's interesting to run to find out if vendors are really making profiles for all the rigs they distribute or if they are just tweaking a few parameters, reusing the same amp/cab, etc.

    It should be stable, but I have noticed it may hang if you try to compare a large number of files. If people find it helpful and interesting, maybe I'll try to spruce it up.


    Hello Robert,


    I just wanted to give you a shout out, and to thank you so much for this awesome program. I downloaded the latest version a few days ago (v1.1.0 dated Feb 17th), and have been working with it the past couple of days. Friggin' amazing and powerful tool, my man. You rock!


    It helped me solve a problem that had me pulling my hair out. I basically use 3 or 4 different Cab profiles, which I have saved as presets. They are all from Merged Profiles. One is from Michael Britt, from when he made the merged pack for the Factory 3.1 release. I believe it is his special 3rd Power 2x12 cab. Another is from Till's Cab Lab pack. However, the one I find myself drawn to, and using quite a bit, is from the incredible, free Thumas Merged profiles.


    And this is where your fantastic piece of coding came to the rescue. I knew I was using a Thumas 2x12 Rectifier Cab on quite a few of my favorites. I just couldn't remember which Merged profile I had cut/copied this Cab from. It turns out (thanks to your program) that I am using the same Cab from two different sources/profiles: one from one of Thumas' Engl Powerball V1 merged profiles, the other from his Soldano Hot Rod 100W.


    Now the mystery has been solved, thanks to you.


    A couple of quick questions for you, Robert...


    1) Does the Kemper assign a unique Hash code to each Amp/Cab profile, every time the profiling process is run? I used your program to search through Thumas' Merged profiles, and basically there were no duplicates of any Cabs. Since the Kemper can't know if a physical Cabinet has been changed between profiling sessions, I have to assume it assigns a unique Hash number for each and every profiling session (both to the Amp and to the Cab). Is this correct?


    2) When a profile is made, does the Hash code incorporate the date and time-stamp? The Hash codes themselves are 64 characters in length. The string is so long, it doesn't fit on the program display pane. I had to save a Duplicate Finder session to a .txt file, in order to see the full length of the Hash codes. It looks like a Hexadecimal code, since the highest letter I have seen used is "f". I guess I am asking if the Hash codes themselves have some meaning, or are they randomly generated. I did try converting one of the hash codes to a text string, but I got nothing but gibberish.


    Cheers,
    John

  • Awesome.


    So I'm computing the hash value using MD5. The kipr files contain blocks of what I assume is binary data. There may be a few more bytes that signify something else before the binary data; I don't know and don't care. I hash the whole thing. I use a hash to reduce the size of the data for display, and because I don't want to post the true binary in case anyone decides to use my program to duplicate profiles rather than detect them. Also, this allows a commercial vendor to share results without sharing their profiles.
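
    As a minimal sketch of the hashing step described above (not the program's actual code, whose language and internals are not shown in this thread), here is how one data block could be reduced to an MD5 digest in Python. The function name `block_hash` and the sample bytes are made up for illustration, and reading blocks out of a .kipr file is deliberately left out.

```python
import hashlib


def block_hash(block: bytes) -> str:
    """Return the MD5 digest of one raw data block as 32 hex characters."""
    return hashlib.md5(block).hexdigest()


# Made-up bytes standing in for a data block read from a .kipr file.
example_block = bytes(range(256)) * 4
digest = block_hash(example_block)

# The digest is a fixed-size fingerprint: it cannot be turned back into the
# block, which is why only the hash needs to be shown or shared.
print(len(digest), digest)
```

    An MD5 hex digest only uses the characters 0-9 and a-f, and it is a fingerprint of the data rather than encoded text, so decoding it as a string would be expected to give gibberish.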


    If the profile was created after merging was introduced, there are 4 blocks for amp profiles and 3 for cabs in each kipr. From my tests, 2 of the amp blocks and 1 of the cab blocks can change when the amp or cab is put on a different rig or when one of them is switched. So for this kind of profile I combine the hashes of the 2 amp and 2 cab blocks that don't change. If you double-click the file in the right pane, you can see all the amp/cab blocks and their additional id/type. You'll see amp11, amp12, etc.


    For older kipr files, there's only 1 block for amp data and 1 for cab.
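
    The combining and matching described above could be sketched roughly as follows. This is an assumption-laden illustration, not the tool's implementation: the thread does not say how the hashes are combined, so plain concatenation is assumed here, and `combined_key`, `find_duplicates`, and the sample data are hypothetical. Each profile is represented by the list of blocks that stay constant (two for newer files, one for older ones).

```python
import hashlib
from collections import defaultdict


def combined_key(stable_blocks):
    """Concatenate the MD5 hex digests of the blocks that stay constant."""
    return "".join(hashlib.md5(b).hexdigest() for b in stable_blocks)


def find_duplicates(profiles):
    """Group file names by combined key; keep keys shared by 2+ files.

    `profiles` maps a file name to its list of stable blocks (hypothetical
    input: extracting blocks from a .kipr file is not shown here).
    """
    groups = defaultdict(list)
    for name, blocks in profiles.items():
        groups[combined_key(blocks)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up stand-ins for amp blocks from four files.
profiles = {
    "a.kipr": [b"amp-block-1", b"amp-block-2"],
    "b.kipr": [b"amp-block-1", b"amp-block-2"],
    "c.kipr": [b"other-amp-1", b"other-amp-2"],
    "old.kipr": [b"single-amp-block"],
}
for key, names in find_duplicates(profiles).items():
    print(key, names)
```

    Under the concatenation assumption, two MD5 digests give a 64-character key, which would be consistent with the hash length John reports. Grouping by key in a dictionary also needs only one pass over the files, rather than comparing every file against every other.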


    It does appear that the profiling process will never produce identical profiles, even of the same amp/cab, though I'm not sure. I can test this on my POD HD, which should have no variation outside of the random noise in the cables on each profiling attempt. Refining will have an impact as well, so I will not do that.


    I don't think there's a datetime embedded in the data blocks - there are amp/cab creation date and time tags though. I think the variation between profiles just reflects how precise the profiling process is. The slightest change, possibly something totally random, means you'll get a different profile. And the data blocks are pretty large - it's quite statistically significant if two blocks match. I can't say they were 100% created from the same session, but every real-world collision I've detected so far was confirmed to come from a duplication of a profile, not from two different profiling sessions that were incredibly similar.
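
    A separate question from whether two sessions can yield identical data is whether two different blocks could share an MD5 hash by chance. A rough back-of-the-envelope check, assuming digests of distinct, non-malicious inputs behave like uniformly random 128-bit values (MD5 is known to be weak against deliberately crafted collisions, which this does not cover), is the union bound below. The function name is hypothetical.

```python
from fractions import Fraction


def accidental_collision_bound(n_items: int, hash_bits: int = 128) -> Fraction:
    """Upper bound on the chance that any two of n random digests collide.

    Uses the union bound: (number of pairs) / 2**hash_bits.
    """
    pairs = n_items * (n_items - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# For a library of about 10,000 rigs the bound is on the order of 1e-31.
print(float(accidental_collision_bound(10_000)))
```

    Under those assumptions, a matching hash in a library of this size is far more likely to mean identical block data than an accidental hash collision.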


    If there are real-world collisions from different profiling processes, I'd imagine it'd have to be using the exact same gear in the exact same configuration, not like I profile my Dual Recto in my house in New Orleans while you profile your Dual Recto in New York and we both use the same amp settings, cab model, mic and mic position. Even if perfectly similar, there still exists variation in our tubes, the power, speaker wear, mic wear, random noise picked up by cables, lighting's effect on interference, and room reflections.

  • Excellent - thanks for trying that approach. How is performance?


    Pretty good. It compared 900 rigs and found 109 duplicates in about a second (iMac Quad Core i7 2009).
    Just to be sure... there is no way, right now, to delete duplicates directly from your tool, right?
    BTW, thanks a lot for your idea... pretty smart! :-)

    There is currently no way to do any kind of file system manipulation via the program. All you can do is double-click the filename in the right pane to see all the tags and amp/cab block hashes. You have the full path for the filename in the grid, so you can find it on the computer when necessary.


    So let's say we want some file manipulations. For features, I'm thinking we can do a context menu (right-click) with
    * View Tags, Amp/Cab Hashes
    * Delete File
    * Open File Location


    I can also make the Delete key perform a delete, and the Enter key open the view tags pop-up.


    I will also make a delete remove that file from the appropriate data structures behind the scenes and remove it from the visible grids.
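
    One way the proposed delete could keep the file system and the in-memory results in sync is sketched below. It assumes the results are held as a mapping from hash key to a list of file paths, which is a guess rather than the program's real data structure, and `delete_profile` is a hypothetical name. Dropping a group once it has fewer than two files is also a design assumption, since a single file is no longer a duplicate.

```python
import os
import tempfile


def delete_profile(path, groups):
    """Delete `path` from disk and drop it from every duplicate group.

    `groups` maps a hash key to a list of file paths (hypothetical structure).
    A group left with fewer than two files is removed entirely.
    """
    if os.path.exists(path):
        os.remove(path)
    for key in list(groups):  # copy the keys so entries can be deleted safely
        names = groups[key]
        if path in names:
            names.remove(path)
            if len(names) < 2:
                del groups[key]


# Usage with two empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.kipr")
    b = os.path.join(tmp, "b.kipr")
    for p in (a, b):
        open(p, "wb").close()
    groups = {"abc123": [a, b]}
    delete_profile(a, groups)
    still_there = os.path.exists(a)
    print(still_there, groups)
```

    A grid view would then be refreshed from `groups`, so the deleted file and any group that no longer has a duplicate disappear together.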

  • Thanks for the detailed reply, Robert.


    In case it wasn't evident in my post...I know very little about the underlying structure of computer files, or digital encryption methods. I probably was using incorrect terminology.


    Whatever the method/format...I was assuming that the Kemper hardware embeds some sort of unique identifier/descriptor each time a profile is created and a .kipr is generated. I was thinking perhaps it was based upon a time-stamp (using the internal clock) in combination with the KPA unit's owner name (= Rig Author). Of course, I reserve the right to be completely wrong in this assumption. :P:D


    Cheers,
    John

  • Oh, and as far as performance, I was contemplating adding a progress bar and possibly an abort button so it doesn't look like the program is simply hung when you start finding duplicates. However, it seems to go very quickly. I can compare my main Kemper folder to itself (probably over 10k rigs) in about 35 seconds on my Core i5 Surface Pro. Obviously, this is as rigorous a test as will ever occur. For a normal test (one commercial author vs. all commercial profiles (3k+ profiles)), it finishes in about 7 seconds. (I find it generally takes me about 15-20 seconds before I start thinking WTF is this thing stuck?)


    Does anyone think a progress bar, or something similar that shows the user an update while processing, is necessary?
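
    For reference, the progress-and-abort idea could look something like this minimal sketch. It assumes the grouping loop reports progress through a callback and checks a cancel flag on each item; all names here (`find_duplicates_with_progress`, `key_of`, `on_progress`, `cancel`) are hypothetical and not taken from the program.

```python
import threading


def find_duplicates_with_progress(items, key_of, on_progress, cancel):
    """Group items by key, reporting progress and honouring a cancel flag.

    Returns None if cancelled, otherwise the groups with 2+ members.
    """
    groups = {}
    total = len(items)
    for done, item in enumerate(items, start=1):
        if cancel.is_set():  # an abort button would set this flag
            return None
        groups.setdefault(key_of(item), []).append(item)
        on_progress(done, total)  # a progress bar would be updated here
    return {k: v for k, v in groups.items() if len(v) > 1}


cancel = threading.Event()
reports = []
result = find_duplicates_with_progress(
    ["a", "b", "a"],
    key_of=lambda s: s,  # stand-in for the real hash lookup
    on_progress=lambda done, total: reports.append((done, total)),
    cancel=cancel,
)
print(result, reports)
```

    In a GUI, the loop would normally run on a worker thread so the window stays responsive, with the abort button calling `cancel.set()`.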

  • That would be great...even if it is something as "simple" as an hourglass or rotating circle metaphor, perhaps with the statement that the program is checking for duplicates. If an actual progress bar, which tracks time to task completion, is just as easy to program...then by all means. :thumbup:

  • I believe merged rigs are 6 kb while studio rigs are 4 kb?


    Hi Michael,


    I think that the increase to a 6 kb file size came with OS 3.0, which introduced the Merging process and Merged profiles. However, the problem is that both Studio and Merged profiles are 6 kb in size from OS 3.0 onward. So, you can't tell just by the file size.


  • Hi Robert,


    Just wondering if there are any new updates coming in the future for your excellent KPA Duplicate Finder app?


    Cheers,
    John