    Jörg Stucke
    @jstucke
    @acarr91 I tried to reproduce the errors by re-analyzing a larger firmware but didn't get any exceptions. Did you get any stack traces in your logs that might be helpful? Was there something the files for which the analysis failed had in common (e.g. being very large or a certain file type)?
    acarr91
    @acarr91
    @jstucke Thanks for the update! Since the logs were in /tmp and I restarted the machine, I lost them; I made some changes to the number of cores, and as soon as possible I will launch the firmware analysis again with source_code_analysis added and try to provide more information. Anyway, as a recommended configuration, is it better to start the analysis with all the plugins, or to select only some and add the others later with an update analysis? From the logs I could not tell whether an update that adds plugins involves an analysis from scratch or whether the analyses already carried out are skipped.
    Jörg Stucke
    @jstucke

    From the logs I could not tell whether an update that adds plugins involves an analysis from scratch or whether the analyses already carried out are skipped.

    Analyses will be skipped if there is already an analysis result for this combination of object and analysis plugin in the database and the plugin version hasn't changed. The unpacking process is done again when using "update" or "redo analysis", though.

    is it better to start the analysis with all the plugins, or to select only some and add the others later with an update analysis?

    If you're running out of memory, or the analysis is very slow and you need faster results, it might be advantageous to only run certain plugins and update later. It should not affect the individual plugin results.
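The skip rule described above can be sketched in a few lines. This is a simplified illustration of the stated behaviour, not FACT's actual code; the `plugin_version` key name is an assumption:

```python
from typing import Optional


def analysis_is_outdated(db_entry: Optional[dict], current_plugin_version: str) -> bool:
    """Re-run a plugin for a file only if no result is stored for this
    (file, plugin) pair, or the stored result came from an older plugin
    version. `db_entry` is the stored result (or None if there is none)."""
    if db_entry is None:
        return True  # never analyzed by this plugin -> run it
    return db_entry.get("plugin_version") != current_plugin_version
```

Under this rule, an "update analysis" only pays for plugins that have no current result, which is why adding plugins later does not redo the already-finished analyses (unpacking, however, runs again).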

    hairlessbear
    @hairlessbear
    Hi! Are there any known issues with files over a certain size? I'm currently trying to analyze a 32 GB image and FACT appears "stuck". I uploaded it in a 7zipped container that was ~6 GB in size. FACT properly extracted the 32 GB image from this file, but doesn't appear to be doing anything since then. That is, the logs just show [Unpacking][INFO]: Queue Length (Analysis/Unpack): 1 / 1 over and over and in htop there doesn't appear to be any analysis or unpacking running (i.e. very low CPU utilization by all processes). The logging is at the DEBUG level, but I don't see any errors. The mongo logs also don't have any errors, although when the unpacking of the 7zipped container finished, the mongo logs did contain [ftdc] serverStatus was very slow, which might be relevant?
    If there aren't any known issues with large files, any pointers on where in the FACT core codebase I should look to debug this would be appreciated.
    Oh, forgot to mention, it's been in this stuck state for about an hour and a half.
    I also used the REST API to grab status, which showed that all of the analyzer plugins have empty queues. Here's backend.analysis (file name removed):
    "analysis": {
            "analysis_main_scheduler": 1,
            "current_analyses": {
              "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_6687791786": {
                "analyzed_count": 0,
                "start_time": 1609877170.701255,
                "total_count": 2,
                "unpacked_count": 1
              }
            }
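A status excerpt like the one above can be turned into a quick progress readout. A minimal parsing sketch, using an abridged copy of the JSON (uid shortened):

```python
import json

# abridged version of the backend status excerpt above
status_json = '''
{
  "analysis": {
    "analysis_main_scheduler": 1,
    "current_analyses": {
      "xxx_6687791786": {
        "analyzed_count": 0,
        "start_time": 1609877170.701255,
        "total_count": 2,
        "unpacked_count": 1
      }
    }
  }
}
'''


def format_progress(status: dict) -> list:
    # one line per currently running analysis
    lines = []
    for uid, p in status["analysis"]["current_analyses"].items():
        lines.append(
            f'{uid}: unpacked {p["unpacked_count"]}/{p["total_count"]}, '
            f'analyzed {p["analyzed_count"]}/{p["total_count"]}'
        )
    return lines


print("\n".join(format_progress(json.loads(status_json))))
# → xxx_6687791786: unpacked 1/2, analyzed 0/2
```

Here "unpacked 1/2, analyzed 0/2" is exactly the stuck state described: one object unpacked, nothing analyzed, and all plugin queues empty.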
    hairlessbear
    @hairlessbear
    And this is running on Ubuntu 20.04 with python 3.8.5, single system setup, 16 cores, 64 GB RAM
    Hmm, I just took a closer look at your runtime troubleshooting page (https://github.com/fkie-cad/FACT_core/wiki/troubleshooting-runtime) and noticed that the 8 GB of swap are completely used... so maybe there was a memory problem that caused a failure but logged no errors.
    hairlessbear
    @hairlessbear
    Further testing confirmed it's a memory problem. Tried upping the box to 79 GB, same issue happened. Any advice on what I can do to get this to work?
    CarmelinaSeiz
    @CarmelinaSeiz
    @hairlessbear Just curious, how long does it take to analyze a 100+ MB firmware? Mine feels like it's taking ages.
    hairlessbear
    @hairlessbear
    Size alone isn't enough to say :) Depends on the analyzers selected and what exactly is in the firmware image. For example, if it has many nested archives, it'll take much longer than if it has fewer files, or if it's just a single monolithic file.
    Analysis in general takes a long time though.
    Jörg Stucke
    @jstucke
    Hi! FACT is not really intended to work with images as large as 32 GB. It ultimately depends on how many files are unpacked recursively and how much RAM and time you have, but everything above a couple of gigabytes will probably get unstable. You could try to disable all non-mandatory analysis plugins. If you start with --log_level DEBUG you will get a lot of debug output which might help to find problems. But please be aware that firmware images around 100 MB already usually contain thousands of files and take a long time to analyze accordingly (probably multiple hours because each file passes through the unpacking and analysis processes individually).
    That being said, we have not really tested FACT with images of that size. Therefore I'm not sure at which points problems will arise.
    Jörg Stucke
    @jstucke
    Now that I think about it, there is already a serious memory problem caused by the fact that FACT reads the contents of the files and passes them in memory to the different analysis systems during analysis (to speed up the process especially for smaller files). It would probably make sense to change this behaviour for very large files and read them from the file system on demand. But many of the analysis plugins will probably need at least the same amount of memory as the file size. Obviously, this also holds true for the unpacking process. A good idea would probably be to limit the number of concurrent worker processes to a minimum (threads in main.cfg) to reduce memory consumption (but this will obviously also slow down the analysis and unpacking speed).
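The worker-count knobs mentioned above live in main.cfg. A rough sketch of the relevant entries; section and key names are from memory and may differ between FACT versions, so check your own main.cfg:

```ini
[unpack]
; fewer unpacking workers -> lower peak memory usage, slower unpacking
threads = 2

[cpu_architecture]
; per-plugin worker counts can usually be reduced the same way
threads = 2
```

Since each worker may hold a whole file's contents in memory, halving the worker counts roughly halves peak memory at the cost of throughput.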
    hairlessbear
    @hairlessbear
    DEBUG logs didn't show any issues, unfortunately, but what you're saying makes sense, thanks! Sounds like it's not the number of files inside an image/archive that matters, but the size of individual files.
    I found a workaround in my case. I ran the fact_extractor docker image on the file outside of the context of FACT core (and without the memory cap). Running it this way meant it only used as much memory as the size of the image, which is much more manageable. Once it extracted the files, I tarred them up, which resulted in a ~1.5GB tarball. Throwing that at FACT worked as expected, since none of the individual files are large.
    hairlessbear
    @hairlessbear

    That said, there may be a bug in the file tree display? While all the files ended up in the right place, in the file tree, some of them are displayed in weird places. As in, if the file tree should be:

    root
    |-topdir1
    |--subdir1
    |---subsubdir1
    |-topdir2
    |-topdir3
    |--subdir3

    What's showing is like this:

    root
    |-subdir1
    |-subdir3file
    |-topdir1
    |--subdir1
    |---subsubdir1
    |-topdir2
    |-topdir3
    |--subdir3
    |--subdir3file

    It's not a huge deal since the files are still analyzed properly, just curious if this is something you've seen, and if so, if there's a known fix.

    Jörg Stucke
    @jstucke

    There were some bugs in the file tree (e.g. the same file occurring multiple times with different paths in the same container/archive, but being shown only once in the file tree), but I was hoping we had fixed them all.

    I could have a look at the problem but it will be hard to fix it if I can't reproduce it. If you could provide a minimal example to reproduce the problem, that would be great.

    hairlessbear
    @hairlessbear
    What you're talking about (identical files with different paths) is exactly the scenario I'm facing. I'll work on producing an archive that can reproduce the problem by mimicking the layout of the firmware. Once I figure out an archive that triggers it, I'll throw it your way!
    Caesurus
    @Caesurus

    FYI, Getting an error:

    Traceback (most recent call last):
      File "/usr/lib/python3.6/shutil.py", line 550, in move
        os.rename(src, real_dst)
    OSError: [Errno 18] Invalid cross-device link: '/scratch/E9DpHaYrKW7hpQKl/fact_unpack_egnbcptr/partition_2/usr/sbin/SUSEfirewall2' -> '/saved/files/partition_2/usr/sbin/SUSEfirewall2'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "./fact_extractor.py", line 71, in <module>
        sys.exit(main(_parse_args()))
      File "./fact_extractor.py", line 63, in main
        unpacked_files = unpacker.unpack(args.input_file)  # NOTE: This is a list of Path instances, not strings!
      File "/opt/app/fact_extractor/unpacker/unpack.py", line 44, in unpack
        extracted_files = self.move_extracted_files(extracted_files, Path(tmp_dir.name))
      File "/opt/app/fact_extractor/unpacker/unpack.py", line 87, in move_extracted_files
        shutil.move(str(absolute_path), str(target_path))
      File "/usr/lib/python3.6/shutil.py", line 554, in move
        os.symlink(linkto, real_dst)
    FileExistsError: [Errno 17] File exists: 'SuSEfirewall2' -> '/saved/files/partition_2/usr/sbin/SUSEfirewall2'

    This seems to only happen when I try to extract to a directory that is a mounted volume. I do this because I don't want to give Docker a filesystem large enough to hold all the extracted files, so I have a volume mounted at /saved/ that I can extract files to. This has the downside of being slightly slower than writing somewhere inside the Docker FS, but it has the benefit of using space that isn't part of the Docker container, and I skip a copy.

    Caesurus
    @Caesurus
    OK... I think I figured it out.
    move_extracted_files iterates over the files and calls shutil.move for each one. shutil.move will follow the symlink. So if there are two files, and one is a symlink (testfile1, and Testfile1, a symlink):
    In [3]: shutil.move('/tmp/tim3/files/get_files_test/testfile1', '/saved/test12345/')
    Out[3]: '/saved/test12345/testfile1'
    
    In [4]: shutil.move('/tmp/tim3/files/get_files_test/Testfile1', '/saved/test12345/')
    ---------------------------------------------------------------------------
    Error                                     Traceback (most recent call last)
    <ipython-input-4-43d156cea8f2> in <module>
    ----> 1 shutil.move('/tmp/tim3/files/get_files_test/Testfile1', '/saved/test12345/')
    
    /usr/lib/python3.6/shutil.py in move(src, dst, copy_function)
        546         real_dst = os.path.join(dst, _basename(src))
        547         if os.path.exists(real_dst):
    --> 548             raise Error("Destination path '%s' already exists" % real_dst)
        549     try:
        550         os.rename(src, real_dst)
    
    Error: Destination path '/saved/test12345/Testfile1' already exists
    
    In [5]:
    Do you really want to exit ([y]/n)? y
    root@04847aebd90d:/opt/app/fact_extractor# ll /saved/test12345/
    total 4
    drwxr-xr-x 3 root root 96 Jan 11 21:26 ./
    drwxr-xr-x 3 root root 96 Jan 11 21:26 ../
    -rw-r--r-- 1 root root 62 Sep 15  2015 testfile1
    root@04847aebd90d:/opt/app/fact_extractor#
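Under the assumption that an already-existing destination entry should simply be replaced, a defensive variant of the per-file move could look like this. This is a sketch, not the actual fact_extractor fix; `safe_move` is a hypothetical helper:

```python
import shutil
from pathlib import Path


def safe_move(src: str, dst_dir: str) -> str:
    """Move src into dst_dir like shutil.move, but first remove a
    pre-existing destination entry (regular file or symlink), so the
    'Destination path already exists' / FileExistsError cases above
    cannot occur."""
    target = Path(dst_dir) / Path(src).name
    # is_symlink() also catches broken symlinks, which exists() misses
    if target.exists() or target.is_symlink():
        target.unlink()
    return shutil.move(src, str(target))
```

Whether silently overwriting is the right policy depends on whether colliding names (like testfile1 vs. a Testfile1 symlink on a case-insensitive target filesystem) are expected to be distinct files.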
    Caesurus
    @Caesurus
    IoT-junkrat
    @IoT-junkrat
    Hey guys, I like the statistics page a lot! For file types and firmware containers, the page only shows the top 10. Can you tell me how to get the full list for both?
    Jörg Stucke
    @jstucke
    It shouldn't be much of an issue to make it configurable. I will give it a go.
    Johannes vom Dorp
    @dorpvom
    @jstucke Depending on the amount of different results, a programmatic way of getting the list (e.g. via REST-API) might be more useful than configuring the limit ...
    IoT-junkrat
    @IoT-junkrat
    Rest API would be fine for me! :grinning:
    Jörg Stucke
    @jstucke
    Too late, I just opened a PR: fkie-cad/FACT_core#536 :sweat_smile:
    It should make the number of elements configurable through the main.cfg file. Maybe you can give it a try.
    IoT-junkrat
    @IoT-junkrat
    I checked the feature you implemented. It works great, but, honestly, I have so many file types that I can't see the donut chart at all, and I can only see a small fraction of all the file types analyzed. Is there an easy-to-implement solution for the API that lets me retrieve all the different file types FACT analyzed, e.g. "application/something", "application/something_else", etc.?
    IoT-junkrat
    @IoT-junkrat
    Or is there a temporary way of retrieving the list of e.g. file types manually from the machine running the FACT core? Since I don't need that feature regularly, such a solution would be OK for me as well.
    IoT-junkrat
    @IoT-junkrat
    Found it ☺️ I used the functions you guys use in the update_statistics file. Nevertheless, thanks for the new feature and your help!
    Johannes vom Dorp
    @dorpvom
    Perfect. Nice to hear that our code is readable enough to find stuff ^^
    IoT-junkrat
    @IoT-junkrat
    Hehe it is absolutely readable 😉
    I already have the next idea I want to work on 😅
    I would like to retrieve the contents of password files (e.g. /etc/shadow) of analyzed firmware images and use haveibeenpwned to identify if the cleartext of found hashes is already out there.
    The tasks on the HIBP API are clear, but do you have a pointer for me on how to get the hashes from FACT?
    The summary per analyzed binary spits out the FACT file ID, but not the password hash from the "users and passwords" plugin, AFAIK.
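For the HIBP side, Pwned Passwords offers a k-anonymity range API: you send only the first five hex characters of the SHA-1 of a cleartext candidate and match the rest of the digest locally, so the password never leaves your machine. Note that it works on SHA-1 of plaintext, so the crypt-style hashes from the plugin would first have to be cracked or checked against candidate cleartexts. A sketch (`pwned_count` performs a live network request):

```python
import hashlib
from urllib.request import urlopen


def sha1_prefix_suffix(password: str):
    """Split the uppercase SHA-1 hex digest of a candidate cleartext into
    the 5-char prefix sent to the API and the suffix matched locally."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    return digest[:5], digest[5:]


def pwned_count(password: str) -> int:
    """Query the Pwned Passwords range endpoint and return how often the
    password appears in known breaches (0 if not found)."""
    prefix, suffix = sha1_prefix_suffix(password)
    with urlopen(f"https://api.pwnedpasswords.com/range/{prefix}") as resp:
        for line in resp.read().decode().splitlines():
            candidate, _, count = line.partition(":")
            if candidate == suffix:
                return int(count)
    return 0
```

The response for a prefix is a newline-separated list of `SUFFIX:COUNT` pairs, so the local comparison is a simple string match.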
    Jörg Stucke
    @jstucke
    You could search for the files with results for the plugin "users_and_passwords" and iterate over the entries and look for ones that include "password-hash". This script should accomplish that if you run it from the src directory:
    import json
    
    from pymongo import MongoClient
    
    # start MongoDB without auth before running (i.e.: mongod --config config/mongod.conf)
    client = MongoClient("mongodb://127.0.0.1:27018", connect=False)
    fo_collection = client['fact_main']['file_objects']
    query = {"processed_analysis.users_and_passwords.summary": {"$not": {"$size": 0}}}
    
    results = {}
    for entry in fo_collection.find(query, {"processed_analysis.users_and_passwords": 1}):
        try:
            for pw_result in entry["processed_analysis"]["users_and_passwords"].values():
                if isinstance(pw_result, dict) and "password-hash" in pw_result:
                    results.setdefault(entry["_id"], []).append(pw_result["password-hash"])
        except KeyError:
            pass
    print(json.dumps(results, indent=2))
    But be aware that there might be false positives among the results.
    IoT-junkrat
    @IoT-junkrat
    Thank you for that code snippet :-) It worked great.
    hairlessbear
    @hairlessbear

    What you're talking about (identical files with different paths) is exactly the scenario I'm facing. I'll work on producing an archive that can reproduce the problem by mimicking the layout of the firmware. Once I figure out an archive that triggers it, I'll throw it your way!

    I haven't had the chance to try to produce an example archive yet, but I have a theory on what's causing this. In my samples, all of the instances of this bug occur when the same file is present in different "levels" of archives. Here's an example of what I mean:

    archive.zip
    |--file_1
    |--file_2
    |--nested_archive.zip
       |--file_1
       |--file_3
    |--file_4

    In the above example, file_1 is present at the top level of the archive, as well as present in another archive within the root archive. If my theory is right, that's what causes the display bug and file_1 will show up in the wrong place.
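A repro archive matching that layout can be generated with Python's zipfile module (file names taken from the example above; contents are arbitrary):

```python
import io
import zipfile

# build nested_archive.zip in memory: it contains a second copy of file_1
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("file_1", "duplicate content")
    zf.writestr("file_3", "inner-only content")

# the outer archive holds file_1 both at the top level and inside
# nested_archive.zip, i.e. the same file at two nesting "levels"
with zipfile.ZipFile("archive.zip", "w") as zf:
    zf.writestr("file_1", "duplicate content")
    zf.writestr("file_2", "top-level content")
    zf.writestr("nested_archive.zip", inner.getvalue())
    zf.writestr("file_4", "more top-level content")
```

Uploading the resulting archive.zip to FACT should be enough to check whether the duplicated file_1 is the trigger for the misplaced file-tree entries.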

    My FACT instance is currently occupied analyzing a big batch of firmware, so I can't test this at the moment. I'll try to do it within the next few days, but I wanted to let y'all know now on the off chance someone else feels like testing this :)
    hairlessbear
    @hairlessbear
    Notably, when identical files are all only present at the same "level" of archive, everything displays properly (at least in all of my samples)
    Jörg Stucke
    @jstucke

    I'm not sure I understand the problem exactly but I was definitely able to find a bug in the file tree:
    A test file uploaded as

    ├── test1.zip
        └── test_file_1.txt
    └── test_dir
        └── test_file_1.txt

    will display in the file tree as

    ├── test1.zip
        ├── test_file_1.txt
        └── test_dir
            └── test_file_1.txt
    ├── test_file_1.txt
    └── test_dir
        └── test_file_1.txt

    I will take a look and try to find the bug that causes this

    Jörg Stucke
    @jstucke
    fkie-cad/FACT_core#541 should (hopefully) fix the problem
    hairlessbear
    @hairlessbear
    Thanks! I'll take a look!
    One other thing, while I'm bringing up display bugs 😅 In some of my firmware samples, the same files exist between multiple different firmwares. When viewing one of these files, FACT properly shows that this file exists in multiple firmwares (in the "parent firmware" section). But in the "complete file paths in container" section, it only shows the paths from a single firmware, not from all of the parent firmwares.
    hairlessbear
    @hairlessbear

    fkie-cad/FACT_core#541 should (hopefully) fix the problem

    Initial test looks great, thank you!

    Jörg Stucke
    @jstucke

    it only shows the paths from a single firmware, not from all of the parent firmwares

    That is actually intended: it should only show the paths of the firmware you are currently looking at. We could display all paths when no "root uid" is selected (the 2nd endpoint parameter after "ro"); currently, the paths of a random firmware are displayed in that case.

    hairlessbear
    @hairlessbear
    That's not the behavior I'm seeing, unfortunately. While the path under the "Download" button is updated based on which firmware you're looking at, the "complete file paths in container" section always displays the paths from the same firmware sample, even if that's not the one I'm looking at.
    Possibly relevant, the sample whose paths are always shown is the first sample I uploaded that had this file in it.
    Jörg Stucke
    @jstucke
    @hairlessbear I finally found the time to look at this problem. It turns out there was a bug in the revised general information section which led to the "root_uid" (the ID of the parent firmware) always being None. fkie-cad/FACT_core#544 should fix the issue and also display all paths when no "root_uid" is provided.
    hairlessbear
    @hairlessbear
    Sweet, thanks! I'll try to take a look later today.
    IoT-junkrat
    @IoT-junkrat

    Hey guys 🙂
    I want to get all the software_components from all my ~800 firmware images.

    If I use a plain curl on the REST interface "/rest/FW_UID", I get only very few software_component summaries that are not empty, but the retrieval is fast.

    If I use curl on the REST interface with the summary option "/rest/FW_UID&summary=true", I get the software_component summaries, but retrieving the results of a single firmware takes about 2 hours. I don't want to waste that much time.

    If I use curl on "/rest/firmware/?recursive=true" with the mongo query "{'processed_analysis.software_components': {'$exists': true}}", I get a fast result, but with the file UIDs. So I would have to retrieve the parent FW UID of every file and then merge all those files grouped per FW UID, right? (I have no clue how to do that via the command line or a script.)

    Maybe you guys have implemented something similar already...
    Can you think of any other solution which is "fast" and outputs all the software_component summaries? 😅
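The merge step from the last approach takes only a few lines once the file objects have been fetched. A sketch, assuming (from memory; field names may differ between FACT versions) that each file object carries its software_components summary under `analysis` and its parent firmware UIDs as the keys of `meta_data.virtual_file_path`:

```python
from collections import defaultdict


def group_summaries_by_firmware(file_objects: list) -> dict:
    """Group software_components summary entries per parent firmware UID.
    'analysis', 'summary' and 'virtual_file_path' are assumed field names."""
    per_firmware = defaultdict(set)
    for fo in file_objects:
        summary = fo.get("analysis", {}).get("software_components", {}).get("summary", [])
        for fw_uid in fo.get("meta_data", {}).get("virtual_file_path", {}):
            per_firmware[fw_uid].update(summary)
    return {uid: sorted(components) for uid, components in per_firmware.items()}
```

The idea is to fetch each file UID from the recursive query once (e.g. via "/rest/file_object/UID"), feed the parsed JSON objects into this function, and get one deduplicated component list per firmware, which might be considerably faster than one two-hour summary=true request per firmware.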