szatox Advocate
Joined: 27 Aug 2013 Posts: 3404
Posted: Mon Feb 26, 2024 2:38 am Post subject: Trick to save resource files from visited websites
I've been looking for some way to save files hidden behind javascript viewers. Here is what I eventually came up with.
The solution below is not complete, not perfect, and not even the most convenient thing in the world, but it can follow you to the more obscure websites which don't have dedicated scrapers, which is way better than nothing at all.
How it works:
We connect to the websites via a proxy which snoops on our traffic and dumps responses on disk.
What do we need:
mitmproxy
custom script for saving files (mitmproxy comes with a bunch of provided scripts, but I haven't found a suitable one)
custom CA certificate
How to set things up:
Code: | # Setup as root:
emerge mitmproxy
# Setup as user
# Create a CA cert and bundle it for use with mitmproxy
# (you can make the cert valid for any time you want; I chose 30 days to limit potential damage)
openssl req -x509 -sha256 -days 30 -newkey rsa:2048 -keyout mitm-ca.key -out mitm-ca.pem -nodes
cat mitm-ca.key mitm-ca.pem > mitmproxy-ca.pem
# Create a plugin for mitmproxy
cat > save.py << eof
import os
from pathlib import Path

def response(flow):
    # Map each URL to a path on disk: files/<host>/<path>
    location = "files/" + flow.request.host + flow.request.path
    Path(os.path.dirname(location)).mkdir(parents=True, exist_ok=True)
    # Directory-style URLs are saved as index.htm, everything else keeps its name
    if location.endswith("/"):
        location += "index.htm"
    with open(location, "wb") as f:
        f.write(flow.response.content)
eof
# Start proxy as user
mitmdump --set confdir=. -s save.py
|
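If you only care about the actual media files rather than every single response, a filtering variant of the plugin is easy enough. The file name save-media.py and the content-type list below are just my guesses at something sensible, adjust to taste:
Code: | # Only keep responses that look like media (playlists, video, audio);
# the content types listed here are an assumption, extend as needed
cat > save-media.py << eof
from pathlib import Path

WANTED = ("video/", "audio/", "application/vnd.apple.mpegurl", "application/octet-stream")

def response(flow):
    # Skip anything whose Content-Type doesn't look like media
    ctype = flow.response.headers.get("content-type", "")
    if not ctype.startswith(WANTED):
        return
    location = "files/" + flow.request.host + flow.request.path
    if location.endswith("/"):
        location += "index.htm"
    Path(location).parent.mkdir(parents=True, exist_ok=True)
    with open(location, "wb") as f:
        f.write(flow.response.content)
eof
mitmdump --set confdir=. -s save-media.py
|
Some sites serve chunks as application/octet-stream, hence that entry; drop it if it catches too much junk.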
Add the mitm CA certificate (mitm-ca.pem) created above to the browser's trusted root CA store.
Configure your browser to connect via http proxy on localhost:8080, use the same proxy for https, and enjoy all files being dumped under ./files/domain/path.
Note: it's probably a good idea to use a different browser for scraping the internet than for everyday use.
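For example, with Chromium on Linux (the throwaway profile directory is just a suggestion; Chromium reads the per-user NSS store in ~/.pki/nssdb, while Firefox keeps its own certificate store and proxy settings in its profile):
Code: | # Trust the CA in the NSS store Chromium uses (certutil comes with nss)
certutil -d sql:$HOME/.pki/nssdb -A -t "C,," -n mitm-ca -i mitm-ca.pem
# Start a throwaway profile pointed at the proxy
chromium --user-data-dir=/tmp/scrape-profile --proxy-server="http://localhost:8080"
|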
Bonus point: recovering an HLS video delivered in a million pieces.
These come as playlists referring to many, many very short video files; fortunately, we can stitch them back together. First, prepare a list of input chunks for ffmpeg; each line should look like: file '/path/to/video/chunk'.
E.g. a playlist with relative paths can be converted just like this:
sed -e '/^#/ d; s/^/file /' < playlist.m3u8 > list.txt
Then have ffmpeg merge it:
ffmpeg -f concat -i list.txt -c copy output.mp4
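If the playlist uses absolute URLs instead, and the chunks went through the proxy above so they landed under ./files/host/path, something like this should do the trick (the sed rewrite is my assumption about what your playlist looks like, so check list.txt before running ffmpeg):
Code: | # Rewrite absolute URLs into the local files/ tree
sed -e '/^#/ d; s|^https\?://|file files/|' < playlist.m3u8 > list.txt
# -safe 0 makes the concat demuxer accept paths with unusual characters (query strings etc.)
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
|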
GLHF!
_________________
Make Computing Fun Again
Banana Moderator
Joined: 21 May 2004 Posts: 1709 Location: Germany
szatox Advocate
Joined: 27 Aug 2013 Posts: 3404
Posted: Mon Feb 26, 2024 12:43 pm
Kinda similar but not quite.
You _can_ script mitmproxy to act as a caching proxy (as in: returning the same content upon subsequent requests), but my version is not that smart.
I don't think squid can intercept encrypted traffic though, which makes it effectively useless these days. Mitmproxy can, as long as your browser accepts forged certificates. I suppose you could chain those two, but that wasn't my goal.
Also, AFAIR squid used some kind of hashes for naming bits of data in its cache, so you'd have a hard time extracting anything from it. I have mitmproxy store files under names which map directly to the source URLs. Much easier to use.
_________________
Make Computing Fun Again