Here’s my view on a new way to do webscraping, that maybe already done, but could be very interesting if used in the proper scenario.
- You can use python/autohotkey or a combination of both to navigate to the webpages that you want to check something out of.
- Then take a screenshot of that page, a complete screenshot.
- Then use gpt vision to extract whatever data/text you want.
- Additionally, with a combination of another GPT-3 pass that parses the reply into clean JSON - you have successfully extracted this data.
I’m trying this out, and it’s pretty fun to see and build!
You never know where this small experiment becomes a part of a larger project!