repository URL: https://github.com/ECHibiki/Multipurpose-Regex-Webscraper.git
owner: qamirror@cock.li
last change: Wed, 3 Apr 2019 03:18:22 +0000 (2 23:18 -0400)
last refresh: Sat, 27 Apr 2024 05:56:50 +0000 (27 07:56 +0200)
README.md

Regex-WebScrapper

A CLI for retrieving information from web pages, written for Python 3.7.

python3 regexscraper -u "https://example.com" -r "(t|r)est"
This prints all matches of the regex pattern given with -r on the site given with -u. Pages are loaded through Selenium, so JavaScript-rendered content is included.
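
As a rough illustration of the command above, here is a minimal sketch of how the two flags and the matching step could fit together. It is not the project's actual code: `build_parser` and `find_matches` are hypothetical names, only the -u and -r flags come from the README, and the page download and Selenium rendering are left out so the sketch runs on plain text. It assumes matching is done with `re.findall`.

```python
import argparse
import re


def build_parser():
    # Hypothetical parser: only -u and -r appear in the README's command.
    parser = argparse.ArgumentParser(prog="regexscraper")
    parser.add_argument("-u", dest="url", required=True, help="page to scrape")
    parser.add_argument("-r", dest="pattern", required=True, help="regex to apply")
    return parser


def find_matches(page_text, pattern):
    # re.findall returns the group contents when the pattern has groups,
    # so "(t|r)est" yields "t" / "r" rather than "test" / "rest".
    matches = re.findall(pattern, page_text)
    # Drop blank matches: empty strings, or tuples whose items are all empty.
    return [m for m in matches if (any(m) if isinstance(m, tuple) else m)]
```

With a capturing group, `find_matches("test rest best", "(t|r)est")` gives `["t", "r"]`; writing the group as non-capturing, `(?:t|r)est`, gives the full words `["test", "rest"]` instead.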

Important:

Selenium takes up more memory than you might expect, since it drives a full browser. Take that into account when using this tool.

Regex patterns:

IP Addresses Tuple: ((([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])\.){3}([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])) This returns a tuple with the full address at index 0.
IP Address Sort of: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} Returns the IP address as a string, but the match is not necessarily a valid address (999.999.999.999 also matches).
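
The difference between the two patterns is easier to see when they are run with Python's `re.findall`. The sketch below assumes the dots are meant to be literal (escaped as `\.`) and that matches come from `re.findall`. The helper names `full_addresses`, `loose_addresses` and `is_valid_ipv4` are made up for illustration and are not part of the tool.

```python
import ipaddress
import re

# Patterns from the list above, with the dots escaped to match a literal ".".
IP_TUPLE = (
    r"((([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])\.){3}"
    r"([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9]))"
)
IP_LOOSE = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def full_addresses(text):
    # The pattern has several groups, so findall returns tuples;
    # the outermost group (index 0) holds the whole address.
    return [groups[0] for groups in re.findall(IP_TUPLE, text)]


def loose_addresses(text):
    # No groups here, so findall returns plain strings.
    return re.findall(IP_LOOSE, text)


def is_valid_ipv4(candidate):
    # Optional extra check using the standard library.
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False
```

For example, `full_addresses("hosts: 10.0.0.1, 192.168.1.20")` gives `["10.0.0.1", "192.168.1.20"]`, while `loose_addresses("10.0.0.1 and 999.1.1.1")` also returns `"999.1.1.1"`, which `is_valid_ipv4` rejects.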

Things:

https://stackoverflow.com/questions/39547598/selenium-common-exceptions-webdriverexception-message-connection-refused
https://stackoverflow.com/questions/8220108/how-do-i-check-the-operating-system-in-python
https://selenium-python.readthedocs.io/api.html
https://stackoverflow.com/questions/1285917/how-to-disable-javascript-when-using-selenium/51681608#51681608

shortlog
2019-04-03 ECHibiki Handle 404's on --raw [master]
2019-04-03 ECHibiki Create err_log.txt
2019-03-29 ECHibiki remove blank matches
2019-03-29 ECHibiki Add requests methods
2019-02-18 ECHibiki Update README.md
2019-02-18 ECHibiki Update README.md
2018-12-19 ECHibiki Changes for linux
2018-12-18 ECHibiki Change close in favor of quit
2018-12-18 ECHibiki improved to handle json and multiple sites
2018-12-17 ECHibiki Cleanup
2018-12-17 ECHibiki properly terminate program
2018-12-17 ECHibiki Command line regex webscrapper
2018-12-16 ECHibiki prep for CLI
2018-12-16 ECHibiki prep for CLI
2018-12-13 ECHibiki Merge pull request #1 from ECHibiki/add-license-1
2018-12-13 ECHibiki Create LICENSE [1/head]
...
tags
5 years ago v0.0
heads
5 years ago master