Regex-WebScrapper

A CLI for retrieving information from web pages, written for Python 3.7.

python3 regexscraper -u "https://example.com" -r "(t|r)est"
This prints every match of the regex pattern given with -r found on the site given with -u. Pages are loaded through Selenium, so JavaScript-rendered content is included.
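
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names `render_page` and `find_matches`, the choice of headless Firefox, and the blank-match filter are all assumptions made for the example.

```python
import re


def render_page(url):
    """Load a URL in a headless browser and return the rendered HTML.

    Assumption: Firefox with geckodriver is installed. Selenium is imported
    here so the regex helper below can be used without it.
    """
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the whole browser session, which frees its memory.
        driver.quit()


def find_matches(page_source, pattern):
    """Return all non-blank regex matches found in the page source."""
    matches = re.findall(pattern, page_source)
    # re.findall returns tuples when the pattern has several groups,
    # and plain strings otherwise; drop entries that are entirely empty.
    return [m for m in matches if (any(m) if isinstance(m, tuple) else m)]


# Hypothetical usage, mirroring the command line above:
# html = render_page("https://example.com")
# print(find_matches(html, r"(t|r)est"))
```

Note that `re.findall` returns the contents of the capture group when a pattern contains exactly one group, so `(t|r)est` yields the captured letter rather than the whole word.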

Important:

Selenium drives a full browser, so it uses far more memory than a plain HTTP request. Take that into account when using it.

Regex patterns:

IP address (tuple): ((([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])\.){3}([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9]))
Returns a tuple with the full address at index 0.
IP address (approximate): \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Returns the IP address as a string. The match is not necessarily a valid address.
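
The difference between the two patterns can be seen with Python's `re.findall`. This is a small sketch under assumptions: the sample text and the constant names `STRICT_IP` and `LOOSE_IP` are made up for illustration, and the dots are written as `\.` so they match a literal dot.

```python
import re

# Grouped pattern: findall returns one tuple per match, one entry per group.
STRICT_IP = (
    r"((([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])\.){3}"
    r"([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9]))"
)
# Ungrouped pattern: findall returns plain strings.
LOOSE_IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Hypothetical sample text, not taken from any real site.
sample = "hosts: 192.168.1.10 and 10.0.0.7, bogus: 999.999.999.999"

# The outermost group comes first, so index 0 holds the full address.
strict_addresses = [match[0] for match in re.findall(STRICT_IP, sample)]

# The loose pattern also accepts octets above 255.
loose_addresses = re.findall(LOOSE_IP, sample)

print(strict_addresses)
print(loose_addresses)
```

The loose pattern is easier to read and returns strings directly, but it accepts strings such as 999.999.999.999 that are not real addresses.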

Things:

https://stackoverflow.com/questions/39547598/selenium-common-exceptions-webdriverexception-message-connection-refused
https://stackoverflow.com/questions/8220108/how-do-i-check-the-operating-system-in-python
https://selenium-python.readthedocs.io/api.html
https://stackoverflow.com/questions/1285917/how-to-disable-javascript-when-using-selenium/51681608#51681608

repository URL	https://github.com/ECHibiki/Multipurpose-Regex-Webscraper.git
owner	qamirror@cock.li
last change	Wed, 3 Apr 2019 03:18:22 +0000 (2 23:18 -0400)
last refresh	Sat, 27 Apr 2024 05:56:50 +0000 (27 07:56 +0200)
mirror URL	git://repo.or.cz/Multipurpose-Regex-Webscraper.git
	https://repo.or.cz/Multipurpose-Regex-Webscraper.git
	ssh://git@repo.or.cz/Multipurpose-Regex-Webscraper.git

2019-04-03	ECHibiki	Handle 404's on --raw
2019-04-03	ECHibiki	Create err_log.txt
2019-03-29	ECHibiki	remove blank matches
2019-03-29	ECHibiki	Add requests methods
2019-02-18	ECHibiki	Update README.md
2019-02-18	ECHibiki	Update README.md
2018-12-19	ECHibiki	Changes for linux
2018-12-18	ECHibiki	Change close in favor of quit
2018-12-18	ECHibiki	improved to handle json and multiple sites
2018-12-17	ECHibiki	Cleanup
2018-12-17	ECHibiki	properly terminate program
2018-12-17	ECHibiki	Command line regex webscrapper
2018-12-16	ECHibiki	prep for CLI
2018-12-16	ECHibiki	prep for CLI
2018-12-13	ECHibiki	Merge pull request #1 from ECHibiki/add-license-1
2018-12-13	ECHibiki	Create LICENSE
...