description | none |
repository URL | https://github.com/ECHibiki/Multipurpose-Regex-Webscraper.git |
owner | qamirror@cock.li |
last change | Wed, 3 Apr 2019 03:18:22 +0000 (2 23:18 -0400) |
last refresh | Sat, 27 Apr 2024 05:56:50 +0000 (27 07:56 +0200) |
mirror URL | git://repo.or.cz/Multipurpose-Regex-Webscraper.git |
mirror URL | https://repo.or.cz/Multipurpose-Regex-Webscraper.git |
mirror URL | ssh://git@repo.or.cz/Multipurpose-Regex-Webscraper.git |
A CLI for retrieving information from web pages, written for Python 3.7.
python3 regexscraper -u "https://example.com" -r "(t|r)est"
This outputs every match of the pattern given by -r on the site given by -u. Pages are loaded through Selenium, so content rendered by JavaScript is included.
Selenium uses more memory than it strictly needs to. Take that into account when running it.
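The flow described above (render the page with Selenium, then run the regex over the resulting source) can be sketched roughly as follows. This is not the tool's actual code: the function names find_matches and render_page are hypothetical, and the choice of headless Firefox is an assumption, since the text does not say which browser is driven.

```python
import re


def find_matches(pattern, html):
    """Return every regex match found in an already rendered page source."""
    return re.findall(pattern, html)


def render_page(url):
    """Load url in a headless browser and return the DOM after JavaScript ran.

    Headless Firefox is an assumption made for this sketch.
    """
    # Imported here so find_matches() stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut the browser down; a leaked browser process keeps its memory.
        driver.quit()
```

Under these assumptions, find_matches(r"(t|r)est", render_page("https://example.com")) would correspond to the command above. Note that re.findall returns the contents of the capture group when the pattern has one, so "(t|r)est" yields "t" or "r" rather than the whole word.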
IP address (tuple): ((([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])\.){3}([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])) Each match is returned as a tuple of the capture groups, with the full address at index 0.
IP address (approximate): \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} Each match is returned as a string; the pattern does not check octet ranges, so the result is not necessarily a valid address.
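To illustrate the difference between the two patterns, here is a small sketch using Python's re.findall on a made-up string. It assumes the dots are meant to be literal and writes them as \. (a bare . would match any character); the sample text and variable names are hypothetical.

```python
import re

# Both patterns from the text, with the dots escaped (an assumption of this sketch).
IP_TUPLE = (r"((([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9])\.){3}"
            r"([0-1]{0,1}[0-9]{0,2}|25[0-5]|2[0-4][0-9]))")
IP_APPROX = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Made-up sample input: two plausible addresses and one out-of-range string.
text = "hosts: 10.0.0.1 and 192.168.1.7, bogus 999.999.999.999"

# With several capture groups, findall returns one tuple per match;
# the outermost group (index 0) holds the full address.
tuple_matches = re.findall(IP_TUPLE, text)
full_addresses = [groups[0] for groups in tuple_matches]

# Without capture groups, findall returns plain strings.
approx_matches = re.findall(IP_APPROX, text)

print(full_addresses)
print(approx_matches)
```

On this input the tuple pattern yields only the two plausible addresses, while the approximate pattern also accepts 999.999.999.999.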
https://stackoverflow.com/questions/39547598/selenium-common-exceptions-webdriverexception-message-connection-refused
https://stackoverflow.com/questions/8220108/how-do-i-check-the-operating-system-in-python
https://selenium-python.readthedocs.io/api.html
https://stackoverflow.com/questions/1285917/how-to-disable-javascript-when-using-selenium/51681608#51681608
5 years ago | v0.0 | commitlog |
5 years ago | master | logtree |