Commit graph

238 commits

Author SHA1 Message Date
Marian Steinbach b3bb8f34c3
Fix for exception data in the result that cannot be written, and spidering of single sites (#132)
* WIP commit for single job execution

* Convert exception to string

* Pass more arguments

* Move python modules list into requirements.txt

* Document single site spidering

* Remove debugging
2019-11-22 23:13:57 +01:00
Marian Steinbach 725ed5439d Set shared memory size 2019-11-22 08:39:56 +01:00
Marian Steinbach 802a977fe5 Add required secrets for screenshots job 2019-11-22 08:38:49 +01:00
Marian Steinbach 7788166b0c Fix logging output in dns_resolution 2019-11-22 08:38:26 +01:00
Marian Steinbach 68f2288617
Check DNS for IPv6 AAAA record (#124)
* Add check for IPv6 AAAA record

* Adapt rating/resolvable
2019-07-15 22:59:33 +02:00
Marian Steinbach 4401683133 Add shared memory to docker container 2019-07-14 22:51:05 +02:00
Marian Steinbach 1087cc25ec Little devops/README update 2019-07-14 22:50:44 +02:00
Marian Steinbach 0c59111044
Alpine downgrade to 3.8 (#118) 2019-06-04 08:08:53 +02:00
Marian Steinbach 576050d3cd Update alpine repository URLs 2019-06-03 08:12:30 +02:00
Marian Steinbach aac5fb3192 Some Makefile updates 2019-06-03 08:08:55 +02:00
Marian Steinbach 2f6cd9a304 Add extra docker run flags 2019-06-03 08:08:37 +02:00
Marian Steinbach c41e717d3d Update devops stuff 2019-05-05 22:42:48 +02:00
Marian Steinbach 9f7efddc61 Merge branch 'master' of https://github.com/netzbegruenung/green-spider 2019-05-05 22:41:34 +02:00
Marian Steinbach 04a1e98b79
Check for the existence of /favicon.ico and rate it the same as an icon linked in the HTML head (#115)
* Fix full JSON export

* Update ignore list

* Update README

* Check for /favicon.ico and rate it as icon available

* Remove broken cookies test
2019-05-05 22:26:41 +02:00
Marian Steinbach cc6a52bf45 Update README 2019-05-04 23:01:04 +02:00
Marian Steinbach 620610b48e Update ignore list 2019-05-04 23:00:56 +02:00
Marian Steinbach ab942ca91d Fix full JSON export 2019-05-04 23:00:00 +02:00
Marian Steinbach 9e5426ccde Use alpine 3.9 base image 2019-05-03 23:20:08 +02:00
Marian Steinbach cff4d55f17 Fix problem where pageload was not counted 2019-05-03 22:54:25 +02:00
Marian Steinbach 7621b7ef75 Remove debugging output 2019-05-03 22:54:05 +02:00
Marian Steinbach 56f9f1ba86 Check third party cookies 2019-04-29 10:09:25 +02:00
Marian Steinbach 5e8347916c
Bug fix in the url_reachability check (#108)
* Fix detection of redirects to bad domains

* Fix bad domain check

* Add --url flag to spider for faster debugging

* Pass args to make spider

* Add spidering of a single URL for debugging purposes

* Fix tests

* Fix test in CI

* Remove pip upgrade
2019-04-19 00:35:28 +02:00
Marian Steinbach 2dfcf61cc0
Add netzbegruenung/green-spider-indexer to README 2019-04-11 22:43:16 +02:00
Marian Steinbach 16a05b751b Several fixes for edge cases 2018-12-17 23:54:09 +01:00
Marian Steinbach 3b8328d804 Fixing several bugs in spider code 2018-12-17 17:31:09 +01:00
Marian Steinbach 3b9ead330d
Load feeds and gather info (#103) 2018-12-07 16:32:42 +01:00
Marian Steinbach 3063a4488d
Detect frameset (#102)
* Add frameset checker

* Remove unused variable (unrelated)
2018-12-07 16:31:56 +01:00
Marian Steinbach deff95306b
Extend CMS detection for Urwahl3000 theme (#96)
* Extend check for Urwahl3000 theme
* Remove unused import
2018-12-05 21:27:45 +01:00
Marian Steinbach d0e3a4210f
Fix link raters (social media links, contact link) (#95)
* Fix rating for contact_link and social_media_link

* Skip checks when dependencies not met
2018-11-28 23:46:40 +01:00
Marian Steinbach eac5feb4f5 Kubernetes manifests: replace jobs with cronjobs 2018-11-28 22:19:03 +01:00
Marian Steinbach 678f319e73 Detect two more specific generators 2018-11-28 22:02:30 +01:00
Marian Steinbach 39cba1595a Fix contact link rating 2018-11-23 22:16:26 +01:00
Marian Steinbach 3ba6940e94
Add criteria: social media links, contact link (#90)
* Add hyperlink checker

* Add rating for contact and social media links

* Update a comment

* Remove hyperlinks details from final payload
2018-11-20 22:47:34 +01:00
Marian Steinbach 4524cb5714
Consider site reachable only with status code < 400 (#89) 2018-11-20 20:14:52 +01:00
Marian Steinbach c03ff21a9c
Simplify export (#88)
* Simplify exports

* Create file output in current working directory
2018-11-20 20:00:47 +01:00
Marian Steinbach 38481236ca
Add webapp deployment (#87)
* Add webapp deployment script

* Add some docs for webapp

* Some fixes in run-job.sh

* Update webapp deployment script

* Add some kubernetes job manifests

* Create index.yaml

* Remove local creation of the docker image from targets

* Update README.md
2018-11-20 19:54:23 +01:00
Marian Steinbach 924981659b
Allow Titillium together with Arvo (#78) 2018-11-05 23:18:11 +01:00
Marian Steinbach 325caee2bb
Detect generator jimdo (#81) 2018-11-05 23:00:01 +01:00
Marian Steinbach df1f0bb452
Detect Drupal (#80) 2018-11-05 22:32:06 +01:00
Marian Steinbach 8ce8768465 Improve error messages in export 2018-10-08 08:42:29 +02:00
Marian Steinbach 0538e437ea Fix NoneType error in rater responsive_layout 2018-10-07 21:14:29 +02:00
Marian Steinbach 18be6e7adf Fix rater no_network_errors 2018-10-07 20:55:57 +02:00
Marian Steinbach 4251df6b06
Fixes for two problems found during spidering (#75) 2018-10-05 10:25:05 +02:00
Marian Steinbach fd4a29da8e
Collect cookies in load_in_browser check (#74) 2018-10-04 21:21:30 +02:00
Marian Steinbach 2945372aaf
Fix README and Makefile (#72) 2018-10-04 21:20:53 +02:00
Marian Steinbach c065da4957
More unittests for checks (#73)
* Add test for dns_resolution
* Add test for domain_variations
* Add test for duplicate_content
2018-10-03 22:43:22 +02:00
Marian Steinbach 57f8dea4e0
Improve certificate check to support SNI (#71)
* Fix the certificate check to support SNI
* Better tests for the certificate check
* Activate verbose output when running make test
* Add commenting on the spider test
2018-10-03 21:01:52 +02:00
Marian Steinbach ae6a2e83e9
Refactor and modularize spider (#70)
See PR description for details
2018-10-03 11:05:42 +02:00
Marian Steinbach 7514aeb542 Add date and time to spider result export 2018-09-19 23:04:09 +02:00
Marian Steinbach 220b06fe79 Fix data export logging bug 2018-09-17 17:35:21 +02:00