Author Topic: Process Collection taking 60 seconds per 20000 instead of 1 second?  (Read 59264 times)

Offline ndb1973

  • Decent Indexer
  • ***
  • Posts: 53
  • Helpful: +0/-1
For roughly the last 2 days (and still continuing) updates have been taking forever.  I have tracked it down to the process collections phase taking 60 seconds per 20,000 articles instead of 1 second.

I have tried rebooting but that hasn't helped, disk space is also fine. 

Any ideas what could be causing this and how to fix?

1.33s to download articles, 57.25s to process collections, 0.70s to insert binaries/parts, 0.00s for part repair, 59.19s total.

Thanks
« Last Edit: 2017-01-23, 01:46:08 AM by ndb1973 »

Offline adr3nal1n

  • Junior Indexer
  • **
  • Posts: 32
  • Helpful: +2/-0
I am seeing the same thing when updating a.b.teevee, but as I am running an rpi2 it is taking 240 seconds per 20,000 to process collections, when it used to be much quicker than this. Maybe someone (or a bot) is spamming some of the groups at the moment?

Looks like this issue is related http://forums.nzedb.com/index.php?topic=2349.0

« Last Edit: 2017-01-23, 03:46:20 AM by adr3nal1n »

Offline ndb1973

  • Decent Indexer
  • ***
  • Posts: 53
  • Helpful: +0/-1
At least it is not just me.  Bit strange I have never noticed this before. 

I have checked my database setup and everything is set as recommended; it just seems to have become very slow on the collections side of things.

Downloading and inserting speed seems to be unchanged.

Offline xeddog

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +9/-2
IT'S RUSSIAN HACKING!!!! 

OK, seriously, I am having the same issue with a.b.teevee and a.b.moovee.  Downloading 50,000 articles takes about 1.5-ish seconds, but processing collections takes anywhere from 60 to about 120 seconds.  Most common seems to be about 80-90 seconds.  I have also noticed that when either one of these groups is actively updating or backfilling, one of my cpus (system monitor shows 8 cpus) is always at or very near 100% utilization.  My system monitor reports that the process using all the cpu is mysqld.  Overall system responsiveness is also a bit off, but nothing serious.  If I terminate the update or backfill, cpu and system responsiveness drop back to "normal".  All of the other groups will process 50,000 articles in about 1 to 1.5 seconds and 2 seconds max, and considering the old hardware it runs great.  System monitor also shows memory usage at around 2.5GB or about 15-16% and is very constant.  If I never activate moovee or teevee, swap usage will be zero, but if I activate them swap usage will increase a little bit to about 110MB or 0.7% or thereabouts, but memory usage remains constant.

Just for reference, I am running Ubuntu 16.04 (fresh install btw), and using MariaDB.  This is a personal installation, and I am only indexing about 30 groups, mostly small ones.  At the moment, moovee and teevee are deactivated so make that 28 groups.  As for hardware: an older i7-920 processor, 16GB RAM, and a 1TB eSATA disk drive with a buttload of free space.

Wayne

Offline ThePeePs

  • Overlord
  • ******
  • Posts: 44
  • Helpful: +7/-0
  • Hardware mod'er and p/t coder
    • nZEDb by ThePeePs
If you guys haven't put in the blacklist entry that I suggested in the other thread, try it.  If that doesn't help, look at what's in the collections table(s). If you see lots and lots of collections that look very much the same, they are probably all garbage/spam, and you may want to write a blacklist entry for them.

To check what's in the tables, run the following queries on the DB:
Code: [Select]

mysql -u <username> -p
use <nzedb db name>;
select subject,fromname from collections_<groupID> limit 100;
To get the groupID you can use this query:
Code: [Select]
select ID,name from groups where active =1;
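When one poster is flooding a group, counting rows per fromname may show it faster than reading 100 subjects by eye. What follows is only a minimal sketch of that idea: it uses Python's built-in sqlite3 as a stand-in for the real MariaDB database, and the table name collections_69, the helper top_posters and the sample rows are all made up for illustration. Only the subject and fromname columns come from the query above.

```python
import sqlite3

def top_posters(conn, table, limit=10):
    """Return (fromname, count) pairs, most frequent poster first."""
    # Table names cannot be bound as parameters, so check before interpolating.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    sql = (
        "SELECT fromname, COUNT(*) AS n FROM {} "
        "GROUP BY fromname ORDER BY n DESC, fromname LIMIT ?"
    ).format(table)
    return conn.execute(sql, (limit,)).fetchall()

# Stand-in data: an in-memory SQLite table shaped like collections_<groupID>.
# The addresses and subjects below are invented, not taken from a real index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collections_69 (subject TEXT, fromname TEXT)")
rows = [("spam %d" % i, "spammer@example.invalid") for i in range(8)]
rows += [
    ("Some.Show.S01E01", "poster@example.invalid"),
    ("Some.Show.S01E02", "poster@example.invalid"),
]
conn.executemany("INSERT INTO collections_69 VALUES (?, ?)", rows)

result = top_posters(conn, "collections_69")
for fromname, n in result:
    print(n, fromname)
```

The same GROUP BY / ORDER BY statement should also work when typed into the mysql prompt against the real collections_<groupID> table. A fromname that owns most of the rows is a likely candidate for a blacklist entry.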

Offline xeddog

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +9/-2
I just ran
Code: [Select]
select ID,name from groups where active =1;
and a.b.moovee has a group number of 1000096.  What the heck?

Wayne

Offline ThePeePs

  • Overlord
  • ******
  • Posts: 44
  • Helpful: +7/-0
  • Hardware mod'er and p/t coder
    • nZEDb by ThePeePs
Odd, mine is 69.  If you are running TPG (table per group), then you should have a collections_1000096 table.

Offline xeddog

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +9/-2
I do have all of the xxxx_1000096 tables, but why aren't they just _96?

Wayne

Edit:  and btw, I also ran
Code: [Select]
php delete_releases.php fromname=like="pr3d@NET.world"
and it didn't seem to do anything.  I got an immediate prompt back with no other messages.
« Last Edit: 2017-01-27, 04:48:49 PM by xeddog »

Offline xeddog

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +9/-2
This might be interesting - I added the blacklist to block pr3d@NET.  Now it is taking up to around 20 seconds to DOWNLOAD 50,000 headers instead of <2 seconds.  But now, out of the 50,000, over 40,000 are blacklisted and then the time to process the collections is around 0.5 seconds give or take.

Wayne

Offline david_ritterhous

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +6/-0
Can you share a screenshot of the "edit" page for that blacklist entry?  I have some valid, older posts with that email address.  I want to make sure I did it correctly.
TIA

Offline ThePeePs

  • Overlord
  • ******
  • Posts: 44
  • Helpful: +7/-0
  • Hardware mod'er and p/t coder
    • nZEDb by ThePeePs
You might want to cut down the number of headers you pull per run.  I only do 5k, and mine keeps up just fine; in fact, with a 60 sec sleep time in tmux, it takes between 1-2 min to pull all the new headers for 131 groups.

Offline david_ritterhous

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +6/-0
But how far behind are you?  Meaning, if there are, say, 15k new headers, a run only gets 5k of them, and by the next run there could be 30k pending while you again get only 5k, and so on.

Offline ThePeePs

  • Overlord
  • ******
  • Posts: 44
  • Helpful: +7/-0
  • Hardware mod'er and p/t coder
    • nZEDb by ThePeePs
I'm not behind at all.  I'm running threaded, so update_binaries is pulling 5k per thread.

Offline xeddog

  • Prolific Indexer
  • ****
  • Posts: 240
  • Helpful: +9/-2
I'll get to the blacklist stuff in a little while, but I remembered why my group ID shows as 1000096.  It's because one of the things I did earlier was to "Delete" the group.  I meant to do a "Purge", but oh well.  So when I added it back manually it gave the ID of 1000096.


Wayne

Offline tateu

  • Newbie
  • *
  • Posts: 2
  • Helpful: +1/-0
Starting sometime today my processing time jumped back up to around 300 seconds from the normal 1.5 in alt.binaries.teevee. Looking through the collections table, I have a ton of private (junk) releases from a bunch of different posters other than last week's spam king, pr3d. Their addresses are all formatted as (9 digit hex)@(same 9 digit hex).(same 9 digit hex), such as:
Code: [Select]
dea7d8dab@dea7d8dab.dea7d8dab
924403a61@924403a61.924403a61
a45a431db@a45a431db.a45a431db

I added a new blacklist rule using regex backreferences (\1 matches whatever the first group captured) and everything is running smoothly again:
Code: [Select]
(.+)@\1\.\1
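Before adding a rule like this it can help to try it against a few known addresses. The sketch below does that with Python's re module, which is only an assumption for testing purposes since the indexer applies its blacklist with its own regex engine. LOOSE is the rule exactly as posted; TIGHT is a hypothetical stricter variant built from the "9 digit hex" description, and foo@foo.foo is an invented address added to show the difference between the two.

```python
import re

# Rule exactly as posted: any text repeated three times as x@x.x
LOOSE = re.compile(r"(.+)@\1\.\1")
# Hypothetical stricter variant: exactly nine lowercase hex characters.
TIGHT = re.compile(r"([0-9a-f]{9})@\1\.\1")

def is_blacklisted(pattern, fromname):
    """True if the pattern matches anywhere in the poster address."""
    return pattern.search(fromname) is not None

samples = [
    "dea7d8dab@dea7d8dab.dea7d8dab",
    "924403a61@924403a61.924403a61",
    "a45a431db@a45a431db.a45a431db",
    "pr3d@NET.world",   # last week's spammer: different shape, not caught
    "foo@foo.foo",      # invented: repeats, but is not nine hex characters
]

loose_hits = [is_blacklisted(LOOSE, s) for s in samples]
tight_hits = [is_blacklisted(TIGHT, s) for s in samples]

for s, a, b in zip(samples, loose_hits, tight_hits):
    print("%-32s loose=%s tight=%s" % (s, a, b))
```

The posted rule catches all three spam addresses and leaves pr3d@NET.world to its own blacklist entry. The stricter variant is only worth considering if legitimate posters in your groups happen to use a repeated x@x.x style address.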