The huge coming “data purge”

I’ve decided to go forward with the big data purge I’ve mentioned a number of times over the years, although I’m still refining the rules.

The basic concept is that most visitors realize after a while that when listening to a syndicated talk show, it doesn’t really matter which station you’re listening to – as long as the station carries the show when you want to hear it, the audio quality is good, and the commercials are not too obnoxious. You probably don’t listen to your local radio station for “local news” as they probably carry national news at the top of the hour. If you’re lucky, you hear 30 seconds of “local news” that the station ripped off by paraphrasing the local newspaper front page at 5 am (three days ago).

Maintaining 260 schedule entries for Rush Limbaugh – 245 of which are from Noon-3PM (ET) – is a lot of work for little to no benefit to the visitors. I refuse to spend significant time listening to Rush talking about birth control.

+------------+----------+--------+
| lstarttime | lendtime | scount |
+------------+----------+--------+
|       1200 |     1500 |    245 |
|       1200 |     1300 |      5 |
|       1300 |     1600 |      3 |
|       1400 |     1700 |      3 |
|       1200 |     1400 |      1 |
|       1400 |     1600 |      1 |
|          0 |      500 |      1 |
|       1600 |     1900 |      1 |
+------------+----------+--------+

You might actually want to know the station that carries Rush at 4 PM (ET) or overnight.

So which stations do people actually listen to Rush on when visiting this web site? If you’re a radio industry consultant, you’ll immediately answer “WABC-AM of course!”, but you could NOT BE MORE WRONG.

People who live in the New York City area don’t need to go to Google to find a station carrying Rush Limbaugh – they go right to the station’s web site. Visitors here are people who CANNOT find a syndicated show on a local radio station, or can hear the station, but not inside their office building at work, or the local station carries the show at a time they can’t listen and they don’t want to pay for a podcast subscription.

So here is the top 50 list for Rush (this is all time – I’m going to refine this to be only recent activity):

+------------+------------------+--------+-------------+----------+
| streamname | CityName         | Statex | listencount | pctTotal |
+------------+------------------+--------+-------------+----------+
| WVNN AM    | ATHENS           | AL     |      238149 |  16.6404 |
| WCOA AM    | PENSACOLA        | FL     |      228242 |  15.9482 |
| WRVA AM/HD | RICHMOND         | VA     |       72280 |   5.0505 |
| KID AM     | IDAHO FALLS      | ID     |       49539 |   3.4615 |
| WSBA AM    | YORK             | PA     |       45508 |   3.1798 |
| KOA AM/HD  | DENVER           | CO     |       40729 |   2.8459 |
| KFAY AM    | FARMINGTON       | AR     |       30953 |   2.1628 |
| WTVN AM/HD | COLUMBUS         | OH     |       29288 |   2.0465 |
| KOGO AM/HD | SAN DIEGO        | CA     |       25477 |   1.7802 |
| WHAM AM/HD | ROCHESTER        | NY     |       20277 |   1.4168 |
| WHO AM/HD  | DES MOINES       | IA     |       20228 |   1.4134 |
| WAAV AM    | LELAND           | NC     |       19460 |   1.3597 |
| WISN AM/HD | MILWAUKEE        | WI     |       18903 |   1.3208 |
| WBAP AM    | FORT WORTH       | TX     |       17423 |   1.2174 |
| WLS AM     | CHICAGO          | IL     |       17334 |   1.2112 |
| KLBJ AM    | AUSTIN           | TX     |       17240 |   1.2046 |
| WMXI FM    | LAUREL           | MS     |       15633 |   1.0923 |
| WMAC AM    | MACON            | GA     |       14225 |   0.9940 |
| WBCK FM    | BATTLE CREEK     | MI     |       13221 |   0.9238 |
| KTWO AM    | CASPER           | WY     |       12431 |   0.8686 |
| KRMG AM    | TULSA            | OK     |       11932 |   0.8337 |
| KBOI AM/HD | BOISE            | ID     |       11901 |   0.8316 |
| WPGB FM/HD | PITTSBURGH       | PA     |       10272 |   0.7177 |
| KSCO AM    | SANTA CRUZ       | CA     |       10255 |   0.7166 |
| WMMB AM/HD | MELBOURNE        | FL     |       10042 |   0.7017 |
| WFLN AM    | ARCADIA          | FL     |        9742 |   0.6807 |
| WROK AM    | ROCKFORD         | IL     |        9441 |   0.6597 |
| WFNC AM    | FAYETTEVILLE     | NC     |        9067 |   0.6335 |
| KMAJ AM    | TOPEKA           | KS     |        8232 |   0.5752 |
| WLAC AM/HD | NASHVILLE        | TN     |        8216 |   0.5741 |
| KCRS AM    | MIDLAND          | TX     |        8186 |   0.5720 |
| WOKV AM    | JACKSONVILLE     | FL     |        7277 |   0.5085 |
| WKRC AM    | CINCINNATI       | OH     |        7050 |   0.4926 |
| WMT AM/HD  | CEDAR RAPIDS     | IA     |        6714 |   0.4691 |
| KEX AM/HD  | PORTLAND         | OR     |        6650 |   0.4647 |
| KNRS FM/HD | CENTERVILLE      | UT     |        6638 |   0.4638 |
| WOOD AM    | GRAND RAPIDS     | MI     |        6383 |   0.4460 |
| WREC AM/HD | MEMPHIS          | TN     |        6342 |   0.4431 |
| WSPC AM    | ALBEMARLE        | NC     |        5952 |   0.4159 |
| WFLA AM/HD | TAMPA            | FL     |        5622 |   0.3928 |
| KSFA AM    | NACOGDOCHES      | TX     |        5396 |   0.3770 |
| KNZZ AM    | GRAND JUNCTION   | CO     |        5379 |   0.3759 |
| KREI AM    | FARMINGTON       | MO     |        5368 |   0.3751 |
| WNTK FM    | NEW LONDON       | NH     |        5104 |   0.3566 |
| WMAL AM    | WASHINGTON       | DC     |        4924 |   0.3441 |
| WOWO AM/HD | FORT WAYNE       | IN     |        4663 |   0.3258 |
| WHAS AM/HD | LOUISVILLE       | KY     |        4553 |   0.3181 |
| KVOR AM/HD | COLORADO SPRINGS | CO     |        4375 |   0.3057 |
| WJBO AM    | BATON ROUGE      | LA     |        4373 |   0.3056 |
| WTVN AM/HD | COLUMBUS         | OH     |        4342 |   0.3034 |
+------------+------------------+--------+-------------+----------+

I’m undecided how to set the cutoff. “Less than x% of the total” would be pretty easy to do. I would like something more along the lines of “the top 5”, but that is not so easy to do in MySQL (at least at my skill level).
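For anyone wondering what a “top N” cutoff would look like outside of MySQL, here is a rough sketch in Python. It is not what the site runs; the function name `top_n_per_slot` is made up for illustration, and N is set to 2 only because the sample has three stations per time slot. The counts are copied from the retained-stations table in the first comment below.

```python
from collections import defaultdict

# (station, start ET, end ET, listen count), copied from the
# retained-stations table in the first comment below.
ROWS = [
    ("WISN AM/HD", 1300, 1600, 706),
    ("KENI AM/HD", 1300, 1600, 272),
    ("KFBX AM", 1300, 1600, 249),
    ("WMT AM/HD", 1400, 1700, 741),
    ("WHO AM/HD", 1400, 1700, 563),
    ("WJAG AM", 1400, 1700, 524),
]


def top_n_per_slot(rows, n):
    """Return {(start, end): [station, ...]} with the n busiest stations per slot.

    Hypothetical helper for illustration only.
    """
    by_slot = defaultdict(list)
    for station, start, end, count in rows:
        by_slot[(start, end)].append((count, station))

    kept = {}
    for slot, entries in by_slot.items():
        # Highest count first; ties broken by station name for a stable result.
        entries.sort(key=lambda e: (-e[0], e[1]))
        kept[slot] = [station for _, station in entries[:n]]
    return kept


if __name__ == "__main__":
    for slot, stations in sorted(top_n_per_slot(ROWS, 2).items()):
        print(slot, stations)
```

The grouping key here is the time slot, so a show with one oddball affiliate at an unusual hour keeps that affiliate no matter how small N is.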

This probably will happen in a day or two. This is your last chance to provide your input so I can ignore it.

17 Responses to The huge coming “data purge”

  1. Art Stone says:

    Here is what I’m proposing to retain for Rush Limbaugh – the rest will vanish from the visible web site, including the volunteer to do list pages

    +------------+------------+----------+-----------+--------+-----------+----------+
    | streamname | lstarttime | lendtime | dayofweek | lcount | timecount | share    |
    +------------+------------+----------+-----------+--------+-----------+----------+
    | WCOA AM    |          0 |      100 | MTWTFXX   |      3 |         3 | 100.0000 |
    | WTVN AM/HD |          0 |      100 | XTWTFSX   |      3 |         3 | 100.0000 |
    | WTVN AM/HD |          0 |      500 | XTWTFXX   |    447 |       447 | 100.0000 |
    | WTVN AM/HD |        400 |      500 | XTWTFSX   |     11 |        11 | 100.0000 |
    | KMOX AM/HD |       1200 |     1300 | MTWTFXX   |      7 |        15 |  46.6667 |
    | KWON AM    |       1200 |     1300 | MTWTFXX   |      2 |        15 |  13.3333 |
    | KLBJ AM    |       1200 |     1300 | MTWTFXX   |      2 |        15 |  13.3333 |
    | KFYR AM    |       1200 |     1300 | MTWTFXX   |      2 |        15 |  13.3333 |
    | WTAW AM/HD |       1200 |     1300 | MTWTFXX   |      1 |        15 |   6.6667 |
    | KURV AM    |       1200 |     1300 | MTWTFXX   |      1 |        15 |   6.6667 |
    | WFOY AM    |       1200 |     1400 | MTWTFXX   |     42 |        42 | 100.0000 |
    | WRVA AM/HD |       1200 |     1500 | MTWTFXX   |   1563 |     20115 |   7.7703 |
    | WAAX AM    |       1200 |     1500 | MTWTFXX   |   1395 |     20115 |   6.9351 |
    | KTSM AM/HD |       1200 |     1500 | MTWTFXX   |    836 |     20115 |   4.1561 |
    | KALZ FM/HD |       1200 |     1500 | MTWTFXX   |    819 |     20115 |   4.0716 |
    | WISN AM/HD |       1300 |     1600 | MTWTFXX   |    706 |      1227 |  57.5387 |
    | KENI AM/HD |       1300 |     1600 | MTWTFXX   |    272 |      1227 |  22.1679 |
    | KFBX AM    |       1300 |     1600 | MTWTFXX   |    249 |      1227 |  20.2934 |
    | WOC AM     |       1400 |     1500 | MTWTFXX   |      2 |         2 | 100.0000 |
    | KFYR AM    |       1400 |     1600 | MTWTFXX   |    108 |       186 |  58.0645 |
    | KWON AM    |       1400 |     1600 | MTWTFXX   |     78 |       186 |  41.9355 |
    | WMT AM/HD  |       1400 |     1700 | MTWTFXX   |    741 |      1888 |  39.2479 |
    | WHO AM/HD  |       1400 |     1700 | MTWTFXX   |    563 |      1888 |  29.8199 |
    | WJAG AM    |       1400 |     1700 | MTWTFXX   |    524 |      1888 |  27.7542 |
    | KAJO AM    |       1600 |     1900 | MTWTFXX   |    519 |       519 | 100.0000 |
    | WKVP FM    |       2100 |     2400 | MTWTFXX   |     12 |        12 | 100.0000 |
    +------------+------------+----------+-----------+--------+-----------+----------+
    

    The criterion used was that a station receiving less than 4% of the traffic for its time period gets dropped. Where there is only one station (like WKVP-FM) carrying the show in a time period, it has a 100% share and won’t go away.
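The arithmetic behind the 4% rule can be sketched in a few lines of Python. This is an illustration, not the site's actual query; `apply_share_cutoff` is a made-up name. The three station counts for the 1400-1700 slot come from the table above, and the 60-listen remainder is simply the slot total (1888) minus those three, lumped together because the dropped stations are not named.

```python
def apply_share_cutoff(counts, cutoff_pct=4.0):
    """Return {station: share %} for stations at or above the cutoff.

    counts maps station name to listen count within ONE time slot.
    Hypothetical helper for illustration only.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    kept = {}
    for station, count in counts.items():
        share = 100.0 * count / total
        if share >= cutoff_pct:
            kept[station] = round(share, 4)
    return kept


# 1400-1700 ET slot from the table above.
SLOT_1400_1700 = {
    "WMT AM/HD": 741,
    "WHO AM/HD": 563,
    "WJAG AM": 524,
    # Remainder of the 1888 total; the dropped station(s) are not named.
    "(unnamed remainder)": 60,
}

if __name__ == "__main__":
    print(apply_share_cutoff(SLOT_1400_1700))
    # A lone station in its slot always has a 100% share and survives.
    print(apply_share_cutoff({"WKVP FM": 12}))
```

Even if the whole remainder belonged to a single station, it would sit a little above 3% and still miss the cutoff.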

    For the live broadcast, that will leave WRVA, WAAX, KTSM and KALZ. Testing those 4 once a month will take about 2 minutes.

    • Art Stone says:

      Stations that carry less than the entire show get a reprieve which probably isn’t justified – for example, KMOX in Saint Louis does actually carry the entire show, but is only listed in the database as carrying the first hour. Once 95% of the trees are clear cut, then I can focus on fixing the rest.

  2. Art Stone says:

    And in conclusion, here is the purge summary for the shows that will lose more than 3 redundant affiliates:

    +-----------+--------------------------------+------------------------------+
    | redundant | Show                           | host                         |
    +-----------+--------------------------------+------------------------------+
    |       235 | Rush Limbaugh                  |                              |
    |       131 | Sean Hannity                   |                              |
    |       117 | Glenn Beck                     | Pat Gray & Stu               |
    |       105 | Coast to Coast AM              | George Noory                 |
    |        75 | Mark Levin                     |                              |
    |        59 | Live on Sunday Night           | Bill Cunningham              |
    |        55 | Michael Savage                 |                              |
    |        46 | Moneytalk                      | Bob Brinker                  |
    |        33 | Red Eye Radio                  | Gary McNamara & Eric Harley  |
    |        32 | Dave Ramsey                    |                              |
    |        31 | Morning in America             | Bill Bennett                 |
    |        27 | Laura Ingraham                 |                              |
    |        26 | America Now                    | Andy Dean                    |
    |        25 | Ground Zero                    | Clyde Lewis                  |
    |        20 | John Batchelor                 |                              |
    |        19 | Dennis Miller                  |                              |
    |        16 | Dennis Prager                  |                              |
    |        15 | Coast to Coast Monday Morning  | George Knapp                 |
    |        13 | Coast to Coast Sunday Morning  |                              |
    |        12 | Michael Medved                 |                              |
    |        10 | Hugh Hewitt                    |                              |
    |         9 | Somewhere in Time              | Art Bell                     |
    |         9 | Jim Bohannon At Night          |                              |
    |         7 | Kim Komando on Computers       |                              |
    |         7 | Herman Cain (National)         |                              |
    |         6 | Jerry Doyle                    |                              |
    |         6 | Jim Rome (CBS)                 |                              |
    |         6 | Rush Limbaugh's Week in Review |                              |
    |         5 | Coast to Coast Rewind          | George Noory                 |
    |         5 | Mike Gallagher                 |                              |
    |         4 | Bob & Tom                      |                              |
    +-----------+--------------------------------+------------------------------+
    
    

    The flag is set; the only thing left to do is to start paying attention to it.

    Very much related to my decision is the lack of participation in checking the stations. Here is the list for July, keeping in mind that CharlotteNC is me.

    +---------------+--------+
    | username      | vcount |
    +---------------+--------+
    | CharlotteNC   |   1225 |
    | editor        |    679 |
    | Simmons       |     94 |
    | janderson021  |     12 |
    | salubrious    |      8 |
    | briand75      |      6 |
    | florida fresh |      2 |
    | WesternMA     |      2 |
    +---------------+--------+
    

    Your influence on my decision is directly proportional to your presence on that list.

  3. I know I checked many more than 12 stations in July — maybe there is a change in the way checking is done and I don’t know about it. I thought I read Art’s posts carefully but maybe I missed something.

  4. Aha! I was checking streams as usual (except for clicking on the link to the station web site and finding the player rather than the now-deposed listen button).

    I have now found the “needs review” page and will try to contribute there. Have I got it right that you want us to use that to check for “authentic” station web sites?

    • Art Stone says:

      You have that correct. This “purge” is the reason that I’ve been pushing people away from testing schedules or streams (for now) as most of that data was going away soon.

      You won’t find news/talk stations left on that list – I took care of them first.

      You also won’t find most stations in North Carolina, Idaho and most Classical Music stations, which are the areas that I’ve been hitting. I would encourage you to focus on those things you know well by experience – stations in your state, stations of the format that you’re familiar with. One of my challenges as a non-rock music person is that many radio stations don’t identify their genre; they just put pictures of rock stars/groups on their page. It’s hard enough to guess even if you know who the artist is. What format is Elton John?

      You have the power to update the URLs and station formats and some other things, but do that with caution. If you suspect it is wrong / out of date and you’re in doubt what to do – just click ignore, and that will push it out of your pile and onto mine.

      For probably 2/3 of stations, it’s just a look at these things:
      – Does it appear to be the web page for the radio station?
      – Does the frequency and/or URL seem to match?
      – Is there a recent call sign change?
      – Is the station format still reasonably accurate?
      – Has the page been updated recently? (Recent “News” means nothing as that is coming from a third party feed)

      If it all lines up, click on “Clear”, “Complete”, or “Accurate”, depending on which web page you’re looking at. Clearing the item removes the “Potentially unsafe” warning for that station for 6 months.

  5. Art Stone says:

    The pages are now putting red *** redundant *** warnings on what is about to vanish.

    For those who are curious or wish to nitpick, the schedule data is not technically going away – it is just flagged so that it won’t be shown and is excluded from being maintained. I might set up some kind of appeals process eventually, but the rationale has to be something stronger than “That’s the one I want to listen to!”

    For those who haven’t noticed, on the “What’s on now” pages, you can change the “View Time” to look at other parts of the day – so if you wanted to see which stations Michael Savage loses, change the View Time to 3 PM (or adjust it for your time zone if not Eastern).

    • Art Stone says:

      I’m mulling over the idea of letting people choose the stations that appear on the list using their SRGuide volunteer points :). My concern, though, is that as soon as I create an actual reason to have points, people will then focus on earning points as fast as possible rather than editing the data as accurately as possible.

  6. Art Stone says:

    A big part of the wasted effort is that the (supposed to be) abandoned volunteer pages were waaaaay too optimistic. The default was to show you things that hadn’t been checked in 15 days.

    The result was the people still using the page were testing the same stations over and over on day 16 – while never getting to the ones that hadn’t been tested in 180 days, which just grew older and staler.

  7. Art Stone says:

    Here is the beginning of a page to examine the “redundancy” criteria…

    http://streamingradioguide.com/adm/redundant.php?showid=422

    It doesn’t yet have a search front end – this is the page for Rush Limbaugh – if you’re clever enough to know the showid for other programs, you can make it cough those up….

    One oversight this pointed out was that if a show had never been listened to on a station in this year, it was getting a free pass.

  8. janderson says:

    Are you planning to leave the ‘all streams’ link available after the department of redundancy department purge?

    • Art Stone says:

      Sure – but the redundant links will go away – I guess I might need to change the wording. I already get an occasional email saying “but you left out…”

    • Art Stone says:

      If you go to the All Streams page, now it only gives you the non-redundant links – but if you click the existing “Show me every station we ever knew about”, it will include the redundant ones.

      What I don’t want is people seeing the “redundant” links and then costing me hours a day defending the fact that I’m no longer maintaining them. If I get too much pushback, they’ll just go away completely.

  9. Art Stone says:

    One problem which is not going to be solved is dealing with shows that are split over the end of day in the time zone where the station is located. The best example of this would be a station in Pacific Time that carries the Red Eye Trucker show live. Red Eye Radio is live Eastern Time 1-5 AM Monday through Friday mornings (Saturday morning and Sunday morning shows are not live). What that means is an affiliate in California carries the show starting live at 10 PM on Sunday night. To keep the database from getting insanely complicated, I have to split the show into two records breaking at midnight local time, but the times themselves are kept in Eastern Time. Confused yet?

    That means the record for a station on Pacific Time like this carrying only the live show – is on
    “MTWTXXS” 100-300 (ET) [10pm-midnight Pacific]
    and
    “MTWTFXX” 300-500 (ET) [Midnight-2AM Pacific]
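For readers trying to follow the split, here is a small Python sketch that reproduces the two records above. It is an illustration under simplifying assumptions, not the database's actual logic: the function names are invented, the station is assumed to be a whole number of hours behind Eastern (3 for Pacific), the entry is assumed not to cross midnight Eastern, and Daylight Saving Time is ignored entirely.

```python
DAY_LETTERS = "MTWTFSS"  # Monday..Sunday, matching the dayofweek column


def shift_days_back(mask):
    """Move every flagged day one day earlier (Monday wraps to Sunday)."""
    return "".join(
        DAY_LETTERS[i] if mask[(i + 1) % 7] != "X" else "X"
        for i in range(7)
    )


def split_at_local_midnight(start_et, end_et, mask_et, hours_behind_et):
    """Split an Eastern Time entry where it crosses midnight local time.

    Times are HHMM integers in Eastern Time. mask_et flags the days the
    show airs as seen from Eastern Time. Returns a list of
    (local-day mask, start ET, end ET) tuples.
    Hypothetical helper; whole-hour offsets and no DST are assumed.
    """
    midnight_et = hours_behind_et * 100  # local midnight expressed in ET
    if end_et <= midnight_et:
        # Entirely before local midnight: still the previous day locally.
        return [(shift_days_back(mask_et), start_et, end_et)]
    if start_et >= midnight_et:
        # Entirely after local midnight: same day locally as in Eastern.
        return [(mask_et, start_et, end_et)]
    return [
        (shift_days_back(mask_et), start_et, midnight_et),
        (mask_et, midnight_et, end_et),
    ]


if __name__ == "__main__":
    # Red Eye Radio live, 1-5 AM ET Monday-Friday, on a Pacific station.
    for record in split_at_local_midnight(100, 500, "MTWTFXX", 3):
        print(record)
```

The part before local midnight lands on the previous day locally, which is why its day mask shifts from Monday-Friday to Sunday-Thursday.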

    And don’t get me started on Daylight Saving Time stations in Arizona.

    And if your instinct on reading the above is to write me an email telling me your easy solution to the problem (“Why don’t you just…”), click the Delete Message button now. I sure will if you don’t.

  10. Linda S. says:

    Sorry, Fred.
    I am totally confused. Guess I just need to stay away from trying to help… Because, I know not what I should do. Therefore, I am of no use! SAD!!

    • Art Stone says:

      Here are the key things to know:

      1) Testing streams ended about 3 months ago – the player links are no longer maintained. The *only* thing I care about (and not very much) is whether the station streams or not.
      2) As of this week, about 1/3 of all the program schedule entries have been removed, mostly the redundant links. Those are the ones that were consuming 95% of the testing time.
      3) There are about 8000 radio stations currently showing the “This web site might be unsafe” warning because it has been at least 6 months since I had time to check them. Many of them are over 2 years old and many are wrong because of ownership changes and format changes.

      The only thing remaining that people might be helpful with is confirming those 8000 stations *carefully*, but only if you have adequate virus protection and have backed up your critical data. I can’t guarantee your computer will not be hijacked. That’s why they are flagged as potentially unsafe.

      I’ve changed the old pages to just automatically redirect people to the “Needs Review” page. With most of the redundant program schedules removed, I no longer need help maintaining that information.

      Some of you may have noticed that I no longer list “News/Talk” as the first format. That’s not an accident. I’m done with listening to it other than maybe 2 or 3 hours a month to test the remaining schedule items. I’ve moved on.

      I’m sorry to hear you are sad. I hope you recover soon.
