hi there!

I see you've stumbled on to my humble home on the net, Drive:Activated. My name's Sam, I'm an ambitious and driven uni student, residing in Melbourne, Australia, wanting to make my mark on our world. This is my site, which is mainly just my blog and some other bits. There's no definite theme to my blog, just anything that interests me, and currently that's web trends, startups, ideas and cool stuff. Check it out, leave me a comment, click on 'Who is this?' to find out more about me, or drop me a line by clicking on 'Let's Talk'. Hope you enjoy it!


The reality of clean-feed internet filtering


This isn't the way I'd like to start the new year here, but it's creating a storm and it's something that has the potential to affect everyone as internet users.

For those who have been out partying it up over the new year and haven't yet caught up with Australian news:

Senator Conroy [Federal Telecommunications Minister] says it will be mandatory for all internet service providers to provide clean feeds, or ISP filtering, to houses and schools that are free of pornography and inappropriate material.

"If people equate freedom of speech with watching child pornography, then the Rudd-Labor Government is going to disagree."

Senator Conroy says anyone wanting uncensored access to the internet will have to opt out of the service.

http://www.abc.net.au/news/stories/2007/12/31/2129471.htm

Australian bloggers are up in arms. Some completely disagree with the proposal and think we're on the slippery slope of internet censorship, while others are commending the government for taking such a strong stance against undesirable content on the internet.

Let's take a moment and step away from the censorship argument. I'm not going to tackle that because it has been done to death, and there is simply not enough information to seriously discuss that scenario without speculation.

Instead, I'm going to run through the reality of a clean-feed solution.

In the real world, we have classification in Australia for TV shows, movies, magazines, games, music and other types of media, provided by the Classification Board. Depending on the type of media and its usage, classification may be required. It is performed by the Classification Board, and disputes can be taken to the Classification Review Board. Anything that needs to be classified and has not been, or has been refused classification, cannot be sold.

Here are some statistics on what they classified, 2006-2007:

  • 214 publications
  • 402 films for public exhibition
  • 4,555 videos or DVDs for sale or hire
  • 890 computer games
  • 28 Australian Communications and Media Authority Internet referrals, and
  • 134 enforcement referrals

http://www.classification.gov.au/special.html?n=250&p=58

The internet, however, is generally unclassified. So it would make sense that we classify it, so we can block illegal content, and prevent questionable material from getting into the hands of those who shouldn't be viewing it, right? After all, no one can argue that people should be able to access child porn, or that 10-year-old kids should be able to watch hardcore porn videos.

Thing is, the internet is unlike any communication medium we have had before. Anyone can contribute content, wherever, whoever, whenever they may be, and that content is available to anyone in the world. There are no hard statistics on the amount of content on the internet due to the distributed and dynamic nature of it, but Netcraft, an internet monitoring company, recently counted 155,230,051 active websites (not pages) as of December 2007. I'd say that's a fairly conservative figure, and growing significantly by the second.

Keeping that in mind, let's look at classification and filtering. There are 4 ways of doing it.

Whitelisting

This is when the filter has a list of good sites, and anything that is not in that list is blocked.

This is how classification in the offline world works right now - they assess material that requires classification, and if it is refused classification, it is banned. It works because every year, there are generally only about 5000 or so items that need to be classified, and once material is classified, that material never changes.

Internet 'material' however is completely different.

To start with, there's the sheer enormity. 17 Classification Board members classify 5000 items a year currently. If we assume there are 155,230,051 public websites out there, and all need classification because they can be accessed by anyone, using the current classification workload figures, we would need 527,782 board members to classify them all. Imagine trying to keep 527,782 people consistent, and the bureaucracy that gets created. Also, any new sites that just popped up will be blocked, until the classifiers get around to it - say goodbye to access to the latest web apps, information, content.
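To make that arithmetic easy to check, here's a minimal sketch in Python. The three figures come straight from the paragraph above; the function name is just something I made up for illustration, and it assumes every site needs classifying exactly once a year at the board's current pace.

```python
# Figures quoted in the post; everything else is derived from them.
BOARD_MEMBERS = 17          # current Classification Board members
ITEMS_PER_YEAR = 5_000      # rough number of items they classify each year
WEBSITES = 155_230_051      # Netcraft's count, December 2007


def members_needed(websites, members=BOARD_MEMBERS, items=ITEMS_PER_YEAR):
    """Board members needed to classify every website once in a year.

    Each member handles items / members items a year (about 294), so the
    headcount is websites / (items / members), rounded down.
    """
    return websites * members // items


print(members_needed(WEBSITES))  # 527782
```

And that's the generous version: it ignores reclassifying sites when they change, which is the next problem.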

Then there's the dynamic nature of internet content. A web page does not stay constant once it has been published, unlike a book or a movie. It can be edited (or hacked) at any time to contain anything. Or it can dynamically draw content in from other websites. Therefore classification is effectively useless because by the time a site has been classified, the content has changed so much it probably needs reclassification. The advent of enormous amounts of user generated content makes this even more ridiculous - try classifying every photo on Flickr, a popular photo sharing service that has some risque shots in there, all the while thousands of new photos are being uploaded every hour.

Of course, we could just permit a small selection of websites, but that amounts to censorship - why does website X get permitted, but website Y doesn't? It still doesn't tackle the dynamic nature of the internet, and it removes one of the best things about the internet - you can find information on basically anything you want. Have a look at these stats from the UK's clean-feed service: http://www.cleanfeed.co.uk/catstats.php - they have only classified 9 million of the 155 million active websites out there.

Blacklisting

This is the opposite of whitelisting - if the site you're trying to access is on the list, then you are prohibited from accessing it; sites not on the list are allowed.
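The difference between the two lists really comes down to what happens to a site nobody has reviewed yet. Here's a rough sketch in Python; the function names and the `.example` hostnames are hypothetical, and a real filter would obviously be far more involved than a set lookup.

```python
def whitelist_allows(host, good_sites):
    """Whitelist: only sites that are on the list get through."""
    return host in good_sites


def blacklist_allows(host, bad_sites):
    """Blacklist: everything gets through unless it is on the list."""
    return host not in bad_sites


# Hypothetical lists, for illustration only.
good_sites = {"www.abc.net.au"}
bad_sites = {"banned-site.example"}

# A site that popped up this morning and has not been reviewed by anyone.
new_site = "brand-new-webapp.example"

print(whitelist_allows(new_site, good_sites))  # False
print(blacklist_allows(new_site, bad_sites))   # True
```

So the whitelist blocks the unreviewed site by default, and the blacklist lets it through by default. Either way, the list is only as good as the reviewing behind it.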

This approach suffers from similar issues to whitelisting. In order for the blacklist to be effective, every site would need to be reviewed, or at least a large proportion because otherwise you would easily be able to go to another site for content that was blocked. It would need to be constantly updated as well, because of the dynamic nature of the internet.

In addition, it is near useless against internet proxy servers. Internet proxy servers are similar to a proxy in real life - they act on behalf of someone else. There are thousands of proxy servers available on the internet for such functions, with many changing on a day-to-day basis. Blacklisting is ineffective against proxy servers because using a proxy hides the real request.

For example, let's say www.porn.com was blacklisted. If I try to access www.porn.com I would be blocked. If I used a proxy however, I would instead be accessing www.proxyserver.com and telling it that I wanted to access www.porn.com. The filter would think I was accessing www.proxyserver.com and let me through.
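Here's that example as a small Python sketch. It assumes the filter only looks at the hostname the request is sent to, and the `/fetch?url=` format of the proxy address is made up for illustration - real proxies each have their own way of being told where to go.

```python
from urllib.parse import urlsplit

BLACKLIST = {"www.porn.com"}


def filter_allows(url):
    """Hostname-only blacklist: checks where the request is sent, nothing else."""
    return urlsplit(url).hostname not in BLACKLIST


direct = "http://www.porn.com/"
# Hypothetical proxy URL format: the real destination is tucked into the query string.
via_proxy = "http://www.proxyserver.com/fetch?url=http://www.porn.com/"

print(filter_allows(direct))     # False - blocked
print(filter_allows(via_proxy))  # True - the filter only sees www.proxyserver.com
```

You could make the filter dig through the query string too, but then the proxy just needs to scramble or encrypt the address, and we're back where we started.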

Sure, you can blacklist proxy servers, but with so many popping up, and with distributed proxy networks like Tor where anyone can be a proxy for anyone else, it's a futile task. And some 'proxy servers' are actually useful - for example, you can turn Google Translate into a proxy server by telling it you want to translate a particular English web page from Spanish to English. It looks for Spanish words in the page, but finds none because the page is in English, and returns the page to you unchanged, all the while the filter thinks you're just accessing Google Translate.

Content filtering

Instead of having humans classify every website, this approach allows classifiers to set particular bits of content to watch for, and if they exist or enough triggers exist, the website is blocked. The content may be certain words or phrases, or certain types of images (e.g. ones with significant amount of area in a skin tone colour). So instead of humans doing the filtering, computers are.

The problem with this approach is that computers are dumb. They see the world in black and white, when it is in fact grey. They do not understand what they're filtering, and hence they miss any contextual significance. For example, is an image with significant amounts of skin tone a pornographic image, or an image of a medical condition? Or is it a web page with instructions on how to build bombs, or a web page on the chemical reactions of particular substances? 
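To show how easily this goes wrong, here's a deliberately naive keyword filter in Python. The trigger words, the threshold and the sample text are all made up by me, and commercial filters are certainly more sophisticated than this, but the underlying problem of counting words without understanding them is the same.

```python
import re

# Hypothetical trigger list and threshold, for illustration only.
TRIGGER_WORDS = {"breast", "naked", "explosive"}
THRESHOLD = 2  # block the page once this many trigger words are found


def is_blocked(text, triggers=TRIGGER_WORDS, threshold=THRESHOLD):
    """Count trigger words in the page text; block if there are enough of them."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for word in words if word in triggers)
    return hits >= threshold


medical_page = "Breast cancer screening: what to expect at a breast examination."
chemistry_page = "Why some compounds are explosive: an explosive reaction releases gas very quickly."
cooking_page = "A simple recipe for roast chicken with lemon and thyme."

print(is_blocked(medical_page))    # True - a health page gets blocked
print(is_blocked(chemistry_page))  # True - so does a chemistry lesson
print(is_blocked(cooking_page))    # False
```

The filter has no idea the first two pages are harmless. It just sees the words.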

However, this is the only approach that is at all feasible when it comes to filtering the internet, because it is the only one that can handle the enormity of the internet. There is a lot of research going on to teach computers to understand human languages, and to develop algorithms that are smart enough to deduce the meaning of documents, images, videos and other media. But it's a long road, and there's still a lot to be done before it is useful. And remember, we don't stand still either - our language and way of expressing ourselves is constantly changing.

It's worth noting that encrypted websites are immune to this type of filtering.

A combination of the above

Most internet filtering systems use a combination of the above approaches. But as you have just seen, none of them are in any way effective on their own, so combining 3 systems that don't work doesn't result in one that does.

So, now what?

You may be tempted to say, so what? At least we're doing something to stop our kids from accessing porn and violence, or the propagation of child pornography.

Stop kidding yourself. The enormity and dynamic nature of the internet means classifying and filtering it effectively isn't just hard. It's impossible. If you think a filter will be effective in protecting your kids from porn, or stopping people from accessing illegal material, think again - proxies, secure tunnels, Google, hacks. And even if you don't think your kids or next door neighbours are smart enough to work it out, they'll know people who are.

Think it through thoroughly before forming your opinion. Go to Google and do a search on the most random thing you can think of, then do one on the most common thing you can think of, and consider the effort needed to classify that, and everything in between. Then get your kids to show you Facebook, or sign up yourself, and see all the avenues through which content can be contributed and changed.

Internet filtering is a blunt instrument. Consistently banging that hammer on the internet will hurt those who legitimately use the internet much more than those who use it for objectionable purposes. This is amplified due to the opt-out nature of the proposed internet filtering - we all know few would be bothered opting out due to laziness, and/or the implication that because they're opting out, they want to access child porn.

Think of the implications for someone who earns a living from their online store, yet it is suddenly blocked a few weeks before Christmas because of a questionable comment a user made on a product. That person could potentially miss out on the entire Christmas period, depending on how fast the classifiers re-review their site. Their name may also be forever tarnished, as users find out the site has been blocked, implying it contains pornographic material. On the other hand, if you block one child porn site, it'll just respawn under a different address and away we go again.

Filters may form part of the solution, but they aren't a silver bullet. There is no way to control all content online the way we do offline. The fact that anyone can post content on to the internet is both the internet's strength and weakness. It revolutionised the way we communicate, with the ability to broadcast our thoughts, views, information and content to anyone in the world who cares to look at it. And it will become more and more important as we explore and adopt ways of using it to make our lives easier. Yet on the flipside, we are finding objectionable content is more freely available, as those propagating such content exploit the internet as well.

The internet is a new medium, unlike any other. We need to treat it as such, stop applying ideas that worked in other mediums, and think of new, radical, innovative ideas that cater for the internet's unique abilities. We need to educate the public, and tackle the ignorance people have about the internet, the ignorance that political groups and companies with vested interests are exploiting so well, the ignorance that will bite them when they realise they have been tricked.

If you're on Facebook, join this group and let's work out a solution that actually works.
