Channel: Planet Ubuntu

Paul Tagliamonte: Some really rough planet.ubuntu stats

After waking up today, I said to myself: “Hey. Today seems like a good day to run useless statistics that may just be totally off base.”

Well. Here’s what I did.

I got a copy of the planet.ubuntu config file, and started to work through it. First pass on the script was to yank all the URLs out.

$ HOSTS=`cat config.ini | grep -v "^#.*" | grep "\[.*\]" | tr -d "[" | tr -d "]"`; for x in $HOSTS; do echo $x >> hostnames; done

Fancy. Now, let’s see how many lines we have:

$ cat hostnames | wc -l
404

badass. This is already looking great. That, and it’s funny.
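
A slightly tidier sketch of the same extraction, assuming the config is INI-style and every feed URL sits on its own line as a [section] header. The function name extract_hosts is made up for this example and is not part of the original script.

```shell
# Hypothetical helper: print the text inside each [section] header of an
# INI-style config file, one per line. Lines starting with '#' are skipped.
extract_hosts() {
    grep -v '^#' "$1" | sed -n 's/^\[\(.*\)].*$/\1/p'
}

# Example (writes nothing unless uncommented):
# extract_hosts config.ini > hostnames
```

Redirecting once with > instead of appending in a loop also means a second run does not double the file.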

Now that I have a list of the hosts, I wanted to see how many servers self-identified.

$ HOSTS=`cat hostnames`; for x in $HOSTS; do ID=`curl $x | grep "\<generator\>"`; if [ "x$ID" != "x" ]; then echo "$x $ID" >> positive-ids; fi; done;

After that finished, I checked the result

$ cat positive-ids  | wc -l
253

This is a really really bad way of doing it. I never said it was pretty. More on this later.

Next, I wanted an overview of how many dead hosts there are. Ping won’t work for this ( filtering ping is not only normal, but a good idea ), so I used curl ( again ).

$ HOSTS=`cat hostnames`; for x in $HOSTS; do curl $x > /dev/null; if [ $? -ne 0 ]; then echo "$x" >> errord; fi; done

Well, that ran, the output looked good, so I took a look at it

$ cat errord | wc -l
11

Great.
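
For comparison, here is a rough sketch of a stricter dead-host check. It is not what produced the number above: it assumes that an HTTP error page ( 404, 500 and so on ) should also count as dead, which plain curl does not do, and it adds a timeout so one hung server cannot stall the loop. The names is_alive and dead_hosts are invented for the example.

```shell
# Hypothetical helper: succeed only if the URL answers without an error.
#   -s              no progress meter
#   -f              HTTP status >= 400 becomes a failing exit code
#   -L              follow redirects
#   --max-time 10   give up on a host after ten seconds
is_alive() {
    curl -s -f -L --max-time 10 -o /dev/null "$1"
}

# Print every URL listed in the given file that fails the check.
dead_hosts() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        is_alive "$url" || printf '%s\n' "$url"
    done < "$1"
}

# Example (hits the network, so it is left commented out):
# dead_hosts hostnames > errord
```

Reading the file line by line also avoids the word-splitting that the backtick loop relies on.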

Now, let’s go back to the positive-ids. I extracted the data from the tags using a bit of sed-voodoo.

$ sed -n -e 's/.*<generator[^>]*>\(.*\)<\/generator>.*/\1/p' ./positive-ids > blog-engines

Now, I have a file full of all the blog engines ( or homebrew software ). So, I did a quick check on it.

$ cat blog-engines | wc -l
144

Careful readers will point out that this number is less than the count of my positive ids. Yes, you’re right. My script snagged newlines. As a result, positive-ids has a few lines that are “runover” from the line before rather than a new match. The blog-engines output is still good.
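
One way to sidestep the runover lines, sketched under the assumption that each feed carries at most one generator tag: flatten the whole feed to a single line before extracting, so every host yields at most one line of output. feed_generator is a name invented for this example.

```shell
# Hypothetical helper: read one feed on stdin and print the text of its
# generator tag. Newlines are flattened to spaces first, so a tag whose
# attributes are split across several lines still matches.
feed_generator() {
    { tr '\n' ' '; echo; } |
        sed -n 's/.*<generator[^>]*>\([^<]*\)<\/generator>.*/\1/p'
}

# Example (hits the network, so it is left commented out):
# curl -s "$x" | feed_generator
```

A feed without a generator tag prints nothing, which keeps the "did it self-identify" test as simple as checking for empty output.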

So, now. Let’s figure out what the most popular feed generator is.

$ cat blog-engines | sort | uniq -c | sort -n -r > counts

And the results? Well, I’m getting there!

 42 http://wordpress.org/?v=3.0.1
 22 http://wordpress.com/
 21 http://wordpress.org/?v=2.9.2
  8 http://wordpress.org/?v=3.0.2
  5 LiveJournal / LiveJournal.com
  5 http://wordpress.org/?v=2.8.4
  5 Blogger
  4 http://wordpress.org/?v=3.0
  3 Dotclear
  2 Serendipity 1.5.4 - http://www.s9y.org/
  2 mod_virgule
  2 http://wordpress.org/?v=abc
  2 http://wordpress.org/?v=2.9.1
  2 http://pipes.yahoo.com/pipes/
  2 Apache Roller (incubating) 4.0.1 (20090102102238:dave)
  1 TYPO3 - get.content.right
  1 Tumblr (3.0; @schwuk)
  1 Tumblr (3.0; @paultag)
  1 Tumblr (3.0; @castrojo)
  1 Tumblr (3.0; @bholtsclaw)
  1 Serendipity 1.2 - http://www.s9y.org/
  1 PyBlosxom http://pyblosxom.sourceforge.net/ 1.3.2 2/13/2006
  1 http://wordpress.org/?v=3.1-beta1-16590
  1 http://wordpress.org/?v=3.1-alpha
  1 http://wordpress.org/?v=2.9
  1 http://wordpress.org/?v=2.8.6
  1 http://wordpress.org/?v=2.7.1
  1 http://wordpress.org/?v=2.6.5
  1 http://wordpress.org/?v=2.5.1
  1 http://wordpress.org/?v=2.2.2
  1 blosxom 2.1.2+dev
  1 blosxom/2.1.2

So, remember. There are 404 total blogs. Let’s come up with some statistics!

Results!

The totals ( combined )

Wordpress:   114
LiveJournal: 5
Blogger:     5
Tumblr:      4
Dotclear:    3
blosxom:     3
Serendipity: 3
Other:       5
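
The combining step can be sketched roughly like this, assuming the input is the raw blog-engines file ( one generator string per line ). The function name engine_totals and the matching rules are assumptions made for the example. In particular, it lumps everything unrecognised into “Other” and makes no call on whether a string is a real blog engine, so its “Other” count need not match the table above.

```shell
# Hypothetical helper: collapse raw generator strings on stdin into engine
# families and print "count family" lines, most common first.
engine_totals() {
    awk '
        NF == 0 { next }                       # ignore blank lines
        { line = tolower($0) }
        line ~ /wordpress/   { n["Wordpress"]++;   next }
        line ~ /livejournal/ { n["LiveJournal"]++; next }
        line ~ /blogger/     { n["Blogger"]++;     next }
        line ~ /tumblr/      { n["Tumblr"]++;      next }
        line ~ /dotclear/    { n["Dotclear"]++;    next }
        line ~ /blosxom/     { n["blosxom"]++;     next }
        line ~ /serendipity/ { n["Serendipity"]++; next }
        { n["Other"]++ }
        END { for (k in n) print n[k], k }
    ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Example:
# engine_totals < blog-engines
```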

Percentage-wise:

Of all the blogs, 28.2% reported that they run Wordpress. LiveJournal, Blogger, and the other engines not named in this paragraph ( combined ) each power roughly 1% of the blogs on planet.ubuntu. Tumblr has about 1% as well. Dotclear, blosxom and Serendipity each power 0.7% of the blogs on planet.ubuntu.

Out of all of the blogs that reported, 80% run Wordpress. LiveJournal, Blogger, and other reporting engines hold 3.5% ( each ). Tumblr holds a solid 2.8% ( myself included ). Dotclear, blosxom and Serendipity hold 2.1% each.
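
The arithmetic behind these figures is just part over whole. A small sketch, using the counts from this post ( 404 blogs in total, 142 that reported, 11 that threw an error ), with pct being a helper name made up for the example:

```shell
# Hypothetical helper: print part as a percentage of whole, one decimal place.
pct() {
    LC_ALL=C awk -v part="$1" -v whole="$2" \
        'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

pct 114 404   # Wordpress, share of all blogs        -> 28.2
pct 114 142   # Wordpress, share of reporting blogs  -> 80.3
pct 142 404   # reporting rate                       -> 35.1
pct 11 404    # hosts that threw an error            -> 2.7
```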

Overview

Wordpress is clearly the most popular blog engine that reported. It powers an astounding 80% of the reporting blogs, which is the same as 28.2% of all blogs on planet.ubuntu. My script only identified the feed generator for 142 ( out of 404 ) of the blogs. That’s a mild 35.1% reporting rate.

LiveJournal and Blogger power roughly 3.5% ( each ) of the reporting blogs, which corresponds to about 1% of the total number of blogs. Sweet.

2.7% of the blogs ( 11 out of 404 ) failed to render a page at the URL set in planet.ubuntu. Most likely they have either been deleted, or their domain has expired. All the hosts that threw an error were on personal domain names.

I did not see any Drupal strings, so I think there must be a bug somewhere in the code.

I plan to re-write this at some point to be a bit more accurate. For now, I think that’s enough work.
