2007-03-29 22:06

Atomise.sh

As explained last time, the PHP script I wrote as a contribution to MPlayer’s website could not be accepted, and I was encouraged to rewrite it in pure BASH. That is indeed what I did, and I include the code for this new version below. In addition, it would be natural to give a link to the output of the script, hosted live on their server, assuming that such output did in fact exist. Unfortunately it does not, and will not, because even my new script was rejected, without so much as a “Thank you for your efforts” or “Sorry you went to so much trouble.” As such there is a bit of a story to tell, which I will tell here.

First there was a response of apathy (again, names changed, parallel conversations removed):

<Hagfish> Qux: did you get the bash script i sent for the website?
<Qux> rss?
<Qux> yes
<Hagfish> great
<Hagfish> what are your feelings on it?
<Qux> to be honest, i have none
<Qux> it’s not something i really care about

then, renewed (and still valid) claims about the limitations of their hardware:

<Qux> i’m hesitant to put anything in place that might increase load on the mirrors
<Hagfish> that’s why you could test it
<Qux> we lost two mirrors shortly ago
<Qux> if i had the time and motivation - yes, i could ..
<Qux> Foo: what about you?
<Hagfish> the feed doesn’t include images or CSS
<Foo> how can we test whether it creates more traffic or not, when we alredy have fluctiations up to 400% ?
<Foo> i’m against it

then a pointless insult by Foo which I won’t repeat or defend myself against, and then an attempt to actually understand the situation, rather than make excuses:

<Foo> well, the only reason why i would consider something like this, if it might decrease the load of the mirrors
<Hagfish> and the burden of prove is on me?
<Hagfish> *proof
<Foo> half
<Foo> show me that it could theoreticaly decrease it
<Hagfish> my theory is that you don’t have to download the images or css files, so it’s less bandwidth, but i need to calculate how often people would hit the server

and finally some counter-theories and counter-counter-theories which didn’t really lead anywhere.

Perhaps I should do more research to try to prove to them that the load increase wouldn’t be so great, but when their own statistics show that the fifth most popular file is a 7 megabyte tar.bz2 file, arguments about file size aren’t going to convince them. The only unknowns are the number of subscribers they would have and how those subscribers’ news clients would be set up. Even if I could somehow tell them that, they would still do a limited trial and watch the change in statistics for a while before fully accepting it.

There is only so much time one should spend trying to persuade an organisation to accept one’s contributions, after which one has to accept that there may be better organisations out there to contribute to. For the record, though, here is the script:

#!/bin/bash
# atomise.sh - converts the MPlayer website's news.src.en into an Atom feed.
#
# Copyright 2007 Hagfish
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License,
# version 2, as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

function outputHead {
cat <<Here-document
<?xml version="1.0" encoding="UTF-8" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<id>http://www.mplayerhq.hu/atomise.xml</id>
<title>MPlayer - The Movie Player</title>
<subtitle>MPlayer News</subtitle>
<link rel="self" type="application/atom+xml" href="http://www.mplayerhq.hu/atomise.xml"/>
<link href="http://www.mplayerhq.hu/design7/news.html" />
<author><email>webmaster@mplayerhq.hu</email><name>Webmaster</name></author>
Here-document
}

function outputMetaData {
    echo "<id>http://www.mplayerhq.hu/atomise.xml/"$isoDate"T00:00:00Z</id>"
    echo "<title>$title</title>"
    echo "<updated>"$isoDate"T00:00:00Z</updated>"
    echo '<link rel="alternate" type="text/html" href="http://www.mplayerhq.hu/mplayernews.html#'$anchor'" />'
    echo "<author><name>$author</name></author>"
}

function gatherMetaData {
    extractDate $i
    extractTitle $i
    extractAnchor $i
    extractAuthor $i
}

function extractDate {
    # try removing the text " :: " from the end of the line
    aDatePlus=${line%% :: *}
    #echo $datePlus
    if [ "$aDatePlus" != "$line" ]
    then
        # the removal has shortened the line, so this line contains the token present on a line with a date
        datePlus=${aDatePlus##*>}
        date=${datePlus%%,*}
        isoDate=${date//./-}
    fi
}

function extractTitle {
    # try removing the text " :: " from the beginning of the line
    titlePlus=${line##* :: }
    #echo $titlePlus
    if [ "$titlePlus" != "$line" ]
    then
        # the removal has shortened the line, so this line contains the token present on a line with a title
        title=${titlePlus%%</a>*}
    fi
}

function extractAnchor {
    # try removing the text " :: " from the end of the line
    aAnchorDate=${line%% :: *}
    #echo $aAnchorDate
    if [ "$aAnchorDate" != "$line" ]
    then
        # the removal has shortened the line, so this line contains the token present on a line with an anchor
        anchorDate=${aAnchorDate##*<a name=?}
        anchor=${anchorDate%%?>*}
    fi
}

function extractAuthor {
    # try removing the text "posted by" from the beginning of the line
    authorPlus=${line##*posted by }
    #echo $authorPlus
    if [ "$authorPlus" != "$line" ]
    then
        # the removal has shortened the line, so this line contains an author
        author=${authorPlus%%</span>}
    fi
}

function encodeEntities {
    # the backslash keeps the "&" in each replacement literal; newer versions
    # of bash otherwise treat a bare "&" there as "the text that matched"
    # note: the first pattern also consumes the character after the ampersand
    noAmps=${line//&[^a]/\&amp;}
    noOpenBracket=${noAmps//</\&lt;}
    noCloseBracket=${noOpenBracket//>/\&gt;}
    line=$noCloseBracket
}

entryNumber=0
cat news.src.en | while read line
do
    if [[ $mode = 'waiting for content' ]]
    then
        if [[ $line = '</h2>' ]]
        then
            # $line is the end of a metadata section
            # start the new entry
            if [ $entryNumber -eq 1 ]
            then
                # before the first entry, there is a head section which contains the date the feed was updated
                outputHead
                echo '<updated>'$isoDate'T00:00:00Z</updated>'
            fi
            echo '<entry>'
            outputMetaData
            echo '<content type="html">'
            #this <div> is to match the uncaught </div> which in the source document closes the <div class="newsentry">
            echo '<div>'
            #Uncomment when the news page uses XHTML: echo '<content type="xhtml" xml:lang="en">'
            #Uncomment when the news page uses XHTML: echo '<div xmlns="http://www.w3.org/1999/xhtml">'
            mode='echoing content'
            continue
        else
            gatherMetaData $line
        fi
    fi
    if [[ $line = '<div class="newsentry">' ]]
    then
        # $line is the start of a new entry
        entryNumber=$(($entryNumber+1))
        # check whether this is the start of the first entry
        if [ $entryNumber -gt 1 ]
        then
            # this is not the first entry
            # close the previous entry
            echo '</content>'
            echo '</entry>'
        fi
        # the next lines are gathering metadata until the content section starts
        mode='waiting for content'
    fi
    if [[ $mode = 'echoing content' ]]
    then
        # this is one of the content lines to echo
        #Minimal content debug mode: echo -n ${line:0:3}
        #When the page is XHTML and the <img /> tags self-close: echo $line
        encodeEntities $line
        echo $line
    fi
done
echo '</content>'
echo '</entry>'
echo '</feed>'

Does anyone else feel like contributing to their project?
Comments
You may notice that the syntax highlighting isn't perfect. This is because the plugin I use for that purpose does not understand BASH and I had to specify Perl instead. I have also noticed that the script is not entirely pure as it uses "cat", but this script could perhaps be altered so that it can be invoked on the command line with
cat news.src.en | atomise.sh
or it could make use of a feature documented in the BASH manual:
The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).
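As a rough sketch of what a cat-free version might look like, the snippet below shows two options: redirecting the file into the loop, and the $(< file) form quoted above. The temporary file and its three sample lines are hypothetical stand-ins for news.src.en, and the variable names lineCount and contents are illustrative only; whether atomise.sh itself would be restructured exactly like this is an assumption.

```shell
#!/bin/bash
# Hypothetical sample input standing in for news.src.en.
tmpfile=$(mktemp)
printf '%s\n' '<div class="newsentry">' '<h2>' '</h2>' > "$tmpfile"

# Option 1: redirect the file into the loop instead of piping from cat.
# The loop then runs in the current shell rather than a subshell, so
# variables set inside it are still visible after "done".
lineCount=0
while IFS= read -r line
do
    lineCount=$((lineCount + 1))
done < "$tmpfile"
echo "lines read: $lineCount"

# Option 2: read the whole file into a variable without cat.
contents=$(< "$tmpfile")
# Strip everything from the first newline onwards to show the first line.
echo "first line: ${contents%%$'\n'*}"

rm -f "$tmpfile"
```

With the redirect form the script would open the file itself; to accept input from a pipe instead, the redirect could simply be dropped so that the loop reads standard input.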