Coding For E2: E2 Syntax (review) by kthejoker Mon Apr 14 2008 at 20:06:38

linked by OldMiner

Hi, and welcome to the second edition of the Coding For E2 guidebook. I'm your host, kthejoker, star of such coding projects as The Costume Shop and that annoying thing where guest users "like" cocaine abuse. Today we'll be talking about some of the specific E2 syntax and code structures that keep this piece of sh fabulous family friendly website of awesomeness afloat.

As stated in the initial article, we'll be using everybody's favorite ubiquitous nodelet Epicenter as our review code. You can view its code by visiting the node directly and appending "&displaytype=viewcode" to the URL or using the "viewcode" link within the Everything Developer nodelet. Let's look at the first 4 lines:

1: [{borgcheck}][%
2:
3: my $isGuest = $$USER{user_id} == $HTMLVARS{guest_user};
4: my $isRoot = htmlcode('isAdmin');

Right away, we've got some great examples of E2-specific syntax:

[{borgcheck}]
E2 uses brackets to invoke all code on the site. It uses [% to invoke pure Perl, and [{ as a sort of syntactic sugar for htmlcodes. An htmlcode is E2's (and any eCore implementation's) equivalent of a library function. That is, it accepts arguments and returns a value. Most commonly an htmlcode returns HTML (hence the name), but some return values that are then used for logic purposes. In simplified form, the call is

[{htmlcodeName:arg1,arg2,arg3...}]

In this case, there are no args, so it's just [{borgcheck}], which checks to see if the USER is borged or not. (Advanced Coding Tip: edev members can view htmlcode code by visiting the htmlcode. Check out borgcheck and see if you can figure out how it works.)
my $isGuest = $$USER{user_id} == $HTMLVARS{guest_user};
From the original primer, can you tell what this is doing? It's assigning the value of whether or not $$USER{user_id} (value of the user_id key of the $USER hash) is equal ("==", remember?) to $HTMLVARS{guest_user} (guest_user key, $HTMLVARS hash) to the variable $isGuest. Both $USER and $HTMLVARS are global supervariables for E2: $USER is the actual user visiting the page (i.e. you!), and $HTMLVARS is a lookup table of common E2 data elements (such as the common node_ids for things like Guest User and the Findings: page). So if the USER's user_id is guest_user's user_id, then $isGuest is set to 1, and we can do things like "if ($isGuest) {don't show them the vote buttons}". Other supervariables include $NODE (the node you're viewing), $VARS (your personal vars, such as your user preferences and ekw themes), and $DB (the database handle.)
my $isRoot = htmlcode('isAdmin');
This is the other form of an htmlcode call, actually invoking the htmlcode() function found in E2's custom HTML perl module. The syntax is htmlcode('htmlcodeName','arg1,arg2,arg3'). (Advanced Coding Tip: Sometimes this syntax can be troublesome, especially if your argument values contain quotes, apostrophes, or commas - the argument list is just passed as a bare string and is exploded within the htmlcode() function, so it's in your best interest to sanitize data before invoking htmlcodes using the data as an argument.)
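
To see why a comma inside an argument causes trouble, here is a minimal sketch. It assumes the argument string is simply cut at each comma, which is what the tip above implies; explode_args is a hypothetical stand-in, not the real internals of htmlcode(), and the argument values are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the splitting step described above:
# the argument list arrives as one bare string and is cut at each comma.
sub explode_args {
    my ($argString) = @_;
    return split /,/, $argString;
}

my @ok  = explode_args('Epicenter,nodelet');
my @bad = explode_args('Hello, World,nodelet');   # the first value contains a comma

print scalar(@ok),  " args\n";   # 2 args
print scalar(@bad), " args\n";   # 3 args, one more than intended
```

The second call was meant to pass two values, but the receiving code would see three, with ' World' showing up as its own argument.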

Let's skip ahead now to line 12:

12: $loginStr = htmlcode('minilogin')."<p align='center'>".linkNode(getNode('Everything2 Help','e2node'), 'Everything2 Help')."</p>";

The relevant part of this line is the linkNode(getNode(...)) call. It contains two of the most important node functions E2 has: linkNode and getNode. getNode, as you can see, returns a node given a title and nodetype. If there is more than one result, it returns an array of the matching nodes. linkNode takes the form linkNode(NODE, linkText, optional hash of parameters) and generates a link from this information. What's nice about linkNode is it can also accept a node_id instead of a NODE reference for its first argument as well as a null second argument, so you can just return a node_id from the database and use linkNode to do the extra fetching of the title - so linkNode(220) will return a link to nate without having to do any more work. It's also nice because if you provide it a node, it links directly to it, whereas using linkNodeTitle or brackets doesn't guarantee there's not a similarly titled node out there that might conflict with your code. In this case, linking directly to Everything2 Help ensures nobody subverts that link.
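
The "node or node_id" convenience described above can be sketched in a few lines. This is a toy, not E2's linkNode: link_node_sketch, the in-memory %NODES table, and the exact shape of the generated URL are all assumptions made for illustration. Only node 220 being nate comes from the text.

```perl
use strict;
use warnings;

# Toy node table standing in for the database.
my %NODES = (220 => { node_id => 220, title => 'nate' });

# Hypothetical stand-in for linkNode: the first argument is a node hashref
# or a bare node_id; the second is optional link text that defaults to the title.
sub link_node_sketch {
    my ($node, $text) = @_;
    $node = $NODES{$node} unless ref $node;   # a bare id gets looked up
    return '' unless $node;                   # unknown id: no link
    $text = $$node{title} unless defined $text;
    return '<a href="index.pl?node_id=' . $$node{node_id} . '">' . $text . '</a>';
}

print link_node_sketch(220), "\n";
print link_node_sketch($NODES{220}, 'our fearless leader'), "\n";
```

Because the link is built from the node_id rather than the title, a second node with the same title cannot hijack it, which is the point made above about Everything2 Help.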

For one more check of E2 syntax, let's head to line 102:

102: my $link = $DB->sqlSelect('to_node', 'links', 'from_node='.$$NODE{node_id}. ' and linktype='.getNode( 'coollink', 'linktype' )->{node_id}.' limit 1');

Here we have one of our many database helper functions, sqlSelect. There is a convenient list and explanation of these (and many other node functions, such as the previously mentioned getNode) at Everything::Nodebase Functions, which is pretty much required reading for E2 coders.

But the key here is that the database functions link into the $DB supervariable to return results from the database. In this case, sqlSelect works on a single record's worth of criteria: it returns a scalar if you're only looking for one field, or an array if you're looking for multiple fields. So it's useful when you're looking for counts, averages, or one specific database entry (such as a person's vote on a given writeup). Here it's being used to pull the ed cool link for the node you're on.
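
To make the string-building on line 102 easier to follow, here is a rough sketch of how three string arguments like those could end up as one query. build_select_sketch is hypothetical and is not how sqlSelect is actually implemented; the two ids are made-up values.

```perl
use strict;
use warnings;

# Hypothetical helper: assembles (fields, table, where-clause) into one SQL string.
sub build_select_sketch {
    my ($select, $table, $where) = @_;
    my $sql = "SELECT $select FROM $table";
    $sql .= " WHERE $where" if defined $where && length $where;
    return $sql;
}

my $node_id     = 123;   # made-up node_id for the node being viewed
my $linktype_id = 42;    # made-up node_id for the 'coollink' linktype

# Same concatenation pattern as line 102, using the period operator.
my $sql = build_select_sketch(
    'to_node', 'links',
    'from_node=' . $node_id . ' and linktype=' . $linktype_id . ' limit 1'
);
print "$sql\n";
```

The where-clause is just a string glued together with periods, so any value that comes from a user needs checking before it is concatenated in.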

So now you've had a bit of a primer in E2 syntax. There are of course over a hundred htmlcodes in use around the site, and plenty of module functions, too. For the most part, though, you just need to know how to get nodes and node info from the database and then do something with it in the code, be it displaying or computation or logic testing.

In our third installment, we'll take a look at the many different nodetypes of E2 and how they are used to govern each request from talking in the catbox to posting a writeup to joining a usergroup. We'll also trade pictures of our cats!

root log: February 2008 (log) by kthejoker Fri Feb 15 2008 at 19:11:47

linked by Oolong

A Brief Review of E2's Server Errors

If you have used Everything2 for any length of time you have probably discovered that fanciful bit of nonsense known as the server error. Cryptic, technical, and unwelcome messages requiring (often indifferent) higher powers to decipher and address. They're sort of like tax law in this regard.

Basically, with E2, there are two flavors of server errors: the 500 and the 503.

503

The 503 means Apache is down. This is the inevitable result of a cascading failure that looks something like this:

Our database design has historically been pretty minimal, and we have a lot of SQL queries that could (and should) be optimized but are not. To put it in perspective, most queries we run clock in at less than a tenth of a second (quite a few at less than a hundredth.) But the queries at Voting Oracle, for example, take about 7 seconds to run, which is a lifetime in terms of database queries. And since we use MyISAM tables instead of InnoDB (the InnoDB engine not being available in MySQL when E2 was first written back in the early 17th century), this invokes a table lock on the vote table. So anybody else trying to vote, or read Voting Oracle, or do anything that involves voting, has to wait their turn. This causes lag. However, once that Voting Oracle query clears, everything catches up pretty readily.
However. Sometimes a query is run that takes 30 seconds or more. Sometimes a minute. Sometimes more. And depending on the tables it is locking up, this causes MySQL to basically come to a standstill.
Meanwhile, people who have a frozen page naively hit refresh or try to load up two other pages while they're waiting. The child processes in Apache start loading up, and then ...
503! Until Apache restarts itself, anyway.
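
A back-of-the-envelope sketch of the queueing above, under the simplifying assumption that a table lock makes queries on that table run strictly one at a time in arrival order. Only the 7-second figure comes from the text; the other durations and the wait_times helper are made up for illustration.

```perl
use strict;
use warnings;

# Each query has to wait for every query ahead of it on the locked table.
sub wait_times {
    my @durations = @_;
    my $elapsed = 0;
    my @waits;
    for my $d (@durations) {
        push @waits, $elapsed;   # time spent waiting before this query starts
        $elapsed += $d;
    }
    return @waits;
}

# One Voting Oracle query, followed by three normally quick queries.
my @waits = wait_times(7, 0.1, 0.1, 0.01);
printf "%.2f ", $_ for @waits;
print "\n";
```

Queries that normally take a tenth of a second each end up waiting more than seven seconds. Swap the 7 for 60 and every request behind it stalls for a minute, which is when the refresh button starts getting hit.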

503s have been caused by other things: memcached filled up a hard drive with its log once, our Apache .conf mysteriously vomited at certain URL calls for a while, and we've been DOSed on at least one occasion by a (I think benevolent) webcrawler.

500

A 500 error, on the other hand, is different. The server is just fine, but the request you're attempting is no good. There are typically 5 kinds of 500 errors, which we will now explain.

No Default Pages Loaded. Our most popular 500 error, this is the error message you get when you choose a displaytype on a node that does not have a htmlpage associated with that particular displaytype and node. For example, if you go to a nodelet node (say Epicenter, node_id 262) and add "&displaytype=printable" to the URL, you'll get a 500 error. This happens mostly for webcrawlers.
Solution: Set the displaytype to "display" if an htmlpage cannot be found. Implemented February 19, 2008.
Nodegroup issues. The second most popular 500 error, I can diagnose it but without some debugging I'm not quite sure how to address it without setting off a thousand other issues. Basically, non-admin users have different rights around here, so when they try to view certain pages, they get a "NO SALE" sign. Only for some types of nodes, this comes as a 500 error due to how E2 handles nodegroups (an e2node is a nodegroup consisting of writeups, etc.) The canonical example is trying to view a jscript node (try empty javascript). Which to me makes no sense, since jscript is just a clone of fullpage, which does not give such errors. But clearly this is a traceable error - if and when we get our dev server running, this'll be addressed.
Solution: Get better try-and-catch built into the nodegroup process.
Malformed XML. Generally our XML parsers and renderers work pretty well. The most common offender is the catbox: people's Unicode and obscure characters (and our attempts to modify what people use from time to time) occasionally produce malformed XML, and thus 500 errors on the sites that read that XML.
Solution: Work harder to conform to XML spec. But really, not that big a deal.
"Near Matches" searching. This pulls up errors every once in a while, and I'm not sure how to duplicate it. But there is some magic entry somewhere that makes the SQL query it runs go bad, which causes the whole page to crash and burn. And it actually happens kind of frequently, so if you experience this error, please let me know.
Sex crawlers. I'm not sure what causes this one, exactly. But people using some form of adult content search engine are coming across our writeups regarding sex (mostly newsgroup-y stuff like bestiality and animation porn) and then from there submitting a form POST (?), which throws up a 500. Needless to say, Googling this is too embarrassing to be helpful.
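
The fix listed above for "No Default Pages Loaded" boils down to a fallback lookup. Here is a minimal sketch, assuming a hypothetical table recording which displaytypes each nodetype has an htmlpage for. The table contents and pick_displaytype are illustrative only and are not E2's actual code.

```perl
use strict;
use warnings;

# Hypothetical table: nodetype => displaytypes that have an htmlpage.
my %HTMLPAGES = (
    nodelet => { display => 1, viewcode => 1 },
    writeup => { display => 1, printable => 1 },
);

# Return the requested displaytype if an htmlpage exists for it, else 'display'.
sub pick_displaytype {
    my ($nodetype, $requested) = @_;
    return $requested
        if $HTMLPAGES{$nodetype} && $HTMLPAGES{$nodetype}{$requested};
    return 'display';
}

print pick_displaytype('nodelet', 'printable'), "\n";   # no such page, falls back
print pick_displaytype('writeup', 'printable'), "\n";   # page exists, kept as-is
```

A webcrawler requesting a nodelet with "&displaytype=printable" would then get the ordinary display page instead of a 500.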

So there you have it, the common 500 errors. Generally they occur due to a lack of appropriate error catching and handling in our code. On the positive side, there used to be a lot more such errors and they have been addressed on an ad-hoc basis, and to say that we only have 3 or 4 common causes of 500 errors is not so bad.

Hope this was enlightening!

Coding For E2: A Primer (thing) by kthejoker Tue Dec 04 2007 at 19:38:56

linked by Oolong

Hi, I'm kthejoker. You may know me from such E2 coding projects as the Wheel of Surprise, My Achievements, and that thing that's horrible and needs to be undone or the site will collapse any day now. You know the one.

I'm here today to help you learn the basics of what it takes to code for E2, and some of the basic tips and techniques to make your transition to E2 coder a smooth one. To help us out, we're going to be using the Epicenter nodelet as our chief document and reference guide. If I start talking about "the code", that's what I'm referring to.

The Basics: Perl

Before we get started diving into the code of Everything2, let's be clear: if you want to code in Everything2, you have to know some Perl. There are a lot of books and websites on Perl and its labyrinthine syntax and methods, but we're just going to cover the nitty gritty here. Basically, you need to know 4 things: variables, operators, functions, and loops.

Variables

The first thing you need to know about Perl variables in E2 is that we use strict mode, which means you must instantiate each variable in your code with the "my" syntax. This keeps everything in scope for the interpreter. We'll see how this is used below.

There are three basic variable types in Perl: the scalar ($), the array (@), and the hash (%).

The scalar is just a single valued variable.
my $num = 1;
initializes a scalar variable called num and sets its value to 1. Notice we used the "my" to initialize the variable, and we ended the statement with a semi-colon - another Perl requirement.
The array is a variable that can hold multiple values in a numerically-indexed .. well, array. Generally, these values are scalar in nature, but you can also have an array of arrays, or an array of hashes.
my @array = (1, 2, 3);
initializes an array and sets its first three values to 1, 2, and 3, respectively.
The hash is the last variable type, and is generally much more important than arrays for E2 coding. A hash is a lot like an array, except its keys can be associative. So, for example,
my $book = {title => 'A Tale of Two Cities', author => 'Charles Dickens'};
creates a hash with two keys, title and author, sets their values, and stores a reference to it in the scalar $book. (The curly braces build a hash reference; a plain hash would be declared as my %book = (title => ...); and read as $book{title}.) You can then retrieve the individual values through the reference using the "$$" syntax - $$book{title} - or the "->" syntax - $book->{title}. When an E2 node is retrieved from the database, it is returned as a hash reference like this one - this is the basis for doing most of the dirty work.

Obviously, there's a lot more to all of this, but for now, this'll get you started. When looking through code, identify scalars, arrays, and hashes, and try to determine their purpose. Sometimes their name makes them obvious ($$NODE{title}) - other times, not so much ($isHappyPanda).
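
For a side-by-side view, here is a short sketch of the three variable types plus a hash reference. The "$$" and "->" forms work on a hash reference held in a scalar, while a plain %hash is read as $hash{key}. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $num   = 1;                                     # scalar
my @array = (1, 2, 3);                             # array, indexed from 0
my %plain = (title => 'A Tale of Two Cities');     # plain hash
my $book  = {                                      # reference to a hash
    title  => 'A Tale of Two Cities',
    author => 'Charles Dickens',
};

print $array[0], "\n";         # one element of an array
print $plain{title}, "\n";     # one element of a plain hash
print $$book{title}, "\n";     # through a reference, "$$" form
print $book->{author}, "\n";   # through a reference, "->" form
```

E2's $$NODE{title} and $$USER{user_id} are the "$$" form applied to the $NODE and $USER references.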

Operators

Lucky you, you've already learned one operator: the =. That, as you may have noticed, is the assignment operator - you use it to set variables to a particular value. There are also three other types of operators: math, concatenation, and comparison.

Mathematical operators are pretty straightforward: +, -, *, and /. But there are also some shortcuts - for example, $num++; will increase the value of num by 1. $num += 100; will increase $num by 100.
The chief concatenation operator is the period (.). This is used to combine strings together. So if you have
my $word = 'Hello';
$word = $word . ' World';
, then at the end, $word is equal to 'Hello World'. This also has a shortcut, .= , which gets a lot of use here on the site.
The comparison operators are a little trickier, because you have two different sets of them. The first set compares numbers and should be familiar: <, >, <=, >=, !=, and == (those last two are "not equal" and "equal to", respectively.) The second set compares strings and correlates with the numeric operators nicely: lt, gt, le, ge, ne, and eq.

Again, not comprehensive, but should help you understand what if ($$NODE{title} eq 'Butterfinger McFlurry') means.
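
A small sketch pulling the operators above together. The last two lines show why the two sets of comparison operators are not interchangeable: the same pair of values compares differently as numbers and as strings.

```perl
use strict;
use warnings;

my $num = 1;
$num++;          # now 2
$num += 100;     # now 102

my $word = 'Hello';
$word .= ' World';   # shortcut for $word = $word . ' World';

# Numbers and strings need different comparison operators.
my $numeric = (10 > 9)      ? 'true' : 'false';   # 10 is the larger number
my $string  = ('10' gt '9') ? 'true' : 'false';   # but the string '10' sorts before '9'

print "$num $word $numeric $string\n";   # 102 Hello World true false
```

Using == on titles or eq on node_ids will usually run without complaint and quietly give the wrong answer, so it's worth checking which kind of value you are comparing.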

Functions

Functions and subprocedures are great tools in Perl, but here at E2 we don't generally write our own functions - at least not directly (more on htmlcodes later.) The E2 Perl modules come with a bunch of handy subs, though, that get a lot of use, so you should be familiar with the syntax of a function call:

functionName(parameter1, parameter2, etc);

Not every function has parameters, but if they are required, you'd better have 'em. Also, many functions return a value that can be used in your code. A simple example is

my $fearlessLeader = getNodeById(220);

getNodeById does exactly what you think it does, and in this case returns Nate to the variable. Then we do something like

return $$fearlessLeader{experience};

which will then print the returned value (Nate's XP) to the display.

Loops

Loops and conditionals are a fact of life in programming, but luckily they're also pretty straightforward to understand. There are three main loops and conditionals in use at E2:

the if-else conditional. The syntax is
if (someBoolean) {doSomething;}
else {doSomethingElse;}
The else statement is entirely optional - you can just say
if (canVote) {castVote;}
and be done with it. Also there is a shortcut for one line if statements - you can just do
$num++ if $notDone;
and Perl will do as it's told. (You can also do $num++ unless $Done, but that's pretty rare.)
the for/foreach loop. The for loop allows us to run the same code a certain preset number of times. The foreach loop goes through each item in an array or hash and runs code. So, same result, but different ways to implement a loop. This is useful when, for example, printing every writeup under an e2node, or printing the top 20 Staff Picks.
the while loop. The while loop syntax is
while (someBoolean) {doSomething;}
and does what you would expect: while someBoolean is true, it keeps running the code. In this case, you can run into an infinite loop unless you include code that sets someBoolean to false somewhere within the loop.

I'm kind of glossing over these, because really reading an example is the best way to understand what's going on, which we'll cover in our next couple of sections.
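
In the meantime, here is a minimal runnable sketch of each construct described above. The writeup titles are made-up data and none of this is taken from E2's code.

```perl
use strict;
use warnings;

my @titles = ('first writeup', 'second writeup', 'third writeup');   # made-up data

# for: run the same code a preset number of times
my $count = 0;
for (my $i = 0; $i < 3; $i++) { $count++; }

# foreach: visit each item in an array
my $list = '';
foreach my $title (@titles) { $list .= "<li>$title</li>"; }

# while: keep going until the condition turns false
my $remaining = 3;
while ($remaining > 0) { $remaining--; }   # without the decrement this never ends

# if/else, plus the one-line "if" shortcut
my $notDone = ($remaining > 0);
$count++ if $notDone;
if ($count == 3) { print "looped three times\n"; }
else             { print "something went wrong\n"; }
```

The foreach loop is the one you'll see most often in E2 code, typically building up an HTML string with .= one item at a time, as above.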

Overview

So we've covered some of the basics of Perl. As we're going through code, you may find yourself asking what this or that bit of code is doing. Always try to break it down in terms of the variables, functions, loops, and operators being invoked. E2 isn't exactly a bastion of good documentation, so I often find myself backtracking from some error point until I get to the root variable or function that is really at issue. Being able to go from the big picture to the small detail is essential to understanding E2 code.

Next we'll talk about some of the basic syntax for E2-specific coding: the global variables, database calls, and htmlcode functions.

E2 Community Development Newsletter, Summer 2007 by danne Sun Jul 22 2007 at 20:51:59

linked by RoguePoet

As many of you may have read, heard, and discussed, there are a number of changes and new developments in the works for Everything2. I had an opportunity recently to meet with Jack, clampe, and many of the staff in person and online, and discussed these developments for our community and our site. We've made good progress over the past many months addressing the implementation, scalability, legal, and staffing concerns. We believe we can make these things happen over the next few months with our coder staff. The new hardware is on its way, and with our newfound help, these features should be implemented by mid-October, 2007.

Administrative changes

Ownership disclosure: So we're all on the same page, Everything2 is fully owned by Blockstackers Intergalactic, which consists of Nathan Oostendorp, Jeff Bates, Kurt DeMaagd, and Rob Malda. Our hosting provider is Michigan State University's College of Communication Arts and Sciences, which is coordinated by Cliff Lampe. Dann Stayskal directs the day-to-day operations, overall vision, and administrative staff of Everything2. Jack Thompson is the Editor-in-Chief, in charge of the Content Editors, and all editorial decisions.

Copyright of submissions: Our policy isn't changing: users who submit content to the site retain full copyright to that material. We will, however, be branching out our options to make certain open licenses available, such as Creative Commons and Public Domain. These will be configurable when submitting new writeups or editing current ones, and a facility will exist to re-license your writeups en-masse.

Behavioral standards: We're in the process of drafting behavioral standards for users and staff. We're not trying to codify common sense, but we feel we need more than our current two words to relay the behavior we expect of ourselves as users, editors, and administrators. Please send any input on these to me.

Donation box: Because of our generous hosting contract with MSU, we're going to be closing down the donation box. The money given to it has been used to buy new hardware, or more frequently upgrade existing hardware. With our MSU budget for hardware, we need to avoid conflict of interest and stop taking donations. Thank you for the financial support you've given us over the past many years - it helped keep the servers online and ticking.

Staff interaction: We may be allowing noders with symbols to hide them in the Other Users nodelet, and adding a link to a general contact and help page, listing which members of staff are online, and who to contact with various topical, technical, and editorial questions. Likewise, we're also looking at giving the code gods their own usergroup separate from the main pantheon.

Relationships with other sites: We've historically been an isolated site, and we're looking to change that. Many of the new features in the works integrate with social bookmarking, networking, and multimedia sites such as flickr, del.icio.us, Digg, reddit, and Facebook. We're also improving our capabilities for linking to content on other sites.

New servers: We're pulling in some new hardware, care of the MSU College of Communication Arts and Sciences. We all saw a boost in speed and stability with the current crop of servers, which will improve many times over with the next group. Keeping our current farm around as backup support, we're ordering a rack of hardware to allow us to plan, develop, and deploy updates to the site in an orderly and tested fashion, with minimum downtime.

Community2 and "The New E2": Community2 was a testing ground for features we wanted to see on Everything2, as well as a place to explore new ways for noders to contribute to the nodegel. "The New E2" was a group of features we looked at to steer us towards a print journal. Community2 is now offline, and many of its features are in place here, many more on their way. Some features of "The New E2" are still in the works, but with substantial changes. These features are discussed below.

Code control

We're changing the way we look at source control and collaboration here. By the end of this month, or as soon as the development hardware is ready to go, we're going to call a code freeze on developments to the production codebase of Everything2. One of the new servers coming in will be used as a developmental mirror to facilitate orderly planning, development, documentation, and testing of the new codebase before deploying the featureset to the production hardware. This server will be open to our coder staff, edev, and to a group of beta testers.

New Features

Many of these features have been in the works for years, and are coming to be implemented soon. To be certain, this isn't a list of "what will happen" - it's a discussion of the goals towards which we're working on the back-end. Many of these will come to pass, some will wait until the next development cycle. It will all depend on how much code, documentation, and testing help we get.

Usergroups: We'd like to begin allowing the option of open usergroups, allowing users to join and leave without having to contact an owner. Likewise, those groups which prefer to remain controlled should be able to send invitations for users to join. Finally, noders level six and above should be able to create their own groups.

User control of their own content: Noders will soon be able to remove their own writeups from the database, and move their writeups to new nodeshells. This frees our editorial and administrative staff to concentrate on improving writeups, rather than filling nuke requests and title changes. Nuke and title change requests will still be around, though, for requests on others' work. A new facility should be created for our staff to review changes to writeups, and for users to flag writeup changes as "significant", or to be reviewed.

New themes: We're finishing our XHTML and CSS-driven "Zen" theme, which will allow us to modernize the look and feel of the site. We're planning a contest, with a cash reward for the best new theme for the public face of Everything2. This will also allow us to modernize our printing capabilities, and increase interoperability with mobile users and those who rely on screen readers. Details on that will be posted as soon as the contest rules and XHTML semantic backplane of the site are ready to distribute.

Multimedia writeups: We will finally be allowing multimedia content on Everything2. This will include images, audio, and video, with creation tied to the voting / experience system. Images and video will work somewhat analogously, in that they will be integrated into writeups. The way it will work is this: A superdoc will be available to upload images (hosted locally) or YouTube links. Once the content is received, you'll be given a token such as <e2image id="527387610"> to drop into the texts of your writeups. This will integrate the content where you placed the token without having image and video content as standalone nodes. There will be image and video repositories, though, to find images which others have uploaded which you'd want to include in your own writeups. Audio will be managed slightly differently - it'll be integrated with writeups. When posting a writeup, you'll be able to submit an MP3 with that writeup of supporting audio. This is geared towards being a spoken word rendition of the writeup: an audio nodetype.

Images will probably be hosted locally, and integrated into writeups. If it turns out to be load or bandwidth-prohibitive, we may source this out to social photography sites such as Flickr or 23hq, but the link-token system will remain the same. Noders level three and above will be able to add new images to the database, and all users will be able to tie existing images into their writeups. Noders will also be able to size and align images using attributes to the token tag, e.g. <e2image id="527387610" scale="40%" align="center">
Audio will be available as a part of writeups, to complement the writeup itself, and to allow for podcasting. We anticipate hosting an RSS feed of the most recent writeup recordings to this end. These recordings will also be hosted on our servers, with no current plans on outsourcing. The current plan is to allow noders level two and above to post audio to their writeups. We may also be able to simultaneously support an alternative model for sound clips used to augment audio-related descriptions in a writeup, such as phonetics and music theory.
Video will likely be integrated from YouTube, in the same manner by which images are integrated. These will be available to be integrated into writeups. Audio and images will take priority in implementation over videos, but chances are we'll get to all three. Noders level seven and above should be able to post videos.
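
To give coders a feel for the token system, here is a rough sketch of pulling the attributes out of an image token. The feature is still only planned, so everything here is an assumption: parse_e2image is a hypothetical helper, and only the tag name and the id, scale, and align attributes come from the example above.

```perl
use strict;
use warnings;

# Hypothetical parser for the proposed token. Returns a hashref of
# attribute => value, or undef if the text is not an e2image token.
sub parse_e2image {
    my ($token) = @_;
    return unless $token =~ /^<e2image\b([^>]*)>$/;
    my $attrText = $1;
    my %attr;
    while ($attrText =~ /(\w+)="([^"]*)"/g) {
        $attr{$1} = $2;
    }
    return \%attr;
}

my $attr = parse_e2image('<e2image id="527387610" scale="40%" align="center">');
for my $name (sort keys %$attr) {
    print "$name = $$attr{$name}\n";
}
```

From there the id would be looked up in the image repository and the token swapped for ordinary image markup when the writeup is displayed.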

The heart of Everything2 has always been writing and community, and shall always be writing and community. Images, audio, and video are the spice, not the meat. Multimedia content, at least in the beginning, will only be allowed in writeups, with the possible exception of images in comments.

Registries: GTKY content should soon have a home again on Everything2, by means of registries. Taken from Community2, these consist of a question and a series of answers given by users, with the option of displaying the answers on their homenode. The creation of registries will be available beginning with level four. Responses will be votable, but not C!-able, and able to be moderated by our editorial staff in the same manner as writeups. 2 XP will be given for each registry response, with our editorial staff still maintaining oversight.

Syndication: Nate has our RSS and ATOM feeds mostly sorted out, and we'll be adding more feeds as time permits. We'll also be standardizing our feed links and header data to allow cleaner interoperability with current standards and neighbor communities. Data from all feeds will be available in RSS, ATOM, and JavaScript feedrolls to be placed on other community sites. We're also working on a module for integration into Facebook's new plugin system.

Writeup tagging and searching: We plan on doing away with the "person", "place", "idea", and "thing" designations, in favor of tagging of writeups. This will also facilitate searching the content by metadata while we keep working on how to implement a full-text search as part of a later release. Users will be able to free-text tag their own writeups, and the writeups of others, and will be given 1 XP for doing so. Our editorial staff will be able to moderate submitted tags, and users will be able to tag their own writeups en-masse, in the same manner as assigning their writeups a license.

Writeup comments: We're implementing a system of threaded discussion on writeups which will enable users to contribute comments on a piece of writing without having to send a message. Comments will also be votable, but not C!-able. These will be moderated by the userbase on the whole, by means of "spam", "abuse", and "correction / corrected" button in the comment header. These buttons will enqueue the comment to a list to be presented to the editorial staff for review. Noders level three and above will be able to leave comments on writeups.

Other writeup features: Once a noder has reached level eight, he or she will be able to create writeups immune to votes and C!s if they so choose. This will be on a writeup-by-writeup basis. We will also begin allowing more than one writeup per user per node, for noders of any level who have as much to say about the city in Texas as the city in France. Writeups should also be able to connect in a series, with "next" and "previous" links in the footer. Finally, writeups will be able to carry their own lede / summary / abstract as a separate text block for quick review through integration into user and site search.

Fully semantic URLs: We're working on updating the URL structure of the site to reflect semantic web conventions. URLs such as http://www.everything2.com/index.pl?node_id=650043 will soon be replaced by their semantic equivalents, such as http://www.everything2.com/users/dann . This allows the site, in combination with RDF metadata, to be shared with people as easily as with machines.

Frontpage updates: We'd like to completely overhaul the front page, allowing users to see only the information they want, with dynamically controllable and orderable AJAX widgets. The widgets we'd like to implement first are:

a site calendar for quests and gatherings
lists of the most recent nodes, C!s, and editor cools
recent site news
daylog posts by friends
new messages, and
a daily tag cloud.

The frontpage as shown to users not logged in will be vastly simplified, with only news, recent nodes, C!s, a search box, and editor cools.

Social Network: We're adding the facility to link users by XFN relationship types, including friend, colleague, neighbor, and spouse. A full list of these relationship types is available at http://gmpg.org/xfn/11 . This may have an opt-out button for those who'd rather not participate. This data allows us to show users new content contributed by their friends, whether via daylog post aggregation, syndication feeds, or the nightly email.

Polls: Polls will now have the option to be available to level one users, and closed polls will show the results regardless of whether you've voted on it.

Nightly Email: Finally, we'll be bringing back the E2 Nightly Email. This will allow you to follow the nodes of those in your friends list, and optionally a summary of all new writeups for the day.

There are a number of other site facilities we're looking at streamlining, such as Node Heaven, the Everything FAQ, Message Inbox, and Everything User Search. If you have any suggestions of minor features which can quickly (by your estimate) be added to the code-gel, drop me a message and we'll discuss it.

Ad Campaign

Coinciding with the above updates, Jack will be running a grassroots international advertising campaign, something in between a flyering effort and a quest. Handbills will be made available to participants that they can print and xerox and post on bulletin boards or in coffee houses, record shops, rec rooms, and cafeterias to start. Our target ad-space is anywhere there's foot traffic where potential users could be hanging out or wandering past, the basic idea being to bring us some new talent. Quest participants will be encouraged to daylog their progress and to take pictures of their efforts for XP and C!-type rewards.

"How can I help?"

If you can volunteer some help, we'd love to speak with you:

If you can help code in Perl, JavaScript, XHTML, and CSS, send a sample of your work to dann@everything2.com. Experience in edev is a plus.
If you're willing to help out as a beta tester for new E2 features, send an email to dann@everything2.com to get on the list.
If you have input on the behavioral standards, or any administrative development, contact me.
If you can help edit or moderate, contact Jack. We're taking applications.
If you're a graphic designer who'd like to help out with the new theme, watch the frontpage for news on the contest.

-- Dann Stayskal

Updates, July 29th, 2007

Based on feedback from noders and staff, we've decided to make a few changes to the plan:

We decided to allow users with symbols to hide them in Other Users, rather than simply turn them off unilaterally. Staff membership will still show up on homenodes, however.
I removed the bit about getting rid of the "Gods" title. We'll still plan to separate code gods from the rest of the pantheon, mainly for security and stability concerns.

edev: Tables and HTML Validation (idea) by call Thu Sep 11 2003 at 23:32:54

Tables and HTML validation

HTML in E2 writeups is filtered; only a subset of the available HTML tags are actually allowed. There are approximately two reasons for this:

Malice: to avoid allowing untrustworthy users to spoil the E2 experience for the rest of us by using scripts, images, and external links (to, eg. goatse.cx)
Functionality: Scripts or tables can quite easily break an E2 page, either by accident or on purpose. For example, since layout is performed mostly with tables, a spare </tr> tag could wreak havoc with the layout of the rest of the page, particularly nodelets etc.

Tables would be fine if they could be validated to prove that they're well-formed and hence will not break tables outside the writeup. Tables in particular aren't allowed because:

Table validation code was believed to be non-trivial to implement.
It was believed that it'd be too expensive, in terms of CPU time on the web server, to run.

Another issue, which was not discussed, is that allowing tables in writeups might be "exploited" to force layout within writeups, eg. to split a writeup into columns.

I've written some fairly simple code to validate table HTML and run some performance tests on it. On the 'worst case' HTML (a nightmare of tables, taken from an entire pageload on Community2), the table validation takes about a fifth of the time the normal HTML validation takes.

Approach

The design of the table validation code emerges from a slightly different approach than is used for general HTML validation. The existing HTML validation code in ecore strips out disallowed tags to create valid HTML. Transforming the markup in such a way can be fairly expensive, and for tables could be quite complex to perform. Hence, a much simpler approach was adopted based on observations of writeup content.

Most writeups (currently all of them) do not contain any tables whatsoever.
Writeups which contain tables will mostly contain valid tables, and thus require no transformation to ensure valid and innocuous HTML is the result.

Thus the two cases for which performance should be optimised are the corresponding cases of no table tags at all, and tables which are valid and well-formed. Hence the main code merely scans the structure of the tables in the HTML and determines whether tables have valid structure or not. When a malformed table is discovered, the code adopts an entirely different approach (and one which may in the long run be more useful...) of rendering the table tags visible in the displayed HTML, and adding <div> tags with dashed outlines around the elements to aid debugging.

With this approach, the majority of the web server's activity will be in simply checking the validity of any table tags.

Here's the source to the htmlcode:

# Okay, in brief:
# Fast 'cause it's optimised to the 'common' cases:
# Most writeups have no tables.
# Writeups that have tables will mostly have valid tables:
#   => Only a quick parse to validate.
# We 'enforce' the validity of tables by outputting debug info
#   for badly formed tables. This is UGLY so writeup authors will
#   fix 'em quick.
# In an HTMLcode, so compilation of this code is amortised.
# [screenHTML] should still be used, and can be used to control
#   attributes in the tags. Ideally this works on the output of
#   screenHTML, but only because the 'debug' output uses <div>s
#   with dashed outlines to help HTML writers find their oopsies.


# Should be reasonably fast: scans through the HTML using a m''g, which
# is about as fast as anything in perl can be. Stacks the tags (only
# looks at table tags) and checks the structural validity by 
# matching a two-level context descriptor (stack . tag) against
# an RE describing valid contexts. (again, perl and RE => faster than
# a bunch of ifs or whatever)
sub tableWellFormed ($) {
    my (@stack);
    # Note that htmlScreen ensures that the HTML input to screenTable will
    # only ever terminate the tag name with a space or a closing >, so this
    # is all we have to match.
    for ($_[0] =~ m{<(/?table|/?tr|/?th|/?td)[\s>]}ig) {
        my $tag = lc $_;
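        # '' at top level, so $top is never undef in the checks below.
        my $top = @stack ? $stack[-1] : '';

        if (substr($tag, 0, 1) eq '/') {
            # Closing tag. Pop from stack and check that they match.
            # A close with nothing open is an error too.
            return (0, "$top closed with $tag")
              if !@stack || pop(@stack) ne substr($tag, 1);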
        } else {
            # Opening tag. Push, and check context is valid.
            push @stack, $tag;
            return (0, "$tag inside $top") 
                if (($top.$tag) !~ /^(table(tr)?|tr(td|th)|(td|th)(table))$/);
        }
    }
    return (0, "Unclosed table elements: " . join ", ", @stack)
        if ($#stack != -1);
    return 1;
}

sub debugTag ($) {
    my ($tag) = @_;
    my $htmltag = $tag;
    $htmltag =~ s/</&lt;/g; # should be encodeHTML, but of course
                            # I don't have that in my standalone testbench.
    $htmltag = "<strong><small>&lt;" . $htmltag . "&gt;</small></strong>";

    if (substr($tag, 0, 1) ne '/') {
        return $htmltag . "<div style=\"margin-left: 16px; border: dashed 1px grey\">";
    } else {
        return "</div>". $htmltag;
    }
}

sub debugTable ($$) {
    my ($error, $html) = @_;
    $html =~ s{<((/?)(table|tr|td|th)((\s[^>]*)|))>}{debugTag $1}ige;
    return "<p><strong>Table formatting error: $error</strong></p>".$html;
}

my ($text) = @_;
my ($valid, $error) = tableWellFormed($text);
$text = debugTable ($error, $text) if ! $valid;

$text;
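
To see how the context check behaves on concrete input, here is a minimal standalone sketch that restates the stack-and-regex logic of tableWellFormed and runs it over a few strings. The name check_tables, the "top level" label in the messages, and the sample inputs are made up for illustration; only the allowed-context pattern is taken from the htmlcode above.

```perl
use strict;
use warnings;

# Restatement of the stack-based check, for experimenting outside ecore.
# check_tables is a hypothetical name, not part of ecore.
sub check_tables {
    my ($html) = @_;
    my @stack;
    for my $raw ($html =~ m{<(/?(?:table|tr|th|td))[\s>]}ig) {
        my $tag   = lc $raw;
        my $top   = @stack ? $stack[-1] : '';
        my $label = $top eq '' ? 'top level' : $top;
        if ($tag =~ m{^/(.*)}) {
            # Closing tag: it must match the innermost open tag.
            my $open = pop @stack;
            return (0, "$label closed with $tag")
                unless defined $open && $open eq $1;
        } else {
            # Opening tag: parent.child must be an allowed pairing.
            push @stack, $tag;
            return (0, "$tag inside $label")
                unless "$top$tag" =~ /^(?:table(?:tr)?|tr(?:td|th)|(?:td|th)table)$/;
        }
    }
    return (0, "Unclosed table elements: " . join(", ", @stack)) if @stack;
    return (1, '');
}

for my $html (
    'No tables here.',
    '<table><tr><td>ok</td></tr></table>',
    '<table><td>missing row</td></table>',
    'stray </tr> tag',
    '<table><tr><td>never closed',
) {
    my ($ok, $error) = check_tables($html);
    my $verdict = $ok ? 'valid' : "invalid ($error)";
    print "$verdict: $html\n";
}
```

The first two inputs exercise the two cases the design is optimised for: no table tags at all, and a well-formed table. The remaining three each trip a different error path.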

A similar approach might yield speed improvements to htmlScreen too. It should be possible, for instance, to construct a single regular expression to determine if some HTML consists entirely of valid tags and attributes (and this regular expression can be constructed automatically, of course). However, since there's a large base of existing writeups that may have invalid tags and attributes that nobody has noticed (since the existing htmlScreen provides no real feedback), it's unlikely that this will be a particularly palatable idea.
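
As a rough sketch of that single-regex idea, the pattern can be generated from a table of permitted tags and attributes. Everything here is an assumption for illustration: the tag table is invented and is not E2's real approved-tag list, build_validator and %ALLOWED are hypothetical names, attribute values are assumed to be double-quoted, and a bare < or > in the text is simply rejected.

```perl
use strict;
use warnings;

# Hypothetical whitelist: tag name => list of permitted attributes.
my %ALLOWED = (
    p      => ['align'],
    em     => [],
    strong => [],
    a      => ['href', 'name'],
);

# Build one anchored regex that accepts text made up only of plain
# characters and whitelisted tags carrying whitelisted attributes.
sub build_validator {
    my (%allowed) = @_;
    my @alternatives;
    for my $tag (sort keys %allowed) {
        my @attrs   = @{ $allowed{$tag} };
        my $attr_re = @attrs
            ? '(?:\s+(?:' . join('|', map { quotemeta($_) } @attrs) . ')="[^"<>]*")*'
            : '';
        # Opening tag with optional attributes, then the bare closing tag.
        push @alternatives, quotemeta($tag) . $attr_re . '\s*';
        push @alternatives, '/' . quotemeta($tag) . '\s*';
    }
    my $tag_re = join '|', @alternatives;
    return qr{\A(?:[^<>]|<(?:$tag_re)>)*\z}i;
}

my $validator = build_validator(%ALLOWED);

for my $html (
    '<p align="center">Some <em>text</em></p>',
    '<p onclick="evil()">text</p>',
    '<script>alert(1)</script>',
) {
    my $verdict = $html =~ $validator ? 'accepted' : 'rejected';
    print "$verdict: $html\n";
}
```

Note that this only answers yes or no; it does not strip anything, which is exactly why it would surface the existing writeups with unnoticed invalid markup.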

As a last word, caching of computed results in ecore is something that could be done better, and more consistently; but we all know this by now, and it's late in the evening so I'm not going to tell you things you already know.

Y'all can try it out over at

http://community2.org?node=test+screenTable

http://kahani.org?node=test+screenTable

In particular, I want y'all to try and break it: defeat the validation and get unpleasant HTML through. And remember that in the final version it will be used in conjunction with htmlScreen... ;)


edev

Category:

Venerable members of this group: