On a fairly frequent basis, I (and Jen Riehle) have to help resolve random website issues for sites hosted on campus. Generally, the remedy ticket comes in, we check it out, and try to resolve the issue as quickly as possible. Sometimes it is very quick, but sometimes it takes a little longer because we were not the people who originally developed the website. It would be helpful if we could easily find the original developers, so either they could resolve the issue, or at least help point us in the right direction. This becomes quite necessary when dealing with graphics that need to be altered, or navigation that doesn't immediately make sense.

So, what I'm asking is for ideas on the best way to store this information. I'll pose this as the starter idea:

In the .htaccess file at the root of any NCSU website, add the following information as a commented section:

# Begin Author Information
# Date: (site launch date)
# Lead Author's Name (or multiple author names)
# Department:
# Email Address:
# End Author Information

This way, it can be replicated on any website developed on campus, and can be found in the same place on every site. If the website you're developing has no .htaccess file because you aren't setting any permissions or custom rules, you can still create one that contains this information and nothing else.
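As a rough illustration of how a tool might read that commented section back out of an .htaccess file, here is a minimal Python sketch. It assumes every field is written as "# Label: value" (the template above has no colon after the lead author line, so the "Lead Author:" label is an assumption), and the function name and sample values are hypothetical.

```python
BEGIN = "# Begin Author Information"
END = "# End Author Information"

def parse_author_block(htaccess_text):
    """Return the 'Label: value' pairs found between the Begin/End marker comments."""
    fields = {}
    inside = False
    for raw in htaccess_text.splitlines():
        line = raw.strip()
        if line == BEGIN:
            inside = True
            continue
        if line == END:
            break
        if inside and line.startswith("#"):
            # Drop the leading '#', then split on the first colon only.
            key, sep, value = line.lstrip("#").strip().partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields

# Hypothetical sample; names and addresses are placeholders.
sample = """\
# Begin Author Information
# Date: 2010-08-15
# Lead Author: Jane Doe
# Department: Example Department
# Email Address: jdoe@example.edu
# End Author Information
AuthType Basic
"""
info = parse_author_block(sample)
```

Because the block is only comments, Apache ignores it, and anything outside the two markers is left alone by the parser.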

Anyway, there's a start. I'm open to new ideas, because at the end of the day, I'd like to have something standardized that all web developers on campus can follow without too much effort. Ideally, I don't want us to get bogged down with implementation issues or buy-in questions before we develop the template or method itself.


Replies to This Discussion

Nick, excellent suggestion! This is something we've recently been pondering in Engineering, especially for all of our Student Organization websites which experience ownership turnover just about every other semester.

Using the .htaccess file to store this information could be a bit cumbersome for us. Most of our .htaccess files are actually auto-generated by WolfTech's GuardDog tool. While GuardDog does allow you to add custom code to the auto-generated file, you would still have to maintain this information via GuardDog and not the file system. Any users that have read/write perms on a particular locker but don't have access to GuardDog would be unable to get to it.
Which sites / AFS paths will be affected by this policy?
I would hope that all sites developed and housed on NCSU servers would have this information stored. I don't know that we could ever mandate or enforce this kind of thing. It's more of a good practice, so that people who follow in your footsteps, or people who have to fix/update things on websites you've designed, can easily find who the original developer was.

So, I guess, to answer your question directly (in my opinion): all sites, all AFS paths.

This is all just discussion at the moment --- there's no mandate or push coming from any higher-ups (AFAIK), so this is something that we campus web developers could decide upon for ourselves, for the betterment of our own community.
As Mike says (he and I work together), we have been talking about an "About this Site" tool that could be written for each site that is housed in one of our web lockers.

As our sites multiply, we need to keep some record about the who, what, where, when and how of their development. I have been thinking of a db that might generate a custom "About this Site" page with such content as:

Who is responsible for it/authorization
Who designed it
Who supports it
Compliances
Analytics/stats
Refresh dates
Technologies/versions used
etc.

Most sites have some kind of "About Us" which is about the people and content of the site. But few have good accounting/reporting about the site as a web resource. I think we need something driven from the vhost level or even across vhosts, if possible.
I thought about a centralized database of site information that could be easily searchable, but having that information separated from the site itself may pose problems with keeping things up to date. At least with an .htaccess-type solution, there's no need to remember to go to a separate website, log in, and update site information.

I do like the idea of a website registration system though, and think it would be very nice to have that information handy. I wonder if we could have a hybrid solution? An .htaccess file with the most basic information in it, and then the registration system that has a more fully compiled list of websites and their characteristics/metadata etc.
Another idea would be to use an existing standard, like Dublin Core.

I like the idea, but I feel like the big challenge is going to be to get people to agree on some standard, then to get enough people to do it, then getting them to keep it up to date. In the Library we use Dublin Core for both Internet and intranet, but the level of maintenance on the metadata varies greatly across the site.

The techie wiki had started to compile some of this information. I don't think it was hostname specific though, more arranged along organizational lines. http://techies.ncsu.edu/wiki/Webservers_%40_NCSU

The .htaccess file would only be readable from the file system. Something that could be accessed from the web side could be useful to consider (like an admin.txt, or DC metadata) and might lend itself better to a mixed central/distributed method of maintenance.
I like the Dublin Core idea (all this time I thought it was Dublin, Ireland, not Dublin, Ohio. Who knew?). See 2.3. Metadata Storage and Maintenance Issues of http://dublincore.org/documents/usageguide/ and http://www.ukoln.ac.uk/metadata/dcdot/. I think we could work with a subset of the standard in any case.
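To make the "subset of the standard" idea concrete, here is a small Python sketch that emits Dublin Core metadata as HTML meta tags, using the usual DC-in-HTML convention of a schema.DC link plus "DC.element" names. Which elements to include (creator, publisher, date here) is only an assumption for illustration, and the function name and values are hypothetical.

```python
from html import escape

def dublin_core_meta(fields):
    """Render a dict of Dublin Core elements as HTML <link>/<meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in fields.items():
        # Escape the value so quotes and ampersands cannot break the attribute.
        lines.append('<meta name="DC.%s" content="%s">'
                     % (element, escape(value, quote=True)))
    return "\n".join(lines)

# Hypothetical subset: who built the page, which unit owns it, and when it launched.
tags = dublin_core_meta({
    "creator": "Jane Doe",
    "publisher": "Example Department",
    "date": "2010-08-15",
})
print(tags)
```

The output would go in the head of a page or template, which keeps it readable from the web side without any file system access.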
A good idea.
But I do have one follow-up question.
If it's in an .htaccess file, and we do not have PTS permissions to access the web site's files, how could we get to it to look at it?
It needs to be somewhere where there is unrestricted access (header, footer, META, About this Site, etc.)
It would also be nice for it to NOT be dependent on what tool is used to develop the site (Dreamweaver Templates, Cascade, Drupal, etc.) Some generic way of doing it.

My $0.02 worth.
Don
I'm not familiar with Dublin Core, but I like the idea of using an existing standard. I don't think we need to reinvent the wheel if someone has already built it for us.

I guess the .htaccess file may not be the best idea, for the permissions reason you mention. I thought it might be the easiest solution because I was only considering it for use when developers need to update the site. If they are updating the site, then they'll have permissions to the .htaccess file, and if they aren't supposed to update the site, then there'll be no need for them to see the metadata we are talking about. But I guess there will be times when you may need to see the metadata in order to direct a troubleshooting request to the proper person, so that puts a vote in the column for a non-.htaccess solution.

John -- could you explain a little how you are using the Dublin Core standard for the library? You mentioned the level of maintenance varies... how so?

I wonder if we can keep our scope fairly narrow to begin with, or maybe specify a smaller set of 'mandatory' information, and then if people want to include extra, they can choose that on a site-by-site basis. I was thinking that we could just store those few things I mentioned in the first post (site launch date, lead author's name (or multiple author names), department, email address). Although I think that the 'refresh dates' may be helpful to the department itself (and its own developer), I'm not sure it's valuable information for the purpose I had with this topic. But maybe that's my narrow-minded approach to this singular issue.

I'm trying to think how we move this forward, as it seems that we agree that this is an important issue to address. Do we need to develop and agree on a standard set of data first, and then a secondary set of data that is more of the optional type? Or should we decide on a method of storage first, and then figure out the data later? I guess we could tackle them at the same time too. I'm hoping we can have some kind of idea to talk about at the next web developer's meet-up, so we can get input from the whole community (in case they aren't checking these forums). I'm not sure of the date right now, but this doesn't seem too insurmountable, does it? I'd like to be able to present an idea that those of us interested in it have agreed on, and then see what's left standing when we show it to the whole community. Does that sound feasible?
We have a tool on our intranet to help authors add the appropriate tags to any page they author. I don't know why it is sequestered off, but to give you an idea: screen shot of the DC generating tool. Also the results of filling out the form.

I'm relatively certain I could replicate it easily as a weekend project. (those sound like famous last words).

Unlike the idea for the .htaccess solution, DC is a per-file solution. You can certainly integrate it with a static template or some other form of CMS. The downside is that it is per file, whereas an .htaccess file (or some kind of .txt file convention) could be assumed to apply at the folder level. In that sense, at least, the folder-level approach is less maintenance.

I think storing the information in a way that can only be read from the file system has its uses. It would make some people more comfortable, though I think there is also a growing desire for transparency. One issue is that while (it seems) most of campus is hosting web out of AFS, there are significant groups that are not. Engineering has (or at least had, I can only assume it is still in place) a great centralized system where they were tracking owners and access to all the lockers. Complex, but powerful. With the .htaccess files, if only the "admins" need to have access to the information, then it works fine. If we are going the route of community support and collaboration, it would be problematic.

One example: let's say I find a page that has something really great, or really bad, somewhere else on the university web presence (a pretty amorphous concept). Knowing who to contact would depend on what it is I'm looking at. If there is some neat thing they are doing with JavaScript or microformats, I might need to talk to the designer/programmer. If there is content I want to get permission to duplicate, the author would be better.

If you are aiming to provide a central point of contact for any page on campus, relying on .htaccess wouldn't work too well. I guess one thing is to figure out what the desired outcome is. I think there are several slightly different motivations being voiced.

When I say "level of maintenance varies", I mainly mean that what we are doing (currently) is mostly static and per file, and it is in the source. Oftentimes people will edit, or even assume responsibility for, a page without thinking to update the metadata. We have some really thorough user training, but when you're in a rush, keeping up metadata is easy to forget. So if there is a way we can help keep the metadata up to date, that would be desirable.

I think keeping the scope narrow is a great start. Then, depending on what the goal(s) are, pick an appropriate format/convention. Looking at a body of standards, rather than just one, would be good. There are probably microformats we could consider. I don't know if there is anything like a robots.txt for what we are looking at, but if there is some convention, we could look at that as well.
OK -- so I don't want to let this go too long without moving us incrementally forward (or sideways, as the case may be). I'm not sure what the next step should be though.

It seems that the solutions we are talking about are:

1. Centralized database of website information (a full complement of info about the site), which is updated by the developer upon launch, and then by the party who is maintaining the site as time moves on.

2. Decentralized method of storing simple details about the website. Probably not in an .htaccess file, but something that can be placed in any NCSU directory. An equivalent of robots.txt was mentioned.


As mentioned before, it might be helpful to start with a narrow scope (i.e., #2 above), and then branch out from there. How about a file called something like "siteinfo.txt" that contains the following:

-- Developer/Designer Information (Name, Dept, Unity IDs, Email)
-- Last major version/launch date
-- "Supported by" information, if different than the developer information. (Name, Dept, Unity IDs, Email)

In order to help people learn or remember that the siteinfo.txt is there, would it be helpful to put a comment line into the .htaccess file that refers to it? At least then, when a permitted person does see the .htaccess file, they'll remember to check the siteinfo.txt for full information.
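To give people something concrete to say yes or no to, here is a minimal Python sketch of what generating such a siteinfo.txt could look like. The plain "Label: value" layout, the exact label wording, and the text of the .htaccess pointer comment are all assumptions for discussion, not an agreed format; the names and addresses are placeholders.

```python
CONTACT_KEYS = ("Name", "Dept", "Unity ID", "Email")

def render_siteinfo(developer, launch_date, supported_by=None):
    """Build the text of a siteinfo.txt from the three proposed items."""
    def contact(label, person):
        return ["%s %s: %s" % (label, key, person[key]) for key in CONTACT_KEYS]

    lines = contact("Developer", developer)
    lines.append("Launch Date: %s" % launch_date)
    # "Supported by" is only written when it differs from the developer.
    if supported_by and supported_by != developer:
        lines.extend(contact("Supported By", supported_by))
    return "\n".join(lines) + "\n"

# Hypothetical comment line for the .htaccess file that points at siteinfo.txt.
HTACCESS_POINTER = "# Site author and support contacts: see siteinfo.txt in this directory"

dev = {"Name": "Jane Doe", "Dept": "Example Department",
       "Unity ID": "jdoe", "Email": "jdoe@example.edu"}
text = render_siteinfo(dev, "2010-08-15")
print(text)
```

Keeping one item per line means the file stays readable in a plain text editor and is still simple to parse later.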

Anyway -- there's a start. What does everyone think? By no means does this have to be the solution, but I figured if I posted something, we can say yes or no to it, and keep this moving till we get something that everyone can agree on.
Good idea. Could #2 be leveraged to create and maintain #1? Such as, some tool would scan the host directories for the info file (I'm partial to "site.info", so it'll stand out from a regular .txt) and parse the information to the database? That way developers only need maintain one set of data. If you use a basic .ini file structure, it should be easy enough to parse and extend (see attached file for example).
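A rough Python sketch of that scan-and-load idea follows, using the standard library's configparser and sqlite3. The attached example file is not reproduced here, so the [developer] section, its key names, the table layout, and the function names are all assumptions made for illustration.

```python
import configparser
import os
import sqlite3
import tempfile

def collect_site_info(root, filename="site.info"):
    """Walk root and return one record per directory that holds an info file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        parser = configparser.ConfigParser()
        parser.read(os.path.join(dirpath, filename))
        if parser.has_section("developer"):  # assumed section name
            dev = parser["developer"]
            records.append({
                "path": dirpath,
                "name": dev.get("name", fallback=""),
                "email": dev.get("email", fallback=""),
            })
    return records

def store(records, conn):
    """Insert or refresh one row per site directory."""
    conn.execute("CREATE TABLE IF NOT EXISTS sites "
                 "(path TEXT PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany("INSERT OR REPLACE INTO sites VALUES (:path, :name, :email)",
                     records)
    conn.commit()

# Demo against a throwaway directory tree instead of real web lockers.
with tempfile.TemporaryDirectory() as root:
    site_dir = os.path.join(root, "example-site")
    os.makedirs(site_dir)
    with open(os.path.join(site_dir, "site.info"), "w") as handle:
        handle.write("[developer]\nname = Jane Doe\nemail = jdoe@example.edu\n")
    records = collect_site_info(root)

conn = sqlite3.connect(":memory:")
store(records, conn)
rows = conn.execute("SELECT name, email FROM sites").fetchall()
```

Run on a schedule, something like this would let the per-site file stay the single source of truth while the central database is rebuilt from it.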

© 2012   Created by Jason Austin.
