Last modified: 980124

WEBER HELP FILES ON CREATING A WEB PAGE

The following files were downloaded from weber's help pages. I reformatted them slightly to put them into web page format and added an occasional comment in italics (with my initials after it). If there are ambiguities that could be due to my modifications, check the original files on weber by typing "help" at the % prompt and then "web" when you get to the help pages. If you have questions about anything on this page, remember that it was created by the weber consultants, not by yours truly. The items I have put in italics in the table of contents I regard as absolutely, totally, and completely crucial. There is no reason to talk to you if you haven't read them. DKJ
Contents:
File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/www,d

Intro to World Wide Web (WWW)

Much has been written about the emerging Information Super Highway (ISH). For Weber users, access to the ISH is available now through WWW (World Wide Web) browsers, which include Netscape for Macintosh and Windows PCs as well as a vt100 terminal-based browser called Lynx. These Web browsers (there are many) access documents written in the "Hypertext Markup Language" (HTML). They are intended to be self-guiding: once the browser is started, a limited set of instructions is printed at the top or bottom of the screen, depending on the browser.
The following, indented material is a bit obsolete and probably irrelevant to us. DKJ

Lynx is already installed on Weber. The Netscape software can be obtained from Network Operations on the first floor in AP&M. (Bring your checkbook; it costs $2.) You can only use Netscape or other visual WWW browsers if you are connected via ethernet in your office or a SLIP connection from home. If you only have a LAN connection you will have to use Lynx.

For those with a VT100 (or a VT100 emulator), use Lynx (invoked by the command "lynx"). If you are using a VT100 terminal emulator, such as a MicroTerm 4520, you will need to key in the command "setenv TERM vt100" before you start Lynx. Note that with lynx there are limitations on information display since VT100 terminals can only display ASCII text and not graphics. Not all VT100 emulating terminals have been tested.

To run Netscape from your Windows PC (DOS & OS2 are not supported) or Macintosh (requires SYS7 and MAC II or higher), you must have access to the internet. If you need assistance with installation and setup, SSCF staff can help.

In browser terminology, the addresses of information databases are referred to as "URLs" (Uniform Resource Locators). Inside Netscape, a URL can be changed by pulling down the "FILE" menu and selecting "Open Location" and then specifying the URL. Inside Lynx, one uses the "G(o)" key to initiate a URL change.

See "help internet" for more information on other internet resources.


File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/www,d

The Social Science Division Web Site

The Social Science Computing Facility would like to announce the Social Science Division Web Site. This site contains links to departments in the division, databases, info on creating your own web page, and more.

The URL is http://anthro.ucsd.edu/


File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/www,d

Setting Up Your Account for Web Pages

There is a way to make your own web page, known as a "home page", available to those on the Internet. The next couple of articles describe setting up a home page.

At the weber % prompt execute the command 'htmlsetup'. This command will create a directory called public_html under your home directory and copy a template homepage called index.html into that directory. It will also create a link under your public_html directory to the Weber Graphics directory. See article on Graphics for more info.

Note: You can have as many html documents as you want. The only way they can be viewed through WWW is if they are in your public_html directory. Make sure any files you want others to be able to look at have read and execute privileges for others. The Unix command is:

chmod og+rx filename(s)
In my experience this is the default status of anything you put in your public_html subdirectory, so it happens by itself. DKJ
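
As a quick illustration of the chmod command above, the following sketch works in a throwaway directory rather than a real public_html. The file name page.html and the private starting mode 600 are assumptions chosen only for the demonstration.

```sh
# Work in a scratch directory so nothing in a real account is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/page.html"

# Start from a private file (owner read/write only); assumed for the demo.
chmod 600 "$demo_dir/page.html"

# Grant read and execute to group and others, as in the help text.
chmod og+rx "$demo_dir/page.html"

# The first ten characters of the long listing are the permission string.
perms=$(ls -l "$demo_dir/page.html" | cut -c1-10)
echo "$perms"    # prints -rw-r-xr-x

# Remove the scratch directory again.
rm -rf "$demo_dir"
```

If the listing already shows r and x in the group and other columns, the chmod step changes nothing, which fits the comment above that this is often the default.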
File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/www,d

Editing Your Home Page

Make any changes that you want to the index.html file.
The file called "index.html" created in the previous step is a generic home page waiting for you to substitute your own name for the words "Your Name" and that sort of thing. Editing is therefore necessary, and the fact that they have already created a template means you can do it without actually knowing any of the commands they have in there as long as you don't change the commands themselves until you understand what they do.

The editing can be done with any word processor because web pages are merely text files containing HTML commands and named with a name ending in a dot and the letters htm or html (your choice). The ONLY program you need to edit a web page is a word processor, the more primitive the better. To view the file as a web page the ONLY program you need is a web browser, typically Netscape or Internet Explorer; for practical purposes they are interchangeable.

If you have a stand-alone computer and wish to edit your pages off-line, you can do this and then upload the finished text files exactly the same way you would upload any other text file. Most people these days do this using FTP ("File Transfer Protocol") and several programs are available, usually by that name, for doing this much the way you go about getting your mail or otherwise accessing your account. If you prefer to edit your pages directly in weber using any of weber's text editors, that works fine too. DKJ

Learn HTML to individualize your home page. Here is the URL for the HTML Primer:

http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html

You can of course rename your home page from index.html to anything you prefer. The only advantage to calling it index.html is that you do not have to explicitly name it in the URL to view your homepage. It is the default. For example, if your home page is called index.html then its URL is

http://anthro.ucsd.edu/~username

(username is the name you use to log in).

If it is called something else (e.g. home.html ) then its URL is

http://anthro.ucsd.edu/~username/home.html
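
The naming rule above can be sketched as a small shell function. The function name home_page_url and the login name jdoe are hypothetical; only the two URL patterns come from the text.

```sh
# Build the URL for a page in a user's public_html directory.
# $1 = login name, $2 = optional file name (omit it for index.html).
home_page_url() {
    if [ -z "${2:-}" ] || [ "$2" = "index.html" ]; then
        # index.html is the default, so it need not appear in the URL.
        printf 'http://anthro.ucsd.edu/~%s\n' "$1"
    else
        printf 'http://anthro.ucsd.edu/~%s/%s\n' "$1" "$2"
    fi
}

home_page_url jdoe              # prints http://anthro.ucsd.edu/~jdoe
home_page_url jdoe home.html    # prints http://anthro.ucsd.edu/~jdoe/home.html
```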


File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/www,d

Registering and/or Deregistering Your Home Page

If you want to have your home page listed under the Weberian Home Pages list, then at the weber % prompt execute the command 'reghome'. This command will ask you for your first name, last name, username, and the filename of your homepage.

Your name will appear on the list automatically and you will get an email message for confirmation.

If you decide that you don't want to have your page registered or if you feel you have made a mistake you can deregister your home page. Simply enter the command 'dereghome' at the weber % prompt.

The best way of learning about the Web is to start browsing, also known as "Surfing the Net".


File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/www,d

Using the Graphics directory on Weber

Web browsers such as Netscape can display graphics stored as .gif files. A gif is just a graphics file: an image that can be used to enhance or personalize an HTML document.

To save disk space we have collected numerous gifs that can be shared by weber users. To view the available gifs, the URL is

http://anthro.ucsd.edu/Graphics/Graphics.html

The html command below each gif on the above URL indicates how you would refer to the gif in an html file. For this reference to work you must have a symbolic link to the Graphics directory under your public_html directory. You can do this by either typing 'htmlsetup' at the weber % prompt or by typing

ln -s /usr/local/lib/ftpd/httpd/Web/Graphics Graphics

at the weber % prompt while you are in the public_html directory.
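
To see what the ln command above produces, here is a sketch that runs it in a scratch directory standing in for public_html. Away from weber the target path will not exist, so the link dangles; the point is only to show how to confirm that the link was created and where it points.

```sh
# Stand-in for ~/public_html; a scratch directory is used for the demo.
public_html=$(mktemp -d)

# Same command as in the help text, run from inside that directory.
( cd "$public_html" && ln -s /usr/local/lib/ftpd/httpd/Web/Graphics Graphics )

# ls -l shows the link and its target after the "->" arrow.
ls -l "$public_html/Graphics"

# The -L test is true when the name is a symbolic link.
if [ -L "$public_html/Graphics" ]; then
    link_status="symlink"
else
    link_status="missing"
fi
echo "$link_status"    # prints symlink

# Remove the scratch directory again.
rm -rf "$public_html"
```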

To save disk space we would like just one copy of a gif on weber at a time. If you download some gifs that are publicly available, instead of keeping a copy in your directory please copy them to /usr/local/lib/ftpd/httpd/Web/Graphics/Contrib

All gifs in this area will be sorted and made available to all weber users.

My impression is that the above system has broken down pretty completely. The gifs in the collection are uninteresting and more difficult of access than this suggests. Unless you are going to use a lot of them, this may not be worth bothering with. DKJ

Interesting URL's

The following is a short list of URL's which may be interesting:
UCSD InfoPath http://infopath.ucsd.edu/
CS Departments Worldwide http://www.cs.colorado.edu/homes/mcbryan/public_html/bb/3/summary.html
A Conglomeration of Mosaic Hotlists http://www.stolaf.edu/people/staff/fritchie/hotlists.html
Encyclopedia Britannica http://www.ed.com/eb.htm
Index of Images Storehouse http://cornea.mbvlab.wpafb.af.mil/image_storehouse
Internet Resources Meta-Index http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/MetaIndex.html
The Mother-of-all BBS http://www.cs.colorado.edu/homes/mcbryan/public_html/bb/summary.html
Weather Maps and Movies http://rs560.cl.msu.edu/weather/
The WWW Virtual Library:Subject Catalogue http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html
The WWW Initiative: The Project http://info.cern.ch/hypertext/WWW/TheProject.html


File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/checker,d

Checking URL Links in HTML Documents

The World Wide Web is constantly changing. Because of the constant growth, URL links in HTML documents become invalid when sites move and change. On weber we currently have a program called 'checker' which helps users keep their HTML documents up-to-date and useful.

File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/checker,d

Quick Start With 'Checker' Program

Create a file called .checkercfg in your home directory. It should contain the following lines:

-base-address http://anthro.ucsd.edu
-current-address http://anthro.ucsd.edu/~username
-only-errors
Then to check all of your html files in your public_html directory type the following:
% cd public_html
% checker *.html
If you have html files in subdirectories under the public_html directory you have to check the files in each directory separately. For example, if you had a subdirectory in your public_html directory, you would check the files in the subdirectory with the following commands:
% cd subdir
% checker -current-address http://anthro.ucsd.edu/~username/subdir *.html
The 'checker' program will output messages indicating which URLs are no longer valid.
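
Since each directory has to be checked separately, a loop can write out the commands for you. The sketch below only prints one checker command per directory and does not run checker itself, which is assumed to be available only on weber. The function name checker_commands and the login name jdoe are hypothetical, and the sketch assumes directory names without spaces and a root path given without a trailing slash.

```sh
# Print (but do not run) one checker command per directory under a
# public_html tree. $1 = path to public_html, $2 = login name.
checker_commands() {
    root=$1
    user=$2
    find "$root" -type d | sort | while IFS= read -r dir; do
        # Path of this directory relative to the public_html root
        # (empty for the root itself, "/subdir" for a subdirectory).
        rel=${dir#"$root"}
        echo "(cd $dir && checker -current-address http://anthro.ucsd.edu/~$user$rel *.html)"
    done
}

# Demonstration on a scratch tree with one subdirectory.
demo=$(mktemp -d)
mkdir -p "$demo/subdir"
cmds=$(checker_commands "$demo" jdoe)
echo "$cmds"
rm -rf "$demo"
```

Each printed line can be pasted at the % prompt once you have confirmed that it names the directory you expect.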
File:: anthro.ucsd.edu:/usr/local/lib/info/offerings/userhelp/checker,d

Manual Page for 'Checker' Program

NAME Anchor Checker -- check anchors in a HTML document for invalid links

By: George Xie(q7f192@ugrad.cs.ubc.ca)

SYNOPSIS checker [options] html-documents

What do they do:
  -help                          print a help screen (not much right now)
  -version                       print out program name and version number
  -trace                         turn on trace mode
  -base-address http-address     relative address root (see below for description)
  -current-address http-address  address up to directory of the documents (see below)
  -config-file file              specify where the program can find the config file
  -only-errors                   output only errors
  -remoteurl URL                 check remote document
  -tmpdir directory              temporary directory
  -detail                        show last modified time and server type
  -verbal                        report total links after checking each document
  -update-notice                 tell you if the document has changed since your last visit
  -no-smtp                       do NOT check telnet and mailto tags
  html-documents                 documents you want to check

DESCRIPTION Checker is used to check links in your HTML documents. You can use shell wildcards to specify your files, for example: checker *.html. You can control its behavior by giving it some options. -help will print out all the options you can use. -version prints out the version of the program you are using. Neither of these options does any actual checking. You can trace the program by specifying the -trace option, though the output will not be of interest to many people. If you want the program to print only invalid links, -only-errors is the option you want to use.

Options -base-address and -current-address are a little complicated, so please pay attention to the following description. When you specify -base-address, you must give the address of your http server in URL form, for example, in my case, I will do: -base-address http://www.ugrad.cs.ubc.ca Any of your links that start with a "/" will be appended to this base address.

For example, in my document, I have: <A HREF="/spider/q7f192/home.html">. The whole URL will be http://www.ugrad.cs.ubc.ca/spider/q7f192/home.html. Is this clear? -current-address works almost the same way. Except that links without a "/" in front are appended to current-address. For example: -current-address http://www.ugrad.cs.ubc.ca/spider/q7f192. And in my document, I have <A HREF = "branch/checker.html">. Then the whole URL will resolve to http://www.ugrad.cs.ubc.ca/spider/q7f192/branch/checker.html.
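
The two rules just described can be written as a short shell function. This is only a sketch of the behavior as the text states it, not checker's actual code. The function name resolve_link is hypothetical, and the first branch (passing through links that are already full URLs) is an assumption not spelled out above.

```sh
# Resolve a link following the two rules described in the text.
# $1 = -base-address value, $2 = -current-address value, $3 = HREF value.
resolve_link() {
    case $3 in
        *://*) printf '%s\n' "$3" ;;         # assumed: full URL, leave as is
        /*)    printf '%s%s\n' "$1" "$3" ;;   # leading "/": append to base address
        *)     printf '%s/%s\n' "$2" "$3" ;;  # otherwise: append to current address
    esac
}

base=http://www.ugrad.cs.ubc.ca
current=http://www.ugrad.cs.ubc.ca/spider/q7f192

resolve_link "$base" "$current" /spider/q7f192/home.html
# prints http://www.ugrad.cs.ubc.ca/spider/q7f192/home.html
resolve_link "$base" "$current" branch/checker.html
# prints http://www.ugrad.cs.ubc.ca/spider/q7f192/branch/checker.html
```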

From 0.97b, I added a function to scan for <base> tag in head section. If you know how to use base, you don't need -base-address, and -current-address anymore.

To check a remote HTML document, use the option -remoteurl document, where document is a full URL. You must have another program, either lynx or www (the line mode browser), installed on your system to check a remote URL. You can get www at the end of this document. I use lynx, or www, to download the remote URL to your machine and check it. Therefore, you should specify a temporary directory to store the file using the option -tmpdir directory. If you don't specify one and you want to check a remote document, the default is the current directory. One thing you should know is that checker does not delete this temporary file after it exits. I deliberately avoid doing that for obvious reasons. The file name is unique, because I use the time of downloading the file as the file name. For those who don't know what I am talking about, the file name consists entirely of digits. I am telling you all this in case you want to delete these files yourself. What I do is specify /tmp as my temporary directory and let my system take care of deleting.
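
If you do want to clean up by hand, a cautious first step is to list the candidates. The sketch below prints the names of regular files in a directory whose names consist only of digits and deletes nothing. The function name list_checker_temp_files and the sample file names are hypothetical, and other programs may also create all-digit names, so look over the list before removing anything.

```sh
# List files in a directory whose names consist only of digits.
# $1 = the directory that was given to -tmpdir.
list_checker_temp_files() {
    for path in "$1"/*; do
        [ -f "$path" ] || continue       # skip directories and non-matches
        name=${path##*/}                 # strip the directory part
        case $name in
            *[!0-9]*) ;;                 # contains a non-digit: skip
            *) printf '%s\n' "$name" ;;  # digits only: candidate temp file
        esac
    done
}

# Demonstration on a scratch directory with made-up file names.
scratch=$(mktemp -d)
touch "$scratch/797205772" "$scratch/notes.txt" "$scratch/123abc"
found=$(list_checker_temp_files "$scratch")
echo "$found"    # prints 797205772
rm -rf "$scratch"
```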

You can also request extra information about remote document by using -detail option. Other than normal information, checker will show you the last modified time of the remote document, and remote server type. Please remember that this option only works for HTTP protocol. Some servers do not return these two optional fields. I am not sure what I am going to do with these two fields in the future. Do you have any suggestion?

You can suppress the total links report after each document by NOT specifying the -verbal option. If you have over 600 documents to check, you might appreciate this.

-update-notice will tell you if a link has been updated since your last visit. This only works on a netscape bookmark file because the bookmark contains last visit information. Again, this also depends on the server returning the last modified time. If the server doesn't return the last modified time, there is no way I can tell if the link has been changed or not.

If telnet and mailto tags give you trouble when you are using checker, you might want to disable these checks by using the -no-smtp option. It is hard to get these two working correctly on all UNIX platforms. Not many people will pay attention to these two links anyway.

So you ask, "do I have to type in all these options every time?". NO! You can put all these options in a config file. Then point checker to this config file at run time with checker -config-file file-name. If you do not specify a config file, the default config file is $(HOME)/.checkercfg, i.e. it's in your home directory. Checker will run happily without a config file. If you specify command line options that are already in your config file, then the options on the command line will override the ones in the config file. Here is an example of my config file:

# Line starts with a "#" will be ignored
# order and case of these options are not significant.
#-help
#-version
#-trace
#-only-errors
# ignore blank lines

-base-address http://www.ugrad.cs.ubc.ca
-current-address http://www.ugrad.cs.ubc.ca/spider/q7f192
-remoteurl http://www.ugrad.cs.ubc.ca/spider/q7f192/home.html
-tmpdir /tmp
-detail
-verbal
-update-notice

When checker starts, it searches through your page and looks for links. As soon as it finds one, it will try to follow this link. If it cannot follow the link, that means the link is probably invalid. Checker will display a one-line message for both valid and invalid links. Checker can handle most of the popular protocols: HTTP, FTP, telnet, gopher ...

DIAGNOSTICS

Scanning file "my-cat.html"...

line-number: SUCCESS http://XXXXXXXXX/YYYYY
line-number: OK      http://XXXXXX/YYYYY
        Notice: document's changed since last visit on Thu Apr  6 15:02:52 1995
......
Scanning file "my-dog.html"...
Line number is where the link appears in your document. If you receive a SUCCESS message, it means that the link is valid. If you receive an ERROR message using the HTTP protocol, that link is very likely invalid. An OK message basically means the same thing as SUCCESS. The most interesting message is "DENIED". This message occurs when an ftp server denies your anonymous login, either because it's busy or because it just does not allow anonymous login. You get an error if the site you want to telnet to doesn't exist. An error can also be generated when you specify an invalid email address. Other messages are self-explanatory.

