I see the anatomy of a (VRML 2.0/97) widget as a layer architecture (similar to the OSI layer model of networking) rather than, say, a component architecture. To explain the layers I am going to use specific examples from both my VRML widgets and 2D GUIs. This does not mean that a 2D GUI can be layered in the same way, so you will find that the analogies to 2D are occasionally a stretch.
These are the VRML -Sensor nodes, including their clamping and constraints (e.g. the cycleInterval and loop fields in the TimeSensor). This is how user input is gathered by the browser. A 2D analog is reading the mouse position or a mouse click, including things like the behavior of a modal dialog box where clicking the mouse outside the dialog does nothing but beep.
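As a minimal sketch of sensor-level clamping, the fragment below uses the standard PlaneSensor fields minPosition and maxPosition to turn a free drag into a one-dimensional slide. The DEF names DRAG and THUMB and the box geometry are illustrative assumptions, not part of any particular widget.

```vrml
#VRML V2.0 utf8
# Sketch only: DRAG and THUMB are illustrative names.
Group {
  children [
    DEF DRAG PlaneSensor {
      minPosition 0 0    # clamp X to the range 0..1
      maxPosition 1 0    # min == max in Y pins Y to 0
    }
    # The sensor reacts to drags on its sibling geometry.
    DEF THUMB Transform {
      children Shape {
        appearance Appearance { material Material { } }
        geometry Box { size 0.2 0.2 0.2 }
      }
    }
  ]
}
# The clamped drag position moves the box itself.
ROUTE DRAG.translation_changed TO THUMB.set_translation
```

Because the clamping happens inside the sensor, nothing downstream has to check the range again.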
The VRML -Interpolator nodes fall into this layer, as does a certain amount of scripting. This layer is concerned with type casting (e.g. extracting the X coordinate of an SFVec3f into an SFFloat), arithmetic operations (both those which require a script and those which can be performed with the right -Interpolator), logical (boolean) operations/gates, etc. The 2D equivalent would be the mapping in a scrollbar from screen pixels to fraction of the bar, or the operation of radio buttons (i.e. "only one selected at a time" can be represented by a series of logic gates). (I wrote a paper about implementing some of this.)
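The two kinds of glue mentioned above might look roughly like this: a Script node that casts an SFVec3f down to its X component, and a ScalarInterpolator that does the arithmetic of scaling 0..1 to 0..255 without any scripting. The names X_OF, SCALE, and DRAG are assumptions made for this sketch.

```vrml
#VRML V2.0 utf8
# Sketch only: X_OF, SCALE and DRAG are illustrative names.
Group {
  children [
    DEF DRAG PlaneSensor { minPosition 0 0 maxPosition 1 0 }
    Shape { geometry Box { size 0.2 0.2 0.2 } }
  ]
}

# Type cast: SFVec3f in, SFFloat (the X component) out.
DEF X_OF Script {
  eventIn  SFVec3f set_vec
  eventOut SFFloat x_changed
  url "javascript:
    function set_vec(value, timestamp) {
      x_changed = value.x;
    }"
}

# Arithmetic with no script: linearly map 0..1 onto 0..255.
DEF SCALE ScalarInterpolator {
  key      [ 0 1 ]
  keyValue [ 0 255 ]
}

ROUTE DRAG.translation_changed TO X_OF.set_vec
ROUTE X_OF.x_changed TO SCALE.set_fraction
# SCALE.value_changed now carries a number between 0 and 255.
```

A piecewise-linear interpolator covers any scale-and-offset mapping this way. Anything nonlinear or boolean generally needs a Script.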
This is where actual widgets come in. You have a -Sensor which you want to attach to some geometry (ignoring TimeSensors for the moment) and some internal glue to map the values the -Sensor produces to the desired output (SFFloat, SFVec3f, SFRotation, whatever); all that is missing is the geometry itself. This should presumably be pluggable in the sense that it can be represented outside the PROTO definition of the widget and receive events and such from outside the PROTO scope. An example here is the TwistSlider, which is nothing more than a BoxSlider and a Slider with some external glue (see below) making it twist as it slides (and making the thumb on the slide change size as it moves). In a 2D GUI, consider the size, color, and label on a button.
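One way to make the geometry pluggable is to pass it into the PROTO through an MFNode field. The sketch below is a hypothetical SimpleSlider, not the actual Slider, BoxSlider, or TwistSlider definition; the field name thumbGeometry and the eventOut fraction_changed are assumptions made for illustration.

```vrml
#VRML V2.0 utf8
# Sketch only: SimpleSlider and its interface are hypothetical.
PROTO SimpleSlider [
  field    MFNode  thumbGeometry [ ]   # caller-supplied thumb nodes
  eventOut SFFloat fraction_changed    # 0..1 position of the thumb
] {
  Group {
    children [
      DEF DRAG PlaneSensor { minPosition 0 0 maxPosition 1 0 }
      DEF THUMB Transform { children IS thumbGeometry }
    ]
  }
  # Internal glue: cast the drag position to a single float.
  DEF X_OF Script {
    eventIn  SFVec3f set_vec
    eventOut SFFloat x_changed IS fraction_changed
    url "javascript:
      function set_vec(value, timestamp) {
        x_changed = value.x;
      }"
  }
  ROUTE DRAG.translation_changed TO THUMB.set_translation
  ROUTE DRAG.translation_changed TO X_OF.set_vec
}

# The thumb is declared where the widget is used, so it can be
# DEF'd and wired to other nodes from outside the PROTO.
DEF S SimpleSlider {
  thumbGeometry DEF BALL Transform {
    children Shape {
      appearance Appearance { material Material { } }
      geometry Sphere { radius 0.1 }
    }
  }
}
```

Since BALL is named at the point of use rather than inside the PROTO body, ROUTEs in the enclosing file should be able to target it directly. That is the kind of external glue a twisting or resizing thumb would need.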
This is the layer that attaches user input garnered from widget output to action in the virtual world. It is also desirable to express feedback to the user with widgets, such as having a Slider representing a number from 0 to 255 and 8 Buttons representing the number in binary (see my Button Demo). I can go on and on about this layer, but I will restrain myself.
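The feedback case can be sketched as a Script that splits a number from 0 to 255 into eight SFBool outputs, one per bit. This is an assumed illustration, not the code behind the Button Demo; TO_BITS and its event names are hypothetical, and the Slider and Button interfaces are not shown here, so the ROUTEs appear only as comments.

```vrml
#VRML V2.0 utf8
# Sketch only: TO_BITS and its event names are hypothetical.
DEF TO_BITS Script {
  eventIn  SFFloat set_value
  eventOut SFBool  bit0_changed
  eventOut SFBool  bit1_changed
  eventOut SFBool  bit2_changed
  eventOut SFBool  bit3_changed
  eventOut SFBool  bit4_changed
  eventOut SFBool  bit5_changed
  eventOut SFBool  bit6_changed
  eventOut SFBool  bit7_changed
  url "javascript:
    function set_value(value, timestamp) {
      // Round down and clamp to the 8-bit range.
      var n = Math.floor(value);
      if (n < 0)   n = 0;
      if (n > 255) n = 255;
      bit0_changed = (n & 1)   != 0;
      bit1_changed = (n & 2)   != 0;
      bit2_changed = (n & 4)   != 0;
      bit3_changed = (n & 8)   != 0;
      bit4_changed = (n & 16)  != 0;
      bit5_changed = (n & 32)  != 0;
      bit6_changed = (n & 64)  != 0;
      bit7_changed = (n & 128) != 0;
    }"
}
# Wiring, with SLIDER and BUTTON0 standing in for real widgets:
#   ROUTE SLIDER.value_changed TO TO_BITS.set_value
#   ROUTE TO_BITS.bit0_changed TO BUTTON0.set_state
```

For example, a value of 5 sets bits 0 and 2 and clears the other six.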
It can be as simple as ROUTE'ing the output from a Button to the set_enabled eventIn of a (looping) TimeSensor controlling an animation, thus connecting the state of the Button with whether the animation is running or not. It can be at least as complex as three Sliders combining their values into the X, Y, and Z translation of an object or the viewpoint. This is where type casting, logic gates, and arithmetic operations are most generally useful. A 2D example would be a slider that controls the volume of a CD that is playing.
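The simple case can be sketched as follows. Since the Button widget's interface is not given here, a TouchSensor plus a small toggle Script stands in for it. TOUCH, TOGGLE, CLOCK, SPIN, and TARGET are names invented for this example.

```vrml
#VRML V2.0 utf8
# Sketch only: a TouchSensor and toggle Script stand in for a Button.
Group {
  children [
    DEF TOUCH TouchSensor { }
    Shape { geometry Box { size 0.5 0.5 0.1 } }
  ]
}

# Flip a stored boolean on every press.
DEF TOGGLE Script {
  eventIn  SFBool set_active
  field    SFBool state FALSE
  eventOut SFBool state_changed
  url "javascript:
    function set_active(value, timestamp) {
      if (value) {              // act on press, ignore release
        state = !state;
        state_changed = state;
      }
    }"
}

# Looping clock that starts out disabled.
DEF CLOCK TimeSensor { cycleInterval 4 loop TRUE enabled FALSE }

# One full turn about the Y axis, in thirds to keep the direction unambiguous.
DEF SPIN OrientationInterpolator {
  key      [ 0 0.3333 0.6667 1 ]
  keyValue [ 0 1 0 0, 0 1 0 2.0944, 0 1 0 4.1888, 0 1 0 6.2832 ]
}

DEF TARGET Transform {
  translation 2 0 0
  children Shape { geometry Cone { } }
}

ROUTE TOUCH.isActive        TO TOGGLE.set_active
ROUTE TOGGLE.state_changed  TO CLOCK.set_enabled
ROUTE CLOCK.fraction_changed TO SPIN.set_fraction
ROUTE SPIN.value_changed    TO TARGET.set_rotation
```

The only line that belongs to this layer is the ROUTE into CLOCK.set_enabled. Everything before it is widget, and everything after it is the world being controlled.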
This is the layer we want the user to see and use in, one hopes, an intuitive manner. Consider the CosmoPlayer Examiner dashboard (I like the SGI one better) as such a VRUI. There are widgets to rotate the world, translate the world, zoom in on a particular object, etc. These all work together to form a user interface that is easy to understand and use (no, really, the SGI dashboard is better than the one on the PC). Consider a 2D dialog box, including buttons, text fields, etc. as analogous.