Geoff Bowker, Graduate School of Library and Information Science
(I,NU) = Inaudible or not understandable
Indent = new speaker
All right. I'm here today interviewing Geoff Bowker, who's a professor in the School of Library and Information Science here at the University of Illinois. Today is the second of February, 1999, and we're going to discuss some of the issues around classification. Geoff, why don't we just start with where does your interest in classification come from, or what is it related to?
Ok, well I've been involved in a series of studies over the last 7 or 8 years now, looking at classification systems in medicine. So we've looked at the International Classification of Diseases, classifications of nursing work, classifications of viruses, and built up from there into race classification systems as well. Looking at the social, political, and ethical aspects of classification systems, what work do they do? That we often think of classification as something that is done before the intellectual work gets done, before the political work gets done, before the/it's just kind of setting the scene so that we can get on and do it/and do the real work and the real analysis. And the basic point of my analysis in the last several years has been looking at ways in which social and political and ethical decisions are in fact made at that earlier stage, when we decide what entities there are in the world and how we classify those entities, how we name them. That actually is the site of a lot of very significant decisions. A typical case as far as medical classification goes, for example, is looking at the classification of abortion. Classification of the moment of death has been a huge debate within different countries based on Catholic, Protestant, or other understandings of what it takes for an entity to be alive/is it alive at the moment of conception? Is it alive as it comes out of the womb? What counts as a stillbirth? What counts as a live birth? These have been sites of political debate and ethical debate at the same time as they're medical classifications which are being introduced for epidemiological and other purposes. So it's trying to understand, trying to dig down and understand what is held by the classification systems that we have in our daily lives.
Can you give an example of how you go about doing that?
Sure! The/and part of it is very simple. It's following a technique which has been pioneered in library science by Sandy Berman, who sat down and read the Library of Congress classification system, just from A to Z, how is each classification structured? And he's made the argument that if you look at, for example, classification of Native Americans in LC is very different/Library of Congress is very different from the classification of Black Americans, classification of women is very different from the classification of men. Obviously you'll get lots of women who are associated with problems, Black Americans who are associated with problems in the Library of Congress system. Men tend to be the unmarked category in the system. Native Americans are associated with belief structures, belief systems, everything that we associate with anthropology. Now in a sense that's a fair representation of the literature that's out there. And so there's definitely a feedback (I,NU). But what's significant and I think/what we need to think about when we deal with classification systems is that it sets up another kind of feedback (I,NU) that when I want to study for example women or I want to study for example Native Americans or Black Americans, then I will find it much easier to follow the paths which the classification system has opened up for me. So I'll find it much easier to talk in terms of problems. If I wanted to talk about belief structures within/not in the Native American community, but in for example the white community, I find that very difficult to follow up. The information will be there, but it will be highly scattered. The classification system will hide it from me. So part of it is just doing a reading of the classification systems and trying to be aware of, well, what's there but also what's not there? What is it difficult for me to do with this classification system? Or what is it easy for me to do with that system? So that's one aspect of the study.
A second aspect is to do an ethnographic analysis of the way in which classification systems actually get used in practice. So we've looked at classifications of nursing work for example. And gone into the hospitals and looked at ways in which the classifications are actually used in practice. And again seeing what do people choose to represent about their working lives and what they choose not to represent. And how does that create/build up a picture over time which is a skewed picture. Not of itself a false picture, but it's missing out certain things and it's emphasizing other things. So overall it ends up carrying a moral and political message at the same time as acting as a formal and scientific classification of nursing work.
For instance if I'm a nurse, from what I do within an 8-hour shift, what do I write down on a chart? The difference between that and what I actually did?
Well exactly, and many nurses are getting into computerized systems now and a typical problem for both nurses and doctors in fact, but especially for nurses, is that they are very busy. You know they're working 50-60 hour weeks. They don't see the point in writing down every last thing that they're doing, just to make the administrators happy as far as the classification system goes. On the other hand, people doing research about/well what is it that nurses do? They want to be able to pick on something, you know, and show the value of it and show the range of activities. Now the problem is what they choose not to represent, or what they have difficulty representing, and it tends to get factored out of the equation. So in fact the nurses that we've been studying/one of the interesting cases, for example, is that they have classified humor as a nursing activity, so telling a joke. Now hospital administrators have a lot of problems with that. But they say, well, it's part of our job and it's something that we do every day and we want to have this count as part of our work, so they've been fighting for the classification system. And that's been part of their struggle, a recognition of the categories that/the ways in which they choose to classify their own work.
You were talking earlier/you mentioned that the web is often categorized as an unclassified resource. At the same time you started to talk about some of the ways in which information is sorted and categorized on the web. Could you speak to that tension?
Absolutely! Yeah, there are two extremes when we go to search for information on the web (I,NU). One is the kind of classification system you get through Yahoo, which is a fully hierarchical system and, you know, you can do word searches on it. But let's take it the hierarchical (I,NU) that you can follow down from a high level topic like a country, you can go down to a region, from a region you can go down to a municipality, from a municipality to a town, and from the town you can branch out to various activities in the town. So it's setting up a huge complex hierarchical classification system. Now these classification systems have exactly the same problem as the system/the huge system like the Library of Congress system does. They break the world up in certain ways, and they make it easy for us to view the world in certain ways, and they end up hiding things which don't fit the normal categories, which are just not standard forms of behavior, not standard forms of self-description. Multi-disciplinary, multi-modal kinds of activities tend to get locked out of a hierarchical classification system. However, at the other end of the (I,NU) you've got a search engine like AltaVista, which is the one that I always use. With AltaVista you will just type in a couple of key words, you can refine the search a little bit with (I,NU) and then you just get what seems to be a fully undifferentiated list ation (I,NU) that problem to some extent. And a second problem is that behind the scenes, and this is the really important point about the kind of infrastructure that we're building up on the web, behind the scenes a lot of decisions are actually being made through the search engines like AltaVista, about how they present their findings. So first of all it's possible for (I,NU) user to trick the site one way or another by loading their own site with key words which then get picked up by the AltaVista search engine.
So they will get/so the expert web designer will actually be able to guarantee more hits for their site by throwing in lots of key words that they think are likely to be attractive to the audience.
Because they know which kinds of key words tend to get hit more often?
Absolutely, and also they know how to write a web page. You don't want to fill a web page with 500 key words, but you can bury them in your HTML codes so they'll get picked up by the search engine anyway. That's something that certain kinds of expert (I,NU) and others won't. And the second problem with an AltaVista style search is that AltaVista itself will, for a fee, guarantee that certain kinds of site/that certain sites will come up as the first answer to given queries. So if you're the official fan club for Brad Pitt, you can guarantee that the Brad Pitt fan club will always come up first under Brad Pitt. That's/but that's at a cost. And so you as a user of the internet think that you're getting a free, fair access to all of the information that's out there in the world of the web. But you're not. You're getting it directed in a couple of ways. First of all by the expert users and secondly by the search engine providers themselves.
Any idea how they actually make that more accessible once you pay your fee? What do they do to make it hit more often than the other sites?
I've no idea. It's very easy to (I,NU). It's just saying ok when these words come up...
...they might use some of the same techniques that the expert web (I,NU) would use.
They just use a straight filtering mechanism. (I,NU) I mean the query comes up Brad Pitt and then it just goes (I,NU) subroutine that says ok Brad Pitt's there, hey that has to go to that particular page, first of all. Then that's/you see that very markedly right now in AltaVista. That they have the related sites featured as well. And that related site (I,NU) you don't get to be a related site unless you've given AltaVista some money or made some sort of arrangement with them. Search engines are very powerful economic and political tools as far as the internet goes.
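[Editorial illustration, not part of the interview.] The "straight filtering mechanism" described in this exchange can be pictured with a small Python sketch. Everything in it is hypothetical: the `SPONSORED` table, the example URLs, and the `rank_results` function are invented for illustration and make no claim about how AltaVista or any real search engine was implemented.

```python
# Hypothetical table: query phrase -> page that has paid to appear first.
SPONSORED = {"brad pitt": "http://example.com/official-brad-pitt-fan-club"}


def rank_results(query, organic_results):
    """Return the organic results with any paid page for this query moved to the top."""
    key = " ".join(query.lower().split())  # normalize case and whitespace
    paid = SPONSORED.get(key)
    if paid is None:
        # No arrangement exists for this query: the organic order is untouched.
        return list(organic_results)
    # Put the paid page first and drop its duplicate from the organic list.
    return [paid] + [url for url in organic_results if url != paid]
```

The point of the sketch is that the override sits outside the ordinary ranking, so a user looking at the result list has no way to tell that the first entry was placed there by arrangement.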
Did they find/do they find that people tend to use the same/Yahoo, the hierarchical search engine that will have access actually to many fewer things than (I,NU). Do they find that people tend to use those more often than the general user because it is hierarchical? Because it's sorted into easy categories?
I've not seen the actual/the figures on that/I'd be interested/that would be my suspicion. I mean certainly the people that I know who use aol.com or the kind of/the general user, like for example my in-laws, you know, kind of in their sixties, they discovered the web a few years ago, they're quite frightened in a sense by the power of it. And they found it a lot easier if they can follow through a set of/through a set of procedures to get them where they want to go in a sort of easy sequence of steps. And that's why/I mean that's my reading of why so much money has been spent/ridiculous amounts of money on the take-overs lately of GeoCities, of Yahoo. It's the whole idea of if you own a gateway site and if the gateway itself can direct the use of the internet user in particular directions, then that's obviously going to be worth an awful amount of money and is going to prove that the valuations are worth it. Something (I,NU) I feel is still/you know despite what I said, it still gives you more freedom.
You talk a lot about sort of/start to talk about credibility and access issues in terms of getting to the site.
Yeah.
Can you talk a little bit about how you think credibility is being established within a site? When we pick up a journal or a magazine or a book, we have certain markings of credibility, publishing companies and whatnot. How is this being established now? Do you have any sense?
A little bit (I,NU) I think that's/I mean people have talked about this as being a huge problem. I don't see it as being a problem and I think over the next 10-15 years we're going to generate/as a culture we're going to generate pretty good ways of reading sites. People are very good about reading CD labels, for example, and understanding CD labels. And that's a technology that was/has only been there for the last 10 or 15 years. We used to have problems with (I,NU) and e-mail and I think that's/people are now much more aware of that. We already have a cultural set of norms in place. I've done a kind of mini-study in class last year around how people judge credibility of web sites, and there are certain indications. One nice one is length of address. If you have a very short address, http://whitehouse.gov or something like that, that tends to hold higher credibility than something that's buried down there, you know, www.aol.com/~///. I mean that's/that gives you an idea that we're dealing with someone who's got a major server and this is a major site and they've paid/you know to get their particular .com or whatever address. People also build up fairly quickly a kind of cultural recognition of, you know, what does .gov mean? What does .com mean? What does .org mean? And I think that sort of awareness is spreading. From within a site one of the things to look at is, well, what else do they refer to? If they don't refer out from that site, then I think people tend to be suspicious of it and probably should be suspicious of it. If they link into other sites which appear to be equally strong and equally robust in their information then you get the idea that there is a web of information out there of which this is a legitimate citizen, you know, a legitimate member. Whereas someone who's just, you know, kind of writing something off the top of their own head won't be able to make those kinds of links. And certainly won't be referred back to from other places.
Right.
So if it's referred from a source that you trust, again that's something that you will tend to trust. So I think we're building up sets of markers there now.
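[Editorial illustration, not part of the interview.] The markers mentioned in this exchange (a short address, the top-level domain, links out to other sites, and referral from a trusted source) can be gathered into a rough Python sketch. The function name `credibility_cues` and its parameters are assumptions made for this note; the cues are reported separately and deliberately not combined into a score, since the interview does not say how readers weigh them.

```python
from urllib.parse import urlparse


def credibility_cues(url, outbound_links, referrers, trusted_sources):
    """Report the informal credibility markers discussed in the interview for one page."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Count non-empty path segments: "http://whitehouse.gov" has none.
    path_depth = len([part for part in parsed.path.split("/") if part])
    return {
        "short_address": path_depth == 0,
        "top_level_domain": host.rsplit(".", 1)[-1] if "." in host else "",
        "links_out": len(outbound_links) > 0,
        "trusted_referrer": any(r in trusted_sources for r in referrers),
    }
```

For example, calling it on `"http://whitehouse.gov"` reports a short address and the `gov` domain, while a page buried several directories deep on a shared host reports `short_address` as false.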
And some of them look a whole lot like the markers that we already have. (I,NU) is a citation basically right?
Yeah absolutely! Absolutely. I mean there are other markers as well which I think are somewhat trustworthy, but less so, and that's the look and feel of the site. There's a lovely site which is www.buttugly.com which is/just shows the worst possible web design, many flashing features and black and yellow pages and you can't read the print because it's faint and violet. Things like that you tend to mistrust fairly quickly.
Right.
Now that said/I mean it can be very difficult to read. People can establish themselves as experts and claim to be experts in areas and unless you already have knowledge of that area it can be very difficult to actually judge, you know, is this true, is this not? And that's something that/you know I don't think there's any external/there's no sign that a web page carries that is actually going to tell you whether or not you can really really trust it. I mean there has to be (I,NU).
And there are also ways to connect it to commercial/buying certain domain names and establishing that credibility just through purchase.
That's right.
I wonder if you could/this circles back a little bit to the issues that you were talking about with the Library of Congress in like classifying/say if you look up white or look up black and how problems become associated with African Americans that are not through these classifications. I wonder if you could talk about that a little bit with like this visibility issue in regards to countries. I know you mentioned earlier something about India.
Yeah.
And how India becomes represented.
Yeah, well it's a couple of issues there/I mean one is just the straight infrastructural one, which is the connections to places like India, and Africa. The connections are not as good as they are to Europe, as they are to other countries that we consider "first world countries." What this means is that even if there's a site out there in India which I think is a very valuable site, if I click on my (I,NU) and try and visit that site, and I spend quite a deal of time trying to find out about the internet in India, it will take about 5 minutes for a site to load, many times a day and night, because we're dealing with poor intercontinental connections and poor server capabilities within that country. So I basically lose interest, and people need an immediate response. So you tend to go where there's an immediate response. You become like the drunk who's looking a little (I,NU) you know because that's where the light is, it's not where he dropped his key. And it's that same principle. It's much easier for me to study other areas when I'm studying the internet, to study other areas where there are good T-1 connections and people with good service. Now there's a related problem there which goes beyond the infrastructure, which is because there is so much information out there on the web, and because it seems to be so all-embracing and because I can still get 100,000 hits when I click on Africa on the AltaVista site, then I assume that everything is covered and I don't see the things that just are not part of the web and the gaping holes that are there and are still there very much to this day.
And I think that's a second kind of problem, that it's very easy for me again as a student who's, you know, doing research in various areas, it's very easy for me to do research by looking at e-mail logs, looking at listserv logs, going into (I,NU) and that draws my attention to various parts of the community and draws my attention away from parts of the community that don't have that access yet and that probably are going to have a lot of difficulty getting that access. And when they do get it, they're only going to get the cheaper versions of it. So I think it/I mean it definitely sets up issues there of/you know we need to keep our political antennae very lively when we go into studying (I,NU) studying the web and being overwhelmed by just how all-embracing it is and recognizing that it's not/that there are definite holes there. And that there are a lot of people who are becoming yet more invisible through the creation of this incredible (I,NU) which is the web. Because anything you can't see through the (I,NU) is just completely below your radar. It's completely invisible.
So thinking about what's not there becomes as important as, or even more important than, thinking about what's there?
Absolutely. And that's true generally of any/you know going back to classification systems. That's a key finding about classification systems if you like. It's very hard to think outside of classification systems, to think outside of things that aren't represented. But that's exactly what you need to do when you want to understand what are the political and social and ethical decisions which have been made that have put this structure into place.
Geoff, let me ask you just one more question. I wanted to/in thinking about the trends that you've seen over the last 3 years and I was wondering if you could imagine where you think the web might be going in the next few years. Do you sense that things will become more hierarchical or there will be new forms of classification or just in related classification what kinds of trends do you see?
That's a great question. In classification systems we're dealing always with legacy systems and that's something that needs to be recognized in general. That there are huge new possibilities with object relational databases or object (I,NU) databases/you can do incredibly clever things with the information right now with information handling and information processing, which would allow you to post-process data to keep it flexible so that whenever you changed your ideas about a system or changed a classification system you could re-classify everything back in time so that all data could still be used, would still be valuable, you could/the phrase that I use there is that the ideal of the new classification systems is that you can reconfigure the world at will. And that is a possibility in the new technology. I mean that's the kind of/that's the gung-ho positive aspect of it. The negative side of that is, well, what's actually happening and what are the actual processes over the next three years? People that I see who are creating vast new databases on the web are still using what's called relational databases, which are actually incredibly inflexible/it's very very difficult to migrate data from a relational database to an (I,NU) database which is what you would want to do if you wanted to create really flexible classification systems. Legacy systems are a huge problem. We always assume when we talk about the wonders of the web and, you know, the possibilities of the next three years, we assume that everybody's going to have the latest level of technology at the latest pitch. And they're going to magically snap their fingers and bring all their old files up to date.
Right.
That's not what happens. So if you ask me, do I think in the next 100 years we're going to have something amazing, yes I do. I think we are going to see new ways of organizing information, you know, which are going to create/which are going to have enormous potential for thinking with if you like. I mean it's the same revolution for thinking with that we had with the invention of the printed book. I've absolutely no doubt about that. And we're getting the foretaste of that right now. If we're talking about the next 3 years, it's/we're still at the very early stages. The battle is still there, the legacy systems are still there, and they're going to be there for an awful long time. The inequalities are being entrenched/generatively entrenched more and more at the moment and we need to fight the creation of those inequalities in representation as well as inequalities in access to information. And until we, you know, do that work, that hard (I,NU) work, which is going to take an awful long time, we're not going to fully realize the benefits which are there as a twinkle in the developers' eyes at the moment.
I hear you saying that fighting inequalities early on will have more durable effects for/could have for many years.
Oh absolutely yeah. And if we don't fight it now/if we don't fight them now they're going to become so deeply entrenched in the web and so deeply entrenched in its organization that it's going to be very difficult to unthink them. And because/and this is the point about classification systems. The more we need to look at things like classification systems because when we create an infrastructure, when we really create an infrastructure, then that resonates, you know, all out into/all aspects of our life or parts about social and political organizations and they really get enormous kind of body behind them. They're very very difficult to change. We're at the point right now where, you know, it's like a butterfly flapping a wing in South America. A very small change could have a huge effect in the future. In ten years' time even, that's going to be much less so because, you know, you won't be able to flap your wings any more in the same way.
Right. Geoff, thanks a lot. This has been great. I really enjoyed it.
Ok many thanks, me too.
Thank you.