Geoff Bowker, Graduate School of Library and Information Science

(I,NU) = Inaudible or not understandable

Indent = new speaker

All right. I’m here today interviewing Geoff Bowker, who’s a professor in the School of Library and Information Science here at the University of Illinois. Today is the second of February, 1999, and we’re going to discuss some of the issues around classification. Geoff, why don’t we just start with where your interest in classification comes from, or what it is related to?

Ok, well, I’ve been involved in a series of studies over the last 7 or 8 years now, looking at classification systems in medicine. So we’ve looked at the International Classification of Diseases, classifications of nursing work, classifications of viruses, and built up from there into race classification systems as well, looking at the social, political, and ethical aspects of classification systems. What work do they do? We often think of classification as something that is done before the intellectual work gets done, before the political work gets done, before the/it’s just kind of setting the scene so that we can get on and do the real work and the real analysis. And the basic point of my analysis in the last several years has been looking at ways in which social and political and ethical decisions are in fact made at that earlier stage. When we decide what entities there are in the world, how we classify those entities, and how we name them, that is actually the site of a lot of very significant decisions. A typical case as far as medical classification goes, for example, is the classification of abortion: the classification of the moment of death has been a huge debate within different countries, based on Catholic, Protestant, or other understandings of what it takes for an entity to be alive. Is it alive at the moment of conception? Is it alive as it comes out of the womb? What counts as a stillbirth? What counts as a live birth? These have been sites of political debate and ethical debate at the same time as they’re medical classifications, which are being introduced for epidemiological and other purposes. So it’s trying to dig down and understand what is held by the classification systems that we have in our daily lives.

Can you give an example of how you go about doing that?

Sure! Part of it is very simple. It’s following a technique which was pioneered in library science by Sanford Berman, who sat down and read the Library of Congress classification system just from A to Z: how is each classification structured? And he’s made the argument that if you look at, for example, the classification of Native Americans in LC, the Library of Congress, it’s very different from the classification of Black Americans, and the classification of women is very different from the classification of men. Obviously you’ll get lots of women who are associated with problems, and Black Americans who are associated with problems, in the Library of Congress system. Men tend to be the unmarked category in the system. Native Americans are associated with belief structures, belief systems, everything that we associate with anthropology. Now, in a sense, that’s a fair representation of the literature that’s out there, and so there’s definitely a feedback (I,NU). But what’s significant, and what I think we need to think about when we deal with classification systems, is that it sets up another kind of feedback (I,NU): when I want to study, for example, women, or I want to study Native Americans or Black Americans, I will find it much easier to follow the paths which the classification system has opened up for me. So I’ll find it much easier to talk in terms of problems. If I wanted to talk about belief structures not in the Native American community but in, for example, the white community, I’d find that very difficult to follow up. The information will be there, but it will be highly scattered. The classification system will hide it from me. So part of it is just doing a reading of the classification systems and trying to be aware of what’s there, but also what’s not there. What is it difficult for me to do with this classification system? What is it easy for me to do with that system? So that’s one aspect of the study.
A second aspect is to do an ethnographic analysis of the way in which classification systems actually get used in practice. So we’ve looked at classifications of nursing work, for example, and gone into the hospitals and looked at the ways in which the classifications are actually used. And again it’s seeing what people choose to represent about their working lives and what they choose not to represent, and how that builds up a picture over time which is a skewed picture. Not of itself a false picture, but it’s missing out certain things and it’s emphasizing other things. So overall it ends up carrying a moral and political message at the same time as acting as a formal and scientific classification of nursing work.

For instance, if I’m a nurse, of everything I do within an 8-hour shift, what do I write down on a chart? And what’s the difference between that and what I actually did?

Well, exactly. Many nurses are getting into computerized systems now, and a typical problem for both nurses and doctors, but especially for nurses, is that they are very busy. You know, they’re working 50-60 hour weeks. They don’t see the point in writing down every last thing that they’re doing just to make the administrators happy as far as the classification system goes. On the other hand, people doing research want to know, well, what is it that nurses do? They want to be able to pick out something, you know, and show the value of it and show the range of activities. Now the problem is that what they choose not to represent, or what they have difficulty representing, tends to get factored out of the equation. So in fact the nurses that we’ve been studying/one of the interesting cases, for example, is that they have classified humor as a nursing activity, so telling a joke. Now hospital administrators have a lot of problems with that. But the nurses say, well, it’s part of our job and it’s something that we do every day, and we want to have this count as part of our work, so they’ve been fighting for it in the classification system. And that’s been part of their struggle: a recognition of the categories, the ways in which they choose to classify their own work.

You were talking earlier/you mentioned that the web is often characterized as an unclassified resource. At the same time, you started to talk about some of the ways in which information is sorted and categorized on the web. Could you speak to that tension?

Absolutely! Yeah, there are two extremes when we go out to search for information on the web (I,NU). One is the kind of classification system that you get through Yahoo, which is a fully hierarchical system. You can do word searches on it, but let’s take the hierarchical (I,NU), which you can follow down from a high-level topic like a country: from a country you can go down to a region, from a region you can go down to a municipality, from a municipality to a town, and from the town you can branch out to various activities in the town. So it’s setting up a huge, complex hierarchical classification system. Now these classification systems have exactly the same problem as the huge systems like the Library of Congress system do. They break the world up in certain ways, they make it easy for us to view the world in certain ways, and they end up hiding things which don’t fit the normal categories, which are just not standard forms of behavior, not standard forms of self-description. Multi-disciplinary, multi-modal kinds of activities tend to get locked out of a hierarchical classification system. However, at the other end of the (I,NU) you’ve got a search engine like AltaVista, which is the one that I always use. With AltaVista you just type in a couple of key words, you can refine the search a little bit with (I,NU), and then you just get what seems to be a fully undifferentiated list (I,NU) that problem to some extent. And a second problem is what happens behind the scenes, and this is the really important point about the kind of infrastructure that we’re building up on the web. Behind the scenes a lot of decisions are actually being made by search engines like AltaVista about how they present their findings. So first of all it’s possible for (I,NU) user to trick the site one way or another by loading their own site with key words which then get picked up by the AltaVista search engine.
So they will get/so the expert web designer will actually be able to guarantee more hits for their site by throwing in lots of key words that they think are likely to be attractive to the audience.

Because they know which kinds of key words tend to get hit more often?

Absolutely, and also they know how to write a web page. You don’t want to fill a web page with 500 key words, but you can bury them in your HTML code so they’ll get picked up by the search engine anyway. That’s something that certain kinds of expert (I,NU) and others won’t. And the second problem with an AltaVista-style search is that AltaVista itself will, for a fee, guarantee that certain sites will come up as the first answer to given queries. So if you’re the official fan club for Brad Pitt, you can guarantee that the Brad Pitt fan club will always come up first under Brad Pitt. But that’s at a cost. And so you as a user of the internet think that you’re getting free, fair access to all of the information that’s out there in the world of the web. But you’re not. You’re getting it directed in a couple of ways: first of all by the expert users, and secondly by the search engine providers themselves.

Any idea how they actually make that more accessible once you pay your fee? What do they do to make it hit more often than the other sites?

I’ve no idea. It’s very easy to (I,NU). It’s just saying ok when these words come up...

...they might use some of the same techniques that the expert web (I,NU) would use.

They just use a straight filtering mechanism. (I,NU) I mean the query comes up Brad Pitt and then it just goes (I,NU) subroutine that says ok, Brad Pitt’s there, hey, that has to go to that particular page first of all. You see that very markedly right now in AltaVista, that they have the related sites featured as well. And that related site (I,NU) you don’t get to be a related site unless you’ve given AltaVista some money or made some sort of arrangement with them. Search engines are very powerful economic and political tools as far as the internet goes.

Do they find that people tend to use Yahoo, the hierarchical search engine, which will actually have access to many fewer things than (I,NU)? Do they find that the general user tends to use those more often because they’re hierarchical, because they’re sorted into easy categories?

I’ve not seen the actual figures on that; I’d be interested. That would be my suspicion. I mean certainly the people that I know who use aol.com, or the general user, like for example my in-laws, you know, in their sixties: they discovered the web a few years ago, and they’re quite frightened in a sense by the power of it. And they find it a lot easier if they can follow through a set of procedures to get them where they want to go in an easy sequence of steps. And that’s my reading of why so much money has been spent, ridiculous amounts of money, on the take-overs lately of GeoCities, of Yahoo. It’s the whole idea that if you own a gateway site, and if the gateway itself can direct the internet user in particular directions, then that’s obviously going to be worth an awful lot of money and is going to prove that the valuations are worth it. Something (I,NU) I feel is still/you know, despite what I said, it still gives you more freedom.

You started to talk about credibility and access issues in terms of getting to the site.

Yeah.

Can you talk a little bit about how you think credibility is being established within a site? When we pick up a journal or a magazine or a book, we have certain markings of credibility: publishing companies and what-not. How is this being established now? Do you have any sense?

A little bit’s (I,NU). I mean, people have talked about this as being a huge problem. I don’t see it as being a problem, and I think over the next 10-15 years we’re going to generate, as a culture, pretty good ways of reading sites. People are very good at reading CD labels, for example, and understanding CD labels, and that’s a technology that has only been there for the last 10 or 15 years. We used to have problems with (I,NU) and e-mail, and people are now much more aware of that. We already have a cultural set of norms in place. I did a kind of mini-study in class last year around how people judge the credibility of web sites, and there are certain indications. One nice one is length of address. If you have a very short address, http://whitehouse.gov or something like that, that tends to hold higher credibility than something that’s buried way down there, you know, www.aol.com/~///. I mean, that gives you an idea that we’re dealing with someone who’s got a major server, and this is a major site, and they’ve paid, you know, to get their particular .com or whatever address. People also build up fairly quickly a kind of cultural recognition of, you know, what does .gov mean? What does .com mean? What does .org mean? And I think that sort of awareness is spreading. From within a site, one of the things to look at is what else they refer to. If they don’t refer out from that site, then I think people tend to be suspicious of it, and probably should be. If they link into other sites which appear to be equally strong and equally robust in their information, then you get the idea that there is a web of information out there of which this is a legitimate citizen, you know, a legitimate member. Whereas someone who’s just writing something off the top of their head won’t be able to make those kinds of links, and certainly won’t be referred back to from other places.

Right.

So if it’s referred from a source that you trust, again that’s something that you will tend to trust. So I think we’re building up sets of markers there now.

And some of them look a whole lot like the markers that we already have. (I,NU) is a citation basically right?

Yeah, absolutely! Absolutely. I mean, there are other markers as well which I think are somewhat trustworthy, but less so, and that’s the look and feel of the site. There’s a lovely site, www.buttugly.com, which just shows the worst possible web design: lots of flashing features, black and yellow pages, and print you can’t read because it’s faint and violet. Things like that you tend to mistrust fairly quickly.

Right.

Now, that said, it can be very difficult to read. People can establish themselves as experts and claim to be experts in areas, and unless you already have knowledge of that area it can be very difficult to actually judge, you know, is this true or is this not? And I don’t think there’s any external sign that a web page carries that is actually going to tell you whether or not you can really, really trust it. I mean there has to be (I,NU).

And there are also commercial ways of doing it: buying certain domain names and establishing that credibility just through purchase.

That’s right.

I wonder if you could/this circles back a little bit to the issues that you were talking about with the Library of Congress classification: say if you look up white or look up black, and how problems become associated with African Americans through these classifications. I wonder if you could talk about that a little bit in terms of this visibility issue in regard to countries. I know you mentioned earlier something about India.

Yeah.

And how India becomes represented.

Yeah, well, there are a couple of issues there. One is just the straight infrastructural one, which is the connections to places like India and Africa. The connections are not as good as they are to Europe, or to other countries that we consider "first world countries." What this means is that even if there’s a site out there in India which I think is a very valuable site, if I click on my (I,NU) and try to visit that site, and I spend a good deal of time trying to find out about the internet in India, it will take about 5 minutes for a site to load, many times of the day and night, because we’re dealing with poor intercontinental connections and poor server capabilities within that country. So I basically lose interest. People need an immediate response, so you tend to go where there’s an immediate response. You become like the drunk who’s looking a little (I,NU), you know, because that’s where the light is, not where he dropped his key. It’s that same principle. When I’m studying the internet, it’s much easier for me to study areas where there are good T-1 connections and people with good service. Now there’s a related problem there which goes beyond the infrastructure. Because there is so much information out there on the web, and because it seems to be so all-embracing, and because I can still get 100,000 hits when I click on Africa on AltaVista, I assume that everything is covered, and I don’t see the things that just are not part of the web, and the gaping holes that are there and are still very much there to this day.
And I think that’s a second kind of problem: it’s very easy for me, again as a student who’s doing research in various areas, to do research by looking at e-mail logs, looking at listserv logs, going into (I,NU), and that draws my attention to various parts of the community and draws my attention away from parts of the community that don’t have that access yet, and that probably are going to have a lot of difficulty getting that access. And when they do get it, they’re only going to get the cheaper versions of it. So it definitely sets up issues there. You know, we need to keep our political antennae very lively when we go into studying (I,NU), studying the web, being overwhelmed by just how all-embracing it is, and recognizing that there are definite holes there, and that there are a lot of people who are becoming yet more invisible through the creation of this incredible (I,NU) which is the web. Because anything you can’t see through the (I,NU) is just completely below your radar. It’s completely invisible.

So thinking about what’s not there becomes as or even more important than thinking about what’s there?

Absolutely. And that’s true generally, going back to classification systems. That’s a key finding about classification systems, if you like. It’s very hard to think outside of classification systems, to think about things that aren’t represented. But that’s exactly what you need to do when you want to understand the political and social and ethical decisions which have been made that have put this structure into place.

Geoff, let me ask you just one more question. In thinking about the trends that you’ve seen over the last 3 years, I was wondering if you could imagine where the web might be going in the next few years. Do you sense that things will become more hierarchical, or that there will be new forms of classification, or, just in relation to classification, what kinds of trends do you see?

That’s a great question. In classification systems we’re always dealing with legacy systems, and that’s something that needs to be recognized in general. There are huge new possibilities with object-relational databases or object (I,NU) databases. You can do incredibly clever things right now with information handling and information processing, which would allow you to post-process data and keep it flexible, so that whenever you changed your ideas about a system, or changed a classification system, you could reclassify everything back in time, so that all data could still be used and would still be valuable. The phrase that I use there is that the ideal of the new classification systems is that you can reconfigure the world at will. And that is a possibility in the new technology. That’s the gung-ho, positive aspect of it. The negative side of that is, well, what’s actually happening, and what are the actual processes over the next three years? The people that I see creating vast new databases on the web are still using what are called relational databases, which are actually incredibly inflexible. It’s very, very difficult to migrate data from a relational database to an (I,NU) database, which is what you would want to do if you wanted to create really flexible classification systems. Legacy systems are a huge problem. We always assume, when we talk about the wonders of the web and the possibilities of the next three years, that everybody’s going to have the latest level of technology at the latest pitch, and that they’re going to magically snap their fingers and bring all their old files up to date.

Right.

That’s not what happens. So if you ask me, do I think in the next 100 years we’re going to have something amazing, yes I do. I think we are going to see new ways of organizing information, you know, which are going to have enormous potential for thinking with, if you like. I mean, it’s the same revolution for thinking with that we had with the invention of the printed book. I’ve absolutely no doubt about that, and we’re getting a foretaste of that right now. If we’re talking about the next 3 years, we’re still at the very early stages. The battle is still there, the legacy systems are still there, and they’re going to be there for an awfully long time. The inequalities are being entrenched/generatively entrenched more and more at the moment, and we need to fight the creation of those inequalities in representation as well as inequalities in access to information. And until we do that hard (I,NU) work, which is going to take an awfully long time, we’re not going to fully realize the benefits which are there as a twinkle in the developers’ eyes at the moment.

I hear you saying that fighting inequalities early on will have more durable effects, could have them for many years.

Oh, absolutely, yeah. If we don’t fight them now, they’re going to become so deeply entrenched in the web, and so deeply entrenched in its organization, that it’s going to be very difficult to unthink them. And this is the point about classification systems, and why we need to look at things like classification systems all the more: when we really create an infrastructure, it resonates out into all aspects of our life, our social and political organizations, and it gets an enormous kind of body behind it. Infrastructures are very, very difficult to change. We’re at the point right now where, you know, it’s like a butterfly flapping a wing in South America: a very small change could have a huge effect in the future. Even in ten years’ time that’s going to be much less so, because you won’t be able to flap your wings any more in the same way.

Right. Geoff, thanks a lot. This has been great. I really enjoyed it.

Ok many thanks, me too.

Thank you.