Week 3: Feminist Search (Code Critique)

Christine.Meinders · February 2020

by Christine Meinders, Jana Thompson, Sarah Ciston, Catherine Griffiths

Approaches to Co-Creation
In this example, community-sourced data can be traced both visually and in code, and can be used to inform the very model used to process this information. Rather than simply coding, the prototyping process is incorporated in the code from a critical perspective. This process is guided by the Cultural AI Design Tool, which refocuses the design process so that questions of creator, data origin, and rule-creation are centered rather than marginally examined or ignored. Using these as a basis for this particular critical code context, contributors are credited, while also keeping the prototype open for co-creation and reformulation by the community.

Modeling Binaries:
There are several pieces that contribute to Feminist Search: personal data donation, interface design, and the use of binaries in data collection and model creation.

The Feminist Search project explores what is safe and what is dangerous. Binary notions of safety and danger are just the starting point. Within the last five years, dangerous rhetoric has once again become socially acceptable, with a corresponding rise in violent acts globally. Beyond this, there are the pressures of misogyny, racism, and other forms of bigotry, which heighten an individual's or community's constant awareness of the actions needed to keep themselves safe. What makes people feel safe? Safety can be categorized in different ways, such as physical, emotional, and professional safety.

These binary definitions can be expanded by examining the grey spaces through the questions in the personal data donation. By having people discuss what safety means to them, or the semantics of this term and related concepts, models can be built that reflect these spectrums. This allows for exciting design and technical challenges and, more importantly, for creating technology that is for the people who contribute their data. Feminist Search explores the challenges of search from a community perspective, with a goal of reflecting the shared data of communities in Los Angeles and San Francisco.

One point to highlight is that computation is fundamentally binary, as are labels in machine learning: the data donation portion of Feminist Search uses labels of safe and dangerous. However, the goal is to move beyond a true/false dichotomy, because truth value is subjective, particularly in categorizations of feelings and sentiments.

For those who are not familiar with the details of machine learning: fundamentally, machine learning works with mathematical representations of geometric spaces that have distance functions as part of their definition. In defining geometric classes, there will be a division between classes in an n-dimensional space (as with the separating boundary of a linear classifier), or perhaps something such as a centroid in a clustering algorithm, which is the most representative point of a cluster. Predictions as to the class or type of an image depend on where the item sits geometrically in the vector space.
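As a rough illustration of the geometric idea above, here is a minimal nearest-centroid sketch. The 2-D points, the label names, and the `predict` helper are all made up for this example and are not part of the Feminist Search code.

```python
import numpy as np

# Hypothetical 2-D feature vectors; real features would be derived from images.
safe_points = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5]])
dangerous_points = np.array([[6.0, 6.0], [7.0, 6.5], [6.5, 7.5]])

# The centroid (mean point) stands in as the most representative item of each class.
centroids = {
    "safe": safe_points.mean(axis=0),
    "dangerous": dangerous_points.mean(axis=0),
}

def predict(point):
    """Return the label whose centroid is closest in Euclidean distance."""
    point = np.asarray(point, dtype=float)
    distances = {label: np.linalg.norm(point - c) for label, c in centroids.items()}
    return min(distances, key=distances.get)

print(predict([2.0, 2.0]))
print(predict([6.0, 7.0]))
print(predict([4.0, 4.0]))  # a point between the groups still receives one label
```

Note that the in-between point is still forced into one of the two classes, which is the kind of binary outcome the project sets out to question.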

The interesting problems in data science and machine learning aren't in churning out mathematically good predictions, however. The outcome of an algorithm is only as good as the data given to it and how the person(s) constructing it use that data in the creation of a model. In the construction of models, data points that are outliers are often thrown out, or are drowned out in the majority vote of the more "normal" considerations. This leads to models where a literal tyranny of the majority can happen, since the majority of opinions carry more weight statistically, instead of all the data being treated equally.
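To make the majority-vote point concrete, here is a small sketch with invented data: ten hypothetical donations for one image, aggregated in two ways. The function names are illustrative only.

```python
from collections import Counter

# Hypothetical labels donated by ten people for one image.
donated_labels = ["safe"] * 8 + ["dangerous"] * 2

def majority_label(labels):
    """Collapse all donations into the single most common label."""
    return Counter(labels).most_common(1)[0][0]

def label_distribution(labels):
    """Keep every donation by reporting each label's share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

print(majority_label(donated_labels))
print(label_distribution(donated_labels))
```

The first function erases the two "dangerous" donations entirely; the second keeps them visible as a share of the whole, which is one possible way to avoid discarding minority views.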

In this approach, the simple act of search can be used to understand binary decisions that are used to form a model, and how users can donate information to understand who is contributing to search and data collection. This is the central starting point that prioritizes visualization and creates a space to develop a community search engine. In Feminist Search, communities create and provide contexts for evaluation, with the goal of sharing these decisions along with donated personal data, and the "why" in the search results.

An additional goal of Feminist Search is to highlight thoughtful data donation and model weighting processes, while also showing how search is used. It incorporates Feminist.AI approaches by exploring the act of search through embodied, multi-sensory (movement, sound, and images) methods and critical prototyping. Feminist Search is a way to solidify and continually honor the work of feminist communities.

Here is the code for Feminist Search
Thompson, 2020, Python

import numpy as np
import cv2
import glob
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def import_image(path):
    """
    INPUT: path to image file in jpg
    OUTPUT: machine readable image file
    """
    image = cv2.imread(path)
    return image

class ClusteredImages:
    def __init__(self, positive_images_path, negative_images_path, image_suffix, number_of_clusters):
        # collect the image paths for each label; image_suffix is a glob pattern
        self.positive_images = set(glob.glob(positive_images_path + '/' + image_suffix))
        self.negative_images = set(glob.glob(negative_images_path + '/' + image_suffix))
        self.no_of_clusters = number_of_clusters

        # note: np.array stores the booleans as the strings 'True' / 'False'
        self.image_paths = [[path, True] for path in self.positive_images] + [[path, False] for path in self.negative_images]
        self.image_array = np.array(self.image_paths)

        # compute SIFT descriptors on a grayscale copy of each image
        self.descriptors = []
        for path, label in self.image_array:
            image = import_image(path)
            b_and_w = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            # needs an opencv-contrib build; from OpenCV 4.4 on this is cv2.SIFT_create()
            sift = cv2.xfeatures2d.SIFT_create()
            kp, each_descriptors = sift.detectAndCompute(b_and_w, None)
            self.descriptors.append(each_descriptors)

    def return_labels(self):
        return np.array(self.image_paths)[:, -1]

    def generate_features(self, clustering_model):
        # rename function to reflect that it returns both training data and predictable data
        number_of_clusters = clustering_model.n_clusters
        descriptors_pre_array = [desc for desc_list in self.descriptors for desc in desc_list]
        descriptors_array = np.array(descriptors_pre_array)
        clustering_model.fit(descriptors_array)
        # one histogram of cluster ("visual word") counts per image
        clustered_words = [clustering_model.predict(words) for words in self.descriptors]
        return np.array([np.bincount(words, minlength=number_of_clusters) for words in clustered_words])

class ParameterFinder:
    def __init__(self, X, y):
        # use gammas for rbf, poly and sigmoid
        # degrees for poly
        self.X = X
        self.y = y
        self.kernels_to_try = ['linear', 'rbf', 'poly', 'sigmoid']
        self.C_params = [0.001, 0.01, 0.1, 1, 10]
        self.gamma_params = [0.001, 0.01, 0.1, 1]
        # polynomial degrees written as integers
        self.degree_params = [0, 1, 2, 3, 4]

    def find_best_params(self, kernel, X, y, param_grid):
        # SVC is imported directly at the top of the file
        grid_search = GridSearchCV(SVC(kernel=kernel), param_grid)
        grid_search.fit(X, y)
        return grid_search.best_params_

    def return_all_best_params(self):
        best_params = {}
        # should rewrite to pass kernel and find parameters
        for kernel in self.kernels_to_try:
            if kernel == 'linear':
                param_grid = {'C': self.C_params}
                search_for_params = self.find_best_params('linear', self.X, self.y, param_grid)
                best_params['linear'] = search_for_params
            elif kernel == 'rbf':
                param_grid = {'C': self.C_params, 'gamma': self.gamma_params}
                search_for_params = self.find_best_params('rbf', self.X, self.y, param_grid)
                best_params['rbf'] = search_for_params
            elif kernel == 'poly':
                param_grid = {'C': self.C_params, 'gamma': self.gamma_params, 'degree': self.degree_params}
                search_for_params = self.find_best_params('poly', self.X, self.y, param_grid)
                best_params['poly'] = search_for_params
            else:
                # 'sigmoid' is listed above but not searched yet
                pass
        return best_params
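For readers who want to see what the parameter search does without the image data, here is a small sketch that runs the same kind of grid search for the rbf kernel on synthetic numbers. The random features stand in for the per-image cluster histograms and are an assumption of this example, as is the explicit `cv=5`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for bag-of-visual-words histograms: 40 "images", 5 clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 5)),
    rng.normal(loc=3.0, scale=1.0, size=(20, 5)),
])
# String labels, matching how the class above stores True / False.
y = np.array(["False"] * 20 + ["True"] * 20)

# Same candidate values as ParameterFinder uses for the rbf kernel.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10], "gamma": [0.001, 0.01, 0.1, 1]}
grid_search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(round(grid_search.best_score_, 2))
```

The search tries every combination of C and gamma, scores each by cross-validation, and reports the best-scoring combination. "Best" here means most accurate on held-out folds, so this step optimizes for agreement with the existing labels.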

jeremydouglass · February 2020

Thank you for sharing this!

I'm curious about the included starter model, fs_model.pkl, and what is in it. The README says "I created a model with some original data we had." What is this model made of, and what is it for classifying?

CatherineGriffiths · February 2020

I’m just starting to understand how this code works, but my initial interest is to think about machine learning as a process that constantly reduces the complexity of information.

Beginning with the many interpretations of the phenomenon of ’safety’ contained in the original dataset in this case, complexity is removed from the dataset by way of image filtering, then through clustering, and eventually by classification. This reduction of complexity is conducted in the name of readability of the data by the algorithm.

In this case, perhaps feminist search could be considered a process of seeking to preserve a greater amount of complexity in the system at its various stages?

To focus on one aspect of this: the ClusteredImages class, as the GitHub link shows, is used together with a KMeans function. Here K stands for the number of clusters. Perhaps this could be interpreted as a variable for complexity?
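One way to look at K as a complexity variable: in `generate_features`, each image becomes a histogram with exactly K bins, so K fixes how many distinctions the model can keep per image. A minimal sketch, with made-up descriptors and hand-picked cluster centres standing in for a fitted KMeans model:

```python
import numpy as np

def visual_word_histogram(descriptors, centroids):
    """Count how many descriptors fall nearest to each of the K centroids."""
    # pairwise squared distances, shape (n_descriptors, K)
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    squared_distances = (diffs ** 2).sum(axis=2)
    nearest = squared_distances.argmin(axis=1)
    return np.bincount(nearest, minlength=len(centroids))

# Hypothetical 2-D descriptors for one image (real SIFT descriptors have 128 dimensions).
descriptors = np.array([
    [0.1, 0.2], [0.0, 0.1], [5.1, 5.3],
    [9.8, 10.1], [10.2, 9.9], [9.9, 10.0],
])

# Hypothetical, already-fitted cluster centres for two choices of K.
centroids_k2 = np.array([[0.0, 0.0], [10.0, 10.0]])
centroids_k3 = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])

print(visual_word_histogram(descriptors, centroids_k2))
print(visual_word_histogram(descriptors, centroids_k3))
```

With K=2 the middle descriptor is absorbed into a neighbouring bin; with K=3 it gets a bin of its own. In that limited sense a larger K preserves more of the differences present in the images.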

jeremydouglass · February 2020

Thank you! I suppose I was wondering more about the inputs and outputs, less than the means (or the kmeans clustering).

I think the model was trained on photos, like jpeg images, is that right? What were they of? Faces, cats and dogs, guns, things posted to Twitter? If some of them were labeled as safe, who labeled them -- the researcher, social media users, volunteers?

I'm just trying to find a human context for the prototype or the concept -- not how it works, but what it is for and how it is used, specifically. "Its process trained on images of cats and dogs as safe and dangerous -- we told it dogs are dangerous in the training data, and now if it sees a cat, it says SAFE." That kind of thing.

Christine.Meinders · February 2020

The model was trained on images (jpegs rather than pngs). These images were provided and labeled by Feminist.AI members. The donated images varied. For example, one Feminist.AI member provided an image of a tree, and labeled it as safe.

Another Feminist.AI member provided an image of a tree, and labeled it as dangerous. The image provided was taken a few years ago - over the past few months branches started falling off the tree and the city declared it unsafe.

It's interesting to see the image taken from Google Earth, as the digital artifact affects the outcome of the training model.

Christine.Meinders · February 2020

The model will be periodically updated with new images from the feminist search data donation website: https://aidesigntool.com/feminist-search

There is also a section to add an additional category - which creates a space to critically investigate how people view safety and danger beyond a binary.

CatherineGriffiths · February 2020

@jeremydouglass said:

I think the model was trained on photos, like jpeg images, is that right? What were they of? Faces, cats and dogs, guns, things posted to Twitter? If some of them were labeled as safe, who labeled them -- the researcher, social media users, volunteers?

I'm just trying to find a human context for the prototype or the concept -- not how it works, but what it is for and how it is used, specifically. "Its process trained on images of cats and dogs as safe and dangerous -- we told it dogs are dangerous in the training data, and now if it sees a cat, it says SAFE." That kind of thing.

My understanding of the process of data collection and labeling in this project is that it's, first of all, an attempt to include a broader community in the creation and labeling of data behind a search engine. It's less about how objectively correct the label is. Additionally, it's an attempt to be transparent about that process by foregrounding the human role in making such decisions rather than standing behind a supposed sense of objectivity.

Often when we are presented with machine learning prototypes, the problems being addressed are ones with very clear terms, such as classifying if something is a cat or dog or apple or orange. It's a problem with a singular and correct answer. That the Feminist.AI developers have chosen the search term 'safe' for their critical prototype presents us with a more complex problem, to which there is no correct answer or set of results which are more important than others. The more we get into this ambiguous territory, the more we get into ethically complex scenarios. My understanding is that the Feminist.AI team are potentially defining 'feminist search' as an engine that is more transparent about these ambiguities, and also is premised on open data shared by a diverse community.

I agree it would be great to see the original data set, to really have a visual sense of these ambiguities and subjective concerns, but I think that it hasn't been uploaded by the developers yet because when the data was originally collected, permissions hadn't been given for it to be shared yet. That's why as part of this discussion week, they are seeking new data donations that can be made public: https://aidesigntool.com/feminist-search

CatherineGriffiths · February 2020

@jeremydouglass said:
Thank you for sharing this!

I'm curious about the included starter model, fs_model.pkl, and what is in it. The README says "I created a model with some original data we had." What is this model made of, and what is it for classifying?

Yeah I'm curious if anyone reading this thread has managed to compile the code or looked at the .pkl file?

ebuswell · February 2020

@CatherineGriffiths said:
I’m just starting to understand how this code works, but my initial interest is to think about machine learning as a process that constantly reduces the complexity of information.

Beginning with the many interpretations of the phenomenon of ’safety’ contained in the original dataset in this case, complexity is removed from the dataset by way of image filtering, then through clustering, and eventually by classification. This reduction of complexity is conducted in the name of readability of the data by the algorithm.

In this case, perhaps feminist search could be considered a process of seeking to preserve a greater amount of complexity in the system at its various stages?

I'm thinking about this, in combination with:

@Christine.Meinders said:
The interesting problems in data science and machine learning aren't in churning out mathematically good predictions, however. The outcome of an algorithm is only as good as the data given to it and how the person(s) constructing it use that data in the creation of a model.

And, well, yes it seems like the interesting thing here is definitely the creation of an alternative model, and the way the model kind of refuses to abstract from the real world in the problematic ways other things do (the locality-based community aspect of this data, going into a presumably eventually networked search algo, is fascinating). But what about the algorithm itself? Even before the AI algorithm, there's search. Like—and I don't know too much about machine learning so apologies if I'm getting things wildly wrong—this reduction of y to x that @CatherineGriffiths is talking about seems to be a reduction of the not-searchable to the searchable, and then the way this is done determines the outcome. But what about this separation of the not searchable from the searchable in the first place?

On the one hand, there's something about textuality here. Like the image search needs machine learning, but a text-based search wouldn't.

But even beyond that, why search at all? This calls to mind the whole global optimization fiasco, where eventually it was proven that over all data sets, no algorithm is better than any other, including a random walk. But that's what's interesting. Yeah, of course we have purpose sometimes and random walks don't get us there. But not all the time. Maybe sometimes we need to just walk through and experience the data as it comes. And even in the case of purpose, optimization (and also search) generally defines the purpose teleologically—there is a result that is the object before the search/optimization begins. What about a purpose in fits and starts?

jeremydouglass · February 2020

@CatherineGriffiths said:
Yeah I'm curious if anyone reading this thread has managed to compile the code or looked at the .pkl file?

pkl ("pickle") files may often require that project dependencies be installed in order for the pickle data to be loadable / legible. That is true in this case -- trying to inspect the .pkl file with pickle() gives:

ModuleNotFoundError: No module named 'sklearn'

So, to inspect the pkl, I first set up the complete project, with a python3 virtual environment, the files downloaded, and requirements installed. For example, on macOS 10.12, from a Terminal bash shell:

mkvirtualenv --python=/usr/local/bin/python3 feminist_search
cd ~/git
git clone https://github.com/FeministAI/feminist_search.git
cd feminist_search/
# edit requirements.txt to remove the circular dependency in the first line
pip install -r requirements.txt

Then view the object from the command line:

python3 -mpickle fs_model.pkl

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)

This reveals that it is an SVC object (Support Vector Classification). Printing shows key attributes / states when the model was saved (which are, hopefully, the ones that were used when the model vectors were built). The scikit-learn SVC documentation can tell us more about what each of these means.

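We can investigate further with a script that also inspects the methods and attributes available on the model object:

import pickle

with open('fs_model.pkl', 'rb') as f:
    fs_model = pickle.load(f)
print(fs_model)

# names of everything callable on the loaded model; the None default skips
# properties that raise AttributeError for this kernel
model_methods = [method_name for method_name in dir(fs_model)
    if callable(getattr(fs_model, method_name, None))]
print(model_methods)
# instance attributes, including the fitted values
model_attributes = fs_model.__dict__
print(model_attributes)

This makes further details available about the contents of the model -- although these are primarily mathematical (weights and vectors) rather than labels, categories, or types of input data -- in other words, we may not immediately see "context" or "intent" here. (The method list below was produced with dir(object), so it shows only the methods every Python object has; run against fs_model, it also includes estimator methods such as fit and predict.)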

['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', 
 '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__']

{'decision_function_shape': 'ovr', 'break_ties': False, 'kernel': 'rbf', 'degree': 3, 'gamma':
 'auto', 'coef0': 0.0, 'tol': 0.001, 'C': 1.0, 'nu': 0.0, 'epsilon': 0.0, 'shrinking': True,
 'probability': False, 'cache_size': 200, 'class_weight': None, 'verbose': False,
 'max_iter': -1, 'random_state': None, '_sparse': False, 'class_weight_': array([1., 1.]),
 'classes_': array(['False', 'True'], dtype='<U36'), '_gamma': 0.2, 'support_': array([ 1,  2,
  5,  6,  8,  9, 10,  0,  3,  4,  7, 11], dtype=int32), 'support_vectors_': array([[  15.,  111.,
   91.,   77.,   44.],
        [  36.,   52.,   68.,   79.,   59.],
        [  16.,   92.,   70.,   75.,   40.],
        [  30.,  226.,  118.,   61.,   70.],
        [ 206., 1219.,  559., 1035.,  362.],
        [  12.,   96.,   48.,  101.,   25.],
        [  23.,  113.,  109.,   80.,   64.],
        [ 869., 3609., 2848., 2420., 1791.],
        [1282., 8180., 4028., 4141., 2498.],
        [ 418., 2091., 1825., 1460.,  896.],
        [  37.,  153.,   94.,  125.,   75.],
        [  46.,  184.,  111.,  154.,  100.]]), '_n_support': array([7, 5], dtype=int32),
 'dual_coef_': array([[-0.71386719, -0.71484375, -0.71484375, -0.71484375,
 -0.71386719,
         -0.71386719, -0.71386719,  1.        ,  1.        ,  1.        ,
          1.        ,  1.        ]]), 'intercept_': array([-0.28571429]), 'probA_': array([],
 dtype=float64), 'probB_': array([], dtype=float64), 'fit_status_': 0, 'shape_fit_': (12, 5),
 '_intercept_': array([0.28571429]), '_dual_coef_': array([[ 0.71386719,  0.71484375,
  0.71484375,  0.71484375,  0.71386719,
          0.71386719,  0.71386719, -1.        , -1.        , -1.        ,
         -1.        , -1.        ]])}
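The printed attributes are enough to recompute by hand what this model would answer, using the standard decision function of an RBF-kernel SVC. The sketch below copies the numbers from the dump above; the probe point and helper names are made up for this example, and it assumes the dump reflects the model as trained.

```python
import numpy as np

# Values copied from the printed model attributes above.
support_vectors = np.array([
    [15., 111., 91., 77., 44.],
    [36., 52., 68., 79., 59.],
    [16., 92., 70., 75., 40.],
    [30., 226., 118., 61., 70.],
    [206., 1219., 559., 1035., 362.],
    [12., 96., 48., 101., 25.],
    [23., 113., 109., 80., 64.],
    [869., 3609., 2848., 2420., 1791.],
    [1282., 8180., 4028., 4141., 2498.],
    [418., 2091., 1825., 1460., 896.],
    [37., 153., 94., 125., 75.],
    [46., 184., 111., 154., 100.],
])
dual_coef = np.array([-0.71386719, -0.71484375, -0.71484375, -0.71484375,
                      -0.71386719, -0.71386719, -0.71386719,
                      1.0, 1.0, 1.0, 1.0, 1.0])
intercept = -0.28571429
gamma = 0.2  # '_gamma' in the dump; gamma='auto' means 1 / n_features = 1 / 5
classes = np.array(['False', 'True'])

def decision_value(x):
    """sum_i coef_i * exp(-gamma * ||x - sv_i||^2) + intercept"""
    squared_distances = ((support_vectors - np.asarray(x, dtype=float)) ** 2).sum(axis=1)
    kernel_values = np.exp(-gamma * squared_distances)
    return float(dual_coef @ kernel_values + intercept)

def predict(x):
    """A positive decision value maps to classes[1], otherwise classes[0]."""
    return classes[int(decision_value(x) > 0)]

probe = [50., 50., 50., 50., 50.]  # hypothetical histogram, not from the data
print(predict(support_vectors[7]), decision_value(support_vectors[7]))
print(predict(support_vectors[0]), decision_value(support_vectors[0]))
print(predict(probe), decision_value(probe))
```

With these numbers the kernel term is effectively zero unless the input nearly coincides with one of the twelve stored vectors, so any other input falls back to the sign of the intercept and is classified 'False'. That suggests the starter model mostly remembers its twelve training images, which fits its description as built from "some original data we had". The five columns are presumably the per-image cluster counts from `generate_features` with five clusters.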

patricia_s · February 2020

I'm really interested in seeing/learning from search patterns using this model.
Right now, I can't think of an answer to "What does a feminist search pattern look like?".

How does a pool of communal data used to train an AI protocol become permeable to the outside world without filtering the danger, and by that I mean the constant violence that male bodies are capable of inflicting on non-male bodies?
I realize it may be too soon for this question.

Christine.Meinders · February 2020

@patricia_s

As the Feminist Search continues it would be interesting to explore not only what a Feminist Search pattern looks like, but possibly sounds like and feels like.

Molly Wright Steenson writes about this history of patterns and AI in her book
Architectural Intelligence

CatherineGriffiths · February 2020

@jeremydouglass
Thank you for doing this, this is revealing! Perhaps not for the information that is there, but rather for the information that is not. As we know, AI, and in particular deep learning models, suffer from the interpretability problem. By opening the model or classifier we can access the weights generated from the training process, which are always obscured and abstracted from the decision-making process that we experience as users of a deep learning-powered technology. This would make more sense if we could also see an example of the original data. But we can still think about how AI blurs accountability, deleting or perhaps purposely forgetting processes that are necessary to reconstruct the tentacles between decisions, through data, to origins. A Feminist AI perhaps needs to take responsibility for not forgetting, improving upon the data structures of training and classification to contain paths back to provenance. I have written something similar in the main thread, but perhaps Donna Haraway’s concept of tentacular thinking can provide critical grounding for an implementation of what we could call a slower code, one that recognizes and does not reject its frictions as it is entangled with subjects and contexts.

My work with machine learning generated decision tree models attempts to visualize paths, bifurcation and real-time decisions during the execution of the algorithm. However, this is only possible because decision trees are easily interpretable systems. Deep neural networks on the other hand, present us with a profound challenge because the diffusion of decision making throughout the network can create a dangerous weapon. My proposal is to welcome complexity, friction, context and try to visualize some of these inner workings.
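As a toy illustration of why decision trees are easy to interpret, the sketch below walks a hand-written tree and records every branch taken. The tree, its features, and its labels are invented for this example; it is not a trained model and not the visualization work described above.

```python
# A hypothetical hand-written decision tree over made-up features.
tree = {
    "question": ("well_lit", True),
    "yes": {
        "question": ("people_nearby", True),
        "yes": {"label": "safe"},
        "no": {"label": "uncertain"},
    },
    "no": {"label": "dangerous"},
}

def classify_with_path(node, item):
    """Walk the tree and return (label, list of decisions taken along the way)."""
    path = []
    while "label" not in node:
        feature, expected = node["question"]
        answer = item.get(feature) == expected
        path.append(f"{feature} == {expected}? {'yes' if answer else 'no'}")
        node = node["yes"] if answer else node["no"]
    return node["label"], path

label, path = classify_with_path(tree, {"well_lit": True, "people_nearby": False})
print(label)
for step in path:
    print(" ", step)
```

Every result comes with the sequence of questions that produced it, which is the kind of path back from decision to reasons that is hard to recover from the weights of a neural network.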

Lesia.Tkacz · February 2020

But even beyond that, why search at all? This calls to mind the whole global optimization fiasco, where eventually it was proven that over all data sets, no algorithm is better than any other, including a random walk. But that's what's interesting. Yeah, of course we have purpose sometimes and random walks don't get us there. But not all the time. Maybe sometimes we need to just walk through and experience the data as it comes. And even in the case of purpose, optimization (and also search) generally defines the purpose teleologically—there is a result that is the object before the search/optimization begins. What about a purpose in fits and starts?

Thanks for bringing this up @ebuswell. Half the time what I actually really want to do is to stroll through data and information and get to know what's there out of curiosity, and hopefully chance upon some quirks, surprises, or hidden issues. And I'm not sure that the standard tools at most people's disposal support this very well. I don't always want to search - I want to explore and discover and to see what there is on a certain path (a bit like staring out of the window of a train or observing scenes on a park walk).
I feel that something that isn't being done enough with data is leveraging it for its creative potential, and creating or adapting tools for that purpose.
