Deep Learning is a branch of Machine Learning. Most people interested in new technologies have heard about ML. It can help with automating many processes, retrieving useful information from data, finding hidden patterns, etc. In the last major Metadata Digger release we added support for Deep Learning to enrich metadata extracted from images. In the current version, enrichment means recognizing objects in an image and adding them as labels. It will make your OSINT research easier, as you can narrow results to images with particular objects (table, person, etc.). Before we try it, let’s talk about some ML and DL basics. Remember that this intro contains some simplifications. If you want to go directly to practice, skip to the section “Deep Learning configuration (Metadata Digger)”.
Machine Learning
Machine Learning is a branch of a broader field – Artificial Intelligence. Simply put, it’s a set of techniques that automatically extract knowledge from a sample of data (called training data) and then make a prediction or a decision based on unseen information. One of these mechanisms is the Artificial Neural Network (ANN), which consists of neurons connected to each other and grouped into layers.
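As a minimal sketch of the idea (plain Scala, not Metadata Digger code), a single artificial neuron just computes a weighted sum of its inputs and passes it through an activation function:

// A single artificial neuron: a weighted sum of the inputs plus a bias,
// passed through an activation function (here: sigmoid).
def neuron(inputs: Array[Double], weights: Array[Double], bias: Double): Double = {
  val weightedSum = inputs.zip(weights).map { case (x, w) => x * w }.sum + bias
  1.0 / (1.0 + math.exp(-weightedSum)) // the output is always between 0 and 1
}
// A layer is just a group of such neurons; the outputs of one layer
// become the inputs of the next one.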
Every time we want to train the network or make a prediction, we have to feed data in an appropriate format to the input layer. Suppose you want to build an ANN model that takes a picture and produces a list of labels describing the objects in the image. In the traditional approach, you have to implement complex algorithms that extract the crucial information from the picture and feed it into the input layer of the ANN. The architecture of such a network is usually not very complex (at most a few layers). The whole training process consists of many steps where you provide an input and show the correct output, and the Neural Network adjusts its parameters (called weights) to achieve the best fit to that output. When it’s ready, you should test your model (the ANN with the best weights) on unseen images. A good model can recognize objects in new pictures.
Deep Learning
Deep Learning works very similarly, but here we drop the advanced algorithms that extract information from the image (the first step above). Instead, we use a newer and more complex Neural Network architecture containing many layers and internal mechanisms, which replace that first step.
Now comes the best part: you can feed your picture in its original form (an array of pixels) to the input layer, and each subsequent part of the network will automatically focus on a different aspect of image analysis. I wrote “in its original form”, but in most cases there is still some preprocessing to do if you want very good results. Either way, in the DL approach you spend much more time on the network architecture than in the traditional (shallow) one. The results can be really great, and the good news is that there are plenty of ANN architectures for particular cases (image and text classification, object recognition, etc.), so you don’t have to figure out the whole network from scratch.
Additionally, there is a great technique called Transfer Learning, which lets you take a pretrained part of an ANN (you can select a number of layers) with all its learned parameters (weights) and reuse it in a different problem. It’s especially useful if you don’t have a huge training set and want to take advantage of existing ANN models trained on a large number of samples. The learning process is time and resource consuming; with a big data set it’s expensive, as you need powerful machines with fast GPUs/CPUs and a lot of memory. We used Transfer Learning in our sample model and reused most layers of a model built with a well-known ANN architecture: Inception V1.
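The snippet below is only a conceptual sketch in plain Scala (it is not the BigDL/Analytics Zoo API, and the function names are made up): the pretrained part of the network stays frozen, and only a small new “head” is trained on the new problem.

// Conceptual sketch of Transfer Learning (illustration only, not BigDL code).
// pretrainedFeatures stands in for the reused, frozen layers of a big model;
// its parameters are not changed during training on the new data set.
def pretrainedFeatures(pixels: Array[Double]): Array[Double] =
  pixels.grouped(4).map(_.sum / 4).toArray // placeholder for many frozen layers

// Only this small part is trained on the (possibly small) new data set.
def newHead(features: Array[Double], trainableWeights: Array[Double]): Double = {
  val score = features.zip(trainableWeights).map { case (f, w) => f * w }.sum
  1.0 / (1.0 + math.exp(-score))
}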
Deep Learning at scale – BigDL
As we use Spark for the whole processing, we decided to choose the BigDL framework, which provides Deep Learning algorithms on Spark (and supports Scala and Python). Additionally, there is one more library – Analytics Zoo. It’s built on top of BigDL and provides integration with well-known Machine Learning formats (TensorFlow, Keras, etc.). Both are Open Source and created by Intel. They are quite new but look very promising.
Currently, the only use of Deep Learning in Metadata Digger is image classification (recognizing what’s in the picture). You have to provide three things to make it work:
- A trained Deep Learning model in BigDL format. We prepared a basic model and published it under an Open Source license, but keep in mind that it’s not perfect. Our main goal was to provide a framework for such analysis, not a very accurate model. We have plans to train other (more accurate) models in the future. If you are interested, have an idea or just want to help, let us know at dev@datahunters.ai.
- A list of label mappings. The Neural Network produces a list of values on its output, where each element corresponds to a single category. You have to provide a list of pairs: the index in that output list and the name of the category.
- A threshold (default – 0.5). The output values of the Neural Network are between 0 and 1. In most cases we can assume that a value below 0.5 means the model didn’t recognize a particular category/object in the image, while a value greater than 0.5 means it was found. You can set this threshold in the MD properties to control this behavior and make the decision more or less restrictive, depending on your needs. Sometimes you want more results, even less accurate ones (including false positives), because you don’t want to miss anything. In such a case, you can set the threshold to a small value like 0.4. (The sketch after this list shows how the mapping and the threshold are applied.)
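Here is a minimal sketch (plain Scala, not actual Metadata Digger internals; the values and names are made up) of how the mapping and the threshold turn raw network outputs into labels:

// The network outputs one score per category, each between 0 and 1.
// indexToLabel plays the role of enrichment.classifier.mapping,
// threshold plays the role of enrichment.classifier.threshold.
val indexToLabel = Map(0 -> "person", 1 -> "bicycle", 2 -> "car")
val outputs = Array(0.91, 0.05, 0.73) // hypothetical scores for indices 0, 1, 2
val threshold = 0.5
val labels = outputs.zipWithIndex.collect {
  case (score, index) if score >= threshold => indexToLabel(index)
}
// labels contains "person" and "car" for the values above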
Example model
I suppose you don’t want to build your own model as a first step 😉 as it is not so simple. I trained a model recognizing 80 general categories. It’s far from perfect but gives quite good results. If you tune the classification process using the threshold property, you can benefit from it in your research. I used the COCO (Common Objects in Context) data set for training. I won’t go into the details of the training process in this post; I just want to show sample usage. If you want to read more about this particular model, please go to the following repo: https://github.com/data-hunters/metadata-digger-ai.
Deep Learning configuration (Metadata Digger)
First, we’ll run the simplest version – recognizing objects in images without metadata extraction. If you are not familiar with Metadata Digger, you can read the intro before you go through the next steps. If you prefer videos, you can also watch the one showing how to download, configure and start Metadata Digger.
Let’s start by downloading the latest version of Metadata Digger. Last week, we released a new minor version of MD containing Solr performance improvements, so if you want to use Solr, it’s much better to use version 0.2.1 instead of 0.2.0. You can find the link here. Unzip the appropriate distribution and follow these steps:
- Download the Deep Learning model from our repo or use another one in BigDL/Analytics Zoo format. You can find the direct link here: https://github.com/data-hunters/metadata-digger-ai/raw/master/zoo_models/zoo_inception_v1_based_coco.model
- Copy configs/csv.config.properties, open it in a text editor and set the following properties (a sample of the edited file is shown after this list):
  - input.paths – your input path containing images
  - output.directoryPath – path to a directory that will be created by MD to store the results
  - enrichment.classifier.modelPath – path to the downloaded Deep Learning model
  - enrichment.classifier.mapping – mapping of Neural Network outputs to actual labels. Set the following list: 69:oven,0:person,5:bus,10:fire hydrant,56:chair,42:fork,24:backpack,37:surfboard,25:umbrella,52:hot dog,14:bird,20:elephant,46:banana,57:couch,78:hair drier,29:frisbee,61:toilet,1:bicycle,74:clock,6:train,60:dining table,28:suitcase,38:tennis racket,70:toaster,21:bear,33:kite,65:remote,9:traffic light,53:pizza,77:teddy bear,13:bench,41:cup,73:book,2:car,32:sports ball,34:baseball bat,45:bowl,64:mouse,17:horse,22:zebra,44:spoon,59:bed,27:tie,71:sink,12:parking meter,54:donut,49:orange,76:scissors,7:truck,39:bottle,66:keyboard,3:motorcycle,35:baseball glove,48:sandwich,63:laptop,18:sheep,50:broccoli,67:cell phone,16:dog,31:snowboard,11:stop sign,72:refrigerator,43:knife,40:wine glass,26:handbag,55:cake,23:giraffe,8:boat,75:vase,58:potted plant,36:skateboard,30:skis,51:carrot,19:cow,4:airplane,79:toothbrush,47:apple,15:cat,68:microwave,62:tv
  - enrichment.classifier.threshold – threshold (must be between 0 and 1) used to decide whether the model recognized a particular object or not. The higher the value, the more restrictive the decision. I suggest using at least 0.5, but if you want fewer false positives, increase it to 0.8 (for this particular model).
- Comment out the filter.mandatoryTags property with “#” at the beginning of the line, or remove it. We won’t need it now.
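For orientation, the relevant part of the edited file could look roughly like this (the paths are placeholders and the mapping value is shortened here; use the full list from the step above):

# Input images and output location (placeholder paths)
input.paths=/data/images
output.directoryPath=/data/md-output
# Deep Learning classifier settings
enrichment.classifier.modelPath=/data/models/zoo_inception_v1_based_coco.model
enrichment.classifier.mapping=0:person,1:bicycle,2:car,...
enrichment.classifier.threshold=0.5
# filter.mandatoryTags=... (commented out, not needed for this run)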
Categories detection
We can now start the MD action detect_categories. You can do this easily with the following command (adjust the path to the configuration file):
sh run-standalone-metadata-digger.sh detect_categories configs/csv.config.properties
If you use the distributed version of MD, change the script file to run-distributed-metadata-digger.sh.
Output of MD should be similar to this one:
You should have at least one CSV file with results. The output file contains rows with 4 columns:
- ID – MD5 hash based on the full file path
- BasePath – path used in input.paths (you can set a list of input paths, which is why we include it in the results)
- FilePath – full file path
- Labels – comma-separated list of labels
You can easily change the output format to JSON by changing the value of the output.format property to json. Remember that running the processing twice with the same output directory will cause an error, as Spark needs to create a fresh directory for the results.
Categories detection and metadata extraction
Categories detection can be treated as a metadata enrichment process, and that was the main reason for implementing this feature. We can’t talk about metadata enrichment if we don’t have metadata ;), so let’s execute a full Metadata Digger run, which consists of categories detection and metadata extraction. This time we’ll write the results to Solr (you can use CSV or JSON, of course). If you don’t know how to configure MD with Solr, please read this tutorial.
When you configure the standard Solr properties described in the mentioned article, please also add enrichment.classifier.modelPath, enrichment.classifier.mapping and enrichment.classifier.threshold to the new configuration file, as described in the “Deep Learning configuration (Metadata Digger)” section above.
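For example, the classifier-related lines added to your Solr configuration file could look roughly like this (the path is a placeholder and the mapping is shortened; reuse the full values from the earlier section):

enrichment.classifier.modelPath=/data/models/zoo_inception_v1_based_coco.model
enrichment.classifier.mapping=0:person,1:bicycle,2:car,...
enrichment.classifier.threshold=0.5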
When you’re ready, run this:
sh run-standalone-metadata-digger.sh full <PATH_TO_CONFIG>
As a result, you will have all metadata in Solr with an additional field: labels. It contains the list of objects/categories recognized in the image. You can easily narrow the results in Solr to pictures with people and cars by adding this to the query:
labels:(person AND car)
You can also check how many images with particular categories have been detected using facets. To do that, enable the facet checkbox in Solr Admin -> Collection -> Query and type labels into facet.field, as on the screenshot below.
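If you prefer to query Solr directly over HTTP instead of the Admin UI, an equivalent request could look roughly like this (the collection name metadata and the host are placeholders; adjust them to your setup):

curl "http://localhost:8983/solr/metadata/select" \
  --data-urlencode "q=labels:(person AND car)" \
  --data-urlencode "facet=true" \
  --data-urlencode "facet.field=labels"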
Results should look like this:
Example model – accuracy
As I mentioned, the provided model is not very accurate (the overall accuracy is ~97%, but this number depends on the testing set); however, in my opinion it’s still useful, especially when you want to find images containing a person (in my case it recognized a person despite the poor quality of the image and the fact that the person was only a small part of the picture), a car, a truck, a table, a chair or a laptop. Due to the imbalanced training data set and the short training time, it gives quite funny results in some cases.
These are some examples:
- It detects a tie when there is a person in the image wearing a suit but no tie 😉 In a general search, when you want to find people dressed more formally, you can still use this label.
- If you have a field with sheaves of hay, it recognizes sheep 😉
- If there is a tower, the model very often “sees” a clock 😉 (probably due to images of clock towers in the training data set)
- The label “vase” is applied to anything related to any kind of “container” for flowers 😉
It’s not surprising, as the training time wasn’t very long and the training data set is imbalanced (e.g. there are many more samples with people than with carrots).
OSINT use cases
The model used here can be helpful in narrowing huge sets of images down to some general objects. However, you can train your own model on your own data set. It could also be general but more accurate. You can also think about building a specialized model related to the most common topics of your work, which recognizes, for instance, different types of:
- military buildings,
- guns,
- documents,
- smartphones,
- cars,
- public events,
- places,
- many, many more.
One of the interesting topics here is DeepFake detection. Training a model that detects images created with DeepFake tools could be very useful for finding suspicious accounts in a large set of avatars from Twitter, LinkedIn, Facebook, etc. When you combine searching by a label with EXIF data, you can find very interesting information.
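As a purely hypothetical example (the EXIF field name below is made up; the actual field names depend on how Metadata Digger maps metadata into your Solr schema), such a combined query could look like:

labels:person AND exif_make:"Canon"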
Training new models
We are going to work on more accurate and more specialized models in the coming months, so if you want to help (ANN architecture, training set, servers, etc.), just leave a comment or send an email (dev@datahunters.ai). If you already have an interesting BigDL model, let us know; we’ll check it and link to it from our metadata-digger-ai repo.
If you are interested in having a model built for commercial purposes (without publishing it as Open Source) and just want to pay to have it done, we are open to that as well; just let us know at contact@datahunters.ai.
I am very grateful for your attention and hope you enjoyed this post 🙂 If you want to be up to date with my latest articles, follow me on Twitter – @jca3s.