Individual AI solutions
Computer vision gives automated systems the ability to see, that is, to extract information from images and video for use in a wide variety of applications. In this article, our computer vision expert Stefan Kinauer offers insights into the growing relevance of computer vision in recent years and the possibilities it opens up for jambit’s customer sectors, such as Banking & Insurance or the manufacturing industry.
A thought experiment to explain how computer vision works: In computer graphics, a 3D model is created with geometry and surface properties such as color, reflectivity, and transparency. Based on lighting and camera models, this model can be rendered and displayed on a screen. Computer vision turns this process around and infers the 3D model, its semantics, or other input variables from the image. All the information that went into the process of image generation can be the subject and target of computer vision.
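The two directions of this thought experiment can be sketched with an idealized pinhole camera. This is a minimal illustration, not a full rendering or reconstruction pipeline: the class `PinholeCamera`, its parameter names, and the numeric values are assumptions chosen for the example, and lighting and surface properties are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Idealized pinhole camera (hypothetical helper, names are illustrative)."""
    focal_px: float  # focal length, expressed in pixels
    cx: float        # principal point, x coordinate in pixels
    cy: float        # principal point, y coordinate in pixels

    def project(self, point_3d):
        """Graphics direction: 3D point in camera coordinates -> pixel."""
        x, y, z = point_3d
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        return (self.focal_px * x / z + self.cx,
                self.focal_px * y / z + self.cy)

    def back_project(self, pixel, depth):
        """Vision direction: pixel plus a known depth -> 3D point."""
        u, v = pixel
        return ((u - self.cx) * depth / self.focal_px,
                (v - self.cy) * depth / self.focal_px,
                depth)


camera = PinholeCamera(focal_px=800.0, cx=320.0, cy=240.0)

point = (0.5, -0.25, 4.0)                 # a point 4 units in front of the camera
pixel = camera.project(point)             # forward: world -> image
recovered = camera.back_project(pixel, depth=4.0)  # inverse: image -> world

# A second point on the same viewing ray lands on the same pixel,
# so the depth cannot be recovered from a single pixel alone.
same_ray_pixel = camera.project((1.0, -0.5, 8.0))

print(pixel, recovered, same_ray_pixel == pixel)
```

The inverse step only works here because the depth is passed in explicitly. In practice that missing information has to come from elsewhere, for example from several views, additional sensors, or learned prior knowledge.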
The projection of the world onto a two-dimensional image plane is replaced by the reconstruction of the world from an image, i.e. a kind of inverse function. In addition to the image formation process as modeled by humans, neural networks can also infer parts of this process in a data-driven manner. During "training", the network learns to build a model of the environment from a large number of pictures, each paired with a given solution that describes what can be seen in it.
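The idea of learning from pictures with a given solution can be illustrated with a deliberately tiny example. The sketch below is not a deep neural network. It is a single logistic unit trained by gradient descent on four made-up 2x2 "images", and all names, data, and hyperparameters are assumptions for illustration only.

```python
import math
import random


def predict(weights, bias, pixels):
    """Return the model's probability that the image belongs to class 1."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, learning_rate=0.5, seed=0):
    """Fit one logistic unit to (pixels, label) pairs with plain SGD."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in samples:
            # Gradient of the log loss with respect to the pre-activation.
            error = predict(weights, bias, pixels) - label
            weights = [w - learning_rate * error * p
                       for w, p in zip(weights, pixels)]
            bias -= learning_rate * error
    return weights, bias


# Flattened 2x2 grayscale images: [top-left, top-right, bottom-left, bottom-right].
# The "given solution" (label): 1 = top row is bright, 0 = bottom row is bright.
training_data = [
    ([1.0, 0.9, 0.1, 0.0], 1),
    ([0.8, 1.0, 0.0, 0.2], 1),
    ([0.1, 0.0, 0.9, 1.0], 0),
    ([0.0, 0.2, 1.0, 0.8], 0),
]

weights, bias = train(training_data)

# Images that were not part of the training data.
unseen_top = [0.9, 0.8, 0.1, 0.1]
unseen_bottom = [0.1, 0.1, 0.8, 0.9]
print(predict(weights, bias, unseen_top) > 0.5)
print(predict(weights, bias, unseen_bottom) > 0.5)
```

Real computer vision models follow the same principle, but with many layers, millions of parameters, and far larger labeled data sets.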
Breakthroughs in computer vision since 2012
The fact that computer vision applications are now being considered for practical use in companies is the consequence of various developments in recent years and is largely based on the use of complex neural networks. To train these networks, one needs not only large computing capacities but also enormous volumes of (training) data. The development of graphics cards with their massively parallel computation was a significant factor in this. At the same time, the internet made more and more training data available. The breakthrough for neural networks was then achieved in 2012 with the work of Alex Krizhevsky et al., whose network outperformed all competitors in the ImageNet image classification benchmark. Following this work, the field of deep learning has become very popular.
What computer vision is currently capable of
With the groundbreaking successes since 2012, the research field around computer vision and machine learning has grown significantly, and progress has accelerated. Today, general classification of image content or the detection of a car within an image is a standard task that can be solved with open-source libraries. Existing techniques also make it possible, for example, to determine the brand of a car from an image or to analyze the motion of a person in a video. Furthermore, it is possible to infer simple object relations (e.g., the book is on the table, a chair is next to the bicycle) or to reconstruct objects in all three dimensions.
In recent years, generative neural networks such as generative adversarial networks (GANs) have become popular. Among other things, they can be used to manipulate images. This is done, for example, by filling in missing image information (image inpainting), by changing or adding "false" information (so-called deepfakes), or by changing the style of an image, for instance to resemble a particular painting (style transfer). Since such manipulations and alterations can deceive people, researchers are working in parallel on methods to detect them.
Before these approaches work in practice, the algorithm usually has to be adapted to the specific application and the characteristics of the images. For most tasks, the algorithm needs a fairly large data set to learn how to determine the correct "solution". Especially in this phase, the transfer of knowledge and the close exchange between software service provider and customer are important in development projects, because our customers usually know their domain much better than we do. Our main contribution therefore often lies in the adaptation or creation of algorithms and their training. Sometimes we also help to find or create data sets. And of course, we advise our customers with our expertise in computer vision and machine learning.
The potential of computer vision for the financial sector
For financial companies, computer vision can automate time-consuming and previously manual processes for analyzing both computer-generated and handwritten documents. Analyzed documents can be categorized and forwarded to the right contact person. In a later step, machine learning can be used to predict financial variables, such as stock prices or a company's credit score. Satellite data also allow conclusions to be drawn about economic developments. By analyzing these large volumes of image data from space, growth rates of countries or regions can be substantiated not only on the basis of published figures, but also through real images of traffic, infrastructure, or resources. This provides relevant information for investors, for example.
Potentials of computer vision for the manufacturing industry
In the industrial sector, computer vision can support human activities in production processes that were very time-consuming in the past, for example component traceability or quality monitoring. Computer vision can also assist with inventories or the continuous estimation of storage space utilization. In this way, space can be used optimally and its economic value maximized. At the same time, automated monitoring options increase the safety of buildings and of the people moving around inside them. In the area of quality control, computer vision can check the completeness of ordered parts and ensure manufacturing quality, for example by checking weld seams. Computer vision can also help construction supervisors track the progress of construction sites more closely.
Which hardware equipment is necessary?
A certain amount of hardware equipment is necessary for the use of computer vision. Depending on the application, either conventional cameras or high-quality models can be used. Conventional cameras, e.g. cell phone cameras, are sufficient in many cases. However, in the industrial sector, higher-quality cameras are often more appropriate, for example when analyzing fine surface structures or fast-moving processes. For area monitoring, cameras with wide-angle lenses are an option. In special applications, such as in the automotive industry, LiDAR sensors are also used. Multispectral cameras, which perceive light in the relevant wavelengths, are used in agriculture, for example. Some computer vision algorithms are very computationally intensive. Therefore, the computing power available on the chips built directly into the device or in the cloud is also crucial. Cloud solutions also make many applications scalable, as companies do not need their own physical infrastructure.
What profile do software service providers need to advise clients in the field of computer vision?
In the field of artificial intelligence, jambit can rely on roles such as AI experts, Data Scientists, Big Data Engineers and Research Engineers. In current projects, jambit's experts work with the IntelliJ and VS Code development environments. The programming languages used include Python and C++.
To support customer projects, experts need knowledge of machine learning and a clear understanding of image formation processes. In addition, a solid mathematical foundation in linear algebra, probability theory and statistics, and calculus (analysis), as well as in optimization methods and data structures, is important.