Deep Learning for mobile and IoT

    Why run locally on devices?

    When building a mobile or IoT application with Deep Learning, you should consider whether your Deep Neural Network (DNN) models should live in the Cloud or on users’ local devices. Here are some hints to help you decide:

    • Edge is cheap
      Moving DNN models to local devices will lower your operating costs by reducing the need for Cloud compute.
    • Edge is scalable
      New users come with their own devices, so you don’t need to redesign your system as you grow.
    • Edge is fast
      A device-to-cloud roundtrip typically takes on the order of 200 ms; even fairly large models can run locally in less time.
    • Edge is available offline
      You can serve your users even when they are offline when the whole app runs on their devices.
    • Edge is secure
      By keeping all the data on the user’s device, you reduce your exposure to data breaches.

    Welcome to numericcal! We build tools for easy optimization and management of Deep Learning models on mobile, embedded and IoT devices.

    We currently support Android.Java and will be adding iOS.Swift, Unity.C# and other SDKs. Let us know about your use case, we want to help! As you read along, you can view examples and additional information in the dark area.

    Our tools make it quick and easy to optimize, deploy, monitor and update Deep Learning models on edge devices.

    A high level view

    At the highest level, to use our system you need to package your models through the Cloud API, configure a deployment collection, and integrate the Edge SDK into your application.

    Below, we go into more details about how components work and fit together to provide developers with infrastructure they need to easily and quickly close the virtuous circle of AI.

    Numericcal Cloud API

    Numericcal Cloud API provides a unified interface to control and monitor all of your Deep Neural Network models across all applications and devices in your edge fleet. With it you can manage models, configure deployments, and monitor usage and performance.

    Currently, the API is accessible through a graphical dashboard, which accommodates the most common use cases. In the next release, we will open up the API for programmatic access.

    Model management

    Model import and creation

    With the Numericcal AI Model Repository, you can import pre-packaged models or upload and package your own.

    On the right we show how you can acquire a pre-packaged model and how to package your custom model by providing TensorFlow graphs and auxiliary files. The system does not have any restrictions on auxiliary files. For example, we use them to store labels for classification networks and anchor boxes for object detectors.

    Model packaging

    Populating model metadata

    Packaging ensures that all the information necessary to deploy and run a model is in one place. It makes models reusable and simplifies integration code, since model-related constants do not need to be hard-coded on the application side. To package a model:

    1. upload all the necessary files,
    2. specify input and output graph nodes and their shapes,
    3. specify the runtime engine to be used, and
    4. declare and specify model hyperparameters.
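    The four steps above roughly correspond to filling out a package manifest. The sketch below is purely illustrative — the field names and layout are our own assumptions, not the actual numericcal package format:

    ```json
    {
      "modelId": "tiny-yolo-v2",
      "files": ["graph.pb", "voc.labels", "anchors.txt"],
      "inputs":  [{ "node": "input",  "shape": [1, 416, 416, 3] }],
      "outputs": [{ "node": "output", "shape": [1, 13, 13, 125] }],
      "engine": "tensorflow",
      "params": { "S": 13, "B": 5, "C": 20,
                  "labels": "voc.labels", "anchors": "anchors.txt" }
    }
    ```

    The point of such a manifest is that everything the runtime needs — files, graph nodes, shapes, engine and hyperparameters — travels with the model rather than living in application code.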

    Machine Learning engineers can enforce an abstraction barrier between the model and the client application by introducing model parameters. Names and types of model parameters must stay the same across versions of a unique modelId; however, their values can change across versions. As long as application engineers treat model parameters as values known only at runtime, ML engineers can freely change, improve and update models with no overhead for the engineering team.

    For example, we package TinyYOLOv2’s S (cell count), B (boxes per cell) and C (classes predicted per box) values as model parameters. This leaves us room to grow and change the model in the future without risking engineering overhead. If we need to, we can switch from a 13x13 predictor with 5 boxes per cell and 20 classes to a 19x19 TinyYOLOv2 predictor with 8 boxes per cell and 40 classes without ever touching the client code.
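    To make the abstraction concrete, here is a minimal sketch of client code that derives the output tensor length from the packaged S, B and C values instead of hard-coding it. The class and method names are our own, not part of the SDK:

    ```java
    // Sketch only: OutputShape is a hypothetical helper, not part of the
    // numericcal SDK. In a real app, s, b and c would be parsed from the
    // model parameters carried in <handle>.info.params.
    public class OutputShape {
        // Each of the S x S grid cells predicts B boxes; every box carries
        // 4 coordinates, 1 objectness score and C class scores.
        public static int outputLen(int s, int b, int c) {
            return s * s * b * (5 + c);
        }

        public static void main(String[] args) {
            System.out.println(outputLen(13, 5, 20)); // current model: 21125
            System.out.println(outputLen(19, 8, 40)); // future model: 129960
        }
    }
    ```

    Because the client computes the length at runtime, swapping the 13x13/5/20 model for the 19x19/8/40 one changes nothing on the application side.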

    Deployment configuration

    Configuring a model collection

    Numericcal Deployment Manager provides a highly flexible pipeline for pushing models to devices running applications. By creating a deployable model collection and copying the generated collection ID to the edge-side model instantiation code, we can assign different neural networks to different subgroups of the fleet without changing the application itself. The figure on the right shows how you can distribute different versions of a model to the edge devices running the application, according to their SoCs or Android API levels. Note that this configuration can be modified at any time, before or after the application gets installed on the device.

    With our infrastructure, it is possible to enable more complex distribution schemes (different models for different geolocations, different user groups, etc.). These more advanced options will be accessible in the next release.
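    As an illustration, a collection like the one described above might map device traits to model versions. The layout below is hypothetical — it is not the dashboard’s actual configuration format:

    ```json
    {
      "collectionId": "a1b2c3",
      "default": "mobilenet-v1@3",
      "rules": [
        { "when": { "soc": "snapdragon-845" },     "deploy": "mobilenet-v1-dsp@3" },
        { "when": { "androidApi": { "min": 27 } }, "deploy": "mobilenet-v1-nnapi@2" }
      ]
    }
    ```

    Since the application only references the collection ID, any of these rules can be changed from the Cloud without shipping an app update.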

    Analytics and monitoring

    Monitoring the deployment

    From the deployment manager, you can track how frequently your models are used and get an overview of how they perform on different devices. The figure on the right shows how to access these statistics from the dashboard. Currently, the dashboard provides a daily active user chart, which aggregates the number of unique users across all your deployments. Per model deployment, you can also see how many inferences have been performed and how fast they ran on different kinds of systems. With this profiling information, you can adjust your deployment strategy to provide a good user experience across a wide variety of devices.

    Numericcal Edge SDK

    Our Edge SDKs are designed for simplicity and ease of use. The design rests on two main principles:

    Below is documentation for the Edge SDK.

    SDK Setup

    Edit the root build.gradle to add our artifactory:

    allprojects {
        repositories {
            // ...
            maven { url "" }
        }
    }

    Edit the build.gradle of your app to add SDK dependency and enable Java 1.8.

    android {
        // ...
        compileOptions {
            sourceCompatibility JavaVersion.VERSION_1_8
            targetCompatibility JavaVersion.VERSION_1_8
        }
        // ...
        aaptOptions {
            noCompress "pb", "md", "tflite", "lite", "labels"
        }
    }

    dependencies {
        // ...
        // numericcal edge SDK for Android
        implementation "com.numericcal.edge:edge:x.y.z-phase"
    }

    The current version is 0.20.0-beta. Import com.numericcal.edge.Dnn to use the numericcal SDK:

    import com.numericcal.edge.Dnn;

    To start using the numericcal SDK you should:

    1. point your build system to our repository,
    2. set build options, and
    3. add numericcal Edge SDK dependency.

    See the example section for the details on how to do this.

    Model Config

    Dnn.ModelConfig cfg = Dnn.configBuilder

    Configuring a model request is as simple as describing the account and model collection from which the model should be obtained. Note that, by itself, this code simply creates a configuration object for requesting the model. To actually perform model loading we need to use Dnn.Manager.

    Model Manager

    Obtain an instance of Dnn.Manager.

    // onCreate() or onResume()
    Dnn.Manager dnnManager = Dnn.createManager(getApplicationContext());

    You should release the manager when you are done.

    // onPause() or onDestroy()
    dnnManager.release();

    Dnn.Manager, as the name implies, performs all the tedious work necessary to download, cache, update and load model packages.

    Essentially, it manages the lifecycle of the model artifact (package) from the time we request it through Dnn.ModelConfig object until the time when we load it in memory for use.

    Manager lifecycle

    Dnn.Manager acquires resources (threads and memory) for running models. These resources should be returned to the system by calling .release() when we are done using it.

    Instantiating a network handle

    To perform model loading we simply pass the configuration object to the manager, like so:

    Single<Dnn.Handle> dnn = dnnManager.createHandle(cfg);

    However, loading a model can take tens of milliseconds, even when the model is available locally! Hence, the call returns a Single<Dnn.Handle> to indicate that the handle will be delivered asynchronously. Internally, the Dnn.Manager instance offloads work to a background thread and checks whether the model is already cached locally or needs to be downloaded. Even if there is a local version of the model, it will occasionally check whether updates are available on the server and download them.

    Note that the only information about the model available to the manager is what’s contained in the configuration object. Specifically, the manager does not know which exact package it should download or which runtime engine the package requires. This makes it possible to customize deployment, even after edge applications have been installed, entirely from the Cloud API! We call this mechanism Late Artifact Binding (LAB).

    Model Handle

         .flatMap(handle -> {
             int inputWidth =;
             int inputHeight =;
             int outputLen =;
             return Camera.getFeed(this, cameraView, camPerm, inputWidth, inputHeight)
                      .compose(Camera.bmpToFloatHWC(IMAGE_MEAN, IMAGE_STD))
                      .compose(handle.runInference()) // ⇐ all that’s needed to run inference!
                      .compose(Camera.lpf(outputLen, 0.75f))
                      .map(probs -> Utils.topkLabels(probs,, TOP_LABELS));

    Once we have a handle we can insert it into any ReactiveX chain. In the example on the right we load the network, open a camera stream based on the network input shape, send frames to the network and post-process the inference results. The actual inference is performed by the .compose(handle.runInference()) method call.

    Note that a single handle can safely be used across multiple Rx chains. The library will make sure that access to the underlying DNN model is properly serialized, and perform input/output multiplexing. Moreover, the library will automatically move every handle to a separate thread.

    public static class ModelParams {
            public int S;
            public int C;
            public int B;
            public int inputMean = 0;
            public float inputStd = 256.0f;
            public List<AnchorBox> anchors = new ArrayList<>();
            public List<String> labels;
            ModelParams(Dnn.Handle hdl) throws JSONException, IOException {
                JSONObject json =;
      , json.toString());
                this.S = json.getInt("S");
                this.C = json.getInt("C");
                this.B = json.getInt("B");
                this.labels = Utils.loadLines(hdl.loadHyperParamFile(json.getString("labels")));
                for (String anchorStr: Utils.loadLines(hdl.loadHyperParamFile(
                        json.getString("anchors")))) {

    Dnn.Handle also has one public field of type Dnn.Handle.Info, which holds information about the network that was loaded from the package. The handle also carries model parameters as a JSON-serialized public field <handle>.info.params. Users must write a parser for this data, because the system does not impose any constraints on what can be present there. However, since any given modelId must maintain the same names and types of parameters, this parser needs to be written only once per model. On the right we show how model parameter parsing works for the TinyYOLOv2 model. Applications can leverage the information contained in this object to abstract over network details and improve the modularity of the system.