
AppForge-Bench

A benchmark suite for LLM-based Android application synthesis and testing.
AppForge-Bench provides reproducible environments, Android emulator integration, and self-fix mechanisms for evaluating code generation and UI behavior across large-scale Android apps.


🚀 Quick Start

🔧 Prerequisites

Make sure the Android SDK and the Android Emulator are installed on your machine.
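As a quick sanity check before running anything, the sketch below looks for the `adb` and `emulator` binaries under an SDK directory. It assumes the standard Android SDK layout on Linux/macOS (`platform-tools/adb` and `emulator/emulator`); the helper name `missing_sdk_tools` is hypothetical and not part of AppForge, and the benchmark itself may locate these tools differently.

```python
from pathlib import Path
from typing import List

# Relative locations of the tools in a standard Android SDK install.
# Assumption: Linux/macOS file names; on Windows they are adb.exe and emulator.exe.
EXPECTED_TOOLS = {
    "adb": Path("platform-tools") / "adb",
    "emulator": Path("emulator") / "emulator",
}


def missing_sdk_tools(sdk_path: str) -> List[str]:
    """Return the names of the expected tools that are not found under sdk_path."""
    root = Path(sdk_path).expanduser()
    return [name for name, rel in EXPECTED_TOOLS.items() if not (root / rel).is_file()]


if __name__ == "__main__":
    # The path is only an illustration; pass the value you plan to use for --sdk_path.
    missing = missing_sdk_tools("/home/Android/sdk")
    print("missing tools:", missing or "none")
```

An empty list suggests the directory is usable as `--sdk_path`; any reported name points to an SDK component that still needs to be installed.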

Unzip AppForge_Bench.zip and install the dependencies:

cd AppForge_Bench
pip install -r requirements.txt

âš™ī¸ Environment Setup

Unzip AppForge.zip and install the module in editable mode:

cd AppForge
pip install -e ".[example]"

🧠 Example Run

We provide an example script test.py under the examples/ folder.

A quick test with qwen3coder can be executed using:

python examples/test.py \
  --emulator_id <emulator_id> \
  --bench_folder <path_to_AppForge_Bench> \
  --sdk_path <sdk_path> \
  --model qwen3coder \
  --runs example_qwen3 \
  --api_key_path <api_key_path> \
  --start_id 0 \
  --end_id 1

Example on our machine:

python examples/test.py \
  --emulator_id emulator-5554 \
  --bench_folder /mnt/AppForge-Bench \
  --sdk_path /home/Android/sdk \
  --model qwen3coder \
  --runs example_qwen3 \
  --api_key_path dash_scope.key \
  --start_id 0 \
  --end_id 1
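The `--emulator_id` value used above (`emulator-5554`) has the form of the serials that `adb devices` prints for running emulators. The sketch below, which is not part of AppForge, parses that output to list devices in the `device` (ready) state; the function name `running_devices` and the sample text are illustrative assumptions.

```python
from typing import List


def running_devices(adb_devices_output: str) -> List[str]:
    """Extract serials of ready devices from the text printed by `adb devices`."""
    serials = []
    for line in adb_devices_output.splitlines():
        parts = line.split()
        # Device rows look like "<serial>\t<state>"; the header, blank lines,
        # and daemon start-up messages do not match this two-field shape.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


# Illustrative output; to use real data, capture it with
# subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
SAMPLE = "List of devices attached\nemulator-5554\tdevice\nemulator-5556\toffline\n\n"

if __name__ == "__main__":
    print(running_devices(SAMPLE))
```

Devices reported as `offline` or `unauthorized` are skipped, since a run against them would likely fail.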

🧩 Optional: Self-Fix with Compilation Feedback

To activate self-fix (automatic repair using compilation feedback), set the parameter:

--self_fix_attempts <N>
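Conceptually, self-fix with compilation feedback is a bounded retry loop: generate code, compile it, and on failure pass the compiler log back to the model. The sketch below illustrates that idea only; it is not AppForge's implementation, and `generate`, `compile_app`, and `synthesize_with_self_fix` are hypothetical names standing in for the model call and the build step.

```python
from typing import Callable, Optional, Tuple


def synthesize_with_self_fix(
    generate: Callable[[Optional[str]], str],
    compile_app: Callable[[str], Tuple[bool, str]],
    self_fix_attempts: int,
) -> Tuple[str, bool, int]:
    """Generate code, then retry with the compiler log up to self_fix_attempts times."""
    code = generate(None)  # first attempt, no feedback yet
    ok, log = compile_app(code)
    attempts_used = 0
    while not ok and attempts_used < self_fix_attempts:
        code = generate(log)  # feed the compilation errors back to the model
        ok, log = compile_app(code)
        attempts_used += 1
    return code, ok, attempts_used


# Stub model and compiler, for illustration only.
def fake_generate(feedback: Optional[str]) -> str:
    return "broken code" if feedback is None else "fixed code"


def fake_compile(code: str) -> Tuple[bool, str]:
    return (code == "fixed code", "" if code == "fixed code" else "error: missing symbol")


if __name__ == "__main__":
    print(synthesize_with_self_fix(fake_generate, fake_compile, self_fix_attempts=2))
```

With zero attempts the loop never runs, which corresponds to evaluating the first generation as-is.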

If you don't have access to the provided model options, you can:

  • Add your own model integration, or
  • Use --model=naive to apply a baseline that makes no change to the base template.

â„šī¸ For detailed running parameters, see the source code.
đŸ’ģ View Code Reference


📦 Downloads

âŦ‡ī¸ Download AppForge.zip âŦ‡ī¸ Download AppForge_Bench.zip