AppForge-Bench
A benchmark suite for LLM-based Android application synthesis and testing.
AppForge-Bench provides reproducible environments, Android emulator integration, and self-fix mechanisms for evaluating code generation and UI behavior across large-scale Android apps.
Quick Start
Prerequisites
Make sure the Android Emulator and the Android SDK are installed on your machine.
Unzip AppForge_Bench.zip and install the dependencies:
cd AppForge_Bench
pip install -r requirements.txt
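Before going further, it can help to confirm that the Android tools mentioned above are reachable from your shell. The following is a minimal sketch, not part of AppForge-Bench: it assumes the SDK's `adb` and `emulator` binaries are expected on your `PATH`, and the helper name `find_android_tools` is hypothetical.

```python
import shutil


def find_android_tools(names=("adb", "emulator")):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    Hypothetical helper for illustration; AppForge-Bench itself takes the
    SDK location through --sdk_path and may not need these tools on PATH.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_android_tools().items():
        print(f"{tool}: {path or 'NOT FOUND (check your SDK install and PATH)'}")
```

If a tool is reported as missing, it may still exist inside your SDK directory (commonly under `platform-tools/` for `adb` and `emulator/` for the emulator binary).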
Environment Setup
Unzip AppForge.zip and install the module in editable mode (the quotes keep shells such as zsh from expanding the brackets):
cd AppForge
pip install -e ".[example]"
Example Run
We provide an example script, test.py, under the examples/ folder.
A quick test with qwen3coder can be executed using:
python examples/test.py \
--emulator_id <emulator_id> \
--bench_folder <path_to_AppForge_Bench> \
--sdk_path <sdk_path> \
--model qwen3coder \
--runs example_qwen3 \
--api_key_path <api_key_path> \
--start_id 0 \
--end_id 1
Example on our machine:
python examples/test.py \
--emulator_id emulator-5554 \
--bench_folder /mnt/AppForge-Bench \
--sdk_path /home/Android/sdk \
--model qwen3coder \
--runs example_qwen3 \
--api_key_path dash_scope.key \
--start_id 0 \
--end_id 1
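If you launch many runs, assembling the command in Python can be less error-prone than editing the shell line by hand. The sketch below only builds the argument list from the flags shown above (plus the optional `--self_fix_attempts` flag documented on this page). The function name `build_test_command` is hypothetical, and the defaults simply mirror the example values.

```python
import shlex
import sys


def build_test_command(emulator_id, bench_folder, sdk_path, api_key_path,
                       model="qwen3coder", runs="example_qwen3",
                       start_id=0, end_id=1, self_fix_attempts=None):
    """Build the argv list for examples/test.py (hypothetical helper)."""
    cmd = [
        sys.executable, "examples/test.py",
        "--emulator_id", emulator_id,
        "--bench_folder", bench_folder,
        "--sdk_path", sdk_path,
        "--model", model,
        "--runs", runs,
        "--api_key_path", api_key_path,
        "--start_id", str(start_id),
        "--end_id", str(end_id),
    ]
    # Only pass the self-fix flag when the caller asks for it.
    if self_fix_attempts is not None:
        cmd += ["--self_fix_attempts", str(self_fix_attempts)]
    return cmd


if __name__ == "__main__":
    cmd = build_test_command("emulator-5554", "/mnt/AppForge-Bench",
                             "/home/Android/sdk", "dash_scope.key")
    # Print the command instead of running it; pass `cmd` to
    # subprocess.run(cmd, check=True) from the AppForge folder to execute.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` avoids shell quoting issues with paths that contain spaces.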
Optional: Self-Fix with Compilation Feedback
To activate self-fix (automatic repair using compilation feedback), set the parameter:
--self_fix_attempts <N>
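Conceptually, repair with compilation feedback is a bounded retry loop: generate code, compile it, and on failure hand the compiler log back to the model. The sketch below illustrates that general idea only. How AppForge-Bench implements it is not described here, and `generate`, `compile_app`, and `generate_with_self_fix` are hypothetical names, with stubs standing in for a real model and build.

```python
from typing import Callable, Optional, Tuple


def generate_with_self_fix(
    generate: Callable[[Optional[str]], str],
    compile_app: Callable[[str], Tuple[bool, str]],
    self_fix_attempts: int,
) -> Tuple[str, bool, int]:
    """Return (code, compiled_ok, repairs_used).

    `generate` receives the previous compiler log (or None on the first try);
    `compile_app` returns (success, log). Both are assumptions for this sketch.
    """
    code = generate(None)          # initial attempt, no feedback yet
    ok, log = compile_app(code)
    repairs = 0
    while not ok and repairs < self_fix_attempts:
        code = generate(log)       # feed the compiler output back to the model
        ok, log = compile_app(code)
        repairs += 1
    return code, ok, repairs


# Stubs standing in for a real model and a real build.
def fake_generate(feedback: Optional[str]) -> str:
    return "broken" if feedback is None else "fixed"


def fake_compile(code: str) -> Tuple[bool, str]:
    return (code == "fixed", "" if code == "fixed" else "error: missing symbol")


if __name__ == "__main__":
    print(generate_with_self_fix(fake_generate, fake_compile, self_fix_attempts=2))
```

With zero attempts the loop body never runs, so the first generation is returned as-is.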
If you don't have access to the provided model options, you can:
- Add your own model integration, or
- Use --model=naive to apply a baseline that makes no changes to the base template.
For detailed running parameters, see the source code.
View Code Reference
Downloads
Download AppForge.zip
Download AppForge_Bench.zip