With our default menuconfig (plus memory debugging enabled), I get:

I (0) cpu_start: App cpu up.
I (2206) heap_alloc_caps: Initializing. RAM available for dynamic allocation:
I (2229) heap_alloc_caps: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (2249) heap_alloc_caps: At 3FFCA290 len 00015D70 (87 KiB): DRAM
I (2270) heap_alloc_caps: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (2291) heap_alloc_caps: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (2313) heap_alloc_caps: At 400955AC len 0000AA54 (42 KiB): IRAM

OVMS > module memory
============================
Free 8-bit 77696/218088, 32-bit 18608/43580, numafter = 872
task=Asy total= 0 0 24972 change= +0 +0 +24972
task=tiT total= 352 0 0 change= +352 +0 +0
task=Hou total= 20192 24444 0 change= +20192 +24444 +0
task=mai total= 9488 26948 0 change= +9488 +26948 +0
task=IDL total= 72 0 0 change= +72 +0 +0
task=ipc total= 9320 0 0 change= +9320 +0 +0
task=eve total= 41440 0 0 change= +41440 +0 +0

If I turn off bluetooth in menuconfig, I get:

OVMS > module memory
============================
Free 8-bit 150548/290912, 32-bit 18628/43600, numafter = 872
task=Asy total= 0 0 24972 change= +0 +0 +24972
task=tiT total= 352 0 0 change= +352 +0 +0
task=Hou total= 44592 0 0 change= +44592 +0 +0
task=mai total= 36436 0 0 change= +36436 +0 +0
task=IDL total= 72 0 0 change= +72 +0 +0
task=ipc total= 9320 0 0 change= +9320 +0 +0
task=eve total= 41468 0 0 change= +41468 +0 +0

72KB RAM seems a bit ridiculous for a bluetooth stack. I really hope this is not Espressif loading those binary blobs from libbtdm_app.a into RAM.
If I configure a minimal (for our needs) bluetooth (BLE only), I get:

--- Bluedroid Bluetooth stack enabled
(3072) Bluetooth event (callback to application) task stack size (NEW)
[ ] Bluedroid memory debug (NEW)
[ ] Classic Bluetooth (NEW)
[*] Release DRAM from Classic BT controller
[*] Include GATT server module(GATTS) (NEW)
[ ] Include GATT client module(GATTC)
[*] Include BLE security module(SMP) (NEW)
[*] Close the bluedroid bt stack log print (NEW)
(2) BT/BLE MAX ACL CONNECTIONS(1~7) (NEW)

OVMS > module memory
============================
Free 8-bit 106312/246720, 32-bit 18576/43548, numafter = 872
task=Asy total= 0 0 24972 change= +0 +0 +24972
task=tiT total= 352 0 0 change= +352 +0 +0
task=Hou total= 21840 22776 0 change= +21840 +22776 +0
task=mai total= 36436 0 0 change= +36436 +0 +0
task=IDL total= 72 0 0 change= +72 +0 +0
task=ipc total= 9320 0 0 change= +9320 +0 +0
task=eve total= 41476 0 0 change= +41476 +0 +0

With just classic bluetooth enabled:

OVMS > module memory
============================
Free 8-bit 77696/218096, 32-bit 18608/43580, numafter = 872
task=Asy total= 0 0 24972 change= +0 +0 +24972
task=tiT total= 352 0 0 change= +352 +0 +0
task=Hou total= 20200 24444 0 change= +20200 +24444 +0
task=mai total= 9488 26948 0 change= +9488 +26948 +0
task=IDL total= 72 0 0 change= +72 +0 +0
task=ipc total= 9320 0 0 change= +9320 +0 +0
task=eve total= 41440 0 0 change= +41440 +0 +0

Turning off our memory and task debugging features, and enabling BLE GATT server + security (2 connections), gives me 122KB in metric m.freeram. That should include bluetooth requirements, but no wifi running yet. With wifi, it drops to 82KB. That should be what end-users of OVMS will see, and I'm happy with that, so I have committed that as a default config for the moment. With memory and stack debugging enabled (as developers will most likely use), it drops from 122KB to 106KB.

I did see some comments in the forums about this.
For example: https://www.esp32.com/viewtopic.php?t=3139

It seems that IDF 3.0 changes the way this works (dynamic allocation, rather than static as per menuconfig), and the documentation for that mentions 70KB. It looks like a minimal bluetooth for us is going to cost 40KB RAM or so.

Regards, Mark.
On Mon, 23 Oct 2017, Mark Webb-Johnson wrote:
72KB RAM seems a bit ridiculous for a bluetooth stack. I really hope this is not Espressif loading those binary blobs from libbtdm_app.a into RAM.
I've been investigating the memory map to try to understand better what is available to us. From the OS code and the technical reference manual I constructed the following map:

0x3ff00000 - 0x3ff80000  Peripheral
0x3ff80000 - 0x3ff82000  RTC fast memory
0x3ff82000 - 0x3ff90000  ---
0x3ff90000 - 0x3ffa0000  Internal ROM
0x3ffae000 - 0x3ffaff10  DRAM, reserved for ROM data region, inc region needed for BT ROM routines
0x3ffaff10 - 0x3ffb0000  Heap: DRAM little piece
0x3ffb0000 - 0x3ffc0000  DRAM, reserved for BT hardware shared memory & BT data region
0x3ffc0000 - 0x3ffd0578  DRAM used by bss/data static variables (VARIES)
0x3ffd0578 - 0x3ffe0000  Heap: DRAM 8-bit accessible
0x3ffe0000 - 0x3ffe0440  D/IRAM, reserved for ROM PRO data region
0x3ffe0440 - 0x3ffe4000  Heap: D/IRAM DMA capable, 8-bit accessible
0x3ffe4000 - 0x3ffe4350  D/IRAM, reserved for ROM APP data region
0x3ffe4350 - 0x40000000  Heap: D/IRAM DMA capable, 8-bit accessible
0x40000000 - 0x40060000  Internal ROM
0x40060000 - 0x40070000  ---
0x40070000 - 0x40078000  IRAM CPU0 cache region
0x40078000 - 0x40080000  IRAM CPU1 cache region
0x40080000 - 0x400955c0  IRAM used by code (VARIES)
0x400955c0 - 0x400a0000  Heap: IRAM EXEC capable, 32-bit only
0x400a0000 - 0x400c0000  Internal SRAM used for something else

Note that there are two chunks of memory reserved for Bluetooth: 7952 bytes at 0x3ffae000 and 65536 bytes at 0x3ffb0000.

I have reduced the bss/data growth that resulted from configuring in the memory debugging code, by replacing the static arrays with memory allocated from the IRAM heap that is only 32-bit accessible. There might be other memory uses that could go there. Right now I'm using 24972 out of that 49560 space, leaving only 18588, but I'm thinking about how to reduce that by moving some of the data reduction into the OS.

-- Steve
I'd like to add another $.02 regarding RAM usage. We need to consider the impact of all three modes of RAM usage:

1. static declaration of variables
2. local (automatic) variables that cause the maximum stack usage to increase, so we need to dedicate a larger stack allocation
3. dynamic allocations from the heap

I have more specific comments for each of these.

1. If you need a large buffer, declaring it as static storage means it is always allocated even when your code is not being used (unless it is something like vehicle-specific code that is configured out). So, it would be better to dynamically allocate that buffer space from the heap (malloc) when needed and then free it when finished, so that the usage is only temporary. That way the same space might be used for other purposes at other times.

2. I recommend NOT USING std::string except where it is really needed and useful. In particular, if you have a function parameter that is always supplied as a character constant but the type of the parameter is std::string, then the compiler needs to expand the caller's stack space by 32 bytes, for each such instance of the call, to hold the std::string structure. Additional heap space is required for the characters. None of that would be required if the parameter type were const char*. The same problem applies to functions that return std::string, since the compiler must allocate stack space in the calling function for the return value to be copied. In particular, if the caller is just going to print that string with .c_str(), it would be much better to put the .c_str() in the called function and return the const char*, UNDER ONE IMPORTANT CONDITION: this depends on the std::string in the called function being stable, such as a member variable of the class. If the string in the called function is automatic (allocated on the stack), then the .c_str() of it won't be valid in the caller.
I saved substantial stack space and also heap space by changing the command maps in OvmsCommand from map<std::string, OvmsCommand*> to map<const char*, OvmsCommand*, CompareCharPtr>. This was possible because all of the command token strings that are put in the map come from character constants anyway, and those are stable. I think there are several more functions that could safely have their arguments or return values changed.

Now, I don't mean to be pushing us back to essentially writing C code in C++ and ignoring the benefits of C++. For places where dynamic storage is needed, as for a class member, using std::string is a big advantage and not a problem. Just be cognizant of the costs where it is used.

3. As I mentioned in an earlier message, there is another 40K of RAM available for dynamic allocation by code that only requires 32-bit access, not byte-access. This is in IRAM (Instruction RAM). It won't be allocated by 'malloc' or 'new' but can be allocated explicitly with pvPortMallocCaps(size, MALLOC_CAP_32BIT). I'm currently using part of it for the storage of metadata about blocks allocated from the heap in the ovms_module debugging code, to minimize the impact that using that code has on the memory available for the code to be tested.

-- Steve
While implementing the set type metrics I mentioned before, I've checked out metrics RAM usage: just adding four new metrics reduced my free memory by 656 bytes. 112 bytes of that were names, 16 bytes were actual data content, 16 bytes were pointers to the metrics. So it seems each metric currently needs 128 bytes of management overhead (plus RAM for the name, in the case of dynamic names like for an array of battery cells).

I've added 128 custom metrics now, which add up to ~20 KB of RAM -- more than 30% of the free RAM available before. I've got ~80% of my planned metrics now, so it should fit for the Twizy, but a more complex monitoring would need to use combined metrics, e.g. for cell data.

Maybe we can change the registry structure to not use std::map. The fast btree lookup is not really needed if listeners use pointers. A simple std::forward_list would do if new metrics are inserted sorted. Also, allocating RAM for name patterns like "...cell.<n>..." is of course very wasteful; maybe we can introduce some array logic to generate these names.

Regards, Michael

On 25.10.2017 07:09, Stephen Casner wrote:
[snip]

_______________________________________________
OvmsDev mailing list
OvmsDev@lists.teslaclub.hk
http://lists.teslaclub.hk/mailman/listinfo/ovmsdev
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
Firstly, remember that with memory debugging turned on (menuconfig, Components, OVMS, Developer, Enabled extended RAM memory allocation statistics), there is an extra overhead for each memory allocation. I tested a simple allocation of a 32 byte object, and that reduced free memory by 56 bytes; so it seems to be 24 bytes for each allocation. Not sure what it is without the debugging turned on (as it is much harder to see the allocations), but it should be a lot less.

I added some instrumentation, and got this:

I (2778) metrics: Initialising METRICS (1810)
I (2817) metrics: OvmsMetric is 28 bytes
I (2861) metrics: OvmsMetricBool is 32 bytes
I (2909) metrics: OvmsMetricInt is 32 bytes
I (2955) metrics: OvmsMetricFloat is 32 bytes
I (3003) metrics: OvmsMetricString is 52 bytes

Then, a ‘test metric’ command to register a new OvmsMetricBool metric, and got this:

OVMS > test metric
OVMS > module memory
============================
Free 8-bit 102028/246616, 32-bit 18576/43548, numafter = 921
task=Asy total= 0 104 24972 change= +0 +104 +0
============================

That is using an ‘x.test.metric’ name (which should fit in the std::string small string allocation). It seems that using OvmsMetricBool (probably our smallest), we currently need 32 bytes for the bool itself, plus 48 bytes for the OvmsMetrics map linkage, plus 24 bytes allocation overhead.
Looking at the OvmsMetricBool, it adds a ‘bool m_value’ on top of the base OvmsMetric values, which are:

const char* m_name;
bool m_defined;
bool m_stale;
int m_autostale;
metric_unit_t m_units;
std::bitset<METRICS_MAX_MODIFIERS> m_modified;
uint32_t m_lastmodified;

Looking at the OvmsMetric member variables, we can live with an autostale of unsigned 16-bit integer, so re-arranging those is a simple win:

const char* m_name;
metric_unit_t m_units;
std::bitset<METRICS_MAX_MODIFIERS> m_modified;
uint32_t m_lastmodified;
uint16_t m_autostale;
bool m_defined;
bool m_stale;

With that done, OvmsMetric goes from 28 bytes to 24 (with a similar 4 byte win on all the other derived types). Also, my simple registration of a new OvmsMetricBool goes from 104 bytes to 100.

Looking at the m_metrics map storage, I tried a simple:

std::forward_list<uint32_t> x;
for (uint32_t k=0;k<100;k++) x.push_front(k);

OVMS > test metric
OVMS > module memory
task=Asy total= 0 3228 24972 change= +0 +3228 +0
OVMS > test metric
OVMS > module memory
task=Asy total= 0 6428 24972 change= +0 +3200 +0

That works out at 32 bytes per std::forward_list entry. That is presumably the 24 bytes for the allocation, plus 8 bytes per entry (a pointer to the next, plus a pointer to the entry content).

The absolute most efficient dynamic system would be an OvmsMetric* m_next in OvmsMetric, then an OvmsMetric* m_first in OvmsMetrics, and remove the map altogether. That would be 4 bytes for the OvmsMetrics list, plus an extra 4 bytes for each metric. The original design made a lot of use of MyMetrics.Find(), but that has been deprecated now and I don’t see anything using it at all. The only thing we need is a ‘metrics list’ iterator to show the details - all that needs is an ordered list.

So, I went ahead and did that. Changed ovms_metrics to use a manually managed one-way linked list.
With the original code, and my default set of metrics (not including the Twizy ones), this comes to:

OVMS > module memory
============================
Free 8-bit 102264/246616, 32-bit 18576/43548, numafter = 917
OVMS > vehicle module RT
I (107076) v-renaulttwizy: Renault Twizy vehicle module
OVMS > module memory
============================
Free 8-bit 75380/246616, 32-bit 18576/43548, numafter = 1000

With the new code, and my default set of metrics (not including the Twizy ones), this comes to:

OVMS > module memory
============================
Free 8-bit 106536/246632, 32-bit 18576/43548, numafter = 828
OVMS > vehicle module RT
I (55126) v-renaulttwizy: Renault Twizy vehicle module
OVMS > module memory
============================
Free 8-bit 85800/246632, 32-bit 18576/43548, numafter = 1000

So, a reasonable saving of 10KB. I tried with memory and task debugging turned off, and we get m.freeram 118888/98200 going to 121884/104932.

With the optimised OvmsMetrics code, RT seems to be allocating memory as follows:

Free 8-bit 85812/246632, 32-bit 18576/43548, numafter = 1000
task=Asy total= 0 20616 24972 change= +0 +20616 +0

We could use a static style allocation (std::vector, or a static array), and fixed structures (rather than dynamic objects). That would save the allocation overhead, but would make things a lot more rigid.

All the above committed and pushed.

Regards, Mark.

P.S. Note that Espressif are just now coming out with WROVER modules that include 32Mbit of PSRAM (which can be mapped so that heap size goes up to around 4MB!). Probably another few months before code support for that stabilises and modules are available in quantity. Perhaps in our future we could switch to that, but for the moment I think we’re ok. Just need to be careful.
On 30 Oct 2017, at 5:45 AM, Michael Balzer <dexter@expeedo.de> wrote:
[snip]
I should also mention that AnalogLamb are working on a drop-in replacement for the WROOM-32 module that we use, but with the 32Mbit PSRAM inside. The pinout is the same. Still waiting for details on this (especially whether it uses 3.3V or 1.8V for flash and PSRAM). I just don’t think we need to wait for these. At some point, it is ‘good enough’, and the ESP32 WROOM-32 seems fine for our purposes today.

Regards, Mark.
On 30 Oct 2017, at 10:38 AM, Mark Webb-Johnson <mark@webb-johnson.net> wrote:
P.S. Note that Espressif are just now coming out with WROVER modules that include 32Mbit of PSRAM (can be mapped so that heap size goes up around 4MB!) Probably another few months before code support for that stabilises, and modules available in quantity. Perhaps in our future we could switch to that, but for the moment I think we’re ok. Just need to be careful.
Great, thanks!

I'll also have a look at other saving options. The RT core size is just 1308 bytes, already including 512 bytes for metrics pointers. So the names seem to be what counts.

Btw: the bluetooth stack eats up 64K of RAM when compiled in… https://www.esp32.com/viewtopic.php?t=3139

Regards, Michael

On 30.10.2017 03:38, Mark Webb-Johnson wrote:
So, a reasonable saving of 10KB.
I tried with memory and task debugging turned off, and we get m.freeram 118888/98200 going to 121884/104932.
With the optimised OvmsMetrics code, RT seems to be allocating memory as follows:
Free 8-bit 85812/246632, 32-bit 18576/43548, numafter = 1000 task=Asy total= 0 20616 24972 change= +0 +20616 +0
-- Michael Balzer * Helkenberger Weg 9 * D-58256 Ennepetal Fon 02333 / 833 5735 * Handy 0176 / 206 989 26
Michael,

Regarding Bluetooth, please check ‘diff sdkconfig.default sdkconfig’ and make sure the bluetooth settings match. We’ve got it configured to minimise that (primarily disabling the classic stack, and enabling the function to reclaim memory from it). Also, note that bluetooth is statically allocated at boot (for the 2.1 release of ESP IDF), so that means we’ve already got it included in our RAM remaining.

Regards, Mark.
On 30 Oct 2017, at 9:07 PM, Michael Balzer <dexter@expeedo.de> wrote:
[snip]
On Mon, 30 Oct 2017, Mark Webb-Johnson wrote:
Firstly, remember that with memory debugging turned on (menuconfig, Components, OVMS, Developer, Enabled extended RAM memory allocation statistics), there is an extra overhead for each memory allocation. I tested a simple allocation of a 32 byte object, and that reduced memory by 56 bytes. So, seems to be 24 bytes for each allocation. Not sure what it is without the debugging turned on (as much harder to see the allocations), but it should be a lot less.
In release 2.1 of esp-idf, the overhead for each block allocated with malloc or new when memory debugging has NOT been turned on is the following 8-byte header:

typedef struct A_BLOCK_LINK {
    struct A_BLOCK_LINK *pxNextFreeBlock;  /*<< The next free block in the list. */
    int xBlockSize: 24;                    /*<< The size of the free block. */
    int xTag: 7;                           /*<< Tag of this region */
    int xAllocated: 1;                     /*<< 1 if allocated */
} BlockLink_t;

When memory debugging is turned on, there is an additional 12-byte header and 4-byte trailer added, which accounts for the total of 24 bytes of overhead that Mark observed:

typedef struct {
    unsigned int dog;
    TaskHandle_t task;
    unsigned int pc;
} block_head_t;

typedef struct {
    unsigned int dog;
} block_tail_t;

The "dog" words are set to 0x1A2B3C4D and are examined to look for bounds overruns. The task id is what we really need to keep track of who's allocating memory. I don't see the pc word being used in the heap_regions_debug.c code now, but my guess is that it was keeping track of where the allocation was done. Hmmm, maybe I can save 4 bytes? [does a test compile] Yes! Removing that word has the following savings.
Here is the memory usage with WiFi, MDNS, Telnet and Web servers started, first before removing the "pc" word:

Free 8-bit 59884/245680, 32-bit 16620/43428, blocks dumped = 0
task=Housekeeping total= 19400 46316 26140 change= +19400 +46316 +26140
task=main total= 34980 0 0 change= +34980 +0 +0
task=IDLE total= 36 0 0 change= +36 +0 +0
task=IDLE total= 36 0 0 change= +36 +0 +0
task=ipc1 total= 36 0 0 change= +36 +0 +0
task=ipc0 total= 9284 0 0 change= +9284 +0 +0
task=eventTask total= 44348 14292 668 change= +44348 +14292 +668
task=no task total= 8784 0 0 change= +8784 +0 +0
task=tiT total= 336 1440 0 change= +336 +1440 +0
task=wifi total= 0 5896 0 change= +0 +5896 +0
task=NetManTask total= 0 836 0 change= +0 +836 +0
task=TelnetServer total= 0 424 0 change= +0 +424 +0

Now after removing the "pc" word, a 4264-byte savings, which implies 1066 blocks allocated:

Free 8-bit 64148/245712, 32-bit 16648/43432, blocks dumped = 0
task=Housekeeping total= 22076 42328 26120 change= +22076 +42328 +26120
task=main total= 34824 0 0 change= +34824 +0 +0
task=IDLE total= 32 0 0 change= +32 +0 +0
task=IDLE total= 32 0 0 change= +32 +0 +0
task=ipc1 total= 32 0 0 change= +32 +0 +0
task=ipc0 total= 9240 0 0 change= +9240 +0 +0
task=eventTask total= 42052 14204 664 change= +42052 +14204 +664
task=no task total= 8656 0 0 change= +8656 +0 +0
task=tiT total= 312 1392 0 change= +312 +1392 +0
task=wifi total= 0 5864 0 change= +0 +5864 +0
task=NetManTask total= 0 832 0 change= +0 +832 +0
task=TelnetServer total= 0 328 0 change= +0 +328 +0

I'll commit that change now.

-- Steve
On Mon, 30 Oct 2017, Mark Webb-Johnson wrote:
Looking at the m_metrics map storage, I tried a simple:
std::forward_list<uint32_t> x;
[snip]
That works out at 32 bytes per std::forward_list entry. That is presumably the 24 bytes for the allocation, plus 8 bytes per entry (a pointer to the next, plus a pointer to the entry content).
The 8 bytes are the 4-byte pointer to the next entry plus the 4 bytes of the uint32_t itself. The overhead of forward_list is only one pointer.
The absolute most efficient dynamic system would be an OvmsMetric* m_next in OvmsMetric, then a OvmsMetric* m_first in OvmsMetrics, and remove the map altogether. That would be 4 bytes for the OvmsMetrics list, plus an extra 4 bytes for each metric.
The original design made a lot of use of MyMetrics.Find(), but that has been deprecated now and I don’t see anything using it at all. The only thing we need is a ‘metrics list’ iterator to show the details - all that needs is an ordered list.
So, I went ahead and did that. Changed ovms_metrics to use a manually managed one-way linked list.
Removing the map is fine. The overhead of forward_list is no more than your own explicitly written m_next pointer, so if there is any desire to keep the template for its available functions, you can.
We could use a static style allocation (std::vector, or a static array), and fixed structures (rather than dynamic objects). That would save the allocation overhead, but would make things a lot more rigid.
There is an approach in between, where you create a pool of blocks of a given size by allocating some number of them in one malloc to amortize the overhead, then dynamically hand out blocks from those pools. When you need more, you allocate another chunk of blocks.

-- Steve
Participants (3):
- Mark Webb-Johnson
- Michael Balzer
- Stephen Casner