Done. See BZ 834019.
Thanks,
Andreas
On Wed, Jun 20, 2012 at 3:50 PM, John Mazzitelli <mazz(a)redhat.com> wrote:
Andreas,
Can you write a BZ on this, explaining the problem (as you did here) and attach your
patch to it? I'd like to track this. I might have time to look at this myself.
Thanks,
John
----- Original Message -----
> We have a somewhat unusual RHQ setup where we monitor a large number
> of resources remotely from a single agent. Per agent, we have about
> 25000 scheduled measurements, with about 1500 measurements collected
> per minute. Since most of the metrics are collected with the same
> interval (10 minutes), this causes the following problem: when the
> agent is started (t=0), it schedules all these metrics in the same
> interval [0s,30s]. However, because of the large number of
> measurements, the agent is not able to collect all of them in that
> 30s interval and reschedules the remaining ones to the next interval
> in the original schedule, i.e. to [10m,10m+30s]. The same thing
> happens again in the interval [10m,10m+30s], and most of the
> measurements are rescheduled to the next interval [20m,20m+30s], and
> so forth. This means that some metrics are never collected (and are
> reported as "late" in the metrics of the RHQ agent).
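To put rough numbers on the problem described above, here is a small back-of-the-envelope sketch. It assumes, as an upper bound, that all of the roughly 25000 measurements become due in the same 30 s window (the original says "most", not all); the class and method names are made up for illustration.

```java
class WindowThroughput {

    /** Collections per second needed to finish `count` measurements within `windowSeconds`. */
    static double requiredRate(int count, int windowSeconds) {
        return (double) count / windowSeconds;
    }

    public static void main(String[] args) {
        // Assumption: every scheduled measurement is due in [0s,30s] after a restart.
        double burst = requiredRate(25_000, 30);
        // Steady-state rate reported in the original message: about 1500 per minute.
        double steady = 1_500 / 60.0;
        System.out.printf("burst: %.0f/s, steady: %.0f/s, ratio: %.0fx%n",
                burst, steady, burst / steady);
    }
}
```

Under that assumption the agent would need roughly 833 collections per second during the first window, against a steady-state rate of 25 per second, which is why the backlog cannot be cleared within 30 s.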
>
> Note that the issue only occurs after restarting the agent. When the
> resources are originally added to the inventory, the corresponding
> measurement schedules are spread more or less randomly and the agent
> is able to collect all of them.
>
> To solve that issue with RHQ 3.0, I applied the following patch:
>
> Index: src/main/java/org/rhq/core/pc/measurement/MeasurementManager.java
> ===================================================================
> --- src/main/java/org/rhq/core/pc/measurement/MeasurementManager.java (revision 141630)
> +++ src/main/java/org/rhq/core/pc/measurement/MeasurementManager.java (revision 141631)
> @@ -484,6 +484,13 @@
>              this.scheduledRequests.offer(scheduledMeasurement);
>          }
>      }
> +
> +    public synchronized void reschedule(Set<ScheduledMeasurementInfo> scheduledMeasurementInfos, long interval) {
> +        for (ScheduledMeasurementInfo scheduledMeasurement : scheduledMeasurementInfos) {
> +            scheduledMeasurement.setNextCollection(scheduledMeasurement.getNextCollection() + interval);
> +            this.scheduledRequests.offer(scheduledMeasurement);
> +        }
> +    }
>
>      /**
>       * Sends the given measurement report to the server, if this plugin container has server services that it can
> Index: src/main/java/org/rhq/core/pc/measurement/MeasurementCollectorRunner.java
> ===================================================================
> --- src/main/java/org/rhq/core/pc/measurement/MeasurementCollectorRunner.java (revision 141630)
> +++ src/main/java/org/rhq/core/pc/measurement/MeasurementCollectorRunner.java (revision 141631)
> @@ -71,7 +71,7 @@
>                  log.debug("Measurement collection is falling behind... Missed requested time by ["
>                      + (System.currentTimeMillis() - requests.iterator().next().getNextCollection()) + "ms]");
>
> -                this.measurementManager.reschedule(requests);
> +                this.measurementManager.reschedule(requests, 30000L);
>                  return report;
>              }
>
> The idea is that instead of rescheduling a late measurement according
> to the original schedule (e.g. from [0s,30s] to [10m,10m+30s]), it
> should simply be rescheduled to the next 30s interval (from [0s,30s]
> to [30s,60s]).
>
> We are currently in the process of upgrading to RHQ 4.4. I haven't
> tested the patch with that version yet, but after looking at the code
> I think it is still applicable. I would like to get some feedback
> about the approach: is it a valid way to solve the issue, or are
> there better ways to do that?
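The difference between the two rescheduling policies can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not RHQ code: the `Measurement` class, the single collector loop, the fixed per-collection cost, and the tie-break by id are all hypothetical simplifications, and the numbers (1000 measurements, 100 ms each) are invented to keep the run small.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

class RescheduleSimulation {

    /** Hypothetical stand-in for a scheduled measurement; not the RHQ class. */
    static final class Measurement {
        final int id;
        long nextCollection; // milliseconds since agent start
        Measurement(int id) { this.id = id; }
    }

    /**
     * Runs one simulated collector and returns how many distinct measurements
     * were collected at least once before durationMs.
     *
     * lateDelayMs is the delay added to a measurement that is skipped as late:
     * pass intervalMs to model the original behaviour, or windowMs to model
     * the patched behaviour.
     */
    static int simulate(int count, long costMs, long intervalMs, long windowMs,
                        long lateDelayMs, long durationMs) {
        // Assumption: ties on the due time are broken by id to keep the run deterministic.
        PriorityQueue<Measurement> queue = new PriorityQueue<>(
                Comparator.comparingLong((Measurement m) -> m.nextCollection)
                          .thenComparingInt(m -> m.id));
        for (int i = 0; i < count; i++) {
            queue.add(new Measurement(i)); // everything is due at t=0, as after a restart
        }
        Set<Integer> collected = new HashSet<>();
        long now = 0;
        while (now < durationMs) {
            Measurement m = queue.poll();
            if (m.nextCollection > now) {
                now = m.nextCollection; // idle until the next measurement is due
                if (now >= durationMs) {
                    break;
                }
            }
            if (now - m.nextCollection > windowMs) {
                m.nextCollection += lateDelayMs; // too late: skip and reschedule
            } else {
                now += costMs;                   // collect, which takes time
                collected.add(m.id);
                m.nextCollection += intervalMs;  // back on the regular schedule
            }
            queue.add(m);
        }
        return collected.size();
    }

    public static void main(String[] args) {
        long interval = 600_000, window = 30_000, hour = 3_600_000;
        System.out.println("original: " + simulate(1000, 100, interval, window, interval, hour));
        System.out.println("patched:  " + simulate(1000, 100, interval, window, window, hour));
    }
}
```

With these made-up numbers, the original policy only ever collects the same 301 of the 1000 measurements, while pushing late measurements by 30 s lets all 1000 be collected within the first two minutes and leaves them spread over several windows afterwards.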
>
> Andreas
> _______________________________________________
> rhq-users mailing list
> rhq-users(a)lists.fedorahosted.org
> https://fedorahosted.org/mailman/listinfo/rhq-users
>